The awstats program is a versatile tool for generating web traffic reports. In this article we will review a simple awstats installation to track statistics for your cloud site.
If you run a web site, you might be curious about statistics such as
Web traffic analysis can be useful in obtaining these statistics. There are many options for analyzing network traffic. This article series will explain a program called awstats. Awstats scans web logs and uses its findings to generate reports. The reports break down the data and reveal information such as the popular sections of your site, what search terms were used to arrive at your site, and what search engines have recently spidered your site.
This article details a simple approach for analyzing logs with awstats. Awstats has some useful features that this approach does not use, such as dynamically generating reports via CGI; however, the simple approach is light on resource use and easy to set up no matter what web server you use.
The approach awstats uses is called log analysis. The analyzer program goes through your web logs line by line and sorts out what files got served and where the requests came from. Because this approach does not rely on the visiting web browser to execute any specific code properly, the numbers for total traffic gathered by a log analyzer are closer to an accurate tally. The downside to this approach is that the only identifying information recorded about a visitor is their IP address. This is not always a reliable way to distinguish between users since several users can be behind the same proxy server or firewall. Log analysis provides better information about how much traffic your web server handles, but is less reliable when it comes to determining site usage patterns.
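To make the idea concrete, here is a minimal sketch of what a log analyzer does at its simplest: read a combined-format log line by line and tally requests per client IP address. The function name `count_hits_by_ip` and the sample log lines are invented for illustration; awstats itself does far more than this.

```shell
#!/bin/sh
# Hypothetical helper: tally requests per client IP in a combined-format log.
# The first whitespace-separated field of each line is the client address.
count_hits_by_ip() {
    awk '{ hits[$1]++ } END { for (ip in hits) print hits[ip], ip }' "$1" |
        sort -k1,1nr -k2,2
}

# Sample data (invented for illustration).
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
203.0.113.5 - - [10/Oct/2013:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"
198.51.100.7 - - [10/Oct/2013:13:56:01 +0000] "GET /about.html HTTP/1.1" 200 1120 "http://www.example.com/" "Mozilla/5.0"
203.0.113.5 - - [10/Oct/2013:13:57:12 +0000] "GET /contact.html HTTP/1.1" 200 980 "http://www.example.com/index.html" "Mozilla/5.0"
EOF

# Print one "count address" pair per line, busiest address first.
count_hits_by_ip "$sample_log"
rm -f "$sample_log"
```

Note that the tally is keyed on the IP address alone, which is exactly why several users behind one proxy show up as a single visitor.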
Page tagging (embedding tracking code in each page for the visitor's browser to run) and log analysis have complementary strengths and weaknesses. Usage reports from both approaches can be used together to obtain an accurate assessment of your web server.
Choose the virtual host you want to analyze before installing awstats. You can analyze more than one host, but make sure each virtual host is logging to its own access log. Follow the steps in this article for each virtual host. It is possible to use a single log for multiple sites; however, this is more complicated and is not recommended.
Start by verifying that you have a web server, then verify that the virtual host to be tracked is logging in the right format. While awstats can handle other log formats, we recommend using the standard combined web log format. Most web servers, like nginx or lighttpd, use the combined log format by default.
If you use Apache, it might be logging in either a combined or a common log format. To determine which, check your virtual host config file and look for the CustomLog directive:
CustomLog /var/www/access.log combined
Notice the last word in the directive. It is "combined" in the example above. The last word in the CustomLog directive is the log format. If the log format word is missing from the directive, or if it is "common," change it to "combined" instead. For more information on the combined log format, check out this article for Apache or this article for nginx. If you changed the format word used for the virtual host log, remember to reload the web server to implement the change.
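As a rough sketch of that check, the helper below prints the log path and the format word from each CustomLog directive in a config file. The function name `show_customlog_format` is hypothetical, and the sketch assumes the directive is spelled exactly "CustomLog" and ends with a format nickname rather than a quoted format string.

```shell
#!/bin/sh
# Hypothetical helper: print "<log path> <format word>" for each CustomLog line.
# Commented-out directives are skipped because their first field is "#CustomLog"
# or "#" rather than "CustomLog".
show_customlog_format() {
    awk '$1 == "CustomLog" { print $2, (NF >= 3 ? $NF : "(no format given)") }' "$1"
}

# Demonstration on a scratch file standing in for a virtual host config.
demo_vhost=$(mktemp)
cat > "$demo_vhost" <<'EOF'
<VirtualHost *:80>
    ServerName www.example.com
    CustomLog /var/www/access.log combined
</VirtualHost>
EOF

show_customlog_format "$demo_vhost"
rm -f "$demo_vhost"
```

To use it for real, pass the path of your own virtual host config file instead of the scratch file.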
The awstats scripts use a scripting language called Perl. It has many uses and may already be installed on your machine. To verify that it is installed, run this command:
perl -v
If you get a response that gives you a Perl version, then it is installed. If you get a "command not found" error, then install Perl through your distribution's package manager, like yum or aptitude.
In our example, we will not use the Linux distribution's pre-packaged version of awstats. The awstats program is updated regularly and new versions include data on the latest web browser and operating system identifiers. It is best to install the source package for awstats and manually update it regularly so that you have the most accurate reports possible.
You can get the latest version of awstats from the project's download page. You can choose between the latest beta for cutting-edge reporting data or the latest stable version. In this guide we are using the latest stable version (7.1 at the time of this writing).
tar -xvzf awstats-7.1.tar.gz
You should now have a directory named after the awstats version, like awstats-7.1. Move this directory to the location where awstats will be installed. In this example, /usr/local/awstats was used:
sudo mv awstats-7.1 /usr/local/awstats
To update to another version of awstats later, repeat these steps with the new version. Replace the old awstats directory with the new one.
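One cautious way to do the replacement is to keep the old directory as a backup until the new one is in place. The sketch below is only an illustration: the function name `replace_install` and the ".old" backup suffix are assumptions, not part of awstats.

```shell
#!/bin/sh
# Hypothetical helper: swap a freshly extracted awstats directory into place,
# keeping the previous install as "<install_dir>.old".
replace_install() {
    new_dir=$1
    install_dir=$2
    if [ ! -d "$new_dir" ]; then
        echo "missing: $new_dir" >&2
        return 1
    fi
    if [ -e "$install_dir.old" ]; then
        echo "backup already exists: $install_dir.old" >&2
        return 1
    fi
    if [ -d "$install_dir" ]; then
        mv "$install_dir" "$install_dir.old" || return 1
    fi
    mv "$new_dir" "$install_dir"
}

# Example usage (run with sudo; replace the first argument with the
# directory you extracted):
# replace_install ./awstats-NEW_VERSION /usr/local/awstats
```

Once the new reports look right, the ".old" directory can be removed by hand.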
Because the reports are static HTML pages, the output directory for your reports can be located wherever you prefer. The only requirement is that it be accessible from your web browser for viewing. In this example we will track traffic for "www.example.com," so we will put the reports in a directory named "webstats" under that site's public directory in the "demo" user's home directory:
mkdir -p /home/demo/public_html/example.com/public/webstats
The HTML pages that awstats generates contain images that make the reports more engaging. The command below copies those images to a location where the generated reports can use them:
cp -R /usr/local/awstats/wwwroot/icon /home/demo/public_html/example.com/public/webstats/awstatsicons
When you update awstats, recopy this directory as well in order to catch any additions such as icons for new browsers.
Create a directory where awstats can store the data it uses to generate reports. The default directory is "/var/lib/awstats."
sudo mkdir -p /var/lib/awstats
Set up a config file to tell awstats how to process the logs for your domain:
sudo mkdir -p /etc/awstats
sudo cp /usr/local/awstats/wwwroot/cgi-bin/awstats.model.conf /etc/awstats/awstats.www.example.com.conf
Open the config file you just created, /etc/awstats/awstats.www.example.com.conf, in your preferred text editor.
Change "www.example.com" in the file name to the name of the domain you will be tracking. If you receive a message that the file does not exist, verify that you typed the file name correctly. At this stage of installation, the file will contain many items. Because this is a basic installation, it is only necessary to change a few config settings at the beginning of the file. The important awstats directives are described below.
The LogFile directive tells awstats where to find the log it will be analyzing. For our example, we will change that value to the location of example.com's access log:
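As a minimal sketch, assuming the access log lives at the hypothetical path /var/log/apache2/example.com-access.log, the line can be edited by hand or set with a small helper like the one below. The function name `set_directive` is made up for illustration, and it assumes the directive already exists uncommented at the start of a line and that the new value contains no "|", "&", or backslash characters.

```shell
#!/bin/sh
# Hypothetical helper: rewrite NAME=... as NAME="VALUE" in a config file.
# Uses a temporary file instead of "sed -i" so it behaves the same on GNU and BSD sed.
set_directive() {
    conf=$1
    name=$2
    value=$3
    tmp=$(mktemp) || return 1
    if sed "s|^${name}=.*|${name}=\"${value}\"|" "$conf" > "$tmp"; then
        cat "$tmp" > "$conf"    # overwrite in place, keeping the file's permissions
    fi
    rm -f "$tmp"
}

# Demonstration on a scratch file rather than the real config.
demo_conf=$(mktemp)
printf '%s\n' 'LogFile="/path/to/placeholder.log"' 'LogFormat=1' > "$demo_conf"
set_directive "$demo_conf" LogFile /var/log/apache2/example.com-access.log   # hypothetical path
grep '^LogFile=' "$demo_conf"
rm -f "$demo_conf"
```

The same helper would work for the other quoted directives in this article, subject to the character restrictions noted above.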
The LogFormat directive is a setting you do not need to change; however, you can change it if you have your domain logging in a custom format, or if you want to use another standard format like the common log format. The commented text that precedes this directive explains how to tell awstats what your log format looks like.
The following are predefined log formats:
| Predefined log format | LogFormat value |
| Combined log format (default) | "1" |
| Common log format | "4" |
The SiteDomain directive tells awstats what your site's main domain name is. If you usually direct visitors to "www.example.com," you would change this setting to:
SiteDomain="www.example.com"
The HostAliases directive has the following default value:
HostAliases="localhost 127.0.0.1 REGEX[myserver\.com$]"
The HostAliases directive informs awstats of all the different domains visitors might use to get to your site. It is there to separate external URLs from local or internal URLs. The default has an unusual "REGEX" item that describes a "regular expression." This is a flexible but daunting way to describe a search filter.
The example above matches host names that end in "myserver.com." Regular expressions are not simple, but they are useful when you know how to use them. The only requirement for this setting is a list of domains. It is a good practice to add "localhost" and "127.0.0.1" in this directive, and to add your main domain name as well as any alternates you have for the site. These should be separated by spaces:
HostAliases="www.example.com example.com www.olddomain.com olddomain.com localhost 127.0.0.1"
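To get a feel for what the default regular expression matches, the sketch below tests a few host names against the pattern from inside REGEX[...]. It uses grep's extended regular expressions as a stand-in for the Perl expressions awstats uses; for a pattern this simple the two should behave the same, but treat that as an assumption. The function name `host_matches` and the test host names are invented.

```shell
#!/bin/sh
# Hypothetical helper: succeed if host name $1 matches regular expression $2.
host_matches() {
    printf '%s\n' "$1" | grep -Eq "$2"
}

pattern='myserver\.com$'   # "\." is a literal dot, "$" anchors the end of the name

for host in myserver.com www.myserver.com myserver.com.example.org myserverXcom; do
    if host_matches "$host" "$pattern"; then
        echo "$host: matches"
    else
        echo "$host: no match"
    fi
done
```

The first two names match because the pattern is anchored only at the end; the last two do not, one because it ends in a different domain and the other because it lacks the literal dot.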
DNSLookup is a directive that should not be changed. At its default setting, awstats will not run DNS lookups on visitor IP addresses. The only information awstats would gain from those lookups is what country the visitor is in. This can be nice to know and chart, but it is not worth the time and effort required for awstats to make all those DNS queries.
If you want country data for visitors, check the awstats plugin site for alternatives to DNS lookup. These plug-ins also impact awstats performance, but not to the degree that DNS lookups do.
The DirData directive tells awstats where to store the data it uses to generate reports, i.e., the data directory you created earlier. If you did not use "/var/lib/awstats," then remember to change this value to point to your data directory.
The generated awstats reports reference images using the value of the DirIcons directive. The icon directory is relative to the location of the reports. For our example, it points to the "awstatsicons" directory we created when we copied the default images directory. For your purposes, change the directory name so the generated reports can find the images in your directory.
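After editing, it can help to review the handful of directives covered above in one place. The sketch below lists just those lines from a config file; the function name `show_key_directives` is hypothetical, and it assumes each directive sits uncommented at the start of a line.

```shell
#!/bin/sh
# Hypothetical helper: print the uncommented directives discussed in this article.
show_key_directives() {
    grep -E '^(LogFile|LogFormat|SiteDomain|HostAliases|DNSLookup|DirData|DirIcons)=' "$1"
}

# Review the config from this article's example, if it exists on this machine.
conf=/etc/awstats/awstats.www.example.com.conf
if [ -f "$conf" ]; then
    show_key_directives "$conf"
else
    echo "no config found at $conf"
fi
```

For this article's example you would expect to see your access log path in LogFile, "www.example.com" in SiteDomain, "/var/lib/awstats" in DirData, and "awstatsicons" in DirIcons.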
In this article we reviewed the following basic steps to install Awstats on Linux:
Follow these steps for any additional domains you want to track. You can share the data and report directories across multiple virtual hosts; however, you must use different logs and config files for each site.
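If you track several domains, creating the per-site config files can be scripted. The following is a minimal sketch, not a definitive procedure: the function name `add_site_config` and the second domain in the usage example are placeholders, and each new file still needs its own LogFile, SiteDomain, and HostAliases values edited afterward.

```shell
#!/bin/sh
# Hypothetical helper: copy the model config to awstats.<domain>.conf
# unless a config for that domain already exists.
add_site_config() {
    model=$1
    conf_dir=$2
    domain=$3
    target="$conf_dir/awstats.$domain.conf"
    if [ -e "$target" ]; then
        echo "exists: $target"
    else
        cp "$model" "$target" && echo "created: $target"
    fi
}

# Example usage (run as root or with sudo; www.example.org is a placeholder):
# for domain in www.example.com www.example.org; do
#     add_site_config /usr/local/awstats/wwwroot/cgi-bin/awstats.model.conf /etc/awstats "$domain"
# done
```

Skipping existing files keeps a rerun of the loop from overwriting a config you have already edited.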
In the next article we will explain how to build and view reports, and how to schedule awstats to generate reports on a regular basis.
© 2011-2013 Rackspace US, Inc.
Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License