Scheduling Awstats Report Generation


The previous article, Generating and Viewing Awstats Reports, explained how to run awstats reports manually. This article reviews the steps needed to automate report generation, so that you no longer have to generate the reports by hand.

Automating awstats

The update command is long and time-consuming to run every time you want to view updated stats. Fortunately, it is easy to automate procedures in Linux. While this can be done by creating a cron script to do the updating, it is simpler to use logrotate, a standard tool that regularly processes your web logs. With this tool you can piggyback awstats updates onto logrotate's regular rotation tasks.

Scheduling reports with logrotate

With this approach we'll take advantage of the fact that logrotate is already performing regular log rotation for your domain. If you don't have logrotate working, visit this article series on logrotate and follow the directions there to set up log rotation for your virtual host. Log rotation keeps your web logs from becoming giant disk-space-eating behemoths.

Once you have logrotate managing your web logs, the next step is to change what logrotate does when it performs the rotation.

Editing the logrotate entry

This is a logrotate.d file for a virtual host that is a modification of the default entry for apache on Ubuntu:

/home/demo/public_html/example.com/logs/*.log {
    weekly
    missingok
    rotate 52
    compress
    delaycompress
    notifempty
    create 640 root adm
    sharedscripts
    postrotate
        if [ -f "`. /etc/apache2/envvars ; echo ${APACHE_PID_FILE:-/var/run/apache2.pid}`" ]; then
            /etc/init.d/apache2 reload > /dev/null
        fi
    endscript
}
    1. First, take a look at the "weekly" directive in that logrotate file. This works for simple log rotation, but if you want your web traffic stats updated daily, change the directive to "daily". This causes the rotate script to run more frequently. You can also raise the number in the "rotate 52" entry to keep more archived log files; with daily rotation, 52 archives cover a little over seven weeks.
    2. Next we look at what we want to add to the logrotate process. See that "postrotate" block up there? Don't worry about what's inside; just note that it's there. It runs commands after the log rotation is done (in this case, telling apache to reload if it's running). We're looking at it because there's also a "prerotate" directive that we can use to have awstats run through a log file before it's rotated.
    3. The "prerotate" directive should run the stats update and generate the reports. And it should run just like the command we ran to get our reports. We would create a prerotate block like:
      prerotate
          /usr/local/awstats/tools/awstats_buildstaticpages.pl -update -config=www.example.com -dir=/home/demo/public_html/example.com/public/webstats -awstatsprog=/usr/local/awstats/wwwroot/cgi-bin/awstats.pl > /dev/null
      endscript

      The "> /dev/null" bit redirects the normal output of the command so it doesn't get sent to the console or emailed to root.

      Inserted into the existing logrotate file for our virtual host, the whole thing would look like:

      /home/demo/public_html/example.com/logs/*.log {
          daily
          missingok
          rotate 52
          compress
          delaycompress
          notifempty
          create 640 root adm
          sharedscripts
          prerotate
              /usr/local/awstats/tools/awstats_buildstaticpages.pl -update -config=www.example.com -dir=/home/demo/public_html/example.com/public/webstats -awstatsprog=/usr/local/awstats/wwwroot/cgi-bin/awstats.pl > /dev/null
          endscript
          postrotate
              if [ -f "`. /etc/apache2/envvars ; echo ${APACHE_PID_FILE:-/var/run/apache2.pid}`" ]; then
                  /etc/init.d/apache2 reload > /dev/null
              fi
          endscript
      }

Every time the logs get rotated, the web stats get updated too.
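
Before relying on the new entry, it can help to have logrotate parse it in debug mode, which reports what it would do without rotating anything. The sketch below assumes the entry was saved as /etc/logrotate.d/example.com; that file name is hypothetical, so substitute whatever name you used.

```shell
#!/bin/sh
# Hypothetical location of the logrotate entry for the virtual host.
CONF=/etc/logrotate.d/example.com

if command -v logrotate >/dev/null 2>&1 && [ -f "$CONF" ]; then
    # -d turns on debug mode: the file is parsed and the planned actions
    # are printed, but no logs are changed.
    logrotate -d "$CONF"
else
    echo "logrotate or $CONF not available; nothing checked" >&2
fi
```

Run it with sudo so logrotate can read the log directory. If the output lists your log files without reporting errors, the entry parsed cleanly.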

Why use logrotate?

Instead of using logrotate to run the web stats update we could just schedule the stats to update through cron. So why insert the commands into logrotate?

The main reason is accuracy. By sticking a web stats update into the log rotation process we make sure that awstats is looking at log entries up until the last possible moment, when the log is rotated. If the web server reloads instead of restarting, a couple of log entries might be missed (since the old log file is still used while the old connections finish). The information lost in that case is negligible and not worth the downtime required to fully restart the web server.

If you want the web traffic reports to update more often than the web logs are rotated, you can use cron to run the update and report-building scripts on a more frequent schedule (like hourly). Awstats will recognize log entries that have already been processed and skip them when analyzing the data.
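
As a rough illustration of the hourly option, the update command from the prerotate block could be wrapped in a small script and dropped into the cron hourly directory. This is only a sketch: the file name /etc/cron.hourly/awstats-update is hypothetical, and the paths are the example values used throughout this article.

```shell
#!/bin/sh
# Hypothetical /etc/cron.hourly/awstats-update
# Paths below are the article's example values; adjust them for your site.
AWSTATSDIR=/usr/local/awstats
DOMAIN=www.example.com
REPORTDIR=/home/demo/public_html/example.com/public/webstats
BUILD="$AWSTATSDIR/tools/awstats_buildstaticpages.pl"

if [ -x "$BUILD" ]; then
    # Same update-and-build command that the prerotate block runs.
    "$BUILD" -update -config="$DOMAIN" -dir="$REPORTDIR" \
        -awstatsprog="$AWSTATSDIR/wwwroot/cgi-bin/awstats.pl" > /dev/null
else
    echo "awstats build script not found at $BUILD" >&2
fi
```

The script would need to be made executable with chmod before cron will run it. Keeping the prerotate block in place as well means the last entries in a log still get processed just before it is rotated.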

Monthly reports

Something you'll notice if you leave awstats running the way we've set it up is that the reports being generated only go into detail for the current month. Dividing the information up by month is a decent interval, but you may want to look at previous months in detail instead of only the current one.

So it's entirely optional, but if you'd like to keep monthly reports around it's really just a matter of tailoring a shell script to fit your site.

A cron script

We'll name the script after the awstats config file for the site in question and put it in the cron monthly directory. Using "www.example.com", we would create the file:

/etc/cron.monthly/awstats.www.example.com

Inside the file put the following script:

#!/bin/sh
#
# Run the awstats build script to generate a report for last month.
#

# Modify these 3 variables for your environment:
#
# The location of the awstats installation
AWSTATSDIR=/usr/local/awstats

# Your main domain, as you reported it to awstats
DOMAIN=www.example.com

# The directory where you're storing the reports for this domain
REPORTDIR=/home/demo/public_html/example.com/public/webstats

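# Work out last month's name, number, and year. The year matters in
# January, when last month belongs to the previous year.
LASTMONTH=`date -d "last month" +%B`
LASTMONTHNUM=`date -d "last month" +%m`
LASTMONTHYEAR=`date -d "last month" +%Y`
LASTMONTHDIR=$REPORTDIR/$LASTMONTH

mkdir -p "$LASTMONTHDIR"
cp -Rf "$AWSTATSDIR/wwwroot/icon" "$LASTMONTHDIR/awstatsicons"

"$AWSTATSDIR/tools/awstats_buildstaticpages.pl" -month=$LASTMONTHNUM -year=$LASTMONTHYEAR -update -config=$DOMAIN -dir="$LASTMONTHDIR" -awstatsprog="$AWSTATSDIR/wwwroot/cgi-bin/awstats.pl" > /dev/null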

Change the values of "AWSTATSDIR", "DOMAIN", and "REPORTDIR" to match your environment and site.

After saving the file make your new script executable:

sudo chmod a+x /etc/cron.monthly/awstats.www.example.com

You can even run that script now to test it (and to get a detailed report for last month's data, if you have some). If you do, be sure to run it using sudo.
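
One thing worth checking, since the logrotate example earlier is based on Ubuntu: cron runs the files in /etc/cron.monthly through run-parts, and the Debian and Ubuntu version of run-parts by default only runs files whose names consist of letters, digits, underscores, and hyphens. A name containing dots, like the one used above, may therefore be skipped silently. The function below is a small sketch of that naming rule, not part of awstats or cron; the second file name in the loop is a hypothetical alternative.

```shell
#!/bin/sh
# Sketch of the default Debian/Ubuntu run-parts naming rule: a file name is
# accepted only if it is non-empty and contains nothing but letters,
# digits, underscores, and hyphens.
runparts_name_ok() {
    case "$1" in
        ""|*[!A-Za-z0-9_-]*) return 1 ;;
        *) return 0 ;;
    esac
}

for name in awstats.www.example.com awstats-www-example-com; do
    if runparts_name_ok "$name"; then
        echo "$name: would be run"
    else
        echo "$name: would be skipped"
    fi
done
```

If the name is rejected on your system, renaming the script to use hyphens instead of dots is one way around it. On Debian and Ubuntu, "run-parts --test /etc/cron.monthly" lists the scripts that would actually be run.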

What it does

When the script runs it creates (if it doesn't already exist) a directory named after last month. If the current month is September, the directory will be named "August" (or the equivalent for your machine's locale setting). The awstats icons directory will be copied there, then a report will be generated for last month and put in the month's directory.
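
To see what the two date commands in the script evaluate to, you can feed GNU date a fixed run date instead of today's. The helper below is purely illustrative (the function name is made up), and it pins the time zone and locale so the output is predictable.

```shell
#!/bin/sh
# Print the month name and month number the script would use if it ran on
# the given date. Requires GNU date for the -d option.
last_month_for() {
    TZ=UTC LC_ALL=C date -d "$1 last month" "+%B %m"
}

last_month_for 2010-09-01   # prints: August 08
```

The example uses the first of the month because that is when monthly cron jobs typically run. Relative dates such as "last month" can behave unexpectedly near the end of a long month, so it is safest not to schedule this script for the 29th through the 31st.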

To look at a monthly report you'll use a similar URL to viewing the current report, but you'll insert the name of the month you want to view in front of the page name. The first letter of the monthly directory names is capitalized, so it will have to be capitalized in the URL used to visit a monthly report as well.

To take our example URL from earlier and use it to look at the August statistics we would change it to:

http://www.example.com/webstats/August/awstats.www.example.com.html

Keeping more monthly reports

Note that the way the script is written the monthly reports get replaced every year (since they're only made available by month name, not by year).

If you want monthly reports to never get overwritten you can modify the script to add the year to the directory names. Change the line that defines "LASTMONTH" above to something like:

LASTMONTH=`date -d "last month" +%B-%Y`

If the month were September, the above line would cause the script to use the directory name "August-2010".
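
A quick way to check the modified format, including what happens when the script runs in January, is to try it against fixed dates. As before, this is a sketch that assumes GNU date, and the function name is only for illustration.

```shell
#!/bin/sh
# Print the directory name that the modified LASTMONTH line would produce
# if the script ran on the given date.
dir_name_for() {
    TZ=UTC LC_ALL=C date -d "$1 last month" +%B-%Y
}

dir_name_for 2010-09-01   # prints: August-2010
dir_name_for 2011-01-01   # prints: December-2010
```

The January case shows the year rolling back along with the month. If you would rather have the directories sort chronologically in a listing, a format such as +%Y-%m is an alternative, at the cost of less readable URLs.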

Further reading

You have a solid but basic installation of awstats now. If you want to get more out of awstats there are a few advanced features you can look into.

For starters, you can browse the awstats documentation online.

If you want to generate reports on the fly you can do so by setting up the main awstats.pl script to run as a CGI script from a web browser. It would be a good idea to still run the stats update through a schedule. You'll want to be familiar with using CGI with a web server, and be aware of the risks that can be involved (for both performance and security). Then check the awstats install docs and take a look at the script they provide for automating aspects of that configuration.

If you want to extend awstats there are plugins available on the project's web site. A couple of the more interesting ones let awstats determine the country of origin of visitor IP addresses without launching a lot of cumbersome DNS lookups.

With a little tweaking and help from the documentation, awstats can also be used to build reports for mail and FTP servers.

Digging through all the options in the config file will give you an idea of what sorts of changes you can make to reports and their formatting.

Summary

You should have a nice setup now, with awstats generating reports automatically. Remember that you can run reports for more than one domain; you'll just need to set up separate configs and automated schedules for them.
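
If you do end up with several domains, one possible approach is a single script that loops over them. The sketch below makes several assumptions that are not from this article: the second domain is invented, each awstats config is named after its "www." host name, and every site keeps its reports in the same directory layout as the example site.

```shell
#!/bin/sh
# Illustrative multi-domain update loop; adjust names and paths to your setup.
AWSTATSDIR=/usr/local/awstats
DOMAINS="www.example.com www.example.org"   # hypothetical list of awstats configs

# Map an awstats config name to its report directory (assumed layout).
report_dir() {
    site=${1#www.}
    echo "/home/demo/public_html/$site/public/webstats"
}

for domain in $DOMAINS; do
    dir=$(report_dir "$domain")
    if [ -x "$AWSTATSDIR/tools/awstats_buildstaticpages.pl" ] && [ -d "$dir" ]; then
        "$AWSTATSDIR/tools/awstats_buildstaticpages.pl" -update -config="$domain" \
            -dir="$dir" -awstatsprog="$AWSTATSDIR/wwwroot/cgi-bin/awstats.pl" > /dev/null
    else
        echo "skipping $domain: awstats or $dir not found" >&2
    fi
done
```

Each domain still needs its own awstats config file and its own logrotate entry if you want the update tied to log rotation.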

-- Jered



© 2014 Rackspace US, Inc.

Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License

