
Reading Apache web logs


Whether you are dealing with web server difficulties or just want to see what Apache is up to, your best bet is to look in its logs.

Keeping tabs on your web server

Periodically, you will likely want to know more about what your web server is up to. Fortunately, Apache (like many other server applications) keeps a diary of sorts called a "log". Think of your web server as a very organized diary-writer, maintaining different diaries ("logs") for different kinds of events that happen throughout the day. Logs are where Apache records events like visitors to your site and problems it has encountered. By default, Apache writes information about its activities to two types of logs: the error log and the access log.

Error log

The error log is where your web server records anything it does not think is quite right. Much of the time what gets recorded there are actual errors, like a visiting web client requesting a file that does not exist. Sometimes you will also see warnings in there that do not indicate that a problem has occurred yet, but advise you that a particular event or configuration could cause problems later.

If you are having trouble with your web server this is the place to go first. For example, if you try to start your web server and it fails without telling you anything on the command line, it may be recording a reason in its error log. There you may find out about a misconfiguration or learn that it could not bind to the address or port for which it is configured (possibly because some other program is already using the port).

Access log

The access log is where your web server records all the visitors to your site. There you can see what files users are accessing, how the web server responded to requests, and other information like what kind of web browsers visitors are using.

The access log can be used with programs called "traffic analyzers" to track the site's usage over time.

It can also be used to watch for unusual client behaviors that indicate someone is looking for a vulnerability they can exploit to hack your machine. If someone is sending unusual requests to an application you are running on your web server (like phpMyAdmin or WordPress), it is usually a good idea to make sure you are running the latest version of the software.
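
One simple way to spot unusual client behavior is to see which addresses are making the most requests. The bash sketch below counts requests per client IP, busiest first. It assumes the combined or common access log format, where the client IP is the first field on each line; the function name top_clients and the sample entries are hypothetical (modeled on examples from this article), and your log path may differ.

```bash
#!/usr/bin/env bash
# Count requests per client IP, busiest first. Assumes the common/combined
# access log format, where the client IP is the first whitespace-separated field.
top_clients() {
  awk '{ print $1 }' | sort | uniq -c | sort -rn
}

# Hypothetical sample entries modeled on the examples in this article.
sample_log='80.154.42.54 - - [23/Aug/2010:15:25:35 +0000] "GET /phpmy-admin/scripts/setup.php HTTP/1.1" 404 347 "-" "ZmEu"
80.154.42.54 - - [23/Aug/2010:15:25:36 +0000] "GET /phpMyAdmin/scripts/setup.php HTTP/1.1" 404 347 "-" "ZmEu"
123.65.150.10 - - [23/Aug/2010:03:50:59 +0000] "POST /wordpress3/wp-admin/admin-ajax.php HTTP/1.1" 200 2 "-" "-"'

printf '%s\n' "$sample_log" | top_clients
# Against a real log (path varies by system):
#   sudo cat /var/log/apache2/access.log | top_clients
```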

Where to find your web logs

The most straightforward way to find your logs is to look for the configuration directives that tell Apache where to create them.

Error log

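To find the error log, look in your main Apache configuration file (for example, /etc/apache2/apache2.conf on Debian and Ubuntu, or /etc/httpd/conf/httpd.conf on Red Hat-based systems). The error log should be defined there with the "ErrorLog" directive. For example: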

ErrorLog /var/log/apache2/error.log

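The examples in this article use the Debian and Ubuntu log location, /var/log/apache2; Red Hat-based systems typically use /var/log/httpd instead. Many systems restrict access to Apache's log directory to root, so you may need to use the sudo command to view the error log. For instance:

sudo cat /var/log/apache2/error.log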

Access log

The access log is typically defined inside a virtual host block but can sometimes have a default defined in the main Apache config file. You will want to look for the "CustomLog" directive:

CustomLog /var/log/apache2/access.log combined

The first argument to CustomLog gives the file's location. The second argument ("combined") defines the format of the log. We will look at what that format contains later in this article.

If a default CustomLog is defined in the main Apache configuration and a different CustomLog is defined within a virtual host, the access log (or logs) defined in the virtual host will replace the default access log for just that virtual host.
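
To locate these directives without opening each file by hand, you can search the whole configuration directory. The bash sketch below is one way to do that. The function name find_log_directives and the throwaway config tree it builds are hypothetical, and the real directory depends on your distribution (/etc/apache2 on Debian and Ubuntu, /etc/httpd on Red Hat-based systems). It prints directives exactly as written, so a path built from a variable such as ${APACHE_LOG_DIR} appears unexpanded.

```bash
#!/usr/bin/env bash
# List ErrorLog and CustomLog directives found under an Apache config directory.
find_log_directives() {
  local conf_dir=$1
  # -r recurse, -h omit file names, -i directive names are case-insensitive,
  # -E extended regex. Requiring only whitespace before the directive name
  # skips commented-out lines such as "#CustomLog ...".
  grep -rhiE '^[[:space:]]*(ErrorLog|CustomLog)[[:space:]]' "$conf_dir"
}

# Demo against a throwaway config tree (hypothetical file names and contents).
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
mkdir -p "$demo_dir/sites-enabled"
printf '%s\n' 'ErrorLog /var/log/apache2/error.log' \
  '#CustomLog /var/log/apache2/old.log common' > "$demo_dir/apache2.conf"
printf '%s\n' '<VirtualHost *:80>' \
  '    CustomLog /var/log/apache2/access.log combined' \
  '</VirtualHost>' > "$demo_dir/sites-enabled/000-default.conf"

find_log_directives "$demo_dir"
# On a real Debian/Ubuntu system:  find_log_directives /etc/apache2
```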

Reading the logs

Now that you know where to find the logs, let's look at what is inside each. And most importantly, let's look at what they can tell you about your web server.

Error log

The error log is where the server logs errors as they occur. These are usually errors the program encountered when trying to start a process or use a module, but they can also be errors that were sent to web clients, like a "file not found" error.

An error log entry for a file not found error would look something like:

[Mon Aug 23 15:25:35 2010] [error] [client 80.154.42.54] File does not exist: /var/www/phpmy-admin

In this case, a web client tried to visit a page in a "phpmy-admin" directory that did not exist. It was a script looking for an exploitable version of phpMyAdmin. Probes like this are a good reason to consider installing a program such as fail2ban, which can block clients that generate this kind of activity.

Error log components

The first part of the log entry is the date and time (server time) when the event occurred. Besides being informative, the timestamp lets you look for entries made at the same moment in other logs. In this case, you could check the access log to see the full URL the web client tried to visit. If the error indicated that a module had trouble talking to a database, you would look in the database server's logs at the same time to see what prevented the connection.

The next part, "[error]", gives the severity level of the message. This will often be "error", but other levels may indicate that the message is just a warning ("warn") or that it represents a critical error ("crit", "alert", or "emerg") that caused the web server to shut down or fail to start.

The next part, "[client 80.154.42.54]", shows the source of the error. In this case the source is a web client, so the visitor's IP address was logged.

The last part of the log entry is the error itself.
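
The four parts can also be pulled apart mechanically, which helps when you want to filter entries by level or client. The bash sketch below assumes the Apache 2.2-style layout shown above; Apache 2.4 adds extra bracketed fields such as the module name and process ID, so the pattern would need adjusting there. The function name parse_error_entry is hypothetical, and entries without a "[client ...]" part (such as startup messages) are reported as unrecognized.

```bash
#!/usr/bin/env bash
# Split an Apache 2.2-style error log entry into timestamp, level, client and
# message. Apache 2.4 adds fields (module, pid), so adjust the pattern for it.
parse_error_entry() {
  local line=$1
  local re='^\[([^]]+)\] \[([a-z]+)\] \[client ([^]]+)\] (.*)$'
  if [[ $line =~ $re ]]; then
    printf 'time=%s\nlevel=%s\nclient=%s\nmessage=%s\n' \
      "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" \
      "${BASH_REMATCH[3]}" "${BASH_REMATCH[4]}"
  else
    echo "unrecognized entry: $line" >&2
    return 1
  fi
}

parse_error_entry '[Mon Aug 23 15:25:35 2010] [error] [client 80.154.42.54] File does not exist: /var/www/phpmy-admin'
```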

Combined log format

The most common format for web log entries (and the default in many web server configurations) is the "combined" format, an extension of the older Common Log Format (CLF). A log entry in combined log format might look like this:

123.65.150.10 - - [23/Aug/2010:03:50:59 +0000] "POST /wordpress3/wp-admin/admin-ajax.php HTTP/1.1" 200 2 "http://www.example.com/wordpress3/wp-admin/post-new.php" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_4; en-US) AppleWebKit/534.3 (KHTML, like Gecko) Chrome/6.0.472.25 Safari/534.3"

When broken down, the log entry contains a standard set of information in a standard order.

Combined log components

The first entry is the IP address of the web client accessing your server.

The second entry above is "-", which is what gets logged when there is nothing to put in that part of the log. This slot holds the client's identity as reported by the identd protocol (the "remote logname"), which Apache only records when identity checking is turned on. You will pretty much always see "-" here.

The third entry above is another "-". That slot contains the username the web client authenticated as, if any. If you enabled password protection (HTTP authentication) for a file or directory, the username the visitor logged in with would be recorded here.

The next entry is the date and time of the access, in square brackets. The "+0000" at the end is the server's offset from UTC.

The next entry is the first line of the request the web client sent to the server. In this case it is:

POST /wordpress3/wp-admin/admin-ajax.php HTTP/1.1

That entry means the web client sent a "POST" request (a submission of information) to the file at "/wordpress3/wp-admin/admin-ajax.php". That path is relative to the site, which means that if you wanted to find the file you would start at the document root of that virtual host. If your document root was "/var/www", then the file being accessed above would be at "/var/www/wordpress3/wp-admin/admin-ajax.php" (assuming no Alias or rewrite rules remap the URL). The last part of the request line is the protocol used for the request, in this case HTTP version 1.1.
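
The same breakdown can be scripted. This bash sketch splits a request line into its three parts and joins the path onto a document root, using the /var/www example from the text as an assumed DOCROOT. It is deliberately naive: the function name request_to_file is hypothetical, it drops any query string, and it ignores Alias and rewrite rules and URL-encoded characters, any of which can make the real file differ.

```bash
#!/usr/bin/env bash
# Split a request line into method, path and protocol, then map the path onto
# a document root. DOCROOT is the example value from the text (an assumption);
# Alias/RewriteRule mappings and URL encoding (%20 etc.) are ignored.
DOCROOT=/var/www

request_to_file() {
  local method path protocol
  read -r method path protocol <<< "$1"
  path=${path%%\?*}   # drop any "?query=string" part
  printf 'method=%s\nfile=%s%s\nprotocol=%s\n' "$method" "$DOCROOT" "$path" "$protocol"
}

request_to_file 'POST /wordpress3/wp-admin/admin-ajax.php HTTP/1.1'
```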

The next entry tells us the status code that was returned for the request. The code above, "200", is hopefully the one you will see most often in your access logs: it means that the file was found and served to the client. Other common status codes are "403" (access forbidden) and "404" (file not found). We go into more detail in the article on interpreting common status codes in web logs; for a full list, see the list of status codes on the official W3C website.
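
To get a quick picture of which status codes your server is returning, you can count them. This bash sketch assumes the combined (or common) format, where splitting a line on whitespace puts the status code in field 9. The function name status_counts is hypothetical, and the sample entries are modeled on this article's examples rather than taken from a real log.

```bash
#!/usr/bin/env bash
# Tally status codes, most frequent first. Splitting a common/combined entry on
# whitespace gives: $1 client IP, $2 identd, $3 user, $4-$5 timestamp,
# $6-$8 request line, $9 status code, $10 response size.
# A request line containing extra spaces would shift these positions.
status_counts() {
  awk '{ print $9 }' | sort | uniq -c | sort -rn
}

# Hypothetical sample entries modeled on the examples in this article.
sample_log='123.65.150.10 - - [23/Aug/2010:03:50:59 +0000] "POST /wordpress3/wp-admin/admin-ajax.php HTTP/1.1" 200 2 "-" "-"
80.154.42.54 - - [23/Aug/2010:15:25:35 +0000] "GET /phpmy-admin/scripts/setup.php HTTP/1.1" 404 347 "-" "ZmEu"
123.65.150.10 - - [23/Aug/2010:03:51:02 +0000] "GET /wordpress3/ HTTP/1.1" 200 5120 "-" "-"'

printf '%s\n' "$sample_log" | status_counts
# Against a real log:  sudo cat /var/log/apache2/access.log | status_counts
```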

The next number is the size of the response body your server sent, in bytes (not counting the HTTP headers). In this case it was a very small response (2 bytes), so it was likely just an acknowledgement from the server rather than a full page.

The next entry is the "referrer URL". In this case the entry is:

http://www.example.com/wordpress3/wp-admin/post-new.php

That is the page the web client visited before sending the recorded access request. Usually that means it is the page that linked to the one they accessed. The referrer can be useful information if you are wondering where people are finding links to your site (from a Google search, or a link from a partner site), or if you want to find the page that contained a bad link if the access entry was an error.

The last entry is called the "user agent". Most of the time it is the identifier sent by the visitor's web browser. In this case, the user agent was:

Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_4; en-US) AppleWebKit/534.3 (KHTML, like Gecko) Chrome/6.0.472.25 Safari/534.3

The user agent is often quite specific. In this entry the web browser told the server not only its name (Chrome), but also the operating system it is running on (Mac OS X), the versions of the browser and the system, and details of the rendering engine it uses (AppleWebKit). This information can be used by a web application to tailor the page it returns to look best on that particular visitor's browser.
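
Putting the components together, one combined-format entry can be split into named fields with a bash regular expression. This is a sketch under simplifying assumptions: the function name parse_combined is hypothetical, and the pattern assumes no escaped quotes (\") appear inside the request, referrer, or user agent, although Apache does write them when a client sends quote characters.

```bash
#!/usr/bin/env bash
# Split one combined-format entry into named fields with a bash regex.
# Assumes no escaped quotes (\") inside the quoted fields.
parse_combined() {
  local re='^([^ ]+) ([^ ]+) ([^ ]+) \[([^]]+)\] "([^"]*)" ([0-9]{3}) ([0-9]+|-) "([^"]*)" "([^"]*)"$'
  local names=(ip identd user time request status bytes referrer user_agent)
  local i
  if [[ $1 =~ $re ]]; then
    for i in "${!names[@]}"; do
      # BASH_REMATCH[0] is the whole match, so group i+1 holds field i.
      printf '%s=%s\n' "${names[i]}" "${BASH_REMATCH[i+1]}"
    done
  else
    echo "not a combined-format entry" >&2
    return 1
  fi
}

entry='123.65.150.10 - - [23/Aug/2010:03:50:59 +0000] "POST /wordpress3/wp-admin/admin-ajax.php HTTP/1.1" 200 2 "http://www.example.com/wordpress3/wp-admin/post-new.php" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_4; en-US) AppleWebKit/534.3 (KHTML, like Gecko) Chrome/6.0.472.25 Safari/534.3"'
parse_combined "$entry"
```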

Putting them together

To give another example, let's look in again on the would-be intruder from earlier. Reviewing the error log, note the time of the attempted access to the non-existent directory. Looking at the same time in the access log shows more about what was requested:

80.154.42.54 - - [23/Aug/2010:15:25:35 +0000] "GET /phpmy-admin/scripts/setup.php HTTP/1.1" 404 347 "-" "ZmEu"

The IP address and the time are the same, so we can see that the script used a "GET" method (a request for a page) to ask for the phpMyAdmin setup script. The "404" status means that the file was not found. The user agent, "ZmEu", does not belong to any web browser on the market, but a web search will turn up other people who have been hit by what is probably the same script. So even though that user agent does not identify a real browser, it can help you determine what kind of attack your site was experiencing and how many 404 responses your server can expect to send when it is targeted.
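
Matching the two logs by hand means translating between their timestamp styles: the error log writes "Mon Aug 23 15:25:35 2010", while the access log writes "23/Aug/2010:15:25:35". The bash sketch below does that conversion and then searches a sample access log for the matching second. It assumes both logs use the same server time zone; the function name to_access_time and the sample data are hypothetical. It also drops any fractional seconds, in case your error log includes them.

```bash
#!/usr/bin/env bash
# Convert an error-log timestamp ("Mon Aug 23 15:25:35 2010") into the
# access-log form ("23/Aug/2010:15:25:35"). Assumes both logs share a time zone.
to_access_time() {
  local mon day clock year
  read -r _ mon day clock year <<< "$1"   # _ discards the weekday
  clock=${clock%%.*}                      # drop fractional seconds, if present
  # 10# forces base 10 so a day such as "08" is not parsed as octal.
  printf '%02d/%s/%s:%s\n' "$((10#$day))" "$mon" "$year" "$clock"
}

stamp=$(to_access_time 'Mon Aug 23 15:25:35 2010')
echo "$stamp"

# Hypothetical access log contents; -F searches for the stamp as plain text.
access_log='80.154.42.54 - - [23/Aug/2010:15:25:35 +0000] "GET /phpmy-admin/scripts/setup.php HTTP/1.1" 404 347 "-" "ZmEu"
123.65.150.10 - - [23/Aug/2010:03:50:59 +0000] "POST /wordpress3/wp-admin/admin-ajax.php HTTP/1.1" 200 2 "-" "-"'
printf '%s\n' "$access_log" | grep -F "[$stamp "
# On a real server:  sudo grep -F "[$stamp " /var/log/apache2/access.log
```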

Common log format

It is not used a lot anymore, but you may run into a CustomLog directive that uses the log format of "common" in some older configurations of Apache. The "common" format is similar to "combined", but omits the referrer and user agent information at the end of the log entry. Otherwise it can be read the same way as a "combined" format log.

Useful commands

Here are a few commands that can make browsing log files a little quicker or easier. These are very basic overviews of the commands in question. For more information on each, you can check their respective man pages.

cat

The "cat" command simply displays the contents of a file. To see the whole error log all at once, run:

sudo cat /var/log/apache2/error.log

less

If the log you want to look at is particularly large, you probably do not want to look at the whole thing at once. To browse through a file you can use the "less" command:

sudo less /var/log/apache2/error.log

While less is displaying a file, press the space bar to page down, use the up and down arrow keys to scroll one line at a time, and press q to quit.

tail

The "tail" command returns lines from the end of a file. By default it displays the last ten lines of the file, so this command would display the last ten lines of an access log:

sudo tail /var/log/apache2/access.log

To specify the number of lines to grab, use the "-n [number]" option. To display the last 100 lines of the access log, you could run:

sudo tail -n 100 /var/log/apache2/access.log

You can also save yourself a little typing by just using "-[number]" instead of "-n [number]", as in:

sudo tail -100 /var/log/apache2/access.log

The tail command is useful if you are just looking for recent activity in a log. If you want to watch the end of a file for changes as they happen, you can use the "-f" option:

sudo tail -f /var/log/apache2/access.log

With this version of tail running, when a new line is added to the log file you will see it on your screen too. To get out of tail when it is in this mode use control-C.

grep

If you are looking for a particular item in a web log (like a certain IP address, or any "404" responses), skimming through the log manually can be tiresome. It is easier to let the "grep" command do the work for you.

The grep command will look through its input, or a file, and return any lines that contain the search term sent to it. To look for the term "404" in an access log, you might run:

sudo grep 404 /var/log/apache2/access.log

The first argument is the text grep is searching for, and the second argument is the file to search.
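
When the search term is an IP address, two grep details matter: in a pattern, "." matches any character, and an unanchored address can match inside a longer one. This bash sketch counts the entries from one client with grep's -c option (which prints a count instead of the matching lines), escaping the dots and anchoring the pattern at the start of the line, where the client IP is logged. The function name count_from_ip and the second sample address are hypothetical.

```bash
#!/usr/bin/env bash
# Count access-log entries (read on stdin) from a single client IP.
# The dots are escaped so they match literally, "^" anchors the match to the
# start of the line, and the trailing space stops a partial-address match.
count_from_ip() {
  local ip=$1 pattern
  pattern="^$(printf '%s' "$ip" | sed 's/\./\\./g') "
  grep -c -- "$pattern"
}

# Hypothetical sample entries; 180.154.42.54 is made up for the demo.
sample_log='80.154.42.54 - - [23/Aug/2010:15:25:35 +0000] "GET /phpmy-admin/scripts/setup.php HTTP/1.1" 404 347 "-" "ZmEu"
180.154.42.54 - - [23/Aug/2010:15:30:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "-"'

printf '%s\n' "$sample_log" | grep -cF '80.154.42.54'       # naive: prints 2
printf '%s\n' "$sample_log" | count_from_ip '80.154.42.54'   # anchored: prints 1
# On a real server:
#   sudo grep -c '^80\.154\.42\.54 ' /var/log/apache2/access.log
```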

If you want to look for a phrase, you can do that by enclosing the phrase in quotes. To look for requests for a particular file, you could run:

sudo grep "GET /images/butterfly.jpg" /var/log/apache2/access.log

By default grep's searches are case-sensitive. If you specify capital letters like "GET", then lines with "get" in lowercase letters will not be returned as hits. To make the search case-insensitive, pass grep the "-i" option, as in:

sudo grep -i "get /images/butterfly.jpg" /var/log/apache2/access.log

You can combine tail and grep by using a symbol called a "pipe":

sudo tail -n 100 /var/log/apache2/access.log | grep 404

The first part of that statement lists the last 100 lines of the access log. The next character, "|", is the "pipe". It takes the output of the previous command and sends it to the next command in the statement, in this case "grep", searching for 404. So the above command returns every line among the last 100 access log entries that contains "404", which includes any 404 errors.
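
Because grep matches "404" anywhere in a line, the command above can also return entries where 404 is the response size or part of a URL. If you only want true 404 responses, one option is to compare the status field directly with awk, as in this bash sketch. It assumes the combined or common format (status code in whitespace-separated field 9), and the function name with_status and the sample entries are hypothetical.

```bash
#!/usr/bin/env bash
# Print only the entries (read on stdin) whose status code equals $1.
# In common/combined format the status code is whitespace-separated field 9.
with_status() {
  awk -v code="$1" '$9 == code'
}

# Hypothetical sample entries: the second is a 200 response that happens to be
# 404 bytes long, so a plain "grep 404" also matches it.
sample_log='80.154.42.54 - - [23/Aug/2010:15:25:35 +0000] "GET /phpmy-admin/scripts/setup.php HTTP/1.1" 404 347 "-" "ZmEu"
123.65.150.10 - - [23/Aug/2010:03:51:10 +0000] "GET /images/butterfly.jpg HTTP/1.1" 200 404 "-" "-"'

printf '%s\n' "$sample_log" | grep 404          # prints both lines
printf '%s\n' "$sample_log" | with_status 404   # prints only the real 404
# Applied to a live log, mirroring the pipe example above:
#   sudo tail -n 100 /var/log/apache2/access.log | awk '$9 == 404'
```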

Summary

You should now have a better handle on what kind of web logs you have at your disposal and how to read them. There are several other avenues you can pursue from here.

If you are seeing an unusual status code for some requests, you might read the article on interpreting common status codes in web logs and what can cause them.

If you are having a problem and the web logs do not tell you what is wrong, review the linked guide to server troubleshooting for other approaches to identifying the issue.

If you reviewed the access logs and felt they were missing something, or thought they were entirely too busy, you might look into customizing your Apache web logs.

Either way, remember: Don't underestimate the utility of logs (for a web server, or any other application).







© 2011-2013 Rackspace US, Inc.

Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License


See license specifics and DISCLAIMER