Reading Nginx Web Logs
Whether you're dealing with web server difficulties or just want to see what nginx is up to, your best bet is to look in its logs.
Keeping tabs on your web server
Sooner or later you'll want to know more about what your web server is up to. Luckily, nginx (like many other server applications) keeps a diary of sorts called a "log". Well, actually, more than one log, so the analogy isn't terribly good. Unless you think of your web server as a very organized diary-writer, maintaining different diaries for different kinds of events that have happened throughout the day. Still not a great analogy, but it will do.
In plainer terms: Logs are where nginx records events like visitors to your site and problems it's encountered. By default nginx writes stuff about its activities in two types of logs — the error log and the access log.
The error log is where your web server records anything it doesn't think is quite right. Much of the time what gets recorded there are actual errors, like a visiting web client requesting a file that doesn't exist. Sometimes you'll also see warnings in there that don't indicate that a problem has occurred yet, but advise you that a particular event or configuration could cause problems later.
If you're having trouble with your web server this is the place to go first. For example, if you try to start your web server and it fails without telling you anything on the command line, it may be recording a reason in its error log. There you may find out about a misconfiguration or learn that it couldn't bind to the address or port it's configured for (possibly because some other program is already using the port).
The access log is where your web server records all the visitors to your site. There you can see what files users are accessing, how the web server responded to requests, and other information like what kind of web browsers visitors are using. The access log can be used with programs called "traffic analyzers" to track the site's usage over time. It can also be used to watch for unusual client behavior that suggests someone is looking for a vulnerability they can exploit to hack your machine. If someone is sending unusual requests to an application you're running on your web server (like phpMyAdmin or WordPress), it's usually a good idea to make sure you're running the latest version of the software.
Where to find your web logs
Before you can read your logs you'll need to find them. The most straightforward way to do that is to look for the configuration directives that tell nginx where to create them.
To find the error log, look in your main nginx config file. The error log should be defined there with the "error_log" directive. For example:
error_log /var/log/nginx/error.log;
Note that a lot of systems will restrict the permissions for nginx's log directory to just root, so you may need to use the sudo command to look at the error log. For instance:
sudo cat /var/log/nginx/error.log
The access log is typically defined inside a virtual host block but can sometimes have a default defined in the main nginx config file. You'll want to look for the "access_log" directive:
access_log /var/log/nginx/access.log;
The argument to access_log gives the file's location.
You can also supply a second argument to access_log that describes the format of the access log. Don't worry about that part just yet, but know that when no log format is specified, access_log uses nginx's predefined "combined" format.
If a default access_log is defined in the main nginx configuration and a different access_log is defined within a virtual host, the access log (or logs) defined in the virtual host will replace the default access log for just that virtual host.
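If you'd rather not open each config file by hand, a recursive grep can list both directives at once. This is only a sketch: it assumes your configuration lives under /etc/nginx (common, but not universal), and the function name find_log_directives is just a label made up for this example.

```shell
# Print every error_log and access_log directive under an nginx config directory.
# /etc/nginx is an assumed default; pass a different directory if yours differs.
find_log_directives() {
    conf_dir="${1:-/etc/nginx}"
    # -r: search subdirectories, -n: show line numbers, -E: extended regex.
    # The pattern only matches lines that start with the directive,
    # so commented-out lines (starting with "#") are skipped.
    grep -rnE '^[[:space:]]*(error_log|access_log)[[:space:]]' "$conf_dir"
}

# Example (the config files may not be readable without root):
#   find_log_directives /etc/nginx
```

Each line of output shows the file, the line number, and the directive itself, so you can also tell whether an access_log was set in the main config or inside a virtual host.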
Reading the logs
Now that you know where to find the logs, let's look at what's inside each. And most importantly, let's look at what they can tell you about your web server.
The error log is where the server will log, well, errors. These are usually errors the program encountered when trying to start a process or use a module, but they can also be errors that were sent to web clients, like a "file not found" error.
An error log entry for a file not found error would look something like:
2010/08/23 15:25:35 [error] 19997#0: *1 open() "/var/www/nginx-default/phpmy-admin/scripts/setup.php" failed (2: No such file or directory), client: 188.8.131.52, server: localhost, request: "GET /phpmy-admin/scripts/setup.php HTTP/1.1", host: "www.example.com"
In this case, a web client tried to visit a page in a "phpmy-admin" directory that didn't exist. Fortunately I happen to know that I don't have phpmy-admin installed, so it's not a broken link I need to fix. It's just some script kiddie looking for an exploitable version of that software. It's a good indication that I should install a program like fail2ban to block people like him.
Error log components
The first part of the log entry is the date and time (server time) when the event occurred. Apart from just being informative, that time can be useful for looking for entries in other logs at the same time. In this case I could check the access log to see more information about the access attempt and search for any successful connections that may have come from that source (so I can make sure the accessed applications weren't compromised). If it were an error that indicated a module had trouble talking to a database, then I could look in the database server's logs at the same time to see what prevented the connection from happening.
The next part, "[error]", is the severity level of the message. This will often be "error", but nginx also logs lower levels such as "warn" or "notice" for messages that are only warnings, and higher levels such as "crit" or "emerg" for critical problems that can cause the web server to shut down or fail to start.
Next comes "19997#0", which identifies the nginx process (and thread) that logged the message, followed by "*1", the number of the connection involved. After that the log entry describes the error itself. In this case the error is that when nginx tried to open the requested file (the "open()" call), that attempt failed because the file did not exist ("No such file or directory").
The remaining parts of the error log entry describe the source of the error and what kind of request triggered the error. The "client:" entry shows the source of the request (the IP address), "server:" gives the nginx server name that recorded the error, "request:" lists the actual request that triggered the error, and "host:" tells you what virtual host was accessed.
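To make that breakdown concrete, here is a rough sketch that pulls a few of those pieces out of the sample entry with cut and sed. The variable names are made up for this example, and the patterns assume an entry laid out like the one above; error messages that aren't tied to a client request won't have the "client:" or "request:" parts.

```shell
# The sample error log entry from above, stored in a variable.
line='2010/08/23 15:25:35 [error] 19997#0: *1 open() "/var/www/nginx-default/phpmy-admin/scripts/setup.php" failed (2: No such file or directory), client: 188.8.131.52, server: localhost, request: "GET /phpmy-admin/scripts/setup.php HTTP/1.1", host: "www.example.com"'

# Date and time are the first two space-separated fields.
timestamp=$(printf '%s\n' "$line" | cut -d' ' -f1,2)
# The level is the word inside the first pair of square brackets.
level=$(printf '%s\n' "$line" | sed -E 's/^[^[]*\[([a-z]+)\].*/\1/')
# The client address runs from "client: " up to the next comma.
client=$(printf '%s\n' "$line" | sed -E 's/.*client: ([^,]+),.*/\1/')
# The request is the quoted string after "request: ".
request=$(printf '%s\n' "$line" | sed -E 's/.*request: "([^"]*)".*/\1/')

printf 'time:    %s\nlevel:   %s\nclient:  %s\nrequest: %s\n' \
    "$timestamp" "$level" "$client" "$request"
```

The timestamp and client address are the two pieces you're most likely to carry over to another log when you go looking for related entries.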
Combined log format
The most common format for web log entries (and the default for most modern web servers) is the "combined" format, an extension of the older Common Log Format (CLF) that adds the referrer and the user agent. A log entry in combined log format might look like this:
184.108.40.206 - - [23/Aug/2010:03:50:59 +0000] "POST /wordpress3/wp-admin/admin-ajax.php HTTP/1.1" 200 2 "http://www.example.com/wordpress3/wp-admin/post-new.php" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_4; en-US) AppleWebKit/534.3 (KHTML, like Gecko) Chrome/6.0.472.25 Safari/534.3"
There's a lot of stuff there, but when you break the log entry down it contains a standard set of information in a standard order.
Combined log components
The first entry is the IP address of the web client accessing your server.
The second entry above is "-", which is what gets logged when there's nothing to put in that part of the log. In the Common Log Format this slot holds the client's identity as reported by an identd service (the "remote logname"). nginx doesn't look that up, so you'll always see "-" here with its combined format.
The third entry above is another "-". That slot contains the username the web client authenticated as, if any. If you enabled password protection for a file or directory, then the username the visitor used to log in would be recorded here.
The next entry is the date and time of the access.
The next entry is the first line of the request the web client sent to the server. In this case it's:
POST /wordpress3/wp-admin/admin-ajax.php HTTP/1.1
That entry means the web client sent a "POST" request (a submission of information) to the file at "/wordpress3/wp-admin/admin-ajax.php". That location is relative to the document root of the virtual host, so if your document root was "/var/www", the file being accessed above would be at "/var/www/wordpress3/wp-admin/admin-ajax.php". The last part of the request line gives the protocol used, in this case HTTP version 1.1.
The next entry tells us the status code that was returned for the request. The code above, "200", is hopefully the one you'll see most often in your access logs: it means the request succeeded and the response was served to the client. Other common status codes are "403" (access forbidden) and "404" (file not found). We go into more detail about status codes in another article, but for a full list you can visit the W3C's official list of status codes.
The next number is the size of the response body your server sent, in bytes, not counting the response headers. In this case it was a very small response (2 bytes), so it was likely just an acknowledgement from the server rather than a full page access.
The next entry is the "referrer URL". In this case the entry is:
http://www.example.com/wordpress3/wp-admin/post-new.php
That's the page the web client visited before sending the recorded access request. Usually that means it's the page that linked to the one they accessed. The referrer can be useful information if you're wondering where people are finding links to your site (from a Google search, or a link from a partner site), or if you want to find the page that contained a bad link if the access entry was an error.
The last entry is called the "user agent". Most of the time that just means it's the identifier used by the web browser the visitor used. In this case, the user agent was:
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_4; en-US) AppleWebKit/534.3 (KHTML, like Gecko) Chrome/6.0.472.25 Safari/534.3
The user agent is pretty specific sometimes. In this entry the web browser told the server not only what its name was (Chrome, in this case), but also what operating system it's running on (Mac OS X), the versions of the browser and the system, and the rendering engine the browser is built on (AppleWebKit). It's usually a lot more than you need, but if you know your site will display differently in different browsers, all that information can be used by a web application to tailor the page it returns to look best on that particular visitor's browser.
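If you'd like to see that breakdown done mechanically, the sketch below splits a combined-format entry into labeled fields with awk. It assumes the default "combined" layout described above and an entry with no double quotes embedded inside the request, referrer, or user agent; the function name parse_combined is made up for this example.

```shell
# Split one combined-format access log entry (read from standard input)
# into labeled fields. Splitting on double quotes isolates the request,
# referrer, and user agent; the unquoted parts are then split on spaces.
parse_combined() {
    awk -F'"' '{
        split($1, head, " ")      # client IP, "-", user, [date, zone]
        split($3, nums, " ")      # status code, bytes sent
        when = head[4] " " head[5]
        gsub(/\[|\]/, "", when)   # drop the square brackets around the time
        print "client=" head[1]
        print "user=" head[3]
        print "time=" when
        print "request=" $2
        print "status=" nums[1]
        print "bytes=" nums[2]
        print "referrer=" $4
        print "agent=" $6
    }'
}

# Try it on the sample entry from above.
entry='184.108.40.206 - - [23/Aug/2010:03:50:59 +0000] "POST /wordpress3/wp-admin/admin-ajax.php HTTP/1.1" 200 2 "http://www.example.com/wordpress3/wp-admin/post-new.php" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_4; en-US) AppleWebKit/534.3 (KHTML, like Gecko) Chrome/6.0.472.25 Safari/534.3"'
printf '%s\n' "$entry" | parse_combined
```

Splitting on the quotes first is a deliberate choice here: the user agent contains spaces of its own, so splitting the whole line on spaces would chop it into pieces.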
Putting them together
Whew! Lots of stuff there, but it's useful stuff. To give another example, let's look in again on our would-be intruder from earlier. Looking in the error log I saw what time he tried to access the non-existent directory. By looking at the same time in the access log I can see more information about what he tried to access:
188.8.131.52 - - [23/Aug/2010:15:25:35 +0000] "GET /phpmy-admin/scripts/setup.php HTTP/1.1" 404 347 "-" "ZmEu"
There's the same IP address and the same time. So we can see that his script used a "GET" method (a request for a page) to ask for the setup script in the "phpmy-admin" directory. The "404" status means that the file wasn't found. And while the user agent entry certainly isn't any web browser on the market, some web searches will turn up other people who have been hit by what is probably the same script. So even when the user agent doesn't identify a real browser, it can be useful in determining what type of attack your site is experiencing, and how many 404s you can expect your server to have to handle when it hits you.
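Cross-referencing like this is easy to script once you have the client's address from the error log. The sketch below prints every access log entry made by one address. The function name requests_from is made up for this example, and the demo entries (including the 203.0.113.x addresses, which come from a range reserved for documentation) are invented so the example can run anywhere.

```shell
# Print the access log entries made by one client address.
# Usage: requests_from <ip> <access-log-file>
requests_from() {
    # Compare the whole first field, so 203.0.113.7 does not also
    # match 203.0.113.70 the way a plain text search would.
    awk -v ip="$1" '$1 == ip' "$2"
}

# Self-contained demo using a temporary file of invented entries.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
203.0.113.7 - - [23/Aug/2010:15:25:35 +0000] "GET /phpmy-admin/scripts/setup.php HTTP/1.1" 404 347 "-" "ZmEu"
203.0.113.70 - - [23/Aug/2010:15:25:36 +0000] "GET /index.html HTTP/1.1" 200 1024 "-" "Mozilla/5.0"
203.0.113.7 - - [23/Aug/2010:15:25:37 +0000] "GET /pma/scripts/setup.php HTTP/1.1" 404 347 "-" "ZmEu"
EOF
requests_from 203.0.113.7 "$sample_log"
rm -f "$sample_log"
```

On a real server you'd point this at your own access log. If the log is only readable by root, run the awk command itself under sudo, since sudo can't call a shell function directly.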
Here are a few commands that can make browsing log files a little quicker or easier. These are very basic overviews of the commands in question. For more information on each you can check their respective man pages.
The "cat" command simply displays the contents of a file. To see the whole error log all at once, you might run:
sudo cat /var/log/nginx/error.log
If the log you want to look at is particularly large, you probably don't want to look at the whole thing at once. To browse through a file you can use the "less" command:
sudo less /var/log/nginx/error.log
While less is displaying a file you can hit the space bar to page down, and the up and down arrow keys on your keyboard to scroll up or down one line at a time.
The "tail" command returns lines from the end of a file. By default tail displays the last ten lines of a file, so this command would display the last ten lines of an access log:
sudo tail /var/log/nginx/access.log
To specify the number of lines to grab, use the "-n [number]" option. To display the last 100 lines of the access log, you could run:
sudo tail -n 100 /var/log/nginx/access.log
With many versions of tail you can also save yourself a little typing by using the older "-[number]" shorthand instead of "-n [number]", as in:
sudo tail -100 /var/log/nginx/access.log
The tail command is useful if you're just looking for recent activity in a log. If you want to watch the end of a file for changes as they happen, you can use the "-f" option:
sudo tail -f /var/log/nginx/access.log
With this version of tail running, when a new line is added to the log file you'll see it on your screen too. To get out of tail when it's in this mode use control-C.
If you're looking for a particular item in a web log (like a certain IP address, or any "404" responses), skimming through the log manually can be tiresome. It's easier to let the "grep" command do the work for you.
The grep command will look through its input, or a file, and return any lines that contain the search term sent to it. To look for the term "404" in an access log, you might run:
sudo grep 404 /var/log/nginx/access.log
The first argument is the text grep is searching for, and the second argument is the file to search.
If you want to look for a phrase you can do that by enclosing the phrase in quotes. To look for requests for a particular file, you could run:
sudo grep "GET /images/butterfly.jpg" /var/log/nginx/access.log
By default grep's searches are case-sensitive. If you specify capital letters like "GET", then lines with "get" in lowercase letters won't be returned as hits. To make the search case-insensitive, pass grep the "-i" option, as in:
sudo grep -i "get /images/butterfly.jpg" /var/log/nginx/access.log
You can combine tail and grep by using what's called a "pipe":
sudo tail -n 100 /var/log/nginx/access.log | grep 404
The first part of that statement just lists the last 100 lines of the access log. The next character, "|", is the "pipe". It takes the output of the previous command and sends it to the next command in the statement as input. In this case that second command is "grep", searching for 404. So the above command would return any of the last 100 access log entries that contain "404", which will typically be 404 errors.
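One thing to keep in mind is that grep matches "404" anywhere in a line, so an entry with a 404-byte response or "404" in the URL would be returned too. A slightly more careful sketch uses awk to check the status field itself, then pipes the results through sort and uniq to count 404s per client. It assumes the combined format with a well-formed request line, where the status code is the ninth space-separated field; the function name count_404s_by_client and the demo entries are invented for this example.

```shell
# Count 404 responses per client address, busiest clients first.
# Usage: count_404s_by_client <access-log-file>
count_404s_by_client() {
    # Keep the client address ($1) of entries whose status field ($9) is 404,
    # then group identical addresses and sort by how often each appears.
    awk '$9 == 404 { print $1 }' "$1" | sort | uniq -c | sort -rn
}

# Demo with invented entries. The second entry has status 200 but a
# 404-byte response, so a plain "grep 404" would wrongly include it.
demo_log=$(mktemp)
cat > "$demo_log" <<'EOF'
203.0.113.7 - - [23/Aug/2010:15:25:35 +0000] "GET /phpmy-admin/scripts/setup.php HTTP/1.1" 404 347 "-" "ZmEu"
198.51.100.4 - - [23/Aug/2010:15:26:02 +0000] "GET /about.html HTTP/1.1" 200 404 "-" "Mozilla/5.0"
203.0.113.7 - - [23/Aug/2010:15:25:37 +0000] "GET /pma/scripts/setup.php HTTP/1.1" 404 347 "-" "ZmEu"
198.51.100.9 - - [23/Aug/2010:15:27:10 +0000] "GET /missing.png HTTP/1.1" 404 347 "-" "Mozilla/5.0"
EOF
count_404s_by_client "$demo_log"
rm -f "$demo_log"
```

In the demo, 203.0.113.7 is listed first with a count of 2, followed by 198.51.100.9 with a count of 1. A single address racking up a large number of 404s in a short time is a fair sign of a scanning script like the one discussed earlier.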
You should now have a better handle on what kind of web logs you have at your disposal and how to read them. There are several other avenues you can pursue from here.
If you're seeing an unusual status code for some requests, you might take a look at a list of common codes and what can cause them.
If you're having a problem and the web logs don't tell you what's wrong, take a gander at this guide to server troubleshooting for other approaches to identifying the culprit.
If you looked the access logs over and felt they were missing something, or thought they were entirely too busy, you might look into customizing your access logs.
Either way, remember: Don't underestimate the utility of logs (for a web server, or any other application). Sometimes they really do contain all the answers.
© 2011-2013 Rackspace US, Inc.
Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License