The status codes you find in your web logs are useful troubleshooting tools, but only if you know what they mean.
When a web browser talks to a web server, the server lets the client know the status of its request by sending a "status code". This status code will show up in the access logs of the server as a number. There are a lot of different status codes that can be passed to a web client, and you can view the full list at W3C's website.
Fortunately there are only a few status codes that you're likely to see in your access logs, so consider the following descriptions to be highlights from the full list of status codes.
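As a sketch of what this looks like in practice, here is a short Python snippet that pulls the status code out of an access-log line. The sample line is hypothetical, in Apache's common log format, where the status code is the first three-digit number after the quoted request:

```python
import re

# A sample access-log line in Apache's common log format (hypothetical data).
log_line = '203.0.113.7 - - [18/Dec/2009:01:39:00 +0000] "GET /wordpress/ HTTP/1.1" 403 199'

# The status code is the three-digit number right after the closing quote
# of the request line.
match = re.search(r'" (\d{3}) ', log_line)
status = int(match.group(1)) if match else None
print(status)
```

Running this against the sample line prints 403, the status discussed below.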
The 200 status code indicates that the request was successful. This is the one you want to see in your logs. At its most basic it means that when a web browser asked for a file, the server was able to find it and send it back to the browser.
The 403 status code indicates that the server understood the request but is refusing to fulfill it: the client is not permitted to access the requested resource.
One circumstance that can cause a 403 status is a directory that has neither "Indexes" enabled nor an index file the server can access. In other words, the client asked for a directory, and the server found nothing there that it could show to the client.
A more common circumstance is that the permissions on the file or directory being requested don't allow access by the web server's user. If the web server is running as user "www-data", any files you want the web server to serve will have to be accessible by the user "www-data". For example, if a directory's permissions look like:
drwx------ 5 root root 4096 2009-12-18 01:39 wordpress
Then the user "www-data" will not be able to access any of the files inside. Requests sent to the server that ask for the "wordpress" directory or any of its contents will yield 403 status codes instead of serving the file requested.
For more information on how Linux file permissions work, you can read this article series. In a nutshell, the web server user needs to have read permission for files in order to serve them, and it has to have read and execute permissions for directories in order to see files inside them.
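That rule can be sketched as a short Python check. This is a simplified illustration (the function name is mine): it only inspects the "other" permission bits, ignoring the cases where the web server user owns the file or shares its group.

```python
import os
import stat
import tempfile

def readable_by_others(path):
    """Return True if the 'other' permission bits would let an unrelated
    user (such as the web server's account) serve this path: read for
    files, read plus execute for directories."""
    mode = os.stat(path).st_mode
    if stat.S_ISDIR(mode):
        return bool(mode & stat.S_IROTH) and bool(mode & stat.S_IXOTH)
    return bool(mode & stat.S_IROTH)

# Demonstration on a temporary directory.
demo_dir = tempfile.mkdtemp()
os.chmod(demo_dir, 0o700)              # owner-only: a 403 waiting to happen
print(readable_by_others(demo_dir))    # False
os.chmod(demo_dir, 0o755)              # world-readable and traversable
print(readable_by_others(demo_dir))    # True
```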
A 404 status code means that the requested file could not be found. If you see this error often you should check the links on your site to make sure they're pointing to the right places.
Since Linux filesystems are typically case-sensitive, you should also make sure the capitalization matches between the request in the URL and the name of the file on disk. For example, if a file is named "File.txt" and the URL requests "file.txt", the web server won't find the file. Either the URL or the file name would need to be changed so the capitalization matches in both places.
A couple of commonly requested files are worth noting.
If you see 404 errors connected to a file named "robots.txt", that's the result of a spider program (like web search engines use) checking to see what your preferences are for indexing your site.
If you don't want to restrict the access of web spider robots to your site, you can just create an empty robots.txt file and the 404 errors will go away.
The robots.txt file can be useful if there are parts of the site that you want search engines to ignore. If you don't want search engines to record anything in the "orders" or "scripts" directories on your site, for example, you could use the following robots.txt file:
User-agent: *
Disallow: /orders/
Disallow: /scripts/
A slash at the end of a disallow will let the search engine robot know that it refers to a directory.
The "User-agent" part of the file describes what user agent the robots.txt would apply to. The "*" means that you want the rule to apply to everybody. You can have more than one User-agent entry in a robots.txt file, as in:
User-agent: EvilSearch
Disallow: /

User-agent: *
Disallow:
In that file, the EvilSearch engine's robot would be asked not to record anything on the site (which is what disallowing "/" means), while all other robots would be allowed to record anything they can find (an empty Disallow value means nothing is off-limits).
Note that the robots.txt instructions aren't enforced in any way. A spider can freely ignore them. The better search engines (the ones you've heard of) tend to obey the robots.txt file, while spiders used by spammers and email harvesters will ignore robots.txt entirely.
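Python's standard library happens to include a parser for this format, which makes a convenient way to check how a well-behaved robot would interpret a given robots.txt. Feeding it the two-entry example above:

```python
from urllib.robotparser import RobotFileParser

# The example robots.txt from above, fed to the parser line by line.
rules = """\
User-agent: EvilSearch
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# EvilSearch is asked to stay away from everything...
print(parser.can_fetch("EvilSearch", "http://example.com/orders/"))    # False
# ...while any other robot is allowed everywhere.
print(parser.can_fetch("SomeOtherBot", "http://example.com/orders/"))  # True
```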
Any 404 errors connected to "favicon.ico" are the result of a web browser checking for a favorites icon for the site. That's another file not found error that can be safely ignored if you don't want to make a favorites icon for the site.
The favorites icon is often used by modern browsers both as an icon in a bookmarks list and as an identifying icon in a tabbed interface. If you've noticed that bringing up a site puts an image associated with the site next to your address bar or in the tab for that page, the favicon.ico file is where your browser got that image.
There are ways to point a browser to another file for the favorites icon, but if you want to make a quick-and-dirty favorites icon there are several utilities on the web that either allow you to create your own or convert an image file. Once you've generated the favicon.ico file you can upload it to the document root of your site and the associated 404 errors should stop appearing in your log.
The 500 status code is kind of a catchall error code for when a module or external program doesn't do what the web server was expecting it to do. If you have a module that proxies requests back to an application server behind your web server, and the application server is having problems, then the server could return a 500 error to web clients.
The 503 status code appears when the web server can't create a new connection to handle an incoming request. If you see this status code in your logs it usually means that you're getting more web traffic than can be handled by your current web server configuration. You'll then need to look into increasing the number of clients the server can handle at one time in order to be rid of this status code.
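How you raise that limit depends on your server. In Apache with the prefork MPM, for example, the ceiling is set by the MaxRequestWorkers directive (called MaxClients before Apache 2.4). A sketch of the relevant configuration section, with values you would tune for your own hardware rather than copy verbatim:

```apache
<IfModule mpm_prefork_module>
    StartServers             5
    MinSpareServers          5
    MaxSpareServers         10
    ServerLimit            256
    MaxRequestWorkers      256
    MaxConnectionsPerChild   0
</IfModule>
```

Keep in mind that each additional worker consumes memory, so size these numbers against the RAM available on the server.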
Those are the status codes you'll find in web logs most often, particularly the unexpected ones. If you're having an issue and the codes associated with recent visitors to the site don't make clear what's wrong, there's still more troubleshooting you can perform to look for another cause. Web logs are important tools, but they can only record what the web server sees, not issues originating elsewhere in your environment.
© 2011-2013 Rackspace US, Inc.
Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License