Excluding Certain Files or Directories with rsync


rsync is cool. What would be even cooler would be excluding particular files or even a whole folder from the backup process.

That's no problem using the rsync '--exclude' option.

Why?

In the previous article, we looked at backing up files and folders to another server.

That was straightforward, but some files change constantly and are not worth backing up, log files being the obvious example. Sure, there are some log files (perhaps the Apache logs) that you want to keep, but there are others you won't, such as a Ruby on Rails production log.

Perhaps there are files containing your DB password, such as a PHP mysqli connection file. Although needed on the main server, it is not needed on the backup.

A folder I always exclude when running rsync on my home folder is the 'sources' directory: I don't need copies of the source code I have downloaded.

Let's see how to exclude that directory.

Single files and folders

The original command was like this:

rsync -e 'ssh -p 30000' -avl --delete --stats --progress demo@123.45.67.890:/home/demo /backup/

To exclude the sources directory we simply add the '--exclude' option:

--exclude 'sources'

Note: the directory path is relative to the folder you are backing up.

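So looking at the original command, I am backing up /home/demo. Adding the name 'sources' to the exclude pattern will exclude the '/home/demo/sources' folder. Because the pattern has no leading slash, rsync will also skip any other file or folder named 'sources' further down the tree.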

As such, the final command would look like this:

rsync -e 'ssh -p 30000' -avl --delete --stats --progress --exclude 'sources' demo@123.45.67.890:/home/demo /backup/
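If you want to see the option at work before touching a real server, it can be tried on a throwaway local copy. The following is only a sketch: the directory layout and file names are made up for the demonstration, and it copies between two local folders rather than over SSH.

```shell
# Throwaway sandbox; every path below is invented for this demonstration.
work=$(mktemp -d)
mkdir -p "$work/home/demo/sources" "$work/home/demo/public_html" "$work/backup"
echo "tarball" > "$work/home/demo/sources/app.tar.gz"
echo "<html>"  > "$work/home/demo/public_html/index.html"

if command -v rsync >/dev/null 2>&1; then
    # No trailing slash on the source, so rsync creates backup/demo/.
    rsync -a --exclude 'sources' "$work/home/demo" "$work/backup/"
    # List what arrived: public_html should be there, sources should not.
    ls -R "$work/backup"
fi
```

The listing should show 'demo/public_html/index.html' and no 'sources' folder at all.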

The same can be applied to files. I have decided that in addition to the sources folder, I want to exclude the file named 'database.txt' which resides in the public_html folder. So I add this:

--exclude 'public_html/database.txt'

So now the command looks like:

rsync -e 'ssh -p 30000' -avl --delete --stats --progress --exclude 'sources' --exclude 'public_html/database.txt' demo@123.45.67.890:/home/demo /backup/
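Since the real command uses '--delete', it can be worth previewing what rsync intends to do before letting it loose. One way is the '--dry-run' option, which lists the transfer without changing anything. The sketch below uses an invented local sandbox again, not the remote server from the article.

```shell
# Throwaway sandbox; every path below is invented for this demonstration.
work=$(mktemp -d)
mkdir -p "$work/home/demo/sources" "$work/home/demo/public_html" "$work/backup"
echo "secret" > "$work/home/demo/public_html/database.txt"
echo "<html>" > "$work/home/demo/public_html/index.html"

preview=""
if command -v rsync >/dev/null 2>&1; then
    # --dry-run reports what would be copied but writes nothing.
    preview=$(rsync -av --dry-run \
        --exclude 'sources' \
        --exclude 'public_html/database.txt' \
        "$work/home/demo" "$work/backup/")
    printf '%s\n' "$preview"
fi
```

If 'database.txt' or 'sources' shows up in the preview, the pattern needs another look before the real run.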

Multiple files and folders

Unfortunately, I have a load of files and folders I don't want to back up, and adding each one like that will get tedious very quickly.

Not only will it get boring, it will make the command very long and prone to mistakes.

That's OK, as I can define all the files and folders I want to exclude in a single file and have rsync read that.

To do this, create a file called 'exclude.txt' on the destination machine (the system you give the rsync command on):

nano /home/backup/exclude.txt

Define

Then define the files and folders you want to exclude from the rsync:

sources
public_html/database.*
downloads/test/*

As you can see, you can define patterns.

The first entry is straightforward. It will exclude any file or folder called 'sources' (remember the path is relative).

The second entry will look in the public_html folder and exclude any files (or folders) that begin with 'database.'. The * at the end is a wildcard, so 'public_html/database.txt' and 'public_html/database.yaml' will be excluded from the backup.

Using a wildcard, the final entry will exclude the contents of the 'downloads/test/' folder but still copy the folder itself (in other words, I will end up with an empty 'test' folder).
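If you would rather not open an editor, the same file can be written straight from the shell with a here-document. This is a sketch that writes to a temporary file instead of '/home/backup/exclude.txt'. The comment line is my own addition: rsync ignores blank lines and lines starting with '#' or ';' in an exclude file.

```shell
# Temporary file standing in for /home/backup/exclude.txt.
exclude_file=$(mktemp)

# Quoting 'EOF' stops the shell from expanding the * characters.
cat > "$exclude_file" <<'EOF'
# patterns are relative to the folder being backed up
sources
public_html/database.*
downloads/test/*
EOF

cat "$exclude_file"
```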

Final command

Now we have defined what to exclude we can direct rsync to the file with:

--exclude-from '/home/backup/exclude.txt'

Note: the path for this file is absolute. You are defining where in the file system rsync should look for the exclude patterns.

So, this time the final command would be:

rsync -e 'ssh -p 30000' -avl --delete --stats --progress --exclude-from '/home/backup/exclude.txt' demo@123.45.67.890:/home/demo /backup/
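To check that the three patterns behave as described, the whole thing can be rehearsed locally first. Again, this is only a sketch with invented file names and local folders in place of the remote server.

```shell
# Throwaway sandbox; every path below is invented for this demonstration.
work=$(mktemp -d)
mkdir -p "$work/home/demo/sources" \
         "$work/home/demo/public_html" \
         "$work/home/demo/downloads/test" \
         "$work/backup"
echo "db"   > "$work/home/demo/public_html/database.yaml"
echo "page" > "$work/home/demo/public_html/index.html"
echo "junk" > "$work/home/demo/downloads/test/big.iso"

# Same three patterns as the exclude.txt in the article.
cat > "$work/exclude.txt" <<'EOF'
sources
public_html/database.*
downloads/test/*
EOF

if command -v rsync >/dev/null 2>&1; then
    rsync -a --exclude-from "$work/exclude.txt" "$work/home/demo" "$work/backup/"
    # Show everything that ended up in the backup.
    find "$work/backup" | sort
fi
```

The backup should contain 'index.html' and an empty 'downloads/test' folder, with no 'sources' folder and no 'database.yaml'.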

That's better.

Summary

Excluding files from a backup can be a time and space saver. The files worth excluding are usually superfluous downloads, or files containing sensitive information such as passwords that just don't need to be in several locations.

Using the exclude-from option allows for easy control over the exclude patterns - you can even define several exclude files and simply point rsync to the one that is convenient.
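One way to keep several exclude files manageable is to name them after the job they belong to and look them up by name. The sketch below is purely illustrative: the 'minimal' and 'full' profile names and the exclude_for helper are hypothetical, not anything rsync provides.

```shell
# Hypothetical layout: one exclude file per profile in a single directory.
exclude_dir=$(mktemp -d)
printf 'sources\n' > "$exclude_dir/minimal.txt"
printf 'sources\npublic_html/database.*\ndownloads/test/*\n' > "$exclude_dir/full.txt"

# Print the exclude file for a profile name, or fail if there is none.
exclude_for() {
    file="$exclude_dir/$1.txt"
    [ -f "$file" ] || { echo "no exclude file for profile '$1'" >&2; return 1; }
    printf '%s\n' "$file"
}

chosen=$(exclude_for full)
echo "rsync would be run with: --exclude-from '$chosen'"
```

Failing loudly on an unknown profile matters here: running with '--delete' and the wrong (or no) exclude list is not a mistake you want to discover afterwards.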

--Kelly Koehn 13:18, 2 April 2009 (CDT)



© 2011-2013 Rackspace US, Inc.

Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License

