Tar and untar
When transferring files or backing up data from one computer to another, smaller files, and fewer of them, speed up the process enormously.
This article concentrates on using the command line to pack files and folders into different archive formats, and to unpack them again.
The three main formats are tar, tar.gz and tar.bz2.
Each uses a different method and produces a different sized result. The most common you will find are tar.gz and tar.bz2.
Although not often used on its own, tar (which comes from the words 'tape archive' or 'tape archiving') does not actually compress files or folders but rather puts them into one file.
This is very handy when transferring or backing up a directory, as it produces one easy-to-manipulate file rather than an unwieldy folder full of them.
To tar files is very simple:
tar cvf dest.tar file.txt file2.txt
As you can see, the destination file is named first and anything named after that is placed into the tar file.
Tarring folders is done in exactly the same way:
tar cvf dest.tar myfolder/
Naturally, once transferred you will want to untar the file or folder like this:
tar xvf dest.tar
You may also wonder what is in the tar file so a quick list of files would be helpful. No worries:
tar tvf dest.tar
You may have noticed the pattern in the commands: creating an archive uses the 'c' option, extracting uses the 'x' option and listing uses the 't' option.
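To see the three options working together, here is a small round trip you can run safely in a throwaway directory. It is only a sketch: the sample files are made up for illustration, and the '-C' option (which tells tar to unpack into another directory) is an addition not used elsewhere in this article.

```shell
# Work in a temporary directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample data: a folder with two small text files.
mkdir myfolder
echo "first file" > myfolder/file.txt
echo "second file" > myfolder/file2.txt

# c = create the archive, f = the archive file name follows
tar cf dest.tar myfolder/

# t = list the contents without extracting anything
tar tf dest.tar

# x = extract; -C unpacks into 'restored' instead of the current directory
mkdir restored
tar xf dest.tar -C restored

# Compare the restored copy with the original folder.
diff -r myfolder restored/myfolder && echo "round trip OK"
```

The 'v' option is left out here simply to keep the output short; adding it back prints each file name as it is processed.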
That's all well and good but what if the tar file is several MB or even GB in size?
That's where compressing the archive comes into play. Once created, the tar file can be compressed to a smaller size.
One compression method uses gzip. So to tar a folder and then compress it using gzip we would use:
tar cvfz dest.tar.gz myfolder/
Again, the destination file, ending in tar.gz, is named first and the files and folders to be compressed are then named.
Uncompressing a gzip file is just as simple:
tar xvfz dest.tar.gz
And to have a quick peek inside the file without uncompressing the whole thing:
tar tvzf dest.tar.gz
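If you are curious how much gzip actually saves, the sketch below builds the same archive with and without compression and prints the sizes. It also shows the two-step route (tar first, gzip afterwards) next to the one-step 'z' option. The sample data is invented and deliberately repetitive, so real-world savings will differ.

```shell
# Work in a temporary directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample data: repetitive text, which compresses well.
mkdir myfolder
for i in 1 2 3; do
    yes "the same line over and over" | head -n 5000 > "myfolder/file$i.txt"
done

# One step: archive and compress together with the 'z' option.
tar czf dest.tar.gz myfolder/

# Two steps: create a plain archive, then compress a copy of it with gzip.
tar cf dest.tar myfolder/
gzip -c dest.tar > two-step.tar.gz

# Print the size of each file in bytes.
wc -c dest.tar dest.tar.gz two-step.tar.gz

# The two-step file can be listed just like the one-step file.
tar tzf two-step.tar.gz
```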
The third method again uses tar to place all the files and folders into one large file but uses the bzip2 compression utility instead of gzip.
Although not an absolute, bzip2 will usually produce a smaller file than when using gzip.
I am sure you are getting the hang of it, but to compress a folder, we first tar it and then, using bzip2, compress it:
tar cvjf dest.tar.bz2 myfolder/
To extract it:
tar xvjf dest.tar.bz2
And to have a little look inside:
tar tvjf dest.tar.bz2
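Rather than take the size claim on trust, you can compare the three formats on your own data. The following is a rough sketch with invented sample data; it assumes gzip is installed and only tries bzip2 if the program is found, since not every system ships it.

```shell
# Work in a temporary directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample data: a text file containing a long list of numbers.
mkdir myfolder
seq 1 200000 > myfolder/numbers.txt

# Plain archive and gzip-compressed archive.
tar cf  dest.tar    myfolder/
tar czf dest.tar.gz myfolder/

# bzip2-compressed archive, only when bzip2 is available.
if command -v bzip2 >/dev/null 2>&1; then
    tar cjf dest.tar.bz2 myfolder/
fi

# Show the resulting files and their sizes side by side.
ls -l dest.tar*
```

Which format wins depends on the data, so it is worth running a comparison like this on a sample of your real files before settling on one.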
Extracting individual files
I've left this useful addition until the end as it is easily applied to all three methods shown above.
The situation is commonly encountered: you have a compressed backup, perhaps several MB or GB in size, but you only want one file from it.
No problem. Simply name the file you want to extract after the initial command like so:
tar xvjf dest.tar.bz2 textfile.txt
As you can see, it uses exactly the same method of extraction as before, but instead of unpacking the whole lot, it extracts just the named file; in this example I wanted textfile.txt. The name must match the path stored in the archive, so a file that was archived as part of myfolder/ would be named myfolder/textfile.txt (listing the archive with the 't' option shows the stored names).
In all three methods (tar, tar.gz and tar.bz2), just name the specific file, or files, you want to extract after the standard extract command.
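A typical workflow is to list the archive first, find the exact name of the file you want, and then extract only that. The sketch below does this with a gzip archive (bzip2 works the same way with 'j' instead of 'z'). The file names are invented, and '-C' is used only to unpack into a separate directory so the result is easy to inspect.

```shell
# Work in a temporary directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample data: one file we want back, one we do not.
mkdir myfolder
echo "keep me" > myfolder/textfile.txt
echo "not needed" > myfolder/other.txt
tar czf dest.tar.gz myfolder/

# List the archive to find the name of the file as stored.
tar tzf dest.tar.gz

# Extract only that one member, using its stored path, into 'single'.
mkdir single
tar xzf dest.tar.gz -C single myfolder/textfile.txt

# Only textfile.txt should appear; other.txt stays in the archive.
ls -R single
```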
Next, let's see how to add a single file or folder to an existing archive, and how to delete individual files without extracting the whole archive.
Let's start with a common requirement by adding a file or folder to an existing tar archive.
Why would we do this? Well, it's a lot easier adding a file to an existing archive than extracting the archive, adding the file and then tarring it again, especially if the archive is large.
Add a file
To add a file to an existing archive:
tar rvf dest.tar myfile.txt
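The command uses the 'r' option, which is the short form of '--append'. In this case, the file called 'myfile.txt' is added (appended) to the tar archive named 'dest.tar'. Note that this only works on plain tar archives; a tar.gz or tar.bz2 file has to be uncompressed first.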
Add a directory
Exactly the same procedure is used to add a directory:
tar rvf dest.tar myfolder/
In that case, the directory named 'myfolder' was added to the archive named 'dest.tar'.
Don't forget that you can check the procedure worked by listing the contents of the archive:
tar tvf dest.tar
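Appending only works on an uncompressed archive, so a compressed backup needs an extra step on either side. The sketch below shows both cases with invented file names: a straightforward append to a plain tar file, and a decompress, append, recompress sequence for a tar.gz file (assuming gzip and gunzip are installed).

```shell
# Work in a temporary directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample data.
echo "one" > file.txt
echo "two" > myfile.txt

# Plain archive: append, then list to confirm the new file is there.
tar cf dest.tar file.txt
tar rf dest.tar myfile.txt
tar tf dest.tar

# Compressed archive: decompress, append, compress again.
tar czf backup.tar.gz file.txt
gunzip backup.tar.gz            # leaves backup.tar
tar rf backup.tar myfile.txt
gzip backup.tar                 # recreates backup.tar.gz
tar tzf backup.tar.gz
```

For a very large compressed backup the decompress and recompress steps take time and disk space, so it can be worth keeping frequently updated archives uncompressed.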
Just as we want to be able to add files to our archive, we also want to be able to delete files and folders from it.
As you would imagine, this is pretty simple:
tar --delete -vf dest.tar myfile.txt
Note the syntax of the command. The delete option is followed by the '-vf' options. Whereas the 'v', meaning 'verbose', is optional, the 'f' option, which tells tar that the next argument is the archive file to work on rather than a tape drive, is not. Note also that '--delete' only works on uncompressed tar archives.
So that command would delete the file named 'myfile.txt' from the archive named 'dest.tar'.
To delete a folder from the archive, append the folder name to the command:
tar --delete -vf dest.tar myfolder/
As expected, this removes the directory named 'myfolder' from the archive.
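It is worth listing the archive before and after a delete to be sure the right entries went. The following sketch does exactly that with invented file names. It assumes GNU tar, since '--delete' is a GNU option and other tar implementations may not offer it.

```shell
# Work in a temporary directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample data: two files and a folder.
mkdir myfolder
echo "a" > myfolder/inside.txt
echo "b" > myfile.txt
echo "c" > keep.txt
tar cf dest.tar myfile.txt keep.txt myfolder/

echo "Before:"
tar tf dest.tar

# Remove one file, then the folder and everything stored under it.
tar --delete -f dest.tar myfile.txt
tar --delete -f dest.tar myfolder/

echo "After:"
tar tf dest.tar
```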
We know that creating an archive is fairly simple. However, the situation often occurs where you have a directory of files you want to archive but there are some files you want to leave out.
This is where the 'exclude' option comes in:
tar cvf dest.tar --exclude='myfile.txt' myfolder/
The format is very similar to creating a full archive. However, this time we excluded the file named 'myfile.txt'.
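The exclude option also accepts wildcard patterns and can be given more than once, which is handy for skipping things like log files or cache folders. Here is a small sketch with invented names; the patterns are quoted so that the shell hands them to tar untouched, and they are placed before the folder they apply to.

```shell
# Work in a temporary directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample data: a note, a log file and a cache folder.
mkdir -p myfolder/cache
echo "notes" > myfolder/notes.txt
echo "debug" > myfolder/debug.log
echo "tmp"   > myfolder/cache/blob.bin

# Skip every .log file and the whole cache folder.
tar cf dest.tar --exclude='*.log' --exclude='myfolder/cache' myfolder/

# List the result to check what was left out.
tar tf dest.tar
```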
It can be convenient to create a list of files to exclude from the archive, especially when using tar for regular backups.
To do this, create a file named 'exclude.txt' and enter each filename to be excluded on its own line.
The file may look like this:
myfile.txt
myfile2.txt
.config
Now when issuing the archive command you would use the 'X' option:
tar cvf dest.tar -X exclude.txt myfolder/
As you may expect, the archive named 'dest.tar' is created from the contents of 'myfolder', but the files listed in 'exclude.txt' have not been included.
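To try the exclude list end to end, the sketch below creates a sample folder, writes 'exclude.txt' with one name per line, and then lists the resulting archive. All of the file names are invented for illustration.

```shell
# Work in a temporary directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample data: one file to keep, three to leave out.
mkdir myfolder
echo "data"    > myfolder/data.txt
echo "skip 1"  > myfolder/myfile.txt
echo "skip 2"  > myfolder/myfile2.txt
echo "setting" > myfolder/.config

# Write the exclude list, one name per line.
printf '%s\n' 'myfile.txt' 'myfile2.txt' '.config' > exclude.txt

# Create the archive, skipping everything named in exclude.txt.
tar cf dest.tar -X exclude.txt myfolder/

# List the result to check what was left out.
tar tf dest.tar
```

Keeping the list in a file means a regular backup job can stay unchanged while the list itself is edited as needed.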
There are, of course, more options available with the tar command and, as you can see, it's very flexible and can be used for more than creating 'simple' backups.
Ask the man about tar. He'll tell you more:
man tar
© 2011-2013 Rackspace US, Inc.
Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License