I recently upgraded to Mozilla Thunderbird 3.0. That got me thinking that now might be a good time to clean up my local mail folders. All of my mail from the past few years is stored on my IMAP server. I still have a few gigabytes of old mail from my old POP3 days stored in Thunderbird's
I decided to do some spring cleaning and stop carrying around my old POP3 mail. It is also a good time to store the current copy of that old mail with my long-term backups.
My Old Friend
I have been using `rzip` for quite a few years. Its job is to find and encode large chunks of duplicated data over very large distances, up to 900 MB. Once that is complete, it runs the resulting data through `bzip2`. For large datasets, I've found it to be much faster than `bzip2`, and it usually results in an archive that is about 30% smaller than just using
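A rough way to see why long-distance matching matters is to compress data whose only redundancy sits farther apart than `bzip2`'s block size. The sketch below is not `rzip`; it uses Python's standard `bz2` and `lzma` modules, and the 1 MiB random block is an assumption standing in for something like an attachment that shows up twice in a mail archive.

```python
import bz2
import lzma
import os

# Assumption for illustration: 1 MiB of random bytes stands in for a large
# attachment that appears twice in a mail archive.
block = os.urandom(1024 * 1024)
data = block + block  # the second copy starts 1 MiB after the first

# bzip2 at level 9 works on independent 900 kB blocks, so it never sees
# both copies of the same bytes inside one block.
bz2_size = len(bz2.compress(data, 9))

# xz/LZMA at preset 6 keeps an 8 MiB dictionary, so the second copy can be
# encoded as references back to the first.
lzma_size = len(lzma.compress(data, preset=6))

print(f"input: {len(data)} bytes")
print(f"bz2:   {bz2_size} bytes")
print(f"lzma:  {lzma_size} bytes")
```

The `bz2` output should come out at roughly the size of the input, while the `lzma` output should be close to the size of a single block. `rzip` applies the same idea over a much larger window before handing the result to `bzip2`.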
`rzip` can't operate on pipes. All of my automated backup scripts run along pipelines, usually from
`gpg`. They never touch the disk unencrypted, which probably isn't always necessary.
My New Friend
I recently discovered Con Kolivas'
`rzip` a couple steps further. It lets you choose a compressor other than `bzip2` for the second stage of compression. It can also be used in a pipe.
Unfortunately, when used in a pipe it generates a large temp file. This can be a problem if you are trying to generate a large archive and don't have a lot of free disk space, or if you don't want unencrypted data being written to the disk.
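For contrast, a plain block compressor can sit in a pipe without any temp file, because it only ever needs the current chunk in memory. The following is a minimal sketch of that streaming style using Python's `bz2.BZ2Compressor`; it is not how `lrzip` works internally, and the `compress_stream` helper and the sample data are made up for illustration.

```python
import bz2
import io


def compress_stream(src, dst, chunk_size=64 * 1024):
    """Copy src to dst through bzip2, reading one chunk at a time."""
    compressor = bz2.BZ2Compressor(9)
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        # compress() may return b"" until a full block has been buffered.
        dst.write(compressor.compress(chunk))
    # flush() emits whatever is still buffered and ends the stream.
    dst.write(compressor.flush())


# In-memory stand-ins for the two ends of a pipe (hypothetical data).
source = io.BytesIO(b"old POP3 mail\n" * 100_000)
sink = io.BytesIO()
compress_stream(source, sink)
print(f"{len(source.getvalue())} bytes in, {len(sink.getvalue())} bytes out")
```

In a real pipeline, `src` and `dst` would be `sys.stdin.buffer` and `sys.stdout.buffer`. A compressor that looks for duplicates hundreds of megabytes apart cannot work from such a small buffer, which is presumably why the piped mode has to stage its input somewhere first.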
I compressed the tarball of my `.thunderbird` directory every way I could think of that made sense. The default settings for `lrzip` kept erroring out on me at about 30%. I had to use the `-w` switch to reduce the window size from 20. I chose 12, which should be about 30% higher than
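|              | Size (MB) | Minutes | Ratio |
|--------------|-----------|---------|-------|
| uncompressed | 5761      | na      | 1.0:1 |
| lrzip zpaq   | 1207      | 265     | 4.7:1 |
| lrzip lzma   | 1262      | 60      | 4.5:1 |
| lrzip bzip2  | 1401      | 27      | 4.1:1 |
| rzip         | 1362      | 20      | 4.2:1 |
| lzma         | 1441      | 97      | 4.0:1 |
| bzip2        | 1748      | 38      | 3.3:1 |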
lrzip achieved a smaller file size in less time than
`zpaq` is over 13 times slower than `rzip` for a savings of 155 MB, or about 11%.
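The trade-off between the slowest and fastest rows of the results table above can be checked with a few lines of arithmetic. This sketch only uses the sizes and times already reported; the variable names are mine.

```python
# Sizes (MB) and times (minutes) copied from the results table above.
results = {
    "lrzip zpaq": (1207, 265),
    "rzip": (1362, 20),
}

zpaq_size, zpaq_minutes = results["lrzip zpaq"]
rzip_size, rzip_minutes = results["rzip"]

slowdown = zpaq_minutes / rzip_minutes    # how many times longer zpaq took
saved_mb = rzip_size - zpaq_size          # MB saved relative to rzip
saved_pct = 100 * saved_mb / rzip_size    # savings as a share of the rzip archive

print(f"zpaq took {slowdown:.2f}x as long as rzip")
print(f"zpaq saved {saved_mb} MB ({saved_pct:.1f}% of the rzip archive)")
```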
Why Would Anyone Wait for `zpaq` to Finish?
Most of the time it isn't worth the wait. That said, I'm a huge fan of smaller backups. Backups become significantly more expensive every time a single backup has to span a second (or third, or fourth…) piece of media. Each one is another floppy, CD, DVD, Blu-ray, tape, hard drive, or flash drive to swap around by hand and keep safe.
I like flash drives for my personal backups. I have too many CDs and DVDs that are unreadable. I've accidentally run an old compact flash drive through the laundry and it still worked. I'm sure not all flash drives would survive that, but they do tend to be very durable.
`zpaq` did not get the file size down enough for the archive to fit on the backup flash drive that I keep around the house. Another 100 MB or so would have done the trick and would have saved me quite a bit of effort.
Which One Should You Use?
For most archives I would probably just choose `bzip2`. It does a very good job, and a decompressor is almost always readily available.
For most very large archives, I will be sticking with `rzip`. It is faster and more space-efficient than `bzip2`. It is also easier to find than `lrzip`; my Ubuntu machine has an `rzip` package available in
I will be sure to keep `zpaq` in mind, though. Sometimes saving an extra couple hundred MB will spare me the time, effort, and cost of a second piece of media. The other downside to `zpaq` is that decompression is also very slow.