I recently upgraded to Mozilla Thunderbird 3.0, which got me thinking that now might be a good time to clean up my local mail folders. All of my mail from the past few years is stored on my IMAP server, but I still have a few gigabytes of mail from my old POP3 days stored in Thunderbird's Local Folders.
I decided to do some spring cleaning and stop carrying around my old POP3 mail. I figured it was also a good time to store a copy of this old mail with my long-term backups.
My Old Friend rzip
I have been using rzip for quite a few years. Its job is to find and encode large chunks of duplicated data over very large distances, up to 900 MB. Once that is complete, it runs the resulting data through bzip2. For large datasets, I've found it to be much faster than bzip2, and it usually results in an archive that is about 30% smaller than just using bzip2.
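To illustrate why matching over long distances matters, here is a small Python sketch. It does not use rzip at all; it uses the standard library's bz2 and lzma modules as stand-ins. bzip2 works on blocks of at most 900 kB, so a duplicate that sits 1 MiB away is invisible to it, while a compressor with a larger window (lzma at its default preset, used here purely for illustration) can refer back to the first copy. rzip applies the same idea over a far larger window before handing the data to bzip2.

```python
import bz2
import lzma
import os

MIB = 1024 * 1024


def compressed_sizes(data: bytes) -> dict:
    """Return the compressed size of data under bz2 and lzma (default settings)."""
    return {
        "bz2": len(bz2.compress(data)),
        "lzma": len(lzma.compress(data)),
    }


# One MiB of incompressible data, then the same MiB repeated right after it.
block = os.urandom(MIB)
once = compressed_sizes(block)
twice = compressed_sizes(block + block)

for name in ("bz2", "lzma"):
    growth = twice[name] / once[name]
    print(f"{name}: one copy {once[name]} bytes, "
          f"two copies {twice[name]} bytes ({growth:.2f}x)")
```

The bz2 output should roughly double when the block is repeated, while the lzma output should barely grow, because only lzma can see the first copy when it reaches the second.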
Unfortunately, rzip can't operate on pipes. All of my automated backup scripts run along pipelines, usually from tar to bzip2 to gpg. They never touch the disk unencrypted, which probably isn't always helpful.
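A pipeline of this shape can be sketched in Python. This is only an illustration, not my actual backup script: the function names stream_tar_bz2 and encrypted_backup are made up for the example, and the second function assumes a gpg binary is on the PATH and will ask for a passphrase. The point is that the tar and bzip2 stages write straight into the next stage, so no unencrypted file is created along the way.

```python
import subprocess
import tarfile


def stream_tar_bz2(src_path: str, sink) -> None:
    """Write a bzip2-compressed tar stream of src_path to sink.

    Mode "w|bz2" makes tarfile write strictly sequentially, so sink
    may be a pipe that cannot seek.
    """
    with tarfile.open(fileobj=sink, mode="w|bz2") as tar:
        tar.add(src_path)


def encrypted_backup(src_path: str, out_path: str) -> None:
    """Pipe the tar stream into gpg so plaintext never lands on disk.

    Assumption: gpg is installed; --symmetric encrypts with a passphrase.
    """
    gpg = subprocess.Popen(
        ["gpg", "--symmetric", "--output", out_path],
        stdin=subprocess.PIPE,
    )
    try:
        stream_tar_bz2(src_path, gpg.stdin)
    finally:
        gpg.stdin.close()  # signal end of input to gpg
    if gpg.wait() != 0:
        raise RuntimeError("gpg exited with an error")
```

Because stream_tar_bz2 only needs an object with a write method, the same function works with a pipe, a socket, or an in-memory buffer. A tool that cannot read from a pipe, like rzip, has no place to slot into this chain.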
My New Friend lrzip
I recently discovered Con Kolivas' lrzip. lrzip takes rzip a couple of steps further. It lets you choose a compressor other than bzip2 for the second stage of compression. It can also be used in a pipe.
Unfortunately, when used in a pipe it generates a large temp file. This can be a problem if you are trying to generate a large archive and don't have a lot of free disk space, or if you don't want unencrypted data being written to the disk.
Some Benchmarks
I compressed the tarball of my .thunderbird directory every way I could think of that made sense. The default settings for lrzip kept erroring out on me at about 30%. I had to use the -w switch to reduce the window size from 20. I chose 12, which should be about 30% larger than rzip's window.
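The arithmetic behind that window choice can be checked in a few lines of Python. This is a sketch under one assumption: that the value passed to -w counts units of 100 MB, which is consistent with the comparison above but should be confirmed against the lrzip manual for your version.

```python
RZIP_WINDOW_MB = 900        # rzip's maximum match distance, as described earlier
LRZIP_WINDOW_UNIT_MB = 100  # assumption: -w counts units of 100 MB


def lrzip_window_mb(w: int) -> int:
    """Window size in MB for a given -w value, under the assumption above."""
    return w * LRZIP_WINDOW_UNIT_MB


def percent_larger_than_rzip(w: int) -> float:
    """How much larger the lrzip window is than rzip's, in percent."""
    return (lrzip_window_mb(w) / RZIP_WINDOW_MB - 1) * 100


print(lrzip_window_mb(12))                  # 1200
print(round(percent_larger_than_rzip(12)))  # 33
```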
               Size (MB)   Minutes   Ratio
uncompressed     5761        na      1.0:1
lrzip zpaq       1207        265     4.7:1
lrzip lzma       1262        60      4.5:1
lrzip bzip2      1401        27      4.1:1
rzip             1362        20      4.2:1
lzma             1441        97      4.0:1
bzip2            1748        38      3.3:1
Both rzip and lrzip achieved a smaller file size than bzip2, and rzip and lrzip with bzip2 did so in less time. lrzip with zpaq is over 13 times slower than rzip for a savings of 155 MB, or about 11%.
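For anyone who wants to rerun these comparisons, here is a short Python sketch that derives them from the table. The sizes and times are copied from the table; the helper names slowdown and savings are just for this example.

```python
# (size in MB, minutes) copied from the benchmark table
results = {
    "lrzip zpaq": (1207, 265),
    "lrzip lzma": (1262, 60),
    "lrzip bzip2": (1401, 27),
    "rzip": (1362, 20),
    "lzma": (1441, 97),
    "bzip2": (1748, 38),
}


def slowdown(name: str, baseline: str) -> float:
    """How many times longer name took than baseline."""
    return results[name][1] / results[baseline][1]


def savings(name: str, baseline: str) -> tuple:
    """MB saved by name over baseline, and that saving as a percentage."""
    saved_mb = results[baseline][0] - results[name][0]
    return saved_mb, 100 * saved_mb / results[baseline][0]


# Which compressors beat plain bzip2 on both size and time?
bz_mb, bz_min = results["bzip2"]
beats_bzip2 = [
    name for name, (mb, minutes) in results.items()
    if name != "bzip2" and mb < bz_mb and minutes < bz_min
]

saved_mb, saved_pct = savings("lrzip zpaq", "rzip")
print(beats_bzip2)                                  # ['lrzip bzip2', 'rzip']
print(f"{slowdown('lrzip zpaq', 'rzip'):.2f}x")     # 13.25x
print(f"{saved_mb} MB, {saved_pct:.1f}%")           # 155 MB, 11.4%
```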
Why Would Anyone Wait for zpaq to Finish?
Most of the time it isn't worth the wait, but I'm a huge fan of smaller backups. Backups become significantly more expensive every time a single backup has to span a second (or third, or fourth…) piece of media. It's another floppy, CD, DVD, Blu-ray, tape, hard drive, or flash drive to have to manually swap around and keep safe.
I like flash drives for my personal backups. I have too many CDs and DVDs that are unreadable. I've accidentally run an old compact flash drive through the laundry and it still worked. I'm sure not all flash drives would survive that, but they do tend to be very durable.
Unfortunately, lrzip with zpaq did not get the file size down enough for the archive to fit on the backup flash drive that I keep around the house. Another 100 MB or so would have done the trick and would have saved me quite a bit of effort.
Which One Should You Use?
For most archives I would probably just choose bzip2. It does a very good job, and a decompressor is almost always readily available.
For very large archives, I will be sticking with rzip. It is faster and more space-efficient than bzip2. It is also easier to find than lrzip; my Ubuntu machine has an rzip package available in apt.
I will be sure to keep lrzip with zpaq in mind, though. Sometimes saving an extra couple hundred MB will save the time, effort, and cost of a second piece of media. The other downside to zpaq is that decompression is also very slow.