Tweak btrfs-snap for More Frequent Snapshots

I have been using the btrfs-snap script for a few weeks, and it is working very well except for one small problem: btrfs seems to have trouble when snapshots are removed too quickly.

I made a few simple tweaks to btrfs-snap to help alleviate this problem. I added a check to make sure only one instance of the btrfs-snap script can be actively removing snapshots at a time. I also added a delay between snapshot removals.
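The two tweaks (one remover at a time, plus a pause between removals) can be sketched roughly as follows. This is a minimal Python illustration of the idea, not the actual change to btrfs-snap. The lock is a sentinel file created atomically with `O_CREAT | O_EXCL`, so only one process can hold it, and removals are spaced out with `time.sleep`. The lock path, the 10-second default delay, and the function names are hypothetical. `btrfs subvolume delete` is the standard command for removing a snapshot subvolume.

```python
import os
import subprocess
import time

# Hypothetical sentinel location; the post does not say where the lock lives.
LOCK_PATH = "/tmp/btrfs-snap-remove.lock"


def acquire_lock(path):
    """Create the sentinel file atomically; return False if it already exists."""
    try:
        # O_CREAT | O_EXCL fails if the file exists, so only one process can win.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.write(fd, str(os.getpid()).encode())  # record the owner for debugging
    os.close(fd)
    return True


def btrfs_delete(path):
    """Delete one snapshot subvolume with the btrfs command-line tool."""
    subprocess.run(["btrfs", "subvolume", "delete", path], check=True)


def remove_snapshots(paths, lock_path=LOCK_PATH, delay=10, remove=btrfs_delete):
    """Remove snapshots one at a time, sleeping `delay` seconds between them.

    Returns how many were removed, or None if another instance holds the lock.
    """
    if not acquire_lock(lock_path):
        return None  # someone else is already removing snapshots
    removed = 0
    try:
        for i, path in enumerate(paths):
            if i > 0:
                time.sleep(delay)  # give btrfs a breather between removals
            remove(path)
            removed += 1
    finally:
        os.remove(lock_path)  # release the lock even if a removal fails
    return removed
```

Passing `remove` in as a parameter keeps the locking and pacing testable without touching a real file system. If the process is killed outright, the `finally` block never runs and the sentinel file has to be deleted by hand.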

This version of the script has been running on my laptop for the last few days, keeping a dozen snapshots at five-minute intervals without any problems. With the original script, btrfs-snap processes would start getting gummed up within the first few hours.

Update 2010-11-14:

I’ve been running this script for a little over two weeks now, and I ran into my first runaway snapshot situation last night. Snapshot removal was hanging, and by the time I noticed, I had over 500 extra snapshots of each volume, for a total of more than 1650 snapshots on the file system.

After a reboot, snapshots could be removed again. Early on, the removals took over 30 seconds each, and disk I/O slowed to an absolute crawl. I don’t really want to be stuck with this many snapshots again…

I added a check to btrfs-snap that makes it skip snapshot creation if too many snapshots with the same prefix already exist.
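The guard amounts to counting the existing snapshots that share a prefix before creating another one. A minimal Python sketch follows. The directory layout, the simple `startswith` prefix match, and the limit of 24 are assumptions, since the post does not give a threshold.

```python
import os

# Hypothetical ceiling; the post only says "too many", not a number.
MAX_SNAPSHOTS = 24


def count_snapshots(snap_dir, prefix):
    """Count entries in snap_dir whose names start with prefix."""
    return sum(1 for name in os.listdir(snap_dir) if name.startswith(prefix))


def should_create_snapshot(snap_dir, prefix, limit=MAX_SNAPSHOTS):
    """Skip creation once `limit` or more snapshots with this prefix exist."""
    return count_snapshots(snap_dir, prefix) < limit
```

A caller would check `should_create_snapshot(...)` and simply exit when it returns False, so a stuck removal can no longer pile hundreds of new snapshots on top of the old ones.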

Update 2010-11-29:

Things seem to be getting gummed up more often lately, probably every few days. The file system isn’t filling up with huge numbers of extra snapshots anymore, but by the time I notice something has gone wrong, my process table usually has a few thousand btrfs-snap processes sitting around.

They’re getting hung up trying to count snapshots. It seems that it isn’t possible to list the .snapshot directory while btrfs is in the middle of failing to remove a snapshot. I moved the check for the sentinel file up a bit so that the script creates the lock before counting snapshots. I also added a little countdown loop so that it gives up if it can’t get the lock after a few tries.
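A rough sketch of the reordered logic, again in Python with hypothetical names: the lock is taken before the directory is listed, and acquiring it counts down a fixed number of tries with a pause between them. If a listing hangs, only the process holding the lock is stuck; later runs give up after roughly `(tries - 1) * wait` seconds instead of piling up. The 5 tries and 60-second wait are assumptions (the post only says "a few tries"), and the snapshot-creation step is passed in as a callable.

```python
import os
import time


def acquire_lock_with_retries(path, tries=5, wait=60):
    """Try to create the sentinel file, giving up after `tries` attempts."""
    while tries > 0:
        try:
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            return True
        except FileExistsError:
            tries -= 1
            if tries > 0:
                time.sleep(wait)  # wait before the next attempt
    return False


def run_once(snap_dir, prefix, lock_path, limit, create, tries=5, wait=60):
    """Take the lock *before* listing snap_dir, then count and maybe create."""
    if not acquire_lock_with_retries(lock_path, tries, wait):
        return "gave up"  # another run holds the lock (possibly hung)
    try:
        # Only the lock holder can block here if the listing hangs.
        count = sum(1 for n in os.listdir(snap_dir) if n.startswith(prefix))
        if count >= limit:
            return "skipped"
        create()
        return "created"
    finally:
        os.remove(lock_path)  # release the lock we acquired
```

Note that the "gave up" path returns before the `try`, so a run that never obtained the lock does not delete the sentinel file belonging to another run.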