[Discuss] rsnapshot vs. rdiff-backup
Richard Pieri
richard.pieri at gmail.com
Mon Dec 2 22:26:35 EST 2013
I've been using rsnapshot for several years now and I'm reasonably
familiar with it. It was recently suggested to me to use rdiff-backup to
copy files to a FAT32 file system because it is aware of FAT32 and exFAT
file name restrictions. Since then I've been experimenting with
rdiff-backup. Here are some of the high and low points of the two.
rsnapshot is, as the name suggests, a snapshot system. It uses a
combination of GNU cp's hard link directory replication and rsync itself
to maintain time-based snapshots. It functions similarly to Apple's Time
Machine with one notable difference. Where Time Machine's snapshots run
back until the disk fills up, at which point the oldest are pruned to make
room, rsnapshot's snapshots are rotated at fixed intervals: hourly, daily,
weekly, monthly, yearly, with pruning managed by a retention policy.
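To make the mechanism concrete, each snapshot run boils down to roughly
this (a minimal sketch with made-up paths, not rsnapshot's exact
internals):

    # replicate the newest snapshot as hard links (cheap, no file data
    # copied; the older snapshots have already been shifted out of the way)
    cp -al /backup/hourly.0 /backup/hourly.1
    # then bring hourly.0 up to date; only changed files consume new space
    rsync -a --delete /home/ /backup/hourly.0/home/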
While I've repeatedly stated -- and still maintain -- that sync is not
backup, maintaining many sync-based snapshots is close enough for some
uses. rsnapshot shines when you have many users who want to be able to
pluck single files from arbitrary points in time out of the backup system.
There are two big drawbacks to rsnapshot. The first is setup. It's
tedious. You need to configure the increments and retention in a
configuration file. You need to match up the increments with associated
cron jobs. And you need to make sure that the cron jobs are staggered so
that they don't step on each other. rsnapshot is smart enough not to let
that be destructive, but it can mean missed snapshot runs, and that's not
good for a backup system.
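For the curious, the setup looks roughly like this (retention counts,
paths, and times are made up for the example; note that rsnapshot.conf
wants tabs, not spaces, between fields):

    # /etc/rsnapshot.conf
    snapshot_root   /backup/snapshots/
    # (older rsnapshot versions spell "retain" as "interval")
    retain  hourly  6
    retain  daily   7
    retain  weekly  4
    backup  /home/  localhost/

    # /etc/crontab -- staggered so the runs don't collide
    0 */4 * * *   root  /usr/bin/rsnapshot hourly
    30 3  * * *   root  /usr/bin/rsnapshot daily
    0  3  * * 1   root  /usr/bin/rsnapshot weekly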
The second is that it is terrible for things like databases that grow
forever. The hard link trick only saves space for files that haven't
changed at all between runs, so each run copies an entire database dump
or log file or whatever in full, which can lead to massively inflated
disk usage.
The third -- okay, three big drawbacks -- is that it only works on Unix
file systems and their network equivalents. The hard link mechanism
won't work on either NTFS or FAT*, which makes it unusable both for
Windows clients (being backed up) and for Windows file systems as the
snapshot storage.
rdiff-backup, as the name suggests, is a backup mechanism that uses
diffs. Specifically, it uses the rsync algorithm to calculate deltas
(rdiff) and uses these deltas to build backup histories. Operation is
more like Time Machine: each run adds new deltas to the history until
you run out of space (at which point the whole thing falls apart) or you
invoke a dedicated cleanup run to prune based on relative or absolute
time or number of backup runs. As with rsnapshot, sync is not backup, but
a history of snapshots is close enough.
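The cleanup run is a separate invocation along these lines (paths made
up; as I read the man page, --force is needed when more than one
increment would be removed):

    # drop increments older than two weeks
    rdiff-backup --force --remove-older-than 2W /backup/rdiff
    # or keep only the most recent 20 backup runs
    rdiff-backup --force --remove-older-than 20B /backup/rdiff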
There is practically no setup with rdiff-backup. Everything is command
line arguments or external files (e.g., exclude lists) noted in the
arguments. This makes a backup script literally a sequence of
rdiff-backup commands. As I noted in the introduction, rdiff-backup is
smart about escaping characters that are prohibited on target file
systems. It also maintains a log of file ownerships and attributes,
including NTFS ACLs. That's a huge win for disaster recovery.
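A complete backup script can be as short as this (host names and paths
invented for the example):

    #!/bin/sh
    # local /home to a USB drive, honoring an exclude list
    rdiff-backup --exclude-globbing-filelist /etc/backup-excludes \
        /home /mnt/usb/home-backup
    # a remote host's /etc pulled in over SSH
    rdiff-backup root@server.example.com::/etc /backup/server-etc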
Another win is that because it's based on deltas, and those deltas are
compressed, it is vastly more efficient for continuously growing files
like databases and logs and VM images. Since the rdiff algorithm is
based on rsync, it doesn't matter whether the files are text or binary
data. It's all just bits to rdiff.
Now the bad. The big one is that it isn't so obvious how to find a
specific file at a specific date and time. Only the most recent backup
run is in the target directory. All of the compressed deltas are stored
in a subdirectory under the target. Getting at those requires invoking
the rdiff-backup command.
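In practice that means something like this (date and paths invented):

    # see what increments exist for the file
    rdiff-backup --list-increments /mnt/usb/home-backup/alice/report.doc
    # restore it as it existed on a given date
    rdiff-backup --restore-as-of 2013-11-15 \
        /mnt/usb/home-backup/alice/report.doc /tmp/report.doc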
rdiff-backup runs are slower than comparable rsnapshot runs. Calculating
and compressing deltas is more CPU-intensive than GNU cp and rsync runs.
rdiff-backup's efficiency comes at a price.
There they are. Two very different backup systems built on the same
rsync algorithm.
--
Rich P.