[Discuss] rsnapshot vs. rdiff-backup
Richard Pieri
richard.pieri at gmail.com
Mon Dec 2 22:26:35 EST 2013
I've been using rsnapshot for several years now and I'm reasonably
familiar with it. It was recently suggested to me to use rdiff-backup to
copy files to a FAT32 file system because it is aware of FAT32 and exFAT
file name restrictions. Since then I've been experimenting with
rdiff-backup. Here are some of the high and low points of the two.
rsnapshot is, as the name suggests, a snapshot system. It uses a
combination of GNU cp's hard link directory replication and rsync itself
to maintain time-based snapshots. It functions similarly to Apple's Time
Machine with one notable difference. Where Time Machine's snapshots run
back until the disk fills up, at which point the oldest are pruned to make
room, rsnapshot's snapshots are rotated at fixed intervals: hourly, daily,
weekly, monthly, yearly, with pruning managed by a retention policy.
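To make the mechanism concrete, each snapshot run boils down to roughly
this (a minimal sketch with made-up paths, not rsnapshot's exact
internals):

    # replicate the newest snapshot as hard links (cheap, no file data
    # copied; the older snapshots have already been shifted out of the way)
    cp -al /backup/hourly.0 /backup/hourly.1
    # then bring hourly.0 up to date; only changed files consume new space
    rsync -a --delete /home/ /backup/hourly.0/home/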
While I've repeatedly stated -- and still maintain -- that sync is not
backup, maintaining many sync-based snapshots is close enough for some
uses. rsnapshot shines when you have many users who want to be able to
pluck single files from arbitrary points in time out of the backup system.
There are two big drawbacks to rsnapshot. The first is setup. It's
tedious. You need to configure the increments and retention in a
configuration file. You need to match up the increments with associated
cron jobs. And you need to make sure that the cron jobs are staggered so
that they don't step on each other. rsnapshot is smart enough not to let
that be destructive, but it can mean missed snapshot runs, and that's not
good for a backup system.
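For the curious, the setup looks roughly like this (retention counts,
paths, and times are made up for the example; note that rsnapshot.conf
wants tabs, not spaces, between fields):

    # /etc/rsnapshot.conf
    snapshot_root   /backup/snapshots/
    # (older rsnapshot versions spell "retain" as "interval")
    retain  hourly  6
    retain  daily   7
    retain  weekly  4
    backup  /home/  localhost/

    # /etc/crontab -- staggered so the runs don't collide
    0 */4 * * *   root  /usr/bin/rsnapshot hourly
    30 3  * * *   root  /usr/bin/rsnapshot daily
    0  3  * * 1   root  /usr/bin/rsnapshot weekly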
The second is that it is terrible for things like databases that grow
forever. The hard link trick only saves space for files that haven't
changed at all between runs, so each run copies an entire database dump
or log file or whatever in full, which can lead to massively inflated
disk usage.
The third -- okay, three big drawbacks -- is that it only works on Unix
file systems and their network equivalents. The hard link mechanism
won't work on either NTFS or FAT*, which makes it unusable both for
Windows clients (being backed up) and for Windows file systems as the
snapshot storage.
rdiff-backup, as the name suggests, is a backup mechanism that uses
diffs. Specifically, it uses the rsync algorithm to calculate deltas
(rdiff) and uses these deltas to build backup histories. Operation is
more like Time Machine: each run adds new deltas to the history until
you run out of space (at which point the whole thing falls apart) or you
invoke a dedicated cleanup run to prune based on relative or absolute
time or number of backup runs. As with rsnapshot, sync is not backup, but
a history of snapshots is close enough.
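The cleanup run is a separate invocation along these lines (paths made
up; as I read the man page, --force is needed when more than one
increment would be removed):

    # drop increments older than two weeks
    rdiff-backup --force --remove-older-than 2W /backup/rdiff
    # or keep only the most recent 20 backup runs
    rdiff-backup --force --remove-older-than 20B /backup/rdiff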
There is practically no setup with rdiff-backup. Everything is command
line arguments or external files (e.g., exclude lists) noted in the
arguments. This makes a backup script literally a sequence of
rdiff-backup commands. As I noted in the introduction, rdiff-backup is
smart about escaping characters that are prohibited on target file
systems. It also maintains a log of file ownerships and attributes,
including NTFS ACLs. That's a huge win for disaster recovery.
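A complete backup script can be as short as this (host names and paths
invented for the example):

    #!/bin/sh
    # local /home to a USB drive, honoring an exclude list
    rdiff-backup --exclude-globbing-filelist /etc/backup-excludes \
        /home /mnt/usb/home-backup
    # a remote host's /etc pulled in over SSH
    rdiff-backup root@server.example.com::/etc /backup/server-etc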
Another win is that because it's based on deltas, and those deltas are
compressed, it is vastly more efficient for continuously growing files
like databases and logs and VM images. Since the rdiff algorithm is
based on rsync, it doesn't matter whether the files are text or binary
data. It's all just bits to rdiff.
Now the bad. The big one is that it isn't so obvious how to find a
specific file at a specific date and time. Only the most recent backup
run is in the target directory. All of the compressed deltas are stored
in a subdirectory under the target. Getting at those requires invoking
the rdiff-backup command.
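In practice that means something like this (date and paths invented):

    # see what increments exist for the file
    rdiff-backup --list-increments /mnt/usb/home-backup/alice/report.doc
    # restore it as it existed on a given date
    rdiff-backup --restore-as-of 2013-11-15 \
        /mnt/usb/home-backup/alice/report.doc /tmp/report.doc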
rdiff-backup runs are slower than comparable rsnapshot runs. Calculating
and compressing deltas is more CPU-intensive than GNU cp and rsync runs.
rdiff-backup's efficiency comes at a price.
There they are. Two very different backup systems built on the same
rsync algorithm.
--
Rich P.