[Discuss] Deduplication
Kent Borg
kentborg at borg.org
Thu Sep 5 00:06:52 EDT 2024
For many years now I've been good about keeping offline backups on
(encrypted) external disks. I have been backing up my daily computer(s)
over several generations of said computers, which means I manage to put
large amounts of data on big disks the modern way: by collecting and
storing duplicates of stuff.
I am a big fan of rsync's "--link-dest" feature, so complete backup
trees actually share, as hard links, the files that didn't change. But
sometimes copies (I stored those photos twice?) slip in, or things get
moved.
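
A minimal sketch of how such stray copies could be spotted, assuming a
backup tree where unchanged files are already hard-linked. The function
names are hypothetical and this is not how rsync or duperemove work
internally; it only groups files by content hash while counting each
inode once, so paths that already share storage through hard links are
not reported again.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_unlinked_duplicates(root):
    """Group regular files under root by content, one path per inode.

    Files that are hard links of each other (same device and inode)
    count once, so only copies that still occupy separate space are
    reported.
    """
    # Keep the first path seen for each (device, inode) pair.
    by_inode = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            st = os.stat(path)
            by_inode.setdefault((st.st_dev, st.st_ino), path)

    # Group the surviving paths by content hash.
    by_content = defaultdict(list)
    for path in by_inode.values():
        by_content[file_digest(path)].append(path)

    return [sorted(paths) for paths in by_content.values() if len(paths) > 1]
```

Calling find_unlinked_duplicates() on the top of a backup volume returns
a list of groups, each holding paths with identical content that are not
yet hard links of one another. Hashing every file is slow on big disks;
comparing sizes first would be an obvious refinement.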
So today I ran "duperemove" on a couple of volumes, and it scared up
some non-trivial space. I decided to run it on a third volume.
Nope! It works by telling the kernel to make matching files share the
same extents, but that only works on some filesystems:
- XFS. Yes, I have used that for a long time. It is clever enough to CoW
any changes that are made later, so files that match can later diverge.
- btrfs, which I have been using recently, because god knows it is heavy
in the CoW-ing world.
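
Whether a given volume is a candidate can be guessed from its
filesystem type before running anything. A minimal sketch, assuming
Linux and the /proc/mounts format; fs_type_for() and REFLINK_CAPABLE are
hypothetical names, and the set lists only the two filesystems named
above. A matching type is necessary but not sufficient (an XFS volume
also needs reflink support enabled when it was created).

```python
import os

# Assumption: only the filesystems named in this post.
REFLINK_CAPABLE = {"xfs", "btrfs"}


def fs_type_for(path, mounts_text):
    """Return the filesystem type of the longest mount point containing path.

    mounts_text is text in /proc/mounts format: device, mount point,
    type, options, and two numeric fields, separated by whitespace.
    Mount points with escaped characters (such as \\040 for a space)
    are not decoded in this sketch.
    """
    path = os.path.abspath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # Prefer the most specific (longest) mount point.
            if len(mount_point) > len(best_mount):
                best_mount, best_type = mount_point, fs_type
    return best_type
```

For example, with a hypothetical mount point /mnt/backup:

```python
with open("/proc/mounts") as f:
    print(fs_type_for("/mnt/backup", f.read()) in REFLINK_CAPABLE)
```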
But it doesn't work on any of the extN filesystems. I have used XFS on
my running volumes for a long time, but for backups I guess I stuck
longer with ext4, and maybe even earlier ext versions on some disks. But
they aren't active, so that's okay.
And anyway, backups are backups, not working containers. I'm happy I can
dedup what I can.
-kb, the Kent who has long worried about bit rot in backups, ever
since disks got big enough to hold lots of idle data, and who is
reassured that btrfs CRCs both metadata and data.