Backing up sparse files ... VM's and TrueCrypt ... etc
Edward Ned Harvey
blu-Z8efaSeK1ezqlBn2x/YWAg at public.gmane.org
Sun Feb 21 09:40:50 EST 2010
> The prior info might also explain why rsync is slow in this situation.
> With your use case of a sparse file that's only about 10% used, and your
> point that it still takes time to process the zeros produced by the OS,
> which rsync then has to calculate an MD5 hash of, it can take a while.
Here's a benchmark.
These are empty TrueCrypt volumes, so the nonsparse file takes 5G on disk,
while the sparse one takes 256K on disk, and is "apparently" 5G in length.
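The "on disk" versus "apparent" distinction can be checked programmatically: a file's st_size is its apparent length, while st_blocks counts the 512-byte units actually allocated. A small sketch (the path and the 5G size here are illustrative, not the actual TrueCrypt volumes from the benchmark):

```python
import os
import tempfile

# Create an "apparently" 5G file that occupies almost no disk blocks:
# truncate() extends the file without writing data, leaving one big hole.
path = os.path.join(tempfile.mkdtemp(), "sparse.bin")
with open(path, "wb") as f:
    f.truncate(5 * 1024**3)

st = os.stat(path)
apparent = st.st_size          # 5G apparent length
on_disk = st.st_blocks * 512   # st_blocks is in 512-byte units
print(apparent, on_disk)
```

On most filesystems the second number will be near zero, matching the 256K-vs-5G gap above (the exact allocation is filesystem-dependent).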
$ time cat truecrypt-5G-sparsefile.tc > /dev/null
real 0m6.854s
$ time cat truecrypt-5G-nonsparsefile.tc > /dev/null
real 1m33.533s

$ time md5sum truecrypt-5G-sparsefile.tc > /dev/null
real 0m18.398s
$ time md5sum truecrypt-5G-nonsparsefile.tc > /dev/null
real 1m25.641s

$ time gzip --fast -c truecrypt-5G-sparsefile.tc > /dev/null
real 0m37.922s
$ time gzip --fast -c truecrypt-5G-nonsparsefile.tc > /dev/null
real 4m35.956s
> What you really need is a hypothetical sparse_cat that is file system
> aware and can efficiently skip over the unused sectors. Or better yet,
> the equivalent functionality built-in to your archiving tool.
I agree, that would be nice. However, as the benchmark above shows, you may
be overestimating the time it takes to read or md5sum all the 0's in the
holes of sparse files. The hypothetical sparse_cat would improve
performance, but only marginally.
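For what it's worth, on Linux the hypothetical sparse_cat is buildable today with lseek()'s SEEK_DATA/SEEK_HOLE support: seek to the next allocated region, copy it, and skip the hole. A sketch (the function name comes from the discussion above; whether SEEK_DATA is honored depends on the kernel and filesystem, and some filesystems simply report the whole file as one data region):

```python
import os

def sparse_cat(path, out):
    """Copy only the allocated regions of path to out, skipping holes.

    Relies on lseek(SEEK_DATA)/lseek(SEEK_HOLE); on filesystems without
    hole-reporting support this degrades to copying the whole file.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        end = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < end:
            try:
                data = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError:
                break  # ENXIO: nothing but a hole from offset to EOF
            hole = os.lseek(fd, data, os.SEEK_HOLE)
            os.lseek(fd, data, os.SEEK_SET)
            remaining = hole - data
            while remaining > 0:
                chunk = os.read(fd, min(remaining, 1 << 20))
                if not chunk:
                    break
                out.write(chunk)
                remaining -= len(chunk)
            offset = hole
    finally:
        os.close(fd)
```

Note this only skips the read-side work; an archiver would also need to record the hole offsets so the file can be reconstructed sparse on the far end.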
> Basically they use a VMware tool to backup the VM image, and then rsync
> that backup file.
Oh la la. That might be ok for them, having already bought the license for
other purposes, but it's $995 or higher, as far as I can tell.