ZFS and block deduplication
Mark Woodward
markw-FJ05HQ0HCKaWd6l5hS35sQ at public.gmane.org
Fri Apr 22 12:07:09 EDT 2011
On 04/22/2011 12:00 PM, discuss-request-mNDKBlG2WHs at public.gmane.org wrote:
> Message: 15
> Date: Fri, 22 Apr 2011 11:53:23 -0400
> From: David Rosenstrauch <darose-prQxUZoa2zOsTnJN9+BGXg at public.gmane.org>
> Subject: Re: ZFS and block deduplication
>
> On 04/22/2011 11:41 AM, Mark Woodward wrote:
>> I have been trying to convince myself that the SHA2/256 hash is
>> sufficient to identify blocks on a file system. Is anyone familiar with
>> this?
>>
>> The theory is that you take a hash of a block on the disk, and the
>> hash, which is smaller than the actual block, is unique enough that the
>> probability of any two blocks producing the same hash is lower than the
>> probability of hardware failure.
>>
>> Given a small enough block size with a small enough set size, I can
>> almost see it as safe enough for backups, but I certainly wouldn't put
>> mission-critical data on it. Would you? Tell me how I'm flat out wrong.
>> I need to hear it.
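For what it's worth, the back-of-envelope argument is the birthday bound:
for n random blocks and a b-bit hash, the probability of any collision is
at most about n^2 / 2^(b+1). A quick Python sketch (my own illustration,
not anything from the ZFS docs):

    # Birthday bound: P(any collision among n blocks) <= n*(n-1) / 2^(b+1)
    # for a b-bit hash, assuming the hash output behaves like a uniform
    # random function. That uniformity assumption is the whole question.
    def collision_bound(n_blocks, hash_bits=256):
        return n_blocks * (n_blocks - 1) / 2.0 ** (hash_bits + 1)

    # An exabyte of unique 4 KiB blocks: n = 2**60 / 2**12 = 2**48 blocks.
    print(collision_bound(2 ** 48))   # ~3e-49, far below hardware error rates

Even at exabyte scale that is dozens of orders of magnitude below any
drive's undetected-error rate, which, as I understand it, is exactly the
argument the ZFS folks make.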
> If you read up on the rsync algorithm
> (http://cs.anu.edu.au/techreports/1996/TR-CS-96-05.html), its author
> uses a combination of two different checksums to determine block
> uniqueness. And, IIRC, even then it still does an additional final check
> to make sure that the copied data is correct (and copies again if not).
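For anyone who hasn't read the paper, the two-checksum idea is roughly
this (a simplified sketch of mine; rsync actually pairs a rolling
checksum with MD4, not Adler-32 with MD5):

    import hashlib
    import zlib

    def blocks_match(block_a, block_b):
        # Cheap weak checksum first; it rejects almost all non-matches.
        if zlib.adler32(block_a) != zlib.adler32(block_b):
            return False
        # Only compute the expensive strong hash on a weak-checksum hit.
        return hashlib.md5(block_a).digest() == hashlib.md5(block_b).digest()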
That's rsync, and I tend to agree with that level of paranoia. Take a
look at this link:
http://blogs.sun.com/bonwick/entry/zfs_dedup
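Bonwick describes a verify option there: when two blocks hash the same,
ZFS can do a full byte-for-byte comparison before sharing them, which is
essentially the same paranoia as rsync's final check. A toy sketch of the
idea (the table layout and names are mine, not ZFS internals):

    import hashlib

    # Toy in-memory dedup table: digest -> list of stored blocks.
    dedup_table = {}

    def store_block(block, verify=True):
        digest = hashlib.sha256(block).digest()
        candidates = dedup_table.setdefault(digest, [])
        for existing in candidates:
            # With verify on, share only after a byte-for-byte match;
            # with verify off, trust the hash alone.
            if not verify or existing == block:
                return digest        # reuse the existing copy
        candidates.append(block)     # first copy, or a genuine collision
        return digest

With verify on, a collision costs you one duplicate block instead of
silently handing back the wrong data.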