ZFS and block deduplication
David Rosenstrauch
darose-prQxUZoa2zOsTnJN9+BGXg at public.gmane.org
Fri Apr 22 11:53:23 EDT 2011
On 04/22/2011 11:41 AM, Mark Woodward wrote:
> I have been trying to convince myself that the SHA-256 hash is
> sufficient to identify blocks on a file system. Is anyone familiar with
> this?
>
> The theory is that you take the hash of a block on a disk, and the
> hash, which is smaller than the actual block, is unique enough that the
> probability of any two blocks producing the same hash is actually less
> than the probability of hardware failure.
>
> Given a small enough block size with a small enough set size, I can
> almost see it as safe enough for backups, but I certainly wouldn't put
> mission-critical data on it. Would you? Tell me how I'm flat out wrong.
> I need to hear it.
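The birthday bound makes that intuition concrete: with an n-bit hash,
the chance of any collision among k random blocks is roughly
k^2 / 2^(n+1). A quick back-of-the-envelope sketch in Python (the pool
size below is an illustrative assumption, not a real ZFS figure):

    import math

    def collision_probability(num_blocks, hash_bits):
        """Birthday-bound estimate of the chance that at least two of
        num_blocks uniformly random hash values collide."""
        # P ~= 1 - exp(-k*(k-1) / 2^(n+1)); expm1 keeps precision
        # when the probability is vanishingly small.
        exponent = -num_blocks * (num_blocks - 1) / (2.0 ** (hash_bits + 1))
        return -math.expm1(exponent)

    # 2^38 blocks is roughly 32 PiB of data at a 128 KiB block size.
    print(collision_probability(2 ** 38, 256))   # ~3.3e-55

Even at that scale the collision probability comes out astronomically
smaller than any plausible rate of undetected hardware error.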
If you read up on the rsync algorithm
(http://cs.anu.edu.au/techreports/1996/TR-CS-96-05.html), you'll see it
uses a combination of two different checksums (a fast, weak rolling
checksum plus a stronger hash) to determine block uniqueness. And,
IIRC, even then it still does an additional final check over the whole
copied file to make sure the data is correct (and copies again if not).
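To make that layering concrete, here's a minimal sketch of the
two-checksums-plus-verify idea (my own illustration, not rsync's actual
code; the weak checksum follows the rolling sum from the paper, and
MD5 stands in for the MD4 the paper used):

    import hashlib

    def weak_checksum(block):
        """Fast rolling checksum in the style of the rsync paper:
        cheap to compute and to slide along a stream byte by byte."""
        a = sum(block) % (1 << 16)
        b = sum((len(block) - i) * byte
                for i, byte in enumerate(block)) % (1 << 16)
        return (b << 16) | a

    def strong_checksum(block):
        return hashlib.md5(block).digest()

    def blocks_match(a, b):
        """Layered comparison: the weak checksum screens out most
        non-matches cheaply, the strong checksum confirms, and the
        byte-for-byte compare plays the role of rsync's final pass
        (which rsync actually runs over the whole copied file)."""
        if weak_checksum(a) != weak_checksum(b):
            return False
        if strong_checksum(a) != strong_checksum(b):
            return False
        return a == b

The point being: even a design that leans heavily on checksums keeps
one cheap, definitive verification step in the loop rather than
trusting the hash alone.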
DR