[Discuss] Backing up LVM partitions using snapshots

markw at mohawksoft.com markw at mohawksoft.com
Sun Dec 11 21:15:40 EST 2011


> On Dec 11, 2011, at 2:54 PM, markw at mohawksoft.com wrote:
>>
>> How?
>
> Let's posit a 7 day backup cycle.  On day 1 you do your full backup (all
> volume blocks).  On days 2 through 7 you do incremental backups (changed
> volume blocks).
>
> Let's say that the volume contains your company's code repository.  Users
> are working on this volume almost constantly.  Regardless of what they do,
> the file system blocks are going to be changing just as frequently, and
> what you see on day 5 of your cycle may not look at all like what you saw
> on day 1.
>
> As a specific example, I create a file on day 1 that spans 3 volume
> blocks.  On day 2 I change the first block but the remaining two blocks
> remain unchanged (I'm doing random I/O for performance reasons, just as
> one would with a database).  On day 3 I make changes to the second block.
> Your block-level backups for this file would contain 3 blocks on day 1, 1
> block on day 2, and 1 block on day 3.
>
> On day 6 I accidentally delete the file and I contact you to restore it.
> You go to your day 1 backup and do the full restore which gets the
> original version of the file.  Then you go to restore the day 2
> incremental and find that it is unusable.  You can certainly restore the
> day 3 incremental but I'm still missing the block of data backed up on day
> 2, a block of data that cannot be recovered because the backup of it is
> gone.  Much of my work is lost and I have to do it all over again.
>
> Now, consider the effect of missing blocks on directory data and inode
> tables.
>

Your whole scenario hinges on a big misunderstanding of how block-level
backup works.

Let's say our block size is 8K and the volume you want to back up is
512G. That means your first backup contains 62.5 million blocks. Let's
also assume you've zeroed out the empty space of the volume and are using
block-level deduplication, much as ZFS does. You only really back up 30
million blocks of data; the rest are zero.

You are left with two components: the "data", addressed by a hash code
(e.g. SHA-2), and the "structure" of your disk, which is a linear list of
block references, one per block position.
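
To make that concrete, here's a rough Python sketch of what a full
backup produces under those assumptions (8K blocks, SHA-2 hashing; the
function and variable names are mine, and 'store' is just a dict
standing in for whatever content-addressed storage you use):

    import hashlib

    BLOCK_SIZE = 8192          # 8K blocks, per the numbers above
    ZERO = None                # marker for an all-zero block position

    def full_backup(volume_path, store):
        """Read the volume block by block.  Unique block contents go
        into 'store' keyed by SHA-256 hash; the returned 'structure'
        is a linear list with one entry per block position."""
        structure = []
        with open(volume_path, 'rb') as vol:
            while True:
                block = vol.read(BLOCK_SIZE)
                if not block:
                    break
                if not any(block):                  # zeroed empty space
                    structure.append(ZERO)
                    continue
                digest = hashlib.sha256(block).hexdigest()
                store.setdefault(digest, block)     # dedup: keep one copy
                structure.append(digest)
        return structure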

Now, say you make a HUGE (5%) change to your disk: you back up 3 million
changed blocks. You overlay those new blocks onto the old block list and
create a new list for the backup. Even though you've done an incremental
backup, you still have a "whole" representation of the volume.
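
An incremental is then just an overlay on the previous list. Continuing
the sketch above (the 'changed' map is hypothetical; in practice it
would come from something like an LVM snapshot's copy-on-write table):

    def incremental_backup(changed, prev_structure, store):
        """'changed' maps block index -> new 8K block contents.  The
        result is a complete structure list for the new point in
        time, even though only the changed data was stored."""
        structure = list(prev_structure)            # yesterday's list
        for index, block in changed.items():
            if not any(block):
                structure[index] = ZERO
            else:
                digest = hashlib.sha256(block).hexdigest()
                store.setdefault(digest, block)
                structure[index] = digest
        return structure

Note the key property: the incremental stores only the 3 million
changed blocks, but the list it returns addresses all 62.5 million
positions, so each day's backup is a complete picture on its own.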

These lists can be used to calculate the deltas between any two
historical points, or between a historical point and an arbitrary
snapshot.
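
That calculation is just a positional comparison of two lists, e.g.:

    def delta(structure_a, structure_b):
        """Block positions that differ between any two historical
        points (both lists describe the same volume size)."""
        return [i for i, (a, b)
                in enumerate(zip(structure_a, structure_b))
                if a != b]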

Here's the other scenario: you can create a snapshot of any volume.
Using the change log, or by scanning the volume directly, you can use the
block-level data and the block list to re-create a previous point-in-time
state even if you've lost the original snapshot volumes.
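
Re-creating that point in time is then a walk down the chosen list,
pulling each block back out of the store; continuing the same sketch:

    def restore(structure, store, out_path):
        """Rebuild the volume image for one point in time from its
        structure list and the deduplicated block store."""
        with open(out_path, 'wb') as out:
            for entry in structure:
                if entry is ZERO:
                    out.write(b'\x00' * BLOCK_SIZE)
                else:
                    out.write(store[entry])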
