How do hard drives handle bad blocks nowadays?
Chuck Anderson
cra-WCkJK2/AXBA at public.gmane.org
Sun Apr 3 17:57:54 EDT 2011
On Sun, Apr 03, 2011 at 05:00:27PM -0400, MBR wrote:
> It's now two decades later, and I'm trying to understand what's changed
> since then. In particular I recently cloned a laptop drive (IDE) to a
> new drive. When I did so, I encountered 2 bad blocks on the new drive.
> Based on my recollection from the late 1980s, I didn't think 2 bad
> blocks was a big deal because I assumed I could manually enter their
> addresses into the bad block list and they'd be replaced by spare
> blocks. But I haven't managed to find a tool to allow me to examine
> and/or edit the bad block list.
Modern ATA (IDE) drives do this remapping automatically, and
transparently to the host system--the LBA block number stays the same,
but the underlying physical sector is moved by the drive firmware to a
spare sector that was reserved for this purpose. Apparently, this
feature can be turned on and off with hdparm -D.
SCSI drives can also do this, but may ship with it turned off by
default, since they are expected to be used in RAID arrays and servers
that handle this kind of disk management at a higher level.
> After doing some web searches and a bit of reading on this, I get the
> impression that nowadays all modern drives implement S.M.A.R.T.
> (Self-Monitoring, Analysis, and Reporting Technology) and that using
> S.M.A.R.T. they all handle this behind the scenes. If that's true, then
> presumably the only time I should ever see a disk report a bad block is
> when there are no more spare blocks left. Am I right about that?
The remapping only happens on write, not read. This is so that you
can keep trying to read a bad block in the hopes that you might
eventually recover the data with a good read or partial good read.
Once you write to the sector, the drive then attempts the reallocation.
After it is reallocated, there is no easy way to get at the old
sector's data--it is effectively orphaned on the disk. (If that old
sector happened to have sensitive data on it, there is now no way for
you to erase it, hence the development of the anti-forensic information
splitter (AF splitter) used by encryption schemes such as LUKS to
mitigate this issue.)
I've had drives that were stubborn about reallocating automatically
with "normal" overwrites. I had to poke the sectors manually with
hdparm:
hdparm --read-sector <sector-number> /dev/foo # check if it's really bad
hdparm --yes-i-know-what-i-am-doing --write-sector <sector-number> /dev/foo # zero the sector to repair (reallocate) it; its old contents are lost
> If so, then the fact that I encountered write errors on two blocks on
> the drive suggests that the brand new drive was in pretty bad shape to
> begin with.
Check smartctl -a /dev/foo and look for "pending" and "reallocated"
sectors. I usually replace a disk once it starts getting any of
those. A new disk shouldn't have any IMO, and I'd RMA it if that were
the case. I do have some older drives that were given to me that have
1 or 2 reallocated sectors that I might use for scratch storage as
long as the pending or reallocated counts don't keep increasing.
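A minimal sketch for pulling just those two counters out of the
attribute table, so they are easy to compare from one check to the
next. The function name smart_bad_sector_counts is made up for
illustration, and it assumes the usual ATA attribute layout that
smartctl -A prints (attribute name in column 2, raw value in column
10); some drives report these attributes under slightly different
names.

```shell
#!/bin/sh
# Print the raw values of the two SMART attributes that track
# reallocated and pending sectors. Reads `smartctl -A` output on stdin.
# Assumption: attribute name is field 2 and the raw value is field 10.
smart_bad_sector_counts() {
    awk '$2 == "Reallocated_Sector_Ct" || $2 == "Current_Pending_Sector" {
        print $2 "=" $10
    }'
}

# Usage (as root; /dev/foo is a placeholder device):
#   smartctl -A /dev/foo | smart_bad_sector_counts
```

If either number is nonzero on a new disk, or keeps growing on an old
one, that is the signal I'd act on.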
> Is there some tool that will allow me to examine the disk's bad block list?
For ATA, I'm not aware of how to examine the defect list. For SCSI,
you can use sdparm or sg3_utils. smartctl -a will at least tell you
how many have been reallocated.
I usually do the following to test suspect drives:
smartctl -l selftest /dev/foo # look for existing test results
smartctl -t short /dev/foo # do a quick test
smartctl -l selftest /dev/foo # look at the results once it finishes
smartctl -t long /dev/foo # do a long test (could take an hour or more)
smartctl -l selftest /dev/foo # look at the results once it finishes
> Also, should I use 'dd' to test all blocks before I put a drive into
> service, or is there a better tool out there?
Besides the above tests, I've often used dd for reading and writing
the entire drive as an extra sanity test, and to force overwrites and
possibly reallocate any bad sectors:
dd if=/dev/zero of=/dev/foo bs=32M # overwrite every sector (destroys all data on /dev/foo)
dd if=/dev/foo of=/dev/null bs=32M # read every sector back
In another window:
while true; do killall -USR1 dd; sleep 10; done
Watch the first window for once-per-10-second status updates from dd :-)
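As an alternative to the signal loop, newer GNU dd builds (coreutils
8.24 or later, as far as I know) accept status=progress and print the
running byte count themselves. The sketch below runs the same
write-then-read pass against a small scratch file so that it is safe to
try; the target variable and the 4 MiB size are only for illustration,
and on a real drive you would use of=/dev/foo and drop the count.

```shell
#!/bin/sh
# Same write pass and read pass as above, with dd reporting its own
# progress. Runs on a scratch file here; point it at the real device
# only when you mean to wipe it.
target=$(mktemp)
trap 'rm -f "$target"' EXIT   # remove the scratch file when the shell exits

# Write pass: fill the target with zeros (4 blocks of 1 MiB).
dd if=/dev/zero of="$target" bs=1M count=4 status=progress

# Read pass: read everything back and discard it.
dd if="$target" of=/dev/null bs=1M status=progress
```

If your dd rejects status=progress, the killall -USR1 loop still works.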