[Discuss] raid issues

Fri Jun 20 09:15:18 EDT 2014

> From: discuss-bounces+blu=nedharvey.com at blu.org [mailto:discuss-
> bounces+blu=nedharvey.com at blu.org] On Behalf Of Stephen Adler
> 
> Any comments on how to deal with say a 16 disks and what's the current
> lore on making large redundant disk arrays?

However you decide to do your redundancy, try to eliminate as many single points of failure as possible.  For example, I strongly prefer to use ZFS with mirrors, where each disk is mirrored to another disk on a different bus.  For these setups, I prefer absolutely dumb SATA/SAS buses with no hardware RAID, buffering, or caching of any kind, so those disks can be plugged into any dumb SATA/SAS system.  That way, if an entire HBA fails, your filesystem is still up and running.
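
To make the cross-bus mirror layout concrete, here is a rough sketch, assuming the 16 disks are split evenly across two dumb HBAs.  The device paths and the pool name "tank" are made up for illustration (substitute whatever your system actually reports), and the function names are my own.  It only builds and prints the `zpool create` command, pairing disk N on one bus with disk N on the other; it doesn't run anything.

```python
from typing import List, Tuple


def mirror_pairs(bus_a: List[str], bus_b: List[str]) -> List[Tuple[str, str]]:
    """Pair each disk on one bus with a disk on the other bus."""
    if len(bus_a) != len(bus_b):
        raise ValueError("both buses need the same number of disks")
    return list(zip(bus_a, bus_b))


def zpool_create_command(pool: str, bus_a: List[str], bus_b: List[str]) -> str:
    """Build a 'zpool create' command with one two-way mirror per disk pair."""
    parts = ["zpool", "create", pool]
    for disk_a, disk_b in mirror_pairs(bus_a, bus_b):
        # Each mirror vdev spans both buses, so losing one HBA
        # leaves one working half of every mirror.
        parts += ["mirror", disk_a, disk_b]
    return " ".join(parts)


if __name__ == "__main__":
    # Hypothetical device paths: 8 disks behind each of two HBAs.
    hba0 = [f"/dev/disk/by-path/hba0-disk{i}" for i in range(8)]
    hba1 = [f"/dev/disk/by-path/hba1-disk{i}" for i in range(8)]
    print(zpool_create_command("tank", hba0, hba1))
```

With this layout every mirror has one member on each HBA, which is what keeps the pool online if a whole HBA dies.  Review the printed command against your real device names before running it by hand.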

If you're doing hardware RAID, you probably won't be able to distribute the redundancy across separate buses (though you can get redundant HBAs on things like SANs, with multipath, if you have the budget for it).  But assuming you're talking about an individual server with an individual HBA, just make sure you have a super awesome warranty on that system, because the HBA will do something proprietary, and the only sure way to recover after replacing a failed HBA is to have a fully supported, compatible (identical) replacement HBA.

Actually, depending on the mode of failure, a failing HBA can also spray garbage bits all over the disks before its brains explode violently, rendering recovery impossible even if you have a fully supported, compatible (or identical) replacement HBA.

Nothing is a substitute for backups.