admins worst nightmare...

Edward Ned Harvey blu-Z8efaSeK1ezqlBn2x/YWAg at public.gmane.org
Wed Mar 10 08:29:56 EST 2010


> More particulars.... Backup system #1 is a Dell server which I bought
> about 5 months ago. Red Hat Enterprise Linux 5.4 did not run well on it,

I am curious what you mean by that.  I use RHEL all over the place on Dell
servers.  I can't say "never a problem," any more than I could say that
about any other system.  However, I wouldn't say there's any fundamental
problem with running RHEL on Dell servers.


> disks, (the ones with the long mean time between failures.) But 4 of
> those disks are hanging off the Dell system via external SATA
> connections. I bought a PCI SATA controller with 4 external SATA
> connectors. Finally, the 6 terabyte file system is made up of the 7
> terabyte drives running a software RAID 5 array. Also, I have the
> smartd tools running, doing nightly and weekly checks. With all that in
> place, there were no warnings of errors on the file system. Which makes
> me think there is a bug in ext3/md raid5 or the PCI eSATA controller
> card is mucked up. I still have to verify the memory, which is supposed
> to be ECC memory.

Given the new info - you're using "generic" disks (meaning not Dell branded)
in external enclosures, on a commodity SATA controller instead of the Dell
enterprise controller - I would most strongly suspect the disks, particularly
the external ones, or the controller.  It's still possible the problem is
the CPU or memory or something else, but probably not.

I have many such personal experiences, which give me the personal belief
that "unsupported" commodity low-cost hardware has a much higher probability
of failure and undetectable or unfixable problems, as compared to the
"official" manufacturer recommended and supported solution.

The recommended solution for the setup you described would be either the
Dell built-in SAS or PERC controller, with Dell branded disks in the drive
bays, or if you needed additional disk slots, then a Dell DAS with Dell
disks connected to a Dell controller card.  

This is not a testament to the greatness of Dell.  Or HP or Sun or anybody.
They're all the same in this regard.  They build a server expecting you to
use their branded peripherals, which they have designed, attached, and
tested specifically for that purpose.  If you deviate from that, it may
work just as well, it may fail silently (as in your case), or it may fail
dramatically.  I've seen all of these take place.

I have a 2-page essay written on this subject, if you'd like any further
persuasion.  ;-)

Oh, by the way, for the purposes you've described, I would highly encourage
Solaris or OpenSolaris and ZFS instead of Linux and ext3.  ZFS would most
likely have saved you from data loss in this case, because it checksums
every block it writes and verifies the checksum on every read, so silent
corruption is detected and, with a redundant pool (mirror or raidz),
repaired from a good copy.  I will admit that the Solaris or OpenSolaris
installation process on a Dell server can be confusing or sketchy, but as
long as Solaris is listed as a purchase option for that model of server, you
can rest assured it will work.
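
To make the checksumming point concrete, here is a minimal Python sketch of
the general idea: keep a digest next to each block, check it on every read,
and walk all blocks periodically (a "scrub").  This is only an illustration
and not how ZFS is implemented.  The class name ChecksummedStore, its method
names, and the choice of SHA-256 are all made up for this example.

```python
import hashlib


class ChecksummedStore:
    """Toy in-memory block store (hypothetical, for illustration only)."""

    def __init__(self):
        self.blocks = {}     # block number -> raw bytes
        self.checksums = {}  # block number -> SHA-256 hex digest

    @staticmethod
    def _digest(data):
        return hashlib.sha256(data).hexdigest()

    def write(self, number, data):
        # Record the digest at write time, while the data is known good.
        self.blocks[number] = data
        self.checksums[number] = self._digest(data)

    def read(self, number):
        # Refuse to return data that no longer matches its digest.
        data = self.blocks[number]
        if self._digest(data) != self.checksums[number]:
            raise IOError(f"checksum mismatch in block {number}")
        return data

    def scrub(self):
        # Return the numbers of all blocks that fail verification.
        return [n for n, data in sorted(self.blocks.items())
                if self._digest(data) != self.checksums[n]]
```

If a block is altered behind the store's back (for example by assigning new
bytes directly to store.blocks[1], standing in for a bad cable or
controller), read(1) raises an error and scrub() reports block 1, instead of
the bad data being handed back without complaint.  Detection is all this
sketch does; repairing the block would additionally need a second good copy
or parity to rebuild from.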





