fault detection
Scott Prive
Scott.Prive at storigen.com
Mon Aug 19 17:51:26 EDT 2002
You could have faulty memory. The memory test that runs when the BIOS or an OS starts is very limited.
If you have a faulty memory chip, you increase the chances of "hitting it" by running some high-load benchmarks or system tests.
Tools such as cpuburn, memtest, and the Linux Test Project are all pretty rough on the kernel and hardware, so they are disruptive to a production server. If the problem is intermittent, though, they could help you triage more quickly than waiting for the next crash.
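To give a rough idea of what a pattern-based memory test does, here is a minimal user-space sketch in Python. It is only an illustration, not how memtest or cpuburn actually work: the function names (find_mismatches, pattern_test), the pattern set, and the buffer size are all my own assumptions. A user-space program only touches whatever pages the OS hands it, and CPU caches may satisfy the reads, so it is no substitute for a dedicated tester run from boot.

```python
# Byte patterns of the kind memory testers often use: all zeros,
# all ones, and two alternating-bit patterns (an assumed set).
PATTERNS = (0x00, 0xFF, 0x55, 0xAA)


def find_mismatches(buf, pattern):
    """Return (offset, expected, actual) for every byte that differs from pattern."""
    return [(i, pattern, b) for i, b in enumerate(buf) if b != pattern]


def pattern_test(size_bytes, patterns=PATTERNS):
    """Fill a buffer with each pattern in turn and read it back.

    Returns a list of (offset, expected, actual) tuples; an empty list
    means every byte read back as written.
    """
    buf = bytearray(size_bytes)
    errors = []
    for pattern in patterns:
        buf[:] = bytes([pattern]) * size_bytes   # write pass
        if buf.count(pattern) != size_bytes:     # quick read-back check
            errors.extend(find_mismatches(buf, pattern))
    return errors


if __name__ == "__main__":
    size = 16 * 1024 * 1024   # 16 MiB per pass; an arbitrary choice
    for n in range(3):        # repeat, since intermittent faults may not show at once
        bad = pattern_test(size)
        print("pass %d: %d mismatches" % (n + 1, len(bad)))
```

Any non-zero mismatch count would be worth following up with a proper boot-time memory test; a clean run proves very little.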
-----Original Message-----
From: FRamsay at castelhq.com [mailto:FRamsay at castelhq.com]
Sent: Monday, August 19, 2002 3:42 PM
To: discuss at blu.org
Subject: fault detection
Does anyone know of any tools to help figure out why a box rebooted? One of our client boxes rebooted over the weekend for no apparent reason. The client claimed there was no power outage, and a quick look over the logs verifies the UPS didn't shut the computer down. Also, I didn't see a shutdown or reboot request in /var/log/messages. So what tools do people use to figure out why a Linux system crashed?
The system is running Red Hat 7.2, kernel 2.4.9-13.
-fjr
Frank Ramsay
Systems Programmer
Castel, Inc
14 Summer St, 3rd Floor
Malden, MA 02148
(781) 324-0140 (voice)
(781) 324-0277 (fax)
Email: framsay at castel.com
_______________________________________________
Discuss mailing list
Discuss at blu.org
http://www.blu.org/mailman/listinfo/discuss