decoding MCE Logs? Possible hardware issue?
Derek Atkins
warlord-DPNOqEs/LNQ at public.gmane.org
Tue Sep 28 09:48:59 EDT 2010
Hey,
I noticed the following in my mcelog, and I was hoping someone could
help me decode this. My google-fu has not led me to an answer.
I'm running a Supermicro H8DA3-2 with two Quad-Core AMD Opteron(tm)
Processor 2378 and 16GB of RAM (8 sticks of ACTICA DDR2 667 2GB ECC REG)
purchased with the machine in Jan, 2009.
Is this a memory issue? I'm worried because this is a VM server. After
I upgraded the host from Fedora 10 to Fedora 13 many of my VM guests
started getting virtual disk errors, some of them severe enough to cause
the virtual disk to turn read-only and break the system! I've backed
down to a F10 kernel in the hopes that it will keep my VMs up longer,
but if it's a hardware problem I'd like to diagnose and fix it quickly.
Of course it's a production server, so I can't take it down and run
memtest86 on it. :(
Suggestions?
-derek
PS: these are the last 3 entries in /var/log/mcelog. If I grep for CPU
in the log I see CPUs 0 and 2 and BANKs 0, 2, and 4. The rest of the
messages appear to be relatively consistent.
MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 BANK 4 TSC 72e2f92e17ed46
MISC c008000001000000 ADDR 1c88309c0
STATUS 9c6cc450001d017b MCGSTATUS 0
MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 BANK 4 TSC d562f475202ab
MISC c008000001000000 ADDR 114e7fc00
STATUS 9c74ccf8001d011b MCGSTATUS 0
MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 BANK 4 TSC 11f2ad04402fe6
MISC c008000001000000 ADDR 234909fc0
STATUS 9c524484001d011b MCGSTATUS 0
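
For anyone who wants to pick the STATUS words apart by hand, here is a
minimal Python sketch. It only pulls out the generic x86 machine-check
flag bits (63 down to 57) and the low 16-bit error code. The CECC/UECC
positions (bits 46 and 45) are an assumption based on the AMD K8/Family
10h northbridge bank layout, and the helper name decode_status is made
up for illustration. The model-specific code in bits 31:16 is returned
raw, since interpreting it needs the vendor documentation.

```python
# Sketch: split an MCi_STATUS value from mcelog into its flag bits.
# Bits 63..57 and the low 16-bit error code follow the generic x86
# machine-check layout. Bits 46/45 (CECC/UECC) are an ASSUMPTION that
# applies to AMD K8/Family 10h northbridge banks; other CPUs differ.

FLAGS = {
    63: "VAL",    # register contains a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected by hardware
    60: "EN",     # reporting for this error was enabled
    59: "MISCV",  # MISC register is valid
    58: "ADDRV",  # ADDR register is valid
    57: "PCC",    # processor context may be corrupt
    46: "CECC",   # correctable ECC error (assumed AMD-specific bit)
    45: "UECC",   # uncorrectable ECC error (assumed AMD-specific bit)
}


def decode_status(status_hex):
    """Return the set flags and raw code fields of a STATUS hex string."""
    status = int(status_hex, 16)
    flags = [name for bit, name in sorted(FLAGS.items(), reverse=True)
             if (status >> bit) & 1]
    return {
        "flags": flags,
        "error_code": status & 0xFFFF,          # bits 15:0
        "model_code": (status >> 16) & 0xFFFF,  # bits 31:16, left raw
    }


if __name__ == "__main__":
    for value in ("9c6cc450001d017b",
                  "9c74ccf8001d011b",
                  "9c524484001d011b"):
        d = decode_status(value)
        print(value, " ".join(d["flags"]),
              "code=%04x" % d["error_code"],
              "model=%04x" % d["model_code"])
```

Under those assumptions, all three entries above come out as VAL, EN,
MISCV, ADDRV and CECC with UC and PCC clear, which would point at
corrected rather than uncorrected errors. That reading depends on the
bit layout assumed here, so it should be checked against the AMD
documentation for this processor family.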
--
Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
Member, MIT Student Information Processing Board (SIPB)
URL: http://web.mit.edu/warlord/ PP-ASEL-IA N1NWH
warlord-DPNOqEs/LNQ at public.gmane.org PGP key available