mcelog reports AMD DRAM Parity Error?
Jarod Wilson
jarod-ajLrJawYSntWk0Htik3J/w at public.gmane.org
Tue Nov 23 09:36:59 EST 2010
On Nov 23, 2010, at 9:26 AM, Derek Atkins wrote:
> Jarod Wilson <jarod-ajLrJawYSntWk0Htik3J/w at public.gmane.org> writes:
>
>> On Nov 19, 2010, at 10:10 AM, Derek Atkins wrote:
>>
>>> Jarod Wilson <jarod-ajLrJawYSntWk0Htik3J/w at public.gmane.org> writes:
>>>
>>>> On Nov 18, 2010, at 10:30 AM, Derek Atkins wrote:
>> ...
>>>>> Does this mean I have a busted CPU? Or busted RAM?
>>>>
>>>> RAM. However, it's not a fatal error, it's simply a corrected
>>>> ECC error. I'm told this is all a single event here, and the
>>>> event was the corrected ECC error, anyway. So you might want
>>>> to replace some memory at some point, but hey, it's ECC memory
>>>> doing what it's designed to do here.
>>>
>>> Is there an easy way to figure out which bank of RAM had the error?
>>>
>>> I guess I can wait until I have another issue...
>>
>> It's a mixed bag. For some boards, it's quite simple; for others, well,
>> not so much... I'm particularly unsure how to do it with mcelog,
>> but at least with EDAC, there's an edac-utils userspace package that can,
>> among other things, upload an address/bank-to-slot
>> mapping for specific motherboards...
>
> In my case it's a SuperMicro H8DA3-2 with two Quad-Core AMD Opteron(tm)
> Processor 2378 CPUs. Would edac work here?
I believe amd64_edac should work on that board, and SuperMicro boards
usually are reasonably well understood and supported by edac-utils,
if I'm thinking clearly, so yeah, there's a decent chance it would do
the trick for you here.
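On the "which bank had the error" question, here is a rough sketch of reading the counters directly. It assumes amd64_edac is loaded and exposes the legacy per-chip-select-row attributes under /sys/devices/system/edac/mc/ (mcN/csrowN/chM_ce_count and chM_dimm_label); the exact layout varies by kernel version, and newer kernels also offer a dimmN/ layout that this ignores. The chM_dimm_label values stay empty unless something like edac-utils has registered a slot mapping for the board. The function name corrected_errors_by_slot is purely illustrative.

```python
import glob
import os

# Assumed location of the EDAC sysfs tree (legacy csrow layout).
EDAC_ROOT = "/sys/devices/system/edac/mc"


def read_value(path):
    """Return the stripped contents of a sysfs attribute, or None if unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def corrected_errors_by_slot(root=EDAC_ROOT):
    """Map (controller, csrow, channel) -> (dimm_label, corrected_error_count).

    Hypothetical helper: assumes csrowN/chM_ce_count and csrowN/chM_dimm_label
    attributes exist; any other sysfs layout is silently skipped.
    """
    results = {}
    for csrow_dir in sorted(glob.glob(os.path.join(root, "mc*", "csrow*"))):
        mc = os.path.basename(os.path.dirname(csrow_dir))
        csrow = os.path.basename(csrow_dir)
        for ce_path in sorted(glob.glob(os.path.join(csrow_dir, "ch*_ce_count"))):
            # "ch0_ce_count" -> "ch0"
            channel = os.path.basename(ce_path)[: -len("_ce_count")]
            count = read_value(ce_path)
            label = read_value(os.path.join(csrow_dir, channel + "_dimm_label"))
            if count is not None:
                results[(mc, csrow, channel)] = (label or "", int(count))
    return results


if __name__ == "__main__":
    # Print only the rows/channels that have logged corrected errors.
    for (mc, csrow, ch), (label, count) in sorted(corrected_errors_by_slot().items()):
        if count:
            print(f"{mc}/{csrow}/{ch} label={label!r} corrected={count}")
```

A nonzero count on one chip-select row and channel narrows things down to a DIMM (or pair) even without labels; with a label mapping loaded, the label column names the slot directly.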
> It looks like I have not received a new mcelog entry.. Either that or I
> somehow disabled it a while ago and the mcelog upgrade didn't re-enable
> what I did. (Of course I don't remember what I did, and didn't log
> it..) *sigh*
I've got an Opteron box here that'll go for quite some time without an
event, but a highly threaded kernel compile will just about always
cause one (or several) to pop up. I'm lazy, though; it's been this
way for ages, and hasn't caused any significant issues for me.
Well, and I'm loath to pour more money into replacement RAM for a
box that is almost 5 years old now. :)
--
Jarod Wilson
jarod-ajLrJawYSntWk0Htik3J/w at public.gmane.org