[Discuss] Server won't boot kernel. initramfs problem?
John Abreau
abreauj at gmail.com
Sun Feb 24 15:53:18 EST 2013
Maybe not every 5 minutes, the way most checks are configured in Nagios. But running it once a day, or even once a week, to give Nagios a chance to detect memory errors might be worth the overhead.
It would be sufficient just to detect that bad RAM exists. You have to power off the server anyway to replace a bad DIMM, so once you know you have bad RAM, you can run memtest86 to figure out the details.
It would be better than wasting days or weeks replacing hard drives and reinstalling the OS before thinking to test memory.
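A rough sketch of what such a check might look like, assuming the memtester binary from the URL quoted below is installed and on the PATH. The function names (classify, check_memory) and the 64M test size are made up for illustration. The exit-status bits follow my reading of the memtester man page, so verify them against your installed version before relying on this.

```python
import subprocess

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(status):
    """Map a memtester exit status to (nagios_code, message).

    Assumed from the memtester man page: 0 means success, otherwise the
    status is an OR of 0x01 (could not allocate/lock memory, or bad
    invocation), 0x02 (stuck address test failed), 0x04 (another test failed).
    """
    if status == 0:
        return OK, "MEMTEST OK - no errors found"
    if status < 0:
        # subprocess reports death by signal as a negative return code.
        return UNKNOWN, "MEMTEST UNKNOWN - memtester killed by signal %d" % -status
    if status & 0x06:
        return CRITICAL, "MEMTEST CRITICAL - memory errors (status %d)" % status
    return UNKNOWN, "MEMTEST UNKNOWN - memtester could not run (status %d)" % status


def check_memory(size="64M", loops=1, binary="memtester"):
    """Run memtester for the given number of loops; return (nagios_code, message)."""
    try:
        result = subprocess.run(
            [binary, size, str(loops)],
            stdout=subprocess.DEVNULL,  # memtester's progress output is noisy
            stderr=subprocess.DEVNULL,
        )
    except OSError as exc:
        return UNKNOWN, "MEMTEST UNKNOWN - cannot execute %s: %s" % (binary, exc)
    return classify(result.returncode)
```

To use it as a plugin, the script would end by calling check_memory(), printing the message, and exiting with the returned code. Setting a long check interval on the Nagios service (a day or a week) keeps the overhead down. It only reports that bad RAM exists, which is all the check needs to do.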
On Feb 24, 2013, at 3:16 PM, Bill Bogstad <bogstad at pobox.com> wrote:
> On Sat, Feb 23, 2013 at 12:22 PM, John Abreau <abreauj at gmail.com> wrote:
>> RAM going bad silently is an aggravating problem, and we often don't think
>> to test the RAM when some mysterious error crops up. It would be great if
>> Nagios was able to test RAM automatically.
>>
>> Is it possible to test RAM on a live system, rather than having to boot
>> into memtest86?
>
> There is a user space memory tester that tries to use mlock() to avoid paging.
>
> http://pyropus.ca/software/memtester/
>
> Some obvious caveats would apply:
>
> Kernel memory goes untested (whether code, data, or allocated to buffer caches).
> Memory allocated to other processes only gets tested when they get paged out.
> Since it is all virtual addresses, a failure doesn't tell you where the error occurred.
> Likely to trash your system performance while it runs.
>
> So chances are you probably don't want to use it....
>
> Bill Bogstad