On 1/9/2012 9:20 PM, Basil Chupin wrote:
I needed to go to the memtest86.com site to grab a copy of the test to check the condition of my RAM, and came across a patch for the kernel which purports to use a feature of the kernel to map bad RAM bytes and not use them, thus avoiding the need to replace RAM (perhaps unnecessarily). The patches do not go beyond kernel 2.6.x.
I was wondering if this patch is now built into the kernel, so that patching the kernel is no longer necessary (I am using kernel 3.2, for example)?
Anybody know the answer, please?
BC
There is a built-in feature that can do the same thing. Just use the memmap=size$start kernel command line option.

From /boot/grub/menu.lst on an old 10.1 box of mine:

    # server is randomly crashing
    # memtest 4.00 reports 0x0006703d144,0xffffffffffc
    ###Don't change this comment - YaST2 identifier: Original name: linux###
    title SUSE Linux 10.1
    root (hd0,1)
    kernel /boot/vmlinuz root=/dev/sda2 resume=/dev/sda1 showopts memmap=1K$0x0006703d144
    initrd /boot/initrd

So, memtest said that a very small discrepancy happens starting at address 0x0006703d144, and I told the kernel via memmap=size$start to ignore 1K starting at that same address.

You have to let memtest run through all tests for a day or two, and at full hot temperature (that is, with the machine closed up in whatever hole you would normally run it, not with the case off and out on a bench), to really be sure you are catching all marginal addresses. Then, depending on the results, maybe the errors are all close together and you can encompass them in one memmap range that isn't too wastefully large, or you may want to use a few different smaller ranges.

In the server above there were a few bad addresses, but they were all very close to each other and that tiny 1K range covers them all. And that server no longer crashes. The change was made about a year ago, and at that time it was crashing once or twice a week. So that really did fix it.

It's a stock openSUSE 10.1 kernel (oss & updates repos). I think it still works the same at least as of 11.4. A few months ago I had a new box with new faulty RAM that I had to use until the RMA came in.

Another box would give errors, but only because it was a desktop motherboard in a 1U rackmount case, which means the airflow was all wrong and the CPU got no air at all. The memory errors were pretty random, and spread all over the map the longer the tests ran. Switching to a different case with different fan and power-supply arrangements made the memory errors go away on that one.
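
If memtest reports more than a handful of addresses, the grouping step described above (one range when the errors sit close together, several smaller ranges when they don't) can be scripted. What follows is only a rough sketch, not anything shipped with memtest or the kernel: the function name memmap_options, the max_gap threshold and the rounding of each range out to 4 KiB pages are all assumptions made for illustration.

```python
def memmap_options(bad_addrs, max_gap=64 * 1024, page=4096):
    """Group bad addresses into ranges and format memmap=size$start options.

    Hypothetical helper. Assumptions: ranges are rounded out to whole
    pages of `page` bytes (keep it a multiple of 1024 so the size can be
    written with a K suffix), and two ranges are merged when the hole
    between them is at most `max_gap` bytes.
    """
    ranges = []  # list of [start, end) pairs, end exclusive
    for addr in sorted(set(bad_addrs)):
        start = addr - addr % page   # round down to the start of the page
        end = start + page           # first byte after that page
        if ranges and start - ranges[-1][1] <= max_gap:
            # Close enough to the previous range: extend it.
            ranges[-1][1] = max(ranges[-1][1], end)
        else:
            ranges.append([start, end])
    return ["memmap=%dK$0x%x" % ((end - start) // 1024, start)
            for start, end in ranges]


if __name__ == "__main__":
    # The one address from the menu.lst comment above.
    for option in memmap_options([0x0006703d144]):
        print(option)
```

Merging trades a little wasted RAM (the good pages inside the hole are reserved too) for a shorter kernel command line; set max_gap=0 to merge only ranges that touch.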
--
bkw