Re: [suse-amd64] Suse 9 Pro on dual opteron + 6GB mem crashes/panics

21 Dec 2003

Andi Kleen wrote:
...
On Sat, 20 Dec 2003 22:24:43 +0100
Arjen van der Meijden wrote:
...
Once it crashed with a more complete (and different) oops/panic:
Dec 20 20:28:02 apollo kernel: Unable to handle kernel paging request at virtual address 0000007f804537e0
Dec 20 20:28:02 apollo kernel:  printing rip:
Dec 20 20:28:02 apollo kernel: ffffffff801494f7
Dec 20 20:28:02 apollo kernel: PML4 1048b1067 PGD 0
Dec 20 20:28:02 apollo kernel: Oops: 0000
Dec 20 20:28:02 apollo kernel: CPU 1
Dec 20 20:28:02 apollo kernel: Pid: 7, comm: kswapd Not tainted
Dec 20 20:28:02 apollo kernel: RIP: 0010:[kmem_cache_reap+343/880]{kmem_cache_reap+343}
I would suspect bad memory here.
-Andi
First of all, does that explain the 1.5G of swap usage?
As in: would it use 1.5G of swap if the memory is broken, even if there 
is plenty of disk cache to drop?
I'm no kernel expert, so I don't know the answer to that, and I hope you 
do :)
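
To put numbers on the swap-versus-cache question, here is a minimal sketch that computes swap in use and reclaimable cache from /proc/meminfo-style text. It assumes the usual "Name: value kB" lines with the SwapTotal, SwapFree, Cached and Buffers fields; the function names are made up for illustration, and lines in any other layout are simply skipped.

```python
def parse_meminfo(text):
    """Collect 'Name: <number> kB' lines into a dict of name -> kB."""
    values = {}
    for line in text.splitlines():
        parts = line.split()
        # Only accept lines shaped exactly like "SwapFree:  524288 kB".
        if len(parts) == 3 and parts[0].endswith(":") and parts[2] == "kB":
            values[parts[0][:-1]] = int(parts[1])
    return values


def swap_and_cache_kb(text):
    """Return (swap in use, page cache plus buffers), both in kB."""
    info = parse_meminfo(text)
    swap_used = info["SwapTotal"] - info["SwapFree"]
    cache = info["Cached"] + info["Buffers"]
    return swap_used, cache
```

On a live box one would feed it the contents of /proc/meminfo, e.g. swap_and_cache_kb(open("/proc/meminfo").read()). A large first number next to a large second number is the situation described above.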

But there is some more news from our front.

- I've adjusted the kernel boot parameters to include iommu=fullflush (we 
noticed your comments on the 2.6.0-amd64-patchpack about the iommu being 
forced on an I/O device that doesn't support it that well)
- Changed our 32-bit mysql to use less than 2G of memory instead of more 
(mysql (actually, innodb) used to crash when it was configured 
with more than 2G of memory available to its buffers and such, due to 
issues with glibc or so).
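
For anyone wanting to confirm that the boot parameter actually took effect, here is a minimal sketch that looks for iommu= options on the kernel command line. It assumes the running kernel exposes its command line in /proc/cmdline and that iommu= takes a comma-separated option list; the helper names are made up for illustration.

```python
def iommu_options(cmdline):
    """Return every option given via iommu= on a kernel command line."""
    options = []
    for token in cmdline.split():
        if token.startswith("iommu="):
            # Several options may be combined, e.g. iommu=force,fullflush
            options.extend(token[len("iommu="):].split(","))
    return options


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read()
```

After a reboot, "fullflush" in iommu_options(read_cmdline()) should be True if the parameter was passed.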

And now it has already been running for about 5 hours and 52 minutes 
without a hiccup, on exactly the same type of load as before (using the 
full 6G of memory), whereas before it didn't get past the 2-hour mark.

The question now is: is our problem solved? And if so, what solved it?

When it hits the 24-hour mark, we'll probably try reverting the changes 
one step at a time, such as booting without iommu=fullflush.

Best regards,

Arjen