Andi Kleen wrote:
On Sat, 20 Dec 2003 22:24:43 +0100 Arjen van der Meijden wrote:
Once it crashed with a more complete (and different) oops/panic:
Dec 20 20:28:02 apollo kernel: Unable to handle kernel paging request at virtual address 0000007f804537e0
Dec 20 20:28:02 apollo kernel: printing rip:
Dec 20 20:28:02 apollo kernel: ffffffff801494f7
Dec 20 20:28:02 apollo kernel: PML4 1048b1067 PGD 0
Dec 20 20:28:02 apollo kernel: Oops: 0000
Dec 20 20:28:02 apollo kernel: CPU 1
Dec 20 20:28:02 apollo kernel: Pid: 7, comm: kswapd Not tainted
Dec 20 20:28:02 apollo kernel: RIP: 0010:[kmem_cache_reap+343/880]{kmem_cache_reap+343}
I would suspect bad memory here.
-Andi
First of all, does that explain the 1.5G of swap usage? That is, would it use 1.5G of swap if the memory were broken, even when there is plenty of disk cache to remove? I'm no kernel expert, so I don't know the answer to that, and I hope you do :)

But there is some more news from our front:
- I've adjusted the kernel boot parameters to read iommu=fullflush (we noticed your comments on the 2.6.0-amd64-patchpack about the iommu being forced on an I/O device that doesn't support it that well).
- Changed our 32-bit mysql to use less than 2G of memory instead of more (mysql, or actually innodb, used to crash when it was configured with more than 2G of memory available to its buffers and such, due to issues with glibc or so).

And now it has already been running for about 5 hours and 52 minutes without a hiccup, on exactly the same type of load as before (using the full 6G of memory), whereas before it didn't get past 2 hours.

The question now is: is our problem solved? And if so, what solved it? When it hits the 24-hour mark, we'll probably try changing things back step by step, such as booting without iommu=fullflush.

Best regards,
Arjen