On Monday, 25 February 2008, Hans-Peter Jansen wrote:
Since the users also complain about 1-10 second hangs in a terminal-based order management system, I thought it would be a good idea to try to move the kernel to 2.6.24.1 with all the fancy (IO) scheduling and engaging BKL, etc. (I just rebuilt Kernel:/HEAD/openSUSE_Factory/kernel-default-2.6.24.1-35.1 with rpmbuild on that system.)
The first attempts consistently resulted in an Oops during initrd, similar to:
BUG: unable to handle kernel NULL pointer dereference at virtual address 00000040
printing eip: c011f96c *pde = 00000000
Oops: 0000 [#1] SMP
last sysfs file: /block/sde/sde1/dev
Modules linked in: sata_sil24 libata 3w_9xxx sd_mod scsi_mod

Pid: 538, comm: udev Not tainted (2.6.24.1-35.1-default #1)
EIP: 0060:[<c011f96c>] EFLAGS: 00010046 CPU: 0
EIP is at pick_next_task_fair+0x15/0x23
EAX: 00000000 EBX: f75d11f0 ECX: c202cdd0 EDX: 00000000
ESI: 00000000 EDI: 00000001 EBP: f7481f08 ESP: f7481f08
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process udev (pid: 538, ti=f7480000 task=f75d11f0 task.ti=f7480000)
Stack: f7481f2c c02eaeea c202a1cc 00000000 c202cd80 f75d1358 f7481f44 00000001
       00000001 f7481f9c c02eb95e 00000001 00000000 00000000 c013af1e f7ffc944
       00000000 00000000 73e8b1d5 00000005 c013aeba c202a1cc 00000001 c02eb953
Call Trace:
 [<c02eaeea>] __sched_text_start+0x11a/0x379
 [<c02eb95e>] do_nanosleep+0x3c/0x67
 =======================
Code: 39 c8 73 0c 5e 89 f8 5b 5e 5f 5d e9 9b f6 ff ff 5b 5b 5e 5f 5d c3 55 83 c0 34 31 d2 83 78 08 00 89 e5 74 11 e8 15 fe ff ff 89 c2 <8b> 40 40 85 c0 75 f2 83 ea 30 5d 89 d0 c3 55 89 e5 53 89 d3 83
EIP: [<c011f96c>] pick_next_task_fair+0x15/0x23 SS:ESP 0068:f7481f08
---[ end trace 18a67066b954c85e ]---
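A note for readers decoding the trace: the byte marked `<8b>` on the Code: line is the faulting instruction, and it can be checked against the reported fault address by hand. The following is only a minimal sketch, not a disassembler. The helper name `decode_mov_disp8` is made up for illustration, it handles just the one `mov r32, [r32+disp8]` encoding seen here, and the byte string is an excerpt copied from the Oops above.

```python
# Excerpt of the "Code:" line from the Oops; <..> marks the faulting byte.
CODE = "e8 15 fe ff ff 89 c2 <8b> 40 40 85 c0 75 f2"
EAX = 0x00000000  # EAX register value as printed in the Oops

# 32-bit general purpose registers in x86 encoding order.
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_mov_disp8(code):
    """Decode 'mov r32, [r32+disp8]' at the marked byte.

    Hypothetical helper: returns (destination, base register, displacement)
    and rejects every other instruction form.
    """
    tokens = code.split()
    idx = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    opcode = int(tokens[idx].strip("<>"), 16)
    modrm = int(tokens[idx + 1], 16)
    disp = int(tokens[idx + 2], 16)
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    # 0x8b is MOV r32, r/m32; mod == 1 means an 8-bit displacement,
    # rm == 4 would mean a SIB byte follows, which this sketch skips.
    if opcode != 0x8B or mod != 1 or rm == 4:
        raise ValueError("not a simple mov r32, [r32+disp8]")
    if disp >= 0x80:  # disp8 is sign-extended
        disp -= 0x100
    return REGS[reg], REGS[rm], disp


dst, base, disp = decode_mov_disp8(CODE)
fault_address = EAX + disp
print(f"mov {dst}, [{base}+{disp:#x}] -> fault at {fault_address:#010x}")
```

Under these assumptions the instruction reads from EAX + 0x40 with EAX = 0, which matches the "virtual address 00000040" in the first line of the Oops, i.e. a structure member at offset 0x40 was read through a NULL pointer.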
As it stands, it crashes consistently in pick_next_task_fair, even with an initrd within constraints.
Today I noticed Greg's 2.6.24.3 announcement, with two hrtimer-related fixes, which could fit the picture.
I can confirm that 2.6.24.3 overcomes the reported initrd problem (even using the native mkinitrd/udev setup from 9.3). With a few kernel config changes (NO_HZ disabled, switched to HZ_1000) and a few NFS-related package rebuilds from Factory, it is finally running in production now. If it survives today, I will feel much better.

Hopefully the latency problems have diminished, but I reported some not-so-funny-looking numbers from a different setup to LKML, gathered with latencytop (which hopefully didn't induce some Heisenberg uncertainty relation problem).

Should I report such problems here, too, given that 2.6.24.3 isn't easily available for SUSE setups ATM?

OTOH, I still noticed some problems with that kernel on openSUSE 10.2 (probably related to inet interface renaming?), which resulted in:

 - a hard freeze immediately after initializing lo in runlevel 3 (forcing a manual reset)
 - a failed rename, which left the major device with some obscure ethxx(?) name

Pete