[opensuse-kernel] 2.6.24.1-35.1 udev crashes in initrd

25 Feb 2008

Hi,

In a critical setting, I'm still using a SuSE 9.3 server (sorry), where the
system typically becomes unusable after 70-80 days of uptime (only SysRq works
then). Unfortunately, these crashes(?) have lately also been leaving corrupt
LDAP databases behind, even when I use the Alt-SysRq [S] [U] [B] sequence.
All of this is still running 2.6.11.4-21.14-smp.

Since the users also complain about 1-10 second hangs in a terminal-based
order management system, I thought it would be a good idea to try moving the
kernel to 2.6.24.1, with all the fancy (I/O) scheduling and engaging BKL,
etc. (I simply built
Kernel:/HEAD/openSUSE_Factory/kernel-default-2.6.24.1-35.1 with rpmbuild on
that system.)

The first attempts consistently resulted in an oops during the initrd stage, similar to:

 BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
printing eip: c01e5374 *pde = 00000000 
Oops: 0000 [#1] SMP 
last sysfs file: /block/md0/dev
Modules linked in: xfs ide_cd cdrom ide_disk pata_amd amd74xx ide_core raid456 async_xor async_memcpy async_tx xor sata_sil24 libata 3w_9xxx sd_mod scsi_mod

Pid: 710, comm: udev_volume_id Tainted: G       N (2.6.24.1-35.1-default #1)
EIP: 0060:[<c01e5374>] EFLAGS: 00010046 CPU: 0
EIP is at __rb_erase_color+0x19/0x13f
EAX: 00000000 EBX: f7fcd3a8 ECX: 00000000 EDX: 00000000
ESI: c2029f9c EDI: f7fcd3a0 EBP: f750be28 ESP: f750be10
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process udev_volume_id (pid: 710, ti=f750a000 task=f7fcd370 task.ti=f750a000)
Stack: f7525a20 c2029f80 c011f6fe c2029f80 f7525a20 f7509040 f750be38 c011f841 
       f75259f0 c02f65a0 f750be48 c01200bd c202cd80 f75259f0 f750be58 c0120146 
       f75259f0 00000000 f750be7c c02eaeb3 f7509040 f75259f0 c202cd80 f7fcd4d8 
Call Trace:
 [<c011f6fe>] dequeue_entity+0x2c/0x39
 [<c011f841>] dequeue_task_fair+0x18/0x2d
 [<c01200bd>] dequeue_task+0xd/0x18
 [<c0120146>] deactivate_task+0x1f/0x2b
 [<c02eaeb3>] __sched_text_start+0xe3/0x379
 [<c02eb8bf>] __mutex_lock_interruptible_slowpath+0x6e/0x9f
 [<c02eb7d0>] mutex_lock_interruptible+0x1b/0x21
 [<c0278f77>] md_open+0x1f/0x50
 [<c019cbbf>] do_open+0x1b6/0x248
 [<c019cd00>] blkdev_open+0x27/0x51
 [<c017a91c>] __dentry_open+0xd1/0x184
 [<ffffff9c>] 0xffffff9c
DWARF2 unwinder stuck at 0xffffff9c

Leftover inexact backtrace:

 [<c017aab2>] nameidata_to_filp+0x23/0x32
 [<c017aa0f>] do_filp_open+0x40/0x48
 [<c017ab6a>] get_unused_fd_flags+0x59/0xc3
 [<c017acac>] do_sys_open+0x48/0xc9
 [<c017ad47>] sys_open+0x1a/0x1c
 [<c0104fa2>] syscall_call+0x7/0xb
 =======================
Code: 01 0f 84 75 ff ff ff 8b 45 00 83 08 01 5b 5e 5f 5d c3 56 89 ce 53 89 d3 e9 19 01 00 00 8b 53 08 39 c2 0f 85 84 00 00 00 8b 4b 04 <8b> 01 a8 01 75 14 83 c8 01 89 f2 89 01 89 d8 83 23 fe e8 7d fe
EIP: [<c01e5374>] __rb_erase_color+0x19/0x13f SS:ESP 0068:f750be10
---[ end trace 92ebfcce66e192a6 ]---

and:

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000040
printing eip: c011f96c *pde = 00000000 
Oops: 0000 [#1] SMP 
last sysfs file: /block/sde/sde1/dev
Modules linked in: sata_sil24 libata 3w_9xxx sd_mod scsi_mod

Pid: 538, comm: udev Not tainted (2.6.24.1-35.1-default #1)
EIP: 0060:[<c011f96c>] EFLAGS: 00010046 CPU: 0
EIP is at pick_next_task_fair+0x15/0x23
EAX: 00000000 EBX: f75d11f0 ECX: c202cdd0 EDX: 00000000
ESI: 00000000 EDI: 00000001 EBP: f7481f08 ESP: f7481f08
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process udev (pid: 538, ti=f7480000 task=f75d11f0 task.ti=f7480000)
Stack: f7481f2c c02eaeea c202a1cc 00000000 c202cd80 f75d1358 f7481f44 00000001 
       00000001 f7481f9c c02eb95e 00000001 00000000 00000000 c013af1e f7ffc944 
       00000000 00000000 73e8b1d5 00000005 c013aeba c202a1cc 00000001 c02eb953 
Call Trace:
 [<c02eaeea>] __sched_text_start+0x11a/0x379
 [<c02eb95e>] do_nanosleep+0x3c/0x67
 =======================
Code: 39 c8 73 0c 5e 89 f8 5b 5e 5f 5d e9 9b f6 ff ff 5b 5b 5e 5f 5d c3 55 83 c0 34 31 d2 83 78 08 00 89 e5 74 11 e8 15 fe ff ff 89 c2 <8b> 40 40 85 c0 75 f2 83 ea 30 5d 89 d0 c3 55 89 e5 53 89 d3 83
EIP: [<c011f96c>] pick_next_task_fair+0x15/0x23 SS:ESP 0068:f7481f08
---[ end trace 18a67066b954c85e ]---
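The fault addresses in these two oopses can be cross-checked against the `<8b>`-marked byte in the `Code:` lines and the register dumps. The following is a minimal sketch, assuming 32-bit x86 and handling only the `mov r32, r/m32` form (opcode 0x8B) without a SIB byte; `fault_address` is a hypothetical helper written for illustration, not a kernel or ksymoops tool.

```python
# Minimal sketch (hypothetical helper): recompute the faulting address of a
# 32-bit x86 "mov r32, r/m32" (opcode 0x8B) from an oops register dump.
# Register-direct (mod == 3), SIB (rm == 4) and disp32-only (mod == 0,
# rm == 5) forms are deliberately not handled.

REGS = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]


def fault_address(code, regs):
    """code: bytes starting at the <..>-marked byte; regs: name -> value.

    Returns (destination register, base register, effective address).
    """
    opcode, modrm = code[0], code[1]
    if opcode != 0x8B:
        raise ValueError("only opcode 8b (mov r32, r/m32) is handled")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod == 3 or rm == 4 or (mod == 0 and rm == 5):
        raise ValueError("register, SIB and disp32-only forms not handled")
    if mod == 0:
        disp = 0
    elif mod == 1:
        # disp8 is sign-extended
        disp = int.from_bytes(bytes(code[2:3]), "little", signed=True)
    else:
        disp = int.from_bytes(bytes(code[2:6]), "little", signed=True)
    addr = (regs[REGS[rm]] + disp) & 0xFFFFFFFF
    return REGS[reg], REGS[rm], addr


# Register values copied from the first two oopses above.
oops1 = {"EAX": 0x00000000, "EBX": 0xF7FCD3A8, "ECX": 0x00000000,
         "EDX": 0x00000000, "ESI": 0xC2029F9C, "EDI": 0xF7FCD3A0,
         "EBP": 0xF750BE28, "ESP": 0xF750BE10}
oops2 = {"EAX": 0x00000000, "EBX": 0xF75D11F0, "ECX": 0xC202CDD0,
         "EDX": 0x00000000, "ESI": 0x00000000, "EDI": 0x00000001,
         "EBP": 0xF7481F08, "ESP": 0xF7481F08}

for name, code, regs in (("oops 1", [0x8B, 0x01], oops1),
                         ("oops 2", [0x8B, 0x40, 0x40], oops2)):
    dest, base, addr = fault_address(code, regs)
    print(f"{name}: dest={dest} base={base} addr={addr:08x}")
```

In the first oops, `8b 01` is `mov eax,[ecx]` with ECX = 0, giving address 00000000; in the second, `8b 40 40` is `mov eax,[eax+0x40]` with EAX = 0, giving 00000040. Both match the addresses reported in the "unable to handle kernel NULL pointer dereference" lines.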

Looking into the requirements, I noticed that 2.6.24 needs at least udev 081,
while the system uses 053-15.4. Since the system still has the plain old
static /dev setup, I figured I only need udev during boot, i.e. in the initrd.
Because that initrd setup differs considerably, I created the initrd on an
openSUSE 10.2 system with udev-103-12 and a matching mkinitrd. Note that all I
want is to get past the initrd oopses, but still no luck:

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000040
printing eip: c011f96c *pde = 00000000 
Oops: 0000 [#1] SMP 
last sysfs file: /block/md0/dev
Modules linked in: xfs ide_cd cdrom ide_disk pata_amd amd74xx ide_core raid456 async_xor async_memcpy async_tx xor sata_sil24 libata 3w_9xxx sd_mod scsi_mod

Pid: 538, comm: udev Tainted: G       N (2.6.24.1-35.1-default #1)
EIP: 0060:[<c011f96c>] EFLAGS: 00010046 CPU: 0
EIP is at pick_next_task_fair+0x15/0x23
EAX: 00000000 EBX: f75961b0 ECX: c202cdd0 EDX: 00000000
ESI: 00000000 EDI: 00000001 EBP: f765ff08 ESP: f765ff08
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process udev (pid: 538, ti=f765e000 task=f75961b0 task.ti=f765e000)
Stack: f765ff2c c02eaeea c202a1cc 00000000 c202cd80 f7596318 f765ff44 00000001 
       00000001 f765ff9c c02eb95e 00000001 00000000 00000000 c013af1e f7657f44 
       00000000 00000000 d00530e1 00000006 c013aeba c202a1cc 00000001 c02eb953 
Call Trace:
 [<c02eaeea>] __sched_text_start+0x11a/0x379
 [<c02eb95e>] do_nanosleep+0x3c/0x67
 =======================
Code: 39 c8 73 0c 5e 89 f8 5b 5e 5f 5d e9 9b f6 ff ff 5b 5b 5e 5f 5d c3 55 83 c0 34 31 d2 83 78 08 00 89 e5 74 11 e8 15 fe ff ff 89 c2 <8b> 40 40 85 c0 75 f2 83 ea 30 5d 89 d0 c3 55 89 e5 53 89 d3 83
EIP: [<c011f96c>] pick_next_task_fair+0x15/0x23 SS:ESP 0068:f765ff08
---[ end trace b79f2f543ef32b7a ]---

As it stands, it consistently crashes in pick_next_task_fair, even with an
initrd that meets the requirements.
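The udev requirement mentioned above reduces to comparing the upstream part of the package version against 081. A minimal sketch, assuming the package version has the form `<upstream>-<release>` as in `053-15.4` and `103-12`; `upstream_version` and `udev_ok` are hypothetical helpers:

```python
# Sketch: check a udev package version ("<upstream>-<release>") against the
# minimum udev version 081 that 2.6.24 asks for.
MIN_UDEV = 81


def upstream_version(pkg_version):
    """'053-15.4' -> 53 (the part after '-' is the package release)."""
    return int(pkg_version.split("-", 1)[0])


def udev_ok(pkg_version, minimum=MIN_UDEV):
    """True if the upstream udev version is at least `minimum`."""
    return upstream_version(pkg_version) >= minimum


for v in ("053-15.4", "103-12"):
    print(v, "ok" if udev_ok(v) else "too old")
```

So the SuSE 9.3 udev is too old, while the openSUSE 10.2 udev used to build the initrd satisfies the requirement.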

Today I noticed Greg's 2.6.24.3 announcement, which includes two
hrtimer-related fixes that could fit the picture. Another problem with this
setup, and the reason for this deliberate inquiry, is that I normally only get
a chance to fiddle with such things on Sundays, if at all. Before I spam LKML,
I thought I'd ask the SUSE kernel people here.

Do these oopses look familiar to anybody? Do I have a chance of getting past
them with the 2.6.24.3 patches?

If you want more info, just ask.

Thanks in advance,
Pete

Hans-Peter Jansen
