New subject: [Bug 224778] softirq.c#cpu_callback BUG_ON

30 Nov 2006

      https://bugzilla.novell.com/show_bug.cgi?id=224778

           Summary: softirq.c#cpu_callback BUG_ON
           Product: openSUSE 10.2
           Version: RC 4
          Platform: Other
        OS/Version: Other
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: Kernel
        AssignedTo: kernel-maintainers@forge.provo.novell.com
        ReportedBy: dhecht@vmware.com
         QAContact: qa@suse.de

While installing SuSE 10.2 alpha4 32-bit release on VMware, we hit the
following kernel BUG_ON: 
<0>kernel BUG at kernel/softirq.c:577!
We've confirmed the bug is still in RC1.

Note that this was also reported in:
https://bugzilla.novell.com/show_bug.cgi?id=210931. 

Also note that the kernel race leading to this BUG_ON is not specific to
running in a VM (granted, it may be hard to reproduce on native hardware).

The race is between the init path and the timer interrupt.

Before the init thread completes spawn_ksoftirqd(), the timer interrupt fires,
calling update_process_times -> rcu_check_callbacks -> tasklet_schedule ->
__tasklet_schedule, which does __get_cpu_var(tasklet_vec).list = t.

This causes spawn_ksoftirqd -> cpu_callback to hit the BUG_ON, since the
tasklet_vec list is no longer empty.  I'm not sure who is using RCU so early
that rcu_pending() returns true before ksoftirqd is spawned.
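
The ordering problem can be illustrated with a small userspace model.  This is
only a sketch, not kernel code: the model_* helpers and simulate_boot() are
hypothetical names, a single global pointer stands in for
per_cpu(tasklet_vec, cpu).list, and rcu_pending() is reduced to a boolean
argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; names loosely mirror the kernel's. */
struct tasklet { struct tasklet *next; };

/* Stands in for per_cpu(tasklet_vec, cpu).list on the boot CPU. */
static struct tasklet *tasklet_vec_list;
/* Stands in for per_cpu(rcu_tasklet, cpu). */
static struct tasklet rcu_tasklet;

/* Models __tasklet_schedule(): prepend t to the per-CPU list. */
static void model_tasklet_schedule(struct tasklet *t)
{
    t->next = tasklet_vec_list;
    tasklet_vec_list = t;
}

/* Models update_process_times -> rcu_check_callbacks -> tasklet_schedule. */
static void model_timer_tick(bool rcu_pending)
{
    if (rcu_pending)
        model_tasklet_schedule(&rcu_tasklet);
}

/* Models the check in cpu_callback: true means the BUG_ON would fire. */
static bool model_cpu_callback_would_bug(void)
{
    return tasklet_vec_list != NULL;
}

/*
 * Runs one boot ordering.  If tick_before_spawn is true, the timer tick
 * lands before the spawn_ksoftirqd -> cpu_callback check; otherwise it
 * lands after.  Returns true if the modelled BUG_ON would have fired.
 */
bool simulate_boot(bool tick_before_spawn)
{
    bool bug;

    tasklet_vec_list = NULL;
    if (tick_before_spawn)
        model_timer_tick(true);
    bug = model_cpu_callback_would_bug();
    if (!tick_before_spawn)
        model_timer_tick(true);
    return bug;
}
```

In this model, simulate_boot(true) reports that the check would fire and
simulate_boot(false) reports that it would not.  The only difference between
the two runs is when the tick arrives relative to the check.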

Detailed description of the race:

The BUG_ON encountered is:

<4>CPU0: AMD Dual Core AMD Opteron(tm) Processor 275 stepping 02
<6>Total of 1 processors activated (4409.57 BogoMIPS).
<4>ENABLING IO-APIC IRQs
<6>..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
<0>------------[ cut here ]------------
<0>kernel BUG at kernel/softirq.c:577!
<0>invalid opcode: 0000 [#1]
<0>SMP
<0>last sysfs file:
<4>Modules linked in:
<0>CPU: 0
<4>EIP: 0060:[<c0124a6d>] Not tainted VLI
<4>EFLAGS: 00010286 (2.6.18-9-default #1)
<0>EIP is at cpu_callback+0x45/0x238
<0>eax: c03e606c ebx: 00000000 ecx: 00000000 edx: 00e1f100
<0>esi: 00000000 edi: 00000000 ebp: 00000000 esp: c1263f84
<0>ds: 007b es: 007b ss: 0068
<0>Process swapper (pid: 1, ti=c1262000 task=c12615f0 task.ti=c1262000)
<0>Stack: c01002fc 00000000 00000000 00000000 00000000 00000000 c03c5a71 c01002fc
<0> c0100344 c12615f0 c12042e0 c03b1fcc c0103ca6 00000202 c01002fc 00000000
<0> 00000000 00000000 00000000 00000000 00000000 0000007b c01002fc 00000000
<0>Call Trace:
<4> [<c03c5a71>] spawn_ksoftirqd+0x1c/0x3b
<4> [<c0100344>] init+0x48/0x2bc
<4> [<c0102005>] kernel_thread_helper+0x5/0xb
<4>DWARF2 unwinder stuck at kernel_thread_helper+0x5/0xb
<4>Leftover inexact backtrace:
<0>Code: 00 00 83 fa 04 0f 84 ae 00 00 00 83 fa 07 0f 85 ff 01 00 00 e9 d3 00 00
00 8b 14 8d 80 cb 35 c0 b8 6c 60 3e c0 83 3c 10 00 74 08 <0f> 0b 41 02 66 cc 2c
c0 b8 70 60 3e c0 83 3c 10 00 74 08 0f 0b
<0>EIP: [<c0124a6d>] cpu_callback+0x45/0x238 SS:ESP 0068:c1263f84
<4> <0>Kernel panic - not syncing: Attempted to kill init!
<4>

The BUG_ON statement hit is in kernel/softirq.c#cpu_callback:

                BUG_ON(per_cpu(tasklet_vec, hotcpu).list);

We debugged this further to confirm it is a kernel race.  We instrumented all
the places that prepend to tasklet_vec[cpu].list with a BUG_ON checking that
cpu_callback(...CPU_UP_PREPARE) had already had a chance to execute, and found
that the path that wins the race to update tasklet_vec[cpu].list is:

checking if image is initramfs... it is
------------[ cut here ]------------
kernel BUG at kernel/softirq.c:356!
invalid opcode: 0000 [#1]
SMP
last sysfs file:
Modules linked in:
CPU: 0
EIP: 0060:[<c012539b>] Not tainted VLI
EFLAGS: 00010046 (2.6.18-9-vanilla #1)
EIP is at __tasklet_schedule+0x41/0x8c
eax: c03de078 ebx: 00000000 ecx: c03de06c edx: 00e27100
esi: cfb82000 edi: 00000046 ebp: 00000000 esp: cfb83784
ds: 007b es: 007b ss: 0068
Process swapper (pid: 1, ti=cfb82000 task=cfb815f0 task.ti=cfb82000)
Stack: cfb815f0 00000000 00000000 c0128f72 cfb837fc 00000000 00000000 c0107871
c02fca20 c0146fba cfb837fc c034ce28 c034ce00 00000000 cfb83868 c0147075
000000af cfb837fc c02fca20 00000000 cfb837fc 00000000 cfb83868 c0106966
Call Trace:
[<c0128f72>] update_process_times+0x4d/0x5c
[<c0107871>] timer_interrupt+0x4b/0x72
[<c0146fba>] handle_IRQ_event+0x23/0x49
[<c0147075>] __do_IRQ+0x95/0xee
[<c0106966>] do_IRQ+0x71/0x83
[<c0104e1a>] common_interrupt+0x1a/0x20
DWARF2 unwinder stuck at common_interrupt+0x1a/0x20
Leftover inexact backtrace:
[<c017332e>] do_path_lookup+0x106/0x25f
[<c017214d>] getname+0x59/0xb0
[<c0173bfb>] __user_walk_fd+0x2f/0x40
[<c016d6eb>] vfs_lstat_fd+0x16/0x3d
[<c016d726>] sys_newlstat+0x14/0x28
[<c03b1ef2>] clean_path+0x19/0x4e
[<c03b2d8b>] do_header+0x1a9/0x1b3
[<c03b27bd>] do_name+0x7f/0x1c2
[<c03b1a1b>] write_buffer+0x1a/0x28
[<c03b1a8a>] flush_window+0x61/0xaf
[<c03b1e6a>] inflate_codes+0x392/0x3f7
[<c03b327d>] inflate_dynamic+0x4e8/0x548
[<c03b37c0>] unpack_to_rootfs+0x4e3/0x8dc
[<c01002fc>] init+0x0/0x2bc
[<c01002fc>] init+0x0/0x2bc
[<c03b3c34>] populate_rootfs+0x7b/0xe2
[<c01002fc>] init+0x0/0x2bc
[<c01002fc>] init+0x0/0x2bc
[<c010032b>] init+0x2f/0x2bc
[<c0103ca6>] ret_from_fork+0x6/0x20
[<c01002fc>] init+0x0/0x2bc
[<c01002fc>] init+0x0/0x2bc
[<c0102005>] kernel_thread_helper+0x5/0xb
Code: 8b 14 9d 80 4b 35 c0 8b 14 11 89 10 8b 14 9d 80 4b 35 c0 89 04 11 8b 56
10 b8 78 e0 3d c0 8b 14 95 80 4b 35 c0 83 3c 10 00 75 08 <0f> 0b 64 01 94 59 2c
c0 b8 80 c3 3d c0 83 0c 10 20 89 e2 81 e2
EIP: [<c012539b>] __tasklet_schedule+0x41/0x8c SS:ESP 0068:cfb83784
<0>Kernel panic - not syncing: Fatal exception in interrupt

Note that 0xc0128f72 is the instruction after the call to rcu_check_callbacks
from update_process_times.  So, it is rcu_check_callbacks that is calling:

tasklet_schedule(&per_cpu(rcu_tasklet, cpu));

before spawn_ksoftirqd() had a chance to execute, leading to the original
BUG_ON.
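
The instrumentation used for the second trace can be sketched in the same
userspace style.  This is an assumption about its general shape, not the
actual debugging patch: dbg_up_prepare_done is a hypothetical flag, and
dbg_prepend() returns false at the point where the added BUG_ON would have
triggered in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; all dbg_* names are hypothetical. */
struct dbg_tasklet { struct dbg_tasklet *next; };

/* Models tasklet_vec[cpu].list. */
static struct dbg_tasklet *dbg_list;
/* Hypothetical flag: set once the CPU_UP_PREPARE step has run. */
static bool dbg_up_prepare_done;

/* Resets the model to its state at early boot. */
void dbg_reset(void)
{
    dbg_list = NULL;
    dbg_up_prepare_done = false;
}

/* Models cpu_callback(..., CPU_UP_PREPARE, ...) having executed. */
void dbg_cpu_up_prepare(void)
{
    dbg_up_prepare_done = true;
}

/*
 * Models one instrumented prepend site.  Returns false where the added
 * BUG_ON would fire (prepend attempted before CPU_UP_PREPARE ran);
 * otherwise prepends t and returns true.
 */
bool dbg_prepend(struct dbg_tasklet *t)
{
    if (!dbg_up_prepare_done)
        return false;
    t->next = dbg_list;
    dbg_list = t;
    return true;
}
```

Calling dbg_prepend() before dbg_cpu_up_prepare() corresponds to the early
tasklet_schedule() caught in the trace above; calling it afterwards is the
ordering the original BUG_ON assumes.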

bugzilla_noreply＠novell.com
