Bug ID:         994969
Summary:        NMI watchdog: BUG: soft lockup - CPU#8 stuck for 22s! [irq/32-opal-elo:397]
Classification: openSUSE
Product:        openSUSE Tumbleweed
Version:        Current
Hardware:       PowerPC-64
OS:             Other
Status:         NEW
Severity:       Normal
Priority:       P5 - None
Component:      Kernel
Assignee:       kernel-maintainers@forge.provo.novell.com
Reporter:       ro@suse.com
QA Contact:     qa-bugs@suse.de
Found By:       ---
Blocker:        ---

Since about kernel 4.0 I have not been able to boot a POWER7 machine
with OPAL (bare metal).
The hang usually starts after SCSI controller/disk initialisation:

ipr: 00000160: 00000000 18000040 0000031F 00060000
ipr: 00000170: 220218D1 0000C49B 0000CECE 00000000
scsi 0:255:255:255: No Device         IBM      2B4C001SISIOA    0150 PQ: 0 ANSI: 0
scsi 0:0:5:0: Direct-Access     IBM      ST9300653SS      740D PQ: 0 ANSI: 6
scsi 0:0:6:0: Direct-Access     IBM      ST9300653SS      740D PQ: 0 ANSI: 6
scsi 0:0:7:0: Direct-Access     IBM      ST9300653SS      740D PQ: 0 ANSI: 6
scsi 0:0:8:0: Direct-Access     IBM      ST9300653SS      740D PQ: 0 ANSI: 6
scsi 0:0:9:0: Direct-Access     IBM      ST9300653SS      740D PQ: 0 ANSI: 6
scsi 0:255:0:0: Direct-Access     IBM      IPR-0   112D8FAF      PQ: 0 ANSI: 3
NMI watchdog: BUG: soft lockup - CPU#8 stuck for 22s! [irq/32-opal-elo:397]
Modules linked in: ipr(+) libata tg3 ptp pps_core libphy scsi_mod agpgart drbg ansi_cprng
CPU: 8 PID: 397 Comm: irq/32-opal-elo Not tainted 4.7.0-2-default #1
task: c000000f2cda3780 ti: c000000f2de30000 task.ti: c000000f2de30000
NIP: c0000000000104c4 LR: c0000000000104c4 CTR: 0000000030041c0c
REGS: c000000f2de33830 TRAP: 0901   Not tainted  (4.7.0-2-default)
MSR: 900000000280b032 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI>  CR: 42000844  XER: 00000000
CFAR: c00000000023c480 SOFTE: 1
GPR00: c000000000010488 c000000f2de33ab0 c000000001320c00 0000000000000900
GPR04: c000000001178368 0000000000000008 900000000280b032 0000000000000000
GPR08: 0000000000000003 0000000000000000 0000000000000000 0000000000000000
GPR12: c00000000008728c c00000000fb84800
NIP [c0000000000104c4] .arch_local_irq_restore+0x74/0x90
LR [c0000000000104c4] .arch_local_irq_restore+0x74/0x90
Call Trace:
[c000000f2de33ab0] [c000000000010488] .arch_local_irq_restore+0x38/0x90 (unreliable)
[c000000f2de33b20] [c000000000166c04] .irq_finalize_oneshot.part.2+0xb4/0x250
[c000000f2de33bc0] [c000000000166edc] .irq_thread_fn+0x7c/0xa0
[c000000f2de33c50] [c000000000167390] .irq_thread+0x1b0/0x270
[c000000f2de33d30] [c00000000011ba4c] .kthread+0x10c/0x130
[c000000f2de33e30] [c00000000000966c] .ret_from_kernel_thread+0x58/0x6c
Instruction dump:
409e002c e92d0020 61298000 7d210164 38210070 e8010010 7c0803a6 4e800020
60000000 60000000 60000000 4bff1e0d <60000000> 4bffffdc 60000000 e92d0020
BUG: workqueue lockup - pool cpus=8 node=0 flags=0x0 nice=0 stuck for 51s!
Showing busy workqueues and worker pools:
workqueue events: flags=0x0
  pwq 16: cpus=8 node=0 flags=0x0 nice=0 active=3/256
    pending: .cache_reap, .push_to_pool, .push_to_pool
  pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
    in-flight: 509:.ipr_worker_thread [ipr]
workqueue vmstat: flags=0xc
  pwq 16: cpus=8 node=0 flags=0x0 nice=0 active=1/256
    pending: .vmstat_update
pool 0: cpus=0 node=0 flags=0x0 nice=0 hung=1s workers=3 idle: 684 4
systemd-udevd[650]: seq 829 '/devices/pci0000:00/0000:00:00.0/0000:01:00.0/0000:02:06.0/0000:60:00.0' is taking a long time
systemd-udevd[650]: seq 824 '/devices/pci0000:00/0000:00:00.0/0000:01:00.0/0000:02:05.0/0000:40:00.0' is taking a long time
IPv6: ADDRCONF(NETDEV_UP): enP3p1s0f0: link is not ready
INFO: rcu_sched self-detected stall on CPU
        8-...: (6000 ticks this GP) idle=36d/140000000000001/0 softirq=162/162 fqs=4053
         (t=6000 jiffies g=24 c=23 q=41364)
Task dump for CPU 8:
irq/32-opal-elo R  running task        0   397      2 0x00000804
Call Trace:
[c000000f2de330d0] [c00000000012f7d8] .sched_show_task+0xd8/0x180 (unreliable)
[c000000f2de33150] [c000000000937d50] .rcu_dump_cpu_stacks+0xbc/0x104
[c000000f2de331f0] [c00000000017976c] .rcu_check_callbacks+0xa3c/0xaf0
[c000000f2de33320] [c000000000180c30] .update_process_times+0x50/0xa0
[c000000f2de333a0] [c000000000197c90] .tick_sched_handle.isra.6+0x40/0xd0
[c000000f2de33430] [c000000000197d84] .tick_sched_timer+0x64/0xd0
[c000000f2de334d0] [c000000000181798] .__hrtimer_run_queues+0x128/0x430
[c000000f2de335b0] [c0000000001826f8] .hrtimer_interrupt+0xf8/0x330
[c000000f2de336a0] [c00000000001f034] .__timer_interrupt+0x94/0x270
[c000000f2de33740] [c00000000001f3a0] .timer_interrupt+0x90/0xd0
[c000000f2de337c0] [c000000000002824] decrementer_common+0x124/0x180
--- interrupt: 901 at .arch_local_irq_restore+0x74/0x90
    LR = .arch_local_irq_restore+0x74/0x90
[c000000f2de33ab0] [c000000000010488] .arch_local_irq_restore+0x38/0x90 (unreliable)
[c000000f2de33b20] [c000000000166c04] .irq_finalize_oneshot.part.2+0xb4/0x250
[c000000f2de33bc0] [c000000000166edc] .irq_thread_fn+0x7c/0xa0
[c000000f2de33c50] [c000000000167390] .irq_thread+0x1b0/0x270
[c000000f2de33d30] [c00000000011ba4c] .kthread+0x10c/0x130
[c000000f2de33e30] [c00000000000966c] .ret_from_kernel_thread+0x58/0x6c
tg3 0003:01:00.0 enP3p1s0f0: Link is up at 1000 Mbps, full duplex
tg3 0003:01:00.0 enP3p1s0f0: Flow control is off for TX and off for RX
tg3 0003:01:00.0 enP3p1s0f0: EEE is enabled
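
When going through longer boot logs like the one above, it can help to pull out just the soft-lockup reports. The following is only a minimal sketch for that, not part of any kernel or distribution tooling; the names `SOFT_LOCKUP` and `parse_soft_lockup` are hypothetical, and the pattern assumes the message format exactly as it appears in this log.

```python
import re

# Matches the watchdog message as printed in the log above, e.g.
# "BUG: soft lockup - CPU#8 stuck for 22s! [irq/32-opal-elo:397]".
# Assumption: other kernel versions may word this line differently.
SOFT_LOCKUP = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^\]]+):(?P<pid>\d+)\]"
)


def parse_soft_lockup(line):
    """Return (cpu, seconds, comm, pid) for a soft-lockup line, else None."""
    m = SOFT_LOCKUP.search(line)
    if m is None:
        return None
    return int(m["cpu"]), int(m["secs"]), m["comm"], int(m["pid"])
```

Applied line by line to a saved console log, this returns a tuple for each watchdog report and `None` for everything else, so repeated lockups on the same CPU and task are easy to spot.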