[Bug 1105271] New: Kernel 4.4.143-65-default does not boot on Dell Precision T5810 (Xeon E5-1620 v3)
http://bugzilla.suse.com/show_bug.cgi?id=1105271

Bug ID: 1105271
Summary: Kernel 4.4.143-65-default does not boot on Dell Precision T5810 (Xeon E5-1620 v3)
Classification: openSUSE
Product: openSUSE Distribution
Version: Leap 42.3
Hardware: x86-64
OS: Other
Status: NEW
Severity: Major
Priority: P5 - None
Component: Basesystem
Assignee: kernel-maintainers@forge.provo.novell.com
Reporter: sebastian.parschauer@suse.com
QA Contact: qa-bugs@suse.de
Found By: L3
Blocker: ---

Created attachment 780104
  --> http://bugzilla.suse.com/attachment.cgi?id=780104&action=edit
Photo of screen when hanging after early ucode update

The new Leap 42.3 kernel 4.4.143-65-default does not boot on my Dell Precision Tower 5810 SUSE R&D workstation with Intel Xeon E5-1620 v3 CPU.

Please note: *That CPU has an unstable TSC.*

Boot hangs directly after early microcode update. Had to boot the previous kernel 4.4.140-62-default.

--
You are receiving this mail because:
You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c1
Sebastian Parschauer
https://www.dell.com/support/home/us/en/04/drivers/driversdetails?driverId=F...
Will update BIOS to A27 and retest.
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c2
Sebastian Parschauer
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c3
Borislav Petkov
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c4
--- Comment #4 from Sebastian Parschauer
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c6
--- Comment #6 from Sebastian Parschauer
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c8
--- Comment #8 from Sebastian Parschauer
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c9
--- Comment #9 from Sebastian Parschauer
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c10
--- Comment #10 from Borislav Petkov
It's strange for me that the cores have different MHz values.
That's because it is sampling the effective, *current* core frequency from APERF/MPERF, for something like 10 ms, I think, as opposed to the static P0 frequency which you're probably expecting. Reporting the P0 frequency would be a lie, so the actual frequency sample is closer to the current load on the box.
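A minimal sketch of the arithmetic behind such a sample, assuming the commonly described relation "effective frequency = reference frequency x delta(APERF) / delta(MPERF)". The function name and the counter values are made up for illustration; the 3492.005 MHz reference is the TSC frequency reported in the dmesg of comment #11, and using it as the reference clock is an assumption.

```python
def effective_mhz(ref_mhz, aperf_start, aperf_end, mperf_start, mperf_end):
    """Estimate the average core frequency over one sampling window.

    APERF advances with the actual core clock and MPERF with a fixed
    reference clock, so the ratio of their deltas scales the reference
    frequency to the frequency the core really ran at.
    """
    d_aperf = aperf_end - aperf_start
    d_mperf = mperf_end - mperf_start
    if d_mperf <= 0:
        raise ValueError("MPERF did not advance during the sample window")
    return ref_mhz * d_aperf / d_mperf


# Hypothetical counter samples for two cores over the same window:
# one clocked slightly above the reference, one clocked well below it.
loaded = effective_mhz(3492.005, 0, 36_000_000, 0, 35_000_000)
quiet = effective_mhz(3492.005, 0, 12_000_000, 0, 35_000_000)
print(f"loaded core: {loaded:.0f} MHz, quiet core: {quiet:.0f} MHz")
```

Because every core is sampled at its own momentary clock, two cores of the same CPU can legitimately show different MHz values at the same time.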
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c11
--- Comment #11 from Sebastian Parschauer
DMI: Dell Inc. Precision Tower 5810/0K240Y, BIOS A27 06/25/2018
tsc: Fast TSC calibration using PIT
...
clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 133484882848 ns
hpet clockevent registered
tsc: Fast TSC calibration using PIT
tsc: Detected 3492.005 MHz processor
[Firmware Bug]: TSC ADJUST: CPU0: -3570856054 force to 0
Calibrating delay loop (skipped), value calculated using timer frequency.. 6984.01 BogoMIPS (lpj=13968020)
...
DMAR-IR: Enabled IRQ remapping in xapic mode
x2apic: IRQ remapping doesn't support X2APIC mode
..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
TSC deadline timer enabled
smpboot: CPU0: Intel(R) Xeon(R) CPU E5-1620 v3 @ 3.50GHz (family: 0x6, model: 0x3f, stepping: 0x2)
Performance Events: PEBS fmt2+, Haswell events, 16-deep LBR, full-width counters, Intel PMU driver.
... version: 3
... bit width: 48
... generic registers: 4
... value mask: 0000ffffffffffff
... max period: 00007fffffffffff
... fixed-purpose events: 3
... event mask: 000000070000000f
smp: Bringing up secondary CPUs ...
x86: Booting SMP configuration:
.... node #0, CPUs: #1
[Firmware Bug]: TSC ADJUST differs within socket(s), fixing all errors
 #2
NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
 #3 #4 #5 #6 #7
smp: Brought up 1 node, 8 CPUs
smpboot: Max logical packages: 1
smpboot: Total of 8 processors activated (55872.08 BogoMIPS)
node 0 initialised, 2719092 pages in 24ms
devtmpfs: initialized
x86/mm: Memory block size: 128MB
A second later:
tsc: Refined TSC clocksource calibration: 3491.914 MHz
clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x32557ae966b, max_idle_ns: 440795369289 ns
A further second later:
clocksource: Switched to clocksource tsc
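The BogoMIPS and lpj figures in the dmesg of comment #11 are consistent with the detected TSC frequency. The sketch below reproduces that arithmetic under one loud assumption: a timer tick rate of HZ = 250, which is not shown anywhere in the log. The helper names are hypothetical, and the total assumes all 8 CPUs report the same lpj.

```python
HZ = 250  # ASSUMPTION: kernel tick rate; the log does not state CONFIG_HZ


def lpj_from_tsc(tsc_khz, hz=HZ):
    """Loops per jiffy when the delay loop is derived from the timer frequency."""
    return tsc_khz * 1000 // hz


def bogomips(lpj, hz=HZ):
    """Format lpj as 'whole.fraction' BogoMIPS with two fractional digits."""
    whole = lpj // (500000 // hz)
    frac = (lpj // (5000 // hz)) % 100
    return f"{whole}.{frac:02d}"


lpj = lpj_from_tsc(3492005)   # "tsc: Detected 3492.005 MHz processor"
print(lpj)                    # 13968020, matches "lpj=13968020"
print(bogomips(lpj))          # 6984.01, matches the per-CPU value
print(bogomips(8 * lpj))      # 55872.08, matches the total for 8 CPUs
```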
http://bugzilla.suse.com/show_bug.cgi?id=1105271
Takashi Iwai
http://bugzilla.suse.com/show_bug.cgi?id=1105271
Marcus Meissner
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c15
--- Comment #15 from Sebastian Parschauer
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c16
Jiri Kosina
http://bugzilla.suse.com/show_bug.cgi?id=1105271
Jiri Kosina
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c17
--- Comment #17 from Jiri Kosina
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c18
--- Comment #18 from Borislav Petkov
http://bugzilla.suse.com/show_bug.cgi?id=1105271
Martin Pluskal
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c19
--- Comment #19 from Takashi Iwai
Btw, there's one more fix for the fix for the SMT off case:
bc2d8d262cba ("cpu/hotplug: Fix SMT supported evaluation")
Can you pls apply it ontop and test with it too?
I'm building a test kernel with this patch in OBS home:tiwai:bsc1105271 repo.
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c20
--- Comment #20 from Sebastian Parschauer
http://bugzilla.suse.com/show_bug.cgi?id=1105271
Sebastian Parschauer
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c21
--- Comment #21 from Takashi Iwai
I'm building a test kernel with this patch in OBS home:tiwai:bsc1105271 repo.
The test kernel is ready: http://download.opensuse.org/repositories/home:/tiwai:/bsc1105271/standard/
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c22
--- Comment #22 from Sebastian Parschauer
(In reply to Takashi Iwai from comment #19)
I'm building a test kernel with this patch in OBS home:tiwai:bsc1105271 repo.
The test kernel is ready: http://download.opensuse.org/repositories/home:/tiwai:/bsc1105271/standard/
Same issue with this kernel-default-4.4.147-1.1.gb325bb2.x86_64.rpm. Did not help.
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c23
--- Comment #23 from Jiri Kosina
Same issue with this kernel-default-4.4.147-1.1.gb325bb2.x86_64.rpm. Did not help.
I expected that a bit, as it's supposed to change behavior only if nosmt is on cmdline or SMT is disabled in BIOS. Would you be able to do the bisect please? My primary suspect would be patches.arch/15-cpu-hotplug-boot-HT-siblings-at-least-once.patch so you can either start with that one, or do a proper bisect of those 18 patches. Thanks again.
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c24
Sebastian Parschauer
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c25
--- Comment #25 from Takashi Iwai
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c26
--- Comment #26 from Sebastian Parschauer
http://download.opensuse.org/repositories/home:/tiwai:/bsc1105271-revert-15/standard/
No luck with that. Problem persists. Will compile locally.
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c27
--- Comment #27 from Sebastian Parschauer
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c28
--- Comment #28 from Takashi Iwai
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c29
--- Comment #29 from Sebastian Parschauer
patches.arch/05-cpu-hotplug-provide-knobs-to-control-smt.patch is not a problem but the L1TF patches heavily depend on this one.
With the SMT patches removed until this one, the problem persists.
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c30
Jan Baier
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c31
--- Comment #31 from Sebastian Parschauer
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c32
--- Comment #32 from Jiri Kosina
Thanks, that helped. Removing everything until:
patches.arch/05-cpu-hotplug-provide-knobs-to-control-smt.patch is not a problem but the L1TF patches heavily depend on this one.
So could you please bisect the series then, including the L1TF patches (or just disable all of them in one go and see whether the problem persists; if so, just bisect the nosmt series). Of course you have to start the bisection by always disabling the 'higher-numbered' patches first, so that the patches still apply. Thanks.
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c33
--- Comment #33 from Sebastian Parschauer
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c34
--- Comment #34 from Borislav Petkov
Now you need to help me again how to get from what kernel-source.git provides to what kernel.git provides. I've used kernel.git before and just used git revert.
Just comment out the patches in series.conf up to which you wanna test, sequence-patch.sh it, build and boot. Then you remove the comment of the next patch and repeat. Until you see the difference when booting.
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c35
--- Comment #35 from Sebastian Parschauer
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c36
--- Comment #36 from Jiri Kosina
So could you please bisect the series then, including the L1TF patches (or just disable all of them in one go and see whether the problem persists; if so, just bisect the nosmt series).
Of course you have to start the bisection by always disabling the 'higher-numbered' patches first, so that the patches still apply.
So actually, to be completely on the safe side, I'd rather propose:
- you start by commenting out *all* the l1tf and nosmt patches
- if that kernel boots, it's clear that the issue is introduced by one of those patches, so start bisect-enabling them starting from the one with the lowest number (so that they are still in proper series), until you identify the one that introduces the regression
- if the problem still persists even with all the nosmt and l1tf patches disabled, it's caused by some other change that happened (I find that rather unlikely, but let's not completely ditch that possibility).
Thank you.
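The procedure proposed above can be sketched as a binary search over prefixes of the ordered patch series. This is only an illustration: `first_bad_patch` and the `boots_with` callback are hypothetical names, each callback invocation stands in for a full "edit series.conf, build, boot" cycle, and the search assumes that once the offending patch is enabled every longer prefix fails as well.

```python
def first_bad_patch(series, boots_with):
    """Return the first patch whose inclusion breaks boot, or None.

    series:     patch names in series.conf order.
    boots_with: callable taking the list of enabled patches (always a
                prefix of `series`, so the patches still apply) and
                returning True if the resulting kernel boots.
    """
    if not boots_with([]):
        # Fails even with the whole series disabled: cause is elsewhere.
        raise RuntimeError("regression is not caused by this series")
    if boots_with(series):
        return None  # nothing in the series breaks boot
    good, bad = 0, len(series)  # prefix of length `good` boots, `bad` fails
    while bad - good > 1:
        mid = (good + bad) // 2
        if boots_with(series[:mid]):
            good = mid
        else:
            bad = mid
    return series[bad - 1]


# Stand-in oracle for illustration only; a real run needs a build and a
# boot test per step. The names are the first three patches listed in
# comment #37.
series = [
    "patches.arch/01-sched-smt-update-sched_smt_present-at-runtime.patch",
    "patches.arch/02-x86-smp-provide-topology_is_primary_thread.patch",
    "patches.arch/03-x86-topology-provide-topology_smt_supported.patch",
]
print(first_bad_patch(series, lambda enabled: series[0] not in enabled))
```

Enabling patches from the lowest number upward keeps every tested prefix applicable, which is why the search works on prefixes rather than on arbitrary subsets.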
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c37
--- Comment #37 from Sebastian Parschauer
patches.arch/01-sched-smt-update-sched_smt_present-at-runtime.patch
patches.arch/02-x86-smp-provide-topology_is_primary_thread.patch
patches.arch/03-x86-topology-provide-topology_smt_supported.patch
patches.arch/04-cpu-hotplug-split-do_cpu_down.patch
patches.arch/04.1-cpu-hotplug-add-sysfs-state-interface.patch
patches.arch/04.2-x86-topology-add-topology_max_smt_threads.patch
patches.arch/04.3-x86-smpboot-do-not-use-smp_num_siblings-in-_max_logical_packages-calculation.patch
patches.arch/05-cpu-hotplug-provide-knobs-to-control-smt.patch
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c38
Sebastian Parschauer
patches.arch/01-sched-smt-update-sched_smt_present-at-runtime.patch
Without it, the system is booting.
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c39
Sebastian Parschauer
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c40
--- Comment #40 from Takashi Iwai
It is directly the first patch of the series:
patches.arch/01-sched-smt-update-sched_smt_present-at-runtime.patch
Without it, the system is booting.
So you just disabled this patch while keeping the rest, and it boots fine?
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c41
--- Comment #41 from Jiri Kosina
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c42
--- Comment #42 from Sebastian Parschauer
So you just disabled this patch while keeping the rest, and it boots fine?
I've tested both ways based on the first bad KOTD commit f206194.
First way: disabling everything at "nosmt", "KVM", "SMT runtime control" and "fixes".
Second way: Only disable the patch patches.arch/01-sched-smt-update-sched_smt_present-at-runtime.patch.
The result is not perfect. Systemd times out while looking for my encrypted /data partition on 1TB HDD after I provide the pw for encrypted /home on NVMe. But I think this is config related. I can try based on latest origin/SLE12-SP3 with the Leap 42.3 kernel-default config and only the one patch removed if you want.
http://bugzilla.suse.com/show_bug.cgi?id=1105271
Jiri Kosina
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c43
--- Comment #43 from Sebastian Parschauer
I can try based on latest origin/SLE12-SP3 with the Leap 42.3 kernel-default config and only the one patch removed if you want.
That method worked. System is coming up completely. I'm on commit ab97704c13 now. Will build two kernels next: One with the patch re-enabled and then one with the patch from comment 41 added at "fixes".
http://bugzilla.suse.com/show_bug.cgi?id=1105271
Sebastian Parschauer
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c45
Jiri Kosina
http://bugzilla.suse.com/show_bug.cgi?id=1105271
Jiri Kosina
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c47
--- Comment #47 from Borislav Petkov
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c48
--- Comment #48 from Sebastian Parschauer
[ 0.487983] smpboot: #2smpboot: #3smpboot: #4
[ 0.496242] BUG: scheduling while atomic: swapper/4/0/0x00000002
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c49
Sebastian Parschauer
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c50
--- Comment #50 from Sebastian Parschauer
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c51
Jiri Kosina
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c52
--- Comment #52 from Sebastian Parschauer
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c53
Sebastian Parschauer
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c54
--- Comment #54 from Sebastian Parschauer
x86/PAT: Configuration [0-7]: WB WC UC- UC WB WC UC- WT
4.12:
x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT
The second WC is changed to WP.
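As a quick cross-check of that observation, the two "x86/PAT: Configuration [0-7]" strings quoted above can be compared entry by entry. The helper name `pat_diff` is made up for this sketch; the indices are zero-based, matching the [0-7] range in the log line.

```python
def pat_diff(old, new):
    """Return (index, old_type, new_type) for each PAT entry that differs."""
    old_entries, new_entries = old.split(), new.split()
    if len(old_entries) != len(new_entries):
        raise ValueError("PAT configurations have different lengths")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(old_entries, new_entries))
        if a != b
    ]


print(pat_diff("WB WC UC- UC WB WC UC- WT",
               "WB WC UC- UC WB WP UC- WT"))
# [(5, 'WC', 'WP')]
```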
http://bugzilla.suse.com/show_bug.cgi?id=1105271
Sebastian Parschauer
http://bugzilla.suse.com/show_bug.cgi?id=1105271
Sebastian Parschauer
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c55
--- Comment #55 from Borislav Petkov
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c56
--- Comment #56 from Borislav Petkov
Do you have the TSC desync messages (and turning TSC off) in the dmesg of the boot that works (with the 42.3 kernel with just the bisected patch removed)?
So they're not present in the 4.12 dmesg and it says:
[ 0.000000] [Firmware Bug]: TSC ADJUST differs within socket(s), fixing all errors
and I know there was a major screwage in the TSC_ADJUST area and if BIOS is fumbling with it... I don't see how that would cause the hang though. Maybe boot with tsc=reliable to disable the sync check...
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c57
--- Comment #57 from Sebastian Parschauer
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c58
--- Comment #58 from Sebastian Parschauer
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c59
--- Comment #59 from Sebastian Parschauer
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c60
--- Comment #60 from Sebastian Parschauer
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c61
--- Comment #61 from Takashi Iwai
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c62
--- Comment #62 from Sebastian Parschauer
Could you check the kernel in IBS (not OBS) home:tiwai:test:sle12-sp3-smt-test?
Done, kernel-default-4.4.151-1.1.g6950079.x86_64.rpm does not boot. Same issue as with ab97704c13. Do you need the log?
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c63
--- Comment #63 from Takashi Iwai
(In reply to Takashi Iwai from comment #61)
Could you check the kernel in IBS (not OBS) home:tiwai:test:sle12-sp3-smt-test?
Done, kernel-default-4.4.151-1.1.g6950079.x86_64.rpm does not boot. Same issue as with ab97704c13. Do you need the log?
Not needed, thanks.
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c69
--- Comment #69 from Borislav Petkov
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c73
--- Comment #73 from Sebastian Parschauer
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c74
Borislav Petkov
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c75
--- Comment #75 from Jiri Kosina
Created attachment 780667 [details] test patch 2
Ok, looks like that jump label call hangs somewhere. Looking at upstream, we don't have
d0646a6f5533 ("jump_label: Add RELEASE barrier after text changes")
so here's an updated patch.
I actually think the problem is elsewhere. arch_jump_label_transform() calls get_online_cpus(), but we're calling this from the CPU hotplug/bringup path while holding the CPU hotplug lock (I will have to double check that, but I am pretty sure at this point), and therefore arch_jump_label_transform() deadlocks on acquiring it.
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c77
--- Comment #77 from Borislav Petkov
arch_jump_label_transform() calls get_online_cpus(), but we're calling this from CPU hotplug/bringup path while holding (I will have to double check that, but I am pretty sure at this point) CPU hotplug lock, and therefore arch_jump_label_transform() deadlocks on acquiring it.
Hmm, and the fix for that is: f2545b2d4ce1 ("jump_label: Reorder hotplug lock and jump_label_lock") but if I backport that, I'd need to backport the _cpuslocked() variants too. And yes, why it triggers only on those machines is beyond me too.
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c79
Borislav Petkov
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c80
--- Comment #80 from Sebastian Parschauer
debug initcall_debug ignore_loglevel log_buf_len=16M earlyprintk=serial,ttyS0,115200 added to the boot parameters for this kernel.
Kernel is building. Will test it and attach the log.
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c81
--- Comment #81 from Sebastian Parschauer
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c82
--- Comment #82 from Borislav Petkov
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c83
--- Comment #83 from Jiri Kosina
tsc_init() enables the __use_tsc static key on the BSP and *that* grabs jump_label_mutex before it grabs the hotplug lock while on another CPU the CPU notifier runs which holds the hotplug lock already and then enables another static key - the sched_smt_present - which causes the lock inversion by trying to grab the jump_label_mutex first.
Yeah, I agree with that analysis completely, thanks.
@jikos: the way I see it, there's no way around backporting the jump label patches so that we don't grab that hotplug again in the notifier call.
Or am I missing something?
I have been looking into the code a bit, and I really don't see any other way around this than factoring the hotplug lock out of jump label updating code :/
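The lock inversion described in this comment can be illustrated with a small, purely schematic check. It does not model kernel code: the function name `inverted_pairs`, the path labels and the lock label `hotplug_lock` are assumptions made for this sketch, while `jump_label_mutex` and the two acquisition orders are taken from the analysis quoted above.

```python
from itertools import combinations


def inverted_pairs(paths):
    """Find lock pairs that two code paths acquire in opposite order.

    paths: mapping of code-path name -> list of locks in acquisition order.
    Returns tuples (path_a, path_b, first_lock, second_lock) where path_a
    takes first_lock before second_lock and path_b does the reverse.
    """
    def ordered_pairs(locks):
        return {(a, b) for i, a in enumerate(locks) for b in locks[i + 1:]}

    found = []
    for (name_a, locks_a), (name_b, locks_b) in combinations(paths.items(), 2):
        pairs_b = ordered_pairs(locks_b)
        for a, b in sorted(ordered_pairs(locks_a)):
            if (b, a) in pairs_b:
                found.append((name_a, name_b, a, b))
    return found


# Acquisition orders as described in the analysis quoted above.
broken = {
    "tsc_init on BSP": ["jump_label_mutex", "hotplug_lock"],
    "CPU notifier": ["hotplug_lock", "jump_label_mutex"],
}
print(inverted_pairs(broken))

# With one consistent order on both paths there is nothing to report.
consistent = {
    "tsc_init on BSP": ["hotplug_lock", "jump_label_mutex"],
    "CPU notifier": ["hotplug_lock", "jump_label_mutex"],
}
print(inverted_pairs(consistent))
```

If each path holds its first lock while waiting for the second, neither can make progress, which matches the hang seen during secondary CPU bringup. Whether such an inversion actually deadlocks depends on timing, which may be part of why only some machines trigger it.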
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c85
--- Comment #85 from Borislav Petkov
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c86
--- Comment #86 from Sebastian Parschauer
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c87
--- Comment #87 from Borislav Petkov
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c88
--- Comment #88 from Sebastian Parschauer
http://bugzilla.suse.com/show_bug.cgi?id=1105271
Borislav Petkov
http://bugzilla.suse.com/show_bug.cgi?id=1105271
Sebastian Parschauer
http://bugzilla.suse.com/show_bug.cgi?id=1105271
Borislav Petkov
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c97
--- Comment #97 from Sebastian Parschauer
https://build.opensuse.org/project/show/home:sparschauer:leap42.3_fixes
Version: 4.4.143-65.1.1.FIX.1105271
Change: added the 8 jump label patches on top of Leap 42.3 kernel MU
Tested: yes, works
Supported: no

Thanks for all the help here!
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c99
--- Comment #99 from Swamp Workflow Management
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c101
--- Comment #101 from Swamp Workflow Management
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c103
Sebastian Parschauer
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c104
--- Comment #104 from Swamp Workflow Management
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c105
--- Comment #105 from Swamp Workflow Management
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c106
Sebastian Parschauer
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c107
Sebastian Vollath
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c108
Jan Baier
@Jan: Is this fixed with the SLE kernel MU for you as well? TIA
I no longer have precisely the same software setup, but the latest kernel seems to work for me as well.
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c109
--- Comment #109 from Swamp Workflow Management
http://bugzilla.suse.com/show_bug.cgi?id=1105271
http://bugzilla.suse.com/show_bug.cgi?id=1105271#c110
--- Comment #110 from Swamp Workflow Management