[opensuse] Q: 4.16 kernel status for Leap 15
Hi ! Is this kernel (4.16 from pmdpalma home repository) works stable on Leap 15 ? I’m going to use it in production system, KVM host system. http://ftp.opensuse.org/repositories/home:/pmdpalma:/core:/Kernel_stable_sta... -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
Op woensdag 6 juni 2018 16:26:17 CEST schreef Andrei Verovski:
Hi !
Is this kernel (4.16 from pmdpalma home repository) works stable on Leap 15 ? I’m going to use it in production system, KVM host system.
http://ftp.opensuse.org/repositories/home:/pmdpalma:/core:/Kernel_stable_sta ndard/openSUSE_Leap_15.0/ And you ask this after deciding to use it on a production system? This is a home: repo, these are for packagers to create packages, break them, fix them, break them again. Never on one of my production systems. Never.
-- Gertjan Lettink a.k.a. Knurpht openSUSE Board Member openSUSE Forums Team -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
Knurpht @ openSUSE wrote:
Op woensdag 6 juni 2018 16:26:17 CEST schreef Andrei Verovski:
Hi !
Is this kernel (4.16 from pmdpalma home repository) works stable on Leap 15 ? I’m going to use it in production system, KVM host system.
ndard/openSUSE_Leap_15.0/ And you ask this after deciding to use it on a production system? This is a home: repo, these are for packagers to create packages, break
http://ftp.opensuse.org/repositories/home:/pmdpalma:/core:/Kernel_stable_sta them, fix them, break them again. Never on one of my production systems. Never.
Couldn't agree more :-) if you _must_ have the latest kernel, for whatever reason, maybe check this out: https://en.opensuse.org/openSUSE:Kernel_of_the_day Personally, in production I prefer being backlevel. -- Per Jessen, Zürich (23.6°C) http://www.dns24.ch/ - your free DNS host, made in Switzerland. -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
On 06/06/2018 11:26 AM, Per Jessen wrote:
if you _must_ have the latest kernel, for whatever reason, maybe check this out:
https://en.opensuse.org/openSUSE:Kernel_of_the_day
Personally, in production I prefer being backlevel.
In general 4.16 is fine. Been on it since April 20. Aside from a slight init delay on a SuperMicro H8DM8-2/H8DM8-2 dual-quad-core Opteron box, there have been no issues. There the only issue there is an init issue with: perf: interrupt took too long (10078 > 2500), lowering kernel.perf_event_max_sample_rate to 19800 ... Which then causes the kernel loop through each core in some type of NMI handler self-test dumping all registers until it figures out there really is nothing wrong and then proceeds to boot fine. Another quad-quad-core Opetron Supermicro board doesn't experience the same delay. Both boxes boot without any such problem on 4.14 -- so it looks like some oddball bios or firmware issue. I have encountered no operational issues otherwise (servers provide dhcp w/dynamic DNS updates, bind, apache, mysql, postfix/dovecot, hylafax/avantfax, vsftp, etc..) -- David C. Rankin, J.D.,P.E. -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
David C. Rankin wrote:
On 06/06/2018 11:26 AM, Per Jessen wrote:
if you _must_ have the latest kernel, for whatever reason, maybe check this out:
https://en.opensuse.org/openSUSE:Kernel_of_the_day
Personally, in production I prefer being backlevel.
In general 4.16 is fine. Been on it since April 20. Aside from a slight init delay on a SuperMicro H8DM8-2/H8DM8-2 dual-quad-core Opteron box, there have been no issues.
There the only issue there is an init issue with:
perf: interrupt took too long (10078 > 2500), lowering kernel.perf_event_max_sample_rate to 19800
I see that all the time ? When I googled it very recently, I think the consensus was not to worry. -- Per Jessen, Zürich (17.8°C) http://www.dns24.ch/ - free dynamic DNS, made in Switzerland. -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
On 06/07/2018 12:40 AM, Per Jessen wrote:
perf: interrupt took too long (10078 > 2500), lowering kernel.perf_event_max_sample_rate to 19800 I see that all the time ? When I googled it very recently, I think the consensus was not to worry.
Well, generally I don't, and LKML doesn't seem to be concerned with it, the only problem for me is the set of dumps and checks adds about 2 min.+ to the init of the box as looping though all cores sending NMI to every other core. Like I said, except for this one quirky board, (and even with this one after the delay) -- no problems whatsoever otherwise. Strange init issue though -- it showed up in one of the early 4.16.x releases, e.g. (1st 2 lines for context -- all OK to there) [ 0.464392] NET: Registered protocol family 1 [ 0.464769] PCI Interrupt Link [LUB0] enabled at IRQ 23 [ 10.412957] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 1.718 msecs [ 10.412957] perf: interrupt took too long (10078 > 2500), lowering kernel.perf_event_max_sample_rate to 19800 [ 10.466267] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 3.398 msecs [ 10.466267] perf: interrupt took too long (23235 > 12597), lowering kernel.perf_event_max_sample_rate to 8400 [ 11.139636] INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 38.086 msecs [ 11.138340] perf: interrupt took too long (979330 > 911352), lowering kernel.perf_event_max_sample_rate to 300 [ 11.391081] INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 72.692 msecs [ 11.139636] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 381.378 msecs [ 12.138796] perf: interrupt took too long (1897735 > 1224162), lowering kernel.perf_event_max_sample_rate to 300 [ 25.142864] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 1947.833 msecs [ 28.140139] perf: interrupt took too long (13755889 > 2372168), lowering kernel.perf_event_max_sample_rate to 300 [ 28.138850] watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper/0:0] [ 28.138850] Modules linked in: [ 28.138850] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.16.13-1-ARCH #1 [ 86.138261] Hardware name: Supermicro H8DM8-2/H8DM8-2, BIOS 080014 11/06/2009 [ 90.139366] RIP: 0010:__do_softirq+0x75/0x2e0 [ 90.139366] RSP: 0018:ffff9b36e7c03f70 EFLAGS: 00000297 ORIG_RAX: ffffffffffffff12 [ 90.139366] RAX: 0000000000021000 RBX: ffff9b36e7c15f00 RCX: 000000003543891f [ 91.138917] INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 467.840 msecs [ 91.138917] RDX: 0000000000000000 RSI: 000000003543891f RDI: 0000000000000100 [ 91.138917] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000078 [ 91.138917] R10: 000000005b1397ce R11: 0000000000000000 R12: 0000000000000000 [ 91.138917] R13: 0000000000000000 R14: 0000000000000282 R15: 0000000000000000 [ 91.138917] FS: 0000000000000000(0000) GS:ffff9b36e7c00000(0000) knlGS:0000000000000000 [ 91.138917] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 91.138917] NMI watchdog: Watchdog detected hard LOCKUP on cpu 4 [ 92.138146] CR2: ffff9b33d2655000 CR3: 000000011200a000 CR4: 00000000000006f0 [ 92.138146] Call Trace: [ 92.138146] <IRQ> [ 92.138146] irq_exit+0xc9/0xe0 [ 93.138944] smp_apic_timer_interrupt+0x73/0x160 [ 94.137773] Modules linked in: [ 94.137772] apic_timer_interrupt+0xf/0x20 [ 95.137578] </IRQ> [ 95.137578] RIP: 0010:native_safe_halt+0x2/0x10 [ 95.137578] RSP: 0018:ffffffffbf003ea0 EFLAGS: 00000202 [ 96.136670] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.16.13-1-ARCH #1 [ 97.138510] Hardware name: Supermicro H8DM8-2/H8DM8-2, BIOS 080014 11/06/2009 [ 97.138081] ORIG_RAX: ffffffffffffff12 <snip another 900 lines> [ 151.438482] Sending NMI from CPU 1 to CPUs 3: [ 151.469721] NMI backtrace for cpu 3 [ 151.469722] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G L 4.16.13-1-ARCH #1 [ 151.469723] Hardware name: Supermicro H8DM8-2/H8DM8-2, BIOS 080014 11/06/2009 [ 151.469724] RIP: 0010:native_queued_spin_lock_slowpath+0x19a/0x1c0 [ 151.469725] RSP: 0018:ffff9b36e7cc3ef8 EFLAGS: 00000046 [ 151.469726] RAX: 0000000000000000 RBX: ffffffffbf051640 RCX: ffff9b36e7ce2480 [ 151.469726] RDX: 0000000000000002 RSI: ffffffffbee70e1c RDI: 0000000000100000 [ 151.469727] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000101 [ 151.469728] R10: 0000000000338518 R11: 0000000000000000 R12: ffff9b36e7ce2480 [ 151.469728] R13: ffffffffbeed81f2 R14: 0000000000000009 R15: 0000000000000000 [ 151.469729] FS: 0000000000000000(0000) GS:ffff9b36e7cc0000(0000) knlGS:0000000000000000 [ 151.469730] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 151.469731] CR2: 0000000000000000 CR3: 000000011200a000 CR4: 00000000000006e0 [ 151.469731] Call Trace: [ 151.469732] <IRQ> [ 151.469733] _raw_spin_lock_irqsave+0x44/0x50 [ 151.469735] rcu_process_callbacks+0xce/0x3b0 [ 151.469737] __do_softirq+0xf1/0x2e0 [ 151.469739] irq_exit+0xc9/0xe0 [ 151.469741] smp_apic_timer_interrupt+0x73/0x160 [ 151.469743] apic_timer_interrupt+0xf/0x20 [ 151.469744] </IRQ> [ 151.469745] RIP: 0010:native_safe_halt+0x2/0x10 [ 151.469745] RSP: 0018:ffff9d6a431bbec0 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff12 [ 151.469747] RAX: 0000000000000003 RBX: 0000000000000003 RCX: 0000000000000000 [ 151.469747] RDX: 0000000000000000 RSI: ffffffffbee70e1c RDI: ffffffffbee7107a [ 151.469748] RBP: ffffffffbf132b60 R08: 0000000000000000 R09: 0000000000000000 [ 151.469748] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffbeed81f2 [ 151.469749] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 151.469752] default_idle+0x18/0x100 [ 151.469753] do_idle+0x179/0x1b0 [ 151.469755] cpu_startup_entry+0x6f/0x80 [ 151.469757] start_secondary+0x1aa/0x200 [ 151.469758] secondary_startup_64+0xa5/0xb0 [ 151.469760] Code: 12 83 e0 03 83 ea 01 48 c1 e0 04 48 63 d2 48 05 80 24 02 00 48 03 04 d5 80 04 ef be 48 89 08 8b 41 08 85 c0 75 09 f3 90 8b 41 08 <85> c0 74 f7 48 8b 01 48 85 c0 0f 84 54 ff ff ff 49 89 c0 0f 0d [ 151.438482] Sending NMI from CPU 1 to CPUs 4: [ 151.470739] NMI backtrace for cpu 4 skipped: idling at native_safe_halt+0x2/0x10 [ 151.438482] Sending NMI from CPU 1 to CPUs 5: [ 151.470007] NMI backtrace for cpu 5 skipped: idling at native_safe_halt+0x2/0x10 [ 151.438482] Sending NMI from CPU 1 to CPUs 7: [ 151.471011] NMI backtrace for cpu 7 skipped: idling at native_safe_halt+0x2/0x10 [ 151.474216] PCI Interrupt Link [LUB2] enabled at IRQ 22 [ 151.490261] pci 0000:00:00.0: Found enabled HT MSI Mapping [ 151.490302] pci 0000:00:00.0: Found enabled HT MSI Mapping [ 151.490342] pci 0000:00:00.0: Found enabled HT MSI Mapping [ 151.490383] pci 0000:00:00.0: Found enabled HT MSI Mapping [ 151.490428] pci 0000:00:00.0: Found enabled HT MSI Mapping [ 151.490477] pci 0000:00:00.0: Found enabled HT MSI Mapping [ 151.490527] pci 0000:00:00.0: Found enabled HT MSI Mapping [ 151.490581] pci 0000:00:00.0: Found enabled HT MSI Mapping [ 151.490636] pci 0000:00:00.0: Found enabled HT MSI Mapping [ 151.490694] pci 0000:00:00.0: Found enabled HT MSI Mapping [ 151.490723] pci 0000:01:05.0: Video device with shadowed ROM at [mem 0x000c0000-0x000dffff] [ 151.490725] PCI: CLS 64 bytes, default 64 [ 151.490783] Unpacking initramfs... [ 151.620261] Freeing initrd memory: 7660K [ 151.620268] PCI-DMA: Disabling AGP. All good from here on out... -- David C. Rankin, J.D.,P.E. -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
David C. Rankin wrote:
On 06/07/2018 12:40 AM, Per Jessen wrote:
perf: interrupt took too long (10078 > 2500), lowering kernel.perf_event_max_sample_rate to 19800
I see that all the time ? When I googled it very recently, I think the consensus was not to worry.
Well, generally I don't, and LKML doesn't seem to be concerned with it, the only problem for me is the set of dumps and checks adds about 2 min.+ to the init of the box as looping though all cores sending NMI to every other core.
You are sure they are related? I don't get any dumps, and the "perf: interrupt took too long" messages usually turn up a while after a reboot - [ 3.835386] usbcore: registered new interface driver usbhid [ 501.792135] perf: interrupt took too long (2520 > 2500), lowering kernel.perf_event_max_sample_rate to 79250 [ 534.657898] perf: interrupt took too long (3154 > 3150), lowering kernel.perf_event_max_sample_rate to 63250 [ 575.439441] perf: interrupt took too long (3956 > 3942), lowering kernel.perf_event_max_sample_rate to 50500 [ 661.743477] perf: interrupt took too long (4957 > 4945), lowering kernel.perf_event_max_sample_rate to 40250 [ 850.194665] perf: interrupt took too long (6214 > 6196), lowering kernel.perf_event_max_sample_rate to 32000 [ 1172.381354] perf: interrupt took too long (7773 > 7767), lowering kernel.perf_event_max_sample_rate to 25500 [snip]
[ 28.138850] watchdog: BUG: soft lockup - CPU#0 stuck for 23s! ^^^^^^^^^^^^^^^^
Try searching bugzilla for that one. Very annoying, I have had it a few times over the years. -- Per Jessen, Zürich (17.7°C) http://www.dns24.ch/ - your free DNS host, made in Switzerland. -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
On 06/07/2018 01:46 AM, Per Jessen wrote:
Try searching bugzilla for that one. Very annoying, I have had it a few times over the years.
Oh, I have until my little fingers hurt. All the way down to core.c in the kernel. The problem is there are a number of factors that come into play. There is the nomodeset angle, and apic angle (on by default with SMP), then there is the high load angle. There have been a ton of perf patches flying around on LKML just in the past few days. As well as a few addressing nmi checks. It all boils down to unless you are one of the kernel devs, with a niche in multi-cpu pre-init, or you just have a month long sabbatical you are willing to devote to chasing the issue down, you are pretty much stuck. It was comical to see how many CPU stuck for 23 sec. posts there are out there. I know there were no issues in 4.14, there are issues in 4.16 -- those are about the only 2 hard data points I have. -- David C. Rankin, J.D.,P.E. -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
David C. Rankin wrote:
On 06/07/2018 01:46 AM, Per Jessen wrote:
Try searching bugzilla for that one. Very annoying, I have had it a few times over the years.
Oh, I have until my little fingers hurt. All the way down to core.c in the kernel. The problem is there are a number of factors that come into play.
In bugzilla, I found 32 reports on "soft lockup", back to openSUSE 11.2. I have had a few myself, ISTR. Unless one of those reports should miraculously match your situation, you probably wont find any actual help there. I would open a new report. -- Per Jessen, Zürich (18.1°C) http://www.hostsuisse.com/ - virtual servers, made in Switzerland. -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
On 06/08/2018 12:31 AM, Per Jessen wrote:
In bugzilla, I found 32 reports on "soft lockup", back to openSUSE 11.2. I have had a few myself, ISTR. Unless one of those reports should miraculously match your situation, you probably wont find any actual help there. I would open a new report.
I've still got 2 open threads on LKML. I'll wait until I understand what the heck this problem is and then I'll open a bug. Unless the devs just happen to have this Supermicro board and CPU combination -- I suspect it would be a wild goose-chase resulting in a typical "WONTFIX - Works For Me". -- David C. Rankin, J.D.,P.E. -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
On 07/06/18 00:26, Andrei Verovski wrote:
Hi !
Is this kernel (4.16 from pmdpalma home repository) works stable on Leap 15 ? I’m going to use it in production system, KVM host system.
http://ftp.opensuse.org/repositories/home:/pmdpalma:/core:/Kernel_stable_sta...
Why the kernel from this repository? I have used the kernel from this repo. (below) for a very, very, long time and the only hassle that I can recall ever having is that the nVidia driver occasionally not compiling with the 'latest' kernel. The latest kernel I am using is 4.17.0-1 (in Leap 15 and Tumbleweed), found here: download.opensuse.org/repositories/Kernel:/stable/standard/ BC -- "..The times have been That, when the brains were out, the man would die,.." "Macbeth", Shakespeare -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
participants (5)
-
Andrei Verovski
-
Basil Chupin
-
David C. Rankin
-
Knurpht @ openSUSE
-
Per Jessen