[Bug 1175005] New: OpenSuse 15.2 with xen kernel freezes with no relevant details in system log
http://bugzilla.opensuse.org/show_bug.cgi?id=1175005

Bug ID: 1175005
Summary: OpenSuse 15.2 with xen kernel freezes with no relevant details in system log
Classification: openSUSE
Product: openSUSE Distribution
Version: Leap 15.2
Hardware: Other
OS: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Xen
Assignee: xen-bugs@suse.de
Reporter: boris.grinac@upsserv.cz
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

My setup: freshly installed openSUSE 15.2, Xen, NVMe disk.
Hardware: Intel® Server Board S2600STB, 2x LGA3647, C624, 16x DDR4, 10x SATA, 2x 10GbE, IPMI.

This server ran fine on openSUSE 15.1, and later on openSUSE 15.2, with a SATA SSD drive. The only NVMe device was an M.2 drive used for system boot. After I installed a new NVMe U.2 drive, the server started to freeze once I started the virtual machines. My VM images are file based, in raw format.

At first I thought the problem was caused by formatting the new NVMe drive with btrfs. So I erased the drive, created LVM on it, and formatted the logical volume as xfs. This helped only in the sense that the server froze after some hours rather than immediately after I started the VMs.

Symptoms in the journal: processor xx soft lockup for xx seconds. Strictly speaking, the system was still running while these messages were being logged, but it was slow, unusable, and not accessible over ssh.

I experimented with giving dom0 more memory and a higher processor weight, but this did not change the situation.

I resolved the situation by installing Ubuntu 20.04 server with Xen. The server is now rock stable; I ran load tests and it holds. The Xen version in Ubuntu 20.04 is old, 4.11, so I had to add sched=credit2 to the Xen command line. So it looks like this was not a problem with the credit2 scheduler.
Here is my new xl info, which was nearly the same with openSUSE 15.2:

xl info
host                   : upsserver2
release                : 5.4.0-42-generic
version                : #46-Ubuntu SMP Fri Jul 10 00:24:02 UTC 2020
machine                : x86_64
nr_cpus                : 32
max_cpu_id             : 223
nr_nodes               : 2
cores_per_socket       : 8
threads_per_core       : 2
cpu_mhz                : 2095.082
hw_caps                : bfebfbff:77fef3ff:2c100800:00000121:0000000f:d19ff7eb:00000008:00000100
virt_caps              : hvm
total_memory           : 130762
free_memory            : 69548
sharing_freed_memory   : 0
sharing_used_memory    : 0
outstanding_claims     : 0
free_cpus              : 0
xen_major              : 4
xen_minor              : 11
xen_extra              : .4-pre
xen_version            : 4.11.4-pre
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler          : credit2
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          :
xen_commandline        : placeholder vga=gfx-1024x768x16 dom0_mem=8192M,max:8192M dom0_max_vcpus=4 dom0_vcpus_pin=true sched=credit2 no-real-mode edd=off
cc_compiler            : gcc (Ubuntu 9.2.1-31ubuntu3) 9.2.1 20200306
cc_compile_by          : ubuntu-devel-di
cc_compile_domain      : lists.ubuntu.com
cc_compile_date        : Tue Mar 10 09:04:06 UTC 2020
build_id               : 70edf50fce444a706eb5c69735c35c1838e4eaee
xend_config_format     : 4

--
You are receiving this mail because:
You are on the CC list for the bug.
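Since the report says the xl info output was "nearly the same" on both installations, a field-by-field comparison could show exactly what differs. The following is only a minimal sketch, assuming both outputs were saved as text in the "key : value" layout shown above. The function names are made up for illustration, and the second sample contains placeholder values, not data from this bug.

```python
def parse_xl_info(text):
    """Parse `xl info` style 'key : value' lines into a dict.

    Only the first colon separates key from value, so values that
    contain colons themselves (hw_caps, version) stay intact.
    """
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            info[key.strip()] = value.strip()
    return info


def diff_xl_info(old, new):
    """Return {key: (old_value, new_value)} for every field that differs."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


# A few fields copied from the Ubuntu 20.04 output in the report.
UBUNTU_INFO = """\
release                : 5.4.0-42-generic
xen_version            : 4.11.4-pre
xen_scheduler          : credit2
hw_caps                : bfebfbff:77fef3ff:2c100800:00000121:0000000f:d19ff7eb:00000008:00000100
"""

# HYPOTHETICAL second output: the release and xen_version values are
# placeholders, because the report does not include the openSUSE output.
OTHER_INFO = """\
release                : placeholder-kernel-release
xen_version            : placeholder-xen-version
xen_scheduler          : credit2
hw_caps                : bfebfbff:77fef3ff:2c100800:00000121:0000000f:d19ff7eb:00000008:00000100
"""

if __name__ == "__main__":
    changes = diff_xl_info(parse_xl_info(OTHER_INFO), parse_xl_info(UBUNTU_INFO))
    for field, (before, after) in changes.items():
        print(f"{field}: {before} -> {after}")
```

With real data, the two strings would be replaced by the saved output of xl info from each installation.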
http://bugzilla.opensuse.org/show_bug.cgi?id=1175005
http://bugzilla.opensuse.org/show_bug.cgi?id=1175005#c1

Jürgen Groß <jgross@suse.com> changed:

What  |Removed |Added
----------------------------------------------------------------------------
CC    |        |boris.grinac@upsserv.cz
Flags |        |needinfo?(boris.grinac@upsserv.cz)

--- Comment #1 from Jürgen Groß <jgross@suse.com> ---
Would it be possible to test a kernel patch in your setup?

I recently found a problem in the kernel which (in theory) could produce symptoms like the ones you are seeing. Up to now I have not had a good reproducer for the problem; it showed up only very rarely.
http://bugzilla.opensuse.org/show_bug.cgi?id=1175005
http://bugzilla.opensuse.org/show_bug.cgi?id=1175005#c2

--- Comment #2 from Jürgen Groß <jgross@suse.com> ---
Created attachment 840502
  --> http://bugzilla.opensuse.org/attachment.cgi?id=840502&action=edit
Tentative dom0 kernel patch

This patch is just a wild guess. The issue it repairs could result in a wide variety of symptoms, all stemming from inconsistencies in memory management.

The reasons I think it might help in your case are:

- your use of NVMe devices will result in timings very different from those of spinning disks
- btrfs is known to have lots of async handling, so your observation that btrfs seems to trigger the issue more often would make sense

Are you able to build the kernel with the attached patch?
http://bugzilla.opensuse.org/show_bug.cgi?id=1175005
http://bugzilla.opensuse.org/show_bug.cgi?id=1175005#c3

--- Comment #3 from Jürgen Groß <jgross@suse.com> ---
The kernel RPMs with the patch applied should now be available via the OBS repo:

https://download.opensuse.org/repositories/home:/j_gross:/kernel-test-opensu...