Dear OpenSuse Team: On Thu, Feb 13, 2020 at 12:44 PM Glen <glenbarney@gmail.com> wrote:
This is a follow-up to my two previous threads about Leap 42.3 and 15.1 DomU machines hanging under high disk load. I repeat my thanks to all of you who responded to me and tried to help me with this.
Problem: Xen DomU guests randomly stall under high network/disk loads. Dom0 is not affected. Randomly means anywhere between 1 hour and 14 days after guest boot - the time seems to shorten with (or perhaps the problem is triggered by) increased network (and possibly disk) activity.
I wanted to report back here and let you all know what we've found so far. After I raised this on the xen-users list, a number of other people stepped in and said that they were having similar problems. Guided by members of their community, we've done a bit of poking and testing. You can see all the details in their archive ( https://lists.xenproject.org/archives/html/xen-users/2020-02/ ) but the short of it is:

1. Several people had the same problem, where guests randomly stall/freeze.
2. The problem seems NOT to be related to OpenSuse itself, or OpenSuse version, or Linux Kernel version.
3. The problem DOES seem to be related to Xen version, and to a specific component, the "credit2" scheduler.

Reverting to any Xen prior to Xen 4.12 fixes the problem (thank you Olaf!) but that's suboptimal in terms of wanting to run the latest software versions (or, more to the point, the production versions that come with the Leap releases.)

With that in mind, the best fix so far seems to be to add "sched=credit" to GRUB_CMDLINE_XEN in /etc/default/grub, as in:

    GRUB_CMDLINE_XEN="dom0_mem=4G dom0_max_vcpus=4 dom0_vcpus_pin gnttab_max_frames=256 sched=credit"

Adding that last parameter causes Xen to boot with the older "credit" scheduler instead of the newer "credit2" scheduler, and that seems to resolve the problem for everyone. (I'm still running longer stress tests on my guests, but the results are encouraging so far.)

Members of the Xen community have suggested making sched=credit the default until the problems with the credit2 scheduler are fixed. I have no idea how that would apply to us, but felt I should mention that, as it seems important.

I'm now asking on their users list when and how to file a bug report for this, and I'll continue to try to work with them, but I wanted to get this back to this group and list in case anyone else needs this info, and/or in case anyone here has any comments or additional guidance.
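For anyone who has to apply the GRUB_CMDLINE_XEN change on several hosts, here is a rough sketch of how the edit could be scripted. This is only an illustration, not something from the xen-users thread: the function name add_xen_option is made up for this example, and it assumes the file contains a single double-quoted GRUB_CMDLINE_XEN="..." line. It works on the file contents as a string, so you can test it on a copy before touching /etc/default/grub.

```python
import re


def add_xen_option(grub_text: str, option: str = "sched=credit") -> str:
    """Return grub_text with `option` set in GRUB_CMDLINE_XEN.

    Hypothetical helper for illustration. Assumes a double-quoted
    GRUB_CMDLINE_XEN="..." line; any existing value for the same key
    (for example sched=credit2) is replaced, so the edit is idempotent.
    """
    key = option.split("=", 1)[0]
    pattern = re.compile(r'^(GRUB_CMDLINE_XEN=)"(.*)"[ \t]*$', re.MULTILINE)

    def fix(match: re.Match) -> str:
        # Keep every argument except an earlier setting of the same key.
        args = [a for a in match.group(2).split() if a.split("=", 1)[0] != key]
        args.append(option)
        return f'{match.group(1)}"{" ".join(args)}"'

    new_text, count = pattern.subn(fix, grub_text)
    if count == 0:
        # No GRUB_CMDLINE_XEN line yet: append one at the end of the file.
        sep = "" if not grub_text or grub_text.endswith("\n") else "\n"
        new_text = f'{grub_text}{sep}GRUB_CMDLINE_XEN="{option}"\n'
    return new_text


if __name__ == "__main__":
    sample = 'GRUB_CMDLINE_XEN="dom0_mem=4G dom0_max_vcpus=4 dom0_vcpus_pin gnttab_max_frames=256"\n'
    print(add_xen_option(sample), end="")
```

After changing /etc/default/grub, the boot configuration still has to be regenerated (on Leap this is normally done with grub2-mkconfig -o /boot/grub2/grub.cfg) and the host rebooted. Once it is back up, "xl info" should report the active scheduler in its xen_scheduler line, which is a quick way to confirm that the parameter took effect.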
Thank you again to all of you who have helped me during this extended incident. I am very grateful to this community for all of your help!

Glen