[Bug 1200564] New: io_uring instability on ppc64
https://bugzilla.suse.com/show_bug.cgi?id=1200564

Bug ID: 1200564
Summary: io_uring instability on ppc64
Classification: openSUSE
Product: openSUSE Tumbleweed
Version: Current
Hardware: PowerPC-64
OS: Other
Status: NEW
Severity: Major
Priority: P5 - None
Component: Kernel
Assignee: kernel-bugs@opensuse.org
Reporter: dmueller@suse.com
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

When using aio=io_uring in QEMU for KVM virtual machine guests on a SLE15SP4 or an openSUSE Tumbleweed host kernel on a POWER8 or POWER9 machine, we see very rapid I/O corruption (within milliseconds to seconds) in the guest. On all other architectures things work perfectly fine.

--
You are receiving this mail because:
You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1200564#c1
--- Comment #1 from Dirk Mueller
https://bugzilla.suse.com/show_bug.cgi?id=1200564#c2
--- Comment #2 from Dirk Mueller
Takashi Iwai
https://bugzilla.suse.com/show_bug.cgi?id=1200564#c3
--- Comment #3 from David Disseldorp
https://bugzilla.suse.com/show_bug.cgi?id=1200564#c4
--- Comment #4 from Michal Suchanek
https://bugzilla.suse.com/show_bug.cgi?id=1200564#c5
--- Comment #5 from Michal Suchanek
https://bugzilla.suse.com/show_bug.cgi?id=1200564#c6
--- Comment #6 from David Disseldorp
> I think it's worth filing an upstream bug report.
Agreed.
> There isn't really Orthos HW for this. The Orthos KVM hosts like shiraz or zinfandel can run arbitrary VMs but it's probably not advisable to bring them down to boot a different host kernel.
Hmm, I might be able to give nested virtualization a shot(?). @Dirk: do you know of any hosts I might be able to use for this?
https://bugzilla.suse.com/show_bug.cgi?id=1200564#c7
--- Comment #7 from Dirk Mueller
> I think it's worth filing an upstream bug report.
+1, I just need a bit of help in drafting a proper upstream report. As you can see, the current information is probably not informative enough.
https://bugzilla.suse.com/show_bug.cgi?id=1200564#c8
--- Comment #8 from Michal Suchanek
https://bugzilla.suse.com/show_bug.cgi?id=1200564#c9
--- Comment #9 from Dirk Mueller
Gabriel Krisman Bertazi
https://bugzilla.suse.com/show_bug.cgi?id=1200564#c10
--- Comment #10 from Gabriel Krisman Bertazi
> actually already the liburing embedded testsuite is failing on ppc64le.
Dirk, I know this was quite a while ago, but was the testsuite running on the host or the VM? We've got quite a few fixes to io_uring on SP4 since last year. I'll try to reproduce both the testsuite error and the fio corruption and report back. This might already be fixed.
https://bugzilla.suse.com/show_bug.cgi?id=1200564#c11
--- Comment #11 from David Disseldorp
> Hi.
> I got a machine to work on this. Based on comment 1, I understand this is ppc64le, not ppc64, correct?
> > actually already the liburing embedded testsuite is failing on ppc64le.
> Dirk, I know this was quite a while, but was the testsuite running on the host or the VM? We've got quite a few fixes to io_uring on SP4 since last year.
> I'll try reproduce both the testsuite error and the fio corruption and report back. this might be already fixed.
IIRC the continuing testsuite failures on ppc64le appeared related to spurious EINTR syscall errors, which Dirk patched (in some places) via fa67f6aedcfdaffc14cbf0b631253477b2565ef0.
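For readers unfamiliar with this class of failure: a blocking syscall may return early with errno set to EINTR, and a test that treats every error as fatal will then fail spuriously. The usual way to tolerate this is to restart the call. The sketch below shows that generic POSIX pattern with plain read(2); the helper name read_retry_eintr is hypothetical, and this is not a reconstruction of the commit mentioned above, whose contents are not shown in this thread.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from liburing or its testsuite):
 * read up to len bytes from fd, restarting the call whenever it is
 * interrupted and fails with EINTR.
 */
static ssize_t read_retry_eintr(int fd, void *buf, size_t len)
{
    ssize_t ret;

    do {
        ret = read(fd, buf, len);
    } while (ret < 0 && errno == EINTR);

    /* Bytes read, 0 at end of file, or -1 with errno set (never EINTR). */
    return ret;
}
```

The same loop shape applies to any call that reports interruption this way; only the wrapped call and the error convention change (some APIs return a negative errno value instead of setting errno).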
https://bugzilla.suse.com/show_bug.cgi?id=1200564#c12
--- Comment #12 from Dirk Mueller
> I got a machine to work on this. Based on comment 1, I understand this is ppc64le, not ppc64, correct?
ppc64le indeed.
> > actually already the liburing embedded testsuite is failing on ppc64le.
> Dirk, I know this was quite a while, but was the testsuite running on the host or the VM? We've got quite a few fixes to io_uring on SP4 since last year.
We ran it on both; the issue happened on both.
> I'll try reproduce both the testsuite error and the fio corruption and report back. this might be already fixed.
That'd be nice. We can certainly do a new experiment with the current SP4 kernel and see how far we get.
https://bugzilla.suse.com/show_bug.cgi?id=1200564#c13
--- Comment #13 from Dirk Mueller
> IIRC the continuing testsuite failures on ppc64le appeared related to spurious EINTR syscall errors, which Dirk patched (in some places) via fa67f6aedcfdaffc14cbf0b631253477b2565ef0 .
No, that's unrelated to this bug report. The fs corruption happens in real builds during VM-based OBS build jobs, while the patch above only fixed some testsuite issues.
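Since the upstream report still needs a reproducer, here is a minimal sketch of the kind of write-then-verify check that could be run inside an affected guest. It is an illustration only: the helper names, the block size and the pattern are assumptions, not something taken from this bug. It also reads back through the page cache, so it does not force the data through the virtual disk a second time; a real reproducer would more likely use fio with verification enabled, or drop caches between the write and the read.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

#define BLOCK_SIZE 4096

/* Fill one block with a pattern derived from its index, so that most
 * misplaced blocks are detected as well as flipped bytes. */
static void fill_block(unsigned char *buf, uint32_t index)
{
    for (size_t i = 0; i < BLOCK_SIZE; i++)
        buf[i] = (unsigned char)((index * 31u + i) & 0xffu);
}

/*
 * Hypothetical helper: write nblocks patterned blocks to path, fsync,
 * read them back and return the number of mismatching blocks,
 * or -1 on an I/O error (including short reads and writes).
 */
static long count_corrupt_blocks(const char *path, uint32_t nblocks)
{
    unsigned char want[BLOCK_SIZE], got[BLOCK_SIZE];
    long bad = 0;
    int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);

    if (fd < 0)
        return -1;

    for (uint32_t b = 0; b < nblocks; b++) {
        fill_block(want, b);
        if (write(fd, want, BLOCK_SIZE) != BLOCK_SIZE)
            goto fail;
    }
    if (fsync(fd) != 0 || lseek(fd, 0, SEEK_SET) != 0)
        goto fail;

    for (uint32_t b = 0; b < nblocks; b++) {
        fill_block(want, b);
        if (read(fd, got, BLOCK_SIZE) != BLOCK_SIZE)
            goto fail;
        if (memcmp(want, got, BLOCK_SIZE) != 0)
            bad++;
    }
    close(fd);
    return bad;

fail:
    close(fd);
    return -1;
}
```

A return value of 0 means every block read back as written; any positive value is a count of damaged blocks and would be worth attaching to the upstream report together with the host kernel version and the QEMU command line.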