[Bug 1192115] New: qemu-seabios 1.14 breaks booting from usb storage
https://bugzilla.suse.com/show_bug.cgi?id=1192115

            Bug ID: 1192115
           Summary: qemu-seabios 1.14 breaks booting from usb storage
    Classification: openSUSE
           Product: openSUSE Distribution
           Version: Leap 15.3
          Hardware: Other
                OS: Other
            Status: NEW
          Severity: Major
          Priority: P5 - None
         Component: Virtualization:Tools
          Assignee: virt-bugs@suse.de
          Reporter: fvogt@suse.com
        QA Contact: qa-bugs@suse.de
                CC: jose.ziviani@suse.com, li.zhang@suse.com
          Found By: ---
           Blocker: ---

We found that after upgrading openQA workers to Leap 15.3, tests which boot .iso images using a "usb-storage" device fail with disk read issues.

There are various failure modes, the one that appears the most is incredible slowness during disk access (loading the kernel times out in openQA: https://openqa.opensuse.org/tests/1997827#step/finish_desktop/2), but there are also read errors in grub (https://openqa.opensuse.org/tests/1996844#step/finish_desktop/7) or corruption while loading the initrd (https://openqa.opensuse.org/tests/1996844#step/finish_desktop/3).

To not require a full openQA setup for reproducing, I have a qemu cmdline here which shows a similar issue:

    qemu-system-x86_64 -accel kvm -m 1024 -device usb-ehci \
        -blockdev driver=file,read-only=on,filename=openSUSE-Tumbleweed-KDE-Live-x86_64-Snapshot20211027-Media.iso,node-name=iso \
        -device usb-storage,drive=iso,bootindex=0

(using any TW live cd iso "works")

Most of the time, it results in this error message at some point during the boot process and causes it to fail at that stage:

    cpage out of range (5)
    processing error - resetting ehci HC

Unlike in openQA, it does boot successfully some of the time though.

I did some tests and narrowed it down to the update of qemu-seabios to 1.14.0_0_g155821a-103.2. The issues disappear when using bios(-256k).bin from Leap 15.2's qemu-seabios package (qemu-seabios-1.12.1+-lp152.9.20.1 and older).

-- 
You are receiving this mail because:
You are on the CC list for the bug.
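For anyone scripting the reproducer above, a minimal sketch in Python follows. It only builds the command line from the report and checks captured output for the two error lines quoted there. The helper names are hypothetical, and it assumes the messages show up in qemu's captured output (most likely stderr) and that the ISO path contains no commas.

```python
import shlex

# The two error lines quoted in the report. Assumption: they appear in the
# output captured from the qemu process.
EHCI_FAILURE_MARKERS = (
    "cpage out of range",
    "processing error - resetting ehci HC",
)


def build_repro_argv(iso_path, memory_mb=1024):
    """Return the reproducer command from the report as an argv list.

    Assumes iso_path contains no commas; qemu option strings would need
    them doubled.
    """
    blockdev = f"driver=file,read-only=on,filename={iso_path},node-name=iso"
    return [
        "qemu-system-x86_64",
        "-accel", "kvm",
        "-m", str(memory_mb),
        "-device", "usb-ehci",
        "-blockdev", blockdev,
        "-device", "usb-storage,drive=iso,bootindex=0",
    ]


def has_ehci_failure(output_text):
    """True if the captured output contains one of the reported error lines."""
    return any(marker in output_text for marker in EHCI_FAILURE_MARKERS)


# Show the command that would be run; nothing is executed here.
print(shlex.join(build_repro_argv(
    "openSUSE-Tumbleweed-KDE-Live-x86_64-Snapshot20211027-Media.iso")))
```

The argv list could be handed to subprocess.run() with a timeout and captured output, and has_ehci_failure() applied to the result. Since the boot sometimes succeeds, a single clean run says little.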
https://bugzilla.suse.com/show_bug.cgi?id=1192115

Oliver Kurz <okurz@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |okurz@suse.com

https://bugzilla.suse.com/show_bug.cgi?id=1192115

Charles Arnold <carnold@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|virt-bugs@suse.de           |kvm-bugs@suse.de

https://bugzilla.suse.com/show_bug.cgi?id=1192115

Marius Kittler <marius.kittler@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |marius.kittler@suse.com

https://bugzilla.suse.com/show_bug.cgi?id=1192115

Lubos Kocman <lubos.kocman@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P5 - None                   |P2 - High
                 CC|                            |lubos.kocman@suse.com

https://bugzilla.suse.com/show_bug.cgi?id=1192115

Liang Yan <lyan@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |lyan@suse.com
           Assignee|kvm-bugs@suse.de            |acho@suse.com

https://bugzilla.suse.com/show_bug.cgi?id=1192115

Liang Yan <lyan@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|acho@suse.com               |kvm-bugs@suse.de

https://bugzilla.suse.com/show_bug.cgi?id=1192115

Liang Yan <lyan@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|kvm-bugs@suse.de            |lma@suse.com
https://bugzilla.suse.com/show_bug.cgi?id=1192115
https://bugzilla.suse.com/show_bug.cgi?id=1192115#c2

--- Comment #2 from Fabian Vogt <fvogt@suse.com> ---
Ping.
https://bugzilla.suse.com/show_bug.cgi?id=1192115
https://bugzilla.suse.com/show_bug.cgi?id=1192115#c3

--- Comment #3 from Lin Ma <lma@suse.com> ---
(In reply to Fabian Vogt from comment #0)
> ...
> Most of the time, it results in this error message at some point during the boot process and causes it to fail at that stage:
>     cpage out of range (5)
>     processing error - resetting ehci HC
> Unlike in openQA, it does boot successfully some of the time though.
> I did some tests and narrowed it down to the update of qemu-seabios to 1.14.0_0_g155821a-103.2. The issues disappear when using bios(-256k).bin from Leap 15.2's qemu-seabios package (qemu-seabios-1.12.1+-lp152.9.20.1 and older).
I tested the issue against seabios 1.12, 1.13 and 1.14, roughly 50 times for each version.

With seabios 1.12, it is hard to reproduce, but I did see the "processing error - resetting ehci HC" error message once.
With seabios 1.13, not reproduced yet.
With seabios 1.14, not reproduced yet.

Although I haven't reproduced the issue on seabios 1.13 and 1.14, that does not prove these two versions are free of it.
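To put a number on why roughly 50 clean boots cannot rule the issue out, here is a small Python sketch. It assumes each boot fails independently with the same probability, which is a simplification and not something measured in this thread; the function names are made up for illustration.

```python
def prob_all_clean(per_boot_failure_rate, boots):
    """Probability that `boots` independent boots all succeed."""
    return (1.0 - per_boot_failure_rate) ** boots


def max_rate_consistent_with_clean_runs(boots, confidence=0.95):
    """Largest per-boot failure rate that still yields `boots` clean runs
    with probability at least (1 - confidence)."""
    return 1.0 - (1.0 - confidence) ** (1.0 / boots)


# After 50 clean boots, failure rates up to about this value remain plausible.
print(f"{max_rate_consistent_with_clean_runs(50):.3f}")
```

Under that assumption, 50 clean boots are still compatible with a per-boot failure rate of almost 6% at the 95% level, so a rare failure could easily hide in a test series of this size.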
https://bugzilla.suse.com/show_bug.cgi?id=1192115
https://bugzilla.suse.com/show_bug.cgi?id=1192115#c4

--- Comment #4 from Lin Ma <lma@suse.com> ---
Although the issue does not reproduce consistently, bisecting shows that the seabios commit b3fa8577 "kvm: add support for reading tsc frequency" is the most likely trigger of the error message below:

    cpage out of range (5)
    processing error - resetting ehci HC

I said earlier that I had seen the "processing error - resetting ehci HC" error message once with seabios 1.12, but that occurrence was not triggered by "cpage out of range (5)".

@Fabian, could you please help verify the workaround below?
Modify the test script on openQA to inject the "no-kvmclock" kernel parameter into the relevant boot entry in the guest's grub.

Thanks
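Bisecting a failure that reproduces only some of the time usually means booting several times per commit. The Python sketch below shows one possible decision rule, using the exit codes that `git bisect run` understands (0 = good, 1 = bad, 125 = skip). It is not the procedure used in this bug, and `run_once` is a hypothetical callback that boots the guest once.

```python
GIT_BISECT_GOOD = 0    # no trial reproduced the failure
GIT_BISECT_BAD = 1     # at least one trial reproduced the failure
GIT_BISECT_SKIP = 125  # commit could not be tested (e.g. build failure)


def bisect_verdict(run_once, trials=50):
    """Decide a `git bisect run` exit code for a flaky failure.

    run_once() is a hypothetical callback returning True if the failure
    reproduced, False for a clean boot, and None if the commit is untestable.
    """
    for _ in range(trials):
        outcome = run_once()
        if outcome is None:
            return GIT_BISECT_SKIP
        if outcome:
            # One reproduction is enough to call the commit bad.
            return GIT_BISECT_BAD
    # Only "not reproduced in `trials` boots"; a rare failure can slip through.
    return GIT_BISECT_GOOD
```

A "bad" verdict is reliable with this rule, while a "good" verdict only means the failure did not show up in the given number of boots, which is why the result of such a bisect is a suspicion and not a proof.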
https://bugzilla.suse.com/show_bug.cgi?id=1192115
https://bugzilla.suse.com/show_bug.cgi?id=1192115#c5

--- Comment #5 from Fabian Vogt <fvogt@suse.com> ---
(In reply to Lin Ma from comment #4)
> Although the issue does not reproduce consistently, bisecting shows that the seabios commit b3fa8577 "kvm: add support for reading tsc frequency" is the most likely trigger of the error message below:
>     cpage out of range (5)
>     processing error - resetting ehci HC
> I said earlier that I had seen the "processing error - resetting ehci HC" error message once with seabios 1.12, but that occurrence was not triggered by "cpage out of range (5)".
> @Fabian, could you please help verify the workaround below?
> Modify the test script on openQA to inject the "no-kvmclock" kernel parameter into the relevant boot entry in the guest's grub.
> Thanks

The issue appears before the kernel is executed, so kernel parameters would not have any effect.
https://bugzilla.suse.com/show_bug.cgi?id=1192115
https://bugzilla.suse.com/show_bug.cgi?id=1192115#c6

--- Comment #6 from Lin Ma <lma@suse.com> ---
(In reply to Fabian Vogt from comment #5)
> The issue appears before the kernel is executed, so kernel parameters would not have any effect.

Hmm, that is a bit surprising.
So does the issue (cpage out of range (5), processing error - resetting ehci HC) occur before the guest grub is loaded, or after the guest grub has started running but at an early stage?
https://bugzilla.suse.com/show_bug.cgi?id=1192115
https://bugzilla.suse.com/show_bug.cgi?id=1192115#c7

--- Comment #7 from Fabian Vogt <fvogt@suse.com> ---
(In reply to Lin Ma from comment #6)
> Hmm, that is a bit surprising.
> So does the issue (cpage out of range (5), processing error - resetting ehci HC) occur before the guest grub is loaded, or after the guest grub has started running but at an early stage?

GRUB fails to read from the storage device, resulting in errors and corrupted data.
https://bugzilla.suse.com/show_bug.cgi?id=1192115
https://bugzilla.suse.com/show_bug.cgi?id=1192115#c8

Lin Ma <lma@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #8 from Lin Ma <lma@suse.com> ---
A potential race condition in qemu causes this issue: part of the queue element transfer descriptor (qTD) might be overwritten by a late DMA in qemu's ehci emulation, which puts cpage out of range.

The fix was posted to the corresponding virt devel qemu project; it will automatically sync to Leap 15.3 and 15.4.

Closing this issue as fixed.
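As background for the "cpage out of range (5)" message: in the EHCI specification, the qTD token keeps the current page index (C_Page) in bits 14:12, and a qTD has five buffer pointers, so only the values 0 to 4 are meaningful. The Python sketch below decodes that field from a token value. It illustrates the field layout only and is not qemu's code; the helper names are hypothetical.

```python
QTD_CPAGE_SHIFT = 12      # C_Page occupies bits 14:12 of the qTD token
QTD_CPAGE_MASK = 0x7
QTD_BUFFER_POINTERS = 5   # a qTD carries buffer pointers 0..4


def qtd_current_page(token):
    """Extract the C_Page field from a 32-bit qTD token."""
    return (token >> QTD_CPAGE_SHIFT) & QTD_CPAGE_MASK


def cpage_in_range(token):
    """True if C_Page selects one of the five buffer pointers."""
    return qtd_current_page(token) < QTD_BUFFER_POINTERS


# A token whose C_Page field holds 5, the value from the error message.
overwritten_token = 5 << QTD_CPAGE_SHIFT
print(qtd_current_page(overwritten_token), cpage_in_range(overwritten_token))
```

Because the field is three bits wide it can hold values up to 7, so a token that is overwritten at the wrong moment can easily end up with a page index that no buffer pointer corresponds to.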
https://bugzilla.suse.com/show_bug.cgi?id=1192115
https://bugzilla.suse.com/show_bug.cgi?id=1192115#c9

--- Comment #9 from Fabian Vogt <fvogt@suse.com> ---
(In reply to Lin Ma from comment #8)
> A potential race condition in qemu causes this issue: part of the queue element transfer descriptor (qTD) might be overwritten by a late DMA in qemu's ehci emulation, which puts cpage out of range.
> The fix was posted to the corresponding virt devel qemu project; it will automatically sync to Leap 15.3 and 15.4.

Nice, can you link to the fix?

> Closing this issue as fixed.
https://bugzilla.suse.com/show_bug.cgi?id=1192115
https://bugzilla.suse.com/show_bug.cgi?id=1192115#c10

--- Comment #10 from Lin Ma <lma@suse.com> ---
The request for qemu in SLES 15 SP3:
https://build.suse.de/request/show/275031
https://bugzilla.suse.com/show_bug.cgi?id=1192115
https://bugzilla.suse.com/show_bug.cgi?id=1192115#c15

--- Comment #15 from Swamp Workflow Management <swamp@suse.de> ---
SUSE-SU-2022:3594-1: An update that solves 5 vulnerabilities and has one errata is now available.

Category: security (important)
Bug References: 1175144,1182282,1192115,1198035,1198037,1198038
CVE References: CVE-2021-3409,CVE-2021-4206,CVE-2021-4207,CVE-2022-0216,CVE-2022-35414
JIRA References:
Sources used:
openSUSE Leap 15.4 (src):    qemu-4.2.1-150200.69.1
openSUSE Leap 15.3 (src):    qemu-4.2.1-150200.69.1
SUSE Manager Server 4.1 (src):    qemu-4.2.1-150200.69.1
SUSE Manager Retail Branch Server 4.1 (src):    qemu-4.2.1-150200.69.1
SUSE Manager Proxy 4.1 (src):    qemu-4.2.1-150200.69.1
SUSE Linux Enterprise Server for SAP 15-SP2 (src):    qemu-4.2.1-150200.69.1
SUSE Linux Enterprise Server 15-SP2-LTSS (src):    qemu-4.2.1-150200.69.1
SUSE Linux Enterprise Server 15-SP2-BCL (src):    qemu-4.2.1-150200.69.1
SUSE Linux Enterprise High Performance Computing 15-SP2-LTSS (src):    qemu-4.2.1-150200.69.1
SUSE Linux Enterprise High Performance Computing 15-SP2-ESPOS (src):    qemu-4.2.1-150200.69.1
SUSE Enterprise Storage 7 (src):    qemu-4.2.1-150200.69.1

NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions please reach out to maintenance coordination.
https://bugzilla.suse.com/show_bug.cgi?id=1192115
https://bugzilla.suse.com/show_bug.cgi?id=1192115#c16

--- Comment #16 from Swamp Workflow Management <swamp@suse.de> ---
SUSE-SU-2022:3660-1: An update that solves two vulnerabilities and has one errata is now available.

Category: security (moderate)
Bug References: 1192115,1198038,1201367
CVE References: CVE-2022-0216,CVE-2022-35414
JIRA References:
Sources used:
openSUSE Leap Micro 5.2 (src):    qemu-5.2.0-150300.118.3
openSUSE Leap 15.3 (src):    qemu-5.2.0-150300.118.3, qemu-linux-user-5.2.0-150300.118.2, qemu-testsuite-5.2.0-150300.118.5
SUSE Linux Enterprise Module for Server Applications 15-SP3 (src):    qemu-5.2.0-150300.118.3
SUSE Linux Enterprise Module for Basesystem 15-SP3 (src):    qemu-5.2.0-150300.118.3
SUSE Linux Enterprise Micro 5.2 (src):    qemu-5.2.0-150300.118.3
SUSE Linux Enterprise Micro 5.1 (src):    qemu-5.2.0-150300.118.3

NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions please reach out to maintenance coordination.
https://bugzilla.suse.com/show_bug.cgi?id=1192115
https://bugzilla.suse.com/show_bug.cgi?id=1192115#c17

--- Comment #17 from Swamp Workflow Management <swamp@suse.de> ---
SUSE-SU-2022:3795-1: An update that solves two vulnerabilities and has one errata is now available.

Category: security (moderate)
Bug References: 1192115,1198038,1201367
CVE References: CVE-2022-0216,CVE-2022-35414
JIRA References:
Sources used:
openSUSE Leap 15.4 (src):    qemu-6.2.0-150400.37.8.2, qemu-linux-user-6.2.0-150400.37.8.1, qemu-testsuite-6.2.0-150400.37.8.4
SUSE Linux Enterprise Module for Server Applications 15-SP4 (src):    qemu-6.2.0-150400.37.8.2
SUSE Linux Enterprise Module for Basesystem 15-SP4 (src):    qemu-6.2.0-150400.37.8.2
SUSE Linux Enterprise Micro 5.3 (src):    qemu-6.2.0-150400.37.8.2

NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions please reach out to maintenance coordination.