[Bug 1185570] New: New kernel breaks bcache mounted rootfs
http://bugzilla.opensuse.org/show_bug.cgi?id=1185570

Bug ID: 1185570
Summary: New kernel breaks bcache mounted rootfs
Classification: openSUSE
Product: openSUSE Tumbleweed
Version: Current
Hardware: x86-64
OS: openSUSE Tumbleweed
Status: NEW
Severity: Critical
Priority: P5 - None
Component: Kernel
Assignee: kernel-bugs@opensuse.org
Reporter: diego.ercolani@gmail.com
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

kernel-devel-5.12.0-1.2.noarch completely breaks the bcache-mounted rootfs. I verified that rolling back to 5.11.16 makes bcache work again.

--
You are receiving this mail because:
You are on the CC list for the bug.

http://bugzilla.opensuse.org/show_bug.cgi?id=1185570#c2

--- Comment #2 from Diego Ercolani <diego.ercolani@gmail.com> ---
Created attachment 848980
  --> http://bugzilla.opensuse.org/attachment.cgi?id=848980&action=edit
dmesg file carrying a single kernel oops

OK, sorry, but it is very difficult to get a complete kernel dump because the system starts printing kernel panics at full speed. Normally, if I boot into runlevel 1, the system seems to start correctly; as soon as I switch to another runlevel (e.g. 3), the system starts printing kernel panics on the console and it is not possible to capture any information. Once, when I booted into runlevel 1, dmesg contained a kernel panic, which I have attached (dmesg.gz).

http://bugzilla.opensuse.org/show_bug.cgi?id=1185570#c3

--- Comment #3 from Diego Ercolani <diego.ercolani@gmail.com> ---
Created attachment 848981
  --> http://bugzilla.opensuse.org/attachment.cgi?id=848981&action=edit
systemd-journal file showing a bunch of kernel oopses while changing from runlevel 1 to runlevel 3

View it with: journalctl --file=./system.journal

http://bugzilla.opensuse.org/show_bug.cgi?id=1185570#c4

Takashi Iwai <tiwai@suse.com> changed:

           What    |Removed                               |Added
----------------------------------------------------------------------------
              Flags|needinfo?(diego.ercolani@gmail.com)   |

--- Comment #4 from Takashi Iwai <tiwai@suse.com> ---
Thanks! The Oops in comment 2 indicates that bcache tries to call bio_alloc_bioset() with too many nr_vecs. In the 5.11.x kernel, bio_alloc_bioset() returned NULL in such a case without complaint, but now it hits a kernel panic instead. The BUG() call is intentional, but it doesn't look like the most helpful way...

The call comes via cached_dev_cache_miss(), which calculates nr_vecs as
  DIV_ROUND_UP(s->insert_bio_sectors, PAGE_SECTORS)
and this is likely over BIO_MAX_VECS (=256).

Dropping the BUG() call in bio.c should restore the old behavior (although there is still another WARN_ON()), but the real fix is needed on the caller side, in the bcache code, I suppose.
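
A minimal userspace sketch of the arithmetic described in comment #4, to show where the limit is crossed. It is not kernel code: the macros only mirror the names mentioned in the comment, PAGE_SECTORS = 8 assumes 4 KiB pages with 512-byte sectors, and nr_vecs_for(), exceeds_bio_limit() and self_check() are hypothetical helpers introduced for illustration.

```c
#include <assert.h>

/* Assumed values for this sketch: 4 KiB pages, 512-byte sectors. */
#define PAGE_SECTORS 8u
/* Limit quoted in comment #4. */
#define BIO_MAX_VECS 256u
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical helper: page-sized vectors needed for a request of
 * insert_bio_sectors sectors, using the formula from the comment. */
unsigned int nr_vecs_for(unsigned int insert_bio_sectors)
{
    return DIV_ROUND_UP(insert_bio_sectors, PAGE_SECTORS);
}

/* Hypothetical helper: 1 if the request would ask for more vectors
 * than BIO_MAX_VECS, 0 otherwise. */
int exceeds_bio_limit(unsigned int insert_bio_sectors)
{
    return nr_vecs_for(insert_bio_sectors) > BIO_MAX_VECS;
}

/* Boundary check: 2048 sectors (1 MiB) fit exactly, one more does not. */
void self_check(void)
{
    assert(nr_vecs_for(2048) == 256 && !exceeds_bio_limit(2048));
    assert(nr_vecs_for(2049) == 257 && exceeds_bio_limit(2049));
}
```

Under these assumptions, any request larger than 1 MiB would ask for more than 256 vectors, which is the case that returned NULL on 5.11.x and hits BUG() on 5.12.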

http://bugzilla.opensuse.org/show_bug.cgi?id=1185570#c6

Diego Ercolani <diego.ercolani@gmail.com> changed:

           What    |Removed                               |Added
----------------------------------------------------------------------------
              Flags|needinfo?(diego.ercolani@gmail.com)   |

--- Comment #6 from Diego Ercolani <diego.ercolani@gmail.com> ---
Created attachment 849061
  --> http://bugzilla.opensuse.org/attachment.cgi?id=849061&action=edit
dmesg booting from bsc1185570 kernel

As said, I now have a single kernel oops.

http://bugzilla.opensuse.org/show_bug.cgi?id=1185570#c7

Takashi Iwai <tiwai@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|kernel-bugs@opensuse.org    |colyli@suse.com

--- Comment #7 from Takashi Iwai <tiwai@suse.com> ---
Thanks. It's not an Oops but a normal kernel warning with a stack trace, as expected. So far, so good.

Usually this can be fixed by capping nr_iovecs via bio_max_segs(). But as I don't know the details of bcache, I am reassigning this bug to Coly.
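
The capping suggested in comment #7 can be sketched in userspace as follows. bio_max_segs_model() is a stand-in written for this example, assumed to behave like the kernel's bio_max_segs() (clamping a segment count to BIO_MAX_VECS); capped_nr_vecs() is a hypothetical caller, and PAGE_SECTORS = 8 again assumes 4 KiB pages with 512-byte sectors.

```c
#include <assert.h>

#define BIO_MAX_VECS 256u
#define PAGE_SECTORS 8u   /* assumption: 4 KiB pages, 512-byte sectors */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Userspace stand-in for bio_max_segs(): clamp to BIO_MAX_VECS. */
unsigned int bio_max_segs_model(unsigned int nr_segs)
{
    return nr_segs < BIO_MAX_VECS ? nr_segs : BIO_MAX_VECS;
}

/* Hypothetical caller: compute the vector count as in comment #4,
 * then cap it so the allocator is never asked for too many. */
unsigned int capped_nr_vecs(unsigned int insert_bio_sectors)
{
    unsigned int nr = bio_max_segs_model(
        DIV_ROUND_UP(insert_bio_sectors, PAGE_SECTORS));

    assert(nr <= BIO_MAX_VECS);   /* holds for every input */
    return nr;
}
```

Capping only keeps the allocation request within bounds. A capped bio covers less data than the original request, so the caller still has to deal with the remainder; how bcache should do that is not discussed in this thread.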

http://bugzilla.opensuse.org/show_bug.cgi?id=1185570#c10

--- Comment #10 from Diego Ercolani <diego.ercolani@gmail.com> ---
5.12.2 has been released by openSUSE but it has the same problem. The difference is that I get only a single kernel oops, but the system is slowed down to an unmanageable degree.

http://bugzilla.opensuse.org/show_bug.cgi?id=1185570#c15

Bodo Eggert <7eggert@gmx.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |7eggert@gmx.de

--- Comment #15 from Bodo Eggert <7eggert@gmx.de> ---
I'd be a potential tester, too.

http://bugzilla.opensuse.org/show_bug.cgi?id=1185570#c16

--- Comment #16 from Coly Li <colyli@suse.com> ---
(In reply to Bodo Eggert from comment #15)
> I'd be a potential tester, too.

I will add the fast fix to our kernel very soon. The fast fix will be replaced with the upstream version once it is finally merged into the kernel.

Coly Li

http://bugzilla.opensuse.org/show_bug.cgi?id=1185570#c17

--- Comment #17 from Diego Ercolani <diego.ercolani@gmail.com> ---
New kernel 5.12.4, same issue, so we are waiting.

http://bugzilla.opensuse.org/show_bug.cgi?id=1185570#c18

--- Comment #18 from Coly Li <colyli@suse.com> ---
(In reply to Diego Ercolani from comment #17)
> New kernel 5.12.4, same issue so we are waiting

OK, working on the fast fix backport now. Please note: this is not the final upstream version.

Coly Li

http://bugzilla.opensuse.org/show_bug.cgi?id=1185570#c21

--- Comment #21 from Diego Ercolani <diego.ercolani@gmail.com> ---
Installed kernel-default-5.12.12-1.1 via the "zypper dup" command from the SUSE repositories; the issue seems resolved.

Thank you

http://bugzilla.opensuse.org/show_bug.cgi?id=1185570#c33

Diego Ercolani <diego.ercolani@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |---

--- Comment #33 from Diego Ercolani <diego.ercolani@gmail.com> ---
Hello, the last upgrade (kernel vmlinuz-5.15.2-1-default) broke bcache again.

http://bugzilla.opensuse.org/show_bug.cgi?id=1185570#c34

--- Comment #34 from Coly Li <colyli@suse.com> ---
(In reply to Diego Ercolani from comment #33)
> Hello, last upgrade (kernel vmlinuz-5.15.2-1-default) broke bcache again

This comes from a different regression. My current solution has 3 locations to fix:
1, Revert commit 2fd3e5efe791946be0957c8e1eed9560b541fe46
2, Revert commit f8b679a070c536600c64a78c83b96aa617f8fa71
3, Do the following change in drivers/md/bcache.c,

@@ -885,9 +885,9 @@ static void bcache_device_free(struct bcache_device *d)
                bcache_device_detach(d);
        if (disk) {
-               blk_cleanup_disk(disk);
                ida_simple_remove(&bcache_device_idx,
                                  first_minor_to_idx(disk->first_minor));
+               blk_cleanup_disk(disk);
        }

Coly Li
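
One plausible reading of the reordering in item 3, which the comment itself does not spell out, is that disk->first_minor has to be read before the disk is torn down. The userspace model below sketches that ordering under this assumption. Every name in it is hypothetical: fake_disk, fake_ida_remove(), fake_cleanup_disk() and free_device() are stand-ins, and the divisor 128 in first_minor_to_idx_model() is assumed for illustration only.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the disk object; only the field that the
 * diff reads is modelled. */
struct fake_disk {
    int first_minor;
};

/* Assumed number of minors per device, for illustration only. */
#define MINORS_PER_DEVICE 128

static int last_released_idx = -1;

int first_minor_to_idx_model(int first_minor)
{
    return first_minor / MINORS_PER_DEVICE;
}

/* Stand-in for releasing the device index. */
void fake_ida_remove(int idx)
{
    last_released_idx = idx;
}

/* Stand-in for disk cleanup: the object must not be used afterwards. */
void fake_cleanup_disk(struct fake_disk *disk)
{
    free(disk);
}

/* Ordering from the diff: release the index while the disk is still
 * valid, then clean the disk up. Returns the released index. */
int free_device(struct fake_disk *disk)
{
    assert(disk != NULL);
    fake_ida_remove(first_minor_to_idx_model(disk->first_minor));
    fake_cleanup_disk(disk);   /* disk is invalid from here on */
    return last_released_idx;
}
```

With the two calls in the opposite order, this model would read disk->first_minor after free(), which is undefined behavior in C.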