[Bug 1205649] New: SEGV when mounting / during boot
https://bugzilla.suse.com/show_bug.cgi?id=1205649
Bug ID: 1205649
Summary: SEGV when mounting / during boot
Classification: openSUSE
Product: openSUSE Tumbleweed
Version: Current
Hardware: x86-64
OS: openSUSE Tumbleweed
Status: NEW
Severity: Critical
Priority: P5 - None
Component: Kernel
Assignee: kernel-bugs@opensuse.org
Reporter: thomas.leroy@suse.com
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---
Created attachment 863042
  --> https://bugzilla.suse.com/attachment.cgi?id=863042&action=edit
Kernel BUG
I get a kernel BUG during boot when trying to mount the root partition. Unfortunately, I can't copy/paste the journalctl output because it's the laptop's host kernel, but I attached a picture of the kernel BUG information.
I'm running Tumbleweed with a 6.0.8-1-vanilla kernel on x86_64. Additionally, I tried to mount the decrypted disk from a live USB, and mount also segfaulted.
-- 
You are receiving this mail because:
You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1205649#c1
Takashi Iwai
Takashi Iwai
https://bugzilla.suse.com/show_bug.cgi?id=1205649#c2
--- Comment #2 from Thomas Leroy
https://bugzilla.suse.com/show_bug.cgi?id=1205649#c3
--- Comment #3 from Thomas Leroy
https://bugzilla.suse.com/show_bug.cgi?id=1205649#c4
Wenruo Qu
https://bugzilla.suse.com/show_bug.cgi?id=1205649#c5
Thomas Leroy
https://bugzilla.suse.com/show_bug.cgi?id=1205649#c6
--- Comment #6 from Thomas Leroy
https://bugzilla.suse.com/show_bug.cgi?id=1205649#c7
--- Comment #7 from Wenruo Qu
https://bugzilla.suse.com/show_bug.cgi?id=1205649#c8
--- Comment #8 from Thomas Leroy
> A lot of transid errors are the ones we didn't expect:
> parent transid verify failed on 8781824 wanted 276925 found 277794
> parent transid verify failed on 1095892992 wanted 276925 found 277710
> They are all writes in the future.
> I can't really say what the cause is, but some guesses include:
> - Broken COW, i.e. writes into existing metadata. This may happen if your cache is corrupted.
> - Bad cache management in the underlying stack. It could be dm-crypt or hardware not handling the write cache correctly. But I doubt it, as all the metadata corruption is happening in both copies.
> Furthermore, the corruption is not limited to the extent tree; some fs trees are affected too.
> Thankfully, it looks like only root 1628 is corrupted, so "-o ro,rescue=all" may be able to mount the fs, allowing you to back up most things except some data in subvolume 1628.
mount worked, but my home is still missing, I guess because it's not in the same subvolume, right?
> I don't really believe we can repair the whole fs back to RW status, given how many extent tree blocks are corrupted.
> But if you really want an adventure, after backing up all your data, you may want to try your luck with "btrfs check --init-extent-tree".
I will try that :)
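To make the "writes in the future" reasoning concrete: each message reports the generation (transid) that the parent node recorded for a child tree block ("wanted") and the generation actually stored in that block's header ("found"). A "found" value larger than "wanted" means the block carries a newer transaction than its parent expects, which is what points toward metadata being overwritten in place. The following is a minimal Python sketch, assuming only the message format quoted above; `parse_transid_errors` and `TransidError` are hypothetical helper names, not part of btrfs-progs or the kernel.

```python
import re
from typing import List, NamedTuple

# Matches the btrfs kernel message format quoted above, e.g.
# "parent transid verify failed on 8781824 wanted 276925 found 277794"
TRANSID_RE = re.compile(
    r"parent transid verify failed on (\d+) wanted (\d+) found (\d+)"
)


class TransidError(NamedTuple):
    bytenr: int  # logical address of the tree block ("on ...")
    wanted: int  # generation the parent node expects for this block
    found: int   # generation actually stored in the block's header

    @property
    def from_future(self) -> bool:
        # The block is newer than its parent's pointer says it should be,
        # i.e. it was written by a later transaction.
        return self.found > self.wanted


def parse_transid_errors(log: str) -> List[TransidError]:
    """Extract every transid mismatch from a chunk of kernel log text."""
    return [
        TransidError(int(b), int(w), int(f))
        for b, w, f in TRANSID_RE.findall(log)
    ]


if __name__ == "__main__":
    log = (
        "parent transid verify failed on 8781824 wanted 276925 found 277794\n"
        "parent transid verify failed on 1095892992 wanted 276925 found 277710\n"
    )
    for err in parse_transid_errors(log):
        # Print the block address, how many generations "ahead" it is,
        # and whether it counts as a write in the future.
        print(err.bytenr, err.found - err.wanted, err.from_future)
```

For the two messages above this reports gaps of 869 and 785 generations, both in the future. The regex deliberately ignores any kernel log prefix (timestamps, device names), so it can be fed raw `dmesg` text.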
https://bugzilla.suse.com/show_bug.cgi?id=1205649#c9
--- Comment #9 from Thomas Leroy
(In reply to Wenruo Qu from comment #7)
> Thankfully, it looks like only root 1628 is corrupted, so "-o ro,rescue=all" may be able to mount the fs, allowing you to back up most things except some data in subvolume 1628.
mount worked, but my home is still missing, I guess because it's not in the same subvolume, right?
Misunderstood, sorry. So is it possible that my whole home is in subvolume 1628, making it unrecoverable?
https://bugzilla.suse.com/show_bug.cgi?id=1205649#c10
--- Comment #10 from Wenruo Qu
https://bugzilla.suse.com/show_bug.cgi?id=1205649#c11
--- Comment #11 from Thomas Leroy
> It is possible.
> Since you can already mount the fs RO, you can run "btrfs subvolume list <mnt>" to find out which directory is subvolume 1628.
> Another thing: if you're using a SUSE-based distro, you may want to mount with "-o rescue=all,ro,subvolid=5" to mount the real top-level subvolume; otherwise some subvolumes/snapshots may be hidden.
> This can make a huge difference: the current /home may be inaccessible while some snapshots of /home are completely fine.
"-o rescue=all,ro,subvolid=5" worked like a charm; I have my whole home back. Thanks a LOT, Qu! (By the way, this worked on Ubuntu 22.) Subvolume 1628 was actually a snapshot. I think I will reinstall the OS on the disk rather than try to repair it. If you want to investigate the BUG_ON, I am happy to help; otherwise we can close this bug, since this is good for me :)
https://bugzilla.suse.com/show_bug.cgi?id=1205649#c12
Wenruo Qu