[Bug 997575] New: [20160901] filesystem turns RO after calling "zypper in && rm" in a loop
http://bugzilla.suse.com/show_bug.cgi?id=997575

Bug ID: 997575
Summary: [20160901] filesystem turns RO after calling "zypper in && rm" in a loop
Classification: openSUSE
Product: openSUSE Tumbleweed
Version: Current
Hardware: Other
OS: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Kernel
Assignee: kernel-maintainers@forge.provo.novell.com
Reporter: okurz@suse.com
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

Created attachment 691135
  --> http://bugzilla.suse.com/attachment.cgi?id=691135&action=edit
y2log after filesystem turned read-only

## observation

Calling zypper install and remove of a package many times in a loop eventually fails with an error message that the media point is bad or something. It turns out the btrfs filesystem was turned read-only. Testing on a physical notebook machine.

## steps to reproduce

* On a Tumbleweed 20160901 installation with btrfs filesystem call
* `for i in {1..1000000} ; do echo $i ; zypper -n in nfs-kernel-server ; zypper -n rm nfs-kernel-server ; done`
* see it fail after some time

## problem

Might be related to bug #990384. I also intended to do a proper crosscheck after I had some similar symptoms yesterday, at that time only with qemu-x86_64 machines. I will try to reproduce again after a reboot (or reinstall).

Logs attached, please take a look.

--
You are receiving this mail because:
You are on the CC list for the bug.
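A minimal sketch of the reproduction loop from the report, extended so that it stops as soon as the root filesystem shows up as read-only instead of continuing to call zypper. The helper names `is_readonly` and `reproduce`, the check against `/proc/mounts`, and the `dmesg` dump are assumptions added for illustration; only the two `zypper -n` calls come from the report.

```shell
#!/bin/bash
# Hypothetical helper (not from the report): succeeds if the given mount point
# is listed with the "ro" option in a /proc/mounts-style file.
is_readonly() {
    local mountpoint=$1 mounts_file=${2:-/proc/mounts}
    # Fields per line: device mountpoint fstype options dump pass.
    # The last matching line wins, since later mounts shadow earlier ones.
    awk -v mp="$mountpoint" '
        $2 == mp {
            found = 1; ro = 0
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) if (opts[i] == "ro") ro = 1
        }
        END { exit (found && ro) ? 0 : 1 }
    ' "$mounts_file"
}

# Same install/remove cycle as in the report, aborting once / is read-only.
# The iteration limit defaults to the value used in the report.
reproduce() {
    local max=${1:-1000000} i
    for ((i = 1; i <= max; i++)); do
        echo "$i"
        zypper -n in nfs-kernel-server
        zypper -n rm nfs-kernel-server
        if is_readonly /; then
            echo "root filesystem turned read-only in iteration $i" >&2
            dmesg | tail -n 50   # show the most recent kernel messages
            return 1
        fi
    done
}
```

To use it, source the file as root and call e.g. `reproduce 1000`. Checking the mount options after every iteration only narrows down when the filesystem went read-only; it does not replace the debugging instructions from bug #990384.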
http://bugzilla.suse.com/show_bug.cgi?id=997575#c3
--- Comment #3 from Filipe Manana
Yes, it's the same as #990384. The same debugging instructions listed there are needed. This depends on specific timings, and at least snapshotting is happening in parallel (or has happened before that loop). The same has happened and been reported, rarely, upstream (an extent item is missing in the extent tree for some unknown reason).
http://bugzilla.suse.com/show_bug.cgi?id=997575#c7
--- Comment #7 from Filipe Manana
Trying to reproduce this on Leap 42.2 Beta1 and also a more recent build: In both cases, on multiple tries, I could somewhat reproduce issues but always ended up with an unresponsive system, so no helpful logs could be gathered. After reboot, when the filesystem tries to replay the journal, it takes ages and eventually fails with an OOM exception in plymouthd(!), so no luck there either.
I then installed Leap 42.1 and ran the test overnight. It stopped when the hard disk capacity was depleted because of snapper snapshots but did not fail with any kernel problems, so I can assume that my hardware setup is not flawed and that the issue is not present in Leap 42.1. This test, of course, also ran on btrfs with snapshots.
Now I will try an upgrade to Leap 42.2 and try again.
Thanks for your attempts to reproduce, Oliver. But I don't think you need to do it. I was able to reproduce it too, even with an upstream vanilla kernel. It's much easier to reproduce immediately after installing Tumbleweed (on the first boot), but not so easy after building a new kernel with extra logging/tracing/etc. or after doing a few balances beforehand.
http://bugzilla.suse.com/show_bug.cgi?id=997575#c9
--- Comment #9 from Filipe Manana
I just stumbled over this. Haven't heard about this for a long time. Isn't it solved by now? Maybe we had a corresponding SLES bug on top?
It is, for well over a year.