[opensuse-factory] Kernel 4.19 causing ext4 corruption
I realise that openSUSE doesn't use ext4 by default, but it's still a significant issue... E.g. both my desktop and laptop have ext4 partitions.
A friend of mine contacted me to say:
« Hi Liam, please note that there is a potential regression (root ext4 fs goes read-only) in the newest openSUSE-Tumbleweed-GNOME-Live-x86_64-Snapshot20181112-Media - as it already contains Linux kernel 4.19.1.
How do I know? I was using Linux kernel 4.19.2 (on Ubuntu 18.04) on 2 laptops - and the root ext4 filesystem on both of them suddenly switched to read-only mode multiple times (never happened before upgrading to 4.19.x). Now I have rolled back to Linux 4.18.19 and will see if fs read-only switching continues.. »
He found a related discussion: <https://askubuntu.com/questions/1092558/ubuntu-18-04-4-19-1-kernel-after-closing-the-lid-for-the-night-not-logging-ou>
He added:
« If you download the newest image from https://en.opensuse.org/openSUSE:Tumbleweed_installation (2018-11-16), it should contain kernel 4.19.1. openSUSE-Tumbleweed-GNOME-Live-x86_64-Snapshot20181118-Media.iso contains kernel 4.19.2.
So... it's official now. 😀 https://www.phoronix.com/scan.php?page=news_item&px=EXT4-Linux-4.19-Corruption
(And I can personally confirm that the issue is still present in the newest stable kernel 4.19.5.) »
He's also raised it here: https://lkml.org/lkml/2018/12/1/576
--
Liam Proven - Technical Writer, SUSE Linux s.r.o.
Corso II, Křižíkova 148/34, 186-00 Praha 8 - Karlín, Czechia
Email: lproven@suse.com - Office telephone: +420 284 241 084
--
To unsubscribe, e-mail: opensuse-factory+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse-factory+owner@opensuse.org
On 5/12/18 6:58 am, Liam Proven wrote: ...
Hi Liam,
I just upgraded my Leap 15 to kernel 4.19.6. Do you know if the above problem still exists with 4.19.6, or how can I test for this problem?
BC
--
God created war so that Americans can learn geography. Mark Twain
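The symptom reported at the top of the thread is an ext4 filesystem suddenly switching to read-only. As a minimal sketch (not a test for the underlying bug, only a check for that symptom after the fact), the following lists ext4 filesystems currently mounted read-only. The function name list_readonly_ext4 and its optional file argument are illustrative assumptions; by default it reads /proc/mounts.

```shell
# Print the mount point of every ext4 filesystem mounted read-only.
# $1: mounts file to read (hypothetical parameter; defaults to /proc/mounts)
list_readonly_ext4() {
    # Fields in /proc/mounts: device, mount point, fs type, options, ...
    # "ro" must be a whole option, so "errors=remount-ro" does not match.
    awk '$3 == "ext4" && $4 ~ /(^|,)ro(,|$)/ { print $2 }' "${1:-/proc/mounts}"
}

list_readonly_ext4
```

No output means no ext4 filesystem is mounted read-only right now; it says nothing about whether the kernel is affected.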
On 12/4/18 6:22 PM, Basil Chupin wrote:
Hi Liam,
I just upgraded my Leap 15 to kernel 4.19.6. Do you know if the above problem still exists with 4.19.6, or how can I test for this problem?
Both my / and /home partitions are ext4, and I generally run kernels generated from kernel HEAD. I have never had this happen to me, which lends credence to the supposition that the problem is coming from outside the ext4 code, and may in fact be introduced by some backported code in another subsystem.
Larry
On Tuesday, December 4, 2018 9:17:41 PM CST, Larry Finger wrote: ...
This corruption issue seems to be caused by blk-mq mode (which doesn't seem to be enabled by default). It is also not limited to ext4. There have been reports of it happening on other file systems also. You can check if you have it enabled by doing the following:
Checking if a drive is using the multi-queue I/O code can be done by checking for the presence of the /sys/block/DEVICE/mq directory.
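The directory check described above can be sketched as a small loop over all block devices. The function name list_blk_mq_devices and its optional root argument are illustrative assumptions (the argument exists only so the function can be pointed at a different directory); the /sys/block/DEVICE/mq path is the one named in the message.

```shell
# For each block device, report whether /sys/block/DEVICE/mq exists.
# $1: sysfs block root (hypothetical parameter; defaults to /sys/block)
list_blk_mq_devices() {
    root="${1:-/sys/block}"
    for dev in "$root"/*; do
        [ -d "$dev" ] || continue      # skip if the glob matched nothing
        name=${dev##*/}                # strip the leading path
        if [ -d "$dev/mq" ]; then
            echo "$name: blk-mq"
        else
            echo "$name: no mq directory"
        fi
    done
}

list_blk_mq_devices
```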
More info: https://www.phoronix.com/scan.php?page=news_item&px=Linux-4.19-EXT4-Issue-Likely-MQ
Bug report: https://bugzilla.kernel.org/show_bug.cgi?id=201685
Fix for this bug: https://bugzilla.kernel.org/show_bug.cgi?id=201685#c255
05.12.2018 6:36, simonizor wrote: ...
Fix for this bug: https://bugzilla.kernel.org/show_bug.cgi?id=201685#c255
See also https://bugzilla.kernel.org/show_bug.cgi?id=201685#c245 for an explanation.
On Tue 04 Dec 2018 09:36:08 PM CST, simonizor wrote: ...
Fix for this bug: https://bugzilla.kernel.org/show_bug.cgi?id=201685#c255
Hi
It's been present for a while now, just not default;
dmesg | grep "io scheduler"
[ 2.793001] io scheduler noop registered
[ 2.793004] io scheduler deadline registered
[ 2.793050] io scheduler cfq registered (default)
[ 2.793052] io scheduler mq-deadline registered
[ 2.793052] io scheduler kyber registered
[ 2.793100] io scheduler bfq registered
I've been using the mq i/o scheduler with btrfs/xfs on SSDs; no issues seen to date.
cat /sys/block/sd[a,b]/queue/scheduler
[mq-deadline] kyber bfq none
[mq-deadline] kyber bfq none
--
Cheers Malcolm °¿° SUSE Knowledge Partner (Linux Counter #276890)
SLES 15 | GNOME Shell 3.26.2 | 4.12.14-25.25-default
HP 255 G4 Notebook | E2-7110 X4 @ 1.80 GHz | AMD Radeon R3
up 9 days 4:15, 2 users, load average: 0.24,
05.12.2018 7:12, Malcolm wrote: ...
I've been using the mq i/o scheduler with btrfs/xfs on SSDs; no issues seen to date.
cat /sys/block/sd[a,b]/queue/scheduler
[mq-deadline] kyber bfq none
[mq-deadline] kyber bfq none
According to Chris Mason, "It triggers when you have scsi devices using elevator=none over blkmq".
https://lore.kernel.org/linux-btrfs/FEDBF117-C83C-4DEC-96D1-A549DBCF4950@fb....
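Taking the quoted condition at face value, a rough sketch for spotting devices that combine blk-mq with the "none" scheduler could look like the following. The function names active_scheduler and flag_mq_none_devices are illustrative assumptions, and the sketch does not check whether a device is actually SCSI, so treat a match as a hint rather than a diagnosis.

```shell
# Print the active scheduler from a queue/scheduler file; the kernel marks
# the active one with square brackets, e.g. "[mq-deadline] kyber bfq none".
active_scheduler() {
    sched=$(sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$1")
    # Some devices report a bare "none" with no brackets at all.
    [ -n "$sched" ] || sched=$(cat "$1")
    echo "$sched"
}

# List block devices that have an mq directory and "none" as active scheduler.
# $1: sysfs block root (hypothetical parameter; defaults to /sys/block)
flag_mq_none_devices() {
    root="${1:-/sys/block}"
    for dev in "$root"/*; do
        [ -d "$dev/mq" ] && [ -r "$dev/queue/scheduler" ] || continue
        if [ "$(active_scheduler "$dev/queue/scheduler")" = "none" ]; then
            echo "${dev##*/}: blk-mq with scheduler none"
        fi
    done
}

flag_mq_none_devices
```

On the system shown in the quoted output above, both disks report [mq-deadline] as active, so neither would be listed.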
On 05. 12. 18, 4:36, simonizor wrote:
Fix for this bug: https://bugzilla.kernel.org/show_bug.cgi?id=201685#c255
Adding to stable and pushing...
--
js
suse labs
On 05/12/2018 01:22, Basil Chupin wrote:
how can I test for this problem?
Make backups. It's like how carrying an umbrella prevents rain: make backups, and it won't happen to you until a fix is out.
Or, if you like to live dangerously, _don't_ back up, and then something exciting is much more likely to happen...
--
Liam Proven - Technical Writer, SUSE Linux s.r.o.
Corso II, Křižíkova 148/34, 186-00 Praha 8 - Karlín, Czechia
Email: lproven@suse.com - Office telephone: +420 284 241 084
how can I test for this problem?
Make backups.
That's a possible choice for a stress test, but doesn't the target device have to be a SCSI drive to trigger the problem? Is there any indication how many TB one needs to transfer to have a good chance of hitting the problem? Do parallel I/O operations help to provoke it?
I have not had Ext4 issues with Kernel 4.19, but I did not want to take any risks either.
What is the best option to downgrade the Kernel to 4.18 or lower?
I downgraded to Kernel 4.18 on my computers which use Tumbleweed (Kernel 4.18.5) or Leap 15.0 with Kernel_stable repo (Kernel 4.18.6) to a self-compiled Kernel 4.18.20.
Unfortunately the scripts in package nvidia-gfxG05-kmp-default (Nvidia driver) do not work with self-compiled kernels, so I also had to re-compile the Nvidia driver manually. This is not an issue if you use the Nvidia *run.sh driver download.
Greetings, Björn
On 05/12/2018 10:22, Bjoern Voigt wrote: ...
What is the best option to downgrade the Kernel to 4.18 or lower?
See the Bugzilla discussion. You can simply disable the MQ scheduler with a kernel boot parameter and avoid the issue.
--
Liam Proven - Technical Writer, SUSE Linux s.r.o.
Corso II, Křižíkova 148/34, 186-00 Praha 8 - Karlín, Czechia
Email: lproven@suse.com - Office telephone: +420 284 241 084
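The message does not name the boot parameter. Assuming it is scsi_mod.use_blk_mq (the SCSI blk-mq switch documented for kernels of this era, where scsi_mod.use_blk_mq=0 selects the legacy path), a minimal sketch for checking whether it was passed at boot might look like this. The function name blk_mq_cmdline_setting and its optional file argument are illustrative assumptions, and whether the setting actually avoids the corruption is not established by this sketch.

```shell
# Print the value given to scsi_mod.use_blk_mq on the kernel command line,
# or "unset" if the parameter was not passed (assumed parameter name).
# $1: command-line file (hypothetical parameter; defaults to /proc/cmdline)
blk_mq_cmdline_setting() {
    value=$(sed -n 's/.*scsi_mod\.use_blk_mq=\([^ ]*\).*/\1/p' "${1:-/proc/cmdline}" 2>/dev/null)
    echo "${value:-unset}"
}

blk_mq_cmdline_setting
```

If the parameter is unset, the kernel's built-in default applies, so the per-device mq directory under /sys/block remains the more direct check.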
On Wed, 5 Dec 2018 12:51:47 +0100 Liam Proven <lproven@suse.cz> wrote:
See the Bugzilla discussion. You can simply disable the MQ scheduler with a kernel boot parameter and avoid the issue.
Which does not work for resolving the issue according to the discussion.
Thanks
Michal
On Wed, 5 Dec 2018 14:01:54 +0100 Michal Suchánek <msuchanek@suse.de> wrote:
On Wed, 5 Dec 2018 12:51:47 +0100 Liam Proven <lproven@suse.cz> wrote:
See the Bugzilla discussion. You can simply disable the MQ scheduler with a kernel boot parameter and avoid the issue.
Which does not work for resolving the issue according to the discussion: https://bugzilla.kernel.org/show_bug.cgi?id=201685#c54
Nonetheless, a problem in blk-mq was identified. So the cases where disabling mq did not work must be either a different issue or an error in applying the workaround.
Thanks
Michal
participants (10)
- Andrei Borzenkov
- Basil Chupin
- Bjoern Voigt
- Jiri Slaby
- Joachim Wagner
- Larry Finger
- Liam Proven
- Malcolm
- Michal Suchánek
- simonizor