[Bug 1094763] New: Fatal IO error 11 under heavy CPU load
http://bugzilla.opensuse.org/show_bug.cgi?id=1094763

Bug ID: 1094763
Summary: Fatal IO error 11 under heavy CPU load
Classification: openSUSE
Product: openSUSE Distribution
Version: Leap 15.0
Hardware: Other
OS: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: X.Org
Assignee: xorg-maintainer-bugs@forge.provo.novell.com
Reporter: rjhorn@alum.mit.edu
QA Contact: xorg-maintainer-bugs@forge.provo.novell.com
Found By: ---
Blocker: ---

This is on a fresh install of Leap 15 followed by all current updates. The hardware was previously very stable with Leap 42.3.

Various GUI applications fail for no obvious single reason. This affects at least firefox, google-chrome, VLC, and emacs-gtk. All failures occur while the application is doing something. The errors captured from the command line for firefox and google-chrome are:

- firefox: Fatal IO error 11 (Resource temporarily unavailable) on X server :1
- ERROR:chrome_browser_main_extra_parts_x11.cc(62)] X IO error received (X server probably went away)

This takes place while the system is under heavy CPU load. The CPU is at 98-100% load with a large computational task running at reduced priority. Failures occur after anywhere from a few minutes to about an hour of use, mostly web browsing. When the heavy CPU load is removed, the error rate drops dramatically, perhaps to zero. There were no errors after 2 hours.

-- You are receiving this mail because: You are on the CC list for the bug.
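Reproduction attempts need the load condition described in the report. The helper below is hypothetical and does not come from the thread. It assumes GNU coreutils (`nproc`, `timeout`, `nice`). Busy loops at niceness 19 stand in for the reporter's unspecified computational task, so this only approximates the real workload.

```shell
# Hypothetical reproduction helper: keep every online CPU busy with
# lowest-priority busy loops, approximating "98-100% load with a large
# computational task at reduced priority". The busy loop is a stand-in
# for the reporter's real (unspecified) workload.
cpu_load_at_low_priority() {
    duration=${1:-3600}   # seconds; the one-hour default is an arbitrary choice
    cpus=$(nproc)         # number of online CPUs (GNU coreutils)
    i=0
    while [ "$i" -lt "$cpus" ]; do
        # nice -n 19 requests the lowest scheduling priority; timeout
        # terminates each busy loop after $duration seconds.
        nice -n 19 timeout "$duration" sh -c 'while :; do :; done' &
        i=$((i + 1))
    done
    wait                  # return once every busy loop has been stopped
}
```

For example, start `cpu_load_at_low_priority 3600 &` and then browse as usual. According to the report, failures appeared after a few minutes to about an hour.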
http://bugzilla.opensuse.org/show_bug.cgi?id=1094763#c1
--- Comment #1 from Stefan Dirsch
http://bugzilla.opensuse.org/show_bug.cgi?id=1094763
Yifan Jiang
http://bugzilla.opensuse.org/show_bug.cgi?id=1094763#c2
--- Comment #2 from Michal Srb
http://bugzilla.opensuse.org/show_bug.cgi?id=1094763#c3
--- Comment #3 from robert Horn
http://bugzilla.opensuse.org/show_bug.cgi?id=1094763#c4
--- Comment #4 from Yifan Jiang
http://bugzilla.opensuse.org/show_bug.cgi?id=1094763#c5
--- Comment #5 from Michal Srb
> If there are suggestions about startup options or other logging locations, I can try those.
Please attach /var/log/Xorg.0.log and the output of dmesg after an application crashes this way. Maybe there will be some hint about what is happening.
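The function below is a sketch for gathering what comment 5 asks for into one directory. It rests on two assumptions that the thread does not state. First, the error message names display `:1`, so `Xorg.1.log` may be the relevant file. Second, a rootless X server, as GDM can start it, usually writes its log under `~/.local/share/xorg/` rather than `/var/log`. The function name and the default directory name are hypothetical.

```shell
# Hypothetical helper: copy whichever X server logs exist and capture
# kernel messages, so they can be attached to the bug together.
collect_x_logs() {
    outdir=${1:-x-crash-logs}   # hypothetical default directory name
    mkdir -p "$outdir" || return 1
    for n in 0 1; do
        # An X server started as root logs under /var/log.
        if [ -r "/var/log/Xorg.$n.log" ]; then
            cp "/var/log/Xorg.$n.log" "$outdir/system-Xorg.$n.log"
        fi
        # A rootless X server logs under the user's data directory.
        if [ -r "$HOME/.local/share/xorg/Xorg.$n.log" ]; then
            cp "$HOME/.local/share/xorg/Xorg.$n.log" "$outdir/user-Xorg.$n.log"
        fi
    done
    # dmesg can be restricted to root (kernel.dmesg_restrict=1); keep any
    # error message in the file instead of aborting.
    dmesg > "$outdir/dmesg.txt" 2>&1 || true
}
```

Run it right after an application has crashed, for example `collect_x_logs ~/x-crash-logs`, then attach the files from that directory.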
http://bugzilla.opensuse.org/show_bug.cgi?id=1094763#c6
--- Comment #6 from Michael Vetter
http://bugzilla.opensuse.org/show_bug.cgi?id=1094763#c7
--- Comment #7 from Dominique Leuenberger
> I'm slightly confused, since various DEs are mentioned. Did I get it right? So it happens in GNOME and WindowMaker but not in XFCE?
Same here. And GNOME seems to have been just a guess/assumption by Stefan? There was no mention of GNOME by Robert at any time.
http://bugzilla.opensuse.org/show_bug.cgi?id=1094763#c8
--- Comment #8 from Stefan Dirsch
(In reply to Michael Vetter from comment #6)
> I'm slightly confused, since various DEs are mentioned. Did I get it right? So it happens in GNOME and WindowMaker but not in XFCE?
> Same here. And GNOME seems to have been just a guess/assumption by Stefan? There was no mention of GNOME by Robert at any time.
Sorry. Yes, this is indeed true.
http://bugzilla.opensuse.org/show_bug.cgi?id=1094763#c9
--- Comment #9 from robert Horn
http://bugzilla.opensuse.org/show_bug.cgi?id=1094763#c10
--- Comment #10 from robert Horn
http://bugzilla.opensuse.org/show_bug.cgi?id=1094763#c11
--- Comment #11 from robert Horn
http://bugzilla.opensuse.org/show_bug.cgi?id=1094763#c12
--- Comment #12 from robert Horn
http://bugzilla.opensuse.org/show_bug.cgi?id=1094763#c13
--- Comment #13 from Yifan Jiang
> Another, shorter journal. For other reasons, I had to erase and rebuild from the distribution. Error 11 is still there, but it hits right away and did not need heavy CPU load.
> The previous one was an upgrade from 42.3, so some files had been preserved. This one started with reformatting the root file system, preserving the /home directory.
The shorter journal in comment #12 looks like an independent issue: at-spi-bus-launcher could handle its exit more gracefully during power-off, cf. bsc#1094446. It is probably unrelated to the original issue here.
http://bugzilla.opensuse.org/show_bug.cgi?id=1094763#c14
--- Comment #14 from Michal Srb
(In reply to robert Horn from comment #10)
> The problem looks like it begins with Windowmaker. It starts at 10:15:53. The first visible symptom was at 10:16:02 when firefox startup failed. It brought up the profile manager selection list, then within a couple of seconds that window vanished.
The log shows that Window Maker crashed (segfault). Then GDM terminates the X server, and then all applications go down with an IO error. This is expected behavior, and it is also what I was able to simulate on my test machine by killing wmaker manually.

I do not understand how you were able to see a new window for a couple of seconds after that. Perhaps the high CPU utilization caused a delay between the wmaker crash and the X server termination. However, in comment 3 you said that only individual applications are killed, not the whole X server. I don't know how to explain that.

Nevertheless, whatever is happening, it seems to be triggered by a crash in the window manager. We should try to fix that. Can you check whether any wmaker coredump is recorded in the output of `coredumpctl`? If so, please export it using the `coredumpctl dump <pid>` command and attach it to bugzilla.
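The sketch below covers the `coredumpctl` steps requested in comment 14. It assumes that systemd-coredump is collecting dumps on the machine. `coredumpctl list` and `coredumpctl dump` accept a command name such as `wmaker` as a match, and `dump` without a PID takes the most recent match. Passing a PID from the `list` output, as the comment suggests, works the same way. Depending on journal permissions, the commands may have to run as root. The function name and the output file name are hypothetical.

```shell
# Hypothetical helper: show recorded wmaker crashes, then export the most
# recent core dump to a file that can be attached to the bug.
export_wmaker_core() {
    out=${1:-wmaker.core}   # hypothetical output file name
    if ! command -v coredumpctl >/dev/null 2>&1; then
        echo "coredumpctl not found; is systemd-coredump in use?" >&2
        return 1
    fi
    # "wmaker" is matched against the crashed process's command name;
    # list exits non-zero when no matching dump is recorded.
    coredumpctl list wmaker || return 1
    # Without a PID, dump selects the newest matching entry;
    # --output writes the core to a file instead of standard output.
    coredumpctl --output="$out" dump wmaker
}
```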
http://bugzilla.opensuse.org/show_bug.cgi?id=1094763#c15
--- Comment #15 from Yifan Jiang
Hi Michal,

A couple of seconds before wmaker crashed, it looks like firefox complained about something I couldn't understand. Can that be linked to your analysis as well? Thank you!

May 29 10:07:09 quad firefox[3492]: firefox: Fatal IO error 11 (Resource temporarily unavailable) on X server :1.0.
May 29 10:07:28 quad firefox[3546]: firefox: Fatal IO error 11 (Resource temporarily unavailable) on X server :1.0.
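Comment 15 lines up the X IO errors with the wmaker crash, and that comes down to filtering the journal for a few patterns. This is a minimal sketch. It assumes journal text arrives on standard input, for example from `journalctl -b`. The patterns are the error messages quoted in this bug plus the window manager's name. The function name is hypothetical.

```shell
# Hypothetical filter: keep only journal lines that mention the X IO
# errors quoted in this bug or the wmaker window manager, so their
# timestamps can be compared side by side.
x_failure_lines() {
    grep -E 'Fatal IO error|X IO error received|wmaker'
}
```

For example, `journalctl -b | x_failure_lines` checks the current boot, and `journalctl -b -1 | x_failure_lines` checks the previous boot.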
http://bugzilla.opensuse.org/show_bug.cgi?id=1094763#c16
--- Comment #16 from robert Horn
http://bugzilla.opensuse.org/show_bug.cgi?id=1094763#c17
--- Comment #17 from Michal Srb
http://bugzilla.opensuse.org/show_bug.cgi?id=1094763#c18
--- Comment #18 from robert Horn
http://bugzilla.opensuse.org/show_bug.cgi?id=1094763#c19
--- Comment #19 from robert Horn