[Bug 845539] New: crash after WARNING: at ext4_journal_start_sb

https://bugzilla.novell.com/show_bug.cgi?id=845539
https://bugzilla.novell.com/show_bug.cgi?id=845539#c0

Summary: crash after WARNING: at ext4_journal_start_sb
Classification: openSUSE
Product: openSUSE 12.3
Version: Final
Platform: x86-64
OS/Version: openSUSE 12.3
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Kernel
AssignedTo: kernel-maintainers@forge.provo.novell.com
ReportedBy: stephan.barth@suse.com
QAContact: qa-bugs@suse.de
Found By: ---
Blocker: ---

I can reproducibly crash my system with osc which writes to a file with dd.
Currently I have no kernel dump yet, but the system cannot start new processes anymore after these warnings appear:

[ 3321.969407] WARNING: at /home/abuild/rpmbuild/BUILD/kernel-desktop-3.7.10/linux-3.7/fs/ext4/super.c:340 ext4_journal_start_sb+0x148/0x150()
[ 3321.969410] Hardware name: Precision T3600
[ 3321.969412] Modules linked in: bnep bluetooth rfkill fuse nfsv3 nfs_acl nfsv4 auth_rpcgss nfs fscache lockd sunrpc nls_iso8859_1 nls_cp437 vfat fat snd_hda_codec_hdmi acpi_cpufreq mperf snd_hda_codec_realtek snd_hda_intel snd_usb_audio snd_hda_codec snd_pcm coretemp crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw aes_x86_64 snd_hwdep xts snd_usbmidi_lib gf128mul snd_rawmidi snd_seq snd_timer snd_seq_device iTCO_wdt iTCO_vendor_support snd serio_raw uvcvideo videobuf2_core videodev videobuf2_vmalloc sr_mod sg videobuf2_memops pcspkr sb_edac edac_core mei lpc_ich mfd_core i2c_i801 cdrom snd_page_alloc e1000e kvm_intel kvm microcode dcdbas soundcore shpchp pciehp pci_hotplug autofs4 nouveau ttm drm_kms_helper drm i2c_algo_bit mxm_wmi video wmi xhci_hcd button processor thermal_sys scsi_dh_rdac scsi_dh_hp_sw scsi_dh_alua scsi_dh_emc scsi_dh megaraid_sas
[ 3321.969508] Pid: 1708, comm: master Not tainted 3.7.10-1.16-desktop #1
[ 3321.969510] Call Trace:
[ 3321.969528]  [<ffffffff81004818>] dump_trace+0x88/0x300
[ 3321.969538]  [<ffffffff8158af33>] dump_stack+0x69/0x6f
[ 3321.969547]  [<ffffffff81045249>] warn_slowpath_common+0x79/0xc0
[ 3321.969555]  [<ffffffff8120d428>] ext4_journal_start_sb+0x148/0x150
[ 3321.969570]  [<ffffffff811f172f>] ext4_dirty_inode+0x1f/0x70
[ 3321.969580]  [<ffffffff81197f4d>] __mark_inode_dirty+0x3d/0x270
[ 3321.969587]  [<ffffffff81187db9>] update_time+0x89/0xe0
[ 3321.969593]  [<ffffffff81187ead>] file_update_time+0x9d/0x100
[ 3321.969601]  [<ffffffff81177663>] pipe_write+0x2a3/0x580
[ 3321.969610]  [<ffffffff8116ef02>] do_sync_write+0x92/0xd0
[ 3321.969616]  [<ffffffff8116f597>] vfs_write+0xa7/0x180
[ 3321.969623]  [<ffffffff8116f8e1>] sys_write+0x51/0xa0
[ 3321.969629]  [<ffffffff8159eaad>] system_call_fastpath+0x1a/0x1f
[ 3321.969639]  [<00007f2828f9e630>] 0x7f2828f9e62f
[ 3321.969642] ---[ end trace 96f537d0e16a4415 ]---
[ 3323.970949] ------------[ cut here ]------------
[ 3323.970964] WARNING: at /home/abuild/rpmbuild/BUILD/kernel-desktop-3.7.10/linux-3.7/fs/ext4/super.c:340 ext4_journal_start_sb+0x148/0x150()
[ 3323.970967] Hardware name: Precision T3600
[ 3323.970969] Modules linked in: bnep bluetooth rfkill fuse nfsv3 nfs_acl nfsv4 auth_rpcgss nfs fscache lockd sunrpc nls_iso8859_1 nls_cp437 vfat fat snd_hda_codec_hdmi acpi_cpufreq mperf snd_hda_codec_realtek snd_hda_intel snd_usb_audio snd_hda_codec snd_pcm coretemp crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw aes_x86_64 snd_hwdep xts snd_usbmidi_lib gf128mul snd_rawmidi snd_seq snd_timer snd_seq_device iTCO_wdt iTCO_vendor_support snd serio_raw uvcvideo videobuf2_core videodev videobuf2_vmalloc sr_mod sg videobuf2_memops pcspkr sb_edac edac_core mei lpc_ich mfd_core i2c_i801 cdrom snd_page_alloc e1000e kvm_intel kvm microcode dcdbas soundcore shpchp pciehp pci_hotplug autofs4 nouveau ttm drm_kms_helper drm i2c_algo_bit mxm_wmi video wmi xhci_hcd button processor thermal_sys scsi_dh_rdac scsi_dh_hp_sw scsi_dh_alua scsi_dh_emc scsi_dh megaraid_sas
[ 3323.971064] Pid: 1708, comm: master Tainted: G W 3.7.10-1.16-desktop #1
[ 3323.971067] Call Trace:
[ 3323.971084]  [<ffffffff81004818>] dump_trace+0x88/0x300
[ 3323.971094]  [<ffffffff8158af33>] dump_stack+0x69/0x6f
[ 3323.971104]  [<ffffffff81045249>] warn_slowpath_common+0x79/0xc0
[ 3323.971111]  [<ffffffff8120d428>] ext4_journal_start_sb+0x148/0x150
[ 3323.971127]  [<ffffffff811f172f>] ext4_dirty_inode+0x1f/0x70
[ 3323.971136]  [<ffffffff81197f4d>] __mark_inode_dirty+0x3d/0x270
[ 3323.971143]  [<ffffffff81187db9>] update_time+0x89/0xe0
[ 3323.971150]  [<ffffffff81187ead>] file_update_time+0x9d/0x100
[ 3323.971158]  [<ffffffff81177663>] pipe_write+0x2a3/0x580
[ 3323.971166]  [<ffffffff8116ef02>] do_sync_write+0x92/0xd0
[ 3323.971173]  [<ffffffff8116f597>] vfs_write+0xa7/0x180
[ 3323.971179]  [<ffffffff8116f8e1>] sys_write+0x51/0xa0
[ 3323.971186]  [<ffffffff8159eaad>] system_call_fastpath+0x1a/0x1f
[ 3323.971196]  [<00007f2828f9e630>] 0x7f2828f9e62f
[ 3323.971199] ---[ end trace 96f537d0e16a4416 ]---

-- 
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.

https://bugzilla.novell.com/show_bug.cgi?id=845539
https://bugzilla.novell.com/show_bug.cgi?id=845539#c1

--- Comment #1 from Stephan Barth <stephan.barth@suse.com> 2013-10-14 12:22:07 CEST ---
I had to do a manual fsck, which resolved most of the issues. The full crash doesn't happen anymore, but instead the process is in D state now.

12000 12038 root D   0.0 sh          sb_start_write
    1 12102 root Ssl 0.0 automount   ?
    2 12125 root S<  0.0 rpciod      rescuer_thread
    2 12126 root S<  0.0 nfsiod      rescuer_thread
    2 12193 root S   0.0 kworker/2:0 worker_thread
    2 12199 root S   0.0 kworker/4:1 worker_thread

After sending w to sysrq-trigger:

SysRq : Show Blocked State
  task PC stack pid father
sh D ffff88044f2d32c0 0 12038 12000 0x00000004
 ffff880395ce1c78 0000000000000082 ffff88043734a5c0 ffff880395ce1fd8
 ffff880395ce1fd8 ffff880395ce1fd8 ffff88043ada2540 ffff88043734a5c0
 0000000000000246 ffff880437194800 0000000000000000 0000000000000001
Call Trace:
 [<ffffffff8117165b>] __sb_start_write+0xcb/0x110
 [<ffffffff8118d2db>] mnt_want_write+0x1b/0x50
 [<ffffffff8117d6a7>] do_last+0xa47/0xed0
 [<ffffffff8117dbf3>] path_openat+0xc3/0x4c0
 [<ffffffff8117e894>] do_filp_open+0x44/0xb0
 [<ffffffff8116ec13>] do_sys_open+0xf3/0x1e0
 [<ffffffff8159eaad>] system_call_fastpath+0x1a/0x1f
 [<00007f3862909770>] 0x7f386290976f
Sched Debug Version: v0.10, 3.7.10-1.16-desktop #1
[...]

This thread/patch sounds related:
http://lkml.indiana.edu/hypermail/linux/kernel/1307.0/02334.html
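The two diagnostic steps in comment #1 (listing tasks stuck in D state with their wait channel, then writing w to sysrq-trigger) could be scripted roughly as follows. This is only a sketch, not the reporter's actual commands: the function names parse_stat_line, blocked_tasks and request_blocked_dump are made up for illustration, and it assumes a Linux /proc with the usual per-PID stat and wchan files.

```python
import os


def parse_stat_line(line):
    """Split one /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate the last ')' instead of splitting on whitespace.
    open_idx = line.index("(")
    close_idx = line.rindex(")")
    pid = int(line[:open_idx])
    comm = line[open_idx + 1:close_idx]
    state = line[close_idx + 1:].split()[0]
    return pid, comm, state


def blocked_tasks(proc_root="/proc"):
    """Return sorted [(pid, comm, wchan)] for tasks in uninterruptible sleep (D)."""
    result = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError):
            continue  # task exited meanwhile, or the entry is unreadable
        if state != "D":
            continue
        try:
            with open(os.path.join(base, "wchan")) as f:
                wchan = f.read().strip() or "?"
        except OSError:
            wchan = "?"
        result.append((pid, comm, wchan))
    return sorted(result)


def request_blocked_dump(trigger="/proc/sysrq-trigger"):
    """Write 'w' to the SysRq trigger so the kernel logs all blocked tasks."""
    # Assumption: run as root; the dump itself appears in the kernel log.
    with open(trigger, "w") as f:
        f.write("w")
```

Calling blocked_tasks() on the affected machine should list the sh process with a wait channel like the one in the ps output above; request_blocked_dump() needs root, and its result has to be read from the kernel log (for example with dmesg).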

https://bugzilla.novell.com/show_bug.cgi?id=845539
https://bugzilla.novell.com/show_bug.cgi?id=845539#c2

--- Comment #2 from Stephan Barth <stephan.barth@suse.com> 2013-10-14 17:24:02 CEST ---
I updated to 13.1 and this issue still occurs. I also enabled smartmon and it doesn't show any issue:

Device: /dev/sda [SAT], is SMART capable. Adding to "monitor" list.
Device: /dev/sda [SAT], state read from /var/lib/smartmontools/smartd.ST2000DM001_1CH164-W1E42WYH.ata.state
Device: /dev/bus/6 [megaraid_disk_00], type changed from 'megaraid,0' to 'sat+megaraid,0'
Device: /dev/bus/6 [megaraid_disk_00] [SAT], opened
Device: /dev/bus/6 [megaraid_disk_00] [SAT], ST2000DM001-1CH164, S/N:W1E42WYH, WWN:5-000c50-06084b765, FW:CC24, 2.00 TB
Device: /dev/bus/6 [megaraid_disk_00] [SAT], found in smartd database: Seagate Barracuda 7200.14 (AF)
Device: /dev/bus/6 [megaraid_disk_00] [SAT], WARNING: A firmware update for this drive may be available,
see the following Seagate web pages:
http://knowledge.seagate.com/articles/en_US/FAQ/207931en
http://knowledge.seagate.com/articles/en_US/FAQ/223651en
Device: /dev/bus/6 [megaraid_disk_00] [SAT], not capable of SMART Health Status check
Device: /dev/bus/6 [megaraid_disk_00] [SAT], is SMART capable. Adding to "monitor" list.
Device: /dev/bus/6 [megaraid_disk_00] [SAT], state read from /var/lib/smartmontools/smartd.ST2000DM001_1CH164-W1E42WYH.ata.state
Monitoring 2 ATA and 0 SCSI devices
Device: /dev/sda [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 117 to 118
Device: /dev/sda [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 66 to 64
Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 34 to 36
Device: /dev/bus/6 [megaraid_disk_00] [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 117 to 118
Device: /dev/bus/6 [megaraid_disk_00] [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 66 to 64
Device: /dev/bus/6 [megaraid_disk_00] [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 34 to 36
Device: /dev/sda [SAT], state written to /var/lib/smartmontools/smartd.ST2000DM001_1CH164-W1E42WYH.ata.state
Device: /dev/bus/6 [megaraid_disk_00] [SAT], state written to /var/lib/smartmontools/smartd.ST2000DM001_1CH164-W1E42WYH.ata.state

There is no firmware update on the Seagate pages.

https://bugzilla.novell.com/show_bug.cgi?id=845539
https://bugzilla.novell.com/show_bug.cgi?id=845539#c3

--- Comment #3 from Stephan Barth <stephan.barth@suse.com> 2013-10-14 17:33:01 CEST ---
I forgot to mention that I moved all build files to an ext3 partition and the issue still occurs there in the same way.

https://bugzilla.novell.com/show_bug.cgi?id=845539
https://bugzilla.novell.com/show_bug.cgi?id=845539#c4

--- Comment #4 from Stephan Barth <stephan.barth@suse.com> 2013-10-14 17:48:00 CEST ---
I just tried the same osc build on my laptop on 12.3 and it also crashes my system. I just checked out installation-images from openSUSE:12.3 and built it with "osc build".

https://bugzilla.novell.com/show_bug.cgi?id=845539
https://bugzilla.novell.com/show_bug.cgi?id=845539#c5

Jeff Mahoney <jeffm@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO
                 CC|                            |jeffm@suse.com
       InfoProvider|                            |stephan.barth@suse.com
           Severity|Normal                      |Major

--- Comment #5 from Jeff Mahoney <jeffm@suse.com> 2013-10-22 12:42:15 EDT ---
Are you actually seeing a crash after the warnings? Or just the traces?

This looks like something has frozen your fs. Can you identify what that is?

https://bugzilla.novell.com/show_bug.cgi?id=845539
https://bugzilla.novell.com/show_bug.cgi?id=845539#c6

Stephan Barth <stephan.barth@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |RESOLVED
       InfoProvider|stephan.barth@suse.com      |
         Resolution|                            |INVALID

--- Comment #6 from Stephan Barth <stephan.barth@suse.com> 2013-10-23 11:45:05 CEST ---
Sorry, I forgot to close this bug. The actual issue was this:
https://bugzilla.novell.com/show_bug.cgi?id=846163

https://bugzilla.novell.com/show_bug.cgi?id=845539
https://bugzilla.novell.com/show_bug.cgi?id=845539#c7

Matthias Hunstock <matthias.hunstock@tu-ilmenau.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |matthias.hunstock@tu-ilmenau.de

--- Comment #7 from Matthias Hunstock <matthias.hunstock@tu-ilmenau.de> 2014-07-02 12:23:33 UTC ---
I can reproduce this error in OpenSUSE 12.3 x86_64 when creating a snapshot in a VMware ESX environment, which possibly freezes the FS via VMware Tools. Sometimes the system "crashes" after this, being in a limbo state. The kernel itself is still running, but the FS is inaccessible.

Unfortunately, I am "not authorized" to see the bug linked by Stephan.

https://bugzilla.novell.com/show_bug.cgi?id=845539
https://bugzilla.novell.com/show_bug.cgi?id=845539#c8

--- Comment #8 from Stephan Barth <stephan.barth@suse.com> 2014-07-02 14:42:08 CEST ---
(In reply to comment #7)
> I can reproduce this error in OpenSUSE 12.3 x86_64 when creating a snapshot in a VMware ESX environment, which possibly freezes the FS via VMware Tools. Sometimes the system "crashes" after this, being in a limbo state. The kernel itself is still running, but the FS is inaccessible.
>
> Unfortunately, I am "not authorized" to see the bug linked by Stephan.

The other bug was about a frozen file system while building grub. In that special case the build was no longer able to unfreeze the file system, which led to a complete system freeze. Please check whether unfreezing the affected file system solves the issue you reported.
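One way to try the unfreeze suggested in comment #8 is the FITHAW ioctl, which is also what `fsfreeze --unfreeze <mountpoint>` from util-linux uses. The sketch below is an illustration under stated assumptions, not something taken from either bug: the helper names _iowr and thaw are hypothetical, the ioctl numbers are computed with the generic encoding used on x86-64 (some other architectures encode them differently), and the call needs root.

```python
import fcntl
import os


def _iowr(type_char, nr, size):
    """Encode an _IOWR ioctl number with the generic Linux layout (as on x86-64)."""
    ioc_read_write = 3  # _IOC_READ | _IOC_WRITE
    return (ioc_read_write << 30) | (size << 16) | (ord(type_char) << 8) | nr


# From <linux/fs.h>: FIFREEZE = _IOWR('X', 119, int), FITHAW = _IOWR('X', 120, int)
FIFREEZE = _iowr("X", 119, 4)
FITHAW = _iowr("X", 120, 4)


def thaw(mountpoint):
    """Ask the kernel to thaw the file system mounted at `mountpoint`.

    Assumption: called as root. The ioctl fails with OSError if the
    file system is not frozen or the caller lacks the privilege.
    """
    fd = os.open(mountpoint, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, FITHAW, 0)
    finally:
        os.close(fd)
```

If no new process can be started at all, this cannot be run after the fact. In that situation the SysRq key `j` ("thaw filesystems"), which thaws file systems frozen through FIFREEZE, may be the only option.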