[Bug 246959] New: ext3 self-destruct on openSUSE 10.2 (kernel-default-2.6.18.2-34)
https://bugzilla.novell.com/show_bug.cgi?id=246959

Summary: ext3 self-destruct on openSUSE 10.2 (kernel-default-2.6.18.2-34)
Product: openSUSE 10.2
Version: Final
Platform: i686
OS/Version: Linux
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Kernel
AssignedTo: kernel-maintainers@forge.provo.novell.com
ReportedBy: matthias.andree@gmx.de
QAContact: qa@suse.de

I've just had an i686 uniprocessor system (Athlon XP 1800+) effectively wipe itself out last Sunday.

Software: kernel-default-2.6.18.2-34, e2fsprogs-1.39-21.

Disk layout:

/dev/hda1 vfat (unmounted and empty)
/dev/hda3 ufs (unmounted, FreeBSD 6.2-PRERELEASE)
/dev/hda5 ext3 as /
/dev/hda6 ext3 as /home
/dev/hda7 xfs as /musik

There was a preceding issue three weeks ago: the hard disk drive (Seagate Barracuda ATA IV) remapped 5 sectors from /dev/hda5, and the system lost a few files. This was quickly repaired with e2fsck -fy /dev/hda5, rpm -Va and reinstalling the damaged packages.

The current issue, however, was violent. The system remounted the file system R/O after finding a bitmap mismatch (logging below); several commands could no longer be found on the system after that event and the system was unbootable, so I rebooted into the DVD-based (original openSUSE 10.2 box DVD) rescue system and ran e2fsck there. e2fsck -pf /dev/hda came up with lots of inconsistencies, so I ran e2fsck -fy /dev/hda5 (e2fsprogs-1.39-21), which then relocated nearly 107,000 (one hundred and seven thousand!) files to /lost+found on /dev/hda5.

/boot/grub and several /etc files and executables were missing, rendering the system unbootable, so I moved the rest into OLD/ and installed Ubuntu 6.10, since I don't trust the openSUSE 10.2 kernel for the nonce, until it's clear what caused this.

I don't think it was e2fsck though, since before the reboot, some commands and /boot/grub had already gone missing, so I suspect kernel bugs here that systematically trashed the system.
/dev/hda6 (according to e2fsck -pf) and /dev/hda7 (according to xfs_check) are undamaged.

These are, in a sense, the final words the system uttered over the network to the loghost; I haven't found any other suspicious messages after the bootup at 21:14 that day.

Feb 18 21:29:59 rho su: (to beagleindex) root on none
Feb 18 21:30:51 rho su: (to beagleindex) root on none
Feb 18 21:33:49 rho syslogd: /var/log/messages: Read-only file system
Feb 18 21:33:49 rho syslogd: /var/log/warn: Read-only file system
Feb 18 21:33:49 rho syslogd: /var/log/warn: Read-only file system
Feb 18 21:33:49 rho kernel: EXT3-fs error (device hda5): ext3_free_inode: bit already cleared for inode 1522610
Feb 18 21:33:49 rho kernel: Aborting journal on device hda5.
Feb 18 21:33:49 rho kernel: ext3_abort called.
Feb 18 21:33:49 rho kernel: EXT3-fs error (device hda5): ext3_journal_start_sb: Detected aborted journal
Feb 18 21:33:49 rho kernel: Remounting filesystem read-only
Feb 18 21:33:49 rho kernel: EXT3-fs error (device hda5) in ext3_delete_inode: IO failure
Feb 18 21:33:49 rho kernel: __journal_remove_journal_head: freeing b_committed_data
Feb 18 21:33:49 rho kernel: __journal_remove_journal_head: freeing b_committed_data

--
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
https://bugzilla.novell.com/show_bug.cgi?id=246959

jack@novell.com changed:

What          |Removed |Added
----------------------------------------------------------------------------
Status        |NEW     |NEEDINFO
Info Provider |        |matthias.andree@gmx.de

------- Comment #2 from jack@novell.com 2007-02-21 02:51 MST -------

Thanks for the report but it's hard to say anything here. I guess you don't have a filesystem image (metadata would be enough) before you ran e2fsck, do you?

Obviously, something corrupted your filesystem and it seems the corruption was rather heavy. Given your disk had to remap a few sectors before, I would not trust it completely. Unless you have any more information, this is impossible to debug, sorry. So do you have the corrupted fs image or something like that?
https://bugzilla.novell.com/show_bug.cgi?id=246959

matthias.andree@gmx.de changed:

What          |Removed                |Added
----------------------------------------------------------------------------
Status        |NEEDINFO               |NEW
Info Provider |matthias.andree@gmx.de |

------- Comment #3 from matthias.andree@gmx.de 2007-02-21 03:22 MST -------

I'm afraid I don't have a metadata image. My fault: I didn't think of that, and I did not expect corruption as bad as what I've seen, since I have never had such massive data losses with ext2 or ext3 in 10 years.

WRT the disk drive, it passes S.M.A.R.T. self tests and has neither reallocated more sectors since the original event (7 in total according to smartctl -a) nor logged any I/O errors in the current situation. Even if it had, it should not have cost more than a few directories, but:

$ sudo find /lost+found/ -type d | wc -l
Password:
20366

That's 20366 directories in lost+found, and they're from all over the map, inode numbers from 18,000 to 2,000,000, with a bit more than 2 million available inodes total.

I'd suggest keeping this report around for a few weeks, just to see if any further similar reports come in -- or if this was a one-time event.

The related bits I found are http://www.ussg.iu.edu/hypermail/linux/kernel/0511.0/0193.html
https://bugzilla.novell.com/show_bug.cgi?id=246959

jack@novell.com changed:

What       |Removed |Added
----------------------------------------------------------------------------
Status     |NEW     |RESOLVED
Resolution |        |REMIND

------- Comment #4 from jack@novell.com 2007-02-21 03:55 MST -------

Yes, I'll definitely keep your report in mind. I've actually collected several reports of ext3 corruption in vanilla kernels starting with a bit in a bitmap already cleared (usually it was a block bitmap though). But none of those reports described significant filesystem corruption; that differs from your case.

So I agree we definitely have a bug somewhere (probably in vanilla kernels); it's just really hard to track it down... So I'll add your report to my collection and close the bug for now.
https://bugzilla.novell.com/show_bug.cgi?id=246959#c5
Miguel Freitas
https://bugzilla.novell.com/show_bug.cgi?id=246959#c6
--- Comment #6 from Miguel Freitas
https://bugzilla.novell.com/show_bug.cgi?id=246959#c7
--- Comment #7 from Miguel Freitas
https://bugzilla.novell.com/show_bug.cgi?id=246959#c8
--- Comment #8 from Jan Kara
https://bugzilla.novell.com/show_bug.cgi?id=246959
Jan Kara
https://bugzilla.novell.com/show_bug.cgi?id=246959#c9
Miguel Freitas
https://bugzilla.novell.com/show_bug.cgi?id=246959#c10
--- Comment #10 from Miguel Freitas
https://bugzilla.novell.com/show_bug.cgi?id=246959#c11
--- Comment #11 from Jan Kara
https://bugzilla.novell.com/show_bug.cgi?id=246959#c12
--- Comment #12 from Miguel Freitas
https://bugzilla.novell.com/show_bug.cgi?id=246959#c13
Jan Kara
https://bugzilla.novell.com/show_bug.cgi?id=246959#c14
Matthias Andree
https://bugzilla.novell.com/show_bug.cgi?id=246959#c15
--- Comment #15 from Jan Kara
https://bugzilla.novell.com/show_bug.cgi?id=246959#c16
Jan Kara
https://bugzilla.novell.com/show_bug.cgi?id=246959
User mfreitas@gmail.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=246959#c17
Miguel Freitas
https://bugzilla.novell.com/show_bug.cgi?id=246959
User mfreitas@gmail.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=246959#c18
--- Comment #18 from Miguel Freitas
https://bugzilla.novell.com/show_bug.cgi?id=246959
User jack@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=246959#c19
Jan Kara
https://bugzilla.novell.com/show_bug.cgi?id=246959
User mfreitas@gmail.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=246959#c20
--- Comment #20 from Miguel Freitas
https://bugzilla.novell.com/show_bug.cgi?id=246959
User mfreitas@gmail.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=246959#c21
--- Comment #21 from Miguel Freitas
https://bugzilla.novell.com/show_bug.cgi?id=246959
User mfreitas@gmail.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=246959#c22
Miguel Freitas