[Bug 345039] New: Encrypted filesystem write blocks the system
https://bugzilla.novell.com/show_bug.cgi?id=345039 Summary: Encrypted filesystem write blocks the system Product: openSUSE 10.3 Version: Final Platform: i686 OS/Version: openSUSE 10.3 Status: NEW Severity: Normal Priority: P5 - None Component: Security AssignedTo: security-team@suse.de ReportedBy: robin.listas@telefonica.net QAContact: qa@suse.de Found By: --- (I classify this provisionally as a security problem, considering encryption a security feature; please reclassify as needed). Setup description (longish) ---------------------------- I have several encrypted filesystems inherited from 10.2 and before (system has been upgraded during the time from 8.1 up to 10.3; ie, I don't use fresh install). CPU is a P-IV, 1GiB, and I have 3 HD (± 300, 120 & 60 GB) and more than 16 partitions on two of them. The main encrypted partition is activated via "/etc/init.d/boot.crypto start" during boot up, and /etc/cryptotab: /dev/loop0 /dev/disk/by-id/ata-ST3320620A_5QF2M56F-part15 \ /cripta xfs twofish256 noatime,nodiratime It is mounted via loop0, /dev/mapper/cryptotab_loop0, and I think /dev/dm-0, but exactly how, I don't understand (please, documentation anywhere :-?) nimrodel:~ # mount -l | grep cripta /dev/mapper/cryptotab_loop0 on /cripta type xfs (rw,noatime,nodiratime) [crypta_320] nimrodel:~ # dmsetup info Name: cryptotab_loop0 State: ACTIVE Tables present: LIVE Open count: 1 Event number: 0 Major, minor: 253, 0 Number of targets: 1 That one has not given problems as far as I know. However, I have three more encripted filesystems, on file (not partitions) and mounted manually. Many of the files stored on them are 300-400 MiB. One of the filesystem is largish (28GiB), and the other two have the exact size of a DVD (4.4GiB), because I burn the file directly to a DVD as my backup procedure. Thus, I can also mount encrypted DVDs from the command line using "mount" as user, and all this through entries in fstab: /biggy/crypta_f.mm.x /mnt/crypta.mm.x xfs \ noauto,user,noatime,nodiratime,loop,encryption=twofish256 1 4 /Grande/imgs/crypta_f1_dvd.mm.x /mnt/crypta.mm_dvd1.x xfs \ noauto,user,loop,encryption=twofish256 1 4 /Grande/imgs/crypta_f2_dvd.mm.x /mnt/crypta.mm_dvd2.x xfs \ noauto,user,loop,encryption=twofish256 1 4 nimrodel:~ # mount -l | grep twofish /biggy/crypta_f.mm.x on /mnt/crypta.mm.x type xfs (rw,noexec,nosuid,nodev,noatime,nodiratime,loop=/dev/loop1,encryption=twofish256) nimrodel:~ # losetup -a /dev/loop0: [000e]:4593 (/dev/disk/by-id/ata-ST3320620A_5QF2M56F-part15) /dev/loop1: [1650]:135 (/biggy/crypta_f.mm.x), encryption twofish (type 18), key length 32 This setup has worked very well while using 10.2, but in 10.3 I have problems: I experience filesystem crash or lock up when writing to the large encrypted one. Problem description -------------------- Reading from /mnt/crypta.mm.x (28GiB) works (I managed to copy it elsewhere without a glitch). However, writing to this filesystem, from one of the encrypted DVDs fail. To clear out the DVD subsystem, I copied over the DVD using dd to a file, and loopmounted that one: /dev/loop4: [0314]:9142822 (/Grande/imgs/roto), encryption twofish (type 18), key length 32 Well, whatever the data source, writing to /mnt/crypta.mm.x a set of large files (over 300 MiB each), locks the console where the copy is having place. The copy process stops. It is unkillable. Umount of that filesystem locks (and umount is unkillable). Logout of GNome locks if an xterm locked by this problem. Reboot of system locks, requiring hard power off, and later, filesystem repair (not, it seems, of /mnt/crypta.mm.x). If gnome is active, I see in the CPU system monitor applet that "IOWait" is 100%, but cpu ussage is very low (encryption uses a lot of cpu here). gkrellm ends by freezing. If I try to "ls -l" the destination dir of the copy (that is locked, frozen), the terminal doing this ls also freezes. I have to lazy umount ("umount -l /mountpoints &") all the mountpoints I can, and then try to reboot (which hangs), and finally poweroff the machine forcefully. There is absolutely nothing in the logs relative to this problem (I believe I know how to look into logs and find problems, even if not understanding them; that was my work specialty some years ago O:-) ). I have fsck the encrypted filesystem, nothing reported: nimrodel:~ # losetup -e twofish256 /dev/loop2 /biggy/crypta_f.mm.x Password: nimrodel:~ # file -s /dev/loop2 /dev/loop2: SGI XFS filesystem data (blksz 4096, inosz 256, v2 dirs) nimrodel:~ # xfs_check /dev/loop2 ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed. Mount the filesystem to replay the log, and unmount it before re-running xfs_check. If you are unable to mount the filesystem, then use the xfs_repair -L option to destroy the log and attempt a repair. Note that destroying the log may cause corruption -- please attempt a mount of the filesystem before doing this. nimrodel:~ # mount /dev/loop2 /mnt/crypta.mm.x/ nimrodel:~ # umount /dev/loop2 nimrodel:~ # xfs_check /dev/loop2 nimrodel:~ # No errors, you see. I'm aware that the encryption filesystem method have changed in 10.3, but the only document I have is the release notes. Probably I would have to use a different method than losetup, but I have no idea which exactly. But notice that the problem arises from a filesystem mounted directly from fstab - shouldn't this method be used now anymore? I can not mount this partition from /etc/cryptotab as I need to mount and umount it manually, at will. In any case, the "classic" method should not freeze the computer, as it does :-? I can run tests; right now I think I can reproduce the situation at will, as I had to reboot several times yesterday while trying to diagnose it (several hours). But please, be careful with the suggested tests, as a failure means powering off forcefully, which means a minimum of half an hour per test, providing I don't loose data in the process. So far I have been lucky. And please, try to limit tests to runlevel 3, as a gnome lockup plays havoc with things like command history and some gadgets. I haven't seen any messages related to the lock. Not in dmesg, not in consoles, log files, etc. nothing. Not related to the problem even remotely, and sometimes not a message about anything at all. I'm willing to recompile the kernel and increase verbosity somewhere, if you know that somewhere (I don't). Mmmm... I think I wrote all I know about it. I hope you can decipher the problem! -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039#c1 Christian Boltz <suse-beta@cboltz.de> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |lnussel@novell.com, suse-beta@cboltz.de AssignedTo|security-team@suse.de |kernel-maintainers@forge.provo.novell.com Component|Security |Kernel --- Comment #1 from Christian Boltz <suse-beta@cboltz.de> 2007-11-29 16:07:08 MST --- Reassigning to the kernel developers (as suggested by Ludwig Nussel) - and CC'ing Ludwig because he has seen a similar problem at home (see his mail in suse-security). -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039 User robin.listas@telefonica.net added comment https://bugzilla.novell.com/show_bug.cgi?id=345039#c2 Carlos Robinson <robin.listas@telefonica.net> changed: What |Removed |Added ---------------------------------------------------------------------------- Severity|Normal |Major --- Comment #2 from Carlos Robinson <robin.listas@telefonica.net> 2007-12-30 04:48:44 MST --- This problem continues. I increased the severity because this bug impedes work. Soon after I write large files to an encrypted, loop mounted, filesystem, the operation locks. All programs are thereafter unable to exit or be killed -9, and system can not even reboot or halt, only power switch works, forcing a huge fsck. It does not matter if filesystem is mounted through device mapper, new style (dm-crypt), or old style (cryptoloop). A week ago I tried new style. /etc/crypttab: mycrypt_mm_f /biggy/crypta_f.mm.x none \ cipher=twofish-cbc-plain,size=256,hash=sha512,noauto,loop /etc/fstab: /dev/mapper/mycrypt_mm_f /mnt/crypta.mm.x xfs \ noauto,user,noatime,nodiratime 1 4 nimrodel:~ # df -h /mnt/crypta.mm.x Filesystem Size Used Avail Use% Mounted on /dev/mapper/mycrypt_mm_f 28G 19G 9.3G 67% /mnt/crypta.mm.x Mounted as: /dev/mapper/mycrypt_mm_f on /mnt/crypta.mm.x type xfs \ (rw,noexec,nosuid,nodev,noatime,nodiratime) Logs: nothing is logged anywhere. Nothing! I'd be happy to try kernel options to enable more logging, or try tests... I can trigger the bug at will, it is reproducible here. But I need clues, I don't know what to do: you are the experts, not me. The next step for me would be to erase and recreate the filesystem, but if that works you will not be able to diagnose it. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039 User jeffm@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=345039#c3 Jeff Mahoney <jeffm@novell.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |jeffm@novell.com Status|NEW |NEEDINFO Info Provider| |robin.listas@telefonica.net --- Comment #3 from Jeff Mahoney <jeffm@novell.com> 2008-01-02 15:20:15 MST --- There could be a deadlock or an Oops occurring. Since you're running a GUI and the system is locking up in the manner it is, it's tough to debug. If you can reproduce this at will, could you do so in console mode after running dmesg -n 8 ; klogconsole -r 0 That will dump all kernel output messages to the current console. If there's an oops, you'll see it. Otherwise, if it locks up without an Oops, please do a alt+sysrq+t to dump the states of the processes on the system. If something is stuck in a deadlock, we should be able to see it. The best way to capture this info is with a serial console, but if you're unable to set one up - a photo of the Oops will do. If there's sysrq+t output, that's going to be more difficult to capture in its entirety. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039 User robin.listas@telefonica.net added comment https://bugzilla.novell.com/show_bug.cgi?id=345039#c4 Carlos Robinson <robin.listas@telefonica.net> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEEDINFO |NEW Info Provider|robin.listas@telefonica.net | --- Comment #4 from Carlos Robinson <robin.listas@telefonica.net> 2008-01-03 04:12:25 MST --- (In reply to comment #3 from Jeff Mahoney)
There could be a deadlock or an Oops occurring. Since you're running a GUI and the system is locking up in the manner it is, it's tough to debug.
The first time I was in gnome; later on, I always tested in runlevel 3. I will attach my report as a file. The gist of it: nothing.
If you can reproduce this at will, could you do so in console mode after running dmesg -n 8 ; klogconsole -r 0
No output at all. The cp command simply stops copying, cpu goes iddle.
please do a alt+sysrq+t to dump the states of the processes on the system. If something is stuck in a deadlock, we should be able to see it.
No output at all. Don't we need to enable first sysrq somehow?
The best way to capture this info is with a serial console, but if you're unable to set one up - a photo of the Oops will do. If there's sysrq+t output, that's going to be more difficult to capture in its entirety.
Serial port would be possible (if you point me to a howto, including pinout of serial cable), but I don't think it is necessary, the system doesn't crash till I try to umount or access the affected filesystem. I can read the log, do dmesg, cat /proc/something, etc. However, there is no message anywhere. Nothing, nada. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039 User robin.listas@telefonica.net added comment https://bugzilla.novell.com/show_bug.cgi?id=345039#c5 --- Comment #5 from Carlos Robinson <robin.listas@telefonica.net> 2008-01-03 04:26:58 MST --- Created an attachment (id=189148) --> (https://bugzilla.novell.com/attachment.cgi?id=189148) Requested report detailed info -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039 User robin.listas@telefonica.net added comment https://bugzilla.novell.com/show_bug.cgi?id=345039#c6 --- Comment #6 from Carlos Robinson <robin.listas@telefonica.net> 2008-01-23 17:27:53 MST --- There is some unexpected relation to Bug 349942 - see comment number 4 there: (https://bugzilla.novell.com/show_bug.cgi?id=349942#c4) I will try to reproduce this bug with Apparmour disabled, time permitting. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039 User ffiene@veka.com added comment https://bugzilla.novell.com/show_bug.cgi?id=345039#c7 Frank Fiene <ffiene@veka.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |ffiene@veka.com --- Comment #7 from Frank Fiene <ffiene@veka.com> 2008-01-30 11:04:09 MST --- Hmm, maybe i have the same error. I've shared an encrypted xfs formatted disk with nfs. If i try to copy large files (bigger than 350MB), cp ends normally but gives me a file of round about 350MB. Nor error messages from cp and nothing in /var/log/messages both on server and client. If i try to mount an iso file on this server with loop, every file after the first 350MB are corrupt or are not existing. I've mounted with r(w)size of 8192 and 1024, tried also to disable NFSv4. I didn't try to enable NFS over TCP and i think UDP is not the problem. Puh AppArmor. On client disabled, i don't know if on server. Normaly i install without AppArmor but i am not sure ... -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039 User robin.listas@telefonica.net added comment https://bugzilla.novell.com/show_bug.cgi?id=345039#c8 --- Comment #8 from Carlos Robinson <robin.listas@telefonica.net> 2008-01-30 17:00:54 MST --- I finally tried with apparmour disabled, after doing a backup just in case. I managed to copy 4.7 GB from an encrypted DVD to an encrypted filesystem without problems. I then tried to copy another one... and this one failed. So apparmour is not the problem. Nor did I get any error message anywhere: the copy process stops, the affected filesystem is not umountable. Question, please, someone: Is there any option I can enable in the kernel, even recompiling the kernel, so that we could get more detailed log allowing us to learn what the heck is happening here? -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039 User ffiene@veka.com added comment https://bugzilla.novell.com/show_bug.cgi?id=345039#c9 --- Comment #9 from Frank Fiene <ffiene@veka.com> 2008-01-31 01:27:48 MST --- Sorry, not the same error for me. I've tested with samba with the same shared directory and it works, but samba is so bloody slow (6MB/s vs. 12MB/s on a 100MBit line) I will create a new bug report. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039 Greg Kroah-Hartman <gregkh@novell.com> changed: What |Removed |Added ---------------------------------------------------------------------------- AssignedTo|kernel-maintainers@forge.provo.novell.com |jeffm@novell.com -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039 User jeffm@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=345039#c10 Jeff Mahoney <jeffm@novell.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |NEEDINFO Info Provider| |robin.listas@telefonica.net --- Comment #10 from Jeff Mahoney <jeffm@novell.com> 2008-02-19 14:22:39 MST --- @Carlos: If the system is hanging and you can still issue commands (as seems to be the case), can you try the following: echo t > /proc/sysrq-trigger Depending on whether writing files works, either save the output of dmesg to a file, or output it to another machine like this: dmesg | ssh <somehost> "cat > dmesg.out" That will give a kernel stack trace of everything on the system, and should tell me where the encrypted block device is hanging. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039 User robin.listas@telefonica.net added comment https://bugzilla.novell.com/show_bug.cgi?id=345039#c11 --- Comment #11 from Carlos Robinson <robin.listas@telefonica.net> 2008-02-19 16:13:08 MST --- (In reply to comment #10 from Jeff Mahoney)
@Carlos:
If the system is hanging and you can still issue commands (as seems to be the case), can you try the following:
echo t > /proc/sysrq-trigger
Do I do that before I start the file copy procedure, or after it hangs? I assume the former.
Depending on whether writing files works, either save the output of dmesg to a file, or output it to another machine like this:
dmesg | ssh <somehost> "cat > dmesg.out"
That will give a kernel stack trace of everything on the system, and should tell me where the encrypted block device is hanging.
Ok, I'll try, time permitting. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039 User jeffm@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=345039#c12 --- Comment #12 from Jeff Mahoney <jeffm@novell.com> 2008-02-19 16:15:49 MST --- No, it needs to be done after the hang. A trace before the hang will just show a normally running system, chugging along. :) -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039 User robin.listas@telefonica.net added comment https://bugzilla.novell.com/show_bug.cgi?id=345039#c13 Carlos Robinson <robin.listas@telefonica.net> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEEDINFO |NEW Info Provider|robin.listas@telefonica.net | --- Comment #13 from Carlos Robinson <robin.listas@telefonica.net> 2008-02-19 18:49:13 MST --- Created an attachment (id=195842) --> (https://bugzilla.novell.com/attachment.cgi?id=195842) requested dmesg output That is the dmesg file, saved locally. It took me longer to test because first I had to boot my spare computer, which was recalcitrant tonight (it "runs" 7.3). There is a snag: I forgot I was running my slightly customized kernel. I'll attach my kernel config file and System.map, but if that doesn't help, tell me and I'll redo the test with the default kernel - but not tonight, here it is 2:48 AM. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039 User robin.listas@telefonica.net added comment https://bugzilla.novell.com/show_bug.cgi?id=345039#c14 --- Comment #14 from Carlos Robinson <robin.listas@telefonica.net> 2008-02-19 18:50:23 MST --- Created an attachment (id=195843) --> (https://bugzilla.novell.com/attachment.cgi?id=195843) System.map-2.6.22.17-0.1-cer of the running kernel -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039 User robin.listas@telefonica.net added comment https://bugzilla.novell.com/show_bug.cgi?id=345039#c15 --- Comment #15 from Carlos Robinson <robin.listas@telefonica.net> 2008-02-19 18:51:11 MST --- Created an attachment (id=195845) --> (https://bugzilla.novell.com/attachment.cgi?id=195845) kernel.config.gz of the running kernel for the dmesg file above. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039 User jeffm@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=345039#c16 Jeff Mahoney <jeffm@novell.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |INVALID --- Comment #16 from Jeff Mahoney <jeffm@novell.com> 2008-02-20 11:10:11 MST --- That's a showstopper. Obviously we can't support kernels we don't ship. If you're able to reproduce without VMware and with a kernel as shipped by Novell, please re-open this bug. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039 User robin.listas@telefonica.net added comment https://bugzilla.novell.com/show_bug.cgi?id=345039#c17 --- Comment #17 from Carlos Robinson <robin.listas@telefonica.net> 2008-02-20 11:38:53 MST --- (In reply to comment #16 from Jeff Mahoney)
That's a showstopper. Obviously we can't support kernels we don't ship. If you're able to reproduce without VMware and with a kernel as shipped by Novell, please re-open this bug.
You are unfair and unjust! :-/ If you have read this report carefully you'd know that's not the case. What about Ludwig? Is he also using a non standard software? I said that yesterday I was running my kernel, I repeat: _Yesterday_. I can reproduce any time with the default kernel, and I have done so often here. Also, Vmware is not involved, as it happens that I used vmware yesterday to test something, the first time since November. I simply forgot to reboot before submitting my report. I was so happy that you gave me instructions to follow that I forgot to reboot to the default kernel before testing and reporting, and I told you so. I also told you I could repeat the test in standard conditions. You could simply have said: "Yes, please repeat the test in standard conditions." I would have set aside two or three hours to please you. But no, you go and close the bug as invalid after all these months of testings on your behalf. YOU ARE UNFAIR! :-/ -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039 User robin.listas@telefonica.net added comment https://bugzilla.novell.com/show_bug.cgi?id=345039#c18 Carlos Robinson <robin.listas@telefonica.net> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|RESOLVED |REOPENED Resolution|INVALID | --- Comment #18 from Carlos Robinson <robin.listas@telefonica.net> 2008-02-20 11:39:56 MST --- Invalid close as "invalid". -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039 User jeffm@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=345039#c19 Jeff Mahoney <jeffm@novell.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|REOPENED |NEEDINFO Info Provider| |robin.listas@telefonica.net --- Comment #19 from Jeff Mahoney <jeffm@novell.com> 2008-02-20 11:52:34 MST --- My apologies, I misunderstood. It's standard practice for us to close out bugs in unsupportable situations. The intention is that if the problem is reproducible with a shipped kernel, the user will re-open it and provide the information. In many cases, the reporter is unable to reproduce it and the bug stays closed. If we did otherwise, we'd end up with loads of "NEEDINFO" bugs collecting dust since those users tend to fall silent when it's shown that the problem isn't with anything we ship. I did mention that you could re-open it to reproduce with the shipped kernel, so closing the bug as invalid under such situations is not "final," as you've just proved. :) If you're able, the contents of /var/log/messages would be helpful as well. It looks like the dmesg output was too long and ended up getting cut off at the beginning. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039 User robin.listas@telefonica.net added comment https://bugzilla.novell.com/show_bug.cgi?id=345039#c20 Carlos Robinson <robin.listas@telefonica.net> changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #195842|0 |1 is obsolete| | Attachment #195843|0 |1 is obsolete| | Attachment #195845|0 |1 is obsolete| | Status|NEEDINFO |REOPENED Info Provider|robin.listas@telefonica.net | --- Comment #20 from Carlos Robinson <robin.listas@telefonica.net> 2008-02-20 13:28:29 MST --- Created an attachment (id=196086) --> (https://bugzilla.novell.com/attachment.cgi?id=196086) dmesg_after_standard_kernel.out.gz Ok, apologies accepted. You could,however, simply mark as needinfo, an automatically as invalid after, say, a month without response. Here I add the data. I'll attach my /var/log/kernel file and /var/log/messages (edited to contain data of today only), because my messages file does not contain kernel messages (syslog-ng configuration non standard). If you need more logs or data, ask :-) -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039 User robin.listas@telefonica.net added comment https://bugzilla.novell.com/show_bug.cgi?id=345039#c21 --- Comment #21 from Carlos Robinson <robin.listas@telefonica.net> 2008-02-20 13:29:15 MST --- Created an attachment (id=196087) --> (https://bugzilla.novell.com/attachment.cgi?id=196087) kernel.gz -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039 User robin.listas@telefonica.net added comment https://bugzilla.novell.com/show_bug.cgi?id=345039#c22 --- Comment #22 from Carlos Robinson <robin.listas@telefonica.net> 2008-02-20 13:30:05 MST --- Created an attachment (id=196088) --> (https://bugzilla.novell.com/attachment.cgi?id=196088) messages.gz -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039 User robin.listas@telefonica.net added comment https://bugzilla.novell.com/show_bug.cgi?id=345039#c23 --- Comment #23 from Carlos Robinson <robin.listas@telefonica.net> 2008-02-20 13:33:20 MST --- Created an attachment (id=196089) --> (https://bugzilla.novell.com/attachment.cgi?id=196089) boot.msg.gz -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039 User jeffm@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=345039#c24 Jeff Mahoney <jeffm@novell.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|REOPENED |NEEDINFO Info Provider| |robin.listas@telefonica.net --- Comment #24 from Jeff Mahoney <jeffm@novell.com> 2008-02-20 17:46:08 MST --- Hrm, the goal was to get an un-truncated log, but that didn't work. So, how about we try two things at once: I've built a -debug kernel that contains things like lock debugging, as well as and increased log buffer size. This isn't a kernel you'd want to run on a production system, but it contains a bunch of sanity checks that should help us here. http://ftp.suse.com/pub/people/jeffm/suse/testpkgs/345039/ Once you've installed the kernel, reboot, and re-run the tests as before. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039 User robin.listas@telefonica.net added comment https://bugzilla.novell.com/show_bug.cgi?id=345039#c25 --- Comment #25 from Carlos Robinson <robin.listas@telefonica.net> 2008-02-20 18:11:07 MST --- Thanks for your effort -- I hope we can catch this one :-) Ok, I will install it tomorrow. I see two rpms, I suppose I install both. What was the rpm command to install a supplementary kernel without removing the already installed one? I don't remember off-hand. If you don't know I'll look the man page again. Grub is no problem, I'll handle that ok. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039 User jeffm@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=345039#c26 --- Comment #26 from Jeff Mahoney <jeffm@novell.com> 2008-02-20 18:43:43 MST --- rpm -i <rpm> does it. You'll only need the .i386.rpm. The .nosrc.rpm is the source rpm. It's only "nosrc" instead of "src" because otherwise we'd have to ship a full copy of the linux kernel source for _every_ flavor of the kernel we ship. That's a lot of space to waste. ;) -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039 User lnussel@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=345039#c27 --- Comment #27 from Ludwig Nussel <lnussel@novell.com> 2008-02-21 01:00:12 MST --- FWIW the problem I had at home was caused by the sata driver and is fixed/worked around meanwhile. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039 User robin.listas@telefonica.net added comment https://bugzilla.novell.com/show_bug.cgi?id=345039#c28 Carlos Robinson <robin.listas@telefonica.net> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEEDINFO |REOPENED Info Provider|robin.listas@telefonica.net | --- Comment #28 from Carlos Robinson <robin.listas@telefonica.net> 2008-02-21 05:37:45 MST --- Created an attachment (id=196255) --> (https://bugzilla.novell.com/attachment.cgi?id=196255) kernel log Here you are. I had to copy over 9 GiB till the crash happened, I almost thought it did not want to help us today. Yiks! But I think that something wrong happened to the trace: it is to small. The dmesg file doesn't contain anything. I think I saw more in the terminal, but I can't be sure. I did "dmesg | tee somefile", it started to pour pages, then stopped. The kernel log file contains less than a screen of traces. Ok, if you think something happened to the data, I'll repeat the test, dumping via ssh to the aux pc. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039 User robin.listas@telefonica.net added comment https://bugzilla.novell.com/show_bug.cgi?id=345039#c29 --- Comment #29 from Carlos Robinson <robin.listas@telefonica.net> 2008-02-21 05:39:39 MST --- Created an attachment (id=196260) --> (https://bugzilla.novell.com/attachment.cgi?id=196260) messages.gz, debug kernel -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039 User robin.listas@telefonica.net added comment https://bugzilla.novell.com/show_bug.cgi?id=345039#c30 --- Comment #30 from Carlos Robinson <robin.listas@telefonica.net> 2008-02-21 05:40:10 MST --- Created an attachment (id=196261) --> (https://bugzilla.novell.com/attachment.cgi?id=196261) dmesg_debug_kernel.out.gz -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039 User robin.listas@telefonica.net added comment https://bugzilla.novell.com/show_bug.cgi?id=345039#c31 --- Comment #31 from Carlos Robinson <robin.listas@telefonica.net> 2008-02-21 05:40:51 MST --- Created an attachment (id=196262) --> (https://bugzilla.novell.com/attachment.cgi?id=196262) boot.msg_debug_kernel.gz -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039 User robin.listas@telefonica.net added comment https://bugzilla.novell.com/show_bug.cgi?id=345039#c32 --- Comment #32 from Carlos Robinson <robin.listas@telefonica.net> 2008-02-21 12:41:43 MST --- One detail: this time, when lazy umounting everything, I got the message: can't lock file /etc/mtab~: timed out This is the first time I see this. Ah, please tell me if the test was successful, and you saw something - even if then it takes months to repair; but knowing that the fault is finally known would be be something. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039 User jeffm@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=345039#c33 --- Comment #33 from Jeff Mahoney <jeffm@novell.com> 2008-02-21 13:02:22 MST --- It certainly seems as though something happened to the data. The dmesg output provided is lacking any sort of stack trace information. I appreciate your efforts in trying to track this down. It can only be more frustrating for you than it is for me. When you commented that the output was only half a page, I was hopeful that it was a lockdep report citing a deadlock problem. It just ended up being a really truncated stack trace. Given that the next line after the trace was a "klog starting" line, it looks as though it was from a previous test run. Hopefully dumping it over the network will give the full output. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039 User robin.listas@telefonica.net added comment https://bugzilla.novell.com/show_bug.cgi?id=345039#c34 Carlos Robinson <robin.listas@telefonica.net> changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #196255|0 |1 is obsolete| | Attachment #196260|0 |1 is obsolete| | Attachment #196261|0 |1 is obsolete| | Attachment #196262|0 |1 is obsolete| | --- Comment #34 from Carlos Robinson <robin.listas@telefonica.net> 2008-02-21 14:27:59 MST --- Created an attachment (id=196436) --> (https://bugzilla.novell.com/attachment.cgi?id=196436) boot.msg.gz, debug kernel, good log Ok, this time I think we got the data. dmesg worked fine both locally and in the remote ssh. I don't know why it failed the other time: I suppose that the command "tee" crashed and impeded the correct saving of the file. I did notice "tee" crashing, but I did not notice that the /var/log/kernel file didn't had all the data. The restart you saw in the previous log is simply the next reboot of the computer: I edited the log to remove previous lines to this debug session, and maybe the next normal session, to save size. I'm attaching this time only the dmesg output, because I think it is complete. If you want kernel and messages log, just ask for it. And obviously, yes, I want to have this solved O:-) I can make do rebooting to 10.2 and saving what I need, but... -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039 User jeffm@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=345039#c35 --- Comment #35 from Jeff Mahoney <jeffm@novell.com> 2008-03-17 13:45:02 MST --- Actually, that looks just like boot.msg, not dmesg. That ends up silencing most of the kernel output during boot once syslog starts. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039 User robin.listas@telefonica.net added comment https://bugzilla.novell.com/show_bug.cgi?id=345039#c36 --- Comment #36 from Carlos Robinson <robin.listas@telefonica.net> 2008-03-17 14:25:48 MST --- Created an attachment (id=202656) --> (https://bugzilla.novell.com/attachment.cgi?id=202656) dmesg.out.gz file Indeed, because the attachment of Comment #34 is the boot log file. This was a month ago, and now I can't remember how or why the dmesg file wasn't attached. Maybe firefox crashed or stuck. Anyway, here goes dmesg. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039 User jeffm@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=345039#c37 Jeff Mahoney <jeffm@novell.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|REOPENED |ASSIGNED --- Comment #37 from Jeff Mahoney <jeffm@novell.com> 2008-03-18 12:02:00 MST --- Ok, here's what I see that's of potential interest: xfsdatad/0 D 00000000 7000 1958 2 (L-TLB) f60dfee4 00000046 00000001 00000000 f60dfe78 c04aa718 c04aa518 c044fa00 c044fa00 00000046 00000009 df8934c0 79edc300 000000ac 00000000 df8935f0 c1a0da00 00000000 0000000a 00000000 c1b7f828 0044788f f789b70c 00000000 Call Trace: [<c01e1ebd>] rwsem_down_failed_common+0x14a/0x169 [<c02e475e>] rwsem_down_write_failed+0x1b/0x23 [<c02e4826>] call_rwsem_down_write_failed+0x6/0x10 [<f941e854>] xfs_ilock+0x45/0x65 [xfs] [<f943fab9>] xfs_setfilesize+0x35/0xc3 [xfs] [<f943fc31>] xfs_end_bio_delalloc+0x10/0x1a [xfs] [<c0132061>] run_workqueue+0x7d/0x110 [<c01329e1>] worker_thread+0xbe/0xcd [<c01350ea>] kthread+0x3b/0x61 [<c0106167>] kernel_thread_helper+0x7/0x10 loop1 D EE785D4C 5148 5569 2 (L-TLB) ee785dc4 00000046 00000086 ee785d4c 00000086 dfeac914 ee785d60 c044fa00 c044fa00 dfeac890 0000000a ee765830 79edc300 000000ac 00000000 ee765960 c1a0da00 00000000 dfeac890 ee785d88 c1b7f828 ee785d94 c01c9b53 ee785dac Call Trace: [<c02e3702>] io_schedule+0x34/0x53 [<c0154519>] sync_page+0x3b/0x3f [<c02e38eb>] __wait_on_bit+0x36/0x5d [<c0154683>] wait_on_page_bit+0x5d/0x64 [<c0159183>] write_cache_pages+0x14e/0x27b [<c01592d2>] generic_writepages+0x22/0x2a [<f9440089>] xfs_vm_writepages+0x36/0x3c [xfs] [<c01592fd>] do_writepages+0x23/0x34 [<c015520c>] __filemap_fdatawrite_range+0x60/0x6c [<c0155413>] filemap_fdatawrite+0x26/0x28 [<f926541f>] sync_file+0x28/0x62 [loop] [<f9266904>] loop_thread+0x3ef/0x428 [loop] [<c01350ea>] kthread+0x3b/0x61 [<c0106167>] kernel_thread_helper+0x7/0x10 mc D EE5DD75C 5152 5710 5532 (NOTLB) ee5dd770 00000086 00000002 ee5dd75c ee5dd74c 00000000 00000001 c044fa00 c044fa00 00000010 00000007 eeaa0c90 eb727b00 000000bc 00000000 eeaa0dc0 c1a0da00 00000000 00012cff 00000000 c1b7f828 0001f36c 0000d7ff ffffffff Call Trace: [<c02e4a17>] __down+0xc3/0xd7 [<c02e479a>] __down_failed+0xa/0x10 [<f94418b6>] xfs_buf_lock+0x4b/0x4e [xfs] [<f942fb90>] xfs_getsb+0x25/0x31 [xfs] [<f94355da>] xfs_trans_getsb+0x3b/0x6d [xfs] [<f94346eb>] xfs_trans_apply_sb_deltas+0x15/0x4a9 [xfs] [<f9434c6e>] _xfs_trans_commit+0xef/0x2fd [xfs] [<f94261bc>] xfs_iomap_write_allocate+0x2ea/0x422 [xfs] [<f9424f6b>] xfs_iomap+0x34c/0x3e0 [xfs] [<f94460de>] xfs_bmap+0x2c/0x32 [xfs] [<f9440024>] xfs_map_blocks+0x3c/0x6b [xfs] [<f9440ff0>] xfs_page_state_convert+0x30f/0x5f6 [xfs] [<f944140b>] xfs_vm_writepage+0x93/0xc8 [xfs] [<c0158e2b>] __writepage+0xb/0x25 [<c01591a9>] write_cache_pages+0x174/0x27b [<c01592d2>] generic_writepages+0x22/0x2a [<f9440089>] xfs_vm_writepages+0x36/0x3c [xfs] [<c01592fd>] do_writepages+0x23/0x34 [<c018c153>] __writeback_single_inode+0x1a0/0x337 [<c018c658>] sync_sb_inodes+0x18e/0x260 [<c018ca81>] writeback_inodes+0x84/0xd3 [<c0159665>] balance_dirty_pages_ratelimited_nr+0xea/0x1ac [<c0155af6>] generic_file_buffered_write+0x47a/0x549 [<f94473b3>] xfs_write+0x614/0x941 [xfs] [<f94439c8>] xfs_file_aio_write+0x6a/0x72 [xfs] [<c0172ca7>] do_sync_write+0xc5/0x102 [<c0173480>] vfs_write+0xaf/0x13e [<c0173a29>] sys_write+0x3d/0x61 [<c0104e86>] sysenter_past_esp+0x5f/0x89 pdflush D EE855D80 5512 5781 2 (L-TLB) ee855d94 00000046 00000002 ee855d80 ee855d70 00000000 00000000 c044fa00 c044fa00 00000000 0000000a ee23f300 76116f00 000000b3 00000000 ee23f430 c1a0da00 00000000 f0150290 f7b78140 df9dc438 0001cbc0 f92a5ca0 ffffffff Call Trace: [<c02e3702>] io_schedule+0x34/0x53 [<c0154519>] sync_page+0x3b/0x3f [<c02e3828>] __wait_on_bit_lock+0x36/0x60 [<c01544ca>] __lock_page+0x54/0x5c [<c0159142>] write_cache_pages+0x10d/0x27b [<c01592d2>] generic_writepages+0x22/0x2a [<f9440089>] xfs_vm_writepages+0x36/0x3c [xfs] [<c01592fd>] do_writepages+0x23/0x34 [<c018c06b>] __writeback_single_inode+0xb8/0x337 [<c018c658>] sync_sb_inodes+0x18e/0x260 [<c018ca81>] writeback_inodes+0x84/0xd3 [<c015991e>] wb_kupdate+0x84/0xe4 [<c0159d13>] pdflush+0x117/0x1ae [<c01350ea>] kthread+0x3b/0x61 [<c0106167>] kernel_thread_helper+0x7/0x10 -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039 User jeffm@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=345039#c38 Jeff Mahoney <jeffm@novell.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |NEEDINFO Info Provider| |robin.listas@telefonica.net --- Comment #38 from Jeff Mahoney <jeffm@novell.com> 2008-03-18 12:08:50 MST --- There have been a *lot* of upstream XFS changes recently. Would you be willing to test an openSUSE 11.0 Alpha kernel to see if this is still reproducable? -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039 User robin.listas@telefonica.net added comment https://bugzilla.novell.com/show_bug.cgi?id=345039#c39 --- Comment #39 from Carlos Robinson <robin.listas@telefonica.net> 2008-03-18 12:44:28 MST --- Yes, of course. Easiest for me is to try an os11.0 kernel on os10.3, if that is possible. Just tell me which one to download and install. The advantage for me, if it works, would be to "repair" 10.3 that way. If that's not possible, I have another partition on which I have intended to install alpha, but didn't do it yet. It would be longer to do, but I would. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039 User robin.listas@telefonica.net added comment https://bugzilla.novell.com/show_bug.cgi?id=345039#c40 Carlos Robinson <robin.listas@telefonica.net> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEEDINFO |ASSIGNED Info Provider|robin.listas@telefonica.net | --- Comment #40 from Carlos Robinson <robin.listas@telefonica.net> 2008-03-18 12:45:11 MST --- Changing status -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039 User robin.listas@telefonica.net added comment https://bugzilla.novell.com/show_bug.cgi?id=345039#c41 --- Comment #41 from Carlos Robinson <robin.listas@telefonica.net> 2008-03-30 04:23:12 MST --- (In reply to comment #39 from Carlos Robinson)
If that's not possible, I have another partition on which I have intended to install alpha, but didn't do it yet. It would be longer to do, but I would.
I have tested alpha 11, the Friday release. minas-morgul:~ # uname -a Linux minas-morgul 2.6.25-rc6-git5-10-default #1 SMP 2008-03-20 22:57:18 +0100 i686 i686 i386 GNU/Linux The "problem" is very much alive and kicking here. :-/ I don't have a log report, I had forgotten the "echo t > /proc/sysrq-trigger" command and once the system was crashed I could no longer mount the partition with my notes to refresh my memory. If you want it, tell me and I will try again. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039 User jeffm@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=345039#c42 Jeff Mahoney <jeffm@novell.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |NEEDINFO Info Provider| |robin.listas@telefonica.net --- Comment #42 from Jeff Mahoney <jeffm@novell.com> 2008-03-30 08:01:26 MST --- It would be useful, but not essential. I was hoping to see if any of the recent fixes helped. Can you try to reproduce with kernel-vanilla from 10.3 as well? This will show whether or not the problem was introduced by any of our patches. I understand this can be a pain. I don't have an environment set up locally with which to track this down. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039 User robin.listas@telefonica.net added comment https://bugzilla.novell.com/show_bug.cgi?id=345039#c43 Carlos Robinson <robin.listas@telefonica.net> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEEDINFO |ASSIGNED Info Provider|robin.listas@telefonica.net | --- Comment #43 from Carlos Robinson <robin.listas@telefonica.net> 2008-03-30 13:37:22 MST --- Ok, I just tried with kernel vmlinuz-2.6.22.5-31-vanilla. I copied 9 GiB from two encrypted partitions to another encrypted partition, two times (delete destination space, copy, delete, copy again): no problem, no crash, fast copy. So, yes, the problem is an addition, and the new XFS things don't solve the problem. Hopefully this can help you to find the problem... for me, I know that I don't need to boot into 10.2 to do my encrypted backups, I can use the vanilla kernel in 10.3 - but I don't know what else will stop to work, and what security problems have not been plugged. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039 User robin.listas@telefonica.net added comment https://bugzilla.novell.com/show_bug.cgi?id=345039#c44 --- Comment #44 from Carlos Robinson <robin.listas@telefonica.net> 2008-05-18 05:42:07 MST --- Created an attachment (id=216233) --> (https://bugzilla.novell.com/attachment.cgi?id=216233) messages log file I have checked current factory to see if the problem is still active - and it is, very much so :-( Kernel: 2.6.25.3-2-default Factory version: probably Beta 3. Fully up to date. Attached goes the /var/log/messages file, and the boot.msg file. Knowing that 10.3 Vanilla does not have this bug, I would willing to try your kernel with some patches removed. If you can not provide such a kernel, I'm wiling to compile it myself, but I'd need guidance as to what patches to remove and how: I have compiled kernels many times, but I have never removed patches; and in this case I have no idea what to remove, of course It takes about 4 hours to compile a kernel here, plus testing time. But if you can not reproduce this bug on your premises, it stands to reason that it must be me who does the testing: but I _need_ _guidance_ as to what to test. This bug is ugly: so far I'm the only known person bitten, but there could be more out there, and you never know what a broken piece of software might do any other day. Please help. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039 User robin.listas@telefonica.net added comment https://bugzilla.novell.com/show_bug.cgi?id=345039#c45 --- Comment #45 from Carlos Robinson <robin.listas@telefonica.net> 2008-05-18 05:42:45 MST --- Created an attachment (id=216234) --> (https://bugzilla.novell.com/attachment.cgi?id=216234) factory_11.0_20080517_boot.msg.gz -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039 Carlos Robinson <robin.listas@telefonica.net> changed: What |Removed |Added ---------------------------------------------------------------------------- Component|Kernel |Kernel OS/Version|openSUSE 10.3 |openSUSE 11.0 Product|openSUSE 10.3 |openSUSE 11.0 Target Milestone|--- |Beta 3 Version|Final |Beta 3 -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039 User robin.listas@telefonica.net added comment https://bugzilla.novell.com/show_bug.cgi?id=345039#c46 --- Comment #46 from Carlos Robinson <robin.listas@telefonica.net> 2008-05-22 13:18:16 MST --- Created an attachment (id=217613) --> (https://bugzilla.novell.com/attachment.cgi?id=217613) Output from "top" during the lock in factory This is a log taken from the command 'top' during the lock. The copy process was made with 'mc', so look at this line: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ WCHAN Flags COMMAND 10628 crypt 20 0 6688 2836 1956 D 0.0 0.3 0:03.12 xfs_ilock ..4.2... mc Ii hope this helps a bit more to learn what is happening. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039 User robin.listas@telefonica.net added comment https://bugzilla.novell.com/show_bug.cgi?id=345039#c47 --- Comment #47 from Carlos Robinson <robin.listas@telefonica.net> 2008-07-16 05:25:04 MDT --- I'm still waiting for trying to solve this (see comment #44) Meanwhile, I have reproduced the problem using LUKS encryption (10.3), creating encrypted filesystems using Yast. /etc/crypttab: cr_crypta_f5_dvd.mm.x /Grande/imgs/crypta_f5_dvd.mm.x none noauto cr_crypta_f6_dvd.mm.x /Grande/imgs/crypta_f6_dvd.mm.x none noauto (I changed the last "none" (written by Yast) to "noauto") /etc/fstab: /dev/mapper/cr_crypta_f5_dvd.mm.x /mnt/crypta.mm_dvd5.x xfs noatime,noauto 0 0 /dev/mapper/cr_crypta_f6_dvd.mm.x /mnt/crypta.mm_dvd6.x xfs noatime,noauto 0 0 nimrodel:~ # file /Grande/imgs/crypta_f[56]*x /Grande/imgs/crypta_f5_dvd.mm.x: LUKS encrypted file, ver 1 [aes, cbc-essiv:sha256, sha1] UUID: 1ec0ab07-73d0-4ccd-b8df-08076ff /Grande/imgs/crypta_f6_dvd.mm.x: LUKS encrypted file, ver 1 [aes, cbc-essiv:sha256, sha1] UUID: bb7b10ee-74a1-43aa-b55a-df39ca3 These appeared to work for some hours, then locked. I rebooted, tried again, locked fast. I'll attach the dmesg output later. I also made filesystems in the same way, but using ext3 and reiserfs, which do not lock (so far, after writing +40 GB). So the problem is indeed associated with XFS. There is something ugly hidden in there, waiting to come out when you less expect it. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039 User robin.listas@telefonica.net added comment https://bugzilla.novell.com/show_bug.cgi?id=345039#c48 --- Comment #48 from Carlos Robinson <robin.listas@telefonica.net> 2008-07-16 05:29:55 MDT --- Created an attachment (id=228120) --> (https://bugzilla.novell.com/attachment.cgi?id=228120) kernel log during crash using LUKS XFS crypto filesystem (re comment 46) -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039 User robin.listas@telefonica.net added comment https://bugzilla.novell.com/show_bug.cgi?id=345039#c49 --- Comment #49 from Carlos Robinson <robin.listas@telefonica.net> 2008-07-16 05:31:23 MDT --- (In reply to comment #48 from Carlos Robinson)
Created an attachment (id=228120) --> (https://bugzilla.novell.com/attachment.cgi?id=228120) [details] kernel log during crash using LUKS XFS crypto filesystem (re comment 46)
Errata: (re comment 47) -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039 User robin.listas@telefonica.net added comment https://bugzilla.novell.com/show_bug.cgi?id=345039#c50 --- Comment #50 from Carlos Robinson <robin.listas@telefonica.net> 2008-12-29 12:13:40 MST --- Created an attachment (id=262656) --> (https://bugzilla.novell.com/attachment.cgi?id=262656) messages log file opensuse 11.1 GM Well, the bug remains alive on OpenSUSE 11.1 GM. I tried twice. First attempt, an encrypted XFS filesystem made with 10.3: /etc/crypttab: mycr_mm_f /data/biggy/crypta_f.mm.x none \ cipher=twofish-cbc-plain,size=256,hash=sha512,noauto,loop /etc/fstab: /dev/mapper/mycr_mm_f /data/crypta.mm.x xfs \ noauto,user,noatime,nodiratime 0 0 You will find the messages log attached, it is the first part. Then I tried with another filesystem, XFS LUKS made with 11.0: /etc/crypttab: mcr_mm_f5_dvd.x /data/Grande/imgs/crypta_f5_dvd.mm.x none noauto /etc/fstab: /dev/mapper/mcr_mm_f5_dvd.x /data/crypta.mm_dvd5.x xfs noatime,noauto 0 0 The log is in the second part of the file. Both attempts failed when I tried to write (big files), nothing got written in fact. Procedure is the same as previously. As I see no progress in this matter, I have destroyed my big XFS encrypted filesystem and created another one as reiserfs. Notice that I have kept that filesystem for a year for the sole purpose of your tests, and I have seen no movement at all on this bug - so I destroy it. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039 Carlos Robinson <robin.listas@telefonica.net> changed: What |Removed |Added ---------------------------------------------------------------------------- Component|Kernel |Kernel OS/Version|openSUSE 11.0 |openSUSE 11.1 Product|openSUSE 11.0 |openSUSE 11.1 Target Milestone|Beta 3 |--- Version|Beta 3 |Final -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
participants (1)
-
bugzilla_noreply@novell.com