[Bug 345039] New: Encrypted filesystem write blocks the system
https://bugzilla.novell.com/show_bug.cgi?id=345039

           Summary: Encrypted filesystem write blocks the system
           Product: openSUSE 10.3
           Version: Final
          Platform: i686
        OS/Version: openSUSE 10.3
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: Security
        AssignedTo: security-team@suse.de
        ReportedBy: robin.listas@telefonica.net
         QAContact: qa@suse.de
          Found By: ---

(I classify this provisionally as a security problem, considering encryption a security feature; please reclassify as needed).

Setup description (longish)
----------------------------

I have several encrypted filesystems inherited from 10.2 and before (the system has been upgraded over time from 8.1 up to 10.3; i.e., this is not a fresh install). The CPU is a P-IV with 1 GiB of RAM, and I have 3 HD (± 300, 120 & 60 GB) and more than 16 partitions on two of them.

The main encrypted partition is activated via "/etc/init.d/boot.crypto start" during boot up, and /etc/cryptotab:

  /dev/loop0 /dev/disk/by-id/ata-ST3320620A_5QF2M56F-part15 \
      /cripta xfs twofish256 noatime,nodiratime

It is mounted via loop0, /dev/mapper/cryptotab_loop0, and I think /dev/dm-0, but exactly how, I don't understand (please, is there documentation anywhere? :-?)

  nimrodel:~ # mount -l | grep cripta
  /dev/mapper/cryptotab_loop0 on /cripta type xfs (rw,noatime,nodiratime) [crypta_320]

  nimrodel:~ # dmsetup info
  Name:              cryptotab_loop0
  State:             ACTIVE
  Tables present:    LIVE
  Open count:        1
  Event number:      0
  Major, minor:      253, 0
  Number of targets: 1

That one has not given problems as far as I know.

However, I have three more encrypted filesystems, on files (not partitions) and mounted manually. Many of the files stored on them are 300-400 MiB. One of the filesystems is largish (28 GiB), and the other two have the exact size of a DVD (4.4 GiB), because I burn the file directly to a DVD as my backup procedure.
Thus, I can also mount encrypted DVDs from the command line using "mount" as user, and all this through entries in fstab:

  /biggy/crypta_f.mm.x /mnt/crypta.mm.x xfs \
      noauto,user,noatime,nodiratime,loop,encryption=twofish256 1 4
  /Grande/imgs/crypta_f1_dvd.mm.x /mnt/crypta.mm_dvd1.x xfs \
      noauto,user,loop,encryption=twofish256 1 4
  /Grande/imgs/crypta_f2_dvd.mm.x /mnt/crypta.mm_dvd2.x xfs \
      noauto,user,loop,encryption=twofish256 1 4

  nimrodel:~ # mount -l | grep twofish
  /biggy/crypta_f.mm.x on /mnt/crypta.mm.x type xfs (rw,noexec,nosuid,nodev,noatime,nodiratime,loop=/dev/loop1,encryption=twofish256)

  nimrodel:~ # losetup -a
  /dev/loop0: [000e]:4593 (/dev/disk/by-id/ata-ST3320620A_5QF2M56F-part15)
  /dev/loop1: [1650]:135 (/biggy/crypta_f.mm.x), encryption twofish (type 18), key length 32

This setup worked very well under 10.2, but in 10.3 I have problems: the filesystem crashes or locks up when I write to the large encrypted one.

Problem description
--------------------

Reading from /mnt/crypta.mm.x (28 GiB) works (I managed to copy it elsewhere without a glitch). However, writing to this filesystem from one of the encrypted DVDs fails. To rule out the DVD subsystem, I copied the DVD to a file using dd, and loop-mounted that one:

  /dev/loop4: [0314]:9142822 (/Grande/imgs/roto), encryption twofish (type 18), key length 32

Well, whatever the data source, writing a set of large files (over 300 MiB each) to /mnt/crypta.mm.x locks the console where the copy is taking place. The copy process stops. It is unkillable. Umount of that filesystem locks (and umount is unkillable). Logout of GNOME locks if an xterm is locked by this problem. Reboot of the system locks, requiring a hard power off, and later, filesystem repair (not, it seems, of /mnt/crypta.mm.x).

If GNOME is active, I see in the CPU system monitor applet that "IOWait" is 100%, but CPU usage is very low (encryption uses a lot of CPU here). gkrellm ends by freezing.
If I try to "ls -l" the destination dir of the copy (that is locked, frozen), the terminal doing this ls also freezes. I have to lazy umount ("umount -l /mountpoints &") all the mountpoints I can, and then try to reboot (which hangs), and finally power off the machine forcefully.

There is absolutely nothing in the logs relating to this problem (I believe I know how to look into logs and find problems, even if not understanding them; that was my work specialty some years ago O:-) ).

I have checked the encrypted filesystem, and nothing is reported:

  nimrodel:~ # losetup -e twofish256 /dev/loop2 /biggy/crypta_f.mm.x
  Password:
  nimrodel:~ # file -s /dev/loop2
  /dev/loop2: SGI XFS filesystem data (blksz 4096, inosz 256, v2 dirs)
  nimrodel:~ # xfs_check /dev/loop2
  ERROR: The filesystem has valuable metadata changes in a log which needs to
  be replayed.  Mount the filesystem to replay the log, and unmount it before
  re-running xfs_check.  If you are unable to mount the filesystem, then use
  the xfs_repair -L option to destroy the log and attempt a repair.
  Note that destroying the log may cause corruption -- please attempt a mount
  of the filesystem before doing this.
  nimrodel:~ # mount /dev/loop2 /mnt/crypta.mm.x/
  nimrodel:~ # umount /dev/loop2
  nimrodel:~ # xfs_check /dev/loop2
  nimrodel:~ #

No errors, you see.

I'm aware that the filesystem encryption method has changed in 10.3, but the only document I have is the release notes. Probably I would have to use a different method than losetup, but I have no idea which exactly. But notice that the problem arises from a filesystem mounted directly from fstab: should this method no longer be used? I cannot mount this filesystem from /etc/cryptotab, as I need to mount and umount it manually, at will. In any case, the "classic" method should not freeze the computer, as it does :-?

I can run tests; right now I think I can reproduce the situation at will, as I had to reboot several times yesterday while trying to diagnose it (several hours).
But please, be careful with the suggested tests, as a failure means powering off forcefully, which means a minimum of half an hour per test, provided I don't lose data in the process. So far I have been lucky. And please, try to limit tests to runlevel 3, as a GNOME lockup plays havoc with things like command history and some gadgets.

I haven't seen any messages related to the lock. Not in dmesg, not in consoles, log files, etc. Nothing. Not related to the problem even remotely, and sometimes not a message about anything at all. I'm willing to recompile the kernel and increase verbosity somewhere, if you know that somewhere (I don't).

Mmmm... I think I wrote all I know about it. I hope you can decipher the problem!

-- 
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=345039#c1
--- Comment #1 from Christian Boltz
https://bugzilla.novell.com/show_bug.cgi?id=345039#c2
--- Comment #2 from Carlos Robinson <robin.listas@telefonica.net>
https://bugzilla.novell.com/show_bug.cgi?id=345039#c3
--- Comment #3 from Jeff Mahoney <jeffm@novell.com>
https://bugzilla.novell.com/show_bug.cgi?id=345039#c4
--- Comment #4 from Carlos Robinson <robin.listas@telefonica.net>
> There could be a deadlock or an Oops occurring. Since you're running a GUI and the system is locking up in the manner it is, it's tough to debug.
The first time I was in gnome; later on, I always tested in runlevel 3. I will attach my report as a file. The gist of it: nothing.
> If you can reproduce this at will, could you do so in console mode after running
>   dmesg -n 8 ; klogconsole -r 0
No output at all. The cp command simply stops copying, and the CPU goes idle.
> Please do an alt+sysrq+t to dump the states of the processes on the system. If something is stuck in a deadlock, we should be able to see it.
No output at all. Don't we need to enable sysrq first somehow?
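On the question of enabling sysrq: on Linux the magic SysRq key is governed by /proc/sys/kernel/sysrq, and a value of 0 there disables it, which would explain a silent alt+sysrq+t. A minimal sketch of the check, assuming a root shell; the SYSRQ_CTL variable and both function names are hypothetical helpers introduced here for illustration, not something from this report:

```shell
#!/bin/sh
# Hypothetical helper, not from the thread: check and enable magic SysRq.
# SYSRQ_CTL defaults to the kernel control file; override it for a dry run.
SYSRQ_CTL="${SYSRQ_CTL:-/proc/sys/kernel/sysrq}"

sysrq_enabled() {
    # 0 means SysRq is disabled; any other value enables some or all keys.
    [ "$(cat "$SYSRQ_CTL")" != "0" ]
}

enable_sysrq() {
    # Writing 1 enables every SysRq function (needs root on a real system).
    echo 1 > "$SYSRQ_CTL"
}
```

Running `sysrq_enabled || enable_sysrq` before starting the copy would be one way to use it. The setting does not survive a reboot, so it would have to be repeated after each forced power off.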
> The best way to capture this info is with a serial console, but if you're unable to set one up - a photo of the Oops will do. If there's sysrq+t output, that's going to be more difficult to capture in its entirety.
Serial port would be possible (if you point me to a howto, including the pinout of the serial cable), but I don't think it is necessary: the system doesn't crash till I try to umount or access the affected filesystem. I can read the log, do dmesg, cat /proc/something, etc. However, there is no message anywhere. Nothing, nada.
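Since commands can still be run while the copy is hung, one cheap check is to list the processes in uninterruptible sleep (state "D" in ps), which is the usual state of a process that cannot be killed while waiting on I/O. A minimal sketch, assuming the procps version of ps; the function names are hypothetical and whether the hung cp and umount really show up in state D is an assumption, not something confirmed in this report:

```shell
#!/bin/sh
# Hypothetical helper, not from the thread: keep only the ps header and the
# rows whose STAT column starts with "D" (uninterruptible sleep).
filter_dstate() {
    awk 'NR == 1 || $2 ~ /^D/'
}

list_dstate() {
    # wchan shows the kernel function the task is sleeping in, if known.
    ps -eo pid,stat,wchan:32,comm | filter_dstate
}
```

Calling `list_dstate` from a second console after the hang would show which commands are blocked; the WCHAN column, when it is filled in, hints at where in the kernel they wait.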
https://bugzilla.novell.com/show_bug.cgi?id=345039#c5
--- Comment #5 from Carlos Robinson <robin.listas@telefonica.net>
https://bugzilla.novell.com/show_bug.cgi?id=345039#c6
--- Comment #6 from Carlos Robinson <robin.listas@telefonica.net>
https://bugzilla.novell.com/show_bug.cgi?id=345039#c7
--- Comment #7 from Frank Fiene <ffiene@veka.com>
https://bugzilla.novell.com/show_bug.cgi?id=345039#c8
--- Comment #8 from Carlos Robinson <robin.listas@telefonica.net>
https://bugzilla.novell.com/show_bug.cgi?id=345039#c9
--- Comment #9 from Frank Fiene <ffiene@veka.com>
https://bugzilla.novell.com/show_bug.cgi?id=345039
Greg Kroah-Hartman
https://bugzilla.novell.com/show_bug.cgi?id=345039#c10
--- Comment #10 from Jeff Mahoney <jeffm@novell.com>
https://bugzilla.novell.com/show_bug.cgi?id=345039#c11
--- Comment #11 from Carlos Robinson <robin.listas@telefonica.net>
> @Carlos:
> If the system is hanging and you can still issue commands (as seems to be the case), can you try the following:
>   echo t > /proc/sysrq-trigger
Do I do that before I start the file copy procedure, or after it hangs? I assume the former.
> Depending on whether writing files works, either save the output of dmesg to a file, or output it to another machine like this:
>   dmesg | ssh <somehost> "cat > dmesg.out"
> That will give a kernel stack trace of everything on the system, and should tell me where the encrypted block device is hanging.
Ok, I'll try, time permitting.
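The two requested steps can be sketched as one small function. SysRq-T prints a snapshot of every task's state and stack at the moment it is triggered, so this sketch assumes it is run after the copy has hung, from a root shell. The function name and the SYSRQ_TRIGGER and DMESG_CMD variables are hypothetical, added only so the steps can be tried without touching the real kernel interface:

```shell
#!/bin/sh
# Hypothetical helper, not from the thread: trigger a SysRq-T task dump and
# save the kernel ring buffer. Both variables exist only to allow a dry run.
SYSRQ_TRIGGER="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"
DMESG_CMD="${DMESG_CMD:-dmesg}"

capture_task_dump() {
    out_file="$1"
    # Ask the kernel to print the state and stack of every task (needs root).
    echo t > "$SYSRQ_TRIGGER" || return 1
    # Save the ring buffer, which now contains the dump, to a file on a
    # filesystem that is not hung.
    $DMESG_CMD > "$out_file"
}
```

For example, `capture_task_dump /tmp/dmesg.out`, with the path chosen on a filesystem other than the encrypted one. On a machine with many tasks the ring buffer may not hold the whole dump, in which case the start of the output will be missing.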
https://bugzilla.novell.com/show_bug.cgi?id=345039#c12
--- Comment #12 from Jeff Mahoney <jeffm@novell.com>
https://bugzilla.novell.com/show_bug.cgi?id=345039#c13
--- Comment #13 from Carlos Robinson <robin.listas@telefonica.net>
https://bugzilla.novell.com/show_bug.cgi?id=345039#c14
--- Comment #14 from Carlos Robinson <robin.listas@telefonica.net>
https://bugzilla.novell.com/show_bug.cgi?id=345039#c15
--- Comment #15 from Carlos Robinson <robin.listas@telefonica.net>
https://bugzilla.novell.com/show_bug.cgi?id=345039#c16
--- Comment #16 from Jeff Mahoney <jeffm@novell.com>
https://bugzilla.novell.com/show_bug.cgi?id=345039#c17
--- Comment #17 from Carlos Robinson <robin.listas@telefonica.net>
> That's a showstopper. Obviously we can't support kernels we don't ship. If you're able to reproduce without VMware and with a kernel as shipped by Novell, please re-open this bug.
You are unfair and unjust! :-/

If you had read this report carefully you'd know that's not the case. What about Ludwig? Is he also using non-standard software?

I said that yesterday I was running my own kernel. I repeat: _yesterday_. I can reproduce this any time with the default kernel, and I have done so often here. Also, VMware is not involved; it just happens that I used VMware yesterday to test something, for the first time since November. I simply forgot to reboot before submitting my report.

I was so happy that you gave me instructions to follow that I forgot to reboot to the default kernel before testing and reporting, and I told you so. I also told you I could repeat the test in standard conditions. You could simply have said: "Yes, please repeat the test in standard conditions." I would have set aside two or three hours to please you. But no, you go and close the bug as invalid after all these months of testing on your behalf.

YOU ARE UNFAIR! :-/
https://bugzilla.novell.com/show_bug.cgi?id=345039#c18
--- Comment #18 from Carlos Robinson <robin.listas@telefonica.net>
https://bugzilla.novell.com/show_bug.cgi?id=345039#c19
--- Comment #19 from Jeff Mahoney <jeffm@novell.com>
https://bugzilla.novell.com/show_bug.cgi?id=345039#c20
--- Comment #20 from Carlos Robinson <robin.listas@telefonica.net>
https://bugzilla.novell.com/show_bug.cgi?id=345039#c21
--- Comment #21 from Carlos Robinson <robin.listas@telefonica.net>
https://bugzilla.novell.com/show_bug.cgi?id=345039#c22
--- Comment #22 from Carlos Robinson <robin.listas@telefonica.net>
https://bugzilla.novell.com/show_bug.cgi?id=345039#c23
--- Comment #23 from Carlos Robinson <robin.listas@telefonica.net>
https://bugzilla.novell.com/show_bug.cgi?id=345039#c24
--- Comment #24 from Jeff Mahoney <jeffm@novell.com>
https://bugzilla.novell.com/show_bug.cgi?id=345039#c25
--- Comment #25 from Carlos Robinson <robin.listas@telefonica.net>
https://bugzilla.novell.com/show_bug.cgi?id=345039#c26
--- Comment #26 from Jeff Mahoney <jeffm@novell.com>
https://bugzilla.novell.com/show_bug.cgi?id=345039#c27
--- Comment #27 from Ludwig Nussel <lnussel@novell.com>
https://bugzilla.novell.com/show_bug.cgi?id=345039#c28
--- Comment #28 from Carlos Robinson <robin.listas@telefonica.net>
https://bugzilla.novell.com/show_bug.cgi?id=345039#c29
--- Comment #29 from Carlos Robinson <robin.listas@telefonica.net>
https://bugzilla.novell.com/show_bug.cgi?id=345039#c30
--- Comment #30 from Carlos Robinson <robin.listas@telefonica.net>
https://bugzilla.novell.com/show_bug.cgi?id=345039#c31
--- Comment #31 from Carlos Robinson <robin.listas@telefonica.net>
https://bugzilla.novell.com/show_bug.cgi?id=345039#c32
--- Comment #32 from Carlos Robinson <robin.listas@telefonica.net>
https://bugzilla.novell.com/show_bug.cgi?id=345039#c33
--- Comment #33 from Jeff Mahoney <jeffm@novell.com>
https://bugzilla.novell.com/show_bug.cgi?id=345039#c34
--- Comment #34 from Carlos Robinson <robin.listas@telefonica.net>
https://bugzilla.novell.com/show_bug.cgi?id=345039#c35
--- Comment #35 from Jeff Mahoney <jeffm@novell.com>
https://bugzilla.novell.com/show_bug.cgi?id=345039#c36
--- Comment #36 from Carlos Robinson <robin.listas@telefonica.net>
https://bugzilla.novell.com/show_bug.cgi?id=345039#c37
--- Comment #37 from Jeff Mahoney <jeffm@novell.com>
https://bugzilla.novell.com/show_bug.cgi?id=345039#c38
--- Comment #38 from Jeff Mahoney <jeffm@novell.com>
https://bugzilla.novell.com/show_bug.cgi?id=345039#c39
--- Comment #39 from Carlos Robinson <robin.listas@telefonica.net>
https://bugzilla.novell.com/show_bug.cgi?id=345039#c40
--- Comment #40 from Carlos Robinson <robin.listas@telefonica.net>
https://bugzilla.novell.com/show_bug.cgi?id=345039#c41
--- Comment #41 from Carlos Robinson <robin.listas@telefonica.net>
If that's not possible, I have another partition on which I have intended to install alpha, but didn't do it yet. It would be longer to do, but I would.
I have tested alpha 11, the Friday release.

  minas-morgul:~ # uname -a
  Linux minas-morgul 2.6.25-rc6-git5-10-default #1 SMP 2008-03-20 22:57:18 +0100 i686 i686 i386 GNU/Linux

The "problem" is very much alive and kicking here. :-/

I don't have a log report: I had forgotten the "echo t > /proc/sysrq-trigger" command, and once the system had crashed I could no longer mount the partition with my notes to refresh my memory. If you want it, tell me and I will try again.
https://bugzilla.novell.com/show_bug.cgi?id=345039#c42
--- Comment #42 from Jeff Mahoney <jeffm@novell.com>
https://bugzilla.novell.com/show_bug.cgi?id=345039#c43
--- Comment #43 from Carlos Robinson <robin.listas@telefonica.net>
https://bugzilla.novell.com/show_bug.cgi?id=345039#c44
--- Comment #44 from Carlos Robinson <robin.listas@telefonica.net>
https://bugzilla.novell.com/show_bug.cgi?id=345039#c45
--- Comment #45 from Carlos Robinson <robin.listas@telefonica.net>
https://bugzilla.novell.com/show_bug.cgi?id=345039
Carlos Robinson
https://bugzilla.novell.com/show_bug.cgi?id=345039#c46
--- Comment #46 from Carlos Robinson <robin.listas@telefonica.net>
https://bugzilla.novell.com/show_bug.cgi?id=345039#c47
--- Comment #47 from Carlos Robinson <robin.listas@telefonica.net>
https://bugzilla.novell.com/show_bug.cgi?id=345039#c48
--- Comment #48 from Carlos Robinson <robin.listas@telefonica.net>
https://bugzilla.novell.com/show_bug.cgi?id=345039#c49
--- Comment #49 from Carlos Robinson <robin.listas@telefonica.net>
Created an attachment (id=228120) --> (https://bugzilla.novell.com/attachment.cgi?id=228120) [details] kernel log during crash using LUKS XFS crypto filesystem (re comment 46)
Errata: (re comment 47)
https://bugzilla.novell.com/show_bug.cgi?id=345039#c50
--- Comment #50 from Carlos Robinson <robin.listas@telefonica.net>
https://bugzilla.novell.com/show_bug.cgi?id=345039
Carlos Robinson