[Bug 823797] New: Kernel Bug triggered by Soft Lockup when using mkfs.ext[2|3|4] to iscsi partition on EQLOGIC 100E-00
https://bugzilla.novell.com/show_bug.cgi?id=823797
https://bugzilla.novell.com/show_bug.cgi?id=823797#c0

           Summary: Kernel Bug triggered by Soft Lockup when using mkfs.ext[2|3|4] to iscsi partition on EQLOGIC 100E-00
    Classification: openSUSE
           Product: openSUSE 12.3
           Version: Final
          Platform: x86-64
        OS/Version: openSUSE 12.3
            Status: NEW
          Severity: Critical
          Priority: P5 - None
         Component: Kernel
        AssignedTo: kernel-maintainers@forge.provo.novell.com
        ReportedBy: andrew.holland@nordictelecom.fi
         QAContact: qa-bugs@suse.de
          Found By: ---
           Blocker: ---

User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.93 Safari/537.36

With 64-bit kernel 3.7.10-1.11-default and open-iscsi 2.0.870-47.4.1, using the ext libraries 1.42.6-2.2.1, I trigger a kernel bug whenever I try to format a partition on our EQLOGIC 100E-00 device. This occurs whether the partition is on a GPT or an MS-DOS style partitioning scheme.

The bug reported to the server debug console is:

kernel:[ 148.953735] BUG: soft lockup - CPU#5 stuck for 22s! [mkfs.ext4:4631]

* Additional tests:
  Reiserfs and swap partitions are created without triggering the problem.
  The disk can be written to and read with dd without triggering the problem.
  When a filesystem (reiserfs tested) is created, standard filesystem activities (create, append, delete, read) work as normal.

I am unsure whether the issue lies in the kernel, open-iscsi, or the ext filesystem library/tools; it may be a combination of all three.

Reproducible: Always

Steps to Reproduce:
1. Attach to the iSCSI device (using open-iscsi, CHAP authentication).
   - CHAP authentication requires custom, non-YaST editing of files.
   - In my case, the device is attached at boot (config set to automatic, not manual).
2. Create a partition on the device using parted.
   - To ensure this occurs every time, I have gone as far as writing zeroes to the entire drive using dd, then creating both a GPT and an MS-DOS partitioning scheme.
3. Attempt to format the partition via mkfs.ext[n], where n is 2, 3, or 4.
   - This occurs no matter which "version" is used.
4. Output will show:
     mkfs.ext4 /dev/sdc1
     mke2fs 1.42.6 (21-Sep-2012)
   At this point, all output on the associated TTY stops until the kernel bug message appears:
     kernel:[ 148.953735] BUG: soft lockup - CPU#5 stuck for 22s! [mkfs.ext4:4631]

* Additional info:
  When using tmux, screen, or other TTYs, journald shows the following messages filling the log:
    Jun 07 11:25:38 mysql1.nordictele.com kernel: sdc1: rw=129, want=8384512, limit=1951744
    Jun 07 11:25:38 mysql1.nordictele.com systemd-journal[379]: Missed 240 kernel messages

Actual Results:
The system slows down as the logs fill with 'access beyond end of device' messages. The formatting process (mkfs.ext[2|3|4]) cannot be killed via Ctrl-C or "kill". Other TTYs can be opened to perform actions or look at logs, but the system eventually locks up. (I have not had a reboot occur; I have power-cycled the machine to restore functionality each time.)

Expected Results:
Partition initialized with an ext-style filesystem (and optional journal).

System is 64-bit; multi-core with 24 cores:
- model name : Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz

Results of the 'free' command (showing memory):
             total       used       free     shared    buffers     cached
Mem:      32967184    3625316   29341868          0       7572    1124664
-/+ buffers/cache:    2493080   30474104
Swap:     33551356          0   33551356

-- 
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=823797#c1
--- Comment #1 from Jiri Slaby
> At this point, all output on the associated TTY stops until the Kernel Bug message appears:
> kernel:[ 148.953735] BUG: soft lockup - CPU#5 stuck for 22s! [mkfs.ext4:4631]
Could you grab a photo of the whole report?
https://bugzilla.novell.com/show_bug.cgi?id=823797#c2
--- Comment #2 from Andrew Holland
https://bugzilla.novell.com/show_bug.cgi?id=823797#c3
--- Comment #3 from Jiri Slaby
https://bugzilla.novell.com/show_bug.cgi?id=823797#c4
--- Comment #4 from Andrew Holland
https://bugzilla.novell.com/show_bug.cgi?id=823797#c5
--- Comment #5 from Jiri Slaby
> sd 8:0:0:0: Attached scsi generic sg3 type 0
> sd 8:0:0:0: [sdc] 2147512320 512-byte logical blocks: (1.09 TB/1.00 TiB)
> ...
> sdc1: rw=129, want=8384512, limit=2147508224
> attempt to access beyond end of device
> sdc1: rw=129, want=18446744073709549568, limit=2147508224
Huh? Is this a bug in ext-utils? Honza, any ideas?
https://bugzilla.novell.com/show_bug.cgi?id=823797#c6
--- Comment #6 from Jiri Slaby
https://bugzilla.novell.com/show_bug.cgi?id=823797#c7
--- Comment #7 from Jan Kara
https://bugzilla.novell.com/show_bug.cgi?id=823797#c8
--- Comment #8 from Andrew Holland
https://bugzilla.novell.com/show_bug.cgi?id=823797#c9
--- Comment #9 from Andrew Holland
https://bugzilla.novell.com/show_bug.cgi?id=823797#c10
--- Comment #10 from Jan Kara
https://bugzilla.novell.com/show_bug.cgi?id=823797#c11
--- Comment #11 from Jan Kara
https://bugzilla.novell.com/show_bug.cgi?id=823797#c12
--- Comment #12 from Andrew Holland
https://bugzilla.novell.com/show_bug.cgi?id=823797#c13
--- Comment #13 from Jiri Slaby
> I can test starting at Midnight Helsinki time - would that be too early?
> (What time zone is used for builds - CET? UTC? MDT?)
Around 2 am CET IIRC. You can always check the changelog, it has to contain "block: discard granularity might not be power of 2."
https://bugzilla.novell.com/show_bug.cgi?id=823797#c14
--- Comment #14 from Andrew Holland
https://bugzilla.novell.com/show_bug.cgi?id=823797#c15
--- Comment #15 from Andrew Holland
https://bugzilla.novell.com/show_bug.cgi?id=823797#c16
--- Comment #16 from Jiri Slaby
https://bugzilla.novell.com/show_bug.cgi?id=823797#c17
--- Comment #17 from Jiri Slaby
> It's not built yet...
Better phrased as "not published yet". https://api.opensuse.org/build/Kernel:openSUSE-12.3/standard/x86_64/kernel-d... should work if you have an OBS account...
https://bugzilla.novell.com/show_bug.cgi?id=823797#c18
--- Comment #18 from Andrew Holland
https://bugzilla.novell.com/show_bug.cgi?id=823797#c19
--- Comment #19 from Jan Kara
https://bugzilla.novell.com/show_bug.cgi?id=823797#c
Swamp Workflow Management
https://bugzilla.novell.com/show_bug.cgi?id=823797#c20
--- Comment #20 from Swamp Workflow Management
http://bugzilla.novell.com/show_bug.cgi?id=823797
Swamp Workflow Management