[Bug 1044294] New: FCoE, BUG: sleeping function called from invalid context at ../kernel/locking/mutex.c:97, QEMU, x86_64
http://bugzilla.suse.com/show_bug.cgi?id=1044294

            Bug ID: 1044294
           Summary: FCoE, BUG: sleeping function called from invalid
                    context at ../kernel/locking/mutex.c:97, QEMU, x86_64
    Classification: openSUSE
           Product: openSUSE Distribution
           Version: Leap 42.2
          Hardware: x86-64
                OS: openSUSE 42.2
            Status: NEW
          Severity: Major
          Priority: P5 - None
         Component: Kernel
          Assignee: kernel-maintainers@forge.provo.novell.com
          Reporter: holger@fam-schranz.de
        QA Contact: qa-bugs@suse.de
          Found By: ---
           Blocker: ---

Environment: 3 virtual machines connected with FCoE; 2 of them are workers and 1 is a RAID server.

Issue: This issue always occurs on the RAID server after a while, depending on the load situation.
From the worker systems (SLES11SP4), 4 paths are configured and used to the VM that acts as the RAID system.
A reconnect with "fcoeadm -r eth1/2" doesn't work and hangs.

RAID:~ # hostnamectl
   Static hostname: RAID
         Icon name: computer-vm
           Chassis: vm
        Machine ID: 8861646bb10c9582ad7be1f6583d364a
           Boot ID: e31e33c369584f389cddd61d83b96e2d
    Virtualization: kvm
  Operating System: openSUSE Leap 42.2
       CPE OS Name: cpe:/o:opensuse:leap:42.2
            Kernel: Linux 4.4.70-18.9-default
      Architecture: x86-64

RAID:~ # uname -a
Linux RAID 4.4.70-18.9-default #1 SMP Wed May 31 09:09:25 UTC 2017 (c1231a7) x86_64 x86_64 x86_64 GNU/Linux
RAID:~ #

----------------------------
Jun 13 22:25:59 RAID kernel: BUG: sleeping function called from invalid context at ../kernel/locking/mutex.c:97
Jun 13 22:25:59 RAID kernel: in_atomic(): 1, irqs_disabled(): 0, pid: 11442, name: kworker/3:1
Jun 13 22:25:59 RAID kernel: CPU: 3 PID: 11442 Comm: kworker/3:1 Not tainted 4.4.70-18.9-default #1
Jun 13 22:25:59 RAID kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
Jun 13 22:25:59 RAID kernel: Workqueue: events fcoe_ctlr_timer_work [libfcoe]
Jun 13 22:25:59 RAID kernel:  0000000000000000 ffffffff81329447 ffff88007b5fa048 ffff88007b5fa048
Jun 13 22:25:59 RAID kernel:  ffffffff8160cf6c ffff88007b5fa000 ffffffffa00945f1 ffff88007b5fa010
Jun 13 22:25:59 RAID kernel:  0000000101d6b8c0 ffff880078d74820 ffff880078d747d8 ffffffffa00b0fe9
Jun 13 22:25:59 RAID kernel: Call Trace:
Jun 13 22:25:59 RAID kernel:  [<ffffffff81019ea9>] dump_trace+0x59/0x320
Jun 13 22:25:59 RAID kernel:  [<ffffffff8101a26a>] show_stack_log_lvl+0xfa/0x180
Jun 13 22:25:59 RAID kernel:  [<ffffffff8101b011>] show_stack+0x21/0x40
Jun 13 22:25:59 RAID kernel:  [<ffffffff81329447>] dump_stack+0x5c/0x85
Jun 13 22:25:59 RAID kernel:  [<ffffffff8160cf6c>] mutex_lock+0x1c/0x38
Jun 13 22:25:59 RAID kernel:  [<ffffffffa00945f1>] fc_rport_logoff+0x21/0xe0 [libfc]
Jun 13 22:25:59 RAID kernel:  [<ffffffffa00b0fe9>] fcoe_ctlr_timer_work+0x6a9/0xca0 [libfcoe]
Jun 13 22:25:59 RAID kernel:  [<ffffffff81097775>] process_one_work+0x155/0x440
Jun 13 22:25:59 RAID kernel:  [<ffffffff810982b6>] worker_thread+0x116/0x4b0
Jun 13 22:25:59 RAID kernel:  [<ffffffff8109d8a2>] kthread+0xd2/0xf0
Jun 13 22:25:59 RAID kernel:  [<ffffffff8160f58f>] ret_from_fork+0x3f/0x70
Jun 13 22:25:59 RAID kernel: DWARF2 unwinder stuck at ret_from_fork+0x3f/0x70
Jun 13 22:25:59 RAID kernel:
Jun 13 22:25:59 RAID kernel: Leftover inexact backtrace:
Jun 13 22:25:59 RAID kernel:  [<ffffffff8109d7d0>] ? kthread_park+0x50/0x50
Jun 13 22:26:03 RAID kernel: BUG: sleeping function called from invalid context at ../kernel/locking/mutex.c:97
Jun 13 22:26:03 RAID kernel: in_atomic(): 1, irqs_disabled(): 0, pid: 11442, name: kworker/3:1
Jun 13 22:26:03 RAID kernel: CPU: 3 PID: 11442 Comm: kworker/3:1 Not tainted 4.4.70-18.9-default #1
Jun 13 22:26:03 RAID kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
Jun 13 22:26:03 RAID kernel: Workqueue: events fcoe_ctlr_timer_work [libfcoe]
Jun 13 22:26:03 RAID kernel:  0000000000000000 ffffffff81329447 ffff880036a96048 ffff880036a96048
Jun 13 22:26:03 RAID kernel:  ffffffff8160cf6c ffff880036a96000 ffffffffa00945f1 ffff880036a96010
Jun 13 22:26:03 RAID kernel:  0000000101d706e2 ffff880078d74820 ffff880078d747d8 ffffffffa00b0fe9
Jun 13 22:26:03 RAID kernel: Call Trace:
Jun 13 22:26:03 RAID kernel:  [<ffffffff81019ea9>] dump_trace+0x59/0x320
Jun 13 22:26:03 RAID kernel:  [<ffffffff8101a26a>] show_stack_log_lvl+0xfa/0x180
Jun 13 22:26:03 RAID kernel:  [<ffffffff8101b011>] show_stack+0x21/0x40
Jun 13 22:26:03 RAID kernel:  [<ffffffff81329447>] dump_stack+0x5c/0x85
Jun 13 22:26:03 RAID kernel:  [<ffffffff8160cf6c>] mutex_lock+0x1c/0x38
Jun 13 22:26:03 RAID kernel:  [<ffffffffa00945f1>] fc_rport_logoff+0x21/0xe0 [libfc]
Jun 13 22:26:03 RAID kernel:  [<ffffffffa00b0fe9>] fcoe_ctlr_timer_work+0x6a9/0xca0 [libfcoe]
Jun 13 22:26:03 RAID kernel:  [<ffffffff81097775>] process_one_work+0x155/0x440
Jun 13 22:26:03 RAID kernel:  [<ffffffff810982b6>] worker_thread+0x116/0x4b0
Jun 13 22:26:03 RAID kernel:  [<ffffffff8109d8a2>] kthread+0xd2/0xf0
Jun 13 22:26:03 RAID kernel:  [<ffffffff8160f58f>] ret_from_fork+0x3f/0x70
Jun 13 22:26:03 RAID kernel: DWARF2 unwinder stuck at ret_from_fork+0x3f/0x70
Jun 13 22:26:03 RAID kernel:
Jun 13 22:26:03 RAID kernel: Leftover inexact backtrace:
Jun 13 22:26:03 RAID kernel:  [<ffffffff8109d7d0>] ? kthread_park+0x50/0x50
Jun 13 22:26:03 RAID kernel: BUG: scheduling while atomic: kworker/3:1/11442/0x00000002
Jun 13 22:26:03 RAID kernel: Modules linked in: fuse target_core_user tcm_loop tcm_qla2xxx qla2xxx vhost_scsi vhost iscsi_target_mod ib_srpt ib_cm ib_sa ib_mad ib_core ib_addr tcm_fc target_core_file target_core_iblock target_core_pscsi target_core_mod configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi af_packet iscsi_ibft iscsi_boot_sysfs iTCO_wdt iTCO_vendor_support virtio_balloon ppdev acpi_cpufreq pcspkr i2c_i801 joydev lpc_ich mfd_core processor parport_pc parport shpchp button ext4 crc16 jbd2 mbcache sr_mod cdrom sd_mod virtio_gpu drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm virtio_console virtio_scsi hid_generic usbhid 8021q garp stp llc mrp bnx2fc cnic uio xhci_pci xhci_hcd usbcore usb_common ahci libahci virtio_pci virtio_ring virtio e1000 libata serio_raw fcoe libfcoe
Jun 13 22:26:03 RAID kernel:  libfc scsi_transport_fc sg scsi_mod autofs4
Jun 13 22:26:03 RAID kernel: CPU: 3 PID: 11442 Comm: kworker/3:1 Not tainted 4.4.70-18.9-default #1
Jun 13 22:26:03 RAID kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
Jun 13 22:26:03 RAID kernel: Workqueue: events fcoe_ctlr_timer_work [libfcoe]
Jun 13 22:26:03 RAID kernel:  0000000000000000 ffffffff81329447 ffff88017fd95c40 ffff8801022cfca0
Jun 13 22:26:03 RAID kernel:  ffffffff8118a8ee ffff8801022cfce8 ffffffff8160afbe ffff8801470460c4
Jun 13 22:26:03 RAID kernel:  ffff8801444c0d40 ffff8801022d0000 ffff8801444c0d40 ffff880178f9984c
Jun 13 22:26:03 RAID kernel: Call Trace:
Jun 13 22:26:03 RAID kernel:  [<ffffffff81019ea9>] dump_trace+0x59/0x320
Jun 13 22:26:03 RAID kernel:  [<ffffffff8101a26a>] show_stack_log_lvl+0xfa/0x180
Jun 13 22:26:03 RAID kernel:  [<ffffffff8101b011>] show_stack+0x21/0x40
Jun 13 22:26:03 RAID kernel:  [<ffffffff81329447>] dump_stack+0x5c/0x85
Jun 13 22:26:03 RAID kernel:  [<ffffffff8118a8ee>] __schedule_bug+0x4b/0x59
Jun 13 22:26:03 RAID kernel:  [<ffffffff8160afbe>] thread_return+0x5eb/0x6bd
Jun 13 22:26:03 RAID kernel:  [<ffffffff8160b0cc>] schedule+0x3c/0x90
Jun 13 22:26:03 RAID kernel:  [<ffffffff8160b3f5>] schedule_preempt_disabled+0x15/0x20
Jun 13 22:26:03 RAID kernel:  [<ffffffff8160ced6>] __mutex_lock_slowpath+0xb6/0x130
Jun 13 22:26:03 RAID kernel:  [<ffffffff8160cf79>] mutex_lock+0x29/0x38
Jun 13 22:26:03 RAID kernel:  [<ffffffffa00945f1>] fc_rport_logoff+0x21/0xe0 [libfc]
Jun 13 22:26:03 RAID kernel:  [<ffffffffa00b0fe9>] fcoe_ctlr_timer_work+0x6a9/0xca0 [libfcoe]
Jun 13 22:26:03 RAID kernel:  [<ffffffff81097775>] process_one_work+0x155/0x440
Jun 13 22:26:03 RAID kernel:  [<ffffffff810982b6>] worker_thread+0x116/0x4b0
Jun 13 22:26:03 RAID kernel:  [<ffffffff8109d8a2>] kthread+0xd2/0xf0
Jun 13 22:26:03 RAID kernel:  [<ffffffff8160f58f>] ret_from_fork+0x3f/0x70
Jun 13 22:26:03 RAID kernel: DWARF2 unwinder stuck at ret_from_fork+0x3f/0x70
Jun 13 22:26:03 RAID kernel:
Jun 13 22:26:03 RAID kernel: Leftover inexact backtrace:
Jun 13 22:26:03 RAID kernel:  [<ffffffff8109d7d0>] ? kthread_park+0x50/0x50
Jun 13 22:26:19 RAID kernel: BUG: sleeping function called from invalid context at ../kernel/locking/mutex.c:97
Jun 13 22:26:19 RAID kernel: in_atomic(): 1, irqs_disabled(): 0, pid: 11499, name: kworker/3:3
Jun 13 22:26:19 RAID kernel: CPU: 3 PID: 11499 Comm: kworker/3:3 Tainted: G W 4.4.70-18.9-default #1
Jun 13 22:26:19 RAID kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
Jun 13 22:26:19 RAID kernel: Workqueue: events fcoe_ctlr_timer_work [libfcoe]
Jun 13 22:26:19 RAID kernel:  0000000000000000 ffffffff81329447 ffff880178801848 ffff880178801848
Jun 13 22:26:19 RAID kernel:  ffffffff8160cf6c ffff880178801800 ffffffffa00945f1 ffff880178801810
Jun 13 22:26:19 RAID kernel:  0000000101d6d440 ffff880036a17820 ffff880036a177d8 ffffffffa00b0fe9
Jun 13 22:26:19 RAID kernel: Call Trace:
Jun 13 22:26:19 RAID kernel:  [<ffffffff81019ea9>] dump_trace+0x59/0x320
Jun 13 22:26:19 RAID kernel:  [<ffffffff8101a26a>] show_stack_log_lvl+0xfa/0x180
Jun 13 22:26:19 RAID kernel:  [<ffffffff8101b011>] show_stack+0x21/0x40
Jun 13 22:26:19 RAID kernel:  [<ffffffff81329447>] dump_stack+0x5c/0x85
Jun 13 22:26:19 RAID kernel:  [<ffffffff8160cf6c>] mutex_lock+0x1c/0x38
Jun 13 22:26:19 RAID kernel:  [<ffffffffa00945f1>] fc_rport_logoff+0x21/0xe0 [libfc]
Jun 13 22:26:19 RAID kernel:  [<ffffffffa00b0fe9>] fcoe_ctlr_timer_work+0x6a9/0xca0 [libfcoe]
Jun 13 22:26:19 RAID kernel:  [<ffffffff81097775>] process_one_work+0x155/0x440
Jun 13 22:26:19 RAID kernel:  [<ffffffff810982b6>] worker_thread+0x116/0x4b0
Jun 13 22:26:19 RAID kernel:  [<ffffffff8109d8a2>] kthread+0xd2/0xf0
Jun 13 22:26:19 RAID kernel:  [<ffffffff8160f58f>] ret_from_fork+0x3f/0x70
Jun 13 22:26:19 RAID kernel: DWARF2 unwinder stuck at ret_from_fork+0x3f/0x70
Jun 13 22:26:19 RAID kernel:
Jun 13 22:26:19 RAID kernel: Leftover inexact backtrace:
Jun 13 22:26:19 RAID kernel:  [<ffffffff8109d7d0>] ? kthread_park+0x50/0x50
Jun 13 22:26:51 RAID sshd[11500]: Accepted publickey for root from 192.168.101.100 port 33413 ssh2: DSA SHA256:p/NSgZGhYmtt2zdIWjlOuyPFGKDIf6OaB2FTiW8CV2c
Jun 13 22:26:51 RAID sshd[11500]: pam_unix(sshd:session): session opened for user root by (uid=0)
Jun 13 22:26:51 RAID systemd[1]: Created slice User Slice of root.
Jun 13 22:26:51 RAID systemd[1]: Starting User Manager for UID 0...

--
You are receiving this mail because:
You are on the CC list for the bug.
--- Comment #4 from Johannes Thumshirn
after some tests I think I have found an important hint: the mutex.c issue occurs immediately when a virtual machine that is connected via FCoE is forcibly destroyed by QEMU (virsh destroy <vm>). In case of a normal shutdown, only a message is written to the target system's message file saying that the rport is blocked for a while, but you can reconnect by starting up/powering on the system.

This hardens my assumption of the VN2VN timeout. I'm working on a test setup to reproduce the issue.
--- Comment #6 from Johannes Thumshirn
Hello Mr. Thumshirn,
that's great. I'm happy that I can help. If you want, I can start a test with a fix from you in parallel. The test environment is already installed.

Can you try booting with a -debug flavor kernel? Maybe we can get more information out of it about what's going on with the locking.
--- Comment #13 from Johannes Thumshirn
Seems I have the same issue:

[   72.229523] BUG: sleeping function called from invalid context at ../kernel/locking/mutex.c:97
[   72.229538] in_atomic(): 1, irqs_disabled(): 0, pid: 2387, name: mdadm
[   72.229544] CPU: 2 PID: 2387 Comm: mdadm Not tainted 4.4.73-18.17-default #1
[   72.229547] Hardware name: LENOVO 346034G/346034G, BIOS G6ETB5WW (2.75 ) 09/01/2016

I can reproduce it 100%, but only on that Lenovo machine above, not on QEMU (I haven't tested other real machines yet).
For me this kernel needs to be running to trigger the bug: kernel-default-4.4.73-18.17.1.x86_64
The previous kernel, 4.4.72, works.
I run the util-linux test suite:

$ cd util-linux/
$ make -j4 fdisk blkid losetup
$ sudo ./tests/ts/blkid/md-raid1-whole

Mostly the system freezes completely.

Do I understand you correctly that there is no FCoE involved?
--- Comment #30 from Johannes Thumshirn
Any news, please? What is the status?

Unfortunately not, as I'm extremely busy. I'm sorry.
--- Comment #33 from Johannes Thumshirn
Please wait for a while.

No problem, I'll be on vacation for the next two weeks anyway. Hoping to read good news when I'm back :-)
--- Comment #37 from Johannes Thumshirn
Hi Johannes,
regarding this issue, I have tried to find an easy method to recreate it, and now I think I have found one:
Use ethtool and change the settings. Every use of ethtool on the interfaces used by FCoE runs into this problem.
An example: eth2/eth3 are assigned to FCoE.

ethtool -s eth2 autoneg off speed 100 duplex full

and the problem occurs.
Best regards
Holger

Just for confirmation, did you try the kernel from comment 31?
--- Comment #47 from Johannes Thumshirn
Hi Johannes,
sorry for the late reaction. I was on vacation and got back today. I will try it tomorrow at the latest.

No problem, thanks a lot.
--- Comment #50 from Holger Schranz
From my point of view it looks better. A reboot with reconnect runs without problems in 95% of cases. A hard break (crash of the initiator system) runs into the same behavior (a picture will be uploaded in parallel). Also, the test that I described in comment #35 still occurs, but this may be because I am not using your patch.

About a description ... sorry ... it is a huge description and needs more time. In short: the "RAID" system is a Leap 42.2 virtual system. There are 5 files configured via QEMU as HDDs. These 5 files are presented as a RAID of HDDs via tgtd; this is necessary because of VPD, which we need. This is internally mapped and forwarded to FCoE.

RAID:~ # lsscsi -g
[0:0:0:0]    disk     LSI      MR SAS 6G 1GB    2.5+  /dev/sda  /dev/sg0
[1:0:0:1]    disk     QEMU     QEMU HARDDISK    2.5+  /dev/sdf  /dev/sg5
[1:0:0:2]    disk     QEMU     QEMU HARDDISK    2.5+  /dev/sde  /dev/sg4
[1:0:0:3]    disk     QEMU     QEMU HARDDISK    2.5+  /dev/sdd  /dev/sg3
[1:0:0:4]    disk     QEMU     QEMU HARDDISK    2.5+  /dev/sdc  /dev/sg2
[1:0:0:15]   disk     QEMU     QEMU HARDDISK    2.5+  /dev/sdb  /dev/sg1
[4:0:0:0]    cd/dvd   QEMU     QEMU DVD-ROM     2.5+  /dev/sr0  /dev/sg6
[10:0:0:0]   storage  IET      Controller       0001  -         /dev/sg7
[10:0:0:1]   disk     FUJITSU  ETERNUS_DXL      0000  /dev/sdi  /dev/sg11
[10:0:0:2]   disk     FUJITSU  ETERNUS_DXL      0000  /dev/sdk  /dev/sg13
[10:0:0:3]   disk     FUJITSU  ETERNUS_DXL      0000  /dev/sdm  /dev/sg15
[10:0:0:4]   disk     FUJITSU  ETERNUS_DXL      0000  /dev/sdo  /dev/sg17
[10:0:0:15]  disk     FUJITSU  ETERNUS_DXL      0000  /dev/sdp  /dev/sg18
[11:0:0:0]   storage  IET      Controller       0001  -         /dev/sg8
[11:0:0:1]   disk     FUJITSU  ETERNUS_DXL      0000  /dev/sdg  /dev/sg9
[11:0:0:2]   disk     FUJITSU  ETERNUS_DXL      0000  /dev/sdh  /dev/sg10
[11:0:0:3]   disk     FUJITSU  ETERNUS_DXL      0000  /dev/sdj  /dev/sg12
[11:0:0:4]   disk     FUJITSU  ETERNUS_DXL      0000  /dev/sdl  /dev/sg14
[11:0:0:15]  disk     FUJITSU  ETERNUS_DXL      0000  /dev/sdn  /dev/sg16
RAID:~ #

The first part is for QEMU; the next 2 parts are for tgtd, because of a 2-way connection. Both 10:0:0:15 and 11:0:0:15 are connected to 1:0:0:15. Presentation to FCoE is via LIO/targetcli. I hope you are not too confused.
Best regards
Holger
--- Comment #58 from Johannes Thumshirn
Hello all,
after some time I can say it runs more stably than before with the proposed fix from Ralf.
A similar issue occurred after a timeout on a connection after 500 seconds; a very rare situation.
How do we continue with this issue now? What about the proposed fix from Ralf?

Ralf, can you please send the patch upstream (to the linux-scsi mailing list; the open-fcoe list is defunct)?
--- Comment #59 from Ralf Müller
(In reply to Holger Schranz from comment #57)
> Hello all,
> after some time I can say it runs more stably than before with the
> proposed fix from Ralf.
> A similar issue occurred after a timeout on a connection after 500
> seconds; a very rare situation.
> How do we continue with this issue now? What about the proposed fix from
> Ralf?

> Ralf, can you please send the patch upstream (to the linux-scsi mailing
> list; the open-fcoe list is defunct)?
Hi Johannes,
sorry for the long delay ... First, the modification:

diff /usr/src/linux-4.4.114-42/drivers/scsi/fcoe/fcoe_ctlr.c fcoe_ctlr.c
2179c2179
<       rcu_read_lock();
---
>       mutex_lock(&lport->disc.disc_mutex);
2186,2187d2185
<       rcu_read_unlock();
<       mutex_lock(&lport->disc.disc_mutex);
2717c2715
<       rcu_read_lock();
---
>       mutex_lock(&lport->disc.disc_mutex);
2738c2736
<       rcu_read_unlock();
---
>       mutex_unlock(&lport->disc.disc_mutex);
3085,3086d3082
<       mutex_unlock(&disc->disc_mutex);
<       rcu_read_lock();
3095c3091
<       rcu_read_unlock();
---
>       mutex_unlock(&disc->disc_mutex);

Second: I have no idea what you mean by 'can you please send the patch upstream ...'. Can you explain what I should do, please?