[Bug 1030941] New: No connection possible with NVMe over Fabrics over the Linux Soft RoCE transport
http://bugzilla.suse.com/show_bug.cgi?id=1030941

Bug ID: 1030941
Summary: No connection possible with NVMe over Fabrics over the Linux Soft RoCE transport
Classification: openSUSE
Product: openSUSE Tumbleweed
Version: Current
Hardware: Other
OS: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Kernel
Assignee: jthumshirn@suse.com
Reporter: jthumshirn@suse.com
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

When using NVMe over Fabrics with RDMA over the kernel's rdma_rxe.ko driver one sees I/O errors shortly after connecting.

host:
nvme connect -t rdma -a 192.168.155.101 -s 1023 -n nvmf-test

target:
nvmet: creating controller 1 for subsystem nvmf-test for NQN nqn.2014-08.org.nvmexpress:NVMf:uuid:123de86f-6e45-4a41-ba60-f09713b15515.
nvmet: ctrl 1 keep-alive timer (15 seconds) expired!
nvmet: ctrl 1 fatal error occurred!
nvmet_rdma: freeing queue 0

--
You are receiving this mail because:
You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1030941#c1
Johannes Thumshirn
http://bugzilla.suse.com/show_bug.cgi?id=1030941#c2
--- Comment #2 from Johannes Thumshirn
http://bugzilla.suse.com/show_bug.cgi?id=1030941#c3
--- Comment #3 from Johannes Thumshirn
http://bugzilla.suse.com/show_bug.cgi?id=1030941#c4
--- Comment #4 from Johannes Thumshirn
From instrumentation I found that the packet that breaks the conversation is an RDMA WRITE MIDDLE packet, with an MTU of 4096 and a residual length of 0. This is why the check in [1] sets the state to RESPST_ERR_LENGTH.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/driv...
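To illustrate why a residual length of 0 trips a length check, here is a minimal Python sketch of such a check. It is a simplified model written for this report, not the kernel's code: the function name, the enum, and the exact conditions are assumptions, and padding checks are left out entirely.

```python
from enum import Enum


class RespState(Enum):
    """Hypothetical stand-ins for the responder states; only the
    RESPST_ERR_LENGTH name is taken from the report above."""
    OK = "OK"
    ERR_LENGTH = "RESPST_ERR_LENGTH"


def check_write_length(pktlen: int, resid: int, mtu: int) -> RespState:
    """Simplified length check for one incoming RDMA WRITE packet.

    pktlen: payload bytes carried by this packet
    resid:  bytes the responder still expects for the current message
    mtu:    path MTU as seen by the responder
    """
    if resid > mtu:
        # More than one MTU is still outstanding, so this packet
        # is expected to be completely full.
        return RespState.OK if pktlen == mtu else RespState.ERR_LENGTH
    # At most one MTU is outstanding, so this packet is expected to
    # carry exactly the remaining bytes.
    return RespState.OK if pktlen == resid else RespState.ERR_LENGTH


# A packet with a 1024-byte payload while nothing is outstanding fails.
print(check_write_length(pktlen=1024, resid=0, mtu=4096).value)
```

Under this model, any packet that still carries payload once the residual length has reached 0 is rejected, whatever the MTU is.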
http://bugzilla.suse.com/show_bug.cgi?id=1030941#c5
--- Comment #5 from Johannes Thumshirn
$ nvme connect -t rdma -a 192.168.155.101 -s 1023 -n nvmf-test
[ 3.280868] rdma_rxe: write_data_in: pkt->psn: 5969147, pkt->opcode: IB_OPCODE_RC_RDMA_WRITE_FIRST, data_len: 1024, resid: 1024, resid - data_len: 0
[ 3.282341] rdma_rxe: check_rkey: dropping packet with pkt->psn: 5969148, pkt->opcode: IB_OPCODE_RC_RDMA_WRITE_MIDDLE, pktlen: 1024, resid: 0
[ 3.283727] rdma_rxe: qp#17 moved to error state
The output above shows that the RDMA WRITE FIRST packet already consumes all of the data (resid - data_len is 0), although an RDMA WRITE MIDDLE packet implies that the message consists of at least three packets (FIRST, MIDDLE and LAST).
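The "at least three packets" reasoning can be sketched in a few lines of Python. This is only an illustration of how a reliable-connection RDMA WRITE is split into ONLY or FIRST/MIDDLE/LAST packets. The function name and the opcode strings are made up for this example and are not taken from the driver.

```python
def write_opcodes(length: int, mtu: int) -> list[str]:
    """Return the opcode sequence for an RDMA WRITE of `length` bytes
    segmented into packets of at most `mtu` payload bytes."""
    if mtu <= 0:
        raise ValueError("mtu must be positive")
    if length < 0:
        raise ValueError("length must not be negative")
    # Ceiling division; a zero-length write still needs one packet.
    npkts = max(1, -(-length // mtu))
    if npkts == 1:
        return ["RDMA_WRITE_ONLY"]
    # MIDDLE packets only exist when there are three or more packets.
    return (["RDMA_WRITE_FIRST"]
            + ["RDMA_WRITE_MIDDLE"] * (npkts - 2)
            + ["RDMA_WRITE_LAST"])


for length in (1024, 2048, 3072):
    print(length, write_opcodes(length, mtu=1024))
```

With 1024-byte payloads, a MIDDLE packet can only appear for messages longer than 2048 bytes, whereas the log shows the responder with just 1024 bytes outstanding when the FIRST packet arrived.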
http://bugzilla.suse.com/show_bug.cgi?id=1030941#c6
--- Comment #6 from Johannes Thumshirn
http://bugzilla.suse.com/show_bug.cgi?id=1030941#c7
Johannes Thumshirn