Hi,

I've got a QLogic 2342 two-channel Fibre Channel card with a StorageTek 9940B tape drive alone on one channel. The drive supports block sizes up to 256k, but using that block size in dd makes the 2.4.21-201 kernel blow up with the message "Kernel BUG at pci_dma:42" (full output below). If I use the default block size, it works without problems (just very slowly).

The offending section of pci-dma.c identifies itself as a "temporary 2.4 hack", so I'm guessing the problem lies there. :)

256k block size works fine in 2.6.3-rc2. I can't just run that version because the 3ware driver seems to be totally broken in 2.6 (the first access seeks the drives, then freezes the machine: no message, no nothing). But you guys aren't in the 2.6 game, so you don't have to solve that problem. ;-)

Ideas?

-mcq

---
Mar 9 22:39:32 misery kernel: qla2x00_set_info starts at address = ffffffffa00600c0
Mar 9 22:39:32 misery kernel: qla2x00: Found VID=1077 DID=2312 SSVID=1077 SSDID=101
Mar 9 22:39:32 misery kernel: scsi(5): Found a QLA2312 @ bus 3, device 0x1, irq 5, iobase 0x4000
Mar 9 22:39:32 misery kernel: scsi(5): Allocated 4096 SRB(s).
Mar 9 22:39:32 misery kernel: scsi(5): Configure NVRAM parameters...
Mar 9 22:39:32 misery kernel: scsi(5): 64 Bit PCI Addressing Enabled.
Mar 9 22:39:32 misery kernel: scsi(5): Scatter/Gather entries= 896
Mar 9 22:39:32 misery kernel: scsi(5): Verifying loaded RISC code...
Mar 9 22:39:32 misery kernel: scsi(5): Verifying chip...
Mar 9 22:39:32 misery kernel: scsi(5): Waiting for LIP to complete...
Mar 9 22:40:32 misery kernel: scsi(5): Cable is unplugged...
Mar 9 22:40:32 misery kernel: qla2x00: Found VID=1077 DID=2312 SSVID=1077 SSDID=101
Mar 9 22:40:32 misery kernel: scsi(6): Found a QLA2312 @ bus 3, device 0x1, irq 10, iobase 0x4400
Mar 9 22:40:32 misery kernel: scsi(6): Allocated 4096 SRB(s).
Mar 9 22:40:32 misery kernel: scsi(6): Configure NVRAM parameters...
Mar 9 22:40:32 misery kernel: scsi(6): 64 Bit PCI Addressing Enabled.
Mar 9 22:40:32 misery kernel: scsi(6): Scatter/Gather entries= 896
Mar 9 22:40:32 misery kernel: scsi(6): Verifying loaded RISC code...
Mar 9 22:40:32 misery kernel: scsi(6): Verifying chip...
Mar 9 22:40:32 misery kernel: scsi(6): Waiting for LIP to complete...
Mar 9 22:40:33 misery kernel: scsi(6): LIP reset occurred.
Mar 9 22:40:33 misery kernel: scsi(6): LIP occurred.
Mar 9 22:40:33 misery kernel: scsi(6): LOOP UP detected.
Mar 9 22:40:33 misery kernel: scsi(6): Topology - (Loop), Host Loop address 0x0
Mar 9 22:40:34 misery kernel: scsi5 : QLogic QLA2312 PCI to Fibre Channel Host Adapter: bus 3 device 1 irq 5
Mar 9 22:40:34 misery kernel: Firmware version: 3.01.18, Driver version 6.05.00
Mar 9 22:40:34 misery kernel:
Mar 9 22:40:34 misery kernel: scsi6 : QLogic QLA2312 PCI to Fibre Channel Host Adapter: bus 3 device 1 irq 10
Mar 9 22:40:34 misery kernel: Firmware version: 3.01.18, Driver version 6.05.00
Mar 9 22:40:34 misery kernel:
Mar 9 22:40:34 misery kernel: Vendor: STK Model: T9940B Rev: 1.32
Mar 9 22:40:34 misery kernel: Type: Sequential-Access ANSI SCSI revision: 03
Mar 9 22:40:34 misery kernel: st: Version 20030403, bufsize 32768, max init. bufs 4, s/g segs 16
Mar 9 22:40:34 misery kernel: Attached scsi tape st0 at scsi6, channel 0, id 0, lun 0

(I run dd if=some_file of=/dev/st0 bs=256k)

Mar 9 21:28:10 misery kernel: st0: Block limits 1 - 262144 bytes.
Mar 9 21:32:11 misery kernel: Kernel BUG at pci_dma:42
Mar 9 21:32:11 misery kernel: invalid operand: 0000
Mar 9 21:32:11 misery kernel: CPU 1
Mar 9 21:32:11 misery kernel: Pid: 1309, comm: dd Not tainted
Mar 9 21:32:11 misery kernel: RIP: 0010:[<ffffffff801166fc>]{pci_map_sg+124}
Mar 9 21:32:11 misery kernel: RSP: 0018:00000100291f7ba8 EFLAGS: 00010002
Mar 9 21:32:11 misery kernel: RAX: 0000000000000001 RBX: 000001000e976840 RCX: 0000000000000001
Mar 9 21:32:11 misery kernel: RDX: 000000008020b9e0 RSI: 00000100004c0000 RDI: 00000101ff95d800
Mar 9 21:32:11 misery kernel: RBP: 0000000000000000 R08: 0000000000000002 R09: 000000000000003a
Mar 9 21:32:11 misery kernel: R10: 000001000e39add0 R11: 0000000000000060 R12: 0000000000000001
Mar 9 21:32:11 misery kernel: R13: 000000000000000f R14: 000001000e976840 R15: 00000101ff95d800
Mar 9 21:32:11 misery kernel: FS: 000000000050e080(0000) GS:ffffffff8041f2c0(0000) knlGS:0000000000000000
Mar 9 21:32:11 misery kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Mar 9 21:32:11 misery kernel: CR2: 0000002a955f6000 CR3: 00000001ff911000 CR4: 00000000000006e0
Mar 9 21:32:11 misery kernel: Process dd (pid: 1309, stackpage=100291f7000)
Mar 9 21:32:11 misery kernel: Stack: 00000100291f7ba8 0000000000000018 ffffffff8013aaa3 00000100291f3550
Mar 9 21:32:11 misery kernel: 000001000e8400d0 000000000000020a 000001000e83cea8 000001000e83ce80
Mar 9 21:32:11 misery kernel: 000001000e976840 000001000e39ac00 ffffffffa00676e0 000001000000003a
Mar 9 21:32:11 misery kernel: Call Trace: [<ffffffff8013aaa3>]{do_anonymous_page+259}
Mar 9 21:32:11 misery kernel: [<ffffffffa00676e0>]{:qla2300-60650:qla2x00_64bit_start_scsi+832}
Mar 9 21:32:11 misery kernel: [<ffffffffa006de5c>]{:qla2300-60650:qla2x00_next+508}
Mar 9 21:32:11 misery kernel: [<ffffffffa0062256>]{:qla2300-60650:qla2x00_queuecommand+1078}
Mar 9 21:32:11 misery kernel: [<ffffffff8020c0d2>]{scsi_dispatch_cmd+642} [<ffffffff80214c2e>]{scsi_request_fn+990}
Mar 9 21:32:11 misery kernel: [<ffffffff80213d2f>]{__scsi_insert_special+127} [<ffffffff80213da2>]{scsi_insert_special_req+34}
Mar 9 21:32:11 misery kernel: [<ffffffff8020c401>]{scsi_do_req+385} [<ffffffffa004f300>]{:st:st_sleep_done+0}
Mar 9 21:32:11 misery kernel: [<ffffffffa004f533>]{:st:st_do_scsi+339} [<ffffffffa0051082>]{:st:st_write+2178}
Mar 9 21:32:11 misery kernel: [<ffffffff8015201f>]{sys_write+191} [<ffffffff801100c3>]{system_call+119}
Mar 9 21:32:11 misery kernel:
Mar 9 21:32:11 misery kernel:
Mar 9 21:32:11 misery kernel: Code: 0f 0b 7e da 28 80 ff ff ff ff 2a 00 eb 56 66 66 90 66 66 90
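The reproduction step in the report is `dd if=some_file of=/dev/st0 bs=256k`. As a rough illustration of the write pattern that command produces, here is a minimal user-space C sketch (not from the thread) that copies a file in fixed-size read()/write() chunks, as dd does without `conv=sync`. Assuming the st driver is in variable-block mode, each write() on /dev/st0 becomes one tape block, and it is such a 256 KiB write that goes down the st_write → scsi_do_req → qla2x00_queuecommand → pci_map_sg path seen in the trace. The function name `copy_in_blocks` is hypothetical and the code assumes plain POSIX I/O; it reproduces the workload, it does not fix the BUG.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy src to dst with one read() and one write() per
 * block_size chunk, like `dd bs=block_size` without conv=sync. A short final
 * read is written as a short final block. Returns the number of write()
 * calls (blocks) issued, or -1 on any error. */
long copy_in_blocks(const char *src, const char *dst, size_t block_size)
{
    int in = open(src, O_RDONLY);
    if (in < 0)
        return -1;

    /* O_TRUNC only matters for regular-file destinations; Linux ignores it
     * for character devices such as a tape drive. */
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        close(in);
        return -1;
    }

    char *buf = malloc(block_size);
    if (buf == NULL) {
        close(in);
        close(out);
        return -1;
    }

    long blocks = 0;
    for (;;) {
        ssize_t n = read(in, buf, block_size);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            blocks = -1;
            break;
        }
        if (n == 0)
            break;                  /* end of input */

        /* One write() per chunk; on a variable-block tape this is one block. */
        ssize_t w = write(out, buf, (size_t)n);
        if (w != n) {
            blocks = -1;
            break;
        }
        blocks++;
    }

    free(buf);
    close(in);
    if (close(out) < 0)
        blocks = -1;
    return blocks;
}
```

For example, `copy_in_blocks("some_file", "/dev/st0", 256 * 1024)` mirrors the failing command, while a smaller `block_size` mirrors the default-block-size run that worked, only slowly.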
> The offending section of pci-dma.c identifies itself as "temporary 2.4 hack", so I'm guessing the problem lies there. :)
The pci-dma code is fine. It's either the QLogic driver or the SCSI tape driver (st) passing an illegal scatter-gather list.
> 256k blocksize works fine in 2.6.3-rc2. I can't just run that version because the 3ware driver seems to be just totally broken in 2.6 (first access seeks the drives then freezes the machine, no message, no nothing). But you guys aren't in the 2.6 game so you don't have to solve that problem. ;-)
3ware works fine for me on 2.6. That said, on some boxes with the IOMMU enabled it seems to trigger MCEs (machine check exceptions), but these can be disabled.

-Andi
Participants: Andi Kleen, John McCorquodale