SLES8 kernel panic when powering SCSI devices on/off on AIC7XXX driver
Hello all:

I'm running SLES 8 (kernel 2.4.19-64GB-SMP) on a Dell 6650 with 4GB of RAM. I have a Quantum M1500 tape library connected to the on-board Adaptec AIC7892 Ultra160 SCSI controller in the system. We are running the AIC7XXX driver, which according to dmesg is Rev 6.2.29.

I can generate a kernel panic at will by turning this device off or on. This concerns me a bit, because any mistake can crash my system! :-/

Here is the kernel panic I've received, extracted from /var/log/messages:

Aug 7 09:57:33 earth kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000
Aug 7 09:57:33 earth kernel: printing eip:
Aug 7 09:57:33 earth kernel: 00000000
Aug 7 09:57:33 earth kernel: *pde = 1c852001
Aug 7 09:57:33 earth kernel: Oops: 0000 2.4.19-64GB-SMP #1 SMP Tue May 20 08:20:31 UTC 2003
Aug 7 09:57:33 earth kernel: CPU: 1
Aug 7 09:57:33 earth kernel: EIP: 0010:[<00000000>] Not tainted
Aug 7 09:57:33 earth kernel: EFLAGS: 00010286
Aug 7 09:57:33 earth kernel: eax: c48f8ac0 ebx: c037b5c0 ecx: 00000000 edx: f247de60
Aug 7 09:57:33 earth kernel: esi: 650aa8c0 edi: f247de70 ebp: 660aa8c0 esp: f247ddf4
Aug 7 09:57:33 earth kernel: ds: 0018 es: 0018 ss: 0018
Aug 7 09:57:33 earth kernel: Process swapper (pid: 0, stackpage=f247d000)
Aug 7 09:57:33 earth kernel: Stack: c02b29bc c48f8ac0 f247de70 f247de60 660aa8c0 c4925360 ffffffea c4f20000
Aug 7 09:57:33 earth kernel: c027ff17 f247de70 f247de60 00000000 f247c000 10000001 a05abbc8 00000000
Aug 7 09:57:33 earth kernel: ec09d7a0 e91708dc c028711b cdb0a160 00000000 000007f6 00000000 0027db79
Aug 7 09:57:33 earth kernel: Call Trace: [fib_lookup+252/320] [ip_route_input_slow+343/2528] [ip_queue_xmit2+331/448] [ip_route_input+73/480] [arp_process+861/1872]
Aug 7 09:57:33 earth kernel: Call Trace: [<c02b29bc>] [<c027ff17>] [<c028711b>] [<c02807e9>] [<c02a590d>]
Aug 7 09:57:33 earth kernel: [netif_rx+155/304] [net_rx_action+542/880] [do_softirq+217/224] [do_IRQ+251/304] [default_idle+0/80] [call_do_IRQ+5/13]
Aug 7 09:57:33 earth kernel: [<c026a23b>] [<c026a75e>] [<c012d589>] [<c010b66b>] [<c0107110>] [<c010e2e8>]
Aug 7 09:57:33 earth kernel: [default_idle+0/80] [default_idle+44/80] [cpu_idle+50/80] [vprintk+341/400]
Aug 7 09:57:33 earth kernel: [<c0107110>] [<c010713c>] [<c01071b2>] [<c01280d5>]
Aug 7 09:57:33 earth kernel: Code: Bad EIP value.
Aug 7 09:57:33 earth kernel: <0>Kernel panic: Aiee, killing interrupt handler!
Aug 7 09:57:33 earth kernel: In interrupt handler - not syncing
Aug 7 09:57:40 earth kernel:
Aug 7 09:57:40 earth kernel: wait_on_irq, CPU 2:
Aug 7 09:57:40 earth kernel: irq: 0 [ 0 0 0 0 ]
Aug 7 09:57:40 earth kernel: bh: 1 [ 1 1 0 0 ]
Aug 7 09:57:40 earth kernel: Stack dumps:
Aug 7 09:57:40 earth kernel: CPU 0: <unknown>
Aug 7 09:57:40 earth kernel: CPU 1:00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug 7 09:57:40 earth kernel: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug 7 09:57:40 earth kernel: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 f247e70c
Aug 7 09:57:40 earth kernel: Call Trace:
Aug 7 09:57:40 earth kernel:
Aug 7 09:57:40 earth kernel: CPU 3:00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug 7 09:57:41 earth kernel: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug 7 09:57:41 earth kernel: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug 7 09:57:41 earth kernel: Call Trace:
Aug 7 09:57:41 earth kernel:
Aug 7 09:57:41 earth kernel: CPU 2:f2475f18 00000002 00000002 ffffffff 00000002 c010c0b8 c02f22c2 00000002
Aug 7 09:57:41 earth kernel: c3a8d000 c010b252 00000002 f2475f78 c01e73e1 c3a8d000 dae46ee0 00000000
Aug 7 09:57:41 earth kernel: f2474000 c3a8d368 f2475f78 f2475f78 00000000 f2474000 c012da0a c3a8d000
Aug 7 09:57:41 earth kernel: Call Trace: [wait_on_irq+248/267] [__global_cli+98/112] [flush_to_ldisc+177/320] [__run_task_queue+106/128] [context_thread+359/512]
Aug 7 09:57:41 earth kernel: Call Trace: [<c010c0b8>] [<c010b252>] [<c01e73e1>] [<c012da0a>] [<c0137be7>]
Aug 7 09:57:41 earth kernel: [rest_init+0/96] [rest_init+0/96] [arch_kernel_thread+46/64] [context_thread+0/512]
Aug 7 09:57:41 earth kernel: [<c0105000>] [<c0105000>] [<c01075de>] [<c0137a80>]
Aug 7 09:57:41 earth kernel:

I also have an old HP 88780B 9-Track tape drive connected to an Adaptec 2944UW controller, using the same driver. I believe that I can cause the same panic by turning that device on and off as well, although I haven't proven it. I kind of hate to keep crashing my running system. :-/

This leads me to believe that there is a problem in the AIC7XXX driver. Does anyone know what to do about this?

TIA

Eric Raskin

------------------------------------------------------------------------------------------
Eric H. Raskin                          Voice: 914-741-1100
Professional Advertising Systems Inc.   Fax: 914-741-2788
70 Memorial Plaza                       eraskin@paslists.com
Pleasantville, NY 10570
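For anyone reading the trace in the log above, the symbolic frames can be pulled out mechanically to see which code path was running when the oops hit. The following is only a minimal Python sketch, assuming the `[symbol+offset/size]` entries appear exactly as in the pasted log and that offset and size are decimal digits; `extract_frames`, `FRAME_RE` and `SAMPLE` are hypothetical names chosen for illustration, not part of any existing tool.

```python
import re

# Matches symbolic trace entries such as "[fib_lookup+252/320]":
# symbol name, offset into the function, and function size.
# Address-only entries like "[<c02b29bc>]" are deliberately not matched.
FRAME_RE = re.compile(r"\[([A-Za-z_]\w*)\+(\d+)/(\d+)\]")


def extract_frames(log_text):
    """Return (symbol, offset, size) tuples in the order they appear."""
    return [
        (m.group(1), int(m.group(2)), int(m.group(3)))
        for m in FRAME_RE.finditer(log_text)
    ]


# Two lines copied from the first call trace in the log.
SAMPLE = """\
Aug 7 09:57:33 earth kernel: Call Trace: [fib_lookup+252/320] [ip_route_input_slow+343/2528] [ip_queue_xmit2+331/448] [ip_route_input+73/480] [arp_process+861/1872]
Aug 7 09:57:33 earth kernel: [netif_rx+155/304] [net_rx_action+542/880] [do_softirq+217/224] [do_IRQ+251/304] [default_idle+0/80] [call_do_IRQ+5/13]
"""

if __name__ == "__main__":
    # Print one frame per line, innermost first as logged.
    for symbol, offset, size in extract_frames(SAMPLE):
        print(f"{symbol:24s} +{offset}/{size}")
```

Feeding the whole /var/log/messages excerpt to `extract_frames` instead of `SAMPLE` would list every named frame from all three traces in one pass.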
On Friday 08 August 2003 06:39, Eric Raskin wrote:
I can generate a kernel panic at will by turning this device off or on. This concerns me a bit, because any mistake can crash my system! :-/ Here is the kernel panic I've received, extracted from /var/log/messages:
Where is termination power coming from? If from the tape, it might be a problem if the tape gets powered down. You might try term power from the controller. (In most cases you can have more than one device supplying termination power and not have a problem.)

I also never rely on the built-in terminators on any SCSI device and prefer to supply separate active terminators.

I have similar external tape drives and can shut them off and the server keeps on running. (Spits a few nasty-grams into the log, but still runs.)

All in all, it's a bad idea to power off SCSI devices if you can avoid it...

--
_____________________________________
John Andersen
John:

Thanks for the reply. I will double-check that termination power comes from more than one device. I do have an active terminator (with an LED indicator). When I turn off the library, the indicator light goes out. So, it appears you are correct and the library is providing termination.

I agree that it is a bad thing to turn off devices while up and running, but sometimes you just don't have a choice. When the library hangs up while users are on the system (including outside clients), I can't just shut everything down.

In any event, *almost nothing* should crash a running system, no matter how dumb, right? Power switches to the server excluded, of course. :-)

-----Original Message-----
From: John Andersen [mailto:jsa@pen.homeip.net]
Sent: Saturday, August 09, 2003 3:38 AM
To: eraskin@paslists.com; SuSE-linux-e@suse.com
Subject: Re: [SLE] SLES8 kernel panic when powering SCSI devices on/off on AIC7XXX driver
participants (2)
- Eric Raskin
- John Andersen