https://bugzilla.novell.com/show_bug.cgi?id=208782

    Summary: When accessing the tape drive the system "crashes" sometimes with an Oops
    Product: SUSE Linux 10.1
    Version: Final
    Platform: 32bit
    OS/Version: SuSE Linux 10.1
    Status: NEW
    Severity: Major
    Priority: P5 - None
    Component: Kernel
    AssignedTo: kernel-maintainers@forge.provo.novell.com
    ReportedBy: kern@sibbald.com
    QAContact: qa@suse.de

When accessing the tape drive with Bacula (backup software), keyboard input stops working, then shortly afterwards the mouse does too, and a hard reboot is always required. This happens on several machines, and once I have a program (under development) that fails, the failure is 100% reproducible. A small change in the program can make the problem go away, so it seems to be position dependent or size dependent.

I have tested this on two machines with different SCSI cards and three different kernels:

    kernel-smp-2.6.16.21-0.13 on i686, SCSI Adaptec AIC-7892A U160/m (rev 02)
    kernel-smp-2.6.16.21-0.21 on i686, SCSI Adaptec AIC-7892A U160/m (rev 02)
    kernel-default-2.6.16.13-4 on i586, SCSI Adaptec AHA-2940U2/U2W

Basic scenario:

    open() tape drive
    do a few I/Os
    read() 64524 bytes (on tape with EOF at beginning)
    read returns errno=EBUSY
    re-read a few times
    system crashes

The crash does not always produce an oops (it is probably too severe). I've straced the code and everything looks pretty normal except the return value from the read() (EBUSY). Changing a few lines in the program more or less at random can make the problem go away. It is terribly frustrating ...

I have numerous Oopses of various types. Here is an example (I have wrapped a few lines).

Sep 28 10:30:02 rufus kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000
Sep 28 10:30:02 rufus kernel: printing eip:
Sep 28 10:30:02 rufus kernel: 00000000
Sep 28 10:30:02 rufus kernel: *pde = 00000000
Sep 28 10:30:02 rufus kernel: Oops: 0000 [#1]
Sep 28 10:30:02 rufus kernel: SMP
Sep 28 10:30:02 rufus kernel: last sysfs file: /devices/pci0000:00/0000:00:01.0/0000:01:00.0/resource
Sep 28 10:30:02 rufus kernel: Modules linked in: appletalk ax25 ipx p8023 autofs4 ipv6 nfsd exportfs lockd nfs_acl sunrpc snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device edd button battery ac apparmor aamatch_pcre loop usbhid dm_mod hw_random intel_agp agpgart shpchp pci_hotplug ehci_hcd uhci_hcd usbcore i8xx_tco e100 mii i2c_i801 snd_intel8x0 snd_ac97_codec i2c_core snd_ac97_bus snd_pcm snd_timer snd soundcore snd_page_alloc ide_cd cdrom parport_pc lp parport ext3 jbd fan thermal processor sg st aic7xxx scsi_transport_spi ata_piix libata piix sd_mod scsi_mod ide_disk ide_core
Sep 28 10:30:02 rufus kernel: CPU: 0
Sep 28 10:30:02 rufus kernel: EIP: 0060:[<00000000>] Tainted: G U VLI
Sep 28 10:30:02 rufus kernel: EFLAGS: 00010016 (2.6.16.21-0.21-smp #1)
Sep 28 10:30:02 rufus kernel: EIP is at _stext+0x3feffd40/0x29
Sep 28 10:30:02 rufus kernel: eax: f6d7eff4 ebx: f6d7eff4 ecx: 00000000 edx: 00000003
Sep 28 10:30:02 rufus kernel: esi: c1a76000 edi: 00000001 ebp: df66bd5c esp: df66bd3c
Sep 28 10:30:02 rufus kernel: ds: 007b es: 007b ss: 0068
Sep 28 10:30:02 rufus kernel: Process cc1plus (pid: 5293, threadinfo=df66a000 task=c1a62130)
Sep 28 10:30:02 rufus kernel: Stack: <0>c0118d94 df66bd8c 00000003 c1800014 00000000 c1800014 df66bd8c 00000001
Sep 28 10:30:02 rufus kernel: df66bd80 c011ad45 00000000 df66bd8c 00000003 00000296 c1800014 00000000
Sep 28 10:30:02 rufus kernel: c13ec060 00001000 c0130964 df66bd8c c13ec060 00000000 00001000 c0142fcc
Sep 28 10:30:02 rufus kernel: Call Trace:
Sep 28 10:30:02 rufus kernel: [<c0118d94>] __wake_up_common+0x2f/0x53
Sep 28 10:30:02 rufus kernel: [<c011ad45>] __wake_up+0x2a/0x3d
Sep 28 10:30:02 rufus kernel: [<c0130964>] __wake_up_bit+0x29/0x2e
Sep 28 10:30:02 rufus kernel: [<c0142fcc>] generic_file_buffered_write+0x462/0x58b
Sep 28 10:30:02 rufus kernel: [<f95e1538>] ext3_permission+0x0/0xa [ext3]
Sep 28 10:30:02 rufus kernel: [<c01241f0>] current_fs_time+0x4c/0x58
Sep 28 10:30:02 rufus kernel: [<c01434cd>] __generic_file_aio_write_nolock+0x3d8/0x425
Sep 28 10:30:02 rufus kernel: [<c0168f75>] link_path_walk+0xb3/0xbd
Sep 28 10:30:02 rufus kernel: [<c0143921>] generic_file_aio_write+0x57/0xab
Sep 28 10:30:02 rufus kernel: [<f95d2d0d>] ext3_file_write+0x19/0x84 [ext3]
Sep 28 10:30:02 rufus kernel: [<c015b5fb>] do_sync_write+0xb8/0xf3
Sep 28 10:30:02 rufus kernel: [<c013097f>] autoremove_wake_function+0x0/0x2d
Sep 28 10:30:02 rufus kernel: [<c0133088>] hrtimer_run_queues+0x55/0xf0
Sep 28 10:30:02 rufus kernel: [<c015b543>] do_sync_write+0x0/0xf3
Sep 28 10:30:02 rufus kernel: [<c015beac>] vfs_write+0xaa/0x14f
Sep 28 10:30:02 rufus kernel: [<c015c4b1>] sys_write+0x3c/0x63
Sep 28 10:30:02 rufus kernel: [<c0103bdb>] sysenter_past_esp+0x54/0x79
Sep 28 10:30:02 rufus kernel: Code: Bad EIP value.

I can send more oopses and strace output. I also have a relatively simple version of the program that I can send that shows the problem, but Bacula is a big program, and with an Oops like this, I just am not going to be able to simplify it to a 20-line program.

This is critical for me, because at this point the project has been dead since approximately 8 Sept, when the problem first showed up. I haven't been able to isolate exactly what is triggering this.
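
For readers who want to see the system-call sequence from the "Basic scenario" in concrete form, here is a minimal Python sketch of it. It is not the Bacula code and is not a confirmed reproducer: the report says the failure depends on program layout, and the unspecified "few I/Os" before the failing read are left out. The function name `read_with_retries` and the `max_retries` parameter are hypothetical; only the 64524-byte read size and the EBUSY result come from the report.

```python
import errno
import os

BLOCK_SIZE = 64524  # read size quoted in the report


def read_with_retries(path, block_size=BLOCK_SIZE, max_retries=5):
    """Open `path`, read one block, and re-read while read() fails with EBUSY.

    Returns a list with one entry per attempt: the byte count on success,
    or the errno name (for example "EBUSY") on failure.
    """
    results = []
    fd = os.open(path, os.O_RDONLY)
    try:
        for _ in range(max_retries):
            try:
                data = os.read(fd, block_size)
            except OSError as exc:
                # Record the symbolic errno so the outcome can be compared with strace.
                results.append(errno.errorcode.get(exc.errno, str(exc.errno)))
                if exc.errno != errno.EBUSY:
                    break  # only EBUSY is retried, mirroring the scenario
                continue
            results.append(len(data))
            break  # the read succeeded, so no further retries are needed
    finally:
        os.close(fd)
    return results
```

Calling `read_with_retries()` with the path of the tape device node (which the report does not name) would issue the open/read/re-read sequence. On an affected kernel this may hang the machine, so it should only be tried on a test system.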