New subject: [Bug 208782] When accessing the tape drive the system "crashes" sometimes with an Oops

28 Sep 2006

      https://bugzilla.novell.com/show_bug.cgi?id=208782

           Summary: When accessing the tape drive the system "crashes"
                    sometimes with an Oops
           Product: SUSE Linux 10.1
           Version: Final
          Platform: 32bit
        OS/Version: SuSE Linux 10.1
            Status: NEW
          Severity: Major
          Priority: P5 - None
         Component: Kernel
        AssignedTo: kernel-maintainers@forge.provo.novell.com
        ReportedBy: kern@sibbald.com
         QAContact: qa@suse.de

When accessing the tape drive with Bacula (backup software), keyboard input
stops responding, followed shortly by the mouse, and a hard reboot is always
required. This happens on several machines. Once I have a program (under
development) that fails, the failure is 100% reproducible.  A small change in
the program can make the problem go away, so it seems to depend on code
position or size.

I have tested this on two machines with different SCSI cards and three
different kernels:
kernel-smp-2.6.16.21-0.13 on i686 SCSI Adaptec AIC-7892A U160/m (rev 02)
kernel-smp-2.6.16.21-0.21 on i686 SCSI Adaptec AIC-7892A U160/m (rev 02)
kernel-default-2.6.16.13-4 on i586 SCSI Adaptec AHA-2940U2/U2W

Basic scenario:
open() tape drive
do a few I/O's
read() 64524 bytes (on tape with EOF at beginning)
read returns errno=EBUSY
re-read a few times
system crashes
The crash does not always produce an oops (probably too severe)
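
A minimal sketch of the sequence above, written in Python rather than taken
from Bacula's own code. The function name, the retry count, and the device path
/dev/nst0 in the usage comment are assumptions for illustration; the "few
I/O's" step is not specified in this report and is left out. On an affected
kernel, running this against a real tape drive may hang the machine.

```python
import errno
import os


def read_with_retries(path, block_size=64524, max_retries=3):
    """Open `path`, attempt `max_retries` reads of `block_size` bytes,
    and record the outcome of each attempt.

    Returns a list of (attempt, result) tuples, where result is the number
    of bytes read, or the errno name (e.g. "EBUSY") if the read failed.
    """
    outcomes = []
    fd = os.open(path, os.O_RDONLY)
    try:
        for attempt in range(1, max_retries + 1):
            try:
                data = os.read(fd, block_size)
                outcomes.append((attempt, len(data)))
            except OSError as exc:
                # Record the symbolic errno instead of aborting, so that
                # repeated failures such as EBUSY are visible per attempt.
                name = errno.errorcode.get(exc.errno, str(exc.errno))
                outcomes.append((attempt, name))
    finally:
        os.close(fd)
    return outcomes


# Hypothetical usage against a non-rewinding tape device (path is assumed):
# print(read_with_retries("/dev/nst0"))
```

Logging each attempt separately makes it easy to see whether EBUSY is returned
on the first read or only on the re-reads.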

I've straced the code and everything looks pretty normal except the return
value from the read() (EBUSY).  Changing a few lines in the program more or
less at random can make the problem go away.  It is terribly frustrating ...

I have numerous Oopses of various types.  Here is an example (I have wrapped a
few lines).
Sep 28 10:30:02 rufus kernel: Unable to handle kernel NULL pointer dereference
at virtual address 00000000
Sep 28 10:30:02 rufus kernel:  printing eip:
Sep 28 10:30:02 rufus kernel: 00000000
Sep 28 10:30:02 rufus kernel: *pde = 00000000
Sep 28 10:30:02 rufus kernel: Oops: 0000 [#1]
Sep 28 10:30:02 rufus kernel: SMP
Sep 28 10:30:02 rufus kernel: last sysfs file:
/devices/pci0000:00/0000:00:01.0/0000:01:00.0/resource
Sep 28 10:30:02 rufus kernel: Modules linked in: appletalk ax25
    ipx p8023 autofs4 ipv6 nfsd exportfs lockd nfs_acl sunrpc
    snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device edd button
    battery ac apparmor aamatch_pcre loop usbhid dm_mod hw_random
    intel_agp agpgart shpchp pci_hotplug ehci_hcd uhci_hcd usbcore
    i8xx_tco e100 mii i2c_i801 snd_intel8x0 snd_ac97_codec i2c_core
    snd_ac97_bus snd_pcm snd_timer snd soundcore snd_page_alloc
    ide_cd cdrom parport_pc lp parport ext3 jbd fan thermal processor
    sg st aic7xxx scsi_transport_spi ata_piix libata piix sd_mod
    scsi_mod ide_disk ide_core
Sep 28 10:30:02 rufus kernel: CPU:    0
Sep 28 10:30:02 rufus kernel: EIP:    0060:[<00000000>]    Tainted: G     U VLI
Sep 28 10:30:02 rufus kernel: EFLAGS: 00010016   (2.6.16.21-0.21-smp #1)
Sep 28 10:30:02 rufus kernel: EIP is at _stext+0x3feffd40/0x29
Sep 28 10:30:02 rufus kernel: eax: f6d7eff4   ebx: f6d7eff4   ecx: 00000000  
edx: 00000003
Sep 28 10:30:02 rufus kernel: esi: c1a76000   edi: 00000001   ebp: df66bd5c  
esp: df66bd3c
Sep 28 10:30:02 rufus kernel: ds: 007b   es: 007b   ss: 0068
Sep 28 10:30:02 rufus kernel: Process cc1plus (pid: 5293, threadinfo=df66a000
task=c1a62130)
Sep 28 10:30:02 rufus kernel: Stack: <0>c0118d94 df66bd8c 00000003 c1800014
00000000 c1800014 df66bd8c 00000001
Sep 28 10:30:02 rufus kernel:        df66bd80 c011ad45 00000000 df66bd8c
00000003 00000296 c1800014 00000000
Sep 28 10:30:02 rufus kernel:        c13ec060 00001000 c0130964 df66bd8c
c13ec060 00000000 00001000 c0142fcc
Sep 28 10:30:02 rufus kernel: Call Trace:
Sep 28 10:30:02 rufus kernel:  [<c0118d94>] __wake_up_common+0x2f/0x53
Sep 28 10:30:02 rufus kernel:  [<c011ad45>] __wake_up+0x2a/0x3d
Sep 28 10:30:02 rufus kernel:  [<c0130964>] __wake_up_bit+0x29/0x2e
Sep 28 10:30:02 rufus kernel:  [<c0142fcc>]
generic_file_buffered_write+0x462/0x58b
Sep 28 10:30:02 rufus kernel:  [<f95e1538>] ext3_permission+0x0/0xa [ext3]
Sep 28 10:30:02 rufus kernel:  [<c01241f0>] current_fs_time+0x4c/0x58
Sep 28 10:30:02 rufus kernel:  [<c01434cd>]
__generic_file_aio_write_nolock+0x3d8/0x425
Sep 28 10:30:02 rufus kernel:  [<c0168f75>] link_path_walk+0xb3/0xbd
Sep 28 10:30:02 rufus kernel:  [<c0143921>] generic_file_aio_write+0x57/0xab
Sep 28 10:30:02 rufus kernel:  [<f95d2d0d>] ext3_file_write+0x19/0x84 [ext3]
Sep 28 10:30:02 rufus kernel:  [<c015b5fb>] do_sync_write+0xb8/0xf3
Sep 28 10:30:02 rufus kernel:  [<c013097f>] autoremove_wake_function+0x0/0x2d
Sep 28 10:30:02 rufus kernel:  [<c0133088>] hrtimer_run_queues+0x55/0xf0
Sep 28 10:30:02 rufus kernel:  [<c015b543>] do_sync_write+0x0/0xf3
Sep 28 10:30:02 rufus kernel:  [<c015beac>] vfs_write+0xaa/0x14f
Sep 28 10:30:02 rufus kernel:  [<c015c4b1>] sys_write+0x3c/0x63
Sep 28 10:30:02 rufus kernel:  [<c0103bdb>] sysenter_past_esp+0x54/0x79
Sep 28 10:30:02 rufus kernel: Code:  Bad EIP value.

I can send more oopses and strace output.  I also have a relatively simple
version of the program that I can send that shows the problem, but Bacula is a
big program, and with an Oops like this, I just am not going to be able to
simplify it to a 20 line program.

This is critical for me, because at this point the project has been dead since
approximately 8 Sept when the problem first showed up.  I haven't been able to
isolate exactly what is triggering this.

-- 
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
