Hi,

On one machine I got the following during a kiwi build (which does a lot of I/O):

[ 6062.355535] ata1.00: exception Emask 0x10 SAct 0x0 SErr 0x4010000 action 0xe frozen
[ 6062.355539] ata1.00: irq_stat 0x00400040, connection status changed
[ 6062.355543] ata1: SError: { PHYRdyChg DevExch }
[ 6062.355545] ata1.00: failed command: FLUSH CACHE EXT
[ 6062.355552] ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
[ 6062.355553]          res 40/00:4c:53:60:ed/00:00:02:00:00/40 Emask 0x10 (ATA bus error)
[ 6062.355556] ata1.00: status: { DRDY }
[ 6062.355561] ata1: hard resetting link
[ 6066.852503] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 6066.854707] ata1.00: configured for UDMA/100
[ 6066.854711] ata1.00: retrying FLUSH 0xea Emask 0x10
[ 6066.854826] ata1: EH complete
[ 6068.093583] ata1.00: exception Emask 0x10 SAct 0x0 SErr 0x4010000 action 0xe frozen
[ 6068.093586] ata1.00: irq_stat 0x00400040, connection status changed
[ 6068.093588] ata1: SError: { PHYRdyChg DevExch }
[ 6068.093590] ata1.00: failed command: FLUSH CACHE EXT
[ 6068.093593] ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
[ 6068.093594]          res 40/00:5c:de:7e:07/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[ 6068.093596] ata1.00: status: { DRDY }
[ 6068.093599] ata1: hard resetting link
[ 6072.538468] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 6072.540723] ata1.00: configured for UDMA/100
[ 6072.540727] ata1.00: retrying FLUSH 0xea Emask 0x10
[ 6072.540847] ata1: EH complete
[ 7700.291971] BUG: unable to handle kernel NULL pointer dereference at 0000000f
[ 7700.291976] IP: [<f8590c88>] prepare_error_buf+0x418/0x510 [reiserfs]
[ 7700.291988] *pdpt = 000000001c5d6001 *pde = 0000000000000000
[ 7700.291992] Oops: 0000 [#1] PREEMPT SMP
[ 7700.291995] last sysfs file: /sys/devices/pci0000:3f/0000:3f:06.3/modalias
[ 7700.291998] Modules linked in: memainUSB nvidia(P) dm_mod sg iTCO_wdt iTCO_vendor_support button reiserfs loop af_packet sr_mod tg3 [last unloaded: cdrom]
[ 7700.292011]
[ 7700.292014] Pid: 6104, comm: rm Tainted: P 2.6.34.8-15.2-ccs #1 0B4Ch/HP Z400 Workstation
[ 7700.292017] EIP: 0060:[<f8590c88>] EFLAGS: 00010286 CPU: 3
[ 7700.292027] EIP is at prepare_error_buf+0x418/0x510 [reiserfs]
[ 7700.292028] EAX: 0000001e EBX: ffffffff ECX: 00000000 EDX: f85b0daf
[ 7700.292030] ESI: dc593e40 EDI: dc593e3c EBP: f85b09be ESP: dc593d7c
[ 7700.292031] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[ 7700.292033] Process rm (pid: 6104, ti=dc592000 task=dd96f1f0 task.ti=dc592000)
[ 7700.292034] Stack:
[ 7700.292034] 00000026 f8712000 f859f46b 00000082 c0827580 00000006 00000004 00000008
[ 7700.292037] <0> c028c3d7 00000082 00000002 f637c2e8 dc593eb0 f85b0dbe f6f95400 f85b09a0
[ 7700.292041] <0> f85975ab 00000001 00000282 00000000 c084fdc0 d2921558 c7783ce0 c0827500
[ 7700.292044] Call Trace:
[ 7700.292059] [<f8590fb8>] __reiserfs_error+0x28/0xf0 [reiserfs]
[ 7700.292070] [<f859914d>] reiserfs_do_truncate+0x4ed/0x600 [reiserfs]
[ 7700.292086] [<f8599298>] reiserfs_delete_object+0x38/0x80 [reiserfs]
[ 7700.292098] [<f8584a96>] reiserfs_delete_inode+0xb6/0xf0 [reiserfs]
[ 7700.292105] [<c02fd308>] generic_delete_inode+0x68/0xf0
[ 7700.292108] [<c02fc834>] iput+0x44/0x50
[ 7700.292112] [<c02f457d>] do_unlinkat+0xdd/0x160
[ 7700.292115] [<c02030d0>] sysenter_do_call+0x12/0x26
[ 7700.292119] [<b78e8424>] 0xb78e8424
[ 7700.292120] Code: 08 89 4c 24 14 c7 44 24 04 5c 6e 5a f8 89 2c 24 e8 5e b0 e2 c7 8b 7c 24 40 e9 35 fd ff ff 8b 1f 8d 77 04 85 db 0f 84 d2 00 00 00 <0f> b6 43 10 bf 3a 98 5a f8 84 c0 74 1c 3c 03 74 6a 3c 02 bf 41
[ 7700.292137] EIP: [<f8590c88>] prepare_error_buf+0x418/0x510 [reiserfs] SS:ESP 0068:dc593d7c
[ 7700.292143] CR2: 000000000000000f
[ 7700.292145] ---[ end trace 4fbbb5b503c00782 ]---
[ 7700.292146] ------------[ cut here ]------------
[ 7700.292149] WARNING: at /kiwi/packages/BUILD/kernel-ccs-2.6.34.8/linux-2.6.34/kernel/exit.c:918 do_exit+0x2fd/0x350()
[ 7700.292151] Hardware name: HP Z400 Workstation
[ 7700.292151] Modules linked in: memainUSB nvidia(P) dm_mod sg iTCO_wdt iTCO_vendor_support button reiserfs loop af_packet sr_mod tg3 [last unloaded: cdrom]
[ 7700.292159] Pid: 6104, comm: rm Tainted: P D 2.6.34.8-15.2-ccs #1
[ 7700.292160] Call Trace:
[ 7700.292164] [<c0206ba3>] try_stack_unwind+0x173/0x190
[ 7700.292167] [<c02057ef>] dump_trace+0x3f/0xe0
[ 7700.292169] [<c0206c0b>] show_trace_log_lvl+0x4b/0x60
[ 7700.292172] [<c0206c38>] show_trace+0x18/0x20
[ 7700.292176] [<c05af9a7>] dump_stack+0x6d/0x72
[ 7700.292179] [<c0243cfe>] warn_slowpath_common+0x6e/0xb0
[ 7700.292181] [<c0243d53>] warn_slowpath_null+0x13/0x20
[ 7700.292184] [<c02477ed>] do_exit+0x2fd/0x350
[ 7700.292187] [<c0206d76>] oops_end+0x86/0xc0
[ 7700.292191] [<c0223f8f>] bad_area_nosemaphore+0xf/0x20
[ 7700.292194] [<c0224477>] do_page_fault+0x2d7/0x360
[ 7700.292197] [<c05b2cda>] error_code+0x66/0x6c
[ 7700.292203] [<f8590c88>] prepare_error_buf+0x418/0x510 [reiserfs]
[ 7700.292213] [<f8590fb8>] __reiserfs_error+0x28/0xf0 [reiserfs]
[ 7700.292224] [<f859914d>] reiserfs_do_truncate+0x4ed/0x600 [reiserfs]
[ 7700.292239] [<f8599298>] reiserfs_delete_object+0x38/0x80 [reiserfs]
[ 7700.292251] [<f8584a96>] reiserfs_delete_inode+0xb6/0xf0 [reiserfs]
[ 7700.292256] [<c02fd308>] generic_delete_inode+0x68/0xf0
[ 7700.292258] [<c02fc834>] iput+0x44/0x50
[ 7700.292261] [<c02f457d>] do_unlinkat+0xdd/0x160
[ 7700.292263] [<c02030d0>] sysenter_do_call+0x12/0x26
[ 7700.292266] [<b78e8424>] 0xb78e8424
[ 7700.292267] ---[ end trace 4fbbb5b503c00783 ]---
[ 7700.292269] note: rm[6104] exited with preempt_count 1

rt-z9857:/var/log # dmesg | ksymoops
ksymoops 2.4.11 on i686 2.6.34.8-15.2-ccs. Options used
     -V (default)
     -k /proc/kallsyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.6.34.8-15.2-ccs/ (default)
     -m /boot/System.map-2.6.34.8-15.2-ccs (default)

Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.

Warning (read_ksyms): no kernel symbols in ksyms, is /proc/kallsyms a valid ksyms file?
No modules in ksyms, skipping objects
No ksyms, skipping lsmod
Error (regular_file): read_system_map stat /boot/System.map-2.6.34.8-15.2-ccs failed
ksymoops: No such file or directory
Warning (merge_maps): no symbols in merged map
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1])
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x04] high edge lint[0x1])
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x05] high edge lint[0x1])
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x06] high edge lint[0x1])
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x07] high edge lint[0x1])
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x08] high edge lint[0x1])
[ 0.691966] ehci_hcd 0000:00:1a.7: debug port 1
[ 0.706972] ehci_hcd 0000:00:1d.7: debug port 1
[ 7700.291971] BUG: unable to handle kernel NULL pointer dereference at 0000000f
3 warnings and 1 error issued. Results may not be reliable.
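
For reading the ata1 lines in the kernel log, the SErr and SStatus values can be decoded by hand. The following is only a sketch: it assumes the standard SATA SError/SStatus bit layout and the short flag names that libata prints, and the helper names decode_serror and decode_sstatus are made up for illustration.

```python
# Assumed mapping: SATA SError bit position -> short name as printed by libata.
SERROR_BITS = {
    0: "RecovData", 1: "RecovComm",
    8: "UnrecovData", 9: "Persist", 10: "Proto", 11: "HostInt",
    16: "PHYRdyChg", 17: "PHYInt", 18: "CommWake", 19: "10B8B",
    20: "Dispar", 21: "BadCRC", 22: "Handshk", 23: "LinkSeq",
    24: "TrStaTrns", 25: "UnrecFIS", 26: "DevExch",
}


def decode_serror(value):
    """Return the names of all flags set in an SError register value."""
    return [name for bit, name in sorted(SERROR_BITS.items())
            if value & (1 << bit)]


def decode_sstatus(value):
    """Split an SStatus value into its DET, SPD and IPM fields (4 bits each)."""
    return {
        "det": value & 0xF,          # 3 = device present, PHY communication up
        "spd": (value >> 4) & 0xF,   # negotiated generation (2 = 3.0 Gbps)
        "ipm": (value >> 8) & 0xF,   # 1 = interface in active state
    }


if __name__ == "__main__":
    # Values taken from the log; the kernel prints both in hexadecimal.
    print(decode_serror(0x4010000))
    print(decode_sstatus(0x123))
```

Under these assumptions, SErr 0x4010000 is exactly the two flags the kernel already names (PHYRdyChg and DevExch), and SStatus 123 means the link came back up at the 3.0 Gbps generation in the active state. That fits "connection status changed": the link itself dropped and was re-established.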
-----Original Message-----
From: Dave Howorth [mailto:dhoworth@mrc-lmb.cam.ac.uk]
SMART Error Log Version: 1
No Errors Logged
I think that's very significant. The drive didn't even see the bus error.
I used smartctl to run some tests (offline, short

Mar 10 10:08:26 rt-z9857 smartd[4759]: Device: /dev/sda [SAT], state written to /var/lib/smartmontools/smartd.WDC_WD3200AAJS_60Z0A0-WD_WCAV2S489213.ata.state
Mar 10 10:08:26 rt-z9857 smartd[4761]: smartd has fork()ed into background mode. New PID=4761.
Mar 10 10:38:26 rt-z9857 smartd[4761]: Device: /dev/sda [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 74 to 64
Mar 10 11:08:27 rt-z9857 smartd[4761]: Device: /dev/sda [SAT], SMART Prefailure Attribute: 3 Spin_Up_Time changed from 153 to 162
Mar 10 11:08:27 rt-z9857 smartd[4761]: Device: /dev/sda [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 64 to 63
Mar 10 11:38:26 rt-z9857 smartd[4761]: Device: /dev/sda [SAT], SMART Prefailure Attribute: 7 Seek_Error_Rate changed from 200 to 100
Mar 10 12:08:26 rt-z9857 smartd[4761]: Device: /dev/sda [SAT], SMART Prefailure Attribute: 3 Spin_Up_Time changed from 162 to 169
Mar 10 13:08:26 rt-z9857 smartd[4761]: Device: /dev/sda [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 63 to 64
Mar 10 13:08:26 rt-z9857 smartd[4761]: Device: /dev/sda [SAT], SMART Usage Attribute: 198 Offline_Uncorrectable changed from 100 to 200
Mar 10 13:08:26 rt-z9857 smartd[4761]: Device: /dev/sda [SAT], SMART Usage Attribute: 200 Multi_Zone_Error_Rate changed from 100 to 200
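
To make the smartd messages easier to compare, the attribute changes can be pulled out with a small script. This is a sketch only: the regular expression is written against the message format of the syslog lines in this mail, and the function name attribute_changes is hypothetical. It also assumes that smartd is reporting normalized attribute values here (its default), not raw counts, so a higher number is generally the healthier one.

```python
import re

# Matches messages such as:
# "... SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 74 to 64"
CHANGE_RE = re.compile(
    r"SMART (?P<kind>Prefailure|Usage) Attribute: "
    r"(?P<id>\d+) (?P<name>\S+) changed from (?P<old>\d+) to (?P<new>\d+)"
)


def attribute_changes(lines):
    """Return (kind, id, name, old, new) for every attribute-change line."""
    changes = []
    for line in lines:
        match = CHANGE_RE.search(line)
        if match:
            changes.append((match["kind"], int(match["id"]), match["name"],
                            int(match["old"]), int(match["new"])))
    return changes


# Three of the lines from the log, shortened to the part that matters.
sample = [
    "smartd[4761]: Device: /dev/sda [SAT], SMART Usage Attribute: "
    "190 Airflow_Temperature_Cel changed from 74 to 64",
    "smartd[4761]: Device: /dev/sda [SAT], SMART Prefailure Attribute: "
    "7 Seek_Error_Rate changed from 200 to 100",
    "smartd[4761]: smartd has fork()ed into background mode. New PID=4761.",
]

if __name__ == "__main__":
    for kind, attr_id, name, old, new in attribute_changes(sample):
        direction = "fell" if new < old else "rose"
        print(f"{attr_id:>3} {name} ({kind}): {old} -> {new}, {direction}")
```

Read this way, the drop of Seek_Error_Rate from 200 to 100 is the change that moves in the unhealthy direction, while Offline_Uncorrectable and Multi_Zone_Error_Rate going from 100 to 200 are normalized values rising, not 200 bad sectors. The raw values from smartctl -A would be needed to confirm that.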
I am running some further smart tests now.
Definitely worth doing, and I predict they'll come back clean.
Not really, see above.
It does sound to me like a hardware incompatibility. I would definitely try a recent kernel.
These are openSUSE 11.3 machines. What do you mean by a recent kernel? Please give me a hint as to which kernel I should try.

I suspect there's a reasonable chance that the problem has already been fixed.
Oh, the other question will be, are you sure there's enough power and that all power and data cables are good.
Yes, definitely. This is industrial-quality cabling. Power is fine.

Best regards
Martin Konold

Robert Bosch GmbH
Automotive Electronics (RtP2/TEF72)
Postfach 13 42
72703 Reutlingen
GERMANY
www.bosch.com
external.martin.konold@de.bosch.com