[Bug 463829] New: OS 11.0 fails drive mount via Sil 3124 sata card; OK if using gParted LIVE CD
https://bugzilla.novell.com/show_bug.cgi?id=463829
User pgnet.trash@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c1

Summary: OS 11.0 fails drive mount via Sil 3124 sata card; OK if using gParted LIVE CD
Product: openSUSE 11.0
Version: Final
Platform: x86-64
OS/Version: openSUSE 11.0
Status: NEW
Severity: Critical
Priority: P5 - None
Component: Kernel
AssignedTo: bnc-team-screening@forge.provo.novell.com
ReportedBy: pgnet.trash@gmail.com
QAContact: qa@suse.de
Found By: Customer

summary:
openSUSE 11.0 on x86_64 doesn't see external drives attached via a SiI 3124 SATA card. However, the gParted Live CD sees and manages them with no problem.

details:
I have openSUSE 11.0 installed on x86_64.

    uname -a
    Linux server 2.6.25.18-0.2-default #1 SMP 2008-10-21 16:30:26 +0200 x86_64 x86_64 x86_64 GNU/Linux

I've installed a SATA controller, an Addonics MultiLane 4X RAID5/JBOD PCI-X controller, which has a Silicon Image SiI 3124 chipset. It's connected, via a Multilane cable, to an external enclosure containing two SAMSUNG HD103UJ 1TB drives.

At system boot, in the Silicon Image BIOS config, I turn RAID support *off*, and can quick/low-level format the two drives.

If I boot the system from a gParted LIVE CD (http://gparted.sourceforge.net/), I can see the sata_sil24 driver load at the console. Once fully booted, the two external drives show up in the partition editor as "sdc" and "sdd", and can be partitioned at will.
If I boot the system to openSUSE 11.0,

    title openSUSE 11.0 (symlink) CONSOLE=ttyS0
        root (hd0,0)
        kernel /vmlinuz \
            root=/dev/system/LV_OS11 resume=/dev/md1 \
            showopts vga=0x31a console=tty0 console=ttyS0,57600n8
        initrd /initrd

the SATA drivers are apparently loaded,

    lsmod | egrep -i "ata|raid|scsi|ide"
    raid456          147232  0
    async_xor         21504  1 raid456
    async_memcpy      19840  1 raid456
    async_tx          26084  3 raid456,async_xor,async_memcpy
    xor               22672  2 raid456,async_xor
    raid0             24832  0
    raid1             43136  4
    sata_sil24        36100  0
    pata_amd          33284  0
    sata_nv           46860  8
    libata           195232  3 sata_sil24,pata_amd,sata_nv
    scsi_mod         195160  4 sr_mod,sg,sd_mod,libata
    dock              29344  1 libata

and the PCI card is correctly recognized,

    lspci | egrep -i "ata|raid|scsi|ide"
    00:0d.0 IDE interface: nVidia Corporation MCP51 IDE (rev a1)
    00:0e.0 IDE interface: nVidia Corporation MCP51 Serial ATA Controller (rev a1)
    00:0f.0 IDE interface: nVidia Corporation MCP51 Serial ATA Controller (rev a1)
    04:07.0 RAID bus controller: Silicon Image, Inc. SiI 3124 PCI-X Serial ATA Controller (rev 02)

but I can find no trace of the sdc/sdd drives, just

    ls -1 /dev/sd*
    /dev/sda
    /dev/sda1
    /dev/sda2
    /dev/sda3
    /dev/sda4
    /dev/sdb
    /dev/sdb1
    /dev/sdb2
    /dev/sdb3
    /dev/sdb4

which are my internal, openSUSE-containing drives.
Here's the disk-related dmesg output that (I think) is relevant:

    dmesg | egrep -i "^raid|^scsi|^ata|^md|^sd|^pata|^sata"
    --------
    SCSI subsystem initialized
    sata_nv 0000:00:0e.0: version 3.5
    scsi0 : sata_nv
    scsi1 : sata_nv
    ata1: SATA max UDMA/133 cmd 0xc800 ctl 0xc480 bmdma 0xc000 irq 22
    ata2: SATA max UDMA/133 cmd 0xc400 ctl 0xc080 bmdma 0xc008 irq 22
    ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
    ata1.00: ATA-7: ST3250410AS, 3.AAC, max UDMA/133
    ata1.00: 488397168 sectors, multi 16: LBA48 NCQ (depth 0/32)
    ata1.00: configured for UDMA/133
    ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
    ata2.00: ATA-7: ST3250410AS, 3.AAC, max UDMA/133
    ata2.00: 488397168 sectors, multi 16: LBA48 NCQ (depth 0/32)
    ata2.00: configured for UDMA/133
    scsi 0:0:0:0: Direct-Access ATA ST3250410AS 3.AA PQ: 0 ANSI: 5
    scsi 1:0:0:0: Direct-Access ATA ST3250410AS 3.AA PQ: 0 ANSI: 5
    scsi2 : sata_nv
    scsi3 : sata_nv
    ata3: SATA max UDMA/133 cmd 0xbc00 ctl 0xb880 bmdma 0xb400 irq 23
    ata4: SATA max UDMA/133 cmd 0xb800 ctl 0xb480 bmdma 0xb408 irq 23
    ata3: SATA link down (SStatus 0 SControl 300)
    ata4: SATA link down (SStatus 0 SControl 300)
    pata_amd 0000:00:0d.0: version 0.3.10
    scsi4 : pata_amd
    scsi5 : pata_amd
    ata5: PATA max UDMA/133 cmd 0x1f0 ctl 0x3f6 bmdma 0xffa0 irq 14
    ata6: PATA max UDMA/133 cmd 0x170 ctl 0x376 bmdma 0xffa8 irq 15
    ata5.00: ATAPI: PIONEER DVD-RW DVR-112D, 1.21, max UDMA/66
    ata5: nv_mode_filter: 0x1f39f&0x1f01f->0x1f01f, BIOS=0x1f000 (0xc5000000) ACPI=0x1f01f (30:900:0x11)
    ata5.00: configured for UDMA/66
    scsi 4:0:0:0: CD-ROM PIONEER DVD-RW DVR-112D 1.21 PQ: 0 ANSI: 5
    sata_sil24 0000:04:07.0: version 1.1
    scsi6 : sata_sil24
    scsi7 : sata_sil24
    scsi8 : sata_sil24
    scsi9 : sata_sil24
    ata7: SATA max UDMA/100 host m128@0xfebffc00 port 0xfebf0000 irq 19
    ata8: SATA max UDMA/100 host m128@0xfebffc00 port 0xfebf2000 irq 19
    ata9: SATA max UDMA/100 host m128@0xfebffc00 port 0xfebf4000 irq 19
    ata10: SATA max UDMA/100 host m128@0xfebffc00 port 0xfebf6000 irq 19
    ata7: softreset failed (timeout)
    ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
    ata7.00: failed to IDENTIFY (INIT_DEV_PARAMS failed, err_mask=0x80)
    ata7: failed to recover some devices, retrying in 5 secs
    ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
    ata7.00: native sectors (2) is smaller than sectors (1953525168)
    ata7.00: ATA-7: SAMSUNG HD103UJ, 1AA01113, max UDMA7
    ata7.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 31/32)
    ata7.00: model number mismatch 'SAMSUNG HD103UJ' != ''
    ata7.00: revalidation failed (errno=-19)
    ata7: limiting SATA link speed to 1.5 Gbps
    ata7.00: limiting speed to UDMA/100:PIO3
    ata7: failed to recover some devices, retrying in 5 secs
    ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 10)
    ata7.00: n_sectors mismatch 1953525168 != 16514064
    ata7.00: revalidation failed (errno=-19)
    ata7.00: disabled
    ata8: softreset failed (timeout)
    ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
    ata8.00: failed to IDENTIFY (INIT_DEV_PARAMS failed, err_mask=0x80)
    ata8: failed to recover some devices, retrying in 5 secs
    ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
    ata8.00: failed to IDENTIFY (INIT_DEV_PARAMS failed, err_mask=0x80)
    ata8: failed to recover some devices, retrying in 5 secs
    ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
    ata8.00: native sectors (2) is smaller than sectors (1953525168)
    ata8.00: ATA-7: SAMSUNG HD103UJ, 1AA01113, max UDMA7
    ata8.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 31/32)
    ata8.00: model number mismatch 'SAMSUNG HD103UJ' != ''
    ata8.00: revalidation failed (errno=-19)
    ata8.00: disabled
    ata9: SATA link down (SStatus 0 SControl 0)
    ata10: SATA link down (SStatus 0 SControl 0)
    md: raid1 personality registered for level 1
    sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB)
    sd 0:0:0:0: [sda] Write Protect is off
    sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
    sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
    sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB)
    sd 0:0:0:0: [sda] Write Protect is off
    sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
    sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
    sd 0:0:0:0: [sda] Attached SCSI disk
    sd 1:0:0:0: [sdb] 488397168 512-byte hardware sectors (250059 MB)
    sd 1:0:0:0: [sdb] Write Protect is off
    sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
    sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
    sd 1:0:0:0: [sdb] 488397168 512-byte hardware sectors (250059 MB)
    sd 1:0:0:0: [sdb] Write Protect is off
    sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
    sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
    sd 1:0:0:0: [sdb] Attached SCSI disk
    md: raid0 personality registered for level 0
    raid6: int64x1   1877 MB/s
    raid6: int64x2   2605 MB/s
    raid6: int64x4   1901 MB/s
    raid6: int64x8   1906 MB/s
    raid6: sse2x1    2605 MB/s
    raid6: sse2x2    3312 MB/s
    raid6: sse2x4    3646 MB/s
    raid6: using algorithm sse2x4 (3646 MB/s)
    md: raid6 personality registered for level 6
    md: raid5 personality registered for level 5
    md: raid4 personality registered for level 4
    md: md1 stopped.
    md: md2 stopped.
    md: bind<sdb3>
    md: bind<sda3>
    raid1: raid set md2 active with 2 out of 2 mirrors
    md2: bitmap initialized from disk: read 13/13 pages, set 136 bits
    md: md1 stopped.
    md: bind<sdb2>
    md: bind<sda2>
    raid1: raid set md1 active with 2 out of 2 mirrors
    md1: bitmap initialized from disk: read 1/1 pages, set 0 bits
    md: linear personality registered for level -1
    sd 0:0:0:0: Attached scsi generic sg0 type 0
    sd 1:0:0:0: Attached scsi generic sg1 type 0
    scsi 4:0:0:0: Attached scsi generic sg2 type 5
    md: md0 stopped.
    md: bind<sdb1>
    md: bind<sda1>
    raid1: raid set md0 active with 2 out of 2 mirrors
    md0: bitmap initialized from disk: read 1/1 pages, set 2 bits
    md: md3 stopped.
    md: bind<sdb4>
    md: bind<sda4>
    raid1: raid set md3 active with 2 out of 2 mirrors
    md3: bitmap initialized from disk: read 28/28 pages, set 2 bits
    --------

Noting the above,

    ata8.00: failed to IDENTIFY (INIT_DEV_PARAMS failed, err_mask=0x80)
    ata8: failed to recover some devices, retrying in 5 secs
    ...
    ata8.00: native sectors (2) is smaller than sectors (1953525168)
    ...
    ata8.00: model number mismatch 'SAMSUNG HD103UJ' != ''
    ata8.00: revalidation failed (errno=-19)
    ...

that looks suspicious to my eye, and digging, I've found

    http://www.opensubscriber.com/message/linux-ide@vger.kernel.org/8592606.html
    http://www.mail-archive.com/linux-ide@vger.kernel.org/msg16058.html

which at least refer to the sil driver and mismatch errors ...

Happy to provide any additional info.

-- 
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
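To compare one boot's log against another quickly, the failure lines quoted above can be pulled out of a full dmesg with a grep filter. This is a sketch only: `libata_probe_errors` is a hypothetical helper name (not a libata or openSUSE tool), and its patterns are copied from the messages in this report, so other libata error messages will not be caught.

```shell
#!/bin/sh
# Hypothetical helper: keep only the libata probe/revalidation failures seen
# in this report (softreset timeouts, IDENTIFY failures, sector-count and
# model-number mismatches, revalidation failures, disabled devices).
# Patterns are copied from the dmesg above; other error texts are not matched.
libata_probe_errors() {
    grep -E 'ata[0-9]+(\.[0-9]+)?: (softreset failed|failed to IDENTIFY|native sectors|model number mismatch|n_sectors mismatch|revalidation failed|disabled)'
}

# Possible usage on a running system:
#   dmesg | libata_probe_errors
```

Because the pattern is not anchored, it also works on logs whose lines carry a timestamp prefix.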
User pgnet.trash@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c1
--- Comment #1 from pgnet _ <pgnet.trash@gmail.com> 2009-01-07 09:15:30 MST ---
Forgot to mention: the above scenario already includes a changed

    grep INITRD_MODULES /etc/sysconfig/kernel
    #INITRD_MODULES="processor thermal sata_nv pata_amd fan jbd ext3 raid1 dm_mod edd"
    INITRD_MODULES="sata_sil24 processor thermal sata_nv pata_amd fan jbd ext3 raid1 dm_mod edd"

and a subsequent mkinitrd.
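The INITRD_MODULES change in comment #1 can also be scripted. A minimal sketch, assuming the /etc/sysconfig/kernel format shown there; `add_initrd_module` is a hypothetical name, and the function writes to stdout so the result can be reviewed before the real file or the initrd is touched.

```shell
#!/bin/sh
# Hypothetical helper (not part of mkinitrd or SUSE tooling): read a
# /etc/sysconfig/kernel-style file on stdin and prepend a module name to the
# active INITRD_MODULES="..." line, printing the result on stdout.
# Commented-out lines (starting with '#') are left untouched.
# Assumes the module name contains no '/' or '&' (true for kernel modules);
# it does not check whether the module is already listed.
add_initrd_module() {
    sed "s/^INITRD_MODULES=\"/&$1 /"
}

# Possible usage (review the diff before replacing the real file):
#   add_initrd_module sata_sil24 < /etc/sysconfig/kernel > /tmp/kernel.new
#   diff /etc/sysconfig/kernel /tmp/kernel.new
#   cp /tmp/kernel.new /etc/sysconfig/kernel && mkinitrd
```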
Jiri Kosina <jkosina@novell.com> changed:

    What       |Removed                                   |Added
    ----------------------------------------------------------------------------
    AssignedTo |bnc-team-screening@forge.provo.novell.com |teheo@novell.com
User teheo@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c2
Tejun Heo <teheo@novell.com> changed:

    What          |Removed |Added
    ----------------------------------------------------------------------------
    Status        |NEW     |NEEDINFO
    Info Provider |        |pgnet.trash@gmail.com

--- Comment #2 from Tejun Heo <teheo@novell.com> 2009-01-12 20:15:58 MST ---
Can you please attach /var/log/boot.msg?
User teheo@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c3
--- Comment #3 from Tejun Heo <teheo@novell.com> 2009-01-12 20:16:19 MST ---
Also, does the irqpoll kernel parameter help?
User pgnet.trash@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c4
--- Comment #4 from pgnet _ <pgnet.trash@gmail.com> 2009-01-12 21:49:49 MST ---
Created an attachment (id=264673) --> (https://bugzilla.novell.com/attachment.cgi?id=264673)
serial console output @ crash, with sata_sil24 card & attached drives

(In reply to comment #2)
> Can you please attach /var/log/boot.msg?

While dealing with other Novell issues, I was convinced to upgrade to openSUSE 11.1.

openSUSE 11.1, with the SATA card installed but the attached drive array turned off, boots just fine into either kernel-default or kernel-xen, as it did before. However, with the drives powered up, it now won't boot; it crashes.

So I can't, at the moment, get you the requested /var/log/boot.msg. I _have_ attached (console.txt) the serial console output, which I hope may provide the info you need ...?

I'll also check 'irqpoll' in a few minutes here ...
User pgnet.trash@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c5
pgnet _ <pgnet.trash@gmail.com> changed:

    What   |Removed  |Added
    ----------------------------------------------------------------------------
    Status |NEEDINFO |NEW

--- Comment #5 from pgnet _ <pgnet.trash@gmail.com> 2009-01-12 22:37:56 MST ---
Created an attachment (id=264678) --> (https://bugzilla.novell.com/attachment.cgi?id=264678)
boot.msg after addition of 'irqpoll' kernel param

I've added 'irqpoll' to the kernel parameters. The boot completed, and I'm attaching boot.msg, per your request. As before, there's still no trace of sdc/sdd, as far as I can see.
User teheo@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c6
Tejun Heo <teheo@novell.com> changed:

    What          |Removed               |Added
    ----------------------------------------------------------------------------
    Info Provider |pgnet.trash@gmail.com |

--- Comment #6 from Tejun Heo <teheo@novell.com> 2009-01-13 20:41:01 MST ---
Hmm... the oops is probably a separate issue. Does specifying "libata.force=1.5Gbps" make any difference?
Tejun Heo <teheo@novell.com> changed:

    What          |Removed |Added
    ----------------------------------------------------------------------------
    Status        |NEW     |NEEDINFO
    Info Provider |        |pgnet.trash@gmail.com
User pgnet.trash@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c7
pgnet _ <pgnet.trash@gmail.com> changed:

    What   |Removed  |Added
    ----------------------------------------------------------------------------
    Status |NEEDINFO |NEW

--- Comment #7 from pgnet _ <pgnet.trash@gmail.com> 2009-01-13 21:21:49 MST ---
Created an attachment (id=264924) --> (https://bugzilla.novell.com/attachment.cgi?id=264924)
console crash output with addition of libata.force=1.5Gps

Since my last efforts, I've downloaded the Win7 beta (not my favorite, btw!) to check and, as required, update all BIOSes, etc. The motherboard BIOS was upgraded; the SATA card didn't need it.

That given, per your request, booting to

    title openSUSE_11_1 CONSOLE
        root (hd0,0)
        kernel /vmlinuz root=/dev/VG_Dom0/LV_ROOT resume=/dev/md1 \
            showopts elevator=cfq iommu=soft irqpoll libata.force=1.5Gbps \
            vga=0x31a console=tty0 console=ttyS0,57600n8
        initrd /initrd

I get a crash (attached ...). Hrm. I thought the irqpoll fixed that ... I'll remove it and try again, just as a check.
User pgnet.trash@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c8
pgnet _ <pgnet.trash@gmail.com> changed:

    What          |Removed               |Added
    ----------------------------------------------------------------------------
    Info Provider |pgnet.trash@gmail.com |

--- Comment #8 from pgnet _ <pgnet.trash@gmail.com> 2009-01-13 21:39:08 MST ---
Changing

    - showopts elevator=cfq iommu=soft irqpoll libata.force=1.5Gbps \
    + showopts elevator=cfq iommu=soft irqpoll \

I can, again, boot to kernel-default or kernel-xen. Re-adding

    libata.force=1.5Gbps

it crashes again, as above. The cycle's reproducible, it seems.
User teheo@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c9
--- Comment #9 from Tejun Heo <teheo@novell.com> 2009-01-13 21:48:40 MST ---
Do the controller and hard drive work properly under Windows? Your two crash logs are completely different, which seems to indicate a hardware problem. Can you please trigger the crash several times and try to see whether the crashes are actually related to the parameter you're specifying or were just coincidental? Also, please capture each crash. Let's try to find some pattern.
Tejun Heo <teheo@novell.com> changed:

    What          |Removed |Added
    ----------------------------------------------------------------------------
    Status        |NEW     |NEEDINFO
    Info Provider |        |pgnet.trash@gmail.com
User pgnet.trash@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c10
--- Comment #10 from pgnet _ <pgnet.trash@gmail.com> 2009-01-13 21:59:57 MST ---
> Do the controller and hard drive work properly under Windows?

Yes, perfectly. I was able to create partitions on the individual drives and/or create RAID arrays, and partition at will. Then I was able to install different versions of the Windows drivers and, presuming there's some sort of checking when you do so, received no errors or warnings, and was able to repeat all functions at will. The gParted disks also allow me to partition at will (though, afaict, with no RAID capability). I tested that capability by partitioning in gParted as NTFS/FAT and then accessing the partitions in Win7. No problems experienced.

> Your two crash logs are completely different, which seems to indicate a hardware problem.

I had not thought to compare ... :-/ Though the differences *may* fall on 'either side' of the motherboard BIOS updates, the migration from openSUSE 11.0 -> 11.1, or my monkeying with loaded modules, etc.

> Can you please trigger the crash several times and try to see whether the crashes are actually related to the parameter you're specifying or were just coincidental? Also, please capture each crash. Let's try to find some pattern.

I'll repeat a number of crashes, changing nothing in the interim, capture each console output, and zip them into an attachment. Back asap.
User pgnet.trash@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c11
--- Comment #11 from pgnet _ <pgnet.trash@gmail.com> 2009-01-13 22:49:21 MST ---
Created an attachment (id=264925) --> (https://bugzilla.novell.com/attachment.cgi?id=264925)
zipped folder of several crash tests in a row, with libata.force=1.5GBps

Starting from a system booted to kernel-default, WITHOUT libata.force...,

    grep irqpoll /boot/grub/menu.lst
        showopts elevator=cfq iommu=soft irqpoll \
    uname -a
    Linux server 2.6.27.7-9-default #1 SMP 2008-12-04 18:10:04 +0100 x86_64 x86_64 x86_64 GNU/Linux

changing

    vi /boot/grub/menu.lst
    ...
    grep irqpoll /boot/grub/menu.lst
        showopts elevator=cfq iommu=soft irqpoll libata.force=1.5Gbps \

then:

    reboot
    crash --> console_crash_repeat_1.txt

    reboot
    booted ok !? --> console_crash_repeat_2.txt
                 --> /var/log/boot.msg

    ls -al /dev/sd*
    brw-rw---- 1 root disk 8,  0 2009-01-13 13:17 /dev/sda
    brw-rw---- 1 root disk 8, 16 2009-01-13 13:16 /dev/sdb
    brw-rw---- 1 root disk 8, 17 2009-01-13 13:16 /dev/sdb1
    brw-rw---- 1 root disk 8, 18 2009-01-13 13:16 /dev/sdb2
    brw-rw---- 1 root disk 8, 19 2009-01-13 13:16 /dev/sdb3
    brw-rw---- 1 root disk 8, 20 2009-01-13 13:16 /dev/sdb4
    brw-rw---- 1 root disk 8, 32 2009-01-13 13:16 /dev/sdc
    brw-rw---- 1 root disk 8, 33 2009-01-13 13:16 /dev/sdc1
    brw-rw---- 1 root disk 8, 34 2009-01-13 13:16 /dev/sdc2
    brw-rw---- 1 root disk 8, 35 2009-01-13 13:16 /dev/sdc3
    brw-rw---- 1 root disk 8, 36 2009-01-13 13:16 /dev/sdc4

Where'd the 4 sdc# partitions come from? And the sda# partitions are missing. It's *supposed* to be

    /dev/md0: sda1 + sdb1 (RAID-1) -> /boot, ext3
    /dev/md1: sda2 + sdb2 (RAID-1) -> swap, swap
    /dev/md2: sda3 + sdb3 (RAID-1) -> LVM PVs ...
    /dev/md3: sda4 + sdb4 (RAID-1) -> LVM PVs ...

continuing:

    reboot
    crash --> console_crash_repeat_3.txt

    reboot
    crash --> console_crash_repeat_4.txt

Back to

    vi /boot/grub/menu.lst
    ...
    grep irqpoll /boot/grub/menu.lst
        showopts elevator=cfq iommu=soft irqpoll \

    reboot

it boots OK. (Of course, there are still no sdc/sdd drives ...)
User pgnet.trash@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c12
pgnet _ <pgnet.trash@gmail.com> changed:

    What          |Removed               |Added
    ----------------------------------------------------------------------------
    Status        |NEEDINFO              |NEW
    Info Provider |pgnet.trash@gmail.com |

--- Comment #12 from pgnet _ <pgnet.trash@gmail.com> 2009-01-13 22:51:13 MST ---
.
User teheo@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c13
Tejun Heo <teheo@novell.com> changed:

    What          |Removed |Added
    ----------------------------------------------------------------------------
    Status        |NEW     |NEEDINFO
    Info Provider |        |pgnet.trash@gmail.com

--- Comment #13 from Tejun Heo <teheo@novell.com> 2009-01-13 23:42:38 MST ---
Does "mem=3G" make any difference?
User pgnet.trash@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c14
--- Comment #14 from pgnet _ <pgnet.trash@gmail.com> 2009-01-14 08:58:38 MST ---
Created an attachment (id=265042) --> (https://bugzilla.novell.com/attachment.cgi?id=265042)
console etc output with mem=3G

Adding mem=3G to the kernel options, boots to kernel-default and kernel-xen both complete, with no crash.

    kernel-default case: the PCI-attached drives appear, and can be accessed via fdisk and the partitioner
    kernel-xen case: the drives, again, are not available
pgnet _ <pgnet.trash@gmail.com> changed:

    What          |Removed               |Added
    ----------------------------------------------------------------------------
    Status        |NEEDINFO              |NEW
    Info Provider |pgnet.trash@gmail.com |
User teheo@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c15
Tejun Heo <teheo@novell.com> changed:

    What |Removed |Added
    ----------------------------------------------------------------------------
    CC   |        |trenn@novell.com

--- Comment #15 from Tejun Heo <teheo@novell.com> 2009-01-14 19:26:37 MST ---
Let's concentrate on kernel-default first.

Something is wrong with 64bit DMA support on that machine. I'm quite sure that 64bit DMA support on the sil24 controller is fine, so it's very likely that your PCI or host bridge can't handle 64bit DMA and thus corrupts data left and right when it happens.

cc'ing Thomas. Thomas, sorry to bother you, but do you happen to know who I should be bugging about possible chipset bugs? It seems we'll need to blacklist this chipset for 64bit DMA.

pgnet_, can you please post the output of "lspci -nn"?
User pgnet.trash@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c16
--- Comment #16 from pgnet _ <pgnet.trash@gmail.com> 2009-01-14 19:37:41 MST ---
Created an attachment (id=265174) --> (https://bugzilla.novell.com/attachment.cgi?id=265174)
output of 'lspci -nn'

> Let's concentrate on kernel-default first.

np

> Something is wrong with 64bit DMA support on that machine ...

Is that what the 'mem=3G' is affecting? That certainly seems to have done the trick ...

> pgnet_, can you please post the output of "lspci -nn"?

(Attached.) FYI, at the moment the attached drives are functioning in a S/W RAID-1 array, with multiple partitions, volume groups and logical volumes, with a mix of ext3 and xfs filesystems. So far, a few read/write tests have proven reliable ...
User teheo@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c17
--- Comment #17 from Tejun Heo <teheo@novell.com> 2009-01-14 20:52:56 MST ---
mem=4G should work too. I just wanted to be on the safe side. Your PCI/host bridge is either dropping data with 64bit addresses thrown at it by the controller or, in a much scarier scenario, writing it to some random place (most likely with the upper 32 bits of the address clipped).
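To make "upper 32 bits clipped" concrete: if a bridge forwards only the low 32 bits of a 64-bit DMA address, a buffer above 4 GiB lands at an unrelated address below 4 GiB. The shell arithmetic below illustrates that masking only; the addresses are arbitrary, and this models Tejun's hypothesis rather than confirming what this board actually does.

```shell
#!/bin/sh
# Illustration only, not kernel code: what clipping the upper 32 bits of a
# DMA target address would do. The example addresses are made up.
clip_to_32bit() {
    # Keep only the low 32 bits, as a bridge that drops the high dword would.
    printf '0x%x\n' $(( $1 & 0xFFFFFFFF ))
}

clip_to_32bit 0x123456000   # buffer just above 4 GiB -> prints 0x23456000
clip_to_32bit 0x7f000000    # buffer below 4 GiB is unchanged -> prints 0x7f000000
```

Assuming mem=3G / mem=4G keep all usable RAM below the 4 GiB boundary, every DMA address already fits in 32 bits and the clipping is a no-op, which would be consistent with those boots seeing the drives.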
User pgnet.trash@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c18
--- Comment #18 from pgnet _ <pgnet.trash@gmail.com> 2009-01-14 21:10:33 MST ---
(In reply to comment #17)
> mem=4G should work too. I just wanted to be on the safe side.

OK.

> Your PCI/host bridge is either dropping data with 64bit addresses thrown at it by the controller or, in a much scarier scenario, writing it to some random place (most likely with the upper 32 bits of the address clipped).

In either case, is that likely due to hardware (mis)design, or to a hardware malfunction? Recall that the card certainly worked on 64-bit Win7 ... whether that's telling at all, I don't know.
User pgnet.trash@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c19
--- Comment #19 from pgnet _ <pgnet.trash@gmail.com> 2009-01-14 21:29:15 MST ---
> mem=4G should work too

Just FYI, it does.
User pgnet.trash@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c20
--- Comment #20 from pgnet _ <pgnet.trash@gmail.com> 2009-01-14 22:22:35 MST ---
> It seems we'll need to blacklist this chipset for 64bit DMA.

A naive question, but what does 'blacklist' imply? Does the end-game include a working array, or me needing to consider other hardware?
User teheo@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c21
--- Comment #21 from Tejun Heo <teheo@novell.com> 2009-01-14 22:30:19 MST ---
Heh... you'll get to keep your hardware. :-) It just means marking the bridge as incapable of 64bit DMA. The kernel will fall back to the iommu. The performance hit shouldn't be noticeable in most cases. Windows is probably already using the iommu.
User pgnet.trash@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c22
--- Comment #22 from pgnet _ <pgnet.trash@gmail.com> 2009-01-14 22:42:14 MST ---
> Heh... you'll get to keep your hardware. :-)

Just thought I'd ask ;-) the way my luck's been running lately ...

> It just means marking the bridge as incapable of 64bit DMA.

I'll be curious to know which component is, in fact, the culprit ... mobo and/or PCI card ...

> The kernel will fall back to the iommu. The performance hit shouldn't be noticeable in most cases. Windows is probably already using the iommu.

Hm. iommu. My understanding from other threads/topics was that, for this mobo/CPU, I need "iommu=soft", which, you may note, _is_ added to my kernel opts. Could that be an issue? I've more questions than answers, I'm afraid. Well, actually, _only_ questions, at this point. Thanks.
User teheo@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c23
--- Comment #23 from Tejun Heo <teheo@novell.com> 2009-01-14 23:40:03 MST ---
Most likely the mobo. And no: for 64bit-capable controllers the iommu is not used, as the controller can directly address all of memory. Does iommu=force work?
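A simplified model of the rule described in comments #21 and #23: a mapping goes through the IOMMU (or a bounce buffer) only when the buffer ends above what the device's DMA mask can address. `needs_remap` is a hypothetical illustration, not the kernel's DMA-mapping code; treating a 64-bit mask as "never remap" reflects the behaviour Tejun describes for 64bit-capable controllers.

```shell
#!/bin/sh
# Simplified model (not kernel code): does a DMA buffer have to be remapped
# through an IOMMU or bounce buffer for a device with the given mask width?
#   $1 = bus address, $2 = length in bytes, $3 = DMA mask width in bits
# Exit status 0 means "remap needed".
needs_remap() {
    [ "$3" -lt 64 ] && [ $(( $1 + $2 - 1 > (1 << $3) - 1 )) -eq 1 ]
}

# 64-bit-capable controller: used directly, so a faulty bridge sees the raw address.
needs_remap 0x123456000 4096 64 || echo "64-bit mask: direct"
# Same buffer with the controller limited to 32 bits: goes through the IOMMU.
needs_remap 0x123456000 4096 32 && echo "32-bit mask: remap"
```

In this model, blacklisting the bridge amounts to switching the controller's mask from 64 to 32 bits. The x86-64 boot-options documentation describes `iommu=force` as using the hardware IOMMU even when it is not strictly needed, which is what the next comment tries.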
https://bugzilla.novell.com/show_bug.cgi?id=463829 User pgnet.trash@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c24 --- Comment #24 from pgnet _ <pgnet.trash@gmail.com> 2009-01-15 07:44:27 MST --- Created an attachment (id=265365) --> (https://bugzilla.novell.com/attachment.cgi?id=265365) console output for kernel-xen boot, with iommu=force

changing,

- showopts elevator=cfq iommu=soft irqpoll mem=4G \
+ showopts elevator=cfq iommu=force irqpoll \

then booting, the console shows,

...
-> Your BIOS doesn't leave a aperture memory hole
Please enable the IOMMU option in the BIOS setup
This costs you 64 MB of RAM
Mapping aperture over 65536 KB of RAM @ 20000000
...

which is the reason I was told, @ MSI, to enable iommu=soft in the first place, since, according to MSI, my mobo does NOT support IOMMU ... but I also now see,

...
PCI-DMA: Disabling AGP.
PCI-DMA: aperture base @ 20000000 size 65536 KB
PCI-DMA: using GART IOMMU.
PCI-DMA: Reserving 64MB of IOMMU area in the AGP aperture
...

which looks promising, and ... boot completes.

uname -a
Linux server 2.6.27.7-9-default #1 SMP 2008-12-04 18:10:04 +0100 x86_64 x86_64 x86_64 GNU/Linux

checking, the attached disks are available!

fdisk -l | grep sd | grep autodetect
/dev/sda1 2 78326 629145562+ fd Linux raid autodetect
/dev/sda2 78327 121601 347606437+ fd Linux raid autodetect
/dev/sdb1 2 78326 629145562+ fd Linux raid autodetect
/dev/sdb2 78327 121601 347606437+ fd Linux raid autodetect
/dev/sdc1 2 20 152617+ fd Linux raid autodetect
/dev/sdc2 21 85 522112+ fd Linux raid autodetect
/dev/sdc3 86 1651 12578895 fd Linux raid autodetect
/dev/sdc4 1652 30401 230934375 fd Linux raid autodetect
/dev/sdd1 * 2 20 152617+ fd Linux raid autodetect
/dev/sdd2 21 85 522112+ fd Linux raid autodetect
/dev/sdd3 86 1651 12578895 fd Linux raid autodetect
/dev/sdd4 1652 30401 230934375 fd Linux raid autodetect

similarly changing for the kernel-xen boot,

- showopts splash=silent selinux=0 elevator=cfq iommu=soft irqpoll mem=4G \
+ showopts splash=silent selinux=0 elevator=cfq iommu=force irqpoll \

then on reboot I see some xen-specific messaging @ console (attachment), but the system completely boots,

uname -a
Linux server 2.6.27.7-9-xen #1 SMP 2008-12-04 18:10:04 +0100 x86_64 x86_64 x86_64 GNU/Linux

but the attached drives are missing again,

fdisk -l | grep sd | grep autodetect
/dev/sda1 2 20 152617+ fd Linux raid autodetect
/dev/sda2 21 85 522112+ fd Linux raid autodetect
/dev/sda3 86 1651 12578895 fd Linux raid autodetect
/dev/sda4 1652 30401 230934375 fd Linux raid autodetect
/dev/sdb1 * 2 20 152617+ fd Linux raid autodetect
/dev/sdb2 21 85 522112+ fd Linux raid autodetect
/dev/sdb3 86 1651 12578895 fd Linux raid autodetect
/dev/sdb4 1652 30401 230934375 fd Linux raid autodetect

nonetheless, progress, it seems ...
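The boot-option change above (swapping iommu=soft for iommu=force) can be scripted rather than hand-edited. What follows is a minimal sketch, assuming the kernel options are available as a single string such as the tail of a GRUB "kernel" line; the helper name `set_iommu_opt` is hypothetical and not an openSUSE tool.

```shell
# Hypothetical helper: replace (or append) the iommu= option in a kernel
# command-line string, e.g. the options from a GRUB "kernel" line.
set_iommu_opt() {
    cmdline=$1   # existing kernel options
    mode=$2      # new value, e.g. soft or force
    case " $cmdline " in
        *" iommu="*)
            # Replace only a standalone iommu=... token: anchoring on the
            # start of the string or a preceding space leaves options such
            # as amd_iommu=... untouched.
            printf '%s\n' "$cmdline" |
                sed -e "s/^iommu=[^ ]*/iommu=$mode/" -e "s/ iommu=[^ ]*/ iommu=$mode/g"
            ;;
        *)
            # No iommu= option present yet: append one.
            printf '%s iommu=%s\n' "$cmdline" "$mode"
            ;;
    esac
}
```

For example, `set_iommu_opt 'showopts elevator=cfq iommu=soft irqpoll' force` prints the same options with iommu=force. The sketch does not drop other options (the thread's edit also removed mem=4G), so that part would still be done by hand.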
https://bugzilla.novell.com/show_bug.cgi?id=463829 User teheo@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c25 Tejun Heo <teheo@novell.com> changed: CC: (added) gregkh@novell.com, rjw@novell.com --- Comment #25 from Tejun Heo <teheo@novell.com> 2009-01-15 22:57:23 MST --- cc'ing Rafael and Greg. Can one of you guys take over here? It seems the chipset needs to be quirked.
https://bugzilla.novell.com/show_bug.cgi?id=463829 User pgnet.trash@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c27 --- Comment #27 from pgnet _ <pgnet.trash@gmail.com> 2009-01-16 10:33:23 MST --- re: the drives-still-missing-in-xen behavior ... likely 'chipset quirks' as well? separate kernel/xen issue? or, just wait & see?
https://bugzilla.novell.com/show_bug.cgi?id=463829 User teheo@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c29 Tejun Heo <teheo@novell.com> changed: Status: ASSIGNED -> NEEDINFO; Info Provider: (none) -> pgnet.trash@gmail.com --- Comment #29 from Tejun Heo <teheo@novell.com> 2009-01-21 06:11:43 MST --- pgnet_, can you please post the output of "lspci -nnvvvxxx"?
https://bugzilla.novell.com/show_bug.cgi?id=463829 User pgnet.trash@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c30 pgnet _ <pgnet.trash@gmail.com> changed: Status: NEEDINFO -> ASSIGNED; Info Provider: pgnet.trash@gmail.com -> (none) --- Comment #30 from pgnet _ <pgnet.trash@gmail.com> 2009-01-21 07:20:21 MST --- Created an attachment (id=266525) --> (https://bugzilla.novell.com/attachment.cgi?id=266525) output of 'lspci -nnvvvxxx' per request ...
https://bugzilla.novell.com/show_bug.cgi?id=463829 User pgnet.trash@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c31 --- Comment #31 from pgnet _ <pgnet.trash@gmail.com> 2009-02-02 11:50:02 MST --- have there perchance been any updates/commits on this? none that i can see in the RELEASE branch, but perhaps in Factory? elsewhere?
https://bugzilla.novell.com/show_bug.cgi?id=463829 User fifachen@sina.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c32 peer chen <fifachen@sina.com> changed: CC: (added) fifachen@sina.com --- Comment #32 from peer chen <fifachen@sina.com> 2009-02-05 00:57:46 MST --- pgnet_, could you update to the latest BIOS and try again?
https://bugzilla.novell.com/show_bug.cgi?id=463829 User pgnet.trash@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c33 --- Comment #33 from pgnet _ <pgnet.trash@gmail.com> 2009-02-05 08:19:56 MST --- the mobo's already @ the latest bios, v5.8
https://bugzilla.novell.com/show_bug.cgi?id=463829 User teheo@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c34 Tejun Heo <teheo@novell.com> changed: Status: ASSIGNED -> NEEDINFO; Info Provider: (none) -> pgnet.trash@gmail.com --- Comment #34 from Tejun Heo <teheo@novell.com> 2009-02-05 09:01:31 MST --- Can you please test the following kernel? http://htj.dyndns.org/export/testing/sl111-x86_64-bug463829_dbg0/
https://bugzilla.novell.com/show_bug.cgi?id=463829 User pgnet.trash@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c35 --- Comment #35 from pgnet _ <pgnet.trash@gmail.com> 2009-02-05 09:33:51 MST ---
@ install,

setenv VER "2.6.27.14-bug463829_dbg0_d49a1108"
rpm -i kernel-default-base-${VER}.x86_64.rpm kernel-default-${VER}.x86_64.rpm
Kernel image: /boot/vmlinuz-2.6.27.14-bug463829_dbg0_d49a1108-default
Initrd image: /boot/initrd-2.6.27.14-bug463829_dbg0_d49a1108-default
Root device: /dev/VG_Dom0/LV_ROOT (mounted on / as ext3)
Resume device: /dev/VG_Swap/LV_SWAP
Device md3 not handled
Script /lib/mkinitrd/setup/72-block.sh failed!
Setting up /lib/modules/2.6.27.14-bug463829_dbg0_d49a1108-default
Kernel image: /boot/vmlinuz-2.6.27.14-bug463829_dbg0_d49a1108-default
Initrd image: /boot/initrd-2.6.27.14-bug463829_dbg0_d49a1108-default
Root device: /dev/VG_Dom0/LV_ROOT (mounted on / as ext3)
Resume device: /dev/VG_Swap/LV_SWAP
Device md3 not handled
Script /lib/mkinitrd/setup/72-block.sh failed!

before proceeding, are these:

Device md3 not handled
Script /lib/mkinitrd/setup/72-block.sh failed!

a concern that first needs to be addressed?
https://bugzilla.novell.com/show_bug.cgi?id=463829 User pgnet.trash@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c36 pgnet _ <pgnet.trash@gmail.com> changed: Status: NEEDINFO -> ASSIGNED; Info Provider: pgnet.trash@gmail.com -> (none) --- Comment #36 from pgnet _ <pgnet.trash@gmail.com> 2009-02-05 15:38:05 MST --- Created an attachment (id=270627) --> (https://bugzilla.novell.com/attachment.cgi?id=270627) dmesg output from boot of test kernel

dropping back to

uname -a
Linux server 2.6.27.7-9-default #1 SMP 2008-12-04 18:10:04 +0100 x86_64 x86_64 x86_64 GNU/Linux

& rebooting mysteriously removes that issue; mkinitrd completes without error.

retrying,

setenv VER "2.6.27.14-bug463829_dbg0_d49a1108"
rpm -qa | grep kernel | grep $VER
(empty)
rpm -i kernel-default-*${VER}*
Setting up /lib/modules/2.6.27.14-bug463829_dbg0_d49a1108-default
Kernel image: /boot/vmlinuz-2.6.27.14-bug463829_dbg0_d49a1108-default
Initrd image: /boot/initrd-2.6.27.14-bug463829_dbg0_d49a1108-default
Root device: /dev/VG_Dom0/LV_ROOT (mounted on / as ext3)
Resume device: /dev/VG_Swap/LV_SWAP
Kernel Modules: raid0 raid1 xor async_tx async_memcpy async_xor raid456 jbd dock scsi_mod libata sata_sil24 sata_nv pata_amd ata_generic ide-core amd74xx ide-pci-generic dm-mod crypto_blkcipher dm-crypt dm-snapshot mbcache ext3 xfs aes_generic aes-x86_64 sha1_generic sha256_generic sha512_generic hwmon thermal_sys processor thermal fan edd cpuid crc-t10dif sd_mod usbcore ohci-hcd uhci-hcd ehci-hcd ff-memless hid usbhid linear
Features: dm block usb md lvm2 resume.userspace resume.kernel
Bootsplash: openSUSE (1280x1024)
30189 blocks

checking,

rpm -qa | grep kernel | grep $VER
kernel-default-base-2.6.27.14-bug463829_dbg0_d49a1108
kernel-default-2.6.27.14-bug463829_dbg0_d49a1108
kernel-default-extra-2.6.27.14-bug463829_dbg0_d49a1108

rebooting to,

title openSUSE 11.1 - 2.6.27.14-bug463829_dbg0_d49a1108
root (hd0,0)
kernel /vmlinuz-2.6.27.14-bug463829_dbg0_d49a1108-default root=/dev/VG_Dom0/LV_ROOT resume=/dev/VG_Swap/LV_SWAP splash=silent showopts vga=0x31a
initrd /initrd-2.6.27.14-bug463829_dbg0_d49a1108-default

i've,

uname -a
Linux server 2.6.27.14-bug463829_dbg0_d49a1108-default #1 SMP 2009-02-05 15:21:23 +0100 x86_64 x86_64 x86_64 GNU/Linux

and,

fdisk -l | grep sd | grep autodetect
Disk /dev/md2 doesn't contain a valid partition table
Disk /dev/md3 doesn't contain a valid partition table
Disk /dev/dm-0 doesn't contain a valid partition table
Disk /dev/dm-1 doesn't contain a valid partition table
Disk /dev/dm-2 doesn't contain a valid partition table
Disk /dev/dm-4 doesn't contain a valid partition table
Disk /dev/dm-7 doesn't contain a valid partition table
Disk /dev/md1 doesn't contain a valid partition table
Disk /dev/dm-10 doesn't contain a valid partition table
/dev/sda1 2 78326 629145562+ fd Linux raid autodetect
/dev/sda2 78327 121601 347606437+ fd Linux raid autodetect
/dev/sdb1 2 78326 629145562+ fd Linux raid autodetect
/dev/sdb2 78327 121601 347606437+ fd Linux raid autodetect
/dev/sdc1 2 20 152617+ fd Linux raid autodetect
/dev/sdc2 21 85 522112+ fd Linux raid autodetect
/dev/sdc3 86 1651 12578895 fd Linux raid autodetect
/dev/sdc4 1652 30401 230934375 fd Linux raid autodetect
/dev/sdd1 * 2 20 152617+ fd Linux raid autodetect
/dev/sdd2 21 85 522112+ fd Linux raid autodetect
/dev/sdd3 86 1651 12578895 fd Linux raid autodetect
/dev/sdd4 1652 30401 230934375 fd Linux raid autodetect

it seems the externally attached SATA drives are there, again. checking, they're also available in the YaST partitioner.

i've attached the output of:

dmesg | egrep -i "^raid|^scsi|^ata|^md|^sd|^pata|^sata"

note that this boot was *without* the previously required kernel opts:

iommu=force irqpoll

also, i'm not sure what the "doesn't contain a valid partition table" messages are about ... of course, no option to test -xen with this kernel yet. one thing at a time ...
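Comparing boots by eye (which of sdc/sdd showed up under which kernel) is error-prone with this much output. A minimal sketch that reduces `fdisk -l`-style output to the list of disks carrying "Linux raid autodetect" partitions; the function name `raid_disks` is hypothetical, and it assumes the fdisk output format shown in this thread.

```shell
# Hypothetical helper: print each distinct /dev/sdX disk that has at least
# one "Linux raid autodetect" partition, reading fdisk -l output on stdin.
raid_disks() {
    awk '/Linux raid autodetect/ && $1 ~ /^\/dev\/sd/ {
        disk = $1
        sub(/[0-9]+$/, "", disk)   # /dev/sdc4 -> /dev/sdc
        print disk
    }' | sort -u
}
```

Usage would be along the lines of `fdisk -l 2>/dev/null | raid_disks`; on the test-kernel boot above this should list sda through sdd, while the earlier xen boot would list only sda and sdb.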
https://bugzilla.novell.com/show_bug.cgi?id=463829 User teheo@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c37 Tejun Heo <teheo@novell.com> changed: Status: ASSIGNED -> NEEDINFO; Info Provider: (none) -> pgnet.trash@gmail.com --- Comment #37 from Tejun Heo <teheo@novell.com> 2009-02-05 18:32:44 MST --- Hmm... I have no idea what went wrong with the initial mkinitrd; strange. Anyway, can you please verify that the drives are actually functional? Make a fs, copy files, run md5sum on them, etc.
https://bugzilla.novell.com/show_bug.cgi?id=463829 User pgnet.trash@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c38 pgnet _ <pgnet.trash@gmail.com> changed: Status: NEEDINFO -> ASSIGNED; Info Provider: pgnet.trash@gmail.com -> (none) --- Comment #38 from pgnet _ <pgnet.trash@gmail.com> 2009-02-05 21:00:16 MST --- Created an attachment (id=270661) --> (https://bugzilla.novell.com/attachment.cgi?id=270661) 'exercise' of sata_sil24 attached drives with debug kernel
> please verify that the drives are actually functional? Make a fs, copy files, run md5sum on them, etc..
here are the notes from a once-through on the drives. just a few small files ... not coverage of the whole drives ... but, afaict, looks good.
https://bugzilla.novell.com/show_bug.cgi?id=463829 User pgnet.trash@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c39 --- Comment #39 from pgnet _ <pgnet.trash@gmail.com> 2009-02-05 23:05:59 MST --- Created an attachment (id=270674) --> (https://bugzilla.novell.com/attachment.cgi?id=270674) console output of subsequent failure to boot

well ... despite seemingly working OK, the box hangs at the subsequent reboot. the relevant serial console output is attached. the only way to get back to a prompt is a single-user boot, then removing the /etc/fstab mount entries for the sata_sil24-attached devices.
https://bugzilla.novell.com/show_bug.cgi?id=463829 User pgnet.trash@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c40 --- Comment #40 from pgnet _ <pgnet.trash@gmail.com> 2009-02-05 23:20:08 MST ---
if i remove the /etc/fstab entries, i can again boot to the debug kernel,

uname -a
Linux server 2.6.27.14-bug463829_dbg0_d49a1108-default #1 SMP 2009-02-05 15:21:23 +0100 x86_64 x86_64 x86_64 GNU/Linux

and the drives are there,

Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x0008de85

Device Boot Start End Blocks Id System
/dev/sda1 1 79025 634768281 fd Linux raid autodetect
/dev/sda2 79026 121601 341991720 fd Linux raid autodetect

Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x000f14a1

Device Boot Start End Blocks Id System
/dev/sdb1 1 79025 634768281 fd Linux raid autodetect
/dev/sdb2 79026 121601 341991720 fd Linux raid autodetect

if i re-enable the /etc/fstab entries, then

mount -a
mount: special device /dev/VG03/D1 does not exist
mount: special device /dev/VG04/D2 does not exist
mount: special device /dev/VG04/D3 does not exist

checking,

ls -1d /dev/VG*
/dev/VG_Dom0/
/dev/VG_DomU/
/dev/VG_Swap/

the VGs & LVs i'd created, and used,

df -H | egrep "D1|D2|D3"
/dev/mapper/VG03-LV_D1 650G 4.4M 650G 1% /home/stor/D1
/dev/mapper/VG04-LV_D2 228G 197M 216G 1% /home/stor/D2
/dev/mapper/VG04-LV_D3 118G 197M 112G 1% /home/stor/D3

are no longer there :-/ is it something in my 'process'? or a problem with the kernel?
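Checking `ls /dev/VG*` after each boot works, but a scripted check makes "which volume groups vanished" explicit. A minimal sketch, assuming the present VG names are piped in from LVM's `vgs --noheadings -o vg_name`; the function name `missing_vgs` is hypothetical, and VG03/VG04 are simply the names used in this comment.

```shell
# Hypothetical helper: report expected volume groups that are absent.
# Present VG names come in on stdin, one per line, e.g. from
#   vgs --noheadings -o vg_name
missing_vgs() {
    # vgs pads its columns with blanks; trim them before comparing
    present=$(sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
    status=0
    for vg in "$@"; do
        # -x: whole-line match, -F: treat the VG name as a fixed string
        if ! printf '%s\n' "$present" | grep -qxF -- "$vg"; then
            echo "missing: $vg"
            status=1
        fi
    done
    return "$status"
}
```

For example, `vgs --noheadings -o vg_name | missing_vgs VG03 VG04` would print one "missing:" line per absent group and return non-zero, which is convenient to run right after a reboot before attempting `mount -a`.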
https://bugzilla.novell.com/show_bug.cgi?id=463829 User fifachen@sina.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c41 --- Comment #41 from peer chen <fifachen@sina.com> 2009-02-09 02:52:02 MST --- We found the related bug in database and wrong BIOS setting cause this issue. Please blacklist this model platform and use software iommu for 4g above DMA access since the latest BIOS also have same problem.
https://bugzilla.novell.com/show_bug.cgi?id=463829 User pgnet.trash@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c42 pgnet _ <pgnet.trash@gmail.com> changed: Status: ASSIGNED -> NEEDINFO; Info Provider: (none) -> teheo@novell.com --- Comment #42 from pgnet _ <pgnet.trash@gmail.com> 2009-02-09 08:38:12 MST --- (In reply to comment #41)
> We found the related bug in database and wrong BIOS setting cause this issue. Please blacklist this model platform and use software iommu for 4g above DMA access since the latest BIOS also have same problem.
re: iommu (=force?), should this be used with your test kernel? would that possibly be responsible for the last result, i.e. the problems with the missing /etc/fstab mounts?
https://bugzilla.novell.com/show_bug.cgi?id=463829 User teheo@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c43 Tejun Heo <teheo@novell.com> changed: Info Provider: teheo@novell.com -> pgnet.trash@gmail.com --- Comment #43 from Tejun Heo <teheo@novell.com> 2009-02-09 22:02:36 MST --- Peer Chen, thanks for verifying, but the problem is that the debug kernel disables DAC (64-bit DMA) on the board and pgnet_ is still seeing data corruption. Maybe something is wrong with the GART IOMMU too? pgnet_, can you please try "iommu=soft"?
https://bugzilla.novell.com/show_bug.cgi?id=463829 User pgnet.trash@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c44 --- Comment #44 from pgnet _ <pgnet.trash@gmail.com> 2009-02-09 22:54:56 MST --- (In reply to comment #43)
> pgnet_, can you please try "iommu=soft"?
certainly. working on a couple of other bugs ... will get to this asap. just to check, iommu=force or iommu=soft? and irqpoll too, or not?
https://bugzilla.novell.com/show_bug.cgi?id=463829 User pgnet.trash@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c45 pgnet _ <pgnet.trash@gmail.com> changed: Status: NEEDINFO -> ASSIGNED; Info Provider: pgnet.trash@gmail.com -> (none) --- Comment #45 from pgnet _ <pgnet.trash@gmail.com> 2009-02-10 00:24:12 MST --- with iommu=soft, the results are the same as documented in the attachment above, "'exercise' of sata_sil24 attached drives with debug kernel" https://bugzilla.novell.com/attachment.cgi?id=270674

everything works with the attached drives initially. @ reboot, with mounts in /etc/fstab, the boot fails/crashes until the /etc/fstab entries are removed. then, no trace of the VGs i created on the drives. let me know if it's worthwhile trying different kernel options at this point. thanks.
https://bugzilla.novell.com/show_bug.cgi?id=463829 User teheo@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c46 Tejun Heo <teheo@novell.com> changed: Status: ASSIGNED -> NEEDINFO; Info Provider: (none) -> pgnet.trash@gmail.com --- Comment #46 from Tejun Heo <teheo@novell.com> 2009-02-10 08:17:01 MST --- Can you please also try iommu=force? Ah... strange. There should be no DAC DMA cycle with the patch applied. The detection went okay, but your drive is either not receiving the correct data or not able to return it correctly. Can you please test with a simpler method? I.e., dd a file to the raw device, reboot, dd the written data back in multiple times, and see whether the checksum matches the original and, if not, whether it changes on each read.
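The request above boils down to a three-way outcome: every read matches the original, reads agree with each other but not with the original, or reads disagree with each other. A minimal sketch of that classification, assuming the checksums have already been collected (e.g. with sha1sum); the function name `classify_reads` and its output words are hypothetical. Reading a stable mismatch as pointing at the write path and a varying one at the read path is an interpretation, not something stated in the thread.

```shell
# Hypothetical helper: classify read-back checksums.
# Usage: classify_reads ORIGINAL_SUM READ_SUM [READ_SUM...]
classify_reads() {
    if [ "$#" -lt 2 ]; then
        echo "usage: classify_reads ORIGINAL_SUM READ_SUM..." >&2
        return 2
    fi
    orig=$1
    shift
    first=$1
    all_match=yes
    all_same=yes
    for sum in "$@"; do
        [ "$sum" = "$orig" ] || all_match=no
        [ "$sum" = "$first" ] || all_same=no
    done
    if [ "$all_match" = yes ]; then
        echo ok                 # data round-trips correctly
    elif [ "$all_same" = yes ]; then
        echo stable-mismatch    # same wrong data on every read
    else
        echo varying-mismatch   # reads disagree with each other
    fi
}
```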
https://bugzilla.novell.com/show_bug.cgi?id=463829 User pgnet.trash@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c47 pgnet _ <pgnet.trash@gmail.com> changed: Status: NEEDINFO -> ASSIGNED; Info Provider: pgnet.trash@gmail.com -> (none) --- Comment #47 from pgnet _ <pgnet.trash@gmail.com> 2009-02-10 11:07:07 MST --- Created an attachment (id=271618) --> (https://bugzilla.novell.com/attachment.cgi?id=271618) tests of iommu= settings with kernel-default & kernel-bug...

these results, using a 'simpler' format & test, are different & confusing ...

kernel-default + iommu=soft -> NO DISKS
kernel-default + iommu=force -> OK
kernel-bug... + iommu=force -> OK
kernel-bug... + iommu=soft -> OK
kernel-default + (NO iommu=...) -> kernel OOPS

could the reformat using gParted have 'cleared up' some disk corruption that cfdisk might have missed? next?
https://bugzilla.novell.com/show_bug.cgi?id=463829 User pgnet.trash@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c48 --- Comment #48 from pgnet _ <pgnet.trash@gmail.com> 2009-02-10 12:03:13 MST --- Created an attachment (id=271642) --> (https://bugzilla.novell.com/attachment.cgi?id=271642) with & without iommu=soft, only kernel-bug ...

as long as iommu=soft is set, kernel-bug... boots OK with the simply-formatted, externally attached disks. next, check with LVM & RAID etc.
https://bugzilla.novell.com/show_bug.cgi?id=463829 User pgnet.trash@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c49 --- Comment #49 from pgnet _ <pgnet.trash@gmail.com> 2009-02-10 12:22:23 MST --- Created an attachment (id=271652) --> (https://bugzilla.novell.com/attachment.cgi?id=271652) test RAID array creation on attached drives w/ kernel-debug+iommu=soft

i'm able to boot to kernel-debug+iommu=soft with the /etc/fstab mounts defined. but an attempt to create a RAID-1 array across the attached drives fails with:

mdadm: device /dev/sda1 not suitable for any style of array
https://bugzilla.novell.com/show_bug.cgi?id=463829 User teheo@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c50 Tejun Heo <teheo@novell.com> changed: Status: ASSIGNED -> NEEDINFO; Info Provider: (none) -> pgnet.trash@gmail.com --- Comment #50 from Tejun Heo <teheo@novell.com> 2009-02-10 18:16:46 MST --- Eh.. let's please leave out lvm, md and filesystems for now. They just add complexity, and if you keep the filesystem mounted across tests, the results don't mean much because the system caches file contents in memory. Please do the following.

1. Boot the kernel to test. No fstab entry or raid array.
2. dd if=TESTSOURCEFILE of=TESTTMPFILE bs=1M count=512 # change count as necessary
3. sha1sum TESTTMPFILE
4. dd if=TESTTMPFILE of=/dev/sdX bs=1M count=512
5. for ((i=0;i<3;i++)); do dd if=/dev/sdX of=TESTOUT bs=1M count=512; sha1sum TESTOUT; done

Please make sure the device to test is not used by any FS before doing the test. The above test sequence will flush all the caches after each dd, as they'll be the last users. Thanks.
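Steps 2 to 5 can be wrapped in one function so the write/read-back comparison is automatic. This is a minimal sketch in POSIX sh (the thread's loop uses bash syntax); the name `roundtrip_test` is hypothetical, and `conv=fsync` (GNU dd) is an addition not in the original steps, used so the write is flushed before dd exits. Pointed at a real /dev/sdX it OVERWRITES the first MB megabytes of that disk, and the disk must not be mounted.

```shell
# Hypothetical helper following steps 2-5 above.
# Usage: roundtrip_test SOURCE TARGET MB [READS]
roundtrip_test() {
    src=$1; target=$2; mb=$3; reads=${4:-3}
    tmp=$(mktemp) || return 1
    out=$(mktemp) || return 1
    # Steps 2-3: take MB megabytes of the source and checksum them
    dd if="$src" of="$tmp" bs=1M count="$mb" 2>/dev/null
    want=$(sha1sum "$tmp" | cut -d' ' -f1)
    # Step 4: write them to the target, flushing before dd exits
    dd if="$tmp" of="$target" bs=1M count="$mb" conv=fsync 2>/dev/null
    # Step 5: read back several times and compare each checksum
    status=0
    i=0
    while [ "$i" -lt "$reads" ]; do
        dd if="$target" of="$out" bs=1M count="$mb" 2>/dev/null
        got=$(sha1sum "$out" | cut -d' ' -f1)
        echo "read $i: $got"
        [ "$got" = "$want" ] || status=1
        i=$((i + 1))
    done
    rm -f "$tmp" "$out"
    return "$status"
}
```

A non-zero return means at least one read-back differed from what was written. When TARGET is an ordinary file, as in a dry run, the reads are normally served from the page cache, so only a run against the raw device says anything about the controller.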
https://bugzilla.novell.com/show_bug.cgi?id=463829 User pgnet.trash@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c51 pgnet _ <pgnet.trash@gmail.com> changed: Status: NEEDINFO -> ASSIGNED; Info Provider: pgnet.trash@gmail.com -> (none) --- Comment #51 from pgnet _ <pgnet.trash@gmail.com> 2009-02-10 20:27:45 MST ---
> 1. Boot kernel to test. No fstab entry or raid array.

cat /etc/fstab | grep /dev/sd
-> (empty)
cat /proc/mdstat | egrep "sda|sdb"
-> (empty)

reboot (kernel opts: splash=silent showopts vga=0x31a console=tty0 console=ttyS0,57600n8) ...

uname -ri
2.6.27.14-bug463829_dbg0_d49a1108-default x86_64

> Please make sure the device to test is not used by any FS before doing the test.

umount /dev/sda
umount: /dev/sda: not mounted

2. dd if=testA.iso of=TESTTMPFILE bs=1M count=512
3. sha1sum TESTTMPFILE
0c9887a542db36a6f122f1f9f7a6ae1049e661ef TESTTMPFILE
4. dd if=TESTTMPFILE of=/dev/sda bs=1M count=512
512+0 records in
512+0 records out
536870912 bytes (537 MB) copied, 21.6985 s, 24.7 MB/s
5. for ((i=0;i<3;i++)); do dd if=/dev/sda of=TESTOUT bs=1M count=512; sha1sum TESTOUT; done
512+0 records in
512+0 records out
536870912 bytes (537 MB) copied, 9.23539 s, 58.1 MB/s
0c9887a542db36a6f122f1f9f7a6ae1049e661ef TESTOUT
512+0 records in
512+0 records out
536870912 bytes (537 MB) copied, 8.19274 s, 65.5 MB/s
0c9887a542db36a6f122f1f9f7a6ae1049e661ef TESTOUT
512+0 records in
512+0 records out
536870912 bytes (537 MB) copied, 9.90022 s, 54.2 MB/s
0c9887a542db36a6f122f1f9f7a6ae1049e661ef TESTOUT
https://bugzilla.novell.com/show_bug.cgi?id=463829 User teheo@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c52 Tejun Heo <teheo@novell.com> changed: Status: ASSIGNED -> NEEDINFO; Info Provider: (none) -> pgnet.trash@gmail.com --- Comment #52 from Tejun Heo <teheo@novell.com> 2009-02-11 00:55:41 MST --- Thanks. And after a reboot, do you still get the same values when you repeat only the reading and checksumming part of the test? If so, can you please try again with something larger than 4G? It's really strange....
https://bugzilla.novell.com/show_bug.cgi?id=463829 User pgnet.trash@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c53 pgnet _ <pgnet.trash@gmail.com> changed: Status: NEEDINFO -> ASSIGNED; Info Provider: pgnet.trash@gmail.com -> (none) --- Comment #53 from pgnet _ <pgnet.trash@gmail.com> 2009-02-11 09:54:17 MST --- (In reply to comment #52)
> Thanks, and after reboot, you still get the same values when you repeat only the reading and checksumming part of the test?
apparently, yes. @ previous test,

sha1sum TESTOUT
0c9887a542db36a6f122f1f9f7a6ae1049e661ef TESTOUT

checking,

uname -ri
2.6.27.14-bug463829_dbg0_d49a1108-default x86_64

reboot ...

for ((i=0;i<3;i++)); do dd if=/dev/sda of=TESTOUT bs=1M count=512; sha1sum TESTOUT; done
512+0 records in
512+0 records out
536870912 bytes (537 MB) copied, 11.176 s, 48.0 MB/s
0c9887a542db36a6f122f1f9f7a6ae1049e661ef TESTOUT
512+0 records in
512+0 records out
536870912 bytes (537 MB) copied, 2.8667 s, 187 MB/s
0c9887a542db36a6f122f1f9f7a6ae1049e661ef TESTOUT
512+0 records in
512+0 records out
536870912 bytes (537 MB) copied, 3.58828 s, 150 MB/s
0c9887a542db36a6f122f1f9f7a6ae1049e661ef TESTOUT
> If so, can you please try again with something larger than 4G?
re-do, with > 4GB

dd if=/dev/urandom of=TESTTMPFILE bs=1M count=5120
5120+0 records in
5120+0 records out
5368709120 bytes (5.4 GB) copied, 1045.78 s, 5.1 MB/s

sha1sum TESTTMPFILE
06be6cd14f9ff6555da2bd8573d9e56f292330a3 TESTTMPFILE

dd if=TESTTMPFILE of=/dev/sda bs=1M count=5120
5120+0 records in
5120+0 records out
5368709120 bytes (5.4 GB) copied, 168.142 s, 31.9 MB/s

for ((i=0;i<3;i++)); do dd if=/dev/sda of=TESTOUT bs=1M count=5120; sha1sum TESTOUT; done
5120+0 records in
5120+0 records out
5368709120 bytes (5.4 GB) copied, 79.4242 s, 67.6 MB/s
06be6cd14f9ff6555da2bd8573d9e56f292330a3 TESTOUT
5120+0 records in
5120+0 records out
5368709120 bytes (5.4 GB) copied, 85.0126 s, 63.2 MB/s
06be6cd14f9ff6555da2bd8573d9e56f292330a3 TESTOUT
5120+0 records in
5120+0 records out
5368709120 bytes (5.4 GB) copied, 76.8508 s, 69.9 MB/s
06be6cd14f9ff6555da2bd8573d9e56f292330a3 TESTOUT

reboot ...

for ((i=0;i<3;i++)); do dd if=/dev/sda of=TESTOUT bs=1M count=5120; sha1sum TESTOUT; done
5120+0 records in
5120+0 records out
5368709120 bytes (5.4 GB) copied, 92.8563 s, 57.8 MB/s
06be6cd14f9ff6555da2bd8573d9e56f292330a3 TESTOUT
5120+0 records in
5120+0 records out
5368709120 bytes (5.4 GB) copied, 91.8821 s, 58.4 MB/s
06be6cd14f9ff6555da2bd8573d9e56f292330a3 TESTOUT
5120+0 records in
5120+0 records out
5368709120 bytes (5.4 GB) copied, 82.8299 s, 64.8 MB/s
06be6cd14f9ff6555da2bd8573d9e56f292330a3 TESTOUT
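As a quick check that this run meets the "larger than 4G" request: with bs=1M, dd counts in binary megabytes, so the 5120-block transfer exceeds 4 GiB. (dd's "(5.4 GB)" in the output above is the same byte count in decimal gigabytes.)

```latex
\[
5120 \times 2^{20}\,\mathrm{B} = 5\,368\,709\,120\,\mathrm{B}
\;>\; 2^{32}\,\mathrm{B} = 4\,294\,967\,296\,\mathrm{B}
\]
```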
> It's really strange ...
!!
https://bugzilla.novell.com/show_bug.cgi?id=463829 User teheo@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c54 Tejun Heo <teheo@novell.com> changed: Status: ASSIGNED -> NEEDINFO; Info Provider: (none) -> pgnet.trash@gmail.com --- Comment #54 from Tejun Heo <teheo@novell.com> 2009-02-11 18:42:17 MST --- Hmmm... It looks like everything is working properly from the ATA driver's POV. At least, you don't seem to be experiencing data corruption from failing DMA cycles. Can you please try putting a filesystem on it and see how it works (without lvm)?
https://bugzilla.novell.com/show_bug.cgi?id=463829 User pgnet.trash@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c55 pgnet _ <pgnet.trash@gmail.com> changed: Status: NEEDINFO -> ASSIGNED; Info Provider: pgnet.trash@gmail.com -> (none) --- Comment #55 from pgnet _ <pgnet.trash@gmail.com> 2009-02-12 09:45:03 MST ---
cfdisk /dev/sda
mkfs.ext3 -L EXTA /dev/sda1
mke2fs 1.41.1 (01-Sep-2008)
Filesystem label=EXTA
OS type: Linux
...
mount /dev/sda1 /mnt/EXTA
mount | grep sda
/dev/sda1 on /mnt/EXTA type ext3 (rw)

dd if=/dev/urandom of=TESTTMPFILE bs=1M count=5120
sha1sum TESTTMPFILE
35f9d84b37e215e212682863b53e7728ad0272d5 TESTTMPFILE
cp TESTTMPFILE /mnt/EXTA/
sha1sum /mnt/EXTA/TESTTMPFILE
35f9d84b37e215e212682863b53e7728ad0272d5 /mnt/EXTA/TESTTMPFILE
for ((i=0;i<3;i++)); do cp -f /mnt/EXTA/TESTTMPFILE TESTOUT; sha1sum TESTOUT; done
35f9d84b37e215e212682863b53e7728ad0272d5 TESTOUT
35f9d84b37e215e212682863b53e7728ad0272d5 TESTOUT
35f9d84b37e215e212682863b53e7728ad0272d5 TESTOUT

reboot ...
?? server:/home/work # Write failed: Host is down
..
mount /dev/sda1 /mnt/EXTA
for ((i=0;i<3;i++)); do cp -f /mnt/EXTA/TESTTMPFILE TESTOUT; sha1sum TESTOUT; done
35f9d84b37e215e212682863b53e7728ad0272d5 TESTOUT
35f9d84b37e215e212682863b53e7728ad0272d5 TESTOUT
35f9d84b37e215e212682863b53e7728ad0272d5 TESTOUT
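In the filesystem run above, the three reads that immediately follow the cp may have been served from the page cache, which is Tejun's caching point from comment #50. That makes the post-reboot reads the meaningful ones. A minimal sketch of getting uncached re-reads without a reboot, assuming root and the Linux /proc/sys/vm/drop_caches knob (writing 3 frees clean page-cache, dentry and inode objects, so `sync` runs first); the function names are hypothetical.

```shell
# Hypothetical helpers: re-read a file N times, dropping the page cache
# before each read so the data should come from the disk, not RAM.
drop_page_cache() {
    sync    # write back dirty data so it becomes droppable
    { echo 3 > /proc/sys/vm/drop_caches; } 2>/dev/null ||
        echo "warning: could not drop caches; read may be served from RAM" >&2
}

reread_uncached() {
    file=$1
    n=${2:-3}
    i=0
    while [ "$i" -lt "$n" ]; do
        drop_page_cache
        sha1sum "$file" | cut -d' ' -f1
        i=$((i + 1))
    done
}
```

For example, `reread_uncached /mnt/EXTA/TESTTMPFILE 3` should print the same checksum three times if the disk path is sound. Without root the drop silently degrades to a warning, and the reads are then no better than the cached ones.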
https://bugzilla.novell.com/show_bug.cgi?id=463829
User teheo@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=463829#c56

Tejun Heo <teheo@novell.com> changed:

            What    |Removed                     |Added
----------------------------------------------------------------------------
              Status|ASSIGNED                    |NEEDINFO
       Info Provider|                            |pgnet.trash@gmail.com

--- Comment #56 from Tejun Heo <teheo@novell.com> 2009-02-15 20:10:07 MST ---
pgnet_, thanks for verifying. I don't have much experience with lvm but as far as I can see the disk itself is now working properly. Can you please try the lvm again? I don't really see why it wouldn't work.
https://bugzilla.novell.com/show_bug.cgi?id=463829
User pgnet.trash@gmail.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=463829#c57

pgnet _ <pgnet.trash@gmail.com> changed:

            What    |Removed                     |Added
----------------------------------------------------------------------------
              Status|NEEDINFO                    |ASSIGNED
       Info Provider|pgnet.trash@gmail.com       |

--- Comment #57 from pgnet _ <pgnet.trash@gmail.com> 2009-02-15 20:29:19 MST ---
(In reply to comment #56)
> pgnet_, thanks for verifying. I don't have much experience with lvm but as far as I can see the disk itself is now working properly. Can you please try the lvm again? I don't really see why it wouldn't work.
Hi, here's the ext3-on-LVM scenario I think you want ...

cfdisk /dev/sda -> Linux LVM
pvcreate /dev/sda1
vgcreate -s 32 /dev/VGTEST /dev/sda1
lvcreate -n LVTEST -l 100%FREE /dev/VGTEST
mkfs.ext3 -L EXTLVA /dev/VGTEST/LVTEST
  mke2fs 1.41.1 (01-Sep-2008)
  Filesystem label=EXTLVA
  OS type: Linux
  ...
mount /dev/VGTEST/LVTEST /mnt/EXTLVA
mount | grep LVTEST
  /dev/mapper/VGTEST-LVTEST on /mnt/EXTLVA type ext3 (rw)
sha1sum TESTTMPFILE
  35f9d84b37e215e212682863b53e7728ad0272d5 TESTTMPFILE
cp TESTTMPFILE /mnt/EXTLVA/
sha1sum /mnt/EXTLVA/TESTTMPFILE
  35f9d84b37e215e212682863b53e7728ad0272d5 /mnt/EXTLVA/TESTTMPFILE
for ((i=0;i<3;i++)); do cp -f /mnt/EXTLVA/TESTTMPFILE TESTOUT; sha1sum TESTOUT; done
  35f9d84b37e215e212682863b53e7728ad0272d5 TESTOUT
  35f9d84b37e215e212682863b53e7728ad0272d5 TESTOUT
  35f9d84b37e215e212682863b53e7728ad0272d5 TESTOUT
reboot
  ..
mount /dev/VGTEST/LVTEST /mnt/EXTLVA
for ((i=0;i<3;i++)); do cp -f /mnt/EXTLVA/TESTTMPFILE TESTOUT; sha1sum TESTOUT; done
  35f9d84b37e215e212682863b53e7728ad0272d5 TESTOUT
  35f9d84b37e215e212682863b53e7728ad0272d5 TESTOUT
  35f9d84b37e215e212682863b53e7728ad0272d5 TESTOUT
https://bugzilla.novell.com/show_bug.cgi?id=463829

pgnet _ <pgnet.trash@gmail.com> changed:

            What    |Removed                     |Added
----------------------------------------------------------------------------
              Status|ASSIGNED                    |NEEDINFO
       Info Provider|                            |teheo@novell.com
https://bugzilla.novell.com/show_bug.cgi?id=463829
User teheo@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=463829#c58

--- Comment #58 from Tejun Heo <teheo@novell.com> 2009-02-24 23:20:28 MST ---
Sorry about the delay but it looks like it's working, or am I missing something?
https://bugzilla.novell.com/show_bug.cgi?id=463829
User pgnet.trash@gmail.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=463829#c59

--- Comment #59 from pgnet _ <pgnet.trash@gmail.com> 2009-02-25 08:34:37 MST ---
Hi, per your earlier admonition,

> let's please leave out lvm, md or filesystem for now.

I was just testing step-by-step in response to your direct requests; it still hasn't been tested/verified with RAID & LVM-on-RAID. Also, there's no 'debug' kernel-xen (which is my ultimate target) to test.
https://bugzilla.novell.com/show_bug.cgi?id=463829
User teheo@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=463829#c60

Tejun Heo <teheo@novell.com> changed:

            What    |Removed                     |Added
----------------------------------------------------------------------------
       Info Provider|teheo@novell.com            |pgnet.trash@gmail.com

--- Comment #60 from Tejun Heo <teheo@novell.com> 2009-02-25 17:32:26 MST ---
Yeap, seeing that raw partitions work just fine (have you been using them for the time being? Have you noticed anything?), dm/md/lvm should work just fine. Can you please try that? As for xen, if the default kernel is fixed, I think it will behave the same. There's nothing arch or vm specific. Thanks.
https://bugzilla.novell.com/show_bug.cgi?id=463829
User pgnet.trash@gmail.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=463829#c61

--- Comment #61 from pgnet _ <pgnet.trash@gmail.com> 2009-02-25 17:53:33 MST ---
(In reply to comment #60)
> have you been using them for the time being? Have you noticed anything?

No, I've not. This box spends most of its time in kernel-xen, working on a series of other issues. As your debug kernels are, atm, non-xen only, testing has been a "stop everything else", occasional effort. Sorry.

> dm/md/lvm should work just fine. Can you please try that?

Yes, I'll get that started ...

> As for xen, if the default kernel is fixed, I think it will behave the same. There's nothing arch or vm specific.

I'll be curious to see, if only because the "fails" for kernel-default & kernel-xen, reported above, looked different. Of course, there's likely not much stock to be taken in comparing those ... If building a debug kernel-xen is doable on your end, I'll be happy to give that a whirl. Either way, I'll post back here as soon as I've got something to report. Thanks.
https://bugzilla.novell.com/show_bug.cgi?id=463829
User pgnet.trash@gmail.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=463829#c62

pgnet _ <pgnet.trash@gmail.com> changed:

            What    |Removed                     |Added
----------------------------------------------------------------------------
              Status|NEEDINFO                    |ASSIGNED
       Info Provider|pgnet.trash@gmail.com       |

--- Comment #62 from pgnet _ <pgnet.trash@gmail.com> 2009-02-25 22:48:50 MST ---
Created an attachment (id=275544) --> (https://bugzilla.novell.com/attachment.cgi?id=275544)
test of debug kernel with FS on LVM on RAID_1

Summary: boot to "debug" kernel-default; @ grub,

  root (hd0,0)
  kernel /vmlinuz-2.6.27.14-bug463829_dbg0_d49a1108-default root=/dev/VG_Dom0/LV_ROOT resume=/dev/VG_Swap/LV_SWAP splash=silent showopts vga=0x31a console=tty0 console=ttyS0,57600n8
  initrd /initrd-2.6.27.14-bug463829_dbg0_d49a1108-default

/dev/sda -> sda1, sda2
/dev/sdb -> sdb1, sdb2
/dev/md4 -> sda1 + sdb1
  VG, LVM 100%, xfs
/dev/md5 -> sda2 + sdb2
  VG, LVM 66%, ext3
  VG, LVM 33%, ext3

manual mount(s)
multiple > 5GB file-system copies, sha1sums OK
reboot
manual mount(s)
multiple > 5GB file-system copies, sha1sums OK
mod /etc/fstab
reboot
mounts OK
multiple > 5GB file-system copies, sha1sums OK

Looks good, with no kernel cmd_line mods required in grub. (Question: really no "iommu=..." required/recommended?)

'Gory details' -> attachment ...
https://bugzilla.novell.com/show_bug.cgi?id=463829
User teheo@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=463829#c63

Tejun Heo <teheo@novell.com> changed:

            What    |Removed                     |Added
----------------------------------------------------------------------------
              Status|ASSIGNED                    |NEEDINFO
       Info Provider|                            |pgnet.trash@gmail.com

--- Comment #63 from Tejun Heo <teheo@novell.com> 2009-02-26 04:02:08 MST ---
Yes, no iommu parameter necessary. Can you please keep it running for a few days and see whether anything explodes? I'm a bit worried because you previously reported lvm didn't work even with the patched kernel. Thanks.
https://bugzilla.novell.com/show_bug.cgi?id=463829
User pgnet.trash@gmail.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=463829#c64

--- Comment #64 from pgnet _ <pgnet.trash@gmail.com> 2009-02-26 08:29:50 MST ---
(In reply to comment #63)
> Yes, no iommu parameter necessary.

Thanks for clarifying.

> Can you please keep it running for a few days and see whether anything explodes?

That's going to be a challenge if not in xen ... Overnight (but, of course, not 'using' the drives) is doable. Since yesterday, the two 'new' RAID arrays have been 'RESYNCING' (that sure takes a while!). /dev/md5 is 100% done; md4 is 74% done, atm. AFAICT, no errors.

> I'm a bit worried because you previously reported lvm didn't work even with the patched kernel.
As am I. However, since the earlier tests, I've:

-- ensured I run 'pvremove -ff' before 'pvcreate'. I've found that simply reformatting doesn't always clear all the info off the drive (that's what's confusing ... it seems to depend on where I stop/restart the manual procedures). If there's an errant PV around, that'll cause problems.

-- made sure to _manually_ add the new RAID arrays' UUIDs to mdadm.conf. "Partitioner" seems to do it automatically, but manually creating the arrays does not. Without the mod, the RAID arrays don't start and the LVMs on the RAID arrays are never recognized. I'm not sure whether there's a step in the manual process (mdadm ... ?) that _should_ do the mdadm.conf mods. :-/
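A minimal sketch of the manual mdadm.conf step described above, assuming the ARRAY lines come from 'mdadm --detail --scan' (a real mdadm option) and that the config file is /etc/mdadm.conf, as on openSUSE. The helper name add_missing_arrays is hypothetical, and this is not what YaST's Partitioner does internally; it only appends arrays whose UUID is not already listed, so re-running it is harmless.

```bash
#!/bin/bash
# Hypothetical helper (not part of mdadm): read ARRAY lines, e.g. from
# `mdadm --detail --scan`, on stdin and append each one to the given
# mdadm.conf unless its UUID=... token is already present in that file.
add_missing_arrays() {
    local conf="$1"
    local line uuid
    touch "$conf" || return 1
    while IFS= read -r line; do
        # only ARRAY definitions are of interest
        case "$line" in
            ARRAY*) ;;
            *) continue ;;
        esac
        # extract the UUID=xxxxxxxx:xxxxxxxx:... token
        uuid=$(printf '%s\n' "$line" | grep -o 'UUID=[0-9a-fA-F:]*')
        [ -n "$uuid" ] || continue
        # skip arrays that are already configured
        if grep -qF "$uuid" "$conf"; then
            continue
        fi
        printf '%s\n' "$line" >> "$conf"
    done
}
```

As root, something like "mdadm --detail --scan | add_missing_arrays /etc/mdadm.conf" followed by "mdadm --assemble --scan" would then be expected to assemble the listed arrays without relying on in-kernel autodetection.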
> Thanks.

You too. This has been a long thread -- not much help from me. Your help's appreciated.
https://bugzilla.novell.com/show_bug.cgi?id=463829
User pgnet.trash@gmail.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=463829#c65

pgnet _ <pgnet.trash@gmail.com> changed:

            What    |Removed                     |Added
----------------------------------------------------------------------------
       Info Provider|pgnet.trash@gmail.com       |teheo@novell.com

--- Comment #65 from pgnet _ <pgnet.trash@gmail.com> 2009-02-26 08:39:51 MST ---
Any reasonable chance of getting a kernel-xen version of your debug instance? With that, I can easily stress-test on this box for an extended period while still getting some other work done ...
https://bugzilla.novell.com/show_bug.cgi?id=463829
User pgnet.trash@gmail.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=463829#c66

--- Comment #66 from pgnet _ <pgnet.trash@gmail.com> 2009-02-26 10:27:14 MST ---
Re: the failure to assemble the RAID array @ boot without a prior edit/add of mdadm.conf, @ http://www.mail-archive.com/linux-raid@vger.kernel.org/msg10641.html :

"If you have your disk/controller and md drivers built into the kernel, AND marked the partitions as "linux raid autodetect", kernel may assemble them right at boot. But I don't remember if the kernel will even consider v.1 superblocks for its auto-assembly. In any way, don't rely on the kernel to do this work, in-kernel assembly code is very simplistic and works up to a moment when anything changes/breaks. It's almost the same code as was in old raidtools..."

Per the above, I _have_ marked those partitions (/dev/sd{a,b}{1,2}) as "linux raid autodetect", w/ cfdisk. The comments above seem to imply that under 'sufficient' conditions, the kernel may (can? will?) assemble the array correctly at boot, and, it seems, _without_ a prior manual edit of mdadm.conf. If that's true (is it?), then the question is: why, with the debug kernel in place, are those arrays NOT getting assembled without the manual mdadm.conf edit? Is the info above wrong, are mdadm or kernel assembly not working in general, or is it something in the debug kernel's implementation? Thanks.
https://bugzilla.novell.com/show_bug.cgi?id=463829
User pgnet.trash@gmail.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=463829#c67

--- Comment #67 from pgnet _ <pgnet.trash@gmail.com> 2009-02-26 21:12:54 MST ---
Just read, @ http://www.linuxsecurity.com/content/view/148081/ :

"The Linux kernel on openSUSE 11.1 was updated to the stable version 2.6.27.19 and is also now at the same kernel as we are planning to ship with SUSE Linux Enterprise (Server/Desktop) 11."

AFAICT, your "debug kernel" changes are _not_ included in that update. Will they be? Most importantly for us -- will they be for SLES 11?
https://bugzilla.novell.com/show_bug.cgi?id=463829
User teheo@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=463829#c68

Tejun Heo <teheo@novell.com> changed:

            What    |Removed                     |Added
----------------------------------------------------------------------------
       Info Provider|teheo@novell.com            |pgnet.trash@gmail.com

--- Comment #68 from Tejun Heo <teheo@novell.com> 2009-02-26 22:31:52 MST ---
Okay, here's the xen kernel.

http://htj.dyndns.org/export/testing/sl111-x86_64-xen-bug463829_dbg0/

As for the autodetection, md is not built into the kernel. You can trigger the auto scan manually tho. At any rate, using the yast storage tool to configure devices should do the right thing.

And the patch of course is not in SL111 or SLE11 tree yet. It hasn't been verified to fix the problem. It will be included in the SL111 and SLE11 kernel when it's verified (will probably be released as part of kernel update). Thanks.
https://bugzilla.novell.com/show_bug.cgi?id=463829
User pgnet.trash@gmail.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=463829#c69

--- Comment #69 from pgnet _ <pgnet.trash@gmail.com> 2009-02-26 22:57:23 MST ---
(In reply to comment #68)
> Okay, here's the xen kernel.
> http://htj.dyndns.org/export/testing/sl111-x86_64-xen-bug463829_dbg0/
Thank you. Trying to install your rpms,

rpm -i kernel-xen-*2.6.27.18-bug*
  package kernel-xen-base-2.6.27.19-3.2.1.x86_64 (which is newer than kernel-xen-base-2.6.27.18-bug463829_dbg0_06e5222a.x86_64) is already installed
  package kernel-xen-2.6.27.19-3.2.1.x86_64 (which is newer than kernel-xen-2.6.27.18-bug463829_dbg0_06e5222a.x86_64) is already installed
  package kernel-xen-extra-2.6.27.19-3.2.1.x86_64 (which is newer than kernel-xen-extra-2.6.27.18-bug463829_dbg0_06e5222a.x86_64) is already installed

_Can_ these versions safely co-exist? Or should I first de-install the OS distro's updated/installed kernel-xen*? Likely a minor issue ... just new to me.

> At any rate, using the yast storage tool to configure devices should do the right thing.

Fair enough. Will verify when I get there ...

> It will be included in the SL111 and SLE11 kernel when it's verified (will probably be released as part of kernel update).

So it _is_ on the SLES track -- eventually. All I needed. Thanks.
https://bugzilla.novell.com/show_bug.cgi?id=463829
User teheo@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=463829#c70

--- Comment #70 from Tejun Heo <teheo@novell.com> 2009-02-26 23:01:06 MST ---
Force upgrading it (-U --force) should work. It will replace the original kernel. Hmmm... not sure whether you can install it side-by-side tho.
https://bugzilla.novell.com/show_bug.cgi?id=463829
User pgnet.trash@gmail.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=463829#c71

--- Comment #71 from pgnet _ <pgnet.trash@gmail.com> 2009-02-26 23:31:51 MST ---
> Force upgrading it (-U --force) should work.

OK. Just wanted to make sure nothing'd 'blow up' ;-)

> It will replace the original kernel.

Actually, not here. It replaces the _symlinks_, /boot/{vmlinuz,initrd}-xen, pointing them at your kernels. Everything is left in place. Oddly, after 'messing' with the symlinks, grub/menu.lst is auto-modified with a new entry that uses the direct file names, not the symlinks.

> Hmmm... not sure whether you can install it side-by-side tho.

Well, at least everything's "there". Anyway, trying the debug kernel-xen, @ boot it locks up completely. It looks like no RAID & no LVMs are found. Switch back to the debug kernel-default, and it works. Now, to start backtracking and find what/where the problem first appears.
https://bugzilla.novell.com/show_bug.cgi?id=463829
User pgnet.trash@gmail.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=463829#c72

pgnet _ <pgnet.trash@gmail.com> changed:

            What    |Removed                     |Added
----------------------------------------------------------------------------
          AssignedTo|teheo@novell.com            |pgnet.trash@gmail.com

--- Comment #72 from pgnet _ <pgnet.trash@gmail.com> 2009-02-27 12:42:19 MST ---
Created an attachment (id=276115) --> (https://bugzilla.novell.com/attachment.cgi?id=276115)
serial console output comparing boot to -xen & -default debug kerns

In the attached zip file, there are 2 txt files,

  debug-default.txt
  debug-xen.txt

containing snips from the console output for the OK boot to kernel-default & the FAILed boot to kernel-xen, respectively. Note the differences @ the '1st encounter' with the attached sata_sil24 drives. Something's clearly different -- is it likely a result of the kernel, or xen itself?

FYI, boot to the distro's kernel-xen & kernel-default (i.e., without the ext SATA drives in use) is OK as well.
https://bugzilla.novell.com/show_bug.cgi?id=463829
User pgnet.trash@gmail.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=463829#c73

--- Comment #73 from pgnet _ <pgnet.trash@gmail.com> 2009-02-27 19:49:13 MST ---
Created an attachment (id=276171) --> (https://bugzilla.novell.com/attachment.cgi?id=276171)
results modprobe remove/reinsert sata_sil24

FWIW, searching on/about "link online but device misclassified, retrying", and reading here, http://article.gmane.org/gmane.linux.ide/36393, for

uname -ri
  2.6.27.18-bug463829_dbg0_06e5222a-xen x86_64

with the attached drives currently 'missing', I checked:

modprobe -r sata_sil24
  Feb 27 18:34:04 server kernel: vendor=10de device=026f
  Feb 27 18:34:04 server kernel: sata_sil24 0000:04:07.0: PCI INT A disabled

Then, @

modprobe sata_sil24

the ext-attached devices (here [sdc] & [sdd]) are seen & attached (?), but

  (i) I don't see them in fdisk
  (ii) the msg 'model number mismatch' is new ...

The attachment has the console output @ modprobe ...
https://bugzilla.novell.com/show_bug.cgi?id=463829
User teheo@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=463829#c74

Tejun Heo <teheo@novell.com> changed:

            What    |Removed                     |Added
----------------------------------------------------------------------------
                  CC|                            |teheo@novell.com

--- Comment #74 from Tejun Heo <teheo@novell.com> 2009-03-01 05:19:09 MST ---
Eh... looks like irq routing problem. Have no idea whatsoever what's wrong with the xen build tho. I've never used xen before. Does irqpoll help? If not, can you please try to verify the original fix with the default kernel to a reasonable level (overnight stress test or something)?
https://bugzilla.novell.com/show_bug.cgi?id=463829
User pgnet.trash@gmail.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=463829#c75

pgnet _ <pgnet.trash@gmail.com> changed:

            What    |Removed                     |Added
----------------------------------------------------------------------------
       Info Provider|pgnet.trash@gmail.com       |teheo@novell.com

--- Comment #75 from pgnet _ <pgnet.trash@gmail.com> 2009-03-01 07:59:23 MST ---
(In reply to comment #74)
> Eh... looks like irq routing problem. Have no idea whatsoever what's wrong with the xen build tho. I've never used xen before.

AFAICT so far, at least everything _else_ in the build is behaving -- DomUs are fine, internal RAID (sata_nv) is OK.

> Does irqpoll help?

Apparently not :-/ Same symptoms (no drives, console messages, ...) as without it.

> If not, can you please try to verify the original fix with the default kernel to a reasonable level (overnight stress test or something)?

Sure. I assume you mean with the 'debug' kernel-default, 2.6.27.14-bug463829_dbg0_d49a1108-default. Does "stress test" have a defined meaning to you, like a specific test suite? Or just, e.g., filesystem copies to and from the disk?
https://bugzilla.novell.com/show_bug.cgi?id=463829
User pgnet.trash@gmail.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=463829#c76

--- Comment #76 from pgnet _ <pgnet.trash@gmail.com> 2009-03-01 08:56:33 MST ---
Digging around, I've found "bonnie++", which looks like the right sort of 'stress'. I'll let

bonnie++ -d /home/stor/BACKUPS/ -u pgn:users -x 512

run as the aforementioned stress test & report back. If there's a different test you'd suggest, let me know.
https://bugzilla.novell.com/show_bug.cgi?id=463829
User teheo@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=463829#c77

--- Comment #77 from Tejun Heo <teheo@novell.com> 2009-03-01 17:12:15 MST ---
bonnie++ is fine but it would be nice to have something which can verify data integrity in parallel. ie. repeatedly making n copies of a large file and checksumming them. Thanks.
https://bugzilla.novell.com/show_bug.cgi?id=463829
User pgnet.trash@gmail.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=463829#c78

--- Comment #78 from pgnet _ <pgnet.trash@gmail.com> 2009-03-01 22:51:38 MST ---
(In reply to comment #77)
> bonnie++ is fine but it would be nice to have something which can verify data integrity in parallel. ie. repeatedly making n copies of a large file and checksumming them. Thanks.

No problem. Added it in, and will let it crunch ...
https://bugzilla.novell.com/show_bug.cgi?id=463829
User pgnet.trash@gmail.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=463829#c79

--- Comment #79 from pgnet _ <pgnet.trash@gmail.com> 2009-03-02 11:07:10 MST ---
FWIW, the simple script below has been chugging away since my last post (2009-03-01 22:51:38 MST). So far, no bonnie++ errors, & all checksums have been consistent/correct ...

-----------
#!/bin/bash
# test directories under /home/stor
DIRS=( LV01 LV02 LV03 )
for ((i=0;i<256;i++)); do
  for F in "${DIRS[@]}"; do
    # read the reference file back 3 times, checksumming each copy
    for ((j=0;j<3;j++)); do
      cp -f "/home/stor/$F/T_IN" /home/work/T_OUT
      sha1sum /home/work/T_OUT
    done
    # filesystem load on the same directory
    bonnie++ -d "/home/stor/$F/" -u pgn:users -x 1 -s 15720m -n100
    # read it back 3 more times after the bonnie++ run
    for ((j=3;j<6;j++)); do
      cp -f "/home/stor/$F/T_IN" /home/work/T_OUT
      sha1sum /home/work/T_OUT
    done
    # rewrite the reference file and checksum it in place
    cp -f /home/work/T_IN "/home/stor/$F/"
    sha1sum "/home/stor/$F/T_IN"
  done
done
-----------
https://bugzilla.novell.com/show_bug.cgi?id=463829
User teheo@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=463829#c80

Tejun Heo <teheo@novell.com> changed:

            What    |Removed                     |Added
----------------------------------------------------------------------------
       Info Provider|teheo@novell.com            |pgnet.trash@gmail.com

--- Comment #80 from Tejun Heo <teheo@novell.com> 2009-03-02 17:18:13 MST ---
Great, just in case, the file is larger than memory size, right?
https://bugzilla.novell.com/show_bug.cgi?id=463829
User pgnet.trash@gmail.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=463829#c81

--- Comment #81 from pgnet _ <pgnet.trash@gmail.com> 2009-03-02 17:30:14 MST ---
> the file is larger than memory size, right

Agh. Nope. It's 5GB -- i.e., "greater than 4GB" -- which is what we were doing earlier. I've 8GB RAM. I can quite easily switch to a 9GB file ...
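For reference, a small sketch of sizing such a test file from the machine's RAM rather than by hand, assuming a Linux /proc/meminfo, where MemTotal is reported in kB. The function name size_above_ram_mib and the 1 GiB default margin are illustrative assumptions; with a MemTotal of exactly 8 GiB (8388608 kB) it yields 9216 MiB, i.e. the 9GB figure above.

```bash
#!/bin/bash
# Hypothetical helper: print a file size in MiB that exceeds physical RAM
# by a margin, so re-reading the file cannot be served entirely from the
# page cache. Arguments (both optional): a meminfo-style file and the
# margin in MiB (default 1024, an arbitrary choice).
size_above_ram_mib() {
    local meminfo="${1:-/proc/meminfo}" margin_mib="${2:-1024}"
    local mem_kb
    # MemTotal is the second field, in kB
    mem_kb=$(awk '$1 == "MemTotal:" { print $2 }' "$meminfo")
    [ -n "$mem_kb" ] || return 1
    echo $(( mem_kb / 1024 + margin_mib ))
}
```

The result could then feed the same dd invocation used earlier in the thread, e.g. dd if=/dev/urandom of=T_IN bs=1M count="$(size_above_ram_mib)".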
https://bugzilla.novell.com/show_bug.cgi?id=463829
User teheo@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=463829#c82

--- Comment #82 from Tejun Heo <teheo@novell.com> 2009-03-02 17:37:09 MST ---
Yes, or you can just change the script such that the output files are first copied to T_OUT_$j and after all the copies are complete, sha1sum is run on each T_OUT file. Thanks.
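A sketch of that copy-first, checksum-later variant, written as a shell function in the spirit of the script in comment #79. The function name verify_copies and its arguments are hypothetical; what matters is the ordering: all N copies are written before any is read back, so, given enough total data relative to RAM, the early copies are likely evicted from the page cache and the checksums are computed from data actually read from disk.

```bash
#!/bin/bash
# Hypothetical helper: copy SRC into OUTDIR as T_OUT_0 .. T_OUT_<N-1>,
# then, only after every copy has been written, checksum each copy and
# compare it with SRC's sha1. Returns 0 if all copies match, 1 otherwise.
verify_copies() {
    local src="$1" outdir="$2" n="$3"
    local j ref sum bad=0
    mkdir -p "$outdir" || return 1
    # reference checksum of the source file
    ref=$(sha1sum < "$src" | cut -d' ' -f1)
    # phase 1: write all N copies first
    j=0
    while [ "$j" -lt "$n" ]; do
        cp -f "$src" "$outdir/T_OUT_$j" || return 1
        j=$((j + 1))
    done
    # phase 2: read back and checksum every copy
    j=0
    while [ "$j" -lt "$n" ]; do
        sum=$(sha1sum < "$outdir/T_OUT_$j" | cut -d' ' -f1)
        if [ "$sum" != "$ref" ]; then
            echo "MISMATCH: $outdir/T_OUT_$j" >&2
            bad=1
        else
            echo "OK: $outdir/T_OUT_$j"
        fi
        j=$((j + 1))
    done
    return "$bad"
}
```

With the 5GB T_IN from comment #79 and the 8GB of RAM mentioned in comment #81, something like verify_copies /home/stor/LV01/T_IN /home/work/copies 3 would write about 15GB before the first read-back.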
https://bugzilla.novell.com/show_bug.cgi?id=463829
User pgnet.trash@gmail.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=463829#c83

--- Comment #83 from pgnet _ <pgnet.trash@gmail.com> 2009-03-02 17:52:33 MST ---
I went the single, 9GB file route ... I'll let it run @ least overnight again. In the meantime, is there any info I can provide that'll help with the -xen issue? Thanks.
https://bugzilla.novell.com/show_bug.cgi?id=463829
User teheo@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=463829#c84

--- Comment #84 from Tejun Heo <teheo@novell.com> 2009-03-02 18:01:45 MST ---
Thanks. Regarding xen, I probably did something wrong while building it. If the -default kernel works fine with the patch, -xen should too. After checking the fix patch in, you can try the KOTD -xen kernel.
https://bugzilla.novell.com/show_bug.cgi?id=463829
User pgnet.trash@gmail.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=463829#c85

--- Comment #85 from pgnet _ <pgnet.trash@gmail.com> 2009-03-02 18:22:19 MST ---
> After checking the fix patch in,

Check it in to where? I have no repository rights ... Or do you mean _you_ have checked something in?

> you can try the KOTD -xen kernel.

OK, that's "kernel of the day", it seems. Where do those hide? Nothing on Webpin ... You don't mean these:

http://download.opensuse.org/repositories/Kernel:/HEAD/openSUSE_Factory

do you? Sorry, some things are not (yet) obvious to us mere mortals
https://bugzilla.novell.com/show_bug.cgi?id=463829
User teheo@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=463829#c86

--- Comment #86 from Tejun Heo <teheo@novell.com> 2009-03-02 18:33:18 MST ---
(In reply to comment #85)
>> After checking the fix patch in,
> Check it in to where? I have no repository rights ...
> Or do you mean _you_ have checked something in?
Yeap, I meant me checking into SUSE kernel tree.
>> you can try the KOTD -xen kernel.
> OK, that's "kernel of the day", it seems. Where do those hide? Nothing on Webpin ...
> You don't mean these:
> http://download.opensuse.org/repositories/Kernel:/HEAD/openSUSE_Factory
> do you?
> Sorry, some things are not (yet) obvious to us mere mortals
Sorry about not being clear. The following link actually.

http://ftp.suse.com/pub/projects/kernel/kotd/

https://bugzilla.novell.com/show_bug.cgi?id=463829
User pgnet.trash@gmail.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=463829#c87

--- Comment #87 from pgnet _ <pgnet.trash@gmail.com> 2009-03-02 18:55:54 MST ---
(In reply to comment #86)
> Yeap, I meant me checking into SUSE kernel tree.
Good. Much better for _everyone_ involved ...

> Sorry about not being clear.
np!
> The following link actually.
> http://ftp.suse.com/pub/projects/kernel/kotd/

Aha. That's new to me. I'm assuming that -- at some point, post check-in -- it'll be in either HEAD/ or SLE11_BRANCH/ (even though I'm, atm, on openSUSE 11.1 ...). Whenever, please drop a note -- here, or to me -- and I'll try things out. Thanks.

p.s. FYI, the stress-test, with 9GB (> RAM) files, is through just a couple of runs, but, so far, checksums & bonnie are OK. More to come ...
https://bugzilla.novell.com/show_bug.cgi?id=463829
User pgnet.trash@gmail.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=463829#c88

pgnet _ <pgnet.trash@gmail.com> changed:

            What    |Removed                     |Added
----------------------------------------------------------------------------
              Status|NEEDINFO                    |ASSIGNED
       Info Provider|pgnet.trash@gmail.com       |

--- Comment #88 from pgnet _ <pgnet.trash@gmail.com> 2009-03-03 09:13:15 MST ---
FYI, overnight stress test, started @ ~7pm MST, 2009-03-02: no errors & checksums are all consistent ...
https://bugzilla.novell.com/show_bug.cgi?id=463829
User teheo@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=463829#c89

--- Comment #89 from Tejun Heo <teheo@novell.com> 2009-03-03 19:15:11 MST ---
Great. :-) Will forward the patch upstream and commit it to SLE11_BRANCH. Thanks.
https://bugzilla.novell.com/show_bug.cgi?id=463829 User pgnet.trash@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c90 --- Comment #90 from pgnet _ <pgnet.trash@gmail.com> 2009-03-03 22:38:59 MST --- thanks! i'll subsequently try out the -xen kernel, and report back with findings.
https://bugzilla.novell.com/show_bug.cgi?id=463829 User teheo@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c91 Tejun Heo <teheo@novell.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution| |FIXED
--- Comment #91 from Tejun Heo <teheo@novell.com> 2009-03-03 22:51:42 MST --- Patches committed to SLE11_BRANCH. Resolving as FIXED. Please wait a day or two and try the SLE11_BRANCH kotd kernel for xen. Thanks.
-------------------------------------------------------------------
Wed Mar 4 06:49:54 CET 2009 - teheo@suse.de
- patches.arch/x86-fix-nodac: x86: fix iommu=nodac parameter handling (bnc#463829).
- patches.arch/x86-mcp51-no-dac: x86: disallow DAC for MCP51 PCI bridge (bnc#463829).
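A minimal sketch for confirming that a downloaded KOTD kernel package carries both patches listed in comment #91 before booting it. The helper name `has_bnc463829_fixes` is hypothetical; it only searches the changelog text that `rpm -qp --changelog` prints, so a match says the patches were applied to the build, not that the problem is gone:

```shell
# has_bnc463829_fixes: hypothetical helper. Reads an RPM changelog on
# stdin and succeeds only if both bnc#463829 patch names appear in it.
has_bnc463829_fixes() {
    hb_log=$(cat)
    for hb_patch in patches.arch/x86-fix-nodac patches.arch/x86-mcp51-no-dac; do
        # -F: match the patch name literally; -q: only the exit status matters.
        if ! printf '%s\n' "$hb_log" | grep -qF "$hb_patch"; then
            echo "missing: $hb_patch" >&2
            return 1
        fi
    done
    return 0
}
```

Usage, with the package path written the same way as in comment #92: `rpm -qp --changelog $LOC/kernel-xen-base-${VER1}.x86_64.rpm | has_bnc463829_fixes && echo patched`.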
https://bugzilla.novell.com/show_bug.cgi?id=463829 User pgnet.trash@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c92 pgnet _ <pgnet.trash@gmail.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|RESOLVED |REOPENED
Resolution|FIXED |
--- Comment #92 from pgnet _ <pgnet.trash@gmail.com> 2009-03-04 17:47:16 MST --- reopening for comment, or redirect.
(In reply to comment #84)
> Regarding xen, I probably did something wrong while building it. If the -default kernel works fine with the patch, -xen should too.
I'm not so sure it's 'you' :-/ please read on ...
(In reply to comment #91)
> Patches committed to SLE11_BRANCH. Resolving as FIXED.
> Please wait a day or two and try the SLE11_BRANCH kotd kernel for xen. Thanks.
> -------------------------------------------------------------------
> Wed Mar 4 06:49:54 CET 2009 - teheo@suse.de
> - patches.arch/x86-fix-nodac: x86: fix iommu=nodac parameter handling (bnc#463829).
> - patches.arch/x86-mcp51-no-dac: x86: disallow DAC for MCP51 PCI bridge (bnc#463829).
noting,
 rpm -qp --changelog $LOC/kernel-xen-base-${VER1}.x86_64.rpm
  ...
  * Wed Mar 04 2009 teheo@suse.de
  - patches.arch/x86-fix-nodac: x86: fix iommu=nodac parameter handling (bnc#463829).
  - patches.arch/x86-mcp51-no-dac: x86: disallow DAC for MCP51 PCI bridge (bnc#463829).
  ...
and installing,
 rpm -qa | grep SLE11_BRANCH
  kernel-default-extra-2.6.27.19-SLE11_BRANCH_20090304073920_1eb029c9
  kernel-default-2.6.27.19-SLE11_BRANCH_20090304073920_1eb029c9
  kernel-xen-base-2.6.27.19-SLE11_BRANCH_20090304073920_1eb029c9
  kernel-xen-extra-2.6.27.19-SLE11_BRANCH_20090304073920_1eb029c9
  kernel-default-base-2.6.27.19-SLE11_BRANCH_20090304073920_1eb029c9
  kernel-xen-2.6.27.19-SLE11_BRANCH_20090304073920_1eb029c9
booting to,
 ###Don't change this comment - YaST2 identifier: Original name: linux###
 title openSUSE 11.1 - 2.6.27.19-SLE11_BRANCH_20090304073920_1eb029c9
  root (hd0,0)
  kernel /vmlinuz-2.6.27.19-SLE11_BRANCH_20090304073920_1eb029c9-default root=/dev/VG_Dom0/LV_ROOT resume=/dev/VG_Swap/LV_SWAP splash=silent showopts vga=0x31a console=tty0 console=ttyS0,57600n8
  initrd /initrd-2.6.27.19-SLE11_BRANCH_20090304073920_1eb029c9-default
is OK,
 uname -ri
  2.6.27.19-SLE11_BRANCH_20090304073920_1eb029c9-default x86_64
and, the ext drive arrays are 'up',
 fdisk -l /dev/md4 /dev/md5
  Disk /dev/md4: 650.0 GB, 650002580480 bytes
  2 heads, 4 sectors/track, 158692036 cylinders
  Units = cylinders of 8 * 512 = 4096 bytes
  Disk identifier: 0x00000000
  Disk /dev/md4 doesn't contain a valid partition table
  Disk /dev/md5: 350.1 GB, 350199382016 bytes
  2 heads, 4 sectors/track, 85497896 cylinders
  Units = cylinders of 8 * 512 = 4096 bytes
  Disk identifier: 0x00000000
  Disk /dev/md5 doesn't contain a valid partition table
and, the LVs mounted,
 df -H | grep stor/
  650G 9.7G 641G 2% /home/stor/MEDIA
  228G 9.9G 207G 5% /home/stor/BACKUPS
  118G 9.9G 102G 9% /home/stor/DATA
but, booting to the -xen variant,
 ###Don't change this comment - YaST2 identifier: Original name: xen###
 title Xen -- openSUSE 11.1 - 2.6.27.19-SLE11_BRANCH_20090304073920_1eb029c9
  root (hd0,0)
  kernel /xen.gz dom0_mem=768M loglvl=all loglvl_guest=all vga=gfx-1280x1024x32 console=vga,com1 com1=57600,8n1
  module /vmlinuz-2.6.27.19-SLE11_BRANCH_20090304073920_1eb029c9-xen root=/dev/VG_Dom0/LV_ROOT resume=/dev/VG_Swap/LV_SWAP showopts splash=silent vga=0x31a console=tty0 console=xvc0,57600 elevator=cfq max_loop=64
  module /initrd-2.6.27.19-SLE11_BRANCH_20090304073920_1eb029c9-xen
crash with the same symptoms as reported above -- no drives, and no OK boot :-/
if the problem's not @kernel (is it?), then is it xen, mdadm, or something else? and, should this be _moved_ 'there', or others simply 'invited in' here?
https://bugzilla.novell.com/show_bug.cgi?id=463829 pgnet _ <pgnet.trash@gmail.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|REOPENED |NEEDINFO
Info Provider| |teheo@novell.com
https://bugzilla.novell.com/show_bug.cgi?id=463829 User teheo@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c93 --- Comment #93 from Tejun Heo <teheo@novell.com> 2009-03-04 18:54:32 MST --- Does iommu=usedac change anything for xen?
https://bugzilla.novell.com/show_bug.cgi?id=463829 User pgnet.trash@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c94 pgnet _ <pgnet.trash@gmail.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEEDINFO |REOPENED
Info Provider|teheo@novell.com |
--- Comment #94 from pgnet _ <pgnet.trash@gmail.com> 2009-03-04 19:08:10 MST --- adding, iommu=usedac to the new -xen stanza @grub,
  kernel /xen.gz iommu=usedac ...
unfortunately makes no difference; crash as above.
https://bugzilla.novell.com/show_bug.cgi?id=463829 User teheo@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c95 --- Comment #95 from Tejun Heo <teheo@novell.com> 2009-03-04 19:11:13 MST --- What do you mean by 'crash'? The sil24 timeouts? Can you attach the kernel log?
https://bugzilla.novell.com/show_bug.cgi?id=463829 User pgnet.trash@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c96 --- Comment #96 from pgnet _ <pgnet.trash@gmail.com> 2009-03-04 19:26:38 MST --- Created an attachment (id=277210) --> (https://bugzilla.novell.com/attachment.cgi?id=277210) console output for kernel-xen boot er ... realizing it likely has specific/limited meaning to you, i simply meant that, as before, i'm dropped to a root-login @ 'maintenance mode', failing to get completely booted-up, and unable to mount '/boot' at all. that said, per request, here's the output @console
https://bugzilla.novell.com/show_bug.cgi?id=463829 User teheo@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c97 --- Comment #97 from Tejun Heo <teheo@novell.com> 2009-03-04 19:31:34 MST --- Thanks. It looks like IRQ delivery problem for sil24 controller. If iommu=usedac doesn't change anything, it's likely to be a separate issue. Can you please file a separate bug report? Thanks.
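Comment #97 reads the log as an IRQ delivery problem for the sil24 controller. One way to gather supporting evidence, offered here as an assumption rather than something done in this bug, is to check whether the controller's counter in /proc/interrupts moves while libata is reporting timeouts. The helper `irq_count` is hypothetical; it assumes the device column of the relevant line contains the driver name `sata_sil24`, and that the per-CPU counters are the numeric fields right after the IRQ number:

```shell
# irq_count: hypothetical helper. Sums the per-CPU interrupt counts on
# every line of FILE whose text contains DEVICE.
# Usage: irq_count DEVICE [FILE]   (FILE defaults to /proc/interrupts)
irq_count() {
    awk -v dev="$1" '
        index($0, dev) {
            # Field 1 is the IRQ number (e.g. "16:"); the per-CPU counters
            # follow until the first non-numeric field (the irq chip type).
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) total += $i
        }
        END { print total + 0 }
    ' "${2:-/proc/interrupts}"
}
```

Running `irq_count sata_sil24` twice, a few seconds apart while the probe is stalled, and getting the same number both times would be consistent with interrupts not reaching the driver; a growing count would point somewhere else.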
https://bugzilla.novell.com/show_bug.cgi?id=463829 User teheo@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c98 Tejun Heo <teheo@novell.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|REOPENED |RESOLVED
Resolution| |FIXED
--- Comment #98 from Tejun Heo <teheo@novell.com> 2009-03-04 19:32:44 MST --- Resolving this one as FIXED. Oh.. when filing a separate one, please include the following:
* A reference to this bug.
* The captured failing log from KOTD above.
Thanks.
https://bugzilla.novell.com/show_bug.cgi?id=463829 User pgnet.trash@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=463829#c99 --- Comment #99 from pgnet _ <pgnet.trash@gmail.com> 2009-03-04 19:48:16 MST --- done: https://bugzilla.novell.com/show_bug.cgi?id=482220 thanks!