Hi,

I have a dual Opteron running SLES 8 (details below) that has always shown some stability issues when the CPUs are loaded and doing a lot of I/O. I have a 3ware RAID, but I'm patched up, so I'm hoping these aren't 3ware issues. Recently the instability has become very annoying. For example, running an R (http://www.r-project.org) process that calls Matlab many, many times locks up the machine within 24 hours. The same code runs fine on dual Opterons from Penguin Computing with RH ES.

I have some questions about some boot messages I'm getting. Do they indicate a serious issue (related to the BIOS or something)? I have attached my boot.msg. It contains some odd stuff such as:
<4>CPU 0: aperture @ 9b48000000 size 32 MB
<4>Aperture from northbridge cpu 0 too small (32 MB)
and

<4>mtrr: type mismatch for fd000000,400000 old: write-back new: write-combining
[more messages like this]
<4>mtrr: size and base must be multiples of 4 kiB
<4>mtrr: size: 0x800 base: 0xfd000000
[more messages like this]

Note that "cat /proc/mtrr" shows:

reg00: base=0x00000000 ( 0MB), size=4096MB: write-back, count=1
reg01: base=0xfc000000 (4032MB), size= 64MB: uncachable, count=1

Are these warnings innocuous?

System Description

Poly 2200A Dual Opteron ATX MB w/VGA,SATA,2LAN [Motherboard is Arima HDAMA]
2xAMD Opteron Processor Model 244 (64bit-1.8GHz)
4GB RAM: 8xDDR 333MHz 512M SDRAM PC2700 Memory ECC/Register
3Ware 7000-2 Port IDE RAID Controller
2xWD 250GB Ultra-100 IDE 7200RPM 8MB HD

Cheers,
JS.

BOOT.MSG

Inspecting /boot/System.map-2.4.21-207-smp
Loaded 14963 symbols from /boot/System.map-2.4.21-207-smp.
Symbols match kernel version 2.4.21.
Error seeking in /dev/kmem
Symbol #lvm-mod, value a0070000
Error adding kernel module table entry.
klogd 1.4.1, log source = ksyslog started.
ok
<4>Bootdata ok (command line is root=/dev/sda2 hda=ide-scsi vga=788 )
<4>Linux version 2.4.21-207-smp (root@x86_64.suse.de) (gcc version 3.2.2 (SuSE Linux)) #1 SMP Thu Mar 11 15:46:44 UTC 2004
<6>BIOS-provided physical RAM map:
<4> BIOS-e820: 0000000000000000 - 000000000009f000 (usable)
<4> BIOS-e820: 000000000009f000 - 00000000000a0000 (reserved)
<4> BIOS-e820: 00000000000cc000 - 0000000000100000 (reserved)
<4> BIOS-e820: 0000000000100000 - 00000000fbf80000 (usable)
<4> BIOS-e820: 00000000fbf80000 - 00000000fc000000 (reserved)
<4> BIOS-e820: 00000000fec00000 - 00000000fec00400 (reserved)
<4> BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
<4> BIOS-e820: 00000000fff80000 - 0000000100000000 (reserved)
<4>kernel direct mapping tables upto 10100000000 @ 8000-d000
<6>Scanning NUMA topology in Northbridge 24
<3>Node 0 using interleaving mode 1/0
<6>No NUMA configuration found
<6>Faking a node at 0000000000000000-00000000fbf80000
<4>Bootmem setup node 0 0000000000000000-00000000fbf80000
<7>ACPI: have wakeup address 0x10000002000
<4>Scan SMP from 0000010000000000 for 1024 bytes.
<4>Scan SMP from 000001000009fc00 for 1024 bytes.
<4>Scan SMP from 00000100000f0000 for 65536 bytes.
<6>found SMP MP-table at 00000000000f6b00
<4>hm, page 000f6000 reserved twice.
<4>hm, page 000f7000 reserved twice.
<4>hm, page 0009f000 reserved twice.
<4>hm, page 000a0000 reserved twice.
<4>setting up node 0 0-fbf80
<4>On node 0 totalpages: 1032064
<4>zone(0): 4096 pages.
<4>zone(1): 1027968 pages.
<4>zone(2): 0 pages.
<3>ACPI: Unable to locate RSDP
<4>Intel MultiProcessor Specification v1.4
<4> Virtual Wire compatibility mode.
<4>OEM ID: AMD Product ID: HAMMER APIC at: 0xFEE00000
<4>Processor #0 15:5 APIC version 16
<4>Processor #1 15:5 APIC version 16
<4>I/O APIC #2 Version 17 at 0xFEC00000.
<4>I/O APIC #3 Version 17 at 0xFC000000.
<4>I/O APIC #4 Version 17 at 0xFC001000.
<4>Processors: 2
<4>Checking aperture...
<4>CPU 0: aperture @ 9b48000000 size 32 MB
<4>Aperture from northbridge cpu 0 too small (32 MB)
<4>No AGP bridge found
<4>Building zonelist for node : 0
<4>Kernel command line: root=/dev/sda2 hda=ide-scsi vga=788
<6>ide_setup: hda=ide-scsi
<4>Initializing CPU#0
<6>time.c: Detected 1.193182 MHz PIT timer.
<6>time.c: Detected 1804.125 MHz TSC timer.
<4>Console: colour dummy device 80x25
<4>Calibrating delay loop... 3591.37 BogoMIPS
<4>Memory: 4029392k/4128256k available (1865k kernel code, 0k reserved, 1185k data, 168k init)
<6>Dentry cache hash table entries: 131072 (order: 9, 2097152 bytes)
<6>Inode cache hash table entries: 131072 (order: 9, 2097152 bytes)
<6>Mount cache hash table entries: 256 (order: 0, 4096 bytes)
<4>Buffer-cache hash table entries: 262144 (order: 9, 2097152 bytes)
<4>Page-cache hash table entries: 262144 (order: 9, 2097152 bytes)
<6>CPU: L1 I Cache: 64K (64 bytes/line/2 way), D cache 64K (64 bytes/line/2 way)
<6>CPU: L2 Cache: 1024K (64 bytes/line/8 way)
<6>Machine Check Reporting enabled for CPU#0
<4>POSIX conformance testing by UNIFIX
<4>mtrr: v2.02 (20020716))
<6>CPU: L1 I Cache: 64K (64 bytes/line/2 way), D cache 64K (64 bytes/line/2 way)
<6>CPU: L2 Cache: 1024K (64 bytes/line/8 way)
<4>CPU0: AMD Opteron(tm) Processor 244 stepping 01
<4>per-CPU timeslice cutoff: 5117.91 usecs.
<4>task migration cache decay timeout: 10 msecs.
<4>Booting processor 1/1 rip 6000 page 00000100fb880000
<4>Initializing CPU#1
<4>Calibrating delay loop... 3604.48 BogoMIPS
<6>CPU: L1 I Cache: 64K (64 bytes/line/2 way), D cache 64K (64 bytes/line/2 way)
<6>CPU: L2 Cache: 1024K (64 bytes/line/8 way)
<6>Machine Check Reporting enabled for CPU#1
<4>CPU1: AMD Opteron(tm) Processor 244 stepping 01
<6>Total of 2 processors activated (7195.85 BogoMIPS).
<4>ENABLING IO-APIC IRQs
<4>Setting 2 in the phys_id_present_map
<6>...changing IO-APIC physical APIC ID to 2 ... ok.
<4>Setting 3 in the phys_id_present_map
<6>...changing IO-APIC physical APIC ID to 3 ... ok.
<4>Setting 4 in the phys_id_present_map
<6>...changing IO-APIC physical APIC ID to 4 ... ok.
<7>init IO_APIC IRQs
<7> IO-APIC (apicid-pin) 2-0, 2-17, 2-20, 2-21, 2-22, 2-23, 3-0, 3-1, 4-0, 4-1, 4-2, 4-3 not connected.
<6>..TIMER: vector=0x31 pin1=2 pin2=0
<7>number of MP IRQ sources: 23.
<7>number of IO-APIC #2 registers: 24.
<7>number of IO-APIC #3 registers: 4.
<7>number of IO-APIC #4 registers: 4.
<6>testing the IO APIC.......................
<4>
<7>IO APIC #2......
<7>.... register #00: 02000000
<7>....... : physical APIC id: 02
<7>.... register #01: 00170011
<7>....... : max redirection entries: 0017
<7>....... : PRQ implemented: 0
<7>....... : IO APIC version: 0011
<7>.... register #02: 02000000
<7>....... : arbitration: 02
<7>.... IRQ redirection table:
<7> NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
<7> 00 000 00 1 0 0 0 0 0 0 00
<7> 01 001 01 0 0 0 0 0 1 1 39
<7> 02 001 01 0 0 0 0 0 1 1 31
<7> 03 001 01 0 0 0 0 0 1 1 41
<7> 04 001 01 0 0 0 0 0 1 1 49
<7> 05 001 01 1 1 0 1 0 1 1 51
<7> 06 001 01 0 0 0 0 0 1 1 59
<7> 07 001 01 0 0 0 0 0 1 1 61
<7> 08 001 01 0 0 0 0 0 1 1 69
<7> 09 001 01 0 0 0 0 0 1 1 71
<7> 0a 001 01 1 1 0 1 0 1 1 79
<7> 0b 001 01 1 1 0 1 0 1 1 81
<7> 0c 001 01 0 0 0 0 0 1 1 89
<7> 0d 001 01 0 0 0 0 0 1 1 91
<7> 0e 001 01 0 0 0 0 0 1 1 99
<7> 0f 001 01 0 0 0 0 0 1 1 A1
<7> 10 001 01 1 1 0 1 0 1 1 A9
<7> 11 000 00 1 0 0 0 0 0 0 00
<7> 12 001 01 1 1 0 1 0 1 1 B1
<7> 13 001 01 1 1 0 1 0 1 1 B9
<7> 14 000 00 1 0 0 0 0 0 0 00
<7> 15 000 00 1 0 0 0 0 0 0 00
<7> 16 000 00 1 0 0 0 0 0 0 00
<7> 17 000 00 1 0 0 0 0 0 0 00
<4>
<7>IO APIC #3......
<7>.... register #00: 03000000
<7>....... : physical APIC id: 03
<7>.... register #01: 00030011
<7>....... : max redirection entries: 0003
<7>....... : PRQ implemented: 0
<7>....... : IO APIC version: 0011
<7>.... register #02: 00000000
<7>....... : arbitration: 00
<7>.... IRQ redirection table:
<7> NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
<7> 00 000 00 1 0 0 0 0 0 0 00
<7> 01 000 00 1 0 0 0 0 0 0 00
<7> 02 001 01 1 1 0 1 0 1 1 C1
<7> 03 001 01 1 1 0 1 0 1 1 C9
<4>
<7>IO APIC #4......
<7>.... register #00: 04000000
<7>....... : physical APIC id: 04
<7>.... register #01: 00030011
<7>....... : max redirection entries: 0003
<7>....... : PRQ implemented: 0
<7>....... : IO APIC version: 0011
<7>.... register #02: 00000000
<7>....... : arbitration: 00
<7>.... IRQ redirection table:
<7> NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
<7> 00 000 00 1 0 0 0 0 0 0 00
<7> 01 000 00 1 0 0 0 0 0 0 00
<7> 02 000 00 1 0 0 0 0 0 0 00
<7> 03 000 00 1 0 0 0 0 0 0 00
<7>IRQ to pin mappings:
<7>IRQ0 -> 0:2
<7>IRQ1 -> 0:1
<7>IRQ3 -> 0:3
<7>IRQ4 -> 0:4
<7>IRQ5 -> 0:5
<7>IRQ6 -> 0:6
<7>IRQ7 -> 0:7
<7>IRQ8 -> 0:8
<7>IRQ9 -> 0:9
<7>IRQ10 -> 0:10
<7>IRQ11 -> 0:11
<7>IRQ12 -> 0:12
<7>IRQ13 -> 0:13
<7>IRQ14 -> 0:14
<7>IRQ15 -> 0:15
<7>IRQ16 -> 0:16
<7>IRQ18 -> 0:18
<7>IRQ19 -> 0:19
<7>IRQ26 -> 1:2
<7>IRQ27 -> 1:3
<6>.................................... done.
<4>Using local APIC timer interrupts.
<4>Detected 12.528 MHz APIC timer.
<6>cpu: 0, clocks: 2004584, slice: 668194
<6>CPU0T0:2004576,T1:1336368,D:14,S:668194,C:2004584
<6>cpu: 1, clocks: 2004584, slice: 668194
<6>CPU1T0:2004576,T1:668176,D:12,S:668194,C:2004584
<4>checking TSC synchronization across CPUs: passed.
<6>time.c: Using PIT/TSC based timekeeping.
<4>migration_task 0 on cpu=0
<4>migration_task 1 on cpu=1
<6>ACPI: Subsystem revision 20030619
<6>PCI: Using configuration type 1
<3>ACPI: System description tables not found
<4> ACPI-0084: *** Error: acpi_load_tables: Could not get RSDP, AE_NOT_FOUND
<4> ACPI-0134: *** Error: acpi_load_tables: Could not load tables: AE_NOT_FOUND
<3>ACPI: Unable to load the System Description Tables
<6>PCI: Using configuration type 1
<6>PCI: Probing PCI hardware
<4>PCI: Probing PCI hardware (bus 00)
<6>PCI: Using IRQ router default [1022/746b] at 00:07.3
<6>PCI->APIC IRQ transform: (B1,I0,P3) -> 19
<6>PCI->APIC IRQ transform: (B1,I0,P3) -> 19
<6>PCI->APIC IRQ transform: (B1,I4,P0) -> 16
<6>PCI->APIC IRQ transform: (B1,I6,P0) -> 18
<6>PCI->APIC IRQ transform: (B2,I2,P0) -> 26
<6>PCI->APIC IRQ transform: (B2,I3,P0) -> 27
<6>PCI->APIC IRQ transform: (B2,I4,P0) -> 27
<6>Linux agpgart interface v0.99 (c) Jeff Hartmann
<6>agpgart: Maximum main memory to use for agp memory: 3868M
<7>agpgart: no supported devices found.
<6>PCI-DMA: Disabling IOMMU.
<6>Linux NET4.0 for Linux 2.4
<6>Based upon Swansea University Computer Society NET3.039
<4>Initializing RT netlink socket
<4>Starting kswapd
<4>bigpage subsystem: allocated 0 bigpages (=0MB).
<4>kinoded started
<5>VFS: Disk quotas vdquot_6.5.1
<5>aio_setup: num_physpages = 258016
<5>aio_setup: sizeof(struct page) = 88
<4>IA32 emulation $Id: sys_ia32.c,v 1.60 2003/07/11 15:58:45 ak Exp $
<6>vesafb: framebuffer at 0xfd000000, mapped to 0xffffff0000011000, size 8128k
<6>vesafb: mode is 800x600x16, linelength=1600, pages=7
<6>vesafb: scrolling: redraw
<6>vesafb: directcolor: size=0:5:6:5, shift=0:11:5:0
<4>mtrr: type mismatch for fd000000,400000 old: write-back new: write-combining
<4>mtrr: type mismatch for fd000000,200000 old: write-back new: write-combining
<4>mtrr: type mismatch for fd000000,100000 old: write-back new: write-combining
<4>mtrr: type mismatch for fd000000,80000 old: write-back new: write-combining
<4>mtrr: type mismatch for fd000000,40000 old: write-back new: write-combining
<4>mtrr: type mismatch for fd000000,20000 old: write-back new: write-combining
<4>mtrr: type mismatch for fd000000,10000 old: write-back new: write-combining
<4>mtrr: type mismatch for fd000000,8000 old: write-back new: write-combining
<4>mtrr: type mismatch for fd000000,4000 old: write-back new: write-combining
<4>mtrr: type mismatch for fd000000,2000 old: write-back new: write-combining
<4>mtrr: type mismatch for fd000000,1000 old: write-back new: write-combining
<4>mtrr: size and base must be multiples of 4 kiB
<4>mtrr: size: 0x800 base: 0xfd000000
<4>mtrr: size and base must be multiples of 4 kiB
<4>mtrr: size: 0x400 base: 0xfd000000
<4>mtrr: size and base must be multiples of 4 kiB
<4>mtrr: size: 0x200 base: 0xfd000000
<4>mtrr: size and base must be multiples of 4 kiB
<4>mtrr: size: 0x100 base: 0xfd000000
<4>mtrr: size and base must be multiples of 4 kiB
<4>mtrr: size: 0x80 base: 0xfd000000
<4>mtrr: size and base must be multiples of 4 kiB
<4>mtrr: size: 0x40 base: 0xfd000000
<4>mtrr: size and base must be multiples of 4 kiB
<4>mtrr: size: 0x20 base: 0xfd000000
<4>mtrr: size and base must be multiples of 4 kiB
<4>mtrr: size: 0x10 base: 0xfd000000
<4>mtrr: size and base must be multiples of 4 kiB
<4>mtrr: size: 0x8 base: 0xfd000000
<4>mtrr: size and base must be multiples of 4 kiB
<4>mtrr: size: 0x4 base: 0xfd000000
<4>mtrr: size and base must be multiples of 4 kiB
<4>mtrr: size: 0x2 base: 0xfd000000
<4>mtrr: size and base must be multiples of 4 kiB
<4>mtrr: size: 0x1 base: 0xfd000000
<6>bootsplash 3.0.9-2003/09/08: looking for picture.... found (800x600, 10632 bytes, v2).
<4>Console: switching to colour frame buffer device 99x30
<6>fb0: VESA VGA frame buffer device
<4>pty: 256 Unix98 ptys configured
<6>Serial driver version 5.05c (2001-07-08) with HUB-6 MANY_PORTS MULTIPORT SHARE_IRQ SERIAL_PCI enabled
<6>ttyS00 at 0x03f8 (irq = 4) is a 16550A
<6>ttyS01 at 0x02f8 (irq = 3) is a 16550A
<6>Real Time Clock Driver v1.10e
<6>Floppy drive(s): fd0 is 1.44M
<6>FDC 0 is a National Semiconductor PC87306
<4>RAMDISK driver initialized: 16 RAM disks of 128000K size 1024 blocksize
<6>loop: loaded (max 16 devices)
<6>Uniform Multi-Platform E-IDE driver Revision: 7.00beta4-2.4
<6>ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
<6>AMD8111: IDE controller at PCI slot 00:07.1
<6>AMD8111: chipset revision 3
<6>AMD8111: not 100%% native mode: will probe irqs later
<6>AMD_IDE: PCI device 1022:7469 (rev 03) UDMA133 controller on pci00:07.1
<6> ide0: BM-DMA at 0x1020-0x1027, BIOS settings: hda:DMA, hdb:pio
<6> ide1: BM-DMA at 0x1028-0x102f, BIOS settings: hdc:pio, hdd:pio
<4>hda: SONY CD-RW CRX220E1, ATAPI CD/DVD-ROM drive
<4>ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
<6>ide-floppy driver 0.99.newide
<6>ide-floppy driver 0.99.newide
<6>md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
<6>md: Autodetecting RAID arrays.
<6>md: autorun ...
<6>md: ... autorun DONE.
<6>NET4: Linux TCP/IP 1.0 for NET4.0
<6>IP Protocols: ICMP, UDP, TCP, IGMP
<6>IP: routing cache hash table of 16384 buckets, 256Kbytes
<6>TCP: Hash tables configured (established 131072 bind 65536)
<6>Linux IP multicast router 0.06 plus PIM-SM
<6>NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
<6>cryptoapi: loaded
<5>RAMDISK: Compressed image found at block 0
<4>VFS: Mounted root (ext2 filesystem).
<6>SCSI subsystem driver Revision: 1.00
<4>3ware Storage Controller device driver for Linux v1.02.00.032.
<5>scsi0 : Found a 3ware Storage Controller at 0x3000, IRQ: 26, P-chip: 1.3
<6>scsi0 : 3ware Storage Controller
<4> Vendor: 3ware Model: Logical Disk 0 Rev: 1.0
<4> Type: Direct-Access ANSI SCSI revision: 00
<4>Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
<4>SCSI device sda: 488395120 512-byte hdwr sectors (250058 MB)
<6>Partition check:
<6> sda: sda1 sda2
<4>hda: attached ide-scsi driver.
<6>scsi1 : SCSI host adapter emulation for IDE ATAPI devices
<4> Vendor: SONY Model: CD-RW CRX220E1 Rev: 6YS1
<4> Type: CD-ROM ANSI SCSI revision: 02
<4>reiserfs: found format "3.6" with standard journal
<4>reiserfs: enabling write barrier flush mode
<4>reiserfs: using ordered data mode
<4>reiserfs: checking transaction log (device sd(8,2)) ...
<4>for (sd(8,2))
<4>reiserfs: replayed 26 transactions in 14 seconds
<4>Using r5 hash to sort names
<4>VFS: Mounted root (reiserfs filesystem) readonly.
<5>Trying to move old root to /initrd ... failed
<5>Unmounting old root
<5>Trying to free ramdisk memory ... okay
<4>Freeing unused kernel memory: 168k freed
<6>md: Autodetecting RAID arrays.
<6>md: autorun ...
<6>md: ... autorun DONE.
<4>reiserfs: enabling write barrier flush mode
<4>Removing [8701 705696 0x0 SD]..done
<4>There were 1 uncompleted unlinks/truncates. Completed
<6>lvm-mp: allocating 39 lowmem entries at 00000100073f0000
<6>LVM version 1.0.5+(mp-v6d)(22/07/2002) module loaded
<4>reiserfs: enabling write barrier flush mode
<6>Adding Swap: 1052216k swap-space (priority 42)
<4>reiserfs: enabling write barrier flush mode
<4>reiserfs: enabling write barrier flush mode
Kernel logging (ksyslog) stopped.
Kernel log daemon terminating.
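
P.S. For anyone who wants to check the numbers in the mtrr messages, here is a minimal Python sketch. It is not something the kernel or any tool provides, and it does not diagnose anything. It only does the arithmetic on values quoted in this post: which /proc/mtrr registers cover the vesafb framebuffer address 0xfd000000, and which of the halving sizes in the log are not multiples of 4 KiB. The function names (parse_mtrr, covering, is_4k_multiple) are made up for illustration, and the parser assumes the two-line format shown above with sizes given in MB.

```python
import re

# Copied from the "cat /proc/mtrr" output quoted in this post.
PROC_MTRR = """\
reg00: base=0x00000000 ( 0MB), size=4096MB: write-back, count=1
reg01: base=0xfc000000 (4032MB), size= 64MB: uncachable, count=1
"""

# Assumption: every line looks like the two above, with sizes in MB only.
LINE_RE = re.compile(
    r"reg(\d+): base=0x([0-9a-fA-F]+) \(\s*\d+MB\), "
    r"size=\s*(\d+)MB: ([\w-]+), count=\d+"
)


def parse_mtrr(text):
    """Return a list of (register, base, size_in_bytes, type) tuples."""
    regions = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            reg, base, size_mb, mtype = m.groups()
            regions.append((int(reg), int(base, 16), int(size_mb) << 20, mtype))
    return regions


def covering(regions, addr):
    """Return (register, type) for every region that contains addr."""
    return [(reg, mtype) for reg, base, size, mtype in regions
            if base <= addr < base + size]


def is_4k_multiple(value):
    """True if value is a multiple of 4 KiB (0x1000)."""
    return value % 0x1000 == 0


if __name__ == "__main__":
    fb_base = 0xFD000000  # framebuffer address from the vesafb line
    print(covering(parse_mtrr(PROC_MTRR), fb_base))

    # Sizes seen in the log: 0x400000 halved repeatedly down to 0x1.
    sizes = [0x400000 >> i for i in range(23)]
    rejected = [hex(s) for s in sizes if not is_4k_multiple(s)]
    print(rejected)
```

With the values from this post, the framebuffer address falls inside both reg00 and reg01, and every size below 0x1000 fails the 4 KiB check, which lines up with where the log switches from "type mismatch" to "size and base must be multiples of 4 kiB".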