[opensuse] What problem caused the system to crash?
Dear all,

I have a terrible problem: my production machine crashed after running for 12 hours. Below are the last messages from /var/log/boot.msg.

It is openSUSE 11.1 running on:
- Motherboard: Intel DG43NB
- CPU: E7500
- HDD: Seagate Model=ST3250310NS (ES type)
- 1 GB of RAM, non-ECC
- D-Link NIC (Realtek, 8139too driver)
- Intel NIC, e1000e driver (onboard)
- 3Com 3c590

FYI, this box has the HDD controller set to Native IDE mode (not AHCI). I also have another box with a similar specification (set to AHCI, no 3c590) that has run without problems for more than a week.

I would appreciate advice and suggestions from the experts here.

Thank you very much for your kind help.

Best Regards,
Wong

---snip---
<6>input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input3
<6>ACPI: Power Button [PWRF]
<6>8139too Fast Ethernet driver 0.9.28
<6>8139too 0000:04:00.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
<6>eth0: RealTek RTL8139 at 0xf85d8000, 00:24:01:eb:76:ec, IRQ 20
<6>3c59x 0000:04:04.0: PCI INT A -> GSI 22 (level, low) -> IRQ 22
<6>3c59x: Donald Becker and others.
<6>0000:04:04.0: 3Com PCI 3c590 Vortex 10Mbps at 0001d100.
<6>0000:04:04.0: Overriding PCI latency timer (CFLT) setting of 32, new value is 248.
<4>Driver 'rtc_cmos' needs updating - please use bus_type methods
<6>rtc_cmos 00:06: RTC can wake from S4
<6>rtc_cmos 00:06: rtc core: registered rtc_cmos as rtc0
<6>rtc0: alarms up to one year, y3k, 114 bytes nvram, hpet irqs
<6>i801_smbus 0000:00:1f.3: PCI INT C -> GSI 18 (level, low) -> IRQ 18
<3>ACPI: I/O resource 0000:00:1f.3 [0x1180-0x119f] conflicts with ACPI region SMIO [0x1180-0x119f]
<6>ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
<6>Adding 1044216k swap on /dev/sda1. Priority:-1 extents:1 across:1044216k
<6>loop: module loaded
Kernel logging (ksyslog) stopped.
Kernel log daemon terminating.
Boot logging started on /dev/tty1(/dev/console) at Sat Dec 26 19:22:48 2009
Waiting for device /dev/disk/by-id/ata-ST3250310NS_9SF14QXY-part2 to appear: ok
showconsole: Warning: the ioctl TIOCGDEV is not known by the kernel
fsck 1.41.1 (01-Sep-2008)
[/sbin/fsck.reiserfs (1) -- /] fsck.reiserfs -a /dev/disk/by-id/ata-ST3250310NS_9SF14QXY-part2
Reiserfs super block in block 16 on 0x802 of format 3.6 with standard journal
Blocks (total/free): 10484416/9994120 by 4096 bytes
Filesystem is NOT clean
Replaying journal:
Trans replayed: mountid 37, transid 20616, desc 4985, len 79, commit 5065, next trans offset 5048
Replaying journal: | | 0.3% 1 trans
Trans replayed: mountid 37, transid 20617, desc 5066, len 37, commit 5104, next trans offset 5087

-- 
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
For additional commands, e-mail: opensuse+help@opensuse.org
On Saturday, 2009-12-26 at 20:04 +0800, Wong wrote:
> Dear all,
> I have a terrible problem: my production machine crashed after running for 12 hours.
> Below are the last messages from /var/log/boot.msg.
The last lines from /var/log/messages would be more useful than the boot log, which gives no clue as to what the problem could have been. And it has happened only once...

Test the usual suspects: memory and hard disk. Use memtest, smartctl...

-- 
Cheers,
Carlos E. R.
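Carlos's suggestion to read the last lines of /var/log/messages can be automated. The following is a minimal Python sketch, assuming (as in this box's log) that every restart is marked by syslog-ng's own "syslog-ng starting up" line; the function name `lines_before_restarts` and the default window of five lines are illustrative choices, not part of any existing tool.

```python
from collections import deque
from typing import Deque, Iterable, List


def lines_before_restarts(
    lines: Iterable[str],
    marker: str = "syslog-ng starting up",  # assumption: syslog-ng logs this on every start
    count: int = 5,
) -> List[List[str]]:
    """Return, for each restart marker found, the `count` lines that preceded it."""
    recent: Deque[str] = deque(maxlen=count)  # sliding window of the most recent lines
    results: List[List[str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if marker in line:
            # Snapshot what was logged just before syslog-ng came back up.
            results.append(list(recent))
            recent.clear()  # start a fresh window for the next boot
        else:
            recent.append(line)
    return results
```

For example, `with open("/var/log/messages", errors="replace") as log: print(lines_before_restarts(log))`, usually run as root. On this box, the window before the 19:23:03 restart would end with the MRTG cron entries from 19:20:01.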
Hi Carlos,
>> I have a terrible problem: my production machine crashed after running for 12 hours.
>> Below are the last messages from /var/log/boot.msg.
>
> The last lines from /var/log/messages would be more useful than the boot log, which gives no clue as to what the problem could have been. And it has happened only once...
>
> Test the usual suspects: memory and hard disk. Use memtest, smartctl...
Thank you very much for your advice. Below are the last useful entries from /var/log/messages. The last entries before the reboot are the MRTG cron jobs started on Dec 26 at 19:20:01, so it seems the machine crashed shortly after MRTG ran. traffic.cfg monitors eth1 via SNMP (the other configs do not use SNMP). Our administrator did a hard reboot by switching the power off, and the box "luckily" came back up on Dec 26 at 19:23:03.

I also experienced the same problem on this box 2-3 days before this crash: it crashed while running snmpwalk. So I am curious: can SNMP activity cause a system crash?

BR,
Wong

--snip--
Dec 26 19:06:35 SuSEbox smartd[2591]: Device: /dev/sda [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 68 to 67
Dec 26 19:06:35 SuSEbox smartd[2591]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 32 to 33
Dec 26 19:10:01 SuSEbox /usr/sbin/cron[21351]: (root) CMD (/usr/local/mrtg-2/bin/mrtg /usr/local/mrtg-2/cfg/cpu.cfg >/dev/null 2>&1)
Dec 26 19:10:01 SuSEbox /usr/sbin/cron[21350]: (root) CMD (/usr/local/mrtg-2/bin/mrtg /usr/local/mrtg-2/cfg/qmail.cfg >/dev/null 2>&1)
Dec 26 19:10:01 SuSEbox /usr/sbin/cron[21352]: (root) CMD (/usr/local/mrtg-2/bin/mrtg /usr/local/mrtg-2/cfg/traffic.cfg >/dev/null 2>&1)
Dec 26 19:15:01 SuSEbox /usr/sbin/cron[21377]: (root) CMD (/usr/local/mrtg-2/bin/mrtg /usr/local/mrtg-2/cfg/cpu.cfg >/dev/null 2>&1)
Dec 26 19:15:01 SuSEbox /usr/sbin/cron[21380]: (root) CMD (/usr/local/mrtg-2/bin/mrtg /usr/local/mrtg-2/cfg/qmail.cfg >/dev/null 2>&1)
Dec 26 19:15:01 SuSEbox /usr/sbin/cron[21379]: (root) CMD (/usr/local/mrtg-2/bin/mrtg /usr/local/mrtg-2/cfg/traffic.cfg >/dev/null 2>&1)
Dec 26 19:20:01 SuSEbox /usr/sbin/cron[21423]: (root) CMD (/usr/local/mrtg-2/bin/mrtg /usr/local/mrtg-2/cfg/qmail.cfg >/dev/null 2>&1)
Dec 26 19:20:01 SuSEbox /usr/sbin/cron[21422]: (root) CMD (/usr/local/mrtg-2/bin/mrtg /usr/local/mrtg-2/cfg/cpu.cfg >/dev/null 2>&1)
Dec 26 19:20:01 SuSEbox /usr/sbin/cron[21426]: (root) CMD (/usr/local/mrtg-2/bin/mrtg /usr/local/mrtg-2/cfg/traffic.cfg >/dev/null 2>&1)
Dec 26 19:23:03 SuSEbox syslog-ng[1608]: syslog-ng starting up; version='2.0.9'
Dec 26 19:23:03 SuSEbox rchal: powersave cpufreq governor could not be loaded
Dec 26 19:23:03 SuSEbox ifup: lo
Dec 26 19:23:03 SuSEbox ifup: lo
Dec 26 19:23:03 SuSEbox ifup: IP address: 127.0.0.1/8
Dec 26 19:23:03 SuSEbox ifup:
Dec 26 19:23:03 SuSEbox ifup:
Dec 26 19:23:03 SuSEbox ifup: IP address: 127.0.0.2/8
Dec 26 19:23:03 SuSEbox ifup:
Dec 26 19:23:03 SuSEbox ifup: eth0 device: D-Link System Inc RTL8139 Ethernet (rev 10)
Dec 26 19:23:03 SuSEbox ifup: eth0
Dec 26 19:23:03 SuSEbox ifup: IP address: 222.108.242.130/29
Dec 26 19:23:03 SuSEbox ifup:
Dec 26 19:23:04 SuSEbox SuSEfirewall2: SuSEfirewall2 not active
Dec 26 19:23:04 SuSEbox ifup: eth1 device: Intel Corporation 82567V-2 Gigabit Network Connection
Dec 26 19:23:04 SuSEbox ifup: eth1
Dec 26 19:23:04 SuSEbox ifup: IP address: 192.168.0.21/24
Dec 26 19:23:04 SuSEbox ifup:
Dec 26 19:23:04 SuSEbox SuSEfirewall2: SuSEfirewall2 not active
Dec 26 19:23:04 SuSEbox ifup: eth2 device: 3Com Corporation 3c590 10BaseT [Vortex]
Dec 26 19:23:04 SuSEbox ifup: eth2
Dec 26 19:23:04 SuSEbox ifup: IP address: 192.168.10.1/24
Dec 26 19:23:04 SuSEbox ifup:
Dec 26 19:23:04 SuSEbox SuSEfirewall2: SuSEfirewall2 not active
Dec 26 19:23:08 SuSEbox kernel: klogd 1.4.1, log source = /proc/kmsg started.
Dec 26 19:23:08 SuSEbox kernel: eth0: link up, 100Mbps, full-duplex, lpa 0x45E1
Dec 26 19:23:08 SuSEbox kernel: e1000e 0000:00:19.0: irq 27 for MSI/MSI-X
Dec 26 19:23:08 SuSEbox kernel: e1000e 0000:00:19.0: irq 27 for MSI/MSI-X
Dec 26 19:23:08 SuSEbox kernel: eth2: setting half-duplex.
Dec 26 19:23:08 SuSEbox kernel: e1000e: eth1 NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
Dec 26 19:23:08 SuSEbox kernel: 0000:00:19.0: eth1: 10/100 speed: disabling TSO
Dec 26 19:23:08 SuSEbox auditd[2559]: Started dispatcher: /sbin/audispd pid: 2561
Dec 26 19:23:08 SuSEbox /usr/sbin/cron[2562]: (CRON) STARTUP (V5.0)
Dec 26 19:23:08 SuSEbox auditd[2559]: Init complete, auditd 1.7.7 listening for events (startup state disable)
Dec 26 19:23:08 SuSEbox audispd: priority_boost_parser called with: 4
Dec 26 19:23:08 SuSEbox audispd: af_unix plugin initialized
Dec 26 19:23:08 SuSEbox audispd: audispd initialized with q_depth=80 and 1 active plugins
Dec 26 19:23:08 SuSEbox smartd[2564]: smartd 5.39 2008-10-24 22:33 [i686-suse-linux-gnu] (openSUSE RPM) Copyright (C) 2002-8 by Bruce Allen, http://smartmontools.sourceforge.net
Dec 26 19:23:08 SuSEbox smartd[2564]: Opened configuration file /etc/smartd.conf
Dec 26 19:23:08 SuSEbox smartd[2564]: Drive: DEVICESCAN, implied '-a' Directive on line 26 of file /etc/smartd.conf
Dec 26 19:23:08 SuSEbox smartd[2564]: Configuration file /etc/smartd.conf was parsed, found DEVICESCAN, scanning devices
Dec 26 19:23:09 SuSEbox smartd[2564]: Device: /dev/sda, type changed from 'scsi' to 'sat'
Dec 26 19:23:09 SuSEbox smartd[2564]: Device: /dev/sda [SAT], opened
Dec 26 19:23:09 SuSEbox smartd[2564]: Device: /dev/sda [SAT], found in smartd database.
Dec 26 19:23:09 SuSEbox sshd[2578]: Server listening on 0.0.0.0 port 7575.
Dec 26 19:23:09 SuSEbox smartd[2564]: Device: /dev/sda [SAT], is SMART capable. Adding to "monitor" list.
Dec 26 19:23:09 SuSEbox smartd[2564]: Device: /dev/sda [SAT], state read from /var/lib/smartmontools/smartd.ST3250310NS-9SF14QXY.ata.state
Dec 26 19:23:09 SuSEbox smartd[2564]: Monitoring 1 ATA and 0 SCSI devices
Dec 26 19:23:09 SuSEbox smartd[2564]: Device: /dev/sda [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 75 to 76
Dec 26 19:23:09 SuSEbox smartd[2564]: Device: /dev/sda [SAT], SMART Usage Attribute: 188 Unknown_Attribute changed from 100 to 98
Dec 26 19:23:09 SuSEbox smartd[2564]: Device: /dev/sda [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 67 to 68
Dec 26 19:23:09 SuSEbox smartd[2564]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 33 to 32
Dec 26 19:23:09 SuSEbox smartd[2564]: Device: /dev/sda [SAT], SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 54 to 55
Dec 26 19:23:09 SuSEbox smartd[2564]: Device: /dev/sda [SAT], state written to /var/lib/smartmontools/smartd.ST3250310NS-9SF14QXY.ata.state
Dec 26 19:23:09 SuSEbox smartd[2582]: smartd has fork()ed into background mode. New PID=2582.
Dec 26 19:23:10 SuSEbox kernel: ip_tables: (C) 2000-2006 Netfilter Core Team
Dec 26 19:23:10 SuSEbox kernel: nf_conntrack version 0.5.0 (15237 buckets, 60948 max)
Dec 26 19:23:10 SuSEbox kernel: CONFIG_NF_CT_ACCT is deprecated and will be removed soon. Please use
Dec 26 19:23:10 SuSEbox kernel: nf_conntrack.acct=1 kernel parameter, acct=1 nf_conntrack module option or
Dec 26 19:23:10 SuSEbox kernel: sysctl net.netfilter.nf_conntrack_acct=1 to enable it.
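Wong's estimate that the crash followed the 19:20:01 MRTG run can be turned into a bounded time window by subtracting the last pre-crash timestamp from the first post-restart one. This is a minimal Python sketch under stated assumptions: classic syslog timestamps carry no year, so the caller supplies it (2009 here, taken from boot.msg); both lines are assumed to fall in the same year; and `%b` assumes English month abbreviations. The helper names `parse_syslog_time` and `crash_window` are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Tuple

# Classic syslog timestamps look like "Dec 26 19:20:01" and carry no year,
# so the year is prepended explicitly. %b expects English month names
# (the default C locale).
SYSLOG_TS_FORMAT = "%Y %b %d %H:%M:%S"


def parse_syslog_time(line: str, year: int) -> datetime:
    """Parse the leading 'Mon DD HH:MM:SS' timestamp of a syslog line."""
    # The timestamp is the first three whitespace-separated fields;
    # split() also copes with space-padded days such as "Dec  6".
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", SYSLOG_TS_FORMAT)


def crash_window(
    last_line: str, restart_line: str, year: int
) -> Tuple[datetime, datetime, timedelta]:
    """Return (last entry before the crash, first entry after restart, gap).

    Assumes both lines fall in the same calendar year.
    """
    start = parse_syslog_time(last_line, year)
    end = parse_syslog_time(restart_line, year)
    return start, end, end - start
```

Fed the 19:20:01 traffic.cfg cron line and the 19:23:03 syslog-ng line with year 2009, this gives a gap of 3 minutes 2 seconds. Because that gap also includes the power cycle and boot (boot.msg shows boot logging started at 19:22:48), the crash itself appears to have happened between 19:20:01 and some time before 19:22:48.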
participants (2)
-
Carlos E. R.
-
Wong