[Bug 248448] New: OpenSuSE 10.2: Sporadically crashes on new hardware. Completely unresponsive!
https://bugzilla.novell.com/show_bug.cgi?id=248448

Summary: OpenSuSE 10.2: Sporadically crashes on new hardware. Completely unresponsive!
Product: openSUSE 10.2
Version: Final
Platform: x86
OS/Version: Other
Status: NEW
Severity: Critical
Priority: P5 - None
Component: Kernel
AssignedTo: kernel-maintainers@forge.provo.novell.com
ReportedBy: mkeys@catoosa.k12.ga.us
QAContact: qa@suse.de

I think I'm having the same problem as this guy:
http://www.archivesat.com/openSUSE_help/thread2403188.htm

The system just hangs. The HD activity light is solid. Keyboard/mouse is not responsive in KDE (if running). SSH will not respond with a login prompt, but I am getting ping replies from it. The only fix I know of is to press and hold the power button and pray it comes back up.

This system is a 1-year-old HP dx5150 with an AMD Athlon 64 3000+ (1.8GHz), 512MB RAM, and a 40GB SATA hard drive. This same system ran fine with OpenSuSE 10.0. It started crashing after I upgraded to 10.2. It is used primarily for web proxy and content filtering (Squid, Dansguardian, ClamAV), but I'm also running Samba, BIND, Apache2, MySQL, and webmin. The ClamAV plugin is also a recent change since upgrading from 10.0.

Latest kernel on the updates..

proxy:/var/log # uname -r
2.6.18.2-34-default

Plenty of space on the hard drive..

proxy:/var/log # df
Filesystem  1K-blocks     Used Available Use% Mounted on
/dev/sda2    38177260 10046268  28130992  27% /
udev           225088       80    225008   1% /dev

I was getting lots of these messages in /var/log/messages:

WARNING: newer swaplog entry for dirno 0, fileno 00002815
WARNING: Disk space over limit: 105304 KB > 102400 KB

I deleted /var/cache/squid/swap.store and restarted Squid. I also wrote an hourly cron job to stop dansguardian and squid, delete the contents of /tmp/dgvirus and /var/cache/squid/swap.store, and finally restart squid and dansguardian to make sure I'm getting a clean slate. That did the trick for clearing those error messages, but even with that it froze twice today.
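The hourly clean-slate job described above might look roughly like the sketch below. The report does not include the actual script, so this is only an illustration: the init script locations, the variable names and the helper functions `empty_dir` and `clean_slate` are assumptions, while the two paths being cleaned are the ones named in the report. It also relies on `find` supporting `-mindepth` and `-delete`, as GNU find does.

```shell
#!/bin/sh
# Hourly clean-slate sketch: stop the filters, clear scratch data, restart.
# ASSUMPTIONS (not from the report): the init script paths and all variable
# and function names are hypothetical; adjust them to the local setup.
SQUID_INIT=${SQUID_INIT:-/etc/init.d/squid}
DG_INIT=${DG_INIT:-/etc/init.d/dansguardian}
DG_TMP=${DG_TMP:-/tmp/dgvirus}
SQUID_SWAP=${SQUID_SWAP:-/var/cache/squid/swap.store}

# Remove everything inside a directory but keep the directory itself.
empty_dir() {
    [ -d "$1" ] || return 0
    find "$1" -mindepth 1 -delete
}

# Same order as in the report: stop dansguardian and squid, clean up,
# then start squid and dansguardian again.
clean_slate() {
    "$DG_INIT" stop
    "$SQUID_INIT" stop
    empty_dir "$DG_TMP"
    rm -f "$SQUID_SWAP"
    "$SQUID_INIT" start
    "$DG_INIT" start
}
```

As written, the block only defines the functions; a script installed under cron would end with a call to `clean_slate`.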
I'm getting lots of these too... is this a big problem?

smbd[4397]: [2007/02/23 15:40:35, 0] auth/auth_util.c:create_builtin_administrators(785)
smbd[4397]: create_builtin_administrators: Failed to create Administrators

Here are the crashes..

proxy:/var/log # last |grep crash
root pts/0 Fri Feb 23 14:13 - crash (-4:-42)
root :0 console Fri Feb 23 14:12 - crash (-4:-41)
root pts/0 Thu Feb 22 16:09 - crash (12:54)
root :0 console Thu Feb 22 16:08 - crash (12:55)
root tty1 Mon Jan 22 10:00 - crash (-2:-45)

Where else can I look to find the problem? Please help!

--
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
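To see whether the freezes cluster on particular days, the `last | grep crash` listing above can be tallied per date. This is a minimal sketch, assuming `last` prints the weekday, month and day as separate columns as in the output shown; the function name `crash_sessions_per_day` is made up for illustration. It counts login sessions that ended in a crash, so one freeze with both a console and a pts session open is counted twice.

```shell
#!/bin/sh
# Tally sessions that `last` marks as ended by a crash, grouped by date.
# Reads `last` output on stdin and prints "Mon DD count" lines, sorted.
# The function name is hypothetical, not part of any existing tool.
crash_sessions_per_day() {
    awk '
        /- crash/ {
            # The date follows the weekday column, whose position varies
            # (console lines carry an extra column before it).
            for (i = 1; i <= NF - 2; i++) {
                if ($i ~ /^(Mon|Tue|Wed|Thu|Fri|Sat|Sun)$/) {
                    count[$(i + 1) " " $(i + 2)]++
                    break
                }
            }
        }
        END { for (d in count) print d, count[d] }
    ' | sort
}

# Usage: last | crash_sessions_per_day
```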
https://bugzilla.novell.com/show_bug.cgi?id=248448

------- Comment #1 from mkeys@catoosa.k12.ga.us 2007-02-23 14:23 MST -------
This bug also sounds very similar, except I'm not running SCSI drives:
https://bugzilla.novell.com/show_bug.cgi?id=238572

Is it an AMD-specific problem?

I also have SMART running and the status is fine. I also ran a BIOS HD self-check... both the SMART status and the extensive self-check had normal results.
https://bugzilla.novell.com/show_bug.cgi?id=248448

jim@edwardsj7.com changed:

What    |Removed    |Added
----------------------------------------------------------------------------
CC      |           |jim@edwardsj7.com

------- Comment #2 from jim@edwardsj7.com 2007-02-23 15:38 MST -------
Yes, the symptoms are very similar to mine! I think my mouse remains active enough to move the cursor, but the buttons and keyboard are dead. I have no crash messages or warnings in my log files, it just locks up solid. I can recover by using either the reset button or the power button to turn it off. Up until last night I had not suffered any significant data losses, but last night I lost a VPN configuration file :-(

I have to admit that the incidence of lock-ups since I implemented the SCSI driver fix seems to be lower, so maybe there is a more general DMA problem behind both our problems.
https://bugzilla.novell.com/show_bug.cgi?id=248448

------- Comment #3 from mkeys@catoosa.k12.ga.us 2007-02-23 20:53 MST -------
It's interesting that you mentioned I/O intensive processes as a suspect. Squid+DGAV can be quite I/O intensive under heavy loads. I'm also thinking it has something to do with AMD processors. I've got another server running 10.2 that is Intel with an IDE drive as its main and 3 SCSI drives in a RAID-0 configuration. It is set up the same way as far as packages/configuration and it never locks up. It does not serve nearly as many requests per second as this one does though. I'll do the ISO test you were talking about and see what happens.

Today the mouse cursor would move a little, but it was VERY slow and was clipping really badly. It seems to me there's a loop somewhere, where data is just cycling through and not allowing any other processes to jump in.
https://bugzilla.novell.com/show_bug.cgi?id=248448

------- Comment #4 from jim@edwardsj7.com 2007-02-23 21:16 MST -------
I don't know if you see the same, but if I let the system sit locked up for long enough, the HD indicator will go out, but the system remains locked. Don't ask me how long that is, as I've left the room for an hour and not watched the LED.

It is interesting that you see the same LED effect when probably your LED is connected to the MB driver, while mine is connected to the Adaptec SCSI card. I've been meaning to have the LED driven by both but never got around to it; if I had, we could have seen if both were driving (I was going to use a bi-colour LED) :-(

Something that I have noticed causes the lock-up, though one would not have thought it a particularly stressful operation, is using the Opera and Seamonkey browsers. I can't really tell which may be the cause, as I run them both together and it is when I have asked them both to load different pages that it has happened. Of course it could just be coincidence, but it has happened more than once under that condition, so probably not!

The hardware is still perfectly happy running 9.0, and the problem still happens when the "/" drive is IDE, not SCSI. It is certainly an interesting problem.
https://bugzilla.novell.com/show_bug.cgi?id=248448

------- Comment #5 from mkeys@catoosa.k12.ga.us 2007-02-26 07:34 MST -------
I've found it locked up on a Monday morning before. The hard drive light was solid as usual and the system was totally unresponsive. I had to power it off to fix it.

I've temporarily taken the ClamAV plugin out of the picture. We'll see how it goes this week.

Are there any specific logs I should be watching other than /var/log/messages, or a more verbose log setting I should enable to get more information about what is causing the crash?
https://bugzilla.novell.com/show_bug.cgi?id=248448

gregkh@novell.com changed:

What          |Removed    |Added
----------------------------------------------------------------------------
Status        |NEW        |NEEDINFO
Info Provider |           |mkeys@catoosa.k12.ga.us

------- Comment #6 from gregkh@novell.com 2007-02-28 17:33 MST -------
can you run memtest86 and verify that this is not a hardware issue?
https://bugzilla.novell.com/show_bug.cgi?id=248448

------- Comment #7 from mkeys@catoosa.k12.ga.us 2007-03-01 11:45 MST -------
(In reply to comment #6)
> can you run memtest86 and verify that this is not a hardware issue?

Sure thing, I'll let it run over the weekend and post the results on Monday.

FYI - Since I removed ClamAV scanning of the web cache objects it has been running great. It appears Mr. Edwards was correct about the amount of I/O being related to the crash/freeze.
https://bugzilla.novell.com/show_bug.cgi?id=248448

mike_wells@cox.net changed:

What    |Removed    |Added
----------------------------------------------------------------------------
CC      |           |mike_wells@cox.net

------- Comment #8 from mike_wells@cox.net 2007-03-08 09:18 MST -------
I am having the same random system freezes on 2 machines running openSUSE 10.2. Machine 1 is Intel based (Pentium D-830) and running 64 bit. Machine 2 is AMD based (Athlon XP 2800+) running 32 bit. No similarity between the hardware and no similarity between the random freezes except that it takes a hard reset to restore both machines. I have run previous versions of SUSE (9.1, 9.3, 10.1) and never experienced total lockups in those.
https://bugzilla.novell.com/show_bug.cgi?id=248448#c9
Stephan Kulow