kupdated in uninterruptible sleep after kernel update
Hi,

One of our SuSE 8.1 servers has been experiencing problems since the security update for the do_brk() bug. Maybe this should be forwarded to SuSE's kernel team, but since it started after upgrading the kernel I decided to post the report here.

Here goes:

Last Saturday we had to reset the machine. Remote login over ssh was not possible. The load was steadily climbing to over 40, according to a web interface: http://europa.hostingxs.nl/sysinfo/

Some time before the reset the kernel oopsed; the oops is attached below.

Currently the server has one process in uninterruptible sleep:

hensema@europa:~> ps -eo stat,pid,cmd | grep ^D
DW 6 [kupdated]

Therefore the load is stable at 1. The server seems to be running stably for now.

The oops:

Dec 7 00:16:16 europa kernel: invalid operand: 0000 2.4.21-151-default #1 Fri Nov 28 03:16:17 UTC 2003
Dec 7 00:16:16 europa kernel: CPU: 0
Dec 7 00:16:16 europa kernel: EIP: 0010:[ide-cd:__insmod_ide-cd_O/lib/modules/2.4.21-151-default/kernel/dri+-121586616/96] Not tainted
Dec 7 00:16:16 europa kernel: EIP: 0010:[<c1ed3c48>] Not tainted
Dec 7 00:16:16 europa kernel: EFLAGS: 00010282
Dec 7 00:16:16 europa kernel: eax: 00000098 ebx: c1ad5000 ecx: cb617dc4 edx: c1eef956
Dec 7 00:16:16 europa kernel: esi: c39042c0 edi: c1ad5000 ebp: 00000000 esp: cb617dc0
Dec 7 00:16:16 europa kernel: ds: 0018 es: 0018 ss: 0018
Dec 7 00:16:16 europa kernel: Process popper (pid: 17541, stackpage=cb617000)
Dec 7 00:16:16 europa kernel: Stack: c1eef956 c1ef03c0 c1eea520 cb617de0 d7fbb060 c1ec9b4f c1ad5000 c1eea520
Dec 7 00:16:16 europa kernel: d6523de8 d7fbb060 d6523cc0 00254f2c c39042c0 0014f000 d6523cc0 c1ec9c4f
Dec 7 00:16:16 europa kernel: cb617e38 d6523cc0 0014f000 00000000 0014f000 00000000 00000002 00000085
Dec 7 00:16:16 europa kernel: Call Trace: [ide-cd:__insmod_ide-cd_O/lib/modules/2.4.21-151-default/kernel/dri+-121472682/96] (04) [ide-cd:__insmod_ide-cd_O/lib/modules/2.4.21-151-default/kernel/dri+-121470016/96] (04) [ide-cd:__insmod_ide-cd_O/lib/modules/2.4.21-151-default/kernel/dri+-121494240/96] (12)
Dec 7 00:16:16 europa kernel: Call Trace: [<c1eef956>] (04) [<c1ef03c0>] (04) [<c1eea520>] (12)
Dec 7 00:16:16 europa kernel: [ide-cd:__insmod_ide-cd_O/lib/modules/2.4.21-151-default/kernel/dri+-121627825/96] (08) [ide-cd:__insmod_ide-cd_O/lib/modules/2.4.21-151-default/kernel/dri+-121494240/96] (32) [ide-cd:__insmod_ide-cd_O/lib/modules/2.4.21-151-default/kernel/dri+-121627569/96] (52) [ide-cd:__insmod_ide-cd_O/lib/modules/2.4.21-151-default/kernel/dri+-121522088/96] (120 )
Dec 7 00:16:16 europa kernel: [<c1ec9b4f>] (08) [<c1eea520>] (32) [<c1ec9c4f>] (52) [<c1ee3858>] (120)
Dec 7 00:16:16 europa kernel: [ide-cd:__insmod_ide-cd_O/lib/modules/2.4.21-151-default/kernel/dri+-121618842/96] (88) [generic_file_write_nolock+495/1024] (80) [generic_file_write+287/320](44) [ide-cd:__insmod_ide-cd_O/lib/modules/2.4.21-151-default/kernel/dri+-121614846/96] (32)
Dec 7 00:16:16 europa kernel: [<c1ecbe66>] (88) [<c013773f>] (80) [<c0137bdf>] (44) [<c1ecce02>] (32)
Dec 7 00:16:16 europa kernel: [sys_write+133/256] (36) [system_call+51/64] (60)
Dec 7 00:16:16 europa kernel: [<c01463f5>] (36) [<c0109073>] (60)
Dec 7 00:16:16 europa kernel: Modules: [(reiserfs:<c1ec0060>:<c1ef1d74>)]
Dec 7 00:16:16 europa kernel: Code: 0f 0b 4e 01 5c f9 ee c1 85 db 68 c0 03 ef c1 74 17 66 8b 43

In the mail log:

Dec 6 00:16:16 europa popper[30399]: apop "jjd" [pop_apop.c:214]
Dec 6 00:16:16 europa popper[30399]: jjd at [hostname] ([ip]): -ERR [SYS/TEMP] POP authentication DB not available (user jjd): No such file or directory (2) [pop_apop.c:249]

The machine was reset around 00:40 or 00:50. Before the update (Friday) it had an uptime of 100 days. It only went down because it had to be moved to another rack.

All filesystems are reiser. Unfortunately the machine wasn't installed as a backup server / dns server, and it wasn't prepared for use by customers. To be able to use quota I was forced to create two filesystems (document root and mail spool) on loopback filesystems.

The partitions reside on a 3ware IDE raid controller. I have no clue whatsoever why it would attempt to access the cdrom drive. I'm not even sure if the machine has one, and it certainly should not be mounted.

-- Erik Hensema (erik@hensema.net)
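On the CD-ROM question, one possible reading of the oops is sketched below. It is not a diagnosis. The "Modules:" line lists two hex values for reiserfs; assuming these are the start and end of the module's address range (an assumption, not something the report states), the raw addresses from the trace can be checked against that range. If most of them fall inside it, the ide-cd labels with their negative offsets would be a symbol-lookup artifact rather than evidence of CD-ROM access.

```python
# Raw addresses copied from the oops above.
EIP = 0xC1ED3C48
TRACE = [
    0xC1EEF956, 0xC1EF03C0, 0xC1EEA520,
    0xC1EC9B4F, 0xC1EEA520, 0xC1EC9C4F, 0xC1EE3858,
    0xC1ECBE66, 0xC013773F, 0xC0137BDF, 0xC1ECCE02,
    0xC01463F5, 0xC0109073,
]

# Assumption: the two values on the "Modules:" line are the first and last
# address of the loaded reiserfs module.
REISERFS_START = 0xC1EC0060
REISERFS_END = 0xC1EF1D74


def in_reiserfs(addr: int) -> bool:
    """True if addr lies within the assumed reiserfs address range."""
    return REISERFS_START <= addr <= REISERFS_END


def classify(addresses):
    """Split addresses into (inside reiserfs range, outside it)."""
    inside = [a for a in addresses if in_reiserfs(a)]
    outside = [a for a in addresses if not in_reiserfs(a)]
    return inside, outside


if __name__ == "__main__":
    inside, outside = classify([EIP] + TRACE)
    print("inside assumed reiserfs range:", [hex(a) for a in inside])
    print("outside:", [hex(a) for a in outside])
```

Under that assumption, the EIP and every trace entry that was labelled ide-cd fall inside the reiserfs range, while the entries outside it are the ones already resolved to generic_file_write_nolock, generic_file_write, sys_write and system_call.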
> Last Saturday we had to reset the machine. Remote login over ssh was not possible. The load was steadily climbing to over 40, according to a web interface: http://europa.hostingxs.nl/sysinfo/
> All filesystems are reiser. Unfortunately the machine wasn't installed as a
Reiser is fast and loses its data fast as well if there are any disk problems. Better use ext3! This is a mailserver, isn't it? If so, reliability goes before speed -> no reiser!
> backup server / dns server, and it wasn't prepared for use by customers. To be able to use quota I was forced to create two filesystems (document root and mail spool) on loopback filesystem.
Why don't you use postfix? There is a kind of "quota option" within the mailbox settings in /etc/postfix/main.cf, e.g.:
mailbox_size_limit = 51200000
message_size_limit = 10240000
for a 50 MB mailbox and a max. 10 MB message size (attachments included); you can change this to your liking.
Or with ext3, rewrite /etc/fstab, e.g. (note the usrquota part):
/dev/sda6 /home ext3 defaults,usrquota 1 2
and activate your quotas after a reboot. Reiser doesn't support quotas (as far as I know) and that's why I don't use it either.
Next thing: /var/spool/mail -> only 1 GB. Is that enough if you get more accounts? (Better use usrquota or the postfix option above; sendmail should have such a feature as well.)
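As a worked check of the numbers above: 51200000 is 50 x 1024 x 1000 bytes, which is neither 50 decimal megabytes nor 50 MiB. Both values are also the Postfix defaults for these two parameters. Postfix requires that a non-zero mailbox_size_limit is not smaller than message_size_limit, and it caps the size of an individual mailbox file rather than implementing filesystem quotas. The helper names below are hypothetical and only illustrate the arithmetic and that one constraint.

```python
def mib(n: int) -> int:
    """Bytes in n MiB (binary megabytes)."""
    return n * 1024 * 1024


def limits_consistent(mailbox_size_limit: int, message_size_limit: int) -> bool:
    """Check the documented Postfix constraint between the two limits.

    A mailbox_size_limit of 0 means "no limit"; otherwise it must not be
    smaller than message_size_limit.
    """
    if mailbox_size_limit == 0:
        return True
    return mailbox_size_limit >= message_size_limit


def main_cf_lines(mailbox_size_limit: int, message_size_limit: int) -> list[str]:
    """Render the two main.cf lines, refusing inconsistent values."""
    if not limits_consistent(mailbox_size_limit, message_size_limit):
        raise ValueError("mailbox_size_limit is smaller than message_size_limit")
    return [
        f"mailbox_size_limit = {mailbox_size_limit}",
        f"message_size_limit = {message_size_limit}",
    ]


if __name__ == "__main__":
    # The values quoted in the message above.
    for line in main_cf_lines(51200000, 10240000):
        print(line)
    # An exact 50 MiB / 10 MiB pair, for comparison.
    for line in main_cf_lines(mib(50), mib(10)):
        print(line)
```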
> The partitions reside on a 3ware IDE raid controller. I have no clue whatsoever why it would attempt to access the cdrom drive. I'm not even sure if the machine has one, and it certainly should not be mounted.
From phpsysinfo:
IDE Devices hdb: (none) (Capacity: 0.00 GB )
Do you have a CD-ROM? I think so!
Maybe have a read here:
http://www.ussg.iu.edu/hypermail/linux/kernel/0202.2/0429.html
http://mail.nl.linux.org/linux-mm/2001-11/msg00050.html
Several users have had problems with kupdated. This has something to do with the kernel driver for your hard disk controller. There were sometimes kernel problems with RAID drivers, like SuSE 8.2 with Promise, or was it 8.1? Maybe something has to be recompiled.
Philippe
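For keeping an eye on a stuck kupdated, the ps check from the original report can be turned into a small parser. This is a minimal sketch that assumes the three-column `ps -eo stat,pid,cmd` output format used in the report; the function name and the sample text are illustrative only. Linux counts tasks in uninterruptible sleep toward the load average, which fits the reported stable load of 1.

```python
def uninterruptible(ps_output: str) -> list[tuple[int, str, str]]:
    """Return (pid, stat, cmd) for every process whose state starts with 'D'.

    Expects text in the layout produced by `ps -eo stat,pid,cmd`.
    The header line and malformed lines are skipped.
    """
    found = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)  # stat, pid, rest of line as cmd
        if len(parts) < 3 or not parts[1].isdigit():
            continue
        stat, pid, cmd = parts
        if stat.startswith("D"):
            found.append((int(pid), stat, cmd.strip()))
    return found


# Illustrative sample; only the kupdated line is taken from the report.
SAMPLE = """STAT   PID CMD
S        1 init
DW       6 [kupdated]
S      412 /usr/sbin/sshd
"""

if __name__ == "__main__":
    for pid, stat, cmd in uninterruptible(SAMPLE):
        print(pid, stat, cmd)
```

To run it against a live system, the output of `ps -eo stat,pid,cmd` could be captured with `subprocess.run(..., capture_output=True, text=True)` and passed to the function.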
On Tue, Dec 09, 2003 at 01:39:35AM +0100, Philippe Vogel wrote:
>> Last Saturday we had to reset the machine. Remote login over ssh was not possible. The load was steadily climbing to over 40, according to a web interface: http://europa.hostingxs.nl/sysinfo/
>> All filesystems are reiser. Unfortunately the machine wasn't installed as a
> Reiser is fast and loses its data fast as well if there are any disk problems. Better use ext3!
We're using mirroring raid. No problems according to the controller software. Never had any problems with reiser either, and we're using it on all servers.
> This is a mailserver, isn't it? If so, reliability goes before speed -> no reiser!
>> backup server / dns server, and it wasn't prepared for use by customers. To be able to use quota I was forced to create two filesystems (document root and mail spool) on loopback filesystem.
> Why don't you use postfix? There is a kind of "quota option" within the mailbox settings in /etc/postfix/main.cf, e.g.:
That would not solve any kernel problems, now would it? ;-)
[...]
> activate your quotas after a reboot. Reiser doesn't support quotas (as far as I know) and that's why I don't use it either.
Reiser does support quota.
> Next thing: /var/spool/mail -> only 1 GB. Is that enough if you get more accounts? (Better use usrquota or the postfix option above; sendmail should have such a feature as well.)
It's a temporary solution; we're going to move all sites off this server as soon as we have upgraded an old server to SuSE 8.1.
[...]
> Maybe have a read here:
> http://www.ussg.iu.edu/hypermail/linux/kernel/0202.2/0429.html
> http://mail.nl.linux.org/linux-mm/2001-11/msg00050.html
> Several users have had problems with kupdated. This has something to do with the kernel driver for your hard disk controller. There were sometimes kernel problems with RAID drivers, like SuSE 8.2 with Promise, or was it 8.1?
We've had severe problems with Promise under Red Hat 7.3. Never again, it's rubbish.
Anyway, thanks for your input, especially the mailing list references.
-- Erik Hensema (erik@hensema.net)
> Reiser is fast and loses its data fast as well if there are any disk problems. Better use ext3!
Sounds more like FUD, or can you substantiate that? I had a disk go bad recently; I copied the partition with dd_rescue (which rocks) and used reiserfsck to fix up a few things. If I lost any files they weren't important, but a bunch of files had null blocks in them. The latter is obviously unavoidable with any fs.
> Reiser doesn't support quotas (as far as I know) and that's why I don't use it either.
You don't know a hell of a lot about reiserfs. It's had quotas since, uhhm, SuSE 8.0 (or was that ACLs, with quotas even earlier?). Other distros were much slower to catch up, but that's a problem with those distros, not reiserfs.
Volker
-- Volker Kuhlmann is possibly list0570 with the domain in header http://volker.dnsalias.net/ Please do not CC list postings to me.
Hi,
>> Reiser is fast and loses its data fast as well if there are any disk problems. Better use ext3!
> Sounds more like FUD, or can you substantiate that?
Although this is not the "suse-filesystem" mailing list ... ;)
I also stopped using ReiserFS on our servers because of bad experiences. ReiserFS is very fast, but on two occasions (power failure with no UPS) we had heavily corrupted file systems (with data loss); the server was running SuSE 7.0 and 7.2. Another corrupted ReiserFS caused trouble on our mailserver (no power failure); we still don't know why the filesystem was corrupted.
On my home system I suddenly noticed corrupt files. There was no power outage or problem with my hardware, and a reiserfs check with tree rebuild fixed the errors.
I switched over to ext3 from ReiserFS when I migrated our production servers from SuSE to Debian, and I'm using ext3 on my SuSE workstation and home PC. I never regretted that change (yes, we had bad events with power failures or admins who removed the wrong cable in the rack).
But there are also commercial, high performance server installations out there that use ReiserFS, so your mileage may vary.
peace Tom
-- this is a maillist account, so please send personal replies to cso[at]trium[dot]de
So far I have only seen Reiser file system corruption on a system with bad RAM. You can't blame Reiser for that. I have, however, always been very careful about the kernel that I am running with Reiser - there are certainly troublesome kernel versions out there as far as Reiser is concerned. I did recently suffer file system corruption on an ext3 partition (with data loss).
That said, I did switch most servers to XFS over the past year (had been running Reiser since early 2000). The main reason is performance for large files on large file systems (over 1 TB). XFS's long history as a reliable file system also helps make me feel comfortable about this choice.
Ferdinand
--On Tuesday, December 09, 2003 04:24:25 PM +0100 Thomas Seliger <CRJLJAKTJORB@spammotel.com> wrote:
> I also stopped using ReiserFS on our servers because of bad experiences. ReiserFS is very fast, but on two occasions (power failure with no UPS) we had heavily corrupted file systems (with data loss); the server was running SuSE 7.0 and 7.2. Another corrupted ReiserFS caused trouble on our mailserver (no power failure); we still don't know why the filesystem was corrupted.
> On my home system I suddenly noticed corrupt files. There was no power outage or problem with my hardware, and a reiserfs check with tree rebuild fixed the errors.
> I switched over to ext3 from ReiserFS when I migrated our production servers from SuSE to Debian, and I'm using ext3 on my SuSE workstation and home PC. I never regretted that change (yes, we had bad events with power failures or admins who removed the wrong cable in the rack).
> But there are also commercial, high performance server installations out there that use ReiserFS, so your mileage may vary.
> peace Tom
> -- this is a maillist account, so please send personal replies to cso[at]trium[dot]de
-- Check the headers for your unsubscription address For additional commands, e-mail: suse-security-help@suse.com Security-related bug reports go to security@suse.de, not here
-- Ferdinand Schmid Architectural Energy Corporation Celebrating over 20 Years of Improving Building Energy Performance http://www.archenergy.com
(OT)
* Ferdinand Schmid wrote on Sun, Dec 14, 2003 at 21:31 -0700:
> So far I have only seen Reiser file system corruption on a system with bad RAM.
Or a bad IDE cable, a controller problem, a CPU overheating problem, maybe because of a dead fan, ... Such things happen, often surprisingly.
> You can't blame Reiser for that. I have, however, always been very careful about the kernel that I am running with Reiser - there are certainly troublesome kernel versions out there as far as Reiser is concerned.
:-) What you describe sounds a little like: "it is not reliable out of the box, but there is a chance to get it running for a while". :-)
> I did recently suffer file system corruption on an ext3 partition (with data loss).
I think ext3 is much more tolerant of e.g. bit-flip errors and such. I think that is important. Maybe in a future filesystem the redundancy level could be a tunable. It should be more efficient to do this at the file system level, such as mirroring the directory structure but not its contents. But you are right, you cannot blame Reiser for that. Still, it is everyone's personal decision whether he believes that he will never get a memory failure or a "partial" IDE cable failure or the like in one of his servers in the future.
oki, Steffen
-- This letter was generated automatically and therefore bears neither signature nor seal.
On Tue, Dec 09, 2003 at 12:10:13AM +0100, Erik Hensema wrote:
> One of our SuSE 8.1 servers is experiencing problems since the security update for the do_brk() bug. Maybe this should be forwarded to SuSE's kernel team, but since it started after upgrading the kernel I decided to post the report here.
Update:
It seems that this machine has been upgraded to a SUSE 9.0 kernel. My other SuSE 8.1 machines are also running 2.4.21-151-default. They were upgraded by YOU.
My 8.2 machines, also upgraded by YOU, are running 2.4.20-4GB-athlon.
Did someone copy the wrong rpm to the 8.1 update directory?
-- Erik Hensema (erik@hensema.net)
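A check like the one described in this update can be sketched by comparing the base version of a kernel release string against what a given SuSE release is expected to run. The expected bases below are assumptions: 2.4.20 for 8.2 comes from the release string quoted above, and 2.4.19 for 8.1 is inferred from a later reply in this thread. The mapping and function names are hypothetical.

```python
import re

# Assumed base kernel versions per SuSE release (see lead-in; not authoritative).
EXPECTED_BASE = {
    "8.1": (2, 4, 19),
    "8.2": (2, 4, 20),
}


def parse_release(release: str) -> tuple[tuple[int, int, int], str]:
    """Split a string like '2.4.21-151-default' into ((2, 4, 21), '151-default')."""
    m = re.match(r"^(\d+)\.(\d+)\.(\d+)-(.+)$", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch, rest = m.groups()
    return (int(major), int(minor), int(patch)), rest


def unexpected_kernel(suse_release: str, kernel_release: str) -> bool:
    """True if the kernel's base version differs from the assumed one."""
    base, _ = parse_release(kernel_release)
    return base != EXPECTED_BASE[suse_release]


if __name__ == "__main__":
    # Release strings as reported above, e.g. from `uname -r`.
    print(unexpected_kernel("8.1", "2.4.21-151-default"))
    print(unexpected_kernel("8.2", "2.4.20-4GB-athlon"))
```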
In message <20031209104933.GA1246@bender.home.hensema.net>, Erik Hensema <erik@hensema.net> writes
> On Tue, Dec 09, 2003 at 12:10:13AM +0100, Erik Hensema wrote:
>> One of our SuSE 8.1 servers is experiencing problems since the security update for the do_brk() bug. Maybe this should be forwarded to SuSE's kernel team, but since it started after upgrading the kernel I decided to post the report here.
> Update:
> It seems that this machine has been upgraded to a SUSE 9.0 kernel. My other SuSE 8.1 machines are also running 2.4.21-151-default. They were upgraded by YOU.
> My 8.2 machines, also upgraded by YOU, are running 2.4.20-4GB-athlon.
> Did someone copy the wrong rpm to the 8.1 update directory?
Yes, this seems to have broken one or two things round here. It didn't cross my mind that it was a mistake; I sort of assumed that 2.4.19 couldn't be updated satisfactorily. If it was a file copying error, then that would explain why they didn't check various dependencies! I hope that, if it was an error, SuSE will admit it soon and give us a suitable 8.1 kernel update before I try to recompile various other things and break yet more things!
-- Roger Hayter
participants (7)
- Erik Hensema
- Ferdinand Schmid
- Philippe Vogel
- Roger Hayter
- Steffen Dettmer
- Thomas Seliger
- Volker Kuhlmann