[opensuse-security] Re: [security-announce] SUSE Security Announcement: Linux kernel (SUSE-SA:2007:064)
Hi,

Marcus Meissner wrote:
______________________________________________________________________________
SUSE Security Announcement
Package: kernel
Announcement ID: SUSE-SA:2007:064
Date: Tue, 04 Dec 2007 11:00:00 +0000
Affected Products: SUSE LINUX 10.1
                   SUSE Linux Enterprise Desktop 10 SP1
                   SUSE Linux Enterprise 10 SP1 DEBUGINFO
                   SLE SDK 10 SP1
                   SUSE Linux Enterprise Server 10 SP1
Four hours after I booted our NFS server and our clients with the new kernel, lockd and/or nfsd crashed on the NFS server and I had to reboot. I know our kernel is tainted, so I won't get official support; we will run a test with the original SuSE kernel.

Anyway, it's only tainted by the fglrx and nvidia 3D modules, and we've been adding those to every kernel since SuSE 10.1 and SLES10 were released, so I doubt that they are at fault. From October 4th on, we had been running the 2.6.16.53-0.16 kernel code (with fglrx and nvidia) without any problems, so it must be one of the patches between 2.6.16.53 and 2.6.16.54. It looks like something has changed in the lockd or nfsd code.

Is anyone else seeing something similar?

cu,
Frank

Oops: 0002 [1] SMP
last sysfs file: /class/scsi_host/host4/proc_name
CPU 0
Modules linked in: nfsd exportfs af_packet cpufreq_ondemand cpufreq_userspace cpufreq_powersave powernow_k8 freq_table eeprom adm1026 hwmon_vid hwmon i2c_isa edd mptctl button battery ac sr_mod cdrom dm_mod usbhid usb_storage shpchp i2c_nforce2 forcedeth ohci_hcd ehci_hcd usbcore i2c_core pci_hotplug fan thermal processor qla2xxx firmware_class scsi_transport_fc amd74xx sg mptsas scsi_transport_sas mptscsih mptbase ide_disk ide_core
Pid: 6403, comm: lockd Tainted: G U 2.6.16-10suse-bio-smp #1
RIP: 0010:[<ffffffff8019414e>] <ffffffff8019414e>{__locks_delete_block+43}
RSP: 0018:ffff810116fdbe00 EFLAGS: 00010286
RAX: ffff81013cfab508 RBX: ffff81013cfab500 RCX: ffffffff80421780
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff81013cfab500
RBP: ffff81011bfe1810 R08: ffff810116fbb054 R09: 0000000000000005
R10: 0000000000000000 R11: ffff81011a893800 R12: ffff810149d5f860
R13: 0000000000000001 R14: ffff8101168a54f0 R15: 0000000000000000
FS: 00002b65f94dc6d0(0000) GS:ffffffff804a9000(0000) knlGS:00000000eb99fba0
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 000000021b5e0000 CR4: 00000000000006e0
Process lockd (pid: 6403, threadinfo ffff810116fda000, task ffff81011b088100)
Stack: ffffffff801947c4 ffff810149d5f860 ffff81011bfe1810 ffff8101199a08d0
       ffffffff8019492b ffff81011bfe1810 ffffffff80194d46 ffff81011bfe1310
       ffff8101199a0828 ffff8100d24093c0
Call Trace: <ffffffff801947c4>{locks_wake_up_blocks+27}
       <ffffffff8019492b>{locks_delete_lock+126} <ffffffff80194d46>{__posix_lock_file+398}
       <ffffffff80233f4f>{nlmsvc_unlock+125} <ffffffff802382c5>{nlm4svc_proc_unlock+111}
       <ffffffff8037298c>{svc_process+838} <ffffffff8023365a>{lockd+0}
       <ffffffff802337f9>{lockd+415} <ffffffff8010bb8a>{child_rip+8}
       <ffffffff8023365a>{lockd+0} <ffffffff8023365a>{lockd+0}
       <ffffffff8010bb82>{child_rip+0}

Code: 48 89 0a 48 89 40 08 48 89 47 08 48 c7 07 00 00 00 00 c3 48
RIP <ffffffff8019414e>{__locks_delete_block+43} RSP <ffff810116fdbe00>
CR2: 0000000000000000
<0>Bad page state in process 'nfsd'
page:ffff81021ce5c6a8 flags:0x0600000000000000 mapping:0000000000000000 mapcount:-32510 count:469637161
Trying to fix it up, but a reboot is needed
Backtrace:
Call Trace: <ffffffff80161fd5>{bad_page+80} <ffffffff80163035>{get_page_from_freelist+740}
       <ffffffff80163181>{__alloc_pages+101} <ffffffff80375042>{svc_recv+250}
       <ffffffff8012aed7>{default_wake_function+0} <ffffffff80382504>{__down_read+18}
       <ffffffff8824b47e>{:nfsd:nfsd+0} <ffffffff8824b586>{:nfsd:nfsd+264}
       <ffffffff8010bb8a>{child_rip+8} <ffffffff8824b47e>{:nfsd:nfsd+0}
       <ffffffff8824b47e>{:nfsd:nfsd+0} <ffffffff8010bb82>{child_rip+0}
nfsd: last server has exited
nfsd: unexporting all filesystems

--
Dipl.-Inform. Frank Steiner     Web: http://www.bio.ifi.lmu.de/~steiner/
Lehrstuhl f. Bioinformatik      Mail: http://www.bio.ifi.lmu.de/~steiner/m/
LMU, Amalienstr. 17             Phone: +49 89 2180-4049
80333 Muenchen, Germany         Fax: +49 89 2180-99-4049
* You can only understand recursion once you have understood recursion. *

---------------------------------------------------------------------
To unsubscribe, e-mail: opensuse-security+unsubscribe@opensuse.org
For additional commands, e-mail: opensuse-security+help@opensuse.org
Something definitely happens when NFS clients are rebooted from the 2.6.16.53 kernel into 2.6.16.54 while the server is already running 2.6.16.54. This morning the server was already running the original, untainted SLES10 kernel, and I rebooted 5 clients that were still running 2.6.16.53. They all came up, but a minute later the NFS server crashed so hard that I couldn't even see a console log anymore and had to hard-reset it. The server didn't come up again (it stopped at "LIL"), so I have to boot a rescue system now.

Today I will check whether it also happens when rebooting clients that are already running the new kernel, or whether it is just an incompatibility between the old and new kernels.

cu,
Frank
On Tue, Dec 04, 2007 at 03:11:32PM +0100, Frank Steiner wrote:
Is anyone else seeing something similar?
So far we have not heard about it.

"2.6.16-10suse-bio-smp": you even recompiled your kernel, right?

Does this happen when you do "rcnfsserver stop"? It's not clear where this might happen.

Ciao, Marcus
Marcus Meissner wrote:
So far we have not heard about it.
It might be related to root-over-NFS, because I can reliably crash the NFS server by rebooting our root-over-NFS clients. Rebooting other clients that only mount things like /home does not crash the server. Maybe it's some special kind of locking that occurs when directories like /var are on NFS...
"2.6.16-10suse-bio-smp", you even recompiled your kernel, right?
Yes, that's because we use the same kernel for the server and the diskless clients and compile the network drivers into the kernel, so that the diskless clients can do root-over-NFS (I guess it should work with an initrd, too, but we haven't tried that so far).
Does this happen when you do "rcnfsserver stop"?
No, it happens only after booting a client, and not every time: I rebooted three clients in parallel over and over, and after about 2-3 reboots the server crashes. Now this also happens when both server and clients are running the .54 kernel, so it's not just an upgrade issue from .53 to .54.

I couldn't get further logs because the crashes are now so hard that nothing is written to the log anymore. I'm trying to set up syslog via ttyS0 to capture them. In case it helps in the meantime, here is the ksymoops output for the first crash with the fglrx/nvidia-tainted kernel. Maybe someone can get some information out of it.

cu,
Frank

ksymoops 2.4.11 on x86_64 2.6.16-10suse-bio-smp. Options used
 -V (default)
 -k /proc/kallsyms (default)
 -l /proc/modules (default)
 -o /lib/modules/2.6.16-10suse-bio-smp/ (default)
 -m /boot/System.map-2.6.16-10suse-bio-smp (default)

Warning: You did not tell me where to find symbol information. I will assume that the log matches the kernel and modules that are running right now and I'll use the default options above for symbol resolution. If the current kernel and/or modules do not match the log, you can get more accurate output by telling me the kernel version and where to find map, modules, ksyms etc. ksymoops -h explains the options.

Warning (read_ksyms): no kernel symbols in ksyms, is /proc/kallsyms a valid ksyms file?
No modules in ksyms, skipping objects
No ksyms, skipping lsmod
Dec 4 14:26:35 backus kernel: CPU 0
Dec 4 14:26:35 backus kernel: Pid: 6403, comm: lockd Tainted: G U 2.6.16-10suse-bio-smp #1
Dec 4 14:26:35 backus kernel: RIP: 0010:[<ffffffff8019414e>] <ffffffff8019414e>{__locks_delete_block+43}
Using defaults from ksymoops -t elf64-x86-64 -a i386:x86-64
Dec 4 14:26:35 backus kernel: RSP: 0018:ffff810116fdbe00 EFLAGS: 00010286
Dec 4 14:26:35 backus kernel: RAX: ffff81013cfab508 RBX: ffff81013cfab500 RCX: ffffffff80421780
Dec 4 14:26:35 backus kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff81013cfab500
Dec 4 14:26:35 backus kernel: RBP: ffff81011bfe1810 R08: ffff810116fbb054 R09: 0000000000000005
Dec 4 14:26:35 backus kernel: R10: 0000000000000000 R11: ffff81011a893800 R12: ffff810149d5f860
Dec 4 14:26:35 backus kernel: R13: 0000000000000001 R14: ffff8101168a54f0 R15: 0000000000000000
Dec 4 14:26:35 backus kernel: FS: 00002b65f94dc6d0(0000) GS:ffffffff804a9000(0000) knlGS:00000000eb99fba0
Dec 4 14:26:35 backus kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Dec 4 14:26:35 backus kernel: CR2: 0000000000000000 CR3: 000000021b5e0000 CR4: 00000000000006e0
Dec 4 14:26:35 backus kernel: Stack: ffffffff801947c4 ffff810149d5f860 ffff81011bfe1810 ffff8101199a08d0
Dec 4 14:26:35 backus kernel: ffffffff8019492b ffff81011bfe1810 ffffffff80194d46 ffff81011bfe1310
Dec 4 14:26:35 backus kernel: ffff8101199a0828 ffff8100d24093c0
Dec 4 14:26:35 backus kernel: Call Trace: <ffffffff801947c4>{locks_wake_up_blocks+27}
Dec 4 14:26:35 backus kernel: <ffffffff8019492b>{locks_delete_lock+126} <ffffffff80194d46>{__posix_lock_file+398}
Dec 4 14:26:35 backus kernel: <ffffffff80233f4f>{nlmsvc_unlock+125} <ffffffff802382c5>{nlm4svc_proc_unlock+111}
Dec 4 14:26:35 backus kernel: <ffffffff8037298c>{svc_process+838} <ffffffff8023365a>{lockd+0}
Dec 4 14:26:35 backus kernel: <ffffffff802337f9>{lockd+415} <ffffffff8010bb8a>{child_rip+8}
Dec 4 14:26:35 backus kernel: <ffffffff8023365a>{lockd+0} <ffffffff8023365a>{lockd+0}
Dec 4 14:26:35 backus kernel: <ffffffff8010bb82>{child_rip+0}
Dec 4 14:26:35 backus kernel: Code: 48 89 0a 48 89 40 08 48 89 47 08 48 c7 07 00 00 00 00 c3 48
RIP; ffffffff8019414e <__locks_delete_block+2b/3e> <=====
RAX; ffff81013cfab508
RBX; ffff81013cfab500
RCX; ffffffff80421780
RDI; ffff81013cfab500
RBP; ffff81011bfe1810
R08; ffff810116fbb054
R11; ffff81011a893800
R12; ffff810149d5f860
R14; ffff8101168a54f0
Trace; ffffffff801947c4