Hi,

I am having trouble with a dual-Opteron server: Tyan Thunder K8S Pro motherboard, 2x Opteron 246 CPUs, 4 GB of memory, SuSE 9.0 AMD64, kernel 2.4.21-178-smp. I have not yet tested the latest kernel.

Occasionally (once every two weeks or so) the machine deadlocks. I am not sure whether this is a hardware problem or a kernel problem, since the power supply died once.

Any suggestions? The ksymoops output is attached.

Thanks,
Andrej

--
_____________________________________________________________
doc. dr. Andrej Filipcic, E-mail: Andrej.Filipcic@ijs.si
Department of Experimental High Energy Physics - F9
Jozef Stefan Institute, Jamova 39, P.o.Box 3000
SI-1001 Ljubljana, Slovenia
Tel.: +386-1-477-3674 Fax: +386-1-425-7074
-------------------------------------------------------------

ksymoops 2.4.9 on x86_64 2.4.21-178-smp. Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.21-178-smp/ (default)
     -m /boot/System.map-2.4.21-178-smp (default)

Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.

Unable to handle kernel paging request at virtual address 000000121c9322c8
ffffffff801845b8
PML4 f65dd067 PGD 0
Oops: 0000
CPU 0
Pid: 26374, comm: top Tainted: P
RIP: 0010:[<ffffffff801845b8>]{collect_sigign_sigcatch+56}
Using defaults from ksymoops -t elf64-x86-64 -a i386:x86-64
RSP: 0018:000001002f699d80 EFLAGS: 00010006
RAX: 000000121c9322c0 RBX: 00000100a7294000 RCX: 0000000000000000
RDX: 000001002f699ed8 RSI: 000001002f699ed0 RDI: 00000100a7294000
RBP: 00000100570f45b8 R08: 000000121c9322c8 R09: 0000000000000001
R10: 0000000000000001 R11: 0000000000000000 R12: 00000100570f4580
R13: 00000000003db000 R14: 0000000000000000 R15: 000000000051a780
FS: 000000000050eee0(0000) GS:ffffffff804bbf00(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000121c9322c8 CR3: 0000000000101000 CR4: 00000000000006e0
Process top (pid: 26374, stackpage=1002f699000)
Stack: 000001002f699d80 0000000000000018 ffffffff80184bf7 00000100a7294000
       ffffffff8016f3e0 0000000000000246 ffffffff80182edb 000001007f90b000
       0000010080013020 ffffffff803d3480 fffffffffffffff4 00000100656344c0
Call Trace:
       [<ffffffff80184bf7>]{proc_pid_stat+407}
       [<ffffffff8016f3e0>]{new_inode+16}
       [<ffffffff80182edb>]{proc_pid_make_inode+235}
       [<ffffffff80183343>]{proc_base_lookup+563}
       [<ffffffff80161ae7>]{real_lookup+167}
       [<ffffffff8016c5d1>]{dput+33}
       [<ffffffff80166129>]{__link_path_walk+2425}
       [<ffffffff80182491>]{proc_info_read+113}
       [<ffffffff8015466f>]{sys_read+191}
       [<ffffffff80110143>]{system_call+119}
Code: 49 8b 00 48 83 f8 01 75 0f 41 8d 49 ff 48 d3 e0 48 09 06 eb
RIP; ffffffff801845b8 <collect_sigign_sigcatch+38/80> <=====
Trace; ffffffff80184bf7 <proc_pid_stat+197/600>
Trace; ffffffff8016f3e0 <new_inode+10/80>
Trace; ffffffff80183343 <proc_base_lookup+233/250>
Trace; ffffffff8016c5d1 <dput+21/160>
Trace; ffffffff80182491 <proc_info_read+71/100>
Trace; ffffffff80110143 <system_call+77/7c>

Code; ffffffff801845b8 <collect_sigign_sigcatch+38/80>
0000000000000000 <_RIP>:
Code; ffffffff801845b8 <collect_sigign_sigcatch+38/80> <=====
   0: 49 8b 00       mov (%r8),%rax <=====
Code; ffffffff801845bb <collect_sigign_sigcatch+3b/80>
   3: 48 83 f8 01    cmp $0x1,%rax
Code; ffffffff801845bf <collect_sigign_sigcatch+3f/80>
   7: 75 0f          jne 18 <_RIP+0x18>
Code; ffffffff801845c1 <collect_sigign_sigcatch+41/80>
   9: 41 8d 49 ff    lea 0xffffffffffffffff(%r9),%ecx
Code; ffffffff801845c5 <collect_sigign_sigcatch+45/80>
   d: 48 d3 e0       shl %cl,%rax
Code; ffffffff801845c8 <collect_sigign_sigcatch+48/80>
  10: 48 09 06       or %rax,(%rsi)
Code; ffffffff801845cb <collect_sigign_sigcatch+4b/80>
  13: eb 00          jmp 15 <_RIP+0x15>

NMI Watchdog detected LOCKUP on CPU0, eip ffffffff80185d02, registers:
CPU 0
Pid: 26375, comm: top Tainted: P
RIP: 0010:[<ffffffff80185d02>]{.text.lock.array+7}
RSP: 0018:000001007680dd80 EFLAGS: 00000086
RAX: 0000000000000000 RBX: 00000100a7294000 RCX: 0000000000000000
RDX: 000001007680ded8 RSI: 000001007680ded0 RDI: 00000100a7294000
RBP: 00000100570f45b8 R08: 0000010080000638 R09: 00000100828233d8
R10: 0000000000000001 R11: 0000000000000000 R12: 00000100570f4580
R13: 00000000003db000 R14: 0000000000000000 R15: 000000000051a780
FS: 00000000005260a0(0000) GS:ffffffff804bbf00(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000002a9556c000 CR3: 0000000000101000 CR4: 00000000000006e0
Process top (pid: 26375, stackpage=1007680d000)
Stack: 000001007680dd80 0000000000000018 0000000000100000 ffffffff80117040
       ffffffff80117130 0000000000000001 0000000000000000 0000000000000000
       ffffffff803ca8e0 0000000000000000 0000000000000000 0000000000000000
Call Trace: <EOE>
       [<ffffffff80184bf7>]{proc_pid_stat+407}
       [<ffffffff802c8f98>]{sprintf+136}
       [<ffffffff80183343>]{proc_base_lookup+563}
       [<ffffffff8016c5d1>]{dput+33}
       [<ffffffff80166129>]{__link_path_walk+2425}
       [<ffffffff80182491>]{proc_info_read+113}
       [<ffffffff8015466f>]{sys_read+191}
       [<ffffffff80110143>]{system_call+119}
Code: f3 90 7e f5 e9 84 e8 ff ff e8 70 1b 14 00 e9 80 e9 ff ff 41
RIP; ffffffff80185d02 <.text.lock.array+7/105> <=====
Trace; ffffffff802c8f98 <sprintf+88/90>
Trace; ffffffff8016c5d1 <dput+21/160>
Trace; ffffffff80182491 <proc_info_read+71/100>
Trace; ffffffff80110143 <system_call+77/7c>

Code; ffffffff80185d02 <.text.lock.array+7/105>
0000000000000000 <_RIP>:
Code; ffffffff80185d02 <.text.lock.array+7/105> <=====
   0: f3 90             repz nop <=====
Code; ffffffff80185d04 <.text.lock.array+9/105>
   2: 7e f5             jle fffffffffffffff9 <_RIP+0xfffffffffffffff9>
Code; ffffffff80185d06 <.text.lock.array+b/105>
   4: e9 84 e8 ff ff    jmpq ffffffffffffe88d <_RIP+0xffffffffffffe88d>
Code; ffffffff80185d0b <.text.lock.array+10/105>
   9: e8 70 1b 14 00    callq 141b7e <_RIP+0x141b7e>
Code; ffffffff80185d10 <.text.lock.array+15/105>
   e: e9 80 e9 ff ff    jmpq ffffffffffffe993 <_RIP+0xffffffffffffe993>
Code; ffffffff80185d15 <.text.lock.array+1a/105>
  13: 41 00 00          add %al,(%r8)

1 warning issued. Results may not be reliable.