[suse-amd64] - Segfaults in ld, grep
Hello all,

we have a new server - 2x Opteron 246, motherboard MSI-K8D Master-133 (chipset AMD8131) and 2GB of Kingston ECC dual-channel memory; the filesystem used is XFS. I installed SUSE LINUX 10.0 (X86-64). During compilation of large projects (for example the linux kernel), and sometimes during system boot, the following messages appear in syslog (and compilation stops):

Oct 14 10:00:32 linux kernel: ld[7703]: segfault at 0000000000000000 rip 00002aaaaaaae05c rsp 00007fffffc90170 error 4
Oct 14 15:01:43 linux kernel: grep[32266]: segfault at 0000000000000000 rip 00002aaaaaaae05c rsp 00007fffffa66800 error 4
Oct 14 15:01:44 linux kernel: grep[32355]: segfault at 0000000000000000 rip 00002aaaaaaae05c rsp 00007fffff958720 error 4
Oct 14 15:01:44 linux kernel: grep[32395]: segfault at 0000000000000000 rip 00002aaaaaaae05c rsp 00007fffffcd2600 error 4
Oct 14 15:01:44 linux kernel: grep[32458]: segfault at 0000000000000000 rip 00002aaaaaaae05c rsp 00007fffffed3e10 error 4
Nov 15 15:44:53 cedr kernel: ld[19959]: segfault at 0000000000000000 rip 00002aaaaaaae05c rsp 00007fffff807c90 error 4
Nov 15 15:45:50 cedr kernel: ld[21868]: segfault at 0000000000000000 rip 00002aaaaaaae05c rsp 00007fffffd78a80 error 4

All the rip addresses are the same - 00002aaaaaaae05c. According to /proc/self/maps this address belongs to /lib64/ld-2.3.5.so

I checked the integrity of the installed grep, binutils, gcc and glibc - all were valid. I tried adding/removing/changing/moving memory modules, running memtest, and altering different BIOS settings (including loading defaults), but the same messages always appear. When I booted the kernel with maxcpus=1, everything was ok.

Now I have installed SUSE LINUX 10.0 (i386) and am running the same compilation - no errors appear. On this system I also tried running the x86_64 kernel (with all these i386 32-bit programs and libraries) and everything works ok too.

Do you have any ideas why programs don't run reliably on our computer under x86_64-smp? Is it possible that the problem is in the glibc-2.3.5 for x86_64 bundled with suse 10.0?

Thanks in advance,
Jan Zahradnik

________ Information from NOD32 ________
This message was checked by NOD32 Antivirus System for Linux Mail Server. http://www.nod32.com
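For readers who want to check their own syslog for the same pattern, here is a minimal Python sketch. It extracts the fields from segfault lines of the form quoted above, counts the distinct rip values, and decodes the low bits of the error code using the usual x86 page-fault bit layout. The function names and the regular expression are illustrative and not part of any tool mentioned in this thread, and treating the error field as hexadecimal is an assumption.

```python
import re
from collections import Counter

# Matches kernel segfault lines of the form quoted in this thread.
SEGFAULT_RE = re.compile(
    r"(?P<prog>[^\s\[]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"rip (?P<rip>[0-9a-f]+) rsp (?P<rsp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)


def parse_segfaults(log_text):
    """Return one dict of string fields per segfault line found in log_text."""
    return [m.groupdict() for m in SEGFAULT_RE.finditer(log_text)]


def decode_error(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "page_present": bool(code & 1),  # False: the page was not present
        "write": bool(code & 2),         # False: the access was a read
        "user_mode": bool(code & 4),     # True: the fault came from user space
    }


# Three of the lines from the original post.
LOG = """\
Oct 14 10:00:32 linux kernel: ld[7703]: segfault at 0000000000000000 rip 00002aaaaaaae05c rsp 00007fffffc90170 error 4
Oct 14 15:01:43 linux kernel: grep[32266]: segfault at 0000000000000000 rip 00002aaaaaaae05c rsp 00007fffffa66800 error 4
Nov 15 15:44:53 cedr kernel: ld[19959]: segfault at 0000000000000000 rip 00002aaaaaaae05c rsp 00007fffff807c90 error 4
"""

faults = parse_segfaults(LOG)
rip_counts = Counter(f["rip"] for f in faults)
# Assumption: the error field is printed in hexadecimal.
first_error = decode_error(int(faults[0]["err"], 16))

print(rip_counts)
print(first_error)
```

Under this reading, "error 4" with a fault address of 0 is a user-mode read of a page that is not mapped, which is what a NULL pointer dereference looks like.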
On Tuesday 22 November 2005 14:12, Jan Zahradník wrote:
Hello all, we have a new server - 2xOpteron 246, motherboard MSI-K8D
During compilation of large projects (for example linux kernel) and sometimes during system boot following messages in syslog appear (and compilation stops):
Hi,

these messages have been appearing routinely in my syslog since updating to 10.0:

grep[5726]: segfault at 0000000000000000 rip 00002aaaaaaae05c rsp 00007fffffff6370 error 4
vendor.pl[8012]: segfault at 0000000000000000 rip 00002aaaaaaae05c rsp 00007ffffff3f3b0 error 4

I thought it had something to do with the weird MONO libraries that yast insisted on installing, and which I removed manually :)

The machine is a dual Opteron 250, motherboard Tyan S2885, 8GB RAM.

The quality of the "standard" suse kernel is, hmm, decreasing. I now get an oops on inserting a firewire HDD. 'modprobe wbsd' on an amd64 notebook also oopses, but that has been a known problem for ages.

Oleg.

------
processor : 1
vendor_id : AuthenticAMD
cpu family : 15
model : 37
model name : AMD Opteron(tm) Processor 250
stepping : 1
cpu MHz : 2390.467
cache size : 1024 KB
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt lm 3dnowext 3dnow pni lahf_lm
bogomips : 4781.16
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp
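Both reports fault at the same rip, and the original post attributes that address to /lib64/ld-2.3.5.so by looking at /proc/self/maps. A minimal Python sketch of that lookup follows. The function name is illustrative, and the sample maps text is a made-up excerpt in the /proc/<pid>/maps format, not data from either machine.

```python
def find_mapping(maps_text, address):
    """Return the pathname of the mapping that contains address.

    Returns "" for an anonymous mapping and None if no mapping matches.
    """
    for line in maps_text.splitlines():
        # Fields: range, perms, offset, dev, inode, optional pathname.
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue
        start_s, _, end_s = fields[0].partition("-")
        start, end = int(start_s, 16), int(end_s, 16)
        if start <= address < end:
            return fields[5].strip() if len(fields) > 5 else ""
    return None


# Made-up excerpt for illustration only; real ranges will differ.
SAMPLE_MAPS = """\
00400000-00415000 r-xp 00000000 08:02 1234 /bin/grep
2aaaaaaab000-2aaaaaac0000 r-xp 00000000 08:02 5678 /lib64/ld-2.3.5.so
7ffffffea000-7ffffffff000 rw-p 00000000 00:00 0 [stack]
"""

rip = int("00002aaaaaaae05c", 16)
print(find_mapping(SAMPLE_MAPS, rip))
```

On a live system the text of /proc/self/maps (or /proc/<pid>/maps of the affected process) would be passed in instead of the sample.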
Oleg Gusev wrote:
Hi,
these messages are routinely appearing in my syslog after updating to 10.0:
grep[5726]: segfault at 0000000000000000 rip 00002aaaaaaae05c rsp 00007fffffff6370 error 4
vendor.pl[8012]: segfault at 0000000000000000 rip 00002aaaaaaae05c rsp 00007ffffff3f3b0 error 4
I thought it has something to do with the weird MONO libraries that yast insisted to install, and which were manually removed :) The machine is dual Opteron 250, motherboard Tyan S2885, 8GB RAM. The quality of the "standard" suse kernel is, hmm, decreasing. I get now an oops on inserting a firewire HDD.
This is not a problem of the "standard" suse kernel - under suse 10.0 for x86_64 I compiled and ran vanilla kernel 2.6.13.4, and it generated the same error messages as the suse kernel. There could be a problem somewhere in memory synchronization between the processors.
'modprobe wbsd' on an amd64 notebook also oopses, but it is a known problem since ages.
Oleg.
Jan Zahradnik
On Tue, 22 Nov 2005 14:51:23 +0100
Oleg Gusev
On Tuesday 22 November 2005 14:12, Jan Zahradník wrote:
Hello all, we have a new server - 2xOpteron 246, motherboard MSI-K8D
During compilation of large projects (for example linux kernel) and sometimes during system boot following messages in syslog appear (and compilation stops):
Hi,
these messages are routinely appearing in my syslog after updating to 10.0:
grep[5726]: segfault at 0000000000000000 rip 00002aaaaaaae05c rsp 00007fffffff6370 error 4
vendor.pl[8012]: segfault at 0000000000000000 rip 00002aaaaaaae05c rsp 00007ffffff3f3b0 error 4
Hello,

I also had a grep segv problem, and it is a known problem:
https://bugzilla.novell.com/show_bug.cgi?id=117197

Of course I updated my mobo's BIOS (HDAMB), but the problem is not fixed. :(
So for the moment I have worked around the problem with Thomas Renniger's patch.

eshsf
Hi,
* Jan Zahradník
Hello all, we have a new server - 2xOpteron 246, motherboard MSI-K8D
Master-133 (chipset AMD8131) and 2GB of Kingston ECC dual-channel memory; the filesystem used is XFS. I installed SUSE LINUX 10.0 (X86-64). During compilation of large projects (for example the linux kernel), and sometimes during system boot, the following messages appear in syslog (and compilation stops):
Oct 14 10:00:32 linux kernel: ld[7703]: segfault at 0000000000000000 rip 00002aaaaaaae05c rsp 00007fffffc90170 error 4
Oct 14 15:01:43 linux kernel: grep[32266]: segfault at 0000000000000000 rip 00002aaaaaaae05c rsp 00007fffffa66800 error 4
Oct 14 15:01:44 linux kernel: grep[32355]: segfault at 0000000000000000 rip 00002aaaaaaae05c rsp 00007fffff958720 error 4
Oct 14 15:01:44 linux kernel: grep[32395]: segfault at 0000000000000000 rip 00002aaaaaaae05c rsp 00007fffffcd2600 error 4
Oct 14 15:01:44 linux kernel: grep[32458]: segfault at 0000000000000000 rip 00002aaaaaaae05c rsp 00007fffffed3e10 error 4
Nov 15 15:44:53 cedr kernel: ld[19959]: segfault at 0000000000000000 rip 00002aaaaaaae05c rsp 00007fffff807c90 error 4
Nov 15 15:45:50 cedr kernel: ld[21868]: segfault at 0000000000000000 rip 00002aaaaaaae05c rsp 00007fffffd78a80 error 4
All the rip addresses are the same - 00002aaaaaaae05c. According to /proc/self/maps this address belongs to /lib64/ld-2.3.5.so
I checked the integrity of the installed grep, binutils, gcc and glibc - all were valid. I tried adding/removing/changing/moving memory modules, running memtest, and altering different BIOS settings (including loading defaults), but the same messages always appear. When I booted the kernel with maxcpus=1, everything was ok. Now I have installed SUSE LINUX 10.0 (i386) and am running the same compilation - no errors appear. On this system I also tried running the x86_64 kernel (with all these i386 32-bit programs and libraries) and everything works ok too.
Do you have any ideas why programs don't run reliably on our computer under x86_64-smp? Is it possible that the problem is in the glibc-2.3.5 for x86_64 bundled with suse 10.0?
This problem should be fixed with a BIOS update.
Thanks in advance,
Jan Zahradnik
Regards,
Stefan
--
SUSE LINUX Products GmbH, Maxfeldstr. 5    Mail: sf@suse.de
D-90409 Nuernberg    Phone: +49-911-740 53 - 0
GPG 1024D/91614BBC B226 E3DA 37B0 2170 7403 D19C 18AF E579 9161 4BBC
participants (4)
- eshsf
- Jan Zahradník
- Oleg Gusev
- Stefan Fent