[opensuse] BUG: Bad page state in process suse.de-cron-lo pfn:6db6db6db6e5d85e ??
All,

When it rains it pours. I have an interesting issue with an old laptop (still running 13.1) that in the past has run for weeks/months on end, and is now suddenly beginning to freeze every day or two. The larger issue is I can't put my finger on the reason why...

It's not a resources-full issue:

23:25 alchemy:~> df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        40G   21G   17G  55% /
devtmpfs        1.9G   16K  1.9G   1% /dev
tmpfs           1.9G     0  1.9G   0% /dev/shm
tmpfs           1.9G  3.9M  1.9G   1% /run
tmpfs           1.9G     0  1.9G   0% /sys/fs/cgroup
tmpfs           1.9G  3.9M  1.9G   1% /var/run
tmpfs           1.9G  3.9M  1.9G   1% /var/lock
/dev/sda3       647G  232G  382G  38% /home

23:28 alchemy:~> free -tm
             total       used       free     shared    buffers     cached
Mem:          3832        376       3455          3         49        172
-/+ buffers/cache:        155       3677
Swap:         2053          0       2053
Total:        5886        376       5509

and there doesn't seem to be anything consistent captured in messages. Most times the log just stops, then the next boot picks up. This is a representative freeze/restart set of messages:

2017-09-28T04:59:54.199029-05:00 alchemy dbus[742]: [system] Activation via systemd failed for unit 'dbus-org.freedesktop.ModemManager1.service': Unit dbus-org.freedesktop.ModemManager1.service failed to load: No such file or directory.
2017-09-28T05:00:01.745535-05:00 alchemy /usr/sbin/cron[3963]: pam_unix(crond:session): session opened for user root by (uid=0)
2017-09-28T05:00:01.752675-05:00 alchemy systemd[1]: Starting Session 72 of user root.
2017-09-28T05:00:01.755741-05:00 alchemy systemd[1]: Started Session 72 of user root.
2017-09-28T05:00:02.404886-05:00 alchemy su: (to root) root on (null)
2017-09-28T05:00:02.406507-05:00 alchemy su: pam_unix(su:session): session opened for user nobody by (uid=0)
2017-09-28T05:01:06.179306-05:00 alchemy su: pam_unix(su:session): session closed for user nobody
2017-10-01T20:19:32.118923-05:00 alchemy rsyslogd: [origin software="rsyslogd" swVersion="7.4.7" x-pid="744" x-info="http://www.rsyslog.com"] start

However, I did manage to catch a:

2017-10-06T05:01:12.680379-05:00 alchemy kernel: [44716.073470] BUG: Bad page state in process suse.de-cron-lo pfn:6db6db6db6e5d85e
2017-10-06T05:01:12.680402-05:00 alchemy kernel: [44716.073478] page:ffffea0002475490 count:-5632 mapcount:38229105 mapping:000000000098277e index:0x2ffffffff

Huh? kernel Bad page state in process suse.de-cron-lo? Searching around, I found a kernel bug that seems semi-related:

https://lkml.org/lkml/2016/8/5/371

However, this does not explain, "Why in the hell did it used to run for weeks/months (until I rebooted), to now freezing?"

Usual suspects would be memory (tests fine) and hard disk (swap corruption?). But I have no indication of any disk issues. So I guess my plea for help is basically a "What else to check?" question. How do you troubleshoot a problem that doesn't appear in the logs? (It's a laptop, so I can't just pull the case cover and check for puffy caps, which may be a consideration.)

Ideas?

I am no expert in kernel Call Traces. I can read them, I understand what the stack pointer addresses are saying, I'm just not 100% sure what it is telling me. I'm also at a loss to decipher the Code:

kernel: [44716.074017] Code: 8b 44 24 18 4d 8d 2c 07 4d 3b 6d 00 0f 84 e9 02 00 00 45 85 e4 0f 84 d3 02 00 00 4d 8b 6d 08 49 83 ed 20 49 8b 45 28 49 8b 55 20 <48> 89 42 08 48 89 10 48 b8 00 01 10 00 00 00 ad de 49 89 45 20

Anybody here speak kernel Call Trace? Here is the rest of the kernel trace for the freeze I did capture. (attached)

--
David C. Rankin, J.D.,P.E.
On 2017-10-07 at 06:42, David C. Rankin wrote:
> I am no expert in kernel Call Traces.
Me neither. But I do it from time to time in our company. Mostly because no one else does it :)
> I can read them, I understand what the stack pointer addresses are saying, I'm just not 100% sure what it is telling me. I'm also at a loss to decipher the Code:
>
> kernel: [44716.074017] Code: 8b 44 24 18 4d 8d 2c 07 4d 3b 6d 00 0f 84 e9 02 00 00 45 85 e4 0f 84 d3 02 00 00 4d 8b 6d 08 49 83 ed 20 49 8b 45 28 49 8b 55 20 <48> 89 42 08 48 89 10 48 b8 00 01 10 00 00 00 ad de 49 89 45 20
The Code part is a hexdump of the machine code that was running at the time of the crash. It can be deciphered with the objdump command. The instruction pointer is on <48>, which sits at offset 0x2b in your byte sequence, so you need to adjust for that. I use 0x48-0x2b=0x1d as the start address, which puts the marked instruction at 0x48 in the disassembly. Keep the marked byte in the .byte list, just without the angle brackets.

Run this:

#!/bin/sh
cat <<EOF > yourmachinecode.s
.text
.globl foo
foo:
.byte 0x8b,0x44,0x24,0x18,0x4d,0x8d,0x2c,0x07,0x4d,0x3b,0x6d,0x00,0x0f,0x84,0xe9,0x02,0x00,0x00,0x45,0x85,0xe4,0x0f,0x84,0xd3,0x02,0x00,0x00,0x4d,0x8b,0x6d,0x08,0x49,0x83,0xed,0x20,0x49,0x8b,0x45,0x28,0x49,0x8b,0x55,0x20,0x48,0x89,0x42,0x08,0x48,0x89,0x10,0x48,0xb8,0x00,0x01,0x10,0x00,0x00,0x00,0xad,0xde,0x49,0x89,0x45,0x20
EOF
gcc -c -o yourmachinecode.o yourmachinecode.s
objdump --adjust-vma=0x1d --disassemble yourmachinecode.o
> Anybody here speak kernel Call Trace?
Unfortunately not fluently.

regards,
--
/bengan
--
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse+owner@opensuse.org
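For anyone who wants to avoid typing the byte list by hand, here is a minimal sketch of the conversion step in plain sh. It only turns the Code: bytes from the oops into a .byte directive and reports where the <..>-marked byte sits. The variable names are made up for illustration, and the script does not call gcc or objdump itself. The marked byte is part of the faulting instruction, so it stays in the list.

```shell
#!/bin/sh
# Sketch only: convert an oops "Code:" byte string into a .byte list.
# Variable names (code, offset, total, bytes) are illustrative, not from any tool.

code='8b 44 24 18 4d 8d 2c 07 4d 3b 6d 00 0f 84 e9 02 00 00 45 85 e4 0f 84 d3 02 00 00 4d 8b 6d 08 49 83 ed 20 49 8b 45 28 49 8b 55 20 <48> 89 42 08 48 89 10 48 b8 00 01 10 00 00 00 ad de 49 89 45 20'

# Offset of the marked byte = number of bytes that precede the "<".
before=${code%%<*}
set -- $before
offset=$#

# Strip the angle brackets (keeping the byte) and count all bytes.
plain=$(printf '%s\n' "$code" | tr -d '<>')
set -- $plain
total=$#

# Prefix each byte with 0x and join with commas for the assembler.
bytes=$(printf '%s\n' "$plain" | sed 's/[0-9a-f][0-9a-f]/0x&/g; s/ /,/g')

printf 'marked byte at offset %d (0x%x) of %d bytes\n' "$offset" "$offset" "$total"
printf '.byte %s\n' "$bytes"
```

The printed .byte line can be pasted into the heredoc after "foo:". The reported offset is the number to use when working out the --adjust-vma value.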
On 10/07/2017 12:42 AM, David C. Rankin wrote:
> All,
>
> When it rains it pours. I have an interesting issue with an old laptop (still running 13.1) that in the past has run for weeks/months on end, and is now suddenly beginning to freeze every day or two. The larger issue is I can't put my finger on the reason why...
>
> It's not a resources-full issue:
Might it be temperature issues? Dust gets into everything.

--
Ken Schneider
SuSe since Version 5.2, June 1998
On 10/07/2017 09:20 AM, Ken Schneider - openSUSE wrote:
> Might it be temperature issues? Dust gets into everything.
Thanks Ken, Bengt,

It's not temperature. (I regularly put a thumb-tack through the fan grill to peg the fan and run a shop-vac over both sides to suck the dust-bunnies out. If you try this MAKE SURE you have pegged the fan, otherwise you will wildly overspeed your fan. Also, if you get dust-bunnies sucked up against the screen from the inside when you shop-vac it, just rub a bit of hook-velcro over the screen on the outside and it will catch and pull them through... :)

After another 3 hour memtest run, I do get a couple of errors on random bit test 7 near the end. 2 new sticks of pc2-5300 SODIMM ordered. Had to cough up $19 for 4G. I'll let you know if that does it.

Bengt, glad to see you use a heredoc similar to the way I use it for small C code tests. I put it all together with:

$ gcc -c -o ymc.o -x assembler - <<EOF
.text
.globl foo
foo:
.byte 0x8b,0x44,0x24,0x18,0x4d,0x8d,0x2c,0x07,0x4d,0x3b,0x6d,0x00,0x0f,\
0x84,0xe9,0x02,0x00,0x00,0x45,0x85,0xe4,0x0f,0x84,0xd3,0x02,0x00,\
0x00,0x4d,0x8b,0x6d,0x08,0x49,0x83,0xed,0x20,0x49,0x8b,0x45,0x28,\
0x49,0x8b,0x55,0x20,0x48,0x89,0x42,0x08,0x48,0x89,0x10,0x48,0xb8,0x00,\
0x01,0x10,0x00,0x00,0x00,0xad,0xde,0x49,0x89,0x45,0x20
EOF

The 'movabs $0xdead000000100100,%rax' looks suspicious. Even though it is a valid 64-bit number, it is well above what I would expect as a valid address:

$ objdump --adjust-vma=0x1d --disassemble ymc.o

ymc.o:     file format elf64-x86-64

Disassembly of section .text:

000000000000001d <foo>:
  1d:   8b 44 24 18             mov    0x18(%rsp),%eax
  21:   4d 8d 2c 07             lea    (%r15,%rax,1),%r13
  25:   4d 3b 6d 00             cmp    0x0(%r13),%r13
  29:   0f 84 e9 02 00 00       je     318 <foo+0x2fb>
  2f:   45 85 e4                test   %r12d,%r12d
  32:   0f 84 d3 02 00 00       je     30b <foo+0x2ee>
  38:   4d 8b 6d 08             mov    0x8(%r13),%r13
  3c:   49 83 ed 20             sub    $0x20,%r13
  40:   49 8b 45 28             mov    0x28(%r13),%rax
  44:   49 8b 55 20             mov    0x20(%r13),%rdx
  48:   48 89 42 08             mov    %rax,0x8(%rdx)
  4c:   48 89 10                mov    %rdx,(%rax)
  4f:   48 b8 00 01 10 00 00    movabs $0xdead000000100100,%rax
  56:   00 ad de
  59:   49 89 45 20             mov    %rax,0x20(%r13)
--
David C. Rankin, J.D.,P.E.
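A note on the movabs constant in the disassembly, offered as a hedged reading rather than a diagnosis. In 3.x kernels, include/linux/poison.h defines LIST_POISON1 as 0x00100100 plus POISON_POINTER_DELTA, and the x86_64 default for that delta is 0xdead000000000000. Whether this particular 13.1 kernel was built with those values is an assumption, since the thread does not show its config. If it was, the constant is the marker that list_del() stores in an entry it has just unlinked, which would fit the two pointer stores right before it. The sketch below checks the arithmetic in 32-bit halves so it stays within what any POSIX shell handles:

```shell
#!/bin/sh
# Sketch only. The two constants below are ASSUMED from include/linux/poison.h
# (3.x kernels) and the x86_64 default config; they are not taken from the thread.
delta_hi=dead0000      # upper 32 bits of POISON_POINTER_DELTA (0xdead000000000000)
delta_lo=00000000      # lower 32 bits of POISON_POINTER_DELTA
poison_base=00100100   # LIST_POISON1 before the delta is added

# The low halves add without a carry, so the sum can be formed per half.
sum_lo=$(printf '%08x' $(( 0x$delta_lo + 0x$poison_base )))
list_poison1="0x${delta_hi}${sum_lo}"

# Immediate operand of the movabs in the disassembly.
seen=0xdead000000100100

printf 'LIST_POISON1 (assumed) = %s\n' "$list_poison1"
if [ "$list_poison1" = "$seen" ]; then
    echo "movabs operand matches the assumed list poison value"
else
    echo "movabs operand does not match"
fi
```

If that reading holds, the instruction marked in the Code: line is the first store of the unlink. A fault there would mean the entry's neighbour pointer was already bad when the list code ran, which does not contradict the memtest errors.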
On 10/08/2017 04:08 AM, David C. Rankin wrote:
> On 10/07/2017 09:20 AM, Ken Schneider - openSUSE wrote:
>> Might it be temperature issues? Dust gets into everything.
>
> Thanks Ken, Bengt,
>
> It's not temperature.
>
> <snip>
>
> After another 3 hour memtest run, I do get a couple of errors on random bit test 7 near the end. 2 new sticks of pc2-5300 SODIMM ordered. Had to cough up $19 for 4G.
David, cry me a river. I still remember once upon a time when I spent almost $600 for 512 MEG of ram to upgrade a system ;-)

You still in Nacogdoches?
On 10/08/2017 09:38 AM, Stevens wrote:
> David, cry me a river. I still remember once upon a time when I spent almost $600 for 512 MEG of ram to upgrade a system ;-)
>
> You still in Nacogdoches?
Yep, still trying to avoid starving to death in the woods :)

I got you one better: for my very first box, a 386/33 with 387 math coprocessor, I spent a whopping $400 for 1M of RAM. Sheesh... Then buying 40G for $27 (with shipping) for full ECC server RAM really showed how Moore's law had lived up to its exponential billing.

With the first box, I got a whopping 120M hard drive (I couldn't wait for DOS 4.1 to come out to get rid of the 33M partition limit.)

--
David C. Rankin, J.D.,P.E.
On 2017-10-11 07:32, David C. Rankin wrote:
> On 10/08/2017 09:38 AM, Stevens wrote:
>> David, cry me a river. I still remember once upon a time when I spent almost $600 for 512 MEG of ram to upgrade a system ;-)
>>
>> You still in Nacogdoches?
>
> Yep, still trying to avoid starving to death in the woods :)
>
> I got you one better: for my very first box, a 386/33 with 387 math coprocessor, I spent a whopping $400 for 1M of RAM. Sheesh... Then buying 40G for $27 (with shipping) for full ECC server RAM really showed how Moore's law had lived up to its exponential billing.
>
> With the first box, I got a whopping 120M hard drive (I couldn't wait for DOS 4.1 to come out to get rid of the 33M partition limit.)
But did you notice that buying the next computer costs about the same as the previous computer? And that you cannot buy a new computer of limited power for much less money instead?

--
Cheers / Saludos,
Carlos E. R.
(from 42.2 x86_64 "Malachite" (Minas Tirith))
Hello, On Fri, 06 Oct 2017, David C. Rankin wrote:
> When it rains it pours. I have an interesting issue with an old laptop (still running 13.1) that in the past has run for weeks/months on end, and is now suddenly beginning to freeze every day or two. The larger issue is I can't put my finger on the reason why...
> [..]
> Anybody here speak kernel Call Trace? Here is the rest of the kernel trace for the freeze I did capture. (attached)
man ksymoops

And it looks like bad RAM to me.

-dnh

--
Hanna: My mother is dead. [..] It's all right, it happened a long time ago.
Rachel: Hanna, what did your mum die of?
Hanna (dead pan, matter of fact): Three bullets.
On 10/08/2017 01:02 PM, David Haller wrote:
> man ksymoops
>
> And it looks like bad RAM to me.
>
> -dnh
You are still firing on all 8, dnh!

--
David C. Rankin, J.D.,P.E.
participants (6)
- Bengt Gördén
- Carlos E. R.
- David C. Rankin
- David Haller
- Ken Schneider - openSUSE
- Stevens