[opensuse] BUG: Bad page state in process suse.de-cron-lo pfn:6db6db6db6e5d85e ??

7 Oct 2017

      All,

  When it rains it pours. I have an interesting issue with an old laptop
(still running 13.1) that in the past has run for weeks/months on end, but is
now suddenly beginning to freeze every day or two. The larger issue is I can't
put my finger on the reason why...

  It's not a full-disk or out-of-memory issue:

23:25 alchemy:~> df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        40G   21G   17G  55% /
devtmpfs        1.9G   16K  1.9G   1% /dev
tmpfs           1.9G     0  1.9G   0% /dev/shm
tmpfs           1.9G  3.9M  1.9G   1% /run
tmpfs           1.9G     0  1.9G   0% /sys/fs/cgroup
tmpfs           1.9G  3.9M  1.9G   1% /var/run
tmpfs           1.9G  3.9M  1.9G   1% /var/lock
/dev/sda3       647G  232G  382G  38% /home
23:28 alchemy:~> free -tm
             total       used       free     shared    buffers     cached
Mem:          3832        376       3455          3         49        172
-/+ buffers/cache:        155       3677
Swap:         2053          0       2053
Total:        5886        376       5509

  and there doesn't seem to be anything consistent captured in messages. Most
times the log just stops, then the next boot picks up. This is a
representative set of freeze/restart messages:

2017-09-28T04:59:54.199029-05:00 alchemy dbus[742]: [system] Activation via
systemd failed for unit 'dbus-org.freedesktop.ModemManager1.service': Unit
dbus-org.freedesktop.ModemManager1.service failed to load: No such file or
directory.
2017-09-28T05:00:01.745535-05:00 alchemy /usr/sbin/cron[3963]:
pam_unix(crond:session): session opened for user root by (uid=0)
2017-09-28T05:00:01.752675-05:00 alchemy systemd[1]: Starting Session 72 of
user root.
2017-09-28T05:00:01.755741-05:00 alchemy systemd[1]: Started Session 72 of
user root.
2017-09-28T05:00:02.404886-05:00 alchemy su: (to root) root on (null)
2017-09-28T05:00:02.406507-05:00 alchemy su: pam_unix(su:session): session
opened for user nobody by (uid=0)
2017-09-28T05:01:06.179306-05:00 alchemy su: pam_unix(su:session): session
closed for user nobody
2017-10-01T20:19:32.118923-05:00 alchemy rsyslogd: [origin software="rsyslogd"
swVersion="7.4.7" x-pid="744" x-info="http://www.rsyslog.com"] start
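
  To find every spot where the log just stops and a later boot picks up, the
file can be scanned for rsyslogd "start" entries and the last timestamp seen
before each one. This is only a rough sketch: it assumes each entry sits on a
single line that begins with an ISO 8601 timestamp, as in the excerpt above,
and the function name find_restarts is made up for illustration.

```python
from datetime import datetime

# Two entries from the excerpt above, unwrapped to one line each.
SAMPLE = [
    "2017-09-28T05:01:06.179306-05:00 alchemy su: pam_unix(su:session): "
    "session closed for user nobody",
    '2017-10-01T20:19:32.118923-05:00 alchemy rsyslogd: [origin '
    'software="rsyslogd" swVersion="7.4.7" x-pid="744" '
    'x-info="http://www.rsyslog.com"] start',
]


def find_restarts(lines):
    """Return (last_entry_time, restart_time) pairs for each rsyslogd start."""
    restarts = []
    previous = None
    for line in lines:
        stamp, _, rest = line.partition(" ")
        try:
            when = datetime.fromisoformat(stamp)
        except ValueError:
            continue  # not a timestamped entry (blank or wrapped line)
        is_start = "rsyslogd:" in rest and rest.rstrip().endswith("] start")
        if is_start and previous is not None:
            restarts.append((previous, when))
        previous = when
    return restarts


restarts = find_restarts(SAMPLE)
for last_seen, restarted in restarts:
    # The gap is how long the machine was silent before the next boot.
    print(f"log stopped at {last_seen}, resumed at {restarted} "
          f"(silent for {restarted - last_seen})")
```

  Against the real log, the same function could be fed an open file object
(for example /var/log/messages, if that is where rsyslog writes) instead of
SAMPLE. The entries just before each gap are the ones worth comparing between
freezes.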

  However, I did manage to catch this:

2017-10-06T05:01:12.680379-05:00 alchemy kernel: [44716.073470] BUG: Bad page
state in process suse.de-cron-lo  pfn:6db6db6db6e5d85e
2017-10-06T05:01:12.680402-05:00 alchemy kernel: [44716.073478]
page:ffffea0002475490 count:-5632 mapcount:38229105 mapping:000000000098277e
index:0x2ffffffff

  Huh? kernel Bad page state in process suse.de-cron-lo? Searching around, I
found a kernel bug that seems semi-related:

https://lkml.org/lkml/2016/8/5/371

  However, this does not explain, "Why in the hell did it use to run for
weeks/months (until I rebooted), only to start freezing now?" The usual
suspects would be memory (tests fine) and hard disk (swap corruption?), but I
have no indication of any disk issues.

  So I guess my plea for help is basically a "What else to check?" question.
How do you troubleshoot a problem that doesn't appear in the logs? (It's a
laptop, so I can't just pull the case cover and check for puffy caps -- which
may be a consideration.) Ideas?

  I am no expert in kernel Call Traces. I can read them, and I understand what
the stack pointer addresses are saying; I'm just not 100% sure what they are
telling me. I'm also at a loss to decipher the Code:

kernel: [44716.074017] Code: 8b 44 24 18 4d 8d 2c 07 4d 3b 6d 00 0f 84 e9 02
00 00 45 85 e4 0f 84 d3 02 00 00 4d 8b 6d 08 49 83 ed 20 49 8b 45 28 49 8b 55
20 <48> 89 42 08 48 89 10 48 b8 00 01 10 00 00 00 ad de 49 89 45 20
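
  As a first step toward deciphering that line, the hex dump can be turned
back into raw bytes, with the byte in angle brackets taken as the start of the
faulting instruction. The sketch below does only that; it is not a
disassembler (the kernel source tree ships scripts/decodecode for the full
job). The name parse_code_line is hypothetical, and treating the eight bytes
after "48 b8" as a little-endian 64-bit immediate is an assumption about how
this particular dump decodes.

```python
import re

CODE_LINE = (
    "kernel: [44716.074017] Code: 8b 44 24 18 4d 8d 2c 07 4d 3b 6d 00 0f 84 "
    "e9 02 00 00 45 85 e4 0f 84 d3 02 00 00 4d 8b 6d 08 49 83 ed 20 49 8b 45 "
    "28 49 8b 55 20 <48> 89 42 08 48 89 10 48 b8 00 01 10 00 00 00 ad de 49 "
    "89 45 20"
)


def parse_code_line(text):
    """Return (code_bytes, fault_index) parsed from an oops 'Code:' line."""
    tokens = text.split("Code:", 1)[1].split()
    data = bytearray()
    fault_index = None
    for tok in tokens:
        match = re.fullmatch(r"<([0-9a-f]{2})>|([0-9a-f]{2})", tok)
        if match is None:
            raise ValueError(f"unexpected token in Code line: {tok!r}")
        if match.group(1):
            # The bracketed byte marks where the faulting instruction begins.
            fault_index = len(data)
            data.append(int(match.group(1), 16))
        else:
            data.append(int(match.group(2), 16))
    return bytes(data), fault_index


code, fault = parse_code_line(CODE_LINE)
print(f"{len(code)} bytes, faulting instruction starts at offset {fault}")
print("bytes from fault:", code[fault:].hex(" "))

# Assumption: the "48 b8" after the fault is followed by an 8-byte
# little-endian immediate; print it as a 64-bit value.
pos = code.find(b"\x48\xb8", fault)
imm = int.from_bytes(code[pos + 2:pos + 10], "little")
print(f"immediate after 48 b8: {imm:#018x}")
```

  The immediate comes out as a 0xdead... constant, which resembles the
poison values the kernel writes into list pointers when an entry is removed.
That reading is a guess from the byte pattern, not something confirmed here.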

  Anybody here speak kernel Call Trace? Here is the rest of the kernel trace
for the freeze I did capture. (attached)

-- 
David C. Rankin, J.D.,P.E.
