Bug ID 1183990
Summary I think I found the cause of a kernel lock when attempting hibernation
Classification openSUSE
Product openSUSE Distribution
Version Leap 15.2
Hardware Other
OS Other
Status NEW
Severity Normal
Priority P5 - None
Component Kernel
Assignee kernel-bugs@opensuse.org
Reporter carlos.e.r@opensuse.org
QA Contact qa-bugs@suse.de
Found By ---
Blocker ---

Hi,

For months I have been experiencing kernel trouble in _some_ of my attempts
to hibernate. Sometimes the hibernation would stall and not proceed. Issuing
"systemctl hibernate" again replied that one was already in progress, but
nothing advanced. I would then attempt to halt the machine, but this would
also stall at some point. If I pressed ctrl-alt-del several times, fast, I
would see the message that the keys had been detected seven times and that
the machine would halt immediately, but it did not.

I had no way out but to hit the power switch and suffer the long fsck the
next morning.

Nothing in the logs whatsoever.

Well, one day I noticed in "atop" that one of my disks went to 100% busy when
this happened. So I left another instance of gkrellm running, displaying the
I/O state of all the partitions on /dev/sdc, and by experimenting with "sync"
I saw that it was sdc9 which was active, at something like 400 Kbps.
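
The same observation can probably be reproduced without gkrellm by sampling
/proc/diskstats. What follows is only a sketch: the function names and the
optional file argument are invented for illustration, and it assumes the usual
diskstats layout, where field 3 is the device name and field 10 is the
cumulative count of 512-byte sectors written.

```shell
# Print the cumulative number of sectors written to device $1.
# $2 is an optional stats file (hypothetical hook, handy for testing);
# it defaults to /proc/diskstats.
sectors_written() {
    awk -v dev="$1" '$3 == dev { print $10 }' "${2:-/proc/diskstats}"
}

# Print the approximate write rate of device $1 in KiB/s,
# averaged over $2 seconds (default 5).
write_rate_kib() {
    local dev=$1 interval=${2:-5} before after
    before=$(sectors_written "$dev")
    [ -n "$before" ] || { echo "no such device: $dev" >&2; return 1; }
    sleep "$interval"
    after=$(sectors_written "$dev")
    # diskstats sectors are 512 bytes each, so 2 sectors = 1 KiB
    echo $(( (after - before) / 2 / interval ))
}
```

Something like "write_rate_kib sdc9 10", run while a "sync" is pending, should
show whether that partition is the one being written to.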

I noticed that "sync" would sometimes take a minute to complete.

/dev/sdc9 is a reiserfs, and has several bind mounts:

  /dev/sdc9 on /data/Lareiserfs type reiserfs (rw,relatime,lazytime,user_xattr,acl)

bind mounts:

  /dev/sdc9 on /data/homedvl type reiserfs (rw,relatime,lazytime,user_xattr,acl)
  /dev/sdc9 on /usr/share/flightgear type reiserfs (rw,relatime,lazytime,user_xattr,acl)
  /dev/sdc9 on /var/spool/news type reiserfs (rw,relatime,lazytime,user_xattr,acl)
  /dev/sdc9 on /home/cer/terrasync type reiserfs (rw,relatime,lazytime,user_xattr,acl)
  /dev/sdc9 on /usr/src type reiserfs (rw,relatime,lazytime,user_xattr,acl)


Now, the directory that is active is "/var/spool/news". I use the leafnode
NNTP proxy server. The spool contains 1.2 million files in about 3 GB of space.


I found that if I run this sequence:

time sync
time sync && systemctl hibernate

the machine hibernates successfully - 13 days so far, a record.
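
The sequence above could be wrapped in a small function, roughly as sketched
below. The idea is the same (flush once, flush again, then hibernate), with one
addition that is purely an assumption: if the second sync is still slow,
something is presumably writing heavily, so the function refuses to hibernate.
SYNC_CMD, HIBERNATE_CMD and the 10-second default threshold are hypothetical
names and values chosen for illustration.

```shell
# Commands used by the function; overridable for dry runs (hypothetical hooks).
SYNC_CMD=${SYNC_CMD:-sync}
HIBERNATE_CMD=${HIBERNATE_CMD:-systemctl hibernate}

# Sync twice, timing each pass, and hibernate only if the second
# pass finished within $1 seconds (default 10, an arbitrary guess).
sync_then_hibernate() {
    local max_second_sync=${1:-10}
    local start elapsed

    start=$(date +%s)
    $SYNC_CMD || return 1
    echo "first sync took $(( $(date +%s) - start ))s"

    start=$(date +%s)
    $SYNC_CMD || return 1
    elapsed=$(( $(date +%s) - start ))
    echo "second sync took ${elapsed}s"

    if [ "$elapsed" -gt "$max_second_sync" ]; then
        echo "second sync was slow; something is still writing, not hibernating" >&2
        return 2
    fi
    $HIBERNATE_CMD
}
```

Setting HIBERNATE_CMD="echo would hibernate" first gives a dry run that only
reports the sync times.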


Most days it takes a minute to sync, but one day, I noticed it took several
minutes. Why? Well, it happened that at the same time "/usr/sbin/texpire" was
working (a cronjob triggers it). This task runs for about half an hour daily
expunging old posts, meaning it examines a million files.


Maybe with this (tentative) report you can improve the kernel response so that
it doesn't stall when trying to hibernate, presumably when the sync takes too
long. I would think that the kernel should stop tasks before doing the sync
:-?

At least, running sync manually I can detect the situation and kill the busy
task before suffering the crash.
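
That manual check could be partly automated by looking for known heavy writers
before hibernating. A minimal sketch, assuming "pgrep" from procps is
available; the function name is made up, and "texpire" is simply the one task
identified in this report.

```shell
# Print each of the given process names that is currently running.
# Returns 0 if at least one is running, 1 if none are.
busy_tasks_running() {
    local name found=1
    for name in "$@"; do
        # -x matches the process name exactly, not as a substring
        if pgrep -x "$name" >/dev/null 2>&1; then
            echo "$name"
            found=0
        fi
    done
    return "$found"
}

# Possible use: hibernate only when texpire is not running.
#   busy_tasks_running texpire || systemctl hibernate
```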




On the other hand, maybe there is an issue in reiserfs with "lazytime" (which
is default), delaying the writes of "something" until forced to. My wild guess
is that it delays the timestamp that records when a file was touched. Each
time I read a post, or Thunderbird scans a post, the timestamp (sorry, I don't
remember exactly which timestamp it is) should be updated, but the update is
not actually written to disk; it is delayed "for ever".

I use reiserfs for this mount because in theory it should work better than
others with millions of small files.
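
To check the lazytime guess, it may help to first confirm which mounts really
carry the option. The sketch below reads /proc/mounts, where the fourth field
is the comma-separated option list; the function name and the optional file
argument are hypothetical.

```shell
# List the mount points whose options include "lazytime".
# $1 is an optional mounts file (hypothetical hook, handy for testing);
# it defaults to /proc/mounts.
lazytime_mounts() {
    awk '{
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "lazytime") { print $2; break }
    }' "${1:-/proc/mounts}"
}
```

One possible experiment, untested here, would be to remount the filesystem
with "mount -o remount,nolazytime /data/Lareiserfs" and see whether the time
"sync" takes changes.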

