Re: [opensuse] System slowdowns

4 Oct 2016

On 2016-10-04 20:25, Marc Chamberlin wrote:
...
On 10/3/2016 7:16 PM, Carlos E. R. wrote:
...
Thanks John, Carlos for your thoughts...  I ran the smartctl checks
against my disk drives and yeah one of the drives is showing some
failures, although its overall summary reports that the drive is
healthy. I noted that my swap partition was on the drive that is showing
errors, so I decided to move the swap partition to a different drive to
see if that would help. It didn't.  I will go ahead and replace this
drive just to be safe and report back if that indeed improves things...
Recently one of my disks developed an error. I figured, from smartctl,
that the affected sectors were in the swap partition. So I just did
swapoff on that one, wrote it entirely with zeroes, and ran smartctl
again. Clear. mkswap and activate it again. No issues. I am watching it
to see if the number of bad sectors increases, and will then consider
replacing it.
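
As a rough sketch of that sequence (swapoff, zero the partition, re-check
with smartctl, mkswap, swapon), the helper below only builds the command
lines and does not run anything. The function names and the device
arguments are made up for illustration; "/dev/sdX2" and "/dev/sdX" in the
usage note are placeholders, not real devices. Writing zeroes destroys
whatever is on the target, so the names must be checked by hand first.

```python
def swap_recheck_plan(swap_partition, disk):
    """Build, without executing, the commands for the swap re-check.

    swap_partition and disk are placeholders supplied by the caller,
    e.g. a swap partition and the whole disk that contains it.
    """
    return [
        # Stop using the partition as swap.
        ["swapoff", swap_partition],
        # Overwrite the whole partition with zeroes (destructive).
        ["dd", "if=/dev/zero", "of=" + swap_partition, "bs=1M"],
        # Re-read the SMART attribute table of the disk.
        ["smartctl", "-A", disk],
        # Recreate the swap signature on the zeroed partition.
        ["mkswap", swap_partition],
        # Activate the partition as swap again.
        ["swapon", swap_partition],
    ]


def format_plan(plan):
    """Render the plan as one shell-style command per line for review."""
    return "\n".join(" ".join(argv) for argv in plan)
```

Printing format_plan(swap_recheck_plan("/dev/sdX2", "/dev/sdX")) gives
the five commands for review before typing them as root. Two things to
expect: dd normally ends with a "No space left on device" message when
it reaches the end of the partition, and mkswap writes a new UUID by
default, so an fstab entry that refers to the old UUID would need
updating.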
...
However, for my two cents worth, speaking as a computer
scientist/engineer myself, I would agree with Carlos. Processes are
suspended when they are waiting for I/O operations to complete. That is
normal OS behavior. So if a disk drive is having a hard time reading or
writing, then that would just result in the process being suspended for
a longer period of time, while the drive (using its own internal
processor) figures out how to solve the problem it is having. And a
suspended process is not using CPU cycles. So methinks that disk drive
errors should not result in high CPU usage, like I am seeing.
Right.
...
Something
is periodically and definitely using up a LOT of CPU time and top, atop,
and ksystemguard are not reporting why. Although ksystemguard will
graphically show that both of my CPU cores are running at or near 100%
usage. And the total amount of CPU time being reported for each of the
individual running processes does not add up to anything near 100%
usage. Typically I see less than 15% usage total. Once in a while I will
see the total of the systemd and/or kdeinit5 processes climb up higher,
during these episodes of 100% CPU usage, but again nowhere near 100%.
There is a toggle in top to show other threads, I think kernel threads.
Wait a minute [...] Yes, I think it is "H".
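
When the per-process figures add up to far less than the overall load,
one thing that may help is to ask the kernel directly how it accounts
for the time. The sketch below reads the aggregate "cpu" line of
/proc/stat twice and reports what share of the interval went to each
state (user, system, iowait, irq, softirq and so on). The function names
are invented for this example, and the field order assumes the usual
Linux layout of /proc/stat; it is a diagnostic aid, not a definitive
answer.

```python
import time

# Order of the first eight counters on the "cpu" line of /proc/stat.
CPU_FIELDS = ["user", "nice", "system", "idle",
              "iowait", "irq", "softirq", "steal"]


def parse_cpu_line(stat_text):
    """Return the aggregate 'cpu' counters from /proc/stat text as a dict."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            values = [int(v) for v in parts[1:1 + len(CPU_FIELDS)]]
            return dict(zip(CPU_FIELDS, values))
    raise ValueError("no aggregate 'cpu' line found")


def cpu_breakdown(before, after):
    """Percentage of the elapsed CPU time spent in each state."""
    deltas = {name: after[name] - before[name] for name in before}
    total = sum(deltas.values())
    if total == 0:
        return {name: 0.0 for name in deltas}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


def sample_breakdown(interval=5.0, path="/proc/stat"):
    """Take two samples 'interval' seconds apart and return the breakdown."""
    with open(path) as f:
        before = parse_cpu_line(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_cpu_line(f.read())
    return cpu_breakdown(before, after)
```

Calling sample_breakdown() during one of the busy episodes and printing
the result would show whether the time is going to "irq" or "softirq"
(interrupt handling), to "iowait" (CPUs idle while waiting on disk), or
to "system" (kernel code run on behalf of processes). A large irq or
softirq share would point toward hardware or a driver rather than any
process in the list.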
...
I can believe it is something in the kernel itself, which these
monitoring tools may not know how to report on. Or perhaps my system has
been compromised by one heck of a smart virus or trojan that is able to
hide its activities, who knows. I do use it for a number of servers and
it is exposed to the internet 24/7. I am not enough of an expert on the
internals of the Linux kernel to know how to diagnose it or what tools I
should/could use to figure out what it is doing.
One other symptom that is interesting is that the display can become
unresponsive to mouse/cursor movements/clicks as well. I don't know
whether Linux handles mouse events by interrupt handlers or polls the
mouse, but either way I would expect mouse events to be handled at a
fairly high priority. The fact that the display becomes unresponsive to
these mouse events seems to further point towards the idea that
something is going on within the kernel/OS itself.
I had a curious issue several years ago. The system was very, very busy.
It turned out that it was process number 1, init. The culprit was my
modem: unplug it and zero CPU. I rebooted my modem, problem solved.

Apparently my modem was triggering many interrupts in the serial port,
and again apparently they were handled by init.
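
A way to look for that kind of interrupt storm, sketched here under the
assumption that /proc/interrupts has its usual layout (a header row of
CPU names, then one row per interrupt with a count per CPU and a
description), is to take two snapshots and see which line grew the
most. The function names are made up for this example.

```python
def parse_interrupts(text):
    """Map each interrupt label to (total count over all CPUs, description)."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        tokens = rest.split()
        counts = []
        # Rows such as ERR or MIS carry fewer counts than there are CPUs.
        for tok in tokens[:ncpu]:
            if not tok.isdigit():
                break
            counts.append(int(tok))
        description = " ".join(tokens[len(counts):])
        totals[label.strip()] = (sum(counts), description)
    return totals


def busiest_interrupts(before, after, top=5):
    """Return the 'top' largest (increase, label, description) tuples."""
    deltas = []
    for label, (count, description) in after.items():
        previous = before.get(label, (0, description))[0]
        deltas.append((count - previous, label, description))
    deltas.sort(reverse=True)
    return deltas[:top]
```

Reading /proc/interrupts, waiting a few seconds, reading it again and
passing both parsed snapshots to busiest_interrupts() would list the
interrupt sources that fired most often in between. A serial port or
USB line far ahead of the rest would be a hint of the kind of problem
described above.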

-- 
Cheers / Saludos,

		Carlos E. R.
		(from 13.1 x86_64 "Bottle" at Telcontar)