On 11/12/2018 18.14, Andrei Borzenkov wrote:
11.12.2018 14:37, Carlos E. R. wrote:
Ah, got /proc/meminfo in another way:
minas-tirith:~ # cat /proc/meminfo > meminfo
minas-tirith:~ # cat meminfo
MemTotal:        3934240 kB
MemFree:          106524 kB
MemAvailable:      24012 kB
Your system has no free memory to do anything useful. So any attempt to start a new program results in an attempt to free something; then the system must read the pages it just freed back from disk, and to do that it needs to find free memory again ... you get the idea.
Yes, I guessed that, but I don't know how it gets there. Procedure: watch TV on laptop. Stop TV. Go to sleep. Come back, locked. I had running: thunderbird (crashed), firefox, chrome. Thunderbird had been told to exit, days ago; but something remained loaded refusing to exit:
  PID USER PR NI    VIRT     RES   SHR   SWAP S  %CPU  %MEM     TIME+ COMMAND
 4027 cer  20  0 1943072    3136   792  33888 S 0.043 0.080   0:20.19 thunderbird-bin
Buffers:            1596 kB
Cached:           278208 kB
SwapCached:        97988 kB
Active:          2949780 kB
Inactive:         618720 kB
Active(anon):    2945904 kB
Inactive(anon):   589672 kB
Active(file):       3876 kB
Inactive(file):    29048 kB
Most memory is consumed by some application(s) allocating (and actually using) a large amount of memory. It is simply impossible that all loaded application binaries amount to just 33MB (active + inactive file). On my system with a rather static load, active+inactive file is 4.8GB, and I have just Chromium + Thunderbird + Deluge + several terminal windows as part of the user session.
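The 33MB figure is the sum of Active(file) and Inactive(file) from the meminfo output quoted in this thread. A minimal Python sketch of that arithmetic follows; SAMPLE holds only the relevant quoted values, and parse_meminfo and lru_split are hypothetical helper names, not part of any library.

```python
# Sketch: split /proc/meminfo into file-backed vs anonymous LRU pages.
# SAMPLE holds values quoted in this thread; the helpers are hypothetical.

SAMPLE = """\
MemTotal:        3934240 kB
MemAvailable:      24012 kB
Active(anon):    2945904 kB
Inactive(anon):   589672 kB
Active(file):       3876 kB
Inactive(file):    29048 kB
"""


def parse_meminfo(text):
    """Return {field: value}; values are in kB where a unit is given."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep or not rest.strip():
            continue  # skip blank or malformed lines
        fields[name.strip()] = int(rest.split()[0])
    return fields


def lru_split(fields):
    """Return (file_kb, anon_kb) summed over the active and inactive lists."""
    file_kb = fields["Active(file)"] + fields["Inactive(file)"]
    anon_kb = fields["Active(anon)"] + fields["Inactive(anon)"]
    return file_kb, anon_kb


if __name__ == "__main__":
    file_kb, anon_kb = lru_split(parse_meminfo(SAMPLE))
    print(f"file-backed: {file_kb} kB, anonymous: {anon_kb} kB")
```

On a live Linux system the same helpers should work on `open("/proc/meminfo").read()` instead of SAMPLE.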
Look, here is the list of top processes sorted by memory (RES) just after killing Thunderbird and Chrome, so with a responsive system:

top - 13:31:23 up 4 days, 11:29, 5 users, load average: 0.85, 19.32, 34.43
Tasks: 271 total, 1 running, 270 sleeping, 0 stopped, 0 zombie
%Cpu0 : 1.5 us, 1.3 sy, 0.0 ni, 97.1 id, 0.0 wa, 0.0 hi, 0.1 si, 0.0 st
%Cpu1 : 1.5 us, 0.7 sy, 0.0 ni, 97.7 id, 0.0 wa, 0.0 hi, 0.1 si, 0.0 st
KiB Mem : 3934240 total, 1306420 free, 2123600 used, 504220 buff/cache
KiB Swap: 6289412 total, 5290168 free, 999244 used. 1449112 avail Mem

  PID USER PR NI    VIRT     RES   SHR   SWAP S  %CPU  %MEM     TIME+ COMMAND
 3256 cer  20  0 2543792  805272 15840  30120 S 0.000 20.47   0:53.40 Web Content
 3342 cer  20  0 2336788  640792 37524  32844 S 0.000 16.29   1:24.18 Web Content
 3164 cer  20  0 9487996  350420 61860  52756 S 0.439 8.907   7:32.39 firefox
 3381 cer  20  0 1764236   77596  6964  32360 S 0.000 1.972   0:19.15 Web Content
 2703 root 20  0  449308   24744 11532  15652 S 1.023 0.629  40:07.73 X
 3397 cer  20  0 1688972   20756  2840  44604 S 0.000 0.528   0:07.70 Web Content
 4038 cer  20  0 1038948   15516  4120  18216 S 0.146 0.394   2:02.10 xfce4-terminal
 4067 cer  20  0  555352   12760  6484   2044 S 0.439 0.324  14:42.96 panel-18-weathe
 4003 cer  20  0  869880   11492  1344   6992 S 0.000 0.292   0:10.20 xfdesktop
 4319 cer  20  0  865652    7916  3404   4680 S 0.000 0.201   1:33.80 contarcorreo
 4269 cer  39 19  918436    7792   780  86756 S 0.000 0.198   0:34.50 tracker-miner-f
 4030 cer  20  0  875056    7192  2900   4028 S 0.585 0.183  18:49.47 gkrellm
22832 root 20  0   18420    6700  2188    508 S 0.000 0.170   0:00.35 bash
 3989 cer  20  0  747576    6016  3700   5944 S 0.000 0.153   0:16.09 xfce4-panel
 3981 cer  20  0 1003288    5048   112   6780 S 0.000 0.128   0:07.42 Thunar

Nothing is really using a lot of memory, unless we count virtual memory. What is above sums to 2 GiB. Now, VIRT is 25 GiB.
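The 2 GiB estimate can be checked by summing the RES column. The following is a minimal Python sketch, assuming the column order of the listing above (RES is the sixth field). ROWS holds only the three largest rows as a sample, and to_kib and sum_res are hypothetical helper names.

```python
# Sketch: sum the RES column of top(1) process rows.
# Assumes the column order PID USER PR NI VIRT RES ...; helpers are hypothetical.

ROWS = """\
3256 cer 20 0 2543792 805272 15840 30120 S 0.000 20.47 0:53.40 Web Content
3342 cer 20 0 2336788 640792 37524 32844 S 0.000 16.29 1:24.18 Web Content
3164 cer 20 0 9487996 350420 61860 52756 S 0.439 8.907 7:32.39 firefox
"""

# top prints plain numbers in KiB and scales larger values with a suffix.
SUFFIX = {"m": 1024, "g": 1024 ** 2, "t": 1024 ** 3}


def to_kib(value):
    """Convert a top memory cell such as '805272' or '1.168g' to KiB."""
    unit = value[-1].lower()
    if unit in SUFFIX:
        return float(value[:-1]) * SUFFIX[unit]
    return float(value)


def sum_res(rows, res_index=5):
    """Sum the RES column (sixth field) over all non-empty rows."""
    return sum(to_kib(line.split()[res_index])
               for line in rows.splitlines() if line.strip())


if __name__ == "__main__":
    total_kib = sum_res(ROWS)
    print(f"RES total: {total_kib:.0f} KiB = {total_kib / 1024 ** 2:.2f} GiB")
```

Pasting all process rows into ROWS gives the total for the full listing. Note that RES counts shared pages once per process, so the sum is an upper bound rather than an exact figure.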
Look, I had killed Thunderbird and Chrome, which were:

  PID USER PR NI    VIRT     RES   SHR   SWAP S  %CPU  %MEM     TIME+ COMMAND
 4027 cer  20  0 1943072    3136   792  33888 S 0.043 0.080   0:20.19 thunderbird-bin
12526 cer  20  0 1453444  137488  2816  14312 D 3.001 3.495 145:56.66 chrome
12670 cer  20  0 1703752  151676  5356  15124 D 0.749 3.855 104:20.03 chrome
12563 cer  20  0 2089092   63880  4524 243644 D 0.400 1.624  93:40.60 chrome
15952 cer  20  0  876624  118512 16556  21176 D 0.289 3.012   6:02.88 chrome
14060 cer  20  0  538600    4968   176   9852 D 0.238 0.126 140:14.77 chrome

I should have sorted by RES before I started killing things, but as you can guess, each operation took minutes. But I have it from the first incident in this thread:

top - 21:48:19 up 1 day, 19:46, 5 users, load average: 9.18, 11.24, 14.59
Tasks: 291 total, 4 running, 287 sleeping, 0 stopped, 0 zombie
%Cpu(s): 2.4 us, 18.4 sy, 0.0 ni, 9.9 id, 68.1 wa, 0.0 hi, 1.1 si, 0.0 st
KiB Mem : 3934240 total, 106132 free, 3417824 used, 410284 buff/cache
KiB Swap: 6289412 total, 4956028 free, 1333384 used. 56300 avail Mem

  PID USER PR NI    VIRT     RES   SHR   SWAP    USED S  %CPU  %MEM     TIME+ COMMAND
 4355 cer  20  0  9.857g  1.168g 33104      0  1.168g D 0.946 31.13   5:11.20 firefox
 4603 cer  20  0 2189304  497600 20004      0  497600 S 0.000 12.65   0:20.22 Web Content
 4476 cer  20  0 2121180  369828     4      0  369828 S 0.000 9.400   0:31.32 Web Content
12670 cer  20  0 1645688  204172 16424  24272  228444 S 2.208 5.190  60:26.25 chrome
 4574 cer  20  0 1800896  123752  4776      0  123752 S 0.000 3.146   0:36.18 Web Content
 4616 cer  20  0 1748260  113256  4444      0  113256 S 0.000 2.879   0:09.17 Web Content
12526 cer  20  0 1436224   93956  2964  20672  114628 D 4.416 2.388  84:47.28 chrome
15650 cer  20  0  896392   84652 12128  31548  116200 D 1.262 2.152  10:37.98 chrome
16330 cer  20  0  836828   81516 19036  29160  110676 D 0.631 2.072   3:20.19 chrome
12563 cer  20  0 2077116   79692  6032 228036  307728 S 0.000 2.026  61:25.77 chrome
15996 cer  20  0  827280   77852 15320  24792  102644 S 0.631 1.979   3:16.00 chrome
15952 cer  20  0  832136   77220  2944  26892  104112 S 0.000 1.963   3:07.12 chrome
15265 cer  20  0  845048   57020   728  33540   90560 S 0.000 1.449  11:23.63 chrome
16400 cer  20  0  793004   41616   744  26940   68556 S 0.315 1.058   1:22.59 chrome
 2703 root 20  0  450280   41456 28532  15100   56556 D 0.631 1.054  22:37.38 X
 4003 cer  20  0  869880   10892     8   6192   17084 S 0.000 0.277   0:05.23 xfdesktop
 4038 cer  20  0 1036900   10676    28  17256   27932 S 0.000 0.271   0:37.96 xfce4-terminal
 4030 cer  20  0  875056    8264  1888   1888   10152 S 0.631 0.210   9:59.47 gkrellm

firefox was using 1.168g, but that is not "a lot"; I have seen it at two with the machine chugging away happily and responsive.
Again - it means that any attempt to do anything will require loading binaries from disk. This alone would account for high wait time observed earlier.
What are your swappiness settings?
Defaults. Never touched it.

minas-tirith:~ # cat /proc/sys/vm/swappiness /proc/sys/vm/vfs_cache_pressure /proc/sys/vm/min_free_kbytes /proc/sys/vm/watermark_scale_factor
60
100
67584
10
minas-tirith:~ #

/etc/sysctl.conf:

### converted from /etc/sysconfig/sysctl at Mon, 06 Jan 2014 02:32:57 +0100
net.ipv4.tcp_syncookies = 1
net.ipv4.ip_forward = 0
net.ipv6.conf.all.forwarding = 0
net.ipv4.tcp_ecn = 0
# net.ipv6.conf.all.disable_ipv6 = 1
Unevictable:          80 kB
Mlocked:              80 kB
SwapTotal:       6289412 kB
SwapFree:        4813752 kB
Dirty:               104 kB
Writeback:             0 kB
AnonPages:       3282560 kB
Mapped:            88652 kB
Shmem:            246612 kB
Slab:             108992 kB
SReclaimable:      42936 kB
SUnreclaim:        66056 kB
KernelStack:       13696 kB
PageTables:        79240 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     8256532 kB
Committed_AS:   13514900 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
HardwareCorrupted:     0 kB
AnonHugePages:    399360 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      644416 kB
DirectMap2M:     3450880 kB
minas-tirith:~ #
Ok, I wait for suggestions on what to do next.
Your system is overloaded. Either do not run programs that consume such an amount of RAM, or add RAM. Tweaking swappiness may help to keep programs from being pushed out of memory.
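Before tweaking anything it helps to record the current values. The following is a minimal Python sketch that reads the same VM tunables that were printed with cat earlier in this thread; read_vm_tunables is a hypothetical helper name, and the file names are assumed to live under /proc/sys/vm as on a stock Linux kernel.

```python
from pathlib import Path

# Sketch: read the VM tunables discussed in this thread.
# read_vm_tunables is a hypothetical helper, not a library function.

TUNABLES = ("swappiness", "vfs_cache_pressure",
            "min_free_kbytes", "watermark_scale_factor")


def read_vm_tunables(base="/proc/sys/vm", names=TUNABLES):
    """Return {name: int} for each tunable file that exists under base."""
    values = {}
    for name in names:
        path = Path(base) / name
        if path.is_file():  # older kernels may lack some of these files
            values[name] = int(path.read_text().strip())
    return values


if __name__ == "__main__":
    for name, value in read_vm_tunables().items():
        print(f"vm.{name} = {value}")
```

A value can then be changed at runtime as root with `sysctl -w vm.swappiness=N`; a lower N should make the kernel less eager to swap out anonymous pages. Which value suits this machine is not something the thread establishes.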
No, that's not it. I was happily using that machine during the night, then I left it alone without closing any program and went to sleep, and some 12 hours later it crashed. Only cron and timers were changing things. I have been running the same load for a long time with 42.3, before upgrading to 15.0.

I'll try removing "btrfsmaintenance".

-- 
Cheers / Saludos,

		Carlos E. R.
		(from 42.3 x86_64 "Malachite" at Telcontar)