[oS-en] Reiserfs ongoing problem
Hi,

I have a problem with my one remaining reiserfs partition on rotating rust.

The problem I initially detected was that telling the machine to hibernate sometimes appeared to stall, and I had to force a power off (via the physical power switch) to recover. In fact, I found out later, the kernel was simply syncing one partition, and this operation would take half an hour, which is absurd.

Later I found out that only one partition was affected, and this partition was (is) reiserfs. Then I guessed that the slow operation was syncing the metadata (the time stamps) of the news (nntp) spool.

Traditionally a news server touches (and uses) the access timestamp each time a post is read, affecting one file per read. With the default mount options used nowadays this modification stays in RAM for a long time, many hours.

So, the cronjob that scans the news spool now also calls sync on that filesystem, and when hibernating I call sync in advance. This improved things, but not completely.

Thus I finally resorted to moving only the news spool to another reiserfs partition created on an SSD, which usually syncs in seconds instead of half an hour.

So I thought, problem solved.

But later on I noticed that the sync operation prior to hibernating still takes a long time on the old reiserfs partition, but only on some days - even if nothing writes there now. This is astonishing. Look, last Monday:

Telcontar:~ # hibernate
2021-12-13 04:26:18+01:00
Checking news to send
2021-12-13 04:26:22+01:00
Syncing
rm: cannot remove '/var/lib/news/bin/cronscriptparafetchnews.enabled': No such file or directory
2021-12-13 06:06:09+01:00
synced. Now screensaver
xscreensaver-command: no screensaver is running on display :0.0

touch /var/lib/news/bin/cronscriptparafetchnews.enabled
Telcontar:~ #

More than an hour and a half to sync an unused partition!

(I know it is only one partition because I monitor it with gkrellm)

Hibernation script:

date --rfc-3339=seconds
echo Checking news to send
/var/lib/news/bin/cronscriptparaenviarnewspendientes
date --rfc-3339=seconds
echo Syncing
rm /var/lib/news/bin/cronscriptparafetchnews.enabled
strace -o /tmp/hibernate.strace sync
date --rfc-3339=seconds
echo "synced. Now screensaver"
xscreensaver-command -lock
sleep 3
#sudo chvt 10
#sleep 1
sudo /usr/local/sbin/beep
sleep 1

#systemctl hibernate
sudo /usr/bin/systemctl hibernate

The structure of the mount is this:

Telcontar:~ # mount | grep sdc9
/dev/sdc9 on /data/Lareiserfs type reiserfs (rw,relatime,lazytime,user_xattr,acl)
/dev/sdc9 on /usr/src type reiserfs (rw,relatime,lazytime,user_xattr,acl)
/dev/sdc9 on /home/cer/terrasync type reiserfs (rw,relatime,lazytime,user_xattr,acl)
/dev/sdc9 on /data/homedvl type reiserfs (rw,relatime,lazytime,user_xattr,acl)
/dev/sdc9 on /usr/share/flightgear type reiserfs (rw,relatime,lazytime,user_xattr,acl)
Telcontar:~ #

The first one is the actual mount, the rest are bind mounts - fstab:

LABEL=c_data_reiser /data/Lareiserfs reiserfs acl,user_xattr,barrier=flush,lazytime 1 3
/data/Lareiserfs/gamedata/flightgear /usr/share/flightgear none bind 0 0
/data/Lareiserfs/gamedata/terrasync /home/cer/terrasync none bind 0 0
/data/Lareiserfs/data_homedvl /data/homedvl none bind 0 0
/data/Lareiserfs/usr_src /usr/src none bind 0 0

Now I have added entries to my hibernate script to sync each reiserfs bind mount separately, to find out which one it is, if any.

--
Cheers
Carlos E. R.
(from 15.2 x86_64 at Telcontar)
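A minimal sketch of the per-mount timing mentioned at the end of the message, assuming GNU coreutils `sync -f` (which syncs only the filesystem containing the given path). The function name `time_sync_each` and the `SYNC_CMD` override are hypothetical, added only so the loop can be dry-run without touching any disk. Since every path in the mount listing is a bind mount of the same /dev/sdc9, the first call would probably absorb most of the delay, so this is more likely to confirm that the filesystem as a whole is slow than to single out one bind mount.

```shell
#!/bin/sh
# Hypothetical helper: time a per-filesystem sync for each mount point given.
# SYNC_CMD can be overridden (e.g. SYNC_CMD=true) for a dry run.
time_sync_each() {
    for tse_mp in "$@"; do
        tse_start=$(date +%s)
        # Unquoted on purpose so that "sync -f" splits into command and flag.
        ${SYNC_CMD:-sync -f} "$tse_mp"
        tse_end=$(date +%s)
        printf '%s %s synced in %ds\n' \
            "$(date --rfc-3339=seconds)" "$tse_mp" "$((tse_end - tse_start))"
    done
}

# Example call (not executed here):
#   time_sync_each /data/Lareiserfs /usr/src /home/cer/terrasync \
#                  /data/homedvl /usr/share/flightgear
```

Logging the elapsed seconds next to the timestamp makes a slow run easy to spot in the hibernate output afterwards.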
Hello, On Tue, 14 Dec 2021, Carlos E. R. wrote:
I have a problem with my one remaining reiserfs partition on rotating rust. [..] The first one is the actual mount, the rest are bind mounts - fstab:
LABEL=c_data_reiser /data/Lareiserfs reiserfs acl,user_xattr,barrier=flush,lazytime 1 3
That is not good for a newsspool. I use a loop-mounted image with reiserfs for that purpose. The relevant mount options:

[..]/news_reiserfs.img /var/spool/news reiserfs loop,acl,user_xattr,strictatime,barrier=flush,noauto 0 0

Note the 'strictatime'. The image-file is on an ext3.

HTH,
-dnh

--
printk("you lose buddy boy...\n");
linux-2.6.6/arch/sparc/kernel/traps.c
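To compare the atime behaviour of the two setups, the option string reported by `mount` or `findmnt` can be classified. This is a sketch under the assumption that the kernel lists `noatime` or `relatime` explicitly and that the absence of both means strictatime; the function name `atime_mode` is hypothetical.

```shell
#!/bin/sh
# Classify a comma-separated mount option string by its atime behaviour.
# Assumption: strictatime is reported as the absence of noatime/relatime.
atime_mode() {
    case ",$1," in
        *,noatime,*)  am_mode=noatime ;;
        *,relatime,*) am_mode=relatime ;;
        *)            am_mode=strictatime ;;
    esac
    case ",$1," in
        *,lazytime,*) am_mode="$am_mode+lazytime" ;;
    esac
    printf '%s\n' "$am_mode"
}

# Example call (not executed here), listing every mounted reiserfs:
#   findmnt -rno TARGET,OPTIONS -t reiserfs | while read -r target opts; do
#       printf '%s: %s\n' "$target" "$(atime_mode "$opts")"
#   done
```

For the option string shown in the original post, `atime_mode rw,relatime,lazytime,user_xattr,acl` prints `relatime+lazytime`.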
On Tue, 14 Dec 2021 15:49, David Haller <dnh@...> wrote:
Hello,
On Tue, 14 Dec 2021, Carlos E. R. wrote:
I have a problem with my one remaining reiserfs partition on rotating rust. [..] The first one is the actual mount, the rest are bind mounts - fstab:
LABEL=c_data_reiser /data/Lareiserfs reiserfs acl,user_xattr,barrier=flush,lazytime 1 3
That is not good for a newsspool. I use a loop-mounted image with reiserfs for that purpose. The relevant mount options:
[..]/news_reiserfs.img /var/spool/news reiserfs loop,acl,user_xattr,strictatime,barrier=flush,noauto 0 0
Note the 'strictatime'. The image-file is on an ext3.
Without wanting to start a filesystem war, would xfs be an alternative to reiserfs for news-spool usage? btrfs is (still) a no-go for this use case imho, and not for debate here.

- Yamaban.
On Wed, 15 Dec 2021 20:15:03 +0100 (CET) Yamaban <foerster@lisas.de> wrote:
On Tue, 14 Dec 2021 15:49, David Haller <dnh@...> wrote:
Hello,
On Tue, 14 Dec 2021, Carlos E. R. wrote:
I have a problem with my one remaining reiserfs partition on rotating rust. [..] The first one is the actual mount, the rest are bind mounts - fstab:
LABEL=c_data_reiser /data/Lareiserfs reiserfs acl,user_xattr,barrier=flush,lazytime 1 3
That is not good for a newsspool. I use a loop-mounted image with reiserfs for that purpose. The relevant mount options:
[..]/news_reiserfs.img /var/spool/news reiserfs loop,acl,user_xattr,strictatime,barrier=flush,noauto 0 0
Note the 'strictatime'. The image-file is on an ext3.
Without wanting to start a filesystem-war, would xfs be an alternative to reiserfs for a news-spool usage? btrfs is (still) a no-go for this usecase imho, and not for debate here.
xfs is regarded as good for large files and also large directories. reiserfs is good for small files and large directories. I believe a news spool is more like the small file case. But Carlos' problem isn't to do with the news spool anyway so further discussion is off-topic.
- Yamaban.
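Whether a given spool really is "the small file case" can be measured rather than guessed. A minimal sketch, assuming a 4096-byte threshold as a stand-in for one filesystem block; the function name `small_file_share` and the threshold are illustrative choices, not something stated in the thread.

```shell
#!/bin/sh
# Count how many regular files under a directory are smaller than a limit.
# $1 = directory, $2 = size limit in bytes (default 4096, an assumption).
# File names containing newlines would skew the counts.
small_file_share() {
    sfs_dir=$1
    sfs_limit=${2:-4096}
    sfs_total=$(find "$sfs_dir" -type f | wc -l | tr -d ' ')
    sfs_small=$(find "$sfs_dir" -type f -size "-${sfs_limit}c" | wc -l | tr -d ' ')
    echo "$sfs_small of $sfs_total files are smaller than $sfs_limit bytes"
}

# Example call (not executed here):
#   small_file_share /var/spool/news
```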
On 15/12/2021 20.15, Yamaban wrote:
On Tue, 14 Dec 2021 15:49, David Haller <dnh@...> wrote:
Hello,
Without wanting to start a filesystem-war, would xfs be an alternative to reiserfs for a news-spool usage? btrfs is (still) a no-go for this usecase imho, and not for debate here.
Not wanting a debate either, but I'm curious about why not btrfs, because I was considering migrating to it.

XFS should work, but would waste more disk space compared to reiserfs; it has the advantage of being actively developed, and is thread aware.

ext4 has a specific optimization for news.

--
Cheers / Saludos,
Carlos E. R.
(from openSUSE 15.2 (Legolas))
On 14/12/2021 15.49, David Haller wrote:
Hello,
On Tue, 14 Dec 2021, Carlos E. R. wrote:
I have a problem with my one remaining reiserfs partition on rotating rust. [..] The first one is the actual mount, the rest are bind mounts - fstab:
LABEL=c_data_reiser /data/Lareiserfs reiserfs acl,user_xattr,barrier=flush,lazytime 1 3
That is not good for a newsspool. I use a loop-mounted image with reiserfs for that purpose. The relevant mount options:
[..]/news_reiserfs.img /var/spool/news reiserfs loop,acl,user_xattr,strictatime,barrier=flush,noauto 0 0
Note the 'strictatime'. The image-file is on an ext3.
It makes it slower. Lazytime keeps the access time, but in RAM.

Anyway, the partition above no longer has the news spool.

--
Cheers / Saludos,
Carlos E. R.
(from openSUSE 15.2 (Legolas))
On Wed, 15 Dec 2021 23:39:01 +0100 "Carlos E. R." <robin.listas@telefonica.net> wrote:
It makes it slower. Lazytime keeps the access time, but in ram.
Well, yes of course, but the problem you're having is slow hibernation not performance-limiting normal operation as far as you've told us.
On 16/12/2021 11.05, Dave Howorth wrote:
On Wed, 15 Dec 2021 23:39:01 +0100 "Carlos E. R." <robin.listas@telefonica.net> wrote:
It makes it slower. Lazytime keeps the access time, but in ram.
Well, yes of course, but the problem you're having is slow hibernation not performance-limiting normal operation as far as you've told us.
Yes, but the partition that is having problems no longer has the news spool, which was moved to SSD. On the rotating rust reiserfs I have other data that AFAIK is not used.

No other partition on rotating disks (xfs, ext4, btrfs...) has this problem, and all are mounted with the same options. And this problem is relatively recent, maybe since Leap 15.x. What is the problem with reiserfs now?

I can, when I get back home, mount that partition "noatime", and see.

--
Cheers / Saludos,
Carlos E. R.
(from openSUSE 15.2 (Legolas))
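One way to check whether ordinary file data is waiting to be written before a sync is to read the Dirty and Writeback counters from /proc/meminfo. This is only a sketch: the function name `pending_writeback` is hypothetical, and it rests on the assumption that timestamp-only updates held back by lazytime live on in-memory inodes and may not be reflected in these page counters, so a small Dirty value would not rule them out.

```shell
#!/bin/sh
# Print the Dirty and Writeback lines from a meminfo-style file.
# $1 = file to read (defaults to /proc/meminfo).
pending_writeback() {
    awk '$1 == "Dirty:" || $1 == "Writeback:" { printf "%s %s %s\n", $1, $2, $3 }' \
        "${1:-/proc/meminfo}"
}

# Example call (not executed here), just before the sync in a hibernate script:
#   pending_writeback
```

If Dirty is only a few megabytes while sync still runs for an hour, the time is probably going into seeks for scattered metadata rather than into bulk data.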
On Tue, 14 Dec 2021 12:41:16 +0100 (CET) "Carlos E. R." <robin.listas@telefonica.net> wrote:
Hi,
I have a problem with my one remaining reiserfs partition on rotating rust.
The problem I initially detected was that telling the machine to hibernate sometimes appeared to stall, and I had to force a power off (via the physical power switch) to recover. In fact, I found out later, the kernel was simply syncing one partition, and this operation would take half an hour, which is absurd.
Later I found out that it was only one partition affected, and this partition was (is) reiserfs. Then I guessed that the slow operation was syncing the metadata (the time stamps) of the news (nntp) spool.
Traditionally a news server touches (and uses) the access timestamp each time a post is read, affecting one file per read. With the default mount options used nowadays this modification stays in RAM for a long time, many hours.
So, the cronjob that scans the news spool now also calls sync on that filesystem, and when hibernating I call sync in advance. This improved things, but not completely.
Thus I finally resorted to moving the news spool only to another reiserfs partition created on an SSD disk, which usually syncs in seconds, instead of half an hour.
So I thought, problem solved.
But later on I noticed that the sync operation prior to hibernating on the old reiserfs partition still takes long, but only on some days - even if nothing writes there now. This is astonishing.
Have you tried using strictatime (and no lazytime)? Does that cure the sync problem? FWIW I don't see delays when I run sync on my reiser partition and I use noatime so your problem is most likely either related to the specific data you store or the atime modes you choose. [snip]
On Tuesday, 2021-12-14 at 17:42 -0000, Dave Howorth wrote:
Hi,
I have a problem with my one remaining reiserfs partition on rotating rust.
...
Thus I finally resorted to moving the news spool only to another reiserfs partition created on an SSD disk, which usually syncs in seconds, instead of half an hour.
So I thought, problem solved.
But later on I noticed that the sync operation prior to hibernating on the old reiserfs partition still takes long, but only on some days - even if nothing writes there now. This is astonishing.
Have you tried using strictatime (and no lazytime)? Does that cure the sync problem?
FWIW I don't see delays when I run sync on my reiser partition and I use noatime so your problem is most likely either related to the specific data you store or the atime modes you choose.
AFAIK lazytime is now the default even if not specified. The news server specifically needs the access timestamp. That part I solved by moving it to an SSD, which means the problem was not writing a lot of data, but the seek time. Actual data written seems to be small. Being now on SSD that problem is solved.

The weird thing now is that I keep having a problem with the remaining partition on rotating rust, which is basically not used at the moment. I simply do not know what it is writing there.

To me, now, what I want is to find out what is going on rather than "solving" the problem. For solving it I would rather migrate to another filesystem, or migrate this partition to SSD as well. But I'd rather investigate, which is not easy as I cannot reproduce it at will; it just happens one day unexpectedly.

I forgot in the OP that I collected some data from atop, top, and iotop. For instance, see the large amount of RAM dedicated to cache: 16.7G (now it is 3.6G, same session still).

Telcontar:~ # df -h /data/Lareiserfs
Filesystem Size Used Avail Use% Mounted on
/dev/sdc9 64G 19G 46G 30% /data/Lareiserfs
Telcontar:~ #

It cannot be writing that partition in full; those 16.7 gigs must be from somewhere else.
ATOP - Telcontar 2021/12/13 04:32:24 -------------- 3s elapsed
PRC | sys 0.13s | user 0.46s | #proc 588 | #tslpu 3 | #zombie 0 | no procacct |
CPU | sys 4% | user 16% | irq 1% | idle 1087% | wait 92% | ipc notavail |
CPL | avg1 2.32 | avg5 1.81 | avg15 1.07 | csw 14881 | intr 8933 | numcpu 12 |
MEM | tot 31.3G | free 1.0G | cache 16.7G | buff 2.9G | slab 2.3G | hptot 0.0M |
SWP | tot 100.0G | free 98.5G | | | vmcom 20.5G | vmlim 115.7G |
DSK | sdc | busy 99% | read 0 | write 132 | MBw/s 0.3 | avio 22.5 ms |
DSK | nvme0n1 | busy 1% | read 5 | write 3 | MBw/s 0.0 | avio 2.50 ms |
NET | transport | tcpi 2 | tcpo 2 | udpi 4 | udpo 3 | tcpao 0 |
NET | network | ipi 6 | ipo 5 | ipfrw 0 | deliv 6 | icmpo 0 |
NET | eth0 0% | pcki 6 | pcko 5 | sp 1000 Mbps | si 2 Kbps | so 1 Kbps |
PID SYSCPU USRCPU VGROW RGROW RUID EUID ST EXC THR S CPUNR CPU CMD 1/4
6208 0.02s 0.09s 0K 0K cer cer -- - 4 S 5 4% xfce4-terminal
3292 0.01s 0.09s 512K 0K root root -- - 21 S 3 3% X
23817 0.01s 0.04s 0K 180K cer cer -- - 294 S 0 2% firefox
9224 0.01s 0.04s 0K 0K root root -- - 1 S 2 2% iotop
24119 0.01s 0.03s 0K 752K cer cer -- - 30 S 9 1% Web Content
24146 0.00s 0.04s 0K 0K cer cer -- - 53 S 0 1% Web Content
27548 0.00s 0.03s 0K 0K cer cer -- - 26 S 5 1% soffice.bin
9196 0.01s 0.02s 0K 0K cer cer -- - 1 R 4 1% atop
24173 0.02s 0.00s 0K 0K cer cer -- - 27 S 7 1% Web Content
6012 0.00s 0.02s 0K 0K cer cer -- - 5 S 11 1% xfwm4
6199 0.02s 0.00s 0K 0K cer cer -- - 4 S 5 1% gkrellm
24038 0.00s 0.01s 0K 0K cer cer -- - 27 S 5 0% Web Content
28630 0.00s 0.01s 0K 0K cer cer -- - 22 S 3 0% FoxitReader
8478 0.00s 0.01s 0K 0K cer cer -- - 4 S 9 0% gnote

top - 04:33:00 up 1 day, 16:37, 2 users, load average: 3,05, 2,04, 1,17
Tasks: 587 total, 2 running, 585 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1,2 us, 0,3 sy, 0,0 ni, 90,3 id, 8,1 wa, 0,0 hi, 0,1 si, 0,0 st
KiB Mem : 32821868 total, 1059992 free, 9153492 used, 22608384 buff/cache
KiB Swap: 10485760+total, 10328782+free, 1569772 used. 22622632 avail Mem

PID USER PR NI VIRT RES SHR SWAP S %CPU %MEM TIME+ COMMAND
6208 cer 20 0 1336668 100996 35516 1436 S 3,593 0,308 4:03.50 xfce4-terminal
3292 root 20 0 2458604 489160 433484 31388 S 2,994 1,490 21:03.53 X
9196 cer 20 0 33856 14584 2784 0 S 2,096 0,044 16:50.35 atop
9224 root 20 0 54612 16296 3916 0 S 1,796 0,050 26:03.92 iotop
23817 cer 20 0 9,842g 2,961g 613656 0 S 1,796 9,459 69:02.49 firefox
24146 cer 20 0 3488196 401800 156764 0 S 1,497 1,224 8:36.14 Web Content
6199 cer 20 0 482696 17728 15608 2948 S 1,198 0,054 12:35.49 gkrellm
24119 cer 20 0 3259748 456988 144696 0 S 1,198 1,392 13:32.64 Web Content
24173 cer 20 0 2718164 207776 110820 0 S 0,898 0,633 6:40.33 Web Content
27548 cer 20 0 12,855g 532268 117972 0 S 0,898 1,622 12:30.20 soffice.bin
9220 cer 20 0 45480 4620 3388 0 R 0,599 0,014 6:07.14 top
11 root 20 0 0 0 0 0 I 0,299 0,000 0:44.75 rcu_sched
6019 cer 20 0 1875884 11872 8848 1360 S 0,299 0,036 7:20.29 pulseaudio
6080 cer 20 0 556252 31056 21364 1476 S 0,299 0,095 1:23.29 xfce4-panel
6202 cer 20 0 482600 17052 15464 3420 S 0,299 0,052 4:48.37 gkrellm
28630 cer 20 0 2173532 100388 48228 0 S 0,299 0,306 0:59.64 FoxitReader
30669 root 20 0 0 0 0 0 I 0,299 0,000 0:01.34 kworker/3:2-mm_pe+
1 root 20 0 222228 10832 7220 496 S 0,000 0,033 0:17.80 systemd

Total DISK READ : 0.00 B/s | Total DISK WRITE : 35.32 K/s
Actual DISK READ: 0.00 B/s | Actual DISK WRITE: 120.09 K/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
32132 be/4 root 0.00 B/s 2.35 K/s 0.00 % 99.99 % [kworker/u64:3+flush-8:32]
1456 be/3 root 0.00 B/s 11.77 K/s 0.00 % 74.69 % [jbd2/sdc5-8]
24432 be/4 cer 0.00 B/s 12.95 K/s 0.00 % 10.13 % firefox -P Small [glean.dispatche]
490 be/3 root 0.00 B/s 0.00 B/s 0.00 % 0.27 % [jbd2/nvme0n1p5-]
2080 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.09 % [kworker/0:0-events]
23889 be/4 cer 0.00 B/s 3.53 K/s 0.00 % 0.00 % firefox -P Small [Cache2 I/O]
28752 be/4 cer 0.00 B/s 2.35 K/s 0.00 % 0.00 % soffice.bin /home~s --splash-pipe=9
3168 be/4 cer 0.00 B/s 2.35 K/s 0.00 % 0.00 % firefox -P Small [IndexedDB #643]

cer@Telcontar:~> uptime
04:34:10 up 1 day 16:38, 2 users, load average: 3,18, 2,30, 1,33
cer@Telcontar:~>

Now:

ATOP - Telcontar 2021/12/14 20:22:13 -------------- 3s elapsed
PRC | sys 0.28s | user 1.18s | #proc 586 | #tslpu 0 | #zombie 0 | no procacct |
CPU | sys 11% | user 40% | irq 1% | idle 1135% | wait 14% | ipc notavail |
CPL | avg1 0.62 | avg5 0.72 | avg15 0.76 | csw 116872 | intr 16899 | numcpu 12 |
MEM | tot 31.3G | free 18.9G | cache 3.6G | buff 447.1M | slab 523.9M | hptot 0.0M |
SWP | tot 100.0G | free 97.9G | | | vmcom 20.3G | vmlim 115.7G |
DSK | sdc | busy 18% | read 0 | write 11 | MBw/s 0.1 | avio 48.7 ms |
NET | transport | tcpi 3 | tcpo 5 | udpi 4 | udpo 4 | tcpao 1 |
NET | network | ipi 7 | ipo 9 | ipfrw 0 | deliv 7 | icmpo 0 |
NET | eth0 0% | pcki 7 | pcko 9 | sp 1000 Mbps | si 3 Kbps | so 2 Kbps |
PID SYSCPU USRCPU VGROW RGROW RUID EUID ST EXC THR S CPUNR CPU CMD 1/4
32422 0.17s 0.74s 0K 3524K cer cer -- - 31 S 6 30% Web Content
32108 0.02s 0.10s 0K -172K cer cer -- - 294 S 11 4% firefox
6208 0.00s 0.09s 0K 0K cer cer -- - 4 S 8 3% xfce4-terminal
3292 0.02s 0.06s -7588K 0K root root -- - 21 R 2 3% X
9224 0.01s 0.05s 0K 0K root root -- - 1 S 5 2% iotop
32395 0.00s 0.05s 0K 0K cer cer -- - 27 R 1 2% Web Content
32476 0.00s 0.03s 0K 0K cer cer -- - 27 S 6 1% Web Content

top - 20:22:26 up 3 days, 8:26, 2 users, load average: 0,48, 0,68, 0,74
Tasks: 586 total, 2 running, 584 sleeping, 0 stopped, 0 zombie
%Cpu(s): 3,0 us, 0,8 sy, 0,0 ni, 95,7 id, 0,3 wa, 0,0 hi, 0,2 si, 0,0 st
KiB Mem : 32821868 total, 19834772 free, 8492224 used, 4494872 buff/cache
KiB Swap: 10485760+total, 10264475+free, 2212844 used. 23182980 avail Mem

PID USER PR NI VIRT RES SHR SWAP S %CPU %MEM TIME+ COMMAND
32422 cer 20 0 9559388 599392 140360 0 R 31,04 1,826 97:08.64 Web Content
32108 cer 20 0 9440412 3,026g 845168 0 S 4,179 9,667 26:58.24 firefox
3292 root 20 0 2496120 439412 383896 32792 S 2,388 1,339 40:41.51 X
6208 cer 20 0 1338308 81064 24764 12436 S 2,090 0,247 10:10.32 xfce4-terminal
9224 root 20 0 55852 17180 4520 924 S 1,791 0,052 52:38.28 iotop
32395 cer 20 0 2801276 277704 123424 0 S 1,493 0,846 8:58.26 Web Content
32476 cer 20 0 2925004 262380 147960 0 S 1,194 0,799 5:43.08 Web Content
6199 cer 20 0 482696 14568 12032 2548 S 0,896 0,044 25:44.48 gkrellm
9196 cer 20 0 32004 12444 2856 116 S 0,896 0,038 34:03.05 atop
6019 cer 20 0 1613740 11460 8564 1488 S 0,597 0,035 15:39.60 pulseaudio
11 root 20 0 0 0 0 0 I 0,299 0,000 1:30.46 rcu_sched

Total DISK READ : 0.00 B/s | Total DISK WRITE : 61.23 K/s
Actual DISK READ: 0.00 B/s | Actual DISK WRITE: 147.18 K/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
1456 be/3 root 0.00 B/s 3.53 K/s 0.00 % 4.78 % [jbd2/sdc5-8]
26167 be/4 root 0.00 B/s 4.71 K/s 0.00 % 2.73 % [kworker/9:2-events]
490 be/3 root 0.00 B/s 0.00 B/s 0.00 % 0.27 % [jbd2/nvme0n1p5-]
28277 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.09 % [kworker/0:0-events]
32187 be/4 cer 0.00 B/s 3.53 K/s 0.00 % 0.01 % firefox -P Small [Cache2 I/O]
32644 be/4 cer 0.00 B/s 47.10 K/s 0.00 % 0.00 % firefox -P Small [localStorage DB]
27554 be/4 root 0.00 B/s 2.35 K/s 0.00 % 0.00 % [kworker/u64:3-events_unbound]

cer@Telcontar:~> uptime
20:18:37 up 3 days 8:22, 2 users, load average: 0,50, 0,78, 0,77
cer@Telcontar:~>

--
Cheers,
Carlos E. R.
(from openSUSE 15.2 x86_64 at Telcontar)
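The iotop entry `[kworker/u64:3+flush-8:32]` carries a major:minor device number, which can be resolved to a device name through sysfs. A minimal sketch, assuming the usual `/sys/dev/block/MAJ:MIN` symlinks; the function names and the `SYSFS_ROOT` override are hypothetical, the latter existing only so the lookup can be tested against a fake tree. On a conventional SCSI/SATA numbering 8:32 would be the whole disk sdc, which fits the atop line showing sdc 99% busy, but that mapping should be checked on the machine itself.

```shell
#!/bin/sh
# Extract the MAJ:MIN pair from a writeback kworker name such as
# "[kworker/u64:3+flush-8:32]". Prints nothing if there is no flush tag.
flush_majmin() {
    printf '%s\n' "$1" | sed -n 's/.*flush-\([0-9][0-9]*:[0-9][0-9]*\).*/\1/p'
}

# Resolve MAJ:MIN to a kernel device name by following the sysfs symlink.
# SYSFS_ROOT is a hypothetical override used only for testing.
majmin_to_dev() {
    basename "$(readlink -f "${SYSFS_ROOT:-/sys}/dev/block/$1")"
}

# Example call (not executed here):
#   majmin_to_dev "$(flush_majmin '[kworker/u64:3+flush-8:32]')"
```

Note that the flush thread is tagged with the whole-disk number, so it does not say which partition on that disk is being written.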
Carlos E. R. wrote:
Traditionally a news server touches (and uses) the access timestamp each time a post is read, affecting one file per read. With the default mount options used nowadays this modification stays in RAM for a long time, many hours.
Might the solution simply be to change the mount options, i.e. remove lazytime?

--
Per Jessen, Zürich (2.0°C)
On 2021-12-14 12:41, Carlos E. R. wrote:
Hi,
I have a problem with my one remaining reiserfs partition on rotating rust.
The problem I initially detected was that telling the machine to hibernate sometimes appeared to stall, and I had to force a power off (via the physical power switch) to recover. In fact, I found out later, the kernel was simply syncing one partition, and this operation would take half an hour, which is absurd.
Later I found out that it was only one partition affected, and this partition was (is) reiserfs. Then I guessed that the slow operation was syncing the metadata (the time stamps) of the news (nntp) spool.
Traditionally a news server touches (and uses) the access timestamp each time a post is read, affecting one file per read. With the default mount options used nowadays this modification stays in RAM for a long time, many hours.
So, the cronjob that scans the news spool now also calls sync on that filesystem, and when hibernating I call sync in advance. This improved things, but not completely.
Thus I finally resorted to moving the news spool only to another reiserfs partition created on an SSD disk, which usually syncs in seconds, instead of half an hour.
So I thought, problem solved.
But later on I noticed that the sync operation prior to hibernating on the old reiserfs partition still takes long, but only on some days - even if nothing writes there now. This is astonishing. Look, last Monday:
Telcontar:~ # hibernate
2021-12-13 04:26:18+01:00
Checking news to send
2021-12-13 04:26:22+01:00
Syncing
rm: cannot remove '/var/lib/news/bin/cronscriptparafetchnews.enabled': No such file or directory
2021-12-13 06:06:09+01:00
synced. Now screensaver
xscreensaver-command: no screensaver is running on display :0.0

touch /var/lib/news/bin/cronscriptparafetchnews.enabled
Telcontar:~ #
An hour and a half to sync a not used partition!
(I know it is only one partition because I monitor it with gkrellm)
Hibernation script:
date --rfc-3339=seconds
echo Checking news to send
/var/lib/news/bin/cronscriptparaenviarnewspendientes
date --rfc-3339=seconds
echo Syncing
rm /var/lib/news/bin/cronscriptparafetchnews.enabled
strace -o /tmp/hibernate.strace sync
date --rfc-3339=seconds
echo "synced. Now screensaver"
xscreensaver-command -lock
sleep 3
#sudo chvt 10
#sleep 1
sudo /usr/local/sbin/beep
sleep 1

#systemctl hibernate
sudo /usr/bin/systemctl hibernate
The structure of the mount is this:
Telcontar:~ # mount | grep sdc9
/dev/sdc9 on /data/Lareiserfs type reiserfs (rw,relatime,lazytime,user_xattr,acl)
/dev/sdc9 on /usr/src type reiserfs (rw,relatime,lazytime,user_xattr,acl)
/dev/sdc9 on /home/cer/terrasync type reiserfs (rw,relatime,lazytime,user_xattr,acl)
/dev/sdc9 on /data/homedvl type reiserfs (rw,relatime,lazytime,user_xattr,acl)
/dev/sdc9 on /usr/share/flightgear type reiserfs (rw,relatime,lazytime,user_xattr,acl)
Telcontar:~ #
The first one is the actual mount, the rest are bind mounts - fstab:
LABEL=c_data_reiser /data/Lareiserfs reiserfs acl,user_xattr,barrier=flush,lazytime 1 3
/data/Lareiserfs/gamedata/flightgear /usr/share/flightgear none bind 0 0
/data/Lareiserfs/gamedata/terrasync /home/cer/terrasync none bind 0 0
/data/Lareiserfs/data_homedvl /data/homedvl none bind 0 0
/data/Lareiserfs/usr_src /usr/src none bind 0 0
Now I have added entries to my hibernate script to sync each reiserfs bind mount separately, find out which one it is, if any.
When I started upgrading my machines to 15.3, I learned that reiserfs is no longer supported (this is not mentioned in the release notes), so I finally migrated this partition to ext4. Apparently both XFS and btrfs behave badly with many small files, and ext4 has a tuning for news:

Telcontar:~ # mkfs.ext4 -T news -L c_data_exreiser /dev/sdc9

From the mke2fs man page:

-T usage-type[,...]
Specify how the filesystem is going to be used, so that mke2fs can choose optimal filesystem parameters for that use. The usage types that are supported are defined in the configuration file /etc/mke2fs.conf. The user may specify one or more usage types using a comma separated list.

If this option is not specified, mke2fs will pick a single default usage type based on the size of the filesystem to be created. If the filesystem size is less than 3 megabytes, mke2fs will use the filesystem type floppy. If the filesystem size is greater than or equal to 3 but less than 512 megabytes, mke2fs(8) will use the filesystem type small. If the filesystem size is greater than or equal to 4 terabytes but less than 16 terabytes, mke2fs(8) will use the filesystem type big. If the filesystem size is greater than or equal to 16 terabytes, mke2fs(8) will use the filesystem type huge. Otherwise, mke2fs(8) will use the default filesystem type default.

/etc/mke2fs.conf:

news = {
inode_ratio = 4096
}

(default is 16384)

Mount options are:

LABEL=c_data_exreiser /data/WasReiserfs ext4 acl,user_xattr,relatime,lazytime 1 3

which results in:

/dev/sdc9 on /data/WasReiserfs type ext4 (rw,relatime,lazytime)

(why the other options are ignored I don't know, but I suspect systemd interfering somehow).

I do not observe any delay in hibernating the machine or issuing sync commands, which complete almost instantly (5 seconds for the entire filesystem), so my take is that currently reiserfs has a problem with lazytime that other filesystems do not have.

--
Cheers / Saludos,
Carlos E. R.
(from 15.3 x86_64 at Telcontar)
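The effect of `inode_ratio` is simple arithmetic: it is the number of bytes of filesystem space per inode, so a smaller ratio means more inodes. A rough sketch, assuming the 64G shown by `df -h` means 64 GiB and ignoring any rounding mke2fs applies; the function name `inodes_for` is hypothetical.

```shell
#!/bin/sh
# Approximate inode count for a filesystem.
# $1 = filesystem size in GiB, $2 = inode_ratio (bytes per inode).
inodes_for() {
    echo $(( $1 * 1024 * 1024 * 1024 / $2 ))
}

# Example calls (not executed here):
#   inodes_for 64 4096     # the "news" usage type
#   inodes_for 64 16384    # the default ratio
```

Under these assumptions the news profile gives about 16.8 million inodes on this partition against about 4.2 million with the default, four times as many, which is the point of the tuning for a spool of small files.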
participants (5)
- Carlos E. R.
- Dave Howorth
- David Haller
- Per Jessen
- Yamaban