[Bug 1156421] New: devel:kubic:images: no persistent systemd journal for aarch64/armv7l
http://bugzilla.suse.com/show_bug.cgi?id=1156421

Bug ID: 1156421
Summary: devel:kubic:images: no persistent systemd journal for aarch64/armv7l
Classification: openSUSE
Product: openSUSE Tumbleweed
Version: Current
Hardware: Other
OS: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: MicroOS
Assignee: fvogt@suse.com
Reporter: kukuk@suse.com
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

The images for Raspberry Pi 2/3 from devel:kubic:images have no persistent
systemd journal logging, while the x86-64 image does. The /var/log/journal
directory seems to exist; no idea why the log is written to /run/log/journal
instead.
http://bugzilla.suse.com/show_bug.cgi?id=1156421#c1

Fabian Vogt <fvogt@suse.com> changed:
 What  |Removed |Added
 ----------------------------------------------------------------------------
 CC    |        |fbui@suse.com
 Flags |        |needinfo?(fbui@suse.com)

--- Comment #1 from Fabian Vogt <fvogt@suse.com> ---
Created attachment 823824 --> http://bugzilla.suse.com/attachment.cgi?id=823824&action=edit (journalctl)

What's in /var/log/journal and /run/log/journal?

I think I can reproduce this here. The issue appears to be that
systemd-journal-flush.service runs, but somehow does not cause a flush.

Snippet from the journal during bootup:

  Oct 31 07:32:11 localhost systemd[1]: Starting Flush Journal to Persistent Storage...
  Oct 31 07:32:11 localhost systemd-journald[521]: Runtime Journal (/run/log/journal/88c7cea6795b4428909767e3e81c15bd) is 5.9M, max 47.2M, 41.3M free.
  Oct 31 07:32:11 localhost kernel: audit: type=1400 audit(1572507131.859:6): apparmor="STATUS" operation="profile_load" profile="unconfined" name="klogd" pid=584 comm="apparmor_parser"
  Oct 31 07:32:11 localhost systemd[1]: Started Flush Journal to Persistent Storage.

That apparently didn't work; it is still writing to /run/log/journal.

Snippet from the journal after restarting systemd-journal-flush.service
manually:

  Oct 31 07:44:53 localhost systemd[1]: Starting Flush Journal to Persistent Storage...
  Oct 31 07:44:53 localhost systemd-journald[521]: Time spent on flushing to /var is 108.202ms for 876 entries.
  Oct 31 07:44:53 localhost systemd-journald[521]: System Journal (/var/log/journal/88c7cea6795b4428909767e3e81c15bd) is 24.0M, max 1.4G, 1.4G free.
  Oct 31 07:44:53 localhost tallow[960]: Journal was rotated, resetting
  Oct 31 07:44:53 localhost systemd[1]: Started Flush Journal to Persistent Storage.

Here it switched to /var successfully. So the question is why this does not
happen (reliably?) when running during boot?
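For reference, a quick way to check which storage journald is currently using
(a hedged sketch; the machine-ID directory names vary per system):

ls -lR /run/log/journal /var/log/journal          # which tree holds the journal files?
journalctl -b -u systemd-journal-flush.service    # what did the flush service log this boot?
systemctl restart systemd-journal-flush.service   # re-trigger the flush manually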
http://bugzilla.suse.com/show_bug.cgi?id=1156421#c2

--- Comment #2 from Fabian Vogt <fvogt@suse.com> ---
My debugging attempts have led nowhere so far; the journal is really not
helpful.

Using

  ExecStart=/bin/sh -c "sleep 5; journalctl --flush"

in systemd-journal-flush.service seems to have worked around it, so it
appears to be a race.
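A minimal sketch of how that workaround could be applied as a drop-in, without
editing the packaged unit (the file name and the 5-second delay are arbitrary;
this only papers over the suspected race):

mkdir -p /etc/systemd/system/systemd-journal-flush.service.d
cat > /etc/systemd/system/systemd-journal-flush.service.d/workaround.conf <<'EOF'
[Service]
# Clear the packaged ExecStart, then retry the flush after a delay
ExecStart=
ExecStart=/bin/sh -c "sleep 5; journalctl --flush"
EOF
systemctl daemon-reload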
http://bugzilla.suse.com/show_bug.cgi?id=1156421#c3

Franck Bui <fbui@suse.com> changed:
 What  |Removed                  |Added
 ----------------------------------------------------------------------------
 Flags |needinfo?(fbui@suse.com) |

--- Comment #3 from Franck Bui <fbui@suse.com> ---
(In reply to Fabian Vogt from comment #1)
> Here it switched to /var successfully. So the question is why this does not
> happen (reliably?) when running during boot?

Hmm... the only reason I can see is that /var is still a RO partition at the
time the journal is flushed, and therefore the system journal couldn't be
created in /var.

But then you would get an explicit error on the next reboot, since this time
the journal file was already created when you flushed the journal manually,
and journald would fail at opening it RW.

Can you see such an error?
http://bugzilla.suse.com/show_bug.cgi?id=1156421#c4

--- Comment #4 from Franck Bui <fbui@suse.com> ---
BTW which version of systemd are you running?
http://bugzilla.suse.com/show_bug.cgi?id=1156421#c5

--- Comment #5 from Franck Bui <fbui@suse.com> ---
(In reply to Fabian Vogt from comment #1)
> I think I can reproduce this here. The issue appears to be that
> systemd-journal-flush.service runs, but somehow does not cause a flush.

Any chance I can reproduce it locally, maybe with qemu?
http://bugzilla.suse.com/show_bug.cgi?id=1156421#c6

--- Comment #6 from Fabian Vogt <fvogt@suse.com> ---
(In reply to Franck Bui from comment #3)
> (In reply to Fabian Vogt from comment #1)
> > Here it switched to /var successfully. So the question is why this does
> > not happen (reliably?) when running during boot?
>
> Hmm... the only reason I can see is that /var is still a RO partition at the
> time the journal is flushed, and therefore the system journal couldn't be
> created in /var.

/var is mounted read-write in the initrd already. So unless something
re-mounts it multiple times during boot, I don't think that's the case.

> But then you would get an explicit error on the next reboot, since this time
> the journal file was already created when you flushed the journal manually,
> and journald would fail at opening it RW.
>
> Can you see such an error?

Nope. The journal seems corrupted though, for some reason.

(In reply to Franck Bui from comment #4)
> BTW which version of systemd are you running?

v243 - do you need the full rpm version?

(In reply to Franck Bui from comment #5)
> Any chance I can reproduce it locally, maybe with qemu?

Might be possible with software emulation, I'll try.
http://bugzilla.suse.com/show_bug.cgi?id=1156421#c7

--- Comment #7 from Thorsten Kukuk <kukuk@suse.com> ---
(In reply to Fabian Vogt from comment #6)
> (In reply to Franck Bui from comment #3)
> > Hmm... the only reason I can see is that /var is still a RO partition at
> > the time the journal is flushed, and therefore the system journal couldn't
> > be created in /var.
>
> /var is mounted read-write in the initrd already. So unless something
> re-mounts it multiple times during boot, I don't think that's the case.

systemd remounts it shortly after leaving the initrd; at least it did so when
we debugged similar problems with AppArmor. There we hit the same issue: even
though the filesystem was mounted read-write in the initrd, systemd remounted
it read-only and only later read-write again, and loading the AppArmor
profiles failed due to the read-only /var.
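One way to check whether it's really the remount unit flipping /var (a sketch;
assumes the affected system stays up long enough to inspect):

journalctl -b -u systemd-remount-fs.service   # did the remount unit run, and when?
findmnt -o TARGET,OPTIONS /                   # current ro/rw state of /
findmnt -o TARGET,OPTIONS /var                # current ro/rw state of /var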
http://bugzilla.suse.com/show_bug.cgi?id=1156421#c8

--- Comment #8 from Fabian Vogt <fvogt@suse.com> ---
(In reply to Franck Bui from comment #5)
> Any chance I can reproduce it locally, maybe with qemu?

(In reply to Fabian Vogt from comment #6)
> Might be possible with software emulation, I'll try.

It hangs after dracut detects the root device, so something is broken. It's
really slow as well, so I don't recommend it.
http://bugzilla.suse.com/show_bug.cgi?id=1156421#c9

--- Comment #9 from Fabian Vogt <fvogt@suse.com> ---
Created attachment 824579 --> http://bugzilla.suse.com/attachment.cgi?id=824579&action=edit (strace of journald during systemd-journal-flush.service failing)

I added an strace call to systemd-journal-flush.service and rebooted, and
luckily it still broke. The issue seems to be that /var is indeed still
read-only:

  writev(6, [{iov_base="<46>", iov_len=4}, {iov_base="systemd-journald", iov_len=16}, {iov_base="[815]: ", iov_len=7}, {iov_base="Received client request to flush"..., iov_len=49}, {iov_base="\n", iov_len=1}], 5) = 77
  mkdirat(AT_FDCWD, "/var/log/journal/ec14ed36308c4af2b46995c50601f402", 0755) = -1 EROFS (Read-only file system)
  openat(AT_FDCWD, "/var/log/journal/ec14ed36308c4af2b46995c50601f402/system.journal", O_RDWR|O_CREAT|O_NONBLOCK|O_CLOEXEC, 0640) = -1 ENOENT (No such file or directory)
  openat(AT_FDCWD, "/run/log/journal/ec14ed36308c4af2b46995c50601f402", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 24
http://bugzilla.suse.com/show_bug.cgi?id=1156421#c10

--- Comment #10 from Fabian Vogt <fvogt@suse.com> ---
Created attachment 824585 --> http://bugzilla.suse.com/attachment.cgi?id=824585&action=edit (journal with systemd.log_level=debug)
http://bugzilla.suse.com/show_bug.cgi?id=1156421#c11

--- Comment #11 from Franck Bui <fbui@suse.com> ---
Fabian, can you please provide the debug logs again, but also with the
"printk.devkmsg=on" option so the kernel stops dropping messages?
http://bugzilla.suse.com/show_bug.cgi?id=1156421#c12

--- Comment #12 from Fabian Vogt <fvogt@suse.com> ---
Created attachment 824907 --> http://bugzilla.suse.com/attachment.cgi?id=824907&action=edit (journal with systemd.log_level=debug printk.devkmsg=on)
http://bugzilla.suse.com/show_bug.cgi?id=1156421#c13

--- Comment #13 from Franck Bui <fbui@suse.com> ---
Thanks.

The logs are broken due to the unreliable clock on your system (the debug logs
with printk.devkmsg=on showed that the system switched to rootfs twice!), so
I'm not 100% sure, but I can't see any hint of systemd unmounting or
remounting /var RO after leaving the initrd.

According to our private chat, /var is mounted RW just before switching to
rootfs. So either something in userspace unmounts it or remounts it RO.

Can you replace /usr/bin/mount and /usr/bin/umount with wrappers that log any
attempt to mount/remount/umount /var before doing the actual work, in order to
make sure that the issue lives in userspace?
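A minimal sketch of such a wrapper, assuming the real binary is first moved
aside to /usr/bin/mount.real (the same pattern applies to umount); it logs to
/dev/kmsg so the messages survive even while journald's storage is the thing
being debugged:

#!/bin/sh
# Hypothetical /usr/bin/mount wrapper: log any invocation mentioning /var,
# then hand off to the real binary.
case "$*" in
  *"/var"*) echo "mount wrapper: $*" > /dev/kmsg ;;
esac
exec /usr/bin/mount.real "$@"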
http://bugzilla.suse.com/show_bug.cgi?id=1156421#c14

--- Comment #14 from Fabian Vogt <fvogt@suse.com> ---
(In reply to Franck Bui from comment #13)
> Can you replace /usr/bin/mount and /usr/bin/umount with wrappers that log
> any attempt to mount/remount/umount /var before doing the actual work, in
> order to make sure that the issue lives in userspace?

I tried that (but logging to /run, as /var is not writable all the time...)
and didn't get any call for /var or /etc in there.

I added "findmnt" to systemd-journal-flush.service now, let's see what
happens.
http://bugzilla.suse.com/show_bug.cgi?id=1156421#c15

--- Comment #15 from Fabian Vogt <fvogt@suse.com> ---
Progress! The culprit seems to be "mount -o remount /". Looks harmless, but
actually isn't - it remounts everything on the same btrfs partition read-only,
for some reason. I suspect it's because "rw" isn't explicitly set as an
option, but that doesn't explain why:

a) It's initially mounted rw
b) It's mounted rw again later
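For context: when mount(8) is given only a mountpoint on remount, it merges in
the options from the matching /etc/fstab entry, which is how the "ro" sneaks
in. A quick way to confirm (a sketch; the fstab line shown is an example, not
taken from the affected image):

grep -E '[[:space:]]/[[:space:]]' /etc/fstab
# e.g.: /dev/sda2  /  btrfs  ro  0 0
mount -o remount /            # effectively "mount -o remount,ro /"
findmnt -o TARGET,OPTIONS /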
http://bugzilla.suse.com/show_bug.cgi?id=1156421#c16

Fabian Vogt <fvogt@suse.com> changed:
 What      |Removed        |Added
 ----------------------------------------------------------------------------
 Status    |NEW            |CONFIRMED
 CC        |               |iforster@suse.com
 Component |MicroOS        |Kernel
 Assignee  |fvogt@suse.com |kernel-maintainers@forge.provo.novell.com

--- Comment #16 from Fabian Vogt <fvogt@suse.com> ---
The kernel code in fs/btrfs/super.c has special handling for SB_RDONLY on
mounts, which results in somewhat surprising behaviour.

So here's my current theory on what actually happens on bootup:

In the initrd:
1. /sysroot gets mounted ro. The btrfs superblock is set read-only.
2. /sysroot/var gets mounted rw. The btrfs superblock is set read-write.

After switch-root:
3. systemd-remount-fs does "mount -o remount /", which picks up the "ro" from
   /etc/fstab, resulting in "mount -o remount,ro /". The kernel now tries to
   make the superblock read-only, which succeeds as no file is opened with
   write mode. As a result, /var is now ro as well.
4. systemd-journal-flush.service runs, with /var read-only. It fails.
5. local-fs.target starts, calling "mount /opt", which picks up the "rw" from
   /etc/fstab. This makes the superblock read-write, so /var is read-write
   again.
6. The mount for / remains read-only, as it's an explicit read-only mount.

For context, a discussion on IRC:

[15:11] <fvogt> Quick btrfs question: If I mount -o remount,ro /opt, all other btrfs subvols (/var, /root, etc.) are also remounted read-only. Is that intentional?
[15:14] <dsterba> fvogt: yes, read-only applies to the superblock that's shared for all subvolumes
[15:15] <dsterba> but I think there's a case where some of the subvolume mounts can be made read-only (while the filesystem is read-write, iow the read-only flag applies only to the mount point itself)
[15:15] <fvogt> dsterba: Right, but specific subvolumes can be mounted ro on their own - only remount has this behaviour of touching other subvols
[15:16] <fvogt> On the system I have here this breaks in weird ways, as / is read-only while /var is mounted explicitly rw. That works, until systemd comes along and does "mount -o remount /"...
[15:17] <dsterba> looking at code, remount touches the superblock flags
[15:24] <fvogt> Ok, so everything makes sense now, in a surprisingly complicated way... So the way fstab is set up works more or less only by accident, as systemd mounts some other subvolumes rw later on boot, which makes the superblock read-write again
[15:25] <fvogt> Would it be possible to only make the SB read-only on remount if all subtree mounts are r-o? That would IMO be the "expected behaviour", but a behaviour change...
[15:28] <dsterba> I think that's not possible, VFS does not provide 'list of other subtree mounts' at the .remount callback
[15:29] <fvogt> Currently it's "by chance" whether the superblock gets RDONLY on a remount,rw, depending on whether any file is opened with write mode
[15:29] <fvogt> *...on a remount,ro of a subvol, depending on...
[15:29] <dsterba> a file opened read-write will block rw->ro remount in general
[15:30] <fvogt> Hm, which is also somewhat surprising
[15:33] <dsterba> shouldn't be, that's VFS internal synchronization mechanism, all write requests increment a counter somewhere and it's checked at times when the rw->ro is attempted
[15:33] <fvogt> There is special handling for the "-EBUSY because of SB_RDONLY" case in super.c:1623, it retries vfs_kern_mount with flags&~SB_RDONLY
[15:33] <dsterba> like switching read/write locks from one state to another
[15:35] <dsterba> the EBUSY case is to support different ro/rw subvolume mounts, introduced in 0723a0473fb48a1c93b113a

So a workaround would be to put "After=local-fs.target" or something like that
in systemd-journal-flush.service, but I'm sure upstream systemd won't like
that...

I'm reassigning to kernel now, as IMO the behaviour in step 3 is unexpected
and looks like a bug. If changing this behaviour isn't possible, we'd have to
add another hack^Wcustomization to read-only-root-fs :-/
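A sketch of that ordering workaround as a drop-in (untested here, and an
assumption on my part - presumably upstream would object because flushing the
journal as early as possible is intentional):

mkdir -p /etc/systemd/system/systemd-journal-flush.service.d
cat > /etc/systemd/system/systemd-journal-flush.service.d/ordering.conf <<'EOF'
[Unit]
# Wait until all local mounts (step 5 above, which makes the btrfs
# superblock read-write again) are done before flushing.
After=local-fs.target
EOF
systemctl daemon-reload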
http://bugzilla.suse.com/show_bug.cgi?id=1156421

Takashi Iwai <tiwai@suse.com> changed:
 What |Removed |Added
 ----------------------------------------------------------------------------
 CC   |        |dsterba@suse.com, rgoldwyn@suse.com, tiwai@suse.com
http://bugzilla.suse.com/show_bug.cgi?id=1156421#c17

--- Comment #17 from Swamp Workflow Management <swamp@suse.de> ---
This is an autogenerated message for OBS integration:
This bug (1156421) was mentioned in
https://build.opensuse.org/request/show/753160 Factory / read-only-root-fs
http://bugzilla.suse.com/show_bug.cgi?id=1156421#c18

Miroslav Beneš <mbenes@suse.com> changed:
 What |Removed |Added
 ----------------------------------------------------------------------------
 CC   |        |mbenes@suse.com

--- Comment #18 from Miroslav Beneš <mbenes@suse.com> ---
David, Goldwyn, do you consider the behaviour Fabian described in comment 16
as step 3 a bug in the kernel? If not, should we reassign?

Fabian, I assume the issue still exists with a newer kernel. Is that correct?
http://bugzilla.suse.com/show_bug.cgi?id=1156421

Goldwyn Rodrigues <rgoldwyn@suse.com> changed:
 What     |Removed                                   |Added
 ----------------------------------------------------------------------------
 Assignee |kernel-maintainers@forge.provo.novell.com |rgoldwyn@suse.com
http://bugzilla.suse.com/show_bug.cgi?id=1156421#c20

Fabian Vogt <fvogt@suse.com> changed:
 What |Removed |Added
 ----------------------------------------------------------------------------
 CC   |        |fvogt@suse.com

--- Comment #20 from Fabian Vogt <fvogt@suse.com> ---
For some reason I'm not in CC here, so I didn't get a notification mail...

(In reply to Miroslav Beneš from comment #18)
> David, Goldwyn, do you consider the behaviour Fabian described in comment 16
> as step 3 a bug in the kernel? If not, should we reassign?
>
> Fabian, I assume the issue still exists with a newer kernel. Is that
> correct?

I don't know, but I would assume it's still the case.
http://bugzilla.suse.com/show_bug.cgi?id=1156421#c21

--- Comment #21 from Fabian Vogt <fvogt@suse.com> ---
(In reply to Fabian Vogt from comment #20)
> (In reply to Miroslav Beneš from comment #18)
> > Fabian, I assume the issue still exists with a newer kernel. Is that
> > correct?
>
> I don't know, but I would assume it's still the case.

Just reproduced the behaviour on 5.6.0 as well:

dd if=/dev/zero of=btrfsfs seek=240 count=0 bs=1M
mkfs.btrfs btrfsfs
mount btrfsfs /mnt
btrfs subvol create /mnt/sv
mount -o remount,ro /mnt
mount -o subvol=/sv btrfsfs /mnt/sv
findmnt   # /mnt is RO, /mnt/sv RW
mount -o remount,ro /mnt
findmnt   # /mnt is RO, /mnt/sv RO as well
umount /mnt{/sv,}
rm btrfsfs
http://bugzilla.suse.com/show_bug.cgi?id=1156421#c22

--- Comment #22 from Goldwyn Rodrigues <rgoldwyn@suse.com> ---
Created attachment 835970 --> http://bugzilla.suse.com/attachment.cgi?id=835970&action=edit (Proposed patch: Fix RO remount for subvols)

This is a proposed fix. However, there are problems with removing the
sb->s_flags setting code from fs/super.c: filesystems depend on fs/super.c to
set the sb flags for them, so it is not expected to be accepted upstream. I
need to dig deeper with respect to fs_context.
https://bugzilla.suse.com/show_bug.cgi?id=1156421#c23

Thorsten Kukuk <kukuk@suse.com> changed:
 What |Removed |Added
 ----------------------------------------------------------------------------
 CC   |        |kukuk@suse.com

--- Comment #23 from Thorsten Kukuk <kukuk@suse.com> ---
Is there any progress here? This renders every second installation of
MicroOS-TIU useless.
https://bugzilla.suse.com/show_bug.cgi?id=1156421#c25

--- Comment #25 from Goldwyn Rodrigues <rgoldwyn@suse.com> ---
We had a discussion on the btrfs mailing list, and it was concluded that
these changes would deviate from the "default" case [1]. Currently sysadmins
are remounting a single subvolume read-only in order to set the whole
filesystem read-only. While this is not ideal, this behavior is already being
relied upon.

[1] https://lore.kernel.org/linux-btrfs/20220211164422.GA12643@twin.jikos.cz/T/#...
https://bugzilla.suse.com/show_bug.cgi?id=1156421#c26

--- Comment #26 from Fabian Vogt <fvogt@suse.com> ---
(In reply to Goldwyn Rodrigues from comment #25)
> We had a discussion on the btrfs mailing list, and it was concluded that
> these changes would deviate from the "default" case [1]. Currently sysadmins
> are remounting a single subvolume read-only in order to set the whole
> filesystem read-only. While this is not ideal, this behavior is already
> being relied upon.
>
> [1] https://lore.kernel.org/linux-btrfs/20220211164422.GA12643@twin.jikos.cz/T/#...

Ah, good ol' spacebar heating (https://xkcd.com/1172/).

Maybe the example doesn't really show the severity of the issue, as it
focuses on the "root" mountpoint. This one has more independent mounts of
subvolumes:

dd if=/dev/zero of=btrfsfs seek=240 count=0 bs=1M
mkfs.btrfs btrfsfs
mkdir mnt
mount btrfsfs mnt
btrfs subvol create mnt/sv0
btrfs subvol create mnt/sv1
umount mnt

mkdir sv{0,1}mnt
mount -o subvol=/sv0 btrfsfs sv0mnt
mount -o subvol=/sv1 btrfsfs sv1mnt
findmnt sv0mnt   # RW
findmnt sv1mnt   # RW
mount -o remount,ro sv0mnt
findmnt sv0mnt   # RO
findmnt sv1mnt   # RO!
mount -o remount,rw sv1mnt
findmnt sv0mnt   # RW!
findmnt sv1mnt   # RW
umount sv*mnt

Do we have any filesystems we can compare the behaviour with?
https://bugzilla.suse.com/show_bug.cgi?id=1156421#c27

--- Comment #27 from Goldwyn Rodrigues <rgoldwyn@suse.com> ---
(In reply to Fabian Vogt from comment #26)
> Ah, good ol' spacebar heating (https://xkcd.com/1172/).
>
> Maybe the example doesn't really show the severity of the issue, as it
> focuses on the "root" mountpoint. This one has more independent mounts of
> subvolumes:

Yes, this is a problem with any subvolume. Remounting *any* subvolume
read-only renders all currently mounted subvolumes of the entire filesystem
read-only. This is what was pointed out by Graham Cobb in the discussion.

> [the reproducer from comment 26]
>
> Do we have any filesystems we can compare the behaviour with?

Unfortunately, btrfs is pretty unique on this front.
https://bugzilla.suse.com/show_bug.cgi?id=1156421#c31

--- Comment #31 from Swamp Workflow Management <swamp@suse.de> ---
SUSE-RU-2022:1821-1: An update that has three recommended fixes can now be
installed.

Category: recommended (low)
Bug References: 1156421, 1161264, 1176052
CVE References:
JIRA References:

Sources used:
openSUSE Leap 15.4 (src): read-only-root-fs-1.0+git20190206.586e9f1-150100.3.3.1
openSUSE Leap 15.3 (src): read-only-root-fs-1.0+git20190206.586e9f1-150100.3.3.1
SUSE Linux Enterprise Module for Transactional Server 15-SP4 (src): read-only-root-fs-1.0+git20190206.586e9f1-150100.3.3.1
SUSE Linux Enterprise Module for Transactional Server 15-SP3 (src): read-only-root-fs-1.0+git20190206.586e9f1-150100.3.3.1

NOTE: This line indicates an update has been released for the listed
product(s). At times this might be only a partial fix. If you have questions
please reach out to maintenance coordination.
https://bugzilla.suse.com/show_bug.cgi?id=1156421#c34

Fabian Vogt <fvogt@suse.com> changed:
 What |Removed |Added
 ----------------------------------------------------------------------------
 CC   |        |felix.niederwanger@suse.com

--- Comment #34 from Fabian Vogt <fvogt@suse.com> ---
*** Bug 1202276 has been marked as a duplicate of this bug. ***
https://bugzilla.suse.com/show_bug.cgi?id=1156421

Daniel Rahn <drahn@suse.com> changed:
 What |Removed |Added
 ----------------------------------------------------------------------------
 CC   |        |drahn@suse.com, jeffm@suse.com, mge@suse.com
https://bugzilla.suse.com/show_bug.cgi?id=1156421

Matthias Eckermann <mge@suse.com> changed:
 What       |Removed  |Added
 ----------------------------------------------------------------------------
 Status     |RESOLVED |REOPENED
 Resolution |WONTFIX  |---
https://bugzilla.suse.com/show_bug.cgi?id=1156421

Matthias Eckermann <mge@suse.com> changed:
 What  |Removed |Added
 ----------------------------------------------------------------------------
 Flags |        |needinfo?(jeffm@suse.com)
https://bugzilla.suse.com/show_bug.cgi?id=1156421#c38

--- Comment #38 from Jeff Mahoney <jeffm@suse.com> ---
Does mount --bind -oro,remount <mnt> work for you?
https://bugzilla.suse.com/show_bug.cgi?id=1156421#c41

--- Comment #41 from Thorsten Kukuk <kukuk@suse.com> ---
(In reply to Jeff Mahoney from comment #38)
> Does mount --bind -oro,remount <mnt> work for you?

If I call this from the command line: yes, it works. If I add "bind" to
/etc/fstab (ro,bind): no, this doesn't seem to work. But we need something
for /etc/fstab for the boot process.

Besides that, what would be the side effects of using this option?

I used the example from bsc#1202276 to test, since it triggers this problem
very reliably.
https://bugzilla.suse.com/show_bug.cgi?id=1156421#c43

Jeff Mahoney <jeffm@suse.com> changed:
 What  |Removed                   |Added
 ----------------------------------------------------------------------------
 Flags |needinfo?(jeffm@suse.com) |

--- Comment #43 from Jeff Mahoney <jeffm@suse.com> ---
(In reply to Thorsten Kukuk from comment #41)
> (In reply to Jeff Mahoney from comment #38)
> > Does mount --bind -oro,remount <mnt> work for you?
>
> If I call this from the command line: yes, it works. If I add "bind" to
> /etc/fstab (ro,bind): no, this doesn't seem to work. But we need something
> for /etc/fstab for the boot process.
>
> Besides that, what would be the side effects of using this option?

Using ro or rw without bind in the fstab has the same effect. If the initial
mount is read-only, it will mark the entire file system read-only. If another
subvolume mount wants a read-write mount, it will leave the first mountpoint
read-only, remount the file system internally read-write, and mark the new
mount point read-write. Writes to the first mount will continue to be refused
and will be allowed on the second mount.

For context, the read-only flag can be set in:
- Filesystem-wide superblock flags
- Mountpoint flags
- Subvolume flags

If the superblock flag or the subvolume flag is set, the subvolume will be
read-only no matter where it's mounted, regardless of whether the mountpoint
is read-write. If the mountpoint flag is set, the subvolume will be read-only
only at that mountpoint and could be writable at another location if the
subvolume was mounted more than once.

If you use mount -o remount,ro on a subvolume, that overrides the internal
remount to rw and you end up with the situation where the entire fs is
read-only. If you use mount -o remount,ro --bind on a subvolume _mountpoint_,
it only applies to that mountpoint. If you do that on a subvolume that isn't
explicitly mounted, it'll complain about it not being a mountpoint and fail
with an error.

All of this happens with flags that the VFS layer understands, using the
superblock and mountpoints. The problem is that btrfs subvolumes don't have
to be represented by a mountpoint. We do that explicitly to give the
appearance of a coherent file system. As a result, there's no internal
linkage between mount points and subvolumes: the mount point flags and the
subvolume flags aren't connected. This has some annoying UX effects.

1) You can set the mountpoint read-only status and it'll be enforced at that
   mountpoint, but not for subsequent mounts of the same subvolume unless
   those are also mounted with the same read-only status. This would be
   either mount --bind -o remount,r[ow] or just specifying the ro/rw flags
   in /etc/fstab.

2) You can set the btrfs subvolume read-only status and it'll be enforced
   everywhere, but the mountpoint flags are all that's shown to the user via
   'mount'. To get the subvolume flags you'll need to use the btrfs tools:
   btrfs property set <path> ro [true|false]. The subvolume flags are
   persistent, while the mountpoint flags are not.

Unfortunately, mount points essentially don't exist to the underlying file
system. The mount routine for a file system returns a dentry, and _then_ the
mountpoint is created. You will see a mount point used in the btrfs mount
code internally, but it's only temporary and is released before the mount
completes. So there's no way to have the mount point automatically reflect
the flags of the subvolume.

A long-term idea of mine has been to leverage the automount behavior in the
VFS to handle subvolumes. The automount callback _does_ have access to the
vfsmount, and so it could set those flags before returning it to the VFS. I
haven't had the time to do this and we've never made it an official feature
request.

So, I'm not sure this helps, but it should at least explain why it's so
confusing.
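To make the distinction concrete, a scratch-filesystem sketch in the style of
the earlier reproducers, contrasting the three places the read-only flag can
live (assumes root and an empty working directory):

dd if=/dev/zero of=btrfsfs seek=240 count=0 bs=1M
mkfs.btrfs btrfsfs
mkdir -p mnt sv0mnt
mount btrfsfs mnt
btrfs subvol create mnt/sv0
mount -o subvol=/sv0 btrfsfs sv0mnt

# 1) Mountpoint flag only: bind-remount; the other mount stays rw
mount --bind -o remount,ro sv0mnt
findmnt -o OPTIONS sv0mnt    # ro
findmnt -o OPTIONS mnt       # still rw
mount --bind -o remount,rw sv0mnt

# 2) Plain remount: flips the shared superblock; the other mount goes ro too
mount -o remount,ro sv0mnt
findmnt -o OPTIONS mnt       # ro as well
mount -o remount,rw sv0mnt

# 3) Subvolume flag: persistent, enforced everywhere, shown by the btrfs tools
btrfs property set sv0mnt ro true
btrfs property get sv0mnt ro
btrfs property set sv0mnt ro false

umount sv0mnt mnt
rm btrfsfs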