Fabian Vogt changed bug 1156421
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |CONFIRMED
                 CC|                            |iforster@suse.com
          Component|MicroOS                     |Kernel
           Assignee|fvogt@suse.com              |kernel-maintainers@forge.provo.novell.com

Comment # 16 on bug 1156421 from
The kernel code in fs/btrfs/super.c has special handling for SB_RDONLY on
mounts, which results in somewhat surprising behaviour.

So here's my current theory on what actually happens on bootup:

initrd:
1. /sysroot gets mounted ro. The btrfs superblock is set read-only.
2. /sysroot/var gets mounted rw. The btrfs superblock is set read-write.
after switch-root:
3. systemd-remount-fs does "mount -o remount /", which picks up the "ro" from
/etc/fstab, resulting in "mount -o remount,ro /". The kernel code now tries to
make the superblock read-only, which succeeds because no file is open for
writing at that point. As a result, /var becomes read-only as well.
4. systemd-journal-flush.service runs, with /var read-only. It fails.
5. local-fs.target starts, calling "mount /opt", which picks up the "rw" from
/etc/fstab. This makes the superblock read-write, so /var is now read-write
again.
6. The mount for / remains read-only as it's an explicit read-only mount.
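
To make the theory above easier to follow, here is a minimal sketch that
replays steps 1-6 against a toy model. It is not kernel code: the class
BtrfsModel, its methods and replay_boot() are hypothetical names, and the rule
"a mount point is writable only if neither the shared superblock nor the mount
itself is read-only" is an assumption taken from the description above.

```python
import errno


class BtrfsModel:
    """Toy model: one shared superblock flag plus one read-only flag per mount."""

    def __init__(self):
        self.sb_ro = None        # shared superblock state; None until the first mount
        self.mount_ro = {}       # mount point -> per-mount read-only flag
        self.open_writers = 0    # number of files currently open for writing

    def mount(self, path, ro):
        if ro:
            # Assumption: a read-only mount sets the superblock flag only
            # when it is the first mount of the filesystem.
            if self.sb_ro is None:
                self.sb_ro = True
        else:
            # A read-write mount forces the shared superblock read-write.
            self.sb_ro = False
        self.mount_ro[path] = ro

    def remount(self, path, ro):
        if ro:
            # rw -> ro is refused while any file is open for writing.
            if self.open_writers:
                raise OSError(errno.EBUSY, "files are open for writing")
            self.sb_ro = True
        else:
            self.sb_ro = False
        self.mount_ro[path] = ro

    def writable(self, path):
        return not self.sb_ro and not self.mount_ro[path]


def replay_boot():
    """Replay the boot sequence; return (stage, '/' writable, '/var' writable)."""
    fs = BtrfsModel()
    states = []
    fs.mount("/", ro=True)        # step 1 (mounted as /sysroot in the initrd)
    fs.mount("/var", ro=False)    # step 2
    states.append(("after initrd", fs.writable("/"), fs.writable("/var")))
    fs.remount("/", ro=True)      # step 3; step 4 (journal flush) sees this state
    states.append(("after remount /", fs.writable("/"), fs.writable("/var")))
    fs.mount("/opt", ro=False)    # step 5
    states.append(("after mount /opt", fs.writable("/"), fs.writable("/var")))  # step 6
    return states


if __name__ == "__main__":
    for stage, root_rw, var_rw in replay_boot():
        print(f"{stage}: / writable={root_rw}, /var writable={var_rw}")
```

In this model /var is writable after the initrd, not writable between steps 3
and 5 (the window in which systemd-journal-flush.service runs), and writable
again once /opt is mounted, while / stays read-only throughout.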

For context, a discussion on IRC:

[15:11] <fvogt> Quick btrfs question: If I mount -o remount,ro /opt, all other
btrfs subvols (/var, /root, etc.) are also remounted read-only. Is that
intentional?
[15:14] <dsterba> fvogt: yes, read-only applies to the superblock that's shared
for all subvolumes
[15:15] <dsterba> but I think there's a case where some of the subvolume mounts
can be made read-only (while the filesystem is read-write, iow the read-only
flag applies only to the mount point itself)
[15:15] <fvogt> dsterba: Right, but specific subvolumes can be mounted ro on
their own - only remount has this behaviour of touching other subvols
[15:16] <fvogt> On the system I have here this breaks in weird ways, as / is
read-only while /var is mounted explicitly rw. That works, until systemd comes
along and does "mount -o remount /"...
[15:17] <dsterba> looking at code, remount touches the superblock flags
[15:24] <fvogt> Ok, so everything makes sense now, in a surprisingly
complicated way... So the way fstab is set up works more or less only by
accident as systemd mounts some other subvolumes rw later on boot, which makes
the superblock read-write again
[15:25] <fvogt> Would it be possible to only make the SB read-only on remount
if all subtree mounts are r-o? That would IMO be the "expected behaviour", but
a behaviour change...
[15:28] <dsterba> I think that's not possible, VFS does not provide 'list of
other subtree mounts' at the .remount callback
[15:29] <fvogt> Currently it's "by chance" whether the superblock gets RDONLY
on a remount,rw, depending on whether any file is opened with write mode
[15:29] <fvogt> *...on a remount,ro of a subvol, depending on...
[15:29] <dsterba> a file opened read-write will block rw->ro remount in general
[15:30] <fvogt> Hm, which is also somewhat surprising
[15:33] <dsterba> shouldn't be, that's VFS internal synchronization mechanism,
all write requests increment a counter somewhere and it's checked at times when
the rw->ro is attempted
[15:33] <fvogt> There is special handling for the "-EBUSY because of SB_RDONLY"
case in super.c:1623, it retries vfs_kern_mount with flags&~SB_RDONLY
[15:33] <dsterba> like switching read/write locks from one state to another
[15:35] <dsterba> the EBUSY case is to support different ro/rw subvolume
mounts, introduced in 0723a0473fb48a1c93b113a
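
The writer-counter mechanism mentioned in the discussion can be illustrated
with a small sketch. This is only a toy model of the idea ("writes increment a
counter, rw->ro checks it"), not the actual VFS implementation; Superblock and
try_remount_ro() are hypothetical names.

```python
import errno


class Superblock:
    """Toy superblock with a writer counter guarding the rw -> ro transition."""

    def __init__(self):
        self.read_only = False
        self.writers = 0

    def open_for_write(self):
        if self.read_only:
            raise OSError(errno.EROFS, "read-only file system")
        self.writers += 1

    def close_writer(self):
        self.writers -= 1

    def remount_ro(self):
        # The counter is checked when the rw -> ro switch is attempted.
        if self.writers > 0:
            raise OSError(errno.EBUSY, "writers present")
        self.read_only = True


def try_remount_ro(sb):
    """Attempt the remount; return 'ok' or the symbolic errno name."""
    try:
        sb.remount_ro()
        return "ok"
    except OSError as exc:
        return errno.errorcode[exc.errno]
```

With one file open for writing the attempt yields EBUSY; once it is closed the
same attempt succeeds. This is why the outcome of a "remount,ro" on a subvolume
depends on whether anything happens to hold a file open for writing.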

So a workaround would be to put "After=local-fs.target" or something like that
in systemd-journal-flush.service, but I'm sure upstream systemd won't like
that...

I'm reassigning to kernel now as IMO the behaviour in step 3 is unexpected and
looks like a bug.
If changing this behaviour isn't possible, we'd have to add another
hack^Wcustomization in read-only-root-fs :-/

