What | Removed | Added |
---|---|---|
Status | NEW | CONFIRMED |
CC | iforster@suse.com | |
Component | MicroOS | Kernel |
Assignee | fvogt@suse.com | kernel-maintainers@forge.provo.novell.com |
The kernel code in fs/btrfs/super.c has special handling for SB_RDONLY on mounts, which results in somewhat surprising behaviour.

So here's my current theory on what actually happens on bootup:

initrd:
1. /sysroot gets mounted ro. The btrfs superblock is set read-only.
2. /sysroot/var gets mounted rw. The btrfs superblock is set read-write.

after switch-root:
3. systemd-remount-fs does "mount -o remount /", which picks up the "ro" from /etc/fstab, resulting in "mount -o remount,ro /". The kernel code now tries to make the superblock read-only, which succeeds as no file is open in write mode. As a result, /var is now ro as well.
4. systemd-journal-flush.service runs, with /var read-only. It fails.
5. local-fs.target starts, calling "mount /opt", which picks up the "rw" from /etc/fstab. This makes the superblock read-write, so /var is now read-write again.
6. The mount for / remains read-only as it's an explicit read-only mount.

For context, a discussion on IRC:

[15:11] <fvogt> Quick btrfs question: If I mount -o remount,ro /opt, all other btrfs subvols (/var, /root, etc.) are also remounted read-only. Is that intentional?
[15:14] <dsterba> fvogt: yes, read-only applies to the superblock that's shared for all subvolumes
[15:15] <dsterba> but I think there's a case where some of the subvolume mounts can be made read-only (while the filesystem is read-write, iow the read-only flag applies only to the mount point itself)
[15:15] <fvogt> dsterba: Right, but specific subvolumes can be mounted ro on their own - only remount has this behaviour of touching other subvols
[15:16] <fvogt> On the system I have here this breaks in weird ways, as / is read-only while /var is mounted explicitly rw. That works, until systemd comes along and does "mount -o remount /"...
[15:17] <dsterba> looking at code, remount touches the superblock flags
[15:24] <fvogt> Ok, so everything makes sense now, in a surprisingly complicated way... So the way fstab is set up works more or less only by accident as systemd mounts some other subvolumes rw later on boot, which makes the superblock read-write again
[15:25] <fvogt> Would it be possible to only make the SB read-only on remount if all subtree mounts are r-o? That would IMO be the "expected behaviour", but a behaviour change...
[15:28] <dsterba> I think that's not possible, VFS does not provide 'list of other subtree mounts' at the .remount callback
[15:29] <fvogt> Currently it's "by chance" whether the superblock gets RDONLY on a remount,rw, depending on whether any file is opened with write mode
[15:29] <fvogt> *...on a remount,ro of a subvol, depending on...
[15:29] <dsterba> a file opened read-write will block rw->ro remount in general
[15:30] <fvogt> Hm, which is also somewhat surprising
[15:33] <dsterba> shouldn't be, that's VFS internal synchronization mechanism, all write requests increment a counter somewhere and it's checked at times when the rw->ro is attempted
[15:33] <fvogt> There is special handling for the "-EBUSY because of SB_RDONLY" case in super.c:1623, it retries vfs_kern_mount with flags&~SB_RDONLY
[15:33] <dsterba> like switching read/write locks from one state to another
[15:35] <dsterba> the EBUSY case is to support different ro/rw subvolume mounts, introduced in 0723a0473fb48a1c93b113a

So a workaround would be to put "After=local-fs.target" or something like that in systemd-journal-flush.service, but I'm sure upstream systemd won't like that...

I'm reassigning to kernel now as IMO the behaviour in step 3 is unexpected and looks like a bug. If changing this behaviour isn't possible, we'd have to add another hack^Wcustomization in read-only-root-fs :-/
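
To make the theory in steps 1 to 6 easier to follow, here is a toy model in Python. It is not kernel code and does not call mount(2). It only encodes the assumptions stated above: one read-only flag on the superblock shared by all subvolume mounts, one independent read-only flag per mount, a remount that always touches the shared flag, and an rw->ro transition that is refused while files are open for writing. The names SharedSuperblockModel, replay_boot and open_writers are made up for illustration.

```python
import errno
from dataclasses import dataclass, field


@dataclass
class SharedSuperblockModel:
    """Toy model (NOT kernel code): one superblock flag shared by all
    subvolume mounts, plus an independent read-only flag per mount."""
    sb_rdonly: bool = True
    mount_ro: dict = field(default_factory=dict)  # mount point -> per-mount ro flag
    open_writers: int = 0  # hypothetical count of files open for writing

    def mount(self, path: str, ro: bool) -> None:
        # Assumption: the first mount decides the superblock state, and any
        # later rw mount flips the shared superblock to read-write.
        if not self.mount_ro:
            self.sb_rdonly = ro
        elif not ro:
            self.sb_rdonly = False
        self.mount_ro[path] = ro

    def remount(self, path: str, ro: bool) -> None:
        # Assumption: remount always touches the shared superblock flag,
        # and rw->ro is refused while any file is open for writing.
        if ro and not self.sb_rdonly and self.open_writers > 0:
            raise OSError(errno.EBUSY, "files open for writing")
        self.sb_rdonly = ro
        self.mount_ro[path] = ro

    def writable(self, path: str) -> bool:
        # A path is writable only if neither the shared superblock flag
        # nor its own per-mount flag is read-only.
        return not self.sb_rdonly and not self.mount_ro[path]


def replay_boot() -> dict:
    """Replay steps 1 to 6, using the post-switch-root mount point names."""
    fs = SharedSuperblockModel()
    seen = {}
    fs.mount("/", ro=True)        # step 1: superblock read-only
    fs.mount("/var", ro=False)    # step 2: superblock read-write
    seen["/var after step 2"] = fs.writable("/var")
    fs.remount("/", ro=True)      # step 3: superblock read-only again
    seen["/var after step 3"] = fs.writable("/var")  # what step 4 runs into
    fs.mount("/opt", ro=False)    # step 5: superblock read-write again
    seen["/var after step 5"] = fs.writable("/var")
    seen["/ after step 5"] = fs.writable("/")        # step 6: still read-only
    return seen


if __name__ == "__main__":
    for label, state in replay_boot().items():
        print(label, "writable" if state else "read-only")
```

In this model /var is writable after step 2, read-only after step 3 and writable again after step 5, while / stays read-only throughout. Setting open_writers to a non-zero value before the remount in step 3 makes it fail with EBUSY instead, which corresponds to the "by chance" behaviour mentioned in the IRC log.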