Thoughts on the file system layout
Hi, While working on MicroOS, UsrMerge, UsrEtc, playing with systemd features I was wondering where that could lead us to. I'm sure we didn't utilize the full potential of what we can do with our OS yet. So I've tried to dump my thoughts into an article: https://github.com/lnussel/lnussel.github.io/blob/fs/_posts/2020-12-16-fslay... tl;dr rpms to only operate below /usr and nowhere else. cu Ludwig -- (o_ Ludwig Nussel //\ V_/_ http://www.suse.com/ SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer HRB 36809 (AG Nürnberg)
On 1/5/21 5:03 PM, Ludwig Nussel wrote:
If I understand this correctly, you want to mount the original root partition to /etc, so in consequence, / has to be a tmp filesystem onto which the mount points for /etc, /run, /usr, /var are created. That means you are also separating /home, /etc and /root from each other, aren't you? Adrian
John Paul Adrian Glaubitz wrote on Tue, 05 Jan 2021 at 17:17 +0100:
That means you are also separating /home, /etc and /root from each other, aren't you?
I mostly like Ludwig’s thoughts with one exception: pushing /home down the /var hierarchy. From the backup point of view, I see /var and /home as two radically different topics. /home is something which must be preserved under any conditions; /var ranges from “I don’t care at all” to “it would be nice to have preserved”. Perhaps some semi-valuable parts of the /var hierarchy should be moved somewhere more appropriate (for example /var/lib/rpm to /usr/var/lib/rpm, because it has mostly the same character as /usr: it is written by the packaging program only)?
Best, Matěj
-- https://matej.ceplovi.cz/blog/, Jabber: mcepl@ceplovi.cz GPG Finger: 3C76 A027 CA45 AD70 98B5 BC1D 7920 5802 880B C9D8 Ask yourself whether you’ve really obtained fullness of the Holy Spirit, followed by gifts needed for your service. -- Francis MacNutt
On Jan 06 2021, Matěj Cepl wrote:
(/var/lib/rpm to /usr/var/lib/rpm?
/var/lib/rpm has already been moved to /usr/lib/sysimage/rpm. Andreas. -- Andreas Schwab, SUSE Labs, schwab@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 "And now for something completely different."
On Thursday 2021-01-07 14:00, Hans-Peter Jansen wrote:
It is not a database in "that sense" that I think you might be thinking of/in. In "that sense", the rpmdb is more like an _index_ of what is present in /usr, and is (barring exceptional circumstances) _consistent_ with what's in /usr.
On Thu, 07 Jan 2021, 14:05:01 +0100, Jan Engelhardt wrote:
It might not be _one_ database, but all files in that directory are (and most of them are changed whenever rpm is installing/updating/removing rpms, so the read only does not actually work well for this directory):

$ file /usr/lib/sysimage/rpm/*
/usr/lib/sysimage/rpm/Basenames: Berkeley DB (Btree, version 9, native byte-order)
/usr/lib/sysimage/rpm/Conflictname: Berkeley DB (Btree, version 9, native byte-order)
/usr/lib/sysimage/rpm/Dirnames: Berkeley DB (Btree, version 9, native byte-order)
/usr/lib/sysimage/rpm/Enhancename: Berkeley DB (Btree, version 9, native byte-order)
/usr/lib/sysimage/rpm/Filetriggername: Berkeley DB (Btree, version 9, native byte-order)
/usr/lib/sysimage/rpm/Group: Berkeley DB (Btree, version 9, native byte-order)
/usr/lib/sysimage/rpm/Installtid: Berkeley DB (Btree, version 9, native byte-order)
/usr/lib/sysimage/rpm/Name: Berkeley DB (Btree, version 9, native byte-order)
/usr/lib/sysimage/rpm/Obsoletename: Berkeley DB (Btree, version 9, native byte-order)
/usr/lib/sysimage/rpm/Packages: Berkeley DB (Hash, version 9, native byte-order)
/usr/lib/sysimage/rpm/Providename: Berkeley DB (Btree, version 9, native byte-order)
/usr/lib/sysimage/rpm/Recommendname: Berkeley DB (Btree, version 9, native byte-order)
/usr/lib/sysimage/rpm/Requirename: Berkeley DB (Btree, version 9, native byte-order)
/usr/lib/sysimage/rpm/Sha1header: Berkeley DB (Btree, version 9, native byte-order)
/usr/lib/sysimage/rpm/Sigmd5: Berkeley DB (Btree, version 9, native byte-order)
/usr/lib/sysimage/rpm/Suggestname: Berkeley DB (Btree, version 9, native byte-order)
/usr/lib/sysimage/rpm/Supplementname: Berkeley DB (Btree, version 9, native byte-order)
/usr/lib/sysimage/rpm/Transfiletriggername: Berkeley DB (Btree, version 9, native byte-order)
/usr/lib/sysimage/rpm/Triggername: Berkeley DB (Btree, version 9, native byte-order)

Cheers. l8er manfred
On 07.01.21 at 14:38, Manfred Hollstein wrote:
The point is you would never change that database without also changing content of /usr, that's why Jan named it an index. If you want to change the content of /usr, you would first switch to a read write mode - including the rpm database. Greetings, Stephan -- Lighten up, just enjoy life, smile more, laugh more, and don't get so worked up about things. Kenneth Branagh
Matěj Cepl wrote:
What to back up certainly adds another dimension to the problem and needs a closer look. I left it out intentionally for now as I had the feeling it opens another can of worms that leads to revamping the /var hierarchy too. After all, /var contains both primary data that is eligible for backup (e.g. a postgres database) and secondary data (caches, logs) that you normally wouldn't back up.

The point really is to avoid having all those different locations for special use cases and to concentrate on their actual purpose, i.e. to store your data. Let's be honest: what's the point of saving a drawing you made in ~/Documents, writing a php app in /srv (which stores its data in /var/lib/pgsql though), dropping a shell script in /usr/local/bin, and installing that outdated, proprietary printer driver in /opt? If it was e.g. Android you'd probably put all of that on your sdcard.

So that's why I suggest symlinking /home, /root, /opt, /srv and /usr/local into a subdir of /var by default. That should be all we need for the generic case. For specific use cases where you, i.e. the admin, know that certain locations need special handling, you can still override that. So it would be perfectly fine to e.g. mount your /home on your laptop from a different, encrypted disk. A university workstation could mount /usr/local or /opt via nfs for site specific software. Or a cluster node could have /srv for some local storage, etc.
cu Ludwig
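The default described in this mail (symlink the data locations into a subdirectory of /var unless the admin has set up something else) can be pictured with a small Python sketch. It is only an illustration: the subdirectory name `var/data`, the flattened target name `usr-local`, and the function name `link_data_dirs` are assumptions made for the example, and it works on a scratch directory rather than a real root.

```python
import os
import tempfile

# Locations the mail suggests redirecting into /var by default.
DATA_DIRS = ["home", "root", "opt", "srv", "usr/local"]


def link_data_dirs(root, subdir="var/data"):
    """Create relative symlinks for DATA_DIRS under `root`, pointing into `subdir`.

    `subdir` is a hypothetical name; the mail only says "a subdir of /var".
    Anything that already exists is left alone, so an admin's own directory
    or mount point takes precedence over the default.
    """
    created = {}
    for name in DATA_DIRS:
        link = os.path.join(root, name)
        target = os.path.join(root, subdir, name.replace("/", "-"))
        os.makedirs(target, exist_ok=True)
        os.makedirs(os.path.dirname(link), exist_ok=True)
        if os.path.lexists(link):
            continue  # admin override: keep whatever is there
        rel = os.path.relpath(target, os.path.dirname(link))
        os.symlink(rel, link)
        created[name] = rel
    return created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        for name, rel in link_data_dirs(scratch).items():
            print(f"/{name} -> {rel}")
```

Skipping existing entries is what models the "admin can still override" part: a separately mounted /home would simply never be replaced by the default link.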
On 1/5/21 5:03 PM, Ludwig Nussel wrote:
2 immediate thoughts: a) nice to read, yet somehow I miss the description and relevance to the original problem. b) this is so radical that this needs to be discussed with other GNU/Linuxes; otherwise this looks like it will likely break compatibility. Have a nice day, Berny
On 2021-01-05 17:03, Ludwig Nussel wrote:
Can be read in blog format too: https://lnussel.github.io/2020/12/16/fslayout/
aplanas wrote:
Ah, thanks for pointing that out! It's the better option. I copied the link from the wrong browser window :-( cu Ludwig -- (o_ Ludwig Nussel //\ V_/_ http://www.suse.com/ SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer HRB 36809 (AG Nürnberg)
On Tue, Jan 5, 2021 at 11:03 AM Ludwig Nussel <ludwig.nussel@suse.de> wrote:
With regards to the content in /boot, this has been an issue in Fedora too, and some development on a solution has started in the form of bootupd: https://github.com/coreos/bootupd Though unlike in openSUSE, Fedora's GRUB already supports Bootloader Spec, which does implement some of the solution you've outlined. -- 真実はいつも一つ!/ Always, there's only one truth!
Neal Gompa wrote:
Worth taking a look indeed. Thanks for the pointer! cu Ludwig -- (o_ Ludwig Nussel //\ V_/_ http://www.suse.com/ SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer HRB 36809 (AG Nürnberg)
On Tue, Jan 5, 2021 at 9:03 AM Ludwig Nussel <ludwig.nussel@suse.de> wrote:
This is very cool and quite similar to what I've been thinking about the past few months, especially since Fedora has moved to Btrfs by default on the desktop.

- Same realization that effectively the sysroot, the thing that's mounted and needed first, is /etc/ because that's where fstab is and that's needed to assemble the rest of the system. And a lightweight system reset [1] being essentially reverting to some default or empty /etc/ and /var/.
- I've also found /usr/ on a read-only subvolume or snapshot is not too difficult although it is tied to the state of /var/lib/rpm/ right now. If everything is sourced in /usr/ then it's less controversial, if it ever was, to move the rpm database into /usr/.
- I'm liking coreos/bootupd as a possible way to populate /boot and /boot/efi using /usr as a source.
- I like even more not persistently mounting /boot/efi, which I have considered unfortunate from day 1. And for all the same reasons /boot doesn't need to be persistently mounted either. These are bootloader and boot related things, and they should have proper owners, e.g. bootupd and fwupd, and those things should be able to mount them on demand, update them, unmount them.
- But I still wonder whether there's a better, simpler, more flexible way to discover and assemble possibly different versions of /usr/, /etc/ and /var/ - and yes /home/, but I'll set that aside entirely because that discussion also should include systemd-homed.
- And I wonder about piles of snapshots in hidden .snapshot directories, which end up confusing the results from common tools like find, du, locate, and will even confuse many backup utilities, which will really think you have 50x 50G worth of sysroot snapshots, or whatever. And that leads to whether various snapshotting utilities should have their own namespace for snapshots outside of the usual mount path.
- And for that matter I wonder if each distro should have its own namespace so different versions of the distro, not just different updated states, can all coexist on Btrfs.

+ What could possibly bind the prior three: some variant of the systemd.io Discoverable Partitions Spec [3], but for subvolumes/snapshots (and readily adaptable for LVM thinp or ZFS datasets, and so on). Is this useful? A way to name the top-level, generally invisible to the user, subvolumes and snapshots in such a way as to be self-describing, and obviate such dependency on fstab? Make fstab an addendum rather than primary and mandatory, in particular so that it could be reset and we could still boot.

This now qualifies as ancient times. http://0pointer.net/blog/revisiting-how-we-put-together-linux-systems.html Specifically I'm looking at the "What We Propose" section, but even Lennart says it's stale and needs to be reimagined. But conceptually, is it useful to have a way to bring some structure to subvolumes/snapshots, so they're out of the user's way? And also out of each other's way, because there can be more than one reason for doing automated snapshots.

libbtrfsutil offers a path based and a file descriptor based means of creating snapshots. I don't know if that means persistently mounting the top-level e.g. somewhere in /run or if it's possible to forgo it using the _fd variant. I'm not opposed to nested subvolumes. A hybrid approach does seem to be indicated. But the main goals are to decomplicate fstab and to discover and assemble at boot, and it seems attractive to compartmentalize, organize, have structure.

-- Chris Murphy

[1] A more serious, true reset that could recover from significant disaster, even including an unrepairable file system, based on btrfs seed, I think is less interesting by itself and more interesting if the effort can be used for other things like live images and/or install images. The replication feature is quite fast. And checksumming this source on every read means the result has integrity (making certain assumptions for brevity).
[2] http://lists.rpm.org/pipermail/rpm-maint/2017-October/006681.html
[3] https://systemd.io/DISCOVERABLE_PARTITIONS/
On 06/01/2021 01.44, Chris Murphy wrote:
This is already done on Tumbleweed. There is just a symlink left for compatibility: /var/lib/rpm -> ../../usr/lib/sysimage/rpm
Being able to factory-reset a machine by deleting /etc/* and /var/* is really nice. It also means that a backup of /etc contains just the admin changes to the system and not all the distribution defaults, so it is easier to adapt other (newer) systems the same way. That could simplify the work of the 'machinery' tool.
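What such a reset amounts to can be sketched in a few lines of Python. This is only an illustration under the thread's assumption that all distribution defaults live below /usr; the function name `factory_reset` is made up for the example, and the demo runs against a scratch directory rather than a live system.

```python
import os
import shutil
import tempfile


def factory_reset(root, dirs=("etc", "var")):
    """Delete everything inside root/etc and root/var, keeping the directories.

    Returns the removed top-level entries. Nothing below root/usr is touched,
    which is what makes the reset safe in the proposed layout.
    """
    removed = []
    for d in dirs:
        top = os.path.join(root, d)
        if not os.path.isdir(top):
            continue
        for entry in sorted(os.listdir(top)):
            path = os.path.join(top, entry)
            if os.path.isdir(path) and not os.path.islink(path):
                shutil.rmtree(path)
            else:
                os.remove(path)  # regular files and symlinks
            removed.append(os.path.join(d, entry))
    return removed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        # Build a tiny stand-in for a root file system.
        os.makedirs(os.path.join(scratch, "usr", "bin"))
        os.makedirs(os.path.join(scratch, "etc"))
        os.makedirs(os.path.join(scratch, "var", "lib"))
        open(os.path.join(scratch, "etc", "fstab"), "w").close()
        print(factory_reset(scratch))
```

The same property is what makes a backup of /etc small: after the reset, whatever reappears there is an admin change, not a shipped default.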
On 1/6/21 2:01 PM, Bernhard M. Wiedemann wrote:
Being able to factory-reset a machine by deleting /etc/* and /var/* is really nice.
Does this count as factory-reset? And in which situation do you need such a factory-reset (for whatever definition of factory-reset)? For me a really effective factory-reset is installing a system from scratch and running fully-automated config management afterwards.
If one needs automation one should implement real automation and not rely on admin changes in /etc being restored. All in all I'm not convinced that these changes are worth the trouble. Ciao, Michael.
On 06/01/2021 14.47, Michael Ströder wrote:
Does this count as factory-reset? And in which situation do you need such a factory-reset (for whatever definition of factory-reset)?
Before I joined SUSE, I built video-player hardware appliances based on Debian and they had such a factory reset implemented with aufs to allow people to get a system back into a defined config state. That comes in handy for all kinds of hardware that is rented out.
If one needs automation one should implement real automation and not rely on admin changes in /etc being restored.
I have used chef and salt a lot over the last years and there is always one shortcoming: they are great at setting things up, but if you update your state descriptions to no longer install + configure + run foo, they will just leave foo on the systems. Often it is not a big deal, but if you have automatic service discovery in a cluster, it might mean that those dropped services are still used. Starting from scratch is one way out, but can be rather slow. It is also not so easy if you have relevant data on the machine. Another way would be to have an explicit uninstall target, but most published salt formulas don't have that.
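The shortcoming described here is essentially a missing set difference between what was managed before and what is wanted now. A minimal Python sketch of that idea follows; it is not how chef or salt work internally, and the function name `orphaned_services` and the example service names are purely hypothetical.

```python
def orphaned_services(previous_state, desired_state):
    """Return services that were managed before but are no longer described.

    These are the candidates an explicit uninstall target would have to
    stop and remove; a tool that only applies `desired_state` never sees them.
    """
    return sorted(set(previous_state) - set(desired_state))


if __name__ == "__main__":
    before = ["foo", "nginx", "postgresql"]  # hypothetical earlier state
    after = ["nginx", "postgresql"]          # foo was dropped from the description
    print(orphaned_services(before, after))
```

The catch is that the tool has to remember the previous state somewhere, which is exactly what a reset to a known-empty /etc and /var avoids.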
On 1/6/21 5:51 PM, Bernhard M. Wiedemann wrote:
So IMHO this was a rather constrained setup, much under control of the vendor and not subject to change by the average sys admin.
Yes, that's a real deficiency of most config management *code*.
Starting from scratch is one way out, but can be rather slow.
In case of your video-player hardware appliance example users will probably expect a factory reset to go really quickly (up to a minute or so). But when talking about server service setups 15 to 30 min. is acceptable. The most important thing in this case is that you don't need to do something manually.
In case of HA for databases or file storage you need to setup instant replication to backup instances anyway. Ciao, Michael.
On Wednesday, 6 January 2021, 17:51:01 CET, Bernhard M. Wiedemann wrote:
Fun, I've been using unionfs (-2005) and later aufs in diskless (fat) workstation setups with one layered common nfs {open}suse installation without losing the specific configurability of each machine. The proposed setup would allow for easier setup and configuration of creative usage schemes. Thumbs up from here. Cheers, Pete
Chris Murphy wrote:
I really hope they don't repeat all the mistakes SUSE made on the way :-)
I agree.
Yeah, similarly to /boot I guess. Should be sufficient if snapper provides access when needed.
Sure, certainly a nice gimmick for nerds that we may get for free when kept in mind. Not sure if it's of much practical relevance though :-)
Absolutely. Also, I'd like to think of revisions of a particular tree rather than snapshots. That probably changes the way to look at potential new naming schemes. cu Ludwig -- (o_ Ludwig Nussel //\ V_/_ http://www.suse.com/ SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer HRB 36809 (AG Nürnberg)
On Tuesday, 5 January 2021, 17:03:33 CET, Ludwig Nussel wrote:
It's very interesting. Looks reasonable. I just wonder about two points:
1) What are the consequences (if any) for fully encrypted systems, i.e. the current setup of an encrypted LVM containing / and swap, with only /boot/efi unencrypted?
2) What happens if I try to install some 3rd party RPM (built outside OBS)? They often go into /opt, or follow the "traditional" /etc /usr /var, ...
-- Vojtěch Zeisek https://trapa.cz/ Komunita openSUSE GNU/Linuxu Community of the openSUSE GNU/Linux https://www.opensuse.org/
On Wednesday 2021-01-06 13:47, Vojtěch Zeisek wrote:
I see little difference. Either the kernel+initramfs is located in an unencrypted portion and does the unlock procedure, or else the bootloader needs to understand crypto volumes. I'd prefer the former. Bootloaders are already ... overloaded (pun not intended, but I'll happily take it). The vision of getting rid of GRUB and just letting the EFI runtime load things on x86 is something I would very much welcome -- for one, it would save having to add bootsplash styling to two separate components (currently the bootloader and the initramfs phase) in two different formats.
2) What happens if I try to install some 3rd party RPM (built outside OBS)? They often go into /opt, or follow the "traditional" /etc /usr /var, ...
Apart from the silliness that teamviewer is, I can't remember a time something tried to install to /opt. VirtualBox? Is in /usr. Anydesk? /usr. VMware? I thhhink /usr too, but it's been too long since I used it.
On Wednesday, 6 January 2021, 14:18:48 CET, you wrote:
- Master PDF Editor is in /opt/master-pdf-editor-5
- Synology NAS software (synchronization clients, ...) is in /opt/Synology (well, I converted the packages from DEB using alien)
- Yakyak (Hangouts client) is in /opt/yakyak
- Zoom is in /opt/zoom
- Cisco Anyconnect VPN client is in /opt/cisco
- MS Skype and Teams are surprisingly in /usr :-)
I can imagine a bunch of drivers (e.g. for printers) being in /opt. Without some kind of backward compatibility this can be very problematic for a lot of people, I'm afraid...
-- Vojtěch Zeisek
On Wednesday, 6 January 2021, 14:35:29 CET, Vojtěch Zeisek wrote:
Many more, e.g.
- balenaEtcher
- beAClientSecurity
- brother
- calibre
- epson-inkjet-printer-escpr
- epson-printer-utility
- master-pdf-editor-5
- MediathekView
But from a purely user's perspective: I wouldn't care if /opt were excluded from snapshots.
Without some kind of backward compatibility this can be very problematic for a lot of people, I'm afraid...
If you broke the use of software installed in /opt, e.g. beAClientSecurity, that would be real damage for me. Legacy printer drivers as well. Regards, Alexander
On Thursday, 7 January 2021, 15:23:26 CET, AW wrote:
Some of those can be replaced with packages from the distro space (MediathekView) or even right from TW repo itself (calibre). LibreOffice is a good and free alternative to edit PDF files, given you don't need layout recognition.
But from a purely user's perspective: I wouldn't care if /opt were excluded from snapshots.
Cannot comment on this, my roots are on xfs, and I *do* backups (with borg).
Well, no worry, sooner or later beAClientSecurity will break by itself on updates; at least that happens on a regular basis for the few attorneys that I happen to support (on Windows because RA-MICRO..). Pete
On Wednesday, 6 January 2021, 15:21:27 CET, Jan Engelhardt wrote:
Yeah, it's terrible crap (e.g. recently discussed problem with ibus), but unfortunately plenty of people must use it, willing or not... :-/
IMHO nobody would complain if /opt were excluded from snapshots, backups or so, but as /opt is simply often used by 3rd party SW (I listed some examples from my notebook in [1]; I wonder where the Lexmark drivers on my work desktop are), openSUSE can't IMHO break installation of SW into /opt. Well, I'm unable to foresee all the consequences of lnussel's proposal (which otherwise seems great to me). Maybe there is no problem at all. I just wished to point out that breaking installation into /opt would be bad for many users.
[1] <https://lists.opensuse.org/archives/list/factory@lists.opensuse.org/message/ADXXI33QP4CORFBZHTQJSZTFJLS4C5JI/>
-- Vojtěch Zeisek
On Wed, 2021-01-06 at 15:21 +0100, Jan Engelhardt wrote:
https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch03s13.html Installing into /opt is FHS-compliant for 3rd party software. Actually, I'd say the standard recommends it. Martin -- Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107 SUSE Software Solutions Germany GmbH HRB 36809, AG Nürnberg GF: Felix Imendörffer
On Wed, Jan 06, Jan Engelhardt wrote:
I have e.g: - CNI plugins for kubernetes - Intel OpenVINO And a lot of other binary only stuff from different vendors. Thorsten -- Thorsten Kukuk, Distinguished Engineer, Senior Architect SLES & MicroOS SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nuernberg, Germany Managing Director: Felix Imendoerffer (HRB 36809, AG Nürnberg)
On Wed, Jan 6, 2021 at 6:19 AM Jan Engelhardt <jengelh@inai.de> wrote:
I think the initramfs should be secured other ways. [1] The kernel and initramfs aren't secrets and don't need to be encrypted. We encrypt them because by doing so we make them essentially impossible to attack. But what we actually care about is not confidentiality, but rather integrity and authenticity.

One option is FAT. This is simple and widely supported by many bootloaders and even UEFI firmware.

Another option is Btrfs. GRUB offers no journal replay for ext3/4 and XFS. Therefore there are various tricks, or just luck, to try to avoid cases where journal replay might be needed at boot time; without replay the bootloader can have an inconsistent view of the file system, possibly rendering the system unbootable. Btrfs has no journal and thus doesn't run into this difficulty. You'll get either the old or the new boot configuration following a crash.

There is the EFI fs [2] project: turn GRUB file system drivers into EFI drivers. Voilà, the UEFI firmware now understands Btrfs, but you don't need GRUB. And for BIOS, extlinux supports Btrfs.

[1] We have this same problem with the hibernation image. I wonder if the authentication component of this technique could be used to secure a locally generated initramfs? It might require a small generic initramfs in the kernel, to facilitate obtaining a sealed key from a TPM or "yubikey" in order to authenticate the real initramfs. Or possibly just ship a large generic signed initramfs, generated distro-side. https://lkml.org/lkml/2019/7/10/601
[2] https://github.com/pbatard/efifs
-- Chris Murphy
Hello, a side question: On 2021-01-08 07:36, Chris Murphy wrote:
The kernel and initramfs aren't secrets, don't need to be encrypted.
I wonder if it never can happen that the initramfs contains secrets like whatever private keys or password hashes or things like that? E.g. an admin may need extended functionality in his initramfs so his particular initramfs may contain more things than usual and even unexpected things. Furthermore when someone uses LUKS he probably wants to have all and everything encrypted to be completely on the safe side without the need to always think about what gets stored where. Kind Regards Johannes Meixner -- SUSE Software Solutions Germany GmbH Maxfeldstr. 5 - 90409 Nuernberg - Germany (HRB 36809, AG Nuernberg) GF: Felix Imendoerffer
On Friday, 8 January 2021, 10:13:27 CET, Johannes Meixner wrote:
Exactly. Otherwise it largely loses its point.
-- Vojtěch Zeisek
Hello Johannes,
Chris actually talked about that in his post. If some stage in the boot process is encrypted, the predecessor stage needs the means to retrieve the secrets and decrypt it. And *some* stage in the boot process needs to be unencrypted, so that it can be booted by the HW without prior access to keys or pass phrases. A "small generic initramfs" as suggested by Chris would be one obvious option. Other options would be adding the functionality to the boot loader (grub), or using firmware based encryption / measurement (TXT, anyone? :-).

OTOH, retrieving secrets and decrypting almost arbitrary storage isn't a trivial task. Given that there are lots of encryption schemes and many different ideas about how to store secrets (pass phrase, USB stick, TPM, yubikey/smart card, biometric devices, and any combination thereof), it requires a rather large-ish and flexible software stack. From the Linux PoV, this excludes the FW option, because FW lacks the flexibility to support the multitude of different preferences that Linux users have. Chris' small generic initramfs actually looks most promising to me in this regard, if there is really a need to encrypt the actual initramfs. If that small generic file needs to be re-built on a regular basis, the question arises how *this* image would be protected from inclusion of sensitive secrets ... something other than dracut must be used to build it, for sure.
I'd say that an administrator doing things this way is misguided, and would need to take the responsibility for protecting his secrets himself. In general, the initramfs should be minimal, and distros should do their best to enforce or at least encourage that. But I digress.
You need to start somewhere. Total encryption is an illusion as long as you're not using HW/FW based technology. Moreover, if you deal with highly sensitive material such as secrets to access data, you *ought* to think about where they get stored. I agree that using full disk encryption simplifies this, but it's mostly a matter of convenience, not security. I, for one, have never bothered to encrypt /boot. Regards, Martin -- Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107 SUSE Software Solutions Germany GmbH HRB 36809, AG Nürnberg GF: Felix Imendörffer
Hello Martin, thank you for your comprehensive explanation! Now I understand. I had erroneously thought it is about dracut's initramfs that an admin could make as "big-and-fat" as he needs/likes. Kind Regards Johannes Meixner -- SUSE Software Solutions Germany GmbH Maxfeldstr. 5 - 90409 Nuernberg - Germany (HRB 36809, AG Nuernberg) GF: Felix Imendoerffer
On Fri, Jan 08, Johannes Meixner wrote:
Of course the initramfs can contain such secrets. Thorsten -- Thorsten Kukuk, Distinguished Engineer, Senior Architect SLES & MicroOS SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nuernberg, Germany Managing Director: Felix Imendoerffer (HRB 36809, AG Nürnberg)
Vojtěch Zeisek wrote:
Only good consequences I hope :-) Like not having grub prompt for a passphrase to decrypt the kernel and then the initrd prompting again to actually access the fs.
2) What happens if I try to install some 3rd party RPM (built outside OBS)? They often go into /opt, or follow the "traditional" /etc /usr /var, ...
IMO we need a new approach to that. Creative ideas welcome :-) The host's rpm database is in charge of the OS, the OS is in /usr. So by definition can't deal with 3rd party stuff in /opt. There are people who prefer to not install 3rd party software as rpms at all and prefer snaps and flatpacks for example. Those app formats might be built from rpms too but rules for the contained tree could be different. Another approach could be to invent a layered database approach that can track 3rd party packages separately. That may even include 3rd party packages that try to install to /usr... cu Ludwig -- (o_ Ludwig Nussel //\ V_/_ http://www.suse.com/ SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer HRB 36809 (AG Nürnberg)
Hi, On 1/8/21 11:29 AM, Ludwig Nussel wrote:
But 3rd party stuff is not only in /opt! There are plenty of 3rd party applications that put stuff into /usr (if we consider the non containerized "traditional" 3rd party ISV universe). If we strive for a strict separation of "user installed and modified content" vs. "part of the system" then using /usr is not the answer. We are not going to get everyone to behave (keep their stuff in /opt) nor are we going to get the existing apps that do this intermingling to change (move their stuff to /opt). Meaning if one of the goals is strict separation then something new has to be created "/system", for example, and everything that is shipped as part of the system lives there. But then the question becomes how does one protect this new place from getting "polluted" by 3rd party code? The answer may be that "traditional" apps will probably not change and new apps will probably be container based and as such the risk that $UNMANAGED_3rd_PARTY_APP scribbles to "/system" is reasonably low. But hoping to achieve separation, which is what I read into the post as a part of the idea, and keeping things in /usr is a pipe dream.
Forget packages: how many commercial apps that follow the "traditional" install-onto-the-base-system approach deliver their stuff as packages? 5%? I've been out of that world for a long time but would be surprised if it has changed much. So what's missing for me is a concise goal statement as to which problem is intended to be solved. Is it overall complexity, or is the primary goal better separation of system and user (admin) stuff? Later, Robert -- Robert Schweikert MAY THE SOURCE BE WITH YOU Distinguished Engineer LINUX Technical Team Lead Public Cloud rjschwei@suse.com IRC: robjo
On 2021-01-08 T 17:29 +0100 Ludwig Nussel wrote:
Vojtěch Zeisek wrote:
One could combine RPM's --dbpath and --root options to *always* install 3rd party software into a separate hierarchy. OpenPKG (http://www.openpkg.org/) has made use of that for two decades, if I am not mistaken. Not sure, though, whether that is a scalable approach. Probably fully separating OS and applications makes more sense? So long - MgE -- Matthias G. Eckermann, Director Product Management Linux Platforms SUSE Software Solutions Germany GmbH - Maxfeldstr. 5 - 90409 Nürnberg (HRB 36809, AG Nürnberg) Geschäftsführer: Felix Imendörffer
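To make the --root/--dbpath idea concrete, here is a small Python sketch that only assembles such a command line and prints it; it does not run rpm. The hierarchy /opt/thirdparty, the database path, the package file name and the helper name `third_party_rpm_cmd` are all placeholders chosen for illustration, and how rpm interprets --dbpath in combination with --root should be checked against the rpm man page before relying on it.

```python
import shlex


def third_party_rpm_cmd(package, root="/opt/thirdparty", dbpath="/var/lib/rpm"):
    """Build (but do not execute) an rpm call that targets a separate hierarchy.

    `root` and `dbpath` are hypothetical example values; the point is only
    that 3rd party packages get their own tree and their own database,
    separate from the host's OS database.
    """
    return ["rpm", "--root", root, "--dbpath", dbpath, "--install", package]


if __name__ == "__main__":
    print(shlex.join(third_party_rpm_cmd("vendor-app.rpm")))
```

Keeping the command as an argument list rather than a string avoids quoting problems if it is later handed to something like subprocess.run.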
Hi, my feedback: On Tue, Jan 05, Ludwig Nussel wrote:
tl;dr rpms to only operate below /usr and nowhere else.
I like it, but there are some things to consider or fix:

1) Never put /home on /var: normal users can fill up their home directory and prevent the system from working.
2) As the discussions here show, we still need /opt, but I don't see a problem there.
3) Not sure about /srv. The FHS spec was problematic from the beginning, as it designed /srv as a place where both admins and distributions should store stuff, while distributions are not allowed to modify or overwrite admin changes. We (openSUSE) clearly do it wrong. I think most applications and admins use /home/<app> or something like /data instead. So we could drop it; at the least we should not use it at all.
4) /usr/lib/linux: so the bootloader needs to be able to load the kernel from a different partition than the initrd? I doubt this will work. Copying the kernel image is also problematic, given all the free-space checks, cleanup, etc. that the package manager would normally do for you.
5) No rollback for /etc needed in the final setup? This contradicts a previous comment that /etc needs to be included in a rollback to match the configuration of the installed software. That remains valid with the final approach, as admin-made changes for updated services could break the old version of that service.

Thorsten
Thorsten Kukuk wrote:
Never say never :-) Running out of disk space certainly is a situation the OS needs to be able to handle in a defined way. Who or what is the most likely offender or what 'working' means depends on use case though. On a MicroOS based Rpi weather station I'm certainly not worried that my .bash_history grows out of bounds for example. If I was, I'd also have to take care of /var/tmp, /var/mail etc.
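One possible reading of "handle in a defined way" is that something notices a nearly full volume before services start failing. The Python sketch below checks the free fraction of the file system behind a path. The 5% threshold and the helper names are assumptions made for illustration, not something proposed in this thread.

```python
import shutil


def free_fraction(path: str) -> float:
    """Return the fraction of free space on the file system holding path."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo file systems report a size of zero; treat them as full.
        return 0.0
    return usage.free / usage.total


def is_low_on_space(path: str, min_free: float = 0.05) -> bool:
    """True when less than min_free (a made-up default of 5%) is left."""
    return free_fraction(path) < min_free


if __name__ == "__main__":
    for mount_point in ("/", "/var", "/home"):
        try:
            print(mount_point, "low on space:", is_low_on_space(mount_point))
        except OSError as err:
            print(mount_point, "could not be checked:", err)
```

What should happen once the check fires (refusing logins, pruning caches, alerting) depends on the use case, as noted above.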
IOW there's work to do, yes. Have to check bootupd.
You lost me here. In general OS packages must not and can not mess with /etc directly. So it wouldn't be possible to install files there, nor mangle them in nasty %post scripts. If for whatever reason config files need to be updated, a defined mechanism is to be developed to perform the migration at the right time. That has to work forwards and backwards and has to trigger a mechanism that creates snapshots/revisions of /etc. Yes, there be dragons. Big ones. cu Ludwig -- (o_ Ludwig Nussel //\ V_/_ http://www.suse.com/ SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer HRB 36809 (AG Nürnberg)
Hello, Happy New Year! On 2021-01-05 17:03, Ludwig Nussel wrote:
https://github.com/lnussel/lnussel.github.io/blob/fs/_posts/2020-12-16-fslay...
I am really not an expert in this area but as far as I see you basically propose a new filesystem hierarchy that is different from the Filesystem Hierarchy Standard (FHS). But you use the same directory names as in the old FHS. My gut feeling is that this won't work in practice. I think different things should have different names to be able to distinguish them and to have easier backward compatibility (old names mean the old stuff and new names mean the new one). So I think a new filesystem hierarchy should use new names and then new directory names could even be comprehensible (what the heck does 'usr' mean and why does 'etc' mean "config"). E.g. names that tell the purpose, for example something like
/os (or more explanatory /operating_system)
/boot (or more explanatory /bootloader)
/conf (or more explanatory /admin_config)
/data (or more explanatory /system_data)
/user (or more explanatory /user_data)
/temp (or more explanatory /run_time_data)
/apps (or more explanatory /applications)
...
Then step by step things could be moved from FHS directories to the new ones and when (after some years) nothing uses the old FHS directories they can be dropped. Kind Regards Johannes Meixner -- SUSE Software Solutions Germany GmbH Maxfeldstr. 5 - 90409 Nuernberg - Germany (HRB 36809, AG Nuernberg) GF: Felix Imendoerffer
Hi Am 07.01.21 um 13:12 schrieb Johannes Meixner:
Please, no. This reinvents names for no good reason, and breaks muscle memory. Best regards Thomas
-- Thomas Zimmermann Graphics Driver Developer SUSE Software Solutions Germany GmbH Maxfeldstr. 5, 90409 Nürnberg, Germany (HRB 36809, AG Nürnberg) Geschäftsführer: Felix Imendörffer
Hi Am 07.01.21 um 13:33 schrieb Michal Suchánek:
People have expectations about where to find things. There's only so much you can change without making people really upset. Then you have programs like autotools, which configure directories in variables like bindir, sbindir, includedir. Using entirely different names for the directories creates a mess for developers. If anything, make the new names symlinks. (Not that I advocate for this.) Best regards Thomas
On Thu, Jan 07, 2021 at 01:52:01PM +0100, Thomas Zimmermann wrote:
Why would anyone expect anything in cryptically named three-letter directories unless reading a standard that mandates those braindead names? Doesn't the FHS that mandates those evolve over time and move things around? autotools have those variables exactly for the reason that you might want to specify another place to put your stuff, and you have to most of the time, because the FHS today is not what the filesystem standard was when these tools were designed. Also see the unix video - usr stands for user data but today it's in /home. Yeah, makes perfect nonsense. Thanks Michal
Hi Am 07.01.21 um 14:26 schrieb Michal Suchánek:
Established convention. Consistency with other distros.
Exactly. But when you're writing a build script and have to ask yourself where *that* target has to be installed, having consistent and recognizable naming helps a lot. And autotools is just one program. Many examples and tutorials expect a certain uniformity on Linux.
Also see the unix video - usr stands for user data but today it's in /home. Yeah, makes perfect nonsense.
I'm all for improving the FS layout, but you won't improve things by inventing new names. Remember https://xkcd.com/927/ Best regards Thomas
Thanks
Michal
On Thu, Jan 07, 2021 at 02:41:43PM +0100, Thomas Zimmermann wrote:
I am not saying that it is a great idea for a Linux distribution to move to completely different names and ignore the FHS. That does not change the fact that the FHS is pretty stupid, and I would not mandate using it on anything that is not inherently bound by it. Thanks Michal
Hello Thomas, I think you misunderstood what I meant. On 2021-01-07 14:41, Thomas Zimmermann wrote (excerpts):
Please, no. This reinvents names for no good reason, and breaks muscle memory.
My "good reason" is that new and different things should use different names to make different things distinguishable. Please respond to what I wrote and explain why you think new and different things can/should still use old names.
Established convention. Consistency with other distros.
I think one cannot have something new and different and keep established conventions and consistency with the old stuff.
I'm all for improving the FS layout, but you won't improve things by inventing new names. Remember https://xkcd.com/927/
Only inventing new names would not improve anything and I am not talking about inventing new names to improve something. As far as I understand it Ludwig's proposal is not a backward compatible improvement of the FHS but something new and different. If Ludwig's proposal is a backward compatible improvement of the FHS, then new names would be wrong because a FHS compatible filesystem hierarchy should use FHS names. https://xkcd.com/927/ is not about backward compatible improvements but about different (even competing) standards. Kind Regards Johannes Meixner -- SUSE Software Solutions Germany GmbH Maxfeldstr. 5 - 90409 Nuernberg - Germany (HRB 36809, AG Nuernberg) GF: Felix Imendoerffer
The way I see it, Ludwig has proposed two parallel views/layouts for the file system, one facilitates placement of files according to (openSUSE's implementation of) the FHS (the one most users and software deployers care about), and the new one facilitates storage management (backups, rollbacks, etc.). Thinking of it this way, I like Johannes's idea of using new names for the storage management view, and expanding Ludwig's strategy of using symlinks to fill out the FHS view. David On 1/7/21 6:40 AM, Johannes Meixner wrote:
Dne čtvrtek 7. ledna 2021 14:41:43 CET, Thomas Zimmermann napsal(a):
For newcomers from Windows, everything is the same (or equally strange and new), no problem. But users of contemporary distributions are used to a certain logic, naming, etc. Such people can be very confused. Perfect documentation could fix this. I don't argue against progress (anyway, I don't have the technical skill to foresee all consequences), I'd just point out that we should be careful here. -- Vojtěch Zeisek https://trapa.cz/ Komunita openSUSE GNU/Linuxu Community of the openSUSE GNU/Linux https://www.opensuse.org/
On 1/7/2021 10:06, Vojtěch Zeisek wrote:
Speaking of Windows, they have a System32 directory where 64-bit stuff goes and SysWOW64 where 32-bit stuff goes. And while System32 is legacy, SysWOW64 is relatively new (at least speaking of the timelines in which we are...and yes I do know what WOW64 is; the directory names are still confusing). So FHS isn't the only source of naming which is at least not necessarily descriptive and at worst downright confusing. -- Jason Craig
On 1/7/21 1:12 PM, Johannes Meixner wrote:
It's just historic. "usr" used to be the place for the home directories and "etc" just a place to put files that didn't belong anywhere else [1]. There is an old video by AT&T with an introduction to Unix from the 70s or 80s where the meaning is explained [2]. Adrian
[1] https://ask.slashdot.org/comments.pl?sid=224934&cid=18220458 [2] https://www.youtube.com/watch?v=tc4ROCJYbm0
Johannes Meixner wrote:
On 2021-01-05 17:03, Ludwig Nussel wrote:
https://github.com/lnussel/lnussel.github.io/blob/fs/_posts/2020-12-16-fslay...
Not really. Just trying to get some order into a historically grown mess :-)
If one wanted to create a new OS from scratch that wants to break with its origins, like e.g. Android or MacOS, such names would probably be used, yes. I don't think we are aiming for that though, not me at least. So I think we can use the existing TLAs while specifying their purpose more precisely just fine. After all those names are not really end user visible anyways. cu Ludwig -- (o_ Ludwig Nussel //\ V_/_ http://www.suse.com/ SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer HRB 36809 (AG Nürnberg)
On Tue, 2021-01-05 at 17:03 +0100, Ludwig Nussel wrote:
Quick questions:
1. Rather than merging / and /etc, why not use tmpfs for / proper (merge it with /run, for example), and mount /etc separately?
2. Why no CoW for data in general?
Generally speaking, I like your proposal. It is focused on the principles of hierarchy simplification, snapshotting, and distribution maintenance. Of course, as you have the distro maintainer hat on :-) I believe there are other possible guiding principles for designing the hierarchy, which should be taken into account when we redesign it. Here comes a brain dump.
- NFS root or similar concepts, where system files are shared between multiple hosts. Is this obsolete today, as storage has become so cheap that sharing system directories is no issue any more?
- arch dependence / independence (think /usr/share vs. /usr/lib). (If we redesign the file system, we might consider stashing arch-dependent files into separate hierarchies, going from "lib vs lib64" towards something like "lib.x86_64 vs lib.ppc64le" etc)
- package origin; the /opt vs. /usr topic has been discussed on this thread already. IMO it actually makes sense to store distro-provided and 3rd-party-provided packages in separate file systems (*). But then, if rpm was strictly confined to /usr, we'd either need a separate rpm data base for 3rd party software, or fully switch to "foreign" packaging such as appimage for 3rd parties ...
- ... which leads me to the question how these foreign packages would end up in your scheme. Do you consider them "data"? Would it make sense to reserve a slot in the hierarchy for them?
- other operating systems tend to put everything belonging to a certain "app" into a few app-specific directories. This is how /opt works, and from the 3rd party point of view, it makes a lot of sense. This is mostly about how to organize "data", I suppose. But still worth consideration.
- data with special needs such as VM images (no CoW) or container images (docker storage). Ok, it's "data", but definitely with different requirements than most other data.
- backup strategy. I like this a lot about your scheme; /usr would essentially need no backup at all, as it could be easily recreated from distro packages any time. But /home and /var also require different backup strategies. Being able to use the same backup strategy for an entire file system is a good thing in daily administration.
- privacy and encryption. With per-user encrypted $HOME, it makes sense to store other data belonging to the users in the same file system, like their mail spool or temporary files. World-writable /tmp should be obsolete; basically nothing except the user's own $HOME should be writable by the user. Data shared between users would live in a special, separate area. Similar considerations would hold for services.
- user's personal files may have different requirements as well. Some are precious, some temporary, and many on a middle ground; some are privacy-sensitive but most are not. I for one tend to do all my open-source related work (i.e. all my work) outside of my encrypted $HOME.
I guess most of the stuff I mentioned is probably "data" in your scheme and thus out of scope for your document. But the bottom line is that there are lots of different types of "data", and I'm unsure whether putting it all in a single top directory is really a good idea. Sorry for the long post, I hope at least some thoughts are valuable. Cheers, Martin
(*) IMO 3rd party software installing into /usr should adhere closely to the distro standards, and as these differ between distros, more often than not 3rd party software behaves poorly in this regard. I actually prefer such software installing itself into /opt rather than polluting /usr.
-- Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107 SUSE Software Solutions Germany GmbH HRB 36809, AG Nürnberg GF: Felix Imendörffer
Hello, On 2021-01-08 13:29, Martin Wilck wrote:
I remember "good old MS-DOS times" where each application installed all of itself (programs and data) below its own specific directory (regardless of how that directory was named). So it was easy e.g. to see everything that belongs to that application, how much disk space it wastes, and to get completely rid of it.
I like the Android idea to run different things under different accounts to get them separated. I would like to see that for Linux services in general (except exceptions like systemd). So each service would use its own user and home directory, which is the only writable place for that account. I think for third-party software it would also be good: Each third-party software uses its own separate account and everything that belongs to a third-party software is stored in the home directory of this separate account. So a future filesystem standard should support such use cases. By the way: The cupsd is a fine bad example of a service that runs as root (because sometimes it may need to do some exceptional stuff). About 15 years ago we let the cupsd run as user 'lp', which worked for all normal users' printing cases, but we got many user complaints plus aggressive rejection by upstream ('RunAsUser' got dropped by upstream in a minor release), so we gave up and have let that thingy run as root ever since. Nowadays one can cage that beast in a container. Wow - what great progress! But I think traditional Unix style (like FHS, and everything that needs to do "important" things runs as root) will stay forever (except for some big player that does it all on its own, like Google with Android). Kind Regards Johannes Meixner -- SUSE Software Solutions Germany GmbH Maxfeldstr. 5 - 90409 Nuernberg - Germany (HRB 36809, AG Nuernberg) GF: Felix Imendoerffer
On Fri, Jan 08, Johannes Meixner wrote:
For this we have package managers today, which provide the same functionality much more nicely ;)
That's what we call a container on Linux ;)
With MicroOS and containerized workload, this is already possible today. My router and servers are completely containerized, including a full mailserver setup including dovecot. And of course my nameserver, dhcp, etc. Only chrony is not containerized: chicken-egg problem, you need the correct time to pull a container.
That's why even SAP is moving in the container direction :)
So a future filesystem standard should support such use cases.
The nice things about containers are: they don't care about your filesystem standard, as they are running outside of it.
To be honest, I like a containerized cups running as "root" much more than having a cups running on my system as user "lp".
FHS will not stay forever, FHS is dead, since "FHS upstream" refuses to do anything new. And "important" things running as root are luckily solvable. Thorsten -- Thorsten Kukuk, Distinguished Engineer, Senior Architect SLES & MicroOS SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nuernberg, Germany Managing Director: Felix Imendoerffer (HRB 36809, AG Nürnberg)
Hello, things get somewhat off-topic - but today is Friday ;-) On 2021-01-08 14:51, Thorsten Kukuk wrote:
Do you mean package managers like RPM? RPM never leaves stuff behind when removing an application? Oh yes there is the 'ghost' workaround provided the spec file maker can imagine all files ;-)
I am mainly lamenting that well-known and established technologies did not sufficiently evolve in practice (regardless that in theory it should have been possible), so they will die out and get replaced with something completely new and different. SysV init sucks - let's have systemd ;-) It seems evolution does not happen sustainably in IT; instead, God-mode-like creation from scratch happens? Oh - wait - there is the kernel, which has been evolving for decades. Probably only an accidental exception. Yeah - RPM cannot remove all files belonging to an application, so just use container images instead of RPM packages ;-) Yeah, yeah - FHS is insufficient, so just use containers instead of directories ;-) Yeah, yeah, yeah [1] Unix cannot do "important" things as non-root, so just use containers instead of Unix accounts ;-) Now when all is separated in containers one just needs to orchestrate them to get again the illusion of "one system". But I understand there is no way back. Containers are the future. [1] https://www.songtexte.com/songtext/the-beatles/she-loves-you-6bd292c2.html Kind Regards Johannes Meixner -- SUSE Software Solutions Germany GmbH Maxfeldstr. 5 - 90409 Nuernberg - Germany (HRB 36809, AG Nuernberg) GF: Felix Imendoerffer
On Friday 2021-01-08 13:29, Martin Wilck wrote:
The big traditional problem with nonstandard root devices is that one is constantly fighting the initramfs generators. Some want the / source device to always exist (which is not the case with aufs), and only allow non-existing devices when using some "nfs" mode. But the nfs mode does not support constructing an aufs mount. Not to mention that the classic /etc/fstab did not even have the ability to specify a placeholder such as macaddr, or the mount move operations. One would always end up writing custom shell code for the mount dance, and then try to funnel that into the initramfs.. arguably it's gotten better with the presence of systemd inside the initramfs.
    mount -t nfs fs:/srv/suse /a1
    mount -t nfs fs:/srv/rw/<macaddr> /a2
    mount -t aufs -o upper=/a2:lower=/a1 aufs /sysroot
    mkdir /sysroot/a1
    mkdir /sysroot/a2
    mount --move /a1 /sysroot/a1
    mount --move /a2 /sysroot/a2
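The placeholder problem mentioned above can be illustrated with a small generator. This Python sketch fills the MAC address into the mount sequence from the message and returns the commands as argument lists without executing anything. The function name and the example MAC address are hypothetical. The union mount option string is copied verbatim from the message and may need adjusting to the option syntax of the union file system actually used.

```python
from typing import List


def union_root_commands(macaddr: str, server: str = "fs",
                        sysroot: str = "/sysroot") -> List[List[str]]:
    """Return the mount sequence from the message as argv lists (dry run only)."""
    shared = f"{server}:/srv/suse"           # system tree shared by all hosts (lower)
    private = f"{server}:/srv/rw/{macaddr}"  # per-host branch (upper)
    return [
        ["mount", "-t", "nfs", shared, "/a1"],
        ["mount", "-t", "nfs", private, "/a2"],
        # Option string taken as-is from the message; treat it as a placeholder.
        ["mount", "-t", "aufs", "-o", "upper=/a2:lower=/a1", "aufs", sysroot],
        ["mkdir", f"{sysroot}/a1"],
        ["mkdir", f"{sysroot}/a2"],
        # Keep both branches reachable below the new root.
        ["mount", "--move", "/a1", f"{sysroot}/a1"],
        ["mount", "--move", "/a2", f"{sysroot}/a2"],
    ]


if __name__ == "__main__":
    for argv in union_root_commands("52:54:00:12:34:56"):
        print(" ".join(argv))
```

In a real initramfs the MAC address would have to be read from the booting interface first, which is exactly the kind of host-specific logic that a static /etc/fstab could not express.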
Debian has been doing that for a long time, but as long as no one else is jumping on the train, it probably won't progress.
Martin Wilck wrote:
That would be also a possible model. systemd-nspawn does that for volatile containers. With / on tmpfs what would be the meaning of the kernel's root= parameter then though?
2. Why no CoW for data in general?
That's a good question. Is it safe to assume that services like libvirt, postgres etc behave properly when their data is on a CoW fs?
Absolutely!
When the OS is read-only in /usr and data is clearly separated, sharing should become easier than ever I hope. It really doesn't matter where /usr comes from or how it's updated. The initrd just needs to be able to mount it.
Personally I don't like lib64 but it's not in the way either as long as we don't have to ship systems that run more than its 32 and 64bit variants. To support two completely different architectures in one volume, the /usr trees could be stored in separate partitions or btrfs subvolumes. The discoverable partitions spec prepares for that already.
Exactly.
Containers end up in /var and nobody questioned that so far :-) Where are snaps and flatpaks stored? Also /var or ~ I assume. IMO 3rd party packages are of the same kind.
Yes, as I wrote elsewhere that needs a closer look too. There's at least primary and secondary data where the latter does not need to be backed up.
The most simple solution to avoid thinking too much about that was full disc encryption so far :-) It's a bit brute force but makes sure there are no accidental leaks.
It probably isn't in a specific case indeed :-) The question is basically whether a single volume for data is just good enough most of the time so we can do that by default. cu Ludwig -- (o_ Ludwig Nussel //\ V_/_ http://www.suse.com/ SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer HRB 36809 (AG Nürnberg)