Struggling with dist-upgrade & disk space management
Hello folks,

(I hope this is the right list; if not: apologies, and where would this belong more? users@? support@? Somewhere else?)

I have Tumbleweed installed on a pair of desktops: the first since February 2018, and the second since April 2021. I let the former gather dust since setting up the latter until a couple of weeks ago; I figured that dist-upgrading this "old" setup was the responsible thing to do after waking it up (openSUSE-release 20210621 → 20220122).

The upgrade itself went smoothly, but now I'm facing a problem that I have experienced quite often with this first desktop (though not with the second desktop yet): 'btrfs filesystem usage /' says that… *something* is close to bursting, and I suspect that the next 1GB+ dist-upgrade will have me scrambling again for disk space, and confused about snapshot management.

First, what I think I understand (see footnotes for the commands I use to come to these conclusions):

* I have about 20GB's worth of actual data in the /… subvolume? partition?, estimated by weighing /usr.[1]
* The / subvolume is 40GB, and has almost no free space left.[2]
* My recent upgrade left me with a 15GB snapshot that *I think* is responsible for / being so full?[3]

Onto my questions, ordered from "pressing & down-to-earth" to "mostly of academic interest":

(1) At this stage, how would a Competent Sysadmin™ proceed? Given that (a) this is not the first time I've struggled with this, and (b) I've (tried to) read up on btrfs and snapper but evidently couldn't digest that info into actionable maintenance protocols, I figure there are some fundamental concepts that haven't made their way to my brain yet; I'd be very grateful if someone could nudge me toward the path to Enlightenment.

(2) Is this 2018 setup still salvageable, or should I reinstall? I try to follow factory@lists.opensuse.org and project@lists.opensuse.org, so I know that Tumbleweed underwent some Big Changes between February 2018 and April 2021; I assume that reinstalling should nonetheless not be necessary, and I just messed something up?

(3) What does this huge snapshot represent? The dates in snapper list make me think that it's a diff between the first installation (2018-02-20) and the last upgrade (2022-01-24), in which case, yup, of course the diff would be huge. Empirically though, deleting this huge snapshot just moves all the gigabytes into snapshot 1 (that's what I remember happening the last time I tried to reclaim disk space), which I cannot delete. Is there a way to tell snapper to just forget about 2018 and use the last upgrade as the oldest point of reference?

(4) There are some differences between the two desktop setups I don't understand. For one, 'btrfs filesystem usage /' reports very different device sizes: on the first setup it's 40GB; on the second setup it's roughly 440GB.[4] This is surprising to me, since both disks have similar total sizes (about 500GB).[5] Also, the subvolume setup seems different?[6] Not sure why the older setup has /tmp listed, since findmnt /tmp tells me it's a tmpfs; why is it still on btrfs's radar? Also, why does the newer setup have entries for /root and /home, and not the older one?

Thank you if you've made it this far; I realize that this is a lot of open-ended questions. I tried to do my research, but it seems that the more I dig the less I understand.
FWIW: I very much try to let Tumbleweed do its thing without interfering: I pick the default partitioning & subvolume settings when installing, I don't interact with zypper beside "install" and "dist-upgrade", and were it not for these snapshot snags I would have no reason to research snapper nor btrfs (beside idle curiosity, of course). I'm reaching out solely because dist-upgrade sometimes fails due to lack of space… and I wonder what I'm doing wrong :/

Again, thank you for your time. And thanks for Tumbleweed in general! Lest this wall of text give the wrong impression: the distro provides as smooth an experience as any I've had on Linux.

[1] On the older desktop:

    # du -hs /usr --exclude /usr/local
    19G     /usr

[2] On the older desktop:

    # btrfs filesystem usage /
    Overall:
        Device size:                  40.00GiB
        Device allocated:             39.38GiB
        Device unallocated:          640.00MiB
        Device missing:                  0.00B
        Used:                         37.47GiB
        Free (estimated):              1.82GiB      (min: 1.50GiB)
        Free (statfs, df):             1.82GiB
        Data ratio:                       1.00
        Metadata ratio:                   2.00
        Global reserve:              100.05MiB      (used: 0.00B)
        Multiple profiles:                  no

    Data,single: Size:36.31GiB, Used:35.12GiB (96.71%)
       /dev/sda2      36.31GiB

    Metadata,DUP: Size:1.50GiB, Used:1.18GiB (78.48%)
       /dev/sda2       3.00GiB

    System,DUP: Size:32.00MiB, Used:16.00KiB (0.05%)
       /dev/sda2      64.00MiB

    Unallocated:
       /dev/sda2     640.00MiB

[3] On the older desktop:

    # snapper --iso \
        list --columns number,type,date,used-space,cleanup,description
     # | Type   | Date                | Used Space | Cleanup | Description
    ---+--------+---------------------+------------+---------+----------------------
    0  | single |                     |            |         | current
    1* | single | 2018-02-20 20:13:41 | 237.26 MiB |         | first root filesystem
    2  | pre    | 2022-01-24 07:39:54 | 15.98 GiB  | number  | zypp(zypper)
    3  | post   | 2022-01-24 12:58:38 | 12.64 MiB  | number  |
    4  | pre    | 2022-01-24 13:46:28 | 960.00 KiB | number  | zypp(zypper)
    5  | post   | 2022-01-24 13:49:01 | 73.57 MiB  | number  |

[4] On the newer desktop:

    # btrfs filesystem usage / | head
    Overall:
        Device size:                 444.63GiB
        Device allocated:            282.05GiB
        Device unallocated:          162.58GiB
        Device missing:                  0.00B
        Used:                        272.94GiB
        Free (estimated):            170.77GiB      (min: 170.77GiB)
        Free (statfs, df):           170.77GiB
        Data ratio:                       1.00
        Metadata ratio:                   1.00

[5] On the older desktop:

    # parted -l
    Model: ATA ST500DM002-1BD14 (scsi)
    Disk /dev/sda: 500GB
    Sector size (logical/physical): 512B/4096B
    Partition Table: gpt
    Disk Flags:

    Number  Start   End     Size    File system     Name  Flags
     1      1049kB  9437kB  8389kB                        bios_grub
     2      9437kB  43.0GB  42.9GB  btrfs                 legacy_boot
     3      43.0GB  492GB   449GB   xfs
     4      492GB   500GB   8310MB  linux-swap(v1)  swap

On the newer desktop:

    # parted -l
    Model: ATA LDLC F7+480GB (scsi)
    Disk /dev/sda: 480GB
    Sector size (logical/physical): 512B/512B
    Partition Table: gpt
    Disk Flags:

    Number  Start   End    Size    File system     Name  Flags
     1      1049kB  538MB  537MB   fat32                 boot, esp
     2      538MB   478GB  477GB   btrfs
     3      478GB   480GB  2148MB  linux-swap(v1)  swap

[6] On the older desktop:

    # btrfs subvolume list / | grep -v snapshots/
    ID 257 gen 622718 top level 5 path @
    ID 258 gen 638632 top level 257 path @/var
    ID 259 gen 637339 top level 257 path @/usr/local
    ID 260 gen 622718 top level 257 path @/tmp
    ID 261 gen 637339 top level 257 path @/srv
    ID 262 gen 637339 top level 257 path @/opt
    ID 263 gen 629216 top level 257 path @/boot/grub2/x86_64-efi
    ID 264 gen 637331 top level 257 path @/boot/grub2/i386-pc
    ID 265 gen 637331 top level 257 path @/.snapshots
    ID 269 gen 637339 top level 258 path @/var/lib/machines

On the newer desktop:

    # btrfs subvolume list / | grep -v -e snapshots/ -e docker
    ID 256 gen 31 top level 5 path @
    ID 257 gen 285669 top level 256 path @/var
    ID 258 gen 285613 top level 256 path @/usr/local
    ID 259 gen 281519 top level 256 path @/srv
    ID 260 gen 285554 top level 256 path @/root
    ID 261 gen 281519 top level 256 path @/opt
    ID 262 gen 285671 top level 256 path @/home
    ID 263 gen 281519 top level 256 path @/boot/grub2/x86_64-efi
    ID 264 gen 281519 top level 256 path @/boot/grub2/i386-pc
    ID 265 gen 285470 top level 256 path @/.snapshots
On 13.02.2022 20:13, Kévin Le Gouguec wrote:
Hello folks,
(I hope this is the right list; if not: apologies, and where would this belong more? users@? support@? Somewhere else?)
I have Tumbleweed installed on a pair of desktops: the first since February 2018, and the second since April 2021. I let the former gather dust since setting up the latter until a couple of weeks ago; I figured that dist-upgrading this "old" setup was the responsible thing to do after waking it up (openSUSE-release 20210621 → 20220122).
The upgrade itself went smoothly, but now I'm facing a problem that I have experienced quite often with this first desktop (though not with the second desktop yet): 'btrfs filesystem usage /' says that… *something* is close to bursting, and I suspect that the next 1GB+ dist-upgrade will have me scrambling again for disk space, and confused about snapshot management.
First, what I think I understand (see footnotes for the commands I use to come to these conclusions):
* I have about 20GB's worth of actual data in the /… subvolume? partition?, estimated by weighing /usr.[1]
Yes, it looks like it.
* The / subvolume is 40GB, and has almost no free space left.[2]
* My recent upgrade left me with a 15GB snapshot that *I think* is responsible for / being so full?[3]
Yes. This snapshot consumes 15GB of unique (non-shared) data. Showing btrfs qgroup show / may give some more information.
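For reading that output (my understanding, as a sketch): the rfer column is all the data a qgroup references, while excl is the data it references exclusively, i.e. roughly the space that would come back if that subvolume alone were deleted:

    # quota accounting is normally enabled by snapper on openSUSE;
    # if the command complains about quotas being disabled, they can
    # be switched on with: btrfs quota enable /
    # btrfs qgroup show /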
(2) Is this 2018 setup still salvageable, or should I reinstall? I try to follow factory@lists.opensuse.org and project@lists.opensuse.org, so I know that Tumbleweed underwent some Big Changes between February 2018 and April 2021; I assume that reinstalling should nonetheless not be necessary, and I just messed something up?
Delete the offending snapshot (as long as you are sure you will not need it for rollback):

snapper delete 2

Note that this may move accounting of this data to a different snapshot; in the worst case you may need to delete all of them.
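If memory serves, snapper can also block until btrfs has actually committed the freed space, which makes it easier to confirm the deletion helped (a sketch; --sync may be missing in very old snapper versions):

    # delete snapshot 2 and wait until the space is really freed
    snapper delete --sync 2
    # then re-check
    btrfs filesystem usage /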
(3) What does this huge snapshot represent? The dates in snapper list make me think that it's a diff between the first installation (2018-02-20) and the last upgrade (2022-01-24), in which case, yup, of course the diff would be huge.
So look inside this snapshot? A snapshot is the filesystem content at some point in time. A snapshot "grows" when the content of the actual (root) filesystem changes and starts to differ from what is in the snapshot. If you have 15G of packaged files and did a huge update that touches *all* packages (there were several not long ago), then the snapshot will have the old files and your root will have the new files.
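One way to "look inside" without mounting anything (a sketch, using the pre/post numbers from the snapper list quoted earlier): snapper can compare two snapshots directly:

    # list files that differ between the pre (2) and post (3) snapshots
    snapper status 2..3 | head
    # show the actual content difference for a single file
    snapper diff 2..3 /etc/os-release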
Empirically though deleting this huge snapshot just moves all the gigabytes into snapshot 1 (that's what I remember happening the last time I tried to reclaim disk space), which I cannot delete. Is there a way to tell snapper to just forget about 2018 and use the last upgrade as the oldest point of reference?
No, that should not happen. Snapshot 1 is your actual root filesystem; you cannot delete it. So show the actual results after "snapper delete 2".
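This can be checked directly (a sketch): on openSUSE the installer makes snapshot 1 the default subvolume, so it is what actually gets mounted at /:

    # show the default subvolume (should point at @/.snapshots/1/snapshot)
    btrfs subvolume get-default /
    # and what / is mounted from
    findmnt -no SOURCE,OPTIONS /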
(4) There are some differences between the two desktop setups I don't understand. For one, 'btrfs filesystem usage /' reports very different device sizes: on the first setup it's 40GB; on the second setup it's roughly 440GB.[4] This is surprising to me, since both disks have similar total sizes (about 500GB).[5]
There are disks and there are partitions. Your "old" desktop has a 40G root partition; the rest is a partition with an XFS filesystem, probably /home (at some point the Tumbleweed installation defaulted to this layout). 40G is a bit tight and suitable for a basic installation only.
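This is easy to see at a glance (a sketch, assuming the device name from the parted output in the footnotes):

    # how the disk is carved up, and which partition / really lives on
    lsblk -f /dev/sda
    findmnt /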
Also, the subvolume setup seems different?[6] Not sure why the older setup has /tmp listed, since findmnt /tmp tells me it's a tmpfs; why is it still on btrfs's radar? Also, why does the newer setup have entries for /root and /home, and not the older one?
Yes, things change over time. ...
[3] On the older desktop:

    # snapper --iso \
        list --columns number,type,date,used-space,cleanup,description
     # | Type   | Date                | Used Space | Cleanup | Description
    ---+--------+---------------------+------------+---------+----------------------
    0  | single |                     |            |         | current
    1* | single | 2018-02-20 20:13:41 | 237.26 MiB |         | first root filesystem
    2  | pre    | 2022-01-24 07:39:54 | 15.98 GiB  | number  | zypp(zypper)
    3  | post   | 2022-01-24 12:58:38 | 12.64 MiB  | number  |
    4  | pre    | 2022-01-24 13:46:28 | 960.00 KiB | number  | zypp(zypper)
    5  | post   | 2022-01-24 13:49:01 | 73.57 MiB  | number  |
...
[6] On the older desktop:

    # btrfs subvolume list / | grep -v snapshots/
    ID 257 gen 622718 top level 5 path @
    ID 258 gen 638632 top level 257 path @/var
    ID 259 gen 637339 top level 257 path @/usr/local
    ID 260 gen 622718 top level 257 path @/tmp
    ID 261 gen 637339 top level 257 path @/srv
    ID 262 gen 637339 top level 257 path @/opt
    ID 263 gen 629216 top level 257 path @/boot/grub2/x86_64-efi
    ID 264 gen 637331 top level 257 path @/boot/grub2/i386-pc
    ID 265 gen 637331 top level 257 path @/.snapshots
    ID 269 gen 637339 top level 258 path @/var/lib/machines
That does not match. Please, NEVER filter output. Now we have no idea whether your system is seriously broken or you just decided to not show this data. In particular, omitting snapshots will make interpreting the output of btrfs qgroup show / nearly impossible.
On 13.02.22 19:04, Andrei Borzenkov wrote:
40G is a bit tight and suitable for the basic installation only.
I seem to remember that 40G root size was the default until some time ago. Colleagues of mine stepped into that trap when they had to update SLES12-SPx (unpatched after installation) to SP(x+1) on a system that had been installed in a stupid "use all defaults => next => next => next => finish" installation. The only luck they had was that they could roll back, remove all previous snapshots and then update in smaller batches :-)

IMHO 100GB root partition size is the absolute minimum when using BTRFS, unless you are really running a rolling Tumbleweed installation and update at least once per week to keep the snapshots relatively small.

--
Stefan Seyfried

"For a successful technology, reality must take precedence over public relations, for nature cannot be fooled." -- Richard Feynman
First off, thanks for taking the time to sift through this info-dump…
and sorry for the bad judgment calls wrt filtering output.
Replying point-by-point.
Andrei Borzenkov writes:
On 13.02.2022 20:13, Kévin Le Gouguec wrote:
* The / subvolume is 40GB, and has almost no free space left.[2]
* My recent upgrade left me with a 15GB snapshot that *I think* is responsible for / being so full?[3]
Yes. This snapshot consumes 15GB of unique (non-shared) data. Showing
btrfs qgroup show /
may give some more information.
Attached below, along with the full output of 'btrfs subvolume list /'.
(2) Is this 2018 setup still salvageable, or should I reinstall? I try to follow factory@lists.opensuse.org and project@lists.opensuse.org, so I know that Tumbleweed underwent some Big Changes between February 2018 and April 2021; I assume that reinstalling should nonetheless not be necessary, and I just messed something up?
Delete the offending snapshot (as long as you are sure you will not need it for rollback).
snapper delete 2
Note that this may move accounting of this data to a different snapshot; in the worst case you may need to delete all of them.
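Deleting several snapshots does not have to happen one by one; snapper accepts ranges (a sketch; the range matches the snapshots in the list quoted earlier):

    # delete snapshots 2 through 5 in one go
    snapper delete 2-5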
OK; haven't done that yet, since I'd like to make sure I haven't elided important information in my opening post.
(3) What does this huge snapshot represent? The dates in snapper list make me think that it's a diff between the first installation (2018-02-20) and the last upgrade (2022-01-24), in which case, yup, of course the diff would be huge.
So look inside this snapshot? A snapshot is the filesystem content at some point in time. A snapshot "grows" when the content of the actual (root) filesystem changes and starts to differ from what is in the snapshot. If you have 15G of packaged files and did a huge update that touches *all* packages (there were several not long ago), then the snapshot will have the old files and your root will have the new files.
Mmm, 'ls -lt /.snapshots/[23]/snapshot/usr/bin' says that the most recent file in snapshot 2 dates from 2021-06-23 (that would be the last upgrade I ran on the desktop before shelving it), while the most recent file in snapshot 3 dates from 2022-01-24 (the "wake-up call" upgrade). So maybe the date on snapshot 1 (2018-02-20) is a red herring, and everything is Perfectly Normal™: the diff between 2021-06-23 and 2022-01-24 (snapshot 2 and 3) simply does weigh ¾ of /usr.
Empirically though deleting this huge snapshot just moves all the gigabytes into snapshot 1 (that's what I remember happening the last time I tried to reclaim disk space), which I cannot delete. Is there a way to tell snapper to just forget about 2018 and use the last upgrade as the oldest point of reference?
No, that should not happen. Snapshot 1 is your actual root filesystem. You cannot delete it.
So show the actual results after "snapper delete 2".
That will be my next move. Hopefully this "moving into snapshot 1" stuff is just a figment of my deranged mind.
40G is a bit tight and suitable for the basic installation only.
(FWIW, as I mentioned in my message, I don't think I've messed with Tumbleweed's defaults there)
Also, the subvolume setup seems different?[6] Not sure why the older setup has /tmp listed, since findmnt /tmp tells me it's a tmpfs; why is it still on btrfs's radar? Also, why does the newer setup have entries for /root and /home, and not the older one?
Yes, things change over time.
Sure; is there a point where these residual discrepancies could be a problem though, requiring a full reinstall?
[3] On the older desktop:

    # snapper --iso \
        list --columns number,type,date,used-space,cleanup,description
     # | Type   | Date                | Used Space | Cleanup | Description
    ---+--------+---------------------+------------+---------+----------------------
    0  | single |                     |            |         | current
    1* | single | 2018-02-20 20:13:41 | 237.26 MiB |         | first root filesystem
    2  | pre    | 2022-01-24 07:39:54 | 15.98 GiB  | number  | zypp(zypper)
    3  | post   | 2022-01-24 12:58:38 | 12.64 MiB  | number  |
    4  | pre    | 2022-01-24 13:46:28 | 960.00 KiB | number  | zypp(zypper)
    5  | post   | 2022-01-24 13:49:01 | 73.57 MiB  | number  |
...
[6] On the older desktop:

    # btrfs subvolume list / | grep -v snapshots/
    ID 257 gen 622718 top level 5 path @
    ID 258 gen 638632 top level 257 path @/var
    ID 259 gen 637339 top level 257 path @/usr/local
    ID 260 gen 622718 top level 257 path @/tmp
    ID 261 gen 637339 top level 257 path @/srv
    ID 262 gen 637339 top level 257 path @/opt
    ID 263 gen 629216 top level 257 path @/boot/grub2/x86_64-efi
    ID 264 gen 637331 top level 257 path @/boot/grub2/i386-pc
    ID 265 gen 637331 top level 257 path @/.snapshots
    ID 269 gen 637339 top level 258 path @/var/lib/machines
That does not match. Please, NEVER filter output. Now we have no idea whether your system is seriously broken or you just decided to not show this data.
Apologies; the point of this last footnote was to compare the subvolume layout with the second desktop I installed in April; I did not expect it to be useful for debugging the free-space issue. Full output below, followed by 'btrfs qgroup show /':

    # btrfs subvolume list /
    ID 257 gen 622718 top level 5 path @
    ID 258 gen 640101 top level 257 path @/var
    ID 259 gen 640082 top level 257 path @/usr/local
    ID 260 gen 622718 top level 257 path @/tmp
    ID 261 gen 637339 top level 257 path @/srv
    ID 262 gen 637339 top level 257 path @/opt
    ID 263 gen 629216 top level 257 path @/boot/grub2/x86_64-efi
    ID 264 gen 639631 top level 257 path @/boot/grub2/i386-pc
    ID 265 gen 640082 top level 257 path @/.snapshots
    ID 266 gen 640082 top level 265 path @/.snapshots/1/snapshot
    ID 269 gen 637339 top level 258 path @/var/lib/machines
    ID 1958 gen 629495 top level 265 path @/.snapshots/2/snapshot
    ID 1972 gen 639631 top level 265 path @/.snapshots/3/snapshot
    ID 1973 gen 639631 top level 265 path @/.snapshots/4/snapshot
    ID 1974 gen 639631 top level 265 path @/.snapshots/5/snapshot

    # btrfs qgroup show /
    qgroupid         rfer         excl
    --------         ----         ----
    0/5          16.00KiB     16.00KiB
    0/257        16.00KiB     16.00KiB
    0/258         1.28GiB      1.28GiB
    0/259       296.71MiB    296.71MiB
    0/260        16.00KiB     16.00KiB
    0/261        16.00KiB     16.00KiB
    0/262       275.77MiB    275.77MiB
    0/263        16.00KiB     16.00KiB
    0/264         2.69MiB      2.69MiB
    0/265        15.45MiB     15.45MiB
    0/266        17.95GiB    252.15MiB
    0/269        16.00KiB     16.00KiB
    0/1958       17.35GiB     15.98GiB
    0/1972       18.04GiB     12.64MiB
    0/1973       18.04GiB    960.00KiB
    0/1974       17.89GiB     75.93MiB

IIUC the highest count of "bytes owned exclusively" goes to qgroupid 0/1958, which would be snapshot 2. Not sure how to interpret this information though; e.g. is there a way to reconcile 'btrfs filesystem usage /' (which says "Used: 37.47GiB") with the "exclusive" count?

    # btrfs qgroup show / --raw | tr -s ' ' | cut -d' ' -f3 \
        | tail -n+3 | paste -sd+ | bc | numfmt --to=si
    20G

My takeaways so far:

* This is probably all par for the course, since my / subvolume is barely twice as big as all the stuff I put in /usr.
* In these conditions, any full-system upgrade is bound to fill the subvolume.
* I guess I was doubtful that the diff before/after any given dist-upgrade would be so big; thanks for lifting the veil from my eyes.
* Next time I install Tumbleweed, I should consider not doing a… how did Stefan put it? "a stupid "use all defaults => next => next => next => finish" installation" 🤪

Again, thanks for your time; I'll wait a bit to make sure no-one sees something obviously wrong with the information I've added in this reply, then go ahead and delete snapshot 2. I'll try to follow up afterward, if only to confirm that this was all much ado about nothing.
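Another cross-check worth knowing about (a sketch; btrfs filesystem du needs a reasonably recent btrfs-progs): btrfs itself can break usage down per snapshot into total, exclusive, and shared extents:

    # per-snapshot breakdown: Total, Exclusive, Set shared
    btrfs filesystem du -s /.snapshots/*/snapshot

The exclusive numbers will not add up to the "Used: 37.47GiB" figure, because extents shared by several snapshots are exclusive to none of them, and metadata is not counted.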
Stefan Seyfried wrote:
On 13.02.22 19:04, Andrei Borzenkov wrote:
40G is a bit tight and suitable for the basic installation only.

I seem to remember that 40G root size was the default until some time ago. Colleagues of mine stepped into that trap when they had to update SLES12-SPx (unpatched after installation) to SP(x+1) on a system that had been installed in a stupid "use all defaults => next => next => next => finish" installation. The only luck they had was that they could roll back, remove all previous snapshots and then update in smaller batches :-)

IMHO 100GB root partition size is the absolute minimum when using BTRFS, unless you are really running a rolling Tumbleweed installation and update at least once per week to keep the snapshots relatively small.
I find that for most users ~40G is a pretty decent amount of space for the root partition. Considering that we must keep something between 15-20% of that space free for performance reasons, we still have 32-34G, give or take, of space for the system plus snapshots.

Now, IMHO, the default Snapper configuration is what doesn't make much sense for the average user:

| # limit for number cleanup
| NUMBER_MIN_AGE="1800"
| NUMBER_LIMIT="50"
| NUMBER_LIMIT_IMPORTANT="10"

This will limit "number" snapshots to _50_, plus 10 important snapshots. At the moment I use a limit of only 6, and 4 important ones. I'd say experimenting a little bit to find a nice balance here is worth one's while.

| # create hourly snapshots
| TIMELINE_CREATE="yes"

Now, what to say about this one, for average users. If anyone has a use case on a domestic PC/laptop for taking snapshots of the system every hour, I'm all ears. I disabled this one on my system.

| # cleanup hourly snapshots after some time
| TIMELINE_CLEANUP="yes"

Since I disabled "timeline" snapshot creation, I disabled this one too, so its service unit doesn't even bother to get fired up only to find out that there is no work to be done.

| # limits for timeline cleanup
| TIMELINE_MIN_AGE="1800"
| TIMELINE_LIMIT_HOURLY="10"
| TIMELINE_LIMIT_DAILY="10"
| TIMELINE_LIMIT_WEEKLY="0"
| TIMELINE_LIMIT_MONTHLY="10"
| TIMELINE_LIMIT_YEARLY="10"

Again, if anyone has any reason for keeping snapshots for more than a year, please tell us.

Another thing I like doing every week or two is to delete all snapshots and manually take one. Sometimes I leave only the one(s) I took manually. Of course, if you set up good numbers that work for you, you can just let things roll without worrying much about it.

All this rant is about regular users. For packagers/developers/testers and whatnot, a bigger / root file system plus some nice /etc/snapper/configs/root tweaking will make things smoother for sure.

I hope this is useful for someone.

Take care,
Luciano.
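For anyone who wants to apply limits like Luciano's without editing /etc/snapper/configs/root by hand (a sketch; the numbers are his examples, not a recommendation):

    # tighten the "number" cleanup limits and disable hourly timeline snapshots
    snapper -c root set-config \
        NUMBER_LIMIT=6 NUMBER_LIMIT_IMPORTANT=4 TIMELINE_CREATE=no
    # verify the resulting configuration
    snapper -c root get-config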