[opensuse] Re: thin volumes, was: average partition and file sizes & file I/O speed
On Tue, Mar 17, 2015 at 8:32 PM, Linda Walsh <suse@tlinx.org> wrote:
Chris Murphy wrote:
shrink = fstrim; grow = just start the thing out at 4x the size you think you'll ever need in its lifetime.
--- Eh... xfs doesn't shrink.
No need when it's on a virtual sized (thin) volume. When you use fstrim, unused LE's are returned to the pool. It's actually more efficient than an fs shrink would be.
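Concretely, the cycle described here looks roughly like this (a sketch only; the mount point, VG, and pool names are hypothetical and will differ on a real system):

```shell
# XFS on a thin LV: delete files, then hand the freed blocks back
# to the thin pool as unused logical extents.
fstrim -v /srv/data

# Watch the pool's real allocation drop in the Data% column:
lvs -o lv_name,data_percent,metadata_percent vg0/thinpool
```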
And as was announced today, (AGAIN), btrfs isn't stable in latest kernel.
? I'm not seeing anything on opensuse-bugs@ about it, or on linux-btrfs@ related to this. There are always some edge cases being run into. XFS is more stable, but bugs still get found there too.
So thin snaps w/XFS -- not quite there... whereas this script is pretty reliable: it checkpoints and can pick up a failed diff from the day before, before continuing with the current day's.
The issue I've had with thin snapshots is related to LVM metadata getting full and then the entire VG imploding spectacularly, with none of the LVs in it working. I wasn't exactly trying to blow it up, but I was testing thin_pool_autoextend_threshold before I realized the default was 100, meaning it's disabled. So it's user error, but it could also arguably be a weird default that results in a distinctly non-graceful failure with no obvious way to repair the problem. Neither pvck nor vgck did anything, and I quickly gave up. But then, vgmetadatacopies = 0 is also a default, so maybe they couldn't do anything to fix it.

Btrfs correctly gets some grief with its many repair options still not consolidated, but I haven't had it implode this spectacularly.

-- Chris Murphy

-- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
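For anyone wanting to avoid the same trap, the two defaults in question live in /etc/lvm/lvm.conf. A hedged sketch (the specific numbers are illustrative, not recommendations):

```
activation {
    # Default is 100, which disables automatic extension entirely.
    # Any value below 100 makes dmeventd grow the pool once it
    # crosses that fill percentage.
    thin_pool_autoextend_threshold = 70
    thin_pool_autoextend_percent = 20
}
metadata {
    # Default is 0 ("unmanaged"); a non-zero value keeps that many
    # copies of the VG metadata on PVs, giving vgcfgrestore
    # something to work from after a failure.
    vgmetadatacopies = 2
}
```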
Chris Murphy wrote:
And as was announced today, (AGAIN), btrfs isn't stable in latest kernel.
? I'm not seeing anything on opensuse-bugs@ about it, or on linux-btrfs@ related to this. There are always some edge cases being run into. XFS is more stable, but bugs still get found there too.
Sorry, was talking about this..

-------- Original Message --------
Subject: btrfs Kernel Bug in Linux Kernel 3.19
Date: Sat, 14 Mar 2015 17:20:42 -0700
From: Sam M. <backgroundprocess@gmail.com>
To: opensuse@opensuse.org

Is anyone familiar with this bug?:
http://permalink.gmane.org/gmane.comp.file-systems.btrfs/42274

I dropped back to 3.16.7 after being on 3.19.x because I think I experienced this bug...
-----------------------------------------
The issue I've had with thin snapshots is related to LVM metadata getting full and then the entire VG imploding spectacularly and none of the LV's in it working.
*ouch*
I wasn't exactly trying to blow it up,
hmmm....where have I heard that before? (i.e., my normal usage is often defined as HW/SW abuse... ;-) -- but they don't understand: my degree is in computer science, so it certainly doesn't look weird to me! ;-) )
Btrfs correctly gets some grief with its many repair options still not consolidated, but I haven't had it implode this spectacularly.
reminds me of a few early fun sessions in Win7 with its "restore system to previous state" function -- where it "ate" my disk twice in the first 3 months it was out (ate = randomly deleted about 50% of the files, no apparent pattern, reinstall required). System restores and 'rollback' functions make me very nervous after those experiences.

Recently the worst has been it using >100G for snapshots/earlier sessions, but all of them being unrestorable because it couldn't restore some font I didn't use or care about... Have they ever heard of "restore what you can", or ask me if I need foobar.ttf? Ug
On Thu, Mar 19, 2015 at 7:54 PM, Linda Walsh <suse@tlinx.org> wrote:
Chris Murphy wrote:
And as was announced today, (AGAIN), btrfs isn't stable in latest kernel.
? I'm not seeing anything on opensuse-bugs@ about it, or on linux-btrfs@ related to this. There are always some edge cases being run into. XFS is more stable, but bugs still get found there too.
--- Sorry, was talking about this..
-------- Original Message -------- Subject: btrfs Kernel Bug in Linux Kernel 3.19 Date: Sat, 14 Mar 2015 17:20:42 -0700 From: Sam M. <backgroundprocess@gmail.com> To: opensuse@opensuse.org
Is anyone familiar with this bug?:
http://permalink.gmane.org/gmane.comp.file-systems.btrfs/42274
It's a 3.19.0-rc4 kernel, not a released mainline kernel. Currently 3.19.2 is stable, so this is sorta old news. I haven't heard about it still being a problem; I never ran into it even with 3.19-rc4. These sorts of edge-case regressions in Btrfs do still happen with rc kernels sometimes. I think it's been a year since a major bug appeared in a released mainline kernel, though, and it was fixed soon after, no data was lost, and there was a btrfs check fix for it also.
I dropped back to 3.16.7 after being on 3.19.x because I think I experienced this bug... -----------------------------------------
The issue I've had with thin snapshots is related to LVM metadata getting full and then the entire VG imploding spectacularly and none of the LV's in it working.
--- *ouch*
I wasn't exactly trying to blow it up,
--- hmmm....where have I heard that before? (i.e., my normal usage is often defined as HW/SW abuse... ;-) -- but they don't understand: my degree is in computer science, so it certainly doesn't look weird to me! ;-) )
Btrfs correctly gets some grief with its many repair options still not consolidated, but I haven't had it implode this spectacularly.
--- reminds me of a few early fun sessions in Win7 with its "restore system to previous state" function -- where it "ate" my disk twice in the first 3 months it was out (ate = randomly deleted about 50% of the files, no apparent pattern, reinstall required). System restores and 'rollback' functions make me very nervous after those experiences.
Yeah, they're not the same thing. The NTFS shadow copy is a completely different design from Btrfs. In the realm of a couple hundred subvolumes or snapshots, the file system is quite stable on stable hardware. There are some scalability issues with many more snapshots, but it's primarily a performance problem. Offhand I haven't seen this itself cause a Btrfs volume to unravel or become inconsistent; we're at a level of stability where things are increasingly just edge cases, so not nearly as many people experience the same thing.
Recently the worst has been it using >100G for snapshots/earlier sessions, but all of them being unrestorable because it couldn't restore some font I didn't use or care about... Have they ever heard of "restore what you can", or ask me if I need foobar.ttf? Ug
Right. Well, despite Btrfs still being improved upon, already I've had good luck, even with willful sabotage, getting it to at least mount read-only and update a backup, or in the worst case, sucking data offline with btrfs restore (basically a scraping tool).

On the CentOS list recently someone had a single linear LV backed by 4 PVs. Yup. And one PV died, and they wanted some data off the file system. So, restore what you can? What do you expect in such a case? Well, with one PV dead, the file system has a big hole in it, and doesn't recover. I tried this with ext4 and it can't recover anything. On XFS on an LV across all four drives, I recovered 452MB out of 3.5GB. And on Btrfs with -d single -m raid1 across 4 drives, it recovered exactly 3/4: 100% of the data on the remaining drives was recoverable.

Gotcha? Btrfs by default uses DUP metadata at mkfs time. So if you just start adding drives, that doesn't change: one drive has all the metadata. So you should remember to balance with -mconvert=raid1 when moving to a multiple-device Btrfs; at the moment it's not automatic.

-- Chris Murphy
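The gotcha in that last paragraph can be sketched as a command sequence (device names and the mount point are hypothetical, and mkfs defaults vary by version and media type):

```shell
# Single-device mkfs: metadata defaults to DUP (two copies, one drive).
mkfs.btrfs /dev/sdb
mount /dev/sdb /mnt

# Adding a drive does NOT re-replicate the existing metadata...
btrfs device add /dev/sdc /mnt

# ...so explicitly convert metadata to raid1 across the devices:
btrfs balance start -mconvert=raid1 /mnt
```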
Chris Murphy wrote:
On the CentOS list recently someone had a single linear LV backed by 4 PVs. Yup. And one PV died, and they wanted some data off the file system.
Well, my biggest PV = 1 VG: I subdivide that into LVs, but it's composed of 24 HDs in a RAID10. On top of that, another similar config has daily incremental (Tower of Hanoi-style) backups going back an average of 3-4 months before they expire. The most a snapshot goes back is about 1 month -- they're purely for convenience at this point.
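As an aside, the Tower of Hanoi rotation mentioned above assigns each backup session a level equal to the number of trailing zero bits in the session number (the "ruler sequence"), so higher levels are overwritten exponentially less often. A minimal sketch (the function name is made up for illustration):

```shell
#!/bin/sh
# Tower of Hanoi rotation: session n (1-based) reuses the level equal
# to the number of trailing zero bits in n, so level k is touched only
# every 2^(k+1) sessions.
hanoi_level() {
  n=$1
  level=0
  while [ "$n" -gt 0 ] && [ $((n % 2)) -eq 0 ]; do
    n=$((n / 2))
    level=$((level + 1))
  done
  echo "$level"
}

# First eight sessions: 0 1 0 2 0 1 0 3
for day in 1 2 3 4 5 6 7 8; do
  printf '%s ' "$(hanoi_level "$day")"
done
echo
```

Session 8 therefore lands on level 3, which isn't reused until session 24 -- that's what lets a handful of levels cover months of history.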
So, restore what you can? What do you expect in such a case?
Sounds like it didn't have any redundancy -- sorta like my 4-disk RAID0 on my workstation (all SSDs). Any one of them goes, they all go... though I have had some success restoring from an MS image-backup in the past, and I do take a new image-backup weekly. But the workstation "just" has the software installed on it -- all the content is on the server with RAID10 + full incremental backups + convenience snapshots. I *really* don't like losing data. Underlying my Linux system disks there's no LV -- just a 3-disk RAID5 (though they are all backed up daily).
Gotcha? Btrfs by default uses DUP metadata at mkfs time. So if you just start adding drives, that doesn't change: one drive has all the metadata. So you should remember to balance with -mconvert=raid1 when moving to a multiple-device Btrfs; at the moment it's not automatic.
I'm NOT ruling out trying another FS, or moving to it should I like it, but I've been using XFS since 1995 and only once had a SW issue thrash the disk (restore from backup). So... with about 2 decades of usage, I'm reasonably 'ok' w/XFS's stability. (Before serviceD, most of my wounds were self-inflicted; when it came along, many things stopped working... piece by piece.)

Had a small glitch pop up adding some new libs: libsasl2 -- used by sudo -- wanted libsasl2.so.2, which the lovely RH-Pkg-Manager deleted for me and replaced with an incompatible libsasl2.so.3. But fortunately I have copies of root and /usr, so now:

/usr/lib64/libsasl2.so -> libsasl2.so.2.0.25*
/usr/lib64/libsasl2.so.2 -> libsasl2.so.2.0.25*
/usr/lib64/libsasl2.so.2.0.25*
/usr/lib64/libsasl2.so.3 -> libsasl2.so.3.0.0*
/usr/lib64/libsasl2.so.3.0.0*

I have both -- as *should* be available as an option in the installer, but the installer doesn't support the linux/unix versioned-library system, where you can have multiple versions sitting side-by-side (as above).

MS has had that now since Vista -- you can have many different versions of the same file -- and most of the apps you have had for 10 years still run on the current OS. Let's see a linux vendor even begin to make a claim for "stability" -- IMO, that's one of the biggest reasons for linux's lack of success in the consumer and business markets. There's almost no way an app you bought that worked on opensuse 5 years ago is likely to run at all on 13.2. But the fact is, many XP-era programs still run (just a bit faster) on Win7, 15 years later.

Having a reasonable install and upgrade process that preserves old libraries until it is certain that no progs depend on them wouldn't be rocket science. Right now, I try to parse the output of 'prelink' to see if anything is missing. Works ok 'cept when prelink dumps core... ;-(

(Yeah, I meander a bit in conversations, but in my mind they are connected -- backups to restoring things to working state... snapshots, etc...)

*cheers*
Linda
On Thu, Mar 19, 2015 at 10:58 PM, Linda Walsh <suse@tlinx.org> wrote:
I'm NOT ruling out trying another FS or moving to it should I like it, but I've been using XFS since 1995. Only once had a SW issue thrash the disk (restore from backup). So ... with about 2-decades of usage, I'm reasonably 'ok' w/XFS's stability.
I trust XFS a lot more than hard drives. The thing is, XFS implicitly trusts the hardware. The metadata checksumming coming soon somewhat alters that perspective. And I can't say Btrfs is as stable as XFS, it's not. And even when it is, there are simply some use cases XFS will continue to excel at, in particular VM images.
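The checksumming referred to here is the XFS v5 on-disk format. With a recent enough xfsprogs it can be turned on at mkfs time -- a sketch (the device name is hypothetical, and later xfsprogs versions enable it by default):

```shell
# Create an XFS filesystem with v5 metadata (CRC32c checksums
# on all metadata blocks):
mkfs.xfs -m crc=1 /dev/sdb1

# Confirm on a mounted filesystem -- the meta-data line shows crc=1:
xfs_info /mnt
```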
MS has had that now since Vista -- you can have many different versions of the same file -- and most of the apps you have had for 10 years still run on the current OS. Let's see a linux vendor even begin to make a claim for "stability" -- IMO, that's one of the biggest reasons for linux's lack of success in the consumer and business markets. There's almost no way an app you bought that worked on opensuse 5 years ago is likely to run at all on 13.2. But the fact is, many XP-era programs still run (just a bit faster) on Win7, 15 years later.
Hence Docker and containers being the new hotness to work around the lack of ABI/API stability within a distro, let alone among them. In effect the distros are different OSes that just so happen to share a kernel (and now systemd, which actually does improve the consistency in a good way). There's also the framework, runtime, and portable-applications concept advocated by the systemd cabal (that's a self-described cabal): http://0pointer.net/blog/revisiting-how-we-put-together-linux-systems.html The earlier article it refers to, on stateless systems, is the practical why, and the above URL is more the how.

It drives me crazy that mobile devices arrived on the scene and promptly have an easier install/upgrade/reset/restore experience than desktop or server Linux. There is no good reason for this. The distros have inanely complicated installers that appease 8001 layouts for no real good reason. The relative success of the rest of the world proves custom installation choices are not that important in the grand scheme of things. Making it work, self-heal, making things easier -- that's important.

Application installation on linux sneezes piles of files throughout the file system. Apps are as much a separate domain from the OS as user files are. I should be able to roll back to an earlier application version independent of an OS update. There is a way forward with containers. I don't know whether it'll stick. But a part of it is recognizing that choice has had consequences.

-- Chris Murphy
On 20/03/2015 at 06:37, Chris Murphy wrote:
why, and the above URL is more how. It drives me crazy that mobile devices arrived on the scene and promptly have an easier install/upgrade/reset/restore experience than desktop or server Linux.
I just did a backup / reset / restore on an android smartphone and got exactly the same install as before. I would like to be able to do the same on openSUSE... maybe it's the initial subject of another thread on this list (btrfs...)

jdd
On Fri, Mar 20, 2015 at 1:14 AM, jdd <jdd@dodin.org> wrote:
On 20/03/2015 at 06:37, Chris Murphy wrote:
why, and the above URL is more how. It drives me crazy that mobile devices arrived on the scene and promptly have an easier install/upgrade/reset/restore experience than desktop or server Linux.
I just did a backup / reset / restore on an android smartphone and get exactly the same install as before, I would like to be able to do the same on openSUSE...
I think so too, especially considering it requires ample esoteric knowledge to properly back up the entire system, and more to restore it. Even writing out a how-to is difficult, let alone actually doing it. Seriously, just be prepared to only restore /home, reinstall the whole OS, update it, and reinstall apps at this point.

-- Chris Murphy
It's a 3.19.0-rc4 kernel, not a released mainline kernel. Currently 3.19.2 is stable, so this is sorta old news. I haven't heard about it still being a problem; I never ran into it even with 3.19-rc4.
These sorts of edge-case regressions in Btrfs do still happen with rc kernels sometimes. I think it's been a year since a major bug appeared in a released mainline kernel, though, and it was fixed soon after, no data was lost, and there was a btrfs check fix for it also.
That bug exists in all 3.19 kernels; I contacted a kernel dev, and he confirmed it's not just the rc kernel it exists in.
On Fri, Mar 27, 2015 at 11:11 PM, Sam M. <backgroundprocess@gmail.com> wrote:
It's a 3.19.0-rc4 kernel, not a released mainline kernel. Currently 3.19.2 is stable, so this is sorta old news. I haven't heard about it still being a problem; I never ran into it even with 3.19-rc4.
These sorts of edge-case regressions in Btrfs do still happen with rc kernels sometimes. I think it's been a year since a major bug appeared in a released mainline kernel, though, and it was fixed soon after, no data was lost, and there was a btrfs check fix for it also.
That bug exists in all 3.19 kernels; I contacted a kernel dev, and he confirmed it's not just the rc kernel it exists in.
Looks like it goes back to 3.18.1, and isn't easy to reproduce. I do a lot of btrfs send and receive and haven't hit it. Anyway, there's a patch that appeared just after 3.19 was released. So the usual process applies: the fix must be demonstrated in mainline before it gets backported to stable. And it's been in 4.0.0 since Feb 19.

http://git.kernel.org/cgit/linux/kernel/git/mason/linux-btrfs.git/commit/?h=for-linus&id=3d84be799194147e04c0e3129ed44a948773b80a

-- Chris Murphy
participants (4)
- Chris Murphy
- jdd
- Linda Walsh
- Sam M.