[opensuse-kernel] BTRFS bug, 3.3 rc6 ..system screwed up
hi:
My workstation got screwed up today. I cannot delete any file or write anything to the root filesystem, even though "mount" says there are a few GB free... every operation, including btrfs balance, defragment, etc., fails with "no space left on device" :-(((
I think I figured out what's going on: for some reason snapper wrote snapshots to the root filesystem in a directory called ".snapshots", and those are apparently filling this small SSD drive. Attempting to delete the subvolumes results in two different kinds of mess ;)
- If I attempt to rm -rf .snapshots, it says that it is a read-only fs.
- If I attempt to delete them using either snapper or btrfs subvolume, the kernel oopses and the machine freezes up; REISUB has to be used to make it run again...
Unfortunately my camera is also broken and I cannot take a photo of the crash...
Any hints appreciated...
-- To unsubscribe, e-mail: opensuse-kernel+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse-kernel+owner@opensuse.org
On Saturday 10 March 2012 16.20:53 Cristian Rodríguez wrote:
hi:
My workstation got screwed up today. I cannot delete any file or write anything to the root filesystem, even though "mount" says there are a few GB free... every operation, including btrfs balance, defragment, etc., fails with "no space left on device" :-(((
I think I figured out what's going on: for some reason snapper wrote snapshots to the root filesystem in a directory called ".snapshots", and those are apparently filling this small SSD drive. Attempting to delete the subvolumes results in two different kinds of mess ;)
- If I attempt to rm -rf .snapshots, it says that it is a read-only fs.
- If I attempt to delete them using either snapper or btrfs subvolume, the kernel oopses and the machine freezes up; REISUB has to be used to make it run again...
Unfortunately my camera is also broken and I cannot take a photo of the crash...
Any hints appreciated...
I saw exactly the same situation in a VM: / was full (360 MB free) during a zypper dup.
It was hard to understand what was happening until I caught snapper eating 9.8 GB of the 16 GB (I think snapshots should never be allowed to exceed 50% of the filesystem).
Then I used snapper delete to remove every snapshot referenced in .snapshots.
After that I used rm -fr .snapshots and, despite the read-only message, I was able to recover a lot of space.
All of this was a bit voodoo, so it can't really be reported as a nice bug. I'd just say: take care with snapper and .snapshots; the defaults it uses do not seem suitable.
-- Bruno Friedmann Ioda-Net Sàrl www.ioda-net.ch openSUSE Member & Ambassador GPG KEY : D5C9B751C4653227 irc: tigerfoot
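Bruno's rule of thumb (snapshots should never take more than about half of the filesystem) boils down to a simple ratio check. A minimal sketch, assuming sizes in one common unit; the helper name and the 0.5 threshold are illustrative assumptions, not a snapper setting:

```c
#include <stdbool.h>

/* Hypothetical helper: true when snapshots occupy more than max_ratio
 * of the filesystem. Both sizes must use the same unit (e.g. GB). */
static bool snapshots_over_limit(double snapshot_size, double fs_size,
                                 double max_ratio)
{
    if (fs_size <= 0.0)
        return false;   /* nothing sensible to compare against */
    return snapshot_size / fs_size > max_ratio;
}
```

With Bruno's numbers, 9.8 / 16 = 0.6125, so snapshots_over_limit(9.8, 16.0, 0.5) returns true: about 61% of the filesystem was taken by snapshots.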
On 10/03/12 16:44, Bruno Friedmann wrote:
I saw exactly the same situation in a VM: / was full (360 MB free) during a zypper dup
It was hard to understand
yeah, lost quite a bit of hair trying to figure it out :-S
what was happening until I caught snapper eating 9.8 GB of the 16 GB
(I think snapshots should never be allowed to exceed 50% of the filesystem)
Then I used snapper delete to remove every snapshot referenced in .snapshots
After that I used rm -fr .snapshots and, despite the read-only message, I was able to recover a lot of space.
All of this was a bit voodoo, so it can't really be reported as a nice bug. I'd just say: take care with snapper and .snapshots; the defaults it uses do not seem suitable.
In this case the kernel BUGs in btrfs_unlink_subvol(), exactly in
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=blob;f=fs/btr...
at line 2951:

    btrfs_i_size_write(dir, dir->i_size - name_len * 2);
    dir->i_mtime = dir->i_ctime = CURRENT_TIME;
    ret = btrfs_update_inode(trans, root, dir);
    BUG_ON(ret);          --> bang, here we go...
    btrfs_free_path(path);
    return 0;

and my efforts to delete anything are futile :-(
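The quoted code turns any failure of btrfs_update_inode() into a kernel BUG: BUG_ON(ret) halts as soon as ret is nonzero, for example under the no-space conditions described in this thread. A minimal userspace sketch of the alternative, handing the error back to the caller; fake_update_inode() and unlink_subvol_sketch() are hypothetical stand-ins, not kernel functions, and the real fix is whatever the btrfs error-handling patches do:

```c
#include <errno.h>

/* Hypothetical stand-in for btrfs_update_inode(): succeeds while free
 * metadata blocks remain, otherwise fails with -ENOSPC. */
static int fake_update_inode(int free_blocks)
{
    return free_blocks > 0 ? 0 : -ENOSPC;
}

/* Sketch of the unlink path returning the error instead of BUG_ON(ret),
 * so the syscall could fail with ENOSPC rather than crash the machine. */
static int unlink_subvol_sketch(int free_blocks)
{
    int ret = fake_update_inode(free_blocks);

    if (ret)
        return ret;     /* the quoted code does BUG_ON(ret) here */
    return 0;
}
```

The design point is only that a recoverable condition (running out of space) surfaces as an error code the caller can report.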
On 10/03/12 16:44, Bruno Friedmann wrote:
I saw exactly the same situation in a VM: / was full (360 MB free) during a zypper dup
It was hard to understand what was happening until I caught snapper eating 9.8 GB of the 16 GB (I think snapshots should never be allowed to exceed 50% of the filesystem)
There is something really wrong with this stuff...
- All monitoring programs report that there is space available.
- No popup, no warning whatsoever that filesystem X is becoming full (probably related to the previous point).
- Subtle, random "no space left on device" errors, usually "fixed" by running btrfs filesystem balance... or something absurd like:
first time: osc build --clean --> delay --> rpm complains about the db... then no space left on device, no packages installed.
second time: osc build --clean --> sets up packages but stops at the last few ones...
third time: rm -rf /path/to/the/package/build/root, then osc build --> package builds correctly, no errors, system normal... you can write anything else to the FS. WTF!!!
osc build --> some *other* app, like the music player or the browser, crashes... (!!!!)
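Monitoring tools and df typically derive "space available" from statvfs(). A minimal sketch of that calculation, assuming a POSIX system; free_bytes() is a hypothetical helper. On btrfs these numbers may not reflect blocks that are reserved but not yet accounted for, so they can show free space while a write still fails with ENOSPC:

```c
#include <sys/statvfs.h>

/* Hypothetical helper: bytes available to unprivileged users on the
 * filesystem containing path, the figure df shows as "Avail".
 * Returns 0 on success, -1 on failure (statvfs sets errno). */
static int free_bytes(const char *path, unsigned long long *out)
{
    struct statvfs st;

    if (statvfs(path, &st) != 0)
        return -1;
    *out = (unsigned long long)st.f_bavail * st.f_frsize;
    return 0;
}
```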
On Sat, Mar 10, 2012 at 05:05:11PM -0300, Cristian Rodríguez wrote:
In this case the kernel BUGS in btrfs_unlink_subvol() exactly in
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=blob;f=fs/btr...
line 2951

    btrfs_i_size_write(dir, dir->i_size - name_len * 2);
    dir->i_mtime = dir->i_ctime = CURRENT_TIME;
    ret = btrfs_update_inode(trans, root, dir);
    BUG_ON(ret);          --> bang, here we go...
    btrfs_free_path(path);
    return 0;

This patch http://article.gmane.org/gmane.comp.file-systems.btrfs/15254 (depends on http://permalink.gmane.org/gmane.comp.file-systems.btrfs/15253 ) fixes the crash when attempting to delete the file under no-space conditions.
and my efforts to delete anything are futile :-(
Deleting (and COW in general) needs a few free blocks to succeed, which is why there is a global reserve of blocks that can be used even when there is otherwise no space. Somehow it was not used here, so no files could be deleted. The patches above are applied in the openSUSE kernel. david
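David's point about the global reserve can be pictured with a toy allocator: ordinary writes may only use the normal free pool, while deletions may also dip into a small reserve, so that freeing space stays possible on a full filesystem. This is a simplified model with hypothetical names (struct space_model, reserve_blocks()), not the btrfs implementation:

```c
#include <errno.h>
#include <stdbool.h>

/* Toy model: free_blocks is the normal pool; global_reserve is held
 * back for operations that free space, such as unlink. */
struct space_model {
    long free_blocks;
    long global_reserve;
};

/* Reserve n blocks; returns 0 or -ENOSPC. Deletions fall back to the
 * global reserve once the normal pool is exhausted. */
static int reserve_blocks(struct space_model *s, long n, bool is_delete)
{
    if (s->free_blocks >= n) {
        s->free_blocks -= n;
        return 0;
    }
    if (is_delete && s->global_reserve >= n) {
        s->global_reserve -= n;
        return 0;
    }
    return -ENOSPC;
}
```

With {0, 8}, a 1-block write fails with -ENOSPC while a 2-block delete succeeds and leaves 6 reserve blocks. The failure David describes corresponds to the delete path never reaching that fallback.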
On 03/10/2012 03:05 PM, Cristian Rodríguez wrote:
In this case the kernel BUGS in btrfs_unlink_subvol() exactly in
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=blob;f=fs/btr...
line 2951

    btrfs_i_size_write(dir, dir->i_size - name_len * 2);
    dir->i_mtime = dir->i_ctime = CURRENT_TIME;
    ret = btrfs_update_inode(trans, root, dir);
    BUG_ON(ret);          --> bang, here we go...
    btrfs_free_path(path);
    return 0;

Ah, yep. Welcome to the current state of upstream btrfs error handling. The good news is that we've put a ton of work into improving this and it should be upstream for 3.4. The master branch didn't have any of the error handling patchset, but it does as of about 5 minutes ago. When I get a moment I'll need to update the btrfs code in the 12.1 kernel as well. Grab a new KOTD in a few hours and you should see a changelog with about 80 btrfs commits. You'll obviously need to find a way to boot that kernel without writing to your root file system, though. Let me know if you still run into problems after that.
-Jeff
-- Jeff Mahoney SUSE Labs
On Sat, Mar 10, 2012 at 05:54:04PM -0300, Cristian Rodríguez wrote:
There is something really wrong with this stuff..
- All monitoring programs report that there is space available.
- No popup, no warning whatsoever that filesystem X is becoming full (probably related to the previous point).
- Subtle, random "no space left on device" errors, usually "fixed" by running btrfs filesystem balance... or something absurd like:
The no-space message may be temporary: a block reservation is made, and then the file contents get inlined into the tree, so the previously reserved block is never used. This accounting is only settled at transaction commit time, i.e. every 30 seconds, when sync is called, or when background writeback starts to flush pages to disk.
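The temporary no-space effect can be modeled in a few lines: each write reserves a block up front; if the data ends up inlined into the tree, the block is not used, yet it only returns to the free pool at the next transaction commit. A toy model with hypothetical names; the real btrfs accounting is considerably more involved:

```c
#include <errno.h>
#include <stdbool.h>

struct fs_model {
    long free_blocks;     /* blocks neither used nor reserved */
    long stale_reserved;  /* reserved but unused (data was inlined) */
};

/* Write one small file: reserve one block. If the data is inlined,
 * the block is not consumed but stays reserved until commit. */
static int write_small_file(struct fs_model *fs, bool inlined)
{
    if (fs->free_blocks < 1)
        return -ENOSPC;
    fs->free_blocks--;
    if (inlined)
        fs->stale_reserved++;
    return 0;
}

/* Transaction commit (periodic, on sync, or on writeback):
 * unused reservations go back to the free pool. */
static void commit_transaction(struct fs_model *fs)
{
    fs->free_blocks += fs->stale_reserved;
    fs->stale_reserved = 0;
}
```

Starting from one free block, the second inlined write fails with -ENOSPC; after commit_transaction() the same write succeeds, which is the "retry and it works" pattern seen above.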
first time: osc build --clean --> delay --> rpm complains about the db... then no space left on device, no packages installed.
second time: osc build --clean --> sets up packages but stops at the last few ones...
third time: rm -rf /path/to/the/package/build/root, then osc build --> package builds correctly, no errors, system normal... you can write anything else to the FS. WTF!!!
During these commands a lot of file data is in flight, and one simply cannot know when the blocks will be returned and reused for other files, or whether they just stay reserved and hit no-space. This happened a lot with 3.2 kernels, and 3.3 improved the situation; however, with the patch http://thread.gmane.org/gmane.comp.file-systems.btrfs/15363/focus=16019 there are reports of early no-space (because of too eager block reservations). That patch is not in the openSUSE kernel.
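"Too eager" reservations are largely arithmetic: if each in-flight operation reserves a worst-case estimate, ENOSPC is reported once the sum of the estimates exceeds free space, even if the blocks actually needed would fit. A sketch with made-up numbers; the per-operation sizes are assumptions, not btrfs's real reservation sizes:

```c
#include <stdbool.h>

/* Would 'ops' concurrent operations fit if each reserved 'per_op'
 * blocks, given 'free_blocks' available? */
static bool reservations_fit(long ops, long per_op, long free_blocks)
{
    return ops * per_op <= free_blocks;
}
```

With 100 free blocks and 20 in-flight operations, a worst-case estimate of 8 blocks each needs 160 and fails, while the 2 blocks each that might actually be used (40 in total) would fit.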
osc build --> some *other* app, like the music player or the browser, crashes... (!!!!)
Do these applications handle the ENOSPC return code? :) david
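Handling ENOSPC means checking the result of every write() and reporting the failure instead of carrying on. A minimal sketch, assuming POSIX; write_all() is a hypothetical helper that retries short writes and EINTR and returns the negated errno on any other error:

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Write all of buf to fd. Returns 0 on success or -errno on failure
 * (e.g. -ENOSPC on a full filesystem), so the caller can report the
 * error instead of crashing. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: retry */
            return -errno;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

On Linux, every write to /dev/full fails with ENOSPC, which makes it a convenient way to exercise this error path without filling a real filesystem.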
On 10/03/12 20:27, David Sterba wrote:
The patches above are applied in opensuse kernel.
david
Jeff, David, thanks for the explanations and for taking a look at this :-D I have managed to delete some files (no idea how that happened; obscure magic, like Bruno's situation), the machine now boots, and I will install a newer KOTD as soon as it appears... Cheers!
On 10/03/12 20:38, Jeff Mahoney wrote:
The master branch didn't have any of the error handling patchset, but it does as of about 5 minutes ago.
Ok, it works but...
[ 21.911116] ------------[ cut here ]------------
[ 21.911123] WARNING: at /home/abuild/rpmbuild/BUILD/kernel-desktop-3.3.rc7/linux-3.3-rc7/fs/inode.c:346 inc_nlink+0x24/0x40()
[ 21.911125] Hardware name: Studio XPS 435T/9000
[ 21.911127] Modules linked in: arc4 ath9k mac80211 snd_hda_codec_hdmi snd_hda_codec_realtek ath9k_common ath9k_hw snd_hda_intel sr_mod ata_generic snd_hda_codec cdrom snd_hwdep pata_jmicron snd_pcm ath cfg80211 snd_timer snd i7core_edac edac_core cdc_acm r8169 coretemp iTCO_wdt i2c_i801 joydev dcdbas sg pcspkr iTCO_vendor_support serio_raw soundcore rfkill snd_page_alloc microcode autofs4 sha256_generic cbc dm_crypt dm_mod linear btrfs zlib_deflate crc32c_intel nouveau ttm drm_kms_helper drm mxm_wmi video wmi button processor thermal_sys scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_dh
[ 21.911158] Pid: 1103, comm: mount Not tainted 3.3.0-rc7-2-desktop #1
[ 21.911159] Call Trace:
[ 21.911168] [<ffffffff8100456a>] dump_trace+0xaa/0x2b0
[ 21.911174] [<ffffffff815b94e1>] dump_stack+0x69/0x6f
[ 21.911178] [<ffffffff8103f3bb>] warn_slowpath_common+0x7b/0xc0
[ 21.911181] [<ffffffff81179c64>] inc_nlink+0x24/0x40
[ 21.911201] [<ffffffffa020c7a8>] link_to_fixup_dir+0xc8/0xf0 [btrfs]
[ 21.911322] [<ffffffffa020f1c9>] replay_one_buffer+0x1c9/0x360 [btrfs]
[ 21.911442] [<ffffffffa020c00d>] walk_down_log_tree+0x20d/0x3e0 [btrfs]
[ 21.911569] [<ffffffffa020c4de>] walk_log_tree+0x9e/0x210 [btrfs]
[ 21.911684] [<ffffffffa021041e>] btrfs_recover_log_trees+0x1fe/0x380 [btrfs]
[ 21.911797] [<ffffffffa01d79ff>] open_ctree+0x138f/0x19a0 [btrfs]
[ 21.911847] [<ffffffffa01b211f>] btrfs_fill_super.isra.53+0x7f/0x150 [btrfs]
[ 21.911865] [<ffffffffa01b4070>] btrfs_mount+0x3b0/0x3e0 [btrfs]
[ 21.911880] [<ffffffff811642b5>] mount_fs+0x45/0x1d0
[ 21.911886] [<ffffffff8117e3ef>] vfs_kern_mount+0x6f/0x110
[ 21.911891] [<ffffffff8117f033>] do_kern_mount+0x53/0x120
[ 21.911896] [<ffffffff81180bc5>] do_mount+0x1a5/0x260
[ 21.911901] [<ffffffff8118106a>] sys_mount+0x9a/0xf0
[ 21.911908] [<ffffffff815da039>] system_call_fastpath+0x16/0x1b
[ 21.911915] [<00007f4d9921186a>] 0x7f4d99211869
[ 21.911917] ---[ end trace 339ed378c0b86f99 ]---
participants (4)
- Bruno Friedmann
- Cristian Rodríguez
- David Sterba
- Jeff Mahoney