[Bug 1068647] New: btrfs send bails out on orphan inode
http://bugzilla.suse.com/show_bug.cgi?id=1068647 Bug ID: 1068647 Summary: btrfs send bails out on orphan inode Classification: openSUSE Product: openSUSE Tumbleweed Version: Current Hardware: Other OS: Other Status: NEW Severity: Normal Priority: P5 - None Component: Kernel Assignee: kernel-maintainers@forge.provo.novell.com Reporter: martin.wilck@suse.com QA Contact: qa-bugs@suse.de Found By: --- Blocker: --- I can't create backups of my currently active subvolume with btrfs send any more. It proceeeds for several minutes, than bails out with the error message ERROR: send ioctl failed with -2: No such file or directory This situation has persisted throughout several reboots, remounts, and kernel updates. This first occured while I was doing lots of experiments with btrfs-send, working on my "btrfs-clone" utility (https://github.com/mwilck/btrfs-clone) and testing the efficiency of various cloning strategies within this tool. I made several successful snapshots, until suddenly the described situation came to pass which makes it impossible to do any more snapshots of the "main" subvol. Thus it's possible that the situation was caused by btrfs-send. I admit I did this while the subvolume was mounted. Of course I'd always set the cloned subvolume and its parent to read-only mode while ran btrfs-send (it will abort otherwise). Except for the btrfs-send, no other problems have been observed so far. All other tools (btrfs image, btrfs scrub, btrfs inspect-internal dump-tree) silently ignored this situation. I've traced the error down to the following call stack:
send_subvol() full_send_tree() changed_cb [btrfs]() finish_inode_if_needed [btrfs]() get_inode_info.constprop.48 [btrfs]() apply_children_dir_moves [btrfs]() send_utimes [btrfs]() get_cur_path [btrfs]() __get_cur_name_and_parent [btrfs]() is_inode_existent.part.15 [btrfs]() get_first_ref [btrfs]()
get_first_ref() returns with -ENOENT. I used some instrumentation to find out what was causing this problem, and found the following situation in the file system tree: root tree 28090400768(level 0) / 28058992640(1) - subvol @/.subvols/1/snapshot is item 89 at 28061384704
root tree node 28058992640 level 1 items 5 free 488 generation 188885 owner 1 fs uuid ad544c37-37f9-44a4-9798-d29d3cb5db44 chunk uuid bb2d5d7b-6b0b-4f2d-9677-097711ef8572 key (EXTENT_TREE ROOT_ITEM 0) block 28090400768 (1714502) gen 188885
leaf 28090400768 items 98 free space 6117 generation 188885 owner 1 leaf 28090400768 flags 0x1(WRITTEN) backref revision 1 fs uuid ad544c37-37f9-44a4-9798-d29d3cb5db44 chunk uuid bb2d5d7b-6b0b-4f2d-9677-097711ef8572
item 5 key (ROOT_TREE_DIR INODE_ITEM 0) itemoff 14770 itemsize 160 generation 3 transid 0 size 0 nbytes 16384 block group 0 mode 40755 links 1 uid 0 gid 0 rdev 0 sequence 0 flags 0x0(none) atime 1493728330.0 (2017-05-02 14:32:10) ctime 1493728330.0 (2017-05-02 14:32:10) mtime 1493728330.0 (2017-05-02 14:32:10) otime 1493728330.0 (2017-05-02 14:32:10)
item 12 key (257 ROOT_ITEM 0) itemoff 13191 itemsize 439 generation 180352 root_dirid 256 bytenr 36967874560 level 0 refs 1 lastsnap 12 byte_limit 0 bytes_used 16384 flags 0x0(none) uuid d4ad4958-2cc1-004e-934d-f223ef4c5f2a ctransid 180352 otransid 8 stransid 0 rtransid 0 drop key (0 UNKNOWN.0 0) level 0 item 13 key (257 ROOT_BACKREF 5) itemoff 13172 itemsize 19 root backref key dirid 256 sequence 2 name @
item 33 key (258 ROOT_ITEM 0) itemoff 12290 itemsize 439 generation 188511 root_dirid 256 bytenr 17933533184 level 1 refs 1 lastsnap 0 byte_limit 0 bytes_used 163840 flags 0x0(none) uuid e99de79b-c494-9445-812b-d08389f82947 ctransid 188511 otransid 11 stransid 0 rtransid 0 drop key (0 UNKNOWN.0 0) level 0 item 34 key (258 ROOT_BACKREF 257) itemoff 12262 itemsize 28 root backref key dirid 256 sequence 3 name .snapshots
item 89 key (259 ROOT_ITEM 12) itemoff 10419 itemsize 439 generation 188884 root_dirid 256 bytenr 28061384704 level 2 refs 1 lastsnap 186971 byte_limit 0 bytes_used 259981312 flags 0x0(none) uuid 17c54832-5028-694c-9f97-1be0a1df8757 parent_uuid d4ad4958-2cc1-004e-934d-f223ef4c5f2a ctransid 188884 otransid 12 stransid 0 rtransid 0 drop key (0 UNKNOWN.0 0) level 0 item 90 key (259 ROOT_BACKREF 258) itemoff 10393 itemsize 26 root backref key dirid 257 sequence 2 name snapshot
File system tree 28061384704 - btrfs path 28052832256(level 0) / 28045197312(1) / 28061384704(2) - item 8 (1746132 INODE_ITEM 0) has "links 0" and no ref - item 86 is an ORPHAN node linking to item 8.
file tree key (259 ROOT_ITEM 12) node 28061384704 level 2 items 56 free 437 generation 188884 owner 259 fs uuid ad544c37-37f9-44a4-9798-d29d3cb5db44 chunk uuid bb2d5d7b-6b0b-4f2d-9677-097711ef8572
key (1732417 INODE_REF 830626) block 28045197312 (1711743) gen 188860
node 28045197312 level 1 items 341 free 152 generation 188860 owner 259 fs uuid ad544c37-37f9-44a4-9798-d29d3cb5db44 chunk uuid bb2d5d7b-6b0b-4f2d-9677-097711ef8572 ... key (1746122 INODE_ITEM 0) block 28052832256 (1712209) gen 188821
leaf 28052832256 items 66 free space 8146 generation 188821 owner 259 leaf 28052832256 flags 0x1(WRITTEN) backref revision 1 fs uuid ad544c37-37f9-44a4-9798-d29d3cb5db44 chunk uuid bb2d5d7b-6b0b-4f2d-9677-097711ef8572 ... item 7 key (1746123 EXTENT_DATA 4096) itemoff 15701 itemsize 53 generation 188295 type 1 (regular) extent data disk byte 470392832 nr 4096 extent data offset 0 nr 4096 ram 4096 extent compression 0 (none) item 8 key (1746132 INODE_ITEM 0) itemoff 15541 itemsize 160 generation 188295 transid 188472 size 940 nbytes 4096 block group 0 mode 100644 links 0 uid 465 gid 464 rdev 0 sequence 16 flags 0x6(PREALLOC) atime 1510830949.715375999 (2017-11-16 12:15:49) ctime 1510838729.976180731 (2017-11-16 14:25:29) mtime 1510830949.687375946 (2017-11-16 12:15:49) otime 1510830949.687375946 (2017-11-16 12:15:49) item 9 key (1746132 EXTENT_DATA 0) itemoff 15488 itemsize 53 generation 188295 type 1 (regular) extent data disk byte 470614016 nr 4096 extent data offset 0 nr 4096 ram 4096 extent compression 0 (none) item 10 key (1746213 INODE_ITEM 0) itemoff 15328 itemsize 160 generation 188342 transid 188819 size 6498 nbytes 8192 block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0 sequence 0 flags 0x4(none) atime 1510849204.24960545 (2017-11-16 17:20:04) ctime 1510833103.94629539 (2017-11-16 12:51:43) mtime 1510833103.86629524 (2017-11-16 12:51:43) otime 1510833037.158509427 (2017-11-16 12:50:37) ... item 85 key (1746369 EXTENT_DATA 0) itemoff 7030 itemsize 53 generation 188860 type 1 (regular) extent data disk byte 368373760 nr 4096 extent data offset 0 nr 4096 ram 4096 extent compression 0 (none) item 86 key (ORPHAN ORPHAN_ITEM 1746132) itemoff 7030 itemsize 0 orphan item
So, while trying to process inode 1746213, btrfs send runs finish_inode_if_needed() on 1746132, and fails to send the utimes for this inode because it can't figure out the path. The way I'm reading the btrfs code, it seems that btrfs-send assumes that all orphans have been cleaned up when it starts running. Indeed, btrfs should have got rid of orphan nodes by running btrfs_orphan_cleanup() via btrfs_cleanup_fs_roots() at mount time, but that doesn't seem to be the case. In fact, I've run btrfs-debug tree on this file system two times with different FS generations, and found that the orphan inode hat changed its position (inode number) in the tree. Unfortunately I didn't keep the output of the first debug run, but I'm positive that the inode number of the failing inode was in the 1743xxx range first, and is now 1746132. Therefore I'm filing this as a kernel bug. The problem was first seen with kernel 4.13.11-1.1.g0526da3 now I'm running 4.14.0-1.1.gab9e909. There are 3 aspects to this problem: 1) how did it come to pass? 2) why doesn't the kernel clean up the orphan at boot time? Why does the orphan change inode number? Or is the orphan actually cleaned, but a new one is generated instead? 3) how do I recover? I'd really love to get rid of this broken inode. I could simply switch to a different snapshot of the file system, but AFAICS as long as I don't delete the offending subvolume, the situation would persist. Anyway, as long as 1) and 2) are open, I guess I'd rather keep the subvolume around to facilitate further debugging. I'm unsure if this is easily reproducible. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1068647 http://bugzilla.suse.com/show_bug.cgi?id=1068647#c1 --- Comment #1 from Martin Wilck <martin.wilck@suse.com> --- Here is one interesting piece of data: I created an image of this FS with btrfs-image -w, transferred it to another system, restored it to a nbd block device, and ran btrfs inspect-internal dump-tree on it. The orphan node had been preserved in this sequence of operations, it was still there, although a t slightly different position: item 99 key (ORPHAN ORPHAN_ITEM 1746132) itemoff 5630 itemsize 0 orphan item Next I mounted this file system on the other computer which is running Leap 42.2 (4.4.92-18.36-default), and saw the following:
Nov 17 10:28:09 zeus kernel: BTRFS warning (device nbd0): unrecognized super flag: 17179869184 Nov 17 10:28:09 zeus kernel: BTRFS info (device nbd0): disk space caching is enabled Nov 17 10:28:09 zeus kernel: BTRFS info (device nbd0): has skinny extents Nov 17 10:28:09 zeus kernel: BTRFS info (device nbd0): detected SSD devices, > enabling SSD mode Nov 17 10:28:09 zeus kernel: BTRFS info (device nbd0): checking UUID tree Nov 17 10:28:09 zeus kernel: BTRFS debug (device nbd0): unlinked 1 orphans
dump-tree is still running on the file system state after this, but I'm confident that the orphan node will be gone. I already know from other tests I made that btrfs-send --no-data works on the file systems created with btrfs-image/btrfs-image -r/mount on the Leap 42.2 system. Sooo, it would seem that we have a regression in 4.13/4.14, causing orphan cleanup at mount time to fail or skip this particular node for whatever reason. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1068647 zhen ren <zren@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |zren@suse.com -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1068647 http://bugzilla.suse.com/show_bug.cgi?id=1068647#c2 --- Comment #2 from Martin Wilck <martin.wilck@suse.com> --- I've reviewed logs for btrfs related to orphan inode handling but haven't been able to see an obvious candidate causing the problem. Looking at the various patches between 4.4 and 4.13 handling corner cases where btrfs send/receive may fail makes me wonder how reliable this really is, though. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1068647 http://bugzilla.suse.com/show_bug.cgi?id=1068647#c3 Martin Wilck <martin.wilck@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |fdmanana@suse.com --- Comment #3 from Martin Wilck <martin.wilck@suse.com> --- Filipe, as you've done some work in this area, maybe you can have a look? -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1068647 http://bugzilla.suse.com/show_bug.cgi?id=1068647#c4 --- Comment #4 from Martin Wilck <martin.wilck@suse.com> --- One possibility to overcome the send problem would be simply ignoring orphan inodes in the code path indicated in comment 1 (finish_inode_if_needed() -> send_utimes()). I don't see how that would cause harm. Maybe send would bail out later if it hits the ORPHAN entry in the FS tree? -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1068647 http://bugzilla.suse.com/show_bug.cgi?id=1068647#c5 --- Comment #5 from Martin Wilck <martin.wilck@suse.com> --- I found an old TW VM installation with this kernel: kernel-default-4.9.4-1.1.x86_64 * Sun Jan 15 2017 jslaby@suse.cz - Linux 4.9.4 (bnc#1012628). - commit ddcc4a0 and verified that it successfully removed the orphan at boot. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1068647 http://bugzilla.suse.com/show_bug.cgi?id=1068647#c6 --- Comment #6 from Martin Wilck <martin.wilck@suse.com> --- Same thing with 4.11.2-1-default, using a qemu image I created with btrfs-image/btrfs-image -r.
# uname -r 4.11.2-1-default # rpm -q --changelog kernel-default-4.11.2 | head -n 3 * Sat May 20 2017 jslaby@suse.cz - Linux 4.11.2 (bnc#1012628). - commit 03903d8 # mount -o ro /dev/vdc /mnt/tst # btrfs send -v --no-data /mnt/tst >/dev/null Mode NO_FILE_DATA enabled At subvol /mnt/tst ERROR: send ioctl failed with -2: No such file or directory # mount -o rw,remount /mnt/tst # dmesg | grep -i btrfs | tail -n 2 [ 1089.315435] BTRFS info (device vdc): disk space caching is enabled [ 1089.322606] BTRFS debug (device vdc): unlinked 1 orphans # btrfs send -v --no-data /mnt/tst >/dev/null Mode NO_FILE_DATA enabled At subvol /mnt/tst BTRFS_IOC_SEND returned 0 joining genl thread
-- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1068647 http://bugzilla.suse.com/show_bug.cgi?id=1068647#c7 --- Comment #7 from Martin Wilck <martin.wilck@suse.com> --- ARGH. I tested in a VM with 4.13.12-1-default, and all is fine, it behaves just like 4.11.2 in the last comment. Really confused now, why doesn't the same thing happen on my laptop? -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1068647 http://bugzilla.suse.com/show_bug.cgi?id=1068647#c8 --- Comment #8 from Martin Wilck <martin.wilck@suse.com> --- Update: this is not a kernel *regression* after all. Still highly erratic behavior. If I mount the FS and run the "btrfs send" operation on the default subvol early during boot (dracut pre-udev stage), I see no errors (I don't see the "unlinked 1 orphans" message either during mount, maybe because there aren't any orphans at this stage). BUT when the system is fully up, the error during btrfs send occurs again. This shows that, in contrast to my original conjecture, the orphan does NOT survive the first r/w mount. It probably doesn't even survive the umount at shutdown. But somehow a (new?) orphan springs into life when the system is up, causing the error in btrfs send. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1068647 http://bugzilla.suse.com/show_bug.cgi?id=1068647#c9 --- Comment #9 from Martin Wilck <martin.wilck@suse.com> --- I cloned my problematic FS using raw "dd", changed the UUID with btrfstune, and have now two partitions in my laptop with basically the same OS, I can switch back and forth. The new FS is sda2, the old one was sda7. If I boot sda2, the btrfs send error occurs on sda2, and sda7 works fine. So it seems that this is an effect that occurs only when the FS is *in use* in some specific way, which is apparently much different from just being *mounted*. I might have guessed that of course, but I didn't see anything in the man page and "btrfs send" had worked well for me on various live file systems, so I assumed that this was supported. I also saw this in btrfs_ioctl_send():
WARN_ON(send_root->orphan_cleanup_state != ORPHAN_CLEANUP_DONE);
and I never saw this warning, so I thought the orphan cleanup state should be OK. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1068647 http://bugzilla.suse.com/show_bug.cgi?id=1068647#c10 --- Comment #10 from Martin Wilck <martin.wilck@suse.com> --- It's even more funny. The problem seems to occur if and only if I'm running in a graphical environment. # systemctl isolate multi-user.target # btrfs prop set -ts / ro true; btrfs send --no-data -v / >/dev/null; \ btrfs prop set -ts / ro false BTRFS_IOC_SEND returned 0 ## (this can be repeated with same result) # btrfs isolate graphical.target # btrfs prop set -ts / ro true; btrfs send --no-data -v / >/dev/null; \ btrfs prop set -ts / ro false ERROR: send ioctl failed with -2: no such file or directory ## (this can also be repeated with same result) # systemctl isolate multi-user.target # btrfs prop set -ts / ro true; btrfs send --no-data -v / >/dev/null; \ btrfs prop set -ts / ro false BTRFS_IOC_SEND returned 0 ## (this can be repeated with same result) -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1068647 http://bugzilla.suse.com/show_bug.cgi?id=1068647#c11 --- Comment #11 from Martin Wilck <martin.wilck@suse.com> --- Surprise - this has something to do with gdm. If I replace gdm with a different display manager (tried lightdm so far), the problem doesn't occur any more, even if I'm logged in under GNOME. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1068647 http://bugzilla.suse.com/show_bug.cgi?id=1068647#c12 --- Comment #12 from Martin Wilck <martin.wilck@suse.com> --- I think I've understood what's going on. During startup of gdm, there seems to be a race for the filr /var/lib/gdb/.config/dconf/user. Various gsd-related daemons open and mmap() this file, while at the same time the "dconf-service" utility replaces it (twice) with a different file using rename() system calls. The gsd daemons keep the mapping of the old, deleted file in memory, which causes the file to become an orphan. As soon as gdm is stopped, the orphan disappears. This explains the observations made in comment 10 and comment 11. I can see two problems here: 1) the behavior of the gnome-settings-daemon seems to be badly in line with what "dconf-service" is doing. Upon closer inspection, I can see lots of processes mapping deleted instances of /run/user/$UID/dconf/user and $HOME/.config/dconf/user, for both the gdm account and my account. The btrfs send problem aside, that can't be a healthy situation, but it has to be either a dconf or a gsd problem, or both. 2) Nonetheless, it's commonplace that processes have open fds or mmaps for deleted files. It seems wrong that "btrfs send" would fail in such situations. The right thing (TM) to do for orphan inodes in this specific case (callstack in comment 1) would be to ignore the orphan for the send_utimes call. I'm going to split off (1) from this bz. For (2), I may come up with a btrfs patch. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1068647 http://bugzilla.suse.com/show_bug.cgi?id=1068647#c13 --- Comment #13 from Martin Wilck <martin.wilck@suse.com> --- GNOME/dconf side tracked in bsc#1068837. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1068647 http://bugzilla.suse.com/show_bug.cgi?id=1068647#c14 Jeff Mahoney <jeffm@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |jeffm@suse.com Assignee|kernel-maintainers@forge.pr |fdmanana@suse.com |ovo.novell.com | --- Comment #14 from Jeff Mahoney <jeffm@suse.com> --- That makes sense. The orphan items are left whenever there is an open files deleted or truncated. In normal runtime, they could be all over the place, to be cleaned up in a later transaction (or mount). This is true of any file system, not just btrfs. Filipe is the send/receive expert and has the most recent experience so I’ll bounce this one to him. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1068647 Matthias Eckermann <mge@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Priority|P5 - None |P3 - Medium CC| |mge@suse.com -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1068647 http://bugzilla.suse.com/show_bug.cgi?id=1068647#c15 --- Comment #15 from Martin Wilck <martin.wilck@suse.com> --- Has anything happened yet? -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1068647 http://bugzilla.suse.com/show_bug.cgi?id=1068647#c16 --- Comment #16 from Filipe Manana <fdmanana@suse.com> --- (In reply to Martin Wilck from comment #15)
Has anything happened yet?
I sent today a btrfs fix and fstest upstream: https://patchwork.kernel.org/patch/10539705/ https://patchwork.kernel.org/patch/10539709/ -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1068647 http://bugzilla.suse.com/show_bug.cgi?id=1068647#c17 --- Comment #17 from Martin Wilck <martin.wilck@suse.com> --- Thanks, that looks very promising. -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1068647 https://bugzilla.suse.com/show_bug.cgi?id=1068647#c18 --- Comment #18 from Martin Wilck <martin.wilck@suse.com> --- This seems to have been dup'd to bug 1110650. Unfortunately I'm still seeing send/receive issues. -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1068647 https://bugzilla.suse.com/show_bug.cgi?id=1068647#c19 --- Comment #19 from Filipe Manana <fdmanana@suse.com> --- (In reply to Martin Wilck from comment #18)
This seems to have been dup'd to bug 1110650.
Unfortunately I'm still seeing send/receive issues.
Can you paste the exact errors send/receive gives? Ideally with -vvv passed to btrfs receive and anything from syslog/dmesg if available. Historically, there were many different cases leading to send/receive failing in similar ways, so I need to know exactly how it failed. Also, which kernel version? Is it again tumbleweed, leap, SLE, which version? Thanks. -- You are receiving this mail because: You are on the CC list for the bug.
participants (2)
-
bugzilla_noreply@novell.com
-
bugzilla_noreply@suse.com