Hello, On Tue, 30 Oct 2018, Dave Howorth wrote:
On Tue, 30 Oct 2018 17:36:09 +0100 Per Jessen
wrote: This might be file system dependent, I'm not sure. I've been doing some tidying up and got stuck on a few directories with millions of files in them. 3+ million per directory. Doing a 'find' takes a very long time and also essentially chokes the system. I ended up writing a small utility using getdents() instead, much faster and the system remains operational.
Try: ionice -c 3 nice find ... That way, your system should stay nicely responsive while your disks/ssds are exercised ;) Just did a find ... over a couple of TB on 7 disks (no dirs with tons of files though), and the system was as smooth as ever while I was typing this ;) That find generates a plain-text index ('name\tsize') over currently about ~900k files ;) Accordingly, 'free' shows ~2.3GB cached and 682MB buffers ATM just after the find is done ;) And that's with just 4GiB RAM ...
I was just wondering if e.g. 'find' or 'ls' had some options that would limit the scope ? (not mtime etc).
I don't know the answer to your question, but I'm interested in it, since I used to have a lot of directories like that and just learned not to do an ls on them :( I never tried find because the names were systemised and I had an index of them. Oh, and yes, it is filesystem dependent but still bad news in all the ones I tried.
Yes. Which is why I have my news-spool on a loop-mounted reiserfs3 image (also to get around the inode problem on the ext3/4 I use "on disk" ;)

$ /sbin/losetup -a | grep news
/dev/loop0: [0815]:32897 (/Video/P/news_reiserfs.img)
$ grep news /proc/mounts
/dev/loop0 /var/spool/news reiserfs rw,acl,user_xattr,barrier=flush 0 0
$ df -ih /Video/P/; df -Th /Video/P/
Filesystem     Inodes IUsed IFree IUse% Mounted on
/dev/sdb5        1.8M  2.2K  1.8M    1% /Video/P
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/sdb5      ext3  1.8T  1.7T  4.2G 100% /Video/P

*mompl*

# umount /var/spool/news
# reiserfsck -y /Video/P/news_reiserfs.img
[..]
Directories 1234
Other files 920677

That /Video/P FS was (and is) meant to store fewish larger files, so 1.8M inodes is plenty for the intended use. Also, I can move the newsspool around "en bloc" should I want/need to ;) And having those 900k+ files/dirs of the newsspool in just one inode of the ext3 is just _NICE_ :) A find/ls in the spool is still dead slow, even though a (loop-)reiserfs is meant for just such a use-case. Boggles of small unimportant / replaceable / re-downloadable files...
I believe ls uses readdir() rather than getdents().
Actually, no, ls seems to _only_ use getdents(2).

$ LC_ALL=C ltrace -S -e file ls /tmp/

(add '-R' as an 'ls' option, makes it hard to read though)...
Did you try that and/or does your faster program work with it instead? I'd be interested to try to track down what it is that makes ls unusably slow in these circumstances. Maybe it's calling stat() or building in-memory structures for sorting the names or somesuch that causes the slowdown.
Well, find does use 'readdir(3)', which in turn calls 'getdents(2)'. ls does quite a bit more "around" that (the *stat() family, sorting, etc.)
If you're willing to post the source of your utility or email it, I'll have a play.
AOL :)
But: on whatever FS, you'll have to get the info _somehow_ from the
VFS, so it depends on what your program wants... For plain names,
well, a dir is just a (special) file which you could mmap and parse
(or just use readdir which does just that ;)
Playing around with debugfs and the kernel-sources of the fs and the
manpages is enlightening (but just for fun).
Well, if you're interested, just do:
$ dd if=/dev/zero bs=10M count=1 of=foo.img
$ mkfs.