[opensuse] ls or find on directories with millions of files ?
This might be file system dependent, I'm not sure. I've been doing some tidying up and got stuck on a few directories with millions of files in them. 3+ million per directory. Doing a 'find' takes a very long time and also essentially chokes the system. I ended up writing a small utility using getdents() instead, much faster and the system remains operational.

I was just wondering if e.g. 'find' or 'ls' had some options that would limit the scope ? (not mtime etc).

--
Per Jessen, Zürich (5.9°C)
http://www.dns24.ch/ - your free DNS host, made in Switzerland.
On Tue, 30 Oct 2018 17:36:09 +0100 Per Jessen <per@computer.org> wrote:
This might be file system dependent, I'm not sure. I've been doing some tidying up and got stuck on a few directories with millions of files in them. 3+ million per directory. Doing a 'find' takes a very long time and also essentially chokes the system. I ended up writing a small utility using getdents() instead, much faster and the system remains operational.
I was just wondering if e.g. 'find' or 'ls' had some options that would limit the scope ? (not mtime etc).
I don't know the answer to your question, but I'm interested in it, since I used to have a lot of directories like that and just learned not to do an ls on them :( I never tried find because the names were systemised and I had an index of them. Oh, and yes, it is filesystem dependent but still bad news in all the ones I tried.

I believe ls uses readdir() rather than getdents(). Did you try that and/or does your faster program work with it instead? I'd be interested to try to track down what it is that makes ls unusably slow in these circumstances. Maybe it's calling stat() or building in-memory structures for sorting the names or somesuch that causes the slowdown.

If you're willing to post the source of your utility or email it, I'll have a play.
Dave Howorth wrote:
On Tue, 30 Oct 2018 17:36:09 +0100 Per Jessen <per@computer.org> wrote:
This might be file system dependent, I'm not sure. I've been doing some tidying up and got stuck on a few directories with millions of files in them. 3+ million per directory. Doing a 'find' takes a very long time and also essentially chokes the system. I ended up writing a small utility using getdents() instead, much faster and the system remains operational.
I was just wondering if e.g. 'find' or 'ls' had some options that would limit the scope ? (not mtime etc).
I don't know the answer to your question, but I'm interested in it, since I used to have a lot of directories like that and just learned not to do an ls on them :(
I was planning to run a 'find' this Friday after 1800, but then I got annoyed and started looking for a real solution.
I believe ls uses readdir() rather than getdents(). Did you try that and/or does your faster program work with it instead?
afaict, both find and ls use readdir() and that is the problem.
I'd be interested to try to track down what it is that makes ls unusably slow in these circumstances. Maybe it's calling stat() or building in-memory structures for sorting the names or somesuch that causes the slowdown.
My code calls stat() as it goes along, it's still perfectly useable.
If you're willing to post the source of your utility or email it, I'll have a play.
I borrowed it from here: http://man7.org/linux/man-pages/man2/getdents.2.html

--
Per Jessen, Zürich (5.6°C)
http://www.dns24.ch/ - your free DNS host, made in Switzerland.
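For reference, the core of that man-page example looks roughly like this (a minimal sketch: glibc has no getdents() wrapper, so it goes through syscall(); on some architectures only the getdents64 variant is available, and the buffer size and error handling here are arbitrary choices of this sketch):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/syscall.h>

/* kernel dirent layout used by getdents(2), as in the man page */
struct linux_dirent {
    unsigned long  d_ino;     /* inode number */
    unsigned long  d_off;     /* offset to next linux_dirent */
    unsigned short d_reclen;  /* length of this record */
    char           d_name[];  /* null-terminated filename */
};

#define BUF_SIZE (1024 * 1024)

int main(int argc, char *argv[])
{
    int fd = open(argc > 1 ? argv[1] : ".", O_RDONLY | O_DIRECTORY);
    if (fd == -1) { perror("open"); exit(EXIT_FAILURE); }

    static char buf[BUF_SIZE];
    for (;;) {
        /* fetch one buffer full of directory entries ... */
        long nread = syscall(SYS_getdents, fd, buf, BUF_SIZE);
        if (nread == -1) { perror("getdents"); exit(EXIT_FAILURE); }
        if (nread == 0)
            break;                      /* end of directory */

        /* ... and process it before fetching the next one, so memory
           use stays constant no matter how many files there are */
        for (long bpos = 0; bpos < nread; ) {
            struct linux_dirent *d = (struct linux_dirent *)(buf + bpos);
            printf("%s\n", d->d_name);
            bpos += d->d_reclen;
        }
    }
    close(fd);
    exit(EXIT_SUCCESS);
}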
Hello,

On Tue, 30 Oct 2018, Dave Howorth wrote:
On Tue, 30 Oct 2018 17:36:09 +0100 Per Jessen <per@computer.org> wrote:
This might be file system dependent, I'm not sure. I've been doing some tidying up and got stuck on a few directories with millions of files in them. 3+ million per directory. Doing a 'find' takes a very long time and also essentially chokes the system. I ended up writing a small utility using getdents() instead, much faster and the system remains operational.
Try:

ionice -c 3 nice find ...

That way, your system should stay nicely responsive while your disks/ssds are exercised ;) Just did a find ... over a couple of TB on 7 disks (no dirs with tons of files though), and the system was as smooth as ever while I'm typing this ;)

That find generates a plain-text index ('name\tsize') over currently about ~900k files ;) Accordingly, 'free' shows ~2.3GB cached and 682MB buffers ATM just after the find is done ;) And that's with just 4GiB RAM ...
I was just wondering if e.g. 'find' or 'ls' had some options that would limit the scope ? (not mtime etc).
I don't know the answer to your question, but I'm interested in it, since I used to have a lot of directories like that and just learned not to do an ls on them :( I never tried find because the names were systemised and I had an index of them. Oh, and yes, it is filesystem dependent but still bad news in all the ones I tried.
Yes. Which is why I have my news-spool on a loop-mounted reiserfs3-image (also to get around the inode problem on the ext3/4 I use "on disk" ;)

$ /sbin/losetup -a |grep news
/dev/loop0: [0815]:32897 (/Video/P/news_reiserfs.img)
$ grep news /proc/mounts
/dev/loop0 /var/spool/news reiserfs rw,acl,user_xattr,barrier=flush 0 0
$ df -ih /Video/P/; df -Th /Video/P/
Filesystem  Inodes  IUsed  IFree  IUse%  Mounted on
/dev/sdb5   1.8M    2.2K   1.8M   1%     /Video/P
Filesystem  Type    Size   Used   Avail  Use%  Mounted on
/dev/sdb5   ext3    1.8T   1.7T   4.2G   100%  /Video/P

*mompl*

# umount /var/spool/news
# reiserfsck -y /Video/P/news_reiserfs.img
[..]
Directories 1234
Other files 920677

That /Video/P FS was (and is) meant to store fewish larger files, that 1.8M inodes is plenty for the intended use. Also, I can move the newsspool around "en bloc" should I want/need to ;) And having those 900k+ files/dirs of the newsspool in just one inode of the ext3 is just _NICE_ :)

A find/ls in the spool is still dead slow, even though a (loop-) reiserfs is just for such a use-case. Boggles of small unimportant / replaceable / re-downloadable files...
I believe ls uses readdir() rather than getdents().
Actually, no, ls seems to _only_ use getdents(2).

$ LC_ALL=C ltrace -S -e file ls /tmp/

(add '-R' as an 'ls' option, makes it hard to read though)...
Did you try that and/or does your faster program work with it instead? I'd be interested to try to track down what it is that makes ls unusably slow in these circumstances. Maybe it's calling stat() or building in-memory structures for sorting the names or somesuch that causes the slowdown.
Well, find does use 'readdir(3)' which uses 'getdents(2)'. ls does quite a bit more "around" (*stat)
If you're willing to post the source of your utility or email it, I'll have a play.
AOL :)

But: on whatever FS, you'll have to get the info _somehow_ from the VFS, so it depends on what your program wants... For plain names, well, a dir is just a (special) file which you could mmap and parse (or just use readdir which does just that ;) Playing around with debugfs and the kernel-sources of the fs and the manpages is enlightening (but just for fun). Well, if you're interested, just do:

$ dd if=/dev/zero bs=10M count=1 of=foo.img
$ mkfs.<your_favorite_fs> [-f] foo.img
[loop mount and write some stuff to play with on the fs]
[sync && unmount]
$ debugfoofs foo.img
$ vche foo.img ## or whatever hexeditor works with "disks"
...

It _is_ fun to play around with that :)

-dnh

--
Microsoft is a cross between The Borg and the Ferengi. Unfortunately they use Borg to do their marketing and Ferengi to do their programming. [Simon Slavin in the SDM]
David Haller wrote:
Hello,
On Tue, 30 Oct 2018, Dave Howorth wrote:
On Tue, 30 Oct 2018 17:36:09 +0100 Per Jessen <per@computer.org> wrote:
This might be file system dependent, I'm not sure. I've been doing some tidying up and got stuck on a few directories with millions of files in them. 3+ million per directory. Doing a 'find' takes a very long time and also essentially chokes the system. I ended up writing a small utility using getdents() instead, much faster and the system remains operational.
Try:
ionice -c 3 nice find ...
That way, your system should stay nicely responsive while your disks/ssds are exercised ;) Just did a find ... over a couple of TB on 7 disks (no dirs with tons of files though), and the system was as smooth as ever while I'm typing this ;)
Generally the system responds fine, and the volume is only about 600GB, 94% full, which is why I'm tidying up :-) I tried your 'ionice -c 3 nice find ...' - it appears to take the same time. No output for the first 10min. An strace shows it doing the same - running getdents() one after another.
That find generates a plain-text index ('name\tsize') over currently about ~900k files ;) Accordingly, 'free' shows ~2.3GB cached and 682MB buffers ATM just after the find is done ;) And that's with just 4GiB RAM ...
The killer seems to be the 'find' on a directory with millions of files.
But: on whatever FS, you'll have to get the info _somehow_ from the VFS, so it depends on what your program wants... For plain names, well, a dir is just a (special) file which you could mmap and parse (or just use readdir which does just that ;)
I need name and mtime, but the stat() that I added to the getdents() loop did not seem to cause any problem or delays. I mean, it will be noticeable over 3mill files, but that's okay as long as the system remains responsive. Seems to me 'find' ought to have an option to say "one getdents() at a time" or some such.

--
Per Jessen, Zürich (5.8°C)
http://www.cloudsuisse.com/ - your owncloud, hosted in Switzerland.
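For the name+mtime variant Per describes, the per-entry stat might look something like this inside the inner loop of the getdents sketch earlier in the thread (a guess at the approach, not necessarily Per's actual code; fstatat() is used here so the kernel resolves the name relative to the already-open directory fd instead of re-walking the full path):

/* additionally: #include <sys/stat.h> and #include <time.h> */
struct stat st;
if (fstatat(fd, d->d_name, &st, AT_SYMLINK_NOFOLLOW) == 0)
    printf("%s\t%s", d->d_name, ctime(&st.st_mtime));  /* ctime() output ends in '\n' */
else
    perror(d->d_name);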
On 30/10/18 12:36 PM, Per Jessen wrote:
I was just wondering if e.g. 'find' or 'ls' had some options that would limit the scope ? (not mtime etc).
Well, I'd start with the "-f -- do not sort" option. Sorting means slurping up *everything* into memory, which is going to involve a lot of virtual memory work and probably paging to get there. What you really want is "read one/print one" or "read one/process one".

--
A: Yes.
> Q: Are you sure?
>> A: Because it reverses the logical flow of conversation.
>>> Q: Why is top posting frowned upon?
Anton Aylward wrote:
On 30/10/18 12:36 PM, Per Jessen wrote:
I was just wondering if e.g. 'find' or 'ls' had some options that would limit the scope ? (not mtime etc).
Well, I'd start with the "-f -- do not sort" option. Sorting means slurping up *everything* into memory, which is going to involve a lot of virtual memory work and probably paging to get there.
It's not memory that is the problem - the box has 64GB. I tried 'ls -f', made no difference although I did not let it finish - the last 'find' ran for 14 hours before I had to stop it.
What you really want is "read one/print one" or "read one/process one"
Yep and that is more or less what the code does - do one getdents(), process it, rinse, repeat. With that I can list 3+ million files in minutes. It just seems to me 'ls' and 'find' ought to be able to do the same.

--
Per Jessen, Zürich (5.7°C)
http://www.dns24.ch/ - free dynamic DNS, made in Switzerland.
On 30/10/2018 19.30, Per Jessen wrote:
Anton Aylward wrote:
On 30/10/18 12:36 PM, Per Jessen wrote:
I was just wondering if e.g. 'find' or 'ls' had some options that would limit the scope ? (not mtime etc).
Well, I'd start with the "-f -- do not sort" option. Sorting means slurping up *everything* into memory, which is going to involve a lot of virtual memory work and probably paging to get there.
It's not memory that is the problem - the box has 64GB. I tried 'ls -f', made no difference although I did not let it finish - the last 'find' ran for 14 hours before I had to stop it.
What you really want is "read one/print one" or "read one/process one"
Yep and that is more or less what the code does - do one getdents(), process it, rinse, repeat. With that I can list 3+ million files in minutes. It just seems to me 'ls' and 'find' ought to be able to do the same.
Have you tried a file browser? I'm just curious. Maybe 'mc'. Another thing to try is magnetic vs SSD disk, to see if it is I/O-bound. Huh, no, you said a different library call is faster.

--
Cheers / Saludos,
Carlos E. R. (from 42.3 x86_64 "Malachite" at Telcontar)
Hello,

On Tue, 30 Oct 2018, Per Jessen wrote:
Anton Aylward wrote:
On 30/10/18 12:36 PM, Per Jessen wrote:
I was just wondering if e.g. 'find' or 'ls' had some options that would limit the scope ? (not mtime etc).
Well, I'd start with the "-f -- do not sort" option. Sorting means slurping up *everything* into memory, which is going to involve a lot of virtual memory work and probably paging to get there.
It's not memory that is the problem - the box has 64GB. I tried 'ls -f', made no difference although I did not let it finish - the last 'find' ran for 14 hours before I had to stop it.
What you really want is "read one/print one" or "read one/process one"
Yep and that is more or less what the code does - do one getdents(), process it, rinse, repeat. With that I can list 3+ million files in minutes. It just seems to me 'ls' and 'find' ought to be able to do the same.
Weird. Both 'find' and 'ls' just do that (wrapped in a readdir(3) possibly). Could you compare an

$ ltrace -S -efile ...

of ls/find vs. your "just getdents" code? A small sample dir (-tree) should suffice to spot the diff (when ls/find don't sort)...

Oh, well, yeah, ls _does_ need to stat if it's to print dates and stuff, so keep that in mind.

-dnh

ObSigNote: *bwah* Somewhen, all the fun was pruned... And now a CoC? Whut?!?!!!

--
prom_printf("Detected PenguinPages, getting out of here.\n");
linux-2.0.38/arch/sparc/mm/srmmu.c
David Haller wrote:
Hello,
On Tue, 30 Oct 2018, Per Jessen wrote:
Anton Aylward wrote:
On 30/10/18 12:36 PM, Per Jessen wrote:
I was just wondering if e.g. 'find' or 'ls' had some options that would limit the scope ? (not mtime etc).
Well, I'd start with the "-f -- do not sort" option. Sorting means slurping up *everything* into memory, which is going to involve a lot of virtual memory work and probably paging to get there.
It's not memory that is the problem - the box has 64GB. I tried 'ls -f', made no difference although I did not let it finish - the last 'find' ran for 14 hours before I had to stop it.
What you really want is "read one/print one" or "read one/process one"
Yep and that is more or less what the code does - do one getdents(), process it, rinse, repeat. With that I can list 3+ million files in minutes. It just seems to me 'ls' and 'find' ought to be able to do the same.
Weird. Both 'find' and 'ls' just do that (wrapped in a readdir(3) possibly).
Could you compare an
$ ltrace -S -efile ...
of ls/find vs. your "just getdents" code? A small sample dir (-tree) should suffice to spot the diff (when ls/find don't sort)...
Yep, that sounds interesting, I'll try that tomorrow.
Oh, well, yeah, ls _does_ need to stat if it's to print dates and stuff, so keep that in mind.
Good point. find will also need to, if I use -mtime (which I tend to do when tidying up old stuff).

--
Per Jessen, Zürich (5.6°C)
http://www.dns24.ch/ - your free DNS host, made in Switzerland.
Per Jessen wrote:
David Haller wrote:
Could you compare an
$ ltrace -S -efile ...
of ls/find vs. your "just getdents" code? A small sample dir (-tree) should suffice to spot the diff (when ls/find don't sort)...
Yep, that sounds interesting, I'll try that tomorrow.
These are the traces:

https://files.jessen.ch/ltrace-find.log (find -mtime)
https://files.jessen.ch/ltrace-ls.log (ls -f)
https://files.jessen.ch/ltrace-lsdir.log (my code)

The directory is "cur". A quick glance at ls and find, and I see roughly the same behaviour - open("cur") followed by streaming getdents(). 'ls' uses mmap, 'find' does not. (both ran for less than a minute). My code (largely borrowed from the man page) -

open("cur")
getdents
stat
stat
write
. . . . .
getdents
stat
stat
write

The extra stat call is because I call ctime() to print the mtime.

--
Per Jessen, Zürich (5.8°C)
http://www.hostsuisse.com/ - dedicated server rental in Switzerland.
Per Jessen wrote:
Anton Aylward wrote:
On 30/10/18 12:36 PM, Per Jessen wrote:
I was just wondering if e.g. 'find' or 'ls' had some options that would limit the scope ? (not mtime etc).
Well, I'd start with the "-f -- do not sort" option. Sorting means slurping up *everything* into memory, which is going to involve a lot of virtual memory work and probably paging to get there.
It's not memory that is the problem - the box has 64GB. I tried 'ls -f', made no difference although I did not let it finish - the last 'find' ran for 14 hours before I had to stop it.
What you really want is "read one/print one" or "read one/process one"
Yep and that is more or less what the code does - do one getdents(), process it, rinse, repeat. With that I can list 3+ million files in minutes.
Okay, I was perhaps a little enthusiastic here. "minutes" yes, but more like 120+. I'm at 3291912 files and counting, around July 2015. Still, the system remains responsive, processing, database is running, much much better than with 'find'.

--
Per Jessen, Zürich (5.3°C)
http://www.hostsuisse.com/ - virtual servers, made in Switzerland.
On 10/30/18 9:18 PM, Per Jessen wrote:
Per Jessen wrote:
Anton Aylward wrote:
What you really want is "read one/print one" or "read one/process one"
Yep and that is more or less what the code does - do one getdents(), process it, rinse, repeat. With that I can list 3+ million files in minutes.
That is exactly what find does by using gnulib's FTS: it reads a certain number of entries, processes them, and then reads the next ones until all are done (or find terminates otherwise).
Okay, I was perhaps a little enthusiastic here. "minutes" yes, but more like 120+. I'm at 3291912 files and counting, around July 2015. Still, the system remains responsive, processing, database is running, much much better than with 'find'.
I guess that you added an option like -mtime that requires an additional stat(), or the action does (like -ls). You didn't write what file system type you are using, did you?

After all, this seems to be a very special action on that directory. What will you do with that list? I mean you'll most probably need yet another round to move or delete the files ...

Have a nice day,
Berny
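As an aside to the FTS remark above: a bare-bones fts walk might look like this (a minimal sketch of the fts(3) API, which gnulib's FTS resembles, not find's actual code; FTS_NOSTAT skips the per-entry stat() when it isn't needed, in which case non-directories are reported as FTS_NSOK):

#include <fts.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    char *paths[] = { argc > 1 ? argv[1] : ".", NULL };

    /* FTS_PHYSICAL: don't follow symlinks; passing NULL as the
       comparison function means entries come back in directory
       order, i.e. unsorted */
    FTS *fts = fts_open(paths, FTS_PHYSICAL | FTS_NOSTAT, NULL);
    if (fts == NULL) { perror("fts_open"); exit(EXIT_FAILURE); }

    FTSENT *ent;
    while ((ent = fts_read(fts)) != NULL) {
        /* entries are delivered one at a time, a batch of getdents()
           behind the scenes, rather than the whole directory at once */
        if (ent->fts_info == FTS_F || ent->fts_info == FTS_NSOK)
            puts(ent->fts_path);
    }
    fts_close(fts);
    exit(EXIT_SUCCESS);
}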
Bernhard Voelker wrote:
On 10/30/18 9:18 PM, Per Jessen wrote:
Per Jessen wrote:
Anton Aylward wrote:
What you really want is "read one/print one" or "read one/process one"
Yep and that is more or less what the code does - do one getdents(), process it, rinse, repeat. With that I can list 3+ million files in minutes.
That is exactly what find does by using gnulib's FTS: it reads a certain number of entries, processes them, and then reads the next ones until all are done (or find terminates otherwise).
Hi Berny,

That sounds like what I am asking for - how do I make 'find' do that?
Okay, I was perhaps a little enthusiastic here. "minutes" yes, but more like 120+. I'm at 3291912 files and counting, around July 2015. Still, the system remains responsive, processing, database is running, much much better than with 'find'.
I guess that you added an option like -mtime that requires an additional stat(), or the action does (like -ls). You didn't write what file system type you are using, did you?
File system is JFS - it's likely I used -mtime on the attempt that hung up the entire system, yes. Using my own bit of code, I went through a directory with 10715698 files in 629 minutes, including a stat() call. The time is really not important, the key thing is the system kept running and remained responsive.
After all, this seems to be a very special action on that directory. What will you do with that list? I mean you'll most probably need yet another round to move or delete the files ...
Yes, most of them will be deleted - being careful and doing them one by one will take a couple of days maybe, but again, the system will keep running. I'm currently deleting 3037712 files, that has been running for about 16 hours now.

--
Per Jessen, Zürich (4.1°C)
http://www.hostsuisse.com/ - virtual servers, made in Switzerland.
On 10/31/18 8:10 AM, Per Jessen wrote:
Bernhard Voelker wrote:
That is exactly what find does by using gnulib's FTS: it reads a certain number of entries, processes them, and then reads the next ones until all are done (or find terminates otherwise).
Hi Berny
That sounds like what I am asking for - how do I make 'find' do that?
In older findutils versions, that needs to be the separate 'ftsfind' binary, available starting with 4.3.
I guess that you added an option like -mtime that requires an additional stat(), or the action does (like -ls). You didn't write what file system type you are using, did you?
File system is JFS - it's likely I used -mtime on the attempt that hung up the entire system, yes.
I think JFS should be fine - I just wanted to rule out networking or ancient file systems etc. Well, you couldn't do much about it anyway.
Using my own bit of code, I went through a directory with 10715698 files in 629 minutes, including a stat() call. The time is really not important, the key thing is the system kept running and remained responsive.
Re. responsiveness: I'd guess that depends on the implementation of the file system in the kernel. Of course, there is 'ionice' which could tell the system to avoid being overly eager.
After all, this seems to be a very special action on that directory. What will you do with that list? I mean you'll most probably need yet another round to move or delete the files ...
Yes, most of them will be deleted - being careful and doing them one by _______^^^^^^^^^^^^^^^^^^^^^^^^^^^^
What is the criterion? Can you decide by filename? Then you could generate a list of commands as shown in the other email with 'find ... -printf'.

Last-minute idea: another option would be to move out the "good" files you want to keep, and then delete the whole directory. ;-)

Have a nice day,
Berny
Bernhard Voelker wrote:
On 10/31/18 8:10 AM, Per Jessen wrote:
Bernhard Voelker wrote:
That is exactly what find does by using gnulib's FTS: it reads a certain number of entries, processes them, and then reads the next ones until all are done (or find terminates otherwise).
Hi Berny
That sounds like what I am asking for - how do I make 'find' do that?
In older findutils versions, that needs to be the separate 'ftsfind' binary, available starting with 4.3.
I might try it - just build it from source on this system. It would be very cool if ftsfind works.
After all, this seems to be a very special action on that directory. What will you do with that list? I mean you'll most probably need yet another round to move or delete the files ...
Yes, most of them will be deleted - being careful and doing them one by _______^^^^^^^^^^^^^^^^^^^^^^^^^^^^
What is the criterion? Can you decide by filename?
Age. The filenames do include a timestamp, but I don't want to rely on it.
Then you could generate a list of commands like shown in the other email with 'find ... -printf'.
Well, for the biggest directory with some 10 million files, I now have a full list; I'll slowly start looping through that and deleting the files.

--
Per Jessen, Zürich (9.8°C)
http://www.hostsuisse.com/ - dedicated server rental in Switzerland.
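A sketch of how that delete pass might look (assumptions of this sketch, not anything Per specified: the list lives in a file named "filelist" with one path per line, and the periodic pause is an arbitrary throttle to keep the system responsive):

#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    FILE *fp = fopen("filelist", "r");   /* hypothetical list file */
    if (fp == NULL) { perror("fopen"); return 1; }

    char path[4096];
    long n = 0;
    while (fgets(path, sizeof path, fp) != NULL) {
        path[strcspn(path, "\n")] = '\0';   /* strip trailing newline */
        if (unlink(path) == -1)
            perror(path);                    /* report but keep going */
        if (++n % 1000 == 0)
            usleep(100000);   /* brief pause every 1000 files */
    }
    fclose(fp);
    return 0;
}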
On 10/30/18 5:36 PM, Per Jessen wrote:
This might be file system dependent, I'm not sure. I've been doing some tidying up and got stuck on a few directories with millions of files in them. 3+ million per directory. Doing a 'find' takes a very long time and also essentially chokes the system. I ended up writing a small utility using getdents() instead, much faster and the system remains operational.
I was just wondering if e.g. 'find' or 'ls' had some options that would limit the scope ? (not mtime etc).
Both find and ls use gnulib's FTS implementation instead of raw readdir and friends. You didn't show us the command line you used, but I would guess that you used some options that require an additional stat(). E.g. "ls -l" needs to do an additional stat(). Likewise for coloring output etc. Furthermore, ls defaults to sort the output. Better use the -U option to stick to "directory order". The option -f includes -U.

For find, it is the same: did you only run 'find'/'find -print' or 'find -ls'? The former 2 should be quite fast while the latter also requires a stat().

Finally, it depends on the file system type: e.g. NFS is known to become really nasty with so many files in a directory.

Have a nice day,
Berny
Bernhard Voelker wrote:
On 10/30/18 5:36 PM, Per Jessen wrote:
This might be file system dependent, I'm not sure. I've been doing some tidying up and got stuck on a few directories with millions of files in them. 3+ million per directory. Doing a 'find' takes a very long time and also essentially chokes the system. I ended up writing a small utility using getdents() instead, much faster and the system remains operational.
I was just wondering if e.g. 'find' or 'ls' had some options that would limit the scope ? (not mtime etc).
Both find and ls use gnulib's FTS implementation instead of raw readdir and friends.
Is that relatively new? This is an elderly system, not up-to-date, running openSUSE 11. Another reason for cleaning it up.
You didn't show us the command line you used, but I would guess that you used some options that require an additional stat(). E.g. "ls -l" needs to do an additional stat(). Likewise for coloring output etc. Furthermore, ls defaults to sort the output. Better use the -U option to stick to "directory order". The option -f includes -U.
Yes, Anton also suggested using '-f' yesterday, but my initial test did not show any improvement. (within 5-10min).
For find, it is the same: did you only run 'find'/'find -print' or 'find -ls'? The former 2 should be quite fast while the latter also requires a stat().
I almost certainly did 'find dir -mtime +365' - this would also require a stat(), but my code shows the stat() is not the problem.

--
Per Jessen, Zürich (4.6°C)
http://www.hostsuisse.com/ - dedicated server rental in Switzerland.
On 10/31/18 3:23 AM, Per Jessen wrote:
On 10/30/18 5:36 PM, Per Jessen wrote:
This might be file system dependent, I'm not sure. I've been doing some tidying up and got stuck on a few directories with millions of files in them. 3+ million per directory.
When searching for a file by its name, would the "locate" command be unsuitable?
On 31/10/2018 12.27, ken wrote:
On 10/31/18 3:23 AM, Per Jessen wrote:
On 10/30/18 5:36 PM, Per Jessen wrote:
This might be file system dependent, I'm not sure. I've been doing some tidying up and got stuck on a few directories with millions of files in them. 3+ million per directory.
when searching for a file by its name, would the "locate" command be unsuitable?
Yes, if its database contains that directory it can certainly be used. It can be told to verify if the file currently exists.

However, "locate /etc" also finds "/backup/etc/*" or "/alternate/etc/*". There is no switch for absolute paths that I know of.

--
Cheers / Saludos,
Carlos E. R. (from 42.3 x86_64 "Malachite" at Telcontar)
On 10/31/18 7:45 AM, Carlos E. R. wrote:
On 31/10/2018 12.27, ken wrote:
On 10/31/18 3:23 AM, Per Jessen wrote:
On 10/30/18 5:36 PM, Per Jessen wrote:
This might be file system dependent, I'm not sure. I've been doing some tidying up and got stuck on a few directories with millions of files in them. 3+ million per directory.

When searching for a file by its name, would the "locate" command be unsuitable?

Yes, if its database contains that directory it can certainly be used. It can be told to verify if the file currently exists.
However, "locate /etc" also finds "/backup/etc/*" or "/alternate/etc/*", There is no switch for absolute paths that I know of.
Then use "locate --regex ^/etc/[fname or whatever]" for absolute path. -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
On 10/31/18 8:23 AM, Per Jessen wrote:
Bernhard Voelker wrote:
On 10/30/18 5:36 PM, Per Jessen wrote:
This might be file system dependent, I'm not sure. I've been doing some tidying up and got stuck on a few directories with millions of files in them. 3+ million per directory. Doing a 'find' takes a very long time and also essentially chokes the system. I ended up writing a small utility using getdents() instead, much faster and the system remains operational.
I was just wondering if e.g. 'find' or 'ls' had some options that would limit the scope ? (not mtime etc).
Both find and ls use gnulib's FTS implementation instead of raw readdir and friends.
Is that relatively new? This is an elderly system, not up-to-date, running openSUSE 11. Another reason for cleaning it up.
Indeed, FTS support was added in 4.3 and newer, and as far as I remember it was installed alongside the regular 'find' (alias 'oldfind' upstream) as 'ftsfind'. Nowadays, 'oldfind' has gone and is only available in the upstream git development tree.
You didn't show us the command line you used, but I would guess that you used some options that require an additional stat(). E.g. "ls -l" needs to do an additional stat(). Likewise for coloring output etc. Furthermore, ls defaults to sort the output. Better use the -U option to stick to "directory order". The option -f includes -U.
Yes, Anton also suggested using '-f' yesterday, but my initial test did not show any improvement. (within 5-10min).
I think the fastest possible way would be '/usr/bin/ls -1U', thus ensuring that you don't use the 'ls' alias, which usually adds some options automagically.
For find, it is the same: did you only run 'find'/'find -print' or 'find -ls'? The former 2 should be quite fast while the latter also requires a stat().
I almost certainly did 'find dir -mtime +365' - this would also require a stat(), but my code shows the stat() is not the problem.
Assuming you have quite regular file names, you could have find create commands for you, e.g. moving into a per-day subdirectory:

$ find . -maxdepth 1 -mindepth 1 \
    -type f -mtime +365 \
    -printf 'mkdir -p "%TY/%Tm/%Td" && mv -n "%P" "%TY/%Tm/%Td/"\n'

would produce commands like:

mkdir -p "2015/12/25" && mv -n "file.1451011600" "2015/12/25/"
mkdir -p "2015/12/25" && mv -n "file.1451008000" "2015/12/25/"
mkdir -p "2015/12/25" && mv -n "file.1451004400" "2015/12/25/"
mkdir -p "2015/12/25" && mv -n "file.1451000800" "2015/12/25/"
mkdir -p "2015/12/24" && mv -n "file.1450997200" "2015/12/24/"
mkdir -p "2015/12/24" && mv -n "file.1450993600" "2015/12/24/"

which you could feed into a subsequent shell. (Obviously, it would be faster if you'd know and create the subdirectories upfront.)

If you just want to delete them, the -delete option is your friend (also >v4.3).

Have a nice day,
Berny
Bernhard Voelker wrote:
On 10/31/18 8:23 AM, Per Jessen wrote:
Bernhard Voelker wrote:
On 10/30/18 5:36 PM, Per Jessen wrote:
This might be file system dependent, I'm not sure. I've been doing some tidying up and got stuck on a few directories with millions of files in them. 3+ million per directory. Doing a 'find' takes a very long time and also essentially chokes the system. I ended up writing a small utility using getdents() instead, much faster and the system remains operational.
I was just wondering if e.g. 'find' or 'ls' had some options that would limit the scope ? (not mtime etc).
Both find and ls use gnulib's FTS implementation instead of raw readdir and friends.
Is that relatively new? This is an elderly system, not up-to-date, running openSUSE 11. Another reason for cleaning it up.
Indeed, FTS support was added in 4.3 and newer,
Berny, do you know how I would determine if the currently installed 'find' has FTS support? I have version 4.4 installed.

--
Per Jessen, Zürich (8.9°C)
http://www.hostsuisse.com/ - virtual servers, made in Switzerland.
On 11/1/18 6:15 PM, Per Jessen wrote:
Berny, do you know how I would determine if the currently installed 'find' has FTS support? I have version 4.4 installed.
On oS:TW, I see fts_{open,close,read} and friends in the output of:

$ readelf -s /usr/bin/find | grep -i fts
   151: 0000000000022220   923 FUNC GLOBAL DEFAULT 15 fts_open
   165: 00000000000225c0   377 FUNC GLOBAL DEFAULT 15 fts_close
   166: 0000000000022fc0    41 FUNC GLOBAL DEFAULT 15 fts_set
   167: 0000000000022740  2176 FUNC GLOBAL DEFAULT 15 fts_read
   168: 0000000000022ff0   381 FUNC GLOBAL DEFAULT 15 fts_children

Have a nice day,
Berny
Bernhard Voelker wrote:
On 11/1/18 6:15 PM, Per Jessen wrote:
Berny, do you know how I would determine if the currently installed 'find' has FTS support? I have version 4.4 installed.
On oS:TW, I see fts_{open,close,read} and friends in the output of:
$ readelf -s /usr/bin/find | grep -i fts
   151: 0000000000022220   923 FUNC GLOBAL DEFAULT 15 fts_open
   165: 00000000000225c0   377 FUNC GLOBAL DEFAULT 15 fts_close
   166: 0000000000022fc0    41 FUNC GLOBAL DEFAULT 15 fts_set
   167: 0000000000022740  2176 FUNC GLOBAL DEFAULT 15 fts_read
   168: 0000000000022ff0   381 FUNC GLOBAL DEFAULT 15 fts_children
Thanks, that was a great answer! I didn't know about readelf. On my system with openSUSE 11:

# readelf -s /usr/bin/find | grep -i fts
   131: 00000000004158b0  1851 FUNC GLOBAL DEFAULT 14 fts_read
   132: 0000000000414750    45 FUNC GLOBAL DEFAULT 14 fts_set
   133: 0000000000415650   436 FUNC GLOBAL DEFAULT 14 fts_children
   136: 0000000000416150   867 FUNC GLOBAL DEFAULT 14 fts_open
   141: 0000000000415ff0   338 FUNC GLOBAL DEFAULT 14 fts_close

I'm not sure what this FTS support is, but it doesn't seem to help.

--
Per Jessen, Zürich (8.1°C)
http://www.hostsuisse.com/ - dedicated server rental in Switzerland.
participants (7):
- Anton Aylward
- Bernhard Voelker
- Carlos E. R.
- Dave Howorth
- David Haller
- ken
- Per Jessen