Has anybody had any problems with find looping?

I occasionally find in the morning that my overnight backup script hasn't finished because one of the jobs is blocked. When I look with top, I always discover a 'find' process is hogging a CPU (actually a HT of a dual-processor Xeon system):

Cpu0 : 0.0% us, 0.0% sy, 0.0% ni, 100.0% id, 0.0% wa, 0.0% hi, 0.0% si
Cpu1 : 0.0% us, 0.0% sy, 100.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.0% si
Cpu2 : 0.0% us, 0.0% sy, 0.0% ni, 100.0% id, 0.0% wa, 0.0% hi, 0.0% si
Cpu3 : 0.0% us, 0.0% sy, 0.0% ni, 98.3% id, 1.7% wa, 0.0% hi, 0.0% si

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
19073 root 39 15 2924 704 2752 R 99.9 0.0 1833:14 find

ps tells me a little about the process:

root 19073 19055 98 Feb02 ? 1-06:44:52 find /backup/suse3/suse3-root/2006-02-01/tree -ls

I did an strace -p but there was no output over five minutes.

The system is SUSE 9.2, GNU find version 4.1.20. uname says:

Linux suse1 2.6.8-24.14-smp #1 SMP Tue Mar 29 09:27:43 UTC 2005 x86_64 x86_64 x86_64 GNU/Linux

Has anybody seen this before?

Thanks,
Dave
Hi Dave, On Thursday 09 February 2006 05:10, Dave Howorth wrote:
Has anybody had any problems with find looping?
Not me.
I occasionally find in the morning that my overnight backup script hasn't finished because one of the jobs is blocked.
<snip>

How robust is your backup script with respect to error handling and timeouts? You may need to modify your procedure or schedule so that it doesn't conflict with find when cron runs it on schedule.

Carl
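One way to act on the timeout question is to wrap each backup job so that a stuck child cannot block the rest of the run. The following is only a minimal sketch: the helper name run_with_timeout and the 6-hour limit in the usage note are illustrative assumptions, not part of Dave's script.

```python
import subprocess

def run_with_timeout(cmd, timeout_s):
    """Run cmd (a list of arguments) and return (returncode, timed_out).

    Hypothetical helper for illustration. If the command is still running
    after timeout_s seconds, subprocess.run kills it and raises
    TimeoutExpired, which is reported here as (None, True).
    """
    try:
        completed = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None, True
    return completed.returncode, False
```

A backup script could then call, for example, run_with_timeout(["find", "/backup/suse3/suse3-root/2006-02-01/tree", "-ls"], 6 * 3600), log the timeout, and move on to the next job instead of hanging until morning.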
Dave, On Thursday 09 February 2006 02:10, Dave Howorth wrote:
Has anybody had any problems with find looping?
No, but I've seen it explicitly detect, report and circumvent symlink loops.
I occasionally find in the morning that my overnight backup script hasn't finished because one of the jobs is blocked. When I look with top, I always discover a 'find' process is hogging a CPU (actually a HT of a dual-processor Xeon system):
...
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
19073 root 39 15 2924 704 2752 R 99.9 0.0 1833:14 find
...
I did an strace -p but there was no output over five minutes.
So you know that find is in a fairly tight CPU loop, issuing no system calls at all.

You could use "lsof" to find out which file-system resources find is actively using. Anything other than the usual (its executable file, shared object libraries, standard in and out, etc.) might at least tell you where in the file system it was operating when it got stuck.

You might also want to consider running a file system check. There could be some kind of corruption or inconsistency that's throwing find for a loop, if you will.
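If lsof is not to hand, much the same information can be read from /proc on Linux. The sketch below is an assumption-laden illustration, not a replacement for lsof: open_paths is a hypothetical helper, it only works where /proc is mounted, and reading another user's process requires root. The current directory is worth checking because find may change directory as it descends, in which case it hints at where the process stopped.

```python
import os

def open_paths(pid):
    """Return (cwd, targets) for a process, read from /proc (Linux only).

    Hypothetical helper for illustration. cwd is the process's current
    working directory; targets lists what each open file descriptor
    points at, in descriptor order.
    """
    proc_dir = f"/proc/{pid}"
    cwd = os.readlink(os.path.join(proc_dir, "cwd"))
    fd_dir = os.path.join(proc_dir, "fd")
    targets = []
    for name in sorted(os.listdir(fd_dir), key=int):
        try:
            targets.append(os.readlink(os.path.join(fd_dir, name)))
        except OSError:
            # The descriptor was closed between listing and reading it.
            continue
    return cwd, targets
```

For the process in this thread that would be open_paths(19073), run as root while the find is still spinning.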
...
Thanks, Dave
Randall Schulz
participants (3): Carl Hartung, Dave Howorth, Randall R Schulz