Jan, Istvan,

On Friday 18 August 2006 01:53, Jan Engelhardt wrote:
I'd like to back up my files. I have a zillion directories with a zillion subdirs, and it's possible that some files are duplicated. I would like to find them before backing up. I found software that can do this, but it's not free (zsDuplicateHunter). Do you know of one that is open source and available for SUSE?
find /dir -type f -print0 | xargs -0 md5sum | sort
should do the trick.
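The sort step brings identical checksums onto adjacent lines, which is what makes the duplicates easy to spot. A sketch of what the output might look like, with invented hashes and paths:

  % find /dir -type f -print0 | xargs -0 md5sum | sort   # illustrative run; hashes and paths are made up
  900150983cd24fb0d6963f7d28e17f72  /dir/notes/todo.txt
  d41d8cd98f00b204e9800998ecf8427e  /dir/old/intro-to-x.pdf
  d41d8cd98f00b204e9800998ecf8427e  /dir/papers/intro-to-x.pdf

The two lines sharing a checksum are the same contents stored twice.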
To which I'd add one more pipeline stage:

  |uniq -D -w 32

Hint:

  % uniq --help |egrep -e '-D|-w'
  ...
  -D, --all-repeated[=delimit-method]   print all duplicate lines
  ...
  -w, --check-chars=N                   compare no more than N characters in lines
  ...

By adding this you'll see only the duplicated entries.

I ran this on my publications directory (where I keep downloaded academic papers, mostly), and out of 5,577 files occupying 1.4 GB it found 432 distinct duplicated files, with 954 duplicate copies among them.

So in honor of its usefulness, I've created this script (to be elaborated later):

~/bin/duploc:
-==--==--==--==--==--==--==--==--==--==--==--==--==--==--==-
#!/bin/bash --norc

find "$@" -type f -print0 \
  |xargs -0 md5sum \
  |sort \
  |uniq -D -w 32
-==--==--==--==--==--==--==--==--==--==--==--==--==--==--==-

Note that as written you can give find options (-follow, e.g.) after the directory names.
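For completeness, a hypothetical run of the script (the directory name and the -follow option are just examples):

  % chmod +x ~/bin/duploc
  % ~/bin/duploc ~/publications -follow   # anything after the directory is passed straight through to find

It prints every file whose MD5 checksum matches at least one other file's, grouped checksum by checksum, so you can weed out the extra copies before starting the backup.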
Jan Engelhardt
Randall Schulz