Hello,

I'd like to back up my files. I have a zillion directories with a zillion subdirs, and it's possible that some files are duplicated. I would like to find them before backing up. I found a program that can do this, but it's not free (zsDuplicateHunter). Do you know of one that is open source and available for SUSE?

Thanks,
IG
I'd like to back up my files. I have a zillion directories with a zillion subdirs, and it's possible that some files are duplicated. I would like to find them before backing up. I found a program that can do this, but it's not free (zsDuplicateHunter). Do you know of one that is open source and available for SUSE?
find /dir -type f -print0 | xargs -0 md5sum | sort

should do the trick.

Jan Engelhardt
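As a concrete illustration (the paths are hypothetical; the checksum shown happens to be the md5sum of an empty file), each output line is a 32-character hash followed by the file name, so sorting brings identical files onto adjacent lines:

    d41d8cd98f00b204e9800998ecf8427e  /dir/notes/empty-a.txt
    d41d8cd98f00b204e9800998ecf8427e  /dir/old/empty-b.txt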
Jan, Istvan,

On Friday 18 August 2006 01:53, Jan Engelhardt wrote:
I'd like to back up my files. I have a zillion directories with a zillion subdirs, and it's possible that some files are duplicated. I would like to find them before backing up. I found a program that can do this, but it's not free (zsDuplicateHunter). Do you know of one that is open source and available for SUSE?
find /dir -type f -print0 | xargs -0 md5sum | sort
should do the trick.
To which I'd add one more pipeline stage:

    |uniq -D -w 32

Hint:

    % uniq --help | egrep -e '-D|-w'
      -D, --all-repeated[=delimit-method]  print all duplicate lines
      -w, --check-chars=N                  compare no more than N characters in lines

By adding this you'll see only the duplicated entries. I ran this on my publications directory (where I keep downloaded academic papers, mostly), and out of 5,577 files occupying 1.4 GB it found 432 unique duplicated files with 954 duplicates. So in honor of its usefulness, I've created this script (to be elaborated later):

~/bin/duploc:
-==--==--==--==--==--==--==--==--==--==--==--==--==--==--==-
#!/bin/bash --norc

find "$@" -type f -print0 \
    |xargs -0 md5sum \
    |sort \
    |uniq -D -w 32
-==--==--==--==--==--==--==--==--==--==--==--==--==--==--==-

Note that as written you can give find options (-follow, e.g.) after the directory names.
Randall Schulz
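For anyone trying the duploc script above, a quick usage sketch (the directory names are placeholders; -w 32 matches the 32 hex characters of an md5 checksum):

    % chmod +x ~/bin/duploc
    % ~/bin/duploc ~/papers ~/archive
    % ~/bin/duploc ~/papers -follow

Each run prints md5sum-style lines, filtered so that only checksums occurring more than once are shown.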
Hi Istvan,

I had the same problem a few months ago, and I discovered a shell script that I updated to find identical files in several directories. Unfortunately, I don't remember who the original author is.

The script is quite fast because it first looks for files with the same size and only then calculates their md5sums. The output is a shell script with all the duplicate files listed, so that you can choose the ones to delete.

There is one problem that I could not solve and I don't know the exact reason for, but it's related to files with "strange" characters in their names, like "'`'". This "feature" only manifested once, with files that I suspect had Chinese characters in their names.

Let me have some feedback if you use it.

Regards,
Lívio Cipriano

----- Original Message -----
Hello,

I'd like to back up my files. I have a zillion directories with a zillion subdirs, and it's possible that some files are duplicated. I would like to find them before backing up. I found a program that can do this, but it's not free (zsDuplicateHunter). Do you know of one that is open source and available for SUSE?

Thanks,
IG
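Lívio's script itself is not attached here, but as a rough sketch of the same idea (compare sizes first, md5sum only the candidates) something like the following would work. This is not the original script; it assumes GNU find, awk and coreutils, and file names without tabs or newlines:

    #!/bin/bash --norc
    # Sketch only: hash just those files whose size occurs more than once.
    tmp=$(mktemp)
    find "$@" -type f -printf '%s\t%p\n' | sort -n > "$tmp"
    # First awk pass counts each size; the second pass prints the paths of
    # files whose size is shared by at least one other file.
    awk -F'\t' 'NR==FNR {count[$1]++; next} count[$1] > 1 {print $2}' "$tmp" "$tmp" \
        | tr '\n' '\0' \
        | xargs -0 md5sum \
        | sort \
        | uniq -D -w 32
    rm -f "$tmp"

Skipping files with a unique size avoids reading most of the data, which is where the speedup comes from.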
On Fri, 2006-08-18 at 10:21 +0200, Istvan Gabor wrote:
I'd like to back up my files. I have a zillion directories with a zillion subdirs, and it's possible that some files are duplicated. I would like to find them before backing up. I found a program that can do this, but it's not free (zsDuplicateHunter). Do you know of one that is open source and available for SUSE?

Thanks,
IG
FDUPES: http://premium.caribe.net/~adrian2/fdupes.html
Duff: http://duff.sourceforge.net/
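If fdupes is installed, a recursive scan of a tree is typically as simple as this (the path is a placeholder; -r tells fdupes to descend into subdirectories):

    fdupes -r /path/to/dir

It prints groups of identical files separated by blank lines, which is easy to review before a backup.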
Istvan Gabor wrote:
I'd like to back up my files. I have a zillion directories with a zillion subdirs, and it's possible that some files are duplicated. I would like to find them before backing up. I found a program that can do this, but it's not free (zsDuplicateHunter). Do you know of one that is open source and available for SUSE?
Depending on what you want to do after you find them, you might be interested in faster-dupemerge, which locates duplicate files and turns them into hard links:

http://www.furryterror.org/~zblaxell/dupemerge/dupemerge.html

I haven't used it myself, but I know of people who have.

Cheers,
Dave
Dave Howorth wrote:
Istvan Gabor wrote:
I'd like to back up my files. I have a zillion directories with a zillion subdirs, and it's possible that some files are duplicated. I would like to find them before backing up. I found a program that can do this, but it's not free (zsDuplicateHunter). Do you know of one that is open source and available for SUSE?
Depending on what you want to do after you find them, you might be interested in faster-dupemerge, which locates duplicate files and turns them into hard links:
http://www.furryterror.org/~zblaxell/dupemerge/dupemerge.html
I haven't used it myself, but I know of people who have.

Don't forget that hard links only work within a single file system. If the dupes are on different partitions, a hard link won't work.
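As a quick illustration of that limitation (the paths are hypothetical), a hard link within one filesystem succeeds, while one across mount points fails with an "Invalid cross-device link" (EXDEV) error:

    ln /home/user/docs/report.pdf /home/user/backup/report.pdf   # same filesystem: OK
    ln /home/user/docs/report.pdf /mnt/usb/report.pdf            # different filesystem: fails

In that case faster-dupemerge, or any hard-link approach, can only deduplicate within each partition.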
participants (7)
- Dave Howorth
- Istvan Gabor
- James Knott
- Jan Engelhardt
- Lívio Cipriano
- Randall R Schulz
- rmyster