http://bugzilla.suse.com/show_bug.cgi?id=1051654

            Bug ID: 1051654
           Summary: duperemove doesn't handle small files: "Skipping small file"
    Classification: openSUSE
           Product: openSUSE Tumbleweed
           Version: Current
          Hardware: All
                OS: Other
            Status: NEW
          Severity: Enhancement
          Priority: P5 - None
         Component: Basesystem
          Assignee: bnc-team-screening@forge.provo.novell.com
          Reporter: Ralf.Friedl@online.de
        QA Contact: qa-bugs@suse.de
          Found By: ---
           Blocker: ---

duperemove doesn't process files smaller than the block size. The default block size is 128K, and the minimum is 4K. According to the man page, the block size is a trade-off against memory use, because a large file is split into more blocks when the block size is smaller:

    Use the specified block size. Raising the block size will consume less
    memory but may miss some duplicate blocks. Conversely, lowering the
    blocksize consumes more memory and may find more duplicate blocks. The
    default blocksize of 128K was chosen with these parameters in mind.

This doesn't imply that there is anything wrong with processing smaller files. Yet duperemove ignores them, whether they come from its own file scan or from an external source via the --fdupes option.

I removed this restriction with the patch below, and it seems to work fine. Nothing really bad can happen anyway, because duperemove only passes the files to the kernel, and the kernel verifies that the blocks are identical before deduplicating them. Deduplication of small files does work, and with many small files it can significantly reduce the disk space used.

On a related note, the fdupes program is really inefficient, so you may want to include jdupes instead of it or as an alternative. fdupes will stat the same file more than 20 times instead of reusing the result, read the same file more than once, and so on.

--- a/file_scan.c
+++ b/file_scan.c
@@ -205,12 +205,12 @@
 		vprintf("Skipping non-regular file %s\n", name);
 		goto out;
 	}
-
+#if 0
 	if (st->st_size < blocksize) {
 		vprintf("Skipping small file %s\n", name);
 		goto out;
 	}
-
+#endif
 	ret = access(name, R_OK);
 	if (ret) {
 		fprintf(stderr, "Error %d: %s while accessing file %s. "