Bug ID 1051654
Summary duperemove doesn't handle small files: "Skipping small file"
Classification openSUSE
Product openSUSE Tumbleweed
Version Current
Hardware All
OS Other
Status NEW
Severity Enhancement
Priority P5 - None
Component Basesystem
Assignee bnc-team-screening@forge.provo.novell.com
Reporter Ralf.Friedl@online.de
QA Contact qa-bugs@suse.de
Found By ---
Blocker ---

duperemove doesn't process files smaller than the blocksize. The default
blocksize is 128K and the minimum is 4K. According to the man page, the
blocksize is chosen with memory use in mind: the smaller the blocksize, the
more blocks a large file is split into, and the more memory duperemove needs.
> Use the specified block size. Raising the block size will consume less memory but may miss some duplicate blocks. Conversely, lowering the blocksize consumes more memory and may find more duplicate blocks. The default blocksize of 128K was chosen with these parameters in mind.
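To put the man page's memory argument in numbers, here is a small C sketch that counts how many blocks a file is split into for a given blocksize, rounding up so that a trailing partial block counts as one. `block_count` is a hypothetical helper, not a duperemove function, and it only counts blocks; it makes no claim about how many bytes duperemove keeps per block.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of duperemove): number of blocks a file of
 * `size` bytes occupies when split into `blocksize`-byte chunks.
 * The last chunk may be partial, so the division rounds up. */
static uint64_t block_count(uint64_t size, uint64_t blocksize)
{
    assert(blocksize > 0);
    return (size + blocksize - 1) / blocksize;
}
```

With this helper, a 1 GiB file (1073741824 bytes) yields 8192 blocks at the default 128K and 262144 blocks at 4K, while a 10K file is a single partial block at 128K. Large files drive the memory cost; small files add very little to it.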

This doesn't imply that there is anything wrong with processing smaller files.
Yet duperemove skips every file smaller than the blocksize, whether it comes
from duperemove's own file scan or from external fdupes output passed in with
the --fdupes option.

I removed this restriction (see the patch below), and it seems to work fine.
Nothing really bad can happen anyway, because duperemove only passes candidate
files to the kernel, and the kernel verifies that the blocks are actually
identical before sharing them. Beyond being safe, deduplication of small files
really works, and a large number of small duplicate files can add up to a
significant reduction in disk space used.
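The kernel-side check can be illustrated with the generic Linux `FIDEDUPERANGE` ioctl (declared in `<linux/fs.h>` since Linux 4.5). The sketch below, intended for two files of equal size, asks the kernel to share the whole source file with the destination; the kernel compares the data itself and reports `FILE_DEDUPE_RANGE_DIFFERS` instead of sharing anything if the contents differ. `dedupe_whole_file` is a hypothetical helper for illustration, not duperemove's actual code path.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>      /* FIDEDUPERANGE, struct file_dedupe_range */
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel to share all of `src` with `dst`.
 * Returns 0 if the kernel reported the data identical (FILE_DEDUPE_RANGE_SAME),
 * 1 if it reported that the data differs, or a negative errno on failure
 * (for example when the filesystem has no dedupe support). */
static int dedupe_whole_file(const char *src, const char *dst)
{
    struct file_dedupe_range *range = NULL;
    struct stat st;
    int sfd, dfd, ret;

    assert(src != NULL && dst != NULL);

    sfd = open(src, O_RDONLY);
    if (sfd < 0)
        return -errno;
    /* Older kernels require the destination to be open for writing. */
    dfd = open(dst, O_RDWR);
    if (dfd < 0) {
        ret = -errno;
        close(sfd);
        return ret;
    }
    if (fstat(sfd, &st) < 0) {
        ret = -errno;
        goto out;
    }

    /* Header plus one file_dedupe_range_info; calloc zeroes the reserved fields. */
    range = calloc(1, sizeof(*range) + sizeof(struct file_dedupe_range_info));
    if (range == NULL) {
        ret = -ENOMEM;
        goto out;
    }
    /* Assuming equal sizes, the range ends at EOF in both files, which is what
     * lets the kernel accept a length shorter than the fs block size. */
    range->src_offset = 0;
    range->src_length = (uint64_t)st.st_size;
    range->dest_count = 1;
    range->info[0].dest_fd = dfd;
    range->info[0].dest_offset = 0;

    if (ioctl(sfd, FIDEDUPERANGE, range) < 0)
        ret = -errno;                      /* the whole request failed */
    else if (range->info[0].status < 0)
        ret = range->info[0].status;       /* per-destination errno */
    else if (range->info[0].status == FILE_DEDUPE_RANGE_DIFFERS)
        ret = 1;                           /* kernel found differing data */
    else
        ret = 0;                           /* FILE_DEDUPE_RANGE_SAME */

out:
    free(range);
    close(dfd);
    close(sfd);
    return ret;
}
```

The ioctl may deduplicate less than requested and reports the amount in `bytes_deduped`, so a version meant for large files would loop on that field; for the small files discussed here a single call is normally enough.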

On a related note, the fdupes program is really inefficient; you may want to
include jdupes instead of it or as an alternative. fdupes stats the same file
more than 20 times instead of reusing the result, reads the same file more
than once, and so on.

--- a/file_scan.c
+++ b/file_scan.c
@@ -205,12 +205,12 @@
                vprintf("Skipping non-regular file %s\n", name);
                goto out;
        }
-
+#if 0
        if (st->st_size < blocksize) {
                vprintf("Skipping small file %s\n", name);
                goto out;
        }
-
+#endif
        ret = access(name, R_OK);
        if (ret) {
                fprintf(stderr, "Error %d: %s while accessing file %s. "