pkolaczk / fclones
Efficient Duplicate File Finder
License: MIT License
Here's my output as it's running currently:
[2021-01-25 11:58:30.332] fclones: info: Started
[2021-01-25 11:58:41.402] fclones: info: Scanned 40512 file entries
[2021-01-25 11:58:41.402] fclones: info: Found 38084 (10.7 TB) files matching selection criteria
[2021-01-25 11:58:41.408] fclones: info: Found 15854 (4.2 TB) candidates after grouping by size
[2021-01-25 11:58:41.414] fclones: info: Found 15694 (3.3 TB) candidates after grouping by paths and file identifiers
[2021-01-25 12:00:32.283] fclones: info: Found 2159 (3.3 TB) candidates after grouping by prefix
[2021-01-25 12:00:53.996] fclones: info: Found 2159 (3.3 TB) candidates after grouping by suffix
Grouping by contents [=> ] 139.20GB/5.97TB
The size reporting in the log messages seems accurate given the data I'm running this tool on, but what confuses me is the 5.97 TB total in the grouping progress bar. If we have 3.3 TB of candidates, I would expect to see matching numbers.
I suspect this has something to do with the fact that a lot of the existing data consists of large files which are hard-linked and exist in two places; depending on how the size count handles those files, that could be the source of the discrepancy. I'm not sure if this is just a clarity issue, or if it means there's actually room for speeding up the hashing process. I'm not an expert, obviously, but I assume that if a hard-linked file is hashed once, it would be unnecessary to hash any other files/paths that point to the same data.
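If hard links are indeed counted twice, one way to avoid rehashing them is to collapse paths that share the same (device, inode) pair before the content phase. A minimal sketch of that idea, using a hypothetical helper rather than fclones internals:

use std::collections::HashMap;
use std::fs;
use std::os::unix::fs::MetadataExt;
use std::path::PathBuf;

// Files sharing a (device, inode) pair point at the same data, so it
// is enough to hash one representative path per pair.
fn unique_inodes(paths: Vec<PathBuf>) -> Vec<PathBuf> {
    let mut seen: HashMap<(u64, u64), PathBuf> = HashMap::new();
    for p in paths {
        if let Ok(m) = fs::metadata(&p) {
            seen.entry((m.dev(), m.ino())).or_insert(p);
        }
    }
    seen.into_values().collect()
}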
Great work. I already had a play with the utility on macOS and it works great.
Just as feedback: I also tried to compile it on Manjaro on the RPi 4, and the compile failed in the hashing crate. I might ask the author of the fasthash-sys crate what would be required to allow it to compile on the ARM architecture.
fclones git:(master) cargo build --release
Compiling fasthash-sys v0.3.2
Compiling getrandom v0.1.14
Compiling num_cpus v1.13.0
Compiling atty v0.2.14
error: failed to run custom build command for `fasthash-sys v0.3.2`
Caused by:
process didn't exit successfully: `/home/stuart/rust_projects/fclones/target/release/build/fasthash-sys-3bfb9e86593b1584/build-script-build` (exit code: 101)
--- stdout
TARGET = Some("aarch64-unknown-linux-gnu")
OPT_LEVEL = Some("3")
TARGET = Some("aarch64-unknown-linux-gnu")
HOST = Some("aarch64-unknown-linux-gnu")
TARGET = Some("aarch64-unknown-linux-gnu")
TARGET = Some("aarch64-unknown-linux-gnu")
HOST = Some("aarch64-unknown-linux-gnu")
CC_aarch64-unknown-linux-gnu = None
CC_aarch64_unknown_linux_gnu = None
HOST_CC = None
CC = None
HOST = Some("aarch64-unknown-linux-gnu")
TARGET = Some("aarch64-unknown-linux-gnu")
HOST = Some("aarch64-unknown-linux-gnu")
CFLAGS_aarch64-unknown-linux-gnu = None
CFLAGS_aarch64_unknown_linux_gnu = None
HOST_CFLAGS = None
CFLAGS = None
DEBUG = Some("false")
running: "cc" "-O3" "-ffunction-sections" "-fdata-sections" "-fPIC" "-Wno-implicit-fallthrough" "-Wno-unknown-attributes" "-msse4.2" "-maes" "-mavx" "-mavx2" "-DT1HA0_RUNTIME_SELECT=1" "-DT1HA0_AESNI_AVAILABLE=1" "-Wall" "-Wextra" "-o" "/home/stuart/rust_projects/fclones/target/release/build/fasthash-sys-fc57bf495c3381b2/out/src/fasthash.o" "-c" "src/fasthash.cpp"
cargo:warning=cc: error: unrecognized command line option ‘-msse4.2’
cargo:warning=cc: error: unrecognized command line option ‘-maes’
cargo:warning=cc: error: unrecognized command line option ‘-mavx’
cargo:warning=cc: error: unrecognized command line option ‘-mavx2’
exit code: 1
--- stderr
thread 'main' panicked at '
Internal error occurred: Command "cc" "-O3" "-ffunction-sections" "-fdata-sections" "-fPIC" "-Wno-implicit-fallthrough" "-Wno-unknown-attributes" "-msse4.2" "-maes" "-mavx" "-mavx2" "-DT1HA0_RUNTIME_SELECT=1" "-DT1HA0_AESNI_AVAILABLE=1" "-Wall" "-Wextra" "-o" "/home/stuart/rust_projects/fclones/target/release/build/fasthash-sys-fc57bf495c3381b2/out/src/fasthash.o" "-c" "src/fasthash.cpp" with args "cc" did not execute successfully (status code exit code: 1).
', /home/stuart/.cargo/registry/src/github.com-1ecc6299db9ec823/gcc-0.3.55/src/lib.rs:1672:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
warning: build failed, waiting for other jobs to finish...
error: build failed
-- feel free to close this issue - I just wanted to give feedback.
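For reference, the x86-only SIMD flags come from the crate's build script. A build script can gate such flags on the target architecture; here is a minimal, hypothetical sketch (not the actual fasthash-sys build.rs, and it assumes the cc crate):

// build.rs sketch: only pass x86 SIMD flags when targeting x86,
// so aarch64 builds don't receive -msse4.2/-maes/-mavx/-mavx2.
fn main() {
    let target = std::env::var("TARGET").unwrap_or_default();
    let mut build = cc::Build::new();
    build.file("src/fasthash.cpp").opt_level(3);
    build.define("T1HA0_RUNTIME_SELECT", "1");
    if target.starts_with("x86_64") || target.starts_with("i686") {
        // Only x86 compilers understand these options.
        build.flag("-msse4.2").flag("-maes").flag("-mavx").flag("-mavx2");
        build.define("T1HA0_AESNI_AVAILABLE", "1");
    }
    build.compile("fasthash");
}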
Add an -o <file> option. This would allow for more flexibility when building pipelines, e.g. wrapping fclones with time.
... in case someone wanted to create a GUI or just use file deduplication in their own programs.
Is the implementation of finding duplicates based on size?
Can it find duplicates even if there's a size mismatch and report the file names?
Is it possible to use fclones to find duplicates between, but not within, two directory trees? Here's an example:
destination/
2021/
January/
A.jpg
source/
A1.jpg <-- copy of destination/2021/January/A.jpg (also same as A2.jpg)
A2.jpg <-- copy of destination/2021/January/A.jpg (also same as A1.jpg)
B1.jpg <-- same as B2.jpg
B2.jpg <-- same as B1.jpg
I want to identify A1.jpg and A2.jpg under source as duplicates of A.jpg in destination.
B1.jpg and B2.jpg are also duplicates, but only under source. They should be excluded from the match list because they don't match anything in destination.
FWIW, the use case is a source folder of images that have previously been processed by scripts to rename them and sort them into a destination directory structure (e.g. by year and month, or by other EXIF metadata). Then we come across a new folder of images, some of which may have been processed previously, and we want to know if we can safely delete them because we already have copies in the destination directory.
I tried to compile from source in a FreeBSD jail and got these errors; I tried again with the verbose flag to get more information.
Compiling fclones v0.17.0
error[E0063]: missing field `l_sysid` in initializer of `flock`
  --> /root/.cargo/registry/src/github.com-1ecc6299db9ec823/fclones-0.17.0/src/lock.rs:31:17
   |
31 |         let f = libc::flock {
   |                 ^^^^^^^^^^^ missing `l_sysid`
error[E0063]: missing field `l_sysid` in initializer of `flock`
  --> /root/.cargo/registry/src/github.com-1ecc6299db9ec823/fclones-0.17.0/src/lock.rs:47:17
   |
47 |         let f = libc::flock {
   |                 ^^^^^^^^^^^ missing `l_sysid`
error: aborting due to 2 previous errors
For more information about this error, try `rustc --explain E0063`.
error: failed to compile `fclones v0.17.0`, intermediate artifacts can be found at /tmp/cargo-installPdO1Nf
Caused by:
  could not compile `fclones`
See attached file for full message.
FreeBSD fclones Errors.docx
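The error comes from initializing libc::flock with a struct literal; FreeBSD's flock has an extra l_sysid field that Linux lacks. A portable sketch (not necessarily the actual fclones fix) is to zero-initialize the struct and set only the needed fields:

use std::mem;

// Zero-initialize `libc::flock` so OS-specific fields such as
// FreeBSD's `l_sysid` default to 0, then set only what is needed.
fn whole_file_write_lock() -> libc::flock {
    let mut f: libc::flock = unsafe { mem::zeroed() };
    f.l_type = libc::F_WRLCK as _;
    f.l_whence = libc::SEEK_SET as _;
    f.l_start = 0;
    f.l_len = 0; // 0 means "lock to end of file"
    f
}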
Example:
[2020-06-23 18:25:13.126] fclones: info: Found 963 (3.0 MB) duplicate files
Hello,
I'm unable to (cross-)compile a musl binary on aarch64 ("cross" as in from Ubuntu, still aarch64).
[I also tried compiling natively on Alpine aarch64, same result.]
$ cargo install --target aarch64-unknown-linux-musl fclones
[cut]
Compiling reflink v0.1.3
error[E0308]: mismatched types
--> .cargo/registry/src/github.com-1ecc6299db9ec823/reflink-0.1.3/src/sys/unix.rs:21:39
|
21 | libc::ioctl(dest.as_raw_fd(), IOCTL_FICLONE, src.as_raw_fd())
| ^^^^^^^^^^^^^ expected `i32`, found `u64`
|
help: you can convert a `u64` to an `i32` and panic if the converted value doesn't fit
|
21 | libc::ioctl(dest.as_raw_fd(), IOCTL_FICLONE.try_into().unwrap(), src.as_raw_fd())
| ++++++++++++++++++++
For more information about this error, try `rustc --explain E0308`.
error: could not compile `reflink` due to previous error
warning: build failed, waiting for other jobs to finish...
error: failed to compile `fclones v0.17.1`
Compiling with glibc works correctly (Rust 1.57.0).
regards,
m
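The root cause is that musl declares ioctl's request parameter as c_int while glibc uses c_ulong, so the FICLONE constant's type only matches one of them. A hedged sketch of a portable call (not the actual reflink fix) casts the constant to whatever width the target's libc expects:

use std::fs::File;
use std::os::unix::io::AsRawFd;

// FICLONE = _IOW(0x94, 9, int); `as _` lets the compiler pick the
// request type (c_int on musl, c_ulong on glibc).
const FICLONE: u64 = 0x40049409;

fn ficlone(dest: &File, src: &File) -> std::io::Result<()> {
    let ret = unsafe { libc::ioctl(dest.as_raw_fd(), FICLONE as _, src.as_raw_fd()) };
    if ret < 0 {
        return Err(std::io::Error::last_os_error());
    }
    Ok(())
}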
Sometimes I keep two copies of the content of a CF card full of video or photos.
It would be great to have detection of directories whose entire content is already present somewhere else.
Running fclones dedupe on some directory tree results in mtimes being updated for directories containing files that were deduplicated.
I don't know whether this should be addressed, because while mtimes may be desirable to preserve, the directories really were updated through file creation. This effect does make it less likely that I would want to use fclones dedupe on old directory trees with potentially informative mtimes, though.
Add a comparison with fdupes: performance, features...
Hello @pkolaczk, again thanks a lot for this GREAT tool. I love the idea of parallel processing and using the power of Rust in this tool! I have a couple of questions and maybe feature requests that I want to discuss with you.
First of all, I tested this tool on a low-power DS215j NAS device.
CPU: MARVELL Armada 375 88F6720 - Dual Core - 800 MHz (ARMv7)
RAM: 512 MB
HDD: 6TB WD NAS Drive
And these are my questions after testing:
This is useful when you have too many duplicates for the terminal window to handle. In my case I want to run the command in screen and then come back later to get the results.
I tried the command below, but it doesn't give me any progress/status from the tool:
sudo /usr/bin/time --verbose ./fclones ~ -R --format JSON |& tee -a /volume2/duplicatesdata.json
This could be helpful to see how much time each phase took, something like:
2020-06-20T21:57:06 [INFO] - fclones: info: Scanned 4687831 file entries
2020-06-21T05:57:06 [INFO] - fclones: fclones: info: Found 3857155 (5.4 TB) files matching selection criteria
2020-06-21T08:57:06 [INFO] - fclones: fclones: info: Found 3447623 (1.4 TB) candidates after grouping by size
In my case the device is slow and the HDD is big; sometimes in other tools I need to run duplicate analysis for 6 days.
When I was trying this tool, I lost power after leaving it running for 2 days. So this would help me run the tool again to continue the analysis.
From what I can see here, MetroHash is a great hashing algorithm, but it is optimized for a machine-specific (x64 SSE4.2) x86-64 architecture.
So adding another algorithm that is not machine-specific would be a great addition.
Running fclones on a relatively small directory, I noticed its performance is surprisingly bad:
$ time fclones group ~/Downloads/
[2021-06-06 18:57:22.658] fclones: info: Started grouping
[2021-06-06 18:57:23.091] fclones: info: Scanned 967 file entries
[2021-06-06 18:57:23.091] fclones: info: Found 873 (2.7 GB) files matching selection criteria
[2021-06-06 18:57:23.091] fclones: info: Found 47 (9.0 MB) candidates after grouping by size
[2021-06-06 18:57:23.092] fclones: info: Found 47 (9.0 MB) candidates after grouping by paths and file identifiers
[2021-06-06 18:57:23.097] fclones: info: Found 45 (8.5 MB) candidates after grouping by prefix
[2021-06-06 18:57:23.105] fclones: info: Found 45 (8.5 MB) candidates after grouping by suffix
[2021-06-06 18:57:23.125] fclones: info: Found 45 (8.5 MB) redundant files
<...>
real 0m0.481s
user 0m0.271s
sys 0m0.195s
So it takes about 0.5 s to process a directory with fewer than 1000 files. I noticed that most of the time is spent in the "Initializing" phase, so I ran strace:
$ strace -c fclones group ~/Downloads/
<...>
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ------------------
34.07 0.349066 5 61719 1563 openat
25.31 0.259249 4 60156 close
23.58 0.241530 4 53660 newfstatat
6.21 0.063639 6 10372 read
4.34 0.044439 6 6691 readlinkat
3.39 0.034740 7 4502 5 access
1.92 0.019707 221 89 17 futex
0.60 0.006150 5 1096 getdents64
<...>
So, it appears fclones makes 60k openat, 60k close, and 54k newfstatat calls. This is very surprising.
Inspecting the openat syscalls, it seems that most of them are traversing the /sys filesystem. Here is a fragment of the strace output (filtered by the openat syscall):
openat(AT_FDCWD, "/sys/devices/pci0000:00/0000:00:14.0/usb2/2-4/2-4:1.0/uevent", O_RDONLY|O_CLOEXEC) = 5
openat(AT_FDCWD, "/run/udev/data/+usb:2-4:1.0", O_RDONLY|O_CLOEXEC) = 5
openat(AT_FDCWD, "/", O_RDONLY|O_CLOEXEC|O_PATH|O_DIRECTORY) = 5
openat(5, "sys", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 6
openat(6, "bus", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 5
openat(5, "usb", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 6
openat(6, "devices", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 5
openat(5, "1-4.4:1.0", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 6
openat(5, "..", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 6
openat(6, "..", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 5
openat(5, "..", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 6
openat(6, "devices", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 5
openat(5, "pci0000:00", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 6
openat(6, "0000:00:14.0", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 5
openat(5, "usb1", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 6
openat(6, "1-4", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 5
openat(5, "1-4.4", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 6
openat(6, "1-4.4:1.0", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 5
I get the following messages when running fclones group yyy:
[2021-10-24 14:33:02.212] fclones: warn: Failed to fetch extents for file : Operation not supported (os error 95)
Maybe it's harmless; my problem is that I don't know what it means.
I am using version fclones 0.17.0.
The system is Xubuntu 20.04 (everything upgraded).
I ran fclones on about 2 TiB of data: 200,000 files of all sizes.
The filesystem is ZFS (no mirror or RAID, but encrypted).
The disk is a spinning 4 TiB disk.
The warning occurs on 10 of the 200,000 files.
Any ideas? Could the warning message be somewhat more verbose?
Can I just ignore it?
By the way - I have run a brief speed comparison on that data above - here are my results:
(Intel quad core - 16GB ram - disk cache fully loaded from previous operations)
fclones group xxx 34 min
rdfind xxx 81 min
jdupes -S -M -Q -r xxx 90 min
rmlint -T df xxx 134 min
Pretty impressive!!!
Hello Piotr,
Thanks for your work on fclones,
I had a plan to use it for deduplication of files on my NAS; however, I encountered a strange problem.
Please find the directory contents on my NAS:
'IMG_20210416_204824 (1).jpg'* 'IMG_20210416_204830 (2).jpg'* IMG_20210416_204845.jpg* 'IMG_20210416_223757 (1).jpg'* 'IMG_20210416_224002 (2).jpg'* IMG_20210416_224021.jpg* IMG_20210416_232550.jpg*
'IMG_20210416_204824 (2).jpg'* IMG_20210416_204830.jpg* 'IMG_20210416_204847 (1).jpg'* 'IMG_20210416_223757 (2).jpg'* IMG_20210416_224002.jpg* 'IMG_20210416_224022 (1).jpg'* Thumbs.db*
IMG_20210416_204824.jpg* 'IMG_20210416_204832 (1).jpg'* 'IMG_20210416_204847 (2).jpg'* IMG_20210416_223757.jpg* 'IMG_20210416_224015 (1).jpg'* 'IMG_20210416_224022 (2).jpg'* VID_20210416_224914.mp4*
'IMG_20210416_204827 (1).jpg'* 'IMG_20210416_204832 (2).jpg'* IMG_20210416_204847.jpg* 'IMG_20210416_223758 (1).jpg'* 'IMG_20210416_224015 (2).jpg'* IMG_20210416_224022.jpg* VID_20210416_232553.mp4*
'IMG_20210416_204827 (2).jpg'* IMG_20210416_204832.jpg* 'IMG_20210416_223755 (1).jpg'* 'IMG_20210416_223758 (2).jpg'* IMG_20210416_224015.jpg* 'IMG_20210416_224107 (1).jpg'*
IMG_20210416_204827.jpg* 'IMG_20210416_204845 (1).jpg'* 'IMG_20210416_223755 (2).jpg'* IMG_20210416_223758.jpg* 'IMG_20210416_224021 (1).jpg'* 'IMG_20210416_224107 (2).jpg'*
'IMG_20210416_204830 (1).jpg'* 'IMG_20210416_204845 (2).jpg'* IMG_20210416_223755.jpg* 'IMG_20210416_224002 (1).jpg'* 'IMG_20210416_224021 (2).jpg'* IMG_20210416_224107.jpg*
Please note that the files with (1) or (2) in their names are duplicates for sure; I confirmed this with the md5sum cmd, and they also simply have the same size.
Directory is mounted as type cifs (rw,relatime,vers=3.1.1,cache=strict,username=agnieszka,uid=1000,forceuid,gid=1000,forcegid,addr=10.0.0.10,file_mode=0755,dir_mode=0755,soft,nounix,serverino,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=60,actimeo=1)
fclones --version: 0.10.2
Linux version: 5.4.102-rt53-MANJARO
While I'm in this directory with duplicates, I use the following fclones cmd: fclones .
And I got this report:
[2021-04-17 20:51:17.333] fclones: info: Started
[2021-04-17 20:51:18.125] fclones: info: Scanned 1 file entries
[2021-04-17 20:51:18.125] fclones: info: Found 0 (0 B) files matching selection criteria
[2021-04-17 20:51:18.125] fclones: info: Found 0 (0 B) candidates after grouping by size
[2021-04-17 20:51:18.126] fclones: info: Found 0 (0 B) candidates after grouping by paths and file identifiers
[2021-04-17 20:51:18.126] fclones: info: Found 0 (0 B) candidates after grouping by prefix
[2021-04-17 20:51:18.126] fclones: info: Found 0 (0 B) candidates after grouping by suffix
[2021-04-17 20:51:18.126] fclones: info: Found 0 (0 B) duplicate files
And the same story (report) for fclones . --names '*.jpg'.
It looks like fclones does not see these files correctly. I thought this was because of their long names with whitespace (sorry, these are names generated by my phone). I renamed two duplicates to simpler names like a.jpg and b.jpg, but I got the same results: no duplicates found.
Interestingly, I traced fclones with strace, and there is not a single strace log entry showing fclones reading any files in this location.
Finally, I copied all these files to a local directory on my disk and... same results: no duplicates found.
Please let me know if you need any additional data to diagnose this problem.
Thanks in advance
Jody Bruchon found that the default performance on a single spinning drive was bad.
This doesn't surprise me, because all the settings like parallelism level, buffer sizes, etc. are tuned for SSDs, and they are really bad for spinning drives.
I'm running this in order to test that my --priority and --keep-path values are right:
watch 'fclones remove --dry-run < results-file.out'
Unfortunately, it's not working because the results are not ordered. So the command is rerun, and the list items dance around every time watch updates.
I thought maybe concurrency was affecting it, so I tried --threads main:1, but that only seems to apply to the group action and not the others.
My workaround right now is to run the results through | sort, but it's not ideal because I need to look for each result in the alphabetical list.
So my feature request is to add some ordering to the result list, preferably in the same order as the input list. Even if the ordering is not exposed in the CLI options in any way, it would be a help.
Currently logging and progress bar implementations are tightly coupled to the fclones engine. These should be replaced by traits + adapters so they can be swapped to different implementations e.g. in a GUI app.
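One possible shape for that decoupling: a small trait the engine calls into, with adapters for the console and any GUI. A minimal sketch with hypothetical names, not the actual fclones API:

// The engine reports progress through a trait object instead of a
// concrete progress-bar type; a GUI supplies its own implementation.
pub trait ProgressTracker: Send + Sync {
    fn start_phase(&self, name: &str, total_bytes: u64);
    fn advance(&self, bytes: u64);
    fn finish_phase(&self);
}

// A no-op adapter, useful for tests or for library consumers that
// do not want any progress output.
pub struct NoProgress;

impl ProgressTracker for NoProgress {
    fn start_phase(&self, _: &str, _: u64) {}
    fn advance(&self, _: u64) {}
    fn finish_phase(&self) {}
}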
You should commit the Cargo.lock file after building the project, since it's a Rust binary. I suggest removing the Cargo.lock line from .gitignore#L3.
For more information, please see: https://doc.rust-lang.org/cargo/faq.html#why-do-binaries-have-cargolock-in-version-control-but-not-libraries
Some programs like exiv2 modify the file in place instead of creating a new file and leaving the original unmodified.
In such cases, using --transform is not possible without additional scripting to make a copy before modification.
A new option --transform-copy would first make a copy of each file into a temporary directory, and then invoke the external program on that copy.
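A rough sketch of what --transform-copy could do per file (hypothetical helper, not an fclones API):

use std::env;
use std::fs;
use std::path::{Path, PathBuf};
use std::process::Command;

// Copy `path` into a temp dir and run the transform program on the
// copy, leaving the original untouched.
fn transform_copy(path: &Path, program: &str) -> std::io::Result<PathBuf> {
    let tmp = env::temp_dir().join(path.file_name().unwrap());
    fs::copy(path, &tmp)?;
    Command::new(program).arg(&tmp).status()?;
    Ok(tmp) // hash this copy instead of the original
}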
Running fclones * in Windows 10 results in an error (sorry for the error message in Polish ;-)
[2020-07-19 16:53:35.267] fclones.exe: error: Failed to stat C:\*: Nazwa pliku, nazwa katalogu lub składnia etykiety woluminu jest niepoprawna. [The filename, directory name, or volume label syntax is incorrect.] (os error 123)
[2020-07-19 16:53:35.269] fclones.exe: info: Scanned 0 file entries
[2020-07-19 16:53:35.270] fclones.exe: info: Found 0 (0 B) files matching selection criteria
[2020-07-19 16:53:35.272] fclones.exe: info: Found 0 (0 B) candidates after grouping by size
[2020-07-19 16:53:35.274] fclones.exe: info: Found 0 (0 B) candidates after pruning hard-links
[2020-07-19 16:53:35.277] fclones.exe: info: Found 0 (0 B) candidates after grouping by prefix
[2020-07-19 16:53:35.278] fclones.exe: info: Found 0 (0 B) candidates after grouping by suffix
[2020-07-19 16:53:35.280] fclones.exe: info: Found 0 (0 B) duplicate files
Running fclones . -R works properly, and running under WSL also works properly.
Problem: when I run the command "fclones group . | fclones remove" on Windows 10 (Build 19042.985), I get the error "Failed to read file list: Malformed group header: F:\Photos\Sorted Photos\2005\07._DSC00284.jpg."
I had run "fclones group ." and it worked flawlessly, so I naturally wanted to remove the duplicate files. After adding the "| fclones remove", it gave me this error. Is it something to do with the file name starting with a '.'?
Persist hashes to a file in order to speedup subsequent runs or to avoid recomputing hashes when the previous run was interrupted.
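A sketch of what such persistence might key on, assuming entries are invalidated by size or mtime changes (hypothetical layout, not an fclones format):

use std::time::SystemTime;

// One cache entry per path; a stored hash is reused only if the
// file's length and modification time still match.
struct HashCacheEntry {
    len: u64,
    modified: SystemTime,
    hash: u128,
}

fn is_valid(entry: &HashCacheEntry, meta: &std::fs::Metadata) -> bool {
    meta.len() == entry.len
        && meta.modified().map(|m| m == entry.modified).unwrap_or(false)
}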
fclones fails to build on my Arch Linux f2fs partition.
failures:
---- dedupe::test::test_partition_respects_creation_time_priority stdout ----
[2021-07-29 18:44:45.674] fclones-6cdeb7b3f6a11fd5: warn: Failed to read creation time of file /home/et/yay/fclones/src/fclones-0.12.3/target/test/dedupe/partition/ctime_priority/file_3: creation time is not available for the filesystem
[2021-07-29 18:44:45.674] fclones-6cdeb7b3f6a11fd5: warn: Failed to read creation time of file /home/et/yay/fclones/src/fclones-0.12.3/target/test/dedupe/partition/ctime_priority/file_2: creation time is not available for the filesystem
[2021-07-29 18:44:45.674] fclones-6cdeb7b3f6a11fd5: warn: Failed to read creation time of file /home/et/yay/fclones/src/fclones-0.12.3/target/test/dedupe/partition/ctime_priority/file_2: creation time is not available for the filesystem
[2021-07-29 18:44:45.674] fclones-6cdeb7b3f6a11fd5: warn: Failed to read creation time of file /home/et/yay/fclones/src/fclones-0.12.3/target/test/dedupe/partition/ctime_priority/file_1: creation time is not available for the filesystem
thread 'dedupe::test::test_partition_respects_creation_time_priority' panicked at 'called `Result::unwrap()` on an `Err` value: Error { message: "Could not determine files to drop in group with hash 00000000000000000000000000000000 and len 0: Metadata of some files could not be read." }', src/dedupe.rs:856:80
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
---- dedupe::test::test_partition_respects_drop_patterns stdout ----
[2021-07-29 18:44:45.675] fclones-6cdeb7b3f6a11fd5: warn: Failed to read creation time of file /home/et/yay/fclones/src/fclones-0.12.3/target/test/dedupe/partition/drop/file_3: creation time is not available for the filesystem
[2021-07-29 18:44:45.675] fclones-6cdeb7b3f6a11fd5: warn: Failed to read creation time of file /home/et/yay/fclones/src/fclones-0.12.3/target/test/dedupe/partition/drop/file_2: creation time is not available for the filesystem
[2021-07-29 18:44:45.675] fclones-6cdeb7b3f6a11fd5: warn: Failed to read creation time of file /home/et/yay/fclones/src/fclones-0.12.3/target/test/dedupe/partition/drop/file_2: creation time is not available for the filesystem
[2021-07-29 18:44:45.675] fclones-6cdeb7b3f6a11fd5: warn: Failed to read creation time of file /home/et/yay/fclones/src/fclones-0.12.3/target/test/dedupe/partition/drop/file_1: creation time is not available for the filesystem
thread 'dedupe::test::test_partition_respects_drop_patterns' panicked at 'called `Result::unwrap()` on an `Err` value: Error { message: "Could not determine files to drop in group with hash 00000000000000000000000000000000 and len 0: Metadata of some files could not be read." }', src/dedupe.rs:923:68
---- dedupe::test::test_partition_respects_keep_patterns stdout ----
[2021-07-29 18:44:45.675] fclones-6cdeb7b3f6a11fd5: warn: Failed to read creation time of file /home/et/yay/fclones/src/fclones-0.12.3/target/test/dedupe/partition/keep/file_3: creation time is not available for the filesystem
[2021-07-29 18:44:45.675] fclones-6cdeb7b3f6a11fd5: warn: Failed to read creation time of file /home/et/yay/fclones/src/fclones-0.12.3/target/test/dedupe/partition/keep/file_2: creation time is not available for the filesystem
[2021-07-29 18:44:45.675] fclones-6cdeb7b3f6a11fd5: warn: Failed to read creation time of file /home/et/yay/fclones/src/fclones-0.12.3/target/test/dedupe/partition/keep/file_2: creation time is not available for the filesystem
[2021-07-29 18:44:45.675] fclones-6cdeb7b3f6a11fd5: warn: Failed to read creation time of file /home/et/yay/fclones/src/fclones-0.12.3/target/test/dedupe/partition/keep/file_1: creation time is not available for the filesystem
thread 'dedupe::test::test_partition_respects_keep_patterns' panicked at 'called `Result::unwrap()` on an `Err` value: Error { message: "Could not determine files to drop in group with hash 00000000000000000000000000000000 and len 0: Metadata of some files could not be read." }', src/dedupe.rs:904:68
---- dedupe::test::test_run_dedupe_script stdout ----
[2021-07-29 18:44:45.676] fclones-6cdeb7b3f6a11fd5: warn: Failed to read creation time of file /home/et/yay/fclones/src/fclones-0.12.3/target/test/dedupe/partition/dedupe_script/file_3: creation time is not available for the filesystem
[2021-07-29 18:44:45.676] fclones-6cdeb7b3f6a11fd5: warn: Failed to read creation time of file /home/et/yay/fclones/src/fclones-0.12.3/target/test/dedupe/partition/dedupe_script/file_2: creation time is not available for the filesystem
[2021-07-29 18:44:45.676] fclones-6cdeb7b3f6a11fd5: warn: Failed to read creation time of file /home/et/yay/fclones/src/fclones-0.12.3/target/test/dedupe/partition/dedupe_script/file_2: creation time is not available for the filesystem
[2021-07-29 18:44:45.676] fclones-6cdeb7b3f6a11fd5: warn: Failed to read creation time of file /home/et/yay/fclones/src/fclones-0.12.3/target/test/dedupe/partition/dedupe_script/file_1: creation time is not available for the filesystem
[2021-07-29 18:44:45.676] fclones-6cdeb7b3f6a11fd5: warn: Could not determine files to drop in group with hash 00000000000000000000000000000000 and len 0: Metadata of some files could not be read.
thread 'dedupe::test::test_run_dedupe_script' panicked at 'assertion failed: `(left == right)`
left: `0`,
right: `2`', src/dedupe.rs:944:13
failures:
dedupe::test::test_partition_respects_creation_time_priority
dedupe::test::test_partition_respects_drop_patterns
dedupe::test::test_partition_respects_keep_patterns
dedupe::test::test_run_dedupe_script
test result: FAILED. 94 passed; 4 failed; 0 ignored; 0 measured; 0 filtered out; finished in 2.49s
error: test failed, to rerun pass '--lib'
Add an option to choose a different hash function.
E.g.:
fclones --hash highway -R ~
Greetings!
I've been playing a bit with fclones this morning (super cool tool, BTW) and wanted to use the --stdin parameter to read the list of files to analyze from the output of find. Based on the documentation it seems like passing the input to fclones group --stdin should work, but whenever I try this I always get an error: fclones: error: No input files
Here's a simple, trivial example of what I mean:
localhost~ % fclones --version
fclones 0.12.2
localhost~ % mkdir blah
localhost~ % cd blah
localhost~/blah % touch {1,2,3}.c
localhost~/blah % find . -name '*.c'
./1.c
./2.c
./3.c
localhost~/blah % find . -name '*.c' | fclones group --stdin
[2021-06-19 15:01:12.126] fclones: error: No input files
I'm not sure if I'm doing something incorrectly - any ideas? Thanks in advance!
Hey!
I'm maintaining the fclones and fclones-bin packages on the AUR, and I saw the 0.9.0 release doesn't have binary artifacts. Unfortunately I couldn't bump the -bin package because of that.
Are you considering bringing back binary artifacts (e.g. fclones-$pkgver.tgz) with the upcoming releases?
The output of --dry-run for link and dedupe moves the original files out of the way via mv (just like fclones itself), but then completely ignores possible failure of the next command and removes the backup in the following step.
If the calling shell does not have errexit set, this can lead to data loss (actually just filename loss) if the ln/cp fails.
I would suggest just printing the action for each line instead, and optionally emitting a shell header which is called and includes proper error handling.
fclones looks great, but installing a whole Rust build stack to test it out is a bit of a barrier. At some point it would be great if macOS-compatible builds were generated and provided as part of the regular release process.
Found when testing #64
I'm running into an issue: scanning 20 GB over the network, I was wondering whether, when the last stage (grouping by contents) runs, it would be possible to get duplicates written out as they are found. For the sake of better UX, maybe only stream them when there is an argument to write the output to a file. This way, if I or something else interrupts the content hashing, I can still get a partial result.
When running fclones group -I, it seems to be finding duplicate files underneath the same root (path argument on the command line). For example, if I construct a tree like:
echo hi > source.txt
mkdir -p {a,b}/{1,2}
for i in {a,b}/{1,2}/test; do cp source.txt "${i}"; done
and then run fclones group -I a b, I get:
[2021-11-17 16:08:01.482] fclones: info: Started grouping
[2021-11-17 16:08:02.019] fclones: info: Scanned 10 file entries
[2021-11-17 16:08:02.019] fclones: info: Found 4 (12 B) files matching selection criteria
[2021-11-17 16:08:02.019] fclones: info: Found 3 (9 B) candidates after grouping by size
[2021-11-17 16:08:02.019] fclones: info: Found 3 (9 B) candidates after grouping by paths and file identifiers
[2021-11-17 16:08:02.033] fclones: info: Found 3 (9 B) candidates after grouping by prefix
[2021-11-17 16:08:02.033] fclones: info: Found 3 (9 B) candidates after grouping by suffix
[2021-11-17 16:08:02.034] fclones: info: Found 3 (9 B) redundant files
# Report by fclones 0.17.1
# Timestamp: 2021-11-17 16:08:02.036 -0500
# Command: fclones group -I a b
# Found 1 file groups
# 9 B (9 B) in 3 redundant files can be removed
e872d4a1bdc12e1262820a95eebb530a, 3 B (3 B) * 4:
/tmp/tree/a/1/test
/tmp/tree/a/2/test
/tmp/tree/b/1/test
/tmp/tree/b/2/test
fclones should offer a way of deleting / hard-linking / soft-linking duplicated files automatically.
In #25:
@pkolaczk wrote:
That's right, fclones doesn't offer any way of deleting files automatically yet. I believe this is a task for a different program (or a subcommand) that would take output of fclones.
and @piranna replied:
From a UNIX perspective, yes, it makes sense for that task to be done by another command, but it would be so closely tied to the fclones output format... :-/ Maybe a shell script wrapper that offers an interface compatible with fdupes? :-) That would be easy to implement, but I'm not sure if it should be hosted here in the fclones repo or be totally independent...
IMHO, a postprocessing script parsing the fclones output might require more complexity than adding a CLI switch. For instance, here's an (untested) Python implementation that leverages the CSV output (expected in fclones_out.csv) to replace duplicates with hard links:
#!/usr/bin/env python
import csv
import logging
from os import link, unlink
from os.path import isfile

def main() -> None:
    with open("fclones_out.csv", newline="") as f_handler:
        # Parse with the csv module so paths containing commas survive,
        # and skip the "size,..." header row.
        for row in csv.reader(f_handler):
            if not row or row[0] == "size":
                continue
            duplicates = row[3:]
            src = duplicates[0]
            for dst in duplicates[1:]:
                logging.debug("%s -> %s", src, dst)
                if isfile(dst):
                    unlink(dst)  # remove the duplicate...
                link(src, dst)  # ...and replace it with a hard link

if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    main()
PS: I think this deserves a ticket on its own, feel free to delete it if you don't agree. :-)
Hello, I am unable to compile fclones due to an error. This happens with both the AUR package and manually running cargo build --release.
OS: Manjaro Linux x86_64
Rust/Cargo version 1.49.0
error[E0004]: non-exhaustive patterns: `Removable` not covered
--> src/device.rs:22:24
|
22 | 0 => match disk_type {
| ^^^^^^^^^ pattern `Removable` not covered
|
::: /home/timothy/.cargo/registry/src/github.com-1ecc6299db9ec823/sysinfo-0.15.4/src/common.rs:257:5
|
257 | Removable,
| --------- not covered
|
= help: ensure that all possible cases are being handled, possibly by adding wildcards or more match arms
= note: the matched value is of type `DiskType`
error[E0004]: non-exhaustive patterns: `Removable` not covered
--> src/device.rs:41:15
|
41 | match self.disk_type {
| ^^^^^^^^^^^^^^ pattern `Removable` not covered
|
::: /home/timothy/.cargo/registry/src/github.com-1ecc6299db9ec823/sysinfo-0.15.4/src/common.rs:257:5
|
257 | Removable,
| --------- not covered
|
= help: ensure that all possible cases are being handled, possibly by adding wildcards or more match arms
= note: the matched value is of type `DiskType`
error[E0004]: non-exhaustive patterns: `Removable` not covered
--> src/device.rs:49:23
|
49 | FileLen(match self.disk_type {
| ^^^^^^^^^^^^^^ pattern `Removable` not covered
|
::: /home/timothy/.cargo/registry/src/github.com-1ecc6299db9ec823/sysinfo-0.15.4/src/common.rs:257:5
|
257 | Removable,
| --------- not covered
|
= help: ensure that all possible cases are being handled, possibly by adding wildcards or more match arms
= note: the matched value is of type `DiskType`
error[E0004]: non-exhaustive patterns: `Removable` not covered
--> src/device.rs:57:23
|
57 | FileLen(match self.disk_type {
| ^^^^^^^^^^^^^^ pattern `Removable` not covered
|
::: /home/timothy/.cargo/registry/src/github.com-1ecc6299db9ec823/sysinfo-0.15.4/src/common.rs:257:5
|
257 | Removable,
| --------- not covered
|
= help: ensure that all possible cases are being handled, possibly by adding wildcards or more match arms
= note: the matched value is of type `DiskType`
error[E0004]: non-exhaustive patterns: `Removable` not covered
--> src/device.rs:69:23
|
69 | FileLen(match self.disk_type {
| ^^^^^^^^^^^^^^ pattern `Removable` not covered
|
::: /home/timothy/.cargo/registry/src/github.com-1ecc6299db9ec823/sysinfo-0.15.4/src/common.rs:257:5
|
257 | Removable,
| --------- not covered
|
= help: ensure that all possible cases are being handled, possibly by adding wildcards or more match arms
= note: the matched value is of type `DiskType`
error[E0004]: non-exhaustive patterns: `Removable` not covered
--> src/device.rs:100:31
|
100 | let p = match disk_type {
| ^^^^^^^^^ pattern `Removable` not covered
|
::: /home/timothy/.cargo/registry/src/github.com-1ecc6299db9ec823/sysinfo-0.15.4/src/common.rs:257:5
|
257 | Removable,
| --------- not covered
|
= help: ensure that all possible cases are being handled, possibly by adding wildcards or more match arms
= note: the matched value is of type `DiskType`
error: aborting due to 6 previous errors
For more information about this error, try `rustc --explain E0004`.
error: could not compile `fclones`
To learn more, run the command again with --verbose.
warning: build failed, waiting for other jobs to finish...
error: build failed
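The errors come from sysinfo 0.15.4 adding a Removable variant to DiskType. One way to make the matches exhaustive again is a wildcard arm; treating unknown disk kinds conservatively like spinning disks is an assumption here, not the actual fclones fix:

// Sketch: a catch-all arm so new DiskType variants (Removable,
// Unknown(..), future ones) don't break the build.
fn read_parallelism(disk_type: sysinfo::DiskType) -> usize {
    match disk_type {
        sysinfo::DiskType::SSD => 0, // 0 = no limit on parallel reads
        _ => 1,                      // HDD, Removable, Unknown: sequential
    }
}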
I just created an AUR package for Archlinux. You can find it here: https://aur.archlinux.org/packages/fclones-git/
It might be worth adding its name and a link to it in the README file.
Hi,
Your tool looks very promising, so I wanted to give it a go on a Mac. Unfortunately I get build errors, mainly about an unresolved PosixFadviseAdvice.
Short trace:
error[E0433]: failed to resolve: use of undeclared type or module `PosixFadviseAdvice`
I have zero Rust skills, but if you have some advice I would gladly help you out.
Btrfs supports in-place dedup (https://btrfs.wiki.kernel.org/index.php/Deduplication) via a syscall. This is completely safe, as the kernel checks that the files are identical before deduping.
This would be very Linux-specific, low-level code.
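The syscall in question is the FIDEDUPERANGE ioctl from the kernel uapi. A minimal, untested sketch (glibc-flavored ioctl signature, single destination range) of how it could be invoked:

use std::os::unix::io::RawFd;

// Mirrors struct file_dedupe_range{,_info} from linux/fs.h.
#[repr(C)]
struct FileDedupeRangeInfo {
    dest_fd: i64,
    dest_offset: u64,
    bytes_deduped: u64, // filled in by the kernel
    status: i32,        // 0 on success; set if the contents differ
    reserved: u32,
}

#[repr(C)]
struct FileDedupeRange {
    src_offset: u64,
    src_length: u64,
    dest_count: u16, // number of info entries that follow
    reserved1: u16,
    reserved2: u32,
    info: [FileDedupeRangeInfo; 1],
}

const FIDEDUPERANGE: libc::c_ulong = 0xC0189436; // _IOWR(0x94, 54, file_dedupe_range)

// Ask the kernel to share extents between src and dest; the kernel
// verifies both ranges hold identical bytes before linking them.
fn dedupe_whole(src: RawFd, dest: RawFd, len: u64) -> std::io::Result<u64> {
    let mut arg = FileDedupeRange {
        src_offset: 0,
        src_length: len,
        dest_count: 1,
        reserved1: 0,
        reserved2: 0,
        info: [FileDedupeRangeInfo {
            dest_fd: dest as i64,
            dest_offset: 0,
            bytes_deduped: 0,
            status: 0,
            reserved: 0,
        }],
    };
    let ret = unsafe { libc::ioctl(src, FIDEDUPERANGE, &mut arg as *mut FileDedupeRange) };
    if ret < 0 {
        return Err(std::io::Error::last_os_error());
    }
    Ok(arg.info[0].bytes_deduped)
}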
This dependency makes it harder to install fclones on some platforms. Let's switch to the regex crate.
It looks like stat does not report creation time from ZFS properly, listing no "Birth" value. I assume whatever fclones is using does something similar and doesn't get a creation time reported. I'm still digging around for details; "Does ZFS store Birth Time or Creation Time?" is what I've uncovered so far.
failures:
---- dedupe::test::test_partition_respects_keep_patterns stdout ----
[2021-06-05 20:02:23.458] fclones-fe24705dd771f261: warn: Failed to read creation time of file /home/fryfrog/.cache/paru/clone/fclones/src/fclones-0.12.0/target/test/dedupe/partition/keep/file_3: creation time is not available for the filesystem
[2021-06-05 20:02:23.459] fclones-fe24705dd771f261: warn: Failed to read creation time of file /home/fryfrog/.cache/paru/clone/fclones/src/fclones-0.12.0/target/test/dedupe/partition/keep/file_2: creation time is not available for the filesystem
[2021-06-05 20:02:23.459] fclones-fe24705dd771f261: warn: Failed to read creation time of file /home/fryfrog/.cache/paru/clone/fclones/src/fclones-0.12.0/target/test/dedupe/partition/keep/file_2: creation time is not available for the filesystem
[2021-06-05 20:02:23.459] fclones-fe24705dd771f261: warn: Failed to read creation time of file /home/fryfrog/.cache/paru/clone/fclones/src/fclones-0.12.0/target/test/dedupe/partition/keep/file_1: creation time is not available for the filesystem
thread 'dedupe::test::test_partition_respects_keep_patterns' panicked at 'called `Result::unwrap()` on an `Err` value: Error { message: "Could not determine files to drop in group with hash 00000000000000000000000000000000 and len 0: Metadata of some files could not be read." }', src/dedupe.rs:904:68
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
---- dedupe::test::test_partition_respects_drop_patterns stdout ----
[2021-06-05 20:02:23.458] fclones-fe24705dd771f261: warn: Failed to read creation time of file /home/fryfrog/.cache/paru/clone/fclones/src/fclones-0.12.0/target/test/dedupe/partition/drop/file_3: creation time is not available for the filesystem
[2021-06-05 20:02:23.459] fclones-fe24705dd771f261: warn: Failed to read creation time of file /home/fryfrog/.cache/paru/clone/fclones/src/fclones-0.12.0/target/test/dedupe/partition/drop/file_2: creation time is not available for the filesystem
[2021-06-05 20:02:23.459] fclones-fe24705dd771f261: warn: Failed to read creation time of file /home/fryfrog/.cache/paru/clone/fclones/src/fclones-0.12.0/target/test/dedupe/partition/drop/file_2: creation time is not available for the filesystem
[2021-06-05 20:02:23.459] fclones-fe24705dd771f261: warn: Failed to read creation time of file /home/fryfrog/.cache/paru/clone/fclones/src/fclones-0.12.0/target/test/dedupe/partition/drop/file_1: creation time is not available for the filesystem
thread 'dedupe::test::test_partition_respects_drop_patterns' panicked at 'called `Result::unwrap()` on an `Err` value: Error { message: "Could not determine files to drop in group with hash 00000000000000000000000000000000 and len 0: Metadata of some files could not be read." }', src/dedupe.rs:923:68
---- dedupe::test::test_partition_respects_creation_time_priority stdout ----
[2021-06-05 20:02:23.458] fclones-fe24705dd771f261: warn: Failed to read creation time of file /home/fryfrog/.cache/paru/clone/fclones/src/fclones-0.12.0/target/test/dedupe/partition/ctime_priority/file_3: creation time is not available for the filesystem
[2021-06-05 20:02:23.459] fclones-fe24705dd771f261: warn: Failed to read creation time of file /home/fryfrog/.cache/paru/clone/fclones/src/fclones-0.12.0/target/test/dedupe/partition/ctime_priority/file_2: creation time is not available for the filesystem
[2021-06-05 20:02:23.459] fclones-fe24705dd771f261: warn: Failed to read creation time of file /home/fryfrog/.cache/paru/clone/fclones/src/fclones-0.12.0/target/test/dedupe/partition/ctime_priority/file_2: creation time is not available for the filesystem
[2021-06-05 20:02:23.459] fclones-fe24705dd771f261: warn: Failed to read creation time of file /home/fryfrog/.cache/paru/clone/fclones/src/fclones-0.12.0/target/test/dedupe/partition/ctime_priority/file_1: creation time is not available for the filesystem
thread 'dedupe::test::test_partition_respects_creation_time_priority' panicked at 'called `Result::unwrap()` on an `Err` value: Error { message: "Could not determine files to drop in group with hash 00000000000000000000000000000000 and len 0: Metadata of some files could not be read." }', src/dedupe.rs:856:80
---- dedupe::test::test_run_dedupe_script stdout ----
[2021-06-05 20:02:23.466] fclones-fe24705dd771f261: warn: Failed to read creation time of file /home/fryfrog/.cache/paru/clone/fclones/src/fclones-0.12.0/target/test/dedupe/partition/dedupe_script/file_3: creation time is not available for the filesystem
[2021-06-05 20:02:23.466] fclones-fe24705dd771f261: warn: Failed to read creation time of file /home/fryfrog/.cache/paru/clone/fclones/src/fclones-0.12.0/target/test/dedupe/partition/dedupe_script/file_2: creation time is not available for the filesystem
[2021-06-05 20:02:23.466] fclones-fe24705dd771f261: warn: Failed to read creation time of file /home/fryfrog/.cache/paru/clone/fclones/src/fclones-0.12.0/target/test/dedupe/partition/dedupe_script/file_2: creation time is not available for the filesystem
[2021-06-05 20:02:23.466] fclones-fe24705dd771f261: warn: Failed to read creation time of file /home/fryfrog/.cache/paru/clone/fclones/src/fclones-0.12.0/target/test/dedupe/partition/dedupe_script/file_1: creation time is not available for the filesystem
[2021-06-05 20:02:23.466] fclones-fe24705dd771f261: warn: Could not determine files to drop in group with hash 00000000000000000000000000000000 and len 0: Metadata of some files could not be read.
thread 'dedupe::test::test_run_dedupe_script' panicked at 'assertion failed: `(left == right)`
left: `0`,
right: `2`', src/dedupe.rs:944:13
failures:
dedupe::test::test_partition_respects_creation_time_priority
dedupe::test::test_partition_respects_drop_patterns
dedupe::test::test_partition_respects_keep_patterns
dedupe::test::test_run_dedupe_script
test result: FAILED. 92 passed; 4 failed; 0 ignored; 0 measured; 0 filtered out; finished in 65.13s
0 ✓ fryfrog@apollo ~ $ ls -alh /home/fryfrog/.cache/paru/clone/fclones/src/fclones-0.12.0/target/test/dedupe/partition/drop/file_1
-rw-r--r-- 1 fryfrog fryfrog 0 Jun 5 20:02 /home/fryfrog/.cache/paru/clone/fclones/src/fclones-0.12.0/target/test/dedupe/partition/drop/file_1
0 ✓ fryfrog@apollo ~ $ stat /home/fryfrog/.cache/paru/clone/fclones/src/fclones-0.12.0/target/test/dedupe/partition/drop/file_1
File: /home/fryfrog/.cache/paru/clone/fclones/src/fclones-0.12.0/target/test/dedupe/partition/drop/file_1
Size: 0 Blocks: 1 IO Block: 131072 regular empty file
Device: 19h/25d Inode: 2938048 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 1000/ fryfrog) Gid: ( 1000/ fryfrog)
Access: 2021-06-05 20:02:23.449795401 -0700
Modify: 2021-06-05 20:02:23.449795401 -0700
Change: 2021-06-05 20:02:23.449795401 -0700
Birth: -
0 ✓ fryfrog@apollo ~ $ sudo zdb -O rpool/ROOT/arch home/fryfrog/.cache/paru/clone/fclones/src/fclones-0.12.0/target/test/dedupe/partition/drop/file_1
Object lvl iblk dblk dsize dnsize lsize %full type
2938048 1 128K 512 0 512 512 0.00 ZFS plain file
0 ✓ fryfrog@apollo ~ $ sudo zdb -ddddd rpool/ROOT/arch 2938048
Dataset rpool/ROOT/arch [ZPL], ID 394, cr_txg 20, 81.7G, 1547348 objects, rootbp DVA[0]=<0:2287a77000:1000> DVA[1]=<0:2875361000:1000> [L0 DMU objset] fletcher4 uncompressed unencrypted LE contiguous unique double size=1000L/1000P birth=140448231L/140448231P fill=1547348 cksum=11e787a4a1:3022d1624a43:45bff9d1ab72ff:480dc35cda0302c4
Object lvl iblk dblk dsize dnsize lsize %full type
2938048 1 128K 512 0 512 512 0.00 ZFS plain file
176 bonus System attributes
dnode flags: USERUSED_ACCOUNTED USEROBJUSED_ACCOUNTED
dnode maxblkid: 0
path /home/fryfrog/.cache/paru/clone/fclones/src/fclones-0.12.0/target/test/dedupe/partition/drop/file_1
uid 1000
gid 1000
atime Sat Jun 5 20:02:23 2021
mtime Sat Jun 5 20:02:23 2021
ctime Sat Jun 5 20:02:23 2021
crtime Sat Jun 5 20:02:23 2021
gen 140447745
mode 100644
size 0
parent 2938047
links 1
pflags 840800000004
There was a proposal for rdfind to use the existing ZFS checksum, which is created when a file is written to ZFS. This could result in much faster comparisons, especially for big files on ZFS.
I think this would be a great enhancement for fclones.
Here is the original rdfind post.
Thank you for this nice tool!
Hi,
I see that your app does some crazy optimizations for SSDs and HDDs, and I'm curious how fast it is in comparison to my app Czkawka (it uses mostly simple optimizations and rather primitive algorithms, since I focus more on the GUI).
I'm almost sure that with a big number of duplicated files fclones will be faster, but I'm curious whether on a second scan Czkawka will be faster thanks to caching hash results.
subj
Hi, Piotr
I found something obvious - but nevertheless interesting.
I ran my regex against the fclones report to get 2 text files - a set of unique files and a set of duplicate files (minus one copy to use as an original).
I found that the set of unique files still had some duplicate images, which differed in hash and file size only because of their EXIF data.
It seems that the camera-sourced EXIF metadata was the superset, and a number of fields (maybe half of them) were dropped when the photos were imported into Apple iPhotos.
So, that got me and a friend wondering how easy or hard it would be to pipe images (on the fly), stripped of EXIF data via exiftool, into fclones, which could then create the report ignoring the EXIF data (since it would no longer be there), and then finally perhaps parse the data again to sort on largest size first based on the persistent size on disk.
The largest file (where the EXIF data differs) would be indicative of the richest data set to keep as the original, which would be easier to regex-keep if sorted to the top of each set of duplicates.
Happy to have a play with this idea, but if you have any thoughts about this - specifically about ingesting from exiftool into fclones I would be keen to hear about it.
Cheers,
Stu
The first answer here suggests a similar approach to the same kind of problem.
Ref: https://softwarerecs.stackexchange.com/questions/51032/compare-two-image-files-for-identical-data-excluding-metadata
It would be nice to be able to cargo install fclones, have cargo track versions, and so on. This could be part of the CI pipeline, and/or there's cargo-release, which handles tagging and so on.
If files change after an fclones group run without updating the timestamps, and remain the same size, then the fclones link command (and others) can lead to data loss:
$ mkdir z; cd z; echo same > 1; echo same > 2; echo abcd > Z
$ cat ?
same
same
abcd
$ fclones group . -o log
$ cp -a Z 1 # timestamp is kept
$ fclones link < log
$ cat ?
abcd
abcd
abcd
This could be avoided by also checking that the ctime of a file is older than the start of the group run, and if not, re-checking or aborting.
Maybe even add a --paranoid option to check the content byte-by-byte before acting on it. But even in this case I am not aware of any (Unix) way to guarantee exclusive write access to a file, so maybe mention that the checked data is expected not to change.
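A sketch of the proposed ctime guard (hypothetical helper; assumes the report records when the group run started):

use std::fs;
use std::os::unix::fs::MetadataExt;
use std::path::Path;
use std::time::{Duration, SystemTime, UNIX_EPOCH};

// ctime is used instead of mtime because userspace cannot set ctime,
// whereas mtime is preserved by `cp -a` (the failure mode above).
fn unchanged_since(path: &Path, run_started: SystemTime) -> std::io::Result<bool> {
    let meta = fs::metadata(path)?;
    let ctime = UNIX_EPOCH + Duration::new(meta.ctime() as u64, meta.ctime_nsec() as u32);
    Ok(ctime < run_started)
}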
$ fclones remove ... < dupes.json
fclones: error: Input error: Not a default fclones report. Formats other than the default one are not supported yet.
I found it very useful to process the fclones group JSON output with jq, and I would like to continue the workflow with fclones remove.
fclones dedupe doesn't seem to preserve mtimes on Linux. Preserving mtimes seems like something that should be both possible and desirable, but please let me know if I missed something.
I tested this with btrfs on NixOS 21.11-pre.
# uname -a
Linux ra 5.10.76-hardened1 #1-NixOS SMP Wed Oct 27 07:56:57 UTC 2021 x86_64 GNU/Linux
# fclones --version
fclones 0.17.0
# cp -a /etc/passwd ./
# touch --date 2009-01-01 passwd
# l
total 4,096
-rw-r--r-- 1 at at 3,891 2009-01-01 00:00 passwd
# cp -a passwd passwd.2
# l
total 8,192
-rw-r--r-- 1 at at 3,891 2009-01-01 00:00 passwd.2
-rw-r--r-- 1 at at 3,891 2009-01-01 00:00 passwd
# fclones group . | fclones dedupe
[2021-10-29 04:25:29.532] fclones: info: Started grouping
[2021-10-29 04:25:29.540] fclones: info: Scanned 3 file entries
[2021-10-29 04:25:29.540] fclones: info: Found 2 (7.8 KB) files matching selection criteria
[2021-10-29 04:25:29.540] fclones: info: Found 1 (3.9 KB) candidates after grouping by size
[2021-10-29 04:25:29.540] fclones: info: Found 1 (3.9 KB) candidates after grouping by paths and file identifiers
[2021-10-29 04:25:29.552] fclones: info: Found 1 (3.9 KB) candidates after grouping by prefix
[2021-10-29 04:25:29.552] fclones: info: Found 1 (3.9 KB) candidates after grouping by suffix
[2021-10-29 04:25:29.552] fclones: info: Found 1 (3.9 KB) redundant files
[2021-10-29 04:25:29.553] fclones: info: Started deduplicating
[2021-10-29 04:25:29.561] fclones: info: Processed 1 files and reclaimed up to 3.9 KB space
# l
total 8,192
-rw-r--r-- 1 at at 3,891 2009-01-01 00:00 passwd
-rw-r--r-- 1 at at 3,891 2021-10-29 04:25 passwd.2
I also tested a user xattr and it did seem to be preserved.
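Restoring the mtime around the dedupe operation seems doable; a sketch using the filetime crate (an assumption; fclones may not depend on it):

use std::fs;
use std::io;
use std::path::Path;
use filetime::{set_file_mtime, FileTime};

// Capture the file's mtime, run the dedupe/reflink step, then put
// the timestamp back so the file looks untouched.
fn with_mtime_preserved(path: &Path, dedupe: impl FnOnce() -> io::Result<()>) -> io::Result<()> {
    let mtime = FileTime::from_last_modification_time(&fs::metadata(path)?);
    dedupe()?;
    set_file_mtime(path, mtime)
}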
This paper reports that ordering accesses by inode id, or by physical block location retrieved with the fiemap ioctl API, can give substantial performance improvements.
These techniques could be applied to the partial hashing phase of fclones, where seek time and rotational latency are the major bottlenecks.
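The inode-ordering half is cheap to sketch; sorting by physical extent would additionally require a fiemap call per file. A minimal sketch of the former (hypothetical helper):

use std::fs;
use std::os::unix::fs::MetadataExt;
use std::path::PathBuf;

// Inode numbers roughly follow on-disk allocation order on many
// filesystems, so visiting files in inode order reduces seeks on HDDs.
fn sort_by_inode(paths: &mut [PathBuf]) {
    paths.sort_by_cached_key(|p| fs::metadata(p).map(|m| m.ino()).unwrap_or(u64::MAX));
}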