Comments (10)
@aurelg The postprocessing step would be fast and definitely not a bottleneck. The main bottleneck is I/O for reading files to compute the hashes.
I generally agree this feature is much easier to implement inside fclones.
However, this is not as simple as the provided Python script. When automatically deleting user files, one has to be extremely cautious. For example, a file might be moved to a different location during the scanning phase, so fclones registers it as a duplicate, but by the moment it wants to delete it, there is no duplicate anymore.
This:

```python
if isfile(dst):
    unlink(dst)
link(src, dst)
```

might end up deleting the only existing copy of the file.
Better to move the file aside first, then create the link, and only if everything went OK, drop the moved file.
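The move-first ordering described above could be sketched like this. A minimal sketch: the function name and the `.fclones.bak` suffix are illustrative, not part of fclones.

```python
#!/usr/bin/env python3
"""Safer replace-with-hardlink: move the duplicate aside before linking,
so a failed link() can be rolled back instead of losing the only copy."""
import os


def replace_with_hardlink(src: str, dst: str) -> None:
    backup = dst + ".fclones.bak"  # hypothetical temporary name
    os.rename(dst, backup)         # 1. move the duplicate aside first
    try:
        os.link(src, dst)          # 2. create the hard link in its place
    except OSError:
        os.rename(backup, dst)     # roll back: restore the original file
        raise
    os.unlink(backup)              # 3. all OK: drop the moved copy
```

If the `link()` call fails (cross-device link, permissions, etc.), the original file is restored instead of being lost.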
What do you mean, exactly?
Most of the Python code above deals with reconstructing proper data structures from the fclones output. I guess such data structures are probably already available inside fclones, so a dedicated flag could bypass the need for implementing (and maintaining) a parser.
I'm not very happy with the Python dependency either. IMHO the link between an independent Python project and fclones would be so tight that I don't think it's worth the split.
I'd prefer a shell-based approach as well: it would be more portable. But I fear it could become rather limiting later, as shell scripts get complex, hard to read, and unreliable compared to Python once tests, additional switches, or edge-case handling are needed.
Anyhow, a postprocessing step would probably limit (if not defeat) the speed advantage of fclones vs jdupes/fdupes.
Implemented in #53 released as v0.12.0.
IMHO, a postprocessing script parsing the fclones output might require more complexity than adding a CLI switch
What do you mean, exactly?
I like your approach of using Python. Maybe Bash is not enough, although it's more powerful than people would expect, and this could be done with it in a more portable way. The Python wrapper would need to be an independent project, since it would not be just a helper command anymore... But yes, a fclones-helpers
package would totally make sense :-)
Anyhow, a postprocessing step would probably limit (if not defeat) the speed advantage of fclones vs jdupes/fdupes.
I think the bottleneck is in the hashes...
It might also be nice to avoid creating dst if it has been removed since fclones was executed. Such edge cases come from the arbitrary amount of time (and changes on the filesystem) between the execution of fclones and the postprocessing. An implementation inside fclones could be far more robust. 👍
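One way a postprocessor could guard against such filesystem drift is to re-verify a pair just before acting on it. A minimal sketch, assuming `src`/`dst` came from the fclones report; the function name and the choice of SHA-256 are illustrative, not fclones behavior:

```python
#!/usr/bin/env python3
"""Re-check that two files are still duplicates at the moment we act,
guarding against changes made after the fclones scan finished."""
import hashlib
import os


def still_duplicates(src: str, dst: str) -> bool:
    # Both files must still exist...
    if not (os.path.isfile(src) and os.path.isfile(dst)):
        return False
    # ...have the same size (cheap check first)...
    if os.path.getsize(src) != os.path.getsize(dst):
        return False

    # ...and the same content hash right now.
    def digest(path: str) -> str:
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    return digest(src) == digest(dst)
```

The size comparison short-circuits the expensive hashing for files that have obviously diverged.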
fclones should offer a way of deleting / hardlinking / softlinking duplicated files automatically.
In #25:
@pkolaczk wrote:
That's right, fclones doesn't offer any way of deleting files automatically yet. I believe this is a task for a different program (or a subcommand) that would take output of fclones.
and @piranna replied:
From a UNIX perspective, yes, it makes sense for that task to be done by another command, but it would be so much attached to the fclones output format... :-/ Maybe a shell script wrapper that offers an interface compatible with fdupes? :-) That would be easy to implement, but I'm not sure if it should be hosted here in the fclones repo or be totally independent...
IMHO, a postprocessing script parsing the fclones output might require more complexity than adding a CLI switch. For instance, here's an (untested) Python implementation that leverages the CSV output (expected in `fclones_out.csv`) to replace duplicates with hard links:

```python
#!/usr/bin/env python

import logging
from os import link, unlink
from os.path import isfile


def main() -> None:
    with open("fclones_out.csv") as f_handler:
        for duplicates in (
            fclone_output_line.split(",")[3:]
            for fclone_output_line in f_handler.readlines()
            if not fclone_output_line.startswith("size")
        ):
            src = duplicates[0]
            for dst in duplicates[1:]:
                logging.debug("%s -> %s", src, dst)
                if isfile(dst):
                    unlink(dst)
                link(src, dst)


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    main()
```

PS: I think this deserves a ticket on its own, feel free to delete it if you don't agree. :-)
I added a few things - love the code. Assumes you output the CSV file to /tmp for tidiness. Remember to put the primary directory last on the fclones command line so those files are kept as the priority (in contrast to rdfind, where it's the first directory that is kept).
```python
#!/usr/bin/env python3

import os
import logging
from pathlib import Path


def main() -> None:
    with open("/tmp/fclones_out.csv") as f_handler:
        for duplicates in (
            fclone_output_line.split(",")[3:]
            for fclone_output_line in f_handler.readlines()
            if not fclone_output_line.startswith("size")
        ):
            src = duplicates[0]
            for dst in duplicates[1:]:
                # logging.debug("%s -> %s", src, dst)
                dst = dst.strip('\n')
                my_file = Path(dst)
                if my_file.is_file():
                    os.remove(dst)


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    main()
```
And here is a version that just moves files to a duplicates directory ($HOME/Duplicates) for safety:
```python
#!/usr/bin/env python3

import os
import shutil
import logging
from pathlib import Path


def main() -> None:
    with open("/tmp/fclones_out.csv") as f_handler:
        for duplicates in (
            fclone_output_line.split(",")[3:]
            for fclone_output_line in f_handler.readlines()
            if not fclone_output_line.startswith("size")
        ):
            src = duplicates[0]
            moveto = "/Users/MyUserName/Duplicates/"
            for dst in duplicates[1:]:
                logging.debug("%s -> %s", src, dst)
                dst = dst.strip('\n')
                my_file_list = Path(dst)
                if my_file_list.is_file():
                    myfile = os.path.basename(dst)
                    sink = moveto + myfile
                    shutil.move(dst, sink)


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    main()
```
Assumes you output the CSV file to /tmp for tidiness

Better if it gets the info directly from stdin :-)
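The stdin suggestion could look like the sketch below: pipe the fclones CSV report straight into the script instead of going through a temp file. The column layout (paths from the fourth column on) follows the scripts earlier in the thread and is an assumption about the report format; `groups_from_csv` is an illustrative name, not part of fclones.

```python
#!/usr/bin/env python3
"""Parse an fclones-style CSV duplicate report from an iterable of lines
(e.g. sys.stdin), yielding (original, duplicates) tuples."""
import sys


def groups_from_csv(lines):
    for line in lines:
        if line.startswith("size"):  # skip the header row
            continue
        # Paths start at the fourth comma-separated column.
        paths = [p.strip() for p in line.split(",")[3:]]
        if len(paths) > 1:
            yield paths[0], paths[1:]


# Intended usage (assuming fclones writes its CSV report to stdout):
#   fclones <args> | ./this_script.py
# for src, dups in groups_from_csv(sys.stdin):
#     ...replace each path in dups with a link to src...
```

Note that a plain `split(",")` breaks on paths containing commas, which is one more reason the parsing belongs inside fclones itself.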
Assumes you output the CSV file to /tmp for tidiness

Better if it gets the info directly from stdin :-)

I like to check before deleting!! :-) And the move version loses the directory structure, so I equally want to check first.
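The lost-directory-structure concern above could be addressed by mirroring each file's path, relative to the scanned root, under the quarantine directory. A minimal sketch; `quarantine`, `root`, and `moveto` are illustrative names, not fclones options:

```python
#!/usr/bin/env python3
"""Move a duplicate into a quarantine directory while preserving its
directory structure relative to the scanned root."""
import os
import shutil


def quarantine(dst: str, root: str, moveto: str) -> str:
    rel = os.path.relpath(dst, root)   # path relative to the scanned root
    sink = os.path.join(moveto, rel)   # mirror it under the quarantine dir
    os.makedirs(os.path.dirname(sink), exist_ok=True)
    shutil.move(dst, sink)
    return sink
```

Restoring a file later is then just the reverse move, since the original path can be reconstructed from the quarantined one.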