
Comments (10)

pkolaczk avatar pkolaczk commented on June 24, 2024 2

@aurelg The postprocessing step would be fast and definitely not a bottleneck. The main bottleneck is I/O for reading files to compute the hashes.

I generally agree this feature is much easier to implement inside fclones.
However, this is not as simple as the provided Python script. When automatically deleting user files, one has to be extremely cautious. There are edge cases: e.g. a file might be moved to a different location during the scanning phase, so fclones registers it as a duplicate, but by the moment it wants to delete it, there is no duplicate anymore.

This:

    if isfile(dst):
        unlink(dst)
    link(src, dst)

might end up deleting the only existing file.

Better to move the file aside first, then create the link, and only if everything is OK drop the moved file.
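That safer sequence could be sketched like this (a minimal illustration only, not fclones' actual implementation; the `replace_with_link` helper and the `.fclones.bak` suffix are made up for the example):

```python
import os


def replace_with_link(src: str, dst: str) -> None:
    """Replace dst with a hard link to src, keeping dst's data reachable at every step."""
    backup = dst + ".fclones.bak"  # hypothetical backup name
    os.rename(dst, backup)         # 1. move the duplicate aside (atomic on the same filesystem)
    try:
        os.link(src, dst)          # 2. create the hard link in its place
    except OSError:
        os.rename(backup, dst)     # on failure, restore the original file
        raise
    os.unlink(backup)              # 3. all OK: drop the moved-aside copy
```

Because each step either succeeds or can be rolled back, a failure never leaves the data with zero copies on disk.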

from fclones.

aurelg avatar aurelg commented on June 24, 2024 1

What do you mean, exactly?

Most of the Python code above deals with reconstructing proper data structures from the fclones output. I guess such data structures are probably already available inside fclones. A dedicated flag could bypass the need for implementing (and maintaining) a parser.

I'm not very happy with the python dependency either. IMHO the link between an independent python project and fclones would be so tight that I don't think it's worth the split.

I'd prefer a shell-based approach as well; it would be more portable. I fear it could become rather limiting later, though, as shell scripts grow complex and become far less readable and reliable than Python once tests, additional switches, or edge-case handling are needed.

Anyhow, a postprocessing step would probably limit (if not defeat) the speed advantage of fclones vs jdupes/fdupes.


pkolaczk avatar pkolaczk commented on June 24, 2024 1

Implemented in #53, released as v0.12.0.


piranna avatar piranna commented on June 24, 2024

IMHO, a postprocessing script parsing the fclones output might require more complexity than adding a CLI switch

What do you mean, exactly?

I like your approach of using Python. Maybe bash is not enough, although it's more powerful than people would expect, and this could be done with it in a more portable way. The Python wrapper, however, would need to be an independent project, since it would not be just a helper command anymore... But yes, a fclones-helpers package would totally make sense :-)


piranna avatar piranna commented on June 24, 2024

Anyhow, a postprocessing step would probably limit (if not defeat) the speed advantage of fclones vs jdupes/fdupes.

I think the bottleneck is in computing the hashes...


aurelg avatar aurelg commented on June 24, 2024

It might also be nice to avoid creating dst if it has been removed since fclones was executed. Such edge cases come from the arbitrary amount of time (and changes on the filesystem) between the execution of fclones and the postprocessing. An implementation inside fclones could be far more robust. 👍
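A re-verification step of that kind could look roughly like this (illustrative only; `still_duplicates` is a hypothetical helper one would call right before unlinking):

```python
import filecmp
import os


def still_duplicates(src: str, dst: str) -> bool:
    """Re-check, at action time, that both files still exist and are still identical."""
    if not (os.path.isfile(src) and os.path.isfile(dst)):
        return False  # one of them vanished since the scan
    return filecmp.cmp(src, dst, shallow=False)  # compare contents, not just stat info
```

This only narrows the race window rather than eliminating it, which is another argument for doing the check inside fclones itself, as close to the filesystem operation as possible.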


rleaver152 avatar rleaver152 commented on June 24, 2024

fclones should offer a way of deleting / hardlinking / softlinking duplicated files automatically.

In #25:

@pkolaczk wrote:

That's right, fclones doesn't offer any way of deleting files automatically yet. I believe this is a task for a different program (or a subcommand) that would take output of fclones.

and @piranna replied:

From a UNIX perspective, yes, it makes sense for that task to be done by another command, but it would be so tightly attached to the fclones output format... :-/ Maybe a shell script wrapper that offers an interface compatible with fdupes? :-) That would be easy to implement, but I'm not sure if it should be hosted here in the fclones repo or be totally independent...

IMHO, a postprocessing script parsing the fclones output might require more complexity than adding a CLI switch. For instance, here's an (untested) python implementation that leverages the CSV output (expected in fclones_out.csv) to replace duplicates with hard links:

#!/usr/bin/env python

import logging
from os import link, unlink
from os.path import isfile


def main() -> None:
    with open("fclones_out.csv") as f_handler:
        for duplicates in (
            # columns: size, hash, count, then the group's paths
            fclone_output_line.rstrip("\n").split(",")[3:]
            for fclone_output_line in f_handler
            if not fclone_output_line.startswith("size")
        ):
            src = duplicates[0]
            for dst in duplicates[1:]:
                logging.debug("%s -> %s", src, dst)
                if isfile(dst):
                    unlink(dst)
                link(src, dst)


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    main()

PS: I think this deserves a ticket on its own, feel free to delete it if you don't agree. :-)

I added a few things - love the code. It assumes you output the CSV file to /tmp for tidiness. Remember to put the primary directory last on the fclones command line so those files are kept as the priority (in contrast to rdfind, where it's the first directory that is kept).

#!/usr/bin/env python3
import logging
import os
from pathlib import Path


def main() -> None:
    with open("/tmp/fclones_out.csv") as f_handler:
        for duplicates in (
            fclone_output_line.split(",")[3:]
            for fclone_output_line in f_handler
            if not fclone_output_line.startswith("size")
        ):
            src = duplicates[0]
            for dst in duplicates[1:]:
                logging.debug("%s -> %s", src, dst)
                dst = dst.strip("\n")
                if Path(dst).is_file():
                    os.remove(dst)


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    main()



rleaver152 avatar rleaver152 commented on June 24, 2024


Here is a version that just moves the duplicate files to a duplicates directory ($HOME/Duplicates) for safety:


#!/usr/bin/env python3
import logging
import os
import shutil
from pathlib import Path

MOVETO = "/Users/MyUserName/Duplicates/"


def main() -> None:
    with open("/tmp/fclones_out.csv") as f_handler:
        for duplicates in (
            fclone_output_line.split(",")[3:]
            for fclone_output_line in f_handler
            if not fclone_output_line.startswith("size")
        ):
            src = duplicates[0]
            for dst in duplicates[1:]:
                logging.debug("%s -> %s", src, dst)
                dst = dst.strip("\n")
                if Path(dst).is_file():
                    shutil.move(dst, os.path.join(MOVETO, os.path.basename(dst)))


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    main()





piranna avatar piranna commented on June 24, 2024

Assumes you output the CSV file to /tmp for tidiness

Better if it gets the info directly from stdin :-)
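A stdin-driven variant could be sketched roughly like this (a sketch assuming the CSV layout used above, with the paths starting at the fourth column; it only yields the planned pairs rather than touching any files):

```python
import csv
import sys


def plan_from_csv(lines):
    """Yield (kept, duplicate) path pairs from fclones CSV lines."""
    for row in csv.reader(lines):
        if not row or row[0] == "size":  # skip the header line and blanks
            continue
        paths = row[3:]                  # first three columns: size, hash, count
        for dup in paths[1:]:
            yield paths[0], dup


# usage, e.g. piping fclones' CSV output into this script:
#     for keep, dup in plan_from_csv(sys.stdin):
#         print(dup, "->", keep)
```

Using the `csv` module instead of a naive `split(",")` should also keep quoted paths containing commas intact.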


rleaver152 avatar rleaver152 commented on June 24, 2024

Assumes you output the CSV file to /tmp for tidiness

Better if it gets the info directly from stdin :-)

I like to check before deleting!! :-) And the move version loses the directory structure, so I equally want to check first.


