deletevisuallyredundant's Introduction

Delete Visually Redundant

A Python script that deletes redundant visually-similar images.

For each list of visually similar images it finds, the script first deletes all but the largest by file size. If multiple files still remain, it deletes all but the oldest by modification time.
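
The selection rule above can be sketched roughly as follows. This is an illustration under stated assumptions, not the script's actual code: the helper name `choose_keeper` is hypothetical, and breaking a full tie (same size and same modified time) by list order is an assumption, since the README does not say what happens in that case.

```python
import os


def choose_keeper(paths):
    """Return (keeper, to_delete) for one group of visually similar files.

    Hypothetical helper: prefers the largest file, then the oldest
    modification time, then the earliest position in `paths`.
    """
    if not paths:
        raise ValueError("paths must not be empty")

    stats = [(path, os.stat(path)) for path in paths]
    # min() returns the first minimal item, so full ties fall back to list order.
    keeper, _ = min(stats, key=lambda item: (-item[1].st_size, item[1].st_mtime))
    to_delete = [path for path in paths if path != keeper]
    return keeper, to_delete
```

Because exactly one file is always returned as the keeper, a group can never be deleted in full, whatever the sizes and timestamps are.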

How to use

These are the accepted options:

Option Description
-p Mandatory. Path to the folder under which to recursively scan.
-r The script creates a 'dups.txt' file as an intermediate while processing. This file holds the list of visually-similar filenames. Giving -r prevents deleting this intermediate file after processing.
-d Dry run. Prints out the names of the files which would be deleted. Doesn't actually delete them.
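
For illustration, the three options could be parsed with `argparse` along these lines. Whether the script itself uses `argparse` is not stated here, and the attribute names `path`, `retain` and `dry_run` as well as the example folder are assumptions made for this sketch.

```python
import argparse


def build_parser():
    """Build a parser for the documented -p, -r and -d options (sketch only)."""
    parser = argparse.ArgumentParser(
        description="Delete redundant visually-similar images."
    )
    parser.add_argument("-p", dest="path", required=True,
                        help="Folder to scan recursively.")
    parser.add_argument("-r", dest="retain", action="store_true",
                        help="Keep the intermediate dups.txt file.")
    parser.add_argument("-d", dest="dry_run", action="store_true",
                        help="Print the files that would be deleted; delete nothing.")
    return parser


# Example: a dry run over a placeholder folder.
args = build_parser().parse_args(["-p", "/path/to/photos", "-d"])
print(args.path, args.retain, args.dry_run)  # /path/to/photos False True
```

Running with -d first is a cheap way to review what would be removed before deleting anything.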

Dependencies

  • findimagedupes

Gotchas

  • Supports jpg, png and gif files
    • The script can be trivially modified to support even more file types.
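
As a rough idea of what such a modification might look like, the sketch below filters files by extension while walking a folder. The names `IMAGE_EXTENSIONS` and `find_images` are hypothetical and do not come from the script. Adding an extension here only affects which files are collected; whether findimagedupes can actually compare that format is a separate question.

```python
import os

# Assumed set of supported extensions; add entries such as ".bmp" to extend it.
IMAGE_EXTENSIONS = {".jpg", ".png", ".gif"}


def find_images(root, extensions=IMAGE_EXTENSIONS):
    """Yield paths under `root` whose extension is in `extensions` (case-insensitive)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() in extensions:
                yield os.path.join(dirpath, name)
```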

Tags

duplicate; image; remove; detect; find;

deletevisuallyredundant's People

Contributors

anirudhkishan

deletevisuallyredundant's Issues

Same modified time -> all are deleted

When two files have the same modified time, all of them are deleted.

Can be fixed with something like:


    # Requires `import os`. maxFilepaths and toDryRun come from the surrounding script.
    if len(maxFilepaths) > 1:
        oldestModifiedTime = float("inf")
        keptFirst = False

        # First pass: find the oldest modified time in the group.
        for maxFilepath in maxFilepaths:
            modifiedTime = os.stat(maxFilepath).st_mtime

            if modifiedTime < oldestModifiedTime:
                oldestModifiedTime = modifiedTime

        # Second pass: keep the first file with that time, mark the rest for deletion.
        for maxFilepath in maxFilepaths:
            modifiedTime = os.stat(maxFilepath).st_mtime

            if modifiedTime > oldestModifiedTime:
                print("D:", maxFilepath)
                #if toDryRun == False:
                #    os.remove(maxFilepath)
            elif keptFirst:
                # Tied for oldest, but one file has already been kept.
                print("D:", maxFilepath)
                #if toDryRun == False:
                #    os.remove(maxFilepath)
            else:
                keptFirst = True
                print("K:", maxFilepath)
    else:
        print("K:", maxFilepaths[-1])

Keep receiving FileNotFoundError

I keep receiving the following error:

line 50, in deleteAllButLargestAndOldest
modifiedTime = os.stat(filepath).st_mtime
FileNotFoundError: [Errno 2] No such file or directory: "some file path"

This happens even though the supposedly missing file does exist. The proof is that running the script a second time finds the file and deletes it if it is a duplicate. Unfortunately, another file later triggers the same error and the script stops again. If I run the script over and over, it eventually finishes successfully. I don't know why it finds the files sometimes and not at other times.

I'm not familiar with Python programming, so I'm afraid I can't debug what the problem actually is. I added an os.path.isfile(filepath) condition to skip paths that are not found, but this means some duplicates are left and I have to run the script once or twice more. Maybe looping on os.path.isfile(filepath) would show whether retrying eventually finds the file? Anyway, I hope this can be fixed with the info I have given so far. Thank you.
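
One way to express the skip-if-missing workaround described above is to catch the error around `os.stat` itself instead of checking `os.path.isfile` first. This is only a sketch: the helper name `stat_existing` is hypothetical, and it works around the symptom without diagnosing why the path is reported as missing.

```python
import os


def stat_existing(paths):
    """Return ({path: os.stat_result}, [missing paths]) for the given paths."""
    found, missing = {}, []
    for path in paths:
        try:
            found[path] = os.stat(path)
        except FileNotFoundError:
            # Record the path instead of aborting the whole run.
            missing.append(path)
    return found, missing
```

Printing the `missing` list at the end of a run would show exactly which paths were skipped, which may help narrow down the cause.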
