knjcode / imgdupes


Identifying and removing near-duplicate images using perceptual hashing.

Languages: Python 96.34%, Makefile 1.48%, Dockerfile 2.17%
Topics: image, dedupe, perceptual-hashing, perceptual-hashes, deduplicate

imgdupes's Issues

error on install

pip install imgdupes
Collecting imgdupes
  Downloading imgdupes-0.1.1.tar.gz (11 kB)
Collecting future
  Using cached future-0.18.2.tar.gz (829 kB)
Requirement already satisfied: ImageHash in c:\users\realh\appdata\local\programs\python\python38\lib\site-packages (from imgdupes) (3.4)
Requirement already satisfied: joblib in c:\users\realh\appdata\local\programs\python\python38\lib\site-packages (from imgdupes) (0.16.0)
ERROR: Could not find a version that satisfies the requirement ngt (from imgdupes) (from versions: none)
ERROR: No matching distribution found for ngt (from imgdupes)

Compare with damaged JPGs

I have a huge library of files recovered with different software from a broken hard disk. I managed to salvage most of my photo library, but I discovered that I have many corrupt copies of files: they are exactly the same as my uncorrupted versions for the first few lines, and then they either get cut off or turn into huge glitches.

The thing is that most photo duplicate removal software actually uses image hashing algorithms, and while these work wonderfully for comparing "edited" images, they don't work at all for comparing corrupted files.

I don't know if this would be possible, but I was wondering whether there is any chance of adding some sort of "pixel stream" comparison: looking for differences not in a perceptual hash, but by treating the pixels as a string of color values starting from the top-left corner, as in the sketch below.
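For illustration only, here is a minimal sketch of that kind of pixel-stream comparison, assuming Pillow and NumPy are available (it is not part of imgdupes): it decodes both files, flattens the pixels in row-major order from the top-left corner, and reports how far the two streams agree before they first diverge.

# Minimal sketch (not part of imgdupes): treat each image as a flat stream of
# pixel values in row-major order from the top-left corner, and measure how
# far the two streams agree before they diverge. Assumes Pillow and NumPy.
import numpy as np
from PIL import Image, ImageFile

ImageFile.LOAD_TRUNCATED_IMAGES = True  # let Pillow decode damaged/truncated JPEGs

def pixel_stream(path):
    # Decode the image and flatten it to a 1-D stream of RGB byte values.
    with Image.open(path) as im:
        return np.asarray(im.convert("RGB"), dtype=np.uint8).ravel()

def matching_prefix_ratio(path_a, path_b):
    # Fraction of the shorter stream that matches before the first difference.
    a, b = pixel_stream(path_a), pixel_stream(path_b)
    n = min(a.size, b.size)
    diffs = np.nonzero(a[:n] != b[:n])[0]
    first_diff = diffs[0] if diffs.size else n
    return first_diff / n

# A corrupted copy typically matches its intact original for an initial run of
# pixels and then diverges, so the ratio drops well below 1.0:
# print(matching_prefix_ratio("photo.jpg", "photo_corrupt.jpg"))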

Thanks in advance :)

Only --faiss-flat works in docker on Synology NAS???

On my Synology NAS (DS720+) I installed Docker and tried to run imgdupes on a folder with two identical images. It works only with --faiss-flat; I never get any output or result with --ngt or --hnsw, no matter what other options, values, or images I provide.

admin@nas2:/volume1/docker$ ll
total 600
-rwxrwxrwx 1 admin users 304184 Apr  1  2013 test1.jpg
-rwxrwxrwx 1 admin users 304184 Apr  1  2013 test2.jpg
admin@nas2:/volume1/docker$ sudo docker run -it -v $PWD:/app knjcode/imgdupes . phash 0
admin@nas2:/volume1/docker$ sudo docker run -it -v $PWD:/app knjcode/imgdupes --ngt . phash 0
admin@nas2:/volume1/docker$ sudo docker run -it -v $PWD:/app knjcode/imgdupes --hnsw . phash 0
admin@nas2:/volume1/docker$ sudo docker run -it -v $PWD:/app knjcode/imgdupes --faiss-flat . phash 0
Building faiss index (dimension=64, num_proc=3)
Exact neighbor searching using faiss
100%|████████████████████████████████████████████████████| 2/2 [00:00<00:00, 2545.09it/s]
test1.jpg
test2.jpg

admin@nas2:/volume1/docker$

What could cause this issue? Am I doing something wrong? Is it a bug?

If I cannot fix it, what are the pros and cons of --faiss-flat in contrast to --ngt or --hnsw?

RuntimeError: src/ngtpy.cpp

Running in Python 3.9, I get the following error:

$ imgdupes --recursive datasets phash 4
Building NGT index (dimension=64, num_proc=15)
Traceback (most recent call last):
File "/home/huzhuolei/miniconda3/envs/imgdupes/bin/imgdupes", line 230, in
main()
File "/home/huzhuolei/miniconda3/envs/imgdupes/bin/imgdupes", line 226, in main
dedupe_images(args)
File "/home/huzhuolei/miniconda3/envs/imgdupes/bin/imgdupes", line 94, in dedupe_images
deduper.dedupe(args)
File "/home/huzhuolei/miniconda3/envs/imgdupes/lib/python3.9/site-packages/common/imagededuper.py", line 172, in dedupe
ngt_index.batch_insert(self.hashcache.hshs(), num_proc)
RuntimeError: src/ngtpy.cpp:

bash: imgdupes: command not found

I installed imgdupes with pip for both Python 2 and Python 3, and I get the same error when I run:
imgdupes --recursive target_dir phash 4

"--query" option does not work if the specified image is not contained in the target_dir

The directory structure looks like this:

directory1/
├── 1.png
└── directory2
    └── 1.copy.png

If I cd to directory1/ and then run:
imgdupes -r . --query 1.png dhash 4

It shows this result:

Searching similar images
100%|████████████| 2/2 [00:00<00:00, 15335.66it/s]
Query: 1.png

1.png
directory2/1.copy.png

However, if I run:
imgdupes -r directory2 --query 1.png dhash 4

It fails to find 1.copy.png:

Searching similar images
100%|████████████| 1/1 [00:00<00:00, 3342.07it/s]
Query: 1.png
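
A possible workaround while this is open, assuming the ImageHash package that imgdupes depends on: hash the query image directly and compare it against every image found under the target directory, so the query file does not need to live inside target_dir. The find_similar helper below is hypothetical and not part of imgdupes.

# Hypothetical workaround sketch (not imgdupes itself): hash the query image
# and compare it against every image under target_dir using the ImageHash
# package, with dhash and a Hamming-distance threshold.
import pathlib
from PIL import Image
import imagehash

def find_similar(query_path, target_dir, threshold=4):
    query_hash = imagehash.dhash(Image.open(query_path))
    matches = []
    for path in pathlib.Path(target_dir).rglob("*"):
        if path.suffix.lower() not in {".png", ".jpg", ".jpeg"}:
            continue
        try:
            candidate_hash = imagehash.dhash(Image.open(path))
        except OSError:
            continue  # skip files Pillow cannot decode
        if query_hash - candidate_hash <= threshold:  # Hamming distance
            matches.append(str(path))
    return matches

# Similar in spirit to: imgdupes -r directory2 --query 1.png dhash 4
# print(find_similar("1.png", "directory2", threshold=4))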

Find highest quality duplicate instead of removing duplicates

I would like to do something similar to what's described in the docs, but instead of deleting duplicate files, I would like to search for duplicates (from a set of query images), find the duplicate with the highest quality, and copy that duplicate to a new folder.
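
This is not an imgdupes feature; below is a rough sketch of that workflow, assuming the duplicate groups have already been identified (for example, from imgdupes output) and using pixel count as a crude proxy for quality. The copy_best_duplicate helper is hypothetical.

# Hypothetical sketch of the requested workflow (not an imgdupes feature):
# given one group of duplicate file paths, keep the one with the most pixels
# and copy it into a destination folder instead of deleting anything.
import pathlib
import shutil
from PIL import Image

def copy_best_duplicate(duplicate_group, dest_dir):
    def pixel_count(path):
        # Resolution as a crude quality proxy.
        with Image.open(path) as im:
            return im.width * im.height
    best = max(duplicate_group, key=pixel_count)
    dest = pathlib.Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    shutil.copy2(best, dest / pathlib.Path(best).name)
    return best

# One duplicate group (e.g. one blank-line-separated block of imgdupes output):
# copy_best_duplicate(["a/img.jpg", "b/img_small.jpg"], "best_quality")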
