knjcode / imgdupes
Identifying and removing near-duplicate images using perceptual hashing.
$ pip install imgdupes
Collecting imgdupes
Downloading imgdupes-0.1.1.tar.gz (11 kB)
Collecting future
Using cached future-0.18.2.tar.gz (829 kB)
Requirement already satisfied: ImageHash in c:\users\realh\appdata\local\programs\python\python38\lib\site-packages (from imgdupes) (3.4)
Requirement already satisfied: joblib in c:\users\realh\appdata\local\programs\python\python38\lib\site-packages (from imgdupes) (0.16.0)
ERROR: Could not find a version that satisfies the requirement ngt (from imgdupes) (from versions: none)
ERROR: No matching distribution found for ngt (from imgdupes)
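The c:\users path above suggests a Windows environment; as far as I know, the ngt binding does not publish Windows wheels, so pip has nothing to resolve there. One possible workaround, assuming Docker is available, is the prebuilt image used later in this thread (adjust the volume path for Windows):

docker run -it -v $PWD:/app knjcode/imgdupes . phash 0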
I have a huge library of files recovered from a broken hard disk with several different tools. I managed to salvage most of my photo library, but I discovered that many files have corrupt copies: they are identical to the intact versions for the first few rows, and then they are either cut off or turn into huge glitches.
The thing is that most photo-deduplication tools use perceptual image hashing, and while that works wonderfully for comparing "edited" images, it doesn't work at all for comparing corrupted files.
I don't know if this is feasible, but is there any chance of adding some sort of "pixel stream" comparison: instead of comparing perceptual hashes, treat the pixels as a string of color values read from the top-left corner and look for where they diverge.
Thanks in advance :)
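For what it's worth, a minimal sketch of that kind of raw-pixel prefix comparison, assuming Pillow is installed (the function name and the one-megabyte prefix length are illustrative, not part of imgdupes):

from PIL import Image, ImageFile

ImageFile.LOAD_TRUNCATED_IMAGES = True  # let Pillow decode cut-off files

def pixel_prefix_similarity(path_a, path_b, prefix_bytes=1 << 20):
    # Decode both images to RGB and compare the raw byte streams,
    # read row by row starting from the top-left corner.
    a = Image.open(path_a).convert("RGB").tobytes()[:prefix_bytes]
    b = Image.open(path_b).convert("RGB").tobytes()[:prefix_bytes]
    n = min(len(a), len(b))
    return sum(x == y for x, y in zip(a, b)) / n if n else 0.0

A corrupted copy would match its intact original almost perfectly at the start and then diverge, which a perceptual hash cannot express.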
The HEIF format keeps growing in use.
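If imgdupes hashes through Pillow (ImageHash does), a possible stopgap while waiting for native support is the pillow-heif package, which registers a HEIC/HEIF decoder with Pillow; a hedged sketch with a hypothetical file name:

from PIL import Image
import imagehash
from pillow_heif import register_heif_opener

register_heif_opener()  # adds HEIF/HEIC support to Image.open
print(imagehash.phash(Image.open("photo.heic")))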
Would be nice to be able to quickly check in a script or CI if dupes are found by checking the exit code
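Until a dedicated exit code exists, one workaround is to treat any stdout as "duplicates found", since imgdupes prints the matching file names (as the logs later in this thread show). A sketch, with the directory name and threshold as placeholders:

import subprocess

# imgdupes currently exits 0 either way, so inspect stdout instead.
result = subprocess.run(
    ["imgdupes", "-r", "images", "phash", "0"],
    capture_output=True, text=True, check=True,
)
if result.stdout.strip():
    raise SystemExit("duplicates found:\n" + result.stdout)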
I can zoom in on my terminal, but then when the next set of images comes they're resized back to the dimensions they were before, with the text remaining large.
I keep getting this error:
Error: Unable to load NGT. Please install NGT and python binding first.
even though I have installed NGT multiple times.
Not sure what the problem is; any help is much appreciated.
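One thing worth checking (a guess, not a confirmed fix): whether the binding imports in the same interpreter that runs imgdupes, since installing the ngt package into a different environment produces exactly this message:

import ngtpy  # the module installed by `pip install ngt`
print(ngtpy.__file__)  # confirms which environment it came from

The binding may also need the NGT shared library on the loader path, so building NGT from source without refreshing the linker cache is another possible cause.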
On my Synology NAS (DS720+) I installed Docker and tried to run imgdupes on a folder containing two identical images. It only works with --faiss-flat; I never get any output or result with --ngt or --hnsw, no matter what other options, values, or images I provide.
admin@nas2:/volume1/docker$ ll
total 600
-rwxrwxrwx 1 admin users 304184 Apr 1 2013 test1.jpg
-rwxrwxrwx 1 admin users 304184 Apr 1 2013 test2.jpg
admin@nas2:/volume1/docker$ sudo docker run -it -v $PWD:/app knjcode/imgdupes . phash 0
admin@nas2:/volume1/docker$ sudo docker run -it -v $PWD:/app knjcode/imgdupes --ngt . phash 0
admin@nas2:/volume1/docker$ sudo docker run -it -v $PWD:/app knjcode/imgdupes --hnsw . phash 0
admin@nas2:/volume1/docker$ sudo docker run -it -v $PWD:/app knjcode/imgdupes --faiss-flat . phash 0
Building faiss index (dimension=64, num_proc=3)
Exact neighbor searching using faiss
100%|████████████████████████████████████████████████████| 2/2 [00:00<00:00, 2545.09it/s]
test1.jpg
test2.jpg
admin@nas2:/volume1/docker$
What could cause this issue? Am I doing something wrong? Is it a bug?
If I cannot fix it, what are the pros and cons of --faiss-flat compared to --ngt or --hnsw?
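For context (my understanding, not from the imgdupes docs): --faiss-flat performs an exact brute-force search, while --ngt and --hnsw build approximate nearest-neighbor indexes that trade exactness for speed on large collections, so for a handful of images exact search is the safer choice. The exact comparison is easy to sanity-check by hand with the ImageHash library:

from PIL import Image
import imagehash

# Hamming distance between two 64-bit phashes; 0 means identical hashes,
# which is what the flat (brute-force) search computes pairwise.
print(imagehash.phash(Image.open("test1.jpg")) - imagehash.phash(Image.open("test2.jpg")))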
Running on Python 3.9, I get the following error:
$ imgdupes --recursive datasets phash 4
Building NGT index (dimension=64, num_proc=15)
Traceback (most recent call last):
File "/home/huzhuolei/miniconda3/envs/imgdupes/bin/imgdupes", line 230, in
main()
File "/home/huzhuolei/miniconda3/envs/imgdupes/bin/imgdupes", line 226, in main
dedupe_images(args)
File "/home/huzhuolei/miniconda3/envs/imgdupes/bin/imgdupes", line 94, in dedupe_images
deduper.dedupe(args)
File "/home/huzhuolei/miniconda3/envs/imgdupes/lib/python3.9/site-packages/common/imagededuper.py", line 172, in dedupe
ngt_index.batch_insert(self.hashcache.hshs(), num_proc)
RuntimeError: src/ngtpy.cpp:
I installed imgdupes with pip for both Python 2 and Python 3, and I get the same error when I run:
imgdupes --recursive target_dir phash 4
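If it helps isolate the crash, here is a minimal standalone test of the same batch_insert call the traceback dies in (the index path and dummy vectors are made up; this is a sketch, not imgdupes code):

import ngtpy

ngtpy.create(b"/tmp/ngt-smoke-test", dimension=64)  # same dimension imgdupes uses
index = ngtpy.Index(b"/tmp/ngt-smoke-test")
index.batch_insert([[0.0] * 64, [1.0] * 64], num_threads=2)
print(index.search([0.0] * 64, size=2))

If this also raises, the problem is in the ngt binding rather than in imgdupes.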
Hello.
I would love to run this on my Synology NAS (DS718+). Unfortunately, my containers always exit with code 132:
https://medium.com/@nprch_12/docker-exited-132-e38f9dd2cd0d
This is weird as my CPU should support SSE4.2.
cpuinfo.txt
Would you mind helping me in debugging this issue? Unfortunately I don't get any logs from the container as it crashes immediately.
Best
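For reference (general knowledge, not verified on this NAS): exit code 132 is 128 + SIGILL, an illegal-instruction crash, which typically means the binaries in the image were built for CPU features the host lacks. Prebuilt faiss binaries often assume AVX2, which the DS718+'s Celeron (an Apollo Lake part, if I recall correctly) lacks even though it does support SSE4.2. A quick way to list the host's flags:

# List the SIMD flags the host CPU actually advertises (Linux only).
with open("/proc/cpuinfo") as f:
    flags = next(line for line in f if line.startswith("flags")).split()
print("sse4_2:", "sse4_2" in flags, "avx2:", "avx2" in flags)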
For example, delete images if they are 60% similar, but keep them if they are only 59% similar.
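In case it unblocks anyone: the positional threshold imgdupes already takes is a Hamming distance over a 64-bit hash, so a similarity percentage maps onto it directly (my own conversion, not a built-in feature):

def distance_for_similarity(similarity, hash_bits=64):
    # Largest Hamming distance whose similarity is still >= the cutoff.
    return int(hash_bits * (1.0 - similarity))

print(distance_for_similarity(0.60))  # 25 -> run `imgdupes . phash 25`
print(distance_for_similarity(0.59))  # 26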
Given a directory layout like this:
directory1/
├── 1.png
└── directory2
    └── 1.copy.png
If I cd into directory1/ and then run:
imgdupes -r . --query 1.png dhash 4
it shows the expected result:
Searching similar images
100%|████████████| 2/2 [00:00<00:00, 15335.66it/s]
Query: 1.png
1.png
directory2/1.copy.png
However, if I run:
imgdupes -r directory2 --query 1.png dhash 4
it fails to find 1.copy.png:
Searching similar images
100%|████████████| 1/1 [00:00<00:00, 3342.07it/s]
Query: 1.png
I would like to do something similar to what's described in the docs, but instead of deleting duplicate files, I would like to search for duplicates (from a set of query images), find the duplicate with the highest quality, and copy that duplicate to a new folder.
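A rough sketch of that workflow using the ImageHash library directly (the directory names and the pixel-count "quality" heuristic are placeholders; imgdupes itself has no copy mode as far as I know):

import shutil
from pathlib import Path
from PIL import Image
import imagehash

# Group images by exact perceptual hash (Hamming distance 0).
groups = {}
for path in Path("library").rglob("*.jpg"):
    groups.setdefault(str(imagehash.phash(Image.open(path))), []).append(path)

dest = Path("best_of_dupes")
dest.mkdir(exist_ok=True)
for paths in groups.values():
    if len(paths) > 1:
        # Keep the member with the most pixels as the "highest quality".
        best = max(paths, key=lambda p: Image.open(p).size[0] * Image.open(p).size[1])
        shutil.copy2(best, dest / best.name)

For near-duplicates (distance > 0) you would bucket hashes by Hamming distance instead of exact string equality.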