rhsimplex / image-match Goto Github PK


2.9K 101.0 406.0 1.54 MB

🎇 Quickly search over billions of images

Python 99.58% Dockerfile 0.42%

image-analysis image-signatures python search

image-match's Issues

Duplicate Identifier or Similarity Identifier

I was hoping to get some clarification on the intended use case for this. Should it strictly be used for duplicate detection, or can it also be used to identify similar images? This page seems to suggest that it can be used to measure image similarity. However, when I try it on the attached images, the results do not agree with the intuition that two images of shoes should be significantly more similar than an image of a shoe and an image of something else.

The distance between the first and second image seems to be 0.71422605625006175 but the distance between the first and third is 0.70043762770711271.

Signature generator fails on seemingly normal image

Causes an error (see #64 )

`refresh_after`

@jtara1 since you said you would make a PR based on your commit to force an immediate rebuild of the index.

Just making an issue to remind me =)

Strange problem searching

Hello, I have a weird problem. I have two images, A and B (B is a larger, higher-resolution version of A). If I use ImageSignature to calculate the normalized distance between the two I get 0.314299892917, which is pretty good and shows that they are a match.

Now, here is the problem: if I add image A to Elasticsearch using ses.add_image('A.jpg') and then call ses.search_image('B.jpg'), I get no results. I tried raising distance_cutoff to 0.99 and got a bunch of results, BUT these results did not include A.jpg, and all of them had a distance of at least 0.60. I KNOW image A is there, because if I call ses.search_image('A.jpg') I get a perfect match.

I attached the images.

Investigate use of "MoreLikeThis" query

Per this suggestion:

You might want to look at Morelikethis queries to boost performance. I worked on a proprietary version of this and at the time Lucene performance dropped off nearly linearly with the number of query terms.
We used MoreLikeThis to reduce our queries count to the 30-40 most statistically interesting terms. The one hiccup being an issue in Lucene [1] where the term cache wasn't operating properly. We just added our own image query term cache and a custom MLT query to leverage it, which gave us a 10x speed bump over any other methods we tried.
The interestingness of the terms is assessed on a per-term basis though, so you might see a relevance drop for some types of image if you set MoreLikeThis to use too few terms.
[1] https://issues.apache.org/jira/browse/LUCENE-1690

Look into MoreLikeThis instead of BoolQuery

Elasticsearch 5 compatibility ?

Hi,

I tried to test image-match but I'm getting this error while trying to search inside db for an image :
`elasticsearch.exceptions.RequestError: TransportError(400, u'parsing_exception', u'no [query] registered for [filtered]')`

I'm using elasticsearch 5, is it supported ?
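
For context, the `filtered` query was deprecated in Elasticsearch 2.0 and removed in 5.0, which is what the parsing exception reports. Below is a minimal sketch of the equivalent rewrite as a `bool` query. It assumes the query body has the usual `{'filtered': {'query': ..., 'filter': ...}}` shape, and the helper name `filtered_to_bool` is hypothetical, not part of image-match.

```python
def filtered_to_bool(query):
    """Rewrite a top-level ``filtered`` query as a ``bool`` query.

    Hypothetical helper for illustration; other query types pass through.
    """
    if 'filtered' not in query:
        return query
    inner = query['filtered']
    bool_query = {}
    if 'query' in inner:
        bool_query['must'] = inner['query']      # scoring part
    if 'filter' in inner:
        bool_query['filter'] = inner['filter']   # non-scoring part
    return {'bool': bool_query}


old = {'filtered': {'query': {'match_all': {}},
                    'filter': {'term': {'simple_word_0': 42252475}}}}
new = filtered_to_bool(old)
print(new)
```

Whether this is the only change needed for Elasticsearch 5 is not established by this issue, so treat it as a starting point.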

Add method to remove images from index

Please can you add a method to remove an image from the index, I would like to keep the index up-to-date with my list of images (which is constantly changing).

Search images using part of the picture

Hi @rhsimplex, I am back with another question. When I search the Elasticsearch backend using only part of an image (1/2 or 1/3 of the image), it hardly ever finds a matching image. I would like to solve this problem at least to some extent. Could you give me some advice?

Find Sub images in larger image

Is it possible to use this to find sub images in larger images?

So for instance if I have a picture of 20 books arranged neatly, and want to find 1 of those books, can this be used, and can we return a bounded box for that image?

Does it contain memory leaks?

When I use image-match on CentOS, it takes up a lot of memory and is eventually killed by the system.

Dockerfile does not create ElasticSearch server

Documentation says:

We have a Docker image that takes care of setting up image_match and elasticsearch

Looking in the Dockerfile and setup.py, I see no creation of an ElasticSearch container.

Perhaps the documentation should mention that it only creates a containerized environment where you can use the Python REPL to execute programs that use the image-match and elasticsearch libraries.

I would suggest, however, adding an explanation of how you can create an ElasticSearch container:

docker run -P -d elasticsearch

And possibly linking it (or whatever is relevant in the latest Docker version) to the created image-match container can be a lot more useful.

At the very least, a better explanation on the expected usage pattern of this container should be added.

looks very nice

does it have any buildpack for heroku?

Feature request: better errors for corrupt images

Right now exceptions regarding corrupt images need to be caught using except xml.etree.ElementTree.ParseError which looks rather... abstract.

Image match should throw its own type of exception in such cases.

Related: #59
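
A minimal sketch of what this request could look like, assuming the image is read by some loader function that may raise `xml.etree.ElementTree.ParseError` or `IOError`. The names `CorruptImageError` and `load_image_safely` are hypothetical and do not exist in image-match.

```python
import xml.etree.ElementTree as ET


class CorruptImageError(Exception):
    """Hypothetical exception type; image-match does not define this name."""


def load_image_safely(loader, path):
    """Call ``loader(path)`` and translate low-level failures.

    ``loader`` stands in for whatever function actually reads the image.
    """
    try:
        return loader(path)
    except (ET.ParseError, IOError) as exc:
        raise CorruptImageError('could not read image %r: %s' % (path, exc))


def broken_loader(path):
    # Simulates the failure mode from the issue: invalid XML/SVG data.
    return ET.fromstring('<svg><unclosed></svg>')


try:
    load_image_safely(broken_loader, 'bad.svg')
except CorruptImageError as err:
    print('caught:', err)
```

Callers would then catch one library-specific exception instead of an XML parser error.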

ElasticSearch 10K record limit

It seems like ElasticSearch has a 10k rows limit.
Forgive the basic question but does image-match handle this?
Can't seem to figure out how to get our docker set up to run for a larger database.

Update docs

Added some new features, @sbellem will need a little help from you =)

POST /tester2/image?refresh=false [status:406 request:0.003s] elasticsearch.exceptions.TransportError: <exception str() failed>

I am using a CentOS 7 system on VMware and installed all the software following the guide doc.
Then I tested the following code:

import elasticsearch
from image_match.elasticsearch_driver import SignatureES
es = elasticsearch.Elasticsearch()
es.indices.create('tester2')
ses = SignatureES(es, index='tester2')
ses.add_image('https://upload.wikimedia.org/wikipedia/commons/thumb/e/ec/Mona_Lisa,_by_Leonardo_da_Vinci,_from_C2RMF_retouched.jpg/687px-Mona_Lisa,_by_Leonardo_da_Vinci,_from_C2RMF_retouched.jpg')
list = ses.search_image('https://upload.wikimedia.org/wikipedia/commons/thumb/e/ec/Mona_Lisa,_by_Leonardo_da_Vinci,_from_C2RMF_retouched.jpg/687px-Mona_Lisa,_by_Leonardo_da_Vinci,_from_C2RMF_retouched.jpg')
print(list)

Then I got this error:

POST /tester2/image?refresh=false [status:406 request:0.003s]
Traceback (most recent call last):
  File "/home/diters/PycharmProjects/imageMatch/imageMatchServer.py", line 6, in <module>
    ses.add_image('https://upload.wikimedia.org/wikipedia/commons/thumb/e/ec/Mona_Lisa,_by_Leonardo_da_Vinci,_from_C2RMF_retouched.jpg/687px-Mona_Lisa,_by_Leonardo_da_Vinci,_from_C2RMF_retouched.jpg')
  File "/usr/local/python3/lib/python3.6/site-packages/image_match/signature_database_base.py", line 203, in add_image
    self.insert_single_record(rec, refresh_after=refresh_after)
  File "/usr/local/python3/lib/python3.6/site-packages/image_match/elasticsearch_driver.py", line 88, in insert_single_record
    self.es.index(index=self.index, doc_type=self.doc_type, body=rec, refresh=refresh_after)
  File "/usr/local/python3/lib/python3.6/site-packages/elasticsearch/client/utils.py", line 69, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/usr/local/python3/lib/python3.6/site-packages/elasticsearch/client/__init__.py", line 279, in index
    _make_path(index, doc_type, id), params=params, body=body)
  File "/usr/local/python3/lib/python3.6/site-packages/elasticsearch/transport.py", line 329, in perform_request
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
  File "/usr/local/python3/lib/python3.6/site-packages/elasticsearch/connection/http_urllib3.py", line 109, in perform_request
    self._raise_error(response.status, raw_data)
  File "/usr/local/python3/lib/python3.6/site-packages/elasticsearch/connection/base.py", line 108, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.TransportError: <exception str() failed>

Python 3 support

Join the future of Python

Question about searching

Hello,

I have a question about record generation. I don't understand why a word key like simple_word_0 can be set and then used for search.

From code :

    def insert_single_record(self, rec):
        """Insert an image record.

        Must be implemented by derived class.

        Args:
            rec (dict): an image record. Will be in the format returned by
                make_record

                For example, rec could have the form:

                {'path': 'https://pixabay.com/static/uploads/photo/2012/11/28/08/56/mona-lisa-67506_960_720.jpg',
                 'signature': [0, 0, 0, 0, 0, 0, 2, 2, 0, 0, 0, 0, 0, 2, 2, 2, 0, 0, 0, 0, 2, 2, 2, 2, 0 ... ]
                 'simple_word_0': 42252475,
                 'simple_word_1': 23885671,
                 'simple_word_10': 9967839,
                 'simple_word_11': 4257902,
                 'simple_word_12': 28651959,
                 'simple_word_13': 33773597,
                 'simple_word_14': 39331441,
                 'simple_word_15': 39327300,
                 'simple_word_16': 11337345,
                 'simple_word_17': 9571961,
                 'simple_word_18': 28697868,
                 'simple_word_19': 14834907,
                 'simple_word_2': 7434746,
                 'simple_word_20': 37985525,
                 'simple_word_21': 10753207,
                 'simple_word_22': 9566120,
                 ...
                 'metadata': {...}
                 }

                 The number of simple words corresponds to the attribute N

        """

Looking at the Elasticsearch and MongoDB implementations, I found that image-match's search steps are:

1. generate words from the signature
2. convert the signature into a word dict { 'simple_word_0': 42252475, 'simple_word_1': 23885671, 'simple_word_10': 9967839}
3. query for each word to get a list of records with a matching word
4. compare the full signature against each word-matched signature and keep the highest-scoring ones

What I can't understand is steps 2 and 3. The word keys are assigned by order, so how can they match another picture with some differences (translation or rotation)? But image-match actually does this, and the only reason I can imagine is that phash preserves feature position. Is that true?

If that is true, another question arises: if phash preserves feature position, why not use a solution like simhash online query (http://www.wwwconference.org/www2007/papers/paper215.pdf)?

Split signature into several tables/blocks, comparison times can be reduced.
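
The search steps described in this issue can be sketched as a toy model. This is not the library's implementation: the window length `k`, the number of words `n_words`, the base-5 encoding, and the 0.45 cutoff are all assumptions chosen for illustration, and the signature values are assumed to lie in -2..2.

```python
import math


def normalized_distance(a, b):
    """L2 norm of the difference divided by the sum of the L2 norms."""
    diff = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    total = math.sqrt(sum(x * x for x in a)) + math.sqrt(sum(y * y for y in b))
    return diff / total if total else 0.0


def make_words(signature, k=4, n_words=3):
    """Cut n_words windows of length k at fixed offsets; encode each as an int."""
    step = (len(signature) - k) // (n_words - 1)
    words = {}
    for i in range(n_words):
        window = signature[i * step:i * step + k]
        code = 0
        for value in window:          # values assumed to lie in -2..2
            code = code * 5 + (value + 2)
        words['simple_word_%d' % i] = code
    return words


def search(records, query_signature, cutoff=0.45):
    """Word lookup first, full signature comparison second."""
    query_words = make_words(query_signature)
    hits = []
    for rec in records:
        # A record is a candidate if ANY word matches at the SAME position.
        if any(rec['words'][key] == val for key, val in query_words.items()):
            dist = normalized_distance(query_signature, rec['signature'])
            if dist < cutoff:
                hits.append((dist, rec['path']))
    return sorted(hits)


sig_a = [0, 1, 2, -1, 0, 0, 2, 2, -2, 1, 0, 1]
sig_b = [2, -2, 1, 1, -1, 2, 0, -2, 1, 0, 2, -1]
records = [{'path': p, 'signature': s, 'words': make_words(s)}
           for p, s in (('a.jpg', sig_a), ('b.jpg', sig_b))]

query = sig_a[:-1] + [0]              # sig_a with one value changed
results = search(records, query)
print(results)
```

In this toy model a small local change leaves most words intact, so the record is still found. A translation or rotation would shift every window, so no word would match at its original position, which is consistent with the concern raised in the question.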

Query regarding image storage

I intend to try out this project in my application however I have a doubt regarding the add image call. Does it actually store the image in elastic search server? I hope not. What I want to do is store the images elsewhere,possibly AWS S3 and use the URLs to add images though your app to elastic search and then search them whenever I want. If you store the images as well it is a double storage for my app. So just want to know.

cannot resize an array that references

Hi,
when I call ses.add_image I get this error:

ses.add_image('https://upload.wikimedia.org/wikipedia/commons/thumb/e/ec/Mona_Lisa,_by_Leonardo_da_Vinci,_from_C2RMF_retouched.jpg/687px-Mona_Lisa,_by_Leonardo_da_Vinci,_from_C2RMF_retouched.jpg')

My error:
cannot resize an array that references or is referenced by another array in this way. Use the resize function

Skip duplicates on indexing?

Is there a way to skip indexing an image, if it already has been indexed? I have some images in elasticsearch, and I need to index some others. However, some of those images were already indexed. Is there way to make those images, which have 100% match in DB, to not be indexed again?
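
One way to approximate this from the calling side is to search before adding. The sketch below assumes only what appears elsewhere on this page: `search_image` returns a list of dicts with a `dist` key, and `add_image` takes a path. The helper `add_if_new` and the `FakeIndex` stand-in are hypothetical.

```python
def add_if_new(ses, path, max_dist=0.0):
    """Add ``path`` only if no indexed image lies within ``max_dist``.

    ``ses`` is any object with ``search_image`` and ``add_image`` methods,
    such as a SignatureES instance. Returns True if the image was added.
    """
    matches = ses.search_image(path)
    if any(match['dist'] <= max_dist for match in matches):
        return False
    ses.add_image(path)
    return True


class FakeIndex(object):
    """In-memory stand-in so the sketch runs without Elasticsearch."""

    def __init__(self):
        self.paths = set()

    def search_image(self, path):
        return [{'path': path, 'dist': 0.0}] if path in self.paths else []

    def add_image(self, path):
        self.paths.add(path)


index = FakeIndex()
print(add_if_new(index, 'A.jpg'))   # first call: image is added
print(add_if_new(index, 'A.jpg'))   # second call: image is skipped
```

With a real backend, a freshly added image may not be searchable until the index has refreshed, so back-to-back calls could still insert a duplicate.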

about ext data

Hi,
This project is great, thank you for open-sourcing it.
In use, I find one thing a little inconvenient.
When used together with ES, it would help if add_image accepted an extra data field in JSON format,
such as:
add_image('image url', 'json data')
and returned it together with the search results.

That way, when adding a picture, you can store some extra information with the image and get it back directly from a search.

This is just from my own limited experience.
I hope you can add this feature.
Thank you

Documentation of way to get complete distance matrix

Hi there,

I have a set of 4000 images which I want to create into a cluster. My images are a large set of images taken from various fixed cameras (might move a small, small bit due to wind), some at day some at night, and they might have people, dogs, cats, etc. I am trying to create clusters based on the camera (i.e. clusters of images all taken by the same camera).

I'm planning on using HDBSCAN for this:
http://hdbscan.readthedocs.io/en/latest/basic_hdbscan.html

I've got image-match running and have done the following modifications to the library to attempt and get a complete distance matrix:

I have tried setting distance_cutoff of SignatureDatabaseBase() to 1.0, and size of SignatureES() to 4000, but I still seem to get a sparse 4000x4000 matrix.

Is there any easy way to get the full distance matrix?

Also, any hints on when increasing k, N and n_grid is correct for more precise results?

I also noticed some images contain specific textual labels embedded in the image in the same places (like date/time and camera name). Since these labels aren't big, I'm pretty sure they're mostly ignored here - am I right?
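
A possible explanation for the sparse matrix is that the database search only returns records that share at least one word with the query, regardless of the cutoff. A full matrix can instead be computed in memory from the signatures themselves. The sketch below assumes the signatures have already been generated (for example with `generate_signature`) and that the normalized distance is the L2 norm of the difference divided by the sum of the L2 norms. The function `distance_matrix` is hypothetical, and the three short signatures are made-up data.

```python
import math


def normalized_distance(a, b):
    """L2 norm of the difference divided by the sum of the L2 norms."""
    diff = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    total = math.sqrt(sum(x * x for x in a)) + math.sqrt(sum(y * y for y in b))
    return diff / total if total else 0.0


def distance_matrix(signatures):
    """Dense, symmetric matrix of pairwise distances with a zero diagonal."""
    n = len(signatures)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = normalized_distance(signatures[i], signatures[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix


signatures = [[0, 1, 2, -1], [0, 1, 2, -2], [2, -2, 0, 1]]
matrix = distance_matrix(signatures)
for row in matrix:
    print(row)
```

For 4000 images this is roughly 8 million distance computations, so a vectorized NumPy version would likely be preferable in practice.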

Why "An image signature for any kind of image, Wong et al" instead of pHash?

Thanks for the awesome package. Is there a rationale for choosing to implement the digital signature from "An image signature for any kind of image, Wong et al" versus pHash?

Add integration tests

To improve development speed and PRs review we can add some integration tests.

We can test image-match as a "blackbox", having a directory with images on one side, and expected fingerprints on the other. We can always work on unit tests later.

Signature Compression

Currently the entire signature is stored as an array, and the words are stored as 32-bit(?) integers. Investigate compression schemes.
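
One simple scheme to start the investigation, assuming each signature value is one of five levels (-2..2): since 5**3 = 125 fits in a byte, three values can be packed per byte. The functions `pack_signature` and `unpack_signature` are hypothetical and only illustrate the idea.

```python
def pack_signature(signature):
    """Pack values in -2..2 three per byte (5**3 = 125 fits in one byte)."""
    packed = bytearray()
    for start in range(0, len(signature), 3):
        chunk = signature[start:start + 3]
        code = 0
        for value in chunk:
            if not -2 <= value <= 2:
                raise ValueError('value out of range: %r' % value)
            code = code * 5 + (value + 2)
        # Pad a short final chunk so every byte decodes to three digits.
        code *= 5 ** (3 - len(chunk))
        packed.append(code)
    return bytes(packed)


def unpack_signature(packed, length):
    """Inverse of pack_signature; ``length`` trims the padding."""
    values = []
    for code in bytearray(packed):
        digits = [code // 25, (code // 5) % 5, code % 5]
        values.extend(d - 2 for d in digits)
    return values[:length]


signature = [0, 0, 2, -1, 1, -2, 2]
packed = pack_signature(signature)
restored = unpack_signature(packed, len(signature))
print(len(signature), len(packed), restored == signature)
```

The original length has to be stored alongside the packed bytes, and the distance computation would need to unpack before comparing.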

SignatureMongo problem with python3

For these imports:

from pymongo.mongo_client import MongoClient
from os import listdir
from os.path import isfile, join
from image_match.goldberg import ImageSignature
from image_match.signature_database_base import make_record
from image_match.mongodb_driver import SignatureMongo

there is this error:

Traceback (most recent call last):
  File "/home/mehdi/ws/temp/image_match/src/sample3.py", line 7, in <module>
    from image_match.mongodb_driver import SignatureMongo
  File "/home/mehdi/venvs/image_match/lib/python3.5/site-packages/image_match/mongodb_driver.py", line 1, in <module>
    from signature_database_base import SignatureDatabaseBase
ImportError: No module named 'signature_database_base'

In the file image_match/mongodb_driver.py, after changing

from signature_database_base import SignatureDatabaseBase
from signature_database_base import normalized_distance
from multiprocessing import cpu_count, Process, Queue
from multiprocessing.managers import Queue as managerQueue
import numpy as np

to

from .signature_database_base import SignatureDatabaseBase
from .signature_database_base import normalized_distance
from multiprocessing import cpu_count, Process, Queue
from multiprocessing.managers import Queue as managerQueue
import numpy as np

another error occurs:

Traceback (most recent call last):
  File "/home/mehdi/ws/temp/image_match/src/sample3.py", line 7, in <module>
    from image_match.mongodb_driver import SignatureMongo
  File "/home/mehdi/venvs/image_match/lib/python3.5/site-packages/image_match/mongodb_driver.py", line 4, in <module>
    from multiprocessing.managers import Queue as managerQueue
ImportError: cannot import name 'Queue'
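
As far as I can tell, the second import only worked on Python 2 because `multiprocessing.managers` itself imported the standard-library `Queue` module, which Python 3 renamed to `queue`. A minimal sketch of an import that works on both, assuming the driver only needs that module for its `Empty` exception (an assumption, since the driver code is not shown here); the alias `manager_queue` and the helper `drain` are hypothetical.

```python
try:
    import queue as manager_queue      # Python 3 name
except ImportError:
    import Queue as manager_queue      # Python 2 name


def drain(q):
    """Collect everything currently in ``q`` without blocking."""
    items = []
    while True:
        try:
            items.append(q.get_nowait())
        except manager_queue.Empty:
            return items


q = manager_queue.Queue()
for word in (42252475, 23885671):
    q.put(word)
print(drain(q))
```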

Logos

Thanks @WojHupert

cairo

I am new to Python. Why did I encounter this problem?
"import cairo # pycairo
ImportError: No module named 'cairo'"
Although I have installed cairo (https://www.cairographics.org/download/), I still cannot fix it.
I am running this on Mac OS X El Capitan 10.11.4.
Do you have a more detailed guide than the quick start? I really need it. If you have one, please send it to my email ([email protected]).
Many thanks!

Storing and Searching

This is the example from the quick start. A TypeError, "TypeError: unorderable types: dict() > dict()", occurred when I ran it. Have you ever encountered this problem?

The code is:

from elasticsearch import Elasticsearch
from image_match.elasticsearch_driver import SignatureES

es = Elasticsearch()
ses = SignatureES(es)

ses.add_image('https://upload.wikimedia.org/wikipedia/commons/thumb/e/ec/Mona_Lisa,_by_Leonardo_da_Vinci,_from_C2RMF_retouched.jpg/687px-Mona_Lisa,_by_Leonardo_da_Vinci,_from_C2RMF_retouched.jpg')
ses.add_image('https://pixabay.com/static/uploads/photo/2012/11/28/08/56/mona-lisa-67506_960_720.jpg')
ses.add_image('https://upload.wikimedia.org/wikipedia/commons/e/e0/Caravaggio_-_Cena_in_Emmaus.jpg')
ses.add_image('https://c2.staticflickr.com/8/7158/6814444991_08d82de57e_z.jpg')

list = ses.search_image('https://pixabay.com/static/uploads/photo/2012/11/28/08/56/mona-lisa-67506_960_720.jpg')
print(list)

When it was executed, "TypeError: unorderable types: dict() > dict()" occurred. The full output is as follows.

runfile('/Users/lvchangtao/local-image-match/storeSearching.py', wdir='/Users/lvchangtao/local-image-match')
/Users/lvchangtao/anaconda3/lib/python3.5/site-packages/image_match/goldberg.py:402: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  lower_y_lim:upper_y_lim]) # no smoothing here as in the paper
Traceback (most recent call last):
  File "", line 1, in
  File "/Users/lvchangtao/anaconda3/lib/python3.5/site-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 714, in runfile
    execfile(filename, namespace)
  File "/Users/lvchangtao/anaconda3/lib/python3.5/site-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 89, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "/Users/lvchangtao/local-image-match/storeSearching.py", line 20, in
    list = ses.search_image('https://pixabay.com/static/uploads/photo/2012/11/28/08/56/mona-lisa-67506_960_720.jpg')
  File "/Users/lvchangtao/anaconda3/lib/python3.5/site-packages/image_match/signature_database_base.py", line 270, in search_image
    r = sorted(np.unique(result).tolist(), key=itemgetter('dist'))
  File "/Users/lvchangtao/anaconda3/lib/python3.5/site-packages/numpy/lib/arraysetops.py", line 198, in unique
    ar.sort()
TypeError: unorderable types: dict() > dict()
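
The traceback points at `np.unique(result)`, which sorts its input; Python 3 no longer defines an ordering between dicts, hence the TypeError. A minimal sketch of a deduplication step that never compares dicts directly, assuming each hit is a dict with `path` and `dist` keys. The function `unique_by_path` is hypothetical and is not the fix the library adopted.

```python
from operator import itemgetter


def unique_by_path(results):
    """Drop repeated hits, keeping the closest one per path, sorted by distance."""
    best = {}
    for hit in results:
        path = hit['path']
        if path not in best or hit['dist'] < best[path]['dist']:
            best[path] = hit
    return sorted(best.values(), key=itemgetter('dist'))


hits = [{'path': 'a.jpg', 'dist': 0.0},
        {'path': 'b.jpg', 'dist': 0.42},
        {'path': 'a.jpg', 'dist': 0.0}]
unique = unique_by_path(hits)
print(unique)
```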

linspace() got an unexpected keyword argument 'dtype'

While I'm trying to run this code:

from image_match.goldberg import ImageSignature
gis = ImageSignature()
a = gis.generate_signature('/home/francesco/Scrivania/2.jpg')
b = gis.generate_signature('/home/francesco/Scrivania/1.jpg')
gis.normalized_distance(a, b)

I get this error:

~$ python '/home/francesco/Scrivania/imagematch.py'
Traceback (most recent call last):
  File "/home/francesco/Scrivania/imagematch.py", line 3, in <module>
    a = gis.generate_signature('/home/francesco/Scrivania/2.jpg')
  File "/usr/local/lib/python2.7/dist-packages/image_match/goldberg.py", line 164, in generate_signature
    n=self.n, window=image_limits)
  File "/usr/local/lib/python2.7/dist-packages/image_match/goldberg.py", line 341, in compute_grid_points
    x_coords = np.linspace(window[0][0], window[0][1], n + 2, dtype=int)[1:-1]
TypeError: linspace() got an unexpected keyword argument 'dtype'

Is it possible to ignore a feature, e.g. 'color - black', in the comparison query?

I'm trying to compare two images, for example a black hand and a white hand, which would be identical if color could be ignored. Is this possible? Are there any properties that can be set in the queries?

Thanks.

Image url in Path Stored as Text Field

I am not an Elasticsearch expert but I think that when indexing a document the 'path' field which contains the image url is being stored as a text field which means that Elasticsearch is internally tokenizing the url. As a consequence I'm not able to query an image url via a term query (it will return no results) to get an exact match. I am able to use a match query however this will also return other documents that have similar url's which is not optimal. Is there any way the 'path' field can be stored as a keyword field so that the urls do not get tokenized and term queries will work?
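
A minimal sketch of what such a mapping and query could look like, assuming Elasticsearch 5.x (where the `keyword` type exists and mapping types are still in use) and a document type named `image`. The helper names are hypothetical, and the mapping would have to be applied when the index is created, before any documents are added.

```python
def keyword_path_mapping(doc_type='image'):
    """Index-creation body that stores ``path`` as an untokenized keyword."""
    return {
        'mappings': {
            doc_type: {
                'properties': {
                    'path': {'type': 'keyword'}
                }
            }
        }
    }


def exact_path_query(url):
    """Term query that matches only documents whose path equals ``url``."""
    return {'query': {'term': {'path': url}}}


body = keyword_path_mapping()
query = exact_path_query('http://i.imgur.com/KUuRtTc.jpg')
print(body)
print(query)
```

The first dict could be passed as the `body` when creating the index with the Python client, and the second as the `body` of a search request.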

cairo error on import ImageSignature

I attempt to import the following:

from image_match.goldberg import ImageSignature

And it raises an issue apparently related to cairo--actually originating in the cairocffi library:

File "/Users/[User]/anaconda/lib/python3.5/site-packages/cairocffi/__init__.py", line 46, in <module>
cairo = dlopen(ffi, 'cairo', 'cairo-2')
raise OSError("dlopen() failed to load a library: %s" % ' / '.join(names))
OSError: dlopen() failed to load a library: cairo / cairo-2

I installed cairo using the MacPorts option and installation seemed successful.

Any ideas?

Problem with TravisCI build

Something is wrong with scikit-image and travis

Adding elasticsearch 5.3.0 to easy-install.pth file
Installed /home/travis/miniconda/envs/test-environment/lib/python3.5/site-packages/elasticsearch-5.3.0-py3.5.egg
Searching for scikit-image<0.13,>=0.12
Reading https://pypi.python.org/simple/scikit-image/
Downloading https://pypi.python.org/packages/86/d0/b0192dc9a544da90f2d9150bcd84b981c6873e42a1f752b6affb89180ad8/scikit-image-0.12.3.tar.gz#md5=04ea833383e0b6ad5f65da21292c25e1
Best match: scikit-image 0.12.3
Processing scikit-image-0.12.3.tar.gz
Writing /tmp/easy_install-ids2h2tl/scikit-image-0.12.3/setup.cfg
Running scikit-image-0.12.3/setup.py -q bdist_egg --dist-dir /tmp/easy_install-ids2h2tl/scikit-image-0.12.3/egg-dist-tmp-mqepoz_1
error: SandboxViolation: mkdir('/home/travis/.config', 511) {}
The package setup script has attempted to modify files on your system
that are not within the EasyInstall build area, and has been aborted.
This package cannot be safely installed by EasyInstall, and may not
support alternate installation locations even if you run its setup
script by hand.  Please inform the package's author and the EasyInstall
maintainers to find out if a fix or workaround is available.
The command "python setup.py install" failed and exited with 1 during .
Your build has been stopped.

Python 3.5 issue with compute_mean_level method in ImageSignature

avg_grey[i, j] = np.mean(image[lower_x_lim:upper_x_lim, lower_y_lim:upper_y_lim]) # no smoothing here as in the paper

lower_x_lim, upper_x_lim, lower_y_lim, and upper_y_lim are causing the slice to throw an exception in Python 3.5.

Does anyone else have this issue?
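
A likely cause, though the issue shows no traceback, is that the limits are floats: in Python 3 the `/` operator always returns a float, and another report on this page shows a "non-integer number instead of an integer" warning at the same line. A minimal pure-Python sketch of the cast, using a list-of-rows image in place of a NumPy array; `block_mean` is a hypothetical stand-in for the relevant part of compute_mean_level, and truncation with `int()` is an assumption about the intended rounding.

```python
def block_mean(image, lower_x, upper_x, lower_y, upper_y):
    """Mean of image[lower_x:upper_x, lower_y:upper_y] for a list-of-rows image."""
    # Slice bounds computed with "/" are floats on Python 3; cast explicitly.
    lower_x, upper_x, lower_y, upper_y = (
        int(v) for v in (lower_x, upper_x, lower_y, upper_y))
    values = [v for row in image[lower_x:upper_x] for v in row[lower_y:upper_y]]
    return sum(values) / float(len(values))


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12]]
print(block_mean(image, 0.0, 2.0, 1.0, 3.0))
```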

What does your cluster look like?

First of all, thanks for this project!

includes a database backend that easily scales to billions of images and supports sustained high rates of image insertion: up to 10,000 images/s on our cluster!

I was wondering what that cluster would look like. How many and what types of nodes for image-match and for Elasticsearch, as well as CPU and memory for each. I'm thinking of using Google's container engine (Kubernetes) for deployment and need to estimate cost.

Hi, I haven't installed it yet but can this search find two .............

Hi, I haven't installed it yet, but can this search find two images of the same person in completely different settings? For example:

OBAMA 1

And OBAMA 2

Feature request: make cairosvg optional

It's a pretty big dependency; I think many people would be happy with PIL / Pillow alone.

Problem install module, cairocffi

I had this issue, but have now found a solution.

OS: Ubuntu 16.04
Python 3.5.2

While installing the needed modules, I ran into a version mismatch error
after entering

sudo pip3 install cairocffi

I got

AssertionError: version mismatch, 1.5.2 != 1.8.3

Full error message from pip install

Solution

Downgrade cffi to previous version and attempt to install cairocffi again.

sudo pip3 uninstall cffi
sudo pip3 install cffi==1.5.2
sudo pip3 install cairocffi

Add documentation for filtered meta-search

Need to write documentation for new feature #63

add ext save

I modified the code to add a field.
You can now attach a JSON text description when adding a picture,
and it is returned when the record is read back.
But I'm not familiar with git and don't know how to submit the change to you,
so I made a compressed package. Please see if you can merge it.
I hope this does not add to your workload.

download url:
http://7jpsbs.com1.z0.glb.clouddn.com/image-match.tar.gz

Two identical images differ significantly

Hi, I have two identical images, but the algorithm outputs two different signatures.

First image:

Second image:

Code below:

from image_match.goldberg import ImageSignature
gis = ImageSignature()
test01 = gis.generate_signature('test01.jpg')
test02 = gis.generate_signature('test02.jpg')
gis.normalized_distance(test01, test02)

which outputs

0.70823708184882128

Is that right?

can't search

Hi, I have a question. I am trying to search for an image but I get nothing in the result. This is my code:
es = Elasticsearch("192.168.20.35:9200")
ses = SignatureES(es)
list = ses.search_image('https://upload.wikimedia.org/wikipedia/commons/thumb/e/ec/Mona_Lisa,_by_Leonardo_da_Vinci,_from_C2RMF_retouched.jpg/687px-Mona_Lisa,_by_Leonardo_da_Vinci,_from_C2RMF_retouched.jpg')
print(list)

And i got [] as the result.

Pypi support

@sbellem has some ideas about fixing the setup... @vrde let's all discuss when you get back?

(Question) Storing hashes in RDBMS

I'm asking this question since I think it might benefit other people.

I made a driver that stores the hashes in a PostgreSQL database. My test database contains 25k images. The query process looks roughly like this:

Choose an image to query database for
Compute the image's signature and lookup words (let's say the words = [12895189, 2517912795, 72159172, 1275215791, ...])

Get IDs and signatures of relevant images, using the lookup words in a query similar to this:

SELECT DISTINCT(image_id), image_signature
FROM image
INNER JOIN image_signature_lookup ON image.image_id = image_signature_lookup.image_id
WHERE image_signature_lookup.word IN (12895189, 2517912795, 72159172, 1275215791, ...)

(for a test image this returns about 7.5k images)

image.image_id, image_signature_lookup.image_id and image_signature_lookup.word are all indexed.

Compute the final distances with ProcessPoolExecutor and numpy, assemble the actual search results

Test query made this way takes 1.2 s. What I'm worried about is scaling this solution - for every 10k images, the database grows by about 75 MB in size, and over 600 000 lookup records are created.

My questions are:

is this anywhere near as fast as it should be, or should I set up elastic search after all?
what N and k values, besides the default ones, could bring down the database size while still being useful for detecting near duplicates (+- JPEG artifacts etc.)?
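
The lookup query described in this issue can be reproduced in a small runnable form. The sketch below substitutes an in-memory SQLite database for PostgreSQL so that it runs without a server; the table and column names follow the query in the issue, while the index name, the text encoding of the signature, and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE image (
        image_id INTEGER PRIMARY KEY,
        image_signature TEXT
    );
    CREATE TABLE image_signature_lookup (
        image_id INTEGER REFERENCES image(image_id),
        word INTEGER
    );
    CREATE INDEX idx_lookup_word ON image_signature_lookup(word);
""")
conn.executemany('INSERT INTO image VALUES (?, ?)',
                 [(1, '0,1,2'), (2, '2,2,0')])
conn.executemany('INSERT INTO image_signature_lookup VALUES (?, ?)',
                 [(1, 12895189), (1, 72159172), (2, 555)])


def candidates(conn, words):
    """Return (image_id, signature) rows sharing at least one lookup word."""
    placeholders = ', '.join('?' for _ in words)
    sql = ('SELECT DISTINCT image.image_id, image.image_signature '
           'FROM image '
           'INNER JOIN image_signature_lookup '
           'ON image.image_id = image_signature_lookup.image_id '
           'WHERE image_signature_lookup.word IN (%s)' % placeholders)
    return conn.execute(sql, list(words)).fetchall()


rows = candidates(conn, [12895189, 2517912795, 72159172])
print(rows)
```

Every candidate row still needs a full signature comparison afterwards, so the number of candidates returned (about 7.5k out of 25k in the test above) probably matters more for total query time than the lookup itself.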

not able to install on raspbian

after trying: sudo pip install image_match
it is stuck forever and shows only this:

Processing /home/pi/image-match
Collecting scikit-image<0.13,>=0.12 (from image-match==1.1.2)
  Using cached scikit-image-0.12.3.tar.gz

The same happens when trying to build from source.

I'm getting elasticsearch.exceptions.ConnectionError

OS: Ubuntu 16.04
Python 3.5.2

I'm just trying to get the basic example working from the Documentation.

I installed using sudo pip3 install . while in the image-match directory. Here's the code I'm running in test.py:

from elasticsearch import Elasticsearch
from image_match.elasticsearch_driver import SignatureES

es = Elasticsearch()
ses = SignatureES(es)

ses.add_image('http://i.imgur.com/KUuRtTc.jpg')

The exception / error I'm getting is a chain of exceptions with the most recent one being

elasticsearch.exceptions.ConnectionError:
ConnectionError(<urllib3.connection.HTTPConnection object at 0x7fb8425c27b8>:
Failed to establish a new connection: [Errno 111] Connection refused)

Full traceback
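
Errno 111 (connection refused) on Linux means nothing accepted the TCP connection, and `Elasticsearch()` with no arguments connects to localhost on port 9200. A quick check that a server is actually listening, using only the standard library, might look like the sketch below; `port_is_open` is a hypothetical helper, not part of image-match or the Elasticsearch client.

```python
import socket


def port_is_open(host='localhost', port=9200, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except (OSError, socket.error):
        return False
    conn.close()
    return True


print(port_is_open())
```

If this prints False, the Elasticsearch server is probably not running or is bound to a different host or port, and the image-match code itself is not at fault.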

Docker image not found

I cannot pull docker image at ascribe/image-match. [not found]
