This project implements image retrieval from a large image dataset using different image similarity measures, based on the following two approaches.
- Based on a Siamese Network, which is a neural network architecture that contains two or more identical subnetworks
- the Siamese approach can be used on a predefined image dataset, but must be trained on that dataset to work for our purpose
- Using a pre-trained ResNet network to extract features and store them in an LSH similarity index, to get faster responses on large datasets
- Siamese Network
  - to train on a new dataset, look at Siamese-networks-train.ipynb
  - after training on that dataset, it can return a similarity measure between images; example code is at SiameseTest.py, and a minimal sketch is shown below
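Below is a minimal sketch of scoring two images with a trained Siamese model. The `SiameseNetwork` class here is only a stand-in and the checkpoint name is hypothetical; the real architecture, preprocessing, and weights live in Siamese-networks-train.ipynb and SiameseTest.py.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from PIL import Image
from torchvision import transforms

class SiameseNetwork(nn.Module):
    """Stand-in for the repo's Siamese model: one shared embedding branch
    applied to both inputs (the real architecture is in the notebook)."""
    def __init__(self):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(3, 8, 3), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
            nn.Flatten(), nn.Linear(8 * 4 * 4, 64),
        )

    def forward(self, x1, x2):
        return self.branch(x1), self.branch(x2)

preprocess = transforms.Compose([transforms.Resize((100, 100)), transforms.ToTensor()])

def load_image(path):
    # Load an image and add a batch dimension.
    return preprocess(Image.open(path).convert("RGB")).unsqueeze(0)

model = SiameseNetwork()
# model.load_state_dict(torch.load("siamese_checkpoint.pth"))  # hypothetical checkpoint name
model.eval()

with torch.no_grad():
    emb1, emb2 = model(load_image("img1.jpg"), load_image("img2.jpg"))
    distance = F.pairwise_distance(emb1, emb2).item()  # smaller = more similar
print(distance)
```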
- Using a pretrained resnet18 model
  - after feature extraction with the pretrained model, it computes image similarity; example code is at storeLSH.ipynb, and a minimal sketch is shown below
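As a quick illustration of the feature-extraction step, the sketch below takes torchvision's pretrained resnet18, drops its final classification layer, and compares two images by cosine similarity; the exact preprocessing and similarity choice used in the project are in storeLSH.ipynb.

```python
import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import models, transforms

# Drop the final classification layer so the network outputs a 512-d feature vector.
resnet = models.resnet18(pretrained=True)
extractor = torch.nn.Sequential(*list(resnet.children())[:-1])
extractor.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def embed(path):
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return extractor(x).flatten(1)  # shape (1, 512)

# Higher cosine similarity means more similar images.
similarity = F.cosine_similarity(embed("img1.jpg"), embed("img2.jpg")).item()
print(similarity)
```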
# To install the requirements for the project
$ pip install -r requirements.txt
$ pip install grpcio
$ pip install grpcio-tools
Run the following command to generate the gRPC classes for Python.
# run this only inside the Service folder
$ python3.6 -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. image_retrival.proto
To run the service, use the following command.
# run from the project directory; this will start the server
$ python start_service.py
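Once the server is up, it can be called with a small gRPC client along these lines. This is only a hypothetical sketch: the service name, method, request fields, and port below are assumptions, so check image_retrival.proto and the generated image_retrival_pb2 / image_retrival_pb2_grpc modules for the real definitions.

```python
# Hypothetical client sketch; the names below are assumptions taken from the
# proto filename only -- verify everything against image_retrival.proto.
import grpc
import image_retrival_pb2
import image_retrival_pb2_grpc

channel = grpc.insecure_channel("localhost:50051")          # port is an assumption
stub = image_retrival_pb2_grpc.ImageRetrievalStub(channel)  # stub name is an assumption

with open("query.jpg", "rb") as f:
    request = image_retrival_pb2.ImageRequest(image=f.read())  # message/field names are assumptions

response = stub.Retrieve(request)  # RPC name is an assumption
print(response)
```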
If you have nvidia-docker2 installed, you can use Dockerfile.gpu to build your image; otherwise, run the script below.
./deploy_service.sh
Note that the above script decides how to build the Docker container depending on the availability of nvidia-docker. Without a GPU, the cosine similarity and Euclidean similarity computations take 6+ hours, so a GPU is strongly recommended.
We need to mount the classed_data folder, as that is where the images we return reside.
# this will run the service and publish its ports
docker run -it -v $PWD/data/classed_data:/image-retrieval-in-pytorch/data/classed_data -p 8003:8003 -p 8004:8004 singularitynet:image-retrieval-cpu
cd models/
# download the dataset using
bash download.bash
# create the classed_data folder used to generate the hash tables
python preprocess.py
# to generate the hash table
# look at the class inside to work on a specific one of our datasets
python generate_hashtable.py
- As shown in storeLSH.ipynb, you can initialize the LSH engine and add image embeddings after hashing, comparing them by either cosine similarity or Euclidean distance; the table is then saved using pickle.
- To query, first read the pickle file, then query your image against that hash table; it will return the indices of similar items, which in our case are the image paths. A minimal sketch is shown after this list.
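The sketch below walks through this workflow, assuming the NearPy library is what provides the LSH engine; the library choice, hash size, and embedding dimension are assumptions here, and storeLSH.ipynb has the project's actual setup.

```python
import pickle
import numpy as np
from nearpy import Engine
from nearpy.hashes import RandomBinaryProjections
from nearpy.distances import CosineDistance

DIM = 512  # resnet18 feature size (assumption)

# Random binary projections hash each embedding into buckets; candidate matches
# are then ranked by cosine distance (EuclideanDistance is the other option).
engine = Engine(DIM, lshashes=[RandomBinaryProjections("rbp", 10)],
                distance=CosineDistance())

# Add image embeddings; the stored payload (the image path) is what a query returns.
embedding = np.random.randn(DIM)  # stand-in for a real feature vector
engine.store_vector(embedding, data="data/classed_data/cat/1.jpg")

# Save the hash table with pickle.
with open("hash_table.pkl", "wb") as f:
    pickle.dump(engine, f)

# Later: read the pickle file and query with a new embedding; each result is
# (vector, data, distance), where data is the stored image path.
with open("hash_table.pkl", "rb") as f:
    engine = pickle.load(f)
for _, image_path, distance in engine.neighbours(embedding):
    print(image_path, distance)
```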
Preparing a cleaner, higher-resolution dataset can improve the output results.
- Israel Abebe - Author and Maintainer - SingularityNet.io
- Tesfa Yohannes - Maintainer - SingularityNet.io