
elcorto / imagecluster

182 stars · 9 watchers · 52 forks · 1.33 MB

Cluster images based on image content using a pre-trained deep neural network, optional time distance scaling and hierarchical clustering.

Home Page: https://elcorto.github.io/imagecluster

License: BSD 3-Clause "New" or "Revised" License

Language: Python 100.00%
Topics: image-clustering, clustering, pre-trained, deep-neural-networks, python

imagecluster's People

Contributors

elcorto, scriptsmith, sun-jiao


imagecluster's Issues

Reading Timestamp Error

reading timestamps ...
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/lib/python3.8/multiprocessing/pool.py",

Generate the dendrogram

Great work and very instructive documentation! However, I didn't see any instructions on generating the dendrogram shown on the "Method" page. Could you tell me how you generate the dendrogram using your hierarchical clustering method?

Thanks ahead!
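
Not documented in the project itself, but one way to get such a plot is to feed the fingerprints into scipy's hierarchical clustering and call scipy.cluster.hierarchy.dendrogram on the resulting linkage. A rough sketch follows; the cosine/average choice and the layout of the fingerprints dict are assumptions, not necessarily what the docs figure used:

import os
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import pdist

def plot_dendrogram(fingerprints, outfile='dendrogram.png'):
    # `fingerprints` is assumed to be the dict {filename: 1d feature vector}
    # produced by the package's fingerprint step.
    files = sorted(fingerprints)
    X = np.array([fingerprints[fn] for fn in files])
    # Cosine distance + average linkage is one reasonable choice for image
    # features; the figure in the docs may use different settings.
    Z = linkage(pdist(X, metric='cosine'), method='average')
    dendrogram(Z, labels=[os.path.basename(fn) for fn in files], leaf_rotation=90)
    plt.tight_layout()
    plt.savefig(outfile)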

No module named 'PIL'

from imagecluster import main
Traceback (most recent call last):
File "<pyshell#4>", line 1, in
from imagecluster import main
File "c:\users\eva\desktop\imageclustering\imagecluster-master\imagecluster\main.py", line 3, in
from imagecluster import calc as ic
File "c:\users\eva\desktop\imageclustering\imagecluster-master\imagecluster\calc.py", line 7, in
import PIL.Image
ModuleNotFoundError: No module named 'PIL'
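
PIL is the import name of the Pillow package, so this usually just means Pillow is not installed in the interpreter being used (fix: pip install Pillow). A quick check:

# Quick check that Pillow (which provides the PIL namespace) is importable in
# the same interpreter that runs imagecluster; install it with: pip install Pillow
import PIL
import PIL.Image
print(PIL.__version__)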

Clusters overlapping

Does the provided code support overlap between clusters?
For example, some images should be able to appear in more than one cluster.

ImportError: cannot import name 'main'

I am unable to run the example; it gets stuck at the first import line. The code does not run because there is no module named main. None of the other help, in the form of docstrings, is available either. Kindly rectify the issue at your earliest convenience.

Issue:

 ImportError                               Traceback (most recent call last)
<ipython-input-2-3a7d49ef9779> in <module>()
----> 1 from imagecluster import main

ImportError: cannot import name 'main'
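
This looks like an API change between releases: older examples used imagecluster.main (see the reports further down), while the tracebacks elsewhere on this page show the newer layout with calc, io and postproc. A sketch of the newer-style usage, assuming a recent checkout; exact function names may differ between versions:

# Newer-style imports; the old `main` module is gone in recent versions, which
# is why `from imagecluster import main` fails with this ImportError.
from imagecluster import calc as ic, io as icio, postproc as pp

images, fingerprints, timestamps = icio.get_image_data('pics/')
clusters = ic.cluster(fingerprints, sim=0.5)
pp.make_links(clusters, 'pics/imagecluster/clusters')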

Question on suitable database to store image feature vectors

Hello. I'm looking for suggestions on a suitable database to store the image feature vectors extracted by the function fingerprint(image, model) for a large set of images.

With a database, a new extraction can be added easily without recomputing all image fingerprints. Also, as the database grows, the need to load all fingerprints into memory can be eliminated, something that cannot be done when using pickle.
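
Not something imagecluster ships, but as one possible sketch: a small SQLite table keyed by filename, with each fingerprint stored as a serialized NumPy array, supports incremental additions and selective loading. The names open_db/store/load are made up for illustration:

import io
import sqlite3
import numpy as np

def open_db(path="fingerprints.db"):
    # One row per image: filename as primary key, fingerprint as a binary blob.
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE IF NOT EXISTS fp (fn TEXT PRIMARY KEY, vec BLOB)")
    return con

def store(con, fn, vec):
    # Serialize the 1d numpy fingerprint with np.save into an in-memory buffer.
    buf = io.BytesIO()
    np.save(buf, vec)
    con.execute("INSERT OR REPLACE INTO fp VALUES (?, ?)", (fn, buf.getvalue()))
    con.commit()

def load(con, fn):
    (blob,) = con.execute("SELECT vec FROM fp WHERE fn=?", (fn,)).fetchone()
    return np.load(io.BytesIO(blob))

For very large collections, an HDF5 file or a dedicated vector store / approximate-nearest-neighbour index would serve the same purpose.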

is it possible to add a new image without re-clustering?

Your work is intriguing and I would like to understand what happens if I have a new image to be added to the database. The NN paradigm should allow me to parse the tree and look for the node that is nearest to my new image. What happens with your software in such a case?
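
imagecluster recomputes the clustering from all fingerprints, so there is no built-in incremental update; a rough workaround is to fingerprint the new image and assign it to the nearest existing cluster by centroid distance. The helper nearest_cluster below is hypothetical, not part of the package:

import numpy as np
from scipy.spatial.distance import cdist

def nearest_cluster(new_fp, cluster_fingerprints):
    """Assign a new image to the nearest existing cluster.

    new_fp: 1d fingerprint of the new image.
    cluster_fingerprints: list of 2d arrays, one (n_members, n_features)
    array per existing cluster.
    Returns (cluster index, cosine distance to that cluster's centroid).
    """
    centroids = np.array([c.mean(axis=0) for c in cluster_fingerprints])
    d = cdist(new_fp[None, :], centroids, metric='cosine')[0]
    return int(d.argmin()), float(d.min())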

Why do I get the following error when using this API?

My environment is Windows 7, Python 3.6, TensorFlow 1.8.0.

create image arrays test/imagecluster\images.pk
    exitcode = _main(fd)
  File "D:\Anaconda3\lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "D:\Anaconda3\lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "D:\Anaconda3\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "D:\Anaconda3\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "D:\Anaconda3\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "D:\Anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "D:\Documents\Downloads\imagecluster-master\examples\example_api_minimal.py", line 10, in <module>
    images,fingerprints,timestamps = icio.get_image_data('test/')
  File "D:\Documents\Downloads\imagecluster-master\imagecluster\io.py", line 252, in get_image_data
    images = read_images(imagedir, **img_kwds)
  File "D:\Documents\Downloads\imagecluster-master\imagecluster\io.py", line 184, in read_images
    with Pool(ncores) as pool:
  File "D:\Anaconda3\lib\multiprocessing\context.py", line 119, in Pool
    context=self.get_context())
  File "D:\Anaconda3\lib\multiprocessing\pool.py", line 174, in __init__
    self._repopulate_pool()
  File "D:\Anaconda3\lib\multiprocessing\pool.py", line 239, in _repopulate_pool
    w.start()
  File "D:\Anaconda3\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "D:\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "D:\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 33, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "D:\Anaconda3\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
    _check_not_importing_main()
  File "D:\Anaconda3\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.
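
This is the standard Windows multiprocessing spawn error: the example script creates a worker pool (inside icio.get_image_data) at module level, so every spawned child re-executes it. A minimal sketch of the usual fix, with the top-level calls wrapped in a __main__ guard; the cluster/make_links calls are taken from the other examples on this page and may differ between versions:

from imagecluster import calc as ic, io as icio, postproc as pp

def run():
    # icio.get_image_data reads the images and computes fingerprints via a
    # multiprocessing Pool, which is why the guard below is required when
    # the spawn start method is used (Windows).
    images, fingerprints, timestamps = icio.get_image_data('test/')
    clusters = ic.cluster(fingerprints, sim=0.5)
    pp.make_links(clusters, 'test/imagecluster/clusters')

if __name__ == '__main__':
    run()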

No module named 'imagecluster'

Hi, I ran the first 2 lines of the code
from imagecluster import calc as ic
from imagecluster import postproc as pp

and I get this error: ModuleNotFoundError: No module named 'imagecluster'.

I checked the requirements doc, I have all the necessary packages.

Please help.

transfer learning the pre-trained model

Hi @elcorto ,

This repo is a nice piece of work!
I applied it to my dataset and found that it works fine.
Then I replaced the VGG16 model with VGG19/ResNet50 ... and almost every pretrained model provided by Keras. My view is that transfer learning the model on a private dataset (with a different number of classes) would work better, since the number of clusters in the practical data to be processed is unknown and could be quite different from ImageNet.
Besides, the distance metric could be of an alternative type depending on the kind of task.

Have you done any further research on this repo?
Thanks very much for your contribution!
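
For reference, swapping in another Keras backbone as the feature extractor can be sketched with plain Keras. This is not the package's own get_model; 'fc2' is the name of VGG19's penultimate fully connected layer:

# Rough sketch of a VGG19-based fingerprint function using plain Keras.
import numpy as np
import PIL.Image
from tensorflow.keras.applications.vgg19 import VGG19, preprocess_input
from tensorflow.keras.models import Model

base = VGG19(weights='imagenet', include_top=True)
# Cut the network at the 4096-dim 'fc2' layer instead of the softmax output.
model = Model(inputs=base.input, outputs=base.get_layer('fc2').output)

def fingerprint(fn, size=(224, 224)):
    # Force RGB so RGBA/grayscale images also yield a (224, 224, 3) array.
    img = PIL.Image.open(fn).convert('RGB').resize(size, 3)
    arr = preprocess_input(np.array(img, dtype=float)[None, ...])
    return model.predict(arr)[0, :]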

error on shape size (224, 224, 4), maybe related to keras/tensorflow versions?

======================================================================
ERROR: Failure: ValueError (Error when checking input: expected input_2 to have shape (224, 224, 3) but got array with shape (224, 224, 4))

Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/nose/failure.py", line 39, in runTest
raise self.exc_val.with_traceback(self.tb)
File "/usr/lib/python3/dist-packages/nose/loader.py", line 418, in loadTestsFromName
addr.filename, addr.module)
File "/usr/lib/python3/dist-packages/nose/importer.py", line 47, in importFromPath
return self.importFromDir(dir_path, fqname)
File "/usr/lib/python3/dist-packages/nose/importer.py", line 94, in importFromDir
mod = load_module(part_fqname, fh, filename, desc)
File "/usr/lib/python3.5/imp.py", line 234, in load_module
return load_source(name, filename, file)
File "/usr/lib/python3.5/imp.py", line 172, in load_source
module = _load(spec)
File "", line 693, in _load
File "", line 673, in _load_unlocked
File "", line 673, in exec_module
File "", line 222, in _call_with_frames_removed
File "/home/mobile/imagecluster/test.py", line 2, in
main.main('/home/mobile/app_icons/images', sim=0.5)
File "/home/mobile/imagecluster/imagecluster/main.py", line 28, in main
fps = ic.fingerprints(files, model, size=(224,224))
File "/home/mobile/imagecluster/imagecluster/imagecluster.py", line 131, in fingerprints
return dict((fn, fingerprint(fn, model, size)) for fn in files)
File "/home/mobile/imagecluster/imagecluster/imagecluster.py", line 131, in
return dict((fn, fingerprint(fn, model, size)) for fn in files)
File "/home/mobile/imagecluster/imagecluster/imagecluster.py", line 90, in fingerprint
return model.predict(arr4d_pp)[0,:]
File "/usr/local/lib/python3.5/dist-packages/keras/engine/training.py", line 1147, in predict
x, _, _ = self._standardize_user_data(x)
File "/usr/local/lib/python3.5/dist-packages/keras/engine/training.py", line 749, in _standardize_user_data
exception_prefix='input')
File "/usr/local/lib/python3.5/dist-packages/keras/engine/training_utils.py", line 137, in standardize_input_data
str(data_shape))
ValueError: Error when checking input: expected input_2 to have shape (224, 224, 3) but got array with shape (224, 224, 4)
-------------------- >> begin captured logging << --------------------
PIL.PngImagePlugin: DEBUG: STREAM b'IHDR' 16 13
PIL.PngImagePlugin: DEBUG: STREAM b'IDAT' 41 6482
--------------------- >> end captured logging << ---------------------


Ran 2 tests in 33.882s

FAILED (errors=1)
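
The fourth channel points to RGBA PNGs (common for app icons); converting to RGB before resizing gives the (224, 224, 3) input the model expects. A minimal sketch of the conversion, independent of the package's own loader ('icon.png' is just a placeholder):

# RGBA PNGs have 4 channels; .convert('RGB') drops the alpha channel so the
# resulting array has the (224, 224, 3) shape the VGG input layer expects.
import numpy as np
import PIL.Image

img = PIL.Image.open('icon.png').convert('RGB').resize((224, 224), 2)
arr = np.array(img)
print(arr.shape)   # (224, 224, 3)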

Expected 2D array, got 1D array instead: array=[].

When running
fps = ic.pca(fps, n_components=0.95)
I get the following error:

Traceback (most recent call last):
File "", line 1, in
File "/home/leena/mainproject/imagecluster/imagecluster/calc.py", line 185, in pca
Xp = PCA(kwds).fit(X).transform(X)
File "/usr/lib/python3/dist-packages/sklearn/decomposition/pca.py", line 329, in fit
self._fit(X)
File "/usr/lib/python3/dist-packages/sklearn/decomposition/pca.py", line 370, in _fit
copy=self.copy)
File "/usr/lib/python3/dist-packages/sklearn/utils/validation.py", line 441, in check_array
"if it contains a single sample.".format(array))
ValueError: Expected 2D array, got 1D array instead:
array=[].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
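
The empty array ([]) suggests that fps contains no fingerprints at all, i.e. no images were found or read from the directory. A quick hypothetical sanity check before calling pca:

# Sanity check: `fps` is assumed to be the fingerprints dict from the session
# above. An empty dict means no images were read (wrong path, unsupported or
# unreadable files), which is what produces the 1D empty array inside PCA.
import numpy as np

assert len(fps) > 0, "no fingerprints computed -- check the image directory"
X = np.array(list(fps.values()))
print(X.shape)   # should be (n_images, n_features), e.g. (200, 4096)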

a question about quantity imbalance

I clustered 4000 images, but more than 3000 of them ended up in one category and far fewer in the other categories, resulting in a quantity imbalance. I want to ask how you keep the category sizes from being too different.

OSError: image file is truncated (30 bytes not processed)

I launched your algorithm with this code on 200 pictures:

#!/usr/bin/env python3
from imagecluster import main
main.main('/tmp/pictures/', sim=0.5)

I got this error:

Traceback (most recent call last):
File "test.py", line 4, in
main.main('/tmp/pictures/', sim=0.5)
File "/tmp/imagecluster/imagecluster/main.py", line 28, in main
fps = ic.fingerprints(files, model, size=(224,224))
File "/tmp/imagecluster/imagecluster/imagecluster.py", line 131, in fingerprints
return dict((fn, fingerprint(fn, model, size)) for fn in files)
File "/tmp/imagecluster/imagecluster/imagecluster.py", line 131, in
return dict((fn, fingerprint(fn, model, size)) for fn in files)
File "/tmp/imagecluster/imagecluster/imagecluster.py", line 67, in fingerprint
img = PIL.Image.open(fn).resize(size, 2)
File "/usr/local/lib/python3.5/dist-packages/PIL/Image.py", line 1747, in resize
self.load()
File "/usr/local/lib/python3.5/dist-packages/PIL/ImageFile.py", line 228, in load
"(%d bytes not processed)" % len(b))
OSError: image file is truncated (30 bytes not processed)
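
One workaround for partially downloaded or corrupt files is to let Pillow tolerate truncated images before running the clustering (or to weed out / re-download the broken pictures). A sketch using the same old-style API as the report above:

# Tell Pillow to tolerate truncated image files instead of raising OSError.
import PIL.ImageFile
PIL.ImageFile.LOAD_TRUNCATED_IMAGES = True

from imagecluster import main
main.main('/tmp/pictures/', sim=0.5)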
