Comments (12)

duhaime commented on August 18, 2024

Hmm, at that step the layout models should be a dict mapping each layout label to positional vertices based on the image vectors (see get_layout_models). Then, to store those in the JSON format used by the viewer, we iterate over the images rather than the vectors, because errored images are stored in an array and we want to skip them when creating the JSON.
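
A minimal sketch of that skip-while-iterating idea, not the actual pix-plot code. The function build_cells and its arguments are hypothetical names, and it assumes each layout holds one row per successfully vectorized image, so the row index is tracked separately from the image index:

```python
def build_cells(image_paths, errored_images, layout_models):
    """Return one layout record per image that produced a vector.

    layout_models maps a layout label to a list of positions with one row
    per vector. Errored images have no row (assumption), so the row index
    only advances for images that were not skipped.
    """
    errored = set(errored_images)
    layout_keys = sorted(layout_models)
    cells = []
    vec_idx = 0  # index into the layout rows, not into image_paths
    for path in image_paths:
        if path in errored:
            continue  # no vector, hence no layout row, for this image
        layouts = {
            k: [float(j) for j in layout_models[k][vec_idx]]
            for k in layout_keys
        }
        cells.append({'image': path, 'layouts': layouts})
        vec_idx += 1
    return cells
```

If the image index were used to look up layout rows directly, any skipped image would shift every later lookup, and the last lookups could run past the end of the array.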

I've never liked that the image and vector are decoupled like this--it'd be nice to have a minimal class that keeps image and vector together...
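
Something along these lines is what I have in mind, as a rough sketch only. ImageRecord and valid_records are hypothetical names, and dataclasses needs Python 3.7+ (or the backport on 3.6):

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ImageRecord:
    """Keeps an image path and its vector together (hypothetical class)."""
    path: str
    vector: Optional[List[float]] = None  # None when the image errored

    @property
    def ok(self):
        # An image is usable only if a vector was produced for it
        return self.vector is not None


def valid_records(records):
    """Return only the records that carry a vector, preserving order."""
    return [r for r in records if r.ok]
```

With the path and vector stored in one object, filtering out errored images cannot leave the two lists at different lengths.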

If image validation is functioning properly, your reprocessing should go swimmingly. It's been a while since I looked at that bit of the pipeline...

It would be nice to start building a test suite for this kind of thing. If time permits later in the week I'll try and get started on that front and the image+vec unification in a custom class...

from pix-plot.

tokee commented on August 18, 2024

I'll see if I can isolate the images discarded in the validation phase and reproduce with a small enough corpus to share.

duhaime commented on August 18, 2024

Hmm, that's interesting. There may be some memory leak in one of the dimension reduction libraries used. Can you try adding some print statements between the model calls in:

return {
  'tsne_2d': center_features(tsne_2d_model.fit_transform(vecs)),
  'tsne_3d': center_features(tsne_3d_model.fit_transform(vecs)),
  'umap_2d': center_features(umap_2d_model.fit_transform(vecs)),
}

to figure out which is the culprit? If you have the opportunity to monitor / eyeball the RAM usage of each, I'd be curious to hear what you find...
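
For the print statements, a small wrapper like the one below might do, as a sketch under assumptions. run_logged is a hypothetical helper; the resource module is Unix-only, and ru_maxrss is reported in kilobytes on Linux but in bytes on macOS:

```python
import resource
import time
from datetime import datetime


def run_logged(label, fn, *args, **kwargs):
    """Call fn, printing a timestamp before and the peak RSS after."""
    now = datetime.now().isoformat(timespec='seconds')
    print(' * starting %s at %s' % (label, now))
    start = time.time()
    result = fn(*args, **kwargs)
    # Peak resident set size of this process so far (never decreases)
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    elapsed = time.time() - start
    print(' * finished %s in %.1fs, peak RSS so far: %d' % (label, elapsed, peak))
    return result
```

Each model call could then be wrapped, e.g. run_logged('tsne_2d', tsne_2d_model.fit_transform, vecs). Since the peak value only ever grows, the first call after which it jumps is the likely culprit. A call that dies with an OOM will print its "starting" line but never its "finished" line.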

duhaime commented on August 18, 2024

Also, given that the error springs from one of the scikit-learn t-SNE models, you could try replacing those calls with https://github.com/DmitryUlyanov/Multicore-TSNE, which should give a speed boost. I'm not sure how RAM requirements scale in Multicore-TSNE, but it could be worth a shot!

tokee commented on August 18, 2024

Thank you for your suggestions. I have added timestamped debug output around the lines you pointed to and started a script that outputs free memory every 10 seconds. It will take some hours to get a result, which I'll post here. After that I'll try the Multicore-TSNE version.
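
For reference, a memory-watching script of that kind could look roughly like this. It is a sketch rather than the script actually used; it assumes Linux (/proc/meminfo), and parse_meminfo and poll_free_memory are hypothetical names:

```python
import time
from datetime import datetime


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: kB} dict."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(':')
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def poll_free_memory(interval=10, samples=None, path='/proc/meminfo'):
    """Print available memory every `interval` seconds (forever if samples is None)."""
    count = 0
    while samples is None or count < samples:
        with open(path) as handle:
            info = parse_meminfo(handle.read())
        stamp = datetime.now().isoformat(timespec='seconds')
        print(stamp, info.get('MemAvailable'), 'kB available')
        count += 1
        if samples is None or count < samples:
            time.sleep(interval)
```

A poller like this only sees memory that stays allocated across a sampling interval, so a single huge allocation that fails immediately can slip through unnoticed.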

tokee commented on August 18, 2024

The OOM happens when calling tsne_2d_model.fit_transform(vecs). My memory-watching script (polling every 10 seconds) reported no anomalies around that time, so my guess is that the extmath module tries to allocate a single very large structure in return fast_dot(a, b) and fails immediately. My Python skills do not extend to opening up that module and making it output a and b.

I'll see if the Multicore-TSNE helps.

duhaime commented on August 18, 2024

Sounds good! I've also just read promising remarks about the Barnes-Hut t-SNE implementation, which is rumored to scale to much larger datasets with lower RAM requirements: https://github.com/lvdmaaten/bhtsne

tokee commented on August 18, 2024

Running with Multicore-TSNE worked well, with no noticeable memory overhead when reaching the center feature calculation phase. Unfortunately it failed during umap_2d with the error below. Seems like an off-by-one error?

 * center features called umap_2d at 2019-05-04 16:56:26
Traceback (most recent call last):
  File "utils/process_images.py", line 648, in <module>
    tf.app.run()
  File "/data01/dsc/static_content/pixplot/kb_all_experimental/ex/lib64/python3.6/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "utils/process_images.py", line 644, in main
    PixPlot(image_glob)
  File "utils/process_images.py", line 77, in __init__
    if self.flags.process_images: self.process_images()
  File "utils/process_images.py", line 192, in process_images
    self.write_json()
  File "utils/process_images.py", line 396, in write_json
    'cells': self.get_cell_data(),
  File "utils/process_images.py", line 343, in get_cell_data
    layouts = [[float(j) for j in layout_models[k][idx]] for k in layout_keys]
  File "utils/process_images.py", line 343, in <listcomp>
    layouts = [[float(j) for j in layout_models[k][idx]] for k in layout_keys]
IndexError: index 270682 is out of bounds for axis 0 with size 270682

duhaime commented on August 18, 2024

Yes, I think your "off by one" hypothesis sounds right...

Did you validate the input images? If so, did any of the inputs cause issues? If one did, I'd be grateful if you could send it. Either way, something in the image validation pipeline will need addressing if you validated the images...

tokee commented on August 18, 2024

Ah, my bad. I used --validate_images=False as the initial run was with validation. It seems that the code iterates over the number of images rather than the number of loaded image vectors. Not sure if that is the right thing to do or not. Either way, I've started a re-run with --validate_images=True.
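
One way to surface such a mismatch before the long projection step, rather than days later when the JSON is written, would be a count check right after the vectors are loaded. This is only a sketch; check_vector_count is a hypothetical helper, not something in process_images.py:

```python
def check_vector_count(image_paths, vectors, errored_images=()):
    """Raise early if the vector count does not match the usable images.

    Assumes every image that is not listed as errored should have
    exactly one vector.
    """
    expected = len(image_paths) - len(set(errored_images))
    if len(vectors) != expected:
        raise ValueError(
            'expected %d vectors for %d images (%d errored), found %d'
            % (expected, len(image_paths), len(set(errored_images)), len(vectors))
        )
    return expected
```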

tokee commented on August 18, 2024

After a run of 2-3 days, I got the same error (pasted below to also show the timing). Strangely enough, calling top during processing showed only 1 CPU at full tilt most of the time. I changed my imports to use MulticoreTSNE:

...
from sklearn.metrics import pairwise_distances_argmin_min
# from sklearn.manifold import TSNE
from MulticoreTSNE import MulticoreTSNE as TSNE
...

Running with

python utils/process_images.py --clusters=40 --validate_images=True --copy_images=False --image_files="/data01/dsc/static_content/pixplot/kb_all/output/1200/*.jpg"

produced

...
 * loaded 270681 of 270682 image vectors
 * loaded 270682 of 270682 image vectors
 * calculating 40 clusters
 * generating image position data
 * building lower-dimensional projections
 * Calculating all center features at 2019-05-07 13:16:04
 * center features called tsne_2d at 2019-05-08 14:28:08
 * center features called tsne_3d at 2019-05-09 15:16:20
 * center features called umap_2d at 2019-05-09 15:58:49
Traceback (most recent call last):
  File "utils/process_images.py", line 648, in <module>
    tf.app.run()
  File "/data01/dsc/static_content/pixplot/kb_all_experimental/ex/lib64/python3.6/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "utils/process_images.py", line 644, in main
    PixPlot(image_glob)
  File "utils/process_images.py", line 77, in __init__
    if self.flags.process_images: self.process_images()
  File "utils/process_images.py", line 192, in process_images
    self.write_json()
  File "utils/process_images.py", line 396, in write_json
    'cells': self.get_cell_data(),
  File "utils/process_images.py", line 343, in get_cell_data
    layouts = [[float(j) for j in layout_models[k][idx]] for k in layout_keys]
  File "utils/process_images.py", line 343, in <listcomp>
    layouts = [[float(j) for j in layout_models[k][idx]] for k in layout_keys]
IndexError: index 270682 is out of bounds for axis 0 with size 270682
Thu May  9 16:09:31 CEST 2019

duhaime commented on August 18, 2024

The experimental branch has happily been rewritten and shouldn't trigger this anymore @tokee. If you see this behavior again in the future, though, please feel free to reopen!
