
HDBSCAN not available (pix-plot issue, open)

lakonis commented on May 25, 2024
HDBSCAN not available

from pix-plot.

Comments (5)

pleonard212 commented on May 25, 2024

Interesting -- if you start Python and try:

import hdbscan

...do you get no response (which is good!) or an error?


lakonis commented on May 25, 2024

An error indeed:

> python                                                                                                
Python 3.7.16 (default, Mar 22 2023, 16:00:53) 
[GCC 12.2.1 20230201] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import hdbscan
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/nicolas/.pyenv/versions/3.7.16/lib/python3.7/site-packages/hdbscan/__init__.py", line 1, in <module>
    from .hdbscan_ import HDBSCAN, hdbscan
  File "/home/nicolas/.pyenv/versions/3.7.16/lib/python3.7/site-packages/hdbscan/hdbscan_.py", line 21, in <module>
    from ._hdbscan_linkage import (single_linkage,
  File "hdbscan/_hdbscan_linkage.pyx", line 1, in init hdbscan._hdbscan_linkage
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

So it has something to do with numpy.
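This "size changed" message is commonly reported when a compiled extension was built against NumPy 1.20 or newer and is then imported under an older NumPy such as 1.19.x. A first step is to see which versions are actually installed. The sketch below is only an illustration: the helper name `installed_versions` is hypothetical, and the `importlib_metadata` backport is assumed to be available on Python 3.7.

```python
def installed_versions(names):
    """Return {distribution name: version string, or None if not installed}."""
    try:
        from importlib import metadata  # standard library on Python 3.8+
    except ImportError:
        # Assumption: the importlib_metadata backport is installed on Python 3.7.
        import importlib_metadata as metadata
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


# Report the packages that appear in the traceback and pip output of this thread.
for name, version in installed_versions(["numpy", "hdbscan", "tensorflow", "pixplot"]).items():
    print(f"{name}: {version or 'not installed'}")
```

If numpy reports 1.19.x while the hdbscan wheel was compiled with a newer NumPy, that mismatch would be consistent with the ValueError above.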

I tried installing different versions of numpy and hdbscan matching pixplot's last release (2020). During those tests I noticed this error:

> pip install hdbscan==0.8.29                                                                                                       
Collecting hdbscan==0.8.29
  Using cached hdbscan-0.8.29-cp37-cp37m-linux_x86_64.whl
Collecting numpy>=1.20
  Using cached numpy-1.21.6-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (15.7 MB)
Requirement already satisfied: scikit-learn>=0.20 in /home/nicolas/.pyenv/versions/3.7.16/lib/python3.7/site-packages (from hdbscan==0.8.29) (0.24.2)
Requirement already satisfied: scipy>=1.0 in /home/nicolas/.pyenv/versions/3.7.16/lib/python3.7/site-packages (from hdbscan==0.8.29) (1.4.0)
Requirement already satisfied: cython>=0.27 in /home/nicolas/.pyenv/versions/3.7.16/lib/python3.7/site-packages (from hdbscan==0.8.29) (0.29.33)
Requirement already satisfied: joblib>=1.0 in /home/nicolas/.pyenv/versions/3.7.16/lib/python3.7/site-packages (from hdbscan==0.8.29) (1.2.0)
Requirement already satisfied: threadpoolctl>=2.0.0 in /home/nicolas/.pyenv/versions/3.7.16/lib/python3.7/site-packages (from scikit-learn>=0.20->hdbscan==0.8.29) (3.1.0)
Installing collected packages: numpy, hdbscan
  Attempting uninstall: numpy
    Found existing installation: numpy 1.19.5
    Uninstalling numpy-1.19.5:
      Successfully uninstalled numpy-1.19.5
  Attempting uninstall: hdbscan
    Found existing installation: hdbscan 0.8.26
    Uninstalling hdbscan-0.8.26:
      Successfully uninstalled hdbscan-0.8.26
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow 2.5.0 requires numpy~=1.19.2, but you have numpy 1.21.6 which is incompatible.
pixplot 0.0.113 requires numpy==1.19.5, but you have numpy 1.21.6 which is incompatible.
Successfully installed hdbscan-0.8.29 numpy-1.21.6
WARNING: You are using pip version 22.0.4; however, version 23.0.1 is available.
You should consider upgrading via the '/home/nicolas/.pyenv/versions/3.7.16/bin/python3.7 -m pip install --upgrade pip' command.
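The three numpy requirements in this pip output cannot all hold at once. As a rough illustration, the sketch below hand-codes those specifiers as predicates (it is not how pip resolves dependencies, and the names `constraints` and `unsatisfied` are hypothetical) and reports which requirement a given numpy version breaks.

```python
def parse(version):
    """Turn a dotted version string like '1.19.5' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


# Requirements copied from the pip output above, written as simple predicates.
constraints = {
    "hdbscan 0.8.29 (numpy>=1.20)": lambda v: v >= (1, 20),
    # ~=1.19.2 means at least 1.19.2 but still within the 1.19 series.
    "tensorflow 2.5.0 (numpy~=1.19.2)": lambda v: (1, 19, 2) <= v < (1, 20),
    "pixplot 0.0.113 (numpy==1.19.5)": lambda v: v == (1, 19, 5),
}


def unsatisfied(numpy_version):
    """List the requirements that the given numpy version violates."""
    v = parse(numpy_version)
    return [name for name, ok in constraints.items() if not ok(v)]


for candidate in ("1.19.5", "1.21.6"):
    print(candidate, "violates:", unsatisfied(candidate))
```

With 1.19.5 only the hdbscan 0.8.29 requirement fails, and with 1.21.6 both the tensorflow and pixplot requirements fail, which matches the conflict pip reports.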

pixplot did run (printing "HDBSCAN not available") with numpy==1.19.5 and hdbscan versions 0.8.24 through 0.8.29.


lakonis commented on May 25, 2024

I believe it has something to do with tensorflow, cuda, libcudart.so.11.0, etc. I am not sure I want to go that deep, since I am using pixplot on a dataset of about 1000 images with an Intel GPU, which would involve heavier installations.

However, it seems that hdbscan takes the label/category column into account when clustering, which is particularly interesting in my case. I believe the sklearn KMeans does not; is that correct?

Am I missing anything else without CUML?

HDBSCAN not available; using sklearn KMeans
CUML not available; using umap-learn UMAP

Thank you!
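Messages like the two above are typical of an optional-import fallback. The sketch below shows that general pattern only; it is an assumption about how such messages can arise, not pixplot's actual code, and `pick_backend` is a hypothetical helper. It also shows why a numpy binary mismatch can end up reported as "not available": the import raises ValueError rather than ImportError, and a broad `except` treats both the same way.

```python
import importlib


def pick_backend(candidates):
    """Return the label of the first module that imports cleanly, else None.

    candidates is a list of (module_name, label) pairs, tried in order.
    """
    for module_name, label in candidates:
        try:
            importlib.import_module(module_name)
            return label
        except Exception as exc:
            # Catches ImportError as well as errors raised during import,
            # such as the ValueError shown in the traceback earlier.
            print(f"{label} not available ({type(exc).__name__}); trying next option")
    return None


backend = pick_backend([
    ("hdbscan", "HDBSCAN"),
    ("sklearn.cluster", "sklearn KMeans"),
])
print("clustering backend:", backend)
```

Printing the exception type, as done here, makes it easier to tell a missing package apart from a package that is installed but fails to load.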


pleonard212 commented on May 25, 2024

CUML is just a library that contains an accelerated implementation of UMAP; no worries there. You're correct that there are some real annoyances around numba and numpy. I'm not sure whether you're on Linux, but there are some notes at the very end of this wiki page that might help:

https://github.com/YaleDHLab/pix-plot/wiki/Ubuntu-20-&-22-with-GPU


lakonis commented on May 25, 2024

I am on Manjaro Linux, but I have an Intel GPU. Therefore, I am trying this, installing intel-extension-for-tensorflow 1.1.0, but it upgrades everything and breaks the pixplot requirements.

Again, GPU support and speed are not crucial to me. It is rather hdbscan that could improve my clustering, from what I understand. But maybe I am mistaken?

