Hello, I have the following packages running Python 3.7.16:

HDBSCAN not available about pix-plot HOT 5 OPEN

lakonis commented on May 25, 2024

HDBSCAN not available

from pix-plot.

Comments (5)

pleonard212 commented on May 25, 2024

Interesting -- if you start Python and try:

import hdbscan

...do you get no response (which is good!) or an error?
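The same check can be scripted so that the exception text is captured instead of scrolling past. This is only a minimal sketch: `try_import` is a hypothetical helper name, and nothing here reflects how pix-plot itself detects HDBSCAN. It catches `Exception` rather than only `ImportError`, because a compiled extension can fail at import time with other exception types.

```python
import importlib


def try_import(module_name):
    """Return None if the module imports cleanly, else a short error string."""
    try:
        importlib.import_module(module_name)
    except Exception as exc:  # compiled extensions may raise more than ImportError
        return "{}: {}".format(type(exc).__name__, exc)
    return None


if __name__ == "__main__":
    error = try_import("hdbscan")
    print("hdbscan imports fine" if error is None else "hdbscan failed -> " + error)
```

Run it with the same interpreter that runs pixplot, otherwise the result describes a different environment.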

lakonis commented on May 25, 2024

Error indeed:

> python                                                                                                
Python 3.7.16 (default, Mar 22 2023, 16:00:53) 
[GCC 12.2.1 20230201] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import hdbscan
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/nicolas/.pyenv/versions/3.7.16/lib/python3.7/site-packages/hdbscan/__init__.py", line 1, in <module>
    from .hdbscan_ import HDBSCAN, hdbscan
  File "/home/nicolas/.pyenv/versions/3.7.16/lib/python3.7/site-packages/hdbscan/hdbscan_.py", line 21, in <module>
    from ._hdbscan_linkage import (single_linkage,
  File "hdbscan/_hdbscan_linkage.pyx", line 1, in init hdbscan._hdbscan_linkage
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

So it has something to do with numpy.
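This kind of error generally means that a compiled extension was built against a different numpy version than the one being imported at runtime, so the first thing worth recording is which versions are actually installed. A small sketch, assuming the distribution names below match what pip installed; `installed_version` is a hypothetical helper, and the `pkg_resources` fallback is there because `importlib.metadata` only exists from Python 3.8 onward.

```python
try:
    from importlib import metadata  # Python 3.8+
except ImportError:
    metadata = None


def installed_version(dist_name):
    """Return the installed version string for a distribution, or None."""
    if metadata is not None:
        try:
            return metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            return None
    import pkg_resources  # fallback for Python 3.7, ships with setuptools

    try:
        return pkg_resources.get_distribution(dist_name).version
    except pkg_resources.DistributionNotFound:
        return None


# Distribution names taken from the thread; adjust if yours differ.
for name in ("numpy", "hdbscan", "tensorflow", "pixplot"):
    print(name, installed_version(name) or "not installed")
```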

I tried installing different versions of numpy and hdbscan matching pixplot's last release (2020), and during those tests I noticed this error:

> pip install hdbscan==0.8.29                                                                                                       
Collecting hdbscan==0.8.29
  Using cached hdbscan-0.8.29-cp37-cp37m-linux_x86_64.whl
Collecting numpy>=1.20
  Using cached numpy-1.21.6-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (15.7 MB)
Requirement already satisfied: scikit-learn>=0.20 in /home/nicolas/.pyenv/versions/3.7.16/lib/python3.7/site-packages (from hdbscan==0.8.29) (0.24.2)
Requirement already satisfied: scipy>=1.0 in /home/nicolas/.pyenv/versions/3.7.16/lib/python3.7/site-packages (from hdbscan==0.8.29) (1.4.0)
Requirement already satisfied: cython>=0.27 in /home/nicolas/.pyenv/versions/3.7.16/lib/python3.7/site-packages (from hdbscan==0.8.29) (0.29.33)
Requirement already satisfied: joblib>=1.0 in /home/nicolas/.pyenv/versions/3.7.16/lib/python3.7/site-packages (from hdbscan==0.8.29) (1.2.0)
Requirement already satisfied: threadpoolctl>=2.0.0 in /home/nicolas/.pyenv/versions/3.7.16/lib/python3.7/site-packages (from scikit-learn>=0.20->hdbscan==0.8.29) (3.1.0)
Installing collected packages: numpy, hdbscan
  Attempting uninstall: numpy
    Found existing installation: numpy 1.19.5
    Uninstalling numpy-1.19.5:
      Successfully uninstalled numpy-1.19.5
  Attempting uninstall: hdbscan
    Found existing installation: hdbscan 0.8.26
    Uninstalling hdbscan-0.8.26:
      Successfully uninstalled hdbscan-0.8.26
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow 2.5.0 requires numpy~=1.19.2, but you have numpy 1.21.6 which is incompatible.
pixplot 0.0.113 requires numpy==1.19.5, but you have numpy 1.21.6 which is incompatible.
Successfully installed hdbscan-0.8.29 numpy-1.21.6
WARNING: You are using pip version 22.0.4; however, version 23.0.1 is available.
You should consider upgrading via the '/home/nicolas/.pyenv/versions/3.7.16/bin/python3.7 -m pip install --upgrade pip' command.

pixplot has worked (with the "HDBSCAN not available" message) with numpy==1.19.5 and hdbscan versions 0.8.24 through 0.8.29.
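The conflict in the pip output above can be checked by hand: the three numpy requirements it lists cannot all hold for a single numpy version. The sketch below is a deliberately simplified comparison, not pip's resolver. It handles only `==`, `>=` and `~=` on purely numeric versions, and `parse` and `satisfies` are hypothetical helper names.

```python
def parse(version):
    """Turn '1.19.5' into (1, 19, 5) for tuple comparison."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version, spec):
    """Check a version against one specifier: '==X', '>=X' or '~=X' only."""
    op, wanted = spec[:2], parse(spec[2:])
    have = parse(version)
    if op == "==":
        return have == wanted
    if op == ">=":
        return have >= wanted
    if op == "~=":
        # Compatible release: at least `wanted`, same leading components.
        return have >= wanted and have[:len(wanted) - 1] == wanted[:-1]
    raise ValueError("unsupported operator in " + spec)


# Constraints copied from the pip output in this thread.
constraints = {
    "hdbscan 0.8.29": ">=1.20",
    "tensorflow 2.5.0": "~=1.19.2",
    "pixplot 0.0.113": "==1.19.5",
}

for candidate in ("1.19.5", "1.21.6"):
    for package, spec in constraints.items():
        verdict = "ok" if satisfies(candidate, spec) else "CONFLICT"
        print("numpy {} vs {} ({}): {}".format(candidate, package, spec, verdict))
```

With these constraints, numpy 1.19.5 satisfies tensorflow and pixplot but not hdbscan 0.8.29, while numpy 1.21.6 satisfies only hdbscan 0.8.29.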

lakonis commented on May 25, 2024

I believe it has something to do with tensorflow, cuda, libcudart.so.11.0, etc. I am not sure I want to go that deep, since I am using pixplot on a dataset of ~1000 images with an Intel GPU, which would involve even heavier installations.

However, it seems that hdbscan takes the label/category column into account in the clustering, which is particularly interesting in my case. I believe the sklearn KMeans does not, is that correct?

Am I missing anything else without CUML?

HDBSCAN not available; using sklearn KMeans
CUML not available; using umap-learn UMAP

Thank you!

pleonard212 commented on May 25, 2024

CUML is just a library that contains an accelerated implementation of UMAP; no worries there. You're correct that there are some real annoyances around numba and numpy. Not sure if you're on Linux or not, but there are some notes at the very end of this wiki page that might help:

https://github.com/YaleDHLab/pix-plot/wiki/Ubuntu-20-&-22-with-GPU

lakonis commented on May 25, 2024

I am on Manjaro Linux, but I have an Intel GPU. So I am trying this: installing intel-extension-for-tensorflow 1.1.0, but it upgrades everything and breaks the pixplot requirements.

Again, GPU support and speed are not crucial to me. It is rather hdbscan that could improve my clustering, from what I understand. But maybe I am mistaken?
