Comments (5)
Interesting -- if you start Python and try:
import hdbscan
...do you get no response (which is good!) or an error?
from pix-plot.
An error, indeed:
> python
Python 3.7.16 (default, Mar 22 2023, 16:00:53)
[GCC 12.2.1 20230201] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import hdbscan
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/nicolas/.pyenv/versions/3.7.16/lib/python3.7/site-packages/hdbscan/__init__.py", line 1, in <module>
    from .hdbscan_ import HDBSCAN, hdbscan
  File "/home/nicolas/.pyenv/versions/3.7.16/lib/python3.7/site-packages/hdbscan/hdbscan_.py", line 21, in <module>
    from ._hdbscan_linkage import (single_linkage,
  File "hdbscan/_hdbscan_linkage.pyx", line 1, in init hdbscan._hdbscan_linkage
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject
So it has something to do with numpy.
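A minimal sketch of how one might check this outside of pixplot, assuming only the standard library: it tries to import each package and prints either its version or the exception that was raised. The helper name `probe` is hypothetical and not part of pixplot, numpy, or hdbscan.

```python
import importlib


def probe(module_name):
    """Try to import a module and return (ok, detail).

    detail is the module's __version__ on success, or the exception
    type and message on failure.
    """
    try:
        module = importlib.import_module(module_name)
    except Exception as exc:  # ImportError, or ValueError for binary mismatches
        return False, f"{type(exc).__name__}: {exc}"
    return True, getattr(module, "__version__", "unknown version")


# Report the state of the two packages involved in the traceback above.
for name in ("numpy", "hdbscan"):
    ok, detail = probe(name)
    print(f"{name}: {'OK' if ok else 'FAILED'} ({detail})")
```

If numpy imports cleanly but hdbscan fails with the `ValueError` shown above, that would be consistent with an hdbscan binary compiled against a different numpy than the one installed.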
I tried installing different versions of numpy and hdbscan that match pixplot's last release (2020). During those tests I noticed this error:
> pip install hdbscan==0.8.29
Collecting hdbscan==0.8.29
Using cached hdbscan-0.8.29-cp37-cp37m-linux_x86_64.whl
Collecting numpy>=1.20
Using cached numpy-1.21.6-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (15.7 MB)
Requirement already satisfied: scikit-learn>=0.20 in /home/nicolas/.pyenv/versions/3.7.16/lib/python3.7/site-packages (from hdbscan==0.8.29) (0.24.2)
Requirement already satisfied: scipy>=1.0 in /home/nicolas/.pyenv/versions/3.7.16/lib/python3.7/site-packages (from hdbscan==0.8.29) (1.4.0)
Requirement already satisfied: cython>=0.27 in /home/nicolas/.pyenv/versions/3.7.16/lib/python3.7/site-packages (from hdbscan==0.8.29) (0.29.33)
Requirement already satisfied: joblib>=1.0 in /home/nicolas/.pyenv/versions/3.7.16/lib/python3.7/site-packages (from hdbscan==0.8.29) (1.2.0)
Requirement already satisfied: threadpoolctl>=2.0.0 in /home/nicolas/.pyenv/versions/3.7.16/lib/python3.7/site-packages (from scikit-learn>=0.20->hdbscan==0.8.29) (3.1.0)
Installing collected packages: numpy, hdbscan
Attempting uninstall: numpy
Found existing installation: numpy 1.19.5
Uninstalling numpy-1.19.5:
Successfully uninstalled numpy-1.19.5
Attempting uninstall: hdbscan
Found existing installation: hdbscan 0.8.26
Uninstalling hdbscan-0.8.26:
Successfully uninstalled hdbscan-0.8.26
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow 2.5.0 requires numpy~=1.19.2, but you have numpy 1.21.6 which is incompatible.
pixplot 0.0.113 requires numpy==1.19.5, but you have numpy 1.21.6 which is incompatible.
Successfully installed hdbscan-0.8.29 numpy-1.21.6
WARNING: You are using pip version 22.0.4; however, version 23.0.1 is available.
You should consider upgrading via the '/home/nicolas/.pyenv/versions/3.7.16/bin/python3.7 -m pip install --upgrade pip' command.
pixplot did run (with the "HDBSCAN not available" message) with numpy==1.19.5 and hdbscan versions 0.8.24 through 0.8.29.
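The conflict in the pip output above can be illustrated with a small sketch. It checks two candidate numpy versions against the three numpy requirements that appear in that log. The specifier handling here is a deliberately simplified stand-in written for this example (plain numeric versions and only `==`, `>=`, `~=`), not pip's actual resolver, and the helper names are hypothetical.

```python
def parse(version):
    """Turn '1.19.5' into (1, 19, 5); only plain numeric versions are handled."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version, spec):
    """Check a version against one simplified specifier: ==, >= or ~=."""
    for op in ("==", ">=", "~="):
        if spec.startswith(op):
            wanted = parse(spec[len(op):])
            break
    else:
        raise ValueError(f"unsupported specifier: {spec}")
    have = parse(version)
    if op == "==":
        return have == wanted
    if op == ">=":
        return have >= wanted
    # '~=1.19.2' is treated as '>=1.19.2' within the same '1.19.*' series.
    return have >= wanted and have[:len(wanted) - 1] == wanted[:-1]


# numpy requirements copied from the pip log above.
constraints = {
    "tensorflow 2.5.0": "~=1.19.2",
    "pixplot 0.0.113": "==1.19.5",
    "hdbscan 0.8.29": ">=1.20",
}


def report(numpy_version):
    """Map each package to whether the given numpy version meets its requirement."""
    return {pkg: satisfies(numpy_version, spec) for pkg, spec in constraints.items()}


for candidate in ("1.19.5", "1.21.6"):
    print(candidate, report(candidate))
```

Under these three requirements, no single numpy version passes all checks: 1.19.5 satisfies tensorflow and pixplot but not hdbscan 0.8.29, and 1.21.6 does the reverse.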
I believe it has something to do with tensorflow, cuda, libcudart.so.11.0, etc. I am not sure I want to go that deep, since I am using pixplot on a dataset of ~1000 images with an Intel GPU, which would involve even heavier installations.
However, it seems that hdbscan takes the label/category column into account in the clustering, which is particularly interesting in my case. I believe the sklearn KMeans does not; is that correct?
Am I missing anything else without CUML?
HDBSCAN not available; using sklearn KMeans
CUML not available; using umap-learn UMAP
Thank you!
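The two fallback messages quoted in this comment suggest an optional-dependency pattern: try the preferred package and fall back to the next one if it cannot be imported. The following is a generic sketch of that pattern, not pixplot's actual code; the function `first_available` and the labels passed to it are assumptions made for illustration.

```python
import importlib


def first_available(candidates):
    """Return (label, module) for the first candidate that imports.

    candidates is a list of (label, module_name) pairs in order of preference.
    Any exception during import counts as "not available", which also covers
    the ValueError raised on a numpy binary mismatch.
    """
    missing = []
    for label, module_name in candidates:
        try:
            module = importlib.import_module(module_name)
        except Exception:
            missing.append(label)
            continue
        for skipped in missing:
            print(f"{skipped} not available; using {label}")
        return label, module
    raise ImportError("none of the candidates could be imported")


try:
    label, _ = first_available([
        ("HDBSCAN", "hdbscan"),
        ("sklearn KMeans", "sklearn.cluster"),
    ])
    print("clustering backend:", label)
except ImportError as exc:
    print(exc)
```

With a pattern like this, a broken hdbscan install does not stop the run; it only changes which clustering implementation is used.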
CUML is just a library that contains an accelerated implementation of UMAP; no worries there. You're correct that there are some real annoyances around numba and numpy. I'm not sure whether you're on Linux, but there are some notes at the very end of this wiki page that might help:
https://github.com/YaleDHLab/pix-plot/wiki/Ubuntu-20-&-22-with-GPU
I am on Manjaro Linux, but I have an Intel GPU. So I am trying this, installing intel-extension-for-tensorflow 1.1.0, but it upgrades everything and breaks pixplot's requirements.
Again, GPU support and speed are not crucial to me. It is rather hdbscan that could improve my clustering, from what I understand. But maybe I am mistaken?