Comments (7)

lmcinnes commented on May 17, 2024

I don't think a sparse distance matrix is actually supported yet (it should be; it's a valid use case). Sparse matrix input as feature vectors works, but a sparse distance matrix is a little trickier. I believe it could be implemented, but it would be very slow on a matrix of the size you have. The main difficulty is computing the core distance for each point, which involves sorting every row (or column, but I see you have a CSR matrix, so let's deal with rows). That is a lot of sorts to run, even if each is small. Let me know if you're interested and I'll try to get this implemented when I get a chance.
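For illustration only, here is a rough sketch of the per-row work described above. It is not the hdbscan implementation. The function name `sparse_core_distances` and the parameter `k` are made up for this example, and it assumes that only stored (nonzero) entries count as neighbours, that the point itself is not stored on the diagonal, and that a row with fewer than `k` stored entries gets an infinite core distance. It uses `np.partition` rather than a full sort, since only the k-th smallest value in each row is needed.

```python
import numpy as np
from scipy.sparse import csr_matrix


def sparse_core_distances(dist, k):
    """Return the k-th smallest stored distance in each row of a sparse matrix.

    Hypothetical helper: rows with fewer than k stored entries get np.inf.
    """
    dist = csr_matrix(dist)
    n = dist.shape[0]
    core = np.full(n, np.inf)
    for i in range(n):
        # Stored distances of row i live in one contiguous slice of dist.data.
        row = dist.data[dist.indptr[i]:dist.indptr[i + 1]]
        if row.size >= k:
            # Partial selection: cheaper than sorting the whole row.
            core[i] = np.partition(row, k - 1)[k - 1]
    return core


# Toy symmetric distance matrix: 4 points, point 3 has a single neighbour.
rows = [0, 0, 1, 1, 2, 2, 2, 3]
cols = [1, 2, 0, 2, 0, 1, 3, 2]
vals = [1.0, 4.0, 1.0, 2.0, 4.0, 2.0, 5.0, 5.0]
dist = csr_matrix((vals, (rows, cols)), shape=(4, 4))

core = sparse_core_distances(dist, k=2)
print(core)
```

The loop still touches every row once, so the cost grows with the number of points even when each row is short.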

from hdbscan.

lmcinnes commented on May 17, 2024

I have implemented support for this now; the latest changes committed to master should cover it. I can't speak for performance at this time (I suspect it won't be great), but it shouldn't be too bad. I leaned heavily on the minimum spanning tree implementation in scipy.sparse.csgraph, so hopefully that will suffice if your distance matrix is sparse enough.

moi90 commented on May 17, 2024

Does the hdbscan implementation depend on a specific sparse matrix implementation? Can I use all flavours in scipy.sparse? What about Pysparse?

lmcinnes commented on May 17, 2024

moi90 commented on May 17, 2024

Thanks! Another question: Does your implementation of MST inherit the following property? "This routine uses undirected graphs as input and output. That is, if graph[i, j] and graph[j, i] are both zero, then nodes i and j do not have an edge connecting them. If either is nonzero, then the two are connected by the minimum nonzero value of the two." Or (equivalently): Is it enough to store just one direction of every edge?
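The quoted property of SciPy's routine can be checked on a toy graph. The sketch below only exercises `scipy.sparse.csgraph.minimum_spanning_tree` directly; it says nothing about what hdbscan does before or after calling it. The 4-node matrix is invented for the example.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import minimum_spanning_tree

# Symmetric distances for a 4-node toy graph (0 means "no edge").
full = np.array([
    [0, 1, 4, 0],
    [1, 0, 2, 6],
    [4, 2, 0, 3],
    [0, 6, 3, 0],
], dtype=float)

upper_only = csr_matrix(np.triu(full))  # each edge stored in one direction
symmetric = csr_matrix(full)            # each edge stored in both directions

mst_upper = minimum_spanning_tree(upper_only)
mst_sym = minimum_spanning_tree(symmetric)

# Both inputs describe the same undirected graph, so the trees weigh the same.
total_upper = mst_upper.sum()
total_sym = mst_sym.sum()
print(total_upper, total_sym)
```

For the plain minimum spanning tree, then, storing one direction per edge gives the same total weight as storing both.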

lmcinnes commented on May 17, 2024

If you are using the sparse distance matrix input format (i.e. a sparse matrix as X with metric='precomputed'), then in principle yes. The catch is that the distance matrix needs to be converted into a mutual reachability distance matrix. That step changes the requirement to "If either is nonzero, then the two are connected by the maximum nonzero value of the two", because of the way things get calculated. I think you might also need the full symmetric matrix to compute core distances correctly in that step, but I would have to look through the code and think a little more about that.
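To make the concern concrete, here is a small sketch under stated assumptions; it is not taken from the hdbscan source. It shows why a one-direction matrix leaves some rows without neighbours, how such a matrix can be symmetrized with SciPy's element-wise `maximum`, and how the usual mutual reachability definition, max(core(i), core(j), d(i, j)), can be applied to the stored entries. The helper name `mutual_reachability` and the core distance values are hypothetical.

```python
import numpy as np
from scipy.sparse import csr_matrix

# One direction per edge: row 2 ends up with no stored neighbours at all,
# so a per-row core distance cannot be computed for it from this matrix.
upper = csr_matrix(np.array([
    [0.0, 1.0, 4.0],
    [0.0, 0.0, 2.0],
    [0.0, 0.0, 0.0],
]))

# Symmetrize by taking the element-wise maximum with the transpose.
symmetric = upper.maximum(upper.T.tocsr()).tocsr()


def mutual_reachability(dist, core):
    """Apply max(core[i], core[j], d[i, j]) to every stored entry (hypothetical helper)."""
    coo = dist.tocoo()
    data = np.maximum(coo.data, np.maximum(core[coo.row], core[coo.col]))
    return csr_matrix((data, (coo.row, coo.col)), shape=coo.shape)


# Made-up core distances, chosen only to make the effect of the max visible.
core = np.array([1.0, 1.5, 3.0])
mr = mutual_reachability(symmetric, core)
print(mr.toarray())
```

Because each entry is raised to at least the larger of the two core distances, the result stays symmetric whenever the input is symmetric.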

moi90 commented on May 17, 2024

Thanks! It seems that precomputing the distance matrix for a large dataset is not the best idea...
