predict for HDBSCAN (issue in dbscan, closed)

mhahsler commented on May 24, 2024
predict for HDBSCAN

from dbscan.

Comments (9)

peekxc commented on May 24, 2024

Predicting which clusters new points belong to can be done simply with the cluster membership probabilities, either for the default clustering returned or for the clusters obtained by cutree-ing the hierarchy (see this issue).

One small technical issue: since both DBSCAN and HDBSCAN are unsupervised clustering frameworks, the predicted clusters won't necessarily match the result of rerunning DBSCAN/HDBSCAN on the original data set combined with the new data, i.e. cluster(X) + predict(new X) != cluster(X + new X). But if people are fine with this for DBSCAN, then I don't see why this functionality shouldn't be added to HDBSCAN.
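The mismatch can be checked empirically. Below is a minimal sketch using the existing DBSCAN predict method; the toy data, the seed, and the eps/minPts values are assumptions chosen only for illustration. It compares the labels predicted for held-out points against the labels those same points receive when everything is clustered together.

```r
library(dbscan)

set.seed(42)
# Toy data (assumption for illustration): two separated Gaussian blobs
x <- rbind(
  matrix(rnorm(200, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(200, mean = 3, sd = 0.3), ncol = 2)
)

# Hold out 20 points to play the role of "new X"
idx   <- sample(nrow(x), 20)
x_old <- x[-idx, ]
x_new <- x[idx, ]

# cluster(X) + predict(new X)
fit_old  <- dbscan(x_old, eps = 0.3, minPts = 5)
pred_new <- predict(fit_old, newdata = x_new, data = x_old)

# cluster(X + new X), then look up the labels of the held-out points
fit_all  <- dbscan(x, eps = 0.3, minPts = 5)
recl_new <- fit_all$cluster[idx]

# Cluster IDs are arbitrary, so compare with a cross-tabulation
# rather than element-wise equality; 0 marks noise.
table(predicted = pred_new, reclustered = recl_new)
```

Rows and columns that do not line up one-to-one in the table indicate points whose assignment differs between the two approaches.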

mhahsler commented on May 24, 2024

Predicting cluster membership on new data is a useful thing and should be added.

moredatapls commented on May 24, 2024

Sounds good, I will try to create a PR.

Predicting which clusters new points belong to can be done simply with the cluster membership probabilities, either for the default clustering returned or for the clusters obtained by cutree-ing the hierarchy (see this issue).

@peekxc could you clarify what you mean by that? I'm not entirely sure how to implement your suggestions. What do you mean by the "default clustering"?

peekxc commented on May 24, 2024

@moredatapls What I mean is that HDBSCAN is not a single clustering algorithm per se. If you run hdbscan, it creates a hierarchy and then optimizes a mass-sensitive criterion to generate a set of local 'cuts' in that hierarchy. The clusters resulting from these cuts are what I refer to as the 'default' clustering.

But HDBSCAN isn't limited to just those local cuts; you can also use it as you would a more traditional cluster hierarchy, e.g.

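library(dbscan)
data("DS3")
res <- hdbscan(DS3, minPts = 50)  # the default clustering is in res$cluster
cutree(res$hc, k = 8)             # cut the hclust hierarchy into 8 clusters instead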

For the prediction though, I think the default clustering is fine.
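To make the distinction concrete, here is a minimal sketch that puts the default clustering next to a cutree-based one. It assumes the `cluster`, `membership_prob`, and `hc` components of the object returned by hdbscan; the choice of k = 8 simply mirrors the example above.

```r
library(dbscan)
data("DS3")

res <- hdbscan(DS3, minPts = 50)

# Default clustering from the local cuts: one label per point, 0 marks noise
table(res$cluster)

# Per-point membership probabilities that accompany the default clustering
summary(res$membership_prob)

# Alternative: a single global cut of the hierarchy into k = 8 clusters
alt <- cutree(res$hc, k = 8)

# Cross-tabulate to see how the two labelings relate
table(default = res$cluster, cut = alt)
```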

mhahsler commented on May 24, 2024

I think the default clustering is fine. I have now extracted the predict functions into their own file, predict.R. Please put the code for HDBSCAN there.

mhahsler commented on May 24, 2024

@peekxc: Please review the code.

jwijffels commented on May 24, 2024

+1 for a predict.hdbscan; it is something we need if we want to implement https://github.com/michalovadek/top2vecr and put a package for it on CRAN.

mhahsler commented on May 24, 2024

hdbscan now has a predict function.
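A minimal usage sketch, assuming an installed dbscan version that includes predict for hdbscan objects and that it follows the same predict(object, newdata, data) calling convention as the DBSCAN method. The train/new split and the seed are made up for illustration.

```r
library(dbscan)
data("DS3")

X <- as.matrix(DS3)

# Hold out 100 points to act as new, unseen data (illustrative split)
set.seed(1)
idx        <- sample(nrow(X), 100)
train      <- X[-idx, ]
new_points <- X[idx, ]

# Fit HDBSCAN on the training points only
res <- hdbscan(train, minPts = 50)

# Assumed signature: the original data is passed again via `data`,
# as with predict for dbscan objects.
pred <- predict(res, newdata = new_points, data = train)

# Predicted labels refer to the default clustering; 0 marks noise
table(pred)
```

As noted earlier in the thread, these labels need not match what a fresh hdbscan run on all points would produce.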

jwijffels commented on May 24, 2024

thanks!
