
denstream's Introduction

Hello there 👋 I'm Issa

LinkedIn · Medium · Email

About me 🚀

  • 🎓 I am a Machine Learning Lead @Luko
  • ❤️ I am passionate about Software Engineering, Machine Learning/Deep Learning, Computer Vision and Music
  • 📜 BSc Informatics and MSc Artificial Intelligence
  • 📫 How to reach me: [email protected]
  • 🏠 Paris (🇫🇷)

Issa's github stats

denstream's People

Contributors

issamemari


denstream's Issues

getting empty p_micro_cluster_centers when trying the algorithm with fewer n_samples (10, 50, 100, 300 ...)

As the title says, I'm struggling to figure out **why I am getting empty p_micro_cluster_centers when running Test.py with n_samples set to a smaller number**. Any help will be appreciated.


I don't know why, but the issue does not appear when the data set has many more elements. I tried the algorithm with 500 n_samples and it worked fine.

IMPORTANT: there seems to be a problem in the _partial_fit method when it comes to generating p_micro_clusters ... I think it has to do with the if statement, but I do not know what to do about it:

def _partial_fit(self, sample, weight):
    # Merge the new sample into the closest micro-cluster (or create a new one).
    self._merging(sample, weight)
    # Every self.tp time steps, prune both micro-cluster lists.
    if self.t % self.tp == 0:
        # Keep only potential micro-clusters whose decayed weight is still
        # at least beta * mu.
        self.p_micro_clusters = [p_micro_cluster for p_micro_cluster
                                 in self.p_micro_clusters if
                                 p_micro_cluster.weight() >= self.beta *
                                 self.mu]
        # Lower weight limit Xi for each outlier micro-cluster, based on its age.
        Xis = [((self._decay_function(self.t - o_micro_cluster.creation_time
                                      + self.tp) - 1) /
                (self._decay_function(self.tp) - 1)) for o_micro_cluster in
               self.o_micro_clusters]
        # Drop outlier micro-clusters whose weight fell below their limit.
        self.o_micro_clusters = [o_micro_cluster for Xi, o_micro_cluster in
                                 zip(Xis, self.o_micro_clusters) if
                                 o_micro_cluster.weight() >= Xi]
    self.t += 1

Thanks in advance!
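A possible clarifying note on the pruning condition above: a potential micro-cluster survives the periodic cleanup only while its decayed weight stays at or above beta * mu, and with very few samples the micro-clusters never accumulate enough weight to clear that bar. The short self-contained sketch below (plain NumPy, independent of DenStream.py; the lambd, beta and mu values are illustrative assumptions, not the repo's defaults) shows the effect:

import numpy as np

def decayed_weight(t_now, arrival_times, lambd):
    # Decayed weight of a micro-cluster that absorbed one point at each arrival
    # time, using the DenStream fading function f(t) = 2^(-lambda * t).
    arrival_times = np.asarray(arrival_times, dtype=float)
    return np.sum(2.0 ** (-lambd * (t_now - arrival_times)))

lambd, beta, mu = 0.1, 0.5, 6.0   # illustrative values only
threshold = beta * mu             # pruning threshold used in _partial_fit

# A micro-cluster that only ever absorbed 2 points cannot exceed weight 2,
# so it is pruned whenever beta * mu > 2 and p_micro_clusters ends up empty.
print(decayed_weight(10, [8, 9], lambd), "<", threshold)

# With many points arriving steadily, the weight clears the threshold.
print(decayed_weight(20, range(20), lambd), ">", threshold)

In other words, increasing n_samples (or lowering beta and mu) is what allows potential micro-clusters, and hence p_micro_cluster_centers, to be non-empty.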

fit_predict only returns labels for newly added data, is that intended?

Hello. I can see in the code that fit_predict takes an X array of data, adds it to the micro-clusters, and then runs DBSCAN on them (the micro-clusters). However, it then returns labels only for X, not for pre-existing data.

A more concrete example of how I want to use the algorithm:

  • I have multiple days of data.
  • I run partial_fit using only the 1st day, which creates some micro-clusters.
  • Then I run partial_fit on the data of the 2nd day, which updates the existing micro-clusters.
  • And so on.

If on my final day I want to run fit_predict to get the final clustering result, I have to pass all the points (of all the days) as X, because the function only labels those points (see the sketch below). Is this how it is intended?
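For reference, a minimal sketch of the workflow described in this question, assuming the DenStream class from this repo is imported from DenStream.py and exposes partial_fit(X) / fit_predict(X) as used in Test.py (the constructor arguments below are illustrative assumptions, not documented defaults):

import numpy as np
from DenStream import DenStream  # assumed import path within this repo

rng = np.random.default_rng(0)
# Three "days" of 2-D data, each day drawn around a different location.
days = [rng.normal(loc=i, scale=0.3, size=(200, 2)) for i in range(3)]

clusterer = DenStream(lambd=0.1, eps=0.5, beta=0.5, mu=3)  # illustrative parameters

# Online phase: update the micro-clusters one day at a time.
for day in days:
    clusterer.partial_fit(day)

# Offline phase: because fit_predict only labels the array it is given,
# the concatenation of all days has to be passed to label every historical point.
all_points = np.vstack(days)
labels = clusterer.fit_predict(all_points)
print(len(labels))  # one label per point across all days

Note that in this sketch fit_predict also re-absorbs the concatenated points into the micro-clusters, which is part of what the question is asking about.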

Get items from each cluster

Hi, is there a way to get the elements of each cluster? For example, if a cluster is made up of a few micro-clusters and contains 10 elements, how can I get those 10 elements with their X, Y coordinates?
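The implementation itself does not appear to keep the raw samples, so one workaround (a sketch under that assumption: you retain the original points yourself and label them with fit_predict on the same array) is to group the points by the returned label:

import numpy as np
from collections import defaultdict

def group_points_by_label(points, labels):
    # Return {cluster_label: array of (X, Y) points assigned to that cluster}.
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    return {label: np.asarray(members) for label, members in groups.items()}

# Usage, with `labels` obtained from fit_predict on the same `X`:
# clusters = group_points_by_label(X, labels)
# clusters[0]  -> the X, Y coordinates of every element in cluster 0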

fixed eps for dbscan when clustering micro clusters?

Hi! I was reading the code and came across this:

file DenStream.py, line 130: dbscan = DBSCAN(eps=0.3, algorithm='brute')

I'm trying to understand the algorithm so here are my questions:

  1. Why set a fixed DBSCAN eps parameter? Shouldn't it vary according to each problem/data set?
  2. If fixed, why 0.3 instead of another value?

Reading the MOA (Java framework) DenStream implementation, I found the following: a constant is multiplied by the original 'epsilon' parameter to obtain the value of the DBSCAN epsilon ...


https://github.com/Waikato/moa/blob/master/moa/src/main/java/moa/clusterers/denstream/WithDBSCAN.java
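A hedged sketch of how the offline step could mirror MOA's behaviour, where the DBSCAN epsilon is the online epsilon scaled by a constant rather than a hard-coded 0.3 (the offline_factor argument and the idea of threading the stream epsilon through are assumptions for illustration, not the repo's actual API):

from sklearn.cluster import DBSCAN

def build_offline_dbscan(stream_eps, offline_factor=2.0):
    # Offline DBSCAN whose eps follows the online epsilon instead of being fixed.
    return DBSCAN(eps=offline_factor * stream_eps, algorithm='brute')

# Example: with an online epsilon of 0.15 this reproduces the current eps=0.3,
# but the value now tracks the parametrization of the stream.
dbscan = build_offline_dbscan(stream_eps=0.15)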

Is there a way to not assign every point to a cluster?

In fit_predict I see that labels are returned. Each point is assigned to the nearest micro-cluster, and hence every point gets a label. Yet this methodology uses DBSCAN, which allows points to remain unassigned. Is there a way to adapt this code so that some points are left unlabeled (e.g. -1) when they are outliers? In practice, I see points that are too far from the centers and could not reasonably belong to any other micro-cluster either; these should remain unassigned.
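One possible adaptation, sketched under assumptions: after the nearest-micro-cluster assignment, mark a point as noise (-1) when it is farther than a chosen radius from every potential micro-cluster centre. Here `centers` is assumed to be an array of p-micro-cluster centres and `max_radius` a user-chosen cutoff; neither is an existing parameter of this repo.

import numpy as np

def label_with_noise(points, centers, labels, max_radius):
    # Return a copy of `labels` where points too far from every centre become -1.
    points = np.asarray(points, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Distance from every point to every centre, shape (n_points, n_centers).
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    nearest = dists.min(axis=1)
    noisy = np.asarray(labels).copy()
    noisy[nearest > max_radius] = -1
    return noisy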
