debacl's People

Contributors

bpkent, coaxlab0

debacl's Issues

Hitting overflow issues on sparse data

I was playing with the latest code and hit the following error. My data is sparse, with few clusters (I think!). Are there any workarounds?

  File "cluster.py", line 100, in <module>
    main()
  File "cluster.py", line 72, in main
    debacl_tree = get_debacl_tree(df,k,prune_threshold)
  File "cluster.py", line 27, in get_debacl_tree
    verbose=True)
  File "/usr/local/lib/python2.7/site-packages/debacl/level_set_tree.py", line 1255, in construct_tree
    density = _utl.knn_density(radii, n, p, k)
  File "/usr/local/lib/python2.7/site-packages/debacl/utils.py", line 276, in knn_density
    unit_vol = _np.pi**(p / 2.0) / _spspec.gamma(1 + p / 2.0)
OverflowError: (34, 'Result too large')
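
One possible workaround, sketched here under assumptions rather than taken from DeBaCl itself: the last frame evaluates `_np.pi**(p / 2.0)`, and a plain Python float power raises `OverflowError` once the result exceeds the double range, which for this expression happens at a dimension `p` somewhere above 1,200. Computing the unit ball volume in log space avoids both that power and the equally large gamma value. The function name `log_unit_ball_volume` is hypothetical.

```python
import math

def log_unit_ball_volume(p):
    """Natural log of the volume of the unit ball in p dimensions.

    Uses log(v_p) = (p/2) * log(pi) - lgamma(1 + p/2), which stays finite
    where pi**(p/2) and gamma(1 + p/2) would overflow.
    """
    return 0.5 * p * math.log(math.pi) - math.lgamma(1.0 + 0.5 * p)

# Sanity check against the closed forms in low dimensions.
assert math.isclose(math.exp(log_unit_ball_volume(2)), math.pi)
assert math.isclose(math.exp(log_unit_ball_volume(3)), 4.0 * math.pi / 3.0)

# The direct expression from the traceback fails for large p ...
try:
    math.pi ** (5000 / 2.0)
    direct_overflowed = False
except OverflowError:
    direct_overflowed = True

# ... while the log-space value is an ordinary (very negative) float.
log_v = log_unit_ball_volume(5000)
```

Anything downstream that divides by the volume would then need to subtract `log_v` from a log-density instead, so this is a change to how the estimate is represented, not a drop-in patch.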

huge memory being used

I tested on 389 images, with about 400k SIFT keypoint features of 128 dimensions each.

The process used over 10 GB of RAM and 15 GB of swap. Eventually my SSD ran out of space and the process died without any error message.

I then cut the input to half of the images, about 200k 128-dimensional SIFT keypoint features, and the same thing happened.
Is there any way to fix this?
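
A rough back-of-envelope estimate may help narrow this down. It is only arithmetic on array sizes and assumes 8-byte floats; whether DeBaCl actually materializes a dense n-by-n distance matrix at any step is an assumption to check, not something confirmed here. The neighbor count `k = 10` and the helper `dense_bytes` are hypothetical.

```python
def dense_bytes(rows, cols, itemsize=8):
    """Bytes needed for a dense rows x cols array of itemsize-byte values."""
    return rows * cols * itemsize

GIB = 1024 ** 3
n, p, k = 400_000, 128, 10  # k is a made-up neighbor count

features_gib = dense_bytes(n, p) / GIB        # the input itself: under 0.5 GiB
pairwise_gib = dense_bytes(n, n) / GIB        # dense n x n distances: over 1000 GiB
knn_gib = dense_bytes(n, k) / GIB             # k neighbor distances per point: tiny
half_pairwise_gib = dense_bytes(n // 2, n // 2) / GIB  # still hundreds of GiB

print(features_gib, pairwise_gib, knn_gib, half_pairwise_gib)
```

If a dense pairwise matrix is involved, halving the number of points only cuts that term by a factor of four, which would be consistent with the second run also exhausting memory. Subsampling much more aggressively, or reducing the features before clustering, would be the things to try under that assumption.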

Pruning is buggy

Something is wrong with pruning. It messes up both the size of nodes and the color correspondence. It seems to occur only when the pruning threshold is larger than k, but it needs more investigation.

Surprising behavior for `get_clusters` with the `first-k` method.

When retrieving clusters with the first-k method, there is some surprising behavior for edge cases.

  • If there are more than k root nodes, all roots are returned as clusters, so more than k clusters are returned.
  • If there are fewer than k leaves, each leaf is a cluster, so fewer than k clusters are returned.

The solution is probably to simply warn the user when the number of clusters returned isn't exactly k.
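
The proposed warning could look something like the sketch below. It is a standalone illustration, not DeBaCl code: `warn_if_not_k` is a hypothetical helper, and the cluster IDs are made up.

```python
import warnings

def warn_if_not_k(cluster_ids, k):
    """Warn when the number of distinct clusters differs from the requested k."""
    n_found = len(set(cluster_ids))
    if n_found != k:
        warnings.warn(
            "first-k requested {} clusters but {} were returned".format(k, n_found),
            UserWarning,
        )
    return n_found

# Made-up labels containing four distinct clusters, checked against k = 2.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    n_found = warn_if_not_k([0, 0, 3, 5, 5, 7], k=2)
```

A warning rather than an exception keeps the current return value usable while still flagging both edge cases.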

Need a better error message for plot color nodes if a node is not valid

The get_clusters method can return "-1" values to indicate points that are "noise" for a given cluster labeling scheme. If this "-1" label is passed directly to the color_nodes parameter it's a problem because this isn't a valid tree node ID, but the error message isn't very helpful.

>>> cluster_labels = tree.get_clusters(fill_background=True)
>>> labels = np.unique(cluster_labels[:, 1])
>>> fig = tree.plot(color_nodes=labels)[0]
KeyError                                  Traceback (most recent call last)
<ipython-input-27-8cae5ab76c06> in <module>()
----> 1 fig = tree.plot(color_nodes=labels)[0]

/home/brian/projects/DeBaCl/debacl/level_set_tree.pyc in plot(self, form, horizontal_spacing, color_nodes, colormap)
    313 
    314         for i, ix in enumerate(color_nodes):
--> 315             subtree = self._make_subtree(ix)
    316 
    317             for ix_sub in subtree.nodes.keys():

/home/brian/projects/DeBaCl/debacl/level_set_tree.pyc in _make_subtree(self, ix)
    519 
    520         T = LevelSetTree()
--> 521         T.nodes[ix] = _copy.deepcopy(self.nodes[ix])
    522         T.nodes[ix].parent = None
    523         queue = self.nodes[ix].children[:]

KeyError: -1
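
Until the error message is improved, one way to sidestep the `KeyError` is to drop the noise label and validate the remaining IDs before plotting. The sketch below uses a plain set as a stand-in for `tree.nodes.keys()` and made-up labels; `valid_color_nodes` is a hypothetical helper, not part of DeBaCl.

```python
import numpy as np

def valid_color_nodes(labels, node_ids, noise_label=-1):
    """Return unique labels without the noise label, checking each is a known node ID."""
    labels = [int(x) for x in np.unique(labels) if x != noise_label]
    unknown = [x for x in labels if x not in node_ids]
    if unknown:
        raise ValueError(
            "color_nodes contains IDs that are not tree nodes: {}".format(unknown)
        )
    return labels

# Made-up (point index, cluster label) rows; -1 marks a noise point.
cluster_labels = np.array([[0, 2], [1, 2], [2, -1], [3, 5]])
node_ids = {0, 1, 2, 5}  # stand-in for tree.nodes.keys()

color_nodes = valid_color_nodes(cluster_labels[:, 1], node_ids)
```

The resulting list could then be passed as `color_nodes` to `tree.plot`. The same check, placed inside `plot`, would give the more helpful error this issue asks for.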

knn density estimate produces overflow error for high-dimensional data

I have had success using the DeBaCl kNN density estimate on low-dimensional data. With much higher-dimensional data (vectors of length 500k+) I get an overflow error, which I presume is caused by the $v_d$ (unit ball volume) term in the denominator of the density estimate formula. A formula I found for the unit ball volume suggests it would be nearly 0 at this dimension, leading to division by nearly 0 and an overflow error. For now I'm using 1/(distance to the kth neighbor) as a placeholder density estimate.
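
An alternative to the placeholder is to keep the whole estimate in log space. The sketch below assumes the usual kNN form, density = k / (n · v_p · r^p), with r the distance to the kth neighbor; that DeBaCl's `knn_density` matches this form is an assumption. The function name `knn_log_density` and the radii are made up for illustration.

```python
import math
import numpy as np

def knn_log_density(radii, n, p, k):
    """Log of the kNN density estimate k / (n * v_p * r**p), computed in log space."""
    radii = np.asarray(radii, dtype=float)
    # log of the unit ball volume: (p/2) * log(pi) - lgamma(1 + p/2)
    log_unit_vol = 0.5 * p * math.log(math.pi) - math.lgamma(1.0 + 0.5 * p)
    return math.log(k) - math.log(n) - log_unit_vol - p * np.log(radii)

radii = np.array([0.8, 1.0, 1.7, 2.5])  # made-up kth-neighbor distances
log_dens = knn_log_density(radii, n=4, p=500_000, k=2)

# Both orderings rank points from densest to sparsest.
order_log = np.argsort(-log_dens)
order_inv_radius = np.argsort(-(1.0 / radii))
```

For fixed n, p, and k the estimate is a decreasing function of the radius, so the 1/radius placeholder ranks points the same way as the log-density. The density levels themselves differ, so any level values reported by the tree would be on a different scale.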

end_level has huge number

I am using DeBaCl to cluster image features, and the resulting tree has huge end_level values. Is there something wrong with my implementation?

tree = dcl.construct_tree(featlist, k=2, verbose=True)

geom_tree import error

When I try to import the geom_tree module with the following Python import statement
"from debacl import geom_tree as gtree"
an import error occurs. Kindly help me with this.
