coaxlab / debacl
Density Based Clustering (DeBaCl) Toolbox
License: BSD 3-Clause "New" or "Revised" License
I was playing with the latest code and seem to have hit the following issue. My data is sparse with few clusters (I think!). Are there any workarounds?
File "cluster.py", line 100, in <module>
main()
File "cluster.py", line 72, in main
debacl_tree = get_debacl_tree(df,k,prune_threshold)
File "cluster.py", line 27, in get_debacl_tree
verbose=True)
File "/usr/local/lib/python2.7/site-packages/debacl/level_set_tree.py", line 1255, in construct_tree
density = _utl.knn_density(radii, n, p, k)
File "/usr/local/lib/python2.7/site-packages/debacl/utils.py", line 276, in knn_density
unit_vol = _np.pi**(p / 2.0) / _spspec.gamma(1 + p / 2.0)
OverflowError: (34, 'Result too large')
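The overflow comes from evaluating pi**(p / 2.0) and the gamma function directly when p is large. One possible workaround, sketched here and not taken from DeBaCl itself, is to work on the log scale. The function names are hypothetical, and the density formula k / (n * v_p * r**p) is the textbook kNN estimate, assumed here to correspond to what knn_density computes.

```python
import math


def log_unit_ball_volume(p):
    """Log of the volume of the unit ball in p dimensions.

    Equivalent to log(pi**(p / 2.0) / gamma(1 + p / 2.0)), but computed with
    lgamma so that no intermediate value overflows.
    """
    return (p / 2.0) * math.log(math.pi) - math.lgamma(1.0 + p / 2.0)


def log_knn_density(radius, n, p, k):
    """Log of the kNN density estimate k / (n * v_p * radius**p).

    Assumption: this is the textbook estimator, not a copy of DeBaCl's code.
    """
    return (math.log(k) - math.log(n)
            - log_unit_ball_volume(p) - p * math.log(radius))
```

Log densities keep the same ordering as the densities themselves, so they can still be used to rank points. Whether the rest of the tree construction accepts log-scale values is not something this thread confirms.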
I tested with 389 images, giving 400k SIFT keypoint features of 128 dimensions each.
The process used over 10 GB of RAM and 15 GB of swap; eventually my SSD ran out of space and the process died without any error message.
I reduced the input to half the images, 200k SIFT keypoint features of 128 dimensions, and the same thing happened.
Is there any way to fix this?
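A rough back-of-the-envelope check may explain the memory exhaustion. The sketch below assumes the neighbor search materializes a full pairwise distance matrix of float64 values; the issue does not say which code path was used, so that is an assumption, and both helper names are hypothetical.

```python
def dense_distance_matrix_bytes(n_points, bytes_per_value=8):
    """Bytes needed for a full n x n matrix of float64 distances."""
    return n_points * n_points * bytes_per_value


def knn_table_bytes(n_points, k, bytes_per_value=8):
    """Bytes needed to keep only k neighbor indices and k radii per point."""
    return 2 * n_points * k * bytes_per_value


for n in (200000, 400000):
    dense_gb = dense_distance_matrix_bytes(n) / 1e9
    knn_gb = knn_table_bytes(n, k=10) / 1e9
    print("%d points: %.0f GB dense, %.3f GB for a k=10 neighbor table"
          % (n, dense_gb, knn_gb))
```

Under that assumption, even the halved data set would need hundreds of gigabytes, which is consistent with RAM and swap both filling up. Subsampling the keypoints, or using a neighbor search that stores only the k nearest neighbors, would reduce this by orders of magnitude.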
The build has been failing, but the badge still shows the build is passing.
When I tried to prune a tree with this notebook: https://nbviewer.org/github/papayawarrior/public_talks/blob/master/pydata_nyc_DeBaCl.ipynb networkx raised the error NetworkXError: Frozen graph can't be modified. I tried downgrading networkx, but even version 1.9 gives this error.
The version number should be accessible in the package, not just in the documentation. For example:
>>> import numpy
>>> numpy.__version__
'1.10.1'
Something is wrong with pruning. It messes up both the size of nodes and the color correspondence. It seems to occur only when the pruning threshold is larger than k, but it needs more investigation.
When retrieving clusters with the first-k method, there is some surprising behavior for edge cases.
k, all roots are returned as clusters, so there are more than k clusters.
The solution is probably to simply warn the user when the number of clusters returned isn't exactly k.
This should also be made clear in the API docs and tutorial.
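The proposed warning could look something like the sketch below. The helper name and the plain list of labels are hypothetical; where such a check would sit inside DeBaCl is not specified in the issue.

```python
import warnings


def check_cluster_count(labels, k):
    """Warn if the number of distinct clusters differs from k.

    `labels` is an iterable of cluster labels, one per point.
    Returns the number of distinct clusters found.
    """
    n_clusters = len(set(labels))
    if n_clusters != k:
        warnings.warn(
            "Requested %d clusters but got %d." % (k, n_clusters),
            UserWarning,
        )
    return n_clusters
```

A warning rather than an exception keeps the current return value intact while making the mismatch visible to the user.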
Hi,
I am trying to use DeBaCl but I am getting this error:
NetworkXError: SubGraph Views are readonly. Mutations not allowed
I have:
python: 2.7
numpy: 1.14.0
networkx: 2.0
Any idea?
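In networkx 2.x, Graph.subgraph returns a read-only view instead of a new graph, which matches the error message above. A common workaround is to copy the view before modifying it. The sketch below shows the pattern on a toy graph; the helper name is hypothetical, and where DeBaCl would need this change is not identified in the issue.

```python
import networkx as nx


def mutable_subgraph(graph, nodes):
    """Return an independent, editable copy of the subgraph on `nodes`."""
    return graph.subgraph(nodes).copy()


G = nx.path_graph(5)
H = mutable_subgraph(G, [0, 1, 2])
H.remove_node(0)  # allowed: H is a copy, not a read-only view
```

Because H is a copy, removing nodes from it leaves the original graph G unchanged.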
I am trying this for the first time, so I could have missed it. Is there a method available to sort the nodes based on cluster density?
The get_clusters method can return "-1" values to indicate points that are "noise" for a given cluster labeling scheme. If this "-1" label is passed directly to the color_nodes parameter, it's a problem because this isn't a valid tree node ID, but the error message isn't very helpful.
>>> cluster_labels = tree.get_clusters(fill_background=True)
>>> labels = np.unique(cluster_labels[:, 1])
>>> fig = tree.plot(color_nodes=labels)[0]
KeyError Traceback (most recent call last)
<ipython-input-27-8cae5ab76c06> in <module>()
----> 1 fig = tree.plot(color_nodes=labels)[0]
/home/brian/projects/DeBaCl/debacl/level_set_tree.pyc in plot(self, form, horizontal_spacing, color_nodes, colormap)
313
314 for i, ix in enumerate(color_nodes):
--> 315 subtree = self._make_subtree(ix)
316
317 for ix_sub in subtree.nodes.keys():
/home/brian/projects/DeBaCl/debacl/level_set_tree.pyc in _make_subtree(self, ix)
519
520 T = LevelSetTree()
--> 521 T.nodes[ix] = _copy.deepcopy(self.nodes[ix])
522 T.nodes[ix].parent = None
523 queue = self.nodes[ix].children[:]
KeyError: -1
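Until the error message is improved, a caller can drop the noise label before plotting. The sketch below uses a toy array in place of the real get_clusters output, and the helper name is hypothetical.

```python
import numpy as np


def valid_node_ids(cluster_labels):
    """Return sorted unique node IDs from column 1, excluding the -1 noise label."""
    labels = np.unique(cluster_labels[:, 1])
    return labels[labels >= 0]


# Toy stand-in for the output of tree.get_clusters(fill_background=True).
cluster_labels = np.array([[0, -1], [1, 2], [2, 2], [3, 5]])
node_ids = valid_node_ids(cluster_labels)
```

The filtered node_ids array could then be passed as color_nodes in the tree.plot call shown above.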
I have had success using the DeBaCl kNN density estimate for low-dimensional data. With much higher-dimensional data (vectors of length 500k+) I get an overflow error, which I presume is attributed to the
When I try to import the geom_tree module using the Python import command below,
"from debacl import geom_tree as gtree"
an import error occurs. Kindly help me with this.
Looks like packages aren't being installed or imported correctly.
This is what the "branch-mass" form of the plot uses, but it's not currently exposed for the user.