Comments (6)

lmcinnes avatar lmcinnes commented on May 18, 2024

I'm not sure exactly what you mean. Can you provide some example code of the sort of thing you would like to see? Are you trying to fit multiple different groups from a single dataframe, each group independent of the prior?

from hdbscan.

lmcinnes avatar lmcinnes commented on May 18, 2024

I see what you mean now, and I can definitely see why that might be desirable. Unfortunately, I don't have any elegant solutions to offer. If you need to fit the groups independently, then I think you have to iterate through them one way or another. That means either a transform, as you have, or iterating through the groups in the groupby and constructing the resulting series yourself. I would lean toward the latter as it is "simpler" and will probably do the job, but the transform will obviously be faster. If you like, you can use some of the "under the hood" code to make the functions easier to write.

from hdbscan import hdbscan
from hdbscan._hdbscan_tree import outlier_scores

def outlier(series):
    l, c, p, tree, s, m = hdbscan(series)
    return outlier_scores(tree)

df['score'] = df.groupby('category')['numeric'].transform(outlier)

ought to work, although I admit I haven't tried it. Let me know if that is the sort of thing you had in mind.
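
The simpler option mentioned above, iterating through the groups and assembling the resulting series by hand, might look roughly like the sketch below. The helper `score_by_group` and the stand-in scorer `abs_deviation` are hypothetical names used only for illustration; in practice the scorer passed in would be the `outlier` function from the snippet above.

```python
import numpy as np
import pandas as pd


def abs_deviation(values):
    """Hypothetical stand-in scorer: distance of each value from the group median."""
    values = np.asarray(values, dtype=float)
    return np.abs(values - np.median(values))


def score_by_group(df, group_col, value_col, score_fn):
    """Score each group independently and return one Series aligned to df.index."""
    pieces = []
    for _, group in df.groupby(group_col):
        scores = score_fn(group[value_col])
        # Re-attach the group's index so the pieces line up with the original rows.
        pieces.append(pd.Series(np.asarray(scores, dtype=float), index=group.index))
    # Concatenate the per-group results and restore the original row order.
    return pd.concat(pieces).reindex(df.index)


df = pd.DataFrame({
    "category": ["a", "a", "a", "b", "b", "b"],
    "numeric": [1.0, 2.0, 10.0, 5.0, 5.0, 6.0],
})
df["score"] = score_by_group(df, "category", "numeric", abs_deviation)
```

Because every piece carries its group's own index, the final `reindex` puts the scores back in the original row order regardless of how the groups were visited.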

lmcinnes avatar lmcinnes commented on May 18, 2024

Ah yes, the local variable x, which is, of course, my fault. This is why I should always test code that I type in; at least then I can catch obvious errors. I've updated my comment to fix the obvious error (x should have been series). If we're lucky that might solve the other error too.

lmcinnes avatar lmcinnes commented on May 18, 2024

Ah, I see the problem. Sklearn wants 2D arrays, and a Series is 1D. You'll
need to wrap it in a second axis to make it work. I can't recall the syntax
off the top of my head but I'll post it when I get a chance to look it up.
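
As an illustration of the shape problem only, the sketch below shows a 1D Series and two NumPy ways of giving it a second axis. The variable names are placeholders, not taken from the thread.

```python
import numpy as np
import pandas as pd

series = pd.Series([1.0, 2.0, 3.0, 4.0])

# A Series (and its underlying array) has a single axis: shape (n,).
flat = series.to_numpy()

# scikit-learn style estimators expect (n_samples, n_features), so a single
# numeric column needs an explicit feature axis: shape (n, 1).
column_a = flat.reshape(-1, 1)
column_b = flat[:, np.newaxis]
```

Both forms produce the same single-column matrix; which one to use is a matter of taste.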

On Wed, Aug 10, 2016 at 10:56 AM, Eric Coker wrote:

I'm running this instead of trying transform, because I thought it would
be simpler.
`from hdbscan import hdbscan
from hdbscan._hdbscan_tree import outlier_scores

def outlier(series):
    l, c, p, tree, s, m = hdbscan(series)
    return outlier_scores(tree)

score = df.groupby(['categ1','categ2'])['num_col'].apply(outlier)
`
I'm receiving the error below, which may be related to
'gen_min_span_tree':

`/home/user/.local/lib/python2.7/site-packages/pandas/core/groupby.pyc in apply(self, func, *args, **kwargs)
    713 # ignore SettingWithCopy here in case the user mutates
    714 with option_context('mode.chained_assignment',None):
--> 715 return self._python_apply_general(f)
    716
    717 def _python_apply_general(self, f):

/home/user/.local/lib/python2.7/site-packages/pandas/core/groupby.pyc in _python_apply_general(self, f)
    717 def _python_apply_general(self, f):
    718 keys, values, mutated = self.grouper.apply(f, self._selected_obj,
--> 719 self.axis)
    720
    721 return self._wrap_applied_output(keys, values,

/home/user/.local/lib/python2.7/site-packages/pandas/core/groupby.pyc in apply(self, f, data, axis)
   1404 # group might be modified
   1405 group_axes = _get_axes(group)
-> 1406 res = f(group)
   1407 if not _is_indexed_like(res, group_axes):
   1408 mutated = True

/home/user/.local/lib/python2.7/site-packages/pandas/core/groupby.pyc in f(g)
    709 @wraps(func)
    710 def f(g):
--> 711 return func(g, *args, **kwargs)
    712
    713 # ignore SettingWithCopy here in case the user mutates

in outlier(series)
      3
      4 def outlier(series):
----> 5     l, c, p, tree, s, m = hdbscan(series)
      6     return outlier_scores(tree)
      7

/home/user/anaconda2/lib/python2.7/site-packages/hdbscan-0.8.1-py2.7-linux-x86_64.egg/hdbscan/hdbscan_.pyc in hdbscan(X, min_cluster_size, min_samples, alpha, metric, p, leaf_size, algorithm, memory, approx_min_span_tree, gen_min_span_tree, core_dist_n_jobs, allow_single_cluster, **kwargs)
    495 memory.cache(_hdbscan_prims_kdtree)(X, min_samples, alpha,
    496 metric, p, leaf_size,
--> 497 gen_min_span_tree, **kwargs)
    498 else:
    499 (single_linkage_tree,

/home/user/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/memory.pyc in __call__(self, *args, **kwargs)
    281
    282 def __call__(self, *args, **kwargs):
--> 283 return self.func(*args, **kwargs)
    284
    285 def call_and_shelve(self, *args, **kwargs):

/home/user/anaconda2/lib/python2.7/site-packages/hdbscan-0.8.1-py2.7-linux-x86_64.egg/hdbscan/hdbscan_.pyc in _hdbscan_prims_kdtree(X, min_samples, alpha, metric, p, leaf_size, gen_min_span_tree, **kwargs)
    169 core_distances = tree.query(X, k=min_samples,
    170 dualtree=True,
--> 171 breadth_first=True)[0][:, -1].copy(order='C')
    172 # Mutual reachability distance is implicit in mst_linkage_core_vector
    173 min_spanning_tree = mst_linkage_core_vector(X, core_distances, dist_metric, alpha)

sklearn/neighbors/binary_tree.pxi in sklearn.neighbors.kd_tree.BinaryTree.query (sklearn/neighbors/kd_tree.c:10563)()

sklearn/neighbors/binary_tree.pxi in sklearn.neighbors.kd_tree.NeighborsHeap.__init__ (sklearn/neighbors/kd_tree.c:4971)()

sklearn/neighbors/binary_tree.pxi in sklearn.neighbors.kd_tree.get_memview_DTYPE_2D (sklearn/neighbors/kd_tree.c:2662)()

/home/user/anaconda2/lib/python2.7/site-packages/sklearn/neighbors/kd_tree.so in View.MemoryView.array_cwrapper (sklearn/neighbors/kd_tree.c:25261)()

/home/user/anaconda2/lib/python2.7/site-packages/sklearn/neighbors/kd_tree.so in View.MemoryView.array.__cinit__ (sklearn/neighbors/kd_tree.c:24186)()

ValueError: Invalid shape in axis 1: 0.`


lmcinnes avatar lmcinnes commented on May 18, 2024

I think what is needed is something like:

import numpy as np

def outlier(series):
    # .values replaces Series.asobject, which was removed from later pandas releases
    feature_matrix = series.values[:, np.newaxis]
    l, c, p, tree, s, m = hdbscan(feature_matrix)
    return outlier_scores(tree)

I admit there may still be issues with hdbscan on one-dimensional data, but this should at least format it so that sklearn style APIs will deal with it appropriately.
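
For reference, a variant of the same idea that relies only on the public `hdbscan.HDBSCAN` estimator and its `outlier_scores_` attribute is sketched below. Whether it fits the original use case is an untested assumption. The names `to_feature_matrix` and `hdbscan_outlier` and the choice of `min_cluster_size=5` are illustrative, and groups with fewer rows than `min_cluster_size` may still need special handling.

```python
import numpy as np
import pandas as pd


def to_feature_matrix(series):
    """Turn a 1D numeric Series into an (n_samples, 1) float array."""
    return series.to_numpy(dtype=float).reshape(-1, 1)


def hdbscan_outlier(series, min_cluster_size=5):
    """Per-group outlier scores; assumes the hdbscan package is installed."""
    import hdbscan  # imported lazily so the reshaping helper works without it

    clusterer = hdbscan.HDBSCAN(min_cluster_size=min_cluster_size)
    clusterer.fit(to_feature_matrix(series))
    # Return a Series on the group's own index so transform can align it.
    return pd.Series(clusterer.outlier_scores_, index=series.index)


# Usage, with the column names from the earlier example:
# df['score'] = df.groupby('category')['numeric'].transform(hdbscan_outlier)
```

Keeping the reshaping in its own helper makes it easy to check the array shape separately from the clustering step.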

lmcinnes avatar lmcinnes commented on May 18, 2024

I presume this is working now?
