
finch-clustering's Introduction

First Integer Neighbor Clustering Hierarchy (FINCH) Algorithm


FINCH is a parameter-free, fast, and scalable clustering algorithm that stands out for its speed and clustering quality. The algorithm is described in our paper, Efficient Parameter-free Clustering Using First Neighbor Relations, published in CVPR 2019.

Installation

The project is available on PyPI. To install, run:

pip install finch-clust

Optional: install PyNNDescent to compute first neighbours for large data.

To install finch with pynndescent, run:

pip install "finch-clust[ann]"

Usage

Typically you would run:

from finch import FINCH
c, num_clust, req_c = FINCH(data)

You can also set options such as the required number of clusters or the distance metric:

c, num_clust, req_c = FINCH(data, initial_rank=None, req_clust=None, distance='cosine', verbose=True)

For more details on the meaning of the input arguments, check the README in the finch directory.

Matlab usage

The corresponding Matlab implementation is provided in the matlab directory.

Demos

The following demo notebooks show how to use FINCH to cluster a dataset.

  1. Basic usage on 2D toy data
  2. Clustering STL-10 dataset with FINCH

Relevant tools built on FINCH

  • h-nne: See also our h-nne method, which uses FINCH for fast dimensionality reduction and visualization applications.

  • TW-FINCH: See also our TW-FINCH variant, which is useful for video segmentation.

Citation

@inproceedings{finch,
    author    = {M. Saquib Sarfraz and Vivek Sharma and Rainer Stiefelhagen}, 
    title     = {Efficient Parameter-free Clustering Using First Neighbor Relations}, 
    booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    pages     = {8934--8943},
    year      = {2019}
}

The code and the FINCH algorithm are not meant for commercial use. Please contact the author for licensing information.

finch-clustering's People

Contributors

ssarfraz


finch-clustering's Issues

Different Clustering results when using python and matlab implementation

I realized that the Python implementation yields different results from the Matlab version.

I found this by comparing a Python evaluation of the TW-FINCH clustering results against the provided Matlab evaluation, using one of the provided datasets and the features from the TW-FINCH paper.
After looking into the issue further, I found that already in the first steps of the clustering process the two versions assign the same features/frames to different clusters. The number of clusters is drastically different too, which explains the performance differences.

Have you encountered this issue and if yes are there solutions?

Unable to replicate numbers

Hey,

I was trying to replicate the numbers presented in the paper with the features provided, and my numbers seem to be a bit on the lower side. Without changing anything, I ran the Python version of the code. On Breakfast I get an MoF of 60.1, whereas the reported number is 62.7. Similarly, for MPII I get 41.51, but the reported number is 42.0 (a very minor difference). Is there a reason for this discrepancy?

TWFinch code missing FS "Eval" option; unable to reproduce accuracy

Hello, thank you for posting the code and data for the TWFinch paper.

The code seems to be missing an option to run the FS "Eval" dataset.
I've made a logical change to your code (below) to load this dataset, but am unable to reproduce the accuracy in the paper, which was reported as MoF= 71.1%.

The following change to TW-FINCH/util_fns/read_video.m produces an accuracy of MoF:= 66.7%:

 elseif strcmp(Dataset, 'FS')
    map=readtable(fullfile(mapping_path, 'mappingeval.txt'));
    map2=table([1:numel(map.Var2)]', 'RowNames', map.Var2);
    gt_label_str=table2cell(readtable(fullfile(gt_path, vid_name), 'Delimiter', '#', 'ReadVariableNames',false));
    gt_label_frame=table2array(map2(gt_label_str,1));

I would appreciate any guidance on what might be wrong. Thank you.

how to estimate number of clusters

After reading your paper, I was wondering how to estimate the number of clusters using FINCH. Your method always seems to get the correct number of clusters, e.g., in Table 2.

array is empty with s1 dataset

Why does the algorithm trigger an error when working on the s1 dataset from http://cs.joensuu.fi/sipu/datasets/?

~/finchcls.py in update_adj(self, adj, d)
94 v = np.argsort(d[idx])
95 v = v[:2]
---> 96 x = [idx[0][v[0]], idx[0][v[1]]]
97 y = [idx[1][v[0]], idx[1][v[1]]]
98 a = sp.lil_matrix(adj.get_shape())

IndexError: index 0 is out of bounds for axis 0 with size 0

The same error occurs with the a1 dataset and the "unbalance" dataset.

With any other dataset it works fine.

There is a bug when using pynndescent.NNDescent

Nice work!

When my data volume is very large, the Python code uses "NNDescent" from the "pynndescent" library, and then an error occurs.

My "pynndescent" version is '0.5.5'. How can I fix it?

Looking forward to your reply, thanks.

about output

Thank you for open-sourcing the code. I want to know what the output 'C' means. It is an N*2 array; which column is the cluster label? I get a bad result on my data and want to understand why.
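
As a rough illustration of the question above: assuming (this page does not spell it out) that each column of c holds the labels of one partition in the hierarchy, with the first column the finest and the last the coarsest, a single labelling is just one column of the array. The array values below are made up and are not FINCH output.

```python
import numpy as np

# Hypothetical stand-in for the first FINCH output: 6 samples, 2 partitions.
c = np.array([
    [0, 0],
    [0, 0],
    [1, 0],
    [1, 0],
    [2, 1],
    [2, 1],
])

# Cluster count per column, computed from the labels themselves.
num_clust = [len(np.unique(c[:, j])) for j in range(c.shape[1])]

finest = c[:, 0]     # labels of the first partition (assumed finest)
coarsest = c[:, -1]  # labels of the last partition (assumed coarsest)
```

Under that assumption, an N*2 output would mean the hierarchy has two levels, and either column can be used as the cluster labels depending on the granularity wanted.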

errors when run the run_on_dataset.m

Hello, thank you for posting the code for TW-FINCH, and great work! I have tried to reproduce the results, but I ran into a problem when running run_on_dataset.m.

I downloaded the data and put it under E:\FINCH-Clustering-master\TW-FINCH, then ran:
tw_finch = true
Result = run_on_dataset('50Salads', tw_finch, 'E:\FINCH-Clustering-master\TW-FINCH\Action_Segmentation_Datasets');
The resulting error was attached as a screenshot (Capture144).

The performance of FINCH on Aggregation

The code is working fine. But the performance I have got is always 0.96536 in terms of NMI (implemented in sklearn.metrics).
The code I run is as follows:

import numpy as np
import scipy.io as sio
from sklearn.metrics import normalized_mutual_info_score as nmi
from finch import FINCH  # absolute import, as in the Usage section

data = sio.loadmat("Agg.mat")
X = data["X"]
# loadmat returns 2-D arrays; flatten the labels to 1-D for the metric
y_true = data["Y"].ravel()
c_true = len(np.unique(y_true))

Y, num_clu, req_y = FINCH(X, req_clust=c_true, distance='euclidean') # or cosine
acc = nmi(y_true, req_y, average_method="max")
print(acc)

Looking forward to your reply

Code for TW-FINCH

It would be great if you could publish your code for TW-FINCH, since it is a bit hard to replicate the results from the paper.

Is there any randomness in the clustering results?

Hi, I fixed the random seed and input data and then applied FINCH for clustering. But I found that the results obtained by each clustering are different, what should I do to ensure that I can get a fixed result every time?


P.S. I have a large amount of data (hundreds of thousands) and use the NNDescent method in 'pynndescent', is it possible that this is the cause? What can I do?

Looking forward to your reply, thank you very much

ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 4 dimension(s) and the array at index 1 has 2 dimension(s)

Hi, I tried to convert my video into a numpy array using the method shown here (https://stackoverflow.com/questions/67644826/how-to-convert-a-video-to-a-numpy-array). Now, when I pass it as input to the function as FINCH(data, req_clust=K, tw_finch=True), I get:
ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 4 dimension(s) and the array at index 1 has 2 dimension(s). The shape of my data right now is (928, 108, 108, 3).

How do I fix this? Is there any other method to get feature vector of a video ? I really appreciate the response !
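
The error message suggests that a 2-D array is expected where a 4-D one was passed. A minimal sketch of one way to get a 2-D array, assuming FINCH wants one feature vector per row (n_samples, n_features): flatten each frame into a single row. The zero-filled video array is a hypothetical stand-in with the shape quoted in the issue.

```python
import numpy as np

# Hypothetical stand-in for the video in the issue: 928 RGB frames of 108x108.
video = np.zeros((928, 108, 108, 3), dtype=np.uint8)

# Flatten each frame into one row so the array is 2-D: (n_frames, n_features).
features = video.reshape(video.shape[0], -1).astype(np.float32)
print(features.shape)  # (928, 34992)
```

This only addresses the shape mismatch. Raw pixels may well be poor features for clustering, so a learned or hand-crafted per-frame descriptor would probably be a better input.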

Elements of the adjacency matrix may be greater than 1

Thanks for this amazing and practical algorithm.
While browsing the Python version of the code, I found that elements of the adjacency matrix may be greater than 1, as shown below.

csr_matrix in python.finch.py line45
0, 0, 0, 0, 1
0, 0, 0, 0, 1
0, 1, 0, 0, 0
0, 0, 0, 0, 1
0, 1, 0, 0, 0
adjacent matrix in line50
0, 1, 0, 1, 1
1, 0, 1, 1, 2
0, 1, 0, 0, 1
1, 1, 0, 0, 1
1, 2, 1, 1, 0

Maybe this affects the value of min_sim in the hierarchical clustering at line 155.
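
The two matrices quoted above can be reproduced with a short sketch. It assumes the adjacency is formed as (A + I)(A + I)^T with the diagonal zeroed; that construction is inferred from the numbers in the issue, not taken from the library source. An entry of 2 then means two points are linked in two ways (here, points 1 and 4 share first neighbour 4 and point 4 also points to 1).

```python
import numpy as np

# First-neighbour indicator matrix quoted in the issue
# (row i has a 1 in the column of i's first neighbour).
A = np.array([
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0],
])

# Assumed construction: (A + I)(A + I)^T with the diagonal zeroed.
B = A + np.eye(5, dtype=int)
adj = B @ B.T
np.fill_diagonal(adj, 0)

# Entries count how many ways two points are linked;
# binarising keeps only whether a link exists.
links = (adj > 0).astype(int)
print(adj)
```

Binarising leaves the nonzero pattern, and therefore the connected components, unchanged. Whether the raw counts matter for later steps such as min_sim is not something this sketch settles.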

Finch Algo 2

Thank you for the great method and code.
As far as I understand, Algorithm 2 is needed for evaluation, but I don't think there is corresponding Python code.

IndexError when req_clust > num_clust

I call finch using
cluster_partition, n_part_clust, part_labels = FINCH(data, req_clust=2)

and receive this error

line 185, in FINCH
    req_c = req_numclust(c[:, ind[-1]], data, req_clust, distance, use_ann_above_samples, verbose)
IndexError: list index out of range

My best guess is, that there is only one cluster, so the condition v >= req_clust is never fulfilled in ind = [i for i, v in enumerate(num_clust) if v >= req_clust], thereby the index list is empty, thereby ind[-1] is out of range.

What is the implication and how to best deal with this?
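
The guess above can be checked in isolation. The sketch below copies the list comprehension from the traceback into a hypothetical helper, pick_partition (not part of finch), that returns None instead of raising when no partition has at least req_clust clusters.

```python
def pick_partition(num_clust, req_clust):
    """Return the index of the last partition with at least req_clust
    clusters, or None when every partition has fewer clusters."""
    ind = [i for i, v in enumerate(num_clust) if v >= req_clust]
    return ind[-1] if ind else None

print(pick_partition([120, 30, 7], 10))  # 1
print(pick_partition([1], 2))            # None
```

A caller could run FINCH without req_clust first, apply a check like this to the returned num_clust, and only ask for req_clust clusters when the result is not None.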

TW-FINCH feature extraction method

Hello, thanks for your work! I'm sorry, but this problem has been bothering me for a long time. For TW-FINCH, can the frame-wise features only be extracted with iDT (as your paper mentions), or can they also be extracted with CNN methods such as I3D? Will the choice of method affect the clustering results?

`sklearn` is still a dependency in `setup.py`

Commit b508b1a intended to remove sklearn dependency, but actually removed scipy. You can check the commit's diff here.

scipy is still installed since it's a dependency of scikit-learn, but we also get the deprecated sklearn package.

This means that the problem from #29 still affects finch-clust==0.1.8. We can check it by doing the following (based on How to test whether a package will be affected by the sklearn deprecation):

SKLEARN_ALLOW_DEPRECATED_SKLEARN_PACKAGE_INSTALL=False \
    pip install finch-clust==0.1.8

Matching problem of TW-FINCH

It is amazing that this unsupervised clustering method outperforms other paradigms on five challenging action segmentation datasets. However, some details puzzle me, in particular how the obtained segments are mapped to the different action labels (including background) using the Hungarian algorithm. I would really appreciate an explanation of these details.
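
For readers with the same question, here is a generic sketch of Hungarian matching between cluster ids and ground-truth labels, using SciPy's linear_sum_assignment. It is not the authors' evaluation code: the function name hungarian_accuracy, the toy labels, and the treatment of background as just another label are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def hungarian_accuracy(y_true, y_pred):
    """Map predicted cluster ids to ground-truth labels one-to-one so that
    the number of agreeing frames is maximal; return (accuracy, mapping)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    true_ids = np.unique(y_true)
    pred_ids = np.unique(y_pred)
    # overlap[i, j] = frames with predicted id pred_ids[i] and true id true_ids[j]
    overlap = np.zeros((len(pred_ids), len(true_ids)), dtype=int)
    for i, p in enumerate(pred_ids):
        for j, t in enumerate(true_ids):
            overlap[i, j] = np.sum((y_pred == p) & (y_true == t))
    # Hungarian algorithm: choose the one-to-one pairing with maximal total overlap.
    rows, cols = linear_sum_assignment(overlap, maximize=True)
    mapping = {int(pred_ids[r]): int(true_ids[c]) for r, c in zip(rows, cols)}
    return overlap[rows, cols].sum() / len(y_true), mapping

# Toy frame-wise labels (integer ids), made up for illustration.
y_true = [0, 0, 0, 1, 1, 2, 2, 2]
y_pred = [2, 2, 0, 0, 0, 1, 1, 1]
acc, mapping = hungarian_accuracy(y_true, y_pred)
print(acc, mapping)
```

The accuracy returned is the fraction of frames whose mapped cluster id equals the ground-truth label, which is how a frame-level metric such as MoF is typically computed after matching.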

Error when running TW_FINCH and specifying the number of clusters

Hello,
Thank you for publishing your excellent work.

I was testing the TW_FINCH for clustering and it has been working well, but when I tried to specify the exact number of clusters I wanted, I got the following error:

    [186]  ind = [i for i, v in enumerate(num_clust) if v >= req_clust]
--> [187]  req_c = req_numclust(c[:, ind[-1]], data, req_clust, distance, use_tw_finch=tw_finch)
    [188]else:
    [189]  req_c = c[:, num_clust.index(req_clust)]

IndexError: list index out of range

It seems to be in the c[:,ind[-1]] call.

What could be the reason behind this error?

Thank you.
