Implementation of the paper Clustering of the structures by using "snakes & dragons" approach, or correlation matrix as a signal
- Macroeconomics development indicators from the World Bank - Link
- NumPy
- Pandas
- Scikit-learn
- Tqdm (for displaying a progress bar)
- Yellowbrick (provides a mechanism for selecting the best number of clusters `k`, as described in the paper)

To install Yellowbrick using the conda package manager (recommended):
conda install -c districtdatalabs yellowbrick
The algorithm internally runs `KMeans` multiple times on random partitions of the entire dataset. Although sklearn's implementation of K-Means is widely used, it is not the fastest available. The implementation in Intel-backed DAAL was much faster in initial benchmarks, giving roughly an 8-12x speed-up. If DAAL is not installed, the code falls back to sklearn's implementation.
The recommended way to install DAAL for Python is with the conda package manager:
conda install -c intel daal4py
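
The fallback behaviour described above can be sketched roughly as follows. This is an illustration only, not the repository's actual code: it assumes that daal4py exposes a scikit-learn-compatible `KMeans` under `daal4py.sklearn.cluster`, and the helper `cluster_random_partitions` and its parameters are hypothetical names chosen for this example.

```python
import numpy as np

try:
    # Assumption: daal4py ships a scikit-learn-compatible KMeans here.
    from daal4py.sklearn.cluster import KMeans
    BACKEND = "daal4py"
except ImportError:
    # DAAL is not installed, so fall back to scikit-learn.
    from sklearn.cluster import KMeans
    BACKEND = "sklearn"


def cluster_random_partitions(X, k, n_partitions=4, seed=0):
    """Run KMeans independently on each random partition of the rows of X.

    Hypothetical helper: returns a list of (row_indices, labels) pairs.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))  # shuffle the row indices once
    results = []
    for idx in np.array_split(order, n_partitions):
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X[idx])
        results.append((idx, model.labels_))
    return results


# Toy data: two well-separated blobs of 100 points each.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 0.5, (100, 2)), rng.normal(5, 0.5, (100, 2))])

parts = cluster_random_partitions(X, k=2)
print("backend:", BACKEND)
print("partition sizes:", [len(idx) for idx, _ in parts])
```

Because both backends share the same estimator interface in this sketch, the rest of the code does not need to know which one was imported.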
- Consensus Clustering (paper): https://link.springer.com/article/10.1023/A:1023949509487
- Consensus Clustering (blog): https://towardsdatascience.com/consensus-clustering-f5d25c98eaf2