Karate Club is an unsupervised machine learning extension library for NetworkX.
Karate Club consists of state-of-the-art methods for unsupervised learning on graph-structured data. Simply put, it is a Swiss Army knife for small-scale graph mining research. First, it provides network embedding techniques at the node and graph level. Second, it includes a variety of overlapping and non-overlapping community detection methods. Implemented methods cover a wide range of network science (NetSci, CompleNet), data mining (ICDM, CIKM, KDD), artificial intelligence (AAAI, IJCAI) and machine learning (NeurIPS, ICML, ICLR) conferences, workshops, and pieces from prominent journals.
Citing
If you find Karate Club useful in your research, please consider citing the following paper:
@misc{rozemberczki2020karateclub,
    title = {Karate Club: A tool for unsupervised learning on graph structured data.},
    author = {Benedek Rozemberczki and Rik Sarkar},
    year = {2019},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\url{https://github.com/benedekrozemberczki/karateclub}}
}
A simple example
Karate Club makes the use of modern community detection techniques quite easy (see here for the accompanying tutorial). For example, this is all it takes to run Ego-splitting on a Watts-Strogatz graph:
import networkx as nx
from karateclub import EgoNetSplitter
g = nx.newman_watts_strogatz_graph(1000, 20, 0.05)
splitter = EgoNetSplitter(1.0)
splitter.fit(g)
print(splitter.get_memberships())
Models included
In detail, the following community detection and embedding methods are implemented.
Overlapping Community Detection
- DANMF from Ye et al.: Deep Autoencoder-like Nonnegative Matrix Factorization for Community Detection (CIKM 2018)
- M-NMF from Wang et al.: Community Preserving Network Embedding (AAAI 2017)
- Ego-Splitting from Epasto et al.: Ego-splitting Framework: from Non-Overlapping to Overlapping Clusters (KDD 2017)
- NNSED from Sun et al.: A Non-negative Symmetric Encoder-Decoder Approach for Community Detection (CIKM 2017)
- BigClam from Yang and Leskovec: Overlapping Community Detection at Scale: A Nonnegative Matrix Factorization Approach (WSDM 2013)
Non-Overlapping Community Detection
- EdMot from Li et al.: EdMot: An Edge Enhancement Approach for Motif-aware Community Detection (KDD 2019)
- Label Propagation from Raghavan et al.: Near Linear Time Algorithm to Detect Community Structures in Large-Scale Networks (Physical Review E 2007)
Neighbourhood-Based Node Level Embedding
- BoostNE from Li et al.: Multi-Level Network Embedding with Boosted Low-Rank Matrix Approximation (ASONAM 2019)
- Diff2Vec from Rozemberczki and Sarkar: Fast Sequence Based Embedding with Diffusion Graphs (CompleNet 2018)
- NetMF from Qiu et al.: Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and Node2Vec (WSDM 2018)
- Walklets from Perozzi et al.: Don't Walk, Skip! Online Learning of Multi-scale Network Embeddings (ASONAM 2017)
- GraRep from Cao et al.: GraRep: Learning Graph Representations with Global Structural Information (CIKM 2015)
- DeepWalk from Perozzi et al.: DeepWalk: Online Learning of Social Representations (KDD 2014)
- NMF-ADMM from Sun and Févotte: Alternating Direction Method of Multipliers for Non-Negative Matrix Factorization with the Beta-Divergence (ICASSP 2014)
Structural Node Level Embedding
- GraphWave from Donnat et al.: Learning Structural Node Embeddings via Diffusion Wavelets (KDD 2018)
Attributed Node Level Embedding
- BANE from Yang et al.: Binarized Attributed Network Embedding (ICDM 2018)
- TENE from Yang et al.: Enhanced Network Embedding with Text Information (ICPR 2018)
Graph Level Embedding
- GL2Vec from Chen and Koga: GL2Vec: Graph Embedding Enriched by Line Graphs with Edge Features (ICONIP 2019)
- FGSD from Verma and Zhang: Hunt For The Unique, Stable, Sparse And Fast Feature Learning On Graphs (NeurIPS 2017)
- Graph2Vec from Narayanan et al.: Graph2Vec: Learning Distributed Representations of Graphs (MLG Workshop 2017)
Head over to our documentation to find out more about installation and data handling, a full list of implemented methods, and datasets. For a quick start, check out our examples.
If you notice anything unexpected, please open an issue and let us know. If you are missing a specific method, feel free to open a feature request. We are motivated to constantly make Karate Club even better.
Installation
Karate Club can be installed with the following pip command.
$ pip install karateclub
As we create new releases frequently, upgrading the package regularly is recommended.
$ pip install karateclub --upgrade
Running examples
As part of the documentation we provide a number of use cases showing how the clusterings and embeddings can be utilized for downstream learning. These can be accessed here, with detailed explanations.
Besides the case studies, we provide synthetic examples for each model. These can be tried out by running the examples script.
$ python examples.py