
annoviko / pyclustering

pyclustering is a Python, C++ data mining library.

Home Page: https://pyclustering.github.io/

License: BSD 3-Clause "New" or "Revised" License

C++ 36.97% Python 60.57% C 0.60% Makefile 0.30% Shell 0.48% PowerShell 0.37% TeX 0.61% CMake 0.11%
clustering oscillatory-networks data-mining neural-networks python3 c-plus-plus machine-learning algorithms data-science python

pyclustering's Introduction

Warning - Attention Users

Please be aware that the `pyclustering` library is no longer supported as of 2021 due to personal reasons. There will be no further maintenance, issue addressing, or feature development for this repository.

For continued usage, I recommend seeking alternative solutions.

Thank you for your understanding.


PyClustering

pyclustering is a Python, C++ data mining library (clustering algorithms, oscillatory networks, neural networks). The library provides both Python and C++ implementations (the C++ pyclustering library) of each algorithm or model. The C++ pyclustering library is part of pyclustering and is supported on Linux, Windows and macOS.

Version: 0.11.dev

License: The 3-Clause BSD License

E-Mail: [email protected]

Documentation: https://pyclustering.github.io/docs/0.10.1/html/

Homepage: https://pyclustering.github.io/

PyClustering Wiki: https://github.com/annoviko/pyclustering/wiki

Dependencies

Required packages: scipy, matplotlib, numpy, Pillow

Python version: >=3.6 (32-bit, 64-bit)

C++ standard: C++14 or later (32-bit, 64-bit)

Performance

Each algorithm is implemented in both Python and C/C++. If your platform is supported, the C/C++ implementation is used; otherwise the library falls back to the Python implementation. The implementation can be selected with the ccore flag, which defaults to True (use C/C++), for example:

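from pyclustering.cluster.xmeans import xmeans
from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer
from pyclustering.samples.definitions import SIMPLE_SAMPLES
from pyclustering.utils import read_sample

# Sample data and two initial centers; X-Means may split them into up to 20 clusters (kmax).
data_points = read_sample(SIMPLE_SAMPLES.SAMPLE_SIMPLE3)
start_centers = kmeans_plusplus_initializer(data_points, 2).initialize()

# Default: the C/C++ part of the library (CCORE) is used.
xmeans_instance_1 = xmeans(data_points, start_centers, 20, ccore=True)
# Same as above: ccore defaults to True.
xmeans_instance_2 = xmeans(data_points, start_centers, 20)
# CCORE switched off: the pure Python implementation is used.
xmeans_instance_3 = xmeans(data_points, start_centers, 20, ccore=False)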

Installation

Installation using pip3 tool:

$ pip3 install pyclustering

Manual installation from the official repository using Makefile:

# get sources of the pyclustering library, for example, from repository
$ mkdir pyclustering
$ cd pyclustering/
$ git clone https://github.com/annoviko/pyclustering.git .

# compile CCORE library (core of the pyclustering library).
$ cd ccore/
$ make ccore_64bit      # build for 64-bit OS

# $ make ccore_32bit    # build for 32-bit OS

# return to parent folder of the pyclustering library
$ cd ../

# install pyclustering library
$ python3 setup.py install

# optionally - test the library
$ python3 setup.py test

Manual installation using CMake:

# get sources of the pyclustering library, for example, from repository
$ mkdir pyclustering
$ cd pyclustering/
$ git clone https://github.com/annoviko/pyclustering.git .

# generate build files.
$ mkdir build
$ cd build/
$ cmake ..

# build the pyclustering-shared target with whatever was generated (Makefile or MSVC solution);
# if a Makefile has been generated then:
$ make pyclustering-shared

# return to parent folder of the pyclustering library
$ cd ../

# install pyclustering library
$ python3 setup.py install

# optionally - test the library
$ python3 setup.py test

Manual installation using Microsoft Visual Studio solution:

  1. Clone the repository from: https://github.com/annoviko/pyclustering.git
  2. Open the folder pyclustering/ccore
  3. Open the Visual Studio solution ccore.sln
  4. Select the solution platform: x86 or x64
  5. Build the pyclustering-shared project.
  6. Add the pyclustering folder to the Python path or install it using setup.py:

# install pyclustering library
$ python3 setup.py install

# optionally - test the library
$ python3 setup.py test

Proposals, Questions, Bugs

In case of any questions, proposals or bugs related to pyclustering, please contact [email protected] or create an issue here.


Cite the Library

If you use the pyclustering library in a scientific paper, please cite it:

Novikov, A., 2019. PyClustering: Data Mining Library. Journal of Open Source Software, 4(36), p.1230. Available at: http://dx.doi.org/10.21105/joss.01230.

BibTeX entry:

@article{Novikov2019,
    doi         = {10.21105/joss.01230},
    url         = {https://doi.org/10.21105/joss.01230},
    year        = 2019,
    month       = {apr},
    publisher   = {The Open Journal},
    volume      = {4},
    number      = {36},
    pages       = {1230},
    author      = {Andrei Novikov},
    title       = {{PyClustering}: Data Mining Library},
    journal     = {Journal of Open Source Software}
}

Brief Overview of the Library Content

Clustering algorithms and methods (module pyclustering.cluster):

Algorithm Python C++
Agglomerative
BANG
BIRCH
BSAS
CLARANS
CLIQUE
CURE
DBSCAN
Elbow
EMA
Fuzzy C-Means
GA (Genetic Algorithm)
G-Means
HSyncNet
K-Means
K-Means++
K-Medians
K-Medoids
MBSAS
OPTICS
ROCK
Silhouette
SOM-SC
SyncNet
Sync-SOM
TTSAS
X-Means

Oscillatory networks and neural networks (module pyclustering.nnet):

Model Python C++
CNN (Chaotic Neural Network)
fSync (Oscillatory network based on Stuart-Landau equation and Kuramoto model)
HHN (Oscillatory network based on Hodgkin-Huxley model)
Hysteresis Oscillatory Network
LEGION (Local Excitatory Global Inhibitory Oscillatory Network)
PCNN (Pulse-Coupled Neural Network)
SOM (Self-Organized Map)
Sync (Oscillatory network based on Kuramoto model)
SyncPR (Oscillatory network for pattern recognition)
SyncSegm (Oscillatory network for image segmentation)

Graph Coloring Algorithms (module pyclustering.gcolor):

Algorithm Python C++
DSatur
Hysteresis
GColorSync

Containers (module pyclustering.container):

Algorithm Python C++
KD Tree
CF Tree

Examples in the Library

The library contains examples for each algorithm and oscillatory network model:

Clustering examples: pyclustering/cluster/examples

Graph coloring examples: pyclustering/gcolor/examples

Oscillatory network examples: pyclustering/nnet/examples

Where are examples?

Code Examples

Data clustering by CURE algorithm

from pyclustering.cluster import cluster_visualizer
from pyclustering.cluster.cure import cure
from pyclustering.utils import read_sample
from pyclustering.samples.definitions import FCPS_SAMPLES

# Input data in the following format: [ [0.1, 0.5], [0.3, 0.1], ... ].
input_data = read_sample(FCPS_SAMPLES.SAMPLE_LSUN)

# Allocate three clusters.
cure_instance = cure(input_data, 3)
cure_instance.process()
clusters = cure_instance.get_clusters()

# Visualize allocated clusters.
visualizer = cluster_visualizer()
visualizer.append_clusters(clusters, input_data)
visualizer.show()

Data clustering by K-Means algorithm

from pyclustering.cluster.kmeans import kmeans, kmeans_visualizer
from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer
from pyclustering.samples.definitions import FCPS_SAMPLES
from pyclustering.utils import read_sample

# Load list of points for cluster analysis.
sample = read_sample(FCPS_SAMPLES.SAMPLE_TWO_DIAMONDS)

# Prepare initial centers using K-Means++ method.
initial_centers = kmeans_plusplus_initializer(sample, 2).initialize()

# Create instance of K-Means algorithm with prepared centers.
kmeans_instance = kmeans(sample, initial_centers)

# Run cluster analysis and obtain results.
kmeans_instance.process()
clusters = kmeans_instance.get_clusters()
final_centers = kmeans_instance.get_centers()

# Visualize obtained results
kmeans_visualizer.show_clusters(sample, clusters, final_centers)
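In the example above, `kmeans_plusplus_initializer` prepares the initial centers with K-Means++ seeding. As a rough illustration of the idea only (not pyclustering's implementation, whose candidate selection may differ), the standard K-Means++ rule picks the first center uniformly at random and every further center with probability proportional to the squared distance to the nearest center chosen so far. The function `kmeans_plusplus_centers` and its `seed` parameter are hypothetical.

```python
import math
import random


def kmeans_plusplus_centers(points, k, seed=None):
    """Standard K-Means++ seeding (D^2 weighting); returns up to k points from `points`."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]  # first center: uniform at random
    while len(centers) < k:
        # Squared distance from every point to its nearest already chosen center.
        weights = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        if sum(weights) == 0:
            break  # every point coincides with a chosen center
        # Points far from all chosen centers are proportionally more likely to be picked.
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


sample = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
print(kmeans_plusplus_centers(sample, 2, seed=3))
```

Because an already chosen point has weight zero, the same point is never selected twice, which is why well-separated groups tend to receive one initial center each.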

Data clustering by OPTICS algorithm

from pyclustering.cluster import cluster_visualizer
from pyclustering.cluster.optics import optics, ordering_analyser, ordering_visualizer
from pyclustering.samples.definitions import FCPS_SAMPLES
from pyclustering.utils import read_sample

# Read a sample for clustering from a file.
sample = read_sample(FCPS_SAMPLES.SAMPLE_LSUN)

# The connectivity radius is deliberately larger than needed: when the amount of
# clusters is specified, OPTICS can work with a radius that is only an upper bound.
radius = 2.0
neighbors = 3
amount_of_clusters = 3
optics_instance = optics(sample, radius, neighbors, amount_of_clusters)

# Perform cluster analysis.
optics_instance.process()

# Obtain results of clustering
clusters = optics_instance.get_clusters()
noise = optics_instance.get_noise()
ordering = optics_instance.get_ordering()

# Visualize ordering diagram
analyser = ordering_analyser(ordering)
ordering_visualizer.show_ordering_diagram(analyser, amount_of_clusters)

# Visualize clustering results
visualizer = cluster_visualizer()
visualizer.append_clusters(clusters, sample)
visualizer.show()
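Why an over-sized radius is tolerable can be illustrated with the reachability ordering: once the desired number of clusters is known, a smaller threshold can be picked from the ordering itself. The sketch below is a simplification under stated assumptions: every run of consecutive reachability values at or below a threshold counts as one cluster, and core distances are ignored. The helpers `count_clusters` and `radius_for_clusters` are hypothetical and are not the internals of `ordering_analyser`.

```python
import math


def count_clusters(reachability, threshold):
    """Count runs of consecutive reachability values <= threshold (one run = one cluster)."""
    count = 0
    inside = False
    for value in reachability:
        if value <= threshold:
            if not inside:
                count += 1
                inside = True
        else:
            inside = False  # a large or undefined (inf) reachability ends the current run
    return count


def radius_for_clusters(reachability, amount):
    """Largest finite threshold that yields exactly `amount` clusters, or None."""
    candidates = sorted({v for v in reachability if math.isfinite(v)}, reverse=True)
    for threshold in candidates:
        if count_clusters(reachability, threshold) == amount:
            return threshold
    return None


# Reachability values in OPTICS order: three valleys separated by jumps.
ordering = [math.inf, 0.1, 0.1, 3.0, 0.2, 0.1, 4.0, 0.1, 0.2]
print(radius_for_clusters(ordering, 3))  # 0.2
```

Scanning candidate thresholds from largest to smallest returns the most permissive radius that still separates the requested number of valleys.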

Simulation of oscillatory network PCNN

from pyclustering.nnet.pcnn import pcnn_network, pcnn_visualizer

# Create Pulse-Coupled neural network with 10 oscillators.
net = pcnn_network(10)

# Perform simulation over 50 steps using a binary external stimulus.
dynamic = net.simulate(50, [1, 1, 1, 0, 0, 0, 0, 1, 1, 1])

# Allocate synchronous ensembles from the output dynamic.
ensembles = dynamic.allocate_sync_ensembles()

# Show output dynamic.
pcnn_visualizer.show_output_dynamic(dynamic, ensembles)

Simulation of chaotic neural network CNN

from pyclustering.cluster import cluster_visualizer
from pyclustering.samples.definitions import SIMPLE_SAMPLES
from pyclustering.utils import read_sample
from pyclustering.nnet.cnn import cnn_network, cnn_visualizer

# Load stimulus from file.
stimulus = read_sample(SIMPLE_SAMPLES.SAMPLE_SIMPLE3)

# Create a chaotic neural network; the number of neurons must equal the number of stimulus points.
network_instance = cnn_network(len(stimulus))

# Perform simulation over 100 steps.
steps = 100
output_dynamic = network_instance.simulate(steps, stimulus)

# Display output dynamic of the network.
cnn_visualizer.show_output_dynamic(output_dynamic)

# Display dynamic matrix and observation matrix to show clustering phenomenon.
cnn_visualizer.show_dynamic_matrix(output_dynamic)
cnn_visualizer.show_observation_matrix(output_dynamic)

# Visualize clustering results.
clusters = output_dynamic.allocate_sync_ensembles(10)
visualizer = cluster_visualizer()
visualizer.append_clusters(clusters, stimulus)
visualizer.show()

Illustrations

Cluster allocation on FCPS dataset collection by DBSCAN:

[Figure: Clustering by DBSCAN]

Cluster allocation by OPTICS using cluster-ordering diagram:

[Figure: Clustering by OPTICS]

Partial synchronization (clustering) in Sync oscillatory network:

[Figure: Partial synchronization in Sync oscillatory network]

Cluster visualization by SOM (Self-Organized Feature Map):

[Figure: Cluster visualization by SOM]

pyclustering's People

Contributors

abhishek792, adavidzh, alexeyreshetnyak, annoviko, bill2462, mxbonn, narinemanukyan, polladin, quintasan, romanimm, tirkarthi


pyclustering's Issues

[ants] Unit-tests should be implemented

  • Unit-tests should be implemented at least for the simplest samples from Sample/ (SampleSimple1, SampleSimple2, etc.).
  • The existing unit-tests should be completed.

[support] Separate dynamic representation

A new method should be created, or an existing one modified, to display dynamics separately. The separation should be configurable, for example, objects [0, 2, 6] on the first subplot and [1, 3, 5] on the second. If no configuration is specified, the dynamic of each object should be displayed separately.
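A minimal sketch of the requested behavior, assuming matplotlib and a dynamics layout where `dynamics[t][k]` is the state of object `k` at time `t`. The names `resolve_separation` and `show_separated_dynamics` are hypothetical, not existing pyclustering functions; the grouping logic is kept separate from plotting so it can be tested without a display.

```python
def resolve_separation(num_objects, separation=None):
    """Return the groups of object indices to draw on each subplot."""
    if separation is None:
        # Default from the issue: every object gets its own subplot.
        return [[index] for index in range(num_objects)]
    groups = [list(group) for group in separation]
    for group in groups:
        for index in group:
            if not 0 <= index < num_objects:
                raise IndexError(f"object index {index} is out of range")
    return groups


def show_separated_dynamics(times, dynamics, separation=None):
    """Plot dynamics[t][k] (state of object k at time t) on one subplot per group."""
    import matplotlib.pyplot as plt  # lazy import: grouping logic has no plotting dependency

    groups = resolve_separation(len(dynamics[0]), separation)
    _, axes = plt.subplots(len(groups), 1, sharex=True, squeeze=False)
    for axis, group in zip(axes[:, 0], groups):
        for index in group:
            axis.plot(times, [state[index] for state in dynamics], label=str(index))
        axis.legend(loc="upper right")
    axes[-1, 0].set_xlabel("time")
    plt.show()


# Usage for seven objects, matching the issue's example:
# show_separated_dynamics(times, dynamics, [[0, 2, 6], [1, 3, 5]])
```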

[antmeans] Examples don't run

It's not possible to run examples for antmeans algorithm: [antmeans.examples].

Traceback (most recent call last):
File "C:\Archive Files\PhD\PyClustering\antmeans\examples.py", line 10, in
import ant_clustering_with_mean as ant_clustering
ImportError: No module named 'ant_clustering_with_mean'

[nnet.plsom] Infinite loop

A hang (infinite loop) has been detected in the test 'testWinners(self)', which was written for PLSOM (tests.py) and is currently commented out.

Note: the implementation does not contradict the specification.

Fragment of the test:
sample = read_sample('../../../Samples/SampleSimple3.txt')
network = plsom(5, 5, sample, structure)
network.train()

[gcolor] Graph coloring using oscillatory network

A module for solving the graph coloring problem should be implemented. The first step is an algorithm based on an oscillatory neural network built on the Kuramoto model. Unit-tests and examples should also be implemented.
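The idea can be illustrated with a minimal pure-Python sketch, assuming a repulsive (negative-coupling) Kuramoto model: adjacent vertices push each other's phases apart, and vertices whose phases end up close share a color. The functions `simulate_repulsive_kuramoto` and `phases_to_colors`, the coupling strength, step size, step count and tolerance are hypothetical choices for illustration, not the API of pyclustering's gcolor module.

```python
import math
import random


def simulate_repulsive_kuramoto(adjacency, coupling=1.0, dt=0.05, steps=4000, seed=1):
    """Euler-integrate d(theta_i)/dt = K * sum_{j in N(i)} sin(theta_i - theta_j).

    The sign makes neighbouring oscillators repel each other, so adjacent
    vertices drift towards different phases. Natural frequencies are assumed
    equal and dropped (rotating reference frame).
    """
    rng = random.Random(seed)
    n = len(adjacency)
    theta = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    for _ in range(steps):
        derivative = [
            coupling * sum(math.sin(theta[i] - theta[j]) for j in adjacency[i])
            for i in range(n)
        ]
        theta = [theta[i] + dt * derivative[i] for i in range(n)]
    return theta


def phases_to_colors(phases, tolerance=0.3):
    """Group phases lying within `tolerance` radians of each other on the circle."""
    two_pi = 2.0 * math.pi
    wrapped = [p % two_pi for p in phases]
    order = sorted(range(len(wrapped)), key=lambda i: wrapped[i])
    colors = [0] * len(wrapped)
    color = 0
    for previous, current in zip(order, order[1:]):
        if wrapped[current] - wrapped[previous] > tolerance:
            color += 1  # a gap larger than the tolerance starts a new color
        colors[current] = color
    # The last group may continue the first one across the 0 / 2*pi seam.
    if len(order) > 1 and wrapped[order[0]] + two_pi - wrapped[order[-1]] <= tolerance:
        last = colors[order[-1]]
        colors = [0 if c == last else c for c in colors]
    return colors


# Triangle: every vertex is adjacent to the other two, so three colors are needed.
triangle = [[1, 2], [0, 2], [0, 1]]
print(phases_to_colors(simulate_repulsive_kuramoto(triangle)))
```

For a triangle the stable state spreads the three phases about 2π/3 apart, and for a path the neighbours settle in anti-phase, so the phase groups form a proper coloring in these small cases; larger graphs may settle into states that need more colors than optimal.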

[pyclustering.cluster.ema] Expectation maximization algorithm

Introduction
Clustering algorithm EMA (Expectation Maximization Algorithm) should be implemented. EMA is an iterative method to find maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models.

EMA should be based on GMM (Gaussian Mixture Model), which should also be implemented if it is not available in the 'scipy' library.

Resources

  1. Theory: http://idiom.ucsd.edu/~rlevy/pmsl_textbook/chapters/pmsl_3.pdf
  2. Youtube: https://www.youtube.com/watch?v=qMTuMa86NzU
  3. Article: http://www.ics.uci.edu/~smyth/courses/cs274/notes/EMnotes.pdf
  4. Bilmes, Jeff. "A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models".
  5. EM algorithm and variants: an informal tutorial by Alexis Roche.

Description
Expectation maximization algorithm should be implemented with doxygen documentation:
pyclustering/cluster/ema.py

Unit-tests:
pyclustering/cluster/tests/unit/ut_ema.py

Examples:
pyclustering/cluster/examples/ema_examples.py

An example of the interface of the EMA algorithm:

class ema:
    def __init__(self, data, **other_parameters):
        # Initialization of the algorithm instance.
        raise NotImplementedError

    def process(self):
        # Implementation.
        raise NotImplementedError

    def get_clusters(self):
        # Returns clusters.
        raise NotImplementedError

    def get_probabilities(self):
        # Returns belonging probabilities.
        raise NotImplementedError
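To make the expectation and maximization steps concrete, here is a pure-Python sketch of EM for a one-dimensional Gaussian mixture. The function names, the deterministic initialization (means spread over the sorted data), the variance floor and the fixed iteration count are assumptions for this illustration; it is not pyclustering's ema implementation. The responsibilities play the role of `get_probabilities`, and the hard assignment derived from them plays the role of `get_clusters`.

```python
import math


def gaussian_pdf(x, mean, variance):
    return math.exp(-((x - mean) ** 2) / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


def em_gmm_1d(data, k, iterations=100, variance_floor=1e-6):
    """Fit a k-component 1-D Gaussian mixture with expectation maximization."""
    n = len(data)
    ordered = sorted(data)
    # Deterministic start: means at evenly spaced quantiles, shared variance, equal weights.
    means = [ordered[(2 * c + 1) * n // (2 * k)] for c in range(k)]
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n or 1.0
    variances = [overall_var] * k
    weights = [1.0 / k] * k

    responsibilities = []
    for _ in range(iterations):
        # E-step: posterior probability that component c generated point x.
        responsibilities = []
        for x in data:
            joint = [weights[c] * gaussian_pdf(x, means[c], variances[c]) for c in range(k)]
            total = sum(joint) or 1e-300
            responsibilities.append([p / total for p in joint])

        # M-step: re-estimate weights, means and variances from the responsibilities.
        for c in range(k):
            n_c = sum(r[c] for r in responsibilities)
            if n_c < 1e-12:
                continue  # empty component: keep its previous parameters
            means[c] = sum(r[c] * x for r, x in zip(responsibilities, data)) / n_c
            variances[c] = (
                sum(r[c] * (x - means[c]) ** 2 for r, x in zip(responsibilities, data)) / n_c
                + variance_floor
            )
            weights[c] = n_c / n
    return means, variances, weights, responsibilities


def clusters_from_responsibilities(responsibilities, k):
    """Hard assignment: each point goes to the component with the highest probability."""
    clusters = [[] for _ in range(k)]
    for index, probs in enumerate(responsibilities):
        clusters[max(range(k), key=lambda c: probs[c])].append(index)
    return clusters


data = [0.0, 0.2, 0.4, 0.1, 0.3, 10.0, 10.2, 10.4, 10.1, 10.3]
means, variances, weights, probabilities = em_gmm_1d(data, 2)
print(clusters_from_responsibilities(probabilities, 2))
```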

[dbscan] Final number of objects can be less

The final number of objects allocated into clusters (including points marked as noise) can be less than the real number of points.

Test scenario:
dbscan SampleSimple1.txt 0.5 4

  • templateLengthProcessData('../Samples/SampleSimple1.txt', 0.5, 0, 10);
  • templateLengthProcessData('../Samples/SampleSimple3.txt', 1, 0, 20);
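The invariant behind this issue (every input point must end up either in a cluster or in the noise list, exactly once) can be checked mechanically. Below, a small pure-Python DBSCAN sketch is paired with such a check; `dbscan_sketch`, `all_points_assigned`, and the convention that the neighbor count includes the point itself are assumptions for illustration, not pyclustering's dbscan implementation. One typical source of such bugs is a border point that is first marked as noise and later reached from a core point; the sketch reassigns it to the cluster.

```python
import math


def dbscan_sketch(points, eps, min_neighbors):
    """Return (clusters, noise) as lists of point indices."""
    n = len(points)

    def neighbors(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    labels = [None] * n  # None = unvisited, -1 = noise, >= 0 = cluster id
    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_neighbors:
            labels[i] = -1  # may still become a border point of a later cluster
            continue
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_neighbors:
                queue.extend(j_neighbors)  # only core points expand the cluster
        cluster_id += 1
    clusters = [[i for i in range(n) if labels[i] == c] for c in range(cluster_id)]
    noise = [i for i in range(n) if labels[i] == -1]
    return clusters, noise


def all_points_assigned(n, clusters, noise):
    """True if clusters plus noise cover indices 0..n-1 exactly once."""
    assigned = [i for cluster in clusters for i in cluster] + list(noise)
    return sorted(assigned) == list(range(n))


points = [(0, 0), (0, 0.3), (0.3, 0), (0.3, 0.3),
          (5, 5), (5, 5.3), (5.3, 5), (5.3, 5.3), (10, 0)]
clusters, noise = dbscan_sketch(points, 0.5, 3)
print(clusters, noise, all_points_assigned(len(points), clusters, noise))
```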

[nnet.plsom] Adaptation of unit-tests

Several unit-tests are not applicable to the PLSOM algorithm because they were written for testing SOM. PLSOM is implemented in line with the article.

[samples] special format for graphs

A special format should be used for graph representation.

Mandatory fields:

  • Graph description (matrix or list representation).

Optional fields:

  • Graph representation in the space.
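One possible shape for such a format, shown as a minimal sketch: a plain-text file with a mandatory `[matrix]` section holding the adjacency matrix and an optional `[coordinates]` section with one position per vertex. The section names, the file layout, and the functions `write_graph`/`read_graph` are hypothetical; only the matrix form of the mandatory description is covered, and an adjacency-list section could be added the same way.

```python
def write_graph(matrix, coordinates=None):
    """Serialize an adjacency matrix and optional vertex coordinates to text."""
    lines = ["[matrix]"]
    lines += [" ".join(str(value) for value in row) for row in matrix]
    if coordinates is not None:
        lines.append("[coordinates]")
        lines += [" ".join(repr(float(c)) for c in point) for point in coordinates]
    return "\n".join(lines) + "\n"


def read_graph(text):
    """Parse the text format; returns (matrix, coordinates or None)."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections[current] = []
            continue
        if current is None:
            raise ValueError("data found before any section header")
        sections[current].append(line.split())

    if "matrix" not in sections:
        raise ValueError("mandatory [matrix] section is missing")
    matrix = [[int(value) for value in row] for row in sections["matrix"]]
    if any(len(row) != len(matrix) for row in matrix):
        raise ValueError("adjacency matrix must be square")

    coordinates = None
    if "coordinates" in sections:  # optional field
        coordinates = [[float(value) for value in row] for row in sections["coordinates"]]
        if len(coordinates) != len(matrix):
            raise ValueError("exactly one coordinate row per vertex is required")
    return matrix, coordinates


text = write_graph([[0, 1, 1], [1, 0, 1], [1, 1, 0]], [[0.0, 0.0], [1.0, 0.0], [0.5, 0.8]])
print(text)
```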

[nnet.legion] Self-oscillation without stimulations

The problem is that the noise is positive, but it should be negative. However, in that case a proper configuration for the examples has to be found.

Otherwise the network works properly, except for the self-oscillations, which contradict the articles. The fix will be committed once a proper configuration is found.
