r-mussabayev / flakylib

Flaky Clustering Library (Minimum Sum-Of-Squares Clustering)

License: MIT License

Languages: Python 6.19%, Jupyter Notebook 93.81%

Topics: k-means, clustering, k-means-pp, mssc, centroid, clustering-algorithm, minimum-sum-of-squares-clustering, local-search, global-search, variable-neighborhood-search

flakylib

Clustering is one of the main methods for gaining insight into the underlying nature and structure of data in an unsupervised way. The purpose of clustering is to organize a set of data into clusters such that the elements in each cluster are similar to one another and different from those in other clusters.

K-means is one of the most widely used and fastest clustering algorithms. Today, K-means denotes a large family of algorithms aimed at solving the Minimum Sum-of-Squares Clustering (MSSC) problem. FlakyLib is a Python library of K-means family algorithms, all aimed at solving the MSSC problem and optimized for Big Data clustering. Most of the algorithms in FlakyLib are parallelized and optimized for high-performance computing with Numba, which translates Python functions to optimized machine code at runtime using the industry-standard LLVM compiler library.
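To give a feel for this kind of Numba parallelization, here is a minimal sketch of the K-means assignment step (finding the nearest centroid for every point). It is not FlakyLib's actual code: the function name `assign_labels` and its signature are hypothetical, and the fallback branch exists only so the sketch still runs, slowly, when Numba is not installed.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:  # assumption: fall back to plain Python if Numba is missing
    prange = range

    def njit(*args, **kwargs):
        def wrap(func):
            return func
        return wrap


@njit(parallel=True)
def assign_labels(points, centroids):
    """Hypothetical helper: nearest centroid and squared distance per point."""
    n, d = points.shape
    k = centroids.shape[0]
    labels = np.empty(n, dtype=np.int64)
    sq_dists = np.empty(n, dtype=np.float64)
    # Each point is independent, so the outer loop can run in parallel.
    for i in prange(n):
        best_j = 0
        best = np.inf
        for j in range(k):
            dist = 0.0
            for t in range(d):
                diff = points[i, t] - centroids[j, t]
                dist += diff * diff
            if dist < best:
                best = dist
                best_j = j
        labels[i] = best_j
        sq_dists[i] = best
    return labels, sq_dists
```

Summing the returned `sq_dists` gives the sum-of-squares value of the current solution. The explicit loops avoid building an n-by-k distance matrix, which matters for memory when n is in the millions.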

When solving large problems, the standard K-means technique has two major drawbacks: (i) it has to process the entire large input dataset, so its computational cost is heavy, and (ii) it tends to converge to a local minimum of poor quality. To reduce the computational cost, FlakyLib collects clustering techniques that use parallelization together with memory and computational optimizations. To avoid convergence to poor local minima, various Variable Neighbourhood Search (VNS) techniques are used. With FlakyLib it is possible to cluster tens or hundreds of millions of entities in a reasonable amount of time with efficient memory usage.
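For reference, the "local minima" above are local minima of the standard MSSC objective, which can be written as follows. The symbols (data points $x_i$, centroids $c_j$, counts $n$ and $k$) are notation chosen here, not taken from FlakyLib.

```latex
% Standard MSSC formulation: choose k centroids in R^d that minimize
% the total squared Euclidean distance from each of the n points
% to its nearest centroid.
\min_{c_1,\dots,c_k \in \mathbb{R}^d} \; f(c_1,\dots,c_k)
  = \sum_{i=1}^{n} \; \min_{1 \le j \le k} \, \lVert x_i - c_j \rVert_2^2
```

K-means never increases $f$ from one iteration to the next, but because $f$ is non-convex, the point where it stops depends on the initial centroids.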

In most cases the naive K-means algorithm does not provide the best possible solution, because it gets stuck in the nearest local minimum. Using VNS-based approaches (metaheuristics), it is possible to push K-means out of such minima and improve solution quality over successive iterations. VNS thus lets us turn local search algorithms (like naive K-means) into global ones.
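The idea can be sketched in a few lines of NumPy. This is an illustrative basic VNS scheme under stated assumptions, not FlakyLib's implementation: the names `kmeans` and `vns_kmeans`, their parameters, and the shaking move (relocating `p` centroids onto random data points) are all choices made for this example.

```python
import numpy as np


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm as the local search; returns (centroids, objective)."""
    centroids = centroids.copy()
    for _ in range(max_iter):
        # Squared distances from every point to every centroid, shape (n, k).
        d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(len(centroids)):
            members = points[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids, d2.min(axis=1).sum()


def vns_kmeans(points, k, max_neighborhood=3, rounds=20, seed=0):
    """Basic VNS around K-means (hypothetical sketch, not FlakyLib's API)."""
    rng = np.random.default_rng(seed)
    init = points[rng.choice(len(points), size=k, replace=False)]
    best_centroids, best_obj = kmeans(points, init)
    for _ in range(rounds):
        p = 1
        while p <= min(max_neighborhood, k):
            # Shaking: move p randomly chosen centroids onto random data points.
            candidate = best_centroids.copy()
            moved = rng.choice(k, size=p, replace=False)
            candidate[moved] = points[rng.choice(len(points), size=p, replace=False)]
            # Local search from the perturbed solution.
            candidate, obj = kmeans(points, candidate)
            if obj < best_obj - 1e-12:
                best_centroids, best_obj = candidate, obj
                p = 1  # improvement: return to the smallest neighbourhood
            else:
                p += 1  # no improvement: try a larger neighbourhood
    return best_centroids, best_obj
```

Because a perturbed solution is accepted only when it lowers the objective, the result is never worse than the plain K-means run it starts from. Calling `vns_kmeans(points, k=4)` on a float array of shape `(n, d)` returns the best centroids found and their sum-of-squares value.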
