hongjea-park / robust_em_for_gmm

MS Yang, A robust EM clustering algorithm for Gaussian mixture models, Pattern Recognit., 45 (2012), pp. 3950-3961

Python 99.44% Dockerfile 0.56%
gaussian-mixture-models gmm-clustering pattern-recognition unsupervised-clustering

robust_em_for_gmm's Introduction

A Robust EM Clustering Algorithm for Gaussian Mixture Models

Description

Python implementation of the robust EM clustering algorithm for Gaussian mixture models [1]. See the paper for more detail.


  • robustgmm.robustgmm

    Robust GMM with a scikit-learn-style API.

  • robustgmm.generator

    Generator for synthetic data drawn from a mixture of Gaussians.


For usage details, see the example below or paper_example.py.

  • Reference

    [1] MS Yang, A robust EM clustering algorithm for Gaussian mixture models, Pattern Recognit., 45 (2012), pp. 3950-3961


Install

  1. Install from PyPI

    pip install robustgmm
  2. Install from GitHub

    pip install git+https://github.com/HongJea-Park/robust_EM_for_gmm.git

Example

All examples reproduce the experiments in the paper so that the results can be compared.

# For more detail, refer to ./test/paper_example.py
import numpy as np

from robustgmm import RobustGMM
from robustgmm import Generator_Multivariate_Normal


# Generate data from two multivariate normal distributions with a fixed random seed
np.random.seed(0)
real_means = np.array([[.0, .0], [20, .0]])
real_covs = np.array([[[1, .0], [.0, 1]],
                      [[9, .0], [.0, 9]]])
mix_prob = np.array([.5, .5])
generator = Generator_Multivariate_Normal(means=real_means,
                                          covs=real_covs,
                                          mix_prob=mix_prob)
X = generator.get_sample(800)

# GMM using robust EM Algorithm
rgmm = RobustGMM()
rgmm.fit(X)
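
To make the data-generation step above concrete, here is a NumPy-only sketch of sampling from the same two-component mixture. It does not use robustgmm and is not the package's generator: `sample_gaussian_mixture` is a hypothetical helper, and its seeding and return values are assumptions made for illustration.

```python
import numpy as np


def sample_gaussian_mixture(n_samples, means, covs, mix_prob, seed=0):
    """Draw n_samples points from a Gaussian mixture (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)
    # Pick a component index for every sample according to the mixing weights.
    labels = rng.choice(len(mix_prob), size=n_samples, p=mix_prob)
    X = np.empty((n_samples, means.shape[1]))
    for k in range(len(mix_prob)):
        idx = np.flatnonzero(labels == k)
        # Draw all points assigned to component k in one call.
        X[idx] = rng.multivariate_normal(means[k], covs[k], size=idx.size)
    return X, labels


# Same parameters as the example above.
real_means = np.array([[0.0, 0.0], [20.0, 0.0]])
real_covs = np.array([[[1.0, 0.0], [0.0, 1.0]],
                      [[9.0, 0.0], [0.0, 9.0]]])
mix_prob = np.array([0.5, 0.5])

X, labels = sample_gaussian_mixture(800, real_means, real_covs, mix_prob)
print(X.shape)
```

Returning the component labels alongside the samples makes it easy to compare a fitted clustering against the ground truth.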

Figures for each example in the paper

  1. Example 1

    example1-1 example1-2

  2. Example 2

    example2-1-1 example2-1-2 example2-2-1 example2-2-2

  3. Example 3

    example3-1 example3-2

  4. Example 4

    example4

  5. Example 5

    example5-1 example5-2

  6. Example 6

    example6-1 example6-2

  7. Example 7

    example7

  8. Computational time cost

    timecost

robust_em_for_gmm's People

Contributors

hongjea-park

robust_em_for_gmm's Issues

Problem updating the hidden variable when the dataset contains duplicates

The data I am using was collected in the field and contains many identical values, whereas the data in the provided code examples is generated by sampling, so every data point is distinct. My initialization in this case is as follows: I first use np.unique to remove duplicate data points and use the remaining points as the initial means, and I set the initial number of clusters to the number of those means. I initialize each mixing coefficient as the frequency of the corresponding unique point divided by the total number of data points. While the program runs, a problem occurs when updating the hidden variable z: min(self.z_.sum(axis=1)) = 0, that is, some data points in the dataset are not assigned to any Gaussian component.
I look forward to your help in solving this problem. Thank you!
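
The sketch below illustrates the initialization described in this issue and one generic way to keep responsibility rows from summing to zero. It is not taken from the repository: the function names are hypothetical, and it has not been confirmed that underflow is what causes the zero row sums reported here. It only shows that evaluating the E-step in log space (the log-sum-exp trick) yields rows that sum to 1 even when every component density underflows to 0.0.

```python
import numpy as np


def init_from_unique_points(X):
    """Initialization described in the issue: unique points become the means."""
    means, counts = np.unique(X, axis=0, return_counts=True)
    mix_prob = counts / counts.sum()  # frequency of each unique point
    return means, mix_prob


def log_gaussian(X, mean, cov):
    """Log density of N(mean, cov) at each row of X, via a Cholesky factor."""
    d = X.shape[1]
    chol = np.linalg.cholesky(cov)
    # Solve L y = (x - mean)^T; ||y||^2 is the squared Mahalanobis distance.
    y = np.linalg.solve(chol, (X - mean).T)
    log_det = 2.0 * np.log(np.diag(chol)).sum()
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + (y ** 2).sum(axis=0))


def log_weighted_densities(X, means, covs, mix_prob):
    """Matrix of log(alpha_k) + log N(x_i | mu_k, Sigma_k), shape (n, c)."""
    return np.stack([np.log(w) + log_gaussian(X, m, c)
                     for w, m, c in zip(mix_prob, means, covs)], axis=1)


def responsibilities(X, means, covs, mix_prob):
    """E-step in log space; every row of the result sums to 1."""
    log_p = log_weighted_densities(X, means, covs, mix_prob)
    # Subtract the row maximum before exponentiating (log-sum-exp trick).
    log_p = log_p - log_p.max(axis=1, keepdims=True)
    z = np.exp(log_p)
    return z / z.sum(axis=1, keepdims=True)


# Toy data with a repeated row and one far-away point.
X = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [500.0, 0.0]])
means, mix_prob = init_from_unique_points(X)

# Assumption for the demo: only the first two components remain, with tight
# covariances, so the far-away point is extremely unlikely under both.
kept_means = means[:2]
kept_prob = mix_prob[:2] / mix_prob[:2].sum()
kept_covs = np.stack([1e-2 * np.eye(2)] * 2)

# Naive E-step numerator: the densities underflow to 0.0 for the far point.
naive = np.exp(log_weighted_densities(X, kept_means, kept_covs, kept_prob))
z = responsibilities(X, kept_means, kept_covs, kept_prob)

print(naive.sum(axis=1).min())
print(z.sum(axis=1).min())
```

In the naive computation the far-away point has a row sum of exactly zero, which matches the symptom in the issue; the log-space version assigns that point to its nearest remaining component instead.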