jameshensman / gpclust

This project forked from sheffieldml/gpclust

Clustering time series using Gaussian processes and Variational Bayes.

License: GNU General Public License v3.0

Python 1.72% Jupyter Notebook 98.28%

gpclust's Introduction

GPclust

Clustering time series using Gaussian processes and variational Bayes.

User guide and tutorials are available via the included notebooks.

Currently implemented models are

  • MOG - Mixture of Gaussians
  • MOHGP - Mixtures of Hierarchical Gaussian processes
  • OMGP - Overlapping mixtures of Gaussian processes

Citation

The underlying algorithm is based on the 2012 NIPS paper:

http://books.nips.cc/papers/files/nips25/NIPS2012_1314.pdf

@article{hensman2012fast,
  title={Fast variational inference in the conjugate exponential family},
  author={Hensman, James and Rattray, Magnus and Lawrence, Neil D},
  journal={Advances in Neural Information Processing Systems},
  year={2012}
}

The code also implements clustering of Hierarchical Gaussian Processes using that inference framework, detailed in the following two works:

http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6802369

@article{hensman2014fast,
  author={Hensman, J. and Rattray, M. and Lawrence, N.},
  journal={Pattern Analysis and Machine Intelligence, IEEE Transactions on},
  title={Fast nonparametric clustering of structured time-series},
  year={2014},
  volume={PP},
  number={99},
  keywords={Biological system modeling;Computational modeling;Data models;Gaussian processes;Optimization;Time series analysis},
  doi={10.1109/TPAMI.2014.2318711},
  ISSN={0162-8828}
}

http://www.biomedcentral.com/1471-2105/14/252

@article{hensman2013hierarchical,
  title={Hierarchical Bayesian modelling of gene expression time series across irregularly sampled replicates and clusters},
  author={Hensman, James and Lawrence, Neil D and Rattray, Magnus},
  journal={BMC bioinformatics},
  volume={14},
  number={1},
  pages={1--12},
  year={2013},
  publisher={BioMed Central}
}

Additionally, the Overlapping Mixtures of Gaussian Processes model is implemented (using the variational methods described above). The model was published in this paper:

@article{Lazaro-Gredilla2012,
  title = {{Overlapping Mixtures of Gaussian Processes for the data association problem}},
  author = {L{\'{a}}zaro-Gredilla, Miguel and {Van Vaerenbergh}, Steven and Lawrence, Neil D.},
  doi = {10.1016/j.patcog.2011.10.004},
  journal = {Pattern Recognition},
  month = {apr},
  number = {4},
  pages = {1386--1395},
  url = {},
  volume = {45},
  year = {2012}
}

Dependencies

This work depends on the GPy project, as well as the numpy/scipy stack. matplotlib is optional for plotting.

I've tested the demos with GPy v0.8, but it should work with later versions also.

Contributors

  • James Hensman
  • Valentine Svensson
  • Max Zwiessele

gpclust's Issues

MOHGP time-points

Hey @jameshensman, congrats on the awesome work.

I'm interested in clustering a bunch of time-series, which aren't necessarily sampled at the same time-points. Reading the papers, I got the feeling that the model can handle this type of data, but the MOHGP class only accepts a Tx1 array for the time-points, which seems to mean that the time-series have to be sampled at the same time points.

From the drosophila example what I understood is that each row of Y contains the concatenated replicates for each gene (and these replicates aren't necessarily sampled at the same time-points), but then the n-th replicate of every gene has the same time-points, i.e., all genes are sampled at the same time. Is that correct?
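The layout described in this question can be sketched with plain numpy. This is only an illustration of the question's reading of the drosophila example, not confirmed behaviour of the MOHGP class. The replicate time-points, the number of genes and the random expression values are all hypothetical placeholders.

```python
import numpy as np

# Hypothetical sampling times for two replicates (not taken from the drosophila data).
t_rep1 = np.array([0.0, 1.0, 2.0, 4.0])
t_rep2 = np.array([0.5, 1.5, 3.0])

# One shared time-point array: the replicates are concatenated into a T x 1 column.
X = np.concatenate([t_rep1, t_rep2])[:, None]

n_genes = 5
rng = np.random.RandomState(0)

# Placeholder expression values, one N x T_r block per replicate.
Y_rep1 = rng.randn(n_genes, t_rep1.size)
Y_rep2 = rng.randn(n_genes, t_rep2.size)

# Each row of Y is one gene, with its replicates laid side by side,
# so column j of Y was measured at time X[j] for every gene.
Y = np.hstack([Y_rep1, Y_rep2])
```

Under this layout the replicates may differ from one another in their time-points, but every gene shares the same X, which is the restriction the question asks about.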

cython & python3

Some of the fast code in GPclust uses scipy.weave, which does not support Python 3.

A cython replacement would make 2/3 compatibility easier, at the cost of a more difficult install.

remove tango dependency

GPclust uses GPy's ancient color scheme Tango.

This should be removed and a sensible default style from matplotlib should be used.

global name 'N1' is not defined

Hi James,

Trying to use your package on some gene expression data. When I try to run your example notebook for MOHGP, I get an error at m.optimize():

NameError: global name 'N1' is not defined

Indeed looking at the code in np_utilities.py it does look like N1 is not defined, but I assume this should be working. Any ideas? Full error trace below:

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-22-8066ee77b06f> in <module>()
  3 
  4 m = GPclust.MOHGP(X, k_underlying, k_corruption, Y, K=10, prior_Z='DP', alpha=1.0)
----> 5 m.optimize()
  6 m.systematic_splits(verbose=False)

/usr/local/lib/python2.7/dist-packages/GPclust-0.1.0-py2.7.egg/GPclust/collapsed_vb.pyc in optimize(self, method, maxiter, ftol, gtol, step_length, callback, verbose)
 90                 callback()
 91 
 ---> 92             grad,natgrad = self.vb_grad_natgrad()
 93             grad,natgrad = -grad,-natgrad
 94             squareNorm = np.dot(natgrad,grad) # used to monitor convergence

/usr/local/lib/python2.7/dist-packages/GPclust-0.1.0-py2.7.egg/GPclust/MOHGP.pyc in vb_grad_natgrad(self)
142         #yn_mk = self.Y[:,:,None] - self.muk[None,:,:]
143         #ynmk2 = np.sum(np.dot(self.Sy_inv, yn_mk) * np.rollaxis(yn_mk,0,2),0)
--> 144         ynmk2 = multiple_mahalanobis(self.Y, self.muk.T, self.Sy_chol)
145 
146         grad_phi = (self.mixing_prop_bound_grad() -

/usr/local/lib/python2.7/dist-packages/GPclust-0.1.0-py2.7.egg/GPclust/np_utilities.pyc in multiple_mahalanobis_numpy_loops(X1, X2, L)
 43 def multiple_mahalanobis_numpy_loops(X1, X2, L):
 44     LLT = L.dot(L.T)
---> 45     result = np.zeros(shape=(N1, N2), dtype=np.float64)
 46     n = 0
 47     while n < N1:

NameError: global name 'N1' is not defined
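
The trace shows `N1` and `N2` being used in `multiple_mahalanobis_numpy_loops` without ever being assigned. A minimal sketch of one possible fix follows, assuming the function is meant to return the squared Mahalanobis distance between every row of `X1` and every row of `X2`, with `L` the lower Cholesky factor of the covariance. Only the first lines of the function appear in the trace, so the loop body here is a guess at the intent and not the package's actual code.

```python
import numpy as np

def multiple_mahalanobis_numpy_loops(X1, X2, L):
    """Squared Mahalanobis distance between each row of X1 and each row of X2.

    Assumes L is the lower Cholesky factor of the covariance, so the
    covariance itself is L L^T.
    """
    # Derive the sizes from the inputs; these are the names reported as undefined.
    N1, N2 = X1.shape[0], X2.shape[0]
    LLT = L.dot(L.T)
    result = np.zeros(shape=(N1, N2), dtype=np.float64)
    for n in range(N1):
        diff = (X1[n] - X2).T                 # shape (D, N2): one column per row of X2
        solved = np.linalg.solve(LLT, diff)   # (L L^T)^{-1} applied to each column
        result[n] = np.sum(diff * solved, axis=0)
    return result
```

With an identity `L` this reduces to the squared Euclidean distance, which gives a quick sanity check for the output shape `(N1, N2)`.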
