- Mallet Extension
The Mallet package contains only two topic models: LDA and Hierarchical LDA.
So I implemented some additional topic modeling methods on top of it.
Model:
- Hierarchical Dirichlet Process with Gibbs sampling (in the HDP folder).
- Inference part for hLDA (in the hLDA folder).
Usage:
- This is an extension for Mallet, so you need to have Mallet's source code first.
- Put HDP.java, HDPInferencer.java, and HierarchicalLDAInferencer.java in the src/cc/mallet/topics folder.
- If you are going to run HDP, make sure you include the knowceans package in your project.
- Running HDPTest.java or hLDATest.java gives you a demo on a small dataset in the data folder.
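The copy step above can be scripted. The following is only a sketch: the function name `install_extension` and both directory arguments are hypothetical, and it assumes the three source files sit somewhere under the extension checkout (it searches for them by name rather than assuming which folder holds which file).

```python
import shutil
from pathlib import Path

# File names taken from the usage list above.
EXTENSION_FILES = ["HDP.java", "HDPInferencer.java", "HierarchicalLDAInferencer.java"]


def install_extension(extension_root, mallet_root):
    """Copy the extension sources into Mallet's topics package.

    Both arguments are hypothetical paths: the checkout of this extension
    and the root of the Mallet source tree. Returns the copied file paths.
    """
    target = Path(mallet_root) / "src" / "cc" / "mallet" / "topics"
    if not target.is_dir():
        raise FileNotFoundError("Mallet topics folder not found: %s" % target)
    copied = []
    for name in EXTENSION_FILES:
        # Search recursively so the exact sub-folder does not matter.
        matches = sorted(Path(extension_root).rglob(name))
        if not matches:
            raise FileNotFoundError("%s not found under %s" % (name, extension_root))
        copied.append(Path(shutil.copy2(matches[0], target / name)))
    return copied
```

Calling `install_extension("path/to/this/repo", "path/to/mallet")` with your own paths would place the three files next to Mallet's existing topic models; the knowceans dependency still has to be added to the project separately.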
References:
- Mallet: http://mallet.cs.umass.edu/
- knowceans: http://sourceforge.net/projects/knowceans/
- HDP paper: http://www.cs.berkeley.edu/~jordan/papers/hdp.pdf
- HDP paper & source code: "Implementing the HDP with minimum code complexity" by Gregor Heinrich
- Scikit-learn Extension
Scikit-learn doesn't have any topic models yet, so I modified Matthew D. Hoffman's onlineldavb into scikit-learn format.
Model:
- Online LDA with variational EM (in the LDA folder).
Usage:
- Make sure numpy, scipy, and scikit-learn are installed.
- Run python test in the lda folder for the unit tests.
- The onlineLDA model is in lda.py.
- For a quick example, running python lda_example.py online will fit a 10-topic model on the 20 Newsgroups dataset. online means the online update (the partial_fit method) is used. Changing online to batch will fit the model with the batch update (the fit method).
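The difference between the two modes can be sketched as follows. This is not the code in lda_example.py: `RecordingModel` is a hypothetical stand-in that only records which method was called, and `train` and `batch_size` are illustrative names. The only parts taken from the description above are that online maps to partial_fit and batch maps to fit.

```python
class RecordingModel:
    """Hypothetical stand-in for the model in lda.py; it only records calls."""

    def __init__(self):
        self.calls = []

    def fit(self, X):
        # Batch update: the model sees the whole corpus at once.
        self.calls.append(("fit", len(X)))
        return self

    def partial_fit(self, X):
        # Online update: the model sees one mini-batch per call.
        self.calls.append(("partial_fit", len(X)))
        return self


def train(model, docs, mode, batch_size=2):
    """Train with a single fit call ("batch") or mini-batch updates ("online")."""
    if mode == "batch":
        model.fit(docs)
    elif mode == "online":
        for start in range(0, len(docs), batch_size):
            model.partial_fit(docs[start:start + batch_size])
    else:
        raise ValueError("mode must be 'online' or 'batch', got %r" % (mode,))
    return model
```

With a real estimator in place of `RecordingModel`, the online path lets the corpus be streamed in chunks, while the batch path needs all documents in memory for each pass.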
Note:
I updated the current code based on the current scikit-learn development branch. Older scikit-learn versions will throw errors, but I am pretty sure anyone can fix this with a little alteration. (2014/09/15)
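Since the code depends on which scikit-learn version is installed, it can help to print the installed versions before running anything. The sketch below uses only the Python standard library (Python 3.8 or later); the helper name `check_dependencies` is hypothetical, and no minimum version is checked because the note above does not name one.

```python
import importlib.util
from importlib import metadata

# Import name -> distribution name for the three required packages.
REQUIRED = {"numpy": "numpy", "scipy": "scipy", "sklearn": "scikit-learn"}


def check_dependencies(required=REQUIRED):
    """Return {distribution name: version string, or None if not importable}."""
    report = {}
    for module_name, dist_name in required.items():
        if importlib.util.find_spec(module_name) is None:
            report[dist_name] = None  # module cannot be imported
            continue
        try:
            report[dist_name] = metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            report[dist_name] = "unknown"  # importable, but no version metadata
    return report


for name, version in check_dependencies().items():
    print(name, version if version else "NOT INSTALLED")
```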
References:
- Scikit-learn: http://scikit-learn.org
- onlineLDA: http://www.cs.princeton.edu/~mdhoffma/code/onlineldavb.tar
- online LDA paper: http://www.cs.princeton.edu/~blei/papers/HoffmanBleiBach2010b.pdf