mallet-lda

This project is a minimal Clojure wrapper over the LDA topic modeling implementation from MALLET, the MAchine Learning for LanguagE Toolkit.

Installation

The latest stable release is 0.1.1.

Add this to the :dependencies vector in your Leiningen project.clj:

[marcliberatore.mallet-lda "0.1.1"]

Feedback and pull requests welcome!

Usage

(ns example
  (:require [marcliberatore.mallet-lda :refer [make-instance-list lda]]
            [marcliberatore.mallet-lda.misc :refer [load-sample-documents]]))

;; ...

;; For example, each document is an [id tokens] pair:
(let [data [[1 ["a" "little" "lamb"]]
            [2 ["row" "your" "boat"]]
            ;; ...
            ]
      instance-list (make-instance-list data)]
  (lda instance-list))

;; Or, if you're working in a cloned copy of this repository:
(lda (make-instance-list (load-sample-documents)))
;; (The above won't work with the JAR, as it does not include the sample documents.)
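
As a minimal sketch of preparing your own data, assuming the [id tokens] pairs shown above are the shape make-instance-list expects: the helpers tokenize and docs->data below are hypothetical and not part of mallet-lda. They turn raw strings into that shape using only clojure.string.

```clojure
(ns example.prepare
  (:require [clojure.string :as str]))

;; Hypothetical helper, not part of mallet-lda: lower-case a string and
;; split it into word tokens on runs of non-letter characters.
(defn tokenize [text]
  (->> (str/split (str/lower-case text) #"[^\p{L}]+")
       (remove str/blank?)
       vec))

;; Hypothetical helper: pair each document with a 1-based id, yielding
;; vectors shaped like [1 ["a" "little" "lamb"]].
(defn docs->data [texts]
  (vec (map-indexed (fn [i text] [(inc i) (tokenize text)]) texts)))

(docs->data ["Mary had a little lamb." "Row, row, row your boat!"])
```

The resulting vector can then be passed to make-instance-list in the same way as the literal data in the example above.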

Changelog

  • 0.1.1

    • Added the marcliberatore.mallet-lda.misc namespace, which contains functions to load data into the format expected by mallet-lda.
    • Added sample data in resources; see marcliberatore.mallet-lda.misc/load-sample-documents for an example of how to load this data.
  • 0.1.0

    • Initial release.

Sample Documents

The data in resources/sample-data is the web dataset from MALLET.

This sample data includes the text of 24 "featured articles" from Wikipedia, 12 from the English version, and 12 from the German version. They were retrieved in December 2008. The text is in UTF-8 encoding.

TODO

Write an idiomatic wrapper over the return value of (lda).
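
One possible shape for such a wrapper, sketched under assumptions: (lda) is assumed to return MALLET's ParallelTopicModel, whose getTopWords method returns a two-dimensional Object array with one row of words per topic. The function name top-words->maps is hypothetical. It works on the raw array, so it can be tried without MALLET on the classpath.

```clojure
(ns example.wrapper)

;; Hypothetical helper, not part of mallet-lda: convert a two-dimensional
;; array of top words (one row per topic) into a vector of Clojure maps.
(defn top-words->maps [top-words]
  (vec (map-indexed (fn [topic words]
                      {:topic topic :words (mapv str words)})
                    top-words)))

;; The array built here is a stand-in for (.getTopWords model 3),
;; assuming model is a ParallelTopicModel.
(top-words->maps (to-array-2d [["lamb" "little" "a"] ["boat" "row" "your"]]))
```

Returning plain maps and vectors keeps the Java arrays out of calling code, which is one reasonable reading of "idiomatic" here; the actual wrapper may look different.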

License

Copyright © 2013 Marc Liberatore

Distributed under the Eclipse Public License, the same as Clojure.

Contributors

marcliberatore, shark8me

mallet-lda's Issues

Feedback: great job, and I'm willing to contribute at some point in the near future.

I found your library a couple of weeks ago through your blog, when I was looking for LDA libraries, preferably written in Clojure.

For Clojure there were only two projects: yours and the deprecated chisel.
And your library is listed on VersionEye, which makes it very easy to track new versions and dependencies.

I decided to add more functionality to make it more suitable for my next project. You can check out the current progress on my fork.

I'm currently working on model serialization, and I'm also going to add some simple tests this weekend.

Is there better documentation for MALLET? Or do researchers at Amherst also have to use that ugly JavaDoc?

I'm not a researcher myself; all my knowledge about LDA comes from this paper here. With my limited knowledge, it is quite difficult to decipher what some methods are doing and to write user-friendly docstrings.

Should I start bombarding the mailing list, question by question, until they see the benefits of having decent documentation? :D Sounds like a great hobby.

OK, back to hacking.

Throws an error on load-sample-documents

I tried to run lda on the sample documents with the code example given in the README:

(ns mallet-topic-model.core
  (:require [marcliberatore.mallet-lda :refer [make-instance-list lda]]
            [marcliberatore.mallet-lda.misc :refer [load-sample-documents]]))

(lda (make-instance-list (load-sample-documents)))

I got the following error:

Failed trying to require mallet-topic-model.core with: java.lang.NullPointerException: null
misc.clj:29 marcliberatore.mallet-lda.misc/make-documents
misc.clj:34 marcliberatore.mallet-lda.misc/load-sample-documents
core.clj:13 mallet-topic-model.core/eval6264
..stacktrace elided.

and in the console:

Perhaps the 'resources' directories weren't copied into the 'class' directory.
Continuing.

Perhaps the sample documents in the resources folder of this project aren't available when we just include this JAR in our project.clj?

Error on running sample

I tried out the sample given in the README. It prints an error and then hangs indefinitely. The code and the error log are below.

Code:
(let [data [[1 ["a" "little" "lamb"]]
            [2 ["row" "your" "boat"]]
            [3 ["boat" "river" "dance"]]]
      instance-list (make-instance-list data)]
  (lda instance-list))

Output:

Couldn't open cc.mallet.util.MalletLogger resources/logging.properties file.
Perhaps the 'resources' directories weren't copied into the 'class' directory.
Continuing.
Feb 02, 2017 7:46:27 PM cc.mallet.topics.ParallelTopicModel
INFO: Coded LDA: 10 topics, 4 topic bits, 1111 topic mask
Feb 02, 2017 7:46:27 PM cc.mallet.topics.ParallelTopicModel initializeHistograms
INFO: max tokens: 3
Feb 02, 2017 7:46:27 PM cc.mallet.topics.ParallelTopicModel initializeHistograms
INFO: total tokens: 9
Feb 02, 2017 7:46:28 PM cc.mallet.topics.ParallelTopicModel estimate
INFO: <10> LL/token: -4.54015
Feb 02, 2017 7:46:28 PM cc.mallet.topics.ParallelTopicModel estimate
INFO: <20> LL/token: -4.54015
Feb 02, 2017 7:46:29 PM cc.mallet.topics.ParallelTopicModel optimizeBeta
INFO: [beta: 0]
Feb 02, 2017 7:46:29 PM cc.mallet.topics.ParallelTopicModel modelLogLikelihood
WARNING: NaN in log likelihood calculation
Feb 02, 2017 7:46:29 PM cc.mallet.topics.ParallelTopicModel estimate
INFO: <30> LL/token: 0
java.lang.ArrayIndexOutOfBoundsException: -1
    at cc.mallet.topics.WorkerRunnable.sampleTopicsForOneDoc(WorkerRunnable.java:489)
    at cc.mallet.topics.WorkerRunnable.run(WorkerRunnable.java:275)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

After this the execution stops, but the process does not exit. I waited about 3 hours to see whether it would finish, but it didn't.
