marcliberatore / mallet-lda
a Clojure wrapper over the LDA topic modeling implementation in MALLET
I tried to run lda on the sample documents with the code example given in the README:
(ns mallet-topic-model.core
  (:require [marcliberatore.mallet-lda :refer [make-instance-list lda]]
            [marcliberatore.mallet-lda.misc :refer [load-sample-documents]]))

(lda (make-instance-list (load-sample-documents)))
I got the following error:
Failed trying to require mallet-topic-model.core with: java.lang.NullPointerException: null
misc.clj:29 marcliberatore.mallet-lda.misc/make-documents
misc.clj:34 marcliberatore.mallet-lda.misc/load-sample-documents
core.clj:13 mallet-topic-model.core/eval6264
..stacktrace elided.
and in the console:
Perhaps the 'resources' directories weren't copied into the 'class' directory.
Continuing.
Perhaps the sample documents in the resources folder of this project aren't available when we just include this jar as a dependency in our project.clj?
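One way to test that guess from a REPL is to ask the classpath directly whether a resource can be found. This is only a minimal sketch: `resource-available?` is a helper made up for illustration, and "sample-documents" is a hypothetical resource name, not the path the library actually loads (that would have to be read from misc.clj).

```clojure
(require '[clojure.java.io :as io])

(defn resource-available?
  "True if `path` resolves to a resource on the current classpath."
  [path]
  ;; io/resource returns nil when nothing on the classpath matches.
  (some? (io/resource path)))

;; "sample-documents" is a hypothetical name, used here only as a placeholder.
(println (resource-available? "sample-documents"))
```

If this prints false for the path the library uses, the NullPointerException would be consistent with the sample documents not being packaged in the jar.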
I found your library a couple of weeks ago through your blog, while I was looking for LDA libraries, preferably written in Clojure.
For Clojure there were only two projects: yours and the deprecated chisel.
Your library is also listed on VersionEye, which makes it very easy to track new versions and dependencies.
I decided to add more functionality to make it more suitable for my next project. You can check out the current progress on my fork.
I'm currently working on model serialization, and I'm also going to add some simple tests this weekend.
Is there better documentation for MALLET? Or do researchers at Amherst also have to use that ugly JavaDoc?
I'm not a researcher myself; all my knowledge about LDA comes from this paper here. With my limited knowledge it is quite difficult to decipher what some methods are doing and to write user-friendly docstrings.
Should I start bombing the mailing list with question after question until they see the benefits of having decent documentation? :D Sounds like a great hobby.
Ok, back to hacking.
I tried out the sample given in the README. It prints an error message and then hangs indefinitely. The code and the error log are below.
Code:
(let [data [[1 ["a" "little" "lamb"]]
            [2 ["row" "your" "boat"]]
            [3 ["boat" "river" "dance"]]]
      instance-list (make-instance-list data)]
  (lda instance-list))
Output:
Couldn't open cc.mallet.util.MalletLogger resources/logging.properties file.
Perhaps the 'resources' directories weren't copied into the 'class' directory.
Continuing.
Feb 02, 2017 7:46:27 PM cc.mallet.topics.ParallelTopicModel
INFO: Coded LDA: 10 topics, 4 topic bits, 1111 topic mask
Feb 02, 2017 7:46:27 PM cc.mallet.topics.ParallelTopicModel initializeHistograms
INFO: max tokens: 3
Feb 02, 2017 7:46:27 PM cc.mallet.topics.ParallelTopicModel initializeHistograms
INFO: total tokens: 9
Feb 02, 2017 7:46:28 PM cc.mallet.topics.ParallelTopicModel estimate
INFO: <10> LL/token: -4.54015
Feb 02, 2017 7:46:28 PM cc.mallet.topics.ParallelTopicModel estimate
INFO: <20> LL/token: -4.54015
Feb 02, 2017 7:46:29 PM cc.mallet.topics.ParallelTopicModel optimizeBeta
INFO: [beta: 0]
Feb 02, 2017 7:46:29 PM cc.mallet.topics.ParallelTopicModel modelLogLikelihood
WARNING: NaN in log likelihood calculation
Feb 02, 2017 7:46:29 PM cc.mallet.topics.ParallelTopicModel estimate
INFO: <30> LL/token: 0
java.lang.ArrayIndexOutOfBoundsException: -1
    at cc.mallet.topics.WorkerRunnable.sampleTopicsForOneDoc(WorkerRunnable.java:489)
    at cc.mallet.topics.WorkerRunnable.run(WorkerRunnable.java:275)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
After that, execution stops but the process does not exit. I waited about 3 hours to see whether it would actually finish, but it didn't.