Coder Social home page Coder Social logo

glintlda's Introduction

Glint LDA

Scalable Distributed LDA implementation for Spark & Glint

This implementation is based on LightLDA.

Usage

Make sure you have Glint running, for a simple localhost test with 2 servers, you can locally run Glint as follows:

sbt "run master"
sbt "run server"
sbt "run server"

Next, load in a dataset in Spark with an RDD:

// Preprocessing of data ...
// End result should be an RDD of breeze sparse vectors that represent bag-of-words term frequency vectors
rdd = sc.textFile(...).map(x => SparseVector[Int](...))

Construct the Glint client that acts as an interface to the running parameter servers

// Open glint client with a path to a specific configuration file
val gc = Client(ConfigFactory.parseFile(new java.io.File(configFile)))

Set the LDA parameters and call the fitMetropolisHastings function to run the LDA algorithm

// LDA topic model with 100,000 terms and 100 topics
val ldaConfig = new LDAConfig()
ldaConfig.setα(0.5)
ldaConfig.setβ(0.01)
ldaConfig.setTopics(100)
ldaConfig.setVocabularyTerms(100000)
val model = Solver.fitMetropolisHastings(sc, gc, rdd, ldaConfig, 100)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.