topic-model-tutorial

This repository contains notebooks, slides, and data for the short tutorial "Topic modelling with Scikit-learn", presented at PyData Dublin in September 2017.

Contents

The tutorial is summarised in these slides. There are three associated IPython notebooks:

  1. Text Preprocessing: Provides a basic introduction to preprocessing documents with scikit-learn.
  2. NMF Topic Models: Covers the application and interpretation of topic models via the NMF implementation provided by scikit-learn.
  3. Parameter Selection for NMF: More advanced material on selecting the number of topics for NMF, using topic coherence.
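As a rough illustration of what the first two notebooks cover, the sketch below builds a TF-IDF document-term matrix and fits an NMF topic model with scikit-learn. It is not taken from the notebooks: the toy documents, the choice of two topics, and the vectorizer settings are all assumptions made for the example.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus (hypothetical); the tutorial itself uses a news dataset.
documents = [
    "The team won the football match with a late goal",
    "The striker scored a goal in the football final",
    "The football manager praised the team after the match",
    "The government announced a new election date",
    "Voters backed the government in the election",
    "The election result surprised the government and voters",
]

# Preprocessing: lowercase, tokenize, drop English stop words, weight by TF-IDF.
vectorizer = TfidfVectorizer(stop_words="english", min_df=1)
A = vectorizer.fit_transform(documents)

# Recover the terms in column order from the fitted vocabulary.
terms = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)

# Factorize A into W (documents x topics) and H (topics x terms).
k = 2  # number of topics, chosen by hand for this toy example
model = NMF(n_components=k, init="nndsvd", random_state=0)
W = model.fit_transform(A)
H = model.components_

def top_terms_for_topic(H, terms, topic_index, n_top=4):
    """Return the n_top highest-weighted terms for one topic."""
    top_indices = np.argsort(H[topic_index])[::-1][:n_top]
    return [terms[i] for i in top_indices]

top_terms = [top_terms_for_topic(H, terms, t) for t in range(k)]
for t, descriptor in enumerate(top_terms):
    print("Topic %d: %s" % (t + 1, ", ".join(descriptor)))
```

Each row of H can be read as a topic descriptor (its highest-weighted terms), and each row of W gives the strength of every topic in one document.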

To demonstrate the topic modelling techniques, a sample dataset is provided here. This consists of 4,551 news articles from 2016, stored in a single text file (25MB), one article per line.
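Since the dataset stores one article per line, it can be read into a list of strings with a few lines of Python. The following is a minimal sketch; the function name and the UTF-8 encoding are assumptions, not details taken from the notebooks.

```python
from pathlib import Path

def load_articles(path, encoding="utf-8"):
    """Read a text file with one article per line, skipping blank lines."""
    articles = []
    with Path(path).open("r", encoding=encoding) as fin:
        for line in fin:
            text = line.strip()
            if text:
                articles.append(text)
    return articles
```

Calling `load_articles("articles.txt")` (a hypothetical filename) would return a list with one string per article, ready to pass to a scikit-learn vectorizer.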

Dependencies

This code has been tested with Python 3.6. The core package requirements are:

  • scikit-learn (tested with v0.19.0)
  • numpy
  • matplotlib

The model selection code also relies on the gensim package to build a Word2Vec model. A pre-built Word2Vec model for the sample dataset is also provided here for download (71MB).
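The idea behind embedding-based topic coherence can be sketched as follows: score each topic by the mean pairwise cosine similarity between the word vectors of its top terms, then average over topics to score a model. The code below is an illustrative sketch only and may differ from the notebook's implementation. It uses a plain dictionary of hand-made toy vectors so that it runs without gensim; in practice the vectors would be looked up in a trained Word2Vec model.

```python
from itertools import combinations

import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def topic_coherence(top_terms, vectors):
    """Mean pairwise cosine similarity between the top terms of one topic."""
    pairs = list(combinations(top_terms, 2))
    return sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)

def model_coherence(topics, vectors):
    """Mean topic coherence across all topics in a model."""
    return sum(topic_coherence(t, vectors) for t in topics) / len(topics)

# Toy 2-dimensional "embeddings" (hypothetical values for illustration).
vectors = {
    "goal": np.array([1.0, 0.0]),
    "match": np.array([0.9, 0.1]),
    "vote": np.array([0.0, 1.0]),
    "election": np.array([0.1, 0.9]),
}

# Two candidate models: one with clean topics, one with mixed topics.
clean_topics = [["goal", "match"], ["vote", "election"]]
mixed_topics = [["goal", "vote"], ["match", "election"]]

clean_score = model_coherence(clean_topics, vectors)
mixed_score = model_coherence(mixed_topics, vectors)
print("clean: %.3f, mixed: %.3f" % (clean_score, mixed_score))
```

To select the number of topics, the same score would be computed for models fitted with different values of k, and the value with the highest mean coherence preferred.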

Links and References

  • Scikit-learn home
  • NMF documentation for scikit-learn
  • Lee, D. D., & Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature. [PDF]
  • Blei, D. M. (2012). Probabilistic topic models. Communications of the ACM, 55(4). [Link]
  • O’Callaghan, D., Greene, D., Carthy, J., & Cunningham, P. (2015). An analysis of the coherence of descriptors in topic modeling. Expert Systems with Applications. [PDF]
