This project forked from vaswani/word_observation_rbm_nce

WORD_OBSERVATION_RBM_NCE

Train word embeddings with an RBM and Noise Contrastive Estimation training

INTRODUCTION

  1. What are word embeddings? The embedding of a word is a vector of real numbers in an n-dimensional space, where n is specified by the user. Good word embeddings cluster similar words together. There has been a lot of work in NLP on improving classification tasks using word embeddings. For example:
     a. Word representations: a simple and general method for semi-supervised learning. http://dl.acm.org/citation.cfm?id=1858721
     b. Natural Language Processing (almost) from Scratch. http://arxiv.org/abs/1103.0398
     ...

  2. What is a restricted Boltzmann machine (RBM)? An RBM is a stochastic model that can learn a joint probability distribution over its inputs (http://en.wikipedia.org/wiki/Restricted_Boltzmann_machine). In this case, the inputs are n-grams of words (the user can specify the order n). The architecture of the RBM implemented in this tool is the same as in http://www.cs.toronto.edu/~gdahl/papers/wrrbm_icml2012.pdf.

  3. What is noise contrastive estimation (NCE)? In http://www.cs.toronto.edu/~gdahl/papers/wrrbm_icml2012.pdf, the RBM was trained with contrastive divergence. This implementation instead trains the word RBM with noise contrastive estimation (NCE): http://machinelearning.wustl.edu/mlpapers/paper_files/AISTATS2010_GutmannH10.pdf. NCE offers a principled and easy way to train unnormalized statistical models. The goal is to estimate parameters that let the model differentiate between observed data and artificially generated noise. Unlike contrastive divergence, where the objective function is not clear, NCE optimizes a well-defined objective function whose progress can be tracked during training. NCE has been used successfully to train neural language models:
     a. A fast and simple algorithm for training neural probabilistic language models. In Proceedings of ICML, 2012.
     b. Decoding with large-scale neural language models improves translation. Ashish Vaswani, Yinggong Zhao, Victoria Fossum, and David Chiang. In Proceedings of EMNLP, 2013. (This paper is accompanied by very efficient code for training a large-scale neural language model, available at http://nlg.isi.edu/software/nplm/)
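
To make the NCE objective from item 3 concrete, here is a minimal Python 3 sketch. It is an illustration only and is not taken from this tool's C++ code. The names `nce_loss`, `score`, and `log_q` are hypothetical: `score` stands in for any unnormalized log-probability (for the RBM this would presumably be computed from the model's parameters for an n-gram), and `log_q` is the log-probability under a known noise distribution, assumed here to be a toy unigram distribution. Each sample is classified as data or noise with probability sigmoid(score(x) - log(k * q(x))), where k is the number of noise samples per data sample.

```python
import math
import random


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def nce_loss(data, noise, score, log_q, k):
    """Negative NCE objective, averaged over the data samples.

    data  : observed samples
    noise : samples drawn from the noise distribution (k per data sample)
    score : hypothetical function returning an unnormalized log-probability
    log_q : function returning the noise log-probability of a sample
    """
    log_k = math.log(k)
    total = 0.0
    for x in data:
        # log P(data | x) = log sigmoid(score(x) - log(k * q(x)))
        total -= log_sigmoid(score(x) - log_k - log_q(x))
    for y in noise:
        # log P(noise | y) = log(1 - sigmoid(delta)) = log sigmoid(-delta)
        total -= log_sigmoid(-(score(y) - log_k - log_q(y)))
    return total / len(data)


# Toy setup (all values are made up for illustration).
q = {0: 0.5, 1: 0.3, 2: 0.2}            # noise distribution over 3 word ids
k = 2                                    # noise samples per data sample
data = [0, 0, 1]
rng = random.Random(0)
noise = rng.choices(list(q), weights=list(q.values()), k=k * len(data))


def log_q(w):
    return math.log(q[w])


def uninformed(w):
    # A model that cannot tell data from noise: delta is 0 for every sample.
    return math.log(k * q[w])


def informed(w):
    # A model that scores the observed words 0 and 1 higher than word 2.
    return {0: 2.0, 1: 1.0, 2: -3.0}[w]


loss_uninformed = nce_loss(data, noise, uninformed, log_q, k)
loss_informed = nce_loss(data, noise, informed, log_q, k)
print(loss_uninformed, loss_informed)
```

For the uninformed model every classification term equals log 2, so the loss is (1 + k) * log 2 per data sample. Because the loss is a single scalar, it can be logged during training, which is the "progress we can track" property described above.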

DEPENDENCIES

  1. C++ compiler and GNU make
  2. Boost 1.47.0 or later http://www.boost.org
  3. Eigen 3.1.x http://eigen.tuxfamily.org
  4. Python 2.7.x, not 3.x http://python.org (To run the data prep scripts)

COMPILING

  1. Go to the src directory
  2. Modify the Makefile variables according to your environment
  3. Type make
  4. The binary RBMDahlNCE should be produced

USAGE

  1. Preparing your data: The program RBMDahlNCE needs 4 inputs, including:
     --train_file -> A training file of integerized, space-separated n-grams. The integers must be between 0 and V-1, where V is the size of the vocabulary.
     --words_file -> The vocabulary, one word per line, where line 'i' is the word with integer representation 'i-1' (assuming that lines start at index 1).
     --unigram_probs_file -> The unigram probabilities of the words. Each line has two tab-separated entries: <integer_id_of_word>\t<unigram_probability_of_word>
     The script prep_scripts/generateTrainingData.py can generate the data for you.

  2. Running the RBM: Please look at test/trainEmbeddings.sh for usage
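
The three file formats described in step 1 can be illustrated with a small Python 3 sketch. This is not the logic of prep_scripts/generateTrainingData.py, which may differ (for example in how it orders the vocabulary, pads sentence boundaries, or formats probabilities). The function `prepare` and the toy sentences are hypothetical, and ordering word ids by frequency is an arbitrary choice made here.

```python
from collections import Counter


def prepare(sentences, n):
    """Return (ngram_lines, words, unigram_lines) for tokenized sentences.

    ngram_lines   : lines for --train_file (space-separated integer n-grams)
    words         : lines for --words_file (line i holds the word with id i-1)
    unigram_lines : lines for --unigram_probs_file (<id>\t<probability>)
    """
    counts = Counter(w for s in sentences for w in s)
    # Assign ids 0..V-1, most frequent first, ties broken alphabetically.
    words = [w for w, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]
    word_id = {w: i for i, w in enumerate(words)}
    total = float(sum(counts.values()))

    ngram_lines = []
    for s in sentences:
        ids = [word_id[w] for w in s]
        # Slide a window of length n over the sentence (no padding assumed).
        for i in range(len(ids) - n + 1):
            ngram_lines.append(" ".join(str(j) for j in ids[i:i + n]))

    unigram_lines = ["%d\t%.6f" % (word_id[w], counts[w] / total) for w in words]
    return ngram_lines, words, unigram_lines


sentences = [["the", "cat", "sat"], ["the", "dog", "sat"]]
ngrams, words, unigrams = prepare(sentences, n=2)
print("\n".join(ngrams))
print("\n".join(words))
print("\n".join(unigrams))
```

With these two toy sentences the vocabulary has V = 4 words, so every integer in the n-gram lines lies between 0 and 3, and the four unigram probabilities sum to 1.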

For questions, comments or to report bugs, please contact [email protected],[email protected]
