This project is forked from mukul13/kaggle---bag-of-words-meets-bag-of-popcorns-using-word2vec-in-r

Kaggle Bag of Words Meets Bag of Popcorns using Word2vec in R

An entry to the Kaggle competition Bag of Words Meets Bags of Popcorn, using word2vec in R.

To get the competition data, see the competition page on Kaggle.

#### Packages needed:

  • rword2vec
  • Rcpp and RcppArmadillo
  • rpart and randomForest
  • tm

#### Code Explanation:

  • Word vectors are obtained using the rword2vec package.
  • The binary output file is converted to a text file for further processing.
  • To build a training dataset for sentiment classification of the reviews from these word vectors, two popular methods can be used:
  1. Vector Averaging
  2. Clustering
  • In the first method, the word vectors are averaged for each row (review) of the labeled and test datasets. There are many ways to do this; I have used Rcpp and RcppArmadillo (an R interface to C++) to speed up these compute-intensive operations.
  • In clustering, a bag of centroids is used instead of a bag of words. This part is also done using Rcpp and RcppArmadillo for speed.
  • Finally, classification is done using a random forest.
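
The two feature-building methods above can be illustrated with a minimal sketch in base R. This is not the repository's Rcpp/RcppArmadillo code. It assumes a tiny made-up matrix of word vectors and three made-up reviews, and the helper names (`tokenize`, `average_vector`, `bag_of_centroids`) are hypothetical, chosen only for this example.

```r
# Toy word vectors (assumption): one row per word, one column per dimension.
# Real vectors would be read from the text file produced after training.
word_vectors <- rbind(
  good  = c(1.0, 0.0),
  great = c(0.9, 0.1),
  bad   = c(-1.0, 0.0),
  awful = c(-0.9, -0.1),
  movie = c(0.0, 1.0),
  film  = c(0.1, 0.9)
)

# Hypothetical, already cleaned reviews; the last one has no known words.
reviews <- c("good great movie", "bad awful film", "unknownword")

# Split a review into lowercase words.
tokenize <- function(review) {
  strsplit(tolower(review), "[^a-z]+")[[1]]
}

# Method 1: average the vectors of all in-vocabulary words of a review.
average_vector <- function(review, vectors) {
  words <- tokenize(review)
  words <- words[words %in% rownames(vectors)]
  if (length(words) == 0) {
    return(rep(0, ncol(vectors)))  # no known words: fall back to zeros
  }
  colMeans(vectors[words, , drop = FALSE])
}

# One row of averaged features per review.
avg_features <- do.call(
  rbind,
  lapply(reviews, average_vector, vectors = word_vectors)
)

# Method 2: cluster the word vectors, then count words per cluster.
set.seed(42)
num_clusters <- 3
clustering <- kmeans(word_vectors, centers = num_clusters, nstart = 10)
word_cluster <- clustering$cluster  # named vector: word -> cluster id

bag_of_centroids <- function(review, word_cluster, num_clusters) {
  words <- tokenize(review)
  words <- words[words %in% names(word_cluster)]
  tabulate(word_cluster[words], nbins = num_clusters)
}

# One row of cluster counts per review.
centroid_features <- do.call(
  rbind,
  lapply(reviews, bag_of_centroids,
         word_cluster = word_cluster, num_clusters = num_clusters)
)

print(avg_features)
print(centroid_features)
```

Either feature matrix, combined with the sentiment labels, could then be passed to a classifier such as a random forest. On the full dataset a plain R loop like this is likely to be slow, which is the motivation for moving these steps to C++.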

#### Note: I recommend reading this Python tutorial series first for a better understanding of vector averaging and clustering.

#### Test dataset results:

[Image: classification using vector averaging]

[Image: classification using clustering]

#### Results:

  • The accuracy obtained with vector averaging and with bag of centroids is above their respective thresholds, but it is still low.
  • Accuracy can be improved by using other machine learning algorithms such as GBM, xgboost, or neural networks, and by using techniques such as stacking, blending, and bagging.
