sentiment-analysis

This is a simple sentiment analysis project. I'm going to try a range of methods, embeddings, and approaches and compare which gives the best results.

Preprocessing

Sample text before preprocessing:
Bromwell High is a cartoon comedy. It ran at the same time as some other programs about school life, such as "Teachers". My 35 years in the teaching profession lead me to believe that Bromwell High's satire is much closer to reality than is "Teachers". The scramble to survive financially, the insightful students who can see right through their pathetic teachers' pomp, the pettiness of the whole situation, all remind me of the schools I knew and their students. When I saw the episode in which a student repeatedly tried to burn down the school, I immediately recalled ......... at .......... High. A classic line: INSPECTOR: I'm here to sack one of your teachers. STUDENT: Welcome to Bromwell High. I expect that many adults of my age think that Bromwell High is far fetched. What a pity that it isn't!

1. Simple

  • lowercase everything

  • remove punctuation

  • remove multiple whitespaces

    The same text sample after applying this preprocessing:
    bromwell high is a cartoon comedy it ran at the same time as some other programs about school life such as teachers my 35 years in the teaching profession lead me to believe that bromwell highs satire is much closer to reality than is teachers the scramble to survive financially the insightful students who can see right through their pathetic teachers pomp the pettiness of the whole situation all remind me of the schools i knew and their students when i saw the episode in which a student repeatedly tried to burn down the school i immediately recalled at high a classic line inspector im here to sack one of your teachers student welcome to bromwell high i expect that many adults of my age think that bromwell high is far fetched what a pity that it isnt
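The three steps above can be sketched in a few lines of standard-library Python. This is a minimal sketch, not the project's actual code: the function name `preprocess_simple` is hypothetical, and it assumes "punctuation" means the ASCII characters in `string.punctuation`, deleted rather than replaced by spaces (which matches the sample output, where "I'm" becomes "im").

```python
import re
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess_simple(text: str) -> str:
    """Hypothetical helper: lowercase, strip punctuation, collapse whitespace."""
    text = text.lower()
    text = text.translate(_PUNCT_TABLE)
    # Collapse runs of whitespace left behind by removed punctuation.
    return re.sub(r"\s+", " ", text).strip()


sample = "I immediately recalled ......... at .......... High."
cleaned = preprocess_simple(sample)
```

Deleting punctuation outright is what turns "High's" into "highs" in the sample; replacing it with a space would give "high s" instead.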

2. Standard

  • lowercase everything
  • remove punctuation
  • remove multiple whitespaces
  • remove stopwords
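Stopword removal adds one filtering step on top of the simple pipeline. The sketch below is illustrative only: `preprocess_standard` and the tiny `STOPWORDS` set are hypothetical, and the project may well take its stopword list from a library instead.

```python
import string

# Assumption: a deliberately tiny stopword list, for illustration only.
STOPWORDS = frozenset({
    "a", "an", "the", "is", "it", "at", "as", "of", "to", "in",
    "that", "than", "and", "i", "my", "me",
})

_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess_standard(text: str, stopwords=STOPWORDS) -> str:
    """Hypothetical helper: simple preprocessing plus stopword removal."""
    tokens = text.lower().translate(_PUNCT_TABLE).split()
    # str.split() with no arguments already collapses repeated whitespace.
    return " ".join(token for token in tokens if token not in stopwords)


result = preprocess_standard("Bromwell High is a cartoon comedy.")
```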

3. Advanced

  • lowercase everything
  • remove punctuation
  • remove multiple whitespaces
  • remove stopwords
  • lemmatization/stemming
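To show what the extra step does, here is a deliberately naive suffix-stripping stemmer. It is a toy stand-in, not the project's method: a real pipeline would more likely use an established stemmer or a lemmatizer from an NLP library. The names `naive_stem` and `stem_text`, the suffix list, and the minimum stem length are all assumptions.

```python
# Assumption: a short suffix list, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")
MIN_STEM_LENGTH = 3  # never strip a suffix if fewer than 3 characters would remain


def naive_stem(token: str) -> str:
    """Strip the first matching suffix, keeping at least MIN_STEM_LENGTH characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= MIN_STEM_LENGTH:
            return token[: -len(suffix)]
    return token


def stem_text(text: str) -> str:
    """Apply naive_stem to every whitespace-separated token of preprocessed text."""
    return " ".join(naive_stem(token) for token in text.split())


stemmed = stem_text("insightful students teaching programs")
```

Stemming like this maps "teachers" and "teacher" to the same token, which shrinks the vocabulary. The benchmark below suggests that this does not automatically improve accuracy.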

Benchmarks

Preprocessing  Embedding  Model                Accuracy
Simple         One-hot    Logistic Regression  0.88152
Simple         One-hot    SVM                  0.87112
Standard       One-hot    Logistic Regression  0.88176
Standard       One-hot    SVM                  0.87004
Advanced       One-hot    Logistic Regression  0.87141
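The README does not spell out what "One-hot" means here. One common reading, assumed in this sketch, is a binary bag-of-words vector: each document becomes a 0/1 vector over the training vocabulary, marking which words occur in it. The function names are hypothetical and the two documents are made-up examples, not data from the project.

```python
def build_vocabulary(documents):
    """Map each distinct token to a column index, in first-seen order."""
    vocab = {}
    for document in documents:
        for token in document.split():
            vocab.setdefault(token, len(vocab))
    return vocab


def one_hot_encode(document, vocab):
    """Return a 0/1 vector marking which vocabulary tokens occur in the document."""
    vector = [0] * len(vocab)
    for token in document.split():
        index = vocab.get(token)
        if index is not None:  # tokens unseen during training are ignored
            vector[index] = 1
    return vector


train_docs = ["great funny movie", "boring movie"]
vocab = build_vocabulary(train_docs)
encoded = one_hot_encode("boring movie", vocab)
```

Vectors of this kind can then be fed to a linear classifier such as the logistic regression or SVM models listed in the table.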

Setup

I'm using Anaconda Python 3.7.1. All other required packages are listed in requirements.txt.

To install them, run pip3 install -r requirements.txt.

Download the spaCy English language model: python -m spacy download en_core_web_lg

Running the training

To train a model, use the following options (you can also print them with python3 main.py -h):

usage: main.py [-h] -m {logistic,svm} -p {simple,standard,advanced} [-v]

required arguments:
  -m {logistic,svm}, --model {logistic,svm}
                      Specify which model to use
  -p {simple,standard,advanced}, --preprocess {simple,standard,advanced}
                      Specify which preprocessing to use

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         Verbose output
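For reference, an argparse setup along the following lines would produce a similar interface. This is a reconstruction from the help text above, not the project's actual main.py, and the function name `build_parser` is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical reconstruction of the CLI described by the usage text."""
    parser = argparse.ArgumentParser(prog="main.py")

    # A named group so that -m and -p are listed under "required arguments".
    required = parser.add_argument_group("required arguments")
    required.add_argument(
        "-m", "--model",
        choices=["logistic", "svm"],
        required=True,
        help="Specify which model to use",
    )
    required.add_argument(
        "-p", "--preprocess",
        choices=["simple", "standard", "advanced"],
        required=True,
        help="Specify which preprocessing to use",
    )
    parser.add_argument("-v", "--verbose", action="store_true", help="Verbose output")
    return parser


# Equivalent to running: python3 main.py -m svm -p simple
args = build_parser().parse_args(["-m", "svm", "-p", "simple"])
```

With this setup, omitting -m or -p, or passing a value outside the listed choices, makes argparse print an error and exit.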

TODO

  • Stemming/Lemmatizing
  • n-grams
  • TF-IDF
  • Word2Vec/GloVe
  • SVM

Contributors

  • lukanovak93
