Text Sentiment Classification

System Dependencies

  • Python 2.7
  • NumPy
  • Scipy

The Program

This program preprocesses the data, trains on it using perceptron, naive Bayes, k-nearest neighbors (KNN), and Rocchio classifiers, and tests the resulting models using five-fold cross-validation. A preprocessing option to strip punctuation and/or stop words can also be specified. The feature vectors of the data can be represented as binary, bag-of-words, or tf-idf vectors. These options are configured by passing the respective parameters as command-line arguments.
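
The three feature representations can be sketched as follows. This is a minimal illustration, not the program's actual code: the function names, the mode strings `"binary"`, `"bow"`, and `"tfidf"`, and the tf-idf weighting (raw count times `log(N / df)`, one common variant) are all assumptions.

```python
import math
from collections import Counter


def build_vocabulary(docs):
    """Map each distinct token to a column index (sorted for reproducibility)."""
    tokens = sorted({tok for doc in docs for tok in doc})
    return {tok: i for i, tok in enumerate(tokens)}


def vectorize(docs, vocab, mode="binary"):
    """Turn tokenized documents into dense feature vectors.

    mode is a hypothetical option name: "binary", "bow", or "tfidf".
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        vec = [0.0] * len(vocab)
        for tok, count in Counter(doc).items():
            if tok not in vocab:
                continue
            j = vocab[tok]
            if mode == "binary":
                vec[j] = 1.0            # presence only
            elif mode == "bow":
                vec[j] = float(count)   # raw term count
            elif mode == "tfidf":
                # Assumed weighting: count * log(N / df).
                vec[j] = count * math.log(float(n_docs) / df[tok])
            else:
                raise ValueError("unknown mode: %r" % mode)
        vectors.append(vec)
    return vectors
```

With this weighting, a token that occurs in every document gets a tf-idf weight of zero, which is why tf-idf tends to downweight very common words.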

If no optional arguments are provided, the program runs with its defaults: punctuation is not stripped, the epoch count is 1, feature vectors are binary, and the classifier is KNN with k=1 and the Euclidean distance metric.
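
To make the default configuration concrete, here is a minimal sketch of KNN prediction with the Euclidean distance. The function names and signatures are hypothetical and are not taken from the program itself; it uses only the standard library for brevity.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_vectors, train_labels, query, k=1):
    """Return the majority label among the k training vectors nearest to query."""
    # Indices of training vectors, ordered from nearest to farthest.
    order = sorted(
        range(len(train_vectors)),
        key=lambda i: euclidean(train_vectors[i], query),
    )
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]
```

With k=1 the prediction is simply the label of the single closest training document.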

The Dataset

The dataset used in this program is the v2.0 polarity dataset from Cornell. It has two classes, positive and negative, each with 1000 files, for a total of 2000 files. If you would like to train on a similar dataset, please make sure that it follows this format:

txt_sentoken/  
    positive_folder/
        file_1.txt file_2.txt ... file_42.txt
    negative_folder/
        file_43.txt file_44.txt ...
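
A dataset laid out this way could be read with a loader along these lines. This is a sketch under assumptions: `load_dataset` is a hypothetical helper, and using each subfolder name as the class label is an illustrative choice rather than the program's documented behavior.

```python
import os


def load_dataset(root):
    """Read every .txt file under root/<class_folder>/ and label it by folder name."""
    texts, labels = [], []
    class_dirs = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
    )
    for label in class_dirs:
        folder = os.path.join(root, label)
        for name in sorted(os.listdir(folder)):
            if not name.endswith(".txt"):
                continue  # skip anything that is not a review file
            with open(os.path.join(folder, name)) as fh:
                texts.append(fh.read())
            labels.append(label)
    return texts, labels
```

Sorting the folder and file names keeps the document order stable between runs, which helps when comparing cross-validation results.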

Running the Program

A sample run of the program might be:

python sentiment_classification.py review_polarity/txt_sentoken --nopunct

This would run the program with the default perceptron classifier, using the binary model to create feature vectors after stripping the punctuation from the dataset, with the Euclidean distance metric.
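
As a rough illustration of what stripping punctuation involves, the sketch below removes ASCII punctuation before splitting on whitespace and can optionally drop stop words. The `preprocess` function and its parameters are hypothetical; only the `--nopunct` flag itself comes from the sample run above.

```python
import string


def preprocess(text, nopunct=False, stopwords=None):
    """Tokenize text on whitespace, optionally dropping punctuation and stop words."""
    if nopunct:
        # Remove every ASCII punctuation character before tokenizing.
        text = "".join(ch for ch in text if ch not in string.punctuation)
    tokens = text.split()
    if stopwords:
        tokens = [tok for tok in tokens if tok not in stopwords]
    return tokens
```

Removing punctuation this way also joins contractions, so `don't` becomes `dont`.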

The Algorithms

To Do:

  • the default number of stop words is 173, which is a bit high and lowers accuracy
  • try normalizing the binary vectors to see whether accuracy improves
  • save and load data from CSV files
  • save weights of relevant classifiers (i.e. lr)
