twitter_sentiment_analysis_python
=================================
A study in classification:

Purpose: Classify tweets as Positive or Negative

DataSet: Tweets from twitter stream

Requirements:
MongoDB running on port 27017
Redis running on port 6379
API keys from Twitter

Included data sets:
AFINN-111.txt   Dictionary of words with sentiment value. Can be used as a feature set.
output*.txt   Results of querying the live Twitter stream or of a simple Twitter query. Use these for testing if you don't want to collect your own sample data with twitterstream.py
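
A minimal sketch of loading the sentiment dictionary, assuming the standard AFINN-111 layout of one entry per line with the term and its integer score separated by a tab. The function name load_sentiment_scores is hypothetical and not taken from the repository.

```python
def load_sentiment_scores(lines):
    """Build a {term: score} dict from AFINN-style lines ("term<TAB>score")."""
    scores = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last tab so multi-word terms stay intact.
        term, score = line.rsplit("\t", 1)
        scores[term] = int(score)
    return scores
```

An open file can be passed directly, for example load_sentiment_scores(open("AFINN-111.txt", encoding="utf-8")).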

Start MongoDB:
> sudo mongod

Code files:
[x] init.py
[x] get_stream.py
> Setup: set the environment variables TWITTER_API_KEY, TWITTER_API_SECRET, TWITTER_ACCESS_TOKEN_KEY and TWITTER_ACCESS_TOKEN_SECRET
*to run: python get_stream.py*
Creates an output file whose name is prefixed with a timestamp and writes the results to it.
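
One way the four environment variables might be read and validated before connecting to Twitter. This is only a sketch: load_twitter_credentials is a hypothetical helper, and how get_stream.py actually reads its keys is not shown in this README.

```python
import os

REQUIRED_VARS = (
    "TWITTER_API_KEY",
    "TWITTER_API_SECRET",
    "TWITTER_ACCESS_TOKEN_KEY",
    "TWITTER_ACCESS_TOKEN_SECRET",
)

def load_twitter_credentials(environ=None):
    """Return the four credentials as a dict, or fail with the missing names."""
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}
```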



Approach 0:
First pass 
Using the dictionary of word-sentiments
1. Calculate + and - log2 scores
Each tweet starts with + = -1.0, - = -1.0
Each distinct occurrence of a dictionary word contributes log2(abs(score)) to the + score if its sentiment value is positive, or to the - score if it is negative
2. If abs(+) > abs(-), the tweet is labeled 'positive'
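
The first pass can be sketched roughly as follows. The function name, the whitespace tokenization, and the fallback to 'negative' when the rule does not fire are assumptions made for illustration; the scoring itself follows the two steps above.

```python
import math

def first_pass_label(tweet_text, sentiment_scores):
    """Return (pos, neg, label) for one tweet using the dictionary scores."""
    pos, neg = -1.0, -1.0  # starting values from step 1
    # Each distinct word counts once (assumption: split on whitespace).
    for word in set(tweet_text.lower().split()):
        score = sentiment_scores.get(word)
        if not score:
            continue  # unknown word, or a score of 0
        contribution = math.log2(abs(score))
        if score > 0:
            pos += contribution
        else:
            neg += contribution
    # Assumption: anything not labeled 'positive' is treated as 'negative'.
    label = "positive" if abs(pos) > abs(neg) else "negative"
    return pos, neg, label
```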

Second pass
For unknown interesting words, calculate a likely +/- score.
1. Calculate likelihood that the word is in a + tweet vs - tweet using Bayes.
2. Store results
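
A sketch of the second pass, assuming the tweets already carry 'positive' or 'negative' labels (for instance from the first pass) and are given as (tokens, label) pairs. The function name is hypothetical, and what makes a word "interesting" is left open here: every word outside the dictionary gets a value.

```python
from collections import Counter

def word_positive_probability(labeled_tweets, known_words):
    """Estimate P(positive | word) for every word not in known_words."""
    known = set(known_words)
    n = Counter()  # tweets per label
    seen = {"positive": Counter(), "negative": Counter()}  # tweets containing word
    for tokens, label in labeled_tweets:
        n[label] += 1
        seen[label].update(set(tokens) - known)
    total = n["positive"] + n["negative"]
    result = {}
    for word in set(seen["positive"]) | set(seen["negative"]):
        # Bayes: P(+|w) = P(w|+)P(+) / (P(w|+)P(+) + P(w|-)P(-))
        joint_pos = (seen["positive"][word] / n["positive"]) * (n["positive"] / total) if n["positive"] else 0.0
        joint_neg = (seen["negative"][word] / n["negative"]) * (n["negative"] / total) if n["negative"] else 0.0
        result[word] = joint_pos / (joint_pos + joint_neg)
    return result
```

Values near 1.0 suggest a positive word, values near 0.0 a negative one, and values near 0.5 carry little information.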

Third pass
Recalculate ratings of tweets


Approach 1:
Supervised Learning
1. Provide a training set of tweets that have been properly labeled with positive or negative.
2. Train.
3. Test.

Training/Initializing our classifier:
Choose Features:
1. Each tweet is broken up into lowercase tokens; tokens shorter than 3 characters, and tokens containing non-English word characters, are discarded.
2. Each tweet is mapped to a list of word features: its distinct words, ordered by frequency.
3. Calculate P(word | +) and P(word | -).
Discard any words whose sentiment is hard to tell.
Criterion: the word is too common. Assumption: common words such as 'the' and 'and' carry no clear sentiment.
How to tell whether a word is something like 'the' or 'and'?
Theory: the word's frequency is high in both classes and P(word | +) ~ P(word | -).
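
A rough sketch of the feature selection above. The names tokenize and select_features, the a-z-only token pattern, and the thresholds min_freq and max_ratio are all assumptions chosen for illustration; the README does not specify how "high" or "~" are measured.

```python
import re
from collections import Counter

TOKEN_RE = re.compile(r"^[a-z]+$")  # assumption: only plain a-z tokens are kept

def tokenize(text):
    """Lowercase, split on whitespace, drop short or non-alphabetic tokens."""
    return [t for t in text.lower().split() if len(t) >= 3 and TOKEN_RE.match(t)]

def select_features(labeled_tweets, min_freq=0.75, max_ratio=1.5):
    """Keep words unless they are common in both classes with similar frequency.

    labeled_tweets: (text, label) pairs with label 'pos' or 'neg';
    both classes are assumed to be non-empty.
    """
    n = Counter()
    seen = {"pos": Counter(), "neg": Counter()}
    for text, label in labeled_tweets:
        n[label] += 1
        seen[label].update(set(tokenize(text)))
    features = []
    for word in set(seen["pos"]) | set(seen["neg"]):
        p_pos = seen["pos"][word] / n["pos"]  # P(word | +)
        p_neg = seen["neg"][word] / n["neg"]  # P(word | -)
        low, high = min(p_pos, p_neg), max(p_pos, p_neg)
        too_common = low >= min_freq and high / low <= max_ratio
        if not too_common:
            features.append(word)
    return sorted(features)
```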


Use the Naive Bayes classifier:
Combine the prior probability of each class with a contribution from each feature, estimated from the frequency of the token in the training set.
If the word 'amazing' appears in 1 of 5 positive tweets and in no negative tweets, the likelihood that the tweet is 'positive' is multiplied by 0.2.

Actual classification:
1. Tweet starts out with log-likelihoods 'positive': -1.0, 'negative': -1.0 (<- log base 2 of 1/2)
2. Add contributions from features
(ignore words not in feature list)
note: log base 2 again
e.g. 'pos': -6.2, 'neg': -15.3
3. Classify tweet as 'pos' or 'neg'. The score closer to zero wins: if 'positive' > 'negative', i.e. abs('positive') < abs('negative'), classify as 'pos' (as in the example above).
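
A minimal Naive Bayes sketch of the training and classification steps. It is not the repository's code: train and classify are hypothetical names, add-one smoothing is an added assumption so that a word never seen in one class does not produce log2(0), and the decision picks the larger (less negative) log score, which is the usual Naive Bayes rule.

```python
import math
from collections import Counter

def train(labeled_tweets):
    """Count tweets per label and, per label, the tweets containing each word."""
    tweet_counts = Counter()
    word_counts = {"pos": Counter(), "neg": Counter()}
    for tokens, label in labeled_tweets:
        tweet_counts[label] += 1
        word_counts[label].update(set(tokens))
    return tweet_counts, word_counts

def classify(tokens, tweet_counts, word_counts, features):
    """Return (label, scores) where scores holds the log2 score per label."""
    scores = {"pos": -1.0, "neg": -1.0}  # log2(1/2) for each class
    for word in set(tokens):
        if word not in features:
            continue  # ignore words not in the feature list
        for label in scores:
            # Smoothed estimate of P(word | label); smoothing is an assumption.
            p = (word_counts[label][word] + 1) / (tweet_counts[label] + 2)
            scores[label] += math.log2(p)
    label = max(scores, key=scores.get)  # ties go to 'pos'
    return label, scores
```

Working in logs turns the product of per-word probabilities into a sum, which avoids underflow on longer tweets.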

Primer on Bayes Probability:
Prior Probability: P(A), P(B)
Conditional Probability: P(x|A), P(x|B)
Posterior Probability: P(A|x), P(B|x)
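
The three quantities are tied together by Bayes' theorem. Written out for two classes A and B (here: positive and negative) and an observation x (a word or a tweet), and assuming A and B are the only two possibilities:

```latex
P(A \mid x) = \frac{P(x \mid A)\, P(A)}{P(x \mid A)\, P(A) + P(x \mid B)\, P(B)}
```

P(B | x) follows by swapping A and B in the numerator; the denominator stays the same, so the two posteriors sum to 1.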

[x] tweet_sentiment.py
*to run: python tweet_sentiment.py AFINN-111.txt tweet_file.txt result.txt*
[x] term_sentiment.py
*to run: python term_sentiment.py AFINN-111.txt tweet_file result.txt*
[x] frequency.py
*to run: python frequency.py tweet_file result.txt*


Still need to determine which state is the happiest.
Going to start with...

Dictionary of states (only 50 nodes, not too bad): state -> (sentiment rating, tweet count)

If we can determine the state of the tweeter:
  Determine the sentiment of the tweet.
  Update the dictionary with the tweet's sentiment value (compute avg?)

Invert the dictionary to get a new dict:
    sentiment rating: [state1, state2...]

Our sentiment:state_array dict is in ascending order, so
  we grab the last 10 or so and reverse this bunch
  print out up to 10 of the states. --> happiest 10 states!
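
The plan above might look like this in code. It is a sketch under assumptions: happiest_states is a hypothetical name, the input is taken to be (state, sentiment) pairs because how a tweet's state is determined is still open, and the rating per state is the average sentiment.

```python
from collections import defaultdict

def happiest_states(state_sentiments, top_n=10):
    """Return up to top_n states, happiest first, from (state, sentiment) pairs."""
    totals = defaultdict(lambda: [0.0, 0])  # state -> [sentiment sum, tweet count]
    for state, sentiment in state_sentiments:
        totals[state][0] += sentiment
        totals[state][1] += 1
    # Invert: average rating -> list of states with that rating.
    by_rating = defaultdict(list)
    for state, (total, count) in totals.items():
        by_rating[total / count].append(state)
    ranked = []
    # Ratings sorted ascending; take the last top_n and reverse them.
    for rating in sorted(by_rating)[-top_n:][::-1]:
        ranked.extend(sorted(by_rating[rating]))
    return ranked[:top_n]
```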
