This project uses Naive Bayes classification to perform sentiment analysis on the 100 most recent tweets relating to a user-specified keyword.
Twitter sentiment corpus (see also https://github.com/karanluthra/twitter-sentiment-training)
This dataset, created by Sanders Analytics / Niek J. Sanders, consists of 5513 hand-classified tweets.
- Set up Python wrapper for Twitter API
- Process tweets in the training set
  - Remove URLs, usernames and hashtag symbols
  - Remove stopwords (e.g. pronouns) and punctuation
  - Tokenize tweets into a list of individual words
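The cleaning steps above can be sketched with plain `re` operations. This is a minimal illustration, not the project's exact code: the stopword set here is a tiny hand-picked subset, whereas the project uses nltk's full stopword list.

```python
import re
import string

# Small illustrative stopword set; the real project uses nltk's full stopword list.
STOPWORDS = {"i", "me", "my", "you", "your", "he", "she", "it", "the", "a", "an", "is", "at"}

def preprocess_tweet(text):
    """Clean a raw tweet and tokenize it into a list of words."""
    text = text.lower()
    # Strip URLs first, before punctuation removal would mangle them
    text = re.sub(r"https?://\S+|www\.\S+", "", text)
    # Drop @usernames entirely, but keep hashtag words (remove only the '#')
    text = re.sub(r"@\w+", "", text)
    text = text.replace("#", "")
    # Remove remaining punctuation, then tokenize on whitespace
    text = text.translate(str.maketrans("", "", string.punctuation))
    return [w for w in text.split() if w not in STOPWORDS]
```

For example, `preprocess_tweet("I love #Python! http://example.com @friend")` returns `["love", "python"]`: the URL and username are dropped, the hashtag keeps its word, and "i" is filtered as a stopword.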
- Build model using Naive Bayes classification
  - Build the vocabulary (i.e. all words in the training set)
  - Match training tweets against the vocabulary to build feature vectors
  - Train the classifier
- Test the model
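A dependency-free sketch of the vocabulary / feature-vector / training steps is shown below. The project itself uses nltk's `NaiveBayesClassifier`; this hand-rolled version just makes the mechanics concrete, and the Laplace smoothing constants are an illustrative choice.

```python
import math

def build_vocabulary(tokenized_tweets):
    """Collect every word seen in the training set."""
    vocab = set()
    for tokens in tokenized_tweets:
        vocab.update(tokens)
    return sorted(vocab)

def extract_features(tokens, vocab):
    """Binary feature vector: 1 if the vocabulary word appears in the tweet."""
    present = set(tokens)
    return [1 if word in present else 0 for word in vocab]

def train_naive_bayes(features, labels):
    """Estimate log priors and per-word log likelihoods with Laplace smoothing."""
    classes = sorted(set(labels))
    n = len(labels)
    model = {}
    for c in classes:
        rows = [f for f, label in zip(features, labels) if label == c]
        prior = math.log(len(rows) / n)
        word_counts = [sum(col) for col in zip(*rows)]
        likelihoods = [math.log((count + 1) / (len(rows) + 2)) for count in word_counts]
        model[c] = (prior, likelihoods)
    return model

def classify(feature_vector, model):
    """Score each class by its log prior plus the log likelihoods of the words present."""
    def score(c):
        prior, likes = model[c]
        return prior + sum(like for f, like in zip(feature_vector, likes) if f)
    return max(model, key=score)
```

Testing the model then amounts to running `classify` over a held-out set of labeled tweets and measuring accuracy.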
- Input a search term
- Fetch the 100 most recent tweets relating to the search term
- Process tweets (as above for training set)
- Classify sentiment of each tweet
- Output overall sentiment relating to the topic
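The final aggregation step might look like the sketch below. The 40%/60% thresholds for calling a topic positive, negative, or mixed are arbitrary illustrative choices, not values taken from the project.

```python
def overall_sentiment(labels):
    """Aggregate per-tweet sentiment labels into an overall verdict for the topic."""
    if not labels:
        return "no tweets to classify"
    pos = labels.count("pos")
    share = pos / len(labels)
    # Thresholds are illustrative: >60% positive -> "positive", <40% -> "negative"
    if share > 0.6:
        verdict = "positive"
    elif share < 0.4:
        verdict = "negative"
    else:
        verdict = "mixed"
    return "%s (%d of %d tweets positive)" % (verdict, pos, len(labels))
```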
- Jupyter Notebook to run python code and display results
- python-twitter library, a Python wrapper around the Twitter API
- Python re library for regular expression matching
- Python nltk library, a set of text-processing tools for classification, tokenization, and more
- Creating The Twitter Sentiment Analysis Program in Python with Naive Bayes Classification, an article I followed as a tutorial
- Rate limiting of the Twitter API, which caps the number of requests allowed in each 15-minute window
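One common way to cope with rate limits is exponential backoff. The sketch below is generic and hypothetical: `RateLimitError` and `fetch_with_backoff` are stand-in names, not part of python-twitter (which raises `twitter.error.TwitterError`, and whose `Api` constructor also accepts a `sleep_on_rate_limit` flag that waits out the window automatically).

```python
import time

class RateLimitError(Exception):
    """Stand-in for whatever exception the Twitter client raises when rate-limited."""

def fetch_with_backoff(fetch, max_retries=3, base_delay=1.0):
    """Call fetch(), retrying with exponentially growing waits on rate-limit errors."""
    for attempt in range(max_retries):
        try:
            return fetch()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error to the caller
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

Here `fetch` would wrap the actual API call, e.g. a lambda around the client's search method.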