dheerajgadwala / sentiment-analysis-on-twitter-data

In this project, we train and test different models, such as SVM, Decision Trees, K-Nearest Neighbors, and Logistic Regression, compare their performance based on accuracy, and choose the best model for the application.

Jupyter Notebook 100.00%
machine-learning natural-language-processing

Introduction

Twitter is a popular online platform where a large number of people post their opinions on a variety of matters. These opinions can carry a positive or negative connotation. Identifying the tone of a message can help filter out posts that are offensive or degrading.

By applying Natural Language Processing techniques and Machine Learning algorithms, we can train models to classify a given text as a positive or negative message. Such models can be used to quickly filter offensive or inappropriate content out of a social media platform, keeping the platform engaging for its users.

In this project, we train and test different models, such as Support Vector Machine, Decision Trees, K-Nearest Neighbors, and Logistic Regression, compare their performance based on accuracy, and choose the best model for the application.

Dataset

We found a suitable dataset after scouring Kaggle. It contains 1.6M tweets that were extracted using the Twitter developer API. Each raw tweet is annotated with either a 0, indicating negative emotion, or a 4, indicating positive emotion.

Instead of manually annotating the tweets, the creators used the Twitter search API to collect tweets that carried indicators of positive or negative emotion, such as emoticons (for instance, :) -> positive and :( -> negative).

The dataset contains 6 fields in total: target (the polarity of the tweet), ids (the tweet id), date (the date on which the tweet was posted), flag (the query, if a query was involved in fetching the data, else 'NO_QUERY'), user (the user that posted the tweet), and text (the content of the tweet).
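As a rough illustration of the field layout described above, the sketch below reads rows in that column order and keeps only the polarity and the tweet text. The two sample rows are made up for the example, the absence of a header row is an assumption, and remapping the positive label 4 to 1 is a convenience chosen here rather than something the project states.

```python
import csv
import io

# Column order as described above; the file is assumed to have no header row.
COLUMNS = ["target", "ids", "date", "flag", "user", "text"]

# Two made-up rows standing in for the real 1.6M-row file.
SAMPLE = (
    '"0","1","Mon Apr 06 22:19:45 PDT 2009","NO_QUERY","user_a","so tired today :("\n'
    '"4","2","Mon Apr 06 22:19:49 PDT 2009","NO_QUERY","user_b","loving the sunshine :)"\n'
)


def load_tweets(handle):
    """Return (label, text) pairs, mapping polarity 0 -> 0 and 4 -> 1."""
    reader = csv.DictReader(handle, fieldnames=COLUMNS)
    pairs = []
    for row in reader:
        label = 1 if row["target"] == "4" else 0
        pairs.append((label, row["text"]))
    return pairs


pairs = load_tweets(io.StringIO(SAMPLE))
print(pairs)
```

With the real file, the same function could be given an open file handle instead of the in-memory sample.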

Data Preprocessing

The dataset contains 6 columns, but for classifying tweets we only require the target column and the corresponding tweet text column, so the other columns were dropped. The following preprocessing steps were applied only to the tweet text column:

  1. Remove URLs, web addresses, and email ids.
  2. Convert the tweets to lower case.
  3. Tokenize the data: split each tweet into individual words.
  4. Remove stop-words: drop common words that do not add meaning or value to the classification. We obtained the list of English stop-words from the nltk.corpus module.
  5. Stemming: PorterStemmer algorithm.
  6. Lemmatization: WordNetLemmatizer.
  7. Vectorization: TF-IDF and GloVe.
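The first four steps above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: the regular expressions are one plausible choice, and the small `STOP_WORDS` set is a hypothetical stand-in for the English list from nltk.corpus. Stemming, lemmatization, and vectorization are left out here.

```python
import re

# Hypothetical stand-in for the English stop-word list from nltk.corpus.
STOP_WORDS = {"a", "an", "the", "is", "to", "and", "of", "in", "at", "my"}

# Assumed patterns for this sketch; real tweets may need broader rules.
URL_RE = re.compile(r"(https?://\S+|www\.\S+)")
EMAIL_RE = re.compile(r"\S+@\S+\.\S+")
TOKEN_RE = re.compile(r"[a-z0-9']+")


def preprocess(tweet):
    """Apply steps 1-4: strip URLs and emails, lowercase, tokenize, drop stop-words."""
    text = URL_RE.sub(" ", tweet)         # step 1: remove URLs and web addresses
    text = EMAIL_RE.sub(" ", text)        # step 1: remove email ids
    text = text.lower()                   # step 2: lower case
    tokens = TOKEN_RE.findall(text)       # step 3: tokenize into words
    return [t for t in tokens if t not in STOP_WORDS]  # step 4: stop-words


tokens = preprocess("Loving the NEW phone, see http://example.com or mail me@example.com")
print(tokens)
```

Removing URLs before lowercasing and tokenizing keeps fragments such as "http" and "com" from leaking into the token list.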

Evaluation Methodology

After training each model on 85% of the data, we test it on the remaining 15% and evaluate its performance using different metrics such as Accuracy, the Classification Report, and the Receiver Operating Characteristic (ROC) curve.
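An 85/15 split of this kind can be sketched as follows. The function name, the fixed seed, and the shuffle-then-cut approach are assumptions made for the illustration; the project may well have used a library helper instead.

```python
import random


def split_85_15(items, seed=42):
    """Shuffle a copy of the items and cut it into 85% train and 15% test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = len(shuffled) * 85 // 100        # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


train, test = split_85_15(range(100))
print(len(train), len(test))
```

Shuffling before cutting matters when the rows are stored grouped by label, since an unshuffled cut could leave one class out of the test portion.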

The counts of True Negatives (0,0), False Positives (0,1), False Negatives (1,0), and True Positives (1,1) form the confusion matrix, from which the per-class precision, recall, and F1-score in the classification report are derived. With the ROC curve, we can observe the trend of the True Positive rate against the False Positive rate.
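To make these quantities concrete, here is a small pure-Python sketch that counts the four outcomes, derives accuracy and the two rates, and sweeps a decision threshold to obtain ROC points. The labels and scores are toy values invented for the example, labels are assumed to be 0 and 1, and both classes are assumed to be present so that no denominator is zero.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels 0 and 1."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 0 and predicted == 0:
            tn += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        elif actual == 1 and predicted == 0:
            fn += 1
        else:
            tp += 1
    return tn, fp, fn, tp


def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs, one per distinct score used as a threshold."""
    points = []
    for threshold in sorted(set(scores), reverse=True):
        y_pred = [1 if s >= threshold else 0 for s in scores]
        tn, fp, fn, tp = confusion_counts(y_true, y_pred)
        points.append((fp / (fp + tn), tp / (tp + fn)))
    return points


# Toy labels and predictions, invented for illustration.
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 0, 1, 1, 0]
tn, fp, fn, tp = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
tpr = tp / (tp + fn)  # True Positive rate (recall)
fpr = fp / (fp + tn)  # False Positive rate
print(tn, fp, fn, tp, accuracy, tpr, fpr)

# Toy scores: a higher score means the model leans towards the positive class.
curve = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(curve)
```

Each threshold yields one confusion matrix and therefore one point on the curve, which is why the ROC curve shows the trade-off across all operating points rather than a single one.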

Comparing these metrics helps us evaluate the performance of the models for the given application.
