
amrelsafy / covid19-urban-sentiment-analysis

Arabic and English data analysis of the first weeks of the COVID-19 quarantine, using sentiment analysis and analyzing social media activity with respect to the urban features of Egypt's governorates.

Jupyter Notebook 100.00%
datascience sentiment analysis urban arabic nlp python covid egypt

COVID19-Urban-Sentiment-Analysis

This is a data analysis study of the period from one week before to one week after the COVID-19 quarantine was imposed in Egypt. It investigates social media activity and sentiment for each Egyptian governorate in terms of its urban features and its activity regarding the COVID-19 pandemic.

The study proposes four different approaches to mining spatio-temporally labelled text data from Twitter, and applies various NLP preprocessing techniques before building a machine-learning-based sentiment classification model and a dictionary for detecting COVID-19 tweets.

Sample of the results

Sentiment of people during the weeks before and after the quarantine measures imposed on the 14th of March, which halted all on-campus education.

[Figure: Sentiment Analysis]
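A comparison like the one in this figure could be produced by splitting classified tweets at the quarantine date. The sketch below is not the project's code: it assumes each tweet has already been reduced to a `(date, label)` pair with hypothetical labels `"pos"`/`"neg"`, and it assumes the year 2020 for the 14th of March cutoff.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple

# Cutoff from the README (14th of March); the year 2020 is assumed.
QUARANTINE_DATE = date(2020, 3, 14)

def positive_share_by_period(
    labelled: Iterable[Tuple[date, str]],
    cutoff: date = QUARANTINE_DATE,
) -> Dict[str, float]:
    """Fraction of 'pos' labels before the cutoff and on/after it."""
    counts = defaultdict(lambda: [0, 0])  # period -> [positive count, total count]
    for day, label in labelled:
        period = "before" if day < cutoff else "after"
        counts[period][1] += 1
        if label == "pos":  # hypothetical label name
            counts[period][0] += 1
    return {period: pos / total for period, (pos, total) in counts.items()}
```

Grouping by `day` instead of by period would give the per-day series needed for a timeline plot.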

Pie chart of the percentage of COVID-19 tweets among all extracted tweets, and a word cloud of the most frequently mentioned words regarding COVID-19.

[Figures: Corona Pie; Word Cloud]
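A word cloud is typically drawn from term frequencies. As a minimal sketch (an assumption, not the author's notebook), the counts could be gathered from already-preprocessed token lists with the standard library:

```python
from collections import Counter
from typing import Iterable, List, Tuple

def top_terms(token_lists: Iterable[List[str]], n: int = 10) -> List[Tuple[str, int]]:
    """Count token occurrences across all tweets and return the n most common."""
    counts: Counter = Counter()
    for tokens in token_lists:
        counts.update(tokens)
    return counts.most_common(n)
```

The full `Counter` could then be handed to a word-cloud renderer, for example `WordCloud().generate_from_frequencies(...)` from the third-party `wordcloud` package; whether the project used that package is not stated here.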

Creating the Egyptian Governorates dataset

I collected information on the Egyptian governorates needed for the later data collection and analysis of results. I gathered the required features from three different sources, using pandas' read_html for tables, and combined them into a single dataset.
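Combining several per-governorate tables can be sketched as a chain of joins on a shared key. In practice each table would come from `pd.read_html(url)`, which returns a list of DataFrames (and needs an HTML parser such as lxml installed); here, hypothetically, the tables are assumed to be already cleaned so that each has a `governorate` column with consistently spelled names. The column names below are illustrative placeholders.

```python
from functools import reduce
from typing import List

import pandas as pd

def combine_sources(tables: List[pd.DataFrame], key: str = "governorate") -> pd.DataFrame:
    """Outer-join per-governorate tables on a shared key column."""
    return reduce(lambda left, right: pd.merge(left, right, on=key, how="outer"), tables)
```

An outer join keeps governorates that are missing from one source (their missing features become NaN), which makes spelling mismatches between sources easy to spot before analysis.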

Twitter Data Mining

The study uses the TWINT tool (https://github.com/twintproject/twint) to scrape tweets from Twitter, overcoming the tweet limit and session time of the Twitter API. I propose four different approaches to extract spatially unlabelled tweets with respect to a certain area or city:

  • Geotagged Approach
  • Keyword Search Approach
  • Profile Info Approach
  • Nearby Location Approach
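The README only names the four approaches, so the following is one possible interpretation of two of them, not the project's implementation. It assumes a geotagged tweet is assigned to the nearest governorate centre within a radius (hypothetical 25 km), and otherwise falls back to matching the user's free-text profile location against a hypothetical alias list. The coordinates are approximate city centres, included for illustration only.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

LatLon = Tuple[float, float]

# Approximate city-centre coordinates, for illustration only.
GOVERNORATE_CENTRES: Dict[str, LatLon] = {
    "Cairo": (30.0444, 31.2357),
    "Alexandria": (31.2001, 29.9187),
}

# Hypothetical aliases that might appear in free-text profile locations.
GOVERNORATE_ALIASES: Dict[str, List[str]] = {
    "Cairo": ["cairo", "القاهرة"],
    "Alexandria": ["alexandria", "alex", "الإسكندرية"],
}

@dataclass
class Tweet:
    text: str
    geo: Optional[LatLon] = None  # (lat, lon) when the tweet is geotagged
    profile_location: str = ""    # free-text location from the user's profile

def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def assign_governorate(tweet: Tweet, radius_km: float = 25.0) -> Optional[str]:
    # Geotag-based: nearest centre, accepted only if within radius_km.
    if tweet.geo is not None:
        name, dist = min(
            ((gov, haversine_km(tweet.geo, centre)) for gov, centre in GOVERNORATE_CENTRES.items()),
            key=lambda pair: pair[1],
        )
        if dist <= radius_km:
            return name
    # Profile-based: case-insensitive substring match on the profile location.
    location = tweet.profile_location.casefold()
    for gov, aliases in GOVERNORATE_ALIASES.items():
        if any(alias in location for alias in aliases):
            return gov
    return None
```

Substring matching is deliberately simple and will produce false positives (for example "alex" inside unrelated names), so a real alias list would need curating.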

Preprocessing and Sentiment Analysis

I used the sentiment-annotated Arabic dataset from https://github.com/iamaziz/ar-embeddings and an annotated English dataset to build our model; both are in our datasets folder.

For NLP, I used regex techniques to normalize the tweets by removing unnecessary formats, emojis, numbers and punctuation. I then used the Tashaphyne Arabic light stemmer (https://pypi.org/project/Tashaphyne/) to lightly stem the tweets, and the NLTK stopwords corpus to remove Arabic stopwords.
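The cleaning steps above can be sketched with Python's `re` module. The patterns are assumptions, not the project's exact regexes: the emoji ranges are a partial set, and the stemmer and stopword list are passed in as parameters so the sketch runs without Tashaphyne or the NLTK corpus installed.

```python
import re
from typing import Callable, Iterable, List

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
# Emoji and pictograph blocks (a partial, assumed set of ranges).
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]+")
# ASCII digits and Arabic-Indic digits.
DIGIT_RE = re.compile(r"[0-9\u0660-\u0669]+")
# Anything that is neither a word character nor whitespace (also catches Arabic
# punctuation such as "،" and "؟"), plus underscores.
PUNCT_RE = re.compile(r"[^\w\s]|_")

def normalize(text: str) -> str:
    """Lowercase, strip URLs, mentions, emojis, digits and punctuation."""
    text = text.lower()
    for pattern in (URL_RE, MENTION_RE, EMOJI_RE, DIGIT_RE, PUNCT_RE):
        text = pattern.sub(" ", text)
    return " ".join(text.split())  # collapse repeated whitespace

def preprocess(
    text: str,
    stopwords: Iterable[str],
    stem: Callable[[str], str] = lambda w: w,  # identity unless a stemmer is supplied
) -> List[str]:
    """Normalize, drop stopwords, then stem each remaining token."""
    stop = set(stopwords)
    return [stem(tok) for tok in normalize(text).split() if tok not in stop]
```

To mirror the setup described above, `stopwords` could be NLTK's Arabic list (`nltk.corpus.stopwords.words("arabic")`, after downloading the corpus with `nltk.download("stopwords")`), and `stem` a one-argument wrapper around Tashaphyne's `ArabicLightStemmer`; check that library's documentation for the method that returns the stem.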

To build the model, I combined scikit-learn's CountVectorizer over unigrams and bigrams, a TF-IDF transformer that turns the counts into a TF-IDF matrix, and a Naive Bayes classifier into a single pipeline trained on the stemmed tweets.
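A pipeline of that shape can be sketched with scikit-learn as follows. The README does not say which Naive Bayes variant was used; `MultinomialNB` is assumed here because it is the usual choice for count and TF-IDF features. The pipeline expects strings, so stemmed tokens would be re-joined with spaces before fitting.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

def build_sentiment_model() -> Pipeline:
    """Unigram+bigram counts -> TF-IDF weighting -> Naive Bayes classifier."""
    return Pipeline([
        ("counts", CountVectorizer(ngram_range=(1, 2))),  # unigram and bigram counts
        ("tfidf", TfidfTransformer()),                    # reweight counts into TF-IDF
        ("nb", MultinomialNB()),                          # assumed Naive Bayes variant
    ])
```

Usage: `model = build_sentiment_model(); model.fit(texts, labels); model.predict(new_texts)`. scikit-learn's `TfidfVectorizer` merges the first two steps into one, which is an equivalent alternative.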

COVID-19 Detection in Tweets

I made a dictionary of the most common terms related to the coronavirus and used it to detect tweets that contain any of these terms, labelling each such tweet a COVID-19 tweet.
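Dictionary-based detection reduces to a set-membership test on tweet tokens. The sketch below uses a small hypothetical term list (the project's own dictionary is not reproduced here) and also computes the share of COVID-19 tweets, the quantity shown in the pie chart.

```python
import re
from typing import Iterable, Set

# Hypothetical starter dictionary; not the project's actual term list.
COVID_TERMS: Set[str] = {"corona", "coronavirus", "covid", "covid19", "كورونا", "كوفيد"}

TOKEN_RE = re.compile(r"\w+")  # Unicode-aware, so Arabic words are single tokens

def is_covid_tweet(text: str, terms: Set[str] = COVID_TERMS) -> bool:
    """True if any token of the tweet appears in the term dictionary."""
    tokens = set(TOKEN_RE.findall(text.lower()))
    return not tokens.isdisjoint(terms)

def covid_share(texts: Iterable[str], terms: Set[str] = COVID_TERMS) -> float:
    """Fraction of tweets flagged as COVID-19 tweets (0.0 for no tweets)."""
    flags = [is_covid_tweet(t, terms) for t in texts]
    return sum(flags) / len(flags) if flags else 0.0
```

Exact token matching misses Arabic words with attached prefixes (for example a leading "و" or "ال"), which is one reason to run detection on stemmed text or to include common prefixed forms in the dictionary.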
