saotd's Introduction

Sentiment Analysis of Twitter Data


Purpose

This package focuses on Twitter data because of its widespread global adoption. The rapid expansion of social media has opened doors into opinions and perceptions that were never as accessible as they are with today’s prevalence of mobile technology. Twitter data that is harvested and analyzed for opinion and sentiment can provide powerful insight into a population: it can help companies better understand their target audience and enable governments to make more informed decisions for the people they serve. During the course of this research, data was acquired through the public Twitter Application Programming Interface (API) to obtain Tweets as the foundation of the analysis, and a methodology was built that uses a topic-modeling and lexicographical approach to analyze the sentiment and opinions of English-language text and determine a general sentiment such as positive or negative. As more people express themselves on social media, this application can be used to gauge the general feeling of a population.

Package

The saotd package is an R interface to the Twitter API that can be used to acquire Tweets based on user-selected #hashtags; it was developed using a tidyverse approach. The package is designed to let a user conduct a complete analysis with the contained functions: it will clean and tidy the Twitter data, determine the latent topics within the Tweets using Latent Dirichlet Allocation (LDA), compute sentiment scores using the Bing lexicon dictionary, and output visualizations.

Installation

You can install the CRAN version using:

install.packages("saotd")

You can install the development version from GitHub using:

install.packages("devtools")
devtools::install_github('evan-l-munson/saotd', build_vignettes = TRUE)

Using saotd

The functions provided by saotd fall into five categories: Acquire, Explore, Topic Analysis, Sentiment Calculation, and Visualizations. A sketch that strings these functions together into one possible workflow follows the list below.

  • Acquire

    • tweet_acquire allows a user to acquire Tweets of their choosing by accessing the Twitter API. To do this, the user needs a Twitter account and must then sign up for a Twitter developer account. Once the user has a developer account and has received their individual consumer key, consumer secret key, access token, and access secret key, they can acquire Tweets based on a list of hashtags and a requested number of entries per hashtag.
  • Explore

    • tweet_tidy removes all emoticons, punctuation, weblinks, etc. and converts the data to a tidy structure.
    • merge_terms merges terms within a dataframe and prevents redundancy in the analysis.
    • unigram displays the text Uni-Grams within the Twitter data in sequence from the most used to the least used. A Uni-Gram is a single word.
    • bigram displays the text Bi-Grams within the Twitter data in sequence from the most used to the least used. A Bi-Gram is a combination of two consecutive words.
    • trigram displays the text Tri-Grams within the Twitter data in sequence from the most used to the least used. A Tri-Gram is a combination of three consecutive words.
    • bigram_network builds a Bi-Gram network from the computed Bi-Grams. The network serves as a visualization tool that displays the relationships between words simultaneously, as opposed to a tabular display of Bi-Gram words.
    • word_corr determines the pairwise correlations between words within the Twitter data.
    • word_corr_network displays the mutual relationship between words. The correlation network shows higher correlations with a thicker and darker edge color.
  • Topic Analysis

    • number_topics determines the optimal number of latent topics within a dataframe by tuning the Latent Dirichlet Allocation (LDA) model parameters. It uses the ldatuning package and outputs an ldatuning plot. This process can be time consuming depending on the size of the dataframe.
    • tweet_topics determines the latent topics within a dataframe using an LDA model: it prepares the Tweet text, creates a Document-Term Matrix (DTM), conducts the LDA, and displays the terms associated with each topic.
  • Sentiment Calculation

    • tweet_scores calculates sentiment scores using the Bing lexicon dictionary, accounting for sentiment by hashtag or topic.
    • posneg_words determines and displays the most positive and negative words within the Twitter data.
    • tweet_min_scores determines the minimum scores for either the entire dataset or the minimum scores associated with a hashtag or topic analysis.
    • tweet_max_scores determines the maximum scores for either the entire dataset or the maximum scores associated with a hashtag or topic analysis.
  • Visualizations

    • tweet_corpus_distribution determines the scores distribution for the entire Twitter data corpus.
    • tweet_distribution determines the scores distribution by hashtag or topic for Twitter data.
    • tweet_box displays the distribution of scores, for either hashtag or topic Twitter data, as a box plot.
    • tweet_violin displays the distribution of scores, for either hashtag or topic Twitter data, as a violin plot.
    • tweet_time displays how the Twitter data sentiment scores change through time.
    • tweet_worldmap is no longer exported, as the Twitter data does not contain latitude and longitude values. It displayed the location of Tweets across the globe by hashtag or topic.
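
The sketch below strings the functions above together into one possible end-to-end workflow. It is a minimal, hedged example: the function names come from the list above, but the argument names (the key arguments, DataFrame, HT_Topic, clusters, and so on) are illustrative assumptions rather than the package's exact signatures; consult the help pages (for example ?tweet_acquire) or the vignette for the real arguments.

library(saotd)

# NOTE: argument names below are illustrative assumptions, not exact signatures.

# Acquire: pull Tweets for chosen hashtags using Twitter developer keys.
raw_tweets <- tweet_acquire(
  consumer_api_key        = "your_consumer_key",
  consumer_api_secret_key = "your_consumer_secret_key",
  access_token            = "your_access_token",
  access_secret           = "your_access_secret_key",
  query                   = "#rstats",
  num_tweets              = 1000
)

# Explore: tidy the raw Tweets, then inspect common terms.
tidy_tweets <- tweet_tidy(DataFrame = raw_tweets)
unigram(DataFrame = tidy_tweets)
bigram(DataFrame = tidy_tweets)

# Topic Analysis: gauge a reasonable number of topics, then fit the LDA model.
number_topics(DataFrame = tidy_tweets, num_cores = 2L)
topic_tweets <- tweet_topics(DataFrame = tidy_tweets, clusters = 5)

# Sentiment Calculation: score the tidy Tweets with the Bing lexicon.
scored_tweets <- tweet_scores(DataFrameTidy = tidy_tweets, HT_Topic = "hashtag")

# Visualizations: score distributions for the corpus and by hashtag.
tweet_corpus_distribution(DataFrameTidyScores = scored_tweets)
tweet_box(DataFrameTidyScores = scored_tweets, HT_Topic = "hashtag")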

Example

For an example of how to use this package, find the vignette at:

library(saotd)
utils::vignette("saotd")

Meta

  • license:

    • All code is licensed GPL.
    • All data is from public data sources.
  • Get citation information for saotd in R by running:

citation("saotd")

Getting help

If you encounter a clear bug, please file a minimal reproducible example on GitHub.

Contributing

If you would like to contribute, please create a Pull Request with the applicable changes for review.


saotd's People

Contributors

arfon, evan-l-munson, kbenoit, seanstuntz


saotd's Issues

Uhorchak Review

Overall excellent work. Only issue I received during use was with the bi-gram network.
SAoTD::Bigram.Network()

saotd not installing on R

E> * preparing 'saotd':
E> * checking DESCRIPTION meta-information ... OK
E> Error in loadVignetteBuilder(pkgdir, TRUE) :
E> vignette builder 'knitr' not found
E> Execution halted
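
The "vignette builder 'knitr' not found" message usually means the vignette toolchain is missing from the library. A hedged workaround (not confirmed against this exact setup) is to install it first and then retry the development install:

# Assumed fix: install the vignette-building packages, then retry the
# GitHub install with vignettes enabled.
install.packages(c("knitr", "rmarkdown", "devtools"))
devtools::install_github("evan-l-munson/saotd", build_vignettes = TRUE)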

Stuntz feedback 2

Rename Columns

A handy function! When I input a new column name and click Save Name, the name I entered remains and the Column Names reverts back to the first name. The script correctly does not allow you to assign two columns the same name. What is the purpose of the search bar here if I cannot search for column names?

Dependency on old Rlang package

When I follow your instructions on https://cran.r-project.org/web/packages/saotd/vignettes/saotd.html under the Acquire section, R comes up with an error:

Error in mutate_impl(.data, dots) :
Evaluation error: as_dictionary() is defunct as of rlang 0.3.0.
Please use as_data_pronoun() instead

Unfortunately, I don't have the opportunity to test whether this error occurs on other systems and R installations. The error message, however, indicates an old dependency, which should be updated. I hope somebody can fix this.
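
The mutate_impl() call in the traceback points at an older installed dplyr still relying on rlang helpers that were made defunct in rlang 0.3.0. A hedged, user-side workaround, assuming the local environment is the culprit rather than saotd itself, is to update those dependencies and reinstall the package:

# Assumed workaround: update the tidyverse dependencies that the error
# implicates, then reinstall saotd so it runs against current versions.
install.packages(c("rlang", "dplyr"))
install.packages("saotd")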

Proposal Review

@evan-l-munson

Here's what I see:

Overall: The package structure looks good - everything is in place

Feature Table: Extremely detailed, very well done

Potential issues: Pulling Twitter data (particularly via the API) requires that you create an app on Twitter to generate access keys. Depending on how the app is set up you will have a monthly limit on the number of API requests that can be made. If this is published online you could have many people making requests on your behalf and hit that limit very quickly.

Also, if this is published as a shiny app you won't be able to do anything with the data afterwards. From what I understand your use case seems to make this a good candidate to be a shiny gadget, rather than a shiny app.
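
One general pattern for keeping those access keys out of any code that gets published (an illustration, not something the package itself prescribes) is to store them in ~/.Renviron and read them at run time; the variable names below are placeholders:

# Assumed pattern: ~/.Renviron holds lines such as
#   TWITTER_CONSUMER_KEY=xxxx
#   TWITTER_CONSUMER_SECRET=xxxx
# and the code reads them instead of hard-coding the keys.
consumer_key    <- Sys.getenv("TWITTER_CONSUMER_KEY")
consumer_secret <- Sys.getenv("TWITTER_CONSUMER_SECRET")
access_token    <- Sys.getenv("TWITTER_ACCESS_TOKEN")
access_secret   <- Sys.getenv("TWITTER_ACCESS_SECRET")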
