News-based Sentiment Analysis with GDELT

Introduction

In this project, we explore the feasibility of using news articles to predict or interpolate the relationship (friendliness) between important geopolitical entities, such as large companies, politicians, and military representatives. We hope to analyze and, ideally, forecast trends in the socioeconomic conflicts centered on these entities. For instance, we ask questions like "Is the relationship between Shell and Nigerian governors worsening?" or "How many 'hidden parties' exist among these politicians?".

To do so, we first scrape all relevant news articles using the URLs from the GDELT Project database. Next, we split the articles into sentences and detect entity co-mentions within each sentence. Whenever a co-mention is detected within a sentence, e.g., "A and B failed to resolve their disputes across a wide range of issue areas.", we detect the event(s) mentioned in that sentence and calculate a sentiment score based on the Goldstein Conflict Score. We use these sentiment scores as a proxy for the friendliness between the geopolitical players of interest. Finally, we construct a relationship graph from the co-mention edges, together with tonality scores, and use it for analysis (e.g., graph clustering) and visualization.
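The sentence-level co-mention step can be illustrated with a minimal sketch. It is not the code under src: the alias table, the naive regex sentence splitter, and the toy keyword scorer (with made-up values, not the real Goldstein table) are all assumptions chosen to keep the example self-contained.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical alias table: canonical entity -> surface forms to search for.
ENTITY_ALIASES = {
    "Shell": ["Shell", "Royal Dutch Shell"],
    "Nigeria": ["Nigeria", "Nigerian government"],
}

# Made-up keyword scores for illustration only (not the real Goldstein values).
TOY_SCORES = {"dispute": -5.0, "agreement": 6.0}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def entities_in(sentence):
    """Return the sorted canonical entities with an alias present as a whole phrase."""
    found = set()
    for entity, aliases in ENTITY_ALIASES.items():
        if any(re.search(r"\b" + re.escape(a) + r"\b", sentence) for a in aliases):
            found.add(entity)
    return sorted(found)


def toy_score(sentence):
    """Average the scores of all toy keywords found in the sentence (0.0 if none)."""
    hits = [v for k, v in TOY_SCORES.items() if k in sentence.lower()]
    return sum(hits) / len(hits) if hits else 0.0


def co_mention_edges(text, score_sentence):
    """Average a per-sentence score over every sentence in which a pair co-occurs."""
    scores = defaultdict(list)
    for sentence in split_sentences(text):
        for pair in combinations(entities_in(sentence), 2):
            scores[pair].append(score_sentence(sentence))
    return {pair: sum(v) / len(v) for pair, v in scores.items()}


article = (
    "Shell and Nigeria failed to resolve their disputes. "
    "Later, Shell signed an agreement with the Nigerian government. "
    "Oil prices rose."
)
edges = co_mention_edges(article, toy_score)
print(edges)
```

Each key of the resulting dictionary is an alphabetically ordered entity pair, and each value is the mean score over the sentences in which that pair is co-mentioned. Sentences that mention fewer than two known entities contribute nothing.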

In summary, this repo archives code snippets for:

  • News scraping given URLs, text cleaning, and sentence tokenization
  • Entity mention detection (workable, but in development)
  • Co-reference resolution (experimenting)
  • Event detection and scoring (experimenting with advanced event extraction features)
  • Graph clustering and visualization

Environment Setup

Conda is required for setup. To recreate the environment, run conda env create -f environment.yml in a terminal. The environment file is located in the project home folder. To activate the newly created environment, run conda activate pennguin.

Get Started with Source Code

For low-level APIs (scraping, co-mention detection, and event detection and grading), refer directly to the source code under $REPO_FOLDER/src. Detailed descriptions and sample usage are documented in the source files.

For graph clustering and visualization, please refer to $REPO_FOLDER/examples and read the Jupyter notebooks.
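As a rough illustration of what clustering a relationship graph can look like, the sketch below groups entities that are linked by positively scored edges, using a small union-find. This is a deliberately simple stand-in and not necessarily the method used in the notebooks; the function name, the threshold, and the entity names A to E are hypothetical.

```python
def friendly_clusters(edges, threshold=0.0):
    """Group entities connected by edges whose score exceeds `threshold`."""
    parent = {}

    def find(x):
        # Register unseen nodes as their own root, then walk up with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in edges.items():
        root_a, root_b = find(a), find(b)  # registers both endpoints
        if score > threshold:
            parent[root_a] = root_b  # merge only across friendly edges

    clusters = {}
    for node in list(parent):
        clusters.setdefault(find(node), set()).add(node)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical edge scores: positive means friendly, negative means hostile.
example_edges = {
    ("A", "B"): 4.0,
    ("B", "C"): 2.5,
    ("C", "D"): -6.0,
    ("D", "E"): 3.0,
}
clusters = friendly_clusters(example_edges)
print(clusters)
```

Here the hostile edge between C and D is ignored when merging, so the entities fall into two groups: A, B, C and D, E. Raising the threshold makes the grouping stricter.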

Misc

  • $REPO_FOLDER/analysis contains code for past analysis. Each analysis has its own source code folder, data folder, and output folder.
  • $REPO_FOLDER/data contains global data files shared across the entire project.
