lukasgebhard / political-news-filter

A classifier that distinguishes political from non-political news articles.

License: Apache License 2.0


Political News Filter

Political News Filter classifies English news articles according to whether they cover policy topics.

It uses a broad definition of politics: politics is about "who gets what, when, and how" (Lasswell, 1936). As a result, Political News Filter may classify business or tech news as political, depending on the actual content of the article.

Requirements

  • Python 3.6
  • Pandas 0.24.1
  • NumPy 1.18.1
  • Keras 2.3.1
  • TensorFlow 2.1.0

Political News Filter supports both CPU and GPU processing. The latter is faster but requires a CUDA-capable graphics card and the CUDA toolkit.

Setup

  1. Clone this repository:

    $ git clone https://github.com/lukasgebhard/Political-News-Filter.git
    $ cd Political-News-Filter
  2. Download and extract pon_classifier.zip into the repository folder. Its inflated size is 1.1 GB.

  3. Install Python dependencies. For example, create a virtual environment:

    $ virtualenv --python=python3.6 venv
    $ source venv/bin/activate
    $ pip install -r requirements.txt
  4. Verify the installation was successful:

    $ ./check_installation.sh
    Hooray! Political News Filter is properly installed and ready to use.

Usage Demo

Start a Python session:

$ python3

Create two example articles:

>>> political_article = '''White House declares war against terror. The US government officially announced a ''' \
                        '''large-scale military offensive against terrorism. Today, the Senate agreed to spend an ''' \
                        '''additional 300 billion dollars on the advancement of combat drones to be used against ''' \
                        '''global terrorism. Opposition members sharply criticize the government. ''' \
                        '''"War leads to fear and suffering. ''' \
                        '''Fear and suffering is the ideal breeding ground for terrorism. So talking about a ''' \
                        '''war against terror is cynical. It's actually a war supporting terror."'''
>>> nonpolitical_article = '''Table tennis world cup 2025 takes place in South Korea. ''' \
                           '''The 2025 world cup in table tennis will be hosted by South Korea, ''' \
                           '''the Table Tennis World Commitee announced yesterday. ''' \
                           '''Three-time world champion, Hu Ho Han, did not pass the qualification round, ''' \
                           '''to the advantage of underdog Bob Bobby who has been playing outstanding matches ''' \
                           '''in the National Table Tennis League this year.'''

To filter a list of news articles, call filter_news:

>>> from political_news_filter import filter_news
>>> political_article == filter_news([political_article, nonpolitical_article])[0]
True

If you need more flexibility, you can directly call the underlying classifier:

>>> from political_news_filter import Classifier
>>> classifier = Classifier()
>>> probabilities = classifier.estimate([political_article, nonpolitical_article])
>>> probabilities[0] > 0.99
True
>>> probabilities[1] < 0.01
True

Please read the docstrings for further information.
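
As a rough illustration of what that flexibility allows, the sketch below applies a custom decision threshold to a list of probabilities. It assumes that `estimate` returns one probability per article, in input order, as the demo above suggests. The helper `select_political`, the default threshold of 0.5, and the placeholder inputs are hypothetical and not part of Political News Filter; the probabilities are hard-coded stand-ins, not model output.

```python
from typing import List, Sequence


def select_political(articles: Sequence[str],
                     probabilities: Sequence[float],
                     threshold: float = 0.5) -> List[str]:
    """Return the articles whose probability is at least `threshold`.

    Hypothetical helper: `probabilities` is assumed to hold one value per
    article, in the same order as `articles`.
    """
    if len(articles) != len(probabilities):
        raise ValueError("expected one probability per article")
    return [a for a, p in zip(articles, probabilities) if p >= threshold]


# Placeholder inputs for illustration only (not produced by the classifier).
articles = ["article about a senate vote", "article about a table tennis match"]
probabilities = [0.97, 0.02]

# A stricter threshold trades recall for precision.
strict = select_political(articles, probabilities, threshold=0.9)
print(strict)
```

In a real session, `probabilities` would come from `classifier.estimate(articles)` instead of the hard-coded list.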

Runtime Performance

Below are some benchmarks measured on a notebook with 6 CPU cores @ 2.6 GHz, a GPU with 4 GB of graphics memory and CUDA compute capability 7.5, 32 GB RAM, and a PCIe SSD:

Task                              | On CPU  | On GPU
One-time initialization           | 30 sec  | 15 sec
Classification of 1,000 articles  | 1.8 sec | 1.3 sec
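
To get a feel for what these figures mean for a larger batch, here is a back-of-the-envelope estimate. It assumes that classification time scales linearly with the number of articles, which the benchmarks above do not confirm, and that comparable hardware is used. The function `estimate_runtime` is a hypothetical helper, not part of Political News Filter.

```python
# Benchmark figures copied from the table above, in seconds.
INIT_SECONDS = {"cpu": 30.0, "gpu": 15.0}
SECONDS_PER_1000 = {"cpu": 1.8, "gpu": 1.3}


def estimate_runtime(n_articles: int, device: str) -> float:
    """Estimate total seconds: one-time initialization plus classification.

    Assumption: classification time grows linearly with `n_articles`.
    """
    if device not in INIT_SECONDS:
        raise ValueError("device must be 'cpu' or 'gpu'")
    return INIT_SECONDS[device] + SECONDS_PER_1000[device] * n_articles / 1000


for device in ("cpu", "gpu"):
    seconds = estimate_runtime(100_000, device)
    print(f"{device}: about {seconds:.0f} sec for 100,000 articles")
```

Under that linearity assumption, the GPU advantage for small batches comes mostly from the shorter initialization.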

Architecture

The classifier is based on a model by Heng Zheng submitted to Kaggle under the Apache 2.0 license. It is a convolutional neural network with a 100-dimensional GloVe embedding layer, three convolutional layers, each followed by a ReLU layer and a pooling layer, and finally a softmax output layer. During training, a cross-entropy loss is minimized, with dropout used for regularization.
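
For readers unfamiliar with the last two building blocks, the following plain-Python sketch shows how a softmax output and a cross-entropy loss fit together for a single article. It is not the model's actual code: the two-class layout, the class order, and the example scores are assumptions made purely for illustration.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs: List[float], true_index: int) -> float:
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[true_index])


# Hypothetical raw scores, assumed order: [political, non-political].
logits = [2.0, -1.0]
probs = softmax(logits)

# Loss if the article is in fact political (index 0 under the assumed order).
loss = cross_entropy(probs, 0)
print(probs, loss)
```

The loss shrinks towards zero as the probability of the true class approaches 1, which is what minimizing it during training encourages.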

Training & Evaluation

I created a labeled set of 0.57M news articles, selected from:

After fitting the classifier on 87.5 % of the articles, testing it on the remaining 12.5 % yields:

  • F1 = 94.4
  • Precision = 95.6
  • Recall = 93.2
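
As a quick consistency check, F1 is the harmonic mean of precision and recall, so the reported value can be recomputed from the other two. The sketch below also derives approximate split sizes, assuming "0.57M" means roughly 570,000 articles; the exact count is not given here, so those sizes are only indicative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f1 = f1_score(95.6, 93.2)
print(round(f1, 1))  # 94.4, matching the reported F1

# Approximate split sizes; 570,000 is an assumed reading of "0.57M".
TOTAL_ARTICLES = 570_000
train_size = int(TOTAL_ARTICLES * 0.875)
test_size = TOTAL_ARTICLES - train_size
print(train_size, test_size)
```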

How to Cite

If you use Political News Filter, please cite our poster:

@InProceedings{POLUSA,
  author     = {Gebhard, Lukas and Hamborg, Felix},
  title      = {The POLUSA Dataset: 0.9M Political News Articles Balanced by Time and Outlet Popularity},
  year       = {2020},
  month      = {August},
  booktitle  = {Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020 (JCDL '20)},
  venue      = {Virtual event, China},
  publisher  = {Association for Computing Machinery},
  doi        = {10.1145/3383583.3398567}
}
