Coder Social home page Coder Social logo

clickbait-classifier-1's Introduction

Clickbait Classifier

This is a very naïve attempt at classifying article titles into one of two groups: "clickbait" (à la Buzzfeed and Clickhole) or "news" (à la The New York Times). I was curious if this could be done accurately; I can't think of a good definition for "clickbait" but I know it when I see it.

Eventually, I'd like to add a simple REST API and build a chrome extension that automatically deletes clickbait from my browser as I surf the internet. I truly cannot stand Buzzfeed.

Setup / Installation

Theoretically, the only command you need to run is:

clickbait-classifier/ $ pip install -r requirements.txt

Of course, you may want to set up a virtualenv:

clickbait-classifier/ $ virtualenv env
clickbait-classifier/ $ . env/bin/activate
(env)clickbait-classifier/ $ pip install -r requirements.txt

The classifier's dependencies can be found in requirements.txt. I'd like to say that it's a simple pip install but depending on your version of Python and your OS, scikit-learn and lxml may give you some issues. Be warned: these issues may take some time to resolve. I think it took me about 3 hours to get everything set up and I ended up having to start using a new package manager (brew) and a new Python install.

Usage

The code is pretty messy, but the general idea is that there is some article data in the data/ directory, and classifier.py uses this for training. You can download more data from Buzzfeed and Clickhole using the tools in scripts/.

clickbait-classifier/ $ ./scripts/scrape_buzzfeed.py > data/buzzfeed2.json  
clickbait-classifier/ $ ./scripts/scrape_clickhole.py > data/clickhole2.json  

If you feel like testing a few article titles, you can get a simple testing loop like so:

clickbait-classifier/ $ ./interactive.py

This will load the classifier, train it, and then present you with a simple loop where you can paste in article titles and see the results. You can quit using c-C. For example:

clickbait-classifier/ $ ./interactive.py
Loading classifier (may take time to train.)
Classification report:
             precision    recall  f1-score   support

  clickbait       0.91      0.62      0.74       172
       news       0.90      0.98      0.94       621

avg / total       0.91      0.91      0.90       793


  -9.0500 10 things         -5.3044 new
  -9.0500 11 things         -5.7492 bush
  -9.0500 13 times          -5.8460 overview
  -9.0500 15 times          -5.9519 iraq
  -9.0500 19 puppies        -5.9645 war
  -9.0500 2014              -5.9828 president
  -9.0500 2015              -5.9852 clinton
  -9.0500 21                -6.1021 special
  -9.0500 23 life           -6.1206 nation
  -9.0500 24                -6.1464 report
  -9.0500 25                -6.1778 campaign
  -9.0500 27                -6.2223 china
  -9.0500 33                -6.2880 york
  -9.0500 35                -6.2880 new york
  -9.0500 90s               -6.2994 plan
  -9.0500 90s kid           -6.3191 special report
  -9.0500 90s kids          -6.3523 says
  -9.0500 90s kids rejoice    -6.4277 big
  -9.0500 90s sitcom        -6.4423 challenged
  -9.0500 absolute          -6.4465 house
Done.

Article title: 43 Reasons 2014 Was The Best Year Ever To Be A Nerd
(95.13% clickbait, 4.87% news) -> clickbait

Article title: Protesters And Police Clash In Missouri For A Second Night
(19.32% clickbait, 80.68% news) -> news

Article title: 29 Christmas Vines That Will Make You Laugh Every Time
(88.25% clickbait, 11.75% news) -> clickbait

Article title: New Subprime Boom Ties Risky Loans to Car Titles
(10.98% clickbait, 89.02% news) -> news

Article title: ^C

clickbait-classifier-1's People

Contributors

peterldowns avatar yoavz avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.