NLP Political Speeches

Goal: Classify political speeches as Democratic or Republican

Author

Jose Miguel Montoro

Description

This project investigates how to train machine learning and deep learning models to classify political speeches as Democratic or Republican. The training data consist of labeled historical speeches by Democratic and Republican politicians, scraped from the web. So far, several text representations have been implemented: n-grams (Bag of Words) using scikit-learn's CountVectorizer, TF-IDF with scikit-learn, spaCy word vectors, and fastText word vectors. Several ML classifiers have been applied (Logistic Regression, Naive Bayes, Support Vector Machines, Random Forest), as well as neural networks (multi-layer perceptron, convolutional neural network). The best-performing ML model is an SVM with TF-IDF features. The MLP also performed very well with TF-IDF.
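As a rough illustration of the TF-IDF plus SVM combination described above, here is a minimal sketch using scikit-learn. It is not the code from the project's notebooks: the example sentences, their labels, the choice of `LinearSVC`, and the vectorizer settings (`ngram_range`, `stop_words`) are all assumptions made for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-ins for the scraped speeches; texts and labels are invented
# for illustration and are not taken from the project's data.
speeches = [
    "We must expand health care and protect union workers",
    "Raise the minimum wage and invest in public schools",
    "Cut taxes and shrink the federal government",
    "Secure the border and defend gun rights",
]
parties = ["Democratic", "Democratic", "Republican", "Republican"]

# TF-IDF features over unigrams and bigrams, fed into a linear SVM.
# The hyperparameters here are assumptions, not the project's settings.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), stop_words="english"),
    LinearSVC(),
)
model.fit(speeches, parties)

# Classify an unseen snippet.
prediction = model.predict(["Lower taxes and a smaller government"])
print(prediction[0])
```

Wrapping the vectorizer and the classifier in one pipeline keeps the vocabulary and IDF weights tied to the training data, so the same transformation is applied automatically at prediction time.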

Structure

All temporary files are in the data folder. The helpers folder contains Python modules with utilities to process the data and to build and evaluate the models. All the code to extract the data, run the EDA, preprocess the text, and build and evaluate the models is in the notebooks folder.

These are the steps that were followed to complete the project. Each step corresponds to one notebook.

  1. a) Web Scraping from https://millercenter.org/
     b) Web Scraping from https://www.americanrhetoric.com/
  2. Text Pre-Processing
  3. Exploratory Data Analysis
  4. Vectorization and Classification Models
  5. Neural Networks
  6. Deep Learning models
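
The "Vectorization and Classification Models" step compares several feature representations and classifiers. One way such a comparison could be organized is sketched below with scikit-learn. The toy texts, the labels, the two-fold cross-validation, and every hyperparameter are assumptions made for illustration; the actual notebook may be structured quite differently, and scores on a toy corpus this small say nothing about real performance.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented placeholder data, not the project's scraped speeches.
texts = [
    "Expand health care and protect union workers",
    "Raise the minimum wage and invest in public schools",
    "Protect voting rights and invest in clean energy",
    "Cut taxes and shrink the federal government",
    "Secure the border and defend gun rights",
    "Cut regulation and defend free enterprise",
]
labels = ["D", "D", "D", "R", "R", "R"]

# Factories so that every pipeline gets fresh, unfitted estimators.
vectorizers = {
    "bag-of-words": lambda: CountVectorizer(ngram_range=(1, 2)),
    "tf-idf": lambda: TfidfVectorizer(ngram_range=(1, 2)),
}
classifiers = {
    "logistic regression": lambda: LogisticRegression(max_iter=1000),
    "naive bayes": lambda: MultinomialNB(),
    "svm": lambda: LinearSVC(),
    "random forest": lambda: RandomForestClassifier(n_estimators=50, random_state=0),
}

# Mean cross-validated accuracy for every vectorizer/classifier pair.
results = {}
for vec_name, make_vec in vectorizers.items():
    for clf_name, make_clf in classifiers.items():
        pipeline = make_pipeline(make_vec(), make_clf())
        scores = cross_val_score(pipeline, texts, labels, cv=2)
        results[(vec_name, clf_name)] = scores.mean()

# Print the combinations from highest to lowest mean accuracy.
for (vec_name, clf_name), accuracy in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{vec_name:13s} {clf_name:20s} {accuracy:.2f}")
```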
