NLP Political Speeches

Goal: Classify political speeches as Democratic or Republican

Author

Description

This project investigates how to train machine learning and deep learning models to classify political speeches as Democratic or Republican. The training data are labeled historical speeches by Democratic and Republican politicians, scraped from the web. So far, several text representations have been implemented: n-grams (bag of words) using scikit-learn's CountVectorizer, TF-IDF with scikit-learn, spaCy word vectors, and fastText word vectors. Several classical ML classifiers have been applied (Logistic Regression, Naive Bayes, Support Vector Machines, Random Forest), as well as neural networks (multi-layer perceptron, convolutional neural network). The best-performing classical model is an SVM with TF-IDF features. The MLP also performed very well with TF-IDF.
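As a rough illustration of the SVM with TF-IDF combination mentioned above, here is a minimal sketch using scikit-learn. It is not the project's actual notebook code: the four example sentences, the `ngram_range` and `sublinear_tf` settings, and the choice of `LinearSVC` as the SVM variant are all assumptions made for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy placeholder texts (made up for this sketch, NOT the scraped speeches).
texts = [
    "we must expand health care and protect workers",
    "health care for workers and families is a right",
    "we must cut taxes and shrink the federal government",
    "lower taxes and a smaller government create growth",
]
labels = ["Democratic", "Democratic", "Republican", "Republican"]

# TF-IDF over unigrams and bigrams, followed by a linear SVM.
# The hyperparameters here are illustrative assumptions, not tuned values.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True),
    LinearSVC(),
)
model.fit(texts, labels)

# Predict on the same toy texts only to show the call shape;
# a real evaluation would use a held-out test split.
predictions = model.predict(texts)
print(list(predictions))
```

Wrapping the vectorizer and classifier in one pipeline keeps the TF-IDF vocabulary fitted on training data only, which matters once cross-validation or a train/test split is introduced.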

Structure

All temporary files are in the data folder. The helpers folder contains Python modules with utilities for processing the data and for building and evaluating the models. All the code for data extraction, exploratory data analysis (EDA), text preprocessing, and model building and evaluation is in the notebooks folder.

These are the steps that were followed to complete the project. Each step corresponds to one notebook.

1. a) Web scraping from https://millercenter.org/
1. b) Web scraping from https://www.americanrhetoric.com/
2. Text pre-processing
3. Exploratory data analysis
4. Vectorization and classification models
5. Neural networks
6. Deep learning models

Repository: josemmontoro / nlp_political_speeches