usfocuses

NLP Analysis of the Congressional Record and Presidential Speeches

Business Understanding

The Congressional Record and the presidential speeches contain a large amount of information about major events in the history of the United States. The sheer volume of data makes it difficult to look up information relevant to a particular topic in a particular period, or to identify how specific topics have trended over the years. With web scraping, TF-IDF (term frequency–inverse document frequency), and a search algorithm, it is possible to find the most relevant and important words for a given topic or set of keywords. This would make it easier for historians to find information on particular topics and periods, and for teachers to prepare history classes. The main goal of this project is to address these problems.

Data Understanding

The Congressional Record, published daily when Congress is in session, is the official record of the proceedings and debates of the United States Congress.

Data Preparation

TF-IDF, 2-grams (bigrams), bag of words, and part-of-speech tagging are among the techniques for text processing.
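
As a rough illustration of how bag of words, 2-grams, and TF-IDF fit together, here is a minimal sketch in plain Python. It is not the project's own code: the function names (`tokenize`, `with_bigrams`, `tfidf`, `top_terms`), the toy documents, and the unsmoothed idf formula log(N / df) are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def with_bigrams(tokens):
    """Return the single words plus every adjacent word pair (2-gram)."""
    return tokens + [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def tfidf(docs):
    """Return one {term: weight} dict per document.

    tf  = term count / number of terms in the document
    idf = log(N / df), where df is the number of documents containing the term
    """
    bags = [Counter(with_bigrams(tokenize(d))) for d in docs]  # bag of words
    n_docs = len(bags)
    df = Counter(term for bag in bags for term in bag)  # document frequency
    weights = []
    for bag in bags:
        total = sum(bag.values())
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in bag.items()}
        )
    return weights


def top_terms(doc_weights, k=3):
    """Return the k highest-weighted terms, ties broken alphabetically."""
    ranked = sorted(doc_weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Made-up toy documents, not taken from the Congressional Record.
docs = [
    "The Senate debated the tax bill and the tax rate.",
    "The President spoke about the war and the peace treaty.",
    "The House passed the budget and the tax bill.",
]
weights = tfidf(docs)
for i, w in enumerate(weights):
    print(i, top_terms(w))
```

A word such as "the" that occurs in every document gets an idf of zero, so it drops out of the ranking, while terms specific to one document rise to the top.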

Modeling

Multiclass classification models such as MultinomialNB, logistic regression, and random forest are applied.
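
To make the first of these models concrete, the following is a small from-scratch sketch of multinomial naive Bayes over word counts with add-one (Laplace) smoothing. It is an illustration only, not the project's implementation: the class name `MultinomialNaiveBayes`, the topic labels, and the training sentences are invented for the example, and a real pipeline would more likely rely on a library implementation.

```python
import math
from collections import Counter, defaultdict


class MultinomialNaiveBayes:
    """Multinomial naive Bayes over word counts with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # word counts per class
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            # Denominator of the smoothed word likelihood for this class.
            total = sum(counts.values()) + len(self.vocab)
            score = math.log(n / n_docs)  # log prior
            for word in doc.lower().split():
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / total)
            scores[label] = score
        return max(scores, key=scores.get)


# Hypothetical toy training set with two topic labels.
train_docs = [
    "tax budget deficit spending",
    "budget tax revenue",
    "troops war treaty",
    "war army troops defense",
]
train_labels = ["economy", "economy", "defense", "defense"]

model = MultinomialNaiveBayes().fit(train_docs, train_labels)
print(model.predict("tax revenue budget"))
print(model.predict("war troops"))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the smoothing keeps a word unseen in one class from forcing that class's probability to zero.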

Evaluation

One way to check the model is to compare the topic trends it produces over time against the historical record, since the trend of a given topic is already known from history. For the multiclass classification problem, plotting the confusion matrix is a good way to see how well the model classifies each label.
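
The confusion matrix itself is simple to compute. The sketch below builds one in plain Python and prints it as text rather than plotting it. The labels and predictions are made up for illustration, and the helper names `confusion_matrix` and `per_class_recall` are assumptions for this example, not functions from the project.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    pairs = Counter(zip(y_true, y_pred))
    return [[pairs[(t, p)] for p in labels] for t in labels]


def per_class_recall(matrix, labels):
    """Fraction of each true class that was predicted correctly."""
    return {
        label: (row[i] / sum(row) if sum(row) else 0.0)
        for i, (label, row) in enumerate(zip(labels, matrix))
    }


# Hypothetical labels and predictions for three topics.
labels = ["economy", "defense", "health"]
y_true = ["economy", "economy", "defense", "defense", "health", "health"]
y_pred = ["economy", "defense", "defense", "defense", "health", "economy"]

matrix = confusion_matrix(y_true, y_pred, labels)
recall = per_class_recall(matrix, labels)
accuracy = sum(matrix[i][i] for i in range(len(labels))) / len(y_true)

# Print the matrix as a small text table.
print(" " * 10 + "".join(f"{name:>10}" for name in labels))
for name, row in zip(labels, matrix):
    print(f"{name:>10}" + "".join(f"{n:>10}" for n in row))
print("recall:", recall)
print("accuracy:", round(accuracy, 3))
```

Off-diagonal cells show which labels get confused with each other, which a single accuracy number hides.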

Deployment

Build a web app that lets users search for the information they want and view topic trends.

Contributors

chenzhih03
