Special Interest Group for Natural Language Learning (SIGNLL) Projects

To learn more about SIGNLL and see the projects for yourself, visit our website here

SIGNLL is an organization committed to the learning and exploration of various Natural Language Processing and Machine Learning topics. As one of the many subcommittees of the UIUC chapter of ACM, we have taught various lessons and walked through different Python projects to give a taste of this subfield of computer science. This repository contains the code and materials for the Fall 2020 semester of SIGNLL.

Getting Started
Projects
Website Info
Authors / Contact

Getting Started

To get the different projects running on your machine, follow the steps below

Installing and Opening Notebooks

First, run the following command to get the files on your machine

git clone https://github.com/nbalepur/SIGNLL-Fall-2020.git

The recommended IDE for these projects is Jupyter Notebook. My preferred installation method of Jupyter Notebook is with Anaconda

Instructions on how to install Anaconda can be found here

Once Anaconda is installed, simply open up the application and type "jupyter notebook" into the terminal. From there, you should be able to navigate to your desired project file

Additional Installations

Some projects require additional libraries to be installed, such as Keras and Tensorflow in the Neural Networks project

This can be done using Anaconda Navigator, instructions for which can be found here

Running Jupyter Notebooks

An in-depth tutorial on how to run and navigate through Jupyter Notebooks can be found here

Projects

Throughout different semesters, we have worked on the following projects

Intro to Python and NumPy
Linear and Logistic Regression
Twitter Sentiment Analysis
Neural Networks
Chatbot Part 1
Chatbot Part 2
Text Summarization
Tries
Naive Bayes

Intro to Python and NumPy

This week, we go through a brief description of NLP and the other projects we covered throughout the semester. We then showcase a notebook demo with a beginner's introduction to Python and NumPy. We have also provided a notebook for extra practice with various NumPy functions and Python data structures. Finally, we give a brief demo of using what we've learned in a simple spam-detection model

Spam / Not Spam Demo

In this introductory demo, we take a look at how we can use the basics of Python to predict whether an email is spam
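As a rough illustration of the idea (not the notebook's actual model), a keyword-counting classifier can be written with nothing but basic Python. The keyword list, the threshold, and the function name predict_spam are made-up assumptions; the labels follow the usual convention of "spam" for spam and "ham" for non-spam.

```python
# Hypothetical keyword list; the real notebook may use different features.
SPAM_WORDS = {"free", "winner", "prize", "click", "urgent"}

def predict_spam(message, threshold=2):
    """Label a message 'spam' if it contains at least `threshold` spam keywords, else 'ham'."""
    # Lowercase and strip simple punctuation so "FREE!" matches "free"
    words = [w.strip(".,!?:;") for w in message.lower().split()]
    hits = sum(1 for w in words if w in SPAM_WORDS)
    return "spam" if hits >= threshold else "ham"

print(predict_spam("URGENT: click now to claim your FREE prize!"))  # spam
print(predict_spam("Are we still meeting for lunch tomorrow?"))     # ham
```

A real model would learn which words matter from labeled emails instead of relying on a hand-picked list.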

In the following screenshot, spam designates a spam email and ham is a non-spam email

Linear and Logistic Regression

This week, we discuss two of the most fundamental algorithms for NLP: linear and logistic regression. We give an overview of the theory behind regression, an explanation of data collection and model validation, and a preview of how these algorithms can be applied using tweet predictions

Linear Regression Demo

If you navigate to the SIGNLL Website under Linear and Logistic Regression and Linear Regression Demo, you can try the following demo for yourself!

You will be taken to an interactive plot where you can add points by clicking on the plot, and delete an existing point by clicking on it. Once you are satisfied, you can press Fit Line to run the algorithm. Finally, you can vary the degree of the model fit and the equation for the line will update dynamically. Pressing Clear will remove all points, lines, and equations on the screen
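The fitting step behind a demo like this can be sketched with NumPy's np.polyfit, which performs a least-squares polynomial fit. This is only an assumption about how such a fit could be done, not the website's actual code; the sample points and the helper name fit_polynomial are invented for illustration.

```python
import numpy as np

# Made-up sample points standing in for the clicks on the plot
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0])  # these lie exactly on y = 2x + 1

def fit_polynomial(x, y, degree):
    """Return least-squares polynomial coefficients, highest power first."""
    return np.polyfit(x, y, degree)

slope, intercept = fit_polynomial(x, y, degree=1)
print(f"y = {slope:.2f}x + {intercept:.2f}")
```

Raising degree to 2 or 3 fits a curve instead of a line, which mirrors varying the degree of the model in the demo.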

Twitter Sentiment Analysis

This week, we take what we learned last week and apply logistic regression to predicting the sentiment of tweets. We'll discuss our general algorithm for sentiment analysis and apply this algorithm in the notebook to determine whether a tweet is positive or negative

Custom Tweet Demo

If you navigate to the SIGNLL Website under Sentiment Analysis and Custom Tweet Demo, you can try the following demo for yourself!

In this demo, you can type in your own custom tweet and watch the computer predict whether your tweet is positive or negative. You'll be able to view your custom tweet, followed by visualizations of its overall sentiment probability and individual word breakdown
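A minimal sketch of how logistic regression can produce both an overall probability and a per-word breakdown is shown below. The word weights and bias are hypothetical values chosen by hand; a trained model would learn them from labeled tweets, and the notebook's feature extraction may differ.

```python
import math

# Hypothetical learned weights; positive values push toward "positive" sentiment.
WEIGHTS = {"love": 2.0, "great": 1.5, "happy": 1.2,
           "hate": -2.0, "awful": -1.8, "sad": -1.2}
BIAS = 0.0

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_sentiment(tweet):
    """Return (probability the tweet is positive, weight of each distinct known word)."""
    words = [w.strip(".,!?").lower() for w in tweet.split()]
    breakdown = {w: WEIGHTS[w] for w in words if w in WEIGHTS}
    score = BIAS + sum(WEIGHTS.get(w, 0.0) for w in words)
    return sigmoid(score), breakdown

prob, breakdown = predict_sentiment("I love this great day!")
print(round(prob, 3), breakdown)
```

A probability above 0.5 is read as positive and below 0.5 as negative; words missing from the weight table contribute nothing to the score.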

Neural Networks

This week, we go over another fundamental concept for NLP: neural networks. We begin by taking a look at the mathematical and statistical theory behind neural networks and the principles of training and testing. Afterwards, we create a neural network from scratch using NumPy and use it to predict truth tables and handwritten digits with the MNIST dataset

Note: To run this project in Python, you must have Keras installed. Instructions on how to do this can be found above
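To give a feel for the training loop, here is a simplified sketch that trains a single sigmoid neuron, the smallest building block of a network, on the AND truth table using NumPy only. It is not the project's network: the learning rate, iteration count, and choice of the AND gate are assumptions, and a gate such as XOR would need a hidden layer.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# AND truth table: each row of X is an input pair, y holds the expected outputs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1], dtype=float)

weights = np.zeros(2)
bias = 0.0
learning_rate = 1.0  # assumed value, chosen for this toy problem

for _ in range(5000):
    predictions = sigmoid(X @ weights + bias)
    # Gradient of the cross-entropy loss with respect to the pre-activation
    error = predictions - y
    weights -= learning_rate * (X.T @ error) / len(y)
    bias -= learning_rate * error.mean()

# Rounded predictions for the four input rows
print(np.round(sigmoid(X @ weights + bias)))
```

The same loop structure (forward pass, error, gradient step) carries over to multi-layer networks, where the gradient is propagated backwards through each layer.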

Handwritten Digits Demo

If you navigate to the SIGNLL Website under Neural Networks and Handwritten Digits Demo, you can try the following demo for yourself!

Simply draw your number on the canvas and press Predict to view the probability distribution of the model. You can also press Clear to clear your drawing and predict again

Chatbot Part 1

We begin our exploration of chatbots this week by learning what chatbots are used for, the intuition behind how they work, and a simple bag-of-words algorithm we can use to accomplish our task. Afterwards, we use our knowledge of neural networks to train a customer support chatbot to predict a certain tag depending on the user input

Note: There are two versions of this notebook: one that uses NumPy and one that uses Keras, but they function in the exact same way
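The bag-of-words idea can be sketched in a few lines: build a vocabulary from the training sentences, then turn any message into a vector marking which vocabulary words it contains. The training sentences and function names below are invented for illustration, and the notebook may tokenize or stem words differently.

```python
def build_vocabulary(sentences):
    """Sorted list of unique lowercase words across all sentences."""
    return sorted({word for s in sentences for word in s.lower().split()})

def bag_of_words(sentence, vocabulary):
    """Binary vector: 1 if the vocabulary word occurs in the sentence, else 0."""
    words = set(sentence.lower().split())
    return [1 if word in words else 0 for word in vocabulary]

# Made-up training sentences
training = ["hello there", "thanks a lot", "where are you located"]
vocab = build_vocabulary(training)

print(vocab)
# ['a', 'are', 'hello', 'located', 'lot', 'thanks', 'there', 'where', 'you']
print(bag_of_words("hello where are you", vocab))
# [0, 1, 1, 0, 0, 0, 0, 1, 1]
```

The resulting fixed-length vector is what gets fed into the neural network, which is why word order is ignored by this representation.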

Tag Prediction Demo

If you navigate to the SIGNLL Website under Chatbot Part 1 and Tag Prediction Demo, you can try the following demo for yourself!

Type a custom message and you will be able to see which of the following tags the chatbot associates with your message:

contact
deals
directions
fact
goodbye
greeting
options
recommendation
thanks

Chatbot Part 2

This week, we review our chatbot algorithm, learn how we can make multi-class predictions, and analyze the output layer of an activated neural network. We'll then use what we learned this week and last week to use our pre-trained chatbot to make predictions and converse with a user

Note: There are two versions of this notebook: one that uses NumPy and one that uses Keras, but they function in the exact same way
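Multi-class prediction can be sketched as follows, assuming the output layer uses a softmax activation (a common choice, though not confirmed to be what the notebooks use). The raw scores are hypothetical; the tags are the ones listed for the Tag Prediction Demo.

```python
import numpy as np

TAGS = ["contact", "deals", "directions", "fact", "goodbye",
        "greeting", "options", "recommendation", "thanks"]

def softmax(logits):
    """Convert raw output-layer scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

# Hypothetical raw scores from the output layer for one user message
logits = np.array([0.1, 0.3, 0.2, 0.0, 0.5, 3.0, 0.4, 0.1, 0.2])
probabilities = softmax(logits)

# The predicted tag is the one with the highest probability
predicted_tag = TAGS[int(np.argmax(probabilities))]
print(predicted_tag)  # greeting
```

Once a tag is predicted, a chatbot can reply with a response associated with that tag.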

Chatbot Demo

If you navigate to the SIGNLL Website under Chatbot Part 2 and Chatbot Demo, you can try the following demo for yourself!

In this demo, you'll be able to interact with the chatbot that we built. This chatbot was created to be a customer support chatbot for Taco Bell. Type the message in the input field and press Send to have your customer support needs fulfilled!

Text Summarization

This week, we take a look at different types of summarization, a simple algorithm for summarizing text, and how to solve certain problems that arise from our algorithm. We then apply these concepts by taking an arbitrary Wikipedia page and picking the most representative sentences to form a coherent summary
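One simple extractive approach, sketched here under my own assumptions rather than taken from the notebook, scores each sentence by the average frequency of its words in the whole text and keeps the top-scoring sentences. Averaging is one possible way to keep long sentences from winning automatically; the sample text is made up.

```python
import re
from collections import Counter

def summarize(text, num_sentences=1):
    """Pick the sentences whose words are most frequent in the text overall."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    word_counts = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        words = re.findall(r"[a-z']+", sentence.lower())
        # Average word frequency so long sentences are not automatically favored
        return sum(word_counts[w] for w in words) / len(words) if words else 0.0

    top = sorted(sentences, key=score, reverse=True)[:num_sentences]
    # Return the chosen sentences in their original order
    return " ".join(s for s in sentences if s in top)

text = "Cats sleep a lot. Cats chase mice and cats climb trees. Dogs bark."
print(summarize(text))  # Cats chase mice and cats climb trees.
```

In practice, very common words such as "the" or "and" are usually filtered out before counting so they do not dominate the scores.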

Summary Visualization - Animation

If you navigate to the SIGNLL Website under Text Summarization and Summary Visualization, you can press Play Animation to try the following demo for yourself!

After pressing the Play Animation button, you'll be able to visualize how the algorithm works

Summary Visualization - Comparison

If you navigate to the SIGNLL Website under Text Summarization and Summary Visualization, you can press Compare Summary to try the following demo for yourself!

After pressing Compare Summary you can see a side-by-side comparison of the weights of the original text and the summary. You can toggle the switch at the top of the screen to switch between word weights and summary weights

Tries

This week, we take a look at the Trie data structure. We discuss the theory behind Tries, their benefits and drawbacks, and applications where the data structure would be useful. To test these ideas, we then analyze the efficiency of different data structures for storing a large amount of text, as well as some useful applications of the Trie data structure

Word Unscrambler Demo

If you navigate to the SIGNLL Website under Tries and Word Unscrambler, you can try the following demo for yourself!

In this demo you can take a look at a very useful application of Tries: recursively unscrambling letters to create valid words. Type some letters in the input field to see the speed at which Tries allow us to do this. After all the words are generated, you can use the slider to subset by word length
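A minimal sketch of recursive unscrambling with a Trie is shown below. The class and function names and the tiny dictionary are assumptions for illustration and are not taken from the notebook. The key point is that the search abandons any letter ordering as soon as it stops being a prefix of a real word.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # maps a letter to the next TrieNode
        self.is_word = False  # True if the path to this node spells a word

class Trie:
    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for letter in word:
            node = node.children.setdefault(letter, TrieNode())
        node.is_word = True

def unscramble(trie, letters):
    """Return every word in the trie that can be built from a subset of `letters`."""
    found = set()

    def explore(node, remaining, prefix):
        if node.is_word:
            found.add(prefix)
        # Only follow letters that continue a valid prefix in the trie
        for letter in set(remaining):
            child = node.children.get(letter)
            if child is not None:
                rest = list(remaining)
                rest.remove(letter)  # use up one copy of this letter
                explore(child, rest, prefix + letter)

    explore(trie.root, list(letters), "")
    return sorted(found)

# Tiny made-up dictionary
trie = Trie(["tea", "eat", "ate", "at", "tee", "tan"])
print(unscramble(trie, "tae"))  # ['at', 'ate', 'eat', 'tea']
```

Without the Trie, every permutation of every subset of the letters would have to be checked against the dictionary, which grows factorially with the number of letters.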

Naive Bayes

This week, we give an introduction to probability, statistics, and Bayes' theorem, and then apply what we learned to the de-anonymization of tweets. By combining statistics and NLP, we are able to create a lightweight model to predict whether a tweet was written by Kanye West or Joe Biden
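The idea can be sketched with a small Naive Bayes classifier built from word counts. This is an illustrative assumption about the approach, not the notebook's code: the authors are placeholders, the training tweets are invented, and add-one (Laplace) smoothing is my choice for handling unseen words.

```python
import math
from collections import Counter

def train(tweets_by_author):
    """Count words per author; returns (word counts, tweet counts, vocabulary)."""
    word_counts = {a: Counter(w for t in tweets for w in t.lower().split())
                   for a, tweets in tweets_by_author.items()}
    tweet_counts = {a: len(tweets) for a, tweets in tweets_by_author.items()}
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, tweet_counts, vocabulary

def predict(tweet, word_counts, tweet_counts, vocabulary):
    """Return the author with the highest log posterior probability."""
    total_tweets = sum(tweet_counts.values())
    scores = {}
    for author, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(tweet_counts[author] / total_tweets)  # log prior
        for word in tweet.lower().split():
            # Add-one smoothing so an unseen word does not zero out the product
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        scores[author] = score
    return max(scores, key=scores.get)

# Invented placeholder data, not real tweets
data = {
    "author_a": ["music album tour", "new album soon"],
    "author_b": ["vote for jobs", "jobs and healthcare plan"],
}
model = train(data)
print(predict("new album", *model))  # author_a
print(predict("jobs plan", *model))  # author_b
```

Working with log probabilities avoids numerical underflow when many small word probabilities are multiplied together.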

Tweet Author Demo

If you navigate to the SIGNLL Website under Naive Bayes and Tweet Author Demo, you can try the following demo for yourself!

Simply type your message in the input field and see if the computer predicts the tweet was more likely to be tweeted by Kanye West or Joe Biden. You'll be able to see a fake tweet for the predicted author, as well as visualizations for the author probability distribution and individual word author breakdown

Website Info

This website was created using React and relies heavily on the Bootstrap component library

To run the website locally, navigate to SIGNLL-Fall-2020/website and run the following command:

npm start

This will deploy the website locally in your browser

Authors / Contact

All of the code for this repository was written by me, Nishant Balepur. If you have any questions or concerns, feel free to reach out!
