Machine learning is an application of artificial intelligence (AI) that gives systems the ability to learn and improve from experience automatically, without being explicitly programmed.
In this workshop, you will learn fundamental machine learning concepts, including how to build features for a classification task, how to build a text classification system that predicts whether a sentence belongs to one category ("news") or another ("romance"), and how to prepare data for machine learning using pandas, a Python package that helps organize your data. You will also learn how to use the scikit-learn package for Python and how to evaluate the results of the analysis.
In this workshop, you will learn the following skills:
- How to use skills from the NLTK workshop to build features for a classification task
- How to build a text classification system that can predict whether sentences belong to one category ("news") or another ("romance")
- How to group data and perform calculations on the aggregations
- How to prepare data for machine learning using pandas, a package for Python that helps to organize your data
- How to use the scikit-learn package for Python to perform different types of machine learning on the data
- How to evaluate the results of machine learning algorithms
- How to visualize observations, aggregations, and algorithmic results
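To make the first few skills above concrete, here is a minimal sketch of what "building features" and "performing calculations on aggregations" can look like. It is an illustration only: the workshop itself uses NLTK, pandas, and scikit-learn, while this sketch deliberately uses only the Python standard library. The sentences, the `PRONOUNS` set, and the function names `extract_features` and `mean_features_by_category` are all hypothetical and are not taken from the workshop materials.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy sentences, invented for illustration;
# the workshop works with its own corpus.
SENTENCES = [
    ("The committee approved the budget on Tuesday.", "news"),
    ("Officials reported a rise in exports this quarter.", "news"),
    ("She smiled and took his hand.", "romance"),
    ("He whispered that he loved her.", "romance"),
]

# Assumed, hand-picked word list used as one candidate feature.
PRONOUNS = {"he", "she", "his", "her", "him", "hers"}


def extract_features(sentence):
    """Turn one non-empty sentence into a dictionary of numeric features."""
    # Naive whitespace tokenization; a real pipeline would use a tokenizer.
    words = [w.strip(".,!?\"'").lower() for w in sentence.split()]
    words = [w for w in words if w]
    return {
        "num_words": len(words),
        "avg_word_length": mean(len(w) for w in words),
        "num_pronouns": sum(w in PRONOUNS for w in words),
    }


def mean_features_by_category(labeled_sentences):
    """Group feature values by category, then average each feature."""
    grouped = defaultdict(lambda: defaultdict(list))
    for sentence, category in labeled_sentences:
        for name, value in extract_features(sentence).items():
            grouped[category][name].append(value)
    return {
        category: {name: mean(values) for name, values in feats.items()}
        for category, feats in grouped.items()
    }


averages = mean_features_by_category(SENTENCES)
for category, feats in averages.items():
    print(category, feats)
```

If a feature's average differs noticeably between the two categories, as the pronoun count does in this toy data, a classifier may be able to use it to tell "news" sentences from "romance" sentences. With only four invented sentences, this shows the mechanics and nothing more.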
Introduction
Installation
What Is Classification?
Getting Our Data
Features
Visualization
Supervised Machine Learning
Supervised Classification
Unsupervised Machine Learning
Feature Extraction Using Bag of Words
Topic Modeling with Latent Dirichlet Allocation
Appendix: Visualizations
Resources
Session leaders: Rachel Rakov and Hannah Aizenman
Based on previous work by: Rachel Rakov and Hannah Aizenman
Digital Research Institute (DRI) Curriculum by Graduate Center Digital Initiatives is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Based on a work at https://github.com/DHRI-Curriculum. When sharing this material or derivative works, preserve this paragraph, changing only the title of the derivative work, or provide comparable attribution.