This repository contains course materials for BIOS 611 (Introduction to Data Science), typically taught during the fall semester at UNC Chapel Hill in the Department of Biostatistics.
The intent of the course is to provide an intensive introduction to the technical material and skills that a data scientist needs in order to do repeatable, reliable research.
It covers basic Linux tools like bash and make, Docker, and git (extensively), and serves as an introduction to R and Python, including how to organize a research project and an R or Python library.
Along the way we will become informally familiar with some analytical techniques: classification, regression and clustering. The emphasis here is practical: how to use the methods while avoiding common pitfalls.
Class meets from 3:35 pm to 4:50 pm on Mondays and Wednesdays. There is a lab session from 2:00 pm to 3:00 pm on Tuesdays.
Class is held in McGavran-Greenberg PH-Rm 2308. Lab is held in McGavran-Greenberg PH-Rm 2306.
Date | Time | Subject | Reading |
---|---|---|---|
Monday 2022-08-15 | 3:35-4:50pm | Introduction | 1,2 |
Tuesday 2022-08-16 | 2:00-3:00pm | Lab | |
Wednesday 2022-08-17 | 3:35-4:50pm | Compute Resources | 1,2,3 |
Monday 2022-08-22 | 3:35-4:50pm | Unix | 1,2,3,4 |
Tuesday 2022-08-23 | 2:00-3:00pm | Lab | |
Wednesday 2022-08-24 | 3:35-4:50pm | Docker | 1,2 |
Monday 2022-08-29 | 3:35-4:50pm | git basics & github basics | 1,2,3,4 |
Tuesday 2022-08-30 | 2:00-3:00pm | Lab | |
Wednesday 2022-08-31 | 3:35-4:50pm | How to Think about Programming & R | 1,2,3,4,5,6,7,8,9,10 |
Monday 2022-09-05 | No Class | Labor Day | |
Tuesday 2022-09-06 | No Class | 🥰 🥰 Well-being Day | |
Wednesday 2022-09-07 | 3:35-4:50pm | More R | |
Monday 2022-09-12 | 3:35-4:50pm | Tidyverse for Tidying & GGPlot | 1,2,3,4,5,6,7,8,9 |
Tuesday 2022-09-13 | 2:00-3:00pm | Lab | |
Wednesday 2022-09-14 | 3:35-4:50pm | Make and Makefiles | |
Monday 2022-09-19 | 3:35-4:50pm | git concepts and practices | |
Tuesday 2022-09-20 | 2:00-3:00pm | Lab | |
Wednesday 2022-09-21 | 3:35-4:50pm | Markdown, RMarkdown, Notebooks, Latex | |
Monday 2022-09-26 | No Class | 🥰 🥰 Well-being Day | |
Tuesday 2022-09-27 | 2:00-3:00pm | Lab | |
Wednesday 2022-09-28 | 3:35-4:50pm | Project Organization | |
Monday 2022-10-03 | 3:35-4:50pm | Dimensionality Reduction | |
Tuesday 2022-10-04 | 2:00-3:00pm | Lab | |
Wednesday 2022-10-05 | 3:35-4:50pm | Clustering | |
Monday 2022-10-10 | 3:35-4:50pm | Classification | |
Tuesday 2022-10-11 | 2:00-3:00pm | Lab | |
Wednesday 2022-10-12 | No Class | University Day | |
Monday 2022-10-17 | 3:35-4:50pm | Model Validation and Selection | |
Tuesday 2022-10-18 | 2:00-3:00pm | Lab | |
Wednesday 2022-10-19 | 3:35-4:50pm | Shiny | |
Monday 2022-10-24 | 3:35-4:50pm | Introduction to Scientific Python | |
Tuesday 2022-10-25 | 2:00-3:00pm | Lab | |
Wednesday 2022-10-26 | 3:35-4:50pm | SQL (and pandas, dplyr) | |
Monday 2022-10-31 | 3:35-4:50pm | Pandas & SQL | |
Tuesday 2022-11-01 | 2:00-3:00pm | Lab | |
Wednesday 2022-11-02 | 3:35-4:50pm | SKLearn Introduction | |
Monday 2022-11-07 | 3:35-4:50pm | Training Neural Networks | |
Tuesday 2022-11-08 | 2:00-3:00pm | Lab | |
Wednesday 2022-11-09 | 3:35-4:50pm | Bokeh | |
Monday 2022-11-14 | 3:35-4:50pm | Browser Based Visualization w/ d3 | |
Tuesday 2022-11-15 | 2:00-3:00pm | Lab | |
Wednesday 2022-11-16 | 3:35-4:50pm | Data Science Ethics | |
Monday 2022-11-21 | 3:35-4:50pm | Panel Discussion | |
Tuesday 2022-11-22 | 2:00-3:00pm | Lab | |
Wednesday 2022-11-23 | No Class | 🦃 🦃 Thanksgiving | |
Monday 2022-11-28 | 3:35-4:50pm | Web Scraping | |
Tuesday 2022-11-29 | 2:00-3:00pm | Lab | |
Wednesday 2022-11-30 | 3:35-4:50pm | Feedback Day | |
Monday 2022-12-05 | 3:35-4:50pm | Class Presentations I | |
Tuesday 2022-12-06 | 2:00-3:00pm | Lab | |
Wednesday 2022-12-07 | 3:35-4:50pm | Class Presentations II | |
Lab will generally be unstructured time in which you can work on projects and ask me questions. Sometimes we will use this time to cover material.
I provide a Docker container which you can use to hack on these lectures and the associated materials. Some lectures may have their own Docker container, but to work on most of them, run:
./start-env.sh
This will start an RStudio instance.
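If you are curious what a launcher like this might do, here is a minimal sketch. It is not the contents of `start-env.sh`. It assumes the public `rocker/rstudio` image, which serves RStudio Server on port 8787 and expects a `PASSWORD` environment variable. The image name, the `changeme` placeholder password, the `/home/rstudio/work` mount point, and the `RUN` switch are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch only; the repository's real start-env.sh may differ.
set -eu

IMAGE="${IMAGE:-rocker/rstudio}"   # assumed image, not necessarily the course image
PORT="${PORT:-8787}"               # host port to map to RStudio Server (8787 in the container)
PASSWORD="${PASSWORD:-changeme}"   # placeholder login password; choose your own

# Build the command in the positional parameters so that a working
# directory containing spaces is passed through as a single argument.
set -- docker run --rm \
  -p "${PORT}:8787" \
  -e "PASSWORD=${PASSWORD}" \
  -v "$(pwd):/home/rstudio/work" \
  "$IMAGE"

if [ "${RUN:-0}" = "1" ]; then
  # Replace this shell with the container process.
  exec "$@"
else
  # Dry run (the default): print the command instead of running it.
  echo "$*"
fi
```

Run with `RUN=1`, a script like this would serve RStudio at `http://localhost:8787`, where you log in as user `rstudio` with the password you chose. Without `RUN=1` it only prints the command, which is a convenient way to check the port and volume mapping first.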