Notes and resources for Brad Flaugher's Data-Focused Programming Bootcamp
- (Preferred, but not required) Install Ubuntu Linux on your PC, if you have one: see the Install Guide and the additional notes for dual-booting with Windows. NOTE: if you have a Mac this will be almost impossible, but that's OK, just install Docker (see below).
- (Required) Install Docker Desktop Docker.com
- (Required) Read Command Line for Beginners
- (Required) Read Chapters 1-6 of Automate The Boring Stuff with Python
- (Recommended) Learn vim via a fun game, via exercises, or by reading documentation
- (Recommended) Join the bootcampers LinkedIn Group
- Note 1: Lectures are a small part of the course; most of each bootcamper's time will be spent working on their final portfolio project.
- Note 2: The 6-week course is broken into numeric and alphabetical lectures. Lectures 1-6 are technical in nature; Lectures A-E cover soft skills and history.
- Definitions: Data Scientist, Data Engineer, Data Analyst (What do we spend time doing?)
- Definitions: Machine Learning and Artificial Intelligence
- History: What kind of ML is used today? How much of this book is practically useless?
- Neural Networks: Babies and Vision
- Neural Networks: Single Cell Neural Network aka Regression in Excel
- Neural Networks: Name and Height "Regression"
- Neural Networks: When will GPT-3 "insights" become stale? Is this learning? Is this engineering? Is this science?
- Neural Networks: Correlating words and images
- Neural Networks: Why only study NNs for now? NNs are Decision Trees, NNs vs SVMs
- Neural Networks: Playing Pong with real neurons
- Final Project Intro: Huggable Model and Google Play Virus Model
- VIDEO: Oleh's Car Price Predictor and Source Code
- VIDEO: Fall 2022 Bootcampers Presentation (WARNING: LARGE FILE) and Hanna's Source Code
- Help Brad with FOSS Models for Medusa
- Free captioned images from the web, LAION
- The entire web, scraped for you, Common Crawl via comcrawl
- More specialized data... Datahub and Awesome public datasets
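The "Single Cell Neural Network aka Regression" topic above can be sketched in a few lines of plain Python. This is a minimal illustration, not the lecture's own material: the function name `train_single_neuron`, the toy data, and the learning rate are all assumptions chosen for the example. It shows that one neuron with no activation function, trained by gradient descent on squared error, is just linear regression.

```python
def train_single_neuron(xs, ys, lr=0.05, epochs=5000):
    """Fit y ~ w * x + b with gradient descent on mean squared error.

    Hypothetical helper for illustration: a single neuron with one
    weight, one bias, and no activation function.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = (w * x + b) - y          # prediction error for one sample
            grad_w += 2.0 * err * x / n    # d(MSE)/dw
            grad_b += 2.0 * err / n        # d(MSE)/db
        w -= lr * grad_w                   # step against the gradient
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (made up for this sketch).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = train_single_neuron(xs, ys)
print(f"learned w={w:.4f}, b={b:.4f}")
```

The learned weight and bias should land very close to the slope and intercept used to generate the data, which is the same answer a spreadsheet's linear trendline would give.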
- Definitions: Unix, Linux, Command Line, DevOps, Programming Language
- History: Python and C Speed Test, SQL
- History: BERT, GPT-3, DALL-E, Stable Diffusion and self-driving cars.
- History: A historical perspective on technological adoption, is it fast or slow? Flavors of technological disruption. (Lateral thinking with withered technology, how many people can use spreadsheets, and Keynes quote)
- Impostor Syndrome: "10,000 Qualified data scientists" Can you trust your professor at Berkeley? Who are the ML Leads at big companies? Who are the IT consultants?
- Impostor Syndrome: What does MIT Say? A review of Managing Technical Professionals.
- 9 Reasons why you'll never be a data scientist
- Huge “foundation models” are turbo-charging AI progress
- Language Models: Past, Present, and Future
- Have Computers Made Us More Productive?
- Lateral Thinking With Withered Technology
- Keynes on Next-Day Delivery
- Definitions: docker, container, ephemeral, bash
- History: SQL, what it is and why it's important (PowerBI, Tableau, Athena, BigQuery)
- Docker: Command line usage, flags, interactive mode and bash
- Docker in the cloud: How to think about the cloud, Big Providers (AWS, GCP, Azure) and Small (Linode, Oracle, etc.)
- Aside: What are Kaggle and Colab?
- Demonstration: Create a GitHub project, spin up environment, run experiment, save Python file, commit changes.
- Practice: "Head of Data" interview question, how fast can you spin up an environment?
- Docker Documentation
- Command Line for Beginners
- Interactive SQL course @ Codecademy
- Cloud Computing Business Overview (Sep 2022)
- Docker on AWS, or the TLDR version
- Docker on GCP, or the TLDR version
- Docker on Azure
- Spotty
- Tensordock
- Definitions: Open Source, FOSS
- History: Linux, GNU and Free Software
- Aside: Cycling team analogy, Trek, Schwinn, Homemade Bike, #2 Kid with CNC machine vs old man with saw
- Aside: “A Generation Behind” - Is it true? Is it useful?
- Aside: Competition and cooperation in tech, story of Google, Apple and Microsoft and Open Source.
- Choosing Technologies: How to choose a technology and not stress about it. How to handle buy vs. build and this map
- Demonstration: Numbers are Data
- Demonstration: Text is Data
- Demonstration: Images are Data
- Discussion: Data Collection, ETL and "glue code"
- Preprocessing Notebook
- NLP in 5 Minutes with Tensorflow
- Tensorflow Image Classification Example
- Few-shot classification with SetFit...
- Fundamentals of Data Engineering, Chapters 1 and 3
- 6 Step ETL with Airflow
- Airflow vs Luigi vs Kubeflow etc..
- What the Consultants say: Modules and Team Interaction
- What the consultants say: Decision Making: Blending Models and Humans
- Discussion: Testing and Documentation
- Discussion: Common data gathering tricks
- Discussion: What to do when you get stuck with a horrible dataset
- Demonstration: UNIX as IDE
- Demonstration: Infrastructure as code
- Discussion: Which Libraries should I use?
- Discussion: What is the problem with GUIs?
- Definition: Accuracy, Precision, Recall, F1, AUC
- Discussion: Layer Types and Standard or Template Models
- Discussion: Where to start, how to adjust hyperparameters
- Discussion: How can you steal ideas?
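The "Accuracy, Precision, Recall, F1, AUC" definitions above can be made concrete with a short, dependency-free sketch. The helper names (`classification_metrics`, `auc`), the labels, the scores, and the 0.5 threshold are all made up for illustration; in practice a library such as scikit-learn would normally be used instead.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (0/1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def auc(y_true, scores):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs where the positive example has the
    higher score; ties count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and model scores for the example.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
roc_auc = auc(y_true, scores)
print(metrics, roc_auc)
```

Accuracy, precision, recall and F1 depend on the chosen threshold, while AUC is computed from the raw scores and summarizes ranking quality across all thresholds.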
- Definitions: AI Ethics Big 3: Explainability, Bias, and Privacy
- Discussion: Who should die? Self-driving trolley problems.
- Discussion: I can predict criminality, should I?
- Discussion: Are biased models useful? When?
- Google Researcher Says She Was Fired Over Paper Highlighting Bias in A.I.
- Tesla’s ‘phantom braking’ problem is getting worse, and the US government has questions
- A.I. Is Not Sentient. Why Do People Say It Is?
- The Long Road to Driverless Trucks
- Stuck on the Streets of San Francisco in a Driverless Car
- Demonstration: Tensorflow Lite, Tensorflow Serving
- Discussion: Predict is easy, train is hard (computationally)
- Demonstration: Docker + Flask
- Discussion: DevOps vs MLOps, what is special? What is the same?
- Practical MLOps Chapters 1-4
- Create an MLOps Pipeline with github and docker in minutes
- AI Template (github)
Bootcampers will spend a tremendous amount of time working on final projects that are targeted to each bootcamper's career goals. For an example final presentation see Oleh's Video (YouTube) and Oleh's Repository (GitHub).