dscoding

Examples and materials for the Python module of the course Coding for Data Science and Data Management

Python programming

Coding for Data Science and Data Management

Lecture organization

The course syllabus is organized around 4 main case studies. For each case study, we implement solutions, preferably from scratch, in order to introduce all the main Python topics that are part of the course program.

Prerequisites

The syllabus and the lectures presuppose knowledge of the programming topics covered in the crash course on coding. Students who find these topics difficult are encouraged to complete the Python tutorial (https://docs.python.org/3/tutorial/) as an initial introduction to the language.

Textbook

The course is not associated with a specific textbook, but a good introduction to Python for Data Science and Machine Learning can be found in the following book:

Géron, A. (2018). Hands-On Machine Learning with Scikit-Learn and TensorFlow. O’Reilly Media Inc., USA.

No slides are used during the lectures: materials are limited to the blackboard, Jupyter notebooks and Python code examples.

Python code

The code developed during the lectures, as well as other materials provided by the lecturers, will be available in the GitHub dscoding repository of this module at https://github.com/afflint/dscoding.

Case Study 1

Word generation

We learn how to implement a model that generates words resembling those of several different languages. We start from a naive random model and progressively improve its quality by exploiting a real, language-specific text dataset with Markov chains.
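A minimal sketch of the two models described above, using only the standard library: a naive baseline that draws characters uniformly at random, and an order-`n` character-level Markov chain trained on a word list. The function names, the `^`/`$` boundary markers and the tiny toy corpus are hypothetical choices for illustration, not the course's actual code or dataset.

```python
import random
from collections import Counter, defaultdict

# Hypothetical boundary markers; assumed never to occur inside corpus words.
START, END = "^", "$"


def random_word(alphabet, length, rng=random):
    """Naive baseline: each character is drawn uniformly and independently."""
    return "".join(rng.choice(alphabet) for _ in range(length))


def train_markov(words, order=2):
    """Count, for every context of `order` characters, which character follows it."""
    transitions = defaultdict(Counter)
    for word in words:
        padded = START * order + word + END
        for i in range(len(padded) - order):
            transitions[padded[i:i + order]][padded[i + order]] += 1
    return transitions


def generate_word(transitions, order=2, max_len=20, rng=random):
    """Sample one character at a time, weighted by the training counts, until END."""
    context = START * order
    word = []
    while len(word) < max_len:
        counts = transitions[context]
        nxt = rng.choices(list(counts), weights=list(counts.values()))[0]
        if nxt == END:
            break
        word.append(nxt)
        context = context[1:] + nxt  # slide the context window (requires order >= 1)
    return "".join(word)


# Toy corpus for illustration only; the course uses a real language-specific dataset.
corpus = ["banana", "bandana", "cabana", "panama"]
rng = random.Random(0)
model = train_markov(corpus, order=2)
print(random_word("abcdmnp", 6, rng))
print([generate_word(model, order=2, rng=rng) for _ in range(5)])
```

Raising `order` makes the generated words look more like the training words, at the cost of fewer possible transitions from each context; a real corpus gives far more varied output than this toy list.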

Lectures

Sept 23 - Sept 30

Python know-how

Advanced use of Python dictionaries
Introduction to pandas
Introduction to nltk
Reading and writing files, including CSV
Advanced use of numpy
Introduction to object oriented programming

Case Study 2

Clustering with KMeans and data visualization

We implement the KMeans clustering algorithm from scratch and apply it to several datasets, both 2D and multidimensional. We then compare our implementation with the KMeans implementation of scikit-learn. Finally, we see how to visualize the clusters as well as the dynamics of the KMeans algorithm.
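A minimal NumPy sketch of KMeans from scratch (Lloyd's algorithm: alternate an assignment step and a centroid-update step), assuming the data is a float array of shape `(n_points, n_features)`. The initialization here is a farthest-first traversal, a simple stand-in for the randomized k-means++ scheme that scikit-learn uses by default; the helper names and the synthetic blobs are hypothetical.

```python
import numpy as np


def init_farthest_first(X, k, rng):
    """Pick one random point, then repeatedly the point farthest from the chosen ones."""
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # squared distance from each point to its nearest already-chosen centroid
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    return np.array(centroids, dtype=float)


def assign(X, centroids):
    """Assignment step: index of the nearest centroid (squared Euclidean distance)."""
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and update until centroids stop moving."""
    rng = np.random.default_rng(seed)
    centroids = init_farthest_first(X, k, rng)
    for _ in range(n_iter):
        labels = assign(X, centroids)
        # Update step: each centroid moves to the mean of its points (kept if its cluster is empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break
    return centroids, assign(X, centroids)


# Two synthetic, well-separated 2D blobs (illustrative data only).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.5, (50, 2)), rng.normal(5.0, 0.5, (50, 2))])
centroids, labels = kmeans(X, k=2)
print(centroids)
```

For the comparison with scikit-learn, `sklearn.cluster.KMeans(n_clusters=2, n_init=10).fit(X)` exposes `cluster_centers_` and `labels_`; cluster indices may be permuted relative to this sketch, so compare the partitions rather than the raw label values.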

Lectures

Oct 7 - Oct 14

Python know-how

Advanced use of pandas for cleaning and preparing the datasets
Advanced use of numpy
Introduction to scikit-learn for PCA, clustering and evaluation metrics
Introduction to matplotlib and overview of other plot libraries (e.g., plotly, bokeh)
More on object oriented programming

Case Study 3

Linear regression and gradient descent

We implement linear regression and gradient descent from scratch. Both are compared with standard library implementations and tested on several datasets.
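A minimal sketch, assuming mean squared error as the loss and plain batch gradient descent, with NumPy's `np.linalg.lstsq` as the reference solution to compare against. The helper names, the learning rate `lr` and the iteration count are illustrative, untuned choices; features on very different scales would usually need standardizing first.

```python
import numpy as np


def add_bias(X):
    """Prepend a column of ones so that w[0] plays the role of the intercept."""
    return np.hstack([np.ones((X.shape[0], 1)), X])


def fit_gradient_descent(X, y, lr=0.1, n_iter=5000):
    """Minimise MSE(w) = (1/n) * ||Xb @ w - y||^2 with batch gradient descent."""
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        residuals = Xb @ w - y
        grad = (2.0 / len(y)) * (Xb.T @ residuals)  # gradient of the MSE with respect to w
        w -= lr * grad
    return w


def fit_least_squares(X, y):
    """Reference solution from NumPy's least-squares solver, for comparison."""
    w, *_ = np.linalg.lstsq(add_bias(X), y, rcond=None)
    return w


# Synthetic data from y = 3 + 2*x1 - 1*x2 + noise (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 2))
y = 3.0 + X @ np.array([2.0, -1.0]) + rng.normal(0.0, 0.1, size=200)

print("gradient descent:", fit_gradient_descent(X, y))
print("least squares:   ", fit_least_squares(X, y))
```

scikit-learn's `LinearRegression` (attributes `intercept_` and `coef_`) is another natural reference for the same comparison.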

Lectures

Oct 28 - Nov 4

Python know-how

Advanced use of pandas, numpy and scikit-learn
Pipelines, feature selection and model selection in scikit-learn

Case Study 4

Simulation environment

We implement a complex simulation environment that requires careful software design and advanced skills in object oriented programming.

Lectures

Nov 11 - Nov 18 - Nov 25 - Dec 2

Python know-how

Introduction to software design
Advanced object oriented programming
DB interaction with SQLAlchemy and pymongo
Introduction to networkx
Introduction to graphical and interactive data apps with streamlit
Virtual environments using venv
