This project forked from afflint/dscoding

Examples and materials for the module on Python in the course in Coding for Data Science and Data Management

License: GNU General Public License v3.0

Python 0.63% Jupyter Notebook 99.37%

dscoding

Python programming

Coding for Data Science and Data Management

Lectures organization

The course syllabus is organized around 4 main case studies. For each case study, we will implement solutions, preferably from scratch, in order to introduce all the main Python topics that are part of the course program.

Prerequisites

The syllabus and the lectures assume knowledge of the programming topics covered in the crash course on coding. Those who have difficulty are advised to complete the Python tutorial (https://docs.python.org/3/tutorial/) as an initial introduction to the language.

Textbook

The course is not associated with a specific textbook, but a good introduction to Python for Data Science and Machine Learning can be found in the following book:

Géron, A. (2018). Hands-On Machine Learning with Scikit-Learn and TensorFlow. O’Reilly Media Inc., USA.

During the lectures no slides will be used. Materials are limited to blackboard, Jupyter notebooks and Python code examples.

Python code

The code developed during the lectures as well as other materials provided by the lecturers will be available on the GitHub dscoding repository of this module at https://github.com/afflint/dscoding.

Case Study 1

Word generation

We learn how to implement a model for generating words in several different languages. We start from a naive random model and progressively improve its quality by using Markov chains trained on a real, language-specific text dataset.
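As a rough illustration of the Markov chain idea, here is a minimal character-level sketch. It is not the code developed in the lectures: the function names `train` and `generate`, the boundary markers `^` and `$`, and the toy word list are all assumptions made for this example. The model counts which character follows each context of `order` characters, then samples new words from those counts.

```python
import random
from collections import defaultdict

START, END = "^", "$"  # hypothetical boundary markers, assumed absent from the words


def train(words, order=2):
    """Count which character follows each `order`-character context."""
    counts = defaultdict(lambda: defaultdict(int))
    for word in words:
        padded = START * order + word.lower() + END
        for i in range(len(padded) - order):
            context = padded[i:i + order]
            counts[context][padded[i + order]] += 1
    return counts


def generate(counts, order=2, rng=random, max_len=20):
    """Sample one word, character by character, from the learned counts."""
    context, out = START * order, []
    while len(out) < max_len:
        followers = counts[context]
        chars = list(followers)
        weights = [followers[c] for c in chars]
        nxt = rng.choices(chars, weights=weights)[0]
        if nxt == END:
            break
        out.append(nxt)
        context = (context + nxt)[-order:]  # slide the context window
    return "".join(out)


# Toy corpus for illustration only; a real run would use a language-specific dataset.
model = train(["banana", "bandana", "cabana"], order=2)
print(generate(model, order=2))
```

A larger `order` makes the generated words resemble the training data more closely, at the cost of less variety.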

Lectures

Sept 23 - Sept 30

Python know how

  • Advanced use of Python dictionaries
  • Introduction to pandas
  • Introduction to nltk
  • Read/write from files and csv
  • Advanced use of numpy
  • Introduction to object oriented programming

Case Study 2

Clustering with KMeans and data visualization

We implement the KMeans clustering algorithm from scratch, working on several datasets, both 2D and multidimensional. At the end, we compare our implementation with the KMeans implementation of scikit-learn. Furthermore, we see how to visualize the clusters as well as the dynamics of the KMeans algorithm.
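The two alternating steps of KMeans (assign each point to its nearest centroid, then move each centroid to the mean of its points) can be sketched as follows. This is only an illustrative version in plain Python with no dependencies, whereas the course materials rely on numpy. The function name `kmeans`, its parameters, and the choice to initialise centroids from randomly sampled data points are assumptions of this example.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm on a list of equal-length tuples.

    Returns (centroids, labels).
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(c) / len(members) for c in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # no centroid moved: converged
        centroids = new_centroids
    return centroids, labels


# Two well-separated toy groups, for illustration only.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(data, k=2)
print(centroids, labels)
```

Recording `centroids` at every iteration, instead of only returning the final ones, is one simple way to visualize the dynamics of the algorithm.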

Lectures

Oct 7 - Oct 14

Python know how

  • Advanced use of pandas for cleaning and preparing the datasets
  • Advanced use of numpy
  • Introduction to scikit-learn for PCA, clustering and evaluation metrics
  • Introduction to matplotlib and overview of other plot libraries (e.g., plotly, bokeh)
  • More on object oriented programming

Case Study 3

Linear regression and gradient descent

We implement linear regression and gradient descent from scratch. Both are compared with the implementations available in standard libraries and tested on several datasets.
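A minimal sketch of the idea for a single feature is given below, assuming a mean squared error loss and batch gradient descent. The function names, the learning rate, and the number of epochs are assumptions chosen for this toy example and are not taken from the course code. A closed-form least squares fit is included as a reference to compare against.

```python
def fit_linear_gd(xs, ys, lr=0.05, n_epochs=2000):
    """Fit y ~ w * x + b by batch gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(n_epochs):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of (1/n) * sum(error^2) with respect to w and b.
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def fit_linear_closed_form(xs, ys):
    """Ordinary least squares for one feature, used here as a reference."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    w = sxy / sxx
    return w, my - w * mx


# Noise-free toy data generated from y = 2x + 1, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
print(fit_linear_gd(xs, ys))
print(fit_linear_closed_form(xs, ys))
```

The learning rate has to be small enough for the updates to converge; on data with a larger feature scale, the value used here would likely need to be reduced or the features rescaled.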

Lectures

Oct 28 - Nov 4

Python know how

  • Advanced use of pandas, numpy and scikit-learn
  • Pipelines, feature selection and model selection in scikit-learn

Case Study 4

Simulation environment

We implement a complex simulation environment that requires careful software design and advanced skills in object oriented programming.

Lectures

Nov 11 - Nov 18 - Nov 25 - Dec 2

Python know how

  • Introduction to software design
  • Advanced object oriented programming
  • DB interaction with SQLAlchemy and pymongo
  • Introduction to networkx
  • Introduction to graphical and interactive data apps with streamlit
  • Virtual environments using venv

Contributors

afflint, sergiopicascia
