- Install the latest version of R, or at least version 3.5 (https://cloud.r-project.org)
- Install the latest RStudio Desktop, or at least version 1.1 (https://www.rstudio.com/products/rstudio/download/)
- Install the latest Git, or at least version 2.7
- On Windows, also install Git Bash
During this course we will use a few packages that do not come with base R. I have therefore put a chunk of code in requirements.txt that installs all the necessary packages for you. All you have to do is open R and copy and paste the contents of the file into the console.
Note: the file contains a long list, and installing all of these packages might take some time. Be patient.
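If you are curious what an install chunk like this can look like, here is a minimal sketch. It is not the content of requirements.txt: the package names (`ggplot2`, `xgboost`) are placeholders taken from the course outline, and the helper name `missing_packages` is made up for illustration. The idea is to install only what is not already on your machine, so re-running the chunk is cheap.

```r
# Hypothetical package list for illustration only;
# the real list lives in requirements.txt.
required_packages <- c("ggplot2", "xgboost")

# Return the packages in `pkgs` that are not yet installed locally.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_packages(required_packages)

# Install only what is missing. The interactive() guard keeps a
# non-interactive run (e.g. via Rscript) from starting a long download;
# remove it if you want to run this as a script.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install, repos = "https://cloud.r-project.org")
}
```

After the install finishes, calling `missing_packages(required_packages)` again should return an empty character vector if everything went through.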
- Introduction to Data Science – W1S1
- Motivation to learn Data Science
- Definition of Data Science
- Types of Data
- Ranks of Data
- Tasks of Data Scientists
- CRISP-DM – W1S1
- Business (Organizational) Understanding
- Data Understanding
- Data Preparation (Course Focus)
- Modeling (Course Focus)
- Evaluation (Course Focus)
- Deployment
- Data Scientist ToolBox – W1S2
- Git & GitHub Practical Application
- Initializing repos
- Understanding the life cycle of files in Git
- Viewing History
- Undoing things in Git
- Creating Branches
- Merging Branches & Resolving Conflicts
- Adding Remote Repos
- Pulling and Pushing files to remotes
- Tracking Branches
- R Programming Foundations – W1S3
- R Basics
- Basic Math
- Variables
- Data Types
- Vectors
- Calling Functions
- Function Documentation
- Missing Data
- Pipes
- R Beyond Basics
- Other Data Types (data.frames, lists, matrices, arrays)
- Writing Functions
- Control Statements & Loops
- Reading Data into R
- CSV, Excel
- R Binary data
- Graphing in R
- Base Graphs
- ggplot2
- R Basics
- RStudio IDE workspace
- Viewing data
- Creating Projects
- Modifying your global options
- Creating Documents with R (Optional)
- Git & GitHub Practical Application
- Introduction to Statistics
- Measures of Central Tendency
- Measures of Variability
- Probability Distributions
- Confidence Intervals
- Data Preprocessing
- Missing Data
- Outliers
- Inconsistent Data
- Incorrect Data
- Supervised Models
- Linear Regression
- Logistic Regression
- Elastic Nets
- Decision Trees
- XGBoost classifiers
- Random Forest (brief)
- Unsupervised Models
- KNN models
- Hierarchical Clustering
- K-means (Homework)
- Model Evaluation
- Handling imbalanced classes
- K-Fold Cross-Validation
- Confusion Matrix