gr4nada / data-scientist-roadmap

Data science jobs are becoming more and more popular. A collection of tutorials could easily complete this roadmap, helping anyone who wants to start learning data science.

data-scientist-roadmap

I just found this data science skills roadmap, drawn by GeeksforGeeks on their cool page.


[Image: data science skills roadmap]


A Roadmap to Learn

Mathematics

Math skills are essential because they help us understand the machine learning algorithms that play a central role in data science.

Part 1:

  • Linear Algebra
  • Analytic Geometry
  • Matrix
  • Vector Calculus
  • Optimization

Part 2:

  • Regression
  • Dimensionality Reduction
  • Density Estimation
  • Classification
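Several Part 1 and Part 2 topics meet in least-squares regression: linear algebra gives a closed-form solution, and optimization reaches the same answer iteratively. The sketch below is an illustration in plain Python (no NumPy, to stay self-contained); the helper names `solve`, `least_squares`, and `gradient_descent` and the learning rate are assumptions chosen for the example.

```python
# Least-squares regression two ways: an exact linear-algebra solve
# (normal equations) and iterative gradient descent on the same loss.

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def least_squares(X, y):
    """Closed-form weights from the normal equations X^T X w = X^T y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(a * b for a, b in zip(row, y)) for row in Xt]
    return solve(XtX, Xty)

def gradient_descent(X, y, lr=0.05, steps=2000):
    """Minimize the mean squared error (1/n) * ||X w - y||^2 step by step."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        residuals = [sum(wj * xj for wj, xj in zip(w, row)) - yi for row, yi in zip(X, y)]
        grad = [2 / n * sum(r * row[j] for r, row in zip(residuals, X)) for j in range(d)]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w
```

For example, with `X = [[1.0, x] for x in range(5)]` (a column of ones for the intercept) and `y = [1 + 2 * x for x in range(5)]`, both functions return weights close to `[1.0, 2.0]`. The closed form is exact but needs a matrix solve; gradient descent only needs gradients, which is why it scales to models with no closed form.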

Probability

Probability underpins statistics, and it is considered a prerequisite for mastering machine learning.

  • Introduction to Probability
  • One-Dimensional Random Variables
  • Functions of One Random Variable
  • Joint Probability Distributions
  • Discrete Distributions
      • Binomial (Python | R)
      • Bernoulli
      • Geometric, etc.
  • Continuous Distributions
      • Uniform
      • Exponential
      • Gamma
      • Normal Distribution (Python | R)
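The distributions listed above can be explored directly from their formulas. This is a minimal sketch using only the Python standard library (`math.comb` and `statistics.NormalDist`, both available since Python 3.8); the function names are illustrative, not from any particular package.

```python
import math
from statistics import NormalDist

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p): k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def bernoulli_pmf(k: int, p: float) -> float:
    """A Bernoulli(p) variable is a Binomial(1, p) variable."""
    return binomial_pmf(k, 1, p)

def geometric_pmf(k: int, p: float) -> float:
    """P(first success happens on trial k), for k = 1, 2, 3, ..."""
    return (1 - p) ** (k - 1) * p

def exponential_cdf(x: float, rate: float) -> float:
    """P(X <= x) for X ~ Exponential(rate)."""
    return 1 - math.exp(-rate * x) if x >= 0 else 0.0

if __name__ == "__main__":
    n, p = 10, 0.3
    pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
    print(sum(pmf))                                # ~1.0: probabilities sum to one
    print(sum(k * q for k, q in enumerate(pmf)))   # ~3.0, the mean n * p
    print(NormalDist(mu=0, sigma=1).cdf(1.96))     # ~0.975
```

Checking that a PMF sums to one and that its mean matches the textbook formula (`n * p` for the binomial) is a quick way to confirm you have a distribution's definition right.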

Statistics

Understanding statistics is essential because it is a core part of data analysis.

  • Introduction to Statistics
  • Data Description
  • Random Samples
  • Sampling Distribution
  • Parameter Estimation
  • Hypothesis Testing (Python | R)
  • ANOVA (Python | R)
  • Reliability Engineering
  • Stochastic Process
  • Computer Simulation
  • Design of Experiments
  • Simple Linear Regression
  • Correlation
  • Multiple Regression (Python | R)
  • Nonparametric Statistics
      • Sign Test
      • The Wilcoxon Signed-Rank Test (R)
      • The Wilcoxon Rank Sum Test
      • The Kruskal-Wallis Test (R)
  • Statistical Quality Control
  • Basics of Graphs
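Two items from the list, the sign test and correlation, are small enough to compute by hand. The sketch below assumes paired data given as a list of differences (e.g. "after minus before"); it uses the exact binomial null distribution for the sign test and the usual Pearson formula. Function names are illustrative.

```python
import math

def sign_test(differences):
    """Two-sided exact sign test (H0: the median paired difference is 0).

    Zero differences are dropped, as is conventional for this test.
    Returns (n_nonzero, n_positive, p_value).
    """
    nonzero = [d for d in differences if d != 0]
    n = len(nonzero)
    k = sum(1 for d in nonzero if d > 0)
    tail = min(k, n - k)
    # Under H0 each sign is + or - with probability 1/2, so the number of
    # positive signs is Binomial(n, 0.5); double the smaller tail.
    p = 2 * sum(math.comb(n, i) for i in range(tail + 1)) / 2**n
    return n, k, min(1.0, p)

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

For instance, eight paired measurements that all increased give `sign_test([1, 2, 3, 4, 5, 6, 7, 8])` a p-value of 2/256 = 0.0078125. The sign test only uses the direction of each difference, which is why it needs no normality assumption.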

Programming

You need a good grasp of programming concepts such as data structures and algorithms. The most commonly used languages are Python, R, Java, and Scala; C++ is also useful where performance is critical.

Python:

  • Python Basics
      • List
      • Set
      • Tuples
      • Dictionary
      • Function, etc.
  • NumPy
  • Pandas
  • Matplotlib/Seaborn, etc.
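The Python basics above (lists, sets, tuples, dictionaries, functions) already cover a lot of everyday data wrangling. The sketch below groups some hypothetical `(name, score)` records using only built-in types; for real tables you would normally reach for pandas instead.

```python
def summarize_scores(records):
    """Group (name, score) tuples by name and return {name: (min, max, mean)}."""
    by_name = {}                                    # dict: name -> list of scores
    for name, score in records:                     # each record is a tuple
        by_name.setdefault(name, []).append(score)  # list grows per name
    return {
        name: (min(scores), max(scores), sum(scores) / len(scores))
        for name, scores in by_name.items()
    }

def unique_names(records):
    """A set drops duplicates; sorted() turns it back into an ordered list."""
    return sorted({name for name, _ in records})

records = [("ana", 3), ("bo", 5), ("ana", 5)]
print(summarize_scores(records))  # {'ana': (3, 5, 4.0), 'bo': (5, 5, 5.0)}
print(unique_names(records))      # ['ana', 'bo']
```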

R:

  • R Basics
      • Vector
      • List
      • Data Frame
      • Matrix
      • Array
      • Function, etc.
  • dplyr
  • ggplot2
  • tidyr
  • Shiny, etc.

Database:

  • SQL
  • MongoDB

Other:

  • Data Structures
  • Time Complexity
  • Web Scraping (Python | R)
  • Linux
  • Git
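SQL is easiest to practice against a local database. This minimal sketch uses Python's built-in `sqlite3` module with an in-memory database; the `sales` table and its columns are hypothetical. It shows the typical create, insert, and `GROUP BY` aggregation workflow (MongoDB is not covered here).

```python
import sqlite3

def top_categories(rows):
    """Load (category, amount) rows into SQLite and aggregate per category."""
    conn = sqlite3.connect(":memory:")  # throwaway database that lives in RAM
    try:
        conn.execute("CREATE TABLE sales (category TEXT, amount REAL)")
        # Parameterized inserts ("?") avoid SQL injection and quoting bugs.
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        cur = conn.execute(
            "SELECT category, SUM(amount) AS total, COUNT(*) AS n "
            "FROM sales GROUP BY category ORDER BY total DESC"
        )
        return cur.fetchall()
    finally:
        conn.close()

rows = [("books", 12.5), ("games", 30.0), ("books", 7.5)]
print(top_categories(rows))  # [('games', 30.0, 1), ('books', 20.0, 2)]
```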

Machine Learning

Machine learning (ML) is one of the most vital parts of data science and one of the most active research areas, with new advances every year. At a minimum, you need to understand the basic supervised and unsupervised learning algorithms. Python and R both offer multiple libraries that implement them.

Introduction:

  • How Models Work
  • Basic Data Exploration
  • First ML Model
  • Model Validation
  • Underfitting & Overfitting
  • Random Forests (Python | R)
  • scikit-learn

Intermediate:

  • Handling Missing Values
  • Handling Categorical Variables
  • Pipelines
  • Cross-Validation (R)
  • XGBoost (Python | R)
  • Data Leakage
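Model validation, missing-value handling, pipelines, cross-validation, and data leakage all meet in one rule: fit every preprocessing step on the training fold only. The sketch below is a plain-Python illustration of that idea (not scikit-learn's API): k-fold splitting, mean imputation computed from the training fold, and a simple least-squares line as a stand-in model. All names are hypothetical.

```python
import random
from statistics import mean

def k_fold_indices(n, k, seed=0):
    """Yield (train_idx, val_idx) pairs; every index is validated exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train, folds[i]

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def cross_val_mae(xs, ys, k=5, seed=0):
    """Mean absolute error per fold.

    Missing x values (None) are imputed with the mean of the *training* fold
    only, so validation rows never leak into the preprocessing step.
    """
    scores = []
    for train, val in k_fold_indices(len(xs), k, seed):
        fill = mean(xs[i] for i in train if xs[i] is not None)

        def impute(i):
            return xs[i] if xs[i] is not None else fill

        slope, intercept = fit_line([impute(i) for i in train], [ys[i] for i in train])
        scores.append(mean(abs(ys[i] - (slope * impute(i) + intercept)) for i in val))
    return scores
```

Computing `fill` from all rows before splitting would look harmless, but it lets validation data influence training and makes the scores optimistic; this is exactly the leakage that pipeline abstractions are designed to prevent.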

Deep Learning

In deep learning, you learn to build and train neural networks, for example using TensorFlow and Keras on structured data.

  • Artificial Neural Network
  • Convolutional Neural Network
  • Recurrent Neural Network
  • TensorFlow
  • Keras
  • PyTorch
  • A Single Neuron
  • Deep Neural Network
  • Stochastic Gradient Descent
  • Overfitting and Underfitting
  • Dropout and Batch Normalization
  • Binary Classification
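Before using TensorFlow, Keras, or PyTorch, it helps to see what "a single neuron", "stochastic gradient descent", and "binary classification" mean mechanically. This is a from-scratch sketch in plain Python, not any framework's API: one sigmoid neuron (equivalent to logistic regression) trained with binary cross-entropy and per-example SGD. The learning rate and epoch count are arbitrary assumptions.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_neuron(data, lr=0.5, epochs=200, seed=0):
    """data: list of (features, label) pairs with label in {0, 1}.

    Trains one sigmoid neuron on binary cross-entropy with SGD,
    updating the weights after every single example.
    """
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    b = 0.0
    data = list(data)  # copy so shuffling does not mutate the caller's list
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of cross-entropy w.r.t. the pre-activation
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0
```

For example, training on `[([x], int(x > 0)) for x in (-3, -2, -1, -0.5, 0.5, 1, 2, 3)]` learns a positive weight that separates the two classes. A deep network stacks many such units with nonlinear activations; frameworks then compute the gradients automatically.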

Feature Engineering

Feature engineering teaches you some of the most effective ways to improve your models.

  • Baseline Model
  • Categorical Encodings
  • Feature Generation
  • Feature Selection
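Categorical encodings turn labels into numbers a model can use. The sketch below shows three common schemes in plain Python: one-hot, count, and smoothed target encoding. The function names and the smoothing formula (a weighted blend of the category mean and the global mean) are illustrative assumptions rather than a specific library's behavior.

```python
from collections import Counter

def one_hot(values):
    """Return (categories, rows) where each row has a single 1."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return categories, rows

def count_encode(values):
    """Replace each category with how often it appears."""
    counts = Counter(values)
    return [counts[v] for v in values]

def target_encode(values, targets, smoothing=1.0):
    """Replace each category with its mean target, shrunk toward the global mean.

    Fit this on training rows only; using validation targets here leaks labels.
    """
    global_mean = sum(targets) / len(targets)
    counts = Counter(values)
    sums = {}
    for v, t in zip(values, targets):
        sums[v] = sums.get(v, 0.0) + t
    encoding = {
        v: (sums[v] + smoothing * global_mean) / (counts[v] + smoothing)
        for v in counts
    }
    return [encoding[v] for v in values]
```

One-hot encoding is safe but widens the table for high-cardinality columns; count and target encoding stay one column wide, at the cost of needing care to avoid leakage.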

Natural Language Processing

In natural language processing (NLP), you distinguish yourself by learning to work with text data.

  • Text Classification
  • Word Vectors
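A simple way to see text classification and vector similarity together is a bag-of-words model. This plain-Python sketch turns texts into word-count vectors, compares them with cosine similarity, and labels a new text by its most similar example. Note this is a simplified stand-in: "word vectors" in practice usually means dense learned embeddings (such as word2vec or GloVe), not sparse counts. The tokenizer regex and function names are assumptions.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase, split into simple word tokens, and count them."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def classify_nearest(text, examples):
    """Return the label of the most similar (text, label) example."""
    query = bag_of_words(text)
    best = max(examples, key=lambda ex: cosine_similarity(query, bag_of_words(ex[0])))
    return best[1]

examples = [("great movie loved it", "pos"), ("terrible movie hated it", "neg")]
print(classify_nearest("loved this great film", examples))  # pos
```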

Data Visualization Tools

Learn to make great data visualizations, a rewarding way to see the power of coding!

  • Excel VBA
  • BI (Business Intelligence):
      • Tableau
      • Power BI
      • QlikView
      • Qlik Sense

Deployment

The last step is deployment. Whether you are a fresher or have 5+ or 10+ years of experience, deployment is necessary, because a deployed project is concrete evidence of the work you have done.

  • Microsoft Azure
  • Heroku
  • Google Cloud Platform
  • Flask
  • Django
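Deploying a model usually means wrapping it in an HTTP endpoint. Flask or Django would be the typical choice from the list above; to stay dependency-free, this sketch swaps in Python's built-in `http.server` instead. The `/predict` route, the JSON input format, and the linear "model" (`WEIGHTS`, `BIAS`) are all hypothetical placeholders for a real trained model.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical model: a fixed linear score. Replace with a real trained model.
WEIGHTS = {"x1": 0.8, "x2": -0.4}
BIAS = 0.1

def predict(features):
    return BIAS + sum(w * float(features.get(name, 0.0)) for name, w in WEIGHTS.items())

def handle_predict(body):
    """Turn a raw JSON request body into (status_code, JSON response bytes)."""
    try:
        features = json.loads(body)
        if not isinstance(features, dict):
            raise ValueError("expected a JSON object")
        return 200, json.dumps({"prediction": predict(features)}).encode()
    except (ValueError, TypeError) as exc:  # bad JSON or non-numeric values
        return 400, json.dumps({"error": str(exc)}).encode()

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_predict(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

def serve(host="127.0.0.1", port=8000):
    """Block and serve requests until interrupted."""
    HTTPServer((host, port), PredictHandler).serve_forever()
```

Calling `serve()` starts the server; a POST to `/predict` with `{"x1": 1, "x2": 1}` then returns a JSON prediction. Keeping the request logic in `handle_predict` makes it testable without a network, and the same function could be reused inside a Flask route or deployed on a platform such as Heroku, Azure, or Google Cloud.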

Other Points to Learn

  • Domain Knowledge
  • Communication Skills
  • Reinforcement Learning
  • Case Studies:
      • Data Science at Netflix
      • Data Science at Flipkart
      • Project on Credit Card Fraud Detection
      • Project on Movie Recommendation, etc.
