
data-centric-ai-community / awesome-python-for-data-science

67 stars · 7 watchers · 14 forks · 54.78 MB

A curated list of awesome resources such as books, tutorials, courses, open-source libraries, exercises, and other materials that support Pythonistas in the making, and Pythonistas migrating into Data Science! 📊

Home Page: http://discord.com/invite/mw7xjJ7b7s

Languages: Jupyter Notebook 96.52%, HTML 3.48%, Python 0.01%, Dockerfile 0.01%
Topics: data-science, exercises, learn-to-code, learning-by-doing, learning-python, learning-resources, machine-learning, python, awesome-list, data-quality

Awesome Python for Data Science

The Data-Centric AI Community is the home of all things data 🐍

This repository was created by our community members to build a curated list of awesome resources such as books, tutorials, courses, open-source libraries, exercises, and other materials that support Pythonistas in the making, and Pythonistas migrating into Data Science!

🔨 Contributing to the Repo?

Check our CONTRIBUTING guide!

💫 You can also find us on our Discord server to meet other learners, find co-developers or mentors, and join small hands-on coding sessions!

🐍 Python Mastery

❓ Where to Start!

If you're serious about starting your journey as a Pythonista, then you need to start with the basics. As a first approach to the language, we suggest that you start with the book "How to Think Like a Computer Scientist: Learning with Python 3" and follow up with the exercises presented in "Python By Example: Learning to Program in 150 Challenges". All exercises in the latter book have solutions, so it could be a nice way for you to start practicing.

If you feel up to it, and to keep yourself in check, you can contribute your own exercises and solutions to this repository. Just make sure to follow the structure under python-mastery: add your exercise and a solution.py. If the exercise already exists and your solution differs from the one(s) presented, add it as a new numbered version (e.g. solution-03.py).

👩🏽‍🏫 Awesome Tutorials & Courses

📚 Awesome Books

😸 List of Repos

  • 30-Days-Of-Python - The 30 days of Python programming challenge is a step-by-step guide to learning the Python programming language in 30 days. That said, the challenge may take more than 100 days, so go at your own pace.
  • learn-python - Playground and cheatsheet for learning Python. A collection of Python scripts that are split by topics and contain code examples with explanations!
  • python-programming-exercises - 100 challenging Python programming exercises (with solutions!)

๐Ÿ‹๐Ÿฝโ€โ™€๏ธ Exercises

Please refer to this folder.

๐Ÿ›  Projects


📊 Python for Data Science

❓ Where to Start!

To learn data science, the CRISP-DM (Cross-Industry Standard Process for Data Mining) methodology is a good approach:

CRISP-DM methodology

  1. Business/Problem Understanding
  2. 🆕 Data Understanding: Check our EDA Projects in the Exercises section below! 🎉
  3. 🆕 Data Preparation: Follow the Tutorials below!
  4. Modelling
  5. Evaluation
  6. Deployment

📚 Awesome Books

🚧 WIP

😸 List of Repos

👩🏽‍🏫 Tutorials

Data Understanding

Data Preparation

Dealing with Missing Data

Data Transformation

💿 Datasets (for exploration)

🕵🏻 Exploratory Data Analysis

  1. Olympic 124 Years Dataset: Exploring a dataset of the Olympic Games

🫂 How to contribute?

  • Download the project and try to solve it at your own pace!
  • Ask as many questions as you like in our Discord channel #🐍ds-projects
  • Share your final project by creating a Pull Request!

🔗 Resources

👾 An Open Invitation

We are open to collaboration! If you want to start contributing, you only need to create a pull request with relevant resources 🚀 If you found these resources useful, please feel free to join our Discord server. We hope to say "Hi" on the other side! 👋

A special shoutout to all contributors who keep pushing the boundaries of Data Science!

Made with contrib.rocks.

Contributors

adamrossnelson, fabclmnt, jessehenson, miriamspsantos, vascoalramos


Issues

Classification Metrics in Machine Learning

Description

Develop a tutorial on common classification metrics in machine learning, such as accuracy, precision, recall, and F1-score. Explain when to use each metric and how to calculate them.

Acceptance Criteria

  • Submit a Jupyter notebook containing the tutorial and the necessary datasets, if needed
  • Modify the README.md file to include the new tutorial and a link to the added notebook
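As a starting point for this tutorial, here is a minimal, dependency-free sketch of the four metrics for a binary classifier. It assumes labels are encoded as `1` (positive) and `0` (negative), returns 0.0 whenever a denominator is zero, and uses made-up labels. In a notebook, `sklearn.metrics` provides `accuracy_score`, `precision_score`, `recall_score` and `f1_score`.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, true negatives and false negatives."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary problem."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0   # of predicted positives, how many are right
    recall = tp / (tp + fn) if tp + fn else 0.0      # of actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 1, 1, 0]
print(classification_metrics(y_true, y_pred))
```

For these labels, accuracy is 0.625, precision 0.6 and recall 0.75. On imbalanced data, precision, recall and F1 usually tell you more than accuracy alone.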

How to Perform Cross-Validation

Description

Create a tutorial for beginners that explains the concept of cross-validation in machine learning. Guide users through implementing k-fold cross-validation to assess model performance.

Acceptance Criteria

  • Submit a Jupyter notebook containing the tutorial and the necessary datasets, if needed
  • Modify the README.md file to include the new tutorial and a link to the added notebook
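The core of k-fold cross-validation can be sketched without any library:
1. Split the sample indices into k folds.
2. Train on k-1 folds and score on the held-out fold.
3. Repeat for every fold and average the scores.

The `fit`/`score` callables and the majority-class baseline below are hypothetical stand-ins for a real model. In practice, `sklearn.model_selection.KFold` and `cross_val_score` cover this.

```python
from collections import Counter


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, non-shuffled folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # The first n_samples % k folds get one extra sample, as in scikit-learn's KFold.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def cross_validate(X, y, fit, score, k=5):
    """Fit on k-1 folds, score on the held-out fold, and return the k scores."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores.append(score(model, [X[i] for i in test_idx], [y[i] for i in test_idx]))
    return scores


def fit_majority(X_train, y_train):
    # Hypothetical baseline "model": always predict the most common training label.
    return Counter(y_train).most_common(1)[0][0]


def accuracy(model, X_test, y_test):
    return sum(label == model for label in y_test) / len(y_test)


X = list(range(10))
y = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0]
scores = cross_validate(X, y, fit_majority, accuracy, k=5)
print(scores, sum(scores) / len(scores))
```

With 10 samples and `k=5`, each fold holds 2 samples. The baseline scores `[1.0, 0.5, 1.0, 0.5, 1.0]`, a mean of 0.8. If the data is sorted by label or time, shuffle it first or use a stratified split.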

Exploratory Data Analysis (EDA) with ydata-profiling

Description

Develop a tutorial that walks beginners through exploratory data analysis using the ydata-profiling library.

Acceptance Criteria

  • Submit a Jupyter notebook containing the tutorial and the necessary datasets, if needed
  • Modify the README.md file to include the new tutorial and a link to the added notebook

Data Analysis of Text based Dataset

Description

It's crucial to perform in-depth data analysis and visualization to gain insights, discover patterns, and make informed decisions. This issue focuses on conducting an in-depth analysis of the text data and creating visualizations that will aid our understanding.

Tasks

  • Data Exploration:

    • Perform initial data exploration to understand the structure and characteristics of the text dataset.
    • Identify key statistics, such as word count distributions, text length, and unique tokens.
  • Text Preprocessing:

    • Clean and preprocess the text data, including tasks like lowercasing, punctuation removal, and stopword removal.
    • Tokenize the text and create a vocabulary for further analysis.
  • Descriptive Analysis:

    • Calculate basic statistics, such as word frequency, to identify the most common terms in the dataset.
    • Visualize the distribution of word frequencies using appropriate charts (e.g., word clouds, bar charts).
  • Sentiment Analysis:

    • Perform sentiment analysis to gauge the overall sentiment of the text data.
    • Create sentiment score distributions and visualizations.
  • Topic Modeling:

    • Apply topic modeling techniques (e.g., LDA or NMF) to identify key topics within the text.
    • Visualize topic distributions and their evolution over time (if applicable).
  • Text Visualization:

    • Create informative visualizations to present the results of the analysis, such as word clouds, scatter plots, or heatmaps.
  • Insights and Findings:

    • Summarize the key insights and findings derived from the data analysis and visualizations.
  • Documentation:

    • Update the project documentation with the analysis methodology and findings.

Acceptance Criteria

  • Submit a Jupyter notebook containing the tutorial and the necessary datasets, if needed
  • Modify the README.md file to include the new tutorial and a link to the added notebook
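The exploration, preprocessing and word-frequency tasks above can be sketched with the standard library alone. The stopword list and the two example documents are illustrative assumptions. A real analysis would typically use a fuller stopword list (for example NLTK's) and add the sentiment and topic-modeling steps on top.

```python
import re
from collections import Counter

# A tiny illustrative stopword list (assumption); real projects use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "is", "of", "to", "in", "it", "this"}


def preprocess(text):
    """Lowercase, replace anything that is not a letter with a space, split, drop stopwords."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return [tok for tok in text.split() if tok not in STOPWORDS]


docs = [
    "The movie is great, and the cast is great!",
    "This plot is weak; the ending is weak too.",
]
lengths = [len(d.split()) for d in docs]               # raw word count per document
tokens = [preprocess(d) for d in docs]                 # cleaned token lists
vocab = sorted({t for doc in tokens for t in doc})     # unique tokens
freq = Counter(t for doc in tokens for t in doc)       # corpus-wide word frequencies
print(lengths, len(vocab), freq.most_common(3))
```

`freq.most_common(n)` gives the counts needed for a bar chart of the top terms. The same `Counter` can also feed a word-cloud library.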

Missing Data Imputation with Machine Learning Methods

Description

Create a Jupyter Notebook tutorial that demonstrates different machine learning methods for effectively handling missing data in datasets.

Acceptance Criteria

  • Submit a Jupyter notebook containing the tutorial and the necessary datasets, if needed
  • Modify the README.md file to include the new tutorial and a link to the added notebook
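One common machine-learning approach is k-nearest-neighbours (KNN) imputation: fill each missing value with the average of that feature among the most similar rows. The sketch below is simplified in two ways:
- Distance is measured only over the features observed in the incomplete row.
- Only fully observed rows are used as neighbours.

Missing values are represented as `None`, and the data is made up. `sklearn.impute.KNNImputer` is a production-ready implementation.

```python
import math


def knn_impute(rows, k=2):
    """Return a copy of `rows` with None entries filled by KNN imputation."""
    complete = [r for r in rows if None not in r]
    if not complete:
        raise ValueError("need at least one fully observed row")
    imputed = []
    for row in rows:
        if None not in row:
            imputed.append(list(row))
            continue
        observed = [j for j, v in enumerate(row) if v is not None]
        # Rank complete rows by Euclidean distance on the observed columns only.
        neighbours = sorted(
            complete,
            key=lambda c: math.sqrt(sum((row[j] - c[j]) ** 2 for j in observed)),
        )[:k]
        # Fill each gap with the neighbours' mean for that column.
        imputed.append([
            v if v is not None else sum(n[j] for n in neighbours) / len(neighbours)
            for j, v in enumerate(row)
        ])
    return imputed


data = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [10.0, 11.0, 12.0],
    [1.5, None, 3.5],   # missing middle feature
]
print(knn_impute(data, k=2))
```

The last row's two nearest complete rows are the first two, so its gap becomes (2.0 + 3.0) / 2 = 2.5. Because KNN is distance-based, features on very different scales should be scaled first.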

Handling Imbalanced Datasets with Undersampling

Description

Create a tutorial that introduces beginners to addressing imbalanced datasets by undersampling the majority class. Explain techniques such as random undersampling, among others.

Acceptance Criteria

  • Submit a Jupyter notebook containing the tutorial and the necessary datasets, if needed
  • Modify the README.md file to include the new tutorial and a link to the added notebook
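Random undersampling, the simplest technique mentioned above, can be sketched as follows. Group samples by class, then keep a random subset of each class equal in size to the smallest class. The toy data is made up. `imblearn.under_sampling.RandomUnderSampler` from the imbalanced-learn library offers the same idea with a scikit-learn-style API.

```python
import random
from collections import Counter, defaultdict


def random_undersample(X, y, seed=0):
    """Randomly drop samples from every class until all classes match the smallest one."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    n_min = min(len(samples) for samples in by_class.values())
    X_out, y_out = [], []
    for label, samples in by_class.items():
        for xi in rng.sample(samples, n_min):   # sample without replacement
            X_out.append(xi)
            y_out.append(label)
    return X_out, y_out


X = list(range(12))
y = [0] * 9 + [1] * 3
X_res, y_res = random_undersample(X, y)
print(Counter(y), Counter(y_res))
```

Undersampling throws data away, so it is usually applied only to the training split, never to the test set.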

Hyperparameter Tuning with Grid Search

Description

Develop a tutorial showcasing how to optimize machine learning models by tuning hyperparameters using grid search.

Acceptance Criteria

  • Submit a Jupyter notebook containing the tutorial and the necessary datasets, if needed
  • Modify the README.md file to include the new tutorial and a link to the added notebook
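Grid search evaluates every combination of candidate hyperparameter values and keeps the best-scoring one. The sketch below separates the search from the scoring. `score_fn` is a hypothetical callable that, in a real tutorial, would train a model with the given parameters and return its mean cross-validation score. The `toy_score` objective is made up so the example runs on its own. `sklearn.model_selection.GridSearchCV` combines both steps.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate every combination in param_grid; return (best_params, best_score).

    score_fn(params) must return a score where higher is better.
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    # Hypothetical stand-in for a validation score; it peaks at max_depth=4, learning_rate=0.1.
    return -(params["max_depth"] - 4) ** 2 - 10 * abs(params["learning_rate"] - 0.1)


grid = {"max_depth": [2, 4, 6], "learning_rate": [0.01, 0.1, 0.3]}
print(grid_search(grid, toy_score))
```

The number of evaluations is the product of the list lengths (9 here), so grids grow quickly. Randomized search is a common alternative for larger spaces.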

Introduction to Time Series Analysis

Description

Create a tutorial that introduces beginners to time series data analysis. Cover basic concepts and simple forecasting techniques.

Acceptance Criteria

  • Submit a Jupyter notebook containing the tutorial and the necessary datasets, if needed
  • Modify the README.md file to include the new tutorial and a link to the added notebook
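Two of the simplest forecasting baselines are sketched below:
- the naive forecast, which repeats the last value;
- the moving-average forecast.

Both are evaluated on a chronological train/test split with the mean absolute error (MAE). The series is made up for illustration. Libraries such as statsmodels offer more capable models, such as exponential smoothing and ARIMA.

```python
def naive_forecast(series, horizon):
    """Repeat the last observed value for every future step."""
    return [series[-1]] * horizon


def moving_average_forecast(series, window, horizon):
    """Forecast every future step as the mean of the last `window` observations."""
    if window > len(series):
        raise ValueError("window is longer than the series")
    level = sum(series[-window:]) / window
    return [level] * horizon


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


series = [10, 12, 13, 12, 15, 16, 18, 17, 19, 21, 20, 22]
# Time series must be split chronologically: train on the past, test on the future.
train, test = series[:-3], series[-3:]
for name, forecast in [
    ("naive", naive_forecast(train, 3)),
    ("moving average (3)", moving_average_forecast(train, 3, 3)),
]:
    print(name, forecast, mean_absolute_error(test, forecast))
```

On this upward-trending series, the naive forecast has an MAE of 2.0 and the 3-point moving average an MAE of 3.0. The average lags behind the trend, which is a natural lead-in to trend-aware models.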

How to Scale Numerical Data

Description

Create a Jupyter Notebook tutorial illustrating the importance of scaling numerical data for machine learning. Explore techniques like standardization and min-max scaling to preprocess and normalize numeric features.

Acceptance Criteria

  • Submit a Jupyter notebook containing the tutorial and the necessary datasets, if needed
  • Modify the README.md file to include the new tutorial and a link to the added notebook
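The two techniques named above reduce to one-line formulas:
- Min-max scaling: (x - min) / (max - min).
- Standardization (z-score): (x - mean) / std.

A stdlib sketch follows; it uses the population standard deviation, as `sklearn.preprocessing.StandardScaler` does. The income values are made up. `sklearn.preprocessing.MinMaxScaler` is the min-max counterpart.

```python
import statistics


def min_max_scale(values):
    """Rescale to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Z-score using the population standard deviation; a constant column maps to zeros."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


incomes = [20_000, 35_000, 50_000, 65_000, 80_000]
print(min_max_scale(incomes))   # [0.0, 0.25, 0.5, 0.75, 1.0]
print(standardize(incomes))     # mean 0, standard deviation 1
```

In a real pipeline, compute min/max or mean/std on the training split only and reuse them to transform the test split. This avoids leaking test information.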

How to Encode Categorical Data

Description

Create a Jupyter Notebook tutorial that guides beginners through encoding categorical data for machine learning tasks. Cover techniques such as one-hot encoding and others to convert categorical variables into numerical form.

Acceptance Criteria

  • Submit a Jupyter notebook containing the tutorial and the necessary datasets, if needed
  • Modify the README.md file to include the new tutorial and a link to the added notebook
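One-hot encoding turns each category into its own 0/1 column. A minimal sketch follows. Categories not seen during fitting become an all-zero row, which mirrors `sklearn.preprocessing.OneHotEncoder(handle_unknown="ignore")`. `pandas.get_dummies` is the quick pandas equivalent. The colour values are illustrative.

```python
def one_hot_encode(values, categories=None):
    """Return (categories, rows) with one 0/1 column per category.

    Pass the categories learned on training data to keep columns consistent;
    values outside `categories` become an all-zero row.
    """
    if categories is None:
        categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        if v in index:
            row[index[v]] = 1
        rows.append(row)
    return categories, rows


cats, encoded = one_hot_encode(["red", "green", "blue", "green"])
print(cats)     # ['blue', 'green', 'red']
print(encoded)  # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

For ordered categories (e.g. "low" < "medium" < "high"), an ordinal encoding that maps each level to an integer is often a better fit than one-hot.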

Clustering with K-Means

Description

Create a beginner-friendly tutorial introducing the concept of clustering using K-Means.

Acceptance Criteria

  • Submit a Jupyter notebook containing the tutorial and the necessary datasets, if needed
  • Modify the README.md file to include the new tutorial and a link to the added notebook
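K-Means (Lloyd's algorithm) alternates two steps until the centroids stop moving:
1. Assign every point to its nearest centroid.
2. Move each centroid to the mean of its assigned points.

A dependency-free sketch for small 2-D data follows. It initialises centroids with randomly chosen data points, whereas `sklearn.cluster.KMeans` defaults to the smarter k-means++ initialisation. The points are made up.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm on a list of coordinate tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)            # initialise with k distinct data points
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for every point.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid becomes the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])   # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break                                     # converged: centroids no longer move
        centroids = new_centroids
    return centroids, labels


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

K-Means needs `k` up front. A beginner tutorial can plot the within-cluster sum of squares for several `k` values (the "elbow" method) to choose it.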

How to Define Feature Importance

Description

Develop a Jupyter Notebook tutorial explaining the concept of feature importance in machine learning and showcasing some introductory methods.

Acceptance Criteria

  • Submit a Jupyter notebook containing the tutorial and the necessary datasets, if needed
  • Modify the README.md file to include the new tutorial and a link to the added notebook
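One introductory, model-agnostic method is permutation importance: shuffle one feature column, re-score the model, and measure how much the score drops. The sketch below makes two assumptions:
- The already-trained model is exposed as a `predict(row)` function. Here it is a hypothetical model that only uses feature 0.
- The metric is higher-is-better.

`sklearn.inspection.permutation_importance` is the scikit-learn equivalent.

```python
import random


def permutation_importance(predict, X, y, metric, n_repeats=5, seed=0):
    """Mean drop in `metric` when each feature column is shuffled (higher = more important)."""
    rng = random.Random(seed)
    baseline = metric(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)                      # break the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - metric(y, [predict(row) for row in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances


def neg_mse(y_true, y_pred):
    """Negative mean squared error, so that higher is better."""
    return -sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def predict(row):
    # Hypothetical trained model: depends on feature 0 only and ignores feature 1.
    return 3 * row[0]


X = [[float(i), float(i % 3)] for i in range(20)]
y = [3 * row[0] for row in X]
print(permutation_importance(predict, X, y, neg_mse))
```

Shuffling feature 1 leaves every prediction unchanged, so its importance is exactly 0.0, while feature 0 gets a large positive score. In practice the importances should be computed on held-out data.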

Simple Linear Regression in Machine Learning

Description

Create a Jupyter Notebook tutorial explaining the concept of simple linear regression and how to perform it in Python for basic predictive modeling.

Acceptance Criteria

  • Submit a Jupyter notebook containing the tutorial and the necessary datasets, if needed
  • Modify the README.md file to include the new tutorial and a link to the added notebook
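Simple linear regression fits y = b0 + b1 * x by ordinary least squares, which has a closed-form solution:
- b1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)²
- b0 = ȳ - b1 * x̄

A plain-Python sketch follows, together with R² as a goodness-of-fit measure. The hours/scores data is made up. `sklearn.linear_model.LinearRegression` gives the same coefficients.

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1


def r_squared(x, y, b0, b1):
    """Share of the variance in y explained by the fitted line."""
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]
b0, b1 = fit_simple_linear_regression(hours, scores)
print(b0, b1, r_squared(hours, scores, b0, b1))
print(b0 + b1 * 6)   # predicted score for 6 hours of study
```

For this data the slope is about 4.1, the intercept about 47.7, and R² about 0.989.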

Introduction to Classification with Scikit-Learn

Description

Develop a beginner-friendly tutorial on classification using Scikit-Learn. Explain the basics of classification algorithms and guide users through building their first classifier.

Acceptance Criteria

  • Submit a Jupyter notebook containing the tutorial and the necessary datasets, if needed
  • Modify the README.md file to include the new tutorial and a link to the added notebook

Handling Imbalanced Datasets with Oversampling

Description

Craft a tutorial for beginners on addressing imbalanced datasets by oversampling the minority class. Show techniques like random oversampling and synthetic oversampling using SMOTE, or others of your choice!

Acceptance Criteria

  • Submit a Jupyter notebook containing the tutorial and the necessary datasets, if needed
  • Modify the README.md file to include the new tutorial and a link to the added notebook
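Both techniques named above fit in a few lines:
- Random oversampling duplicates minority samples.
- SMOTE creates synthetic samples on the line segment between a minority sample and one of its k nearest minority neighbours.

The sketch below is a simplified SMOTE-style generator, not the full algorithm from the original paper, and the minority points are made up. The imbalanced-learn library provides `RandomOverSampler` and `SMOTE` in `imblearn.over_sampling`.

```python
import math
import random


def random_oversample(X_min, n_new, seed=0):
    """Random oversampling: duplicate minority samples chosen with replacement."""
    rng = random.Random(seed)
    return [rng.choice(X_min) for _ in range(n_new)]


def smote(X_min, n_new, k=3, seed=0):
    """SMOTE-style synthesis: x_new = x + u * (neighbour - x) with u ~ U(0, 1),
    where neighbour is one of the k nearest minority samples to x."""
    if len(X_min) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(X_min))
        x = X_min[i]
        others = [p for j, p in enumerate(X_min) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        u = rng.random()
        synthetic.append(tuple(xi + u * (ni - xi) for xi, ni in zip(x, nb)))
    return synthetic


minority = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.3), (1.1, 1.1)]
print(random_oversample(minority, 3))
print(smote(minority, 3, k=2))
```

Random oversampling repeats exact points, which can encourage overfitting; interpolation adds variety instead. Either way, oversample only the training split, after the train/test split.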

How to Make Distributions More Gaussian

Description

Develop a Jupyter Notebook tutorial on transforming non-Gaussian distributions into more Gaussian-like ones. Explore techniques such as log transformations, among others, to reduce skewness in the data.

Acceptance Criteria

  • Submit a Jupyter notebook containing the tutorial and the necessary datasets, if needed
  • Modify the README.md file to include the new tutorial and a link to the added notebook
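A quick way to check whether a transform helps is to compare skewness before and after applying it. The sketch below applies square-root and log transforms to a made-up, right-skewed, strictly positive sample. `math.log1p` would be needed if the data contained zeros. `scipy.stats.boxcox` and `sklearn.preprocessing.PowerTransformer` can choose the transform's strength automatically.

```python
import math
import statistics


def skewness(values):
    """Biased sample skewness: mean((x - mu)^3) / sigma^3 (0 for symmetric data)."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return statistics.fmean([(v - mu) ** 3 for v in values]) / sigma ** 3


# Right-skewed, strictly positive data (made-up incomes in thousands).
incomes = [22, 25, 27, 30, 31, 35, 38, 45, 60, 85, 140, 300]
sqrt_t = [math.sqrt(v) for v in incomes]
log_t = [math.log(v) for v in incomes]
print(skewness(incomes), skewness(sqrt_t), skewness(log_t))
```

For this sample, skewness shrinks as the transform gets stronger (raw > square root > log). A histogram or Q-Q plot of each version makes the effect visible.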

Data Augmentation with Synthetic Data using ydata-synthetic

Description

Develop a beginner-friendly tutorial on data augmentation using the ydata-synthetic library. Explain how to generate synthetic data to increase the size of the training dataset and help improve model performance.

Acceptance Criteria

  • Submit a Jupyter notebook containing the tutorial and the necessary datasets, if needed
  • Modify the README.md file to include the new tutorial and a link to the added notebook

Tutorial on Anomaly Detection

Description

Create a Jupyter Notebook tutorial on anomaly detection. Explain the concept and its applications, and show how to use it in machine learning projects.

Acceptance Criteria

  • Submit a Jupyter notebook containing the tutorial and the necessary datasets, if needed
  • Modify the README.md file to include the new tutorial and a link to the added notebook
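A simple statistical baseline for anomaly detection is the z-score rule: flag points whose distance from the mean exceeds a threshold number of standard deviations. The readings below are made up. This rule has a caveat with small samples. With the population standard deviation, no point among n values can have |z| larger than sqrt(n - 1), so a threshold of 3 can never fire when n ≤ 10. For multivariate data, `sklearn.ensemble.IsolationForest` is a common machine-learning alternative.

```python
import statistics


def zscore_anomalies(values, threshold=3.0):
    """Return indices whose |z-score| exceeds `threshold` (assumes roughly Gaussian data)."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return []                                  # constant data: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


readings = [10.1, 9.8, 10.0, 10.2, 9.9, 10.1, 10.0, 9.7, 10.3, 25.0]
print(zscore_anomalies(readings))                  # default threshold 3.0
print(zscore_anomalies(readings, threshold=2.5))
```

The spike at index 9 also inflates the standard deviation, so its z-score lands just under 3. Only the lower threshold flags it, which is a good motivation for robust or model-based detectors.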

Data Visualization with Matplotlib and Seaborn

Description

Craft a tutorial on data visualization using Matplotlib and Seaborn. Show beginners how to create various types of plots and charts to explore and present data.

Acceptance Criteria

  • Submit a Jupyter notebook containing the tutorial and the necessary datasets, if needed
  • Modify the README.md file to include the new tutorial and a link to the added notebook

Implement Outlier Detection Tutorial

Description:
We need to create a comprehensive tutorial on outlier detection techniques and their practical implementation for our data science community. Outliers can significantly impact our data analysis and machine learning models, and it's essential that our users are well-informed about how to handle them.

Tasks:

  • Research and gather information on commonly used outlier detection methods.
  • Create a step-by-step guide on how to apply these techniques using our dataset.
  • Include code examples and explanations for each method.
  • Provide real-world use cases and scenarios where outlier detection is crucial.
  • Add visualizations to help users better understand the impact of outliers.
  • Ensure the tutorial is beginner-friendly and suitable for all skill levels.
  • Proofread and edit the tutorial for clarity and accuracy.
  • Create a table of contents and structure the tutorial logically.
  • Include external references and resources for further learning.
  • Test the code examples and instructions to confirm their correctness.

Expected Outcome:
Once this issue is completed, we will have a well-documented and informative tutorial on outlier detection. This resource will help our community members gain a better understanding of how to handle outliers in their data science projects.

Acceptance Criteria

  • Submit a Jupyter notebook containing the tutorial and the necessary datasets, if needed
  • Modify the README.md file to include the new tutorial and a link to the added notebook
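One widely used outlier detection method is the interquartile-range rule (Tukey's fences): values below Q1 - 1.5·IQR or above Q3 + 1.5·IQR are flagged. A minimal sketch follows; the response-time data is made up. Quartiles depend on the interpolation method. For example, `pandas.Series.quantile` (linear interpolation by default) can give slightly different fences than `statistics.quantiles` used here.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Tukey's fences: flag values below Q1 - k*IQR or above Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)   # default method="exclusive"
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return outliers, (lower, upper)


response_times = [12, 13, 13, 14, 15, 15, 16, 17, 18, 45]
print(iqr_outliers(response_times))   # ([45], (6.625, 23.625))
```

Unlike the z-score rule, the fences are based on quartiles, so a single extreme value barely moves them.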

How to Transform Numerical to Categorical Data

Description

Craft a Jupyter Notebook tutorial explaining how to convert numerical data into categorical format. Illustrate use cases and methods for creating meaningful categories from continuous variables.

Acceptance Criteria

  • Submit a Jupyter notebook containing the tutorial and the necessary datasets, if needed
  • Modify the README.md file to include the new tutorial and a link to the added notebook
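Converting a numeric column into categories is usually done by binning. The sketch below uses `bisect` to map values to labels given a list of cut points, in two ways:
- Fixed, domain-driven edges. The age cut points are hypothetical.
- Quantile-based edges, which aim for roughly equal-sized bins.

`pandas.cut` and `pandas.qcut` do the same. Note that `pandas.cut` uses right-closed intervals by default, unlike the right-open bins here.

```python
import bisect
import statistics


def bin_values(values, edges, labels):
    """Map each value to a label using right-open bins: edges[i-1] <= v < edges[i]."""
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly one more label than edges")
    return [labels[bisect.bisect_right(edges, v)] for v in values]


ages = [3, 17, 18, 34, 65, 80]
# Fixed-width, domain-driven bins (hypothetical cut points): under 18, 18-64, 65+.
print(bin_values(ages, [18, 65], ["child", "adult", "senior"]))
# ['child', 'child', 'adult', 'adult', 'senior', 'senior']

# Quantile-based bins: cut at the quartiles so bins hold roughly equal counts.
quartile_edges = statistics.quantiles(ages, n=4)   # [13.5, 26.0, 68.75]
print(bin_values(ages, quartile_edges, ["Q1", "Q2", "Q3", "Q4"]))
# ['Q1', 'Q2', 'Q2', 'Q3', 'Q3', 'Q4']
```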

Implement Feature Engineering Tutorial

Description

Create a comprehensive tutorial on feature engineering to help both new and experienced team members understand and apply this crucial aspect of our data science work.

Tasks

  • Prepare an outline for the feature engineering tutorial, covering essential concepts and techniques.
  • Write a detailed introduction explaining the importance of feature engineering in our projects.
  • Provide clear examples of feature engineering methods used in our current project(s).
  • Include code snippets, demonstrations, and real-world use cases to illustrate the concepts.
  • Add references to external resources or research papers for further reading.
  • Include interactive code notebooks (e.g., Jupyter notebooks) that users can experiment with.
  • Add visuals, such as diagrams or charts, to aid in understanding.
  • Ensure the tutorial is well-structured, easy to follow, and suitable for both beginners and advanced team members.

Acceptance Criteria

  • Submit a Jupyter notebook containing the tutorial and the necessary datasets, if needed
  • Modify the README.md file to include the new tutorial and a link to the added notebook

How to Perform PCA Dimensionality Reduction

Description

Create a Jupyter Notebook tutorial that introduces Principal Component Analysis (PCA) for dimensionality reduction. Explain the concept of PCA and its applications, and show how to use it in machine learning projects.

Acceptance Criteria

  • Submit a Jupyter notebook containing the tutorial and the necessary datasets, if needed
  • Modify the README.md file to include the new tutorial and a link to the added notebook
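For two features, PCA can be computed by hand in three steps:
1. Center the data.
2. Build the 2x2 covariance matrix.
3. Take its eigenvectors (the principal components) and eigenvalues (the variance along each component).

The closed-form sketch below is limited to 2-D data with non-zero total variance, and the points are made up. For more dimensions, use `numpy.linalg.eigh` on the covariance matrix or `sklearn.decomposition.PCA`.

```python
import math
import statistics


def pca_2d(points):
    """PCA for 2-D points via the closed-form eigendecomposition of the covariance matrix.

    Returns ((pc1, pc2), explained_variance_ratio, projections onto pc1).
    """
    xs, ys = zip(*points)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    n = len(points)
    # Sample covariance matrix [[a, b], [b, c]] (divide by n - 1).
    a = sum((x - mx) ** 2 for x in xs) / (n - 1)
    c = sum((y - my) ** 2 for y in ys) / (n - 1)
    b = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Eigenvalues of a symmetric 2x2 matrix: half the trace +/- a radius.
    half_trace = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = half_trace + radius, half_trace - radius
    # Eigenvector for lam1; if b == 0 the coordinate axes are already the components.
    if b != 0:
        v1 = (lam1 - c, b)
    else:
        v1 = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(*v1)
    v1 = (v1[0] / norm, v1[1] / norm)
    v2 = (-v1[1], v1[0])                          # second component is orthogonal to the first
    ratio = (lam1 / (lam1 + lam2), lam2 / (lam1 + lam2))
    scores = [(x - mx) * v1[0] + (y - my) * v1[1] for x, y in points]
    return (v1, v2), ratio, scores


points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
components, ratio, scores = pca_2d(points)
print(components, ratio)
print(scores)   # 1-D representation of the data after dimensionality reduction
```

When the first explained-variance ratio is close to 1, as for these nearly collinear points, keeping only the first component loses very little information. The sign of each component is arbitrary.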
