A curated list of awesome resources such as books, tutorials, courses, open-source libraries, exercises, and other materials that support Pythonistas in the making, and Pythonistas migrating into Data Science!
Craft a tutorial on data visualization using Matplotlib and Seaborn. Show beginners how to create various types of plots and charts to explore and present data.
Acceptance Criteria
Submit a Jupyter notebook containing the tutorial and the necessary datasets, if needed
Modify the README.md file to include the new tutorial and a link to the added notebook
Create a Jupyter Notebook tutorial illustrating the importance of scaling numerical data for machine learning. Explore techniques like standardization and min-max scaling to preprocess and normalize numeric features.
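As a rough illustration of the two techniques named above, the sketch below implements min-max scaling and standardization in plain Python. The feature values and function names are invented for this example; a notebook would more likely rely on a library such as scikit-learn (`MinMaxScaler`, `StandardScaler`).

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: nothing to rescale
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift to zero mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical features on very different scales.
ages = [18, 25, 40, 60]
incomes = [20_000, 35_000, 50_000, 120_000]

scaled_ages = min_max_scale(ages)
z_incomes = standardize(incomes)
```

In a real project the minimum, maximum, mean, and standard deviation should be computed on the training split only and then reused on the test split.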
Acceptance Criteria
Submit a Jupyter notebook containing the tutorial and the necessary datasets, if needed
Modify the README.md file to include the new tutorial and a link to the added notebook
Craft a Jupyter Notebook tutorial explaining how to convert numerical data into categorical format. Illustrate use cases and methods for creating meaningful categories from continuous variables.
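One possible way to turn a continuous variable into categories is fixed-edge binning. The sketch below assumes hypothetical age thresholds and labels; they are illustrative only and not taken from any dataset in this repository.

```python
from bisect import bisect_right


def bin_value(value, edges, labels):
    """Map a number to a label; bins are closed on the left, open on the right."""
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly one more label than edges")
    # bisect_right counts how many edges are <= value, which is the bin index.
    return labels[bisect_right(edges, value)]


# Hypothetical bin edges and category names.
age_edges = [18, 35, 65]
age_labels = ["minor", "young adult", "adult", "senior"]

ages = [12, 18, 34, 35, 70]
age_groups = [bin_value(a, age_edges, age_labels) for a in ages]
```

Fixed edges suit variables with domain-defined thresholds; equal-width or quantile-based bins are common alternatives when no such thresholds exist.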
Acceptance Criteria
Submit a Jupyter notebook containing the tutorial and the necessary datasets, if needed
Modify the README.md file to include the new tutorial and a link to the added notebook
Develop a Jupyter Notebook tutorial on transforming non-Gaussian distributions into more Gaussian-like ones. Explore techniques such as log transformations, among others, to bring the data closer to a normal distribution.
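To make the effect of a log transformation concrete, the sketch below compares the skewness of a small right-skewed sample before and after applying `log1p`. The sample values and the `skewness` helper are assumptions made for this example.

```python
import math
from statistics import mean, pstdev


def skewness(values):
    """Population skewness: the mean cubed z-score (0 for a symmetric sample)."""
    mu, sigma = mean(values), pstdev(values)
    return mean([((v - mu) / sigma) ** 3 for v in values])


# Hypothetical right-skewed sample with a long tail of large values.
incomes = [1, 2, 2, 3, 3, 3, 4, 5, 10, 50, 200]

# log1p(x) = log(1 + x), which stays defined when x is 0.
log_incomes = [math.log1p(v) for v in incomes]

raw_skew = skewness(incomes)
log_skew = skewness(log_incomes)
```

Here `log_skew` is smaller than `raw_skew`, meaning the transformed sample is less lopsided, although it is still not perfectly symmetric. A log transformation only applies to non-negative data.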
Acceptance Criteria
Submit a Jupyter notebook containing the tutorial and the necessary datasets, if needed
Modify the README.md file to include the new tutorial and a link to the added notebook
Develop a beginner-friendly tutorial on classification using Scikit-Learn. Explain the basics of classification algorithms and guide users through building their first classifier.
Acceptance Criteria
Submit a Jupyter notebook containing the tutorial and the necessary datasets, if needed
Modify the README.md file to include the new tutorial and a link to the added notebook
Create a Jupyter Notebook tutorial that guides beginners through encoding categorical data for machine learning tasks. Cover techniques such as one-hot encoding and others to convert categorical variables into numerical form.
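A minimal sketch of one-hot encoding, written in plain Python so the mechanics are visible. The `colors` column and the `one_hot_encode` helper are hypothetical; a notebook would probably use pandas or scikit-learn instead.

```python
def one_hot_encode(values):
    """Return (categories, rows) with one 0/1 column per distinct category."""
    categories = sorted(set(values))
    column_of = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[column_of[value]] = 1  # exactly one column is "hot" per row
        rows.append(row)
    return categories, rows


# Hypothetical categorical column.
colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)
```

With this input, `categories` is `['blue', 'green', 'red']` and the first row of `encoded` is `[0, 0, 1]`. Columns are sorted alphabetically so the layout does not depend on row order.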
Acceptance Criteria
Submit a Jupyter notebook containing the tutorial and the necessary datasets, if needed
Modify the README.md file to include the new tutorial and a link to the added notebook
Create a tutorial that introduces beginners to addressing imbalanced datasets by undersampling the majority class. Explain techniques such as random undersampling, among others.
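Random undersampling can be sketched in a few lines: group rows by class, then keep a random subset of each class equal in size to the smallest one. The toy data and the function name below are assumptions for illustration.

```python
import random
from collections import Counter, defaultdict


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows until every class is as small as the smallest class."""
    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    target = min(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        for row in rng.sample(group, target):  # sampling without replacement
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical imbalanced dataset: 8 samples of class 0, 2 of class 1.
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2

X_res, y_res = random_undersample(X, y)
class_counts = Counter(y_res)
```

Undersampling discards data, so it is usually applied to the training split only and is best suited to cases where the majority class is large enough to spare samples.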
Acceptance Criteria
Submit a Jupyter notebook containing the tutorial and the necessary datasets, if needed
Modify the README.md file to include the new tutorial and a link to the added notebook
Craft a tutorial for beginners on addressing imbalanced datasets by oversampling the minority class. Show techniques like random oversampling and synthetic oversampling using SMOTE, or others of your choice!
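The sketch below shows random oversampling only: minority rows are duplicated at random until every class matches the largest one. It does not implement SMOTE, which instead creates new points by interpolating between neighbouring minority samples. The toy data and function name are hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows until every class matches the largest class."""
    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Draw the missing rows with replacement from the existing ones.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical imbalanced dataset: 8 samples of class 0, 2 of class 1.
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2

X_res, y_res = random_oversample(X, y)
class_counts = Counter(y_res)
```

Because the added rows are exact copies, oversampling should happen after the train/test split; otherwise duplicates of a training row can leak into the test set.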
Acceptance Criteria
Submit a Jupyter notebook containing the tutorial and the necessary datasets, if needed
Modify the README.md file to include the new tutorial and a link to the added notebook
Create a Jupyter Notebook tutorial that introduces Principal Component Analysis (PCA) for dimensionality reduction. Explain the concept of PCA and its applications, and show how to use it in machine learning projects.
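To show what PCA computes, the sketch below finds the first principal component of a tiny dataset by power iteration on the covariance matrix, then projects the data onto it. This is a teaching sketch under simplifying assumptions (small dense data, a dominant first component, a starting vector not orthogonal to it); the data points and function names are invented, and a notebook would more likely use `sklearn.decomposition.PCA`.

```python
import math


def first_principal_component(rows, iterations=100):
    """Approximate the top eigenvector of the sample covariance matrix."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    vec = [1.0] * d  # arbitrary starting direction
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]  # renormalize to unit length
    return means, vec


def project(rows, means, component):
    """Reduce each row to one number: its coordinate along the component."""
    return [sum((r[j] - means[j]) * component[j] for j in range(len(component)))
            for r in rows]


# Hypothetical 2-D points lying roughly along the line y = 2x.
points = [[1, 2.1], [2, 3.9], [3, 6.2], [4, 7.8], [5, 10.1]]

means, pc1 = first_principal_component(points)
scores = project(points, means, pc1)
```

The resulting `pc1` points roughly along the direction (1, 2), and `scores` is the one-dimensional representation of the five points.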
Acceptance Criteria
Submit a Jupyter notebook containing the tutorial and the necessary datasets, if needed
Modify the README.md file to include the new tutorial and a link to the added notebook
Develop a tutorial on common classification metrics in machine learning, such as accuracy, precision, recall, and F1-score. Explain when to use each metric and how to calculate them.
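The four metrics named above can all be derived from the confusion-matrix counts. The sketch below computes them for a binary problem; the labels and predictions are made up, and the convention of returning 0.0 when a denominator is zero is an assumption of this example.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted positives, how many are right
    recall = tp / (tp + fn) if tp + fn else 0.0     # of actual positives, how many were found
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)           # harmonic mean of the two
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical ground truth and predictions (3 positives, 5 negatives).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

scores = binary_metrics(y_true, y_pred)
```

In this example accuracy is 0.75 while precision, recall, and F1 are each 2/3, which hints at why accuracy alone can look flattering when negatives dominate.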
Acceptance Criteria
Submit a Jupyter notebook containing the tutorial and the necessary datasets, if needed
Modify the README.md file to include the new tutorial and a link to the added notebook
Create a comprehensive tutorial on feature engineering to help both new and experienced team members understand and apply this crucial aspect of our data science work.
Tasks
Prepare an outline for the feature engineering tutorial, covering essential concepts and techniques.
Write a detailed introduction explaining the importance of feature engineering in our projects.
Provide clear examples of feature engineering methods used in our current project(s).
Include code snippets, demonstrations, and real-world use cases to illustrate the concepts.
Add references to external resources or research papers for further reading.
Include interactive code notebooks (e.g., Jupyter notebooks) that users can experiment with.
Add visuals, such as diagrams or charts, to aid in understanding.
Ensure the tutorial is well-structured, easy to follow, and suitable for both beginners and advanced team members.
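As an example of the kind of code snippet the tasks above call for, the sketch below derives a few features from a raw record. The record layout, field names, and chosen features are entirely hypothetical and are not taken from any current project.

```python
from datetime import datetime


def engineer_features(order):
    """Derive time-based and ratio features from a raw order record."""
    ts = datetime.fromisoformat(order["timestamp"])
    return {
        "hour": ts.hour,
        "day_of_week": ts.weekday(),        # Monday = 0, Sunday = 6
        "is_weekend": ts.weekday() >= 5,
        "price_per_item": order["total"] / order["n_items"],
    }


# Hypothetical raw record.
order = {"timestamp": "2024-03-16T14:30:00", "total": 59.97, "n_items": 3}
features = engineer_features(order)
```

The idea is that a model cannot easily learn "weekend afternoon" from a raw timestamp string, but it can use `hour` and `is_weekend` directly.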
Acceptance Criteria
Submit a Jupyter notebook containing the tutorial and the necessary datasets, if needed
Modify the README.md file to include the new tutorial and a link to the added notebook
It's crucial to perform in-depth data analysis and visualization to gain insights, discover patterns, and make informed decisions. This issue focuses on conducting an analysis of the text data and creating visualizations that will aid our understanding.
Tasks
Data Exploration:
Perform initial data exploration to understand the structure and characteristics of the text dataset.
Identify key statistics, such as word count distributions, text length, and unique tokens.
Text Preprocessing:
Clean and preprocess the text data, including tasks like lowercasing, punctuation removal, and stopword removal.
Tokenize the text and create a vocabulary for further analysis.
Descriptive Analysis:
Calculate basic statistics, such as word frequency, to identify the most common terms in the dataset.
Visualize the distribution of word frequencies using appropriate charts (e.g., word clouds, bar charts).
Sentiment Analysis:
Perform sentiment analysis to gauge the overall sentiment of the text data.
Create sentiment score distributions and visualizations.
Topic Modeling:
Apply topic modeling techniques (e.g., LDA or NMF) to identify key topics within the text.
Visualize topic distributions and their evolution over time (if applicable).
Text Visualization:
Create informative visualizations to present the results of the analysis, such as word clouds, scatter plots, or heatmaps.
Insights and Findings:
Summarize the key insights and findings derived from the data analysis and visualizations.
Documentation:
Update the project documentation with the analysis methodology and findings.
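The preprocessing and descriptive-analysis tasks above could start from something like the following sketch, which lowercases text, strips punctuation, removes stopwords, and counts word frequencies. The two documents and the tiny stopword set are placeholders invented for this example; it does not cover the sentiment or topic-modeling tasks.

```python
import string
from collections import Counter

# Tiny illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "is", "and", "of", "to"}


def preprocess(text):
    """Lowercase, remove ASCII punctuation, split on whitespace, drop stopwords."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [token for token in cleaned.split() if token not in STOPWORDS]


# Hypothetical mini-corpus.
docs = [
    "The cat sat on the mat.",
    "A cat is a cat, and the dog is a dog!",
]

tokens_per_doc = [preprocess(doc) for doc in docs]
doc_lengths = [len(tokens) for tokens in tokens_per_doc]
word_counts = Counter(token for tokens in tokens_per_doc for token in tokens)
vocabulary = sorted(word_counts)
```

`word_counts.most_common(n)` then gives the top terms, which can feed directly into a bar chart or word cloud.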
Acceptance Criteria:
Submit a Jupyter notebook containing the tutorial and the necessary datasets, if needed
Modify the README.md file to include the new tutorial and a link to the added notebook
Create a tutorial for beginners that explains the concept of cross-validation in machine learning. Guide users through implementing k-fold cross-validation to assess model performance.
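The core of k-fold cross-validation is the index split: the data is cut into k folds, and each fold serves as the test set exactly once while the rest is used for training. The sketch below generates those index pairs without shuffling; the function name is hypothetical, and scikit-learn's `KFold` offers the same idea with more options.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(n_samples=10, k=3))
```

A model would be trained and scored once per fold, and the k scores averaged. If the rows are ordered (for example, sorted by label), they should be shuffled before splitting.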
Acceptance Criteria
Submit a Jupyter notebook containing the tutorial and the necessary datasets, if needed
Modify the README.md file to include the new tutorial and a link to the added notebook
Create a Jupyter Notebook tutorial that implements anomaly detection. Explain the concept of anomaly detection and its applications, and show how to use it in machine learning projects.
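One simple, robust starting point for anomaly detection on a single numeric variable is the modified z-score, which measures distance from the median in units of the median absolute deviation (MAD). The sketch below is only one of many possible approaches; the latency readings are invented, and the 0.6745 scaling constant and 3.5 cutoff are commonly cited defaults rather than requirements.

```python
from statistics import median


def mad_anomalies(values, threshold=3.5):
    """Return values whose modified z-score exceeds the threshold."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread around the median: the score is undefined
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


# Hypothetical response times in milliseconds with one obvious spike.
latencies = [102, 98, 101, 99, 100, 103, 97, 250]
anomalies = mad_anomalies(latencies)
```

Using the median instead of the mean keeps a single extreme value from inflating the spread estimate and hiding itself.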
Acceptance Criteria
Submit a Jupyter notebook containing the tutorial and the necessary datasets, if needed
Modify the README.md file to include the new tutorial and a link to the added notebook
Description:
We need to create a comprehensive tutorial on outlier detection techniques and their practical implementation for our data science community. Outliers can significantly impact our data analysis and machine learning models, and it's essential that our users are well-informed about how to handle them.
Tasks:
Research and gather information on commonly used outlier detection methods.
Create a step-by-step guide on how to apply these techniques using our dataset.
Include code examples and explanations for each method.
Provide real-world use cases and scenarios where outlier detection is crucial.
Add visualizations to help users better understand the impact of outliers.
Ensure the tutorial is beginner-friendly and suitable for all skill levels.
Proofread and edit the tutorial for clarity and accuracy.
Create a table of contents and structure the tutorial logically.
Include external references and resources for further learning.
Test the code examples and instructions to confirm their correctness.
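As a sketch of one commonly used method the guide could cover, the example below flags outliers with the interquartile range (IQR) rule: anything more than 1.5 IQRs below the first quartile or above the third quartile is reported. The readings are hypothetical, and the quartile method (`inclusive`) is a choice made for this example; other quartile definitions give slightly different fences.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical sensor readings with one suspicious value.
readings = [10, 12, 11, 13, 12, 11, 95, 12, 10, 13]
outliers = iqr_outliers(readings)
```

For these readings the fences work out to 8.375 and 15.375, so only the value 95 is flagged. Whether a flagged point should be removed, capped, or kept depends on the domain.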
Expected Outcome:
Once this issue is completed, we will have a well-documented and informative tutorial on outlier detection. This resource will help our community members gain a better understanding of how to handle outliers in their data science projects.
Acceptance Criteria
Submit a Jupyter notebook containing the tutorial and the necessary datasets, if needed
Modify the README.md file to include the new tutorial and a link to the added notebook
Develop a beginner-friendly tutorial on data augmentation using the ydata-synthetic library. Explain how to generate synthetic data to increase the size of the training dataset, which can improve model performance.
Acceptance Criteria
Submit a Jupyter notebook containing the tutorial and the necessary datasets, if needed
Modify the README.md file to include the new tutorial and a link to the added notebook