Springboard Bootcamp Projects

This repository collects the projects I worked on during my time at Springboard.

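We worked on the following topics: Data Science Pipeline, Regression Algorithms, Clustering Algorithms, Decision Tree Algorithms, Time Series Analysis, Hyperparameter Tuning, and a SQL Project.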

My favorite projects from each section are briefly described below. Each page will have a more detailed readme for its individual projects.

Data Science Pipeline

This folder is designed to contain general projects related to the data science pipeline such as exploring APIs, creating presentations, and the fundamental steps in data science.

The London Borough case study was my favorite, as it was one of my first forays into data science. We loaded data on the London boroughs and performed some basic analysis. Within this folder we also made calls to the NASDAQ API and explored the returned data at a surface level.

One of my favorite parts was using data visualization to explore our data.

Regression Algorithms

This folder contains projects focused on regression analysis. So far, we have explored linear and logistic regression. Of these two projects, I think the linear regression case study was my favorite.

Within the linear regression case study, we explored the Red Wine database and created multiple linear regression models. This was my first experience with adjusting models and understanding the process of performing an analysis. In each case study we went through the steps of loading, cleaning, and visualizing the data, then testing and tuning the models.
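As a rough illustration of the fitting step, here is a minimal, dependency-free sketch of ordinary least squares for a single feature. It is not the code from the case study: the function names `fit_simple_ols` and `r_squared` and the numbers are made up for this example, and the case study itself used several features rather than one.

```python
from statistics import mean

def fit_simple_ols(xs, ys):
    """Return (intercept, slope) minimizing squared error for y ~ intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def r_squared(xs, ys, intercept, slope):
    """Share of the variance in ys explained by the fitted line."""
    y_bar = mean(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Made-up numbers for illustration only (not taken from the Red Wine data).
alcohol = [9.0, 9.5, 10.0, 10.5, 11.0, 11.5]
quality = [5.1, 5.1, 5.5, 5.5, 5.9, 5.9]

intercept, slope = fit_simple_ols(alcohol, quality)
print(f"intercept={intercept:.3f}, slope={slope:.3f}")
print(f"R^2={r_squared(alcohol, quality, intercept, slope):.3f}")
```

Comparing R^2 before and after changing the feature set is one simple way to judge whether an adjustment to the model helped.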

Clustering Algorithms

In this folder, we look at the foundations of clustering algorithms, such as Euclidean vs. Manhattan distance, cosine similarity, and k-means. For cosine similarity, we learned to compare the similarity of sentences using tf-idf vectors. My favorite project was the k-means project.

In the k-means project, we went through the entire process of building a k-means clustering algorithm. The case study described the fundamentals of how k-means works and ran through some methods for optimizing and measuring the effectiveness of the algorithm. We created scree (elbow) plots to choose the number of clusters and used silhouette analysis to measure clustering quality.
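The three measures mentioned above can be sketched in a few lines of plain Python. The vectors here are made up. In the sentence-comparison exercise the inputs would be tf-idf vectors, which this sketch does not compute.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of the absolute per-axis differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

p, q = (0, 0), (3, 4)
print(euclidean(p, q))  # 5.0
print(manhattan(p, q))  # 7

# Cosine similarity ignores magnitude: scaling a vector leaves its direction,
# and therefore the similarity, unchanged (up to floating-point rounding).
print(cosine_similarity((1, 2, 0), (2, 4, 0)))
```

Because cosine similarity depends only on direction, a short sentence and a long one that use the same words in the same proportions score as very similar.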
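The assign-then-update loop at the heart of k-means, plus the inertia value that an elbow plot is drawn from, can be sketched as follows. This is an illustrative, standard-library version with made-up points and a deliberately naive initialization (the first k points), not the case study's code. Silhouette analysis is not covered here.

```python
import math

def assign(points, centroids):
    """Index of the nearest centroid for each point."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points]

def update(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if members:
            new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            new_centroids.append(old)  # leave an empty cluster's centroid in place
    return new_centroids

def kmeans(points, initial_centroids, max_iter=100):
    """Alternate assignment and update until the centroids stop moving."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = update(points, labels, centroids)
        if updated == centroids:
            break
        centroids = updated
    return assign(points, centroids), centroids

def inertia(points, labels, centroids):
    """Sum of squared distances from each point to its centroid."""
    return sum(math.dist(p, centroids[label]) ** 2 for p, label in zip(points, labels))

# Made-up 2-D points forming two obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

# Inertia for several k values: the data an elbow plot would show.
for k in (1, 2, 3):
    labels, centroids = kmeans(points, points[:k])
    print(k, round(inertia(points, labels, centroids), 3))
```

Inertia drops sharply from k=1 to k=2 and only slightly after that, which is the bend an elbow plot is meant to reveal.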

Decision Tree Algorithms

This folder focuses entirely on decision tree algorithms, from standard decision trees to ensemble methods such as random forests and gradient boosting. We go through the foundations of decision tree algorithms and their motivations.

My favorite of these was definitely the Random Forest classifier case study. It covered a more complicated ensemble method that really shows off the power of decision trees. The case study showed not only how to create a random forest model but also how to analyze its performance: we used confusion matrices to evaluate predictions and looked at ways to measure variable importance.
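To make the confusion-matrix step concrete, here is a small sketch that tallies the four cells for a binary classifier and derives a few common metrics from them. The labels are made up and the helper names `confusion_counts` and `summarize` are hypothetical; no model is trained here.

```python
from collections import Counter

def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            counts["tp" if pred == positive else "fn"] += 1
        else:
            counts["fp" if pred == positive else "tn"] += 1
    return counts

def summarize(counts):
    """Accuracy, precision, and recall from the four confusion-matrix cells."""
    tp, fp, fn, tn = (counts[k] for k in ("tp", "fp", "fn", "tn"))
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Made-up labels standing in for a classifier's test-set predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
print(dict(counts))
print(summarize(counts))
```

Looking at the individual cells rather than accuracy alone shows which kind of mistake the model makes more often.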

Time Series Analysis

Within this folder is a single time series analysis (TSA) case study. We create a sales forecast from existing data. The case study taught the basics of TSA for assessing both the data and the model: we consider seasonality and stationarity, and forecast with various ARIMA models.

This case study needs further work, but it lays the groundwork for a basic understanding of TSA.
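Differencing is the usual first tool for dealing with a non-stationary series (it is the "I", for "integrated", in ARIMA). The sketch below applies a seasonal difference to a made-up sales series built from a linear trend plus a repeating four-period pattern. The series and the `difference` helper are illustrative assumptions, not the case study's data or code.

```python
def difference(series, lag=1):
    """Subtract from each value the value observed `lag` steps earlier."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]

# Made-up sales: a linear trend plus a seasonal pattern that repeats every 4 periods.
trend = [100 + 5 * t for t in range(12)]
season = [0, 10, -5, -5] * 3
sales = [t + s for t, s in zip(trend, season)]

# Differencing at the seasonal lag cancels the repeating pattern and
# turns the linear trend into a constant.
seasonal_diff = difference(sales, lag=4)
print(seasonal_diff)  # [20, 20, 20, 20, 20, 20, 20, 20]

# A further first difference removes what is left of the trend.
print(difference(seasonal_diff))  # [0, 0, 0, 0, 0, 0, 0]
```

Real sales data will not difference down to exact constants, but a differenced series that fluctuates around a stable level is the behavior to look for before fitting a model.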

Hyperparameter Tuning

An important part of any data science project is understanding the parameters and hyperparameters that affect a machine learning model. Within this folder, we look at projects that focus on hyperparameter tuning methods such as Grid Search and Bayesian Optimization.

Of these two, Bayesian Optimization was my personal favorite. While slightly more complicated to implement, it uses the results of earlier trials to choose which hyperparameters to try next, which I found to be a more robust and thorough approach to tuning than Grid Search.
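For comparison, the grid search idea fits in a few lines: evaluate every combination in a fixed grid and keep the best. In this sketch `toy_score` is a hypothetical stand-in for a cross-validated model score, and the parameter names are only examples. Bayesian optimization is not shown because it needs a surrogate model that is beyond a short example.

```python
from itertools import product

def grid_search(score_fn, param_grid):
    """Score every combination in param_grid and return the best one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_score(max_depth, learning_rate):
    """Hypothetical objective that peaks at max_depth=4, learning_rate=0.1."""
    return -((max_depth - 4) ** 2) - 100 * (learning_rate - 0.1) ** 2

grid = {"max_depth": [2, 4, 6, 8], "learning_rate": [0.01, 0.1, 0.3]}
best_params, best_score = grid_search(toy_score, grid)
print(best_params, best_score)
```

The cost grows with the product of the grid sizes (12 evaluations here), which is the main reason to consider a method that chooses its next trial based on earlier results.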

SQL Project

One of the most vital skills for any data scientist is knowing how to use SQL. In this SQL project, we run SQL queries both in an RDBMS such as MySQL and from within Python, by creating an engine and querying through it.

In the SQL Project you will find not only a notebook with the SQL queries but also the SQL database files used alongside them.
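As a minimal sketch of querying from within Python, the example below uses the standard library's `sqlite3` module with an in-memory database, rather than the MySQL server or the engine-based setup used in the project. The `bookings` table and its rows are made up for illustration.

```python
import sqlite3

# An in-memory SQLite database stands in for the project's real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bookings (member TEXT, facility TEXT, cost REAL)")
conn.executemany(
    "INSERT INTO bookings VALUES (?, ?, ?)",
    [("ana", "tennis", 12.0), ("ben", "pool", 5.0), ("ana", "pool", 5.0)],
)

# Total spend per member, highest first.
rows = conn.execute(
    "SELECT member, SUM(cost) AS total "
    "FROM bookings GROUP BY member ORDER BY total DESC"
).fetchall()
print(rows)  # [('ana', 17.0), ('ben', 5.0)]

conn.close()
```

The same query text would work against a server database. Only the connection setup changes.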
