For the midterm project, you'll practice the data science process on a regression problem. You'll then compare your model's performance with your peers' work on the same dataset, as well as with work on the other datasets. This will let you see new ideas applied to the same problem, along with the nuances of the same technique across different datasets.
You will take one of the following datasets and apply a standard data science process of exploratory analysis, cleaning, and machine learning. This process is outlined in several large stages. To start, import your dataset and begin some initial exploratory analysis using the tools and techniques we have covered thus far. You will find all three datasets in a folder titled 'Datasets'.
- Walmart Sales Forecasting
- Estimating NYC Cab Trip Duration
- Lego Sets (previously seen)
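
As a rough illustration of the first step (importing the data and taking a first look), here is a minimal sketch using only the standard library. The in-memory CSV, the column names `piece_count` and `list_price`, and the helper `summarize` are all invented for this example and are not taken from any of the three datasets. In practice you would point a reader such as `pandas.read_csv` at a file in the 'Datasets' folder and run similar checks there.

```python
import csv
import io
import statistics

# Hypothetical stand-in for a CSV file; the column names and values
# are invented for illustration only.
raw = io.StringIO(
    "piece_count,list_price\n"
    "100,9.99\n"
    "250,24.99\n"
    "500,\n"
    "1000,99.99\n"
)

rows = list(csv.DictReader(raw))


def summarize(rows, column):
    """Return count, missing count, mean, and median for a numeric column."""
    values = [float(r[column]) for r in rows if r[column] != ""]
    return {
        "count": len(values),
        "missing": len(rows) - len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


# Summarize every column so missing values and odd scales stand out early.
summary = {col: summarize(rows, col) for col in rows[0]}
for col, stats in summary.items():
    print(col, stats)
```

Counting missing values alongside the basic statistics is a deliberate choice here: it tells you early how much cleaning a column will need before modeling.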

The stages of the process are:
- Load Data
- Exploratory Analysis
- Initial Model
- Evaluation
- Feature Engineering
- Model 2
- Evaluation
- Further Investigation/Exploration
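
The model, evaluate, engineer, re-evaluate loop in the stages above can be sketched roughly as follows. This is only an illustration under stated assumptions: the data is synthetic (generated so that the target depends on the square of the input), the helpers `fit_line` and `rmse` are hypothetical names written for this example, and RMSE on a held-out split is one reasonable evaluation choice, not a required one. With a real dataset you would likely use a library model rather than a hand-rolled least-squares fit.

```python
import math
import random

random.seed(0)

# Synthetic data, invented for illustration: y depends on x squared plus noise.
xs = [random.uniform(0, 10) for _ in range(200)]
ys = [2.0 * x * x + random.gauss(0, 5) for x in xs]

# Hold out the last 50 points for evaluation.
train_x, test_x = xs[:150], xs[150:]
train_y, test_y = ys[:150], ys[150:]


def fit_line(x, y):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    slope = cov / var
    return slope, mean_y - slope * mean_x


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Initial model: fit y against the raw feature x, then evaluate.
slope, intercept = fit_line(train_x, train_y)
rmse_initial = rmse(test_y, [slope * x + intercept for x in test_x])

# Feature engineering: replace x with x squared, then fit Model 2 and evaluate.
slope2, intercept2 = fit_line([x * x for x in train_x], train_y)
rmse_engineered = rmse(test_y, [slope2 * x * x + intercept2 for x in test_x])

print(f"Initial model RMSE:    {rmse_initial:.2f}")
print(f"Engineered model RMSE: {rmse_engineered:.2f}")
```

Evaluating both models on the same held-out split is what makes the two numbers comparable. If the engineered feature does not lower the error, that is a prompt for the further investigation stage.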
#Your code here
#Your code here