- Exploratory Data Analysis
- Feature Engineering
- Feature Selection
- Model Selection
- Model Tuning
The purpose of this project is to produce a model to predict the winners of the 2023 March Madness college basketball tournament.
- All data will be scraped with Beautiful Soup from: https://www.sports-reference.com/cbb/
- The programming will be done in Jupyter notebooks
- The data will be modified and analyzed using Python with the pandas package
- This project will help to practice newly acquired data science skills
| VSCode | Jupyter | Python | GitHub | iTerm2 | macOS |
| --- | --- | --- | --- | --- | --- |
| Code Editor | Notebook | Language | Developer Platform | Terminal | Operating System |

| Numbers | scikit-learn | Feature Engine |
| --- | --- | --- |
| Data Visualization | ML Library | Data Transformation |
Data will be collected for all NCAA college basketball teams listed on the Sports Reference website: https://www.sports-reference.com/cbb/
The data to be collected:
- Team Names (ex. Colorado Buffaloes)
- Team Home Page Link (ex. 0,1,2)
- Team Years (years containing stats) (ex. '2002-03')
- Stat Labels (of data to-be collected) (ex. fg_pct)
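As a rough illustration of how team names and home page links might be pulled out with Beautiful Soup, here is a minimal sketch. It parses a small inline HTML sample rather than the live site, and the markup it assumes (a `td` cell with `data-stat="school_name"` wrapping a link), the sample paths, and the helper name `extract_team_links` are all hypothetical; the real page structure should be checked before relying on any of this.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.sports-reference.com/cbb/"

# Hypothetical sample markup standing in for a downloaded page.
# The real site's table structure may differ.
SAMPLE_HTML = """
<table>
  <tr><td data-stat="school_name">
    <a href="/cbb/schools/colorado/men/">Colorado Buffaloes</a>
  </td></tr>
  <tr><td data-stat="school_name">
    <a href="/cbb/schools/duke/men/">Duke Blue Devils</a>
  </td></tr>
</table>
"""


def extract_team_links(html, base_url=BASE_URL):
    """Return {team name: absolute home page URL} found in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    teams = {}
    for cell in soup.find_all("td", attrs={"data-stat": "school_name"}):
        link = cell.find("a")
        if link is None or not link.get("href"):
            continue  # skip cells without a usable link
        teams[link.get_text(strip=True)] = urljoin(base_url, link["href"])
    return teams


team_links = extract_team_links(SAMPLE_HTML)
for name, url in team_links.items():
    print(name, url)
```

Keeping the parsing in a function that takes an HTML string makes it easy to test against saved pages without requesting the site on every run.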
Actions to be performed:
- Establishes 'Team' Object (https://github.com/wmauz677/MarchMadness2023/blob/main/Classes/team.py)
- Creates teams_dictionary: the main data dictionary where all stats will be stored for all teams for all years
- Writes teams_dictionary to 'Data/teams_dictionary.pkl'
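The actions above could look roughly like the sketch below. The `Team` class shown here is a hypothetical stand-in, not the implementation in `Classes/team.py`; its fields, the `add_stat` and `to_record` methods, the helper functions, and the `fg_pct` value are assumptions made for illustration. The sketch writes to a temporary directory instead of the project's `Data/` folder.

```python
import pickle
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Team:
    """Hypothetical stand-in for the project's Team object."""
    name: str
    link: str
    stats: dict = field(default_factory=dict)  # year -> {stat label: value}

    def add_stat(self, year, label, value):
        self.stats.setdefault(year, {})[label] = value

    def to_record(self):
        # Plain dict form, so the pickle can be loaded without this class.
        return {"link": self.link, "stats": self.stats}


def save_teams(teams_dictionary, path):
    """Pickle the dictionary, creating the parent folder if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(teams_dictionary, f)


def load_teams(path):
    with Path(path).open("rb") as f:
        return pickle.load(f)


colorado = Team("Colorado Buffaloes", "https://www.sports-reference.com/cbb/")
colorado.add_stat("2002-03", "fg_pct", 0.45)  # made-up value for illustration

# Main data dictionary: team name -> link and per-year stats.
teams_dictionary = {colorado.name: colorado.to_record()}

with tempfile.TemporaryDirectory() as tmp:
    pkl_path = Path(tmp) / "Data" / "teams_dictionary.pkl"
    save_teams(teams_dictionary, pkl_path)
    restored = load_teams(pkl_path)

print(restored["Colorado Buffaloes"]["stats"]["2002-03"])
```

Storing plain dictionaries is one possible design choice; pickling `Team` instances directly also works, as long as the class is importable wherever the file is loaded.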
Data Source
Distributed under the MIT License. See LICENSE for more information.