Welcome to the Soccer Stats Spectacular! This project dives into the world of soccer to analyze player performance and predict their skills. We're combining real-world match statistics with player ratings from Football Manager 2023 and FC24 to create a comprehensive model for player evaluation and comparison.
- Predict Player Skills: Can we use real match data to predict how players are rated in video games? Let's find out!
- Find Player and Team Twins: Ever wondered which players or teams are secretly alike? We'll uncover those hidden similarities.
- Bridge the Gap: Explore the fascinating relationship between on-field performance and virtual ratings.
Get ready for a deep dive into soccer stats, where data science meets the beautiful game!
1. Data Consolidation 📓
- Merge separate files for each league and statistic type.
- Combine FBRef statistics with Football Manager and FC24 ratings.
- Ensure consistent player and team naming across all datasets.
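The consolidation steps above can be sketched as follows. This is a minimal illustration, not the project's actual loader: the column names (`player`, `goals`, `fm_rating`), the `normalize_name` helper, the fictional player names, and all values are assumptions made up for the example.

```python
import unicodedata

import pandas as pd


def normalize_name(name: str) -> str:
    """Lowercase, strip accents, and collapse whitespace so names match across sources."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return " ".join(ascii_only.lower().split())


# Hypothetical per-league stat tables (in practice, one file per league and statistic type).
league_a = pd.DataFrame({"player": ["José  Pérez"], "goals": [7]})
league_b = pd.DataFrame({"player": ["Zoë Müller", "Example Player"], "goals": [2, 0]})
fbref = pd.concat([league_a, league_b], ignore_index=True)

# Hypothetical ratings table that spells the same names differently.
ratings = pd.DataFrame({"player": ["Jose Perez", "zoe muller"], "fm_rating": [150, 160]})

# Build a shared join key, then merge and flag rows that found no rating.
fbref["player_key"] = fbref["player"].map(normalize_name)
ratings["player_key"] = ratings["player"].map(normalize_name)
merged = fbref.merge(
    ratings.drop(columns="player"), on="player_key", how="left", indicator=True
)
unmatched = merged.loc[merged["_merge"] == "left_only", "player"].tolist()
print(unmatched)
```

Letters that have no Unicode decomposition (for example `Ø`) are dropped rather than mapped by this approach, so a real pipeline would likely need an explicit replacement table for them.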
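- Calculate how far each player statistic sits above or below the team average, as a fraction of that average (multiply by 100 for a percentage):
    def percent_above_below(group):
        # Relative deviation of each value from its team's mean for that statistic
        return (group - group.mean()) / group.mean()

    player_data_transformed = player_data.groupby('team').transform(percent_above_below)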
- Aggregate original pre-transformation data per team for team-level statistics.
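To make the transformation and the team-level aggregation concrete, here is a small worked example on toy data. The column names and numbers are invented for illustration, and using the mean as the team-level aggregate is an assumption; the project may aggregate differently.

```python
import pandas as pd

# Toy data: two teams with two players each (values are made up).
player_data = pd.DataFrame(
    {
        "team": ["A", "A", "B", "B"],
        "goals": [10.0, 30.0, 5.0, 15.0],
        "assists": [4.0, 4.0, 2.0, 6.0],
    }
)


def percent_above_below(group: pd.Series) -> pd.Series:
    # Relative deviation from the team mean of this statistic.
    return (group - group.mean()) / group.mean()


# Player-level features: each statistic relative to the player's own team.
relative = player_data.groupby("team").transform(percent_above_below)

# Team-level features: aggregated from the original, untransformed values.
team_data = player_data.groupby("team").mean()

print(relative)
print(team_data)
```

Team A averages 20 goals, so its players come out at -0.5 and 0.5. A statistic whose team mean is zero produces NaN or infinite values, so those columns need separate handling before scaling.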
- Apply Principal Component Analysis (PCA) to reduce the dimensionality of player statistics:
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    scaler = StandardScaler()
    player_data_scaled = scaler.fit_transform(player_data_transformed)

    pca_player = PCA(n_components=0.95)  # Retain 95% of variance
    player_pca = pca_player.fit_transform(player_data_scaled)
- Apply PCA to reduce the dimensionality of team statistics.
- Append the resulting team PCA components to individual player data:
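    import pandas as pd

    # Use a separate scaler so the one fitted on player data is not overwritten
    team_scaler = StandardScaler()
    team_data_scaled = team_scaler.fit_transform(team_data)

    pca_team = PCA(n_components=5)  # Adjust number of components as needed
    team_pca = pca_team.fit_transform(team_data_scaled)

    # Assumes player_data['team_id'] holds each player's row position in team_data
    team_rows = player_data['team_id'].to_numpy()
    player_data_final = pd.concat(
        [pd.DataFrame(player_pca), pd.DataFrame(team_pca[team_rows])],
        axis=1,
    )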
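The positional lookup above only works when `team_id` is the row position of each team in the team matrix. If teams are identified by name instead, a label-based join is one alternative. The sketch below uses small hand-written arrays in place of real PCA output; the `team_pc1`/`player_pc1` column names and all values are hypothetical.

```python
import numpy as np
import pandas as pd

# Stand-in for team-level PCA output: one row per team, in the order of team_index.
team_index = pd.Index(["A", "B"], name="team")
team_pca = np.array([[0.1, 0.2], [0.3, 0.4]])
team_components = pd.DataFrame(
    team_pca, index=team_index, columns=["team_pc1", "team_pc2"]
)

# Stand-in for player-level PCA output, with the team label kept alongside.
players = pd.DataFrame(
    {
        "player": ["p1", "p2", "p3"],
        "team": ["B", "A", "B"],
        "player_pc1": [1.0, 2.0, 3.0],
    }
)

# Attach each player's team components by matching on the team label.
player_data_final = players.join(team_components, on="team")
print(player_data_final)
```

Joining on the label keeps the result correct even if the team rows are reordered or filtered between steps.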
- Perform exploratory data analysis (EDA) after each preprocessing step.
- Visualize PCA results to understand player and team distributions.
- Validate the effectiveness of the preprocessing steps.
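One possible way to check the PCA step is to look at the cumulative explained variance and at how well the retained components reconstruct the scaled data. The sketch below runs on synthetic data with a fixed seed, standing in for real player statistics; the shapes and noise level are arbitrary assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic "statistics": 10 correlated columns driven by 3 hidden factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 10))
stats = latent @ mixing + 0.05 * rng.normal(size=(200, 10))

scaled = StandardScaler().fit_transform(stats)
pca = PCA(n_components=0.95).fit(scaled)

# Cumulative share of variance explained by the retained components.
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Mean squared error after projecting down and back up again.
reconstructed = pca.inverse_transform(pca.transform(scaled))
reconstruction_error = float(np.mean((scaled - reconstructed) ** 2))

print(pca.n_components_, cumulative, reconstruction_error)
```

Plotting `cumulative` against the component index gives the usual scree-style view, and a low reconstruction error suggests little information was lost in the reduction.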
- Implement a versioning system for datasets at each preprocessing stage.
- Ensure reproducibility of results and enable easy backtracking if needed.
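A lightweight way to version intermediate datasets is to name each file after its preprocessing stage plus a hash of its contents. This is only a sketch of one option: the `save_versioned` helper, the stage label, and the CSV format are assumptions, not the project's actual scheme.

```python
import hashlib
import tempfile
from pathlib import Path

import pandas as pd


def save_versioned(df: pd.DataFrame, stage: str, out_dir: Path) -> Path:
    """Write df as a CSV named after the stage and a short hash of its contents."""
    payload = df.to_csv(index=False).encode("utf-8")
    digest = hashlib.sha256(payload).hexdigest()[:12]
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{stage}-{digest}.csv"
    path.write_bytes(payload)
    return path


with tempfile.TemporaryDirectory() as tmp:
    frame = pd.DataFrame({"player": ["a", "b"], "goals": [1, 2]})
    first = save_versioned(frame, "01_consolidated", Path(tmp))
    second = save_versioned(frame, "01_consolidated", Path(tmp))
    changed = save_versioned(frame.assign(goals=[1, 3]), "01_consolidated", Path(tmp))
    same_name = first == second        # identical data maps to the same file
    different_name = first != changed  # any change in the data yields a new file
    print(first.name, changed.name)
```

Because the file name depends only on the data, rerunning a stage with unchanged input overwrites the same file, while any change leaves the earlier version in place for backtracking.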
To get started with the Soccer Stats Spectacular, follow these steps:
- Clone the repository:
git clone https://github.com/your-username/soccer-stats-spectacular.git
- Navigate to the project directory:
cd soccer-stats-spectacular
- Install the required packages:
pip install -r requirements.txt
To run the analysis and start making predictions, execute the following command:
python main.py --data-source "FBRef" --season "2023-24" --predict-skills
This will kick off the data processing and prediction pipeline, producing results that you can analyze further.
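For readers extending the pipeline, flags like these are typically parsed with `argparse`. The sketch below is illustrative only and is not the contents of the project's `main.py`: the `build_parser` function, the help texts, and the choice to make both options required are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the flags shown in the command above.
    parser = argparse.ArgumentParser(description="Soccer stats pipeline (sketch)")
    parser.add_argument("--data-source", required=True, help="Statistics provider, e.g. FBRef")
    parser.add_argument("--season", required=True, help="Season label, e.g. 2023-24")
    parser.add_argument(
        "--predict-skills",
        action="store_true",
        help="Run the rating prediction step after preprocessing",
    )
    return parser


# Parse the same arguments as the example command.
args = build_parser().parse_args(
    ["--data-source", "FBRef", "--season", "2023-24", "--predict-skills"]
)
print(args.data_source, args.season, args.predict_skills)
```

`argparse` converts dashes in option names to underscores, so `--data-source` is read as `args.data_source`.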
We welcome contributions from fellow soccer enthusiasts and data scientists! To contribute:
- Fork the repository.
- Create your feature branch (git checkout -b feature/AmazingFeature).
- Commit your changes (git commit -m 'Add some AmazingFeature').
- Push to the branch (git push origin feature/AmazingFeature).
- Open a Pull Request.
This project is licensed under the MIT License. See the LICENSE.md file for details.
For questions, support, or just to say hello, reach out to us at:
- Email: [email protected]
- Twitter: @SoccerStatSpectacular
We hope you enjoy diving into the world of soccer stats with us!