Match prediction for a dating app using a Speed Dating survey dataset from Kaggle. Final project for the course "Machine Learning and Intelligent Systems" at Eurecom.
- data Raw and processed data
  - raw_data.csv Original dataset.
  - data.pkl Cleaned and reshaped dataset produced by the Dataset_analysis notebook.
  - SpeedDatingSurveyAndDataKey.doc Original survey given to participants, annotated with the corresponding dataset field names.
- images Images used in reports
- reports Reports produced at the start of and during development
- Dataset_analysis Dataset analysis, cleaning, reshaping and feature selection.
- Isolation_Forest Isolation forest model experiments.
- Models_Pipeline ML model training and testing.
- Principal_Component_Analysis PCA
- gmm.py Python code defining a GMM class used in models.py
- models.py Python code implementing models and pipelines used in the notebook Models_Pipeline
- preprocessing.py Python code implementing the preprocessing functions (interaction computation, PCA, training/test split) used in Models_Pipeline
- requirements Project dependencies
- utilities.py Pretty-printing helper functions.
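
The preprocessing steps listed for preprocessing.py (interaction computation, PCA, training/test split) could look roughly like the sketch below. It is only an illustration under stated assumptions: it uses scikit-learn and synthetic data, the function name `preprocess` and its parameters are hypothetical, and "interactions" is assumed to mean pairwise feature products. The actual code in preprocessing.py may differ.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures, StandardScaler


def preprocess(X, y, n_components=5, test_size=0.25, seed=0):
    """Hypothetical helper: interactions -> train/test split -> scaling -> PCA."""
    # Assumption: "interactions" means pairwise products of the input features.
    interactions = PolynomialFeatures(
        degree=2, interaction_only=True, include_bias=False
    )
    X_inter = interactions.fit_transform(X)

    # Stratified split keeps the match / no-match ratio similar in both sets.
    X_train, X_test, y_train, y_test = train_test_split(
        X_inter, y, test_size=test_size, stratify=y, random_state=seed
    )

    # Fit the scaler and PCA on the training set only, to avoid leaking
    # information from the test set into the transformation.
    scaler = StandardScaler().fit(X_train)
    pca = PCA(n_components=n_components).fit(scaler.transform(X_train))

    X_train_pca = pca.transform(scaler.transform(X_train))
    X_test_pca = pca.transform(scaler.transform(X_test))
    return X_train_pca, X_test_pca, y_train, y_test


# Synthetic stand-in for the real dataset (4 features, binary target).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

X_train, X_test, y_train, y_test = preprocess(X, y, n_components=5)
print(X_train.shape, X_test.shape)
```

Fitting the scaler and PCA after the split, on training rows only, is a design choice of this sketch; it keeps the test set untouched until evaluation.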