Models for predicting who survived the sinking of the RMS Titanic.
- Kaggle competition: titanic;
- My Kaggle profile: elacerdajr.
- LogReg: Logistic Regression. Features: Pclass, Sex, SibSp.
- LogReg2: Logistic Regression with polynomial features (degree = 2). Features: Pclass, Sex, SibSp.
- LogReg2+: Logistic Regression with polynomial features (degree = 2). Features: Pclass, Sex, Age, SibSp, Parch.
- DTree: Decision Tree.
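As a minimal sketch of the polynomial-feature setups above (LogReg2/LogReg2+), assuming scikit-learn and using synthetic data as a stand-in for the Kaggle columns:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LogisticRegression

# Hypothetical sketch: degree-2 polynomial expansion of the features,
# then logistic regression, as in the LogReg2/LogReg2+ models.
model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LogisticRegression(max_iter=1000),
)

# Tiny synthetic stand-in for the five LogReg2+ features
# (Pclass, Sex, Age, SibSp, Parch); the real project fits on
# Kaggle's train.csv instead.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(50, 5)).astype(float)
y = (X[:, 1] > 0).astype(int)  # toy target tied to the "Sex" column

model.fit(X, y)
print(model.score(X, y))
```

The pipeline keeps the feature expansion and the classifier together, so the same object can be applied unchanged to the test set.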
A summary of the results is presented in the table below:
| Model | Train Accuracy | Test Accuracy |
|---|---|---|
| LogReg | 0.80029 | 0.77033 |
| LogReg2 | 0.7991 | 0.77511 |
| LogReg2+ | 0.8305 | 0.78947 |
| DTree | 0.8039 (avg) | 0.74162 |
I noted that:
- Looking at the coefficients of the LogReg model is a good way to infer which variables matter most to the predictions.
- The accuracy is always lower for the test set than for the train set.
- The DTree model showed the largest gap between train and test accuracy, which suggests overfitting.
- LogReg2+ improved somewhat on the LogReg2 and LogReg results, although all three are quite similar.
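The coefficient inspection mentioned in the first note can be sketched as follows (a toy example, not the project's actual fit; feature names assumed to match the Kaggle columns):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data for the three LogReg features; in this toy set
# the target equals the "Sex" column, so that coefficient should
# dominate.
features = ["Pclass", "Sex", "SibSp"]
X = np.array([[3, 0, 1], [1, 1, 0], [2, 1, 1], [3, 0, 0],
              [1, 1, 1], [2, 0, 0], [3, 1, 2], [1, 0, 0]], dtype=float)
y = np.array([0, 1, 1, 0, 1, 0, 1, 0])

clf = LogisticRegression().fit(X, y)

# Larger |coefficient| (on comparably scaled inputs) suggests a
# stronger influence on the predicted survival probability.
for name, coef in sorted(zip(features, clf.coef_[0]),
                         key=lambda t: -abs(t[1])):
    print(f"{name}: {coef:+.3f}")
```

Note that comparing raw coefficients is only meaningful when the inputs are on comparable scales; otherwise standardize first.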