.
├── bin
│ ├── ffm-train # executable to train an FFM model; see https://github.com/guestwalk/libffm for build instructions
│ └── ffm-predict # executable to make predictions with a trained FFM model, built from the same repo
└── data # Toy Datasets
├── dog_breed # downloaded from https://www.kaggle.com/c/dog-breed-identification/data
│ ├── Test # (folder) unzipped from test.zip
│ ├── Train # (folder) unzipped from train.zip
│ ├── labels.csv # unzipped from labels.csv.zip
│ └── sample_submission.csv # unzipped from sample_submission.csv.zip
└── house_pricing # downloaded from https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data
├── data_description.txt # directly downloaded
├── sample_submission.csv # directly downloaded or unzipped from sample_submission.csv.gz
├── test.csv # directly downloaded or unzipped from test.csv.gz
└── train.csv # directly downloaded or unzipped from train.csv.gz
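The two binaries under `bin/` follow libffm's command-line interface. A minimal usage sketch (file names `train.ffm`, `test.ffm`, `model.out` are placeholders, and the flags shown are libffm's standard options, not values tuned for these datasets):

```shell
# libffm expects data in "label field:feature:value ..." format, e.g.:
#   1 0:12:1 1:7:1 2:101:0.5

# Train with 4 latent factors (-k), learning rate 0.2 (-r), 15 epochs (-t):
./bin/ffm-train -k 4 -r 0.2 -t 15 train.ffm model.out

# Score a test set with the trained model, writing one prediction per line:
./bin/ffm-predict test.ffm model.out predictions.txt
```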
- Try TensorFlow
- Try PyTorch
- Try MXNet
- Try FM
- Try FFM
- Try PNN (Product-based Neural Networks)
- Try matrix decomposition (MF, SVD/SVD++, SVD for RecSys etc.)
- Try sklearn FeatureUnion and Pipeline
- Try tuning hyperparameters with hyperopt
- Try stacking
- Try ensembling with raw features fed to the meta-learner alongside base-model predictions (my idea)
- Try synthetic oversampling
- Try cost-sensitive learning for imbalanced data (not only imbalanced labels, but also imbalanced features)
- Try ensembling for imbalanced data
- Try Cython
- Try t-SNE
- Add simple data visualization
- Add simple testing
- Implement (or just copy from somewhere) Bayesian smoothing for CTR/CVR
- Implement simple A/B test
- Implement a fully tunable and testable pipeline (it's fine for it to be simple, but it must be complete)
- Fix globals problem in info_utils
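For the FM item above, the pairwise interaction term can be computed in O(kn) instead of O(kn²) via Rendle's identity. A minimal NumPy sketch (the function name `fm_predict` and the random toy parameters are illustrative, not part of this repo):

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Second-order FM: y = w0 + w.x + sum_{i<j} <V_i, V_j> x_i x_j.
    Uses the O(kn) identity:
    0.5 * sum_f [ (sum_i V_if x_i)^2 - sum_i V_if^2 x_i^2 ]."""
    linear = w0 + x @ w
    vx = x @ V                                   # shape (k,)
    pair = 0.5 * np.sum(vx ** 2 - (x ** 2) @ (V ** 2))
    return linear + pair

rng = np.random.default_rng(0)
n, k = 6, 3
x = rng.normal(size=n)
w0, w, V = 0.1, rng.normal(size=n), rng.normal(size=(n, k))

# Naive O(n^2 k) pairwise sum, as a correctness check on the fast form
naive = w0 + x @ w + sum(
    (V[i] @ V[j]) * x[i] * x[j] for i in range(n) for j in range(i + 1, n)
)
print(abs(fm_predict(x, w0, w, V) - naive) < 1e-8)  # True
```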
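For the matrix decomposition item, a truncated SVD of a user-item ratings matrix is the simplest starting point. A sketch with a hand-made toy matrix (treated as dense for simplicity; a real recommender would handle unobserved entries separately):

```python
import numpy as np

# Toy ratings matrix (users x items) with two obvious taste clusters
R = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)

U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2                                            # keep the top-2 latent factors
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# The rank-2 reconstruction should capture the block structure of R
err = np.linalg.norm(R - R_hat) / np.linalg.norm(R)
print(round(err, 3))
```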
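For the sklearn FeatureUnion and Pipeline item, a sketch of the intended pattern: a `FeatureUnion` concatenates transformer outputs side by side, and a `Pipeline` chains that union with scaling and a final estimator (the toy data and chosen transformers are arbitrary examples):

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# PCA components and the k best raw features, concatenated into 3 + 5 columns
features = FeatureUnion([
    ("pca", PCA(n_components=3)),
    ("kbest", SelectKBest(f_classif, k=5)),
])
model = Pipeline([
    ("features", features),
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X, y)
print(model.score(X, y))
```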
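For the stacking item, the key detail is that the meta-learner must train on out-of-fold predictions, or it overfits to base-model leakage. A manual sketch using `cross_val_predict` (the choice of base models and toy data is illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

base_models = [RandomForestClassifier(n_estimators=50, random_state=0),
               DecisionTreeClassifier(max_depth=5, random_state=0)]

# Level 0: out-of-fold probabilities, so the meta-learner never sees
# predictions a base model made on its own training rows
oof = np.column_stack([
    cross_val_predict(m, X_tr, y_tr, cv=5, method="predict_proba")[:, 1]
    for m in base_models
])

# Level 1: meta-learner trained on the stacked OOF predictions
meta = LogisticRegression().fit(oof, y_tr)

# At test time, base models are refit on all training data
test_meta_features = np.column_stack([
    m.fit(X_tr, y_tr).predict_proba(X_te)[:, 1] for m in base_models
])
acc = meta.score(test_meta_features, y_te)
print(round(acc, 3))
```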
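For the Bayesian smoothing item, one common approach is to fit a Beta prior to the observed per-item CTRs by moment matching, then use the posterior mean as the smoothed rate. A sketch with made-up click/impression counts (it assumes the CTR variance is strictly positive; production code would need fallbacks):

```python
def beta_moment_match(ctrs):
    """Fit Beta(alpha, beta) to observed per-item CTRs by moment matching.
    Assumes 0 < mean < 1 and var > 0."""
    n = len(ctrs)
    mean = sum(ctrs) / n
    var = sum((c - mean) ** 2 for c in ctrs) / n
    common = mean * (1 - mean) / var - 1
    return mean * common, (1 - mean) * common

clicks = [5, 50, 2, 30, 1]
imps   = [100, 1000, 20, 500, 10]
ctrs = [c / i for c, i in zip(clicks, imps)]
alpha, beta = beta_moment_match(ctrs)

# Posterior mean: items with few impressions are pulled toward the prior
# mean alpha / (alpha + beta); items with many impressions barely move.
smoothed = [(c + alpha) / (i + alpha + beta) for c, i in zip(clicks, imps)]
print([round(s, 4) for s in smoothed])
```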
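For the simple A/B test item, a two-proportion z-test is about the smallest useful version. A self-contained sketch using only the standard library (the conversion counts are invented for illustration):

```python
from math import erf, sqrt

def ab_test(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test for conversion rates.
    Returns (z, two-sided p-value) using the pooled standard error."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Normal CDF via erf: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

z, p = ab_test(200, 1000, 260, 1000)
print(round(z, 2), round(p, 4))  # variant B converts better; p < 0.05
```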