Working on: https://www.kaggle.com/c/freesound-audio-tagging-2019
- Run `00_Preprocess.ipynb` (Takes ~2.5 hours)
  - Converts audio files into images and saves them
  - Turns string labels into binary indicators
  - Performs label smoothing on the noisy dataset
  - Merges `train_curated.csv` and `train_noisy.csv` into `train_merged.csv`
  - Generates a single (balanced) validation fold based on the curated training set
  - Defines a few simple image transforms
  - Creates an `ImageDataBunch` with batch image normalization (`.normalize()`)
  - Creates a `vgg16_bn` learner that uses mixup data augmentation
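The label-handling steps (binary indicators, smoothing on the noisy set, merging the two CSVs) could be sketched roughly like this. This is a hedged illustration with made-up column names and a toy class list; the actual notebook may structure things differently:

```python
import numpy as np
import pandas as pd

# Tiny stand-ins for train_curated.csv / train_noisy.csv: an fname column
# and a comma-separated labels column (assumed format).
curated = pd.DataFrame({"fname": ["a.wav"], "labels": ["Bark"]})
noisy = pd.DataFrame({"fname": ["b.wav"], "labels": ["Bark,Meow"]})

classes = ["Bark", "Meow"]

def to_indicator(df, smoothing=0.0):
    """Turn string labels into binary indicators, optionally smoothed."""
    y = np.zeros((len(df), len(classes)), dtype=np.float32)
    for i, labels in enumerate(df["labels"]):
        for label in labels.split(","):
            y[i, classes.index(label)] = 1.0
    # Label smoothing: pull targets away from hard 0/1 values.
    y = y * (1 - smoothing) + smoothing / len(classes)
    return pd.concat([df[["fname"]], pd.DataFrame(y, columns=classes)], axis=1)

# Smooth only the noisy labels, then merge into one frame (train_merged.csv).
merged = pd.concat(
    [to_indicator(curated), to_indicator(noisy, smoothing=0.1)],
    ignore_index=True,
)
```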
- Run `src/trainAll.py`
  - Performs a full training cycle with my best known hyperparameters and network
  - Creates a `/kfolds` folder with validation set predictions
  - Creates a `/model_predictions` folder with test set predictions
  - Creates a `/model_source` folder with the exact source used to generate a given score
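The folder bookkeeping could look something like the sketch below. The file names and function are hypothetical, not the script's real API; only the three folder names come from the repo:

```python
import shutil
from pathlib import Path

import numpy as np

def save_run(fold, val_preds, test_preds, source_file, root="."):
    """Save per-fold validation/test predictions and snapshot the source."""
    root = Path(root)
    for folder in ("kfolds", "model_predictions", "model_source"):
        (root / folder).mkdir(exist_ok=True)
    # Validation-set predictions go to /kfolds, test-set ones to
    # /model_predictions (hypothetical file naming).
    np.save(root / "kfolds" / f"val_preds_fold{fold}.npy", val_preds)
    np.save(root / "model_predictions" / f"test_preds_fold{fold}.npy", test_preds)
    # Keep the exact source that produced this score, for reproducibility.
    shutil.copy(source_file, root / "model_source" / Path(source_file).name)
```

Snapshotting the source alongside the predictions is what lets a leaderboard score be traced back to the exact code that produced it.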
- `00_EDA.ipynb` is Exploratory Data Analysis
  - Visualize class balance
  - Visualize audio length
  - Find the incorrect audio file `77b925c2.wav`
  - Example of how to create a validation set that
    - is taken only from the curated dataset
    - is balanced according to labels (using `MultilabelStratifiedKFold`)
  - Look at the activations of the network to make sure nothing seems problematic
  - My attempt to compute image statistics for normalization (similar to `imagenet_stats` or `mnist_stats`)
    - Unfortunately, using these statistics doesn't improve performance
    - I have probably made a mistake and am misunderstanding how the statistics should be calculated
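For reference, statistics of the kind `imagenet_stats` holds are usually the per-channel mean and standard deviation over all pixels of all training images. A minimal sketch on random data (the notebook's actual computation may differ, which could be where the discrepancy lies):

```python
import numpy as np

# Stand-in for a stack of training images: (n_images, height, width, channels),
# with float values already scaled to [0, 1].
rng = np.random.default_rng(0)
images = rng.random((8, 32, 32, 3))

# Channel-wise statistics over every pixel of every image,
# analogous to imagenet_stats / mnist_stats.
channel_mean = images.mean(axis=(0, 1, 2))
channel_std = images.std(axis=(0, 1, 2))

# Normalization then shifts each channel to zero mean and unit variance.
normalized = (images - channel_mean) / channel_std
```

A common pitfall is computing the statistics at a different scale than the one used at training time (e.g. on 0–255 pixel values while the pipeline feeds 0–1 floats), which would make the normalization ineffective.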