- ResNet50V1
- MobileNetV2
- VGG16
- SE-ResNeXt101-32x4d
- Zoom (0.1)
- Horizontal Flip
- Vertical Flip
These augmentations were chosen because they do not compromise the integrity of the images. For example, a horizontally flipped plant is still recognizably a plant, so disease feature extraction is unaffected.
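As an illustration, the three transforms can be sketched with plain NumPy array operations. The `augment` helper and the center-crop implementation of zoom are assumptions for this sketch; in Keras the equivalent would typically be `ImageDataGenerator(zoom_range=0.1, horizontal_flip=True, vertical_flip=True)`.

```python
import numpy as np

def augment(img, zoom=0.1):
    """Yield the original image plus the three label-preserving variants:
    horizontal flip, vertical flip, and a slight center zoom."""
    yield img
    yield img[:, ::-1]   # horizontal flip
    yield img[::-1, :]   # vertical flip
    h, w = img.shape[:2]
    dh, dw = int(h * zoom / 2), int(w * zoom / 2)
    # ~10% center crop; would be resized back to the input size before training
    yield img[dh:h - dh, dw:w - dw]
```

Generating extra variants per image like this is one way to realise the "3x augmentation" mentioned in the training notes below.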
https://arxiv.org/abs/1512.03385
Trained using 336x336 image size, 3x augmentation. Parameters: 25,636,712 (25,583,592 trainable)
https://arxiv.org/pdf/1801.04381.pdf
Trained using 336x336 image size, 3x augmentation. Parameters: 2,257,984 (2,223,872 trainable)
https://arxiv.org/abs/1409.1556
Trained using 336x336 image size, 3x augmentation. Parameters: 14,717,253 (14,715,461 trainable)
https://arxiv.org/abs/1611.05431
Trained using 224x224 image size, 3x augmentation. Parameters: 47,054,517 (46,916,661 trainable)
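The transfer-learning setup implied by the parameter counts above (an ImageNet backbone with a classification head, almost all parameters trainable) can be sketched in Keras. The class count, pooling mode, optimizer, and loss here are illustrative assumptions, not the exact configuration used, so the parameter total will differ slightly from the figures listed.

```python
import tensorflow as tf

def build_model(input_size=336, n_classes=4, weights=None):
    """Backbone + softmax head. Pass weights="imagenet" for transfer
    learning; weights=None builds the same architecture untrained."""
    base = tf.keras.applications.ResNet50(
        include_top=False, weights=weights,
        input_shape=(input_size, input_size, 3), pooling="avg")
    out = tf.keras.layers.Dense(n_classes, activation="softmax")(base.output)
    model = tf.keras.Model(base.input, out)
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Swapping `ResNet50` for `MobileNetV2` or `VGG16` (or an SE-ResNeXt implementation) reproduces the other backbones in the list.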
Many improvements could be made to the models but were not implemented due to time and compute constraints:
- Increase image size. For reference, per-epoch training times:
  - 500x500 - Colab GPU: 15 minutes, i5-8300H CPU: 9 hours
  - 336x336 - Colab GPU: 8 minutes, i5-8300H CPU: 5 hours
  - 224x224 - Colab GPU: 5 minutes, i5-8300H CPU: 1 hour 30 minutes
- Potentially add further data augmentation
- K-fold cross validation
- Test time augmentation
- Callbacks such as `ReduceLROnPlateau` and `EarlyStopping` (though these may not help much given the low number of training epochs)
- Add additional Dense layers (with Dropout) on top of the base model output.
- Potentially subtract the training-set mean from the input data.
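For the callbacks idea above, a minimal Keras sketch; the monitored metric and patience values are illustrative assumptions:

```python
import tensorflow as tf

# Halve the learning rate when validation loss stalls, and stop training
# early if it keeps stalling; patience values here are illustrative.
callbacks = [
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5,
                                         patience=2, min_lr=1e-6),
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=4,
                                     restore_best_weights=True),
]
# Passed to training as: model.fit(..., callbacks=callbacks)
```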
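Test-time augmentation amounts to averaging predictions over the same label-preserving transforms used in training. A minimal sketch, where `tta_predict` and the `predict_fn` interface are assumptions for illustration:

```python
import numpy as np

def tta_predict(predict_fn, img):
    """Average class probabilities over the identity and both flips."""
    views = [img, img[:, ::-1], img[::-1, :]]
    preds = [predict_fn(v) for v in views]
    return np.mean(preds, axis=0)
```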
Sample training curves (top to bottom: ResNet50, MobileNetV2, VGG16, SE-ResNeXt)