Trained a Variational Autoencoder (VAE) using PyTorch Lightning on the CIFAR and MNIST datasets
- Label appended to the encoder input
- Label one-hot encoded and resized to the image dimensions before appending (see the sketch after this list)
- Created a batch of copies of each image with every label except the original, to be sent for inference to the trained model
- Combined the input and outputs into a grid and saved them to an output image with labels
- Padded inputs to the size the model expects (e.g., MNIST 28x28 to 32x32)
- Worked out the input requirements of the off-the-shelf model and widened its channel count to handle the appended label planes
- Handled conversion between 1-channel and 3-channel images for display
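
A minimal sketch of the label-conditioning step above, assuming the one-hot label is broadcast to per-class image planes (the function and parameter names are illustrative, not taken from the scripts):

```python
import torch
import torch.nn.functional as F

def append_label_channels(images, labels, num_classes=10):
    """Concatenate a one-hot label, broadcast to the image size, as extra channels.

    images: (B, C, H, W) float tensor; labels: (B,) integer class ids.
    Returns a (B, C + num_classes, H, W) tensor.
    """
    b, _, h, w = images.shape
    one_hot = F.one_hot(labels, num_classes).float()               # (B, num_classes)
    planes = one_hot.view(b, num_classes, 1, 1).expand(b, num_classes, h, w)
    return torch.cat([images, planes], dim=1)

# e.g. a CIFAR batch: (16, 3, 32, 32) + 10 label planes -> (16, 13, 32, 32)
x = torch.randn(16, 3, 32, 32)
y = torch.randint(0, 10, (16,))
print(append_label_channels(x, y).shape)  # torch.Size([16, 13, 32, 32])
```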
- VAE_CIFAR.py
- Training script for the CIFAR VAE with the label appended along with the image
- One-hot encodes the label, resizes it to the image dimensions, and adds it to the input (a simplified sketch follows)
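
A simplified sketch of what this training module might look like, built on pl_bolts' ResNet encoder/decoder (using MSE reconstruction plus an analytic KL term; the actual script may compute or weight the loss differently, and `conv1` is pl_bolts' internal attribute name, so treat the widening step as an assumption):

```python
import torch
import torch.nn.functional as F
from torch import nn
import pytorch_lightning as pl
from pl_bolts.models.autoencoders.components import resnet18_encoder, resnet18_decoder

class CVAE(pl.LightningModule):
    """Conditional VAE sketch: label planes are appended to the encoder input only."""

    def __init__(self, num_classes=10, enc_out_dim=512, latent_dim=256, input_height=32):
        super().__init__()
        self.num_classes = num_classes
        self.encoder = resnet18_encoder(False, False)
        # Widen the first conv so it accepts image + label channels
        # (assumption: pl_bolts exposes the first layer as `conv1`).
        self.encoder.conv1 = nn.Conv2d(3 + num_classes, 64, kernel_size=3,
                                       stride=1, padding=1, bias=False)
        self.decoder = resnet18_decoder(latent_dim, input_height, False, False)
        self.fc_mu = nn.Linear(enc_out_dim, latent_dim)
        self.fc_var = nn.Linear(enc_out_dim, latent_dim)

    def training_step(self, batch, batch_idx):
        x, y = batch
        planes = F.one_hot(y, self.num_classes).float()
        planes = planes.view(-1, self.num_classes, 1, 1).expand(-1, -1, *x.shape[2:])
        h = self.encoder(torch.cat([x, planes], dim=1))
        mu, log_var = self.fc_mu(h), self.fc_var(h)
        std = torch.exp(0.5 * log_var)
        z = mu + std * torch.randn_like(std)           # reparameterisation trick
        x_hat = self.decoder(z)
        recon = F.mse_loss(x_hat, x)
        kl = -0.5 * torch.mean(1 + log_var - mu.pow(2) - log_var.exp())
        loss = recon + kl
        self.log_dict({"recon": recon, "kl": kl, "loss": loss})
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-4)
```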
- VAE_CIFAR_inference.py
- Inference script for the CIFAR VAE trained with the code above
- Passes a different label with each copy of an image to see what outcome we get
- Handles the nuances of passing incorrect labels and displays the outcomes in a properly labelled grid (sketched below)
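
A sketch of the inference flow: the same image is encoded under each incorrect label and the reconstructions are tiled into a grid (names and flow here are hypothetical, assuming the `CVAE` sketch above):

```python
import torch
import torch.nn.functional as F
from torchvision.utils import save_image

@torch.no_grad()
def relabel_and_reconstruct(model, x, true_label, num_classes=10):
    """Encode one image under every label except its own and decode each result.

    `model` is a trained CVAE as sketched above; `x` is a single (3, 32, 32) image.
    """
    model.eval()
    labels = torch.tensor([c for c in range(num_classes) if c != true_label])
    batch = x.unsqueeze(0).repeat(len(labels), 1, 1, 1)          # (9, 3, 32, 32)
    planes = F.one_hot(labels, num_classes).float()
    planes = planes.view(-1, num_classes, 1, 1).expand(-1, -1, *x.shape[1:])
    h = model.encoder(torch.cat([batch, planes], dim=1))
    z = model.fc_mu(h)                     # use the posterior mean, no sampling
    return model.decoder(z), labels

# x_hat, labels = relabel_and_reconstruct(model, x, true_label=3)
# save_image(torch.cat([x.unsqueeze(0), x_hat]), "outputs.png", nrow=5)
```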
- VAE_MNIST.py
- Training script for the MNIST VAE with the label appended along with the image, along the same lines as VAE_CIFAR
- One-hot encodes the label, resizes it to the image dimensions, and adds it to the input
- Also performs the inference step and generates the output images internally
- Additional complexity: the MNIST images are converted to 3 channels and padded to 32x32, because of the ResNet model used by the pl_bolts implementation (see the transform sketch after this list)
- Compares the output with the input for the loss calculation
- Ends by showing the outputs in a proper format despite the changed dimensions
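
The MNIST-specific preprocessing could be done with transforms roughly like these (the `data/` path is an assumption):

```python
from torchvision import datasets, transforms

# Pad 28x28 -> 32x32 and repeat the single channel three times so the
# 3-channel, 32x32 ResNet encoder/decoder can be reused without changes.
mnist_tf = transforms.Compose([
    transforms.ToTensor(),                            # (1, 28, 28)
    transforms.Pad(2),                                # (1, 32, 32)
    transforms.Lambda(lambda t: t.repeat(3, 1, 1)),   # (3, 32, 32)
])
train_set = datasets.MNIST("data/", train=True, download=True, transform=mnist_tf)
```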
- *.png
- Various outputs
- CIFAR
- Epochs=30
- enc_out_dim=512
- latent_dim=256
- input_height=32
- optimizer=torch.optim.Adam(self.parameters(), lr=1e-4)
- batch_size=16
- num_workers=16
- MNIST
- Epochs=50
- enc_out_dim=512
- latent_dim=256
- input_height=32 (MNIST images padded from 28x28 to 32x32)
- optimizer=torch.optim.Adam(self.parameters(), lr=1e-4)
- batch_size=16
- num_workers=16
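
Wired together, the listed hyperparameters correspond roughly to this setup (`train_set` and `model` come from the sketches above; the `accelerator`/`devices` flags assume a recent PyTorch Lightning, where older versions used `gpus=1`):

```python
import pytorch_lightning as pl
from torch.utils.data import DataLoader

loader = DataLoader(train_set, batch_size=16, num_workers=16, shuffle=True)
trainer = pl.Trainer(max_epochs=30, accelerator="gpu", devices=1)  # max_epochs=50 for MNIST
trainer.fit(model, loader)
```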
- Trained on a Google Colab T4 GPU
- Some experimentation on CPU as well
- Important excerpts from the TensorBoard logs
- Outputs produced with different labels do not differ
- With a 28x28 input image the output we were getting was 24x24, causing issues with the comparison, so we switched to 32x32 by padding
- Also faced issues with MNIST's single channel, as the model was defined for 3-channel input
- Instead of changing the model, the channel was copied 3 times to create a 3-channel input (a display helper for this is sketched below)
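
For the 1-channel vs 3-channel display handling, a small helper along these lines would do (an assumed implementation, not taken from the original scripts):

```python
import torch

def for_display(img: torch.Tensor) -> torch.Tensor:
    """Convert between 1-channel and 3-channel images for display.

    A replicated 3-channel MNIST output is collapsed back to 1 channel by
    averaging; a 1-channel image is tiled to 3 channels.
    """
    if img.shape[0] == 3:
        return img.mean(dim=0, keepdim=True)   # (3, H, W) -> (1, H, W)
    return img.repeat(3, 1, 1)                 # (1, H, W) -> (3, H, W)
```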