antixk / pytorch-vae

A Collection of Variational Autoencoders (VAE) in PyTorch.

License: Apache License 2.0

Python 100.00%

pytorch pytorch-implementation vae vae-implementation deep-learning reproducible-research paper-implementations pytorch-vae variational-autoencoders architecture

pytorch-vae's Issues

Generalise hard coded values

The code fails for general values of hidden dims, image sizes, etc. due to shape mismatches. Since image size and hidden dims are parameters anyway, can you please increase the flexibility in other parts of the code? This will allow it to be used as it is on other datasets and different architectures.

Why is M_N defined as the ratio of the batch size and dataset size?

In experiment.py you define M_N like this

train_loss = self.model.loss_function(*results,
                                              M_N = self.params['batch_size']/ self.num_train_imgs,
                                              optimizer_idx=optimizer_idx,
                                              batch_idx = batch_idx)

M_N is then used as a weighting factor for the KLD term in the VAE implementations. Why does the weight depend on the ratio of the batch size to the dataset size?

I saw other literature that uses the ratio of the latent dimension and the input dimension instead. I am not sure which is correct.

Related issues without answers (or that do not directly address my question):
#11 (says that it is to correct variances caused by small batches)
#23
#35
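For reference, the arithmetic behind the quoted call can be sketched as follows. This only reproduces the ratio passed as M_N; it does not settle which weighting is correct. The function name `kld_weight_from_batch` and the example sizes are hypothetical.

```python
def kld_weight_from_batch(batch_size: int, num_train_imgs: int) -> float:
    """Return M_N = batch_size / num_train_imgs, the value passed to loss_function."""
    if batch_size <= 0 or num_train_imgs <= 0:
        raise ValueError("batch_size and num_train_imgs must be positive")
    return batch_size / num_train_imgs


# Hypothetical sizes, chosen only for illustration.
m_n = kld_weight_from_batch(batch_size=144, num_train_imgs=100_000)
print(f"M_N = {m_n:.6f}")
```

With a fixed dataset, a larger batch therefore gives the KLD term a proportionally larger weight.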

Image size of 128 and 256

I have checked previous issues about the image size problem. You mentioned that
model = *VAE*(<in_channels>, <latent_dim>, hidden_dims=[16, 32, 64, 128, 256, 512]) would increase the image size to 128. To make it 256, should we use [8, 16, 32, 64, 128, 256, 512]?

Also, why does doing this change the image size? I don't understand. I learned about this from issue #29.
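One way to see the connection is to track the spatial size through the encoder. The sketch below assumes that every entry in hidden_dims adds one convolution block with kernel size 3, stride 2 and padding 1, and that the layers after the encoder expect a 2 x 2 feature map (hidden_dims[-1]*4 inputs). Both are assumptions about the model code, and the helper names are hypothetical.

```python
def conv_out_size(size: int, kernel: int = 3, stride: int = 2, padding: int = 1) -> int:
    """Spatial size after one convolution (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def encoder_out_size(img_size: int, num_layers: int) -> int:
    """Spatial size after `num_layers` stride-2 convolution blocks."""
    size = img_size
    for _ in range(num_layers):
        size = conv_out_size(size)
    return size


for img_size, hidden_dims in [
    (64, [32, 64, 128, 256, 512]),
    (128, [16, 32, 64, 128, 256, 512]),
    (256, [8, 16, 32, 64, 128, 256, 512]),
]:
    side = encoder_out_size(img_size, len(hidden_dims))
    flat = hidden_dims[-1] * side * side
    print(f"img_size={img_size} layers={len(hidden_dims)} side={side} flattened={flat}")
```

Under these assumptions each extra block halves the feature map once more, so doubling the image size needs one additional entry in hidden_dims to arrive at the same 2 x 2 map.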

why is sample not implemented in vq vae

https://github.com/AntixK/PyTorch-VAE/blob/master/models/vq_vae.py#L216

any reason?

Image size

Hi, is there any way to change the image size from 64 to 128 easily?

Error

Hi,
When I run the code, I encounter the following error. Would you please let me know how I can fix this issue?

python run.py -c configs/wae_mmd_imq.yaml
Traceback (most recent call last):
File "run.py", line 44, in <module>
runner = Trainer(default_save_path=f"{tt_logger.save_dir}",
TypeError: __init__() got an unexpected keyword argument 'default_save_path'
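This kind of TypeError usually means the installed pytorch-lightning is newer than the release the scripts were written for, where some Trainer keywords have been renamed. Below is a minimal sketch of translating old keywords before constructing the Trainer. The helper `translate_trainer_kwargs` is hypothetical, and the mapping is an assumption that should be checked against the documentation of the installed version.

```python
# Old Trainer keyword -> newer name. Assumed mapping; verify it against the
# documentation of the pytorch-lightning version you have installed.
RENAMED_TRAINER_KWARGS = {
    "default_save_path": "default_root_dir",
    "max_nb_epochs": "max_epochs",
    "min_nb_epochs": "min_epochs",
}


def translate_trainer_kwargs(kwargs: dict) -> dict:
    """Return a copy of `kwargs` with legacy Trainer keywords renamed."""
    translated = {}
    for key, value in kwargs.items():
        new_key = RENAMED_TRAINER_KWARGS.get(key, key)
        if new_key in translated:
            raise ValueError(f"both old and new spelling given for {new_key!r}")
        translated[new_key] = value
    return translated


legacy = {"default_save_path": "logs/", "max_nb_epochs": 50}
print(translate_trainer_kwargs(legacy))
```

The alternative is to install the pinned version from requirements.txt instead of changing the call.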

Permutation Function for FactorVAE

According to the original paper by Kim et al., the permutation function permutes across the batch for each dimension. Here, if B, D = z.size(), then def permute_latent(self, z: Tensor) should permute z along the batch dimension B, i.e., z[i, j] = z[new_indices[i], j], where new_indices = torch.randperm(B).
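The indexing described above can be sketched in plain Python, using lists in place of tensors so that it runs without PyTorch. The sketch draws a fresh permutation for every latent dimension, which is one reading of "for each dimension"; a tensor version would presumably call torch.randperm(B) once per column. The free-function form and the `rng` argument are choices made for this illustration.

```python
import random


def permute_latent(z, rng=None):
    """Permute each latent dimension independently across the batch.

    `z` is a list of B rows, each a list of D floats. Column j of the result
    is column j of `z` reordered by its own random permutation of range(B).
    """
    rng = rng or random.Random()
    batch = len(z)
    dims = len(z[0]) if batch else 0
    out = [[0.0] * dims for _ in range(batch)]
    for j in range(dims):
        new_indices = list(range(batch))
        rng.shuffle(new_indices)  # fresh permutation for every dimension
        for i in range(batch):
            out[i][j] = z[new_indices[i]][j]
    return out


z = [[0.0, 10.0], [1.0, 11.0], [2.0, 12.0], [3.0, 13.0]]
print(permute_latent(z, random.Random(0)))
```

Each column keeps its own marginal distribution, while the pairing between columns within a row is broken.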

Dynamic batch size in Beta-tcvae

Hi, nice work,

I use a dynamic batch size in my training. Is it OK to train beta-tcvae with a dynamic batch size, given that the start_weight calculation depends on batch_size?

PyTorch-VAE/models/betatc_vae.py

Line 177 in 8700d24

dataset_size = (1 / kwargs['M_N']) * batch_size # dataset size
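To see what the quoted line does when the batch size varies, here is a small sketch. It assumes that M_N is built once from the configured batch size, while batch_size in the quoted line is the size of the current batch; both are assumptions about the surrounding code, and all numbers are hypothetical.

```python
def recovered_dataset_size(m_n: float, batch_size: int) -> float:
    """Mirror of the quoted line: dataset_size = (1 / M_N) * batch_size."""
    return (1 / m_n) * batch_size


num_train_imgs = 1000       # hypothetical true dataset size
config_batch_size = 64      # hypothetical batch size from the config file
m_n = config_batch_size / num_train_imgs

for actual_batch_size in (64, 32, 16):
    size = recovered_dataset_size(m_n, actual_batch_size)
    print(f"batch={actual_batch_size} recovered dataset_size={size:.1f}")
```

Under these assumptions the recovered dataset size matches the true one only when the current batch has the configured size, so passing the true dataset size directly would be the safer choice with dynamic batches.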

Input and decoder output value ranges

Thank you for sharing this repo!
In VanillaVAE (and maybe others), I was wondering about the choice of tanh as the final output activation (range [-1, 1]) without normalizing the input images to the same range [-1, 1] in the dataset transforms.
Wouldn't that make the reconstruction loss work much harder, as opposed to either normalizing the input or using a final activation like sigmoid?
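The first option, normalizing the input, amounts to an affine map. The sketch below assumes the images are in [0, 1] before normalization; the function names are hypothetical. In a torchvision pipeline the same scaling corresponds to a Normalize transform with mean 0.5 and std 0.5 per channel.

```python
def to_tanh_range(pixel: float) -> float:
    """Map a pixel from [0, 1] to [-1, 1], i.e. (x - mean) / std with mean = std = 0.5."""
    return (pixel - 0.5) / 0.5


def from_tanh_range(value: float) -> float:
    """Map a decoder output from [-1, 1] back to [0, 1], e.g. for display."""
    return value * 0.5 + 0.5


for pixel in (0.0, 0.25, 1.0):
    scaled = to_tanh_range(pixel)
    print(pixel, scaled, from_tanh_range(scaled))
```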

AttributeError: 'VAEXperiment' object has no attribute '_lazy_train_dataloader'

command: python run.py --config configs/vae.yaml
The following error messages then appear:

INFO:root:gpu available: True, used: True
INFO:root:VISIBLE GPUS: 0
======= Training VanillaVAE =======
3099it [00:00, 8467848.92it/s]
Using downloaded and verified file: ../../shared/Data/celeba/list_attr_celeba.txt
Using downloaded and verified file: ../../shared/Data/celeba/identity_CelebA.txt
Using downloaded and verified file: ../../shared/Data/celeba/list_bbox_celeba.txt
Using downloaded and verified file: ../../shared/Data/celeba/list_landmarks_align_celeba.txt
Using downloaded and verified file: ../../shared/Data/celeba/list_eval_partition.txt
3099it [00:00, 6145696.50it/s]
Using downloaded and verified file: ../../shared/Data/celeba/list_attr_celeba.txt
Using downloaded and verified file: ../../shared/Data/celeba/identity_CelebA.txt
Using downloaded and verified file: ../../shared/Data/celeba/list_bbox_celeba.txt
Using downloaded and verified file: ../../shared/Data/celeba/list_landmarks_align_celeba.txt
Using downloaded and verified file: ../../shared/Data/celeba/list_eval_partition.txt
Traceback (most recent call last):
File "/home/jbhuang/anaconda3/envs/vae2/lib/python3.7/site-packages/pytorch_lightning/core/decorators.py", line 17, in _get_data_loader
value = getattr(self, attr_name)
File "/home/jbhuang/anaconda3/envs/vae2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 591, in __getattr__
type(self).__name__, name))
AttributeError: 'VAEXperiment' object has no attribute '_lazy_train_dataloader'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/jbhuang/MyWork/vae/PyTorch-VAE/utils.py", line 17, in func_wrapper
return pl.data_loader(fn)(self)
File "/home/jbhuang/anaconda3/envs/vae2/lib/python3.7/site-packages/pytorch_lightning/core/decorators.py", line 20, in _get_data_loader
value = fn(self) # Lazy evaluation, done only once.
File "/home/jbhuang/MyWork/vae/PyTorch-VAE/experiment.py", line 143, in train_dataloader
download=True) #Bill
File "/home/jbhuang/anaconda3/envs/vae2/lib/python3.7/site-packages/torchvision/datasets/celeba.py", line 63, in __init__
self.download()
File "/home/jbhuang/anaconda3/envs/vae2/lib/python3.7/site-packages/torchvision/datasets/celeba.py", line 117, in download
with zipfile.ZipFile(os.path.join(self.root, self.base_folder, "img_align_celeba.zip"), "r") as f:
File "/home/jbhuang/anaconda3/envs/vae2/lib/python3.7/zipfile.py", line 1258, in __init__
self._RealGetContents()
File "/home/jbhuang/anaconda3/envs/vae2/lib/python3.7/zipfile.py", line 1325, in _RealGetContents
raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file

Please help, thank you.
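BadZipFile means the file on disk is not a valid zip archive. One possibility is that the download saved an error page or a partial file under the .zip name. A quick way to check, using only the standard library, is sketched below. The helper name `describe_archive` and the example path are hypothetical.

```python
import zipfile
from pathlib import Path


def describe_archive(path) -> str:
    """Report whether `path` looks like a real zip archive."""
    path = Path(path)
    if not path.is_file():
        return "missing"
    if zipfile.is_zipfile(path):
        return "zip"
    # Peek at the first bytes to spot an HTML page saved under a .zip name.
    with path.open("rb") as handle:
        head = handle.read(256).lstrip().lower()
    if head.startswith((b"<!doctype html", b"<html")):
        return "html page saved under a .zip name"
    return "not a zip"


# Hypothetical location; adjust it to your own data path.
print(describe_archive("celeba/img_align_celeba.zip"))
```

If the file turns out not to be a zip, deleting it and placing a manually downloaded archive at the same location is a common workaround.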

python 3.5 no longer supported

update README

integrate with Lightning ecosystem CI

Hello, and so happy to see you use PyTorch Lightning! 🎉
Just wondering if you have already heard about the quite new PyTorch Lightning (PL) ecosystem CI, which we would like to invite you to join. You can check out our blog post about it: Stay Ahead of Breaking Changes with the New Lightning Ecosystem CI ⚡
As you use the PL framework for your cool project, we would like to enhance your experience and offer you safe updates to our future releases. At the moment, you run tests with a particular PL version, but it may happen that the next version is incompatible with your project... 😕 We do not intend to change anything on our side, but we do have a solution: the ecosystem CI tests your project against our latest development head, so we can find incompatibilities very early and avoid releasing a bad version... 👍

What needs to be done?

have some tests, including PL integration
add config to ecosystem CI - https://github.com/PyTorchLightning/ecosystem-ci

What will you get?

scheduled nightly testing configured for development/stable versions
slack notification if something went wrong to investigate
testing also on multi-GPU machine as our gift to you 🐰

LibTorch VAE Implementation

Thank you for sharing your comprehensive and illuminating set of examples in this repository. I'm currently thinking of re-implementing a subset of these models, based on your Python implementations, using LibTorch, PyTorch's C++ frontend.
Provided I obtain some fruitful results, would you be interested in hosting some of those models here?

Nan in loss function in TC-Beta VAE

Hi,

I am running TC-Beta VAE on my data, and I changed my architecture to an MLP encoder and decoder. But I am getting NaN in the loss function, and it seems I am getting NaNs for log_importance_weights, log_q_z and log_prod_q_z. Should I just add an epsilon to each of these quantities before taking the log, or is there some other issue that I am missing?
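The actual cause cannot be determined from the description alone. One common source of NaN or inf in such terms is taking the log of a sum that has underflowed to zero. Staying in log space avoids that without an epsilon. The sketch below shows the max-subtraction trick in plain Python with illustrative values; in PyTorch, torch.logsumexp applies the same idea to tensors.

```python
import math


def logsumexp(values):
    """Stable log(sum(exp(v))) using the max-subtraction trick."""
    values = list(values)
    if not values:
        return -math.inf
    peak = max(values)
    if math.isinf(peak):
        return peak  # all -inf gives -inf, any +inf gives +inf
    return peak + math.log(sum(math.exp(v - peak) for v in values))


log_terms = [-1000.0, -1001.0, -1002.0]
naive_sum = sum(math.exp(v) for v in log_terms)  # underflows, so its log is undefined
print(naive_sum, logsumexp(log_terms))
```

It is also worth checking that every importance weight is strictly positive before its log is taken, since the log of a zero or negative weight produces -inf or NaN.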

Can't install requirements

Hey, cool repo. Just wanted to let you know that I got the error below, when I run pip install -r requirements.txt. Not sure if others get this.
My python version: 3.8.5
My pip version: 20.0.2

Collecting pytorch-lightning==0.6.0
  Using cached pytorch-lightning-0.6.0.tar.gz (95 kB)
Collecting PyYAML==5.1.2
  Using cached PyYAML-5.1.2.tar.gz (265 kB)
Collecting tensorboard==2.1.0
  Using cached tensorboard-2.1.0-py3-none-any.whl (3.8 MB)
Collecting tensorboardX==1.6
  Using cached tensorboardX-1.6-py2.py3-none-any.whl (129 kB)
Collecting terminado==0.8.1
  Using cached terminado-0.8.1-py2.py3-none-any.whl (33 kB)
Collecting test-tube==0.7.0
  Using cached test_tube-0.7.0.tar.gz (20 kB)
ERROR: Could not find a version that satisfies the requirement torch==1.2.0 (from -r requirements.txt (line 7)) (from versions: 0.1.2, 0.1.2.post1, 0.1.2.post2, 1.4.0, 1.5.0, 1.5.1, 1.6.0, 1.7.0, 1.7.1, 1.8.0, 1.8.1)
ERROR: No matching distribution found for torch==1.2.0 (from -r requirements.txt (line 7))

Why not use rsample() PyTorch?

Beta VAE: Strange input & output channels

Why use hidden_dims[-1]*4? This will cause a mismatch between the input channels and the output channels (if I got it right).
Thank you!

Solved!

Installation error

I'm new to PyTorch and would like to run your work, but when I set download=True in the appropriate places in experiment.py to download the CelebA dataset, I encounter this error:

Traceback (most recent call last):
  File "run.py", line 55, in <module>
    runner.fit(experiment)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 602, in fit
    self.single_gpu_train(model)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/pytorch_lightning/trainer/distrib_parts.py", line 470, in single_gpu_train
    self.run_pretrain_routine(model)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 796, in run_pretrain_routine
    self.reset_val_dataloader(ref_model)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/pytorch_lightning/trainer/data_loading.py", line 154, in reset_val_dataloader
    self.val_dataloaders = self.request_data_loader(model.val_dataloader)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/pytorch_lightning/trainer/data_loading.py", line 220, in request_data_loader
    data_loader = data_loader_fx()
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/pytorch_lightning/core/decorators.py", line 16, in inner_fx
    return fn(self)
  File "/home/ubuntu/PyTorch-VAE/experiment.py", line 161, in val_dataloader
    download=True),
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torchvision/datasets/celeba.py", line 63, in __init__
    self.download()
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torchvision/datasets/celeba.py", line 117, in download
    with zipfile.ZipFile(os.path.join(self.root, self.base_folder, "img_align_celeba.zip"), "r") as f:
  File "/home/ubuntu/anaconda3/lib/python3.6/zipfile.py", line 1108, in __init__
    self._RealGetContents()
  File "/home/ubuntu/anaconda3/lib/python3.6/zipfile.py", line 1175, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file

Running on Ubuntu Deep Learning AMI instance with torch==1.3.1 and torchvision==0.4.2.

Would appreciate any help you can give! Thanks a lot.

VQ VAE model's reconstruction is a black image

I am using your wonderful library in my research project. There seems to be a bug in the VQ VAE model class where the reconstruction is a blank image. Is it a known bug? Can you please help me with this issue?

In factor vae: an inplace problem occurred when using loss.backward

the error message is:

one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [4096, 6]], which is output 0 of TBackward, is at version 2; expected version 1 instead.

The problem seems to be that a term (self.D_z_reserve) used in D_tc_loss, which was computed during the vae_loss stage, was modified somehow.

D_tc_loss = 0.5 * (F.cross_entropy(self.D_z_reserve, false_labels) + F.cross_entropy(D_z_perm, true_labels))

Details:
I computed the VAE loss and updated the VAE first, like this:

 self.optim_VAE.zero_grad()
 vae_loss.backward(retain_graph=True)
 self.optim_VAE.step()

then when updating discriminator:

z = z.detach()
z_perm = self.permute_latent(z)
D_z_perm = self.D(z_perm)
D_tc_loss = 0.5 * (F.cross_entropy(self.D_z_reserve, false_labels) + F.cross_entropy(D_z_perm, true_labels))

self.optim_D.zero_grad()
D_tc_loss.backward()
self.optim_D.step()

The error occurs as described at the beginning.

When I delete the term F.cross_entropy(self.D_z_reserve, false_labels) from D_tc_loss,
or change D_tc_loss to

D_tc_loss = 0.5 * (F.cross_entropy(self.D_z_reserve.detach(), false_labels) + F.cross_entropy(D_z_perm, true_labels))

everything works.
But I'm not sure whether using .detach() here is fine, and I wonder what exactly the problem is. Waiting for your reply, thanks a lot.

Why is sampling N/A for VQ-VAE?

It should be possible to sample $z$ from the discrete latent space and then sample from $p(x|z)$
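A rough sketch of the first step, drawing discrete latents, is given below. It assumes a uniform prior over codebook indices, which is a simplification; the VQ-VAE paper instead fits an autoregressive prior over the indices after training. The decoder call that would turn the latents into an image is omitted, and the helper name and the toy codebook are hypothetical.

```python
import random


def sample_code_grid(codebook, height, width, rng=None):
    """Draw a height x width grid of codebook vectors using uniform random indices."""
    rng = rng or random.Random()
    num_codes = len(codebook)
    indices = [[rng.randrange(num_codes) for _ in range(width)] for _ in range(height)]
    latents = [[codebook[k] for k in row] for row in indices]
    return indices, latents


# Hypothetical codebook: 4 embeddings of dimension 2.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
indices, latents = sample_code_grid(codebook, height=2, width=3, rng=random.Random(0))
print(indices)
```

Samples drawn this way are usually poor, because neighbouring codes are chosen independently. That may be one reason why a plain sample method is not provided.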

Val loss mismatch with train loss

Hi, thanks for this repository. I noticed while running that the train loss numbers are around <0.1 after an epoch, but the validation loss ranges from 20-30 (for Vanilla VAE, but the mismatch holds across a few other models with the same base like CVAE). I think this mismatch was introduced by #2 - the division for M_N now uses a different denominator in train vs val. The original intent was to get around val being run for the first time before train in the newer versions of lightning, but I don't think this is correct. Instead, one workaround is to run Trainer.fit with num_sanity_val_steps=0; this way num_train_steps is set before validation is run, so the train loss and val loss are back on the same scale. By doing this, I get similar numbers for both.

Am I misinterpreting something? Please let me know if I'm incorrect/the train and val losses should be very different. I'm not sure I understand the different scaling. Although the previous fix gets the two terms to be comparable again, I'm not really sure why we reweight the KL term in the first place - any insight would be appreciated.

KLD Weight

Hi,

In the VAE paper (https://arxiv.org/pdf/1312.6114.pdf), the VAE loss function has no additional weight parameter for the KLD loss:

However, in the implementation of the Vanilla VAE model, the loss function is written as below:

loss = recons_loss + kld_weight * kld_loss

When I set "kld_weight" to 1 in my model, it could not learn how to reconstruct the images. If I understand correctly, the "kld_weight" reduces the effect of the KLD loss to balance it with the reconstruction loss. However, as I mentioned, it is not defined in the VAE paper. Could anyone please explain to me why this parameter is used and why it is set to 0.00025 by default?
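One possible reading is that the two terms live on different scales: if the reconstruction loss is averaged per pixel while the KLD is summed over the latent dimensions, an unweighted KLD dominates. The sketch below uses the closed-form KLD against a standard normal prior together with the weighted sum quoted above. The reduction of recons_loss and all numbers are assumptions for illustration.

```python
import math


def kld_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return -0.5 * sum(1 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


def vae_loss(recons_loss, mu, log_var, kld_weight):
    """Weighted objective from the issue: recons_loss + kld_weight * kld_loss."""
    return recons_loss + kld_weight * kld_standard_normal(mu, log_var)


# Hypothetical numbers: a small per-pixel MSE and a 128-dimensional latent.
mu = [0.5] * 128
log_var = [-1.0] * 128
recons_loss = 0.02
for weight in (1.0, 0.00025):
    print(weight, vae_loss(recons_loss, mu, log_var, weight))
```

With a weight of 1 the optimizer gains far more from shrinking the KLD than from improving reconstruction, which is consistent with the behaviour described above.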

CIFAR-10 results not good

I am trying to reproduce published results on the CIFAR-10 dataset. My results currently do not look good using the default parameters for e.g. Vanilla VAE (and others). I.e. the model learns something very blurry, similar to e.g. https://bjlkeng.github.io/images/m2_images.png

Any suggestions on how to improve these? For example, I noticed that reducing the weight of the KL loss already makes a big difference.

Issues with newest version of Lightning (0.7.0 & 0.7.1)

I've been running into this error when using your package with the newest master version of Lightning. Running this command:

python run.py -c configs/cvae.yaml

ultimately results in this error message:
AttributeError: 'VAEXperiment' object has no attribute 'num_train_imgs'

This is with Python 3.7.

When I downgrade to Lightning version 0.6.0 the same command works.

Setting for gamma and max_capacity in Beta_VAE

Hi,
When using beta_VAE for my own dataset, I'm not sure how I could set values for gamma and max_capacity. Should I just use the default one? Or is there any rule for setting them? Does anyone have a sense or explanation of this? Thank you!

Some questions regarding decoder in vanilla_vae

Hello @AntixK !

Thanks for sharing this helpful repository.

I am a beginner in VAE implementation and hence, had a confusion related to vanilla_vae:

In the self.final_layer, why is nn.ConvTranspose2d created with both in_channels and out_channels as hidden_dims[-1] i.e. 32 followed later by nn.Conv2d? Can we not have out_channels as 3 directly in nn.ConvTranspose2d and omit nn.Conv2d?

Could you please provide me insight on these?

Thank you, and have a nice day!

Why is img_size default to 64?

Hi, I was wondering why the img_size parameter for experiments is set to 64, as the CelebA dataset images are much larger.

I am trying to experiment with this on a different dataset whose images are larger than CelebA's (512 x 512), and I was wondering if I should change img_size to get better reconstructed images. I tried changing it myself, but I ran into size mismatch issues even though I don't see where the size of 64 comes into play in the model file.

Any help would be much appreciated

KL calculation in ladder VAE

Why for the KL calculation did you take the mean and variance of the encoder and not the ladder_block output?
The "prior" part is the mean and variance from the ladder_block.
As it can be seen from the next snapshot:

The generated samples are not great for WAE_MMD

Hi,
Friends, how is the quality of your generated samples? Mine are not good-looking; I show some of them below.

Generated samples:

Reconstructed samples:

The reconstruction fails

Hello,

I used VanillaVAE to reconstruct game images, but I failed to do that.

You can see the images: the first row shows the original images, while the lower row shows the reconstructed images.

The background of the image can be perfectly reconstructed, but the key object cannot be reconstructed. Do you have any suggestion?

Thank you!

running program on macOS

Hi, I am new to Python and to programming in general, so managing folders and packages is still a bit messy for me. While running the code, I got the following error.
pytorch_lightning.utilities.debugging.MisconfigurationException:
You requested GPUs: [0]
But your machine only has: []
May I ask whether there is any way to solve it? Thanks a lot.

Sign issue in gaussian kernel function of MS-SSIM-VAE

The computation of the gaussian kernel is missing a sign in the exponent:

PyTorch-VAE/models/mssim_vae.py

Line 204 in 8700d24

kernel = torch.tensor([exp((x - window_size // 2)**2/(2 * sigma ** 2))
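For comparison, a Gaussian window with the negative sign in the exponent can be sketched in plain Python. It uses a list rather than a tensor so that it runs without PyTorch. The normalization step and the example values for window_size and sigma are assumptions, since the quoted line is truncated.

```python
from math import exp


def gaussian_window(window_size: int, sigma: float):
    """1-D Gaussian window centred at window_size // 2, normalised to sum to 1."""
    center = window_size // 2
    weights = [exp(-((x - center) ** 2) / (2 * sigma ** 2)) for x in range(window_size)]
    total = sum(weights)
    return [w / total for w in weights]


window = gaussian_window(window_size=11, sigma=1.5)
print([round(w, 4) for w in window])
```

Without the minus sign the weights grow with distance from the centre, so the window would emphasise the borders instead of the middle.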

KL Weight

Hi!

Maybe it's a silly question, but why do you use a KL weight term? I understand that it's the fraction of the total dataset that a batch represents. For instance, if there are 100 observations and the batch size is 10, the kl_weight should be 0.1, but why do you use it? I've looked at some other implementations and didn't find it. I'm sure there's a reason, but I can't see why you weight just the KL divergence and not the reconstruction loss.

Thank you so much! :)

Dynamic Range in MS-SSIM-Loss

Why do you compute the dynamic range in the MS-SSIM VAE from the data range of the reconstructed images? If I understand the original SSIM paper correctly, the dynamic range should be the largest value that the images might assume (e.g. 1.0).
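In the SSIM definition, the dynamic range L enters only through the two stabilising constants. A small sketch with the commonly used k1 = 0.01 and k2 = 0.03 is shown below; the function name is hypothetical.

```python
def ssim_constants(dynamic_range: float, k1: float = 0.01, k2: float = 0.03):
    """Stabilising constants C1 = (k1 * L)^2 and C2 = (k2 * L)^2 of SSIM."""
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2
    return c1, c2


print(ssim_constants(1.0))    # images scaled to [0, 1]
print(ssim_constants(2.0))    # images scaled to [-1, 1]
print(ssim_constants(255.0))  # 8-bit images
```

With a fixed L the constants stay the same for every batch, whereas a range estimated from the reconstructions changes as the decoder output changes during training.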

Torch Issue

I am trying to run this repo for the first time. I am getting the following error. Torch is installed and I am able to import torch outside of this script. Has anyone experienced a similar issue?

Traceback (most recent call last):
  File "run.py", line 5, in <module>
    from models import *
  File "D:\github\PyTorch-VAE\models\__init__.py", line 1, in <module>
    from .base import *
  File "D:\github\PyTorch-VAE\models\base.py", line 2, in <module>
    from torch import nn
ModuleNotFoundError: No module named 'torch'

Dataset CIFAR-10

Hi @AntixK,
Thank you for sharing your project with us.
I have a doubt. How do I use another dataset? I would like to use the CIFAR-10 collection. I changed the experiments file, but give the following error:

File "/home/josi/doutorado-2019/PyTorch-VAE/experiment.py", line 143, in train_dataloader
download=True)
TypeError: __init__() got an unexpected keyword argument 'split'

Could you help me?
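The TypeError is consistent with torchvision's CIFAR10 class selecting its subset through a boolean train argument rather than a split string as CelebA does. A minimal sketch of translating one into the other is shown below. The helper `cifar10_kwargs` is hypothetical, and the code only builds the keyword dictionary, so it runs without torchvision.

```python
def cifar10_kwargs(split: str, root: str, download: bool = True) -> dict:
    """Translate a CelebA-style `split` string into CIFAR10 constructor keywords."""
    if split not in ("train", "test"):
        raise ValueError("CIFAR-10 only has 'train' and 'test' subsets")
    return {"root": root, "train": split == "train", "download": download}


print(cifar10_kwargs("train", root="data/"))
# Intended use (requires torchvision):
#   CIFAR10(transform=my_transform, **cifar10_kwargs("test", root="data/"))
```

CIFAR-10 images are 32 x 32, so img_size and the number of encoder layers probably need adjusting as well.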

Update to Latest Pytorch_Lightning

When I try to run this code, I get many errors that seem to be related to deprecated options in pytorch_lightning, such as the max_nb_epochs option or the pytorch_lightning.logging module (which was replaced by the pytorch_lightning.loggers module).
Lightning-AI/pytorch-lightning#663

Will this repo be updated to use the latest version of pytorch_lightning? I'm having a difficult time getting things to work.

Thank you,
Ryan

Custom dataset.

Is it possible to use these models on our own custom dataset?

Linear layer weight init question

Thanks for your code. I encountered a problem when running the program: because of how the linear layers are initialized, every dimension of the learned hidden vector is very small. Is there a suggested initialization method that achieves better results?

Modification for Test

A line self.save_hyperparameters() needs to be added in

PyTorch-VAE/experiment.py

Line 16 in 8700d24

def __init__(self,

initialization to save hyperparameters (i.e., vae_model and params here), for successfully calling LightningModule.load_from_checkpoint(PATH) and runner.test() afterwards.

Dataset not found (running in a google colab sheet)

Hi, I'm new to pytorch and VAEs in general. I attempted to run your VanillaVAE but I can't figure out how to reference the dataset inside the config file. Specifically, this is my folder structure:

and my config looks like this:

exp_params:
  dataset: celeba
  data_path: "/content/drive/MyDrive/Colab/celeba"
  img_size: 64
  batch_size: 144 # Better to have a square number
  LR: 0.005
  weight_decay: 0.0
  scheduler_gamma: 0.95

but still the error remains the same:
RuntimeError: Dataset not found or corrupted. You can use download=True to download it

Apart from the fact that using download=True doesn't work (looks like it attempts to download an invalid dataset), is there anything to take into account like unzipping the archive img_align_celeba.zip or stuff like that?

Why not use rsample() PyTorch

Is there a reason for manually implementing reparameterization instead of using the rsample method provided in PyTorch?
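For context, both approaches compute the same thing. torch.distributions.Normal(mu, std).rsample() draws standard normal noise and returns loc + eps * scale, which is what the manual version spells out. The arithmetic is sketched below in plain Python so that it runs without PyTorch; the `rng` argument is added only for reproducibility.

```python
import math
import random


def reparameterize(mu, log_var, rng=None):
    """z = mu + std * eps with eps ~ N(0, 1), where std = exp(0.5 * log_var)."""
    rng = rng or random.Random()
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


mu = [0.0, 2.0]
log_var = [0.0, -4.0]
print(reparameterize(mu, log_var, random.Random(0)))
```

The manual form works directly with the log-variance the encoder outputs, whereas rsample first needs std = exp(0.5 * log_var) to build the distribution object. The choice is therefore mostly a matter of style.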

Vanilla-VAE and usage of kld_weight

Hi @AntixK

Many thanks for this great effort.

Based on my understanding so far, the original VAE paper does not talk about weighting the kl_divergence_loss. Later, beta-VAE and many other papers made the case for weighting the kl_div (essentially treating the weight as a hyperparameter).

In your implementations, I see that you consistently use kld_weight = kwargs['M_N'] = batch_size/num_of_images.

Is it the norm to select the weight for the kl_div loss as the ratio of the batch size to the number of images?

Since no weighting was done in the original VAE paper, is it okay to use it in vanilla_vae.py?

Regards
Kapil

Sampling Vanilla VAE

Hi,
First, thanks for all the shared work!
I have a question concerning the sampling function in the Vanilla VAE. Why do you sample from a standard normal distribution N(0, 1) and not from a normal distribution with the learned parameters mu and sigma? Since during training we decode latent vectors drawn from that learned distribution, isn't it more meaningful to sample from it? Maybe there is something I didn't get.
Thank you again

Problems of IWAE ELBO Loss

Hi Anand and all,

Since it acts as a weighting of the samples, weight should be detached from the current computational graph to obtain the expected optimization objective, right? See

PyTorch-VAE/models/iwae.py

Line 155 in 8700d24

weight = F.softmax(log_weight, dim = -1)
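The reasoning behind the question can be written out as follows, assuming log_weight holds the log importance weights log w_k along its last dimension:

```latex
\begin{aligned}
\mathcal{L}_K &= \log \frac{1}{K} \sum_{k=1}^{K} w_k,
\qquad
\tilde{w}_k = \frac{w_k}{\sum_{j=1}^{K} w_j} = \operatorname{softmax}(\log w)_k, \\
\nabla \mathcal{L}_K
  &= \frac{\sum_{k=1}^{K} \nabla w_k}{\sum_{j=1}^{K} w_j}
   = \sum_{k=1}^{K} \tilde{w}_k \, \nabla \log w_k .
\end{aligned}
```

A surrogate of the form $\sum_k \tilde{w}_k \log w_k$ therefore reproduces this gradient only when $\tilde{w}_k$ is treated as a constant. If the softmax output stays in the graph, the product rule adds the extra term $\sum_k (\nabla \tilde{w}_k) \log w_k$, which is not part of the IWAE gradient.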