sidward14 / style-attngan

Improves text-to-image synthesis from AttnGAN by integrating the scale-specific control from StyleGAN; can optionally use GPT-2 as the text encoder

License: Other

Python 100.00%

style-attngan's People

Contributors

sidward14

style-attngan's Issues

flowers dataset

Can I use the Oxford 102 flowers dataset instead of the birds dataset with this code?
If yes, can you explain how to do this?

Word features and sentence feature

If GPT-2 is used instead of the LSTM as the text encoder, how are the word features and the sentence feature computed? Could you explain in more detail?
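
A hedged sketch of one plausible way to get both feature types out of GPT-2 (an assumption for illustration, not necessarily how this repo computes them): use the per-token hidden states as the word features and pool them, e.g. by averaging, for the sentence feature. The "gpt2" checkpoint name and the mean pooling are assumptions.

import torch
from transformers import GPT2Model, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")   # assumed small GPT-2 checkpoint
model = GPT2Model.from_pretrained("gpt2")

tokens = tokenizer("this bird has a red crown and a white belly", return_tensors="pt")
with torch.no_grad():
    hidden = model(**tokens).last_hidden_state      # (1, seq_len, 768) per-token states

words_emb = hidden.transpose(1, 2)                  # (1, 768, seq_len): word features in AttnGAN layout
sent_emb = hidden.mean(dim=1)                       # (1, 768): sentence feature via mean pooling (assumption)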

load model

Hello, how can I use a saved model from output/model/?
I'm running the code on Google Colab, so I have to save the model after some epochs.
Please guide me on how to do this.
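
In case it helps, a minimal sketch of reloading a saved generator with plain PyTorch, assuming the generator class in code/model.py can be constructed without arguments and using an example checkpoint name (adjust both to whatever trainer.py actually saves under output/model/):

import torch
from model import G_NET   # generator class from code/model.py

checkpoint_path = "output/model/netG_epoch_600.pth"   # example name; use whichever epoch you saved
netG = G_NET()
netG.load_state_dict(torch.load(checkpoint_path, map_location="cpu"))
netG.eval()   # switch to inference mode before sampling images

Alternatively, pointing the eval config's NET_G-style path at the saved file (as in the upstream AttnGAN setup) may accomplish the same thing without extra code.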

Custom Dataset

Hi, I have seen your instructions for using a custom dataset. However, when I tried to execute pretrain_DAMSM.py, the program required a captions.pickle file in the dataset path. It seems I need to do some data preprocessing before training. Do you have any ideas? Thanks!
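
In the meantime, a hedged sketch of the layout AttnGAN-derived data loaders usually expect in captions.pickle (a list of [train_captions, test_captions, ixtoword, wordtoix], where each caption is a list of word indices); this is based on the upstream AttnGAN loader and is only an assumption about this repo, and the dataset path below is hypothetical:

import pickle

train_captions = [[1, 2, 3, 4], [1, 5, 3, 6]]   # toy example: each entry is one tokenized caption
test_captions  = [[1, 2, 3, 6]]
ixtoword = {0: "<end>", 1: "this", 2: "bird", 3: "is", 4: "red", 5: "flower", 6: "yellow"}
wordtoix = {w: i for i, w in ixtoword.items()}

with open("data/custom/captions.pickle", "wb") as f:   # hypothetical dataset path
    pickle.dump([train_captions, test_captions, ixtoword, wordtoix], f, protocol=2)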

About a dimension problem when training the GAN

In D_GET_LOGITS.forward,
h_c_code = torch.cat((h_code, c_code), 1)
raises an error:

RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 8 but got size 4 for tensor number 1 in the list.

I don't know the reason.
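
For context, a small self-contained reproduction of that message (the shapes below are made up for illustration): torch.cat requires every dimension except the concatenation axis to match, so an 8x8 feature map cannot be concatenated channel-wise with a 4x4 conditioning code.

import torch

h_code = torch.randn(2, 512, 8, 8)   # discriminator feature map at one scale (example shape)
c_code = torch.randn(2, 256, 4, 4)   # conditioning code tiled to the wrong spatial size

# RuntimeError: Sizes of tensors must match except in dimension 1.
# Expected size 8 but got size 4 for tensor number 1 in the list.
h_c_code = torch.cat((h_code, c_code), 1)

In practice a mismatch like this often comes from feeding a discriminator an image scale it was not built for, so checking the image-size/branch settings in the config is a reasonable first step.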

Which epoch of the text encoder and image encoder to use?

Hello! I pre-trained the DAMSM + GPT-2 model for 600 epochs, and I have saved checkpoints for the text encoder and image encoder from text_encoder50.pth / image_encoder50.pth up to text_encoder550.pth / image_encoder550.pth. Which epoch should I use for both the text encoder and the image encoder when training the GAN?


Working of GPT-2 not clear

Whenever I try to train or evaluate with the transformer flag on the command line, I get an error. It would be great if an example were provided of how to train with the GPT-2 transformer, along with the expected results.
Error:
Traceback (most recent call last):
File "main.py", line 158, in <module>
algo.train()
File "/content/drive/My Drive/pj/Style-AttnGAN-master/code/trainer.py", line 311, in train
text_encoder, image_encoder, netG, netsD, start_epoch = self.build_models()
File "/content/drive/My Drive/pj/Style-AttnGAN-master/code/trainer.py", line 79, in build_models
text_encoder.load_state_dict(state_dict)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1407, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for GPT2Model:
Missing key(s) in state_dict: "wte.weight", "wpe.weight", "h.0.ln_1.weight", "h.0.ln_1.bias", "h.0.attn.bias", "h.0.attn.masked_bias", "h.0.attn.c_attn.weight", "h.0.attn.c_attn.bias", "h.0.attn.c_proj.weight", "h.0.attn.c_proj.bias", "h.0.ln_2.weight", "h.0.ln_2.bias", "h.0.mlp.c_fc.weight", "h.0.mlp.c_fc.bias", "h.0.mlp.c_proj.weight", "h.0.mlp.c_proj.bias", "h.1.ln_1.weight", "h.1.ln_1.bias", "h.1.attn.bias", "h.1.attn.masked_bias", "h.1.attn.c_attn.weight", "h.1.attn.c_attn.bias", "h.1.attn.c_proj.weight", "h.1.attn.c_proj.bias", "h.1.ln_2.weight", "h.1.ln_2.bias", "h.1.mlp.c_fc.weight", "h.1.mlp.c_fc.bias", "h.1.mlp.c_proj.weight", "h.1.mlp.c_proj.bias", "h.2.ln_1.weight", "h.2.ln_1.bias", "h.2.attn.bias", "h.2.attn.masked_bias", "h.2.attn.c_attn.weight", "h.2.attn.c_attn.bias", "h.2.attn.c_proj.weight", "h.2.attn.c_proj.bias", "h.2.ln_2.weight", "h.2.ln_2.bias", "h.2.mlp.c_fc.weight", "h.2.mlp.c_fc.bias", "h.2.mlp.c_proj.weight", "h.2.mlp.c_proj.bias", "h.3.ln_1.weight", "h.3.ln_1.bias", "h.3.attn.bias", "h.3.attn.masked_bias", "h.3.attn.c_attn.weight", "h.3.attn.c_attn.bias", "h.3.attn.c_proj.weight", "h.3.attn.c_proj.bias", "h.3.ln_2.weight", "h.3.ln_2.bias", "h.3.mlp.c_fc.weight", "h.3.mlp.c_fc.bias", "h.3.mlp.c_proj.weight", "h.3.mlp.c_proj.bias", "h.4.ln_1.weight", "h.4.ln_1.bias", "h.4.attn.bias", "h.4.attn.masked_bias", "h.4.attn.c_attn.weight", "h.4.attn.c_attn.bias", "h.4.attn.c_proj.weight", "h.4.attn.c_proj.bias", "h.4.ln_2.weight", "h.4.ln_2.bias", "h.4.mlp.c_fc.weight", "h.4.mlp.c_fc.bias", "h.4.mlp.c_proj.weight", "h.4.mlp.c_proj.bias", "h.5.ln_1.weight", "h.5.ln_1.bias", "h.5.attn.bias", "h.5.attn.masked_bias", "h.5.attn.c_attn.weight", "h.5.attn.c_attn.bias", "h.5.attn.c_proj.weight", "h.5.attn.c_proj.bias", "h.5.ln_2.weight", "h.5.ln_2.bias", "h.5.mlp.c_fc.weight", "h.5.mlp.c_fc.bias", "h.5.mlp.c_proj.weight", "h.5.mlp.c_proj.bias", "h.6.ln_1.weight", "h.6.ln_1.bias", "h.6.attn.bias", "h.6.attn.masked_bias", "h.6.attn.c_attn.weight", "h.6.attn.c_attn.bias", "h.6.attn.c_proj.weight", "h.6.attn.c_proj.bias", "h.6.ln_2.weight", "h.6.ln_2.bias", "h.6.mlp.c_fc.weight", "h.6.mlp.c_fc.bias", "h.6.mlp.c_proj.weight", "h.6.mlp.c_proj.bias", "h.7.ln_1.weight", "h.7.ln_1.bias", "h.7.attn.bias", "h.7.attn.masked_bias", "h.7.attn.c_attn.weight", "h.7.attn.c_attn.bias", "h.7.attn.c_proj.weight", "h.7.attn.c_proj.bias", "h.7.ln_2.weight", "h.7.ln_2.bias", "h.7.mlp.c_fc.weight", "h.7.mlp.c_fc.bias", "h.7.mlp.c_proj.weight", "h.7.mlp.c_proj.bias", "h.8.ln_1.weight", "h.8.ln_1.bias", "h.8.attn.bias", "h.8.attn.masked_bias", "h.8.attn.c_attn.weight", "h.8.attn.c_attn.bias", "h.8.attn.c_proj.weight", "h.8.attn.c_proj.bias", "h.8.ln_2.weight", "h.8.ln_2.bias", "h.8.mlp.c_fc.weight", "h.8.mlp.c_fc.bias", "h.8.mlp.c_proj.weight", "h.8.mlp.c_proj.bias", "h.9.ln_1.weight", "h.9.ln_1.bias", "h.9.attn.bias", "h.9.attn.masked_bias", "h.9.attn.c_attn.weight", "h.9.attn.c_attn.bias", "h.9.attn.c_proj.weight", "h.9.attn.c_proj.bias", "h.9.ln_2.weight", "h.9.ln_2.bias", "h.9.mlp.c_fc.weight", "h.9.mlp.c_fc.bias", "h.9.mlp.c_proj.weight", "h.9.mlp.c_proj.bias", "h.10.ln_1.weight", "h.10.ln_1.bias", "h.10.attn.bias", "h.10.attn.masked_bias", "h.10.attn.c_attn.weight", "h.10.attn.c_attn.bias", "h.10.attn.c_proj.weight", "h.10.attn.c_proj.bias", "h.10.ln_2.weight", "h.10.ln_2.bias", "h.10.mlp.c_fc.weight", "h.10.mlp.c_fc.bias", "h.10.mlp.c_proj.weight", "h.10.mlp.c_proj.bias", "h.11.ln_1.weight", "h.11.ln_1.bias", "h.11.attn.bias", "h.11.attn.masked_bias", "h.11.attn.c_attn.weight", 
"h.11.attn.c_attn.bias", "h.11.attn.c_proj.weight", "h.11.attn.c_proj.bias", "h.11.ln_2.weight", "h.11.ln_2.bias", "h.11.mlp.c_fc.weight", "h.11.mlp.c_fc.bias", "h.11.mlp.c_proj.weight", "h.11.mlp.c_proj.bias", "ln_f.weight", "ln_f.bias".
Unexpected key(s) in state_dict: "encoder.weight", "rnn.weight_ih_l0", "rnn.weight_hh_l0", "rnn.bias_ih_l0", "rnn.bias_hh_l0", "rnn.weight_ih_l0_reverse", "rnn.weight_hh_l0_reverse", "rnn.bias_ih_l0_reverse", "rnn.bias_hh_l0_reverse".
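
The unexpected keys ("encoder.weight", "rnn.*") look like they come from the LSTM-based RNN_ENCODER rather than GPT-2, i.e. the text-encoder checkpoint referenced by the config was pretrained without the transformer option. A hedged sketch of how one might verify this before training (the checkpoint path is an example and the key prefixes are inferred from the error message above):

import torch

state_dict = torch.load("DAMSMencoders/bird/text_encoder200.pth", map_location="cpu")   # example path

if any(k.startswith(("rnn.", "encoder.")) for k in state_dict):
    print("LSTM text-encoder weights: pretrain DAMSM with the GPT-2 option enabled, "
          "or drop the transformer flag, so the architecture matches the checkpoint.")
else:
    print("Keys look like a GPT-2 state_dict; loading into GPT2Model should work.")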

The COCO dataset

Have you trained this repo on the COCO dataset? Can this repo be trained on COCO? Thanks.

Styled netG, Unstyled Images?

Hi Sidhartha --

We're doing a final project for a class at NCSU. We used your adaptation of the model to generate images from the Oxford 102 flowers dataset. It's great, thanks a lot!

We've trained the model using the style netG.

Is there any way now to get un-styled images without retraining? I.e., we have a netG.pth that produces un-styled images at 64 and 128 resolution and styled images at 256. I'd also like to generate the un-styled 256 images. Is there a way to do this without re-training with GAN.B_STYLEGEN = False?

(We all have weak computers and we couldn't get the code to run on the school servers, so we had to rent a server to run it on, which is why we don't want to re-train.)

(I think I saw un-styled 256 images among the attention-map images in the training output, though I suspect those are actually styled; they certainly look different from the 64/128 images. Maybe this question is way out in left field.)

Thanks!
Dennis

Error when pretraining the DAMSM model

Hello @sidward14, thanks for your work. I followed the same steps for preparing the bird dataset (downloaded and unzipped the file) and used the command to pretrain the DAMSM model for the bird dataset, but I still got this error:

File "Style-AttnGAN\code\datasets.py", line 87, in get_imgs
re_img = transforms.Resize(imsize[i])(img)
IndexError: list index out of range

Do you know how to fix this error? Thanks for your help!
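
For what it's worth, a hedged sketch of how AttnGAN-style datasets typically build their imsize list (assuming the upstream cfg.TREE.BASE_SIZE / cfg.TREE.BRANCH_NUM convention; this repo may differ): the IndexError usually means get_imgs asks for more scales than BRANCH_NUM provides, which points at a DAMSM config whose tree settings don't match what the data loader expects.

# Minimal sketch under the assumption of AttnGAN-style TREE.BASE_SIZE / TREE.BRANCH_NUM settings.
base_size = 299    # typical DAMSM pretraining base size (assumption)
branch_num = 1     # DAMSM pretraining usually uses a single scale (assumption)

imsize = [base_size * (2 ** i) for i in range(branch_num)]
print(imsize)      # [299]; accessing imsize[1] here raises "list index out of range"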
