Hi, thanks for providing the code. However, I am confused regarding the embedding laye

Is the embedding model trainable during the training process? about diffusion-lm HOT 3 OPEN

xiangli1999 commented on July 24, 2024

Is the embedding model trainable during the training process?

from diffusion-lm.

Comments (3)

XiangLi1999 commented on July 24, 2024

Hi,

Thanks for the question, and for carefully studying the code!

We have experimented with various ways of initializing the word embeddings. When training_mode='emb', it means random initialization; when training_mode='e2e', it means training end-to-end. For all the main experiments in the paper (except for the ablations), we use --training_mode = 'e2e' to train the embeddings end-to-end. Inside the training code, the embedding step happens here:

Diffusion-LM/improved-diffusion/improved_diffusion/gaussian_diffusion.py

Line 1470 in 759889d

x_start_mean = model.model.module.get_embeds(input_ids)

This line uses the get_embeds(input_ids) method of Diffusion-LM.
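To illustrate what training the embeddings end-to-end implies, here is a minimal sketch, not the repository's code. TinyModel, its sizes, and the placeholder loss are hypothetical; only the names get_embeds, word_embedding, input_ids, and x_start_mean come from the thread. It shows that when the embedding lookup is part of the computation graph, the embedding weights receive gradients for the tokens that appear in the batch.

```python
import torch as th
import torch.nn as nn


class TinyModel(nn.Module):
    """Hypothetical stand-in; not the actual Diffusion-LM model."""

    def __init__(self, vocab_size=10, in_channels=4):
        super().__init__()
        # A plain embedding table; its weight is a trainable parameter by default.
        self.word_embedding = nn.Embedding(vocab_size, in_channels)

    def get_embeds(self, input_ids):
        return self.word_embedding(input_ids)


th.manual_seed(0)
model = TinyModel()
input_ids = th.tensor([[1, 2, 3]])

# Same call pattern as the quoted line, minus the wrapper modules.
x_start_mean = model.get_embeds(input_ids)

# Placeholder loss (an assumption); the real objective is the diffusion loss.
loss = x_start_mean.pow(2).mean()
loss.backward()

# Gradients reach the rows for tokens 1, 2, 3; unused rows get zero gradient.
grad = model.word_embedding.weight.grad
```

An optimizer built over model.parameters() would then update those rows, which is what makes the embeddings trainable in this sketch.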

For decoding, we actually load the trained embedding, as shown in

Diffusion-LM/improved-diffusion/scripts/text_sample.py

Line 90 in 759889d

if args.training_mode.startswith('e2e'):

When training_mode='e2e', we overwrite the embeddings with the trained ones by setting model2.weight = th.nn.Parameter(model.word_embedding.weight.clone().cpu()).
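The overwrite described above can be sketched as follows. This is a hedged illustration under assumptions: the thread does not say what kind of module model2 is, so it is modeled here as a hypothetical nn.Embedding, and the nearest-neighbor lookup at the end is only one plausible way to map vectors back to tokens, not necessarily what text_sample.py does.

```python
import torch as th
import torch.nn as nn

vocab_size, in_channels = 10, 4  # hypothetical sizes
th.manual_seed(0)

# Stands in for the embedding learned during training.
word_embedding = nn.Embedding(vocab_size, in_channels)

# Hypothetical decoding-side module whose weights get overwritten.
model2 = nn.Embedding(vocab_size, in_channels)
model2.weight = th.nn.Parameter(word_embedding.weight.clone().cpu())

# clone() makes an independent copy: same values, separate storage.
same_values = th.equal(model2.weight, word_embedding.weight)

# Assumed decoding step: map each vector to the nearest embedding row.
ids = th.tensor([3, 7])
vectors = model2(ids)
distances = th.cdist(vectors, model2.weight)  # shape (2, vocab_size)
decoded = distances.argmin(dim=-1)
```

Because the copy is made with clone(), later changes to one tensor do not affect the other.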

Hope this helps.


smiles724 commented on July 24, 2024

Thanks for your reply. I got a better understanding of the code from your response. I believe your code would be more readable if you could explain it more! Previously, I thought 'e2e' meant 'English2English' (forgive me).


smiles724 commented on July 24, 2024

However, I wonder why you loaded the weight of 'word_embedding' into the weight of 'lm_head'.

As far as I know, the dimension of 'word_embedding' is (vocab_size, in_channels), while the dimension of 'lm_head' is (in_channels, vocab_size). Should the parameters of 'lm_head' be learnable instead of using the same weight as 'word_embedding'?

Can you please give me some hints regarding this implementation? Thanks a lot.
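One PyTorch detail may be relevant to the shape question. This is a sketch under assumptions: it assumes lm_head is an nn.Linear(in_channels, vocab_size) and word_embedding is an nn.Embedding, which the thread does not confirm, and the sizes are hypothetical. nn.Linear stores its weight as (out_features, in_features), so under that assumption both weights have shape (vocab_size, in_channels) and can be shared directly.

```python
import torch as th
import torch.nn as nn

vocab_size, in_channels = 10, 4  # hypothetical sizes

word_embedding = nn.Embedding(vocab_size, in_channels)
lm_head = nn.Linear(in_channels, vocab_size)  # assumed layer type

# nn.Linear keeps its weight as (out_features, in_features).
emb_shape = tuple(word_embedding.weight.shape)
head_shape = tuple(lm_head.weight.shape)

# Weight tying: both modules now refer to the same parameter.
lm_head.weight = word_embedding.weight

# The head maps in_channels-dimensional vectors to vocab_size logits.
logits = lm_head(th.randn(2, in_channels))
```

With tied weights, each logit is the dot product between the input vector and one embedding row (plus the bias), which is one common reason for reusing the embedding matrix in the output layer.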

