Coder Social home page Coder Social logo

Comments (3)

XiangLi1999 avatar XiangLi1999 commented on July 24, 2024

Hi,

Thanks for the question, and for carefully studying the code!

We have experimented with various ways of initializing the word embeddings when training_mode='emb', it means random initialization; when training_mode='e2e', it means training end-to-end. For all the main experiments in the paper (except from ablations) we use --training_mode = 'e2e' to train the embeddings end-to-end. Inside the training code, the embedding step happens here:

x_start_mean = model.model.module.get_embeds(input_ids)
, and we are using the get_embeds(input_ids) of Diffusion-LM.

For decoding, we actually load the. trained embedding. As shown in

if args.training_mode.startswith('e2e'):
, when training_mode='e2e', we quickly overwrite the embeddings into the pre-trained one, by setting model2.weight = th.nn.Parameter(model.word_embedding.weight.clone().cpu()).

Hope this helps.

from diffusion-lm.

smiles724 avatar smiles724 commented on July 24, 2024

Thanks for your reply. I got a better understanding of the code with your response. I believe your code would be more readable if you could explain it more! Previously, I thought 'e2e' means 'English2English' (forgive me. )

from diffusion-lm.

smiles724 avatar smiles724 commented on July 24, 2024

However, I wonder why you loaded the weight of 'word_embedding' into the weight of 'lm_head'.

As far as I know, the dimension of 'word_embeding' is (vocab_size, in_channels), while the dimension of 'lm_head' is (in_channels, vocab_size). Should the parameters of 'lm_head' be learnable instead of using the same weight of 'word_embedding'?

image

Can you please give me some hints regarding this implementation? Thanks a lot.

from diffusion-lm.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.