
Reproducing GALAXY (galaxy) · CLOSED · 7 comments

Comments (7)

gaokaizhi commented on May 25, 2024

> Thank you for releasing the code to the public!
> I am trying to reproduce the pre-trained checkpoint you shared on GitHub, but somehow I could not get the same checkpoint, so several questions came to mind:
> Q1: Stopping criteria for choosing the pre-training and fine-tuning checkpoints. It seems to me that the stopping criterion is not based on validation loss. What was the criterion for choosing the final epoch number? For example, you said epoch 14 for pre-training and epoch 7 for MultiWOZ2.0. I wonder how you came up with those numbers.
> Q2: The size of the pre-training data. The UniDA dataset you shared on GitHub has 463,039 examples, but this seems smaller than the sum of the training sets of the eight datasets used for UniDA (according to the paper). Did you get the same checkpoint with the data you currently uploaded?
> Q3: GPU machines used for pre-training. It would be great if you could share which GPU machines you used to pre-train the GALAXY checkpoint. I am guessing that might be one of the reasons why I do not get the same result. Thanks!

Hello, may I ask what result you got after fine-tuning with GALAXY?

richlee123 commented on May 25, 2024

Thanks for the reply.

Fine-tuning from the GALAXY checkpoint you shared reproduces exactly the result reported in the paper (a comb score of 110.35), assuming that epoch 7 is chosen based on the best comb score on the validation set.

What I was curious about was how you obtained the GALAXY checkpoint (before fine-tuning). I tried pre-training from the UniLM checkpoint you shared, and somehow I did not get the same GALAXY checkpoint (maybe due to different GPU machines or pre-training data). Following the code, pre-training from UniLM for 14 epochs and fine-tuning on MultiWOZ2.0 for 7 epochs, I get a comb score of 105.40, which is still SOTA, but not 110.35.
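
For context on the numbers above: the MultiWOZ "comb" (combined) score is conventionally computed as (Inform + Success) / 2 + BLEU. A minimal sketch of that convention, with illustrative values only (not actual GALAXY evaluation output):

```python
def combined_score(inform: float, success: float, bleu: float) -> float:
    """MultiWOZ combined score: (Inform + Success) / 2 + BLEU."""
    return (inform + success) / 2.0 + bleu

# Illustrative numbers only, chosen to land near the scores discussed in this thread.
print(combined_score(inform=94.0, success=85.0, bleu=20.5))  # -> 110.0
```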

gaokaizhi commented on May 25, 2024

Sorry, I'm not the author. I just fine-tune with GALAXY and can't reproduce the results of the paper, so I would like to ask: are the parameters you use for fine-tuning the ones in train.sh? And how many GPUs do you use for fine-tuning?

richlee123 commented on May 25, 2024

Actually, the fine-tuning part worked for me. I ran the same code that the authors released.

sh scripts/multiwoz2.0/train.sh # Training on MultiWOZ2.0 (8 GPUs)

They used 8 GPUs so I followed the same procedure.

HwwAncient commented on May 25, 2024

Thanks for your interest in GALAXY.
We pre-train GALAXY on eight 40G A100 GPU cards for 60 epochs and choose the best epoch according to performance on the downstream tasks. During pre-training, the batch size on each card is set to 32. I suggest trying pre-training epochs other than 14 for the downstream tasks to eliminate some of the differences. Besides, we use the combination of UniDA and UniDial as our pre-training data, not just UniDA.
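
In other words, the pre-training epoch is not chosen by validation loss: each candidate checkpoint is fine-tuned and the epoch with the best downstream score wins. A minimal sketch of that selection loop; the two callables are hypothetical stand-ins for the repo's fine-tuning and evaluation scripts, not GALAXY APIs:

```python
from typing import Callable, Dict, Tuple

def select_best_pretrain_epoch(
    checkpoints: Dict[int, str],
    finetune: Callable[[str], str],
    dev_comb_score: Callable[[str], float],
) -> Tuple[int, float]:
    """Pick the pre-training epoch whose fine-tuned model scores best on the dev set.

    checkpoints:    epoch number -> path of the pre-trained checkpoint for that epoch
    finetune:       fine-tunes from a checkpoint path, returns the fine-tuned model path
    dev_comb_score: evaluates a fine-tuned model, returns its comb score on validation data
    """
    best_epoch, best_score = -1, float("-inf")
    for epoch, ckpt in sorted(checkpoints.items()):
        score = dev_comb_score(finetune(ckpt))
        if score > best_score:
            best_epoch, best_score = epoch, score
    return best_epoch, best_score
```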

HwwAncient commented on May 25, 2024

Thanks for your interest in GALAXY.
You needn't modify any hyperparameters in 'train.sh' and can directly run it to reproduce all downstream results. The key point is to keep the batch size at 32 regardless of the number of GPU cards. We fine-tune GALAXY on eight 40G A100 GPU cards. However, as noted in 'README.md', you can also jointly tune the hyperparameters 'BATCH_SIZE' and 'GRADIENT_ACCUMULATION_STEPS' to maintain the originally offered batch size (32) according to your number of GPUs.
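
To keep the effective batch size at 32 on fewer GPUs, BATCH_SIZE and GRADIENT_ACCUMULATION_STEPS have to be tuned together. A minimal sketch of that arithmetic, assuming the effective batch size is BATCH_SIZE × GRADIENT_ACCUMULATION_STEPS (check train.sh to confirm whether BATCH_SIZE there is global or per card):

```python
def gradient_accumulation_steps(batch_size: int, target_batch: int = 32) -> int:
    """Accumulation steps needed so batch_size * steps equals the target effective batch.

    Assumes effective batch = BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS; if BATCH_SIZE
    in train.sh is per GPU, fold the GPU count into batch_size before calling this.
    """
    if target_batch % batch_size != 0:
        raise ValueError("batch_size must divide the target effective batch size evenly")
    return target_batch // batch_size

# Example: with room for a batch of 8 per optimizer step, accumulate over 4 steps
# so that 8 * 4 = 32 matches the batch size the authors used.
print(gradient_accumulation_steps(batch_size=8))  # -> 4
```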

richlee123 commented on May 25, 2024

Thank you for the response!
