
Reproducing GALAXY (galaxy) · CLOSED · 7 comments

Comments (7)

gaokaizhi commented on May 25, 2024

> Thank you for releasing the code to the public!
> I am trying to reproduce the pre-trained checkpoint you shared on GitHub, but somehow I could not get the same checkpoint, so several questions came to mind:
> Q1: Stopping criteria for choosing the pre-training and fine-tuning checkpoints. It seems to me that the stopping criterion is not based on validation loss. What was the criterion for choosing the final epoch number? For example, you said epoch 14 for pre-training and epoch 7 for MultiWOZ2.0. I wonder how you came up with those numbers.
> Q2: The size of the pre-training data. The UniDA dataset you shared on GitHub has 463,039 examples, but this seems smaller than the sum of the training sets of the eight datasets used for UniDA (according to the paper). Did you get the same checkpoint with the data you currently uploaded?
> Q3: GPU machines used for pre-training. It would be great if you could share which GPU machines you used to pre-train the GALAXY checkpoint. I am guessing that might be one of the reasons why I do not get the same result. Thanks!

Hello, may I ask what result you got after fine-tuning with GALAXY?

richlee123 commented on May 25, 2024

Thanks for the reply.

Fine-tuning from the GALAXY checkpoint you shared reproduces exactly the result reported in the paper (a comb score of 110.35), assuming that epoch 7 is chosen based on the best comb score on the validation set.

What I was curious about was how you obtained the GALAXY checkpoint (before fine-tuning). I tried pre-training from the UniLM checkpoint you shared, and somehow I did not get the same GALAXY checkpoint (maybe due to different GPU machines or pre-training data). Following the code, pre-training from UniLM for 14 epochs and fine-tuning on MultiWOZ2.0 for 7 epochs, I get a comb score of 105.40, which is still SOTA, but not 110.35.
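
For context on the numbers above: the MultiWOZ "comb" (combined) score is conventionally computed as (Inform + Success) / 2 + BLEU. A minimal sketch of that convention, with illustrative values only (not actual GALAXY evaluation output):

```python
def combined_score(inform: float, success: float, bleu: float) -> float:
    """MultiWOZ combined score: (Inform + Success) / 2 + BLEU."""
    return (inform + success) / 2.0 + bleu

# Illustrative numbers only, chosen to land near the scores discussed in this thread.
print(combined_score(inform=94.0, success=85.0, bleu=20.5))  # -> 110.0
```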

gaokaizhi commented on May 25, 2024

Sorry, I'm not the author. I just fine-tune with GALAXY and can't reproduce the results of the paper, so I would like to ask: are the parameters you use for fine-tuning the ones in train.sh? And how many GPUs do you use for fine-tuning?

richlee123 commented on May 25, 2024

Actually, the fine-tuning part worked for me. I ran the same code that the authors released.

sh scripts/multiwoz2.0/train.sh # Training on MultiWOZ2.0 (8 GPUs)

They used 8 GPUs so I followed the same procedure.

HwwAncient commented on May 25, 2024

Thanks for your interest in GALAXY.
We pre-train GALAXY on eight 40G A100 GPU cards for 60 epochs and choose the best epoch according to performance on the downstream tasks. During pre-training, the batch size on each card is set to 32. I suggest trying pre-training epochs other than 14 for the downstream tasks to eliminate some of the differences. Besides, we use the combination of UniDA and UniDial as our pre-training data, not just UniDA.
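
In other words, the pre-training epoch is not chosen by validation loss: each candidate checkpoint is fine-tuned and the epoch with the best downstream score wins. A minimal sketch of that selection loop; the two callables are hypothetical stand-ins for the repo's fine-tuning and evaluation scripts, not GALAXY APIs:

```python
from typing import Callable, Dict, Tuple

def select_best_pretrain_epoch(
    checkpoints: Dict[int, str],
    finetune: Callable[[str], str],
    dev_comb_score: Callable[[str], float],
) -> Tuple[int, float]:
    """Pick the pre-training epoch whose fine-tuned model scores best on the dev set.

    checkpoints:    epoch number -> path of the pre-trained checkpoint for that epoch
    finetune:       fine-tunes from a checkpoint path, returns the fine-tuned model path
    dev_comb_score: evaluates a fine-tuned model, returns its comb score on validation data
    """
    best_epoch, best_score = -1, float("-inf")
    for epoch, ckpt in sorted(checkpoints.items()):
        score = dev_comb_score(finetune(ckpt))
        if score > best_score:
            best_epoch, best_score = epoch, score
    return best_epoch, best_score
```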

HwwAncient commented on May 25, 2024

Thanks for your interest in GALAXY.
You needn't modify any hyperparameters in 'train.sh' and can directly run it to reproduce all downstream results. The key point is to keep the batch size at 32 regardless of the number of GPU cards. We fine-tune GALAXY on eight 40G A100 GPU cards. However, as noted in 'README.md', you can also jointly tune the hyperparameters 'BATCH_SIZE' and 'GRADIENT_ACCUMULATION_STEPS' to maintain the originally offered batch size (32) according to your number of GPUs.
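
To keep the effective batch size at 32 on fewer GPUs, BATCH_SIZE and GRADIENT_ACCUMULATION_STEPS have to be tuned together. A minimal sketch of that arithmetic, assuming the effective batch size is BATCH_SIZE × GRADIENT_ACCUMULATION_STEPS (check train.sh to confirm whether BATCH_SIZE there is global or per card):

```python
def gradient_accumulation_steps(batch_size: int, target_batch: int = 32) -> int:
    """Accumulation steps needed so batch_size * steps equals the target effective batch.

    Assumes effective batch = BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS; if BATCH_SIZE
    in train.sh is per GPU, fold the GPU count into batch_size before calling this.
    """
    if target_batch % batch_size != 0:
        raise ValueError("batch_size must divide the target effective batch size evenly")
    return target_batch // batch_size

# Example: with room for a batch of 8 per optimizer step, accumulate over 4 steps
# so that 8 * 4 = 32 matches the batch size the authors used.
print(gradient_accumulation_steps(batch_size=8))  # -> 4
```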

richlee123 commented on May 25, 2024

Thank you for the response!
