Comments (7)
Thank you for releasing the code to the public!
I am trying to reproduce the pre-trained checkpoint you shared on GitHub, but somehow I could not obtain the same checkpoint. Several questions came to mind:
Q1: Stopping criterion for choosing pre-training & fine-tuning checkpoints. It seems the stopping criterion is not based on validation loss. What was the criterion for choosing the final epoch number? For example, you mention epoch 14 for pre-training and epoch 7 for MultiWOZ2.0. I wonder how you arrived at those numbers.
Q2: The amount of pre-training data. The UniDA dataset you shared on GitHub has 463,039 examples, which seems smaller than the sum of the training sets of the eight datasets used for UniDA (according to the paper). Did you get the same checkpoint with the data you currently uploaded?
Q3: GPU machines used for pre-training. It would be great if you could share which GPU machines you used to pre-train the GALAXY checkpoint. I am guessing that might be one of the reasons I do not get the same result. Thanks!
Hello, may I ask what result you get after fine-tuning with GALAXY?
from galaxy.
Thanks for the reply.
Fine-tuning from the GALAXY checkpoint you shared reproduces exactly the result reported in the paper (combined score of 110.35), assuming epoch 7 is chosen based on the best combined score on the validation set.
What I was curious about was how you obtained the GALAXY checkpoint (before fine-tuning). I tried pre-training from the UniLM checkpoint you shared, and somehow I did not get the same GALAXY checkpoint (maybe due to different GPU machines or pre-training data). Following the code (pre-training from UniLM for 14 epochs, then fine-tuning on MultiWOZ2.0 for 7 epochs), I get a combined score of 105.40, which is still SOTA, but not 110.35.
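For context on how the two numbers compare: the standard MultiWOZ combined score is computed as (Inform + Success) / 2 + BLEU. A minimal sketch; the example figures 94.4 / 85.3 / 20.5 below are illustrative values that happen to yield 110.35, not necessarily the paper's exact breakdown:

```python
def combined_score(inform: float, success: float, bleu: float) -> float:
    """MultiWOZ combined score: (Inform + Success) / 2 + BLEU."""
    return (inform + success) / 2 + bleu

# Illustrative breakdown that yields 110.35:
print(round(combined_score(94.4, 85.3, 20.5), 2))  # 110.35
```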
Sorry, I'm not the author. I just fine-tune GALAXY and can't reproduce the results of the paper, so I would like to ask: are the parameters you use for fine-tuning the ones in train.sh? And how many GPUs do you use for fine-tuning?
Actually, the fine-tuning part worked for me. I ran the same code that the authors released.
sh scripts/multiwoz2.0/train.sh # Training on MultiWOZ2.0 (8 GPUs)
They used 8 GPUs, so I followed the same procedure.
Thanks for your interest in GALAXY.
We pre-train GALAXY on eight 40GB A100 GPU cards for 60 epochs and choose the best epoch according to performance on the downstream tasks. During pre-training, the batch size per card is set to 32. I suggest trying pre-training epochs other than 14 for the downstream tasks, to eliminate some of the differences. Besides, we use the combination of UniDA and UniDial as our pre-training data, not just UniDA.
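The epoch-selection advice above can be sketched as follows; this is a hypothetical illustration (the epochs and scores are made up), not the authors' actual tooling:

```python
# Hypothetical: fine-tune several pre-training checkpoints and keep the one
# whose downstream validation combined score is best.
def best_pretrain_epoch(val_scores: dict) -> int:
    """val_scores maps pre-training epoch -> downstream validation score."""
    return max(val_scores, key=val_scores.get)

# Illustrative scores only:
scores = {10: 104.8, 14: 105.4, 20: 106.0, 30: 105.1}
print(best_pretrain_epoch(scores))  # 20
```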
Thanks for your interest in GALAXY.
You needn't modify any hyperparameters in train.sh and can run it directly to reproduce all downstream results. The key point is to keep the batch size at 32 regardless of the number of GPU cards. We fine-tune GALAXY on eight 40GB A100 GPU cards. However, as noted in README.md, you can also jointly tune the hyperparameters BATCH_SIZE and GRADIENT_ACCUMULATION_STEPS to maintain the originally offered batch size (32) according to your number of GPUs.
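The batch-size bookkeeping described above amounts to keeping BATCH_SIZE × GRADIENT_ACCUMULATION_STEPS constant. A small sketch, assuming BATCH_SIZE is the per-step batch (as the README's pairing of the two variables suggests):

```python
def effective_batch_size(batch_size: int, grad_accum_steps: int) -> int:
    """Effective batch = per-step batch x gradient-accumulation steps."""
    return batch_size * grad_accum_steps

# All of these preserve the original effective batch size of 32:
for bs, accum in [(32, 1), (16, 2), (8, 4)]:
    assert effective_batch_size(bs, accum) == 32
```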
Thank you for the response!