
dialogbert's People

Contributors

guxd


dialogbert's Issues

DialogBERT methods of context?

Hi again,

I am curious which methods the authors used to handle dialogue context when developing DialogBERT. Did you prepend the context turns to the input tokens? And how many conversational turns of context were used to obtain the results reported in the paper?
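To make the question concrete, by context prepending I mean something like the following sketch (the separator and the number of turns here are illustrative guesses on my part, not taken from the repo):

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
context_turns = ["hi , how are you ?", "fine , thanks . and you ?"]  # previous turns
current_utt = "pretty good . any plans for the weekend ?"

# flatten the context in front of the current utterance, one [SEP] per turn
text = " [SEP] ".join(context_turns + [current_utt])
input_ids = tokenizer.encode(text)  # encode() wraps the whole sequence in [CLS] ... [SEP]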

Thanks in advance

Reproducing results from the paper and hyperparameters

Hi,

I'm trying to reproduce the results reported in the paper but am unable to do so with the current set of hyperparameters. One notable problem is per_gpu_eval_batch_size=1: keeping it as is makes evaluation very slow, but when I set it to a value > 1, the code breaks. I suspect this has to do with the generate method of the DialogBERT class. Here, for example:

generated = torch.zeros((num_samples,1), dtype=torch.long, device=device).fill_(self.tokenizer.cls_token_id)
# [batch_sz x 1] (1=seq_len)

Is num_samples being used as batch_sz here? I'm wondering whether this is intended or a typo, because when I change num_samples to batch_sz the code works. However, the text generated that way doesn't seem to match the context it is generated from.
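For reference, this is the change that makes the code run for me (a sketch; I take batch_sz from the first dimension of the encoded context batch, which is my assumption about the intended semantics):

batch_sz = context.size(0)  # context: the batch of encoded dialogue contexts (name assumed)
generated = torch.zeros((batch_sz, 1), dtype=torch.long, device=device).fill_(self.tokenizer.cls_token_id)  # one [CLS] seed per example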

Could you please share the hyperparameters you used and help solve the per_gpu_eval_batch_size=1 problem?

Thanks

Question about model parameter size

I am interested in implementing gradient checkpointing to support training a DialogBERT-XL. What would the level of effort be to modify DialogBERT to support a parameter count equivalent to GPT2-XL?
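For context, this is the kind of change I have in mind (a sketch against the Hugging Face transformers API; the sizes below are GPT2-XL's dimensions, roughly 1.5B parameters, and whether they transfer cleanly to DialogBERT's encoders is exactly my question):

from transformers import BertConfig, BertModel

config = BertConfig(hidden_size=1600, num_hidden_layers=48, num_attention_heads=25)
config.gradient_checkpointing = True  # recompute activations in the backward pass to save memory
model = BertModel(config)  # GPT2-XL-scale parameter count, illustrative only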

Thanks in advance!

DataLoader Function

Hi Xiaodong,

Thanks for sharing the source code.

I have a question regarding the data_loader function. Is there a reason mini-batches are created by adding the following inputs?

self.cls_utt = [tokenizer.cls_token_id, tokenizer.cls_token_id, tokenizer.sep_token_id]
self.sep_utt = [tokenizer.cls_token_id, tokenizer.sep_token_id, tokenizer.sep_token_id]

The resulting output looks like: [[101, 101, 102], [contexts], [101, 102, 102]].
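For reference, with the standard bert-base-uncased vocabulary these IDs decode to [CLS] = 101 and [SEP] = 102, so the two extra inputs render as [CLS][CLS][SEP] and [CLS][SEP][SEP]:

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
print(tokenizer.convert_ids_to_tokens([101, 101, 102]))  # ['[CLS]', '[CLS]', '[SEP]']
print(tokenizer.convert_ids_to_tokens([101, 102, 102]))  # ['[CLS]', '[SEP]', '[SEP]']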

Best,

Dong

Checkpoints available?

Hi, thanks for making your work available.

Are there any checkpoints available for DialogBERT to experiment with, or does it have to be trained from scratch using main.py?

Can you share the hardware configuration, including GPU and CPU memory, that you used to train -Medium and -Large? I get OOM on a K80 even when attempting to train -Medium.

And how long did it take to train -Medium and -Large?

Thanks in advance!

Error when using interact.py

Are there any downloadable checkpoints for this model or does it have to be trained from scratch using main.py?

Data processing

Thanks for your great work! I'd like to apply your method and model to a brand-new dataset, but I have no idea how to preprocess our dataset into the required format. Could you release the data preprocessing script? It would be a great help!

Python version question

I failed to install tokenizers because my pip version was too low. Does Python have to be 3.6?

test error

def load(self, args):
    # Load a trained model and vocabulary that you have fine-tuned
    assert args.reload_from>=0, "please specify the checkpoint iteration in args.reload_from"
    output_dir = os.path.join(f"./output/{args.model}/{args.model_size}/models/", f'checkpoint-{args.reload_from}')
    self.model = DialogBERT.from_pretrained(output_dir)
    self.model.to(args.device)

def from_pretrained(self, model_dir):
    self.encoder_config = BertConfig.from_pretrained(model_dir)
    self.tokenizer = BertTokenizer.from_pretrained(path.join(model_dir, 'tokenizer'), do_lower_case=True)
    self.utt_encoder = BertForPreTraining.from_pretrained(path.join(model_dir, 'utt_encoder'))
    self.context_encoder = BertForSequenceClassification.from_pretrained(path.join(model_dir, 'context_encoder'))
    self.context_mlm_trans = BertPredictionHeadTransform(self.encoder_config)
    self.context_mlm_trans.load_state_dict(torch.load(path.join(model_dir, 'context_mlm_trans.pkl')), strict=False)
    self.context_order_trans = SelfSorting(self.encoder_config.hidden_size)
    self.context_order_trans.load_state_dict(torch.load(path.join(model_dir, 'context_order_trans.pkl')), strict=False)
    self.decoder_config = BertConfig.from_pretrained(model_dir)
    self.decoder = BertLMHeadModel.from_pretrained(path.join(model_dir, 'decoder'))

File "D:\NLP\DialogBERT-master\solvers.py", line 77, in load
self.model.to(args.device)
AttributeError: 'NoneType' object has no attribute 'to'
DialogBERT.from_pretrained is none ,how can i solve it?
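As far as I can tell, from_pretrained above has no return statement, so it returns None and self.model ends up as None. A sketch of one possible fix (hypothetical; I don't know the intended constructor arguments):

def from_pretrained(self, model_dir):
    ...  # load the tokenizer, encoders, and decoder as above
    return self  # return the populated instance instead of None

self.model = DialogBERT()  # constructor arguments assumed
self.model.from_pretrained(output_dir)  # load weights in place
self.model.to(args.device)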

can't load pretrained model

self.context_mlm_trans and self.context_order_trans expect a different key structure:

RuntimeError: Error(s) in loading state_dict for BertPredictionHeadTransform:
Missing key(s) in state_dict: "dense.weight", "dense.bias", "LayerNorm.weight", "LayerNorm.bias".
Unexpected key(s) in state_dict: "utt_encoder.bert.embeddings.position_ids", "utt_encoder.bert.embeddings.word_embeddings.weight", ... [list truncated: the unexpected keys cover every parameter of the full model, i.e. all utt_encoder.*, context_encoder.*, context_mlm_trans.*, context_order_trans.*, and decoder.* weights and biases] ..., "decoder.cls.predictions.decoder.weight", "decoder.cls.predictions.decoder.bias".

Could you please share the script for preprocessing the original dialogues?

Hi, I see the code was updated 15 days ago.

I would like to use this model on a brand-new dialogue dataset. I noticed that data/ contains h5 files such as dailydialog/train.h5, and I downloaded the original DailyDialog dataset, but I do not know how to convert it into train.h5.

Could you please share the related script or source code? Thank you very much.

MUR Task

Hello, I would like to ask about the following line of code.

_, mlm_tgt_encodings, *_ = self.utt_encoder.bert(context_mlm_targets[ctx_mlm_mask], context_utts_attn_mask[ctx_mlm_mask])

context_mlm_targets[ctx_mlm_mask] represents the utterance tokens before [MASK] is applied, while context_utts_attn_mask[ctx_mlm_mask] represents the attention mask after [MASK] is applied.

They don't match. Why not recalculate the attention mask?
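For concreteness, the recalculation I have in mind would look something like this (assuming padding positions hold the tokenizer's pad token id):

recomputed_attn_mask = (context_mlm_targets[ctx_mlm_mask] != self.tokenizer.pad_token_id).long()
_, mlm_tgt_encodings, *_ = self.utt_encoder.bert(context_mlm_targets[ctx_mlm_mask], recomputed_attn_mask)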

Model not converging?

Using the standard main.py training loop, I've been training the tiny model on a V100 for almost a week and it still hasn't stopped. Is additional hyperparameter tuning needed even to run the tiny training process?

python3 main.py --model_size=tiny --per_gpu_train_batch_size=24

avg_len = 12.61646884272997
bleu = 0.03122757749152926
meteor = 0.039703799201764936
nist = 0.12024726693793758
perplexity = 116.93566131591797
rouge-L = 0.05778559382996833
valid_loss = 4.761623978844736

Can you share what your final numbers were after training tiny and small?

Data processing steps

Hi, regarding the dataset files in "data/dailydial/*.h5": how can I generate these from the raw txt files?
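In case it clarifies the question: I can write dialogues into an h5 container with h5py, but I don't know the schema main.py expects. A guess at the shape of such a script (the dataset layout here is entirely assumed):

import h5py
import numpy as np
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
dialogs = [["hi there !", "hello , how are you ?"]]  # parsed from the raw txt files

with h5py.File('train.h5', 'w') as f:
    vlen_int = h5py.special_dtype(vlen=np.dtype('int32'))
    # one variable-length row of token ids per utterance; how dialogue
    # boundaries are encoded is exactly the part I am unsure about
    utts = [np.array(tokenizer.encode(u), dtype='int32') for d in dialogs for u in d]
    ds = f.create_dataset('utterances', (len(utts),), dtype=vlen_int)
    for i, u in enumerate(utts):
        ds[i] = u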
Thanks in advance!

Issue with V100 Distributed Training

I have the following distributed training setup working without issue on a Tesla K80, but whenever I attempt it on an 8x V100 machine, the training process just silently hangs:

export MASTER_PORT=29500
export MASTER_ADDR="127.0.0.1"
export WORLD_SIZE=8
export RANK=0
python3 main.py --model_size=large --per_gpu_train_batch_size=128 --local_rank 0

What's weird is that training works fine on a single GPU if I drop the --local_rank flag. While the process is just hanging, nothing is being dispatched to any of the GPUs:

$ sudo nvidia-smi
Sat May 15 21:53:13 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.119.04   Driver Version: 450.119.04   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  A100-SXM4-40GB      On   | 00000000:10:1C.0 Off |                    0 |
| N/A   52C    P0    60W / 400W |      3MiB / 40537MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  A100-SXM4-40GB      On   | 00000000:10:1D.0 Off |                    0 |
| N/A   47C    P0    56W / 400W |      3MiB / 40537MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   2  A100-SXM4-40GB      On   | 00000000:20:1C.0 Off |                    0 |
| N/A   49C    P0    56W / 400W |      3MiB / 40537MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   3  A100-SXM4-40GB      On   | 00000000:20:1D.0 Off |                    0 |
| N/A   45C    P0    55W / 400W |      3MiB / 40537MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   4  A100-SXM4-40GB      On   | 00000000:90:1C.0 Off |                    0 |
| N/A   50C    P0    56W / 400W |      3MiB / 40537MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   5  A100-SXM4-40GB      On   | 00000000:90:1D.0 Off |                    0 |
| N/A   45C    P0    56W / 400W |      3MiB / 40537MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   6  A100-SXM4-40GB      On   | 00000000:A0:1C.0 Off |                    0 |
| N/A   52C    P0    60W / 400W |      3MiB / 40537MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   7  A100-SXM4-40GB      On   | 00000000:A0:1D.0 Off |                    0 |
| N/A   48C    P0    57W / 400W |      3MiB / 40537MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Any ideas?
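One thought on my own setup: with WORLD_SIZE=8 but only a single process started at rank 0, torch.distributed.init_process_group would block waiting for the other seven ranks, which would look exactly like this silent hang. If that is the cause, the standard launcher, which spawns one process per GPU and passes a distinct --local_rank to each, might behave differently:

python3 -m torch.distributed.launch --nproc_per_node=8 main.py --model_size=large --per_gpu_train_batch_size=128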

Difficulty replicating results of the paper

I am training on the DailyDialog dataset with the same hyperparameters as described in the paper, but I cannot get the model to reach the reported numbers; in particular, the BLEU score on the test data is half the reported value. In addition, the text generated for the test set has little to do with the actual context. Are there any solutions to this?
