multidoc2dial's Introduction

MultiDoc2Dial: Modeling Dialogues Grounded in Multiple Documents

This repository provides data and code for the corresponding paper "MultiDoc2Dial: Modeling Dialogues Grounded in Multiple Documents" (EMNLP 2021) by Song Feng, Siva Sankalp Patel, Hui Wan and Sachindra Joshi. Please cite the paper and star the repository if you find the paper, data and code useful for your work.

@inproceedings{feng2021multidoc2dial,
    title={MultiDoc2Dial: Modeling Dialogues Grounded in Multiple Documents},
    author={Feng, Song and Patel, Siva Sankalp and Wan, Hui and Joshi, Sachindra},
    booktitle={EMNLP},
    year={2021}
}

Installation

Please refer to conda_env.yml for creating a virtual environment.

conda env create -f conda_env.yml

Our scripts require the following environment variables to be set:

  • HF_HOME for caching downloads from Huggingface locally.
  • CHECKPOINTS for saving the checkpoints.
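
For example, a quick sanity check that both variables are set before launching the scripts (a minimal Python sketch, not part of the repository):

import os

# Fail fast if a required environment variable is missing,
# and create the target directory if it does not exist yet.
for var in ("HF_HOME", "CHECKPOINTS"):
    path = os.environ.get(var)
    if not path:
        raise OSError(f"Please export {var} before running the scripts")
    os.makedirs(path, exist_ok=True)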

Data

Please run the following commands to download the data. They will download the document and dialogue data into the folder data/multidoc2dial.

cd scripts
./run_download.sh

Document preprocessing

To segment the documents into passages, please refer to

run_data_preprocessing.sh
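
As background, the paper uses both token-based segmentation (fixed-size token windows) and structure-based segmentation (following document sections). Below is a simplified illustration of the token-based variant with a made-up window size; run_data_preprocessing.sh implements the actual logic.

# Split a document into fixed-size word windows (illustrative only;
# the repository also supports structure-based segmentation).
def segment_by_tokens(text, max_tokens=100):
    tokens = text.split()
    return [
        " ".join(tokens[i : i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]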

Data preprocessing for fine-tuning DPR

If you are finetuning DPR on MultiDoc2Dial, please refer to run_data_preprocessing_dpr.sh to create positive and negative examples in the format of DPR.
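
For reference, Facebook DPR expects a JSON list of training examples shaped roughly as below; the field contents here are illustrative, not taken from the dataset.

# One DPR training example: a query with positive and negative passages.
example = {
    "question": "Hi, I forgot to renew my license. What should I do?",
    "answers": [],
    "positive_ctxs": [
        {"title": "Renewing your license", "text": "If your license expired ..."}
    ],
    "negative_ctxs": [
        {"title": "Vehicle registration", "text": "To register a vehicle ..."}
    ],
    "hard_negative_ctxs": [],
}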

Run Baselines

Finetuning DPR

To finetune DPR, we use Facebook DPR (March 2021 release) with an effective batch size of 128. You can finetune DPR on MultiDoc2Dial data yourself, or use our finetuned version.

If you would like to finetune DPR yourself, please refer to Facebook DPR for detailed instructions.

Or

If you would like to use our finetuned DPR encoders, please use the following paths as the model path to the ctx or question encoder (for instance, in run_converter_modelcard.sh):

  • sivasankalpp/dpr-multidoc2dial-token-question-encoder for the fine-tuned DPR question encoder based on token-segmented document passages
  • sivasankalpp/dpr-multidoc2dial-token-ctx-encoder for the fine-tuned DPR ctx encoder based on token-segmented document passages
  • sivasankalpp/dpr-multidoc2dial-structure-question-encoder for the fine-tuned DPR question encoder based on structure-segmented document passages
  • sivasankalpp/dpr-multidoc2dial-structure-ctx-encoder for the fine-tuned DPR ctx encoder based on structure-segmented document passages
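
For example, the released encoders can be loaded directly with the Transformers DPR classes (a minimal sketch; the query string is illustrative):

import torch
from transformers import DPRQuestionEncoder, DPRQuestionEncoderTokenizer

model_name = "sivasankalpp/dpr-multidoc2dial-token-question-encoder"
tokenizer = DPRQuestionEncoderTokenizer.from_pretrained(model_name)
encoder = DPRQuestionEncoder.from_pretrained(model_name)

# Embed a dialogue query into the retrieval space shared with the ctx encoder.
inputs = tokenizer("Hi, I forgot to renew my license. What should I do?", return_tensors="pt")
with torch.no_grad():
    embedding = encoder(**inputs).pooler_output  # shape (1, 768)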

Using finetuned DPR encoders in RAG

If you obtain your own finetuned DPR checkpoints,

  1. Download the following files from the RAG model cards to the "../data" folder.
  2. Convert your fine-tuned DPR checkpoint and add it to the RAG model. Please refer to run_converter.sh.

OR

If you use our finetuned DPR encoders, please refer to run_converter_modelcard.sh.

Finetuning RAG

Our implementation is based on Huggingface RAG. Please refer to their README for more detailed explanations on document retrieval and finetuning RAG.

To create FAISS index, please refer to

run_kb_index.sh
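
For background, the core of such an index build with the datasets library looks roughly like this (a sketch with random embeddings; the real script computes them with the DPR ctx encoder over the preprocessed passages):

import numpy as np
from datasets import Dataset

# Toy knowledge base with 768-dim float32 embeddings, as DPR produces.
kb = Dataset.from_dict({
    "title": ["doc-a", "doc-b"],
    "text": ["first passage ...", "second passage ..."],
    "embeddings": [np.random.rand(768).astype("float32") for _ in range(2)],
})
kb.add_faiss_index(column="embeddings")  # builds an in-memory FAISS index
scores, nearest = kb.get_nearest_examples("embeddings", np.random.rand(768).astype("float32"), k=1)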

To finetune RAG on MultiDoc2Dial data, please refer to

run_finetune_rag.sh

Evaluations

To evaluate the retrieval results (recall@n at the passage and document level), please refer to

run_eval_rag_re.sh
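
Here, recall@n is the fraction of turns whose gold passage (or document) id appears among the top-n retrieved ids, roughly:

# Minimal recall@n over ranked retrieval results (illustrative helper,
# not the repository's evaluation code).
def recall_at_n(gold_ids, ranked_ids_per_query, n):
    hits = sum(
        1 for gold, ranked in zip(gold_ids, ranked_ids_per_query)
        if gold in ranked[:n]
    )
    return hits / len(gold_ids)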

To evaluate the generation results, please refer to

run_eval_rag_e2e.sh

Results

The evaluation results on the validation set of the agent response generation task are shown below. Please refer to the scripts for the corresponding hyperparameters.

Model         F1     EM    BLEU   r@1    r@5    r@10
D-token-nq    30.9   2.8   15.7   25.8   48.2   57.7
D-struct-nq   31.5   3.2   16.6   27.4   51.1   60.2
D-token-ft    33.2   3.4   18.8   35.2   63.4   72.9
D-struct-ft   33.7   3.5   19.5   37.5   67.0   75.8

Leaderboard

Please check out our leaderboard and Shared Task.

Acknowledgement

Our code is based on Huggingface Transformers. Our dataset is based on Doc2Dial. We thank the authors for sharing their great work.


multidoc2dial's Issues

-nq model results and n_docs

Hello, thank you very much for making this baseline code available. I have tried to reproduce the results for the -nq and -ft models from the paper for Task I and Task II. I am getting lower results for F1, EM, and BLEU, especially for the -nq models. I have been reviewing the code to find any bugs that I may have in my own implementation, and came across two questions:

  1. For the -nq models, do you use dpr-question_encoder-single-nq-base and dpr-ctx_encoder-single-nq-base from Huggingface?
  2. Additionally, the uploaded code uses n_docs = 5 for fine-tuning and n_docs = 10 in retrieval. Was this what was used in the paper, or should I be using the same number of docs (5 or 10) for both?

Thank you again for your help!

Error in running converter

Hi, thank you for your impressive dataset. I encountered some issues when running the DPR converter.

I have downloaded the checkpoint "checkpoint.retriever.single.nq.bert-base-encoder" from the DPR official repo, and encountered a missing-key error when running run_converter.sh. It seems that the only difference between "convert_dpr_original_checkpoint_to_pytorch.py" in this repo and in Huggingface DPR is the line

key = key.replace("bert_model.encoder", "bert_model")

which causes the code to fail.

So is this line of code indeed unnecessary, or is there a mistake in my usage?
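
(For context, a converter of this kind walks the DPR checkpoint's state dict and renames keys to match the Huggingface module layout; a rough sketch, assuming the official checkpoint stores its weights under "model_dict":)

import torch

state = torch.load("checkpoint.retriever.single.nq.bert-base-encoder", map_location="cpu")
remapped = {}
for key, value in state["model_dict"].items():
    # The replacement at issue: drop the extra "encoder" level in key names.
    remapped[key.replace("bert_model.encoder", "bert_model")] = value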

Performance on BM25 retrieval baseline

I am running run_eval_rag_re.sh on the BM25 baseline and seeing much higher retrieval results than expected:

4168it [08:02,  9.60it/s]INFO:__main__:Using BM25 for retrieval
4176it [08:02,  9.88it/s]INFO:__main__:Using BM25 for retrieval
4184it [08:03,  8.91it/s]INFO:__main__:Using BM25 for retrieval
4192it [08:04, 10.06it/s]INFO:__main__:Using BM25 for retrieval
4201it [08:05,  8.65it/s]
INFO:__main__:Using BM25 for retrieval
INFO:__main__:Doc_Prec@1:  43.18
INFO:__main__:Doc_Prec@5:  67.20
INFO:__main__:Doc_Prec@10:  74.53
INFO:__main__:Pid_Prec@5:  19.45
INFO:__main__:Pid_Prec@5:  40.75
INFO:__main__:Pid_Prec@10:  48.56
INFO:__main__:all:  43.18 &  67.20 &  74.53  &  19.45 &  40.75 &  48.56 &

Settings:
domain=all seg=token score=original task=grounding split=val

Additional parameters:
--bm25 ../data/mdd_kb/mdd-$seg-$domain.csv

Input files are generated by predecessor scripts with the same settings:
Data files are generated by run_data_preprocessing.sh.
Index files are generated by run_kb_index.sh.
Checkpoints are generated by run_finetune_rag.sh, with DPR checkpoints generated by run_converter.sh on finetuned DPR checkpoints.
(And if I am not mistaken, although required by the code, the RAG checkpoint does not affect the results of run_eval_rag_re.sh when bm25 is given.)

So any mistake in my usage or understanding?

By the way, I am a bit confused about the grounding span generation task (Table 4) in the paper.
Does it correspond to the result of run_eval_rag_re.sh? But that doesn't contain F1, EM and BL.
Also, does D^token-rr-cls-ft mean joint training of the DPR question encoder and the RAG generator,
while D^token-ft uses the finetuned DPR directly?
I would appreciate it if you could clarify these points.

Question about data preprocessing

Hi, I have a question about the data preprocessing (data_preprocessor.py) and want to seek some help:
when calling rm_blank on the grounding, why is is_shorten set to True?
Thanks!


question about data preprocessing

Hi, when I try to run your code, I found I can't download the datasets using load_dataset. The error is "HF google storage unreachable. Downloading and preparing it from source". Although I have used a VPN, the problem is still there. So I want to download the data manually, but I found the data is mismatched in some fields. Could you help me? Thanks.

Question about using multiple GPUs

Hi! I'm having some trouble using multiple GPUs with the run_finetune_rag_dialdoc.sh file.

I have set the --gpus parameter to 4, but I kept getting errors as below.

ValueError: ProcessGroupGloo::scatter: invalid tensor type at index 0 (expected TensorOptions(dtype=double, device=cpu, layout=Strided, requires_grad=false (default), pinned_memory=false (default), memory_format=(nullopt)), got TensorOptions(dtype=float, device=cpu, layout=Strided, requires_grad=false (default), pinned_memory=false (default), memory_format=(nullopt)))

So I modified line 159 in the dialdoc/models/rag/distributed_pytorch_retriever.py file by not specifying the target_type variable:
retrieved_doc_embeds = self._scattered(scatter_vectors, [n_queries, n_docs, combined_hidden_states.shape[1]])

After this modification, I am getting the errors below, and I couldn't figure out why.

File "/home/yunah/multidoc2dial_ours/dialdoc/models/rag/distributed_pytorch_retriever.py", line 157, in retrieve
doc_ids = self._scattered(scatter_ids, [n_queries, n_docs], target_type=torch.int64)
File "/home/yunah/multidoc2dial_ours/dialdoc/models/rag/distributed_pytorch_retriever.py", line 82, in _scattered
dist.scatter(target_tensor, src=0, scatter_list=scatter_list, group=self.process_group)
File "/home/yunah/.conda/envs/multidoc2dial/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py", line 2191, in scatter
work = group.scatter(output_tensors, input_tensors, opts)
ValueError: ProcessGroupGloo::scatter: Incorrect input list size 1. Input list size should be 2, same as size of the process group.
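
(For reference, both errors above violate one of the constraints of torch.distributed.scatter, sketched below: the output tensor's dtype must match the tensors in scatter_list, and on the source rank the list must contain exactly one tensor per process. This is an illustrative helper, not the repository's code.)

import torch
import torch.distributed as dist

def scatter_doubles(scatter_list, shape, group=None):
    # Output dtype must match the scattered tensors (first error: float vs double).
    out = torch.empty(*shape, dtype=torch.float64)
    if dist.get_rank() == 0:
        # The source rank must supply one tensor per process (second error).
        assert len(scatter_list) == dist.get_world_size()
        scatter_list = [t.to(torch.float64) for t in scatter_list]
    else:
        scatter_list = None  # only the source rank provides the list
    dist.scatter(out, scatter_list=scatter_list, src=0, group=group)
    return out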

Did I miss any other variables or settings I should change before using multiple GPUs?
I would like to know if there is a solution for this error.
Thanks a lot!

Best,
Yunah

How to reproduce the retrieval results in Table 5?

Hi,

It is really a nice and cool dataset. I am wondering how to reproduce the results in Table 5 from your paper, i.e. the retrieval results on the validation set. I searched your codebase but found no script to do that (or did I miss something?).


Thanks for any help.

Yiwei

About data download

It seems that the link returns a 404 and the data cannot be downloaded. Where can I download the data?

no run_finetune_rag.sh and missing positional argument: 'logits_processor'

As the title says, there is no run_finetune_rag.sh.
I've done everything above it in README.md, as follows:
export HF_HOME and CHECKPOINTS (many files have been downloaded to the cache and checkpoint directories, and it seems to work)
cd scripts
./run_download.sh
run_data_preprocessing.sh
run_kb_index.sh
Up to that point, there was no traceback.
By the way, I didn't create the same conda env as conda_env.yml does, but I have confirmed the main packages are the same versions. I'll post it if needed, as it's too long to post here.

Then, since there is no run_finetune_rag.sh, I tried to bash run_finetune_rag_dialdoc.sh. It runs fine until a traceback occurs after the 'validation sanity check'. It seems to be a coding error rather than a package-version error.
Can you give a solution, or just a hint for me to modify it?

Validation sanity check: 0%| | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/datav/my/multidoc2dial/scripts/rag/finetune_rag_dialdoc.py", line 714, in
main(args)
File "/datav/my/multidoc2dial/scripts/rag/finetune_rag_dialdoc.py", line 676, in main
trainer: pl.Trainer = generic_train(
File "/datav/my/multidoc2dial/scripts/rag/lightning_base.py", line 389, in generic_train
trainer.fit(model)
File "/datav/software/anaconda3/envs/py39th19/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 510, in fit
results = self.accelerator_backend.train()
File "/datav/software/anaconda3/envs/py39th19/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py", line 57, in train
return self.train_or_test()
File "/datav/software/anaconda3/envs/py39th19/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py", line 74, in train_or_test
results = self.trainer.train()
File "/datav/software/anaconda3/envs/py39th19/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 532, in train
self.run_sanity_check(self.get_model())
File "/datav/software/anaconda3/envs/py39th19/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 731, in run_sanity_check
_, eval_results = self.run_evaluation(max_batches=self.num_sanity_val_batches)
File "/datav/software/anaconda3/envs/py39th19/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 643, in run_evaluation
output = self.evaluation_loop.evaluation_step(batch, batch_idx, dataloader_idx)
File "/datav/software/anaconda3/envs/py39th19/lib/python3.9/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 171, in evaluation_step
output = self.trainer.accelerator_backend.validation_step(args)
File "/datav/software/anaconda3/envs/py39th19/lib/python3.9/site-packages/pytorch_lightning/accelerators/gpu_accelerator.py", line 73, in validation_step
return self._step(self.trainer.model.validation_step, args)
File "/datav/software/anaconda3/envs/py39th19/lib/python3.9/site-packages/pytorch_lightning/accelerators/gpu_accelerator.py", line 63, in _step
output = model_step(*args)
File "/datav/my/multidoc2dial/scripts/rag/finetune_rag_dialdoc.py", line 321, in validation_step
return self._generative_step(batch)
File "/datav/my/multidoc2dial/scripts/rag/finetune_rag_dialdoc.py", line 363, in _generative_step
generated_ids = self.model.generate(
File "/datav/my/multidoc2dial/dialdoc/models/rag/modeling_rag_dialdoc.py", line 700, in generate
pre_processor = self._get_logits_processor(
TypeError: _get_logits_processor() missing 1 required positional argument: 'logits_processor'

How to generate the prediction file for the shared task?

Hi!

I am working on the shared task and am wondering: is there an existing script in the repo to generate files like the sample predictions file included in the sharedtask folder?

Thanks a lot!

Sharing unseen-domain data

Thanks for sharing the test data, which includes the seen-domain data. Is there any plan to release the unseen-domain part as well?
