multidoc2dial's Introduction

MultiDoc2Dial: Modeling Dialogues Grounded in Multiple Documents

This repository provides data and code for the corresponding paper "MultiDoc2Dial: Modeling Dialogues Grounded in Multiple Documents" (EMNLP 2021) by Song Feng, Siva Sankalp Patel, Hui Wan and Sachindra Joshi. Please cite the paper and star the repository if you find the paper, data and code useful for your work.

@inproceedings{feng2021multidoc2dial,
    title={MultiDoc2Dial: Modeling Dialogues Grounded in Multiple Documents},
    author={Feng, Song and Patel, Siva Sankalp and Wan, Hui and Joshi, Sachindra},
    booktitle={EMNLP},
    year={2021}
}

Installation

Please refer to conda_env.yml for creating a virtual environment.

conda env create -f conda_env.yml

Our scripts require the following environment variables to be set:

  • HF_HOME for caching downloads from Huggingface locally.
  • CHECKPOINTS for saving the checkpoints.
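
For example, a quick sanity check that both variables are set before launching the scripts (a minimal Python sketch, not part of the repository):

import os

# Fail fast if a required environment variable is missing,
# and create the target directory if it does not exist yet.
for var in ("HF_HOME", "CHECKPOINTS"):
    path = os.environ.get(var)
    if not path:
        raise OSError(f"Please export {var} before running the scripts")
    os.makedirs(path, exist_ok=True)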

Data

Please run the following commands to download the data. They will download the document and dialogue data into the folder data/multidoc2dial.

cd scripts
./run_download.sh

Document preprocessing

To segment the documents into passages, please refer to

run_data_preprocessing.sh
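
As background, the paper uses both token-based segmentation (fixed-size token windows) and structure-based segmentation (following document sections). Below is a simplified illustration of the token-based variant with a made-up window size; run_data_preprocessing.sh implements the actual logic.

# Split a document into fixed-size word windows (illustrative only;
# the repository also supports structure-based segmentation).
def segment_by_tokens(text, max_tokens=100):
    tokens = text.split()
    return [
        " ".join(tokens[i : i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]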

Data preprocessing for fine-tuning DPR

If you are finetuning DPR on MultiDoc2Dial, please refer to run_data_preprocessing_dpr.sh to create positive and negative examples in the format of DPR.
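
For reference, Facebook DPR expects a JSON list of training examples shaped roughly as below; the field contents here are illustrative, not taken from the dataset.

# One DPR training example: a query with positive and negative passages.
example = {
    "question": "Hi, I forgot to renew my license. What should I do?",
    "answers": [],
    "positive_ctxs": [
        {"title": "Renewing your license", "text": "If your license expired ..."}
    ],
    "negative_ctxs": [
        {"title": "Vehicle registration", "text": "To register a vehicle ..."}
    ],
    "hard_negative_ctxs": [],
}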

Run Baselines

Finetuning DPR

To finetune DPR, we use Facebook DPR (March 2021 release) with an effective batch size of 128. You can finetune DPR on MultiDoc2Dial data yourself, or use our finetuned version.

If you would like to finetune DPR yourself, please refer to Facebook DPR for detailed instructions.

Or

If you would like to use our finetuned DPR encoders, please use the following paths as the model path to the ctx or question encoder (for instance, in run_converter_modelcard.sh):

  • sivasankalpp/dpr-multidoc2dial-token-question-encoder for the fine-tuned DPR question encoder based on token-segmented document passages
  • sivasankalpp/dpr-multidoc2dial-token-ctx-encoder for the fine-tuned DPR ctx encoder based on token-segmented document passages
  • sivasankalpp/dpr-multidoc2dial-structure-question-encoder for the fine-tuned DPR question encoder based on structure-segmented document passages
  • sivasankalpp/dpr-multidoc2dial-structure-ctx-encoder for the fine-tuned DPR ctx encoder based on structure-segmented document passages
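
For example, the released encoders can be loaded directly with the Transformers DPR classes (a minimal sketch; the query string is illustrative):

import torch
from transformers import DPRQuestionEncoder, DPRQuestionEncoderTokenizer

model_name = "sivasankalpp/dpr-multidoc2dial-token-question-encoder"
tokenizer = DPRQuestionEncoderTokenizer.from_pretrained(model_name)
encoder = DPRQuestionEncoder.from_pretrained(model_name)

# Embed a dialogue query into the retrieval space shared with the ctx encoder.
inputs = tokenizer("Hi, I forgot to renew my license. What should I do?", return_tensors="pt")
with torch.no_grad():
    embedding = encoder(**inputs).pooler_output  # shape (1, 768)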

Using finetuned DPR encoders in RAG

If you obtain your own finetuned DPR checkpoints,

  1. Download the following files from the RAG model cards to the "../data" folder.
  2. Convert your fine-tuned DPR checkpoint and add it to the RAG model. Please refer to run_converter.sh.

OR

If you use our finetuned DPR encoders, please refer to run_converter_modelcard.sh.

Finetuning RAG

Our implementation is based on Huggingface RAG. Please refer to their README for more detailed explanations on document retrieval and finetuning RAG.

To create FAISS index, please refer to

run_kb_index.sh
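
For background, the core of such an index build with the datasets library looks roughly like this (a sketch with random embeddings; the real script computes them with the DPR ctx encoder over the preprocessed passages):

import numpy as np
from datasets import Dataset

# Toy knowledge base with 768-dim float32 embeddings, as DPR produces.
kb = Dataset.from_dict({
    "title": ["doc-a", "doc-b"],
    "text": ["first passage ...", "second passage ..."],
    "embeddings": [np.random.rand(768).astype("float32") for _ in range(2)],
})
kb.add_faiss_index(column="embeddings")  # builds an in-memory FAISS index
scores, nearest = kb.get_nearest_examples("embeddings", np.random.rand(768).astype("float32"), k=1)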

To finetune RAG on MultiDoc2Dial data, please refer to

run_finetune_rag.sh

Evaluations

To evaluate the retrieval results (recall@n at the passage and document level), please refer to

run_eval_rag_re.sh
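
Here, recall@n is the fraction of turns whose gold passage (or document) id appears among the top-n retrieved ids, roughly:

# Minimal recall@n over ranked retrieval results (illustrative helper,
# not the repository's evaluation code).
def recall_at_n(gold_ids, ranked_ids_per_query, n):
    hits = sum(
        1 for gold, ranked in zip(gold_ids, ranked_ids_per_query)
        if gold in ranked[:n]
    )
    return hits / len(gold_ids)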

To evaluate the generation results, please refer to

run_eval_rag_e2e.sh

Results

The evaluation results on the validation set of the agent response generation task are shown below. Please refer to the scripts for the corresponding hyperparameters.

Model         F1     EM    BLEU   r@1    r@5    r@10
D-token-nq    30.9   2.8   15.7   25.8   48.2   57.7
D-struct-nq   31.5   3.2   16.6   27.4   51.1   60.2
D-token-ft    33.2   3.4   18.8   35.2   63.4   72.9
D-struct-ft   33.7   3.5   19.5   37.5   67.0   75.8

Leaderboard

Please check out our leaderboard and Shared Task.

Acknowledgement

Our code is based on Huggingface Transformers. Our dataset is based on Doc2Dial. We thank the authors for sharing their great work.


multidoc2dial's Issues

-nq model results and n_docs

Hello, thank you very much for making this baseline code available. I have tried to reproduce the results for the -nq and -ft models from the paper for Task I and Task II. I am getting lower results for F1, EM, and BLEU, especially for the -nq models. I have been reviewing the code to find any bugs that I may have in my own implementation, and came across two questions:

  1. For the -nq models, do you use dpr-question_encoder-single-nq-base and dpr-ctx_encoder-single-nq-base from Huggingface?
  2. Additionally, the uploaded code uses n_docs = 5 for fine-tuning and n_docs = 10 in retrieval. Was this what was used in the paper, or should I be using the same number of docs (5 or 10) for both?

Thank you again for your help!

Error in running converter

Hi, thank you for your impressive dataset. I encountered some issues when running the DPR converter.

I have downloaded the checkpoint "checkpoint.retriever.single.nq.bert-base-encoder" from the DPR official repo, and encountered a missing-key error when running run_converter.sh. It seems that the only difference between "convert_dpr_original_checkpoint_to_pytorch.py" in this repo and in Huggingface DPR is the line

key = key.replace("bert_model.encoder", "bert_model")

which causes the code to fail.

So is this line of code indeed unnecessary, or is there a mistake in my usage?
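
(For context, a converter of this kind walks the DPR checkpoint's state dict and renames keys to match the Huggingface module layout; a rough sketch, assuming the official checkpoint stores its weights under "model_dict":)

import torch

state = torch.load("checkpoint.retriever.single.nq.bert-base-encoder", map_location="cpu")
remapped = {}
for key, value in state["model_dict"].items():
    # The replacement at issue: drop the extra "encoder" level in key names.
    remapped[key.replace("bert_model.encoder", "bert_model")] = value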

Performance on BM25 retrieval baseline

I am running run_eval_rag_re.sh on the BM25 baseline and seeing much higher retrieval results than expected:

4168it [08:02,  9.60it/s]INFO:__main__:Using BM25 for retrieval
4176it [08:02,  9.88it/s]INFO:__main__:Using BM25 for retrieval
4184it [08:03,  8.91it/s]INFO:__main__:Using BM25 for retrieval
4192it [08:04, 10.06it/s]INFO:__main__:Using BM25 for retrieval
4201it [08:05,  8.65it/s]
INFO:__main__:Using BM25 for retrieval
INFO:__main__:Doc_Prec@1:  43.18
INFO:__main__:Doc_Prec@5:  67.20
INFO:__main__:Doc_Prec@10:  74.53
INFO:__main__:Pid_Prec@5:  19.45
INFO:__main__:Pid_Prec@5:  40.75
INFO:__main__:Pid_Prec@10:  48.56
INFO:__main__:all:  43.18 &  67.20 &  74.53  &  19.45 &  40.75 &  48.56 &

Settings:
domain=all seg=token score=original task=grounding split=val

Additional parameters:
--bm25 ../data/mdd_kb/mdd-$seg-$domain.csv

Input files are generated by predecessor scripts with the same settings:
Data files are generated by run_data_preprocessing.sh.
Index files are generated by run_kb_index.sh.
Checkpoints are generated by run_finetune_rag.sh, with DPR checkpoints generated by run_converter.sh on finetuned DPR checkpoints.
(And if I am not mistaken, although required by the code, the RAG checkpoint does not affect the results of run_eval_rag_re.sh when bm25 is given.)

So any mistake in my usage or understanding?

By the way, I am a bit confused about the grounding span generation task (Table 4) in the paper.
Does it correspond to the result of run_eval_rag_re.sh? But that doesn't contain F1, EM and BL.
Also, does D^token-rr-cls-ft mean joint training of the DPR question encoder and the RAG generator,
while D^token-ft uses the finetuned DPR directly?
I would appreciate it if you could clarify these points.

Question about data preprocessing

Hi, I have a question about the data preprocessing (data_preprocessor.py) and want to seek some help:
when calling rm_blank on the grounding, why is is_shorten set to True?
Thanks!


question about data preprocessing

Hi, when I try to run your code, I found I can't download the datasets using load_dataset. The error is "HF google storage unreachable. Downloading and preparing it from source". Although I have used a VPN, the problem is still there. So I want to download the data manually, but I found the data is mismatched in some fields. Could you help me? Thanks.

Question about using multiple GPUs

Hi! I'm having some trouble using multiple GPUs with the run_finetune_rag_dialdoc.sh file.

I have set the --gpus parameter to 4, but I kept getting errors as below.

ValueError: ProcessGroupGloo::scatter: invalid tensor type at index 0 (expected TensorOptions(dtype=double, device=cpu, layout=Strided, requires_grad=false (default), pinned_memory=false (default), memory_format=(nullopt)), got TensorOptions(dtype=float, device=cpu, layout=Strided, requires_grad=false (default), pinned_memory=false (default), memory_format=(nullopt)))

So I modified line 159 in the dialdoc/models/rag/distributed_pytorch_retriever.py file by not specifying the target_type variable:
retrieved_doc_embeds = self._scattered(scatter_vectors, [n_queries, n_docs, combined_hidden_states.shape[1]])

After this modification, I am getting the errors below, and I couldn't figure out why.

File "/home/yunah/multidoc2dial_ours/dialdoc/models/rag/distributed_pytorch_retriever.py", line 157, in retrieve
doc_ids = self._scattered(scatter_ids, [n_queries, n_docs], target_type=torch.int64)
File "/home/yunah/multidoc2dial_ours/dialdoc/models/rag/distributed_pytorch_retriever.py", line 82, in _scattered
dist.scatter(target_tensor, src=0, scatter_list=scatter_list, group=self.process_group)
File "/home/yunah/.conda/envs/multidoc2dial/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py", line 2191, in scatter
work = group.scatter(output_tensors, input_tensors, opts)
ValueError: ProcessGroupGloo::scatter: Incorrect input list size 1. Input list size should be 2, same as size of the process group.
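
(For reference, both errors above violate one of the constraints of torch.distributed.scatter, sketched below: the output tensor's dtype must match the tensors in scatter_list, and on the source rank the list must contain exactly one tensor per process. This is an illustrative helper, not the repository's code.)

import torch
import torch.distributed as dist

def scatter_doubles(scatter_list, shape, group=None):
    # Output dtype must match the scattered tensors (first error: float vs double).
    out = torch.empty(*shape, dtype=torch.float64)
    if dist.get_rank() == 0:
        # The source rank must supply one tensor per process (second error).
        assert len(scatter_list) == dist.get_world_size()
        scatter_list = [t.to(torch.float64) for t in scatter_list]
    else:
        scatter_list = None  # only the source rank provides the list
    dist.scatter(out, scatter_list=scatter_list, src=0, group=group)
    return out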

Did I miss any other variables or settings I should change before using multiple GPUs?
I would like to know if there is a solution for this error.
Thanks a lot!

Best,
Yunah

How to reproduce the retrieval results in Table 5?

Hi,

It is really a nice and cool dataset. I am wondering how to reproduce the results in Table 5 from your paper, i.e. the retrieval results on the validation set. I searched your codebase but found no script to do that (or did I miss something?).


Thanks for any help.

Yiwei

About data download

It seems that the link returns a 404 and the data cannot be downloaded. Where can I download the data?

no run_finetune_rag.sh and missing positional argument: 'logits_processor'

As the title says, there is no run_finetune_rag.sh.
I've done everything above it in README.md, as follows:
export HF_HOME and CHECKPOINTS (many files have been downloaded to the cache and checkpoint directories, and it seems to work)
cd scripts
./run_download.sh
run_data_preprocessing.sh
run_kb_index.sh
Up to that point, there was no traceback.
By the way, I didn't create the same conda env as conda_env.yml does, but I have confirmed the main packages are the same versions. I'll post it if needed, as it's too long to post here.

Then, since there is no run_finetune_rag.sh, I tried to bash run_finetune_rag_dialdoc.sh. It runs fine until a traceback occurs after the 'validation sanity check'. It seems to be a coding error rather than a package-version error.
Can you give a solution, or just a hint for me to modify it?

Validation sanity check: 0%| | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/datav/my/multidoc2dial/scripts/rag/finetune_rag_dialdoc.py", line 714, in
main(args)
File "/datav/my/multidoc2dial/scripts/rag/finetune_rag_dialdoc.py", line 676, in main
trainer: pl.Trainer = generic_train(
File "/datav/my/multidoc2dial/scripts/rag/lightning_base.py", line 389, in generic_train
trainer.fit(model)
File "/datav/software/anaconda3/envs/py39th19/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 510, in fit
results = self.accelerator_backend.train()
File "/datav/software/anaconda3/envs/py39th19/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py", line 57, in train
return self.train_or_test()
File "/datav/software/anaconda3/envs/py39th19/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py", line 74, in train_or_test
results = self.trainer.train()
File "/datav/software/anaconda3/envs/py39th19/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 532, in train
self.run_sanity_check(self.get_model())
File "/datav/software/anaconda3/envs/py39th19/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 731, in run_sanity_check
_, eval_results = self.run_evaluation(max_batches=self.num_sanity_val_batches)
File "/datav/software/anaconda3/envs/py39th19/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 643, in run_evaluation
output = self.evaluation_loop.evaluation_step(batch, batch_idx, dataloader_idx)
File "/datav/software/anaconda3/envs/py39th19/lib/python3.9/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 171, in evaluation_step
output = self.trainer.accelerator_backend.validation_step(args)
File "/datav/software/anaconda3/envs/py39th19/lib/python3.9/site-packages/pytorch_lightning/accelerators/gpu_accelerator.py", line 73, in validation_step
return self._step(self.trainer.model.validation_step, args)
File "/datav/software/anaconda3/envs/py39th19/lib/python3.9/site-packages/pytorch_lightning/accelerators/gpu_accelerator.py", line 63, in _step
output = model_step(*args)
File "/datav/my/multidoc2dial/scripts/rag/finetune_rag_dialdoc.py", line 321, in validation_step
return self._generative_step(batch)
File "/datav/my/multidoc2dial/scripts/rag/finetune_rag_dialdoc.py", line 363, in _generative_step
generated_ids = self.model.generate(
File "/datav/my/multidoc2dial/dialdoc/models/rag/modeling_rag_dialdoc.py", line 700, in generate
pre_processor = self._get_logits_processor(
TypeError: _get_logits_processor() missing 1 required positional argument: 'logits_processor'

How to generate the prediction file for the shared task?

Hi!

I am working on the shared task and am wondering: is there an existing script in the repo to generate files like the sample predictions file included in the sharedtask folder?

Thanks a lot!

Sharing unseen-domain data

Thanks for sharing the test data, which includes the seen-domain data. Is there any plan to release the unseen-domain part as well?
