devsinghsachan / emdr2

Code and Models for the paper "End-to-End Training of Multi-Document Reader and Retriever for Open-Domain Question Answering" (NeurIPS 2021)

License: Other

Python 91.14% Dockerfile 0.11% C++ 7.66% Cuda 1.09%

information-retrieval natural-language-processing natural-questions nlp open-domain-qa open-domain-question-answering pytorch question-answering triviaqa webq


emdr2's Issues

Need help for applying your code on my own dataset

Hi Devendra, thanks for open-sourcing this great project! I want to apply your code to my own Chinese dataset, but I am unsure how to process it into files like the "Pre-tokenized evidence passages and their titles" and the "Wikipedia evidence passages from DPR paper" that you provide for the open-domain QA tasks. Could you give me some advice on building them? Thanks in advance.
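
For readers with the same question, here is a minimal sketch of one way to produce a DPR-style evidence file. It assumes the layout of DPR's Wikipedia passage file: a tab-separated file with an `id`, `text`, `title` header and passages of about 100 words. The helper names `split_into_passages` and `write_evidence_tsv` are hypothetical and not part of this repository, and the per-character fallback for text without spaces is only a crude stand-in for a real Chinese word segmenter. The pre-tokenized evidence files are a separate step that is not covered here.

```python
import csv
import io


def split_into_passages(text, size=100):
    """Split text into consecutive chunks of at most `size` units.

    Units are whitespace-separated words when the text contains spaces,
    otherwise single characters (assumption: a rough fallback for Chinese).
    """
    has_spaces = " " in text
    units = text.split() if has_spaces else list(text)
    joiner = " " if has_spaces else ""
    return [joiner.join(units[i:i + size]) for i in range(0, len(units), size)]


def write_evidence_tsv(docs, out, size=100):
    """Write (title, text) documents as id/text/title rows; return the row count."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["id", "text", "title"])  # header row as in the DPR passage file
    passage_id = 0
    for title, text in docs:
        for passage in split_into_passages(text, size):
            passage_id += 1  # passage ids start at 1
            writer.writerow([passage_id, passage, title])
    return passage_id


if __name__ == "__main__":
    buffer = io.StringIO()
    write_evidence_tsv([("Example title", "a b c d e")], buffer, size=2)
    print(buffer.getvalue())
```

In practice you would pass an open file handle (with `newline=""` and UTF-8 encoding) instead of the `io.StringIO` buffer used in the demo.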

Looking forward to your code and checkpoints

Fail to reproduce TriviaQA scores with released checkpoints

Hi @DevSinghSachan,

Thanks for sharing the code and resources. I was trying to reproduce the reported results with the released checkpoints, and I'm able to reproduce most of them except for TriviaQA. The reported reader scores are 71.13/71.43 (Dev/Test), but my reproduced scores are 68.6/68.8, which is very close to one of the FiD variants (MSS + DPR retriever, T5 reader). Can you check whether the released checkpoints for TriviaQA are correct?

Besides, I'd also like to know:

I didn't see the code for MSS pretraining. Will you release it as well?
A bert_110m checkpoint is used to initialize the retriever. Where can I download it?
Can you elaborate on how to reproduce the ablations in Table 2 (Our Implementation part)? The difference between (1) FiD (MSS retriever, MSS reader); (2) FiD (MSS retriever, MSS reader) and (3) EMDR2 (MSS retriever, MSS reader) is not clear to me.

Thank you in advance!
Rui

docker doesn't work

question about evidence embedding file

The precomputed evidence embedding file is only 19GB when I download it from Google, and loading it then fails with the following error message:

Unpickling BlockData: /disk2/qby/Desktop/emdr2-main/embedding-path/emdr2-finetuning-embedding/psgs_w100-retriever-nq-emdr2-finetuning-base-topk50-epochs10-bsize64-async-indexer.pkl
Traceback (most recent call last):
  File "tasks/run.py", line 67, in <module>
    main()
  File "/disk2/qby/Desktop/emdr2-main/tasks/openqa/e2eqa/run.py", line 72, in main
    open_retrieval_generative_qa(dataset_cls)
  File "/disk2/qby/Desktop/emdr2-main/tasks/openqa/e2eqa/run.py", line 60, in open_retrieval_generative_qa
    end_of_training_callback_provider=distributed_metrics_func_provider)
  File "/disk2/qby/Desktop/emdr2-main/tasks/openqa/e2eqa/train_e2eqa.py", line 583, in train
    model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider)
  File "/disk2/qby/Desktop/emdr2-main/megatron/training.py", line 134, in setup_model_and_optimizer
    model = get_model(model_provider_func)
  File "/disk2/qby/Desktop/emdr2-main/megatron/training.py", line 43, in get_model
    model = model_provider_func()
  File "/disk2/qby/Desktop/emdr2-main/tasks/openqa/e2eqa/run.py", line 36, in model_provider
    evidence_retriever = PreComputedEvidenceDocsRetriever()
  File "/disk2/qby/Desktop/emdr2-main/megatron/model/emdr2_model.py", line 387, in __init__
    self.precomputed_index_wrapper()
  File "/disk2/qby/Desktop/emdr2-main/megatron/model/emdr2_model.py", line 417, in precomputed_index_wrapper
    self.get_evidence_embedding(args.embedding_path)
  File "/disk2/qby/Desktop/emdr2-main/megatron/model/emdr2_model.py", line 412, in get_evidence_embedding
    load_from_path=True)
  File "/disk2/qby/Desktop/emdr2-main/megatron/data/emdr2_index.py", line 28, in __init__
    self.load_from_file()
  File "/disk2/qby/Desktop/emdr2-main/megatron/data/emdr2_index.py", line 50, in load_from_file
    state_dict = pickle.load(open(self.embedding_path, 'rb'))
_pickle.UnpicklingError: pickle data was truncated
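
The "pickle data was truncated" error usually means the file on disk ends before the pickled object does, which is consistent with an incomplete download. A minimal sketch for checking a downloaded file before starting a full run follows. The helper name `check_pickle` is hypothetical and not part of this repository. Loading the complete embedding file this way needs enough RAM to hold it, and `pickle.load` should only be used on files from a source you trust.

```python
import os
import pickle


def check_pickle(path):
    """Try to unpickle `path`; return (ok, size_in_bytes, error_message)."""
    size = os.path.getsize(path)
    try:
        with open(path, "rb") as handle:
            pickle.load(handle)
    except (pickle.UnpicklingError, EOFError) as exc:
        # A truncated or empty file typically ends up here.
        return False, size, str(exc)
    return True, size, ""


if __name__ == "__main__":
    import sys

    ok, size, message = check_pickle(sys.argv[1])
    print(f"size: {size} bytes, loadable: {ok} {message}")
```

If the check fails, comparing the reported size with the size shown on the download page and downloading the file again is a reasonable first step.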
