
Comments (9)

songfeng commented on June 15, 2024

Hi,

Thank you for the question!

We don't provide a script for generating the prediction file in that format, since there is no restriction on the final output format; it can differ from one submission to another.

However, for the exact output of run_eval_rag_e2e.sh from the baseline code in this repo, there should be a "qid.txt" and a "predictions.txt" that map to "$split.source" by line number. Then we could do something like this:

import json

out = []
# qid.txt and predictions.txt are aligned with $split.source by line number,
# so reading them in parallel pairs each question id with its prediction.
with open('predictions.txt') as fp_p, open('qid.txt') as fp_id:
    for id_, text in zip(fp_id, fp_p):
        out.append({'id': id_.strip(), 'utterance': text.strip()})

with open('output.json', 'w') as fp_out:
    json.dump(out, fp_out, indent=4)
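
The resulting output.json is then a list of id/utterance pairs, roughly like this (the values below are only placeholders):

[
    {"id": "<question id from qid.txt>", "utterance": "<generated response>"},
    {"id": "...", "utterance": "..."}
]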

Does it make sense?

Thanks,
Song


songfeng commented on June 15, 2024

Hi Danny,

The files (e.g., mdd_dev_pub.json) provided on the leaderboard website are meant for evaluation or test time, when annotations such as da and references are not available. So, only the conversational utterances are provided as input. The current baseline model does not use da or references to predict the utterance.

However, a model could utilize those annotations during training in certain ways; it would then also need to predict them at test time, along with or before generating the utterance.
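
For example, a minimal way to read such a file while tolerating the missing keys could look like the sketch below (it assumes mdd_dev_pub.json keeps the same dial_data / turns layout as the training files, which you should verify against the actual file):

import json

# Sketch: iterate over dialogue turns without assuming 'da' or 'references' exist.
# Assumption: same dial_data -> domain -> dialogues -> turns layout as the training data.
with open('mdd_dev_pub.json') as fp:
    data = json.load(fp)

for domain, dialogues in data['dial_data'].items():
    for dial in dialogues:
        for turn in dial['turns']:
            utterance = turn['utterance']
            da = turn.get('da')                      # None at evaluation/test time
            references = turn.get('references', [])  # empty at evaluation/test time
            # build the model input from the utterances only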

Let me know if there are any questions!

Thanks~

-Song


songfeng commented on June 15, 2024

Hi Danny,

Referring to Section 2.2.1 in the paper might help clarify this.

As indicated in the data processing script, we can set $task to either grounding or generation, where grounding corresponds to the task of predicting the grounding span.

Feel free to ping me if there are any questions! Thanks.

-Song


songfeng commented on June 15, 2024

Hi Danny,

For the evaluation, --eval_mode selects the evaluation metrics, not the task. e2e corresponds to text generation metrics such as sacreBLEU, while retrieval corresponds to retrieval metrics such as recall@n at the passage or document level.

What might be helpful to emphasize is that the tasks in our MultiDoc2Dial paper are "predicting" the grounding content or the utterance, using the same approach: a "retriever-reader" model. Even though the prediction is of the grounding span, the approach still uses a BART model (see the RAG paper) to generate the span rather than retrieve it.
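
As a rough sketch of the two metric families (simplified; using the sacrebleu package for e2e and a plain recall@n over retrieved passage ids for retrieval, which may differ in detail from the evaluation code in this repo):

import sacrebleu

# e2e mode: generation quality of the predicted utterances or grounding spans.
def corpus_bleu_score(predictions, references):
    # predictions: list of generated strings; references: list of gold strings
    return sacrebleu.corpus_bleu(predictions, [references]).score

# retrieval mode: recall@n over the ranked passage (or document) ids for one query.
def recall_at_n(retrieved_ids, gold_id, n=5):
    return 1.0 if gold_id in retrieved_ids[:n] else 0.0

Averaging recall_at_n over all queries gives the recall@n reported at the passage or document level.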

Thanks,
Song


DannyLuo-zp commented on June 15, 2024

Hi Song,

Thanks! That makes sense, especially your clarification that the baseline "predicts" the grounding content with a generator instead of predicting (start, end) indices over actual document spans. Really appreciate your help!!

Best,
Danny


DannyLuo-zp commented on June 15, 2024

Hi Song,

Thanks so much for the reply! I truly appreciate your help!

It makes sense to me now how to generate a custom prediction file from the output of run_eval_rag_e2e.sh.
Another quick question: the shared-task input file mdd_dev_pub.json I downloaded from the competition seems to be in a slightly different format from what the baseline model takes (namely, there are no references or da keys in a turn). Is the current baseline script compatible with evaluating on that input and generating predictions, or is it something we have to customize?

Thanks,
Danny


DannyLuo-zp commented on June 15, 2024

Hi Song,

That all makes sense! Thanks for your help!

Best,
Danny


DannyLuo-zp commented on June 15, 2024

Hi Song,

Can I ask another question? Based on my current understanding, run_eval_rag_re.sh does not output grounding predictions (as needed in the shared task), but only retrieval results at the document level. I am wondering how to generate token-level grounding predictions using this baseline model. Thanks a lot!

Best,
Danny


DannyLuo-zp commented on June 15, 2024

Hi Song,

Thanks for your reply! Yes, I have fine-tuned the model on both tasks. Could you clarify one more thing for me? To reproduce the results of Table 4 (evaluation results of Task I, the grounding span generation task) in your paper, I should still set --eval_mode to e2e instead of retrieval. (I think this was what confused me before.) Let me know if this is right! Thanks so much!

Best,
Danny

