
Comments (9)

songfeng commented on June 15, 2024

Hi,

Thank you for the question!

We don't provide a script for generating the prediction file in that format, since there is no restriction on the final output format; it can differ from one submission to another.

However, for the exact output of run_eval_rag_e2e.sh from the baseline code in this repo, there should be a "qid.txt" and a "predictions.txt" that map to "$split.source" by line number. Then we could do something like this:

import json

out = []
# qid.txt and predictions.txt are aligned with $split.source by line number,
# so reading them in parallel pairs each question id with its prediction.
with open('predictions.txt') as fp_p, open('qid.txt') as fp_id:
    for id_, text in zip(fp_id, fp_p):
        out.append({'id': id_.strip(), 'utterance': text.strip()})

with open('output.json', 'w') as fp_out:
    json.dump(out, fp_out, indent=4)
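
The resulting output.json is then a list of id/utterance pairs, roughly like this (the values below are only placeholders):

[
    {"id": "<question id from qid.txt>", "utterance": "<generated response>"},
    {"id": "...", "utterance": "..."}
]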

Does it make sense?

Thanks,
Song


songfeng commented on June 15, 2024

Hi Danny,

The files (e.g., mdd_dev_pub.json) provided on the leaderboard website are meant for evaluation or test time, when annotations such as da and references are not available. So, only the conversational utterances are provided as input. The current baseline model does not use da or references to predict the utterance.

However, a model could utilize those annotations during training in certain ways; it would then also need to predict them at test time, along with or before generating the utterance.
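
For example, a minimal way to read such a file while tolerating the missing keys could look like the sketch below (it assumes mdd_dev_pub.json keeps the same dial_data / turns layout as the training files, which you should verify against the actual file):

import json

# Sketch: iterate over dialogue turns without assuming 'da' or 'references' exist.
# Assumption: same dial_data -> domain -> dialogues -> turns layout as the training data.
with open('mdd_dev_pub.json') as fp:
    data = json.load(fp)

for domain, dialogues in data['dial_data'].items():
    for dial in dialogues:
        for turn in dial['turns']:
            utterance = turn['utterance']
            da = turn.get('da')                      # None at evaluation/test time
            references = turn.get('references', [])  # empty at evaluation/test time
            # build the model input from the utterances only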

Let me know if there are any questions!

Thanks~

-Song


songfeng commented on June 15, 2024

Hi Danny,

Referring to Section 2.2.1 in the paper might help clarify this.

As indicated in the data processing script, we can set $task to either grounding or generation, where grounding corresponds to the task of predicting the grounding span.

Feel free to ping me if there are any questions! Thanks.

-Song


songfeng commented on June 15, 2024

Hi Danny,

For the evaluation, --eval_mode selects the evaluation metrics, not the task. e2e corresponds to text generation metrics such as sacreBLEU, while retrieval corresponds to retrieval metrics such as recall@n at the passage or document level.

What might be helpful to emphasize is that the tasks in our MultiDoc2Dial paper are "predicting" the grounding content or the utterance, using the same approach: a "retriever-reader" model. Even though the prediction is of the grounding span, the approach still uses a BART model (see the RAG paper) to generate the span rather than retrieve it.
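
As a rough sketch of the two metric families (simplified; using the sacrebleu package for e2e and a plain recall@n over retrieved passage ids for retrieval, which may differ in detail from the evaluation code in this repo):

import sacrebleu

# e2e mode: generation quality of the predicted utterances or grounding spans.
def corpus_bleu_score(predictions, references):
    # predictions: list of generated strings; references: list of gold strings
    return sacrebleu.corpus_bleu(predictions, [references]).score

# retrieval mode: recall@n over the ranked passage (or document) ids for one query.
def recall_at_n(retrieved_ids, gold_id, n=5):
    return 1.0 if gold_id in retrieved_ids[:n] else 0.0

Averaging recall_at_n over all queries gives the recall@n reported at the passage or document level.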

Thanks,
Song


DannyLuo-zp commented on June 15, 2024

Hi Song,

Thanks! That makes sense, especially your clarification that the baseline "predicts" the grounding content with a generator instead of predicting (start, end) indices over actual document spans. Really appreciate your help!!

Best,
Danny


DannyLuo-zp commented on June 15, 2024

Hi Song,

Thanks so much for the reply! I truly appreciate your help!

It makes sense to me now how to generate a custom prediction file from the output of run_eval_rag_e2e.sh.
Another quick question: the shared-task input file mdd_dev_pub.json I downloaded from the competition seems to be in a slightly different format from what the baseline model takes (namely, there are no references or da keys in a turn). Is the current baseline script compatible with evaluating on that input and generating predictions, or is it something we have to customize?

Thanks,
Danny


DannyLuo-zp commented on June 15, 2024

Hi Song,

That all makes sense! Thanks for your help!

Best,
Danny


DannyLuo-zp commented on June 15, 2024

Hi Song,

Can I ask another question? Based on my current understanding, run_eval_rag_re.sh does not output grounding predictions (as needed in the shared task), but only retrieval results at the document level. I am wondering how to generate token-level grounding predictions using this baseline model. Thanks a lot!

Best,
Danny


DannyLuo-zp commented on June 15, 2024

Hi Song,

Thanks for your reply! Yes, I have fine-tuned the model on both tasks. Could you clarify one more thing for me? To reproduce the results of Table 4 (evaluation results of Task I, the grounding span generation task) in your paper, I should still set --eval_mode to e2e instead of retrieval. (I think this was what confused me before.) Let me know if this is right! Thanks so much!

Best,
Danny

