ama_prompting's Issues

Clarify the definition of "good-quality" prompt and "imperfect" prompt

Dear authors,

Could you clarify how you define a "good-quality" prompt and an "imperfect" prompt? Is my understanding below correct?

Good-quality prompt = a prompt that mostly contains ground-truth question-answer pairs (the answers tend to be correct).
Imperfect prompt = a prompt that contains many false question-answer pairs (the answers are more likely to be incorrect).

Thanks,
Dylan

Demonstration construction of SST2

Hi,

This work is amazing, thanks for providing the code!
I have some questions about it.

  1. Line 47 of SST2_final.py has no ".sample(frac=1, random_state=0)" call, so all demonstrations in the few-shot baseline seem to be ordered by label.
  2. For NQ and WebQs, the evaluation metric is span-overlap accuracy rather than exact match. Does GPT-3 use the same metric?
  3. Can I change the order of demonstrations in the prompt chain? I tried this on WSC and found that the order affects the evaluation results.

Thanks so much
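
To make the distinction in question 2 concrete, here is a minimal sketch of the two metrics. It is not taken from the repository: the function names, the normalization steps, and the substring-based definition of "span overlap" are all assumptions chosen for illustration, and the actual evaluation code may differ.

```python
import re
import string
from typing import List


def normalize(text: str) -> str:
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, answers: List[str]) -> bool:
    """True only if the whole normalized prediction equals some gold answer."""
    pred = normalize(prediction)
    return any(pred == normalize(ans) for ans in answers)


def span_overlap(prediction: str, answers: List[str]) -> bool:
    """True if some non-empty normalized gold answer occurs inside the prediction.

    Assumption: "span overlap" is modeled here as a plain substring test.
    """
    pred = normalize(prediction)
    return any(normalize(ans) in pred for ans in answers if normalize(ans))


# Hypothetical example: a verbose but correct generation.
prediction = "The capital is Paris, France."
gold = ["Paris"]
print(exact_match(prediction, gold))   # False
print(span_overlap(prediction, gold))  # True
```

Under these assumptions, span overlap is the more lenient metric: a verbose generation that contains the gold answer counts as correct, so scores under the two metrics are not directly comparable.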

Problem when running tasks/RTE_final.py

I followed the instructions in the README and ran into this error. Do you know how to solve it?

ama_prompting/manifest/manifest/manifest.py", line 104, in __init__
raise ValueError(f"{list(kwargs.items())} arguments are not recognized.")
ValueError: [('session_id', None)] arguments are not recognized.
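
For readers hitting the same message, the sketch below reproduces the general pattern that yields this kind of error: a constructor consumes the keyword arguments it knows about and rejects whatever is left over. ToyClient and its parameters are hypothetical stand-ins, not the real Manifest class. A plausible cause, offered only as an assumption, is that the calling script passes an option (here session_id) that the installed library version does not accept.

```python
class ToyClient:
    """Hypothetical stand-in for a client whose constructor validates kwargs."""

    def __init__(self, client_name: str = "toy", **kwargs):
        self.client_name = client_name
        # Consume the options this (hypothetical) version understands.
        self.cache_name = kwargs.pop("cache_name", "noop")
        # Anything still left in kwargs is unknown to this version.
        if kwargs:
            raise ValueError(f"{list(kwargs.items())} arguments are not recognized.")


# A recognized option is accepted.
client = ToyClient(cache_name="sqlite")

# An unrecognized option triggers the same style of message as in the report.
try:
    ToyClient(session_id=None)
except ValueError as err:
    message = str(err)
    print(message)
```

If the cause is a version mismatch of this kind, the usual remedies are to install the library version the scripts were written against or to stop passing the unrecognized argument.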

DROP Benchmark metrics seem low

Hi, I ran

python3 tasks/drop_final.py \
  --run_zeroshot 1 \
  --run_fewshot 1 \
  --run_decomp 1 \
  --num_boost 3 \
  --k_shot 3 \
  --output_metrics_file ../ama_logs/metrics.json \
  --cache_connection ../ama_logs/manifest_cache.sqlite \
  --save_dir ../ama_logs/ama_final_runs

and I see:

Accuracy Zero Shot 0.10691393700228803
Accuracy Few Shot 0.21785939533525883
Accuracy by Boost Set Decomposed [0.1622917936425949, 0.16476214293285982, 0.17488727059013848]
Accuracy by Boost Set Decomposed Average 0.1673137357218644
Accuracy Boost Decomposed 0.16522288996755874
Saved metrics to ../ama_logs/metrics.json
Saved final data to ../ama_logs/ama_final_runs/drop

These numbers seem lower than those reported in the paper. Is that expected?
