
ama_prompting's People

Contributors

anarayan, eltociear, erjanmx, lorr1, simran-arora, thedch

ama_prompting's Issues

Demonstration construction of SST2

Hi,

This work is amazing, thanks for providing the code!
I have some questions about it.

  1. In line 47 of SST2_final.py, there is no ".sample(frac=1, random_state=0)" call, so the demonstrations in the few-shot baseline appear to be ordered by label rather than shuffled.
  2. For NQ and WebQs, the evaluation metric is span-overlap accuracy rather than exact match. Does GPT-3 use the same metric?
  3. Can I change the order of the demonstrations in the prompt chain? I tried this on WSC and found that the order affects the evaluation results.

Thanks so much
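
For readers following question 2, the difference between the two metrics can be sketched as below. This is a minimal illustration, not the repository's or GPT-3's evaluation code: the function names and the lowercase/whitespace normalization are assumptions made for the example.

```python
from typing import Sequence


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (an assumed, simplified normalization)."""
    return " ".join(text.lower().split())


def exact_match(prediction: str, golds: Sequence[str]) -> bool:
    """True only if the whole prediction equals one of the gold answers."""
    pred = normalize(prediction)
    return any(pred == normalize(g) for g in golds)


def span_overlap(prediction: str, golds: Sequence[str]) -> bool:
    """True if any gold answer appears as a substring of the prediction."""
    pred = normalize(prediction)
    return any(normalize(g) in pred for g in golds)


prediction = "The capital is Paris"
golds = ["Paris"]
print(exact_match(prediction, golds))   # False: extra words break the match
print(span_overlap(prediction, golds))  # True: the gold span is contained
```

Under these definitions every exact match also counts as a span overlap, so span-overlap scores are at least as high as exact-match scores on the same predictions and the two are not directly comparable.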

Clarify the definition of "good-quality" prompt and "imperfect" prompt

Dear authors,

I would like to know how "good-quality" and "imperfect" prompts are defined. Is my understanding correct?

Good-quality prompt = a prompt that mostly contains ground-truth question-answer pairs (the answers tend to be correct).
Imperfect prompt = a prompt that contains many false question-answer pairs (the answers are more likely to be incorrect).

Thanks,
Dylan

DROP Benchmark metrics seem low

Hi, I ran

python3 tasks/drop_final.py \
  --run_zeroshot 1 \
  --run_fewshot 1 \
  --run_decomp 1 \
  --num_boost 3 \
  --k_shot 3 \
  --output_metrics_file ../ama_logs/metrics.json \
  --cache_connection ../ama_logs/manifest_cache.sqlite \
  --save_dir ../ama_logs/ama_final_runs

and I see:

Accuracy Zero Shot 0.10691393700228803
Accuracy Few Shot 0.21785939533525883
Accuracy by Boost Set Decomposed [0.1622917936425949, 0.16476214293285982, 0.17488727059013848]
Accuracy by Boost Set Decomposed Average 0.1673137357218644
Accuracy Boost Decomposed 0.16522288996755874
Saved metrics to ../ama_logs/metrics.json
Saved final data to ../ama_logs/ama_final_runs/drop

This seems lower than the numbers in the paper. Is that expected?
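
As a sanity check on the log above, the "Average" line is the mean of the three per-boost-set accuracies. The aggregated "Accuracy Boost Decomposed" figure is a separate number. The sketch below assumes a plain majority vote purely for illustration (the thread does not show how the repository aggregates), and the toy labels are made up to show that an aggregated accuracy need not equal the mean.

```python
from collections import Counter

# Per-boost-set accuracies copied from the log above.
boost_accuracies = [0.1622917936425949, 0.16476214293285982, 0.17488727059013848]
mean_accuracy = sum(boost_accuracies) / len(boost_accuracies)
print(mean_accuracy)  # close to the logged 0.1673137357218644


def majority_vote(votes):
    """Return the most common label among the votes for one example."""
    return Counter(votes).most_common(1)[0][0]


def accuracy(predictions, gold):
    """Fraction of predictions that equal the gold label."""
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


# Toy data (hypothetical): three prompt chains, four examples.
gold = [1, 1, 1, 1]
chain_predictions = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 0],
]
per_chain = [accuracy(p, gold) for p in chain_predictions]
aggregated = [majority_vote(votes) for votes in zip(*chain_predictions)]
toy_mean = sum(per_chain) / len(per_chain)
toy_aggregated = accuracy(aggregated, gold)
print(toy_mean, toy_aggregated)
```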

Problem when running tasks/RTE_final.py

I followed the instructions in the README and ran into this error. Do you know how to solve it?

ama_prompting/manifest/manifest/manifest.py", line 104, in init
raise ValueError(f"{list(kwargs.items())} arguments are not recognized.")
ValueError: [('session_id', None)] arguments are not recognized.
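
The message format suggests a constructor that rejects any keyword arguments left over after it has consumed the ones it knows. A minimal sketch of that pattern follows. The class and parameter names are hypothetical and this is not the manifest library's actual code, but it reproduces the same message when an unexpected session_id is passed.

```python
class HypotheticalClient:
    """Hypothetical stand-in for a class that validates its keyword arguments."""

    def __init__(self, client_name: str = "openai", **kwargs):
        self.client_name = client_name
        # Consume the keyword arguments this (hypothetical) version knows about.
        self.cache_name = kwargs.pop("cache_name", "noop")
        # Anything still left in kwargs is unknown to this version.
        if kwargs:
            raise ValueError(f"{list(kwargs.items())} arguments are not recognized.")


try:
    HypotheticalClient(session_id=None)
except ValueError as err:
    message = str(err)
    print(message)
```

If that reading is right, the calling script and the installed manifest version may disagree about which arguments exist, though nothing in this thread confirms the cause.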
