hazyresearch / ama_prompting

Ask Me Anything language model prompting

License: Apache License 2.0

Python 99.94% Shell 0.06%

ama_prompting's Issues

Clarify the definition of "good-quality" prompt and "imperfect" prompt

Dear authors,

I would like to know the definitions of a "good-quality" prompt and an "imperfect" prompt. Is my understanding correct?

Good-quality prompt = a prompt that mostly contains ground-truth question-answer pairs (the answers tend to be correct).
Imperfect prompt = a prompt that contains many false question-answer pairs (the answers are more likely to be incorrect).

Thanks,
Dylan

Demonstration construction of SST2

Hi,

This work is amazing; thanks for providing the code!
I have some questions about it.

In line 47 of SST2_final.py there is no ".sample(frac=1, random_state=0)" call, so in the few-shot baseline all demonstrations seem to be ordered by their labels.
For NQ and WebQs, the evaluation metric is span-overlap accuracy rather than exact match. Do the GPT-3 results you compare against use the same metric?
Can I change the order of demonstrations in the prompt chain? I tried this on WSC and found that the order can affect the evaluation results.
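A minimal sketch of the seeded shuffle the first question refers to, using only the standard library. Here random.Random(0).shuffle stands in for pandas' .sample(frac=1, random_state=0) (the two produce different permutations for the same seed), and the review texts and the order_demos / format_prompt helpers are hypothetical, not taken from SST2_final.py.

```python
import random

# Hypothetical SST-2 style demonstrations, grouped by label
# (the situation described above, where no shuffle step is applied).
DEMOS = [
    ("a gripping, beautifully shot film", "positive"),
    ("warm and funny from start to finish", "positive"),
    ("one of the best performances this year", "positive"),
    ("a tedious, overlong mess", "negative"),
    ("the plot never comes together", "negative"),
    ("flat characters and clumsy dialogue", "negative"),
]


def order_demos(pool, seed=None):
    """Return a copy of pool; if seed is given, shuffle it reproducibly."""
    ordered = list(pool)
    if seed is not None:
        # A dedicated Random instance keeps the shuffle independent of global state.
        random.Random(seed).shuffle(ordered)
    return ordered


def format_prompt(pool):
    """Render demonstrations as a simple few-shot prompt string."""
    return "\n\n".join(f"Review: {text}\nSentiment: {label}" for text, label in pool)


label_ordered = order_demos(DEMOS)
shuffled = order_demos(DEMOS, seed=0)
print([label for _, label in label_ordered])  # all positives, then all negatives
print([label for _, label in shuffled])       # same labels, seeded order
print(format_prompt(shuffled))
```

Calling order_demos with several different seeds gives several orderings of the same demonstrations, which is one simple way to measure how sensitive a task such as WSC is to demonstration order.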

Thanks so much
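To make the exact-match versus overlap distinction in the second question concrete, here is a small sketch. The normalize function follows the widely used SQuAD-style answer normalization; overlap_match is one hypothetical lenient metric (whole-word containment in either direction), not necessarily the span-overlap accuracy implemented in this repository, so the task scripts remain the authority on what is actually computed.

```python
import re
import string

_PUNCT = set(string.punctuation)


def normalize(text):
    """SQuAD-style normalization: lowercase, strip punctuation and articles, collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(pred, golds):
    """True if the normalized prediction equals any normalized gold answer."""
    p = normalize(pred)
    return any(p == normalize(g) for g in golds)


def overlap_match(pred, golds):
    """Hypothetical lenient metric: a normalized gold answer appears as a whole-word
    span inside the prediction, or the prediction appears inside a gold answer."""
    p = normalize(pred)
    for g in map(normalize, golds):
        if p and g and (f" {g} " in f" {p} " or f" {p} " in f" {g} "):
            return True
    return False


def accuracy(metric, preds, gold_lists):
    """Fraction of examples the given metric marks as correct."""
    return sum(metric(p, gs) for p, gs in zip(preds, gold_lists)) / len(preds)


preds = ["The capital is Paris.", "1969", "Mount Everest"]
golds = [["Paris"], ["1969"], ["K2"]]
print(accuracy(exact_match, preds, golds))    # 1 of 3 correct
print(accuracy(overlap_match, preds, golds))  # 2 of 3 correct
```

A lenient metric like this rewards verbose but correct generations, so numbers computed with it are generally not directly comparable to exact-match numbers.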

Problem when running tasks/RTE_final.py

I followed the instructions in the README and hit this error. Do you know how to solve it?

ama_prompting/manifest/manifest/manifest.py", line 104, in __init__
raise ValueError(f"{list(kwargs.items())} arguments are not recognized.")
ValueError: [('session_id', None)] arguments are not recognized.
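The last line of the traceback says the constructor received a keyword argument (session_id) that it does not handle. The sketch below reproduces that pattern with a hypothetical ToyClient class whose accepted options (engine, cache_path) are invented for illustration; it is not the manifest code. One plausible cause, not confirmed here, is a mismatch between the manifest version the task scripts expect and the version that is installed.

```python
class ToyClient:
    """Hypothetical client that, like the check in the traceback, rejects unknown kwargs."""

    # Options this hypothetical version understands (invented for illustration).
    KNOWN_OPTIONS = {"engine", "cache_path"}

    def __init__(self, **kwargs):
        # Consume the options this version understands...
        self.config = {k: kwargs.pop(k) for k in list(kwargs) if k in self.KNOWN_OPTIONS}
        # ...and refuse whatever is left over.
        if kwargs:
            raise ValueError(f"{list(kwargs.items())} arguments are not recognized.")


try:
    ToyClient(engine="toy", session_id=None)
except ValueError as err:
    print(err)  # [('session_id', None)] arguments are not recognized.

client = ToyClient(engine="toy")  # succeeds once the unsupported keyword is gone
print(client.config)              # {'engine': 'toy'}
```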

DROP Benchmark metrics seem low

Hi, I ran

python3 tasks/drop_final.py \
  --run_zeroshot 1 \
  --run_fewshot 1 \
  --run_decomp 1 \
  --num_boost 3 \
  --k_shot 3 \
  --output_metrics_file ../ama_logs/metrics.json \
  --cache_connection ../ama_logs/manifest_cache.sqlite \
  --save_dir ../ama_logs/ama_final_runs

and I see:

Accuracy Zero Shot 0.10691393700228803
Accuracy Few Shot 0.21785939533525883
Accuracy by Boost Set Decomposed [0.1622917936425949, 0.16476214293285982, 0.17488727059013848]
Accuracy by Boost Set Decomposed Average 0.1673137357218644
Accuracy Boost Decomposed 0.16522288996755874
Saved metrics to ../ama_logs/metrics.json
Saved final data to ../ama_logs/ama_final_runs/drop

These seem lower than the numbers reported in the paper. Is that expected?
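The log reports both an "Average" over boost sets and a separate "Accuracy Boost Decomposed" figure. A quick check confirms the average line is the arithmetic mean of the three per-set accuracies. The aggregated figure presumably scores one combined prediction per example; the AMA paper describes combining prompt outputs with weak supervision, but the sketch below swaps in plain majority vote (a simpler technique, used only for illustration) on hypothetical toy data, to show that an aggregated accuracy can come out below the per-prompt average, as it does in this run (0.1652 vs 0.1673).

```python
from collections import Counter
from statistics import mean

# Per-boost-set accuracies copied from the DROP run above.
boost_accs = [0.1622917936425949, 0.16476214293285982, 0.17488727059013848]
print(mean(boost_accs))  # matches the "Average" line up to float rounding


def majority_vote(answers):
    """Most common answer across prompts; ties go to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]


def accuracy(preds, gold):
    """Fraction of predictions that exactly equal the gold answer."""
    return sum(p == g for p, g in zip(preds, gold)) / len(gold)


# Hypothetical toy data: three prompts, three examples.
gold = ["4", "7", "12"]
prompt_preds = [
    ["4", "8", "10"],  # prompt 1: right on example 1 only
    ["5", "7", "11"],  # prompt 2: right on example 2 only
    ["5", "8", "12"],  # prompt 3: right on example 3 only
]
per_prompt = [accuracy(p, gold) for p in prompt_preds]
voted = [majority_vote(list(col)) for col in zip(*prompt_preds)]
print(per_prompt)             # each prompt scores 1/3
print(accuracy(voted, gold))  # 0.0: every vote lands on a wrong answer
```

When the prompts are wrong in correlated ways (here two prompts agree on the same wrong answer), voting amplifies the shared error, which is one reason aggregation does not always beat the average.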

apex==0.1 can't be installed
