GPTScore: Evaluate as You Desire

This repository contains the source code for the paper GPTScore: Evaluate as You Desire.

What is GPTScore?

GPTScore is a novel evaluation framework that utilizes the emergent abilities (e.g., zero-shot instruction) of Generative Pre-Trained models to Score generated texts.
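In the paper, the score of a generated text h is a weighted sum of the log-probabilities the PLM assigns to each of its tokens, conditioned on a prompt built from a template T over the task description d, the aspect definition a, and the context S: GPTScore(h | d, a, S) = Σ_t w_t · log p(h_t | h_<t, T(d, a, S)). The sketch below assumes the per-token log-probabilities have already been obtained from some PLM and defaults to uniform weights, so every token counts equally. The function name `gptscore` is hypothetical and is not part of this repository.

```python
import math
from typing import Optional, Sequence


def gptscore(token_logprobs: Sequence[float],
             weights: Optional[Sequence[float]] = None) -> float:
    """Weighted sum of per-token log-probabilities of a hypothesis h.

    token_logprobs[t] is assumed to already hold log p(h_t | h_<t, prompt).
    With weights=None every token gets weight 1/len(h), so the result is
    the mean token log-probability (higher means "more likely" under the PLM).
    """
    if not token_logprobs:
        raise ValueError("hypothesis must contain at least one token")
    n = len(token_logprobs)
    if weights is None:
        weights = [1.0 / n] * n  # uniform weighting (assumption)
    if len(weights) != n:
        raise ValueError("weights and token_logprobs must have the same length")
    return sum(w * lp for w, lp in zip(weights, token_logprobs))


# Toy example: token probabilities 0.5, 0.25, 0.125 for a 3-token hypothesis.
logprobs = [math.log(0.5), math.log(0.25), math.log(0.125)]
print(round(gptscore(logprobs), 4))  # -1.3863, i.e. log(0.25)
```

With uniform weights the score is length-normalized, so longer and shorter hypotheses can be compared on the same scale.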

The GPTScore evaluation framework is:

- Customizable: custom instructions and demonstrations enable the evaluation of new aspects without labeled datasets.
- Multifaceted: a single evaluator can assess multiple aspects.
- Training-free: no additional training of the evaluator is required.
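To illustrate the customizable point, here is a minimal sketch of how an instruction, optional demonstrations, and the input could be concatenated into the prompt that precedes the text being scored. The template wording, the `build_prompt` function, the `Demo` type, and the toy strings are hypothetical illustrations, not the templates shipped with this repository; the two flags only mirror the idea behind the `use_ist` and `use_demo` options used in the commands further down.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Demo:
    """One hypothetical demonstration: an input and a good output for it."""
    source: str
    output: str


def build_prompt(instruction: str, demos: List[Demo], source: str,
                 use_ist: bool = True, use_demo: bool = True) -> str:
    """Assemble the conditioning text that precedes the hypothesis.

    Illustrative template only: instruction first (if enabled), then the
    demonstrations (if enabled), then the input whose output will be scored.
    """
    parts: List[str] = []
    if use_ist:
        parts.append(instruction)
    if use_demo:
        for d in demos:
            parts.append(f"Input: {d.source}\nOutput: {d.output}")
    # The hypothesis being evaluated is appended after this prompt; its token
    # log-probabilities are what GPTScore aggregates.
    parts.append(f"Input: {source}\nOutput:")
    return "\n\n".join(parts)


prompt = build_prompt(
    instruction="Write a fluent, informative sentence for the given input.",
    demos=[Demo("toy input A", "A toy reference sentence for A.")],
    source="toy input B",
)
print(prompt)
```

Changing the aspect to evaluate amounts to changing the instruction (and, optionally, the demonstrations); no labeled training data is involved.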

What PLMs does GPTScore support?

We explored 19 Pre-trained Language Models (PLMs) ranging in size from 80M (FLAN-T5-Small) to 175B (GPT3) to design GPTScore.
The PLMs studied in this paper are listed as follows:

| Family | Model | Parameters | Evaluator Name |
|---|---|---|---|
| GPT3 | text-ada-001 | 350M | gpt3_score |
| GPT3 | text-babbage-001 | 1.3B | gpt3_score |
| GPT3 | text-curie-001 | 6.7B | gpt3_score |
| GPT3 | text-davinci-001 | 175B | gpt3_score |
| GPT3 | text-davinci-003 | 175B | gpt3_score |
| FLAN-T5 | FT5-small | 80M | flan_small_score |
| FLAN-T5 | FT5-base | 250M | flan_base_score |
| FLAN-T5 | FT5-L | 770M | flan_large_score |
| FLAN-T5 | FT5-XL | 3B | flan_xl_score |
| FLAN-T5 | FT5-XXL | 11B | flan_xxl_score |
| OPT | OPT350M | 350M | opt350m_score |
| OPT | OPT-1.3B | 1.3B | opt1_3B_score |
| OPT | OPT-6.7B | 6.7B | opt6_7B_score |
| OPT | OPT-13B | 13B | opt13B_score |
| OPT | OPT-66B | 66B | opt66B_score |
| GPT2 | GPT2-M | 355M | gpt2_medium_score |
| GPT2 | GPT2-L | 774M | gpt2_large_score |
| GPT2 | GPT2-XL | 1.5B | gpt2_xl_score |
| GPT-J | GPT-J-6B | 6B | gptJ6B_score |

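Evaluator Name is the evaluator that corresponds to each model; it is also the name of the command-line flag that enables that evaluator (e.g., `--opt350m_score True`). All five GPT3 models share `gpt3_score` and are selected with the separate `gpt3model` argument.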
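Because the evaluator name doubles as the command-line flag in the usage examples, a small lookup can turn a model name from the table into the arguments that enable it. The dictionary simply transcribes the table; the helper `evaluator_flag` is hypothetical. For GPT3 models it returns only `--gpt3_score True`, since the engine is chosen separately via `--gpt3model` and this sketch does not guess those values.

```python
from typing import Dict, List

# Transcribed from the table above: model name -> evaluator name.
EVALUATORS: Dict[str, str] = {
    "text-ada-001": "gpt3_score",
    "text-babbage-001": "gpt3_score",
    "text-curie-001": "gpt3_score",
    "text-davinci-001": "gpt3_score",
    "text-davinci-003": "gpt3_score",
    "FT5-small": "flan_small_score",
    "FT5-base": "flan_base_score",
    "FT5-L": "flan_large_score",
    "FT5-XL": "flan_xl_score",
    "FT5-XXL": "flan_xxl_score",
    "OPT350M": "opt350m_score",
    "OPT-1.3B": "opt1_3B_score",
    "OPT-6.7B": "opt6_7B_score",
    "OPT-13B": "opt13B_score",
    "OPT-66B": "opt66B_score",
    "GPT2-M": "gpt2_medium_score",
    "GPT2-L": "gpt2_large_score",
    "GPT2-XL": "gpt2_xl_score",
    "GPT-J-6B": "gptJ6B_score",
}


def evaluator_flag(model: str) -> List[str]:
    """Return the command-line arguments that enable the evaluator for `model`."""
    try:
        name = EVALUATORS[model]
    except KeyError:
        raise ValueError(f"unknown model {model!r}; see the table above") from None
    return [f"--{name}", "True"]


print(len(EVALUATORS))            # 19 models in total
print(evaluator_flag("OPT-13B"))  # ['--opt13B_score', 'True']
```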

Usage

Using a GPT3-based model as the evaluator

We use the GPT3 model text-curie-001 as an example. The relevant arguments are:

- gpt3_score: set to True so that the GPTScore evaluator uses a GPT3-based PLM.
- gpt3model: set to curie to use the text-curie-001 model.
- out_dir_name: the folder where scoring results are saved.
- dataname: the name of the dataset to evaluate (e.g., BAGEL).
- aspect: the name of the aspect to evaluate (e.g., quality).
- use_ist / use_demo: whether the prompt includes the instruction and the demonstrations, respectively (see the three settings below).
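These arguments map directly onto command-line arguments of score_d2t.py. The sketch below assembles them into an argument list suitable for `subprocess.run`, without running anything. The helper `gpt3_command` is hypothetical; the flag spellings are copied from the commands in this README, and booleans are passed as the strings "True"/"False" in the same way.

```python
import shlex
from typing import List


def gpt3_command(dataname: str, aspect: str, out_dir_name: str,
                 gpt3model: str = "curie",
                 use_demo: bool = True, use_ist: bool = True) -> List[str]:
    """Build the score_d2t.py argument list for a GPT3-based evaluator."""
    return [
        "python", "score_d2t.py",
        "--dataname", dataname,
        "--use_demo", str(use_demo),   # str(True) == "True"
        "--use_ist", str(use_ist),
        "--gpt3_score", "True",
        "--gpt3model", gpt3model,
        "--out_dir_name", out_dir_name,
        "--aspect", aspect,
    ]


cmd = gpt3_command("BAGEL", "quality", "gpt3Score_based")
print(shlex.join(cmd))  # shell-quoted form of the command
```

To actually launch scoring, pass `cmd` to `subprocess.run(cmd, check=True)` from the repository root. Building an argument list rather than a single string avoids shell-quoting issues.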

1. GPTScore with Instruction and Demonstration

Set both use_demo and use_ist to True.

```bash
python score_d2t.py \
  --dataname "BAGEL" \
  --use_demo True \
  --use_ist True \
  --gpt3_score True \
  --gpt3model "curie" \
  --out_dir_name "gpt3Score_based" \
  --aspect 'quality'
```

2. GPTScore with Instruction Only

Set use_ist to True and use_demo to False.

```bash
python score_d2t.py \
  --dataname "BAGEL" \
  --use_demo False \
  --use_ist True \
  --gpt3_score True \
  --gpt3model "curie" \
  --out_dir_name "gpt3Score_based" \
  --aspect 'quality'
```

3. GPTScore with Neither Instruction nor Demonstration

Set both use_ist and use_demo to False.

```bash
python score_d2t.py \
  --dataname "BAGEL" \
  --use_demo False \
  --use_ist False \
  --gpt3_score True \
  --gpt3model "curie" \
  --out_dir_name "gpt3Score_based" \
  --aspect 'quality'
```
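The three settings above differ only in use_ist and use_demo, so they can be run as one small ablation. The sketch below assumes it is launched from the repository root with score_d2t.py present; `ablation_commands`, `run_all`, and its `dry_run` switch are hypothetical conveniences (not repository options), and with `dry_run=True` the commands are only printed.

```python
import shlex
import subprocess
from typing import List, Tuple

# (use_ist, use_demo) for settings 1, 2 and 3 above.
SETTINGS: List[Tuple[bool, bool]] = [(True, True), (True, False), (False, False)]


def ablation_commands(dataname: str = "BAGEL", aspect: str = "quality",
                      gpt3model: str = "curie",
                      out_dir_name: str = "gpt3Score_based") -> List[List[str]]:
    """Return one score_d2t.py command per instruction/demonstration setting."""
    commands = []
    for use_ist, use_demo in SETTINGS:
        commands.append([
            "python", "score_d2t.py",
            "--dataname", dataname,
            "--use_demo", str(use_demo),
            "--use_ist", str(use_ist),
            "--gpt3_score", "True",
            "--gpt3model", gpt3model,
            "--out_dir_name", out_dir_name,
            "--aspect", aspect,
        ])
    return commands


def run_all(dry_run: bool = True) -> None:
    """Print every command and, unless dry_run is set, execute it in turn."""
    for cmd in ablation_commands():
        print(shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if a run exits non-zero.
            subprocess.run(cmd, check=True)


run_all(dry_run=True)
```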

Using a non-GPT3-based model (e.g., OPT) as the evaluator

Here, we use the OPT350M model as an example. The relevant arguments are:

- opt350m_score: set to True to use the evaluator named opt350m_score.
- out_dir_name: the folder where scoring results are saved.
- dataname: the name of the dataset to evaluate (e.g., BAGEL).
- aspect: the name of the aspect to evaluate (e.g., quality).
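For an open decoder-only model such as OPT or GPT2, the per-token log-probabilities can be computed from the model's output logits: the logits at position i predict token i+1, so each hypothesis token's log-probability is read from the previous position after a log-softmax. The pure-Python sketch below shows that one-position shift on toy numbers. It illustrates the general technique only and is not this repository's implementation; `log_softmax` and `hypothesis_logprobs` are hypothetical names. Encoder-decoder models such as FLAN-T5 align decoder logits with target tokens differently, so the shift does not apply to them as written.

```python
import math
from typing import List, Sequence


def log_softmax(row: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary row."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def hypothesis_logprobs(logits: Sequence[Sequence[float]],
                        token_ids: Sequence[int],
                        prompt_len: int) -> List[float]:
    """log p(h_t | h_<t, prompt) for every token after the prompt.

    logits[i] is the decoder-only model's output at position i, which
    predicts token_ids[i + 1]; hence logits[pos - 1] scores token_ids[pos].
    """
    if not 0 < prompt_len < len(token_ids):
        raise ValueError("need a non-empty prompt and a non-empty hypothesis")
    out = []
    for pos in range(prompt_len, len(token_ids)):
        row = log_softmax(logits[pos - 1])
        out.append(row[token_ids[pos]])
    return out


# Toy vocabulary of 3 tokens; sequence = 2 prompt tokens + 2 hypothesis tokens.
token_ids = [0, 1, 2, 1]
logits = [
    [0.0, 0.0, 0.0],            # predicts token_ids[1], a prompt token (unused)
    [0.0, 0.0, 0.0],            # predicts token_ids[2]: uniform, so log(1/3)
    [0.0, math.log(3.0), 0.0],  # predicts token_ids[3]: probability 3/5
    [0.0, 0.0, 0.0],            # would predict a following token (unused)
]
lps = hypothesis_logprobs(logits, token_ids, prompt_len=2)
print([round(x, 4) for x in lps])  # [-1.0986, -0.5108]
```

Averaging the returned values gives a uniform-weight score for the hypothesis; only the hypothesis tokens contribute, never the prompt.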

1. `opt350m_score` with Instruction and Demonstration

Set both use_demo and use_ist to True.

```bash
python score_d2t.py \
  --dataname "BAGEL" \
  --use_demo True \
  --use_ist True \
  --opt350m_score True \
  --out_dir_name "optScore_based" \
  --aspect 'quality'
```

2. `opt350m_score` with Instruction Only

Set use_ist to True and use_demo to False.

```bash
python score_d2t.py \
  --dataname "BAGEL" \
  --use_demo False \
  --use_ist True \
  --opt350m_score True \
  --out_dir_name "optScore_based" \
  --aspect 'quality'
```

3. `opt350m_score` with Neither Instruction nor Demonstration

Set both use_ist and use_demo to False.

```bash
python score_d2t.py \
  --dataname "BAGEL" \
  --use_demo False \
  --use_ist False \
  --opt350m_score True \
  --out_dir_name "optScore_based" \
  --aspect 'quality'
```

Bib

@article{fu2023gptscore,
  title={GPTScore: Evaluate as You Desire},
  author={Fu, Jinlan and Ng, See-Kiong and Jiang, Zhengbao and Liu, Pengfei},
  journal={arXiv preprint arXiv:2302.04166},
  year={2023}
}