Coder Social home page Coder Social logo

gptscore's Introduction

GPTScore: Evaluate as You Desire

This is the Source Code of Paper: GPTScore: Evaluate as You Desire.

What is GPTScore?

GPTScore is a novel evaluation framework that utilizes the emergent abilities (e.g., zero-shot instruction) of Generative Pre-Trained models to Score generated texts.

GPTScore evaluation framework support:

  1. Customizable. Customized instructions and demonstrations enable the evaluation of new aspects without labeled datasets;
  2. Multifaceted. One evaluator performs multifaceted evaluations;
  3. Training-free.

What PLMs does GPTScore support?

We explored 19 Pre-trained Language Models (PLMs) ranging in size from 80M (FLAN-T5-Small) to 175B (GPT3) to design GPTScore.
The PLMs studied in this paper are listed as follows:

Model Parameter Evaluator Name Model Parameter Evaluator Name
GPT3 OPT
text-ada-001 350M gpt3_score OPT350M 350M opt350m_score
text-babbage-001 1.3B gpt3_score OPT-1.3B 1.3B opt1_3B_score
text-curie-001 6.7B gpt3_score OPT-6.7B 6.7B opt6_7B_score
text-davinci-001 175B gpt3_score OPT-13B 13B opt13B_score
text-davinci-003 175B gpt3_score OPT-66B 66B opt66B_score
FLAN-T5 GPT2
FT5-small 80M flan_small_score GPT2-M 355M gpt2_medium_score
FT5-base 250M flan_base_score GPT2-L 774M gpt2_large_score
FT5-L 770M flan_large_score GPT2-XL 1.5B gpt2_xl_score
FT5-XL 3B flan_xl_score GPT-J-6B 6B gptJ6B_score
FT5-XXL 11B flan_xxl_score
  • Evaluator Name indicates the name of the evaluator corresponding to the Model name in the first column.

Usage

Use the GPT3-based model as the evaluator

Take the evaluation of GPT3-text-curie-001 model as an example.

  • Setting gpt3_score to True: the GPTScore evaluator uses a GPT3-based PLM.
  • Setting gpt3model to curie: the text-curie-001 model is utilized.
  • out_dir_name: set the folder for saving scoring results.
  • dataname: set the dataset name for evaluation (e.g., BAGEL).
  • aspect: set the aspect name to be evaluated (e.g., quality).

1. GPTScore with Instruction and Demonstration

Set both the use_demo and use_ist as True.

python score_d2t.py 
--dataname "BAGEL" 
--use_demo True 
--use_ist True 
--gpt3_score True 
--gpt3model "curie" 
--out_dir_name "gpt3Score_based"  
--aspect 'quality'

2. GPTScore with only Instruction

Set the use_ist to True and use_demo to False.

python score_d2t.py 
--dataname "BAGEL" 
--use_demo False 
--use_ist True 
--gpt3_score True 
--gpt3model "curie" 
--out_dir_name "gpt3Score_based"  
--aspect 'quality'

3. GPTScore without both Instruction and Demonstration

Set the use_ist to False and use_demo to False.

python score_d2t.py 
--dataname "BAGEL" 
--use_demo False 
--use_ist False 
--gpt3_score True 
--gpt3model "curie" 
--out_dir_name "gpt3Score_based"  
--aspect 'quality'

Use the non-GPT3-based model (e.g., OPT) as the evaluator

Here, we take the evaluation of OPT350M model as an example.

  • Setting opt350m_score to True: use the evaluator named opt350m_score.
  • out_dir_name: set the folder for saving scoring results.
  • dataname: set the dataset name for evaluation (e.g., BAGEL).
  • aspect: set the aspect name to be evaluated (e.g., quality).

1. opt350m_score with Instruction and Demonstration

Set both the use_demo and use_ist as True.

python score_d2t.py 
--dataname "BAGEL" 
--use_demo True 
--use_ist True 
--opt350m_score True 
--out_dir_name "optScore_based"  
--aspect 'quality'

2. opt350m_score with only Instruction

Set the use_ist to True and use_demo to False.

python score_d2t.py 
--dataname "BAGEL" 
--use_demo False 
--use_ist True 
--opt350m_score True 
--out_dir_name "optScore_based"  
--aspect 'quality'

3. opt350m_score without both Instruction and Demonstration

Set the use_ist to False and use_demo to False.

python score_d2t.py 
--dataname "BAGEL" 
--use_demo False 
--use_ist False 
--opt350m_score True 
--out_dir_name "optScore_based"  
--aspect 'quality'

Bib

@article{fu2023gptscore,
  title={GPTScore: Evaluate as You Desire},
  author={Fu, Jinlan and Ng, See-Kiong and Jiang, Zhengbao and Liu, Pengfei},
  journal={arXiv preprint arXiv:2302.04166},
  year={2023}
}

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.