Hi, first of all, thank you for releasing the code for this project! Super helpful to

Differences in results between the paper and the code (alpaca_farm, CLOSED)

tatsu-lab commented on July 18, 2024

Differences in results between the paper and the code

Comments (1)

lxuechen commented on July 18, 2024

Hi, thanks for the question.

Note that the model here with a 40.8% win rate is an SFT model trained on 52k data (a reproduction of the original Alpaca model).

It is not the base SFT model we use for reward modelling and RLHF; that base model is the SFT model trained on 10k data.

I have rerun the auto-annotations with the exact models used in our paper. While there's stochasticity in the pooled auto-annotator (due to the assignment of examples to different auto-annotators and randomization in ordering), the difference compared to our paper's results is quite small (see Table 2 of the paper).

Below are the results based on a rerun.

                                        n_draws  n_total  n_wins  n_wins_base  standard_error  win_rate
GPT4                                      17.00   805.00  639.00       149.00            1.38     80.43
ChatGPT                                    9.00   804.00  489.00       306.00            1.71     61.38
rlhf_llama_7b_regen_v7_3ep_v12_ckpt_20     9.00   803.00  370.00       424.00            1.75     46.64
sft_52k                                   19.00   805.00  325.00       461.00            1.72     41.55
sft_llama_7b_regen_v7_3ep                 16.00   804.00  320.00       468.00            1.72     40.80
sft_10k                                   19.00   802.00  278.00       505.00            1.67     35.85
Davinci001                                 0.00   805.00  201.00       604.00            1.53     24.97
LLaMA 7B                                   0.00   786.00   94.00       692.00            1.16     11.96

The sft_52k and sft_10k entries are based on reruns.
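
For readers who want to sanity-check the table, here is a minimal sketch of how the win_rate and standard_error columns can be recomputed from the counts. It assumes that a draw counts as half a win and that the standard error is the sample standard error of the mean over per-example scores (1.0, 0.5, 0.0). Neither assumption is stated in this thread, and `win_rate_stats` is a hypothetical helper, not a function from alpaca_farm.

```python
import math


def win_rate_stats(n_wins, n_draws, n_total):
    """Return (win_rate, standard_error) in percent.

    Assumption: each example scores 1.0 for a win, 0.5 for a draw and
    0.0 for a loss, and the standard error uses the sample standard
    deviation (n - 1 in the denominator).
    """
    n_losses = n_total - n_wins - n_draws
    mean = (n_wins + 0.5 * n_draws) / n_total
    # Sum of squared deviations of the per-example scores from the mean.
    sum_sq = (
        n_wins * (1.0 - mean) ** 2
        + n_draws * (0.5 - mean) ** 2
        + n_losses * (0.0 - mean) ** 2
    )
    sample_std = math.sqrt(sum_sq / (n_total - 1))
    sem = sample_std / math.sqrt(n_total)
    return 100 * mean, 100 * sem


# Counts taken from the GPT4 row of the table above.
rate, sem = win_rate_stats(n_wins=639, n_draws=17, n_total=805)
print(f"win_rate={rate:.2f} standard_error={sem:.2f}")
```

Under these assumptions the GPT4 row comes out at roughly 80.43 and 1.38, matching the table to rounding.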

I will send a patch to clarify this point.
