carperai / autocrit

A repository for transformer critique learning and generation

Languages: Python 98.42%, Shell 1.58%

autocrit's People

dmahan93 herbiebradley mistobaan tiendung jsedoc vedudx khoapip louiscastricato xyzinabox skittlepox robertalanm sudhanshu-shukla-git dearborn-open-ai

autocrit's Issues

Reproduce Constitutional AI Steps

Overview

This issue captures the key steps needed to reproduce the Constitutional AI paper, which fine-tunes an RLHF model using AI-generated feedback (RLAIF).

Phase One

Gather a dataset of harmful prompts
Create a base script to compose prompts using a base constitution
Generate a new dataset of prompts and responses, using Carper's GPT-J RLHF model to review and critique the output
Fine-tune the original model on revised responses using supervised learning
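
The Phase One steps above could be sketched roughly as follows. This is a minimal sketch, not the repository's implementation: the `CONSTITUTION` entries, the prompt layout, and the `generate` callable (any function mapping a text prompt to a model completion) are all assumptions made for illustration.

```python
import random
from typing import Callable, Dict, List

# Hypothetical constitution: each entry pairs a critique request with a
# matching revision request. The wording here is illustrative only.
CONSTITUTION = [
    {
        "critique": "Identify specific ways in which the response is harmful, unethical, or illegal.",
        "revision": "Rewrite the response to remove any harmful, unethical, or illegal content.",
    },
    {
        "critique": "Explain whether the response encourages dangerous behaviour.",
        "revision": "Rewrite the response so that it does not encourage dangerous behaviour.",
    },
]


def critique_and_revise(
    prompt: str, generate: Callable[[str], str], rng: random.Random
) -> Dict[str, str]:
    """Run one response -> critique -> revision pass for a single prompt."""
    principle = rng.choice(CONSTITUTION)

    # Step 1: get the model's initial response to the (harmful) prompt.
    response = generate(f"Human: {prompt}\n\nAssistant:")

    # Step 2: ask the model to critique its own response.
    critique_context = (
        f"Human: {prompt}\n\nAssistant: {response}\n\n"
        f"Critique Request: {principle['critique']}\n\nCritique:"
    )
    critique = generate(critique_context)

    # Step 3: ask the model to revise the response in light of the critique.
    revision_context = (
        f"{critique_context} {critique}\n\n"
        f"Revision Request: {principle['revision']}\n\nRevision:"
    )
    revision = generate(revision_context)

    return {
        "prompt": prompt,
        "response": response,
        "critique": critique,
        "revision": revision,
    }


def build_sft_dataset(
    prompts: List[str], generate: Callable[[str], str], seed: int = 0
) -> List[Dict[str, str]]:
    """Build rows whose (prompt, revision) pairs can be used for supervised fine-tuning."""
    rng = random.Random(seed)
    return [critique_and_revise(p, generate, rng) for p in prompts]
```

Passing the model in as a plain callable keeps the prompt-composition logic independent of whichever inference backend ends up serving the GPT-J model.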

Phase Two

Sample the fine-tuned model using the dataset of harmful prompts to create a new dataset with multiple outputs per prompt
Train a "reward model" (see https://github.com/Dahoas/reward-modeling) to select the best result (fine-tuned preference model)
Use RLAIF training to fine tune the RLHF model
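
As a rough illustration of the first two Phase Two steps, the sketch below samples several outputs per prompt and turns them into pairwise preference records. The `sample` and `prefer` callables are hypothetical stand-ins for the fine-tuned model and the AI feedback model, and the `chosen` / `rejected` field names follow a common convention rather than a format confirmed for the linked reward-modeling repository.

```python
from itertools import combinations
from typing import Callable, Dict, List


def sample_responses(prompt: str, sample: Callable[[str], str], n: int = 4) -> List[str]:
    """Draw n candidate responses for one prompt from the fine-tuned model."""
    return [sample(prompt) for _ in range(n)]


def build_preference_pairs(
    prompt: str,
    responses: List[str],
    prefer: Callable[[str, str, str], int],
) -> List[Dict[str, str]]:
    """Compare every pair of responses and record which one was preferred.

    `prefer(prompt, a, b)` is assumed to return 0 if `a` is better and 1 if `b` is better.
    """
    pairs = []
    for a, b in combinations(responses, 2):
        if a == b:
            # Identical samples carry no preference signal.
            continue
        winner = prefer(prompt, a, b)
        chosen, rejected = (a, b) if winner == 0 else (b, a)
        pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return pairs
```

The resulting records are the kind of data a preference model is typically trained on; the trained model can then act as the reward signal for the final RLAIF step.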

License

Can we get a LICENSE file for the repo, by the way? The setup.py says MIT, but without a LICENSE file I think the code may technically be proprietary until one is added.

Eval Instruct

Instruction-tuned models are difficult to evaluate on many of the properties we actually care about.
Language modelling and multiple-choice benchmarks may capture some aspects of knowledge and reasoning, but they miss many of the properties that matter in instruction-tuned dialog agents, such as long-term coherence, multi-task generalisation, the ability to use tools, and harmlessness.
To address this, we can try using LLMs to evaluate LLMs.

Ways to do this (in order of increasing complexity):

Generate a language modelling or forced-choice QA dataset and evaluate the instruct model offline
Use reward functions on generations from some (possibly generated) prompt dataset (e.g. learned RMs, zero-shot LLM reward functions, etc.)
Online exploration and evaluation using another LLM
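
The simplest of these options, offline forced-choice evaluation, might look something like the sketch below. The item layout (`question`, `choices`, `answer` index) and the `score` callable (for example, the model's log-likelihood of a choice given the question) are assumptions for illustration, not an existing interface in this repository.

```python
from typing import Callable, Dict, List


def evaluate_forced_choice(
    items: List[Dict], score: Callable[[str, str], float]
) -> float:
    """Return accuracy of the model under test on a forced-choice QA dataset.

    Each item is assumed to look like:
        {"question": str, "choices": [str, ...], "answer": int}
    `score(question, choice)` should return a higher value for choices the
    model considers more likely.
    """
    if not items:
        return 0.0

    correct = 0
    for item in items:
        scores = [score(item["question"], choice) for choice in item["choices"]]
        # The prediction is the index of the highest-scoring choice.
        predicted = max(range(len(scores)), key=scores.__getitem__)
        correct += int(predicted == item["answer"])
    return correct / len(items)
```

The same loop works whether the dataset was written by hand or generated by another LLM, since only the scoring function touches the model under test.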

We should implement these into our repository.
Basic implementations would be:

A script that uses langchain and some seed prompts to generate a multiple choice dataset.
A script that prompts LLMs to rate outputs or generate critiques.
A script that has an LLM attempt to use the LLM under test to complete some task, and then checks whether that task was completed successfully.
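
The second of these scripts, prompting an LLM to rate outputs, could start from something like the sketch below. The template wording, the 1 to 10 scale, and the `judge` callable (any function mapping a prompt string to the judge model's text reply) are assumptions made for illustration.

```python
import re
from typing import Callable, Dict, List, Optional

# Hypothetical rating prompt; the wording and scale are illustrative only.
RATING_TEMPLATE = (
    "Below is an instruction and a response.\n\n"
    "Instruction: {instruction}\n\n"
    "Response: {response}\n\n"
    "Rate the response from 1 (worst) to 10 (best). Rating:"
)


def build_rating_prompt(instruction: str, response: str) -> str:
    """Fill the rating template for one instruction/response pair."""
    return RATING_TEMPLATE.format(instruction=instruction, response=response)


def parse_rating(text: str, low: int = 1, high: int = 10) -> Optional[int]:
    """Extract the first integer from the judge's reply, or None if it is missing or out of range."""
    match = re.search(r"\b(\d+)\b", text)
    if match is None:
        return None
    value = int(match.group(1))
    return value if low <= value <= high else None


def rate_outputs(
    samples: List[Dict[str, str]], judge: Callable[[str], str]
) -> List[Optional[int]]:
    """Ask the judge model to rate each sample; unparseable replies become None."""
    return [
        parse_rating(judge(build_rating_prompt(s["instruction"], s["response"])))
        for s in samples
    ]
```

Returning `None` for unparseable replies, rather than guessing a score, makes it easy to measure how often the judge fails to follow the rating format.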
