Question about mask Observation

FireAct: Toward Language Agent Fine-tuning

This repository is based on our publication FireAct: Toward Language Agent Fine-tuning (PDF). It contains prompts, demo code and fine-tuning data we generated. It also includes the description and directory for the model family we fine-tuned. If you use this code or data in your work, please cite:

@misc{chen2023fireact,
      title={FireAct: Toward Language Agent Fine-tuning}, 
      author={Baian Chen and Chang Shu and Ehsan Shareghi and Nigel Collier and Karthik Narasimhan and Shunyu Yao},
      year={2023},
      eprint={2310.05915},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Overview

Define tools in tools/
Define tasks in tasks/
Collect data & run experiments via generation.py
Results will be saved in trajs/

Data & Prompts

Data to generate training data and run experiments in data/. We also include samples of training data for both Alpaca format and GPT format. See details here.
Prompts to generate training data and run experiments in prompts/

Setup

Set up OpenAI API key and store in environment variable (see here)

export OPENAI_API_KEY=<YOUR_KEY>

Set up SERP API key and store in environment variable (see here)

export SERPAPI_API_KEY=<YOUR_KEY>

Create virtual env, for example with conda

conda create -n fireact python=3.9
conda activate fireact

Clone this repo and install dependencies

git clone https://github.com/anchen1011/FireAct.git
pip install -r requirements.txt

Run Demo

Data Generation

Example:

python generation.py \
    --task hotpotqa \
    --backend gpt-4 \
    --promptpath default \
    --evaluate \
    --random \
    --task_split val \
    --temperature 0 \
    --task_end_index 5

See details with command python generation.py -h

You need to set a high number (thousands) of --task_end_index to get sufficient good data samples. [WARNING] This is costly with gpt-4 and serpapi.

You need to convert trajectories into alpaca format or gpt format for training. See our examples here.

Supervised Fine-tuning

Example:

cd finetune/llama_lora
python finetune.py \
    --base_model meta-llama/Llama-2-13b-chat-hf \
    --data_path ../../data/finetune/alpaca_format/hotpotqa.json \
    --micro_batch_size 8 \
    --num_epochs 30 \
    --output_dir ../../models/lora/fireact-llama-2-13b \
    --val_set_size 0.01 \
    --cutoff_len 512 \

See details here.

Inference

Example (FireAct Llama):

python generation.py \
    --task hotpotqa \
    --backend llama \
    --evaluate \
    --random \
    --task_split dev \
    --task_end_index 5 \
    --modelpath meta-llama/Llama-2-7b-chat \
    --add_lora \
    --alpaca_format \
    --peftpath forestai/fireact_llama_2_7b_lora

Example (FireAct GPT):

python generation.py \
    --task hotpotqa \
    --backend ft:gpt-3.5-turbo-0613:<YOUR_MODEL> \
    --evaluate \
    --random \
    --task_split dev \
    --temperature 0 \
    --chatgpt_format \
    --task_end_index 5

See details with command python generation.py -h

Set --task_end_index 500 for quantitative evaluations. See our examples here.

Model Zoo

We release a selected set of multitask models based on Llama family. Details can be found in their model cards.

Base Model	Training Method	Hugging Face
Llama2-7B	LoRA	forestai/fireact_llama_2_7b_lora
Llama2-13B	LoRA	forestai/fireact_llama_2_13b_lora
CodeLlama-7B	LoRA	forestai/fireact_codellama_7b_lora
CodeLlama-13B	LoRA	forestai/fireact_codellama_13b_lora
CodeLlama-34B	LoRA	forestai/fireact_codellama_34b_lora
Llama2-7B	Full Model	forestai/fireact_llama_2_7b

References

Our generation code is based on ysymyth/ReAct
Our Llama full model training code is based on tatsu-lab/stanford_alpaca
Our Llama LoRA training code is based on tloen/alpaca-lora
Our GPT fine-tuning code is based on anchen1011/chatgpt-finetune-ui

anchen1011 / fireact Goto Github PK

fireact's Introduction

FireAct: Toward Language Agent Fine-tuning

Overview

Data & Prompts

Setup

Run Demo

Data Generation

Supervised Fine-tuning

Inference

Model Zoo

References

fireact's People

Stargazers

Watchers

Forkers

fireact's Issues

Recommend Projects

Recommend Topics

Recommend Org