Coder Social home page Coder Social logo

fireact's Introduction

FireAct: Toward Language Agent Fine-tuning

Build License Hugging Face

teaser

This repository is based on our publication FireAct: Toward Language Agent Fine-tuning (PDF). It contains prompts, demo code and fine-tuning data we generated. It also includes the description and directory for the model family we fine-tuned. If you use this code or data in your work, please cite:

@misc{chen2023fireact,
      title={FireAct: Toward Language Agent Fine-tuning}, 
      author={Baian Chen and Chang Shu and Ehsan Shareghi and Nigel Collier and Karthik Narasimhan and Shunyu Yao},
      year={2023},
      eprint={2310.05915},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Overview

  • Define tools in tools/
  • Define tasks in tasks/
  • Collect data & run experiments via generation.py
  • Results will be saved in trajs/

Data & Prompts

  • Data to generate training data and run experiments in data/. We also include samples of training data for both Alpaca format and GPT format. See details here.
  • Prompts to generate training data and run experiments in prompts/

Setup

Set up OpenAI API key and store in environment variable (see here)

export OPENAI_API_KEY=<YOUR_KEY>

Set up SERP API key and store in environment variable (see here)

export SERPAPI_API_KEY=<YOUR_KEY>

Create virtual env, for example with conda

conda create -n fireact python=3.9
conda activate fireact

Clone this repo and install dependencies

git clone https://github.com/anchen1011/FireAct.git
pip install -r requirements.txt

Run Demo

Data Generation

Example:

python generation.py \
    --task hotpotqa \
    --backend gpt-4 \
    --promptpath default \
    --evaluate \
    --random \
    --task_split val \
    --temperature 0 \
    --task_end_index 5

See details with command python generation.py -h

You need to set a high number (thousands) of --task_end_index to get sufficient good data samples. [WARNING] This is costly with gpt-4 and serpapi.

You need to convert trajectories into alpaca format or gpt format for training. See our examples here.

Supervised Fine-tuning

Example:

cd finetune/llama_lora
python finetune.py \
    --base_model meta-llama/Llama-2-13b-chat-hf \
    --data_path ../../data/finetune/alpaca_format/hotpotqa.json \
    --micro_batch_size 8 \
    --num_epochs 30 \
    --output_dir ../../models/lora/fireact-llama-2-13b \
    --val_set_size 0.01 \
    --cutoff_len 512 \

See details here.

Inference

Example (FireAct Llama):

python generation.py \
    --task hotpotqa \
    --backend llama \
    --evaluate \
    --random \
    --task_split dev \
    --task_end_index 5 \
    --modelpath meta-llama/Llama-2-7b-chat \
    --add_lora \
    --alpaca_format \
    --peftpath forestai/fireact_llama_2_7b_lora 

Example (FireAct GPT):

python generation.py \
    --task hotpotqa \
    --backend ft:gpt-3.5-turbo-0613:<YOUR_MODEL> \
    --evaluate \
    --random \
    --task_split dev \
    --temperature 0 \
    --chatgpt_format \
    --task_end_index 5

See details with command python generation.py -h

Set --task_end_index 500 for quantitative evaluations. See our examples here.

Model Zoo

We release a selected set of multitask models based on Llama family. Details can be found in their model cards.

Base Model Training Method Hugging Face
Llama2-7B LoRA forestai/fireact_llama_2_7b_lora
Llama2-13B LoRA forestai/fireact_llama_2_13b_lora
CodeLlama-7B LoRA forestai/fireact_codellama_7b_lora
CodeLlama-13B LoRA forestai/fireact_codellama_13b_lora
CodeLlama-34B LoRA forestai/fireact_codellama_34b_lora
Llama2-7B Full Model forestai/fireact_llama_2_7b

References

  1. Our generation code is based on ysymyth/ReAct
  2. Our Llama full model training code is based on tatsu-lab/stanford_alpaca
  3. Our Llama LoRA training code is based on tloen/alpaca-lora
  4. Our GPT fine-tuning code is based on anchen1011/chatgpt-finetune-ui

fireact's People

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

fireact's Issues

Question about mask Observation

Hello, thanks for your great work!
I see that you try to mask the Observation line in your code, but the mask arg is set to False in your example of training with lora. Does this imply that in your experiments, not using masking performs better than using masking?
Looking forward to your reply!
Qiao

How to make reflexion results on HotPotQA

Good work!

I want to know how the author did reflexion results on hotpotqa.
I have read the original author's method. His evaluator uses the ground truth to the questions in hotpotqa, and then uses em as an evaluation. I would like to ask you how to get the result of reflexion (Especially the evaluator) when doing comparative experiments.

Thanks for your reply

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.