

ExpeL: LLM Agents are Experiential Learners

⚡ [AAAI 2024 (Oral)] Official implementation of the ExpeL Agent ⚡

~ by Andrew Zhao, Daniel Huang, Quentin Xu, Matthieu Lin, Yong-Jin Liu, Gao Huang ~



🌐 Project Page · 📄 Paper

"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." - Tom Mitchell


👋 Introduction

This repo is the official implementation of ExpeL: LLM Agents are Experiential Learners.

Our agent autonomously gathers experiences and extracts knowledge using natural language from a collection of training tasks. At inference, the agent recalls its extracted insights and past experiences to make informed decisions. Our empirical results highlight the robust learning efficacy of the ExpeL agent, indicating a consistent enhancement in its performance as it accumulates experiences.

🛠️ Installation

Python version: 3.9.17

  1. Create a virtual environment using Anaconda (or your favorite package manager), activate it, clone the repo and install the requirements.
conda create -n expel python=3.9.17
conda activate expel

git clone https://github.com/LeapLabTHU/ExpeL.git expel
cd expel

pip install -r requirements.txt

Next, you need to set up the environments.

🌳 Environments

Baby ExpeL has been playing around with the following environments: 🏠 ALFWorld, 🛒 WebShop, ❓ HotpotQA, and FEVER.

Among these, ALFWorld and WebShop require manual installation (WebShop additionally requires running a server, which can be local). Details below:

🏠 ALFWorld

The installation instructions are shown below. Use the previously created conda environment to install ALFWorld. You will also need to download the data to the location specified by ALFWORLD_DATA (data/alfworld below).

conda activate expel
pip install alfworld[full]

export ALFWORLD_DATA="data/alfworld"
alfworld-download

If you need more details, please refer to the official repo.

🛒 WebShop

WebShop installation is different from the other environments: you will have to install it and manually run the server (which can be local) in parallel with ExpeL to interact with the environment. The succinct installation instructions are shown below.

git clone https://github.com/princeton-nlp/webshop.git webshop
cd webshop

# Create another env for the webshop server to avoid conflicts
conda create -n webshop python=3.8.13 
conda activate webshop

./setup.sh -d all

By default, WebShop only loads 1,000 products. But we need ALL OF THEM (🤯). So change web_agent_site/utils.py:

# DEFAULT_ATTR_PATH = join(BASE_DIR, '../data/items_ins_v2_1000.json')
# DEFAULT_FILE_PATH = join(BASE_DIR, '../data/items_shuffle_1000.json')
DEFAULT_ATTR_PATH = join(BASE_DIR, '../data/items_ins_v2.json')
DEFAULT_FILE_PATH = join(BASE_DIR, '../data/items_shuffle.json')
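If you prefer to apply this change from the shell, a one-liner along these lines should work (just a sketch of the same edit; on macOS use sed -i '' instead of sed -i):

# Point WebShop at the full product files instead of the 1,000-item subsets
sed -i 's/items_ins_v2_1000.json/items_ins_v2.json/; s/items_shuffle_1000.json/items_shuffle.json/' web_agent_site/utils.py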

To start the server, run the following command:

./run_dev.sh

You will be given a URL (and port) once the website is up:

  • Go back to the cloned ExpeL repo
  • Modify the config file and add the given URL in envs/webshop/webshop.py:
WEBSHOP_URL = "http://127.0.0.1:3000" # Example URL

Note that you will have to keep the WebShop server running in the background to interact with the environment. We have gathered some bugs we encountered during the WebShop server setup in the ⚠️ Issues section below.
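Since the server must stay up while ExpeL runs, one convenient option (not part of the official instructions, just a sketch) is to launch it detached:

# Hypothetical: keep the WebShop server running in the background and log its output
nohup ./run_dev.sh > webshop_server.log 2>&1 &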

If you need more details, please refer to the official repo.

🚀 Quick start

Below are the commands to run the ExpeL Agent.

Either put your OpenAI API key in a .env file (OPENAI_API_KEY=XXX) or you will be prompted for it on the command line.
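For example, a minimal .env at the repo root (the key value below is a placeholder):

echo "OPENAI_API_KEY=sk-..." > .env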

1. For the Experience Gathering stage:

python train.py benchmark=<benchmark-name> \
  run_name=<train-run-name> \
  testing=false \
  resume=false

# resume = true/false if you want to resume a previous run
# benchmark = {hotpotqa, alfworld, webshop, fever}
# agent.llm = {gpt-3.5-turbo (default), gpt-4}

Below are the commands to run the experience gathering stage as in the paper:

# 🏠 ALFWorld
python train.py benchmark=alfworld run_name=<train-run-name> testing=false resume=false
# 🛒 WebShop
python train.py benchmark=webshop run_name=<train-run-name> testing=false resume=false
# ❓ HotpotQA
python train.py benchmark=hotpotqa run_name=<train-run-name> testing=false resume=false

By default, the result files (logs, dictionaries) will be saved in logs/<benchmark-name>/expel, referenced by <train-run-name>. You can change the log directory by adding log_dir=<log-dir> to the command line.
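For instance, a hypothetical ALFWorld run whose logs go to a custom directory:

# Hypothetical run name and log directory
python train.py benchmark=alfworld run_name=my-alfworld-run log_dir=my_logs testing=false resume=false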

2. For the Insights Extraction stage:

Use the collected experiences to extract insights.

python insight_extraction.py \
  benchmark=<benchmark-name> \
  load_run_name=<train-run-name> \
  run_name=<insights-extraction-run-name> \
  agent.llm=<model> \
  agent.max_num_rules=<insights-num> \
  agent.success_critique_num=<exp-num> \
  testing=true \
  resume=false

# agent.success_critique_num = number of experiences to give per iteration
# agent.max_num_rules = target number of insights to extract

To resume a run that stopped at a specific fold, remove load_run_name from the parameters, set resume_fold to the fold at which it stopped, and set resume=true.
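For instance, resuming an ALFWorld insights-extraction run that stopped at fold 1 might look like this (the fold number is hypothetical):

# Note: load_run_name is dropped; resume_fold and resume=true are added
python insight_extraction.py benchmark=alfworld run_name=<insights-extraction-run-name> agent.llm=gpt-4 agent.max_num_rules=10 agent.success_critique_num=8 resume_fold=1 testing=false resume=true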

Below are the commands to run the insights extraction stage as in the paper:

# 🏠 ALFWorld
python insight_extraction.py benchmark=alfworld load_run_name=<train-run-name> run_name=<insights-extraction-run-name> agent.llm=gpt-4 agent.max_num_rules=10 agent.success_critique_num=8 testing=false resume=false
# 🛒 WebShop
python insight_extraction.py benchmark=webshop load_run_name=<train-run-name> run_name=<insights-extraction-run-name> agent.llm=gpt-4 agent.max_num_rules=8 agent.success_critique_num=4 testing=false resume=false
# ❓ HotpotQA
python insight_extraction.py benchmark=hotpotqa load_run_name=<train-run-name> run_name=<insights-extraction-run-name> agent.llm=gpt-4 agent.max_num_rules=10 agent.success_critique_num=8 testing=false resume=false

The final result files will be saved in logs/<benchmark-name>/expel/extracted_insights referenced by <insights-extraction-run-name>.

3. For Evaluation:

python eval.py benchmark=<benchmark-name> \
  load_run_name=extracted_insights/<insights-extraction-run-name> \
  run_name=<eval-run-name> \
  benchmark.eval_configs.k_folds=<fold-num> \
  agent.fewshot_strategy=task_similarity \
  agent.retrieval_kwargs.max_fewshot_tokens=<max-retrieval-token-size> \
  agent.retrieval_kwargs.buffer_retrieve_ratio=<retrieve-multiplier-coefficient> \
  testing=false \
  resume=false

# agent.fewshot_strategy = {task_similarity, thought_similarity, task_thought_similarity}
# agent.llm = {gpt-3.5-turbo (default), gpt-4}
# agent.retrieval_kwargs.max_fewshot_tokens=auto
# benchmark.eval_configs.k_folds=2
# agent.retrieval_kwargs.buffer_retrieve_ratio = safety measure to avoid retrieving 0 examples (bigger is safer)

To resume a run that stopped, remove load_run_name from the parameters and add resume=true at the end of the command line.
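For instance, resuming an interrupted HotpotQA evaluation might look like this (a sketch reusing the paper's settings):

# Note: load_run_name is dropped and resume=true is added
python eval.py benchmark=hotpotqa run_name=<eval-run-name> agent.fewshot_strategy=task_similarity testing=false resume=true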

Below are the commands to evaluate ExpeL as in the paper:

# 🏠 ALFWorld
python eval.py benchmark=alfworld load_run_name=extracted_insights/<insights-extraction-run-name> run_name=<eval-run-name> agent.fewshot_strategy=task_similarity agent.retrieval_kwargs.max_fewshot_tokens=auto testing=false resume=false
# 🛒 WebShop
python eval.py benchmark=webshop load_run_name=extracted_insights/<insights-extraction-run-name> run_name=<eval-run-name> agent.fewshot_strategy=task_similarity agent.retrieval_kwargs.max_fewshot_tokens=auto agent.retrieval_kwargs.buffer_retrieve_ratio=20 testing=false resume=false
# ❓ HotpotQA
python eval.py benchmark=hotpotqa load_run_name=extracted_insights/<insights-extraction-run-name> run_name=<eval-run-name> agent.fewshot_strategy=task_similarity testing=false resume=false

The result files will be saved in logs/<benchmark-name>/expel/eval referenced by <eval-run-name>.

🫡 Cite us!

This repository contains code for reproducing results. If you find this work useful in your research (and/or daily life), please cite:

@inproceedings{zhao2024expel,
    author       = {Andrew Zhao and Daniel Huang and Quentin Xu and Matthieu Lin and Yong-Jin Liu and Gao Huang},
    title        = {ExpeL: LLM Agents Are Experiential Learners},
    booktitle    = {Thirty-Eighth {AAAI} Conference on Artificial Intelligence, {AAAI}
                    2024, Thirty-Sixth Conference on Innovative Applications of Artificial
                    Intelligence, {IAAI} 2024, Fourteenth Symposium on Educational Advances
                    in Artificial Intelligence, {EAAI} 2024, February 20-27, 2024, Vancouver,
                    Canada},
    editor       = {Michael J. Wooldridge and Jennifer G. Dy and Sriraam Natarajan},
    year         = {2024},
    pages        = {19632--19642},
    publisher    = {{AAAI} Press},
    url          = {https://ojs.aaai.org/index.php/AAAI/article/view/29936},
    doi          = {10.1609/aaai.v38i17.29936}
}

💌 Contact us!

If you have any questions, feel free to contact Andrew Zhao, Daniel Huang or Quentin Xu.

🏛️ License

Check LICENSE.md

⚠️ Issues

We encountered some errors and gathered them here (note that by the time you read this, they might have been fixed). If you don't encounter them, lucky you 😒.

🛒 WebShop-server installation:

# Install WebShop server dependencies
python -m spacy download en_core_web_lg
pip install lightgbm nmslib  # needs a C compiler
conda install mkl  # if "ImportError: libmkl_intel_lp64.so.1"
pip install pyserini
pip install pyserini --no-cache-dir  # use this instead if low on RAM
pip install typing-inspect==0.8.0 typing_extensions==4.5.0  # if you get issubclass errors
# if you get a libjvm.so error, export JAVA_HOME to point at your JDK
./setup.sh -d all

On Mac, if you have problems with lightgbm or nmslib, you may have to replace their pip installs with:

brew install cmake libomp 
pip install lightgbm

CFLAGS="-mavx -DWARN(a)=(a)" pip install --use-pep517 nmslib


expel's Issues

Instructions on adding new tasks

Hi!

I am wondering if you happen to have some quick instructions on how to add a new dataset/task for the ExpeL agent?

For example, where should I add new tools, and what prompts should I add or adapt from the existing prompts?

Thank you so much!

How can I evaluate only ReAct's performance on the datasets?

Thank you for your work!

If I only want to run ReAct, is it unnecessary to run train.py?

Simply put: how should I set up the config files and commands so that I can use the provided code to test ReAct and Reflexion on the benchmarks?

Many thanks!

Is there a situation where too many rules are generated?

Hi, authors! Thanks for your great work!
I have a question: during your experiments, was there ever a situation where too many rules were generated?
I know that max_num_rules imposes a certain constraint; however, the prompts seem to suggest that even when the number of existing rules reaches max_num_rules, there is still a chance to add rules. In the provided code, I haven't found an operation that limits the number of rules when adding them to the prompts.
I tried your framework on another task, also setting max_num_rules=20; however, the task generated more than 100 rules. 😂
I would be very grateful to receive your reply. Thanks a lot!!!

Questions about code

Hello, thank you very much for your work; I like it a lot.
A few questions:

  1. Why does part of the HotpotQA env code differ from Reflexion's? When I run it, it easily falls into an infinite loop:
while True:
    try:
        observation = self.explorer.search(argument).strip('\n').strip()
        break
    except Exception as e:
        print(e)
        time.sleep(5)
  2. Also, when running ReAct and Reflexion, do I need to specify agent_type, or is it enough to control agent.max_reflection_depth?
  3. How is the dataset split reflected in the code? As I understand it, the paper splits the data into a train set and an eval set.

How can I use the rules extracted in the "insight_extraction" stage in the "eval" stage?

Hi,
Thank you for sharing this excellent work on LLM agents. I have an issue understanding the code.
I ran all the code under the default settings. However, with "load_cache_rules" set to true, I find that no rules are inserted into the prompt in the eval stage. And if I set "load_cache_rules" to false, it begins extracting insights/rules in the eval stage, instead of directly evaluating the model.
What should I do if I want to apply the insights/rules extracted in the "insight_extraction" stage to the "eval" stage?
Thank you!

ExpeL/Reflexion few-shot error

Hello, and thank you for your work.
When running ExpeL, as well as the Reflexion setup mentioned in this issue, I found that the few-shot examples printed in the prompts in true_log are not the few shots provided for Reflexion but ReAct's. Is there a bug here?
