mahdi-shafiei / agentgym

This project forked from woooodyy/agentgym

Code and implementations for the paper "AgentGym: Evolving Large Language Model-based Agents across Diverse Environments" by Zhiheng Xi et al.

Home Page: https://arxiv.org/abs/2406.04151

License: MIT License

Shell 3.61% Python 92.33% CSS 0.29% HTML 2.85% Jupyter Notebook 0.91%

AgentGym: Evolving Large Language Model-based Agents across Diverse Environments

📃 Paper • 🌐 Project Page • 🤗 AgentTraj-L • 🤗 AgentEval • 🤗 Model (AgentEvol-7B)

🔔 News

🌟 Introduction

Building generalist agents that can handle diverse tasks and evolve themselves across different environments is a long-term goal in the AI community. Large language models (LLMs) are considered a promising foundation to build such agents due to their generalized capabilities.

AgentGym is a new framework featuring a variety of environments and tasks for broad, real-time, unified-format, and concurrent agent exploration. It is designed to help the community easily evaluate and develop generally capable LLM-based agents. It also includes a high-quality trajectory set, AgentTraj, and a benchmark suite, AgentEval. We also propose a novel method, AgentEvol, to investigate the potential of agent self-evolution beyond previously seen data across tasks and environments. Experimental results show that the evolved agents can achieve results comparable to SOTA models.

🎁 AgentGym Suite

AgentGym is a framework designed to help the community easily evaluate and develop generally capable LLM-based agents. It features diverse interactive environments and tasks in a unified format, the ReAct format. It supports real-time feedback and concurrency, and is easily scalable. It includes 14 environments across web navigation, text games, household tasks, digital games, embodied tasks, tool use, and programming.

| Environment | Traj | Eval | Original Repo | EnvServer |
|---|---|---|---|---|
| WebShop | 3930 | 200 | WebShop-Repo | agentenv-webshop |
| WebArena | 0 | 20 | WebArena | agentenv-webarena |
| MAZE | 215 | 25 | MAZE-Repo | agentenv-lmrlgym |
| Wordle | 955 | 25 | Wordle-Repo | agentenv-lmrlgym |
| ALFWorld | 2420 | 200 | ALFWorld-Repo | agentenv-alfworld |
| SciWorld | 2120 | 200 | SciWorld-Repo | agentenv-sciworld |
| BabyAI | 810 | 90 | BabyAI-Repo | agentenv-babyai |
| TextCraft | 374 | 100 | TextCraft-Repo | agentenv-textcraft |
| Weather | 311 | 20 | Weather-Repo | agentenv-tool |
| Movie | 215 | 20 | Movie-Repo | agentenv-tool |
| Academia | 0 | 20 | Academia-Repo | agentenv-tool |
| Sheet | 0 | 20 | Sheet-Repo | agentenv-tool |
| TODOList | 135 | 20 | TODOList-Repo | agentenv-tool |
| BIRD | 3000 | 200 | BIRD-Repo | agentenv-sqlgym |

Platform

The platform architecture of AgentGym is illustrated in the following figure. In AgentGym, different environments are deployed on different servers or ports and provide encapsulated HTTP services externally. This decouples the environments from other parts.

These services include APIs such as /createEnv to create an environment, /observation to get the current observation from the environment, /available_actions to get the currently available actions, /step to perform an action, and /reset to reset the environment.
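As a rough illustration of how a caller might wrap these endpoints, here is a minimal sketch. Only the five endpoint paths come from the text above. The class name `EnvServerClient`, the HTTP methods, the `id` and `action` field names, and the use of JSON bodies are all assumptions made for this example and may not match what a given environment server expects; check the README of the relevant agentenv-* folder for the real contract.

```python
import json
from typing import Any, Callable, Dict, Optional
from urllib import request
from urllib.parse import urlencode

# A transport takes (method, url, payload) and returns the decoded JSON reply.
Transport = Callable[[str, str, Optional[Dict[str, Any]]], Any]


def http_transport(method: str, url: str, payload: Optional[Dict[str, Any]]) -> Any:
    """Send one JSON request with the standard library and decode the reply."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    req = request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


class EnvServerClient:
    """Hypothetical wrapper; methods and payload shapes are assumptions."""

    def __init__(self, base_url: str, transport: Transport = http_transport):
        self.base_url = base_url.rstrip("/")
        self.transport = transport

    def _call(self, method: str, path: str, payload=None) -> Any:
        return self.transport(method, f"{self.base_url}{path}", payload)

    def create_env(self) -> Any:
        return self._call("POST", "/createEnv")

    def observation(self, env_id: int) -> Any:
        return self._call("GET", "/observation?" + urlencode({"id": env_id}))

    def available_actions(self, env_id: int) -> Any:
        return self._call("GET", "/available_actions?" + urlencode({"id": env_id}))

    def step(self, env_id: int, action: str) -> Any:
        return self._call("POST", "/step", {"id": env_id, "action": action})

    def reset(self, env_id: int) -> Any:
        return self._call("POST", "/reset", {"id": env_id})
```

Passing the transport in as a parameter keeps the wrapper testable without a running server: a stub function can record the requests that would have been sent.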

We have implemented 14 types of environments, and developers can add new environments to AgentGym by implementing the interfaces described above. EnvClients consume the services provided by the servers and wrap them in functions that users can call. AgentController is our core component that connects the agent and the environment. It is responsible for evaluating the agent, collecting data, and training the agent.
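The role of a controller that sits between an agent and an environment can be sketched as a simple interaction loop. This is not the AgentController from the agentenv package: `Env`, `CountdownEnv`, and `run_episode` are hypothetical names invented for this example, and the `(observation, reward, done)` return shape of `step` is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Protocol, Tuple


class Env(Protocol):
    """Hypothetical environment interface used only in this sketch."""

    def reset(self) -> str: ...
    def step(self, action: str) -> Tuple[str, float, bool]: ...


@dataclass
class CountdownEnv:
    """Toy environment: the episode succeeds once the agent says 'stop'."""

    turns: int = 0

    def reset(self) -> str:
        self.turns = 0
        return "say stop"

    def step(self, action: str) -> Tuple[str, float, bool]:
        self.turns += 1
        done = action == "stop"
        return ("done" if done else "say stop", 1.0 if done else 0.0, done)


def run_episode(
    env: Env, agent: Callable[[str], str], max_rounds: int = 10
) -> Tuple[float, List[Tuple[str, str]]]:
    """Alternate observations and actions until done or max_rounds is hit.

    Returns the last reward and the (speaker, text) history, which mirrors
    the human/gpt turn structure of the trajectory examples.
    """
    observation = env.reset()
    history = [("human", observation)]
    reward = 0.0
    for _ in range(max_rounds):
        action = agent(observation)
        history.append(("gpt", action))
        observation, reward, done = env.step(action)
        history.append(("human", observation))
        if done:
            break
    return reward, history
```

The same loop serves both evaluation (keep the reward) and data collection (keep the history), which is one reason to route both through a single component.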

Benchmark: AgentEval

The AgentEval benchmark is on Hugging Face: AgentGym/AgentEval.

Here is an example:

{
  "conversations": null,
  "item_id": "webshop_5238"
}
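Records like this one can be grouped by environment before evaluation. The sketch below assumes that the part of `item_id` before the last underscore names the environment and the part after it is an integer index; that reading is inferred from the example above and is not documented here. `split_item_id` and `group_by_env` are hypothetical helper names.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple


def split_item_id(item_id: str) -> Tuple[str, int]:
    """Split 'webshop_5238' into ('webshop', 5238).

    Assumption: text before the last underscore is the environment name
    and text after it is an integer index.
    """
    env_name, _, index = item_id.rpartition("_")
    if not env_name or not index.isdigit():
        raise ValueError(f"unexpected item_id format: {item_id!r}")
    return env_name, int(index)


def group_by_env(records: Iterable[Dict[str, Any]]) -> Dict[str, List[int]]:
    """Collect the integer indices of all records under each environment name."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in records:
        env_name, index = split_item_id(record["item_id"])
        groups[env_name].append(index)
    return dict(groups)
```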

Trajectory Set: AgentTraj

The trajectory set AgentTraj-L is available on Hugging Face: AgentGym/AgentTraj-L.

Here is an example:

    {
        "conversations": [
            {
                "from": "human",
                "loss": null,
                "value": "You are web shopping.\nI will give you instructions about what to do.\nYou have to follow the instructions.\nEvery round I will give you an observation and a list of available actions, you have to respond an action based on the state and instruction.\nYou can use search action if search is available.\nYou can click one of the buttons in clickables.\nAn action should be of the following structure:\nsearch[keywords]\nclick[value]\nIf the action is not valid, perform nothing.\nKeywords in search are up to you, but the value in click must be a value in the list of available actions.\nRemember that your keywords in search should be carefully designed.\nYour response should use the following format:\n\nThought:\nI think ... \n\nAction: \nclick[something]"
            },
            {
                "from": "gpt",
                "loss": false,
                "value": "Ok."
            },
            {
                "from": "human",
                "loss": null,
                "value": "WebShop [SEP] Instruction: [SEP] Find me men's shorts with elastic waist, classic fit, short sleeve for gym workout with color: navy, and size: x-large, and price lower than 50.00 dollars [SEP] Search"
            },
            {
                "from": "gpt",
                "loss": true,
                "value": "Thought:\nI think I should use the search bar to look for the shorts that match the given requirements.\n\nAction:\nsearch[men's shorts elastic waist classic fit short sleeve gym workout navy x-large price < 50.00]"
            },
            ...(more interactions)
            {
                "from": "gpt",
                "loss": true,
                "value": "Thought:\nI think I have found the right product that matches all the requirements. It's time to buy it.\n\nAction:\nclick[Buy Now]"
            }
        ],
        "item_id": "webshop_6"
    },
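To work with a record like the one above, it can help to pull out the agent turns and split each into its thought and action. The sketch below makes two assumptions that the example suggests but does not state: that `"loss": true` marks the turns intended to contribute to the training loss, and that every agent reply follows the `Thought: ... Action: ...` layout shown in the system prompt. `parse_react` and `trainable_turns` are hypothetical helper names.

```python
import re
from typing import Any, Dict, List, Optional, Tuple

# Matches the "Thought: ... Action: ..." layout used in the example turns.
REACT_RE = re.compile(
    r"Thought:\s*(?P<thought>.*?)\s*Action:\s*(?P<action>.*)", re.DOTALL
)


def parse_react(text: str) -> Optional[Tuple[str, str]]:
    """Return (thought, action), or None if the text is not in ReAct layout."""
    match = REACT_RE.search(text)
    if match is None:
        return None
    return match.group("thought").strip(), match.group("action").strip()


def trainable_turns(record: Dict[str, Any]) -> List[str]:
    """Return agent replies flagged with loss == True (assumed meaning: trained on)."""
    return [
        turn["value"]
        for turn in record["conversations"]
        if turn["from"] == "gpt" and turn["loss"] is True
    ]


# A shortened record built from turns that appear in the example above.
record = {
    "conversations": [
        {"from": "human", "loss": None, "value": "You are web shopping."},
        {"from": "gpt", "loss": False, "value": "Ok."},
        {
            "from": "gpt",
            "loss": True,
            "value": "Thought:\nI think I have found the right product that "
                     "matches all the requirements. It's time to buy it.\n\n"
                     "Action:\nclick[Buy Now]",
        },
    ],
    "item_id": "webshop_6",
}

parsed = [parse_react(value) for value in trainable_turns(record)]
```

Here the "Ok." acknowledgement is skipped because its `loss` flag is false, so only the final reply is parsed.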

🛠 Usage & Quick Start

This project contains the agentenv Python package and the integrated environments.

Set up the agentenv package

From PyPI

pip install agentenv

From source

git clone --recursive https://github.com/WooooDyy/AgentGym
cd ./AgentGym

cd agentenv
pip install -e .

Depending on which environments you want to use, cd into the corresponding agentenv-* folder and follow the README.md inside.

Tutorials

Examples

Main Experimental Results

📧 Contact

🔖 Citation

@misc{xi2024agentgym,
      title={AgentGym: Evolving Large Language Model-based Agents across Diverse Environments}, 
      author={Zhiheng Xi and Yiwen Ding and Wenxiang Chen and Boyang Hong and Honglin Guo and Junzhe Wang and Dingwen Yang and Chenyang Liao and Xin Guo and Wei He and Songyang Gao and Lu Chen and Rui Zheng and Yicheng Zou and Tao Gui and Qi Zhang and Xipeng Qiu and Xuanjing Huang and Zuxuan Wu and Yu-Gang Jiang},
      year={2024},
      eprint={2406.04151},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
}
