Coder Social home page Coder Social logo

ollmer / browsergym Goto Github PK

View Code? Open in Web Editor NEW

This project forked from servicenow/browsergym

0.0 0.0 0.0 567 KB

BrowserGym, a gym environment for web task automation in the Chromium browser.

License: Other

JavaScript 3.34% Python 85.64% Makefile 0.10% HTML 10.92%

browsergym's Introduction

BrowserGym: a Gym Environment for Web Task Automation

[Setup][Usage][Demo]

This package provides browsergym, a gym environment for web task automation in the Chromium browser.

4x4.grid.mp4

Example of a GPT4-V agent executing openended tasks (top row, chat interactive), as well as WebArena and WorkArena tasks (bottom row)

BrowserGym includes the following benchmarks by default:

Designing new web benchmarks with BrowserGym is easy, and simply requires to inherit the AbstractBrowserTask class.

Setup

To install browsergym, you can either install one of the browsergym-miniwob, browsergym-webarena and browsergym-workarena packages, or you can simply install browsergym which includes all of these by default.

pip install browsergym

Then, a required step is to setup playwright by running

playwright install

Finally, each benchmark comes with its own specific setup that requires to follow additional steps.

Development setup

To install browsergym locally for development, use the following commands:

git clone https://github.com/ServiceNow/BrowserGym.git

cd ~/PATH/TO/REPOSITORY/BrowserGym

make install

Usage

Open-ended task example

Boilerplate code to run an agent on an interactive, open-ended task:

import gymnasium as gym
import browsergym.core  # register the openended task as a gym environment

env = gym.make(
    "browsergym/openended", task_kwargs={"start_url": "https://www.google.com/"}, wait_for_user_message=True
)
obs, info = env.reset()
done = False
while not done:
    action = ...  # implement your agent here
    obs, reward, terminated, truncated, info = env.step(action)

MiniWoB++ task example

Boilerplate code to run an agent on a MiniWoB++ task:

import gymnasium as gym
import browsergym.miniwob  # register miniwob tasks as gym environments

env = gym.make("browsergym/miniwob.choose-list")
obs, info = env.reset()
done = False
while not done:
    action = ...  # implement your agent here
    obs, reward, terminated, truncated, info = env.step(action)

List of all the available MiniWoB++ environments

env_ids = [id for id in gym.envs.registry.keys() if id.startswith("browsergym/miniwob")]
print("\n".join(env_ids))

WebArena task example

Boilerplate code to run an agent on a WebArena task:

import gymnasium as gym
import browsergym.webarena  # register webarena tasks as gym environments

env = gym.make("browsergym/webarena.310")
obs, info = env.reset()
done = False
while not done:
    action = ...  # implement your agent here
    obs, reward, terminated, truncated, info = env.step(action)

List of all the available WebArena environments

env_ids = [id for id in gym.envs.registry.keys() if id.startswith("browsergym/webarena")]
print("\n".join(env_ids))

WorkArena task example

Boilerplate code to run an agent on a WorkArena task:

import gymnasium as gym
import browsergym.workarena  # register workarena tasks as gym environments

env = gym.make("browsergym/workarena.servicenow.order-ipad-pro")
obs, info = env.reset()
done = False
while not done:
    action = ...  # implement your agent here
    obs, reward, terminated, truncated, info = env.step(action)

List of all the available WorkArena environments

env_ids = [id for id in gym.envs.registry.keys() if id.startswith("browsergym/workarena")]
print("\n".join(env_ids))

Demo

If you want to experiment with an agent in BrowserGym, follow these steps:

cd demo-agent
conda env create -f environment.yml; conda activate demo-agent
# or simply use `pip install -r requirements.txt`
playwright install

Optional: Set your OPENAI_API_KEY if you want to use a GPT agent.

Launch the demo on the open web:

python run_demo.py --task_name openended --start_url https://www.google.com

You can customize your experience by changing the model_name to your preferred LLM, toggling Chain-of-thought with use_thinking, adding screenshots for your VLMs with use_screenshot, and much more!

Citing This Work

Please use the following BibTeX to cite our work:

@misc{workarena2024,
      title={WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks?}, 
      author={Alexandre Drouin and Maxime Gasse and Massimo Caccia and Issam H. Laradji and Manuel Del Verme and Tom Marty and Léo Boisvert and Megh Thakkar and Quentin Cappart and David Vazquez and Nicolas Chapados and Alexandre Lacoste},
      year={2024},
      eprint={2403.07718},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

browsergym's People

Contributors

gasse avatar recursix avatar optimass avatar imenelydiaker avatar thibaultlsdc avatar aldro61 avatar xhluca avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.