lavague-ai / lavague Goto Github PK

View Code? Open in Web Editor NEW

5.2K 53.0 458.0 98.27 MB

Large Action Model framework to develop AI Web Agents

Home Page: https://docs.lavague.ai/en/latest/

License: Apache License 2.0

Python 69.32% HTML 0.30% SCSS 0.23% TypeScript 15.68% JavaScript 0.81% Jupyter Notebook 13.25% Gherkin 0.40%

ai browser large-action-model llm oss rag

lavague's Introduction

Welcome to LaVague

A Large Action Model framework for developing AI Web Agents

LaVague: Web Agent framework for builders

LaVague is an open-source framework designed for developers who want to create AI Web Agents to automate processes for their end users.

Our Web Agents can take an objective, such as "Print installation steps for Hugging Face's Diffusers library," and generate and perform the actions required to achieve the objective.

LaVague Agents are made up of:

A World Model that takes an objective and the current state (aka the current web page) and outputs an appropriate set of instructions.
An Action Engine which “compiles” these instructions into action code, e.g., Selenium or Playwright & executes them

LaVague QA: Dedicated tooling for QA Engineers

🌊 Built on LaVague

LaVague QA is a tool tailored for QA engineers leveraging our framework.

It allows you to automate test writing by turning Gherkin specs into easy-to-integrate tests. LaVague QA is a project leveraging the LaVague framework behind the scenes to make web testing 10x more efficient.

For detailed information and setup instructions, visit the LaVague QA documentation.

🚀 Getting Started

Demo

Here is an example of how LaVague can take multiple steps to achieve the objective of "Go on the quicktour of PEFT":

Hands-on

You can do this with the following steps:

Download LaVague with:

pip install lavague

Use our framework to build a Web Agent and implement the objective:

from lavague.core import  WorldModel, ActionEngine
from lavague.core.agents import WebAgent
from lavague.drivers.selenium import SeleniumDriver

selenium_driver = SeleniumDriver(headless=False)
world_model = WorldModel()
action_engine = ActionEngine(selenium_driver)
agent = WebAgent(world_model, action_engine)
agent.get("https://huggingface.co/docs")
agent.run("Go on the quicktour of PEFT")

# Launch Gradio Agent Demo
agent.demo("Go on the quicktour of PEFT")

For more information on this example and how to use LaVague, see our quick-tour.

Note, these examples use our default OpenAI API configuration and you will need to set the OPENAI_API_KEY variable in your local environment with a valid API key for these to work.

For an end-to-end example of LaVague in a Google Colab, see our quick-tour notebook

Key Features

✅ Built-in Contexts (aka. configurations)
✅ Customizable configuration
✅ A test runner for testing and benchmarking the performance of LaVague
✅ A Token Counter for estimating token usage and costs
✅ Logging tools
✅ An optional, interactive Gradio interface
✅ Debugging tools
✅ A Chrome Extension

Supported Drivers

We support three Driver options:

A Selenium Webdriver
A Playwright webdriver
A Chrome extension driver

Note that not all drivers support all agent features:

Feature	Selenium	Playwright	Chrome Extension
Headless agents	✅	⏳	N/A
Handle iframes	✅	✅	❌
Open several tabs	✅	⏳	✅
Highlight elements	✅	✅	✅

✅ supported
⏳ coming soon
❌ not supported

🔎 Support

If you're experiencing any issues getting started with LaVague, you can:

Check out our troubleshooting guide where we list information and fixes for common issues.
Opening a GitHub issue describing your issue
Messaging us in the '#support channel' on our Discord server

🙋 Contributing

We would love your help and support on our quest to build a robust and reliable Large Action Model for web automation.

To avoid having multiple people working on the same things & being unable to merge your work, we have outlined the following contribution process:

📢 We outline tasks using GitHub issues: we recommend checking out issues with the help-wanted & good first issue labels
🙋‍♀️ If you are interested in working on one of these tasks, comment on the issue!
🤝 We will discuss with you and assign you the task with a community assigned label
💬 We will then be available to discuss this task with you
⬆️ You should submit your work as a PR
✅ We will review & merge your code or request changes/give feedback

Please check out our contributing guide for more details.

🗺️ Roadmap

To keep up to date with our project backlog here.

💰 How much does it cost to run an agent?

LaVague uses LLMs, (by default OpenAI's gpt4-o but this is completely customizable), under the hood.

The cost of these LLM calls depends on:

the models chosen to run a given agent
the complexity of the objective
the website you're interacting with.

Please see our dedicated documentation on token counting and cost estimations to learn how you can track all tokens and estimate costs for running your agents.

📈 Data collection

We want to build a dataset that can be used by the AI community to build better Large Action Models for better Web Agents. You can see our work so far on building community datasets on our BigAction HuggingFace page.

This is why LaVague collects the following user data telemetry by default:

Version of LaVague installed
Code / List of actions generated for each web action step
The past actions
The "observations" (method used to check the current page)
LLM used (i.e GPT4)
Multi modal LLM used (i.e GPT4)
Randomly generated anonymous user ID
Whether you are using a CLI command (lavague-qa for example), the Gradio demo or our library directly.
The objective used
The chain of thoughts on the agent
The interaction zone on the page (bounding box)
The viewport size of your browser
The current step
The instruction(s) generated & the current engine used
The token costs & usages
The URL you performed an action on
Whether the action failed or succeeded
The extra used data specified
Error message, where relevant
The source nodes (chunks of HTML code retrieved from the web page to perform this action)

Be careful to NEVER includes personal information in your objectives and the extra user data. If you intend to includes personal information in your objectives/extra user data, it is HIGHLY recommended to turn off the telemetry.

🚫 Turn off all telemetry

If you want to turn off all telemetry, you should set the LAVAGUE_TELEMETRY environment variable to "NONE".

For guidance on how to set your LAVAGUE_TELEMTRY environment variable, see our guide here.

lavague's People

Contributors

Stargazers

Watchers

Forkers

twoxfh daikeren nguyenthanhson vesoai rahul007007 touristshaun brianjking techthiyanes tomchapin claudiutraistaru markzachary1 yazanghafir huyvuong sorokinvld whymath ravidhu jeffmartson acbanerj shaon2221 polya20 alparhan lpai-org jeaneigsi xyluo25 dkstar11q weiribao parmeet-97637 leixy76 kustomzone mivanovitch alenbadel punkt2 iwillcodeu babybirdprd zeroxclem svorwerk-flextg jmanhype apollohuang1 jeffara kamjin1996 zebrajack majiajue lyhiving antonpolishko nvtinh368 leizhenpeng fingerx iamuddeshya v6p mr2cool namastexlabs jorik041 jaytoday sekmet b08240 keyman9848 masterzsh stvnshpd hbcbh1999 wrhall mkgobaco-plutotv shubhamofbce martindale samit33 crisschan lplzyp dai karbon0x acumenix k2m5t2 jadepark-dev neuroradiology satchmaui jinchaofs nivir kotthoff yancy777 afsarali-pg daviddelaurier thearchiver mentordotgit jpchavat mocy levidehaan qidonghang invertednz syaikhipin miltos-thestargazer opcodewolf digo-luz jigza donghyun-daniel hsj1 zonewancheng fanxiaowei rkp64 suryatmodulus darshanparab dineshkumares hyoungphi

lavague's Issues

Split dependencies to install

We have a monolithic installation process, for instance cuda is installed by default, which is too heavy for some use cases.

We will provide soon a version where the Action Engine can be provided through a SaaS to make it easier for people who don't want to do computing locally, so their package will be very lightweight.

@lyie28 Could you have a look? I guess this can be done in conjunction with #59

Build hub for agents

Add new config files + integration notebooks such as Claude 3

Place your custom config file for integration with X API or model in examples/api
Add an integration notebook to docs/docs/integrations following the other integration notebooks as an example

Add support for LM Studio

Is it possible to connect LM Studio?

If not, can you add it?

Thanks.

Improve retrieval to make sure only relevant pieces of code are used for code generation

Integrate new experiments to improve model performance

@HiImMadness : see with @mbrunel to integrate the work you did on improving the retriever and the new prompt template to increase performance

The gradio-demo.ipynb does not work, also it does not show what is the error.

I tried running the gradio demo in colab. It is not working. Just shows an error message without telling what is causing the error.

Integrate vision

Setup data telemetry for failure of AI

Add telemetry to help us improve performance/reduce failures:

Collect:
URL
Date
HTML code
Retrieved nodes by Llama Index
LLM used
Error message
Code produced
Screenshot

Do new PyPi release

Puppeteer integration

Getting error message whenever I do "Enter URL and press 'Enter' to load the page."

When I run this, irrespective of which site I simply get "error" inside the virual screen. Tried several different websites and browsers and I get the same thing. No error in the code so presumably all dependencies are installed correctly... This example is the IRS site. Happens within the link and within colab.

User-Level Locking for Selenium Driver Instances

Assigning an ID to Selenium drivers and locking them for individual users

Add export integration with hub

Add modular different export options that will interact with hub when built: python, pytests, FastAPI etc.

Make command center CLI tool

The command center module should:

Get HTML source code from URL
Make calls to actionEngine using config file for customizable parameters such as models to use, prompt template to use, etc.
Execute the generated code

Provide a cleaning function to the action engine

I realize that depending on the model, the generated code can have different issues and needs to be cleaned differently.

For instance, with Hugging Face API, the generation continues after the end of the markdown cell

# First we need to identify the component first, then we can click on it.

# Based on the HTML, the link can be uniquely identified using the class "w-full rounded-full border border-gray-200 text-sm placeholder-gray-400 shadow-inner outline-none focus:shadow-xl focus:ring-1 focus:ring-inset dark:bg-gray-950 h-7 pl-7"
# Let's use this class with Selenium to identify the link
search_bar = driver.find_element(By.XPATH, "//*[@class='w-full rounded-full border border-gray-200 text-sm placeholder-gray-400 shadow-inner outline-none focus:shadow-xl focus:ring-1 focus:ring-inset dark:bg-gray-950 h-7 pl-7']")

search_bar.click()

# Now we can type the asked input
search_bar.send_keys("The Stack")

# Finally we can press the 'Enter' key
search_bar.send_keys(Keys.ENTER)

---

HTML:
<!DOCTYPE html>
<html>
<head>
    <title>Mock Page</title>
</head>```

For Azure OpenAI, it starts with ```python, with Mixtral on Fireworks AI it starts with a natural answer like "Here is an answer".

We should have the Action Engine be initialized with a cleaning function.

Make quick-tour notebook for Python CLI

VSCode plug-in for LaVague

Playwright support

Do you plan to support playwright?

Make getting started experience quicker and easier

Add docker option to "getting started"

Is your feature request related to a problem? Please describe.
Add a way for people to test and run this locally - not in a codelab. Docker is an easy path for folks.

Describe the solution you'd like
I'd like clear instructions on how to run this locally.

Cypress integration

Create wrapper for OpenAI for ease-of-use

code cleaning: add typing + comments, abstract driver

Make HF Gradio space

Error when running google colab

Describe the bug
When simply running all fields one for one in the google colab I get the following:

NoSuchDriverException Traceback (most recent call last)
in <cell line: 40>()
38
39 # Choose Chrome Browser
---> 40 driver = webdriver.Chrome(service=webdriver_service, options=chrome_options)
41
42 # action_engine = ActionEngine(llm, embedder)

2 frames
/usr/local/lib/python3.10/dist-packages/selenium/webdriver/common/driver_finder.py in get_path(service, options)
42
43 if path is None or not Path(path).is_file():
---> 44 raise NoSuchDriverException(f"Unable to locate or obtain driver for {options.capabilities['browserName']}")
45
46 return path

NoSuchDriverException: Message: Unable to locate or obtain driver for chrome; For documentation on this error, please visit: https://www.selenium.dev/documentation/webdriver/troubleshooting/errors/driver_location

I was able to fix it by removing the service from the initialization so instead of # Choose Chrome Browser
driver = webdriver.Chrome(service=webdriver_service, options=chrome_options) I did driver = webdriver.Chrome( options=chrome_options)

Build option to export with fastapi compatibility

Request for Enhancing Agent Robustness Against Bot Detection

It hasn't been effective on most of the sites I've attempted to use it on, as it's often identified as a bot. Are there any plans to enhance the agent's resilience against such detections?

Improving retriever for better precision/ accuracy when finding relevant HTML of current page

Browser won't load

The browser simply won't load. Tried different urls, but still no joy.

Unable to run ActionEngine in Colab

It gives a TypeError saying ,
Missing "Prompt" argument.

Kindly suggest what am I doing wrong.

Split LaVague dependencies so the package is not so heavy

Create contribution guidelines

avague-launch --file_path hf.txt --config_path openai.py

Hello folks,
I'm trying to get the project off its feet with the source code.
I gave the file path in the lavague launch command parameters, this did not work.
I tried the commands in the normal document.

I started with the setup.sh file, there was no error here.
Chrome driver unzip was done successfully.
I am working in a virtual environment with conda.

There's openai.py file on path lavague-files and examples/api

When the command tried on those paths, chromedriver error.

 lavague-launch --file_path hf.txt  --config_path openai.py                                                                                                                    
Traceback (most recent call last):
  File "/opt/anaconda3/envs/LaVague/bin/lavague-launch", line 8, in <module>
    sys.exit(launch())
  File "/Users/serdarkaracay/AiWebScraby/LaVague/src/lavague/__init__.py", line 121, in launch
    action_engine, get_driver = load_action_engine(config_path, streaming=True)
  File "/Users/serdarkaracay/AiWebScraby/LaVague/src/lavague/utils.py", line 18, in load_action_engine
    config_module = import_from_path(path)
  File "/Users/serdarkaracay/AiWebScraby/LaVague/src/lavague/utils.py", line 13, in import_from_path
    spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 846, in exec_module
  File "<frozen importlib._bootstrap_external>", line 982, in get_code
  File "<frozen importlib._bootstrap_external>", line 1039, in get_data
FileNotFoundError: [Errno 2] No such file or directory: 'openai.py'
(LaVague)

LaVague 
❯ conda info

     active environment : LaVague
    active env location : /opt/anaconda3/envs/LaVague
            shell level : 5
       user config file : /Users/serdarkaracay/.condarc
 populated config files : /Users/serdarkaracay/.condarc
          conda version : 24.1.2
    conda-build version : 24.1.2
         python version : 3.11.7.final.0
                 solver : libmamba (default)
       virtual packages : __archspec=1=m1
                          __conda=24.1.2=0
                          __osx=14.4.1=0
                          __unix=0=0
       base environment : /opt/anaconda3  (writable)
      conda av data dir : /opt/anaconda3/etc/conda
  conda av metadata url : None
           channel URLs : https://repo.anaconda.com/pkgs/main/osx-arm64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/osx-arm64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /opt/anaconda3/pkgs
                          /Users/serdarkaracay/.conda/pkgs
       envs directories : /opt/anaconda3/envs
                          /Users/serdarkaracay/.conda/envs
               platform : osx-arm64
             user-agent : conda/24.1.2 requests/2.31.0 CPython/3.11.7 Darwin/23.4.0 OSX/14.4.1 solver/libmamba conda-libmamba-solver/24.1.0 libmambapy/1.5.6 aau/0.4.3 c/QkeNtkUrgwz7nVYVSzIrAg s/5f_5979BIZ-G0e4oRXaKzg e/ATRt1h_Xl4cCAzCR1M6zcQ
                UID:GID : 501:20
             netrc file : None
           offline mode : False

MacOS M1 Max Sonoma 14.4.1
Python 3.9.19

Stand alone docker for offline execution

Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Describe the solution you'd like
A clear and concise description of what you want to happen.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

Create browser extension inspired by parts of Selenium IDE

-> Specs: doesn't require developer mode

Local LLM model support like Ollama

Is your feature request related to a problem? Please describe.
It would be nice to support local llms as most companies have data privacy policies especially around recent AI developments

Describe the solution you'd like
It would be nice to point to a local llm or have some integration with standard libraries like ollama

Fine-tune a gemma-7b for better local model

Add LiteLLM support so that we can use 100+ Different LLM's easily with this project

Litellm is open source project you can read about them if you want https://github.com/BerriAI/litellm
Many projects are integrating with it because with this we can use easily 100+ different LLMs as it helps in creating local proxy server which api structure is open ai compatible and with one apo structure it becomes easy to change model and api key and use easily without additional configurations and reading and implementing from individual llm docs so please think about it

Documentation: Add customization/under the hood guide

Install via Homebrew on macOS

Problem: not clear how to install and run locally on macOS.

Proposed solution: Allow to install the tool via the Homebrew package manager on macOS. Example: brew install lavague

Timeout error when using this project.

Hi,
First of all excellent work. I am facing issue with timeouts. I am able to install and run the demo. That's good. However when go the gradio app like this.
I typed in https://youtube.com

Then I supplied the following query

It went on to execute this statement forever. Moreover it does not display any errors or stack trace.

Create a decentralized dataset of interactions between users and LaVague to improve model and evaluate its performance

Create documentation

Add support for Hugging face inference with docker image

Is your feature request related to a problem? Please describe.

Currently the docker has only 2 commands, build and launch, and it launches the config.py file, what If I want to use huggingface_interface.py?

Can we modify the entrypoint.sh? and add the support for huggingface and possibly for others?

Describe the solution you'd like
Modifying the entrypoint.sh file and adding the support for the same.

Describe alternatives you've considered
I am not sure about the alternatives, but modifying the entrypoint.sh seems a good solution at this point.

Additional context

Setting up software CI processes & actions

automated testing needed after pushes to main