page-agent's Introduction

Page Agent

The GUI Agent Living in Your Webpage. Control web interfaces with natural language.

🌐 English | 中文

🚀 Demo | 📖 Docs | 📢 HN Discussion | 𝕏 Follow on X

✨ Features

  • 🎯 Easy integration
    • No need for browser extension / python / headless browser.
    • Just in-page javascript. Everything happens in your web page.
  • 📖 Text-based DOM manipulation
    • No screenshots. No multi-modal LLMs or special permissions needed.
  • 🧠 Bring your own LLMs
  • 🐙 Optional chrome extension for multi-page tasks.

💡 Use Cases

  • SaaS AI Copilot — Ship an AI copilot in your product in lines of code. No backend rewrite.
  • Smart Form Filling — Turn 20-click workflows into one sentence. Perfect for ERP, CRM, and admin systems.
  • Accessibility — Make any web app accessible through natural language. Voice commands, screen readers, zero barrier.
  • Multi-page Agent - Extend your own web agent's reach across browser tabs with the Chrome extension.
  • MCP - Allow your agent clients to control your browser.

🚀 Quick Start

One-line integration

Fastest way to try PageAgent with our free Demo LLM:

<script src="{URL}" crossorigin="true"></script>

⚠️ For technical evaluation only. This demo CDN uses our free testing LLM API. By using it, you agree to its terms.

Mirrors URL
Global https://cdn.jsdelivr.net/npm/page-agent@1.8.2/dist/iife/page-agent.demo.js
China https://registry.npmmirror.com/page-agent/1.8.2/files/dist/iife/page-agent.demo.js

Add ?autoInit=false to load the script without creating the demo agent automatically. You can then instantiate it with new window.PageAgent(...).

NPM Installation

npm install page-agent
import { PageAgent } from 'page-agent'

const agent = new PageAgent({
    model: 'qwen3.5-plus',
    baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
    apiKey: 'YOUR_API_KEY',
    language: 'en-US',
})

await agent.execute('Click the login button')

For more programmatic usage, see 📖 Documentation.

🌟 Awesome Page Agent

Built something cool with PageAgent? Add it here! Open a PR to share your project.

These are community projects — not maintained or endorsed by us. Use at your own discretion.

Project Description
Yours? Open a PR 🙌

🤝 Contributing

We welcome contributions from the community! See CONTRIBUTING.md for guidelines and docs/developer-guide.md for local development workflows.

Please read the maintainer's note on principles and current state.

Contributions generated entirely by bots or AI without substantial human involvement will not be accepted.

⚖️ License

MIT License

👏 Acknowledgments

This project builds upon the excellent work of browser-use.

PageAgent is designed for client-side web enhancement, not server-side automation.

DOM processing components and prompt are derived from browser-use:

Browser Use <https://github.com/browser-use/browser-use>
Copyright (c) 2024 Gregor Zunic
Licensed under the MIT License

We gratefully acknowledge the browser-use project and its contributors for their
excellent work on web automation and DOM interaction patterns that helped make
this project possible.

⭐ Star this repo if you find PageAgent helpful!

page-agent's Issues

[Docs] Full i18n docs

Community Communication / 社区沟通

  • I will be polite and respectful. / 我会保持礼貌与尊重。
  • I will share constructive, actionable suggestions. / 我会提供建设性、可行动的建议。
  • I have read the CODE_OF_CONDUCT.md and CONTRIBUTING.md. / 我已阅读行为准则。

Feature Description / 功能描述

Make sure all the doc pages are translated. Prioritize the English version.

[Feature] User `takeover`

Feature Description / 功能描述

  • User clicks the takeover button on the panel.
  • Stop the current step, including LLM calls, actions, and so on.
  • Hide the mask.
  • User handles the page.
  • User clicks the resume button.
  • A modal asks the user "what have you changed?"
  • User clicks continue.
  • Add the answer to the observation and resume the agent loop.
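
The flow above could be modeled as a small state machine. This is only a minimal sketch: `TakeoverController`, `AgentState`, and the `cancelStep` callback are hypothetical names for illustration, not part of PageAgent's API.

```typescript
// Hypothetical sketch of the takeover flow; none of these names are PageAgent's real API.
type AgentState = 'running' | 'taken_over' | 'awaiting_summary'

class TakeoverController {
    state: AgentState = 'running'
    maskVisible = true
    /** Notes from the user, to be appended to the agent's observations. */
    observations: string[] = []
    /** Whatever aborts the in-flight LLM call and pending actions. */
    private cancelStep: () => void

    constructor(cancelStep: () => void) {
        this.cancelStep = cancelStep
    }

    /** User clicks "takeover": stop the current step and hide the mask. */
    takeover(): void {
        if (this.state !== 'running') return
        this.cancelStep()
        this.maskVisible = false
        this.state = 'taken_over'
    }

    /** User clicks "resume": the panel should now ask "what have you changed?". */
    requestResume(): void {
        if (this.state === 'taken_over') this.state = 'awaiting_summary'
    }

    /** User clicks "continue": record the answer, restore the mask, restart the loop. */
    resume(whatChanged: string): void {
        if (this.state !== 'awaiting_summary') return
        const note = whatChanged.trim()
        if (note) this.observations.push(`User takeover: ${note}`)
        this.maskVisible = true
        this.state = 'running'
    }
}
```

Calls made in the wrong state are ignored rather than thrown, so a double click on the panel would not corrupt the loop.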

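[Bug] Asked the AI to judge whether a button works as expected. After clicking the Chinese/English toggle, the result should be correct, but the AI says it is wrong.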

What happened?

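I asked the AI to judge whether a button works as expected. After clicking the Chinese/English toggle button, the expected verdict is "correct", but the AI says it is wrong.

Steps:

Image

Result:

Image

Full-screen screenshot: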

Image

Code

null

Browser

chrome 143.0.7475.8

version

0.0.4

[Bug] Error when loading via the CDN quick start

What happened?

Loading it via <script src="https://cdn.jsdelivr.net/npm/page-agent@latest/dist/umd/index.js"></script> throws an error.

Image

Code

Browser

No response

version

No response

[Feature] Add docs for local .env file

Feature Description / 功能描述

Standardize dot env file and add docs for it.
Clean up the local dev pipeline.

[Refactor] Monorepo?

Feature Description / 功能描述

It's getting clearer that I need to separate the agent loop, the DOM processing logic, the GUI panel, the official site and (potentially) a chrome extension for multi-page agent.

Hosting under Alibaba group makes it impossible to create multiple repos for one project.

Using branches to do this is not acceptable.

Although I dislike mono-repos very much, it is inevitable.

  • Refactor the project into a (simplified) mono-repo.
  • Use raw npm workspaces. Avoid the mono-repo tooling ecosystem.

[Feature] Add switches for model patch

Feature Description / 功能描述

Currently, patches are applied according to the model name. But in some cases, the user may use a proxy or router service that already patches certain models in the backend. It would be better to let the user switch patches on and off manually.
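
A rough sketch of what such switches could look like: defaults still come from the model name, and an explicit override wins. The patch names (`fixIncompleteJson`, `disableThinking`), the matching rules, and `resolvePatches` are all hypothetical; the real patch list may differ.

```typescript
// Hypothetical names throughout; PageAgent's real patch list and config keys may differ.
type PatchName = 'fixIncompleteJson' | 'disableThinking'

/** Default patches, selected by a pattern on the model name. */
const DEFAULT_PATCHES: Array<{ match: RegExp; patches: PatchName[] }> = [
    { match: /deepseek/i, patches: ['fixIncompleteJson'] },
    { match: /qwen/i, patches: ['fixIncompleteJson', 'disableThinking'] },
]

/** Per-patch override: true forces on, false forces off, undefined keeps the default. */
type PatchOverrides = Partial<Record<PatchName, boolean>>

function resolvePatches(model: string, overrides: PatchOverrides = {}): PatchName[] {
    const enabled = new Set<PatchName>()
    // 1. Start from the defaults implied by the model name.
    for (const rule of DEFAULT_PATCHES) {
        if (rule.match.test(model)) rule.patches.forEach((p) => enabled.add(p))
    }
    // 2. Apply manual switches on top.
    const names = Object.keys(overrides) as PatchName[]
    for (const name of names) {
        const on = overrides[name]
        if (on === true) enabled.add(name)
        if (on === false) enabled.delete(name)
    }
    return Array.from(enabled).sort()
}
```

With this shape, a user behind a router that already patches Qwen output could pass `{ fixIncompleteJson: false, disableThinking: false }` and get an empty patch list.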

Roadmap V1

see #272

🗺️ PageAgent Roadmap

🚀 Current Works

  • MVP
    • Core functionality implemented
  • SPA interaction
  • Reasoning and (short) memory
  • Multi model provider integration and testing
  • UI with HITL
    • Human-in-the-loop user interface. Agent can ask user questions.
  • Landing and doc pages
  • Remove ai-sdk
    • Only one function of ai-sdk is being used.
    • Our agent memory and thinking mechanism does not suit ai-sdk.
  • Robust LLM output
    • Auto-fix incomplete output format of DeepSeek and QWen.
  • Working homepage with live LLM API
  • CDN
  • Free testing API
  • Custom actions and HITL
  • Hooks and Events
    • lifecycle hooks
    • lifecycle events
  • User takeover
  • ❗Hijack page_open/page_change/page_unload behavior
  • Custom knowledge base and instructions
  • Safeguard
  • Data-masking
  • Improve Memory
  • Optimize for popular UI frameworks
  • i18n of the website
    • Chinese version
    • English version
  • Refactor: Separate Agent and PageController
  • Move mask and mouse simulator to page-controller.
  • 🚩 Stable release v1
  • 🚩 Chrome extension for multi-page tasks
  • Edge Extension
  • Firefox Extension

♻️ Following browser-use's update and contribute back.

📋 Pending Features

  • MCP
  • Tools for more complex tasks
    • todo list
    • file sys
  • Support custom llm fetch
  • Test suites

🤔 To Be Decided

  • Safari Extension
  • Same-origin multi-page-app relay
  • Backend states relay for cross-page task without extension?

[Feature] More flexible `AgentHistory` for user `observation` and `user-takeover`

Feature Description / 功能描述

observation and user-takeover may need to be added to memory. Currently a history item only stores actions. Either allow non-action items in history, or add non-action fields before or after an action.
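
The first option (non-action items in history) might look like a discriminated union. This is a sketch under assumed names: `HistoryEvent`, its fields, and `renderHistory` are illustrative and not the current `AgentHistory` type.

```typescript
// Hypothetical shape; field names are illustrative, not PageAgent's actual types.
type HistoryEvent =
    | { type: 'step'; action: { name: string; input: unknown }; output: string }
    | { type: 'observation'; content: string }
    | { type: 'user_takeover'; summary: string }

/** Render mixed history items into the text block that is fed back to the model. */
function renderHistory(history: HistoryEvent[]): string {
    return history
        .map((event, i) => {
            switch (event.type) {
                case 'step':
                    return `${i + 1}. action ${event.action.name} -> ${event.output}`
                case 'observation':
                    return `${i + 1}. observation: ${event.content}`
                case 'user_takeover':
                    return `${i + 1}. user took over: ${event.summary}`
            }
        })
        .join('\n')
}
```

The `type` tag keeps the list ordered in time, which the "fields before or after an action" alternative would have to encode some other way.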

[Bug] After entering an instruction, the console reports: step failed: Tool arguments validation failed: action "0" with args "{"

What happened?

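After installing via npm, I integrated the following in main.js:

const agent = new PageAgent({
    model: DEMO_MODEL,
    baseURL: DEMO_BASE_URL,
    apiKey: DEMO_API_KEY,
    language: 'zh-CN'
})
// await agent.execute("Add a new system name entry")
// or show the panel so the user can type an instruction
agent.panel.show()

After starting the project, I entered a task in the page panel and confirmed execution, and the error was thrown. I confirmed that the model service was connected successfully.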

Code

import { PageAgent } from 'page-agent'
const DEMO_MODEL = 'Qwen/Qwen3-Coder-480B-A35B-Instruct'
const DEMO_BASE_URL = 'https://api-inference.modelscope.cn/v1'
const DEMO_API_KEY = 'xxxx'
const agent = new PageAgent({
  model: DEMO_MODEL,
  baseURL: DEMO_BASE_URL,
  apiKey: DEMO_API_KEY,
  language: 'zh-CN'
})
// await agent.execute("Add a new system name entry")
// or show the panel so the user can type an instruction
agent.panel.show()

Browser

143.0.7499.109

version

0.0.13

[Bug] The panel was not requested at initialization, but it still exists and captures events, blocking interaction with the original page

What happened?

Screenshot:
Image

Screen recording:
https://github.com/user-attachments/assets/ffe4eeb1-7e5b-454a-ac00-8d052ece2bc6

Code

Browser

chrome 143

version

0.0.15

[Feature] Support `LLMs.txt`

Feature Description / 功能描述

https://llmstxt.org/

Why not?

🌟 V1

🤞 Hope to publish the first stable version of PageAgent next week.

[Docs] Update `Core Lib Development and Testing`

What happened?

Core Lib Development and Testing in Contributing does not work any more because of the mono-repo refactor.

Code

Browser

No response

version

No response

🌟 Chrome Extension is Here!

It's about time ⚡️

Download!

Image

Multi-Page-Agent with chrome extension.

Also a secure way to expose the MultiPageAgent to a website. So that your own web-agent can control the whole browser!

[Feature] Optimize tool-call for smaller models.

Feature Description

PageAgent currently uses a very complex tool-call schema to combine self-reflection and action in a single llm call.

Image

However the nested tool-call schema can be very challenging for models not optimized for this kind of task.

There are potential approaches to optimize for this:

📖 Describe tool schema in prompt.

It may be helpful to use a text description in the prompt instead of the tools API.

Not sure if it can improve the success rate, but it can cover open-source models (which rarely support tool calls).

🪓 Remove the self-reflection process.

Use a simple array of actions as tools. Skip the self-reflection process once and for all.

  • remove <output> from prompt
  • a new LLMClient to support normal tool calls (but also lenient?)

💦 Separate self-reflection into a dedicated llm call.

  • 2 calls every step.

or

  • add a new tool called thinking and tell the model to use it every other step

In any case, this can be a lot of work. It needs a better structure to control MacroTool in the Agent and LLM modules.
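
To make the "describe tool schema in prompt" idea concrete, here is a minimal sketch that renders tools as plain text and leniently parses the model's reply. `ToolSpec`, `describeTools`, and `parseAction` are hypothetical helpers, not existing PageAgent code, and the prompt wording is an assumption.

```typescript
// Hypothetical sketch; only the action name in the usage note comes from this tracker.
interface ToolSpec {
    name: string
    description: string
    params: Record<string, string> // param name -> human-readable type
}

/** Render tools as plain text for models without tool-call support. */
function describeTools(tools: ToolSpec[]): string {
    const lines = tools.map((t) => {
        const params = Object.entries(t.params)
            .map(([key, type]) => `"${key}": <${type}>`)
            .join(', ')
        return `- ${t.name}: ${t.description}. Reply with {"${t.name}": {${params}}}`
    })
    return ['You can use exactly one of these actions per step:'].concat(lines).join('\n')
}

/** Parse the span from the first "{" to the last "}" and validate the action name. */
function parseAction(reply: string, tools: ToolSpec[]): { name: string; args: unknown } {
    const start = reply.indexOf('{')
    const end = reply.lastIndexOf('}')
    if (start < 0 || end <= start) throw new Error('No JSON object found in reply')
    const parsed: unknown = JSON.parse(reply.slice(start, end + 1))
    if (typeof parsed !== 'object' || parsed === null || Array.isArray(parsed)) {
        throw new Error('Reply is not a JSON object')
    }
    const record = parsed as Record<string, unknown>
    const keys = Object.keys(record)
    const name = keys.length === 1 ? keys[0] : undefined
    if (name === undefined || !tools.some((t) => t.name === name)) {
        throw new Error(`Unknown or ambiguous action: ${keys.join(', ')}`)
    }
    return { name, args: record[name] }
}
```

For example, a reply such as `Sure: {"click_element_by_index": {"index": 3}}` would parse to the `click_element_by_index` action even though the model wrapped it in chatter. Argument validation is left out here and would still be needed.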

[Bug] Cannot switch options when automating a select dropdown

What happened?

Fruit: setAddress(e.target.value)} className="border border-gray-300 rounded-md p-2 flex-1" > Please choose your favorite fruit / Apple / Banana / Orange / Grape
For a select dropdown like this one, the agent cannot switch options.

Code

Browser

No response

version

No response

Incompatible with Zod v3: `TypeError: process is not a function` in `toJSONSchema`

Description

When using page-agent in a project that has zod 3.x installed, the agent fails with a TypeError at runtime. page-agent appears to depend internally on Zod v4's API (specifically the toJSONSchema / zod-to-json-schema processing), which is not compatible with Zod v3.

Error

TypeError: (0 , _to_json_schema_js__WEBPACK_IMPORTED_MODULE_0__.process) is not a function
    at Module.toJSONSchema (json-schema-processors.js:602:1)
    at zodToOpenAITool (page-agent-llms.js:71:1)
    at page-agent-llms.js:146:1
    at Array.map (<anonymous>)
    at OpenAIClient.invoke (page-agent-llms.js:146:1)
    at withRetry.maxRetries (page-agent-llms.js:355:1)
    at withRetry (page-agent-llms.js:384:1)
    at _LLM.invoke (page-agent-llms.js:352:1)
    at _PageAgent.execute (page-agent-core.js:457:1)

Steps to Reproduce

  1. Install page-agent in a project that already has zod 3.x as a dependency
  2. Call PageAgent.execute()
  3. The error is thrown when zodToOpenAITool tries to convert Zod schemas to JSON Schema

Expected Behavior

page-agent should either:

  • Be compatible with both Zod v3 and Zod v4, or
  • Declare zod 4.x as a peerDependency so users are aware of the version requirement

Environment

  • zod version: 3.x (project-level dependency)
  • page-agent internally expects: zod 4.x

[Feature] Add `observation` to context.

Feature Description / 功能描述

Certain important info can be very helpful for the Agent.

  • URL changed (a previous action caused an in-page navigation)
  • Page content did not change (the previous action made no change to the page)
  • console errors
  • long wait (should stop waiting)
  • too many steps (should sum up now)

Either add it to AgentBrain (stay in memory?) or <browser_state>. Or maybe a new block called <observation>.
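
As an illustration of the `<observation>` block option, the sketch below derives notes by comparing two snapshots. `StepSnapshot`, `collectObservations`, and the thresholds (10 seconds, last two steps) are assumptions made for this example, not existing behavior.

```typescript
// Hypothetical snapshot shape and thresholds; tune them to the real agent loop.
interface StepSnapshot {
    url: string
    domText: string // simplified page text sent to the LLM
    consoleErrors: string[]
    waitedSeconds: number
}

function collectObservations(
    prev: StepSnapshot,
    curr: StepSnapshot,
    step: number,
    maxSteps: number,
): string[] {
    const notes: string[] = []
    if (curr.url !== prev.url) {
        notes.push(`URL changed from ${prev.url} to ${curr.url}`)
    } else if (curr.domText === prev.domText) {
        notes.push('Page content did not change after the previous action')
    }
    for (const err of curr.consoleErrors) notes.push(`Console error: ${err}`)
    if (curr.waitedSeconds >= 10) notes.push('Waited a long time; stop waiting and try something else')
    if (step >= maxSteps - 2) notes.push('Running out of steps; summarize results now')
    return notes
}

/** Wrap notes in an <observation> block, or return an empty string when there is nothing to say. */
function renderObservationBlock(notes: string[]): string {
    return notes.length ? `<observation>\n${notes.join('\n')}\n</observation>` : ''
}
```

Returning an empty string when nothing happened keeps the prompt unchanged for uneventful steps.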

[Feature] Upgrade User Event Simulator

Feature Description / 功能描述

Refer to @testing-library/user-event.

Consider separating Event Simulator to a dedicated package.

[Refactor] Separate PageController.

Feature Description / 功能描述

Reason:

  • The PageAgent main loop should not rely heavily on the DOM and page environment. That reliance makes it difficult to port the loop to a pure JavaScript environment (such as a service worker, Node, or an extension background script) or to add multi-page features.
  • The page controller should not rely on the LLM, so that it can be covered by unit tests.

Goal:

  • Add a new package (folder: page-controller, package name: @page-agent/page-controller).
  • Communication between the two main modules should always be treated as async and isolated. That means actual references to DOM elements and objects from the page controller should never be passed to the main loop, and vice versa.
  • The APIs of PageController should be simple enough for potential remote calling.
  • Everything in the current dom module should move to PageController and be exposed as async methods. State such as selectorMap and elementTextMap should be stored only inside PageController.
  • All the actions in tools/actions that run in a page should be moved to PageController. Tools should call async methods on PageController to actually control the page.
  • PageController works independently.
  • To avoid the fuss, for now the PageAgent class can just import PageController and save the instance on this.pageController.
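
The boundary described above could be sketched as an async-only interface that exchanges plain data. The names below (`PageControllerLike`, `FakePageController`, `AgentLoop`) and the method signatures are hypothetical, chosen only to show the separation; they are not the shipped API.

```typescript
// Hypothetical interface: method names are illustrative, not the shipped API.
interface PageControllerLike {
    /** Returns a serializable text snapshot; never DOM references. */
    getBrowserState(): Promise<{ url: string; content: string }>
    clickElement(index: number): Promise<string>
    inputText(index: number, text: string): Promise<string>
}

/** In-memory stand-in, so the agent loop can be unit-tested without a DOM or an LLM. */
class FakePageController implements PageControllerLike {
    log: string[] = []
    async getBrowserState() {
        return { url: 'https://example.test/', content: '[0]<button>Login</button>' }
    }
    async clickElement(index: number) {
        this.log.push(`click ${index}`)
        return `Clicked element ${index}`
    }
    async inputText(index: number, text: string) {
        this.log.push(`input ${index} ${text}`)
        return `Typed into element ${index}`
    }
}

class AgentLoop {
    pageController: PageControllerLike
    constructor(pageController: PageControllerLike) {
        this.pageController = pageController
    }
    /** Runs one scripted action; a real loop would ask the LLM which action to take. */
    async step(action: { click: number } | { input: [number, string] }): Promise<string> {
        if ('click' in action) return this.pageController.clickElement(action.click)
        return this.pageController.inputText(action.input[0], action.input[1])
    }
}
```

Because only indexes and strings cross the boundary, the same interface could later be backed by a remote call (for example from an extension background script) without changing the loop.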

[Feature] qwen3.5 support

Feature Description / 功能描述

qwen3.5 support

[Feature] Multi-page process without Chrome Plugin

Feature Description / 功能描述

Being able to navigate between different routes within the same domain while keeping the plan and state would be extremely useful in SaaS platforms.

Right now the only option is the Chrome plugin, which works but is a barrier to mass adoption.

Is it technically possible to implement multi-page plans in a secure, standard way?

[Bug] After deleting baseURL, apiKey, model in Chrome extension settings and saving, an error occurs

What happened?

Originally, I wanted to reset to the default settings, but there was no button for that, so I deleted baseURL, apiKey, and model and saved. Since then I have been stuck on the error page.

Image

Code

Browser

Edge 145.0.3800.82 (arm64)

version

0.1.11

[Feature] Remove `pause`

Feature Description / 功能描述

  • Pausing during a task doesn't make much sense.
  • Taking over is a common need for real-time GUI Agents.

[Feature] Knowledge Injection

Feature Description / 功能描述

Implement the "Knowledge Injection" feature as planned in the doc.

[Docs] Give best practices

Feature Description / 功能描述

Give best practices and suggestions.

  • Task prompts
  • Model selection
  • Page requirements

[Bug] Model invocation error when running a model in LM Studio

What happened?

Running the model qwen3-30b-a3b-2507 in LM Studio fails with InvokeError: HTTP 400: Bad Request

Image { "model": "qwen/qwen3-30b-a3b-2507", "temperature": 1, "messages": [ { "role": "system", "content": "You are an AI agent designed to operate in an iter... ...ame\": {// action-specific parameter}}\n}\n\n" }, { "role": "user", "content": "\n\nAlice Williams是第几个\n ...捷操作\n管理操作\n备忘\n高亮 />\n[End of page]\n\n\n" } ], "tools": [ { "type": "function", "function": { "name": "AgentOutput", "description": "You MUST call this tool every step. Outputs your reflections and next action.", "parameters": { "type": "object", "properties": { "evaluation_previous_goal": { "type": "string" }, "memory": { "type": "string" }, "next_goal": { "type": "string" }, "action": { "anyOf": [ { "type": "object", "properties": { "done": { "type": "object", "properties": { "text": { "type": "string" }, "success": { "default": true, "type": "boolean" } }, "required": [ "text", "success" ], "additionalProperties": false } }, "required": [ "done" ], "additionalProperties": false, "description": "Complete task - provide a summary of results for t... ... be your response to the user summarizing results." }, { "type": "object", "properties": { "wait": { "type": "object", "properties": { "seconds": { "default": 1, "type": "number", "minimum": 1, "maximum": 10 } }, "required": [ "seconds" ], "additionalProperties": false } }, "required": [ "wait" ], "additionalProperties": false, "description": "Wait for x seconds. default 1s (max 10 seconds, mi... ...ed to wait until the page or data is fully loaded." 
}, { "type": "object", "properties": { "click_element_by_index": { "type": "object", "properties": { "index": { "type": "integer", "minimum": 0, "maximum": 9007199254740991 } }, "required": [ "index" ], "additionalProperties": false } }, "required": [ "click_element_by_index" ], "additionalProperties": false, "description": "Click element by index" }, { "type": "object", "properties": { "input_text": { "type": "object", "properties": { "index": { "type": "integer", "minimum": 0, "maximum": 9007199254740991 }, "text": { "type": "string" } }, "required": [ "index", "text" ], "additionalProperties": false } }, "required": [ "input_text" ], "additionalProperties": false, "description": "Click and input text into a input interactive element" }, { "type": "object", "properties": { "select_dropdown_option": { "type": "object", "properties": { "index": { "type": "integer", "minimum": 0, "maximum": 9007199254740991 }, "text": { "type": "string" } }, "required": [ "index", "text" ], "additionalProperties": false } }, "required": [ "select_dropdown_option" ], "additionalProperties": false, "description": "Select dropdown option for interactive element index by the text of the option you want to select" }, { "type": "object", "properties": { "scroll": { "type": "object", "properties": { "down": { "default": true, "type": "boolean" }, "num_pages": { "default": 0.1, "type": "number", "minimum": 0, "maximum": 10 }, "pixels": { "type": "integer", "minimum": 0, "maximum": 9007199254740991 }, "index": { "type": "integer", "minimum": 0, "maximum": 9007199254740991 } }, "required": [ "down", "num_pages" ], "additionalProperties": false } }, "required": [ "scroll" ], "additionalProperties": false, "description": "Scroll the page by specified number of pages (set ... ...l by a specific number of pixels instead of pages." 
}, { "type": "object", "properties": { "scroll_horizontally": { "type": "object", "properties": { "right": { "default": true, "type": "boolean" }, "pixels": { "type": "integer", "minimum": 0, "maximum": 9007199254740991 }, "index": { "type": "integer", "minimum": 0, "maximum": 9007199254740991 } }, "required": [ "right", "pixels" ], "additionalProperties": false } }, "required": [ "scroll_horizontally" ], "additionalProperties": false, "description": "Scroll the page or element horizontally (set right... ...its scroll container (works well for wide tables)." }, { "type": "object", "properties": { "open_new_tab": { "type": "object", "properties": { "url": { "type": "string", "description": "The URL to open in the new tab" } }, "required": [ "url" ], "additionalProperties": false } }, "required": [ "open_new_tab" ], "additionalProperties": false, "description": "Open a new browser tab with the specified URL. The new tab becomes the current tab for all subsequent page operations." }, { "type": "object", "properties": { "switch_to_tab": { "type": "object", "properties": { "tab_id": { "type": "integer", "minimum": -9007199254740991, "maximum": 9007199254740991, "description": "The tab ID to switch to" } }, "required": [ "tab_id" ], "additionalProperties": false } }, "required": [ "switch_to_tab" ], "additionalProperties": false, "description": "Switch to an existing tab by its ID. After switchi... ...ch to tabs in the tab list shown in browser state." }, { "type": "object", "properties": { "close_tab": { "type": "object", "properties": { "tab_id": { "type": "integer", "minimum": -9007199254740991, "maximum": 9007199254740991, "description": "The tab ID to close" } }, "required": [ "tab_id" ], "additionalProperties": false } }, "required": [ "close_tab" ], "additionalProperties": false, "description": "Close a tab by its ID. Cannot close the initial tab. Optionally specify which tab to switch to after closing." 
} ] } }, "required": [ "action" ], "additionalProperties": false } } } ], "parallel_tool_calls": false, "tool_choice": { "type": "function", "function": { "name": "AgentOutput" } } }

Code

Browser

No response

version

No response

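Can page-agent really "control web interfaces"?

What happened?

I studied your project's implementation and found several structural problems. In the following areas in particular, it may be fundamentally unable to deliver what you claim:

  1. Cannot handle target buttons inside iframes

Many buttons on real-world pages are nested in iframes (login boxes, payment confirmation dialogs, and so on). Your current DOM extraction runs only on window.document and cannot reach nested levels.

Example scenario to reproduce: any nested popup structure on Taobao or Alipay.

  2. No asynchronous DOM observation

In SPA / dynamic-loading scenarios, the target button appears some time after the page loads. You do not use MutationObserver to watch for DOM changes, so execute() calls can easily fail.

  3. Model responses are stateless, with no context memory

The so-called "natural language control" relies entirely on the LLM generating instructions in one shot. It cannot handle multi-turn interaction (for example: click a menu → second item → fill in a form → submit), which severely limits practical use.

  4. Security risks are not restricted

Have you considered the following risks:

  • Can it invoke a payment button?
  • Can repeated clicks cause a denial of service?
  • Could fake clicks be constructed through CSRF injection?

  5. Puppeteer does all of this more reliably

If everything page-agent does (clicking buttons / reading text / DOM control) can be done more reliably with Puppeteer, doesn't your positioning need to be redefined?

Suggested points to respond to

  • Do you plan to support nested iframe scenarios?
  • Will you introduce a state-chain mechanism to support multi-turn conversations?
  • Will you consider defining page-agent's scope of applicability more clearly?

I personally strongly support the development of agent projects, but in its current state this project still has a lot of misleading promotion and structural problems. I suggest the team face these and clarify.

Looking forward to your response. (Your project's understanding of "controlling web pages with natural language" is still at the "click a few buttons" stage, stuck in 2020. I do not see what is innovative about this "page-agent". The 20 lines of code below, plus a ChatGPT prompt, can do the core of your whole project. Why wrap it in a complex shell?)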

Code

// Suppose ChatGPT gives us the natural-language instruction: "Click the 'Log in' (登录) button on the page"
const prompt = "点击页面上的‘登录’按钮";

// Minimal hand-written LLM → DOM lookup mapping
const elements = Array.from(document.querySelectorAll('button, a, input[type="button"], input[type="submit"]'));

// Naively match elements whose innerText or aria-label contains "登录" (log in)
const target = elements.find(el => {
  const text = el.innerText || el.getAttribute('aria-label') || '';
  return /登录/.test(text);
});

// If found, simulate a click
if (target) {
  console.log("Found target button:", target);
  target.click();
} else {
  console.warn("No button containing '登录' (log in) was found");
}

Browser

chrome 120

version

0.1

[Feature] Security & Permissions

Feature Description / 功能描述

Implement the "Security & Permissions" features planned in the doc.

  • black list of interactive elements
  • black/white list of page urls
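
A minimal sketch of how the two lists could be checked before the agent acts. `UrlPolicy`, `isUrlAllowed`, and `isElementBlocked` are hypothetical names; the planned options in the docs may be shaped differently.

```typescript
// Hypothetical config shape; PageAgent's planned options may be named differently.
interface UrlPolicy {
    allow?: RegExp[] // if set and non-empty, a URL must match at least one pattern
    deny?: RegExp[] // a URL matching any of these is always rejected
}

/** Deny rules win over allow rules; an empty policy allows everything. */
function isUrlAllowed(url: string, policy: UrlPolicy): boolean {
    if (policy.deny?.some((re) => re.test(url))) return false
    if (policy.allow && policy.allow.length > 0) return policy.allow.some((re) => re.test(url))
    return true
}

/** Element blacklist expressed as CSS selectors, checked via a caller-supplied matcher. */
function isElementBlocked(
    matches: (selector: string) => boolean,
    blockedSelectors: string[],
): boolean {
    return blockedSelectors.some((selector) => matches(selector))
}
```

In a page, the matcher would typically be `(selector) => element.matches(selector)`. Patterns should not use the `g` flag, since `RegExp.prototype.test` is stateful for global regexes.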

[Feature] Can it use the Qwen, Gemini, or ChatGPT models that the current browser is logged in to, using the browser's existing authentication?

Feature Description / 功能描述

Can it use the Qwen, Gemini, or ChatGPT models that the current browser is logged in to, using the browser's existing authentication?
Like this extension does: https://github.com/arsczx/gemini-nexus

[Bug] ollama

What happened?

// OpenAI-compatible services (e.g., Alibaba Bailian)
const pageAgent = new PageAgent({
baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
apiKey: 'your-api-key',
model: 'qwen-plus'
});

// Self-hosted models (e.g., Ollama)
const pageAgent = new PageAgent({
baseURL: 'http://localhost:11434/v1',
apiKey: 'N/A', // Ollama typically accepts any value
model: 'qwen3:latest'
});

// Free testing endpoint
// Note: Rate-limited, content-filtered, subject to change. Replace with your own.
// Note: Uses official DeepSeek-chat (3.2). See DeepSeek website for terms & privacy.
const DEMO_MODEL = 'PAGE-AGENT-FREE-TESTING-RANDOM'
const DEMO_BASE_URL = 'https://hwcxiuzfylggtcktqgij.supabase.co/functions/v1/llm-testing-proxy'
const DEMO_API_KEY = 'PAGE-AGENT-FREE-TESTING-RANDOM'

Where should I write this?

Code

q

Browser

Edge

version

No response

[Bug] Timestamp in the component is not updated

What happened?

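The timestamp in the Panel.ts component is not updated. Every step takes more than one second, but the timestamp shown appears to be the task start time.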

Image

Code

const agent = new PageAgent({
    instructions: {
        system: `
You are an in-page assistant. When the user's request is unclear, ask the user to clarify.
`
    },
    model: model,
    baseURL: baseURL,
    apiKey: apiKey,
    language: language,
    enableMask: true  // enable the visual mask
});

Browser

chrome 145.0.7632.117

version

1.4.0

[Feature] Data Masking

Feature Description / 功能描述

Implement the data masking feature as planned in the doc.

Current Progress

Hi there!

This project is still a work in progress. Docs are pretty much drafts.

@see ROADMAP

Check out the demo on the website.

Be patient.

Does defining apiKey directly in the frontend risk leaking the apiKey?

Problem

Image

Solution

It seems the frontend and backend should be separated: provide a frontend UI SDK and a backend SDK, and move the LLM communication to the backend?
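
One way to picture the backend half of this suggestion is a small proxy that attaches the key on the server, so the browser never sees it. The sketch below is only an illustration: `ProxyEnv`, `UpstreamRequest`, and `buildUpstreamRequest` are hypothetical names, not part of page-agent, and it assumes the provider exposes an OpenAI-compatible `/chat/completions` route under its base URL.

```typescript
// Hypothetical sketch: none of these names come from page-agent.
interface ProxyEnv {
  upstreamBaseURL: string // assumed OpenAI-compatible base URL of the provider
  apiKey: string // real key, stored only on the server
}

interface UpstreamRequest {
  url: string
  method: 'POST'
  headers: Record<string, string>
  body: string
}

// Build the request that the backend forwards to the LLM provider.
// The JSON body coming from the browser is passed through unchanged;
// the Authorization header is always created here, from the server-side key.
function buildUpstreamRequest(clientBody: string, env: ProxyEnv): UpstreamRequest {
  const base = env.upstreamBaseURL.replace(/\/+$/, '') // drop trailing slashes
  return {
    url: `${base}/chat/completions`,
    method: 'POST',
    headers: {
      'content-type': 'application/json',
      authorization: `Bearer ${env.apiKey}`,
    },
    body: clientBody,
  }
}
```

With something like this in place, the page could point `baseURL` at your own backend route and pass a placeholder `apiKey`. The backend route would still need its own authentication and rate limiting, which this sketch leaves out.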

Proposed API

[Bug] Console errors after closing the agent from the UI.

What happened?

After clicking the "close" button on the panel, 3 errors are thrown in the console:

Image

The behavior otherwise seems correct.

Code

Browser

No response

version

No response

demo error

What happened?

export declare class PageAgent extends EventTarget {
    #private;
    config: PageAgentConfig;
    id: string;
    panel: Panel;
    tools: typeof tools;
    paused: boolean;
    disposed: boolean;
    task: string;
    taskId: string;
    /** PageController for DOM operations */
    pageController: PageController;
    /** Fullscreen mask */
    mask: SimulatorMask;
    /** History records */
    history: AgentHistory[];
    constructor(config?: PageAgentConfig);
    /**
     *
     * @todo maybe return something?
     */
    execute(task: string): Promise;
    dispose(reason?: string): void;
}

import { PageAgent } from 'page-agent'

// test server
// @note: rate limit. prompt limit. Origin limit. May change anytime. Use your own llm!
// @note Using official DeepSeek-chat(3.2). Go to DeepSeek website for privacy policy.
const DEMO_MODEL = 'PAGE-AGENT-FREE-TESTING-RANDOM'
const DEMO_BASE_URL = 'https://hwcxiuzfylggtcktqgij.supabase.co/functions/v1/llm-testing-proxy'
const DEMO_API_KEY = 'PAGE-AGENT-FREE-TESTING-RANDOM'

const agent = new PageAgent({
    modelName: DEMO_MODEL,
    baseURL: DEMO_BASE_URL,
    apiKey: DEMO_API_KEY,
    language: 'en-US',
})

await agent.execute('Click the login button')

Code

Browser

No response

version

No response

[Bug] UI rendered incompletely when the model output contains "`"

What happened?

Model output:

Image

UI:

Image

Code

Browser

No response

version

No response

[Feature] Move `ui` to a dedicated package.

Feature Description / 功能描述

Progressively decouple the agent core logic from the DOM environment.

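[Feature] A non-Tool-Call version of Page-Agent

Feature Description / 功能描述

Problem:
Many models available in China do not support Tool Call, and some that do support it give unsatisfactory results. For example, today I switched to Qwen3-vl-30B-A3B and Deepseek-V3.2 on ModelScope; both performed very poorly and frequently ran into the problem described in #81.

Solution:
Would you consider developing a non-Tool-Call version of Page-Agent? The approach would be similar to common Computer Use Agents and Mobile Agents: put the Action Space into the prompt together with output examples, have the model output text that contains a JSON object, extract that JSON object from the text by parsing the JSON string, and then execute the corresponding action. Many current models (such as the two above) follow instructions very well and are highly likely to output a correct JSON string, so this should be more stable than tool calls.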
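
The extraction step described above could look roughly like the sketch below. It is not how page-agent works today; `extractFirstJsonObject` is a hypothetical helper, and the `action` / `index` fields in the usage note are made-up examples of an action schema. The helper scans for a balanced `{...}` span while skipping braces inside string literals, and returns the first span that parses as JSON.

```typescript
// Hypothetical helper: returns the first parseable JSON object found in
// free-form model output, or null when there is none.
function extractFirstJsonObject(text: string): unknown {
  for (let start = text.indexOf('{'); start !== -1; start = text.indexOf('{', start + 1)) {
    let depth = 0
    let inString = false
    let escaped = false
    for (let i = start; i < text.length; i++) {
      const ch = text[i]
      if (inString) {
        // Inside a string literal, braces do not count toward nesting.
        if (escaped) escaped = false
        else if (ch === '\\') escaped = true
        else if (ch === '"') inString = false
        continue
      }
      if (ch === '"') inString = true
      else if (ch === '{') depth++
      else if (ch === '}') {
        depth--
        if (depth === 0) {
          try {
            return JSON.parse(text.slice(start, i + 1))
          } catch {
            break // balanced but not valid JSON; try the next '{'
          }
        }
      }
    }
  }
  return null
}
```

For example, given a reply such as `I will click it.\n{"action":"click","index":3}`, the helper returns the parsed object, which a caller would then validate against its action schema before executing anything.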

[Feature] Separate the out-of-box testing UMD build and the production CDN build.

Feature Description / 功能描述

Problem

Currently there is only one UMD build on the CDN which was designed only for testing. It causes lots of problems for serious users:

  • automatic construction.
  • built-in LLM API may change at any time.
  • no chance to set your own config.
  • global env pollution.

Solution

Provide 2 UMD builds on the CDN: one for out-of-the-box testing (to fulfill our one-line-of-code promise), and another as an alternative to npm with the full API.

[Bug] qwen plus schema error

What happened?

Add a description field to the AgentOutput function.

{
	"type": "missing",
	"loc": ["body", "tools", 0, "function", "description"],
	"msg": "Field required",
	"input": {
		"name": "AgentOutput",
		"parameters": {}
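
The validation error above points at `tools[0].function.description`. As a rough sketch of a client-side guard, assuming the request uses the OpenAI-compatible `tools` array shape, every function tool could be given a non-empty description before the request is sent. `FunctionTool` and `withDescriptions` are hypothetical names, and the fallback text is a placeholder, not the wording page-agent uses.

```typescript
// Assumed OpenAI-compatible tool shape; names here are illustrative only.
interface FunctionTool {
  type: 'function'
  function: {
    name: string
    description?: string
    parameters: Record<string, unknown>
  }
}

// Return a copy of the tools in which every function has a non-empty
// description. Existing descriptions are kept; missing or blank ones get
// a generic placeholder derived from the function name.
function withDescriptions(tools: FunctionTool[]): FunctionTool[] {
  return tools.map((tool) => ({
    ...tool,
    function: {
      ...tool.function,
      description:
        tool.function.description?.trim() || `Call the ${tool.function.name} function.`,
    },
  }))
}
```

A hand-written description is still preferable, since the model reads it when deciding how to call the function; the placeholder only keeps strict providers from rejecting the request.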

Code

Browser

No response

version

No response

[Feature] URL filters for custom tools

Feature Description / 功能描述

Add URL filters (as in the doc) for custom (and maybe internal?) tools.

  • May harm LLM caching though.
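
A minimal sketch of what such a filter might look like, under the assumption that each tool carries an optional list of regular expressions. The `urlFilters` field, `FilteredTool`, and `toolsForUrl` are hypothetical; the syntax planned in the doc may differ.

```typescript
// Hypothetical tool shape with an optional URL filter.
interface FilteredTool {
  name: string
  // When omitted, the tool is offered on every page.
  // Avoid the `g` flag here: RegExp.test is stateful for global regexes.
  urlFilters?: RegExp[]
}

// Keep only the tools whose filters match the given URL.
function toolsForUrl<T extends FilteredTool>(tools: T[], url: string): T[] {
  return tools.filter(
    (tool) => !tool.urlFilters || tool.urlFilters.some((re) => re.test(url)),
  )
}
```

The caching concern follows from this design: if the tool list sent to the LLM changes whenever the URL changes, the start of the request changes too, which can make provider-side prompt caches miss.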

[Feature] Collecting good and bad cases.

Feature Description / 功能描述

Good cases for baseline testing. Bad cases for future improvement.

Open an official discussion or issue to collect them.

[Bug] GPT-5.4 (/v1/responses) is not supported

What happened?

GPT-5.2 is supported through v1/chat/completions, but GPT-5.4 is not, per the error below.

Image

Code

Browser

No response

version

No response

[Bug] Running it with Ollama

What happened?

edge extension
edge browser

git clone https://github.com/alibaba/page-agent.git
cd page-agent

npm install

npm start

Open
C:\Users\m\Desktop\44\page-agent\packages\website\src\constants.ts

C:\Users\m\Desktop\44\page-agent\packages\page-agent\src/demo.ts

Search for

https://hwcxiuzfylggtcktqgij.supabase.co/functions/v1/llm-testing-proxy
PAGE-AGENT-FREE-TESTING-RANDOM
Change them to
http://localhost:11434/v1
qwen3:4b

.env

VITE_LLM_BASE_URL=http://localhost:11434/v1
VITE_LLM_API_KEY=ollama
VITE_LLM_MODEL_NAME=qwen3:4b

here
C:\Users\m\Desktop\44\page-agent\packages\website
C:\Users\m\Desktop\44\page-agent

Code

1

Browser

edge

version

No response
