The GUI Agent Living in Your Webpage. Control web interfaces with natural language.
🚀 Demo | 📖 Docs | 📢 HN Discussion | 𝕏 Follow on X
- 🎯 Easy integration
- No need for a browser extension, Python, or a headless browser.
- Just in-page JavaScript. Everything happens in your web page.
- 📖 Text-based DOM manipulation
- No screenshots. No multi-modal LLMs or special permissions needed.
- 🧠 Bring your own LLMs
- 🐙 Optional Chrome extension for multi-page tasks.
- And an MCP Server (Beta) to control it from outside.
- SaaS AI Copilot: Ship an AI copilot in your product in a few lines of code. No backend rewrite.
- Smart Form Filling: Turn 20-click workflows into one sentence. Perfect for ERP, CRM, and admin systems.
- Accessibility: Make any web app accessible through natural language. Voice commands, screen readers, zero barrier.
- Multi-page Agent: Extend your own web agent's reach across browser tabs with the Chrome extension.
- MCP: Allow your agent clients to control your browser.
The fastest way to try PageAgent is with our free demo LLM:
<script src="{URL}" crossorigin="true"></script>
⚠️ For technical evaluation only. This demo CDN uses our free testing LLM API. By using it, you agree to its terms.
| Mirror | URL |
|---|---|
| Global | https://cdn.jsdelivr.net/npm/page-agent@1.8.2/dist/iife/page-agent.demo.js |
| China | https://registry.npmmirror.com/page-agent/1.8.2/files/dist/iife/page-agent.demo.js |
Append `?autoInit=false` to the script URL to load it without automatically creating the demo agent. You can then instantiate one yourself with `new window.PageAgent(...)`.
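If you inject the demo script from your own code, the standard `URL` API can add the query parameter for you. A minimal sketch; `withAutoInitDisabled` is a hypothetical helper, not part of PageAgent, and the commented browser steps assume the loaded script exposes `window.PageAgent` as described above.

```typescript
// Hypothetical helper, not part of PageAgent: append autoInit=false to a
// CDN mirror URL so the demo script loads without creating the demo agent.
function withAutoInitDisabled(scriptUrl: string): string {
  const url = new URL(scriptUrl);
  url.searchParams.set("autoInit", "false"); // overwrites any existing value
  return url.toString();
}

const chinaMirror =
  "https://registry.npmmirror.com/page-agent/1.8.2/files/dist/iife/page-agent.demo.js";

// In a browser you could then load it like this:
//   const script = document.createElement("script");
//   script.src = withAutoInitDisabled(chinaMirror);
//   script.onload = () => { /* new window.PageAgent({ ... }) */ };
//   document.head.appendChild(script);
console.log(withAutoInitDisabled(chinaMirror));
```

`searchParams.set` replaces an existing `autoInit` value rather than appending a second one.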
npm install page-agent

import { PageAgent } from 'page-agent'

const agent = new PageAgent({
  model: 'qwen3.5-plus',
  baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
  apiKey: 'YOUR_API_KEY',
  language: 'en-US',
})

await agent.execute('Click the login button')
For more programmatic usage, see the 📖 Documentation.
Built something cool with PageAgent? Add it here! Open a PR to share your project.
These are community projects — not maintained or endorsed by us. Use at your own discretion.
| Project | Description |
|---|---|
| Yours? | Open a PR 🙌 |
We welcome contributions from the community! See CONTRIBUTING.md for guidelines and docs/developer-guide.md for local development workflows.
Please read the maintainer's note on principles and current state.
Contributions generated entirely by bots or AI without substantial human involvement will not be accepted.
This project builds upon the excellent work of browser-use.
PageAgent is designed for client-side web enhancement, not server-side automation.
DOM processing components and prompt are derived from browser-use:
Browser Use <https://github.com/browser-use/browser-use>
Copyright (c) 2024 Gregor Zunic
Licensed under the MIT License
We gratefully acknowledge the browser-use project and its contributors for their
excellent work on web automation and DOM interaction patterns that helped make
this project possible.
⭐ Star this repo if you find PageAgent helpful!
page-agent's Issues
[Docs] Full i18n docs
Community Communication / 社区沟通
- I will be polite and respectful. / 我会保持礼貌与尊重。
- I will share constructive, actionable suggestions. / 我会提供建设性、可行动的建议。
- I have read the CODE_OF_CONDUCT.md and CONTRIBUTING.md. / 我已阅读行为准则。
Feature Description / 功能描述
Make sure all the doc pages are translated. Prioritize the English version.
[Feature] User `takeover`
Feature Description / 功能描述
- User clicks the `takeover` button on the panel.
- Stop the current step, including LLM calls, actions, and so on.
- Hide the mask.
- User handles the page.
- User clicks the `resume` button.
- A modal asks the user "What have you changed?"
- User clicks `continue`.
- Add the answer to `observation` and resume the agent loop.
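The steps above form a small state machine. A minimal sketch of that flow, assuming the user's answer is stored as a plain-text observation; `TakeoverState`, `nextTakeoverContext`, and the field names are hypothetical, not PageAgent's API. Aborting the in-flight LLM call is only marked by a comment.

```typescript
// Hypothetical model of the takeover flow; no name here is PageAgent API.
type TakeoverState = "running" | "takenOver" | "awaitingNote";

type TakeoverEvent =
  | { type: "takeover" } // user clicks the takeover button
  | { type: "resume" } // user clicks the resume button
  | { type: "continue"; note: string }; // user answers "What have you changed?"

interface TakeoverContext {
  state: TakeoverState;
  maskVisible: boolean;
  observations: string[];
}

function nextTakeoverContext(ctx: TakeoverContext, event: TakeoverEvent): TakeoverContext {
  if (event.type === "takeover" && ctx.state === "running") {
    // A real agent would also abort the in-flight LLM call and actions here.
    return { ...ctx, state: "takenOver", maskVisible: false };
  }
  if (event.type === "resume" && ctx.state === "takenOver") {
    // Open the modal that asks the user what they changed.
    return { ...ctx, state: "awaitingNote" };
  }
  if (event.type === "continue" && ctx.state === "awaitingNote") {
    // Record the answer as an observation and resume the agent loop.
    return {
      state: "running",
      maskVisible: true,
      observations: [...ctx.observations, `User takeover: ${event.note}`],
    };
  }
  return ctx; // events that do not apply in the current state are ignored
}
```

Ignoring out-of-order events (for example `continue` before `resume`) keeps a double click from skipping the "what have you changed?" step.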
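[Bug] Make the AI understand whether a button works as expected. Clicking the Chinese/English toggle should be judged correct, but the AI says it failed.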
What happened?
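Make the AI understand whether a button works as expected. After clicking the Chinese/English toggle button, the result should be judged correct, but the AI says it is wrong.
- Open the demo page directly: https://alibaba.github.io/page-agent/
- Enter the prompt:
Please test whether the Chinese/English toggle works correctly.
Process:
Result:
Full-screen screenshot: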
Code
null
Browser
chrome 143.0.7475.8
version
0.0.4
[Bug] Error when importing via CDN (quick start)
What happened?
Importing with <script src="https://cdn.jsdelivr.net/npm/page-agent@latest/dist/umd/index.js"></script> throws an error.
Code
Browser
No response
version
No response
[Feature] Add docs for local .env file
Feature Description / 功能描述
Standardize the .env file and add docs for it.
Clean up the local development pipeline.
[Refactor] Monorepo?
Feature Description / 功能描述
It's getting clearer that I need to separate the agent loop, the DOM processing logic, the GUI panel, the official site and (potentially) a chrome extension for multi-page agent.
Hosting under the Alibaba organization makes it impossible to create multiple repos for one project.
Using branches for this is not acceptable.
Although I dislike monorepos very much, one is inevitable.
- Refactor the project into a (simplified) monorepo.
- Use plain npm workspaces; avoid the monorepo tooling ecosystem.
[Feature] Add switches for model patch
Feature Description / 功能描述
Currently, patches are applied based on the model name. But in some cases the user may go through a proxy or router service that already patches certain models on the backend. It would be better to let users switch patches on or off manually.
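One way to express the proposal, sketched under assumptions: each patch keeps its name-based matcher, and an optional per-patch switch overrides it. `ModelPatch`, `PatchSwitches`, `resolvePatches`, and the two example patches are hypothetical names for illustration, not PageAgent's actual configuration.

```typescript
// Hypothetical shape of a model-specific patch (e.g. output-format fixes).
interface ModelPatch {
  id: string;
  matches: (model: string) => boolean; // name-based detection
}

// Manual override per patch id: true forces a patch on, false forces it off,
// and undefined falls back to name-based detection.
type PatchSwitches = Record<string, boolean | undefined>;

function resolvePatches(
  model: string,
  available: ModelPatch[],
  switches: PatchSwitches = {},
): string[] {
  return available
    .filter((patch) => switches[patch.id] ?? patch.matches(model))
    .map((patch) => patch.id);
}

// Illustrative patches only.
const patches: ModelPatch[] = [
  { id: "qwen", matches: (m) => m.toLowerCase().includes("qwen") },
  { id: "deepseek", matches: (m) => m.toLowerCase().includes("deepseek") },
];
```

Using `??` rather than `||` matters here: an explicit `false` must turn a patch off instead of falling back to the name match.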
[Bug] include_attributes does not take effect on heuristically detected interactive elements, so the LLM cannot find the target element
Roadmap V1
see #272
🗺️ PageAgent Roadmap
🚀 Current Work
- MVP
- Core functionality implemented
- SPA interaction
- Reasoning and (short) memory
- Multi model provider integration and testing
- UI with HITL
- Human-in-the-loop user interface. Agent can ask user questions.
- Landing and doc pages
- Remove `ai-sdk`
- Only one function of AI-SDK is being used.
- Our agent memory and thinking mechanism does not suit ai-sdk.
- Robust LLM output
- Auto-fix incomplete output format of DeepSeek and Qwen.
- Working homepage with live LLM API
- CDN
- Free testing API
- Custom actions and HITL
- Hooks and Events
- lifecycle hooks
- lifecycle events
- User takeover
- ❗Hijack `page_open`/`page_change`/`page_unload` behavior
- Custom knowledge base and instructions
- Safeguard
- Data-masking
- Improve Memory
- Optimize for popular UI frameworks
- i18n of the website
- Chinese version
- English version
- Refactor: Separate `Agent` and `PageController`
- Move mask and mouse simulator to page-controller.
- 🚩 Stable release v1
- 🚩 Chrome extension for multi-page tasks
- Edge Extension
- Firefox Extension
♻️ Follow browser-use's updates and contribute back.
📋 Pending Features
- MCP
- Tools for more complex tasks
- todo list
- file system
- Support custom LLM fetch
- Test suites
🤔 To Be Decided
- Safari Extension
- Same-origin multi-page-app relay
- Backend states relay for cross-page task without extension?
[Feature] More flexible `AgentHistory` for user `observation` and `user-takeover`
Feature Description / 功能描述
`observation` and `user-takeover` may need to be added to memory. Currently, history items only store actions. Either allow non-action entries in the history, or add non-action fields before or after an action.
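A discriminated union is one way to let non-action entries share the history list. A minimal sketch; `HistoryItem`, its `kind` values, and `renderHistory` are hypothetical and do not reflect the current `AgentHistory` type.

```typescript
// Hypothetical history entry: one ordered list holds actions, observations,
// and user takeovers, distinguished by the `kind` field.
type HistoryItem =
  | { kind: "action"; name: string; input: unknown; output: string }
  | { kind: "observation"; text: string }
  | { kind: "user_takeover"; note: string };

// Render the history as plain text for the next prompt, numbering entries.
function renderHistory(history: HistoryItem[]): string {
  return history
    .map((item, i) => {
      const step = `#${i + 1}`;
      switch (item.kind) {
        case "action":
          return `${step} action ${item.name}: ${item.output}`;
        case "observation":
          return `${step} observation: ${item.text}`;
        case "user_takeover":
          return `${step} user takeover: ${item.note}`;
      }
    })
    .join("\n");
}
```

Keeping one ordered list (instead of attaching extra fields to actions) preserves exactly when an observation or takeover happened relative to the surrounding actions.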
[Bug] After entering an instruction, the console reports: step failed: Tool arguments validation failed: action "0" with args "{"
What happened?
After installing via npm, I integrated PageAgent in main.js with the code shown under Code below.
After running the project, I entered a task in the page's task input and confirmed execution, and the error was thrown. I have confirmed that the model service connected successfully.
Code
import { PageAgent } from 'page-agent'

const DEMO_MODEL = 'Qwen/Qwen3-Coder-480B-A35B-Instruct'
const DEMO_BASE_URL = 'https://api-inference.modelscope.cn/v1'
const DEMO_API_KEY = 'xxxx'

const agent = new PageAgent({
  model: DEMO_MODEL,
  baseURL: DEMO_BASE_URL,
  apiKey: DEMO_API_KEY,
  language: 'zh-CN',
})

// await agent.execute('Add a new system name')
// Or show the panel so the user can type an instruction
agent.panel.show()
Browser
143.0.7499.109
version
0.0.13
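[Bug] The panel still exists and captures events even though initialization did not ask to show it, blocking interaction with the original page
What happened?
Screen recording: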
https://github.com/user-attachments/assets/ffe4eeb1-7e5b-454a-ac00-8d052ece2bc6
Code
Browser
chrome 143
version
0.0.15
[Feature] Support `LLMs.txt`
Feature Description / 功能描述
Why not?
🌟 V1
🤞 Hope to publish the first stable version of PageAgent next week.
[Docs] Update `Core Lib Development and Testing`
What happened?
The "Core Lib Development and Testing" section in CONTRIBUTING no longer works because of the monorepo refactor.
Code
Browser
No response
version
No response
🌟 Chrome Extension is Here!
[Feature] Optimize tool-call for smaller models.
Feature Description
PageAgent currently uses a very complex tool-call schema to combine self-reflection and action in a single llm call.
However, the nested tool-call schema can be very challenging for models not optimized for this kind of task.
There are several potential approaches:
📖 Describe tool schema in prompt.
It may help to describe the tools in the prompt text instead of using the tools API.
It is unclear whether this improves the success rate, but it would cover open-source models, which rarely support tool calls.
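A rough sketch of this first approach, assuming each action can be described by a name, a description, and a flat parameter list. `ActionSpec` and `describeActions` are hypothetical; the example action mirrors `click_element_by_index` from PageAgent's current tool schema.

```typescript
// Hypothetical description of one action, with a flat parameter list.
interface ActionSpec {
  name: string;
  description: string;
  params: Record<string, string>; // parameter name -> short type description
}

// Render the action space as plain prompt text, so a model without tool-call
// support can pick an action by replying with a JSON object instead.
function describeActions(actions: ActionSpec[]): string {
  const lines = actions.map((action) => {
    const params = Object.entries(action.params)
      .map(([key, type]) => `${key}: ${type}`)
      .join(", ");
    return `- ${action.name}(${params}): ${action.description}`;
  });
  return [
    "Available actions:",
    ...lines,
    'Reply with exactly one JSON object, e.g. {"action": {"<action name>": {<params>}}}.',
  ].join("\n");
}

console.log(
  describeActions([
    { name: "click_element_by_index", description: "Click element by index", params: { index: "integer" } },
  ]),
);
```

The model's reply then has to be parsed out of free text, which is where a lenient JSON extraction step comes in.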
🪓 Remove the self-reflection process.
Use a simple array of actions as tools and skip the self-reflection process once and for all.
- Remove `<output>` from the prompt.
- Add a new LLMClient that supports normal tool calls (but is also lenient?).
💦 Separate self-reflection into a dedicated LLM call.
- Make 2 calls every step.
or
- Add a new tool called `thinking` and tell the model to use it every other step.
In any case, this can be a lot of work. It needs a better structure to control MacroTool in the Agent and LLM modules.
Loop detection (the same action is called over and over again)
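A minimal loop-detection sketch, assuming an action counts as repeated when both its name and its arguments match. `ActionRecord`, `isLooping`, and the default threshold of 3 are hypothetical choices.

```typescript
// Hypothetical record of one executed action.
interface ActionRecord {
  name: string;
  args: Record<string, unknown>;
}

// Flags a loop when the last `threshold` actions are identical.
// Arguments are compared via JSON.stringify, so key order must be stable.
function isLooping(history: ActionRecord[], threshold = 3): boolean {
  if (history.length < threshold) return false;
  const recent = history
    .slice(-threshold)
    .map((action) => `${action.name}:${JSON.stringify(action.args)}`);
  return recent.every((key) => key === recent[0]);
}
```

When it returns `true`, the agent could add a note to the next prompt or stop the task instead of repeating the action.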
[Bug] Automating a select dropdown fails to switch options
What happened?
Code
Browser
No response
version
No response
Incompatible with Zod v3: `TypeError: process is not a function` in `toJSONSchema`
Description
When using page-agent in a project that has Zod v3 installed, the agent fails at runtime with a TypeError. It seems page-agent internally depends on Zod v4's API (specifically the toJSONSchema / zod-to-json-schema processing), which is not compatible with Zod v3.
Error
TypeError: (0 , _to_json_schema_js__WEBPACK_IMPORTED_MODULE_0__.process) is not a function
at Module.toJSONSchema (json-schema-processors.js:602:1)
at zodToOpenAITool (page-agent-llms.js:71:1)
at page-agent-llms.js:146:1
at Array.map (<anonymous>)
at OpenAIClient.invoke (page-agent-llms.js:146:1)
at withRetry.maxRetries (page-agent-llms.js:355:1)
at withRetry (page-agent-llms.js:384:1)
at _LLM.invoke (page-agent-llms.js:352:1)
at _PageAgent.execute (page-agent-core.js:457:1)
Steps to Reproduce
- Install `page-agent` in a project that already has Zod v3 as a dependency.
- Call `PageAgent.execute()`.
- The error is thrown when `zodToOpenAITool` tries to convert Zod schemas to JSON Schema.
Expected Behavior
page-agent should either:
- Be compatible with both Zod v3 and Zod v4, or
- Declare Zod v4 as a `peerDependency` so users are aware of the version requirement.
Environment
- zod version: 3.x (project-level dependency)
- page-agent internally expects: zod 4.x
[Feature] Add `observation` to context.
Feature Description / 功能描述
Certain information can be very helpful to the agent:
- URL changed (a previous action caused an in-page navigation)
- Page content did not change (the previous action made no change to the page)
- Console errors
- Long wait (should stop waiting)
- Too many steps (should sum up now)
Add it either to AgentBrain (kept in memory?) or to <browser_state>, or perhaps to a new block called <observation>.
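A sketch of the `<observation>` option, assuming the agent can collect these signals after each step. `StepSignals`, `buildObservation`, and the thresholds (10 s wait, 40 steps) are hypothetical placeholders, not values from PageAgent.

```typescript
// Hypothetical per-step signals, mirroring the list above.
interface StepSignals {
  urlBefore: string;
  urlAfter: string;
  domChanged: boolean;
  consoleErrors: string[];
  waitedMs: number;
  step: number;
}

// Build an <observation> block, or an empty string when nothing is notable.
function buildObservation(s: StepSignals, maxWaitMs = 10_000, maxSteps = 40): string {
  const notes: string[] = [];
  if (s.urlAfter !== s.urlBefore) notes.push(`URL changed: ${s.urlBefore} -> ${s.urlAfter}`);
  if (!s.domChanged) notes.push("The previous action did not change the page.");
  for (const err of s.consoleErrors) notes.push(`Console error: ${err}`);
  if (s.waitedMs > maxWaitMs) notes.push("Waited too long; stop waiting.");
  if (s.step >= maxSteps) notes.push("Too many steps; summarize now.");
  return notes.length > 0 ? `<observation>\n${notes.join("\n")}\n</observation>` : "";
}
```

Omitting the block on uneventful steps keeps the prompt short and makes its presence itself a signal.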
[Feature] Upgrade User Event Simulator
Feature Description / 功能描述
Refer to @testing-library/user-event.
Consider separating Event Simulator to a dedicated package.
[Refactor] Separate PageController.
Feature Description / 功能描述
Reason:
- The PageAgent main loop should not rely heavily on the DOM and page environment, which makes it difficult to port to a pure JavaScript environment (like a service worker, Node, or an extension background) or to add multi-page features.
- The page controller should not rely on the LLM, so that it can be covered by unit tests.
Goal:
- Add a new package (folder: `page-controller`, package name: `@page-agent/page-controller`).
- Communication between the two main modules should always be treated as async and isolated. That means actual references to DOM elements and objects from the page controller should never be passed to the main loop, and vice versa.
- The APIs of PageController should be simple enough for potential remote calling.
- Everything in the current `dom` module should move to PageController and be exposed as async methods, including `selectorMap` and `elementTextMap`, which should only be stored inside PageController.
- All the actions in `tools/actions` that run in a page should be moved to PageController. `tools` should call async methods on PageController to actually control the page.
- PageController works independently.
- To avoid the fuss, for now the `PageAgent` class can just import `PageController` and save the instance on `this.pageController`.
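A minimal sketch of the async, isolated contract described above: tools receive only indices and strings, while the element map stays inside the controller. `PageControllerLike`, `FakePageController`, `clickTool`, and the snapshot text format are hypothetical; the in-memory fake shows how tools could be unit-tested without a DOM or an LLM.

```typescript
// Only serializable data crosses this boundary; DOM references never do.
interface BrowserStateSnapshot {
  url: string;
  content: string; // illustrative text rendering of indexed elements
}

interface PageControllerLike {
  getBrowserState(): Promise<BrowserStateSnapshot>;
  clickElement(index: number): Promise<string>;
  inputText(index: number, text: string): Promise<string>;
}

interface FakeElement {
  tag: string;
  text: string;
}

// In-memory stand-in: the index -> element map (the role selectorMap plays
// in the issue) stays private, so callers see only indices and strings.
class FakePageController implements PageControllerLike {
  private readonly url: string;
  private readonly elements = new Map<number, FakeElement>();

  constructor(url: string, items: FakeElement[]) {
    this.url = url;
    items.forEach((item, i) => this.elements.set(i, { ...item }));
  }

  async getBrowserState(): Promise<BrowserStateSnapshot> {
    const content = Array.from(this.elements)
      .map(([i, el]) => `[${i}] ${el.tag} "${el.text}"`)
      .join("\n");
    return { url: this.url, content };
  }

  async clickElement(index: number): Promise<string> {
    const el = this.elements.get(index);
    return el ? `Clicked [${index}] ${el.tag}` : `No element at index ${index}`;
  }

  async inputText(index: number, text: string): Promise<string> {
    const el = this.elements.get(index);
    if (!el) return `No element at index ${index}`;
    el.text = text;
    return `Typed into [${index}]`;
  }
}

// A tool written against the interface works with the fake or a real controller.
async function clickTool(pc: PageControllerLike, index: number): Promise<string> {
  return pc.clickElement(index);
}
```

Because every method is async and returns plain data, the same interface could later be backed by message passing to another context without changing the tools.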
[Feature] qwen3.5 support
Feature Description / 功能描述
qwen3.5 support
[Feature] Multi-page process without Chrome Plugin
Feature Description / 功能描述
Being able to navigate between routes within the same domain while keeping the plan and state would be extremely useful on SaaS platforms.
Right now the only option is the Chrome plugin, which works but is a barrier to mass adoption.
Is it technically possible to implement multi-page plans in a secure, standard way?
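One possible direction, sketched under assumptions rather than as a confirmed design: for same-origin navigation, the agent could save its task state before the page unloads and restore it on the next page through `sessionStorage`, which persists across same-origin page loads in the same tab. `TaskState`, the storage key, and both helpers are hypothetical; the code takes a minimal `Storage`-like interface so it also runs outside a browser.

```typescript
// Minimal subset of the Web Storage API; window.sessionStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

// Hypothetical task state worth carrying across same-origin navigations.
interface TaskState {
  task: string;
  plan: string[];
  step: number;
}

const TASK_STATE_KEY = "page-agent:task-state"; // hypothetical storage key

// Call before navigating away (e.g. from a pagehide handler).
function saveTaskState(store: KeyValueStore, state: TaskState): void {
  store.setItem(TASK_STATE_KEY, JSON.stringify(state));
}

// Call on the next page load. The entry is removed after reading so a
// finished or abandoned task is not resumed twice.
function restoreTaskState(store: KeyValueStore): TaskState | null {
  const raw = store.getItem(TASK_STATE_KEY);
  if (raw === null) return null;
  store.removeItem(TASK_STATE_KEY);
  return JSON.parse(raw) as TaskState;
}
```

This only carries the task state; whether the LLM conversation can be resumed safely on the new page is a separate question.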
[Bug] After deleting baseURL, apiKey, model in Chrome extension settings and saving, an error occurs
What happened?
Originally, I wanted to reset to the default settings, but there was no button for it, so I deleted baseURL, apiKey, and model and saved. Since then I've been stuck on the error page.
Code
Browser
Edge 145.0.3800.82 (arm64)
version
0.1.11
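A sketch of one possible guard, assuming the extension stores these settings as optional strings: treat empty or whitespace-only values as unset and fall back to defaults, so clearing the fields behaves like a reset. `LLMSettings`, `DEFAULTS`, and `resolveSettings` are hypothetical, and the default values are placeholders, not the extension's real defaults.

```typescript
interface LLMSettings {
  baseURL: string;
  apiKey: string;
  model: string;
}

// Placeholder defaults for illustration only; not the extension's real values.
const DEFAULTS: LLMSettings = {
  baseURL: "https://llm.example.invalid/v1",
  apiKey: "DEFAULT-API-KEY",
  model: "default-model",
};

// Missing, empty, or whitespace-only fields fall back to the defaults,
// so clearing a field behaves like "reset to default" instead of breaking.
function resolveSettings(saved: Partial<LLMSettings>): LLMSettings {
  const pick = (value: string | undefined, fallback: string): string =>
    value !== undefined && value.trim() !== "" ? value.trim() : fallback;
  return {
    baseURL: pick(saved.baseURL, DEFAULTS.baseURL),
    apiKey: pick(saved.apiKey, DEFAULTS.apiKey),
    model: pick(saved.model, DEFAULTS.model),
  };
}
```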
[Feature] Remove `pause`
Feature Description / 功能描述
- Pausing during a task doesn't make much sense.
- Taking over is a common need for real-time GUI Agents.
[Feature] Knowledge Injection
Feature Description / 功能描述
Implement the "Knowledge Injection" feature as planned in the doc.
[Docs] Give best practices
Feature Description / 功能描述
Give best practices and suggestions.
- Task prompts
- Model selection
- Page requirements
[Bug] Model call error when running in LM Studio
What happened?
Running the model qwen3-30b-a3b-2507 in LM Studio fails with InvokeError: HTTP 400: Bad Request
{
"model": "qwen/qwen3-30b-a3b-2507",
"temperature": 1,
"messages": [
{
"role": "system",
"content": "You are an AI agent designed to operate in an iter... ...ame\": {// action-specific parameter}}\n}\n\n"
},
{
"role": "user",
"content": "\n\nAlice Williams是第几个\n ...捷操作\n管理操作\n备忘\n高亮 />\n[End of page]\n\n\n"
}
],
"tools": [
{
"type": "function",
"function": {
"name": "AgentOutput",
"description": "You MUST call this tool every step. Outputs your reflections and next action.",
"parameters": {
"type": "object",
"properties": {
"evaluation_previous_goal": {
"type": "string"
},
"memory": {
"type": "string"
},
"next_goal": {
"type": "string"
},
"action": {
"anyOf": [
{
"type": "object",
"properties": {
"done": {
"type": "object",
"properties": {
"text": {
"type": "string"
},
"success": {
"default": true,
"type": "boolean"
}
},
"required": [
"text",
"success"
],
"additionalProperties": false
}
},
"required": [
"done"
],
"additionalProperties": false,
"description": "Complete task - provide a summary of results for t... ... be your response to the user summarizing results."
},
{
"type": "object",
"properties": {
"wait": {
"type": "object",
"properties": {
"seconds": {
"default": 1,
"type": "number",
"minimum": 1,
"maximum": 10
}
},
"required": [
"seconds"
],
"additionalProperties": false
}
},
"required": [
"wait"
],
"additionalProperties": false,
"description": "Wait for x seconds. default 1s (max 10 seconds, mi... ...ed to wait until the page or data is fully loaded."
},
{
"type": "object",
"properties": {
"click_element_by_index": {
"type": "object",
"properties": {
"index": {
"type": "integer",
"minimum": 0,
"maximum": 9007199254740991
}
},
"required": [
"index"
],
"additionalProperties": false
}
},
"required": [
"click_element_by_index"
],
"additionalProperties": false,
"description": "Click element by index"
},
{
"type": "object",
"properties": {
"input_text": {
"type": "object",
"properties": {
"index": {
"type": "integer",
"minimum": 0,
"maximum": 9007199254740991
},
"text": {
"type": "string"
}
},
"required": [
"index",
"text"
],
"additionalProperties": false
}
},
"required": [
"input_text"
],
"additionalProperties": false,
"description": "Click and input text into a input interactive element"
},
{
"type": "object",
"properties": {
"select_dropdown_option": {
"type": "object",
"properties": {
"index": {
"type": "integer",
"minimum": 0,
"maximum": 9007199254740991
},
"text": {
"type": "string"
}
},
"required": [
"index",
"text"
],
"additionalProperties": false
}
},
"required": [
"select_dropdown_option"
],
"additionalProperties": false,
"description": "Select dropdown option for interactive element index by the text of the option you want to select"
},
{
"type": "object",
"properties": {
"scroll": {
"type": "object",
"properties": {
"down": {
"default": true,
"type": "boolean"
},
"num_pages": {
"default": 0.1,
"type": "number",
"minimum": 0,
"maximum": 10
},
"pixels": {
"type": "integer",
"minimum": 0,
"maximum": 9007199254740991
},
"index": {
"type": "integer",
"minimum": 0,
"maximum": 9007199254740991
}
},
"required": [
"down",
"num_pages"
],
"additionalProperties": false
}
},
"required": [
"scroll"
],
"additionalProperties": false,
"description": "Scroll the page by specified number of pages (set ... ...l by a specific number of pixels instead of pages."
},
{
"type": "object",
"properties": {
"scroll_horizontally": {
"type": "object",
"properties": {
"right": {
"default": true,
"type": "boolean"
},
"pixels": {
"type": "integer",
"minimum": 0,
"maximum": 9007199254740991
},
"index": {
"type": "integer",
"minimum": 0,
"maximum": 9007199254740991
}
},
"required": [
"right",
"pixels"
],
"additionalProperties": false
}
},
"required": [
"scroll_horizontally"
],
"additionalProperties": false,
"description": "Scroll the page or element horizontally (set right... ...its scroll container (works well for wide tables)."
},
{
"type": "object",
"properties": {
"open_new_tab": {
"type": "object",
"properties": {
"url": {
"type": "string",
"description": "The URL to open in the new tab"
}
},
"required": [
"url"
],
"additionalProperties": false
}
},
"required": [
"open_new_tab"
],
"additionalProperties": false,
"description": "Open a new browser tab with the specified URL. The new tab becomes the current tab for all subsequent page operations."
},
{
"type": "object",
"properties": {
"switch_to_tab": {
"type": "object",
"properties": {
"tab_id": {
"type": "integer",
"minimum": -9007199254740991,
"maximum": 9007199254740991,
"description": "The tab ID to switch to"
}
},
"required": [
"tab_id"
],
"additionalProperties": false
}
},
"required": [
"switch_to_tab"
],
"additionalProperties": false,
"description": "Switch to an existing tab by its ID. After switchi... ...ch to tabs in the tab list shown in browser state."
},
{
"type": "object",
"properties": {
"close_tab": {
"type": "object",
"properties": {
"tab_id": {
"type": "integer",
"minimum": -9007199254740991,
"maximum": 9007199254740991,
"description": "The tab ID to close"
}
},
"required": [
"tab_id"
],
"additionalProperties": false
}
},
"required": [
"close_tab"
],
"additionalProperties": false,
"description": "Close a tab by its ID. Cannot close the initial tab. Optionally specify which tab to switch to after closing."
}
]
}
},
"required": [
"action"
],
"additionalProperties": false
}
}
}
],
"parallel_tool_calls": false,
"tool_choice": {
"type": "function",
"function": {
"name": "AgentOutput"
}
}
}
Code
Browser
No response
version
No response
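page-agent: Can it really "control web interfaces"?
What happened?
I studied your project's implementation and found several structural problems. In the following areas in particular, it may be fundamentally unable to deliver the features you claim:
- Cannot handle target buttons inside iframes
Most buttons on real-world pages are nested in iframes (e.g. login boxes, payment confirmation dialogs). Your current DOM extraction runs only on window.document and cannot reach nested levels.
Example reproduction scenario: any nested pop-up structure on Taobao or Alipay.
- No asynchronous DOM observation
In SPA / dynamic loading scenarios, the target button appears some time after the page loads. You do not use MutationObserver to watch for DOM changes, so execute() calls fail easily.
- Model responses are stateless, with no contextual memory
The so-called "natural language control" relies entirely on the LLM generating instructions in one shot. It cannot handle multi-turn interactions (e.g. click menu → second item → fill form → submit), which severely limits practical use.
- Security risks are not restricted
Have you considered the following risks:
- Can it click payment buttons?
- Can repeated clicks cause a denial of service?
- Could CSRF injection be used to construct fake clicks?
- Puppeteer does all of this more reliably
If every page-agent feature (clicking buttons / reading text / DOM control) can be done more reliably with Puppeteer, shouldn't you redefine your positioning?
Suggested points to address
- Do you plan to support nested iframe scenarios?
- Will you introduce a state-chain mechanism to support multi-turn conversations?
- Will you consider defining page-agent's scope more clearly?
I personally strongly support the development of agent projects, but the project in its current state still has a lot of misleading claims and structural problems. I suggest the team face them and clarify.
Looking forward to your response. (Your project's understanding of "controlling web pages with natural language" is still stuck at "clicking buttons", a 2020-era mindset. I don't see what is innovative about this "page-agent": the 20 lines of code below plus a ChatGPT prompt can do the core of your entire project. Why wrap it in a complex shell?)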
Code
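// Suppose ChatGPT gives us the natural-language instruction: "Click the 'Login' button on the page"
const prompt = "点击页面上的‘登录’按钮"; // "Click the 'Login' button on the page"
// Hand-written stand-in for the LLM → DOM query mapping
const elements = Array.from(document.querySelectorAll('button, a, input[type="button"], input[type="submit"]'));
// Simple match: innerText, aria-label, or (for <input> buttons, whose innerText is empty) value contains "登录" ("Login")
const target = elements.find(el => {
  const text = el.innerText || el.getAttribute('aria-label') || el.value || '';
  return /登录/.test(text);
});
// If found, simulate a click
if (target) {
  console.log("Found target button:", target);
  target.click();
} else {
  console.warn("No button containing '登录' (Login) was found");
}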
Browser
chrome 120
version
0.1
[Feature] new demo video
Feature Description / 功能描述
clipped-compressed.mp4
page-agent-demo.mp4
[Feature] Security & Permissions
Feature Description / 功能描述
Implement the "Security & Permissions" features planned in the doc.
- Blacklist of interactive elements
- Blacklist/whitelist of page URLs
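A minimal sketch of the URL blacklist/whitelist, assuming patterns are regular expressions and the blacklist always wins. `UrlPolicy` and `isUrlPermitted` are hypothetical names. Avoid the `g` flag on these patterns, since `RegExp.prototype.test` with `g` is stateful.

```typescript
// Hypothetical URL policy for where the agent may act.
interface UrlPolicy {
  allow?: RegExp[]; // whitelist: if non-empty, the URL must match one pattern
  deny?: RegExp[]; // blacklist: any match blocks the URL
}

function isUrlPermitted(url: string, policy: UrlPolicy): boolean {
  // The blacklist is checked first, so it overrides the whitelist.
  if (policy.deny?.some((re) => re.test(url))) return false;
  if (policy.allow && policy.allow.length > 0) {
    return policy.allow.some((re) => re.test(url));
  }
  return true; // no whitelist configured: everything not blacklisted is allowed
}
```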
[Feature] Can it use the Qwen, Gemini, or ChatGPT model already logged in to in the current browser, reusing the browser's current authentication?
Feature Description / 功能描述
Like this extension: https://github.com/arsczx/gemini-nexus
[Bug] ollama
What happened?
// OpenAI-compatible services (e.g., Alibaba Bailian)
const cloudAgent = new PageAgent({
  baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
  apiKey: 'your-api-key',
  model: 'qwen-plus'
});
// Self-hosted models (e.g., Ollama)
const localAgent = new PageAgent({
  baseURL: 'http://localhost:11434/v1',
  apiKey: 'N/A', // Ollama typically accepts any value
  model: 'qwen3:latest'
});
// Free testing endpoint
// Note: Rate-limited, content-filtered, subject to change. Replace with your own.
// Note: Uses official DeepSeek-chat (3.2). See DeepSeek website for terms & privacy.
const DEMO_MODEL = 'PAGE-AGENT-FREE-TESTING-RANDOM'
const DEMO_BASE_URL = 'https://hwcxiuzfylggtcktqgij.supabase.co/functions/v1/llm-testing-proxy'
const DEMO_API_KEY = 'PAGE-AGENT-FREE-TESTING-RANDOM'
Where should I write this?
Code
q
Browser
Edge
version
No response
[Bug] 组件中时间戳没有更新
What happened?
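In the Panel.ts component the timestamp is not updated. Every step takes more than a second, but the timestamp seems to stay at the task's start time.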
Code
const agent = new PageAgent({
  instructions: {
    system: `
You are an intelligent page assistant. When the customer's request is unclear, ask the customer to clarify.
`
  },
  model: model,
  baseURL: baseURL,
  apiKey: apiKey,
  language: language,
  enableMask: true // Enable the visual mask
});
Browser
chrome 145.0.7632.117
version
1.4.0
[Feature] Data Masking
Feature Description / 功能描述
Implement the data masking feature as planned in the doc.
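The doc's plan is not reproduced here; this is a hypothetical sketch of one common approach. Sensitive substrings are replaced with placeholder tokens before page text reaches the LLM, and a local map restores them in the model's output (e.g. text to type). `maskText`, `unmaskText`, the token format, and the two simplified patterns are assumptions, not PageAgent's implementation.

```typescript
// Simplified example patterns; real masking rules would need to be stricter.
const SENSITIVE_PATTERNS: RegExp[] = [
  /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g, // e-mail addresses
  /\d{8,}/g, // long digit runs, e.g. phone or card numbers
];

// Replace sensitive substrings with tokens like [MASKED_1] before the text
// is sent to the LLM; the returned map stays local.
function maskText(text: string): { masked: string; secrets: Map<string, string> } {
  const secrets = new Map<string, string>();
  let masked = text;
  for (const pattern of SENSITIVE_PATTERNS) {
    masked = masked.replace(pattern, (match) => {
      const token = `[MASKED_${secrets.size + 1}]`;
      secrets.set(token, match);
      return token;
    });
  }
  return { masked, secrets };
}

// Restore real values in model output (e.g. text the model wants to type).
function unmaskText(text: string, secrets: Map<string, string>): string {
  let result = text;
  secrets.forEach((value, token) => {
    result = result.split(token).join(value);
  });
  return result;
}
```

Tokens include the closing bracket, so `[MASKED_1]` never matches inside `[MASKED_10]` during restoration.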
Current Progress
Does defining the apiKey directly in the frontend leak the apiKey?
[Bug] Console errors after closing the agent from the UI.
What happened?
After clicking the "close" button on the panel, 3 errors are thrown in the console:
The behavior seems correct.
Code
Browser
No response
version
No response
demo error
What happened?
export declare class PageAgent extends EventTarget {
  #private;
  config: PageAgentConfig;
  id: string;
  panel: Panel;
  tools: typeof tools;
  paused: boolean;
  disposed: boolean;
  task: string;
  taskId: string;
  /** PageController for DOM operations */
  pageController: PageController;
  /** Fullscreen mask */
  mask: SimulatorMask;
  /** History records */
  history: AgentHistory[];
  constructor(config?: PageAgentConfig);
  /**
   * @todo maybe return something?
   */
  execute(task: string): Promise<void>;
  dispose(reason?: string): void;
}
import { PageAgent } from 'page-agent'
// test server
// @note: rate limit. prompt limit. Origin limit. May change anytime. Use your own llm!
// @note Using official DeepSeek-chat(3.2). Go to DeepSeek website for privacy policy.
const DEMO_MODEL = 'PAGE-AGENT-FREE-TESTING-RANDOM'
const DEMO_BASE_URL = 'https://hwcxiuzfylggtcktqgij.supabase.co/functions/v1/llm-testing-proxy'
const DEMO_API_KEY = 'PAGE-AGENT-FREE-TESTING-RANDOM'
const agent = new PageAgent({
  modelName: DEMO_MODEL,
  baseURL: DEMO_BASE_URL,
  apiKey: DEMO_API_KEY,
  language: 'en-US',
})
await agent.execute('Click the login button')
Code
Browser
No response
version
No response
[Bug] UI incomplete when model output contains "`"
[Feature] Move `ui` to a dedicated package.
Feature Description / 功能描述
Decouple the agent core logic and DOM env progressively.
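[Feature] A non-tool-call version of Page-Agent
Feature Description / 功能描述
Problem:
Many models from Chinese providers do not support tool calls, and even those that do often perform poorly. For example, today I switched to Qwen3-vl-30B-A3B and Deepseek-V3.2 on ModelScope, and both performed very badly, frequently hitting the problem described in #81.
Proposed solution:
Could you consider building a non-tool-call version of Page-Agent, along the lines of common Computer Use Agents and Mobile Agents? Put the action space into the prompt together with output examples, have the model output text containing a JSON object, extract that JSON object from the text with JSON string parsing, and then execute the corresponding action. Many models today (including the two above) follow instructions very well and produce valid JSON strings with high probability, so this would likely be more stable than tool calls.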
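The extraction step the proposal describes can be sketched as a small scanner that returns the first balanced `{...}` span that parses as JSON, skipping braces inside JSON strings. `extractJsonObject` is a hypothetical helper, not part of PageAgent; it does not repair truncated output.

```typescript
// Extract the first balanced top-level JSON object from free-form model text,
// ignoring braces inside JSON strings. Returns null if none parses.
function extractJsonObject(text: string): unknown {
  for (let start = text.indexOf("{"); start !== -1; start = text.indexOf("{", start + 1)) {
    let depth = 0;
    let inString = false;
    let escaped = false;
    for (let i = start; i < text.length; i++) {
      const ch = text[i];
      if (inString) {
        if (escaped) escaped = false;
        else if (ch === "\\") escaped = true;
        else if (ch === '"') inString = false;
        continue;
      }
      if (ch === '"') inString = true;
      else if (ch === "{") depth++;
      else if (ch === "}" && --depth === 0) {
        try {
          return JSON.parse(text.slice(start, i + 1));
        } catch {
          break; // balanced but not valid JSON: try the next "{"
        }
      }
    }
  }
  return null;
}
```

Scanning for balanced braces instead of using a greedy regex handles prose or code fences around the object and nested objects inside it.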
[Feature] Separate the out-of-box testing UMD build and the production CDN build.
Feature Description
Problem
Currently there is only one UMD build on the CDN, and it was designed only for testing. This causes many problems for serious users:
- It constructs the agent automatically.
- The built-in LLM API may change at any time.
- There is no way to set your own config.
- It pollutes the global environment.
Solution
Provide two UMD builds on the CDN: one for out-of-the-box testing (to keep our one-line-of-code promise), and another as an alternative to npm, with the full API.
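Until separate builds exist, the README's `?autoInit=false` query parameter already avoids the automatic construction. A minimal sketch, assuming a browser context; `demoScriptUrl` and `loadScript` are hypothetical helpers, not page-agent exports:

```typescript
// Appends autoInit=false (documented in the README) so the demo agent is not auto-created.
function demoScriptUrl(base: string): string {
  const url = new URL(base)
  url.searchParams.set('autoInit', 'false')
  return url.toString()
}

// Browser-only: injects the script tag and resolves once it has loaded.
function loadScript(src: string): Promise<void> {
  return new Promise((resolve, reject) => {
    const script = document.createElement('script')
    script.src = src
    script.crossOrigin = 'anonymous'
    script.onload = () => resolve()
    script.onerror = () => reject(new Error(`Failed to load ${src}`))
    document.head.appendChild(script)
  })
}
```

After `await loadScript(demoScriptUrl('https://registry.npmmirror.com/page-agent/1.8.2/files/dist/iife/page-agent.demo.js'))`, the README says the agent can be created with `new window.PageAgent(...)` and your own options. This does not address the global-namespace concern listed above.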
[Bug] Qwen Plus schema error
What happened?
Add a `description` field to the `AgentOutput` function. The API rejects the request:
{
  "type": "missing",
  "loc": ["body", "tools", 0, "function", "description"],
  "msg": "Field required",
  "input": {
    "name": "AgentOutput",
    "parameters": {}
  }
}
Browser: No response
Version: No response
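The error's `loc` path (`tools[0].function.description`) shows the endpoint requires a `description` on every function tool. A minimal sketch of guaranteeing one before sending the request, assuming the OpenAI-compatible Chat Completions tool shape; `withDescription` and the fallback text are hypothetical, not page-agent's code:

```typescript
// OpenAI-compatible Chat Completions tool whose description is guaranteed present.
interface FunctionTool {
  type: 'function'
  function: { name: string; description: string; parameters: Record<string, unknown> }
}

// Same shape, but with the description possibly missing (as in the failing request).
interface LooseFunctionTool {
  type: 'function'
  function: { name: string; description?: string; parameters: Record<string, unknown> }
}

// Hypothetical helper: keeps an existing non-empty description, otherwise uses the fallback.
function withDescription(tool: LooseFunctionTool, fallback: string): FunctionTool {
  const description = tool.function.description?.trim() || fallback
  return { type: 'function', function: { ...tool.function, description } }
}
```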
[Feature] URL filters for custom tools
Feature Description
Add URL filters (as described in the docs) for custom tools, and possibly for internal tools too.
- This may hurt LLM prompt caching, though.
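A minimal sketch of per-URL tool filtering. `urlPatterns` and `toolsForUrl` are hypothetical names for illustration, not page-agent options:

```typescript
// Hypothetical tool shape; omitting urlPatterns means "available on every page".
interface CustomTool {
  name: string
  // Avoid the /g flag: RegExp.test with /g is stateful across calls.
  urlPatterns?: RegExp[]
}

// Keeps tools with no patterns, plus tools with at least one pattern matching `href`.
function toolsForUrl<T extends CustomTool>(tools: T[], href: string): T[] {
  return tools.filter((tool) => !tool.urlPatterns || tool.urlPatterns.some((re) => re.test(href)))
}
```

This also shows the caching concern: whenever navigation changes the filtered list, the tool definitions in the prompt change, so a cached prompt prefix that includes them can no longer be reused.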
[Feature] Collect good and bad cases.
Feature Description
Good cases for baseline testing. Bad cases for future improvement.
Open an official discussion or issue to collect them.
[Bug] No support for GPT-5.4 via /v1/responses
What happened?
GPT-5.2 works with /v1/chat/completions, but GPT-5.4 does not, per the error below.
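Supporting /v1/responses would mean reshaping requests. One known difference is the function-tool format: Chat Completions nests `name`, `description`, and `parameters` under `function`, while the Responses API places them at the top level. A minimal sketch of only that conversion; `toResponsesTool` is hypothetical, and other differences (for example `input` instead of `messages`, and how tool calls come back) should be checked against the Responses API reference:

```typescript
// Chat Completions tool format.
interface ChatCompletionsTool {
  type: 'function'
  function: { name: string; description?: string; parameters?: Record<string, unknown> }
}

// Responses API tool format: the same fields, flattened to the top level.
interface ResponsesTool {
  type: 'function'
  name: string
  description?: string
  parameters?: Record<string, unknown>
}

// Hypothetical converter between the two shapes.
function toResponsesTool(tool: ChatCompletionsTool): ResponsesTool {
  const { name, description, parameters } = tool.function
  return { type: 'function', name, description, parameters }
}
```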
Browser: No response
Version: No response
[Bug] Running it with Ollama
What happened?
Edge extension
Edge browser
git clone https://github.com/alibaba/page-agent.git
cd page-agent
npm install
npm start
Open:
C:\Users\m\Desktop\44\page-agent\packages\website\src\constants.ts
C:\Users\m\Desktop\44\page-agent\packages\page-agent\src/demo.ts
Find:
https://hwcxiuzfylggtcktqgij.supabase.co/functions/v1/llm-testing-proxy
PAGE-AGENT-FREE-TESTING-RANDOM
Change them to:
http://localhost:11434/v1
qwen3:4b
.env
VITE_LLM_BASE_URL=http://localhost:11434/v1
VITE_LLM_API_KEY=ollama
VITE_LLM_MODEL_NAME=qwen3:4b
Put the .env file here:
C:\Users\m\Desktop\44\page-agent\packages\website
C:\Users\m\Desktop\44\page-agent
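The .env values above map onto the `model`, `baseURL`, and `apiKey` options shown in the README. A minimal sketch, assuming a hypothetical `llmConfigFromEnv` helper (not part of the repository) that takes the environment as a plain record; in a Vite app, `import.meta.env` exposes `VITE_`-prefixed variables and could be passed in:

```typescript
// The LLM options used by `new PageAgent(...)` in the README.
interface LlmConfig {
  model: string
  baseURL: string
  apiKey: string
}

// Hypothetical helper: reads the VITE_LLM_* variables listed in the report.
function llmConfigFromEnv(env: Record<string, string | undefined>): LlmConfig {
  const baseURL = env['VITE_LLM_BASE_URL']
  const model = env['VITE_LLM_MODEL_NAME']
  if (!baseURL || !model) {
    throw new Error('VITE_LLM_BASE_URL and VITE_LLM_MODEL_NAME must be set')
  }
  // Ollama's OpenAI-compatible endpoint ignores the key, but clients usually expect a non-empty one.
  return { model, baseURL, apiKey: env['VITE_LLM_API_KEY'] || 'ollama' }
}
```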
Browser: Edge
Version: No response