mobilellm / autodroid

Source code for the paper "Empowering LLM to use Smartphone for Intelligent Task Automation"

Home Page: https://arxiv.org/abs/2308.15272

License: MIT License

autodroid's Introduction

AutoDroid

About

This repository contains the system code for the paper: Empowering LLM to use Smartphone for Intelligent Task Automation.

The DroidTask dataset can be downloaded from Google Cloud; see the About Dataset section for how it is organized.

AutoDroid is implemented based on the DroidBot framework.

How to install

Make sure you have:

Python
Java
Android SDK
Added platform_tools directory in Android SDK to PATH

Then clone this repo and install with pip:

git clone git@github.com:MobileLLM/AutoDroid.git
cd AutoDroid/
pip install -e .
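
Before installing, it can help to confirm that the prerequisites listed above are reachable from your shell. The following is a minimal sketch, not part of AutoDroid: the function name `find_tools` is hypothetical, and the executable names (`python`, `java`, `adb`) are assumptions that may differ on your system (for example `python3`).

```python
import shutil


def find_tools(names=("python", "java", "adb")):
    """Return {tool name: resolved path or None} based on the current PATH."""
    # shutil.which returns None when the executable is not found on PATH.
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path or 'NOT FOUND on PATH'}")
```

If `adb` is reported as not found, the platform_tools directory of the Android SDK is probably missing from PATH.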

How to use

Prepare:
- If you want to test AutoDroid with the apps used in our paper, download apk.zip from Google Cloud, unzip it, and connect a device or an emulator to your host machine via adb.
- If you want to test AutoDroid with other apps, download the .apk file to your host machine and connect a device or an emulator to your host machine via adb.
- Prepare a GPT API key, then open tools.py and replace os.environ['GPT_URL'] with your own API key.
Start:
```
droidbot -a <path/to/.apk> -o <output/of/app> -task <your task> -keep_env -keep_app
```
You can also try the scripts in the ./scripts folder; the tasks from DroidTask are listed in the form.
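
To run several tasks in a row, the command above can be assembled programmatically. This is only a sketch built from the flags shown in the command line above; the helper name `build_droidbot_cmd` and the example paths and task text are hypothetical.

```python
import shlex


def build_droidbot_cmd(apk_path, output_dir, task):
    """Assemble the argument list for the droidbot invocation shown above."""
    return [
        "droidbot",
        "-a", apk_path,      # path to the .apk under test
        "-o", output_dir,    # output directory for this app
        "-task", task,       # natural-language task description
        "-keep_env",
        "-keep_app",
    ]


if __name__ == "__main__":
    # Hypothetical paths and task, for illustration only.
    cmd = build_droidbot_cmd("apks/calendar.apk", "output/calendar", "create an event")
    print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would launch the run without going through a shell, so a task description containing spaces needs no manual quoting.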

About Dataset

Organization of the dataset:

    DroidTask
    ├── applauncher
    │   ├── states
    │   │   ├── Screenshot 1.png
    │   │   ├── Screenshot 2.png
    │   │   ├── ...
    │   │   ├── View hierarchy 1.json
    │   │   ├── View hierarchy 2.json
    │   │   └── ...
    │   ├── task1.yaml
    │   ├── task2.yaml
    │   ├── ...
    │   └── utg.yaml
    ├── calendar
    │   ├── states
    │   │   ├── Screenshot 1.png
    │   │   ├── Screenshot 2.png
    │   │   ├── ...
    │   │   ├── View hierarchy 1.json
    │   │   ├── View hierarchy 2.json
    │   │   └── ...
    │   ├── task1.yaml
    │   ├── task2.yaml
    │   ├── ...
    │   └── utg.yaml

DroidTask: The top level of the dataset, containing folders for each application included in the DroidTask, such as applauncher and calendar.
Application Folders: Each application folder records all the screenshots and raw view hierarchies parsed by DroidBot:
- States Folder: This folder holds all the captured states of the application during usage. A state includes both visual representations (screenshots) and structural data (view hierarchies).
  - Screenshots: Images captured from the application's interface, named sequentially (e.g., Screenshot 1.png, Screenshot 2.png, etc.).
  - View Hierarchies: JSON files detailing the structure of the application's UI for each captured state (e.g., View hierarchy 1.json, View hierarchy 2.json, etc.).
- Task Files: YAML files named task1.yaml, task2.yaml, etc., containing the ground truth data for specific tasks within the application.
- UTG File: A utg.yaml file that records data from the user's random exploration of the application.
Mapping Between Tasks and States: If you want to use the screenshots in your method:
- Each taski.yaml file (where i is the task number) references states through a state_str identifier.
- This state_str can be matched with the state_str in a view hierarchy k.json file to associate tasks with their corresponding application states.
- The name of the view hierarchy k.json file (where k is the state number) is also used to locate the corresponding screenshot, as the screenshot and view hierarchy files share the same naming convention.
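
The mapping described above can be sketched as a small index from state_str to file pairs. This is an illustration under stated assumptions, not the authors' tooling: the function name `index_states` is hypothetical, and the code assumes that each view hierarchy JSON file is an object with a top-level `state_str` key, which should be checked against the actual files.

```python
import json
from pathlib import Path


def index_states(states_dir):
    """Map each state_str to its (view hierarchy path, screenshot path) pair."""
    index = {}
    for vh_path in sorted(Path(states_dir).glob("View hierarchy *.json")):
        with vh_path.open(encoding="utf-8") as f:
            data = json.load(f)
        # Assumption: state_str is stored as a top-level key of the JSON object.
        state_str = data.get("state_str") if isinstance(data, dict) else None
        if state_str is None:
            continue
        # "View hierarchy 3.json" -> "3" -> "Screenshot 3.png"
        k = vh_path.stem[len("View hierarchy "):]
        index[state_str] = (vh_path, vh_path.with_name(f"Screenshot {k}.png"))
    return index
```

Given a state_str read from a task file, `index_states("DroidTask/calendar/states")[state_str]` would then return the matching view hierarchy and screenshot paths.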

Known limitations

The current implementation is not good at determining task completion.
The task automation performance may be unstable due to the randomness of LLMs, the style/quality of app GUI and task descriptions, etc.
It requires connecting to a host machine via adb, instead of a standalone on-device solution.

Contributions are welcome!

Note

We are very grateful for the wonderful open-source apps from Simple Mobile Tools.
Note that AutoDroid is currently for research purposes only. It may perform unintended actions (e.g. modifying your account/settings). Please use it at your own risk.

Enjoy!

autodroid's Issues

Can't reproduce. Complete calendar task "create a event of laundry" loops forever.

After configuring the OpenAI key and installing all the required dependencies, everything worked almost as expected, but unfortunately this sample task did not finish. Was it expected to complete?
autodroid.log

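Question about the open-source code

Hello, I read your paper and learned a great deal from it. I have a question about the open-sourced code. In the offline stage described in the paper, AutoDroid uses the UTG to generate simulated tasks. Which part of the code implements this? I would appreciate an answer. Thank you!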

Are you considering/trying to tag UI elements using CV methods

I saw that @yuanchun-li mentioned using visual methods to detect UI elements in https://arxiv.org/pdf/1901.02633

Are you considering or trying to tag UI elements using CV methods?

That way, many applications, such as Flutter or web apps, whose UI cannot be obtained through XML markup could also use this scheme.

How to extract 'utg.yaml' for new apps

Thank you for sharing your awesome work! I'm trying to use the explorator in the droidtask_datatools repo to generate my own memory in order to enhance the TaskPolicy. However, I could not find a way to obtain the 'utg.yaml' file that the explorator takes as input. Could you please tell me how to get 'utg.yaml' for new apps? Is this the right way to create memory, or should I just use the MemoryPolicy? Thank you again for your great repo!

How to use a local llm or open source model

I saw that the research paper mentions: "The local LLM Vicuna-7B [6] is deployed on the smartphone based on Machine Learning Compilation for LLM (MLCLLM)". How do I set up the local LLM and run the ./scripts with it?

AttributeError: module 'lmql' has no attribute 'model'

File "e:\aitesttt\autodroid\query_lmql.py", line 4, in
model=lmql.model("openai/gpt-3.5-turbo-instruct") # OpenAI API model
AttributeError: module 'lmql' has no attribute 'model'

Cannot input text for the apps

I am hosting a local LLM with Ollama on Linux, using either an Android 11 (API 30) emulator or a physical Android 11 device. AutoDroid can successfully launch the APK and click some buttons, but when it tries to input text, no text is entered and the keyboard does not appear. Any ideas?

INFO:DroidBot:Starting DroidBot
INFO:Device:waiting for device
56479
[CONNECTION] ADB is enabled and connected.
[CONNECTION] TelnetConsole is not enabled.
[CONNECTION] DroidBotAppConn is enabled and connected.
[CONNECTION] Minicap is not enabled.
[CONNECTION] Logcat is enabled and connected.
[CONNECTION] UserInputMonitor is enabled and connected.
[CONNECTION] ProcessMonitor is enabled and connected.
[CONNECTION] DroidBotIme is enabled and connected.
Please wait while installing the app...
INFO:Device:App installed: com.simplemobiletools.calendar
INFO:Device:Main activity: com.simplemobiletools.calendar.activities.SplashActivity
INFO:AppEnvManager:Start deploying environment, policy is none
INFO:InputEventManager:start sending events, policy is task
Action: KillAppEvent()
INFO:TaskPolicy:Current state: d2aba9dcf2ce57a988070bcbcda6cc1931adff28e54aee1e491f45399d6e01e1
INFO:TaskPolicy:Trying to start the app...
Action: IntentEvent(intent='am start com.simplemobiletools.calendar/com.simplemobiletools.calendar.activities.SplashActivity')
INFO:TaskPolicy:Current state: 494a7caaf2b0162a25f10630a11b3f4c569841b363eb9ce31ceb5793c1448dd6
********************************** prompt: **********************************
You are a smartphone assistant to help users complete tasks by interacting with mobile apps.Given a task, the previous UI actions, and the content of current UI state, your job is to decide whether the task is already finished by the previous actions, and if not, decide which UI element in current UI state should be interacted.
Task: create a event of tapping title to input laundry and save
Previous UI actions:

launchApp Calendar
Current UI state:

Calendar

March go back

Your answer should always use the following format: { "Steps": "...", "Analyses": "...<Analyses of the relations between the task, and relations between the previous UI actions and current UI state>", "Finished": "Yes/No", "Next step": "None or a ", "id": "an integer or -1 (if the task has been completed by previous UI actions)", "action": "tap or input", "input_text": "N/A or ..." }

**Note that the id is the id number of the UI element to interact with. If you think the task has been completed by previous UI actions, the id should be -1. If 'Finished' is 'Yes', then the 'description' of 'Next step' is 'None', otherwise it is a high level description of the next step. If the 'action' is 'tap', the 'input_text' is N/A, otherwise it is the ''. Please do not output any content other than the JSON format. **
********************************** end of prompt **********************************
INFO:openai._base_client:Retrying request to /chat/completions in 0.812839 seconds
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
response: {
"Steps": "1. Tap on the 'New Event' button to create a new event 2. Tap on the 'March' button twice to select the month of March 3. Tap on the 'More options' button to access more options for the event, such as date and time, and add them if desired",
"Analyses": "The current UI state does not allow for direct input of laundry information. However, it is possible to create a new event using the 'New Event' button and then edit the event's details, including adding laundry information. After creating the event, one can tap on the selected date or time to edit its details further, which may include entering laundry information.",
"Finished": "No",
"Next step": "1. Tap on the 'New Event' button",
"id": 5,
"action": "tap",
"input_text": "-1"
}
Action: TouchEvent(state=494a7caaf2b0162a25f10630a11b3f4c569841b363eb9ce31ceb5793c1448dd6, view=158b14f3e368960dc2c5e14eb0cb8da5(MainActivity/ImageButton-))
INFO:TaskPolicy:Current state: b5a253bf5df48b6e43eab83047c21e3bcdae40b08949a983d412153ba080fbd5
********************************** prompt: **********************************
You are a smartphone assistant to help users complete tasks by interacting with mobile apps.Given a task, the previous UI actions, and the content of current UI state, your job is to decide whether the task is already finished by the previous actions, and if not, decide which UI element in current UI state should be interacted.
Task: create a event of tapping title to input laundry and save
Previous UI actions:

launchApp Calendar
TapOn:
Current UI state:

New Event

Title Location Description All-day OFF March 26 (Tue) 06:00 PM March 26 (Tue) 06:00 PM 10 minutes before Add another reminder No repetition Regular event go back

The knowledge base generation part is not found in the project

Sorry, the knowledge base generation part is not found in the project

Please mention the details of working Android version

After setting up the environment, running /scripts/calendar.sh results in the following error

INFO:sentence_transformers.SentenceTransformer:Load pretrained SentenceTransformer: hkunlp/instructor-xl
load INSTRUCTOR_Transformer
max_seq_length 512
INFO:sentence_transformers.SentenceTransformer:Use pytorch device: cpu
Traceback (most recent call last):
File "/AutoDroid/droidbot/droidbot.py", line 104, in __init__
self.input_manager = InputManager(
File "/AutoDroid/droidbot/input_manager.py", line 64, in __init__
self.policy = self.get_input_policy(device, app, master)
File "/AutoDroid/droidbot/input_manager.py", line 84, in get_input_policy
input_policy = TaskPolicy(device, app, self.random_input, task=self.task)
File "/AutoDroid/droidbot/input_policy.py", line 683, in __init__
self.similar_ele_path, self.similar_ele_function, self.similar_ele_statement = self.get_most_similar_element()
File "/AutoDroid/droidbot/input_policy.py", line 728, in get_most_similar_element
similar_ele = ele_statements[app_name][similar_state_str]['elements'][similar_ele_idx]
TypeError: string indices must be integers
[CONNECTION] ADB is disconnected
WARNING:DroidBotIme:Failed to disconnect DroidBotIME!

No module named 'start' when running droidbot -h

I get this error when running droidbot -h after doing pip install.

I am using a Python virtual environment (.venv) when installing pip packages.

About experiments

Great job! I really enjoy your work!
Could you release the experimental code used in the paper?
For example, the metric code.

DroidTask benchmark suite availability

The repo https://github.com/MobileLLM/DroidTask is not found for me. Is it a private, closed-source asset?

No module named 'memory.memory_builder'

Hi, running "droidbot -h" or any .sh script in the ./scripts folder gives the error "from memory.memory_builder import Memory
ModuleNotFoundError: No module named 'memory.memory_builder'". However, "droidbot -h" runs fine when I follow the steps in https://github.com/honeynet/droidbot/. This means I cannot run the code in AutoDroid. How can I fix this problem? Thank you.

Question about the GPT-3.5-turbo response

Hi authors, thanks for the interesting work and awesome codebase.

the issue has been solved, thanks.

I'm honored to be the first issue submitter for this impressive project

I'm honored to be the first issue submitter for this impressive project, but I'd like to offer some suggestions:

Is the usage process overly complicated, for instance, is there a specific reason why an APK address must be provided? Is it because the app will be decompiled?
Could there be clearer instructions regarding the Python environment to facilitate deployment and use? I spent quite a bit of time installing the androguard package.

Thank you, and may it continue to improve.