mobilellm / autodroid

Source code for the paper "Empowering LLM to use Smartphone for Intelligent Task Automation"

Home Page: https://arxiv.org/abs/2308.15272

License: MIT License

Python 59.69% HTML 6.40% JavaScript 33.61% Shell 0.26% CSS 0.05%
agent llm mobile-task-automation

autodroid's Introduction

AutoDroid

About

This repository contains the system code for the paper "Empowering LLM to use Smartphone for Intelligent Task Automation".

The DroidTask dataset can be downloaded from Google Cloud; see the About Dataset section for how it is organized.

AutoDroid is implemented based on the DroidBot framework.

How to install

Make sure you have:

  1. Python
  2. Java
  3. Android SDK
  4. The Android SDK's platform-tools directory added to PATH

Then clone this repo and install with pip:

git clone git@github.com:MobileLLM/AutoDroid.git
cd AutoDroid/
pip install -e .
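
Before installing, it can help to confirm that the prerequisites are reachable from the shell. The following is a minimal sketch, not part of AutoDroid: it assumes the executables are named `java` and `adb`, and the helper `missing_tools` is a hypothetical name introduced here for illustration.

```python
import shutil

# Executables the install steps rely on. The names are assumptions:
# "java" for the Java runtime and "adb" for the Android SDK platform tools.
REQUIRED_TOOLS = ("java", "adb")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

If `adb` is reported as missing, the platform-tools directory is probably not on PATH yet.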

How to use

  1. Prepare:

    • To test AutoDroid with the apps used in our paper, download apk.zip from Google Cloud, unzip it, and connect a device or an emulator to your host machine via adb.
    • To test AutoDroid with other apps, download the .apk file to your host machine and connect a device or an emulator to your host machine via adb.
    • Prepare a GPT API key, then open tools.py and replace os.environ['GPT_URL'] with your own API key.
  2. Start:

    droidbot -a <path/to/.apk> -o <output/of/app> -task <your task> -keep_env -keep_app

    You can also try the scripts in the ./scripts folder; the tasks from DroidTask are listed in the form.
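
To run several tasks in a row, the command above can be assembled from Python. This is a rough sketch that uses only the flags shown in the Start step; the function name `build_droidbot_command`, the APK path, the output directory, and the task text are all placeholders chosen for illustration.

```python
import shlex


def build_droidbot_command(apk_path, output_dir, task, keep_env=True, keep_app=True):
    """Assemble the droidbot argument list from the flags shown in the Start step."""
    cmd = ["droidbot", "-a", apk_path, "-o", output_dir, "-task", task]
    if keep_env:
        cmd.append("-keep_env")
    if keep_app:
        cmd.append("-keep_app")
    return cmd


# Hypothetical paths and task, for illustration only.
command = build_droidbot_command("apps/calendar.apk", "output/calendar", "create a new event")
print(shlex.join(command))
```

Passing the list to `subprocess.run(command, check=True)` would start the run. Keeping the arguments as a list avoids shell quoting problems when the task description contains spaces.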

About Dataset

Organization of the dataset:

    DroidTask
    ├── applauncher
    │   ├── states
    │   │   ├── Screenshot 1.png
    │   │   ├── Screenshot 2.png
    │   │   ├── ...
    │   │   ├── View hierarchy 1.json
    │   │   ├── View hierarchy 2.json
    │   │   └── ...
    │   ├── task1.yaml
    │   ├── task2.yaml
    │   ├── ...
    │   └── utg.yaml
    ├── calendar
    │   ├── states
    │   │   ├── Screenshot 1.png
    │   │   ├── Screenshot 2.png
    │   │   ├── ...
    │   │   ├── View hierarchy 1.json
    │   │   ├── View hierarchy 2.json
    │   │   └── ...
    │   ├── task1.yaml
    │   ├── task2.yaml
    │   ├── ...
    │   └── utg.yaml
  • DroidTask: The top level of the dataset, containing folders for each application included in the DroidTask, such as applauncher and calendar.

  • Application Folders: Each application folder records all the screenshots and raw view hierarchies parsed by DroidBot:

    • States Folder: This folder holds all the captured states of the application during usage. A state includes both visual representations (screenshots) and structural data (view hierarchies).

      • Screenshots: Images captured from the application's interface, named sequentially (e.g., Screenshot 1.png, Screenshot 2.png, etc.).

      • View Hierarchies: JSON files detailing the structure of the application's UI for each captured state (e.g., View hierarchy 1.json, View hierarchy 2.json, etc.).

    • Task Files: YAML files named task1.yaml, task2.yaml, etc., containing the ground truth data for specific tasks within the application.

    • UTG File: A utg.yaml file that records data from the user's random exploration of the application.

  • Mapping Between Tasks and States: If you want to use the screenshots in your method:

    • Each taski.yaml file (where i is the task number) references states through a state_str identifier.
    • This state_str can be matched with the state_str in a view hierarchy k.json file to associate tasks with their corresponding application states.
    • The name of the view hierarchy k.json file (where k is the state number) is also used to locate the corresponding screenshot, as the screenshot and view hierarchy files share the same naming convention.
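
The mapping described above can be sketched as a lookup table from state_str to file names. This is an illustrative sketch only: it assumes each view hierarchy file is a JSON object with state_str as a top-level key, which should be checked against the actual files, and `index_states` is a hypothetical helper name.

```python
import json
from pathlib import Path

PREFIX = "View hierarchy "


def index_states(states_dir):
    """Map each state_str to its (view hierarchy path, screenshot path) pair."""
    index = {}
    for hierarchy_path in sorted(Path(states_dir).glob(PREFIX + "*.json")):
        with hierarchy_path.open(encoding="utf-8") as f:
            hierarchy = json.load(f)
        # Assumption: state_str is a top-level key of the JSON object.
        if not isinstance(hierarchy, dict) or "state_str" not in hierarchy:
            continue
        # "View hierarchy 7.json" -> "7" -> "Screenshot 7.png"
        state_number = hierarchy_path.stem[len(PREFIX):]
        screenshot_path = hierarchy_path.with_name(f"Screenshot {state_number}.png")
        index[hierarchy["state_str"]] = (hierarchy_path, screenshot_path)
    return index
```

Given a state_str read from a task file, `index_states("DroidTask/calendar/states")[state_str]` would then return the matching view hierarchy and screenshot paths.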

Known limitations

  • The current implementation does not reliably determine when a task is complete.
  • Task automation performance may be unstable due to the randomness of LLMs, the style and quality of the app GUI and task descriptions, etc.
  • It requires a connection to a host machine via adb; it is not a standalone on-device solution.

Welcome to contribute!

Note

  • Many thanks to Simple Mobile Tools for their wonderful open-source apps.
  • Note that AutoDroid is currently for research purposes only. It may perform unintended actions (e.g. modifying your account/settings). Please use it at your own risk.

Enjoy!

autodroid's People

Contributors

kiteflykid, wenh18, yuanchun-li

autodroid's Issues

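Question about the open-source code

Hello, I read your paper and learned a great deal from it. I have a question about the open-source code. In the offline stage described in the paper, AutoDroid uses the UTG to generate simulated tasks. Which part of the code implements this? I would appreciate an answer. Thank you!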

How to extract 'utg.yaml' for new apps

Thank you for sharing your awesome work! I'm trying to use the explorator in the droidtask_datatools repo to generate my own memory in order to enhance the TaskPolicy. However, I could not find a way to obtain the 'utg.yaml' that the explorator takes as input. Could you please tell me how to get the 'utg.yaml' for new apps? Also, is this the right way to create memory, or should I just use the MemoryPolicy? Thank you again for your great repo!

How to use a local llm or open source model

I saw that the research paper mentions "The local LLM Vicuna-7B [6] is deployed on the smartphone based on Machine Learning Compilation for LLM (MLCLLM)". How do I set up the local LLM and run the scripts in ./scripts with it?

Cannot input text for the apps

I am hosting a local LLM with ollama on Linux, using either an Android 11 (API 30) emulator or a physical Android 11 device. AutoDroid successfully launches the APK and clicks some buttons, but when it tries to input text, no text is entered and the keyboard does not appear. Any ideas?

INFO:DroidBot:Starting DroidBot
INFO:Device:waiting for device
56479
[CONNECTION] ADB is enabled and connected.
[CONNECTION] TelnetConsole is not enabled.
[CONNECTION] DroidBotAppConn is enabled and connected.
[CONNECTION] Minicap is not enabled.
[CONNECTION] Logcat is enabled and connected.
[CONNECTION] UserInputMonitor is enabled and connected.
[CONNECTION] ProcessMonitor is enabled and connected.
[CONNECTION] DroidBotIme is enabled and connected.
Please wait while installing the app...
INFO:Device:App installed: com.simplemobiletools.calendar
INFO:Device:Main activity: com.simplemobiletools.calendar.activities.SplashActivity
INFO:AppEnvManager:Start deploying environment, policy is none
INFO:InputEventManager:start sending events, policy is task
Action: KillAppEvent()
INFO:TaskPolicy:Current state: d2aba9dcf2ce57a988070bcbcda6cc1931adff28e54aee1e491f45399d6e01e1
INFO:TaskPolicy:Trying to start the app...
Action: IntentEvent(intent='am start com.simplemobiletools.calendar/com.simplemobiletools.calendar.activities.SplashActivity')
INFO:TaskPolicy:Current state: 494a7caaf2b0162a25f10630a11b3f4c569841b363eb9ce31ceb5793c1448dd6
********************************** prompt: **********************************
You are a smartphone assistant to help users complete tasks by interacting with mobile apps.Given a task, the previous UI actions, and the content of current UI state, your job is to decide whether the task is already finished by the previous actions, and if not, decide which UI element in current UI state should be interacted.
Task: create a event of tapping title to input laundry and save
Previous UI actions:

  • launchApp Calendar
    Current UI state:

Calendar

March go back

Your answer should always use the following format: { "Steps": "...", "Analyses": "...<Analyses of the relations between the task, and relations between the previous UI actions and current UI state>", "Finished": "Yes/No", "Next step": "None or a ", "id": "an integer or -1 (if the task has been completed by previous UI actions)", "action": "tap or input", "input_text": "N/A or ..." }

**Note that the id is the id number of the UI element to interact with. If you think the task has been completed by previous UI actions, the id should be -1. If 'Finished' is 'Yes', then the 'description' of 'Next step' is 'None', otherwise it is a high level description of the next step. If the 'action' is 'tap', the 'input_text' is N/A, otherwise it is the ''. Please do not output any content other than the JSON format. **
********************************** end of prompt **********************************
INFO:openai._base_client:Retrying request to /chat/completions in 0.812839 seconds
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
response: {
"Steps": "1. Tap on the 'New Event' button to create a new event 2. Tap on the 'March' button twice to select the month of March 3. Tap on the 'More options' button to access more options for the event, such as date and time, and add them if desired",
"Analyses": "The current UI state does not allow for direct input of laundry information. However, it is possible to create a new event using the 'New Event' button and then edit the event's details, including adding laundry information. After creating the event, one can tap on the selected date or time to edit its details further, which may include entering laundry information.",
"Finished": "No",
"Next step": "1. Tap on the 'New Event' button",
"id": 5,
"action": "tap",
"input_text": "-1"
}
Action: TouchEvent(state=494a7caaf2b0162a25f10630a11b3f4c569841b363eb9ce31ceb5793c1448dd6, view=158b14f3e368960dc2c5e14eb0cb8da5(MainActivity/ImageButton-))
INFO:TaskPolicy:Current state: b5a253bf5df48b6e43eab83047c21e3bcdae40b08949a983d412153ba080fbd5
********************************** prompt: **********************************
You are a smartphone assistant to help users complete tasks by interacting with mobile apps.Given a task, the previous UI actions, and the content of current UI state, your job is to decide whether the task is already finished by the previous actions, and if not, decide which UI element in current UI state should be interacted.
Task: create a event of tapping title to input laundry and save
Previous UI actions:

  • launchApp Calendar
  • TapOn:
    Current UI state:

New Event

Title Location Description All-day OFF March 26 (Tue) 06:00 PM March 26 (Tue) 06:00 PM 10 minutes before Add another reminder No repetition Regular event go back

Your answer should always use the following format: { "Steps": "...", "Analyses": "...<Analyses of the relations between the task, and relations between the previous UI actions and current UI state>", "Finished": "Yes/No", "Next step": "None or a ", "id": "an integer or -1 (if the task has been completed by previous UI actions)", "action": "tap or input", "input_text": "N/A or ..." }

**Note that the id is the id number of the UI element to interact with. If you think the task has been completed by previous UI actions, the id should be -1. If 'Finished' is 'Yes', then the 'description' of 'Next step' is 'None', otherwise it is a high level description of the next step. If the 'action' is 'tap', the 'input_text' is N/A, otherwise it is the ''. Please do not output any content other than the JSON format. **
********************************** end of prompt **********************************
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
response: { "Steps": "...", "Analyses": "...<Analyses of the relations between the task, and relations between the previous UI actions and current UI state>", "Finished": "No", "Next step": "Input title, location, description and set 'All-day OFF' checkbox to 'ON', then tap 'Save' button.", "id": "-1", "action": "tap", "input_text": "Title" }
INFO:InputEventManager:Finish sending events
[CONNECTION] ADB is disconnected
[CONNECTION] UserInputMonitor is disconnected
[CONNECTION] Logcat is disconnected
WARNING:DroidBotIme:Failed to disconnect DroidBotIME!
INFO:DroidBot:DroidBot Stopped
[CONNECTION] ProcessMonitor is disconnected
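
For readers debugging a similar run, checking each model reply against the answer format stated in the prompt may help narrow things down. The sketch below is illustrative and not part of AutoDroid: `check_response` is a hypothetical helper, and its rules are taken only from the format description in the prompt above (id is -1 when the task is finished, action is tap or input, input_text is N/A for a tap).

```python
import json


def check_response(raw):
    """Return a list of inconsistencies found in one JSON reply."""
    reply = json.loads(raw)
    problems = []
    finished = str(reply.get("Finished", "")).strip().lower() == "yes"
    try:
        # The id may arrive as an integer or as a string such as "-1".
        element_id = int(reply.get("id"))
    except (TypeError, ValueError):
        element_id = None
        problems.append("id is not an integer")
    action = str(reply.get("action", "")).strip().lower()
    input_text = str(reply.get("input_text", "")).strip()

    if element_id == -1 and not finished:
        problems.append("id is -1 but Finished is not Yes")
    if finished and element_id not in (-1, None):
        problems.append("Finished is Yes but id is not -1")
    if action not in ("tap", "input"):
        problems.append("action is neither tap nor input")
    if action == "tap" and input_text != "N/A":
        problems.append("action is tap but input_text is not N/A")
    return problems
```

Applied to the second reply in the log above, this check reports two problems: the id is -1 although Finished is No, and the action is tap although input_text is not N/A. Whether that explains the missing text input depends on how the policy handles such replies, which this sketch does not model.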

After setting up the environment, running /scripts/calendar.sh results in the following error

INFO:sentence_transformers.SentenceTransformer:Load pretrained SentenceTransformer: hkunlp/instructor-xl
load INSTRUCTOR_Transformer
max_seq_length 512
INFO:sentence_transformers.SentenceTransformer:Use pytorch device: cpu
Traceback (most recent call last):
  File "/AutoDroid/droidbot/droidbot.py", line 104, in __init__
    self.input_manager = InputManager(
  File "/AutoDroid/droidbot/input_manager.py", line 64, in __init__
    self.policy = self.get_input_policy(device, app, master)
  File "/AutoDroid/droidbot/input_manager.py", line 84, in get_input_policy
    input_policy = TaskPolicy(device, app, self.random_input, task=self.task)
  File "/AutoDroid/droidbot/input_policy.py", line 683, in __init__
    self.similar_ele_path, self.similar_ele_function, self.similar_ele_statement = self.get_most_similar_element()
  File "/AutoDroid/droidbot/input_policy.py", line 728, in get_most_similar_element
    similar_ele = ele_statements[app_name][similar_state_str]['elements'][similar_ele_idx]
TypeError: string indices must be integers
[CONNECTION] ADB is disconnected
WARNING:DroidBotIme:Failed to disconnect DroidBotIME!

About experiments

Great job! I really enjoy your work!
Could you release the experimental code from the paper, for example the metric code?

No module named 'memory.memory_builder'

Hi, running "droidbot -h" or any .sh script in the ./scripts folder fails with the error "from memory.memory_builder import Memory
ModuleNotFoundError: No module named 'memory.memory_builder'". However, "droidbot -h" runs fine when I follow the steps in https://github.com/honeynet/droidbot/. This means I cannot run the AutoDroid code. How can I fix this problem? Thank you.

I'm honored to be the first issue submitter for this impressive project

I'm honored to be the first issue submitter for this impressive project, but I'd like to offer some suggestions:

  1. Is the usage process overly complicated, for instance, is there a specific reason why an APK address must be provided? Is it because the app will be decompiled?
  2. Could there be clearer instructions regarding the Python environment to facilitate deployment and use? I spent quite a bit of time installing the androguard package.

Thank you, and may it continue to improve.
