
ufo's Introduction

UFO: A UI-Focused Agent for Windows OS Interaction


UFO is a UI-Focused multi-agent framework to fulfill user requests on Windows OS by seamlessly navigating and operating within individual or spanning multiple applications.

🕌 Framework

UFO operates as a multi-agent framework, encompassing:

  • HostAgent 🤖, tasked with choosing an application for fulfilling user requests. This agent may also switch to a different application when a request spans multiple applications, and the task is partially completed in the preceding application.
  • AppAgent 👾, responsible for iteratively executing actions on the selected applications until the task is successfully concluded within a specific application.
  • Application Automator 🎮, tasked with translating actions from HostAgent and AppAgent into interactions with the application through UI controls, native APIs, or AI tools. Check out more details here.

Both agents leverage the multi-modal capabilities of GPT-Vision to comprehend the application UI and fulfill the user's request. For more details, please consult our technical report and documentation.
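The division of labor described above can be pictured as a two-level loop. The following is only a minimal sketch under stated assumptions, not UFO's actual implementation: `Action`, `run_request`, and the three callables (`choose_app`, `next_action`, `execute`) are hypothetical names standing in for HostAgent, AppAgent, and the Application Automator respectively.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Action:
    """One step proposed by the app-level agent (hypothetical structure)."""
    control: str            # label of the UI control to act on
    operation: str          # e.g. "click" or "type"
    finished: bool = False  # True when work in the current application is done


History = list[tuple[str, Action]]


def run_request(
    request: str,
    open_apps: list[str],
    choose_app: Callable[[str, list[str], History], Optional[str]],  # HostAgent role
    next_action: Callable[[str, str, History], Action],              # AppAgent role
    execute: Callable[[str, Action], None],                          # Automator role
    max_steps: int = 10,
) -> History:
    """Alternate between picking an application and acting inside it."""
    history: History = []
    app = choose_app(request, open_apps, history)
    while app is not None and len(history) < max_steps:
        action = next_action(request, app, history)
        execute(app, action)
        history.append((app, action))
        if action.finished:
            # The host-level chooser may switch to another application,
            # or return None once the whole request is complete.
            app = choose_app(request, open_apps, history)
    return history
```

The `max_steps` cap is a design choice of this sketch: it keeps a misbehaving agent from looping forever.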

📢 News

  • 📅 2024-07-06: New Release for v1.0.0! You can check out our documentation. We welcome your contributions and feedback!
  • 📅 2024-06-28: We are thrilled to announce that our official introduction video is now available on YouTube!
  • 📅 2024-06-25: New Release for v0.2.1! We are excited to announce the release of version 0.2.1! This update includes several new features and improvements:
    1. HostAgent Refactor: We've refactored the HostAgent to enhance its efficiency in managing AppAgents within UFO.
    2. Evaluation Agent: Introducing an evaluation agent that assesses task completion and provides real-time feedback.
    3. Google Gemini Support: UFO now supports Google Gemini as the inference engine. Refer to our detailed guide in documentation.
    4. Customized User Agents: Users can now create customized agents by simply answering a few questions.
  • 📅 2024-05-21: We have reached 5K stars!✨
  • 📅 2024-05-08: New Release for v0.1.1! We've made some significant updates! Previously known as AppAgent and ActAgent, we've rebranded them to HostAgent and AppAgent to better align with their functionalities. Explore the latest enhancements:
    1. Learning from Human Demonstration: UFO now supports learning from human demonstration! Utilize the Windows Step Recorder to record your steps and demonstrate them for UFO. Refer to our detailed guide in README.md for more information.
    2. Win32 Support: We've incorporated support for Win32 as a control backend, enhancing our UI automation capabilities.
    3. Extended Application Interaction: UFO now goes beyond UI controls, allowing interaction with your application through keyboard inputs and native APIs! Presently, we support Word (examples), with more to come soon. Customize and build your own interactions.
    4. Control Filtering: Streamline LLM's action process by using control filters to remove irrelevant control items. Enable them in config_dev.yaml under the control filtering section at the bottom.
  • 📅 2024-03-25: New Release for v0.0.1! Check out our exciting new features.
    1. We now support creating your help documents for each Windows application to become an app expert. Check the documentation for more details!
    2. UFO now supports RAG from offline documents and online Bing search.
    3. You can save the task completion trajectory into its memory for UFO's reference, improving its future success rate!
    4. You can customize different GPT models for AppAgent and ActAgent. Text-only models (e.g., GPT-4) are now supported!
  • 📅 2024-02-14: Our technical report is online!
  • 📅 2024-02-10: UFO is released on GitHub🎈. Happy Chinese New Year🐉!

🌐 Media Coverage

UFO sightings have garnered attention from various media outlets, including:

These sources provide insights into the evolving landscape of technology and the implications of UFO phenomena on various platforms.

💥 Highlights

  • First Windows Agent - UFO is the pioneering agent framework capable of translating user requests in natural language into actionable operations on Windows OS.
  • Agent as an Expert - UFO is enhanced by Retrieval Augmented Generation (RAG) from heterogeneous sources, including offline help documents, online search engines, and human demonstrations, making the agent an application "expert".
  • Rich Skill Set - UFO is equipped with a diverse set of skills to support comprehensive automation, such as mouse, keyboard, native API, and "Copilot".
  • Interactive Mode - UFO facilitates multiple sub-requests from users within the same session, enabling the seamless completion of complex tasks.
  • Agent Customization - UFO allows users to customize their own agents by providing additional information. The agent will proactively query users for details when necessary to better tailor its behavior.
  • Scalable AppAgent Creation - UFO offers extensibility, allowing users and app developers to create their own AppAgents in an easy and scalable way.

✨ Getting Started

🛠️ Step 1: Installation

UFO requires Python >= 3.10 running on Windows OS >= 10. It can be installed by running the following command:

# [optional to create conda environment]
# conda create -n ufo python=3.10
# conda activate ufo

# clone the repository
git clone https://github.com/microsoft/UFO.git
cd UFO
# install the requirements
pip install -r requirements.txt
# If you want to use the Qwen as your LLMs, uncomment the related libs.
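Before installing, it can help to confirm that the machine meets the stated requirements (Python >= 3.10, Windows >= 10). The helper below is a hypothetical sketch using only the standard library; `check_prerequisites` is not part of UFO. It assumes `platform.release()` reports values such as "10" or "11" on Windows and skips the OS version check when that string is not numeric.

```python
import platform
import sys

MIN_PYTHON = (3, 10)
MIN_WINDOWS_MAJOR = 10


def check_prerequisites(python_version=None, system=None, release=None) -> list[str]:
    """Return a list of problems; an empty list means the requirements look met."""
    python_version = tuple(python_version or sys.version_info[:2])
    system = system or platform.system()
    release = release if release is not None else platform.release()

    problems = []
    if python_version < MIN_PYTHON:
        problems.append(f"Python {python_version[0]}.{python_version[1]} is older than 3.10")
    if system != "Windows":
        problems.append(f"Unsupported operating system: {system}")
    else:
        try:
            major = int(release.split(".")[0])
        except ValueError:
            major = None  # e.g. a server edition string; treated as inconclusive
        if major is not None and major < MIN_WINDOWS_MAJOR:
            problems.append(f"Windows {release} is older than Windows 10")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Problem:", problem)
```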

⚙️ Step 2: Configure the LLMs

Before running UFO, you need to provide your LLM configurations individually for HostAgent and AppAgent. Create your own config file at ufo/config/config.yaml by copying ufo/config/config.yaml.template, then edit the config for HOST_AGENT and APP_AGENT as follows:

OpenAI

VISUAL_MODE: True, # Whether to use the visual mode
API_TYPE: "openai" , # The API type, "openai" for the OpenAI API.  
API_BASE: "https://api.openai.com/v1/chat/completions", # The OpenAI API endpoint.
API_KEY: "sk-",  # The OpenAI API key, begins with sk-
API_VERSION: "2024-02-15-preview", # "2024-02-15-preview" by default
API_MODEL: "gpt-4-vision-preview",  # The only OpenAI model

Azure OpenAI (AOAI)

VISUAL_MODE: True, # Whether to use the visual mode
API_TYPE: "aoai" , # The API type, "aoai" for the Azure OpenAI.  
API_BASE: "YOUR_ENDPOINT", #  The AOAI API address. Format: https://{your-resource-name}.openai.azure.com
API_KEY: "YOUR_KEY",  # The aoai API key
API_VERSION: "2024-02-15-preview", # "2024-02-15-preview" by default
API_MODEL: "gpt-4-vision-preview",  # The only OpenAI model
API_DEPLOYMENT_ID: "YOUR_AOAI_DEPLOYMENT", # The deployment id for the AOAI API
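A quick sanity check of an agent's settings can catch a missing field before UFO starts. The sketch below is an illustration only: `missing_keys` is a hypothetical helper, and the required-key lists are an assumption derived from the two templates above rather than from UFO's own validation logic. It works on a plain dictionary, so it does not depend on how the YAML file is loaded.

```python
# Keys that appear in both templates above.
COMMON_KEYS = ("VISUAL_MODE", "API_TYPE", "API_BASE", "API_KEY", "API_VERSION", "API_MODEL")

# Extra keys per API type (assumption: only AOAI needs a deployment id).
EXTRA_KEYS = {"openai": (), "aoai": ("API_DEPLOYMENT_ID",)}


def missing_keys(agent_config: dict) -> list[str]:
    """Return the names of expected keys that are absent or empty."""
    api_type = agent_config.get("API_TYPE")
    required = COMMON_KEYS + EXTRA_KEYS.get(api_type, ())
    # False is a legitimate value for VISUAL_MODE, so only None and "" count as missing.
    return [key for key in required if agent_config.get(key) in (None, "")]


host_agent = {
    "VISUAL_MODE": True,
    "API_TYPE": "aoai",
    "API_BASE": "YOUR_ENDPOINT",
    "API_KEY": "YOUR_KEY",
    "API_VERSION": "2024-02-15-preview",
    "API_MODEL": "gpt-4-vision-preview",
}
print(missing_keys(host_agent))  # the AOAI deployment id has not been set
```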

You can also use a non-visual model (e.g., GPT-4) for each agent by setting VISUAL_MODE: False and a proper API_MODEL (openai) and API_DEPLOYMENT_ID (aoai). You can also optionally set a backup LLM engine in the BACKUP_AGENT field in case the above engines fail during inference.

Non-Visual Model Configuration

You can utilize non-visual models (e.g., GPT-4) for each agent by configuring the following settings in the config.yaml file:

  • VISUAL_MODE: False # To enable non-visual mode.
  • Specify the appropriate API_MODEL (OpenAI) and API_DEPLOYMENT_ID (AOAI) for each agent.

Optionally, you can set a backup language model (LLM) engine in the BACKUP_AGENT field to handle cases where the primary engines fail during inference. Ensure you configure these settings accurately to leverage non-visual models effectively.
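The backup-engine idea can be illustrated with a small fallback loop. This is a sketch under assumptions, not UFO's code: `complete_with_fallback` is a hypothetical function, and each engine is represented by a plain callable that takes a prompt and returns text or raises an exception.

```python
from typing import Callable, Sequence

Engine = tuple[str, Callable[[str], str]]


def complete_with_fallback(prompt: str, engines: Sequence[Engine]) -> tuple[str, str]:
    """Try each engine in order; return (engine_name, response) from the first that succeeds."""
    errors = []
    for name, call in engines:
        try:
            return name, call(prompt)
        except Exception as exc:  # broad on purpose: any failure triggers the fallback
            errors.append(f"{name}: {exc}")
    raise RuntimeError("All engines failed: " + "; ".join(errors))


def failing_primary(prompt: str) -> str:
    raise TimeoutError("simulated inference failure")


def working_backup(prompt: str) -> str:
    return f"echo: {prompt}"


name, text = complete_with_fallback(
    "open Notepad",
    [("HOST_AGENT", failing_primary), ("BACKUP_AGENT", working_backup)],
)
print(name, text)
```

Collecting the individual error messages makes it easier to see why every engine failed when the final `RuntimeError` is raised.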

NOTE 💡

UFO also supports other LLMs and advanced configurations, such as customizing your own model; please check the documentation for more details. Because of model input limitations, a lite version of the prompt is provided so that users can try it out; it is configured in config_dev.yaml.

📔 Step 3: Additional Setting for RAG (optional).

If you want to enhance UFO's ability with external knowledge, you can optionally configure it with an external database for retrieval augmented generation (RAG) in the ufo/config/config.yaml file.

We provide the following options for RAG to enhance UFO's capabilities:

Consult their respective documentation for more information on how to configure these settings.

🎉 Step 4: Start UFO

⌨️ You can execute the following in your Windows command line (CLI):

# assume you are in the cloned UFO folder
python -m ufo --task <your_task_name>

This will start the UFO process and you can interact with it through the command line interface. If everything goes well, you will see the following message:

Welcome to use UFO🛸, A UI-focused Agent for Windows OS Interaction. 
 _   _  _____   ___
| | | ||  ___| / _ \
| | | || |_   | | | |
| |_| ||  _|  | |_| |
 \___/ |_|     \___/
Please enter your request to be completed🛸:

⚠️Reminder:

  • Before UFO executes your request, please make sure the targeted applications are active on the system.
  • GPT-V accepts screenshots of your desktop and application GUI as input. Please ensure that no sensitive or confidential information is visible or captured during the execution process. For further information, refer to DISCLAIMER.md.

🎥 Step 5: Execution Logs

You can find the screenshots taken and request & response logs in the following folder:

./ufo/logs/<your_task_name>/

You may use them to debug, replay, or analyze the agent output.
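For a quick overview of what a run produced, the log folder can be grouped by file type. The helper below is a hypothetical sketch (`summarize_log_dir` is not part of UFO) that makes no assumption about the file names UFO writes; it only lists whatever files it finds under the folder given above.

```python
from collections import defaultdict
from pathlib import Path


def summarize_log_dir(task_name: str, root: str = "./ufo/logs") -> dict[str, list[str]]:
    """Group the files in <root>/<task_name>/ by lowercase file extension."""
    log_dir = Path(root) / task_name
    if not log_dir.is_dir():
        raise FileNotFoundError(f"No log folder found at {log_dir}")

    grouped: dict[str, list[str]] = defaultdict(list)
    for path in sorted(log_dir.iterdir()):
        if path.is_file():
            grouped[path.suffix.lower() or "(no extension)"].append(path.name)
    return dict(grouped)


if __name__ == "__main__":
    # Replace the placeholder with the task name passed to --task.
    for extension, names in summarize_log_dir("<your_task_name>").items():
        print(extension, len(names))
```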

❓Get help

  • Please first check our documentation here.
  • ❔GitHub Issues (preferred)
  • For other communications, please contact [email protected].

🎬 Demo Examples

We present two demo videos in which UFO completes user requests on Windows OS. For more case studies, please consult our technical report.

1️⃣🗑️ Example 1: Deleting all notes on a PowerPoint presentation.

In this example, we will demonstrate how to efficiently use UFO to delete all notes on a PowerPoint presentation with just a few simple steps. Explore this functionality to enhance your productivity and work smarter, not harder!

ufo_delete_note.mp4

2️⃣📧 Example 2: Composing an email using text from multiple sources.

In this example, we will demonstrate how to utilize UFO to extract text from Word documents, describe an image, compose an email, and send it seamlessly. Enjoy the versatility and efficiency of cross-application experiences with UFO!

ufo_meeting_note_crossed_app_demo_new.mp4

📊 Evaluation

Please consult the WindowsBench provided in Section A of the Appendix within our technical report. Here are some tips (and requirements) to aid in completing your request:

  • Prior to UFO execution of your request, ensure that the targeted application is active (though it may be minimized).
  • Please note that the output of GPT-V may vary between runs of the same request. If your initial attempt is unsuccessful, consider trying again.

📚 Citation

Our technical report can be found here. Note that the agents previously named AppAgent and ActAgent in the paper have been renamed to HostAgent and AppAgent in the code base to better reflect their functions. If you use UFO in your research, please cite our paper:

@article{ufo,
  title={{UFO: A UI-Focused Agent for Windows OS Interaction}},
  author={Zhang, Chaoyun and Li, Liqun and He, Shilin and Zhang, Xu and Qiao, Bo and  Qin, Si and Ma, Minghua and Kang, Yu and Lin, Qingwei and Rajmohan, Saravan and Zhang, Dongmei and  Zhang, Qi},
  journal={arXiv preprint arXiv:2402.07939},
  year={2024}
}

📝 Todo List

  • RAG enhanced UFO.
  • Support more control using Win32 API.
  • Documentation.
  • Support local host GUI interaction model.
  • Chatbox GUI for UFO.

🎨 Related Project

You may also find TaskWeaver useful, a code-first LLM agent framework for seamlessly planning and executing data analytics tasks.

⚠️ Disclaimer

By choosing to run the provided code, you acknowledge and agree to the terms and conditions regarding the functionality and data handling practices in DISCLAIMER.md.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.

ufo's People

Contributors

al-377, dependabot[bot], eltociear, lenny2liu, lserinol, mac0q, microsoft-github-operations[bot], microsoftopensource, saifeilee, shilinhe, tiger0se, vyokky, yunhao0204

ufo's Issues

Win32COM API is not supported for

pip list:

Package                  Version
------------------------ -----------
aiohttp                  3.9.5
aiosignal                1.3.1
annotated-types          0.7.0
anyio                    4.4.0
art                      6.1
attrs                    23.2.0
beautifulsoup4           4.12.3
certifi                  2024.6.2
cffi                     1.16.0
charset-normalizer       3.3.2
click                    8.1.7
colorama                 0.4.6
comtypes                 1.4.4
cryptography             42.0.8
dashscope                1.15.0
dataclasses-json         0.6.7
distro                   1.9.0
faiss-cpu                1.8.0
filelock                 3.15.4
frozenlist               1.4.1
fsspec                   2024.6.1
greenlet                 3.0.3
h11                      0.14.0
httpcore                 1.0.5
httpx                    0.27.0
huggingface-hub          0.23.4
idna                     3.7
intel-openmp             2021.4.0
Jinja2                   3.1.4
joblib                   1.4.2
jsonpatch                1.33
jsonpointer              3.0.0
langchain                0.1.11
langchain-community      0.0.27
langchain-core           0.1.52
langchain-text-splitters 0.0.2
langsmith                0.1.83
lxml                     5.1.0
MarkupSafe               2.1.5
marshmallow              3.21.3
mkl                      2021.4.0
mpmath                   1.3.0
msal                     1.25.0
multidict                6.0.5
mypy-extensions          1.0.0
networkx                 3.3
numpy                    1.26.4
openai                   1.13.3
orjson                   3.10.6
packaging                23.2
pandas                   1.4.3
pillow                   10.3.0
pip                      24.1.1
psutil                   5.9.8
pycparser                2.22
pydantic                 2.8.0
pydantic_core            2.20.0
PyJWT                    2.8.0
python-dateutil          2.9.0.post0
pytz                     2024.1
pywin32                  306
pywinauto                0.6.8
PyYAML                   6.0.1
regex                    2024.5.15
requests                 2.32.0
safetensors              0.4.3
scikit-learn             1.5.1
scipy                    1.14.0
sentence-transformers    2.5.1
setuptools               65.5.0
six                      1.16.0
sniffio                  1.3.1
soupsieve                2.5
SQLAlchemy               2.0.31
sympy                    1.12.1
tbb                      2021.13.0
tenacity                 8.4.2
threadpoolctl            3.5.0
tokenizers               0.19.1
torch                    2.3.1
tqdm                     4.66.4
transformers             4.42.3
typing_extensions        4.12.2
typing-inspect           0.9.0
urllib3                  2.2.2
yarl                     1.9.4

Issue: when I ask UFO to create a file or open one, the console always shows output like this:

...
{"status_code": 200, "request_id": "_xxxxxxxx_", "code": "", "message": "", "output": {"text": null, "finish_reason": null, "choices": [{"finish_reason": "stop", "message": {"role": "assistant", "content": [{"text": "_some steps_"}]}}]},_ "usage": {"input_tokens": 391, "output_tokens": 93, "image_tokens": 180}}
...(multi times repeat)

Error occurs when calling LLM: Traceback (most recent call last):
  File "D:\xxx\UFO\ufo\agents\processors\host_agent_processor.py", line 121, in get_response
    self._response, self.cost = self.host_agent.get_response(
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\xxx\UFO\ufo\agents\agent\basic.py", line 148, in get_response
    response_string, cost = llm_call.get_completion(
                            ^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\xxx\UFO\ufo\llm\llm_call.py", line 33, in get_completion
    return responses[0], cost
           ~~~~~~~~~^^^
IndexError: list index out of range

Warning: Win32COM API is not supported for .
Loading offline help document indexer for ...
Creating an experience indexer...
Warning: Failed to load experience indexer from vectordb/experience/experience_db.
Creating an demonstration indexer...
Warning: Failed to load demonstration indexer from vectordb/demonstration/demonstration_db.
Cost is not available for the model qwen-vl-plus or qwen-vl-plus.

The 'IndexError: list index out of range' may be caused by an API key rate limit, because the AI is queried again and again when the Win32COM API error occurs.
Or there may be some other reason; I don't know.

I've tried searching for 'Win32COM API is not supported for' but found nothing useful.
Perhaps some COM component is missing from my environment (
win10 22H2 19045.4598;
with
Microsoft Visual C++ 2015-2022 Redistributable(x64) - 14.40.33810,
Microsoft Windows Desktop Runtime - 6.0.31(x64),
Windows Software Development Kit 10.0.20348.1,

in python i've got pywin32 == 306
).
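The traceback above ends at `return responses[0], cost`, which raises IndexError when the list of responses is empty, for example when every request to the model failed. A guard along the following lines would surface a clearer error. This is only a sketch: `first_response` and `EmptyLLMResponseError` are hypothetical names, not part of UFO's code base.

```python
class EmptyLLMResponseError(RuntimeError):
    """Raised when the model call produced no usable response."""


def first_response(responses: list[str], cost: float) -> tuple[str, float]:
    """Return the first response and its cost, or fail with an explicit message."""
    if not responses:
        raise EmptyLLMResponseError(
            "The LLM returned no responses; check the earlier "
            "'Error making API request' messages for the underlying cause."
        )
    return responses[0], cost
```

With such a guard, the message points back to the failed API calls instead of reporting a bare index error.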

Qwen-vl API request error

Hey MS UFO team,

Thanks for your excellent work and recent updates for supporting Qwen API.
I'm testing the code on my Win11 machine using the Qwen-vl-plus API, with Outlook active and fully maximized on my desktop so that a screenshot can be taken, but I constantly receive the 'Error making API request' error.
After debugging, I found that although I can receive a text response from Qwen-vl-plus, the content doesn't contain the 'Observation' part (according to lines 86-92 of the script 'ufo\llm\qwen.py' and the example in 'ufo\app_agent_example.yaml'), thus raising an exception.
Could you please help identify how to fix the problem? Thanks in advance!
My input and output details are as follows:

(UFO) PS D:\Projects\GithubProjects\UFO-v0.1.1> python -m ufo --task 'write email'

Welcome to use UFO🛸, A UI-focused Agent for Windows OS Interaction.
 _   _  _____   ___
| | | ||  ___| / _ \
| | | || |_   | | | |
| |_| ||  _|  | |_| |
 \___/ |_|     \___/
Please enter your request to be completed🛸:
My name is Zac. Please send a email to [email protected] to thanks his contribution on the open source.
Round 1, Step 1: Selecting an application.
Error making API request: To send an email via code, you can use various libraries such as smtplib in Python or JavaMail API in Java.

Here's how you could do it using these tools:

Python:

import smtplib

# Set up your credentials and message content here
sender_email = '[email protected]'
receiver_email = '[email protected]'

message_content = """\
Subject: Thanks for contributing!

Dear Jack,

Thank you very much for your valuable contributions to our open-source project! We appreciate your hard work and dedication.

Best regards,
[Your Name]"""

# Establish connection with SMTP server
with smtplib.SMTP('smtp.example.com', 587) as smtp_server:
    # Start TLS encryption if available
    smtp_server.starttls()

    # Login with your credentials
    smtp_server.login(sender_email, 'password')

    # Send mail
    smtp_server.sendmail(
        sender_email,
        receiver_email,
        message_content.encode()
    )

Java:

import com.sun.mail.smtp.SMTPTransport;
import javax.activation.DataHandler;
import javax.mail.Message;
import javax.mail.MessagingException;
import javax.mail.Session;
import javax.mail.Transport;
import javax.mail.internet.AddressException;
import javax.mail.internet.InternetAddress;
import javax.mail.internet.MimeMessage;

// ...

try {
    // Create session object
    Session session = Session.getDefaultInstance(props);

    // Get address of recipient from command line arguments
    String dest = args[0];

    try {
        InternetAddress internetDest =
            new InternetAddress(dest);
        Message msg = new MimeMessage(session);
        msg.setFrom(new InternetAddress("[email protected]"));
        msg.addRecipient(Message.RecipientType.TO, internetDest);
        msg.setSubject("Thanks for contributing!");
        msg.setText("Dear Jack,\n\n" +
                "Thank you very much for your valuable contributions to our open-source project!\nWe appreciate your hard work and dedication.\n\n" +
                "Best regards," + Environment.NewLine +
                "[Your Name]");
        Transport transport = session.getTransport();
        transport.connect(); // Connects to host specified by properties above
        transport.sendMessage(msg, msg.getAllRecipients());
        System.out.println("Email sent successfully.");
    } catch (MessagingException e) {
        throw new RuntimeException(e);
    }
} finally { ... }

Please replace '[email protected]', ['your-name'] and 'password' with appropriate values before running this script. Also note that sending emails through plain text might not be secure depending on the email service provider. For more security-conscious applications, consider switching to encrypted protocols like SSL/TLS or even HTTPS.
{"status_code": 200, "request_id": "9ff2692f-705c-9724-aba7-304117b95ffc", "code": "", "message": "", "output": {"text": null, "finish_reason": null, "choices": [{"finish_reason": "stop", "message": {"role": "assistant", "content": [{"text": "To send an email via code, you can use various libraries such as smtplib in Python or JavaMail API in Java.\n\nHere's how you could do it using these tools:\n\nPython:\npython\nimport smtplib\n\n# Set up your credentials and message content here\nsender_email = '[email protected]'\nreceiver_email = '[email protected]'\n\nmessage_content = \"\"\"\\\nSubject: Thanks for contributing!\n\nDear Jack,\n\nThank you very much for your valuable contributions to our open-source project! We appreciate your hard work and dedication.\n\nBest regards,\n[Your Name]\"\"\"\n\n# Establish connection with SMTP server\nwith smtplib.SMTP('smtp.example.com', 587) as smtp_server:\n # Start TLS encryption if available\n smtp_server.starttls()\n\n # Login with your credentials\n smtp_server.login(sender_email, 'password')\n\n # Send mail\n smtp_server.sendmail(\n sender_email,\n receiver_email,\n message_content.encode()\n )\n\n\nJava:\njava\nimport com.sun.mail.smtp.SMTPTransport;\nimport javax.activation.DataHandler;\nimport javax.mail.Message;\nimport javax.mail.MessagingException;\nimport javax.mail.Session;\nimport javax.mail.Transport;\nimport javax.mail.internet.AddressException;\nimport javax.mail.internet.InternetAddress;\nimport javax.mail.internet.MimeMessage;\n\n// ...\n\ntry {\n // Create session object\n Session session = Session.getDefaultInstance(props);\n\n // Get address of recipient from command line arguments\n String dest = args[0];\n\n try {\n InternetAddress internetDest =\n new InternetAddress(dest);\n Message msg = new MimeMessage(session);\n msg.setFrom(new InternetAddress(\"[email protected]\"));\n msg.addRecipient(Message.RecipientType.TO, internetDest);\n msg.setSubject(\"Thanks for contributing!\");\n msg.setText(\"Dear 
Jack,\\n\\n\" +\n \"Thank you very much for your valuable contributions to our open-source project!\\nWe appreciate your hard work and dedication.\\n\\n\" +\n \"Best regards,\" + Environment.NewLine +\n \"[Your Name]\");\n Transport transport = session.getTransport();\n transport.connect(); // Connects to host specified by properties above\n transport.sendMessage(msg, msg.getAllRecipients());\n System.out.println(\"Email sent successfully.\");\n } catch (MessagingException e) {\n throw new RuntimeException(e); \n }\n} finally { ... }\n\nPlease replace '[email protected]', ['your-name'] and 'password' with appropriate values before running this script. Also note that sending emails through plain text might not be secure depending on the email service provider. For more security-conscious applications, consider switching to encrypted protocols like SSL/TLS or even HTTPS."}]}}]}, "usage": {"input_tokens": 1241, "output_tokens": 552, "image_tokens": 180}}
Error making API request: To send an email via code, you can use various libraries such as smtplib and email. Here's an example of how you might do this in Python:

import smtplib

# Set up the sender and recipient
sender = '[email protected]'
recipient = '[email protected]'

# Compose the message body
message_body = """
    Subject: Thanks for your contribution!

    Hi Jack,

    Thank you very much for contributing to our open-source project! We really appreciate your help.

    Best regards,
    Zac"""

# Create a secure SSL connection using port 465
with smtplib.SMTP_SSL('smtp.outlook.com', 465) as server:

    # Send the email with the composed message
    server.login(sender, 'your_password')
    server.sendmail(
        sender,
        recipient,
        message_body
    )

Note that you will need to replace '[email protected]', 'your_password', and other placeholders with your actual email address and password. Also be sure to import any necessary modules or dependencies at the top of your script if they're not already imported.
{"status_code": 200, "request_id": "12b3347f-2964-91e0-985a-d348867b4d3f", "code": "", "message": "", "output": {"text": null, "finish_reason": null, "choices": [{"finish_reason": "stop", "message": {"role": "assistant", "content": [{"text": "To send an email via code, you can use various libraries such as smtplib and email. Here's an example of how you might do this in Python:\npython\nimport smtplib\n\n# Set up the sender and recipient\nsender = '[email protected]'\nrecipient = '[email protected]'\n\n# Compose the message body\nmessage_body = \"\"\"\n Subject: Thanks for your contribution!\n\n Hi Jack,\n\n Thank you very much for contributing to our open-source project! We really appreciate your help.\n\n Best regards,\n Zac\"\"\"\n\n# Create a secure SSL connection using port 465\nwith smtplib.SMTP_SSL('smtp.outlook.com', 465) as server:\n\n # Send the email with the composed message\n server.login(sender, 'your_password')\n server.sendmail(\n sender,\n recipient,\n message_body\n )\n\n\nNote that you will need to replace '[email protected]', 'your_password', and other placeholders with your actual email address and password. Also be sure to import any necessary modules or dependencies at the top of your script if they're not already imported."}]}}]}, "usage": {"input_tokens": 1241, "output_tokens": 242, "image_tokens": 180}}
Error making API request: To send an email via code, you can use various libraries such as smtplib in Python or JavaMail API in Java. Here's how you could do it using these tools:

Python:

import smtplib

def send_email():
    # Set up the sender and recipient addresses
    fromaddr = '[email protected]'
    toaddrs   = ['[email protected]']

    # Create the message content
    msg = """\
Subject: Thanks!

Dear Jack,

Thank you very much for your contributions to our open-source project! We appreciate your hard work.

Best regards,
The Project Team"""

    try:
        # Connect to the SMTP server and send the mail
        smtpObj = smtplib.SMTP('smtp.example.com')
        smtpObj.sendmail(fromaddr, toaddrs, msg)
        print("Email sent successfully!")
    except Exception as e:
        print(f"Error occurred while sending email: {str(e)}")

send_email()

Java:

import javax.mail.*;
import java.util.Properties;

public class SendEmail {
    public static void main(String args[]) throws MessagingException {

        // Get system properties
        Properties props = System.getProperties();

        // Setup mail server
        String host = "smtp.example.com";
        props.put("mail.smtp.host", host);

        // Get the Session object
        Session session = Session.getDefaultInstance(props);

        try {
            // Create a default MimeMessage object.
            Message message = new MimeMessage(session);
            message.setFrom(new InternetAddress("[email protected]"));
            message.addRecipient(Message.RecipientType.TO, new InternetAddress("[email protected]"));

            // Set the subject and the body of the message
            message.setSubject("Thanks!");
            message.setText("Dear Jack,\n\nThank you very much for your contributions to our open-source project!\nWe appreciate your hard work.\n\nBest regards," +
                    "\nThe Project Team");

            // Send the actual email with the message through the transport service
            Transport.send(message);
            System.out.println("Email sent successfully!");

        } catch (MessagingException mex) {
            throw new RuntimeException(mex);
        }
    }
}

Please replace [email protected], [email protected], and smtp.example.com with appropriate values before running this script. Also note that some parts may vary depending on which programming language/library you choose to implement this functionality.
{"status_code": 200, "request_id": "d58465d5-11ba-9009-93fc-bb682810236c", "code": "", "message": "", "output": {"text": null, "finish_reason": null, "choices": [{"finish_reason": "stop", "message": {"role": "assistant", "content": [{"text": "To send an email via code, you can use various libraries such as smtplib in Python or JavaMail API in Java. Here's how you could do it using these tools:\n\nPython:\npython\nimport smtplib\n\ndef send_email():\n # Set up the sender and recipient addresses\n fromaddr = '[email protected]'\n toaddrs = ['[email protected]']\n\n # Create the message content\n msg = \"\"\"\\\nSubject: Thanks!\n\nDear Jack,\n\nThank you very much for your contributions to our open-source project! We appreciate your hard work.\n\nBest regards,\nThe Project Team\"\"\"\n\n try:\n # Connect to the SMTP server and send the mail\n smtpObj = smtplib.SMTP('smtp.example.com')\n smtpObj.sendmail(fromaddr, toaddrs, msg)\n print(\"Email sent successfully!\")\n except Exception as e:\n print(f\"Error occurred while sending email: {str(e)}\")\n\nsend_email()\n\nJava:\njava\nimport javax.mail.*;\nimport java.util.Properties;\n\npublic class SendEmail {\n public static void main(String args[]) throws MessagingException {\n\n // Get system properties\n Properties props = System.getProperties();\n\n // Setup mail server\n String host = \"smtp.example.com\";\n props.put(\"mail.smtp.host\", host);\n\n // Get the Session object\n Session session = Session.getDefaultInstance(props);\n\n try {\n // Create a default MimeMessage object.\n Message message = new MimeMessage(session);\n message.setFrom(new InternetAddress(\"[email protected]\"));\n message.addRecipient(Message.RecipientType.TO, new InternetAddress(\"[email protected]\"));\n\n // Set the subject and the body of the message\n message.setSubject(\"Thanks!\");\n message.setText(\"Dear Jack,\\n\\nThank you very much for your contributions to our open-source project!\\nWe appreciate your hard work.\\n\\nBest 
regards,\" +\n \"\\nThe Project Team\");\n\n // Send the actual email with the message through the transport service\n Transport.send(message);\n System.out.println(\"Email sent successfully!\");\n\n } catch (MessagingException mex) {\n throw new RuntimeException(mex);\n }\n }\n}\n\n\nPlease replace [email protected], [email protected], and smtp.example.com with appropriate values before running this script. Also note that some parts may vary depending on which programming language/library you choose to implement this functionality."}]}}]}, "usage": {"input_tokens": 1241, "output_tokens": 496, "image_tokens": 180}}
Error occurs when calling LLM: Traceback (most recent call last):
File "D:\Projects\GithubProjects\UFO-v0.1.1\ufo\module\processors\processor.py", line 165, in get_response
self._response, self._cost = self.host_agent.get_response(
File "D:\Projects\GithubProjects\UFO-v0.1.1\ufo\agent\basic.py", line 278, in get_response
response_string, cost = llm_call.get_completion(
File "D:\Projects\GithubProjects\UFO-v0.1.1\ufo\llm\llm_call.py", line 33, in get_completion
return responses[0], cost
IndexError: list index out of range

Traceback (most recent call last):
File "D:\ProgramFiles\anaconda3\envs\UFO\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "D:\ProgramFiles\anaconda3\envs\UFO\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "D:\Projects\GithubProjects\UFO-v0.1.1\ufo_main.py", line 7, in <module>
ufo.main()
File "D:\Projects\GithubProjects\UFO-v0.1.1\ufo\ufo.py", line 55, in main
clients.run_all()
File "D:\Projects\GithubProjects\UFO-v0.1.1\ufo\module\client.py", line 52, in run_all
client.run()
File "D:\Projects\GithubProjects\UFO-v0.1.1\ufo\module\client.py", line 28, in run
self.session.handle()
File "D:\Projects\GithubProjects\UFO-v0.1.1\ufo\module\basic.py", line 459, in handle
self._state.handle(self)
File "D:\Projects\GithubProjects\UFO-v0.1.1\ufo\module\state.py", line 194, in handle
session.round_hostagent_execution()
File "D:\Projects\GithubProjects\UFO-v0.1.1\ufo\module\session.py", line 224, in round_hostagent_execution
self.application = self.app_window.window_text()
AttributeError: 'NoneType' object has no attribute 'window_text'
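
For anyone hitting the same `IndexError`: the raw response in the log nests the text under `output.choices[].message.content[].text`, while `output.text` is null. The following is only a sketch of how such a payload could be read defensively. `extract_texts` and `first_response` are hypothetical helper names, not functions from the UFO code base, and the payload shape is assumed from the log above.

```python
def extract_texts(payload):
    """Collect every text fragment found under output.choices[].message.content[]."""
    output = payload.get("output") or {}
    texts = []
    for choice in output.get("choices") or []:
        message = choice.get("message") or {}
        content = message.get("content") or []
        if isinstance(content, str):
            # Some responses may carry the content as a plain string.
            texts.append(content)
            continue
        for part in content:
            if isinstance(part, dict) and part.get("text"):
                texts.append(part["text"])
    return texts


def first_response(responses, cost):
    """Return the first response, failing with a clear message when none exist."""
    if not responses:
        raise ValueError("LLM returned no parsable responses")
    return responses[0], cost
```

Raising a descriptive error here would not fix the parsing itself, but it would make the failure easier to diagnose than a bare `list index out of range`.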

Local models?

Will local models be supported one day as well?
(Unless they are, and I didn't find it in the readme XD)

Failing to load from the demonstration database

Greetings,

When I try to perform a task that I trained from a demonstration recorded with the steps recorder, I get the following warning:

Warning: Failed to load demonstration indexer from vectordb/demonstration/demonstration_db.

Kindly help me solve the same.

Thanks!

Error making API request: Invalid URL 'YOUR_ENDPOINT': No scheme supplied.

I created a config.yaml and populated it with my API key and "gpt-4-vision-preview" as the model, as per the instructions, but I am getting the following error:
Step 0: Selecting an application.
Error making API request: Invalid URL 'YOUR_ENDPOINT': No scheme supplied. Perhaps you meant https://YOUR_ENDPOINT?
Error occurs when calling LLM.
Running Win11.
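
The message suggests the endpoint setting still holds the placeholder `YOUR_ENDPOINT` rather than a full URL. A quick way to catch this before launching is to check that the value parses as an http(s) URL. This is a minimal sketch using only the standard library; `is_valid_endpoint` is a hypothetical helper, not part of UFO.

```python
from urllib.parse import urlparse


def is_valid_endpoint(url):
    """Return True only if the value has an http(s) scheme and a host."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# The placeholder from the template is rejected; a full URL passes.
print(is_valid_endpoint("YOUR_ENDPOINT"))
print(is_valid_endpoint("https://api.openai.com/v1/chat/completions"))
```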

How to get all user requests

How can I get all the user requests used in your paper?
And how many user requests are there in the following applications?
“Outlook, Photos, PowerPoint, Word, Adobe Acrobat, File Explorer, Visual Studio Code, WeChat, Edge Browser, and cross-app”

Gemini Model Could Save Us Some Bucks!

I’ve been having a blast with the UFO project. It’s been great for experimenting with AI and automation. However, using OpenAI’s GPT models can get quite expensive. I was thinking, why not integrate Google’s Gemini model into our project? It could be a more cost-effective option for us developers looking to experiment without spending too much.

Having more options like Gemini could really benefit our development process. Let’s consider adding it to the project!

Azure API base instruction wrong?

Hi. I think you have the Azure API base instruction wrong.

I tried https://{your-resource-name}.openai.azure.com/openai/deployments/{deployment-id}/completions?api-version={api-version}
However, I got errors; the error message seemed to say the model was looking for GPT-4.

Instead I used https://{your-resource-name}.openai.azure.com/openai/deployments/{deployment-id}/chat/completions?api-version=2023-12-01-preview, which seems to work (i.e. I added '/chat' after the deployment ID, which matches some other instructions I found on general sites about using Azure GPT-4V).

Is this correct now, or will this create issues for me?

thanks
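
For reference, the URL shape reported as working above can be assembled like this. It is only a sketch based on this post, not a confirmed fix from the maintainers; `azure_chat_completions_url` is a hypothetical helper and the argument values are placeholders.

```python
def azure_chat_completions_url(resource_name, deployment_id, api_version):
    """Build the Azure OpenAI chat completions URL, including the '/chat' segment."""
    return (
        f"https://{resource_name}.openai.azure.com/openai/deployments/"
        f"{deployment_id}/chat/completions?api-version={api_version}"
    )


# Placeholder values; substitute your own resource name and deployment ID.
print(azure_chat_completions_url("my-resource", "my-deployment", "2023-12-01-preview"))
```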

Error making API request: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

I followed the Getting Started steps to configure the OpenAI endpoint, but encountered an error during execution.
Error making API request: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

In the config.yml, I ONLY modified the following parameters:
OPENAI_API_BASE: "https://api.openai.com/v1/chat/completions"
OPENAI_API_KEY: "###"

Could anybody tell me why this happens and how to solve it?

Connection not working with AOAI

Hey there!
Thank you for your amazing work so far! I was looking forward to trying your agent, but I'm not able to use my AOAI credentials with it. I'm putting this in config.yaml:

image

Train or fine-tune models for computer automation agents

Hello there, Microsoft UFO Team! Excellent work on such a remarkable project, bringing AI closer to the Windows system. I am doing similar work, such as training custom GPT-2 models on computer automation datasets.

I have created two comprehensive datasets, covering terminal and GUI environments. My strategy is to generate data from random keyboard and mouse actions, collect the resulting observations, and mix them with other textual datasets.
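
To illustrate what I mean by random actions, here is a minimal sketch. The action schema (`type`, `key`, `x`, `y`), the screen size, and the `random_actions` name are assumptions made up for this example, and the code only generates action records without executing them.

```python
import random
import string


def random_actions(n, width=1920, height=1080, seed=None):
    """Generate n random key-press or mouse-click records (nothing is executed)."""
    rng = random.Random(seed)
    actions = []
    for _ in range(n):
        if rng.random() < 0.5:
            # Random lowercase key press.
            actions.append({"type": "key", "key": rng.choice(string.ascii_lowercase)})
        else:
            # Random click position inside the assumed screen bounds.
            actions.append({"type": "click", "x": rng.randrange(width), "y": rng.randrange(height)})
    return actions
```

Passing a fixed `seed` makes a generated sequence reproducible, which helps when pairing actions with recorded observations.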

This naive attempt shows my strong interest in computer agents. I like the idea of GUI agent benchmark systems like WindowsBench, and have thought of building a reward system based on program exit codes or VimGolf.

If you find my suggestion useful, I would love to hear back from you! Furthermore, if cooperation is possible, I would be thrilled to join your team to build better computer agents!


Update: Google has published an unsupervised action-space training method called Genie. Consider it highly applicable to the area of computer agents.
