voice_browsing2

voice_browsing2 lets you control a web browser with text or voice commands, primarily Chrome on an Android device. The steps below set up the project and get you started.

Setup

First, clone the repository and move into the project directory (the example assumes a working directory named work):

$ cd work
$ git clone https://github.com/aRaikoFunakami/voice_browsing2.git
$ cd voice_browsing2/

If Poetry is not installed yet, install it with the official installer:

$ curl -sSL https://install.python-poetry.org | python3 -

On macOS, you can install it with Homebrew instead:

$ brew install poetry

Once inside the project directory, use poetry to install the necessary dependencies:

$ poetry install

Next, copy the sample configuration file config_sample.py to config.py:

$ cp config_sample.py config.py

In config.py, replace <YOUR OPENAI ID> with your OpenAI API key and <YOUR IP ADDRESS> with the IP address of your PC:

keys = {
    "openai_api_key": "<YOUR OPENAI ID>",
    "launcher": "http://<YOUR IP ADDRESS>:8080/launcher.html",
}

Chrome on the Android tablet loads the dummy launcher UI from "http://<YOUR IP ADDRESS>:8080/launcher.html", so this address must be reachable from the device.
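
A quick check can catch a config.py that still contains the sample placeholders. The following is a minimal sketch and not part of the project: the helper names find_unfilled_placeholders and build_launcher_url are hypothetical, and it assumes that keys has the shape shown in config_sample.py above.

```python
import re

# Matches sample placeholders such as <YOUR OPENAI ID> or <YOUR IP ADDRESS>.
PLACEHOLDER = re.compile(r"<YOUR [^>]+>")


def find_unfilled_placeholders(keys):
    """Return the names of entries that still contain a <YOUR ...> placeholder."""
    return sorted(
        name for name, value in keys.items() if PLACEHOLDER.search(str(value))
    )


def build_launcher_url(ip_address, port=8080):
    """Build a launcher URL in the same form as the sample configuration."""
    return f"http://{ip_address}:{port}/launcher.html"
```

For example, calling find_unfilled_placeholders(keys) on an unedited copy of the sample returns both entry names, and an empty list once both values have been filled in.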

Connect to the Android Device

Ensure your Android device is connected to your machine. You can verify the connection using the following command:

$ adb devices
List of devices attached
R52N800FR2Y	device

A line showing a device ID (R52N800FR2Y in this example) followed by "device" confirms that the device is connected.
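
If you want to perform the same check from a script, the output of adb devices can be parsed line by line. This is a sketch under the assumption that adb is on your PATH; parse_adb_devices and connected_serials are hypothetical helper names, not functions from this repository.

```python
import subprocess


def parse_adb_devices(output):
    """Parse `adb devices` output into a list of (serial, state) tuples."""
    devices = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines, the header, and adb daemon status lines ("* daemon ...").
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices.append((parts[0], parts[1]))
    return devices


def connected_serials():
    """Run `adb devices` and return the serials whose state is "device"."""
    result = subprocess.run(
        ["adb", "devices"], capture_output=True, text=True, check=True
    )
    return [
        serial
        for serial, state in parse_adb_devices(result.stdout)
        if state == "device"
    ]
```

Only entries in the "device" state are ready to use. A state such as "unauthorized" or "offline" means the device is visible to adb but cannot yet accept commands.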

To connect to the device over Wi-Fi instead, switch adb to TCP/IP mode on port 5555 and then connect to the device's IP address:

$ adb tcpip 5555
$ adb connect xxx.xxx.xxx.xxx
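
The two Wi-Fi steps can also be scripted. The sketch below assumes adb is on your PATH, that the device is still attached (typically over USB) when the first command runs, and that the PC and the device are on the same network. The function names are hypothetical.

```python
import subprocess


def wifi_connect_commands(ip_address, port=5555):
    """Return the two adb invocations from the steps above as argument lists."""
    return [
        ["adb", "tcpip", str(port)],
        ["adb", "connect", f"{ip_address}:{port}"],
    ]


def connect_over_wifi(ip_address, port=5555):
    """Run both commands in order; raises CalledProcessError if adb exits non-zero."""
    for command in wifi_connect_commands(ip_address, port):
        subprocess.run(command, check=True)
```

A zero exit status from adb does not by itself prove that the connection was established, so it is worth running adb devices afterwards to confirm that the device is listed.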

Text-based Browsing

To initiate text-based browsing, run the following command:

$ poetry run python remote_chat.py

You'll be prompted to enter a text search query. For instance, to search for "Back to the Future" on YouTube, type the following and hit enter:

Enter the text to search (or 'exit' to quit): search "back to the future" in youtube

The program will execute the search and prompt you to select a link from the search results:

> Entering new AgentExecutor chain...

Invoking: `search_by_query` with `{'url': 'https://www.youtube.com/', 'input': 'back to the future'}`


<class 'selenium.webdriver.chrome.webdriver.WebDriver'>
The search was successful. Please ask Human to select links.I have searched for "back to the future" on YouTube. Please select the link you want to open by providing the corresponding number.

> Finished chain.
Enter the text to search (or 'exit' to quit):
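
The interaction above follows a simple read-and-dispatch pattern: read a line, stop on 'exit', otherwise hand the text to the agent. The sketch below illustrates that pattern only. It is not the code in remote_chat.py, and handle_query is a hypothetical callback standing in for the agent call.

```python
def run_prompt_loop(
    handle_query,
    read_line=input,
    prompt="Enter the text to search (or 'exit' to quit): ",
):
    """Read queries until 'exit' and pass each one to handle_query.

    Returns the list of queries that were handled.
    """
    handled = []
    while True:
        query = read_line(prompt).strip()
        if query.lower() == "exit":
            break
        if not query:
            continue  # Ignore empty input and prompt again.
        handle_query(query)
        handled.append(query)
    return handled
```

Passing read_line as a parameter keeps the loop testable without a terminal: a test can supply a function that returns canned inputs.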

Voice Browsing

Voice browsing allows you to control the browsing experience using voice commands. To set up voice browsing, run the following command to launch the server application:

$ poetry run python app.py

The Chrome application on your connected Android device will launch automatically. Next, open the voice controller web app in Chrome on your PC. On macOS, run:

$ open -a 'Google Chrome' 'http://127.0.0.1:8080'

Now you can use voice commands to control the browsing experience on your Android device.
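
The open command is specific to macOS. On other platforms, one option is Python's standard webbrowser module, sketched below. Note the assumptions: it opens the system default browser, which may not be Chrome, and this README does not say whether the voice controller works in other browsers. The function names are hypothetical.

```python
import webbrowser


def controller_url(host="127.0.0.1", port=8080):
    """Return the voice controller URL used in the command above."""
    return f"http://{host}:{port}"


def open_controller(host="127.0.0.1", port=8080):
    """Open the controller in the default browser; returns True if one was launched."""
    return webbrowser.open(controller_url(host, port))
```

Call open_controller() after the server started by app.py is running.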

These steps should provide a smooth setup and user experience for text and voice-controlled browsing using the voice_browsing2 project.

Support list in v0.1.0

Actions                 YouTube  Amazon Prime  Hulu  dAnime
Video Search            YES      YES           YES   YES
Select by Number        YES      -             -     -
Play/Pause              YES      -             -     -
Audio Mute              YES      -             -     -
Enter Fullscreen        YES      -             -     -
Adjust Playback Speed   YES      -             -     -
Navigate Videos         YES      -             -     -
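
For scripts that need to check whether an action is available on a given service, the support list can be written as a small lookup table. This is only an illustrative sketch transcribed from the list above. The names SUPPORT, is_supported, and actions_for are hypothetical and do not come from the project.

```python
SERVICES = ("YouTube", "Amazon Prime", "Hulu", "dAnime")

# Services that support each action in v0.1.0, transcribed from the support list.
SUPPORT = {
    "Video Search": set(SERVICES),
    "Select by Number": {"YouTube"},
    "Play/Pause": {"YouTube"},
    "Audio Mute": {"YouTube"},
    "Enter Fullscreen": {"YouTube"},
    "Adjust Playback Speed": {"YouTube"},
    "Navigate Videos": {"YouTube"},
}


def is_supported(action, service):
    """Return True if the action is listed as supported for the service."""
    return service in SUPPORT.get(action, set())


def actions_for(service):
    """Return the supported actions for a service, in table order."""
    return [action for action, services in SUPPORT.items() if service in services]
```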
