
Helper tool to enable AsTeRICS Grid to do actions on the system or integrations with external services

License: GNU Affero General Public License v3.0


AsTeRICS-Grid-Helper

Helper tools that enable AsTeRICS Grid to perform actions on the operating system or to integrate with external services, neither of which is possible from within the browser. Currently the only feature is providing speech from external sources.

Speech

Normally AsTeRICS Grid uses the Web Speech API and therefore the voices that are installed on the operating system (e.g. SAPI voices on Windows, or voices provided by a TTS module on Android). Sometimes it is desirable to use voices that aren't available as system voices. This section describes how to use an external custom speech service written in Python.

Terms

  • Speech provider: a Python module that implements access to a speech generating service like MS Azure, Amazon Polly, Piper, MycroftAI mimic3 or any other. A speech provider has one of two types:
    • type "playing": a speech provider that plays the audio itself. Using a speech provider of this type only makes sense if it runs on the same machine as AsTeRICS Grid.
    • type "data": a speech provider that generates the speech audio data, which is then passed to AsTeRICS Grid and played within the browser. This type is preferable, because it makes it possible to run the speech service on any device or server and also allows caching of the data.

Installation and Usage

Speech Service

These steps are necessary to start the speech service that can be used by AsTeRICS Grid:

  • pip install flask flask_cors - for installing Flask, which is needed for providing the REST API
  • pip install pyttsx3 - only needed if you want to try the speech provider provider_pytts_playing.py, which is configured by default in config.py. Otherwise install the dependencies of the speech providers you want to use, see predefined speech providers.
  • adapt config.py to use the desired speech providers by importing them and adding them to the list speechProviderList.
  • python start.py - to start the REST API

AsTeRICS Grid

In AsTeRICS Grid do the following steps to use the external speech provider:

  • Go to Settings -> General Settings -> Advanced general settings
  • Configure the External speech service URL with the IP/host where the API is running, port 5555. If the speech service is running on the same computer, use http://localhost:5555.
  • Reload AsTeRICS Grid (F5)
  • Go to Settings -> User settings -> Voice and enable Show all voices
  • Verify that the additional voices are selectable and working. For the default provider_pytts_playing speech provider, voices named like "<voice name>, pytts_playing" should be listed.

Caching

For speech providers with type "data", all generated speech data is automatically cached in the folder speech/temp. If you want to cache speech data for a whole AsTeRICS Grid configuration, follow these steps:

  • configure AsTeRICS Grid to use your desired speech provider / voice (see steps above)
  • go to Settings -> User settings -> Voice -> Advanced voice settings and click the button Cache all texts of current configuration using external voice. This operation may take some time for big AsTeRICS Grid configurations.
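
The caching idea for "data" providers can be sketched as follows. This is a minimal illustration and not the project's implementation: the function names cache_path and get_speak_data_cached, the SHA-256-based file name and the generate callback are assumptions. Only the folder name speech/temp comes from the text above.

```python
import hashlib
import os
import tempfile


def cache_path(cache_dir, text, provider_id, voice_id):
    """Map (text, provider, voice) to a file path (hypothetical naming scheme)."""
    key = "\n".join([provider_id, voice_id, text]).encode("utf-8")
    return os.path.join(cache_dir, hashlib.sha256(key).hexdigest() + ".bin")


def get_speak_data_cached(cache_dir, text, provider_id, voice_id, generate):
    """Return audio bytes from the cache, calling generate(text, voice_id) on a miss."""
    path = cache_path(cache_dir, text, provider_id, voice_id)
    if os.path.isfile(path):
        with open(path, "rb") as f:
            return f.read()
    data = generate(text, voice_id)
    os.makedirs(cache_dir, exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    return data


if __name__ == "__main__":
    calls = []

    def fake_generate(text, voice_id):
        # Stand-in for a real speech provider; records how often it is called.
        calls.append(text)
        return b"audio:" + text.encode("utf-8")

    with tempfile.TemporaryDirectory() as tmp:
        cache_dir = os.path.join(tmp, "speech", "temp")
        first = get_speak_data_cached(cache_dir, "hello", "piper_data", "v1", fake_generate)
        second = get_speak_data_cached(cache_dir, "hello", "piper_data", "v1", fake_generate)
        print(first == second, len(calls))
```

In this sketch the second request is served from disk, so the provider is only asked once per combination of text, provider and voice.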

Files

These are the important files within the folder speech of this repository:

  • config.py: configuration file that defines which speech providers should be used
  • provider_<name>_playing.py: implementation of a speech provider which generates speech and plays audio on its own
  • provider_<name>_data.py: implementation of a speech provider which generates speech audio data and returns the binary data, which is then played by AsTeRICS Grid within the browser
  • start.py: main script providing a REST API which can be used by AsTeRICS Grid
  • speechManager.py: script which manages the different speech providers and is used by the API defined in start.py to access them

Speech providers

This is a list of predefined speech providers with installation hints:

  • mimic3_data: see Mimic 3 installation steps and install it in any way that provides mimic3 as a CLI tool, since the speech provider calls it. The current implementation only uses the voice en_UK/apope_low; to use further voices, the file provider_mimic3_data.py must be adapted.
  • msazure_data, msazure_playing:
    • run pip install azure-cognitiveservices-speech, for further information see MS Azure TTS quickstart
    • to get API credentials, you have to sign up at MS Azure and create a SpeechServices resource.
    • create a file speech/credentials.py containing the two lines AZURE_KEY_1 = "<your-key>" and AZURE_REGION = "<your-region>"
  • piper_data: run pip install piper-tts, for more information see Running Piper in Python.
  • pytts_playing: run pip install pyttsx3
  • elevenlabs_data: run pip install requests and create a file speech/credentials.py with ELEVENLABS_KEY = "<your-key>". Read here how to get the API key.

Configuration

See config.py, where the speech providers to use can be imported and added to the list speechProviderList.
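
How a list like speechProviderList might be turned into the voice list offered to AsTeRICS Grid can be sketched as below. Everything except the name speechProviderList is an assumption made for illustration: the accessor names getProviderId, getVoiceType and getVoices, the keys of the voice dictionaries, and the stand-in provider objects. In the real config.py the list entries are imported provider modules.

```python
from types import SimpleNamespace


def make_provider(provider_id, voice_type, voice_ids):
    """Build a stand-in for a provider module (hypothetical interface)."""
    return SimpleNamespace(
        getProviderId=lambda: provider_id,
        getVoiceType=lambda: voice_type,
        getVoices=lambda: [{"id": v, "name": v} for v in voice_ids],
    )


# In config.py the entries would be imported provider modules instead.
speechProviderList = [
    make_provider("pytts_playing", "playing", ["voice-a", "voice-b"]),
    make_provider("piper_data", "data", ["voice-c"]),
]


def collect_voices(providers):
    """Flatten the voices of all configured providers into one list."""
    voices = []
    for provider in providers:
        for voice in provider.getVoices():
            voices.append({
                "id": voice["id"],
                # Display name in the form "<voice name>, <provider id>"
                "name": f'{voice["name"]}, {provider.getProviderId()}',
                "providerId": provider.getProviderId(),
                "type": provider.getVoiceType(),
            })
    return voices


if __name__ == "__main__":
    for entry in collect_voices(speechProviderList):
        print(entry)
```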

Adding new speech providers

Use the template provider_template_data.py or provider_template_playing.py, depending on which type of speech provider you want to add, and implement the predefined methods.
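
To illustrate what a "data" provider has to deliver, here is a minimal sketch that returns a short silent WAV clip instead of calling a real speech service. The function names (getProviderId, getVoiceType, getVoices, getSpeakData), the provider id and the voice dictionary layout are assumptions for illustration; the methods that actually have to be implemented are the ones predefined in provider_template_data.py.

```python
import io
import wave

PROVIDER_ID = "silence_data"  # hypothetical provider id
SAMPLE_RATE = 16000           # samples per second of the generated clip


def getProviderId():
    return PROVIDER_ID


def getVoiceType():
    return "data"


def getVoices():
    return [{"id": "silence", "name": "Silence"}]


def getSpeakData(text, voiceId):
    """Return WAV bytes; a real provider would synthesize `text` here."""
    duration_s = max(0.1, 0.05 * len(text))  # placeholder: longer text, longer clip
    frame_count = int(SAMPLE_RATE * duration_s)
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 16-bit samples
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(b"\x00\x00" * frame_count)  # silence
    return buffer.getvalue()


if __name__ == "__main__":
    data = getSpeakData("hello", "silence")
    print(len(data), data[:4])
```

The important property is that the provider returns the complete audio file as bytes, which is what allows the result to be cached and played within the browser.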

REST API

The file speech/start.py starts the REST API with the following endpoints:

  • /voices returns the list of voices available in the current configuration.
  • /speak/<text>/<providerId>/<voiceId> speaks the given text using the given provider and voice.
  • /speakdata/<text>/<providerId>/<voiceId> returns the binary audio data for the text using the given provider and voice.
  • /cache/<text>/<providerId>/<voiceId> caches the audio data for the given parameters to a file in speech/temp, so that it can be used faster or without an internet connection afterwards.
  • /speaking returns true if the system is currently speaking (only applicable for voice type "playing")
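
Because the text is part of the URL path, a client has to percent-encode it. The helper below is a sketch of how request URLs for the endpoints above could be built. The function name build_url and the example provider and voice ids are illustrative assumptions, while the endpoint paths and port 5555 come from this document.

```python
from urllib.parse import quote

BASE_URL = "http://localhost:5555"


def build_url(endpoint, text, provider_id, voice_id, base_url=BASE_URL):
    """Build /<endpoint>/<text>/<providerId>/<voiceId> with percent-encoded segments."""
    if endpoint not in ("speak", "speakdata", "cache"):
        raise ValueError(f"unsupported endpoint: {endpoint}")
    # safe="" also encodes "/" so the text cannot be split into extra path segments
    segments = [quote(part, safe="") for part in (text, provider_id, voice_id)]
    return f"{base_url.rstrip('/')}/{endpoint}/" + "/".join(segments)


if __name__ == "__main__":
    print(build_url("speakdata", "Hello world", "piper_data", "some-voice"))
```

Requesting the resulting URL, for example with urllib.request.urlopen, only works while the speech service is running.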
