

tts

Overview

The tts ROS node enables a robot to speak with a human voice by providing a Text-To-Speech service. Out of the box, this package listens to a speech topic, submits the text to the Amazon Polly cloud service to generate an audio stream, retrieves the audio stream from Amazon Polly, and plays it through the default output device. The nodes can be configured to use different voices, as well as custom lexicons and SSML tags that let you control aspects of speech such as pronunciation, volume, pitch, and speaking rate. A sample ROS application using this node, and more details on speech customization, are available in the Amazon Polly documentation.

Amazon Polly Summary: Amazon Polly is a service that turns text into lifelike speech, allowing you to create applications that talk, and build entirely new categories of speech-enabled products. Amazon Polly is a Text-to-Speech service that uses advanced deep learning technologies to synthesize speech that sounds like a human voice. With dozens of lifelike voices across a variety of languages, you can select the ideal voice and build speech-enabled applications that work in many different countries.

License

The source code is released under the Apache 2.0 license.

Author: AWS RoboMaker
Affiliation: Amazon Web Services (AWS)

RoboMaker cloud extensions rely on third-party software licensed under open-source licenses and are provided for demonstration purposes only. Incorporation or use of RoboMaker cloud extensions in connection with your production workloads or commercial product(s) or devices may affect your legal rights or obligations under the applicable open-source licenses. License information for this repository can be found here. AWS does not provide support for this cloud extension. You are solely responsible for how you configure, deploy, and maintain this cloud extension in your workloads or commercial product(s) or devices.

Supported ROS Distributions

  • Kinetic
  • Melodic

Installation

AWS Credentials

You will need to create an AWS Account and configure the credentials to be able to communicate with AWS services. You may find AWS Configuration and Credential Files helpful.

This node will require the following AWS account IAM role permissions:

  • polly:SynthesizeSpeech
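For reference, a minimal IAM policy granting only this permission could look like the document built and printed by the Python sketch below. The structure follows the standard IAM JSON policy grammar; using "Resource": "*" is an assumption made for simplicity, and how you attach the policy to your user or role is up to you.

```python
import json


def polly_tts_policy():
    """Return a minimal IAM policy document that allows only polly:SynthesizeSpeech."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["polly:SynthesizeSpeech"],
                # Wildcard resource chosen for simplicity (an assumption);
                # narrow it if your setup allows.
                "Resource": "*",
            }
        ],
    }


# Print the policy as JSON, ready to paste into the IAM console.
print(json.dumps(polly_tts_policy(), indent=2))
```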

Dependencies

To use the Text-To-Speech node with ROS Kinetic, you must upgrade the boto3 package installed on your system to version 1.9.0 or later. You can do this by running:

    pip3 install -U boto3

This step is required because the version of boto3 installed by default is not new enough for the features this node uses.
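To confirm that the upgrade took effect for the interpreter ROS actually uses, a quick check like this Python sketch can help. boto3_is_new_enough is a hypothetical helper, not part of this package; it compares numeric version components only and ignores suffixes such as "rc1".

```python
def version_tuple(version):
    """Turn a version string such as '1.9.0' into (1, 9, 0) for comparison."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes like 'rc1'
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def boto3_is_new_enough(installed, minimum="1.9.0"):
    """Return True if the installed boto3 version is at least `minimum`."""
    inst = version_tuple(installed)
    req = version_tuple(minimum)
    # Pad the shorter tuple with zeros so '1.9' compares equal to '1.9.0'.
    length = max(len(inst), len(req))
    inst += (0,) * (length - len(inst))
    req += (0,) * (length - len(req))
    return inst >= req
```

Run it with the same Python interpreter the node uses, for example `import boto3; print(boto3_is_new_enough(boto3.__version__))`.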

Building from Source

To build from source, create a new workspace, clone and check out the latest release branch of this repository, install all dependencies, and compile. If you need the latest development features, you can clone the master branch instead of the latest release branch. While we guarantee that the release branches are stable, the master branch should be considered unstable due to ongoing development.

  • Install the build tool: refer to the colcon installation guide.

  • Create a ROS workspace and a source directory

      mkdir -p ~/ros-workspace/src
    
  • Clone the package into the source directory.

      cd ~/ros-workspace/src
      git clone https://github.com/aws-robotics/tts-ros1.git -b release-latest
    
  • Install dependencies

      cd ~/ros-workspace 
      sudo apt-get update && rosdep update
      rosdep install --from-paths src --ignore-src -r -y
    

Note: If you build the master branch instead of a release branch, you may also need to check out and build the master branches of the packages this package depends on.

  • Build the packages

      cd ~/ros-workspace && colcon build
    
  • Configure the ROS library path

      source ~/ros-workspace/install/setup.bash
    
  • Build and run the unit tests

      colcon test --packages-select tts && colcon test-result --all
    

Testing in Containers/Virtual Machines

Even if your container or virtual machine does not have an audio device, you can still test TTS by using an audio server.

The following is an example setup on a MacBook with PulseAudio as the audio server. If you are new to PulseAudio, you may want to read the PulseAudio Documentation.

Step 1: Start PulseAudio on your laptop

After installation, start the audio server with module-native-protocol-tcp loaded:

pulseaudio --load=module-native-protocol-tcp --exit-idle-time=-1 --log-target=stderr -v

Note: the extra arguments -v and --log-target are only there to make troubleshooting easier.

Step 2: Run TTS nodes in container

In your container, make sure the PULSE_SERVER environment variable points to the audio server on your laptop. For example, you can start the container using docker run -it -e PULSE_SERVER=docker.for.mac.localhost ubuntu:16.04.

Then you will be able to run ROS nodes in the container and hear the audio from your laptop speakers.

Troubleshooting

If your laptop has multiple audio output devices, make sure the intended one is selected and its volume is turned up. This command lists the output devices and shows which one is currently selected:

pacmd list-sinks | grep -E '(index:|name:|product.name)'
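If the list is long, a small parser can pick out the selected device. This Python sketch assumes pacmd's usual formatting, where the currently selected (default) sink's index line is prefixed with "*", and it expects the filtered output of the command above; parse_sinks is a hypothetical helper.

```python
import re


def parse_sinks(text):
    """Parse `pacmd list-sinks | grep -E '(index:|name:|product.name)'` output.

    Returns a list of dicts with keys 'index', 'default', 'name', 'product'.
    Assumes the default sink's index line starts with '*'.
    """
    sinks = []
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"(\*\s*)?index:\s*(\d+)", line)
        if m:
            sinks.append({"index": int(m.group(2)), "default": bool(m.group(1)),
                          "name": None, "product": None})
            continue
        if not sinks:
            continue  # ignore anything before the first sink
        m = re.match(r"name:\s*<(.*)>", line)
        if m:
            sinks[-1]["name"] = m.group(1)
            continue
        m = re.match(r'device\.product\.name\s*=\s*"(.*)"', line)
        if m:
            sinks[-1]["product"] = m.group(1)
    return sinks
```

The input could come from `subprocess.check_output("pacmd list-sinks | grep -E '(index:|name:|product.name)'", shell=True).decode()`.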

Launch Files

An example launch file called sample_application.launch is provided.

Usage

Run the node

  • Plain text

    • roslaunch tts sample_application.launch
    • rosrun tts voicer.py 'Hello World'
  • SSML

    • roslaunch tts sample_application.launch
    • rosrun tts voicer.py '<speak>Mary has a <amazon:effect name="whispered">little lamb.</amazon:effect></speak>' '{"text_type":"ssml"}'
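voicer.py is a small action client. To send the same kind of request from your own Python node, a sketch along these lines may work. It assumes, without confirmation from this README, that the action server is named "tts" and that the generated types are tts.msg.SpeechAction and tts.msg.SpeechGoal with the text and metadata fields listed under the tts node below; ssml_goal_fields and speak are hypothetical helper names.

```python
import json


def ssml_goal_fields(inner_text):
    """Wrap text in <speak> tags and build the matching JSON metadata,
    mirroring the SSML voicer.py example above."""
    text = "<speak>{}</speak>".format(inner_text)
    metadata = json.dumps({"text_type": "ssml"})
    return text, metadata


def speak(text, metadata=""):
    """Send one goal to the tts action server and block until it finishes.

    Assumptions: the server is named 'tts' and the goal type is
    tts.msg.SpeechGoal with `text` and `metadata` fields.
    rospy.init_node() must already have been called.
    """
    # Imported lazily so ssml_goal_fields() can be used without a ROS install.
    import actionlib
    from tts.msg import SpeechAction, SpeechGoal

    client = actionlib.SimpleActionClient("tts", SpeechAction)
    client.wait_for_server()
    client.send_goal(SpeechGoal(text=text, metadata=metadata))
    client.wait_for_result()
    return client.get_result()
```

With sample_application.launch running, a node could call `rospy.init_node("tts_example_client")` once and then `speak(*ssml_goal_fields("Mary has a little lamb."))`.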

Configuration File and Parameters

  • polly_action (string)

    Currently only one action, SynthesizeSpeech, is supported.

  • text (string)

    The text to be synthesized. It can be plain text or SSML; see also text_type.

  • text_type (string, default: text)

    Either text or ssml.

  • voice_id (string, default: Joanna)

    The list of supported voices can be found in the official Amazon Polly documentation.

  • output_format (string, default: ogg_vorbis)

    Valid formats are ogg_vorbis, mp3, and pcm.

  • output_path (string, default: .)

    The audio data is saved as a local file for playback, reuse, and inspection. This parameter sets the preferred path for that file.

  • sample_rate (string, default: 16000)

    Note that 16000 is a valid sample rate for all supported formats.
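The usage examples above pass options to voicer.py as a JSON metadata string. The Python sketch below builds such a string and checks values against the parameters listed above. Only text_type is shown being passed through metadata in this README, so treating voice_id, output_format and sample_rate as metadata keys is an assumption to verify against the node source; build_metadata is a hypothetical helper.

```python
import json

# Allowed values taken from the parameter list above.
VALID_TEXT_TYPES = {"text", "ssml"}
VALID_OUTPUT_FORMATS = {"ogg_vorbis", "mp3", "pcm"}


def build_metadata(text_type="text", voice_id="Joanna",
                   output_format="ogg_vorbis", sample_rate="16000"):
    """Validate synthesis options and serialize them as a JSON metadata string.

    Assumption: the nodes accept these parameter names as metadata keys;
    only `text_type` is confirmed by the usage examples.
    """
    if text_type not in VALID_TEXT_TYPES:
        raise ValueError("text_type must be one of %s" % sorted(VALID_TEXT_TYPES))
    if output_format not in VALID_OUTPUT_FORMATS:
        raise ValueError("output_format must be one of %s" % sorted(VALID_OUTPUT_FORMATS))
    # sample_rate is documented as a string, so keep it as one.
    return json.dumps({
        "text_type": text_type,
        "voice_id": voice_id,
        "output_format": output_format,
        "sample_rate": str(sample_rate),
    }, sort_keys=True)
```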

Performance and Benchmark Results

We evaluated the performance of this node by running the following scenario on a Raspberry Pi 3 Model B:

  • Launch a baseline graph containing the talker and listener nodes from the roscpp_tutorials package, plus two additional nodes that collect CPU and memory usage statistics. Allow the nodes to run for 60 seconds.
  • Launch the nodes polly_node, synthesizer_node and tts_node using the launch file sample_application.launch as described above. At the same time, make several calls to the action tts/action/Speech.action with the voicer.py script described above, by running the following script in the background:

      rosrun tts voicer.py '<speak>Amazon Polly is a <emphasis level="strong">Text-to-Speech</emphasis> (TTS) cloud service</speak>' '{"text_type":"ssml"}' ; sleep 1
      rosrun tts voicer.py '<speak>that converts text into lifelike speech</speak>' '{"text_type":"ssml"}' ; sleep 1
      rosrun tts voicer.py '<speak>You can use Amazon Polly to develop applications that increase <emphasis level="moderate">engagement and accessibility</emphasis></speak>' '{"text_type":"ssml"}' ; sleep 1
      rosrun tts voicer.py '<speak>Amazon Polly supports multiple languages and includes a variety of lifelike voices</speak>' '{"text_type":"ssml"}' ; sleep 1
      rosrun tts voicer.py '<speak>so you can build speech-enabled applications that work in multiple locations</speak>' '{"text_type":"ssml"}' ; sleep 1
      rosrun tts voicer.py '<speak>and use the ideal voice for your customers</speak>' '{"text_type":"ssml"}' ; sleep 1

  • Allow the nodes to run for 180 seconds.
  • Terminate the polly_node, synthesizer_node and tts_node nodes, and allow the remaining nodes to run for 60 seconds.

The following graph shows the CPU usage during that scenario. The 1-minute average CPU usage starts at 16.75% while the baseline graph is launching and stabilizes at 6%. When we launch the Polly nodes around second 85, the 1-minute average CPU usage rises to a peak of 22.25% and stabilizes around 20%. After we stop making requests with the voicer.py script around second 206, it drops to around 12%, then decreases gradually, and falls to 2.5% after we stop the Polly nodes at the end of the scenario.

[Figure: CPU usage during the benchmark scenario]

The following graph shows the memory usage during that scenario. Memory usage starts at around 227 MB, increases to around 335 MB (+47.58%) when we launch the Polly nodes around second 85, and reaches a peak of 361 MB (+59% relative to the initial value) while we are calling the voicer.py script. Memory usage returns to its initial level after the Polly nodes are stopped.

[Figure: memory usage during the benchmark scenario]
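The memory percentages quoted above are relative to the roughly 227 MB baseline. This small Python check reproduces them using only the figures stated in the text (the 59% figure is 59.03% rounded).

```python
def percent_increase(baseline, value):
    """Relative change of `value` with respect to `baseline`, in percent."""
    return (value - baseline) / float(baseline) * 100.0


baseline_mb = 227  # memory before the Polly nodes are launched
print(round(percent_increase(baseline_mb, 335), 2))  # after launch → 47.58
print(round(percent_increase(baseline_mb, 361), 2))  # peak during voicer.py calls → 59.03
```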

Nodes

polly

The polly node is the engine that performs the synthesis. It provides user-friendly yet powerful APIs, so users don't have to deal with the technical details of AWS service calls.

Services

  • polly (tts/Polly)

    Call the service to use Amazon Polly to synthesize the audio.

Reserved for future use

  • language_code (string, default: None)

    Users don't have to provide a language code; this parameter is reserved for future use.

  • lexicon_content (string, default: None)

  • lexicon_name (string, default: None)

  • lexicon_names (string[], default: empty)

  • speech_mark_types (string[], default: empty)

  • max_results (uint32, default: None)

  • next_token (string, default: None)

  • sns_topic_arn (string, default: None)

  • task_id (string, default: None)

  • task_status (string, default: None)

  • output_s3_bucket_name (string, default: None)

  • output_s3_key_prefix (string, default: None)

  • include_additional_language_codes (bool, default: None)

synthesizer node

Services

  • synthesizer (tts/Synthesizer)

    Call the service to synthesize speech from the given text.

Parameters

  • text (string)

    The text to be synthesized.

  • metadata (string, JSON format)

    Optional. Gives the user control over how synthesis is performed.

tts node

Action

  • speech

Parameters

  • text (string)

    The text to be synthesized.

  • metadata (string, JSON format)

    Optional. Gives the user control over how synthesis is performed.

tts-ros1's People

Contributors

aalon, cevans87, dabonnie, dependabot[bot], emersonknapp, hyandell, juanrh, lucashan, mike-moore-az, mm318, nburek, pincom15, ryanewel, tfoote, xabxx, yyu


tts-ros1's Issues

tts action server freezes

Hi, the tts action server works fine most of the time, but when I send requests many times in a row, the action server seems to freeze. To reproduce the error, just launch the server:

roslaunch tts sample_application.launch

Then run rosrun tts voicer.py 'Hello World' multiple times. On my computer, it freezes at about the 32nd time. While it is frozen, the sound card seems to be blocked and cannot play any video.

Maybe it is because of my sound card driver or the sound_play package. Hope you can reproduce this. Thank you in advance!
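To find the point where the freeze happens without running voicer.py by hand, a reproduction loop like this Python 3 sketch could help. find_first_hang is a hypothetical helper; it only detects that a call did not finish within the timeout and says nothing about whether the cause is the sound card driver or sound_play.

```python
import subprocess


def find_first_hang(command, attempts=50, timeout_s=30.0):
    """Run `command` up to `attempts` times in a row.

    Returns the 1-based attempt number that exceeded `timeout_s`
    (subprocess.run kills that child), or None if every run finished in time.
    """
    for attempt in range(1, attempts + 1):
        try:
            subprocess.run(command, timeout=timeout_s, check=False,
                           stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        except subprocess.TimeoutExpired:
            return attempt
    return None
```

For example, with sample_application.launch running: `find_first_hang(["rosrun", "tts", "voicer.py", "Hello World"])`. The timeout should leave enough room for synthesizing and playing the phrase.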

Using and handling cached audio files

Given a repeated prompt and no specified output path, the system makes a new call to synthesize the audio, downloads it, and plays it. By default, the files are cached in /tmp under a name built from an MD5 hash of the text and a timestamp. That is done here:

if 'output_path' not in kw:

It would be nice if the hash were computed over the entire request to be sent, without a timestamp, and if the system checked the storage location to see whether the same request had already been answered. If it had, the prior audio could be reused. Over time that could accumulate a large number of files, so it would also be nice to be able to set a maximum cache size and have the least-recently-used file deleted when the maximum is exceeded.
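A minimal sketch of the proposed behavior, assuming requests are plain dicts like the Amazon Polly Request shown in the logs below: the key is an MD5 hash of the whole request serialized with sorted keys (no timestamp), and the least-recently-used files are deleted once the total size exceeds a budget. AudioCache is a hypothetical class, not part of the tts package, and it only tracks files it wrote itself during the current run. It avoids Python-3-only OrderedDict methods so it would also run on Python 2.7.

```python
import hashlib
import json
import os
from collections import OrderedDict


class AudioCache(object):
    """Hypothetical request-keyed LRU cache for synthesized audio files."""

    def __init__(self, directory, max_bytes):
        self.directory = directory
        self.max_bytes = max_bytes
        self._entries = OrderedDict()  # key -> size in bytes, least recent first

    @staticmethod
    def key_for(request):
        # sort_keys makes equal requests serialize, and therefore hash, identically.
        blob = json.dumps(request, sort_keys=True).encode("utf-8")
        return hashlib.md5(blob).hexdigest()

    def _path(self, key):
        return os.path.join(self.directory, key)

    def get(self, request):
        """Return the cached file path for `request`, or None on a miss."""
        key = self.key_for(request)
        if key not in self._entries or not os.path.exists(self._path(key)):
            self._entries.pop(key, None)
            return None
        self._entries[key] = self._entries.pop(key)  # mark as most recently used
        return self._path(key)

    def put(self, request, audio_bytes):
        """Store audio for `request`, evicting least-recently-used files if over budget."""
        key = self.key_for(request)
        with open(self._path(key), "wb") as f:
            f.write(audio_bytes)
        self._entries.pop(key, None)
        self._entries[key] = len(audio_bytes)
        # Never evict the entry just written, even if it alone exceeds the budget.
        while sum(self._entries.values()) > self.max_bytes and len(self._entries) > 1:
            old_key, _ = self._entries.popitem(last=False)
            if os.path.exists(self._path(old_key)):
                os.remove(self._path(old_key))
        return self._path(key)
```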

Missing actionlib

OS: Ubuntu 16.04
ROS Distro: Kinetic

I tried running this package on the ros-kinetic-core docker image. The build succeeded, but when I tried roslaunch tts sample_application.launch I got the following error.

process[tts_node-4]: started with pid [3611]
Traceback (most recent call last):
  File "/ws/polly_test_ws/install/tts/lib/tts/tts_node.py", line 55, in <module>
    import actionlib
ImportError: No module named actionlib
[tts_node-4] process has died [pid 3611, exit code 1, cmd /ws/polly_test_ws/install/tts/lib/tts/tts_node.py __name:=tts_node __log:=/root/.ros/log/a0521072-be04-11e9-828c-0242ac110002/tts_node-4.log].
log file: /root/.ros/log/a0521072-be04-11e9-828c-0242ac110002/tts_node-4*.log
process[soundplay_node-5]: started with pid [3630]
Traceback (most recent call last):
  File "/opt/ros/kinetic/lib/sound_play/soundplay_node.py", line 48, in <module>
    from sound_play.msg import SoundRequest, SoundRequestAction, SoundRequestResult, SoundRequestFeedback
  File "/opt/ros/kinetic/lib/python2.7/dist-packages/sound_play/__init__.py", line 33, in <module>
    import libsoundplay as libsoundplay
  File "/opt/ros/kinetic/lib/python2.7/dist-packages/sound_play/libsoundplay.py", line 41, in <module>
    import actionlib
ImportError: No module named actionlib
[soundplay_node-5] process has died [pid 3630, exit code 1, cmd /opt/ros/kinetic/lib/sound_play/soundplay_node.py __name:=soundplay_node __log:=/root/.ros/log/a0521072-be04-11e9-828c-0242ac110002/soundplay_node-5.log].
log file: /root/.ros/log/a0521072-be04-11e9-828c-0242ac110002/soundplay_node-5*.log

This was solved by adding actionlib as a dependency in the package.xml.

Need better error message when missing credentials

OS: Ubuntu 16.04
ROS Distro: Kinetic

When running roslaunch tts sample_application.launch without credentials I get the following error message:

Traceback (most recent call last):
  File "/ws/polly_test_ws/install/tts/lib/tts/polly_node.py", line 19, in <module>
    tts.amazonpolly.main()
  File "/ws/polly_test_ws/install/tts/lib/python2.7/dist-packages/tts/amazonpolly.py", line 424, in main
    AmazonPolly().start(node_name=node_name, service_name=service_name)
  File "/ws/polly_test_ws/install/tts/lib/python2.7/dist-packages/tts/amazonpolly.py", line 186, in __init__
    self.polly = self._get_polly_client(aws_access_key_id, aws_secret_access_key, aws_session_token, region_name)
  File "/ws/polly_test_ws/install/tts/lib/python2.7/dist-packages/tts/amazonpolly.py", line 217, in _get_polly_client
    return session.client("polly")
  File "/usr/lib/python2.7/dist-packages/boto3/session.py", line 263, in client
    aws_session_token=aws_session_token, config=config)
  File "/usr/lib/python2.7/dist-packages/botocore/session.py", line 850, in create_client
    credentials = self.get_credentials()
  File "/usr/lib/python2.7/dist-packages/botocore/session.py", line 474, in get_credentials
    'credential_provider').load_credentials()
  File "/usr/lib/python2.7/dist-packages/botocore/credentials.py", line 1662, in load_credentials
    creds = provider.load()
  File "/ws/polly_test_ws/install/tts/lib/python2.7/dist-packages/tts/amazonpolly.py", line 92, in load
    'aws-iot-with-certificate'
  File "/usr/lib/python2.7/dist-packages/botocore/credentials.py", line 308, in create_from_metadata
    access_key=metadata['access_key'],
TypeError: 'NoneType' object has no attribute '__getitem__'
[polly_node-2] process has died [pid 4631, exit code 1, cmd /ws/polly_test_ws/install/tts/lib/tts/polly_node.py __name:=polly_node __log:=/root/.ros/log/173ce162-be0f-11e9-b033-0242ac110003/polly_node-2.log].
log file: /root/.ros/log/173ce162-be0f-11e9-b033-0242ac110003/polly_node-2*.log

It looks like this is caused by missing credentials (and is solved by configuring my AWS credentials). That's not immediately obvious from the error message.
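One way to get a clearer message is to check for credentials before creating the Polly client. The sketch below assumes a session object with a get_credentials() method, as boto3 and botocore sessions have (both return None when nothing is configured). require_credentials and MissingCredentialsError are hypothetical names, and calling it just before session.client("polly") in _get_polly_client is a suggested placement, not existing code. It also wraps exceptions, since in the traceback above the custom credential provider fails with a TypeError instead of returning None.

```python
class MissingCredentialsError(RuntimeError):
    """Raised when no usable AWS credentials can be found."""


def require_credentials(session):
    """Fail fast with a readable message if `session` has no AWS credentials.

    `session` is anything with a get_credentials() method, such as a
    boto3.session.Session or botocore.session.Session.
    """
    try:
        credentials = session.get_credentials()
    except Exception as exc:  # a custom provider may fail with an obscure error
        raise MissingCredentialsError(
            "Could not load AWS credentials (%s: %s). Configure them, e.g. in "
            "~/.aws/credentials, before starting polly_node."
            % (type(exc).__name__, exc))
    if credentials is None:
        raise MissingCredentialsError(
            "No AWS credentials found. Configure them, e.g. in "
            "~/.aws/credentials, before starting polly_node.")
    return credentials
```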

Linking issue with other ROS packages

The tts ROS package can't be found with find_package() from other ROS packages, because its catkin_package() call declares a library named tts that is not actually built or installed.

This is the error:

CMake Error at /opt/ros/melodic/share/tts/cmake/ttsConfig.cmake:173 (message):
  Project 'package_name' tried to find library 'tts'.  The
  library is neither a target nor built/installed properly.  Did you compile
  project 'tts'? Did you find_package() it before the subdirectory containing
  its code is included?

Separate srv and action files into another package

Hi, thank you for providing such a wonderful tool!

I think some users (like me) may want to call the SpeechAction from another package. It would be more convenient to build against a separate package that contains only the srv and action files, like moveit_msgs and tf2_msgs. A discussion about this can be found here.

This is just a suggestion; I am not an expert. Maybe there is a better way to do that.

'AWSHTTPSConnection' object has no attribute 'server_hostname'

I am having problems getting the tts-ros1 code to work. I have permissions set up through the aws-cli and can run something like aws polly synthesize-speech --output-format mp3 --voice-id Ivy --text 'hello, this is a test' test.mp3, which works. I can also run the demo from the Polly docs without any problem.

I'm running on Kinetic. I have made my own config file that sets the region to "us-east-1" and modified the sample launch file to use the new config file.

I then run the launch file and get logging info that the synthesizer, polly, and sound_play nodes are running. When I send in a command using voicer.py, I get some info about the text that was received, that it will use Polly, and the request. This is the actual request that will be sent to AWS:

Node: /polly_node
Time: 12:14:10.334841966 (2019-05-22)
Severity: Info
Published Topics: /rosout

Amazon Polly Request: {'OutputFormat': 'ogg_vorbis', 'SpeechMarkTypes': [], 'VoiceId': 'Joanna', 'Text': 'hello', 'LexiconNames': [], 'SampleRate': '22050', 'TextType': 'text'}

Location:
amazonpolly.py:_synthesize_speech_and_save:295


But then the request fails to return properly and I get this error:

Node: /polly_node
Time: 12:14:10.472778081 (2019-05-22)
Severity: Error
Published Topics: /rosout

{"Audio Type": "ogg", "Audio File": "/opt/ros/kinetic/lib/python2.7/dist-packages/tts/data/error.ogg", "Traceback": "Traceback (most recent call last):\n File "/opt/ros/kinetic/lib/python2.7/dist-packages/tts/amazonpolly.py", line 352, in _node_request_handler\n response = self._dispatch(request)\n File "/opt/ros/kinetic/lib/python2.7/dist-packages/tts/amazonpolly.py", line 339, in _dispatch\n return actionsrequest.polly_action\n File "/opt/ros/kinetic/lib/python2.7/dist-packages/tts/amazonpolly.py", line 296, in _synthesize_speech_and_save\n response = self.polly.synthesize_speech(**kws)\n File "/usr/lib/python2.7/dist-packages/botocore/client.py", line 251, in _api_call\n return self._make_api_call(operation_name, kwargs)\n File "/usr/lib/python2.7/dist-packages/botocore/client.py", line 526, in _make_api_call\n operation_model, request_dict)\n File "/usr/lib/python2.7/dist-packages/botocore/endpoint.py", line 141, in make_request\n return self._send_request(request_dict, operation_model)\n File "/usr/lib/python2.7/dist-packages/botocore/endpoint.py", line 170, in _send_request\n success_response, exception):\n File "/usr/lib/python2.7/dist-packages/botocore/endpoint.py", line 249, in _needs_retry\n caught_exception=caught_exception, request_dict=request_dict)\n File "/usr/lib/python2.7/dist-packages/botocore/hooks.py", line 227, in emit\n return self._emit(event_name, kwargs)\n File "/usr/lib/python2.7/dist-packages/botocore/hooks.py", line 210, in _emit\n response = handler(**kwargs)\n File "/usr/lib/python2.7/dist-packages/botocore/retryhandler.py", line 183, in call\n if self._checker(attempts, response, caught_exception):\n File "/usr/lib/python2.7/dist-packages/botocore/retryhandler.py", line 251, in call\n caught_exception)\n File "/usr/lib/python2.7/dist-packages/botocore/retryhandler.py", line 269, in _should_retry\n return self._checker(attempt_number, response, caught_exception)\n File "/usr/lib/python2.7/dist-packages/botocore/retryhandler.py", line 317, in call\n caught_exception)\n File "/usr/lib/python2.7/dist-packages/botocore/retryhandler.py", line 223, in call\n attempt_number, caught_exception)\n File "/usr/lib/python2.7/dist-packages/botocore/retryhandler.py", line 359, in _check_caught_exception\n raise caught_exception\nAttributeError: 'AWSHTTPSConnection' object has no attribute 'server_hostname'\n", "Exception": {"Value": "'AWSHTTPSConnection' object has no attribute 'server_hostname'", "Type": "<type 'exceptions.AttributeError'>", "Name": "AttributeError", "Module": "exceptions"}}

Location:
amazonpolly.py:_node_request_handler:376


Any thoughts on what I am doing wrong?

Selecting the profile to use

I can't figure out how to select which AWS profile to use. The system works as long as my default profile has the permissions it needs.

It seems like the profile could be set here:

botocore_session = get_session()

It would be great to be able to just load in a rosparam to do that.
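A sketch of how a rosparam could feed the profile into the session, assuming boto3 is available. The parameter names aws_profile and region_name are hypothetical, and session_kwargs and make_polly_client are illustrative helpers, not existing code. boto3.session.Session does accept profile_name and region_name; with the botocore session used in the node, `botocore_session.set_config_variable("profile", name)` should have a similar effect. Separately, boto3 and botocore also read the standard AWS_PROFILE environment variable, so exporting it before roslaunch may already work (untested with this node's custom credential provider).

```python
def session_kwargs(params):
    """Build keyword arguments for boto3.session.Session from node parameters.

    `params` is a plain dict, e.g. filled from rospy.get_param() calls.
    The 'aws_profile' and 'region_name' keys are hypothetical parameter names.
    """
    kwargs = {}
    if params.get("aws_profile"):
        kwargs["profile_name"] = params["aws_profile"]
    if params.get("region_name"):
        kwargs["region_name"] = params["region_name"]
    return kwargs


def make_polly_client(params):
    """Create a Polly client using the selected profile (requires boto3)."""
    import boto3  # imported lazily so session_kwargs() works without boto3
    session = boto3.session.Session(**session_kwargs(params))
    return session.client("polly")
```

In a node this might be called as `make_polly_client({"aws_profile": rospy.get_param("~aws_profile", "")})`.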

integration_tests don't seem to run

It appears that the integration tests aren't run by colcon test, which seems to be causing code coverage problems. The other two test sets (test_unit_polly and test_unit_synthesizer) run just fine. I don't understand colcon well enough to know why this is.
