ros_speech2text's Introduction

Ros Speech2Text

A speech2text engine for ROS, using the Google Cloud Speech API.

Setup

Prerequisites

  • ros: Any version newer than ROS Indigo should work.
  • google cloud speech 0.22.0: Available here. Further instructions on API authentication can be found below.
  • PyAudio>=0.2.9: Python package for audio source fetching.
  • svox_tts: a SVOX-PICO based wrapper for text-to-speech. It is optional, but it is required if you wish to see status messages on the screen of your robot. Available here.

Installation

Installation of the package follows the standard build procedure for ROS packages. The following instructions are for catkin build, although the repository can also be compiled with the older catkin_make without issues.

  1. Compile the repo: catkin build ros_speech2text
  2. To test if the package is working, run roslaunch ros_speech2text ros_speech2text_async.launch.

Note for pyAudio on Ubuntu 14.04

The packaged version of pyAudio on Trusty is 0.2.7. Newer versions can be installed via pip install pyaudio. This, however, requires portaudio to be installed on the system, which can be found here.

Authentication Instructions

Authentication with the Google Cloud Speech API is done by setting an environment variable. For instructions on obtaining an API credential, check here. The path to the API credential should be supplied in the launch file; see below for more instructions.
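Before launching the node, it can be useful to check that the credential variable is actually set and points to an existing file. The snippet below is a minimal sketch of such a check; the function name check_google_credentials is hypothetical and is not part of this package. It only assumes the GOOGLE_APPLICATION_CREDENTIALS variable listed under the launch file parameters.

```python
import os


def check_google_credentials(env=None):
    """Return the credential path if the variable is set and the file exists.

    Hypothetical helper for illustration; not part of ros_speech2text.
    """
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(path):
        raise RuntimeError("credential file not found: %s" % path)
    return path
```

Calling check_google_credentials() with no arguments reads the current process environment and raises a RuntimeError with a short explanation if something is missing.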

Execution

Initial steps (mainly for Scazlab students)

  1. Turn on the robot. Wait for the robot to finish its start-up phase.
  2. Be sure that the system on which you're running the code has access to the Baxter robot. This is usually done by running the baxter.sh script that should be provided in your Baxter installation. See here for more info. @ScazLab students → for the Baxter robot in the ScazLab, this means that every time you run ROS software to be used on the robot, you should open a new terminal and do the following: cd ros_devel_ws && ./baxter.sh. A change in the terminal prompt should confirm that you now have access to baxter.local. Please be aware of this issue when you operate the robot.
  3. Untuck the robot. @ScazLab students → we have an alias for this, so you just have to type untuck

Launch file parameters

Public ROS params

  • /ros_speech2text/speech_history: location of the speech history for the session
  • GOOGLE_APPLICATION_CREDENTIALS: sets environment variable for Google Cloud APIs to work

Private ROS params

  • audio_device_idx: device ID of audio source.
  • audio_rate: rate for your audio capturing device
  • audio_threshold: volume threshold for static thresholding
  • enable_dynamic_threshold: enables dynamic thresholding
  • audio_dynamic_percentage: activate audio recording when the volume is this percentage higher than the average
  • audio_dynamic_frame: number of consecutive frames that must all be louder than the specified percentage before recording is activated
  • audio_min_avg: minimum value of the average volume, to prevent the system from being too sensitive in constantly quiet environments
  • speech_context: list of context clues for speech recognition
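To illustrate how the dynamic thresholding parameters above could interact, here is a minimal sketch. It is not the package's actual implementation: the class name DynamicThreshold, the history length, and the convention that the percentage is given as 50 for 50% are all assumptions made for this example.

```python
from collections import deque


class DynamicThreshold(object):
    """Hypothetical helper; not the implementation used by ros_speech2text."""

    def __init__(self, percentage, frames, min_avg, history=50):
        self.percentage = percentage          # plays the role of audio_dynamic_percentage
        self.frames = frames                  # plays the role of audio_dynamic_frame
        self.min_avg = min_avg                # plays the role of audio_min_avg
        self.volumes = deque(maxlen=history)  # volumes of recent non-loud frames
        self.loud_run = 0                     # consecutive loud frames seen so far

    def update(self, volume):
        """Feed one frame volume; return True when recording should start."""
        if self.volumes:
            avg = sum(self.volumes) / float(len(self.volumes))
        else:
            avg = volume
        # Clamp the average so a very quiet room does not make us oversensitive.
        avg = max(avg, self.min_avg)
        limit = avg * (1.0 + self.percentage / 100.0)
        if volume > limit:
            self.loud_run += 1
        else:
            self.loud_run = 0
            self.volumes.append(volume)
        return self.loud_run >= self.frames
```

With frames set to 3, a single loud frame followed by a quiet one resets the counter, so short noises do not trigger recording in this sketch.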

Recognition modes

Synchronous Recognition

The synchronous recognition mode can be launched by roslaunch ros_speech2text ros_speech2text_sync.launch. In the synchronous mode, after a sentence input is completed, the system makes a blocking API call, and all audio input is halted until the recognition results are returned from the server.

Asynchronous Recognition

The asynchronous recognition mode can be launched by roslaunch ros_speech2text ros_speech2text_async.launch. In this mode, a separate thread repeatedly polls the results of the async API calls, while the main thread keeps capturing audio and recording sentences.
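The difference between the two modes can be sketched without ROS or the Google API. In the example below, fake_recognize is a stand-in for the remote call, and run_sync and run_async are hypothetical names; the real node's structure may differ. The sketch is written for Python 3.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def fake_recognize(audio):
    """Stand-in for a remote speech API call (assumption for this sketch)."""
    time.sleep(0.05)
    return "transcript of %s" % audio


def run_sync(segments):
    """Blocking mode: nothing else happens while each request is in flight."""
    return [fake_recognize(s) for s in segments]


def run_async(segments, poll_interval=0.01):
    """Non-blocking mode: a polling thread collects results as they complete."""
    executor = ThreadPoolExecutor(max_workers=4)
    pending = []                    # submitted operations not yet completed
    results = []
    lock = threading.Lock()
    capturing = threading.Event()
    capturing.set()

    def poll():
        # Repeatedly check pending operations until capture ends and none remain.
        while True:
            with lock:
                for op in [op for op in pending if op.done()]:
                    pending.remove(op)
                    results.append(op.result())
                finished = not capturing.is_set() and not pending
            if finished:
                break
            time.sleep(poll_interval)

    poller = threading.Thread(target=poll)
    poller.start()
    for segment in segments:        # the main thread keeps "capturing"
        with lock:
            pending.append(executor.submit(fake_recognize, segment))
    capturing.clear()
    poller.join()
    executor.shutdown()
    return results
```

Note that run_async may return results in a different order than the input segments, since requests complete independently.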

Misc

The recognition results are published to the topic /ros_speech2text/user_output with the custom message type transcript.

Troubleshooting

  1. What if after catkin build, it seems like the ROS package still cannot be found?

    Run catkin clean and rospack profile, and try to build the package again.

  2. What if I don't know the device ID of my audio source?

    Run the node once, and use rosparam to get the param /ros_speech2text/available_audio_device. The devices are sorted by device ID starting from zero.

  3. Can I have multiple instances running at the same time?

    Yes, the private parameters can help you configure different audio sources for different nodes.
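As an alternative to reading the ROS parameter mentioned in question 2, the input devices can also be listed directly with PyAudio. This is a sketch, not code from this package: list_input_devices and print_devices are hypothetical names, while get_device_count and get_device_info_by_index are PyAudio methods.

```python
def list_input_devices(pa):
    """Return (device_id, name) pairs for devices that can record audio.

    `pa` is expected to be a pyaudio.PyAudio instance (or any object
    exposing the same two methods).
    """
    devices = []
    for idx in range(pa.get_device_count()):
        info = pa.get_device_info_by_index(idx)
        if info.get("maxInputChannels", 0) > 0:
            devices.append((idx, info["name"]))
    return devices


def print_devices():
    """Print the input devices found on this machine."""
    import pyaudio  # requires PyAudio and portaudio to be installed

    pa = pyaudio.PyAudio()
    try:
        for idx, name in list_input_devices(pa):
            print("%d: %s" % (idx, name))
    finally:
        pa.terminate()
```

Calling print_devices() prints one line per input device; the number at the start of each line is the value to use for audio_device_idx.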

ros_speech2text's People

Contributors

alecive, almighty-ken, jakebrawer, omangin, sstrohkorb

ros_speech2text's Issues

Things to do

  • Add and read the threshold variable from launch file
  • rename package to ros_speech2text
  • rename repo to ros_speech2text
  • rename launch file to ros_speech2text.launch
  • Set the google cloud credentials api variable in the launch file directly
  • Fix #2
  • Fix #3
  • Add to the documentation which libraries you need to install to run the code (and explain how to install them) (See #1)
  • Remove useless files in include folder
  • Test new launch file envvar method for API auth

Default launch file

Right now, there are two launch files in the repo: ros_speech2text_async.launch and ros_speech2text_sync.launch. It is very difficult for an external user to understand which one of the two is the default/better performing.

I would rename the default to ros_speech2text.launch (and leave the other one as it is), so that it is clear which one is the best to use.

Improve logging in `s2t/speech_detection.py`

– Remove unnecessary messages.
– Add callbacks for custom actions at start and end of utterance detection. Use them to send messages to the robot screen from the script instead of from the speech detection class.
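One possible shape for the callbacks proposed above is sketched here. The class UtteranceDetector and its fixed-threshold logic are a toy stand-in written for this example; they are not the detector in s2t/speech_detection.py.

```python
class UtteranceDetector(object):
    """Toy detector: an utterance is any run of frames louder than threshold.

    Hypothetical class for illustration only.
    """

    def __init__(self, threshold, on_start=None, on_end=None):
        self.threshold = threshold
        # Default to no-op callbacks so the caller can pass only what it needs.
        self.on_start = on_start or (lambda: None)
        self.on_end = on_end or (lambda: None)
        self.active = False

    def feed(self, volume):
        """Process one frame volume and fire callbacks on state changes."""
        loud = volume > self.threshold
        if loud and not self.active:
            self.active = True
            self.on_start()
        elif not loud and self.active:
            self.active = False
            self.on_end()
```

With this shape, the script that owns the detector could pass functions that send messages to the robot screen, keeping display code out of the detection class.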

Things to do Bis

  • Publish the list of available devices onto a param in the ros parameter server
  • Asynchronous call to the API (low priority)
  • Dynamic thresholding for filtering out loud but short noises
  • Parameter for dynamic thresholding in the parameter server and the launch file
  • Draft a first set of words to use as a context (to be put in the launch file)

CI/CD and stuff

As per title, I think that:

  • we should add travis to this repo
  • we should add the travis integration to the baxter_collaboration channel on slack
  • we should add Docker as with the other repositories

And this would be good training for @JakeBrawer (whom I have assigned to this task). Just copy and paste / take inspiration from https://github.com/ScazLab/human_robot_collaboration and/or https://github.com/ScazLab/baxter_tictactoe https://github.com/ScazLab/hrc_speech_prediction

Good luck! You have one week 😎

Fix ctrl-c slow exit problem

Often, during audio collection, pressing ctrl-c takes a very long time to terminate the program.
This is especially problematic when using roslaunch.

speech_history folder

This folder should be created in a user folder (to be read from the launch file, e.g. .ros/ros_speech2text), and it should be created automatically if it does not exist.
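A minimal sketch of the requested behaviour, assuming the folder path arrives as a plain string from the launch file. The function name ensure_history_dir is hypothetical.

```python
import os


def ensure_history_dir(path):
    """Expand '~' and create the directory (and its parents) if missing.

    Hypothetical helper for illustration; returns the absolute path.
    """
    full = os.path.abspath(os.path.expanduser(path))
    if not os.path.isdir(full):
        os.makedirs(full)
    return full
```

For example, ensure_history_dir("~/.ros/ros_speech2text") would create the folder on first use and simply return its path on later runs.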

Listening to display does not need to have the google cloud dependency

As per title. Cc @omangin

(~/code/catkin_my_ws) (master) 
[alecive@malakim]$ rosrun ros_speech2text listening_to_display.py 
Traceback (most recent call last):
  File "/home/alecive/code/catkin_my_ws/src/ros_speech2text/scripts/listening_to_display.py", line 8, in <module>
    from s2t.speech_recognition import SpeechRecognizer
  File "/home/alecive/code/catkin_my_ws/src/ros_speech2text/s2t/speech_recognition.py", line 10, in <module>
    from google.cloud import speech
ImportError: No module named cloud

Improve testing

Test utterance detection against synthetic data (as unit tests) and annotated wav files (functional tests).

Multiuser Support and Other To-Do

  • Allow user to choose audio device if multi_user interactive mode enabled
  • Allow user to specify user_id if multi_user interactive mode enabled
  • Make param private

Improve documentation

@almighty-ken At some point, you should provide this repo with a proper description and a README.md file, and think about thoroughly documenting the code. We would like this to be used by other users, and it would be nice to have all of this information!

Fix ALSA problem

Fix the problem ALSA lib pcm_dmix.c:1022:(snd_pcm_dmix_open) unable to open slave

Node crashes if the device is not available

If the mic is off (and the device is then not available), the node crashes.

We can implement two strategies here:

  • close the node gracefully because if there is no device there is no way to do STT
  • keep the node running and waiting until the device becomes available

Both strategies make sense to me.
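The two strategies can be combined in a small retry helper, sketched below. The name open_device_with_retry is hypothetical, and the sketch assumes that a missing device surfaces as an IOError or OSError when the stream is opened; the actual exception raised in the node may differ.

```python
import time


def open_device_with_retry(open_fn, retries=5, delay=1.0, sleep=time.sleep):
    """Call open_fn until it succeeds; return None if the device never appears.

    open_fn is any zero-argument callable that opens the audio stream.
    Hypothetical helper for illustration only.
    """
    for attempt in range(retries):
        try:
            return open_fn()
        except (IOError, OSError):
            # Wait before the next attempt, but not after the last one.
            if attempt < retries - 1:
                sleep(delay)
    return None
```

A caller that gets None back can log an error and shut the node down cleanly (first strategy), while a large retries value approximates waiting until the device becomes available (second strategy).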

Random issues are popping up

[WARN][/ros_speech2text::generate_msg]: a lot of interesting debate about, confidence:0.962759
[INFO][/ros_speech2text::treat_chunk]: collecting audio segment
[INFO][/ros_speech2text::aux]: audio segment completed
[ERROR][/ros_speech2text::check_operation]: Error in speech recognition thread: <_Rendezvous of RPC that terminated with (StatusCode.UNAVAILABLE, Endpoint read failed)>
[ERROR][/ros_speech2text::check_operation]: Error in speech recognition thread: a float is required
[ERROR][/ros_speech2text::check_operation]: Error in speech recognition thread: a float is required
[ERROR][/ros_speech2text::check_operation]: Error in speech recognition thread: a float is required
NODES
  /
    ros_speech2text (ros_speech2text/ros_s2t.py)

ROS_MASTER_URI=http://baxter.local:11311

core service [/rosout] found
process[ros_speech2text-1]: started with pid [7814]
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
bt_audio_service_open: connect() failed: Connection refused (111)
bt_audio_service_open: connect() failed: Connection refused (111)
bt_audio_service_open: connect() failed: Connection refused (111)
bt_audio_service_open: connect() failed: Connection refused (111)
[INFO][/ros_speech2text::main]: Using device: Samson RXD wireless receiver: USB Audio (hw:1,0)
[INFO][/ros_speech2text::treat_chunk]: collecting audio segment
[INFO][/ros_speech2text::aux]: audio segment completed
[WARN][/ros_speech2text::generate_msg]: dependency, confidence:0.548864
[ERROR][/ros_speech2text::check_operation]: Error in speech recognition thread: field speech_duration must be of type Time
[INFO][/ros_speech2text::treat_chunk]: collecting audio segment
[WARN][/ros_speech2text::generate_msg]: dependency, confidence:0.548864
