ROS package for using Google Cloud Speech and a TCP connection to send speech to text information to a tablet


ROS Speech2Text with Ubuntu 16.04 and an updated Google Speech Client, for use with the Team Meeting Jibo Project

This was used in the "Team Meeting Project" with the Jibo robots. All information is current as of 2018-07-16.

A speech2text engine for ROS (WARNING: the version on this branch uses Ubuntu 16.04 with ROS Kinetic, NOT Indigo), using the updated Google Cloud Speech API (Python Client Library v0.27).

For setting up ROS and all that fun stuff, look here. Just make sure to replace any instance of the word "indigo" with "kinetic", because Kinetic is the version of ROS for Ubuntu 16.04. You should also cross-reference this as you go, to make sure anything Kinetic-specific is done properly.

Once you have built and sourced your new catkin_ws, make your scripts executable. In a terminal, go to the location of your Python scripts; for this project, they are in the scripts folders. Once there, run chmod +x [name of file], then open a new terminal and you should be able to run them with roslaunch.

You may also run into some problems with pip if you are starting from scratch (i.e. a fresh installation of Ubuntu 16.04). One error I got repeatedly was AttributeError: 'module' object has no attribute 'SSL_ST_INIT'. To fix it, run sudo python -m easy_install --upgrade pyOpenSSL and it should be fine.

For most other warnings and errors, something like sudo pip install [name] --upgrade should be enough.

Just make sure that before trying pip install pyaudio, you first run sudo apt-get install libasound-dev portaudio19-dev libportaudio2 libportaudiocpp0

NOTE: "SAMSON STAGE PXD1" microphones were used in our project. If you want to use them again, open up the back and use a screwdriver to set the gain to somewhere near the third tick from the bottom. This sets the sensitivity to a level that easily detects the wearer's voice but not other sounds or voices. Test this yourself, though: depending on the surroundings, you may need higher or lower sensitivity. Also, make sure the mics directly face your mouth rather than sit off to the side, or you may get unpleasant results.

We transitioned away from the SAMSON mics to the Plantronics Blackwire C310-M headsets because the SAMSON mics had an unreliable connection and unreliable speech recording. The Plantronics headsets are also much cheaper and require less tweaking: just plug them in and start speaking. In terms of starting the nodes and running the files, everything remains the same.

When creating individual nodes for the mics, set the mics by their names rather than their numerical ids. The numerical id often changes and sometimes simply doesn't work, so using the name "hw:#,0" works much better. Just don't assume the hardware numbers for the mics start at 1; that is not always the case, especially if you have a dedicated graphics card or something similar. If you run the launch file and keep getting an error along the lines of being unable to find the mic, close all your open terminals, restart roscore, and relaunch the files. You shouldn't need to do this, but it should always work (given that all your code is correct).
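The name-based lookup described above can be sketched in plain Python. This is an illustration, not code from this package: the helper find_device_index and the sample device list are hypothetical, and in the real node the list would be built with PyAudio (p.get_device_info_by_index(i)['name'] for each of p.get_device_count() devices).

```python
# Minimal sketch of picking a mic by name instead of numerical id.
# The device list below is sample data; in practice it would come from
# pyaudio: [(i, p.get_device_info_by_index(i)['name'])
#           for i in range(p.get_device_count())]
def find_device_index(devices, name_fragment):
    """Return the index of the first device whose name contains name_fragment."""
    for index, name in devices:
        if name_fragment in name:
            return index
    raise LookupError("no device matching %r" % name_fragment)

# Note that the first mic is not necessarily device 1: here the onboard
# sound card and a graphics card's HDMI audio occupy the early slots.
sample_devices = [
    (0, "HDA Intel PCH: ALC892 Analog (hw:0,0)"),
    (1, "HDA NVidia: HDMI 0 (hw:1,3)"),
    (2, "USB wireless receiver: USB Audio (hw:2,0)"),
]

print(find_device_index(sample_devices, "hw:2,0"))  # -> 2
```

Matching on the "hw:#,0" name survives replugging and reboot ordering better than assuming a fixed numerical id, which is exactly the failure mode described above.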

Running in terminal:

In a terminal, make sure you run roscore first, before trying to run the other files.

Once you have roscore up and running, open another terminal window or tab and run roslaunch team_meeting_project send_speech.launch type:=tablet or roslaunch team_meeting_project send_speech.launch type:=local, depending on whether you want to run the code with a TCP connection to a tablet or locally without the TCP connection.
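To make the tablet mode concrete, here is a generic sketch of pushing one recognized utterance over TCP. Everything about the framing (newline-terminated UTF-8 JSON, the "speaker"/"transcript" keys) is an assumption chosen for illustration, NOT the actual protocol of the team_meeting_project node; the tiny in-process server stands in for the tablet so the example is self-contained.

```python
# Generic sketch of sending a recognized utterance to a "tablet" over TCP.
# The message format here is hypothetical, not this package's protocol.
import json
import socket
import threading

ready = threading.Event()
server = {}

def serve_once(results):
    """Tiny stand-in for the tablet: accept one connection, read one line."""
    with socket.socket() as srv:
        srv.bind(("127.0.0.1", 0))          # port 0 = let the OS pick a free port
        srv.listen(1)
        server["port"] = srv.getsockname()[1]
        ready.set()                          # tell the sender we are listening
        conn, _ = srv.accept()
        with conn, conn.makefile("rb") as f:
            results.append(json.loads(f.readline().decode("utf-8")))

def send_transcript(port, speaker, text):
    """Send one newline-terminated JSON transcript message."""
    with socket.create_connection(("127.0.0.1", port)) as sock:
        msg = json.dumps({"speaker": speaker, "transcript": text}) + "\n"
        sock.sendall(msg.encode("utf-8"))

received = []
t = threading.Thread(target=serve_once, args=(received,))
t.start()
ready.wait()
send_transcript(server["port"], "mic1", "hello jibo")
t.join()
print(received[0]["transcript"])  # -> hello jibo
```

In type:=local mode no such connection is made and the text stays on the ROS side.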

In another terminal window or tab, run roslaunch ros_speech2text ros_speech2text.launch to run with only one mic, or roslaunch ros_speech2text ros_speech2text_[2, 3, or 4]mics.launch depending on how many mics you want to run with. For example, to run with 3 mics, I would run roslaunch ros_speech2text ros_speech2text_3mics.launch
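The launch-file naming convention above (no suffix for one mic, _Nmics otherwise) can be spelled out in a couple of lines. The helper names launch_file and roslaunch_cmd are hypothetical, just to make the pattern explicit:

```python
# Spells out the launch-file naming convention described above:
# 1 mic  -> ros_speech2text.launch
# N mics -> ros_speech2text_Nmics.launch (N in 2..4)
def launch_file(num_mics):
    if num_mics == 1:
        return "ros_speech2text.launch"
    if 2 <= num_mics <= 4:
        return "ros_speech2text_%dmics.launch" % num_mics
    raise ValueError("launch files exist for 1-4 mics only")

def roslaunch_cmd(num_mics):
    return "roslaunch ros_speech2text " + launch_file(num_mics)

print(roslaunch_cmd(3))  # -> roslaunch ros_speech2text ros_speech2text_3mics.launch
```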

Note: If the mics become, for whatever reason, out of order (i.e. mic 1 is no longer associated with pid 1), unplug all of the mics and run the launch files in order of increasing mic count, adding the corresponding mic one at a time. For example, with all of the mics unplugged, insert the receiver for mic 1 and run roslaunch ros_speech2text ros_speech2text.launch, then add the second mic receiver and run the launch file for 2 mics, and so on. You may need to exit all open terminals and restart them.

If you want to control whether or not to use the start_utterance messages, look in the ros_speech2text launch files and find the parameter enable_start_utterance.
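For reference, toggling that parameter looks something like the fragment below. Only the parameter name enable_start_utterance comes from this README; the surrounding launch structure, node name, and script name are generic roslaunch placeholders, not this package's actual launch file.

```xml
<!-- Sketch only: the node name/type here are placeholders. -->
<launch>
  <node name="speech2text" pkg="ros_speech2text" type="speech2text.py" output="screen">
    <!-- Set to true to publish start_utterance messages, false to suppress them. -->
    <param name="enable_start_utterance" value="false"/>
  </node>
</launch>
```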

Using the updated Google Cloud Speech-to-Text API

Take a look at these pages (navigating Google's documentation can be kind of annoying sometimes):

first place to look

second place to look

(The first link gives the best comprehensive overview of the API while the second link gives better code examples.)

migration from old Google API

stable version of speech client

documentation on methods

beta version of client with added functionality like auto punctuation

auto punctuation documentation (I don't really notice a difference in terms of recognition speed, so it could be cool to keep testing this out)

For information on how to analyze the transcript for things like getting the sentiment of the sentence or grabbing the nouns and verbs of the sentence, look here:

first place to look

second place to look

for analyzing syntax

