
Speech to text Service

This service implements a REST API for recognizing speech in audio files using Google Cloud Speech-to-Text.

Installation

  1. Create a Google Cloud account.
  2. Create a credentials.json file according to the documentation.
  3. Enable the Speech-to-Text service (press the "Go to console" button and enable the Cloud Speech-to-Text API).
  4. Create a Google Cloud Storage bucket following the documentation.
  5. Run export GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json, where /path/to/credentials.json is the path to the JSON file from step 2.
  6. Run export BUCKET_NAME=your_bucket, where your_bucket is the name of the bucket from step 4.
  7. Run docker-compose build.
  8. Run docker-compose up -d. To follow the application logs, use docker-compose logs -f.
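Steps 5 and 6 are easy to get wrong silently, so a quick preflight check before step 7 can help. The sketch below is a hypothetical helper (check_environment is not part of this project); it only assumes that "correctly set" means both variables are non-empty and that the credentials path points to an existing file.

```python
import os
from typing import List, Mapping, Optional


def check_environment(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of problems with the variables from steps 5 and 6.

    Hypothetical helper: the variable names come from this README, but the
    checks (non-empty values, credentials file exists) are an assumption.
    """
    env = os.environ if env is None else env
    problems = []
    creds_path = env.get("GOOGLE_APPLICATION_CREDENTIALS", "")
    if not creds_path:
        problems.append("GOOGLE_APPLICATION_CREDENTIALS is not set")
    elif not os.path.isfile(creds_path):
        # The variable is set but does not point to a readable file.
        problems.append(f"credentials file not found: {creds_path}")
    if not env.get("BUCKET_NAME", ""):
        problems.append("BUCKET_NAME is not set")
    return problems
```

Calling check_environment() in the same shell session as the export commands and printing any returned messages catches a missing or mistyped variable before the containers are built.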

Google credentials

Two environment variables must be set before the application starts:
GOOGLE_APPLICATION_CREDENTIALS - the path to the default credentials file, used for every enterprise that has no credentials of its own in a separate config (see below).
BUCKET_NAME - the name of the default bucket, used for every enterprise that has no bucket configured separately.

Add credentials for specific enterprises

There are two ways to add credentials:

  1. Using config
  2. Using API (which adds the submitted settings to the config described in option 1)

Using config

config/buckets.json maps enterprise IDs to bucket names, for example:

{
  "3": "bucket-name-for-enterprise-3",
  "4": "bucket-name-for-enterprise-4"
}

Credentials for each enterprise go in config/credentials/<enterprise_id>.json; for example, config/credentials/3.json holds the credentials for the enterprise with ID 3.
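The fallback rules above (enterprise-specific entry if present, otherwise the defaults from BUCKET_NAME and GOOGLE_APPLICATION_CREDENTIALS) can be sketched as follows. resolve_enterprise_config is a hypothetical function that mirrors the described behavior; it is not taken from the service's source code.

```python
import json
import os
from typing import Mapping, Optional, Tuple


def resolve_enterprise_config(
    enterprise_id: int,
    config_dir: str = "config",
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, str]:
    """Return (bucket_name, credentials_path) for one enterprise.

    Hypothetical sketch of the README's fallback rules, not the real service code.
    """
    env = os.environ if env is None else env
    key = str(enterprise_id)  # JSON object keys are strings, e.g. "3"

    # Bucket: entry in config/buckets.json if present, else BUCKET_NAME.
    bucket = env["BUCKET_NAME"]
    buckets_file = os.path.join(config_dir, "buckets.json")
    if os.path.isfile(buckets_file):
        with open(buckets_file, encoding="utf-8") as f:
            bucket = json.load(f).get(key, bucket)

    # Credentials: config/credentials/<enterprise_id>.json if present,
    # else the default file from GOOGLE_APPLICATION_CREDENTIALS.
    creds = os.path.join(config_dir, "credentials", f"{key}.json")
    if not os.path.isfile(creds):
        creds = env["GOOGLE_APPLICATION_CREDENTIALS"]
    return bucket, creds
```

With the buckets.json example above, enterprise 3 would resolve to bucket-name-for-enterprise-3, while an enterprise with no entry falls back to the value of BUCKET_NAME.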

Using API

POST localhost:7070/getTexts


Body Example:

{
  "credentials": {
    "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
    "auth_uri": "https://accounts.google.com/o/oauth2/auth",
    "client_email": "speech-to-text@seraphic-vertex-234234.iam.gserviceaccount.com",
    "client_id": "12341234123412341234",
    "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/speech-to-text%40seraphic-vertex-234234.iam.gserviceaccount.com",
    "private_key": "-----BEGIN PRIVATE KEY-----\nSomePrivateKey...\n-----END PRIVATE KEY-----\n",
    "private_key_id": "SomePrivateKey",
    "project_id": "seraphic-vertex-234234",
    "token_uri": "https://oauth2.googleapis.com/token",
    "type": "service_account"
  },
  "bucketName": "someBucketName",
  "enterpriseId": 1
}

This endpoint checks that the bucket exists and that the credentials are valid. If either check fails, the server returns HTTP 409; on success it returns HTTP 200.
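A minimal client sketch for this endpoint, using only the Python standard library. build_credentials_payload and register_credentials are hypothetical names; the service_account type check is a client-side assumption based on the example body, not documented server behavior. The URL is passed in by the caller, since it depends on where the service is deployed.

```python
import json
import urllib.error
import urllib.request


def build_credentials_payload(credentials: dict, bucket_name: str, enterprise_id: int) -> bytes:
    """Serialize a body with the same shape as the example above."""
    # Client-side sanity check (assumption): the example body is a
    # Google service account key.
    if credentials.get("type") != "service_account":
        raise ValueError("credentials must be a service account key")
    body = {
        "credentials": credentials,
        "bucketName": bucket_name,
        "enterpriseId": enterprise_id,
    }
    return json.dumps(body).encode("utf-8")


def register_credentials(url: str, payload: bytes) -> bool:
    """POST the payload; True on HTTP 200, False on HTTP 409 (invalid bucket or credentials)."""
    request = urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 409:
            return False
        raise  # any other error status is unexpected; let the caller see it
```

In practice the credentials dict would come from json.load on the enterprise's key file, and the URL is the address at which this endpoint is served (this README lists it as localhost:7070/getTexts).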


Speech to text Usage

The REST API has a single endpoint for speech recognition:

POST localhost:7070/getTexts


Body Example:

[
  {
    "uuid": "A23D3",
    "fileUrl": "https://some-site.com/some_audio.wav",
    "language": "en"
  },
  {
    "fileUrl": "https://some-site.com/some_audio2.wav",
    "language": "en"
  }
]

uuid is optional.

fileUrl and language are required. Supported languages: en, it, de, fr, nl, es, ca, gl, pt, pl, ro, el, da, eu, ru, bg, sl, sr, hr.
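The request rules above (array of items, fileUrl and language required, uuid optional, fixed list of languages) can be checked on the client before sending. This is a sketch with hypothetical helper names (make_item, get_texts); the default URL mirrors the endpoint above, and the 300-second timeout is an arbitrary assumption to allow for long audio files.

```python
import json
import urllib.request
from typing import List, Optional

# Language codes this README lists as supported.
SUPPORTED_LANGUAGES = frozenset({
    "en", "it", "de", "fr", "nl", "es", "ca", "gl", "pt", "pl",
    "ro", "el", "da", "eu", "ru", "bg", "sl", "sr", "hr",
})


def make_item(file_url: str, language: str, uuid: Optional[str] = None) -> dict:
    """Build one request entry; fileUrl and language are required, uuid is optional."""
    if not file_url:
        raise ValueError("fileUrl is required")
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    item = {} if uuid is None else {"uuid": uuid}
    item.update({"fileUrl": file_url, "language": language})
    return item


def get_texts(items: List[dict], url: str = "http://localhost:7070/getTexts",
              timeout: float = 300.0) -> List[dict]:
    """POST the array of items and return the decoded JSON response array."""
    request = urllib.request.Request(
        url,
        data=json.dumps(items).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())
```

For the body example above, the call would be get_texts([make_item("https://some-site.com/some_audio.wav", "en", uuid="A23D3"), make_item("https://some-site.com/some_audio2.wav", "en")]).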


Response Example:

[
  {
    "uuid": "A23D3",
    "fileUrl": "https://some-site.com/some_audio.wav",
    "text": "Good morning, and welcome to WWDC. WDC is incredibly important and our users..",
    "duration": 15,
    "language": "en"
  },
  {
    "uuid": "",
    "fileUrl": "https://some-site.com/some_audio2.wav",
    "text": "It's sure that we bring some of our biggest. I have a chance to live and we have not stopped, <.....>",
    "duration": 45,
    "language": "en"
  }
]

text is the recognized text.

duration is the duration of the WAV file in seconds, expressed as a multiple of 15 in line with Google's billing increments.
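To estimate the reported duration for a file of known length, the sketch below assumes the value is rounded up to the next 15-second increment, which is how Google's 15-second billing increments have been applied; the service's exact rounding rule is not stated here. billed_duration is a hypothetical helper.

```python
import math


def billed_duration(seconds: float, increment: int = 15) -> int:
    """Round an audio length up to the next multiple of `increment` seconds.

    Assumption: rounding is upward, matching 15-second billing increments.
    """
    if seconds < 0:
        raise ValueError("duration cannot be negative")
    return math.ceil(seconds / increment) * increment
```

For example, a 31-second file maps to 45, while the values in the sample response (15 and 45) are already multiples of 15 and stay unchanged.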

Contributors

devxpro, sergius71, romansandsiv, lnglr, shcherbak
