
saikrishnar / cloud-asr


This project forked from ufal-dsg/cloud-asr


Cloud-based Automatic Speech Recognition (ASR) platform and a public ASR webservice.

Home Page: http://www.cloudasr.com

License: Apache License 2.0

Makefile 6.71% Python 69.86% Protocol Buffer 0.99% CSS 0.05% HTML 12.01% JavaScript 6.23% Shell 4.16%


CloudASR

CloudASR is a software platform and a public ASR webservice. Its three main features are:

  • automatic scalability with increased workload
  • ease of deployment
  • state-of-the-art incremental speech recognition performance

The platform's API supports both batch and incremental speech recognition. The batch version is compatible with the Google Speech API. New ASR engines can be added to the platform and can run simultaneously.

Installation

To run CloudASR, Docker must be installed on the host machine. You can follow the instructions for your distribution at http://docs.docker.com/installation/. You also need to download the Docker images, which you can do by typing make pull. Be aware that the images take up several GB.

Running CloudASR locally

Type make run_locally and, after a short while, everything will be running. Open http://localhost:8001 to see which workers are running, or open http://localhost:8000 to try out our interactive web demo.

Running CloudASR on Mesos cluster

To run CloudASR on a Mesos cluster, update marathon_url and master_ip in the mesos.json configuration file. Set master_ip to the IP address of the Mesos slave where the CloudASR master should run:

{
    "domain": "cloudasr.com",
    "marathon_url": "localhost:8080",
    "master_ip": "127.0.0.1",
    "workers": [
        {"image": "ufaldsg/cloud-asr-worker-en-voxforge", "instances": 1},
        {"image": "ufaldsg/cloud-asr-worker-en-wiki", "instances": 1},
        {"image": "ufaldsg/cloud-asr-worker-en", "instances": 1},
        {"image": "ufaldsg/cloud-asr-worker-cs", "instances": 1},
        {"image": "ufaldsg/cloud-asr-worker-cs-alex", "instances": 1}
    ]
}

Note that the suffix of the worker image name corresponds to the lang parameter used in the Batch and Online APIs. For example, ufaldsg/cloud-asr-worker-en handles requests with the parameter lang=en.
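As a worked example of this naming convention, the following Python sketch loads a mesos.json-style configuration and reports which lang value each worker image would serve, together with its instance count. It assumes every image name has the form <owner>/cloud-asr-worker-<lang>; the helper names lang_from_image and worker_langs are hypothetical and are not part of CloudASR.

```python
import json

# Assumed naming pattern: "<owner>/cloud-asr-worker-<lang>".
WORKER_PREFIX = "cloud-asr-worker-"


def lang_from_image(image: str) -> str:
    """Return the lang suffix of a worker image name (hypothetical helper)."""
    # Drop the Docker Hub owner, e.g. "ufaldsg/".
    name = image.rsplit("/", 1)[-1]
    if not name.startswith(WORKER_PREFIX):
        raise ValueError(f"not a CloudASR worker image: {image!r}")
    return name[len(WORKER_PREFIX):]


def worker_langs(config_text: str) -> dict:
    """Map each lang to the total number of worker instances configured for it."""
    config = json.loads(config_text)
    langs = {}
    for worker in config["workers"]:
        lang = lang_from_image(worker["image"])
        langs[lang] = langs.get(lang, 0) + worker["instances"]
    return langs


if __name__ == "__main__":
    example = """
    {
        "domain": "cloudasr.com",
        "marathon_url": "localhost:8080",
        "master_ip": "127.0.0.1",
        "workers": [
            {"image": "ufaldsg/cloud-asr-worker-en-wiki", "instances": 1},
            {"image": "ufaldsg/cloud-asr-worker-cs", "instances": 2}
        ]
    }
    """
    print(worker_langs(example))  # {'en-wiki': 1, 'cs': 2}
```

Summing instances per lang makes it easy to check that every language you intend to offer has at least one worker before running make run_mesos.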

Then type make run_mesos; after a while, you should see the running instances in the Marathon console. Next, start a load balancer on the server associated with the domain specified in mesos.json by typing:

docker run -p 80:80 -e MARATHON_URL=localhost:8080 -d choko/haproxy

You should then be able to see the demo page at http://demo.cloudasr.com and the monitor page at http://monitor.cloudasr.com.

How to use CloudASR

CloudASR provides two modes of speech recognition: online recognition and batch recognition. The following sections describe how to use them.

Batch API

The Batch API is compatible with the Google Speech API, but it currently supports only WAV files and JSON output. Use the lang parameter to specify the language for speech recognition. The following language models are currently available:

  • en-voxforge - English (Voxforge AM+Wikipedia LM)
  • en-wiki - English (TED AM+Wikipedia LM)
  • en-towninfo - English (VYSTADIAL TownInfo AM+LM)
  • cs - Czech (VYSTADIAL AM + Wikipedia LM)
  • cs-alex - Czech (VYSTADIAL AM + PTIcs LM)

To transcribe English speech in a file named recording.wav, send the following curl request:

curl -X POST --data-binary @recording.wav --header 'Content-Type: audio/x-wav; rate=16000;' 'http://localhost:8000/recognize?lang=en-towninfo'

and you should get a response similar to this:

{
  "result": [
    {
      "alternative": [
        {
          "confidence": 0.5549500584602356,
          "transcript": "I'M LOOKING FOR A BAR"
        },
        {
          "confidence": 0.14846260845661163,
          "transcript": "I AM LOOKING FOR A BAR"
        },
        {
          "confidence": 0.08276544511318207,
          "transcript": "I'M LOOKING FOR A RESTAURANT"
        },
        {
          "confidence": 0.06668572872877121,
          "transcript": "I AM LOOKING FOR A RESTAURANT"
        }
      ],
      "final": true
    }
  ],
  "result_index": 0
}
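The same request can also be sent from Python using only the standard library. This is a minimal sketch, assuming a local instance at http://localhost:8000 and a response shaped like the one above; build_recognize_request, best_transcript and recognize are hypothetical helper names, and choosing the alternative with the highest confidence is our own choice rather than documented API behavior. The sketch does not handle an empty result list.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL of a locally running CloudASR instance (see "Running CloudASR locally").
BASE_URL = "http://localhost:8000"


def build_recognize_request(wav_bytes: bytes, lang: str, rate: int = 16000,
                            base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the same POST request as the curl example above."""
    query = urllib.parse.urlencode({"lang": lang})
    return urllib.request.Request(
        f"{base_url}/recognize?{query}",
        data=wav_bytes,
        headers={"Content-Type": f"audio/x-wav; rate={rate};"},
        method="POST",
    )


def best_transcript(response: dict) -> tuple:
    """Return (transcript, confidence) of the most confident alternative."""
    alternatives = response["result"][0]["alternative"]
    best = max(alternatives, key=lambda alt: alt["confidence"])
    return best["transcript"], best["confidence"]


def recognize(path: str, lang: str) -> tuple:
    """Send a WAV file to the Batch API and return the best hypothesis."""
    with open(path, "rb") as f:
        request = build_recognize_request(f.read(), lang)
    with urllib.request.urlopen(request) as resp:
        return best_transcript(json.load(resp))
```

For example, recognize("recording.wav", "en-towninfo") would return the top transcript and its confidence, such as ("I'M LOOKING FOR A BAR", 0.5549500584602356) for the response above.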

Online API

The Online API uses Socket.IO to transfer PCM chunks to the CloudASR server. Messages have the following format:

From Client to Server

  • First, start the recognition by sending the language to use:

    socketio.emit('begin', {'lang': 'en-GB'})
  • Then send PCM chunks to the server. Each chunk is an array of 16-bit PCM samples.

    socketio.emit('chunk',  {'chunk': [128, 123, 15,..., 25], 'frame_rate': 16000})
  • Finally, end the recognition by sending the following message:

    socketio.emit('end', {})
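The message sequence above can be sketched in Python as follows. This assumes a mono 16-bit WAV input and that each chunk is sent as a list of signed sample values, as in the example; CHUNK_SAMPLES = 4000 (a quarter of a second at 16 kHz) is an arbitrary choice, and the emit argument stands in for whatever Socket.IO client you use, for example the emit method of a python-socketio client. The names pcm_samples, recognition_messages and stream are hypothetical.

```python
import array
import sys
import wave

# Assumed chunk size in samples; the protocol description does not prescribe one.
CHUNK_SAMPLES = 4000


def pcm_samples(wav_path: str) -> tuple:
    """Read a mono 16-bit WAV file and return (samples, frame_rate)."""
    with wave.open(wav_path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected mono 16-bit PCM audio")
        samples = array.array("h", wav.readframes(wav.getnframes()))
        # WAV data is little-endian; fix the byte order on big-endian hosts.
        if sys.byteorder == "big":
            samples.byteswap()
        return samples.tolist(), wav.getframerate()


def recognition_messages(samples: list, frame_rate: int, lang: str,
                         chunk_samples: int = CHUNK_SAMPLES):
    """Yield (event, payload) pairs: one 'begin', the 'chunk' messages, one 'end'."""
    yield "begin", {"lang": lang}
    for start in range(0, len(samples), chunk_samples):
        yield "chunk", {"chunk": samples[start:start + chunk_samples],
                        "frame_rate": frame_rate}
    yield "end", {}


def stream(emit, samples: list, frame_rate: int, lang: str) -> None:
    """Send every message through an emit(event, payload) callable."""
    for event, payload in recognition_messages(samples, frame_rate, lang):
        emit(event, payload)
```

Keeping message construction separate from the transport makes the sequence easy to inspect or test without a running server.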

From Server to Client

The server responds to every chunk with a message containing interim results:

{
    "status": 0,
    "final": false,
    "result": {
        "hypotheses": [
            {"transcript": "I AM LOOKING"}
        ]
    }
}

At the end of the recognition, the server sends the final hypotheses in the following format:

{
    "result": [
        {
            "alternative": [
                {"confidence": 0.5364137887954712, "transcript": "I AM LOOKING FOR A MY"},
                {"confidence": 0.46358612179756165, "transcript": "I'M LOOKING FOR A MY"}
            ],
            "final": true
        }
    ],
    "result_index": 0
}
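Interim messages carry result as an object with a hypotheses list, while the final message carries result as a list of alternatives, so a client can tell them apart by the type of result. A minimal sketch, assuming messages arrive already decoded from JSON into Python dicts; handle_message is a hypothetical name, and taking the first interim hypothesis and the highest-confidence final alternative is our own choice.

```python
def handle_message(message: dict) -> tuple:
    """Classify a server message as interim or final and extract its text."""
    result = message["result"]
    if isinstance(result, dict):
        # Interim: {"status": 0, "final": false, "result": {"hypotheses": [...]}}
        hypotheses = result.get("hypotheses", [])
        text = hypotheses[0]["transcript"] if hypotheses else ""
        return "interim", text
    # Final: {"result": [{"alternative": [...], "final": true}], "result_index": 0}
    alternatives = result[0]["alternative"] if result else []
    if not alternatives:
        return "final", ""
    best = max(alternatives, key=lambda alt: alt.get("confidence", 0.0))
    return "final", best["transcript"]
```

A UI would typically overwrite a live caption on each "interim" result and commit the text once a "final" result arrives.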

Note that in the near future the Online API will switch from Socket.IO to binary WebSockets to reduce traffic.

Using CloudASR's SpeechRecognition.js library

To use speech recognition on your website, you can use our JavaScript library. Add these scripts to your HTML:

<script src="http://www.cloudasr.com/js/socket.io.js"></script>
<script src="http://www.cloudasr.com/js/Recorder.js"></script>
<script src="http://www.cloudasr.com/js/SpeechRecognition.js"></script>

You can then use SpeechRecognition as follows:

var speechRecognition = new SpeechRecognition();

// Callbacks invoked by the library; here they simply log to the console.
speechRecognition.onStart = function() {
    console.log("Recognition started");
};

speechRecognition.onEnd = function() {
    console.log("Recognition ended");
};

speechRecognition.onError = function(error) {
    console.log("Error occurred: " + error);
};

speechRecognition.onResult = function(result) {
    console.log(result);
};

var lang = "en-wiki";

// The button handlers below use jQuery ($), which must be loaded separately.
$("#button_start").click(function() {
    speechRecognition.start(lang);
});

$("#button_stop").click(function() {
    speechRecognition.stop();
});

You can also look at the source code of our demo page (index.html, main.js).

Privacy & Terms

All data, including audio recordings, is stored for the purpose of improving ASR quality. Note that the data may be shared with third parties for both research and commercial purposes. All collected data will be made available to the ASR community; therefore, do not say anything you would not want anyone to know.

The service is free of charge; as a result, no guarantees are given regarding the quality of the ASR results. It is currently a beta product, so things may break and the service may be unavailable for long periods of time.

Contact us

The CloudASR platform is developed by the Dialogue Systems Group at UFAL. The work is funded by the Ministry of Education, Youth and Sports of the Czech Republic under the grant agreement LK11221 and by the core research funding of Charles University in Prague. The language resources presented in this work are stored and distributed by the LINDAT/CLARIN project of the Ministry of Education, Youth and Sports of the Czech Republic (project LM2010013).

If you have any questions about CloudASR, you can reach us at our mailing list: [email protected].


Contributors

ondrejklejch, ticcky, oplatek, jurcicek
