
cognitive-services-speech-sdk-js's Introduction


Microsoft Cognitive Services Speech SDK for JavaScript

The Microsoft Cognitive Services Speech SDK for JavaScript is the JavaScript version of the Microsoft Cognitive Services Speech SDK. An in-depth description of the feature set, functionality, supported platforms, and installation options is available here.

The JavaScript version of the Cognitive Services Speech SDK supports browser scenarios as well as the Node.js environment.

Installing

For the latest stable version:

npm install microsoft-cognitiveservices-speech-sdk
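
As a quick sanity check after installing, a minimal Node.js recognition sketch (modeled on the official quickstarts; the key, region, and file name below are placeholders) looks roughly like this:

// Minimal one-shot recognition from a WAV file in Node.js.
// "YOUR_KEY", "YOUR_REGION", and "test.wav" are placeholders.
const sdk = require("microsoft-cognitiveservices-speech-sdk");
const fs = require("fs");

const speechConfig = sdk.SpeechConfig.fromSubscription("YOUR_KEY", "YOUR_REGION");
speechConfig.speechRecognitionLanguage = "en-US";

const audioConfig = sdk.AudioConfig.fromWavFileInput(fs.readFileSync("test.wav"));
const recognizer = new sdk.SpeechRecognizer(speechConfig, audioConfig);

recognizer.recognizeOnceAsync(
    (result) => {
        console.log("Recognized: " + result.text);
        recognizer.close();
    },
    (err) => {
        console.error("Error: " + err);
        recognizer.close();
    });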

Documentation

Samples

Building

The source code for the Cognitive Services Speech SDK (JavaScript) is available in a public GitHub repository. You are not required to go through the build process; we create prebuilt packages tuned for your use cases, and these are updated at regular intervals.

In order to build the Speech SDK, ensure that you have Git and Node.js installed. Version requirement for Node: 12.44.0 or higher (or 14.17.0 or higher for Node 14).

Clone the repository:

git clone https://github.com/Microsoft/cognitive-services-speech-sdk-js

Change to the Speech SDK directory:

cd cognitive-services-speech-sdk-js

Run setup to pull updated dependency versions:

npm run setup

Install the required packages:

npm install

Run the build:

npm run build

Testing

Run all tests

Run tests (see ci/build.yml) -- complete results require several specifically-configured subscriptions, but incomplete results can be obtained with a subset (expect and ignore failures involving missing assignments).

At a minimum, invoking npm run test will compile/lint the test files to catch early problems in test code changes.

RunTests.cmd ^
    SpeechSubscriptionKey:SPEECH_KEY ^
    SpeechRegion:SPEECH_REGION ^
    LuisSubscriptionKey:LUIS_KEY ^
    LuisRegion:LUIS_REGION ^
    SpeechTestEndpointId:CUSTOM_ENDPOINT ^
    BotSubscription:BOT_KEY ^
    BotRegion:BOT_REGION ^
    SpeakerIDSubscriptionKey:SPEAKER_ID_KEY ^
    SpeakerIDRegion:SPEAKER_ID_SUBSCRIPTION_REGION ^
    CustomVoiceSubscriptionKey:CUSTOM_VOICE_KEY ^
    CustomVoiceRegion:CUSTOM_VOICE_REGION

Run a subset of tests

  • Edit the file jest.config.js. Replace the regular expression in testRegex: "tests/.*Tests\\.ts$" with one that matches the test file (or files) you want to run. For example, to only run the tests defined in AutoSourceLangDetectionTests.ts, replace it with testRegex: "tests/AutoSourceLangDetectionTests.ts". Do this for both projects, jsdom and node.

  • Option 1: Use a secrets file. Create the file secrets\TestConfiguration.ts. It should import the default configuration settings and define values for the settings that are mandatory for this test, as well as any additional optional settings. For example, to run the AutoSourceLangDetectionTests.ts tests, the mandatory values are the speech key and region (using a fake key here as an example):

    import { Settings } from "../tests/Settings";
    Settings.SpeechSubscriptionKey = "0123456789abcdef0123456789abcdef";
    Settings.SpeechRegion = "westcentralus";

    Then, to run the tests, type RunTests.cmd in the root of the repo.

  • Option 2: Use command line arguments. Instead of creating secrets\TestConfiguration.ts, pass the values directly to RunTests.cmd. For the above example, this would be:

    RunTests.cmd SpeechSubscriptionKey:0123456789abcdef0123456789abcdef SpeechRegion:westcentralus
    
  • Option 3: Edit the file tests\Settings.ts directly and enter values needed to run the test.

  • See a summary of the test results in test-javascript-junit.xml.

Data / Telemetry

This project collects data and sends it to Microsoft to help monitor our service performance and improve our products and services. Read the Microsoft Privacy Statement to learn more.

To disable telemetry, you can call the following API:

// disable telemetry data
sdk.Recognizer.enableTelemetry(false);

This is a global setting and will disable telemetry for all recognizers (already created or new recognizers).

We strongly recommend you keep telemetry enabled. With telemetry enabled you transmit information about your platform (operating system and possibly, Speech Service relevant information like microphone characteristics, etc.), and information about the performance of the Speech Service (the time when you did send data and when you received data). It can be used to tune the service, monitor service performance and stability, and might help us to analyze reported problems. Without telemetry enabled, it is not possible for us to do any form of detailed analysis in case of a support request.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

cognitive-services-speech-sdk-js's People

Contributors

bivas6, brandom-msft, brianmouncer, compulim, dargilco, dependabot[bot], dev-loic, domratchev, ej-msl, fmegen, glecaros, glharper, icodetm, jk-ms, lisaweixu, m-hietala, mahilleb-msft, me4502, microsoft-github-policy-service[bot], microsoftopensource, msftgits, orgads, rhurey, samjonescode, scbedd, shivsarthak, trrwilson, wolfma61, yulin-li, zenoamaro


cognitive-services-speech-sdk-js's Issues

SDK Sending an Incomplete WAV header

Hi,

I'm one of the devs working on the Microsoft Bot Framework Emulator desktop application, which hosts another web application (Microsoft Bot Framework Web Chat) that directly consumes the cognitive-services-speech-sdk.

When testing the speech capabilities of Web Chat on its own, in the browser, I am able to properly communicate with a bot using our DirectLine Speech channel, which leverages cognitive services speech and the SDK.

However, when trying to communicate with the same bot via Web Chat when it is embedded in the Emulator -- an Electron desktop application -- the communication to the bot fails when trying to send speech data to cognitive services speech via the SDK.

I took a look at the web socket traffic in both scenarios, and it turns out that when using the speech SDK in the Electron application, an incomplete WAV header is being sent over the web socket, which results in the cognitive services speech service closing the socket with a reason of InvalidPayloadData.

Web Socket Header in Browser (Working):

 cpath: audio
x-requestid: 86A6DF96B0A04C12B89817FEBF5637E2
x-timestamp: 2020-02-06T00:45:51.403Z
RIFF....WAVEfmt ....(binary fmt chunk bytes).... data


Web Socket Header in Electron (Broken):

 cpath: audio
x-requestid: BC0A81B4AAA84B48A2F89752139B88FD
x-timestamp: 2020-02-06T00:43:41.421Z
                                                        


You can see that the Electron header is missing the RIFF / WAVEfmt / data chunk.

Add Releases to GitHub

Currently, you only offer releases as a download on the docs.microsoft.com pages, which is not very useful: I can't get notified about new releases, and it breaks the link between GitHub and MS Docs. I would suggest also using GitHub's Releases feature and publishing new releases there as well.

Host is cached and not changed if new instance of SpeechRecognizer with different region

Hello!
I've run into very strange behavior with SpeechRecognizer instances.
I started from the quickstart example - https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/quickstart-js-browser
The example allows switching regions to try different settings. But once one instance of SpeechRecognizer has been created, all subsequent instances are created with the first region's settings, no matter which region is selected.
This is because, inside SpeechConnectionFactory, the host variable is taken from Storage.local, which is an InMemoryStorage, so the parameter is set once and never changed at run time.
This is not very obvious. Other parameters use the same approach, so it is impossible to change the region or debug mode at run time.

It is not a very big problem, but it cost me an hour to understand what happened. :)

Plans to include SpeechSynthesizer implementation for JS SDK?

We are working on an Android Ionic app that uses this SDK for ASR, but we need to use the REST endpoint and the Ionic Native media plugin to play the response. Any plans to add a SpeechSynthesizer class, as found in the Java, .NET, and Obj-C SDKs?
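
For reference, when speech synthesis later landed in the JS SDK, the API shape is roughly the following; treat this as a hedged sketch and check the release notes of the version you are on before relying on it:

const sdk = require("microsoft-cognitiveservices-speech-sdk");

const speechConfig = sdk.SpeechConfig.fromSubscription("YOUR_KEY", "YOUR_REGION");
const synthesizer = new sdk.SpeechSynthesizer(speechConfig);

synthesizer.speakTextAsync(
    "Hello world",
    (result) => {
        // result.audioData is an ArrayBuffer containing the synthesized audio.
        console.log("Synthesis finished, " + result.audioData.byteLength + " bytes");
        synthesizer.close();
    },
    (err) => {
        console.error(err);
        synthesizer.close();
    });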

v1.9.0 incompatible with Typescript <3.5.0

The latest release of this SDK is incompatible with earlier versions of the TypeScript compiler. Some common web frameworks (e.g., Angular 8.1.2) still depend on earlier versions of the TypeScript compiler.

E.g., using TypeScript 3.4.5 returns many TS1086 errors:

ERROR in node_modules/microsoft-cognitiveservices-speech-sdk/distrib/lib/src/common.browser/FileAudioSource.d.ts:14:9 - error TS1086: An accessor cannot be declared in an ambient context.

14     get format(): AudioStreamFormatImpl;
           ~~~~~~
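
A common consumer-side workaround for TS1086, if upgrading TypeScript is not an option, is to stop the compiler from type-checking third-party declaration files. This is a generic TypeScript setting (tsconfig.json), not anything specific to this SDK, and only a stopgap:

{
  "compilerOptions": {
    "skipLibCheck": true
  }
}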

SpeechRecognizer throws undefined TypeError on Android WebView

I'm building an Ionic app and have been testing out using the JS SDK instead of building a custom native plugin.

I encountered an error when the audio is being shut down; the screenshot attached to the original issue shows the undefined TypeError.

I was running a simple Ionic app that calls recognizeOnceAsync on the SpeechRecognizer using Android API 28 and Android API 29.

Does the NodeJS version support fromDefaultMicrophone?

It looks like this isn't possible; I get an error saying window is not defined. There also doesn't seem to be any example of the Node.js SDK using AudioConfig.fromDefaultMicrophoneInput, so I'm guessing it isn't supported? It works perfectly fine if I use AudioConfig.fromStreamInput.

If it is possible, can someone tell me what I'm doing wrong here?

var speechsdk = require("microsoft-cognitiveservices-speech-sdk");
var subscriptionKey = ";)";
var serviceRegion = "eastus"; // e.g., "westus"

const speech_Config = speechsdk.SpeechConfig.fromSubscription(subscriptionKey, serviceRegion, "en-US");
const audioConfig = speechsdk.AudioConfig.fromDefaultMicrophoneInput();
let speech_recognizer= new speechsdk.SpeechRecognizer(speech_Config, audioConfig);

speech_recognizer.recognizeOnceAsync(
    function (result) {
        console.log(result);
        speech_recognizer.close();
        speech_recognizer = undefined;
    },
    function (err) {
        console.trace("err - " + err);
        speech_recognizer.close();
        speech_recognizer = undefined;
    });
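
For what it's worth, fromDefaultMicrophoneInput relies on the browser's getUserMedia, so in plain Node.js the usual pattern is to capture audio with a separate module and feed it to the SDK through a push stream. A sketch, assuming the third-party mic package as the capture source (any 16 kHz / 16-bit / mono PCM source works):

var sdk = require("microsoft-cognitiveservices-speech-sdk");
var mic = require("mic"); // example capture module, not part of the Speech SDK

var pushStream = sdk.AudioInputStream.createPushStream(
    sdk.AudioStreamFormat.getWaveFormatPCM(16000, 16, 1));

var micInstance = mic({ rate: "16000", channels: "1" });
micInstance.getAudioStream().on("data", function (chunk) {
    // chunk is a Node Buffer, possibly backed by a shared pool, so copy out
    // exactly this chunk's bytes as an ArrayBuffer for the SDK.
    pushStream.write(chunk.buffer.slice(chunk.byteOffset, chunk.byteOffset + chunk.byteLength));
});
micInstance.start();

var speechConfig = sdk.SpeechConfig.fromSubscription("YOUR_KEY", "YOUR_REGION");
var recognizer = new sdk.SpeechRecognizer(speechConfig, sdk.AudioConfig.fromStreamInput(pushStream));

recognizer.recognizeOnceAsync(
    function (result) { console.log(result.text); },
    function (err) { console.trace("err - " + err); });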

How to debug 'Unable to contact server'?

I have some code that uses the SpeechRecognizer class. Yesterday, it was working perfectly, but today, I get this error with recognizer.canceled:

SpeechRecognitionCanceledEventArgs {
  privSessionId: '7E58E62A2C99403681671304D6E7A233',
  privOffset: undefined,
  privReason: 0,
  privErrorDetails: 'Unable to contact server. StatusCode: 1006, undefined Reason: ',
  privErrorCode: 4
}

I don't know why. My code is exactly the same as yesterday.

How can I debug this? I tried changing my private key but nothing changed.
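
A debugging sketch that surfaces more detail than the canceled event alone (assuming sdk is the imported module and recognizer is the existing SpeechRecognizer; Connection.fromRecognizer exists in recent SDK versions, so adjust to yours):

const connection = sdk.Connection.fromRecognizer(recognizer);
connection.connected = () => console.log("Connected to the Speech service");
connection.disconnected = () => console.log("Disconnected from the Speech service");

recognizer.canceled = (s, e) => {
    console.log("Canceled: reason=" + sdk.CancellationReason[e.reason]
        + ", errorCode=" + e.errorCode
        + ", details=" + e.errorDetails);
};

Status code 1006 is the generic "connection closed abnormally" WebSocket close code, so the real cause usually has to be found on the request side (wrong key or region, expired token, or a blocked outbound WebSocket) rather than in the client logs.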

Unable to contact server. StatusCode: 1006, undefined Reason. Error Code: 7

During long-running recordings, I quite randomly get the following error after approximately 15 - 20 minutes:

(The screenshot shows: Unable to contact server. StatusCode: 1006, undefined Reason. Error Code: 7.)

The recording crashes and can't be started again.

I use the following code:

const token = // fetch token from server...
const config = // ...
const region = // ...

this.speechConfig = SpeechSDK.SpeechConfig.fromAuthorizationToken(token, config.region);
this.speechConfig.speechRecognitionLanguage = 'de-DE';
this.audioConfig = SpeechSDK.AudioConfig.fromDefaultMicrophoneInput();

this.speechRecognizer = new SpeechSDK.SpeechRecognizer(this.speechConfig, this.audioConfig);
this.speechRecognizer.recognizing = (s, e) => this.recognizing(s, e);
this.speechRecognizer.recognized = (s, e) => this.recognized(s, e);
this.speechRecognizer.canceled = (s, e) => this.recordingCanceled(s, e);

this.speechRecognizer.startContinuousRecognitionAsync();

// Renew token every 9 minutes
setInterval(async () => {
  const token = // fetch token from server...
  this.speechRecognizer.authorizationToken = token;
}, 540000);

Is there something wrong on your end or my end?

According to the documentation it is an unexpected error.

missing conversation api

I'm looking for the cognitive service speech conversation api for a project; however this seems to be missing; is this part of another package?

Continuous recognition doesn't work well when throttling the input stream

When using startContinuousRecognitionAsync with a throttled input stream, I'm getting repeating output in "recognizing" and "recognized".

Sample code follows.

"use strict";

// pull in the required packages.
var sdk = require("microsoft-cognitiveservices-speech-sdk");
var fs = require("fs");
const Throttle = require('throttle');

// replace with your own subscription key,
// service region (e.g., "westus"), and
// the name of the file you want to run
// through the speech recognizer.
var subscriptionKey = "YourSubscriptionKey";
var serviceRegion = "YourServiceRegion"; // e.g., "westus"
var filename = "test-message-16k.wav"; // 16000 Hz, Mono

// create the push stream we need for the speech sdk.
var pushStream = sdk.AudioInputStream.createPushStream();

// open the file and push it to the push stream.
fs.createReadStream(filename).pipe(new Throttle(16384)).on('data', function(arrayBuffer) {
  pushStream.write(arrayBuffer.buffer);
}).on('end', function() {
  pushStream.close();
});

// we are done with the setup
console.log("Now recognizing from: " + filename);

// now create the audio-config pointing to our stream and
// the speech config specifying the language.
var audioConfig = sdk.AudioConfig.fromStreamInput(pushStream);
var speechConfig = sdk.SpeechConfig.fromSubscription(subscriptionKey, serviceRegion);

// setting the recognition language to English.
speechConfig.speechRecognitionLanguage = "en-US";

// create the speech recognizer.
var recognizer = new sdk.SpeechRecognizer(speechConfig, audioConfig);
const reco = recognizer;
// const recConfig = reco.createRecognizerConfig(speechConfig);

// recConfig.recognitionMode = RecognizerConfig.Dictation;

reco.recognizing = (_s, event) => {
  console.log('(recognizing) Text: ' + event.result.text);
};

/*
 * The event recognized signals that a final recognition result is received.
 * This is the final event that a phrase has been recognized.
 * For continuous recognition, you will get one recognized event for each phrase recognized.
 */
reco.recognized = (s, e) => {
  // Indicates that recognizable speech was not detected, and that recognition is done.
  if (e.result.reason === sdk.ResultReason.NoMatch) {
    var noMatchDetail = sdk.NoMatchDetails.fromResult(e.result);

    console.log('(recognized)  Reason: ' + sdk.ResultReason[e.result.reason] + ' NoMatchReason: ' + sdk.NoMatchReason[noMatchDetail.reason]);
  } else {
    try {
      const obj = JSON.parse(e.result.json);

      // In case no real input was recognized (e.g., just a noise on the line), the Confidence level is 0, and should be ignored.
      console.log('(recognized)  Reason: ' + sdk.ResultReason[e.result.reason] + ' Text: ' + e.result.text);
    } catch (err) {
      console.error(err);
    }
  }
};

/*
 * The event signals that the service has stopped processing speech.
 * https://docs.microsoft.com/javascript/api/microsoft-cognitiveservices-speech-sdk/speechrecognitioncanceledeventargs?view=azure-node-latest
 * This can happen for two broad classes of reasons.
 * 1. An error is encountered.
 *    In this case the .errorDetails property will contain a textual representation of the error.
 * 2. Speech was detected to have ended.
 *    This can be caused by the end of the specified file being reached, or ~20 seconds of silence from a microphone input.
 */
reco.canceled = (s, e) => {
  let str = '(cancel) Reason: ' + sdk.CancellationReason[e.reason];

  if (e.reason === sdk.CancellationReason.Error) {
    str += ': ' + e.errorDetails;
  }
  console.log(str);
  this.stop();
};

// Signals that a new session has started with the speech service
reco.sessionStarted = (s, e) => {
  const str = '(sessionStarted) SessionId: ' + e.sessionId;

  console.log(str);
};

// Signals the end of a session with the speech service.
reco.sessionStopped = (s, e) => {
  const str = '(sessionStopped) SessionId: ' + e.sessionId;

  this.stop();
  console.log(str);
};

// Signals that the speech service has started to detect speech.
reco.speechStartDetected = (s, e) => {
  const str = '(speechStartDetected) SessionId: ' + e.sessionId;

  console.log(str);
};

// Signals that the speech service has detected that speech has stopped.
reco.speechEndDetected = (s, e) => {
  const str = '(speechEndDetected) SessionId: ' + e.sessionId;

  console.log(str);
};

// start the recognizer and wait for a result.
recognizer.startContinuousRecognitionAsync(
  null,
  function (err) {
    console.trace("err - " + err);

    recognizer.close();
    recognizer = undefined;
  });

Output:

Now recognizing from: test-message-16k.wav
(sessionStarted) SessionId: 2BD50B1AD5B14F529B8E12935EBD8C00
(speechStartDetected) SessionId: 2BD50B1AD5B14F529B8E12935EBD8C00
(recognizing) Text: this
(recognizing) Text: this is a test
(recognizing) Text: this is a test message
(recognized)  Reason: RecognizedSpeech Text: This is a test message.
(recognizing) Text: this
(recognizing) Text: this is a
(recognizing) Text: this is a test
(recognizing) Text: this is a test message
(recognized)  Reason: RecognizedSpeech Text: This is a test message.
(recognizing) Text: this
(recognizing) Text: this is a
(recognizing) Text: this is a test
(recognizing) Text: this is a test message
(recognized)  Reason: RecognizedSpeech Text: This is a test message.
(recognizing) Text: this
(recognizing) Text: this is a
(recognizing) Text: this is a test
(recognizing) Text: this is a test message
(recognized)  Reason: RecognizedSpeech Text: This is a test message.
(recognizing) Text: this
(recognizing) Text: this is a
(recognizing) Text: this is a test
(recognizing) Text: this is a test message
(recognized)  Reason: RecognizedSpeech Text: This is a test message.
(recognizing) Text: this
(recognizing) Text: this is a
(recognizing) Text: this is a test
(recognizing) Text: this is a test message
(recognized)  Reason: RecognizedSpeech Text: This is a test message.
(recognizing) Text: this
(recognizing) Text: this is a
(recognizing) Text: this is a test
(recognizing) Text: this is a test message
(recognized)  Reason: RecognizedSpeech Text: This is a test message.
(recognizing) Text: this
(recognizing) Text: this is a
(recognizing) Text: this is a test
(recognizing) Text: this is a test message
(recognized)  Reason: RecognizedSpeech Text: This is a test message.
(recognizing) Text: this
(recognizing) Text: this is a
(recognizing) Text: this is a test
(recognizing) Text: this is a test message
(recognized)  Reason: RecognizedSpeech Text: This is a test message.

For a file that is a bit longer, each sentence becomes duplicated several times.

Please advise.

SDK API alignment for languages

I've been using the C# Speech API to develop a proof of concept with Blazor; IMHO the APIs should align. The API surface for C# and JavaScript is somewhat different, making an interop layer difficult and prone to issues.

AudioInputId is not passed into MicAudioSource when making a new audio configuration

https://github.com/microsoft/cognitive-services-speech-sdk-js/blob/master/src/sdk/Audio/AudioConfig.ts#L40

Hi, I'm reporting from the Bot Framework Web Chat team.

When passing in the deviceId to the AudioConfig.fromMicrophoneInput, it looks like deviceId is being passed as the audioSourceId into the MicAudioSource.

This is causing bugs for Web Chat when attempting to manually set the audio source. We pass the deviceId in, but it continues to use the default device.

Continuous Recognition from Microphone

I am trying to use the continuous recognition feature when somebody speaks into the microphone.
I am using the mic package to get the data from the microphone and recognize it, but I am always getting wrong results. I don't know what's missing; can you help me?
I will add some parts of my code so you can understand.

Here is how I create an instance of pushStream

this.pushStream = AudioInputStream.createPushStream(AudioStreamFormat.getWaveFormatPCM(16, 16000, 1));

Here is the method I call to recognize the user voice

    recognizeAsync() {
        this.audioConfig = AudioConfig.fromStreamInput(this.pushStream);
        this.recognizer = new SpeechRecognizer(this.speechConfig, this.audioConfig);
        this.subject = new Observable(subs => {
            this.subscription = subs;
            this.recognizer.startContinuousRecognitionAsync();
            this.recognizer.recognizing = (rec, {result}) => {
                subs.next(result);
            };
            this.recognizer.recognized = (rec, {result}) => {
                subs.next(result);
            };
        });
        return this.subject;
    }

And here is where I use the mic package to get the user voice data

speech = new Speech(language, subscriptionKey, region);
speech.recognizeAsync().subscribe(result => {
        console.log('result', result);
});
var micInstance = mic({
        rate: '16000',
        channels: '1',
        debug: false,
        exitOnSilence: 6,
        fileType: 'wav' //have also tried with raw type
});
const micInputStream = micInstance.getAudioStream();

micInputStream.on('data', function(data) {
    speech.pushStream.write(data);
    //console.log("Recieved Input Stream: ", data);
});
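
One detail worth double-checking in a setup like this: as far as I know, AudioStreamFormat.getWaveFormatPCM takes the sample rate first, then bits per sample, then the channel count, so for 16 kHz / 16-bit / mono audio the push stream would be created as:

this.pushStream = AudioInputStream.createPushStream(
    AudioStreamFormat.getWaveFormatPCM(16000, 16, 1)); // samplesPerSecond, bitsPerSample, channels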

v1.9.0 breaks microphone support

Use AudioConfig.fromDefaultMicrophoneInput() as the audio config for a connection and try listening once. It will crash with something like

Error: Uncaught [TypeError: Cannot read property 'send' of undefined]
        at reportException (…/node_modules/jsdom/lib/jsdom/living/helpers/runtime-script-errors.js:62:24)
        at Timeout.callback [as _onTimeout] (…/node_modules/jsdom/lib/jsdom/browser/Window.js:645:7)
        at ontimeout (timers.js:436:11)
        at tryOnTimeout (timers.js:300:5)
        at listOnTimeout (timers.js:263:5)
        at Timer.processTimers (timers.js:223:10) TypeError: Cannot read property 'send' of undefined
        at …/node_modules/microsoft-cognitiveservices-speech-sdk/distrib/lib/src/common.speech/src/common.speech/DialogServiceAdapter.ts:460:83
        at Timeout.callback [as _onTimeout] (…/node_modules/jsdom/lib/jsdom/browser/Window.js:643:19)
        at ontimeout (timers.js:436:11)
        at tryOnTimeout (timers.js:300:5)
        at listOnTimeout (timers.js:263:5)
        at Timer.processTimers (timers.js:223:10)

Works fine in v1.8.1 but is broken in the recently released v1.9.0.

Get telemetry data and input quality

In the Readme, you write:

We strongly recommend you keep telemetry enabled. With telemetry enabled you transmit information about your platform (operating system and possibly, Speech Service relevant information like microphone characteristics, etc.), and information about the performance of the Speech Service (the time when you did send data and when you received data). It can be used to tune the service, monitor service performance and stability, and might help us to analyze reported problems. Without telemetry enabled, it is not possible for us to do any form of detailed analysis in case of a support request.

Is there any chance to receive that telemetry (especially the microphone characteristics) as a developer?

Throughout our development, we realized that the Speech-to-Text output quality is highly dependent on the microphone input quality. So we are planning to warn the user once their microphone volume is too low or it starts clipping.

Is this something the SDK could theoretically detect by itself?

Offer package managers like NPM

From a modern, state-of-the-art JavaScript SDK, I would expect a way to integrate it into my project using a package manager like npm, instead of downloading zip files from the docs.microsoft.com website, as is currently the case.

Please provide official npm support to make it easier to install, version, and update the SDK as a dependency in your customers' projects.

Brazilian Portuguese transcription not behaving as expected

Hello!

We are running a POC using the JavaScript SDK that listens to the microphone and transcribes the text to the application.

The language we are using is pt-BR, and the behavior is a bit different compared to en-US.

E.g:

Numbers in the wrong place:

  • Person says "Um momento por favor" (One moment please), which is transcribed as "1 momento por favor".

The same sentence in en-US is transcribed exactly as it was said: "One moment please"

Intonation:

  • Transcriptions do not contain question marks at the end.

One says "Como posso te ajudar?" is then transcribed to "Como posso te ajudar"

Currency:

  • Currency symbols appear after the value.

One says "O valor total รฉ R$ 250" is transcribed to "O valor total รฉ 250R$"

Do you have any plans on improving the Brazilian Portuguese recognition?

Thank you

speech sdk js for cordova / phonegap (ios / android)

Hello, I'm an Azure customer, and I currently use your Speech SDK for JavaScript in the browser (it works great). I want to port this project to mobile phones under an environment like Cordova or PhoneGap. Is there a way to easily do that, or do I need additional plugins?

Regards

Hugo Barbosa

Congrats to your whole team!!!

[Azure Gov] Cognitive Speech API always points to commercial

This comes from a customer issue: microsoft/BotFramework-WebChat#2969

Web Chat consumes cognitive-services-speech-sdk-js, and we found that the endpoints for the STT API always point to the commercial cloud. From the customer issue report:

Cognitive Speech API in webchat always points to the commercial endpoint. wss://virginia.stt.speech.microsoft.com instead of wss://virginia.stt.speech.azure.us as stated in this link. This is regardless of using "virginia" or "usgovvirginia" as the region parameter when constructing webSpeechPonyfillFactory.

We noticed that SpeechConnectionFactory uses .stt.speech.microsoft.com regardless of the region and perhaps this is the cause of the issue.

Could you advise us on how to use Speech API in Azure Gov clouds?
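
One possible workaround while region handling does not cover sovereign clouds is to bypass the region-based URL construction entirely with SpeechConfig.fromEndpoint. This is only a sketch; the host and path below are my reading of the US Gov STT endpoint, so verify them against the Azure Government documentation:

const speechConfig = SpeechSDK.SpeechConfig.fromEndpoint(
    new URL("wss://virginia.stt.speech.azure.us/speech/recognition/conversation/cognitiveservices/v1"),
    "YOUR_SUBSCRIPTION_KEY");
speechConfig.speechRecognitionLanguage = "en-US";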

v1.9.0 no longer working on Cordova Android

I was able to get v1.8.1 mostly working on a Cordova Android app (modulo a small bug: #128).

When trying to get v1.9.0 working, I get the following error:
(Screenshot: an error thrown from an environment-variable check that assumes a Node.js environment.)

Looks like this was recently introduced here: b3a7ca8

It seems like this env variable check should be isolated to Node.js usage of the Speech SDK.

Websockets error when using Custom Speech models

Hello,

In May-June we developed a web application which uses speech input. In this web app we can select the default endpoint or Custom Speech models, and the implementation uses the same lib (just setting speechConfig.endpointId when using a custom model).

For a few days now, we have been facing the following error when using the Custom Speech models (we don't have this error when using the default model provided by Microsoft):

WebSocket is already in CLOSING or CLOSED state.

This error is thrown by microsoft.cognitiveservices.speech.sdk.bundle.js:2523

Details:

  • The error is happening when using recognizeOnceAsync or startContinuousRecognitionAsync
  • We were using JS SDK 1.5.0; I tried updating to 1.6.0 but we still get the same error.
  • Test environment: Chrome browser on Mac and Windows (up to date, which is version 75.0.3770.142 today)
  • Our Custom Speech models are running, and the "Check endpoint" feature on the Speech portal, used with a wav file, works successfully
  • I also checked our API Key and model endpointId: they did not change

We have no idea how to investigate this further.

Speech-to-Text: disable audio logging

According to the docs (https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/faq-stt), Speech-to-Text logs audio requests by default ("You have a choice when you create a deployment to switch off tracing. At that point, no audio or transcriptions will be logged. Otherwise, requests are typically logged in Azure in secure storage.").

However, in the SDK docs I only see an option to enable audio logging: SpeechConfig.enableAudioLogging().

How can I ensure that audio is not logged?

Timeout errors: modifying the default timeout

Hello! I've been having trouble with timeout errors lately that seem to happen when I'm starting a session. The errors don't happen systematically, but they seem to affect a small percentage of the sessions started.

This is the code I use to initialize and start my sessions:

      return new Promise((resolve) => {
          /* Event listener that signals that a new session has started with the
          speech service */
          recognizer.sessionStarted = (speechRecognizer, event) => {
            logger.verbose(`[recognizerPromise] | (sessionStarted) SessionId: ${event.sessionId}`);
            startStreamingAudioToMicrosoft();
            resolve(recognizer);
          };

          /* Starting the speech recognition, we use the continuous recognition async mode
          given that we're working with streams (and not reading from a file) */
          recognizer.startContinuousRecognitionAsync(
            () => {
              logger.verbose('[recognizerPromise] | starting session...');
            },
            (error) => {
              logger.error('[recognizerPromise] | error during async recognition ', error);
              /* We don't really know when this callback is triggered, the Speech Services
              doc is quite vague, only saying ("Callback invoked in case of an error.")
              https://docs.microsoft.com/en-us/javascript/api/microsoft-cognitiveservices-speech-sdk/speechrecognizer?view=azure-node-latest#startcontinuousrecognitionasync-------void---e--string-----void- */
              return stopStream();
            },
          );
        });

The time-out error happens like that:

  • After the function in the code snippet above is called, we start by seeing a [recognizerPromise] | starting session... log as expected

  • However, we don't see the [recognizerPromise] | (sessionStarted) SessionId log line next as expected.

  • After about two minutes, we receive the following error:

{
    "privSessionId": "XXXX",
    "privReason": 0,
    "privErrorDetails": "Unable to contact server. StatusCode: 1006, undefined Reason: ",
    "privErrorCode": 4
}

As I mentioned above, this error only happens sometimes. I would like to modify the timeout value because two minutes is too long a timeout for my use case - is there any way of doing that with the SDK?

I tried looking around in the code but couldn't find it. I'd be happy to help with a PR if someone could just point me in the right direction. Also, any additional info about the "Unable to contact server" error happening two minutes after the start of a session is appreciated 🙂
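
As an application-side stopgap (a sketch only; the 15-second figure below is arbitrary), one can race sessionStarted against a timer and abandon the attempt if it never fires:

function startWithTimeout(recognizer, timeoutMs) {
    return new Promise((resolve, reject) => {
        const timer = setTimeout(() => {
            reject(new Error("sessionStarted not received within " + timeoutMs + " ms"));
        }, timeoutMs);

        recognizer.sessionStarted = (speechRecognizer, event) => {
            clearTimeout(timer);
            resolve(event.sessionId);
        };

        recognizer.startContinuousRecognitionAsync(
            () => { /* request accepted; still waiting for sessionStarted */ },
            (error) => { clearTimeout(timer); reject(error); });
    });
}

// usage: startWithTimeout(recognizer, 15000).then(startStreamingAudioToMicrosoft, stopStream);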

Can't get the Confidence score of transcription results

Hello, first of all it's great to see that there's now a Speech Services SDK for node. Thank you!

I'm having trouble getting the confidence score along with the transcription results.

I've tried setting outputFormat in the hopes of getting more detailed output

...
const audioConfig = sdk.AudioConfig.fromStreamInput(pushStream);
const speechConfig = sdk.SpeechConfig.fromSubscription(SUBSCRIPTION_KEY, SERVICE_REGION);

speechConfig.speechRecognitionLanguage = language;
speechConfig.outputFormat = 1; 

recognizer = new sdk.SpeechRecognizer(speechConfig, audioConfig);
...

but the transcription results still look like this:

SpeechRecognitionResult {
    privResultId: 'XXXXXXXXX',
    privReason: 2,
    privText: 'oui',
    privDuration: 1800000,
    privOffset: 100000,
    privErrorDetails: undefined,
    privJson: '{"Text":"oui","Offset":100000,"Duration":1800000}',
    privProperties: undefined },

Am I missing anything?

PS: I tried setting the outputFormat to 1 because I saw it applied in the C++ SDK (Azure-Samples/cognitive-services-speech-sdk#12) and because the Node SDK's SpeechConfig does have an OutputFormat property too.
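
A sketch of how the detailed output is usually read (the enum and property names below are my understanding of the SDK surface, so double-check them against the version you are on): set OutputFormat.Detailed, then parse the service JSON exposed on the result's property bag; when present, the NBest array carries Confidence, Lexical, ITN, and Display variants.

speechConfig.outputFormat = sdk.OutputFormat.Detailed;

const recognizer = new sdk.SpeechRecognizer(speechConfig, audioConfig);
recognizer.recognizeOnceAsync(
    (result) => {
        // With detailed output, the full service response is available as JSON.
        const detailed = JSON.parse(result.properties.getProperty(
            sdk.PropertyId.SpeechServiceResponse_JsonResult));
        if (detailed.NBest && detailed.NBest.length > 0) {
            console.log("Confidence: " + detailed.NBest[0].Confidence);
            console.log("Lexical:    " + detailed.NBest[0].Lexical);
        }
    },
    (err) => console.error(err));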

Unable to set an endpoint ID when using SpeechTranslationConfig

The speechTranslationConfig is setting the EndpointID to the Endpoint property.
This results in an incorrect URL being constructed and a status code of 500.

cognitive-services-speech-sdk-js/src/sdk/SpeechTranslationConfig.ts

public set endpointId(value: string) {
    this.privSpeechProperties.setProperty(PropertyId.SpeechServiceConnection_Endpoint, value);
}

privErrorDetails: "Unable to contact server. StatusCode: 500, Reason: SyntaxError: Failed to construct 'WebSocket':

Output result "simple" now returns numerals?

Recognizing with the "simple" output format now returns numerals as well as text?

When was this changed, and how do I get just text output?

It joins numbers together in a string, so you can't tell whether it's "3 4" or "34".

Recognizing with the "detailed" output format does not provide the needed lexical text.

Microphone audio stream terminates after 10 minutes of recording

When using recognizer.startContinuousRecognitionAsync(), the microphone stream gets terminated after exactly 10:00 minutes. According to the release notes of version 1.2.0,
this issue should have been fixed for the JavaScript SDK, with the reconnection executed automatically.

My code looks like this:

// Create speech and audio config
const speechConfig = SpeechSDK.SpeechConfig.fromEndpoint(new URL('MY_ENDPOINT'), 'MY_KEY');
speechConfig.speechRecognitionLanguage = 'de-DE';
const audioConfig = SpeechSDK.AudioConfig.fromDefaultMicrophoneInput();

// Setup speech recognizer
const recognizer = new SpeechSDK.SpeechRecognizer(speechConfig, audioConfig);
recognizer.recognized = (s, e) => this.recognized(s, e);

// Start continous recognition
recognizer.startContinuousRecognitionAsync();

This works fine for exactly 10:00 minutes of talking to the microphone. After that period, I get error messages (shown in a screenshot in the original issue).

Am I doing something wrong, or is this not successfully fixed yet?
I am using Angular 6.1 with version 1.2.0 of the microsoft-cognitiveservices-speech-sdk.

PropertyId.Conversation_From_Id does not seem to be supported

There does not seem to be support for PropertyId.Conversation_From_Id yet: at least, setting it does not do anything, and it's even necessary to cast to any in order to set the property.

this.dialogServiceConfig.setProperty(
  PropertyId.Conversation_From_Id as any,
  'my value'
);

Am I using it wrong or was it not implemented yet?

Calling text to speech service using REST SDK

Hi,

I am trying to use the Text to Speech service (REST) from Azure Cognitive Services, and am successful when trying it with Postman against "https://**.tts.speech.microsoft.com/cognitiveservices/v1", getting an mpga file.
When trying with AJAX, the downloaded blob seems corrupted, or maybe I am setting the content type wrong.
Is there any way the SDK could help? I couldn't find a sample for that.

Thanks
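
The SDK had no synthesis support at the time, but on the REST side the usual culprit for a corrupted blob is reading the binary body as text. A browser fetch sketch (the voice name and output format are placeholders; the header names follow the public Text to Speech REST documentation):

const region = "YOUR_REGION";
const key = "YOUR_KEY";
const ssml =
    "<speak version='1.0' xml:lang='en-US'>" +
    "<voice xml:lang='en-US' name='en-US-JennyNeural'>Hello world</voice>" +
    "</speak>";

fetch("https://" + region + ".tts.speech.microsoft.com/cognitiveservices/v1", {
    method: "POST",
    headers: {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/ssml+xml",
        "X-Microsoft-OutputFormat": "audio-16khz-128kbitrate-mono-mp3"
    },
    body: ssml
})
    .then((response) => response.arrayBuffer()) // keep the response binary
    .then((audioData) => {
        const blob = new Blob([audioData], { type: "audio/mpeg" });
        new Audio(URL.createObjectURL(blob)).play();
    });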

Objects are not cleaned up

When running a load test, I notice that sockets and most objects remain and never get cleaned up.

I'm using a mock server that accepts websocket connections, and simulates cognitive services responses.

I start the sessions with recognizer.startContinuousRecognitionAsync(), and stop them after a while by calling recognizer.close().

Memory consumption increases constantly at a high rate (about 8-10MB/s when the session rate is 20/s).

Using Google Chrome inspector, I see that sockets and many other objects remain alive long after sessions are terminated.

reference error: window is not defined

I am running the following code snippet in node.js while testing the SDK.


const speech = require("microsoft-cognitiveservices-speech-sdk");
const rp = require("request-promise");

const region = "westeurope";
const {speechKey, speechTokenEndpoint} = require("./config.json");

async function transcribe(audioconfig) {    
    try {
        const token = await rp.post({uri: speechTokenEndpoint,headers: {"Ocp-Apim-Subscription-Key":speechKey}});
        const speechconfig = speech.SpeechConfig.fromAuthorizationToken(token, region);
        const recognizer = new speech.SpeechRecognizer(speechconfig, audioconfig);
        recognizer.startContinuousRecognitionAsync(recognitionStartedCallback, recognitionFailedCallback);

    } catch (err) {
        console.log(err);
    }
}
function recognitionFailedCallback(err) {
    console.log("[ERROR] starting speech-to-text recognition failed: ", err);
}
function recognitionStartedCallback() {
    console.log("[INFO] speech-to-text recognition started")
}

const audioconfig = speech.AudioConfig.fromDefaultMicrophoneInput();
transcribe(audioconfig);

This raises an error in startContinuousRecognitionAsync:
[ERROR] starting speech-to-text recognition failed: ReferenceError: window is not defined

I cannot find the cause in the source code. I suspect it might be related to the following snippet in Recognizer.ts:

let osPlatform = (typeof window !== "undefined") ? "Browser" : "Node";

However, running this directly in node works perfectly fine.

Windows 10
node v10.15.0

SDK needs to implement ECMAScript 2017

SDK should support modern JS.

As it stands, complex actions with the SDK are cumbersome because it utilizes callbacks.

It would be beneficial for the SDK to utilize promises or async/await, lending to a more streamlined coding experience.
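
Until the SDK exposes promises natively, the callback pairs are straightforward to wrap yourself; a sketch assuming an already-configured recognizer:

function recognizeOnce(recognizer) {
    return new Promise((resolve, reject) => {
        recognizer.recognizeOnceAsync(resolve, reject);
    });
}

async function transcribeOnce(recognizer) {
    const result = await recognizeOnce(recognizer);
    console.log(result.text);
    recognizer.close();
}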

SDK hanging up on audio files after first sentence / 15 seconds

Hi there 😄

When I use this SDK with long, multi-sentence audio files, the SDK appears to hang up the socket connection after the first sentence is recognised, and abandons the rest of the audio. The process then exits.

On the following documentation page, it implies that the SDK can process long audio files:

Continuous transcription of long utterances and streaming audio (>15 seconds). Supports interim and final transcription results.

To reproduce:

I tried running the following code sample from the official NodeJS Quickstart. I made no modifications aside from the audio file name in both tests I did.

Here are two audio samples I tested with. They are both mono and 16kHz.

It is important to note that both of these audio files have worked perfectly with the previous versions of the Translator and Bing Speech websocket endpoints, as I have used them for benchmarking in the past when writing software to interface with the endpoints.

It's because of this previous point that I don't believe the gaps between sentences are long enough to trigger an end of speech event, so I'm not exactly sure what is going on. Do I need to change the recognition mode? I had trouble finding detailed documentation about this use case.

I would love to be able to stop using my own code and switch to the official SDK, but this is a bit of a showstopper for me. Any help or pointers would be wonderful; thank you so much for your time 🙏
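
If the quickstart sample being used relies on recognizeOnceAsync, that call is single-utterance by design; for long, multi-sentence files the continuous API is the intended path. A sketch (assuming sdk is the imported module and recognizer is built from the same speech/audio config as the quickstart):

recognizer.recognized = (s, e) => {
    if (e.result.reason === sdk.ResultReason.RecognizedSpeech) {
        console.log("Final: " + e.result.text);
    }
};

recognizer.canceled = (s, e) => {
    console.log("Canceled: " + e.errorDetails);
    recognizer.stopContinuousRecognitionAsync();
};

recognizer.sessionStopped = (s, e) => {
    // Fires when the end of the audio file has been reached.
    recognizer.stopContinuousRecognitionAsync(() => recognizer.close());
};

recognizer.startContinuousRecognitionAsync();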

Rejecting a completed promise

An error in the speech recogniser causes our whole server to crash

It happens when network requests fail; to reproduce, just try using speech recognition without an internet connection.

[App] node_modules/microsoft-cognitiveservices-speech-sdk/distrib/lib/src/common/src/common/Promise.ts:446
[App]                 throw new Error(`'Unhandled callback error: ${e}. InnerError: ${error}'`);
[App]                       ^
[App] Error: 'Unhandled callback error: Error: 'Cannot reject a completed promise'. InnerError: 'Unhandled callback error: Error: 'Cannot reject a completed promise'''
[App]     at Sink.executeErrorCallback (node_modules/microsoft-cognitiveservices-speech-sdk/distrib/lib/src/common/src/common/Promise.ts:446:23)
[App]     at Sink.executeSuccessCallback (node_modules/microsoft-cognitiveservices-speech-sdk/distrib/lib/src/common/src/common/Promise.ts:437:18)
[App]     at Sink.resolve (node_modules/microsoft-cognitiveservices-speech-sdk/distrib/lib/src/common/src/common/Promise.ts:390:18)
[App]     at Deferred.resolve (node_modules/microsoft-cognitiveservices-speech-sdk/distrib/lib/src/common/src/common/Promise.ts:349:23)
[App]     at WebSocket._this.privWebsocketClient.onclose (node_modules/microsoft-cognitiveservices-speech-sdk/distrib/lib/src/common.browser/src/common.browser/WebsocketMessageAdapter.ts:192:54)
[App]     at WebSocket.onClose (node_modules/ws/lib/event-target.js:124:16)
[App]     at WebSocket.emit (events.js:182:13)
[App]     at WebSocket.EventEmitter.emit (domain.js:442:20)
[App]     at WebSocket.emitClose (node_modules/ws/lib/websocket.js:184:12)
[App]     at ClientRequest.req.on (node_modules/ws/lib/websocket.js:555:15)
[App]     at ClientRequest.emit (events.js:187:15)
[App]     at ClientRequest.EventEmitter.emit (domain.js:442:20)
[App]     at TLSSocket.socketErrorListener (_http_client.js:391:9)
[App]     at TLSSocket.emit (events.js:182:13)
[App]     at TLSSocket.EventEmitter.emit (domain.js:442:20)
[App]     at emitErrorNT (internal/streams/destroy.js:82:8)
[App]     at emitErrorAndCloseNT (internal/streams/destroy.js:50:3)
[App]     at process._tickCallback (internal/process/next_tick.js:63:19)

Can't use Custom Speech endpoint with DialogServiceConnector

Using [email protected].

We are trying to use a custom speech endpoint with DialogServiceConnector so that we can leverage a custom speech model with the Direct Line Speech Bot Framework channel.

We are setting the custom endpoint as a service property on the BotFrameworkConfig as follows:

botFrameworkConfig = BotFrameworkConfig.fromAuthorizationToken(
      authorizationToken,
      region
    );

botFrameworkConfig.setServiceProperty(
        'cid',
        ourCustomEndpointId,
        ServicePropertyChannel.UriQueryParameter
      );

With this property set, speech recognition stops working.
We receive the session started event, but not the speech start detected or recognizing/recognized events.
Removing the property restores normal operation.

The same code and endpoint appears to work correctly with a C# client.

Fallback for Websockets

Are there any plans to include a fallback mechanism for browsers that don't support WebSockets?
The Bing Speech SDK had one, and migrating to Cognitive Services Speech is causing trouble because it lacks the fallback option.

Too few chunks sent to speech server in throttled / minimized browser

Hi,

we've encountered an issue where a user using a speech-to-text app would not see any recognized data. This happens when the browser is minimized (in throttled mode). As soon as the browser is pulled up again, the microphone data is sent to the speech server.

We've tracked the issue down to the class ServiceRecognizerBase and its method "sendAudio". This method creates a function "readAndUploadCycle" that is then scheduled via a timeout ("setTimeout"). The timeout interval is calculated to provide a good send cadence. However, in a throttled JS execution environment this timeout interval is ignored, and only approximately 1 chunk per second is sent, resulting in a huge and fast-growing delay.

We believe this method should actually send multiple chunks per cycle to account for throttled JS execution.

We hope this information is sufficient and would of course like to see an improvement of that logic soon.

For now we've asked our users to keep the browser window open in the background and not minimize it.

We would also like to take the opportunity to thank you for the great work and to offer our assistance if needed or desired. We are considering creating a fork to implement logic suited for a throttled environment, if you feel that this issue isn't worth improving or you are too busy for a near-future investigation.

Thank you in advance for taking the time to consider this issue.
If possible, please respond as soon as possible.

Unable to proceed to speech-to-text conversion

Hello,
I am referring to the sample at https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/js/node/speech.js for speech-to-text conversion.

But it's behaving erratically. Sometimes the process starts, and sometimes I get the following error.

SpeechRecognitionCanceledEventArgs {
  privSessionId: '3309E8B386C94CEEB733BB4ED8DA5895',
  privOffset: undefined,
  privReason: 0,
  privErrorDetails: undefined,
  privErrorCode: 6
}

What does this error mean, and how can I resolve it?

Also, according to the documentation at https://docs.microsoft.com/en-gb/javascript/api/microsoft-cognitiveservices-speech-sdk/cancellationerrorcode?view=azure-node-latest, errorCode = 6 indicates that an error was returned by the service.

Token based authentication is broken in Safari

There appears to be a feature (bug?) in the latest Safari (v12.1) where an attempt to resume an AudioContext upon completion of an async operation fails, even if it was initiated by a user action.
This makes it impossible to effectively use AuthorizationToken, as it only makes sense to fetch one upon a user action (e.g. a button click) and the fetch has to be asynchronous. Plus, there is no way to wait until the AudioContext is resumed before supplying the token.

Proposed solution: the CognitiveSubscriptionKeyAuthentication class already supports a fetch (and fetchOnExpiry) function as a parameter. If it could be passed in via SpeechConfig, the issue would be solved. One obvious obstacle is that the fetch function returns a proprietary Promise, hence part of the solution has to be a migration to ES6 Promises.

Keyword Spotting Implementation

I want to use a custom wake word, but it throws an exception: Uncaught Error: Not yet implemented.
Is there any way to use it?
If not, how long will it take to be implemented?
Thanks.

Speech Recognition with Intents missing the query words after special character

Speech recognition with intents misses query words during intent recognition.
In the JavaScript Speech SDK, the intent is recognized via LanguageUnderstandingServiceResponse_JsonResult.
The recognized query words after an occurrence of special characters ($, +, etc.) are dropped.
Solution: recognized query words from the Speech SDK must be parsed/encoded while being processed with LUIS.

Steps to reproduce the behavior

  1. Speak a sentence containing &
  2. Check SpeechSDK.ResultReason.RecognizedSpeech for the exact text that was spoken
  3. Check SpeechSDK.ResultReason.RecognizedIntent, where the words after & are missing

Expected behavior
Recognized query words from the Speech SDK must be parsed/encoded while being processed with LUIS.

Version of the Cognitive Services Speech SDK
SpeechSDK-JavaScript-1.1.0
SpeechSDK-JavaScript-1.2.1

Programming Language

  • Programming language: JavaScript

Issues with log information
Sample item to reproduce the issue:
*Note: the problematic lines are the (recognized)/(continuation) entries below, where the recognized query is truncated to "show company details for AT"

(sessionStarted) SessionId: 712B58292F564C9C85B9E45CD6C14AA9
(speechStartDetected) SessionId: 712B58292F564C9C85B9E45CD6C14AA9
(recognizing) Reason: RecognizingIntent Text: he
(recognizing) Reason: RecognizingIntent Text: show
(recognizing) Reason: RecognizingIntent Text: show comp
(recognizing) Reason: RecognizingIntent Text: show company
(recognizing) Reason: RecognizingIntent Text: show company did
(recognizing) Reason: RecognizingIntent Text: show company detail
(recognizing) Reason: RecognizingIntent Text: show company details
(recognizing) Reason: RecognizingIntent Text: show company details for
(recognizing) Reason: RecognizingIntent Text: show company details for 80
(recognizing) Reason: RecognizingIntent Text: show company details for AT and
(recognizing) Reason: RecognizingIntent Text: show company details for AT&T
(recognizing) Reason: RecognizingIntent Text: show company details for AT&T lab
(recognizing) Reason: RecognizingIntent Text: show company details for AT&T laboratory
(recognizing) Reason: RecognizingIntent Text: show company details for AT&T laboratories
(speechEndDetected) SessionId: 712B58292F564C9C85B9E45CD6C14AA9
(recognized) Reason: RecognizedIntent Text: Show company details for AT&T laboratories. IntentId: BusinessDetails Intent JSON: {
"query": "show company details for AT",
"topScoringIntent": {
"intent": "BusinessDetails",
"score": 0.835796535
},
"entities": []
}
(continuation) Reason: RecognizedIntent Text: Show company details for AT&T laboratories. IntentId: BusinessDetails Intent JSON: {
"query": "show company details for AT",
"topScoringIntent": {
"intent": "BusinessDetails",
"score": 0.835796535
},
"entities": []
}
(sessionStopped) SessionId: 712B58292F564C9C85B9E45CD6C14AA9

Microphone keep recording if subscription key is wrong

Using SDK 1.6.0 on Chrome 75.

If the subscription key is wrong, the microphone will not turn itself off and cannot be turned off.

Repro:

  1. Put a wrong subscription key
  2. Call startContinuousRecognitionAsync()
  3. Got audioSourceReady event
  4. Got canceled event, saying "Unable to contact server. StatusCode: 1006, undefined Reason: "

Expected:

  • Microphone should turn off, or
  • Calling stopContinuousRecognitionAsync could turn off the microphone

Actual:

  • Microphone is not turned off, and
  • Calling stopContinuousRecognitionAsync did not turn off the microphone
  • The red dot on the Chrome tab bar shows that the microphone is still recording
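
A mitigation sketch while the underlying bug stands (it may not release the device on affected versions, but it makes the intended teardown explicit): stop and dispose the recognizer when a cancellation error arrives.

recognizer.canceled = (s, e) => {
    if (e.reason === sdk.CancellationReason.Error) {
        // Stop and dispose so the SDK can release the microphone.
        recognizer.stopContinuousRecognitionAsync(
            () => recognizer.close(),
            () => recognizer.close());
    }
};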
