Hey there, I'm using your STT services with this configuration, <cod

[Question] STT get raw Blob about cognitive-services-speech-sdk-js HOT 6 CLOSED

GalDayan commented on June 12, 2024

[Question] STT get raw Blob

from cognitive-services-speech-sdk-js.

Comments (6)

glharper commented on June 12, 2024

@GalDayan Thank you for using JS Speech SDK, and writing this issue up. Our speech recognition results do not include the audio from which that result was generated, nor do we plan to add that feature. For the full input audio, you could use the Connection.messageSent callback to grab the input audio and store it in a buffer, something like:

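        const speechsdk = require("microsoft-cognitiveservices-speech-sdk");

        // Collect every binary (audio) message the SDK sends to the service.
        // `recognizer` is your existing recognizer instance.
        const buffer = [];
        const con = speechsdk.Connection.fromRecognizer(recognizer);
        con.messageSent = (args) => {
            if (args.message.isBinaryMessage) {
                // binaryMessage holds the payload of the outgoing message
                buffer.push(args.message.binaryMessage);
            }
        };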

You could adapt this to save a wav file and reset the buffer on every recognized event, but the results wouldn't be perfectly synchronized, given the (non-constant) lag between sending the audio and receiving the recognition result.
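As a rough sketch of the "save a wav file" step, the function below joins the buffered chunks and prepends a standard 44-byte PCM WAV header. It assumes each chunk is an ArrayBuffer of raw 16-bit little-endian mono PCM at 16 kHz; that format is an assumption, not something confirmed in this thread, so check what your audio config actually sends (and whether the first chunk already carries a RIFF header). `buildWav` is a hypothetical helper name, not part of the SDK.

```javascript
// Sketch only: wraps raw PCM chunks in a minimal WAV container.
// Assumption: chunks are ArrayBuffers of 16-bit LE PCM (16 kHz, mono by default).
function buildWav(chunks, sampleRate = 16000, channels = 1, bitsPerSample = 16) {
  const dataLength = chunks.reduce((total, chunk) => total + chunk.byteLength, 0);
  const out = new Uint8Array(44 + dataLength);
  const view = new DataView(out.buffer);
  const writeAscii = (offset, text) => {
    for (let i = 0; i < text.length; i++) out[offset + i] = text.charCodeAt(i);
  };
  const blockAlign = (channels * bitsPerSample) / 8;

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataLength, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // size of the fmt chunk
  view.setUint16(20, 1, true); // audio format 1 = PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * blockAlign, true); // byte rate
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  writeAscii(36, "data");
  view.setUint32(40, dataLength, true);

  // Copy the audio chunks one after another behind the header.
  let position = 44;
  for (const chunk of chunks) {
    out.set(new Uint8Array(chunk), position);
    position += chunk.byteLength;
  }
  return out;
}
```

In Node, the returned Uint8Array could then be written with `fs.writeFileSync`, and the chunk array cleared afterwards if you want one file per recognized event.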

Hope this helps.

GalDayan commented on June 12, 2024

@glharper thanks for your fast response!
I'm wondering whether it would be possible to add an identifier to the WAV files saved through enableAudioLogging, along with an option to filter by that identifier. I feel that synchronization is one of the most important factors when it comes to training my own model to decrease WER.

glharper commented on June 12, 2024

@GalDayan You should be able to match the audio you've saved with the result.offset of each recognized result (the offset gives the time at which the result begins, in 100-nanosecond increments).
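To make the offset matching concrete, here is a small sketch that converts 100-nanosecond ticks into byte positions within the saved audio. It assumes the saved audio is headerless 16 kHz, 16-bit mono PCM covering the stream from its start, and that result.duration is available in the same units; the helper names `ticksToByteOffset` and `sliceForResult` are made up for illustration.

```javascript
// result.offset and result.duration are expressed in 100 ns ticks.
const TICKS_PER_SECOND = 10000000;

// Convert a tick count to a byte offset, rounded down to a whole audio frame.
// Assumption: 16 kHz, mono, 2 bytes per sample unless told otherwise.
function ticksToByteOffset(ticks, sampleRate = 16000, channels = 1, bytesPerSample = 2) {
  const frameSize = channels * bytesPerSample;
  const frames = Math.floor((ticks * sampleRate) / TICKS_PER_SECOND);
  return frames * frameSize;
}

// Return the part of `pcm` (a Uint8Array of raw audio) that covers one result.
function sliceForResult(pcm, offsetTicks, durationTicks) {
  const start = Math.min(ticksToByteOffset(offsetTicks), pcm.byteLength);
  const end = Math.min(ticksToByteOffset(offsetTicks + durationTicks), pcm.byteLength);
  return pcm.subarray(start, end);
}
```

For a recognized event this would be called as `sliceForResult(allAudio, e.result.offset, e.result.duration)`, where `allAudio` is the concatenated audio captured so far. If the connection is re-established mid-session the offsets may no longer line up with the buffer, so treat the mapping as approximate.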

ofekby commented on June 12, 2024

@glharper Hi, thanks for the help. I do have one concern though.
While reading about the connection message class used here (args.message in the code example), I saw that it's marked as not production ready (https://learn.microsoft.com/en-us/javascript/api/microsoft-cognitiveservices-speech-sdk/connectionmessage?view=azure-node-latest). How problematic is that? And does it describe the websocket API, or is it a structure that stays fixed when I lock the SDK version?

glharper commented on June 12, 2024

Excellent question. Answering your second question, the structure can indeed be locked on when locking the version of the SDK. One caveat to that is that the backend service may someday drop support for the SDK version and structure, but I'd expect deprecation communication in the usual manner should that ever happen.

As to the message itself, while the warning is accurate, I can't imagine an iteration of the JS Speech SDK that doesn't send binary audio data to the backend service. Any changes to the binary portion of the message would be...unexpected for the short to medium term.
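Given that the message API carries a stability warning, one option is to read it defensively so that a changed message shape results in skipped audio rather than a thrown error. The sketch below relies only on the isBinaryMessage and binaryMessage fields used earlier in this thread; `extractAudio` is a hypothetical helper, and the check that the payload is an ArrayBuffer is an assumption about the current SDK.

```javascript
// Return the audio payload of a sent message, or null if the message
// does not look like the binary audio message we expect.
function extractAudio(args) {
  const message = args && args.message;
  if (!message || message.isBinaryMessage !== true) {
    return null; // text message, or the message shape has changed
  }
  const payload = message.binaryMessage;
  if (!(payload instanceof ArrayBuffer) || payload.byteLength === 0) {
    return null; // assumption: audio arrives as a non-empty ArrayBuffer
  }
  return payload;
}
```

Inside the messageSent callback, the result can be pushed to the buffer only when it is not null. Logging the null case would surface a change in the message structure after an SDK upgrade.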

ofekby commented on June 12, 2024

@glharper Great! Thank you for the quick reply.
