
Comments (6)

glharper commented on June 12, 2024

@GalDayan Thank you for using the JS Speech SDK and for writing this issue up. Our speech recognition results do not include the audio from which they were generated, and we don't plan to add that feature. To capture the full input audio, you could use the Connection.messageSent callback to grab the audio as it is sent and store it in a buffer, something like:

        // Collect every binary (audio) message the SDK sends to the service.
        const buffer = [];
        const con = speechsdk.Connection.fromRecognizer(recognizer);
        con.messageSent = (args) => {
            if (args.message.isBinaryMessage) {
                buffer.push(args.message.binaryMessage);
            }
        };

You could adapt this to save a WAV file and reset the buffer on every recognized event, but the audio and results wouldn't be perfectly synchronized, given the variable lag between sending the audio and receiving the recognition result.
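A minimal sketch of that adaptation in plain JavaScript. The helper names (`concatChunks`, `pcmToWav`, `UtteranceAudioRecorder`) are hypothetical, and the code assumes the captured `binaryMessage` chunks are raw little-endian PCM in the SDK's default input format (16 kHz, 16-bit, mono); check the bytes your SDK version actually sends before relying on it. It prepends a standard 44-byte RIFF/WAVE header and clears the buffer on each flush.

```javascript
// Hypothetical helpers. ASSUMPTION: chunks are raw little-endian PCM,
// 16 kHz, 16-bit, mono (the SDK's default input format).

// Concatenate a list of ArrayBuffers (or Uint8Arrays) into one Uint8Array.
function concatChunks(chunks) {
  const total = chunks.reduce((sum, c) => sum + c.byteLength, 0);
  const out = new Uint8Array(total);
  let pos = 0;
  for (const c of chunks) {
    out.set(new Uint8Array(c), pos);
    pos += c.byteLength;
  }
  return out;
}

// Prepend a canonical 44-byte RIFF/WAVE header to raw PCM bytes.
function pcmToWav(pcm, sampleRate = 16000, bitsPerSample = 16, channels = 1) {
  const blockAlign = (channels * bitsPerSample) / 8;
  const header = new ArrayBuffer(44);
  const v = new DataView(header);
  const writeAscii = (offset, s) => {
    for (let i = 0; i < s.length; i++) v.setUint8(offset + i, s.charCodeAt(i));
  };
  writeAscii(0, "RIFF");
  v.setUint32(4, 36 + pcm.byteLength, true);      // RIFF chunk size
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  v.setUint32(16, 16, true);                      // fmt chunk size for PCM
  v.setUint16(20, 1, true);                       // audio format 1 = PCM
  v.setUint16(22, channels, true);
  v.setUint32(24, sampleRate, true);
  v.setUint32(28, sampleRate * blockAlign, true); // byte rate
  v.setUint16(32, blockAlign, true);
  v.setUint16(34, bitsPerSample, true);
  writeAscii(36, "data");
  v.setUint32(40, pcm.byteLength, true);          // data chunk size
  const wav = new Uint8Array(44 + pcm.byteLength);
  wav.set(new Uint8Array(header), 0);
  wav.set(pcm, 44);
  return wav;
}

// Buffers audio chunks until flushed, e.g. from a recognized handler.
class UtteranceAudioRecorder {
  constructor() {
    this.chunks = [];
  }
  push(chunk) {
    this.chunks.push(chunk);
  }
  flushToWav() {
    const wav = pcmToWav(concatChunks(this.chunks));
    this.chunks = []; // reset for the next utterance
    return wav;
  }
}

// Possible wiring (requires the Speech SDK; not run here):
// const recorder = new UtteranceAudioRecorder();
// const con = speechsdk.Connection.fromRecognizer(recognizer);
// con.messageSent = (args) => {
//   if (args.message.isBinaryMessage) recorder.push(args.message.binaryMessage);
// };
// recognizer.recognized = (sender, e) => {
//   const wav = recorder.flushToWav(); // only approximately aligned with e.result
// };
```

Because the flush happens when the result arrives rather than when the utterance ends in the audio, each file may include a little audio from the next utterance or miss the tail of the current one.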

Hope this helps.

from cognitive-services-speech-sdk-js.

GalDayan commented on June 12, 2024


@glharper Thanks for your fast response!
I'm wondering whether it would be possible to add an identifier to the WAV files saved by enableAudioLogging, with an option to filter by that identifier. Synchronization is one of the most important factors when I think about training my own model to decrease WER.


glharper commented on June 12, 2024

@GalDayan You should be able to match the audio you've saved to each recognized result using result.offset, which gives the start time of the result in 100-nanosecond units (ticks).
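As a worked example of that matching: one tick is 100 ns, so there are 10,000,000 ticks per second, and at 16 kHz, 16-bit mono (32,000 bytes per second) an offset of 10,000,000 ticks corresponds to byte 32,000 of the PCM stream. The sketch below uses hypothetical helper names and assumes the whole session's audio is kept in one raw PCM buffer (rather than reset per result) whose first byte is where the service starts counting offsets; `result.duration`, also in ticks, marks the end of the slice.

```javascript
// Hypothetical helpers: map a result's offset/duration (100 ns ticks) onto
// a buffer of raw PCM. ASSUMES 16 kHz, 16-bit, mono PCM and that byte 0 of
// the buffer is where the service starts counting offsets.
const TICKS_PER_SECOND = 10_000_000; // 1 tick = 100 ns

function ticksToByteOffset(ticks, sampleRate = 16000, bytesPerSample = 2, channels = 1) {
  // Whole sample frames elapsed; rounding down keeps us on a frame boundary.
  const frames = Math.floor((ticks * sampleRate) / TICKS_PER_SECOND);
  return frames * bytesPerSample * channels;
}

// Return the bytes covering one result, clamped to what was actually captured.
function sliceUtterance(pcm, offsetTicks, durationTicks) {
  const start = ticksToByteOffset(offsetTicks);
  const end = Math.min(ticksToByteOffset(offsetTicks + durationTicks), pcm.byteLength);
  return pcm.subarray(start, end);
}

// Usage inside a recognized handler (requires the Speech SDK; not run here):
// recognizer.recognized = (sender, e) => {
//   const audio = sliceUtterance(fullPcm, e.result.offset, e.result.duration);
// };
```

Since offsets describe positions in the audio rather than when a result arrived, this approach should sidestep the network-lag issue, provided the assumption about where the buffer starts holds.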


ofekby commented on June 12, 2024


@glharper Hi, thanks for the help. I do have one concern, though.
Reading about the ConnectionMessage class used here (args.message in this code example), I see that it's marked as not production ready (https://learn.microsoft.com/en-us/javascript/api/microsoft-cognitiveservices-speech-sdk/connectionmessage?view=azure-node-latest). How problematic is that? And does it describe the WebSocket API, or is it a structure that stays fixed once you lock the SDK version?


glharper commented on June 12, 2024


Excellent question. To answer your second question first: yes, the structure stays fixed once you lock the SDK version. One caveat is that the backend service may someday drop support for that SDK version and its message structure, but I'd expect deprecation to be communicated through the usual channels should that ever happen.

As to the message itself, while the warning is accurate, I can't imagine an iteration of the JS Speech SDK that doesn't send binary audio data to the backend service. Any change to the binary portion of the message would be...unexpected in the short to medium term.


ofekby commented on June 12, 2024


@glharper Great, thank you for the quick reply!

