Comments (6)
@GalDayan Thank you for using the JS Speech SDK and for writing this issue up. Our speech recognition results do not include the audio from which each result was generated, nor do we plan to add that feature. To get the full input audio, you could use the Connection.messageSent callback to grab the audio as it is sent and store it in a buffer, something like:
// Collects every binary (audio) message the SDK sends to the service.
let buffer = [];
const con = speechsdk.Connection.fromRecognizer(recognizer);
con.messageSent = (args) => {
    if (args.message.isBinaryMessage) {
        buffer.push(args.message.binaryMessage);
    }
};
You could adapt this to save a wav file and reset the buffer on every recognized event, but the results wouldn't be perfectly synchronized, given the (non-constant) lag between sending the audio and receiving the recognition result.
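As a rough sketch of the "save a wav file" step, the helper below joins the captured chunks and prepends a standard 44-byte RIFF/WAVE header. The name buildWav and the default format (16 kHz, mono, 16-bit PCM) are assumptions for illustration, not something the SDK provides. It also assumes the chunks are raw PCM; it is worth checking whether the first captured chunk already carries a RIFF header, in which case that chunk would need to be skipped.

```javascript
// Hypothetical helper: joins raw PCM chunks (ArrayBuffers) into one WAV file image.
// The default format values are assumptions; adjust them to the actual input audio.
function buildWav(chunks, sampleRate = 16000, channels = 1, bitsPerSample = 16) {
  const dataLength = chunks.reduce((sum, chunk) => sum + chunk.byteLength, 0);
  const out = new Uint8Array(44 + dataLength);
  const view = new DataView(out.buffer);
  const writeAscii = (offset, text) => {
    for (let i = 0; i < text.length; i++) {
      out[offset + i] = text.charCodeAt(i);
    }
  };
  const blockAlign = channels * (bitsPerSample / 8);

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataLength, true);          // file size minus 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true);                      // size of the fmt chunk
  view.setUint16(20, 1, true);                       // format tag 1 = PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * blockAlign, true); // bytes per second
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  writeAscii(36, "data");
  view.setUint32(40, dataLength, true);

  // Copy the audio payloads in the order they were captured.
  let position = 44;
  for (const chunk of chunks) {
    out.set(new Uint8Array(chunk), position);
    position += chunk.byteLength;
  }
  return out;
}
```

In Node, a recognized handler could then call something like fs.writeFileSync("utterance.wav", buildWav(buffer)) and reset buffer to an empty array, where the file name is only a placeholder.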
Hope this helps.
from cognitive-services-speech-sdk-js.
@glharper thanks for your fast response!
I'm wondering whether it would be possible to add an identifier to the WAV files saved by enableAudioLogging, along with an option to filter by that identifier. I feel that synchronization is one of the most important things when it comes to training my own model to decrease WER.
@GalDayan You should be able to match the audio you've saved against result.offset for each recognized result. The offset gives the time at which that result begins, measured in 100-nanosecond increments.
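To make the offset matching concrete, here is a minimal sketch that converts an offset and a duration, both in 100-nanosecond ticks, into a byte range within a raw PCM buffer. The function name ticksToByteRange and the default audio format (16 kHz, mono, 16-bit) are assumptions for illustration. It also assumes the result exposes a duration in the same units as its offset, and that the saved buffer starts at the same point in time as the recognizer's audio stream.

```javascript
const TICKS_PER_SECOND = 10000000; // one tick is 100 nanoseconds

// Hypothetical helper: maps a tick-based time span to byte positions in raw PCM.
// Positions are rounded down to whole sample frames so slices stay aligned.
function ticksToByteRange(offsetTicks, durationTicks, sampleRate = 16000, channels = 1, bitsPerSample = 16) {
  const blockAlign = channels * (bitsPerSample / 8);
  const toBytes = (ticks) =>
    Math.floor((ticks * sampleRate) / TICKS_PER_SECOND) * blockAlign;
  return {
    start: toBytes(offsetTicks),
    end: toBytes(offsetTicks + durationTicks),
  };
}

// Example: a result that begins 1 s into the stream and lasts 0.5 s.
const range = ticksToByteRange(10000000, 5000000);
console.log(range.start, range.end);
```

With the whole session's PCM held in a Uint8Array, pcm.subarray(range.start, range.end) would then be the audio for that one result, provided the assumptions above hold.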
@glharper Hi, thanks for the help. I do have one concern though.
When reading about the connection class being used (args.message in this code example), I can see that it's not production ready (https://learn.microsoft.com/en-us/javascript/api/microsoft-cognitiveservices-speech-sdk/connectionmessage?view=azure-node-latest). How problematic is that? And is it describing the websocket API, or is it a structure I can rely on if I lock the SDK version?
Excellent question. To answer your second question first: yes, the structure can be relied on if you lock the version of the SDK. One caveat is that the backend service may someday drop support for that SDK version and its message structure, but I'd expect deprecation to be communicated in the usual manner should that ever happen.
As to the message itself, while the warning is accurate, I can't imagine an iteration of the JS Speech SDK that doesn't send binary audio data to the backend service. Any changes to the binary portion of the message would be...unexpected for the short to medium term.
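Given that warning, one option is to guard the callback so that an unexpected message shape is skipped instead of throwing. The sketch below is only an illustration: extractBinaryPayload is a hypothetical helper name, and it relies on nothing beyond the isBinaryMessage and binaryMessage properties already used earlier in this thread.

```javascript
// Hypothetical guard: returns the audio payload of a sent message, or null when
// the message does not have the binary shape this thread relies on.
function extractBinaryPayload(message) {
  if (!message || message.isBinaryMessage !== true) {
    return null;
  }
  const payload = message.binaryMessage;
  return payload instanceof ArrayBuffer ? payload : null;
}
```

Inside messageSent, the handler would push the returned value onto the buffer only when it is not null, so a future change to the message shape would show up as missing audio and not as a crash.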
@glharper Great! Thank you for the quick reply.