Comments (3)
Hey, thanks for reaching out!
The Speech SDK needs to be told the format of the audio being sent if it is not 16-bit, 16 kHz mono.
There's a sample class here that will read the header information from a wave file and write the file to a push stream.
You can see it used here.
Please let me know if you need anything else.
from cognitive-services-speech-sdk-js.
Hi @rhurey,
thanks for your reply!
Unfortunately, the sample code you posted does not work with WAV files whose headers were created in "wave_format_extended".
I will try changing the code to strip those headers and push the raw audio bytes directly to the SDK with a hard-coded format.
In the meantime, what formats does the Azure Speech Service accept?
In the examples and tests in the SDK source code, the audio files are 16 kHz, 16-bit, and the online speech tool likewise converts files to 16 kHz, 16-bit. Is this the only format the Azure Speech Recognition service accepts?
Thanks!
Updates:
The service successfully recognized my voice when I sent the raw audio with the extended headers stripped and specified the actual format of the audio file (48 kHz, 32 bits per sample, PCM).
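For anyone who lands here with the same problem, here is a rough sketch of the header handling, not the SDK's own sample class. It assumes a standard RIFF/WAVE layout and walks the chunks instead of expecting a fixed 44-byte header, so the longer extensible "fmt " chunk is skipped correctly. `parseWavHeader` and `WavInfo` are hypothetical names of mine, not part of the Speech SDK.

```typescript
// Hypothetical helper (not part of the Speech SDK): locate the audio format
// and the start of the sample data in a RIFF/WAVE byte buffer.
interface WavInfo {
  formatTag: number;        // 1 = PCM, 3 = IEEE float
  channels: number;
  samplesPerSecond: number;
  bitsPerSample: number;
  dataOffset: number;       // byte offset of the first audio sample
  dataLength: number;       // number of audio bytes available
}

function parseWavHeader(bytes: Uint8Array): WavInfo {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  // Read a four-character chunk identifier such as "RIFF" or "data".
  const tag = (o: number): string =>
    String.fromCharCode(
      view.getUint8(o), view.getUint8(o + 1), view.getUint8(o + 2), view.getUint8(o + 3));

  if (bytes.byteLength < 12 || tag(0) !== "RIFF" || tag(8) !== "WAVE") {
    throw new Error("Not a RIFF/WAVE file");
  }

  let fmt: Omit<WavInfo, "dataOffset" | "dataLength"> | undefined;
  let offset = 12; // first chunk follows the 12-byte RIFF header

  while (offset + 8 <= bytes.byteLength) {
    const id = tag(offset);
    const size = view.getUint32(offset + 4, true);
    const body = offset + 8;

    if (id === "fmt ") {
      let formatTag = view.getUint16(body, true);
      // 0xFFFE marks the extensible layout: the real format tag is the
      // first two bytes of the sub-format GUID, 24 bytes into the chunk.
      if (formatTag === 0xfffe && size >= 40) {
        formatTag = view.getUint16(body + 24, true);
      }
      fmt = {
        formatTag,
        channels: view.getUint16(body + 2, true),
        samplesPerSecond: view.getUint32(body + 4, true),
        bitsPerSample: view.getUint16(body + 14, true),
      };
    } else if (id === "data") {
      if (!fmt) throw new Error("data chunk found before fmt chunk");
      const dataLength = Math.min(size, bytes.byteLength - body);
      return { ...fmt, dataOffset: body, dataLength };
    }
    // Chunks are padded to an even number of bytes.
    offset = body + size + (size % 2);
  }
  throw new Error("No data chunk found");
}
```

If I read the SDK correctly, the parsed values can then go to `AudioStreamFormat.getWaveFormatPCM(samplesPerSecond, bitsPerSample, channels)`, and only the bytes from `dataOffset` onward are written to the push stream created with that format.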
Thanks for your time!