I'm having trouble using VAD. To use VAD, I set the setting to True and ran the se


Using VAD, about ufal/whisper_streaming

donaldos commented on July 4, 2024

I solved this problem.
The cause was the versions of numpy and librosa in my Python environment.
Thank you.

from whisper_streaming.

Gldkslfmsd commented on July 4, 2024

@koiking213 -- can you investigate what conditions trigger the error? Is it possible that, if VAD filters out everything, the input tensor is empty and is auto-detected as the wrong format?
Maybe you can catch and ignore the exception.


Gldkslfmsd commented on July 4, 2024

Great. Can you please share the fix? Either copy and paste the change as a comment, or make a PR.
Others may benefit from it.


Gldkslfmsd commented on July 4, 2024

Hi,
I'm sorry, I don't know how to help. Can you specify which backend you use: whisper_timestamped or faster-whisper? Are you sure you installed it correctly? Can you run Whisper in offline mode with VAD using only that backend's own code? If not, you can ask its authors for help. If yes, it should work here too.


donaldos commented on July 4, 2024

Thank you for your response.
I used faster-whisper. With args.vad=false, file recognition works well. However, with args.vad=true, the following error occurs.

"onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Unexpected input data type. Actual: (tensor(double)) , expected: (tensor(float))"

Based on this error, it seems to be a data type issue, particularly in numpy-related functions.
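One way a double tensor can appear is NumPy type promotion. The following is only a minimal sketch, not code from whisper_streaming: the names `buffer` and `chunk` are hypothetical stand-ins, and it assumes a float32 buffer receives a float64 chunk.

```python
import numpy as np

# Hypothetical stand-ins for illustration; not taken from whisper_streaming.
buffer = np.array([], dtype=np.float32)   # audio buffer created as float32
chunk = np.zeros(4, dtype=np.float64)     # incoming chunk that arrived as double

# Appending float64 data to a float32 array promotes the result to float64.
merged = np.append(buffer, chunk)
print(merged.dtype)  # float64
```

Printing the dtype of the audio array just before it reaches the VAD model is a quick way to check whether this is what happens in a given setup.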

Thanks for your help.


koiking213 commented on July 4, 2024

@donaldos
I have the same problem now.
Could you share the specific versions of numpy and librosa?


koiking213 commented on July 4, 2024

As you said, VAD filtered out everything, so the initial float32 data was cleared, and the new audio data was then added as double.
I solved this problem by specifying the dtype of the audio in receive_audio_chunk. (I think the default dtype depends on the librosa version.)
Thank you for your response!
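
For anyone hitting the same error, the idea can be sketched roughly as below. This is not the actual patch: `to_float32` is a hypothetical helper, and the assumption is simply that each chunk is cast to float32 before it is added to the audio buffer.

```python
import numpy as np

def to_float32(audio):
    """Return the audio samples as a float32 NumPy array (hypothetical helper)."""
    # np.asarray avoids a copy when the input is already a float32 array.
    return np.asarray(audio, dtype=np.float32)

# Usage: cast a chunk that arrived as double before appending it.
buffer = np.array([], dtype=np.float32)
chunk = np.zeros(4, dtype=np.float64)
buffer = np.append(buffer, to_float32(chunk))
print(buffer.dtype)  # float32
```

Casting at the point where audio is received keeps every later step on a single dtype, whatever the loading library returns by default.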


henriklied commented on July 4, 2024

I'm still getting this error even after pulling the latest changes. Any ideas for another workaround that might work?


Gldkslfmsd commented on July 4, 2024

Hi, @henriklied. Can you check your librosa version and maybe try the latest one? The error may depend on it.

Or can you give a minimal reproducible example? A short audio file, setup, dependency versions, ...


Using VAD about whisper_streaming HOT 9 CLOSED
