Using VAD (whisper_streaming issue, closed)

ufal commented on July 4, 2024
Using VAD

from whisper_streaming.

Comments (9)

donaldos commented on July 4, 2024

Dear...
I solved this problem.
The cause was the Python numpy version and the librosa library.
Thank you.

Gldkslfmsd commented on July 4, 2024

@koiking213 -- can you investigate what conditions trigger the error? Is it possible that, if VAD filters out everything, the input tensor is empty and is auto-detected as the wrong format?
Maybe you can catch and ignore the exception.

Gldkslfmsd commented on July 4, 2024

Great. Can you please share the fix? Either paste the change as a comment or open a PR.
Others may benefit from it.

Gldkslfmsd commented on July 4, 2024

Hi,
I'm sorry, I don't know how to help. Can you specify which backend you use: whisper_timestamped or faster-whisper? Are you sure you installed it correctly? Can you run Whisper in offline mode with VAD using only the backend's own code? If not, you can ask its authors for help. If yes, it should work here too.

donaldos commented on July 4, 2024

Thank you for your response.
I used faster-whisper. With args.vad=false, file recognition works well. However, with args.vad=true, the error below occurs.

"onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Unexpected input data type. Actual: (tensor(double)) , expected: (tensor(float))"

Based on the above error, it seems to be a data type issue, especially in the numpy-related functions.

Thanks for your help.
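
For readers hitting the same message: "double" is NumPy's float64 and "float" is float32. The following is a minimal sketch of how a float32 audio buffer can silently become float64 under NumPy's type promotion rules. It only illustrates NumPy behavior and is not taken from the whisper_streaming code; the variable names are made up for the example.

```python
import numpy as np

# An array created without an explicit dtype defaults to float64 ("double").
empty_default = np.array([])

# A buffer that starts out as float32 ("float") ...
buf32 = np.zeros(3, dtype=np.float32)

# ... is promoted to float64 as soon as float64 samples are appended to it.
mixed = np.append(buf32, np.zeros(2, dtype=np.float64))

print(empty_default.dtype)  # float64
print(mixed.dtype)          # float64
```

Printing the dtype of the array just before it reaches the VAD model is a quick way to confirm whether this is what happens in a given setup.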

koiking213 commented on July 4, 2024

@donaldos
I have the same problem now.
Could you share the specific versions of numpy and librosa?

koiking213 commented on July 4, 2024

As you said, VAD filtered out everything, so the initial float32 data was cleared and the new audio data was then added as double.
I solved this problem by specifying the dtype of the audio in receive_audio_chunk. (I think the default dtype depends on the librosa version.)
Thank you for your response!
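
A minimal sketch of that kind of fix, assuming the audio is held in a NumPy array: cast every incoming chunk to float32 before it is appended, so the buffer keeps its dtype even after it has been emptied. The helper name `append_chunk_float32` is hypothetical and this is not the actual change made to receive_audio_chunk.

```python
import numpy as np

def append_chunk_float32(buffer: np.ndarray, chunk) -> np.ndarray:
    """Return buffer + chunk as one float32 array (hypothetical helper)."""
    # Force the incoming samples to float32, whatever dtype the loader produced.
    chunk32 = np.asarray(chunk, dtype=np.float32)
    # Also normalize the existing buffer in case it was recreated as float64.
    buffer32 = buffer.astype(np.float32, copy=False)
    return np.append(buffer32, chunk32)

# Usage: an emptied buffer followed by a float64 chunk stays float32.
audio_buffer = np.array([], dtype=np.float32)
new_chunk = np.zeros(4)  # float64 by default
audio_buffer = append_chunk_float32(audio_buffer, new_chunk)
print(audio_buffer.dtype)  # float32
```

Casting at the point where audio enters the buffer keeps the fix independent of whichever default dtype the loading library happens to use.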

henriklied commented on July 4, 2024

I'm still getting this error even after pulling the latest changes. Any ideas for another workaround that might work?

Gldkslfmsd commented on July 4, 2024

Hi, @henriklied. Can you check your librosa version and maybe try the latest one? The error may depend on it.

Or can you give a minimal reproducible example? A short audio file, your setup, dependency versions, ...
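
One way to collect the dependency versions for such a report is a short script using only the standard library. This is a sketch: the package list is an assumption based on the libraries mentioned in this thread, and the function name `report_versions` is made up for the example.

```python
from importlib import metadata

def report_versions(packages=("numpy", "librosa", "faster-whisper", "onnxruntime")):
    """Map each distribution name to its installed version, if any."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            versions[name] = "not installed"
    return versions

if __name__ == "__main__":
    for name, version in report_versions().items():
        print(f"{name}: {version}")
```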
