Comments (12)

joaogabrieljunq commented on June 21, 2024

> OK, I checked it. Insanely Fast Whisper is just a wrapper around Hugging Face Transformers. An example of batching usage is huggingface/transformers#27658.
>
> This https://github.com/pe-trik/transformers/blob/online_decode/examples/pytorch/online-decoding/whisper-online-demo.py shows forced decoding.
>
> So these are the starting points for working on this issue. I might get to it in a few weeks, but anybody is welcome to pick it up :)

Any news about this implementation, or any findings we could use to start working on it? I am trying to build a multi-client server, and batching would let me run more than one transcription on the same instance.

Gldkslfmsd commented on June 21, 2024

Wow, great!
No news, except that batching has also become my priority :) Let's cooperate. I want to start later this week. My first step will be a Jupyter notebook where I'll quickly inspect and prototype. It will be messy. Then I'll isolate the working solution into this repo.

The easiest use case for batching is decoding the same audio twice: the whole buffer, plus the whole buffer minus the last chunk.
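For concreteness, a minimal sketch of that use case with the Hugging Face ASR pipeline as the backend (the model name, chunk length, and the synthetic 16 kHz buffer are placeholders, not what whisper_streaming does today):

```python
import numpy as np
from transformers import pipeline

SR = 16000      # Whisper expects 16 kHz mono audio
CHUNK_S = 1.0   # length of the "last chunk" dropped in the second variant

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-small",  # placeholder model
    device=0,                      # GPU 0; use device=-1 for CPU
)

# Stand-in for the streaming audio buffer (10 s of noise).
audio_buffer = np.random.randn(10 * SR).astype(np.float32)

# Two variants of the same buffer: the whole thing, and the whole thing minus the last chunk.
batch = [
    {"raw": audio_buffer, "sampling_rate": SR},
    {"raw": audio_buffer[: -int(CHUNK_S * SR)], "sampling_rate": SR},
]

# batch_size=2 makes the pipeline run both variants through the model together.
results = asr(batch, batch_size=2)
for r in results:
    print(r["text"])
```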

joaogabrieljunq commented on June 21, 2024

Hello again @Gldkslfmsd, nice to know that you are making progress on the batch implementation research! I also spent yesterday researching possible implementations for this. I found WhisperS2T, which seems to support dynamic time lengths in batch inference, helping with the padding problem you mentioned above. Perhaps this could help as well: https://github.com/shashikg/WhisperS2T/blob/main/whisper_s2t/backends/ctranslate2/model.py

vidalfer commented on June 21, 2024

Hi! There's an implementation that supports batch inference: https://github.com/Vaibhavs10/insanely-fast-whisper
I'm not sure if it can be easily integrated into the whisper_streaming project.

Gldkslfmsd commented on June 21, 2024

> Hi! There's an implementation that supports batch inference: https://github.com/Vaibhavs10/insanely-fast-whisper
>
> I'm not sure if it can be easily integrated into the whisper_streaming project.

Yes, me neither. I would need a pointer to the function that takes two audio samples and processes them at once.

Gldkslfmsd commented on June 21, 2024

OK, I checked it. Insanely Fast Whisper is just a wrapper around Hugging Face Transformers. An example of batching usage is huggingface/transformers#27658.

This https://github.com/pe-trik/transformers/blob/online_decode/examples/pytorch/online-decoding/whisper-online-demo.py shows forced decoding.

So these are the starting points for working on this issue. I might get to it in a few weeks, but anybody is welcome to pick it up :)
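For reference, the lower-level batched call with plain transformers looks roughly like this (a sketch assuming a recent transformers version; the model name and the random audio buffers are placeholders, and the forced-decoding part from the linked demo is not shown):

```python
import numpy as np
import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration

SR = 16000
processor = WhisperProcessor.from_pretrained("openai/whisper-small")   # placeholder model
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small").to("cuda")

# Two arbitrary audio buffers; the feature extractor pads/truncates each to 30 s of log-mel features.
audios = [
    np.random.randn(8 * SR).astype(np.float32),
    np.random.randn(12 * SR).astype(np.float32),
]
inputs = processor(audios, sampling_rate=SR, return_tensors="pt")
input_features = inputs.input_features.to("cuda")

# One batched generate call transcribes both buffers at once.
with torch.no_grad():
    generated_ids = model.generate(input_features, language="en", task="transcribe")

print(processor.batch_decode(generated_ids, skip_special_tokens=True))
```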

joaogabrieljunq commented on June 21, 2024

Sure, let's cooperate! My question is: decoding the same audio twice is for the speedup use case, right? I saw you mention multi-client support in #42; would a decoding + batching backend API be needed to parallelize multiple audios on the GPU? I could try to work on this batching backend layer using the whisper_streaming source code.

Gldkslfmsd commented on June 21, 2024

> Sure, let's cooperate! My question is: decoding the same audio twice is for the speedup use case, right?

Yes. Just be aware that batching multiple audios can result in a slowdown. There will be independent audio buffers of different lengths. You need to pad the audio inputs to the longest one, and the processing time is then the same as for the longest. So you gain efficiency, but lose some speed.
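A toy illustration of that padding cost (the buffer lengths are made up):

```python
import numpy as np

SR = 16000
# Independent per-client buffers of different lengths (3 s, 7 s, and 12 s).
buffers = [np.zeros(int(s * SR), dtype=np.float32) for s in (3, 7, 12)]

# Pad every buffer with trailing zeros to the longest one so they can be stacked into one batch.
max_len = max(len(b) for b in buffers)
batch = np.stack([np.pad(b, (0, max_len - len(b))) for b in buffers])

print(batch.shape)  # (3, 192000): every row now costs as much compute as the 12 s buffer
```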

Gldkslfmsd commented on June 21, 2024

So, how's your progress, @joaogabrieljunq?
I found this today: https://github.com/m-bain/whisperX/blob/main/whisperx/asr.py . They know how to use batching with faster-whisper, and I hope I can reuse that code. I also found that Hugging Face Transformers enables batching with Whisper, but most probably not with word-level timestamps, and those are really necessary for Whisper-Streaming.
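For the record, whisperX usage is roughly the following (a sketch based on its README, so treat the exact calls as an assumption; the file name and batch size are placeholders). The word-level timestamps come from a separate forced-alignment model, not from the batched Whisper pass itself:

```python
import whisperx

device = "cuda"
model = whisperx.load_model("large-v2", device, compute_type="float16")

audio = whisperx.load_audio("audio.wav")          # placeholder file
result = model.transcribe(audio, batch_size=16)   # batched faster-whisper decoding

# Word-level timestamps via a separate forced-alignment model.
align_model, metadata = whisperx.load_align_model(language_code=result["language"], device=device)
result = whisperx.align(result["segments"], align_model, metadata, audio, device)

for seg in result["segments"]:
    print(seg["start"], seg["end"], seg["text"])
```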

SalomonKisters commented on June 21, 2024

Any news on this matter?

orgh0 commented on June 21, 2024

Any update on batching?

Gldkslfmsd commented on June 21, 2024

No, unfortunately it's not among my priorities anymore.
