Comments (12)
Any news on this implementation, or any findings we could build on? I am trying to build a multi-client server, and batching would be useful for running more than one transcription on the same instance.
Wow, great!
No news, except that batching has also become my priority :) Let's cooperate. I want to start later this week. My first step will be a Jupyter notebook where I'll quickly inspect and prototype. It will be messy. Then I'll isolate the working solution into this repo.
The easiest use case for batching is to decode the same audio twice: the whole buffer, and the whole buffer minus the last chunk.
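To make the "same audio twice" idea concrete, here is a minimal sketch of how such a two-row batch could be assembled. It is only an illustration, not code from this repo: the function name `build_double_batch`, the zero-padding of the shorter row, and the example chunk length are all assumptions, and the actual model call is left out.

```python
from typing import List, Sequence

SAMPLE_RATE = 16000  # Whisper models expect 16 kHz audio


def build_double_batch(buffer: Sequence[float], last_chunk_samples: int) -> List[List[float]]:
    """Hypothetical helper: return two equal-length inputs, the whole buffer
    and the buffer without its last chunk (zero-padded to the full length)."""
    if not 0 < last_chunk_samples < len(buffer):
        raise ValueError("last chunk must be non-empty and shorter than the buffer")
    full = list(buffer)
    shorter = full[:-last_chunk_samples]
    # Pad the shorter input with silence so both rows have the same length.
    padded = shorter + [0.0] * (len(full) - len(shorter))
    return [full, padded]


# Example: a 3-second buffer whose last chunk is 1 second long.
buffer = [0.1] * (3 * SAMPLE_RATE)
batch = build_double_batch(buffer, last_chunk_samples=SAMPLE_RATE)
print(len(batch), len(batch[0]), len(batch[1]))
```

Both rows could then be passed to a batching backend in a single call, so that the two hypotheses come back together instead of from two separate decoding passes.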
Hello again @Gldkslfmsd, nice to know that you are making progress on the batching research! I also spent yesterday looking into possible implementations. I found WhisperS2T, which seems to support dynamic time lengths in batch inference and so helps with the padding problem that you mentioned above. Perhaps this could help too. https://github.com/shashikg/WhisperS2T/blob/main/whisper_s2t/backends/ctranslate2/model.py
Hi! There's an implementation that supports batch inference: https://github.com/Vaibhavs10/insanely-fast-whisper
I'm not sure if it can be easily implemented in the Whisper streaming project
Hi! There's an implementation that supports batch inference: https://github.com/Vaibhavs10/insanely-fast-whisper
I'm not sure if it can be easily implemented in the Whisper streaming project
Yes, me neither. I would need a pointer to the function that takes two audio samples and processes them at once.
OK, I checked it. The Insanely Fast Whisper is just a wrapper of Huggingface Transformers. The example usage of batching is huggingface/transformers#27658 .
This https://github.com/pe-trik/transformers/blob/online_decode/examples/pytorch/online-decoding/whisper-online-demo.py shows the forced decoding.
So these are the initial points to work on this issue. I might do it in a few weeks, but anybody can go on :)
Sure, let's cooperate! My question is: decoding the same audio twice is for the speedup use case, right? I saw that you mentioned multi-client support in #42. Would it be necessary to have a decoding + batching backend API to parallelize multiple audio streams on the GPU? I could try to work on this batching backend layer using the whisper-streaming source code.
Sure, let's cooperate! My question is: decoding the same audio twice is for the speedup use case, right?
Yes. Just be aware that batching multiple audio streams can result in a slowdown. There will be independent audio buffers of different lengths. You need to pad every audio input to the length of the longest one, so the processing time for the whole batch is that of the longest buffer. So you gain efficiency, but lose some speed.
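The padding cost described above can be illustrated with a small sketch. It is an assumption-laden toy, not code from this repo: `pad_to_longest` is a hypothetical helper, the buffers are plain Python lists of samples, and the 2 s / 5 s / 10 s client buffers are made-up example values.

```python
from typing import List, Sequence, Tuple

SAMPLE_RATE = 16000  # Whisper models expect 16 kHz audio


def pad_to_longest(buffers: Sequence[Sequence[float]]) -> Tuple[List[List[float]], float]:
    """Hypothetical helper: zero-pad every buffer to the longest one.

    Returns the padded batch and the fraction of the batch that is padding."""
    if not buffers:
        raise ValueError("need at least one buffer")
    longest = max(len(b) for b in buffers)
    if longest == 0:
        raise ValueError("all buffers are empty")
    batch = [list(b) + [0.0] * (longest - len(b)) for b in buffers]
    real_samples = sum(len(b) for b in buffers)
    padding_fraction = 1.0 - real_samples / (longest * len(buffers))
    return batch, padding_fraction


# Example: three clients with 2 s, 5 s and 10 s of buffered audio.
client_buffers = [[0.1] * (seconds * SAMPLE_RATE) for seconds in (2, 5, 10)]
batch, wasted = pad_to_longest(client_buffers)
print(f"rows: {len(batch)}, row length: {len(batch[0])}, padding: {wasted:.0%}")
```

Every row ends up as long as the 10-second buffer, so the client with only 2 seconds of audio waits for a batch sized by the longest input.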
So, how's your progress, @joaogabrieljunq ?
I found this today: https://github.com/m-bain/whisperX/blob/main/whisperx/asr.py They know how to use batching with faster-whisper. I hope I can reuse this code. I also found that Hugging Face Transformers enables batching with Whisper, but most probably not with word-level timestamps, and those are really necessary for Whisper-Streaming.
Any news on this matter?
Any update on batching?
No. Unfortunately, it's not among my priorities anymore.