Comments (5)
Wow, thank you, @rodrigoGA! This is very interesting feedback. I want to review and test your approach and possibly merge the useful parts later, when I have time.
Thanks!
from whisper_streaming.
If the suggestion is integrated, I would also suggest changing how the translation is returned. Every streaming system indicates in some way whether a result is partial or final. The contents of the buffer could then be returned as a partial result, giving the user more realistic feedback on what is being said, with the understanding that a partial result can still change.
Yes, an option for |||-separated partial output is possible. But in any case, I don't want a more complicated output protocol. Plaintext is enough.
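For illustration, a |||-separated plaintext line could look something like the sketch below. The exact layout (confirmed text, then `|||`, then the still-unconfirmed partial text) is an assumption; the thread does not specify it, and `format_line` and `parse_line` are hypothetical helpers, not part of whisper_streaming.

```python
from typing import Tuple

# Assumed delimiter, taken from the "|||-separated" wording above.
SEPARATOR = "|||"


def format_line(confirmed: str, partial: str = "") -> str:
    """Join confirmed and unconfirmed text into one plaintext line."""
    confirmed, partial = confirmed.strip(), partial.strip()
    if not partial:
        # Nothing unconfirmed: the line stays ordinary plaintext.
        return confirmed
    return f"{confirmed} {SEPARATOR} {partial}".strip()


def parse_line(line: str) -> Tuple[str, str]:
    """Split a line into (confirmed, partial); partial is '' if absent."""
    confirmed, _sep, partial = line.partition(SEPARATOR)
    return confirmed.strip(), partial.strip()


if __name__ == "__main__":
    line = format_line("hello world", "how are")
    print(line)
    print(parse_line(line))
```

A client that ignores partial output can simply drop everything after the separator, so the stream stays readable as plain text.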
I understand the idea of keeping it simple. However, this is the standard in streaming ASR. You can check how Nvidia uses 'is_final' for all streaming models supported by the Riva platform (https://docs.nvidia.com/deeplearning/riva/user-guide/docs/reference/protos/protos.html#_CPPv428SpeechRecognitionAlternative), or how companies that sell the model as a service expose it in their streaming APIs (https://www.assemblyai.com/docs/guides/real-time-streaming-transcription).
All of them use the same concept. As a consumer of these services, I can tell you that it is very useful for knowing when the user is speaking and for getting feedback on what is happening before the transcription has finished. Imagine you want to use ASR in a real-world use case, for example, transcribing a phone call. You would need to know when the user stops speaking and when the transcription of that utterance is finished in order to do something with the text. Otherwise, you would have to wait until the call ends to consider the transcription complete, which would defeat the purpose of real-time processing.
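A minimal sketch of how a consumer might use such a flag is shown below. `StreamingResult` and `consume` are hypothetical names for illustration only; they are not the Riva, AssemblyAI, or whisper_streaming API, and only the `is_final` field name is borrowed from the discussion above.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class StreamingResult:
    """Hypothetical result type: one hypothesis plus a partial/final flag."""
    text: str
    is_final: bool


def consume(results: Iterable[StreamingResult],
            on_final: Callable[[str], None]) -> str:
    """Pass each final result to on_final and return the last pending partial.

    Partial results only update the live display text, because they may
    still change. A final result triggers the downstream action at once.
    """
    live_text = ""
    for result in results:
        if result.is_final:
            on_final(result.text)   # e.g. hand the utterance to a dialog system
            live_text = ""          # the utterance is closed, nothing pending
        else:
            live_text = result.text  # overwrite: newer partial replaces older
    return live_text


if __name__ == "__main__":
    stream = [
        StreamingResult("hel", False),
        StreamingResult("hello there", False),
        StreamingResult("Hello there.", True),
        StreamingResult("how", False),
    ]
    finals = []
    pending = consume(stream, finals.append)
    print(finals, repr(pending))
```

In the phone-call scenario, `on_final` is where the application would act on a finished utterance, without waiting for the call to end.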
@rodrigoGA, thank you very much again. I integrated your VAC in https://github.com/ufal/whisper_streaming/tree/vad-streaming. It seems to work well, but the code needs to be reviewed and made clearer and simpler. Then I can merge it.