xenova / whisper-web
ML-powered speech recognition directly in your browser
Home Page: https://hf.co/spaces/Xenova/whisper-web
License: MIT License
There is a missing line between paragraphs of audio, and transcription of single words or short audio clips is not happening. Please improve this: it gets stuck on some words within a paragraph and loops on them. Autocorrection and filler sentences are also being added; please transcribe word by word so the output is more accurate. Whatever the user speaks should appear as text, rather than auto-generated text.
Hi Xenova, great work with this repo. Do you know if it's possible to get word-level timestamps with this? I know it's possible if I'm running Whisper in the terminal, but I'm not sure if that functionality extends to this browser/Hugging Face version, and I don't know how to find out. If you're not sure, feel free to let me know and close the issue.
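For reference, this appears to be possible through transformers.js itself, assuming its `return_timestamps: 'word'` option (which requires a Whisper model exported with attention outputs, e.g. the `output_attentions` revision of `Xenova/whisper-tiny.en`). A minimal sketch; the pipeline usage is shown in comments, and the `formatWordTimestamps` helper is illustrative, not part of this repo:

```javascript
// Sketch, assuming transformers.js's word-level timestamp support:
//
//   import { pipeline } from '@xenova/transformers';
//   const transcriber = await pipeline(
//     'automatic-speech-recognition',
//     'Xenova/whisper-tiny.en',
//     { revision: 'output_attentions' },
//   );
//   const output = await transcriber(audio, { return_timestamps: 'word' });
//   // output.chunks: [{ text: ' Hello', timestamp: [0.0, 0.52] }, ...]

// Pure helper: render word chunks as "start-end: word" lines.
function formatWordTimestamps(chunks) {
  return chunks
    .map(({ text, timestamp: [start, end] }) =>
      `${start.toFixed(2)}-${end.toFixed(2)}:${text}`)
    .join('\n');
}
```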
Please add an SRT export/download button to the Hugging Face demo. Thanks!
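SRT can be generated client-side from the timestamped chunks the app already produces. A minimal sketch; the chunk shape `{ text, timestamp: [start, end] }` follows transformers.js output, and `toSrtTime`/`toSrt` are hypothetical helpers, not part of this repo:

```javascript
// Convert seconds to an SRT timestamp, e.g. 3.5 -> "00:00:03,500".
function toSrtTime(seconds) {
  const ms = Math.round(seconds * 1000);
  const h = String(Math.floor(ms / 3600000)).padStart(2, '0');
  const m = String(Math.floor(ms / 60000) % 60).padStart(2, '0');
  const s = String(Math.floor(ms / 1000) % 60).padStart(2, '0');
  const frac = String(ms % 1000).padStart(3, '0');
  return `${h}:${m}:${s},${frac}`;
}

// Build an SRT document from transcription chunks.
function toSrt(chunks) {
  return chunks
    .map(({ text, timestamp: [start, end] }, i) =>
      `${i + 1}\n${toSrtTime(start)} --> ${toSrtTime(end)}\n${text.trim()}\n`)
    .join('\n');
}

// In the browser, a download button could then do (sketch):
// const blob = new Blob([toSrt(chunks)], { type: 'text/plain' });
// const a = document.createElement('a');
// a.href = URL.createObjectURL(blob);
// a.download = 'transcript.srt';
// a.click();
```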
Hey @xenova,
I couldn't find any license in this repo and wanted to ask whether this project can be used commercially (e.g. under the MIT License).
Thanks in advance!
It would be great if there were a minimal vanilla JS example of how to use this in a project. I don't use React, so it is very difficult to extract any understanding from the current npm install output.
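A framework-free sketch is possible by calling transformers.js directly, since that is the library doing the actual inference here. The CDN URL, model name, and options below are assumptions meant as a starting point, not the repo's exact configuration:

```javascript
// Minimal vanilla-JS sketch (no React, no build step). Load from a page
// with <script type="module" src="app.js"></script>.
async function transcribe(audio) {
  // Dynamic import of the transformers.js CDN build (assumed URL).
  const { pipeline } = await import(
    'https://cdn.jsdelivr.net/npm/@xenova/transformers'
  );
  const transcriber = await pipeline(
    'automatic-speech-recognition',
    'Xenova/whisper-tiny.en',
  );
  // `audio` may be a URL string or a Float32Array of 16 kHz mono samples.
  return transcriber(audio, {
    chunk_length_s: 30, // process long audio in 30-second windows
    stride_length_s: 5, // overlap windows so words are not cut mid-chunk
  });
}

// Example wiring: transcribe a file the user picks.
// document.querySelector('#file').addEventListener('change', async (e) => {
//   const url = URL.createObjectURL(e.target.files[0]);
//   const { text } = await transcribe(url);
//   document.querySelector('#out').textContent = text;
// });
```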
During development, the decoder_model_merged.onnx and encoder_model.onnx models are fetched fine from Hugging Face or from my Express server. After building the React app with Create React App, transformers.js tries to fetch model.onnx, which cannot be found.
Error: Could not locate file: "https://huggingface.co/Xenova/whisper-tiny.en/resolve/output_attentions/onnx/model.onnx".
My pipeline options are { progress_callback, revision: "output_attentions", quantized: false }. The task is "automatic-speech-recognition" and the model is "Xenova/whisper-tiny.en". @xenova, what is model.onnx?
Hey,
Big thanks for this awesome project!
Is it possible to add a score/confidence for word-level output when using the Speech Recognition/Whisper model?
I would appreciate any direction, comments, or suggestions on where to dig to add it.
Happy to submit a PR if I succeed.
Thanks!
I am trying to publish "whisper-web" to IPFS - https://ipfs.tech
https://ipfs.asycn.io/ipfs/QmaC43rMkK3qik5dGuwzHf9NoRzW3JTQf39br8gkVJVd1s
As processing is done in the browser, I would just need the HTML client-side files.
Any help is welcome
(^‿‿^)
PS: I would understand if you close this issue, as it is not really one.
First of all, thank you for your work. I have a question: does it work in cell phone browsers, or only on computers? Thank you very much in advance.
I want to clone this on HF to use large-v3 online; is that possible?
I was having trouble running this in dev mode in Firefox. Loading worker.js fails with SyntaxError: import declarations may only appear at top level of a module. This is because Firefox does not support module workers by default. If you wish to run this in dev mode in Firefox, you will have to enable them manually via about:config by setting dom.workers.modules.enabled to true. This preference was added in Firefox 111; on older versions this may not work.
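As an aside, a page can feature-detect module-worker support before deciding how to load worker.js. A common detection sketch; the `supportsModuleWorkers` helper is hypothetical, not part of this repo:

```javascript
// Detect whether the browser supports `new Worker(url, { type: 'module' })`.
// Browsers that understand module workers read the `type` option, which
// triggers the getter below; older browsers never touch it.
function supportsModuleWorkers() {
  let supported = false;
  try {
    const options = {
      get type() {
        supported = true;
        return 'module';
      },
    };
    new Worker('data:,', options).terminate();
  } catch {
    // No Worker global (e.g. Node), or worker creation failed.
  }
  return supported;
}
```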
Hey @xenova do you think this will be supported in the future? Thanks!
models.js:3384 Uncaught (in promise) Error: Unsupported model type: whisper
at AutoModelForCTC.from_pretrained (models.js:3384:19)
at async pipelines.js:2071:33
from_pretrained @ models.js:3384
I fine-tuned my whisper-small model with PEFT and then merged it back into the base model.
I then converted it to ONNX with
https://github.com/xenova/transformers.js/blob/main/scripts/convert.py
However, when I try to run it with whisper-web, the above error shows up.
Versions:
Issue
The final result of the transcription is always an empty text.
Steps to Replicate
Notes:
• Uncaught (in promise) Error: Unsupported model type: whisper at Function.from_pretrained (background.bundle.js:2:640741)
async background.bundle.js:2:689608
Thanks for the nice effort with this app. I was wondering if I could use it with the large model, because for multilingual transcription the large model gives much better results than the one you are using. I have the large model on my Ubuntu server and tested it with Gradio; it gives a much better transcription. The question is: how do I adjust the script to use the large model from my local server? Also, I saw there is a microphone in your demo on Hugging Face; I do miss it here.
Thanks
I know that with the Whisper model it is possible to send a prompt to guide the output in complex transcriptions. Can I do this in this version? If yes, how?
Have you thought about / planned a way to support streaming audio instead of sending the entire audio clip? If it's not currently supported, how would you solve it? I would appreciate some guidance so I can send a proper PR to support streaming, if possible.
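One pragmatic interim approach is to buffer microphone samples and feed overlapping windows to the existing pipeline, merging the resulting text. A sketch of just the pure windowing step; the window/hop sizes are illustrative, and `slidingWindows` is a hypothetical helper, not part of this repo:

```javascript
// Split a long Float32Array of samples into overlapping windows so each
// window can be transcribed independently. Whisper expects 16 kHz mono
// audio, so e.g. windowSize = 30 * 16000 and hopSize = 25 * 16000 would
// give 30-second windows with 5 seconds of overlap.
function* slidingWindows(samples, windowSize, hopSize) {
  for (let start = 0; start < samples.length; start += hopSize) {
    yield samples.subarray(start, Math.min(start + windowSize, samples.length));
    if (start + windowSize >= samples.length) break; // last window reached
  }
}
```

The overlap exists so words falling on a window boundary appear in full in at least one window; deduplicating the overlapping text is the harder part and would need timestamp-based merging.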