xenova / whisper-web

1.4K 1.4K 155.0 1.01 MB

ML-powered speech recognition directly in your browser

Home Page: https://hf.co/spaces/Xenova/whisper-web

License: MIT License

TypeScript 93.02% HTML 0.43% CSS 0.28% JavaScript 6.27%

javascript transformers whisper

whisper-web's People

simaopedros carl-combrinck leaysgur pushpendersaini0 xsteel2003 albertsyh vitaly-z rhutikcodes wiegerwolf leocyzz ivwv nkzhenhua canriquez abdoiiii jodeveloper apollohuang1 automationkit asteryk szuchang sorokinvld knpau millawell cyydev keyzf radames hitech777 zhuxingwan bigship-ai vcpandya suryatmodulus asif4318 ranbeioc shenhaoguangdaydayup tens0rflowjs ladifire jt-wang sontoriyama claindoc ganjunhong andredacity swgn johnathanch chaxus owenwang2430 muzihuaner mexicanamerican turbo-agi forclan xueg-zhou ortega16 ibelem yonggege tzmzj robpruzan inoculate23 bluesealjs litianw lulu0119 ukaserge hardsudo 0xfoc-eth flmendes narendrapsgim tarifut kodiks raidm3 philosoph89 sylvere-bamenou szxyks shahzadarain zxwzxw shopped realiefan papiche q-org el-cl alexandajerry amery2010 hielenagrin nymbo vital121 wobbble adrienhochedez qingbolan tom-tfl dearborn-open-ai paladinknightmaster ekaone modaresimr xuz1213 inspirit941 litongjava syedusama5556 rohitkesharwani9 aytullahdev inverseinductor jwenjian josegomezr

whisper-web's Issues

Error code = 6

macbook pro M2
chrome
whisper medium
chinese
transcribe

Incorrect transcription of single words or short audio, and autocorrection in long audio

Lines are missing in the middle of paragraph-length audio, and single-word or short audio clips are not transcribed at all. Please improve these things. The transcription also gets stuck on some words within a paragraph and repeats them in a loop. In addition, autocorrection is applied and filler sentences are added. Please transcribe word by word so the output is more accurate: whatever the user says should appear as text, rather than auto-generated text.

Word-level-timestamps

Hi Xenova, great work with this repo. Do you know if it's possible to get word-level-timestamps with this? I know it's possible if I'm running whisper in the terminal, but I'm not sure if that functionality extends to this browser/huggingface version, and I don't know how to find out. If you're not sure, feel free to let me know and close the issue.

Please add SRT export or download button

Please add an SRT export or download button to the Hugging Face demo. Thanks.
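
As a rough illustration of what such an export involves, the sketch below turns timestamped chunks into SRT text. The `Chunk` shape (a `[start, end]` pair in seconds plus the text) is an assumption about what the transcriber returns when timestamps are enabled, and `toSrtTime` and `chunksToSrt` are hypothetical helper names, not part of this repo.

```typescript
// Assumed chunk shape: start/end in seconds plus the recognized text.
// `end` may be null when the last segment has no closing timestamp.
interface Chunk {
  timestamp: [number, number | null];
  text: string;
}

// Left-pad a non-negative integer with zeros to the given width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Format seconds as the SRT timestamp "HH:MM:SS,mmm".
function toSrtTime(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  return pad(h, 2) + ":" + pad(m, 2) + ":" + pad(s, 2) + "," + pad(ms, 3);
}

// Build the SRT document: numbered cues separated by blank lines.
function chunksToSrt(chunks: Chunk[]): string {
  return chunks
    .map((chunk, i) => {
      const start = chunk.timestamp[0];
      const end = chunk.timestamp[1];
      // An open-ended chunk falls back to its own start time.
      const stop = end === null ? start : end;
      return (
        String(i + 1) + "\n" +
        toSrtTime(start) + " --> " + toSrtTime(stop) + "\n" +
        chunk.text.trim() + "\n"
      );
    })
    .join("\n");
}
```

In a browser, the resulting string could be wrapped in a `Blob` and offered through a link created with `URL.createObjectURL` to provide the download button.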

Missing License

Hey @xenova,
I couldn't find any license in this repo and wanted to ask whether this project can be used commercially (e.g. through an MIT License).
Thanks in advance!

Vanilla JS example

It would be great if there was a minimal vanilla JS example of how to use this in a project. I don't use React, so the current output of npm install is very difficult to extract any understanding from.

Could not locate file model.onnx

During development, the decoder_model_merged.onnx and encoder_model.onnx models are fetched fine from Hugging Face or from my Express server. After building the React app with Create React App, transformers.js is trying to fetch model.onnx which cannot be found.

Error: Could not locate file: "https://huggingface.co/Xenova/whisper-tiny.en/resolve/output_attentions/onnx/model.onnx".

My pipeline options are { progress_callback, revision: "output_attentions", quantized: false }. The task is "automatic-speech-recognition" and the model is "Xenova/whisper-tiny.en". @xenova What is model.onnx?

Speech Recognition/Whisper word level scores or confidence output

Hey,
Big thanks for the awesome project!

Is it possible to add a score/confidence for word-level output when using the Speech Recognition/Whisper model?
I would appreciate any direction, comments, or suggestions on where to dig to add it.
Happy to submit a PR if I succeed.

Thanks!

(° - ° ) help needed for client/server code isolation

I am trying to publish "whisper-web" to IPFS - https://ipfs.tech
https://ipfs.asycn.io/ipfs/QmaC43rMkK3qik5dGuwzHf9NoRzW3JTQf39br8gkVJVd1s

As processing is done in the browser, I would just need the client-side HTML files.

Any help is welcome.

(^‿‿^)

PS: I would understand if you close this issue, as it is not really one.

Receiving error on Hugging Face implementation

I receive this error every time I try to upload a WAV file to transcribe with the language set to multi and the quantized medium model selected. How can we fix this? Does it only happen on Hugging Face?

Does it work on mobile (Android, iPhone)?

First of all, thank you for your work. I have a question: does it work in mobile phone browsers, or only on computers? Thank you very much in advance.

Use large v3 online?

I want to clone this on HF to use large v3 online. Is that possible?

Instructions for development in Firefox

I was having trouble running this in dev mode in Firefox.
Loading worker.js fails with SyntaxError: import declarations may only appear at top level of a module

This is because Firefox does not support worker modules by default.

Fix / Workaround

If you wish to run this in dev mode in Firefox, you will have to enable it manually via about:config by setting dom.workers.modules.enabled to true.

This was added in Firefox 111; if you have an older version, this may not work.

Realtime transcribe / streaming support?

Hey @xenova do you think this will be supported in the future? Thanks!

Uncaught (in promise) Error: Unsupported model type: whisper

models.js:3384 Uncaught (in promise) Error: Unsupported model type: whisper
at AutoModelForCTC.from_pretrained (models.js:3384:19)
at async pipelines.js:2071:33
from_pretrained @ models.js:3384

I fine-tuned my whisper-small model with PEFT and then merged it back into the base model.

I then converted it to ONNX with

https://github.com/xenova/transformers.js/blob/main/scripts/convert.py

However, when I try to run it with whisper-web, the above error shows up.

Chrome Extension Empty Output

Versions:

"@xenova/transformers": "^2.1.0"
Manifest v3.0

Issue
The final result of the transcription is always an empty text.

Steps to Replicate

Clone this repository
Load it as chrome extension
Enable microphone by clicking "Record"
Activate the extension popup (click on the icon or press "Ctrl+B" (Windows) or "Cmd+B" (Mac))
Open the DevTools inspector by right-clicking and selecting the "Inspect" option on the extension popup
Hold down "v" to record; release "v" to stop recording and begin transcribing.
Observe console outputs
The completed output (return value of "transcriber") will always be empty
Line 105 of background.js

Notes:

I had to downgrade to @xenova/[email protected] or below, otherwise I get the error:

• Uncaught (in promise) Error: Unsupported model type: whisper at Function.from_pretrained (background.bundle.js:2:640741)
async background.bundle.js:2:689608

I am mostly copy-pasting code from this repo's worker.js into my repo's background.js (extension worker).

Large model

Thanks for the nice effort with this app. I was wondering if I could use it with the large model, because I can see that for multilingual transcription the large model gives much better results than the one you are using. I have the large model on my Ubuntu server, and testing it with Gradio gives a much better transcription. The question is how to adjust the script to use the large model from my local server. Also, I saw that your demo on Hugging Face has a microphone, which I miss here.
Thanks

Is it possible to send a prompt to the transcriber?

I know that with the Whisper model it is possible to send a prompt to guide the output in complex transcriptions. Can I do this in this version? If yes, how?

Chinese recognition is not supported

Streaming support

Have you thought about or planned a way to support streaming audio instead of sending the entire audio clip? If it's not currently supported, how would you solve it? I would appreciate some guidance so I can send a proper PR to support streaming, if possible.
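
One common building block for this kind of feature, offered here only as a sketch and not as the repo's method, is to cut the incoming audio into overlapping windows and transcribe each window as it becomes available. The function below shows just the windowing step on mono PCM samples; `splitIntoWindows` is a hypothetical name, and merging the overlapping transcripts is left out.

```typescript
// Split mono PCM samples into windows of `windowSize` samples,
// where consecutive windows share `overlap` samples.
// The final window may be shorter than `windowSize`.
function splitIntoWindows(
  samples: Float32Array,
  windowSize: number,
  overlap: number
): Float32Array[] {
  if (windowSize <= 0 || overlap < 0 || overlap >= windowSize) {
    throw new RangeError("expected windowSize > 0 and 0 <= overlap < windowSize");
  }
  const step = windowSize - overlap;
  const windows: Float32Array[] = [];
  for (let start = 0; start < samples.length; start += step) {
    const end = Math.min(start + windowSize, samples.length);
    // subarray returns a view, so no audio data is copied.
    windows.push(samples.subarray(start, end));
    // Stop once a window has reached the end of the buffer.
    if (end === samples.length) break;
  }
  return windows;
}
```

For example, at a 16 kHz sample rate, `splitIntoWindows(samples, 30 * 16000, 5 * 16000)` would yield 30-second windows that overlap by 5 seconds; these particular values are illustrative.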