xenova / whisper-web

ML-powered speech recognition directly in your browser

Home Page: https://hf.co/spaces/Xenova/whisper-web

License: MIT License

TypeScript 93.02% HTML 0.43% CSS 0.28% JavaScript 6.27%
javascript transformers whisper

whisper-web's People

Contributors

carl-combrinck, pushpendersaini0, xenova

whisper-web's Issues

Error code = 6

Screenshot 2023-09-02 at 3 14 33 PM

Device: MacBook Pro M2
Browser: Chrome
Model: Whisper medium
Language: Chinese
Task: transcribe

Incorrect transcription of single words and short audio; autocorrection in long audio

Line breaks between paragraphs are missing from the transcript, and single words or short clips are not transcribed at all. In longer paragraphs the transcription gets stuck on certain words and repeats them in a loop. The output also contains autocorrections and added filler sentences. Please transcribe word by word so that the result is more accurate: the text should contain whatever the user actually said, not auto-generated text.

Word-level-timestamps

Hi Xenova, great work with this repo. Do you know if it's possible to get word-level-timestamps with this? I know it's possible if I'm running whisper in the terminal, but I'm not sure if that functionality extends to this browser/huggingface version, and I don't know how to find out. If you're not sure, feel free to let me know and close the issue.
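
For context: to my knowledge, the transformers.js speech recognition pipeline accepts a `return_timestamps` option (including a word-level mode) and returns a `chunks` array alongside the full text. The sketch below only shows how such chunks could be rendered, assuming each chunk has the shape `{ text, timestamp: [start, end] }`. The `formatChunks` helper and the sample data are hypothetical and are not part of whisper-web.

```typescript
// Assumed shape of one timestamped chunk. The end time is allowed to be
// null here in case the audio is cut off before the word finishes.
interface Chunk {
  text: string;
  timestamp: [number, number | null];
}

// Hypothetical helper: render each chunk as "[start - end] text".
function formatChunks(chunks: Chunk[]): string[] {
  return chunks.map(({ text, timestamp: [start, end] }) => {
    const endLabel = end === null ? "?" : end.toFixed(2);
    return `[${start.toFixed(2)} - ${endLabel}] ${text.trim()}`;
  });
}

// Made-up sample data for illustration, not real model output.
const sample: Chunk[] = [
  { text: " Hello", timestamp: [0, 0.42] },
  { text: " world", timestamp: [0.42, null] },
];
const lines = formatChunks(sample);
console.log(lines.join("\n"));
```

Whether word-level output is available may depend on the model export, so treat this as a starting point for experimentation.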

Missing License

Hey @xenova,
I couldn't find any license in this repo and wanted to ask whether this project can be used commercially (e.g. through an MIT License).
Thanks in advance!

Vanilla JS example

It would be great if there was a minimal vanilla JS example of how to use this in a project. I don't use React, so the current output of npm install is very difficult to extract any understanding from.

Could not locate file model.onnx

During development, the decoder_model_merged.onnx and encoder_model.onnx models are fetched fine from Hugging Face or from my Express server. After building the React app with Create React App, transformers.js is trying to fetch model.onnx which cannot be found.

Error: Could not locate file: "https://huggingface.co/Xenova/whisper-tiny.en/resolve/output_attentions/onnx/model.onnx".

image

My pipeline options are { progress_callback, revision: "output_attentions", quantized: false }. The task is "automatic-speech-recognition" and the model is "Xenova/whisper-tiny.en". @xenova What is model.onnx?
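
When debugging this kind of error on a self-hosted setup, it can help to list the exact URLs that should exist. The sketch below rebuilds the URL layout visible in the error message above (`<host>/<model id>/resolve/<revision>/<file path>`). The `buildModelUrl` helper is hypothetical, and the layout is inferred from that single error message rather than taken from the library's source.

```typescript
// Hypothetical helper mirroring the URL layout seen in the error message:
//   <host>/<model id>/resolve/<revision>/<file path>
function buildModelUrl(
  host: string,
  modelId: string,
  revision: string,
  filePath: string
): string {
  const base = host.endsWith("/") ? host : host + "/";
  return `${base}${modelId}/resolve/${encodeURIComponent(revision)}/${filePath}`;
}

// File names taken from the issue text above.
const files = [
  "onnx/encoder_model.onnx",
  "onnx/decoder_model_merged.onnx",
  "onnx/model.onnx",
];

for (const file of files) {
  console.log(
    buildModelUrl("https://huggingface.co", "Xenova/whisper-tiny.en", "output_attentions", file)
  );
}
```

Requesting each printed URL by hand shows which files the host actually serves for that revision.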

Speech Recognition/Whisper word level scores or confidence output

Hey,
Big thanks for the awesome project!

Is it possible to add a score/confidence to the word-level output when using the Speech Recognition/Whisper model?
I would appreciate any direction, comments, or suggestions on where to dig to add it.
Happy to submit a PR if I succeed.

Thanks!

Receiving Error on Hugging Face implementation

image

I receive this error every time I try to upload a WAV file for transcription with the language set to multi and the quantized medium model selected. How can we fix this? Does it only happen on Hugging Face?

Instructions for development in Firefox

I was having trouble running this in dev mode in Firefox.
Loading worker.js fails with SyntaxError: import declarations may only appear at top level of a module

This is because Firefox does not support module workers by default.

Fix / Workaround

If you wish to run this in dev mode in Firefox, you will have to enable module workers manually: open about:config and set dom.workers.modules.enabled to true.

This preference was added in Firefox 111; if you have an older version, this may not work.
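
An app could also check for module worker support at runtime and show a hint instead of failing with a syntax error. The sketch below uses a commonly cited detection trick: pass an options object whose `type` property is a getter and see whether the constructor reads it. The `supportsModuleWorkers` helper and the `WorkerCtor` type are hypothetical, and the assumption that engines without module worker support never read `type` should be verified in the browsers you target.

```typescript
// Minimal structural type for a Worker-like constructor, so the check can
// also be exercised outside a browser with a stand-in class.
type WorkerCtor = new (
  url: string,
  options?: { type?: "classic" | "module" }
) => { terminate(): void };

// Returns true if the constructor reads the `type` option.
function supportsModuleWorkers(Ctor: WorkerCtor): boolean {
  let readType = false;
  const probe = {
    get type(): "module" {
      readType = true;
      return "module";
    },
  };
  try {
    // Empty script; the worker is terminated immediately.
    new Ctor("data:text/javascript,", probe).terminate();
  } catch {
    // Construction may fail for unrelated reasons; only the getter matters.
  }
  return readType;
}
```

In a browser this would be called as `supportsModuleWorkers(Worker)`.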

Uncaught (in promise) Error: Unsupported model type: whisper

models.js:3384 Uncaught (in promise) Error: Unsupported model type: whisper
at AutoModelForCTC.from_pretrained (models.js:3384:19)
at async pipelines.js:2071:33
from_pretrained @ models.js:3384

I fine-tuned my whisper-small model with PEFT and then merged it back into the base model.

I then converted it to ONNX with

https://github.com/xenova/transformers.js/blob/main/scripts/convert.py

However, when I try to run it with whisper-web, the above error shows up.

Chrome Extension Empty Output

Versions:

  • "@xenova/transformers": "^2.1.0"
  • Manifest v3.0

Issue
The final result of the transcription is always an empty text.

Steps to Replicate

  • Clone this repository
  • Load it as a Chrome extension
  • Enable the microphone by clicking "Record"
  • Activate the extension popup (click on the icon or press "Ctrl+B" (Windows) or "Cmd+B" (Mac))
  • Open the DevTools inspector by right-clicking on the extension popup and selecting the "Inspect" option
  • Hold down "v" to record; release "v" to stop recording and begin transcribing
  • Observe the console output
  • The completed output (return value of "transcriber") will always be empty
  • Line 105 of background.js

Notes:

  • Uncaught (in promise) Error: Unsupported model type: whisper at Function.from_pretrained (background.bundle.js:2:640741) at async background.bundle.js:2:689608
  • I am mostly copy-pasting code from this repo's worker.js into my repo's background.js (extension worker).

Screenshots:
Screenshot 2023-09-09 at 11 25 15 PM

Large model

Thanks for the nice effort with this app. I was wondering whether I could use it with the large model, because for multilingual transcription the large model gives much better results than the one you are using. I have the large model on my Ubuntu server and tested it with Gradio; it gives a much better transcription. How can I adjust the script to use the large model from my local server? Also, your demo on Hugging Face has a microphone option, which I miss here.
Thanks

Streaming support

Have you thought about or planned a way to support streaming audio instead of sending the entire audio clip? If it's not currently supported, how would you solve it? I would appreciate some guidance so that I can send a proper PR to support streaming, if possible.
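
One possible direction, sketched here only as a starting point: Whisper processes audio in fixed windows of up to 30 seconds, so incoming audio could be cut into overlapping windows that are transcribed one at a time as they fill up. The `chunkAudio` helper below is hypothetical and is not how whisper-web is implemented; merging the overlapping transcripts is left out.

```typescript
// Split mono PCM samples into fixed-length windows. Consecutive windows
// overlap by `overlapS` seconds so that a word cut at one boundary appears
// whole in the neighboring window. The last window may be shorter.
function chunkAudio(
  samples: Float32Array,
  sampleRate: number,
  chunkS: number,
  overlapS: number
): Float32Array[] {
  const chunkLen = Math.floor(chunkS * sampleRate);
  const step = chunkLen - Math.floor(overlapS * sampleRate);
  if (chunkLen <= 0 || step <= 0) {
    throw new RangeError("chunk length must be positive and longer than the overlap");
  }
  const chunks: Float32Array[] = [];
  for (let start = 0; start < samples.length; start += step) {
    const end = Math.min(start + chunkLen, samples.length);
    // subarray creates a view on the same buffer, so no samples are copied.
    chunks.push(samples.subarray(start, end));
    if (end === samples.length) break; // this window reached the end
  }
  return chunks;
}

// Example: 70 seconds of silence at 16 kHz, 30-second windows, 5-second overlap.
const windows = chunkAudio(new Float32Array(70 * 16000), 16000, 30, 5);
console.log(windows.map((w) => w.length / 16000)); // window durations in seconds
```

A real streaming mode would also need to deduplicate text from the overlapping regions, which is the harder part of the problem.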
