hayabhay / frogbase

Transform audio-visual content into navigable knowledge.

Home Page: https://frogbase.dev

License: MIT License

Python 100.00%

embeddings package python search semantic-search speech-to-text streamlit ui

frogbase's Introduction

🐸 FrogBase

Create navigable knowledge from multi-media content

FrogBase (previously whisper-ui) simplifies the download-transcribe-embed-index workflow for multi-media content. It does so by linking content from various platforms (yt_dlp) with speech-to-text models (OpenAI's Whisper), image & text encoders (SentenceTransformers), and embedding stores (hnswlib).

⚠️ Warning: This is currently a pre-release version and is known to be very unstable. For a stable release, please use a 1.x version.

from frogbase import FrogBase
fb = FrogBase()
fb.demo()
fb.search("What is the name of the squeaky frog?")

Full Documentation (WIP).

FrogBase also comes with a ready-to-use UI for non-technical users!

whisper-ui-update-demo.mp4

Features

FrogBase currently provides functionality to:

Download media files from a wide range of platforms (YouTube, TikTok, Vimeo, etc.) using yt_dlp
Transcribe audio streams for downloaded & local files using OpenAI's Whisper
Embed transcribed text from corresponding video segments using Sentence Transformers
Index & search the embedded content using hnswlib

FrogBase also includes a Streamlit UI to provide a simple GUI for the above functionality enabling a locally hosted, interactive experience.

Software Developers

This section is for software developers who want to use FrogBase as a python package.

Install ffmpeg and FrogBase

sudo apt install ffmpeg
pip install frogbase

Import FrogBase and use it as follows:

from frogbase import FrogBase

fb = FrogBase()

sources = [
   "https://www.youtube.com/watch?v=HBxn56l9WcU",
   "https://www.youtube.com/@hayabhay"
]

fb.add(sources)

fb.search("What is the name of the squeaky frog?")

Non-technical Users

This section is for non-technical users who want to use FrogBase primarily through the accompanying Streamlit UI.

Download the latest release of FrogBase from here and unzip it. Alternatively, clone the repository with git clone https://github.com/hayabhay/frogbase.git
Install FrogBase dependencies manually and run the UI.

Note: This also requires ffmpeg to be installed on your system. You can install it using sudo apt install ffmpeg on Ubuntu.
1. Using pip
```
pip install frogbase streamlit
streamlit run ui/01_🏠_Home.py
```

[Coming soon] Instructions, environment for installation using Docker & Anaconda


frogbase's Issues

TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'

Having trouble getting this working on Windows. Any ideas?

Python 3.9.4

(venv) PS C:\Users\tiki\Documents\Transcription\frogbase> streamlit run ui/01_🏠_Home.py

  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8501
  Network URL: http://10.0.0.156:8501

2023-09-19 10:45:53.943 Uncaught app exception
Traceback (most recent call last):
  File "c:\users\tiki\documents\transcription\frogbase\venv\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 552, in _run_script
    exec(code, module.__dict__)
  File "C:\Users\tiki\Documents\Transcription\frogbase\ui\01_🏠_Home.py", line 13, in <module>
    init_session(st.session_state)
  File "ui\config.py", line 60, in init_session
    from frogbase import FrogBase
  File "C:\Users\tiki\Documents\Transcription\frogbase\frogbase\__init__.py", line 1, in <module>
    from .core import FrogBase
  File "C:\Users\tiki\Documents\Transcription\frogbase\frogbase\core.py", line 13, in <module>
    from frogbase.captions import Captions
  File "C:\Users\tiki\Documents\Transcription\frogbase\frogbase\captions.py", line 17, in <module>
    class Captions(BaseModel):
  File "C:\Users\tiki\Documents\Transcription\frogbase\frogbase\captions.py", line 42, in Captions
    settings: dict | None = Field(default=None, description="The settings under which these captions were generated.")
TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'
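
The annotation `dict | None` in the traceback uses the union syntax from PEP 604, which can only be evaluated at runtime on Python 3.10 and later. On the reported Python 3.9.4 the `|` operator is not defined between `dict` and `None`, which matches the error. Below is a minimal sketch of an equivalent annotation that also works on 3.9. `CaptionsSketch` and `pep604_supported` are hypothetical names for illustration, and a plain dataclass stands in for the project's actual model class.

```python
import sys
from dataclasses import dataclass
from typing import Optional, get_type_hints


@dataclass
class CaptionsSketch:
    """Hypothetical stand-in for the Captions model named in the traceback."""

    # Optional[dict] is the pre-3.10 spelling of `dict | None`.
    settings: Optional[dict] = None


def pep604_supported() -> bool:
    """Return True when `dict | None` can be evaluated at runtime."""
    return sys.version_info >= (3, 10)


if __name__ == "__main__":
    print(get_type_hints(CaptionsSketch)["settings"])
    print("PEP 604 unions supported:", pep604_supported())
```

Running the UI under Python 3.10 or newer would avoid the need for any change to the annotation.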

"Download error" when attempting to upload from local storage

DownloadError: ERROR: [generic] None: Unable to download webpage: (caused by URLError('unknown url type: c'))

Traceback:

File "C:\Users\tzundo\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 552, in _run_script
exec(code, module.__dict__)
File "C:\Users\tzundo\Documents\frogbase (ai transcribe)\frogbase-2.0.0a1\ui\01_🏠_Home.py", line 103, in <module>
fb.add(sources, **opts).transcribe(ignore_captioned=True)
^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\tzundo\Documents\frogbase (ai transcribe)\frogbase-2.0.0a1\frogbase\core.py", line 237, in add
self._media_buffer = self.media.add(sources, **opts)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\tzundo\Documents\frogbase (ai transcribe)\frogbase-2.0.0a1\frogbase\media.py", line 499, in add
added_media += self._add_from_web(source, **opts)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\tzundo\Documents\frogbase (ai transcribe)\frogbase-2.0.0a1\frogbase\media.py", line 264, in _add_from_web
ydl.download(url)
File "C:\Users\tzundo\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\yt_dlp\YoutubeDL.py", line 3485, in download
self.__download_wrapper(self.extract_info)(
File "C:\Users\tzundo\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\yt_dlp\YoutubeDL.py", line 3460, in wrapper
res = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\tzundo\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\yt_dlp\YoutubeDL.py", line 1549, in extract_info
return self.__extract_info(url, self.get_info_extractor(key), download, extra_info, process)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\tzundo\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\yt_dlp\YoutubeDL.py", line 1578, in wrapper
self.report_error(str(e), e.format_traceback())
File "C:\Users\tzundo\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\yt_dlp\YoutubeDL.py", line 1042, in report_error
self.trouble(f'{self._format_err("ERROR:", self.Styles.ERROR)} {message}', *args, **kwargs)
File "C:\Users\tzundo\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\yt_dlp\YoutubeDL.py", line 981, in trouble
raise DownloadError(message, exc_info)
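
The message `unknown url type: c` is consistent with a Windows path such as `C:\...` being handed to the web downloader, where the drive letter is read as a URL scheme. The following is a minimal sketch of one way to tell the two kinds of source apart, assuming only `http` and `https` sources should go to the downloader. `is_web_url` and `classify_source` are hypothetical helpers, not FrogBase functions, and the local path is made up.

```python
from urllib.parse import urlparse


def is_web_url(source: str) -> bool:
    """Treat only http(s) sources that name a host as web URLs."""
    parsed = urlparse(source)
    return parsed.scheme in {"http", "https"} and bool(parsed.netloc)


def classify_source(source: str) -> str:
    """Return "web" for http(s) URLs and "disk" for everything else."""
    return "web" if is_web_url(source) else "disk"


if __name__ == "__main__":
    # The drive letter of a Windows path parses as the scheme "c".
    print(urlparse(r"C:\Users\me\clip.mp3").scheme)
    print(classify_source(r"C:\Users\me\clip.mp3"))
    print(classify_source("https://www.youtube.com/watch?v=HBxn56l9WcU"))
```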

Feature request - quick way to pause audio without losing spot in page?

Hey, really enjoying checking this so far, thanks for sharing it.

I'm trying this out by reading along some transcripts for some YouTube videos and taking notes, and sometimes I have to pause to finish my note but I don't have an easy way of doing that without scrolling all the way up the page to the pause button on the audio player (I'm currently just using the trimmed audio player). But then after I pause, I have lost my spot in the page on the transcript. Any ideas for making pausing the audio player easier and letting me keep my spot on the page to make it easy to take notes?

Also another sneak request, in the terminal stdout logs I see the transcription timestamps in MM:SS.mmm format, but in the Streamlit UI it looks like the timestamps are only in seconds (so like 181.6s instead of 03:01.600). Any way to get the Streamlit UI timestamps format to minutes, seconds, millis?
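
The timestamp conversion requested here is small. This is a minimal sketch, assuming the UI holds each segment's start time as a float number of seconds. The name `format_timestamp` is hypothetical and not part of FrogBase.

```python
def format_timestamp(seconds: float) -> str:
    """Format a non-negative duration in seconds as MM:SS.mmm."""
    # Work in whole milliseconds so rounding carries into seconds and minutes.
    total_ms = round(seconds * 1000)
    minutes, rem_ms = divmod(total_ms, 60_000)
    secs, millis = divmod(rem_ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{millis:03d}"


if __name__ == "__main__":
    print(format_timestamp(181.6))  # 03:01.600
```

Minutes are not wrapped into hours, so a segment starting at 3700 seconds would print as `61:40.000`.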

RuntimeError: The size of tensor a (261) must match the size of tensor b (3) at non-singleton dimension 3

Hello, I get this error with the large model.

nvidia rtx3080ti
32Gb ram
ryzen 7 5800X3D
debian 11 with wsl2

2023-06-17 11:41:10.806 Uncaught app exception | 0/222333 [00:00<?, ?frames/s]
Traceback (most recent call last):
File "/home/ben/.local/lib/python3.9/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 565, in _run_script
exec(code, module.__dict__)
File "/home/ben/whisper-ui/app/01_🏠_Home.py", line 77, in <module>
media_manager.add(
File "app/core.py", line 181, in add
self._create(source_list=source_list, source_type=source_type, **whisper_args)
File "app/core.py", line 163, in _create
self._transcribe_and_save(media_obj, whisper_model, **whisper_args)
File "app/core.py", line 74, in _transcribe_and_save
transcript = self._transcribe(media_obj.filepath, whisper_model, **whisper_args)
File "app/core.py", line 64, in _transcribe
transcript = transcriber.transcribe(
File "/home/ben/.local/lib/python3.9/site-packages/whisper/transcribe.py", line 181, in transcribe
result: DecodingResult = decode_with_fallback(segment)
File "/home/ben/.local/lib/python3.9/site-packages/whisper/transcribe.py", line 117, in decode_with_fallback
decode_result = model.decode(segment, options)
File "/home/ben/.local/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/ben/.local/lib/python3.9/site-packages/whisper/decoding.py", line 705, in decode
result = DecodingTask(model, options).run(mel)
File "/home/ben/.local/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/ben/.local/lib/python3.9/site-packages/whisper/decoding.py", line 637, in run
tokens, sum_logprobs, no_speech_probs = self._main_loop(audio_features, tokens)
File "/home/ben/.local/lib/python3.9/site-packages/whisper/decoding.py", line 592, in _main_loop
logits = self.inference.logits(tokens, audio_features)
File "/home/ben/.local/lib/python3.9/site-packages/whisper/decoding.py", line 145, in logits
return self.model.decoder(tokens, audio_features, kv_cache=self.kv_cache)
File "/home/ben/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ben/.local/lib/python3.9/site-packages/whisper/model.py", line 190, in forward
x = block(x, xa, mask=self.mask, kv_cache=kv_cache)
File "/home/ben/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ben/.local/lib/python3.9/site-packages/whisper/model.py", line 125, in forward
x = x + self.attn(self.attn_ln(x), mask=mask, kv_cache=kv_cache)[0]
File "/home/ben/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ben/.local/lib/python3.9/site-packages/whisper/model.py", line 85, in forward
wv, qk = self.qkv_attention(q, k, v, mask)
File "/home/ben/.local/lib/python3.9/site-packages/whisper/model.py", line 97, in qkv_attention
qk = qk + mask[:n_ctx, :n_ctx]
RuntimeError: The size of tensor a (261) must match the size of tensor b (3) at non-singleton dimension 3

No module named openai? I did everything in the non-technical section

The connection has timed out

After #29 was resolved and I was able to install things I still haven't been able to actually get the UI working.
When I click the link to open the local IP address I get The connection has timed out. This happens both locally on my mac (with docker) and in gitpod (without docker).

I don't see any error messages.
Any pointers on debugging?

If you want to try in gitpod you can at https://gitpod.io/?editor=code#https://github.com/hayabhay/whisper-ui
Just make sure to install ffmpeg

Using docker for existing users isn't backwards compatible

Mounting data directory into a docker container won't work smoothly since the database saves absolute local paths of media files.

Solution: Patch to update all database paths so they're all relative to the repo's location (#23) in a backwards compatible way.
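
The proposed patch can be sketched as a path rewrite that leaves already-relative entries untouched, which is what would keep it backwards compatible. This is only an illustration under assumptions: stored paths are taken to be POSIX-style strings, `to_relative` is a hypothetical helper, and the example paths are made up.

```python
from pathlib import PurePosixPath


def to_relative(media_path: str, repo_root: str) -> str:
    """Rewrite an absolute media path relative to repo_root when possible."""
    path = PurePosixPath(media_path)
    root = PurePosixPath(repo_root)
    if not path.is_absolute():
        return media_path  # already relative; leave untouched
    try:
        return str(path.relative_to(root))
    except ValueError:
        return media_path  # outside the repo; cannot be made relative


if __name__ == "__main__":
    print(to_relative("/home/me/whisper-ui/data/media/a.mp4", "/home/me/whisper-ui"))
```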

Youtube translation issue

[Feature Request] add support for more types of media file e.g. webm

As a language learner, I collected lots of video and audio files to increase my input, the batch transcribing feature of this UI makes my life a lot easier. Much thanks!

I have a suggestion for the new update that I hope you could consider. I would like to request that you add support for more types of media files, such as .webm. This would make it more convenient to use.

I understand that this may require time and effort. I just wanted to share my feedback and express my gratitude for this awesome ui.

Unable to upload local content

When trying to upload local content (175mb .mkv file) the following error is given. Youtube works fine.

This is running in a Ubuntu VM.

I've attempted to move the temp directory but it went missing after trying to upload.

FileNotFoundError: [Errno 2] No such file or directory: '/tmp/Filenamehere.mkv'
Traceback:

File "/usr/local/lib/python3.10/dist-packages/streamlit/runtime/scriptrunner/script_runner.py", line 552, in _run_script
exec(code, module.__dict__)
File "/home/frogbase/Desktop/frogbase/frogbase-main/ui/01_🏠_Home.py", line 103, in <module>
fb.add(sources, **opts).transcribe(ignore_captioned=False).embed().index()
File "/home/frogbase/Desktop/frogbase/frogbase-main/frogbase/core.py", line 246, in add
self._media_buffer = self.media.add(sources, **opts)
File "/home/frogbase/Desktop/frogbase/frogbase-main/frogbase/media.py", line 504, in add
added_media += self._add_from_disk(source, **opts)
File "/home/frogbase/Desktop/frogbase/frogbase-main/frogbase/media.py", line 464, in _add_from_disk
upload_date=datetime.fromtimestamp(media_file.stat().st_ctime).strftime("%Y%m%d"),
File "/usr/lib/python3.10/pathlib.py", line 1097, in stat
return self._accessor.stat(self, follow_symlinks=follow_symlinks)

ModuleNotFoundError: No module named 'sqlalchemy'

This error appears when I enter a YouTube link or upload a file.

Issue when running with docker

Hey y'all thanks so much for making this. It looks really great and will be helpful for my studies.

When I run docker-compose up it exits with the following error:

 => ERROR [6/7] RUN pip install --no-cache-dir -r /requirements.txt                                                  20.7s
------
 > [6/7] RUN pip install --no-cache-dir -r /requirements.txt:
#0 0.573 Collecting openai-whisper==20230124
#0 0.694   Downloading openai-whisper-20230124.tar.gz (1.2 MB)
#0 1.002      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 3.8 MB/s eta 0:00:00
#0 1.059   Preparing metadata (setup.py): started
#0 1.410   Preparing metadata (setup.py): finished with status 'done'
#0 1.449 Collecting pytube==12.1.2
#0 1.473   Downloading pytube-12.1.2-py3-none-any.whl (57 kB)
#0 1.489      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 57.0/57.0 kB 3.6 MB/s eta 0:00:00
#0 1.726 Collecting SQLAlchemy==2.0.3
#0 1.772   Downloading SQLAlchemy-2.0.3-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (2.8 MB)
#0 4.235      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.8/2.8 MB 1.2 MB/s eta 0:00:00
#0 4.313 Collecting streamlit==1.18.1
#0 4.337   Downloading streamlit-1.18.1-py2.py3-none-any.whl (9.6 MB)
#0 8.298      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 9.6/9.6 MB 2.4 MB/s eta 0:00:00
#0 8.548 Collecting numpy
#0 8.575   Downloading numpy-1.24.2-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (14.0 MB)
#0 20.32      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.0/14.0 MB 1.1 MB/s eta 0:00:00
#0 20.41 ERROR: Ignored the following versions that require a different python version: 0.55.2 Requires-Python <3.5; 1.21.2 Requires-Python >=3.7,<3.11; 1.21.3 Requires-Python >=3.7,<3.11; 1.21.4 Requires-Python >=3.7,<3.11; 1.21.5 Requires-Python >=3.7,<3.11; 1.21.6 Requires-Python >=3.7,<3.11
#0 20.41 ERROR: Could not find a version that satisfies the requirement torch (from openai-whisper) (from versions: none)
#0 20.41 ERROR: No matching distribution found for torch
------
failed to solve: executor failed running [/bin/sh -c pip install --no-cache-dir -r /requirements.txt]: exit code: 1

This could possibly be related to me running with an M1 mac?

Please let me know if there is anything I can do to help debug here.

Feature request Output text as txt, vtt and srt

Thanks for developing this cool Whisper UI project. Would it be possible to output the result as .txt, .vtt and .srt?

It's already in whisper/transcribe.py:

for audio_path in args.pop("audio"):
    result = transcribe(model, audio_path, temperature=temperature, **args)

    audio_basename = os.path.basename(audio_path)

    # save TXT
    with open(os.path.join(output_dir, audio_basename + ".txt"), "w", encoding="utf-8") as txt:
        write_txt(result["segments"], file=txt)

    # save VTT
    with open(os.path.join(output_dir, audio_basename + ".vtt"), "w", encoding="utf-8") as vtt:
        write_vtt(result["segments"], file=vtt)

    # save SRT
    with open(os.path.join(output_dir, audio_basename + ".srt"), "w", encoding="utf-8") as srt:
        write_srt(result["segments"], file=srt)
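
For readers who want to see what the SRT step involves, here is a self-contained sketch that does not depend on Whisper's own writers. It assumes each segment is a dict with `start` and `end` in seconds and a `text` string. `srt_timestamp` and `segments_to_srt` are hypothetical names, not Whisper or FrogBase functions.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used in SRT files."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [
        {"start": 0.0, "end": 2.5, "text": " Hello"},
        {"start": 2.5, "end": 61.25, "text": " world"},
    ]
    print(segments_to_srt(demo))
```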

ffmpeg Issue

Getting this issue :

[Errno 2] No such file or directory: 'ffmpeg'

Anyone knows the solution here?
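
This error usually means the `ffmpeg` executable is not on the `PATH` of the process running the app; the README asks for ffmpeg to be installed separately. Here is a minimal sketch of a pre-flight check, where `ffmpeg_available` is a hypothetical helper name:

```python
import shutil


def ffmpeg_available(executable: str = "ffmpeg") -> bool:
    """Return True if the executable can be found on PATH."""
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if ffmpeg_available():
        print("ffmpeg found at", shutil.which("ffmpeg"))
    else:
        print("ffmpeg not found on PATH; install it before transcribing")
```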

Problem after update.

Solving environment: failed

I am trying to run it on windows 11 and I am getting the error below:

conda env create -f environment.yml


Collecting package metadata (repodata.json): done
Solving environment: failed

ResolvePackageNotFound:
  - zlib==1.2.13=h5eee18b_0
  - bzip2==1.0.8=h7b6447c_0
  - libffi==3.4.2=h6a678d5_6
  - python==3.11.0=h7a1cb2a_2
  - certifi==2022.9.24=py311h06a4308_0
  - tk==8.6.12=h1ccaba5_0
  - pip==22.2.2=py311h06a4308_0
  - libgcc-ng==11.2.0=h1234567_1
  - openssl==1.1.1s=h7f8727e_0
  - ncurses==6.4=h6a678d5_0
  - libstdcxx-ng==11.2.0=h1234567_1
  - xz==5.2.10=h5eee18b_1
  - readline==8.2=h5eee18b_0
  - ca-certificates==2023.01.10=h06a4308_0
  - sqlite==3.40.1=h5082296_0
  - setuptools==65.5.0=py311h06a4308_0
  - _openmp_mutex==5.1=1_gnu
  - libuuid==1.41.5=h5eee18b_0
  - libgomp==11.2.0=h1234567_1
  - ld_impl_linux-64==2.38=h1181459_1

Any tips on how to get it installed?

Apparent problem with certain special characters in File/ Youtube Video name

New to your app, very happy with it so far.

This has happened twice now: a file throws an error. After some very rough testing, it seems to be related to the colon in each title.

I got the following error with this URL: https://youtu.be/qQ84delhpuw

Traceback:

File "C:\Users\carter\AppData\Local\Programs\Python\Python310\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 565, in _run_script
exec(code, module.__dict__)
File "E:\github\whisper-ui\app\01_🏠_Home.py", line 71, in <module>
media_manager.add(
File "E:\github\whisper-ui\app\core.py", line 165, in add
self._create(source_list=source_list, source_type=source_type, **whisper_args)
File "E:\github\whisper-ui\app\core.py", line 117, in _create
yc.streams.get_by_itag(140).download(save_dir, filename=save_filename)
File "C:\Users\carter\AppData\Local\Programs\Python\Python310\lib\site-packages\pytube\streams.py", line 298, in download
file_path = self.get_file_path(
File "C:\Users\carter\AppData\Local\Programs\Python\Python310\lib\site-packages\pytube\streams.py", line 349, in get_file_path
return os.path.join(target_directory(output_path), filename)
File "C:\Users\carter\AppData\Local\Programs\Python\Python310\lib\site-packages\pytube\helpers.py", line 254, in target_directory
os.makedirs(output_path, exist_ok=True)
File "C:\Users\carter\AppData\Local\Programs\Python\Python310\lib\os.py", line 225, in makedirs
mkdir(name, mode)

Similar issue with this URL: https://youtu.be/uYI1PpMElgM

Traceback:
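
Windows does not allow `:` (among other characters) in file or directory names, which is consistent with the failure inside `os.makedirs` in the first traceback. The following is a minimal sketch of sanitizing a video title before using it as a path component. `safe_filename` is a hypothetical helper and the replacement character is an arbitrary choice.

```python
import re

# Characters that Windows rejects in file and directory names.
_WINDOWS_RESERVED = re.compile(r'[<>:"/\\|?*]')


def safe_filename(title: str, replacement: str = "_") -> str:
    """Replace reserved characters and drop trailing dots and spaces."""
    cleaned = _WINDOWS_RESERVED.sub(replacement, title)
    return cleaned.rstrip(" .")


if __name__ == "__main__":
    print(safe_filename("Part 1: Intro"))  # Part 1_ Intro
```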

Feature request-Can it do Translation soon?

I have a need to update and translate a bunch of transcripts on a channel. Keep up the good work.

[Feature Request] Ability to edit transcript

Hi,

I've recently installed Whisper-UI and have been playing about with it. It seems like a great tool and will come in handy.

One thing I think would be useful is the ability to edit the transcript manually within the web UI. It picked up a few things wrong, and being able to listen and correct the transcript in the same place would help.

Thanks

error report

Hi, I got a new file yesterday and pressed get sample video, but it's like that, what's the error?

UnicodeDecodeError: 'cp949' codec can't decode byte 0xf0 in position 60: illegal multibyte sequence
Traceback:
File "C:\Users\pc\AppData\Local\Programs\Python\Python310\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 552, in _run_script
exec(code, module.__dict__)
File "C:\frogbase-main\ui\01_🏠_Home.py", line 255, in <module>
fb.demo()
File "C:\frogbase-main\frogbase\core.py", line 376, in demo
self.add(sources, audio_only=False).transcribe(ignore_captioned=False).embed(overwrite=False).index()
File "C:\frogbase-main\frogbase\core.py", line 246, in add
self._media_buffer = self.media.add(sources, **opts)
File "C:\frogbase-main\frogbase\media.py", line 501, in add
added_media += self._add_from_web(source, **opts)
File "C:\frogbase-main\frogbase\media.py", line 278, in _add_from_web
infodict = json.load(f)
File "C:\Users\seo\AppData\Local\Programs\Python\Python310\lib\json\__init__.py", line 293, in load
return loads(fp.read(),
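
Byte `0xf0` is the first byte of a four-byte UTF-8 sequence such as an emoji, so the error is consistent with a UTF-8 JSON file being opened with the Windows locale encoding (`cp949` here) instead of UTF-8. Here is a minimal sketch of reading such a file with an explicit encoding. `load_info_json` is a hypothetical helper, not the FrogBase function in the traceback.

```python
import json
import tempfile
from pathlib import Path


def load_info_json(path: Path) -> dict:
    """Read a JSON file as UTF-8 regardless of the system locale."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "info.json"
        sample.write_text(
            json.dumps({"title": "🐸 frog"}, ensure_ascii=False), encoding="utf-8"
        )
        print(load_info_json(sample)["title"])
```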

Winerror 2 The system cannot find the file specified

I keep getting this error after uploading the mp3 file:

File "D:\AI-Generators\whisper-ui\venv\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 565, in _run_script
    exec(code, module.__dict__)
File "D:\AI-Generators\whisper-ui\app\01_🏠_Home.py", line 77, in <module>
    media_manager.add(
File "D:\AI-Generators\whisper-ui\app\core.py", line 169, in add
    self._create(source_list=source_list, source_type=source_type, **whisper_args)
File "D:\AI-Generators\whisper-ui\app\core.py", line 151, in _create
    self._transcribe_and_save(media_obj, whisper_model, **whisper_args)
File "D:\AI-Generators\whisper-ui\app\core.py", line 64, in _transcribe_and_save
    transcript = self._transcribe(media_obj.filepath, whisper_model, **whisper_args)
File "D:\AI-Generators\whisper-ui\app\core.py", line 54, in _transcribe
    transcript = transcriber.transcribe(
File "D:\AI-Generators\whisper-ui\venv\lib\site-packages\whisper\transcribe.py", line 84, in transcribe
    mel = log_mel_spectrogram(audio)
File "D:\AI-Generators\whisper-ui\venv\lib\site-packages\whisper\audio.py", line 111, in log_mel_spectrogram
    audio = load_audio(audio)
File "D:\AI-Generators\whisper-ui\venv\lib\site-packages\whisper\audio.py", line 42, in load_audio
    ffmpeg.input(file, threads=0)
File "D:\AI-Generators\whisper-ui\venv\lib\site-packages\ffmpeg\_run.py", line 313, in run
    process = run_async(
File "D:\AI-Generators\whisper-ui\venv\lib\site-packages\ffmpeg\_run.py", line 284, in run_async
    return subprocess.Popen(
File "C:\Program Files\Python310\lib\subprocess.py", line 971, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
File "C:\Program Files\Python310\lib\subprocess.py", line 1440, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,

Help. Still can't get it to run

Can't run it.

File "C:\Users\Arthas\AppData\Local\Programs\Python\Python310\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 552, in _run_script
exec(code, module.__dict__)
File "C:\Users\Arthas\Downloads\frogbase-main\ui\01_🏠_Home.py", line 12, in <module>
init_session(st.session_state)
File "C:\Users\Arthas\Downloads\frogbase-main\ui\config.py", line 65, in init_session
session_state.fb = FrogBase(datadir=DATADIR, library=library, verbose=VERBOSE, dev=DEV, persist=True)
File "C:\Users\Arthas\Downloads\frogbase-main\frogbase\core.py", line 109, in __init__
self.library = self._default_library
File "C:\Users\Arthas\Downloads\frogbase-main\frogbase\core.py", line 139, in library
self._initdb()
File "C:\Users\Arthas\Downloads\frogbase-main\frogbase\core.py", line 188, in _initdb
db_meta = self._db.get(Query().type == "meta") if self._db else None
File "C:\Users\Arthas\AppData\Local\Programs\Python\Python310\lib\site-packages\tinydb\database.py", line 268, in __len__
return len(self.table(self.default_table_name))
File "C:\Users\Arthas\AppData\Local\Programs\Python\Python310\lib\site-packages\tinydb\table.py", line 645, in __len__
return len(self._read_table())
File "C:\Users\Arthas\AppData\Local\Programs\Python\Python310\lib\site-packages\tinydb\table.py", line 704, in _read_table
tables = self.storage.read()
File "C:\Users\Arthas\AppData\Local\Programs\Python\Python310\lib\site-packages\tinydb\storages.py", line 136, in read
return json.load(self.handle)
File "C:\Users\Arthas\AppData\Local\Programs\Python\Python310\lib\json\__init__.py", line 293, in load
return loads(fp.read(),
File "C:\Users\Arthas\AppData\Local\Programs\Python\Python310\lib\json\__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "C:\Users\Arthas\AppData\Local\Programs\Python\Python310\lib\json\decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Users\Arthas\AppData\Local\Programs\Python\Python310\lib\json\decoder.py", line 353, in raw_decode
obj, end = self.scan_once(s, idx)

Feature request: Implement word-level confidence score visualisation by color coding the transcript

This whisper gui implementation visualizes the confidence score by coloring words: https://github.com/jojojaeger/whisper-streamlit This would be helpful for quicker review or for model accuracy comparison

(Feature request) Voice Activity Detector

Hello! I'm coming from your post on r/MachineLearning.
Japanese transcriptions are more accurate with a VAD and that's the only reason I keep using some very simple WebUI.
Do you have any plan to integrate a detector?

Links for reference:
VAD: https://github.com/snakers4/silero-vad
WebUI I'm currently using: openai/whisper#397

Discord link dead

Link for discord channel dead

It doesn't seem to accept YouTube videos anymore

Thank you so much for the amazing work!

I'm unable to get a transcription for YouTube videos, I always get "Selected video/audio is could not be located.." no matter what video I load in

Wrong language detected

Quite often Whisper detects the wrong language when asked to translate from a language other than English. Would it be possible to add an option to select the language of the audio? This could be useful for both foreign-language transcription and translation tasks.

This error appears, and the uploaded video or media does not load

File uploads fail to transcribe unless they are MP4 format.

Uploading a PCM WAV file or MP3 audio fails to process due to an error (uploading a WAV file gives a codec error). However, after converting the file to MP4 the process works.

  Metadata:
    encoder         : Lavf56.40.101
  Duration: 00:08:55.75, start: 0.138125, bitrate: 64 kb/s
  Stream #0:0: Audio: mp3, 8000 Hz, mono, fltp, 64 kb/s
[mp4 @ 0x7f970a304ec0] track 0: muxing mp3 at 8000hz is not standard, to mux anyway set strict to -1
Could not write header for output file #0 (incorrect codec parameters ?): Invalid argument
Error initializing output stream 0:0 --
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
    Last message repeated 1 times
2023-01-25 11:30:51.555 Uncaught app exception
Traceback (most recent call last):
  File "/Users/ben/Developer/whisper-ui/.venv/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 565, in _run_script
    exec(code, module.__dict__)
  File "/Users/ben/Developer/whisper-ui/01_Transcribe.py", line 66, in <module>
    st.session_state.transcription = Transcription(name, input_file, "file", start, duration)
  File "/Users/ben/Developer/whisper-ui/transcriber.py", line 54, in __init__
    ffmpeg.run(audio, overwrite_output=True)
  File "/Users/ben/Developer/whisper-ui/.venv/lib/python3.10/site-packages/ffmpeg/_run.py", line 325, in run
    raise Error('ffmpeg', out, err)
ffmpeg._run.Error: ffmpeg error (see stderr output for detail)

The current workaround is to convert the file to WAV and then to MP4:

ffmpeg -i src.mp3 audio.wav
ffmpeg -i audio.wav audio.mp4

Error while deploying with Portainer

Hi,

I'm trying to deploy the docker image using the compose file available with a few changes -

version: "3.9"

services:
  whisper-ui:
    image: hayabhay/whisper-ui
    build: .
    container_name: whisper-ui
    networks:
      - ai
    volumes:
      - /srv/dev-disk-by-uuid-6bcdc785-ff84-4338-824f-ac76e5d9695c/docker/whisper/data:/data
    restart: unless-stopped
    ports:
      - 1010:8501

networks:
  ai:
    external: true

But I keep getting this error from Portainer -
failed to deploy a stack: whisper-ui Pulling whisper-ui Warning failed to solve: failed to read dockerfile: open /var/lib/docker/tmp/buildkit-mount3601388643/Dockerfile: no such file or directory

What am I doing wrong?

README Usage doesn't work

streamlit run app.py doesn't work, should it be streamlit run transcriber.py ?

the app not working, getting issues

I'm using Windows 10 and installed all the requirements, but I got this issue.

Feature Request: Import Transcripts

Hello, thanks for this awesome project!

I've been transcribing a podcast that I listen to with Whisper.cpp and already have about 2,000 hours transcribed. Is there any way to easily import my pre-existing transcriptions into Whisper-UI?

Maybe have the app scan for new directories in the media folder? It would be easy to copy data to it and rename the files to the format that is currently used.

error at startup : unsupported operand type(s) for |: 'type' and 'NoneType'

Hello,
I have a problem starting the web service.

The server runs Ubuntu 20.04 LTS and I do not understand where the problem comes from.
Any idea that could help would be appreciated.
Thanks

CPU Dynamic Quantization

Would it be possible to add an option to enable dynamic quantization of the model when it's being run on a CPU? This would greatly improve the run-time performance of the OpenAI Whisper model (CPU-only) with minimal to no loss in accuracy.

The benchmarks for this are available here.

The implementation only requires adding a few lines of code using features which are already built into PyTorch.

Implementation

Quantization of the Whisper model requires changing the Linear()
layers within the model to nn.Linear(). This is because you need
to specify which layer types to dynamically quantize, such as:

quantized_model = torch.quantization.quantize_dynamic(
    model_fp32, {torch.nn.Linear}, dtype=torch.qint8
)

However, the Whisper model is designed to be adaptable, i.e.
it can run at different precisions, so its Linear() layer contains
custom code to account for this. The quantized model does not need
that code. You can either change the Linear() layers in
"/whisper/whisper/model.py" yourself (i.e. create a fork of OpenAI-Whisper
which would be compatible with future merges), or you can use
mine from here.
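
As a minimal sketch of the quantize_dynamic call, assuming PyTorch is installed with a CPU quantization backend: the toy model below stands in for Whisper (its layer sizes are made up for illustration) and is built from plain nn.Linear layers, which is the layer type the call is told to quantize.

```python
import torch
from torch import nn

# Toy stand-in for a model built from plain nn.Linear layers (not Whisper).
model_fp32 = nn.Sequential(
    nn.Linear(16, 32),
    nn.ReLU(),
    nn.Linear(32, 4),
).eval()

# Swap every nn.Linear for a dynamically quantized int8 equivalent.
# The original model is left untouched; a converted copy is returned.
quantized_model = torch.quantization.quantize_dynamic(
    model_fp32, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(2, 16)
with torch.no_grad():
    y_fp32 = model_fp32(x)
    y_int8 = quantized_model(x)

# The first layer is now a quantized module rather than nn.Linear.
print(type(quantized_model[0]).__module__)
print(y_int8.shape)
```

The outputs of the two models have the same shape but differ slightly in value, which is the accuracy cost that any benchmark of this approach has to measure.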

WhisperX support

Can you please add WhisperX support for speaker diarization and better timestamping?

Docker image

A Docker image would be great to make it easier to deploy.

A draft Dockerfile for the quantized version could look like this:

FROM python:latest
COPY . .
RUN apt-get update
RUN apt-get install -y ffmpeg
RUN pip install streamlit
RUN pip install setuptools-rust
# RUN pip install git+https://github.com/openai/whisper.git
RUN pip install git+https://github.com/MiscellaneousStuff/whisper.git

ENV PATH="$HOME/.cargo/bin:$PATH"
EXPOSE 8501
VOLUME /data/.whisper_settings.json
CMD streamlit run app/01_🏠_Home.py

I'm trying to figure out what directory is used for the data.

Edit: EXPOSE instead of PORT

Summarize Issue: IndexError: index out of range in self

Getting this error when running Summarize:

"return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self"

Found a bug when transcribing an uploaded file

Update to streamlit 1.20.0 or higher to avoid altair.vegalite.v4 module missing issue

Hello,

When starting the app with streamlit 1.18.1, I get this issue:

ModuleNotFoundError: No module named 'altair.vegalite.v4'

Following this post, I updated to streamlit 1.20.0 and it was fixed, so I kindly suggest updating the requirements accordingly.

Raphaël

Implement Massively Multilingual Speech - Meta's Open Source model with less than half of Whisper's error rate

Please consider implementing Meta's MMS with speech recognition and generation support for over 1000 languages at a drastically reduced error rate compared to Whisper:

https://github.com/facebookresearch/fairseq/tree/main/examples/mms

https://ai.facebook.com/blog/multilingual-model-speech-recognition/

Translation feature request

Given that the Whisper model already supports translation, it would be very cool to also have it supported inside this app.

Feature request.

Hey, thanks for creating this. I have a question: do you have any plans for implementing speaker diarization as well?

Thanks!

Error when uploading local files

For any local file I try to upload, whether audio or video, I get this error:

DownloadError: ERROR: [generic] None: Unable to download webpage: (caused by URLError('unknown url type: c'))
Traceback:
File "C:\Users\Anshul\AppData\Local\Programs\Python\Python311\Lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 552, in _run_script
exec(code, module.__dict__)
File "C:\Users\Anshul\Downloads\frogbase-main\ui\01_🏠_Home.py", line 103, in <module>
fb.add(sources, **opts).transcribe(ignore_captioned=False).embed().index()
^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Anshul\Downloads\frogbase-main\frogbase\core.py", line 246, in add
self._media_buffer = self.media.add(sources, **opts)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Anshul\Downloads\frogbase-main\frogbase\media.py", line 499, in add
added_media += self._add_from_web(source, **opts)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Anshul\Downloads\frogbase-main\frogbase\media.py", line 264, in _add_from_web
ydl.download(url)
File "C:\Users\Anshul\AppData\Local\Programs\Python\Python311\Lib\site-packages\yt_dlp\YoutubeDL.py", line 3485, in download
self.__download_wrapper(self.extract_info)(
File "C:\Users\Anshul\AppData\Local\Programs\Python\Python311\Lib\site-packages\yt_dlp\YoutubeDL.py", line 3460, in wrapper
res = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Anshul\AppData\Local\Programs\Python\Python311\Lib\site-packages\yt_dlp\YoutubeDL.py", line 1549, in extract_info
return self.__extract_info(url, self.get_info_extractor(key), download, extra_info, process)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Anshul\AppData\Local\Programs\Python\Python311\Lib\site-packages\yt_dlp\YoutubeDL.py", line 1578, in wrapper
self.report_error(str(e), e.format_traceback())
File "C:\Users\Anshul\AppData\Local\Programs\Python\Python311\Lib\site-packages\yt_dlp\YoutubeDL.py", line 1042, in report_error
self.trouble(f'{self._format_err("ERROR:", self.Styles.ERROR)} {message}', *args, **kwargs)
File "C:\Users\Anshul\AppData\Local\Programs\Python\Python311\Lib\site-packages\yt_dlp\YoutubeDL.py", line 981, in trouble
raise DownloadError(message, exc_info)
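
The 'unknown url type: c' message suggests the Windows path was handed to the downloader as if it were a URL, since a URL parser reads the drive letter in C:\... as a scheme. Below is a minimal sketch of a check that separates web URLs from local paths. is_web_url is a hypothetical helper and the example path is made up; this is not FrogBase's actual code.

```python
from urllib.parse import urlparse


def is_web_url(source: str) -> bool:
    """Treat a source as a web URL only if it has an http(s) scheme and a host."""
    parsed = urlparse(source)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


windows_path = r"C:\Users\me\Downloads\clip.mp4"  # example path, made up

# The drive letter is parsed as a URL scheme, which matches the error text.
print(urlparse(windows_path).scheme)

print(is_web_url(windows_path))
print(is_web_url("https://www.youtube.com/watch?v=HBxn56l9WcU"))
```

Routing anything that fails this check to the local-file branch would keep drive-letter paths away from the web downloader.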
