hayabhay / frogbase

Transform audio-visual content into navigable knowledge.

Home Page: https://frogbase.dev

License: MIT License

Python 100.00%
embeddings package python search semantic-search speech-to-text streamlit ui

frogbase's Introduction

🐸 FrogBase

Create navigable knowledge from multi-media content

FrogBase (previously whisper-ui) simplifies the download-transcribe-embed-index workflow for multi-media content. It does so by linking content from various platforms (yt_dlp) with speech-to-text models (OpenAI's Whisper), image & text encoders (SentenceTransformers), and embedding stores (hnswlib).

⚠️ Warning: This is currently a pre-release version and is known to be very unstable. For stable releases, please use any 1.x versions.

from frogbase import FrogBase
fb = FrogBase()
fb.demo()
fb.search("What is the name of the squeaky frog?")

Full Documentation (WIP).

FrogBase also comes with a ready-to-use UI for non-technical users!

Features

FrogBase currently provides functionality to:

  • Download media files from a wide range of platforms (YouTube, TikTok, Vimeo, etc.) using yt_dlp
  • Transcribe audio streams for downloaded & local files using OpenAI's Whisper
  • Embed transcribed text from corresponding video segments using Sentence Transformers
  • Index & search the embedded content using hnswlib

FrogBase also includes a Streamlit UI to provide a simple GUI for the above functionality enabling a locally hosted, interactive experience.

Quickstart

Software Developers

This section is for software developers who want to use FrogBase as a python package.

  1. Install ffmpeg and FrogBase

    sudo apt install ffmpeg
    pip install frogbase
  2. Import FrogBase and use it as follows -

    from frogbase import FrogBase
    
    fb = FrogBase()
    
    sources = [
       "https://www.youtube.com/watch?v=HBxn56l9WcU",
       "https://www.youtube.com/@hayabhay"
    ]
    
    fb.add(sources)
    
    fb.search("What is the name of the squeaky frog?")

Non-technical Users

This section is for non-technical users who want to use FrogBase primarily through the accompanying Streamlit UI.

  1. Download the latest release of FrogBase from here and unzip it. Alternatively, clone the repository: git clone https://github.com/hayabhay/frogbase.git

  2. Install FrogBase dependencies manually and run the UI.

    Note: This also requires ffmpeg to be installed on your system. You can install it using sudo apt install ffmpeg on Ubuntu.

    1. Using pip

      pip install frogbase streamlit
      streamlit run ui/01_🏠_Home.py

[Coming soon] Instructions, environment for installation using Docker & Anaconda

frogbase's People

Contributors

andchir, armandfardeau, badbl0cks, d3fense, eidenz, hayabhay

frogbase's Issues

TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'

Having trouble getting this working on Windows. Any ideas?

Python 3.9.4

(venv) PS C:\Users\tiki\Documents\Transcription\frogbase> streamlit run ui/01_🏠_Home.py

  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8501
  Network URL: http://10.0.0.156:8501

2023-09-19 10:45:53.943 Uncaught app exception
Traceback (most recent call last):
  File "c:\users\tiki\documents\transcription\frogbase\venv\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 552, in _run_script
    exec(code, module.__dict__)
  File "C:\Users\tiki\Documents\Transcription\frogbase\ui\01_🏠_Home.py", line 13, in <module>
    init_session(st.session_state)
  File "ui\config.py", line 60, in init_session
    from frogbase import FrogBase
  File "C:\Users\tiki\Documents\Transcription\frogbase\frogbase\__init__.py", line 1, in <module>
    from .core import FrogBase
  File "C:\Users\tiki\Documents\Transcription\frogbase\frogbase\core.py", line 13, in <module>
    from frogbase.captions import Captions
  File "C:\Users\tiki\Documents\Transcription\frogbase\frogbase\captions.py", line 17, in <module>
    class Captions(BaseModel):
  File "C:\Users\tiki\Documents\Transcription\frogbase\frogbase\captions.py", line 42, in Captions
    settings: dict | None = Field(default=None, description="The settings under which these captions were generated.")
TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'
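
A likely cause, going by the traceback: the `dict | None` annotation applies the union operator to types, which only evaluates at runtime on Python 3.10 and later, while this report is on Python 3.9.4. Upgrading the interpreter is one way out. The sketch below shows the spelling that also works on 3.9; `CaptionsSketch` is a hypothetical stand-in built on a dataclass, not the project's actual pydantic model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CaptionsSketch:
    """Hypothetical stand-in for the model in the traceback, not FrogBase's real class."""

    # Optional[dict] means the same as `dict | None`, and it also evaluates on Python 3.9.
    settings: Optional[dict] = None


captions = CaptionsSketch()
print(captions.settings is None)
```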

"Download error" when attempting to upload from local storage

DownloadError: ERROR: [generic] None: Unable to download webpage: (caused by URLError('unknown url type: c'))

Traceback:

File "C:\Users\tzundo\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 552, in _run_script
exec(code, module.__dict__)
File "C:\Users\tzundo\Documents\frogbase (ai transcribe)\frogbase-2.0.0a1\ui\01_🏠_Home.py", line 103, in <module>
fb.add(sources, **opts).transcribe(ignore_captioned=True)
^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\tzundo\Documents\frogbase (ai transcribe)\frogbase-2.0.0a1\frogbase\core.py", line 237, in add
self._media_buffer = self.media.add(sources, **opts)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\tzundo\Documents\frogbase (ai transcribe)\frogbase-2.0.0a1\frogbase\media.py", line 499, in add
added_media += self._add_from_web(source, **opts)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\tzundo\Documents\frogbase (ai transcribe)\frogbase-2.0.0a1\frogbase\media.py", line 264, in _add_from_web
ydl.download(url)
File "C:\Users\tzundo\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\yt_dlp\YoutubeDL.py", line 3485, in download
self.__download_wrapper(self.extract_info)(
File "C:\Users\tzundo\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\yt_dlp\YoutubeDL.py", line 3460, in wrapper
res = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\tzundo\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\yt_dlp\YoutubeDL.py", line 1549, in extract_info
return self.__extract_info(url, self.get_info_extractor(key), download, extra_info, process)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\tzundo\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\yt_dlp\YoutubeDL.py", line 1578, in wrapper
self.report_error(str(e), e.format_traceback())
File "C:\Users\tzundo\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\yt_dlp\YoutubeDL.py", line 1042, in report_error
self.trouble(f'{self._format_err("ERROR:", self.Styles.ERROR)} {message}', *args, **kwargs)
File "C:\Users\tzundo\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\yt_dlp\YoutubeDL.py", line 981, in trouble
raise DownloadError(message, exc_info)
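
One plausible reading of `unknown url type: c`: the Windows path of the uploaded file (starting with `C:\`) reaches the web-download branch, and the drive letter is parsed as a URL scheme. The sketch below reproduces that parsing behaviour with the standard library and shows one way to tell local paths from web URLs. `looks_like_web_url` is a hypothetical helper for illustration, not part of FrogBase or yt_dlp.

```python
from urllib.parse import urlparse

# Assumption for this sketch: only http(s) sources count as web sources.
WEB_SCHEMES = {"http", "https"}


def looks_like_web_url(source: str) -> bool:
    """Return True only when the source has an http or https scheme."""
    return urlparse(source).scheme.lower() in WEB_SCHEMES


windows_path = r"C:\Users\me\Videos\clip.mp4"

# The drive letter is reported as the URL scheme.
print(urlparse(windows_path).scheme)
print(looks_like_web_url(windows_path))
print(looks_like_web_url("https://www.youtube.com/watch?v=HBxn56l9WcU"))
```

Routing on an allow-list of schemes, rather than on "has any scheme", keeps drive letters out of the download path.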

Feature request - quick way to pause audio without losing spot in page?

Hey, really enjoying checking this out so far, thanks for sharing it.

I'm trying this out by reading along some transcripts for some YouTube videos and taking notes, and sometimes I have to pause to finish my note but I don't have an easy way of doing that without scrolling all the way up the page to the pause button on the audio player (I'm currently just using the trimmed audio player). But then after I pause, I have lost my spot in the page on the transcript. Any ideas for making pausing the audio player easier and letting me keep my spot on the page to make it easy to take notes?

One more small request: in the terminal stdout logs I see the transcription timestamps in MM:SS.mmm format, but in the Streamlit UI the timestamps appear only in seconds (181.6s instead of 03:01.600). Is there any way to show the Streamlit UI timestamps as minutes, seconds, and milliseconds?
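
The timestamp conversion requested here can be sketched in a few lines. `format_timestamp` is a hypothetical helper name, and where it would be called in the UI code is not something this page shows.

```python
def format_timestamp(seconds: float) -> str:
    """Format a duration in seconds as MM:SS.mmm."""
    # Work in whole milliseconds to avoid floating-point drift in the output.
    total_ms = round(seconds * 1000)
    minutes, remainder_ms = divmod(total_ms, 60_000)
    secs, millis = divmod(remainder_ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{millis:03d}"


print(format_timestamp(181.6))  # 03:01.600
```

Minutes are not wrapped into hours, so a two-hour recording would show as 120:00.000.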

RuntimeError: The size of tensor a (261) must match the size of tensor b (3) at non-singleton dimension 3

Hello, I get this error with the large model.

nvidia rtx3080ti
32Gb ram
ryzen 7 5800X3D
debian 11 with wsl2

2023-06-17 11:41:10.806 Uncaught app exception | 0/222333 [00:00<?, ?frames/s]
Traceback (most recent call last):
File "/home/ben/.local/lib/python3.9/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 565, in _run_script
exec(code, module.__dict__)
File "/home/ben/whisper-ui/app/01_🏠_Home.py", line 77, in <module>
media_manager.add(
File "app/core.py", line 181, in add
self._create(source_list=source_list, source_type=source_type, **whisper_args)
File "app/core.py", line 163, in _create
self._transcribe_and_save(media_obj, whisper_model, **whisper_args)
File "app/core.py", line 74, in _transcribe_and_save
transcript = self._transcribe(media_obj.filepath, whisper_model, **whisper_args)
File "app/core.py", line 64, in _transcribe
transcript = transcriber.transcribe(
File "/home/ben/.local/lib/python3.9/site-packages/whisper/transcribe.py", line 181, in transcribe
result: DecodingResult = decode_with_fallback(segment)
File "/home/ben/.local/lib/python3.9/site-packages/whisper/transcribe.py", line 117, in decode_with_fallback
decode_result = model.decode(segment, options)
File "/home/ben/.local/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/ben/.local/lib/python3.9/site-packages/whisper/decoding.py", line 705, in decode
result = DecodingTask(model, options).run(mel)
File "/home/ben/.local/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/ben/.local/lib/python3.9/site-packages/whisper/decoding.py", line 637, in run
tokens, sum_logprobs, no_speech_probs = self._main_loop(audio_features, tokens)
File "/home/ben/.local/lib/python3.9/site-packages/whisper/decoding.py", line 592, in _main_loop
logits = self.inference.logits(tokens, audio_features)
File "/home/ben/.local/lib/python3.9/site-packages/whisper/decoding.py", line 145, in logits
return self.model.decoder(tokens, audio_features, kv_cache=self.kv_cache)
File "/home/ben/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ben/.local/lib/python3.9/site-packages/whisper/model.py", line 190, in forward
x = block(x, xa, mask=self.mask, kv_cache=kv_cache)
File "/home/ben/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ben/.local/lib/python3.9/site-packages/whisper/model.py", line 125, in forward
x = x + self.attn(self.attn_ln(x), mask=mask, kv_cache=kv_cache)[0]
File "/home/ben/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ben/.local/lib/python3.9/site-packages/whisper/model.py", line 85, in forward
wv, qk = self.qkv_attention(q, k, v, mask)
File "/home/ben/.local/lib/python3.9/site-packages/whisper/model.py", line 97, in qkv_attention
qk = qk + mask[:n_ctx, :n_ctx]
RuntimeError: The size of tensor a (261) must match the size of tensor b (3) at non-singleton dimension 3

The connection has timed out

After #29 was resolved and I was able to install things I still haven't been able to actually get the UI working.
When I click the link to open the local IP address I get The connection has timed out. This happens both locally on my mac (with docker) and in gitpod (without docker).

I don't see any error messages.
Any pointers on debugging?

If you want to try in gitpod you can at https://gitpod.io/?editor=code#https://github.com/hayabhay/whisper-ui
Just make sure to install ffmpeg

Using docker for existing users isn't backwards compatible

Mounting data directory into a docker container won't work smoothly since the database saves absolute local paths of media files.

Solution: Patch to update all database paths so they're all relative to the repo's location (#23) in a backwards compatible way.
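
A minimal sketch of the path rewrite described above, assuming stored paths are plain strings and the repository root is known. `to_repo_relative` is a hypothetical helper, and `PurePosixPath` is used only to keep the example platform-independent.

```python
from pathlib import PurePosixPath


def to_repo_relative(stored_path: str, repo_root: str) -> str:
    """Rewrite an absolute path under repo_root as a path relative to it."""
    path = PurePosixPath(stored_path)
    try:
        return path.relative_to(repo_root).as_posix()
    except ValueError:
        # Already relative, or stored outside the repo: leave it untouched
        # so existing entries keep working.
        return stored_path


print(to_repo_relative("/home/me/whisper-ui/data/media/talk.mp3", "/home/me/whisper-ui"))
```

Leaving unmatched paths unchanged is what makes the rewrite safe to run more than once.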

[Feature Request] add support for more types of media file e.g. webm

As a language learner, I collected lots of video and audio files to increase my input, the batch transcribing feature of this UI makes my life a lot easier. Much thanks!

I have a suggestion for the new update that I hope you could consider. I would like to request that you add support for more types of media files, such as .webm. This would make it more convenient to use.

I understand that this may require time and effort. I just wanted to share my feedback and express my gratitude for this awesome ui.

Unable to upload local content

When trying to upload local content (a 175 MB .mkv file), the following error is given. YouTube works fine.

This is running in a Ubuntu VM.

I've attempted to move the temp directory but it went missing after trying to upload.

FileNotFoundError: [Errno 2] No such file or directory: '/tmp/Filenamehere.mkv'
Traceback:

File "/usr/local/lib/python3.10/dist-packages/streamlit/runtime/scriptrunner/script_runner.py", line 552, in _run_script
exec(code, module.__dict__)
File "/home/frogbase/Desktop/frogbase/frogbase-main/ui/01_🏠_Home.py", line 103, in <module>
fb.add(sources, **opts).transcribe(ignore_captioned=False).embed().index()
File "/home/frogbase/Desktop/frogbase/frogbase-main/frogbase/core.py", line 246, in add
self._media_buffer = self.media.add(sources, **opts)
File "/home/frogbase/Desktop/frogbase/frogbase-main/frogbase/media.py", line 504, in add
added_media += self._add_from_disk(source, **opts)
File "/home/frogbase/Desktop/frogbase/frogbase-main/frogbase/media.py", line 464, in _add_from_disk
upload_date=datetime.fromtimestamp(media_file.stat().st_ctime).strftime("%Y%m%d"),
File "/usr/lib/python3.10/pathlib.py", line 1097, in stat
return self._accessor.stat(self, follow_symlinks=follow_symlinks)

Issue when running with docker

Hey y'all thanks so much for making this. It looks really great and will be helpful for my studies.

When I run docker-compose up it exits with the following error:

 => ERROR [6/7] RUN pip install --no-cache-dir -r /requirements.txt                                                  20.7s
------
 > [6/7] RUN pip install --no-cache-dir -r /requirements.txt:
#0 0.573 Collecting openai-whisper==20230124
#0 0.694   Downloading openai-whisper-20230124.tar.gz (1.2 MB)
#0 1.002      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 3.8 MB/s eta 0:00:00
#0 1.059   Preparing metadata (setup.py): started
#0 1.410   Preparing metadata (setup.py): finished with status 'done'
#0 1.449 Collecting pytube==12.1.2
#0 1.473   Downloading pytube-12.1.2-py3-none-any.whl (57 kB)
#0 1.489      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 57.0/57.0 kB 3.6 MB/s eta 0:00:00
#0 1.726 Collecting SQLAlchemy==2.0.3
#0 1.772   Downloading SQLAlchemy-2.0.3-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (2.8 MB)
#0 4.235      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.8/2.8 MB 1.2 MB/s eta 0:00:00
#0 4.313 Collecting streamlit==1.18.1
#0 4.337   Downloading streamlit-1.18.1-py2.py3-none-any.whl (9.6 MB)
#0 8.298      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 9.6/9.6 MB 2.4 MB/s eta 0:00:00
#0 8.548 Collecting numpy
#0 8.575   Downloading numpy-1.24.2-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (14.0 MB)
#0 20.32      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.0/14.0 MB 1.1 MB/s eta 0:00:00
#0 20.41 ERROR: Ignored the following versions that require a different python version: 0.55.2 Requires-Python <3.5; 1.21.2 Requires-Python >=3.7,<3.11; 1.21.3 Requires-Python >=3.7,<3.11; 1.21.4 Requires-Python >=3.7,<3.11; 1.21.5 Requires-Python >=3.7,<3.11; 1.21.6 Requires-Python >=3.7,<3.11
#0 20.41 ERROR: Could not find a version that satisfies the requirement torch (from openai-whisper) (from versions: none)
#0 20.41 ERROR: No matching distribution found for torch
------
failed to solve: executor failed running [/bin/sh -c pip install --no-cache-dir -r /requirements.txt]: exit code: 1

This could possibly be related to me running with an M1 mac?

Please let me know if there is anything I can do to help debug here.

Feature request Output text as txt, vtt and srt

Thanks for developing this cool Whisper UI project. Would it be possible to output the result as .txt, .vtt and .srt?

It's already in whisper/transcribe.py:

for audio_path in args.pop("audio"):
    result = transcribe(model, audio_path, temperature=temperature, **args)

    audio_basename = os.path.basename(audio_path)

    # save TXT
    with open(os.path.join(output_dir, audio_basename + ".txt"), "w", encoding="utf-8") as txt:
        write_txt(result["segments"], file=txt)

    # save VTT
    with open(os.path.join(output_dir, audio_basename + ".vtt"), "w", encoding="utf-8") as vtt:
        write_vtt(result["segments"], file=vtt)

    # save SRT
    with open(os.path.join(output_dir, audio_basename + ".srt"), "w", encoding="utf-8") as srt:
        write_srt(result["segments"], file=srt)
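
For readers who want the SRT output without depending on Whisper's writer helpers, here is a self-contained sketch. It assumes each segment is a dict with `start`, `end` (both in seconds) and `text` keys, as in Whisper's `result["segments"]`. The function names are hypothetical and this is not Whisper's own implementation.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm, the timestamp style used in SRT files."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text: a numbered cue, a time range, then the cue text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)


example_segments = [
    {"start": 0.0, "end": 2.5, "text": " Hello there."},
    {"start": 2.5, "end": 5.0, "text": " This is a test."},
]
print(segments_to_srt(example_segments))
```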

ffmpeg Issue

Getting this issue :

[Errno 2] No such file or directory: 'ffmpeg'

Does anyone know the solution here?
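
This error usually means the `ffmpeg` executable is not on the PATH of the process running the app; the README lists ffmpeg as a prerequisite. A quick check with the standard library, where `ffmpeg_available` is a hypothetical helper name:

```python
import shutil


def ffmpeg_available() -> bool:
    """Return True when an ffmpeg executable can be found on PATH."""
    return shutil.which("ffmpeg") is not None


if ffmpeg_available():
    print("ffmpeg found at:", shutil.which("ffmpeg"))
else:
    print("ffmpeg not found on PATH; install it and restart the terminal")
```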

Solving environment: failed

I am trying to run it on windows 11 and I am getting the error below:

conda env create -f environment.yml


Collecting package metadata (repodata.json): done
Solving environment: failed

ResolvePackageNotFound:
  - zlib==1.2.13=h5eee18b_0
  - bzip2==1.0.8=h7b6447c_0
  - libffi==3.4.2=h6a678d5_6
  - python==3.11.0=h7a1cb2a_2
  - certifi==2022.9.24=py311h06a4308_0
  - tk==8.6.12=h1ccaba5_0
  - pip==22.2.2=py311h06a4308_0
  - libgcc-ng==11.2.0=h1234567_1
  - openssl==1.1.1s=h7f8727e_0
  - ncurses==6.4=h6a678d5_0
  - libstdcxx-ng==11.2.0=h1234567_1
  - xz==5.2.10=h5eee18b_1
  - readline==8.2=h5eee18b_0
  - ca-certificates==2023.01.10=h06a4308_0
  - sqlite==3.40.1=h5082296_0
  - setuptools==65.5.0=py311h06a4308_0
  - _openmp_mutex==5.1=1_gnu
  - libuuid==1.41.5=h5eee18b_0
  - libgomp==11.2.0=h1234567_1
  - ld_impl_linux-64==2.38=h1181459_1

Any tips on how to get it installed?

Apparent problem with certain special characters in File/ Youtube Video name

New to your app, very happy with it so far.

A file has thrown an error twice now. After some very rough testing, it seems to be related to the colon in each title.

I got the following error with this URL: https://youtu.be/qQ84delhpuw

Traceback:

File "C:\Users\carter\AppData\Local\Programs\Python\Python310\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 565, in _run_script
exec(code, module.__dict__)
File "E:\github\whisper-ui\app\01_🏠_Home.py", line 71, in <module>
media_manager.add(
File "E:\github\whisper-ui\app\core.py", line 165, in add
self._create(source_list=source_list, source_type=source_type, **whisper_args)
File "E:\github\whisper-ui\app\core.py", line 117, in _create
yc.streams.get_by_itag(140).download(save_dir, filename=save_filename)
File "C:\Users\carter\AppData\Local\Programs\Python\Python310\lib\site-packages\pytube\streams.py", line 298, in download
file_path = self.get_file_path(
File "C:\Users\carter\AppData\Local\Programs\Python\Python310\lib\site-packages\pytube\streams.py", line 349, in get_file_path
return os.path.join(target_directory(output_path), filename)
File "C:\Users\carter\AppData\Local\Programs\Python\Python310\lib\site-packages\pytube\helpers.py", line 254, in target_directory
os.makedirs(output_path, exist_ok=True)
File "C:\Users\carter\AppData\Local\Programs\Python\Python310\lib\os.py", line 225, in makedirs
mkdir(name, mode)

Similar issue with this URL: https://youtu.be/uYI1PpMElgM

Traceback:

File "C:\Users\carter\AppData\Local\Programs\Python\Python310\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 565, in _run_script
exec(code, module.__dict__)
File "E:\github\whisper-ui\app\01_🏠_Home.py", line 71, in <module>
media_manager.add(
File "E:\github\whisper-ui\app\core.py", line 165, in add
self._create(source_list=source_list, source_type=source_type, **whisper_args)
File "E:\github\whisper-ui\app\core.py", line 117, in _create
yc.streams.get_by_itag(140).download(save_dir, filename=save_filename)
File "C:\Users\carter\AppData\Local\Programs\Python\Python310\lib\site-packages\pytube\streams.py", line 298, in download
file_path = self.get_file_path(
File "C:\Users\carter\AppData\Local\Programs\Python\Python310\lib\site-packages\pytube\streams.py", line 349, in get_file_path
return os.path.join(target_directory(output_path), filename)
File "C:\Users\carter\AppData\Local\Programs\Python\Python310\lib\site-packages\pytube\helpers.py", line 254, in target_directory
os.makedirs(output_path, exist_ok=True)
File "C:\Users\carter\AppData\Local\Programs\Python\Python310\lib\os.py", line 215, in makedirs
makedirs(head, exist_ok=exist_ok)
File "C:\Users\carter\AppData\Local\Programs\Python\Python310\lib\os.py", line 225, in makedirs
mkdir(name, mode)
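
The colon is a reserved character in Windows file and directory names, which fits the failure inside `os.makedirs` above. A minimal sketch of sanitizing a title before using it as a path component follows; `sanitize_filename` is a hypothetical helper, not the project's code.

```python
import re

# Characters that Windows does not allow in file or directory names.
_WINDOWS_RESERVED = re.compile(r'[<>:"/\\|?*]')


def sanitize_filename(title: str, replacement: str = "_") -> str:
    """Replace reserved characters and trim trailing spaces and dots."""
    cleaned = _WINDOWS_RESERVED.sub(replacement, title)
    # Windows also rejects names that end with a space or a dot.
    return cleaned.rstrip(" .")


print(sanitize_filename("Part 1: The Beginning"))  # Part 1_ The Beginning
```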

[Feature Request] Ability to edit transcript

Hi,

I've recently installed Whisper-UI and have been playing about with it. It seems like a great tool and will come in handy.

One thing I think would be useful is the ability to edit the transcript manually within the web UI. It picked up a few things wrong, and being able to listen and update the transcript from the web UI would help.

Thanks

error report

Hi, I got a new file yesterday and pressed "get sample video", but this is the result. What is the error?

UnicodeDecodeError: 'cp949' codec can't decode byte 0xf0 in position 60: illegal multibyte sequence
Traceback:
File "C:\Users\pc\AppData\Local\Programs\Python\Python310\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 552, in _run_script
exec(code, module.__dict__)
File "C:\frogbase-main\ui\01_🏠_Home.py", line 255, in <module>
fb.demo()
File "C:\frogbase-main\frogbase\core.py", line 376, in demo
self.add(sources, audio_only=False).transcribe(ignore_captioned=False).embed(overwrite=False).index()
File "C:\frogbase-main\frogbase\core.py", line 246, in add
self._media_buffer = self.media.add(sources, **opts)
File "C:\frogbase-main\frogbase\media.py", line 501, in add
added_media += self._add_from_web(source, **opts)
File "C:\frogbase-main\frogbase\media.py", line 278, in _add_from_web
infodict = json.load(f)
File "C:\Users\seo\AppData\Local\Programs\Python\Python310\lib\json\__init__.py", line 293, in load
return loads(fp.read(),
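
The traceback ends inside `json.load`, and byte 0xf0 is the lead byte of a four-byte UTF-8 sequence (an emoji, for instance). A plausible cause is that the JSON file was opened without an explicit encoding, so Python fell back to the Windows locale codec (cp949 here). The sketch below shows the explicit-encoding pattern with made-up data; it is not the project's actual fix.

```python
import json
import tempfile
from pathlib import Path

# Made-up metadata containing a character outside cp949's range.
info = {"title": "Squeaky frog 🐸"}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "info.json"
    # Write and read with an explicit encoding instead of the locale default.
    path.write_text(json.dumps(info, ensure_ascii=False), encoding="utf-8")
    with open(path, encoding="utf-8") as f:
        loaded = json.load(f)

print(loaded == info)
```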

Winerror 2 The system cannot find the file specified

I keep getting this error after uploading an mp3 file:

File "D:\AI-Generators\whisper-ui\venv\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 565, in _run_script
    exec(code, module.__dict__)
File "D:\AI-Generators\whisper-ui\app\01_🏠_Home.py", line 77, in <module>
    media_manager.add(
File "D:\AI-Generators\whisper-ui\app\core.py", line 169, in add
    self._create(source_list=source_list, source_type=source_type, **whisper_args)
File "D:\AI-Generators\whisper-ui\app\core.py", line 151, in _create
    self._transcribe_and_save(media_obj, whisper_model, **whisper_args)
File "D:\AI-Generators\whisper-ui\app\core.py", line 64, in _transcribe_and_save
    transcript = self._transcribe(media_obj.filepath, whisper_model, **whisper_args)
File "D:\AI-Generators\whisper-ui\app\core.py", line 54, in _transcribe
    transcript = transcriber.transcribe(
File "D:\AI-Generators\whisper-ui\venv\lib\site-packages\whisper\transcribe.py", line 84, in transcribe
    mel = log_mel_spectrogram(audio)
File "D:\AI-Generators\whisper-ui\venv\lib\site-packages\whisper\audio.py", line 111, in log_mel_spectrogram
    audio = load_audio(audio)
File "D:\AI-Generators\whisper-ui\venv\lib\site-packages\whisper\audio.py", line 42, in load_audio
    ffmpeg.input(file, threads=0)
File "D:\AI-Generators\whisper-ui\venv\lib\site-packages\ffmpeg\_run.py", line 313, in run
    process = run_async(
File "D:\AI-Generators\whisper-ui\venv\lib\site-packages\ffmpeg\_run.py", line 284, in run_async
    return subprocess.Popen(
File "C:\Program Files\Python310\lib\subprocess.py", line 971, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
File "C:\Program Files\Python310\lib\subprocess.py", line 1440, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,

Cant run it

File "C:\Users\Arthas\AppData\Local\Programs\Python\Python310\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 552, in _run_script
exec(code, module.__dict__)
File "C:\Users\Arthas\Downloads\frogbase-main\ui\01_🏠_Home.py", line 12, in <module>
init_session(st.session_state)
File "C:\Users\Arthas\Downloads\frogbase-main\ui\config.py", line 65, in init_session
session_state.fb = FrogBase(datadir=DATADIR, library=library, verbose=VERBOSE, dev=DEV, persist=True)
File "C:\Users\Arthas\Downloads\frogbase-main\frogbase\core.py", line 109, in __init__
self.library = self._default_library
File "C:\Users\Arthas\Downloads\frogbase-main\frogbase\core.py", line 139, in library
self._initdb()
File "C:\Users\Arthas\Downloads\frogbase-main\frogbase\core.py", line 188, in _initdb
db_meta = self._db.get(Query().type == "meta") if self._db else None
File "C:\Users\Arthas\AppData\Local\Programs\Python\Python310\lib\site-packages\tinydb\database.py", line 268, in __len__
return len(self.table(self.default_table_name))
File "C:\Users\Arthas\AppData\Local\Programs\Python\Python310\lib\site-packages\tinydb\table.py", line 645, in __len__
return len(self._read_table())
File "C:\Users\Arthas\AppData\Local\Programs\Python\Python310\lib\site-packages\tinydb\table.py", line 704, in _read_table
tables = self.storage.read()
File "C:\Users\Arthas\AppData\Local\Programs\Python\Python310\lib\site-packages\tinydb\storages.py", line 136, in read
return json.load(self.handle)
File "C:\Users\Arthas\AppData\Local\Programs\Python\Python310\lib\json\__init__.py", line 293, in load
return loads(fp.read(),
File "C:\Users\Arthas\AppData\Local\Programs\Python\Python310\lib\json\__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "C:\Users\Arthas\AppData\Local\Programs\Python\Python310\lib\json\decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Users\Arthas\AppData\Local\Programs\Python\Python310\lib\json\decoder.py", line 353, in raw_decode
obj, end = self.scan_once(s, idx)

Wrong language detected

Quite often Whisper detects the wrong language when asked to translate from a language other than English. Would it be possible to add an option to select the language of the audio? This could be useful for both the foreign-language transcribe and translate tasks.

File uploads fail to transcribe unless they are MP4 format.

Uploading a PCM WAV file or MP3 audio fails with an error (uploading a WAV file gives a codec error). Converting the file to MP4 first makes the process work.

  Metadata:
    encoder         : Lavf56.40.101
  Duration: 00:08:55.75, start: 0.138125, bitrate: 64 kb/s
  Stream #0:0: Audio: mp3, 8000 Hz, mono, fltp, 64 kb/s
[mp4 @ 0x7f970a304ec0] track 0: muxing mp3 at 8000hz is not standard, to mux anyway set strict to -1
Could not write header for output file #0 (incorrect codec parameters ?): Invalid argument
Error initializing output stream 0:0 --
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
    Last message repeated 1 times
2023-01-25 11:30:51.555 Uncaught app exception
Traceback (most recent call last):
  File "/Users/ben/Developer/whisper-ui/.venv/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 565, in _run_script
    exec(code, module.__dict__)
  File "/Users/ben/Developer/whisper-ui/01_Transcribe.py", line 66, in <module>
    st.session_state.transcription = Transcription(name, input_file, "file", start, duration)
  File "/Users/ben/Developer/whisper-ui/transcriber.py", line 54, in __init__
    ffmpeg.run(audio, overwrite_output=True)
  File "/Users/ben/Developer/whisper-ui/.venv/lib/python3.10/site-packages/ffmpeg/_run.py", line 325, in run
    raise Error('ffmpeg', out, err)
ffmpeg._run.Error: ffmpeg error (see stderr output for detail)

The current workaround is to convert the file to WAV and then to MP4:

ffmpeg -i src.mp3 audio.wav
ffmpeg -i audio.wav audio.mp4

Error while deploying with Portainer

Hi,

I'm trying to deploy the docker image using the compose file available with a few changes -

version: "3.9"

services:
  whisper-ui:
    image: hayabhay/whisper-ui
    build: .
    container_name: whisper-ui
    networks:
      - ai
    volumes:
      - /srv/dev-disk-by-uuid-6bcdc785-ff84-4338-824f-ac76e5d9695c/docker/whisper/data:/data
    restart: unless-stopped
    ports:
      - 1010:8501

networks:
  ai:
    external: true

But I keep getting this error from Portainer -
failed to deploy a stack: whisper-ui Pulling whisper-ui Warning failed to solve: failed to read dockerfile: open /var/lib/docker/tmp/buildkit-mount3601388643/Dockerfile: no such file or directory

What am I doing wrong?

Feature Request: Import Transcripts

Hello, thanks for this awesome project!

I've been transcribing a podcast that I listen to with Whisper.cpp and already have about 2,000 hours transcribed. Is there any way to easily import my pre-existing transcriptions into Whisper-UI?

Maybe have the app scan for new directories in the media folder? It would be easy to copy data to it and rename the files to the format that is currently used.

CPU Dynamic Quantization

Would it be possible for you guys to add an option to enable dynamic quantization of the model when it's being run on a CPU? This would greatly improve the run-time performance of the OpenAI Whisper model (CPU-only) with minimal to no loss in performance.

The benchmarks for this are available here.

The implementation only requires adding a few lines of code using features which are already built into PyTorch.

Implementation

Quantization of the Whisper model requires changing the Linear()
layers within the model to nn.Linear(). This is because you need
to specify which layer types to dynamically quantize, such as:

quantized_model = torch.quantization.quantize_dynamic(
    model_fp32, {torch.nn.Linear}, dtype=torch.qint8
)

However the whisper model is designed to be adaptable, i.e.
it can run at different precisions, so the Linear() layer contains
custom code to account for this. However, this is not required for
the quantized model. You can either change the Linear() layers in
"/whisper/whisper/model.py" yourself (i.e. create a fork of OpenAI-Whisper
which would be compatible with future merges), or you can use
mine from here.

Docker image

A Docker image would be great to make it easier to deploy.

A draft for the quantized version's Dockerfile could look like

FROM python:latest
COPY . .
RUN apt-get update
RUN apt-get install -y ffmpeg
RUN pip install streamlit
RUN pip install setuptools-rust
# RUN pip install git+https://github.com/openai/whisper.git
RUN pip install git+https://github.com/MiscellaneousStuff/whisper.git

ENV PATH="$HOME/.cargo/bin:$PATH"
EXPOSE 8501
VOLUME /data/.whisper_settings.json
CMD streamlit run app/01_🏠_Home.py

I'm trying to figure out what directory is used for the data.

Edit: EXPOSE instead of PORT

Translation feature request

Given that the Whisper model already supports translation, it would be very cool to also have it supported inside this app.

Feature request.

Hey, thanks for creating this. I have a question: do you have any plans for implementing speaker diarization as well?

Thanks!

Error when uploading local files

Any local file I try to upload, whether audio or video, I get this error:

DownloadError: ERROR: [generic] None: Unable to download webpage: (caused by URLError('unknown url type: c'))
Traceback:
File "C:\Users\Anshul\AppData\Local\Programs\Python\Python311\Lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 552, in _run_script
exec(code, module.__dict__)
File "C:\Users\Anshul\Downloads\frogbase-main\ui\01_🏠_Home.py", line 103, in <module>
fb.add(sources, **opts).transcribe(ignore_captioned=False).embed().index()
^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Anshul\Downloads\frogbase-main\frogbase\core.py", line 246, in add
self._media_buffer = self.media.add(sources, **opts)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Anshul\Downloads\frogbase-main\frogbase\media.py", line 499, in add
added_media += self._add_from_web(source, **opts)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Anshul\Downloads\frogbase-main\frogbase\media.py", line 264, in _add_from_web
ydl.download(url)
File "C:\Users\Anshul\AppData\Local\Programs\Python\Python311\Lib\site-packages\yt_dlp\YoutubeDL.py", line 3485, in download
self.__download_wrapper(self.extract_info)(
File "C:\Users\Anshul\AppData\Local\Programs\Python\Python311\Lib\site-packages\yt_dlp\YoutubeDL.py", line 3460, in wrapper
res = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Anshul\AppData\Local\Programs\Python\Python311\Lib\site-packages\yt_dlp\YoutubeDL.py", line 1549, in extract_info
return self.__extract_info(url, self.get_info_extractor(key), download, extra_info, process)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Anshul\AppData\Local\Programs\Python\Python311\Lib\site-packages\yt_dlp\YoutubeDL.py", line 1578, in wrapper
self.report_error(str(e), e.format_traceback())
File "C:\Users\Anshul\AppData\Local\Programs\Python\Python311\Lib\site-packages\yt_dlp\YoutubeDL.py", line 1042, in report_error
self.trouble(f'{self._format_err("ERROR:", self.Styles.ERROR)} {message}', *args, **kwargs)
File "C:\Users\Anshul\AppData\Local\Programs\Python\Python311\Lib\site-packages\yt_dlp\YoutubeDL.py", line 981, in trouble
raise DownloadError(message, exc_info)
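
The `unknown url type: c` message suggests that the Windows path reached the web downloader (`_add_from_web` in the traceback), where the drive letter before the colon was read as a URL scheme. That is a guess from the traceback, not a confirmed diagnosis. A minimal sketch of a check that tells web URLs from local paths, using a hypothetical helper `looks_like_web_url` that is not part of FrogBase:

```python
from urllib.parse import urlparse


def looks_like_web_url(source):
    """Return True only for http(s) URLs that include a host name."""
    parsed = urlparse(source)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


windows_path = r"C:\Users\Anshul\Downloads\clip.mp4"  # hypothetical file name
# urlparse treats the drive letter before the colon as the scheme.
print(urlparse(windows_path).scheme)
print(looks_like_web_url(windows_path))
print(looks_like_web_url("https://www.youtube.com/watch?v=HBxn56l9WcU"))
```

A check along these lines, run before a source is handed to yt_dlp, would let Windows paths be routed to the local-file branch instead.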
