stream-translator

Command-line utility to transcribe or translate audio from livestreams in real time. Uses streamlink to get livestream URLs from various services and OpenAI's Whisper for transcription/translation. This script is inspired by audioWhisper, which transcribes/translates desktop audio.

Prerequisites

  1. Install and add ffmpeg to your PATH
  2. Install CUDA on your system. If you installed a CUDA version other than 11.3, change cu113 in requirements.txt accordingly. You can check the installed CUDA version with nvcc --version.
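
For example (the cu117 value below is illustrative; match it to whatever nvcc actually reports):

nvcc --version
# e.g. if this reports CUDA 11.7, replace cu113 with cu117 in requirements.txt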

Setup

  1. Set up a virtual environment.
  2. git clone https://github.com/fortypercnt/stream-translator.git
  3. pip install -r requirements.txt
  4. Make sure that PyTorch is installed with CUDA support (see the quick check below). Whisper will probably not run in real time on a CPU.
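
A minimal way to verify CUDA support from the active environment (a quick sanity check, not part of the project itself):

python -c "import torch; print(torch.cuda.is_available())"

This should print True; if it prints False, reinstall PyTorch with a CUDA-enabled build.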

Command-line usage

python translator.py URL --flags

By default, the URL can be of the form twitch.tv/forsen, and streamlink is used to obtain the .m3u8 link, which is passed to ffmpeg. See streamlink plugins for info on all supported sites.
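
For example, to translate a Korean stream to English with the medium model (the URL is illustrative; this mirrors an invocation from the issues below):

python translator.py https://play.afreecatv.com/rlekfu6/243027771 --model medium --task translate --language Korean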

| --flags | Default Value | Description |
| --- | --- | --- |
| --model | small | Select model size. See here for available models. |
| --task | translate | Whether to transcribe the audio (keep original language) or translate to English. |
| --language | auto | Language spoken in the stream. See here for available languages. |
| --interval | 5 | Interval between calls to the language model in seconds. |
| --history_buffer_size | 0 | Seconds of previous audio/text to use for conditioning the model. Set to 0 to use only audio from the last interval. Note that this can easily lead to repetition/looping if the chosen language/model settings do not produce good results to begin with. |
| --beam_size | 5 | Number of beams in beam search. Set to 0 to use a greedy algorithm instead (faster but less accurate). |
| --best_of | 5 | Number of candidates when sampling with non-zero temperature. |
| --preferred_quality | audio_only | Preferred stream quality option. "best" and "worst" should always be available. Run streamlink URL in the console to see the quality options for your URL. |
| --disable_vad | | Set this flag to disable the additional voice activity detection by Silero VAD. |
| --direct_url | | Set this flag to pass the URL directly to ffmpeg. Otherwise, streamlink is used to obtain the stream URL. |
| --use_faster_whisper | | Set this flag to use the faster_whisper implementation instead of the original OpenAI implementation. |
| --faster_whisper_model_path | whisper-large-v2-ct2/ | Path to a directory containing a Whisper model in the CTranslate2 format. |
| --faster_whisper_device | cuda | Set the device to run faster-whisper on. |
| --faster_whisper_compute_type | float16 | Set the quantization type for faster_whisper. See here for more info. |

Using faster-whisper

faster-whisper provides significant performance improvements over the original OpenAI implementation (~4x faster, ~2x less memory). To use it, follow the instructions here to install faster-whisper and convert your models to the CTranslate2 format. Then run the CLI with --use_faster_whisper and set --faster_whisper_model_path to the location of your converted model.
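
As a sketch of the full workflow (the converter command follows the faster-whisper documentation at the time of writing; verify the exact options against the current docs):

pip install faster-whisper
ct2-transformers-converter --model openai/whisper-large-v2 --output_dir whisper-large-v2-ct2 --copy_files tokenizer.json --quantization float16
python translator.py URL --use_faster_whisper --faster_whisper_model_path whisper-large-v2-ct2/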


stream-translator's Issues

Some way to allow for real-time noise reduction, or direct audio input?

This is a pretty sweet repo; I've been using it a couple of times a week recently. Faster-whisper lets you actually run the large model in real time with good latency on a 3090. Actually it's even more insane: I run TWO LARGE MODELS AT THE SAME TIME, two stream-translators, so that I can have dual subtitles: one transcribed, one translated. It works fine on a 3090 as long as you are just doing normal desktop stuff! Wild.

But when streams have a lot of background noise (music, game sounds) I found you NEED to add some decent real-time noise reduction or Whisper just faceplants over and over.

Mainly I've used the Nvidia Broadcast tool to do this in real time, with a virtual cable if needed to get the audio routed correctly. Whisper is back at full power if I do this. But since stream-translator streams the audio directly I have to use something else instead.

If this could take in mic/speaker device audio as an alternative to streamlink, that would do it. Using this option loses the simplicity and latency benefits of streaming directly, but the alternative is Whisper collapsing in confusion on some streams. I know other repos already take in direct audio, but ideally I want to stop bouncing between them...

Maybe there's a more elegant way to accept direct audio input that doesn't require a wacky virtual cable or whatever? OBS Studio integrates NVIDIA noise reduction via the Broadcast SDK. Or there could be a good open source solution. I tried a couple, but none of the real-time ones were close to good enough compared to the NVIDIA version.
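
For what it's worth, here is a minimal sketch of the kind of device capture that could feed the script, assuming ffmpeg's dshow input on Windows (the device name is illustrative; ffmpeg -list_devices true -f dshow -i dummy lists the real ones):

ffmpeg -f dshow -i audio="CABLE Output (VB-Audio Virtual Cable)" -f s16le -ac 1 -ar 16000 pipe:1

The 16 kHz mono s16le output matches the format Whisper consumes, but wiring this into stream-translator's pipeline would still require code changes.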

Kick/Other Platforms support

Hi, first of all, I want to thank you for creating this project, it's truly excellent.
Second, I'd like to know if it's possible to add support for other websites, such as Kick or any .m3u8 stream. It's something I think would be fantastic! Due to my limited knowledge, I'm not sure how to do this properly, which is why I'm suggesting it in case you consider doing it in the future.

Your repo is one of the best at real-time translation

Would you mind updating this to use the faster-whisper repo, e.g. https://github.com/guillaumekln/faster-whisper?

Or maybe allow the use of any Hugging Face Whisper model (https://huggingface.co/models?other=whisper), similar to the https://github.com/chidiwilliams/buzz/issues implementation. Buzz also seems to use a different repo, not the OpenAI one, because it seems faster than the original. Maybe you want to look into that.

Of course, these are just suggestions. Your repo is already working fine as it is. Thank you.

FileNotFoundError after latest Silero VAD update & previous version 400 Client Error

Thank you for the work you're putting into this! I'd love to buy you a coffee.

The latest pull with Silero VAD seems to have broken it for me. I'm unsure if it's just requirements.txt missing some new dependency; the older version does still mostly work, though.

In the PREVIOUS VERSION, Afreeca was throwing a 400 Client Error that I was going to open an issue for; here are those logs:

[stream.hls][error] Failed to fetch segment 7963: Unable to open URL: http://pc-web.stream.afreecatv.com/live-stmc-38/1920x1080/243027771-common-original-hls_7962_000000C600FB0034C0BA4517C3B4E1CB.TS (400 Client Error: Bad Request for url: http://pc-web.stream.afreecatv.com/live-stmc-38/1920x1080/243027771-common-original-hls_7962_000000C600FB0034C0BA4517C3B4E1CB.TS?aid=.A32.7bbT56vyHM9fKZk.jja6AZUtsFlnrSI8mYT8NsEWNHCWP3DgSweuAmVPY5btJPell0Rbbo-BcSDHzyfqhryLXAQsnIC7H40UFRudQyZTFR3Hf7_fQdhgjyfFsak)

Here are my logs for the latest version with Silero VAD:

PS J:\Projects\Python\Translator\stream-translator> python translator.py https://play.afreecatv.com/rlekfu6/243027771 --model medium --task translate --language Korean
Loading model...
Using cache found in C:\Users\Main Account/.cache\torch\hub\snakers4_silero-vad_master
Traceback (most recent call last):
  File "J:\Projects\Python\Translator\stream-translator\translator.py", line 226, in <module>
    cli()
  File "J:\Projects\Python\Translator\stream-translator\translator.py", line 222, in cli
    main(url, **args)
  File "J:\Projects\Python\Translator\stream-translator\translator.py", line 117, in main
    vad = VAD()
  File "J:\Projects\Python\Translator\stream-translator\vad.py", line 9, in __init__
    self.model, _ = torch.hub.load(
  File "C:\Users\Main Account\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\hub.py", line 540, in load
    model = _load_local(repo_or_dir, model, *args, **kwargs)
  File "C:\Users\Main Account\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\hub.py", line 566, in _load_local
    hub_module = _import_module(MODULE_HUBCONF, hubconf_path)
  File "C:\Users\Main Account\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\hub.py", line 89, in _import_module
    spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "C:\Users\Main Account/.cache\torch\hub\snakers4_silero-vad_master\hubconf.py", line 4, in <module>
    from utils_vad import (init_jit_model,
  File "C:\Users\Main Account/.cache\torch\hub\snakers4_silero-vad_master\utils_vad.py", line 2, in <module>
    import torchaudio
  File "C:\Users\Main Account\AppData\Local\Programs\Python\Python39\lib\site-packages\torchaudio\__init__.py", line 1, in <module>
    from torchaudio import (  # noqa: F401
  File "C:\Users\Main Account\AppData\Local\Programs\Python\Python39\lib\site-packages\torchaudio\_extension.py", line 103, in <module>
    _init_extension()
  File "C:\Users\Main Account\AppData\Local\Programs\Python\Python39\lib\site-packages\torchaudio\_extension.py", line 88, in _init_extension
    _load_lib("libtorchaudio")
  File "C:\Users\Main Account\AppData\Local\Programs\Python\Python39\lib\site-packages\torchaudio\_extension.py", line 51, in _load_lib
    torch.ops.load_library(path)
  File "C:\Users\Main Account\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\_ops.py", line 255, in load_library
    ctypes.CDLL(path)
  File "C:\Users\Main Account\AppData\Local\Programs\Python\Python39\lib\ctypes\__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
FileNotFoundError: Could not find module 'C:\Users\Main Account\AppData\Local\Programs\Python\Python39\Lib\site-packages\torchaudio\lib\libtorchaudio.pyd' (or one of its dependencies). Try using the full path with constructor syntax.

RT audio from microphone

Hi,

I know it clearly says at the beginning of the README file that stream-translator transcribes/translates live audio streams from URLs.

But I still wanted to know: can it be used to translate/transcribe real-time audio from the microphone of the system or PC?

feature request: use wit.ai speech-to-text and DeepL/OpenAI to translate it

Feature Request

Description of the feature you'd like:

I want to use the user's own wit.ai and DeepL API keys for real-time speech-to-text translation.

Feature Background:

After using it for a while, I found that there is often a translation delay (interval=3~5) when using the medium model. It also frequently produces blank output.

I don't know if the translation failures are caused by delays in speech recognition or by incorrect identification of the language.

Also, English is not my native language. After receiving English output, I need to spend some time converting it into my native language, so I hope you can increase the variety of target translation languages.

Proposed Solution

  • speech-to-text: Use wit.ai to convert audio files into text (see the wit.ai docs)

    • Free to use
    • Users can tie a specific language to each API token, so the language is never misidentified.
    • Recognition is very fast and accurate.
      (I use it to solve Google reCAPTCHA voice verification, and it is very fast and accurate.)
  • translate: use DeepL or ChatGPT to translate into the user's target language

    • The DeepL free API and GPT-3.5 Turbo are free to use
    • The user can set the target language (for me: KO (text from wit.ai) -> ZH)
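
A rough sketch of what the proposed pipeline could look like in Python (the endpoints follow the public wit.ai and DeepL docs; the key variables are placeholders, and the wit.ai response shape can vary by API version):

import requests

WIT_TOKEN = "..."  # placeholder: your wit.ai server access token
DEEPL_KEY = "..."  # placeholder: your DeepL API key

def speech_to_text(wav_bytes: bytes) -> str:
    # POST raw audio to wit.ai's speech endpoint and read back the recognized text
    resp = requests.post(
        "https://api.wit.ai/speech",
        headers={"Authorization": f"Bearer {WIT_TOKEN}", "Content-Type": "audio/wav"},
        data=wav_bytes,
    )
    resp.raise_for_status()
    return resp.json().get("text", "")

def translate(text: str, target_lang: str = "ZH") -> str:
    # DeepL free-tier translation endpoint
    resp = requests.post(
        "https://api-free.deepl.com/v2/translate",
        data={"auth_key": DEEPL_KEY, "text": text, "target_lang": target_lang},
    )
    resp.raise_for_status()
    return resp.json()["translations"][0]["text"]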

Allow the ability to specify stream quality within stream-translator

With certain Streamlink websites/plugins, the quality needs to be specified; otherwise stream-translator just outputs "Stream ended".

I came across this using the afreeca Streamlink plugin specifically. An option to choose any stream quality, or a default quality, would be great!
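
For reference, with the --preferred_quality flag documented above, this would look something like the following (the quality value is illustrative; run streamlink URL to see what the plugin actually offers):

python translator.py URL --preferred_quality 720p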
