stream-translator

Command-line utility to transcribe or translate audio from livestreams in real time. Uses streamlink to get livestream URLs from various services and OpenAI's Whisper for transcription/translation. This script is inspired by audioWhisper, which transcribes/translates desktop audio.

Prerequisites

  1. Install and add ffmpeg to your PATH
  2. Install CUDA on your system. If you installed a CUDA version other than 11.3, change cu113 in requirements.txt accordingly. You can check the installed CUDA version with nvcc --version.
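
For example (the cu117 value below is illustrative; match it to whatever nvcc actually reports):

nvcc --version
# e.g. if this reports CUDA 11.7, replace cu113 with cu117 in requirements.txt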

Setup

  1. Set up a virtual environment.
  2. git clone https://github.com/fortypercnt/stream-translator.git
  3. pip install -r requirements.txt
  4. Make sure that PyTorch is installed with CUDA support (see the quick check below). Whisper will probably not run in real time on a CPU.
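
A minimal way to verify CUDA support from the active environment (a quick sanity check, not part of the project itself):

python -c "import torch; print(torch.cuda.is_available())"

This should print True; if it prints False, reinstall PyTorch with a CUDA-enabled build.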

Command-line usage

python translator.py URL --flags

By default, the URL can be of the form twitch.tv/forsen, and streamlink is used to obtain the .m3u8 link, which is passed to ffmpeg. See streamlink plugins for info on all supported sites.
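
For example, to translate a Korean stream to English with the medium model (the URL is illustrative; this mirrors an invocation from the issues below):

python translator.py https://play.afreecatv.com/rlekfu6/243027771 --model medium --task translate --language Korean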

| --flags | Default Value | Description |
| --- | --- | --- |
| --model | small | Select model size. See here for available models. |
| --task | translate | Whether to transcribe the audio (keep original language) or translate to English. |
| --language | auto | Language spoken in the stream. See here for available languages. |
| --interval | 5 | Interval between calls to the language model in seconds. |
| --history_buffer_size | 0 | Seconds of previous audio/text to use for conditioning the model. Set to 0 to use only audio from the last interval. Note that this can easily lead to repetition/looping if the chosen language/model settings do not produce good results to begin with. |
| --beam_size | 5 | Number of beams in beam search. Set to 0 to use a greedy algorithm instead (faster but less accurate). |
| --best_of | 5 | Number of candidates when sampling with non-zero temperature. |
| --preferred_quality | audio_only | Preferred stream quality option. "best" and "worst" should always be available. Run streamlink URL in the console to see the quality options for your URL. |
| --disable_vad | | Set this flag to disable the additional voice activity detection by Silero VAD. |
| --direct_url | | Set this flag to pass the URL directly to ffmpeg. Otherwise, streamlink is used to obtain the stream URL. |
| --use_faster_whisper | | Set this flag to use the faster_whisper implementation instead of the original OpenAI implementation. |
| --faster_whisper_model_path | whisper-large-v2-ct2/ | Path to a directory containing a Whisper model in the CTranslate2 format. |
| --faster_whisper_device | cuda | Set the device to run faster-whisper on. |
| --faster_whisper_compute_type | float16 | Set the quantization type for faster_whisper. See here for more info. |

Using faster-whisper

faster-whisper provides significant performance improvements over the original OpenAI implementation (~4x faster, ~2x less memory). To use it, follow the instructions here to install faster-whisper and convert your models to the CTranslate2 format. Then run the CLI with --use_faster_whisper and set --faster_whisper_model_path to the location of your converted model.
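
As a sketch of the full workflow (the converter command follows the faster-whisper documentation at the time of writing; verify the exact options against the current docs):

pip install faster-whisper
ct2-transformers-converter --model openai/whisper-large-v2 --output_dir whisper-large-v2-ct2 --copy_files tokenizer.json --quantization float16
python translator.py URL --use_faster_whisper --faster_whisper_model_path whisper-large-v2-ct2/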


stream-translator's Issues

Some way to allow for real-time noise reduction, or direct audio input?

This is a pretty sweet repo; I've been using it a couple of times a week recently. Faster-whisper lets you actually run the large model in real time with good latency on a 3090. Actually it's even more insane: I run TWO LARGE MODELS AT THE SAME TIME, two stream-translators, so that I can have dual subtitles: one transcribed, one translated. It works fine on a 3090 as long as you are just doing normal desktop stuff! Wild.

But when streams have a lot of background noise (music, game sounds) I found you NEED to add some decent real-time noise reduction or Whisper just faceplants over and over.

Mainly I've used the Nvidia Broadcast tool to do this in real time, with a virtual cable if needed to get the audio routed correctly. Whisper is back at full power if I do this. But since stream-translator streams the audio directly I have to use something else instead.

If this could take in mic/speaker device audio as an alternative to streamlink, that would do it. Using this option loses the simplicity and latency benefits of streaming directly, but the alternative is Whisper collapsing in confusion on some streams. I know other repos already take in direct audio, but ideally I want to stop bouncing between them...

Maybe there's a more elegant way to accept direct audio input that doesn't require a wacky virtual cable or whatever? OBS Studio integrates NVIDIA noise reduction via the Broadcast SDK. Or there could be a good open source solution. I tried a couple, but none of the real-time ones were close to good enough compared to the NVIDIA version.
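
For what it's worth, here is a minimal sketch of the kind of device capture that could feed the script, assuming ffmpeg's dshow input on Windows (the device name is illustrative; ffmpeg -list_devices true -f dshow -i dummy lists the real ones):

ffmpeg -f dshow -i audio="CABLE Output (VB-Audio Virtual Cable)" -f s16le -ac 1 -ar 16000 pipe:1

The 16 kHz mono s16le output matches the format Whisper consumes, but wiring this into stream-translator's pipeline would still require code changes.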

Kick/Other Platforms support

Hi, first of all, I want to thank you for creating this project, it's truly excellent.
Second, I'd like to know if it's possible to add support for other websites, such as Kick or any .m3u8 stream. It's something I think would be fantastic! Due to my limited knowledge, I'm not sure how to do this properly, which is why I'm suggesting it in case you consider doing it in the future.

Your repo is one of the best at real-time translation

Would you mind updating this to use the faster-whisper repo, e.g. https://github.com/guillaumekln/faster-whisper?

Or maybe allow the use of any Hugging Face Whisper model (https://huggingface.co/models?other=whisper), similar to the https://github.com/chidiwilliams/buzz/issues implementation. Buzz also seems to use a different repo, not the OpenAI one, because it seems faster than the original. Maybe you want to look into that.

Of course, these are just suggestions. Your repo is already working fine as it is. Thank you.

FileNotFoundError after latest Silero VAD update & previous version 400 Client Error

Thank you for the work you're putting into this! I'd love to buy you a coffee.

The latest pull with Silero VAD seems to have broken it for me. I'm unsure if it's just requirements.txt missing some new dependency; the older version does still mostly work, though.

In the PREVIOUS VERSION, Afreeca was throwing a 400 Client Error that I was going to open an issue for; here are those logs:

[stream.hls][error] Failed to fetch segment 7963: Unable to open URL: http://pc-web.stream.afreecatv.com/live-stmc-38/1920x1080/243027771-common-original-hls_7962_000000C600FB0034C0BA4517C3B4E1CB.TS (400 Client Error: Bad Request for url: http://pc-web.stream.afreecatv.com/live-stmc-38/1920x1080/243027771-common-original-hls_7962_000000C600FB0034C0BA4517C3B4E1CB.TS?aid=.A32.7bbT56vyHM9fKZk.jja6AZUtsFlnrSI8mYT8NsEWNHCWP3DgSweuAmVPY5btJPell0Rbbo-BcSDHzyfqhryLXAQsnIC7H40UFRudQyZTFR3Hf7_fQdhgjyfFsak)

Here are my logs for the latest version with Silero VAD:

PS J:\Projects\Python\Translator\stream-translator> python translator.py https://play.afreecatv.com/rlekfu6/243027771 --model medium --task translate --language Korean
Loading model...
Using cache found in C:\Users\Main Account/.cache\torch\hub\snakers4_silero-vad_master
Traceback (most recent call last):
  File "J:\Projects\Python\Translator\stream-translator\translator.py", line 226, in <module>
    cli()
  File "J:\Projects\Python\Translator\stream-translator\translator.py", line 222, in cli
    main(url, **args)
  File "J:\Projects\Python\Translator\stream-translator\translator.py", line 117, in main
    vad = VAD()
  File "J:\Projects\Python\Translator\stream-translator\vad.py", line 9, in __init__
    self.model, _ = torch.hub.load(
  File "C:\Users\Main Account\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\hub.py", line 540, in load
    model = _load_local(repo_or_dir, model, *args, **kwargs)
  File "C:\Users\Main Account\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\hub.py", line 566, in _load_local
    hub_module = _import_module(MODULE_HUBCONF, hubconf_path)
  File "C:\Users\Main Account\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\hub.py", line 89, in _import_module
    spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "C:\Users\Main Account/.cache\torch\hub\snakers4_silero-vad_master\hubconf.py", line 4, in <module>
    from utils_vad import (init_jit_model,
  File "C:\Users\Main Account/.cache\torch\hub\snakers4_silero-vad_master\utils_vad.py", line 2, in <module>
    import torchaudio
  File "C:\Users\Main Account\AppData\Local\Programs\Python\Python39\lib\site-packages\torchaudio\__init__.py", line 1, in <module>
    from torchaudio import (  # noqa: F401
  File "C:\Users\Main Account\AppData\Local\Programs\Python\Python39\lib\site-packages\torchaudio\_extension.py", line 103, in <module>
    _init_extension()
  File "C:\Users\Main Account\AppData\Local\Programs\Python\Python39\lib\site-packages\torchaudio\_extension.py", line 88, in _init_extension
    _load_lib("libtorchaudio")
  File "C:\Users\Main Account\AppData\Local\Programs\Python\Python39\lib\site-packages\torchaudio\_extension.py", line 51, in _load_lib
    torch.ops.load_library(path)
  File "C:\Users\Main Account\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\_ops.py", line 255, in load_library
    ctypes.CDLL(path)
  File "C:\Users\Main Account\AppData\Local\Programs\Python\Python39\lib\ctypes\__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
FileNotFoundError: Could not find module 'C:\Users\Main Account\AppData\Local\Programs\Python\Python39\Lib\site-packages\torchaudio\lib\libtorchaudio.pyd' (or one of its dependencies). Try using the full path with constructor syntax.

RT audio from microphone

Hi,

I know it clearly says at the beginning of the README file that stream-translator transcribes/translates live audio streams from URLs.

But I still wanted to know: can it be used to translate/transcribe real-time audio from the microphone of the system or PC?

feature request: use wit.ai speech-to-text and DeepL/OpenAI to translate it

Feature Request

Description of the feature you'd like:

I want to use the user's own wit.ai and DeepL API keys for real-time speech-to-text translation.

Feature Background:

After using it for a while, I found that there is often a translation delay (interval=3~5) when using the medium model. It also frequently produces blank output.

I don't know if the translation failures are caused by delays in speech recognition or by incorrect identification of the language.

Also, English is not my native language. After receiving English output, I need to spend some time converting it into my native language, so I hope you can increase the variety of target translation languages.

Proposed Solution

  • speech-to-text: Use wit.ai to convert audio files into text (see the wit.ai docs)

    • Free to use
    • Users can tie a specific language to each API token, so the language is never misidentified.
    • Recognition is very fast and accurate.
      (I use it to solve Google reCAPTCHA voice verification, and it is very fast and accurate.)
  • translate: use DeepL or ChatGPT to translate into the user's target language

    • The DeepL free API and GPT-3.5 Turbo are free to use
    • The user can set the target language (for me: KO (text from wit.ai) -> ZH)
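
A rough sketch of what the proposed pipeline could look like in Python (the endpoints follow the public wit.ai and DeepL docs; the key variables are placeholders, and the wit.ai response shape can vary by API version):

import requests

WIT_TOKEN = "..."  # placeholder: your wit.ai server access token
DEEPL_KEY = "..."  # placeholder: your DeepL API key

def speech_to_text(wav_bytes: bytes) -> str:
    # POST raw audio to wit.ai's speech endpoint and read back the recognized text
    resp = requests.post(
        "https://api.wit.ai/speech",
        headers={"Authorization": f"Bearer {WIT_TOKEN}", "Content-Type": "audio/wav"},
        data=wav_bytes,
    )
    resp.raise_for_status()
    return resp.json().get("text", "")

def translate(text: str, target_lang: str = "ZH") -> str:
    # DeepL free-tier translation endpoint
    resp = requests.post(
        "https://api-free.deepl.com/v2/translate",
        data={"auth_key": DEEPL_KEY, "text": text, "target_lang": target_lang},
    )
    resp.raise_for_status()
    return resp.json()["translations"][0]["text"]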

Allow the ability to specify stream quality within stream-translator

With certain Streamlink websites/plugins, the quality needs to be specified; otherwise stream-translator just outputs "Stream ended".

I came across this using the afreeca Streamlink plugin specifically. An option to choose any stream quality, or a default quality, would be great!
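
For reference, with the --preferred_quality flag documented above, this would look something like the following (the quality value is illustrative; run streamlink URL to see what the plugin actually offers):

python translator.py URL --preferred_quality 720p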
