
purfview / whisper-standalone-win


Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.

openai speech-to-text transcriber whisper asr speech-recognition subtitles ctranslate2 faster-whisper whisper-faster

whisper-standalone-win's Introduction

Donate

whisper-standalone-win's People

Contributors

purfview


whisper-standalone-win's Issues

How to do multiple files batch processing?

Since I modify some parameters when generating video subtitles, I need to run it from the command line. Currently, does it support processing multiple videos, or all the video files in the same folder, one by one?

If batch processing is possible, can you tell me how to do it?
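Until there is a documented built-in answer, a plain CMD loop is one way to approximate it. A minimal sketch; the folder path is illustrative, the flags are simply the ones that appear elsewhere on this page, and whisper-faster.exe is assumed to be on PATH or in that folder:

:: Save as a .bat file (use %F instead of %%F if typing directly at the prompt).
cd /d "D:\videos"
for %%F in (*.mp4) do (
    whisper-faster.exe "%%F" --language en --model large-v2 --output_format srt
)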

Accuracy improvements for timestamps

Will the latest commit of faster-whisper be updated to whisper-standalone-win?

The latest commits seem to have improved timestamp accuracy.

I wish you a happy day!

I can't run whisper.exe from CMD for some reason

I followed the video provided, but I keep getting this error: "'whisper.exe' is not recognized as an internal or external command, operable program or batch file."
My whisper.exe is located at D:\FasterWhisper\whisper.exe and I'm running CMD as administrator. I type the following: D: -> dir -> cd fasterwhisper -> whisper.exe, but to no avail.
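That message usually means CMD is not actually sitting in the folder that holds the exe (and the folder is not on PATH). A minimal sketch, using the path from the report above; the audio file in the second call is a hypothetical example:

:: cd /d changes drive and directory in one step; --help should list the available options.
cd /d D:\FasterWhisper
whisper.exe --help

:: ...or call the exe by its full path from any directory:
D:\FasterWhisper\whisper.exe "D:\audio\test.mp3" --model medium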

Invalid system calls when run under Take Command Console

I use a lot of different command-lines, and yet, I don't think I've seen this happen before.

Whisper-faster.exe ends up sending parts of its output back to the shell as commands under the TCC command line, but not under CMD.EXE.

But I abandoned CMD.EXE back when it was command.com in 1988. TCC has been in constant development. So it's not some janky command line, even though most people haven't heard of it. It's really solid. So I'm wondering how this is happening.

There's some very niche incompatibility here because this is not something I've seen in decades of use.

Any idea if we can address it?

whisper-faster.exe --language en --verbose True --device cuda --model large --output_format all "14_The Water Is Wide.mp3"

Standalone Faster-Whisper r134 running on: CUDA

Number of visible GPU devices: 1

Supported compute types by GPU: {'float32', 'int8', 'int8_float16', 'float16'}

[2023-07-06 06:48:38.064] [ctranslate2] [thread 13552] [info] CPU: AuthenticAMD (SSE4.1=true, AVX=true, AVX2=true, AVX512=false)
[2023-07-06 06:48:38.064] [ctranslate2] [thread 13552] [info]  - Selected ISA: AVX2
[2023-07-06 06:48:38.064] [ctranslate2] [thread 13552] [info]  - Use Intel MKL: false
[2023-07-06 06:48:38.064] [ctranslate2] [thread 13552] [info]  - SGEMM backend: DNNL (packed: false)
[2023-07-06 06:48:38.064] [ctranslate2] [thread 13552] [info]  - GEMM_S16 backend: none (packed: false)
[2023-07-06 06:48:38.064] [ctranslate2] [thread 13552] [info]  - GEMM_S8 backend: DNNL (packed: false, u8s8 preferred: true)
[2023-07-06 06:48:38.064] [ctranslate2] [thread 13552] [info] GPU #0: NVIDIA GeForce RTX 3060 (CC=8.6)
[2023-07-06 06:48:38.064] [ctranslate2] [thread 13552] [info]  - Allow INT8: true
[2023-07-06 06:48:38.064] [ctranslate2] [thread 13552] [info]  - Allow FP16: true (with Tensor Cores: true)
[2023-07-06 06:48:52.199] [ctranslate2] [thread 13552] [info] Using CUDA allocator: cuda_malloc_async
[2023-07-06 06:48:53.037] [ctranslate2] [thread 13552] [info] Loaded model C:\UTIL2\_models\faster-whisper-large-v2 on device cuda:0
[2023-07-06 06:48:53.037] [ctranslate2] [thread 13552] [info]  - Binary version: 6
[2023-07-06 06:48:53.037] [ctranslate2] [thread 13552] [info]  - Model specification revision: 3
[2023-07-06 06:48:53.037] [ctranslate2] [thread 13552] [info]  - Selected compute type: int8

Model loaded in: 15.06 seconds
Estimating duration from bitrate, this may be inaccurate

Processing audio with duration 03:16.650

VAD filter removed 00:41.630 of audio
VAD filter kept the following audio segments: [00:00.000 -> 01:36.324], [02:03.836 -> 03:02.532]

Audio processing finished in: 2.18 seconds

Processing segment at 00:00.000
[2023-07-06 06:48:56.731] [ctranslate2] [thread 95492] [info] Loaded cuBLAS library version 11.11.3
[00:04.180 --> 00:20.460]  Oh, the water is wide and I can't get o'er Neither have I the wings to fly
Processing segment at 00:20.460
[00:21.420 --> 00:35.640]  Give me a boat that we'll carry to And we both shall row, my love and I
[00:36.080 --> 00:49.460]  There is a ship and she sails the sea She's loaded deep, as deep can be
Processing segment at 00:50.460
TCC: Unknown command "20"
TCC: Unknown command "00:02"
TCC: (Sys) The system cannot find the file specified.
 ""
[00:50.460 --> 01:04.520]  But not as deep as the love I'm in I know not how to sink or swim
[01:04.930 --> 01:18.080]  Oh, the water is wide and I can't get o'er Neither have I the wings to fly
Processing segment at 01:18.080
TCC: Unknown command "49"
TCC: Unknown command "00:03"
TCC: (Sys) The system cannot find the file specified.
 ""
[01:19.080 --> 01:34.060]  Give me a boat that we'll carry to And we both shall row, my love and I
Processing segment at 01:34.060
TCC: Unknown command "78"
TCC: Unknown command "00:04"
TCC: (Sys) The system cannot find the file specified.
 ""
[01:35.060 --> 02:16.570]  Love is handsome and love is fine Love is a jewel when first it's new
Processing segment at 01:49.060
TCC: Unknown command "94"
TCC: Unknown command "00:04"
TCC: (Sys) The system cannot find the file specified.
 ""
[02:17.270 --> 02:30.730]  But love grows old and in time grows cold And fades away like the summer dew
Processing segment at 02:03.220
TCC: Unknown command "137"
TCC: Unknown command "00:05"
TCC: (Sys) The system cannot find the file specified.
 ""
[02:31.730 --> 02:45.450]  Oh, the water is wide and I can't get o'er Neither have I the wings to fly
Processing segment at 02:17.940
TCC: Unknown command "151"
TCC: Unknown command "00:05"
TCC: (Sys) The system cannot find the file specified.
 ""
[02:45.450 --> 03:00.650]  Give me a boat that we'll carry to And we both shall row, my love and I


Transcription speed: 27.12 audio seconds/s

Operation finished in: 24 seconds

TCC: Unknown command "165"
TCC: Unknown command "00:06"
TCC: (Sys) The system cannot find the file specified.
 ""
TCC: Unknown command "165"
TCC: Unknown command "00:07"
TCC: (Sys) The system cannot find the file specified.
 ""

Visually, here's what it looks like under CMD.EXE -- it works just fine:

[screenshot]

Yet under TCC.EXE, I get this:
[screenshot]

It's sending the timestamps straight to the command line?!?!

Is this something I could possibly be helped with?

Faster-Whisper crash - ucrtbase.dll

I tried to transcribe audio to text from a video. With the Faster-Whisper setting, it only transcribes 1 minute of the video. I tried the medium and large models with the same result.

The base model works fine in Faster-Whisper setting.

The medium model works fine in CPP.

I tried to reinstall the Microsoft Visual C++ Redistributable with this tool below, but without success. The application only transcribes the first minute of the video.

https://github.com/abbodi1406/vcredist

Thanks for your help.

Skip files if it already has subtitles

I used recursive on a folder.

It has 7 videos.

4 of them already have .srt files, yet it still generates subtitles for them.

I'd rather have it skip them if the filename & subtitle name match.

For example, if I have 3 videos in a folder like:

a.mp4
a.srt
b.mp4
b.srt
c.mp4

only generate subtitles for 1 video like:

c.mp4

It doesn't make sense to do every one.

I think this should be on by default & someone can switch it off with a flag if they want to regenerate subtitles for everything.
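Until something like that is built in, a CMD wrapper can approximate it. A minimal sketch, assuming the .srt is written next to the video with the same base name (as in the example above); the folder path and flags are illustrative:

:: Save as a .bat file; any video that already has a matching .srt is skipped.
cd /d "D:\videos"
for %%F in (*.mp4) do (
    if not exist "%%~nF.srt" whisper-faster.exe "%%F" --language en --model large-v2 --output_format srt
)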

compute type warning float16

[ctranslate2] [thread 6280] [warning] The compute type inferred from the saved model is float16, but the target device or backend do not support efficient float16 computation. The model weights have been automatically converted to use the float32 compute type instead.

Ran large-v2 downloaded by your program. The command was "--language English --model large -f srt". I didn't have this issue with older releases. CUDA is fine, and the cuBLAS and cuDNN libs are in the whisper folder. Please help.
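One thing worth trying, as a sketch only: pin the compute type explicitly instead of letting it be inferred from the saved model. faster-whisper has a compute_type option; that the standalone exposes it as --compute_type is an assumption here, although another report on this page refers to the setting by that name:

:: Hypothetical invocation; float32 is what the warning says it falls back to anyway.
whisper-faster.exe "input.mp4" --language English --model large-v2 --output_format srt --compute_type float32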

endless loop

Hi, I use large-v1 because it has more accuracy for me than large-v2, and I get this loop:

--> 01:50:20,070
line

2626
01:50:20,070 --> 01:50:22,070
line

2627
01:50:22,070 --> 01:50:24,070
line

2628
01:50:24,070 --> 01:50:26,070
line

2629
01:50:26,070 --> 01:50:28,070
line

2630
01:50:28,070 --> 01:50:30,070
اخواني

2631
01:50:30,070 --> 01:50:32,070
اخواني

2632
01:50:32,070 --> 01:50:34,070
shot

2633
01:50:34,070 --> 01:50:36,070
shot

[the "shot" cue repeats unchanged every 2 seconds, from cue 2632 through cue 2673 (01:51:54,070 --> 01:51:56,070)]

Error: no such file or directory

I downloaded Whisper-Faster r126 and replaced the original version, but when I run it, the following error message appears.

[screenshot: QQ截图20230617101548]

I don't know how to solve it; I need your help.

More memory usage for r134.6?

I was using version 126 without any problems, and have now switched to version 136.
On videos that were correctly processed previously, I now have errors. The last message mentions a memory problem.
Does the latest revision use more memory?
Every video file has this same error at some point in the transcript:

Processing segment at 00:27.680
Compression ratio threshold is not met with temperature 0.0 (4.674074 > 2.400000)
Traceback (most recent call last):
  File "D:\whisper-fast\__main__.py", line 615, in <module>
  File "D:\whisper-fast\__main__.py", line 565, in cli
  File "faster_whisper\transcribe.py", line 869, in restore_speech_timestamps
  File "faster_whisper\transcribe.py", line 401, in generate_segments
  File "faster_whisper\transcribe.py", line 603, in generate_with_fallback
RuntimeError: CUDA failed with error out of memory
[21292] Failed to execute script '__main__' due to unhandled exception!

I'm going back to r126 for now.

Error each time I transcribe?

Every time I transcribe I get this error: "2023-05-25 13:45:37.1493312 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:1671 onnxruntime::python::CreateInferencePybindStateModule] Init provider bridge failed.". It still transcribes and everything seems to be working, but I'm curious what this error is.

It is not clear how to run this program

[screenshot]

When I click the file, a popup window opens and closes after 1 second, and the guide on the main page is not clear about how to run this program. It would be nice if you made a short video showing how to run it.

Thanks!

r126 | word_timestamps not working?

Hi,

r126 did wonders with timestamps! But it looks like word_timestamps is not working.
When running with --word_timestamps True I get results as if it wasn't used.

Need text file without timecodes

Hello! Tell me, what's the matter? It does not work for me. I have the software installed, and faster-whisper launches fine separately through Python, but it won't work through yours. Help me please!

smaller executables.

Hi,
is it possible to release a win64 CUDA executable without the additional files (models, CUDA libraries, etc.)? It's a huge file right now, I'm from a third-world country with a bad internet connection, and I have already downloaded the models and installed CUDA.

configurable model dir?

This is handy for running faster-whisper from any folder. However, it will re-download model files now. Can you add a way to specify the faster-whisper model dir?

** I tried [--model_dir MODEL_DIR] but it doesn't prevent downloads.

Thanks.
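For reference, the kind of invocation being discussed would look like the sketch below. The logs elsewhere on this page show models being loaded from folders named faster-whisper-<size> (e.g. C:\UTIL2\_models\faster-whisper-large-v2), so the layout expected under the directory given to --model_dir may be why downloads still happen; treat the path and layout here as assumptions:

:: Hypothetical layout: C:\UTIL2\_models\faster-whisper-medium\ already holds the model files.
whisper-faster.exe "clip.mp4" --model medium --model_dir "C:\UTIL2\_models"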

Suggestion to add standalone executable for WhisperX

I would like to suggest to add a standalone executable for WhisperX (https://github.com/m-bain/whisperX).

Currently, the results of WhisperX v3 are worse than those of v2. So if you add a standalone executable for WhisperX, I would suggest adding one for WhisperX v2 (if possible), or waiting until the results of WhisperX v3 are no longer worse than those of v2.

Feature --silent and TPU support

Currently, I am using R134.6, and I would like to inquire about any plans to add a new feature called "--silent" or something similar. Instead of displaying the entire text, it would be convenient if the program could simply show the progress as a percentage, for example, "10% completed."

Additionally, I wanted to ask about the potential benefit of supporting TPUs in your project, similar to "Whisper JAX." Do you think it would be worthwhile to incorporate TPU support into your system?

Thank you.

Ability to manage a live stream for next release ?

Hi,

First of all, I would like to congratulate you on the great job you have done.
I used the latest v145.3 with CUDA; it worked fine on several files.
I was wondering if it would be possible to add the ability to handle a live stream in the next version.

Best regards

where can I tip ?

How many hours did you spend on this project? Can I pay you for your time?

Thank you so much for your help, this is an amazing integration for non-tech-savvy users!

Transcription stopped halfway

I downloaded this 27 min Youtube video (uploaded it here).

I ran the transcription using this command:
whisper-faster "C:\Users\ntnha\Videos\4K Video Downloader\Carl Sagan Astronomer of the People.mp4" --language en --model large-v2 --batch_recursive true

and it stopped at [13:15.860 --> 13:18.860] His greatest achievement was just around the corner.

I downloaded the mp3 file from that YouTube video (uploaded it here)
whisper-faster "C:\Users\ntnha\Videos\4K Video Downloader\Carl Sagan Astronomer of the People.mp3" --language en --model large-v2 --batch_recursive true

and it was able to run to [26:44.760 --> 26:46.180] might have been enough.

Interestingly, it didn't transcribe the advertisement at the beginning and at the end of the video.

Could not load library

I tried downloading "cudnn_ops_infer64_8.dll" from the NVIDIA homepage and putting it in the base folder, and I tried creating a lib folder and a library folder, but it gives the same error. I don't know what version I need either, and I can't find any information about this.

Model not found at: C:\Whisper-Faster_r145.1\Whisper-Faster_models\faster-whisper-medium
Attempting to download:

Downloading (…)90c7f7fb/config.json: 100%|████████████████████████████████████████████████| 2.26k/2.26k [00:00<?, ?B/s]
Downloading (…)7f7fb/vocabulary.txt: 100%|██████████████████████████████████████████| 460k/460k [00:00<00:00, 2.92MB/s]
Downloading (…)7f7fb/tokenizer.json: 100%|████████████████████████████████████████| 2.20M/2.20M [00:00<00:00, 3.26MB/s]
Downloading model.bin: 100%|██████████████████████████████████████████████████████| 1.53G/1.53G [02:06<00:00, 12.0MB/s]
Downloading model.bin: 100%|██████████████████████████████████████████████████████| 1.53G/1.53G [02:06<00:00, 11.5MB/s]
Standalone Faster-Whisper r145.1 running on: CUDA

Starting transcription on: D:\TEST.mkv

Could not load library cudnn_ops_infer64_8.dll. Error code 126
Please make sure cudnn_ops_infer64_8.dll is in your library path!

transcription yielded zero words

Hi,

I started getting some strange new behavior: suddenly and consistently I get a 0-byte LRC file when running whisper-faster.exe on this new file...

I uploaded the WAV file here:
https://mega.nz/file/QYlFGISC#ci5c9b_i0J7bWDX_8NKvWP9JYOfZzZPJFdUwbTuqepk

 whisper-faster.exe --verbose True --language en --threads 12 --device cuda --model large-v2 --output_dir "T:\new\MUSIC\Ween\Ween Unreleased Anthology\3. Demos & Tapes\3. 12GCG Demos and Sessions\2. 12GCG Outtakes" --output_format lrc  --vad_filter False     --beam_size 5  "2. So Long Jerry (12 GCG outtake).vocals.wav"

Standalone Faster-Whisper r134+++ running on: CUDA

Number of visible GPU devices: 1

Supported compute types by GPU: {'float32', 'float16', 'int8', 'int8_float16'}

[2023-07-16 10:19:42.400] [ctranslate2] [thread 2472] [info] CPU: AuthenticAMD (SSE4.1=true, AVX=true, AVX2=true, AVX512=false)
[2023-07-16 10:19:42.400] [ctranslate2] [thread 2472] [info]  - Selected ISA: AVX2
[2023-07-16 10:19:42.400] [ctranslate2] [thread 2472] [info]  - Use Intel MKL: false
[2023-07-16 10:19:42.400] [ctranslate2] [thread 2472] [info]  - SGEMM backend: DNNL (packed: false)
[2023-07-16 10:19:42.400] [ctranslate2] [thread 2472] [info]  - GEMM_S16 backend: none (packed: false)
[2023-07-16 10:19:42.400] [ctranslate2] [thread 2472] [info]  - GEMM_S8 backend: DNNL (packed: false, u8s8 preferred: true)
[2023-07-16 10:19:42.400] [ctranslate2] [thread 2472] [info] GPU #0: NVIDIA GeForce RTX 3060 (CC=8.6)
[2023-07-16 10:19:42.400] [ctranslate2] [thread 2472] [info]  - Allow INT8: true
[2023-07-16 10:19:42.400] [ctranslate2] [thread 2472] [info]  - Allow FP16: true (with Tensor Cores: true)
[2023-07-16 10:19:45.592] [ctranslate2] [thread 2472] [info] Using CUDA allocator: cuda_malloc_async
[2023-07-16 10:19:46.876] [ctranslate2] [thread 2472] [info] Loaded model C:\UTIL2\_models\faster-whisper-large-v2 on device cuda:0
[2023-07-16 10:19:46.876] [ctranslate2] [thread 2472] [info]  - Binary version: 6
[2023-07-16 10:19:46.876] [ctranslate2] [thread 2472] [info]  - Model specification revision: 3
[2023-07-16 10:19:46.876] [ctranslate2] [thread 2472] [info]  - Selected compute type: float16

Model loaded in: 4.59 seconds

Processing audio with duration 05:23.004

Audio processing finished in: 1.15 seconds

Processing segment at 00:00.000
[2023-07-16 10:19:52.326] [ctranslate2] [thread 106552] [info] Loaded cuBLAS library version 11.11.3
Log probability threshold is not met with temperature 0.0 (-1.203125 < -1.000000)
No speech threshold is met (0.957031 > 0.600000)
Processing segment at 00:30.000
Log probability threshold is not met with temperature 0.0 (-1.203125 < -1.000000)
No speech threshold is met (0.957031 > 0.600000)
Processing segment at 01:00.000
Log probability threshold is not met with temperature 0.0 (-1.203125 < -1.000000)
No speech threshold is met (0.957031 > 0.600000)
Processing segment at 01:30.000
Log probability threshold is not met with temperature 0.0 (-1.203125 < -1.000000)
No speech threshold is met (0.957031 > 0.600000)
Processing segment at 02:00.000
Log probability threshold is not met with temperature 0.0 (-1.203125 < -1.000000)
No speech threshold is met (0.957031 > 0.600000)
Processing segment at 02:30.000
Log probability threshold is not met with temperature 0.0 (-1.203125 < -1.000000)
No speech threshold is met (0.957031 > 0.600000)
Processing segment at 03:00.000
Log probability threshold is not met with temperature 0.0 (-1.203125 < -1.000000)
No speech threshold is met (0.957031 > 0.600000)
Processing segment at 03:30.000
Log probability threshold is not met with temperature 0.0 (-1.203125 < -1.000000)
No speech threshold is met (0.957031 > 0.600000)
Processing segment at 04:00.000
Log probability threshold is not met with temperature 0.0 (-1.203125 < -1.000000)
No speech threshold is met (0.957031 > 0.600000)
Processing segment at 04:30.000
Log probability threshold is not met with temperature 0.0 (-1.203125 < -1.000000)
No speech threshold is met (0.957031 > 0.600000)
Processing segment at 05:00.000
Log probability threshold is not met with temperature 0.0 (-1.203125 < -1.000000)
No speech threshold is met (0.957031 > 0.600000)


Transcription speed: 49.93 audio seconds/s

Operation finished in: 12 seconds

Consistently gives me a zero-byte LRC file

[screenshot]

As an experiment I tried changing the output format from lrc to srt and the beam size from 5 to 1. The results were... different:

whisper-faster.exe --verbose True --language en --threads 12 --device cuda --model large-v2 --output_dir "T:\new\MUSIC\Ween\Ween Unreleased Anthology\3. Demos & Tapes\3. 12GCG Demos and Sessions\2. 12GCG Outtakes" --output_format srt  --vad_filter False     --beam_size 1  "2. So Long Jerry (12 GCG outtake).vocals.wav"

Standalone Faster-Whisper r134+++ running on: CUDA

Number of visible GPU devices: 1

Supported compute types by GPU: {'int8_float16', 'float16', 'int8', 'float32'}

[2023-07-16 10:23:37.171] [ctranslate2] [thread 161244] [info] CPU: AuthenticAMD (SSE4.1=true, AVX=true, AVX2=true, AVX512=false)
[2023-07-16 10:23:37.171] [ctranslate2] [thread 161244] [info]  - Selected ISA: AVX2
[2023-07-16 10:23:37.171] [ctranslate2] [thread 161244] [info]  - Use Intel MKL: false
[2023-07-16 10:23:37.171] [ctranslate2] [thread 161244] [info]  - SGEMM backend: DNNL (packed: false)
[2023-07-16 10:23:37.171] [ctranslate2] [thread 161244] [info]  - GEMM_S16 backend: none (packed: false)
[2023-07-16 10:23:37.171] [ctranslate2] [thread 161244] [info]  - GEMM_S8 backend: DNNL (packed: false, u8s8 preferred: true)
[2023-07-16 10:23:37.171] [ctranslate2] [thread 161244] [info] GPU #0: NVIDIA GeForce RTX 3060 (CC=8.6)
[2023-07-16 10:23:37.171] [ctranslate2] [thread 161244] [info]  - Allow INT8: true
[2023-07-16 10:23:37.171] [ctranslate2] [thread 161244] [info]  - Allow FP16: true (with Tensor Cores: true)
[2023-07-16 10:23:40.137] [ctranslate2] [thread 161244] [info] Using CUDA allocator: cuda_malloc_async
[2023-07-16 10:23:41.410] [ctranslate2] [thread 161244] [info] Loaded model C:\UTIL2\_models\faster-whisper-large-v2 on device cuda:0
[2023-07-16 10:23:41.410] [ctranslate2] [thread 161244] [info]  - Binary version: 6
[2023-07-16 10:23:41.410] [ctranslate2] [thread 161244] [info]  - Model specification revision: 3
[2023-07-16 10:23:41.410] [ctranslate2] [thread 161244] [info]  - Selected compute type: float16

Model loaded in: 4.35 seconds

Processing audio with duration 05:23.004

Audio processing finished in: 1.15 seconds

Processing segment at 00:00.000
[2023-07-16 10:23:46.733] [ctranslate2] [thread 140124] [info] Loaded cuBLAS library version 11.11.3
[00:00.000 --> 00:05.000]  .
Processing segment at 00:29.980
[00:29.980 --> 00:34.980]  .
Processing segment at 00:59.960
[00:59.960 --> 01:02.960]  .
[01:04.960 --> 01:05.460]  .
[01:15.340 --> 01:20.340]  .
[01:20.340 --> 01:22.040]  .
[01:22.040 --> 01:22.620]  .
Processing segment at 01:22.620
[01:22.620 --> 01:23.800]  .
[01:34.900 --> 01:37.620]  .
[01:39.860 --> 01:42.340]  .
Processing segment at 01:42.340
[01:42.340 --> 01:43.160]  .
[01:43.160 --> 01:43.180]  .
[01:59.380 --> 02:00.340]  .
[02:00.340 --> 02:00.480]  .
Processing segment at 02:00.480
[02:04.460 --> 02:04.820]  .
[02:04.820 --> 02:04.840]  .
[02:12.920 --> 02:13.280]  .
[02:13.280 --> 02:13.460]  .
[02:13.460 --> 02:13.480]  .
Processing segment at 02:13.480
[02:13.480 --> 02:14.900]  .
[02:18.480 --> 02:21.440]  .
[02:23.480 --> 02:28.480]  .
Processing segment at 02:43.420
[02:43.420 --> 02:48.420]  .
Processing segment at 03:13.400
[03:13.400 --> 03:18.400]  .
[03:23.400 --> 03:28.400]  .
[03:36.520 --> 03:39.760]  .
[03:39.760 --> 03:40.100]  .
Processing segment at 03:40.100
[03:40.100 --> 03:40.360]  .
[03:52.100 --> 03:52.620]  .
[03:52.620 --> 03:52.700]  .
Processing segment at 03:52.700
[03:56.340 --> 03:56.380]  .
[03:56.380 --> 03:56.400]  .
[04:11.740 --> 04:11.780]  .
[04:11.780 --> 04:11.800]  .
[04:11.800 --> 04:11.820]  .
Processing segment at 04:11.820
[04:11.820 --> 04:14.000]  .
[04:21.820 --> 04:21.880]  .
[04:21.880 --> 04:23.600]  .
[04:23.600 --> 04:23.620]  .
Processing segment at 04:23.620
[04:23.620 --> 04:27.380]  .
[04:33.620 --> 04:38.580]  .
[04:38.580 --> 04:38.600]  .
Processing segment at 04:38.600
[04:44.640 --> 04:44.800]  .
[04:44.800 --> 04:44.820]  .
[04:53.240 --> 04:53.400]  .
[04:53.400 --> 04:53.420]  .
[04:53.420 --> 04:53.500]  .
Processing segment at 04:53.500
[04:53.500 --> 04:54.080]  .
[04:54.080 --> 04:55.100]  .
Processing segment at 04:55.100
[04:55.100 --> 04:56.600]  .
[04:56.600 --> 04:56.620]  .
[05:12.940 --> 05:18.620]  .
[05:18.620 --> 05:22.800]  .
Processing segment at 05:22.800
[05:22.800 --> 05:22.840]  .


Transcription speed: 23.63 audio seconds/s

Operation finished in: 19 seconds

P.S. If I filed this in the wrong project, please let me know so I can copy-paste it into the correct one.

Guide for options when used in subtitle edit?

Most of the descriptions of the advanced options aren't very obvious about what they are and what they affect.

Are you able to make a basic overview of what certain settings might do?

Also, are there any basic tips for the options you might most commonly tweak when making an SRT? For example, how/when is an initial prompt a good idea to use? If you plan on making a transcription from XYZ to English, should the initial prompt be in English? Is this more of a prompt like ChatGPT, or more of an example text?

Also, tips for example workflows would help. Like: if transcribing an xyz video, you may want to consider abc settings.
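On the initial-prompt question specifically: in Whisper it is conditioning text, not a ChatGPT-style instruction, so it works best as a short piece of example text in the vocabulary, casing, and punctuation style you want, and for a translate-to-English run it is usually written in English, since it is prepended to the English output being generated. A sketch, assuming the standalone passes faster-whisper's initial_prompt through as --initial_prompt, and with a made-up prompt string:

:: The prompt biases spelling and style of the first window; it is not an instruction to obey.
whisper-faster.exe "lecture.mp4" --language en --model large-v2 --output_format srt --initial_prompt "Welcome to today's lecture on glacier hydrology, covering terms like moulin and firn."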

can you share the script to build the executable app

Actually, I also tried to build an exe file using faster-whisper, but after I used PyInstaller to build it, the folder size was huge; torch alone takes 4 GB of space.

But your package seems very small, even with torch added it's only 1 GB, so I wonder how you did it.

It would be appreciated if you could share the build script, like the .spec file used by PyInstaller or similar.

If you don't want to post it publicly, you can send it to my email: [email protected]. Thanks.
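A guess at the size difference, with a minimal sketch: faster-whisper runs on CTranslate2 rather than torch, so a PyInstaller build that leaves torch out of the bundle stays small. The entry-point filename below is taken from the tracebacks on this page; the actual .spec used for these releases is not shown here.

:: Minimal sketch only; a real build also needs the cuBLAS/cuDNN DLLs and any hidden imports.
pyinstaller --onedir --exclude-module torch __main__.py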

%1 is not a valid win32 application

I hope you're doing well. I wanted to share an issue I'm facing and seek your advice. I've been using Whisper-Faster for testing on various recordings, and it has been working fine until today when I encountered an error after a Windows update. The error message says, "%1 is not a valid Win32 application." I tried searching for a solution on Google, but the results returned various variations of the problem, and I'm unsure where to begin troubleshooting.

All the required DLLs are included in the same folder as the application.

If you have any suggestions or insights on how to resolve this issue, I would greatly appreciate your help.
Thank you in advance!


Punctuation Missing on Some Files

Just ran WF on roughly 90 voice-recording wav files using large-v2 and a beam size of 5. The accuracy of the transcription was great given the noisy environment of the recording and a faulty Zoom recorder. Some very unusual words such as "hyderized" or "Takkakaw" were also transcribed correctly, to my great surprise.

However, I'd say about 20 to 25% were missing punctuation altogether: no commas, periods, or capitalization of the first word in a sentence. In such instances, I reran using the medium model and they worked OK.

Any ideas to avoid such errors in the future? Thanks.

This really is a great tool.

[FR] Add web video support (Youtube)

I would like to suggest adding support for web video transcription, specifically for platforms like YouTube.

By incorporating web video transcription capabilities, the tool would become even more versatile and valuable, catering to content creators and researchers. This enhancement would save time and effort while attracting a wider user base.

Thank you for considering my suggestion to further enhance your amazing tool's functionality.
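Until/unless URL input is supported, a two-step workaround is easy to script: fetch the audio with yt-dlp (a separate tool), then run the standalone exe on the resulting file. The URL and filenames below are placeholders:

:: yt-dlp extracts the audio track to mp3; the output template keeps the filename predictable.
:: (In a .bat file, write %%(ext)s instead of %(ext)s.)
yt-dlp -x --audio-format mp3 -o "video.%(ext)s" "https://www.youtube.com/watch?v=VIDEO_ID"
whisper-faster.exe "video.mp3" --language en --model large-v2 --output_format srt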

GPU execution impossible in Windows 7?

  1. As CUDA Toolkit requires at least Windows 10, does this make GPU processing impossible in Windows 7?
  2. CUDA Toolkit contains cuBLAS. What about cuDNN? Otherwise you need membership in the NVIDIA Developer Program to get cuDNN.

save transcriptions half-way

I have a huge 2-hour video that takes around 5-6 hours or more.

Sometimes only 1 hour of the video is done midway through, so I'd like it to save what it has so far.

On the next day, I could restart it from that halfway point.

I don't know if this is the right repo for that, but I would like to request this feature.

You'd probably have to change the --skip logic for this.

It's probably a very rare scenario, but I now have large videos, all around 2 hours, that don't finish before I close my computer.

Right now, a video only gets done if I start the process in the morning, and even then only if I'm lucky.

Don't stop batch processing when some file fails

hi,

I'm using batch processing to transcribe multiple files. However, I've run into an issue with some mp3 files related to a setting called "compute_type." To fix this, I need to change "compute_type" to either float16, int8, or float32, depending on the file.

My problem is that when there's an error with one mp3 file, the whole batch processing stops. Is there a way to make it skip the problematic file and continue with the next one instead of stopping the entire application?

thanks.
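As a workaround until the tool handles this itself, driving the batch from CMD with one process per file means a failure on one mp3 only kills that run, and the compute_type can be varied per file if needed. A sketch with an illustrative path and the flags used elsewhere on this page:

:: Save as a .bat file; each file gets its own whisper-faster.exe process,
:: so one crash doesn't stop the rest of the queue.
cd /d "D:\music"
for %%F in (*.mp3) do (
    whisper-faster.exe "%%F" --language en --model large-v2 --output_format srt
)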

Add version/commit info to releases

Could you please add the info for what versions or commits of whisper/whisper-faster your standalone builds are based on?

It’s helpful if issues are encountered to be able to check if they’re known/addressed upstream.

CUDA failed with error out of memory

Discussed in https://github.com/Purfview/whisper-standalone-win/discussions/49

Originally posted by salsavalencia2000 August 1, 2023
I'm notifying you of the following problem.

Could you add an option to make it work with the CPU, instead of detecting the device automatically?

D:\Programacion\DELPHI\JJ 2023\Win32\Debug>w.exe "D:\Programacion\DELPHI\JJ 2023\Win32\Debug\1.ts" --model=large-v2

Standalone Faster-Whisper r141.3 running on: CUDA

Traceback (most recent call last):
File "D:\whisper-fast_main_.py", line 638, in
File "D:\whisper-fast_main_.py", line 529, in cli
File "faster_whisper\transcribe.py", line 125, in init
RuntimeError: CUDA failed with error out of memory
[16468] Failed to execute script 'main' due to unhandled exception!
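The requested workaround, as a sketch: force CPU execution instead of letting the tool pick CUDA. --device cuda appears in other commands on this page; that --device cpu is accepted the same way is an assumption here.

:: Slower than CUDA, but avoids the out-of-memory failure above.
whisper-faster.exe "D:\Programacion\DELPHI\JJ 2023\Win32\Debug\1.ts" --model=large-v2 --device cpu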

Keep the release log?

There are small changes here and there with every release, but I believe once a new release comes out, it replaces the previous release (instead of the new release just being added on top of the page)?

It'd be nice to have these release logs kept so that people can search for info/arguments when they need them. Or is there somewhere else that has this info? Thanks a lot!

Please add support for .AMR audio files

Hello, thank you for developing and releasing this wonderful program. I use it daily and prefer it over all the other programs I have tried.

I am requesting that you add support for .AMR audio files. Other Whisper speech-to-text programs support .AMR files by default. I wish I could use your program for all speech-to-text functions, and that I didn't have to switch to using one of the other (inferior) programs when processing .AMR files.

.AMR stands for Adaptive Multi-Rate Audio Codec. It is an audio file format that was specifically designed for speech audio coding and is mostly used for voice recordings and transmissions on mobile networks. It is optimized for low-bitrate voice recordings, making it useful for things like mobile phone conversations, voicemail systems, and audio messaging.

Someone might suggest, "Just convert it to another format or use a different program to encode those files", which is an option, but it would be really nice if this wasn't necessary due to the extra complexity involved. Your program is faster than those other programs, and I believe yours does a better job in processing speech to text than other solutions. I'd love to do everything from within the Whisper Standalone app.

Thank you!
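For completeness, the conversion workaround mentioned above is a one-liner with ffmpeg (a separate download, not part of the standalone exe); filenames are illustrative:

:: Decode the .amr to .wav, then transcribe the .wav as usual.
ffmpeg -i "recording.amr" "recording.wav"
whisper-faster.exe "recording.wav" --model large-v2 --output_format srt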

Add postfix to subtitle's filename

Saved file name format, for example:

Input: C:/Users/myname/Desktop/audio.mp3
Options: --language=Chinese
The output now is: audio.srt
I want the output to be: audio_Chinese.srt

Options: --language=en
I want the output to be: audio_en.srt

Options: --language=Japanese
I want the output to be: audio_Japanese.srt

The recommended saved file name is: filename_language.xxx
This facilitates output management in different languages, rather than directly overwriting files with the same name.
This project is very good; I have been using it and hope to see updates in the next version.
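Until a postfix option exists, the suffix can be added right after each run; a sketch assuming the .srt is written next to the input file:

whisper-faster.exe "C:\Users\myname\Desktop\audio.mp3" --language=Chinese --output_format srt
ren "C:\Users\myname\Desktop\audio.srt" "audio_Chinese.srt"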


