
Comments (11)

Purfview commented on May 13, 2024

Post a screenshot of what is written when it finishes.


yuchaoqian commented on May 13, 2024

(screenshot of the console output attached)

There is no log; the audio is 17 minutes long, and the task always stops in the middle of processing when using the medium model.


Purfview commented on May 13, 2024

And are you sure that whisper-faster is not in memory anymore?
Do you run on CUDA? How much VRAM does it have?


yuchaoqian commented on May 13, 2024

Yes, it runs on CUDA on a 3070, which has 8 GB.
During execution, I saw VRAM usage stay below 4 GB in total. Once the task exits, the resource usage drops immediately.

When I tried the large model, it worked just fine, with a success log:


Transcription speed: 9.89 audio seconds/s

Subtitles are written to 'D:\workspace\whisper\fast-whisper\Whisper-Faster' directory.


Operation finished in: 165 seconds


Purfview commented on May 13, 2024

Strange. Can you share that audio?


yuchaoqian commented on May 13, 2024

Nah, I can't share it.


Purfview commented on May 13, 2024

What OS do you run it on?
Maybe there would be more info with --verbose=true.
Try --compute_type=int8.
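
For reference, the standalone binary wraps the faster-whisper Python library, so the --compute_type=int8 suggestion corresponds roughly to the library-level sketch below (assuming the faster-whisper package is installed; the model name and audio path are placeholders, not taken from this thread):

# Minimal sketch of an int8 run with the faster-whisper library.
from faster_whisper import WhisperModel

# int8 needs noticeably less VRAM than float16, which can matter on an
# 8 GB card such as the RTX 3070 discussed above.
model = WhisperModel("medium", device="cuda", compute_type="int8")

segments, info = model.transcribe("audio.mp3", vad_filter=True)
for segment in segments:
    print(f"[{segment.start:7.2f} --> {segment.end:7.2f}] {segment.text}")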


yuchaoqian commented on May 13, 2024

It is on Windows 10 22H2.

[2023-07-13 10:55:43.305] [ctranslate2] [thread 30220] [info] CPU: AuthenticAMD (SSE4.1=true, AVX=true, AVX2=true, AVX512=false)
[2023-07-13 10:55:43.305] [ctranslate2] [thread 30220] [info]  - Selected ISA: AVX2
[2023-07-13 10:55:43.305] [ctranslate2] [thread 30220] [info]  - Use Intel MKL: false
[2023-07-13 10:55:43.305] [ctranslate2] [thread 30220] [info]  - SGEMM backend: DNNL (packed: false)
[2023-07-13 10:55:43.305] [ctranslate2] [thread 30220] [info]  - GEMM_S16 backend: none (packed: false)
[2023-07-13 10:55:43.305] [ctranslate2] [thread 30220] [info]  - GEMM_S8 backend: DNNL (packed: false, u8s8 preferred: true)
[2023-07-13 10:55:43.305] [ctranslate2] [thread 30220] [info] GPU #0: NVIDIA GeForce RTX 3070 (CC=8.6)
[2023-07-13 10:55:43.305] [ctranslate2] [thread 30220] [info]  - Allow INT8: true
[2023-07-13 10:55:43.305] [ctranslate2] [thread 30220] [info]  - Allow FP16: true (with Tensor Cores: true)
[2023-07-13 10:55:44.283] [ctranslate2] [thread 30220] [info] Using CUDA allocator: cuda_malloc_async
[2023-07-13 10:55:44.644] [ctranslate2] [thread 30220] [info] Loaded model D:\workspace\whisper\fast-whisper\Whisper-Faster\_models\faster-whisper-medium on device cuda:0
[2023-07-13 10:55:44.644] [ctranslate2] [thread 30220] [info]  - Binary version: 6
[2023-07-13 10:55:44.644] [ctranslate2] [thread 30220] [info]  - Model specification revision: 3
[2023-07-13 10:55:44.644] [ctranslate2] [thread 30220] [info]  - Selected compute type: float16

Standalone Faster-Whisper r134.6 running on: CUDA

Number of visible GPU devices: 1

Supported compute types by GPU: {'float16', 'int8', 'int8_float16', 'float32'}


Model loaded in: 1.41 seconds

Processing audio with duration 17:38.773

VAD filter removed 00:31.889 of audio
VAD filter kept the following audio segments: [00:00.000 -> 17:06.884]

Audio processing finished in: 5.02 seconds

Processing segment at 00:00.000

# ... the same "Processing segment" lines repeat here, so I do not paste them,
# followed by some compression ratio log lines:
Compression ratio threshold is not met with temperature 0.0 (8.540984 > 2.400000)
Compression ratio threshold is not met with temperature 0.2 (8.540984 > 2.400000)
Compression ratio threshold is not met with temperature 0.4 (8.540984 > 2.400000)
Compression ratio threshold is not met with temperature 0.6 (7.133333 > 2.400000)
Compression ratio threshold is not met with temperature 0.8 (7.444444 > 2.400000)

Processing segment at 16:42.470

And it exits without any error, just playing a short "beep" tone from the speaker.
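
For context on the "Compression ratio threshold is not met" lines above: faster-whisper treats a decoded segment whose text compresses too well (a sign of repetition) as a failed attempt and retries it at progressively higher temperatures, which is what the 0.0 -> 0.8 steps in the log show. A rough library-level sketch of the two parameters involved (the values shown are the library defaults, not options taken from this run):

# Sketch of the decoding-fallback parameters, assuming the faster-whisper
# package; the model name and audio path are placeholders.
from faster_whisper import WhisperModel

model = WhisperModel("medium", device="cuda", compute_type="float16")
segments, info = model.transcribe(
    "audio.mp3",
    vad_filter=True,
    # A segment whose gzip compression ratio exceeds this value is treated
    # as degenerate (repetitive) and is re-decoded.
    compression_ratio_threshold=2.4,
    # Temperatures tried in order until a decode passes the checks.
    temperature=[0.0, 0.2, 0.4, 0.6, 0.8, 1.0],
)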


Purfview commented on May 13, 2024

But it shows that it's working -> "Processing segment at 16:42.470".
If you hear the beep, that indicates it finished successfully, and "Operation finished" should be written.


yuchaoqian commented on May 13, 2024

I think the output log is not correct.
No "Operation finished" line is written when using the medium-size model, even when it finishes successfully.
After the beep, I could find the text from 16:42.470 to the end in the generated srt file, but the last line of the log is "Processing segment at 16:42.470".

The situation is even worse when not using "--verbose":
when it beeps, the logged lines never reach the end, but the srt file is correct. The last logged line is:

[10:21.460 --> 10:22.460]


Purfview commented on May 13, 2024

A created .srt and the "beep" indicate that the task finished successfully.
It's just that, for some reason, the log output to the console doesn't show up for you.

Check if r136.7 corrects this issue.

