
Comments (13)

kelvincht avatar kelvincht commented on August 15, 2024 2

OK, I did a hacky fix: a threading.Lock acquire/release around the model.transcribe calls at lines 5xx and 7xx in Record.py.

It works... you may want to implement it more elegantly


from speech-translate.

Dadangdut33 avatar Dadangdut33 commented on August 15, 2024 2

@Dadangdut33 Unfortunately the issue is still there. RuntimeError: The size of tensor a (12) must match the size of tensor b (5) at non-singleton dimension 3

I have also set auto channels and auto sample rate in the settings. Nothing changed.

I am using the latest released version.

It's fixed in 1.3.0, which is not released yet. I will try to release it tomorrow or the day after.


Dadangdut33 avatar Dadangdut33 commented on August 15, 2024 2

Fixed in 1.3.0 release


Dadangdut33 avatar Dadangdut33 commented on August 15, 2024

Try setting auto channels and auto sample rate in the settings.


sugarbobo-ch avatar sugarbobo-ch commented on August 15, 2024

Same issue here, and I have enabled the auto settings.

2023-04-21 09:13:17,328 - INFO - Console window hidden. If it is not hidden (only minimized), try changing your default windows terminal to windows cmd. (Main.py:51) [MainThread]
2023-04-21 09:13:17,328 - INFO - Booting up | Version: 1.2.3 (Main.py:1059) [MainThread]
2023-04-21 09:13:17,393 - DEBUG - Available Theme to use: ['vista', 'sv-light', 'sv-dark'] (Main.py:159) [MainThread]
2023-04-21 09:13:17,393 - DEBUG - Setting theme: sv-dark (Style.py:28) [MainThread]
2023-04-21 09:13:17,406 - DEBUG - Setting custom dark theme style (Style.py:49) [MainThread]
2023-04-21 09:13:17,724 - INFO - Checking for update on start (About.py:100) [MainThread]
2023-04-21 09:13:17,908 - INFO - Checking for update... (About.py:125) [MainThread]
2023-04-21 09:13:18,318 - INFO - No update available (About.py:145) [Thread-5 (req_update_check)]
2023-04-21 09:13:20,523 - INFO - Checking model name (Helper_Whisper.py:19) [MainThread]
2023-04-21 09:13:20,524 - DEBUG - modelKey: Large (v1) (1x speed), src_english: False (Helper_Whisper.py:20) [MainThread]
2023-04-21 09:13:20,524 - DEBUG - modelName: large-v1 (Helper_Whisper.py:25) [MainThread]
2023-04-21 09:13:22,415 - INFO - Checking model name (Helper_Whisper.py:19) [Thread-6 (rec_realTime)]
2023-04-21 09:13:22,415 - DEBUG - modelKey: Large (v1) (1x speed), src_english: False (Helper_Whisper.py:20) [Thread-6 (rec_realTime)]
2023-04-21 09:13:22,415 - DEBUG - modelName: large-v1 (Helper_Whisper.py:25) [Thread-6 (rec_realTime)]
2023-04-21 09:13:32,179 - INFO - -------------------------------------------------- (Record.py:344) [Thread-6 (rec_realTime)]
2023-04-21 09:13:32,179 - INFO - Task: transcribe (Record.py:345) [Thread-6 (rec_realTime)]
2023-04-21 09:13:32,179 - INFO - Modelname: large-v1 (Record.py:346) [Thread-6 (rec_realTime)]
2023-04-21 09:13:32,180 - INFO - Engine: Whisper (Record.py:347) [Thread-6 (rec_realTime)]
2023-04-21 09:13:32,180 - INFO - Auto mode: True (Record.py:348) [Thread-6 (rec_realTime)]
2023-04-21 09:13:32,180 - INFO - Source Lang: auto detect (Record.py:349) [Thread-6 (rec_realTime)]
2023-04-21 09:13:32,180 - INFO - Target Lang: english (Record.py:351) [Thread-6 (rec_realTime)]
2023-04-21 09:13:32,201 - DEBUG - Device: (1) Microphone (Superlux E205U) (Record.py:393) [Thread-6 (rec_realTime)]
2023-04-21 09:13:32,201 - DEBUG - {'index': 1, 'structVersion': 2, 'name': 'Microphone (Superlux E205U)', 'hostApi': 0, 'maxInputChannels': 2, 'maxOutputChannels': 0, 'defaultLowInputLatency': 0.09, 'defaultLowOutputLatency': 0.09, 'defaultHighInputLatency': 0.18, 'defaultHighOutputLatency': 0.18, 'defaultSampleRate': 44100.0, 'isLoopbackDevice': False} (Record.py:394) [Thread-6 (rec_realTime)]
2023-04-21 09:13:32,211 - DEBUG - Record Session Started (Record.py:401) [Thread-6 (rec_realTime)]
2023-04-21 09:13:44,484 - ERROR - Error in record session (Record.py:719) [Thread-6 (rec_realTime)]
2023-04-21 09:13:44,485 - ERROR - The size of tensor a (13) must match the size of tensor b (5) at non-singleton dimension 3 (Record.py:720) [Thread-6 (rec_realTime)]
Traceback (most recent call last):
  File "speech_translate\utils\Record.py", line 588, in rec_realTime
  File "whisper\transcribe.py", line 229, in transcribe
  File "whisper\transcribe.py", line 164, in decode_with_fallback
  File "torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "whisper\decoding.py", line 811, in decode
  File "torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "whisper\decoding.py", line 724, in run
  File "whisper\decoding.py", line 673, in _main_loop
  File "whisper\decoding.py", line 157, in logits
  File "torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "whisper\model.py", line 211, in forward
  File "torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "whisper\model.py", line 136, in forward
  File "torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "whisper\model.py", line 90, in forward
  File "whisper\model.py", line 104, in qkv_attention
RuntimeError: The size of tensor a (13) must match the size of tensor b (5) at non-singleton dimension 3
2023-04-21 09:13:48,013 - INFO - Recording Mic Stopped (Main.py:896) [Thread-6 (rec_realTime)]


Dadangdut33 avatar Dadangdut33 commented on August 15, 2024

I honestly couldn't figure out what is wrong here. It might be related to the device, because I tried my mic and headphones and both work fine. Have you tried another mic or device?


sugarbobo-ch avatar sugarbobo-ch commented on August 15, 2024

I have tested on other devices. I'm also using Windows 11.
When the error occurs and I close the app, the process does not seem to be killed and keeps holding GPU memory.


kelvincht avatar kelvincht commented on August 15, 2024

Got the same issue.

In my case, if Whisper translation is used, regardless of whether the transcript is kept or not, the same error appears within the first few seconds:
The size of tensor a (x) must match the size of tensor b (y)

If I only transcribe, without translation, there are no errors.

If I transcribe with Whisper and translate with Google Translate, there are no errors.


kelvincht avatar kelvincht commented on August 15, 2024

2023-04-21 09:13:44,485 - ERROR - The size of tensor a (13) must match the size of tensor b (5) at non-singleton dimension 3 (Record.py:720) [Thread-6 (rec_realTime)]
Traceback (most recent call last):
File "speech_translate\utils\Record.py", line 588, in rec_realTime


A suggestion from the Whisper community:

openai/whisper#951

Hi, it appears that you're calling the model from different threads. The model is not equipped for that, mainly because of the kv cache logic using the hooks. I'd suggest keep using the lock, if that's not too much of a slowdown.
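The lock suggested in that reply could look roughly like the sketch below. This is only an illustration, not the actual Record.py code: `FakeModel`, `safe_transcribe`, and `worker` are hypothetical names, and `FakeModel` is a stand-in that just counts overlapping calls so the example runs without Whisper installed.

```python
import threading
import time


class FakeModel:
    """Hypothetical stand-in for a Whisper model (not the real API)."""

    def __init__(self):
        self._busy = False
        self.overlaps = 0  # number of calls that started while another was running

    def transcribe(self, audio, task="transcribe"):
        if self._busy:
            self.overlaps += 1
        self._busy = True
        time.sleep(0.01)  # pretend to do some work
        self._busy = False
        return {"text": f"{task}:{audio}"}


model = FakeModel()
model_lock = threading.Lock()  # one lock shared by every caller of the model


def safe_transcribe(audio, task="transcribe"):
    # Only one thread at a time may enter the model.
    with model_lock:
        return model.transcribe(audio, task=task)


results = []


def worker(audio, task):
    results.append(safe_transcribe(audio, task))


threads = [
    threading.Thread(target=worker, args=(f"chunk{i}", task))
    for i in range(4)
    for task in ("transcribe", "translate")
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Using `with model_lock:` instead of separate acquire/release calls means the lock is released even if the model call raises, which matters here because the original error was an exception inside transcribe.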


kelvincht avatar kelvincht commented on August 15, 2024

I guess the issue is in Record.py: you can't call model.transcribe twice concurrently. It needs some kind of lock between the two calls, or both calls should run in one thread.
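The "one thread" option could be sketched as a single worker thread that owns the model and takes jobs from a queue. Again, this is a minimal sketch under assumptions, not how the app actually does it: `fake_transcribe`, `model_worker`, and `submit` are made-up names, and `fake_transcribe` stands in for the real model call.

```python
import queue
import threading


def fake_transcribe(audio, task):
    """Hypothetical stand-in for the real model call."""
    return f"{task}:{audio}"


jobs = queue.Queue()
_STOP = object()  # sentinel that tells the worker to exit


def model_worker():
    # The only thread that ever touches the model.
    while True:
        item = jobs.get()
        if item is _STOP:
            break
        audio, task, reply = item
        reply.put(fake_transcribe(audio, task))


worker = threading.Thread(target=model_worker, daemon=True)
worker.start()


def submit(audio, task):
    # Any thread can call this; the work itself runs on the worker thread.
    reply = queue.Queue(maxsize=1)
    jobs.put((audio, task, reply))
    return reply.get()


text = submit("chunk0", "transcribe")
translated = submit("chunk0", "translate")

jobs.put(_STOP)
worker.join()
```

Compared with a lock, this keeps all model state on one thread, at the cost of a little more plumbing for passing results back.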


PawelGu avatar PawelGu commented on August 15, 2024

I'm having the same problem and wanted to edit Record.py.
It seems I'm too dumb or just can't find the file.
I'm on Windows 10 and there is neither a utils directory nor the Python file.
Can anybody help me? The online translation integration is nice, but it is much slower and even seems less accurate...

Edit: It occurred to me that I may have to use the module via pip? I guess the precompiled binary has the Python code built in.
Thumbs up if I'm right.


Dadangdut33 avatar Dadangdut33 commented on August 15, 2024

OK, I did a hacky fix: a threading.Lock acquire/release around the model.transcribe calls at lines 5xx and 7xx in Record.py.

It works... you may want to implement it more elegantly


Thanks for the help @kelvincht <3 I have added it to the code.


EllyKher avatar EllyKher commented on August 15, 2024

@Dadangdut33 Unfortunately the issue is still there.
RuntimeError: The size of tensor a (12) must match the size of tensor b (5) at non-singleton dimension 3

I have also set auto channels and auto sample rate in the settings. Nothing changed.

I am using the latest released version.

