Comments (13)
Ok, I did a hacky fix: a threading.Lock acquire/release around the model.transcribe calls at lines 5xx and 7xx in Record.py.
It works, but you may want to implement it more elegantly.
from speech-translate.
@Dadangdut33 Unfortunately the issue is still there.
RuntimeError: The size of tensor a (12) must match the size of tensor b (5) at non-singleton dimension 3
I have also enabled auto channels and auto sample rate in the settings. Nothing changed.
I am using the latest released version.
It's fixed in 1.3.0, which is not released yet. I will try to release it tomorrow or the day after.
Fixed in 1.3.0 release
Try enabling auto channels and auto sample rate in the settings.
Same issue here. And I have enabled the auto settings.
2023-04-21 09:13:17,328 - INFO - Console window hidden. If it is not hidden (only minimized), try changing your default windows terminal to windows cmd. (Main.py:51) [MainThread]
2023-04-21 09:13:17,328 - INFO - Booting up | Version: 1.2.3 (Main.py:1059) [MainThread]
2023-04-21 09:13:17,393 - DEBUG - Available Theme to use: ['vista', 'sv-light', 'sv-dark'] (Main.py:159) [MainThread]
2023-04-21 09:13:17,393 - DEBUG - Setting theme: sv-dark (Style.py:28) [MainThread]
2023-04-21 09:13:17,406 - DEBUG - Setting custom dark theme style (Style.py:49) [MainThread]
2023-04-21 09:13:17,724 - INFO - Checking for update on start (About.py:100) [MainThread]
2023-04-21 09:13:17,908 - INFO - Checking for update... (About.py:125) [MainThread]
2023-04-21 09:13:18,318 - INFO - No update available (About.py:145) [Thread-5 (req_update_check)]
2023-04-21 09:13:20,523 - INFO - Checking model name (Helper_Whisper.py:19) [MainThread]
2023-04-21 09:13:20,524 - DEBUG - modelKey: Large (v1) (1x speed), src_english: False (Helper_Whisper.py:20) [MainThread]
2023-04-21 09:13:20,524 - DEBUG - modelName: large-v1 (Helper_Whisper.py:25) [MainThread]
2023-04-21 09:13:22,415 - INFO - Checking model name (Helper_Whisper.py:19) [Thread-6 (rec_realTime)]
2023-04-21 09:13:22,415 - DEBUG - modelKey: Large (v1) (1x speed), src_english: False (Helper_Whisper.py:20) [Thread-6 (rec_realTime)]
2023-04-21 09:13:22,415 - DEBUG - modelName: large-v1 (Helper_Whisper.py:25) [Thread-6 (rec_realTime)]
2023-04-21 09:13:32,179 - INFO - -------------------------------------------------- (Record.py:344) [Thread-6 (rec_realTime)]
2023-04-21 09:13:32,179 - INFO - Task: transcribe (Record.py:345) [Thread-6 (rec_realTime)]
2023-04-21 09:13:32,179 - INFO - Modelname: large-v1 (Record.py:346) [Thread-6 (rec_realTime)]
2023-04-21 09:13:32,180 - INFO - Engine: Whisper (Record.py:347) [Thread-6 (rec_realTime)]
2023-04-21 09:13:32,180 - INFO - Auto mode: True (Record.py:348) [Thread-6 (rec_realTime)]
2023-04-21 09:13:32,180 - INFO - Source Lang: auto detect (Record.py:349) [Thread-6 (rec_realTime)]
2023-04-21 09:13:32,180 - INFO - Target Lang: english (Record.py:351) [Thread-6 (rec_realTime)]
2023-04-21 09:13:32,201 - DEBUG - Device: (1) Microphone (Superlux E205U) (Record.py:393) [Thread-6 (rec_realTime)]
2023-04-21 09:13:32,201 - DEBUG - {'index': 1, 'structVersion': 2, 'name': 'Microphone (Superlux E205U)', 'hostApi': 0, 'maxInputChannels': 2, 'maxOutputChannels': 0, 'defaultLowInputLatency': 0.09, 'defaultLowOutputLatency': 0.09, 'defaultHighInputLatency': 0.18, 'defaultHighOutputLatency': 0.18, 'defaultSampleRate': 44100.0, 'isLoopbackDevice': False} (Record.py:394) [Thread-6 (rec_realTime)]
2023-04-21 09:13:32,211 - DEBUG - Record Session Started (Record.py:401) [Thread-6 (rec_realTime)]
2023-04-21 09:13:44,484 - ERROR - Error in record session (Record.py:719) [Thread-6 (rec_realTime)]
2023-04-21 09:13:44,485 - ERROR - The size of tensor a (13) must match the size of tensor b (5) at non-singleton dimension 3 (Record.py:720) [Thread-6 (rec_realTime)]
Traceback (most recent call last):
File "speech_translate\utils\Record.py", line 588, in rec_realTime
File "whisper\transcribe.py", line 229, in transcribe
File "whisper\transcribe.py", line 164, in decode_with_fallback
File "torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "whisper\decoding.py", line 811, in decode
File "torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "whisper\decoding.py", line 724, in run
File "whisper\decoding.py", line 673, in _main_loop
File "whisper\decoding.py", line 157, in logits
File "torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "whisper\model.py", line 211, in forward
File "torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "whisper\model.py", line 136, in forward
File "torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "whisper\model.py", line 90, in forward
File "whisper\model.py", line 104, in qkv_attention
RuntimeError: The size of tensor a (13) must match the size of tensor b (5) at non-singleton dimension 3
2023-04-21 09:13:48,013 - INFO - Recording Mic Stopped (Main.py:896) [Thread-6 (rec_realTime)]
I honestly couldn't figure out what is wrong here. It might be related to the device, because I tried my own mic and headphones and it works just fine. Have you tried another mic or device?
I have tested on other devices. I'm also using Windows 11.
When I close the app after hitting the error, the process does not seem to be killed and keeps holding GPU memory.
Got the same issue.
In my case, the same error appears within the first few seconds whenever Whisper translation is used, regardless of whether the transcript is kept.
The size of tensor a (x) must match the size of tensor b (y)
If I only transcribe, without translation, there are no errors.
If I transcribe with Whisper and translate with Google Translate, there are no errors.
2023-04-21 09:13:44,485 - ERROR - The size of tensor a (13) must match the size of tensor b (5) at non-singleton dimension 3 (Record.py:720) [Thread-6 (rec_realTime)]
Traceback (most recent call last):
File "speech_translate\utils\Record.py", line 588, in rec_realTime
A suggestion from the Whisper community:
Hi, it appears that you're calling the model from different threads. The model is not equipped for that, mainly because of the kv cache logic using the hooks. I'd suggest keep using the lock, if that's not too much of a slowdown.
I guess the issue is in Record.py: you can't call model.transcribe twice concurrently. The calls need some kind of lock between them, or they need to run in one thread.
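For anyone wondering what the lock could look like, here is a minimal sketch. It is not the actual code in Record.py. The names locked_transcribe, FakeModel and run_demo are made up for illustration, and FakeModel only stands in for a loaded Whisper model so the example runs without Whisper installed. The only assumption about the real model is that it exposes transcribe(audio, **options).

```python
import threading
import time

# One lock shared by every thread that uses the same model object.
_transcribe_lock = threading.Lock()


def locked_transcribe(model, audio, **options):
    """Call model.transcribe while holding the shared lock (hypothetical helper)."""
    with _transcribe_lock:
        return model.transcribe(audio, **options)


class FakeModel:
    """Stand-in for a loaded Whisper model; it records overlapping calls."""

    def __init__(self):
        self.active = 0
        self.max_active = 0
        self._guard = threading.Lock()

    def transcribe(self, audio, **options):
        with self._guard:
            self.active += 1
            self.max_active = max(self.max_active, self.active)
        time.sleep(0.01)  # pretend to do some decoding work
        with self._guard:
            self.active -= 1
        return {"text": f"chunk {audio}"}


def run_demo(n_threads=4):
    """Start several threads and report how many transcribe calls overlapped."""
    model = FakeModel()
    threads = [
        threading.Thread(target=locked_transcribe, args=(model, i))
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return model.max_active
```

Using "with" instead of separate acquire/release calls means the lock is released even if transcribe raises, so one failed chunk cannot block the other thread forever. The cost is that the two tasks run one after another instead of in parallel.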
I'm having the same problem and wanted to edit Record.py.
Seems I'm too dumb or just can't find the file.
I'm on Windows 10 and there is neither a utils directory nor the Python file.
Can anybody help me? The online translation integration is nice, but it is much slower and even seems less accurate...
Edit: It occurred to me that I may have to install the module via pip? I guess the precompiled binary has the Python code built in.
Thumbs up if I'm right.
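If the package is installed into a regular Python environment rather than run from the precompiled binary, something like the sketch below should print where a module's source file lives. The helper name locate_module_file is made up, and the dotted name speech_translate.utils.Record is only inferred from the path in the traceback above, so treat it as an assumption about the installed layout.

```python
import importlib.util


def locate_module_file(dotted_name):
    """Return the file path of an importable module, or None if it is not found."""
    try:
        spec = importlib.util.find_spec(dotted_name)
    except (ImportError, ValueError):
        # Raised when a parent package is missing or the name is malformed.
        return None
    if spec is None:
        return None
    return spec.origin


# Assumed module name, taken from the traceback path; prints None if not installed.
print(locate_module_file("speech_translate.utils.Record"))
```

Note that find_spec imports the parent packages to resolve a dotted name, so run it in the same environment the app uses.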
Ok, I did a hacky fix: a threading.Lock acquire/release around the model.transcribe calls at lines 5xx and 7xx in Record.py.
It works, but you may want to implement it more elegantly.
Thanks for the help @kelvincht <3 I have added it to the code.
@Dadangdut33 Unfortunately the issue is still there.
RuntimeError: The size of tensor a (12) must match the size of tensor b (5) at non-singleton dimension 3
I have also enabled auto channels and auto sample rate in the settings. Nothing changed.
I am using the latest released version.