savbell / whisper-writer
💬📝 A small dictation app using OpenAI's Whisper speech recognition model.
License: GNU General Public License v3.0
The script starts fine, and says that it is recording when I push the key combo, but it never stops recording, even when I mute my microphone.
I tried setting the initial_prompt option to condition the output from the whisper-1 API, but whatever is set in the config file does not seem to influence the output returned by the API. I even tested explicitly by prompting "insert HELLO in front of every sentence", but that doesn't seem to work either. Is there any additional configuration required, or am I missing something?
I attached a voice recording that should transcribe to "Es wäre schön, wenn das Programm auch Umlaute unterstützen würde."
Instead the output is this: "Es wre schn, wenn das Programm auch Umlaute untersttzen wrde."
https://github.com/mudler/LocalAI
Support could simply be added with:

import openai
openai.api_base = "http://someinternalhost.local/v1"

The URL should be configurable via config.json, e.g. through a new "base_api" option:

"api_options": {
    "model": "whisper-1",
    "language": null,
    "temperature": 0.0,
    "initial_prompt": null,
    "base_api": "http://someinternalhost.local/v1"
},
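A minimal sketch of how this proposal could be wired up, assuming the pre-1.0 openai library (which exposes the module attribute openai.api_base); the helper name and the "base_api" config key are the proposal's, not the project's current code:

```python
DEFAULT_API_BASE = "https://api.openai.com/v1"

def api_base_from_config(config: dict) -> str:
    """Return the API base URL, honoring the proposed "base_api" option."""
    return config.get("api_options", {}).get("base_api") or DEFAULT_API_BASE

# With the pre-1.0 openai library this would be applied as:
#   import openai
#   openai.api_base = api_base_from_config(config)
```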
I absolutely love this project, the hold_to_record functionality is excellent! Thanks for sharing! On to the issue:
Less experienced users like me could probably use more detailed instructions on how to get GPU acceleration on Windows. README.md lists the requirements for GPU acceleration as "cuBLAS for CUDA 11 and cuDNN 8 for CUDA 11". There are some issues here:
Solutions:
I solved it, but I don't know how because I implemented both of these attempted fixes at the same time:
Hi! Thanks for creating this! I'm lucky I found it, as I was looking for a program that does exactly this for a while now.
How does the language code in the options affect the output?
I know Whisper's multilingual models can transcribe input without specifying its language. So if I say one sentence in one language and another sentence in a second language, Whisper can transcribe both in their respective languages.
And this is what I need as well. I speak multiple languages and use them all on my PC. The language code option seems to restrict this use case though. Or does it? I don't really know.
Could you please elaborate on how this works and if/how I can achieve what I want?
Thanks!
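For what it's worth, my understanding (hedged; based on openai-whisper's documented behavior, which the API appears to share) is that a fixed language code forces every transcription into that language, while leaving it unset lets Whisper auto-detect once per recording from the first ~30 seconds. Switching languages between recordings should therefore work, but mixing languages within a single recording will likely follow the first-detected language:

```python
# Sketch of the relevant config.json fragment as a Python dict.
api_options = {
    "model": "whisper-1",
    "language": None,    # None/null: auto-detect the language per recording
    # "language": "en",  # a fixed code forces every recording into English
}
```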
I tried to install whisper-writer on Ubuntu jammy. The requirements could only be satisfied by removing the pinned versions for "av" and "faster-whisper" in requirements.txt. Startup now fails:
Traceback (most recent call last):
  File "/home/fabian/whisper-writer/src/main.py", line 136, in <module>
    keyboard.add_hotkey(config['activation_key'], on_shortcut)
  File "/home/fabian/whisper-writer/venv/lib/python3.10/site-packages/keyboard/__init__.py", line 639, in add_hotkey
    _listener.start_if_necessary()
  File "/home/fabian/whisper-writer/venv/lib/python3.10/site-packages/keyboard/_generic.py", line 35, in start_if_necessary
    self.init()
  File "/home/fabian/whisper-writer/venv/lib/python3.10/site-packages/keyboard/__init__.py", line 196, in init
    _os_keyboard.init()
  File "/home/fabian/whisper-writer/venv/lib/python3.10/site-packages/keyboard/_nixkeyboard.py", line 113, in init
    build_device()
  File "/home/fabian/whisper-writer/venv/lib/python3.10/site-packages/keyboard/_nixkeyboard.py", line 109, in build_device
    ensure_root()
  File "/home/fabian/whisper-writer/venv/lib/python3.10/site-packages/keyboard/_nixcommon.py", line 174, in ensure_root
    raise ImportError('You must be root to use this library on linux.')
ImportError: You must be root to use this library on linux.
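For what it's worth, the error comes from the keyboard package, which reads /dev/input on Linux and therefore requires root. One common workaround (an assumption about the setup, not an official fix) is to run the venv's own interpreter under sudo, so the virtual environment is preserved; a small helper sketching that hint:

```python
import os
import sys

def sudo_hint() -> str:
    """Return a suggested command for rerunning under root on Linux, keeping
    the current (venv) interpreter, or "" when no hint applies. The
    'keyboard' package needs root there to read /dev/input."""
    if sys.platform.startswith("linux") and hasattr(os, "geteuid") and os.geteuid() != 0:
        return "sudo {} run.py".format(sys.executable)
    return ""
```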
When I hit Ctrl+Alt+Space and speak French, I can see in the Python terminal that the accents are correctly transcribed, but in the target window the accents disappear. To test, just say "Touché, coulé", which is transcribed as: "Touch, coul. "
Thanks a lot for this nice software, it helps a lot. I can't believe this is the only open-source Whisper-based dictation tool...
I use the settings
"activation_key": "ctrl+win",
"recording_mode": "hold_to_record",
Unfortunately, this sometimes (too often) causes Windows to perform actions before Whisper Writer starts to auto-type:
"C:\WINDOWS\system32\rundll32.exe" dsquery.dll, OpenQueryWindow {16006700-87AD-11D0-9140-00AA00C16E65}
Of course, it would be ideal if, when Whisper Writer does its thing, no other software (including the OS) even noticed that Ctrl+Win has been pressed, if that's possible. But you could also stop auto-typing as soon as the active window changes.
Starting to type with the start menu open isn't so bad, however, because a text box simply appears that is auto-typed into. Since it's desirable not to lose your recording, you should let this happen by accepting an active-window change when the start menu window becomes active. Then one can at least copy-paste the text from the start menu text box. So, please make an exception for the start menu window (executable SearchApp.exe, window class Windows.UI.Core.CoreWindow, found with the AutoHotkey "Window Spy" tool).
It may also be appropriate to deny auto-typing entirely when the taskbar or the desktop window is active, as that may have contributed to various windows opening after creating a new desktop, and it could still be a problem when you click on the desktop before auto-typing. However, the real problem may also be that a modifier key stays pressed while Whisper Writer auto-types. In that case: could Whisper Writer first wait until there is no interference anymore?
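The last idea (waiting until no modifier key is held before auto-typing) could be sketched like this; the is_pressed check is injected (e.g. keyboard.is_pressed) so the helper stays testable, and the names are illustrative rather than Whisper Writer's actual API:

```python
import time

def wait_for_modifier_release(is_pressed,
                              modifiers=("ctrl", "alt", "shift", "windows"),
                              timeout=5.0):
    """Block until no modifier key is held, so auto-typed characters don't
    combine with a still-held hotkey into OS shortcuts. Returns True once
    all modifiers are released, False if the timeout expires first."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not any(is_pressed(m) for m in modifiers):
            return True
        time.sleep(0.01)  # poll gently instead of busy-waiting
    return False
```

This would run just before the pyautogui.write call; whether polling is acceptable there depends on the app's threading model.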
G:\Git clones\AI\whisper-writer\venv\Lib\site-packages\whisper\timing.py:57: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
@numba.jit
Traceback (most recent call last):
File "G:\Git clones\AI\whisper-writer\src\main.py", line 94, in <module>
config = load_config_with_defaults()
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "G:\Git clones\AI\whisper-writer\src\main.py", line 52, in load_config_with_defaults
user_config = json.load(config_file)
^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\jason\AppData\Local\Programs\Python\Python311\Lib\json\__init__.py", line 293, in load
return loads(fp.read(),
^^^^^^^^^^^^^^^^
File "C:\Users\jason\AppData\Local\Programs\Python\Python311\Lib\json\__init__.py", line 346, in loads
return _default_decoder.decode(s)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\jason\AppData\Local\Programs\Python\Python311\Lib\json\decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\jason\AppData\Local\Programs\Python\Python311\Lib\json\decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 12 column 21 (char 265)
I used it for a 10-minute recording and it kept repeating many sentences and did not detect the rest.
Fortunately it saved the audio as a WAV in temp, so I could use the official Whisper to transcribe it.
I used the large model in both. Since I used press_to_toggle, VAD is not the cause.
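For context, these repetition loops on long audio are often attributed to Whisper conditioning each window on its previous output. A hedged sketch of options commonly suggested for this (the keyword names match openai-whisper's model.transcribe(); whether they fix a given recording is not guaranteed):

```python
def long_form_options(keep_context: bool = False) -> dict:
    """Transcription options often suggested to curb Whisper's repetition
    loops on long recordings."""
    return {
        # Don't feed the previous window's text back in as a prompt;
        # that feedback loop is a frequent cause of repeated sentences.
        "condition_on_previous_text": keep_context,
        # Temperature fallback schedule used when decoding gets stuck.
        "temperature": (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
    }

# Usage (assumes openai-whisper is installed and a model is loaded):
#   result = model.transcribe("recording.wav", **long_form_options())
```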
The Whisper models require a 16 kHz sample rate, but not many audio devices provide that rate. Mine, for example, only supports 44100 and 192000. Leaving the sample rate at 16000 in src/config.json results in an error:
Expression 'paInvalidSampleRate' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2050
Expression 'PaAlsaStreamComponent_InitialConfigure( &self->capture, inParams, self->primeBuffers, hwParamsCapture, &realSr )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2721
Expression 'PaAlsaStream_Configure( stream, inputParameters, outputParameters, sampleRate, framesPerBuffer, &inputLatency, &outputLatency, &hostBufferSizeMode )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2845
Traceback (most recent call last):
File "/home/mark/compile/whisper/whisper-writer/src/transcription.py", line 52, in record_and_transcribe
with sd.InputStream(samplerate=sample_rate, channels=1, dtype='int16', blocksize=sample_rate * frame_duration // 1000,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/mark/compile/whisper/whisper_dictation/whispervenv/lib/python3.11/site-packages/sounddevice.py", line 1421, in __init__
_StreamBase.__init__(self, kind='input', wrap_callback='array',
File "/home/mark/compile/whisper/whisper_dictation/whispervenv/lib/python3.11/site-packages/sounddevice.py", line 898, in __init__
_check(_lib.Pa_OpenStream(self._ptr, iparameters, oparameters,
File "/home/mark/compile/whisper/whisper_dictation/whispervenv/lib/python3.11/site-packages/sounddevice.py", line 2747, in _check
raise PortAudioError(errormsg, err)
sounddevice.PortAudioError: Error opening InputStream: Invalid sample rate [PaErrorCode -9997]
Changing it to 44100, on the other hand, results in:
Expression 'paInvalidSampleRate' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2050
Expression 'PaAlsaStreamComponent_InitialConfigure( &self->capture, inParams, self->primeBuffers, hwParamsCapture, &realSr )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2721
Expression 'PaAlsaStream_Configure( stream, inputParameters, outputParameters, sampleRate, framesPerBuffer, &inputLatency, &outputLatency, &hostBufferSizeMode )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2845
Traceback (most recent call last):
File "/home/mark/compile/whisper/whisper-writer/src/transcription.py", line 52, in record_and_transcribe
with sd.InputStream(samplerate=sample_rate, channels=1, dtype='int16', blocksize=sample_rate * frame_duration // 1000,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/mark/compile/whisper/whisper_dictation/whispervenv/lib/python3.11/site-packages/sounddevice.py", line 1421, in __init__
_StreamBase.__init__(self, kind='input', wrap_callback='array',
File "/home/mark/compile/whisper/whisper_dictation/whispervenv/lib/python3.11/site-packages/sounddevice.py", line 898, in __init__
_check(_lib.Pa_OpenStream(self._ptr, iparameters, oparameters,
File "/home/mark/compile/whisper/whisper_dictation/whispervenv/lib/python3.11/site-packages/sounddevice.py", line 2747, in _check
raise PortAudioError(errormsg, err)
sounddevice.PortAudioError: Error opening InputStream: Invalid sample rate [PaErrorCode -9997]
Here is the output of arecord -Dhw:0 --dump-hw-params:
Warning: Some sources (like microphones) may produce inaudible results
with 8-bit sampling. Use '-f' argument to increase resolution
e.g. '-f S16_LE'.
HW Params of device "hw:0":
--------------------
ACCESS: MMAP_INTERLEAVED RW_INTERLEAVED
FORMAT: S16_LE S32_LE
SUBFORMAT: STD
SAMPLE_BITS: [16 32]
FRAME_BITS: [32 64]
CHANNELS: 2
RATE: [44100 192000]
PERIOD_TIME: (83 11888617)
PERIOD_SIZE: [16 524288]
PERIOD_BYTES: [128 2097152]
PERIODS: [2 32]
BUFFER_TIME: (166 23777234)
BUFFER_SIZE: [32 1048576]
BUFFER_BYTES: [128 4194304]
TICK_TIME: ALL
--------------------
arecord: set_params:1371: Sample format non available
Available formats:
- S16_LE
- S32_LE
Does this need some abstraction like sox?
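An external abstraction like sox shouldn't be strictly necessary: one option is to record at a rate the hardware supports (e.g. 44100) and resample to 16 kHz in Python before handing the audio to Whisper. A naive linear-interpolation sketch for illustration only; a real implementation should use a proper resampler such as scipy.signal.resample_poly:

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample a 1-D sequence from src_rate to dst_rate by linear
    interpolation. Sketch only: no anti-aliasing filter is applied,
    so quality is inferior to scipy.signal.resample_poly."""
    if src_rate == dst_rate:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate       # fractional source index
        j = int(pos)
        frac = pos - j
        nxt = samples[j + 1] if j + 1 < len(samples) else samples[j]
        out.append(samples[j] * (1 - frac) + nxt * frac)
    return out
```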
Hello,
First of all, there were conflicts when installing this script: "numba" is not compatible with the latest version of Python (3.12.0), so I had to manually install Python 3.11.6 and use that version explicitly for the virtual environment, but that wasn't the biggest deal.
It worked the first few times when I provided my API key with "use_api": true set, but it was very, very slow at transcribing even one- or two-word phrases. I then tried switching this to false to see if it would be any better, and it just got stuck on "transcribing" forever. I exited the script, switched "use_api" back to true, and tried running it again, but now I'm getting the following error:
(venv) C:\Users\rich\whisper-writer>python run.py
Starting WhisperWriter...
C:\Users\rich\whisper-writer\venv\Lib\site-packages\whisper\timing.py:57: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
  @numba.jit
Script activated. Whisper is set to run using OpenAI's API. To change this, modify the "use_api" value in the src\config.json file.
Press Ctrl+Alt+Space to start recording and transcribing. Press Ctrl+C on the terminal window to quit.
Recording...
Recording finished. Size: 9120
Transcribing audio file...
Transcription: How are you?
Exception in thread Thread-2 (process):
Traceback (most recent call last):
  File "C:\Users\rich\AppData\Local\Programs\Python\Python311\Lib\threading.py", line 1045, in _bootstrap_inner
    self.run()
  File "C:\Users\rich\AppData\Local\Programs\Python\Python311\Lib\threading.py", line 982, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\rich\whisper-writer\venv\Lib\site-packages\keyboard\_generic.py", line 58, in process
    if self.pre_process_event(event):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\rich\whisper-writer\venv\Lib\site-packages\keyboard\__init__.py", line 218, in pre_process_event
    callback(event)
  File "C:\Users\rich\whisper-writer\venv\Lib\site-packages\keyboard\__init__.py", line 649, in <lambda>
    handler = lambda e: (event_type == KEY_DOWN and e.event_type == KEY_UP and e.scan_code in _logically_pressed_keys) or (event_type == e.event_type and callback())
                                                                                                                                                          ^^^^^^^^^^
  File "C:\Users\rich\whisper-writer\src\main.py", line 85, in on_shortcut
    pyautogui.write(transcribed_text, interval=config['writing_key_press_delay'])
  File "C:\Users\rich\whisper-writer\venv\Lib\site-packages\pyautogui\__init__.py", line 593, in wrapper
    failSafeCheck()
  File "C:\Users\rich\whisper-writer\venv\Lib\site-packages\pyautogui\__init__.py", line 1734, in failSafeCheck
    raise FailSafeException(
pyautogui.FailSafeException: PyAutoGUI fail-safe triggered from mouse moving to a corner of the screen. To disable this fail-safe, set pyautogui.FAILSAFE to False. DISABLING FAIL-SAFE IS NOT RECOMMENDED.
Hello, sorry to disturb. I'm not good at Python, and I just wanted to use your whisper-writer together with Whisper to do some tests and talk in real time online in another language.
After installing everything and following the steps, the program starts, but it doesn't do anything except show the recording indicator; no error or anything, it just doesn't write... Maybe you have a solution?
Could not load library cudnn_ops_infer64_8.dll. Error code 126
Please make sure cudnn_ops_infer64_8.dll is in your library path!
Tried on Windows 10 and 11.
I believe you need to initialize local_model to None in the main script to fix this.
I couldn't find information on this anywhere.
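A hedged sketch of that suggestion; the variable and function names are assumptions about main.py's layout, not the project's actual code:

```python
# Initialize before any code path references it, so a shortcut press while
# "use_api" is true doesn't hit an undefined name.
local_model = None

def get_local_model(config, load_model):
    """Lazily load the local model only when the API is disabled; reuse the
    cached instance on subsequent calls. load_model is injected here (e.g.
    whisper.load_model) so the sketch stays testable."""
    global local_model
    if not config.get("use_api", True) and local_model is None:
        local_model = load_model(config.get("model", "base"))
    return local_model
```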
Your UX of being able to Whisper-dictate anywhere is awesome. I'd love to see it combined with this project's engine, which runs Whisper locally on your own GPU (even low- to mid-range ones like a 1050) and does not require users to install Python, which is a great hurdle: https://github.com/Const-me/Whisper
And congrats on the amazing project!
I propose a use_shortcut_to_end config setting. If True, it would ignore silence_duration, and end the recording when the user presses the shortcut key a second time.
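A minimal sketch of what the proposed toggle behavior could look like; the class and callback names are illustrative, not WhisperWriter's actual API:

```python
class ToggleRecorder:
    """Proposed use_shortcut_to_end behavior: the hotkey starts recording,
    and pressing it a second time stops it, ignoring silence_duration."""

    def __init__(self, start, stop):
        self._start, self._stop = start, stop
        self.recording = False

    def on_shortcut(self):
        # Same hotkey both starts and ends a recording session.
        if self.recording:
            self._stop()
        else:
            self._start()
        self.recording = not self.recording
```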
Hello there,
this tool is awesome, and on one of my machines it runs very well, and on the other I still get an error while setting it up.
What I have done so far:
Error description
I receive the following error when I execute the command ((venv) C:\Windows\System32\whisper-writer>pip install -r requirements.txt):
Building wheels for collected packages: webrtcvad
Building wheel for webrtcvad (pyproject.toml) ... error
error: subprocess-exited-with-error
× Building wheel for webrtcvad (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [19 lines of output]
running bdist_wheel
running build
running build_py
creating build
creating build\lib.win-amd64-cpython-311
copying webrtcvad.py -> build\lib.win-amd64-cpython-311
running build_ext
building 'webrtcvad' extension
creating build\temp.win-amd64-cpython-311
creating build\temp.win-amd64-cpython-311\Release
creating build\temp.win-amd64-cpython-311\Release\cbits
creating build\temp.win-amd64-cpython-311\Release\cbits\webrtc
creating build\temp.win-amd64-cpython-311\Release\cbits\webrtc\common_audio
creating build\temp.win-amd64-cpython-311\Release\cbits\webrtc\common_audio\signal_processing
creating build\temp.win-amd64-cpython-311\Release\cbits\webrtc\common_audio\vad
"C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.38.33130\bin\HostX86\x64\cl.exe" /c /nologo /O2 /W3 /GL /DNDEBUG /MD -D_WIN32 -Icbits -IC:\Windows\System32\whisper-writer\venv\include -IC:\Users\lisa\AppData\Local\Programs\Python\Python311\include -IC:\Users\lisa_\AppData\Local\Programs\Python\Python311\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.38.33130\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.38.33130\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\VS\include" /Tccbits\pywebrtcvad.c /Fobuild\temp.win-amd64-cpython-311\Release\cbits\pywebrtcvad.obj
pywebrtcvad.c
C:\Users\lisa_\AppData\Local\Programs\Python\Python311\include\pyconfig.h(59): fatal error C1083: Cannot open include file: 'io.h': No such file or directory
error: command 'C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.38.33130\bin\HostX86\x64\cl.exe' failed with exit code 2
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for webrtcvad
Failed to build webrtcvad
ERROR: Could not build wheels for webrtcvad, which is required to install pyproject.toml-based projects
Before reinstalling the C++ Build Tools, the error (fatal error C1083: Cannot open include file: 'io.h': No such file or directory) was not present, but reinstalling again hasn't fixed it yet. The webrtcvad wheel error shown above has been persistent since my first attempts to install.
Do you have an idea how to fix it?
Thanks for your time and support,
Chris
Hello! First: thank you so much for building this!
Disclaimer: I am new to Python and GitHub, and I would be grateful if someone could help me with my installation :)
I have spent yesterday and today trying to install it on my machine (macOS 12.5.1) and found out that the Python installation from Homebrew didn't have tkinter, so I installed the official Python 3.11, and tkinter seems to work.
Then I found out I need to sudo python run.py (like I said, I'm a beginner).
But now I encounter the error below after changing the "activation_key" setting to "f4" (because Ctrl+Alt+Space didn't do anything for me).
I start by navigating to my directory and activating the virtual environment:
cd ~/Documents/Projects/whisper-writer
source venv/bin/activate
sudo python run.py
then everything starts up without error
when I press F4 I get this error:
Starting WhisperWriter...
/Users/username/Documents/Projects/whisper-writer/venv/lib/python3.11/site-packages/whisper/timing.py:57: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
@numba.jit
Script activated. Whisper is set to run using a local model. To change this, modify the "use_api" value in the src\config.json file.
Press F4 to start recording and transcribing. Press Ctrl+C on the terminal window to quit.
^[OS2023-11-15 10:23:25.745 Python[58787:801092] *** Terminating app due to uncaught exception 'NSInternalInconsistencyException', reason: 'NSWindow drag regions should only be invalidated on the Main Thread!'
*** First throw call stack:
(
0 CoreFoundation 0x00000001a63a91a8 __exceptionPreprocess + 240
1 libobjc.A.dylib 0x00000001a60f3e04 objc_exception_throw + 60
2 CoreFoundation 0x00000001a63d4128 _CFBundleGetValueForInfoKey + 0
3 AppKit 0x00000001a8eb7930 -[NSWindow(NSWindow_Theme) _postWindowNeedsToResetDragMarginsUnlessPostingDisabled] + 372
4 AppKit 0x00000001a8ea292c -[NSWindow _initContent:styleMask:backing:defer:contentView:] + 948
5 AppKit 0x00000001a8ea256c -[NSWindow initWithContentRect:styleMask:backing:defer:] + 56
6 libtk8.6.dylib 0x0000000106b75ddc TkMacOSXMakeRealWindowExist + 572
7 libtk8.6.dylib 0x0000000106b75a8c TkWmMapWindow + 56
8 libtk8.6.dylib 0x0000000106adedc4 Tk_MapWindow + 152
9 libtk8.6.dylib 0x0000000106ae7130 MapFrame + 76
10 libtcl8.6.dylib 0x0000000106a1c9b0 TclServiceIdle + 84
11 libtcl8.6.dylib 0x0000000106a0106c Tcl_DoOneEvent + 296
12 libtk8.6.dylib 0x0000000106b68ca8 TkpInit + 800
13 libtk8.6.dylib 0x0000000106ae007c Initialize + 2292
14 _tkinter.cpython-311-darwin.so 0x00000001040da368 Tcl_AppInit + 92
15 _tkinter.cpython-311-darwin.so 0x00000001040da000 Tkapp_New + 548
16 _tkinter.cpython-311-darwin.so 0x00000001040d9dd8 _tkinter_create_impl + 268
17 _tkinter.cpython-311-darwin.so 0x00000001040d9a10 _tkinter_create + 240
18 Python 0x00000001019fa034 cfunction_vectorcall_FASTCALL + 80
19 Python 0x0000000101abbf84 _PyEval_EvalFrameDefault + 52572
20 Python 0x0000000101ac19ec _PyEval_Vector + 156
21 Python 0x0000000101995098 _PyObject_FastCallDictTstate + 96
22 Python 0x0000000101a22754 slot_tp_init + 180
23 Python 0x0000000101a190d8 type_call + 136
24 Python 0x0000000101994d78 _PyObject_MakeTpCall + 128
25 Python 0x0000000101abc06c _PyEval_EvalFrameDefault + 52804
26 Python 0x0000000101ac19ec _PyEval_Vector + 156
27 Python 0x0000000101999158 method_vectorcall + 364
28 Python 0x0000000101bbeeb0 thread_run + 220
29 Python 0x0000000101b3aff4 pythread_wrapper + 48
30 libsystem_pthread.dylib 0x00000001a625c26c _pthread_start + 148
31 libsystem_pthread.dylib 0x00000001a625708c thread_start + 8
)
libc++abi: terminating with uncaught exception of type NSException
Thank you very much if you can help! I greatly appreciate your time!
Best wishes
Kasper
Is there a way to specify the device for recording?
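Since the project records through sounddevice, whose InputStream accepts a device argument, one approach (a sketch with a hypothetical helper, not the app's current config) is to pick the device index by name:

```python
def find_input_device(devices, name_substring):
    """Return the index of the first input-capable device whose name
    contains name_substring (case-insensitive), or None. 'devices' has
    the shape returned by sounddevice.query_devices(): dicts with at
    least 'name' and 'max_input_channels' keys."""
    needle = name_substring.lower()
    for i, d in enumerate(devices):
        if d.get("max_input_channels", 0) > 0 and needle in d["name"].lower():
            return i
    return None

# Usage (assumes the sounddevice package is installed):
#   import sounddevice as sd
#   idx = find_input_device(sd.query_devices(), "USB")
#   stream = sd.InputStream(device=idx, samplerate=16000, channels=1)
```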
So I installed Python 3.11 (on Windows) just to run this app, and I set up the venv like this:
$ py -3.11 -m venv venv
$ ./venv/Scripts/activate
But I'm not experienced with Python, and it's apparently still running on Python 3.12:
$ pip install -r requirements.txt
Collecting numba==0.57.0 (from -r requirements.txt (line 26))
Using cached numba-0.57.0.tar.gz (2.5 MB)
Installing build dependencies: started
Installing build dependencies: finished with status 'done'
Getting requirements to build wheel: started
Getting requirements to build wheel: finished with status 'error'
error: subprocess-exited-with-error
Getting requirements to build wheel did not run successfully.
exit code: 1
File "<string>", line 48, in _guard_py_ver
RuntimeError: Cannot install on Python version 3.12.1; only versions >=3.8,<3.12 are supported.
$ python --version
Python 3.12.1
Do you know what to do?
Would it be possible to use local api instead of web api?
I have downloaded a fresh copy from this repo, performed the install with no errors, and configured my .env with my OpenAI API key. However, I get the output below stating an incorrect API key, showing a portion of an API key that I have verified is not mine and is not present in my OpenAI account.
I'm not sure why it's using this key or where it's getting it from. I've never used this repo before on this machine.
Starting WhisperWriter...
Script activated. Whisper is set to run using OpenAI's API. To change this, modify the "use_api" value in the src\config.json file.
Press Ctrl+Shift+Space to start recording and transcribing. Press Ctrl+C on the terminal window to quit.
Recording...
Recording finished. Size: 73440
Transcribing audio file...
Traceback (most recent call last):
  File "C:\Users\jerem\OneDrive\Documents\Main Documents\General\GitHub\whisper-writer\src\transcription.py", line 94, in record_and_transcribe
    response = openai.Audio.transcribe(model=api_options['model'],
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jerem\OneDrive\Documents\Main Documents\General\GitHub\whisper-writer\venv\Lib\site-packages\openai\api_resources\audio.py", line 65, in transcribe
    response, _, api_key = requestor.request("post", url, files=files, params=data)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jerem\OneDrive\Documents\Main Documents\General\GitHub\whisper-writer\venv\Lib\site-packages\openai\api_requestor.py", line 230, in request
    resp, got_stream = self._interpret_response(result, stream)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jerem\OneDrive\Documents\Main Documents\General\GitHub\whisper-writer\venv\Lib\site-packages\openai\api_requestor.py", line 624, in _interpret_response
    self._interpret_response_line(
  File "C:\Users\jerem\OneDrive\Documents\Main Documents\General\GitHub\whisper-writer\venv\Lib\site-packages\openai\api_requestor.py", line 687, in _interpret_response_line
    raise self.handle_error_response(
openai.error.AuthenticationError: Incorrect API key provided: sk-NWgh1***************************************jtd2. You can find your API key at https://platform.openai.com/account/api-keys.
As mentioned in another thread, the UI of this tool is great for its purpose, but the latency is significant. Tests done on my RTX4090 laptop (Legion 7i pro from this year):
small.en
1 word: 4.98s (0.20 words/sec)
7-word sentence: 7.43s (0.94 words/sec)
53 words, 2 sentences: 21.3s (2.49 words/sec)
large-v2
7-word sentence: 16.84s (0.42 words/sec)
52 words, 2 sentences: 31.2s (1.67 words/sec)
If we could instead call one of the faster ports (https://github.com/Const-me/Whisper or https://github.com/guillaumekln/faster-whisper), this could be significantly reduced. I tested the first of those by recording the same speech to a file and transcribing it to a text file:
1 word: 0.687s
5 words: 0.844s
53 words: 1.485s (on the second run; it took 4.8s on the first run, presumably to warm something up)
Hello. Currently the language is read from config.json at startup. It would be a great change if transcription.py read the language each time from Windows' currently selected keyboard language, if the Whisper model allows it. This way, multilingual users could switch languages on the fly without having to change config.json and restart the app.
The following code returns the ISO-639-1 language code for the currently selected input method. It works on my PC. (Many thanks to ChatGPT - I'm very new to Python.)
import ctypes

# Load User32.dll and Kernel32.dll
user32 = ctypes.WinDLL('user32', use_last_error=True)
kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)

def get_keyboard_layout():
    layout_id = user32.GetKeyboardLayout(user32.GetWindowThreadProcessId(user32.GetForegroundWindow(), None))
    language_id = layout_id & (2**16 - 1)
    return language_id

def get_input_language():
    # Get the window that currently has the keyboard focus
    foreground_window = ctypes.windll.user32.GetForegroundWindow()
    # Get the identifier of the thread that created the window
    thread_id = ctypes.windll.user32.GetWindowThreadProcessId(foreground_window, None)
    # Get the current keyboard layout for the thread
    layout_id = ctypes.windll.user32.GetKeyboardLayout(thread_id)
    # Extract the language ID from the layout ID
    language_id = layout_id & (2**16 - 1)
    # Buffer for the language name
    language_name = ctypes.create_unicode_buffer(255)
    # Get the language name
    ctypes.windll.kernel32.GetLocaleInfoW(language_id, 0x00000002, language_name, 255)
    if 'Spanish' in language_name.value:
        return 'es'
    else:
        return 'en'
I've changed transcription.py on my machine and sent you a pull request; it works on Windows. More languages should be added.
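To avoid hard-coding string matches like 'Spanish', one way (a sketch under stated assumptions, not the PR's actual code) is to map the layout's primary language identifier, the low 10 bits of the Windows LANGID, to an ISO-639-1 code; this part is pure arithmetic and needs no Windows DLLs:

```python
def primary_langid_to_iso(layout_id: int) -> str:
    """Map a Windows keyboard-layout LANGID (or full layout handle) to an
    ISO-639-1 code via its primary language identifier (low 10 bits).
    The table covers only a few languages; extend as needed."""
    PRIMARY_TO_ISO = {
        0x07: "de",  # LANG_GERMAN
        0x09: "en",  # LANG_ENGLISH
        0x0A: "es",  # LANG_SPANISH
        0x0C: "fr",  # LANG_FRENCH
    }
    primary = layout_id & 0x3FF
    return PRIMARY_TO_ISO.get(primary, "en")  # default to English
```

On Windows this would be fed the value returned by GetKeyboardLayout, as in the snippet above.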
I have tried the default hotkey, as well as several custom ones, and whatever I do, nothing pops up when I press the hotkey.
On an unrelated note, this script requires sudo on Mac; I'm not sure if there's a way to avoid this.
Hi! I wrote a similar tool, but mine uses push-to-talk
https://github.com/filyp/whisper-simple-dictation
I saw you want to have a push-to-talk option too, so you can have a look there :)