
savbell / whisper-writer

Stars: 188, watchers: 7, forks: 32, size: 113 KB

💬📝 A small dictation app using OpenAI's Whisper speech recognition model.

License: GNU General Public License v3.0

Python 100.00%
openai whisper dictation speech-recognition speech-to-text typing-assistant faster-whisper openai-api openai-whisper

whisper-writer's People

Contributors

grahambojangles, josher19, kernalan, martinligabue, mcesgow, savbell, uberkael


whisper-writer's Issues

Won't stop recording

The script starts fine, and says that it is recording when I push the key combo, but it never stops recording, even when I mute my microphone.

initial_prompt doesn't seem to be working

I tried setting initial_prompt to condition the output from the whisper-1 API, but whatever is set in the config file doesn't seem to influence the output returned by the API. I even tested it explicitly by saying "insert HELLO in front of every sentence", but it doesn't seem to work. Is any additional configuration required, or am I missing something?
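One possible explanation, offered as an assumption rather than a confirmed diagnosis: OpenAI's hosted transcription endpoint names this parameter `prompt`, while `initial_prompt` is the name used by the local Whisper and faster-whisper `transcribe()` functions, so a config key called `initial_prompt` may never reach the API unless it is renamed. Also, the prompt generally conditions vocabulary and style rather than acting as an instruction, so a command like "insert HELLO" would not be expected to work either way. The hypothetical helper below shows the renaming:

```python
from typing import Any, Dict


def build_api_params(api_options: Dict[str, Any]) -> Dict[str, Any]:
    """Translate whisper-writer style api_options into transcription API kwargs.

    Hypothetical helper: renames the local-model key `initial_prompt` to the
    API's `prompt` and drops options left as None (e.g. "language": null).
    """
    params: Dict[str, Any] = {}
    for key, value in api_options.items():
        if value is None:
            continue  # let the API fall back to its own default
        if key == "initial_prompt":
            key = "prompt"  # the hosted endpoint expects `prompt`
        params[key] = value
    return params


options = {"model": "whisper-1", "language": None, "temperature": 0.0,
           "initial_prompt": "Dictation about Python and CUDA."}
print(build_api_params(options))
```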

Please add support for local whisper api

https://github.com/mudler/LocalAI

Support can simply be added with:

import openai
openai.api_base = "http://someinternalhost.local/v1"

The URL should be configurable in the config.json file:

"api_options": { "model": "whisper-1", "language": null, "temperature": 0.0, "initial_prompt": null, "base_api": "http://someinternalhost.local/v1" },

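A minimal sketch of this proposal, assuming the pre-1.0 `openai` Python package (which reads the endpoint from the module attribute `openai.api_base`) and the `base_api` config key suggested above, which is not an existing whisper-writer option. The helper takes the module as an argument so it can be tried without a server; with `openai>=1.0` the equivalent would be passing `base_url=` to the `OpenAI` client.

```python
import json
from types import SimpleNamespace
from typing import Any


def apply_api_base(openai_module: Any, config: dict) -> None:
    """Point the (pre-1.0) openai module at a custom endpoint if one is configured.

    `base_api` is the config key proposed in this issue, not an existing option.
    """
    base = config.get("api_options", {}).get("base_api")
    if base:
        # Strip a trailing slash so appended paths don't end up with "//"
        openai_module.api_base = base.rstrip("/")


config = json.loads('{"api_options": {"model": "whisper-1", '
                    '"base_api": "http://someinternalhost.local/v1/"}}')

# A stand-in object is used here; in the app you would pass the real `openai` module.
fake_openai = SimpleNamespace(api_base="https://api.openai.com/v1")
apply_api_base(fake_openai, config)
print(fake_openai.api_base)
```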
CUDA requirements need to match project dependencies. Distribute the libraries?

I absolutely love this project, the hold_to_record functionality is excellent! Thanks for sharing! On to the issue:

Less experienced users like me could probably use more detailed instructions on how to get GPU acceleration working on Windows. The README.md lists the requirements for GPU acceleration as "cuBLAS for CUDA 11 and cuDNN 8 for CUDA 11". There are some issues here:

  1. The instructions don't specify if those libraries should be installed to Windows or to the Python environment. I guess that it is in Windows.
  2. cuBLAS cannot be downloaded on its own, it can either be obtained from the CUDA toolkit or the HPC SDK. I guess that we should install the former, but it should be clarified in the instructions.
  3. requirements.txt pins a specific version of torch (PyTorch), 2.0.1, which apparently only works with CUDA 11.7 or 11.8. The cuDNN 8 library must also match the CUDA version torch was built for, since both ship different libraries for CUDA 11.x and CUDA 12.x (see https://pytorch.org/get-started/previous-versions/).
  4. The current CUDA toolkit version, 12.3, is the one NVIDIA offers for download by default, and it is apparently incompatible with the PyTorch version used in the project.
  5. The README.md does not say which cuDNN 8 sub-version is required (8.x? which x?).
  6. The current cuDNN version, 9.0.0, is the one NVIDIA offers for download by default.
  7. This is the main issue: the archived cuDNN 8.x versions (https://developer.nvidia.com/rdp/cudnn-archive) are offered without a Windows installer, only as a collection of .dll files.

Solutions:

I solved it, but I don't know which change did it, because I applied both of these fixes at the same time:

  1. I upgraded torch by running pip3 install torch torchaudio --index-url https://download.pytorch.org/whl/cu118
  2. I copied the CUDA DLLs as distributed for faster-whisper (https://github.com/Purfview/whisper-standalone-win/releases/tag/libs) to the Windows System32 folder. Note that nobody should ever, ever do this. So it would be great if whisper-writer could distribute the libraries and load them directly, as suggested in SYSTRAN/faster-whisper#153 (comment).
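A sketch of the alternative requested here: keep the CUDA DLLs in a folder next to the app and add that folder to the DLL search path at startup, instead of copying them into System32. It assumes Python 3.8+ on Windows, where `os.add_dll_directory()` exists. The `libs` folder and the function name are hypothetical, and whether this matches exactly what the linked faster-whisper comment proposes is not confirmed here.

```python
import os
import sys
from pathlib import Path


def register_bundled_cuda_libs(lib_dir: Path) -> bool:
    """Add a folder of bundled CUDA DLLs to the Windows DLL search path.

    Returns True if the folder was registered, False otherwise (non-Windows
    platform or missing folder). Call this before importing faster_whisper/torch.
    """
    if sys.platform != "win32" or not lib_dir.is_dir():
        return False
    # Since Python 3.8, Windows no longer uses PATH to resolve DLL dependencies
    # of extension modules; add_dll_directory is the supported way to add a folder.
    os.add_dll_directory(str(lib_dir))
    # Some libraries may still locate DLLs via PATH, so prepend it there too.
    os.environ["PATH"] = str(lib_dir) + os.pathsep + os.environ.get("PATH", "")
    return True
```

For example, `register_bundled_cuda_libs(Path(__file__).resolve().parent / "libs")` could be called at the top of `run.py`, with a hypothetical `libs` folder holding files such as `cublas64_11.dll`. Nothing outside the app folder is modified, so uninstalling stays clean.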

Multilingual support

Hi! Thanks for creating this! I'm lucky I found it, as I was looking for a program that does exactly this for a while now.

How does the language code in the options affect the output?
I know Whisper's multilingual models can transcribe input without specifying its language. So if I say one sentence in one language and another sentence in a second language, Whisper can transcribe both in their respective languages.
And this is what I need as well. I speak multiple languages and use them all on my PC. The language code option seems to restrict this use case though. Or does it? I don't really know.

Could you please elaborate on how this works and if/how I can achieve what I want?

Thanks!
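With the local Whisper implementations, leaving the language unset (`null` in config.json, i.e. `language=None`) makes the model detect the language itself, while setting a code forces that language. The sketch below assumes the `faster-whisper` package, whose `WhisperModel.transcribe()` returns the segments plus an info object with a `language` attribute. The `resolve_language` helper and treating `"auto"` as a synonym for `None` are illustrative assumptions, not whisper-writer options. Detection generally runs once per transcription call on the beginning of the audio, so switching languages in the middle of one recording may still come out in a single language.

```python
from typing import Optional, Tuple


def resolve_language(value: Optional[str]) -> Optional[str]:
    """Map a config value to a Whisper language argument.

    None, "" and "auto" (an illustrative convention) all mean: let Whisper detect it.
    """
    if value is None or value.strip().lower() in ("", "auto"):
        return None
    return value.strip().lower()


def transcribe(audio_path: str, config_language: Optional[str],
               model_size: str = "small") -> Tuple[str, str]:
    """Transcribe with faster-whisper, returning (text, detected_or_forced_language)."""
    from faster_whisper import WhisperModel  # assumed installed; imported lazily

    model = WhisperModel(model_size)  # must be a multilingual model, not a ".en" one
    segments, info = model.transcribe(audio_path, language=resolve_language(config_language))
    text = "".join(segment.text for segment in segments)
    return text.strip(), info.language
```

Since each dictation is a separate recording, leaving the language unset should let consecutive dictations in different languages each be detected on their own.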

nixkeyboard.py: You must be root to use this library on linux...

I tried to install whisper-writer on Ubuntu 22.04 (Jammy).

The requirements could only be satisfied after removing the pinned versions of "av" and "faster-whisper" from requirements.txt.

Startup now fails:

Traceback (most recent call last):
  File "/home/fabian/whisper-writer/src/main.py", line 136, in <module>
    keyboard.add_hotkey(config['activation_key'], on_shortcut)
  File "/home/fabian/whisper-writer/venv/lib/python3.10/site-packages/keyboard/__init__.py", line 639, in add_hotkey
    _listener.start_if_necessary()
  File "/home/fabian/whisper-writer/venv/lib/python3.10/site-packages/keyboard/_generic.py", line 35, in start_if_necessary
    self.init()
  File "/home/fabian/whisper-writer/venv/lib/python3.10/site-packages/keyboard/__init__.py", line 196, in init
    _os_keyboard.init()
  File "/home/fabian/whisper-writer/venv/lib/python3.10/site-packages/keyboard/_nixkeyboard.py", line 113, in init
    build_device()
  File "/home/fabian/whisper-writer/venv/lib/python3.10/site-packages/keyboard/_nixkeyboard.py", line 109, in build_device
    ensure_root()
  File "/home/fabian/whisper-writer/venv/lib/python3.10/site-packages/keyboard/_nixcommon.py", line 174, in ensure_root
    raise ImportError('You must be root to use this library on linux.')
ImportError: You must be root to use this library on linux.
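The traceback comes from the `keyboard` package itself: on Linux it reads raw input devices and its `ensure_root()` raises unless the process runs as root. A small pre-check like the hypothetical one below only turns that into a clearer message before `keyboard.add_hotkey()` is called; it does not remove the requirement (running with `sudo`, or switching to a different hotkey library, would still be needed).

```python
import os
import sys


def keyboard_permission_problem() -> str:
    """Return an explanation if the `keyboard` package will refuse to start, else ''."""
    # Mirrors the failure in the traceback: on Linux, `keyboard` raises
    # ImportError from ensure_root() when the effective user is not root.
    if sys.platform.startswith("linux") and hasattr(os, "geteuid") and os.geteuid() != 0:
        return ("The 'keyboard' package needs root on Linux to read input devices; "
                "start WhisperWriter with sudo or use another hotkey backend.")
    return ""


problem = keyboard_permission_problem()
if problem:
    print(problem)
```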

Accents are not correctly transcribed

When I press Ctrl+Alt+Space and speak French, I can see in the Python terminal that the accents are transcribed correctly, but in the target window the accents disappear. To reproduce, just say "Touché, coulé", which is typed as "Touch, coul."

Thanks a lot for this nice software, it helps a lot. I can't believe this is the only open-source Whisper-based dictation tool...
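A likely cause, stated as an assumption: `pyautogui.write()` can only press keys it knows about, and characters outside that set (such as `é`) are skipped, which matches "Touch, coul.". The `keyboard` package's `write()` is documented to type characters that are missing from the layout as explicit Unicode input, so one possible workaround is to route non-ASCII text through it. Both helper names below are hypothetical.

```python
def needs_unicode_typing(text: str) -> bool:
    """True if the text contains characters beyond plain ASCII (e.g. é, ü, ñ)."""
    return any(ord(ch) > 127 for ch in text)


def type_text(text: str, delay: float = 0.0) -> None:
    """Type text into the focused window, choosing a backend based on its content."""
    if needs_unicode_typing(text):
        import keyboard  # can emit characters that have no key on the layout
        keyboard.write(text, delay=delay)
    else:
        import pyautogui  # existing path for plain ASCII text
        pyautogui.write(text, interval=delay)
```

Keeping PyAutoGUI for ASCII text leaves the current behaviour unchanged for English dictation.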

Immediately abort typing when active window changed

I use the settings

    "activation_key": "ctrl+win",
    "recording_mode": "hold_to_record",

Unfortunately, this sometimes (too often) causes Windows to perform actions before Whisper Writer starts to auto-type:

  • like opening the start menu as with pressing Win (even opening and closing it multiple times in succession before auto-typing), or
  • even worse, switching to a new desktop and opening all kinds of windows there, including the Settings app and some Windows program with an old user interface that I never use. Note that Ctrl+Win+D creates a new desktop, and Ctrl+Win+Left/Right switches desktops. For example, 12 new desktops may get created.
    • EDIT: It happened again. The command line that brings up the old-GUI window according to System Informer is: "C:\WINDOWS\system32\rundll32.exe" dsquery.dll, OpenQueryWindow {16006700-87AD-11D0-9140-00AA00C16E65}.

Of course it would be good if, when Whisper Writer does its thing, no other software including the OS would even notice that Ctrl+Win has been pressed - if that's possible. But you could also stop auto-typing as soon as the active window changed.

Just starting to type with the start menu open isn't bad, however, because a text box simply appears that is auto-typed into. Since it's desirable to not lose your recording, you should let this happen by accepting an active-window change with the start menu window now being active. Then, one can at least copy-paste the text from the start menu text box. So, please make an exception for the start menu window (executable SearchApp.exe, window class Windows.UI.Core.CoreWindow, found with AutoHotkey "Window Spy" tool).

It may also be appropriate to deny auto-typing completely when the taskbar or the desktop window is active, as that may have contributed to the various windows opening after a new desktop was created, and it could still be a problem if, for example, you click on the desktop before auto-typing. However, the real problem may also be that a modifier key stays pressed while Whisper Writer auto-types. In that case, could Whisper Writer first wait until nothing interferes anymore?
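A rough sketch of the requested behaviour, under stated assumptions: remember the foreground window when typing starts, check it before every character, and stop as soon as it changes, except for windows on an allow-list such as the start menu. The window lookup and key-press functions are passed in so the logic can be tried without Windows; on Windows they might be backed by `ctypes.windll.user32.GetForegroundWindow()` and the typing library already in use. All names are hypothetical.

```python
from typing import Callable, Hashable


def type_until_focus_changes(text: str,
                             press: Callable[[str], None],
                             get_active_window: Callable[[], Hashable],
                             is_allowed: Callable[[Hashable], bool] = lambda w: False) -> int:
    """Type `text` one character at a time; stop if the active window changes.

    A change to a window for which is_allowed() returns True (e.g. the start
    menu's search box) is accepted and becomes the new target window.
    Returns the number of characters actually typed.
    """
    target = get_active_window()
    typed = 0
    for ch in text:
        current = get_active_window()
        if current != target:
            if not is_allowed(current):
                break  # focus moved somewhere unexpected: abort typing
            target = current  # accept the allowed window as the new target
        press(ch)
        typed += 1
    return typed


# Simulated run: focus jumps to another window after three characters.
windows = iter(["editor"] * 4 + ["settings"] * 10)
typed_chars = []
count = type_until_focus_changes("hello world", typed_chars.append, lambda: next(windows))
print(count, "".join(typed_chars))
```

Checking per character keeps the window of damage to one keystroke, at the cost of one extra system call per character.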

Issue with Numba.jit

G:\Git clones\AI\whisper-writer\venv\Lib\site-packages\whisper\timing.py:57: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
  @numba.jit
Traceback (most recent call last):
  File "G:\Git clones\AI\whisper-writer\src\main.py", line 94, in <module>
    config = load_config_with_defaults()
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "G:\Git clones\AI\whisper-writer\src\main.py", line 52, in load_config_with_defaults
    user_config = json.load(config_file)
                  ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jason\AppData\Local\Programs\Python\Python311\Lib\json\__init__.py", line 293, in load
    return loads(fp.read(),
           ^^^^^^^^^^^^^^^^
  File "C:\Users\jason\AppData\Local\Programs\Python\Python311\Lib\json\__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jason\AppData\Local\Programs\Python\Python311\Lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jason\AppData\Local\Programs\Python\Python311\Lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 12 column 21 (char 265)
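The Numba message at the top is only a deprecation warning; the error that stops the script is a `JSONDecodeError`, meaning `src/config.json` is not valid JSON at line 12, column 21 (typical causes are a missing comma, a trailing comma, single quotes, or an unquoted value). A sketch of loading the config with a friendlier error, using only the standard library; `load_config` is a hypothetical name, not the app's `load_config_with_defaults`.

```python
import json
from typing import Any, Dict


def load_config(text: str, source: str = "src/config.json") -> Dict[str, Any]:
    """Parse config text, turning JSON syntax errors into a readable message."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        lines = text.splitlines()
        # err.lineno and err.colno are 1-based positions of the offending character
        bad_line = lines[err.lineno - 1] if err.lineno <= len(lines) else ""
        pointer = " " * (err.colno - 1) + "^"
        raise SystemExit(
            f"{source} is not valid JSON: {err.msg} at line {err.lineno}, "
            f"column {err.colno}\n    {bad_line}\n    {pointer}"
        ) from None


# "en" is missing its quotes, a typical hand-editing mistake
broken = '{\n  "use_api": true,\n  "language": en\n}'
try:
    load_config(broken)
except SystemExit as exc:
    print(exc)
```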

Doesn't work for longer recordings using press_to_toggle

I used it for a 10-minute recording, and it kept repeating many sentences and did not transcribe the rest.

Fortunately, it saved the recording as a WAV file in the temp folder, so I could transcribe it with the official Whisper.

I used the large model in both. Since I used press_to_toggle, VAD is not the cause.

Need conversion of sample rate

The Whisper models require a 16 kHz sample rate, but not many audio devices provide that sample rate. Mine, for example, only supports rates from 44100 to 192000 Hz. Leaving the sample rate at 16000 in src/config.json results in an error:

Expression 'paInvalidSampleRate' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2050
Expression 'PaAlsaStreamComponent_InitialConfigure( &self->capture, inParams, self->primeBuffers, hwParamsCapture, &realSr )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2721
Expression 'PaAlsaStream_Configure( stream, inputParameters, outputParameters, sampleRate, framesPerBuffer, &inputLatency, &outputLatency, &hostBufferSizeMode )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2845
Traceback (most recent call last):
  File "/home/mark/compile/whisper/whisper-writer/src/transcription.py", line 52, in record_and_transcribe
    with sd.InputStream(samplerate=sample_rate, channels=1, dtype='int16', blocksize=sample_rate * frame_duration // 1000,
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mark/compile/whisper/whisper_dictation/whispervenv/lib/python3.11/site-packages/sounddevice.py", line 1421, in __init__
    _StreamBase.__init__(self, kind='input', wrap_callback='array',
  File "/home/mark/compile/whisper/whisper_dictation/whispervenv/lib/python3.11/site-packages/sounddevice.py", line 898, in __init__
    _check(_lib.Pa_OpenStream(self._ptr, iparameters, oparameters,
  File "/home/mark/compile/whisper/whisper_dictation/whispervenv/lib/python3.11/site-packages/sounddevice.py", line 2747, in _check
    raise PortAudioError(errormsg, err)
sounddevice.PortAudioError: Error opening InputStream: Invalid sample rate [PaErrorCode -9997]

Changing it to 44100, on the other hand, results in:

Expression 'paInvalidSampleRate' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2050
Expression 'PaAlsaStreamComponent_InitialConfigure( &self->capture, inParams, self->primeBuffers, hwParamsCapture, &realSr )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2721
Expression 'PaAlsaStream_Configure( stream, inputParameters, outputParameters, sampleRate, framesPerBuffer, &inputLatency, &outputLatency, &hostBufferSizeMode )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2845
Traceback (most recent call last):
  File "/home/mark/compile/whisper/whisper-writer/src/transcription.py", line 52, in record_and_transcribe
    with sd.InputStream(samplerate=sample_rate, channels=1, dtype='int16', blocksize=sample_rate * frame_duration // 1000,
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mark/compile/whisper/whisper_dictation/whispervenv/lib/python3.11/site-packages/sounddevice.py", line 1421, in __init__
    _StreamBase.__init__(self, kind='input', wrap_callback='array',
  File "/home/mark/compile/whisper/whisper_dictation/whispervenv/lib/python3.11/site-packages/sounddevice.py", line 898, in __init__
    _check(_lib.Pa_OpenStream(self._ptr, iparameters, oparameters,
  File "/home/mark/compile/whisper/whisper_dictation/whispervenv/lib/python3.11/site-packages/sounddevice.py", line 2747, in _check
    raise PortAudioError(errormsg, err)
sounddevice.PortAudioError: Error opening InputStream: Invalid sample rate [PaErrorCode -9997]

Here is the output of arecord -Dhw:0 --dump-hw-params:

Warning: Some sources (like microphones) may produce inaudible results
         with 8-bit sampling. Use '-f' argument to increase resolution
         e.g. '-f S16_LE'.
HW Params of device "hw:0":
--------------------
ACCESS:  MMAP_INTERLEAVED RW_INTERLEAVED
FORMAT:  S16_LE S32_LE
SUBFORMAT:  STD
SAMPLE_BITS: [16 32]
FRAME_BITS: [32 64]
CHANNELS: 2
RATE: [44100 192000]
PERIOD_TIME: (83 11888617)
PERIOD_SIZE: [16 524288]
PERIOD_BYTES: [128 2097152]
PERIODS: [2 32]
BUFFER_TIME: (166 23777234)
BUFFER_SIZE: [32 1048576]
BUFFER_BYTES: [128 4194304]
TICK_TIME: ALL
--------------------
arecord: set_params:1371: Sample format non available
Available formats:
- S16_LE
- S32_LE

Does this need some abstraction like sox?
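Some conversion step is needed if the device cannot open at 16 kHz, but it does not have to be an external tool like sox. One option, assuming NumPy is available: open the stream at a rate the hardware accepts (the `arecord` dump lists 44100 to 192000 Hz and only 2 channels, so mono capture on `hw:0` may fail as well), then downmix and resample to 16 kHz in Python before handing the audio to Whisper. The sketch uses plain linear interpolation for brevity; a proper polyphase filter such as `scipy.signal.resample_poly` would avoid aliasing and give better quality. `to_whisper_audio` is a hypothetical helper.

```python
import numpy as np

WHISPER_RATE = 16_000  # Whisper models expect 16 kHz mono audio


def to_whisper_audio(audio: np.ndarray, src_rate: int) -> np.ndarray:
    """Convert int16 audio of shape (frames,) or (frames, channels) to 16 kHz mono float32."""
    samples = audio.astype(np.float32) / 32768.0  # int16 -> [-1.0, 1.0)
    if samples.ndim == 2:
        samples = samples.mean(axis=1)  # downmix stereo (or more) to mono
    if src_rate == WHISPER_RATE:
        return samples
    n_out = int(round(len(samples) * WHISPER_RATE / src_rate))
    # Sample times of the input and output signals, in seconds
    t_in = np.arange(len(samples)) / src_rate
    t_out = np.arange(n_out) / WHISPER_RATE
    return np.interp(t_out, t_in, samples).astype(np.float32)


# One second of stereo silence captured at 44.1 kHz becomes 16000 mono samples.
captured = np.zeros((44_100, 2), dtype=np.int16)
print(to_whisper_audio(captured, 44_100).shape)
```

The recording side would then open `sd.InputStream(samplerate=44100, channels=2, dtype='int16', ...)` and pass the collected frames through this function.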

Multiple Issues

Hello,

First of all, installing this script ran into conflicts with "numba", which is not compatible with the latest version of Python (3.12.0), so I had to manually install Python 3.11.6 as well and use it explicitly for the virtual environment, but that wasn't a big deal.

It worked the first few times when I provided my API key and set "use_api": true, but it was very, very slow at transcribing even one- or two-word phrases. I then tried switching this to false to see if it would be any better, and it just got stuck on "transcribing" forever. I exited the script, switched "use_api" back to true, and ran it again, but now I'm getting the following error:

(venv) C:\Users\rich\whisper-writer>python run.py
Starting WhisperWriter...
C:\Users\rich\whisper-writer\venv\Lib\site-packages\whisper\timing.py:57: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
  @numba.jit
Script activated. Whisper is set to run using OpenAI's API. To change this, modify the "use_api" value in the src\config.json file.
Press Ctrl+Alt+Space to start recording and transcribing. Press Ctrl+C on the terminal window to quit.
Recording...
Recording finished. Size: 9120
Transcribing audio file...
Transcription: How are you?
Exception in thread Thread-2 (process):
Traceback (most recent call last):
  File "C:\Users\rich\AppData\Local\Programs\Python\Python311\Lib\threading.py", line 1045, in _bootstrap_inner
    self.run()
  File "C:\Users\rich\AppData\Local\Programs\Python\Python311\Lib\threading.py", line 982, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\rich\whisper-writer\venv\Lib\site-packages\keyboard\_generic.py", line 58, in process
    if self.pre_process_event(event):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\rich\whisper-writer\venv\Lib\site-packages\keyboard\__init__.py", line 218, in pre_process_event
    callback(event)
  File "C:\Users\rich\whisper-writer\venv\Lib\site-packages\keyboard\__init__.py", line 649, in <lambda>
    handler = lambda e: (event_type == KEY_DOWN and e.event_type == KEY_UP and e.scan_code in _logically_pressed_keys) or (event_type == e.event_type and callback())
                                                                                                                                                        ^^^^^^^^^^
  File "C:\Users\rich\whisper-writer\src\main.py", line 85, in on_shortcut
    pyautogui.write(transcribed_text, interval=config['writing_key_press_delay'])
  File "C:\Users\rich\whisper-writer\venv\Lib\site-packages\pyautogui\__init__.py", line 593, in wrapper
    failSafeCheck()
  File "C:\Users\rich\whisper-writer\venv\Lib\site-packages\pyautogui\__init__.py", line 1734, in failSafeCheck
    raise FailSafeException(
pyautogui.FailSafeException: PyAutoGUI fail-safe triggered from mouse moving to a corner of the screen. To disable this fail-safe, set pyautogui.FAILSAFE to False. DISABLING FAIL-SAFE IS NOT RECOMMENDED.
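The final exception is PyAutoGUI's fail-safe: it aborts automated typing whenever the mouse pointer sits in a corner of the screen, and because it is raised inside the `keyboard` callback thread, the transcription is lost. A hedged sketch, with hypothetical helper names and assuming PyAutoGUI's default fail-safe points are the four screen corners, that checks the pointer first and reports the problem instead of crashing:

```python
from typing import Tuple


def in_failsafe_corner(pos: Tuple[int, int], screen: Tuple[int, int]) -> bool:
    """True if the pointer is on one of the four screen corners."""
    x, y = pos
    width, height = screen
    return x in (0, width - 1) and y in (0, height - 1)


def write_transcription(text: str, interval: float) -> bool:
    """Type text with PyAutoGUI; return False instead of raising on the fail-safe."""
    import pyautogui  # imported lazily so the helper above runs headless

    if in_failsafe_corner(tuple(pyautogui.position()), tuple(pyautogui.size())):
        print("Mouse is in a screen corner; move it away so PyAutoGUI can type.")
        return False
    try:
        pyautogui.write(text, interval=interval)
    except pyautogui.FailSafeException:
        print("Typing aborted by the PyAutoGUI fail-safe (mouse moved to a corner).")
        return False
    return True
```

This keeps the fail-safe enabled, as PyAutoGUI recommends, while still surviving it.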

Doesn't seem to work

Hello, sorry to bother you. I'm not good with Python or similar tools; I just wanted to use whisper-writer together with Whisper to run some tests and to talk in real time online in another language.
After installing everything and following the steps, it starts, but it doesn't do anything except show the recording indicator. There's no error or anything; it just doesn't type. Maybe you have a solution?

Doesn't work with API

I believe you need to initialize local_model to None in the main script to fix this.
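A minimal sketch of that fix, with hypothetical names since the main script is not shown here: define `local_model` unconditionally before any branch that might skip loading it, so later code that checks it presumably no longer fails with a `NameError` when `use_api` is true.

```python
from typing import Any, Callable, Optional

local_model: Optional[Any] = None  # defined up front, even when the API is used


def init_model(config: dict, load_local_model: Callable[[dict], Any]) -> Optional[Any]:
    """Load a local Whisper model only when the API is not in use."""
    if config.get("use_api", False):
        return None  # the API path never needs a local model
    return load_local_model(config)


local_model = init_model({"use_api": True}, lambda cfg: "local-model-object")
print(local_model)
```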

Consider merging with this project

Your UX of being able to Whisper-dictate anywhere is awesome. I'd love to see it combined with this project's engine, which runs Whisper locally on your own GPU (even low- to mid-range ones like a GTX 1050), https://github.com/Const-me/Whisper, and does not require users to install Python, which is a big hurdle.

And congrats on the amazing project!

Issues with installation -> webrtcvad error

Hello there,

This tool is awesome! On one of my machines it runs very well, but on the other I still get an error while setting it up.

What I have done so far:

  • pip install webrtcvad (results in the same error shown below)
  • pip install webrtcvad-wheels (successful)
  • pip install nes-py --no-cache-dir
  • pip install --upgrade pip setuptools wheel
  • pip3 install webrtcvad-wheels
  • pip install Cmake
  • reinstalled Python (using 3.11.7, on both machines)
  • Python reinstall (3.11.7; 3.11.6)
  • setup everything from scratch several times
  • installed and reinstalled the C++ Build Tools

Error description
I receive the following error when I run the command ((venv) C:\Windows\System32\whisper-writer>pip install -r requirements.txt):

Building wheels for collected packages: webrtcvad
Building wheel for webrtcvad (pyproject.toml) ... error
error: subprocess-exited-with-error

× Building wheel for webrtcvad (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [19 lines of output]
running bdist_wheel
running build
running build_py
creating build
creating build\lib.win-amd64-cpython-311
copying webrtcvad.py -> build\lib.win-amd64-cpython-311
running build_ext
building 'webrtcvad' extension
creating build\temp.win-amd64-cpython-311
creating build\temp.win-amd64-cpython-311\Release
creating build\temp.win-amd64-cpython-311\Release\cbits
creating build\temp.win-amd64-cpython-311\Release\cbits\webrtc
creating build\temp.win-amd64-cpython-311\Release\cbits\webrtc\common_audio
creating build\temp.win-amd64-cpython-311\Release\cbits\webrtc\common_audio\signal_processing
creating build\temp.win-amd64-cpython-311\Release\cbits\webrtc\common_audio\vad
"C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.38.33130\bin\HostX86\x64\cl.exe" /c /nologo /O2 /W3 /GL /DNDEBUG /MD -D_WIN32 -Icbits -IC:\Windows\System32\whisper-writer\venv\include -IC:\Users\lisa_\AppData\Local\Programs\Python\Python311\include -IC:\Users\lisa_\AppData\Local\Programs\Python\Python311\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.38.33130\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.38.33130\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\VS\include" /Tccbits\pywebrtcvad.c /Fobuild\temp.win-amd64-cpython-311\Release\cbits\pywebrtcvad.obj
pywebrtcvad.c
C:\Users\lisa_\AppData\Local\Programs\Python\Python311\include\pyconfig.h(59): fatal error C1083: Cannot open include file: 'io.h': No such file or directory
error: command 'C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.38.33130\bin\HostX86\x64\cl.exe' failed with exit code 2
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for webrtcvad
Failed to build webrtcvad
ERROR: Could not build wheels for webrtcvad, which is required to install pyproject.toml-based projects


Before I reinstalled the C++ Build Tools, this error (fatal error C1083: Cannot open include file: 'io.h': No such file or directory) did not appear, and reinstalling them again hasn't fixed it so far. The webrtcvad wheel-building error shown above has persisted since my first attempts to install it.

Do you guys have an idea how to fix it?

Thanks for your time and support,
Chris

Error: NSInternalInconsistencyException

Hello! First, thank you so much for building this!
Disclaimer: I am new to Python and GitHub, and I would be grateful if someone could help me with my installation :)
I spent yesterday and today trying to install it on my machine
(macOS 12.5.1) and found out that the Python installed with Homebrew didn't have tkinter,
so I installed Python 3.11 from the official installer.
tkinter seems to work.
Then I found out I need to run sudo python run.py (like I said, I'm a beginner).

But now I run into this after changing "activation_key" to "f4" (because Ctrl+Alt+Space didn't do anything for me).

I start by navigating to my directory and activating the virtual environment:
cd ~/Documents/Projects/whisper-writer
source venv/bin/activate
sudo python run.py

Then everything starts up without errors.

When I press F4, I get this error:

Starting WhisperWriter...
/Users/username/Documents/Projects/whisper-writer/venv/lib/python3.11/site-packages/whisper/timing.py:57: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
@numba.jit
Script activated. Whisper is set to run using a local model. To change this, modify the "use_api" value in the src\config.json file.
Press F4 to start recording and transcribing. Press Ctrl+C on the terminal window to quit.
^[OS2023-11-15 10:23:25.745 Python[58787:801092] *** Terminating app due to uncaught exception 'NSInternalInconsistencyException', reason: 'NSWindow drag regions should only be invalidated on the Main Thread!'
*** First throw call stack:
(
0 CoreFoundation 0x00000001a63a91a8 __exceptionPreprocess + 240
1 libobjc.A.dylib 0x00000001a60f3e04 objc_exception_throw + 60
2 CoreFoundation 0x00000001a63d4128 _CFBundleGetValueForInfoKey + 0
3 AppKit 0x00000001a8eb7930 -[NSWindow(NSWindow_Theme) _postWindowNeedsToResetDragMarginsUnlessPostingDisabled] + 372
4 AppKit 0x00000001a8ea292c -[NSWindow _initContent:styleMask:backing:defer:contentView:] + 948
5 AppKit 0x00000001a8ea256c -[NSWindow initWithContentRect:styleMask:backing:defer:] + 56
6 libtk8.6.dylib 0x0000000106b75ddc TkMacOSXMakeRealWindowExist + 572
7 libtk8.6.dylib 0x0000000106b75a8c TkWmMapWindow + 56
8 libtk8.6.dylib 0x0000000106adedc4 Tk_MapWindow + 152
9 libtk8.6.dylib 0x0000000106ae7130 MapFrame + 76
10 libtcl8.6.dylib 0x0000000106a1c9b0 TclServiceIdle + 84
11 libtcl8.6.dylib 0x0000000106a0106c Tcl_DoOneEvent + 296
12 libtk8.6.dylib 0x0000000106b68ca8 TkpInit + 800
13 libtk8.6.dylib 0x0000000106ae007c Initialize + 2292
14 _tkinter.cpython-311-darwin.so 0x00000001040da368 Tcl_AppInit + 92
15 _tkinter.cpython-311-darwin.so 0x00000001040da000 Tkapp_New + 548
16 _tkinter.cpython-311-darwin.so 0x00000001040d9dd8 _tkinter_create_impl + 268
17 _tkinter.cpython-311-darwin.so 0x00000001040d9a10 _tkinter_create + 240
18 Python 0x00000001019fa034 cfunction_vectorcall_FASTCALL + 80
19 Python 0x0000000101abbf84 _PyEval_EvalFrameDefault + 52572
20 Python 0x0000000101ac19ec _PyEval_Vector + 156
21 Python 0x0000000101995098 _PyObject_FastCallDictTstate + 96
22 Python 0x0000000101a22754 slot_tp_init + 180
23 Python 0x0000000101a190d8 type_call + 136
24 Python 0x0000000101994d78 _PyObject_MakeTpCall + 128
25 Python 0x0000000101abc06c _PyEval_EvalFrameDefault + 52804
26 Python 0x0000000101ac19ec _PyEval_Vector + 156
27 Python 0x0000000101999158 method_vectorcall + 364
28 Python 0x0000000101bbeeb0 thread_run + 220
29 Python 0x0000000101b3aff4 pythread_wrapper + 48
30 libsystem_pthread.dylib 0x00000001a625c26c _pthread_start + 148
31 libsystem_pthread.dylib 0x00000001a625708c thread_start + 8
)
libc++abi: terminating with uncaught exception of type NSException

Thank you very much if you can help! I greatly appreciate your time!
Best wishes
Kasper
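The stack trace shows Tk being initialised from a secondary thread (`thread_run` leading to `Tkapp_New`), and macOS only allows windows to be created on the main thread. A common pattern, sketched here with hypothetical names rather than taken from WhisperWriter's code, is to keep Tk on the main thread and have the hotkey thread send it messages through a `queue.Queue` that the UI polls with `after()`:

```python
import queue
from typing import Callable

ui_messages: "queue.Queue[str]" = queue.Queue()  # filled by the hotkey/recording thread


def drain_messages(q: "queue.Queue[str]", handle: Callable[[str], None]) -> int:
    """Process every pending message without blocking; returns how many were handled."""
    handled = 0
    while True:
        try:
            message = q.get_nowait()
        except queue.Empty:
            return handled
        handle(message)
        handled += 1


def run_status_window() -> None:
    """Create the Tk window on the main thread and poll the queue every 100 ms."""
    import tkinter as tk  # imported lazily; this function must run on the main thread

    root = tk.Tk()
    label = tk.Label(root, text="Idle")
    label.pack(padx=20, pady=10)

    def poll() -> None:
        drain_messages(ui_messages, lambda text: label.config(text=text))
        root.after(100, poll)  # reschedule on Tk's own (main) thread

    poll()
    root.mainloop()
```

The hotkey callback would then only call `ui_messages.put("Recording...")` instead of touching Tk directly.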

Pulseaudio

Is there a way to specify the device for recording?
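WhisperWriter records through `sounddevice` (visible in the sample-rate traceback above), whose `sd.InputStream` accepts a `device=` argument and whose `sd.query_devices()` lists the available devices, often including a `pulse` entry on PulseAudio systems. A sketch of picking an input device by name; `find_input_device` and the device names in the sample list are hypothetical.

```python
from typing import Iterable, Mapping, Optional


def find_input_device(devices: Iterable[Mapping], name_part: str) -> Optional[int]:
    """Return the index of the first input-capable device whose name contains name_part."""
    needle = name_part.lower()
    for index, device in enumerate(devices):
        if device["max_input_channels"] > 0 and needle in device["name"].lower():
            return index
    return None


# With sounddevice installed, the real list would come from sd.query_devices().
fake_devices = [
    {"name": "HDA Intel PCH: ALC3246 Analog (hw:0,0)", "max_input_channels": 2},
    {"name": "HDMI 0", "max_input_channels": 0},
    {"name": "pulse", "max_input_channels": 32},
]
print(find_input_device(fake_devices, "pulse"))
```

The resulting index would then be passed as `sd.InputStream(device=index, samplerate=..., channels=1, dtype='int16')`.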

Cannot install on Python version 3.12

So I installed Python 3.11 (on Windows) just to run this app, and I set up the venv like this:

$ py -3.11 -m venv venv

$ ./venv/Scripts/activate

But I'm not experienced with Python, and it's apparently still running on Python 3.12?

$ pip install -r requirements.txt

Collecting numba==0.57.0 (from -r requirements.txt (line 26))
  Using cached numba-0.57.0.tar.gz (2.5 MB)
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'error'
  error: subprocess-exited-with-error

  Getting requirements to build wheel did not run successfully.
  exit code: 1

    File "<string>", line 48, in _guard_py_ver
  RuntimeError: Cannot install on Python version 3.12.1; only versions >=3.8,<3.12 are supported.

$ python --version
Python 3.12.1

Do you know what to do?
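Two things seem to be going on. Numba 0.57 refuses Python 3.12, as the error says, and the `$` prompt suggests a bash-style shell (for example Git Bash), where running `./venv/Scripts/activate` executes the script in a child process, so the activation does not persist; in such shells it generally has to be sourced (`source venv/Scripts/activate`), or pip can be run through the venv's interpreter directly (`venv/Scripts/python -m pip install -r requirements.txt`). A quick check from inside Python, using a hypothetical helper:

```python
import sys
from typing import List, Tuple


def interpreter_problems(version: Tuple[int, int], prefix: str, base_prefix: str) -> List[str]:
    """List reasons the current interpreter is unsuitable for numba 0.57 in a venv."""
    problems = []
    if not ((3, 8) <= version < (3, 12)):
        problems.append(f"Python {version[0]}.{version[1]} is outside numba 0.57's >=3.8,<3.12 range")
    if prefix == base_prefix:
        problems.append("not running inside a virtual environment")
    return problems


for problem in interpreter_problems(sys.version_info[:2], sys.prefix, sys.base_prefix):
    print("Warning:", problem)
```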

Seems to pull an erroneous OpenAI API key, different from the one I set and verified in .env

I downloaded a fresh copy from this repo, performed the install with no errors, and configured my .env with my OpenAI API key. However, I get the output below saying the API key is incorrect, and it shows part of an API key that I have verified is not mine and is not present in my OpenAI account.

Not sure why it's using this key or where it's getting it from. I have never used this repo on this machine before.

Starting WhisperWriter...
Script activated. Whisper is set to run using OpenAI's API. To change this, modify the "use_api" value in the src\config.json file.
Press Ctrl+Shift+Space to start recording and transcribing. Press Ctrl+C on the terminal window to quit.
Recording...
Recording finished. Size: 73440
Transcribing audio file...
Traceback (most recent call last):
  File "C:\Users\jerem\OneDrive\Documents\Main Documents\General\GitHub\whisper-writer\src\transcription.py", line 94, in record_and_transcribe
    response = openai.Audio.transcribe(model=api_options['model'],
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jerem\OneDrive\Documents\Main Documents\General\GitHub\whisper-writer\venv\Lib\site-packages\openai\api_resources\audio.py", line 65, in transcribe
    response, _, api_key = requestor.request("post", url, files=files, params=data)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jerem\OneDrive\Documents\Main Documents\General\GitHub\whisper-writer\venv\Lib\site-packages\openai\api_requestor.py", line 230, in request
    resp, got_stream = self._interpret_response(result, stream)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jerem\OneDrive\Documents\Main Documents\General\GitHub\whisper-writer\venv\Lib\site-packages\openai\api_requestor.py", line 624, in _interpret_response
    self._interpret_response_line(
  File "C:\Users\jerem\OneDrive\Documents\Main Documents\General\GitHub\whisper-writer\venv\Lib\site-packages\openai\api_requestor.py", line 687, in _interpret_response_line
    raise self.handle_error_response(
openai.error.AuthenticationError: Incorrect API key provided: sk-NWgh1***************************************jtd2. You can find your API key at https://platform.openai.com/account/api-keys.
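One plausible explanation, not confirmed by the report: the pre-1.0 `openai` package picks up the `OPENAI_API_KEY` environment variable, and `python-dotenv`'s `load_dotenv()` does not override variables that already exist unless called with `override=True`. A stale key set system-wide (for example by another tool) would then win over the one in `.env`. The hypothetical helper below compares the two sources without printing the full key:

```python
from typing import Mapping, Optional


def mask(key: Optional[str]) -> str:
    """Show only the first and last four characters of a secret."""
    if not key:
        return "<unset>"
    return (key[:4] + "..." + key[-4:]) if len(key) > 8 else "****"


def explain_key_source(environ: Mapping[str, str], dotenv_values: Mapping[str, Optional[str]]) -> str:
    """Report which OPENAI_API_KEY wins when the environment and .env disagree."""
    env_key = environ.get("OPENAI_API_KEY")
    file_key = dotenv_values.get("OPENAI_API_KEY")
    if env_key and file_key and env_key != file_key:
        return (f"Environment has {mask(env_key)}, .env has {mask(file_key)}; "
                "load_dotenv() keeps the environment value unless override=True.")
    return f"Key in use: {mask(env_key or file_key)}"


# With python-dotenv installed: explain_key_source(os.environ, dotenv.dotenv_values(".env"))
print(explain_key_source({"OPENAI_API_KEY": "sk-old-1234567890"},
                         {"OPENAI_API_KEY": "sk-new-0987654321"}))
```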

Latency/Using another backend

As mentioned in another thread, the UI of this tool is great for its purpose, but the latency is significant. Tests done on my RTX 4090 laptop (this year's Legion 7i Pro):

small.en
1 word: 4.98s (0.2 words/sec)
7 word sentence: 7.43s (0.94 words/sec)
53 words, 2 sentences: 21.3s (2.49 words/s)

large-v2
7 word sentence: 16.84s (0.42 words/sec)
52 words, 2 sentences: 31.2s (1.67 words/s)

If we could instead call one of the faster ports (https://github.com/Const-me/Whisper, a C++ implementation, or https://github.com/guillaumekln/faster-whisper, which runs on the C++ CTranslate2 engine), this could be significantly reduced. I tested the first of those by recording the same speech to a file and transcribing it to a text file:
1 word: 0.687s
5 words: 0.844s
53 words: 1.485s (on the second run; the first run took 4.8s, presumably to warm something up)

Detect input language each time the hold_to_record shortcut is pressed

Hello. Currently the language is read from config.json at startup. It would be a great improvement if transcription.py read the currently selected Windows keyboard language each time, if the Whisper model allows it. That way, multilingual users could switch languages on the fly without editing config.json and restarting the app.

The following code returns the ISO-639-1 language code for the currently selected input method. It works on my PC. (Many thanks to ChatGPT - I'm very new to Python.)

import ctypes

# Windows-only: load User32.dll and Kernel32.dll
user32 = ctypes.WinDLL('user32', use_last_error=True)
kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)

# GetLocaleInfoW type that yields the ISO 639-1 code (e.g. "es"),
# independent of the Windows display language
LOCALE_SISO639LANGNAME = 0x59

# Languages to pass on to Whisper; add new ones here
SUPPORTED_LANGUAGES = {'en', 'es'}

def get_input_language(default='en'):
    # Get the window that currently has the keyboard focus
    foreground_window = user32.GetForegroundWindow()

    # Get the identifier of the thread that created the window
    thread_id = user32.GetWindowThreadProcessId(foreground_window, None)

    # Get the current keyboard layout (HKL) for the thread
    layout_id = user32.GetKeyboardLayout(thread_id)

    # The low word of the layout handle is the language ID (LANGID)
    language_id = layout_id & 0xFFFF

    # Buffer for the ISO 639-1 code
    language_code = ctypes.create_unicode_buffer(9)

    # GetLocaleInfoW returns 0 on failure
    if not kernel32.GetLocaleInfoW(language_id, LOCALE_SISO639LANGNAME, language_code, len(language_code)):
        return default

    code = language_code.value
    return code if code in SUPPORTED_LANGUAGES else default

I've changed transcription.py on my machine and sent you a pull request; it works on Windows. New languages should be added.

Hotkey doesn't work on MacOS

I have tried the default hotkey, as well as several custom ones, and whatever I do, nothing pops up when I press the hotkey.

On an unrelated note, this script requires sudo on Mac; not sure if there's a way to avoid this.
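One alternative worth trying, offered as an assumption rather than a tested fix: the `pynput` library's `keyboard.GlobalHotKeys` listens for global shortcuts and, on macOS, can work without root if the terminal is granted accessibility permission. Its hotkey strings use a different syntax from the `keyboard` package's `"ctrl+alt+space"`, so a conversion like the hypothetical one below would be needed; the rename table is an assumed mapping.

```python
from typing import Callable

# Names the `keyboard` package accepts that pynput spells differently (assumed mapping)
_RENAMES = {"win": "cmd", "windows": "cmd", "command": "cmd", "control": "ctrl", "option": "alt"}


def to_pynput_hotkey(combo: str) -> str:
    """Convert "ctrl+alt+space" style strings to pynput's "<ctrl>+<alt>+<space>"."""
    parts = []
    for raw in combo.lower().split("+"):
        name = _RENAMES.get(raw.strip(), raw.strip())
        # Single printable characters stay bare; named keys go in angle brackets.
        parts.append(name if len(name) == 1 else f"<{name}>")
    return "+".join(parts)


def listen(combo: str, on_activate: Callable[[], None]) -> None:
    """Block and call on_activate whenever the hotkey is pressed."""
    from pynput import keyboard  # assumed installed; imported lazily

    with keyboard.GlobalHotKeys({to_pynput_hotkey(combo): on_activate}) as listener:
        listener.join()
```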
