savbell / whisper-writer



💬📝 A small dictation app using OpenAI's Whisper speech recognition model.

License: GNU General Public License v3.0

Python 100.00%

openai whisper dictation speech-recognition speech-to-text typing-assistant faster-whisper openai-api openai-whisper


whisper-writer's Issues

Doesn't work for longer recordings using press_to_toggle

I used it for a 10-minute recording, and it kept repeating many sentences and did not transcribe the rest.

Fortunately, it saved the recording as a WAV file in the temp folder, so I could transcribe it with the official Whisper instead.

I used the large model in both. Since I used press_to_toggle, VAD is not the cause.

New features: automatically start and stop recording; beep sound once text is inserted

Hi,
my name is Daniele. I'm an Italian stenographer.
Basically, I transcribe what I hear into text using a steno keyboard.

I'm interested in better understanding OpenAI Whisper and its capabilities.
I don't have powerful hardware, so I'd like to use the OpenAI speech-to-text API.

Thank you very much for your project, WhisperWriter.

I tried it and it works for me.

Not many applications on GitHub that use Whisper for speech-to-text let you do all of the following at once:

Use a microphone as audio source.
Use Whisper API instead of local model.
Transcribe directly into any text editor.

So thanks for this opportunity.

It would be interesting to have two new features:

No need to press a shortcut to start recording again.
I mean: once I have pressed the shortcut (e.g. Ctrl+Shift+Space) the first time to start recording, and the recording has stopped automatically and the text has been transcribed, it would be great if I didn't need to press the shortcut again; instead, a new recording would start automatically, waiting for my words.
I would only need a new shortcut to stop recording for good.
Because I'm blind, a "beep" sound that tells me when the text has been transcribed would be useful; then I know I can speak again.
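The requested behaviour (record, transcribe, type, beep, then record again until a separate stop shortcut is pressed) can be sketched as a simple loop. This is a minimal sketch, not WhisperWriter's actual code: record_until_silence and transcribe_and_type are hypothetical callables standing in for the app's recording and typing steps, and the stop shortcut is assumed to set a threading.Event. The audible cue uses winsound.Beep on Windows and the terminal bell elsewhere.

```python
import sys
import threading
from typing import Callable


def beep() -> None:
    """Play a short audible cue (winsound is Windows-only; elsewhere ring the terminal bell)."""
    if sys.platform == "win32":
        import winsound
        winsound.Beep(880, 150)  # 880 Hz for 150 ms
    else:
        print("\a", end="", flush=True)


def continuous_dictation(
    record_until_silence: Callable[[], bytes],      # hypothetical: blocks until VAD detects silence
    transcribe_and_type: Callable[[bytes], str],    # hypothetical: transcription + auto-typing
    stop_event: threading.Event,                    # set by a (hypothetical) "stop for good" shortcut
    notify: Callable[[], None] = beep,
) -> int:
    """Keep recording and typing utterances until stop_event is set.

    Returns the number of utterances that were transcribed and typed.
    """
    count = 0
    while not stop_event.is_set():
        audio = record_until_silence()
        if audio:
            transcribe_and_type(audio)
            notify()  # cue: text inserted, the next recording starts right away
            count += 1
    return count
```

Because the next recording starts immediately after the beep, the only extra shortcut needed is the one that sets stop_event.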

Thanks a lot.

Daniele.

Seems not to work

Hello, sorry to bother you. I'm not good with Python, and I just wanted to use your whisper-writer together with Whisper to run some tests and to talk in real time online in another language.
After installing everything and following the steps, the app starts, but it does nothing except show the recording indicator. There is no error; it just doesn't type anything. Do you have a solution?

Immediately abort typing when the active window changes

I use these settings:

    "activation_key": "ctrl+win",
    "recording_mode": "hold_to_record",

Unfortunately, this sometimes (too often) causes Windows to perform actions before Whisper Writer starts to auto-type:

like opening the start menu as with pressing Win (even opening and closing it multiple times in succession before auto-typing), or
or, even worse, switching to a new desktop and opening all kinds of windows there, including the Settings app and some Windows program with an old user interface that I never use. Note that Ctrl+Win+D creates a new desktop, and Ctrl+Win+Left/Right switches desktops. For example, 12 new desktops may be created.
- EDIT: It happened again. The command line that brings up the old-GUI window according to System Informer is: "C:\WINDOWS\system32\rundll32.exe" dsquery.dll, OpenQueryWindow {16006700-87AD-11D0-9140-00AA00C16E65}.

Ideally, when Whisper Writer does its thing, no other software, including the OS, would even notice that Ctrl+Win has been pressed, if that's possible. Alternatively, you could stop auto-typing as soon as the active window changes.

Just starting to type with the start menu open isn't bad, however, because a text box simply appears that is auto-typed into. Since it's desirable to not lose your recording, you should let this happen by accepting an active-window change with the start menu window now being active. Then, one can at least copy-paste the text from the start menu text box. So, please make an exception for the start menu window (executable SearchApp.exe, window class Windows.UI.Core.CoreWindow, found with AutoHotkey "Window Spy" tool).

It may also be appropriate to refuse auto-typing entirely while the taskbar or the desktop window is active, as that may have contributed to the opening of various windows after a new desktop was created; this could still be a problem if, for example, you click on the desktop before auto-typing starts. However, the real problem may be that a modifier key is still held down while Whisper Writer auto-types. In that case, could Whisper Writer first wait until no keys are interfering anymore?
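The two safeguards proposed above (wait until the modifiers are released, then abort typing if focus moves, with an exception for the Start menu) can be sketched as follows. This is a hypothetical sketch, not WhisperWriter's code: all OS access is injected. On Windows, get_foreground could be backed by ctypes.windll.user32.GetForegroundWindow() and is_pressed by keyboard.is_pressed() from the keyboard library; the exact key names to pass for Ctrl and Win are an assumption to check against that library.

```python
import time
from typing import Callable, Iterable


def wait_for_modifiers_released(
    is_pressed: Callable[[str], bool],
    modifiers: Iterable[str],
    timeout: float = 2.0,
    poll: float = 0.01,
) -> bool:
    """Block until none of the given modifier keys is held; return False on timeout."""
    names = tuple(modifiers)
    deadline = time.monotonic() + timeout
    while any(is_pressed(name) for name in names):
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
    return True


def type_guarded(
    text: str,
    type_char: Callable[[str], None],
    get_foreground: Callable[[], object],
    allowed: Callable[[object], bool] = lambda window: False,
) -> int:
    """Type text one character at a time, aborting if the foreground window changes.

    A switch to a window for which allowed(window) is True (e.g. the Start menu)
    is accepted and becomes the new target. Returns the number of characters typed.
    """
    target = get_foreground()
    typed = 0
    for ch in text:
        current = get_foreground()
        if current != target:
            if not allowed(current):
                break  # focus moved somewhere unexpected: stop typing
            target = current
        type_char(ch)
        typed += 1
    return typed
```

Checking the foreground window before every character costs a little speed, but it bounds the damage to at most one keystroke after focus changes.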

Doesn't seem to support German special characters (Umlaute) like ä, ü, ö, ß

I attached a voice recording that should transcribe to "Es wäre schön, wenn das Programm auch Umlaute unterstützen würde."

https://www.dropbox.com/scl/fi/5jez2tuqyjzxxpemtmz12/Umlaute.mp3?rlkey=mure8ckqmc801dgmfsox6f9bq&dl=0

Instead the output is this: "Es wre schn, wenn das Programm auch Umlaute untersttzen wrde."
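The output above is exactly the input with every non-ASCII character deleted, and the same holds for the French report further down ("Touché, coulé" typed as "Touch, coul"). That suggests, though it is not confirmed, that transcription is fine and the characters are lost in the typing step; a traceback elsewhere on this page shows main.py typing with pyautogui.write(). The check below only demonstrates that pattern; drop_non_ascii is a hypothetical helper, not project code.

```python
def drop_non_ascii(text: str) -> str:
    """Simulate a typing backend that silently skips characters it cannot type."""
    return "".join(ch for ch in text if ch.isascii())


spoken = "Es wäre schön, wenn das Programm auch Umlaute unterstützen würde."
print(drop_non_ascii(spoken))
# Es wre schn, wenn das Programm auch Umlaute untersttzen wrde.
```

A possible workaround, untested here, is to type with keyboard.write() from the keyboard library (already a dependency), which is documented to send characters that are not on the current layout as explicit Unicode input.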

Issue with Numba.jit

G:\Git clones\AI\whisper-writer\venv\Lib\site-packages\whisper\timing.py:57: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
@numba.jit
Traceback (most recent call last):
File "G:\Git clones\AI\whisper-writer\src\main.py", line 94, in <module>
config = load_config_with_defaults()
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "G:\Git clones\AI\whisper-writer\src\main.py", line 52, in load_config_with_defaults
user_config = json.load(config_file)
^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\jason\AppData\Local\Programs\Python\Python311\Lib\json\__init__.py", line 293, in load
return loads(fp.read(),
^^^^^^^^^^^^^^^^
File "C:\Users\jason\AppData\Local\Programs\Python\Python311\Lib\json\__init__.py", line 346, in loads
return _default_decoder.decode(s)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\jason\AppData\Local\Programs\Python\Python311\Lib\json\decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\jason\AppData\Local\Programs\Python\Python311\Lib\json\decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 12 column 21 (char 265)
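The Numba message is only a deprecation warning; the crash is json.load() rejecting config.json at line 12, column 21. "Expecting value" generally means a value there is missing or is not valid JSON, for example Python-style None or True instead of null or true. A loader that points at the offending character could look like this minimal sketch (load_config is a hypothetical replacement, not the project's load_config_with_defaults):

```python
import json


def load_config(path: str) -> dict:
    """Load a JSON config file and turn syntax errors into a readable message."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    try:
        return json.loads(text)
    except json.JSONDecodeError as e:
        lines = text.splitlines()
        bad_line = lines[e.lineno - 1] if 0 < e.lineno <= len(lines) else ""
        raise ValueError(
            f"{path} is not valid JSON: {e.msg} at line {e.lineno}, column {e.colno}\n"
            f"    {bad_line}\n"
            f"    {' ' * (e.colno - 1)}^"  # caret under the offending character
        ) from None
```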

Please add support for local whisper api

https://github.com/mudler/LocalAI

Support could simply be added by setting the base URL (api_base is the module-level setting in openai-python versions before 1.0):

import openai
openai.api_base = "http://someinternalhost.local/v1"

The URL should be configurable in the config.json file:

"api_options": { "model": "whisper-1", "language": null, "temperature": 0.0, "initial_prompt": null, "base_api": "http://someinternalhost.local/v1" },

Could not load library cudnn_ops_infer64_8.dll. (Windows)

Could not load library cudnn_ops_infer64_8.dll. Error code 126
Please make sure cudnn_ops_infer64_8.dll is in your library path!

Tried on Windows 10 and 11.
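Error code 126 is Windows' "module not found", which can also mean that a dependency of the named DLL is missing. A quick way to see whether the CUDA 11 libraries are visible at all is to search PATH for them. This is a diagnostic sketch; the DLL names are the ones from the error above plus cuBLAS for CUDA 11, and find_on_path is a hypothetical helper.

```python
import os
from typing import Optional


def find_on_path(filename: str, search_path: Optional[str] = None) -> Optional[str]:
    """Return the full path of the first PATH entry that contains filename, or None."""
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing separator
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None


for dll in ("cudnn_ops_infer64_8.dll", "cublas64_11.dll"):
    print(dll, "->", find_on_path(dll) or "NOT FOUND")
```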

Consider merging with this project

Your UX, which lets you dictate with Whisper anywhere, is awesome. I'd love to see it combined with this project's engine, which runs Whisper locally on your own GPU (even low-to-mid-range ones like a 1050) and does not require users to install Python, which is a big hurdle: https://github.com/Const-me/Whisper

And congrats on the amazing project!

CUDA requirements need to match project dependencies. Distribute the libraries?

I absolutely love this project, the hold_to_record functionality is excellent! Thanks for sharing! On to the issue:

Less experienced users like me could use more detailed instructions on how to get GPU acceleration on Windows. README.md lists the requirements for GPU acceleration as "cuBLAS for CUDA 11 and cuDNN 8 for CUDA 11". There are some issues here:

The instructions don't specify whether those libraries should be installed system-wide in Windows or in the Python environment. I guess it's Windows.
cuBLAS cannot be downloaded on its own; it comes with either the CUDA Toolkit or the HPC SDK. I guess we should install the former, but the instructions should say so.
requirements.txt pins a specific version of torch (PyTorch), 2.0.1, which apparently only works with CUDA 11.8 or 12.1. The CUDA version must also match both the cuDNN 8 library and the torch build, since both come in different builds for CUDA 11.x and CUDA 12.x (see https://pytorch.org/get-started/previous-versions/).
The current CUDA Toolkit version, 12.3, is the one NVIDIA offers for download by default, and it is apparently incompatible with the PyTorch version used in the project.
README.md doesn't say which minor version of cuDNN 8 is required (8.x, but which x?).
The current cuDNN version, 9.0.0, is the one NVIDIA offers for download by default.
This is the main issue: archived cuDNN 8.x versions (https://developer.nvidia.com/rdp/cudnn-archive) are offered without a Windows installer, only as a collection of .dll files.

Solutions:

I solved it, but I don't know which fix did it, because I applied both of these at the same time:

I upgraded torch running pip3 install torch torchaudio --index-url https://download.pytorch.org/whl/cu118
I copied the CUDA dlls as distributed by faster-whisper (https://github.com/Purfview/whisper-standalone-win/releases/tag/libs) to the Windows System32 folder. Note that nobody should ever, ever, do this. So it'd be great if whisper-writer could distribute the libraries and import them directly as suggested here SYSTRAN/faster-whisper#153 (comment)
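A less invasive alternative to copying DLLs into System32 is to keep them in a folder next to the app and put that folder at the front of PATH for the current process only, before faster_whisper (and therefore ctranslate2) is imported. This is a sketch under assumptions: the folder name cuda_libs is hypothetical, and it relies on Windows' standard DLL search order including the PATH directories.

```python
import os


def prepend_dll_dir(directory: str) -> bool:
    """Put directory at the front of PATH for this process only.

    Returns False if the directory does not exist; never adds duplicates.
    """
    directory = os.path.abspath(directory)
    if not os.path.isdir(directory):
        return False
    current = os.environ.get("PATH", "")
    parts = current.split(os.pathsep) if current else []
    if directory not in parts:
        os.environ["PATH"] = os.pathsep.join([directory] + parts)
    return True


# Usage (must run before "import faster_whisper"):
# prepend_dll_dir(os.path.join(os.path.dirname(os.path.abspath(__file__)), "cuda_libs"))
```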

Cannot install on Python version 3.12

So I installed Python 3.11 (on Windows) just to run this app, and I set up the venv like this:

$ py -3.11 -m venv venv

$ ./venv/Scripts/activate

But I'm not experienced with Python, and it's apparently still running Python 3.12?

$ pip install -r requirements.txt

Collecting numba==0.57.0 (from -r requirements.txt (line 26))
  Using cached numba-0.57.0.tar.gz (2.5 MB)
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'error'
  error: subprocess-exited-with-error

  Getting requirements to build wheel did not run successfully.
  exit code: 1

    File "<string>", line 48, in _guard_py_ver
  RuntimeError: Cannot install on Python version 3.12.1; only versions >=3.8,<3.12 are supported.

$ python --version
Python 3.12.1

Do you know what to do?
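The pip output shows the interpreter actually in use is 3.12.1, so the venv was probably not activated in that shell. If the shell is Git Bash, the activate script has to be sourced (source venv/Scripts/activate); running it as ./venv/Scripts/activate executes it in a subshell, so the activation is lost. Calling the venv's interpreter directly (venv/Scripts/python -m pip install -r requirements.txt) sidesteps activation entirely. The small check below, a hypothetical helper, prints which interpreter is running and whether it satisfies numba 0.57.0's ">=3.8,<3.12" constraint quoted in the error.

```python
import sys


def python_supported(version=None, lower=(3, 8), upper=(3, 12)) -> bool:
    """True if the (major, minor) version is within [lower, upper)."""
    version = tuple((version or sys.version_info)[:2])
    return lower <= version < upper


if __name__ == "__main__":
    print(sys.executable, sys.version.split()[0],
          "OK" if python_supported() else "unsupported by numba==0.57.0")
```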

Accents are not correctly transcribed

When I press Ctrl+Alt+Space and speak French, I can see in the Python terminal that the accents are transcribed correctly, but in the target window the accents disappear. To test, just say "Touché, coulé", which comes out as "Touch, coul. "

Thanks a lot for this nice software; it helps a lot. I can't believe this is the only open-source Whisper-based dictation tool...

initial_prompt doesn't seem to be working

I tried setting initial_prompt to condition the output of the whisper-1 API, but whatever is set in the config file doesn't seem to influence the output returned by the API. I even tested it explicitly by saying "insert HELLO in front of every sentence", but it doesn't seem to work. Is any additional configuration required, or am I missing something?

There appear to be 1 leaked semaphore objects to clean up at shutdown

I launched the app and pressed the shortcut, but nothing happened.
The terminal says:

UserWarning: resource_tracker:
There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '

I'm using an M2 Mac with Python 3.11.3.

Hotkey doesn't work on MacOS

I have tried the default hotkey, as well as several custom ones, and whatever I do, nothing pops up when I press the hotkey.

On an unrelated note, this script requires sudo on Mac; not sure if there's a way to avoid this.

Latency/Using another backend

As mentioned in another thread, the UI of this tool is great for its purpose, but the latency is significant. Tests done on my RTX4090 laptop (Legion 7i pro from this year):

small.en
1 word: 4.98s (0.2 words/sec)
7 word sentence: 7.43s (0.94 words/sec)
53 words, 2 sentences: 21.3s (2.49 words/s)

large-v2
7 word sentence: 16.84s (0.42 words/sec)
52 words, 2 sentences: 31.2s (1.67 words/s)

If we could instead call one of the C++-based ports (https://github.com/Const-me/Whisper or https://github.com/guillaumekln/faster-whisper), this could be significantly reduced. I tested the first of those by recording the same speech to a file and transcribing it to a text file:
1 word: 0.687s
5 words: 0.844s
53 words: 1.485s (on the second run; the first run took 4.8s, presumably to warm something up)
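To compare backends the same way as the figures above (words per second = words ÷ seconds, e.g. 53 / 21.3 ≈ 2.49), a small timing harness can run each backend on the same file several times, since the first run may include warm-up. This is a sketch: transcribe is any function mapping an audio path to text, and benchmark is a hypothetical helper.

```python
import time
from typing import Callable, List, Tuple


def benchmark(transcribe: Callable[[str], str], audio_path: str,
              runs: int = 2) -> List[Tuple[float, int, float]]:
    """Return (seconds, words, words_per_second) for each run."""
    results = []
    for _ in range(runs):
        start = time.perf_counter()
        text = transcribe(audio_path)
        elapsed = time.perf_counter() - start
        words = len(text.split())
        results.append((elapsed, words, words / elapsed if elapsed > 0 else float("inf")))
    return results
```

With faster-whisper, for instance, transcribe could wrap WhisperModel("small.en", device="cuda") and join the .text of the segments returned by model.transcribe(path); loading the model outside the timed function keeps model-load time out of the measurement.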

Whisper-writer Popup Stuck on "Recording" After Transcription

Thank you for the fantastic whisper-writer app! I got CUDA working thanks to the helpful reply in #33

I have the app set to record only when I press the default key combo. After transcription, the popup switches back to "Recording" even though I'm not recording. It stays there until closed manually.

Has anyone else faced this issue? Any help would be appreciated.

nixkeyboard.py: You must be root to use this library on linux...

I tried to install whisper-writer on Ubuntu jammy

Requirements could only be satisfied by removing the pinned versions of "av" and "faster-whisper" in requirements.txt.

Startup now fails

Traceback (most recent call last):
  File "/home/fabian/whisper-writer/src/main.py", line 136, in <module>
    keyboard.add_hotkey(config['activation_key'], on_shortcut)
  File "/home/fabian/whisper-writer/venv/lib/python3.10/site-packages/keyboard/__init__.py", line 639, in add_hotkey
    _listener.start_if_necessary()
  File "/home/fabian/whisper-writer/venv/lib/python3.10/site-packages/keyboard/_generic.py", line 35, in start_if_necessary
    self.init()
  File "/home/fabian/whisper-writer/venv/lib/python3.10/site-packages/keyboard/__init__.py", line 196, in init
    _os_keyboard.init()
  File "/home/fabian/whisper-writer/venv/lib/python3.10/site-packages/keyboard/_nixkeyboard.py", line 113, in init
    build_device()
  File "/home/fabian/whisper-writer/venv/lib/python3.10/site-packages/keyboard/_nixkeyboard.py", line 109, in build_device
    ensure_root()
  File "/home/fabian/whisper-writer/venv/lib/python3.10/site-packages/keyboard/_nixcommon.py", line 174, in ensure_root
    raise ImportError('You must be root to use this library on linux.')
ImportError: You must be root to use this library on linux.

Issue with Inconsistent Transcription Quality in Whisper Writer

I have recently set up Whisper Writer, and it appears very promising. However, I am facing an issue with inconsistent transcription quality.

In some recordings, Whisper Writer transcribes everything I say perfectly. However, in other instances, it produces nonsensical output. For example:

Transcription:  Mae'n gweithio, mae'n gweithio, mae'n gweithio, mae'n gweithio, mae'n gweithio, mae'n gweithio, mae'n gweithio.
Post-processed transcription: Mae'n gweithio, mae'n gweithio, mae'n gweithio, mae'n gweithio, mae'n gweithio, mae'n gweithio, mae'n gweithio.

In the above example, I was speaking in normal, clear sentences and was not repeating the same words.
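The output here is Welsh ("Mae'n gweithio" roughly means "it's working"), repeated over and over, which looks like Whisper misdetecting the language and then looping; the same looping was reported for long recordings at the top of this list. Setting "language" explicitly in the config may help with the misdetection (untested here). Independently, an app could refuse to type output that is dominated by one repeated phrase. The heuristic below is a hypothetical sketch with arbitrary thresholds, not a Whisper feature.

```python
import re
from collections import Counter


def looks_repetitive(text: str, min_repeats: int = 4, max_ngram: int = 6,
                     coverage: float = 0.8) -> bool:
    """Heuristic: True if one short phrase (1..max_ngram words) repeats at least
    min_repeats times and its repeats cover most of the transcript."""
    words = re.findall(r"\w[\w']*", text.lower())
    for n in range(1, max_ngram + 1):
        if len(words) < n:
            break
        grams = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
        _, count = grams.most_common(1)[0]
        if count >= min_repeats and count * n >= coverage * len(words):
            return True
    return False
```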

Error: NSInternalInconsistencyException

Hello! First: thank you so much for building this!
Disclaimer: I am new to Python and GitHub, and I'd be grateful if someone could help me with my installation :)
I spent yesterday and today trying to install it on my machine
(macOS 12.5.1) and found out that the Python installed with Homebrew didn't have tkinter,
so I installed the official Python 3.11 instead.
tkinter seems to work.
Then I found out that I need to run sudo python run.py (like I said, I'm a beginner).

But now I run into the following after changing "activation_key" to "f4" (because Ctrl+Alt+Space didn't do anything for me).

I start up by navigating to my directory and activating the virtual environment:
cd ~/Documents/Projects/whisper-writer
source venv/bin/activate
sudo python run.py

Then everything starts up without errors.

When I press F4, I get this error:

Starting WhisperWriter...
/Users/username/Documents/Projects/whisper-writer/venv/lib/python3.11/site-packages/whisper/timing.py:57: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
@numba.jit
Script activated. Whisper is set to run using a local model. To change this, modify the "use_api" value in the src\config.json file.
Press F4 to start recording and transcribing. Press Ctrl+C on the terminal window to quit.
^[OS2023-11-15 10:23:25.745 Python[58787:801092] *** Terminating app due to uncaught exception 'NSInternalInconsistencyException', reason: 'NSWindow drag regions should only be invalidated on the Main Thread!'
*** First throw call stack:
(
0 CoreFoundation 0x00000001a63a91a8 __exceptionPreprocess + 240
1 libobjc.A.dylib 0x00000001a60f3e04 objc_exception_throw + 60
2 CoreFoundation 0x00000001a63d4128 _CFBundleGetValueForInfoKey + 0
3 AppKit 0x00000001a8eb7930 -[NSWindow(NSWindow_Theme) _postWindowNeedsToResetDragMarginsUnlessPostingDisabled] + 372
4 AppKit 0x00000001a8ea292c -[NSWindow _initContent:styleMask:backing:defer:contentView:] + 948
5 AppKit 0x00000001a8ea256c -[NSWindow initWithContentRect:styleMask:backing:defer:] + 56
6 libtk8.6.dylib 0x0000000106b75ddc TkMacOSXMakeRealWindowExist + 572
7 libtk8.6.dylib 0x0000000106b75a8c TkWmMapWindow + 56
8 libtk8.6.dylib 0x0000000106adedc4 Tk_MapWindow + 152
9 libtk8.6.dylib 0x0000000106ae7130 MapFrame + 76
10 libtcl8.6.dylib 0x0000000106a1c9b0 TclServiceIdle + 84
11 libtcl8.6.dylib 0x0000000106a0106c Tcl_DoOneEvent + 296
12 libtk8.6.dylib 0x0000000106b68ca8 TkpInit + 800
13 libtk8.6.dylib 0x0000000106ae007c Initialize + 2292
14 _tkinter.cpython-311-darwin.so 0x00000001040da368 Tcl_AppInit + 92
15 _tkinter.cpython-311-darwin.so 0x00000001040da000 Tkapp_New + 548
16 _tkinter.cpython-311-darwin.so 0x00000001040d9dd8 _tkinter_create_impl + 268
17 _tkinter.cpython-311-darwin.so 0x00000001040d9a10 _tkinter_create + 240
18 Python 0x00000001019fa034 cfunction_vectorcall_FASTCALL + 80
19 Python 0x0000000101abbf84 _PyEval_EvalFrameDefault + 52572
20 Python 0x0000000101ac19ec _PyEval_Vector + 156
21 Python 0x0000000101995098 _PyObject_FastCallDictTstate + 96
22 Python 0x0000000101a22754 slot_tp_init + 180
23 Python 0x0000000101a190d8 type_call + 136
24 Python 0x0000000101994d78 _PyObject_MakeTpCall + 128
25 Python 0x0000000101abc06c _PyEval_EvalFrameDefault + 52804
26 Python 0x0000000101ac19ec _PyEval_Vector + 156
27 Python 0x0000000101999158 method_vectorcall + 364
28 Python 0x0000000101bbeeb0 thread_run + 220
29 Python 0x0000000101b3aff4 pythread_wrapper + 48
30 libsystem_pthread.dylib 0x00000001a625c26c _pthread_start + 148
31 libsystem_pthread.dylib 0x00000001a625708c thread_start + 8
)
libc++abi: terminating with uncaught exception of type NSException

Thank you very much if you can help! I greatly appreciate your time!
Best wishes
Kasper
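The exception text says NSWindow work must happen on the main thread, and the stack shows the Tk window being created inside thread_run, i.e. on a worker thread. On macOS, the usual remedy is to create and update all Tk widgets on the main thread and have worker threads hand UI work over through a queue. The sketch below shows that hand-off; run_on_ui_thread and drain_ui_queue are hypothetical names, not part of WhisperWriter.

```python
import queue
from typing import Callable

ui_queue: "queue.Queue[Callable[[], None]]" = queue.Queue()


def run_on_ui_thread(fn: Callable[[], None]) -> None:
    """Called from worker threads: schedule fn to run on the main (UI) thread."""
    ui_queue.put(fn)


def drain_ui_queue() -> int:
    """Called on the main thread: run all pending UI callbacks, return how many ran."""
    ran = 0
    while True:
        try:
            fn = ui_queue.get_nowait()
        except queue.Empty:
            return ran
        fn()
        ran += 1


# Tk wiring on the main thread (sketch):
# root = tkinter.Tk()
# def poll():
#     drain_ui_queue()
#     root.after(50, poll)   # check again in 50 ms
# root.after(50, poll)
# root.mainloop()
```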

Feature request: use shortcut to end recording

I propose a use_shortcut_to_end config setting. If True, it would ignore silence_duration, and end the recording when the user presses the shortcut key a second time.
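The proposed option boils down to a small state machine: the shortcut toggles recording, and silence only ends the recording when use_shortcut_to_end is False. A minimal sketch, where ToggleRecorder is a hypothetical class and not existing project code:

```python
from dataclasses import dataclass


@dataclass
class ToggleRecorder:
    """Recording state for a hypothetical use_shortcut_to_end config option."""
    use_shortcut_to_end: bool = True
    recording: bool = False

    def on_shortcut(self) -> str:
        """First press starts recording, second press stops it."""
        self.recording = not self.recording
        return "start" if self.recording else "stop"

    def on_silence(self) -> str:
        """Called when silence_duration elapses; ignored if use_shortcut_to_end is set."""
        if self.recording and not self.use_shortcut_to_end:
            self.recording = False
            return "stop"
        return "ignored"
```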

Stuck in Creating Local Model...

Hi.

I get stuck at the "creating local model..." terminal output, and nothing happens.
It is not using my GPU or CPU; it's just sitting idle.

Doesn't work with API

I believe you need to initialize local_model to None in the main script to fix this.
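The suggested fix means defining local_model on every code path, so API mode never references an undefined name. main.py isn't shown here, so this is only a sketch of the idea; init_model and load_local_model are hypothetical names.

```python
from typing import Callable, Optional


def init_model(config: dict, load_local_model: Callable[[dict], object]) -> Optional[object]:
    """Return a local model only when use_api is False; otherwise None."""
    local_model = None  # always defined, even in API mode
    if not config.get("use_api", False):
        local_model = load_local_model(config)
    return local_model
```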

Confusing instructions

I'm sure your instructions are handy and easy to understand for those who actually know how to code or commonly work with GitHub,

but I don't.

It says something about a virtual environment and what to do once you've installed something, and then the text turns red and it's not very specific about what it wants you to do, so I kind of just gave up.

Requirements incomplete and device detection does not work properly

"Auto" for device does not correctly detect whether the complete setup needed to use CUDA with cuDNN is available.
GPU processing seems to require the cuDNN libraries.
Downloading cuDNN requires signing up for the NVIDIA Developer Program.
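A stricter "auto" mode could require both a CUDA device and the cuDNN libraries before choosing the GPU, and fall back to the CPU otherwise. In this sketch the probes are injected: cuda_device_count could be ctranslate2.get_cuda_device_count(), and cudnn_available could check that cudnn_ops_infer64_8.dll is found on PATH on Windows. choose_device is a hypothetical helper, not the project's current logic.

```python
from typing import Callable


def choose_device(requested: str,
                  cuda_device_count: Callable[[], int],
                  cudnn_available: Callable[[], bool]) -> str:
    """Resolve "auto" to "cuda" only if a GPU and cuDNN are both usable."""
    if requested != "auto":
        return requested  # explicit "cpu" or "cuda" is respected as-is
    try:
        if cuda_device_count() > 0 and cudnn_available():
            return "cuda"
    except Exception:
        pass  # a failing probe counts as "not available"
    return "cpu"
```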

Whisper as local api

Would it be possible to use local api instead of web api?

Push to talk implementation

Hi! I wrote a similar tool, but mine uses push-to-talk
https://github.com/filyp/whisper-simple-dictation

I saw you want to have a push-to-talk option too, so you can have a look there :)

Seeming to pull erroneous OpenAI API key - different from what I set and verify in .env

I downloaded a fresh copy of this repo, ran the install with no errors, and configured my .env with my OpenAI API key. However, I get the output below, which says the API key is incorrect and shows part of an API key that I have verified is not mine and is not present in my OpenAI account.

I'm not sure why it's using this key or where it's getting it from. I have never used this repo on this machine before.

Starting WhisperWriter...
Script activated. Whisper is set to run using OpenAI's API. To change this, modify the "use_api" value in the src\config.json file.
Press Ctrl+Shift+Space to start recording and transcribing. Press Ctrl+C on the terminal window to quit.
Recording...
Recording finished. Size: 73440
Transcribing audio file...
Traceback (most recent call last):
  File "C:\Users\jerem\OneDrive\Documents\Main Documents\General\GitHub\whisper-writer\src\transcription.py", line 94, in record_and_transcribe
    response = openai.Audio.transcribe(model=api_options['model'],
  File "C:\Users\jerem\OneDrive\Documents\Main Documents\General\GitHub\whisper-writer\venv\Lib\site-packages\openai\api_resources\audio.py", line 65, in transcribe
    response, _, api_key = requestor.request("post", url, files=files, params=data)
  File "C:\Users\jerem\OneDrive\Documents\Main Documents\General\GitHub\whisper-writer\venv\Lib\site-packages\openai\api_requestor.py", line 230, in request
    resp, got_stream = self._interpret_response(result, stream)
  File "C:\Users\jerem\OneDrive\Documents\Main Documents\General\GitHub\whisper-writer\venv\Lib\site-packages\openai\api_requestor.py", line 624, in _interpret_response
    self._interpret_response_line(
  File "C:\Users\jerem\OneDrive\Documents\Main Documents\General\GitHub\whisper-writer\venv\Lib\site-packages\openai\api_requestor.py", line 687, in _interpret_response_line
    raise self.handle_error_response(
openai.error.AuthenticationError: Incorrect API key provided: sk-NWgh1***************************************jtd2. You can find your API key at https://platform.openai.com/account/api-keys.
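One possible explanation, not confirmed by the report, is an OPENAI_API_KEY that is already set in the user or system environment: python-dotenv's load_dotenv() does not overwrite existing variables unless it is called with override=True, so if the project loads .env that way (an assumption), the older key wins. The diagnostic below compares the two sources and prints only masked keys. read_dotenv is a deliberately minimal KEY=VALUE reader for this check, not a replacement for python-dotenv, and all names are hypothetical.

```python
import os
from typing import Dict, Optional


def read_dotenv(path: str) -> Dict[str, str]:
    """Minimal KEY=VALUE reader for diagnostics (no multiline values or escapes)."""
    values: Dict[str, str] = {}
    with open(path, encoding="utf-8") as f:
        for raw in f:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def mask(key: Optional[str]) -> str:
    """Show only the start and end of a key."""
    if not key:
        return "<unset>"
    if len(key) < 12:
        return "***"
    return f"{key[:7]}...{key[-4:]}"


def diagnose_api_key(dotenv_path: str = ".env", name: str = "OPENAI_API_KEY") -> str:
    from_file = read_dotenv(dotenv_path).get(name) if os.path.isfile(dotenv_path) else None
    from_env = os.environ.get(name)
    if from_env and from_file and from_env != from_file:
        return (f"{name} is already set in the environment ({mask(from_env)}) "
                f"and differs from {dotenv_path} ({mask(from_file)})")
    return f"environment: {mask(from_env)}, {dotenv_path}: {mask(from_file)}"
```

Run print(diagnose_api_key(".env")) from the whisper-writer folder before the app loads .env; a mismatch points at the environment variable as the source of the unknown key.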

Possible incompatibility with other AI installations?

I tried installing Whisper before through another Github project and it completely destroyed my TextWebUI (for LLMs) and A1111 (for Stable Diffusion). If I'm not mistaken, they use Python 3.10. This project needs Python 3.11. Is it okay to install it or should I take some precautions first?
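A common precaution is to give whisper-writer its own virtual environment created with the 3.11 interpreter (e.g. py -3.11 -m venv venv, as in another issue on this page), so its packages are installed there rather than into the Python 3.10 environments the other tools use. Note that a venv isolates Python packages only; system-wide installs such as CUDA libraries are shared. This small check, a hypothetical helper, confirms a script is running inside a venv:

```python
import sys


def in_virtualenv() -> bool:
    """True inside a venv: its prefix differs from the base Python installation."""
    return sys.prefix != sys.base_prefix


print("inside venv:", in_virtualenv(), "| prefix:", sys.prefix)
```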

Multiple Issues

Hello,

First of all, installing this script caused conflicts with "numba", which is not compatible with the latest version of Python (3.12.0), so I had to additionally install Python 3.11.6 manually and use that version explicitly for the virtual environment, but that wasn't a big deal.

It worked the first few times when I provided my API key and had "use_api: true," set but was very, very slow at transcribing even one or two word phrases. I then tried switching this to false to see if it would be any better and it just got stuck on "transcribing" forever. I exited the script and switched "use_api:" back to true and then tried running it again but now I'm getting the following error:

(venv) C:\Users\rich\whisper-writer>python run.py
Starting WhisperWriter...
C:\Users\rich\whisper-writer\venv\Lib\site-packages\whisper\timing.py:57: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
  @numba.jit
Script activated. Whisper is set to run using OpenAI's API. To change this, modify the "use_api" value in the src\config.json file.
Press Ctrl+Alt+Space to start recording and transcribing. Press Ctrl+C on the terminal window to quit.
Recording...
Recording finished. Size: 9120
Transcribing audio file...
Transcription: How are you?
Exception in thread Thread-2 (process):
Traceback (most recent call last):
  File "C:\Users\rich\AppData\Local\Programs\Python\Python311\Lib\threading.py", line 1045, in _bootstrap_inner
    self.run()
  File "C:\Users\rich\AppData\Local\Programs\Python\Python311\Lib\threading.py", line 982, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\rich\whisper-writer\venv\Lib\site-packages\keyboard\_generic.py", line 58, in process
    if self.pre_process_event(event):
  File "C:\Users\rich\whisper-writer\venv\Lib\site-packages\keyboard\__init__.py", line 218, in pre_process_event
    callback(event)
  File "C:\Users\rich\whisper-writer\venv\Lib\site-packages\keyboard\__init__.py", line 649, in <lambda>
    handler = lambda e: (event_type == KEY_DOWN and e.event_type == KEY_UP and e.scan_code in _logically_pressed_keys) or (event_type == e.event_type and callback())
  File "C:\Users\rich\whisper-writer\src\main.py", line 85, in on_shortcut
    pyautogui.write(transcribed_text, interval=config['writing_key_press_delay'])
  File "C:\Users\rich\whisper-writer\venv\Lib\site-packages\pyautogui\__init__.py", line 593, in wrapper
    failSafeCheck()
  File "C:\Users\rich\whisper-writer\venv\Lib\site-packages\pyautogui\__init__.py", line 1734, in failSafeCheck
    raise FailSafeException(
pyautogui.FailSafeException: PyAutoGUI fail-safe triggered from mouse moving to a corner of the screen. To disable this fail-safe, set pyautogui.FAILSAFE to False. DISABLING FAIL-SAFE IS NOT RECOMMENDED.

How to re-open after restarting the computer

Thank you for the work you put into this!

I was able to get this working so that Ctrl+Shift+Space started the transcription. However, after restarting my computer, I don't know how to get it up and running again.

I'm sorry, but I know little about code, and I used ChatGPT to help me get it installed.
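After a restart, the same steps as during setup apply: open a terminal in the whisper-writer folder, activate the virtual environment, and run python run.py. Running the venv's own interpreter directly has the same effect for starting a script, so a tiny launcher can skip the activation step. This sketch assumes the environment folder is named venv, as in the setups shown elsewhere on this page; venv_python and launch are hypothetical helpers.

```python
import os
import subprocess


def venv_python(project_dir: str) -> str:
    """Path to the venv interpreter: Scripts\\python.exe on Windows, bin/python elsewhere."""
    if os.name == "nt":
        return os.path.join(project_dir, "venv", "Scripts", "python.exe")
    return os.path.join(project_dir, "venv", "bin", "python")


def launch(project_dir: str) -> int:
    """Start run.py with the venv interpreter and wait for it to exit."""
    return subprocess.call([venv_python(project_dir), "run.py"], cwd=project_dir)
```

Usage: launch(r"C:\path\to\whisper-writer"), where the path is wherever the project was cloned.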

ModuleNotFoundError: No module named 'keyboard'

Hello,

I'm trying to run whisper-writer, but this message comes up: ModuleNotFoundError: No module named 'keyboard'

pv@portpat:~/whisper-writer$ python3 run.py
Starting WhisperWriter...
Traceback (most recent call last):
File "/home/pv/whisper-writer/src/main.py", line 6, in <module>
import keyboard
ModuleNotFoundError: No module named 'keyboard'

Any idea how to fix this?

Thanks, Patrick
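The prompt shows python3 run.py being started from a plain shell, so one likely cause (an assumption, since the install steps aren't shown) is that the system interpreter is running instead of the virtual environment where requirements.txt was installed. This check, run with the same command used to start the app, lists which of the modules seen in tracebacks on this page that interpreter cannot import; missing_modules is a hypothetical helper.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print(missing_modules(["keyboard", "pyautogui", "webrtcvad", "whisper", "openai", "audioplayer"]))
```

If everything is reported missing, activate the venv (source venv/bin/activate on Linux) and start the app again.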

Missing requirements: pycairo PyGObject

First startup of WhisperWriter fails:

Starting WhisperWriter...
Traceback (most recent call last):
  File "/home/fabian/whisper-writer/src/main.py", line 7, in <module>
    from audioplayer import AudioPlayer
  File "/home/fabian/whisper-writer/venv/lib/python3.10/site-packages/audioplayer/__init__.py", line 12, in <module>
    from .audioplayer_linux import AudioPlayerLinux as AudioPlayer
  File "/home/fabian/whisper-writer/venv/lib/python3.10/site-packages/audioplayer/audioplayer_linux.py", line 7, in <module>
    import gi
ModuleNotFoundError: No module named 'gi'

I solved the problem by adding some dependencies according to https://stackoverflow.com/questions/71369726/no-module-named-gi:

sudo apt install libcairo2-dev libxt-dev libgirepository1.0-dev
pip install pycairo PyGObject

Now startup fails because I am not root (see #42).

Won't stop recording

The script starts fine, and says that it is recording when I push the key combo, but it never stops recording, even when I mute my microphone.
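Whatever decides that speech has ended, a recorder is safer with a hard cap so it cannot run forever if silence is never detected. The sketch below is a generic illustration, not WhisperWriter's webrtcvad-based logic: it treats a frame as silent when its RMS level falls below an arbitrary threshold, stops after silence_ms of consecutive silent frames (comparable to the silence_duration setting mentioned in another issue), and always stops at max_ms.

```python
import math
from typing import Iterable, List, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one frame of samples in [-1, 1]."""
    return math.sqrt(sum(x * x for x in frame) / len(frame)) if frame else 0.0


def record_until_silence(frames: Iterable[Sequence[float]], frame_ms: int = 30,
                         silence_ms: int = 900, max_ms: int = 30_000,
                         threshold: float = 0.01) -> List[Sequence[float]]:
    """Collect frames until enough trailing silence or the hard time cap is reached."""
    kept: List[Sequence[float]] = []
    silent_run = 0
    elapsed = 0
    for frame in frames:
        kept.append(frame)
        elapsed += frame_ms
        silent_run = silent_run + frame_ms if rms(frame) < threshold else 0
        if silent_run >= silence_ms or elapsed >= max_ms:
            break
    return kept
```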

Issues with installation -> webrtcvad error

Hello there,

This tool is awesome! On one of my machines it runs very well, but on the other I still get an error while setting it up.

What I have done so far:

pip install webrtcvad (results in the same error shown below)
pip install webrtcvad-wheels (successful)
pip install nes-py --no-cache-dir
pip install --upgrade pip setuptools wheel
pip3 install webrtcvad-wheels
pip install Cmake
reinstalled Python (using 3.11.7, on both machines)
Python reinstall (3.11.7; 3.11.6)
setup everything from scratch several times
Installed & reinstalled Build tools - C++

Error description
I receive the following error, when I execute the command ((venv) C:\Windows\System32\whisper-writer>pip install -r requirements.txt):

Building wheels for collected packages: webrtcvad
Building wheel for webrtcvad (pyproject.toml) ... error
error: subprocess-exited-with-error

× Building wheel for webrtcvad (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [19 lines of output]
running bdist_wheel
running build
running build_py
creating build
creating build\lib.win-amd64-cpython-311
copying webrtcvad.py -> build\lib.win-amd64-cpython-311
running build_ext
building 'webrtcvad' extension
creating build\temp.win-amd64-cpython-311
creating build\temp.win-amd64-cpython-311\Release
creating build\temp.win-amd64-cpython-311\Release\cbits
creating build\temp.win-amd64-cpython-311\Release\cbits\webrtc
creating build\temp.win-amd64-cpython-311\Release\cbits\webrtc\common_audio
creating build\temp.win-amd64-cpython-311\Release\cbits\webrtc\common_audio\signal_processing
creating build\temp.win-amd64-cpython-311\Release\cbits\webrtc\common_audio\vad
"C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.38.33130\bin\HostX86\x64\cl.exe" /c /nologo /O2 /W3 /GL /DNDEBUG /MD -D_WIN32 -Icbits -IC:\Windows\System32\whisper-writer\venv\include -IC:\Users\lisa\AppData\Local\Programs\Python\Python311\include -IC:\Users\lisa_\AppData\Local\Programs\Python\Python311\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.38.33130\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.38.33130\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\VS\include" /Tccbits\pywebrtcvad.c /Fobuild\temp.win-amd64-cpython-311\Release\cbits\pywebrtcvad.obj
pywebrtcvad.c
C:\Users\lisa_\AppData\Local\Programs\Python\Python311\include\pyconfig.h(59): fatal error C1083: Cannot open include file: 'io.h': No such file or directory
error: command 'C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.38.33130\bin\HostX86\x64\cl.exe' failed with exit code 2
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for webrtcvad
Failed to build webrtcvad
ERROR: Could not build wheels for webrtcvad, which is required to install pyproject.toml-based projects

Before I reinstalled the C++ Build Tools, the error (fatal error C1083: Cannot open include file: 'io.h': No such file or directory) was not present, and reinstalling them again hasn't fixed it. The webrtcvad build error shown above has persisted since my first attempts to install it.

Do you have any idea how to fix it?

Thanks for your time and support,
Chris

Running locally, is it supposed to download the models automatically?

I couldn't find information on this anywhere.

Multilingual support

Hi! Thanks for creating this! I'm lucky I found it, as I had been looking for a program that does exactly this for a while.

How does the language code in the options affect the output?
I know Whisper's multilingual models can transcribe input without specifying its language. So if I say one sentence in one language and another sentence in a second language, Whisper can transcribe both in their respective languages.
This is what I need as well: I speak multiple languages and use all of them on my PC. The language code option seems to restrict this use case, though. Or does it? I'm not sure.

Could you please elaborate on how this works and if/how I can achieve what I want?

Thanks!
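As general Whisper behaviour (not confirmed against whisper-writer's own code): when a language code is passed, the model transcribes the audio as that language; when none is passed, it detects the language itself, typically once per transcription call from the opening stretch of audio, so a single recording that switches languages midway may still come out in one language. A minimal sketch of turning a config value into "auto-detect", assuming the app forwards the value to the model's language parameter; resolve_language is a hypothetical helper:

```python
def resolve_language(config_value):
    """Translate a config 'language' value into Whisper's language argument.

    Returns None (meaning "let Whisper auto-detect") for a missing or blank
    value, otherwise the trimmed, lower-cased code, e.g. ' EN ' -> 'en'.
    """
    if config_value is None:
        return None
    code = str(config_value).strip().lower()
    return code or None
```

With faster-whisper, for example, the result could be passed as model.transcribe(audio, language=resolve_language(value)); the detected language is then reported on the returned info object.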

Detect input language each time the hold_to_record shortcut is pressed

Hello. Currently the language is read from config.json at startup. It would be a great improvement if transcription.py read the currently selected Windows keyboard language each time, provided the Whisper model supports that language. This way, multilingual users could switch languages on the fly without having to edit config.json and restart the app.

The following code returns the ISO-639-1 language code for the currently selected input method. It works on my PC. (Many thanks to ChatGPT - I'm very new to Python.)

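import ctypes
from ctypes import wintypes

# Load User32.dll and Kernel32.dll (Windows only)
user32 = ctypes.WinDLL('user32', use_last_error=True)
kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)

# Declare argument/return types so 64-bit handles are not truncated to a C int
user32.GetForegroundWindow.restype = wintypes.HWND
user32.GetWindowThreadProcessId.argtypes = [wintypes.HWND, ctypes.POINTER(wintypes.DWORD)]
user32.GetWindowThreadProcessId.restype = wintypes.DWORD
user32.GetKeyboardLayout.argtypes = [wintypes.DWORD]
user32.GetKeyboardLayout.restype = wintypes.HKL
kernel32.GetLocaleInfoW.argtypes = [wintypes.LCID, wintypes.DWORD, wintypes.LPWSTR, ctypes.c_int]
kernel32.GetLocaleInfoW.restype = ctypes.c_int

LOCALE_SISO639LANGNAME = 0x59  # locale field holding the ISO 639-1 code, e.g. 'es'

def get_keyboard_layout():
    # Get the window that currently has the keyboard focus
    foreground_window = user32.GetForegroundWindow()
    # Get the identifier of the thread that created the window
    thread_id = user32.GetWindowThreadProcessId(foreground_window, None)
    # Get the current keyboard layout (HKL) for that thread; None is treated as 0
    return user32.GetKeyboardLayout(thread_id) or 0

def get_input_language(default='en'):
    # The low word of the keyboard layout is the language ID (LANGID)
    language_id = get_keyboard_layout() & 0xFFFF
    # Ask Windows for the ISO 639-1 code instead of matching the display name,
    # which is localized and would break on non-English Windows
    buffer = ctypes.create_unicode_buffer(9)
    if kernel32.GetLocaleInfoW(language_id, LOCALE_SISO639LANGNAME, buffer, len(buffer)) == 0:
        return default  # lookup failed
    return buffer.value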

I've changed transcription.py on my machine and sent you a pull request; it works on Windows. New languages should be added.

Version 2024-05-28: OMP: Error #15: Initializing libiomp5md.dll, but found libiomp5md.dll already initialized.

Hello. I get the error shown at the end of the output below when trying to run whisper-writer. I'm on Windows 11 23H2. I followed the installation instructions currently in the README, but it reported that CUDA was not supported; apparently the torch version in requirements.txt is CPU-only. So I installed CUDA 11.8 and copied the cuBLAS library to the same path.

Perhaps we could refine the installation instructions or requirements.txt, or create a container (would it even work in a container?), so that installation is easier. Other alternatives are distributing the libraries or exploring nvidia-cudnn-cu11. This is a brand-new Windows 11 installation, so my issues can't be caused by leftover old versions.


> set PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin;%PATH%
> set CUDA_HOME=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8

> nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:41:10_Pacific_Daylight_Time_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

> python
Python 3.11.9 (tags/v3.11.9:de54cf5, Apr  2 2024, 10:12:12) [MSC v.1938 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print("PyTorch version:", torch.__version__)
PyTorch version: 2.3.0+cu118
>>> print("PyTorch built with CUDA:", torch.version.cuda)
PyTorch built with CUDA: 11.8


> python .\run.py
Starting WhisperWriter...
Creating local model...
OMP: Error #15: Initializing libiomp5md.dll, but found libiomp5md.dll already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.
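The hint says the proper fix is to make sure only one OpenMP runtime ends up in the process; setting KMP_DUPLICATE_LIB_OK=TRUE is the unsafe workaround it warns about. As a first diagnostic step, a sketch that lists every libiomp5md.dll inside the active environment, assuming the duplicate copies live under sys.prefix (for example one shipped with torch and one with another package); find_openmp_runtimes is a hypothetical helper and does not fix anything by itself:

```python
import sys
from pathlib import Path


def find_openmp_runtimes(root=sys.prefix, name="libiomp5md.dll"):
    """List every copy of the OpenMP runtime DLL below root (e.g. the venv).

    More than one hit means several packages ship their own copy, which is
    the situation OMP Error #15 describes.
    """
    return sorted(p for p in Path(root).rglob(name) if p.is_file())
```

Running print(find_openmp_runtimes()) from the activated venv shows which packages bundle a copy; copies outside the venv (for example a CUDA or other toolkit directory on PATH) would need their own root argument.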

Need conversion of sample rate

The Whisper models require a 16 kHz sample rate, but few audio devices provide that rate. Mine, for example, only supports 44100 and 192000. Leaving the sample rate at 16000 in src/config.json results in an error:

Expression 'paInvalidSampleRate' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2050
Expression 'PaAlsaStreamComponent_InitialConfigure( &self->capture, inParams, self->primeBuffers, hwParamsCapture, &realSr )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2721
Expression 'PaAlsaStream_Configure( stream, inputParameters, outputParameters, sampleRate, framesPerBuffer, &inputLatency, &outputLatency, &hostBufferSizeMode )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2845
Traceback (most recent call last):
  File "/home/mark/compile/whisper/whisper-writer/src/transcription.py", line 52, in record_and_transcribe
    with sd.InputStream(samplerate=sample_rate, channels=1, dtype='int16', blocksize=sample_rate * frame_duration // 1000,
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mark/compile/whisper/whisper_dictation/whispervenv/lib/python3.11/site-packages/sounddevice.py", line 1421, in __init__
    _StreamBase.__init__(self, kind='input', wrap_callback='array',
  File "/home/mark/compile/whisper/whisper_dictation/whispervenv/lib/python3.11/site-packages/sounddevice.py", line 898, in __init__
    _check(_lib.Pa_OpenStream(self._ptr, iparameters, oparameters,
  File "/home/mark/compile/whisper/whisper_dictation/whispervenv/lib/python3.11/site-packages/sounddevice.py", line 2747, in _check
    raise PortAudioError(errormsg, err)
sounddevice.PortAudioError: Error opening InputStream: Invalid sample rate [PaErrorCode -9997]

Changing it to 44100, on the other hand, results in:

Expression 'paInvalidSampleRate' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2050
Expression 'PaAlsaStreamComponent_InitialConfigure( &self->capture, inParams, self->primeBuffers, hwParamsCapture, &realSr )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2721
Expression 'PaAlsaStream_Configure( stream, inputParameters, outputParameters, sampleRate, framesPerBuffer, &inputLatency, &outputLatency, &hostBufferSizeMode )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2845
Traceback (most recent call last):
  File "/home/mark/compile/whisper/whisper-writer/src/transcription.py", line 52, in record_and_transcribe
    with sd.InputStream(samplerate=sample_rate, channels=1, dtype='int16', blocksize=sample_rate * frame_duration // 1000,
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mark/compile/whisper/whisper_dictation/whispervenv/lib/python3.11/site-packages/sounddevice.py", line 1421, in __init__
    _StreamBase.__init__(self, kind='input', wrap_callback='array',
  File "/home/mark/compile/whisper/whisper_dictation/whispervenv/lib/python3.11/site-packages/sounddevice.py", line 898, in __init__
    _check(_lib.Pa_OpenStream(self._ptr, iparameters, oparameters,
  File "/home/mark/compile/whisper/whisper_dictation/whispervenv/lib/python3.11/site-packages/sounddevice.py", line 2747, in _check
    raise PortAudioError(errormsg, err)
sounddevice.PortAudioError: Error opening InputStream: Invalid sample rate [PaErrorCode -9997]

Here is the output of arecord -Dhw:0 --dump-hw-params:

Warning: Some sources (like microphones) may produce inaudible results
         with 8-bit sampling. Use '-f' argument to increase resolution
         e.g. '-f S16_LE'.
HW Params of device "hw:0":
--------------------
ACCESS:  MMAP_INTERLEAVED RW_INTERLEAVED
FORMAT:  S16_LE S32_LE
SUBFORMAT:  STD
SAMPLE_BITS: [16 32]
FRAME_BITS: [32 64]
CHANNELS: 2
RATE: [44100 192000]
PERIOD_TIME: (83 11888617)
PERIOD_SIZE: [16 524288]
PERIOD_BYTES: [128 2097152]
PERIODS: [2 32]
BUFFER_TIME: (166 23777234)
BUFFER_SIZE: [32 1048576]
BUFFER_BYTES: [128 4194304]
TICK_TIME: ALL
--------------------
arecord: set_params:1371: Sample format non available
Available formats:
- S16_LE
- S32_LE

Does this need some abstraction like sox?
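The arecord dump shows that hw:0 reports RATE: [44100 192000], S16_LE or S32_LE samples, and exactly 2 channels, so the channels=1 request in transcription.py may be a second mismatch besides the rate. Without pulling in sox, one option is to record at a supported rate in stereo and convert in Python before the audio reaches Whisper. A minimal sketch with NumPy, assuming the frames arrive as int16 arrays from sounddevice; to_whisper_input is a hypothetical helper, and it uses plain linear interpolation without an anti-aliasing filter (scipy.signal.resample_poly(x, 160, 441) would be a higher-quality choice for 44100 to 16000 Hz):

```python
import numpy as np


def to_whisper_input(frames, src_rate, dst_rate=16000):
    """Convert int16 audio captured at src_rate into mono int16 at dst_rate.

    frames: array of shape (n,) or (n, channels), e.g. from a sounddevice
    InputStream opened with samplerate=44100, channels=2, dtype='int16'.
    Uses linear interpolation (no anti-aliasing low-pass filter).
    """
    audio = np.asarray(frames, dtype=np.float32)
    if audio.ndim == 2:
        audio = audio.mean(axis=1)  # average the channels -> mono
    if audio.size == 0:
        return np.zeros(0, dtype=np.int16)
    if src_rate != dst_rate:
        n_out = int(round(audio.size * dst_rate / src_rate))
        src_t = np.arange(audio.size) / src_rate  # input sample times in seconds
        dst_t = np.arange(n_out) / dst_rate       # output sample times in seconds
        audio = np.interp(dst_t, src_t, audio)
    # Round and clip back to int16, the dtype the existing InputStream uses
    return np.clip(np.round(audio), -32768, 32767).astype(np.int16)
```

Note that webrtcvad only accepts 8000, 16000, 32000 or 48000 Hz audio, so if the app runs voice activity detection on the raw stream, the conversion would have to happen before that step too.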

Program Not Responsive after a few minutes; Require Restart

When I start using the program, it works just fine. However, if I haven't used it for 20 or 30 minutes, the recording shortcut no longer works (or isn't recognized), and the only way to get the program working again is to restart it. Any thoughts on what might be causing this bug or how to fix it?

Thanks

Pulseaudio

Is there a way to specify the device for recording?
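Since the tracebacks on this page show whisper-writer recording through sounddevice, one approach is to pass a device to sd.InputStream: sounddevice accepts a device index or a name substring, and sd.query_devices() lists the available devices with keys such as 'name' and 'max_input_channels'. On PulseAudio systems a device called "pulse" is often listed. A sketch of a hypothetical helper that picks an input device by name; how the name would be configured in whisper-writer is an assumption:

```python
def find_input_device(devices, name_part):
    """Return the index of the first input-capable device whose name contains
    name_part (case-insensitive), or None if nothing matches.

    devices: a sequence of dicts shaped like sounddevice.query_devices() entries.
    """
    needle = name_part.lower()
    for index, dev in enumerate(devices):
        if dev.get("max_input_channels", 0) > 0 and needle in dev["name"].lower():
            return index
    return None


# Possible usage (requires sounddevice and real audio hardware):
#   import sounddevice as sd
#   device = find_input_device(sd.query_devices(), "pulse")
#   with sd.InputStream(device=device, samplerate=16000, channels=1, dtype="int16"):
#       ...
```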

"You must be root"

Error on startup: "You must be root"

Starting WhisperWriter...
Script activated. Whisper is set to run using OpenAI's API. To change this, modify the "use_api" value in the src\config.json file.
WhisperWriter is set to record using voice_activity_detection. To change this, modify the "recording_mode" value in the src\config.json file.
The activation key combo is set to Ctrl+Shift+Space. When it is pressed, recording will start, and will stop when you stop speaking.
Press Ctrl+C on the terminal window to quit.
Traceback (most recent call last):
  File "/home/fabian/whisper-writer/src/main.py", line 141, in <module>
    keyboard.add_hotkey(config['activation_key'], on_shortcut)
  File "/home/fabian/whisper-writer/venv/lib/python3.10/site-packages/keyboard/__init__.py", line 639, in add_hotkey
    _listener.start_if_necessary()
  File "/home/fabian/whisper-writer/venv/lib/python3.10/site-packages/keyboard/_generic.py", line 35, in start_if_necessary
    self.init()
  File "/home/fabian/whisper-writer/venv/lib/python3.10/site-packages/keyboard/__init__.py", line 196, in init
    _os_keyboard.init()
  File "/home/fabian/whisper-writer/venv/lib/python3.10/site-packages/keyboard/_nixkeyboard.py", line 113, in init
    build_device()
  File "/home/fabian/whisper-writer/venv/lib/python3.10/site-packages/keyboard/_nixkeyboard.py", line 109, in build_device
    ensure_root()
  File "/home/fabian/whisper-writer/venv/lib/python3.10/site-packages/keyboard/_nixcommon.py", line 174, in ensure_root
    raise ImportError('You must be root to use this library on linux.')
ImportError: You must be root to use this library on linux.
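The traceback shows the keyboard package raising ImportError as soon as keyboard.add_hotkey starts its listener, because on Linux it reads the input devices directly and therefore requires root. A sketch of a hypothetical startup check that could explain this before add_hotkey is called; keyboard_root_warning is not part of whisper-writer, and the message wording is illustrative:

```python
import os
import sys


def keyboard_root_warning(platform=sys.platform, euid=None):
    """Return an explanatory message if the `keyboard` package is likely to
    refuse to start (Linux without root), otherwise None."""
    if not platform.startswith("linux"):
        return None
    if euid is None:
        euid = os.geteuid()  # only available on POSIX systems
    if euid == 0:
        return None
    return ("The 'keyboard' package needs root on Linux to read input devices; "
            "without it the activation hotkey cannot be registered.")
```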

LocalAI support for locally hosted Whisper Models

It would be great to add instructions for using the LocalAI API to access a Whisper model hosted locally on a more powerful machine (rather than running the base or tiny model on the machine running whisper-writer).

Audio to text: https://localai.io/features/audio-to-text/
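LocalAI exposes an OpenAI-compatible /v1/audio/transcriptions endpoint, so one way this could work is an OpenAI client pointed at a different base URL. A sketch assuming openai>=1.0, LocalAI on its default port 8080, and a Whisper model registered in LocalAI under the name whisper-1 (the name depends on your LocalAI model config); make_localai_client and transcribe_file are hypothetical helpers, not part of whisper-writer:

```python
def make_localai_client(base_url="http://localhost:8080/v1", api_key="sk-local"):
    """Create an OpenAI client that talks to a LocalAI server instead of api.openai.com.

    LocalAI does not need a real key by default, but the client requires a non-empty one.
    """
    from openai import OpenAI  # imported lazily so this module loads without openai installed
    return OpenAI(base_url=base_url, api_key=api_key)


def transcribe_file(client, path, model="whisper-1"):
    """Send an audio file to the server's audio transcription endpoint and return the text."""
    with open(path, "rb") as audio_file:
        result = client.audio.transcriptions.create(model=model, file=audio_file)
    return result.text
```

For a server elsewhere on the network, base_url would point at that machine, e.g. make_localai_client("http://192.168.1.50:8080/v1") (hypothetical address).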
