stage-whisper / stage-whisper

The main repo for Stage Whisper — a free, secure, and easy-to-use transcription app for journalists, powered by OpenAI's Whisper automatic speech recognition (ASR) machine learning models.

Home Page: https://stagewhisper.org

License: MIT License

Languages: TypeScript 98.40%, Python 1.45%, HTML 0.15%
Topics: hacktoberfest, audio-transcription, whisper, ai-transcription, openai, openai-whisper, electron-app, journalism

stage-whisper's Introduction

Stage-Whisper

This is the main repo for Stage Whisper — a free, open-source, and easy-to-use audio transcription app. Stage Whisper uses OpenAI's Whisper machine learning model to produce very accurate transcriptions of audio files, and also allows users to store and edit transcriptions using a simple and intuitive graphical user interface.

Quickstart

Stage Whisper consists of two connected components:

  • A Python backend that interfaces with OpenAI's Whisper library
  • A Node/Electron-powered interface

Prerequisites

The eventual 1.0 release of Stage Whisper will (ideally) not require any additional software. For now, though, you will need the following installed on your machine to develop Stage Whisper. It is currently possible to work on the Electron interface and the Python backend separately, so if you plan to work on only one or the other, you only need to install the requirements specific to that component.

  • Node (required for Electron)
  • Yarn (required for Electron)
  • Python 3.x (required for backend)
  • Rust (required for backend)
  • ffmpeg (required for backend)
  • Poetry (required for backend)

There are any number of ways to get these dependencies installed on your workstation, but here is one example of how you might install all of the above on a Mac (skip any step for something you have already installed):

# Install Homebrew
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# Install Python, Node, Rust, ffmpeg, and Yarn
brew install python node rust ffmpeg yarn

# Install Poetry
curl -sSL https://install.python-poetry.org | POETRY_HOME=/etc/poetry python3 -

Running the Python Backend

Install dependencies:

cd backend
poetry install

While the backend's primary purpose will be to run as a service for the Electron app to connect to, it can also be run as a standalone script. To do so, run:

poetry run python stagewhisper --input /path/to/audio/file.mp3

Running the Electron Interface

cd electron
yarn
yarn dev

Goal

Earlier this year, OpenAI released Whisper, its automatic speech recognition (ASR) system that is trained on "680,000 hours of multilingual and multitask supervised data collected from the web." You can learn more by reading the paper [PDF] or looking at the examples on OpenAI's website.

As Dan Nguyen noted on Twitter, this could be a "godsend for newsrooms."

The only problem, as @PeterSterne pointed out, is that not all journalists (or others who could benefit from this type of transcription tool) are comfortable with the command line and installing the dependencies required to run Whisper.

Our goal is to package Whisper in an easier-to-use way so that less technical users can take advantage of this neural net.

Peter came up with the project name, Stage Whisper.

Who is involved

@PeterSterne and @filmgirl (Christina Warren) created the project, and @HarrisLapiroff and @Crazy4Pi314 (Sarah Kaiser) are leading development, with @oenu (Adam Newton-Blows) leading front-end development.

We'd love to collaborate with anyone who has ideas about how we could more easily package Whisper and make it easy to use for non-technical users.

Project Status

The project is currently in the early stages of development. We have a working prototype that uses the Electron and Mantine frameworks to create an app that allows users to input audio files, transcribe them using Whisper, and then manage and edit the resulting transcriptions. The app will be available for macOS, Windows, and Linux. We are currently working on implementing major improvements and hope to release a beta version soon.

License

Any code that we distribute will be open source and will follow the license terms of the projects that we use. Whisper is MIT-licensed, but some of its dependencies (e.g., FFmpeg) are licensed under different terms. We will adhere to all licensing terms, and in the event that we cannot bundle FFmpeg with Stage Whisper, we will make it as easy as possible for the end user to obtain. Any Stage Whisper-specific code will be licensed under the MIT license.

stage-whisper's People

Contributors

crazy4pi314, filmgirl, harrislapiroff, mattahorton, mike-freeai, oenu, petersterne, sawhney17


stage-whisper's Issues

Sign and notarize app with Apple Developer certificate

In order for the app to install easily on macOS, we need to get an Apple Developer certificate and then sign the app and get it notarized by Apple. I believe that all needs to be done through Xcode.

If someone (@mattahorton?) already has an individual Apple developer account, we can potentially use theirs. Otherwise I can create a new individual Apple account for $99/yr. Unfortunately, I don't think we'll be able to get an organization developer account since Stage Whisper is not a legal entity.

Clean up language in main window

How restricted are we from changing the language that currently shows in Gooey? I would love to have more human-readable titles and descriptions. Instead of "audio" or "output_dir," can we say "Select an audio file to transcribe" and "Select an Output Directory"? Or will we run into the issue where Gooey chokes on spaces?

Electron app throws error after submitting audio file

Receiving an error after dragging a *.m4a file (other formats give the same error) to the input screen to further process the file: "window.Main is undefined, app in dev mode, please view in electron to select an output directory". Installed Python 3.10.2, Electron version 7.1.7.
Also received the following error in the Electron run: "The file name, directory name, or volume label syntax is incorrect"

Separate output and logging

Right now they get combined into a single shell. We should separate them, possibly by making it easy to direct output to a file? (Ultimately it would be nice to display just output in the interface, but I don't think we'll get there with Gooey.)
[Screenshot: combined output and logging in a single shell, 2022-09-25]

Play/pause controls block transcription page numbers

On the current electron-dev branch, the play/pause controls obscure the page numbers at the bottom right corner of the transcription page (and the edit button for the last line on the transcription page). This prevents the user from being able to move to other pages.

[Screenshot: play/pause controls overlapping the page numbers, 2022-10-09]

OS-specific default output directory

Let's have the output directory default to an OS-specific reasonable default. For example on macOS this should be ~/Documents/. The current default of . isn't tenable since I think that would cause files to generate inside the application bundle.

Record the time it takes to transcribe with different Whisper models

I want to get a rough idea of how long it might take to transcribe an interview of a set length on CPU, and then include that information in the GUI.

That means running tests with all of the models on at least two different CPUs, recording how much time each test took and whether the larger models seemed more accurate than the smaller ones. We don't necessarily need to transcribe the full files, if it seems like the transcription rate is consistent (e.g. base.en transcribes 6 minutes of audio in 3 minutes, which means it will transcribe 1 hour of audio in 30 minutes).

The goal here is two-fold.

First, I want to be able to explain to readers the differences between the models in a way that is relevant to them (e.g. "A 1 hour long interview will take...roughly 15-20 minutes to transcribe using the tiny English-only model, 30-40 minutes using the base English-only model, 4+ hours to transcribe using the large model," etc.)

Second, I want to figure out which model the app should use by default, by seeing which model has the best trade-off between time to transcribe and accuracy. We're not writing a whitepaper here so it doesn't need to be precisely measured, but I want to know whether we should make the base.en (or even tiny.en) model the default instead of the small model.

So we have three things to do:

  • Test the different models and record the rough amount of time they take to transcribe
  • Add that information to the GUI interface
  • Choose the best default model
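One rough way to run these tests (a sketch, not a finished harness): shell out to the whisper CLI for each model and compute a realtime factor. The model list, clip path, and clip length below are assumptions, not measured values.

// benchmark.ts — rough timing sketch for comparing Whisper models on CPU
import { execFileSync } from 'child_process';

const models = ['tiny.en', 'base.en', 'small', 'medium'];
const clip = 'samples/interview-6min.mp3'; // hypothetical 6-minute test file

for (const model of models) {
  const start = Date.now();
  // Run the stock whisper CLI on CPU; --model and --device are standard flags.
  execFileSync('whisper', [clip, '--model', model, '--device', 'cpu']);
  const seconds = (Date.now() - start) / 1000;
  // realtime factor = audio length / wall-clock time (360 s = 6-minute clip)
  console.log(`${model}: ${seconds.toFixed(0)}s (${(360 / seconds).toFixed(2)}x realtime)`);
}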

Either Disable or Fix Dark Mode

Running stagewhisper through Poetry while dark mode is active on macOS causes this:

[Screenshot: garbled dark mode rendering on macOS, 2022-09-25]

Can we just fully disable dark mode or is this an issue where Gooey is automatically going to try to enter dark mode if the OS is in dark mode? The former is not a big deal, since we can just disable dark mode for now. The latter would be a very big problem.

Redesign Dashboard Page

Redesign the Dashboard page to include:

  • a walkthrough of the app
  • an explainer of what it can do
  • any post-install actions that might need to be taken (e.g., if we need them to use Poetry)
  • Links to GitHub / Whisper
  • Staff attribution

Large language model crashing

    If I run the Electron app and go to settings and click the toggle switch to enable the large models, the app crashes. Here's the error message in the console:
serializableStateInvariantMiddleware.ts:194 A non-serializable value was detected in an action, in the path: `payload`. Value: SyntheticBaseEvent {_reactName: 'onChange', _targetInst: null, type: 'change', nativeEvent: PointerEvent, target: input#mantine-44pt7i4wl.mantine-rpvocz.mantine-Switch-input, …} 
Take a look at the logic that dispatched this action:  {type: 'settings/setDisplayLanguage', payload: SyntheticBaseEvent} 
(See https://redux.js.org/faq/actions#why-should-type-be-a-string-or-at-least-serializable-why-should-my-action-types-be-constants) 
(To allow non-serializable values see: https://redux-toolkit.js.org/usage/usage-guide#working-with-non-serializable-data)

Originally posted by @petersterne in #34 (comment)
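A likely fix, sketched with Redux Toolkit directly: the crash happens because the SyntheticEvent itself is dispatched as the payload, so dispatching the boolean it carries keeps the action serializable. The slice and field names below are illustrative, not the app's actual ones.

import { configureStore, createSlice, PayloadAction } from '@reduxjs/toolkit';

const settings = createSlice({
  name: 'settings',
  initialState: { largeModels: false },
  reducers: {
    // A plain boolean payload passes the serializability check.
    setLargeModels: (state, action: PayloadAction<boolean>) => {
      state.largeModels = action.payload;
    },
  },
});

const store = configureStore({ reducer: { settings: settings.reducer } });

// In the Switch's onChange handler, dispatch the checked value, not the event:
// dispatch(settings.actions.setLargeModels(event.currentTarget.checked));
store.dispatch(settings.actions.setLargeModels(true));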

Rewrite Electron front-end to get input directly from Whisper script rather than parsing vtt file

Currently, the Electron app creates a transcription by parsing the vtt file output by the Whisper python script. Specifically, the Electron front-end waits until the Whisper script has finished transcribing the full audio file and written it to a vtt file, then converts the vtt file to JSON, parses it, splits it into an array of time-stamped lines, and then creates a transcription object that is stored in the SQLite transcription database.

Here's the relevant code: https://github.com/Stage-Whisper/Stage-Whisper/blob/use-backend/electron/electron/handlers/runWhisper/runWhisper.ts#L130

This behavior no longer makes sense for our use case, since @harrislapiroff has tweaked the Whisper script to output directly to stdio rather than producing a vtt file.

So we need to rewrite the Electron front-end so that it takes the Whisper lines directly from stdio (ideally as they're being written) as input. This should be a relatively easy fix to implement for anyone who has experience with Electron, React, or even plain JS.

This is currently a program-breaking bug and fixing it is top priority.

Although there is one obvious temporary workaround (just have the Whisper script produce a vtt file that the Electron front-end can parse, which is how it used to work), I believe it would be a poor solution for a number of reasons.

First of all, it would add an unnecessary step — writing the vtt file and then reading it, instead of just reading the transcription directly from stdio. Second, it would delay the Electron front-end, which would not be able to start writing transcribed lines to the database until the entire file has been transcribed. Finally, it would disrupt the program's overall architectural philosophy, which calls for the Electron front-end to manage all the file and database operations while the Whisper script only handles the actual transcription. Ensuring that each component of the program has a clearly-defined job is important for extensibility and future feature development (e.g. speaker diarization).
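A minimal sketch of the stdio-based approach, assuming the backend writes one transcribed line per stdout line; the invocation mirrors the standalone command from the Quickstart, and the onLine callback is a stand-in for whatever writes lines to the database.

import { spawn } from 'child_process';
import readline from 'readline';

function runWhisper(audioPath: string, onLine: (line: string) => void): Promise<void> {
  return new Promise((resolve, reject) => {
    const child = spawn(
      'poetry',
      ['run', 'python', 'stagewhisper', '--input', audioPath],
      { cwd: 'backend' }
    );
    // Consume stdout line by line as segments are produced, instead of
    // waiting for a complete vtt file to be written.
    const rl = readline.createInterface({ input: child.stdout! });
    rl.on('line', onLine);
    child.on('error', reject);
    child.on('close', (code) =>
      code === 0 ? resolve() : reject(new Error(`whisper exited with code ${code}`))
    );
  });
}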

Poor transcription viewer performance

> https://github.com/Stage-Whisper/Stage-Whisper/pull/71#issuecomment-1272010111

I'm going to look into making this a table component, as the current approach is difficult to read and very heavy on the system. Will make a new branch for this.

Originally posted by @oenu in #71 (comment)

Catch errors in whisper process

Currently (on the improvements branch), if the whisper process crashes, the spinner seems to keep going. It should instead display an error so a user doesn't wait forever for completion.

(Wanted to make a note of this now, because we'll probably fix the process crashing issue I'm having and then we might forget to handle other errors)
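A sketch of what catching this could look like in the Electron main process, assuming a handler there owns the child process; the 'whisper:error' channel name is hypothetical.

import { ChildProcess } from 'child_process';
import { BrowserWindow } from 'electron';

function watchForFailure(child: ChildProcess, win: BrowserWindow): void {
  // 'error' fires if the process could not be spawned at all.
  child.on('error', (err) => win.webContents.send('whisper:error', err.message));
  // A non-zero exit code means whisper crashed mid-transcription; tell the
  // renderer so it can stop the spinner and show a message.
  child.on('exit', (code) => {
    if (code !== 0 && code !== null) {
      win.webContents.send('whisper:error', `whisper exited with code ${code}`);
    }
  });
}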

Get the electron app to call our whisper python script in dev

Currently the Electron app works by calling whisper on the local machine. We want it to instead work by calling our script in backend/stagewhisper/__main__.py.

This is only for dev. We will want something slightly different for what the "built" version of the app does.

Translation System

Discussed in #38

Originally posted by oenu October 1, 2022

Plan

  • Have a folder in the app called "localization"
  • Inside are a number of files (en.yml, tr.yml, fr.yml) which contain strings and translations
  • Use this code to build a strings library inside the app that can be accessed from React, plus this code to import from YAML (see the sketch below)
  • Use a service like https://poeditor.com/ to pull in files and support translation by volunteers

Thanks to @notadevyet for their help!
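As a rough sketch of the loader side, assuming js-yaml as the YAML parser (the libraries linked in the original discussion may differ):

import fs from 'fs';
import path from 'path';
import yaml from 'js-yaml';

type Strings = Record<string, string>;

// Read localization/<locale>.yml into a strings object the React app can use.
function loadStrings(locale: string): Strings {
  const file = path.join('localization', `${locale}.yml`);
  return yaml.load(fs.readFileSync(file, 'utf8')) as Strings;
}

// Usage: const strings = loadStrings('fr'); strings['dashboard.title']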

Complex transcription options

Right now the app only takes in audio files and assumes the rest (e.g., autodetect the language and use the base model).

The app should offer the user, at minimum, a choice of model, audio language, entry name, whether to translate, and whether to export immediately (a rough sketch of such an options object follows the list below):

  • Model (Previous version exists in input feature folder)
  • Audio Language
  • Entry Name
  • Task - translate/transcribe
  • Immediate export location - #96
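A rough sketch of how those fields might be collected into one options type (all names illustrative):

interface TranscriptionOptions {
  model: 'tiny' | 'base' | 'small' | 'medium' | 'large';
  audioLanguage?: string; // undefined = autodetect
  entryName: string;
  task: 'transcribe' | 'translate';
  exportDir?: string; // immediate export location, if set
}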

Issue: Complex JSON -> RPC -> Redux -> UI data handling

Duplicating state from json -> sqlite -> redux was a massive headache that should be avoided

My proposal is to use Realm to store the data in a fast, local on-device database; each audio file can then have multiple 'jobs' attached to it as documents, each with collections of 'lines' from the script. These could be added from multiple streams without having to worry about attribution or FS collisions. (A sketch of a possible schema follows the steps below.)

Realm (as a database for fast mobile development) has the ability to be queried from the render process.

Steps to implement

  • Test Realm connection
  • Design Database Schema
  • Init database
  • Hook up stdio
  • Remove redux functions
  • Migrate pages to request driven rendering
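A sketch of what the schema could look like with the Realm JS SDK; names and fields are illustrative, not a finalized design.

import Realm from 'realm';

const LineSchema: Realm.ObjectSchema = {
  name: 'Line',
  properties: { start: 'double', end: 'double', text: 'string' },
};

const JobSchema: Realm.ObjectSchema = {
  name: 'Job',
  properties: { model: 'string', lines: 'Line[]' },
};

const EntrySchema: Realm.ObjectSchema = {
  name: 'Entry',
  properties: { audioPath: 'string', jobs: 'Job[]' },
};

// Realm can be opened and queried from the render process without an RPC hop.
async function initDb(): Promise<Realm> {
  return Realm.open({ schema: [LineSchema, JobSchema, EntrySchema] });
}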

Can't install poetry package

When I run poetry install on main in a fresh LTS Ubuntu container, I get:

(stage-whisper) vscode ➜ /workspaces/Stage-Whisper $ poetry install
Updating dependencies
Resolving dependencies... Downloading https://files.pythonhosted.org/packages/f4/57/fe3e4e96efa3c68d3781a0903de0933ea2afa744852d907b290a2cb2294e/colored-1.4.3.t
Resolving dependencies... Downloading https://files.pythonhosted.org/packages/0a/88/f4f0c7a982efdf7bf22f283acf6009b29a9cc5835b684a49f8d3a4adb22f/numpy-1.23.3.ta
Resolving dependencies... Downloading https://files.pythonhosted.org/packages/0a/88/f4f0c7a982efdf7bf22f283acf6009b29a9cc5835b684a49f8d3a4adb22f/numpy-1.23.3.ta
Resolving dependencies... Downloading https://files.pythonhosted.org/packages/0a/88/f4f0c7a982efdf7bf22f283acf6009b29a9cc5835b684a49f8d3a4adb22f/numpy-1.23.3.ta
Resolving dependencies... Downloading https://files.pythonhosted.org/packages/0a/88/f4f0c7a982efdf7bf22f283acf6009b29a9cc5835b684a49f8d3a4adb22f/numpy-1.23.3.ta
Resolving dependencies... (22.8s)

Writing lock file

Package operations: 6 installs, 3 updates, 0 removals

  • Updating numpy (1.23.1 /tmp/abs_653_j00fmm/croots/recipe/numpy_and_numpy_base_1659432701727/work -> 1.23.3)
  • Installing pillow (9.2.0)
  • Updating six (1.16.0 /tmp/build/80754af9/six_1644875935023/work -> 1.16.0)
  • Installing colored (1.4.3)
  • Installing psutil (5.9.2)
  • Installing pygtrie (2.5.0)
  • Installing wxpython (4.2.0): Failed

  CalledProcessError

  Command '['/opt/conda/envs/stage-whisper/bin/python', '-m', 'pip', 'install', '--use-pep517', '--disable-pip-version-check', '--prefix', '/opt/conda/envs/stage-whisper', '--no-deps', '/home/vscode/.cache/pypoetry/artifacts/dd/8a/ed/054835d5a4eca523ef801b4f79ac35baa52d191d846dd9d6021a9cabee/wxPython-4.2.0.tar.gz']' returned non-zero exit status 1.

  at /opt/conda/lib/python3.9/subprocess.py:528 in run
       524│             # We don't call process.wait() as .__exit__ does that for us.
       525│             raise
       526│         retcode = process.poll()
       527│         if check and retcode:
    →  528│             raise CalledProcessError(retcode, process.args,
       529│                                      output=stdout, stderr=stderr)
       530│     return CompletedProcess(process.args, retcode, stdout, stderr)
       531│ 
       532│ 

The following error occurred when trying to handle this error:


  EnvCommandError

  Command ['/opt/conda/envs/stage-whisper/bin/python', '-m', 'pip', 'install', '--use-pep517', '--disable-pip-version-check', '--prefix', '/opt/conda/envs/stage-whisper', '--no-deps', '/home/vscode/.cache/pypoetry/artifacts/dd/8a/ed/054835d5a4eca523ef801b4f79ac35baa52d191d846dd9d6021a9cabee/wxPython-4.2.0.tar.gz'] errored with the following return code 1, and output: 
  Processing /home/vscode/.cache/pypoetry/artifacts/dd/8a/ed/054835d5a4eca523ef801b4f79ac35baa52d191d846dd9d6021a9cabee/wxPython-4.2.0.tar.gz
    Installing build dependencies: started
    Installing build dependencies: finished with status 'done'
    Getting requirements to build wheel: started
    Getting requirements to build wheel: finished with status 'error'
    error: subprocess-exited-with-error
    
    × Getting requirements to build wheel did not run successfully.
    │ exit code: 1
    ╰─> [19 lines of output]
        Traceback (most recent call last):
          File "/opt/conda/envs/stage-whisper/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 363, in <module>
            main()
          File "/opt/conda/envs/stage-whisper/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 345, in main
            json_out['return_val'] = hook(**hook_input['kwargs'])
          File "/opt/conda/envs/stage-whisper/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 130, in get_requires_for_build_wheel
            return hook(config_settings)
          File "/tmp/pip-build-env-a8gv67ye/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 338, in get_requires_for_build_wheel
            return self._get_build_requires(config_settings, requirements=['wheel'])
          File "/tmp/pip-build-env-a8gv67ye/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 320, in _get_build_requires
            self.run_setup()
          File "/tmp/pip-build-env-a8gv67ye/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 482, in run_setup
            super(_BuildMetaLegacyBackend,
          File "/tmp/pip-build-env-a8gv67ye/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 335, in run_setup
            exec(code, locals())
          File "<string>", line 27, in <module>
          File "/tmp/pip-req-build-_uvrfpnq/buildtools/config.py", line 30, in <module>
            from attrdict import AttrDict
        ModuleNotFoundError: No module named 'attrdict'
        [end of output]
    
    note: This error originates from a subprocess, and is likely not a problem with pip.
  error: subprocess-exited-with-error
  
  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> See above for output.
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  

  at /opt/conda/lib/python3.9/site-packages/poetry/utils/env.py:1476 in _run
      1472│                 output = subprocess.check_output(
      1473│                     command, stderr=subprocess.STDOUT, env=env, **kwargs
      1474│                 )
      1475│         except CalledProcessError as e:
    → 1476│             raise EnvCommandError(e, input=input_)
      1477│ 
      1478│         return decode(output)
      1479│ 
      1480│     def execute(self, bin: str, *args: str, **kwargs: Any) -> int:

The following error occurred when trying to handle this error:


  PoetryException

  Failed to install /home/vscode/.cache/pypoetry/artifacts/dd/8a/ed/054835d5a4eca523ef801b4f79ac35baa52d191d846dd9d6021a9cabee/wxPython-4.2.0.tar.gz

  at /opt/conda/lib/python3.9/site-packages/poetry/utils/pip.py:51 in pip_install
       47│ 
       48│     try:
       49│         return environment.run_pip(*args)
       50│     except EnvCommandError as e:
    →  51│         raise PoetryException(f"Failed to install {path.as_posix()}") from e
       52│ 

File generation improvements

The CLI utility for Whisper, which we copied for this application, generates files automatically on completion:

audioname.ext.txt -- includes transcription only
audioname.ext.vtt -- includes transcription and timecodes

We should

  • Make this file generation optional (allow someone to only get output through the console)
  • Stream data as we go (similar to how it's streamed to stdout), or at least write the file upon interruption, so that users still get partially transcribed files
  • Strip the audio extension and add _transcript before writing text files (i.e., generate audioname_transcript.txt instead of audioname.mp3.txt; see the sketch below)
  • Default to a standard output location for each given operating system (e.g., ~/Documents/ for macOS). I think . will not be a useful location once this is packaged as a binary
  • Stretch goal: Let users specify the name for the output files. Note that this functionality should accommodate a user selecting multiple audio files at once (file name as a format string?), or we should disable the ability to select multiple files at once
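A sketch of the naming and default-location rules from the list above; using ~/Documents for every platform is a simplification, since the per-OS defaults are still to be decided.

import os from 'os';
import path from 'path';

function transcriptPath(audioFile: string, outputDir?: string): string {
  // Strip the audio extension and append _transcript:
  // interview.mp3 -> interview_transcript.txt
  const base = path.basename(audioFile, path.extname(audioFile));
  const dir = outputDir ?? path.join(os.homedir(), 'Documents');
  return path.join(dir, `${base}_transcript.txt`);
}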

Get electron app to call our whisper script in build

As part of our build process we will need to bundle our whisper script in backend/stagewhisper/__main__.py into an executable inside of the Electron app. We will probably use PyInstaller, PyOxidizer, or a similar utility to create the executable. The fully-built Electron app should then be able to call that executable without relying on any particular software being available on the host machine; our application should be fully self-contained.

@crazy4pi314 Please feel free to elaborate on anything I've gotten wrong here or with more specifics of how we need to accomplish this for our Beta release.

Too many options on application screen


This is too many options for a non-technical user (our primary audience). Let's narrow it down to just:

  • Audio
  • Model (with better help text)
  • Language
  • Output directory (and it should probably default to something reasonable for the target OS, e.g., ~/Documents on macOS.)

If possible we should tuck other options behind an "Advanced" disclosure, but if that's not possible through Gooey, let's start with the basic options for our initial release and add that feature for the future.

Immediate Export option

The app should be able to immediately export audio files to a given output directory or the desktop.

This removes user clicks and brings in a new set of users who just want an easy Whisper install.

  • Possible ipc call from within RunWhisper to the export handler
  • Output directory storing possibly in the entry or transcription table
  • Feedback to the user when an export occurs (an audio cue?)

Target user stories

For working on the GUI design, it would be good to learn more from potential users about the features and workflows they need from the tool. Add a comment about how you might like to use Stage Whisper! <3

Strip Gooey parameters from `electron` branch

The Stage-Whisper:electron branch currently contains the same Python script as the one in the main branch, complete with Gooey-specific parameters, despite the fact that it uses Electron as a front-end. We should strip out anything referencing Gooey from this branch and update the README.md file with the correct requirements and directions for launching the Electron app (e.g., Yarn).

Cancel Transcription

We need to support cancelling a transcription (especially useful for long transcriptions).

  • Send a message to interrupt the child process (this might require passing the child process out of the RunWhisper handler, as it is stuck inside a promise; maybe an IPC listener to reject the promise? See the sketch below)
  • Clean up the resulting files/state in redux
  • Communicate to the user that the transcription has been cancelled
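A sketch of the IPC side, assuming the main process keeps a handle to each running child process; the channel name and job-tracking map are hypothetical.

import { ChildProcess } from 'child_process';
import { ipcMain } from 'electron';

// Keep child processes reachable outside the RunWhisper promise so an IPC
// call can interrupt them.
const running = new Map<string, ChildProcess>();

ipcMain.handle('whisper:cancel', (_event, jobId: string) => {
  const child = running.get(jobId);
  if (child) {
    child.kill('SIGINT'); // interrupt the transcription
    running.delete(jobId);
  }
  // Returning whether anything was cancelled lets the renderer update state.
  return child !== undefined;
});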
