
soupslurpr / transcribro

347.0 5.0 5.0 122.02 MB

Private and on-device speech recognition keyboard and service for Android.

License: ISC License

Kotlin 95.56% CMake 1.06% C 3.38%
android keyboard material-3 material-you speech-recognition speech-to-text jetpack-compose material-design-3 kotlin kotlin-android

transcribro's Introduction

Transcribro

Transcribro is a private and on-device speech recognition keyboard and service for Android.
It uses whisper.cpp to run the OpenAI Whisper family of models and Silero VAD for voice activity detection.
It features a voice input keyboard, enabling you to type with speech.
Other apps can also use it, either by invoking it explicitly or through the user-selected speech-to-text app setting, which some apps rely on for speech to text.

Download

Transcribro is available on the Accrescent app store and GitHub releases.
Accrescent is the recommended way to get Transcribro as it is more secure than GitHub releases.
Click on the badge below to get it on Accrescent.

Get it on Accrescent

The package name and the SHA-256 hash of the signing certificate are below. If you are downloading the APK, you can verify Transcribro with apksigner, using apksigner verify --print-certs Transcribro-X.Y.Z.apk, and/or with AppVerifier. If you are downloading from Accrescent, you should verify Accrescent itself.

dev.soupslurpr.transcribro
7D:BC:FB:FA:A1:35:B4:4E:6E:93:91:02:25:DC:B1:4E:05:82:91:DA:8C:2D:36:22:73:49:49:B7:1A:B3:BE:64
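
If you compare fingerprints with a script rather than by eye, note that the formats differ: the value above is colon-separated uppercase hex, while apksigner prints the digest as plain hex, typically lowercase, on a line ending in "certificate SHA-256 digest:" followed by the hex. The Kotlin sketch below normalizes both before comparing. The function names are hypothetical and the exact apksigner line format is an assumption, so check it against the output of your apksigner version.

```kotlin
// Expected signing-certificate fingerprint, copied from the value listed above.
const val TRANSCRIBRO_CERT_SHA256 =
    "7D:BC:FB:FA:A1:35:B4:4E:6E:93:91:02:25:DC:B1:4E:05:82:91:DA:8C:2D:36:22:73:49:49:B7:1A:B3:BE:64"

/** Removes colons and whitespace and lowercases, so differently formatted digests compare equal. */
fun normalizeFingerprint(fingerprint: String): String =
    fingerprint.filter { it != ':' && !it.isWhitespace() }.lowercase()

/**
 * Extracts the first SHA-256 certificate digest from apksigner --print-certs output.
 * Assumes a line of the form "... certificate SHA-256 digest: <hex>".
 */
fun digestFromApksignerOutput(output: String): String? =
    Regex("""certificate SHA-256 digest:\s*([0-9A-Fa-f:]+)""").find(output)?.groupValues?.get(1)

/** True if the apksigner output reports the expected Transcribro signing certificate. */
fun isOfficialTranscribroSigner(apksignerOutput: String): Boolean {
    val digest = digestFromApksignerOutput(apksignerOutput) ?: return false
    return normalizeFingerprint(digest) == normalizeFingerprint(TRANSCRIBRO_CERT_SHA256)
}
```

A mismatch, or no digest line at all, returns false, so the APK should not be installed in either case.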

It can also be found in a Bluesky post, so you don't have to rely on trusting this website alone. You are encouraged to check with other people that it matches, for extra assurance.

Community

Join the Matrix space at https://matrix.to/#/#transcribro:matrix.org for the General, Announcements, and Testing rooms.

Contributing

Check CONTRIBUTING.md for things to know if you want to contribute.

Donation

Enjoy Transcribro? You can donate to soupslurpr, the lead developer of Transcribro, to support their work on Transcribro and their other open source projects. Thank you!

Monero address:
88rAaNowhaC8JG8NJDpcdRWr1gGVmtFPnHWPS9xXvqY44G4XKVi5hZMax2FQ6B8KAcMpzkeJAhNek8qMHZjjwvkEKuiyBKF

The Monero address and also a QR code can be found in the app's Donate screen.

Branding

You may not use the name "Transcribro", any name that includes "Transcribro", or the app icon in a derivative work that has published builds.
This is to prevent confusion about which is the official Transcribro.

Screenshots

Screenshot of the keyboard UI, focused on the search bar of Vanadium's incognito tab.

transcribro's People

Contributors

soupslurpr


transcribro's Issues

Double period at the end of sentence.

If you stop talking just long enough at the end of the sentence and then tap the microphone to end dictation, you'll end up with two periods at the end of the sentence. .

Just like what happened in that sentence above.

It seems the app has already put in the period for when you pause your talking, but then you tap the microphone and it adds another period. This doesn't seem to happen all the time. .

But I got it to happen twice in this comment. .

Now three times. 😉
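
The report suggests the pause already produced a sentence-ending period, and the final segment recognized after tapping the microphone contributed a second one. A minimal sketch of one mitigation, assuming the keyboard commits each recognized segment at the cursor and can read the text before it: drop leading sentence-ending punctuation from the new segment when the existing text already ends a sentence. The function is hypothetical, not Transcribro's code, and it does not address the cause inside the recognizer.

```kotlin
// Characters treated as sentence-ending punctuation in this sketch.
val SENTENCE_END = setOf('.', '!', '?')

/**
 * Hypothetical helper: returns the text to commit for a newly recognized [segment],
 * given the text already before the cursor. If that text already ends a sentence,
 * leading sentence-ending punctuation in the segment is dropped, so a trailing pause
 * followed by a mic tap does not produce "sentence. ."
 */
fun textToCommit(textBeforeCursor: String, segment: String): String {
    val lastChar = textBeforeCursor.trimEnd().lastOrNull()
    var rest = segment.trim()
    if (lastChar != null && lastChar in SENTENCE_END) {
        rest = rest.dropWhile { it in SENTENCE_END }.trimStart()
    }
    if (rest.isEmpty()) return ""
    // Separate from the previous word unless the cursor is at the start or after whitespace.
    val needsSpace = textBeforeCursor.isNotEmpty() && !textBeforeCursor.last().isWhitespace()
    return if (needsSpace) " $rest" else rest
}
```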

Feature: enter key.

The app works pretty well. The only thing missing from the keyboard is a return key, so that you can let the transcription take place, add a new paragraph, and start another transcription without leaving the app. You really can't just let the app run continuously; it'll choke sooner or later.

Can't use the app on a Samsung device

Samsung Galaxy Note 9 running A14 (OneUI 6.1): I can't select the app as a voice input. It is not on the list; only Samsung and Google voice input are available to select.

Screenshot_20240730_080613_Samsung Keyboard

Bug: 0.2.0 null output or stops responding with long input

Android 14 GrapheneOS
Pixel 6a
Tested in text editor and Markor

The previous versions I tested did not do this.

I've reproduced this consistently.

I tried different settings; it didn't matter whether auto-start recognition was enabled or not. Disabling GrapheneOS exploit protections for the app didn't make a difference either.

A lengthy input causes a flicker of the screen and no output or the app will get the "not responding" message and no output.

I did the following input:

1 and 2 and 3 and 4 and 5 and 6 and...

Up to 20 worked. 25 caused problems, consistently.

System memory shows max memory of 500 used.

Edit:

I did not tap the mic to stop recognition. However, when I tested that, it gives a toast saying it's still working, no matter how long I wait.

Got one to work up to 30 😂 but it took a very long time for the output, so it's not usable.

Keyboard voice commands

Voice commands to control the keyboard. Maybe at first just try to implement commands for deletion and punctuation, with future commands in mind when figuring out what the specific command for each of those will be.

Not sure of the specific approach that should be followed for this, it likely needs some trial and error to figure out. It may need some lower level code that interfaces with the model while it's generating or something like that. If so, it will be blocked until the speech to text is switched to a Rust implementation.
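
The issue leaves the approach open and anticipates needing lower-level access to the model. As a point of comparison only, the simplest alternative is post-processing: treat an utterance that consists solely of a known phrase as a command. The phrases and action types below are placeholders, not decided commands, and this approach cannot handle commands mixed into dictation (for example "hello comma world"), which is where the lower-level approach would come in.

```kotlin
/** Hypothetical keyboard actions a spoken command could map to. */
sealed interface KeyboardAction {
    data class Insert(val text: String) : KeyboardAction
    object DeleteLastWord : KeyboardAction
}

// Placeholder phrase table; the real command wording is still to be decided.
val COMMANDS: Map<String, KeyboardAction> = mapOf(
    "period" to KeyboardAction.Insert("."),
    "comma" to KeyboardAction.Insert(","),
    "question mark" to KeyboardAction.Insert("?"),
    "delete that" to KeyboardAction.DeleteLastWord,
)

/**
 * Returns the action for an utterance that is exactly a command phrase,
 * or null so the caller inserts the transcript as ordinary text.
 */
fun parseCommand(transcript: String): KeyboardAction? {
    // Whisper usually adds capitalization and trailing punctuation; normalize both away.
    val normalized = transcript.trim().trimEnd('.', '!', '?', ',').lowercase()
    return COMMANDS[normalized]
}
```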

Sound when starting keyboard is annoying

It would be great if there was an option to disable or control the volume of the voice recognition service startup sound.

It would also be great if there was a list of different sounds for users to select from, so they could choose the one they think sounds best.

Thank you for creating such an amazing app ❤️, and for being one of the first apps on Accrescent 🎊🎉

Model chooser screen in settings

A screen in settings to download more models from huggingface from the app itself, pick the model that will be used, and manage/delete them. The models would be downloaded from a repo from my huggingface account and the hashes of the files would be checked against hashes included with Transcribro to ensure integrity even in the event of a huggingface server compromise.
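
The integrity check described above could look like the following Kotlin sketch, assuming the expected SHA-256 hashes ship inside the app as hex strings. It streams the file through java.security.MessageDigest so a large model file is never loaded into memory at once. The function names are hypothetical.

```kotlin
import java.io.File
import java.security.MessageDigest

/** Computes the lowercase hex SHA-256 digest of [file], streaming it in 64 KiB chunks. */
fun sha256Hex(file: File): String {
    val digest = MessageDigest.getInstance("SHA-256")
    file.inputStream().use { input ->
        val buffer = ByteArray(64 * 1024)
        while (true) {
            val read = input.read(buffer)
            if (read < 0) break
            digest.update(buffer, 0, read)
        }
    }
    return digest.digest().joinToString("") { "%02x".format(it) }
}

/** True only if [file] matches [expectedSha256], a hash bundled with the app rather than fetched from the server. */
fun isModelIntact(file: File, expectedSha256: String): Boolean =
    sha256Hex(file) == expectedSha256.lowercase()
```

Because the expected hash comes from the app itself, a compromised download server can only cause a failed check, which would presumably lead the app to delete the file rather than load it.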

There should be a text box at the top of the screen to test the selected model using the Voice Input Keyboard.

The most recommended models would be shown first and there shouldn't be an overwhelming amount of choice for no benefit. Test different model quants to choose enough models for a sensible variety of speed vs accuracy vs multilingualism and clearly communicate those properties in the interface. If needed, there can be a "more models" button that goes to a screen with the other models to keep the list from being too long.

Additionally, there should be an option to import a model from a file. Imported models can show up below the ones downloaded from the app, but in a separate section so they aren't mistaken for the official ones.

0.3.0 minor issues

Just did some testing on 0.3.0, pretty good!
Android 14 (GrapheneOS)

Minor issues:

  1. If the app is started while the microphone is blocked, it doesn't trigger the "unblock microphone" notification, even though it has microphone permission.

  2. The app will output "♪ ♪" when you aren't talking and you tap the microphone. 🤷🏻‍♂️ Nothing big; it doesn't seem to do it if you talk, even a little.

  3. I still think there should be some kind of trigger key that makes it output the text without stopping. Output is very fast when you trigger it by stopping, so tapping such a key would output what's been said so far while the user continues. The best example is finishing a paragraph: tap the trigger → output → return key → continue talking 😁

All in all though, great upgrade!

Edit

  1. Not sure about this stuff:

What is 1200 divided by 6?

16 times 12 equals

πŸ€·πŸ»β€β™‚οΈ

Keyboard: Enter/send key

An enter/send key is necessary, and this also would match with other keyboards' placement of the enter key.

The current backspace key will be split in half: the bottom half will become the enter/send key, and the top half will remain a backspace.

Audio file sharing to recognize

A share extension to be able to share a voice message or other audio and get a transcription.

An iOS app called Hello Transcribe is a good example of this feature.

Many people leave voice messages these days. On-device transcription would not break the confidentiality of end-to-end encryption, but it would let us skim through messages without having to listen to them.

Add link to Matrix community space in-app

In the absence of a website for Transcribro like the one BeauTyXT has, it'd be good to include the link to the Matrix community space in the app so people have a clear way of getting support. Otherwise they would have to open the source code link and look through the README, which they probably won't do, since it isn't obvious that the Matrix community space's link can be found there.

It should be the first entry in the About section of the settings because it's important.

Improved voice activity detection and continuous recording

Would you be interested in merging an improvement to the way audio is recorded, analysed for voice activity, and queued for transcription?
Current state: recording starts when voice activity is detected and stops when voice activity ends.
New state: each audio chunk (~300 ms) is analysed for voice activity; if it contains speech, it is added to a recording. Once a specified number of silent chunks is detected, the recording is added to an audio processing queue. A separate thread processes queue items and performs transcription. This allows for shorter recordings, since silent audio chunks are filtered out and only audio that contains actual speech is sent for transcription. It also decouples recording from transcribing, increasing reliability.
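
The proposed flow can be sketched in Kotlin as below. It is an illustration of the idea, not the contributor's code: isSpeech stands in for a real VAD such as Silero VAD, and energyVad is a crude placeholder used only so the sketch runs. An empty array is used as a stop signal for the worker thread, which is a choice made for this sketch.

```kotlin
import java.util.concurrent.LinkedBlockingQueue
import kotlin.concurrent.thread
import kotlin.math.abs

/**
 * Speech chunks accumulate into an utterance; after [maxSilentChunks] consecutive
 * silent chunks the utterance is queued for transcription. Silent chunks are never kept.
 */
class ChunkedRecorder(
    private val isSpeech: (ShortArray) -> Boolean,
    private val maxSilentChunks: Int,
    private val queue: LinkedBlockingQueue<ShortArray>,
) {
    private val current = ArrayList<ShortArray>()
    private var silentRun = 0

    /** Called from the recording thread for every captured chunk (~300 ms). */
    fun onChunk(chunk: ShortArray) {
        if (isSpeech(chunk)) {
            current += chunk
            silentRun = 0
        } else if (current.isNotEmpty()) {
            silentRun++
            if (silentRun >= maxSilentChunks) flush()
        }
    }

    /** Queues the accumulated speech, e.g. also when the user stops recording. */
    fun flush() {
        if (current.isEmpty()) return
        val utterance = ShortArray(current.sumOf { it.size })
        var offset = 0
        for (c in current) {
            c.copyInto(utterance, offset)
            offset += c.size
        }
        queue.put(utterance)
        current.clear()
        silentRun = 0
    }
}

/** Separate worker that transcribes queued utterances until it receives an empty array. */
fun startTranscriber(queue: LinkedBlockingQueue<ShortArray>, transcribe: (ShortArray) -> Unit): Thread =
    thread(name = "transcriber") {
        while (true) {
            val utterance = queue.take()
            if (utterance.isEmpty()) break
            transcribe(utterance)
        }
    }

/** Placeholder VAD: mean absolute amplitude above [threshold]. */
fun energyVad(threshold: Int): (ShortArray) -> Boolean = { chunk ->
    chunk.isNotEmpty() && chunk.sumOf { abs(it.toInt()) } / chunk.size > threshold
}
```

The recording thread only classifies and copies chunks, so a slow transcription never blocks capture; the queue absorbs the difference.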

Multiple languages support

The ability to use multilingual models while directing language detection toward chosen languages.

Say I want to recognize English and Slovak. Whisper has a good language detector, but for shorter texts it sometimes confuses similar languages (Slovak, Czech, Slovenian), so being able to narrow down the choice would be great.
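
A sketch of narrowing the choice, assuming the recognizer can expose a probability for each candidate language (Whisper's language detection scores every supported language): restrict the candidates to the languages the user selected, take the most likely one, and then transcribe with that language fixed. The function name and the map input are assumptions for illustration.

```kotlin
/**
 * Picks the most probable language among [allowed], ignoring all others.
 * [probabilities] maps language codes (e.g. "en", "sk") to detection probabilities.
 * Returns null if none of the allowed languages were scored.
 */
fun pickLanguage(probabilities: Map<String, Float>, allowed: Set<String>): String? =
    probabilities
        .filterKeys { it in allowed }
        .maxByOrNull { it.value }
        ?.key
```

With English and Slovak allowed, a short clip that the detector rates slightly more Czech than Slovak would still come out as Slovak.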

It would be amazing to be able to use this project for those of us who routinely communicate in more than one language.

Keyboard: Auto-Send Transcription

An option to automatically send the recognized speech if the end of speech is detected and the current text box is a send type.

Default: false
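
The decision itself is small. The sketch below assumes the proposed setting is a boolean and that the field type is read from EditorInfo.imeOptions; the constant values are mirrored from android.view.inputmethod.EditorInfo so the code runs on a plain JVM. When it returns true, a keyboard would typically call InputConnection.performEditorAction with EditorInfo.IME_ACTION_SEND.

```kotlin
// Values mirrored from android.view.inputmethod.EditorInfo so this sketch runs on a plain JVM.
const val IME_MASK_ACTION = 0x000000ff
const val IME_ACTION_SEND = 0x00000004
const val IME_FLAG_NO_ENTER_ACTION = 0x40000000

/**
 * Hypothetical auto-send decision. [autoSendEnabled] models the proposed setting
 * (default false), [endOfSpeech] is the VAD's end-of-speech signal, and [imeOptions]
 * is EditorInfo.imeOptions of the focused text field.
 */
fun shouldAutoSend(autoSendEnabled: Boolean, endOfSpeech: Boolean, imeOptions: Int): Boolean {
    if (!autoSendEnabled || !endOfSpeech) return false
    // The app asked keyboards not to run the action from the enter key, so respect that.
    if ((imeOptions and IME_FLAG_NO_ENTER_ACTION) != 0) return false
    // Only auto-send when the field's action really is Send.
    return (imeOptions and IME_MASK_ACTION) == IME_ACTION_SEND
}
```

Respecting IME_FLAG_NO_ENTER_ACTION means multi-line message fields that set it would not auto-send; whether that is desirable is a design choice.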

ActionId seems to always be 0

No matter which app I'm in, the reported actionId is 0, so the Send key is shown. Other keyboards such as FlorisBoard and Gboard show the proper action.
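
One plausible explanation, if the keyboard reads EditorInfo.actionId: that field carries a custom action ID that an app pairs with EditorInfo.actionLabel, so it is typically 0. The standard action (Send, Search, Done, and so on) is encoded in the low bits of EditorInfo.imeOptions and selected with IME_MASK_ACTION. The sketch below resolves the key from imeOptions instead; the constant values are mirrored from EditorInfo, and the ActionKey enum is hypothetical.

```kotlin
// Values mirrored from android.view.inputmethod.EditorInfo so this sketch runs on a plain JVM.
// IME_ACTION_UNSPECIFIED (0) and IME_ACTION_NONE (1) fall through to ENTER below.
const val IME_MASK_ACTION = 0x000000ff
const val IME_ACTION_GO = 2
const val IME_ACTION_SEARCH = 3
const val IME_ACTION_SEND = 4
const val IME_ACTION_NEXT = 5
const val IME_ACTION_DONE = 6
const val IME_ACTION_PREVIOUS = 7
const val IME_FLAG_NO_ENTER_ACTION = 0x40000000

/** Hypothetical labels for the keyboard's action key. */
enum class ActionKey { ENTER, GO, SEARCH, SEND, NEXT, DONE, PREVIOUS }

/** Resolves the action key from imeOptions rather than from actionId. */
fun resolveActionKey(imeOptions: Int): ActionKey {
    // The app wants the key to insert a newline instead of running its action.
    if ((imeOptions and IME_FLAG_NO_ENTER_ACTION) != 0) return ActionKey.ENTER
    return when (imeOptions and IME_MASK_ACTION) {
        IME_ACTION_GO -> ActionKey.GO
        IME_ACTION_SEARCH -> ActionKey.SEARCH
        IME_ACTION_SEND -> ActionKey.SEND
        IME_ACTION_NEXT -> ActionKey.NEXT
        IME_ACTION_DONE -> ActionKey.DONE
        IME_ACTION_PREVIOUS -> ActionKey.PREVIOUS
        else -> ActionKey.ENTER
    }
}
```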

Keyboard: Explore prompting whisper with the text before the cursor

This might result in improved performance in cases of correcting a misspelled word, adding on more to a sentence, and to the middle of a sentence. If it works well enough, then the new text adaptiveness feature can be altered or even removed to let this feature work instead of overriding it.
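
A sketch of building that prompt, assuming the text comes from InputConnection.getTextBeforeCursor and is passed to whisper.cpp through the initial_prompt field of its full-transcription parameters. Whisper's text context is limited, so the sketch keeps only the trailing characters and avoids starting mid-word. The helper name is hypothetical, and the default of 200 characters is an arbitrary placeholder, not a tuned value.

```kotlin
/**
 * Builds a Whisper prompt from the text before the cursor (hypothetical helper).
 * Keeps at most [maxChars] trailing characters and drops a partial word at the start.
 */
fun promptFromTextBeforeCursor(textBeforeCursor: String, maxChars: Int = 200): String {
    require(maxChars > 0) { "maxChars must be positive" }
    val text = textBeforeCursor.trim()
    if (text.length <= maxChars) return text
    val cutIndex = text.length - maxChars
    val tail = text.substring(cutIndex)
    // The cut landed on a word boundary: keep the whole tail.
    if (text[cutIndex - 1].isWhitespace() || tail.first().isWhitespace()) return tail.trimStart()
    // Otherwise skip the partial word; if there is no boundary at all, keep the raw tail.
    val firstSpace = tail.indexOfFirst { it.isWhitespace() }
    return if (firstSpace >= 0) tail.substring(firstSpace).trimStart() else tail
}
```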

App won't work if global mic toggle is enabled when you try an initial transcription, even if you re-enable it later

When I installed Transcribro, my global microphone toggle was switched off. I gave the app the required permissions and tried to use it, but nothing happened.

I remembered that I had the microphone quick tile disabled (the app didn't prompt me to enable it, which other apps do, so it probably should).

The issue, however, is that even after the global microphone toggle was enabled, the app would continue to not be able to transcribe. It seems you have to force stop Transcribro for it to work again.

User Experience - Suggestions for Improvement

This voice keyboard app for Android phones is fantastic! I really enjoy using it. The error rate is quite low. There are just a couple of things I think could be improved.

First, would it be possible to turn off the sound notification that plays when you start and stop recording?

Second, when you start recording, a message letting the user know it's recording would be helpful. Similarly, when you stop recording, a loading or processing sign would be great. Right now, I press stop and don't know how long the processing will take. From a user interface standpoint, these small changes would make a big difference.

Punctuation keys

Generally the app seems to get punctuation pretty well. However, when working with things like list items, the only thing you can do is press return for a new line.

Similarly there's those odd things like "12 times 6 equals"

It would seem nearly impossible to account for all those types of things. Therefore the user has to go back and edit. I think that's where SayBoard, with its editable "keyboard", works well. And it's why I'm always saying it would be great if voice input had a full keyboard available 😁. I guess that's my dream 🤣😂

Nonetheless, some way to add punctuation would be excellent.

Integrated directly into something like Simple Keyboard 😁 I can dream!

Keyboard customization

The ability to customize all the keys of the voice input keyboard, including being able to delete them, modify their size, and move them around. The customization screen would be accessed from Settings.
