
vilassn / whisper_android


Offline Speech Recognition with OpenAI Whisper and TensorFlow Lite for Android

License: MIT License

Java 2.18% CMake 0.60% C++ 54.34% C 30.98% Starlark 1.84% Shell 0.23% Python 3.76% NASL 0.22% JavaScript 0.71% Ruby 0.01% Swift 1.07% Kotlin 0.71% Dart 0.57% HTML 0.01% CSS 0.01% Go 0.23% TypeScript 0.94% C# 1.11% Lua 0.23% Nim 0.24%
asr openai texttospeech tts whisper text-to-speech speech-recognition tensorflow tflite offline


whisper_android's Issues

Realtime use possible?

There are some real-time Whisper libraries out there.
Is there any way to make this library work in real time?

Get timestamps at the segment or word level

Thanks for the port.

Can this output a transcript of the provided audio with timestamps at the segment level, the word level, or both? I'm trying to transcribe audio files for dubbing, and I need precise timestamps for WAV file transcripts: basically, the start and end times of words or text segments.

OpenAI provides an API for this through the [timestamp_granularities[] parameter](https://platform.openai.com/docs/api-reference/audio/createTranscription#audio-createtranscription-timestamp_granularities).

Can you add this feature?

When I use real-time transcription and I am not talking, it seems to transcribe random text.

I was able to set up the model and it works really well. My code is:

`private fun testAudio() {
    // Initialize Whisper
    val mWhisper = Whisper(this) // Create Whisper instance

    // Load model and vocabulary for Whisper
    val basePath = Global.fileOperations.getOutputDirectory("/Models", this)!!.path
    val modelPath = basePath + "/whisper-tiny.tflite" // Provide model file path

    val vocabPath: String = basePath +
        "/filters_vocab_multilingual.bin" // Provide vocabulary file path
    println("PATHS: ")
    println(modelPath)
    println(vocabPath)
    mWhisper.loadModel(modelPath, vocabPath, true) // Load model and set multilingual mode

    // Set a listener for Whisper to handle updates and results
    mWhisper.setListener(object : IWhisperListener {
        override fun onUpdateReceived(message: String?) {
            Log.i("TRANSCRIBE_WHISPER", "New State: $message")
            // Handle Whisper status updates
        }

        override fun onResultReceived(result: String?) {
            Log.i("TRANSCRIBE_WHISPER", result ?: "")
            // Handle transcribed results
        }
    })

    // Initialize Recorder
    val mRecorder = Recorder(this) // Create Recorder instance

    // Set a listener for Recorder to handle updates and audio data
    mRecorder.setListener(object : IRecorderListener {
        override fun onUpdateReceived(message: String) {
            // Handle Recorder status updates
        }

        override fun onDataReceived(samples: FloatArray) {
            // Handle audio data received during recording
            // You can forward this data to Whisper for live recognition using writeBuffer()
            mWhisper.writeBuffer(samples)
        }
    })

    mRecorder.start() // Start recording
}`

The `onResultReceived` callback seemed to return:

[audioRecordData][fine] 5s(f:5014 m:0 s:0) : pid 8824 uid 10419 sessionId 41305 sr 16000 ch 1 fmt 1

I'll make a hole in the hole.
2 times this:

[audioRecordData][fine] 10s(f:10000 m:0 s:0) : pid 8824 uid 10419 sessionId 41305 sr 16000 ch 1 fmt 1
then
I'll be back with a little .... <== repeated a lot

Thanks for your hard work :P
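
Whisper models are known to produce spurious text when fed silence or background noise, which matches the behavior described in this issue. One possible workaround, not something this library is confirmed to provide, is to skip quiet buffers before they reach `writeBuffer()`. The sketch below is a minimal energy gate: `rms`, `isLikelySpeech`, and the `0.01f` threshold are hypothetical names and values chosen for illustration, and it assumes the recorder delivers samples normalized to the range [-1, 1].

```kotlin
import kotlin.math.sqrt

/**
 * Root-mean-square amplitude of a buffer of PCM samples.
 * Assumption: samples are normalized floats in [-1, 1].
 */
fun rms(samples: FloatArray): Float {
    if (samples.isEmpty()) return 0f
    var sumSquares = 0.0
    for (s in samples) {
        sumSquares += s.toDouble() * s
    }
    return sqrt(sumSquares / samples.size).toFloat()
}

/**
 * Hypothetical gate: returns true when the buffer is loud enough
 * to be worth forwarding to the recognizer.
 * The default threshold is an illustrative guess and would need
 * tuning per device and environment.
 */
fun isLikelySpeech(samples: FloatArray, threshold: Float = 0.01f): Boolean =
    rms(samples) >= threshold
```

In the code from this issue, the gate would sit inside `onDataReceived`, for example `if (isLikelySpeech(samples)) mWhisper.writeBuffer(samples)`. A plain energy threshold cannot tell speech from other loud sounds, so a dedicated voice activity detector may work better in noisy settings.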

Not working on virtual devices

I have tried the project as is on two virtual devices (Android 14 and 12) and one physical device (Android 12). It does not seem to run on virtual devices; you might want to mention this in the README.md.

Getting CMake exception while gradle sync/build

I cloned the repository and started the Gradle sync, but got the following exception. Can anyone help with that?

1: Task failed with an exception.
-----------
* What went wrong:
Execution failed for task ':app:configureCMakeDebug[arm64-v8a]'.
> [CXX1429] error when building with cmake using C:\Users\15010\AndroidStudioProjects\whisper_android\app\src\main\cpp\CMakeLists.txt: C++ build system [configure] failed while executing:
      @echo off
      "C:\\Users\\15010\\AppData\\Local\\Android\\Sdk\\cmake\\3.22.1\\bin\\cmake.exe" ^
        "-HC:\\Users\\15010\\AndroidStudioProjects\\whisper_android\\app\\src\\main\\cpp" ^
        "-DCMAKE_SYSTEM_NAME=Android" ^
        "-DCMAKE_EXPORT_COMPILE_COMMANDS=ON" ^
        "-DCMAKE_SYSTEM_VERSION=26" ^
        "-DANDROID_PLATFORM=android-26" ^
        "-DANDROID_ABI=arm64-v8a" ^
        "-DCMAKE_ANDROID_ARCH_ABI=arm64-v8a" ^
        "-DANDROID_NDK=C:\\Users\\15010\\AppData\\Local\\Android\\Sdk\\ndk\\23.1.7779620" ^
        "-DCMAKE_ANDROID_NDK=C:\\Users\\15010\\AppData\\Local\\Android\\Sdk\\ndk\\23.1.7779620" ^
        "-DCMAKE_TOOLCHAIN_FILE=C:\\Users\\15010\\AppData\\Local\\Android\\Sdk\\ndk\\23.1.7779620\\build\\cmake\\android.toolchain.cmake" ^
        "-DCMAKE_MAKE_PROGRAM=C:\\Users\\15010\\AppData\\Local\\Android\\Sdk\\cmake\\3.22.1\\bin\\ninja.exe" ^
        "-DCMAKE_LIBRARY_OUTPUT_DIRECTORY=C:\\Users\\15010\\AndroidStudioProjects\\whisper_android\\app\\build\\intermediates\\cxx\\Debug\\6v4z4y72\\obj\\arm64-v8a" ^
        "-DCMAKE_RUNTIME_OUTPUT_DIRECTORY=C:\\Users\\15010\\AndroidStudioProjects\\whisper_android\\app\\build\\intermediates\\cxx\\Debug\\6v4z4y72\\obj\\arm64-v8a" ^
        "-DCMAKE_BUILD_TYPE=Debug" ^
        "-BC:\\Users\\15010\\AndroidStudioProjects\\whisper_android\\app\\.cxx\\Debug\\6v4z4y72\\arm64-v8a" ^
        -GNinja
    from C:\Users\15010\AndroidStudioProjects\whisper_android\app
