spokestack / spokestack-android

Extensible Android mobile voice framework: wakeword, ASR, NLU, and TTS. Easily add voice to any Android app!

Home Page: https://spokestack.io

License: Apache License 2.0

Java 98.11% Makefile 0.38% C++ 1.51%

speech-recognition android voice-assistant wakeword asr voice-activity-detection text-to-speech nlu vad speech

spokestack-android's Issues

Custom HTTP timeouts for Spokestack TTS

The read and connect timeouts for SpokestackTTSClient should be configurable. This should be achievable by looking for new SpeechConfig properties in SpokestackTTSService and passing them directly to a new client constructor that accepts this configuration (the current constructor should set the configuration to the current values as defaults and call this new constructor).
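The constructor chaining described above might look roughly like the following. This is a minimal sketch, not the library's code: the class name, the property keys `tts-connect-timeout` and `tts-read-timeout`, the millisecond unit, the placeholder default values, and the use of a plain `Map` in place of SpeechConfig are all assumptions made for illustration.

```java
import java.util.Map;

// Hypothetical stand-in for the TTS client; not the actual SpokestackTTSClient.
final class TtsTimeoutSketch {
    // Placeholder defaults: the client's real current values are not stated in the issue.
    static final int DEFAULT_CONNECT_TIMEOUT_MS = 5000;
    static final int DEFAULT_READ_TIMEOUT_MS = 5000;

    final int connectTimeoutMs;
    final int readTimeoutMs;

    // Existing no-argument constructor: delegates with the defaults.
    TtsTimeoutSketch() {
        this(DEFAULT_CONNECT_TIMEOUT_MS, DEFAULT_READ_TIMEOUT_MS);
    }

    // New constructor that accepts explicit timeouts.
    TtsTimeoutSketch(int connectTimeoutMs, int readTimeoutMs) {
        this.connectTimeoutMs = connectTimeoutMs;
        this.readTimeoutMs = readTimeoutMs;
    }

    // What the service could do: read optional properties, fall back to defaults.
    // The Map stands in for SpeechConfig; the keys are hypothetical.
    static TtsTimeoutSketch fromConfig(Map<String, Object> config) {
        int connect = intOrDefault(config, "tts-connect-timeout", DEFAULT_CONNECT_TIMEOUT_MS);
        int read = intOrDefault(config, "tts-read-timeout", DEFAULT_READ_TIMEOUT_MS);
        return new TtsTimeoutSketch(connect, read);
    }

    static int intOrDefault(Map<String, Object> config, String key, int fallback) {
        Object value = config.get(key);
        return (value instanceof Number) ? ((Number) value).intValue() : fallback;
    }
}
```

Keeping the no-argument constructor means existing callers see no behavior change; only callers that set the new properties get different timeouts.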

Crash in WordpieceTextEncoder

I looked for the minimum SDK version, and the manifest appears to declare API 8. However, encodeSingle (line 90) in WordpieceTextEncoder calls

return this.vocabulary.getOrDefault(token,this.vocabulary.get(UNKNOWN));

and I got a crash there because getOrDefault is only supported from API 24 onward.
It would be nice to use something like

return this.vocabulary[token] ?: this.vocabulary.get(UNKNOWN)

(that's Kotlin).

Version: 11.4.1
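Since the library itself is written in Java, the same fallback can be expressed without getOrDefault by using only `Map.get`, which is available on every API level. The sketch below is illustrative: the class name, the `"[UNK]"` value for UNKNOWN, and the `int` return type are assumptions, and it presumes the vocabulary always contains the UNKNOWN entry.

```java
import java.util.Map;

// Illustrative stand-in; not the actual WordpieceTextEncoder.
final class VocabularyLookupSketch {
    // Assumed value of the UNKNOWN token; the real constant is not shown in the issue.
    static final String UNKNOWN = "[UNK]";

    // Looks up a token id, falling back to the UNKNOWN id when the token is absent.
    // Uses only Map.get, so it does not depend on API 24+.
    static int encodeSingle(Map<String, Integer> vocabulary, String token) {
        Integer id = vocabulary.get(token);
        if (id == null) {
            id = vocabulary.get(UNKNOWN);
        }
        return id;
    }
}
```

One small difference from getOrDefault: a key explicitly mapped to null also falls back to UNKNOWN here, which is harmless for a vocabulary of integer ids.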

Error when building Google Cloud ASR pipeline

Hi 👋

I'm trying to set up the Google Cloud ASR with this configuration:

var json: String? = null
try {
    val inputStream: InputStream = assets.open("service_account.json")
    json = inputStream.bufferedReader().use { it.readText() }
} catch (ex: Exception) {
    ex.printStackTrace()
}

val builder = Spokestack.Builder()
    .withoutWakeword()
    .withoutNlu()
    .setProperty("spokestack-id", "my id")
    .setProperty("spokestack-secret", "my secret")
    .withAndroidContext(this)
    .addListener(listener)
builder
    .pipelineBuilder
    .setProperty("google-credentials", json)
    .setProperty("language", "en-US")
    .useProfile("io.spokestack.spokestack.profile.VADTriggerGoogleASR")
return builder.build()

Unfortunately, this configuration throws the following exception(s):

E/AndroidRuntime: FATAL EXCEPTION: main
    Process: mypackagename, PID: 26259
    java.lang.RuntimeException: Unable to start activity ComponentInfo{mypackagename.MainActivity}: java.lang.reflect.InvocationTargetException
        at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:3448)
        at android.app.ActivityThread.handleLaunchActivity(ActivityThread.java:3595)
        at android.app.servertransaction.LaunchActivityItem.execute(LaunchActivityItem.java:83)
        at android.app.servertransaction.TransactionExecutor.executeCallbacks(TransactionExecutor.java:135)
        at android.app.servertransaction.TransactionExecutor.execute(TransactionExecutor.java:95)
        at android.app.ActivityThread$H.handleMessage(ActivityThread.java:2147)
        at android.os.Handler.dispatchMessage(Handler.java:107)
        at android.os.Looper.loop(Looper.java:237)
        at android.app.ActivityThread.main(ActivityThread.java:7814)
        at java.lang.reflect.Method.invoke(Native Method)
        at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:493)
        at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1075)
     Caused by: java.lang.reflect.InvocationTargetException
        at java.lang.reflect.Constructor.newInstance0(Native Method)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:343)
        at io.spokestack.spokestack.SpeechPipeline.createComponents(SpeechPipeline.java:203)
        at io.spokestack.spokestack.SpeechPipeline.start(SpeechPipeline.java:182)
        at io.spokestack.spokestack.Spokestack.start(Spokestack.java:182)
        at mypackagename.MainActivity.onCreate(MainActivity.kt:54)
        at android.app.Activity.performCreate(Activity.java:7955)
        at android.app.Activity.performCreate(Activity.java:7944)
        at android.app.Instrumentation.callActivityOnCreate(Instrumentation.java:1307)
        at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:3423)
        at android.app.ActivityThread.handleLaunchActivity(ActivityThread.java:3595) 
        at android.app.servertransaction.LaunchActivityItem.execute(LaunchActivityItem.java:83) 
        at android.app.servertransaction.TransactionExecutor.executeCallbacks(TransactionExecutor.java:135) 
        at android.app.servertransaction.TransactionExecutor.execute(TransactionExecutor.java:95) 
        at android.app.ActivityThread$H.handleMessage(ActivityThread.java:2147) 
        at android.os.Handler.dispatchMessage(Handler.java:107) 
        at android.os.Looper.loop(Looper.java:237) 
        at android.app.ActivityThread.main(ActivityThread.java:7814) 
        at java.lang.reflect.Method.invoke(Native Method) 
        at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:493) 
        at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1075) 
     Caused by: java.lang.NoClassDefFoundError: Failed resolution of: Lcom/google/auth/oauth2/ServiceAccountCredentials;
        at io.spokestack.spokestack.google.GoogleSpeechRecognizer.<init>(GoogleSpeechRecognizer.java:66)
        at java.lang.reflect.Constructor.newInstance0(Native Method) 
        at java.lang.reflect.Constructor.newInstance(Constructor.java:343) 
        at io.spokestack.spokestack.SpeechPipeline.createComponents(SpeechPipeline.java:203) 
        at io.spokestack.spokestack.SpeechPipeline.start(SpeechPipeline.java:182) 
        at io.spokestack.spokestack.Spokestack.start(Spokestack.java:182) 
        at mypackagename.MainActivity.onCreate(MainActivity.kt:54) 
        at android.app.Activity.performCreate(Activity.java:7955) 
        at android.app.Activity.performCreate(Activity.java:7944) 
        at android.app.Instrumentation.callActivityOnCreate(Instrumentation.java:1307) 
        at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:3423) 
        at android.app.ActivityThread.handleLaunchActivity(ActivityThread.java:3595) 
        at android.app.servertransaction.LaunchActivityItem.execute(LaunchActivityItem.java:83) 
        at android.app.servertransaction.TransactionExecutor.executeCallbacks(TransactionExecutor.java:135) 
        at android.app.servertransaction.TransactionExecutor.execute(TransactionExecutor.java:95) 
        at android.app.ActivityThread$H.handleMessage(ActivityThread.java:2147) 
        at android.os.Handler.dispatchMessage(Handler.java:107) 
        at android.os.Looper.loop(Looper.java:237) 
        at android.app.ActivityThread.main(ActivityThread.java:7814) 
        at java.lang.reflect.Method.invoke(Native Method) 
        at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:493) 
        at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1075) 
     Caused by: java.lang.ClassNotFoundException: Didn't find class "com.google.auth.oauth2.ServiceAccountCredentials" on path: DexPathList[[zip file "/data/app/mypackagename-IVppXU7KnFHxIENF0_Db1w==/base.apk"],nativeLibraryDirectories=[/data/app/mypackagename-IVppXU7KnFHxIENF0_Db1w==/lib/arm64, /data/app/mypackagename-IVppXU7KnFHxIENF0_Db1w==/base.apk!/lib/arm64-v8a, /system/lib64]]
        at dalvik.system.BaseDexClassLoader.findClass(BaseDexClassLoader.java:196)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:379)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:312)
        at io.spokestack.spokestack.google.GoogleSpeechRecognizer.<init>(GoogleSpeechRecognizer.java:66) 
        at java.lang.reflect.Constructor.newInstance0(Native Method)

I'm using the .json file from the service account configured in GCP.
What could be the issue here?

Thank you! 🙏
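The innermost "Caused by" in the trace is a ClassNotFoundException for com.google.auth.oauth2.ServiceAccountCredentials, which says that class is not packaged in the APK at runtime; it does not by itself point at the contents of the credentials file. One way to confirm this before starting the pipeline is to probe for the class. This is a minimal sketch with a hypothetical helper name; which Gradle dependency supplies the class is not stated in the report and is left open here.

```java
// Hypothetical helper for diagnosing missing optional dependencies.
final class ClasspathCheckSketch {
    // Returns true if the named class can be located by this class's loader,
    // without running its static initializers.
    static boolean isClassAvailable(String className) {
        try {
            Class.forName(className, false, ClasspathCheckSketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }
}
```

Calling `ClasspathCheckSketch.isClassAvailable("com.google.auth.oauth2.ServiceAccountCredentials")` and logging the result before `builder.build()` would show whether the Google auth classes made it into the build.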

TFWakeWordAzureASR Profile

Hello,

Your docs list TFWakewordAzureASR as a valid pipeline profile, but using it throws:

java.lang.IllegalArgumentException: TFWakewordAzureASR pipeline profile is invalid!

What is the correct way to reference the profile?
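For comparison, the Google Cloud ASR report on this page passes a fully qualified class name to useProfile ("io.spokestack.spokestack.profile.VADTriggerGoogleASR"), while the message here contains only the short name. Assuming profiles are resolved reflectively by class name (an assumption, not confirmed in this thread), a short name would fail in the way sketched below. The class and method names are hypothetical, and `java.util.ArrayList` merely stands in for a profile class.

```java
// Hypothetical illustration of reflective lookup by class name.
final class ProfileLookupSketch {
    // Instantiates the named class via its no-argument constructor,
    // or reports the name as invalid if it cannot be resolved.
    static Object loadProfile(String profileClass) {
        try {
            return Class.forName(profileClass).getConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(
                profileClass + " pipeline profile is invalid!", e);
        }
    }
}
```

With this kind of lookup, `loadProfile("ArrayList")` throws while `loadProfile("java.util.ArrayList")` succeeds, so trying the profile's fully qualified name may be worth a test; whether TFWakewordAzureASR lives in the same package as the Google profile is unverified.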

Network error while using VADTriggerAndroidASR Profile

Hi, I am trying to use the VADTriggerAndroidASR profile, which always seems to give NETWORK_ERROR after activation. Please find the log below.

Can you please suggest a solution for this?
A preliminary Google search turned up the following suggestion:

This might happen due to having an overlapping MediaRecorder or AudioRecord instance active at the same time

{ isActive: true,
      error: 'io.spokestack.spokestack.android.SpeechRecognizerError: SpeechRecognizer error code 2: NETWORK_ERROR\n\tat AndroidSpeechRecognizer$SpokestackListener.onError(AndroidSpeechRecognizer.java:143)\n\tat android.speech.SpeechRecognizer$InternalListener$1.handleMessage(SpeechRecognizer.java:450)\n\tat android.os.Handler.dispatchMessage(Handler.java:106)\n\tat android.os.Looper.loop(Looper.java:216)\n\tat android.app.ActivityThread.main(ActivityThread.java:7266)\n\tat java.lang.reflect.Method.invoke(Native Method)\n\tat com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:494)\n\tat com.android.internal.os.ZygoteInit.main(ZygoteInit.java:975)\n',
      message: null,
      transcript: '',
      event: 'ERROR' }
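The numeric code in the log comes from Android's SpeechRecognizer, where 2 is ERROR_NETWORK. When triaging reports like this one, it can help to translate codes to the platform constant names. The sketch below hard-codes the documented values of codes 1 through 9 so it runs without the Android SDK; the class and method names are hypothetical.

```java
// Hypothetical helper mirroring android.speech.SpeechRecognizer error constants (codes 1-9).
final class RecognizerErrorSketch {
    static String describe(int code) {
        switch (code) {
            case 1: return "ERROR_NETWORK_TIMEOUT";
            case 2: return "ERROR_NETWORK";
            case 3: return "ERROR_AUDIO";
            case 4: return "ERROR_SERVER";
            case 5: return "ERROR_CLIENT";
            case 6: return "ERROR_SPEECH_TIMEOUT";
            case 7: return "ERROR_NO_MATCH";
            case 8: return "ERROR_RECOGNIZER_BUSY";
            case 9: return "ERROR_INSUFFICIENT_PERMISSIONS";
            default: return "UNKNOWN(" + code + ")";
        }
    }
}
```

Note that an audio-capture conflict such as the one quoted above would more typically be expected to surface as code 3 (ERROR_AUDIO), so a persistent code 2 may also be worth checking against the device's connectivity and recognition service.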

training tflite model

Hi,
Thanks for the pipeline. Any plans to release the code to train tflite models?

Add proguard rules to keep spokestack even when used dynamically

When a project is minified using ProGuard, Spokestack classes can get removed unless they are loaded and used up front. Some apps may not want to initialize Spokestack until later (e.g. after authentication). It should be possible to add ProGuard rules to the project that keep Spokestack through minification, for instance -keep class com.pylon.spokestack.** { *; }.

Wakeword-only profile

For some use cases, it's helpful to have wakeword detection without ASR. This configuration is fully supported by Spokestack, but it would be convenient to have a premade pipeline profile that omits ASR to simplify setup.

Implementing this is as simple as copying the TFWakewordGoogleASR profile and omitting the ASR stage. It would probably also be helpful to configure low values for both wake-active-min and wake-active-max (used by the ActivationTimeout stage) since the pipeline shouldn't stay active for long, lest it miss a subsequent wakeword utterance.

Alternatively, the PreASRMicrophoneInput stage could be used as an input stage in conjunction with longer activation timeouts to allow the application to control the microphone while the pipeline is active.

A complete implementation will also add the new profile to the profile test for a sanity check.

Missing three trained TensorFlow Lite models for android

Hi, thank you for the voice pipelines. I couldn't find the three models mentioned in the docs anywhere on GitHub:

The wakeword trigger uses three trained TensorFlow Lite models: a filter model for spectrum preprocessing, an autoregressive encoder encode model, and a detect decoder model for keyword classification

Could you please point me to where they can be downloaded?

Thanks
