cay-zhang / swiftspeech

A speech recognition framework designed for SwiftUI.

License: MIT License

Swift 98.70% Ruby 1.30%
audio combine ios speech-recognition swift swiftui user-voice voice-recognition

swiftspeech's Introduction

Hi there 👋

Cay's GitHub Stats

  • 🔭 Developer of RSSBud and SwiftSpeech
  • 💬 Ask me about SwiftUI
  • 📫 Reach me on Telegram
  • 😄 Pronouns: He/Him

Support me

Donate with WeChat Donate with Alipay

swiftspeech's People

Contributors

cay-zhang


swiftspeech's Issues

Possible volume conflict between SwiftSpeech and AVAudioPlayer?

My app involves both SwiftSpeech's features and sound effects through SwiftySound. All sound effects work fine in the simulator, but on the device all sound effects stop working once the SwiftSpeech button is pressed. I have one button that makes a "click" noise when pressed. It works until I press the SwiftSpeech button.

If I press the SwiftSpeech button first, I get a sound effect the first time, but then not for subsequent presses.

I made a new, simple project without SwiftSpeech just to test the sound, and everything worked fine on the device. I also switched out SwiftySound and used the normal AVAudioPlayer procedure, and the sound works that way, too.

So the only thing I can think of is that there is a conflict between SwiftSpeech and the sound effects. Is it possible that SwiftSpeech is turning off my sound effects? If so, how do I turn them back on? My code appears below:

import SwiftUI
import SwiftSpeech
import SwiftySound
import AVFoundation
import AudioToolbox

// VIEW MODEL
struct ContentView: View {

    let emojiArray = ["🐵", "🦍", "🐶", "🐺", "🦊", "🦝", "🐱", "🦁", "🐅", "🐴", "🦓", "🦌", "🐮", "🐷", "🐏", "🐪", "🦙", "🦒", "🐘", "🦏", "🦛", "🐁", "🐀", "🐰", "🦇", "🐻", "🐨", "🐼", "🦘", "🦃", "🐔", "🐧", "🦅", "🦆", "🦢", "🦉", "🦚", "🐸", "🐊", "🐢", "🦎", "🐍", "🐳", "🐬", "🐟", "🐙", "🐌", "🦋", "🐜", "🐝", "🐞", "🦗", "🕷", "🦂", "🦟"]

    @State private var emoji = ""
    @State private var nextEmoji = ""
    @State private var text = "What is this? (Press and Hold)"
    @State private var theDescription = ""
    @State var isCorrect: Bool
    @State var player = AVAudioPlayer()

    var body: some View {
        ZStack(alignment: .top) {
            VStack(alignment: .center) {

                Text(emoji)
                    .font(.system(size: 200, weight: .bold, design: .default))
                    .onAppear {
                        emoji = emojiArray.randomElement() ?? "none"
                        theDescription = emoji.applyingTransform(.toUnicodeName, reverse: false) ?? "None"
                        print(theDescription) // the emoji's Unicode name
                    }

                Text(text)
                    .onAppear {
                        SwiftSpeech.requestSpeechRecognitionAuthorization()
                    }
                    .padding()

                SwiftSpeech.RecordButton()
            }
            .swiftSpeechRecordOnHold()
            .onRecognize { _, result in
                text = result.bestTranscription.formattedString
                print(text)
                if theDescription.contains(text.uppercased()) {
                    print("That's right")
                    text = "That's right!"
                    isCorrect = true
                    playRightSound()
                } else {
                    print("That's wrong")
                    text = "Try again!"
                    isCorrect = false
                    playWrongSound()
                }
            } handleError: { _, _ in }

            Spacer()

            Button("Change Animal") {
                nextEmoji = emojiArray.randomElement() ?? "none"
                while nextEmoji == emoji {
                    nextEmoji = emojiArray.randomElement() ?? "none"
                }
                playClickSound()
                emoji = nextEmoji
                text = "What is this? (Press and hold)"
                theDescription = emoji.applyingTransform(.toUnicodeName, reverse: false) ?? "None"
                print(theDescription)
            }
        }
    }

    func playRightSound() {
        print("Playing right sound")
        Sound.play(file: "yay.wav")
    }

    func playWrongSound() {
        Sound.play(file: "raspberry.wav")
    }

    func playClickSound() {
        Sound.play(file: "click.wav")
    }
}

struct ContentView_Previews: PreviewProvider {
    static var previews: some View {
        ContentView(isCorrect: true)
    }
}
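Not a definitive answer, but one thing worth trying: SwiftSpeech configures the shared audio session when recording starts, and a record-only category silences other playback. Another issue below passes a .playAndRecord audio session configuration; a sketch along those lines (the parameter usage is assumed from the library's public API, not confirmed as a fix):

```swift
// Sketch, not a confirmed fix: pass a .playAndRecord audio session
// configuration so sound effects can keep playing while recording.
SwiftSpeech.RecordButton()
    .swiftSpeechRecordOnHold(
        sessionConfiguration: .init(audioSessionConfiguration: .playAndRecord)
    )
```

If that is not enough, reactivating your own AVAudioSession category after recording stops is the other avenue to investigate.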

Graceful notification of microphone activation failure

Hi!

I've encountered an exception due to a bug in iOS 14 beta 4 + AirPods that breaks the AirPods microphone (at the software level, hopefully). In system apps (iMessage, Recorder, ...), the issue prevents voice recording/recognition from working but does not crash the app. With SwiftSpeech, the app crashes with an uncaught exception.

Is it possible to catch such a failure and pass the error along gracefully, or at least prevent the crash?

The log message is:
Terminating app due to uncaught exception 'com.apple.coreaudio.avfaudio', reason: 'required condition is false: IsFormatSampleRateAndChannelCountValid(format)'
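Until the library guards against this, a pre-flight check is one possible mitigation for this exact exception: it fires when the input hardware reports an invalid format (0 Hz sample rate or 0 channels), which is what the broken AirPods route produces. A hedged sketch only; the engine instance and call site are assumptions, since SwiftSpeech owns the real AVAudioEngine:

```swift
import AVFoundation

// Returns false when the input route reports a format that would make
// AVAudioEngine trap with IsFormatSampleRateAndChannelCountValid.
func inputFormatIsValid(_ engine: AVAudioEngine) -> Bool {
    let format = engine.inputNode.outputFormat(forBus: 0)
    return format.sampleRate > 0 && format.channelCount > 0
}
```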

Automatic stop of recording after some seconds of silence

Hello ✌️ Thank you for such a wonderful library!

In my app I wanted to implement something similar to dictation button in Safari search:

  1. User taps button
  2. User speaks
  3. When user is not speaking for about 2 seconds, dictation stops automatically
(demo video: RPReplay_Final1698318345.MP4)

This way the user doesn't need to tap the button again to stop dictation. There are built-in modifiers, swiftSpeechRecordOnHold and swiftSpeechToggleRecordingOnTap, but both require additional interaction from the user. I also needed a different button.

Here is how I solved it; maybe it will be helpful for somebody in the future. I'd be happy to hear any comments on how this could be done better:


import SwiftUI
import SwiftSpeech

// Creating new extension with custom record button view
public extension SwiftSpeech {
    struct RecordButtonCustom: View {
        public var body: some View {
            RecordButtonView()
        }
    }
}

// Define new EnvironmentKey for custom state
struct DictationState: EnvironmentKey {
    static let defaultValue: SwiftSpeech.State = .pending
}

// Define new Environment Values for custom state
extension EnvironmentValues {
    var dictationState: SwiftSpeech.State {
        get {
            self[DictationState.self]
        }
        set {
            self[DictationState.self] = newValue
        }
    }
}

struct SwiftSpeechView: View {
    @State private var text = "Tap to Speak"
    @State private var timer: Timer?
    @State var dictationState: SwiftSpeech.State = .pending
    
    var body: some View {
        VStack() {

            Text(text)
            
            SwiftSpeech
                .RecordButtonCustom()
                .swiftSpeechToggleRecordingOnTap(locale: Locale(identifier: "en_US"))
                .onRecognizeLatest(
                    includePartialResults: true,
                    handleResult: { session, result in
                        text = result.bestTranscription.formattedString
                        
                        timer?.invalidate()
                        // initiate timer to stop recording after 2 seconds of silence
                        timer = Timer.scheduledTimer(withTimeInterval: 2.0, repeats: false) { timer in
                            session.stopRecording()
                            dictationState = .pending
                        }
                    },
                    handleError: { session, error in
                        text = "Error \((error as NSError).code)"
                    
                        session.stopRecording()
                        dictationState = .pending
                })
                .onStartRecording { session in
                    dictationState = .recording
                }
                .onStopRecording { session in
                    dictationState = .pending
                }
                .onCancelRecording{ session in
                    dictationState = .cancelling
                }

        }
        .onAppear {
            SwiftSpeech.requestSpeechRecognitionAuthorization()
        }
        .environment(\.dictationState, dictationState)
    }
}

#Preview {
    SwiftSpeechView()
}
import SwiftUI
import SwiftSpeech

struct RecordButtonView: View {
    
    @Environment(\.dictationState) var state: SwiftSpeech.State
    
    public init() { }
    
    var icon: String {
        switch state {
        case .pending:
            return "mic"
        case .recording:
            return "mic.fill"
        case .cancelling:
            return "xmark"
        }
    }
    
    public var body: some View {
        Button("Dictate", systemImage: icon, action: {
            print("Dictate")
        })
        .buttonStyle(.borderless)
        .labelStyle(.iconOnly)
        .help("Dictate")
    }
    
}

#Preview {
    RecordButtonView()
}

My stack:
Xcode 15.1 beta
visionOS 1.0

iOS 16 beta 2

This worked fine up to iOS 15.6. On iOS 16 beta 2 (iPhone 12 Pro Max), I always get "Thread 1: Fatal error: recordingSession is nil in endRecording()" in the following function when I release the speech button:

fileprivate func endRecording() {
    guard let session = recordingSession else { preconditionFailure("recordingSession is nil in \(#function)") }
    recordingSession?.stopRecording()
    delegate.onStopRecording(session: session)
    self.viewComponentState = .pending
    self.recordingSession = nil
}
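The trap is deliberate: preconditionFailure fires when endRecording() runs while recordingSession is already nil, for example when the gesture ends twice in quick succession. A defensive variant returns early instead of trapping. A minimal standalone sketch with stand-in types (the real ones live inside SwiftSpeech):

```swift
// Stand-in session type; illustrates the guard-and-return pattern only.
final class FakeSession {
    private(set) var stopped = false
    func stopRecording() { stopped = true }
}

final class Recorder {
    var recordingSession: FakeSession? = FakeSession()

    func endRecording() {
        // Early return instead of preconditionFailure: a repeated call is a no-op.
        guard let session = recordingSession else { return }
        session.stopRecording()
        recordingSession = nil
    }
}

let recorder = Recorder()
recorder.endRecording()
recorder.endRecording() // previously the second call would trap
print(recorder.recordingSession == nil) // true
```

Whether swallowing the second call silently is the right behavior for the library is a design question; it trades a crash for a possible missed state-transition bug.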

Install issues

When I use this install URL, I get the error "Unable to find a specification for 'SwiftSpeech'".

Could you please tell me how to install it through CocoaPods?

SwiftSpeech and Other Languages

I know from the examples that SwiftSpeech can handle all supported languages, but I don't see how to enable this. I gather from the example that I must add something like this for Hebrew:

public init(locale: Locale = .autoupdatingCurrent) {
    self.locale = locale
}

public init(localeIdentifier: String) {
    self.locale = Locale(identifier: "he-IL")
}

but I don't understand how to use this setting in SwiftSpeech:

Text(text)
    .onAppear {
        SwiftSpeech.requestSpeechRecognitionAuthorization()
    }

SwiftSpeech.RecordButton()
    .swiftSpeechRecordOnHold(sessionConfiguration: .init(audioSessionConfiguration: .playAndRecord))
    .onRecognize { _, result in
        text = result.bestTranscription.formattedString
        if text == word { // word from array, checking pronunciation
            playRightSound()
        } else {
            playWrongSound()
        }
    } handleError: { _, _ in }

I appreciate your help!
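A possible shortcut, rather than editing the initializers: the record modifiers take a session configuration, and its first parameter is a locale (another issue above calls the toggle-on-tap variant with a locale directly). A hedged sketch for Hebrew, assuming the configuration initializer accepts a locale parameter:

```swift
// Sketch: pass the locale at the call site instead of changing init(localeIdentifier:).
SwiftSpeech.RecordButton()
    .swiftSpeechRecordOnHold(
        sessionConfiguration: .init(
            locale: Locale(identifier: "he-IL"),
            audioSessionConfiguration: .playAndRecord
        )
    )
```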

Swift Playgrounds Compatibility

@Cay-Zhang
Adding this to Swift Playgrounds on iPadOS results in an error message: โ€œpackage doesn't have version tagsโ€. I fixed that in my fork by adding a new tag removing the โ€œvโ€œ prefix.

You can also have a look at this:
erikdoe/ocmock#496

Unable to use with TextToSpeech

Firstly, the library is awesome.

But I ran into an issue when trying to use it together with text-to-speech.

You can reproduce the issue with the following code:

import AVFoundation
func onSpeechToTextEnded() {
   let utterance = AVSpeechUtterance(string: "Hello world")
   utterance.voice = AVSpeechSynthesisVoice(language: "en-GB") 

   let synthesizer = AVSpeechSynthesizer()
   synthesizer.speak(utterance)
}

If I call this function (onSpeechToTextEnded) before actually using this library, I can hear the voice. But when I call it after using the library, no sound plays.

Could you please investigate this issue?
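One thing to rule out first, independent of SwiftSpeech: the synthesizer above is a local constant, so it can be deallocated as soon as the function returns, before the utterance finishes playing. Keeping it alive as a stored property is the usual fix (the Speaker class name is illustrative); the audio session category left active after recording is the other suspect:

```swift
import AVFoundation

final class Speaker {
    // Stored property keeps the synthesizer alive while it is speaking.
    private let synthesizer = AVSpeechSynthesizer()

    func speak(_ text: String) {
        let utterance = AVSpeechUtterance(string: text)
        utterance.voice = AVSpeechSynthesisVoice(language: "en-GB")
        synthesizer.speak(utterance)
    }
}
```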

Speech Recognition string does not match a (hard coded) string

I assign the speech recognition result to @State var speechRecogText and check whether the hardcoded string textFieldText contains it. This works properly and prints "contains voice text" in English, but with Arabic it does not.

@State var speechRecogText: String = ""
private var textFieldText: String = "قل"

if textFieldText.contains(speechRecogText) {
    print("contains voice text")
} else {
    print("doesn't contain voice text")
}

Console
// doesn't contain voice text

However when I try to swap the variables like this:

if speechRecogText.contains(textFieldText) {
    print("contains text")
} else {
    print("doesn't contain text")
}

Console
// contains text

What might be the reason for this? Does it have anything to do with the language, or with how Strings behave?
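The asymmetry is expected in any language: contains(_:) succeeds only when its argument is a substring of the receiver, and a recognizer usually returns a longer phrase than the single hardcoded word. Arabic adds a second wrinkle: visually identical strings can differ in diacritics or Unicode normalization form. A runnable sketch (the recognized phrase is an assumed example):

```swift
import Foundation

let hardcoded = "قل"
let recognized = "قل لي"   // recognizers often return a longer phrase

// A short string never contains a longer one...
print(hardcoded.contains(recognized))   // false
// ...but the longer transcription can contain the word:
print(recognized.contains(hardcoded))   // true

// Normalizing both sides guards against Unicode-form mismatches:
let a = hardcoded.precomposedStringWithCanonicalMapping
let b = recognized.precomposedStringWithCanonicalMapping
print(b.contains(a))                    // true
```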
