Comments (13)
I'd also like to see an SFSpeechRecognizer-like API for easy replacement of SFSpeechRecognizer.
from swiftwhisper.
+1 for this feature
from swiftwhisper.
Yes, this would be a great feature.
from swiftwhisper.
Right now, I'm using a timer to start and stop the transcription every 2 seconds. However, it's not very accurate: if a word is cut off at a chunk boundary, Whisper tries to improvise, and the text often contains hallucinations.
from swiftwhisper.
How would Whisper officially support real time? The cut-off issue is the same for the official library, correct? @fakerybakery
from swiftwhisper.
How would Whisper officially support real time? The cut-off issue is the same for the official library, correct? @fakerybakery
The whisper.cpp repo has examples of how to implement real-time transcription.
from swiftwhisper.
I think that the whisper.cpp library stores some of the previous recording history and uses that to fix the cutoff issue, but I'm not sure.
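To make the "keep some previous audio" idea concrete, here is a minimal sketch in plain Swift. It is not taken from whisper.cpp or SwiftWhisper: `OverlappingChunker`, `stepSize`, and `keepSize` are hypothetical names, and the sketch only assumes that audio arrives as `[Float]` samples (Whisper models expect 16 kHz mono). Each emitted window starts with the tail of the previous one, so a word cut at a boundary is heard again in full.

```swift
/// Hypothetical helper (not part of SwiftWhisper or whisper.cpp).
/// Collects incoming samples and emits windows that begin with the
/// last `keepSize` samples of the previous window.
struct OverlappingChunker {
    let stepSize: Int   // new samples consumed per window
    let keepSize: Int   // samples carried over from the previous window
    private var pending: [Float] = []   // samples not yet emitted
    private var carry: [Float] = []     // tail of the last emitted window

    init(stepSize: Int, keepSize: Int) {
        precondition(stepSize > 0 && keepSize >= 0)
        self.stepSize = stepSize
        self.keepSize = keepSize
    }

    /// Appends samples and returns every window that is now complete.
    mutating func append(_ samples: [Float]) -> [[Float]] {
        pending.append(contentsOf: samples)
        var windows: [[Float]] = []
        while pending.count >= stepSize {
            let fresh = Array(pending.prefix(stepSize))
            pending.removeFirst(stepSize)
            let window = carry + fresh              // overlap + new audio
            carry = Array(window.suffix(keepSize))  // remember the tail
            windows.append(window)
        }
        return windows
    }
}

// Tiny demonstration with integers standing in for audio samples.
var chunker = OverlappingChunker(stepSize: 4, keepSize: 2)
let windows = chunker.append([1, 2, 3, 4, 5, 6, 7, 8, 9])
// windows[0] is [1, 2, 3, 4]
// windows[1] is [3, 4, 5, 6, 7, 8] (starts with the tail of windows[0])
```

With real audio you would pick something like `stepSize = 2 * 16_000` for 2-second steps and pass each window to the transcription call. Because consecutive windows overlap, their transcripts overlap too, so the repeated words still need to be merged or dropped afterwards.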
from swiftwhisper.
How would Whisper officially support real time? The cut-off issue is the same for the official library, correct? @fakerybakery
The whisper.cpp repo has examples of how to implement real-time transcription.
Thanks for pointing that out. Just curious, why is the example in Obj-C and not in Swift?
from swiftwhisper.
I don't know why, but if someone could port the example to Swift, I would really appreciate that (I'm really bad at Obj-C).
from swiftwhisper.
I think that the whisper.cpp library stores some of the previous recording history and uses that to fix the cutoff issue, but I'm not sure.
Yep, I believe it does too – see this line (and line 245)
from swiftwhisper.
I don't have a great understanding, but to me it looks like whisper.objc stores the contents of a buffer when it fills up, then calls its transcribe function on what it just stored, while clearing the buffer and re-enqueuing it. I don't know a ton about AVFAudio, but does anyone know if you could use AVAudioEngine and AVAudioPCMBuffer to create similar functionality? I'm thinking you could call Whisper.transcribe here with the buffer data, if you can get that buffer data back from AVAudioEngine. Does anyone know if that would work?
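For what it's worth, here is an untested-on-device sketch of how that could look with AVAudioEngine. It assumes an Apple platform with microphone permission already granted (and, on iOS, an active AVAudioSession). `monoSamples(from:)` and `startCapture(engine:onSamples:)` are hypothetical helper names, and the hand-off to the transcription call is left to the `onSamples` closure rather than guessed at. The tap delivers buffers in the hardware format, so the sketch converts them to 16 kHz mono Float32 with AVAudioConverter first.

```swift
import AVFoundation

/// Hypothetical helper: copies the first channel of a Float32 PCM buffer
/// into a Swift array.
func monoSamples(from buffer: AVAudioPCMBuffer) -> [Float] {
    guard let channels = buffer.floatChannelData else { return [] }
    return Array(UnsafeBufferPointer(start: channels[0],
                                     count: Int(buffer.frameLength)))
}

/// Hypothetical helper: taps the microphone and passes 16 kHz mono samples
/// to `onSamples` each time the tap delivers a buffer.
func startCapture(engine: AVAudioEngine,
                  onSamples: @escaping ([Float]) -> Void) throws {
    let input = engine.inputNode
    let inputFormat = input.outputFormat(forBus: 0)
    guard let targetFormat = AVAudioFormat(commonFormat: .pcmFormatFloat32,
                                           sampleRate: 16_000,
                                           channels: 1,
                                           interleaved: false),
          let converter = AVAudioConverter(from: inputFormat, to: targetFormat)
    else {
        throw NSError(domain: "Capture", code: 1)
    }

    input.installTap(onBus: 0, bufferSize: 4096, format: inputFormat) { buffer, _ in
        // Size the output buffer for the resampled frame count.
        let ratio = targetFormat.sampleRate / inputFormat.sampleRate
        let capacity = AVAudioFrameCount(Double(buffer.frameLength) * ratio) + 1
        guard let output = AVAudioPCMBuffer(pcmFormat: targetFormat,
                                            frameCapacity: capacity) else { return }

        // Feed the tapped buffer to the converter exactly once.
        var consumed = false
        var error: NSError?
        let status = converter.convert(to: output, error: &error) { _, inputStatus in
            if consumed {
                inputStatus.pointee = .noDataNow
                return nil
            }
            consumed = true
            inputStatus.pointee = .haveData
            return buffer
        }
        if status != .error, error == nil {
            onSamples(monoSamples(from: output))
        }
    }
    try engine.start()
}
```

The closure is called on an audio thread, so in practice you would append the samples to your own buffer there and run the transcription elsewhere once enough audio has accumulated.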
from swiftwhisper.
@barkb have you ever found a solution to this real-time idea?
from swiftwhisper.