
swiftwhisper's People

Contributors

exphat, jordibruin


swiftwhisper's Issues

Split segment to words

First of all, thank you for your work.

I have a question: when I transcribe an audio file as a PCM [Float] array, I receive [Segment] as the result. I noticed that each Segment may contain a whole sentence rather than a single word. How can I split the output into separate words, with a timestamp for each?

I tried the following WhisperParams fields:

  1. max_len = 1
  2. split_on_word = true

but the result is always the same. The only thing that helps me reduce the number of words per segment is the beamSearch strategy, but I still get sentences instead of separate words.

My code:

let params = WhisperParams(strategy: .beamSearch)
params.max_len = 1
params.split_on_word = true
whisper = Whisper(fromFileURL: modelUrl, withParams: params)

Progress delegate is not working

I tried it, but the callback is never fired.

If we check the underlying whisper.cpp code, it also never reports progress up to the Objective-C or Swift layer:

// main loop
while (true) {
    const int progress_cur = (100*(seek - seek_start))/(seek_end - seek_start);
    while (progress_cur >= progress_prev + progress_step) {
        progress_prev += progress_step;
        if (params.print_progress) {
            fprintf(stderr, "%s: progress = %3d%%\n", func, progress_prev);
        }
    }

How does it work? Does it even work?
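
For reference, this is roughly how I expected to consume progress, assuming the package exposes a WhisperDelegate protocol with a didUpdateProgress callback. The names below are my guess at the API, not confirmed; treat this as an untested sketch.

import SwiftWhisper

// Sketch only: assumes WhisperDelegate and didUpdateProgress exist as written;
// if the real protocol has more requirements, add empty stubs for them.
final class ProgressLogger: WhisperDelegate {
    func whisper(_ aWhisper: Whisper, didUpdateProgress progress: Double) {
        print("transcription progress: \(Int(progress * 100))%")
    }
}

func transcribeWithProgress(modelURL: URL, frames: [Float]) async throws -> [Segment] {
    let whisper = Whisper(fromFileURL: modelURL, withParams: WhisperParams())
    let logger = ProgressLogger()
    whisper.delegate = logger   // assumption: the wrapper exposes a delegate property
    return try await whisper.transcribe(audioFrames: frames)
}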

Add support for word-level timestamps `--word_timestamps`

First of all: Thank you for coding this Swift Package — it’s terrific! 🙏

What I’m missing: I’d love to get word-level timestamps, as mentioned in the Whisper API.

From my understanding, this would require being able to set --word_timestamps to true. (Maybe WhisperParams would be a good place for that?)
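
For illustration, this is roughly what I would hope to write, assuming the underlying whisper.cpp token_timestamps flag were bridged into WhisperParams. The token_timestamps property below is hypothetical; max_len and split_on_word are the existing fields.

let params = WhisperParams(strategy: .greedy)
params.token_timestamps = true  // hypothetical property, not currently exposed
params.max_len = 1              // whisper.cpp pairs this with token timestamps for word-level output
params.split_on_word = true
let whisper = Whisper(fromFileURL: modelURL, withParams: params)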

Keep up the great work!

Best, Martin

Error initializing Segment outside SwiftWhisper

If I try to initialize a Segment struct inside my app, I get the error: "'Segment' initializer is inaccessible due to 'internal' protection level"

See: https://docs.swift.org/swift-book/documentation/the-swift-programming-language/accesscontrol/

"The default memberwise initializer for a structure type is considered private if any of the structure’s stored properties are private. Likewise, if any of the structure’s stored properties are file private, the initializer is file private. Otherwise, the initializer has an access level of internal."

So the built-in memberwise initializer is only available within the package. If you don't provide a public initializer, the struct can't be created from outside the module. (https://stackoverflow.com/questions/54673224/public-struct-in-framework-init-is-inaccessible-due-to-internal-protection-lev)
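
A minimal illustration of that rule with a made-up type (not part of SwiftWhisper):

// Inside the package:
public struct ExampleSegment {
    public let startTime: Int
    public let text: String

    // Without this explicit public initializer, the synthesized memberwise
    // init is internal and cannot be called from another module.
    public init(startTime: Int, text: String) {
        self.startTime = startTime
        self.text = text
    }
}

// Inside the app target:
let segment = ExampleSegment(startTime: 0, text: "hello")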

unsafe build flags?

I added SwiftWhisper as an SPM dependency, but I am getting an error about "unsafe build flags". I will try to track it down if I have time, but I wanted to check whether you already know what it is.

(Screenshot of the Xcode error attached, 2023-04-01.)

Crash when using Large-v3 model

When I use the large-v3 model in my SwiftUI app, the app crashes. The model loads correctly, but then fails with:

Assertion failed: (mel_inp.n_mel == n_mels), function whisper_encode_internal, file whisper.cpp, line 1430.

Everything works as expected with the medium model.

CPU cost 400% always

Is there an interface exposed to control the number of threads used? It uses 4 threads and still takes a really long time to transcribe a couple of seconds of audio. If we could use more threads and take better advantage of the chip's capabilities, would that be faster?
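
For context, whisper.cpp's whisper_full_params does have an n_threads field; if SwiftWhisper bridged it (which is an assumption on my part, not confirmed API), usage might look like this:

let params = WhisperParams(strategy: .greedy)
params.n_threads = 8  // hypothetical: assumes the whisper.cpp n_threads field is exposed
let whisper = Whisper(fromFileURL: modelURL, withParams: params)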

Setting --max-len does not change the value of the results

I have experimented with both the actual whisper.cpp library and this library, setting max-len to the same value so that I can control the number of words per segment. It does not work as expected in SwiftWhisper: it effectively ignores the value of --max-len, whereas the original C++ library does not.

Diarization

Hi!
First of all, great library, thanks for that!
Second, is there an API for diarization?

memory free up issue

How do I release the model if I no longer need it or want to initialize another one? It seems the memory is never freed.
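
In case it helps the discussion, this is the behaviour I would expect, assuming the wrapper frees the whisper.cpp context when the Whisper object is deallocated (I have not verified that it does):

// Sketch only: relies on the assumption that deinit calls whisper_free.
// tinyModelURL / mediumModelURL are placeholders for your own model file URLs.
var whisper: Whisper? = Whisper(fromFileURL: tinyModelURL, withParams: WhisperParams())
// ... transcribe ...
whisper = nil                                                              // drop the last reference so the context can be freed
whisper = Whisper(fromFileURL: mediumModelURL, withParams: WhisperParams()) // then load the other model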

Split on word

Firstly, thanks for sharing your library, it's great.

I'm trying to get a transcription that splits on each word. I understand (perhaps incorrectly) that to do this I need to set max_len=1 and split_on_word=true. Found here: https://github.com/ggerganov/whisper.cpp#word-level-timestamp

However, I see no change in the segments: they always seem to be split according to the default/automatic settings. Please let me know if I'm doing something wrong. Here's my code:

let params = WhisperParams()
params.language = .english
params.max_len = 1
params.split_on_word = true

let whisper = Whisper(fromFileURL: Bundle.main.url(forResource: "ggml-tiny.en", withExtension: "bin")!, withParams: params)

let segments = try await whisper.transcribe(audioFrames: audioFrames)
transcription = segments.map(\.text).joined()

Are there plans to support more parameters in WhisperParams?

For example, initial_prompt, max_len, split_on_word, etc. For reference, here is the current initializer:

public init(strategy: WhisperSamplingStrategy = .greedy) {
    self.whisperParams = whisper_full_default_params(whisper_sampling_strategy(rawValue: strategy.rawValue))
    self.language = .auto
}
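
For illustration, one way the remaining whisper.cpp fields could be surfaced is with computed properties that forward to the wrapped whisper_full_params value. This is only a sketch, written as if it lived inside the SwiftWhisper module; it assumes the wrapped C struct is stored as whisperParams and mirrors whisper.cpp's field names.

extension WhisperParams {
    // Hypothetical bridging; field names follow whisper.cpp's whisper_full_params.
    public var maxLen: Int32 {
        get { whisperParams.max_len }
        set { whisperParams.max_len = newValue }
    }

    public var splitOnWord: Bool {
        get { whisperParams.split_on_word }
        set { whisperParams.split_on_word = newValue }
    }

    // initial_prompt is a `const char *`, so bridging it would also have to manage
    // the lifetime of the C string; omitted here.
}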

crash when initializing with an invalid model

When Whisper.init(fromFileURL) is called with a file URL that is a file that exists, but not a valid model file, the error condition from the underlying whisper.cpp library is not handled.

Specifically:
self.whisperContext = fileURL.relativePath.withCString { whisper_init_from_file($0) }

whisper_init_from_file will return nullptr in this case. The attempted assignment then produces the following error, which crashes the program using the library:

whisper_init_from_file_no_state: loading model from '.'
whisper_model_load: loading model
whisper_model_load: invalid model data (bad magic)
whisper_init_no_state: failed to load model
SwiftWhisper/Whisper.swift:16: Fatal error: Unexpectedly found nil while implicitly unwrapping an Optional value

I'd like to add a check to this initializer so my program can catch and safely handle this case. In theory, my program could try to figure out whether the file is a valid model, but that would mean re-implementing the detection code from whisper.cpp that I am trying to wrap. Letting the code that already does this error handling simply pass the error through seems like a better arrangement.

Doing so would probably require changing the init signature to a throwing or failable one. I understand that this would involve an API change here. Is there a way to handle this that would be likely to be accepted as a PR? Is there a more general plan for how to handle this sort of error case?
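
For discussion, a rough sketch of the failable variant I have in mind. The whisperContext name comes from the line quoted above; everything else is illustrative, and a throwing initializer with a dedicated error type would work just as well.

public init?(fromFileURL fileURL: URL, withParams params: WhisperParams) {
    // whisper_init_from_file returns nullptr when the file is not a valid model.
    guard let context = fileURL.relativePath.withCString({ whisper_init_from_file($0) }) else {
        return nil
    }
    self.whisperContext = context
    self.params = params  // illustrative; match whatever the class actually stores
}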

License

I know your repo is very new, but I was wondering if you could add a license so I know whether I can use it in my project?

large-v3

Are there any new updates for the tiny and medium models? Large doesn't work very well on mobile, and I don't expect v3 to improve that much.

Real-time transcription

Hey, awesome package!

I wanted to ask how one could use this for on-device real-time transcription with microphone audio, similar to the Objective-C example from the whisper.cpp project.
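
For context, here is a rough, untested sketch of what I have in mind: tap the microphone with AVAudioEngine, resample to the 16 kHz mono Float frames that transcribe(audioFrames:) expects, and run a transcription pass on buffered chunks. Everything here besides transcribe(audioFrames:) is plain AVFoundation, not SwiftWhisper API, and whisper.cpp is not a streaming decoder, so "real-time" really means transcribing short windows of audio.

import AVFoundation
import SwiftWhisper

final class MicTranscriber {
    private let engine = AVAudioEngine()
    private var frames: [Float] = []
    private let whisper: Whisper

    init(whisper: Whisper) {
        self.whisper = whisper
    }

    // Caller is responsible for microphone permission / AVAudioSession setup on iOS.
    func start() throws {
        let input = engine.inputNode
        let inputFormat = input.outputFormat(forBus: 0)
        let targetFormat = AVAudioFormat(commonFormat: .pcmFormatFloat32,
                                         sampleRate: 16_000,
                                         channels: 1,
                                         interleaved: false)!
        let converter = AVAudioConverter(from: inputFormat, to: targetFormat)!

        input.installTap(onBus: 0, bufferSize: 4096, format: inputFormat) { [weak self] buffer, _ in
            guard let self else { return }

            // Convert this tap buffer to 16 kHz mono Float32.
            let capacity = AVAudioFrameCount(Double(buffer.frameLength) * 16_000 / inputFormat.sampleRate) + 1
            guard let converted = AVAudioPCMBuffer(pcmFormat: targetFormat, frameCapacity: capacity) else { return }
            var fed = false
            _ = converter.convert(to: converted, error: nil) { _, status in
                if fed { status.pointee = .noDataNow; return nil }
                fed = true
                status.pointee = .haveData
                return buffer
            }
            if let channel = converted.floatChannelData?[0] {
                self.frames.append(contentsOf: UnsafeBufferPointer(start: channel,
                                                                   count: Int(converted.frameLength)))
            }

            // Every ~5 seconds of audio, transcribe the buffered chunk.
            if self.frames.count >= 16_000 * 5 {
                let chunk = self.frames
                self.frames.removeAll(keepingCapacity: true)
                Task {
                    let segments = try await self.whisper.transcribe(audioFrames: chunk)
                    print(segments.map(\.text).joined())
                }
            }
        }

        try engine.start()
    }
}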

Is there a demo example?

Is there a demo example showing how to download the models on demand in the app and then use those models to transcribe? Thank you
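
For context, this is roughly the flow I imagine, assuming the model is hosted as a plain file. The Hugging Face URL below is only an example location; substitute wherever you actually host the model.

import SwiftWhisper

// Illustrative sketch: download a ggml model on demand, cache it, then load it.
func downloadAndTranscribe(frames: [Float]) async throws -> [Segment] {
    let remote = URL(string: "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-tiny.bin")!
    let cacheDir = try FileManager.default.url(for: .applicationSupportDirectory,
                                               in: .userDomainMask,
                                               appropriateFor: nil,
                                               create: true)
    let modelURL = cacheDir.appendingPathComponent("ggml-tiny.bin")

    if !FileManager.default.fileExists(atPath: modelURL.path) {
        let (tempURL, _) = try await URLSession.shared.download(from: remote)
        try FileManager.default.moveItem(at: tempURL, to: modelURL)
    }

    let whisper = Whisper(fromFileURL: modelURL, withParams: WhisperParams())
    return try await whisper.transcribe(audioFrames: frames)
}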

MLModelAsset: load failed with error

I've been following the whisper.cpp project to create the mlmodelc file. However, I've encountered an issue where the weights/weight.bin file, which is required by SwiftWhisper, is not being created.

So when I run the project with SwiftWhisper's Core ML support, the exact error message I'm receiving is:

Could not open .../ggml-base-encoder.mlmodelc/weights/weight.bin

I'm not sure what I might be missing or doing incorrectly. Any guidance or suggestions would be greatly appreciated.

missing required module whisper_cpp

I've tried using both the master and fast versions of this package in Xcode and keep running into this issue; the error says I'm missing a required module called whisper_cpp.

Thank you!

Example?

Hello,
I'm relatively new to Swift, and I got confused by the AudioKit convertAudioFileToPCMArray example.
Does anyone have a working code example I could refer to?
Thank you!
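
In case it helps, here is a rough sketch of the same idea using only AVFoundation instead of AudioKit: read the file, resample to 16 kHz mono Float32, and hand the samples to transcribe(audioFrames:). Treat it as untested.

import AVFoundation
import SwiftWhisper

// Sketch: convert an audio file into the 16 kHz mono Float array whisper expects.
func pcmFrames(from url: URL) throws -> [Float] {
    let file = try AVAudioFile(forReading: url)
    let targetFormat = AVAudioFormat(commonFormat: .pcmFormatFloat32,
                                     sampleRate: 16_000,
                                     channels: 1,
                                     interleaved: false)!
    let converter = AVAudioConverter(from: file.processingFormat, to: targetFormat)!

    let inBuffer = AVAudioPCMBuffer(pcmFormat: file.processingFormat,
                                    frameCapacity: AVAudioFrameCount(file.length))!
    try file.read(into: inBuffer)

    let ratio = targetFormat.sampleRate / file.processingFormat.sampleRate
    let outCapacity = AVAudioFrameCount(Double(inBuffer.frameLength) * ratio) + 1
    let outBuffer = AVAudioPCMBuffer(pcmFormat: targetFormat, frameCapacity: outCapacity)!

    var fed = false
    var conversionError: NSError?
    _ = converter.convert(to: outBuffer, error: &conversionError) { _, status in
        if fed { status.pointee = .endOfStream; return nil }
        fed = true
        status.pointee = .haveData
        return inBuffer
    }
    if let conversionError { throw conversionError }

    let channel = outBuffer.floatChannelData![0]
    return Array(UnsafeBufferPointer(start: channel, count: Int(outBuffer.frameLength)))
}

// Usage:
// let frames = try pcmFrames(from: audioFileURL)
// let segments = try await whisper.transcribe(audioFrames: frames)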

GPU vs CPU

Is it possible to use this on either the CPU or the GPU, specifically on macOS Apple Silicon machines? Is this configurable, automatic, or not available?

Thanks

Running compiled C++ version of program is faster than using Library

I expected that since the Swift package uses the C++ code through interop, it would be just as fast. I did a test transcription using the same WAV file and the base.en model. Running the main example from whisper.cpp directly takes 2.7 s to complete, while the Swift package takes more than 10 s for the same model and WAV file. I have no idea why this is happening. Can someone explain it to me?
