Unable to load models (whisperkit, 11 comments, closed)

argmaxinc commented on August 25, 2024
Unable to load models

Comments (11)

ZachNagengast commented on August 25, 2024

Turbo has a tough time on M1 specifically, so we recommend non-turbo versions for that device. Which models have worked for you? Also, when you manually download, can you confirm those files look right? There can sometimes be issues with git-lfs.
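
For reference, a minimal sketch of pinning a non-turbo model explicitly at init time (this assumes the optional model parameter described in the WhisperKit README; the makeNonTurboPipeline name is just for illustration):

    import WhisperKit

    // Sketch: explicitly request a non-turbo variant at init time.
    // Assumes the optional `model` parameter from the WhisperKit README;
    // "base" is a non-turbo model, swap in whichever variant you want to test.
    func makeNonTurboPipeline() async throws -> WhisperKit {
        try await WhisperKit(
            model: "base",
            verbose: true,
            logLevel: .debug
        )
    }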

pmanot commented on August 25, 2024

I've checked and made sure all the files are being downloaded correctly. I've tried using the non-turbo version of large_v3 and it fails as well (stuck at loading).

The only model I've got working so far has been base, and for some reason it has now stopped working too.

Here's the code snippet:

private func transcribeAudio(audioPath: String) {
    Task {
        do {
            print("started")
            // try await print(WhisperKit.fetchAvailableModels())

            let whisperKit = try await WhisperKit()
            try await whisperKit.loadModels()

            let result = try await whisperKit.transcribe(audioPath: audioPath)

            await MainActor.run {
                self.transcription = result?.text ?? "Transcription failed"
            }
        } catch {
            print(error.localizedDescription)
        }
    }
}

My console log just says "started" and then gets stuck.

pmanot commented on August 25, 2024

Is there any way I can download and prepare the models beforehand so it doesn't need time to load / prewarm when I'm using it? Sorry if this is a dumb question, I have no experience with CoreML.

ZachNagengast commented on August 25, 2024

Gotcha, try this out; it should give you a bit more info about what's going on.

    private func transcribeAudio(audioPath: String) {
        Task {
            do {
                print("started")
                // try await print(WhisperKit.fetchAvailableModels())

                let whisperKit = try await WhisperKit(verbose: true, logLevel: .debug)

                let result = try await whisperKit.transcribe(audioPath: audioPath)

                await MainActor.run {
                    let test = result?.text ?? "Transcription failed"
                }
            } catch {
                print(error.localizedDescription)
            }
        }
    }

There was also one extra call to loadModels, which isn't needed unless you set let whisperKit = try await WhisperKit(verbose: true, logLevel: .debug, prewarm: false) when initializing.

ZachNagengast commented on August 25, 2024

Is there any way I can download and prepare the models beforehand so it doesn't need time to load / prewarm when I'm using it? Sorry if this is a dumb question, I have no experience with CoreML.

CoreML does require the specializing stage, but usually after that's done once, the models are much faster to load. You can definitely prewarm them ahead of time with something like this:

    let whisperKit = try await WhisperKit(
        verbose: true,
        logLevel: .debug,
        prewarm: true
    )

Then you would just have to load the models when you're ready to use them, as you were already doing with try await whisperKit.loadModels().
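
Putting that together, a minimal sketch of the prewarm-early / load-later flow (the TranscriptionManager wrapper and its method names are hypothetical; the WhisperKit calls are the same ones used above):

    import WhisperKit

    final class TranscriptionManager {
        // Hypothetical holder for the prewarmed pipeline.
        private var whisperKit: WhisperKit?

        // Call early (e.g. at app launch) so CoreML specialization happens ahead of time.
        func prepare() async throws {
            whisperKit = try await WhisperKit(
                verbose: true,
                logLevel: .debug,
                prewarm: true
            )
        }

        // Call when you actually need a transcription; the models were only
        // prewarmed above, so load them now and then transcribe.
        func transcribe(audioPath: String) async throws -> String? {
            guard let whisperKit else { return nil }
            try await whisperKit.loadModels()
            let result = try await whisperKit.transcribe(audioPath: audioPath)
            return result?.text
        }
    }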

pmanot commented on August 25, 2024

started
[WhisperKit] <NSProgress: 0x60000195c000> : Parent: 0x0 (portion: 0) / Fraction completed: 1.0000 / Completed: 17 of 17  
[WhisperKit] Audio source details - Sample Rate: 12000.0 Hz, Channel Count: 1, Frame Length: 25536, Duration: 2.128s
[WhisperKit] Audio buffer details - Sample Rate: 16000.0 Hz, Channel Count: 1, Frame Length: 34048, Duration: 2.128s
[WhisperKit] Audio loading time: 0.006842970848083496
[WhisperKit] Audio convert time: 6.401538848876953e-05
[WhisperKit] Loading models from /Users/puravmanot/Documents/huggingface/models/argmaxinc/whisperkit-coreml/openai_whisper-base with prewarmMode: false
[WhisperKit] Loading feature extractor
[WhisperKit] Loaded feature extractor
[WhisperKit] Loading audio encoder

[WhisperKit] Loaded audio encoder
[WhisperKit] Loading text decoder
Invalid group id in source layers: logits_tensor
[WhisperKit] Loaded text decoder
[WhisperKit] Loading tokenizer for base

[WhisperKit] Loaded tokenizer
[WhisperKit] Loaded models for whisper size: base
[WhisperKit] Decoder init time: 0.00922095775604248
[WhisperKit] Prefill time: 5.996227264404297e-05
[WhisperKit] Prefill prompt: ["<|startoftranscript|>", "<|en|>", "<|transcribe|>", "<|0.00|>"]
[WhisperKit] Decoding Seek: 0
[WhisperKit] Decoding 0.0s - 2.128s
[WhisperKit] Decoding with tempeartures [0.0, 0.2, 0.4, 0.5996, 0.8, 1.0]
[WhisperKit] Decoding Temperature: 0.0
[WhisperKit] Running main loop for a maximum of 224 iterations, starting at index 0
[WhisperKit] Forcing token 50258 at index 0 from initial prompt
[WhisperKit] --------------- DECODER INPUTS DEBUG ---------------
[WhisperKit] Cache Length:  0 Input Token: 50258
[WhisperKit] Key Cache | Val Cache | Update Mask | Decoder Mask | Position
[WhisperKit]  0.000000 |  0.000000 |           1 |            0 | 0
[WhisperKit]  0.000000 |  0.000000 |           0 |       -10000 | 1
[WhisperKit]  0.000000 |  0.000000 |           0 |       -10000 | 2
[WhisperKit]  0.000000 |  0.000000 |           0 |       -10000 | 3
[WhisperKit] tokenIndex: 0, token: 50259, word: <|en|>
[WhisperKit] Forcing token 50259 at index 1 from initial prompt
[WhisperKit] --------------- DECODER INPUTS DEBUG ---------------
[WhisperKit] Cache Length:  1 Input Token: 50259
[WhisperKit] Key Cache | Val Cache | Update Mask | Decoder Mask | Position
[WhisperKit] -0.126221 |  0.048004 |           0 |            0 | 0
[WhisperKit]  0.000000 |  0.000000 |           1 |            0 | 1
[WhisperKit]  0.000000 |  0.000000 |           0 |       -10000 | 2
[WhisperKit]  0.000000 |  0.000000 |           0 |       -10000 | 3
[WhisperKit] tokenIndex: 1, token: 50359, word: <|transcribe|>
[WhisperKit] Forcing token 50359 at index 2 from initial prompt
[WhisperKit] --------------- DECODER INPUTS DEBUG ---------------
[WhisperKit] Cache Length:  2 Input Token: 50359
[WhisperKit] Key Cache | Val Cache | Update Mask | Decoder Mask | Position
[WhisperKit] -0.126221 |  0.048004 |           0 |            0 | 0
[WhisperKit]  0.309326 | -0.555176 |           0 |            0 | 1
[WhisperKit]  0.000000 |  0.000000 |           1 |            0 | 2
[WhisperKit]  0.000000 |  0.000000 |           0 |       -10000 | 3
[WhisperKit] tokenIndex: 2, token: 50363, word: <|notimestamps|>
[WhisperKit] Forcing token 50364 at index 3 from initial prompt
[WhisperKit] --------------- DECODER INPUTS DEBUG ---------------
[WhisperKit] Cache Length:  3 Input Token: 50364
[WhisperKit] Key Cache | Val Cache | Update Mask | Decoder Mask | Position
[WhisperKit] -0.126221 |  0.048004 |           0 |            0 | 0
[WhisperKit]  0.309326 | -0.555176 |           0 |            0 | 1
[WhisperKit]  0.491211 |  0.045319 |           0 |            0 | 2
[WhisperKit]  0.000000 |  0.000000 |           1 |            0 | 3
[WhisperKit] tokenIndex: 3, token: 709, word:  much
[WhisperKit] tokenIndex: 4, token: 18587, word:  simpler
[WhisperKit] tokenIndex: 5, token: 10063, word:  conclusion
[WhisperKit] tokenIndex: 6, token: 457, word:  but
[WhisperKit] tokenIndex: 7, token: 512, word:  some
[WhisperKit] tokenIndex: 8, token: 295, word:  of
[WhisperKit] tokenIndex: 9, token: 291, word:  you
[WhisperKit] tokenIndex: 10, token: 485, word: ...
[WhisperKit] tokenIndex: 11, token: 50464, word: <|2.00|>
[WhisperKit] [0.00 --> 2.00] <|startoftranscript|><|en|><|transcribe|><|0.00|> much simpler conclusion but some of you...<|2.00|><|endoftext|>
[WhisperKit] ---- Transcription Timings ----
[WhisperKit] Audio Load:              6.91 ms /      1 runs (    6.91 ms/run)  0.48%
[WhisperKit] Audio Processing:       11.75 ms /      1 runs (   11.75 ms/run)  0.82%
[WhisperKit] Mels:                  406.63 ms /      1 runs (  406.63 ms/run) 28.51%
[WhisperKit] Encoding:               51.07 ms /      1 runs (   51.07 ms/run)  3.58%
[WhisperKit] Matrices Init:           9.22 ms /      1 runs (    9.22 ms/run)  0.65%
[WhisperKit] Prefill:                 0.06 ms /      1 runs (    0.06 ms/run)  0.00%
[WhisperKit] Decoding:              846.83 ms /     12 runs (   70.57 ms/run) 59.37%
[WhisperKit] Non-inference:          62.20 ms /     12 runs (    5.18 ms/run)  4.36%
[WhisperKit] - Sampling:             29.68 ms /     12 runs (    2.47 ms/run)  2.08%
[WhisperKit] - Kv Caching:           26.89 ms /     12 runs (    2.24 ms/run)  1.88%
[WhisperKit] - Windowing:             1.22 ms /      1 runs (    1.22 ms/run)  0.09%
[WhisperKit] Fallbacks:               0.00 ms /      0 runs (    0.00 ms/run)  0.00%
[WhisperKit] Decoding Full Loop:   1398.27 ms /     12 runs (  116.52 ms/run) 98.03%
[WhisperKit] -------------------------------
[WhisperKit] Model Load Time:     56.56 seconds
[WhisperKit] Inference Duration:  1.43 seconds
[WhisperKit] - Decoding Loop:     1.40 seconds
[WhisperKit] Time to first token: 1.09 seconds
[WhisperKit] Total Tokens:        18
[WhisperKit] Tokens per Second:   8.58 tok/s
[WhisperKit] Real Time Factor:    0.70
[WhisperKit] Fallbacks:           0.0
[WhisperKit] [0.00 --> 2.00] <|startoftranscript|><|en|><|transcribe|><|0.00|> much simpler conclusion but some of you...<|2.00|><|endoftext|>

pmanot commented on August 25, 2024

Does this prewarming have to be done each time the app is run? Or is it cached and only required the first time?

ZachNagengast commented on August 25, 2024

The system could purge the cache at any time, so it's recommended that prewarm is run every time. But if you're fairly confident the models will load in a reasonable time on your device, you can skip the prewarm step and just go straight to loading:

let whisperKit = try await WhisperKit(
    verbose: true,
    logLevel: .debug,
    prewarm: false,
    load: true
)

There is no "Apple-approved" way to check whether there is already a specialized/cached model on the system, unfortunately, but they do show up in the temp directory on Macs.

atiorh commented on August 25, 2024

@pmanot The performance numbers you pasted above look way off. Do you mind sharing your OS version and Mac device spec (M1, M1 Pro, M1 Max?), along with the result of a second run of the same command? For example, the melspectrogram should have taken 2-3 ms, but it took ~400 ms for you, and the decoding speed is also unexpectedly low. Thank you in advance! 🙏
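
A minimal sketch of one way to grab those second-run numbers (the benchmarkTwice helper is hypothetical; it just repeats the init-plus-transcribe flow from above twice so the second pass runs against the already-specialized CoreML cache, with logLevel: .debug printing a timing block for each run):

    import WhisperKit

    // Hypothetical helper: run the same pipeline twice and compare the
    // "Transcription Timings" blocks that logLevel: .debug prints for each run.
    func benchmarkTwice(audioPath: String) async throws {
        for run in 1...2 {
            print("=== run \(run) ===")
            let whisperKit = try await WhisperKit(verbose: true, logLevel: .debug)
            _ = try await whisperKit.transcribe(audioPath: audioPath)
        }
    }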

ZachNagengast commented on August 25, 2024

@pmanot Checking in: were you able to get the models loading properly? If you can share your device specs, that would be helpful for us to know as well 👍

ZachNagengast commented on August 25, 2024

Hi @pmanot, closing this issue for now. If you're still having issues, let us know 👍
