speeding up inference · suno-ai/bark — open issue, 24 comments

suno-ai commented on May 31, 2024

speeding up inference

Comments (24)

ggerganov commented on May 31, 2024

@gkucsko Absolutely - it is high on my todo list. I just need to wrap up some ongoing efforts, and then I'll give bark a try using ggml, potentially utilizing 4-bit quantization.
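
For readers wondering what 4-bit quantization buys: weights are stored as 4-bit integers plus a per-block scale, cutting model size and memory bandwidth roughly 4x versus fp16. Below is a toy NumPy sketch of symmetric block quantization, loosely in the spirit of ggml's Q4_0 format - the function names are made up for illustration, and this is not ggml's actual code:

```python
import numpy as np

def quantize_q4(w: np.ndarray, block_size: int = 32):
    """Toy symmetric 4-bit block quantization: each block of weights
    is stored as one float scale plus signed 4-bit integers."""
    blocks = w.reshape(-1, block_size)
    # Per-block scale maps the largest magnitude into the int range [-7, 7].
    scale = np.abs(blocks).max(axis=1, keepdims=True) / 7.0
    scale[scale == 0] = 1.0  # avoid division by zero for all-zero blocks
    q = np.clip(np.round(blocks / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_q4(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return (q.astype(np.float32) * scale).reshape(-1)

w = np.random.randn(1024).astype(np.float32)
q, scale = quantize_q4(w)
err = np.abs(dequantize_q4(q, scale) - w).max()
print(f"max abs round-trip error: {err:.4f}")
```

In ggml the 4-bit values are additionally packed two per byte; the sketch skips the packing to keep the arithmetic visible.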

PABannier commented on May 31, 2024

@ggerganov @gkucsko FYI I've started implementing encodec.cpp

gkucsko commented on May 31, 2024

@ggerganov if you send us an email at bark at suno.ai, we would definitely help sponsor some of your coffee consumption during your coding sprees. Incredibly helpful for the community. Speaking as a cpp illiterate PyTorch user :)

ggerganov commented on May 31, 2024

I guess a first step would be to implement the pre-EnCodec part with ggml, as @gkucsko suggested. This is not ideal, since depending on a Python library at the end defeats most of the purpose of implementing this with ggml in the first place. I think we do need to implement the EnCodec inference too, and at the moment I am not sure how difficult this will be.

The good news is that at llama.cpp we are making good progress with ggml development, and more and more people are becoming familiar with the codebase and contributing. I think the best way to go is that I will bring attention to bark, and hopefully people will help out.

But unfortunately, my expectation is that it will likely take quite some time before we have something working.
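
For reference, this is roughly the Python round-trip (following the encodec package's README) that a ggml port would need to reproduce; bark's inference mainly needs the decode direction, turning generated codebook tokens back into a waveform. The input path is a placeholder:

```python
import torch
import torchaudio
from encodec import EncodecModel
from encodec.utils import convert_audio

# Load the pretrained 24 kHz model, matching bark's output sample rate.
model = EncodecModel.encodec_model_24khz()
model.set_target_bandwidth(6.0)

# Encode: waveform -> discrete codebook tokens.
wav, sr = torchaudio.load("input.wav")
wav = convert_audio(wav, sr, model.sample_rate, model.channels)
with torch.no_grad():
    frames = model.encode(wav.unsqueeze(0))            # list of (codes, scale)
    codes = torch.cat([c for c, _ in frames], dim=-1)  # [B, n_codebooks, T]

# Decode: tokens -> waveform (the part bark actually runs at inference).
with torch.no_grad():
    reconstructed = model.decode(frames)
```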

ggerganov commented on May 31, 2024

Had a more detailed look into this - it will be more difficult than I initially imagined, since bark uses Facebook's EnCodec codec. I'll need to implement that codec first, and it looks non-trivial.

gkucsko commented on May 31, 2024

Would love it if @ggerganov is willing to have a look at whether it's doable to convert this to cpp 8-bit. It has been amazing for whisper and llama.

ggerganov commented on May 31, 2024

Haven't started looking into bark yet. I've posted it in the llama.cpp Roadmap for this month, and hopefully it comes to people's attention and we get some help from the community. Will keep you posted if there are any updates.

3dluvr commented on May 31, 2024

> this is going to break the fucking internet

not until voice cloning fully works, then the Armageddon will happen.

gkucsko commented on May 31, 2024

legend!

gkucsko commented on May 31, 2024

compile mostly helps during training afaik. The model should already be held in memory and not loaded every time. There are other ways of improving inference speed: one is KV caching, currently in a PR, and there are some others that we are currently working on for a new version. That said, nothing works as well as a modern GPU, unfortunately. Contributions from the community are always welcome though (such as quantization and other tricks).
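
For context, KV caching stores each layer's attention keys and values from earlier decoding steps, so each new token only computes attention for itself against the cached prefix instead of re-running the whole sequence. A minimal single-head PyTorch sketch of the idea - not bark's actual implementation or the code in the PR mentioned above:

```python
import torch
import torch.nn.functional as F

def attend_step(q, k, v, past_kv=None):
    """One causal decoding step with a key/value cache.
    q, k, v: [batch, 1, dim] projections for the newest token only."""
    if past_kv is not None:
        past_k, past_v = past_kv
        k = torch.cat([past_k, k], dim=1)  # reuse keys from earlier steps
        v = torch.cat([past_v, v], dim=1)  # reuse values from earlier steps
    att = (q @ k.transpose(-2, -1)) / (q.size(-1) ** 0.5)  # [batch, 1, T]
    out = F.softmax(att, dim=-1) @ v                       # [batch, 1, dim]
    return out, (k, v)  # hand the grown cache to the next step

# Each step now does O(T) attention work instead of recomputing the
# full O(T^2) attention over the whole prefix.
cache = None
for _ in range(4):
    q = k = v = torch.randn(1, 1, 64)
    out, cache = attend_step(q, k, v, cache)
```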

bryanhpchiang commented on May 31, 2024

awesome! where does the model get held in memory? i have a modern GPU but the inference is still not real-time for me

remybonnav commented on May 31, 2024

I have no idea how everything works, but couldn't you implement xformers or transformers, with or without JAX, to increase speed?

Inference is really too slow.
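
One dependency-light option in this direction (an assumption here, not something the bark maintainers have said they adopted): PyTorch 2.x's built-in torch.nn.functional.scaled_dot_product_attention dispatches to a fused, FlashAttention-style kernel on supported GPUs, capturing much of the xformers speedup without an extra package. A minimal sketch with illustrative shapes:

```python
import torch
import torch.nn.functional as F

# Shapes are illustrative: [batch, heads, seq_len, head_dim]
device = "cuda" if torch.cuda.is_available() else "cpu"
q = torch.randn(1, 16, 256, 64, device=device)
k = torch.randn(1, 16, 256, 64, device=device)
v = torch.randn(1, 16, 256, 64, device=device)

# On PyTorch >= 2.0 this picks a fused kernel (FlashAttention or
# memory-efficient attention) when hardware and dtypes allow it.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```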

gkucsko commented on May 31, 2024

lots of ways to improve, especially on older hardware / GPUs. A great PR also just improved speed by almost 2x: #27

gkucsko commented on May 31, 2024

amazing tyty! happy to help in any way (finetuning etc).

bryanhpchiang commented on May 31, 2024

this is going to break the fucking internet

gkucsko commented on May 31, 2024

Hm, happy to look into that with you. That step is very fast, though - is there an option to somehow run that part as-is, without any optimizations? It shouldn't be the bottleneck.

sinhprous1 commented on May 31, 2024

hey @gkucsko, which GPU can run bark inference in real time?

santiarias commented on May 31, 2024

We should reach out to the community to raise whatever funds are needed to make this happen.

gkucsko commented on May 31, 2024

that would be awesome ya. one step at a time i guess and let's see what we can do :)

mparrett commented on May 31, 2024

This is a Mac-specific optimization that would be interesting to see: https://github.com/apple/ml-ane-transformers

gkucsko commented on May 31, 2024

@ggerganov how are things developing on your end? anything i could help with? btw i believe the folks at huggingface are currently in the process of integrating encodec into their stack as well, so encodec probably has a long life ahead of it in the audio world if that helps. I'm sure @adefossez would appreciate work in this direction as well.

gkucsko commented on May 31, 2024

Awesome thanks!

danielklk commented on May 31, 2024

I don't know anything about programming, but I'm trying to use Bark following a tutorial inside the Google Colab environment. It takes a lifetime to generate one paragraph of text-to-speech, and every time I want to run a new paragraph I must re-run all the commands in the Colab notebook, no matter whether I saved the notebook file. I mean like 6 hours or more for a one-page text file. If I am doing something wrong and someone could help me, I would be very glad.
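
Most of the per-run cost in a fresh Colab session is downloading and loading the model weights. bark's README exposes preload_models() for exactly this: call it once, then generate every paragraph in the same session so the loaded models are reused. A sketch using bark's documented API (the paragraph texts and output filenames are placeholders):

```python
from bark import SAMPLE_RATE, generate_audio, preload_models
from scipy.io.wavfile import write as write_wav

# Download and load all models once per session.
preload_models()

paragraphs = [
    "First paragraph of the text goes here.",
    "Second paragraph of the text goes here.",
]

# Reuse the already-loaded models for every paragraph.
for i, text in enumerate(paragraphs):
    audio_array = generate_audio(text)
    write_wav(f"paragraph_{i:02d}.wav", SAMPLE_RATE, audio_array)
```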

