speeding up inference · suno-ai/bark — open issue, 24 comments

suno-ai commented on May 31, 2024

speeding up inference

Comments (24)

ggerganov commented on May 31, 2024

@gkucsko Absolutely - it is high on my todo list. I just need to wrap up some ongoing efforts, and then I'll give bark a try using ggml, potentially utilizing 4-bit quantization.
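
For readers wondering what 4-bit quantization buys: weights are stored as 4-bit integers plus a per-block scale, cutting model size and memory bandwidth roughly 4x versus fp16. Below is a toy NumPy sketch of symmetric block quantization, loosely in the spirit of ggml's Q4_0 format - the function names are made up for illustration, and this is not ggml's actual code:

```python
import numpy as np

def quantize_q4(w: np.ndarray, block_size: int = 32):
    """Toy symmetric 4-bit block quantization: each block of weights
    is stored as one float scale plus signed 4-bit integers."""
    blocks = w.reshape(-1, block_size)
    # Per-block scale maps the largest magnitude into the int range [-7, 7].
    scale = np.abs(blocks).max(axis=1, keepdims=True) / 7.0
    scale[scale == 0] = 1.0  # avoid division by zero for all-zero blocks
    q = np.clip(np.round(blocks / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_q4(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return (q.astype(np.float32) * scale).reshape(-1)

w = np.random.randn(1024).astype(np.float32)
q, scale = quantize_q4(w)
err = np.abs(dequantize_q4(q, scale) - w).max()
print(f"max abs round-trip error: {err:.4f}")
```

In ggml the 4-bit values are additionally packed two per byte; the sketch skips the packing to keep the arithmetic visible.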

PABannier commented on May 31, 2024

@ggerganov @gkucsko FYI I've started implementing encodec.cpp

gkucsko commented on May 31, 2024

@ggerganov if you send us an email at bark at suno.ai, we would definitely help sponsor some of your coffee consumption during your coding sprees. Incredibly helpful for the community. Speaking as a cpp illiterate PyTorch user :)

ggerganov commented on May 31, 2024

I guess a first step would be to implement the pre-EnCodec part with ggml, as @gkucsko suggested. This is not ideal, since depending on a Python library at the end defeats most of the purpose of implementing this with ggml in the first place. I think we do need to implement the EnCodec inference too, and at the moment I am not sure how difficult this will be.

The good news is that at llama.cpp we are making good progress with ggml development, and more and more people are becoming familiar with the codebase and contributing. I think the best way to go is that I will bring attention to bark, and hopefully people will help out.

But unfortunately, my expectation is that it will likely take quite some time before we have something working.
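
For reference, this is roughly the Python round-trip (following the encodec package's README) that a ggml port would need to reproduce; bark's inference mainly needs the decode direction, turning generated codebook tokens back into a waveform. The input path is a placeholder:

```python
import torch
import torchaudio
from encodec import EncodecModel
from encodec.utils import convert_audio

# Load the pretrained 24 kHz model, matching bark's output sample rate.
model = EncodecModel.encodec_model_24khz()
model.set_target_bandwidth(6.0)

# Encode: waveform -> discrete codebook tokens.
wav, sr = torchaudio.load("input.wav")
wav = convert_audio(wav, sr, model.sample_rate, model.channels)
with torch.no_grad():
    frames = model.encode(wav.unsqueeze(0))            # list of (codes, scale)
    codes = torch.cat([c for c, _ in frames], dim=-1)  # [B, n_codebooks, T]

# Decode: tokens -> waveform (the part bark actually runs at inference).
with torch.no_grad():
    reconstructed = model.decode(frames)
```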

ggerganov commented on May 31, 2024

Had a more detailed look into this - it will be more difficult than I initially imagined, since bark uses Facebook's EnCodec codec. I'll need to implement that codec first, and it looks non-trivial.

gkucsko commented on May 31, 2024

Would love it if @ggerganov is willing to have a look at whether it's doable to convert this to cpp 8-bit. It has been amazing for whisper and llama.

ggerganov commented on May 31, 2024

Haven't started looking into bark yet. I've posted it in the llama.cpp Roadmap for this month, and hopefully it comes to people's attention and we get some help from the community. Will keep you posted if there are any updates.

3dluvr commented on May 31, 2024

> this is going to break the fucking internet

not until voice cloning fully works, then the Armageddon will happen.

gkucsko commented on May 31, 2024

legend!

gkucsko commented on May 31, 2024

compile mostly helps during training afaik. The model should already be held in memory and not loaded every time. There are other ways of improving inference speed: one is KV caching, currently in a PR, and there are some others that we are currently working on for a new version. That said, nothing works as well as a modern GPU, unfortunately. Contributions from the community are always welcome though (such as quantization and other tricks).
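
For context, KV caching stores each layer's attention keys and values from earlier decoding steps, so each new token only computes attention for itself against the cached prefix instead of re-running the whole sequence. A minimal single-head PyTorch sketch of the idea - not bark's actual implementation or the code in the PR mentioned above:

```python
import torch
import torch.nn.functional as F

def attend_step(q, k, v, past_kv=None):
    """One causal decoding step with a key/value cache.
    q, k, v: [batch, 1, dim] projections for the newest token only."""
    if past_kv is not None:
        past_k, past_v = past_kv
        k = torch.cat([past_k, k], dim=1)  # reuse keys from earlier steps
        v = torch.cat([past_v, v], dim=1)  # reuse values from earlier steps
    att = (q @ k.transpose(-2, -1)) / (q.size(-1) ** 0.5)  # [batch, 1, T]
    out = F.softmax(att, dim=-1) @ v                       # [batch, 1, dim]
    return out, (k, v)  # hand the grown cache to the next step

# Each step now does O(T) attention work instead of recomputing the
# full O(T^2) attention over the whole prefix.
cache = None
for _ in range(4):
    q = k = v = torch.randn(1, 1, 64)
    out, cache = attend_step(q, k, v, cache)
```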

bryanhpchiang commented on May 31, 2024

awesome! where does the model get held in memory? i have a modern GPU but the inference is still not real-time for me

remybonnav commented on May 31, 2024

I have no idea how everything works, but couldn't you implement xformers or transformers, with or without JAX, to increase speed?

Inference is really too slow.
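
One dependency-light option in this direction (an assumption here, not something the bark maintainers have said they adopted): PyTorch 2.x's built-in torch.nn.functional.scaled_dot_product_attention dispatches to a fused, FlashAttention-style kernel on supported GPUs, capturing much of the xformers speedup without an extra package. A minimal sketch with illustrative shapes:

```python
import torch
import torch.nn.functional as F

# Shapes are illustrative: [batch, heads, seq_len, head_dim]
device = "cuda" if torch.cuda.is_available() else "cpu"
q = torch.randn(1, 16, 256, 64, device=device)
k = torch.randn(1, 16, 256, 64, device=device)
v = torch.randn(1, 16, 256, 64, device=device)

# On PyTorch >= 2.0 this picks a fused kernel (FlashAttention or
# memory-efficient attention) when hardware and dtypes allow it.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```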

gkucsko commented on May 31, 2024

lots of ways to improve, especially on older hardware / GPUs. A great PR also just improved speed by almost 2x: #27

gkucsko commented on May 31, 2024

amazing tyty! happy to help in any way (finetuning etc).

bryanhpchiang commented on May 31, 2024

this is going to break the fucking internet

gkucsko commented on May 31, 2024

Hm, happy to look into that with you. That step is very fast, though - is there an option to somehow run that part as-is, without any optimizations? It shouldn't be the bottleneck.

sinhprous1 commented on May 31, 2024

hey @gkucsko, which GPU can run bark inference in real time?

santiarias commented on May 31, 2024

We should reach out to the community to raise whatever funds are needed to make this happen.

gkucsko commented on May 31, 2024

that would be awesome ya. one step at a time i guess and let's see what we can do :)

mparrett commented on May 31, 2024

This is a Mac-specific optimization that would be interesting to see: https://github.com/apple/ml-ane-transformers

gkucsko commented on May 31, 2024

@ggerganov how are things developing on your end? anything i could help with? btw i believe the folks at huggingface are currently in the process of integrating encodec into their stack as well, so encodec probably has a long life ahead of it in the audio world if that helps. I'm sure @adefossez would appreciate work in this direction as well.

gkucsko commented on May 31, 2024

Awesome thanks!

danielklk commented on May 31, 2024

I don't know anything about programming, but I'm trying to use Bark following a tutorial inside the Google Colab environment. It takes a lifetime to generate one paragraph of text-to-speech, and every time I want to run a new paragraph I must re-run all the commands in the Colab notebook, no matter whether I saved the notebook file. I mean like 6 hours or more for a one-page text file. If I am doing something wrong and someone could help me, I would be very glad.
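
Most of the per-run cost in a fresh Colab session is downloading and loading the model weights. bark's README exposes preload_models() for exactly this: call it once, then generate every paragraph in the same session so the loaded models are reused. A sketch using bark's documented API (the paragraph texts and output filenames are placeholders):

```python
from bark import SAMPLE_RATE, generate_audio, preload_models
from scipy.io.wavfile import write as write_wav

# Download and load all models once per session.
preload_models()

paragraphs = [
    "First paragraph of the text goes here.",
    "Second paragraph of the text goes here.",
]

# Reuse the already-loaded models for every paragraph.
for i, text in enumerate(paragraphs):
    audio_array = generate_audio(text)
    write_wav(f"paragraph_{i:02d}.wav", SAMPLE_RATE, audio_array)
```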

