bark.cpp

License: MIT

Roadmap / encodec.cpp / ggml

Inference of SunoAI's bark model in pure C/C++.

Description

With bark.cpp, our goal is to bring real-time, realistic, multilingual text-to-speech generation to the community. Currently, I am focused on porting the Bark model to C++.

  • Plain C/C++ implementation without dependencies
  • AVX, AVX2 and AVX512 for x86 architectures
  • CPU and GPU compatible backends
  • Mixed F16 / F32 precision
  • 4-bit, 5-bit and 8-bit integer quantization
  • Metal and CUDA backends

The initial implementation of bark.cpp targets Bark's 24 kHz English model. We expect to support multiple encoders in the future (see this and this), as well as a music generation model (see this). This project is for educational purposes.

Demo on Google Colab (#95)


Here is a typical run using bark.cpp:

make -j && ./main -p "This is an audio generated by bark.cpp"

   __               __
   / /_  ____ ______/ /__        _________  ____
  / __ \/ __ `/ ___/ //_/       / ___/ __ \/ __ \
 / /_/ / /_/ / /  / ,<    _    / /__/ /_/ / /_/ /
/_.___/\__,_/_/  /_/|_|  (_)   \___/ .___/ .___/
                                  /_/   /_/


bark_tokenize_input: prompt: 'this is a dog barking.'
bark_tokenize_input: number of tokens in prompt = 513, first 8 tokens: 20579 20172 10217 27883 28169 25677 10167 129595

Generating semantic tokens: [========>                                          ] (17%)

bark_print_statistics: mem per token =     0.00 MB
bark_print_statistics:   sample time =     9.90 ms / 138 tokens
bark_print_statistics:  predict time =  3163.78 ms / 22.92 ms per token
bark_print_statistics:    total time =  3188.37 ms

Generating coarse tokens: [==================================================>] (100%)

bark_print_statistics: mem per token =     0.00 MB
bark_print_statistics:   sample time =     3.96 ms / 410 tokens
bark_print_statistics:  predict time = 14303.32 ms / 34.89 ms per token
bark_print_statistics:    total time = 14315.52 ms

Generating fine tokens: [==================================================>] (100%)

bark_print_statistics: mem per token =     0.00 MB
bark_print_statistics:   sample time =    41.93 ms / 6144 tokens
bark_print_statistics:  predict time = 15234.38 ms / 2.48 ms per token
bark_print_statistics:    total time = 15282.15 ms

Number of frames written = 51840.

main:     load time =  1436.36 ms
main:     eval time = 34520.53 ms
main:    total time = 32786.04 ms
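
The figures in this log can be cross-checked with a little arithmetic. The sketch below is illustrative and not part of bark.cpp: it recomputes the per-token latency of each stage from the logged predict time and token count, and estimates the audio duration on the assumption that each frame written is one mono sample at the 24 kHz rate mentioned in the description. The function names are hypothetical.

```python
SAMPLE_RATE_HZ = 24_000  # assumption: one frame = one mono sample at 24 kHz

# (stage, predict time in ms, token count), copied from the log above
STAGES = [
    ("semantic", 3163.78, 138),
    ("coarse", 14303.32, 410),
    ("fine", 15234.38, 6144),
]


def ms_per_token(predict_ms, n_tokens):
    """Average prediction latency per generated token, in milliseconds."""
    return predict_ms / n_tokens


def audio_seconds(n_frames, sample_rate_hz=SAMPLE_RATE_HZ):
    """Duration of the generated audio, given the number of frames written."""
    return n_frames / sample_rate_hz


def slowdown_vs_real_time(eval_ms, n_frames):
    """How many seconds of compute were spent per second of audio."""
    return (eval_ms / 1000.0) / audio_seconds(n_frames)


for name, predict_ms, n_tokens in STAGES:
    print(f"{name:>8}: {ms_per_token(predict_ms, n_tokens):.2f} ms per token")

print(f"audio duration: {audio_seconds(51840):.2f} s")
print(f"slowdown vs real time: {slowdown_vs_real_time(34520.53, 51840):.1f}x")
```

The recomputed per-token values agree with the logged ones to within 0.01 ms. Under the stated sampling assumption, this run produced about 2.16 s of audio, so evaluation took roughly 16 times longer than the audio it generated.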

Here are typical audio pieces generated by bark.cpp:

audio1.mp4
audio2.mp4

Usage

Here are the steps to use bark.cpp:

Get the code

git clone --recursive https://github.com/PABannier/bark.cpp.git
cd bark.cpp
git submodule update --init --recursive

Build

In order to build bark.cpp you must use CMake:

mkdir build
cd build
cmake ..
cmake --build . --config Release

Prepare data & Run

# install Python dependencies
python3 -m pip install -r requirements.txt

# obtain the original bark and encodec weights and place them in ./models
python3 download_weights.py --download-dir ./models

# download the vocabulary
wget https://huggingface.co/suno/bark/raw/main/vocab.txt
mv ./vocab.txt ./models/

# convert the model to ggml format
python3 convert.py --dir-model ./models --out-dir ./ggml_weights/ --vocab-path ./models

# run the inference
./build/examples/main/main -m ./ggml_weights/ -p "this is an audio"

(Optional) Quantize weights

Weights can be quantized using one of the following strategies: q4_0, q4_1, q5_0, q5_1, q8_0.

Note that to preserve audio quality, we do not quantize the codec model. The bulk of the computation is in the forward pass of the GPT models.

./build/examples/quantize/quantize ./ggml_weights.bin ./ggml_weights_q4.bin q4_0
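
To give an intuition for what 4-bit quantization does to a weight tensor, here is a minimal, simplified sketch of block-wise symmetric quantization. It is not the ggml implementation: the block size, the rounding rule, and the function names are assumptions made for illustration, and the actual q4_0 kernel and storage layout differ in their details. Each block of floats is replaced by one floating-point scale plus one small integer per weight.

```python
BLOCK_SIZE = 32  # assumed block size for this sketch


def quantize_block(values):
    """Return (scale, quants) where each quant is a signed 4-bit integer."""
    amax = max(abs(v) for v in values)
    scale = amax / 7.0 if amax > 0 else 1.0  # map the largest magnitude to +/-7
    quants = [max(-8, min(7, round(v / scale))) for v in values]
    return scale, quants


def dequantize_block(scale, quants):
    """Recover approximate floats from a quantized block."""
    return [scale * q for q in quants]


def quantize(values):
    """Quantize a flat list of floats block by block."""
    return [
        quantize_block(values[start:start + BLOCK_SIZE])
        for start in range(0, len(values), BLOCK_SIZE)
    ]


# Deterministic toy weights in [-1, 1]
weights = [((i * 37) % 101 - 50) / 50.0 for i in range(64)]
blocks = quantize(weights)
restored = [x for scale, quants in blocks for x in dequantize_block(scale, quants)]
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"{len(blocks)} blocks, max abs error {max_err:.4f}")
```

In this scheme the reconstruction error of each weight is at most half the scale of its block. This is the trade-off behind the choice of strategy: fewer bits per weight give smaller files and faster inference, at the cost of a coarser approximation.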

Seminal papers

Contributing

bark.cpp is an ongoing endeavour that relies on community effort to last and evolve. Your contribution is welcome and highly valuable. It can be a:

  • bug report: you may encounter a bug while using bark.cpp. Don't hesitate to report it in the issue section.
  • feature request: you may want to add a new model or support a new platform. You can use the issue section to make suggestions.
  • pull request: you may have fixed a bug, added a feature, or even fixed a small typo in the documentation. You can submit a pull request and a reviewer will reach out to you.

Coding guidelines

  • Avoid adding third-party dependencies, extra files, extra headers, etc.
  • Always consider cross-compatibility with other operating systems and architectures
