
bark.cpp's Introduction

About me

  • I work as a Data Scientist at the AI biotech company Owkin.
  • Previously, I interned at INRIA Parietal, working on neuroscience (M/EEG) inverse problems.
  • I graduated from Ecole Polytechnique and HEC Paris with a double major in data science and management.

In 2022, I co-created skglm, a fast sklearn-compatible solver for sparse generalized linear models. More recently, I've become interested in fast inference for large language models. I implemented bark.cpp, a port of SunoAI's Bark model to C/C++, as well as specialized models like BioGPT.cpp.

Cool open-source projects I contributed to

  • MNE-Python, a toolkit for exploring neurophysiological data in Python
  • Linfa, the leading crate for machine learning and data analysis in Rust
  • Benchopt, a benchmarking suite for optimization algorithms

Other projects I worked on

  • Encodec.cpp, Meta's neural codec model ported to C++
  • SparseGLM, a fast coordinate descent solver in Rust
  • Nanograd, a lightweight deep learning framework built around Numpy arrays
  • NarrateMate.ai, a Next.js web app for practicing listening comprehension with YouTube videos

bark.cpp's People

Contributors

felrock, ggerganov, green-sky, jhen0409, jmtatsch, jzeiber, pabannier, przemoc, vietanhdev


bark.cpp's Issues

What's the output length?

I think I remember reading that Bark generates 30s of audio at a time. Is that also true for bark.cpp?

I tried letting it read an article and it crashed. Is that a length limitation or something else?

Also: is there example code to make it read back a whole news article, a dialogue, or anything useful?
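In case it helps, here is a minimal, hypothetical sketch of the chunking approach I have in mind: split the article into sentences, generate each one, and concatenate the samples. generate_chunk() is a placeholder, not bark.cpp's real API. As far as I know, upstream Bark only produces short clips (on the order of 13-14 seconds) per call, so a long article has to be generated piece by piece.

// Hypothetical sketch: generate a long article chunk by chunk and
// concatenate the PCM samples. generate_chunk() is a placeholder.
#include <sstream>
#include <string>
#include <vector>

static std::vector<float> generate_chunk(const std::string &text) {
    // Placeholder: call into bark.cpp here and return the audio samples.
    (void) text;
    return {};
}

std::vector<float> read_article(const std::string &article) {
    std::vector<float> audio;
    std::stringstream ss(article);
    std::string sentence;
    // Naive split on '.'; a real splitter must handle "Mr.", "3.14", etc.
    while (std::getline(ss, sentence, '.')) {
        if (sentence.find_first_not_of(" \t\n") == std::string::npos)
            continue; // skip empty fragments
        const std::vector<float> samples = generate_chunk(sentence + ".");
        audio.insert(audio.end(), samples.begin(), samples.end());
    }
    return audio;
}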

Update on model development

Please provide a simple, ready-to-use model for this test program. I'm not very good at Python; sorry to bother you.

OSX Metal?

Did OSX Metal support ever get implemented?

Can we have a compiled exe, please?

Hello,
Thank you for this!

Could you please provide a compiled build of your project, like other C++ AI projects such as stable-diffusion.cpp? That way we could just download the exe from the releases and start using it. I'm a simple user and don't know much about compiling and coding.

And could you please give an ETA for when AudioCraft will be supported?

Kind regards

Not enough space in the context's memory pool

Following your instructions, I get the following:

$ ./build/bin/main -m ./ggml_weights/ -p "this is an audio"
bark_load_model_from_file: loading model from './ggml_weights/'
bark_load_model_from_file: reading bark text model
gpt_model_load: n_in_vocab  = 129600
gpt_model_load: n_out_vocab = 10048
gpt_model_load: block_size  = 1024
gpt_model_load: n_embd      = 1024
gpt_model_load: n_head      = 16
gpt_model_load: n_layer     = 24
gpt_model_load: n_lm_heads  = 1
gpt_model_load: n_wtes      = 1
gpt_model_load: ftype       = 0
gpt_model_load: qntvr       = 0
gpt_model_load: ggml tensor size = 304 bytes
gpt_model_load: ggml ctx size = 1894.87 MB
gpt_model_load: memory size =   192.00 MB, n_mem = 24576
gpt_model_load: model size  =  1701.69 MB
bark_load_model_from_file: reading bark vocab

bark_load_model_from_file: reading bark coarse model
gpt_model_load: n_in_vocab  = 12096
gpt_model_load: n_out_vocab = 12096
gpt_model_load: block_size  = 1024
gpt_model_load: n_embd      = 1024
gpt_model_load: n_head      = 16
gpt_model_load: n_layer     = 24
gpt_model_load: n_lm_heads  = 1
gpt_model_load: n_wtes      = 1
gpt_model_load: ftype       = 0
gpt_model_load: qntvr       = 0
gpt_model_load: ggml tensor size = 304 bytes
gpt_model_load: ggml ctx size = 1443.87 MB
gpt_model_load: memory size =   192.00 MB, n_mem = 24576
gpt_model_load: model size  =  1250.69 MB

bark_load_model_from_file: reading bark fine model
gpt_model_load: n_in_vocab  = 1056
gpt_model_load: n_out_vocab = 1056
gpt_model_load: block_size  = 1024
gpt_model_load: n_embd      = 1024
gpt_model_load: n_head      = 16
gpt_model_load: n_layer     = 24
gpt_model_load: n_lm_heads  = 7
gpt_model_load: n_wtes      = 8
gpt_model_load: ftype       = 0
gpt_model_load: qntvr       = 0
gpt_model_load: ggml tensor size = 304 bytes
gpt_model_load: ggml ctx size = 1411.25 MB
gpt_model_load: memory size =   192.00 MB, n_mem = 24576
gpt_model_load: model size  =  1218.26 MB

bark_load_model_from_file: reading bark codec model
encodec_model_load: model size    =   44.32 MB

bark_load_model_from_file: total model size  =  4170.64 MB

bark_tokenize_input: prompt: 'this is an audio'
bark_tokenize_input: number of tokens in prompt = 513, first 8 tokens: 20579 20172 20199 33733 129595 129595 129595 129595 
bark_forward_text_encoder: ...........................................................................................................

bark_print_statistics: mem per token =     4.81 MB
bark_print_statistics:   sample time =    23.58 ms / 109 tokens
bark_print_statistics:  predict time =  9675.77 ms / 87.96 ms per token
bark_print_statistics:    total time =  9702.40 ms

bark_forward_coarse_encoder: ...................................................................................................................................................................................................................................................................................................................................

bark_print_statistics: mem per token =     8.53 MB
bark_print_statistics:   sample time =     6.76 ms / 324 tokens
bark_print_statistics:  predict time = 50832.34 ms / 156.41 ms per token
bark_print_statistics:    total time = 50843.50 ms

ggml_new_object: not enough space in the context's memory pool (needed 4115076720, available 4112941056)
Segmentation fault (core dumped)

Is this related to my machine's memory?

$ free -h
               total        used        free      shared  buff/cache   available
Mem:            39Gi       6.3Gi       8.2Gi       1.1Gi        24Gi        28Gi
Swap:           19Gi       0.0Ki        19Gi
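
For what it's worth, my understanding is that this error comes from ggml's fixed-size context pool rather than directly from system RAM: tensors are carved out of a buffer whose size is fixed when the context is created. A standalone illustration against the ggml C API (sizes arbitrary):

// Minimal illustration of ggml's fixed-size memory pool: every tensor is
// carved out of the buffer sized at ggml_init time, so exhausting it
// triggers "not enough space in the context's memory pool" no matter how
// much system RAM is free.
#include "ggml.h" // assumes the ggml headers are on the include path

int main() {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16 * 1024 * 1024, // 16 MB pool, fixed at init
        /*.mem_buffer =*/ NULL,             // let ggml allocate it
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    // Each tensor consumes pool space; once the 16 MB runs out,
    // ggml_new_tensor_1d fails with the memory pool error seen above.
    for (int i = 0; i < 1024; ++i) {
        ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1024 * 1024);
    }

    ggml_free(ctx);
    return 0;
}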

BUG Tokenizer

For some inputs, like "john", the tokenizer emits "##" indefinitely.
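
My guess at the failure mode, sketched with a generic WordPiece-style greedy loop (this is not the actual bark.cpp tokenizer code): if no vocabulary entry matches and the start position never advances, the same suffix keeps being retried with a "##" prefix.

// Hypothetical illustration of a WordPiece-style loop. Without the
// [UNK] bail-out below, `start` never advances when nothing matches,
// and the "##"-prefixed suffix is retried forever.
#include <string>
#include <unordered_set>
#include <vector>

std::vector<std::string> wordpiece(const std::string &word,
                                   const std::unordered_set<std::string> &vocab) {
    std::vector<std::string> out;
    size_t start = 0;
    while (start < word.size()) {
        size_t end = word.size();
        std::string piece;
        while (end > start) { // greedy longest-match
            std::string cand = (start > 0 ? "##" : "") + word.substr(start, end - start);
            if (vocab.count(cand)) { piece = cand; break; }
            --end;
        }
        if (piece.empty()) {
            out.push_back("[UNK]"); // bail out so the loop always advances
            break;
        }
        out.push_back(piece);
        start = end;
    }
    return out;
}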

Support for piper models

It would be helpful to add support for Piper models in bark.cpp.

There is already a C++ library for Piper, but it is difficult to compile and does not work well cross-platform. Piper currently runs on the ONNX runtime.

https://github.com/rhasspy/piper

Support iOS and Android?

Hi,
Is it possible to support iOS and Android? Any general guidelines on how you'd approach that would be appreciated.

Thanks,
Hussain

Unable to build

Hi, when I try to build, both on Colab and locally, I get this error:

/content/bark.cpp
/content/bark.cpp/build
-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMake Error at CMakeLists.txt:24 (add_subdirectory):
The source directory

/content/bark.cpp/ggml

does not contain a CMakeLists.txt file.

-- Configuring incomplete, errors occurred!
gmake: Makefile: No such file or directory
gmake: *** No rule to make target 'Makefile'. Stop.

What's up with this?

Working example on Google Colab?

Can anyone show a working example on Google Colab where a concrete audio file is generated? In my attempts, execution strangely breaks after these lines.

bark_forward_coarse_encoder: ...................................................................................................................................................................................................................................................................................................................................

bark_forward_coarse_encoder: mem per token = 8.51 MB
bark_forward_coarse_encoder: sample time = 8.16 ms
bark_forward_coarse_encoder: predict time = 95368.38 ms / 294.35 ms per token
bark_forward_coarse_encoder: total time = 95518.55 ms

Here is the link to my attempt on Google Colab:
https://colab.research.google.com/drive/1JVtJ6CDwxtKfFmEd8J4FGY2lzdL0d0jT?usp=sharing

Support GPU or not?

I have checked the project description, which says:
The main goal of bark.cpp is to synthesize audio from a textual input with the Bark model efficiently, using only the CPU.

Could I ask whether it supports GPU or not? I suppose that using a GPU should be much faster than using the CPU.

Some broken things for first timers

First of all, thanks for taking up the challenge and democratising this wonderful model.

encodec_24khz-d7cc33bc.th doesn't download for me

Downloading: "https:/dl.fbaipublicfiles.com/encodec/v0/encodec_24khz-d7cc33bc.th" to /Users/tatsch/.cache/torch/hub/checkpoints/encodec_24khz-d7cc33bc.th
Traceback (most recent call last):
  File "/Users/tatsch/workspace/bark.cpp/download_weights.py", line 41, in <module>
    state_dict = torch.hub.load_state_dict_from_url(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/torch/hub.py", line 746, in load_state_dict_from_url
    download_url_to_file(url, cached_file, hash_prefix, progress=progress)
  File "/opt/homebrew/lib/python3.11/site-packages/torch/hub.py", line 611, in download_url_to_file
    u = urlopen(req)
        ^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/[email protected]/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/[email protected]/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 516, in open
    req = meth(req)
          ^^^^^^^^^
  File "/opt/homebrew/Cellar/[email protected]/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 1272, in do_request_
    raise URLError('no host given')
urllib.error.URLError: <urlopen error no host given>

curl -o models/encodec_24khz-d7cc33bc.th https://dl.fbaipublicfiles.com/encodec/v0/encodec_24khz-d7cc33bc.th

vocab.txt also isn't there for me in models, maybe related to the aforementioned issue

curl -o models/vocab.txt https://huggingface.co/suno/bark/blob/main/vocab.txt

but I guess it's the wrong one, because when I run it:

bark_model_load: reading bark vocab
bark_vocab_load: wrong voculary size (305 != 119547)
bark_model_load: invalid model file './ggml_weights//ggml_vocab.bin' (bad text)
main: failed to load model from './ggml_weights/'

Also, the call in the README should be

./main -m ./ggml_weights/ -p "this is an audio"
instead of

./main -m ./models/ggml_weights/ -p "this is an audio"
for the default folder structure.

bark_forward_fine_encoder tried to allocate 30GB of memory during forward pass.

As explained in a previous issue, during a forward pass bark_forward_fine_encoder tried to allocate 30 GB of memory.

The console log looks something like this:

./main -m ./ggml_weights -p "this is an audio" 
bark_model_load: loading model from './ggml_weights'
bark_model_load: reading bark text model
gpt_model_load: n_in_vocab  = 129600
gpt_model_load: n_out_vocab = 10048
gpt_model_load: block_size  = 1024
gpt_model_load: n_embd      = 1024
gpt_model_load: n_head      = 16
gpt_model_load: n_layer     = 24
gpt_model_load: n_lm_heads  = 1
gpt_model_load: n_wtes      = 1
gpt_model_load: ftype       = 0
gpt_model_load: qntvr       = 0
gpt_model_load: ggml tensor size = 272 bytes
gpt_model_load: ggml ctx size = 1894.87 MB
gpt_model_load: memory size =   192.00 MB, n_mem = 24576
gpt_model_load: model size  =  1701.69 MB
bark_model_load: reading bark vocab

bark_model_load: reading bark coarse model
gpt_model_load: n_in_vocab  = 12096
gpt_model_load: n_out_vocab = 12096
gpt_model_load: block_size  = 1024
gpt_model_load: n_embd      = 1024
gpt_model_load: n_head      = 16
gpt_model_load: n_layer     = 24
gpt_model_load: n_lm_heads  = 1
gpt_model_load: n_wtes      = 1
gpt_model_load: ftype       = 0
gpt_model_load: qntvr       = 0
gpt_model_load: ggml tensor size = 272 bytes
gpt_model_load: ggml ctx size = 1443.87 MB
gpt_model_load: memory size =   192.00 MB, n_mem = 24576
gpt_model_load: model size  =  1250.69 MB

bark_model_load: reading bark fine model
gpt_model_load: n_in_vocab  = 1056
gpt_model_load: n_out_vocab = 1056
gpt_model_load: block_size  = 1024
gpt_model_load: n_embd      = 1024
gpt_model_load: n_head      = 16
gpt_model_load: n_layer     = 24
gpt_model_load: n_lm_heads  = 7
gpt_model_load: n_wtes      = 8
gpt_model_load: ftype       = 0
gpt_model_load: qntvr       = 0
gpt_model_load: ggml tensor size = 272 bytes
gpt_model_load: ggml ctx size = 1411.25 MB
gpt_model_load: memory size =   192.00 MB, n_mem = 24576
gpt_model_load: model size  =  1218.26 MB

bark_model_load: reading bark codec model
encodec_model_load: model size    =   44.32 MB

bark_model_load: total model size  =  4170.64 MB

bark_generate_audio: prompt: 'this is an audio'
bark_generate_audio: number of tokens in prompt = 513, first 8 tokens: 20579 20172 20199 33733 129595 129595 129595 129595 
bark_forward_text_encoder: ...........................................................................................................

bark_forward_text_encoder: mem per token =     4.80 MB
bark_forward_text_encoder:   sample time =    13.86 ms
bark_forward_text_encoder:  predict time =  6651.94 ms / 18.22 ms per token
bark_forward_text_encoder:    total time =  6737.75 ms

bark_forward_coarse_encoder: ...................................................................................................................................................................................................................................................................................................................................

bark_forward_coarse_encoder: mem per token =     8.51 MB
bark_forward_coarse_encoder:   sample time =     3.54 ms
bark_forward_coarse_encoder:  predict time = 31155.62 ms / 96.16 ms per token
bark_forward_coarse_encoder:    total time = 31228.26 ms

fine_gpt_eval: failed to allocate 31987885670 bytes
bark_forward_fine_encoder: ggml_aligned_malloc: insufficient memory (attempted to allocate 30506.03 MB)
GGML_ASSERT: ggml.c:4408: ctx->mem_buffer != NULL
zsh: killed     ./main -m ./ggml_weights -p "this is an audio"

So far I have been unable to track down the cause, but I will keep trying.

List of errors in build and seg fault on inference

I started off from the update_submodule branch.

/oos/bark.cpp/bark/bark.cpp:2048:43: error: use of undeclared identifier 'encodec_verbosity_level'; did you mean 'bark_verbosity_level'?
        encodec_model_path, n_gpu_layers, encodec_verbosity_level::LOW);
                                          ^~~~~~~~~~~~~~~~~~~~~~~
                                          bark_verbosity_level
/oos/bark.cpp/bark/./bark.h:23:12: note: 'bark_verbosity_level' declared here
enum class bark_verbosity_level {
           ^
/oos/bark.cpp/bark/bark.cpp:2047:37: error: no matching function for call to 'encodec_load_model'
    struct encodec_context * ectx = encodec_load_model(
                                    ^~~~~~~~~~~~~~~~~~
/oos/bark.cpp/encodec.cpp/./encodec.h:193:26: note: candidate function not viable: requires 2 arguments, but 3 were provided
struct encodec_context * encodec_load_model(
                         ^
/oos/bark.cpp/bark/bark.cpp:2060:5: error: use of undeclared identifier 'encodec_set_sample_rate'
    encodec_set_sample_rate(ectx, sample_rate);
    ^
3 errors generated.
make[2]: *** [CMakeFiles/bark.dir/bark.cpp.o] Error 1
make[1]: *** [CMakeFiles/bark.dir/all] Error 2
make: *** [all] Error 2

I added fixes here:
#132
PABannier/encodec.cpp#34

But even then, the example ./bark/build/examples/main/main -m ./ggml_weights/ -p "this is an audio" will produce noise, and any string other than "this is an audio" will cause a segmentation fault. This is on an M-series Mac.
CMake version:
cmake version 3.28.3

CMake suite maintained and supported by Kitware (kitware.com/cmake).

Quantize doesn't seem to work for codec model

The text, coarse, and fine models are converted successfully, but the codec model always results in a 0-byte output. After a quick look, it seems the header of the codec model may be slightly different from the other models', and the correct ftype can't be read from the file because the offsets are wrong.

Additionally, running the models as f32 or f16 produces very similar output for the same prompt/seed. Running the text, coarse, and fine models quantized at q8_0 produces an entirely different output for the same prompt/seed.
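
To illustrate the suspected offset problem, here is a rough, hypothetical sketch of such a header read; the field names and their order are guesses based on the gpt_model_load logs elsewhere in this tracker, not the actual file format:

// Hypothetical ggml-style header read. If the codec model orders its
// header fields differently, the ftype read below lands at the wrong
// offset and quantization sees a garbage value.
#include <cstdint>
#include <cstdio>

bool read_header(std::FILE * f) {
    uint32_t magic = 0;
    std::fread(&magic, sizeof(magic), 1, f);
    if (magic != 0x67676d6c) return false; // "ggml"

    // Guessed field list: the point is only that ftype sits after a
    // fixed run of int32 fields, so a layout mismatch shifts it.
    int32_t n_in_vocab, n_out_vocab, block_size, n_embd, n_head, n_layer, ftype;
    std::fread(&n_in_vocab,  sizeof(int32_t), 1, f);
    std::fread(&n_out_vocab, sizeof(int32_t), 1, f);
    std::fread(&block_size,  sizeof(int32_t), 1, f);
    std::fread(&n_embd,      sizeof(int32_t), 1, f);
    std::fread(&n_head,      sizeof(int32_t), 1, f);
    std::fread(&n_layer,     sizeof(int32_t), 1, f);
    std::fread(&ftype,       sizeof(int32_t), 1, f);
    std::printf("ftype = %d\n", ftype);
    return true;
}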

How to accurately set up prompts?

Hi! I've tried different prompts, but the results are very strange. See the following examples:

  1. Precision: fp32. Prompt: "one two three four five six seven eight nine ten." The output is 9 seconds long, but it only takes the first 3s to read out "eight nine ten", and the other 6s contain almost nothing.
  2. Precision: q4. Prompt: "one two three four five six seven eight nine ten." The output is a 12-second-long murmur.
  3. Precision: q4. Prompt: "one two three four five six." The output only reads out "two three four five six".

There are also some issues when using different random seeds, or prompts like "[MAN] one two three four five six" and "[happy piano music, playing for ten seconds]". Are there any solutions or suggestions for setting up prompts accurately (especially for playing music)? Thanks!

Unable to clone repository

git clone --recursive https://github.com/PABannier/bark.cpp.git
Cloning into 'bark.cpp'...
remote: Enumerating objects: 700, done.
remote: Counting objects: 100% (360/360), done.
remote: Compressing objects: 100% (145/145), done.
remote: Total 700 (delta 292), reused 224 (delta 214), pack-reused 340
Receiving objects: 100% (700/700), 47.85 MiB | 10.92 MiB/s, done.
Resolving deltas: 100% (390/390), done.
Submodule 'encodec.cpp' (https://github.com/PABannier/encodec.cpp) registered for path 'encodec.cpp'
Cloning into '/mnt/ubuntu/home/jape/ai/bark.cpp/encodec.cpp'...
remote: Enumerating objects: 275, done.
remote: Counting objects: 100% (122/122), done.
remote: Compressing objects: 100% (64/64), done.
remote: Total 275 (delta 84), reused 68 (delta 52), pack-reused 153
Receiving objects: 100% (275/275), 3.93 MiB | 9.86 MiB/s, done.
Resolving deltas: 100% (155/155), done.
fatal: remote error: upload-pack: not our ref e50cd96d28c89f6c1343c291042b14bab6f3b83b
fatal: Fetched in submodule path 'encodec.cpp', but it did not contain e50cd96d28c89f6c1343c291042b14bab6f3b83b. Direct fetching of that commit failed.

Performance Estimate Benchmarks

Commendable effort! Could you do some form of performance benchmarking? Latency, memory usage, etc., maybe on Colab with multiple different configurations. If batch processing is enabled, maybe also an estimate of the largest batch size to use.
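
To make the idea concrete, a minimal latency micro-benchmark could look like the sketch below, with generate() as a placeholder for whatever bark.cpp call is being measured (this is not the project's actual benchmarking setup):

// Minimal latency micro-benchmark sketch. generate() is a placeholder
// for the bark.cpp generation call being measured.
#include <chrono>
#include <cstdio>

static void generate() {
    // Placeholder: run one bark.cpp generation here.
}

int main() {
    using clock = std::chrono::steady_clock;
    const auto t0 = clock::now();
    generate();
    const auto t1 = clock::now();
    const double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
    std::printf("latency: %.2f ms\n", ms);
    return 0;
}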

CI/MNT: Write unit tests

We should have CI running unit tests.

Seeding the Mersenne twister, and for a family of inputs, check that we get the same tokenization as Bark for:

  • Text encoder
  • Coarse encoder
  • Fine encoder

Additionally, we should test the Bert tokenizer.
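
A sketch of what one such golden-output test could look like; run_text_encoder() is a placeholder and the expected tokens are illustrative values only, not real ones:

// Sketch of a seeded golden-output test: fix the Mersenne twister seed,
// run one encoder, and compare against tokens recorded from the reference
// Python implementation. run_text_encoder() is a placeholder.
#include <cassert>
#include <random>
#include <string>
#include <vector>

static std::vector<int> run_text_encoder(const std::string &prompt, std::mt19937 &rng) {
    // Placeholder: call the bark.cpp text encoder here.
    (void) prompt; (void) rng;
    return {};
}

void test_text_encoder_matches_bark() {
    std::mt19937 rng(42); // fixed seed => deterministic sampling
    // Golden tokens captured once from the Python implementation
    // (values here are illustrative only).
    const std::vector<int> expected = {20579, 20172, 20199, 33733};
    assert(run_text_encoder("this is an audio", rng) == expected);
}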

License of this repository

Hi!
You are building a great project. I plan to use it in my next open-source project. However, I need to know the project's license before starting with it.
Could you add a LICENSE file so that we know what we can do with this project?
Thank you very much!

Submodule encodec.cpp is dropped

Fetched in submodule path '../encodec.cpp', but it did not contain e50cd96d28c89f6c1343c291042b14bab6f3b83b. Direct fetching of that commit failed.

ENH: Create a Bark context

Currently, to free memory we need to call:

ggml_free(model.coarse_model.ctx);
ggml_free(model.fine_model.ctx);
ggml_free(model.text_model.ctx);
ggml_free(model.codec_model.ctx);

This is cumbersome and could easily be replaced by a bark_free function, similar to llama_free.
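
A minimal sketch of what that could look like, assuming a bark_model struct that exposes the four sub-model contexts as above:

// Hypothetical bark_free, mirroring llama_free: release every sub-model
// context in a single call instead of four separate ggml_free calls.
// Assumes a bark_model struct exposing the contexts named above.
void bark_free(bark_model & model) {
    ggml_free(model.coarse_model.ctx);
    ggml_free(model.fine_model.ctx);
    ggml_free(model.text_model.ctx);
    ggml_free(model.codec_model.ctx);
}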

MacBook

Why do you want to make it only work on a MacBook?

If it works on a MacBook, couldn't it work on other computers too?

ggml works on any computer; couldn't this do the same?

How to use other languages?

I'm trying to generate audio in another language, but I can't. Is there a way to do that now, or is it a planned feature?

First attempts

So I have been following this project with anticipation, and finally decided to give it a go.

  1. Simple but obvious: the CMake build is missing the main target. :)
  2. vocab.bin ships with the repo, so why require it for the conversion? (I commented it out.)
  3. Running main yields an allocation error, trying to allocate 47 GiB 🤣
$ ./main -m models/bark_v0/
bark_model_load: loading model from 'models/bark_v0/'
bark_model_load: reading bark text model
gpt_model_load: n_in_vocab  = 129600
gpt_model_load: n_out_vocab = 10048
gpt_model_load: block_size  = 1024
gpt_model_load: n_embd      = 1024
gpt_model_load: n_head      = 16
gpt_model_load: n_layer     = 24
gpt_model_load: n_lm_heads  = 1
gpt_model_load: n_wtes      = 1
gpt_model_load: ggml tensor size = 272 bytes
gpt_model_load: ggml ctx size = 1894.87 MB
gpt_model_load: memory size =   192.00 MB, n_mem = 24576
gpt_model_load: model size  =  1701.69 MB
bark_model_load: reading bark vocab

bark_model_load: reading bark coarse model
gpt_model_load: n_in_vocab  = 12096
gpt_model_load: n_out_vocab = 12096
gpt_model_load: block_size  = 1024
gpt_model_load: n_embd      = 1024
gpt_model_load: n_head      = 16
gpt_model_load: n_layer     = 24
gpt_model_load: n_lm_heads  = 1
gpt_model_load: n_wtes      = 1
gpt_model_load: ggml tensor size = 272 bytes
gpt_model_load: ggml ctx size = 1443.87 MB
gpt_model_load: memory size =   192.00 MB, n_mem = 24576
gpt_model_load: model size  =  1250.69 MB

bark_model_load: reading bark fine model
gpt_model_load: n_in_vocab  = 1056
gpt_model_load: n_out_vocab = 1056
gpt_model_load: block_size  = 1024
gpt_model_load: n_embd      = 1024
gpt_model_load: n_head      = 16
gpt_model_load: n_layer     = 24
gpt_model_load: n_lm_heads  = 7
gpt_model_load: n_wtes      = 8
gpt_model_load: ggml tensor size = 272 bytes
gpt_model_load: ggml ctx size = 1411.25 MB
gpt_model_load: memory size =   192.00 MB, n_mem = 24576
gpt_model_load: model size  =  1218.26 MB

bark_model_load: reading bark codec model
encodec_model_load: model size    =   44.32 MB

bark_model_load: total model size  =  4170.64 MB

bark_generate_audio: prompt: 'this is an audio'
bark_generate_audio: number of tokens in prompt = 513, first 8 tokens: 20579 20172 20199 33733 129595 129595 129595 129595
bark_forward_text_encoder: ...........................................................................................................

bark_forward_text_encoder: mem per token =     4.80 MB
bark_forward_text_encoder:   sample time =    17.30 ms
bark_forward_text_encoder:  predict time =  6746.21 ms / 18.48 ms per token
bark_forward_text_encoder:    total time =  6825.61 ms

bark_forward_coarse_encoder: ...................................................................................................................................................................................................................................................................................................................................

bark_forward_coarse_encoder: mem per token =     8.51 MB
bark_forward_coarse_encoder:   sample time =     4.79 ms
bark_forward_coarse_encoder:  predict time = 30730.57 ms / 94.85 ms per token
bark_forward_coarse_encoder:    total time = 30784.73 ms

fine_gpt_eval: failed to allocate 50200313856 bytes
bark_forward_fine_encoder: ggml_aligned_malloc: insufficient memory (attempted to allocate 47874.75 MB)
GGML_ASSERT: ggml.c:4408: ctx->mem_buffer != NULL
Aborted (core dumped)

Unable to build.

Recursively cloning the submodules fails, but I was able to fix that manually.

╰─(base) ⠠⠵ git submodule update --init --recursive                                                                                                                                        on main|✚1
fatal: remote error: upload-pack: not our ref e50cd96d28c89f6c1343c291042b14bab6f3b83b
fatal: Fetched in submodule path 'encodec.cpp', but it did not contain e50cd96d28c89f6c1343c291042b14bab6f3b83b. Direct fetching of that commit failed.

But then when I do cmake --build . --config Release I get:

╰─(base) ⠠⠵ cmake --build . --config Release                                                                                                                                               on main|✚1
[  4%] Building C object encodec.cpp/ggml/src/CMakeFiles/ggml.dir/ggml.c.o
[  8%] Building C object encodec.cpp/ggml/src/CMakeFiles/ggml.dir/ggml-alloc.c.o
[ 12%] Building C object encodec.cpp/ggml/src/CMakeFiles/ggml.dir/ggml-backend.c.o
[ 16%] Linking C shared library libggml.so
[ 16%] Built target ggml
[ 20%] Building CXX object encodec.cpp/CMakeFiles/encodec.dir/encodec.cpp.o
/home/arthur/dev/ai/bark.cpp/encodec.cpp/encodec.cpp:319:22: warning: multi-character character constant [-Wmultichar]
  319 |         if (magic != ENCODEC_FILE_MAGIC) {
      |                      ^~~~~~~~~~~~~~~~~~
/home/arthur/dev/ai/bark.cpp/encodec.cpp/encodec.cpp: In function ‘void print_tensor(ggml_tensor*)’:
/home/arthur/dev/ai/bark.cpp/encodec.cpp/encodec.cpp:79:27: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 2 has type ‘int64_t’ {aka ‘long int’} [-Wformat=]
   79 |         printf("shape=[%lld, %lld, %lld, %lld]\n", a->ne[0], a->ne[1], a->ne[2], a->ne[3]);
      |                        ~~~^                        ~~~~~~~~
      |                           |                               |
      |                           long long int                   int64_t {aka long int}
      |                        %ld
/home/arthur/dev/ai/bark.cpp/encodec.cpp/encodec.cpp:79:33: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 3 has type ‘int64_t’ {aka ‘long int’} [-Wformat=]
   79 |         printf("shape=[%lld, %lld, %lld, %lld]\n", a->ne[0], a->ne[1], a->ne[2], a->ne[3]);
      |                              ~~~^                            ~~~~~~~~
      |                                 |                                   |
      |                                 long long int                       int64_t {aka long int}
      |                              %ld
/home/arthur/dev/ai/bark.cpp/encodec.cpp/encodec.cpp:79:39: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 4 has type ‘int64_t’ {aka ‘long int’} [-Wformat=]
   79 |         printf("shape=[%lld, %lld, %lld, %lld]\n", a->ne[0], a->ne[1], a->ne[2], a->ne[3]);
      |                                    ~~~^                                ~~~~~~~~
      |                                       |                                       |
      |                                       long long int                           int64_t {aka long int}
      |                                    %ld
/home/arthur/dev/ai/bark.cpp/encodec.cpp/encodec.cpp:79:45: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 5 has type ‘int64_t’ {aka ‘long int’} [-Wformat=]
   79 |         printf("shape=[%lld, %lld, %lld, %lld]\n", a->ne[0], a->ne[1], a->ne[2], a->ne[3]);
      |                                          ~~~^                                    ~~~~~~~~
      |                                             |                                           |
      |                                             long long int                               int64_t {aka long int}
      |                                          %ld
/home/arthur/dev/ai/bark.cpp/encodec.cpp/encodec.cpp: In function ‘bool encodec_load_model_weights(const std::string&, encodec_model&, int)’:
/home/arthur/dev/ai/bark.cpp/encodec.cpp/encodec.cpp:714:89: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 5 has type ‘int64_t’ {aka ‘long int’} [-Wformat=]
  714 |                 fprintf(stderr, "%s: tensor '%s' has wrong shape in model file: got [%lld, %lld, %lld], expected [%d, %d, %d]\n",
      |                                                                                      ~~~^
      |                                                                                         |
      |                                                                                         long long int
      |                                                                                      %ld
  715 |                         __func__, name.data(), tensor->ne[0], tensor->ne[1], tensor->ne[2], ne[0], ne[1], ne[2]);
      |                                                ~~~~~~~~~~~~~                             
      |                                                            |
      |                                                            int64_t {aka long int}
/home/arthur/dev/ai/bark.cpp/encodec.cpp/encodec.cpp:714:95: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 6 has type ‘int64_t’ {aka ‘long int’} [-Wformat=]
  714 |                 fprintf(stderr, "%s: tensor '%s' has wrong shape in model file: got [%lld, %lld, %lld], expected [%d, %d, %d]\n",
      |                                                                                            ~~~^
      |                                                                                               |
      |                                                                                               long long int
      |                                                                                            %ld
  715 |                         __func__, name.data(), tensor->ne[0], tensor->ne[1], tensor->ne[2], ne[0], ne[1], ne[2]);
      |                                                               ~~~~~~~~~~~~~                    
      |                                                                           |
      |                                                                           int64_t {aka long int}
/home/arthur/dev/ai/bark.cpp/encodec.cpp/encodec.cpp:714:101: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 7 has type ‘int64_t’ {aka ‘long int’} [-Wformat=]
  714 |                 fprintf(stderr, "%s: tensor '%s' has wrong shape in model file: got [%lld, %lld, %lld], expected [%d, %d, %d]\n",
      |                                                                                                  ~~~^
      |                                                                                                     |
      |                                                                                                     long long int
      |                                                                                                  %ld
  715 |                         __func__, name.data(), tensor->ne[0], tensor->ne[1], tensor->ne[2], ne[0], ne[1], ne[2]);
      |                                                                              ~~~~~~~~~~~~~           
      |                                                                                          |
      |                                                                                          int64_t {aka long int}
[ 25%] Linking CXX static library libencodec.a
[ 25%] Built target encodec
[ 29%] Building CXX object CMakeFiles/bark.dir/bark.cpp.o
/home/arthur/dev/ai/bark.cpp/bark/bark.cpp: In function ‘void print_tensor(ggml_tensor*)’:
/home/arthur/dev/ai/bark.cpp/bark/bark.cpp:74:27: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 2 has type ‘int64_t’ {aka ‘long int’} [-Wformat=]
   74 |         printf("shape=[%lld, %lld, %lld, %lld]\n", a->ne[0], a->ne[1], a->ne[2], a->ne[3]);
      |                        ~~~^                        ~~~~~~~~
      |                           |                               |
      |                           long long int                   int64_t {aka long int}
      |                        %ld
/home/arthur/dev/ai/bark.cpp/bark/bark.cpp:74:33: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 3 has type ‘int64_t’ {aka ‘long int’} [-Wformat=]
   74 |         printf("shape=[%lld, %lld, %lld, %lld]\n", a->ne[0], a->ne[1], a->ne[2], a->ne[3]);
      |                              ~~~^                            ~~~~~~~~
      |                                 |                                   |
      |                                 long long int                       int64_t {aka long int}
      |                              %ld
/home/arthur/dev/ai/bark.cpp/bark/bark.cpp:74:39: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 4 has type ‘int64_t’ {aka ‘long int’} [-Wformat=]
   74 |         printf("shape=[%lld, %lld, %lld, %lld]\n", a->ne[0], a->ne[1], a->ne[2], a->ne[3]);
      |                                    ~~~^                                ~~~~~~~~
      |                                       |                                       |
      |                                       long long int                           int64_t {aka long int}
      |                                    %ld
/home/arthur/dev/ai/bark.cpp/bark/bark.cpp:74:45: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 5 has type ‘int64_t’ {aka ‘long int’} [-Wformat=]
   74 |         printf("shape=[%lld, %lld, %lld, %lld]\n", a->ne[0], a->ne[1], a->ne[2], a->ne[3]);
      |                                          ~~~^                                    ~~~~~~~~
      |                                             |                                           |
      |                                             long long int                               int64_t {aka long int}
      |                                          %ld
/home/arthur/dev/ai/bark.cpp/bark/bark.cpp: In function ‘void bark_print_statistics(gpt_model*)’:
/home/arthur/dev/ai/bark.cpp/bark/bark.cpp:123:47: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 4 has type ‘int64_t’ {aka ‘long int’} [-Wformat=]
  123 |     printf("%s:   sample time = %8.2f ms / %lld tokens\n", __func__, model->t_sample_us/1000.0f, model->n_sample);
      |                                            ~~~^                                                  ~~~~~~~~~~~~~~~
      |                                               |                                                         |
      |                                               long long int                                             int64_t {aka long int}
      |                                            %ld
/home/arthur/dev/ai/bark.cpp/bark/bark.cpp: In function ‘bool bark_generate_audio(bark_context*, std::string&, std::string&, int, bark_verbosity_level)’:
/home/arthur/dev/ai/bark.cpp/bark/bark.cpp:2048:43: error: ‘encodec_verbosity_level’ has not been declared
 2048 |         encodec_model_path, n_gpu_layers, encodec_verbosity_level::LOW);
      |                                           ^~~~~~~~~~~~~~~~~~~~~~~
/home/arthur/dev/ai/bark.cpp/bark/bark.cpp:2060:5: error: ‘encodec_set_sample_rate’ was not declared in this scope
 2060 |     encodec_set_sample_rate(ectx, sample_rate);
      |     ^~~~~~~~~~~~~~~~~~~~~~~
gmake[2]: *** [CMakeFiles/bark.dir/build.make:76: CMakeFiles/bark.dir/bark.cpp.o] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:256: CMakeFiles/bark.dir/all] Error 2
gmake: *** [Makefile:146: all] Error 2
╭─arthur at aquarelle in ~/dev/ai/bark.cpp/bark/build on main✘✘✘ 24-03-06 - 5:17:37
╰─(base) ⠠⠵ cmake --build . --config Release      

Any ideas?
Thanks!

Support for AudioLDM2

We seem to have a working implementation of AudioLDM2.

I understand you have already mentioned that you will implement Vocos and AudioCraft, but it seems to me that AudioLDM produces better outputs.

Please have a look! :)
