
bark's Introduction

Notice: Bark is Suno's open-source text-to-speech+ model. If you are looking for our text-to-music models, please visit us on our web page and join our community on Discord.

🐶 Bark

Twitter

🔗 Examples • Suno Studio Waitlist • Updates • How to Use • Installation • FAQ



Bark is a transformer-based text-to-audio model created by Suno. Bark can generate highly realistic, multilingual speech as well as other audio - including music, background noise and simple sound effects. The model can also produce nonverbal communications like laughing, sighing and crying. To support the research community, we are providing access to pretrained model checkpoints, which are ready for inference and available for commercial use.

⚠ Disclaimer

Bark was developed for research purposes. It is not a conventional text-to-speech model but instead a fully generative text-to-audio model, which can deviate in unexpected ways from provided prompts. Suno does not take responsibility for any output generated. Use at your own risk, and please act responsibly.

📖 Quick Index

🎧 Demos

Open in Spaces • Open on Replicate • Open in Colab

🚀 Updates

2023.05.01

  • ©️ Bark is now licensed under the MIT License, meaning it's available for commercial use!

  • ⚡ 2x speed-up on GPU. 10x speed-up on CPU. We also added an option for a smaller version of Bark, which offers additional speed-up with the trade-off of slightly lower quality.

  • 📕 Long-form generation, voice consistency enhancements and other examples are now documented in a new notebooks section.

  • 👥 We created a voice prompt library. We hope this resource helps you find useful prompts for your use cases! You can also join us on Discord, where the community actively shares useful prompts in the #audio-prompts channel.

  • 💬 Growing community support and access to new features on Discord.

  • 💾 You can now use Bark with GPUs that have low VRAM (<4GB).

2023.04.20

  • 🐶 Bark release!

🐍 Usage in Python

🪑 Basics

from bark import SAMPLE_RATE, generate_audio, preload_models
from scipy.io.wavfile import write as write_wav
from IPython.display import Audio

# download and load all models
preload_models()

# generate audio from text
text_prompt = """
     Hello, my name is Suno. And, uh — and I like pizza. [laughs] 
     But I also have other interests such as playing tic tac toe.
"""
audio_array = generate_audio(text_prompt)

# save audio to disk
write_wav("bark_generation.wav", SAMPLE_RATE, audio_array)
  
# play audio in notebook
Audio(audio_array, rate=SAMPLE_RATE)
pizza.webm

🌎 Foreign Language


Bark supports various languages out-of-the-box and automatically determines language from input text. When prompted with code-switched text, Bark will attempt to employ the native accent for the respective languages. English quality is best for the time being, and we expect other languages to further improve with scaling.

text_prompt = """
    추석은 내가 가장 좋아하는 명절이다. 나는 며칠 동안 휴식을 취하고 친구 및 가족과 시간을 보낼 수 있습니다.
"""
# English: "Chuseok is my favorite holiday. I can rest for a few days and spend time with friends and family."
audio_array = generate_audio(text_prompt)
suno_korean.webm

Note: since Bark recognizes languages automatically from input text, it is possible to use, for example, a German history prompt with English text. This usually leads to English audio with a German accent.

text_prompt = """
    Der Dreißigjährige Krieg (1618-1648) war ein verheerender Konflikt, der Europa stark geprägt hat.
    This is a beginning of the history. If you want to hear more, please continue.
"""
# English (first line): "The Thirty Years' War (1618-1648) was a devastating conflict that left a deep mark on Europe."
audio_array = generate_audio(text_prompt)
suno_german_accent.webm

🎶 Music

Bark can generate all types of audio, and, in principle, doesn't see a difference between speech and music. Sometimes Bark chooses to generate text as music, but you can help it out by adding music notes around your lyrics.

text_prompt = """
    ♪ In the jungle, the mighty jungle, the lion barks tonight ♪
"""
audio_array = generate_audio(text_prompt)
lion.webm

🎤 Voice Presets

Bark supports 100+ speaker presets across supported languages. You can browse the library of supported voice presets here, or in the code. The community also often shares presets in Discord.

Bark tries to match the tone, pitch, emotion and prosody of a given preset, but does not currently support custom voice cloning. The model also attempts to preserve music, ambient noise, etc.

text_prompt = """
    I have a silky smooth voice, and today I will tell you about 
    the exercise regimen of the common sloth.
"""
audio_array = generate_audio(text_prompt, history_prompt="v2/en_speaker_1")
sloth.webm

📃 Generating Longer Audio

By default, generate_audio works well with around 13 seconds of spoken text. For an example of how to do long-form generation, see 👉 Notebook 👈
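As a rough sketch of the idea in that notebook (not its exact code; the sentence chunking and pause length here are illustrative assumptions): split the script into sentence-sized pieces, generate each piece with the same voice preset for consistency, and concatenate the results.

import numpy as np
from bark import SAMPLE_RATE, generate_audio, preload_models

preload_models()

long_text = "First sentence. Second sentence. Third sentence."
sentences = [s.strip() + "." for s in long_text.split(".") if s.strip()]

speaker = "v2/en_speaker_1"  # reusing one preset keeps the voice consistent
silence = np.zeros(int(0.25 * SAMPLE_RATE))  # short pause between chunks

pieces = []
for sentence in sentences:
    pieces += [generate_audio(sentence, history_prompt=speaker), silence.copy()]

full_audio = np.concatenate(pieces)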

Click to toggle example long-form generations (from the example notebook)
dialog.webm
longform_advanced.webm
longform_basic.webm

Command line

python -m bark --text "Hello, my name is Suno." --output_filename "example.wav"

💻 Installation

‼️ CAUTION ‼️ Do NOT use pip install bark. It installs a different package, which is not managed by Suno.

pip install git+https://github.com/suno-ai/bark.git

or

git clone https://github.com/suno-ai/bark
cd bark && pip install . 

🤗 Transformers Usage

Bark is available in the 🤗 Transformers library from version 4.31.0 onwards, requiring minimal dependencies and additional packages. Steps to get started:

  1. First install the 🤗 Transformers library from main:
pip install git+https://github.com/huggingface/transformers.git
  2. Run the following Python code to generate speech samples:
from transformers import AutoProcessor, BarkModel

processor = AutoProcessor.from_pretrained("suno/bark")
model = BarkModel.from_pretrained("suno/bark")

voice_preset = "v2/en_speaker_6"

inputs = processor("Hello, my dog is cute", voice_preset=voice_preset)

audio_array = model.generate(**inputs)
audio_array = audio_array.cpu().numpy().squeeze()
  3. Listen to the audio samples either in an ipynb notebook:
from IPython.display import Audio

sample_rate = model.generation_config.sample_rate
Audio(audio_array, rate=sample_rate)

Or save them as a .wav file using a third-party library, e.g. scipy:

import scipy

sample_rate = model.generation_config.sample_rate
scipy.io.wavfile.write("bark_out.wav", rate=sample_rate, data=audio_array)
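If a CUDA device is available, generation is much faster on GPU. A hedged sketch, assuming a recent transformers version whose processor outputs support .to(device) (the pattern mirrors current Hugging Face examples, but details may vary across versions):

import torch
from transformers import AutoProcessor, BarkModel

processor = AutoProcessor.from_pretrained("suno/bark")
device = "cuda" if torch.cuda.is_available() else "cpu"
model = BarkModel.from_pretrained("suno/bark").to(device)

# Move the tokenized inputs to the same device before generating
inputs = processor("Hello, my dog is cute").to(device)
audio_array = model.generate(**inputs).cpu().numpy().squeeze()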

For more details on using the Bark model for inference using the 🤗 Transformers library, refer to the Bark docs or the hands-on Google Colab.

πŸ› οΈ Hardware and Inference Speed

Bark has been tested and works on both CPU and GPU (PyTorch 2.0+, CUDA 11.7 and CUDA 12.0).

On enterprise GPUs and PyTorch nightly, Bark can generate audio in roughly real time. On older GPUs, the default Colab runtime, or CPU, inference can be significantly slower. For older GPUs or CPUs you might want to consider using the smaller models. Details can be found in our tutorial sections here.

The full version of Bark requires around 12GB of VRAM to hold everything on GPU at the same time. To use a smaller version of the models, which should fit into 8GB VRAM, set the environment flag SUNO_USE_SMALL_MODELS=True.
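A minimal sketch of enabling the small models (the flag is read when bark is imported, so set it before the import):

import os

os.environ["SUNO_USE_SMALL_MODELS"] = "True"  # must be set before importing bark

from bark import generate_audio, preload_models

preload_models()  # downloads/loads the smaller checkpoints
audio_array = generate_audio("Testing the smaller models.")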

If you don't have hardware available or if you want to play with bigger versions of our models, you can also sign up for early access to our model playground here.

βš™οΈ Details

Bark is a fully generative text-to-audio model developed for research and demo purposes. It follows a GPT-style architecture similar to AudioLM and Vall-E, and uses a quantized audio representation from EnCodec. It is not a conventional TTS model, but instead a fully generative text-to-audio model capable of deviating in unexpected ways from any given script. Unlike previous approaches, the input text prompt is converted directly to audio without the intermediate use of phonemes. It can therefore generalize to arbitrary instructions beyond speech, such as music lyrics, sound effects or other non-speech sounds.
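The two stages are also exposed separately via bark.api; a minimal sketch (text_to_semantic and semantic_to_waveform are the intermediate functions exported by the package):

from bark.api import text_to_semantic, semantic_to_waveform

# Stage 1: text -> "semantic" tokens, generated directly from text (no phonemes)
x_semantic = text_to_semantic("Hello, this is a two-stage example.", temp=0.7)

# Stage 2: semantic tokens -> a 24 kHz waveform via the coarse/fine models and EnCodec
audio_array = semantic_to_waveform(x_semantic, temp=0.7)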

Below is a list of some known non-speech sounds, but we are finding more every day; a short example prompt follows the list. Please let us know on Discord if you find patterns that work particularly well!

  • [laughter]
  • [laughs]
  • [sighs]
  • [music]
  • [gasps]
  • [clears throat]
  • — or ... for hesitations
  • ♪ for song lyrics
  • CAPITALIZATION for emphasis of a word
  • [MAN] and [WOMAN] to bias Bark toward male and female speakers, respectively
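For instance, a prompt combining several of these tokens might look like this (an illustrative example, not from the original documentation):

text_prompt = """
    [MAN] I just heard the news... [gasps] That is INCREDIBLE! [laughs]
"""
audio_array = generate_audio(text_prompt)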

Supported Languages

Language Status
English (en) ✅
German (de) ✅
Spanish (es) ✅
French (fr) ✅
Hindi (hi) ✅
Italian (it) ✅
Japanese (ja) ✅
Korean (ko) ✅
Polish (pl) ✅
Portuguese (pt) ✅
Russian (ru) ✅
Turkish (tr) ✅
Chinese, simplified (zh) ✅

Request future language support here or in the #forums channel on Discord.

πŸ™ Appreciation

  • nanoGPT for a dead-simple and blazing fast implementation of GPT-style models
  • EnCodec for a state-of-the-art implementation of a fantastic audio codec
  • AudioLM for related training and inference code
  • Vall-E, AudioLM and many other ground-breaking papers that enabled the development of Bark

© License

Bark is licensed under the MIT License.

📱 Community

🎧 Suno Studio (Early Access)

We're developing a playground for our models, including Bark.

If you are interested, you can sign up for early access here.

❓ FAQ

How do I specify where models are downloaded and cached?

  • Bark uses Hugging Face to download and store models. You can find more info here.
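Since the checkpoints come through the Hugging Face hub, the hub's standard cache controls should apply. A hedged sketch, assuming default hub caching (HF_HOME is the hub's cache-location variable and must be set before anything imports huggingface_hub):

import os

os.environ["HF_HOME"] = "/path/to/your/cache"  # assumption: standard hub caching applies

from bark import preload_models

preload_models()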

Bark's generations sometimes differ from my prompts. What's happening?

  • Bark is a GPT-style model. As such, it may take some creative liberties in its generations, resulting in higher-variance model outputs than traditional text-to-speech approaches.

What voices are supported by Bark?

  • Bark supports 100+ speaker presets across supported languages. You can browse the library of speaker presets here. The community also shares presets in Discord. Bark also supports generating unique random voices that fit the input text. Bark does not currently support custom voice cloning.
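For example, simply omitting history_prompt makes Bark invent a new random voice for each generation (a minimal sketch):

from bark import generate_audio

# No history_prompt: Bark picks a random voice that fits the text
audio_array = generate_audio("Every call like this gets a brand-new voice.")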

Why is the output limited to ~13-14 seconds?

  • Bark is a GPT-style model, and its architecture/context window is optimized to output generations with roughly this length.

How much VRAM do I need?

  • The full version of Bark requires around 12GB of VRAM to hold everything on GPU at the same time. However, even smaller cards down to ~2GB work with some additional settings. Simply add the following code snippet before your generation:
import os

# Both flags are read when bark is imported, so set them before the import
os.environ["SUNO_OFFLOAD_CPU"] = "True"        # keep idle sub-models on CPU, moving each to GPU only while it runs
os.environ["SUNO_USE_SMALL_MODELS"] = "True"   # load the smaller checkpoints

My generated audio sounds like a 1980s phone call. What's happening?

  • Bark generates audio from scratch. It is not meant to create only high-fidelity, studio-quality speech. Rather, outputs could be anything from perfect speech to multiple people arguing at a baseball game recorded with bad microphones.

bark's People

Contributors

alyxdow, ding3li, fiq, gitmylo, gkucsko, jn-jairo, jonathanfly, kmfreyberg, mcamac, mikeyshulman, no2chem, orlandohohmeier, tongbaojia, vaibhavs10, ylacombe, zygi


bark's Issues

Some questions

@gkucsko Thanks for such amazing work!
Could you please share some data examples (say, 5 items) to show how you constructed the dataset? I am quite curious how you handle the [laughs] and [humm] tokens or the music descriptors. Thanks in advance.
Would you mind writing a more detailed technical report?
Also, the audio generated by the notebook is not as good as the demos show. Do you have a larger pretrained model?

How can I save it as a wav file

from bark import SAMPLE_RATE, generate_audio
from IPython.display import Audio

text_prompt = """
Hello, my name is Suno. And, uh — and I like pizza. [laughs]
But I also have other interests such as playing tic tac toe.
"""
audio_array = generate_audio(text_prompt)
Audio(audio_array, rate=SAMPLE_RATE)

Add my/your voices to the dataset

I would like to enrich the dataset by adding my voice. I would appreciate information on how to participate in that.

Generation inconsistencies

Hi! Congratulations on the awesome product!

I tried generating the same prompts as in the demo and was met with a few odd results that differed from the previously generated ones. Using the same Colab notebook, I got mostly silence for the Spanish text, and some harsh screeching interspersed throughout the other prompts as well.

Here is the link to the colab notebook with the generated sounds:
https://colab.research.google.com/drive/1iJtfgTCs3WgE0kfSQYEY1-XCy9G-TAt3#scrollTo=8KV3klnr-lvo

Possible missing deps in installation instructions?

The Issue

I believe the installation instructions may not fully describe the dependencies required to install the library. My guess is that there could be some common dependencies that many Python developers use so frequently that they were unintentionally omitted from the installation instructions.

I attempted to run the example script in the README.md on both a Windows and Ubuntu machine. Unfortunately, it failed both times due to missing dependencies.

For some background, I don't usually use Python or Pip for my day-to-day development work. I installed both from scratch and followed the installation instructions word-for-word since I'm not typically a Python developer.

The example failed to load on both Windows and Ubuntu.

Steps to Reproduce

I'll run the repro steps in a Docker container because it makes it easier for others to reproduce the steps on their local machine. Although I don't plan to actually run Bark in Docker, it's useful for creating a reproduction of the error.

Provision a Machine to Test Things On

First, let's get an Ubuntu 22 machine running (I ran this in Fish, Bash users might need to adjust the script):

docker run --rm -it -v (pwd):/data ubuntu bash

Then we can verify the Ubuntu version:

root@9269e48a1db8:/# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.1 LTS"

Install Deps

Now we'll install Python, Pip, Git, and Bark. I'm also going to install nano to make it easier to copy and paste the example from the README into the container.

apt update
apt install python-is-python3 python3-pip git nano --yes

Install Bark

OPTION 1: PIP Installation

pip install git+https://github.com/suno-ai/bark.git

Yields:

Collecting git+https://github.com/suno-ai/bark.git
  Cloning https://github.com/suno-ai/bark.git to /tmp/pip-req-build-3d5bs4cx
  Running command git clone --filter=blob:none --quiet https://github.com/suno-ai/bark.git /tmp/pip-req-build-3d5bs4cx
  Resolved https://github.com/suno-ai/bark.git to commit 874af1bae9a74324b1fff5573963373c0016f0e0
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: UNKNOWN
  Building wheel for UNKNOWN (pyproject.toml) ... done
  Created wheel for UNKNOWN: filename=UNKNOWN-0.0.0-py3-none-any.whl size=7276 sha256=dfc2d55c1364d743af2968153c439788ee12364a281e9c354d5a9e84870d99e4
  Stored in directory: /tmp/pip-ephem-wheel-cache-hvvpy6mf/wheels/e6/6d/c2/107ed849afe600f905bb4049a026df3c7c5aa75d86c2721ec7
Successfully built UNKNOWN
Installing collected packages: UNKNOWN
Successfully installed UNKNOWN-0.0.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv

OPTION 2: Git Clone Installation

git clone https://github.com/suno-ai/bark
cd bark && pip install .

Yields:

Cloning into 'bark'...
remote: Enumerating objects: 280, done.
remote: Counting objects: 100% (61/61), done.
remote: Compressing objects: 100% (42/42), done.
remote: Total 280 (delta 42), reused 28 (delta 19), pack-reused 219
Receiving objects: 100% (280/280), 1.34 MiB | 4.02 MiB/s, done.
Resolving deltas: 100% (70/70), done.
Processing /bark
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: UNKNOWN
  Building wheel for UNKNOWN (pyproject.toml) ... done
  Created wheel for UNKNOWN: filename=UNKNOWN-0.0.0-py3-none-any.whl size=7276 sha256=f8d1e0b5666bfda15fc921b10a2169365a43918a66adf1dbb8514119992c0855
  Stored in directory: /tmp/pip-ephem-wheel-cache-ntwrhv92/wheels/de/02/45/2e72ff30ce0400df4bc80201420b614232aa3ff723e67fc622
Successfully built UNKNOWN
Installing collected packages: UNKNOWN
Successfully installed UNKNOWN-0.0.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv

Run Example

touch example.py
nano example.py # Paste the example here.
python example.py

The Error

I had different issues on Windows, but I unfortunately did not save the results (my Windows machine also had a fresh Pip/Python install).

Here is the error I got when building from Git cloned source:

root@42717c6cb118:/bark# python example.py 
Traceback (most recent call last):
  File "/bark/example.py", line 1, in <module>
    from bark import SAMPLE_RATE, generate_audio
  File "/bark/bark/__init__.py", line 1, in <module>
    from .api import generate_audio, text_to_semantic, semantic_to_waveform, save_as_prompt
  File "/bark/bark/api.py", line 3, in <module>
    import numpy as np
ModuleNotFoundError: No module named 'numpy'

And the error when installing via the pip install git+... method:

Traceback (most recent call last):
  File "//example.py", line 1, in <module>
    from bark import SAMPLE_RATE, generate_audio
ModuleNotFoundError: No module named 'bark'

Conclusion

It appears that there are some missing steps or dependencies in the installation instructions. Please let me know if there's any other information I can provide to help find a resolution to this issue.

How to pass a custom speaker prompt?

Hi,
As I can see, you have provided some speaker prompts for the model. I want to use my own voice as a prompt rather than the given ones. What should I do to convert my voice into a prompt?

Support for AMD GPUs?

Hi, thank you for creating this amazing project. I wonder if it is possible to run it on AMD GPUs using the ROCm version of PyTorch, like Stable Diffusion does. I would really appreciate your answer. Thanks again

Apple Silicon support

Hey guys, thanks for releasing this as open-source!

Is there any plan to add Apple Silicon support and use MPS with PyTorch if available or is CUDA a "strict" requirement?

CUDA out of memory, running on RTX 3050ti, how to fix?

Exception has occurred: OutOfMemoryError
CUDA out of memory. Tried to allocate 16.00 MiB (GPU 0; 4.00 GiB total capacity; 3.46 GiB already allocated; 0 bytes free; 3.47 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
File "C:\Users\smast\OneDrive\Desktop\Code Projects\Johnny Five\audio test.py", line 8, in <module>
audio_array = generate_audio(text_prompt)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 16.00 MiB (GPU 0; 4.00 GiB total capacity; 3.46 GiB already allocated; 0 bytes free; 3.47 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Allowing to ignore GPU

I get the error torch.cuda.OutOfMemoryError: CUDA out of memory. So I'd like to run on CPU. But there isn't a setting for that, even though the readme talks about being able to run on both CPU and GPU. It would be great if there was a setting to ignore the GPU to be able to avoid any errors relating to an insufficient GPU.

If technically applicable: If running on CPU wouldn't utilize all logical CPU cores by default, there should also be a setting for the number of threads as in llama.cpp, so one can get CPU utilization up to 100% to maximize speed.

No GPU being used

I have this message.
No GPU being used. Careful

But mine is a GeForce 1660 Super.
What's wrong?

Driver Version: 472.12
CUDA Version: 11.4
Win10x64

Can someone help me understand how to run inference?

I installed Bark in WSL Ubuntu in a conda env, and I don't understand how I'm supposed to do inference.
These commands don't work

from bark import SAMPLE_RATE, generate_audio
from IPython.display import Audio

text_prompt = """
Hello, my name is Suno. And, uh — and I like pizza. [laughs]
But I also have other interests such as playing tic tac toe.
"""
audio_array = generate_audio(text_prompt)
Audio(audio_array, rate=SAMPLE_RATE)

Training time GPU hours

Hi,
Can you provide some information about the training time that was required and the input data?
How many A100 hours would be required to train a model like this?

Support for Portuguese (pt)

The provided examples are in pt-BR (Portuguese from Brazil), not Portuguese from Portugal. I suggest replacing the Portuguese flag with a Brazilian one in the documents and labeling the language pt-br instead of pt.

Would be much appreciated to have support for Portuguese from Portugal.

How can I generate sound effects?

The documentation mentions being able to generate simple sound effects, but I don't see any examples of how to do this. If I put in a prompt such as "sound effect of a door shutting", I just get the voice of someone saying that, which doesn't have quite the same effect.

Reason for sounding robotic

Hi,
I am interested in knowing why the voice output sounds so robotic. Is it because it only uses 24 kHz, or what else is causing this?

AttributeError: module 'torch.cuda' has no attribute 'is_bf16_supported'

Traceback (most recent call last):
  File "D:\5118\movielearning\testbark\test.py", line 1, in <module>
    from bark import SAMPLE_RATE, generate_audio
  File "C:\ProgramData\Anaconda3\envs\movielearning\lib\site-packages\bark\__init__.py", line 1, in <module>
    from .api import generate_audio, text_to_semantic, semantic_to_waveform
  File "C:\ProgramData\Anaconda3\envs\movielearning\lib\site-packages\bark\api.py", line 5, in <module>
    from .generation import codec_decode, generate_coarse, generate_fine, generate_text_semantic
  File "C:\ProgramData\Anaconda3\envs\movielearning\lib\site-packages\bark\generation.py", line 24, in <module>
    torch.cuda.is_bf16_supported()

Arbitrarily long text

Is there a way to run on arbitrarily long text for example breaking up by max token (not splitting words)?

How can I install bark in win10 correctly?

C:\Users\winner\Desktop>pip install git+https://github.com/suno-ai/bark.git
Looking in indexes: http://mirrors.aliyun.com/pypi/simple/
Collecting git+https://github.com/suno-ai/bark.git
Cloning https://github.com/suno-ai/bark.git to c:\users\winner\appdata\local\temp\pip-req-build-uky214xt
Running command git clone --filter=blob:none -q https://github.com/suno-ai/bark.git 'C:\Users\winner\AppData\Local\Temp\pip-req-build-uky214xt'
Resolved https://github.com/suno-ai/bark.git to commit 2a602ce
Installing build dependencies ... done
Getting requirements to build wheel ... done
Installing backend dependencies ... done
Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: UNKNOWN
Building wheel for UNKNOWN (pyproject.toml) ... done
Created wheel for UNKNOWN: filename=UNKNOWN-0.0.0-py3-none-any.whl size=7318 sha256=b171442666007d18d81628603548eb93bc889aa3fbcfdc011c1b9c3e7feb3e83
Stored in directory: C:\Users\winner\AppData\Local\Temp\pip-ephem-wheel-cache-fmy74ei2\wheels\5d\50\6d\04e99a146c274ebc61149dfd86e7f046aa2772170a0bc978d3
Successfully built UNKNOWN
Installing collected packages: UNKNOWN
Successfully installed UNKNOWN-0.0.0

How can we specify a smaller batch size for GPU with 8GB memory or less?

Hi Team,
Thanks for the great software. Is it possible to have batch size as a parameter?

I am trying to run the example with an NVIDIA GeForce GTX 1080.
It is a rather old GPU, so it is not as powerful. When running the example code, it always fails with the following error:

---------------------------------------------------------------------------
OutOfMemoryError                          Traceback (most recent call last)
Cell In[8], line 8
      2 from IPython.display import Audio
      4 text_prompt = """
      5      Hello, my name is Suno. And, uh — and I like pizza. [laughs] 
      6      But I also have other interests such as playing tic tac toe.
      7 """
----> 8 audio_array = generate_audio(text_prompt)
      9 Audio(audio_array, rate=SAMPLE_RATE)

File ~\workspace\bark\bark\api.py:77, in generate_audio(text, history_prompt, text_temp, waveform_temp)
     60 def generate_audio(
     61     text: str,
     62     history_prompt: Optional[str] = None,
     63     text_temp: float = 0.7,
     64     waveform_temp: float = 0.7,
     65 ):
     66     """Generate audio array from input text.
     67 
     68     Args:
   (...)
     75         numpy audio array at sample frequency 24khz
     76     """
---> 77     x_semantic = text_to_semantic(text, history_prompt=history_prompt, temp=text_temp)
     78     audio_arr = semantic_to_waveform(x_semantic, history_prompt=history_prompt, temp=waveform_temp)
     79     return audio_arr

File ~\workspace\bark\bark\api.py:23, in text_to_semantic(text, history_prompt, temp)
      8 def text_to_semantic(
      9     text: str,
     10     history_prompt: Optional[str] = None,
     11     temp: float = 0.7,
     12 ):
     13     """Generate semantic array from text.
     14 
     15     Args:
   (...)
     21         numpy semantic array to be fed into `semantic_to_waveform`
     22     """
---> 23     x_semantic = generate_text_semantic(
     24         text,
     25         history_prompt=history_prompt,
     26         temp=temp,
     27     )
     28     return x_semantic

File ~\workspace\bark\bark\generation.py:404, in generate_text_semantic(text, history_prompt, temp, top_k, top_p, use_gpu, silent, min_eos_p, max_gen_duration_s, allow_early_stop, model)
    402 tot_generated_duration_s = 0
    403 for n in range(n_tot_steps):
--> 404     logits = model(x, merge_context=True)
    405     relevant_logits = logits[0, 0, :SEMANTIC_VOCAB_SIZE]
    406     if allow_early_stop:

File ~\workspace\bark\venv\lib\site-packages\torch\nn\modules\module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~\workspace\bark\bark\model.py:168, in GPT.forward(self, idx, merge_context)
    166 x = self.transformer.drop(tok_emb + pos_emb)
    167 for block in self.transformer.h:
--> 168     x = block(x)
    169 x = self.transformer.ln_f(x)
    171 # inference-time mini-optimization: only forward the lm_head on the very last position

File ~\workspace\bark\venv\lib\site-packages\torch\nn\modules\module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~\workspace\bark\bark\model.py:100, in Block.forward(self, x)
     98 def forward(self, x):
     99     x = x + self.attn(self.ln_1(x))
--> 100     x = x + self.mlp(self.ln_2(x))
    101     return x

File ~\workspace\bark\venv\lib\site-packages\torch\nn\modules\module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~\workspace\bark\bark\model.py:82, in MLP.forward(self, x)
     81 def forward(self, x):
---> 82     x = self.c_fc(x)
     83     x = self.gelu(x)
     84     x = self.c_proj(x)

File ~\workspace\bark\venv\lib\site-packages\torch\nn\modules\module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~\workspace\bark\venv\lib\site-packages\torch\nn\modules\linear.py:114, in Linear.forward(self, input)
    113 def forward(self, input: Tensor) -> Tensor:
--> 114     return F.linear(input, self.weight, self.bias)

OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 8.00 GiB total capacity; 7.33 GiB already allocated; 0 bytes free; 7.35 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Questions about dataset and training processs

Hello, this project is amazing. I want to reproduce your research and improve on it. Can you describe the dataset you used, etc., in detail? Or can you provide the training code? Thanks.

Installation fails on Ubuntu 22.04 LTS

When trying to install following the instructions in the README, the project will not install.

user@host:~/p/bark-test$ pip3 install git+https://github.com/suno-ai/bark.git
Defaulting to user installation because normal site-packages is not writeable
Collecting git+https://github.com/suno-ai/bark.git
  Cloning https://github.com/suno-ai/bark.git to /tmp/pip-req-build-_xf6oh0i
  Running command git clone --filter=blob:none --quiet https://github.com/suno-ai/bark.git /tmp/pip-req-build-_xf6oh0i
  Resolved https://github.com/suno-ai/bark.git to commit 4b3462d5f5efc93bafa30bd82492c68a9bd161ac
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: UNKNOWN
  Building wheel for UNKNOWN (pyproject.toml) ... done
  Created wheel for UNKNOWN: filename=UNKNOWN-0.0.0-py3-none-any.whl size=7276 sha256=7bc0c157340f7c229f1253fc7eff09bf224401a9dd068ccdecd9bd56dce59a99
  Stored in directory: /tmp/pip-ephem-wheel-cache-xjhfjbns/wheels/e6/6d/c2/107ed849afe600f905bb4049a026df3c7c5aa75d86c2721ec7
Successfully built UNKNOWN
Installing collected packages: UNKNOWN
Successfully installed UNKNOWN-0.0.0

Am I missing something? This installs just fine on my MacOS laptop with the same command.

Need written documentation?

I've observed that many members want documentation for Suno AI.

Please comment your requirements below, I'll try to write to the best of my knowledge.

Location of models is ambiguous

Personally, I like to know where external files are stored on my system, and even though I'm trying Bark within a venv, it is not clear where the models are downloaded to.

It would be "nice" to have models stored inside a models/ folder within the root of the project, rather than some black hole location that is created from the S3 download.

I see there is an option to set environment variables for the paths to the models, but that is not documented in your README, and one has to dissect your code to find the references.

code-switched no accent

For code-switched text, is it possible for Bark to not employ the native accent for each respective language, keeping the same voice throughout?

Download the models manually

Hi, the download of the models is slow and unstable from my location,


This download takes more than 10 hours, and it does not support resuming. I have tried several times, but it still cannot complete successfully.

Can you please provide the publicly accessible URL for these models so I can download them using a download tool and manually place them in the CACHE folder?

system requirements

Amazing work! Thank you for publishing your project.
I have a Lenovo IdeaPad 3 15ALC6 (Ryzen 5500U, 8GB RAM) with no dedicated GPU. I tried to test the examples. Unfortunately, it's extremely slow.
After hours of struggling, I managed to finish running the following code:

from bark import SAMPLE_RATE, generate_audio
from IPython.display import Audio

text_prompt = """
     Hello.
"""
audio_array = generate_audio(text_prompt)
Audio(audio_array, rate=SAMPLE_RATE)

but it gave me no audio file.

Another point is that I like the voice quality of the Turkish speech model. I know there are legal issues, but I need that Turkish speech model.

I hope you publish your speech models and instructions on how to build them.
All the best.

install error

Looking in indexes: http://mirrors.gwm.cn/pypi/web/simple, https://pypi.tuna.tsinghua.edu.cn/simple, http://mirrors.aliyun.com/pypi/simple/, https://pypi.mirrors.ustc.edu.cn/simple/, http://pypi.hustunique.com/, http://pypi.sdutlinux.org, http://pypi.douban.com/simple/, https://mirror.baidu.com/pypi/simple
Collecting git+https://github.com/suno-ai/bark.git
  Cloning https://github.com/suno-ai/bark.git to /tmp/pip-req-build-84uue3vz
  Running command git clone -q https://github.com/suno-ai/bark.git /tmp/pip-req-build-84uue3vz
  Resolved https://github.com/suno-ai/bark.git to commit 905c38b8bba2377c1bddd8060b81aea6d8a1c6d6
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... error
  ERROR: Command errored out with exit status 1:
   command: /home/ybZhang/miniconda3/envs/bark/bin/python3.8 /tmp/pip-standalone-pip-1qc0awh2/__env_pip__.zip/pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-d5go4i6j/normal --no-warn-script-location --no-binary :none: --only-binary :none: -i http://mirrors.gwm.cn/pypi/web/simple --extra-index-url https://pypi.tuna.tsinghua.edu.cn/simple --extra-index-url http://mirrors.aliyun.com/pypi/simple/ --extra-index-url https://pypi.mirrors.ustc.edu.cn/simple/ --extra-index-url http://pypi.hustunique.com/ --extra-index-url http://pypi.sdutlinux.org --extra-index-url http://pypi.douban.com/simple/ --extra-index-url https://mirror.baidu.com/pypi/simple --trusted-host mirrors.gwm.cn --trusted-host pypi.tuna.tsinghua.edu.cn --trusted-host mirrors.aliyun.com --trusted-host pypi.mirrors.ustc.edu.cn --trusted-host pypi.hustunique.com --trusted-host pypi.sdutlinux.org --trusted-host pypi.douban.com --trusted-host mirror.baidu.com -- wheel
       cwd: None
  Complete output (3 lines):
  Looking in indexes: http://mirrors.gwm.cn/pypi/web/simple, https://pypi.tuna.tsinghua.edu.cn/simple, http://mirrors.aliyun.com/pypi/simple/, https://pypi.mirrors.ustc.edu.cn/simple/, http://pypi.hustunique.com/, http://pypi.sdutlinux.org, http://pypi.douban.com/simple/, https://mirror.baidu.com/pypi/simple, https://pypi.tuna.tsinghua.edu.cn/simple, http://mirrors.aliyun.com/pypi/simple/, https://pypi.mirrors.ustc.edu.cn/simple/, http://pypi.hustunique.com/, http://pypi.sdutlinux.org, http://pypi.douban.com/simple/, https://mirror.baidu.com/pypi/simple
  ERROR: Could not install packages due to an OSError: ('Received response with content-encoding: br, but failed to decode it.', Error("Decompression error: b'CL_SPACE'"))

  ----------------------------------------
WARNING: Discarding git+https://github.com/suno-ai/bark.git. Command errored out with exit status 1: /home/ybZhang/miniconda3/envs/bark/bin/python3.8 /tmp/pip-standalone-pip-1qc0awh2/__env_pip__.zip/pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-d5go4i6j/normal --no-warn-script-location --no-binary :none: --only-binary :none: -i http://mirrors.gwm.cn/pypi/web/simple --extra-index-url https://pypi.tuna.tsinghua.edu.cn/simple --extra-index-url http://mirrors.aliyun.com/pypi/simple/ --extra-index-url https://pypi.mirrors.ustc.edu.cn/simple/ --extra-index-url http://pypi.hustunique.com/ --extra-index-url http://pypi.sdutlinux.org --extra-index-url http://pypi.douban.com/simple/ --extra-index-url https://mirror.baidu.com/pypi/simple --trusted-host mirrors.gwm.cn --trusted-host pypi.tuna.tsinghua.edu.cn --trusted-host mirrors.aliyun.com --trusted-host pypi.mirrors.ustc.edu.cn --trusted-host pypi.hustunique.com --trusted-host pypi.sdutlinux.org --trusted-host pypi.douban.com --trusted-host mirror.baidu.com -- wheel Check the logs for full command output.
ERROR: Command errored out with exit status 1: /home/ybZhang/miniconda3/envs/bark/bin/python3.8 /tmp/pip-standalone-pip-1qc0awh2/__env_pip__.zip/pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-d5go4i6j/normal --no-warn-script-location --no-binary :none: --only-binary :none: -i http://mirrors.gwm.cn/pypi/web/simple --extra-index-url https://pypi.tuna.tsinghua.edu.cn/simple --extra-index-url http://mirrors.aliyun.com/pypi/simple/ --extra-index-url https://pypi.mirrors.ustc.edu.cn/simple/ --extra-index-url http://pypi.hustunique.com/ --extra-index-url http://pypi.sdutlinux.org --extra-index-url http://pypi.douban.com/simple/ --extra-index-url https://mirror.baidu.com/pypi/simple --trusted-host mirrors.gwm.cn --trusted-host pypi.tuna.tsinghua.edu.cn --trusted-host mirrors.aliyun.com --trusted-host pypi.mirrors.ustc.edu.cn --trusted-host pypi.hustunique.com --trusted-host pypi.sdutlinux.org --trusted-host pypi.douban.com --trusted-host mirror.baidu.com -- wheel Check the logs for full command output.
