
Apple Silicon support (bark, 22 comments, CLOSED)

suno-ai commented on May 30, 2024 (+26)
Apple Silicon support

from bark.

Comments (22)

gkucsko commented on May 30, 2024 (+14)

added in most recent commit as experimental. has to be enabled via:

import os
os.environ["SUNO_ENABLE_MPS"] = "True"
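Note that flags like these are read when the libraries are imported, so the assignments have to run before any `import bark` (or `import torch`, for the fallback flag) in the script. A minimal sketch of the ordering:

```python
import os

# Both flags are read at import time, so set them before any
# `import bark` / `import torch` line in the script.
os.environ["SUNO_ENABLE_MPS"] = "True"
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"  # CPU fallback for ops MPS lacks
```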

ludos1978 commented on May 30, 2024 (+4)

there is a problem

NotImplementedError: The operator 'aten::_weight_norm_interface' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.

my "run.py" looks like this now:

# run in the console
# > export PYTORCH_ENABLE_MPS_FALLBACK=1
# before running this script

from bark import SAMPLE_RATE, generate_audio
from IPython.display import Audio
import soundfile as sf
import numpy as np
from scipy.io import wavfile

text_prompt = """
     Hello
"""
audio_array = generate_audio(text_prompt)
Audio(audio_array, rate=SAMPLE_RATE)

# convert audio array to 16-bit int representation
int_audio_arr = (audio_array * np.iinfo(np.int16).max).astype(np.int16)

# save as wav
wavfile.write("my_file.wav", SAMPLE_RATE, int_audio_arr)

it generated something, but i gotta test with other texts now.
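One caveat with the int16 conversion above: if the float audio ever exceeds [-1, 1], scaling without clipping wraps around into loud clicks. A hedged variant (`float_to_int16` is a made-up helper name):

```python
import numpy as np

def float_to_int16(audio: np.ndarray) -> np.ndarray:
    """Clip to [-1, 1] before scaling so out-of-range samples saturate
    instead of wrapping around to loud clicks."""
    clipped = np.clip(audio, -1.0, 1.0)
    return (clipped * np.iinfo(np.int16).max).astype(np.int16)
```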

gkucsko commented on May 30, 2024 (+1)

maybe related to precision? can you try running with float32?

ludos1978 commented on May 30, 2024 (+1)

I have been modifying the code a bit so i can use mps on different parts of the generation process.
With these changes i have been able to generate useful sound files when i switch mps on for the text_to_semantic and generate_coarse processes. However it breaks the sound if i use it on codec_decode and crashes if i use it on generate_fine.

I actually added to generation.py to have fixed seeds (generates the same sound every time). The fixed seeds however do not generate the same sound if you change the devices.

import numpy as np
import torch

torch.manual_seed(0)
torch.use_deterministic_algorithms(True)
np.random.seed(0)

i am a bit lazy in my free time to use a versioning tool, but if anybody wants to have a look at my code, it's attached.
bark-mps.zip

edit: i am actually not completely sure it's accelerated using mps, but it makes a difference when i switch the devices, so i assume it must change something under the hood.

gkucsko commented on May 30, 2024 (+1)

hmm seems to be an issue with tokenizers from hugging face. maybe just try pip install -U tokenizers?

DEVANANDJALLA commented on May 30, 2024 (+1)

after using mps it has become slow

PhanindraParashar commented on May 30, 2024 (+1)

RuntimeError: Placeholder storage has not been allocated on MPS device!

lxgbrl commented on May 30, 2024 (+1)

Apple MacBook M2
#What worked for me

#You may have to install anaconda, ffmpeg
#Create a virtual env named bark
conda create --name bark
#Activate virtual env
conda activate bark
#install git
conda install git
#clone bark repo

git clone https://github.com/suno-ai/bark.git
cd bark
pip install .
pip install git+https://github.com/huggingface/transformers.git

#I also had to install soundfile
pip install soundfile
#enter following line
export PYTORCH_ENABLE_MPS_FALLBACK=1
#Create in root folder, where the setup.py is located, a python file called main.py with following content:

import os
os.environ["SUNO_OFFLOAD_CPU"] = "True"
os.environ["SUNO_USE_SMALL_MODELS"] = "True"

if os.getenv('PYTORCH_ENABLE_MPS_FALLBACK') != '1':
     print ("you really need to run > export PYTORCH_ENABLE_MPS_FALLBACK=1")
     exit(1)

from bark import SAMPLE_RATE, generate_audio
from IPython.display import Audio
import soundfile as sf
import numpy as np
from scipy.io import wavfile
from datetime import datetime as dt

text_prompt = """
I hope this will work for you.
"""
audio_array = generate_audio(text_prompt, history_prompt="en_speaker_8") #{lang_code}_speaker_{0-9}
iPyAudio = Audio(audio_array, rate=SAMPLE_RATE)

filename = 'audio-%s-%i.wav' % (dt.now().strftime('%Y%m%d-%H%M%S'), SAMPLE_RATE)
with open(filename, 'wb') as f:
    f.write(iPyAudio.data)
print ("saved %s" % filename)

#start your file
python main.py
#On the first start it will download the models, so it will take some time. After that it is much faster!
#Hope it works for you as well!

ludos1978 commented on May 30, 2024

this seems to run, a bit faster, but way hotter.

add the following after each of the device definitions on lines 274, 292, 375, 545, 665, 743

if torch.backends.mps.is_available():
    device = "mps"

(i'd recommend to put that into a function in the long term, but i have no idea about ai systems...)
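Such a function might look like this (a sketch; `resolve_device` is a made-up name, not part of bark):

```python
import torch

def resolve_device(allow_mps: bool = True) -> str:
    """Made-up helper: pick mps when available, else cuda, else cpu."""
    if allow_mps and torch.backends.mps.is_available():
        return "mps"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"

device = resolve_device()
```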

i also use the nightly build from https://pytorch.org/get-started/locally/ but i am not sure whether that's needed.

be aware that it gets really hot when cpu and gpu are both running. iStats reported GPU temps up to 100 degrees C, so maybe have a tool to turn up the fans. it still took 10 mins to run, and i don't know where i can find the generated content...

btw. i dont know if the result is anything usable, because i cant find it yet...

mcamac commented on May 30, 2024

Re Apple Silicon, definitely something we welcome input/suggestions on.

Re where results end up, they start in memory. For an example of saving to wav/mp3, check out #13

ludos1978 commented on May 30, 2024

something is definitely off. The sound file is not usable in any way. But the code runs without errors and the length of the file does make sense. Could be something about conversion. But it might be a deeper problem as well.

Maybe somebody else has an idea.

sagardesai90 commented on May 30, 2024

Saw the same issue as you @ludos1978, but I don't get an output file like I do when I leave the 'device' definition unchanged instead of setting it to mps.

NotImplementedError: The operator 'aten::_weight_norm_interface' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.

If there's a way you get past this issue, lmk!

ludos1978 commented on May 30, 2024

@gkucsko: i don't really understand where you mean to try running it in float32? I did figure out how to write to a file, if you meant that.

@sagardesai90: can't really describe it any better than i did in my script or in the error message.


With the code below, the audio generates fine on cpu. However on mps it only whooshes and crackles the whole time.

import os

if os.getenv('PYTORCH_ENABLE_MPS_FALLBACK') != '1':
     print ("you really need to run > export PYTORCH_ENABLE_MPS_FALLBACK=1")
     exit(1)

from bark import SAMPLE_RATE, generate_audio
from IPython.display import Audio
import soundfile as sf
import numpy as np
from scipy.io import wavfile
from datetime import datetime as dt

text_prompt = """
     Hello World, this is a test.
"""
audio_array = generate_audio(text_prompt)
iPyAudio = Audio(audio_array, rate=SAMPLE_RATE)

filename = 'audio-%s-%i.wav' % (dt.now().strftime('%Y%m%d-%H%M%S'), SAMPLE_RATE)
with open(filename, 'wb') as f:
    f.write(iPyAudio.data)
print ("saved %s" % filename)

gkucsko commented on May 30, 2024

i meant hacking the autocast setting on top of generation.py, but it was a bit of a random guess. hard without access to your setup.
could you try with this branch and see if it works? #22

ludos1978 commented on May 30, 2024

autocast does not seem to be available on mps. The code at the beginning of #22 is at least not working for me with pytorch 2.0.0 on an M2 Max.

QKJIN commented on May 30, 2024

I am using a Mac Pro with M2. I put this code
device = "mps" if torch.backends.mps.is_available() else "cpu"
before preload_models(). Then I got audio within 2 minutes with the text below.

text_prompt = """
    I have a silky smooth voice, and today I will tell you 
    about the exercise regimen of the common sloth.
"""

This is the output info.

No GPU being used. Careful, inference might be extremely slow!
No GPU being used. Careful, inference might be extremely slow!
No GPU being used. Careful, inference might be extremely slow!
100%|████████████████████████████████████| 100/100 [00:19<00:00, 5.12it/s]
100%|████████████████████████████████████| 18/18 [01:29<00:00, 4.96s/it]

I don't have a computer with a GPU, so I can't compare. But if I delete this code device = "mps" if torch.backends.mps.is_available() else "cpu", the time is a little longer, almost double.
No GPU being used. Careful, inference might be extremely slow!
No GPU being used. Careful, inference might be extremely slow!
No GPU being used. Careful, inference might be extremely slow!
100%|████████████████████████████████████| 100/100 [00:38<00:00, 2.60it/s]
100%|████████████████████████████████████| 32/32 [02:44<00:00, 5.13s/it]

MSchmidt commented on May 30, 2024

@QKJIN just declaring a device variable without actually using it doesn't seem correct. Do you have the full snippet for comparison?
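For reference, a minimal sketch of actually using such a device variable (plain torch, not bark-specific):

```python
import torch

# Declaring the variable alone changes nothing; tensors and models must be
# moved explicitly, and inputs must live on the same device as the model,
# or you hit errors like the "Placeholder storage has not been allocated
# on MPS device!" mentioned earlier in the thread.
device = "mps" if torch.backends.mps.is_available() else "cpu"

x = torch.zeros(4)   # allocated on CPU by default
x = x.to(device)     # actually moved to the chosen device
# model = model.to(device)  # same idea for a model
```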

ludos1978 commented on May 30, 2024

From my experience the time it takes to generate sounds varies a lot between runs, because there is no fixed seed we can use (that would be a nice addition for testing and comparison), so the voices can be really slow or fast and thus take longer or shorter to generate.
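A small helper along these lines could fix the seeds for testing (`set_seed` is a made-up name; note this makes runs on the same device repeatable, but as observed earlier in the thread, the same seed can still produce different audio on different devices):

```python
import random
import numpy as np
import torch

def set_seed(seed: int = 0) -> None:
    """Made-up helper: fix the usual RNG sources for repeatable runs."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)

set_seed(0)
a = torch.rand(3)
set_seed(0)
b = torch.rand(3)  # identical to a: same seed, same device
```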

QKJIN commented on May 30, 2024

@ludos1978 Yes, that's right. I generated several sounds; the speeds are all different.
@MSchmidt You are right. It's just a variable, and it's not used correctly. I'm still figuring out how to use it.

dataf3l commented on May 30, 2024

when I enable MPS I get this error:

Traceback (most recent call last):
  File "/Users/b/study/ml/bark/mine.py", line 20, in
    audio_array = generate_audio(text_prompt)
  File "/Users/b/study/ml/bark/venv/lib/python3.10/site-packages/bark/api.py", line 113, in generate_audio
    out = semantic_to_waveform(
  File "/Users/b/study/ml/bark/venv/lib/python3.10/site-packages/bark/api.py", line 66, in semantic_to_waveform
    audio_arr = codec_decode(fine_tokens)
  File "/Users/b/study/ml/bark/venv/lib/python3.10/site-packages/bark/generation.py", line 826, in codec_decode
    emb = model.quantizer.decode(arr)
  File "/Users/b/study/ml/bark/venv/lib/python3.10/site-packages/encodec/quantization/vq.py", line 112, in decode
    quantized = self.vq.decode(codes)
  File "/Users/b/study/ml/bark/venv/lib/python3.10/site-packages/encodec/quantization/core_vq.py", line 361, in decode
    quantized = layer.decode(indices)
  File "/Users/b/study/ml/bark/venv/lib/python3.10/site-packages/encodec/quantization/core_vq.py", line 288, in decode
    quantize = self._codebook.decode(embed_ind)
  File "/Users/b/study/ml/bark/venv/lib/python3.10/site-packages/encodec/quantization/core_vq.py", line 202, in decode
    quantize = self.dequantize(embed_ind)
  File "/Users/b/study/ml/bark/venv/lib/python3.10/site-packages/encodec/quantization/core_vq.py", line 188, in dequantize
    quantize = F.embedding(embed_ind, self.embed)
  File "/Users/b/study/ml/bark/venv/lib/python3.10/site-packages/torch/nn/functional.py", line 2210, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self

just with the basic example, any ideas on how to fix?

gkucsko commented on May 30, 2024

it's because encodec doesn't work on mps, i think. but technically that shouldn't happen, and i can't reproduce. can you try to find the bug?

swyxio commented on May 30, 2024

i'm sorry, but after reading everything here i still haven't been able to run bark on apple silicon.

issue here:

  File "/Users/swyx/.pyenv/versions/myenv/lib/python3.11/site-packages/transformers-4.29.2-py3.11.egg/transformers/utils/import_utils.py", line 1174, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.models.bert because of the following error (look up to see its traceback):
dlopen(/Users/swyx/.pyenv/versions/myenv/lib/python3.11/site-packages/tokenizers-0.13.3-py3.11-macosx-13.2-arm64.egg/tokenizers/tokenizers.cpython-311-darwin.so, 0x0002): tried: '/Users/swyx/.pyenv/versions/myenv/lib/python3.11/site-packages/tokenizers-0.13.3-py3.11-macosx-13.2-arm64.egg/tokenizers/tokenizers.cpython-311-darwin.so' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64')), '/System/Volumes/Preboot/Cryptexes/OS/Users/swyx/.pyenv/versions/myenv/lib/python3.11/site-packages/tokenizers-0.13.3-py3.11-macosx-13.2-arm64.egg/tokenizers/tokenizers.cpython-311-darwin.so' (no such file), '/Users/swyx/.pyenv/versions/myenv/lib/python3.11/site-packages/tokenizers-0.13.3-py3.11-macosx-13.2-arm64.egg/tokenizers/tokenizers.cpython-311-darwin.so' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64')), '/Users/swyx/.pyenv/versions/3.11.3/envs/myenv/lib/python3.11/site-packages/tokenizers-0.13.3-py3.11-macosx-13.2-arm64.egg/tokenizers/tokenizers.cpython-311-darwin.so' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64')), '/System/Volumes/Preboot/Cryptexes/OS/Users/swyx/.pyenv/versions/3.11.3/envs/myenv/lib/python3.11/site-packages/tokenizers-0.13.3-py3.11-macosx-13.2-arm64.egg/tokenizers/tokenizers.cpython-311-darwin.so' (no such file), '/Users/swyx/.pyenv/versions/3.11.3/envs/myenv/lib/python3.11/site-packages/tokenizers-0.13.3-py3.11-macosx-13.2-arm64.egg/tokenizers/tokenizers.cpython-311-darwin.so' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64'))

does nobody else see this? i tried searching "arm64" and didn't find anyone else with this issue
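The dlopen error above says the installed tokenizers extension was built for x86_64 while the interpreter needs arm64, so the egg was most likely built or cached under an x86_64 (Rosetta) environment; reinstalling tokenizers from the native interpreter with pip's cache disabled usually helps. A quick diagnostic sketch for checking what the current interpreter reports:

```python
import platform
import sysconfig

# On a native Apple Silicon setup this should report arm64. If it reports
# x86_64, the interpreter itself runs under Rosetta and pip will keep
# producing x86_64 wheels; recreate the venv from a native arm64 Python
# and reinstall tokenizers (e.g. pip install --no-cache-dir tokenizers).
print(platform.machine())        # e.g. arm64 on native Apple Silicon
print(sysconfig.get_platform())  # e.g. macosx-13-arm64
```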
