
music_source_separation's Introduction

Music Source Separation

Music source separation is the task of separating an audio recording into its individual sources. This repository is a PyTorch implementation of music source separation. Users can separate their favorite songs into different sources by installing this repository, and can also train their own source separation systems. The repository can also be used to train speech enhancement, instrument separation, and other separation systems.

Demos

Vocals and accompaniment separation: https://www.youtube.com/watch?v=WH4m5HYzHsg

Installation

Install Python 3.7.

The installation of bytesep can fail on higher versions of Python. We suggest using conda (or another environment management tool) to manage the packages.

pip install bytesep==0.1.1

Separation

After installation, separating your favorite songs is easy. Users can execute the following commands from any directory.

python3 -m bytesep download-checkpoints
python3 -m bytesep separate \
    --source_type="vocals" \
    --audio_path="./resources/vocals_accompaniment_10s.mp3" \
    --output_path="separated_results/output.mp3"

Users can also put many audio files into a directory and separate them all.

python3 -m bytesep separate \
    --source_type="vocals" \
    --audio_path="audios_directory" \
    --output_path="outputs_directory"

The currently supported source types are "vocals" and "accompaniment". Users can also plug this MSS system into their own programs; see example.py for examples.
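For reference, below is a minimal sketch of calling the separator from a Python program. example.py in the repository is the authoritative reference; the SeparatorWrapper constructor arguments and method name used here are assumptions and may differ between bytesep versions.

import librosa
import soundfile as sf

from bytesep.inference import SeparatorWrapper  # class referenced in the issues below

# Load a stereo mixture at 44.1 kHz; librosa returns (channels, samples) with mono=False.
audio, sample_rate = librosa.load(
    "resources/vocals_accompaniment_10s.mp3", sr=44100, mono=False
)

# Assumed constructor/method names; check example.py for the exact API.
separator = SeparatorWrapper(source_type="vocals", device="cuda")
separated_audio = separator.separate(audio)

# soundfile expects (samples, channels), hence the transpose.
sf.write("separated_results/sep_vocals.wav", separated_audio.T, sample_rate)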

The separation models are trained ONLY on the MUSDB18 dataset (100 songs). Trained checkpoints can be downloaded at: https://zenodo.org/record/5804160.

Train a music source separation system from scratch

0. Download dataset

Users could train on the MUSDB18 dataset to reproduce our music source separation systems. Execute the following script to download and unzip the MUSDB18 dataset:

./scripts/0_download_datasets/musdb18.sh

The dataset looks like:

./datasets/musdb18
├── train (100 files)
│   ├── 'A Classic Education - NightOwl.stem.mp4'
│   └── ...
├── test (50 files)
│   ├── 'Al James - Schoolboy Facination.stem.mp4'
│   └── ...
└── README.md
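As a quick sanity check after downloading, the musdb package (which the packing step below relies on) can enumerate the tracks. A minimal sketch; note that probing the .stem.mp4 files requires ffmpeg/ffprobe to be installed, as the "ffprobe error" issue further down shows.

import musdb

# 100 training tracks and 50 test tracks are expected after a complete download.
mus_train = musdb.DB(root="./datasets/musdb18", subsets=["train"])
mus_test = musdb.DB(root="./datasets/musdb18", subsets=["test"])
print(len(mus_train.tracks), len(mus_test.tracks))  # expected: 100 50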

1. Pack audio files into hdf5 files

Pack audio waveforms into hdf5 files to speed up training.

./scripts/1_pack_audios_to_hdf5s/musdb18/sr=44100,chn=2.sh
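After packing, the contents of a generated hdf5 file can be inspected as below. This is a generic sketch: the output directory and the dataset names inside each file depend on the packing script and the config, so the path here is only a placeholder.

import h5py

# Print every dataset stored in one packed file (names and shapes depend on the packing script).
with h5py.File("path/to/packed_song.h5", "r") as hf:
    def show(name, obj):
        if isinstance(obj, h5py.Dataset):
            print(name, obj.shape, obj.dtype)
    hf.visititems(show)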

2. Create indexes for training

./scripts/2_create_indexes/musdb18/create_indexes.sh

3. Create evaluation audios

./scripts/3_create_evaluation_audios/musdb18/create_evaluation_audios.sh

4. Train & evaluate & save checkpoints

./scripts/4_train/musdb18/train.sh

5. Separate using a user-trained checkpoint

./scripts/5_separate/musdb18/separate.sh

Results

1. Separation Metrics

The following table shows the signal-to-distortion ratio (SDR) metrics of vocals and accompaniment. The MSS systems are trained with only the 100 songs of the MUSDB18 training set, and the metrics are calculated on the 50 test songs. We highly recommend the subband version because it is faster for both training and inference.

Model                       vocals (dB)   accompaniment (dB)
ResUNet143 vocals           8.9           16.8
ResUNet143 Subband vocals   8.7           16.4
MobileNet Subband vocals    7.2           14.6
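For context, SDR figures like those in the table above are typically computed with museval on the MUSDB18 test set. The sketch below illustrates that setup and is an assumption, not the repository's exact evaluation code (see the scripts under scripts/ for that); the ground-truth targets stand in for model outputs.

import musdb
import museval

mus = musdb.DB(root="./datasets/musdb18", subsets=["test"])
track = mus.tracks[0]

# Replace the ground-truth targets below with the model's separated waveforms.
estimates = {
    "vocals": track.targets["vocals"].audio,
    "accompaniment": track.targets["accompaniment"].audio,
}

scores = museval.eval_mus_track(track, estimates)
print(scores)  # median SDR/SIR/ISR/SAR per estimated source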

2. Number of parameters & speed

The following table shows the number of trainable parameters and the inference time for a 1-minute audio clip.

Model                Trainable params.   1-min clip on GPU (Tesla V100)   1-min clip on CPU (Core i7)
ResUNet143 ISMIR     102 million         2.24 s                           53.00 s
ResUNet143 Subband   102 million         0.56 s                           13.68 s
MobileNet Subband    0.306 million       0.33 s                           9.84 s

3. Metrics over training steps

The evaluation metrics over training steps are shown below.

Fine-tune on new datasets

Users can fine-tune pretrained checkpoints on new datasets. The following script is a template showing how to fine-tune a pretrained MSS system on the VCTK dataset for speech enhancement. (This is just an example; it is fine if users do not have the VCTK dataset.) Users can also resume training from a checkpoint by modifying the following script.

./scripts/4_train/vctk-musdb18/finetune.sh
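Independently of the script above, pretrained weights can be warm-started into a model before fine-tuning. A minimal sketch, assuming the checkpoint stores a dictionary with 'step' and 'model' keys (as reported in the resume-training issue further down); the instantiated model class must match the architecture the checkpoint was trained with.

import torch

# Path to one of the downloaded pretrained checkpoints (placeholder).
checkpoint = torch.load(
    "resunet143_subbtandtime_vocals_8.7dB_500k_steps_v2.pth", map_location="cpu"
)
print(checkpoint["step"])         # training step at which the checkpoint was saved
state_dict = checkpoint["model"]  # plain state dict of the separation model

# model = ...                        # instantiate the matching bytesep model from your config
# model.load_state_dict(state_dict)  # then fine-tune as usual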

Cite

[1] Qiuqiang Kong, Yin Cao, Haohe Liu, Keunwoo Choi, Yuxuan Wang, Decoupling Magnitude and Phase Estimation with Deep ResUNet for Music Source Separation, International Society for Music Information Retrieval (ISMIR), 2021.

@inproceedings{kong2021decoupling,
  title={Decoupling Magnitude and Phase Estimation with Deep ResUNet for Music Source Separation.},
  author={Kong, Qiuqiang and Cao, Yin and Liu, Haohe and Choi, Keunwoo and Wang, Yuxuan },
  booktitle={ISMIR},
  year={2021},
  organization={Citeseer}
}

Contact

Qiuqiang Kong

Frequently Asked Questions (FAQ)

FAQ.md

External Links

Other open-sourced music source separation projects include, but are not limited to:

Subband ResUNet: https://github.com/haoheliu/Subband-Music-Separation

Demucs: https://github.com/facebookresearch/demucs

Spleeter: https://github.com/deezer/spleeter

Asteroid: https://github.com/asteroid-team/asteroid

Open-Unmix: https://github.com/sigsep/open-unmix-pytorch

music_source_separation's People

Contributors

cclauss, qiuqiangkong

music_source_separation's Issues

The checkpoint download command is wrong

Hi, the checkpoint download command should be the following, with a hyphen rather than an underscore:
python3 -m bytesep download-checkpoints

ffprobe error

Hi, I created a new Anaconda3 environment, installed the requirements, and downloaded musdb18 using the command lines in the README. But I could not pack the files into hdf5:

$ PYTHONPATH=. ./scripts/1_pack_audios_to_hdf5s/musdb18/sr=44100,chn=2.sh
MUSDB18_DATASET_DIR=./datasets/musdb18
WORKSPACE=./workspaces/bytesep
Traceback (most recent call last):
  File "bytesep/dataset_creation/pack_audios_to_hdf5s/musdb18.py", line 195, in <module>
    pack_audios_to_hdf5s(args)
  File "bytesep/dataset_creation/pack_audios_to_hdf5s/musdb18.py", line 37, in pack_audios_to_hdf5s
    mus = musdb.DB(root=dataset_dir, subsets=[subset], split=split)
  File "/data1/howard/anaconda3/lib/python3.8/site-packages/musdb/__init__.py", line 115, in __init__
    self.tracks = self.load_mus_tracks(subsets=subsets, split=split)
  File "/data1/howard/anaconda3/lib/python3.8/site-packages/musdb/__init__.py", line 256, in load_mus_tracks
    track = MultiTrack(
  File "/data1/howard/anaconda3/lib/python3.8/site-packages/musdb/audio_classes.py", line 133, in __init__
    super(MultiTrack, self).__init__(path=path, *args, **kwargs)
  File "/data1/howard/anaconda3/lib/python3.8/site-packages/musdb/audio_classes.py", line 52, in __init__
    self.info = stempeg.Info(self.path)
  File "/data1/howard/anaconda3/lib/python3.8/site-packages/stempeg/read.py", line 328, in __init__
    self.info = ffmpeg.probe(filename)
  File "/data1/howard/anaconda3/lib/python3.8/site-packages/ffmpeg/_probe.py", line 23, in probe
    raise Error('ffprobe', out, err)
ffmpeg._run.Error: ffprobe error (see stderr output for detail)

I have installed ffmpeg using conda install -c conda-forge ffmpeg

Why does the wrong code still work?

Hi @qiuqiangkong

I notice you modified the code of resunet_subbandtime.py a couple of days ago.

Now the code is:

separated_subband_audio = torch.stack(
    [
        self.feature_maps_to_wav(
            input_tensor=x[:, j :: self.subbands_num, :, :],
            sp=mag[:, j :: self.subbands_num, :, :],
            sin_in=sin_in[:, j :: self.subbands_num, :, :],
            cos_in=cos_in[:, j :: self.subbands_num, :, :],
            audio_length=audio_length,
        )
        for j in range(self.subbands_num)
    ],
    dim=2,
)

I think this change was made for compatibility with pqmf.synthesis.

The old code is:

separated_subband_audio = torch.cat(
    [
        self.feature_maps_to_wav(
            input_tensor=net_output[:, j * C1 : (j + 1) * C1, :, :],
            sp=mag[:, j * C2 : (j + 1) * C2, :, :],
            sin_in=sin_in[:, j * C2 : (j + 1) * C2, :, :],
            cos_in=cos_in[:, j * C2 : (j + 1) * C2, :, :],
            audio_length=audio_length,
        )
        for j in range(self.subbands_num)
    ],
    dim=1,
)

The old code is not compatible with pqmf.synthesis: pqmf.synthesis assumes subband-first ordering, while the tensor produced by the old code is channel-first.
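To illustrate how the two slicing patterns group the channels differently, here is a small standalone demo (not code from the repository):

import torch

subbands_num, channels = 4, 2
x = torch.arange(subbands_num * channels).view(1, -1)  # dim 1 stands in for the channel axis

# New code: strided slicing, x[:, j::subbands_num], picks channels {j, j + S, j + 2S, ...}.
strided = [x[:, j::subbands_num].tolist() for j in range(subbands_num)]
# Old code: contiguous chunks, x[:, j*C:(j+1)*C], picks channels {j*C, ..., (j+1)*C - 1}.
chunked = [x[:, j * channels:(j + 1) * channels].tolist() for j in range(subbands_num)]

print(strided)  # [[[0, 4]], [[1, 5]], [[2, 6]], [[3, 7]]]
print(chunked)  # [[[0, 1]], [[2, 3]], [[4, 5]], [[6, 7]]]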

It is strange that the old code still works and can still produce a pretty good model. Do you know the reason?

Thanks

Are the model checkpoint links broken?

Hi, when I run python3 -m bytesep download-checkpoints, the output is shown below. Is this a mistake on my side, or have the checkpoint links gone dead?
Thanks for your reply!

--2022-06-25 00:07:44--  https://zenodo.org/record/5804160/files%5Cmobilenet_subbtandtime_vocals_7.2dB_500k_steps_v2.pth?download=1
Connecting to 127.0.0.1:7890... connected.
Proxy request sent, awaiting response... 404 NOT FOUND
2022-06-25 00:07:46 ERROR 404: NOT FOUND.

--2022-06-25 00:07:46--  https://zenodo.org/record/5804160/files%5Cmobilenet_subbtandtime_accompaniment_14.6dB_500k_steps_v2.pth?download=1
Connecting to 127.0.0.1:7890... connected.
Proxy request sent, awaiting response... 404 NOT FOUND
2022-06-25 00:07:47 ERROR 404: NOT FOUND.

--2022-06-25 00:07:47--  https://zenodo.org/record/5804160/files%5Cresunet143_subbtandtime_vocals_8.7dB_500k_steps_v2.pth?download=1
Connecting to 127.0.0.1:7890... connected.
Proxy request sent, awaiting response... 404 NOT FOUND
2022-06-25 00:07:49 ERROR 404: NOT FOUND.

--2022-06-25 00:07:49--  https://zenodo.org/record/5804160/files%5Cresunet143_subbtandtime_accompaniment_16.4dB_500k_steps_v2.pth?download=1
Connecting to 127.0.0.1:7890... connected.
Proxy request sent, awaiting response... 404 NOT FOUND
2022-06-25 00:07:50 ERROR 404: NOT FOUND.

--2022-06-25 00:07:50--  https://zenodo.org/record/5804160/files%5Ctrain_scripts.zip?download=1
Connecting to 127.0.0.1:7890... connected.
Proxy request sent, awaiting response... 404 NOT FOUND
2022-06-25 00:07:52 ERROR 404: NOT FOUND.

Archive:  C:/Users/Administrator/bytesep_data/train_scripts.zip
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in C:/Users/Administrator/bytesep_data/train_scripts.zip,
        and cannot find C:/Users/Administrator/bytesep_data/train_scripts.zip.zip, period.

Running the example fails on Windows 10

Hi team, I installed bytesep according to the README and executed the demo command.

python3 -m bytesep separate \
    --source_type="vocals" \
    --audio_path="./resources/vocals_accompaniment_10s.mp3" \
    --output_path="separated_results/output.mp3"

However, I get no result and no error output. (A screenshot was attached to the original issue.)

My env:
Windows10
Python 3.7.5 (installed by conda)

I tried the command prompt, PowerShell, and Git Bash, and the results were the same.
Is Windows not supported?

I would appreciate any help with this!

integrate with Lightning ecosystem CI

Hello, and so happy to see you use PyTorch Lightning! 🎉
Just wondering if you have already heard about the fairly new PyTorch Lightning (PL) ecosystem CI, to which we would like to invite you. You can check out our blog post about it: Stay Ahead of Breaking Changes with the New Lightning Ecosystem CI
As you use the PL framework for your cool project, we would like to enhance your experience and offer you safe updates with our future releases. At the moment you run tests against a particular PL version, but it may accidentally happen that the next version is incompatible with your project... 😕 We do not intend to change anything on our side, but we still have a solution: the ecosystem CI tests both your latest development head and ours, so we can catch incompatibilities very early and avoid releasing a bad version... 👍

What needs to be done?

What will you get?

  • scheduled nightly testing configured for development/stable versions
  • slack notification if something went wrong to investigate
  • testing also on multi-GPU machine as our gift to you 🐰

cc: @Borda

No module named 'torch'

python3 setup.py install fails with a missing module error. I'm on Python 3.9.7 on kernel 5.14.2-arch1-2.

Help!

Please help me!
Building wheel for h5py (setup.py) ... error

my mp3 doesn't work

I tried an mp3, but it failed:
Using cuda for separating ..
/home/zeppsh/anaconda3/envs/music/lib/python3.7/site-packages/librosa/core/audio.py:165: UserWarning: PySoundFile failed. Trying audioread instead.
  warnings.warn("PySoundFile failed. Trying audioread instead.")
Traceback (most recent call last):
  File "/home/zeppsh/anaconda3/envs/music/lib/python3.7/site-packages/librosa/core/audio.py", line 149, in load
    with sf.SoundFile(path) as sf_desc:
  File "/home/zeppsh/anaconda3/envs/music/lib/python3.7/site-packages/soundfile.py", line 629, in __init__
    self._file = self._open(file, mode_int, closefd)
  File "/home/zeppsh/anaconda3/envs/music/lib/python3.7/site-packages/soundfile.py", line 1184, in _open
    "Error opening {0!r}: ".format(self.name))
  File "/home/zeppsh/anaconda3/envs/music/lib/python3.7/site-packages/soundfile.py", line 1357, in _error_check
    raise RuntimeError(prefix + _ffi.string(err_str).decode('utf-8', 'replace'))
RuntimeError: Error opening '2.mp3': File contains data in an unknown format.

Is there any limitation on the mp3 format?
Would you please share your test mp3, ./resources/vocals_accompaniment_10s.mp3?

CUDA out of memory

Hi! First, thanks for your contribution to music source separation!
Since I'd like to separate vocals & accompaniment using only one model, and the pre-trained models you provided each have only one target, I decided to train from scratch.
I also modified the code so that a validation set is added.

However, I cannot train it on 3 GTX 1080 GPUs even with a small batch size (<12), and I thought that ResUNet_ismir doesn't require that much memory? Or does it really need a lot of GPU memory?
Could you provide some insight, such as how long it takes you to train a model and how much GPU memory it requires?

Thank you a lot!

Cannot extract vocals (accompaniment extraction works fine)

--2021-10-21 23:15:39--  https://zenodo.org/record/5513378/files/resunet143_subbtandtime_vocals_8.8dB_350k_steps?download=1
Resolving zenodo.org (zenodo.org)... 137.138.76.77
Connecting to zenodo.org (zenodo.org)|137.138.76.77|:443... connected.
HTTP request sent, awaiting response... 404 NOT FOUND
2021-10-21 23:15:40 ERROR 404: NOT FOUND.

↑ It looks like the model file returns a 404...

The command is
python separate_scripts/separate.py --audio_path="(my home directory)/Desktop/test.mp3" --source_type="vocals"
The full output is as follows:

/Users/shihongsuo/opt/anaconda3/lib/python3.8/site-packages/librosa/core/audio.py:162: UserWarning: PySoundFile failed. Trying audioread instead.
  warnings.warn("PySoundFile failed. Trying audioread instead.")
Checkpoint path: /Users/shihongsuo/bytesep_data/resunet143_subbtandtime_vocals_8.8dB_350k_steps.pth
--2021-10-21 23:15:39--  https://zenodo.org/record/5513378/files/resunet143_subbtandtime_vocals_8.8dB_350k_steps?download=1
Resolving zenodo.org (zenodo.org)... 137.138.76.77
Connecting to zenodo.org (zenodo.org)|137.138.76.77|:443... connected.
HTTP request sent, awaiting response... 404 NOT FOUND
2021-10-21 23:15:40 ERROR 404: NOT FOUND.

Traceback (most recent call last):
  File "separate_scripts/separate.py", line 67, in <module>
    separate(args)
  File "separate_scripts/separate.py", line 26, in separate
    separator = SeparatorWrapper(
  File "/Users/shihongsuo/opt/anaconda3/lib/python3.8/site-packages/bytesep/inference.py", line 255, in __init__
    checkpoint = torch.load(self.checkpoint_path, map_location='cpu')
  File "/Users/shihongsuo/opt/anaconda3/lib/python3.8/site-packages/torch/serialization.py", line 595, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/Users/shihongsuo/opt/anaconda3/lib/python3.8/site-packages/torch/serialization.py", line 764, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
EOFError: Ran out of input

wrong checkpoint file size in __main__.py

When extracting the accompaniment, an error is reported: the downloaded file size does not match the expected size.

The problem is on line 127:

assert os.path.getsize(checkpoint_path) == 414046363, error_message

and I think it should be:

assert os.path.getsize(checkpoint_path) == 414036369, error_message

KeyError: 'Trying to restore training state but checkpoint contains only the model. This is probably due to `ModelCheckpoint.save_weights_only` being set to `True`.'

Q1:
Training on musdb18 was interrupted. I set pl.Trainer(resume_from_checkpoint='path/to/checkpoint.pth') to reload the parameters saved before the interruption and resume training, and the error above was raised. I checked the code in /opt/conda/lib/python3.6/site-packages/pytorch_lightning/callbacks/model_checkpoint.py; there, save_weights_only: bool = False. Printing the checkpoint shows only the keys step and model, and the keys under model are:
model.stft.conv_real.weight
model.stft.conv_imag.weight
model.istft.ola_window
model.istft.conv_real.weight
model.istft.conv_imag.weight
model.bn0.weight
model.bn0.bias
model.bn0.running_mean
model.bn0.running_var
model.bn0.num_batches_tracked
model.encoder_block1.conv_block.conv1.weight
model.encoder_block1.conv_block.bn1.weight
model.encoder_block1.conv_block.bn1.bias
model.encoder_block1.conv_block.bn1.running_mean
model.encoder_block1.conv_block.bn1.running_var
model.encoder_block1.conv_block.bn1.num_batches_tracked
model.encoder_block1.conv_block.conv2.weight
model.encoder_block1.conv_block.bn2.weight
model.encoder_block1.conv_block.bn2.bias
model.encoder_block1.conv_block.bn2.running_mean
model.encoder_block1.conv_block.bn2.running_var
model.encoder_block1.conv_block.bn2.num_batches_tracked
model.encoder_block2.conv_block.conv1.weight
model.encoder_block2.conv_block.bn1.weight
model.encoder_block2.conv_block.bn1.bias
model.encoder_block2.conv_block.bn1.running_mean
model.encoder_block2.conv_block.bn1.running_var
model.encoder_block2.conv_block.bn1.num_batches_tracked
model.encoder_block2.conv_block.conv2.weight
model.encoder_block2.conv_block.bn2.weight
model.encoder_block2.conv_block.bn2.bias
model.encoder_block2.conv_block.bn2.running_mean
model.encoder_block2.conv_block.bn2.running_var
model.encoder_block2.conv_block.bn2.num_batches_tracked
model.encoder_block3.conv_block.conv1.weight
model.encoder_block3.conv_block.bn1.weight
model.encoder_block3.conv_block.bn1.bias
model.encoder_block3.conv_block.bn1.running_mean
model.encoder_block3.conv_block.bn1.running_var
model.encoder_block3.conv_block.bn1.num_batches_tracked
model.encoder_block3.conv_block.conv2.weight
model.encoder_block3.conv_block.bn2.weight
model.encoder_block3.conv_block.bn2.bias
model.encoder_block3.conv_block.bn2.running_mean
model.encoder_block3.conv_block.bn2.running_var
model.encoder_block3.conv_block.bn2.num_batches_tracked
model.encoder_block4.conv_block.conv1.weight
model.encoder_block4.conv_block.bn1.weight
model.encoder_block4.conv_block.bn1.bias
model.encoder_block4.conv_block.bn1.running_mean
model.encoder_block4.conv_block.bn1.running_var
model.encoder_block4.conv_block.bn1.num_batches_tracked
model.encoder_block4.conv_block.conv2.weight
model.encoder_block4.conv_block.bn2.weight
model.encoder_block4.conv_block.bn2.bias
model.encoder_block4.conv_block.bn2.running_mean
model.encoder_block4.conv_block.bn2.running_var
model.encoder_block4.conv_block.bn2.num_batches_tracked
model.encoder_block5.conv_block.conv1.weight
model.encoder_block5.conv_block.bn1.weight
model.encoder_block5.conv_block.bn1.bias
model.encoder_block5.conv_block.bn1.running_mean
model.encoder_block5.conv_block.bn1.running_var
model.encoder_block5.conv_block.bn1.num_batches_tracked
model.encoder_block5.conv_block.conv2.weight
model.encoder_block5.conv_block.bn2.weight
model.encoder_block5.conv_block.bn2.bias
model.encoder_block5.conv_block.bn2.running_mean
model.encoder_block5.conv_block.bn2.running_var
model.encoder_block5.conv_block.bn2.num_batches_tracked
model.encoder_block6.conv_block.conv1.weight
model.encoder_block6.conv_block.bn1.weight
model.encoder_block6.conv_block.bn1.bias
model.encoder_block6.conv_block.bn1.running_mean
model.encoder_block6.conv_block.bn1.running_var
model.encoder_block6.conv_block.bn1.num_batches_tracked
model.encoder_block6.conv_block.conv2.weight
model.encoder_block6.conv_block.bn2.weight
model.encoder_block6.conv_block.bn2.bias
model.encoder_block6.conv_block.bn2.running_mean
model.encoder_block6.conv_block.bn2.running_var
model.encoder_block6.conv_block.bn2.num_batches_tracked
model.conv_block7.conv1.weight
model.conv_block7.bn1.weight
model.conv_block7.bn1.bias
model.conv_block7.bn1.running_mean
model.conv_block7.bn1.running_var
model.conv_block7.bn1.num_batches_tracked
model.conv_block7.conv2.weight
model.conv_block7.bn2.weight
model.conv_block7.bn2.bias
model.conv_block7.bn2.running_mean
model.conv_block7.bn2.running_var
model.conv_block7.bn2.num_batches_tracked
model.decoder_block1.conv1.weight
model.decoder_block1.bn1.weight
model.decoder_block1.bn1.bias
model.decoder_block1.bn1.running_mean
model.decoder_block1.bn1.running_var
model.decoder_block1.bn1.num_batches_tracked
model.decoder_block1.conv_block2.conv1.weight
model.decoder_block1.conv_block2.bn1.weight
model.decoder_block1.conv_block2.bn1.bias
model.decoder_block1.conv_block2.bn1.running_mean
model.decoder_block1.conv_block2.bn1.running_var
model.decoder_block1.conv_block2.bn1.num_batches_tracked
model.decoder_block1.conv_block2.conv2.weight
model.decoder_block1.conv_block2.bn2.weight
model.decoder_block1.conv_block2.bn2.bias
model.decoder_block1.conv_block2.bn2.running_mean
model.decoder_block1.conv_block2.bn2.running_var
model.decoder_block1.conv_block2.bn2.num_batches_tracked
model.decoder_block2.conv1.weight
model.decoder_block2.bn1.weight
model.decoder_block2.bn1.bias
model.decoder_block2.bn1.running_mean
model.decoder_block2.bn1.running_var
model.decoder_block2.bn1.num_batches_tracked
model.decoder_block2.conv_block2.conv1.weight
model.decoder_block2.conv_block2.bn1.weight
model.decoder_block2.conv_block2.bn1.bias
model.decoder_block2.conv_block2.bn1.running_mean
model.decoder_block2.conv_block2.bn1.running_var
model.decoder_block2.conv_block2.bn1.num_batches_tracked
model.decoder_block2.conv_block2.conv2.weight
model.decoder_block2.conv_block2.bn2.weight
model.decoder_block2.conv_block2.bn2.bias
model.decoder_block2.conv_block2.bn2.running_mean
model.decoder_block2.conv_block2.bn2.running_var
model.decoder_block2.conv_block2.bn2.num_batches_tracked
model.decoder_block3.conv1.weight
model.decoder_block3.bn1.weight
model.decoder_block3.bn1.bias
model.decoder_block3.bn1.running_mean
model.decoder_block3.bn1.running_var
model.decoder_block3.bn1.num_batches_tracked
model.decoder_block3.conv_block2.conv1.weight
model.decoder_block3.conv_block2.bn1.weight
model.decoder_block3.conv_block2.bn1.bias
model.decoder_block3.conv_block2.bn1.running_mean
model.decoder_block3.conv_block2.bn1.running_var
model.decoder_block3.conv_block2.bn1.num_batches_tracked
model.decoder_block3.conv_block2.conv2.weight
model.decoder_block3.conv_block2.bn2.weight
model.decoder_block3.conv_block2.bn2.bias
model.decoder_block3.conv_block2.bn2.running_mean
model.decoder_block3.conv_block2.bn2.running_var
model.decoder_block3.conv_block2.bn2.num_batches_tracked
model.decoder_block4.conv1.weight
model.decoder_block4.bn1.weight
model.decoder_block4.bn1.bias
model.decoder_block4.bn1.running_mean
model.decoder_block4.bn1.running_var
model.decoder_block4.bn1.num_batches_tracked
model.decoder_block4.conv_block2.conv1.weight
model.decoder_block4.conv_block2.bn1.weight
model.decoder_block4.conv_block2.bn1.bias
model.decoder_block4.conv_block2.bn1.running_mean
model.decoder_block4.conv_block2.bn1.running_var
model.decoder_block4.conv_block2.bn1.num_batches_tracked
model.decoder_block4.conv_block2.conv2.weight
model.decoder_block4.conv_block2.bn2.weight
model.decoder_block4.conv_block2.bn2.bias
model.decoder_block4.conv_block2.bn2.running_mean
model.decoder_block4.conv_block2.bn2.running_var
model.decoder_block4.conv_block2.bn2.num_batches_tracked
model.decoder_block5.conv1.weight
model.decoder_block5.bn1.weight
model.decoder_block5.bn1.bias
model.decoder_block5.bn1.running_mean
model.decoder_block5.bn1.running_var
model.decoder_block5.bn1.num_batches_tracked
model.decoder_block5.conv_block2.conv1.weight
model.decoder_block5.conv_block2.bn1.weight
model.decoder_block5.conv_block2.bn1.bias
model.decoder_block5.conv_block2.bn1.running_mean
model.decoder_block5.conv_block2.bn1.running_var
model.decoder_block5.conv_block2.bn1.num_batches_tracked
model.decoder_block5.conv_block2.conv2.weight
model.decoder_block5.conv_block2.bn2.weight
model.decoder_block5.conv_block2.bn2.bias
model.decoder_block5.conv_block2.bn2.running_mean
model.decoder_block5.conv_block2.bn2.running_var
model.decoder_block5.conv_block2.bn2.num_batches_tracked
model.decoder_block6.conv1.weight
model.decoder_block6.bn1.weight
model.decoder_block6.bn1.bias
model.decoder_block6.bn1.running_mean
model.decoder_block6.bn1.running_var
model.decoder_block6.bn1.num_batches_tracked
model.decoder_block6.conv_block2.conv1.weight
model.decoder_block6.conv_block2.bn1.weight
model.decoder_block6.conv_block2.bn1.bias
model.decoder_block6.conv_block2.bn1.running_mean
model.decoder_block6.conv_block2.bn1.running_var
model.decoder_block6.conv_block2.bn1.num_batches_tracked
model.decoder_block6.conv_block2.conv2.weight
model.decoder_block6.conv_block2.bn2.weight
model.decoder_block6.conv_block2.bn2.bias
model.decoder_block6.conv_block2.bn2.running_mean
model.decoder_block6.conv_block2.bn2.running_var
model.decoder_block6.conv_block2.bn2.num_batches_tracked
model.after_conv_block1.conv1.weight
model.after_conv_block1.bn1.weight
model.after_conv_block1.bn1.bias
model.after_conv_block1.bn1.running_mean
model.after_conv_block1.bn1.running_var
model.after_conv_block1.bn1.num_batches_tracked
model.after_conv_block1.conv2.weight
model.after_conv_block1.bn2.weight
model.after_conv_block1.bn2.bias
model.after_conv_block1.bn2.running_mean
model.after_conv_block1.bn2.running_var
model.after_conv_block1.bn2.num_batches_tracked
model.after_conv2.weight
model.after_conv2.bias
What could be the cause of this error?
Q2:
For saving models, you use the custom get_callbacks function rather than the checkpoint_callback built into pl.Trainer, right? The model is saved as checkpoint = {'step': global_step, 'model': self.model.state_dict()}, which only stores the model parameters. If I want to resume training, which settings do I need to configure?

Cannot install "torch"

It shows "PackagesNotFoundError: The following packages are not available from current channels:
torch"

Do you know how to fix it?

Readlink & Optimizer_Type errors when preparing the audio and running training script

Thanks for sharing your great work!

Following the steps in the README, I get these error messages when preparing the audio and running the training script:

(Screenshots of the error messages were attached to the original issue.)

How do I solve these issues?

I am running the code on a machine with macOS Big Sur.

Also, do you have ONNX versions of your models to share? The main reason I was running the training code is to convert your PyTorch models to ONNX; it would help us a lot if you already have them available!

Thanks!

Separation results contain obvious high-frequency noise

Hi, I retrained models with this algorithm. The vocals and accompaniment models work very well, but the four-stem model trained on MUSDB produces obvious high-frequency noise in the separated bass and drums; a piano separation model trained on piano data has the same problem. Taking bass as an example, a spectrogram comparison with the Spleeter result was attached to the original issue.
What could be the cause? Could any of the data augmentation parameters used during training affect this?
Looking forward to your reply, thanks!

Two command and parameter problems found on the main branch

1. The command documented in README.md, python3 -m bytesep download_checkpoints, should end with a hyphen rather than an underscore, i.e. python3 -m bytesep download-checkpoints.
2. Removing the accompaniment passes the parameter check, but when removing vocals (source type "accompaniment"), the size check of the loaded resunet143_subbtandtime_accompaniment_16.4dB_500k_steps_v2.pth compares against 414046363 in the code, while the actual file size is 414036369. Line 127 of __main__.py needs to be changed from 414046363 to 414036369.
With this change applied locally, everything runs fine.

Evaluation scores do not seem to improve during training

Hi, I modified this config so that there are 2 targets (vocals & accompaniment).
(https://github.com/bytedance/music_source_separation/blob/master/scripts/4_train/musdb18/configs/accompaniment-vocals%2Cresunet_subbandtime.yaml)
I also changed the batch size from 16 to 12, and the MUSDB I use is the HQ dataset (.wav).
Those are the only modifications I made.

But given the evaluation scores during training, I'm not sure it will reach 16.x dB for accompaniment and 8.x dB for vocals at step 500001:

  • Step: 0, accompaniment: -0.606, vocals: -2.908
  • Step: 10000, accompaniment: 2.662, vocals: 0.399
  • Step: 20000, accompaniment: 2.680, vocals: 0.451
  • Step: 30000, accompaniment: 2.702, vocals: 0.498
  • Step: 40000, accompaniment: 2.726, vocals: 0.518
  • Step: 50000, accompaniment: 2.719, vocals: 0.539

Thanks in advance!

poor performance using short segments (<1 s)

Hi,

Thanks for your great work! The repo achieves good performance for offline music separation.

However, in an online setting (using short segments of <1 s) it performs poorly. Some short bursts of accompaniment remain in the separated vocals, and the SDR is about -2 dB. Could you provide some suggestions on how to achieve better performance?

Thanks!

Dataset for drums

In the sample YouTube video, there is a section that separates the drums.

I could not find a drum separation script or model in the repo.

Can you also share that part?

Default models used for training and inference are inconsistent

The default models used at inference time are resunet143_subbtandtime_vocals_8.8dB_350k_steps / resunet143_subbtandtime_accompaniment_16.4dB_350k_steps.pth, while the training default is vocals-accompaniment.unet, and optimizer_type is missing. Why is that, and if I want to train my own model, which one should I use? Thanks.

Conversion no longer works at all

On Colab it no longer works at all.
Running !python3 /content/music_source_separation/bytesep/separate.py gives this error:

OSError: /usr/local/lib/python3.7/dist-packages/torchtext/_torchtext.so: undefined symbol: _ZNK3c104Type14isSubtypeOfExtESt10shared_ptrIS0_EPSo

I hope this can be fixed, thanks!

No files are generated

(A screenshot was attached to the original issue.)

As shown in the screenshot above, after running ./scripts/1_pack_audios_to_hdf5s/musdb18/sr=44100,chn=2.sh, no files are generated. What could be the reason?

WE ARE NOT ABLE TO (DOWNLOAD_CHECKPOINTS)

Hi, I'm trying to use bytedance;

But I'm facing a problem when executing this command:

python -m bytesep download_checkpoints

But I'm getting the following error:

C:\Users\lucas\MSS>python -m bytesep download_checkpoints
usage: __main__.py [-h] {download-checkpoints,separate} ...
__main__.py: error: argument mode: invalid choice: 'download_checkpoints' (choose from 'download-checkpoints', 'separate')

I would like to know how I can solve this problem so that I can use it on my songs.

Note: I was able to use it normally until October 2021, but now it no longer works.

Many thanks in advance, regards,
Lucas.

How can I install inplace-abn?

Some errors occurred during installation.

The error is:
error: command 'C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\cl.exe' failed with exit status 2
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
File "E:\Sotfware\anaconda\envs\torch\lib\site-packages\colorama\ansitowin32.py", line 59, in closed
return stream.closed
ValueError: underlying buffer has been detached
----------------------------------------
ERROR: Command errored out with exit status 1: 'E:\Sotfware\anaconda\envs\torch\python.exe' -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\HASEE\AppData\Local\Temp\pip-install-983wdizd\inplace-abn_a
aaed263bb064ef2b4c50af117b614df\setup.py'"'"'; file='"'"'C:\Users\HASEE\AppData\Local\Temp\pip-install-983wdizd\inplace-abn_aaaed263bb064ef2b4c50af117b614df\setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(_file
_) if os.path.exists(file) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record 'C:\Users
HASEE\AppData\Local\Temp\pip-record-c2alay5x\install-record.txt' --single-version-externally-managed --compile --install-headers 'E:\Sotfware\anaconda\envs\torch\Include\inplace-abn' Check the logs for full command output.

So how can I install inplace-abn?
Thanks

ModuleNotFoundError

Hello, I have completed the installation and downloaded the models, but when using the command

"python3 separate_scripts/separate.py" or
"./separate_scripts/separate_accompaniment.sh "resources/vocals_accompaniment_10s.mp3" "sep_accompaniment.mp3"

I get the following error:

Traceback (most recent call last):
  File "bytesep/inference.py", line 13, in <module>
    from bytesep.models.lightning_modules import get_model_class
ModuleNotFoundError: No module named 'bytesep'

How can I fix this? Thanks in advance.

Checkpoint is incomplete, please download again!

Hi, after I successfully separated the vocals, I got an error when separating the accompaniment saying the checkpoint file is incomplete. I checked and both the script and the model file are there; re-downloading gives the same error.

Traceback (most recent call last):
  File "E:\Anaconda3\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "E:\Anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "E:\Anaconda3\lib\site-packages\bytesep\__main__.py", line 242, in <module>
    separate(args)
  File "E:\Anaconda3\lib\site-packages\bytesep\__main__.py", line 162, in separate
    config_yaml, checkpoint_path = get_paths(source_type, model_type)
  File "E:\Anaconda3\lib\site-packages\bytesep\__main__.py", line 127, in get_paths
    assert os.path.getsize(checkpoint_path) == 414046363, error_message
AssertionError: Checkpoint is incomplete, please download again!

Training with new data

Hi, during training you set up dedicated preprocessing for musdb18. If I use a different dataset (not one of the datasets you listed), how should I preprocess the data? How should the information be written into the hdf5 files?

An error occurred

resunet143_subbtandtime_accompaniment_16.4dB_500k_steps_v2

assert os.path.getsize(checkpoint_path) == 414046363, error_message
AssertionError: Checkpoint is incomplete, please download again!

All-Zero Separations

Hi,

first of all, thank you for open-sourcing the code!

I've cloned the repository and downloaded the pretrained models with the ./separate_scripts/download_checkpoints.sh script.

By looking at the weights of the model (resunet143_ismir2021_vocals_8.9dB_350k_steps.pth), I noticed that the running_var of the batchnorm layers in the first decoding blocks has very high mean values, even reaching 2*10^4: is this expected?

The reason I'm asking is the following.
I'm running inference using this script:

#!/bin/bash

WORKSPACE=<my_workspace_path>

echo "WORKSPACE=${WORKSPACE}"
export PYTHONPATH=./

# Users can modify the following config file.
TRAIN_CONFIG_YAML="scripts/4_train/musdb18/configs/vocals-accompaniment,resunet_ismir2021.yaml"

CHECKPOINT_PATH="<my_path_to_checkpoints>/resunet143_ismir2021_vocals_8.9dB_350k_steps.pth"

IN_FILE="<my_path_to_demo_songs>/PanicStation.wav"
OUT_FILE="<my_path_to_results>/PS_separations.wav"

# Inference
python3 bytesep/inference.py \
    --config_yaml=$TRAIN_CONFIG_YAML \
    --checkpoint_path=$CHECKPOINT_PATH \
    --audio_path=$IN_FILE \
    --output_path=$OUT_FILE

The output contains all zeros, in all channels.

By looking at the intermediate feature maps, I noticed that their values increase while going deeper into the architecture. This happens gradually, until the variable x_center at this line contains values such as 10^36.
At that point, I get infinity, followed by NaNs, which eventually result in all-zeros output.

I'm not sure this is related to the high values in the running_var, but the deeper architecture is the only one that causes this trouble (the simple UNet does not).

Does this happen to some of you as well?
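For anyone reproducing this, the batch-norm running statistics can be checked directly from the checkpoint. A minimal standalone sketch; the 'model' key layout is an assumption based on the resume-training issue above, so adjust the key access if the checkpoint is stored differently.

import torch

ckpt = torch.load("resunet143_ismir2021_vocals_8.9dB_350k_steps.pth", map_location="cpu")
state = ckpt["model"] if isinstance(ckpt, dict) and "model" in ckpt else ckpt

# Report the mean and max of every running_var buffer in the state dict.
for name, tensor in state.items():
    if name.endswith("running_var"):
        print(f"{name}: mean={tensor.mean().item():.1f} max={tensor.max().item():.1f}")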

About bass and drums separation

Congratulations, your model is spectacular and far surpasses anything I have tested so far. Incredible!!! I don't have a GPU to train the model, so I hope you can provide the checkpoints for extracting bass and drums. Either way, thank you very much for your work and good luck with your future research.

Will the instruments_dataset be open-sourced?

While trying to train a piano separation model on MAESTRO, I found that mixture data is needed, but the script for downloading the mixture data, scripts/0_download_datasets/instruments.sh, contains no download link.
