
music_source_separation's Introduction

Music Source Separation

Music source separation is the task of separating an audio recording into its individual sources. This repository is a PyTorch implementation of music source separation. Users can separate their favorite songs into different sources by installing this repository, and can also train their own source separation systems. The repository can also be used to train speech enhancement, instrument separation, and other separation systems.

Demos

Vocals and accompaniment separation: https://www.youtube.com/watch?v=WH4m5HYzHsg

Installation

Install Python 3.7.

The installation of bytesep can fail on higher versions of Python. We suggest using conda (or another environment management tool) to manage the packages.

pip install bytesep==0.1.1

Separation

After installation, separating your favorite songs is easy. Users can execute the following commands from any directory.

python3 -m bytesep download-checkpoints
python3 -m bytesep separate \
    --source_type="vocals" \
    --audio_path="./resources/vocals_accompaniment_10s.mp3" \
    --output_path="separated_results/output.mp3"

Users can also put many audio files into a directory and separate them all.

python3 -m bytesep separate \
    --source_type="vocals" \
    --audio_path="audios_directory" \
    --output_path="outputs_directory"

The currently supported source types are "vocals" and "accompaniment". Users can also plug this MSS system into their own programs; see example.py for examples.
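For reference, below is a minimal sketch of calling the separator from a Python program. example.py in the repository is the authoritative reference; the SeparatorWrapper constructor arguments and method name used here are assumptions and may differ between bytesep versions.

import librosa
import soundfile as sf

from bytesep.inference import SeparatorWrapper  # class referenced in the issues below

# Load a stereo mixture at 44.1 kHz; librosa returns (channels, samples) with mono=False.
audio, sample_rate = librosa.load(
    "resources/vocals_accompaniment_10s.mp3", sr=44100, mono=False
)

# Assumed constructor/method names; check example.py for the exact API.
separator = SeparatorWrapper(source_type="vocals", device="cuda")
separated_audio = separator.separate(audio)

# soundfile expects (samples, channels), hence the transpose.
sf.write("separated_results/sep_vocals.wav", separated_audio.T, sample_rate)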

The separation models are trained ONLY on the MUSDB18 dataset (100 songs). Trained checkpoints can be downloaded at: https://zenodo.org/record/5804160.

Train a music source separation system from scratch

0. Download dataset

Users could train on the MUSDB18 dataset to reproduce our music source separation systems. Execute the following script to download and unzip the MUSDB18 dataset:

./scripts/0_download_datasets/musdb18.sh

The dataset looks like:

./datasets/musdb18
├── train (100 files)
│   ├── 'A Classic Education - NightOwl.stem.mp4'
│   └── ...
├── test (50 files)
│   ├── 'Al James - Schoolboy Facination.stem.mp4'
│   └── ...
└── README.md
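As a quick sanity check after downloading, the musdb package (which the packing step below relies on) can enumerate the tracks. A minimal sketch; note that probing the .stem.mp4 files requires ffmpeg/ffprobe to be installed, as the "ffprobe error" issue further down shows.

import musdb

# 100 training tracks and 50 test tracks are expected after a complete download.
mus_train = musdb.DB(root="./datasets/musdb18", subsets=["train"])
mus_test = musdb.DB(root="./datasets/musdb18", subsets=["test"])
print(len(mus_train.tracks), len(mus_test.tracks))  # expected: 100 50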

1. Pack audio files into hdf5 files

Pack audio waveforms into hdf5 files to speed up training.

./scripts/1_pack_audios_to_hdf5s/musdb18/sr=44100,chn=2.sh
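After packing, the contents of a generated hdf5 file can be inspected as below. This is a generic sketch: the output directory and the dataset names inside each file depend on the packing script and the config, so the path here is only a placeholder.

import h5py

# Print every dataset stored in one packed file (names and shapes depend on the packing script).
with h5py.File("path/to/packed_song.h5", "r") as hf:
    def show(name, obj):
        if isinstance(obj, h5py.Dataset):
            print(name, obj.shape, obj.dtype)
    hf.visititems(show)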

2. Create indexes for training

./scripts/2_create_indexes/musdb18/create_indexes.sh

3. Create evaluation audios

./scripts/3_create_evaluation_audios/musdb18/create_evaluation_audios.sh

4. Train & evaluate & save checkpoints

./scripts/4_train/musdb18/train.sh

5. Separate using a user-trained checkpoint

./scripts/5_separate/musdb18/separate.sh

Results

1. Separation Metrics

The following table shows the signal-to-distortion ratio (SDR) metrics of vocals and accompaniment. The MSS systems are trained with only the 100 songs of the MUSDB18 training set, and the metrics are calculated on the 50 test songs. We highly recommend the subband version because it is faster for both training and inference.

Model                       vocals (dB)   accompaniment (dB)
ResUNet143 vocals           8.9           16.8
ResUNet143 Subband vocals   8.7           16.4
MobileNet Subband vocals    7.2           14.6
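For context, SDR figures like those in the table above are typically computed with museval on the MUSDB18 test set. The sketch below illustrates that setup and is an assumption, not the repository's exact evaluation code (see the scripts under scripts/ for that); the ground-truth targets stand in for model outputs.

import musdb
import museval

mus = musdb.DB(root="./datasets/musdb18", subsets=["test"])
track = mus.tracks[0]

# Replace the ground-truth targets below with the model's separated waveforms.
estimates = {
    "vocals": track.targets["vocals"].audio,
    "accompaniment": track.targets["accompaniment"].audio,
}

scores = museval.eval_mus_track(track, estimates)
print(scores)  # median SDR/SIR/ISR/SAR per estimated source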

2. Number of parameters & speed

The following table shows the number of trainable parameters and the inference time for a 1-minute audio clip.

Model                Trainable params.   1-min clip on GPU (Tesla V100)   1-min clip on CPU (Core i7)
ResUNet143 ISMIR     102 million         2.24 s                           53.00 s
ResUNet143 Subband   102 million         0.56 s                           13.68 s
MobileNet Subband    0.306 million       0.33 s                           9.84 s

3. Metrics over training steps

The evaluation metrics over training steps are shown below.

Fine-tune on new datasets

Users can fine-tune pretrained checkpoints on new datasets. The following script is a template showing how to fine-tune a pretrained MSS system on the VCTK dataset for speech enhancement. (This is just an example; it is fine if users do not have the VCTK dataset.) Users can also resume training from a checkpoint by modifying the following script.

./scripts/4_train/vctk-musdb18/finetune.sh
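Independently of the script above, pretrained weights can be warm-started into a model before fine-tuning. A minimal sketch, assuming the checkpoint stores a dictionary with 'step' and 'model' keys (as reported in the resume-training issue further down); the instantiated model class must match the architecture the checkpoint was trained with.

import torch

# Path to one of the downloaded pretrained checkpoints (placeholder).
checkpoint = torch.load(
    "resunet143_subbtandtime_vocals_8.7dB_500k_steps_v2.pth", map_location="cpu"
)
print(checkpoint["step"])         # training step at which the checkpoint was saved
state_dict = checkpoint["model"]  # plain state dict of the separation model

# model = ...                        # instantiate the matching bytesep model from your config
# model.load_state_dict(state_dict)  # then fine-tune as usual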

Cite

[1] Qiuqiang Kong, Yin Cao, Haohe Liu, Keunwoo Choi, Yuxuan Wang, Decoupling Magnitude and Phase Estimation with Deep ResUNet for Music Source Separation, International Society for Music Information Retrieval (ISMIR), 2021.

@inproceedings{kong2021decoupling,
  title={Decoupling Magnitude and Phase Estimation with Deep ResUNet for Music Source Separation.},
  author={Kong, Qiuqiang and Cao, Yin and Liu, Haohe and Choi, Keunwoo and Wang, Yuxuan },
  booktitle={ISMIR},
  year={2021},
  organization={Citeseer}
}

Contact

Qiuqiang Kong

Frequently Asked Questions (FAQ)

FAQ.md

External Links

Other open-sourced music source separation projects include, but are not limited to:

Subband ResUNet: https://github.com/haoheliu/Subband-Music-Separation

Demucs: https://github.com/facebookresearch/demucs

Spleeter: https://github.com/deezer/spleeter

Asteroid: https://github.com/asteroid-team/asteroid

Open-Unmix: https://github.com/sigsep/open-unmix-pytorch

music_source_separation's People

Contributors

cclauss, qiuqiangkong

music_source_separation's Issues

The checkpoint download command is wrong

Hi, the checkpoint download command should be the following, with a hyphen rather than an underscore:
python3 -m bytesep download-checkpoints

ffprobe error

Hi, I created a new Anaconda3 environment, installed the requirements, and downloaded musdb18 using the command lines in the README. But I could not pack the files into hdf5:

$ PYTHONPATH=. ./scripts/1_pack_audios_to_hdf5s/musdb18/sr=44100,chn=2.sh
MUSDB18_DATASET_DIR=./datasets/musdb18
WORKSPACE=./workspaces/bytesep
Traceback (most recent call last):
  File "bytesep/dataset_creation/pack_audios_to_hdf5s/musdb18.py", line 195, in <module>
    pack_audios_to_hdf5s(args)
  File "bytesep/dataset_creation/pack_audios_to_hdf5s/musdb18.py", line 37, in pack_audios_to_hdf5s
    mus = musdb.DB(root=dataset_dir, subsets=[subset], split=split)
  File "/data1/howard/anaconda3/lib/python3.8/site-packages/musdb/__init__.py", line 115, in __init__
    self.tracks = self.load_mus_tracks(subsets=subsets, split=split)
  File "/data1/howard/anaconda3/lib/python3.8/site-packages/musdb/__init__.py", line 256, in load_mus_tracks
    track = MultiTrack(
  File "/data1/howard/anaconda3/lib/python3.8/site-packages/musdb/audio_classes.py", line 133, in __init__
    super(MultiTrack, self).__init__(path=path, *args, **kwargs)
  File "/data1/howard/anaconda3/lib/python3.8/site-packages/musdb/audio_classes.py", line 52, in __init__
    self.info = stempeg.Info(self.path)
  File "/data1/howard/anaconda3/lib/python3.8/site-packages/stempeg/read.py", line 328, in __init__
    self.info = ffmpeg.probe(filename)
  File "/data1/howard/anaconda3/lib/python3.8/site-packages/ffmpeg/_probe.py", line 23, in probe
    raise Error('ffprobe', out, err)
ffmpeg._run.Error: ffprobe error (see stderr output for detail)

I have installed ffmpeg using conda install -c conda-forge ffmpeg

Why does the wrong code still work?

Hi @qiuqiangkong

I notice you modified the code of resunet_subbandtime.py a couple of days ago.

Now the code is:

separated_subband_audio = torch.stack(
    [
        self.feature_maps_to_wav(
            input_tensor=x[:, j :: self.subbands_num, :, :],
            sp=mag[:, j :: self.subbands_num, :, :],
            sin_in=sin_in[:, j :: self.subbands_num, :, :],
            cos_in=cos_in[:, j :: self.subbands_num, :, :],
            audio_length=audio_length,
        )
        for j in range(self.subbands_num)
    ],
    dim=2,
)

I think this change was made for compatibility with pqmf.synthesis.

The old code is:

separated_subband_audio = torch.cat(
    [
        self.feature_maps_to_wav(
            input_tensor=net_output[:, j * C1 : (j + 1) * C1, :, :],
            sp=mag[:, j * C2 : (j + 1) * C2, :, :],
            sin_in=sin_in[:, j * C2 : (j + 1) * C2, :, :],
            cos_in=cos_in[:, j * C2 : (j + 1) * C2, :, :],
            audio_length=audio_length,
        )
        for j in range(self.subbands_num)
    ],
    dim=1,
)

The old code is not compatible with pqmf.synthesis: pqmf.synthesis assumes subband-first ordering, while the tensor produced by the old code is channel-first.
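To illustrate how the two slicing patterns group the channels differently, here is a small standalone demo (not code from the repository):

import torch

subbands_num, channels = 4, 2
x = torch.arange(subbands_num * channels).view(1, -1)  # dim 1 stands in for the channel axis

# New code: strided slicing, x[:, j::subbands_num], picks channels {j, j + S, j + 2S, ...}.
strided = [x[:, j::subbands_num].tolist() for j in range(subbands_num)]
# Old code: contiguous chunks, x[:, j*C:(j+1)*C], picks channels {j*C, ..., (j+1)*C - 1}.
chunked = [x[:, j * channels:(j + 1) * channels].tolist() for j in range(subbands_num)]

print(strided)  # [[[0, 4]], [[1, 5]], [[2, 6]], [[3, 7]]]
print(chunked)  # [[[0, 1]], [[2, 3]], [[4, 5]], [[6, 7]]]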

It is strange that the old code still works and can still produce a pretty good model. Do you know the reason?

Thanks

Are the model checkpoint links broken?

Hi, when I run python3 -m bytesep download-checkpoints, the output is shown below. Is this a mistake on my side, or have the checkpoint links gone dead?
Thanks for your reply!

--2022-06-25 00:07:44--  https://zenodo.org/record/5804160/files%5Cmobilenet_subbtandtime_vocals_7.2dB_500k_steps_v2.pth?download=1
Connecting to 127.0.0.1:7890... connected.
Proxy request sent, awaiting response... 404 NOT FOUND
2022-06-25 00:07:46 ERROR 404: NOT FOUND.

--2022-06-25 00:07:46--  https://zenodo.org/record/5804160/files%5Cmobilenet_subbtandtime_accompaniment_14.6dB_500k_steps_v2.pth?download=1
Connecting to 127.0.0.1:7890... connected.
Proxy request sent, awaiting response... 404 NOT FOUND
2022-06-25 00:07:47 ERROR 404: NOT FOUND.

--2022-06-25 00:07:47--  https://zenodo.org/record/5804160/files%5Cresunet143_subbtandtime_vocals_8.7dB_500k_steps_v2.pth?download=1
Connecting to 127.0.0.1:7890... connected.
Proxy request sent, awaiting response... 404 NOT FOUND
2022-06-25 00:07:49 ERROR 404: NOT FOUND.

--2022-06-25 00:07:49--  https://zenodo.org/record/5804160/files%5Cresunet143_subbtandtime_accompaniment_16.4dB_500k_steps_v2.pth?download=1
Connecting to 127.0.0.1:7890... connected.
Proxy request sent, awaiting response... 404 NOT FOUND
2022-06-25 00:07:50 ERROR 404: NOT FOUND.

--2022-06-25 00:07:50--  https://zenodo.org/record/5804160/files%5Ctrain_scripts.zip?download=1
Connecting to 127.0.0.1:7890... connected.
Proxy request sent, awaiting response... 404 NOT FOUND
2022-06-25 00:07:52 ERROR 404: NOT FOUND.

Archive:  C:/Users/Administrator/bytesep_data/train_scripts.zip
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in C:/Users/Administrator/bytesep_data/train_scripts.zip,
        and cannot find C:/Users/Administrator/bytesep_data/train_scripts.zip.zip, period.

Running the example fails on Windows 10

Hi team, I installed bytesep according to the README and executed the demo command.

python3 -m bytesep separate \
    --source_type="vocals" \
    --audio_path="./resources/vocals_accompaniment_10s.mp3" \
    --output_path="separated_results/output.mp3"

However, I get no result and no error output. (A screenshot was attached to the original issue.)

My env:
Windows10
Python 3.7.5 (installed by conda)

I tried the command prompt, PowerShell, and Git Bash, and the results were the same.
Is Windows not supported?

I would appreciate any help with this!

integrate with Lightning ecosystem CI

Hello, and so happy to see you use PyTorch Lightning! 🎉
Just wondering if you have already heard about the fairly new PyTorch Lightning (PL) ecosystem CI, to which we would like to invite you. You can check out our blog post about it: Stay Ahead of Breaking Changes with the New Lightning Ecosystem CI
As you use the PL framework for your cool project, we would like to enhance your experience and offer you safe updates with our future releases. At the moment you run tests against a particular PL version, but it may accidentally happen that the next version is incompatible with your project... 😕 We do not intend to change anything on our side, but we still have a solution: the ecosystem CI tests both your latest development head and ours, so we can catch incompatibilities very early and avoid releasing a bad version... 👍

What needs to be done?

What will you get?

  • scheduled nightly testing configured for development/stable versions
  • slack notification if something went wrong to investigate
  • testing also on multi-GPU machine as our gift to you 🐰

cc: @Borda

No module named 'torch'

python3 setup.py install fails with a missing module error. I'm on Python 3.9.7 on kernel 5.14.2-arch1-2.

Help!

Please help me!
Building wheel for h5py (setup.py) ... error

my mp3 doesn't work

I tried an mp3, but it failed:
Using cuda for separating ..
/home/zeppsh/anaconda3/envs/music/lib/python3.7/site-packages/librosa/core/audio.py:165: UserWarning: PySoundFile failed. Trying audioread instead.
  warnings.warn("PySoundFile failed. Trying audioread instead.")
Traceback (most recent call last):
  File "/home/zeppsh/anaconda3/envs/music/lib/python3.7/site-packages/librosa/core/audio.py", line 149, in load
    with sf.SoundFile(path) as sf_desc:
  File "/home/zeppsh/anaconda3/envs/music/lib/python3.7/site-packages/soundfile.py", line 629, in __init__
    self._file = self._open(file, mode_int, closefd)
  File "/home/zeppsh/anaconda3/envs/music/lib/python3.7/site-packages/soundfile.py", line 1184, in _open
    "Error opening {0!r}: ".format(self.name))
  File "/home/zeppsh/anaconda3/envs/music/lib/python3.7/site-packages/soundfile.py", line 1357, in _error_check
    raise RuntimeError(prefix + _ffi.string(err_str).decode('utf-8', 'replace'))
RuntimeError: Error opening '2.mp3': File contains data in an unknown format.

Is there any limitation on the mp3 format?
Would you please share your test mp3, ./resources/vocals_accompaniment_10s.mp3?

CUDA out of memory

Hi! First, thanks for your contribution to music source separation!
Since I'd like to separate vocals & accompaniment using only one model, and the pre-trained models you provided each have only one target, I decided to train from scratch.
I also modified the code so that a validation set is added.

However, I cannot train it on 3 GTX 1080 GPUs even with a small batch size (<12), and I thought that ResUNet_ismir doesn't require that much memory? Or does it really need a lot of GPU memory?
Could you provide some insight, such as how long it takes you to train a model and how much GPU memory it requires?

Thank you a lot!

Cannot extract vocals (accompaniment extraction works fine)

--2021-10-21 23:15:39--  https://zenodo.org/record/5513378/files/resunet143_subbtandtime_vocals_8.8dB_350k_steps?download=1
Resolving zenodo.org (zenodo.org)... 137.138.76.77
Connecting to zenodo.org (zenodo.org)|137.138.76.77|:443... connected.
HTTP request sent, awaiting response... 404 NOT FOUND
2021-10-21 23:15:40 ERROR 404: NOT FOUND.

↑ It looks like the model file returns a 404...

The command is
python separate_scripts/separate.py --audio_path="(my home directory)/Desktop/test.mp3" --source_type="vocals"
The full output is as follows:

/Users/shihongsuo/opt/anaconda3/lib/python3.8/site-packages/librosa/core/audio.py:162: UserWarning: PySoundFile failed. Trying audioread instead.
  warnings.warn("PySoundFile failed. Trying audioread instead.")
Checkpoint path: /Users/shihongsuo/bytesep_data/resunet143_subbtandtime_vocals_8.8dB_350k_steps.pth
--2021-10-21 23:15:39--  https://zenodo.org/record/5513378/files/resunet143_subbtandtime_vocals_8.8dB_350k_steps?download=1
Resolving zenodo.org (zenodo.org)... 137.138.76.77
Connecting to zenodo.org (zenodo.org)|137.138.76.77|:443... connected.
HTTP request sent, awaiting response... 404 NOT FOUND
2021-10-21 23:15:40 ERROR 404: NOT FOUND.

Traceback (most recent call last):
  File "separate_scripts/separate.py", line 67, in <module>
    separate(args)
  File "separate_scripts/separate.py", line 26, in separate
    separator = SeparatorWrapper(
  File "/Users/shihongsuo/opt/anaconda3/lib/python3.8/site-packages/bytesep/inference.py", line 255, in __init__
    checkpoint = torch.load(self.checkpoint_path, map_location='cpu')
  File "/Users/shihongsuo/opt/anaconda3/lib/python3.8/site-packages/torch/serialization.py", line 595, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/Users/shihongsuo/opt/anaconda3/lib/python3.8/site-packages/torch/serialization.py", line 764, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
EOFError: Ran out of input

wrong checkpoint file size in __main__.py

When extracting the accompaniment, an error is reported: the downloaded file size does not match the expected size.

The problem is on line 127:

assert os.path.getsize(checkpoint_path) == 414046363, error_message

and I think it should be:

assert os.path.getsize(checkpoint_path) == 414036369, error_message

KeyError: 'Trying to restore training state but checkpoint contains only the model. This is probably due to `ModelCheckpoint.save_weights_only` being set to `True`.'

Q1:
Training on musdb18 was interrupted. I set pl.Trainer(resume_from_checkpoint='path/to/checkpoint.pth') to reload the parameters saved before the interruption and resume training, and the error above was raised. I checked the code in /opt/conda/lib/python3.6/site-packages/pytorch_lightning/callbacks/model_checkpoint.py; there, save_weights_only: bool = False. Printing the checkpoint shows only the keys step and model, and the keys under model are:
model.stft.conv_real.weight
model.stft.conv_imag.weight
model.istft.ola_window
model.istft.conv_real.weight
model.istft.conv_imag.weight
model.bn0.weight
model.bn0.bias
model.bn0.running_mean
model.bn0.running_var
model.bn0.num_batches_tracked
model.encoder_block1.conv_block.conv1.weight
model.encoder_block1.conv_block.bn1.weight
model.encoder_block1.conv_block.bn1.bias
model.encoder_block1.conv_block.bn1.running_mean
model.encoder_block1.conv_block.bn1.running_var
model.encoder_block1.conv_block.bn1.num_batches_tracked
model.encoder_block1.conv_block.conv2.weight
model.encoder_block1.conv_block.bn2.weight
model.encoder_block1.conv_block.bn2.bias
model.encoder_block1.conv_block.bn2.running_mean
model.encoder_block1.conv_block.bn2.running_var
model.encoder_block1.conv_block.bn2.num_batches_tracked
model.encoder_block2.conv_block.conv1.weight
model.encoder_block2.conv_block.bn1.weight
model.encoder_block2.conv_block.bn1.bias
model.encoder_block2.conv_block.bn1.running_mean
model.encoder_block2.conv_block.bn1.running_var
model.encoder_block2.conv_block.bn1.num_batches_tracked
model.encoder_block2.conv_block.conv2.weight
model.encoder_block2.conv_block.bn2.weight
model.encoder_block2.conv_block.bn2.bias
model.encoder_block2.conv_block.bn2.running_mean
model.encoder_block2.conv_block.bn2.running_var
model.encoder_block2.conv_block.bn2.num_batches_tracked
model.encoder_block3.conv_block.conv1.weight
model.encoder_block3.conv_block.bn1.weight
model.encoder_block3.conv_block.bn1.bias
model.encoder_block3.conv_block.bn1.running_mean
model.encoder_block3.conv_block.bn1.running_var
model.encoder_block3.conv_block.bn1.num_batches_tracked
model.encoder_block3.conv_block.conv2.weight
model.encoder_block3.conv_block.bn2.weight
model.encoder_block3.conv_block.bn2.bias
model.encoder_block3.conv_block.bn2.running_mean
model.encoder_block3.conv_block.bn2.running_var
model.encoder_block3.conv_block.bn2.num_batches_tracked
model.encoder_block4.conv_block.conv1.weight
model.encoder_block4.conv_block.bn1.weight
model.encoder_block4.conv_block.bn1.bias
model.encoder_block4.conv_block.bn1.running_mean
model.encoder_block4.conv_block.bn1.running_var
model.encoder_block4.conv_block.bn1.num_batches_tracked
model.encoder_block4.conv_block.conv2.weight
model.encoder_block4.conv_block.bn2.weight
model.encoder_block4.conv_block.bn2.bias
model.encoder_block4.conv_block.bn2.running_mean
model.encoder_block4.conv_block.bn2.running_var
model.encoder_block4.conv_block.bn2.num_batches_tracked
model.encoder_block5.conv_block.conv1.weight
model.encoder_block5.conv_block.bn1.weight
model.encoder_block5.conv_block.bn1.bias
model.encoder_block5.conv_block.bn1.running_mean
model.encoder_block5.conv_block.bn1.running_var
model.encoder_block5.conv_block.bn1.num_batches_tracked
model.encoder_block5.conv_block.conv2.weight
model.encoder_block5.conv_block.bn2.weight
model.encoder_block5.conv_block.bn2.bias
model.encoder_block5.conv_block.bn2.running_mean
model.encoder_block5.conv_block.bn2.running_var
model.encoder_block5.conv_block.bn2.num_batches_tracked
model.encoder_block6.conv_block.conv1.weight
model.encoder_block6.conv_block.bn1.weight
model.encoder_block6.conv_block.bn1.bias
model.encoder_block6.conv_block.bn1.running_mean
model.encoder_block6.conv_block.bn1.running_var
model.encoder_block6.conv_block.bn1.num_batches_tracked
model.encoder_block6.conv_block.conv2.weight
model.encoder_block6.conv_block.bn2.weight
model.encoder_block6.conv_block.bn2.bias
model.encoder_block6.conv_block.bn2.running_mean
model.encoder_block6.conv_block.bn2.running_var
model.encoder_block6.conv_block.bn2.num_batches_tracked
model.conv_block7.conv1.weight
model.conv_block7.bn1.weight
model.conv_block7.bn1.bias
model.conv_block7.bn1.running_mean
model.conv_block7.bn1.running_var
model.conv_block7.bn1.num_batches_tracked
model.conv_block7.conv2.weight
model.conv_block7.bn2.weight
model.conv_block7.bn2.bias
model.conv_block7.bn2.running_mean
model.conv_block7.bn2.running_var
model.conv_block7.bn2.num_batches_tracked
model.decoder_block1.conv1.weight
model.decoder_block1.bn1.weight
model.decoder_block1.bn1.bias
model.decoder_block1.bn1.running_mean
model.decoder_block1.bn1.running_var
model.decoder_block1.bn1.num_batches_tracked
model.decoder_block1.conv_block2.conv1.weight
model.decoder_block1.conv_block2.bn1.weight
model.decoder_block1.conv_block2.bn1.bias
model.decoder_block1.conv_block2.bn1.running_mean
model.decoder_block1.conv_block2.bn1.running_var
model.decoder_block1.conv_block2.bn1.num_batches_tracked
model.decoder_block1.conv_block2.conv2.weight
model.decoder_block1.conv_block2.bn2.weight
model.decoder_block1.conv_block2.bn2.bias
model.decoder_block1.conv_block2.bn2.running_mean
model.decoder_block1.conv_block2.bn2.running_var
model.decoder_block1.conv_block2.bn2.num_batches_tracked
model.decoder_block2.conv1.weight
model.decoder_block2.bn1.weight
model.decoder_block2.bn1.bias
model.decoder_block2.bn1.running_mean
model.decoder_block2.bn1.running_var
model.decoder_block2.bn1.num_batches_tracked
model.decoder_block2.conv_block2.conv1.weight
model.decoder_block2.conv_block2.bn1.weight
model.decoder_block2.conv_block2.bn1.bias
model.decoder_block2.conv_block2.bn1.running_mean
model.decoder_block2.conv_block2.bn1.running_var
model.decoder_block2.conv_block2.bn1.num_batches_tracked
model.decoder_block2.conv_block2.conv2.weight
model.decoder_block2.conv_block2.bn2.weight
model.decoder_block2.conv_block2.bn2.bias
model.decoder_block2.conv_block2.bn2.running_mean
model.decoder_block2.conv_block2.bn2.running_var
model.decoder_block2.conv_block2.bn2.num_batches_tracked
model.decoder_block3.conv1.weight
model.decoder_block3.bn1.weight
model.decoder_block3.bn1.bias
model.decoder_block3.bn1.running_mean
model.decoder_block3.bn1.running_var
model.decoder_block3.bn1.num_batches_tracked
model.decoder_block3.conv_block2.conv1.weight
model.decoder_block3.conv_block2.bn1.weight
model.decoder_block3.conv_block2.bn1.bias
model.decoder_block3.conv_block2.bn1.running_mean
model.decoder_block3.conv_block2.bn1.running_var
model.decoder_block3.conv_block2.bn1.num_batches_tracked
model.decoder_block3.conv_block2.conv2.weight
model.decoder_block3.conv_block2.bn2.weight
model.decoder_block3.conv_block2.bn2.bias
model.decoder_block3.conv_block2.bn2.running_mean
model.decoder_block3.conv_block2.bn2.running_var
model.decoder_block3.conv_block2.bn2.num_batches_tracked
model.decoder_block4.conv1.weight
model.decoder_block4.bn1.weight
model.decoder_block4.bn1.bias
model.decoder_block4.bn1.running_mean
model.decoder_block4.bn1.running_var
model.decoder_block4.bn1.num_batches_tracked
model.decoder_block4.conv_block2.conv1.weight
model.decoder_block4.conv_block2.bn1.weight
model.decoder_block4.conv_block2.bn1.bias
model.decoder_block4.conv_block2.bn1.running_mean
model.decoder_block4.conv_block2.bn1.running_var
model.decoder_block4.conv_block2.bn1.num_batches_tracked
model.decoder_block4.conv_block2.conv2.weight
model.decoder_block4.conv_block2.bn2.weight
model.decoder_block4.conv_block2.bn2.bias
model.decoder_block4.conv_block2.bn2.running_mean
model.decoder_block4.conv_block2.bn2.running_var
model.decoder_block4.conv_block2.bn2.num_batches_tracked
model.decoder_block5.conv1.weight
model.decoder_block5.bn1.weight
model.decoder_block5.bn1.bias
model.decoder_block5.bn1.running_mean
model.decoder_block5.bn1.running_var
model.decoder_block5.bn1.num_batches_tracked
model.decoder_block5.conv_block2.conv1.weight
model.decoder_block5.conv_block2.bn1.weight
model.decoder_block5.conv_block2.bn1.bias
model.decoder_block5.conv_block2.bn1.running_mean
model.decoder_block5.conv_block2.bn1.running_var
model.decoder_block5.conv_block2.bn1.num_batches_tracked
model.decoder_block5.conv_block2.conv2.weight
model.decoder_block5.conv_block2.bn2.weight
model.decoder_block5.conv_block2.bn2.bias
model.decoder_block5.conv_block2.bn2.running_mean
model.decoder_block5.conv_block2.bn2.running_var
model.decoder_block5.conv_block2.bn2.num_batches_tracked
model.decoder_block6.conv1.weight
model.decoder_block6.bn1.weight
model.decoder_block6.bn1.bias
model.decoder_block6.bn1.running_mean
model.decoder_block6.bn1.running_var
model.decoder_block6.bn1.num_batches_tracked
model.decoder_block6.conv_block2.conv1.weight
model.decoder_block6.conv_block2.bn1.weight
model.decoder_block6.conv_block2.bn1.bias
model.decoder_block6.conv_block2.bn1.running_mean
model.decoder_block6.conv_block2.bn1.running_var
model.decoder_block6.conv_block2.bn1.num_batches_tracked
model.decoder_block6.conv_block2.conv2.weight
model.decoder_block6.conv_block2.bn2.weight
model.decoder_block6.conv_block2.bn2.bias
model.decoder_block6.conv_block2.bn2.running_mean
model.decoder_block6.conv_block2.bn2.running_var
model.decoder_block6.conv_block2.bn2.num_batches_tracked
model.after_conv_block1.conv1.weight
model.after_conv_block1.bn1.weight
model.after_conv_block1.bn1.bias
model.after_conv_block1.bn1.running_mean
model.after_conv_block1.bn1.running_var
model.after_conv_block1.bn1.num_batches_tracked
model.after_conv_block1.conv2.weight
model.after_conv_block1.bn2.weight
model.after_conv_block1.bn2.bias
model.after_conv_block1.bn2.running_mean
model.after_conv_block1.bn2.running_var
model.after_conv_block1.bn2.num_batches_tracked
model.after_conv2.weight
model.after_conv2.bias
What could be the cause of this error?
Q2:
For saving models, you use the custom get_callbacks function rather than the checkpoint_callback built into pl.Trainer, right? The model is saved as checkpoint = {'step': global_step, 'model': self.model.state_dict()}, which only stores the model parameters. If I want to resume training, which settings do I need to configure?

Cannot install "torch"

It shows "PackagesNotFoundError: The following packages are not available from current channels:
torch"

Do you know how to fix it?

Readlink & Optimizer_Type errors when preparing the audio and running training script

Thanks for sharing your great work!

Following the steps in the README, I get these error messages when preparing the audio and running the training script:

(Screenshots of the error messages were attached to the original issue.)

How do I solve these issues?

I am running the code on a machine with macOS Big Sur.

Also, do you have ONNX versions of your models to share? The main reason I was running the training code is to convert your PyTorch models to ONNX; it would help us a lot if you already have them available!

Thanks!

Separation results contain obvious high-frequency noise

Hi, I retrained models with this algorithm. The vocals and accompaniment models work very well, but the four-stem model trained on MUSDB produces obvious high-frequency noise in the separated bass and drums; a piano separation model trained on piano data has the same problem. Taking bass as an example, a spectrogram comparison with the Spleeter result was attached to the original issue.
What could be the cause? Could any of the data augmentation parameters used during training affect this?
Looking forward to your reply, thanks!

Two command and parameter problems found on the main branch

1. The command documented in README.md, python3 -m bytesep download_checkpoints, should end with a hyphen rather than an underscore, i.e. python3 -m bytesep download-checkpoints.
2. Removing the accompaniment passes the parameter check, but when removing vocals (source type "accompaniment"), the size check of the loaded resunet143_subbtandtime_accompaniment_16.4dB_500k_steps_v2.pth compares against 414046363 in the code, while the actual file size is 414036369. Line 127 of __main__.py needs to be changed from 414046363 to 414036369.
With this change applied locally, everything runs fine.

Evaluation scores do not seem to improve during training

Hi, I modified this config so that there are 2 targets (vocals & accompaniment).
(https://github.com/bytedance/music_source_separation/blob/master/scripts/4_train/musdb18/configs/accompaniment-vocals%2Cresunet_subbandtime.yaml)
I also changed the batch size from 16 to 12, and the MUSDB I use is the HQ dataset (.wav).
Those are the only modifications I made.

But given the evaluation scores during training, I'm not sure it will reach 16.x dB for accompaniment and 8.x dB for vocals at step 500001:

  • Step: 0, accompaniment: -0.606, vocals: -2.908
  • Step: 10000, accompaniment: 2.662, vocals: 0.399
  • Step: 20000, accompaniment: 2.680, vocals: 0.451
  • Step: 30000, accompaniment: 2.702, vocals: 0.498
  • Step: 40000, accompaniment: 2.726, vocals: 0.518
  • Step: 50000, accompaniment: 2.719, vocals: 0.539

Thanks in advance!

poor performance using short segments (<1 s)

Hi,

Thanks for your great work! The repo achieves good performance for offline music separation.

However, in an online setting (using short segments of <1 s) it performs poorly. Some short bursts of accompaniment remain in the separated vocals, and the SDR is about -2 dB. Could you provide some suggestions on how to achieve better performance?

Thanks!

Dataset for drums

In the sample YouTube video, there is a section that separates the drums.

I could not find a drum separation script or model in the repo.

Can you also share that part?

Default models used for training and inference are inconsistent

The default models used at inference time are resunet143_subbtandtime_vocals_8.8dB_350k_steps / resunet143_subbtandtime_accompaniment_16.4dB_350k_steps.pth, while the training default is vocals-accompaniment.unet, and optimizer_type is missing. Why is that, and if I want to train my own model, which one should I use? Thanks.

Conversion no longer works at all

On Colab it no longer works at all.
Running !python3 /content/music_source_separation/bytesep/separate.py gives this error:

OSError: /usr/local/lib/python3.7/dist-packages/torchtext/_torchtext.so: undefined symbol: _ZNK3c104Type14isSubtypeOfExtESt10shared_ptrIS0_EPSo

I hope this can be fixed, thanks!

No files are generated

(A screenshot was attached to the original issue.)

As shown in the screenshot above, after running ./scripts/1_pack_audios_to_hdf5s/musdb18/sr=44100,chn=2.sh, no files are generated. What could be the reason?

WE ARE NOT ABLE TO (DOWNLOAD_CHECKPOINTS)

Hi, I'm trying to use bytedance;

But I'm facing a problem when executing this command:

python -m bytesep download_checkpoints

But I'm getting the following error:

C:\Users\lucas\MSS>python -m bytesep download_checkpoints
usage: __main__.py [-h] {download-checkpoints,separate} ...
__main__.py: error: argument mode: invalid choice: 'download_checkpoints' (choose from 'download-checkpoints', 'separate')

I would like to know how I can solve this problem so that I can use it on my songs.

Note: I was able to use it normally until October 2021, but now it no longer works.

Many thanks in advance, regards,
Lucas.

How can I install inplace-abn?

Some errors occurred during installation.

The error is:
error: command 'C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\cl.exe' failed with exit status 2
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
File "E:\Sotfware\anaconda\envs\torch\lib\site-packages\colorama\ansitowin32.py", line 59, in closed
return stream.closed
ValueError: underlying buffer has been detached
----------------------------------------
ERROR: Command errored out with exit status 1: 'E:\Sotfware\anaconda\envs\torch\python.exe' -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\HASEE\AppData\Local\Temp\pip-install-983wdizd\inplace-abn_a
aaed263bb064ef2b4c50af117b614df\setup.py'"'"'; file='"'"'C:\Users\HASEE\AppData\Local\Temp\pip-install-983wdizd\inplace-abn_aaaed263bb064ef2b4c50af117b614df\setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(_file
_) if os.path.exists(file) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record 'C:\Users
HASEE\AppData\Local\Temp\pip-record-c2alay5x\install-record.txt' --single-version-externally-managed --compile --install-headers 'E:\Sotfware\anaconda\envs\torch\Include\inplace-abn' Check the logs for full command output.

So how can I install inplace-abn?
Thanks

ModuleNotFoundError

Hello, I have completed the installation and downloaded the models, but when using the command

"python3 separate_scripts/separate.py" or
"./separate_scripts/separate_accompaniment.sh "resources/vocals_accompaniment_10s.mp3" "sep_accompaniment.mp3"

I get the following error:

Traceback (most recent call last):
  File "bytesep/inference.py", line 13, in <module>
    from bytesep.models.lightning_modules import get_model_class
ModuleNotFoundError: No module named 'bytesep'

How can I fix this? Thanks in advance.

Checkpoint is incomplete, please download again!

Hi, after I successfully separated the vocals, I got an error when separating the accompaniment saying the checkpoint file is incomplete. I checked and both the script and the model file are there; re-downloading gives the same error.

Traceback (most recent call last):
  File "E:\Anaconda3\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "E:\Anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "E:\Anaconda3\lib\site-packages\bytesep\__main__.py", line 242, in <module>
    separate(args)
  File "E:\Anaconda3\lib\site-packages\bytesep\__main__.py", line 162, in separate
    config_yaml, checkpoint_path = get_paths(source_type, model_type)
  File "E:\Anaconda3\lib\site-packages\bytesep\__main__.py", line 127, in get_paths
    assert os.path.getsize(checkpoint_path) == 414046363, error_message
AssertionError: Checkpoint is incomplete, please download again!

Training with new data

Hi, during training you set up dedicated preprocessing for musdb18. If I use a different dataset (not one of the datasets you listed), how should I preprocess the data? How should the information be written into the hdf5 files?

An error occurred

resunet143_subbtandtime_accompaniment_16.4dB_500k_steps_v2

assert os.path.getsize(checkpoint_path) == 414046363, error_message
AssertionError: Checkpoint is incomplete, please download again!

All-Zero Separations

Hi,

first of all, thank you for open-sourcing the code!

I've cloned the repository and downloaded the pretrained models with the ./separate_scripts/download_checkpoints.sh script.

By looking at the weights of the model (resunet143_ismir2021_vocals_8.9dB_350k_steps.pth), I noticed that the running_var of the batchnorm layers in the first decoding blocks has very high mean values, even reaching 2*10^4: is this expected?

The reason I'm asking is the following.
I'm running inference using this script:

#!/bin/bash

WORKSPACE=<my_workspace_path>

echo "WORKSPACE=${WORKSPACE}"
export PYTHONPATH=./

# Users can modify the following config file.
TRAIN_CONFIG_YAML="scripts/4_train/musdb18/configs/vocals-accompaniment,resunet_ismir2021.yaml"

CHECKPOINT_PATH="<my_path_to_checkpoints>/resunet143_ismir2021_vocals_8.9dB_350k_steps.pth"

IN_FILE="<my_path_to_demo_songs>/PanicStation.wav"
OUT_FILE="<my_path_to_results>/PS_separations.wav"

# Inference
python3 bytesep/inference.py \
    --config_yaml=$TRAIN_CONFIG_YAML \
    --checkpoint_path=$CHECKPOINT_PATH \
    --audio_path=$IN_FILE \
    --output_path=$OUT_FILE

The output contains all zeros, in all channels.

By looking at the intermediate feature maps, I noticed that their values increase while going deeper into the architecture. This happens gradually, until the variable x_center at this line contains values such as 10^36.
At that point, I get infinity, followed by NaNs, which eventually result in all-zeros output.

I'm not sure this is related to the high values in the running_var, but the deeper architecture is the only one that causes this trouble (the simple UNet does not).

Does this happen to some of you as well?
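For anyone reproducing this, the batch-norm running statistics can be checked directly from the checkpoint. A minimal standalone sketch; the 'model' key layout is an assumption based on the resume-training issue above, so adjust the key access if the checkpoint is stored differently.

import torch

ckpt = torch.load("resunet143_ismir2021_vocals_8.9dB_350k_steps.pth", map_location="cpu")
state = ckpt["model"] if isinstance(ckpt, dict) and "model" in ckpt else ckpt

# Report the mean and max of every running_var buffer in the state dict.
for name, tensor in state.items():
    if name.endswith("running_var"):
        print(f"{name}: mean={tensor.mean().item():.1f} max={tensor.max().item():.1f}")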

About bass and drums separation

Congratulations, your model is spectacular and far surpasses anything I have tested so far. Incredible!!! I don't have a GPU to train the model, so I hope you can provide the checkpoints for extracting bass and drums. Either way, thank you very much for your work and good luck with your future research.

Will the instruments_dataset be open-sourced?

While trying to train a piano separation model on MAESTRO, I found that mixture data is needed, but the script for downloading the mixture data, scripts/0_download_datasets/instruments.sh, contains no download link.
