
kuielab / mdx-net


KUIELAB-MDX-Net took 2nd place on Leaderboard A and 3rd place on Leaderboard B in the Music Demixing (MDX) Challenge at ISMIR 2021.

Home Page: https://www.aicrowd.com/challenges/music-demixing-challenge-ismir-2021/

License: MIT License

Languages: Shell 0.74%, Python 99.26%
Topics: mdx-challenge, music-source-separation, source-separation, pytorch, pytorch-lightning, hydra, wandb

mdx-net's People

Contributors

intelligence-engineering-lab-ku, rlaalstjr47, ws-choi


mdx-net's Issues

Error on building pesq "pesq/cypesq.c:6:10: fatal error: Python.h: No such file or directory"

Error encountered while running "pip install -r requirements.txt"

Solution:
We need to install libpythonX.X-dev to build pesq.
The version must match your SYSTEM Python version, not the conda Python version, because pesq is built by the gcc on your system.

Example:
(mdx-net) cnh2769@SPV02:~/_Project/mdx-net$ which python3
/home/cnh2769/anaconda3/envs/mdx-net/bin/python3
(mdx-net) cnh2769@SPV02:~/_Project/mdx-net$ python3 --version
Python 3.8.11
(mdx-net) cnh2769@SPV02:~/Project/mdx-net$ /usr/bin/python3 --version
Python 3.7.11
(mdx-net) cnh2769@SPV02:~/Project/mdx-net$ sudo apt install libpython3.7-dev
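A small helper can turn the output of `/usr/bin/python3 --version` into the package name used above, and check whether `Python.h` exists for the interpreter that runs it. This is a sketch assuming the Debian/Ubuntu `libpythonX.Y-dev` naming scheme; `dev_package_for` and `has_python_headers` are hypothetical helpers, not part of mdx-net.

```python
import os
import re
import sysconfig


def dev_package_for(version_output: str) -> str:
    """Map `python3 --version` output such as 'Python 3.7.11' to 'libpython3.7-dev'."""
    match = re.search(r"Python (\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError(f"unrecognised version string: {version_output!r}")
    major, minor = match.groups()
    return f"libpython{major}.{minor}-dev"


def has_python_headers() -> bool:
    """Return True if Python.h exists in the include dir of the interpreter running this code."""
    include_dir = sysconfig.get_paths()["include"]
    return os.path.isfile(os.path.join(include_dir, "Python.h"))


if __name__ == "__main__":
    print(dev_package_for("Python 3.7.11"))  # libpython3.7-dev
    print("Python.h found:", has_python_headers())
```

Running the script with `/usr/bin/python3` rather than the conda `python3` checks the system interpreter, which is the one the solution above says matters.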

YAML files for servers 12 and 13

YAML

  • .yaml files for servers 12 and 13
  • will be located in
    • configs/experiment/12_${target_name}.yaml
    • configs/experiment/13_${target_name}.yaml

parameter checklist

  • data augmentation: not available for now (available from 0706)
  • seed you want to use
  • datamodule:
    • batch_size
    • target_name
  • conv_tdf
    • lr
    • optimizer
    • num_blocks
    • l
    • g
    • k
    • dim_f
    • dim_t
    • n_fft
    • hop_length
    • bn
    • bias
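Before launching a run, a config can be checked against this checklist. The sketch below assumes a hypothetical nested layout that mirrors the list (a top-level `seed`, a `datamodule` group and a `conv_tdf` group); the real Hydra groups under `configs/experiment/` may be named differently.

```python
from typing import Any, Dict, List

# Required keys per group, mirroring the parameter checklist (hypothetical layout).
CHECKLIST: Dict[str, List[str]] = {
    "datamodule": ["batch_size", "target_name"],
    "conv_tdf": ["lr", "optimizer", "num_blocks", "l", "g", "k",
                 "dim_f", "dim_t", "n_fft", "hop_length", "bn", "bias"],
}


def missing_parameters(config: Dict[str, Any]) -> List[str]:
    """Return dotted names of checklist parameters that are absent from `config`."""
    missing: List[str] = [] if "seed" in config else ["seed"]
    for group, keys in CHECKLIST.items():
        section = config.get(group) or {}
        missing.extend(f"{group}.{key}" for key in keys if key not in section)
    return missing


if __name__ == "__main__":
    partial = {"seed": 0, "datamodule": {"batch_size": 8, "target_name": "vocals"}}
    print(missing_parameters(partial))  # lists every conv_tdf.* entry
```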

.ENV File

  • your wandb API key (send it to me via KakaoTalk)

Final model configs

dim_t: 256
model:
  num_blocks: 11
  l: 3 => 4
  g: 32 => 36
  k: 3
  bn: 8
  bias: False

augmentation: pitch=[-2,+2], tempo=[-20,+20]
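The augmentation ranges above can be turned into SoundStretch invocations (SoundStretch is the tool installed with `apt-get install soundstretch` in a later issue on this page). This is a sketch assuming pitch is given in semitones and tempo in percent, which are the units of SoundStretch's `-pitch=` and `-tempo=` options; `soundstretch_command` is a hypothetical helper, and mdx-net's own `data_augmentation.py` may pick its shifts differently.

```python
from pathlib import Path
from typing import List

PITCH_RANGE = (-2, 2)    # semitones, from "pitch=[-2,+2]" (unit assumed)
TEMPO_RANGE = (-20, 20)  # percent, from "tempo=[-20,+20]" (unit assumed)


def soundstretch_command(src: Path, dst: Path, pitch: int = 0, tempo: int = 0) -> List[str]:
    """Build an argv list that shifts `src` by `pitch` semitones and `tempo` percent."""
    if not PITCH_RANGE[0] <= pitch <= PITCH_RANGE[1]:
        raise ValueError(f"pitch {pitch} outside {PITCH_RANGE}")
    if not TEMPO_RANGE[0] <= tempo <= TEMPO_RANGE[1]:
        raise ValueError(f"tempo {tempo} outside {TEMPO_RANGE}")
    cmd = ["soundstretch", str(src), str(dst)]
    if pitch:
        cmd.append(f"-pitch={pitch}")
    if tempo:
        cmd.append(f"-tempo={tempo}")
    return cmd


if __name__ == "__main__":
    print(soundstretch_command(Path("vocals.wav"), Path("vocals_p2.wav"), pitch=2))
```

The list can be passed to `subprocess.run(cmd, check=True)`. Every stem of a track would need the same shift so that the stems still sum to the shifted mixture.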

Data Augmentation

  • implement test code for the attached data.zip

Originally posted by @rlaalstjr47 in #3 (comment)

TODO

  • generation script with Hydra
  • automatic metadata caching for data augmentation
  • parameterized data augmentation with Hydra
  • debug to check that it is well designed

How to train a model to separate an instrument

Hi all,

This might be a silly question.
When I used a tool named Ultimate Vocal Remover, I noticed how well the MDX-Net models perform. So I plan to use MDX-Net to train a model that separates specific instruments from an ensemble, but I have no idea how to start.

Thanks a lot!

How do you save the mixer model?

Hi !

I am trying to train the mixer model, but it only saves a .ckpt file of around 58 MB. When I run predict_blend with my mixer checkpoint, I get this error:

RuntimeError: Error(s) in loading state_dict for Mixer: Missing key(s) in state_dict: "linear.weight". Unexpected key(s) in state_dict: "epoch", "global_step", "pytorch-lightning_version", "state_dict", "hparams_name", "hyper_parameters".

In your repo the mixer model is very small, and it says to "save .pt, the only learnable parameters in Mixer". Could you tell me how to do this, please?

Thanks !
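The error shows that the whole Lightning checkpoint (with `epoch`, `global_step`, `state_dict`, ...) was passed to `load_state_dict`. One way to produce a small `.pt` is to take `checkpoint["state_dict"]`, keep only the Mixer's entries and strip their prefix. This is a sketch: the prefix under which the Mixer's `linear.weight` is stored inside the LightningModule is an assumption, so print `checkpoint["state_dict"].keys()` first and pass whatever precedes `linear.weight` as `prefix`. `extract_submodule_state` and `export_mixer` are hypothetical helpers.

```python
from typing import Dict, Mapping, TypeVar

V = TypeVar("V")


def extract_submodule_state(state_dict: Mapping[str, V], prefix: str = "") -> Dict[str, V]:
    """Keep entries whose key starts with `prefix`, returned with the prefix removed."""
    return {key[len(prefix):]: value for key, value in state_dict.items() if key.startswith(prefix)}


def export_mixer(ckpt_path: str, out_path: str, prefix: str) -> None:
    """Save only the Mixer weights from a PyTorch Lightning .ckpt file as a plain .pt file."""
    import torch  # imported here so the helper above works without PyTorch installed

    # Recent PyTorch releases default torch.load to weights_only=True, which can reject
    # Lightning checkpoints that store hyper-parameters; pass weights_only=False there
    # only if you trust the file.
    checkpoint = torch.load(ckpt_path, map_location="cpu")
    mixer_state = extract_submodule_state(checkpoint["state_dict"], prefix)
    torch.save(mixer_state, out_path)
```

The resulting file contains only plain tensors keyed like `linear.weight`, so it can be loaded with `mixer.load_state_dict(torch.load(out_path))`.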

Encountered an error while executing the training process

Hi, I followed all the steps you mentioned and ran python run.py experiment=multigpu_vocals model=ConvTDFNet_vocals to start training, but encountered the following problem. 💔

RuntimeError: Early stopping conditioned on metric `val/sdr` which is not available. Pass in or modify your `EarlyStopping` callback to use any of the following: `train/loss`, `train/loss_step`, `train/loss_epoch`

I'd appreciate your reply. ✨✨
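Lightning raises this when the metric an `EarlyStopping` callback monitors was never logged; here only training losses exist, which may mean validation did not run or did not log `val/sdr`. If you decide to fall back to a training metric, note that the direction flips: SDR is maximised while loss is minimised. A minimal sketch of that choice, with `pick_monitor` as a hypothetical helper whose result would go into the callback's `monitor` and `mode` settings:

```python
from typing import Iterable, Tuple

# Preferred metrics in order, with the mode EarlyStopping would need for each:
# higher SDR is better, lower loss is better.
PREFERENCES: Tuple[Tuple[str, str], ...] = (
    ("val/sdr", "max"),
    ("train/loss", "min"),
)


def pick_monitor(available: Iterable[str]) -> Tuple[str, str]:
    """Return (metric, mode) for the first preferred metric that was actually logged."""
    logged = set(available)
    for metric, mode in PREFERENCES:
        if metric in logged:
            return metric, mode
    raise KeyError(f"none of {[m for m, _ in PREFERENCES]} found in {sorted(logged)}")


if __name__ == "__main__":
    print(pick_monitor(["train/loss", "train/loss_step", "train/loss_epoch"]))
```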

[Torch 1.13.1 -> 2.0.0] RuntimeError: istft requires a complex-valued input tensor matching the output from stft with return_complex=True

I'm using this Colab. It has been roughly modified to work with the Python 3.10 environment that all Colabs switched to at the end of April.
It uses onnxruntime 1.14 to fix the "ModuleNotFoundError: No module named 'onnxruntime'" error. It also changes line 57 of this main.py to

self.onnx_models[c] = ort.InferenceSession(os.path.join(args.onnx,model.target_name+'.onnx'), providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])

to fix the following error:
TypeError: view_as_complex(): argument 'input' (position 1) must be Tensor, not builtin_function_or_method

Besides torch, it also uses soundfile 0.12.1 (to fix random soundfile errors) and librosa 0.9.1 (to get demucs working again), to address other issues caused by Colab changes at the beginning of the year.

The only remaining problem is Torch 2.0.0, which we will be forced to use once Colab moves its runtime to Python 3.11, probably next year. For now Torch 1.13.1 works, but I thought it wouldn't hurt to ask about a solution here.

The error with torch 2.0.0:

Loading checkpoint... done
2023-05-01 18:23:22.498005355 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:515 CreateExecutionProviderInstance] Failed to create TensorrtExecutionProvider. Please reference https://onnxruntime.ai/docs/execution-providers/TensorRT-ExecutionProvider.html#requirements to ensure all dependencies are met.
Loading onnx model... done
Processing base:   0% 0/1 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/content/drive/MyDrive/MDX_Colab/main.py", line 371, in <module>
    main()
  File "/content/drive/MyDrive/MDX_Colab/main.py", line 353, in main
    pred.prediction(
  File "/content/drive/MyDrive/MDX_Colab/main.py", line 73, in prediction
    sources = self.demix(mix.T)
  File "/content/drive/MyDrive/MDX_Colab/main.py", line 149, in demix
    base_out = self.demix_base(segmented_mix, margin_size=margin)
  File "/content/drive/MyDrive/MDX_Colab/main.py", line 188, in demix_base
    tar_waves = model.istft(torch.tensor(_ort.run(None, {'input': spek.cpu().numpy()})[0]))#.cpu()
  File "/content/drive/MyDrive/MDX_Colab/models.py", line 158, in istft
    x = torch.istft(x, n_fft=self.n_fft, hop_length=self.hop, window=self.window, center=True)
RuntimeError: istft requires a complex-valued input tensor matching the output from stft with return_complex=True.
Processing base:   0% 0/1 [00:07<?, ?it/s]

If it's actually a dependency issue, we could use TensorRT 8.6.0 (if it isn't used already), but PyPI doesn't have the required CUDA 11.6 version (11.7 is currently used).
__
TensorRT 8.6.0 didn't help, and CUDA 11.6 is not recommended for Torch 2.0.0:
https://pytorch.org/blog/deprecation-cuda-python-support/
Still stuck.

Edit: the issue is fixed. I'll write up later what has changed. The changes are based on the Vocal Remover 5 Colab by Audio Hacker, which is Torch 2.0 compatible.
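The traceback shows `torch.istft` receiving a real tensor, and the error says it needs a complex one matching `stft(..., return_complex=True)`. A common fix is to convert a spectrogram whose last dimension holds (real, imaginary) pairs with `torch.view_as_complex` first. The sketch below is a standalone version of that step; the Colab's `models.py` is not shown here, so the shape of `x` at that point, `(..., freq, frames, 2)`, is an assumption, and `istft_from_real_pairs` is a hypothetical helper.

```python
import torch


def istft_from_real_pairs(x: torch.Tensor, n_fft: int, hop_length: int,
                          window: torch.Tensor) -> torch.Tensor:
    """Inverse STFT for a spectrogram stored as (..., freq, frames, 2) real/imag pairs."""
    # view_as_complex needs a contiguous float tensor whose last dimension has size 2.
    spec = torch.view_as_complex(x.contiguous())
    return torch.istft(spec, n_fft=n_fft, hop_length=hop_length, window=window, center=True)


if __name__ == "__main__":
    n_fft, hop = 8, 2
    window = torch.hann_window(n_fft)
    signal = torch.randn(64)
    spec = torch.stft(signal, n_fft=n_fft, hop_length=hop, window=window,
                      center=True, return_complex=True)
    restored = istft_from_real_pairs(torch.view_as_real(spec), n_fft, hop, window)
    print(torch.allclose(restored, signal, atol=1e-4))  # True
```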

Separation without training

Hello, is it possible to run separation using KUIELab-MDX-Net without training the model from scratch? Do you share some pretrained models? It would ease the evaluation of your solution and its usage in a real-case scenario.

How to train a model that can fully extract the 44100 Hz frequency range

I want to train a 2-stem model.

I noticed that in each model's YAML configuration, some parameters affect the final frequency cutoff. It seems that multigpu_drums.yaml can handle the full 44100 Hz range, but it reduces num_blocks (11 => 9), so the model size also decreases accordingly (29 MB => 21 MB).

Although a config like multigpu_drums.yaml handles the full 44100 Hz range, the model shrinks. Does this affect the final accuracy?

It seems that dim_t, hop_length, overlap and num_blocks have a remarkable complementarity that I don't understand. Maybe this "complementarity" was designed for the competition (blending with Demucs), but I want to use this in the real world without Demucs (MDX-Net only; after some testing, I think MDX-Net has more potential than Demucs).

When I try to change num_blocks from 9 to 11, the inference results have overlapping and broken voices... Do you have any parameter recommendations for training a full 44100 Hz model without losing accuracy (i.e. without shrinking the model)?
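One way to reason about the cutoff: an STFT with `n_fft` points at 44,100 Hz gives `n_fft // 2 + 1` bins spaced `44100 / n_fft` Hz apart, up to the Nyquist frequency of 22,050 Hz. If the model keeps only the lowest `dim_f` bins (an assumption about how ConvTDFNet crops the spectrogram), its cutoff is roughly `dim_f * 44100 / n_fft`, so a full-band model needs `dim_f` close to `n_fft // 2`. The `(dim_f, n_fft)` pairs below are illustrative, not taken from the repo's configs.

```python
SAMPLE_RATE = 44100  # Hz, the MUSDB18-HQ sample rate


def num_bins(n_fft: int) -> int:
    """Number of one-sided STFT frequency bins."""
    return n_fft // 2 + 1


def cutoff_hz(dim_f: int, n_fft: int, sr: int = SAMPLE_RATE) -> float:
    """Frequency of the first bin discarded when only the lowest `dim_f` bins are kept."""
    return dim_f * sr / n_fft


if __name__ == "__main__":
    # Illustrative (dim_f, n_fft) pairs only.
    for dim_f, n_fft in [(2048, 4096), (2048, 6144)]:
        print(dim_f, n_fft, num_bins(n_fft), cutoff_hz(dim_f, n_fft))
    # 2048 4096 2049 22050.0  -> reaches Nyquist
    # 2048 6144 3073 14700.0  -> cuts off at 14.7 kHz
```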

initialization

Current: the default is Xavier initialization.

Minseok's recommendation: remove Xavier initialization.
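The two options can be expressed as a single switch. A sketch assuming the network is built from `nn.Conv2d` and `nn.Linear` layers; `init_weights` and the `use_xavier` flag are hypothetical names, and "removing Xavier" here simply means keeping PyTorch's default layer initialization.

```python
import torch
from torch import nn


def init_weights(model: nn.Module, use_xavier: bool) -> None:
    """Optionally re-initialise conv/linear weights with Xavier-uniform; otherwise keep defaults."""
    if not use_xavier:
        return  # layers keep the initialisation they were constructed with
    for module in model.modules():
        if isinstance(module, (nn.Conv2d, nn.Linear)):
            nn.init.xavier_uniform_(module.weight)


if __name__ == "__main__":
    torch.manual_seed(0)
    net = nn.Sequential(nn.Conv2d(2, 8, 3), nn.ReLU(), nn.Linear(8, 8))
    init_weights(net, use_xavier=False)  # the recommended setting: PyTorch defaults
```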

Encountered errors while executing training process #2

(Using Leaderboard_B)
At first, conda got stuck solving the environment; I let it run for 30 minutes, but it never finished creating the env from the .yml file.
Because I was using a cloud instance, I didn't have time to wait and I did this instead:

conda create -n mdx-net
conda update conda
conda config --add channels conda-forge
conda activate mdx-net
sudo apt-get install soundstretch
python -m pip install -r requirements.txt
python src/utils/data_augmentation.py --data_dir /real/path/to/musdbhq/ --train True --test True

It seems that the model doesn't allow me to train it with songs that don't contain vocals.

python src/utils/data_augmentation.py --data_dir /home/ubuntu/mdx-files/musdb/ --train True --test True
 10%|███████████████▉                                                                                                                                                     | 11/114 [01:13<11:25,  6.65s/it]
Traceback (most recent call last):
  File "src/utils/data_augmentation.py", line 111, in <module>
    main(parser.parse_args())
  File "src/utils/data_augmentation.py", line 30, in main
    save_shifted_dataset(p, t, data_dir, 'train')
  File "src/utils/data_augmentation.py", line 92, in save_shifted_dataset
    source = load_wav(in_path.joinpath(s_name+'.wav'))
  File "src/utils/data_augmentation.py", line 102, in load_wav
    return sf.read(path, samplerate=sr, dtype='float32')[0].T
  File "/home/ubuntu/.local/lib/python3.8/site-packages/soundfile.py", line 256, in read
    with SoundFile(file, 'r', samplerate, channels,
  File "/home/ubuntu/.local/lib/python3.8/site-packages/soundfile.py", line 629, in __init__
    self._file = self._open(file, mode_int, closefd)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/soundfile.py", line 1183, in _open
    _error_check(_snd.sf_error(file_ptr),
  File "/home/ubuntu/.local/lib/python3.8/site-packages/soundfile.py", line 1357, in _error_check
    raise RuntimeError(prefix + _ffi.string(err_str).decode('utf-8', 'replace'))
RuntimeError: Error opening '/home/ubuntu/mdx-files/musdb/train/Artificial Intelligence - Native Instruments/vocals.wav': System error.

I deleted the songs that didn't contain vocals; after that the data augmentation succeeded, but all attempts to train failed, and I didn't have time to debug on the cloud GPU instance.
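A quick pre-check can list tracks with missing stem files before running `data_augmentation.py`, instead of failing partway through. This sketch assumes the layout seen in the traceback (`<data_dir>/train/<track>/<stem>.wav`) with the MUSDB18-HQ stems `mixture`, `vocals`, `drums`, `bass` and `other`; `find_missing_stems` is a hypothetical helper, and a file that exists but is unreadable would not be caught.

```python
import sys
from pathlib import Path
from typing import Dict, List

STEMS = ("mixture", "vocals", "drums", "bass", "other")


def find_missing_stems(split_dir: Path) -> Dict[str, List[str]]:
    """Map each track folder under `split_dir` to the stem .wav files it lacks."""
    missing: Dict[str, List[str]] = {}
    for track in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        absent = [s for s in STEMS if not (track / f"{s}.wav").is_file()]
        if absent:
            missing[track.name] = absent
    return missing


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_stems.py /real/path/to/musdbhq/train
    for name, stems in find_missing_stems(Path(sys.argv[1])).items():
        print(f"{name}: missing {', '.join(stems)}")
```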

Here is the output from: python run.py experiment=multigpu_other model=ConvTDFNet_other

/usr/lib/python3/dist-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: /usr/lib/python3/dist-packages/torchvision/image.so: undefined symbol: _ZNK3c106IValue23reportToTensorTypeErrorEv
  warn(f"Failed to load image Python extension: {e}")
Traceback (most recent call last):
  File "run.py", line 7, in <module>
    from pytorch_lightning.utilities import rank_zero_info
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/__init__.py", line 20, in <module>
    from pytorch_lightning import metrics  # noqa: E402
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/metrics/__init__.py", line 15, in <module>
    from pytorch_lightning.metrics.classification import (  # noqa: F401
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/metrics/classification/__init__.py", line 14, in <module>
    from pytorch_lightning.metrics.classification.accuracy import Accuracy  # noqa: F401
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/metrics/classification/accuracy.py", line 16, in <module>
    from torchmetrics import Accuracy as _Accuracy
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torchmetrics/__init__.py", line 14, in <module>
    from torchmetrics import functional  # noqa: E402
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torchmetrics/functional/__init__.py", line 14, in <module>
    from torchmetrics.functional.audio.pit import permutation_invariant_training, pit, pit_permutate
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torchmetrics/functional/audio/__init__.py", line 26, in <module>
    from torchmetrics.functional.audio.pesq import perceptual_evaluation_speech_quality  # noqa: F401
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torchmetrics/functional/audio/pesq.py", line 20, in <module>
    import pesq as pesq_backend
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pesq/__init__.py", line 5, in <module>
    from ._pesq import pesq, pesq_batch
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pesq/_pesq.py", line 8, in <module>
    from .cypesq import cypesq, cypesq_retvals, cypesq_error_message as pesq_error_message
  File "__init__.pxd", line 238, in init cypesq
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 80 from PyObject
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-PCI...  On   | 00000000:07:00.0 Off |                    0 |
| N/A   35C    P0    36W / 250W |      0MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-PCI...  On   | 00000000:08:00.0 Off |                    0 |
| N/A   34C    P0    33W / 250W |      0MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
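The `numpy.ndarray size changed` error means a compiled extension (here `pesq`) was built against a newer NumPy than the one imported at runtime ("Expected 96 from C header, got 80 from PyObject"); the traceback also mixes packages from `~/.local` and `/usr/lib/python3/dist-packages`. A quick way to see which versions the active interpreter actually resolves is `importlib.metadata` from the standard library (Python 3.8+). The package list is just the set appearing in this traceback; `installed_versions` is a hypothetical helper.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional

PACKAGES = ("numpy", "pesq", "torch", "torchvision", "torchmetrics", "pytorch-lightning")


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Return the installed version of each distribution, or None if it is absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name:20} {version or 'not installed'}")
```

Running it with the same `python` that runs `run.py` shows which copies that interpreter picks up.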

A Baseline for SDX 2023

We are implementing a baseline model for SDX 2023.

Currently, a script to train vocals has been pushed to sdx_label.

  • target

    • vocals: python run.py experiment=multigpu_vocals datamodule=moisesdb23_labelnoise_v1
    • drums
    • bass
    • other
  • datamodule

    • tested on moisesdb23_labelnoise_v1
    • tested on moisesdb23 bleeding
    • on-the-fly data augmentation support
  • bug

    • multi-GPU training with DDP is not working

Please feel free to make a PR for any items on this todo list.


wandb:

TFC-TDF-U-Net's performance on Musdb18

Our main approach is based on TFC-TDF-U-Net [3].

This model was originally proposed for singing voice separation, but it turns out that it also performs well for other musical instruments (drums, bass, other).

[3] Choi, Woosung, et al. "Investigating U-Nets with various intermediate blocks for spectrogram-based singing voice separation." 21st International Society for Music Information Retrieval Conference (ISMIR), 2020.
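For readers unfamiliar with the "TDF" part: in TFC-TDF-U-Net [3], a TDF (time-distributed fully connected) block applies the same fully connected layers along the frequency axis of every frame and channel. The sketch below is a simplified PyTorch illustration under that reading; `dim_f` and `bn` borrow the config names from the checklist earlier on this page, `bn` is interpreted here as a bottleneck reduction factor (an assumption), and the normalization and activation choices are assumptions too. It is not the repo's implementation.

```python
import torch
from torch import nn


class TDF(nn.Module):
    """Shared fully connected layers over the frequency axis of a (B, C, T, F) tensor."""

    def __init__(self, channels: int, dim_f: int, bn: int = 8, bias: bool = False) -> None:
        super().__init__()
        hidden = max(dim_f // bn, 1)  # bottleneck width (bn assumed to be a reduction factor)
        self.net = nn.Sequential(
            nn.Linear(dim_f, hidden, bias=bias),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.Linear(hidden, dim_f, bias=bias),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # nn.Linear acts on the last dimension (frequency) for every (batch, channel, frame).
        return self.net(x)
```

In the paper the TDF output is combined with the output of the preceding TFC (time-frequency convolution) layers; that wiring is omitted here to keep the block itself visible.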
