
ai-research-code's Introduction

Sony AI Research Code

This repository contains code related to research papers in the areas of Machine Learning and Artificial Intelligence that have been published by Sony. We believe in transparent and reproducible research and therefore want to offer quick and easy access to our findings. We hope that others will benefit from them as much as we did.

Available Code

Mixed Precision DNNs: All You Need Is a Good Parametrization. Uhlich, Stefan; Mauch, Lukas; Cardinaux, Fabien; Yoshiyama, Kazuki; Garcia, Javier Alonso; Tiedemann, Stephen; Kemp, Thomas; Nakamura, Akira. Published at the 8th International Conference on Learning Representations (ICLR) 2020. arXiv technical report (arXiv:1905.11452)

Efficient deep neural network (DNN) inference on mobile or embedded devices typically involves quantization of the network parameters and activations. In particular, mixed precision networks achieve better performance than networks with a homogeneous bitwidth for the same size constraint. Since choosing the optimal bitwidths is not straightforward, training methods that can learn them are desirable. Differentiable quantization with straight-through gradients allows the quantizer's parameters to be learned using gradient methods. We show that a suitable parametrization of the quantizer is key to achieving stable training and good final performance. Specifically, we propose to parametrize the quantizer with the step size and dynamic range; the bitwidth can then be inferred from them. Other parametrizations, which explicitly use the bitwidth, consistently perform worse. We confirm our findings with experiments on CIFAR-10 and ImageNet, obtaining mixed precision DNNs with learned quantization parameters that achieve state-of-the-art performance.
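As a toy illustration of the parametrization described above (step size and dynamic range as the free parameters, bitwidth inferred from them), here is a minimal NumPy sketch; the function name and the symmetric-grid details are our own assumptions, not code from the paper:

```python
import numpy as np

def uniform_quantize(w, step_size, dynamic_range):
    """Toy uniform quantizer parametrized by (step size, dynamic range).

    The bitwidth is not a free parameter: it is implied by the ratio of
    dynamic range to step size. In the paper both parameters are learned
    with straight-through gradients; this sketch covers inference only.
    """
    # Symmetric grid: floor(q_max / d) levels on each side of zero.
    n_side = np.floor(dynamic_range / step_size)
    implied_bitwidth = np.ceil(np.log2(2 * n_side + 1))
    # Clip to the dynamic range, then snap to the nearest step.
    w_q = np.round(np.clip(w, -dynamic_range, dynamic_range) / step_size) * step_size
    return w_q, implied_bitwidth

w = np.random.randn(5).astype(np.float32)
w_q, b = uniform_quantize(w, step_size=0.1, dynamic_range=0.8)
print(w_q, "implied bitwidth:", b)
```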

This is the NNabla implementation of CrossNet-Open-Unmix (X-UMX), an improved version of Open-Unmix (UMX) for music source separation. X-UMX achieves improved performance without additional learnable parameters compared to the original UMX model. Details of X-UMX can be found in our paper.

Related Projects: x-umx | open-unmix-nnabla | open-unmix-pytorch | musdb | museval | norbert

The Model

As shown in Figure (b), X-UMX has almost the same architecture as the original UMX, differing only by two additional averaging operations that link the instrument models together. Since these operations are not DNN layers, the number of learnable parameters of X-UMX is the same as for the original UMX, and the computational complexity is almost the same. Besides the model, there are two more differences compared to the original UMX: a Multi Domain Loss (MDL) and a Combination Loss (CL) are used during training, which differ from the original loss function of UMX. Hence, these three contributions, i.e., (i) the crossing architecture, (ii) MDL and (iii) CL, make the original UMX more effective without additional learnable parameters; a schematic sketch of (i) and (iii) follows below.
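To make the two ideas concrete, here is a schematic Python sketch of the parameter-free averaging ("crossing") operation and the Combination Loss; the function names and the exact set of source combinations are our assumptions, not code from this repository:

```python
from itertools import combinations

def crossing(branch_features):
    """Average the features of all instrument branches and feed the average
    back to each branch. A plain average adds no learnable parameters."""
    avg = sum(branch_features) / len(branch_features)
    return [avg] * len(branch_features)

def combination_loss(estimates, targets, loss_fn):
    """Evaluate the loss on every proper combination of summed sources
    (e.g. vocals, vocals+drums, ...), not just on each source alone."""
    names = list(estimates)
    total, count = 0.0, 0
    for k in range(1, len(names)):            # subset sizes 1 .. M-1
        for combo in combinations(names, k):
            est = sum(estimates[s] for s in combo)
            tgt = sum(targets[s] for s in combo)
            total += loss_fn(est, tgt)
            count += 1
    return total / count
```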

This is the official implementation of Out-of-core Training for Extremely Large-Scale Neural Networks With Adaptive Window-Based Scheduling.

We provide the OoC feature as one of nnabla's utilities. You can enable OoC training in your nnabla script with only a few additional lines. Please see the documentation for more details!

While large neural networks demonstrate higher performance in various tasks, training large networks is difficult due to limitations on GPU memory size. We propose a novel out-of-core algorithm that enables faster training of extremely large-scale neural networks whose sizes exceed the allotted GPU memory. Under a given memory budget constraint, our scheduling algorithm locally adapts the timing of memory transfers according to the memory usage of each function, which improves the overlap between computation and memory transfers. Additionally, we apply a virtual addressing technique, commonly used in operating systems, to the training of neural networks with out-of-core execution, which drastically reduces the amount of memory fragmentation caused by frequent memory transfers. With our proposed algorithm, we successfully train ResNet-50 with a batch size of 1440, which is 7.5x larger than the upper bound imposed by physical memory, while keeping training speed at 55%. It also substantially outperforms a previous state-of-the-art method, i.e., it trains a 1.55x larger network with faster execution. Moreover, we experimentally show that our approach also scales to various types of networks.

This is the official implementation of Data Cleansing for Deep Neural Networks with Storage-efficient Approximation of Influence Functions

Identifying the influence of training data for data cleansing can improve the accuracy of deep learning. An approach using stochastic gradient descent (SGD), called SGD-influence, was proposed to calculate influence scores, but its calculation cost is expensive: the model parameters must be temporarily stored during the training phase so that the influence scores can be calculated in the inference phase. In close connection with this previous method, we propose a method that reduces the cache files needed to store the parameters during training for calculating influence scores. We only use the final parameters of the last epoch for the influence function calculation. In our classification experiments on the MNIST dataset, the cache size of training with our approach is 1.236 MB, whereas the previous method required a cache size of 1.932 GB for the last epoch. This means the cache size has been reduced to 1/1,563. We also observed the same accuracy improvement from data cleansing, i.e., the removal of negatively influential data, with our approach as with the previous method. Moreover, our simple and general method for calculating influence scores is available in our auto ML tool, Neural Network Console, without any programming. The source code is also available.
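The storage saving comes from scoring influence with only the final parameters instead of per-step checkpoints. Below is a first-order toy sketch on linear regression (our own simplification for illustration, not the paper's exact estimator):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=100)
w = np.linalg.lstsq(X, y, rcond=None)[0]        # "last epoch" parameters only

def grad_loss(w, x, t):
    # Gradient of the squared error 0.5 * (x @ w - t)**2 with respect to w.
    return (x @ w - t) * x

# Gradient of a validation point's loss at the final parameters.
x_val, t_val = rng.normal(size=5), 0.0
g_val = grad_loss(w, x_val, t_val)

# Influence score of each training sample: how well its gradient aligns with
# the validation gradient. High scores flag candidates for data cleansing.
scores = np.array([grad_loss(w, X[i], y[i]) @ g_val for i in range(len(X))])
print("cleansing candidates:", np.argsort(scores)[-5:])
```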

This is the official NNabla implementation of D3Net, densely connected multidilated convolutional networks for dense prediction tasks, which was accepted at CVPR 2021.

Takahashi, Naoya, and Yuki Mitsufuji. "Densely connected multidilated convolutional networks for dense prediction tasks." arXiv preprint arXiv:2011.11844 (2021).

Tasks that involve high-resolution dense prediction require modeling both local and global patterns in a large input field. Although the local and global structures often depend on each other and their simultaneous modeling is important, many convolutional neural network (CNN)-based approaches interchange representations at different resolutions only a few times. In this paper, we claim the importance of dense simultaneous modeling of multiresolution representations and propose a novel CNN architecture called densely connected multidilated DenseNet (D3Net). D3Net involves a novel multidilated convolution that has different dilation factors in a single layer to model different resolutions simultaneously. By combining the multidilated convolution with the DenseNet architecture, D3Net incorporates multiresolution learning with an exponentially growing receptive field in almost all layers, while avoiding the aliasing problem that occurs when the dilated convolution is naively incorporated into DenseNet. Experiments on the image semantic segmentation task using Cityscapes and the audio source separation task using MUSDB18 show that the proposed method outperforms state-of-the-art methods.
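A minimal PyTorch sketch of the multidilated idea follows (the official implementation is in NNabla, and D3Net additionally ties the dilation pattern to the dense-block skip connections, which this sketch omits):

```python
import torch
import torch.nn as nn

class MultiDilatedConv(nn.Module):
    """One layer whose channel groups use different dilation factors, so a
    single layer models several resolutions at once."""

    def __init__(self, in_ch, out_ch, dilations=(1, 2, 4, 8)):
        super().__init__()
        assert out_ch % len(dilations) == 0
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch // len(dilations), kernel_size=3,
                      dilation=d, padding=d)      # 'same' output size per branch
            for d in dilations
        )

    def forward(self, x):
        # Concatenating the branches keeps multiresolution features in one layer.
        return torch.cat([b(x) for b in self.branches], dim=1)

y = MultiDilatedConv(16, 32)(torch.randn(1, 16, 64, 64))   # -> (1, 32, 64, 64)
```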

This is the official NNabla implementation of NVC-Net, an end-to-end adversarial voice conversion approach.

Nguyen, Bac, and Fabien Cardinaux. "NVC-Net: End-to-End Adversarial Voice Conversion." arXiv preprint arXiv:2106.00992 (2021).

Voice conversion has gained increasing popularity in many applications of speech synthesis. The idea is to change the voice identity from one speaker into another while keeping the linguistic content unchanged. Many voice conversion approaches rely on the use of a vocoder to reconstruct the speech from acoustic features, and as a consequence, the speech quality heavily depends on such a vocoder. In this paper, we propose NVC-Net, an end-to-end adversarial network, which performs voice conversion directly on the raw audio waveform of arbitrary length. By disentangling the speaker identity from the speech content, NVC-Net is able to perform non-parallel traditional many-to-many voice conversion as well as zero-shot voice conversion from a short utterance of an unseen target speaker. Importantly, NVC-Net is non-autoregressive and fully convolutional, achieving fast inference. Our model is capable of producing samples at a rate of more than 3600 kHz on an NVIDIA V100 GPU, being orders of magnitude faster than state-of-the-art methods under the same hardware configurations. Objective and subjective evaluations on non-parallel many-to-many voice conversion tasks show that NVC-Net obtains competitive results with significantly fewer parameters.

This is the official implementation of models and experiments for the INTERSPEECH 2023 paper "Towards Robust FastSpeech 2 by Modelling Residual Multimodality" (Kögel, Nguyen, Cardinaux 2023).

This repository contains a PyTorch implementation of FastSpeech 2 with adapted variance predictors and the Trivariate-Chain Gaussian Mixture Model (TVC-GMM) proposed in our paper. Additionally, it contains scripts to export audio and calculate metrics to recreate the experiments presented in the paper.

State-of-the-art non-autoregressive text-to-speech (TTS) models based on FastSpeech 2 can efficiently synthesise high-fidelity and natural speech. For expressive speech datasets, however, we observe characteristic audio distortions. We demonstrate that such artefacts are introduced to the vocoder reconstruction by over-smooth mel-spectrogram predictions, which are induced by the choice of mean-squared-error (MSE) loss for training the mel-spectrogram decoder. With MSE loss, FastSpeech 2 is limited to learning conditional averages of the training distribution, which may not lie close to a natural sample if the distribution remains multimodal after all conditioning signals. To alleviate this problem, we introduce TVC-GMM, a mixture model of Trivariate-Chain Gaussian distributions, to model the residual multimodality. TVC-GMM reduces spectrogram smoothness and improves perceptual audio quality, in particular for expressive datasets, as shown by both objective and subjective evaluation.
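The core change is replacing the MSE decoder loss with the negative log-likelihood of a mixture model, so multimodal residuals are no longer collapsed onto their conditional mean. Here is a generic univariate mixture-density sketch in PyTorch (the paper's TVC-GMM additionally chains trivariate Gaussians across adjacent time-frequency bins, which is omitted here):

```python
import math
import torch
import torch.nn.functional as F

def gmm_nll(target, logit_w, means, log_scales):
    """Negative log-likelihood of a K-component Gaussian mixture per bin.

    target: (B, T) mel bins; logit_w, means, log_scales: (B, T, K).
    """
    log_w = F.log_softmax(logit_w, dim=-1)               # mixture weights
    t = target.unsqueeze(-1)                             # (B, T, 1)
    log_norm = (-0.5 * ((t - means) / log_scales.exp()) ** 2
                - log_scales - 0.5 * math.log(2 * math.pi))
    return -torch.logsumexp(log_w + log_norm, dim=-1).mean()

# Toy usage: 2 mixture components over a (batch=1, bins=4) "spectrogram".
target = torch.randn(1, 4)
logit_w, means, log_scales = (torch.randn(1, 4, 2) for _ in range(3))
print(gmm_nll(target, logit_w, means, log_scales))
```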

ai-research-code's People

Contributors

akiohayakawa-sony, bacnguyencong-sony, fabiencardinaux, jx-huading, kazukiyoshiyama-sony, kenji-suzuki-s, krishnaw10, qiiajia, siddharthnijhawan, srinidhi-srinivasa, takuyanarihira, takuyayashima, te-basavarajmurali, te-kevingeorge, tomonobutsujikawa, yukiooobuchi


ai-research-code's Issues

Chinese supported?

Thank you for open-sourcing the code. I'd like to know: is Mandarin Chinese supported in this project?

test wavfile using cpu gives Segmentation fault

Hi,

I am using a pre-trained model in a CPU environment with the --context cpu option. My mixed wav file is 2 minutes long, and I have 16 GB of RAM on an octa-core CPU system.

I am trying to run the test command for the pre-trained model, and it gives a segmentation fault. Please check the command and logs below.
command: python test.py --input inputs/dm_mixed_vocal_and music_0002.wav --context cpu --model model/x-umx.h5 --outdir outputs/
logs:
2021-01-08 02:02:41,221 [nnabla][INFO]: Initializing CPU extension...
Segmentation fault (core dumped)

I don't know why this gives a segmentation fault; my system RAM is not full at the time of the segfault.
Please suggest some ideas to resolve this issue.

【NVC-net】Failed `it != items_.end()`: Any of [cudnn:float, cuda:float, cpu:float] could not be found in []

Hi. I tried to train NVC-net, but the following error occurs:

2021-11-26 03:23:06,638 [nnabla][INFO]: Initializing CPU extension...
2021-11-26 03:23:06,997 [nnabla][INFO]: Initializing CUDA extension...
2021-11-26 03:23:09,117 [nnabla][INFO]: Initializing cuDNN extension...
value error in query
/home/gitlab-runner/builds/zxvvzZDJ/1/nnabla/builders/all/nnabla/include/nbla/function_registry.hpp:69
Failed  it != items_.end() : Any of [cudnn:float, cuda:float, cpu:float] could not be found in []

No communicator found. Running with a single process. If you run this with MPI processes, all processes will perform totally same.
2021-11-26 03:23:09,406 [nnabla][INFO]: Training data with 103 speakers.
2021-11-26 03:23:09,407 [nnabla][INFO]: DataSource with shuffle(True)
2021-11-26 03:23:09,464 [nnabla][INFO]: Using DataIterator
Running epoch=1 lr=0.00010
Failed to allocate. Freeing memory cache and retrying.
Failed to allocate. Freeing memory cache and retrying.
Failed to allocate again.
Error during forward propagation:

Environment: Tesla T4, CUDA 10.2, cuDNN 8.1, Ubuntu 18.04.4 LTS.

I installed nnabla with pip install nnabla-ext-cuda102.
Besides, if I want to train the model with only one GPU, is python3 main.py the right command?

【NVC-Net】ImportError: libcudart.so.10.2: cannot open shared object file: No such file or directory

[screenshot of the ImportError traceback]

and the details of the GPU are:
GPU: Tesla P100-PCIE-16gb
driver version: 450.119.04
CUDA version: 11.0

I'm using Kaggle to try to run this and keep running into this error. I tried running the Dockerfile, but that did not run successfully either. I've tried many online methods to resolve this, but none seem to work, unfortunately. Could anyone potentially help with this issue?

[X-UMX] Bad performance when using --targets

Hi,

I am doing some tests on the Google Colab you provide, and I've seen that the performance varies a lot if I use the flag --targets vocals compared to just running the default test command:
!python test.py --inputs $filename --out-dir results --model models/x-umx.h5

Why is this happening? My goal is to have audio + accompaniment

Many thanks in advance,

Guillem

Value error in nnabla on running x-umx on Rpi4, Raspberry Pi OS

On running x-umx on an RPi4 (8 GB) with Raspberry Pi OS, I get the error below:

root@raspberrypi:/home/pi/x-umx# python3 test.py --inputs ../Music/test_16k_S16_LE_stereo.wav --context cpu --model /home/pi/x-umx/x-umx.h5 --outdir /home/pi/x-umx/results
2021-01-30 16:37:37,007 [nnabla][INFO]: Initializing CPU extension...
Traceback (most recent call last):
File "test.py", line 198, in
test()
File "test.py", line 170, in test
residual_model=args.residual_model
File "test.py", line 84, in separate
mix_spec, msk, _ = unmix_target(audio_nn, test=True)
File "/home/pi/x-umx/model.py", line 300, in call
lstm_out_bass = self.lstm(cross_1, nb_samples, "lstm_bass", test)
File "/home/pi/x-umx/model.py", line 231, in lstm
bidirectional=not self.unidirectional, training=not test, dropout=0.4, name=scope_name)
File "", line 8, in lstm
File "/usr/local/lib/python3.7/dist-packages/nnabla/parametric_functions.py", line 1567, in lstm
return F.lstm(x, h, c, weight_l0=w0, weight=w, bias=b, num_layers=num_layers, dropout=dropout, bidirectional=bidirectional, training=training)
File "", line 3, in lstm
File "/usr/local/lib/python3.7/dist-packages/nnabla/function_bases.py", line 222, in lstm
return F.LSTM(ctx, num_layers, dropout, bidirectional, training)(*inputs, n_outputs=n_outputs, auto_forward=get_auto_forward(), outputs=outputs)
File "function.pyx", line 292, in nnabla.function.Function.call
File "function.pyx", line 271, in nnabla.function.Function.cg_call
RuntimeError: value error in setup_impl
/home/pi/x-umx/nnabla/src/nbla/function/./generic/split.cpp:36
Failed `num_outputs == outputs.size()`: inputs[0].shape[axis] must be the same number as the outputs. inputs[0].shape[axis]: 431, outputs: 2.

I have successfully manually built & installed nnabla & llvmlite.
The latter was really very difficult to build & install.
root@raspberrypi:/home/pi/x-umx# pip3 freeze | grep 'nnabla'
nnabla==1.9.0

I think it is now throwing an nnabla error saying that the input parameter size is not equal to the output parameter size. Can you please suggest where we need to fix this in the nnabla files?

Please help me.
Regards,
Rajiv.

Memory allocation failed

I tried to train with 2 GPUs via Docker, but after one epoch, memory allocation errors occur. I am not sure what to check or what could possibly be wrong.
[screenshot of the allocation errors]

Segmentation fault and RuntimeError: value error in setup_impl

Hi,
I am trying to train the X-UMX model on Google Colab and am getting stuck on the error below. Kindly help.

2022-07-02 10:30:45,285 [nnabla][INFO]: Initializing CPU extension...
2022-07-02 10:30:45,626 [root][INFO]: Generating grammar tables from /usr/lib/python3.7/lib2to3/Grammar.txt
2022-07-02 10:30:45,643 [root][INFO]: Generating grammar tables from /usr/lib/python3.7/lib2to3/PatternGrammar.txt
/usr/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
  return f(*args, **kwds)
/usr/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
  return f(*args, **kwds)
2022-07-02 10:30:46,093 [nnabla][INFO]: Initializing CUDA extension...
2022-07-02 10:30:46,111 [nnabla][INFO]: Initializing cuDNN extension...
2022-07-02 10:30:46,558 [nnabla][INFO]: [Communicator] Using gpu_id = 0 as rank = 0
2022-07-02 10:30:46,610 [nnabla][INFO]: DataSource with shuffle(True)
Finished loading dataset with 86 tracks.
2022-07-02 10:30:52,995 [nnabla][INFO]: DataSource with shuffle(False)
Finished loading dataset with 14 tracks.
2022-07-02 10:30:54,130 [nnabla][INFO]: Using DataIterator
2022-07-02 10:30:54,131 [nnabla][INFO]: Using DataIterator
Compute dataset statistics: 100% 86/86 [01:28<00:00,  1.03s/it]
Traceback (most recent call last):
  File "train.py", line 207, in <module>
    train()
  File "train.py", line 113, in train
    model = get_model(args, scaler_mean, scaler_std, max_bin=max_bin)
  File "/content/ai-research-code/x-umx/model.py", line 431, in get_model
    mix_spec, m_hat, pred = unmix(mixture_audio)
  File "/content/ai-research-code/x-umx/model.py", line 327, in __call__
    self.n_fft, window_type='hanning', center=True)
  File "/usr/local/lib/python3.7/dist-packages/nnabla/functions.py", line 1101, in istft
    return istft_base(y_r, y_i, window_size, stride, fft_size, window_type, center, pad_mode, as_stft_backward)
  File "<istft>", line 3, in istft
  File "/usr/local/lib/python3.7/dist-packages/nnabla/function_bases.py", line 4926, in istft
    return F.ISTFT(ctx, window_size, stride, fft_size, window_type, center, pad_mode, as_stft_backward)(y_r, y_i, n_outputs=n_outputs, auto_forward=get_auto_forward(), outputs=outputs)
  File "function.pyx", line 328, in nnabla.function.Function.__call__
  File "function.pyx", line 306, in nnabla.function.Function._cg_call
RuntimeError: value error in setup_impl
/home/gitlab-runner/builds/LRsSYq-B/0/nnabla/builders/all/nnabla/src/nbla/function/./generic/istft.cpp:95
Failed `this->pad_mode_ == "constant"`: `pad_mode` should be "constant" for the normal use of ISTFT (`as_stft_backward == false`) since `pad_mode` is ignored and makes no effects in that case.

[b506019edf61:01753] *** Process received signal ***
[b506019edf61:01753] Signal: Segmentation fault (11)
[b506019edf61:01753] Signal code: Address not mapped (1)
[b506019edf61:01753] Failing at address: 0x7f0c3314f20d
[b506019edf61:01753] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x12980)[0x7f0c35bf4980]
[b506019edf61:01753] [ 1] /lib/x86_64-linux-gnu/libc.so.6(getenv+0xa5)[0x7f0c35833775]
[b506019edf61:01753] [ 2] /usr/lib/x86_64-linux-gnu/libtcmalloc.so.4(_ZN13TCMallocGuardD1Ev+0x34)[0x7f0c3609ee44]
[b506019edf61:01753] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__cxa_finalize+0xf5)[0x7f0c35834605]
[b506019edf61:01753] [ 4] /usr/lib/x86_64-linux-gnu/libtcmalloc.so.4(+0x13cb3)[0x7f0c3609ccb3]
[b506019edf61:01753] *** 
End of error message ***


Steps followed:
!pip install musdb norbert pydub
!pip install nnabla
!pip install nnabla-ext-cuda110-nccl2-mpi3-1-6
!pip uninstall urllib3 -y
!pip uninstall folium -y
!pip install folium==0.2.1
!pip install urllib3==1.25.*

!git clone https://github.com/sony/ai-research-code.git
%cd ai-research-code/x-umx
!mkdir models
!wget -P models https://nnabla.org/pretrained-models/ai-research-code/x-umx/x-umx.h5

!python train.py --root /content/drive/MyDrive/dataset --output /content/drive/MyDrive/crossnet/ --is-wav --epochs 10 --lr 0.001

X-UMX gets stuck when training

Hello, I am trying to train a model using X-UMX on a single GPU. I am using the 7-second preview version of musdb just for testing.

After "Compute dataset statistics" reaches 100%, the next line is stuck at 0%.

This is everything I did:

cd /home/ubuntu/Downloads/ai-research-code/x-umx
ubuntu@:/Downloads/ai-research-code/x-umx$ conda activate open-unmix-nnabla-gpu
(open-unmix-nnabla-gpu) ubuntu@-:~/Downloads/ai-research-code/x-umx$ python train.py --output /home/ubuntu/Downloads/ai-research-code/x-umx/weights
2021-07-27 23:18:56,894 [nnabla][INFO]: Initializing CPU extension...
/home/ubuntu/anaconda3/envs/open-unmix-nnabla-gpu/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
return f(*args, **kwds)
/home/ubuntu/anaconda3/envs/open-unmix-nnabla-gpu/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
return f(*args, **kwds)
2021-07-27 23:18:57,502 [nnabla][INFO]: Initializing CUDA extension...
2021-07-27 23:18:57,559 [nnabla][INFO]: Initializing cuDNN extension...
value error in query
/home/gitlab-runner/builds/-phDBBa6/3/nnabla/builders/all/nnabla/include/nbla/function_registry.hpp:69
Failed it != items_.end(): Any of [cudnn:float, cuda:float, cpu:float] could not be found in []

No communicator found. Running with a single process. If you run this with MPI processes, all processes will perform totally same.
2021-07-27 23:18:57,560 [nnabla][INFO]: [Communicator] Using gpu_id = 0 as rank = 0
Mixing coef. is 10.0, i.e., MDL = 10.0*TD-Loss + FD-Loss
2021-07-27 23:18:57,561 [nnabla][INFO]: DataSource with shuffle(True)
tracks=80
2021-07-27 23:18:59,025 [nnabla][INFO]: DataSource with shuffle(False)
tracks=14
2021-07-27 23:18:59,289 [nnabla][INFO]: Using DataIterator
2021-07-27 23:18:59,289 [nnabla][INFO]: Using DataIterator
max_iter 320
Compute dataset statistics: 100%|███████████████| 80/80 [00:12<00:00, 8.11it/s]
0%| | 0/1000 [00:00<?, ?it/s]

The GPU memory and power are being used, but nothing seems to be happening. Please, can anyone help me?

Large delay during inference

I'm running the nvcnet model after training, and during inference there is a large time delay. The first inference is always the largest, at around 3 seconds; all others have delays of about 1.2 seconds. The length of the input wav file doesn't change the delay.

After looking deeper, it appears the delay is caused by the model construction. Is there a way to create the model object once and just do inference?

With these delays, you won't get the fast inference times published in the paper, which makes those numbers hard to reproduce.

Any thoughts? I am still in the process of learning nnabla so maybe I am missing something about the library.

Best regards,
Philip

While running the d3net music separation Jupyter notebook in Colab, the error "AttributeError: module 'pynvml.nvml' has no attribute 'nvml_lib'" occurs

2021-06-07 12:26:53,837 [nnabla][INFO]: Initializing CPU extension...
Traceback (most recent call last):
File "separate.py", line 96, in
ch_flip_average=True
File "separate.py", line 27, in run_separation
ctx = get_extension_context(args.context)
File "/usr/local/lib/python3.7/dist-packages/nnabla/ext_utils.py", line 97, in get_extension_context
mod = import_extension_module(ext_name)
File "/usr/local/lib/python3.7/dist-packages/nnabla/ext_utils.py", line 46, in import_extension_module
return importlib.import_module('.' + ext_name, 'nnabla_ext')
File "/usr/lib/python3.7/importlib/init.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "", line 1006, in _gcd_import
File "", line 983, in _find_and_load
File "", line 967, in _find_and_load_unlocked
File "", line 677, in _load_unlocked
File "", line 728, in exec_module
File "", line 219, in _call_with_frames_removed
File "/usr/local/lib/python3.7/dist-packages/nnabla_ext/cudnn/init.py", line 18, in
import nnabla_ext.cuda
File "/usr/local/lib/python3.7/dist-packages/nnabla_ext/cuda/init.py", line 114, in
check_gpu_compatibility()
File "/usr/local/lib/python3.7/dist-packages/nnabla_ext/cuda/init.py", line 71, in check_gpu_compatibility
from nnabla.utils.nvml import pynvml
File "/usr/local/lib/python3.7/dist-packages/nnabla/utils/nvml.py", line 39, in
load_nvml_for_win()
File "/usr/local/lib/python3.7/dist-packages/nnabla/utils/nvml.py", line 26, in load_nvml_for_win
if not (nvml.nvml_lib == None and sys.platform[:3] == "win"):
AttributeError: module 'pynvml.nvml' has no attribute 'nvml_lib'

NVC-Net Training

Hi, thanks for releasing the code for NVC-Net. I've got two questions:

Firstly, when trying to train on multiple GPUs, I run into the following error:

Failed `it != items_.end()`: Any of [cudnn:float, cuda:float, cpu:float] could not be found in []
No communicator found. Running with a single process. If you run this with MPI processes, all processes will perform totally same.

which basically means it's only running on one GPU. In fact, I get the same error simply by running the following:

import nnabla.communicators as C
from nnabla.ext_utils import get_extension_context
ctx = get_extension_context("cudnn", device_id='0')
C.MultiProcessDataParallelCommunicator(ctx)

I know this is probably more of an nnabla issue, but as a PyTorch user I'm not sure where to get help with nnabla.

Secondly, is it normal for the content preservation loss g_loss_con to be 0.0 for the first few epochs? I'm finding that the encoder basically encodes everything to the same vector in the hidden dimension, hence the loss is 0.0. For reference, I'm also using the VCTK dataset processed with the given script with default parameters.

Thanks a lot!

【NVC-Net】RuntimeError: target_specific error in backward_impl. Failed `status == CUDNN_STATUS_SUCCESS`: UNKNOWN

Hi, I am trying to train NVC-Net on a single GPU, but I run into the following errors:

value error in query
/home/gitlab-runner/builds/jmdP2aBr/1/nnabla/builders/all/nnabla/include/nbla/function_registry.hpp:69
Failed it != items_.end(): Any of [cudnn:float, cuda:float, cpu:float] could not be found in []

No communicator found. Running with a single process. If you run this with MPI processes, all processes will perform totally same.
2022-02-15 17:16:13,887 [nnabla][INFO]: Training data with 100 speakers.
2022-02-15 17:16:13,888 [nnabla][INFO]: DataSource with shuffle(True)
2022-02-15 17:16:13,934 [nnabla][INFO]: Using DataIterator
Running epoch=1 lr=0.00010
Error during backward propagation:
Add2CudaCudnn
Add2CudaCudnn
Add2CudaCudnn
MulScalarCuda
MeanCudaCudnn
SquaredErrorCuda
Div2Cuda
PowScalarCuda
SumCuda
AddScalarCuda
PowScalarCuda
ConvolutionCudaCudnn
PadCuda
GELUCuda
ConvolutionCudaCudnn
PadCuda
GELUCuda
ConvolutionCudaCudnn
GELUCuda
Add2CudaCudnn
ConvolutionCudaCudnn
Mul2Cuda
TanhCudaCudnn <-- ERROR
Traceback (most recent call last):
File "main.py", line 99, in
run(args)
File "main.py", line 70, in run
Trainer(gen, gen_optim, dis, dis_optim, dataloader, rng, hp).run()
File "11_ai-research-code-master/nvcnet/train.py", line 157, in run
self.train_on_batch(i)
File "11_ai-research-code-master/nvcnet/train.py", line 197, in train_on_batch
p['g_loss'].backward(clear_buffer=True)
File "_variable.pyx", line 826, in nnabla._variable.Variable.backward
RuntimeError: target_specific error in backward_impl
/home/gitlab-runner/builds/-phDBBa6/0/nnabla/builders/all/nnabla-ext-cuda/src/nbla/cuda/cudnn/function/./generic/tanh.cu:79
Failed status == CUDNN_STATUS_SUCCESS: UNKNOWN

I followed the install page, https://nnabla.org/install/, but it does not work. Could you please give some suggestions?
My environment is as follows:
CUDA 11.0, cuDNN 8.1.0, Python 3.6.8

Thank you! Looking forward to your kind reply.

Adding additional speakers - transfer learning

Has anyone figured out a way to use this algorithm to do transfer learning?

Say I train with 100 speakers and want to train the model with an additional 20 speakers. It appears that you have to retrain from the start rather than adding a set of 20 new latent spaces and training on this new data.

Has anyone tried this? It would be great to be able to transfer what has been learned, but that is tough with GANs.

Best,
Philip

Bad output sound quality

I noticed that the output quality is much worse than that of the input file. Maybe there is some config that cuts some frequencies from the output file?

In spleeter there is a similar issue, which can be resolved with a simple config change.

Both pretrained models (openvino and default) produce the same bad quality. It sounds like the high frequencies are cut off.

Thanks in advance!

X-UMX - Separating using --context cpu takes very long

Thank you so much for open-sourcing X-UMX!

Is it in a usable state right now?

I followed your instructions to perform source separation with the model as listed here.

I ran test.py with the flag --context cpu, and separation is taking a very long time: it took over 15 minutes to separate a 3-minute track. I have 16 GB of RAM. Is CPU-based separation supposed to take this long?

Update: It ended up exhausting all my memory and crashing.

It gives the following errors:

C:\Users\Vinicius111\Downloads\xumx-master\xumx-master\xumx>python test.py --input inputs/"C:\Users\Vinicius111\Music\Test.mp3"python test.py ----model model/C:\Users\Vinicius111\Downloads\x-umx.h5 --outdir outputs/
2021-01-30 12:04:32,295 [nnabla][INFO]: Initializing CPU extension...
Traceback (most recent call last):
File "test.py", line 28, in
from .args import get_inference_args
ModuleNotFoundError: No module named 'main.args'; 'main' is not a package

No pretrained models

Hello!
I see that all links to the pretrained models now return 404. Where can I get these models?

D3Net: inference on CPU

Separating one source (vocals) from a 3-minute track using D3Net takes ~2.5 hours on a machine with 4 cores. Is there a way to speed up inference on the CPU?

[Quantized Depth Completion] Questions about implementation details

First of all, thanks for the great work, but the source code is still missing. Could you share the training/evaluation code and pretrained weights for this work?

Also, I'm trying to reimplement it in PyTorch, and I have some questions about the paper:

  1. How do you compute the surface normal from the ground-truth depth in NYU Depth v2? The paper only shows the approximation used for training, not the accurate one.
  2. The dot pattern used to produce the sparse depth in NYU Depth v2 is unknown. Can you share an example to reproduce it?
  3. The kernel_size of MaxPooling2D is missing.
  4. The kernel_size and filter_size of the Conv2d in the upsampling layer are missing.

Thanks! Looking forward to your kind reply.

[NVC-Net] About 16 kHz training and model convergence

Hi,

Thank you for sharing your great work!

I'm using nvcnet to train a Japanese voice conversion model, and I have two questions.

First, to adapt your code to 16 kHz wavs, I made the following two changes:

  1. changed sr in hparams.py from 22050 to 16000
  2. changed segment_length in hparams.py from 32768 to 16384

The training goes well, but the performance is bad even after 400 epochs. Do you have any ideas on training nvcnet on 16 kHz wavs? Do I need any other modifications to ensure the training goes well?

Second, could you share the value of g_loss_rec when the model converges? In my training, g_loss_rec converged to around 0.9 to 1.2; I'm not sure if this is what I should expect at model convergence.

additional conda env dependencies needed

To get this working, I had to make the following changes to environment-gpu.yml:

name: open-unmix-nnabla-gpu

channels:
  - conda-forge

dependencies:
  - python=3.6
  - numpy=1.16
  - scikit-learn=0.21
  - tqdm=4.28
  - cudatoolkit=10.0
  - cudnn
  - ffmpeg
  - pip
  - pip:
    - soundfile
    - musdb
    - norbert
    - resampy
    - nnabla
    - nnabla-ext-cuda100
    - pydub

outputs not saved

The output wav files are not saved to disk anywhere. I tried multiple combinations (without the output argument, with the output argument, etc.), but none seem to work.

NVCnet g_loss_con=0.0000 while training

I met two problems.
The first one is the same as issue #54 mentioned before. I tried to use the Docker environment suggested in that issue:

docker pull nnabla/nnabla-ext-cuda-multi-gpu:py37-cuda110-mpi3.1.6-v1.29.0
docker run --rm -it -u $(id -u):$(id -g) --gpus all nnabla/nnabla-ext-cuda-multi-gpu:py37-cuda110-mpi3.1.6-v1.29.0

mpirun -n 2 python3 -c "import nnabla_ext.cudnn; from nnabla.ext_utils import get_extension_context; import nnabla.communicators as C; ctx = get_extension_context('cudnn', device_id='0'); C.MultiProcessDataParallelCommunicator(ctx)"

and it went well. But when I then ran main.py with batch_size set to 8, I got the error below:

wzy@2f0a2b4b4485:~/NVCnet$ mpirun -n 1 python main.py -c cudnn -d 7 --output_path log/baseline-wzy/ --batch_size 8
2022-11-14 04:17:10,938 [nnabla][INFO]: Initializing CPU extension...
2022-11-14 04:17:11,300 [nnabla][INFO]: Initializing CUDA extension...
2022-11-14 04:17:20,009 [nnabla][INFO]: Initializing cuDNN extension...
2022-11-14 04:17:20,359 [nnabla][INFO]: Training data with 103 speakers.
2022-11-14 04:17:20,360 [nnabla][INFO]: DataSource with shuffle(True)
2022-11-14 04:17:20,371 [nnabla][INFO]: Using DataIterator
Running epoch=1 lr=0.00010
[ 0/4689] d_loss 4.1589 (4.1589) g_loss_avd 2.0793 (2.0793) g_loss_con 0.0000 (0.0000) g_loss_rec 58.3829 (58.3829) g_loss_kld 0.0000 (0.0000)
Failed to allocate. Freeing memory cache and retrying.
Failed to allocate. Freeing memory cache and retrying.
Failed to allocate. Freeing memory cache and retrying.
Failed to allocate. Freeing memory cache and retrying.
Failed to allocate. Freeing memory cache and retrying.
Failed to allocate. Freeing memory cache and retrying.
Failed to allocate. Freeing memory cache and retrying.
Failed to allocate. Freeing memory cache and retrying.
Failed to allocate again.
Error during backward propagation:
Add2CudaCudnn
Add2CudaCudnn
Add2CudaCudnn
MulScalarCuda
MeanCudaCudnn
SquaredErrorCuda
Div2Cuda
PowScalarCuda
SumCuda
AddScalarCuda
PowScalarCuda
ConvolutionCudaCudnn
PadCuda
GELUCuda
ConvolutionCudaCudnn
PadCuda
GELUCuda
ConvolutionCudaCudnn
GELUCuda
Add2CudaCudnn
ConvolutionCudaCudnn
Mul2Cuda
TanhCudaCudnn
SigmoidCudaCudnn
SliceCuda <-- ERROR
Traceback (most recent call last):
File "main.py", line 100, in
run(args)
File "main.py", line 70, in run
Trainer(gen, gen_optim, dis, dis_optim, dataloader, rng, hp).run()
File "/home/wzy/NVCnet/train.py", line 156, in run
self.train_on_batch(i)
File "/home/wzy/NVCnet/train.py", line 196, in train_on_batch
p['g_loss'].backward(clear_buffer=True)
File "_variable.pyx", line 827, in nnabla._variable.Variable.backward
RuntimeError: memory error in alloc
/home/gitlab-runner/builds/LRsSYq-B/0/nnabla/builders/all/nnabla/src/nbla/memory/memory.cpp:39
Failed this->alloc_impl(): N4nbla10CudaMemoryE allocation failed.

--------------------------------------------------------------------------

When I changed the batch size to 7, 6, or 2, g_loss_con is 0 all the time.
[screenshot of the training log]

This is the result of nvidia-smi:
[screenshot of nvidia-smi output]

Please enlighten me.

【NVC-Net】How many epochs will the model converge?

e.g., for the VCTK dataset.

Besides, have you tested whether the model is robust to noisy source files (e.g., recorded by a mobile phone, with air conditioning in the background, or with heavy breathing, which is quite common in real-life applications) at inference time?

Thank you very much

Potential bug in xumx

I'm trying to run xumx (through https://github.com/JeffreyCA/spleeterweb-xumx, but it seems to be the same code).

In this line:

audio_nn = nn.Variable.from_numpy_array(audio.T[None, ...])

The function documentation says audio should be in the shape:

    audio: np.ndarray [shape=(nb_samples, nb_channels, nb_timesteps)]
        mixture audio

However, the ndarray is converted to an nnabla object with a transpose operation and a new dimension added:

    audio_nn = nn.Variable.from_numpy_array(audio.T[None, ...])

So when I pass audio like so:

x.shape: (1, 2, 9265664)

the audio_nn line results in this:

audio_nn: (1, 9265664, 2, 1)

Later on, this fails in the STFT step of the model's __call__ function:

nb_samples, nb_channels, _ = x.shape

It would be better to have:

    audio_nn = nn.Variable.from_numpy_array(audio)
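For reference, a small NumPy check of the shapes involved, reproducing the mismatch described above:

```python
import numpy as np

# `audio` already in the documented (nb_samples, nb_channels, nb_timesteps) layout.
audio = np.zeros((1, 2, 9265664), dtype=np.float32)

print(audio.T[None, ...].shape)  # (1, 9265664, 2, 1) -- what the current code builds
print(audio.shape)               # (1, 2, 9265664)    -- what __call__ unpacks
```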
