ksanjeevan / crnn-audio-classification

UrbanSound classification using Convolutional Recurrent Networks in PyTorch

License: MIT

Languages: Python 73.79%, Jupyter Notebook 26.21%

Topics: audio, lstm, crnn, spectrogram, melspectrogram, convnet, rnn, audio-classification, pytorch

crnn-audio-classification's Introduction

PyTorch Audio Classification: Urban Sounds

Classification of audio with variable length using a CNN + LSTM architecture on the UrbanSound8K dataset.

Example results:

Contents

Dependencies

Features

  • Easily define CRNN in .cfg format
  • Spectrogram computation on GPU
  • Audio data augmentation: Cropping, White Noise, Time Stretching (using a phase vocoder on GPU!); see the sketch below
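
As a rough sketch of the GPU spectrogram and stretching idea with torchaudio (not the repo's exact pipeline; the sample rate and input length here are assumptions for illustration):

import torch
import torchaudio.transforms as T

device = "cuda" if torch.cuda.is_available() else "cpu"

spec = T.Spectrogram(n_fft=2048, power=None).to(device)            # complex STFT
stretch = T.TimeStretch(hop_length=1024, n_freq=1025).to(device)   # phase vocoder
to_mel = T.MelScale(n_mels=128, sample_rate=22050, n_stft=1025).to(device)

waveform = torch.randn(1, 22050, device=device)  # 1 s of dummy mono audio
x = spec(waveform)                    # (1, 1025, frames), complex
x = stretch(x, overriding_rate=1.2)   # time stretch computed on the GPU
mel = to_mel(x.abs() ** 2)            # (1, 128, frames')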

Models

CRNN architecture:

Printing the model defined with torchparse:

AudioCRNN(
  (spec): MelspectrogramStretch(num_bands=128, fft_len=2048, norm=spec_whiten, stretch_param=[0.4, 0.4])
  (net): ModuleDict(
    (convs): Sequential(
      (conv2d_0): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=[0, 0])
      (batchnorm2d_0): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (elu_0): ELU(alpha=1.0)
      (maxpool2d_0): MaxPool2d(kernel_size=3, stride=3, padding=0, dilation=1, ceil_mode=False)
      (dropout_0): Dropout(p=0.1)
      (conv2d_1): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=[0, 0])
      (batchnorm2d_1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (elu_1): ELU(alpha=1.0)
      (maxpool2d_1): MaxPool2d(kernel_size=4, stride=4, padding=0, dilation=1, ceil_mode=False)
      (dropout_1): Dropout(p=0.1)
      (conv2d_2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=[0, 0])
      (batchnorm2d_2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (elu_2): ELU(alpha=1.0)
      (maxpool2d_2): MaxPool2d(kernel_size=4, stride=4, padding=0, dilation=1, ceil_mode=False)
      (dropout_2): Dropout(p=0.1)
    )
    (recur): LSTM(128, 64, num_layers=2)
    (dense): Sequential(
      (dropout_3): Dropout(p=0.3)
      (batchnorm1d_0): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (linear_0): Linear(in_features=64, out_features=10, bias=True)
    )
  )
)
Trainable parameters: 139786
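
This count can be reproduced with the standard PyTorch idiom (model here is assumed to be the instantiated AudioCRNN above):

# Standard idiom for counting trainable parameters.
n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {n_params}")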

Usage

Inference

Run inference on an audio file:

./run.py /path/to/audio/file.wav -r path/to/saved/model.pth 
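
Internally, inference amounts to loading the checkpoint and running the model in eval mode. A minimal sketch, assuming the checkpoint stores the weights under a 'state_dict' key (the actual checkpoint layout may differ):

import torch

checkpoint = torch.load("path/to/saved/model.pth", map_location="cpu")
model.load_state_dict(checkpoint["state_dict"])  # key name is an assumption
model.eval()
with torch.no_grad():
    prediction = model(batch)  # `batch` prepared by the dataset transforms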

Training

./run.py train -c config.json --cfg arch.cfg

Augmentation

Dataset transforms:

Compose(
    ProcessChannels(mode=avg)
    AdditiveNoise(prob=0.3, sig=0.001, dist_type=normal)
    RandomCropLength(prob=0.4, sig=0.25, dist_type=half)
    ToTensorAudio()
)
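
For illustration, an additive-noise transform like the one printed above could be sketched as follows (parameter names mirror the printout; the implementation itself is an assumption, not the repo's code):

import numpy as np

class AdditiveNoise:
    # Add Gaussian noise to the raw signal with probability `prob`.
    def __init__(self, prob=0.3, sig=0.001):
        self.prob, self.sig = prob, sig

    def __call__(self, signal):
        if np.random.rand() < self.prob:
            signal = signal + np.random.normal(0.0, self.sig, size=signal.shape)
        return signal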

As well as time stretching:

TensorboardX

Evaluation

./run.py eval -r /path/to/saved/model.pth

Then obtain the defined metrics:

100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 34/34 [00:03<00:00, 12.68it/s]
{'avg_precision': '0.725', 'avg_recall': '0.719', 'accuracy': '0.804'}
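
These are the usual macro-averaged metrics; a sketch of computing them with scikit-learn, assuming y_true and y_pred have been collected over the evaluation fold:

from sklearn.metrics import accuracy_score, precision_score, recall_score

metrics = {
    "avg_precision": precision_score(y_true, y_pred, average="macro"),
    "avg_recall": recall_score(y_true, y_pred, average="macro"),
    "accuracy": accuracy_score(y_true, y_pred),
}
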
10-Fold Cross Validation

Arch                          | Accuracy | AvgPrecision (macro) | AvgRecall (macro)
------------------------------|----------|----------------------|------------------
CNN                           | 71.0%    | 63.4%                | 63.5%
CRNN                          | 72.3%    | 64.3%                | 65.0%
CRNN (Bidirectional, Dropout) | 73.5%    | 65.5%                | 65.8%
CRNN (Dropout)                | 73.0%    | 65.5%                | 65.7%
CRNN (Bidirectional)          | 72.8%    | 64.3%                | 65.2%

Per-fold metrics for CRNN (Bidirectional, Dropout):

Fold | Accuracy | AvgPrecision (macro) | AvgRecall (macro)
-----|----------|----------------------|------------------
1    | 73.1%    | 65.1%                | 66.1%
2    | 80.7%    | 69.2%                | 68.9%
3    | 62.8%    | 57.3%                | 57.5%
4    | 73.6%    | 65.2%                | 64.9%
5    | 78.4%    | 70.3%                | 71.5%
6    | 73.5%    | 65.5%                | 65.9%
7    | 74.6%    | 67.0%                | 66.6%
8    | 66.7%    | 62.3%                | 61.7%
9    | 71.7%    | 60.7%                | 62.7%
10   | 79.9%    | 72.2%                | 71.8%

To Do

  • Commit Jupyter notebook with dataset exploration
  • Switch over to using pytorch/audio
  • Use torchaudio-contrib for STFT transforms
  • CRNN entirely defined in .cfg
  • Fix bug in 'infer'
  • Run 10-fold cross validation
  • Switch over to pytorch/audio since the merge
  • Comment things

crnn-audio-classification's People

Contributors

ksanjeevan, tbass134


crnn-audio-classification's Issues

RuntimeError: mat1 and mat2 shapes cannot be multiplied (24600x1 and 1025x128)

I just installed all the required packages and downloaded the dataset, and then I got:

$ python run.py train -c myconfigs/config.json --cfg crnn.cfg
Compose(
    ProcessChannels(mode=avg)
    AdditiveNoise(prob=0.3, sig=0.001, dist_type=normal)
    RandomCropLength(prob=0.4, sig=0.25, dist_type=half)
    ToTensorAudio()
)
/home/gpu-server/anaconda3/envs/audio_cls/lib/python3.9/site-packages/torchaudio/transforms.py:917: UserWarning: torchaudio.transforms.ComplexNorm has been deprecated and will be removed from future release.Please convert the input Tensor to complex type with `torch.view_as_complex` then use `torch.abs` and `torch.angle`. Please refer to https://github.com/pytorch/audio/issues/1337 for more details about torchaudio's plan to migrate to native complex type.
  warnings.warn(
/home/gpu-server/anaconda3/envs/audio_cls/lib/python3.9/site-packages/torchparse-0.1-py3.9.egg/torchparse/utils.py:54: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  return (spatial + p2 - k)//s + 1
  0%|                                                               | 0/311 [00:00<?, ?it/s]/home/gpu-server/disk/disk1/xuxin_workspace/projects/crnn-audio-classification-master/net/audio.py:10: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  return (lengths + 2 * pad - fft_length + hop_length) // hop_length
/home/gpu-server/anaconda3/envs/audio_cls/lib/python3.9/site-packages/torchaudio/transforms.py:936: UserWarning: torchaudio.functional.functional.complex_norm has been deprecated and will be removed from 0.11 release. Please convert the input Tensor to complex type with `torch.view_as_complex` then use `torch.abs`. Please refer to https://github.com/pytorch/audio/issues/1337 for more details about torchaudio's plan to migrate to native complex type.
  return F.complex_norm(complex_tensor, self.power)
  0%|                                                               | 0/311 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "/home/gpu-server/disk/disk1/xuxin_workspace/projects/crnn-audio-classification-master/run.py", line 175, in <module>
    train_main(config, args.resume)
  File "/home/gpu-server/disk/disk1/xuxin_workspace/projects/crnn-audio-classification-master/run.py", line 115, in train_main
    trainer.train()
  File "/home/gpu-server/disk/disk1/xuxin_workspace/projects/crnn-audio-classification-master/train/base_trainer.py", line 88, in train
    result = self._train_epoch(epoch)
  File "/home/gpu-server/disk/disk1/xuxin_workspace/projects/crnn-audio-classification-master/train/trainer.py", line 68, in _train_epoch
    output = self.model(data)
  File "/home/gpu-server/anaconda3/envs/audio_cls/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/gpu-server/disk/disk1/xuxin_workspace/projects/crnn-audio-classification-master/net/model.py", line 52, in forward
    xt, lengths = self.spec(xt, lengths)                
  File "/home/gpu-server/anaconda3/envs/audio_cls/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/gpu-server/disk/disk1/xuxin_workspace/projects/crnn-audio-classification-master/net/audio.py", line 55, in forward
    x = self.mel_scale(x)
  File "/home/gpu-server/anaconda3/envs/audio_cls/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/gpu-server/anaconda3/envs/audio_cls/lib/python3.9/site-packages/torchaudio/transforms.py", line 386, in forward
    mel_specgram = torch.matmul(specgram.transpose(-1, -2), self.fb).transpose(-1, -2)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (24600x1 and 1025x128)

which is sad

Need help as I am a beginner in audio classification

I am unable to figure out what I should return for custom audio data for the CRNN model.

For an image dataset class, we return the image array (NumPy) and its label through the __getitem__ function of the custom dataset class.
Likewise, what should I return, along with the audio's label, from a custom dataset class for my own audio dataset?
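
A minimal sketch of such a dataset, assuming torchaudio for loading and a (waveform, sample_rate, label) return value (the repo's exact batch format may differ):

import torchaudio
from torch.utils.data import Dataset

class CustomAudioDataset(Dataset):
    # Returns the raw signal, its sample rate, and the label per item.
    def __init__(self, filepaths, labels):
        self.filepaths, self.labels = filepaths, labels

    def __len__(self):
        return len(self.filepaths)

    def __getitem__(self, idx):
        waveform, sample_rate = torchaudio.load(self.filepaths[idx])
        return waveform, sample_rate, self.labels[idx]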

How to transform the model file to TorchScript

Hello, I wonder how I can transform the model file to TorchScript, so I can call it from C++.
I looked up a lot of information, but what I found covers only very simple cases.

for example:
model = torchvision.models.resnet18()
example = torch.rand(1, 3, 224, 224)
traced_script_module = torch.jit.trace(model, example)

or like this:
my_module = MyModule(10,20)
sm = torch.jit.script(my_module)

But that does not suit our case; I can't transform the model like that. Can you help me with this problem?
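
For reference, tracing would look like the image case but with an audio-shaped example input; a hypothetical sketch (the input shape is an assumption, and AudioCRNN's variable-length forward may require torch.jit.script or model changes instead):

model.eval()
example = torch.rand(1, 1, 22050)  # assumed (batch, channels, samples) input
traced = torch.jit.trace(model, example)
traced.save("audio_crnn_traced.pt")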

I have a problem while running this project

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py", line 2897, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'end'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run.py", line 176, in <module>
    train_main(config, args.resume)
  File "run.py", line 78, in train_main
    data_manager = getattr(data_module, config['data']['type'])(config['data'])
  File "/tf/soundclassify/crnn-audio-git/data/data_manager.py", line 131, in __init__
    self.metadata_df = self._remove_too_small(metadata_df, 1)
  File "/tf/soundclassify/crnn-audio-git/data/data_manager.py", line 140, in _remove_too_small
    dur_cond = (df['end'] - df['start'])>=min_sec
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/frame.py", line 2980, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py", line 2899, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'end'

No License

Hi, can you please add a License to the project?

Thanks.

ValueError: optimizer got an empty parameter list

Hi, when I try to train the model, I get the following output. Any idea how to handle it?
./run.py train -c config.json --cfg arch.cfg

Compose(
    ProcessChannels(mode=avg)
    AdditiveNoise(prob=0.3, sig=0.001, dist_type=normal)
    RandomCropLength(prob=0.4, sig=0.25, dist_type=half)
    ToTensorAudio()
)
AudioCRNN(
  (spec): MelspectrogramStretch(num_mels=128, fft_length=2048, norm=spec_whiten, stretch_param=[0.4, 0.4])
  (net): ModuleDict(
    (main): Sequential()
  )
)
Trainable parameters: 0
Traceback (most recent call last):
  File "./run.py", line 176, in <module>
    train_main(config, args.resume)
  File "./run.py", line 97, in train_main
    optimizer = getattr(torch.optim, opt_name)(trainable_params, **opt_args)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/torch/optim/adam.py", line 42, in __init__
    super(Adam, self).__init__(params, defaults)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/torch/optim/optimizer.py", line 46, in __init__
    raise ValueError("optimizer got an empty parameter list")
ValueError: optimizer got an empty parameter list

Questions about transforms

Hi, thanks for your excellent work.
I noticed that there is a class called 'ImageTransforms' in transforms.py.
I want to know if this performs image transformation operations on the spectrogram.
In other words, I want to know whether image transformations are applicable to the spectrogram?
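
Spectrograms are 2-D tensors, so some image-style transforms do carry over; for example, random erasing is close in spirit to SpecAugment-style masking. A sketch (assuming mel is a (freq, time) tensor; this is illustrative, not the repo's ImageTransforms):

import torchvision.transforms as TV

erase = TV.RandomErasing(p=0.5)               # masks a random rectangle
mel_aug = erase(mel.unsqueeze(0)).squeeze(0)  # add/remove a channel dim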

Error while training using notebook

!python run.py train -c my-config.json --cfg crnn.cfg

Compose(
    ProcessChannels(mode=avg)
    AdditiveNoise(prob=0.3, sig=0.001, dist_type=normal)
    RandomCropLength(prob=0.4, sig=0.25, dist_type=half)
    ToTensorAudio()
)
Traceback (most recent call last):
  File "run.py", line 176, in <module>
    train_main(config, args.resume)
  File "run.py", line 85, in train_main
    model = getattr(net_module, m_name)(classes, config=config)
  File "/Users/dk/projects/ns/misc_git_projects/crnn-audio-classification/net/model.py", line 29, in __init__
    self.net = parse_cfg(config['cfg'], in_shape=[in_chan, self.spec.num_mels, 400])
  File "/Users/dk/py_env/ts/lib/python3.7/site-packages/torchparse/parser.py", line 139, in parse_cfg
    return CFGParser(fname).get_modules(in_shape)
  File "/Users/dk/py_env/ts/lib/python3.7/site-packages/torchparse/parser.py", line 120, in get_modules
    model = self._flow(in_shape)
  File "/Users/dk/py_env/ts/lib/python3.7/site-packages/torchparse/parser.py", line 108, in _flow
    in_shape = layer.get_out_shape()
  File "/Users/dk/py_env/ts/lib/python3.7/site-packages/torchparse/base_layers.py", line 40, in get_out_shape
    return torch.cat([channel, spatial])
RuntimeError: Expected object of scalar type Long but got scalar type Float for sequence element 1 in sequence argument at position #1 'tensors'

Model

Hello, is there an existing model to use? And can this be run on a Jetson Nano? Thank you.

EOFError: Ran out of input

I don't know why I get this error:
————————————————————
D:\Anaconda3\envs\CRNN\lib\site-packages\torchaudio\extension\extension.py:14: UserWarning: torchaudio C++ extension is not available.
  warnings.warn('torchaudio C++ extension is not available.')
D:\Anaconda3\envs\CRNN\lib\site-packages\torchaudio\backend\utils.py:64: UserWarning: The interface of "soundfile" backend is planned to change in 0.8.0 to match that of "sox_io" backend and the current interface will be removed in 0.9.0. To use the new interface, do torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE = False before setting the backend to "soundfile". Please refer to pytorch/audio#903 for the detail.
  'The interface of "soundfile" backend is planned to change in 0.8.0 to '

  0%| | 0/311 [00:00<?, ?it/s]
  0%| | 0/311 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "run.py", line 175, in <module>
    train_main(config, args.resume)
  File "run.py", line 115, in train_main
    trainer.train()
  File "D:\pycharm_work\crnn-audio-classification-master\train\base_trainer.py", line 88, in train
    result = self._train_epoch(epoch)
  File "D:\pycharm_work\crnn-audio-classification-master\train\trainer.py", line 61, in _train_epoch
    for batch_idx, batch in enumerate(_trange):
  File "D:\Anaconda3\envs\CRNN\lib\site-packages\tqdm\std.py", line 1180, in __iter__
    for obj in iterable:
  File "D:\Anaconda3\envs\CRNN\lib\site-packages\torch\utils\data\dataloader.py", line 352, in __iter__
    return self._get_iterator()
  File "D:\Anaconda3\envs\CRNN\lib\site-packages\torch\utils\data\dataloader.py", line 294, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "D:\Anaconda3\envs\CRNN\lib\site-packages\torch\utils\data\dataloader.py", line 801, in __init__
    w.start()
  File "D:\Anaconda3\envs\CRNN\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "D:\Anaconda3\envs\CRNN\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "D:\Anaconda3\envs\CRNN\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "D:\Anaconda3\envs\CRNN\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
    reduction.dump(process_obj, to_child)
  File "D:\Anaconda3\envs\CRNN\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'AugmentationTransform._get_dist..'
D:\Anaconda3\envs\CRNN\lib\site-packages\torchaudio\extension\extension.py:14: UserWarning: torchaudio C++ extension is not available.
  warnings.warn('torchaudio C++ extension is not available.')
D:\Anaconda3\envs\CRNN\lib\site-packages\torchaudio\backend\utils.py:64: UserWarning: The interface of "soundfile" backend is planned to change in 0.8.0 to match that of "sox_io" backend and the current interface will be removed in 0.9.0. To use the new interface, do torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE = False before setting the backend to "soundfile". Please refer to pytorch/audio#903 for the detail.
  'The interface of "soundfile" backend is planned to change in 0.8.0 to '
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "D:\Anaconda3\envs\CRNN\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "D:\Anaconda3\envs\CRNN\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input

how can I get the input size

Now I want to convert the model to Caffe, so I need to know the input size, like this: input = torch.ones([1, 3, 224, 224])

Trainable parameters: 0

When I run run.py, I log the model and get the following print:

AudioCRNN(
  (spec): MelspectrogramStretch()
  (net): ModuleDict(
    (main): Sequential()
  )
)
Trainable parameters: 0
Traceback (most recent call last):
  File "./run.py", line 175, in <module>
    train_main(config, args.resume)
  File "./run.py", line 96, in train_main
    optimizer = getattr(torch.optim, opt_name)(trainable_params, **opt_args)
  File "/home/wyanqing/.conda/envs/yq/lib/python3.7/site-packages/torch/optim/adam.py", line 48, in __init__
    super(Adam, self).__init__(params, defaults)
  File "/home/wyanqing/.conda/envs/yq/lib/python3.7/site-packages/torch/optim/optimizer.py", line 47, in __init__
    raise ValueError("optimizer got an empty parameter list")
ValueError: optimizer got an empty parameter list

It seems that there is some problem with the model. How should I fix this? Thanks!

macOS?

Can this project run on macOS? If so, can some notes be added? I get the following errors (on Catalina). It appears to be a problem with loading the model.

AttributeError: Can't pickle local object 'AugmentationTransform._get_dist..'

tqdm _trange = 0%| | 0/5 [00:00<?, ?it/s]
  0%| | 0/5 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/Users/steve/git/crnn-audio-classification/run.py", line 194, in <module>
    train_main(config, args.resume)
  File "/Users/steve/git/crnn-audio-classification/run.py", line 131, in train_main
    trainer.train()
  File "/Users/steve/git/crnn-audio-classification/train/base_trainer.py", line 89, in train
    result = self._train_epoch(epoch)
  File "/Users/steve/git/crnn-audio-classification/train/trainer.py", line 62, in _train_epoch
    for batch_idx, batch in enumerate(_trange):
  File "/usr/local/lib/python3.8/site-packages/tqdm/std.py", line 1102, in __iter__
    for obj in iterable:
  File "/usr/local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 352, in __iter__
    return self._get_iterator()
  File "/usr/local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 294, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/usr/local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 801, in __init__
    w.start()
  File "/usr/local/Cellar/python@3.8/3.8.6_2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/usr/local/Cellar/python@3.8/3.8.6_2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/usr/local/Cellar/python@3.8/3.8.6_2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/usr/local/Cellar/python@3.8/3.8.6_2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/usr/local/Cellar/python@3.8/3.8.6_2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/local/Cellar/python@3.8/3.8.6_2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/usr/local/Cellar/python@3.8/3.8.6_2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'AugmentationTransform._get_dist..'
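
A common generic workaround for this "Can't pickle local object" failure under spawn-based multiprocessing (the default on macOS and Windows) is to disable DataLoader workers so the transforms never need to be pickled; in this repo the worker count presumably comes from config.json. A generic sketch:

from torch.utils.data import DataLoader

# num_workers=0 keeps data loading in the main process, so locally
# defined lambdas inside the transforms are never pickled.
loader = DataLoader(dataset, batch_size=32, num_workers=0)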

How can I customize the model architecture in crnn.cfg?

It seems possible to modify the numbers in crnn.cfg, but how do I add more layers in crnn.cfg?
I am a newbie with this type of implementation.
Is REPEATx2 a predefined keyword in PyTorch, or a hardcoded variable name in your code?
If I would like to put 10 CNN layers before the LSTM, how can I modify it?
Also, if I would like to use a different input size, where should I make the change?

[convs_module]
    [conv2d]
        out_channels=16
        kernel_size=3
        stride=1
        padding=valid
    [batchnorm2d]
    [elu]
    [maxpool2d]
        kernel_size=3
        stride=3
    [dropout]
        p=0.1

    REPEATx2
        [conv2d]
            out_channels=32
            kernel_size=4
            stride=1
            padding=valid
        [batchnorm2d]
        [elu]
        [maxpool2d]
            kernel_size=4
            stride=4
        [dropout]
            p=0.1
    END

[moddims]
    permute=[2,1,0]
    collapse=[1,2]

[recur_module]
    [lstm]
        hidden_size = 64
        num_layers = 3
        bidirectional=True

[moddims]
    permute=[1]
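
For what it's worth, REPEATxN / END looks like torchparse's own repetition directive rather than anything built into PyTorch, so changing REPEATx2 to e.g. REPEATx10 should stack more conv blocks. The cfg is turned into modules via torchparse's parse_cfg, as the tracebacks above show; a sketch, where the in_shape of [channels, mel bands, frames] follows the repo's own usage in net/model.py:

from torchparse import parse_cfg

# Build the model's nn.ModuleDict from the cfg file;
# in_shape = [channels, n_mels, frames].
net = parse_cfg("crnn.cfg", in_shape=[1, 128, 400])
print(net)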

About UrbanSound8K

Regarding the UrbanSound8K training dataset: I read the dataset introduction. In my training, should I use 9 folds for training and leave the remaining fold for validation, repeating this process? Doesn't this process lack a true test dataset?
