ksanjeevan / crnn-audio-classification

UrbanSound classification using Convolutional Recurrent Networks in PyTorch

License: MIT

Languages: Python 73.79%, Jupyter Notebook 26.21%

Topics: audio, lstm, crnn, spectrogram, melspectrogram, convnet, rnn, audio-classification, pytorch

crnn-audio-classification's Introduction

PyTorch Audio Classification: Urban Sounds

Classification of audio with variable length using a CNN + LSTM architecture on the UrbanSound8K dataset.

Example results:

Contents

Dependencies

Features

  • Easily define CRNN in .cfg format
  • Spectrogram computation on GPU
  • Audio data augmentation: Cropping, White Noise, Time Stretching (using a phase vocoder on GPU!); see the sketch below
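
As a rough sketch of the GPU spectrogram and stretching idea with torchaudio (not the repo's exact pipeline; the sample rate and input length here are assumptions for illustration):

import torch
import torchaudio.transforms as T

device = "cuda" if torch.cuda.is_available() else "cpu"

spec = T.Spectrogram(n_fft=2048, power=None).to(device)            # complex STFT
stretch = T.TimeStretch(hop_length=1024, n_freq=1025).to(device)   # phase vocoder
to_mel = T.MelScale(n_mels=128, sample_rate=22050, n_stft=1025).to(device)

waveform = torch.randn(1, 22050, device=device)  # 1 s of dummy mono audio
x = spec(waveform)                    # (1, 1025, frames), complex
x = stretch(x, overriding_rate=1.2)   # time stretch computed on the GPU
mel = to_mel(x.abs() ** 2)            # (1, 128, frames')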

Models

CRNN architecture:

Printing the model defined with torchparse:

AudioCRNN(
  (spec): MelspectrogramStretch(num_bands=128, fft_len=2048, norm=spec_whiten, stretch_param=[0.4, 0.4])
  (net): ModuleDict(
    (convs): Sequential(
      (conv2d_0): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=[0, 0])
      (batchnorm2d_0): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (elu_0): ELU(alpha=1.0)
      (maxpool2d_0): MaxPool2d(kernel_size=3, stride=3, padding=0, dilation=1, ceil_mode=False)
      (dropout_0): Dropout(p=0.1)
      (conv2d_1): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=[0, 0])
      (batchnorm2d_1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (elu_1): ELU(alpha=1.0)
      (maxpool2d_1): MaxPool2d(kernel_size=4, stride=4, padding=0, dilation=1, ceil_mode=False)
      (dropout_1): Dropout(p=0.1)
      (conv2d_2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=[0, 0])
      (batchnorm2d_2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (elu_2): ELU(alpha=1.0)
      (maxpool2d_2): MaxPool2d(kernel_size=4, stride=4, padding=0, dilation=1, ceil_mode=False)
      (dropout_2): Dropout(p=0.1)
    )
    (recur): LSTM(128, 64, num_layers=2)
    (dense): Sequential(
      (dropout_3): Dropout(p=0.3)
      (batchnorm1d_0): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (linear_0): Linear(in_features=64, out_features=10, bias=True)
    )
  )
)
Trainable parameters: 139786
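
This count can be reproduced with the standard PyTorch idiom (model here is assumed to be the instantiated AudioCRNN above):

# Standard idiom for counting trainable parameters.
n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {n_params}")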

Usage

Inference

Run inference on an audio file:

./run.py /path/to/audio/file.wav -r path/to/saved/model.pth 
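
Internally, inference amounts to loading the checkpoint and running the model in eval mode. A minimal sketch, assuming the checkpoint stores the weights under a 'state_dict' key (the actual checkpoint layout may differ):

import torch

checkpoint = torch.load("path/to/saved/model.pth", map_location="cpu")
model.load_state_dict(checkpoint["state_dict"])  # key name is an assumption
model.eval()
with torch.no_grad():
    prediction = model(batch)  # `batch` prepared by the dataset transforms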

Training

./run.py train -c config.json --cfg arch.cfg

Augmentation

Dataset transforms:

Compose(
    ProcessChannels(mode=avg)
    AdditiveNoise(prob=0.3, sig=0.001, dist_type=normal)
    RandomCropLength(prob=0.4, sig=0.25, dist_type=half)
    ToTensorAudio()
)
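
For illustration, an additive-noise transform like the one printed above could be sketched as follows (parameter names mirror the printout; the implementation itself is an assumption, not the repo's code):

import numpy as np

class AdditiveNoise:
    # Add Gaussian noise to the raw signal with probability `prob`.
    def __init__(self, prob=0.3, sig=0.001):
        self.prob, self.sig = prob, sig

    def __call__(self, signal):
        if np.random.rand() < self.prob:
            signal = signal + np.random.normal(0.0, self.sig, size=signal.shape)
        return signal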

As well as time stretching:

TensorboardX

Evaluation

./run.py eval -r /path/to/saved/model.pth

Then obtain the defined metrics:

100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 34/34 [00:03<00:00, 12.68it/s]
{'avg_precision': '0.725', 'avg_recall': '0.719', 'accuracy': '0.804'}
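
These are the usual macro-averaged metrics; a sketch of computing them with scikit-learn, assuming y_true and y_pred have been collected over the evaluation fold:

from sklearn.metrics import accuracy_score, precision_score, recall_score

metrics = {
    "avg_precision": precision_score(y_true, y_pred, average="macro"),
    "avg_recall": recall_score(y_true, y_pred, average="macro"),
    "accuracy": accuracy_score(y_true, y_pred),
}
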
10-Fold Cross Validation

Arch                          | Accuracy | AvgPrecision (macro) | AvgRecall (macro)
------------------------------|----------|----------------------|------------------
CNN                           | 71.0%    | 63.4%                | 63.5%
CRNN                          | 72.3%    | 64.3%                | 65.0%
CRNN (Bidirectional, Dropout) | 73.5%    | 65.5%                | 65.8%
CRNN (Dropout)                | 73.0%    | 65.5%                | 65.7%
CRNN (Bidirectional)          | 72.8%    | 64.3%                | 65.2%

Per-fold metrics for CRNN (Bidirectional, Dropout):

Fold | Accuracy | AvgPrecision (macro) | AvgRecall (macro)
-----|----------|----------------------|------------------
1    | 73.1%    | 65.1%                | 66.1%
2    | 80.7%    | 69.2%                | 68.9%
3    | 62.8%    | 57.3%                | 57.5%
4    | 73.6%    | 65.2%                | 64.9%
5    | 78.4%    | 70.3%                | 71.5%
6    | 73.5%    | 65.5%                | 65.9%
7    | 74.6%    | 67.0%                | 66.6%
8    | 66.7%    | 62.3%                | 61.7%
9    | 71.7%    | 60.7%                | 62.7%
10   | 79.9%    | 72.2%                | 71.8%

To Do

  • Commit Jupyter notebook with dataset exploration
  • Switch over to using pytorch/audio
  • Use torchaudio-contrib for STFT transforms
  • CRNN entirely defined in .cfg
  • Fix bug in 'infer'
  • Run 10-fold cross validation
  • Switch over to pytorch/audio since the merge
  • Comment things

crnn-audio-classification's People

Contributors

ksanjeevan, tbass134


crnn-audio-classification's Issues

RuntimeError: mat1 and mat2 shapes cannot be multiplied (24600x1 and 1025x128)

I just installed all the required packages and downloaded the dataset, and then I got:

$ python run.py train -c myconfigs/config.json --cfg crnn.cfg
Compose(
    ProcessChannels(mode=avg)
    AdditiveNoise(prob=0.3, sig=0.001, dist_type=normal)
    RandomCropLength(prob=0.4, sig=0.25, dist_type=half)
    ToTensorAudio()
)
/home/gpu-server/anaconda3/envs/audio_cls/lib/python3.9/site-packages/torchaudio/transforms.py:917: UserWarning: torchaudio.transforms.ComplexNorm has been deprecated and will be removed from future release.Please convert the input Tensor to complex type with `torch.view_as_complex` then use `torch.abs` and `torch.angle`. Please refer to https://github.com/pytorch/audio/issues/1337 for more details about torchaudio's plan to migrate to native complex type.
  warnings.warn(
/home/gpu-server/anaconda3/envs/audio_cls/lib/python3.9/site-packages/torchparse-0.1-py3.9.egg/torchparse/utils.py:54: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  return (spatial + p2 - k)//s + 1
  0%|                                                               | 0/311 [00:00<?, ?it/s]/home/gpu-server/disk/disk1/xuxin_workspace/projects/crnn-audio-classification-master/net/audio.py:10: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  return (lengths + 2 * pad - fft_length + hop_length) // hop_length
/home/gpu-server/anaconda3/envs/audio_cls/lib/python3.9/site-packages/torchaudio/transforms.py:936: UserWarning: torchaudio.functional.functional.complex_norm has been deprecated and will be removed from 0.11 release. Please convert the input Tensor to complex type with `torch.view_as_complex` then use `torch.abs`. Please refer to https://github.com/pytorch/audio/issues/1337 for more details about torchaudio's plan to migrate to native complex type.
  return F.complex_norm(complex_tensor, self.power)
  0%|                                                               | 0/311 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "/home/gpu-server/disk/disk1/xuxin_workspace/projects/crnn-audio-classification-master/run.py", line 175, in <module>
    train_main(config, args.resume)
  File "/home/gpu-server/disk/disk1/xuxin_workspace/projects/crnn-audio-classification-master/run.py", line 115, in train_main
    trainer.train()
  File "/home/gpu-server/disk/disk1/xuxin_workspace/projects/crnn-audio-classification-master/train/base_trainer.py", line 88, in train
    result = self._train_epoch(epoch)
  File "/home/gpu-server/disk/disk1/xuxin_workspace/projects/crnn-audio-classification-master/train/trainer.py", line 68, in _train_epoch
    output = self.model(data)
  File "/home/gpu-server/anaconda3/envs/audio_cls/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/gpu-server/disk/disk1/xuxin_workspace/projects/crnn-audio-classification-master/net/model.py", line 52, in forward
    xt, lengths = self.spec(xt, lengths)                
  File "/home/gpu-server/anaconda3/envs/audio_cls/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/gpu-server/disk/disk1/xuxin_workspace/projects/crnn-audio-classification-master/net/audio.py", line 55, in forward
    x = self.mel_scale(x)
  File "/home/gpu-server/anaconda3/envs/audio_cls/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/gpu-server/anaconda3/envs/audio_cls/lib/python3.9/site-packages/torchaudio/transforms.py", line 386, in forward
    mel_specgram = torch.matmul(specgram.transpose(-1, -2), self.fb).transpose(-1, -2)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (24600x1 and 1025x128)

which is sad

Need help as I am a beginner in audio classification

I am unable to figure out what I should return for custom audio data for the CRNN model.

For an image dataset class, we return the image array (NumPy) and its label through the __getitem__ function of the custom dataset class.
Likewise, what should I return, along with the audio's label, from a custom dataset class for my own audio dataset?
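
A minimal sketch of such a dataset, assuming torchaudio for loading and a (waveform, sample_rate, label) return value (the repo's exact batch format may differ):

import torchaudio
from torch.utils.data import Dataset

class CustomAudioDataset(Dataset):
    # Returns the raw signal, its sample rate, and the label per item.
    def __init__(self, filepaths, labels):
        self.filepaths, self.labels = filepaths, labels

    def __len__(self):
        return len(self.filepaths)

    def __getitem__(self, idx):
        waveform, sample_rate = torchaudio.load(self.filepaths[idx])
        return waveform, sample_rate, self.labels[idx]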

How to transform the model file to TorchScript

Hello, I wonder how I can transform the model file to TorchScript, so I can call it from C++.
I looked up a lot of information, but what I found covers only very simple cases.

for example:
model = torchvision.models.resnet18()
example = torch.rand(1, 3, 224, 224)
traced_script_module = torch.jit.trace(model, example)

or like this:
my_module = MyModule(10,20)
sm = torch.jit.script(my_module)

But that does not suit our case; I can't transform the model like that. Can you help me with this problem?
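
For reference, tracing would look like the image case but with an audio-shaped example input; a hypothetical sketch (the input shape is an assumption, and AudioCRNN's variable-length forward may require torch.jit.script or model changes instead):

model.eval()
example = torch.rand(1, 1, 22050)  # assumed (batch, channels, samples) input
traced = torch.jit.trace(model, example)
traced.save("audio_crnn_traced.pt")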

I have a problem while running this project

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py", line 2897, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'end'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run.py", line 176, in <module>
    train_main(config, args.resume)
  File "run.py", line 78, in train_main
    data_manager = getattr(data_module, config['data']['type'])(config['data'])
  File "/tf/soundclassify/crnn-audio-git/data/data_manager.py", line 131, in __init__
    self.metadata_df = self._remove_too_small(metadata_df, 1)
  File "/tf/soundclassify/crnn-audio-git/data/data_manager.py", line 140, in _remove_too_small
    dur_cond = (df['end'] - df['start'])>=min_sec
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/frame.py", line 2980, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py", line 2899, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'end'

No License

Hi, can you please add a License to the project?

Thanks.

ValueError: optimizer got an empty parameter list

Hi, when I try to train the model, I get the following output. Any idea how to handle it?
./run.py train -c config.json --cfg arch.cfg

Compose(
    ProcessChannels(mode=avg)
    AdditiveNoise(prob=0.3, sig=0.001, dist_type=normal)
    RandomCropLength(prob=0.4, sig=0.25, dist_type=half)
    ToTensorAudio()
)
AudioCRNN(
  (spec): MelspectrogramStretch(num_mels=128, fft_length=2048, norm=spec_whiten, stretch_param=[0.4, 0.4])
  (net): ModuleDict(
    (main): Sequential()
  )
)
Trainable parameters: 0
Traceback (most recent call last):
  File "./run.py", line 176, in <module>
    train_main(config, args.resume)
  File "./run.py", line 97, in train_main
    optimizer = getattr(torch.optim, opt_name)(trainable_params, **opt_args)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/torch/optim/adam.py", line 42, in __init__
    super(Adam, self).__init__(params, defaults)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/torch/optim/optimizer.py", line 46, in __init__
    raise ValueError("optimizer got an empty parameter list")
ValueError: optimizer got an empty parameter list

Questions about transforms

Hi, thanks for your excellent work.
I noticed that there is a class called 'ImageTransforms' in transforms.py.
I want to know if this performs image transformation operations on the spectrogram.
In other words, I want to know whether image transformations are applicable to the spectrogram?
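
Spectrograms are 2-D tensors, so some image-style transforms do carry over; for example, random erasing is close in spirit to SpecAugment-style masking. A sketch (assuming mel is a (freq, time) tensor; this is illustrative, not the repo's ImageTransforms):

import torchvision.transforms as TV

erase = TV.RandomErasing(p=0.5)               # masks a random rectangle
mel_aug = erase(mel.unsqueeze(0)).squeeze(0)  # add/remove a channel dim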

Error while training using notebook

!python run.py train -c my-config.json --cfg crnn.cfg

Compose(
    ProcessChannels(mode=avg)
    AdditiveNoise(prob=0.3, sig=0.001, dist_type=normal)
    RandomCropLength(prob=0.4, sig=0.25, dist_type=half)
    ToTensorAudio()
)
Traceback (most recent call last):
  File "run.py", line 176, in <module>
    train_main(config, args.resume)
  File "run.py", line 85, in train_main
    model = getattr(net_module, m_name)(classes, config=config)
  File "/Users/dk/projects/ns/misc_git_projects/crnn-audio-classification/net/model.py", line 29, in __init__
    self.net = parse_cfg(config['cfg'], in_shape=[in_chan, self.spec.num_mels, 400])
  File "/Users/dk/py_env/ts/lib/python3.7/site-packages/torchparse/parser.py", line 139, in parse_cfg
    return CFGParser(fname).get_modules(in_shape)
  File "/Users/dk/py_env/ts/lib/python3.7/site-packages/torchparse/parser.py", line 120, in get_modules
    model = self._flow(in_shape)
  File "/Users/dk/py_env/ts/lib/python3.7/site-packages/torchparse/parser.py", line 108, in _flow
    in_shape = layer.get_out_shape()
  File "/Users/dk/py_env/ts/lib/python3.7/site-packages/torchparse/base_layers.py", line 40, in get_out_shape
    return torch.cat([channel, spatial])
RuntimeError: Expected object of scalar type Long but got scalar type Float for sequence element 1 in sequence argument at position #1 'tensors'

Model

Hello, is there an existing model to use? And can this be run on a Jetson Nano? Thank you.

EOFError: Ran out of input

I don't know why I get this error:
————————————————————
D:\Anaconda3\envs\CRNN\lib\site-packages\torchaudio\extension\extension.py:14: UserWarning: torchaudio C++ extension is not available.
  warnings.warn('torchaudio C++ extension is not available.')
D:\Anaconda3\envs\CRNN\lib\site-packages\torchaudio\backend\utils.py:64: UserWarning: The interface of "soundfile" backend is planned to change in 0.8.0 to match that of "sox_io" backend and the current interface will be removed in 0.9.0. To use the new interface, do torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE = False before setting the backend to "soundfile". Please refer to pytorch/audio#903 for the detail.
  'The interface of "soundfile" backend is planned to change in 0.8.0 to '

  0%| | 0/311 [00:00<?, ?it/s]
  0%| | 0/311 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "run.py", line 175, in <module>
    train_main(config, args.resume)
  File "run.py", line 115, in train_main
    trainer.train()
  File "D:\pycharm_work\crnn-audio-classification-master\train\base_trainer.py", line 88, in train
    result = self._train_epoch(epoch)
  File "D:\pycharm_work\crnn-audio-classification-master\train\trainer.py", line 61, in _train_epoch
    for batch_idx, batch in enumerate(_trange):
  File "D:\Anaconda3\envs\CRNN\lib\site-packages\tqdm\std.py", line 1180, in __iter__
    for obj in iterable:
  File "D:\Anaconda3\envs\CRNN\lib\site-packages\torch\utils\data\dataloader.py", line 352, in __iter__
    return self._get_iterator()
  File "D:\Anaconda3\envs\CRNN\lib\site-packages\torch\utils\data\dataloader.py", line 294, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "D:\Anaconda3\envs\CRNN\lib\site-packages\torch\utils\data\dataloader.py", line 801, in __init__
    w.start()
  File "D:\Anaconda3\envs\CRNN\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "D:\Anaconda3\envs\CRNN\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "D:\Anaconda3\envs\CRNN\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "D:\Anaconda3\envs\CRNN\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
    reduction.dump(process_obj, to_child)
  File "D:\Anaconda3\envs\CRNN\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'AugmentationTransform._get_dist..'
D:\Anaconda3\envs\CRNN\lib\site-packages\torchaudio\extension\extension.py:14: UserWarning: torchaudio C++ extension is not available.
  warnings.warn('torchaudio C++ extension is not available.')
D:\Anaconda3\envs\CRNN\lib\site-packages\torchaudio\backend\utils.py:64: UserWarning: The interface of "soundfile" backend is planned to change in 0.8.0 to match that of "sox_io" backend and the current interface will be removed in 0.9.0. To use the new interface, do torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE = False before setting the backend to "soundfile". Please refer to pytorch/audio#903 for the detail.
  'The interface of "soundfile" backend is planned to change in 0.8.0 to '
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "D:\Anaconda3\envs\CRNN\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "D:\Anaconda3\envs\CRNN\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input

how can I get the input size

Now I want to convert the model to Caffe, so I need to know the input size, like this: input = torch.ones([1, 3, 224, 224])

Trainable parameters: 0

When I run run.py, I log the model and get the following print:

AudioCRNN(
  (spec): MelspectrogramStretch()
  (net): ModuleDict(
    (main): Sequential()
  )
)
Trainable parameters: 0
Traceback (most recent call last):
  File "./run.py", line 175, in <module>
    train_main(config, args.resume)
  File "./run.py", line 96, in train_main
    optimizer = getattr(torch.optim, opt_name)(trainable_params, **opt_args)
  File "/home/wyanqing/.conda/envs/yq/lib/python3.7/site-packages/torch/optim/adam.py", line 48, in __init__
    super(Adam, self).__init__(params, defaults)
  File "/home/wyanqing/.conda/envs/yq/lib/python3.7/site-packages/torch/optim/optimizer.py", line 47, in __init__
    raise ValueError("optimizer got an empty parameter list")
ValueError: optimizer got an empty parameter list

It seems that there is some problem with the model. How should I fix this? Thanks!

macOS?

Can this project run on macOS? If so, can some notes be added? I get the following errors (on Catalina). It appears to be a problem with loading the model.

AttributeError: Can't pickle local object 'AugmentationTransform._get_dist..'

tqdm _trange = 0%| | 0/5 [00:00<?, ?it/s]
  0%| | 0/5 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/Users/steve/git/crnn-audio-classification/run.py", line 194, in <module>
    train_main(config, args.resume)
  File "/Users/steve/git/crnn-audio-classification/run.py", line 131, in train_main
    trainer.train()
  File "/Users/steve/git/crnn-audio-classification/train/base_trainer.py", line 89, in train
    result = self._train_epoch(epoch)
  File "/Users/steve/git/crnn-audio-classification/train/trainer.py", line 62, in _train_epoch
    for batch_idx, batch in enumerate(_trange):
  File "/usr/local/lib/python3.8/site-packages/tqdm/std.py", line 1102, in __iter__
    for obj in iterable:
  File "/usr/local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 352, in __iter__
    return self._get_iterator()
  File "/usr/local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 294, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/usr/local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 801, in __init__
    w.start()
  File "/usr/local/Cellar/python@3.8/3.8.6_2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/usr/local/Cellar/python@3.8/3.8.6_2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/usr/local/Cellar/python@3.8/3.8.6_2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/usr/local/Cellar/python@3.8/3.8.6_2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/usr/local/Cellar/python@3.8/3.8.6_2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/local/Cellar/python@3.8/3.8.6_2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/usr/local/Cellar/python@3.8/3.8.6_2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'AugmentationTransform._get_dist..'
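
A common generic workaround for this "Can't pickle local object" failure under spawn-based multiprocessing (the default on macOS and Windows) is to disable DataLoader workers so the transforms never need to be pickled; in this repo the worker count presumably comes from config.json. A generic sketch:

from torch.utils.data import DataLoader

# num_workers=0 keeps data loading in the main process, so locally
# defined lambdas inside the transforms are never pickled.
loader = DataLoader(dataset, batch_size=32, num_workers=0)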

How can I customize the model architecture in crnn.cfg?

It seems possible to modify the numbers in crnn.cfg, but how do I add more layers in crnn.cfg?
I am a newbie with this type of implementation.
Is REPEATx2 a predefined keyword in PyTorch, or a hardcoded variable name in your code?
If I would like to put 10 CNN layers before the LSTM, how can I modify it?
Also, if I would like to use a different input size, where should I make the change?

[convs_module]
    [conv2d]
        out_channels=16
        kernel_size=3
        stride=1
        padding=valid
    [batchnorm2d]
    [elu]
    [maxpool2d]
        kernel_size=3
        stride=3
    [dropout]
        p=0.1

    REPEATx2
        [conv2d]
            out_channels=32
            kernel_size=4
            stride=1
            padding=valid
        [batchnorm2d]
        [elu]
        [maxpool2d]
            kernel_size=4
            stride=4
        [dropout]
            p=0.1
    END

[moddims]
    permute=[2,1,0]
    collapse=[1,2]

[recur_module]
    [lstm]
        hidden_size = 64
        num_layers = 3
        bidirectional=True

[moddims]
    permute=[1]
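
For what it's worth, REPEATxN / END looks like torchparse's own repetition directive rather than anything built into PyTorch, so changing REPEATx2 to e.g. REPEATx10 should stack more conv blocks. The cfg is turned into modules via torchparse's parse_cfg, as the tracebacks above show; a sketch, where the in_shape of [channels, mel bands, frames] follows the repo's own usage in net/model.py:

from torchparse import parse_cfg

# Build the model's nn.ModuleDict from the cfg file;
# in_shape = [channels, n_mels, frames].
net = parse_cfg("crnn.cfg", in_shape=[1, 128, 400])
print(net)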

About UrbanSound8K

Regarding the UrbanSound8K training dataset: I read the dataset introduction. In my training, should I use 9 folds for training and leave the remaining fold for validation, repeating this process? Doesn't this process lack a true test dataset?
