seannaren / deepspeech.pytorch
Speech Recognition using DeepSpeech2.
License: MIT License
When I try to continue from the checkpoint deepspeech_6.pth.tar, I receive this:
Loading checkpoint model ./models/deepspeech_6.pth.tar
6
File "train.py", line 324, in <module>
main()
File "train.py", line 147, in main
start_iter = int(package.get('iteration', -1)) + 1
ValueError: invalid literal for int() with base 10: 'N/A'
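The traceback suggests that the checkpoint stores the string 'N/A' under the 'iteration' key, which int() cannot parse. A minimal sketch of a more tolerant lookup follows, assuming `package` is the dict loaded from the checkpoint. The helper name `parse_start_iter` is hypothetical and not part of train.py, and treating 'N/A' as "start from iteration 0" is an assumption about what the marker means.

```python
def parse_start_iter(package):
    """Return the iteration to resume from, or 0 when none was recorded."""
    raw = package.get('iteration', -1)
    try:
        return int(raw) + 1
    except (TypeError, ValueError):
        # Non-numeric markers such as 'N/A' (or None) are treated as
        # "no iteration recorded", so training resumes at iteration 0.
        return 0


print(parse_start_iter({'iteration': 'N/A'}))  # non-numeric marker
print(parse_start_iter({'iteration': 41}))     # numeric value
```

Whether falling back to 0 is the right behaviour depends on how the checkpoint was written, so this is only a starting point.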
Hi,
I'm having trouble installing the PyTorch bindings using Anaconda. I'm using Ubuntu 14.04.3, gcc 5.4.1 and CUDA 8.
Everything works fine until I try to import the warp-ctc bindings in Python:
>>> from warpctc_pytorch import CTCLoss
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/sorson/anaconda3/lib/python3.6/site-packages/warpctc_pytorch/__init__.py", line 7, in <module>
from ._warp_ctc import lib as _lib, ffi as _ffi
ImportError: /home/sorson/anaconda3/lib/python3.6/site-packages/torch/lib/../../../../libgomp.so.1: version `GOMP_4.0' not found (required by /home/sorson/warp-ctc/build/libwarpctc.so)
Output of installation:
sorson@phoebe:~/warp-ctc/build$ cmake ..
-- The C compiler identification is GNU 5.4.1
-- The CXX compiler identification is GNU 5.4.1
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Found CUDA: /usr/local/cuda-8.0 (found suitable version "8.0", minimum required is "6.5")
-- cuda found TRUE
CMake Warning at CMakeLists.txt:48 (FIND_PACKAGE):
By not providing "FindTorch.cmake" in CMAKE_MODULE_PATH this project has
asked CMake to find a package configuration file provided by "Torch", but
CMake did not find one.
Could not find a package configuration file provided by "Torch" with any of
the following names:
TorchConfig.cmake
torch-config.cmake
Add the installation prefix of "Torch" to CMAKE_PREFIX_PATH or set
"Torch_DIR" to a directory containing one of the above files. If "Torch"
provides a separate development package or SDK, be sure it has been
installed.
-- Torch found Torch_DIR-NOTFOUND
-- Building shared library with GPU support
-- Configuring done
-- Generating done
-- Build files have been written to: /home/sorson/warp-ctc/build
sorson@phoebe:~/warp-ctc/build$ make
[ 25%] Building NVCC (Device) object CMakeFiles/warpctc.dir/src/./warpctc_generated_reduce.cu.o
[ 50%] Building NVCC (Device) object CMakeFiles/warpctc.dir/src/./warpctc_generated_ctc_entrypoint.cu.o
Scanning dependencies of target warpctc
Linking CXX shared library libwarpctc.so
[ 50%] Built target warpctc
Scanning dependencies of target test_cpu
[ 75%] Building CXX object CMakeFiles/test_cpu.dir/tests/test_cpu.cpp.o
Linking CXX executable test_cpu
[ 75%] Built target test_cpu
[100%] Building NVCC (Device) object CMakeFiles/test_gpu.dir/tests/./test_gpu_generated_test_gpu.cu.o
Scanning dependencies of target test_gpu
Linking CXX executable test_gpu
[100%] Built target test_gpu
sorson@phoebe:~/warp-ctc/pytorch_binding$ python setup.py install
generating build/_warp_ctc.c
regenerated: 'build/_warp_ctc.c'
running install
running build
running build_py
creating build/lib.linux-x86_64-3.6
creating build/lib.linux-x86_64-3.6/warpctc_pytorch
copying warpctc_pytorch/__init__.py -> build/lib.linux-x86_64-3.6/warpctc_pytorch
running build_ext
building 'warpctc_pytorch._warp_ctc' extension
creating build/temp.linux-x86_64-3.6
creating build/temp.linux-x86_64-3.6/build
creating build/temp.linux-x86_64-3.6/home
creating build/temp.linux-x86_64-3.6/home/sorson
creating build/temp.linux-x86_64-3.6/home/sorson/warp-ctc
creating build/temp.linux-x86_64-3.6/home/sorson/warp-ctc/pytorch_binding
creating build/temp.linux-x86_64-3.6/home/sorson/warp-ctc/pytorch_binding/src
gcc -pthread -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/sorson/anaconda3/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include -I/home/sorson/anaconda3/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/TH -I/home/sorson/anaconda3/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/THC -I/usr/local/cuda/include -I/home/sorson/warp-ctc/include -I/home/sorson/anaconda3/include/python3.6m -c build/_warp_ctc.c -o build/temp.linux-x86_64-3.6/build/_warp_ctc.o -std=c++11 -fPIC -DWARPCTC_ENABLE_GPU
cc1: warning: command line option ‘-std=c++11’ is valid for C++/ObjC++ but not for C
gcc -pthread -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/sorson/anaconda3/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include -I/home/sorson/anaconda3/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/TH -I/home/sorson/anaconda3/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/THC -I/usr/local/cuda/include -I/home/sorson/warp-ctc/include -I/home/sorson/anaconda3/include/python3.6m -c /home/sorson/warp-ctc/pytorch_binding/src/binding.cpp -o build/temp.linux-x86_64-3.6/home/sorson/warp-ctc/pytorch_binding/src/binding.o -std=c++11 -fPIC -DWARPCTC_ENABLE_GPU
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
g++ -pthread -shared -L/home/sorson/anaconda3/lib -Wl,-rpath=/home/sorson/anaconda3/lib,--no-as-needed build/temp.linux-x86_64-3.6/build/_warp_ctc.o build/temp.linux-x86_64-3.6/home/sorson/warp-ctc/pytorch_binding/src/binding.o -L/home/sorson/warp-ctc/build -L/home/sorson/anaconda3/lib -Wl,--enable-new-dtags,-R/home/sorson/warp-ctc/build -lwarpctc -lpython3.6m -o build/lib.linux-x86_64-3.6/warpctc_pytorch/_warp_ctc.cpython-36m-x86_64-linux-gnu.so
running install_lib
copying build/lib.linux-x86_64-3.6/warpctc_pytorch/_warp_ctc.cpython-36m-x86_64-linux-gnu.so -> /home/sorson/anaconda3/lib/python3.6/site-packages/warpctc_pytorch
copying build/lib.linux-x86_64-3.6/warpctc_pytorch/__init__.py -> /home/sorson/anaconda3/lib/python3.6/site-packages/warpctc_pytorch
byte-compiling /home/sorson/anaconda3/lib/python3.6/site-packages/warpctc_pytorch/__init__.py to __init__.cpython-36.pyc
running install_egg_info
Removing /home/sorson/anaconda3/lib/python3.6/site-packages/warpctc_pytorch-0.1-py3.6.egg-info
Writing /home/sorson/anaconda3/lib/python3.6/site-packages/warpctc_pytorch-0.1-py3.6.egg-info
I got the following error:
ImportError: /usr/local/lib/python2.7/dist-packages/torch/lib/libgomp.so.1: version `GOMP_4.0' not found (required by /home/nitin/warp-ctc/build/libwarpctc.so)
I installed PyTorch from pip; my current gcc version is 4.9.
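The message says that the libgomp.so.1 being loaded does not provide the GOMP_4.0 version tag that libwarpctc.so was linked against. One rough way to compare candidate libraries is to scan their bytes for version tags. The sketch below is a crude byte scan, not a real ELF parser, and the helper name `gomp_versions` is made up for illustration.

```python
import re


def gomp_versions(path):
    """Return the sorted GOMP_x.y version tags found in a shared library file."""
    with open(path, 'rb') as handle:
        data = handle.read()
    # Version tags are stored as plain ASCII strings inside the library.
    tags = set(re.findall(rb'GOMP_\d+(?:\.\d+)*', data))
    return sorted(tag.decode('ascii') for tag in tags)
```

Calling it on the libgomp.so.1 path from the error message and on the system copy of libgomp should show whether only one of them lists GOMP_4.0.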
Dear friends,
When trying to run an4.py from ./data I get:
dlm@vm001nc6:~/code/deepspeech.pytorch/data$ python an4.py
Traceback (most recent call last):
File "an4.py", line 8, in <module>
from data.utils import create_manifest
ModuleNotFoundError: No module named 'data'
I guess that when I run an4.py from the data dir (as explained in the README.md), the data module cannot be resolved.
David
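One possible workaround, sketched under the assumption that the script lives one level below the repository root: put the parent directory on sys.path before the `from data.utils import ...` line runs. The helper name `add_repo_root_to_path` is hypothetical.

```python
import os
import sys


def add_repo_root_to_path(script_path):
    """Insert the parent of the script's directory into sys.path and return it."""
    script_dir = os.path.dirname(os.path.abspath(script_path))
    repo_root = os.path.dirname(script_dir)
    if repo_root not in sys.path:
        sys.path.insert(0, repo_root)
    return repo_root


# In a script such as data/an4.py this would be called with __file__.
root = add_repo_root_to_path(os.path.join(os.sep, 'tmp', 'repo', 'data', 'an4.py'))
print(root)
```

Running the script from the repository root instead of from inside data/ is another option that avoids touching sys.path.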
To help determine what batch size to use, create a script that will test tensors through the entire training process to determine if a batch size will not cause OOM issues
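A rough sketch of how such a script could search for the largest workable batch size. It assumes a caller-supplied `fits(batch_size)` callback (hypothetical) that runs one forward and backward pass on worst-case input and returns False when it runs out of memory. It also assumes that if one batch size fits, every smaller one does too.

```python
def largest_fitting_batch_size(fits, upper_bound):
    """Binary-search the largest batch size in [1, upper_bound] that fits.

    Returns 0 when even a batch size of 1 does not fit.
    """
    low, high = 0, upper_bound
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if fits(mid):
            low = mid
        else:
            high = mid - 1
    return low


# Stand-in callback for illustration: pretend anything above 10 runs out of memory.
print(largest_fitting_batch_size(lambda batch_size: batch_size <= 10, 64))
```

Binary search keeps the number of trial runs logarithmic in the upper bound, which matters when each trial is a full forward and backward pass.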
To fully get deepspeech integration, there needs to be a beam search across a language model constrained to a dictionary. I know a few people have been working on this recently and this issue will monitor progress!
In addition there is C code for KenLM beam search here for Tensorflow that should be portable from what I can see here.
Hello, I'm trying to use CTC for an OCR which uses a similar architecture, following the examples at https://github.com/SeanNaren/warp-ctc/blob/pytorch_bindings/pytorch_binding/README.md.
I was initially having a problem with the types (IntTensor/FloatTensor), which I believe I fixed with some trial and error, but right now I'm stuck at a segfault.
Can you tell me what probs_sizes, and label sizes are? I don't understand this in deepspeech.pytorch:
sizes = Variable(input_percentages.mul_(int(seq_length)).int(), requires_grad=False)
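As far as I can tell, the probs sizes are the number of valid output time steps for each sample in the batch, and the label sizes are the number of labels in each target sequence. The line above appears to derive the former by scaling each sample's fraction of the padded length by the output sequence length. A plain-Python sketch of that arithmetic, with a hypothetical helper name and no tensors involved:

```python
def sequence_lengths(input_percentages, seq_length):
    """Scale each sample's fraction of the padded length to a frame count."""
    # int() truncates, which mirrors converting the product to an integer type.
    return [int(fraction * seq_length) for fraction in input_percentages]


# Three samples occupying 100%, 50% and 25% of a 200-step padded batch.
print(sequence_lengths([1.0, 0.5, 0.25], 200))
```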
Using Anaconda GCC, when I used the command make to build warp-ctc, I got this:
dlm@vm001nc6:~/code/warp-ctc/build$ make
CMake Warning at /home/dlm/anaconda3/share/cmake-3.6/Modules/FindCUDA.cmake:779 (message):
Expecting to find librt for libcudart_static, but didn't find it.
Call Stack (most recent call first):
CMakeLists.txt:20 (FIND_PACKAGE)
-- cuda found TRUE
-- Found Torch7 in /home/dlm/torch/install
-- Torch found /home/dlm/torch/install/share/cmake/torch
-- Building shared library with GPU support
-- Building Torch Bindings with GPU support
-- Configuring done
-- Generating done
-- Build files have been written to: /home/dlm/code/warp-ctc/build
[ 10%] Building NVCC (Device) object CMakeFiles/warpctc.dir/src/warpctc_generated_reduce.cu.o
[ 20%] Building NVCC (Device) object CMakeFiles/warpctc.dir/src/warpctc_generated_ctc_entrypoint.cu.o
[ 30%] Linking CXX shared library libwarpctc.so
[ 30%] Built target warpctc
Scanning dependencies of target test_cpu
[ 40%] Building CXX object CMakeFiles/test_cpu.dir/tests/test_cpu.cpp.o
[ 50%] Linking CXX executable test_cpu
/home/dlm/torch/install/lib/libTHC.so: undefined reference to `std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string()@GLIBCXX_3.4.21'
/home/dlm/torch/install/lib/libTHC.so: undefined reference to `std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_append(char const*, unsigned long)@GLIBCXX_3.4.21'
/home/dlm/torch/install/lib/libTHC.so: undefined reference to `std::random_device::_M_init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)@GLIBCXX_3.4.21'
/home/dlm/torch/install/lib/libTHC.so: undefined reference to `std::runtime_error::runtime_error(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)@GLIBCXX_3.4.21'
/home/dlm/torch/install/lib/libTHC.so: undefined reference to `std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_replace(unsigned long, unsigned long, char const*, unsigned long)@GLIBCXX_3.4.21'
/home/dlm/torch/install/lib/libTHC.so: undefined reference to `std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_create(unsigned long&, unsigned long)@GLIBCXX_3.4.21'
collect2: error: ld returned 1 exit status
CMakeFiles/test_cpu.dir/build.make:110: recipe for target 'test_cpu' failed
make[2]: *** [test_cpu] Error 1
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/test_cpu.dir/all' failed
make[1]: *** [CMakeFiles/test_cpu.dir/all] Error 2
Makefile:127: recipe for target 'all' failed
make: *** [all] Error 2
dlm@vm001nc6:~/code/warp-ctc/build$
But with Ubuntu GCC, everything is fine.
On each of the datasets provided, we must train a Deepspeech model. The overall architecture is encompassed in this command:
python train.py --rnn_type gru --hidden_size 800 --hidden_layers 5 --checkpoint --visdom --train_manifest /path/to/train_manifest.csv --val_manifest /path/to/val_manifest.csv --epochs 100 --num_workers $(nproc) --cuda
In the above command you must replace the manifest paths with the correct paths to the dataset. A few notes:
--learning_anneal and setting it to a smaller value, like 1.01. For larger datasets, the default is fine (up to around 4.5k hours from internal testing on the deepspeech.torch version).
A release will be cut from the DeepSpeech package that will have the models, and a reference to the latest release added to the README to find latest models!
Progress tracker for datasets:
Let me know if you plan on working on running any of these, and I'll update the ticket with details!
I am very new to this field. I ran train.py on the LibriSpeech dataset, and after training completed I ran test.py as follows:
python test.py --model_path models/deepspeech_final.pth.tar --val_manifest data/libri_test_manifest.csv --cuda
I got the following output:
Validation Summary Average WER 3.972 Average CER 0.747
One more thing I noticed: in test.py the argument is --val_manifest, that is, the validation manifest file, but I think it should be --test_manifest?
Now I want to test the same model on unseen data using predict.py, but how do I include a language model?
Hi there,
Just out of curiosity, what kind of RTF can one expect from your code?
I want to test it out, but I was curious to know ahead of time if the decoding is fairly fast.
Thank you!
Miguel
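For anyone wanting to measure this themselves: RTF is processing time divided by audio duration, so values below 1 mean faster than real time. A minimal timing sketch, where `transcribe` is a hypothetical zero-argument callable that wraps whatever decoding call is being measured:

```python
import time


def real_time_factor(transcribe, audio_seconds):
    """Time one call to `transcribe` and divide by the audio duration."""
    if audio_seconds <= 0:
        raise ValueError('audio_seconds must be positive')
    start = time.perf_counter()
    transcribe()
    elapsed = time.perf_counter() - start
    return elapsed / audio_seconds


# Stand-in workload for illustration only; replace with a real decoding call.
print(real_time_factor(lambda: sum(range(100000)), 10.0))
```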
I checked the previous issues on similar matters, but the solutions don't seem to help.
Traceback (most recent call last):
File "train.py", line 11, in <module>
from data.data_loader import AudioDataLoader, SpectrogramDataset
File "/home/lintangsutawika/deeplearning/deepspeech.pytorch/data/__init__.py", line 1, in <module>
from . import data_loader
File "/home/lintangsutawika/deeplearning/deepspeech.pytorch/data/data_loader.py", line 8, in <module>
import torchaudio
File "build/bdist.linux-x86_64/egg/torchaudio/__init__.py", line 5, in <module>
File "build/bdist.linux-x86_64/egg/torchaudio/_ext/th_sox/__init__.py", line 3, in <module>
File "build/bdist.linux-x86_64/egg/torchaudio/_ext/th_sox/_th_sox.py", line 7, in <module>
File "build/bdist.linux-x86_64/egg/torchaudio/_ext/th_sox/_th_sox.py", line 6, in __bootstrap__
ImportError: /home/lintangsutawika/anaconda2/lib/python2.7/site-packages/torch/lib/libgomp.so.1: version `GOMP_4.0' not found (required by /usr/lib/x86_64-linux-gnu/libsox.so.2)
Add noise injection capabilities that were dumped here into our dataloader.
As I remember, in the DeepSpeech paper the samples were sorted by duration only during the first epoch. Maybe it would make sense to switch from the sequential sampler to a random sampler in AudioDataLoader after the first epoch?
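The Deep Speech 2 paper calls this curriculum SortaGrad. A minimal sketch of the index ordering it implies, independent of any particular data loader. The function name and the seeding scheme are illustrative assumptions, not the repository's sampler.

```python
import random


def epoch_order(durations, epoch, seed=0):
    """Return sample indices: sorted by duration in epoch 0, shuffled afterwards."""
    indices = list(range(len(durations)))
    if epoch == 0:
        return sorted(indices, key=lambda index: durations[index])
    rng = random.Random(seed + epoch)  # a different, reproducible shuffle per epoch
    rng.shuffle(indices)
    return indices


durations = [3.0, 1.0, 2.0]
print(epoch_order(durations, 0))  # shortest utterance first
print(epoch_order(durations, 1))  # some permutation of the indices
```

With batching, shuffling whole batches rather than individual samples keeps utterances of similar length together and limits padding.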
I'm trying to get some insight into batch sizes and whether or not the performance I'm seeing is expected. It seems that I can't set batch sizes much higher than, say, 32 with my dual Titan Xs. It's further my understanding that DataParallel will split that batch of 32 across the two GPUs, for an effective batch size of 16 per GPU per batch. The model I'm training is all default: 4 LSTM layers with 400 hidden units. Now, this is fairly different from many of the DeepSpeech 2 configurations in the paper, but I am seeing references to them having batch sizes of 512 spread over 8 Titan Xs. This implies that whatever system they're running allows them to support batches of 64 per GPU. It seems to me we should be able to get closer to this number unless I'm missing something. Any thoughts?
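The per-GPU numbers quoted above follow from a simple even split. A tiny sketch of that arithmetic, assuming the batch divides evenly across devices (the helper name is made up):

```python
def per_gpu_batch_size(total_batch_size, num_gpus):
    """Return the per-device batch size for an evenly divisible split."""
    if num_gpus <= 0 or total_batch_size % num_gpus != 0:
        raise ValueError('batch size must divide evenly across the GPUs')
    return total_batch_size // num_gpus


print(per_gpu_batch_size(32, 2))   # the dual Titan X setup described above
print(per_gpu_batch_size(512, 8))  # the configuration referenced from the paper
```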
Dear friends,
In order to run an4.py from inside the data dir (as described in README.md), I changed the data.utils import in an4.py as follows:
#from data.utils import create_manifest
from utils import create_manifest
Now, using Python 3.6, I am getting the following error:
Traceback (most recent call last):
File "an4.py", line 84, in <module>
main()
File "an4.py", line 72, in main
_format_data(root_path, 'train', name, 'an4_clstk')
File "an4.py", line 31, in _format_data
_format_files(file_ids, new_transcript_path, new_wav_path, transcripts, wav_path)
File "an4.py", line 57, in _format_files
file.write(extracted_transcript)
TypeError: a bytes-like object is required, not 'str'
David
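This TypeError is what Python 3 raises when a str is written to a file opened in binary mode. A minimal sketch of one way around it, assuming the transcript may arrive as either str or bytes and that UTF-8 is acceptable. The helper name `write_transcript` is hypothetical, not the function in an4.py.

```python
import os
import tempfile


def write_transcript(path, transcript):
    """Write a transcript to `path` in text mode, accepting str or bytes."""
    if isinstance(transcript, bytes):
        transcript = transcript.decode('utf-8')
    with open(path, 'w', encoding='utf-8') as handle:
        handle.write(transcript)


# Small demonstration using a temporary file.
fd, demo_path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
write_transcript(demo_path, 'HELLO WORLD')
with open(demo_path, encoding='utf-8') as handle:
    print(handle.read())
os.remove(demo_path)
```

The alternative is to keep the file in binary mode and call .encode('utf-8') on the string before writing.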
I installed both the Torch and PyTorch versions of Deep Speech 2.
However, I noticed that the Torch version converges much faster than the PyTorch version on the AN4 dataset.
In the PyTorch version, I followed the default settings of the Torch version by using the following command:
python train.py --cuda --train_manifest data/an4_train_manifest.csv --val_manifest data/an4_val_manifest.csv --rnn_type rnn --hidden_size 1760 --hidden_layers 7 --noise_prob 0
I cannot figure out what causes such a big difference in convergence speed.
I'm working on implementing a beam decoder for this, and just realized that the output values do not appear to be posteriors. In model.py I see the output layer is a Linear layer. Why not a Softmax or LogSoftmax activation? I suppose such a layer is not strictly necessary when doing a greedy decode (and is less efficient), but it will be necessary for more complicated decoders. Just wondering if there's a specific reason or if there's something I'm missing.
Thanks!
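To illustrate the point about greedy decoding: a log-softmax only shifts every score in a frame by the same constant, so the argmax is unchanged, while a beam decoder gets properly normalized log-probabilities. A plain-Python sketch for a single frame of raw outputs (no framework assumed):

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one frame of raw scores."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(value - peak) for value in logits))
    return [value - log_norm for value in logits]


frame = [2.0, 1.0, 0.1]
log_probs = log_softmax(frame)
print(sum(math.exp(value) for value in log_probs))  # probabilities sum to ~1
print(frame.index(max(frame)) == log_probs.index(max(log_probs)))  # same argmax
```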
Typical Deepspeech architecture uses batch normalized BRNNs. Implement this to stay true to the architecture.
In dataloader.py, under the function augment_audio_with_sox, I think
y = load_audio(path)
should be replaced with
y = load_audio(augmented_filename)
Am I right?
Also, should I post this kind of thing in the Issues? (I am a newbie to GitHub.)
Any pre-trained models available?
The training process introduces a memory leak that prevents the model from being trained on a larger dataset for a long time.
I haven't dug into the issue deeply enough, but here are my observations (I've adapted the code to work with the VCTK dataset): each iteration increases the memory consumption of the main process by some amount (~1.5G for VCTK, I don't have an exact number), and on each iteration the worker processes double their memory requirements (they're fork()ed on each iteration).
Right now I'm thinking the problem is inside the data loader and/or its integration. But again, I haven't dug into that because of time constraints.
I'm able to train the model on VCTK for 9 iterations with default parameters. My current workaround is to resume training from the last checkpoint.
Dear Friends,
I would like to know if it is possible to optimize the memory usage of DeepSpeech.PyTorch?
In my tests, DeepSpeech.PyTorch uses about 9 GB while DeepSpeech.Torch uses 5 GB when training on AN4.
Thanks,
David
DEEPSPEECH.TORCH
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.39 Driver Version: 375.39 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 7C1A:00:00.0 Off | 0 |
| N/A 72C P0 112W / 149W | 5006MiB / 11439MiB | 99% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 8700 C /home/dlm/torch/install/bin/luajit 5004MiB |
+-----------------------------------------------------------------------------+
DEEPSPEECH.PYTORCH
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.39 Driver Version: 375.39 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 7C1A:00:00.0 Off | 0 |
| N/A 70C P0 115W / 149W | 9076MiB / 11439MiB | 88% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 10611 C python 9072MiB |
+-----------------------------------------------------------------------------+
Dear friends,
My DeepSpeech.PyTorch stopped working after installing Torch to also use DeepSpeech.Torch. See the logs below. It is very similar to another issue in the repo, where they said we should use another gcc, but I am not sure exactly what the REAL problem is.
If I move the Torch installation directory, DeepSpeech.PyTorch works again! If I move the Torch installation directory back, DeepSpeech.PyTorch fails!
> dlm@vm001nc6:~/code/deepspeech.pytorch$
> dlm@vm001nc6:~/code/deepspeech.pytorch$
> dlm@vm001nc6:~/code/deepspeech.pytorch$ python train.py --train_manifest data/train_manifest.csv --val_manifest data/val_manifest.csv
> Traceback (most recent call last):
> File "train.py", line 9, in <module>
> from warpctc_pytorch import CTCLoss
> File "/home/dlm/anaconda3/lib/python3.6/site-packages/warpctc_pytorch/__init__.py", line 7, in <module>
> from ._warp_ctc import lib as _lib, ffi as _ffi
> ImportError: /home/dlm/anaconda3/lib/python3.6/site-packages/torch/lib/../../../../libgomp.so.1: version `GOMP_4.0' not found (required by /home/dlm/torch/install/lib/libwarpctc.so)
> dlm@vm001nc6:~/code/deepspeech.pytorch$
DataParallel (
(module): DeepSpeech (
(conv): Sequential (
(0): Conv2d(1, 32, kernel_size=(41, 11), stride=(2, 2))
(1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True)
(2): Hardtanh (min_val=0, max_val=20, inplace)
(3): Conv2d(32, 32, kernel_size=(21, 11), stride=(2, 1))
(4): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True)
(5): Hardtanh (min_val=0, max_val=20, inplace)
)
(rnns): Sequential (
(0): BatchLSTM (
(batch_norm): SequenceWise (
Sequential (
(0): Linear (672 -> 400)
(1): BatchNorm1d(400, eps=1e-05, momentum=0.1, affine=True)
))
(rnn): LSTM(672, 400, bias=False, bidirectional=True)
)
(1): BatchLSTM (
(batch_norm): SequenceWise (
Sequential (
(0): Linear (400 -> 400)
(1): BatchNorm1d(400, eps=1e-05, momentum=0.1, affine=True)
))
(rnn): LSTM(400, 400, bias=False, bidirectional=True)
)
(2): BatchLSTM (
(batch_norm): SequenceWise (
Sequential (
(0): Linear (400 -> 400)
(1): BatchNorm1d(400, eps=1e-05, momentum=0.1, affine=True)
))
(rnn): LSTM(400, 400, bias=False, bidirectional=True)
)
(3): BatchLSTM (
(batch_norm): SequenceWise (
Sequential (
(0): Linear (400 -> 400)
(1): BatchNorm1d(400, eps=1e-05, momentum=0.1, affine=True)
))
(rnn): LSTM(400, 400, bias=False, bidirectional=True)
)
)
(fc): Sequential (
(0): SequenceWise (
Sequential (
(0): BatchNorm1d(400, eps=1e-05, momentum=0.1, affine=True)
(1): Linear (400 -> 29)
))
)
)
)
Traceback (most recent call last):
File "train.py", line 263, in <module>
main()
File "train.py", line 145, in main
out = model(inputs)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 210, in __call__
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/data_parallel.py", line 40, in forward
return self.module(input.cuda(self.device_ids[0]))
File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 210, in __call__
result = self.forward(*input, **kwargs)
File "/home/demobin/work/github/deepspeech.pytorch/model.py", line 94, in forward
x = self.rnns(x)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 210, in __call__
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/container.py", line 63, in forward
input = module(input)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 210, in __call__
result = self.forward(*input, **kwargs)
File "/home/demobin/work/github/deepspeech.pytorch/model.py", line 48, in forward
x, _ = self.rnn(x)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 210, in __call__
result = self.forward(*input, **kwargs)
TypeError: forward() takes exactly 3 arguments (2 given)
I have observed that loading wav files through librosa.core.load is slower than with SciPy's wavfile.read function.
Here are two examples (same setup: LibriSpeech train, batch size 64, two 1080 GPUs, single-layer GRU with 1024 hidden units, 4 workers):
with librosa.core.load
Epoch: [1][800/4395] Time 0.534 (0.571) Data 0.002 (0.215) Loss 177.7152 (132.5715)
Epoch: [1][801/4395] Time 2.908 (0.574) Data 2.372 (0.218) Loss 178.4938 (132.6288)
Epoch: [1][802/4395] Time 1.918 (0.576) Data 1.382 (0.219) Loss 170.5456 (132.6761)
Epoch: [1][803/4395] Time 1.359 (0.577) Data 0.821 (0.220) Loss 172.4971 (132.7257)
Epoch: [1][804/4395] Time 1.007 (0.577) Data 0.473 (0.221) Loss 173.2764 (132.7761)
Epoch: [1][805/4395] Time 2.924 (0.580) Data 2.387 (0.223) Loss 170.5575 (132.8231)
Epoch: [1][806/4395] Time 1.580 (0.581) Data 0.993 (0.224) Loss 171.3481 (132.8709)
Epoch: [1][807/4395] Time 1.711 (0.583) Data 1.173 (0.225) Loss 171.6878 (132.9190)
Epoch: [1][808/4395] Time 1.234 (0.584) Data 0.684 (0.226) Loss 180.7255 (132.9781)
Epoch: [1][809/4395] Time 1.770 (0.585) Data 1.232 (0.227) Loss 165.3421 (133.0181)
Epoch: [1][810/4395] Time 3.278 (0.588) Data 2.740 (0.230) Loss 166.3688 (133.0593)
Epoch: [1][811/4395] Time 1.039 (0.589) Data 0.503 (0.231) Loss 174.8942 (133.1109)
Epoch: [1][812/4395] Time 0.936 (0.589) Data 0.398 (0.231) Loss 173.7952 (133.1610)
Epoch: [1][813/4395] Time 2.226 (0.591) Data 1.685 (0.233) Loss 177.2127 (133.2152)
Epoch: [1][814/4395] Time 3.571 (0.595) Data 3.034 (0.236) Loss 167.6428 (133.2575)
Epoch: [1][815/4395] Time 0.548 (0.595) Data 0.002 (0.236) Loss 169.2704 (133.3017)
Epoch: [1][816/4395] Time 0.867 (0.595) Data 0.330 (0.236) Loss 166.2436 (133.3420)
Epoch: [1][817/4395] Time 2.086 (0.597) Data 1.538 (0.237) Loss 181.1502 (133.4006)
Epoch: [1][818/4395] Time 3.903 (0.601) Data 3.295 (0.241) Loss 171.9818 (133.4477)
Epoch: [1][819/4395] Time 0.543 (0.601) Data 0.002 (0.241) Loss 175.6589 (133.4993)
Epoch: [1][820/4395] Time 0.545 (0.601) Data 0.002 (0.241) Loss 174.9106 (133.5498)
Epoch: [1][821/4395] Time 2.316 (0.603) Data 1.776 (0.242) Loss 168.7184 (133.5926)
Epoch: [1][822/4395] Time 3.403 (0.607) Data 2.852 (0.246) Loss 174.2117 (133.6420)
Epoch: [1][823/4395] Time 0.700 (0.607) Data 0.164 (0.246) Loss 166.9638 (133.6825)
Epoch: [1][824/4395] Time 0.548 (0.607) Data 0.002 (0.245) Loss 183.0482 (133.7424)
Epoch: [1][825/4395] Time 2.556 (0.609) Data 1.995 (0.247) Loss 170.0922 (133.7865)
and with scipy.io.wavfile.read
Epoch: [1][800/4395] Time 0.998 (0.486) Data 0.002 (0.005) Loss 182.8188 (137.1815)
Epoch: [1][801/4395] Time 0.988 (0.486) Data 0.002 (0.005) Loss 182.8905 (137.2385)
Epoch: [1][802/4395] Time 0.978 (0.487) Data 0.002 (0.005) Loss 177.0392 (137.2881)
Epoch: [1][803/4395] Time 0.966 (0.488) Data 0.002 (0.005) Loss 176.3289 (137.3368)
Epoch: [1][804/4395] Time 0.993 (0.488) Data 0.002 (0.005) Loss 177.1688 (137.3863)
Epoch: [1][805/4395] Time 0.951 (0.489) Data 0.002 (0.005) Loss 173.7921 (137.4315)
Epoch: [1][806/4395] Time 0.999 (0.490) Data 0.002 (0.005) Loss 176.1973 (137.4796)
Epoch: [1][807/4395] Time 0.995 (0.490) Data 0.002 (0.005) Loss 176.3397 (137.5278)
Epoch: [1][808/4395] Time 0.968 (0.491) Data 0.002 (0.005) Loss 184.6344 (137.5861)
Epoch: [1][809/4395] Time 0.997 (0.491) Data 0.002 (0.005) Loss 170.4413 (137.6267)
Epoch: [1][810/4395] Time 0.951 (0.492) Data 0.002 (0.005) Loss 169.0714 (137.6655)
Epoch: [1][811/4395] Time 1.005 (0.493) Data 0.002 (0.005) Loss 176.8274 (137.7138)
Epoch: [1][812/4395] Time 0.998 (0.493) Data 0.002 (0.005) Loss 178.3829 (137.7639)
Epoch: [1][813/4395] Time 1.010 (0.494) Data 0.002 (0.005) Loss 181.8083 (137.8181)
Epoch: [1][814/4395] Time 1.006 (0.494) Data 0.002 (0.005) Loss 168.5279 (137.8558)
Epoch: [1][815/4395] Time 0.990 (0.495) Data 0.002 (0.005) Loss 171.7180 (137.8973)
Epoch: [1][816/4395] Time 1.025 (0.496) Data 0.002 (0.005) Loss 169.9290 (137.9366)
Epoch: [1][817/4395] Time 1.006 (0.496) Data 0.002 (0.005) Loss 183.6840 (137.9926)
Epoch: [1][818/4395] Time 0.992 (0.497) Data 0.002 (0.005) Loss 176.9357 (138.0402)
Epoch: [1][819/4395] Time 1.038 (0.498) Data 0.002 (0.005) Loss 180.7055 (138.0923)
Epoch: [1][820/4395] Time 0.994 (0.498) Data 0.002 (0.005) Loss 179.9813 (138.1434)
Epoch: [1][821/4395] Time 1.000 (0.499) Data 0.002 (0.005) Loss 171.4475 (138.1839)
Epoch: [1][822/4395] Time 1.025 (0.499) Data 0.002 (0.005) Loss 180.0951 (138.2349)
Epoch: [1][823/4395] Time 0.989 (0.500) Data 0.002 (0.005) Loss 172.6573 (138.2767)
Epoch: [1][824/4395] Time 1.010 (0.501) Data 0.002 (0.005) Loss 185.8300 (138.3345)
Epoch: [1][825/4395] Time 1.040 (0.501) Data 0.002 (0.005) Loss 175.1313 (138.3791)
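To compare the two runs without eyeballing the logs, the per-batch values can be averaged. The sketch below assumes the line format shown above, where the first number after "Time" and "Data" is the per-batch value and the number in parentheses is the running average. The helper name `mean_times` is made up.

```python
import re

# Captures the per-batch Time and Data values, skipping the running averages.
LINE_PATTERN = re.compile(
    r'Time\s+([\d.]+)\s+\([\d.]+\)\s+Data\s+([\d.]+)\s+\([\d.]+\)')


def mean_times(log_lines):
    """Return (mean batch time, mean data-loading time) over matching lines."""
    batch_times, data_times = [], []
    for line in log_lines:
        match = LINE_PATTERN.search(line)
        if match:
            batch_times.append(float(match.group(1)))
            data_times.append(float(match.group(2)))
    if not batch_times:
        raise ValueError('no matching log lines found')
    return (sum(batch_times) / len(batch_times),
            sum(data_times) / len(data_times))


sample = [
    'Epoch: [1][800/4395] Time 0.534 (0.571) Data 0.002 (0.215) Loss 177.7152 (132.5715)',
    'Epoch: [1][801/4395] Time 2.908 (0.574) Data 2.372 (0.218) Loss 178.4938 (132.6288)',
]
print(mean_times(sample))
```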
Hi! I'm playing around with deepspeech with the Librispeech dataset. Initially, when I blindly ran the training script with a large batch size I ran into an out-of-memory error, as expected.
So then I checked the largest sequence length in the LibriSpeech data (around 2900 for the default audio config) and ran the benchmarking script with --seconds 30. It turns out I can only support a batch size of 10. But when I let my model train using this batch size, I run into an OOM error in the final few batches.
Why is this the case? Since the benchmarking script is testing the exact same model configuration (I'm going with the default config) with random input data of size larger than any of the batches in Librispeech, shouldn't it run to completion?
I would like to believe that the model is overfitting but why would the WER keep decreasing if it was overfitting?
The architecture is as follows:
500 hidden size
5 RNN layers
default LR
1.001 annealing factor which I'm increasing by 0.001 every epoch.
I'm training using Librispeech train-clean-100.tar.gz and validating on dev-clean.tar.gz
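To make the annealing schedule described above concrete, here is a small sketch. It assumes the learning rate is divided by the annealing factor after every epoch, which is my reading of how the training script applies it, and that the factor itself grows by a fixed step each epoch. All names are illustrative.

```python
def annealed_learning_rates(initial_lr, first_factor, factor_step, epochs):
    """Return the learning rate used in each epoch under a growing anneal factor."""
    rates = []
    lr, factor = initial_lr, first_factor
    for _ in range(epochs):
        rates.append(lr)
        lr /= factor           # anneal after the epoch finishes
        factor += factor_step  # the factor is increased every epoch
    return rates


# Factor starts at 1.001 and grows by 0.001 per epoch, as in the list above.
# The starting rate of 3e-4 is only a placeholder value for the printout.
for epoch, rate in enumerate(annealed_learning_rates(3e-4, 1.001, 0.001, 5)):
    print(epoch, rate)
```

With factors this close to 1 the rate barely moves over the first few epochs, which may be worth keeping in mind when reading the loss curves.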
This will give us the ability to do a real time model, being able to predict just using forward only RNNs.
I've opened a branch called fp_16 that has a benchmark script for fp_16.
I haven't got official confirmation from anyone at NVIDIA, but it seems like consumer cards (post Titan X Maxwell) have been nerfed and thus perform poorly using FP16; only the Tesla P100 does well (to differentiate both markets, more info here).
Measure the memory usage as well as the time taken for the benchmark script on various hardware at default settings. @ryanleary would it be possible to get benchmark times using this script? IIRC you have Titan X Maxwells?
This happened when I set cuda=True:
Traceback (most recent call last):
File "train.py", line 318, in <module>
main()
File "train.py", line 182, in main
out = model(inputs)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 206, in __call__
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/data_parallel.py", line 61, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/data_parallel.py", line 71, in parallel_apply
return parallel_apply(replicas, inputs, kwargs)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/parallel_apply.py", line 45, in parallel_apply
raise output
RuntimeError: CUDNN_STATUS_INTERNAL_ERROR
Hello,
I am using Ubuntu 16.04 and PyTorch 0.1.12_2. When I build warp-ctc following the README, I run into the following problem:
from warpctc_pytorch import CTCLoss
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-4-fdfcfb094b24> in <module>()
----> 1 from warpctc_pytorch import CTCLoss
/usr/local/lib/python2.7/dist-packages/warpctc_pytorch/__init__.pyc in <module>()
5 from torch.nn.modules.loss import _assert_no_grad
6 from torch.utils.ffi import _wrap_function
----> 7 from ._warp_ctc import lib as _lib, ffi as _ffi
8
9 __all__ = []
ImportError: /usr/local/lib/python2.7/dist-packages/torch/lib/libgomp.so.1: version `GOMP_4.0' not found (required by /usr/local/warp-ctc/build/libwarpctc.so)
Can anyone help me?
Any idea when you will be expanding to datasets other than an4? https://github.com/mozilla/DeepSpeech/blob/master/util/importers/ted.py
How can I help/contribute?
The error happens after the end of the first epoch.
What might be the cause of this?
Epoch: [1][40175/40178] Time 1.165 (0.568) Data 0.003 (0.002) Loss 140.8571 (87.2619)
Epoch: [1][40176/40178] Time 1.227 (0.568) Data 0.001 (0.002) Loss 149.9813 (87.2635)
Epoch: [1][40177/40178] Time 1.282 (0.568) Data 0.002 (0.002) Loss 106.1354 (87.2639)
Epoch: [1][40178/40178] Time 0.895 (0.568) Data 0.001 (0.002) Loss 113.1760 (87.2641)
Training Summary Epoch: [1] Average Loss 87.265
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
/home/lintangsutawika/deeplearning/deepspeech.pytorch/train.py in <module>()
320
321 if __name__ == '__main__':
--> 322 main()
/home/lintangsutawika/deeplearning/deepspeech.pytorch/train.py in main()
261 sizes = Variable(input_percentages.mul_(int(seq_length)).int())
262
--> 263 decoded_output = decoder.decode(out.data, sizes)
264 target_strings = decoder.process_strings(decoder.convert_to_strings(split_targets))
265 wer, cer = 0, 0
/home/lintangsutawika/deeplearning/deepspeech.pytorch/decoder.pyc in decode(self, probs, sizes)
140 """
141 _, max_probs = torch.max(probs.transpose(0, 1), 2)
--> 142 strings = self.convert_to_strings(max_probs.view(max_probs.size(0), max_probs.size(1)), sizes)
143 return self.process_strings(strings, remove_repetitions=True)
/home/lintangsutawika/deeplearning/deepspeech.pytorch/decoder.pyc in convert_to_strings(self, sequences, sizes)
44 for x in xrange(len(sequences)):
45 string = self.convert_to_string(sequences[x])
---> 46 string = string[0:int(sizes.data[x])] if sizes else string
47 strings.append(string)
48 return strings
/home/lintangsutawika/anaconda2/lib/python2.7/site-packages/torch/autograd/variable.pyc in __bool__(self)
119 return False
120 raise RuntimeError("bool value of Variable objects containing non-empty " +
--> 121 torch.typename(self.data) + " is ambiguous")
122
123 __nonzero__ = __bool__
RuntimeError: bool value of Variable objects containing non-empty torch.IntTensor is ambiguous
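The failing line in the traceback tests sizes for truthiness (if sizes), which raises for a non-empty Variable. A minimal sketch of one possible fix is to compare against None explicitly. FakeSizes is a hypothetical stand-in for the Variable so that the example runs without PyTorch, and truncate_strings is an illustrative name, not the repository's function:

```python
class FakeSizes(object):
    """Hypothetical stand-in for a Variable: truth-testing it raises."""

    def __init__(self, data):
        self.data = data

    def __bool__(self):
        raise RuntimeError("bool value of non-empty object is ambiguous")

    __nonzero__ = __bool__  # Python 2 spelling of __bool__


def truncate_strings(strings, sizes=None):
    """Cut each string to its length in sizes, if sizes was given."""
    result = []
    for x in range(len(strings)):
        string = strings[x]
        # Compare against None explicitly; `if sizes` would call __bool__.
        if sizes is not None:
            string = string[0:int(sizes.data[x])]
        result.append(string)
    return result


print(truncate_strings(["hello", "world"], FakeSizes([2, 3])))
```

The same change applied to line 46 of decoder.py would read "if sizes is not None" in place of "if sizes".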
It seems (at least in my setup) that something breaks at the end of the first epoch -- probably due to an issue in the dataloader(?). Haven't dug much into it yet.
Epoch: [1][4244/4549] Time 0.584 (0.309) Data 0.002 (0.004) Loss 190.2870 (149.7884)
Epoch: [1][4245/4549] Time 0.576 (0.309) Data 0.002 (0.004) Loss 220.6672 (149.8051)
Epoch: [1][4246/4549] Time 0.582 (0.309) Data 0.002 (0.004) Loss 206.2761 (149.8184)
Epoch: [1][4247/4549] Time 0.598 (0.309) Data 0.002 (0.004) Loss 194.7978 (149.8290)
Epoch: [1][4248/4549] Time 0.603 (0.309) Data 0.002 (0.004) Loss 190.6617 (149.8386)
Epoch: [1][4249/4549] Time 0.583 (0.309) Data 0.002 (0.004) Loss 189.1278 (149.8478)
Epoch: [1][4250/4549] Time 0.601 (0.310) Data 0.002 (0.004) Loss 204.6577 (149.8607)
Epoch: [1][4251/4549] Time 0.585 (0.310) Data 0.001 (0.004) Loss 247.6785 (149.8837)
Epoch: [1][4252/4549] Time 0.616 (0.310) Data 0.002 (0.004) Loss 223.9250 (149.9012)
Epoch: [1][4253/4549] Time 0.592 (0.310) Data 0.002 (0.004) Loss 201.2460 (149.9132)
Epoch: [1][4254/4549] Time 0.590 (0.310) Data 0.002 (0.004) Loss 226.0778 (149.9311)
Epoch: [1][4255/4549] Time 0.592 (0.310) Data 0.002 (0.004) Loss 183.1511 (149.9389)
Epoch: [1][4256/4549] Time 0.601 (0.310) Data 0.002 (0.004) Loss 202.0417 (149.9512)
Epoch: [1][4257/4549] Time 0.585 (0.310) Data 0.001 (0.004) Loss 192.3692 (149.9611)
Epoch: [1][4258/4549] Time 0.617 (0.310) Data 0.002 (0.004) Loss 210.9887 (149.9755)
Epoch: [1][4259/4549] Time 0.616 (0.310) Data 0.002 (0.004) Loss 222.2541 (149.9925)
Epoch: [1][4260/4549] Time 0.605 (0.310) Data 0.002 (0.004) Loss 171.1521 (149.9974)
Epoch: [1][4261/4549] Time 0.606 (0.310) Data 0.002 (0.004) Loss 234.3608 (150.0172)
Epoch: [1][4262/4549] Time 0.608 (0.310) Data 0.001 (0.004) Loss 215.2132 (150.0325)
Traceback (most recent call last):
File "train.py", line 318, in <module>
main()
File "train.py", line 169, in main
for i, (data) in enumerate(train_loader, start=start_iter):
File "/home/ryan/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 210, in __next__
return self._process_next_batch(batch)
File "/home/ryan/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 237, in _process_next_batch
raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
File "/home/ryan/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 41, in _worker_loop
samples = collate_fn([dataset[i] for i in batch_indices])
File "/home/ryan/devel/deepspeech.pytorch/data/data_loader.py", line 125, in _collate_fn
targets = torch.IntTensor(targets)
RuntimeError: tried to construct a tensor from a int sequence, but found an item of type NoneType at index (2794)
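The NoneType item suggests that some transcript character had no entry in the label map, although that cause is an assumption and is not confirmed in this report. A minimal sketch of a lookup that skips unmapped characters, with labels_map and parse_transcript as illustrative names and made-up labels:

```python
def parse_transcript(transcript, labels_map):
    """Map characters to integer labels, dropping characters with no label."""
    indices = (labels_map.get(ch) for ch in transcript)
    # Keep index 0 (a valid label); drop only the missing lookups.
    return [i for i in indices if i is not None]


# Illustrative label map; "-" is deliberately missing from it.
labels_map = {"_": 0, "A": 1, "B": 2, " ": 3}
targets = parse_transcript("AB-A B", labels_map)
print(targets)
```

Dropping characters silently changes the target text, so logging the skipped characters may be preferable when cleaning a dataset.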
I was in the process of introducing Tensorboard logging when I noticed bugs in how visdom was dealing with continuing from a checkpoint model.
Essentially, there are two bugs that I found and fixed.
1. epoch is being incremented wrongly inside the visdom logging section in line 289 of train.py.
2. When loading loss_results, wer_results, cer_results from the package, the current statement assigns the variables directly from the dictionary. However, these variables are shorter buffers that only contain the results until the checkpoint epoch. Since they were assigned directly, the three variables are no longer of length = args.epochs and therefore give an IndexError when they are eventually accessed.
I've fixed these bugs and also added tensorboard logging in my fork: https://github.com/SiddGururani/deepspeech.pytorch
Shall I submit a pull request and we can discuss the tensorboard logging part in further detail there?
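The second bug can be illustrated with plain lists. This is a minimal sketch, not the code from the fork: restore_results is a hypothetical helper that copies the saved results into a buffer of the full run length instead of assigning the shorter saved buffer directly.

```python
def restore_results(saved, total_epochs, fill=0.0):
    """Copy saved per-epoch results into a buffer of length total_epochs."""
    buffer = [fill] * total_epochs
    count = min(len(saved), total_epochs)
    buffer[:count] = saved[:count]
    return buffer


# Results saved after 2 epochs, resumed for a 5-epoch run (made-up values).
loss_results = restore_results([87.2, 61.5], total_epochs=5)
loss_results[4] = 12.3  # Later epochs can be written without an IndexError.
print(loss_results)
```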
Currently if a model is trained via cuda, due to data parallel it has to be loaded with the cuda flag set to true. Create a script to take a cuda based model and save a CPU based model that can be loaded.
This reproduces a traceback from a real run last week that I didn't save. It seems to happen only when the --cuda flag is enabled for both the initial run and the continuation. If cuda is not used for either, the run continues as expected. I made a quick hacky workaround last week, but it involved hardcoding the last learning rate, and GPU utilization was severely affected.
aaron@...$ python train.py --train_manifest data/ted_train_manifest.csv --val_manifest data/ted_val_manifest.csv --checkpoint --checkpoint_per_batch 20 --cuda --continue_from models/deepspeech_checkpoint_epoch_1_iter_20.pth.tar
Directory already exists.
Loading checkpoint model models/deepspeech_checkpoint_epoch_1_iter_20.pth.tar
Traceback (most recent call last):
File "train.py", line 321, in <module>
main()
File "train.py", line 137, in main
model.load_state_dict(package['state_dict'])
File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 331, in load_state_dict
.format(name))
KeyError: 'unexpected key "conv.0.weight" in state_dict'
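The KeyError is consistent with a mismatch between the checkpoint keys and the "module." prefix that torch.nn.DataParallel adds to parameter names. Which side carries the prefix in this run is an assumption. A minimal sketch of a hypothetical helper, match_prefix, that rewrites the keys in either direction before they are passed to load_state_dict:

```python
from collections import OrderedDict


def match_prefix(state_dict, wrapped, prefix="module."):
    """Return a copy of state_dict whose keys suit a wrapped or bare model."""
    fixed = OrderedDict()
    for key, value in state_dict.items():
        has_prefix = key.startswith(prefix)
        if wrapped and not has_prefix:
            key = prefix + key  # Target model is wrapped in DataParallel.
        elif not wrapped and has_prefix:
            key = key[len(prefix):]  # Target model is the bare module.
        fixed[key] = value
    return fixed


# Placeholder values stand in for the weight tensors.
saved = OrderedDict([("conv.0.weight", 1), ("module.fc.bias", 2)])
print(list(match_prefix(saved, wrapped=True)))
print(list(match_prefix(saved, wrapped=False)))
```

Calling it with wrapped=False also covers the request above for saving a CUDA-trained model in a form that loads without DataParallel. Loading the state dict before wrapping the model is another option.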
Epoch: [70][48/48] Time 12.067 (9.480) Data 0.003 (0.019) Loss 0.2493 (6.2196)
Training Summary Epoch: [70] Average Loss 0.184
Validation Summary Epoch: [70] Average WER 25 Average CER 14
Learning rate annealed to: 0.000000
Traceback (most recent call last):
File "train.py", line 318, in <module>
main()
File "train.py", line 314, in main
torch.save(checkpoint(model, optimizer, args, len(labels)), args.final_model_path)
TypeError: checkpoint() takes at least 5 arguments (4 given)
To build on Ubuntu 17.04 using gcc-4.7, I had to add this include:
torch/lib/THPP/Tensor.hpp:#include <stdexcept>
It may not have been the best place, but at least you now know.
Otherwise, there are many pages of warnings and errors.
Also, sudo was required to install to /usr folders.
When I run "python train.py", an error appears: "Segmentation fault (core dumped)".
Epoch: [70][48/48] Time 0.377 (0.251) Data 0.002 (0.012) Loss 0.1478 (4.6092)
Training Summary Epoch: [70] Average Loss 0.095
Validation Summary Epoch: [70] Average WER 25 Average CER 11
Traceback (most recent call last):
File "train.py", line 324, in <module>
main()
File "train.py", line 306, in main
update='replace',
File "/usr/local/lib/python2.7/dist-packages/visdom/__init__.py", line 179, in result
return fn(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/visdom/__init__.py", line 603, in line
append=update == 'append', opts=opts)
File "/usr/local/lib/python2.7/dist-packages/visdom/__init__.py", line 179, in result
return fn(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/visdom/__init__.py", line 461, in updateTrace
assert Y.shape == X.shape, 'Y should be same size as X'
AssertionError: Y should be same size as X
What sorts of durations are folks using for injected noise files? Any ideas on whether it's likely to be problematic if the ~2000 files I'm using are all less than one second?
The loss always seems to be zero when I use train.py with --cuda:
Epoch: [1][1/14063] Time 1.228 (1.228) Data 0.105 (0.105) Loss 0.0000 (0.0000)
Epoch: [1][2/14063] Time 0.135 (0.681) Data 0.002 (0.053) Loss 0.0000 (0.0000)
Epoch: [1][3/14063] Time 0.140 (0.501) Data 0.002 (0.036) Loss 0.0000 (0.0000)
Epoch: [1][4/14063] Time 0.138 (0.410) Data 0.002 (0.028) Loss 0.0000 (0.0000)
Epoch: [1][5/14063] Time 0.140 (0.356) Data 0.002 (0.022) Loss 0.0000 (0.0000)
Epoch: [1][6/14063] Time 0.145 (0.321) Data 0.003 (0.019) Loss 0.0000 (0.0000)
Epoch: [1][7/14063] Time 0.142 (0.295) Data 0.000 (0.016) Loss 0.0000 (0.0000)
Epoch: [1][8/14063] Time 0.151 (0.277) Data 0.002 (0.015) Loss 0.0000 (0.0000)
Epoch: [1][9/14063] Time 0.154 (0.264) Data 0.002 (0.013) Loss 0.0000 (0.0000)
Epoch: [1][10/14063] Time 0.156 (0.253) Data 0.002 (0.012) Loss 0.0000 (0.0000)
The default train.py works normally:
Epoch: [1][1/14063] Time 2.203 (2.203) Data 0.100 (0.100) Loss 113.9308 (113.9308)
Epoch: [1][2/14063] Time 2.132 (2.167) Data 0.000 (0.050) Loss 130.0348 (121.9828)
Epoch: [1][3/14063] Time 2.235 (2.190) Data 0.002 (0.034) Loss 118.3163 (120.7607)
Epoch: [1][4/14063] Time 2.307 (2.219) Data 0.014 (0.029) Loss 101.6445 (115.9816)
Epoch: [1][5/14063] Time 2.482 (2.272) Data 0.020 (0.027) Loss 75.0004 (107.7854)
Epoch: [1][6/14063] Time 2.604 (2.327) Data 0.014 (0.025) Loss 62.9188 (100.3076)
What seems to be the problem? Could this be due to a misconfiguration on my side?
I was in the process of adding Tensorboard logging to the training process when I noticed that some of the gradients were None, which led me to investigate further.
To investigate, I passed a single batch of data through the model and ran a backward pass. I then iterated over the named parameters and checked for any None gradients. In doing so, I found this:
module.rnns.0.batch_norm.module.weight True
module.rnns.0.batch_norm.module.bias True
True indicates that they have requires_grad set to True.
It doesn't seem like these gradients are meant to be None, so I've raised an issue. Can someone explain why this would be the case?
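The check described above can be sketched as follows. This is a minimal version under stated assumptions: Param is a hypothetical stand-in that exposes the two attributes read here, and the parameter names and values are made up for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for a parameter; a real one exposes the same attributes.
Param = namedtuple("Param", ["requires_grad", "grad"])


def params_without_grad(named_params):
    """Return names of trainable parameters whose gradient is still None."""
    return [name for name, p in named_params
            if p.requires_grad and p.grad is None]


named = [
    ("module.conv.0.weight", Param(True, 0.5)),
    ("module.rnns.0.batch_norm.module.weight", Param(True, None)),
    ("module.frozen.bias", Param(False, None)),
]
print(params_without_grad(named))
```

With a PyTorch model, the same function can be given model.named_parameters() after the backward pass.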