coqui-ai / stt Goto Github PK

🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.

License: Mozilla Public License 2.0

Shell 7.61% Python 20.23% Makefile 0.72% Ruby 0.01% Starlark 0.41% C++ 37.03% C 8.04% SWIG 0.34% C# 2.12% Java 0.94% CMake 0.74% JavaScript 0.24% TypeScript 0.46% Swift 1.17% Awk 0.08% Jupyter Notebook 19.86%

stt speech-to-text tensorflow deep-learning automatic-speech-recognition asr voice-recognition speech-recognition speech-recognizer speech-recognition-api

stt's Introduction

Note

This project is no longer actively maintained, and we have stopped hosting the online Model Zoo. We've seen focus shift towards newer STT models such as [Whisper](https://github.com/openai/whisper), and have ourselves focused on [Coqui TTS](https://github.com/coqui-ai/TTS) and [Coqui Studio](https://coqui.ai/).

The models will remain available in [the releases of the coqui-ai/STT-models repo](https://github.com/coqui-ai/STT-models/releases).

👉 Subscribe to 🐸Coqui's Newsletter

Coqui STT (🐸STT) is a fast, open-source, multi-platform, deep-learning toolkit for training and deploying speech-to-text models. 🐸STT is battle tested in both production and research 🚀

🐸STT features

High-quality pre-trained STT model.
Efficient training pipeline with Multi-GPU support.
Streaming inference.
Multiple possible transcripts, each with an associated confidence score.
Real-time inference.
Small-footprint acoustic model.
Bindings for various programming languages.

Quickstart

Where to Ask Questions

Type	Link
🚨 Bug Reports	Github Issue Tracker
🎁 Feature Requests & Ideas	Github Issue Tracker
❔ Questions	Github Discussions
💬 General Discussion	Github Discussions or Gitter Room

Links & Resources

Type	Link
📰 Documentation	stt.readthedocs.io
🚀 Latest release with pre-trained models	see the latest release on GitHub
🤝 Contribution Guidelines	CONTRIBUTING.rst

stt's People

Contributors

Stargazers

Watchers

Forkers

jrmeyer rutujasurve94 erogol gedeon-m-gedus gedeonmuhawenayo theolivenbaum whitefu knut0815 databill86 brightflysolutions kumar-asista sadam1195 stjordanis felipeescallon kokizzu mathiasjakobsen nanonabla ilnarselimcan oziee li492549979 orton98 twitch-girl aigizk erksch cmftall purplesparkle zaouk oytunturk nikolasmelui andyweiqiu illusiveman196 solat110 jeremiahrose bastbnl xinsuinizhuan shihuaxing weipin will-rice proger mishav78 tylerhartwig legion2 kju196 newagemob cyberflamego hjortnaes axchanda vpnry c01n73l phat-vtcf rexiome spyrettaleivaditi harikalarkutusu hongwen-sun beatea109 tajcore juliandarley comodoro chamathkb vincentfretin japita-se aya-aljafari hmen97 danmasanii combadge magicalvoice mineraldragon cxz jonorthwash vnsavitri marcus-arcadius vinayasathyanarayana ftyers muharremtac sblarg bernardohenz suryatmodulus techwoes pawel838383 normano hanoi arcada-uas makinglong dexterp37 wasertech vtalker bahlinc-dev jina-ai rsd108 vivinmeth64 maxmax2016 heldib ajayrajpurohit a-d-dasare forrestbakeriv blenderviking akbarfa49 samchan2022 aseer1109 droundy

stt's Issues

Bug: v0.10.0-alpha.14 stt binary does not build in Dockerfile.build: undefined reference to tflite::DefaultErrorReporter()

Describe the bug
When attempting to build the stt binary in version v0.10.0-alpha.14, the linker fails with the following errors:

c++   -std=c++11 -o stt -I/STT/sox-build/include client.cc  -Wl,--no-as-needed -Wl,-rpath,\$ORIGIN -L/STT/tensorflow/bazel-bin/native_client -L/STT/tensorflow/bazel-bin/tensorflow/lite  -lstt -lkenlm -ltensorflowlite  -L/STT/sox-build/lib -lsox
/STT/tensorflow/bazel-bin/native_client/libstt.so: undefined reference to `tflite::DefaultErrorReporter()'
/STT/tensorflow/bazel-bin/native_client/libstt.so: undefined reference to `tflite::GetString(TfLiteTensor const*, int)'
/STT/tensorflow/bazel-bin/native_client/libstt.so: undefined reference to `vtable for tflite::MutableOpResolver'
/STT/tensorflow/bazel-bin/native_client/libstt.so: undefined reference to `tflite::impl::Interpreter::SetNumThreads(int)'
/STT/tensorflow/bazel-bin/native_client/libstt.so: undefined reference to `tflite::impl::Interpreter::SetExecutionPlan(std::vector<int, std::allocator<int> > const&)'
/STT/tensorflow/bazel-bin/native_client/libstt.so: undefined reference to `tflite::impl::Interpreter::Invoke()'
/STT/tensorflow/bazel-bin/native_client/libstt.so: undefined reference to `tflite::impl::Interpreter::~Interpreter()'
/STT/tensorflow/bazel-bin/native_client/libstt.so: undefined reference to `tflite::impl::InterpreterBuilder::InterpreterBuilder(tflite::FlatBufferModel const&, tflite::OpResolver const&)'
/STT/tensorflow/bazel-bin/native_client/libstt.so: undefined reference to `tflite::impl::Interpreter::ModifyGraphWithDelegate(TfLiteDelegate*)'
/STT/tensorflow/bazel-bin/native_client/libstt.so: undefined reference to `tflite::impl::InterpreterBuilder::operator()(std::unique_ptr<tflite::impl::Interpreter, std::default_delete<tflite::impl::Interpreter> >*)'
/STT/tensorflow/bazel-bin/native_client/libstt.so: undefined reference to `tflite::impl::Interpreter::AllocateTensors()'
/STT/tensorflow/bazel-bin/native_client/libstt.so: undefined reference to `tflite::FlatBufferModel::~FlatBufferModel()'
/STT/tensorflow/bazel-bin/native_client/libstt.so: undefined reference to `tflite::impl::InterpreterBuilder::~InterpreterBuilder()'
/STT/tensorflow/bazel-bin/native_client/libstt.so: undefined reference to `tflite::ops::builtin::BuiltinOpResolver::BuiltinOpResolver()'
/STT/tensorflow/bazel-bin/native_client/libstt.so: undefined reference to `tflite::FlatBufferModel::BuildFromFile(char const*, tflite::ErrorReporter*)'
collect2: error: ld returned 1 exit status
Makefile:22: recipe for target 'stt' failed
make: *** [stt] Error 1
The command '/bin/sh -c make NUM_PROCESSES=$(nproc) stt' returned a non-zero code: 2

To Reproduce
Download STT source.
From STT dir, run
docker build -f Dockerfile.build .

Expected behavior
The Dockerfile should build without errors.
Note I am also getting the same errors in my own Dockerfile (that worked with past versions) and in my custom Yocto build.

Environment (please complete the following information):

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu Linux x86_64
TensorFlow installed from (our builds, or upstream TensorFlow): As per Dockerfile.build
TensorFlow version (use command below): As per Dockerfile.build
Python version: As per Dockerfile.build
Bazel version (if compiling from source): As per Dockerfile.build
GCC/Compiler version (if compiling from source): As per Dockerfile.build
CUDA/cuDNN version: As per Dockerfile.build
GPU model and memory:
Exact command to reproduce: docker build -f Dockerfile.build .

Support Chinese stt?

Bug: When doing transfer learning model throws shapes and refuses to test

root@2335bec676a7:/code# bash -x train.sh ; bash -x test.sh ; bash -x lm.sh ; bash -x export.sh 
+ LLENGUA=el
+ mkdir -p checkpoints
+ TF_CUDNN_RESET_RND_GEN_STATE=1
+ python3 train.py --show_progressbar True --train_cudnn --epochs 25 --es_epochs 3 --max_to_keep 3 --drop_source_layers 2 --train_batch_size 8 --test_batch_size 8 --dev_batch_size 8 --alphabet_config_path /media/cv-corpus-6.1-2020-12-11/el/alphabet.txt --save_checkpoint_dir checkpoints --load_checkpoint_dir deepspeech-0.9.3-checkpoint/ --train_files /media/cv-corpus-6.1-2020-12-11/el/clips/train.csv --dev_files /media/cv-corpus-6.1-2020-12-11/el/clips/dev.csv --test_files /media/cv-corpus-6.1-2020-12-11/el/clips/test.csv
W WARNING: You specified different values for --load_checkpoint_dir and --save_checkpoint_dir, but you are running training and testing in a single invocation. The testing step will respect --load_checkpoint_dir, and thus WILL NOT TEST THE CHECKPOINT CREATED BY THE TRAINING STEP. Train and test in two separate invocations, specifying the correct --load_checkpoint_dir in both cases, or use the same location for loading and saving.
I Loading best validating checkpoint from deepspeech-0.9.3-checkpoint/best_dev-1466475
I Loading variable from checkpoint: beta1_power
I Loading variable from checkpoint: beta2_power
I Loading variable from checkpoint: cudnn_lstm/opaque_kernel
I Loading variable from checkpoint: cudnn_lstm/opaque_kernel/Adam
I Loading variable from checkpoint: cudnn_lstm/opaque_kernel/Adam_1
I Loading variable from checkpoint: global_step
I Loading variable from checkpoint: layer_1/bias
I Loading variable from checkpoint: layer_1/bias/Adam
I Loading variable from checkpoint: layer_1/bias/Adam_1
I Loading variable from checkpoint: layer_1/weights
I Loading variable from checkpoint: layer_1/weights/Adam
I Loading variable from checkpoint: layer_1/weights/Adam_1
I Loading variable from checkpoint: layer_2/bias
I Loading variable from checkpoint: layer_2/bias/Adam
I Loading variable from checkpoint: layer_2/bias/Adam_1
I Loading variable from checkpoint: layer_2/weights
I Loading variable from checkpoint: layer_2/weights/Adam
I Loading variable from checkpoint: layer_2/weights/Adam_1
I Loading variable from checkpoint: layer_3/bias
I Loading variable from checkpoint: layer_3/bias/Adam
I Loading variable from checkpoint: layer_3/bias/Adam_1
I Loading variable from checkpoint: layer_3/weights
I Loading variable from checkpoint: layer_3/weights/Adam
I Loading variable from checkpoint: layer_3/weights/Adam_1
I Loading variable from checkpoint: learning_rate
I Initializing variable: layer_5/bias
I Initializing variable: layer_5/bias/Adam
I Initializing variable: layer_5/bias/Adam_1
I Initializing variable: layer_5/weights
I Initializing variable: layer_5/weights/Adam
I Initializing variable: layer_5/weights/Adam_1
I Initializing variable: layer_6/bias
I Initializing variable: layer_6/bias/Adam
I Initializing variable: layer_6/bias/Adam_1
I Initializing variable: layer_6/weights
I Initializing variable: layer_6/weights/Adam
I Initializing variable: layer_6/weights/Adam_1
I STARTING Optimization
Epoch 0 |   Training | Elapsed Time: 0:01:58 | Steps: 289 | Loss: 150.440151                                                                                 
Epoch 0 | Validation | Elapsed Time: 0:00:27 | Steps: 176 | Loss: 118.190063 | Dataset: /media/cv-corpus-6.1-2020-12-11/el/clips/dev.csv                     
I Saved new best validating model with loss 118.190063 to: checkpoints/best_dev-1466764
--------------------------------------------------------------------------------
Epoch 1 |   Training | Elapsed Time: 0:01:55 | Steps: 289 | Loss: 112.087529                                                                                 
Epoch 1 | Validation | Elapsed Time: 0:00:27 | Steps: 176 | Loss: 108.616011 | Dataset: /media/cv-corpus-6.1-2020-12-11/el/clips/dev.csv                     
I Saved new best validating model with loss 108.616011 to: checkpoints/best_dev-1467053
--------------------------------------------------------------------------------
Epoch 2 |   Training | Elapsed Time: 0:01:55 | Steps: 289 | Loss: 99.295714                                                                                  
Epoch 2 | Validation | Elapsed Time: 0:00:27 | Steps: 176 | Loss: 94.945360 | Dataset: /media/cv-corpus-6.1-2020-12-11/el/clips/dev.csv                      
I Saved new best validating model with loss 94.945360 to: checkpoints/best_dev-1467342
--------------------------------------------------------------------------------
Epoch 3 |   Training | Elapsed Time: 0:01:55 | Steps: 289 | Loss: 83.251152                                                                                  
Epoch 3 | Validation | Elapsed Time: 0:00:27 | Steps: 176 | Loss: 80.855902 | Dataset: /media/cv-corpus-6.1-2020-12-11/el/clips/dev.csv                      
I Saved new best validating model with loss 80.855902 to: checkpoints/best_dev-1467631
--------------------------------------------------------------------------------
Epoch 4 |   Training | Elapsed Time: 0:01:55 | Steps: 289 | Loss: 69.715410                                                                                  
Epoch 4 | Validation | Elapsed Time: 0:00:27 | Steps: 176 | Loss: 71.249459 | Dataset: /media/cv-corpus-6.1-2020-12-11/el/clips/dev.csv                      
I Saved new best validating model with loss 71.249459 to: checkpoints/best_dev-1467920
--------------------------------------------------------------------------------
Epoch 5 |   Training | Elapsed Time: 0:01:55 | Steps: 289 | Loss: 60.592198                                                                                  
Epoch 5 | Validation | Elapsed Time: 0:00:27 | Steps: 176 | Loss: 65.730284 | Dataset: /media/cv-corpus-6.1-2020-12-11/el/clips/dev.csv                      
I Saved new best validating model with loss 65.730284 to: checkpoints/best_dev-1468209
--------------------------------------------------------------------------------
Epoch 6 |   Training | Elapsed Time: 0:01:55 | Steps: 289 | Loss: 54.503149                                                                                  
Epoch 6 | Validation | Elapsed Time: 0:00:27 | Steps: 176 | Loss: 62.088597 | Dataset: /media/cv-corpus-6.1-2020-12-11/el/clips/dev.csv                      
I Saved new best validating model with loss 62.088597 to: checkpoints/best_dev-1468498
--------------------------------------------------------------------------------
Epoch 7 |   Training | Elapsed Time: 0:01:55 | Steps: 289 | Loss: 50.214056                                                                                  
Epoch 7 | Validation | Elapsed Time: 0:00:27 | Steps: 176 | Loss: 59.742852 | Dataset: /media/cv-corpus-6.1-2020-12-11/el/clips/dev.csv                      
I Saved new best validating model with loss 59.742852 to: checkpoints/best_dev-1468787
--------------------------------------------------------------------------------
Epoch 8 |   Training | Elapsed Time: 0:01:55 | Steps: 289 | Loss: 46.843261                                                                                  
Epoch 8 | Validation | Elapsed Time: 0:00:27 | Steps: 176 | Loss: 58.021387 | Dataset: /media/cv-corpus-6.1-2020-12-11/el/clips/dev.csv                      
I Saved new best validating model with loss 58.021387 to: checkpoints/best_dev-1469076
--------------------------------------------------------------------------------
Epoch 9 |   Training | Elapsed Time: 0:01:55 | Steps: 289 | Loss: 44.179930                                                                                  
Epoch 9 | Validation | Elapsed Time: 0:00:27 | Steps: 176 | Loss: 56.870370 | Dataset: /media/cv-corpus-6.1-2020-12-11/el/clips/dev.csv                      
I Saved new best validating model with loss 56.870370 to: checkpoints/best_dev-1469365
--------------------------------------------------------------------------------
Epoch 10 |   Training | Elapsed Time: 0:01:55 | Steps: 289 | Loss: 41.892102                                                                                 
Epoch 10 | Validation | Elapsed Time: 0:00:27 | Steps: 176 | Loss: 55.820707 | Dataset: /media/cv-corpus-6.1-2020-12-11/el/clips/dev.csv                     
I Saved new best validating model with loss 55.820707 to: checkpoints/best_dev-1469654
--------------------------------------------------------------------------------
Epoch 11 |   Training | Elapsed Time: 0:01:55 | Steps: 289 | Loss: 39.862837                                                                                 
Epoch 11 | Validation | Elapsed Time: 0:00:27 | Steps: 176 | Loss: 54.944336 | Dataset: /media/cv-corpus-6.1-2020-12-11/el/clips/dev.csv                     
I Saved new best validating model with loss 54.944336 to: checkpoints/best_dev-1469943
--------------------------------------------------------------------------------
Epoch 12 |   Training | Elapsed Time: 0:01:55 | Steps: 289 | Loss: 38.057278                                                                                 
Epoch 12 | Validation | Elapsed Time: 0:00:27 | Steps: 176 | Loss: 54.388507 | Dataset: /media/cv-corpus-6.1-2020-12-11/el/clips/dev.csv                     
I Saved new best validating model with loss 54.388507 to: checkpoints/best_dev-1470232
--------------------------------------------------------------------------------
Epoch 13 |   Training | Elapsed Time: 0:01:55 | Steps: 289 | Loss: 36.473011                                                                                 
Epoch 13 | Validation | Elapsed Time: 0:00:27 | Steps: 176 | Loss: 53.870240 | Dataset: /media/cv-corpus-6.1-2020-12-11/el/clips/dev.csv                     
I Saved new best validating model with loss 53.870240 to: checkpoints/best_dev-1470521
--------------------------------------------------------------------------------
Epoch 14 |   Training | Elapsed Time: 0:01:55 | Steps: 289 | Loss: 34.977883                                                                                 
Epoch 14 | Validation | Elapsed Time: 0:00:27 | Steps: 176 | Loss: 53.513870 | Dataset: /media/cv-corpus-6.1-2020-12-11/el/clips/dev.csv                     
I Saved new best validating model with loss 53.513870 to: checkpoints/best_dev-1470810
--------------------------------------------------------------------------------
Epoch 15 |   Training | Elapsed Time: 0:01:55 | Steps: 289 | Loss: 33.688198                                                                                 
Epoch 15 | Validation | Elapsed Time: 0:00:27 | Steps: 176 | Loss: 53.115888 | Dataset: /media/cv-corpus-6.1-2020-12-11/el/clips/dev.csv                     
I Saved new best validating model with loss 53.115888 to: checkpoints/best_dev-1471099
--------------------------------------------------------------------------------
Epoch 16 |   Training | Elapsed Time: 0:01:55 | Steps: 289 | Loss: 32.372648                                                                                 
Epoch 16 | Validation | Elapsed Time: 0:00:27 | Steps: 176 | Loss: 52.871868 | Dataset: /media/cv-corpus-6.1-2020-12-11/el/clips/dev.csv                     
I Saved new best validating model with loss 52.871868 to: checkpoints/best_dev-1471388
--------------------------------------------------------------------------------
Epoch 17 |   Training | Elapsed Time: 0:01:55 | Steps: 289 | Loss: 31.218126                                                                                 
Epoch 17 | Validation | Elapsed Time: 0:00:27 | Steps: 176 | Loss: 52.653315 | Dataset: /media/cv-corpus-6.1-2020-12-11/el/clips/dev.csv                     
I Saved new best validating model with loss 52.653315 to: checkpoints/best_dev-1471677
--------------------------------------------------------------------------------
Epoch 18 |   Training | Elapsed Time: 0:01:55 | Steps: 289 | Loss: 30.027386                                                                                 
Epoch 18 | Validation | Elapsed Time: 0:00:27 | Steps: 176 | Loss: 52.504788 | Dataset: /media/cv-corpus-6.1-2020-12-11/el/clips/dev.csv                     
I Saved new best validating model with loss 52.504788 to: checkpoints/best_dev-1471966
--------------------------------------------------------------------------------
Epoch 19 |   Training | Elapsed Time: 0:01:55 | Steps: 289 | Loss: 28.985256                                                                                 
Epoch 19 | Validation | Elapsed Time: 0:00:27 | Steps: 176 | Loss: 52.341194 | Dataset: /media/cv-corpus-6.1-2020-12-11/el/clips/dev.csv                     
I Saved new best validating model with loss 52.341194 to: checkpoints/best_dev-1472255
--------------------------------------------------------------------------------
Epoch 20 |   Training | Elapsed Time: 0:01:55 | Steps: 289 | Loss: 28.030723                                                                                 
Epoch 20 | Validation | Elapsed Time: 0:00:27 | Steps: 176 | Loss: 52.125567 | Dataset: /media/cv-corpus-6.1-2020-12-11/el/clips/dev.csv                     
I Saved new best validating model with loss 52.125567 to: checkpoints/best_dev-1472544
--------------------------------------------------------------------------------
Epoch 21 |   Training | Elapsed Time: 0:01:55 | Steps: 289 | Loss: 27.094344                                                                                 
Epoch 21 | Validation | Elapsed Time: 0:00:27 | Steps: 176 | Loss: 52.197986 | Dataset: /media/cv-corpus-6.1-2020-12-11/el/clips/dev.csv                     
--------------------------------------------------------------------------------
Epoch 22 |   Training | Elapsed Time: 0:01:55 | Steps: 289 | Loss: 26.126977                                                                                 
Epoch 22 | Validation | Elapsed Time: 0:00:27 | Steps: 176 | Loss: 52.169392 | Dataset: /media/cv-corpus-6.1-2020-12-11/el/clips/dev.csv                     
--------------------------------------------------------------------------------
Epoch 23 |   Training | Elapsed Time: 0:01:55 | Steps: 289 | Loss: 25.329971                                                                                 
Epoch 23 | Validation | Elapsed Time: 0:00:27 | Steps: 176 | Loss: 51.964646 | Dataset: /media/cv-corpus-6.1-2020-12-11/el/clips/dev.csv                     
I Saved new best validating model with loss 51.964646 to: checkpoints/best_dev-1473411
--------------------------------------------------------------------------------
Epoch 24 |   Training | Elapsed Time: 0:01:55 | Steps: 289 | Loss: 24.420492                                                                                 
Epoch 24 | Validation | Elapsed Time: 0:00:27 | Steps: 176 | Loss: 52.078386 | Dataset: /media/cv-corpus-6.1-2020-12-11/el/clips/dev.csv                     
--------------------------------------------------------------------------------
I FINISHED optimization in 1:00:13.756332
I Loading best validating checkpoint from deepspeech-0.9.3-checkpoint/best_dev-1466475
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel
I Loading variable from checkpoint: global_step
I Loading variable from checkpoint: layer_1/bias
I Loading variable from checkpoint: layer_1/weights
I Loading variable from checkpoint: layer_2/bias
I Loading variable from checkpoint: layer_2/weights
I Loading variable from checkpoint: layer_3/bias
I Loading variable from checkpoint: layer_3/weights
I Loading variable from checkpoint: layer_5/bias
I Loading variable from checkpoint: layer_5/weights
I Loading variable from checkpoint: layer_6/bias
Traceback (most recent call last):
  File "train.py", line 12, in <module>
    ds_train.run_script()
  File "/code/training/coqui_stt_training/train.py", line 986, in run_script
    absl.app.run(main)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/code/training/coqui_stt_training/train.py", line 962, in main
    test()
  File "/code/training/coqui_stt_training/train.py", line 682, in test
    samples = evaluate(FLAGS.test_files.split(','), create_model)
  File "/code/training/coqui_stt_training/evaluate.py", line 87, in evaluate
    load_graph_for_evaluation(session)
  File "/code/training/coqui_stt_training/util/checkpoints.py", line 151, in load_graph_for_evaluation
    _load_or_init_impl(session, methods, allow_drop_layers=False)
  File "/code/training/coqui_stt_training/util/checkpoints.py", line 98, in _load_or_init_impl
    return _load_checkpoint(session, ckpt_path, allow_drop_layers, allow_lr_init=allow_lr_init)
  File "/code/training/coqui_stt_training/util/checkpoints.py", line 71, in _load_checkpoint
    v.load(ckpt.get_tensor(v.op.name), session=session)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/util/deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/variables.py", line 1033, in load
    session.run(self.initializer, {self.initializer.inputs[1]: value})
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1156, in _run
    (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (29,) for Tensor 'layer_6/bias/Initializer/zeros:0', which has shape '(37,)'

Feature request: Scorer dealing with OOV

Hi,

me and my team use STT, for Brazilian Portuguese, and we were having problems when dealing with consecutive OOV (out-of-vocabulary) words. The problem was that, when receiving two or more OOV words, the decoder enters in a state that stop accepting any other word.

After some experimentation, I've taken out the return of OOV_SCORE (in https://github.com/coqui-ai/STT/blob/main/native_client/ctcdecode/scorer.cpp#L247), but adding a penalization together with the BaseScore as follows:

    // encounter OOV
    // if (word_index == lm::kUNK) {
    //   return OOV_SCORE;
    // }

    cond_prob = language_model_->BaseScore(in_state, word_index, out_state);
    if (word_index == lm::kUNK) {
       cond_prob-=10;
    }

I believe there could be a better solution for this, thus I am opening this issue for discussing a solution.

As your LM is built over a huuge corpus, I suppose that your models do not suffer from OOV words, but I believe that many people may have problems with OOV words with LMs built over smaller corpus.

Website "Quickstart: Deployment" instructions incorrect

Following the Quickstart: Deployment instructions results in an error.

To Reproduce
-Install WSL 2 using Debian on Windows.
-Follow the "Quickstart: Deployment" instructions at: https://stt.readthedocs.io/en/latest/
-When executing the last step "# Transcribe an audio file", this error is thrown:

Loading model from file coqui-stt-0.9.3-models.pbmm
TensorFlow: v2.3.0-6-g23ad988fcde
Coqui STT: v0.10.0-alpha.14-0-g07ed4176
ERROR: Model provided has model identifier 'u/3', should be 'TFL3'

Error at reading model file coqui-stt-0.9.3-models.pbmm
Traceback (most recent call last):
File "/home/bitbarrel/venv-stt/bin/stt", line 8, in
sys.exit(main())
File "/home/bitbarrel/venv-stt/lib/python3.8/site-packages/stt/client.py", line 148, in main
ds = Model(args.model)
File "/home/bitbarrel/venv-stt/lib/python3.8/site-packages/stt/init.py", line 40, in init
raise RuntimeError(
RuntimeError: CreateModel failed with 'Failed to initialize memory mapped model.' (0x3000)

Edit:
It works when tflite is used instead of pbmm, so it looks like the wrong model was downloaded. How to fix this?

Feature request: Include compiled generate_scorer_package in official docker image

Is your feature request related to a problem? Please describe.
When training a language model in an official docker image obtained via $ docker pull ghcr.io/coqui-ai/stt-train, it is possible to train a language model with generate_lm.py, but it is not possible to generate a scorer package via ./generate_scorer_package.

Describe the solution you'd like
I'd like a compiled generate_scorer_package binary to be included in the docker image.

Describe alternatives you've considered
I usually just wget the compiled version from the linux native_clients on github releases

Additional context
NA

Feature request: Link all assets / repos to the docs

model zoo
STT-examples
model-manager
open-speech-corpora

Bug: Node version - npm install, empty index.js in node_modules/stt

Describe the bug
In the Node version when installing via npm install stt (v 0.10.0-alpha.4) and running the code example, you get an error

TypeError: Ds.Model is not a constructor

But most importantly if you look inside node_modules/stt/index.js the file is empty.

To Reproduce

create a new node project with npm init
do npm install stt
inspect node_modules/stt/index.js
See error - index.js is empty

Expected behavior
To be able to install stt via npm, import/require the module and use it etc..

Environment (please complete the following information):
Mac OS X

Additional context
Where's the code that is installed via npm? is it in one of the folders in this repo https://github.com/coqui-ai/STT?

Feature request: jupyter notebooks

Add jupyter notebooks for:

hotword boosting
export a model
train a language model
train with JSON config file

Unity support?

Feature request: Output training logs to a persistent destination by default

Is your feature request related to a problem? Please describe.
Yes. I find myself saving training logs to a text file (what I consider) a hacky style. On a server, using the docker training image, I find myself always doing this:

$ docker run train.py &> /my/local/file.log &

This seems non-obvious to me, and I see people training models in docker containers, and then they're surprised when the training logs and loss data is gone. Saving the Loss curves to a text file makes viewing the training progress much easier on a remote server than using tensorboard, for example.

Describe the solution you'd like
I'd like to be able to save training loss information to a text file with a command-line flag such as --training_log_file /path/to/file.txt

Describe alternatives you've considered
Redirecting output from stdout to a text file (as mentioned above). Inspecting tensorboard graphs (but I haven't tried this on a remote server, and afaik you can't access them during training.) I like to keep an eye on training curves during training with gnuplot, by piping a single col of (train or dev) losses into | gnuplot -p -e "set terminal dumb $(tput cols) $(tput lines); plot '-' using 1"

Additional context
None

Use manylinux2014 to build Python wheels

This would allow us to publish aarch64 and armv7 wheels, as well as avoid weird binary incompatibility issues due to the hacking of the platform tag we currently do. Caveat is that installation will fail if the user does not upgrade pip, but at this point I think we instruct people to update pip on every piece of documentation that does a pip install.

Bug: Training Docker image AttributeError: module 'tensorflow' has no attribute 'lite'

Describe the bug
When training or converting a model to TFLite with latest Docker image, it fails with the error:

Traceback (most recent call last):
  File "train.py", line 12, in <module>
    stt_train.main()
  File "/code/training/coqui_stt_training/train.py", line 1266, in main
    export()
  File "/code/training/coqui_stt_training/train.py", line 1107, in export
    converter = tf.lite.TFLiteConverter(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/util/module_wrapper.py", line 193, in __getattr__
    attr = getattr(self._tfmw_wrapped_module, name)
AttributeError: module 'tensorflow' has no attribute 'lite'

To Reproduce
I am pulling the Docker image and adding Sox as follows:

# Get the latest 🐸STT image
FROM ghcr.io/coqui-ai/stt-train:latest

ENV DEBIAN_FRONTEND=noninteractive

# Install nano editor
RUN apt-get -y update && apt-get install -y nano

# Install sox for inference and for processing Common Voice data
RUN apt-get -y update && apt-get install -y sox

Then I am attempting to convert the yesno model checkpoints to TFLite as follows (note - I also got this error with the same Docker image when training a model from scratch)

#!/bin/bash
set -x

docker build -t stt-image .

docker run \
    --rm \
    -it \
    --entrypoint /bin/bash \
    --name stt-train \
    --gpus all \
    --mount type=bind,source="$(pwd)"/stt-data,target=/code/stt-data \
    stt-image \
    -c "cd /code && \
        python3 train.py \
          --checkpoint_dir /code/stt-data/checkpoints \
          --export_dir /code/stt-data/exported-model \
          --n_hidden 64 \
          --alphabet_config_path /code/stt-data/alphabet.txt \
          --export_tflite=true "

Expected behavior
The model should train / convert / export without an error.

Environment (please complete the following information):

Ubuntu 20.04
Dependency versions as per Docker image ghcr.io/coqui-ai/stt-train latest 5adb1e5d8af5
CUDA Version: 11.2
GeForce MX150 2GB

Additional context
I believe this is probably an issue with Tensorflow versions, and I am going to try adding .contrib to the offending lines in train.py

Feature request: Allow random batch order during training

Is your feature request related to a problem? Please describe.
you can only train batches in order of length of the batch (i.e. audio length), or reverse audio length

Describe the solution you'd like
I'd like to have more control over the order of batches, in particular random ordering

Bug: lm_optimizer.py still calls utils.flags

From the gitter channel <https://gitter.im/coqui-ai/STT>

While running this

python3 lm_optimizer.py --test_files stt-data/cv-corpus-7.0-2021-07-21/lg/clips/test.csv --checkpoint_dir stt-data/best_dev-1594273

I get

Traceback (most recent call last): File "lm_optimizer.py", line 15, in <module> from coqui_stt_training.util.flags import FLAGS, create_flags ModuleNotFoundError: No module named 'coqui_stt_training.util.flags'

Building Java bindings from source needs documentation and build system improvements

See for example #1844 (comment)

Bug: it writes text without dot and comma

Describe the bug
it writes text without dot and comma

To Reproduce
Steps to reproduce the behavior:

tts --text "To help with the large amounts of pull requests, we would appreciate your reviews of other pull requests, especially simple package updates. Just leave a comment describing what you have tested in the relevant package/service. Reviewing helps to reduce the average time-to-merge for everyone. Thanks a lot if you do!"

[nix-shell:~]$ stt --model coqui-stt-0.9.3-models.pbmm --scorer coqui-stt-0.9.3-models.scorer --audio '/home/davidak/Downloads/tts-0.0.14.wav' 
TensorFlow: v2.3.0-6-g23ad988fcde
 Coqui STT: v0.10.0-alpha.4-74-g49cdf7a6
2021-05-21 23:11:10.956472: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
rate: rate clipped 1 samples; decrease volume?
to help with the large amounts of pull requests we would appreciate your reviews of other poll requests especially simple package up dates just leave a comment describing what you have tested in the relevant packages service reviewing helps to reduce the average time to merge for every one thanks a lot if you do

Expected behavior
In the best case, it would be exactly the same as the input here.

I also tried with spoken words from me and it ignored pauses and intonation.

Environment (please complete the following information):

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): NixOS 21.05pre289526.7a1fbc38a4b
TensorFlow installed from (our builds, or upstream TensorFlow):
TensorFlow version (use command below): v2.3.0-6-g23ad988fcde
Python version:
Bazel version (if compiling from source):
GCC/Compiler version (if compiling from source):
CUDA/cuDNN version:
GPU model and memory:
Exact command to reproduce:

Feature request: Better handling of OOM from large batch sizes

Is your feature request related to a problem? Please describe.

When training an STT model on some data set with some GPU(s), it's impossible to know the largest possible batchsize ahead of time. So, the user must do some search and trial and error to find the largest possible batch size. The trial and error involves setting a batchsize and starting a training run with --reverse_train --reverse_dev --reverse_test. If the user hits OOM errors, then they can set a lower batchsize and try again.

This process is sub-optimal for a few reasons:

The OOM errors are not obvious. The error usually take the form of: (1) Internal: Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 2048, 2048, 1, 195, 128, 2048] (exact values will vary). As such, new users hit this error and don't realize this is an OOM.
The error may occur on train and/or dev and/or test, and this takes a long time. The user only knows that train and dev batchsizes are set correctly after a full epoch of both train and dev. The user only knows that the test batchsize is correct at the end of training. However, only the single largest batch of train and dev and test is needed. Only these three batches matter, but currently the user must try all batches in all datasets.
Most users are unaware of the flags --reverse_train --reverse_test --reverse_dev

Describe the solution you'd like
I would like a separate script to quickly find the largest possible batchsize for train and test and dev. Something like get_batchsize.py

Describe alternatives you've considered
manually sorting data and running a single epoch on a subset of largest samples in train/test/dev

Additional context
This error comes up often on Gitter

Copy alphabet file into checkpoint directory to make it harder to share checkpoints without alphabets

Bug: System.AccessViolationException on IntermediateDecode from .NET

Describe the bug
(As I understand the native_client for .Net is still 'DeepSpeech'; Please correct me if I am wrong.)
I am using DeepSpeech for inference from microphone stream captured via CSCore audio module. I have custom code for VAD and get Intermediate decoding done to get sentence wise live transcription.

Models: 9.0.3 Pre-Trained English Audio Model and custom Scorers with the same hyper-parameters as the Pre-Trained Scorer.

This works but at random times I get the following "unhandled" exceptions from the .so file.
StackOverflowException
from objDeepSpeech.FeedAudioContent(objStream, buffers, Convert.ToUInt32(buffers.Length));
or
System.AccessViolationException: 'Attempted to read or write protected memory. This is often an indication that other memory is corrupt.'
from objDeepSpeech.IntermediateDecodeWithMetadata(objStream, 1);

The errors originate from within libdeepspeech.so, so I am not able to debug any further. Any help is much appreciated. Thanks.

To Reproduce
Steps to reproduce the behavior:

Use CSCore to listen to Mic
Call objDeepSpeech.IntermediateDecodeWithMetadata(objStream, 1); occasionally - (VAD or just randomly)
Between 30 secs to 2 mins, the call to decode fails with AccessViolationException and is not handle-able as its from the unmanaged c[++] compiled code from the .so file.

Expected behavior

For the calls to keep working indefinitely as long as the system resources allow it.

Environment (please complete the following information):

OS Platform and Distribution (e.g., Linux Ubuntu 16.04):Windows 10
TensorFlow installed from (our builds, or upstream TensorFlow): NA
TensorFlow version (use command below): NA
Python version: NA
Bazel version (if compiling from source): NA
GCC/Compiler version (if compiling from source): NA
CUDA/cuDNN version: NA
GPU model and memory: NA
Exact command to reproduce: objDeepSpeech.IntermediateDecodeWithMetadata(objStream, 1);

Additional context
No failure is seen when using NAudio but STT accuracy is a lot worser than when using CSCore for the same audio.

Can NAudio be made better instead for Intermediate Decoding?
Is any other language implicitly better at this than C#?

Bug: Specified as "raspberry 4 upwards" but no armv71 .whl on pypi

The .whl file DOES exist in the latest pre-release files! So maybe not a real bug anymore, but already fixed?

Describe the bug
The documentation talks about the project being runable on raspberry pi 4 or higher, but there is no .whl for ARMv7 at the moment.

To Reproduce
Steps to reproduce the behavior:

Run the following command 'pip3 install stt-tflite'
Be on a raspi4 while doing step 1
See error

Expected behavior
stt-tflite should have an arm compatible wheel
(after looking at the supported systems - stt should not have the arm .whl)

Environment (please complete the following information):

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Raspbian GNU/Linux 10 (buster) | 5.10.17-v7l+
Python version: cp37m

Additional context
I made a successful switch from deepspeech to coqui with a "manual" setup from: https://github.com/coqui-ai/STT/releases/download/v0.10.0-alpha.6/stt_tflite-0.10.0a6-cp37-cp37m-linux_armv7l.whl - so it is working, just the wheel is mising!
WIP implementation: https://github.com/project-alice-assistant/ProjectAlice/blob/1.0.0-rc1/core/asr/model/CoquiAsr.py

Thank you for making this awesome program publicly available! :)

Use perplexity in metadata confidence return to make values more comparable across difference sentence lengths

Suggested by @bernardohenz:

beta_total = beta_weight * num_words
total_confidence = (confidence -  beta_total) / alpha_weight     # Discount weights from confidence (we're using deepspeech 0.7.1)
 
raw_probability = math.pow( math.e, total_confidence)  # Transform logarithm to probability
perplexity = math.pow(1/raw_probability, 1/num_words)  # Compute perplexity

Embed alphabet information in scorer and check it against loaded acoustic model for a match

This should help with alphabet incompatibility problems.

Bug: Failed to resolve: ai.coqui:libstt:0.9.3

After checking out the project and opening it in Android Studio (4.1), running the app throws this error:

Could not resolve all files for configuration ':app:debugRuntimeClasspath'.
Could not find ai.coqui:libstt:0.9.3.
Searched in the following locations:

Make checkpoints more robust re. versioning/loading/completeness

Alphabet file is crucial for checkpoint loading but is normally not saved alongside it
No programmatic way to check load compatibility against a given version of the training code

Does this speech-to-text conversion work offline

Does this speech-to-text conversion work offline? Or is it online? Is there a program in C # to say something through the microphone to test this model? How does it work?

Per letter and per word confidence score in Metadata output

Feature request: Download models automatically from model zoo

Bug: iOS test project has libstt.so dependency

The iOS test project references a libstt.so file, for example here.
I wonder if this is still intended? I thought the stt_ios.framework will cover everything needed for run the test project.
Or am I wrong here?

Rename and publish decoder package

update docs for generate_scorer_package

advise to not build, but to download

How to install along with tensorflow 2.4.1?

I have tensorflow 2.4.1 installed, but when run

python setup.py install

I got this:

Searching for tensorflow==1.15.4
Reading https://pypi.org/simple/tensorflow/
No local packages or working download links found for tensorflow==1.15.4
error: Could not find suitable distribution for Requirement.parse('tensorflow==1.15.4')

why it force a version? And I can not found where this requirements come from so that nowhere to comment it.

Feature request: Add tool to change scorer alphabet

Is your feature request related to a problem? Please describe.
I have a scorer trained for language X, and it was compiled with alphabet Y. Now I have a new acoustic model (*.pbmm file) which was trained with alphabet Z. I'd like to use my old scorer on new acoustic model, but because the alphabets are not exactly the same, the models are incompatible. I would have to retrain one of the models with the compatible alphabet to use models together. This is burdensome because of the need for data and compute resources.

Describe the solution you'd like
I'd like to be able to specify a new alphabet, and re-export the scorer to be compatible with my acoustic model.

Describe alternatives you've considered
Re-train the language model and re-export the scorer.

Additional context
This is a common problem for sharing models.

Bug: intermediateDecode ignores unprocessed internal buffer

Describe the bug
Stream.intermediateDecode() ignores part of the buffer data that has been sent to the stream.

To Reproduce
(In Python)

# 1. Create a stream
model = stt.Model(args.model)
stream_context = model.createStream()

# 2. Feed it a buffer
stream_context.feedAudioContent(some_buffer)

# 3. Do an intermediate decode
text = stream_context.intermediateDecode()

Current behavior
Depending on the size of some_buffer:

If some_buffer is smaller than the internal batch size, nothing will be processed.
If some_buffer is larger than the internal batch size, only the first part will be processed.

Expected behavior
All of the audio data that has been sent to the stream API so far would be processed and an intermediate result returned.

This is important when streaming to STT with VAD, where a stream will be automatically stopped whenever VAD detection is negative. If the stream is stopped at a point that is not an exact multiple of the internal batch buffer, parts of the audio will not have been processed by the acoustic model and therefore will be missing from the intermediate result. This causes mis-recognition of words.

Stream.finalizeStream() does not suffer from this defect, but it cannot provide intermediate results because it shuts down the stream completely. Intermediate results are essential for fast / low latency voice recognition.

Environment (please complete the following information):

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Custom Yocto Linux
TensorFlow installed from (our builds, or upstream TensorFlow): Coqui Tensorflow
TensorFlow version (use command below):
Python version: 3.5
Bazel version (if compiling from source): 3.1.0
GCC/Compiler version (if compiling from source): Yocto cross compiler
CUDA/cuDNN version: N/A
GPU model and memory: N/A

Feature request: replace progressbar2 with tqdm

right now we use both progressbar2 and tqdm, but the latter is better maintained and the former has caused issues. We should completely replace progressbar2 with tqdm

Bug: missing dependencies in Dockerfile.train for importing data

sox is missing when importing librispeech
libopusfile0 missing when importing MLS

Feature request: support for linux/Aarch64 + gpu

If you have a feature request, then please provide the following information:

Is your feature request related to a problem? Please describe.
We have an Aarch64 device (nvidia jetson agx) and would like to use its GPU cores as running non CUDA version of coqui tends to be too slow for real time inference on our platform as well as being too CPU intensive.

Describe the solution you'd like
Add linux/Aarch64 + gpu to supported platforms

Describe alternatives you've considered
We have tried Tflite models on these devices, they run but but tend be of worse quality, and at least with deepspeech, ran slower than CUDA enabled full scale models.

With deepspeech this project https://github.com/domcross/DeepSpeech-for-Jetson-Nano/releases, managed to port 0.9.x to a jetson nano and agx. Showing at least with Deepspeech code base before your fork this is possible. We have also built it ourselves for an older version of Deepspeech but the build process was tricky and likely beyond the scope for many teams.

Bug: ImportError: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.27' not found (required by /path/to/.local/lib/python3.7/site-packages/stt/lib/libstt.so)

Description of the bug and steps to Reproduce the behaviour
I was trying to follow the "Quickstart:Deployment" section. When I ran the final command stt --model ... It did not give any output but directly the following Traceback:

Traceback (most recent call last):
  File "/home/sammit/.local/bin/stt", line 5, in <module>
    from stt.client import main
  File "/home/sammit/.local/lib/python3.7/site-packages/stt/__init__.py", line 23, in <module>
    from stt.impl import Version as version
  File "/home/sammit/.local/lib/python3.7/site-packages/stt/impl.py", line 13, in <module>
    from . import _impl
ImportError: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.27' not found (required by /home/sammit/.local/lib/python3.7/site-packages/stt/lib/libstt.so)

The documentation says Coqui STT supports GLIBC>=2.19 and I have GLIBC=2.23. Please see below the output when I run ldd --version

ldd (Ubuntu GLIBC 2.23-0ubuntu11.3) 2.23
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.

And my OS Description:

Operating System: Ubuntu 16.04.6 LTS
            Kernel: Linux 4.4.0-184-generic
      Architecture: x86-64

Feature request: Docs: Pin "Advanced Training Topics" in sidebar

this page: https://stt.readthedocs.io/en/latest/TRAINING_ADVANCED.html#advanced-training-docs is only accessible from Training: Quickstart, which isn't a great userflow

Better error for missing alphabet.txt or bytes_output_mode

If you don't configure the alphabet, you get an error about the dims of layer 6. You should get an error about the alphabet

root@e02b8a0e3e26:/code# python3 train.py \
>   --train_files stt-data/clips/train.csv \
>   --dev_files stt-data/clips/dev.csv \
>   --test_files stt-data/clips/test.csv 
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Traceback (most recent call last):
  File "train.py", line 12, in <module>
    stt_train.main()
  File "/code/training/coqui_stt_training/train.py", line 1258, in main
    train()
  File "/code/training/coqui_stt_training/train.py", line 619, in train
    gradients, loss, non_finite_files = get_tower_results(
  File "/code/training/coqui_stt_training/train.py", line 417, in get_tower_results
    avg_loss, non_finite_files = calculate_mean_edit_distance_and_loss(
  File "/code/training/coqui_stt_training/train.py", line 335, in calculate_mean_edit_distance_and_loss
    logits, _ = create_model(
  File "/code/training/coqui_stt_training/train.py", line 294, in create_model
    "layer_6", layer_5, Config.n_hidden_6, relu=False
  File "/code/training/coqui_stt_training/util/config.py", line 28, in __getattr__
    raise RuntimeError(
RuntimeError: Configuration option n_hidden_6 not found in config.

Feature request: Docs: add section on how to chose batch size

OOM errors happen all the time in training for newcomers and pros... we should have a simple guide on using --reverse_{train,test,dev} and steps to choose the right batchsize on some setup...

also, maybe we just need --reverse_batches instead of all three --reverse_{train,test,dev}... I can't think of a situation where you'd want a subset, but not all, set to reverse

Bug: Dockerfile.build redownloads the entire source code

Describe the bug
When running docker build -f Dockerfile.build . from the STT directory, Dockerfile.build redownloads the entire STT source code, instead of using the existing source code.

To Reproduce
Clone STT source code with git
Make some commits / check out the tag you want to build
docker build -f Dockerfile.build .

Expected behavior
The Dockerfile would build the source that exists in the STT directory, including any changes that have been made to the source.

Actual behaviour
The Dockerfile downloads a fresh version of the source and builds that. This is unexpected and unintuitive, and a waste of disk space, bandwidth and time.

Bug: duplicate command-line training flags silently parsed

Current behavior: If a flag is duplicated at the CLI and passed to train.py, there is no warning or error message, and the last setting is saved. This was first observed with the --augment flag where the effect results in only the last augmentation being applied. The behavior was replicated with --reverse_train true --reverse_train false, with the final result in flags.txt being --reverse_train false

Expected behavior: An error message stating that a flag has been duplicated, and aborting the training attempt.

Bug: Broken link in documentation DATA_FORMATTING.html

In the Coqui STT 0.10.0-alpha.13 documentation, at https://stt.readthedocs.io/en/latest/playbook/DATA_FORMATTING.html, where it says

`Data from Common Voice

If you are using data from Common Voice for training a model, you will need to prepare it as outlined in the 🐸STT documentation.`

The link to https://stt.readthedocs.io/en/latest/TRAINING.html#common-voice-training-data is broken.

ERROR: Detected NVIDIA Tesla K80 GPU, which is not supported by this container, STT Docker Image

While running the STT docker image on Tesla K80 GPU,
docker run -it --gpus all --mount type=bind,source="$(pwd)"/stt-data,target=/code/stt-data 5adb1e5d8af5

The container starts and once the image is loaded, I'm welcomed by the following message:

ERROR: Detected NVIDIA Tesla K80 GPU, which is not supported by this container
ERROR: No supported GPU(s) detected to run this container

NOTE: Legacy NVIDIA Driver detected. Compatibility mode ENABLED.

That's the message I get while running on a p2.xlarge 1 GPU, 4 vCPUs and 61 GB RAM --- Still a Tesla K80 . I get the following message

On a p2.8xlarge, 8 GPUs, 32vCPUs and 488 GB RAM

== TensorFlow ==

NVIDIA Release 21.05-tf1 (build 22596046)
TensorFlow Version 1.15.5

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
ERROR: Detected NVIDIA Tesla K80 GPU, which is not supported by this container
ERROR: Detected NVIDIA Tesla K80 GPU, which is not supported by this container
ERROR: Detected NVIDIA Tesla K80 GPU, which is not supported by this container
ERROR: Detected NVIDIA Tesla K80 GPU, which is not supported by this container
ERROR: Detected NVIDIA Tesla K80 GPU, which is not supported by this container
ERROR: Detected NVIDIA Tesla K80 GPU, which is not supported by this container
ERROR: Detected NVIDIA Tesla K80 GPU, which is not supported by this container
ERROR: Detected NVIDIA Tesla K80 GPU, which is not supported by this container
ERROR: No supported GPU(s) detected to run this container

NOTE: Legacy NVIDIA Driver detected. Compatibility mode ENABLED.

I'm I right to say that the this image works for a section of the NVIDIA GPUs?

Feature request: parallelize generate_lm.py

If you have a feature request, then please provide the following information:

Is your feature request related to a problem? Please describe.
Training a 5-gram language model on 13G of text takes a very long time (hours?) and only uses one CPU.

Describe the solution you'd like
I'd like to use all my CPUs in parallel to finish the job faster.

Describe alternatives you've considered
Training the LM on a lower resource machine is about the only option I can think of, to not waste time on bigger machines.

Additional context
None.

Missing binaries

The binary "generate_scorer_package" is missing.

From the manual here:
https://stt.readthedocs.io/en/latest/LANGUAGE_MODEL.html

"Finally, we package the trained KenLM model for deployment with generate_scorer_package. You can find pre-built binaries for generate_scorer_package on the official 🐸STT release page (inside native_client.*.tar.xz). "

However, no binaries are available.

Drop `doc/Makefile` in favor of doing things exactly like they're done in ReadTheDocs to avoid discrepancies between local/CI/RTD

Bug: training flags docs broke after coqpit

this page in the docs no longer renders, (i think) because there's no flags.py anymore https://stt.readthedocs.io/en/latest/TRAINING_FLAGS.html#training-flags

Bug: /code/kenlm/build/bin/lmplz: error while loading shared libraries: libboost_program_options.so.1.71.0: cannot open shared object file: No such file or directory

Describe the bug
I got the following error when running the command as instructed in Train the Language Model

/code/kenlm/build/bin/lmplz: error while loading shared libraries: libboost_program_options.so.1.71.0: cannot open shared object file: No such file or directory

To Reproduce
Steps to reproduce the behavior:

Run the command python3 data/lm/generate_lm.py \ --input_txt /experiment/librispeech-lm-norm.txt.gz \ --output_dir . \ --top_k 500000 \ --kenlm_bins /code/kenlm/build/bin/ \ --arpa_order 5 \ --max_arpa_memory "85%" \ --arpa_prune "0|0|1" \ --binary_a_bits 255 \ --binary_q_bits 8 \ --binary_type trie
See full error below

================
== TensorFlow ==
================

NVIDIA Release 20.06-tf1 (build 13409399)
TensorFlow Version 1.15.2

Container image Copyright (c) 2020, NVIDIA CORPORATION.  All rights reserved.
Copyright 2017-2020 The TensorFlow Authors.  All rights reserved.

NVIDIA Deep Learning Profiler (dlprof) Copyright (c) 2020, NVIDIA CORPORATION.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying project or file.

NOTE: MOFED driver for multi-node communication was not detected.
      Multi-node communication performance may be reduced.

python3 data/lm/generate_lm.py \
  --input_txt /experiment/librispeech-lm-norm.txt.gz \
  --output_dir . \
  --top_k 500000 \
  --kenlm_bins /code/kenlm/build/bin/ \
  --arpa_order 5 \
  --max_arpa_memory "85%" \
  --arpa_prune "0|0|1" \
  --binary_a_bits 255 \
  --binary_q_bits 8 \
  --binary_type trieroot@f48fa96f3b9c:/code# python3 data/lm/generate_lm.py \
>   --input_txt /experiment/librispeech-lm-norm.txt.gz \
>   --output_dir . \
>   --top_k 500000 \
>   --kenlm_bins /code/kenlm/build/bin/ \
>   --arpa_order 5 \
>   --max_arpa_memory "85%" \
>   --arpa_prune "0|0|1" \
>   --binary_a_bits 255 \
>   --binary_q_bits 8 \
>   --binary_type trie

Converting to lowercase and counting word occurrences ...
| |                                               #                                                                   | 40418260 Elapsed Time: 0:13:52

Saving top 500000 words ...

Calculating word statistics ...
  Your text file has 803288729 words in total
  It has 973673 unique words
  Your top-500000 words are 99.9354 percent of all words
  Your most common word "the" occurred 49059384 times
  The least common word in your top-k is "corders" with 2 times
  The first word with 3 occurrences is "zungwan" at place 420186

Creating ARPA file ...
/code/kenlm/build/bin/lmplz: error while loading shared libraries: libboost_program_options.so.1.71.0: cannot open shared object file: No such file or directory
Traceback (most recent call last):
  File "data/lm/generate_lm.py", line 210, in <module>
    main()
  File "data/lm/generate_lm.py", line 201, in main
    build_lm(args, data_lower, vocab_str)
  File "data/lm/generate_lm.py", line 97, in build_lm
    subprocess.check_call(subargs)
  File "/usr/lib/python3.6/subprocess.py", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/code/kenlm/build/bin/lmplz', '--order', '5', '--temp_prefix', '.', '--memory', '85%', '--text', './lower.txt.gz', '--arpa', './lm.arpa', '--prune', '0', '0', '1']' returned non-zero exit status 127.
root@f48fa96f3b9c:/code#

Expected behavior
I should be able to generate lm.binary and vocab-500000.txt

Environment (please complete the following information):

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 20.04
TensorFlow installed from (our builds, or upstream TensorFlow): Using docker ghcr.io/coqui-ai/stt-train:latest
TensorFlow version (use command below): 1.15.2
Python version: 3.6.9
Bazel version (if compiling from source):
GCC/Compiler version (if compiling from source):
CUDA/cuDNN version: 11.2
GPU model and memory: Quadro P4200
Exact command to reproduce:

Solution

I actually got it working by removing the original KenLM and re-compiling KenLM using the following command in the /code folder

git clone https://github.com/kpu/kenlm.git && cd kenlm && mkdir build && cd build/ && cmake .. && make -j 4

coqui-ai / stt Goto Github PK

stt's Introduction

🐸STT features

Where to Ask Questions

Links & Resources

stt's People

Contributors

Stargazers

Watchers

Forkers

stt's Issues

On a p2.8xlarge, 8 GPUs, 32vCPUs and 488 GB RAM

== TensorFlow ==

Recommend Projects

Recommend Topics

Recommend Org