louiskirsch / speecht

An opensource speech-to-text software written in tensorflow

License: Apache License 2.0

Python 100.00%

asr language-model openslr python3 speech-to-text tensorflow wav2letter

speecht's Issues

ValueError: Directory data/preprocessed-power/test does not exist

I installed speechT with:

sudo apt install python3-pip portaudio19-dev ffmpeg
pip3 install git+https://github.com/timediv/speechT

Then I wanted to test it with the pre-trained model by following this section.

On speecht-cli evaluate --run-name best_run I'm getting this error:

Determine input size from first sample
Traceback (most recent call last):
  File "/home/mertyildiran/.local/bin/speecht-cli", line 221, in <module>
    cli.run()
  File "/home/mertyildiran/.local/bin/speecht-cli", line 207, in run
    self.command_executor.run()
  File "/home/mertyildiran/.local/lib/python3.5/site-packages/lazy/lazy.py", line 28, in __get__
    value = self.__func(inst)
  File "/home/mertyildiran/.local/bin/speecht-cli", line 200, in command_executor
    }[self.parsed.command](self.parsed)
  File "/home/mertyildiran/.local/bin/speecht-cli", line 169, in _get_evaluation_executor
    return speecht.evaluation.Evaluation(flags)
  File "/home/mertyildiran/.local/lib/python3.5/site-packages/speecht/execution.py", line 33, in __init__
    self.input_size = self.determine_input_size()
  File "/home/mertyildiran/.local/lib/python3.5/site-packages/speecht/execution.py", line 41, in determine_input_size
    return next(self.create_sample_generator(limit_count=1))[0].shape[1]
  File "/home/mertyildiran/.local/lib/python3.5/site-packages/speecht/preprocessing.py", line 261, in load_samples
    raise ValueError('Directory {} does not exist'.format(load_directory))
ValueError: Directory data/preprocessed-power/test does not exist

I found that it's related to this line but I don't know how to fix this error. @timediv could you help me please?
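
The traceback above ends in load_samples raising because data/preprocessed-power/test is missing. A minimal pre-flight check, sketched below, can confirm which split directories exist before running evaluate. The helper name missing_split_dirs and the "train" split name are assumptions made for illustration; only "dev" and "test" appear in the reports on this page.

```python
from pathlib import Path


def missing_split_dirs(base_dir, splits=("train", "dev", "test")):
    """Return the split directories under base_dir that do not exist.

    The split names are assumptions for this sketch, not speechT's API.
    """
    base = Path(base_dir)
    return [str(base / split) for split in splits if not (base / split).is_dir()]


# Report anything missing under the directory named in the error message.
for path in missing_split_dirs("data/preprocessed-power"):
    print("Missing:", path)
```

If the test directory is listed as missing, the preprocessing step has probably not produced data for that split yet.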

Thank you so much for this great project 👍 😊

Compile kenlm with tensorflow

Hey Louis! How did you integrate kenlm with tensorflow?

As of now, the tensorflow decoder does not have a kenlm argument. I also went through the reference you gave, but things are still not clear.

Not saving .npz files in the preprocessed folder

After preprocessing, the files are not stored in the expected folder, even though the correct directory path has been created.

For example, for the development data the directory path is data/preprocessed-power/dev, but the directory is still empty after running the preprocessing step.
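
One way to verify the report above is to count the .npz files that actually landed in the output directory after preprocessing. This is only a sketch: the helper name count_npz_files is hypothetical, and the recursive search is an assumption because the exact layout of the preprocessed folder is not stated here.

```python
from pathlib import Path


def count_npz_files(directory):
    """Count .npz files anywhere below directory; return 0 if it is missing."""
    path = Path(directory)
    if not path.is_dir():
        return 0
    # rglob searches subdirectories too (an assumption about the layout).
    return sum(1 for _ in path.rglob("*.npz"))


print(count_npz_files("data/preprocessed-power/dev"))
```

A count of zero for an existing directory suggests the preprocessing run wrote its output somewhere else or stopped before saving.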

Always getting decoded value blank

I am trying to adapt this code to an LSTM network by duplicating the Wav2LetterModel class and changing the model to an LSTM. After training on 10,000 samples for 4,000 rounds, the decoded value is always blank. Please help.

class Wav2LetterLSTMModel(SpeechModel):  # Add Sep 14, 2017 to create LSTM model

    def __init__(self, input_loader: BaseInputLoader, input_size: int, num_classes: int):
        super().__init__(input_loader, input_size, num_classes)

    def _create_network(self, num_classes):

        cellsize = 64
        num_layers = 3

        inputs = self.inputs
        inputs, sequence_lengths, labels = self.input_loader.get_inputs()

        XT = tf.transpose(inputs, [1, 0, 2])  # permute time_step_size and batch_size
        XR = tf.reshape(XT, [-1, self.input_size])  # each row has input for each lstm cell (lstm_size=input_vec_size)
        X_split = tf.split(XR, cellsize, 0)  # split them to time_step_size (arrays)

        lstm = rnn.BasicLSTMCell(cellsize, forget_bias=0.5, state_is_tuple=True)
        outputs, _states = rnn.static_rnn(lstm, X_split, dtype=tf.float32)

        return tf.transpose(outputs, (1, 0, 2))
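
With CTC-style decoding, an always-empty transcript typically means the blank label has the highest score at every frame. The sketch below shows the best-path rule (collapse repeats, then drop blanks) in plain Python so that effect is easy to see. It is a general illustration, not a confirmed diagnosis of the model above, and the blank index of 0 is an assumption chosen for the example.

```python
from itertools import groupby


def ctc_greedy_decode(frame_labels, blank):
    """Collapse repeated labels, then drop the blank label (CTC best-path rule)."""
    collapsed = [label for label, _ in groupby(frame_labels)]
    return [label for label in collapsed if label != blank]


BLANK = 0  # assumed blank index, for this illustration only

# Blanks separate the two 3s, so both survive the collapse step.
print(ctc_greedy_decode([0, 3, 3, 0, 3, 5, 5, 0], BLANK))  # [3, 3, 5]

# If the network predicts blank at every frame, nothing is left.
print(ctc_greedy_decode([0, 0, 0, 0], BLANK))  # []
```

Inspecting the per-frame argmax of the network output in this way can help tell an undertrained model apart from a shape or wiring problem in the network.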

where to find output of records?

Hello. I ran speecht-cli evaluate --run-name best_run
and the output is:

Initialize SingleInputLoader
ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm_route.c:867:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:867:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:867:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:867:(find_matching_chmap) Found no matching channel map
2018-06-03 13:26:16.185898: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
Reading model parameters from train/best_run/speechT.ckpt-106000
Recording audio

But where can I find the output?

Code not running

After running the preprocessing file, I tried running training.py, but nothing happens. There is nothing inside training.py that starts the training.

tensorflow version?

Thank you for this great project.
Could you tell me which tensorflow version I have to install to run speechT?
1.0? 1.1? 0.9? older?

Multiple gpu?

How can I use multiple GPUs with this code?

issue with pretrained model

Hello,
great work on this repo!
I used the pretrained model with live recording, but the results are very inaccurate.

What I did:

1. Downloaded kenlm-english and extracted it to the project root (same path as speecht-cli: speecht-master/kenlm-english).
2. Downloaded the pretrained model and extracted it under the train folder (path: speecht-master/train/best_run).
3. Ran this command: python speecht-cli record --train-dir train --run-name best_run --language-model kenlm-english/

Example input: hello 123. Example output:

Generate MFCCs or power spectrogram
Running speech recognition
decoded: unrew winke  ith leane

Is there anything I'm doing wrong, or is the model expected to perform like that with English?

I would appreciate your help!

Segmentation Fault

@timediv I am getting a segmentation fault (core dumped) error while evaluating the model with a language model (kenlm):

Starting input pipeline
Begin evaluation
Segmentation fault (core dumped)

epochs for training

How do you set the number of epochs for training?

The epoch-count flag appears in the evaluation file.

Error while using record with --mfcc

Hi @timediv. In recording.py I changed

print('Recording audio')
raw_audio, sample_width = recorder.record()
raw_audio = np.array(raw_audio)

to

import soundfile as sf
raw_audio, sample_rate = sf.read(path_wav_file)
raw_audio = np.array(raw_audio)

and tried to run sudo python3 speecht-cli record --mfcc --train-dir train --run-name best_run --language-model kenlm-english/, but got:

Generate MFCCs or power spectrogram
Running speech recognition
Traceback (most recent call last):
  File "speecht-cli", line 221, in <module>
    cli.run()
  File "speecht-cli", line 207, in run
    self.command_executor.run()
  File "/home/ubuntu/s2t/speecht/recording.py", line 71, in run
    [decoded] = model.step(sess, loss=False, update=False, decode=True)
  File "/home/ubuntu/s2t/speecht/speech_model.py", line 231, in step
    input_feed_dict = self.input_loader.get_feed_dict() or {}
  File "/home/ubuntu/s2t/speecht/speech_input.py", line 108, in get_feed_dict
    input_tensor, sequence_lengths, max_time = self._get_inputs_feed_item([self.speech_input])
  File "/home/ubuntu/s2t/speecht/speech_input.py", line 43, in _get_inputs_feed_item
    input_tensor[idx, :inp.shape[0], :] = inp
ValueError: could not broadcast input array from shape (493,39) into shape (493,128)

Why does this happen and how can I fix it?
P.S. It works fine with --power.
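
The error above reports a sample of shape (493, 39) being copied into a buffer with 128 values per frame, which suggests the feature size produced with --mfcc differs from the input size the loader was set up with. A small guard like the sketch below would surface that mismatch with a clearer message. The helper name check_feature_size is hypothetical and not part of speechT.

```python
def check_feature_size(sample_shape, expected_input_size):
    """Raise a readable error when the per-frame feature size does not match.

    sample_shape is (time_steps, feature_size), e.g. the .shape of a feature array.
    Returns the number of time steps when the sizes agree.
    """
    time_steps, feature_size = sample_shape
    if feature_size != expected_input_size:
        raise ValueError(
            "Features have {} values per frame but the model expects {}".format(
                feature_size, expected_input_size))
    return time_steps


check_feature_size((493, 128), 128)  # sizes agree, no error
```

Called with the shapes from the traceback, (493, 39) against an expected size of 128, the check raises immediately instead of failing later inside the array assignment.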

librosa.logamplitude has been removed

The function has been removed (and the parameter ref_power has been renamed to ref).

So replace speechT/speecht/preprocessing.py:53
from

log_spectrogram = librosa.logamplitude(spectrogram, ref_power=np.max)

to

log_spectrogram = librosa.power_to_db(spectrogram, ref=np.max)
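
If the same code has to run against both old and new librosa releases, the replacement can be wrapped in a small compatibility helper. The sketch below is one possible approach; the helper name log_power_spectrogram is hypothetical, and the module is passed in as an argument so the function does not depend on which version is installed.

```python
def log_power_spectrogram(librosa_module, spectrogram, ref):
    """Convert a power spectrogram to a log scale with whichever function exists."""
    if hasattr(librosa_module, "power_to_db"):
        # Newer releases: power_to_db with the keyword `ref`.
        return librosa_module.power_to_db(spectrogram, ref=ref)
    # Older releases: logamplitude with the keyword `ref_power`.
    return librosa_module.logamplitude(spectrogram, ref_power=ref)
```

With librosa and NumPy imported, the call in preprocessing.py would then read log_spectrogram = log_power_spectrogram(librosa, spectrogram, np.max).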

Language model while training

Can we use Kenlm during training ?

Add default values to --help output

Does this support LSTM models?

Does this support LSTM models? Please advise.

Thanks

How much disk space is needed?

Hi, I'm running this on a VM and would like to know how much disk space it is likely to take up.

Many thanks

tuneable parameters like word_count and valid_word_count_weight

@timediv what range of values makes sense for these parameters? Is there an article where I can read more about them and learn how to set them?

Unexpected keyword argument 'kenlm_directory_path'

I'm trying to follow the "Using a language model" section.

I only have TensorFlow's official GPU build installed.

With the command speecht-cli record --run-name best_run --language-model kenlm-english/, I'm getting this error:

Traceback (most recent call last):
  File "/home/mertyildiran/.local/bin/speecht-cli", line 221, in <module>
    cli.run()
  File "/home/mertyildiran/.local/bin/speecht-cli", line 207, in run
    self.command_executor.run()
  File "/home/mertyildiran/.local/lib/python3.5/site-packages/speecht/recording.py", line 43, in run
    model = create_default_model(self.flags, self.flags.input_size, speech_input_loader)
  File "/home/mertyildiran/.local/lib/python3.5/site-packages/speecht/speech_model.py", line 318, in create_default_model
    valid_word_count_weight=flags.valid_word_count_weight)
  File "/home/mertyildiran/.local/lib/python3.5/site-packages/speecht/speech_model.py", line 111, in add_decoding_ops
    top_paths=1)
TypeError: ctc_beam_search_decoder() got an unexpected keyword argument 'kenlm_directory_path'

This is probably because, as you said in that section, "you need Tensorflow with KenLM integration". But I also got stuck on the tensorflow-with-kenlm guide and I don't know the exact cause, so I'm opening this issue as a question. @timediv thank you so much.

I also opened louiskirsch/tensorflow-with-kenlm#4
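
The TypeError above is what Python raises when a function is called with a keyword it does not define. Before passing --language-model, one could check whether the installed decoder accepts the extra keyword, as in the sketch below. The helper accepts_keyword and the two stand-in functions are hypothetical and only illustrate the check; they are not the real TensorFlow signatures.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if func's signature shows it accepts the keyword `name`."""
    try:
        parameters = inspect.signature(func).parameters.values()
    except (TypeError, ValueError):
        return False  # signature unavailable, so assume it is not supported
    keyword_kinds = (inspect.Parameter.POSITIONAL_OR_KEYWORD,
                     inspect.Parameter.KEYWORD_ONLY)
    for parameter in parameters:
        if parameter.kind is inspect.Parameter.VAR_KEYWORD:
            return True  # **kwargs accepts any keyword
        if parameter.name == name and parameter.kind in keyword_kinds:
            return True
    return False


# Hypothetical stand-ins for a stock decoder and a KenLM-enabled decoder.
def stock_decoder(inputs, sequence_length, top_paths=1):
    pass


def kenlm_decoder(inputs, sequence_length, top_paths=1, kenlm_directory_path=None):
    pass


print(accepts_keyword(stock_decoder, "kenlm_directory_path"))  # False
print(accepts_keyword(kenlm_decoder, "kenlm_directory_path"))  # True
```

With TensorFlow installed, the same check could be applied to the decoder function named in the traceback; a False result would be consistent with running a build that lacks the KenLM integration.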
