louiskirsch / speecht

Open-source speech-to-text software written in TensorFlow

License: Apache License 2.0

Python 100.00%
asr language-model openslr python3 speech-to-text tensorflow wav2letter

speecht's People

Contributors

louiskirsch

speecht's Issues

ValueError: Directory data/preprocessed-power/test does not exist

I installed speechT with:

sudo apt install python3-pip portaudio19-dev ffmpeg
pip3 install git+https://github.com/timediv/speechT

Then I wanted to test it with the pre-trained model by following this section.

On speecht-cli evaluate --run-name best_run I'm getting this error:

Determine input size from first sample
Traceback (most recent call last):
  File "/home/mertyildiran/.local/bin/speecht-cli", line 221, in <module>
    cli.run()
  File "/home/mertyildiran/.local/bin/speecht-cli", line 207, in run
    self.command_executor.run()
  File "/home/mertyildiran/.local/lib/python3.5/site-packages/lazy/lazy.py", line 28, in __get__
    value = self.__func(inst)
  File "/home/mertyildiran/.local/bin/speecht-cli", line 200, in command_executor
    }[self.parsed.command](self.parsed)
  File "/home/mertyildiran/.local/bin/speecht-cli", line 169, in _get_evaluation_executor
    return speecht.evaluation.Evaluation(flags)
  File "/home/mertyildiran/.local/lib/python3.5/site-packages/speecht/execution.py", line 33, in __init__
    self.input_size = self.determine_input_size()
  File "/home/mertyildiran/.local/lib/python3.5/site-packages/speecht/execution.py", line 41, in determine_input_size
    return next(self.create_sample_generator(limit_count=1))[0].shape[1]
  File "/home/mertyildiran/.local/lib/python3.5/site-packages/speecht/preprocessing.py", line 261, in load_samples
    raise ValueError('Directory {} does not exist'.format(load_directory))
ValueError: Directory data/preprocessed-power/test does not exist

I found that it's related to this line but I don't know how to fix this error. @timediv could you help me please?
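The error above is raised because the preprocessed data directory is missing. As a minimal sketch (not part of speechT), the check below lists which split directories are absent before evaluation is started. The helper name `missing_split_dirs` is hypothetical; the base path and the split names `dev` and `test` are taken from the paths quoted in these issues.

```python
from pathlib import Path


def missing_split_dirs(base_dir, splits=("dev", "test")):
    """Return the split sub-directories that do not exist under base_dir."""
    base = Path(base_dir)
    return [str(base / split) for split in splits if not (base / split).is_dir()]


if __name__ == "__main__":
    # Path taken from the traceback; adjust it to your own data directory.
    missing = missing_split_dirs("data/preprocessed-power")
    if missing:
        print("Missing preprocessed directories:", ", ".join(missing))
    else:
        print("All expected preprocessed directories exist.")
```

If a directory is reported missing, the preprocessing step for that split has presumably not been run yet, or it wrote its output somewhere else.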

Thank you so much for this great project 👍 😊

Compile kenlm with tensorflow

Hey Louis! How did you integrate kenlm with tensorflow?

As of now, the tensorflow decoder does not have a kenlm argument. I also went through the reference you gave, but things are still not clear.

Not saving .npz files in the preprocessed folder

After preprocessing, the files are not stored in the desired folder, even though the right directory path has been generated.

For example, for the development data the directory path is data/preprocessed-power/dev, but the directory is still empty after running the preprocessing step.

Always getting decoded value blank

I am trying to adapt this code to use an LSTM network by duplicating the Wav2LetterModel class and changing the model to an LSTM. After training on 10,000 samples for 4,000 rounds, the decoded value is always blank. Please help.

class Wav2LetterLSTMModel(SpeechModel):  # Added Sep 14, 2017 to create an LSTM model

    def __init__(self, input_loader: BaseInputLoader, input_size: int, num_classes: int):
        super().__init__(input_loader, input_size, num_classes)

    def _create_network(self, num_classes):

        cellsize = 64
        num_layers = 3

        inputs = self.inputs
        inputs, sequence_lengths, labels = self.input_loader.get_inputs()

        XT = tf.transpose(inputs, [1, 0, 2])  # permute time_step_size and batch_size
        XR = tf.reshape(XT, [-1, self.input_size])  # each row has input for each lstm cell (lstm_size=input_vec_size)
        X_split = tf.split(XR, cellsize, 0)  # split them to time_step_size (arrays)

        lstm = rnn.BasicLSTMCell(cellsize, forget_bias=0.5, state_is_tuple=True)
        outputs, _states = rnn.static_rnn(lstm, X_split, dtype=tf.float32)

        return tf.transpose(outputs, (1, 0, 2))

Where to find the output of recordings?

Hello. I ran speecht-cli evaluate --run-name best_run
and the output is:

Initialize SingleInputLoader
ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm_route.c:867:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:867:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:867:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:867:(find_matching_chmap) Found no matching channel map
2018-06-03 13:26:16.185898: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
Reading model parameters from train/best_run/speechT.ckpt-106000
Recording audio

But where can I find the outputs?

Code not running

After running the preprocessing file, I tried running training.py and nothing happens. There is nothing inside training.py that starts the training.

tensorflow version?

Thank you for this great project.
Could you tell me which tensorflow version I have to install to run speechT?
1.0? 1.1? 0.9? older?

issue with pretrained model

Hello,
great work on this repo!
I used the pretrained model with live recording, but I get very inaccurate results.

What I did:

  1. Downloaded kenlm-english and extracted it to the project root (same path as speecht-cli: speecht-master/kenlm-english).
  2. Downloaded the pretrained model and extracted it under the train folder (path: speecht-master/train/best_run).
  3. Ran this command: python speecht-cli record --train-dir train --run-name best_run --language-model kenlm-english/
  4. Example input: hello 123. Example output:
Generate MFCCs or power spectrogram
Running speech recognition
decoded: unrew winke  ith leane 

Is there anything I'm doing wrong? Or is the model expected to perform like that with English?

I would appreciate your help!

Segmentation Fault

@timediv I am getting a segmentation fault (core dumped) error while evaluating the model with a language model (kenlm):

Starting input pipeline
Begin evaluation
Segmentation fault (core dumped)

epochs for training

How do you set the number of epochs for training?

The epoch-count flag appears in the evaluation file.

Error while using record with --mfcc

Hi, @timediv. In recording.py I changed

print('Recording audio')
raw_audio, sample_width = recorder.record()
raw_audio = np.array(raw_audio)

to

import soundfile as sf
raw_audio, sample_rate = sf.read(path_wav_file)
raw_audio = np.array(raw_audio)

and tried to run sudo python3 speecht-cli record --mfcc --train-dir train --run-name best_run --language-model kenlm-english/, but I get:

Generate MFCCs or power spectrogram
Running speech recognition
Traceback (most recent call last):
  File "speecht-cli", line 221, in <module>
    cli.run()
  File "speecht-cli", line 207, in run
    self.command_executor.run()
  File "/home/ubuntu/s2t/speecht/recording.py", line 71, in run
    [decoded] = model.step(sess, loss=False, update=False, decode=True)
  File "/home/ubuntu/s2t/speecht/speech_model.py", line 231, in step
    input_feed_dict = self.input_loader.get_feed_dict() or {}
  File "/home/ubuntu/s2t/speecht/speech_input.py", line 108, in get_feed_dict
    input_tensor, sequence_lengths, max_time = self._get_inputs_feed_item([self.speech_input])
  File "/home/ubuntu/s2t/speecht/speech_input.py", line 43, in _get_inputs_feed_item
    input_tensor[idx, :inp.shape[0], :] = inp
ValueError: could not broadcast input array from shape (493,39) into shape (493,128)

Why does it happen and how can I fix it?
P.S. It works OK with --power.
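The traceback shows a sample with 39 features per frame being copied into a buffer that expects 128. One plausible reading, which is an assumption and not confirmed in the thread, is that the input size used to build the model does not match the feature type selected with --mfcc. A minimal sketch of an early check that turns the broadcast failure into a readable message; `check_feature_dim` is a hypothetical helper, not a speechT function:

```python
def check_feature_dim(sample_shape, expected_input_size):
    """Validate a (time_steps, feature_dim) shape against the model's input size.

    Returns the number of time steps when the feature dimension matches,
    otherwise raises a ValueError that names both sizes.
    """
    time_steps, feature_dim = sample_shape
    if feature_dim != expected_input_size:
        raise ValueError(
            "Sample has {} features per frame but the model expects {}; "
            "check that the feature type matches the one used for training".format(
                feature_dim, expected_input_size
            )
        )
    return time_steps


if __name__ == "__main__":
    # Shapes taken from the traceback above.
    try:
        check_feature_dim((493, 39), 128)
    except ValueError as error:
        print(error)
```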

Unexpected keyword argument 'kenlm_directory_path'

I'm trying to follow the "Using a language model" section.

I only have the official TensorFlow GPU installation.

With the command speecht-cli record --run-name best_run --language-model kenlm-english/, I'm getting this error:

Traceback (most recent call last):
  File "/home/mertyildiran/.local/bin/speecht-cli", line 221, in <module>
    cli.run()
  File "/home/mertyildiran/.local/bin/speecht-cli", line 207, in run
    self.command_executor.run()
  File "/home/mertyildiran/.local/lib/python3.5/site-packages/speecht/recording.py", line 43, in run
    model = create_default_model(self.flags, self.flags.input_size, speech_input_loader)
  File "/home/mertyildiran/.local/lib/python3.5/site-packages/speecht/speech_model.py", line 318, in create_default_model
    valid_word_count_weight=flags.valid_word_count_weight)
  File "/home/mertyildiran/.local/lib/python3.5/site-packages/speecht/speech_model.py", line 111, in add_decoding_ops
    top_paths=1)
TypeError: ctc_beam_search_decoder() got an unexpected keyword argument 'kenlm_directory_path'

This is probably because, as you said in that section, "you need Tensorflow with KenLM integration". But I am also stuck on the tensorflow-with-kenlm guide and I don't know the exact cause, so I'm opening this issue as a question. @timediv, thank you so much.

I also opened louiskirsch/tensorflow-with-kenlm#4
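The TypeError above means the installed decoder function does not accept a `kenlm_directory_path` keyword. As a hedged sketch, the standard-library `inspect` module can test for a keyword before a function is called. The two decoder functions below are hypothetical stand-ins for illustration only, not the real TensorFlow signatures.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if func can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    if name in params:
        return params[name].kind is not inspect.Parameter.POSITIONAL_ONLY
    # A **kwargs parameter accepts any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Hypothetical stand-in for a stock decoder without language-model support.
def stock_decoder(inputs, sequence_length, beam_width=100, top_paths=1):
    return None


# Hypothetical stand-in for a decoder built with KenLM integration.
def patched_decoder(inputs, sequence_length, beam_width=100, top_paths=1,
                    kenlm_directory_path=None):
    return None


if __name__ == "__main__":
    for decoder in (stock_decoder, patched_decoder):
        print(decoder.__name__, accepts_keyword(decoder, "kenlm_directory_path"))
```

Running the same check against the decoder in an installed TensorFlow build would show whether that build includes the KenLM argument, assuming its signature can be introspected.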

sample rate problem

In the code, librosa.load uses its default sample rate of 22050 Hz rather than 16 kHz. I think you need to pass 16000 as the sample rate.
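For context, `librosa.load` resamples to `sr=22050` unless told otherwise. Passing `sr=16000` requests 16 kHz, and `sr=None` keeps the file's native rate. To confirm what rate a file actually holds, a minimal standard-library sketch for WAV files follows. The helper name `wav_sample_rate` is hypothetical, and the demo writes its own short silent file so that it runs without any dataset.

```python
import os
import tempfile
import wave


def wav_sample_rate(path):
    """Return the sample rate (in Hz) stored in a WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        demo_path = os.path.join(tmp_dir, "silence.wav")
        # Write 10 ms of 16-bit mono silence at 16 kHz.
        with wave.open(demo_path, "wb") as wav_file:
            wav_file.setnchannels(1)
            wav_file.setsampwidth(2)
            wav_file.setframerate(16000)
            wav_file.writeframes(b"\x00\x00" * 160)
        print("Sample rate:", wav_sample_rate(demo_path))
```

This only reads the header of uncompressed WAV files. Other formats such as FLAC would need a different reader.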

Problem while testing the code

  File "/home/arpit_agrawal/test/speechT/speecht/evaluation.py", line 143, in run_epoch
    decoded_ids = next(decoded_path)
StopIteration

I am getting this StopIteration error.
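In Python, `next()` raises StopIteration when the iterator it is given has no more items. Why `decoded_path` is exhausted here is not established in the thread. The sketch below only illustrates the mechanism and one way to handle an empty iterator, using a hypothetical generator `decoded_paths` as a stand-in.

```python
def decoded_paths(batches):
    """Hypothetical stand-in generator that yields one item per batch."""
    for batch in batches:
        yield batch


def first_or_none(iterator):
    """Return the next item, or None instead of raising StopIteration."""
    return next(iterator, None)


if __name__ == "__main__":
    empty = decoded_paths([])
    try:
        next(empty)
    except StopIteration:
        print("next() raised StopIteration on an empty generator")

    print(first_or_none(decoded_paths([])))          # None
    print(first_or_none(decoded_paths([[1, 2, 3]])))  # [1, 2, 3]
```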

Calculating epochs?

Let's say I use 1000 files for training with a batch size of 10. After running for 1000 global steps, how many epochs have been completed?
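Assuming that one global step processes exactly one batch and that one epoch is a single pass over all training files (assumptions not confirmed for speechT in this thread), the arithmetic is steps × batch size ÷ number of samples. A minimal sketch with a hypothetical helper:

```python
def epochs_completed(global_steps, batch_size, num_samples):
    """Number of passes over the data after the given number of steps.

    Assumes each global step consumes one batch of `batch_size` samples.
    """
    return global_steps * batch_size / num_samples


if __name__ == "__main__":
    # Numbers from the question: 1000 files, batch size 10, 1000 global steps.
    print(epochs_completed(1000, 10, 1000))  # 10.0
```

Under these assumptions, the setup in the question corresponds to 10 epochs.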
