
pannous / tensorflow-speech-recognition

2.2K stars · 190 watchers · 643 forks · 31.87 MB

🎙Speech recognition using the tensorflow deep learning framework, sequence-to-sequence neural networks

License: Other

Python 98.02% Swift 1.98%
tensorflow speech-recognition neural-network deep-learning stt speech-to-text

tensorflow-speech-recognition's Introduction

Tensorflow Speech Recognition

Speech recognition using Google's TensorFlow deep learning framework with sequence-to-sequence neural networks.

Replaces caffe-speech-recognition; see that project for some background.

Update 2024: Use Whisper!

This (relatively) old project is NO LONGER UP TO DATE.
The TensorFlow 1.0 API it uses is no longer compatible with current releases, and the approach is no longer state of the art either.
We highly recommend you check out and use Whisper.

Update 2020: Mozilla released DeepSpeech

They achieve good error rates. Free speech recognition is in good hands; go there if you are an end user. For now this project is maintained only for educational purposes.

Ultimate goal

Create a decent standalone speech recognition engine for Linux etc. Some people say we have the models but not enough training data. We disagree: there is plenty of training data (100 GB here and 21 GB here on openslr.org, synthetic text-to-speech snippets, movies with transcripts, Gutenberg, YouTube with captions, etc.); we just need a simple yet powerful model. It's only a question of time...

Sample spectrogram, That's what she said, too laid?

Sample spectrogram, Karen uttering 'zero' with 160 words per minute.

Installation

Clone the code

git clone https://github.com/pannous/tensorflow-speech-recognition
cd tensorflow-speech-recognition
git clone https://github.com/pannous/layer.git
git clone https://github.com/pannous/tensorpeers.git

pyaudio

Requires PortAudio from http://www.portaudio.com/:

git clone https://git.assembla.com/portaudio.git
cd portaudio
./configure --prefix=/path/to/your/local
make
make install
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/your/local/lib
export LIBRARY_PATH=$LIBRARY_PATH:/path/to/your/local/lib
export CPATH=$CPATH:/path/to/your/local/include
source ~/.bashrc

install pyaudio

pip install pyaudio
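To verify the installation, a quick sanity check (a minimal sketch; device names and indices differ per machine) is to list the available audio input devices with PyAudio:

import pyaudio

pa = pyaudio.PyAudio()
for i in range(pa.get_device_count()):
    info = pa.get_device_info_by_index(i)
    if info.get('maxInputChannels', 0) > 0:
        print(i, info['name'])  # usable input devices, e.g. for record.py
pa.terminate()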

Getting started

Toy examples: ./number_classifier_tflearn.py ./speaker_classifier_tflearn.py

Some less trivial architectures: ./densenet_layer.py

Later: ./train.sh ./record.py

Sample spectrogram or record.py

Update: Nervana demonstrated that it is possible for 'independents' to build speech recognizers that are state of the art.

Fun tasks for newcomers

Extensions

Extensions to the current TensorFlow which are probably needed:

Even though this project is far from finished, we hope it gives you some starting points.

Looking for a TensorFlow collaboration / consultant / deep learning contractor? Reach out to [email protected]

tensorflow-speech-recognition's People

Contributors

camelshang, pannous


tensorflow-speech-recognition's Issues

Requirements File, Installation Guide

Hello,

The requirements.txt file is missing, so I was not able to install the project.
It would be great if we could have an installation guide.

Thanks in advance,
Regards

Speaker Classification Clarification

Hello, I was investigating your speaker classification example that uses TFLearn. I had a question about the audio sample that was used to test the model. I may be mistaken, but I believe that this sample is inside the training set, which would not be ideal for testing. Why is this done (or is it not, if I am wrong)?

Thank you in advance for your help!
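If the overlap is a concern, one workaround (a rough sketch, assuming `batch` is the generator returned by speech_data and `model` is the TFLearn DNN from the example; the exact loader calls may differ) is to slice off a held-out portion before training and evaluate only on that:

X, Y = next(batch)                      # one loaded batch of features and labels
split = int(0.9 * len(X))
X_train, Y_train = X[:split], Y[:split]
X_test,  Y_test  = X[split:], Y[split:]

model.fit(X_train, Y_train, n_epoch=100, show_metric=True)
print("held-out accuracy:", model.evaluate(X_test, Y_test))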

Everything works fine except that the pattern recognition index is not retrieved

Greetings

I did the following:

  • cloned the repository;
  • installed pre-requisites
  • trained my dataset using ./number_classifier_tflearn.py
  • ran ./record.py
  • erased the repository and repeated the steps copying the output to gist

as stated below
https://gist.github.com/tiagmoraismorgado/673ca5de5317a1583761a314e7d38ab1

Even though everything else works fine, the pattern recognition index is not retrieved: TensorFlow records my voice, but it doesn't return the recognized pattern index. Looking forward to your help.

Failed to find any matching files for tflearn.lstm.model in speech2text-tflearn.py

speech2text-tflearn.py fails with the error:
Failed to find any matching files for tflearn.lstm.model

or
ValueError: Restore called with invalid save path: 'tflearn.lstm.model'. File path is: 'tflearn.lstm.model'

on both linux and windows.

on lines:
model.load("tflearn.lstm.model")
and
model.save("tflearn.lstm.model")

thanks..
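That error usually just means the checkpoint has not been created yet; a minimal guard (a sketch, not the project's official flow, assuming the model and training arrays from that script) is to restore only when the checkpoint files exist and otherwise train and save first:

import os

if os.path.isfile("tflearn.lstm.model.index") or os.path.isfile("tflearn.lstm.model"):
    model.load("tflearn.lstm.model")        # restore a previously saved checkpoint
else:
    model.fit(trainX, trainY, n_epoch=10)   # train first, which creates the checkpoint
    model.save("tflearn.lstm.model")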

How to augment data for the spectrogram?

Any sample code to do data augmentation on the spectrograms?
I observed that the spectrogram words are mainly in the 160 variant. How do you get other variations such as 40, 60 and up? What kind of transformation is that?

Thanks!
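As far as I can tell, the 160 in the file names is the speaking rate (words per minute) of the synthetic voice, so the 40/60/... variants come from generating the TTS data at different rates rather than from transforming the 160-wpm spectrograms. There is no augmentation code in the repository that I know of; a simple starting point (a sketch in plain NumPy, assuming the spectrogram is a 2-D array of shape [freq_bins, time_steps] large enough for the mask sizes) is random time/frequency masking plus a small time shift:

import numpy as np

def augment_spectrogram(spec, max_freq_mask=8, max_time_mask=16, max_shift=10):
    """Return a randomly masked and time-shifted copy of a [freq, time] spectrogram."""
    spec = spec.copy()
    f0 = np.random.randint(0, spec.shape[0] - max_freq_mask)
    spec[f0:f0 + np.random.randint(1, max_freq_mask), :] = 0               # frequency mask
    t0 = np.random.randint(0, spec.shape[1] - max_time_mask)
    spec[:, t0:t0 + np.random.randint(1, max_time_mask)] = 0               # time mask
    return np.roll(spec, np.random.randint(-max_shift, max_shift), axis=1)  # time shift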

Any solution for the error -> Exception: Invalid objective: catagorical_crossentropy

I came across some errors. I solved some of them, but I can't solve this one.
speech_data.py produces the error below.

Looking for data spoken_numbers_pcm.tar in data/
Extracting data/spoken_numbers_pcm.tar to data/
'tar' is not recognized as an internal or external command,
operable program or batch file.
Data ready!
loaded batch of 2402 files
Traceback (most recent call last):
File "demo.py", line 15, in
net = tflearn.regression(net, optimizer='adam', learning_rate=learning_rate, loss='catagorical_crossentropy')
File "C:\python35\lib\site-packages\tflearn\layers\estimator.py", line 174, in regression
loss = objectives.get(loss)(incoming, placeholder)
File "C:\python35\lib\site-packages\tflearn\objectives.py", line 10, in get
return get_from_module(identifier, globals(), 'objective')
File "C:\python35\lib\site-packages\tflearn\utils.py", line 25, in get_from_module
raise Exception('Invalid ' + str(module_name) + ': ' + str(identifier))
Exception: Invalid objective: catagorical_crossentropy

What is this error and how do I resolve it?

Can someone tell me how to fix this issue, please?
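The objective name is simply misspelled: TFLearn only knows 'categorical_crossentropy', not 'catagorical_crossentropy'. A corrected version of the offending line from demo.py (the 'tar' warning is separate and just means tar is not on the Windows PATH):

net = tflearn.regression(net, optimizer='adam', learning_rate=learning_rate,
                         loss='categorical_crossentropy')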

TODOs

  • split test set(s) #28
  • make input->chars converge well (input->class works well already)
  • sliding window
  • merge WarpCTC or alternative
  • peer2peer training!

ValueError: No variables to save

I am getting an error while running the examples number_classifier_tflearn.py and speaker_classifier_tflearn.py. Details below:

Looking for data spoken_numbers_pcm.tar in data/
Extracting data/spoken_numbers_pcm.tar to data/
Data ready!
loaded batch of 2402 files
loaded batch of 2402 files
loaded batch of 2402 files
loaded batch of 2402 files
loaded batch of 2402 files
Traceback (most recent call last):
  File "number_classifier_tflearn.py", line 26, in <module>
    model = tflearn.DNN(net)
  File "C:\Program Files\Anaconda3\lib\site-packages\tflearn\models\dnn.py", line 57, in __init__
    session=session)
  File "C:\Program Files\Anaconda3\lib\site-packages\tflearn\helpers\trainer.py", line 125, in __init__
    keep_checkpoint_every_n_hours=keep_checkpoint_every_n_hours)
  File "C:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\training\saver.py", line 1000, in __init__
    self.build()
  File "C:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\training\saver.py", line 1021, in build
    raise ValueError("No variables to save")
ValueError: No variables to save

AND

15 speakers: ['Ralph', 'Albert', 'Vicki', 'Samantha', 'Junior', 'Kathy', 'Fred', 'Princess', 'Steffi', 'Alex', 'Daniel', 'Agnes', 'Victoria', 'Tom', 'Bruce']
speakers ['Ralph', 'Albert', 'Vicki', 'Samantha', 'Junior', 'Kathy', 'Fred', 'Princess', 'Steffi', 'Alex', 'Daniel', 'Agnes', 'Victoria', 'Tom', 'Bruce']
Looking for data spoken_numbers_pcm.tar in data/
Extracting data/spoken_numbers_pcm.tar to data/
Data ready!
15 speakers: ['Ralph', 'Albert', 'Vicki', 'Samantha', 'Junior', 'Kathy', 'Fred', 'Princess', 'Steffi', 'Alex', 'Daniel', 'Agnes', 'Victoria', 'Tom', 'Bruce']
loaded batch of 2402 files
Traceback (most recent call last):
  File "speaker_classifier_tflearn.py", line 27, in <module>
    model = tflearn.DNN(net)
  File "C:\Program Files\Anaconda3\lib\site-packages\tflearn\models\dnn.py", line 57, in __init__
    session=session)
  File "C:\Program Files\Anaconda3\lib\site-packages\tflearn\helpers\trainer.py", line 125, in __init__
    keep_checkpoint_every_n_hours=keep_checkpoint_every_n_hours)
  File "C:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\training\saver.py", line 1000, in __init__
    self.build()
  File "C:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\training\saver.py", line 1021, in build
    raise ValueError("No variables to save")
ValueError: No variables to save

Thanks..
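This error means the Saver inside tflearn.DNN finds no TensorFlow variables in the current default graph; that can happen when the layers end up in a different graph, or when the TFLearn and TensorFlow versions are mismatched. One thing worth trying (a sketch only, with illustrative sizes rather than the repo's real values) is to reset the default graph and build all layers immediately before constructing the DNN:

import tensorflow as tf
import tflearn

width, height, classes = 20, 80, 10          # illustrative sizes, not the project's exact values

tf.reset_default_graph()                      # build into a clean default graph
net = tflearn.input_data(shape=[None, width, height])
net = tflearn.fully_connected(net, 64)
net = tflearn.fully_connected(net, classes, activation='softmax')
net = tflearn.regression(net)
model = tflearn.DNN(net)                      # the Saver should now find the layer variables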

What should dtype of placeholder y_ in training be?

From speech_encoder.py,
batch_xs, batch_ys = speech.train.next_batch(100)
batch_xs=[flatten(matrix) for matrix in batch_xs]
feed = {x: batch_xs, y_: batch_ys}

The above has the following error:

ValueError: invalid literal for float(): 2 14 68 6 32 14 73 6 47 14 73 3

What should the placeholder y_ be?

Thanks!
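The labels coming out of that batch are strings of space-separated character indices, while a float placeholder expects numeric arrays. A rough conversion sketch (assuming a fixed maximum label length, here hypothetically 20, padded with zeros; the real maximum depends on the dataset):

import numpy as np

def label_to_vector(label, max_len=20):
    """'2 14 68 6 ...' -> fixed-length float32 vector of character indices."""
    ids = [float(tok) for tok in label.split()]
    ids = ids[:max_len] + [0.0] * (max_len - len(ids))
    return np.array(ids, dtype=np.float32)

batch_ys = [label_to_vector(y) for y in batch_ys]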

The error when running densenet_layer.py

ALSA lib confmisc.c:1286:(snd_func_refer) Unable to find definition 'cards.CA0106.pcm.hdmi.0:CARD=0,AES0=4,AES1=130,AES2=0,AES3=2'
ALSA lib conf.c:4292:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
ALSA lib conf.c:4771:(snd_config_expand) Evaluate error: No such file or directory
ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM hdmi
ALSA lib confmisc.c:1286:(snd_func_refer) Unable to find definition 'cards.CA0106.pcm.hdmi.0:CARD=0,AES0=4,AES1=130,AES2=0,AES3=2'
ALSA lib conf.c:4292:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
ALSA lib conf.c:4771:(snd_config_expand) Evaluate error: No such file or directory
ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM hdmi
ALSA lib confmisc.c:1286:(snd_func_refer) Unable to find definition 'cards.CA0106.pcm.modem.0:CARD=0'
ALSA lib conf.c:4292:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
ALSA lib conf.c:4771:(snd_config_expand) Evaluate error: No such file or directory
ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline:CARD=0,DEV=0
ALSA lib confmisc.c:1286:(snd_func_refer) Unable to find definition 'cards.CA0106.pcm.modem.0:CARD=0'
ALSA lib conf.c:4292:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
ALSA lib conf.c:4771:(snd_config_expand) Evaluate error: No such file or directory
ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline:CARD=0,DEV=0
ALSA lib confmisc.c:1286:(snd_func_refer) Unable to find definition 'cards.CA0106.pcm.modem.0:CARD=0'
ALSA lib conf.c:4292:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
ALSA lib conf.c:4771:(snd_config_expand) Evaluate error: No such file or directory
ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM phoneline
ALSA lib confmisc.c:1286:(snd_func_refer) Unable to find definition 'cards.CA0106.pcm.modem.0:CARD=0'
ALSA lib conf.c:4292:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
ALSA lib conf.c:4771:(snd_config_expand) Evaluate error: No such file or directory
ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM phoneline
[Errno Input overflowed] -9981
Expression 'ret' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 1735
Expression 'AlsaOpen( &alsaApi->baseHostApiRep, params, streamDir, &self->pcm )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 1902
Expression 'PaAlsaStreamComponent_Initialize( &self->capture, alsaApi, inParams, StreamDirection_In, NULL != callback )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2166
Expression 'PaAlsaStream_Initialize( stream, alsaHostApi, inputParameters, outputParameters, sampleRate, framesPerBuffer, callback, streamFlags, userData )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2835
Traceback (most recent call last):
File "record.py", line 101, in record
dataraw = stream.read(CHUNK)
File "/usr/lib/python3/dist-packages/pyaudio.py", line 605, in read
return pa.read_stream(self._stream, num_frames)
OSError: [Errno Input overflowed] -9981

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "record.py", line 138, in
record()
File "record.py", line 104, in record
stream=get_audio_input_stream()
File "record.py", line 71, in get_audio_input_stream
input_device_index=INDEX)
File "/usr/lib/python3/dist-packages/pyaudio.py", line 747, in open
stream = Stream(self, *args, **kwargs)
File "/usr/lib/python3/dist-packages/pyaudio.py", line 442, in init
self._stream = pa.open(**arguments)
OSError: [Errno Device unavailable] -9985
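The ALSA messages are mostly harmless configuration warnings; the crash itself is the input overflow in record.py. With a reasonably recent PyAudio the read call can be told to ignore overflows (a sketch of how the read around line 101 of record.py could be changed; older PyAudio versions do not accept the keyword, and get_audio_input_stream is the helper already defined in record.py):

try:
    dataraw = stream.read(CHUNK, exception_on_overflow=False)  # don't raise on overflow
except OSError:
    stream = get_audio_input_stream()   # reopen the stream and skip this chunk
    dataraw = b""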

errors

There are many errors when I use TensorFlow 0.12. Could you give me a readme for speech recognition?
thanks!

Error When Run densenet_layer.py

Hello everybody,
I have a problem.
./number_classifier_tflearn.py and ./speaker_classifier_tflearn.py run successfully, but densenet_layer.py is not working.
I followed these steps on Docker.

docker run -it -v C:\WorkData\GitRespostory\tensorflow-speech-recognition:/tf_speech gcr.io/tensorflow/tensorflow:latest-devel

Then, in the container shell:

cd /tensorflow
git pull

Then run these steps:

apt-get update
apt-get install -y libasound-dev portaudio19-dev libportaudio2 libportaudiocpp0
cd /tf_speech
pip install -r requirements.txt
pip install h5py
pip install librosa

Note: the spoken_words.tar file was downloaded manually and copied into the folder.
And now:

python densenet_layer.py

But it shows this error; please help me.

Traceback (most recent call last):
  File "densenet_layer.py", line 69, in <module>
    net.train(data=batch,batch_size=10,steps=5000,dropout=0.6,display_step=10,test_step=100) # run
  File "/tf_speech/layer/net.py", line 385, in train
    loss,_= session.run([self.cost,self.optimizer], feed_dict=feed_dict)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 766, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 943, in _run
    % (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (10, 262144) for Tensor u'data/Placeholder:0', which has shape '(?, 4096, 4096)'

record my own voice

Can anyone tell me how to record my own voice and where to put it so that I get speech-to-text conversion?
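A minimal recording sketch using PyAudio and the standard wave module (the 16 kHz rate, 3-second duration and output path are assumptions; the project's record.py uses its own settings):

import pyaudio, wave

RATE, CHUNK, SECONDS, OUT = 16000, 1024, 3, "my_voice.wav"   # assumed settings

pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                 input=True, frames_per_buffer=CHUNK)
frames = [stream.read(CHUNK) for _ in range(int(RATE / CHUNK * SECONDS))]
stream.stop_stream(); stream.close()

wf = wave.open(OUT, 'wb')
wf.setnchannels(1)
wf.setsampwidth(pa.get_sample_size(pyaudio.paInt16))
wf.setframerate(RATE)
wf.writeframes(b''.join(frames))
wf.close()
pa.terminate()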

Data for CTC in lstm to chars.

The data directory for CTC data in lstm_to_chars.py is given as:
INPUT_PATH = '/data/ctc/sample_data/mfcc' # directory of MFCC nFeatures x nFrames 2-D array .npy files
Where can I find the data (since it is not available in speech_data.py)?

tflearn error: No variables to save

Hi,
I downloaded speech_data.py and speaker_classifier_tflearn.py.
When I run speaker_classifier_tflearn.py, I get the following errors:
Traceback (most recent call last):
  File "speaker_classifier_tflearn.py", line 28, in <module>
    model = tflearn.DNN(net)
  File "/root/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tflearn/models/dnn.py", line 57, in __init__
    session=session)
  File "/root/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tflearn/helpers/trainer.py", line 125, in __init__
    keep_checkpoint_every_n_hours=keep_checkpoint_every_n_hours)
  File "/root/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1000, in __init__
    self.build()
  File "/root/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1021, in build
    raise ValueError("No variables to save")
ValueError: No variables to save

Training Number

The training keeps going on and on no matter what... I set the training_iters value to 3000, but it still keeps going. What is the reason?
(screenshot: 2017-08-03, 11:07 pm)

License of the project

Hi!
What license does the project have? Also, who is meant to be the copyright owner of code committed to the project by third-party developers?

I am getting this issue

NotFoundError (see above for traceback): Unsuccessful TensorSliceReader constructor: Failed to find any matching files for /home/hitesh/speec/tflearn.lstm.model
[[Node: save_1/RestoreV2_16 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save_1/Const_0, save_1/RestoreV2_16/tensor_names, save_1/RestoreV2_16/shape_and_slices)]]

could not run train.py

train.py uses a function prepare_data from speech_data, but no such function is defined in speech_data.

Broken dependency in densenet_layer.py?

Not able to run densenet_layer.py; getting the error below:

Traceback (most recent call last):
  File "densenet_layer.py", line 4, in <module>
    import layer
  File "D:\git\AI\tensorflow-speech-recognition\layer\__init__.py", line 1, in <module>
    from net import *
ImportError: No module named 'net'

I fixed the problem by changing line 1 of layer\__init__.py to:

from .net import *

Running on Windows 10, TensorFlow 0.12, Python 3.5.

Thanks..

Create Spectrograms

How did you create "Sample spectrogram, Karen uttering 'zero' with 160 words per minute."? How did you create that grayscale spectrogram?
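The original images come from the project's own generation scripts, which I have not traced in detail; a comparable grayscale spectrogram can be produced with librosa and matplotlib (a sketch; 'zero_Karen_160.wav' is a placeholder filename, and the FFT/hop sizes are assumptions):

import librosa
import numpy as np
import matplotlib.pyplot as plt

y, sr = librosa.load("zero_Karen_160.wav", sr=None)
S = np.abs(librosa.stft(y, n_fft=512, hop_length=128))   # magnitude spectrogram
S_db = librosa.amplitude_to_db(S, ref=np.max)             # log scale for visibility
plt.imsave("zero_Karen_160.png", S_db, cmap="gray", origin="lower")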

How to...

How does one use this code? More specifically: how does someone who doesn't have an NVIDIA GPU to train a model use the speech-to-text?

Train data is used to determine accuracy in dense_layer

I'm trying to use dense_layer, which uses spectro_batch_generator from speech_data.py to fetch batches of data. There it is already noted that the training and testing/validation sets need to be split:
# shuffle(files) # todo : split test_fraction batch here!

A bit further in dense_layer, the function train from layer/net.py is used. In the train function, currently around line 389, there is:

  feed_dict = {x: batch_xs, y: batch_ys, keep_prob: dropout, self.train_phase: True}
  loss,_= session.run([self.cost,self.optimizer], feed_dict=feed_dict)

Immediately followed by:

  if step % display_step == 0:
    # Calculate batch accuracy, loss
    feed = {x: batch_xs, y: batch_ys, keep_prob: 1., self.train_phase: False}
    acc , summary = session.run([self.accuracy,self.summaries], feed_dict=feed)

If I understand it correctly (and I am new to this, so it's likely that I am wrong), the data is first fed into the train step, after which the exact same data is used to determine the accuracy.
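Your reading looks right: the accuracy is computed on the same batch that was just used for the gradient step. A hedged sketch of how that display step in layer/net.py could instead evaluate on a separate batch (variable names follow the snippet above; test_batch is an assumed second generator built from a held-out file list, not something the repo currently provides):

if step % display_step == 0:
    # Pull a batch the optimizer has never seen instead of reusing batch_xs/batch_ys.
    test_xs, test_ys = next(test_batch)
    feed = {x: test_xs, y: test_ys, keep_prob: 1., self.train_phase: False}
    acc, summary = session.run([self.accuracy, self.summaries], feed_dict=feed)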

How to record my own speech?

I have tried to record, but unsuccessfully. There is the module record.py, but it doesn't save the speech.

What should I add to the code in order to recognize my own speech?

predict.py

I need a predict.py file. Can anyone please help me out with it?
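There is no predict.py in the repository as far as I can tell. A rough sketch of one (assuming a model trained and saved with TFLearn as in speech2text-tflearn.py; the feature extraction and the layer definitions below are placeholders and must be replaced with exactly what was used at training time, otherwise loading and prediction will fail):

import numpy as np
import librosa
import tflearn

def wav_to_mfcc(path, n_mfcc=20, max_len=80):
    """Load a wav and return a fixed-size MFCC matrix (padded/truncated in time)."""
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)[:, :max_len]
    return np.pad(mfcc, ((0, 0), (0, max_len - mfcc.shape[1])), mode='constant')

# Rebuild the network with EXACTLY the architecture used for training
# (the layers below are illustrative placeholders, not the repo's real definition).
net = tflearn.input_data([None, 20, 80])
net = tflearn.lstm(net, 128)
net = tflearn.fully_connected(net, 10, activation='softmax')
net = tflearn.regression(net)

model = tflearn.DNN(net)
model.load("tflearn.lstm.model")
print(model.predict([wav_to_mfcc("my_voice.wav")]))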

ImportError: No module named layer while running densenet_layer.py

Hi,

I could not run densenet_layer.py since it throws an import error for the module layer.

Traceback (most recent call last):
File "densenet_layer.py", line 6, in
import layer
ImportError: No module named layer

From my understanding, this is the layers module in TFLearn, but the model architecture defined here doesn't work:
net = layer.net(simple_dense, input_shape=(width,height), output_width=classes, learning_rate=0.01)

Thanks
Manishanker

ImportError: No module named core_rnn

Hi,

I'm trying to run ./number_classifier_tflearn.py and the following error occurs:
hdf5 is not supported on this machine (please install/reinstall h5py for optimal experience)
Traceback (most recent call last):
  File "./number_classifier_tflearn.py", line 3, in <module>
    import tflearn
  File "/usr/local/lib/python2.7/site-packages/tflearn/__init__.py", line 21, in <module>
    from .layers import normalization
  File "/usr/local/lib/python2.7/site-packages/tflearn/layers/__init__.py", line 10, in <module>
    from .recurrent import lstm, gru, simple_rnn, bidirectional_rnn, \
  File "/usr/local/lib/python2.7/site-packages/tflearn/layers/recurrent.py", line 8, in <module>
    from tensorflow.contrib.rnn.python.ops.core_rnn import static_rnn as _rnn, \
ImportError: No module named core_rnn

Any suggestions?

OS: OS X Yosemite

multiple problems with speech2text-seq2seq.py

Broken dependency: "sugartensor" is not in requirements.txt

AND

"tensorflow.examples.tutorials" is not available in the Windows install of TensorFlow; it needs to be copied manually from the TensorFlow git repo.

AND

Line 13 ("Update:") needs to be commented out in speech2text-seq2seq.py.

AND

Traceback (most recent call last):
  File "speech2text-seq2seq.py", line 65, in <module>
    z = x.sg_conv1d(size=1, dim=num_dim, act='tanh', bn=True)
AttributeError: 'list' object has no attribute 'sg_conv1d'

OS: Windows 10,
TensorFlow: 0.12
Python: 3.5

How to create custom 'train_words_index.txt'

I would like to know how to get this sequence of numbers (2 42 14 66 93 19 46 42 24 43 49 3).

In train_words_index.txt there are lines consisting of a word file and a sequence of numbers, like this:
'measurement_Victoria_160.wav.png 2 42 14 66 93 19 46 42 24 43 49 3'. I have tried to find out how to create this sequence of numbers in many places, but couldn't find it.

Thank you in advance,

newer paper?

Hello, Pannous,
You do great work!!! Really great!!!
I'm a newcomer to speech recognition. I noticed that you cite some papers from 2012 and 2014 in this project. Now it is 2017. If I want to reproduce state-of-the-art work, do you think I should read some recent papers beyond the ones you use in this project?
Please give me some suggestions. Thanks in advance.

Training is not using GPU capacity

Hey everyone!

I'm trying to train my model with the speech2text-tflearn code. Unfortunately it takes ages to train (a few days). I have installed TensorFlow with GPU support, but the code is not using any of the GPU's capacity. I have not changed any parameters. What am I getting wrong? Any suggestions?

Thanks!
Cheers
julitosm
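Before touching the training code it is worth confirming that TensorFlow can actually see the GPU (this works on the TF 1.x versions this project targets):

from tensorflow.python.client import device_lib

# Lists CPU and any visible GPU devices; if no GPU device shows up,
# the GPU build of TensorFlow (or the CUDA/cuDNN setup) is the problem,
# not this project's code.
print([d.name for d in device_lib.list_local_devices()])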

Error while reshaping tensors

I am facing a problem reshaping my tensors. Right now I am running train.py from your source, but I got the following error:

File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 625, in _run
    % (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (100, 4096) for Tensor 'Placeholder:0', which has shape '(?, 262144)'

This is my code snippet:

for i in range(6000-1):
    batch_xs, batch_ys = speech.train.next_batch(100)
    # WTF, tensorflow can't do 3D tensor operations?
    # https://github.com/tensorflow/tensorflow/issues/406 =>

    batch_xs=[flatten(matrix) for matrix in batch_xs]

    #batch_ys = np.reshape(batch_ys, (100,4096))
    #batch_xs = np.reshape(batch_xs, (4096,100))

    #  you have to reshape to flat/matrix data? why didn't they call it matrixflow?
    feed = {x: batch_xs, y_: batch_ys}
    speech_step.run(feed) # better for encod_entropy too! (later)
    if(i%100==0):
        print("iteration %d"%i)#, end=' ')
        eval(feed)
    if((i+1)%7000==0):
      print("l_rate*=0.1")
      sess.run(tf.assign(l_rate,l_rate*0.1))
  print("Train")
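For what it's worth, 4096 = 64x64 while 262144 = 512x512, so the placeholder appears to be declared for a different spectrogram size than the one speech_data actually produces. A hedged sketch of defining the placeholders from shared constants instead of hard-coded sizes (width, height and classes below are illustrative values, not the repo's exact settings):

import tensorflow as tf

width = height = 64        # must match the spectrogram size speech_data generates (64*64 = 4096)
classes = 10               # illustrative number of target classes
x  = tf.placeholder(tf.float32, [None, width * height])
y_ = tf.placeholder(tf.float32, [None, classes])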

problems with tensorboard_util.py

tensorboard_util.py will not run on Windows; I made the following changes to make it work.

In layer/__init__.py:
changed "from tensorboard_util import *" to "from .tensorboard_util import *"

tensorboard_logs = '/tmp/tensorboard_logs/'
needs to be updated for Windows; I just changed it to tensorboard_logs = './tmp/tensorboard_logs/'

and changed
logs=subprocess.check_output(["ls", tensorboard_logs]).split("\n")
to
logs=subprocess.check_output(["ls", tensorboard_logs]).decode("utf-8").split("\n")

thanks..

How to classify an entire sentence

I have many sounds in WAV format, each about 10 seconds long. Every WAV is one of ten sentences, for example: "The number you have dialed is powered off" / "The dialed number does not exist" / some other sentences. Is it suitable to use your project for this?
