nl8590687 / asrt_speechrecognition

A Deep-Learning-Based Chinese Speech Recognition System 基于深度学习的中文语音识别系统

Home Page: https://asrt.ailemon.net

License: GNU General Public License v3.0

Python 97.73% Dockerfile 1.70% HTML 0.58%
tensorflow cnn ctc python keras speech-recognition speech-to-text chinese-speech-recognition asrt python3


asrt_speechrecognition's People

Contributors

atomicvar · dependabot[bot] · huangyz0918 · nl8590687 · phanatoszou · poria-cat · williamchenwl · wkyo · zhangxu999


asrt_speechrecognition's Issues

Training error

Could you tell me what causes the following error during training? Thanks!
python3 SpeechModel22.py
[*Notice] Model created and compiled successfully
Traceback (most recent call last):
File "SpeechModel22.py", line 384, in
ms.LoadModel(modelpath + 'm22_2\speech_model22_e_0_step_257000.model')
File "SpeechModel22.py", line 178, in LoadModel
self._model.load_weights(filename)
File "/usr/local/lib/python3.5/dist-packages/keras/engine/topology.py", line 2658, in load_weights
with h5py.File(filepath, mode='r') as f:
File "/usr/local/lib/python3.5/dist-packages/h5py/_hl/files.py", line 269, in __init__
fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
File "/usr/local/lib/python3.5/dist-packages/h5py/_hl/files.py", line 99, in make_fid
fid = h5f.open(name, flags, fapl=fapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5f.pyx", line 78, in h5py.h5f.open
OSError: Unable to open file (unable to open file: name = 'model_speech/m22_2\speech_model22_e_0_step_257000.model', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)
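The traceback above stems from a hard-coded Windows path separator: on Linux, 'm22_2\speech_model22_e_0_step_257000.model' is a single file name containing a literal backslash, so h5py reports "No such file or directory". A minimal sketch of portable path handling (the directory and file names are taken from the traceback; `model_path` is a hypothetical helper, not part of the repo):

```python
import os

def model_path(base_dir, *parts):
    """Join path components with the platform's separator instead of
    hard-coding backslashes, so the same call works on Windows and Linux."""
    return os.path.join(base_dir, *parts)

p = model_path('model_speech', 'm22_2', 'speech_model22_e_0_step_257000.model')
```

Forward slashes also work on both platforms in Python, so writing 'm22_2/speech_model22_e_0_step_257000.model' would avoid the error as well.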

No pred output

While retraining the network, I found that the log file contains no predict output and the error rate shows 100% every time, even though the terminal shows the loss decreasing at every iteration. My lr is 0.1; after training on 24000 samples the loss plateaus at 80-81 and stops falling.
Below is my log file. What could be the reason?


125
True:	[ 631 1155  840  376 1135 1232 1035  291  108  766  686 1120  405  921
  535]
Pred:	[]

126
True:	[ 631 1155  884 1062  786  816 1074  308 1133 1098 1130  899  751  165
 1227]
Pred:	[]

127
True:	[ 631 1155  882  905  191  208 1001 1108 1057  180  670  316 1029 1133
  908]
Pred:	[]

*[Test result] Speech recognition train-set character error rate: 100.0 %

When I use the model from your v0.3 release, the log file does contain pred output, but if I continue training from the v0.3 model, the resulting log still shows no pred.
Any help is appreciated! Thanks a lot!

How can user intent be judged when the audio files have no transcripts?

Hello. I wonder whether this project can reach the following goal: I have several hundred thousand wav files, each a few seconds long, which overall fall into two classes, affirmative replies and refusals, but the files have no corresponding transcripts.

I would like to train a model on these files for speech recognition. I don't need accurate recognition output, only a judgment of the speaker's intent. How could this be implemented?
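Since only an accept/reject decision is needed, per-utterance classification may be simpler than full recognition. A minimal sketch under the assumption that each wav has already been reduced to a fixed-length feature vector (e.g. averaged spectral features, computed elsewhere) and that the files carry accept/reject labels; the nearest-centroid classifier here is purely illustrative, not part of ASRT:

```python
import numpy as np

def fit_centroids(features, labels):
    """Average the feature vectors of each class ('accept' / 'reject')."""
    return {c: features[labels == c].mean(axis=0) for c in np.unique(labels)}

def predict_intent(centroids, x):
    """Assign x to the class whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))

# Synthetic stand-in for extracted features: two well-separated clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (50, 8)), rng.normal(3.0, 0.5, (50, 8))])
y = np.array(['reject'] * 50 + ['accept'] * 50)
cents = fit_centroids(X, y)
```

With real data one would replace the synthetic features by, say, mean MFCC vectors, and a stronger classifier than nearest-centroid, but the labeling and training loop stay this simple.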

Not enough time for target transition sequence

Hello. Following your data-loading approach, I converted the model to a TensorFlow version and hit the following problem:
InvalidArgumentError (see above for traceback): Not enough time for target transition sequence (required: 64, available: 39)0You can turn this error into a warning by using the flag ignore_longer_outputs_than_inputs
[[Node: loss/CTCLoss = CTCLoss[ctc_merge_repeated=true, ignore_longer_outputs_than_inputs=false, preprocess_collapse_repeated=false, _device="/job:localhost/replica:0/task:0/cpu:0"](output_logits/_31, _arg_input_label/indices_0_0, _arg_input_label/values_0_2, _arg_seq_len_0_4)]]

With ignore_longer_outputs_than_inputs=True in CTCLoss the code runs, but the training loss becomes inf.

I've been stuck on this for a long time. How can it be solved?
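The inf loss after setting ignore_longer_outputs_than_inputs=True is expected: samples whose label is longer than the number of CTC output steps get an infinite loss, which then contaminates the batch average. A common alternative is to drop such samples before training. A minimal sketch, assuming (as in this model family) that the conv/pool stack downsamples time by a fixed factor, taken here to be 8:

```python
def ctc_compatible(num_input_frames, label_length, downsample=8):
    """CTC needs at least as many output time steps as label symbols.
    With `downsample`x temporal reduction inside the network, an utterance of
    num_input_frames feature frames yields num_input_frames // downsample
    output steps; samples violating the inequality trigger
    'Not enough time for target transition sequence'."""
    return num_input_frames // downsample >= label_length

# Drop incompatible (frames, label_len) pairs from a candidate batch.
batch = [(512, 64), (312, 64), (400, 39)]
kept = [s for s in batch if ctc_compatible(*s)]
```

The second sample above reproduces the error message in this issue (required: 64, available: 39); filtering it out avoids both the error and the inf loss.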

Error during testing: ValueError: operands could not be broadcast together with shapes (320,) (400,). What causes it, and how can it be fixed? Thanks

Traceback (most recent call last):
File "E:\百度云同步盘\pythonProjects\ASRT_v0.3\test.py", line 40, in
r = ms.RecognizeSpeech_FromFile('E:\10086.wav')
File "E:\百度云同步盘\pythonProjects\ASRT_v0.3\SpeechModel25.py", line 357, in RecognizeSpeech_FromFile
r = self.RecognizeSpeech(wavsignal, fs)
File "E:\百度云同步盘\pythonProjects\ASRT_v0.3\SpeechModel25.py", line 326, in RecognizeSpeech
data_input = GetFrequencyFeature3(wavsignal, fs)
File "E:\百度云同步盘\pythonProjects\ASRT_v0.3\general_function\file_wav.py", line 128, in GetFrequencyFeature3
data_line = data_line * w # apply window
ValueError: operands could not be broadcast together with shapes (320,) (400,)

Why not use the AISHELL dataset?

How low a CER can the model currently reach? I trained a DeepSpeech2 model on AISHELL data and the CER stays as high as 16%, and I have no good ideas for lowering it. Do you have any suggestions?

Training data and label

I want to know where "dict.txt" and the train/, test/, dev/ and trans/label files come from.
They do not match the THCHS-30 data downloaded from the website.

Can you tell me or share your dataset? Thx

This problem occurs while running

Traceback (most recent call last):
File "asrserver.py", line 16, in
ms = ModelSpeech(datapath)
File "C:\ASRT_SpeechRecognition-master\SpeechModel24.py", line 39, in __init__
self._model, self.base_model = self.CreateModel()
File "C:\ASRT_SpeechRecognition-master\SpeechModel24.py", line 126, in CreateModel
test_func = K.function([input_data], [y_pred])
File "C:\Users\bub20\AppData\Local\Programs\Python\Python36\lib\site-packages\keras-2.1.6-py3.6.egg\keras\backend\theano_backend.py", line 1248, in function
File "C:\Users\bub20\AppData\Local\Programs\Python\Python36\lib\site-packages\keras-2.1.6-py3.6.egg\keras\backend\theano_backend.py", line 1234, in __init__
File "C:\Users\bub20\AppData\Local\Programs\Python\Python36\lib\site-packages\theano\compile\function.py", line 317, in function
output_keys=output_keys)
File "C:\Users\bub20\AppData\Local\Programs\Python\Python36\lib\site-packages\theano\compile\pfunc.py", line 486, in pfunc
output_keys=output_keys)
File "C:\Users\bub20\AppData\Local\Programs\Python\Python36\lib\site-packages\theano\compile\function_module.py", line 1839, in orig_function
name=name)
File "C:\Users\bub20\AppData\Local\Programs\Python\Python36\lib\site-packages\theano\compile\function_module.py", line 1487, in __init__
accept_inplace)
File "C:\Users\bub20\AppData\Local\Programs\Python\Python36\lib\site-packages\theano\compile\function_module.py", line 181, in std_fgraph
update_mapping=update_mapping)
File "C:\Users\bub20\AppData\Local\Programs\Python\Python36\lib\site-packages\theano\gof\fg.py", line 175, in __init__
self.__import_r__(output, reason="init")
File "C:\Users\bub20\AppData\Local\Programs\Python\Python36\lib\site-packages\theano\gof\fg.py", line 346, in __import_r__
self.__import__(variable.owner, reason=reason)
File "C:\Users\bub20\AppData\Local\Programs\Python\Python36\lib\site-packages\theano\gof\fg.py", line 391, in __import__
raise MissingInputError(error_msg, variable=r)
theano.gof.fg.MissingInputError: Input 0 of the graph (indices start from 0), used to compute if{}(keras_learning_phase, Elemwise{true_div,no_inplace}.0, Elemwise{mul,no_inplace}.0), was not provided and not given a value. Use the Theano flag exception_verbosity='high', for more information on this error.

Backtrace when that variable is created:

File "", line 656, in _load_unlocked
File "", line 626, in load_backward_compatible
File "C:\Users\bub20\AppData\Local\Programs\Python\Python36\lib\site-packages\keras-2.1.6-py3.6.egg\keras\backend\__init__.py", line 81, in <module>
from .theano_backend import *
File "", line 971, in _find_and_load
File "", line 955, in _find_and_load_unlocked
File "", line 656, in _load_unlocked
File "", line 626, in _load_backward_compatible
File "C:\Users\bub20\AppData\Local\Programs\Python\Python36\lib\site-packages\keras-2.1.6-py3.6.egg\keras\backend\theano_backend.py", line 34, in
_LEARNING_PHASE = T.scalar(dtype='uint8', name='keras_learning_phase') # 0 = test, 1 = train
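The MissingInputError above says the Theano graph contains the keras_learning_phase input, but K.function was never given it: with the Theano backend, a model containing train/test-dependent layers (Dropout here) needs the learning phase passed explicitly. A hedged sketch of the change in CreateModel (a patch fragment against the Keras backend API of that era, not verified here):

```python
from keras import backend as K

# Instead of: test_func = K.function([input_data], [y_pred])
test_func = K.function([input_data, K.learning_phase()], [y_pred])

# ...and append the phase flag when calling it (0 = test, 1 = train):
# y_pred_value = test_func([x_batch, 0])
```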

Some tones in dict.txt do not appear in the Chinese dictionary

Two main points for discussion:

  • For example jie5: by the author's convention this should be the neutral tone, but the Chinese dictionary has no such pronunciation.
    (screenshot)

  • The pinyin entries appended at the end of the file are in rather messy order; some duplicate earlier entries, and some likewise do not appear in the Chinese dictionary.
    (screenshot)

Recognition errors on self-recorded audio

File "D:\1GH\work\ASRT_v0.4\general_function\file_wav.py", line 128, in GetFrequencyFeature3
data_line = data_line * w # apply window
ValueError: operands could not be broadcast together with shapes (330,) (400,)

Does this problem require changing the code, or something else? Has the author run into it?

multi gpu bug

You must add "config = tf.ConfigProto(allow_soft_placement=True)" in train_mspeech.py
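For reference, a session configured this way (TF 1.x API, as used by this project at the time) lets TensorFlow fall back to another device when an op has no kernel for the requested GPU, which is what the multi-GPU run needs. A sketch, assuming the Keras TensorFlow backend; the allow_growth line is an optional extra, not part of the reported fix:

```python
import tensorflow as tf
from keras import backend as K

# Allow ops without a GPU kernel to be placed on the CPU instead of failing.
config = tf.ConfigProto(allow_soft_placement=True)
config.gpu_options.allow_growth = True  # optional: claim GPU memory on demand
K.set_session(tf.Session(config=config))
```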

Test error

Using TensorFlow backend.
[*Notice] Model created and compiled successfully
Test progress: 0 / 128
Test progress: 10 / 128
Test progress: 20 / 128
Test progress: 30 / 128
Test progress: 40 / 128
Test progress: 50 / 128
Test progress: 60 / 128
Test progress: 70 / 128
Test progress: 80 / 128
Test progress: 90 / 128
Test progress: 100 / 128
Test progress: 110 / 128
Test progress: 120 / 128
*[Test result] Speech recognition test-set character error rate: 30.01432664756447 %
Traceback (most recent call last):
File "test_mspeech.py", line 55, in
r = ms.RecognizeSpeech_FromFile('/root/ffdwd1.wav')
File "/data/ASRT_SpeechRecognition-0.3/SpeechModel25.py", line 357, in RecognizeSpeech_FromFile
r = self.RecognizeSpeech(wavsignal, fs)
File "/data/ASRT_SpeechRecognition-0.3/SpeechModel25.py", line 337, in RecognizeSpeech
r1 = self.Predict(data_input, input_length)
File "/data/ASRT_SpeechRecognition-0.3/SpeechModel25.py", line 276, in Predict
x_in[i,0:len(data_input)] = data_input
ValueError: could not broadcast input array from shape (43059,200,1) into shape (1600,200,1)
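The broadcast failure in Predict means the utterance produced 43059 feature frames while the model's input buffer holds at most 1600 (roughly 16 s of audio at a 10 ms hop), so /root/ffdwd1.wav is far too long to recognize in one pass. One hedged workaround, not part of the repo, is to recognize such files in chunks:

```python
def chunk_ranges(num_frames, max_frames=1600):
    """Split a long feature sequence into consecutive (start, end) ranges
    that each fit the model's fixed 1600-frame input buffer."""
    return [(s, min(s + max_frames, num_frames))
            for s in range(0, num_frames, max_frames)]

ranges = chunk_ranges(43059)
```

Each range would then be fed to RecognizeSpeech separately and the pinyin outputs concatenated; splitting at silence rather than at fixed boundaries avoids cutting a syllable in half.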

A question about train_mspeech

I put the dataset on drive D (I don't know how to inspect the data_thchs30 files), and training fails with the following error:
FileNotFoundError: [Errno 2] No such file or directory: 'D:\语音数据集\data_thchs30\train\C13_571.wav'
Could you advise?

Empty result

What software do you all use to record 16 kHz, mono, 16-bit files? Thanks. Why does audio recorded with the built-in Windows 10 recorder give the following test result?
Using TensorFlow backend.
[*Notice] Model created and compiled successfully
*[Notice] Speech recognition result:
[]
Speech-to-text result:

[Finished in 21.6s]

Thanks

Some questions and suggestions

Hello,
1. Among the Chinese characters in languagemodel1.txt there are many "?" marks; shouldn't there be only one?
2. What corpus were the word frequencies in languagemodel1.txt and languagemodel2.txt computed from?
3. The pycorrector model can correct the recognized Chinese text to some extent:
import pycorrector
hanzi = pycorrector.correct(hanzi)[0]
4. wav files that are not (mono, 16 kHz) cannot be recognized, but ffmpeg can be invoked from the code to convert the format:
subprocess.call("ffmpeg -i ** -ac 1 -ar 16000 **.wav", shell=True)
5. Companies such as Baidu and iFLYTEK use KenLM for language-model training; could this project also use KenLM?
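Point 4's conversion can be made a little safer by building an argument list instead of a shell string (so file names with spaces survive). The flags below are standard ffmpeg, with pcm_s16le forcing 16-bit PCM samples; the wrapper function itself is illustrative, not part of ASRT:

```python
import subprocess

def to_asrt_wav(src, dst):
    """Build the ffmpeg command that converts any audio file to the mono,
    16 kHz, 16-bit PCM WAV the recognizer expects. Returns the argument
    list; run it with subprocess when ffmpeg is installed."""
    cmd = ['ffmpeg', '-y', '-i', src,
           '-ac', '1',               # mono
           '-ar', '16000',           # 16 kHz sample rate
           '-acodec', 'pcm_s16le',   # 16-bit PCM
           dst]
    return cmd

cmd = to_asrt_wav('input.mp3', 'output.wav')
# subprocess.run(cmd, check=True)  # uncomment to actually convert
```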

multi gpu init bug

multi_gpu.py needs its __init__ method fixed to:
class ParallelModel(keras.models.Model):
"""Subclasses the standard Keras Model and adds multi-GPU support.
It works by creating a copy of the model on each GPU. Then it slices
the inputs and sends a slice to each copy of the model, and then
merges the outputs together and applies the loss on the combined
outputs.
"""

def __init__(self, keras_model, gpu_count):
    """Class constructor.
    keras_model: The Keras model to parallelize
    gpu_count: Number of GPUs. Must be > 1
    """
    super(ParallelModel, self).__init__()
    self.inner_model = keras_model
    self.gpu_count = gpu_count
    merged_outputs = self.make_parallel()
    super(ParallelModel, self).__init__(inputs=self.inner_model.inputs,
                                        outputs=merged_outputs)

Anyone interested in machine learning, speech recognition and related topics is welcome to join the project author's AI Lemon blog group for discussion. QQ group: 894112051; the WeChat group QR code is below.

Anyone interested in machine learning, artificial intelligence and speech recognition is welcome to join my "AI Lemon blog group" for discussion. (^_^)
Group 2: 894112051
Group 1: 867888133 (full; please join group 2)
My personal blog: AI Lemon (https://blog.ailemon.me)
WeChat official account: "AI柠檬博客" (WeChat ID: ailemon_me)
(AI Lemon blog WeChat official-account QR code)
Twitter: @ailemon_me
From time to time I share machine-learning, AI and speech-recognition material, news and learning resources.

To join the AI Lemon WeChat group, first add "AI柠檬" (ailemon-me) on WeChat; once accepted you will be pulled into the group chat:
(AI Lemon WeChat QR code)

The decoded Chinese characters are rather messy

I randomly tested two THCHS-30 test-set utterances:

  1. A2_0
    绿 是 阳春 烟 景 大块 文章 的 底色 四月 的 林 峦 更是 绿 得 鲜活 秀媚 诗意 盎然
    lv4 shi4 yang2 chun1 yan1 jing3 da4 kuai4 wen2 zhang1 de5 di3 se4 si4 yue4 de5 lin2 luan2 geng4 shi4 lv4 de5 xian1 huo2 xiu4 mei4 shi1 yi4 ang4 ran2
    l v4 sh ix4 ii iang2 ch un1 ii ian1 j ing3 d a4 k uai4 uu un2 zh ang1 d e5 d i3 s e4 s iy4 vv ve4 d e5 l in2 l uan2 g eng4 sh ix4 l v4 d e5 x ian1 h uo2 x iu4 m ei4 sh ix1 ii i4 aa ang4 r an2

Recognition result:
'lv4', 'shi4', 'yang2', 'chun1', 'yan1', 'jing3', 'da4', 'kuai4', 'wen2', 'rang4', 'de5', 'di3', 'se4', 'si4', 'yue4', 'de5', 'ling2', 'wan2', 'ge4', 'shi2', 'lv4', 'de5', 'xian1', 'huo2', 'xing4', 'mei4', 'shi4', 'yi4', 'er4', 'ran2'

让的底色四月的领玩个时虑的鲜活性昧是一二然

  2. A2_2
    企业 依靠 技术 挖潜 增效 他 负责 全厂 产品质量 与 技术培训 成了 厂里 的 大忙人
    qi3 ye4 yi1 kao4 ji4 shu4 wa1 qian2 zeng1 xiao4 ta1 fu4 ze2 quan2 chang3 chan2 pin3 zhi4 liang4 yu3 ji4 shu4 pei2 xun4 cheng2 le5 chang3 li3 de5 da4 mang2 ren2
    q i3 ii ie4 ii i1 k ao4 j i4 sh u4 uu ua1 q ian2 z eng1 x iao4 t a1 f u4 z e2 q van2 ch ang3 ch an2 p in3 zh ix4 l iang4 vv v3 j i4 sh u4 p ei2 x vn4 ch eng2 l e5 ch ang3 l i3 d e5 d a4 m ang2 r en2

['qi3', 'ye4', 'yi1', 'hou4', 'ji4', 'shu4', 'wa1', 'qian2', 'ceng2', 'xiao4', 'ta1', 'fu4', 'ze2', 'quan2', 'chang3', 'cai2', 'pin3', 'zhi4', 'liang4', 'yu3', 'ji4', 'shu4', 'pei2', 'qu4', 'cheng2', 'le5', 'chang2', 'li3', 'de5', 'da4', 'meng2', 'ren2']
Speech-to-text result:
企业一曾笑他负责全场才品质量与技术培去成了厂里的大蒙人

The pinyin recognition rate is decent, probably around 80%, but the Chinese characters are messy; it seems an effective LM for correcting them is missing.

InvalidArgumentError (see above for traceback): Not enough time for target transition sequence

Caused by op u'ctc/CTCLoss', defined at:
File "train_mspeech.py", line 44, in
ms = ModelSpeech(datapath)
File "/home/yuyin/ASRT_SpeechRecognition-master/SpeechModel24.py", line 39, in __init__
self._model, self.base_model = self.CreateModel()
File "/home/yuyin/ASRT_SpeechRecognition-master/SpeechModel24.py", line 109, in CreateModel
loss_out = Lambda(self.ctc_lambda_func, output_shape=(1,), name='ctc')([y_pred, labels, input_length, label_length])
File "/usr/local/lib/python2.7/dist-packages/keras/engine/base_layer.py", line 457, in __call__
output = self.call(inputs, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/keras/layers/core.py", line 682, in call
return self.function(inputs, **arguments)
File "/home/yuyin/ASRT_SpeechRecognition-master/SpeechModel24.py", line 136, in ctc_lambda_func
return K.ctc_batch_cost(labels, y_pred, input_length, label_length)
File "/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py", line 4167, in ctc_batch_cost
sequence_length=input_length), 1)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/ctc_ops.py", line 158, in ctc_loss
ignore_longer_outputs_than_inputs=ignore_longer_outputs_than_inputs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_ctc_ops.py", line 283, in ctc_loss
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3290, in create_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1654, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Not enough time for target transition sequence (required: 15, available: 13)10You can turn this error into a warning by using the flag ignore_longer_outputs_than_inputs
[[Node: ctc/CTCLoss = CTCLoss[ctc_merge_repeated=true, ignore_longer_outputs_than_inputs=false, preprocess_collapse_repeated=false, _device="/job:localhost/replica:0/task:0/device:CPU:0"](ctc/Log/_285, ctc/ToInt64/_287, ctc/ToInt32_2/_289, ctc/ToInt32_1/_291)]]
[[Node: ctc/CTCLoss/_293 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_889_ctc/CTCLoss", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]

Incomplete pinyin-to-text conversion

I'm new to speech recognition. When I test, the pinyin-to-text conversion comes out incomplete; what are the possible causes?
The problem is as follows:
['su1', 'bei3', 'jun1', 'de5', 'yi1', 'qi3', 'ai4', 'guo2', 'jiang4', 'shi4', 'ma3', 'zhan4', 'shan1', 'yi3', 'du4', 'tang2', 'ju4', 'wu3', 'su1', 'bing3', 'ai4', 'dan4', 'tie3', 'mei2', 'gan3', 'ye3', 'fen4', 'qi3', 'kang4', 'zhan4']
Speech-to-text result:
苏北军的一起爱国将是马占山以铁煤赶也分起抗战

Training error

Model construction succeeds, but training fails right at the start, as follows:

Invalid argument: Saw a non-null label (index >= num_classes - 1) following a null label, batch: 0 num_classes: 1415 labels:
Traceback (most recent call last):
  File "G:\asr\asrvenv\lib\site-packages\tensorflow\python\client\session.py", line 1361, in _do_call
    return fn(*args)
  File "G:\asr\asrvenv\lib\site-packages\tensorflow\python\client\session.py", line 1340, in _run_fn
    target_list, status, run_metadata)
  File "G:\asr\asrvenv\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 516, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Saw a non-null label (index >= num_classes - 1) following a null label, batch: 0 num_classes: 1415 labels:
         [[Node: ctc/CTCLoss = CTCLoss[ctc_merge_repeated=true, ignore_longer_outputs_than_inputs=false, preprocess_collapse_repeated=false, _device="/job:localhost/replica:0/task:0/device:CPU:0"](ctc/Log/_1203, ctc/ToInt64/_1205, ctc/ToInt32_2/_1207, ctc/ToInt32_1/_1209)]]
         [[Node: training/Adadelta/gradients/lstm_1/while/Softmax_grad/mul_1/_959 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_5057_training/Adadelta/gradients/lstm_1/while/Softmax_grad/mul_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](^_clooptraining/Adadelta/gradients/NextIteration_7/_252)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File ".\SpeechModel.py", line 345, in <module>
    ms.TrainModel(datapath, epoch = 2, batch_size = 8, save_step = 1)
  File ".\SpeechModel.py", line 161, in TrainModel
    self._model.fit_generator(yielddatas, save_step)
  File "G:\asr\asrvenv\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "G:\asr\asrvenv\lib\site-packages\keras\engine\training.py", line 2224, in fit_generator
    class_weight=class_weight)
  File "G:\asr\asrvenv\lib\site-packages\keras\engine\training.py", line 1883, in train_on_batch
    outputs = self.train_function(ins)
  File "G:\asr\asrvenv\lib\site-packages\keras\backend\tensorflow_backend.py", line 2478, in __call__
    **self.session_kwargs)
  File "G:\asr\asrvenv\lib\site-packages\tensorflow\python\client\session.py", line 905, in run
    run_metadata_ptr)
  File "G:\asr\asrvenv\lib\site-packages\tensorflow\python\client\session.py", line 1137, in _run
    feed_dict_tensor, options, run_metadata)
  File "G:\asr\asrvenv\lib\site-packages\tensorflow\python\client\session.py", line 1355, in _do_run
    options, run_metadata)
  File "G:\asr\asrvenv\lib\site-packages\tensorflow\python\client\session.py", line 1374, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Saw a non-null label (index >= num_classes - 1) following a null label, batch: 0 num_classes: 1415 labels:
         [[Node: ctc/CTCLoss = CTCLoss[ctc_merge_repeated=true, ignore_longer_outputs_than_inputs=false, preprocess_collapse_repeated=false, _device="/job:localhost/replica:0/task:0/device:CPU:0"](ctc/Log/_1203, ctc/ToInt64/_1205, ctc/ToInt32_2/_1207, ctc/ToInt32_1/_1209)]]
         [[Node: training/Adadelta/gradients/lstm_1/while/Softmax_grad/mul_1/_959 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_5057_training/Adadelta/gradients/lstm_1/while/Softmax_grad/mul_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](^_clooptraining/Adadelta/gradients/NextIteration_7/_252)]]

Caused by op 'ctc/CTCLoss', defined at:
  File ".\SpeechModel.py", line 342, in <module>
    ms = ModelSpeech(datapath)
  File ".\SpeechModel.py", line 44, in __init__
    self._model = self.CreateModel()
  File ".\SpeechModel.py", line 109, in CreateModel
    loss_out = Lambda(self.ctc_lambda_func, output_shape=(1,), name='ctc')([y_pred, labels, input_length, label_length])
  File "G:\asr\asrvenv\lib\site-packages\keras\engine\topology.py", line 619, in __call__
    output = self.call(inputs, **kwargs)
  File "G:\asr\asrvenv\lib\site-packages\keras\layers\core.py", line 663, in call
    return self.function(inputs, **arguments)
  File ".\SpeechModel.py", line 135, in ctc_lambda_func
    return K.ctc_batch_cost(labels, y_pred, input_length, label_length)
  File "G:\asr\asrvenv\lib\site-packages\keras\backend\tensorflow_backend.py", line 3950, in ctc_batch_cost
    sequence_length=input_length), 1)
  File "G:\asr\asrvenv\lib\site-packages\tensorflow\python\ops\ctc_ops.py", line 158, in ctc_loss
    ignore_longer_outputs_than_inputs=ignore_longer_outputs_than_inputs)
  File "G:\asr\asrvenv\lib\site-packages\tensorflow\python\ops\gen_ctc_ops.py", line 231, in _ctc_loss
    name=name)
  File "G:\asr\asrvenv\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "G:\asr\asrvenv\lib\site-packages\tensorflow\python\framework\ops.py", line 3271, in create_op
    op_def=op_def)
  File "G:\asr\asrvenv\lib\site-packages\tensorflow\python\framework\ops.py", line 1650, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Saw a non-null label (index >= num_classes - 1) following a null label, batch: 0 num_classes: 1415 labels:
         [[Node: ctc/CTCLoss = CTCLoss[ctc_merge_repeated=true, ignore_longer_outputs_than_inputs=false, preprocess_collapse_repeated=false, _device="/job:localhost/replica:0/task:0/device:CPU:0"](ctc/Log/_1203, ctc/ToInt64/_1205, ctc/ToInt32_2/_1207, ctc/ToInt32_1/_1209)]]
         [[Node: training/Adadelta/gradients/lstm_1/while/Softmax_grad/mul_1/_959 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_5057_training/Adadelta/gradients/lstm_1/while/Softmax_grad/mul_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](^_clooptraining/Adadelta/gradients/NextIteration_7/_252)]]

Is there a problem with my input-data format? My data looks like this:

  1. dict.txt (from the Tsinghua dataset)
$ head dict.txt
SIL sil
<SPOKEN_NOISE> sil
啊 aa a1
啊 aa a2
啊 aa a4
啊 aa a5
啊啊啊 aa a2 aa a2 aa a2
啊啊啊 aa a5 aa a5 aa a5
阿 aa a1
阿 ee e1
  2. train.wav.lst
$ head train.wav.lst
A11_000 A11_0.wav
A11_001 A11_1.wav
A11_010 A11_10.wav
A11_100 A11_100.wav
A11_102 A11_102.wav
A11_103 A11_103.wav
A11_104 A11_104.wav
A11_105 A11_105.wav
A11_106 A11_106.wav
A11_107 A11_107.wav
  3. train.syllable.txt
$ head train.syllable.txt
A11_000 绿 是 阳春 烟 景 大块 文章 的 底色 四月 的 林 峦 更是 绿 得 鲜活 秀媚 诗意 盎然
A11_001 他 仅 凭 腰部 的 力量 在 泳道 上下 翻腾 蛹 动 蛇行 状 如 海豚 一直 以 一头 的 优势 领先
A11_010 炮眼 打好 了 炸药 怎么 装 岳 正 才 咬 了 咬牙 倏 地 脱去 衣服 光膀子 冲进 了 水 窜 洞
A11_100 可 谁知 纹 完 后 她 一 照镜子 只见 左下 眼睑 的 线 又 粗 又 黑 与 右侧 明显 不对称
A11_102 一进门 我 被 惊呆 了 这 户 名叫 庞 吉 的 老农 是 抗美援朝 负伤 回乡 的 老兵 妻子 长年 有病 家徒四壁 一贫如洗
A11_103 走出 村子 老远 老远 我 还 回头 张望 那个 安宁 恬静 的 小院 那个 使 我 终身 难忘的 小院
A11_104 二月 四日 住进 新 西门外 罗家 碾 王家 冈 朱自清 闻讯 特地 从 东门外 赶来 庆贺
A11_105 单位 不是我 老爹 开 的 凭什么 要 一 次 二 次 照顾 我 我 不能 把 自己 的 包袱 往 学校 甩
A11_106 都 用 草帽 或 胳膊肘 护 着 碗 趔 趔趄 趄 穿过 烂 泥塘 般 的 院坝 跑回 自己 的 宿 舍去 了
A11_107 香港 演艺圈 欢迎 毛阿敏 加盟 无线 台 与 华星 一些 重大 的 演唱 活动 都 邀请 她 出场 有几次 还 特意 安排 压轴 演出

Thanks for taking a look at my question. Also, may I contact you on QQ?

File-not-found problem

Hello. Last time I reported constant file-not-found errors. I found that the GetData function in readdata22-2.py has no branch for reading files when self.type == dev, which makes the test code at line 210 of speechmodel.py fail to find files. After commenting out line 210 it runs on Windows. But on Linux, when I copy the program to the server and run it on the GPU, I again get a path error, FileNotFoundError: [Errno 2] No such file or directory: 'dataset/wav/train/A11/A11_183.WAV', even though that audio file does exist under the path. Is there anything that needs to be changed between Windows and Linux?

Some questions and suggestions

Hi,
Thank you very much for sharing. I studied your code carefully and have some suggestions and questions I'd like to discuss:

Regarding the model, some suggestions:

  1. BN layers can be inserted at suitable points to speed up convergence, and several dropout layers can then be removed.
  2. For example, in the m251 model, adding a BN layer after each convolution and removing the dropout from the first few conv layers let me, with some hyper-parameter tuning, bring the test-set WER down to 17%.
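For reference, suggestion 1 in Keras terms: BatchNormalization is usually inserted between the convolution and its activation. A hedged sketch of one such block (layer arrangement and sizes are illustrative, not the actual m251 definition):

```python
from keras.layers import Conv2D, BatchNormalization, Activation

def conv_bn_block(x, filters):
    """Conv -> BN -> ReLU, replacing Conv(+ReLU) -> Dropout in early layers."""
    x = Conv2D(filters, (3, 3), padding='same', use_bias=False,
               kernel_initializer='he_normal')(x)
    x = BatchNormalization()(x)  # normalizes activations, speeding convergence
    return Activation('relu')(x)
```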

Regarding the language model, may I ask which tool you used to generate the three files in the model_language folder?

Thanks!

Model training

Hello. I'm using the SpeechModel22 model. Should I uncomment line 385 (remove the #), train the model first, and then test it?
Then, while training the model, I get:
496/500 [============================>.] - ETA: 24s - loss: 155.2476
497/500 [============================>.] - ETA: 18s - loss: 155.0813
498/500 [============================>.] - ETA: 12s - loss: 155.0358
499/500 [============================>.] - ETA: 6s - loss: 154.9670
500/500 [==============================] - 3026s 6s/step - loss: 154.7819
*[Test result] Speech recognition train-set character error rate: 100.0 %
Traceback (most recent call last):
File "D:/ASRT/ASRT_SpeechRecognition-master/SpeechModel22.py", line 385, in
ms.TrainModel(datapath, epoch = 50, batch_size =4, save_step = 500)
File "D:/ASRT/ASRT_SpeechRecognition-master/SpeechModel22.py", line 172, in TrainModel
self.TestModel(self.datapath, str_dataset='dev', data_count = 4)
File "D:/ASRT/ASRT_SpeechRecognition-master/SpeechModel22.py", line 213, in TestModel
data_input, data_labels = data.GetData((ran_num + i) % num_data) # take a run of samples starting from the random index
File "D:\ASRT\ASRT_SpeechRecognition-master\readdata22_2.py", line 141, in GetData
wavsignal,fs=read_wav_data(self.datapath + filename)
File "D:\ASRT\ASRT_SpeechRecognition-master\general_function\file_wav.py", line 21, in read_wav_data
wav = wave.open(filename,"rb") # open a wav-format audio file stream
File "D:\anaconda\anaconda3\lib\wave.py", line 499, in open
return Wave_read(f)
File "D:\anaconda\anaconda3\lib\wave.py", line 159, in __init__
f = builtins.open(f, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: 'D:\ASRT\ASRT_SpeechRecognition-master\dataset\wav\train\A34\A34_190.wav'

Process finished with exit code 1
But the dataset simply does not contain A34_190.wav. Have you ever run into this problem?

OOM error occurred after having 100k+ train steps

My device info:
NVIDIA Corporation GP100GL [Tesla P100 PCIe 16GB] x 4
Model params:
AUDIO_FEATURE_LENGTH = 200
batch_size = 112


With batch_size set above 112 it OOMs shortly after starting; at 112 it runs to around 100k steps before the OOM.
Two questions:
1. After it dies around step 100k and I load the last model to continue training, does it go through the data from the beginning again? In other words, will the earlier data be trained on repeatedly while the later data never gets a chance to be trained?
2. To solve the OOM, should I lower the two parameters AUDIO_FEATURE_LENGTH and batch_size?

Asking for help

Hello. I downloaded your latest release, ASRT_v0.4. When I run recognition tests with the pre-trained model inside it, the recognition rate is quite high, but some keywords are still problematic: for fairly distinctive words such as 政企 or 集团 the pinyin is not very accurate, so I want to train my own model.
I'm now training a new model with your train_mspeech.py, using your ST-CMDS-20170001_1-OS and st-cmds corpora plus some corpora of my own. The model after 12000 training steps is inaccurate, and so is the one after 60000 steps. I haven't modified the code except for changing batch_size to 4 (my GPU apparently can't handle 16, the machine shuts down as soon as it runs, and CPU mode is too slow). May I ask: was your released model trained with the code in the release? If so, where is my problem likely to be? Thanks.

The latest release code seems to have a problem

In the latest ASRT version 0.4 release, line 15 of test_mspeech.py
differs from the source in the repo and raises an error at runtime:
from SpeechModel25 import ModelSpeech
Shouldn't it be
from SpeechModel251 import ModelSpeech

Bug: could the author help? The error is TypeError: __init__() missing 1 required positional argument: 'nb_col'

C:\Users\Administrator\Desktop\ASRT_v0.1>python3 test_mspeech.py
Using TensorFlow backend.
2018-06-03 20:38:53.581395: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2018-06-03 20:38:53.617397: E T:\src\github\tensorflow\tensorflow\stream_executor\cuda\cuda_driver.cc:406] failed call to cuInit: CUDA_ERROR_UNKNOWN
2018-06-03 20:38:53.637398: I T:\src\github\tensorflow\tensorflow\stream_executor\cuda\cuda_diagnostics.cc:158] retrieving CUDA diagnostic information for host: USER-20171121WK
2018-06-03 20:38:53.638398: I T:\src\github\tensorflow\tensorflow\stream_executor\cuda\cuda_diagnostics.cc:165] hostname: USER-20171121WK
Traceback (most recent call last):
File "test_mspeech.py", line 45, in
ms = ModelSpeech(datapath)
File "C:\Users\Administrator\Desktop\ASRT_v0.1\SpeechModel22.py", line 48, in __init__
self._model, self.base_model = self.CreateModel()
File "C:\Users\Administrator\Desktop\ASRT_v0.1\SpeechModel22.py", line 84, in CreateModel
layer_h1 = Conv2D(32, (3,3), use_bias=True, activation='relu', padding='same', kernel_initializer='he_normal')(input_data) # convolution layer
TypeError: __init__() missing 1 required positional argument: 'nb_col'

Error when testing a single audio file after training the model

Traceback (most recent call last):
File "/home/pycharm-2017.3.3/helpers/pydev/pydevd.py", line 1668, in
main()
File "/home/pycharm-2017.3.3/helpers/pydev/pydevd.py", line 1662, in main
globals = debugger.run(setup['file'], None, None, is_module)
File "/home/pycharm-2017.3.3/helpers/pydev/pydevd.py", line 1072, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "/home/pycharm-2017.3.3/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/home/ASRT_SpeechRecognition-master/SpeechModel23.py", line 371, in
r = ms.RecognizeSpeech_FromFile('/home/kaldi-master/egs/thchs30/thchs30-openslr/data_thchs30/train/C4_568.wav')
File "/home/ASRT_SpeechRecognition-master/SpeechModel23.py", line 322, in RecognizeSpeech_FromFile
r = self.RecognizeSpeech(wavsignal, fs)
File "/home/ASRT_SpeechRecognition-master/SpeechModel23.py", line 303, in RecognizeSpeech
r1 = self.Predict(data_input, input_length)
File "/home/ASRT_SpeechRecognition-master/SpeechModel23.py", line 245, in Predict
x_in[i,0:len(data_input)] = data_input
ValueError: could not broadcast input array from shape (785,400,1) into shape (785,39,1)

Low accuracy

When I record and test my own audio, the accuracy feels rather low. How can I improve it? Thanks

Questions from code review

About the GetData method in readdata24.py:

  1. What is the bili variable for, and why is it equal to 11?
  2. What does the following statement mean? Could you explain it?

     else:
         n = n_start // bili * (bili - 1)
         yushu = n_start % bili
         length = len(self.list_wavnum_stcmds)
         filename = self.dic_wavlist_stcmds[self.list_wavnum_stcmds[(n + yushu - 1) % length]]
         list_symbol = self.dic_symbollist_stcmds[self.list_symbolnum_stcmds[(n + yushu - 1) % length]]
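A hedged reading of the quoted code (this is my interpretation, not a confirmed answer: the project mixes the thchs30 and st-cmds corpora): with bili = 11, the global index n is split so that one index out of every 11 maps to one corpus and the remaining 10 map into st-cmds, reproducing the arithmetic of the else branch:

```python
def source_for_index(n, bili=11, length=10 ** 9):
    """Interpretation of readdata24.GetData: every bili-th global index maps
    to corpus A; the rest map into st-cmds (corpus B) via
    n // bili * (bili - 1) + n % bili - 1, i.e. the other bili - 1 indices
    of each group of bili are packed consecutively into B."""
    if n % bili == 0:
        return ('A', n // bili)
    n_base = n // bili * (bili - 1)   # `n` in the quoted code
    yushu = n % bili                  # remainder, as in the quoted code
    return ('st-cmds', (n_base + yushu - 1) % length)
```

Under this reading, bili = 11 sets the mixing ratio (roughly 1:10) between the two corpora, which would match their relative sizes.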
