- ASRT Speech Recognition (Core Service)
- ASRT SDK for Python
- ASRT SDK for Windows Client
- ASRT SDK for Golang
- ASRT SDK for Java
- covid19-citymap-china (PR: #3)
- nextcloud-social-login (Issue #409)
A Deep-Learning-Based Chinese Speech Recognition System
Home Page: https://asrt.ailemon.net
License: GNU General Public License v3.0
Hi, why does running test.py directly on my own data always produce empty results? The predicted list is always [].
Could someone tell me what causes the following training error? Thanks!
python3 SpeechModel22.py
[*Tip] Model created and compiled successfully
Traceback (most recent call last):
File "SpeechModel22.py", line 384, in <module>
ms.LoadModel(modelpath + 'm22_2\speech_model22_e_0_step_257000.model')
File "SpeechModel22.py", line 178, in LoadModel
self._model.load_weights(filename)
File "/usr/local/lib/python3.5/dist-packages/keras/engine/topology.py", line 2658, in load_weights
with h5py.File(filepath, mode='r') as f:
File "/usr/local/lib/python3.5/dist-packages/h5py/_hl/files.py", line 269, in __init__
fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
File "/usr/local/lib/python3.5/dist-packages/h5py/_hl/files.py", line 99, in make_fid
fid = h5f.open(name, flags, fapl=fapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5f.pyx", line 78, in h5py.h5f.open
OSError: Unable to open file (unable to open file: name = 'model_speech/m22_2\speech_model22_e_0_step_257000.model', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)
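The failing path mixes separators: on Linux the backslash in 'm22_2\speech_model22_e_0_step_257000.model' is treated as an ordinary filename character, so the file is never found. A minimal sketch (variable names assumed from SpeechModel22.py) of building the path portably:

```python
import os

# Hypothetical setup mirroring SpeechModel22.py; the real variable
# names may differ.
modelpath = 'model_speech'

# os.path.join picks the right separator for the current OS, so the
# same line works on both Windows and Linux.
model_file = os.path.join(modelpath, 'm22_2',
                          'speech_model22_e_0_step_257000.model')

# Check existence before load_weights to fail with a clear message.
if not os.path.isfile(model_file):
    print('model file not found:', model_file)
```

The same `os.path.join` call should then be passed to `ms.LoadModel(...)` instead of concatenating strings with `\`.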
I've seen that fbank features are usually 40-dimensional. Thanks.
While experimenting I retrained the network and found that the log file contains no predict output: the error rate shows 100% every time, yet the terminal shows the loss decreasing at every iteration. My lr is 0.1 and I trained on 24,000 samples; the loss plateaus around 80-81 and stops decreasing.
Here is my log file. What could be the cause?
125
True: [ 631 1155  840  376 1135 1232 1035  291  108  766  686 1120  405  921  535]
Pred: []
126
True: [ 631 1155  884 1062  786  816 1074  308 1133 1098 1130  899  751  165 1227]
Pred: []
127
True: [ 631 1155  882  905  191  208 1001 1108 1057  180  670  316 1029 1133  908]
Pred: []
*[Test result] speech recognition character error rate on the train set: 100.0 %
When I use the model from your 0.3 release, the log file does contain pred output; but if I continue training from that 0.3 model, the new log again shows no pred.
Any help is appreciated!! Thanks a lot!
Hi, I wonder whether this project can handle the following task. I have several hundred thousand wav files, each a few seconds long, falling into two classes: clips where the user answers affirmatively and clips where the user refuses. The files have no transcripts.
I want to train a model on these files, not to get accurate transcriptions, but just to judge the intent of what was said. How should I go about implementing this?
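ASRT itself targets transcription, but the task above is binary intent classification, which needs no transcripts at all. A toy sketch (synthetic features standing in for per-clip spectral averages; none of this is ASRT code) of the kind of classifier that would suffice:

```python
import numpy as np

# Toy sketch: treat each clip as one fixed-length feature vector
# (e.g. its mean log-spectrogram) and fit a binary logistic classifier.
rng = np.random.default_rng(0)

# Stand-in for real features: "accept" and "refuse" clips drawn from
# two slightly shifted distributions, 40 features per clip.
X = np.vstack([rng.normal(0.0, 1.0, (100, 40)),
               rng.normal(0.8, 1.0, (100, 40))])
y = np.concatenate([np.zeros(100), np.ones(100)])

w = np.zeros(40)
b = 0.0
for _ in range(500):                          # plain gradient descent
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))    # sigmoid probabilities
    grad = p - y
    w -= 0.1 * (X.T @ grad) / len(y)
    b -= 0.1 * grad.mean()

acc = (((X @ w + b) > 0) == (y == 1)).mean()
print('train accuracy:', acc)
```

In practice the feature extraction (MFCC/fbank averaging) matters more than the classifier; any off-the-shelf classifier would do in place of the hand-rolled loop.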
Hello. Following your data-reading approach, I converted the model to a plain TensorFlow version and hit the following problem:
InvalidArgumentError (see above for traceback): Not enough time for target transition sequence (required: 64, available: 39). You can turn this error into a warning by using the flag ignore_longer_outputs_than_inputs
[[Node: loss/CTCLoss = CTCLoss[ctc_merge_repeated=true, ignore_longer_outputs_than_inputs=false, preprocess_collapse_repeated=false, _device="/job:localhost/replica:0/task:0/cpu:0"](output_logits/_31, _arg_input_label/indices_0_0, _arg_input_label/values_0_2, _arg_seq_len_0_4)]]
Setting ignore_longer_outputs_than_inputs=True in CTCLoss lets the code run, but the training loss becomes inf.
I've been stuck on this for a long time. How can I fix it?
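The inf loss is the usual symptom of samples whose label sequence is longer than the model's (downsampled) output sequence: with `ignore_longer_outputs_than_inputs=True` those samples contribute an infinite loss instead of raising. A common workaround is to filter them out before training; a hedged sketch (the downsample factor 8 is an assumption — check your own network's total time reduction):

```python
def ctc_sample_ok(num_frames, num_labels, downsample=8):
    """Return True if CTC can align this sample.

    downsample is the acoustic model's total time-reduction factor
    (e.g. 8 for three 2x poolings); a hypothetical value here.
    CTC needs at least one output step per label, more if adjacent
    labels repeat, so keep a margin in real use.
    """
    return num_frames // downsample >= num_labels

# Drop unalignable samples instead of silencing the error:
samples = [(640, 64), (640, 39), (100, 20)]   # (frames, labels), made up
kept = [s for s in samples if ctc_sample_ok(*s)]
print(kept)
```

Filtering keeps the loss finite without hiding genuinely bad transcripts behind a warning flag.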
Traceback (most recent call last):
File "E:\百度云同步盘\pythonProjects\ASRT_v0.3\test.py", line 40, in <module>
r = ms.RecognizeSpeech_FromFile('E:\10086.wav')
File "E:\百度云同步盘\pythonProjects\ASRT_v0.3\SpeechModel25.py", line 357, in RecognizeSpeech_FromFile
r = self.RecognizeSpeech(wavsignal, fs)
File "E:\百度云同步盘\pythonProjects\ASRT_v0.3\SpeechModel25.py", line 326, in RecognizeSpeech
data_input = GetFrequencyFeature3(wavsignal, fs)
File "E:\百度云同步盘\pythonProjects\ASRT_v0.3\general_function\file_wav.py", line 128, in GetFrequencyFeature3
data_line = data_line * w # apply window
ValueError: operands could not be broadcast together with shapes (320,) (400,)
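The (320,) vs (400,) mismatch means the final frame of the wav is shorter than the 400-sample window. One possible fix (a sketch, not the project's official patch) is to zero-pad short trailing frames before windowing:

```python
import numpy as np

window_size = 400                  # 25 ms at 16 kHz, as in GetFrequencyFeature3
w = np.hamming(window_size)

def windowed_frame(frame, w):
    """Zero-pad an incomplete trailing frame to the window length, then window it."""
    frame = np.asarray(frame, dtype=np.float64)
    if len(frame) < len(w):
        frame = np.pad(frame, (0, len(w) - len(frame)))
    return frame * w

out = windowed_frame(np.ones(320), w)   # the failing case from the traceback
print(out.shape)                        # → (400,)
```

Alternatively, the framing loop can simply drop the last incomplete frame; either way the shapes match and the broadcast error disappears.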
What is the best CER the current model can reach? I trained a DeepSpeech2 model on AISHELL and the CER stays as high as 16%; I have no good ideas for lowering it. Any suggestions?
The code uses 2-D convolutions, but 1-D convolutions are more common for sequence problems. Would they work better?
I want to know what "dict.txt", train/, test/, dev/ and trans/label refer to.
They do not match the thchs30 data downloaded from the website.
Can you tell me, or share your dataset? Thanks.
For example, what do I need to do to recognize southern Chinese dialects?
A suggestion: use a WFST decoding graph; it gives a higher recognition rate than HMM decoding.
dic_pinyin.txt appears to be pinyin (2-gram) --- frequency.
languagemodel1.txt:
I don't understand the number on the first line (I ran a script; it isn't the total frequency of all tokens).
From the second line on it appears to be Chinese characters (1-gram) --- frequency.
languagemodel2.txt:
Same as above.
Is my understanding correct?
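Assuming the "token --- frequency" layout described above is right, such a file can be loaded into a frequency table like this (a sketch with made-up contents; the meaning of the header number is left open):

```python
from io import StringIO

# Sketch of reading a "token<TAB>count" frequency file like the
# languagemodel1.txt / languagemodel2.txt described above. The header
# line's semantics are unknown, so it is kept separately.
sample = StringIO("123456\n的\t5000\n了\t3000\n")   # made-up contents

header = sample.readline().strip()
freq = {}
for line in sample:
    token, count = line.rstrip('\n').split('\t')
    freq[token] = int(count)
print(header, freq['的'])
```

With the real files, replace `sample` with `open('languagemodel1.txt', encoding='utf-8')` (encoding is an assumption).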
I'd like to ask the author: why does the final model use only CNN + CTC, with the LSTM commented out?
Is it because the recognition results were unsatisfactory?
As titled: running on GPU, the problem starts at around 4000+ steps, with the error rate jumping between 100% and 0% and eventually staying at 0%. What is going on?
Traceback (most recent call last):
File "asrserver.py", line 16, in <module>
ms = ModelSpeech(datapath)
File "C:\ASRT_SpeechRecognition-master\SpeechModel24.py", line 39, in __init__
self._model, self.base_model = self.CreateModel()
File "C:\ASRT_SpeechRecognition-master\SpeechModel24.py", line 126, in CreateModel
test_func = K.function([input_data], [y_pred])
File "C:\Users\bub20\AppData\Local\Programs\Python\Python36\lib\site-packages\keras-2.1.6-py3.6.egg\keras\backend\theano_backend.py", line 1248, in function
File "C:\Users\bub20\AppData\Local\Programs\Python\Python36\lib\site-packages\keras-2.1.6-py3.6.egg\keras\backend\theano_backend.py", line 1234, in __init__
File "C:\Users\bub20\AppData\Local\Programs\Python\Python36\lib\site-packages\theano\compile\function.py", line 317, in function
output_keys=output_keys)
File "C:\Users\bub20\AppData\Local\Programs\Python\Python36\lib\site-packages\theano\compile\pfunc.py", line 486, in pfunc
output_keys=output_keys)
File "C:\Users\bub20\AppData\Local\Programs\Python\Python36\lib\site-packages\theano\compile\function_module.py", line 1839, in orig_function
name=name)
File "C:\Users\bub20\AppData\Local\Programs\Python\Python36\lib\site-packages\theano\compile\function_module.py", line 1487, in __init__
accept_inplace)
File "C:\Users\bub20\AppData\Local\Programs\Python\Python36\lib\site-packages\theano\compile\function_module.py", line 181, in std_fgraph
update_mapping=update_mapping)
File "C:\Users\bub20\AppData\Local\Programs\Python\Python36\lib\site-packages\theano\gof\fg.py", line 175, in __init__
self.__import_r__(output, reason="init")
File "C:\Users\bub20\AppData\Local\Programs\Python\Python36\lib\site-packages\theano\gof\fg.py", line 346, in __import_r__
self.__import__(variable.owner, reason=reason)
File "C:\Users\bub20\AppData\Local\Programs\Python\Python36\lib\site-packages\theano\gof\fg.py", line 391, in __import__
raise MissingInputError(error_msg, variable=r)
theano.gof.fg.MissingInputError: Input 0 of the graph (indices start from 0), used to compute if{}(keras_learning_phase, Elemwise{true_div,no_inplace}.0, Elemwise{mul,no_inplace}.0), was not provided and not given a value. Use the Theano flag exception_verbosity='high', for more information on this error.
Backtrace when that variable is created:
File "<frozen importlib._bootstrap>", line 656, in _load_unlocked
File "<frozen importlib._bootstrap>", line 626, in _load_backward_compatible
File "C:\Users\bub20\AppData\Local\Programs\Python\Python36\lib\site-packages\keras-2.1.6-py3.6.egg\keras\backend\__init__.py", line 81, in <module>
from .theano_backend import *
File "<frozen importlib._bootstrap>", line 971, in _find_and_load
File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 656, in _load_unlocked
File "<frozen importlib._bootstrap>", line 626, in _load_backward_compatible
File "C:\Users\bub20\AppData\Local\Programs\Python\Python36\lib\site-packages\keras-2.1.6-py3.6.egg\keras\backend\theano_backend.py", line 34, in
_LEARNING_PHASE = T.scalar(dtype='uint8', name='keras_learning_phase') # 0 = test, 1 = train
File "D:\1GH\work\ASRT_v0.4\general_function\file_wav.py", line 128, in GetFrequencyFeature3
data_line = data_line * w # apply window
ValueError: operands could not be broadcast together with shapes (330,) (400,)
Does this problem require code changes, or something else? Has the author run into it?
For multi-GPU training, where should multi_gpu_model(model, 2) be placed in speech_model24?
You must add config = tf.ConfigProto(allow_soft_placement=True) in train_mspeech.py.
The server raises a bug:
Using TensorFlow backend.
[*Tip] Model created and compiled successfully
Test progress: 0 / 128
Test progress: 10 / 128
Test progress: 20 / 128
Test progress: 30 / 128
Test progress: 40 / 128
Test progress: 50 / 128
Test progress: 60 / 128
Test progress: 70 / 128
Test progress: 80 / 128
Test progress: 90 / 128
Test progress: 100 / 128
Test progress: 110 / 128
Test progress: 120 / 128
*[Test result] speech recognition character error rate on the test set: 30.01432664756447 %
Traceback (most recent call last):
File "test_mspeech.py", line 55, in <module>
r = ms.RecognizeSpeech_FromFile('/root/ffdwd1.wav')
File "/data/ASRT_SpeechRecognition-0.3/SpeechModel25.py", line 357, in RecognizeSpeech_FromFile
r = self.RecognizeSpeech(wavsignal, fs)
File "/data/ASRT_SpeechRecognition-0.3/SpeechModel25.py", line 337, in RecognizeSpeech
r1 = self.Predict(data_input, input_length)
File "/data/ASRT_SpeechRecognition-0.3/SpeechModel25.py", line 276, in Predict
x_in[i,0:len(data_input)] = data_input
ValueError: could not broadcast input array from shape (43059,200,1) into shape (1600,200,1)
I store the dataset on drive D but don't know how to inspect the data_thchs30 files. Training reports the following error:
FileNotFoundError: [Errno 2] No such file or directory: 'D:\语音数据集\data_thchs30\train\C13_571.wav'
Could you advise?
What software do you all use to record 16 kHz, mono, 16-bit files? Thanks. Why does audio recorded with the built-in Windows 10 recorder produce this test result:
Using TensorFlow backend.
[*Tip] Model created and compiled successfully
*[Tip] speech recognition result:
[]
Speech-to-text result:
[Finished in 21.6s]
Thanks
Why is only half of the spectrum kept after windowing? The data doesn't look symmetric to me.
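The symmetry is not in the windowed samples but in their FFT: for a real-valued frame, bin k and bin n-k have equal magnitude, so the upper half of the spectrum is redundant. A quick numpy check:

```python
import numpy as np

n = 400                                        # one 25 ms frame at 16 kHz
frame = np.random.default_rng(1).normal(size=n) * np.hamming(n)  # windowed frame
spec = np.abs(np.fft.fft(frame))

# For a real input, bin k and bin n-k are complex conjugates, so their
# magnitudes match; bins above n//2 carry no new information.
sym = np.allclose(spec[1:n // 2], spec[-1:n // 2:-1])
print(sym)                                     # → True
```

This is why feature extraction keeps only the first half of the FFT output (numpy's `rfft` does the same truncation automatically).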
Hello,
1. languagemodel1.txt contains many "?" characters among the Chinese ones; shouldn't there be only one?
2. What corpus did you use to compute the frequencies in languagemodel1.txt and languagemodel2.txt?
3. The pycorrector model can correct recognized Chinese characters to some extent:
import pycorrector
hanzi = pycorrector.correct(hanzi)[0]
4. wav files that are not (mono, 16 kHz) cannot be recognized, but ffmpeg can be called from the code to convert the format:
subprocess.call("ffmpeg -i ** -ac 1 -ar 16000 **.wav", shell=True)
5. Companies like Baidu and iFlytek use kenlm for language-model training. Could this project use kenlm as well?
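On point 4, a slightly more robust variant passes the argv as a list instead of using shell=True, which keeps paths containing spaces or Chinese characters intact (src/dst here are placeholders; ffmpeg must be on PATH):

```python
import subprocess

def to_16k_mono(src, dst):
    """Build an ffmpeg command converting any audio file to 16 kHz mono wav.

    src/dst are placeholder paths. Passing a list (no shell=True) avoids
    shell quoting problems with spaces or non-ASCII characters in paths.
    """
    cmd = ['ffmpeg', '-y', '-i', src, '-ac', '1', '-ar', '16000', dst]
    return cmd          # call subprocess.run(cmd, check=True) to execute

print(to_16k_mono('input.mp3', 'output.wav'))
```

`-ac 1` forces mono, `-ar 16000` resamples to 16 kHz, and `-y` overwrites the destination without prompting.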
multi_gpu.py needs its __init__ fixed to:
class ParallelModel(keras.models.Model):
    """Subclasses the standard Keras Model and adds multi-GPU support.
    It works by creating a copy of the model on each GPU. Then it slices
    the inputs and sends a slice to each copy of the model, and then
    merges the outputs together and applies the loss on the combined
    outputs.
    """

    def __init__(self, keras_model, gpu_count):
        """Class constructor.
        keras_model: The Keras model to parallelize
        gpu_count: Number of GPUs. Must be > 1
        """
        super(ParallelModel, self).__init__()
        self.inner_model = keras_model
        self.gpu_count = gpu_count
        merged_outputs = self.make_parallel()
        super(ParallelModel, self).__init__(inputs=self.inner_model.inputs,
                                            outputs=merged_outputs)
Everyone interested in machine learning, artificial intelligence, and speech recognition is welcome to join my "AI柠檬博客" group for discussion~ (^_^)
Group 2 QQ number: 894112051
Group 1 QQ number: 867888133 (full; please join group 2)
My personal blog is AI柠檬 (https://blog.ailemon.me)
WeChat official account "AI柠檬博客" (WeChat ID: ailemon_me)
Twitter: @ailemon_me
From time to time I share machine learning, AI, and speech recognition tips, news, and learning resources.
I randomly tested two utterances from the thchs30 test set:
Recognition result:
'lv4', 'shi4', 'yang2', 'chun1', 'yan1', 'jing3', 'da4', 'kuai4', 'wen2', 'rang4', 'de5', 'di3', 'se4', 'si4', 'yue4', 'de5', 'ling2', 'wan2', 'ge4', 'shi2', 'lv4', 'de5', 'xian1', 'huo2', 'xing4', 'mei4', 'shi4', 'yi4', 'er4', 'ran2'
让的底色四月的领玩个时虑的鲜活性昧是一二然
['qi3', 'ye4', 'yi1', 'hou4', 'ji4', 'shu4', 'wa1', 'qian2', 'ceng2', 'xiao4', 'ta1', 'fu4', 'ze2', 'quan2', 'chang3', 'cai2', 'pin3', 'zhi4', 'liang4', 'yu3', 'ji4', 'shu4', 'pei2', 'qu4', 'cheng2', 'le5', 'chang2', 'li3', 'de5', 'da4', 'meng2', 'ren2']
Speech-to-text result:
企业一曾笑他负责全场才品质量与技术培去成了厂里的大蒙人
The pinyin recognition rate is decent, probably around 80%, but the Chinese characters are jumbled; it seems an effective LM is missing to correct them.
Caused by op u'ctc/CTCLoss', defined at:
File "train_mspeech.py", line 44, in <module>
ms = ModelSpeech(datapath)
File "/home/yuyin/ASRT_SpeechRecognition-master/SpeechModel24.py", line 39, in __init__
self._model, self.base_model = self.CreateModel()
File "/home/yuyin/ASRT_SpeechRecognition-master/SpeechModel24.py", line 109, in CreateModel
loss_out = Lambda(self.ctc_lambda_func, output_shape=(1,), name='ctc')([y_pred, labels, input_length, label_length])
File "/usr/local/lib/python2.7/dist-packages/keras/engine/base_layer.py", line 457, in call
output = self.call(inputs, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/keras/layers/core.py", line 682, in call
return self.function(inputs, **arguments)
File "/home/yuyin/ASRT_SpeechRecognition-master/SpeechModel24.py", line 136, in ctc_lambda_func
return K.ctc_batch_cost(labels, y_pred, input_length, label_length)
File "/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py", line 4167, in ctc_batch_cost
sequence_length=input_length), 1)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/ctc_ops.py", line 158, in ctc_loss
ignore_longer_outputs_than_inputs=ignore_longer_outputs_than_inputs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_ctc_ops.py", line 283, in ctc_loss
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3290, in create_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1654, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InvalidArgumentError (see above for traceback): Not enough time for target transition sequence (required: 15, available: 13). You can turn this error into a warning by using the flag ignore_longer_outputs_than_inputs
[[Node: ctc/CTCLoss = CTCLoss[ctc_merge_repeated=true, ignore_longer_outputs_than_inputs=false, preprocess_collapse_repeated=false, _device="/job:localhost/replica:0/task:0/device:CPU:0"](ctc/Log/_285, ctc/ToInt64/_287, ctc/ToInt32_2/_289, ctc/ToInt32_1/_291)]]
[[Node: ctc/CTCLoss/_293 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_889_ctc/CTCLoss", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]]
I just moved into speech recognition and don't understand it well. When I test, the pinyin-to-Chinese conversion is incomplete; what are the possible causes?
The problem looks like this:
['su1', 'bei3', 'jun1', 'de5', 'yi1', 'qi3', 'ai4', 'guo2', 'jiang4', 'shi4', 'ma3', 'zhan4', 'shan1', 'yi3', 'du4', 'tang2', 'ju4', 'wu3', 'su1', 'bing3', 'ai4', 'dan4', 'tie3', 'mei2', 'gan3', 'ye3', 'fen4', 'qi3', 'kang4', 'zhan4']
Speech-to-text result:
苏北军的一起爱国将是马占山以铁煤赶也分起抗战
All my searches only turn up that Tencent survey article. Thank you.
The model builds successfully, but training fails right at the start with the following error:
Invalid argument: Saw a non-null label (index >= num_classes - 1) following a null label, batch: 0 num_classes: 1415 labels:
Traceback (most recent call last):
File "G:\asr\asrvenv\lib\site-packages\tensorflow\python\client\session.py", line 1361, in _do_call
return fn(*args)
File "G:\asr\asrvenv\lib\site-packages\tensorflow\python\client\session.py", line 1340, in _run_fn
target_list, status, run_metadata)
File "G:\asr\asrvenv\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 516, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Saw a non-null label (index >= num_classes - 1) following a null label, batch: 0 num_classes: 1415 labels:
[[Node: ctc/CTCLoss = CTCLoss[ctc_merge_repeated=true, ignore_longer_outputs_than_inputs=false, preprocess_collapse_repeated=false, _device="/job:localhost/replica:0/task:0/device:CPU:0"](ctc/Log/_1203, ctc/ToInt64/_1205, ctc/ToInt32_2/_1207, ctc/ToInt32_1/_1209)]]
[[Node: training/Adadelta/gradients/lstm_1/while/Softmax_grad/mul_1/_959 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_5057_training/Adadelta/gradients/lstm_1/while/Softmax_grad/mul_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](^_clooptraining/Adadelta/gradients/NextIteration_7/_252)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File ".\SpeechModel.py", line 345, in <module>
ms.TrainModel(datapath, epoch = 2, batch_size = 8, save_step = 1)
File ".\SpeechModel.py", line 161, in TrainModel
self._model.fit_generator(yielddatas, save_step)
File "G:\asr\asrvenv\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "G:\asr\asrvenv\lib\site-packages\keras\engine\training.py", line 2224, in fit_generator
class_weight=class_weight)
File "G:\asr\asrvenv\lib\site-packages\keras\engine\training.py", line 1883, in train_on_batch
outputs = self.train_function(ins)
File "G:\asr\asrvenv\lib\site-packages\keras\backend\tensorflow_backend.py", line 2478, in __call__
**self.session_kwargs)
File "G:\asr\asrvenv\lib\site-packages\tensorflow\python\client\session.py", line 905, in run
run_metadata_ptr)
File "G:\asr\asrvenv\lib\site-packages\tensorflow\python\client\session.py", line 1137, in _run
feed_dict_tensor, options, run_metadata)
File "G:\asr\asrvenv\lib\site-packages\tensorflow\python\client\session.py", line 1355, in _do_run
options, run_metadata)
File "G:\asr\asrvenv\lib\site-packages\tensorflow\python\client\session.py", line 1374, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Saw a non-null label (index >= num_classes - 1) following a null label, batch: 0 num_classes: 1415 labels:
[[Node: ctc/CTCLoss = CTCLoss[ctc_merge_repeated=true, ignore_longer_outputs_than_inputs=false, preprocess_collapse_repeated=false, _device="/job:localhost/replica:0/task:0/device:CPU:0"](ctc/Log/_1203, ctc/ToInt64/_1205, ctc/ToInt32_2/_1207, ctc/ToInt32_1/_1209)]]
[[Node: training/Adadelta/gradients/lstm_1/while/Softmax_grad/mul_1/_959 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_5057_training/Adadelta/gradients/lstm_1/while/Softmax_grad/mul_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](^_clooptraining/Adadelta/gradients/NextIteration_7/_252)]]
Caused by op 'ctc/CTCLoss', defined at:
File ".\SpeechModel.py", line 342, in <module>
ms = ModelSpeech(datapath)
File ".\SpeechModel.py", line 44, in __init__
self._model = self.CreateModel()
File ".\SpeechModel.py", line 109, in CreateModel
loss_out = Lambda(self.ctc_lambda_func, output_shape=(1,), name='ctc')([y_pred, labels, input_length, label_length])
File "G:\asr\asrvenv\lib\site-packages\keras\engine\topology.py", line 619, in __call__
output = self.call(inputs, **kwargs)
File "G:\asr\asrvenv\lib\site-packages\keras\layers\core.py", line 663, in call
return self.function(inputs, **arguments)
File ".\SpeechModel.py", line 135, in ctc_lambda_func
return K.ctc_batch_cost(labels, y_pred, input_length, label_length)
File "G:\asr\asrvenv\lib\site-packages\keras\backend\tensorflow_backend.py", line 3950, in ctc_batch_cost
sequence_length=input_length), 1)
File "G:\asr\asrvenv\lib\site-packages\tensorflow\python\ops\ctc_ops.py", line 158, in ctc_loss
ignore_longer_outputs_than_inputs=ignore_longer_outputs_than_inputs)
File "G:\asr\asrvenv\lib\site-packages\tensorflow\python\ops\gen_ctc_ops.py", line 231, in _ctc_loss
name=name)
File "G:\asr\asrvenv\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "G:\asr\asrvenv\lib\site-packages\tensorflow\python\framework\ops.py", line 3271, in create_op
op_def=op_def)
File "G:\asr\asrvenv\lib\site-packages\tensorflow\python\framework\ops.py", line 1650, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InvalidArgumentError (see above for traceback): Saw a non-null label (index >= num_classes - 1) following a null label, batch: 0 num_classes: 1415 labels:
[[Node: ctc/CTCLoss = CTCLoss[ctc_merge_repeated=true, ignore_longer_outputs_than_inputs=false, preprocess_collapse_repeated=false, _device="/job:localhost/replica:0/task:0/device:CPU:0"](ctc/Log/_1203, ctc/ToInt64/_1205, ctc/ToInt32_2/_1207, ctc/ToInt32_1/_1209)]]
[[Node: training/Adadelta/gradients/lstm_1/while/Softmax_grad/mul_1/_959 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_5057_training/Adadelta/gradients/lstm_1/while/Softmax_grad/mul_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](^_clooptraining/Adadelta/gradients/NextIteration_7/_252)]]
Could my input data format be the problem? My data looks like this:
$ head dict.txt
SIL sil
<SPOKEN_NOISE> sil
啊 aa a1
啊 aa a2
啊 aa a4
啊 aa a5
啊啊啊 aa a2 aa a2 aa a2
啊啊啊 aa a5 aa a5 aa a5
阿 aa a1
阿 ee e1
$ head train.wav.lst
A11_000 A11_0.wav
A11_001 A11_1.wav
A11_010 A11_10.wav
A11_100 A11_100.wav
A11_102 A11_102.wav
A11_103 A11_103.wav
A11_104 A11_104.wav
A11_105 A11_105.wav
A11_106 A11_106.wav
A11_107 A11_107.wav
$ head train.syllable.txt
A11_000 绿 是 阳春 烟 景 大块 文章 的 底色 四月 的 林 峦 更是 绿 得 鲜活 秀媚 诗意 盎然
A11_001 他 仅 凭 腰部 的 力量 在 泳道 上下 翻腾 蛹 动 蛇行 状 如 海豚 一直 以 一头 的 优势 领先
A11_010 炮眼 打好 了 炸药 怎么 装 岳 正 才 咬 了 咬牙 倏 地 脱去 衣服 光膀子 冲进 了 水 窜 洞
A11_100 可 谁知 纹 完 后 她 一 照镜子 只见 左下 眼睑 的 线 又 粗 又 黑 与 右侧 明显 不对称
A11_102 一进门 我 被 惊呆 了 这 户 名叫 庞 吉 的 老农 是 抗美援朝 负伤 回乡 的 老兵 妻子 长年 有病 家徒四壁 一贫如洗
A11_103 走出 村子 老远 老远 我 还 回头 张望 那个 安宁 恬静 的 小院 那个 使 我 终身 难忘的 小院
A11_104 二月 四日 住进 新 西门外 罗家 碾 王家 冈 朱自清 闻讯 特地 从 东门外 赶来 庆贺
A11_105 单位 不是我 老爹 开 的 凭什么 要 一 次 二 次 照顾 我 我 不能 把 自己 的 包袱 往 学校 甩
A11_106 都 用 草帽 或 胳膊肘 护 着 碗 趔 趔趄 趄 穿过 烂 泥塘 般 的 院坝 跑回 自己 的 宿 舍去 了
A11_107 香港 演艺圈 欢迎 毛阿敏 加盟 无线 台 与 华星 一些 重大 的 演唱 活动 都 邀请 她 出场 有几次 还 特意 安排 压轴 演出
Thanks for taking a look at my problem. Also, may I contact you on QQ?
Hello. Last time I kept hitting file-not-found errors. I think the cause is that the GetData function in readdata22-2.py has no branch for reading files when self.type == 'dev', which makes the test code at line 210 of speechmodel.py fail to find files. After commenting out line 210, it runs under Windows. But on Linux, copying the program to the GPU server fails again with a missing path: FileNotFoundError: [Errno 2] No such file or directory: 'dataset/wav/train/A11/A11_183.WAV'
Yet the audio does exist under that path. Is there anything that needs to be changed between Windows and Linux?
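One likely Windows-vs-Linux difference is case sensitivity: Linux treats 'A11_183.wav' and 'A11_183.WAV' as different files, while Windows does not. A small helper (a sketch, not part of ASRT) that resolves the on-disk name case-insensitively:

```python
import os
import tempfile

def find_case_insensitive(directory, name):
    """Return the real filename in `directory` matching `name` ignoring case.

    Useful on Linux, where 'A11_183.wav' from a list file will not open
    a file stored on disk as 'A11_183.WAV'.
    """
    lowered = name.lower()
    for entry in os.listdir(directory):
        if entry.lower() == lowered:
            return entry
    return None

# demo with a temporary directory
d = tempfile.mkdtemp()
open(os.path.join(d, 'A11_183.WAV'), 'w').close()
print(find_case_insensitive(d, 'A11_183.wav'))   # → A11_183.WAV
```

Alternatively, batch-rename the dataset once (e.g. lowercase every extension) so the list files and the disk agree.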
Hi,
Thank you very much for sharing this project. I have studied your code carefully and have some suggestions and questions I'd like to discuss:
On the model side, I have a few suggestions.
On the language model: which tools did you use to generate the three files in the model_language folder?
Thanks!
Hello. I'm using the SpeechModel22 model. Should I remove the # on line 385 so that the model is trained first and then tested?
Then, while training the model, I get:
496/500 [============================>.] - ETA: 24s - loss: 155.2476
497/500 [============================>.] - ETA: 18s - loss: 155.0813
498/500 [============================>.] - ETA: 12s - loss: 155.0358
499/500 [============================>.] - ETA: 6s - loss: 154.9670
500/500 [==============================] - 3026s 6s/step - loss: 154.7819
*[Test result] speech recognition character error rate on the train set: 100.0 %
Traceback (most recent call last):
File "D:/ASRT/ASRT_SpeechRecognition-master/SpeechModel22.py", line 385, in <module>
ms.TrainModel(datapath, epoch = 50, batch_size =4, save_step = 500)
File "D:/ASRT/ASRT_SpeechRecognition-master/SpeechModel22.py", line 172, in TrainModel
self.TestModel(self.datapath, str_dataset='dev', data_count = 4)
File "D:/ASRT/ASRT_SpeechRecognition-master/SpeechModel22.py", line 213, in TestModel
data_input, data_labels = data.GetData((ran_num + i) % num_data) # 从随机数开始连续向后取一定数量数据
File "D:\ASRT\ASRT_SpeechRecognition-master\readdata22_2.py", line 141, in GetData
wavsignal,fs=read_wav_data(self.datapath + filename)
File "D:\ASRT\ASRT_SpeechRecognition-master\general_function\file_wav.py", line 21, in read_wav_data
wav = wave.open(filename,"rb") # 打开一个wav格式的声音文件流
File "D:\anaconda\anaconda3\lib\wave.py", line 499, in open
return Wave_read(f)
File "D:\anaconda\anaconda3\lib\wave.py", line 159, in __init__
f = builtins.open(f, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: 'D:\ASRT\ASRT_SpeechRecognition-master\dataset\wav\train\A34\A34_190.wav'
Process finished with exit code 1
But the dataset simply does not contain A34-190.WAV. Have you encountered this problem?
Requesting a pretrained model; the other models I've tried perform very poorly.
ASRT_SpeechRecognition/SpeechModel25.py
Line 165 in 67ee7e9
Under what circumstances would this raise StopIteration?
See title.
My device info:
NVIDIA Corporation GP100GL [Tesla P100 PCIe 16GB] x 4
Model params:
AUDIO_FEATURE_LENGTH = 200
batch_size = 112
When batch_size is set above 112, OOM occurs soon after starting; at 112 it runs to around step 100,000 before hitting OOM.
Two questions:
1. After it dies around step 100,000 and I load the last model to continue training, does training restart from the same data? That would mean the early part of the data is trained repeatedly while the later part never gets a chance to participate.
2. To fix the OOM, should I reduce AUDIO_FEATURE_LENGTH and batch_size?
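On question 1, if the generator walks the data sequentially, one simple remedy (a sketch under that assumption, not how ASRT necessarily behaves) is to derive the resume position from the saved step count:

```python
def resume_offset(saved_step, batch_size, dataset_size):
    """Index of the first sample to feed after resuming from `saved_step`.

    Assumes the generator walked the dataset sequentially, one batch per
    step. Wrapping with a modulo lets training continue where it stopped
    instead of restarting from sample 0 (which would over-train the head
    of the dataset and never reach the tail).
    """
    return (saved_step * batch_size) % dataset_size

print(resume_offset(100000, 16, 240000))   # → 160000
```

Random shuffling per epoch achieves the same coverage goal without bookkeeping, at the cost of reproducibility unless the RNG is seeded.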
Hello. I downloaded your latest release, ASRT_v0.4, and tested with the pretrained model included in it. The recognition rate is quite high, but some keywords are off: for distinctive words like 政企 or 集团 the pinyin is not very accurate, so I want to train my own model.
I am now training a new model with your train_mspeech.py, using your ST-CMDS-20170001_1-OS (st-cmds) corpus plus some data of my own. Neither the 12,000-step model nor the 60,000-step one is accurate. I did not modify the code except changing batch_size to 4 (my GPU apparently can't handle 16; the machine shuts down as soon as it starts, and CPU mode is too slow). Was your released model trained with the code in the release? If so, where is my problem? Thanks.
In the newest release, ASRT version 0.4, line 15 of test_mspeech.py
differs from the source in the repo and raises an error at runtime:
from SpeechModel25 import ModelSpeech
Should it be changed to
from SpeechModel251 import ModelSpeech
C:\Users\Administrator\Desktop\ASRT_v0.1>python3 test_mspeech.py
Using TensorFlow backend.
2018-06-03 20:38:53.581395: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2018-06-03 20:38:53.617397: E T:\src\github\tensorflow\tensorflow\stream_executor\cuda\cuda_driver.cc:406] failed call to cuInit: CUDA_ERROR_UNKNOWN
2018-06-03 20:38:53.637398: I T:\src\github\tensorflow\tensorflow\stream_executor\cuda\cuda_diagnostics.cc:158] retrieving CUDA diagnostic information for host: USER-20171121WK
2018-06-03 20:38:53.638398: I T:\src\github\tensorflow\tensorflow\stream_executor\cuda\cuda_diagnostics.cc:165] hostname: USER-20171121WK
Traceback (most recent call last):
File "test_mspeech.py", line 45, in <module>
ms = ModelSpeech(datapath)
File "C:\Users\Administrator\Desktop\ASRT_v0.1\SpeechModel22.py", line 48, in __init__
self._model, self.base_model = self.CreateModel()
File "C:\Users\Administrator\Desktop\ASRT_v0.1\SpeechModel22.py", line 84, in CreateModel
layer_h1 = Conv2D(32, (3,3), use_bias=True, activation='relu', padding='same', kernel_initializer='he_normal')(input_data) # convolution layer
TypeError: __init__() missing 1 required positional argument: 'nb_col'
Traceback (most recent call last):
File "/home/pycharm-2017.3.3/helpers/pydev/pydevd.py", line 1668, in <module>
main()
File "/home/pycharm-2017.3.3/helpers/pydev/pydevd.py", line 1662, in main
globals = debugger.run(setup['file'], None, None, is_module)
File "/home/pycharm-2017.3.3/helpers/pydev/pydevd.py", line 1072, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "/home/pycharm-2017.3.3/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/home/ASRT_SpeechRecognition-master/SpeechModel23.py", line 371, in <module>
r = ms.RecognizeSpeech_FromFile('/home/kaldi-master/egs/thchs30/thchs30-openslr/data_thchs30/train/C4_568.wav')
File "/home/ASRT_SpeechRecognition-master/SpeechModel23.py", line 322, in RecognizeSpeech_FromFile
r = self.RecognizeSpeech(wavsignal, fs)
File "/home/ASRT_SpeechRecognition-master/SpeechModel23.py", line 303, in RecognizeSpeech
r1 = self.Predict(data_input, input_length)
File "/home/ASRT_SpeechRecognition-master/SpeechModel23.py", line 245, in Predict
x_in[i,0:len(data_input)] = data_input
ValueError: could not broadcast input array from shape (785,400,1) into shape (785,39,1)
The ST-CMDS train set has 300,000 utterances, but only 100,000 are actually used. Why not use all of them?
Recordings I make myself get fairly low accuracy. How can I improve it? Thanks.
About the GetData method in readdata24.py:
What is the bili variable for, and why is it 11?
What does this statement mean?
`
else:
    n = n_start // bili * (bili - 1)
    yushu = n_start % bili
    length = len(self.list_wavnum_stcmds)
    filename = self.dic_wavlist_stcmds[self.list_wavnum_stcmds[(n + yushu - 1) % length]]
    list_symbol = self.dic_symbollist_stcmds[self.list_symbolnum_stcmds[(n + yushu - 1) % length]]
`
Could you explain it?
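One plausible reading (an assumption, since only the else branch is quoted) is that bili = 11 is a mixing ratio: every 11th global index is routed to one corpus (e.g. thchs30) and the rest go to ST-CMDS, with the quoted arithmetic mapping the remaining indices onto consecutive positions. A toy model of that routing:

```python
# Hypothetical reconstruction of GetData's routing; the 'A' branch is
# guessed, only the 'B' arithmetic below is quoted from readdata24.py.
bili = 11

def route(n_start):
    if n_start % bili == 0:              # assumed branch: dataset A (thchs30)
        return ('A', n_start // bili)
    n = n_start // bili * (bili - 1)     # quoted else-branch arithmetic
    yushu = n_start % bili
    return ('B', n + yushu - 1)          # consecutive ST-CMDS indices

hits = [route(i) for i in range(22)]
print(sum(1 for t, _ in hits if t == 'A'),
      sum(1 for t, _ in hits if t == 'B'))   # → 2 20
```

Under this reading, 1 sample in 11 comes from dataset A and 10 in 11 from dataset B, and the `(n + yushu - 1) % length` wrap keeps B's index in range; the author would need to confirm the intent.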