Comments (8)
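No, it outputs the text directly (Chinese characters), or English words.
Why do you need pinyin output?
Alternatively, you could convert the Chinese transcripts to pinyin and train the model on that. It should work, but there is no matching language model for it.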
from masr.
Thanks!
In that case, should the language model be turned off, or does nothing need to change?
Just use greedy decoding.
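Greedy (best-path) CTC decoding needs no language model. It takes the highest-scoring token at each frame, merges consecutive repeats, then drops blanks. The pure-Python sketch below illustrates the idea. The function name `ctc_greedy_decode`, the blank index 0 and the toy vocabulary are assumptions made for illustration, not MASR's actual API.

```python
from typing import List, Sequence


def ctc_greedy_decode(
    probs: Sequence[Sequence[float]],
    vocab: List[str],
    blank: int = 0,
    sep: str = "",
) -> str:
    """Best-path CTC decoding over a T x V score matrix (one row per frame).

    Assumption: the blank token sits at index `blank` (0 here); check your
    model's vocabulary for the real blank position.
    """
    # 1. Pick the highest-scoring token index at each frame.
    best = [max(range(len(frame)), key=lambda i: frame[i]) for frame in probs]
    tokens = []
    prev = None
    for idx in best:
        # 2. Merge consecutive repeats, and 3. drop the blank token.
        if idx != prev and idx != blank:
            tokens.append(vocab[idx])
        prev = idx
    # Use sep="" for Chinese characters, sep=" " for pinyin syllables.
    return sep.join(tokens)


if __name__ == "__main__":
    vocab = ["<blank>", "你", "好"]  # hypothetical vocabulary, blank at index 0
    probs = [
        [0.1, 0.8, 0.1],    # 你
        [0.2, 0.7, 0.1],    # 你 again (adjacent repeat, merged)
        [0.9, 0.05, 0.05],  # blank
        [0.1, 0.1, 0.8],    # 好
    ]
    print(ctc_greedy_decode(probs, vocab))  # 你好
```

Because only adjacent repeats are merged, a genuinely doubled token must be separated by a blank frame in the model output.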
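I forked the code and added a pinyin mode, which converts all the Chinese characters in the manifest.* files and vocabulary.txt under the dataset directory to pinyin.
But after training this way, the output is always full of <unk><unk>, and I'm not sure what I missed.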
Solved. Thanks for the support!
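The cause is that Chinese characters and pinyin are split into tokens differently.
The earlier problem was most likely that character splitting treats every "2 bytes" as one Chinese character. With pinyin, the pieces produced this way no longer match the pinyin entries in the vocabulary.
The fix is fairly simple: I added a mode that splits on spaces and raised the maximum length allowed for each token. (Come to think of it, that turns it into English-style word tokenization.)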
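A minimal sketch of why per-character splitting yields `<unk>` for pinyin labels while space splitting does not. It does not reproduce MASR's tokenizer or the fork's code: the vocabulary, the tone-numbered pinyin format and the function names (`split_per_char`, `split_on_spaces`, `encode`) are hypothetical, and per-character splitting stands in for the "2 bytes per character" rule.

```python
from typing import List

# Hypothetical pinyin vocabulary (syllables with tone digits).
VOCAB = ["<blank>", "<unk>", "ni3", "hao3", "zhong1", "guo2"]


def split_per_char(text: str) -> List[str]:
    """Character-level split: fine for Chinese characters, wrong for pinyin."""
    return [ch for ch in text if not ch.isspace()]


def split_on_spaces(text: str) -> List[str]:
    """Space-delimited split: one token per pinyin syllable, like English words."""
    return text.split()


def encode(tokens: List[str], vocab: List[str], unk: str = "<unk>") -> List[int]:
    """Map tokens to vocabulary indices, falling back to the <unk> index."""
    index = {tok: i for i, tok in enumerate(vocab)}
    return [index.get(tok, index[unk]) for tok in tokens]


if __name__ == "__main__":
    label = "ni3 hao3 zhong1 guo2"
    print(encode(split_per_char(label), VOCAB))   # every id is 1 (<unk>)
    print(encode(split_on_spaces(label), VOCAB))  # [2, 3, 4, 5]
```

With per-character splitting, single letters and digits such as `n` or `3` never appear in a syllable-level vocabulary, so every training label collapses to `<unk>` and the model learns to emit it.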
If it's useful, I'll open a PR.
I'd also really like to see English recognition supported.
English recognition has always been supported. Are you looking at the latest code?