Comments (4)
@lvZic, I apologize for writing in English; I can read Chinese but not write it.
There appear to be two misunderstandings here:
- allosaurus supports Chinese (including Standard Mandarin, 普通话) in the same way it supports every language: by recognizing acoustic speech signals as sequences of IPA (International Phonetic Alphabet) phones. Strictly speaking, it does not support phonemes, and it does not directly support orthographies such as Pinyin (拼音), whether or not they are phonemically adequate.
- You are confusing phonemes with syllabic constituents (initial/onset, final/rhyme, and tone). By definition, a phoneme is a minimal contrastive unit of sound, and a final such as (iong) is anything but minimal: it consists of three segments, [j], [o], and [ŋ].
If you want to recognize Mandarin speech as Pinyin, you have at least three options:
- Use a pronouncing dictionary of Chinese to transliterate a speech corpus into Pinyin, then train a standard ASR model on that corpus.
- Train a model to transduce IPA to Pinyin and use it in a pipeline with allosaurus: speech signal --allosaurus--> IPA --transducer--> Pinyin
- Use an off-the-shelf Chinese ASR model and convert its output (Chinese characters, 汉字) to Pinyin using a pronouncing dictionary: speech signal --Chinese ASR--> 汉字 --transducer--> Pinyin. This is the easiest option.
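To make the second option a little more concrete, here is a minimal sketch of the IPA-to-Pinyin step, assuming the phones arrive as a space-separated string. The mapping table is a tiny hand-picked subset chosen for illustration, and the function name `ipa_to_pinyin` is hypothetical; a real transducer would need the full inventory, tone handling, and probably a trained model rather than a lookup table.

```python
# Toy IPA-to-Pinyin transducer: greedy longest-match over a phone sequence.
# ASSUMPTION: this table is a small illustrative subset, not a full inventory.
IPA_TO_PINYIN = {
    # initials (one phone each)
    ("n",): "n",
    ("x",): "h",
    ("ɕ",): "x",
    ("p",): "b",
    ("pʰ",): "p",
    # finals (one or more phones each)
    ("i",): "i",
    ("a", "u"): "ao",
    ("j", "o", "ŋ"): "iong",  # one Pinyin final, three segments
}

# Longest key in the table, so we know how far ahead to look.
MAX_LEN = max(len(key) for key in IPA_TO_PINYIN)


def ipa_to_pinyin(phones):
    """Map a list of IPA phones to a toneless Pinyin string.

    Unknown phones are rendered as "?" instead of raising an error.
    """
    out = []
    i = 0
    while i < len(phones):
        # Try the longest possible chunk first, then shorter ones.
        for n in range(min(MAX_LEN, len(phones) - i), 0, -1):
            chunk = tuple(phones[i:i + n])
            if chunk in IPA_TO_PINYIN:
                out.append(IPA_TO_PINYIN[chunk])
                i += n
                break
        else:
            out.append("?")
            i += 1
    return "".join(out)


# Hand-written phone strings standing in for recognizer output.
print(ipa_to_pinyin("n i x a u".split()))  # nihao
print(ipa_to_pinyin("ɕ j o ŋ".split()))    # xiong
```

The entry for (iong) also illustrates the point about syllabic constituents: a single Pinyin final corresponds to a sequence of three phones, so it cannot be a minimal unit.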
from allosaurus.
Hello, what exactly do you mean by "support"?
from allosaurus.
I mean that Mandarin Chinese (Putonghua) is not among the languages supported for phonemes, as follows:
Initials (consonants) - 21 phonemes
(b) (c) (d) (f) (g) (h) (j) (k) (l) (m) (n) (p) (q) (r) (s) (t) (x) (z) (zh) (ch) (sh)
Finals (vowels and vowel-nasal pairs) - 35 phonemes
(a) (e) (i) (o) (u) (ü) (iu) (ui) (un) (ün) (ia) (ie) (ua) (uo) (ai) (ei) (in) (ou) (an) (ao) (en) (ang) (ong) (eng) (ing) (ian) (iao) (uan) (uai) (iou) (üan) (iang) (iong) (uang) (ueng)
from allosaurus.
Thanks for your reply. I will have a look.
I also wonder whether allosaurus is accurate enough, since I want to use it to generate a phoneme dataset for training lip animation. I noticed small differences between the phonemes produced by the eng_to_ipa method and those produced by allosaurus.
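One way to put a number on such differences is a phone error rate: the edit distance between two phone sequences divided by the length of the reference. The sketch below assumes both tools' outputs have already been split into lists of phones. The two sequences are made up for illustration and are not actual output from either tool, and the function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phone sequences (lists of strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phone_error_rate(ref, hyp):
    """Edit distance normalized by the reference length."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# ASSUMPTION: invented sequences for one word, for illustration only.
dictionary_phones = "h ɛ l oʊ".split()   # dictionary-style transcription
recognized_phones = "h ə l o w".split()  # recognizer-style transcription

print(phone_error_rate(dictionary_phones, recognized_phones))  # 0.75
```

A dictionary lookup gives a canonical pronunciation, whereas a recognizer transcribes what was actually spoken, so some disagreement is expected even when both are working correctly.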
from allosaurus.