
Comments (4)

dmort27 commented on June 9, 2024

@lvZic, I apologize for writing in English; I can read, but not write, Chinese.

There appear to be two misunderstandings here:

  1. allosaurus supports Chinese (including 普通话) in the same way it supports every language: by recognizing acoustic speech signals as sequences of IPA (International Phonetic Alphabet) phones. It does not, strictly speaking, support phonemes and it does not directly support orthographies—such as Pinyin (拼音)—whether or not they are phonemically adequate.
  2. You are confusing phonemes with syllabic constituents (initial/onset, final/rhyme, and tone). By definition, a phoneme is a minimal contrastive unit of sound, and a final such as (iong) is anything but minimal: it consists of three segments, [j], [o], and [ŋ].
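To make the second point concrete, here is a minimal sketch in Python. The `FINAL_SEGMENTS` table and both helper functions are hypothetical and for illustration only; the segmentation of (iong) is the one given above, and the other entries are broad transcriptions assumed for the example.

```python
# Hypothetical, illustrative table: a few Pinyin finals and a broad
# IPA segmentation for each. This is not data taken from allosaurus.
FINAL_SEGMENTS = {
    "a": ["a"],
    "ang": ["a", "ŋ"],
    "iong": ["j", "o", "ŋ"],  # segmentation stated in the comment above
}


def segment_count(final: str) -> int:
    """Return how many IPA segments the given Pinyin final contains."""
    return len(FINAL_SEGMENTS[final])


def is_single_segment(final: str) -> bool:
    """True only if the final is one segment, i.e. could be a single phone."""
    return segment_count(final) == 1


for final in FINAL_SEGMENTS:
    print(final, FINAL_SEGMENTS[final], is_single_segment(final))
```

A phone recognizer emits the individual segments, so a final like (iong) would show up as a sequence of several phones rather than as one unit.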

If you want to recognize 普通话 speech as 拼音, you have at least three options:

  1. Use a pronouncing dictionary of Chinese to transliterate a speech corpus into Pinyin, then train a standard ASR model on the corpus.
  2. Train a model to transduce IPA to Pinyin and use it in a pipeline with Allosaurus: speech signal --allosaurus--> IPA --transducer--> Pinyin
  3. Use an off-the-shelf Chinese ASR model and convert the output (汉字) to Pinyin using a pronouncing dictionary: speech signal --Chinese ASR--> 汉字 --transducer--> 拼音 (Easiest).
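The last step of option 3 can be sketched roughly as follows. This is only an illustration under stated assumptions: `TOY_LEXICON` is a made-up three-entry stand-in for a real pronouncing dictionary, `hanzi_to_pinyin` is a hypothetical helper, and the ASR output is simply assumed to be a string of 汉字. A real dictionary would also have to handle characters with more than one reading, which this sketch ignores.

```python
# Hypothetical toy lexicon: one reading per character, tone written as a digit.
# A real pipeline would load a full pronouncing dictionary instead.
TOY_LEXICON = {
    "普": "pu3",
    "通": "tong1",
    "话": "hua4",
}


def hanzi_to_pinyin(text, lexicon=TOY_LEXICON, unknown="<unk>"):
    """Map each character of an ASR transcript to Pinyin by dictionary lookup.

    Characters missing from the lexicon are replaced by the `unknown` marker.
    """
    return [lexicon.get(ch, unknown) for ch in text]


# Stand-in for the text an off-the-shelf Chinese ASR model might return.
asr_output = "普通话"
print(" ".join(hanzi_to_pinyin(asr_output)))
```

Per-character lookup is the simplest possible design; word-level lookup would be needed to pick the right reading for polyphonic characters.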


xinjli commented on June 9, 2024

Hello, what exactly do you mean by "support"?


lvZic commented on June 9, 2024

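Hello, what exactly do you mean by "support"?

I mean that Mandarin Chinese (普通话) is not among the languages listed for phonemes. Its phonemes are as follows: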

Initials (consonants) - 21 phonemes

(b) (c) (d) (f) (g) (h) (j) (k) (l) (m) (n) (p) (q) (r) (s) (t) (x) (z) (zh) (ch) (sh)

Finals (vowels and vowel-nasal pairs) - 35 phonemes

(a) (e) (i) (o) (u) (ü) (iu) (ui) (un) (ün) (ia) (ie) (ua) (uo) (ai) (ei) (in) (ou) (an) (ao) (en) (ang) (ong) (eng) (ing) (ian) (iao) (uan) (uai) (iou) (üan) (iang) (iong) (uang) (ueng)


lvZic commented on June 9, 2024

Thanks for your reply. I will have a look.
I also wonder whether allosaurus is accurate enough, since I want to use it to generate a phoneme dataset for training lip animation. I found small differences between the phonemes produced by the eng_to_ipa method and those produced by allosaurus.
