
Comments (8)

Gushasad commented on May 18, 2024

So if I understand correctly, this will not be possible in the near future?

from espeak-ng.

rhdunn commented on May 18, 2024

Not in the current plan. I am working on improving espeak-ng to better support different voices at the moment, and I only have my spare time to work on espeak. Other people could work on this, but full emoticon/emoji support is complex and would need a design planned first to make sure that the general approach is OK.


valdisvi commented on May 18, 2024

It is not too hard to add pronunciations for proper Unicode emoticons, e.g. ☺ and 😟.
Pronunciation for specific character sequences is much harder, though, as eSpeak was not designed to handle "improper" uses of punctuation.


rhdunn commented on May 18, 2024

Another problem is that there are a large number of emoji. Emojipedia puts this at 1,394 (excluding variants built from sequences of characters, such as the flags) -- http://emojipedia.org/stats/. Adding the other Unicode characters (e.g. the card suits and chess pieces) makes the list even larger.

My thinking on that problem is to support a character name table for languages, where these can be stored and updated independently of the dictionary files. For example:

☺️️ smiling face

This would then be pronounced as something like "smiling face character" in English. The flags are also complex, as they now support all of https://en.wikipedia.org/wiki/ISO_3166-2. That is, you can use codes like GB for Great Britain and GBSCT for Scotland. These would need enumerating as well in the character name table:

🇬🇧    great britain flag
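To illustrate why the flags need enumerating, here is a minimal Python sketch of how flag sequences are encoded in Unicode. Two-letter region codes map to a pair of regional indicator symbols, and subdivision codes such as GBSCT are written as a black flag followed by tag characters and a cancel tag. The function names are hypothetical, chosen only for this example, and nothing here is espeak-ng code.

```python
REGIONAL_INDICATOR_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A
BLACK_FLAG = "\U0001F3F4"       # WAVING BLACK FLAG, base of subdivision flags
TAG_BASE = 0xE0000              # tag characters mirror ASCII at this offset
CANCEL_TAG = "\U000E007F"       # terminates a tag sequence


def country_flag(code: str) -> str:
    """Map a two-letter region code (e.g. "GB") to two regional indicators."""
    code = code.upper()
    if len(code) != 2 or not code.isascii() or not code.isalpha():
        raise ValueError(f"expected two ASCII letters, got {code!r}")
    return "".join(chr(REGIONAL_INDICATOR_A + ord(c) - ord("A")) for c in code)


def subdivision_flag(code: str) -> str:
    """Map a subdivision code (e.g. "GBSCT") to a black-flag tag sequence."""
    code = code.lower()
    if not code or not code.isascii() or not code.isalnum():
        raise ValueError(f"expected ASCII letters/digits, got {code!r}")
    tags = "".join(chr(TAG_BASE + ord(c)) for c in code)
    return BLACK_FLAG + tags + CANCEL_TAG


# A name table keyed by the full sequence; the entry text is from the thread.
FLAG_NAMES = {country_flag("GB"): "great britain flag"}
```

Both kinds of flag are multi-character sequences, so a name table has to be keyed by strings rather than by single code points.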

The next issue here is how to make the lookup both fast and compact. Something similar to what I have done in the ucd-tools code could work. That approach does not handle sequences of multiple Unicode characters, though, so a better solution needs to be identified.
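One possible shape for a lookup that handles multi-character sequences is a trie with longest-match search. The Python sketch below is only an illustration under that assumption: it is not how ucd-tools works, it makes no attempt at a compact representation, and `NameTrie` and `name_characters` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple


class NameTrie:
    """Trie keyed by code point, storing a name at the end of each sequence."""

    def __init__(self) -> None:
        self.children: Dict[str, "NameTrie"] = {}
        self.name: Optional[str] = None

    def add(self, sequence: str, name: str) -> None:
        node = self
        for ch in sequence:
            node = node.children.setdefault(ch, NameTrie())
        node.name = name

    def longest_match(self, text: str, start: int) -> Tuple[int, Optional[str]]:
        """Return (length, name) of the longest entry starting at text[start]."""
        node: Optional[NameTrie] = self
        best_len, best_name = 0, None
        for offset in range(start, len(text)):
            node = node.children.get(text[offset])
            if node is None:
                break
            if node.name is not None:
                best_len, best_name = offset - start + 1, node.name
        return best_len, best_name


def name_characters(trie: NameTrie, text: str) -> List[str]:
    """Replace known sequences with their names; pass other characters through."""
    out: List[str] = []
    i = 0
    while i < len(text):
        length, name = trie.longest_match(text, i)
        if name is None:
            out.append(text[i])
            i += 1
        else:
            out.append(name)
            i += length
    return out
```

With entries for ☺ and the 🇬🇧 sequence, a lone regional indicator falls through unchanged, while the two-character flag resolves to a single name.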

A generic character table source file could hold the general emoticons and recognised symbol sequences. It could then be included in the language-specific tables, in the same way that the phoneme table sources can include other source files. For example:

:)    ☺️️


valdisvi commented on May 18, 2024

As initial support, I can add the following to dictsource/en_extra:

// Emoticons
😀     gr'InIN||f'eIs
😁     gr'InIN||f'eIs||wID||sm'aIlIN 'aIz
😂     f'eIs||wID||t'i@3z||0v||dZ'OI
😃     sm'aIlIN||f'eIs||wID 'oUp@n||m'aUT
😄     sm'aIlIN||f'eIs||wID 'oUp@n||m'aUT_:_: and||sm'aIlIN 'aIz
😅     sm'aIlIN||f'eIs||wID 'oUp@n||m'aUT_:_: and||k'oUld||sw'Et
😆     sm'aIlIN||f'eIs||wID 'oUp@n||m'aUT_:_: and||t'aItlikl'oUzd 'aIz
😇     sm'aIlIN||f'eIs||wID||h'eIloU
😈     sm'aIlIN||f'eIs||wID||h'O@nz
😉     w'INkIN||f'eIs
😊     sm'aIlIN||f'eIs||wID||sm'aIlIN 'aIz
😋     f'eIs||s'eIv3r-IN||dI2l'IS@s||f'u:d
😌     rI2l'i:vd||f'eIs
😍     sm'aIlIN||f'eIs||wID||h'A@tS'eIpt 'aIz
😎     sm'aIlIN||f'eIs||wID||s'VNglaasI2z
😏     sm'3:kIN||f'eIs
😐     nj'u:tr@L||f'eIs
😑     Ekspr'ES@nl@s||f'eIs
😒     Vna#mj'u:sd||f'eIs
😓     f'eIs||wID||k'oUld||sw'Et
😔     p'EnsIv||f'eIs
😕     k@nfj'u:zd||f'eIs
😖     k@nf'aUndI2d||f'eIs
😗     k'IsIN||f'eIs
😘     f'eIs||Tr'oUIN||a# k'Is
😙     k'IsIN||f'eIs||wID||sm'aIlIN 'aIz
😚     k'IsIN||f'eIs||wID||kl'oUzd 'aIz
😛     f'eIs||wID||st'Vk'aUt||t'VN
😜     f'eIs||wID||st'Vk'aUt||t'VN_:_: and||w'INkIN 'aI
😝     f'eIs||wID||st'Vk'aUt||t'VN_:_: and||t'aItlikl'oUzd 'aIz
😞     d,Isa#p'OIntI2d||f'eIs
😟     w'Vrid||f'eIs
😠     'aNgri||f'eIs
😡     p'aUtIN||f'eIs
😢     kr'aIIN||f'eIs
😣     p,3:sIv'i@3rIN||f'eIs
😤     f'eIs||wID||l'Uk||0v||tr'aI;Vmf
😥     d,Isa#p'OIntI2d||b,Vt||rI2l'i:vd||f'eIs
😦     fr'aUnIN||f'eIs||wID 'oUp@n||m'aUT
😧     'aNgwISt||f'eIs
😨     f'i@3f@L||f'eIs
😩     w'i@3ri||f'eIs
😪     sl'i:pi||f'eIs
😫     t'aI@d||f'eIs
😬     gr'ImIsIN||f'eIs
😭     l'aUdli||kr'aIIN||f'eIs
😮     f'eIs||wID 'oUp@n||m'aUT
😯     h'VSt||f'eIs
😰     f'eIs||wID 'oUp@n||m'aUT_:_: and||k'oUld||sw'Et
😱     f'eIs||skr'i:mIN||In||f'i@3
😲     a#st'0nISt||f'eIs
😳     fl'VSt||f'eIs
😴     sl'i:pIN||f'eIs
😵     d'Izi||f'eIs
😶     f'eIs||wID,aUt||m'aUT
😷     f'eIs||wID||m'EdIk@L||m'aask
....

Would it be OK to generate a similar list for everything in http://www.unicode.org/Public/emoji/1.0/emoji-data.txt, or just for some of the most popular ones?
However, the pronunciation of e.g. ¯\_(ツ)_/¯ can only be described in en_rules, because it is not a single (Unicode) character but a word.
To handle such "synonym words", e.g.:

:)             ☺️️
¯\_(ツ)_/¯ 🤷

we have to implement the item from #199: "replace rule extended to replace not only characters, but group of characters".
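As a rough picture of what replacing groups of characters could look like, here is a Python sketch that rewrites listed sequences to their emoji before any name lookup. It is an assumption-laden illustration, not espeak-ng's replace-rule syntax or implementation, and `build_replacer` is a hypothetical helper.

```python
import re
from typing import Callable, Dict


def build_replacer(synonyms: Dict[str, str]) -> Callable[[str], str]:
    """Return a function that replaces every listed sequence in a text."""
    if not synonyms:
        return lambda text: text
    # Try longer sequences first so they win over any shorter prefix.
    keys = sorted(synonyms, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))

    def replace(text: str) -> str:
        return pattern.sub(lambda match: synonyms[match.group(0)], text)

    return replace


# The two synonym pairs from the comment above, written with escapes.
SYNONYMS = {
    ":)": "\u263A",                                # smiling face
    "\u00AF\\_(\u30C4)_/\u00AF": "\U0001F937",    # shrug
}
replace_synonyms = build_replacer(SYNONYMS)
```

A plain substitution like this would also rewrite a `:)` that merely closes a parenthesis after a colon, which is part of why handling such sequences is harder than handling single characters.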


rhdunn commented on May 18, 2024

I would prefer that this:

  1. is a separate table from the dictionary files -- making it easier to update the dictionary and symbol name files independently;
  2. uses words (similar to my proposal for the number support).

Point 2 is to make it easier to support different accents and to avoid transcription errors.

Implementing this will require changes to the espeak-ng code.

Another requirement is to support these through the espeak_ng_SpeakCharacter function, and to add an espeak_ng_SpeakMultiCharacter function for things like flag emoji.


rhdunn commented on May 18, 2024

Issue #216 is relevant here, and should be the preferred solution in the long term. That is, the emoji would be defined in en-Zsye, de-Zsye, etc. language dictionaries that, if present, would be used to speak the emoji characters. The same applies to symbols (Zsym) and mathematical notation (Zmth), as well as to reading things like Greek characters (Grek) in English.

The more complex cases are things like the shrug emoticon (¯\_(ツ)_/¯), which mix punctuation characters with Japanese characters (ツ).


rhdunn commented on May 18, 2024

I am restricting this to just support reading the Zsye characters, instead of also supporting their ASCII equivalents. This still covers the combined emoji characters (https://en.wikipedia.org/wiki/Emoji), e.g.:

  1. flags
  2. skin colours (Fitzpatrick skin tones)
  3. joined emoji characters (e.g. man+woman+girl = family)
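The three kinds of combined emoji listed above can be grouped with a small scanner. The Python sketch below is deliberately simplified and is not the full Unicode segmentation algorithm: it pairs regional indicators into flags, attaches skin tone modifiers and the emoji variation selector to the preceding character, and keeps characters joined by a zero width joiner together. It does not handle tag-sequence flags, and the function names are hypothetical.

```python
from typing import List

ZWJ = "\u200D"   # ZERO WIDTH JOINER, glues emoji into one joined sequence
VS16 = "\uFE0F"  # VARIATION SELECTOR-16, requests emoji presentation


def is_regional_indicator(ch: str) -> bool:
    return 0x1F1E6 <= ord(ch) <= 0x1F1FF


def is_skin_tone(ch: str) -> bool:
    # EMOJI MODIFIER FITZPATRICK TYPE-1-2 through TYPE-6
    return 0x1F3FB <= ord(ch) <= 0x1F3FF


def split_emoji_sequences(text: str) -> List[str]:
    """Split text into clusters, keeping combined emoji in one piece."""
    clusters: List[str] = []
    i, n = 0, len(text)
    while i < n:
        j = i + 1
        if is_regional_indicator(text[i]):
            # A flag is a pair of regional indicators.
            if j < n and is_regional_indicator(text[j]):
                j += 1
        else:
            while j < n:
                if is_skin_tone(text[j]) or text[j] == VS16:
                    j += 1
                elif text[j] == ZWJ and j + 1 < n:
                    j += 2  # the joiner plus the character it joins
                else:
                    break
        clusters.append(text[i:j])
        i = j
    return clusters
```

Each resulting cluster could then be looked up as one unit, so that man + woman + girl is spoken as "family" rather than as three separate names.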

