Support emoticons and emoji symbols (Zsye) · espeak-ng · Closed

espeak-ng commented on May 18, 2024

Support emoticons and emoji symbols (Zsye).

Comments (8)

Gushasad commented on May 18, 2024

So if I understand correctly, this will not be possible in the near future?

rhdunn commented on May 18, 2024

Not in the current plan. I am working on improving espeak-ng to better support different voices at the moment, and only have my spare time to work on espeak. Other people could work on this, but full emoticon/emoji support is complex and would need a design to be planned first, to make sure that the general approach is OK.

valdisvi commented on May 18, 2024

It is not too hard to add pronunciations for proper Unicode emoticons, e.g. ☺ and 😟.
But the pronunciation of specific character sequences is much harder, as eSpeak was not intended to handle "improper" use of punctuation.

rhdunn commented on May 18, 2024

Another problem is that there is a large number of emoji. Emojipedia puts this at 1,394 (excluding variants that involve sequences of characters, such as the flags) -- http://emojipedia.org/stats/. Adding the other Unicode symbol characters (e.g. the card suits and chess pieces) makes the list even larger.

My thinking on that problem is to support a character name table for languages, where these can be stored and updated independently of the dictionary files. For example:

☺️️ smiling face

This would then be pronounced as something like "smiling face character" in English. The flags are also complex, as they now support all of https://en.wikipedia.org/wiki/ISO_3166-2. That is, you can use codes like GB for Great Britain and GBSCT for Scotland. These would also need enumerating in the character name table:

🇬🇧    great britain flag
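A minimal sketch of how such a table could be loaded and queried, assuming a plain-text source with one entry per line: the character or character sequence, whitespace, then the name in words. The source format, the `//` comment syntax, the function names, and the appended word "character" are illustrative assumptions, not existing espeak-ng behaviour. Keys are strings rather than single code points because 🇬🇧 is itself a pair of regional indicator symbols (U+1F1EC U+1F1E7).

```python
from typing import Dict

# Hypothetical table source: "<sequence><whitespace><name in words>"; "//" starts a comment.
TABLE_SOURCE = """\
// Emoji names (English)
\u263a\ufe0f\tsmiling face
\U0001F1EC\U0001F1E7\tgreat britain flag
"""


def parse_name_table(source: str) -> Dict[str, str]:
    """Map each character sequence to its spoken name."""
    table: Dict[str, str] = {}
    for raw in source.splitlines():
        line = raw.strip()
        if not line or line.startswith("//"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)  # split once, on the first run of whitespace
        if len(parts) != 2:
            continue  # ignore malformed lines that have no name
        key, name = parts
        table[key] = name.strip()
    return table


def spoken_name(table: Dict[str, str], sequence: str) -> str:
    """Return e.g. "smiling face character", or "" if the sequence is not in the table."""
    name = table.get(sequence)
    return f"{name} character" if name is not None else ""
```

A dict keeps the sketch short; it says nothing about how compact the stored table would be.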

The next issue is how to make the lookup both fast and compact. Something similar to what I have done in the ucd-tools code could be used. That approach does not handle sequences of multiple Unicode characters, though, so a better solution should be identified.
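The ucd-tools approach is not reproduced here. As one possible alternative for multi-code-point keys, this sketch uses a trie keyed by code point with longest-match lookup, so that a flag pair or an emoji plus variation selector is matched as a unit before falling back to a shorter entry. This is a swapped-in technique for illustration, and the class and method names are hypothetical; a production table would more likely compile the trie into flat arrays than keep nested Python dicts.

```python
from typing import Dict, List, Optional, Tuple


class SequenceTrie:
    """Trie keyed by code point, supporting longest-match lookup of character sequences."""

    def __init__(self) -> None:
        self.children: Dict[str, "SequenceTrie"] = {}
        self.value: Optional[str] = None  # name stored where a sequence ends

    def insert(self, sequence: str, name: str) -> None:
        node = self
        for ch in sequence:
            node = node.children.setdefault(ch, SequenceTrie())
        node.value = name

    def longest_match(self, text: str, start: int) -> Tuple[int, Optional[str]]:
        """Return (length, name) of the longest entry starting at text[start], or (0, None)."""
        node = self
        best_len, best_name = 0, None
        i = start
        while i < len(text) and text[i] in node.children:
            node = node.children[text[i]]
            i += 1
            if node.value is not None:
                best_len, best_name = i - start, node.value
        return best_len, best_name

    def scan(self, text: str) -> List[Tuple[str, Optional[str]]]:
        """Greedy left-to-right scan returning (matched sequence, name) pairs."""
        found = []
        i = 0
        while i < len(text):
            length, name = self.longest_match(text, i)
            if length:
                found.append((text[i:i + length], name))
                i += length
            else:
                i += 1  # no entry starts here; skip one code point
        return found
```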

A generic character table source file could hold the general emoticons and recognised symbol sequences, and could then be included into each language-specific table, in the same way that the phoneme table sources can include other source files. For example:

:)    ☺️️
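One way to read the include idea, sketched under assumptions: the generic file maps each recognised ASCII sequence to an emoji once, and a language table that includes it gives the sequence the same name as the emoji it stands for. The table contents and `include_sequences` are hypothetical.

```python
from typing import Dict

# Hypothetical generic (language-independent) table: recognised ASCII sequence -> emoji.
GENERIC_SEQUENCES: Dict[str, str] = {
    ":)": "\u263a\ufe0f",
    ":(": "\u2639\ufe0f",
}

# Hypothetical English character name table: emoji -> name in words.
EN_NAMES: Dict[str, str] = {
    "\u263a\ufe0f": "smiling face",
    "\u2639\ufe0f": "frowning face",
}


def include_sequences(names: Dict[str, str], sequences: Dict[str, str]) -> Dict[str, str]:
    """Merge a generic sequence table into a language table.

    Each sequence inherits the name of the emoji it maps to. Sequences whose
    target emoji has no name in this language are left out, and entries the
    language table already defines are never overridden.
    """
    merged = dict(names)
    for sequence, emoji in sequences.items():
        if emoji in names and sequence not in merged:
            merged[sequence] = names[emoji]
    return merged
```

Keeping the sequence-to-emoji mapping language-independent means translators only maintain emoji names.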

valdisvi commented on May 18, 2024

As initial support, I can add the following to dictsource/en_extra:

// Emoticons
😀     gr'InIN||f'eIs
😁     gr'InIN||f'eIs||wID||sm'aIlIN 'aIz
😂     f'eIs||wID||t'i@3z||0v||dZ'OI
😃     sm'aIlIN||f'eIs||wID 'oUp@n||m'aUT
😄     sm'aIlIN||f'eIs||wID 'oUp@n||m'aUT_:_: and||sm'aIlIN 'aIz
😅     sm'aIlIN||f'eIs||wID 'oUp@n||m'aUT_:_: and||k'oUld||sw'Et
😆     sm'aIlIN||f'eIs||wID 'oUp@n||m'aUT_:_: and||t'aItlikl'oUzd 'aIz
😇     sm'aIlIN||f'eIs||wID||h'eIloU
😈     sm'aIlIN||f'eIs||wID||h'O@nz
😉     w'INkIN||f'eIs
😊     sm'aIlIN||f'eIs||wID||sm'aIlIN 'aIz
😋     f'eIs||s'eIv3r-IN||dI2l'IS@s||f'u:d
😌     rI2l'i:vd||f'eIs
😍     sm'aIlIN||f'eIs||wID||h'A@tS'eIpt 'aIz
😎     sm'aIlIN||f'eIs||wID||s'VNglaasI2z
😏     sm'3:kIN||f'eIs
😐     nj'u:tr@L||f'eIs
😑     Ekspr'ES@nl@s||f'eIs
😒     Vna#mj'u:sd||f'eIs
😓     f'eIs||wID||k'oUld||sw'Et
😔     p'EnsIv||f'eIs
😕     k@nfj'u:zd||f'eIs
😖     k@nf'aUndI2d||f'eIs
😗     k'IsIN||f'eIs
😘     f'eIs||Tr'oUIN||a# k'Is
😙     k'IsIN||f'eIs||wID||sm'aIlIN 'aIz
😚     k'IsIN||f'eIs||wID||kl'oUzd 'aIz
😛     f'eIs||wID||st'Vk'aUt||t'VN
😜     f'eIs||wID||st'Vk'aUt||t'VN_:_: and||w'INkIN 'aI
😝     f'eIs||wID||st'Vk'aUt||t'VN_:_: and||t'aItlikl'oUzd 'aIz
😞     d,Isa#p'OIntI2d||f'eIs
😟     w'Vrid||f'eIs
😠     'aNgri||f'eIs
😡     p'aUtIN||f'eIs
😢     kr'aIIN||f'eIs
😣     p,3:sIv'i@3rIN||f'eIs
😤     f'eIs||wID||l'Uk||0v||tr'aI;Vmf
😥     d,Isa#p'OIntI2d||b,Vt||rI2l'i:vd||f'eIs
😦     fr'aUnIN||f'eIs||wID 'oUp@n||m'aUT
😧     'aNgwISt||f'eIs
😨     f'i@3f@L||f'eIs
😩     w'i@3ri||f'eIs
😪     sl'i:pi||f'eIs
😫     t'aI@d||f'eIs
😬     gr'ImIsIN||f'eIs
😭     l'aUdli||kr'aIIN||f'eIs
😮     f'eIs||wID 'oUp@n||m'aUT
😯     h'VSt||f'eIs
😰     f'eIs||wID 'oUp@n||m'aUT_:_: and||k'oUld||sw'Et
😱     f'eIs||skr'i:mIN||In||f'i@3
😲     a#st'0nISt||f'eIs
😳     fl'VSt||f'eIs
😴     sl'i:pIN||f'eIs
😵     d'Izi||f'eIs
😶     f'eIs||wID,aUt||m'aUT
😷     f'eIs||wID||m'EdIk@L||m'aask
....

Would it be OK to generate a similar list for all of the emoji in http://www.unicode.org/Public/emoji/1.0/emoji-data.txt, or just for some of the most popular ones?
However, the pronunciation of sequences such as ¯\_(ツ)_/¯ can only be described in en_rules, because it is not a single (Unicode) character but a word.
To handle such "synonym words", e.g.:

:)             ☺️️
¯\_(ツ)_/¯ 🤷

we would have to implement the #199 item: "replace rule extended to replace not only characters, but group of characters".
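The #199 replace rules themselves are not shown here. As a generic illustration of what replacing a group of characters means, this sketch assumes each rule maps a literal source sequence to a replacement and applies all rules before any per-character handling, so ¯\_(ツ)_/¯ is consumed as a whole instead of being split into punctuation and ツ. `make_sequence_replacer` is a hypothetical name.

```python
import re
from typing import Callable, Dict


def make_sequence_replacer(rules: Dict[str, str]) -> Callable[[str], str]:
    """Build a function that replaces whole character groups, longest match first."""
    if not rules:
        return lambda text: text  # an empty pattern would match everywhere
    # Longer keys go first in the alternation, so at a given position the
    # longest source sequence wins (Python's re tries alternatives in order).
    alternatives = sorted(rules, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in alternatives))
    return lambda text: pattern.sub(lambda m: rules[m.group(0)], text)
```

`re.escape` matters here: the shrug contains `\`, `(` and `)`, which would otherwise be read as regex syntax.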

rhdunn commented on May 18, 2024

I would prefer that this:

1. is a separate table from the dictionary files -- making it easier to update the dictionary and symbol name files independently; and
2. uses words rather than phoneme transcriptions (similar to my proposal for the number support).

Point 2 is to make it easier to support different accents and to avoid transcription errors.
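As a rough illustration of point 2, a word-based table could be bootstrapped from the English Unicode character names, which Python exposes through `unicodedata.name`. This is only a starting point under stated assumptions: the output line format is hypothetical, Unicode names are English-only and sometimes differ from the familiar emoji short names, and other languages would need translated names (for example from CLDR annotations).

```python
import unicodedata
from typing import Iterator


def name_table_lines(first: int, last: int) -> Iterator[str]:
    """Yield '<character>\\t<name in words>' lines for code points first..last inclusive."""
    for code_point in range(first, last + 1):
        character = chr(code_point)
        name = unicodedata.name(character, None)
        if name is None:
            continue  # unassigned in this Python's Unicode database
        yield f"{character}\t{name.lower()}"
```

Running it over U+1F600..U+1F637 yields word entries for the same faces as the phoneme list above, e.g. 😂 followed by "face with tears of joy".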

Implementing this will require changes to the espeak-ng code.

Another requirement is to support these through the espeak_ng_SpeakCharacter function (and to add an espeak_ng_SpeakMultiCharacter function for things like flag emoji, which consist of more than one character).
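Routing text to a single- or multi-character entry point first needs the text split into emoji units. This is a simplified sketch of that split, covering regional indicator pairs, skin-tone modifiers, variation selectors, tag sequences and ZWJ joins; it is not the full UAX #29 / UTS #51 algorithm. The routing helper only returns the C function names from the discussion, and espeak_ng_SpeakMultiCharacter is a proposal, not an existing API. All Python names are hypothetical.

```python
from typing import List

ZWJ = "\u200d"   # ZERO WIDTH JOINER
VS16 = "\ufe0f"  # VARIATION SELECTOR-16 (emoji presentation)


def is_regional_indicator(ch: str) -> bool:
    return 0x1F1E6 <= ord(ch) <= 0x1F1FF


def is_skin_tone(ch: str) -> bool:
    return 0x1F3FB <= ord(ch) <= 0x1F3FF  # Fitzpatrick modifiers


def is_tag(ch: str) -> bool:
    return 0xE0020 <= ord(ch) <= 0xE007F  # tag characters used by subdivision flags


def split_emoji_clusters(text: str) -> List[str]:
    """Split text into simplified emoji clusters (not full UAX #29 segmentation)."""
    clusters: List[str] = []
    i, n = 0, len(text)
    while i < n:
        start = i
        if is_regional_indicator(text[i]) and i + 1 < n and is_regional_indicator(text[i + 1]):
            i += 2  # a pair of regional indicators forms one flag
        else:
            i += 1
            while i < n:
                ch = text[i]
                if ch == VS16 or is_skin_tone(ch) or is_tag(ch):
                    i += 1  # modifiers attach to the preceding character
                elif ch == ZWJ and i + 1 < n:
                    i += 2  # the joiner pulls in the next character
                else:
                    break
        clusters.append(text[start:i])
    return clusters


def speak_entry_point(cluster: str) -> str:
    """Name the C entry point a cluster would go to (the second is only proposed)."""
    return "espeak_ng_SpeakCharacter" if len(cluster) == 1 else "espeak_ng_SpeakMultiCharacter"
```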

rhdunn commented on May 18, 2024

Issue #216 is relevant here, and should be the preferred solution in the long term. That is, the emoji would be defined in en-Zsye, de-Zsye, etc. language dictionaries which, if present, would be used to speak the emoji characters. The same applies to symbols (Zsym) and mathematical notation (Zmth), as well as to reading things like Greek characters (Grek) in English.
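A sketch of the dictionary selection this implies, assuming each character is mapped to a script subtag and the `<language>-<subtag>` dictionary is used only if it is installed, falling back to the base language otherwise. The classification is deliberately coarse (one emoji block range U+1F300..U+1FAFF, the Unicode general categories Sm and So, and the GREEK name prefix) and the function names are hypothetical; how #216 is actually implemented may differ.

```python
import unicodedata
from typing import Optional, Set


def script_subtag(ch: str) -> Optional[str]:
    """Coarsely classify a character as Zsye, Grek, Zmth or Zsym (None otherwise)."""
    if 0x1F300 <= ord(ch) <= 0x1FAFF:  # assumed emoji range; real emoji are more scattered
        return "Zsye"
    if unicodedata.name(ch, "").startswith("GREEK "):
        return "Grek"
    category = unicodedata.category(ch)
    if category == "Sm":  # Symbol, math
        return "Zmth"
    if category == "So":  # Symbol, other
        return "Zsym"
    return None


def dictionary_for(language: str, ch: str, available: Set[str]) -> str:
    """Pick e.g. "en-Zsye" if that dictionary is available, else the base language."""
    subtag = script_subtag(ch)
    if subtag is not None:
        candidate = f"{language}-{subtag}"
        if candidate in available:
            return candidate
    return language
```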

The more complex cases are for things like the shrug character (¯\_(ツ)_/¯) that mix punctuation characters with Japanese characters (ツ).

rhdunn commented on May 18, 2024

I am restricting this issue to reading the Zsye characters only, rather than also supporting their ASCII equivalents. This still covers the combined emoji characters (https://en.wikipedia.org/wiki/Emoji), e.g.:

flags
skin colours (Fitzpatrick skin tones)
joined emoji characters (e.g. man+woman+girl = family)
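For the three kinds of combined sequence listed above, a sketch of how a spoken name might be composed, assuming the input is already a single emoji sequence. The tables are hypothetical and tiny; the skin tone wording follows the CLDR English short names. Whole ZWJ sequences with an agreed name (such as the family) are looked up first, and other sequences fall back to the names of their parts. Subdivision flags such as GBSCT use tag sequences rather than regional indicator pairs and are not handled here.

```python
from typing import Dict, Optional

ZWJ = "\u200d"
VS16 = "\ufe0f"
REGIONAL_INDICATOR_A = 0x1F1E6

# Hypothetical English name tables; a real table would enumerate every entry.
BASE_NAMES: Dict[str, str] = {
    "\U0001F44D": "thumbs up",
    "\U0001F468": "man",
    "\U0001F469": "woman",
    "\U0001F467": "girl",
}
SKIN_TONE_NAMES: Dict[str, str] = {
    "\U0001F3FB": "light skin tone",
    "\U0001F3FC": "medium-light skin tone",
    "\U0001F3FD": "medium skin tone",
    "\U0001F3FE": "medium-dark skin tone",
    "\U0001F3FF": "dark skin tone",
}
REGION_NAMES: Dict[str, str] = {"GB": "great britain"}
SEQUENCE_NAMES: Dict[str, str] = {
    "\U0001F468\u200d\U0001F469\u200d\U0001F467": "family",  # man+woman+girl
}


def region_code(cluster: str) -> Optional[str]:
    """Decode a regional indicator pair (e.g. U+1F1EC U+1F1E7) to a code such as "GB"."""
    if len(cluster) != 2:
        return None
    offsets = [ord(ch) - REGIONAL_INDICATOR_A for ch in cluster]
    if not all(0 <= offset < 26 for offset in offsets):
        return None
    return "".join(chr(ord("A") + offset) for offset in offsets)


def name_for_cluster(cluster: str) -> Optional[str]:
    """Compose a spoken name for one emoji sequence, or None if a part is unknown."""
    code = region_code(cluster)
    if code is not None:
        # Regions without a table entry are spelled out letter by letter.
        return f"{REGION_NAMES.get(code, ' '.join(code))} flag"
    if cluster in SEQUENCE_NAMES:
        return SEQUENCE_NAMES[cluster]
    words = []
    for ch in cluster.replace(ZWJ, "").replace(VS16, ""):
        name = SKIN_TONE_NAMES.get(ch) or BASE_NAMES.get(ch)
        if name is None:
            return None
        words.append(name)
    return " ".join(words)
```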
