So if I understand correctly, this will not be possible in the near future?
from espeak-ng.
Not in the current plan. I am working on improving espeak-ng to better support different voices at the moment, and only have my spare time to work on espeak. Other people could work on this, but full emoticon/emoji support is complex, and would need a design to be planned first to make sure that the general approach is OK.
It is not too hard to add pronunciations for proper Unicode emoticons, e.g. ☺ and 😟.
But pronouncing specific character sequences (such as ":)") is much harder, as eSpeak was not intended to handle "improper" use of punctuation.
Another problem is that there are a large number of emoji. Emojipedia puts this at 1,394 (excluding variants that involve sequences of characters, such as the flags): http://emojipedia.org/stats/. Adding the other Unicode characters (e.g. the card suits and chess pieces) makes the list even larger.
My thinking on that problem is to support a character name table for languages, where these can be stored and updated independently of the dictionary files. For example:
This would then be pronounced as something like "smiling face character" in English. The flags are also complex, as they now support all of ISO 3166-2 (https://en.wikipedia.org/wiki/ISO_3166-2). That is, you can use things like GB for Great Britain and GBSCT for Scotland. These would also need to be enumerated in the character name table:
🇬🇧 great britain flag
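As a rough illustration of why flags need special handling, the Python sketch below decodes the two kinds of flag sequence into the region codes used above: a pair of regional indicator symbols (U+1F1E6..U+1F1FF, one per letter A-Z) for ISO 3166-1 codes like GB, and a waving black flag (U+1F3F4) followed by tag characters and a cancel tag (U+E007F) for subdivisions like GBSCT. The `flag_code` function and the `FLAG_NAMES` table are hypothetical and not part of espeak-ng.

```python
from typing import Optional

REGIONAL_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A
BLACK_FLAG = 0x1F3F4  # WAVING BLACK FLAG, base of subdivision flags
CANCEL_TAG = 0xE007F  # CANCEL TAG, terminates a tag sequence
TAG_OFFSET = 0xE0000  # tag characters mirror ASCII at this offset


def flag_code(seq: str) -> Optional[str]:
    """Return the region code encoded by a flag emoji, or None if it is not a flag."""
    cps = [ord(c) for c in seq]
    # ISO 3166-1 flags: exactly two regional indicator symbols.
    if len(cps) == 2 and all(REGIONAL_A <= cp <= REGIONAL_A + 25 for cp in cps):
        return "".join(chr(ord("A") + cp - REGIONAL_A) for cp in cps)
    # ISO 3166-2 subdivision flags: black flag + tag digits/lowercase letters + cancel tag.
    if len(cps) >= 3 and cps[0] == BLACK_FLAG and cps[-1] == CANCEL_TAG:
        tags = cps[1:-1]
        if all(0xE0030 <= cp <= 0xE0039 or 0xE0061 <= cp <= 0xE007A for cp in tags):
            return "".join(chr(cp - TAG_OFFSET) for cp in tags).upper()
    return None


# Hypothetical name table keyed on region code rather than on the raw sequence.
FLAG_NAMES = {"GB": "great britain flag", "GBSCT": "scotland flag"}

gb = "\U0001F1EC\U0001F1E7"
scotland = "\U0001F3F4\U000E0067\U000E0062\U000E0073\U000E0063\U000E0074\U000E007F"
print(FLAG_NAMES[flag_code(gb)])        # great britain flag
print(FLAG_NAMES[flag_code(scotland)])  # scotland flag
```

Decoding to a code first means the table needs one entry per region rather than per code point sequence; whether that is better than listing the literal sequences is a design choice.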
The next issue is how to make the lookup both fast and compact. An approach similar to the one I used in the ucd-tools code could work. However, that does not handle sequences of multiple Unicode characters, so a better solution would need to be identified.
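One possible way to handle lookups of multi-character sequences is a trie with longest-match lookup, so that 🇬🇧 is matched as one entry rather than as two regional indicator characters. The Python sketch below reads the "sequence name" line format shown above; it is not based on the ucd-tools code, and a dict-based trie is fast but not compact, so a C implementation would likely want a flattened or sorted-array layout instead. `build_trie`, `longest_match` and `describe` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple


class TrieNode:
    """One node per code point; `name` is set where a table entry ends."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.name: Optional[str] = None


def build_trie(lines: List[str]) -> TrieNode:
    root = TrieNode()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("//"):
            continue  # skip blank lines and comments
        seq, _, name = line.partition(" ")  # e.g. "🇬🇧 great britain flag"
        node = root
        for ch in seq:
            node = node.children.setdefault(ch, TrieNode())
        node.name = name.strip()
    return root


def longest_match(root: TrieNode, text: str, start: int) -> Optional[Tuple[int, str]]:
    """Return (length, name) of the longest entry starting at text[start], if any."""
    node, best, i = root, None, start
    while i < len(text) and text[i] in node.children:
        node = node.children[text[i]]
        i += 1
        if node.name is not None:
            best = (i - start, node.name)
    return best


def describe(root: TrieNode, text: str) -> List[str]:
    """Replace every matched sequence with its name; keep other characters as-is."""
    out: List[str] = []
    i = 0
    while i < len(text):
        match = longest_match(root, text, i)
        if match is None:
            out.append(text[i])
            i += 1
        else:
            length, name = match
            out.append(name)
            i += length
    return out


table = build_trie([
    "\U0001F600 grinning face",
    "\U0001F1EC\U0001F1E7 great britain flag",
    "\U0001F1EC regional indicator g",
])
print(describe(table, "hi \U0001F600\U0001F1EC\U0001F1E7"))
# ['h', 'i', ' ', 'grinning face', 'great britain flag']
```

Because the lookup keeps walking while a longer entry is possible, the two-character flag wins over the single regional indicator entry.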
A generic character table source file could be used for the general emoticons and recognised symbol sequences. It could then be included into the language-specific tables, in the same way that the phoneme table sources can include other source files. For example:
:) ☺️️
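To illustrate the include idea, here is a Python sketch in which a language table pulls in a generic table with a hypothetical "include NAME" line, loosely modelled on the phoneme table includes (the directive name and file layout are assumptions). Entries after the include override included ones, and a sequence such as :) resolves through the emoji it maps to.

```python
from typing import Dict, Optional, Set


def load_table(name: str, sources: Dict[str, str],
               active: Optional[Set[str]] = None) -> Dict[str, str]:
    """Parse sources[name], expanding hypothetical "include OTHER" lines in place."""
    active = set() if active is None else active
    if name in active:
        raise ValueError(f"include cycle involving {name!r}")
    active.add(name)
    table: Dict[str, str] = {}
    for raw in sources[name].splitlines():
        line = raw.strip()
        if not line or line.startswith("//"):
            continue
        if line.startswith("include "):
            # Included entries go in first, so later lines here can override them.
            table.update(load_table(line.split(None, 1)[1], sources, active))
            continue
        key, _, value = line.partition(" ")
        table[key] = value.strip()
    active.discard(name)  # finished: the same file may be included again elsewhere
    return table


def resolve(table: Dict[str, str], key: str, max_hops: int = 4) -> Optional[str]:
    """Follow sequence -> emoji -> name chains, e.g. ":)" -> "☺" -> "smiling face"."""
    value = table.get(key)
    hops = 0
    while value is not None and value in table and hops < max_hops:
        value = table[value]
        hops += 1
    return value


SOURCES = {
    "emoji_generic": ":) \u263a\n:( \u2639",
    "en_symbols": "include emoji_generic\n\u263a smiling face\n\u2639 frowning face",
}

en = load_table("en_symbols", SOURCES)
print(resolve(en, ":)"))  # smiling face
```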
As initial support, I can add the following to dictsource/en_extra:
// Emoticons
😀 gr'InIN||f'eIs
😁 gr'InIN||f'eIs||wID||sm'aIlIN 'aIz
😂 f'eIs||wID||t'i@3z||0v||dZ'OI
😃 sm'aIlIN||f'eIs||wID 'oUp@n||m'aUT
😄 sm'aIlIN||f'eIs||wID 'oUp@n||m'aUT_:_: and||sm'aIlIN 'aIz
😅 sm'aIlIN||f'eIs||wID 'oUp@n||m'aUT_:_: and||k'oUld||sw'Et
😆 sm'aIlIN||f'eIs||wID 'oUp@n||m'aUT_:_: and||t'aItlikl'oUzd 'aIz
😇 sm'aIlIN||f'eIs||wID||h'eIloU
😈 sm'aIlIN||f'eIs||wID||h'O@nz
😉 w'INkIN||f'eIs
😊 sm'aIlIN||f'eIs||wID||sm'aIlIN 'aIz
😋 f'eIs||s'eIv3r-IN||dI2l'IS@s||f'u:d
😌 rI2l'i:vd||f'eIs
😍 sm'aIlIN||f'eIs||wID||h'A@tS'eIpt 'aIz
😎 sm'aIlIN||f'eIs||wID||s'VNglaasI2z
😏 sm'3:kIN||f'eIs
😐 nj'u:tr@L||f'eIs
😑 Ekspr'ES@nl@s||f'eIs
😒 Vna#mj'u:zd||f'eIs
😓 f'eIs||wID||k'oUld||sw'Et
😔 p'EnsIv||f'eIs
😕 k@nfj'u:zd||f'eIs
😖 k@nf'aUndI2d||f'eIs
😗 k'IsIN||f'eIs
😘 f'eIs||Tr'oUIN||a# k'Is
😙 k'IsIN||f'eIs||wID||sm'aIlIN 'aIz
😚 k'IsIN||f'eIs||wID||kl'oUzd 'aIz
😛 f'eIs||wID||st'Vk'aUt||t'VN
😜 f'eIs||wID||st'Vk'aUt||t'VN_:_: and||w'INkIN 'aI
😝 f'eIs||wID||st'Vk'aUt||t'VN_:_: and||t'aItlikl'oUzd 'aIz
😞 d,Isa#p'OIntI2d||f'eIs
😟 w'Vrid||f'eIs
😠 'aNgri||f'eIs
😡 p'aUtIN||f'eIs
😢 kr'aIIN||f'eIs
😣 p,3:sIv'i@3rIN||f'eIs
😤 f'eIs||wID||l'Uk||0v||tr'aI;Vmf
😥 d,Isa#p'OIntI2d||b,Vt||rI2l'i:vd||f'eIs
😦 fr'aUnIN||f'eIs||wID 'oUp@n||m'aUT
😧 'aNgwISt||f'eIs
😨 f'i@3f@L||f'eIs
😩 w'i@3ri||f'eIs
😪 sl'i:pi||f'eIs
😫 t'aI@d||f'eIs
😬 gr'ImIsIN||f'eIs
😭 l'aUdli||kr'aIIN||f'eIs
😮 f'eIs||wID 'oUp@n||m'aUT
😯 h'VSt||f'eIs
😰 f'eIs||wID 'oUp@n||m'aUT_:_: and||k'oUld||sw'Et
😱 f'eIs||skr'i:mIN||In||f'i@3
😲 a#st'0nISt||f'eIs
😳 fl'VSt||f'eIs
😴 sl'i:pIN||f'eIs
😵 d'Izi||f'eIs
😶 f'eIs||wID,aUt||m'aUT
😷 f'eIs||wID||m'EdIk@L||m'aask
....
Would it be OK to generate a similar list for all of the emoji in http://www.unicode.org/Public/emoji/1.0/emoji-data.txt, or just for some of the most popular ones?
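If the list is generated from emoji-data.txt, a small script can at least enumerate the code points. The Python sketch below assumes the usual UCD layout (a code point or "XXXX..YYYY" range in the first ";"-separated field, comments after "#"); the column layout of the 1.0 file may differ from later versions, so the optional property filter and the placeholder output format are assumptions. The transcriptions would still have to be written by hand.

```python
from typing import Iterable, Iterator, List, Optional


def parse_code_points(lines: Iterable[str], prop: Optional[str] = None) -> Iterator[int]:
    """Yield code points from UCD-style lines: 'XXXX[..YYYY] ; field ; ... # comment'."""
    for raw in lines:
        data = raw.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if prop is not None and (len(fields) < 2 or fields[1] != prop):
            continue  # optional filter on the second field (assumed to be a property name)
        first, _, last = fields[0].partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        yield from range(start, end + 1)


def skeleton(code_points: Iterable[int]) -> List[str]:
    """One placeholder en_extra-style line per distinct code point, in input order."""
    seen = set()
    out: List[str] = []
    for cp in code_points:
        if cp in seen:
            continue  # the same code point can appear under several properties
        seen.add(cp)
        out.append(f"{chr(cp)}\t// U+{cp:04X} TODO: add transcription")
    return out


sample = [
    "# made-up lines in the later emoji-data.txt style",
    "1F600 ; Emoji_Presentation # grinning face",
    "1F601..1F603 ; Emoji_Presentation # ...",
    "00A9 ; Emoji # copyright sign",
]
for line in skeleton(parse_code_points(sample, "Emoji_Presentation")):
    print(line)
```

With the real file, the lines could be read with `open("emoji-data.txt", encoding="utf-8")` and passed to `parse_code_points` in the same way.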
But the pronunciation of, e.g., ¯\_(ツ)_/¯ can only be described in en_rules, because it is not a single (Unicode) character but a word.
To handle such "synonym words", e.g.:
:) ☺️️
¯\_(ツ)_/¯ 🤷
we have to implement item #199: "replace rule extended to replace not only characters, but group of characters".
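A minimal sketch of replacing groups of characters, in Python: all keys are combined into one regular expression with longer keys listed first, so when one key is a prefix of another the longer one wins. The `make_replacer` helper is hypothetical and is not the #199 design; note that blind replacement like this would also rewrite :) inside unrelated text, which is part of why the general approach needs planning.

```python
import re
from typing import Callable, Dict


def make_replacer(pairs: Dict[str, str]) -> Callable[[str], str]:
    """Build a function that replaces every occurrence of a key of `pairs` with its value."""
    if not pairs:
        return lambda text: text  # an empty alternation would match the empty string
    # Python's re tries alternatives left to right, so listing longer keys first
    # makes the longest key win when one key is a prefix of another.
    keys = sorted(pairs, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return lambda text: pattern.sub(lambda m: pairs[m.group(0)], text)


replace = make_replacer({
    ":)": "\u263a",                # ☺ WHITE SMILING FACE
    ":-)": "\u263a",
    r"¯\_(ツ)_/¯": "\U0001F937",   # 🤷 SHRUG
})
print(replace("ok :-) ¯\\_(ツ)_/¯"))  # ok ☺ 🤷
```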
I would prefer that this:
- is a separate table from the dictionary files, making it easier to update the dictionary and symbol name files independently;
- uses words rather than phoneme transcriptions (similar to my proposal for the number support).
The second point is to make it easier to support different accents and to avoid transcription errors.
Implementing this will require changes to the espeak-ng code.
Another requirement is to support these through the espeak_ng_SpeakCharacter function (and to add an espeak_ng_SpeakMultiCharacter function for things like flag emoji).
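The following Python sketch shows the word-based approach together with single- and multi-character entry points. Names are stored as words, so they would go through the normal text-to-phoneme rules for each accent. `character_text` and `multi_character_text` are hypothetical stand-ins (the latter for the proposed espeak_ng_SpeakMultiCharacter); falling back to the Unicode character name via Python's `unicodedata` module is an assumption, used here to show why flags need the multi-character path.

```python
import unicodedata
from typing import Dict

# Hypothetical per-language symbol name table: words, not phoneme transcriptions.
EN_NAMES: Dict[str, str] = {
    "\U0001F600": "grinning face",
    "\U0001F1EC\U0001F1E7": "great britain flag",
}


def character_text(ch: str, names: Dict[str, str]) -> str:
    """Text to speak for one character (stand-in for a SpeakCharacter-style path)."""
    if ch in names:
        return names[ch]
    # Assumed fallback: the Unicode character name, lower-cased.
    return unicodedata.name(ch, f"U+{ord(ch):04X}").lower()


def multi_character_text(seq: str, names: Dict[str, str]) -> str:
    """Text to speak for a sequence (stand-in for the proposed multi-character path)."""
    if seq in names:
        return names[seq]
    # No entry for the whole sequence: fall back to naming each character.
    return " ".join(character_text(ch, names) for ch in seq)


print(multi_character_text("\U0001F1EC\U0001F1E7", EN_NAMES))  # great britain flag
print(multi_character_text("\U0001F1EB\U0001F1F7", EN_NAMES))
# regional indicator symbol letter f regional indicator symbol letter r
```

The second call shows what happens to a flag with no table entry: it is read as two separate characters, which is the case the multi-character function is meant to avoid.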
Issue #216 is relevant here, and should be the preferred solution in the long term. That is, the emoji would be defined in en-Zsye, de-Zsye, etc. language dictionaries that, if present, would be used to speak the emoji characters. The same applies to symbols (Zsym) and mathematical notation (Zmth), as well as reading things like Greek characters (Grek) in English.
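A sketch of how a script-specific dictionary might be chosen per character, in Python. The code point ranges are a small hand-picked subset of Unicode blocks (not a complete script or emoji property lookup), and the dictionary names follow the en-Zsye pattern above; falling back to the base language dictionary when no script dictionary is present is an assumption. Characters with both text and emoji presentation, such as ☺ (U+263A, in the Miscellaneous Symbols block), would need a further rule.

```python
from typing import Optional, Set, Tuple, List

# Hand-picked, incomplete block ranges mapped to ISO 15924 script codes.
SCRIPT_RANGES: List[Tuple[int, int, str]] = [
    (0x0370, 0x03FF, "Grek"),    # Greek and Coptic
    (0x2200, 0x22FF, "Zmth"),    # Mathematical Operators
    (0x2600, 0x26FF, "Zsym"),    # Miscellaneous Symbols (card suits, chess pieces, ...)
    (0x1F300, 0x1F5FF, "Zsye"),  # Miscellaneous Symbols and Pictographs
    (0x1F600, 0x1F64F, "Zsye"),  # Emoticons
    (0x1F680, 0x1F6FF, "Zsye"),  # Transport and Map Symbols
    (0x1F900, 0x1F9FF, "Zsye"),  # Supplemental Symbols and Pictographs
]


def script_subtag(ch: str) -> Optional[str]:
    cp = ord(ch)
    for first, last, tag in SCRIPT_RANGES:
        if first <= cp <= last:
            return tag
    return None


def pick_dictionary(lang: str, ch: str, available: Set[str]) -> str:
    """Use e.g. "en-Zsye" for an emoji if that dictionary exists, else the base language."""
    tag = script_subtag(ch)
    if tag is not None:
        candidate = f"{lang}-{tag}"
        if candidate in available:
            return candidate
    return lang


available = {"en", "en-Zsye", "en-Grek"}
print(pick_dictionary("en", "\U0001F61F", available))  # en-Zsye
print(pick_dictionary("en", "\u2211", available))      # en (no en-Zmth dictionary)
```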
The more complex cases are things like the shrug (¯\_(ツ)_/¯) that mix punctuation characters with Japanese characters (ツ).
I am restricting this to only support reading the Zsye characters, not their ASCII equivalents. This still covers the combined emoji characters (https://en.wikipedia.org/wiki/Emoji), e.g.:
- flags
- skin colours (Fitzpatrick skin tones)
- joined emoji characters (e.g. man+woman+girl = family)
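A Python sketch of breaking a combined emoji into parts when there is no entry for the whole sequence: split on the zero width joiner (U+200D), drop the emoji variation selector (U+FE0F), and map a trailing Fitzpatrick modifier (U+1F3FB..U+1F3FF) to a skin tone name. The `describe` function and its output wording are hypothetical, and base names fall back to Python's `unicodedata` names.

```python
import unicodedata
from typing import Dict, List

ZWJ = "\u200d"   # ZERO WIDTH JOINER
VS16 = "\ufe0f"  # VARIATION SELECTOR-16 (emoji presentation)
SKIN_TONES: Dict[str, str] = {
    "\U0001F3FB": "light skin tone",         # Fitzpatrick type 1-2
    "\U0001F3FC": "medium-light skin tone",  # type 3
    "\U0001F3FD": "medium skin tone",        # type 4
    "\U0001F3FE": "medium-dark skin tone",   # type 5
    "\U0001F3FF": "dark skin tone",          # type 6
}


def base_name(part: str, names: Dict[str, str]) -> str:
    if part in names:
        return names[part]
    return " ".join(unicodedata.name(c, f"U+{ord(c):04X}").lower() for c in part)


def describe(seq: str, names: Dict[str, str]) -> str:
    if seq in names:  # a whole-sequence entry wins, e.g. a named family emoji
        return names[seq]
    words: List[str] = []
    for part in seq.split(ZWJ):
        part = part.replace(VS16, "")
        if not part:
            continue
        tone = ""
        if len(part) > 1 and part[-1] in SKIN_TONES:
            tone = " " + SKIN_TONES[part[-1]]
            part = part[:-1]
        words.append(base_name(part, names) + tone)
    return ", ".join(words)


family = "\U0001F468\u200d\U0001F469\u200d\U0001F467"  # man + woman + girl
print(describe(family, {}))                  # man, woman, girl
print(describe("\U0001F44D\U0001F3FD", {}))  # thumbs up sign medium skin tone
```

A table entry for the full ZWJ sequence would normally be preferred; the per-part fallback only keeps unknown combinations from being skipped.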