espeak-ng / espeak-ng

eSpeak NG is an open source speech synthesizer that supports more than a hundred languages and accents.

License: GNU General Public License v3.0

Shell 4.32% C 77.16% C++ 3.21% HTML 0.21% Makefile 1.61% M4 0.84% JavaScript 0.55% Java 9.40% Python 1.84% Vim Script 0.22% CMake 0.63%

espeak-ng espeak android text-to-speech speech-synthesis

espeak-ng's Issues

Move the current voice definition files to language definition files.

The voice files are closer to language files. These languages should be organised by the closest ISO 639-5 language family code (e.g. voices/europe/cy moving to language/cel/cy -- Celtic/Welsh).

Additionally, the languages should be BCP 47 compliant (e.g. en-GB-scotland instead of en-sc). Where extensions are needed, the private use tags from Cainteoir Text-to-Speech should be used. These should be described in a privateuse.dat file included in the espeak-ng project.

Create man pages for the command line and C API in markdown

The command line interfaces to espeak-ng and speak-ng, as well as the C API, should all be documented in markdown that generates man pages and HTML.

Define the MBROLA voices as primary phoneme table data

The MBROLA voice phoneme maps should be removed -- these should be replaced by MBROLA voice specific phoneme tables. These should support:

Mapping standard phonemes (consonants, vowels, affricates and diphthongs);
Long and short variants (for vowels);
Constructed phonemes (for greater phoneme coverage);
Phoneme aliases (for greater language support).

It should be possible to modify the phoneme sequence to account for missing diphones (e.g. the h-6 diphone on the German voices).

Regression: build 32655d doesn't speak

When build 32655d is compiled and installed, it makes just a few beeps and clicks and then halts.
(Built on Ubuntu 14.04.3 LTS, kernel x86_64 3.13.0-74-lowlatency).

Remove commented/#ifdef'd out code.

This code is not used by the program and is accessible by the source control history. Therefore, this code should be removed to make the code more readable.

espeak-ng --compile-debug=xx returns "Can't access (r) file 'xx_rules'"

When trying to compile a language with advanced debugging,
e.g. from the src directory (and similarly from other directories):

./espeak-ng --compile-debug=lv

It returns:

Can't access (r) file 'lv_rules'

Maybe src/espeak-ng.1.html should be added to version control?

Even though generated files are generally not included in version control, adding src/espeak-ng.1.html (and/or src/speak-ng.1.html?) would allow it to be referenced from other documents.

Look at using ucd-tools for isalpha, toupper, etc. wide-character support.

The ucd-tools project is being used in the eSpeak for Android project. It is also needed on platforms like Windows Mobile. Additionally, different versions of platforms support different versions of Unicode. Using ucd-tools would make the Unicode support consistent.

Create language specification files

These files should use the BCP47 code for the language (en-GB-scotland, pt-BR, da, etc.) and should be separate from the voice files.

Store the phoneme data files in a text format and compile to binary.

The binary data files are difficult to work with and version properly. These should be stored in a text format and compiled to binary during the build, like the language dictionaries are.

Make all error paths return an espeak_ng_STATUS code.

Currently, not all code paths inherited from the espeak codebase return the correct error code. The code should be checked so that:

Calls that can fail correctly check and return a valid espeak_ng_STATUS code;
Calls that fail processing a file should set an espeak_ng_ERROR_CONTEXT object.

This will ensure that the underlying error is correctly preserved.
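As a rough sketch of the pattern described above: every failure is reported through the return value, and file failures also fill in a context object. The names `demo_status`, `demo_error_context` and `demo_file_size` are hypothetical stand-ins; the real `espeak_ng_STATUS` values and the layout of `espeak_ng_ERROR_CONTEXT` are not given in this issue.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for espeak_ng_STATUS. */
typedef enum {
    DEMO_STATUS_OK = 0,
    DEMO_STATUS_INVALID_ARGUMENT,
    DEMO_STATUS_FILE_ERROR
} demo_status;

/* Hypothetical stand-in for espeak_ng_ERROR_CONTEXT. */
typedef struct {
    char path[256];  /* file being processed when the failure occurred */
    int saved_errno; /* errno captured at the point of failure */
} demo_error_context;

/* Record which file failed and why; the context is optional. */
static void demo_set_context(demo_error_context *context, const char *path, int error)
{
    if (context == NULL)
        return;
    snprintf(context->path, sizeof(context->path), "%s", path);
    context->saved_errno = error;
}

/* Report the size of a file, returning a status code on every path. */
demo_status demo_file_size(const char *path, long *size, demo_error_context *context)
{
    if (path == NULL || size == NULL)
        return DEMO_STATUS_INVALID_ARGUMENT;

    FILE *f = fopen(path, "rb");
    if (f == NULL) {
        demo_set_context(context, path, errno);
        return DEMO_STATUS_FILE_ERROR;
    }

    demo_status status = DEMO_STATUS_OK;
    long end = -1;
    if (fseek(f, 0, SEEK_END) != 0 || (end = ftell(f)) < 0) {
        demo_set_context(context, path, errno);
        status = DEMO_STATUS_FILE_ERROR;
    } else {
        *size = end;
    }
    fclose(f);
    return status;
}
```

The caller can then decide how to present the failure, because both the status and the originating file survive the call.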

speech.h should not hardcode platform-specific defines

Currently at master 36be9ac src/libespeak-ng/speech.h line 41: PLATFORM_POSIX is a hardcoded define.
This makes it impossible to compile on non-POSIX platforms such as Windows, even by providing contrary PLATFORM_* defines via compiler command lines.
Perhaps it is a work in progress to move the code across to some kind of configure.h.in or equivalent, but for now this completely breaks compilation of libespeak-ng on Windows for NVDA.

Note that after commenting this line out, addressing the fclose bug in LoadSpectSeq, handling NULL log FILE arguments in compilePhonemeData and compileIntonations, and addressing the phsource directory check in compilePhonemeData2, libespeak-ng compiles successfully and is used by NVDA.

Refactor the spect and synthesize code to share common code

Make the spect code use the structures from synthesize.h directly (e.g. for frames).
Move those definitions to spect.h
Move implementations of the frame synthesis code from synthesize to spect.

Support building espeak on Windows with Visual C++.

This should add Visual C++ project files to support Windows-based builds.

mkdictlist is missing

The mkdictlist file is missing. I want to use it to add new voices for Arabic.

Rebrand the espeak program to espeak-ng.

The program, library, environment variables, etc. should reflect the project being espeak-ng. This will help differentiate it from the upstream espeak project, should Jonathan continue to make releases of it, allowing both to run on a system. It also helps to avoid confusion.

Clean up the portability support for espeak.

Currently, the espeak portability is a mix of PLATFORM_* checks and HAVE_* checks.

The portability approach should be like how libressl handles portability. That is:

Define POSIX and ISO C headers needing compatibility in src/include/compat.
Move the header-specific compatibility checks into those compatibility headers.
Create a set of src/compat/API_NAME.c compatibility implementations.
Add the compatibility sources that are needed for a given platform.

This will keep the main source code clean and free of #if ... checks.

Never call assert in the library codebase.

Various places in the codebase (e.g. the async thread code) call assert on errors. This will cause the calling application to exit abruptly.

These assert checks should be made into if checks that return espeak_ng_STATUS error codes that are propagated to the caller.
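A minimal sketch of that conversion, using a hypothetical fixed-size queue rather than the actual async thread code. The `demo_` names and status values are assumptions for illustration only.

```c
#include <stddef.h>

/* Hypothetical status codes standing in for espeak_ng_STATUS. */
typedef enum {
    DEMO_OK = 0,
    DEMO_BAD_ARGUMENT,
    DEMO_QUEUE_FULL
} demo_queue_status;

#define DEMO_QUEUE_CAPACITY 4

typedef struct {
    int items[DEMO_QUEUE_CAPACITY];
    size_t count;
} demo_queue;

/* Before: assert(q != NULL); assert(q->count < DEMO_QUEUE_CAPACITY);
   After: each precondition becomes an if check that reports the failure
   to the caller instead of terminating the process. */
demo_queue_status demo_queue_push(demo_queue *q, int item)
{
    if (q == NULL)
        return DEMO_BAD_ARGUMENT;
    if (q->count >= DEMO_QUEUE_CAPACITY)
        return DEMO_QUEUE_FULL;

    q->items[q->count++] = item;
    return DEMO_OK;
}
```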

Don't build libespeak-ng code in speak-ng

The speak-ng program should link to the static library.

Don't include StdAfx.h in every file.

This is an artifact of the Windows Visual Studio build support for pre-compiled headers. The pre-compiled headers are only used for including a lean and mean version of windows.h. Thus, in the places where windows.h is actually used, it should be included within an #ifdef PLATFORM_WINDOWS block. The Visual Studio project files should use the "not using pre-compiled headers" option.

Reimplement the SAPI bindings.

The SAPI bindings should be rewritten to avoid the Copyright assignments to Microsoft. The rewrite should also look to improve the SAPI binding in general.

General Infrastructure:

Refer to external versions of the SAPI libraries.
Don't include generated files from midl, etc.
Use a more modern project format (MSBuild) than Visual Studio 6!
Avoid using pre-compiled headers (see #6).
Implement the SAPI COM interfaces in libespeak-ng.dll, not in a separate DLL.
Don't depend on the ATL library -- only use the Windows headers.

SAPI Interfaces:

ISpObjectWithToken
ISpTTSEngine

ISpTTSEngine::Speak SPVTEXTFRAG state actions:

Voice Management:

Select and install voices from the MSI installer -- language+accent.
Select the voice variant from an eSpeak SAPI configuration/properties page.

Create a "jonathan" voice from the existing phoneme table data

The existing espeak phoneme table voice data should be restructured into src/voices/jonathan. It should be capable of speaking as many IPA phonemes as possible.

Once the project has been restructured to make use of this voice profile, the existing voice data should be removed.

Start with the English voice;
Build out to other English accents;
Extend to support other language phonemes on a language-by-language basis.

Support SMS encoded Greek

SMS encoded Greek is uppercase Latin and Greek characters, where the Latin characters are visually identical to the Greek characters (e.g. using A for alpha).

NOTE: This will require heuristics as the Android platform, etc. will map the SMS characters to Unicode, mixing the Latin and Greek scripts.
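One possible heuristic, sketched here purely as an assumption: rewrite the Latin lookalikes only in words that already contain at least one genuine Greek capital, so that ordinary uppercase Latin words are left alone. The function names are hypothetical and the input is assumed to be already decoded into Unicode code points.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Map an uppercase Latin letter to the Greek capital it resembles, or return
   the code point unchanged when there is no lookalike. */
uint32_t demo_latin_to_greek(uint32_t cp)
{
    switch (cp) {
    case 'A': return 0x0391; /* alpha */
    case 'B': return 0x0392; /* beta */
    case 'E': return 0x0395; /* epsilon */
    case 'Z': return 0x0396; /* zeta */
    case 'H': return 0x0397; /* eta */
    case 'I': return 0x0399; /* iota */
    case 'K': return 0x039A; /* kappa */
    case 'M': return 0x039C; /* mu */
    case 'N': return 0x039D; /* nu */
    case 'O': return 0x039F; /* omicron */
    case 'P': return 0x03A1; /* rho */
    case 'T': return 0x03A4; /* tau */
    case 'Y': return 0x03A5; /* upsilon */
    case 'X': return 0x03A7; /* chi */
    default:  return cp;
    }
}

/* True for the Greek capital letters block U+0391..U+03A9. */
static bool demo_is_greek_capital(uint32_t cp)
{
    return cp >= 0x0391 && cp <= 0x03A9;
}

/* Rewrite a word in place if it looks like SMS Greek.
   Returns true when the word was treated as Greek. */
bool demo_normalize_sms_greek(uint32_t *word, size_t length)
{
    bool has_greek = false;
    for (size_t i = 0; i < length; i++) {
        if (demo_is_greek_capital(word[i]))
            has_greek = true;
    }
    if (!has_greek)
        return false;

    for (size_t i = 0; i < length; i++)
        word[i] = demo_latin_to_greek(word[i]);
    return true;
}
```

This heuristic misses Greek words spelled entirely with lookalike letters, so a real implementation would likely need additional context such as the active voice language.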

Reference: https://en.wikipedia.org/wiki/GSM_03.38#GSM_7-bit_default_alphabet_and_extension_table_of_3GPP_TS_23.038_.2F_GSM_03.38

Make the project pass Coverity scanning.

The eSpeak project has a lot of issues identified by Coverity. The espeak-ng code should pass a Coverity scan.

Use only one implementation of functions like Free

These functions are duplicated in libespeak-ng, espeak-ng and speak-ng. There should only be one version of each of these functions.

Make the project pass clang scan-build scanning.

The espeak codebase should pass the clang static analysis checks.

Documentation: specify allowed number range of rule groups

In Text to Phoneme Translation it is said that rule groups can be numbered from 01 to 25, but in the dictsource .._rules files the group numbers go up to 38.
Please update the documentation to state the range that is actually allowed.

Move the compiledata code from espeakedit to libespeak-ng

This requires porting the code from wxWidgets to C.

This allows the espeak program to support building the phoneme table and intonation data, making it possible to build espeak without espeakedit on a headless (GUI-less) system.

Support selecting alternative voice data

The voice data is the phoneme tables and intonations. This would allow selecting a different location for this data:

if the location is a directory, it loads the files separately from that directory (as is done currently, but allowing a different location than the espeak-data path);
if the location is a file, this loads a combined data file that contains voice metadata and the phonemetable and intonation data files.

Access to this functionality should be provided by the C API and command-line interface.

Use the system implementation of the sonic library.

This will allow newer versions of sonic to be used and incorporate fixes from distributions.

Regression: espeak-ng 5d295bd is incorrectly slowed

Tested on commit 5d295bd.
Listen: http://odo.lv/ftp/temp/broken.wav.mp3
It looks like the lengthening phoneme : is much too long for all vowels.
The last good pronunciation is at commit 5bbc0d3.

Track the quality and maintainer of a language within the language data.

The documentation for espeak contains a description of the available languages, along with an assessment of their quality. This is not maintainable, as the supported languages are continually being updated and new languages added.

This change will add metadata to the language files that describes their quality/maturity, indicating how thoroughly native speakers have assessed the pronunciations. The current maintainer field (or unmaintained if there is none) will track whether the voice is being actively improved.

Remove redundant `end of ...` comments

Various functions inconsistently have {//===... at the start and } // end of ... comments. These make the code harder to read and should be removed.

Don't depend on a modified praat program to construct the spectral phoneme files.

The way that eSpeak is designed, it uses a modified version of praat to determine the spectral parameters for a phoneme file. This should not be needed -- it should be possible to do this within espeak-ng itself (or the associated GUI voice editor).

Support phoneme transcriptions in the phoneme definitions

The phoneme definitions should provide:

ipa -- the Unicode IPA transcription;
ascii -- voice specific ascii transcription (sampa, kirshenbaum, espeak, x-sampa, cxs, etc.);
kirshenbaum features -- the phoneme features based on Evan Kirshenbaum's ASCII-IPA paper, with the Cainteoir Text-to-Speech extensions to cover all of IPA.

The ascii transcription used should be specified within the voice definition file.

NOTE: The kirshenbaum/cainteoir phoneme features should be documented in a markdown file in the docs directory.

Declare variables at their first point of use.

The eSpeak codebase uses old-style C89 variable declaration constraints. It should declare the variables at the point they are first used to make the code easier to read and maintain.

Create a src/compat/getopt.[hc] file for getopt compatibility.

On systems that need a getopt implementation (e.g. Windows), the getopt compatibility helper should be moved to a separate src/compat/getopt.[hc] file. This allows the compatibility code to be shared between espeak-ng and speak-ng, and allows a standard (well tested) implementation of the compatibility code to be used.

Support phoneme input modes

The espeak command line and API should provide the following input modes:

text;
SSML tags;
HTML tags;
SSML and HTML tags;
IPA phoneme transcription;
ASCII phoneme transcription (voice-specific transcriptions).

autogen.sh warnings

On (L)ubuntu 14.04, when autogen.sh is invoked, the following warnings are shown:

libtoolize: Consider adding `-I m4' to ACLOCAL_AMFLAGS in Makefile.am.
Makefile.am:54: warning: '%'-style pattern rules are a GNU make extension
Makefile.am:58: warning: '%'-style pattern rules are a GNU make extension

Implement a voice and language editor in Qt.

The espeakedit program requires wxWidgets to build. This dependency should be removed, such that:

the parts of espeakedit used to build the phoneme tables and intonation files should be rewritten in C and exposed in the espeak-ng program itself (issue #18);
the remaining espeakedit code should be removed (issue #18).

Implementing the editor functionality from scratch avoids the complexities of the espeakedit code (it relies on accessing internal APIs) and of porting from wxWidgets to Qt. It also allows the editor to be redesigned so that voices and languages are easy to create, edit and test graphically.

NOTE: It should be possible to create, edit and test voices and languages on a command line without needing to use the GUI. The GUI should just make the process easier.

Support Mac OSX

The main issue here is reworking the event logic to use Mac compatible versions of the sem_ functions, e.g. via POSIX APIs.
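One way to approach this, sketched under the assumption that the event logic only needs init, post, wait and try-wait: build a small counting semaphore from a pthread mutex and condition variable, which are available on Mac OS X as well as other POSIX systems. The `demo_sem` type and its functions are hypothetical and are not part of the espeak-ng codebase.

```c
#include <pthread.h>

/* Counting semaphore built only from pthread primitives. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t nonzero; /* signalled whenever count becomes positive */
    unsigned int count;
} demo_sem;

/* Returns 0 on success or a pthread error number. */
int demo_sem_init(demo_sem *s, unsigned int value)
{
    int rc = pthread_mutex_init(&s->lock, NULL);
    if (rc != 0)
        return rc;
    rc = pthread_cond_init(&s->nonzero, NULL);
    if (rc != 0) {
        pthread_mutex_destroy(&s->lock);
        return rc;
    }
    s->count = value;
    return 0;
}

/* Increment the count and wake one waiter. */
int demo_sem_post(demo_sem *s)
{
    pthread_mutex_lock(&s->lock);
    s->count++;
    pthread_cond_signal(&s->nonzero);
    pthread_mutex_unlock(&s->lock);
    return 0;
}

/* Block until the count is positive, then decrement it. */
int demo_sem_wait(demo_sem *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->count == 0)
        pthread_cond_wait(&s->nonzero, &s->lock);
    s->count--;
    pthread_mutex_unlock(&s->lock);
    return 0;
}

/* Decrement without blocking; returns -1 if the count is zero. */
int demo_sem_trywait(demo_sem *s)
{
    int result = -1;
    pthread_mutex_lock(&s->lock);
    if (s->count > 0) {
        s->count--;
        result = 0;
    }
    pthread_mutex_unlock(&s->lock);
    return result;
}

void demo_sem_destroy(demo_sem *s)
{
    pthread_cond_destroy(&s->nonzero);
    pthread_mutex_destroy(&s->lock);
}
```

A timed wait could be added in the same style with pthread_cond_timedwait, if the event code turns out to need one.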

Rework the C API and provide espeak compatibility.

The speak_lib.h API should be redesigned to better fit the usage, provide more detailed status codes, etc. and be placed in espeak_ng.h. This will allow the eSpeak NG API to be used independently of the eSpeak API, and make it possible to evolve the API to meet the needs of eSpeak NG to provide new features.

For eSpeak compatibility, the speak_lib.h methods should be implemented in speak_lib.c -- these should forward to the espeak_ng.h APIs, map the status codes, etc.
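The forwarding idea might look roughly like this. Everything prefixed `demo_` is hypothetical: the enums stand in for the new and legacy status types, and `demo_ng_set_rate` stands in for whatever espeak_ng.h function a legacy call would forward to. The numeric values and the valid rate range are illustrative only.

```c
/* Hypothetical detailed status codes for the new API. */
typedef enum {
    DEMO_NG_OK = 0,
    DEMO_NG_BUFFER_FULL,
    DEMO_NG_NOT_FOUND,
    DEMO_NG_OUT_OF_RANGE
} demo_ng_status;

/* Hypothetical coarse error codes for the legacy API. */
typedef enum {
    DEMO_LEGACY_OK = 0,
    DEMO_LEGACY_INTERNAL_ERROR = -1,
    DEMO_LEGACY_BUFFER_FULL = 1,
    DEMO_LEGACY_NOT_FOUND = 2
} demo_legacy_error;

/* Collapse the detailed codes onto the smaller legacy set. */
demo_legacy_error demo_map_status(demo_ng_status status)
{
    switch (status) {
    case DEMO_NG_OK:          return DEMO_LEGACY_OK;
    case DEMO_NG_BUFFER_FULL: return DEMO_LEGACY_BUFFER_FULL;
    case DEMO_NG_NOT_FOUND:   return DEMO_LEGACY_NOT_FOUND;
    default:                  return DEMO_LEGACY_INTERNAL_ERROR;
    }
}

/* Hypothetical new-style call: validates its input and reports details. */
demo_ng_status demo_ng_set_rate(int words_per_minute)
{
    if (words_per_minute < 80 || words_per_minute > 450)
        return DEMO_NG_OUT_OF_RANGE;
    return DEMO_NG_OK;
}

/* Legacy entry point: a thin wrapper that forwards and maps the result. */
demo_legacy_error demo_legacy_set_rate(int words_per_minute)
{
    return demo_map_status(demo_ng_set_rate(words_per_minute));
}
```

Keeping the wrapper this thin means new status codes only need one new line in the mapping function.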

Reformat the code with a consistent style.

The code should be reformatted to:

Use consistent indentation;
Use a space after if, return, etc.;
Use return x instead of return(x).

Other style improvements should be applied to reflect modern C practices.

Use stdbool.h instead of int.

The eSpeak project uses int/0/1 for boolean values. The stdbool.h header and bool/true/false should be used instead. This will help with the readability and maintainability of the code.
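For illustration, here is a small predicate written with stdbool.h. The functions are hypothetical examples rather than code from the project; the point is only that the return type now states the intent.

```c
#include <stdbool.h>
#include <stddef.h>

/* Before: int demo_is_vowel(char c) returning 0 or 1. */
bool demo_is_vowel(char c)
{
    switch (c) {
    case 'a': case 'e': case 'i': case 'o': case 'u':
        return true;
    default:
        return false;
    }
}

/* True if the word contains at least one lowercase ASCII vowel. */
bool demo_has_vowel(const char *word)
{
    if (word == NULL)
        return false;
    for (size_t i = 0; word[i] != '\0'; i++) {
        if (demo_is_vowel(word[i]))
            return true;
    }
    return false;
}
```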

Dutch language improvements

Although some improvements in the Dutch language have been included in the original espeak in 2012 and 2013, Dutch is still far from ideal. Things which have to be improved:

The [r] phoneme, which is currently a trilled rhotic but doesn't sound as it should. Furthermore, Dutch has several accepted phonemes for words with 'r'.
The [Q] phoneme in words with 'g' doesn't sound as it should either.
[v] is used for words with 'v', which makes 'v' sound like [w]. It is not possible to distinguish 'van' from 'wan', for example.
Pronunciation of several English and French words which have been incorporated into the Dutch language.

In this branch, I started work on ch/g. The only thing I have done so far is rename the [x2] phoneme to [x], making words with [x] use the [x2] phoneme in Afrikaans. Please let me know whether this is the desired behavior, or whether I should change nl-rules and nl-list to use [x2] instead, thereby abandoning [x].

Support emoticons and emoji symbols (Zsye).

There are 3 types of emoticons/emoji that can be supported:

special punctuation/symbol sequences like :);
Unicode characters like 😃 (smiling face with open mouth);
using emoji shortcodes like :smile:.

Ideally, the punctuation sequences and emoji shortcodes should be mapped to the Unicode characters, and those characters specify the text of how they are pronounced (e.g. "smiling face"). The Unicode character support should be possible, but I suspect the others would need modifications to espeak's text analysis logic to detect the emoticon/emoji sequences -- I don't currently understand this logic too well, so would need some time understanding how it works.
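The two-stage mapping described above could be sketched as a pair of lookup tables: aliases resolve to a code point, and the code point resolves to the spoken text. The table contents are illustrative only and are not taken from any published shortcode list; the `demo_` names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    const char *sequence; /* punctuation sequence or shortcode */
    uint32_t codepoint;   /* Unicode character it stands for */
} demo_alias;

typedef struct {
    uint32_t codepoint;
    const char *spoken;   /* text to pronounce for this character */
} demo_name;

/* Illustrative entries only. */
static const demo_alias demo_aliases[] = {
    { ":)",      0x1F603 },
    { ":smile:", 0x1F603 },
};

static const demo_name demo_names[] = {
    { 0x1F603, "smiling face" },
};

/* Stage 1: resolve an alias to a code point, or 0 if unknown. */
uint32_t demo_alias_to_codepoint(const char *token)
{
    for (size_t i = 0; i < sizeof(demo_aliases) / sizeof(demo_aliases[0]); i++) {
        if (strcmp(demo_aliases[i].sequence, token) == 0)
            return demo_aliases[i].codepoint;
    }
    return 0;
}

/* Stage 2: resolve a code point to its spoken text, or NULL if unknown. */
const char *demo_codepoint_to_text(uint32_t codepoint)
{
    for (size_t i = 0; i < sizeof(demo_names) / sizeof(demo_names[0]); i++) {
        if (demo_names[i].codepoint == codepoint)
            return demo_names[i].spoken;
    }
    return NULL;
}

/* Both stages combined; returns NULL when the token is not an alias. */
const char *demo_spoken_text(const char *token)
{
    uint32_t codepoint = demo_alias_to_codepoint(token);
    return codepoint != 0 ? demo_codepoint_to_text(codepoint) : NULL;
}
```

Because only the second table holds language-specific text, the alias table could in principle be shared between languages.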

The other issue is sharing the punctuation sequence and emoji shortcode mappings, so the different languages don't need to duplicate those definitions. I don't currently know how possible that would be.

Pronouncing the Unicode characters can be tricky as well ("emoji" technically also cover emoticons and other characters like playing cards that are not necessarily part of the emoji block).

The Unicode characters can combine in complex ways. The flag "emoji" are an encoding of the 2-letter country code that flag represents (IT for the Italian flag, US for the American flag, GB for the British flag, etc.) -- all these permutations need supporting. Another complexity is the recent addition of skin tone modifiers.
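To make the flag encoding concrete: each flag is a pair of regional indicator symbols in the range U+1F1E6 to U+1F1FF, which correspond to the letters A to Z. A minimal decoder might look like the following; the function name is hypothetical, and checking that the result is an assigned ISO 3166-1 code is left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_REGIONAL_INDICATOR_A 0x1F1E6u
#define DEMO_REGIONAL_INDICATOR_Z 0x1F1FFu

/* True if the code point is a regional indicator symbol. */
static bool demo_is_regional_indicator(uint32_t cp)
{
    return cp >= DEMO_REGIONAL_INDICATOR_A && cp <= DEMO_REGIONAL_INDICATOR_Z;
}

/* Decode a pair of regional indicators into a 2-letter country code.
   Writes a NUL-terminated string into code and returns true on success. */
bool demo_flag_to_country(uint32_t first, uint32_t second, char code[3])
{
    if (!demo_is_regional_indicator(first) || !demo_is_regional_indicator(second))
        return false;

    code[0] = (char)('A' + (first - DEMO_REGIONAL_INDICATOR_A));
    code[1] = (char)('A' + (second - DEMO_REGIONAL_INDICATOR_A));
    code[2] = '\0';
    return true;
}
```

The resulting code could then be looked up in a per-language table of country names to produce the spoken text.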

Resources/References:

http://www.unicode.org/emoji/charts/full-emoji-list.html -- a list of emoji/emoticon characters;
http://www.emoji-cheat-sheet.com/ -- a list of emoji/emoticon shortcodes;
http://www.unicode.org/Public/emoji/1.0/emoji-data.txt -- information about emoji;
http://www.unicode.org/reports/tr51/tr51-2.html -- Unicode emoji technical report;
http://cldr.unicode.org/ -- Unicode Common Locale Data Repository (includes TTS name annotations for many emoji in several languages, including Italian);
http://emojipedia.org/ -- a catalogue of emoji;
http://www.unicode.org/Public/8.0.0/ucd/UnicodeData.txt (1.5Mb) -- includes the names of all the Unicode characters (including emoji);
http://www.unicode.org/Public/8.0.0/charts/CodeCharts.pdf (98Mb) -- the Unicode code charts, including the emoji, emoticon and other symbol charts;
https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2 -- the 2-letter country codes for the flag emoji.

Fix the portability of `strcasecmp`.

Windows does not provide strings.h, needed for strcasecmp. This should be checked via configure checks and wrapped in #ifdef HAVE_STRINGS_H.

The strcasecmp function should also be wrapped in a configure check. On Windows systems, _stricmp should be used instead, e.g. via:

#define strcasecmp _stricmp

in a Windows-specific config.h file.
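A sketch of how the pieces could fit together in one compatibility header. It assumes HAVE_STRINGS_H and HAVE_STRCASECMP are macros that a configure step would define, and `demo_strcasecmp` is a hypothetical wrapper name. The plain C fallback is an addition for systems that provide neither function and is not something the issue asks for.

```c
#include <ctype.h>
#include <string.h>

#if defined(HAVE_STRINGS_H)
#include <strings.h> /* declares strcasecmp on POSIX systems */
#endif

#if defined(HAVE_STRCASECMP)
#define demo_strcasecmp strcasecmp
#elif defined(_WIN32)
#define demo_strcasecmp _stricmp /* Windows equivalent */
#else
/* Fallback: compare byte by byte after folding ASCII case. */
int demo_strcasecmp(const char *a, const char *b)
{
    for (;; a++, b++) {
        int ca = tolower((unsigned char)*a);
        int cb = tolower((unsigned char)*b);
        if (ca != cb || ca == '\0')
            return ca - cb;
    }
}
#endif
```

Callers would then use the wrapper name everywhere, so no `#if` checks leak into the main source files.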

See: 603f046#commitcomment-15038137

Provide portable logic to check if a directory exists.

The current directory checking function in src/compiledata.c needs to be made portable.

On POSIX systems, it should use access, then use stat and S_IFDIR to check if it is actually a directory. On Windows, it should use GetFileAttributes.
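The approach above can be sketched as a single helper with one branch per platform. `demo_is_directory` is a hypothetical name, and the POSIX branch uses the S_ISDIR macro, which tests the same S_IFDIR file type bits.

```c
#include <stdbool.h>
#include <stddef.h>

#if defined(_WIN32)
#include <windows.h>
#else
#include <sys/stat.h>
#include <unistd.h>
#endif

/* Return true only if path exists and refers to a directory. */
bool demo_is_directory(const char *path)
{
    if (path == NULL)
        return false;

#if defined(_WIN32)
    DWORD attributes = GetFileAttributesA(path);
    return attributes != INVALID_FILE_ATTRIBUTES
        && (attributes & FILE_ATTRIBUTE_DIRECTORY) != 0;
#else
    struct stat info;
    if (access(path, F_OK) != 0) /* does the path exist at all? */
        return false;
    if (stat(path, &info) != 0)  /* can its metadata be read? */
        return false;
    return S_ISDIR(info.st_mode);
#endif
}
```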

See: 8218109#commitcomment-15038561

Convert the HTML documentation to markdown

The docs directory contains documentation in HTML format. This should be converted to markdown, with the index linked from the README.md file. The docs can be built using kramdown.

Latest espeak requires a C11 compiler to compile

Latest espeak doesn't compile:

...
src/libespeak-ng/speak_lib.c:148:4: warning: ISO C90 forbids mixed declarations and code [-Wpedantic]
    espeak_ERROR a_error = event_declare(event);
    ^
src/libespeak-ng/speak_lib.c: In function 'sync_espeak_terminated_msg':
src/libespeak-ng/speak_lib.c:225:2: warning: ISO C90 forbids mixed declarations and code [-Wpedantic]
  int finished=0;
  ^
src/libespeak-ng/speak_lib.c: In function 'MarkerEvent':
src/libespeak-ng/speak_lib.c:570:2: warning: ISO C90 forbids mixed declarations and code [-Wpedantic]
  espeak_EVENT *ep;
  ^
src/libespeak-ng/speak_lib.c: In function 'sync_espeak_Synth':
src/libespeak-ng/speak_lib.c:632:2: error: 'for' loop initial declarations are only allowed in C99 mode
  for (int i=0; i < N_SPEECH_PARAM; i++)
  ^
src/libespeak-ng/speak_lib.c:632:2: note: use option -std=c99 or -std=gnu99 to compile your code
src/libespeak-ng/speak_lib.c: In function 'espeak_Initialize':
src/libespeak-ng/speak_lib.c:772:2: warning: ISO C90 forbids mixed declarations and code [-Wpedantic]
  int param;
  ^
src/libespeak-ng/speak_lib.c: In function 'espeak_Synth':
src/libespeak-ng/speak_lib.c:854:2: warning: ISO C90 forbids mixed declarations and code [-Wpedantic]
  espeak_ERROR a_error=EE_INTERNAL_ERROR;
  ^
src/libespeak-ng/speak_lib.c:870:2: warning: ISO C90 forbids mixed declarations and code [-Wpedantic]
  t_espeak_command* c1 = create_espeak_text(text, size, position, position_type, end_position, flags, user_data);
  ^
src/libespeak-ng/speak_lib.c:876:2: warning: ISO C90 forbids mixed declarations and code [-Wpedantic]
  t_espeak_command* c2 = create_espeak_terminated_msg(*unique_identifier, user_data);
  ^
src/libespeak-ng/speak_lib.c: In function 'espeak_Synth_Mark':
src/libespeak-ng/speak_lib.c:935:2: warning: ISO C90 forbids mixed declarations and code [-Wpedantic]
  t_espeak_command* c1 = create_espeak_mark(text, size, index_mark, end_position,
  ^
src/libespeak-ng/speak_lib.c:942:2: warning: ISO C90 forbids mixed declarations and code [-Wpedantic]
  t_espeak_command* c2 = create_espeak_terminated_msg(*unique_identifier, user_data);
  ^
src/libespeak-ng/speak_lib.c: In function 'espeak_Key':
src/libespeak-ng/speak_lib.c:977:2: warning: ISO C90 forbids mixed declarations and code [-Wpedantic]
  espeak_ERROR a_error = EE_OK;
  ^
src/libespeak-ng/speak_lib.c:986:2: warning: ISO C90 forbids mixed declarations and code [-Wpedantic]
  t_espeak_command* c = create_espeak_key( key, NULL);
  ^
src/libespeak-ng/speak_lib.c: In function 'espeak_Char':
src/libespeak-ng/speak_lib.c:1009:2: warning: ISO C90 forbids mixed declarations and code [-Wpedantic]
  espeak_ERROR a_error;
  ^
src/libespeak-ng/speak_lib.c:1017:2: warning: ISO C90 forbids mixed declarations and code [-Wpedantic]
  t_espeak_command* c = create_espeak_char( character, NULL);
  ^
src/libespeak-ng/speak_lib.c: In function 'espeak_SetParameter':
src/libespeak-ng/speak_lib.c:1109:2: warning: ISO C90 forbids mixed declarations and code [-Wpedantic]
  espeak_ERROR a_error;
  ^
src/libespeak-ng/speak_lib.c:1117:2: warning: ISO C90 forbids mixed declarations and code [-Wpedantic]
  t_espeak_command* c = create_espeak_parameter(parameter, value, relative);
  ^
src/libespeak-ng/speak_lib.c: In function 'espeak_SetPunctuationList':
src/libespeak-ng/speak_lib.c:1138:2: warning: ISO C90 forbids mixed declarations and code [-Wpedantic]
  espeak_ERROR a_error;
  ^
src/libespeak-ng/speak_lib.c:1146:2: warning: ISO C90 forbids mixed declarations and code [-Wpedantic]
  t_espeak_command* c = create_espeak_punctuation_list( punctlist);
  ^
src/libespeak-ng/speak_lib.c: In function 'espeak_Cancel':
src/libespeak-ng/speak_lib.c:1219:2: error: 'for' loop initial declarations are only allowed in C99 mode
  for (int i=0; i < N_SPEECH_PARAM; i++)
  ^
make[1]: *** [src/libespeak-ng/src_libespeak_la-speak_lib.lo] Error 1
make[1]: Leaving directory `/home/valdis/code/espeak-ng'
make: *** [all] Error 2
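The two hard errors in the log come from declaring the loop counter inside the for statement. Besides passing -std=c99 or -std=gnu99 as the compiler note suggests, the loop can be written so that it also compiles in C90 mode. The sketch below uses a hypothetical function and a made-up value for the parameter count, since the real loop body and the value of N_SPEECH_PARAM are not shown in the log.

```c
/* Hypothetical stand-in for N_SPEECH_PARAM; the real value is not shown. */
#define DEMO_N_SPEECH_PARAM 15

/* Reset every parameter slot. Compiles as C90 as well as C99. */
void demo_reset_params(int params[DEMO_N_SPEECH_PARAM])
{
    int i; /* declared at the top of the block, as C90 requires */

    for (i = 0; i < DEMO_N_SPEECH_PARAM; i++)
        params[i] = 0;
}
```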