espeak-ng / espeak-ng

eSpeak NG is an open source speech synthesizer that supports more than a hundred languages and accents.

License: GNU General Public License v3.0

Shell 4.32% C 77.16% C++ 3.21% HTML 0.21% Makefile 1.61% M4 0.84% JavaScript 0.55% Java 9.40% Python 1.84% Vim Script 0.22% CMake 0.63%
espeak-ng espeak android text-to-speech speech-synthesis

espeak-ng's Issues

Move the current voice definition files to language definition files.

The voice files are closer to language files. These languages should be organised by the closest ISO 639-5 language family code (e.g. voices/europe/cy moving to language/cel/cy -- Celtic/Welsh).

Additionally, the languages should be BCP 47 compliant (e.g. en-GB-scotland instead of en-sc). Where extensions are needed, the private use tags from Cainteoir Text-to-Speech should be used. These should be described in a privateuse.dat file included in the espeak-ng project.

Define the MBROLA voices as primary phoneme table data

The MBROLA voice phoneme maps should be removed -- these should be replaced by MBROLA voice specific phoneme tables. These should support:

  1. Mapping standard phonemes (consonants, vowels, affricates and diphthongs);
  2. Long and short variants (for vowels);
  3. Constructed phonemes (for greater phoneme coverage);
  4. Phoneme aliases (for greater language support).

It should be possible to modify the phoneme sequence to account for missing diphones (e.g. the h-6 diphone on the German voices).

Remove commented/#ifdef'd out code.

This code is not used by the program and remains available in the source control history. Therefore, it should be removed to make the code more readable.

Make all error paths return an espeak_ng_STATUS code.

Currently, not all code paths inherited from the espeak codebase return the correct error code. The code should be checked so that:

  1. Calls that can fail correctly check and return a valid espeak_ng_STATUS code;
  2. Calls that fail processing a file should set an espeak_ng_ERROR_CONTEXT object.

This will ensure that the underlying error is correctly preserved.
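The two rules above can be sketched as follows. This is only an illustration: `demo_status` and `demo_error_context` are hypothetical stand-ins for espeak_ng_STATUS and espeak_ng_ERROR_CONTEXT, whose actual definitions are not given in this issue.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for espeak_ng_STATUS and espeak_ng_ERROR_CONTEXT. */
typedef enum {
    DEMO_OK = 0,
    DEMO_FILE_ERROR,
    DEMO_READ_ERROR
} demo_status;

typedef struct {
    char path[256];   /* file being processed when the failure occurred */
    int saved_errno;  /* errno captured at the point of failure */
} demo_error_context;

/* Read the first byte of a file, reporting every failure through the
 * return value and recording file failures in the error context. */
demo_status demo_read_first_byte(const char *path, unsigned char *out,
                                 demo_error_context *ctx)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL) {
        if (ctx != NULL) {
            ctx->saved_errno = errno;
            strncpy(ctx->path, path, sizeof(ctx->path) - 1);
            ctx->path[sizeof(ctx->path) - 1] = '\0';
        }
        return DEMO_FILE_ERROR;
    }

    int c = fgetc(f);
    fclose(f);
    if (c == EOF)
        return DEMO_READ_ERROR;

    *out = (unsigned char)c;
    return DEMO_OK;
}
```

The caller can then decide how to report the failure, using the recorded path and errno value rather than a generic error.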

speech.h should not hardcode platform-specific defines

Currently at master 36be9ac, src/libespeak-ng/speech.h line 41 hardcodes the PLATFORM_POSIX define.
This makes it impossible to compile on non-POSIX platforms such as Windows, even when contrary PLATFORM_* defines are provided on the compiler command line.
Perhaps it is a work in progress to move the code across to some kind of configure.h.in or equivalent, but for now this completely breaks compilation of libespeak-ng on Windows for NVDA.

Note that after commenting this line out, addressing the fclose bug in LoadSpectSeq, handling NULL log FILE arguments in compilePhonemeData and compileIntonations, and addressing the phsource directory check in compilePhonemeData2, libespeak-ng compiles successfully and is used by NVDA.

Rebrand the espeak program to espeak-ng.

The program, library, environment variables, etc. should reflect the project being espeak-ng. This will help differentiate it from the upstream espeak project, should Jonathan continue to make releases to it, allowing both to run on a system. It also helps to avoid confusion.

Clean up the portability support for espeak.

Currently, the espeak portability is a mix of PLATFORM_* checks and HAVE_* checks.

The portability approach should be like how libressl handles portability. That is:

  1. Define POSIX and ISO C headers needing compatibility in src/include/compat.
  2. Move the header-specific compatibility checks into those compatibility headers.
  3. Create a set of src/compat/API_NAME.c compatibility implementations.
  4. Add the compatibility sources that are needed for a given platform.

This will keep the main source code clean and free of #if ... checks.
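As a rough sketch of step 3, a compatibility implementation file might look like the following. The file name (src/compat/strndup.c), the `HAVE_STRNDUP` check and the `compat_strndup` name are all assumptions chosen for illustration; the issue does not say which APIs need fallbacks.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical src/compat/strndup.c, added to the build only on platforms
 * where a configure check did not define HAVE_STRNDUP. The matching
 * compatibility header would map the standard name onto this fallback,
 * so the main sources never need an #if check. */
char *compat_strndup(const char *s, size_t n)
{
    /* Measure at most n characters without reading past the terminator. */
    size_t len = 0;
    while (len < n && s[len] != '\0')
        len++;

    char *copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;

    memcpy(copy, s, len);
    copy[len] = '\0';
    return copy;
}
```

Keeping each fallback in its own file makes it easy to add only the sources a given platform needs, as described in step 4.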

Never call assert in the library codebase.

Various places in the codebase (e.g. the async thread code) call assert on errors. This will cause the calling application to exit abruptly.

These assert checks should be made into if checks that return espeak_ng_STATUS error codes that are propagated to the caller.
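A minimal sketch of the conversion, assuming a hypothetical `queue_status` enum in place of espeak_ng_STATUS and an illustrative function that is not taken from the espeak-ng sources:

```c
#include <stddef.h>

/* Hypothetical status type standing in for espeak_ng_STATUS. */
typedef enum {
    QUEUE_OK = 0,
    QUEUE_INVALID_ARGUMENT
} queue_status;

/* Before: assert(queue != NULL); would abort the host application.
 * After: the failure is reported to the caller, who can recover. */
queue_status queue_length(const int *queue, size_t capacity, size_t *length)
{
    if (queue == NULL || length == NULL)
        return QUEUE_INVALID_ARGUMENT;

    size_t n = 0;
    while (n < capacity && queue[n] != 0)  /* 0 marks the end of the queue */
        n++;

    *length = n;
    return QUEUE_OK;
}
```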

Don't include StdAfx.h in every file.

This is an artifact of the Windows Visual Studio build support for pre-compiled headers. The pre-compiled headers are only used for including a lean and mean version of windows.h. Thus, in the places where windows.h is actually used, it should be included within an #ifdef PLATFORM_WINDOWS block. The Visual Studio project files should use the "not using pre-compiled headers" option.

Reimplement the SAPI bindings.

The SAPI bindings should be rewritten to avoid the Copyright assignments to Microsoft. The rewrite should also look to improve the SAPI binding in general.

General Infrastructure:

  • Refer to external versions of the SAPI libraries.
  • Don't include generated files from midl, etc.
  • Use a more modern project format (MSBuild) than Visual Studio 6!
  • Avoid using pre-compiled headers (see #6).
  • Implement the SAPI COM interfaces in libespeak-ng.dll, not in a separate DLL.
  • Don't depend on the ATL library -- only use the Windows headers.

SAPI Interfaces:

  • ISpObjectWithToken
  • ISpTTSEngine

ISpTTSEngine::Speak SPVTEXTFRAG state actions:

  • SPVA_Speak
  • SPVA_Silence
  • SPVA_Pronounce
  • SPVA_Bookmark
  • SPVA_SpellOut
  • SPVA_ParseUnknownTag

Voice Management:

  • Select and install voices from the MSI installer -- language+accent.
  • Select the voice variant from an eSpeak SAPI configuration/properties page.

Create a "jonathan" voice from the existing phoneme table data

The existing espeak phoneme table voice data should be restructured into a src/voices/jonathan voice profile. It should be capable of speaking as many IPA phonemes as possible.

Once the project has been restructured to make use of this voice profile, the existing voice data should be removed.

  1. Start with the English voice;
  2. Build out to other English accents;
  3. Extend to support other language phonemes on a language-by-language basis.

Move the compiledata code from espeakedit to libespeak-ng

This requires porting the code from wxWidgets to C.

This allows the espeak program to support building the phoneme table and intonation data, making it possible to build espeak without espeakedit on a headless (GUI-less) system.

Support selecting alternative voice data

The voice data is the phoneme tables and intonations. This would allow selecting a different location for this data:

  1. if the location is a directory, it loads the files separately from that directory (as is done currently, but allowing a different location than the espeak-data path);
  2. if the location is a file, this loads a combined data file that contains voice metadata and the phonemetable and intonation data files.
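The directory/file distinction above could be detected along these lines. This sketch assumes a POSIX system (it uses `stat`), and the `voice_data_kind` type and `classify_voice_data` function are hypothetical names, not part of any existing API.

```c
#include <sys/stat.h>

/* Hypothetical classification of a voice data location. */
typedef enum {
    VOICE_DATA_MISSING,        /* path does not exist or is unsupported */
    VOICE_DATA_DIRECTORY,      /* separate phoneme/intonation files */
    VOICE_DATA_COMBINED_FILE   /* single file with metadata and data */
} voice_data_kind;

voice_data_kind classify_voice_data(const char *location)
{
    struct stat st;
    if (stat(location, &st) != 0)
        return VOICE_DATA_MISSING;
    if (S_ISDIR(st.st_mode))
        return VOICE_DATA_DIRECTORY;
    if (S_ISREG(st.st_mode))
        return VOICE_DATA_COMBINED_FILE;
    return VOICE_DATA_MISSING;
}
```

A loader would then branch on the result, choosing between the existing per-file loading and a combined-file reader.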

Access to this functionality should be provided by the C API and command-line interface.

Track the quality and maintainer of a language within the language data.

The documentation for espeak contains a description of available languages, along with an assessment of their quality. This is not maintainable, as the supported languages are continually being updated and new languages added.

This change will add metadata to the language files that describes their quality/maturity, indicating how well native speakers have assessed the pronunciations. A maintainer field (or "unmaintained" if there is none) will track whether the voice is being actively improved.

Support phoneme transcriptions in the phoneme definitions

The phoneme definitions should provide:

  1. ipa -- the Unicode IPA transcription;
  2. ascii -- voice specific ascii transcription (sampa, kirshenbaum, espeak, x-sampa, cxs, etc.);
  3. kirshenbaum features -- the phoneme features based on Evan Kirshenbaum's ASCII-IPA paper, with the Cainteoir Text-to-Speech extensions to cover all of IPA.

The ascii transcription used should be specified within the voice definition file.

NOTE: The kirshenbaum/cainteoir phoneme features should be documented in a markdown file in the docs directory.

Declare variables at their first point of use.

The eSpeak codebase uses old-style C89 variable declaration constraints. It should declare the variables at the point they are first used to make the code easier to read and maintain.
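For illustration, here is the same function written in both styles. The functions are made up for this example and do not come from the eSpeak sources.

```c
#include <stddef.h>

/* C89 style: every variable is declared at the top of the function. */
int sum_c89(const int *values, size_t count)
{
    size_t i;
    int total;

    total = 0;
    for (i = 0; i < count; i++)
        total += values[i];
    return total;
}

/* C99 style: each variable is declared where it is first needed,
 * so its scope and initial value are visible at the point of use. */
int sum_c99(const int *values, size_t count)
{
    int total = 0;
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}
```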

Create a src/compat/getopt.[hc] file for getopt compatibility.

On systems that lack getopt (e.g. Windows), the getopt compatibility helper should be moved to a separate src/compat/getopt.[hc] file. This allows the compatibility code to be shared between espeak-ng and speak-ng, and allows a standard (well tested) implementation of the compatibility code to be used.

Support phoneme input modes

The espeak command line and API should provide the following input modes:

  1. text;
  2. SSML tags;
  3. HTML tags;
  4. SSML and HTML tags;
  5. IPA phoneme transcription;
  6. ASCII phoneme transcription (voice-specific transcriptions).

autogen.sh warnings

On (L)ubuntu 14.04, when autogen.sh is invoked, the following warnings are shown:

libtoolize: Consider adding `-I m4' to ACLOCAL_AMFLAGS in Makefile.am.
Makefile.am:54: warning: '%'-style pattern rules are a GNU make extension
Makefile.am:58: warning: '%'-style pattern rules are a GNU make extension

Implement a voice and language editor in Qt.

The espeakedit program requires wxWidgets to build. This dependency should be removed, such that:

  1. the parts of espeakedit used to build the phoneme tables and intonation files should be rewritten in C and exposed in the espeak-ng program itself (issue #18);
  2. the remaining espeakedit code should be removed (issue #18).

Implementing the editor functionality from scratch avoids the complexities of the espeakedit code (it relies on accessing internal APIs) and of porting from wxWidgets to Qt. It also allows the editor to be redesigned so that voices and languages are easy to create, edit and test graphically.

NOTE: It should be possible to create, edit and test voices and languages on a command line without needing to use the GUI. The GUI should just make the process easier.

Support Mac OSX

The main issue here is reworking the event logic to use Mac compatible versions of the sem_ functions, e.g. via POSIX APIs.

Rework the C API and provide espeak compatibility.

The speak_lib.h API should be redesigned to better fit the usage, provide more detailed status codes, etc. and be placed in espeak_ng.h. This will allow the eSpeak NG API to be used independently of the eSpeak API, and make it possible to evolve the API to meet the needs of eSpeak NG to provide new features.

For eSpeak compatibility, the speak_lib.h methods should be implemented in speak_lib.c -- these should forward to the espeak_ng.h APIs, map the status codes, etc.
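The forwarding layer could be shaped like the sketch below. All names here are hypothetical stand-ins: `ng_status` and `ng_set_voice` represent the new espeak_ng.h side, and `legacy_error` and `legacy_set_voice` represent the speak_lib.h side. The real enums and function signatures are not shown in this issue.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the new and the compatibility status codes. */
typedef enum { NG_OK, NG_NOT_FOUND, NG_BUFFER_FULL, NG_INTERNAL } ng_status;
typedef enum {
    LEGACY_OK,
    LEGACY_NOT_FOUND,
    LEGACY_BUFFER_FULL,
    LEGACY_INTERNAL_ERROR
} legacy_error;

/* Stand-in for a new-style API call with detailed status codes. */
static ng_status ng_set_voice(const char *name)
{
    return (name != NULL && name[0] != '\0') ? NG_OK : NG_NOT_FOUND;
}

/* Map the detailed status onto the coarser compatibility codes. */
static legacy_error map_status(ng_status status)
{
    switch (status) {
    case NG_OK:          return LEGACY_OK;
    case NG_NOT_FOUND:   return LEGACY_NOT_FOUND;
    case NG_BUFFER_FULL: return LEGACY_BUFFER_FULL;
    default:             return LEGACY_INTERNAL_ERROR;
    }
}

/* Compatibility entry point: forwards to the new API and maps the result. */
legacy_error legacy_set_voice(const char *name)
{
    return map_status(ng_set_voice(name));
}
```

Keeping the mapping in one function means new status codes only need to be handled in a single place.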

Reformat the code with a consistent style.

The code should be reformatted to:

  1. Use consistent indentation;
  2. Use a space after if, return, etc.;
  3. Use return x instead of return(x).

Other style improvements should be applied to reflect modern C practices.

Use stdbool.h instead of int.

The eSpeak project uses int/0/1 for boolean values. The stdbool.h header and bool/true/false should be used instead. This will help with the readability and maintainability of the code.
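A small before/after illustration, using a made-up helper rather than code from the eSpeak sources:

```c
#include <stdbool.h>

/* Before: int is_vowel(char c) returning 0 or 1.
 * After: the return type states that the result is a truth value. */
bool is_vowel(char c)
{
    switch (c) {
    case 'a': case 'e': case 'i': case 'o': case 'u':
        return true;
    default:
        return false;
    }
}
```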

Dutch language improvements

Although some improvements in the Dutch language have been included in the original espeak in 2012 and 2013, Dutch is still far from ideal. Things which have to be improved:

  • The [r] phoneme, which is already a trilled rhotic, but doesn't sound as it should. Furthermore, in Dutch, we have several accepted phonemes for words with 'r'.
  • The [Q] phoneme in words with 'g' really doesn't sound as it should either
  • [v] is used for words with 'v', which makes 'v' sound like [w]. It is not possible to distinguish 'van' from 'wan', for example
  • Pronunciation of several English and French words which have been incorporated into the Dutch language

In this branch, I started work on ch/g. The only thing I have done so far is rename the [x2] phoneme to [x], making words with [x] use the [x2] phoneme in Afrikaans. Please let me know whether this is desired behavior, or whether I should change nl-rules and nl-list to have [x2] instead, thereby abandoning [x].

Support emoticons and emoji symbols (Zsye).

There are 3 types of emoticons/emoji that can be supported:

  1. special punctuation/symbol sequences like :);
  2. Unicode characters like 😃 (smiling face with open mouth);
  3. using emoji shortcodes like :smile:.

Ideally, the punctuation sequences and emoji shortcodes should be mapped to the Unicode characters, and those characters specify the text of how they are pronounced (e.g. "smiling face"). The Unicode character support should be possible, but I suspect the others would need modifications to espeak's text analysis logic to detect the emoticon/emoji sequences -- I don't currently understand this logic too well, so would need some time understanding how it works.

The other issue is sharing the punctuation sequence and emoji shortcode mappings, so the different languages don't need to duplicate those definitions. I don't currently know how possible that would be.

Pronouncing the Unicode characters can be tricky as well ("emoji" technically also cover emoticons and other characters like playing cards that are not necessarily part of the emoji block).

The Unicode characters can combine in complex ways. The flag "emoji" are an encoding of the 2-letter country code that flag represents (IT for the Italian flag, US for the American flag, GB for the British flag, etc.) -- all these permutations need supporting. Another complexity is the recent addition of skin tone modifiers.
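The flag encoding can be decoded mechanically: the regional indicator symbols occupy U+1F1E6 (A) through U+1F1FF (Z), so each pair maps directly onto a 2-letter code. A minimal sketch, where `flag_to_country_code` is a hypothetical helper name and the input is assumed to be already-decoded code points:

```c
#include <stdbool.h>
#include <stdint.h>

#define REGIONAL_INDICATOR_A 0x1F1E6u
#define REGIONAL_INDICATOR_Z 0x1F1FFu

/* Convert a pair of regional indicator code points to a 2-letter code.
 * Returns false if either code point is outside the regional indicator
 * range, in which case code is left untouched. */
bool flag_to_country_code(uint32_t first, uint32_t second, char code[3])
{
    if (first < REGIONAL_INDICATOR_A || first > REGIONAL_INDICATOR_Z ||
        second < REGIONAL_INDICATOR_A || second > REGIONAL_INDICATOR_Z)
        return false;

    code[0] = (char)('A' + (first - REGIONAL_INDICATOR_A));
    code[1] = (char)('A' + (second - REGIONAL_INDICATOR_A));
    code[2] = '\0';
    return true;
}
```

The resulting code could then be looked up in a per-language table of country names, so the flag sequences do not each need their own entry.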

Resources/References:

  1. http://www.unicode.org/emoji/charts/full-emoji-list.html -- a list of emoji/emoticon characters;
  2. http://www.emoji-cheat-sheet.com/ -- a list of emoji/emoticon shortcodes;
  3. http://www.unicode.org/Public/emoji/1.0/emoji-data.txt -- information about emoji;
  4. http://www.unicode.org/reports/tr51/tr51-2.html -- Unicode emoji technical report;
  5. http://cldr.unicode.org/ -- Unicode Common Locale Data Repository (includes TTS name annotations for many emoji in several languages, including Italian);
  6. http://emojipedia.org/ -- a catalogue of emoji;
  7. http://www.unicode.org/Public/8.0.0/ucd/UnicodeData.txt (1.5Mb) -- includes the names of all the Unicode characters (including emoji);
  8. http://www.unicode.org/Public/8.0.0/charts/CodeCharts.pdf (98Mb) -- the Unicode code charts, including the emoji, emoticon and other symbol charts;
  9. https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2 -- the 2-letter country codes for the flag emoji.

Fix the portability of `strcasecmp`.

Windows does not provide strings.h, needed for strcasecmp. This should be checked via configure checks and wrapped in #ifdef HAVE_STRINGS_H.

The strcasecmp function should also be wrapped in a configure check. On Windows systems, _stricmp should be used instead, e.g. via:

#define strcasecmp _stricmp

in a Windows-specific config.h file.

See: 603f046#commitcomment-15038137
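A standalone sketch of the idea follows. To keep it compilable without a generated config.h, it switches on the compiler-provided `_WIN32` macro rather than on HAVE_STRINGS_H; a real build would use the configure results instead. The `demo_strcasecmp` and `voice_name_equals` names are hypothetical.

```c
#include <stdbool.h>

#if defined(_WIN32)
#include <string.h>
#define demo_strcasecmp _stricmp   /* Windows CRT spelling */
#else
#include <strings.h>               /* POSIX home of strcasecmp */
#define demo_strcasecmp strcasecmp
#endif

/* Compare two voice names without regard to ASCII case. */
bool voice_name_equals(const char *a, const char *b)
{
    return demo_strcasecmp(a, b) == 0;
}
```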

Convert the HTML documentation to markdown

The docs directory contains documentation in HTML format. This should be converted to markdown, with the index linked from the README.md file. The docs can be built using kramdown.

Latest espeak requires a C99 compiler to compile

Latest espeak doesn't compile:

...
src/libespeak-ng/speak_lib.c:148:4: warning: ISO C90 forbids mixed declarations and code [-Wpedantic]
    espeak_ERROR a_error = event_declare(event);
    ^
src/libespeak-ng/speak_lib.c: In function 'sync_espeak_terminated_msg':
src/libespeak-ng/speak_lib.c:225:2: warning: ISO C90 forbids mixed declarations and code [-Wpedantic]
  int finished=0;
  ^
src/libespeak-ng/speak_lib.c: In function 'MarkerEvent':
src/libespeak-ng/speak_lib.c:570:2: warning: ISO C90 forbids mixed declarations and code [-Wpedantic]
  espeak_EVENT *ep;
  ^
src/libespeak-ng/speak_lib.c: In function 'sync_espeak_Synth':
src/libespeak-ng/speak_lib.c:632:2: error: 'for' loop initial declarations are only allowed in C99 mode
  for (int i=0; i < N_SPEECH_PARAM; i++)
  ^
src/libespeak-ng/speak_lib.c:632:2: note: use option -std=c99 or -std=gnu99 to compile your code
src/libespeak-ng/speak_lib.c: In function 'espeak_Initialize':
src/libespeak-ng/speak_lib.c:772:2: warning: ISO C90 forbids mixed declarations and code [-Wpedantic]
  int param;
  ^
src/libespeak-ng/speak_lib.c: In function 'espeak_Synth':
src/libespeak-ng/speak_lib.c:854:2: warning: ISO C90 forbids mixed declarations and code [-Wpedantic]
  espeak_ERROR a_error=EE_INTERNAL_ERROR;
  ^
src/libespeak-ng/speak_lib.c:870:2: warning: ISO C90 forbids mixed declarations and code [-Wpedantic]
  t_espeak_command* c1 = create_espeak_text(text, size, position, position_type, end_position, flags, user_data);
  ^
src/libespeak-ng/speak_lib.c:876:2: warning: ISO C90 forbids mixed declarations and code [-Wpedantic]
  t_espeak_command* c2 = create_espeak_terminated_msg(*unique_identifier, user_data);
  ^
src/libespeak-ng/speak_lib.c: In function 'espeak_Synth_Mark':
src/libespeak-ng/speak_lib.c:935:2: warning: ISO C90 forbids mixed declarations and code [-Wpedantic]
  t_espeak_command* c1 = create_espeak_mark(text, size, index_mark, end_position,
  ^
src/libespeak-ng/speak_lib.c:942:2: warning: ISO C90 forbids mixed declarations and code [-Wpedantic]
  t_espeak_command* c2 = create_espeak_terminated_msg(*unique_identifier, user_data);
  ^
src/libespeak-ng/speak_lib.c: In function 'espeak_Key':
src/libespeak-ng/speak_lib.c:977:2: warning: ISO C90 forbids mixed declarations and code [-Wpedantic]
  espeak_ERROR a_error = EE_OK;
  ^
src/libespeak-ng/speak_lib.c:986:2: warning: ISO C90 forbids mixed declarations and code [-Wpedantic]
  t_espeak_command* c = create_espeak_key( key, NULL);
  ^
src/libespeak-ng/speak_lib.c: In function 'espeak_Char':
src/libespeak-ng/speak_lib.c:1009:2: warning: ISO C90 forbids mixed declarations and code [-Wpedantic]
  espeak_ERROR a_error;
  ^
src/libespeak-ng/speak_lib.c:1017:2: warning: ISO C90 forbids mixed declarations and code [-Wpedantic]
  t_espeak_command* c = create_espeak_char( character, NULL);
  ^
src/libespeak-ng/speak_lib.c: In function 'espeak_SetParameter':
src/libespeak-ng/speak_lib.c:1109:2: warning: ISO C90 forbids mixed declarations and code [-Wpedantic]
  espeak_ERROR a_error;
  ^
src/libespeak-ng/speak_lib.c:1117:2: warning: ISO C90 forbids mixed declarations and code [-Wpedantic]
  t_espeak_command* c = create_espeak_parameter(parameter, value, relative);
  ^
src/libespeak-ng/speak_lib.c: In function 'espeak_SetPunctuationList':
src/libespeak-ng/speak_lib.c:1138:2: warning: ISO C90 forbids mixed declarations and code [-Wpedantic]
  espeak_ERROR a_error;
  ^
src/libespeak-ng/speak_lib.c:1146:2: warning: ISO C90 forbids mixed declarations and code [-Wpedantic]
  t_espeak_command* c = create_espeak_punctuation_list( punctlist);
  ^
src/libespeak-ng/speak_lib.c: In function 'espeak_Cancel':
src/libespeak-ng/speak_lib.c:1219:2: error: 'for' loop initial declarations are only allowed in C99 mode
  for (int i=0; i < N_SPEECH_PARAM; i++)
  ^
make[1]: *** [src/libespeak-ng/src_libespeak_la-speak_lib.lo] Error 1
make[1]: Leaving directory `/home/valdis/code/espeak-ng'
make: *** [all] Error 2
