netease-youdao / emotivoice

EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine

License: Apache License 2.0

Dockerfile 0.20% Python 99.71% Shell 0.09%
ai deep-learning emotion emotivoice multi-speaker prompt python pytorch speech speech-synthesis style text-to-speech tts

emotivoice's People

Contributors

bramhooimeijer, duj12, gokul8747, huaxuanw, ihmily, john9405, lewangdev, netease-youdao, qingfengcss, syq163, zf3

emotivoice's Issues

TTS API Support?

It's a great project! Is there any plan to provide an API interface?

style_prompt has no effect

Maria_Kasper|哭唧唧|<sos/eos> uo3 sp1 l ai2 sp0 d ao4 sp1 b ei3 sp0 j ing1 sp3 q ing1 sp0 h ua2 sp0 d a4 sp0 x ve2 <sos/eos>|我来到北京,清华大学
Maria_Kasper|非常开心|<sos/eos> uo3 sp1 l ai2 sp0 d ao4 sp1 b ei3 sp0 j ing1 sp3 q ing1 sp0 h ua2 sp0 d a4 sp0 x ve2 <sos/eos>|我来到北京,清华大学
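Of the two inference texts above, one comes from the README example and the other from data/inference/text. I cannot hear any difference between the generated audio files, and the three speaking speeds are also indistinguishable. The style_embedding values really do differ, but the audible result is almost identical.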

For the same text, some specified speakers produce no wav from the command line, and no error is reported

python inference_am_vocoder_joint.py \
--logdir prompt_tts_open_source_joint \
--config_folder config/joint \
--checkpoint g_00140000 \
--test_file $TEXT
With speaker 1028 everything generates without problems so far, but with speaker 3095 nothing is generated.
This line works:
1028|普通|<sos/eos> n i3 sp1 k e3 sp0 i3 sp1 b a3 sp1 zh e4 sp1 d ang4 sp0 z uo4 sp1 sh iii4 sp1 x ie2 sp0 p o4 sp3 b u4 sp0 g uo4 sp3 n i3 sp1 ie3 sp1 ing1 sp0 g ai1 sp1 q ing1 sp0 ch u3 sp3 x ian4 sp0 sh iii2 sp1 j iou4 sp0 sh iii4 sp1 zh e4 sp0 iang4 sp3 m ei2 sp0 iou3 sp1 sh en2 sp0 m e5 sp1 sh iii4 sp0 sh iii4 sp1 j ve2 sp0 d uei4 sp1 d e5 sp1 g ong1 sp0 p ing2 sp3 s uei1 sp0 r an2 sp1 b ing4 sp1 b u4 sp0 x iang3 sp1 b iao3 sp0 d a2 sp1 sh en2 sp0 m e5 sp3 k e3 sp1 n i3 sp1 ie3 sp1 q ing1 sp0 ch u3 sp1 n i3 sp1 v3 sp1 uo3 sp1 zh iii1 sp0 j ian1 sp1 d e5 sp1 ch a1 sp0 j v4 sp3 uo3 sp0 m en5 sp3 j i1 sp0 b en3 sp1 m ei2 sp0 sh en2 sp0 m e5 sp1 x i1 sp0 uang4 <sos/eos>|你可以把这当做是胁迫,不过,你也应该清楚,现实就是这样,没有什么事是绝对的公平,虽然并不想表达什么,可你也清楚你与我之间的差距,我们,基本没什么希望

This line does not:
3095|普通|<sos/eos> n i3 sp1 k e3 sp0 i3 sp1 b a3 sp1 zh e4 sp1 d ang4 sp0 z uo4 sp1 sh iii4 sp1 x ie2 sp0 p o4 sp3 b u4 sp0 g uo4 sp3 n i3 sp1 ie3 sp1 ing1 sp0 g ai1 sp1 q ing1 sp0 ch u3 sp3 x ian4 sp0 sh iii2 sp1 j iou4 sp0 sh iii4 sp1 zh e4 sp0 iang4 sp3 m ei2 sp0 iou3 sp1 sh en2 sp0 m e5 sp1 sh iii4 sp0 sh iii4 sp1 j ve2 sp0 d uei4 sp1 d e5 sp1 g ong1 sp0 p ing2 sp3 s uei1 sp0 r an2 sp1 b ing4 sp1 b u4 sp0 x iang3 sp1 b iao3 sp0 d a2 sp1 sh en2 sp0 m e5 sp3 k e3 sp1 n i3 sp1 ie3 sp1 q ing1 sp0 ch u3 sp1 n i3 sp1 v3 sp1 uo3 sp1 zh iii1 sp0 j ian1 sp1 d e5 sp1 ch a1 sp0 j v4 sp3 uo3 sp0 m en5 sp3 j i1 sp0 b en3 sp1 m ei2 sp0 sh en2 sp0 m e5 sp1 x i1 sp0 uang4 <sos/eos>|你可以把这当做是胁迫,不过,你也应该清楚,现实就是这样,没有什么事是绝对的公平,虽然并不想表达什么,可你也清楚你与我之间的差距,我们,基本没什么希望

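Where can I see all the speakers?

In my data/inference/text I see a sample file in which the first part of each line specifies the speaker. How can I list all available speakers?
Are these 12 speakers the complete set?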

Encoding error when opening a file

When starting with 'streamlit run demo_page.py', you may encounter the following error: "UnicodeDecodeError: 'gbk' codec can't decode byte 0xae in position 2: illegal multibyte sequence".

To resolve this, specify the encoding explicitly when opening the file. Edit EmotiVoice/config/joint/config.py as follows:

#### Speaker ####
with open(speaker2id_path, encoding='utf-8') as f:
    speakers = [t.strip() for t in f.readlines()]
speaker_n_labels = len(speakers)
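
The fix above works because Python's `open()` falls back to the locale's preferred encoding when `encoding` is omitted. On a Chinese-locale Windows machine that is typically GBK, while the label files are presumably saved as UTF-8. A minimal sketch of the difference, using a temporary stand-in file (`labels.txt` and `read_labels` are illustrative names, not part of EmotiVoice):

```python
import tempfile
from pathlib import Path


def read_labels(path, encoding="utf-8"):
    """Read one label per line, decoding with an explicit encoding."""
    with open(path, encoding=encoding) as f:
        return [line.strip() for line in f if line.strip()]


# Hypothetical label file written as UTF-8, standing in for the real one.
tmp_dir = tempfile.mkdtemp()
label_path = Path(tmp_dir) / "labels.txt"
label_path.write_bytes("普通\n开心\n".encode("utf-8"))

# Decoding as UTF-8 recovers the labels.
labels = read_labels(label_path)

# Forcing GBK imitates what a GBK-locale default would do to the same bytes.
try:
    read_labels(label_path, encoding="gbk")
    gbk_failed = False
except UnicodeDecodeError:
    gbk_failed = True
```

Passing `encoding='utf-8'` on every `open()` call that reads these text files keeps the behaviour the same across operating systems.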

UnicodeDecodeError

I keep running into this problem when running inference:

config.py", line 40, in Config
emotions = [t.strip() for t in f.readlines()]
UnicodeDecodeError: 'gbk' codec can't decode byte 0xae in position 2: illegal multibyte sequence

What are the likely cause and the fix? Thx.

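Suggestion

The generated phoneme text does not include the speaker, the emotion, or the original content, so running inference on it directly slices each line and ends in an index error.
Either provide a script that generates audio directly from a txt file, or, if the process stays in two steps, have the first step generate everything. The two stages should not be logically inconsistent.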
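
Judging from the sample lines of data/inference/text quoted elsewhere on this page, each inference line appears to carry four pipe-separated fields: speaker, style prompt, phonemes, and original text. A minimal sketch of a check that fails early with a readable message when a line holds only the phoneme field. `parse_inference_line` is a hypothetical helper, not a function from the repository, and the sample phonemes are made up for illustration:

```python
def parse_inference_line(line: str) -> dict:
    """Split one inference line into its four assumed fields."""
    # maxsplit=3 keeps any '|' inside the trailing text field intact.
    fields = line.rstrip("\n").split("|", 3)
    if len(fields) != 4:
        raise ValueError(
            "expected 4 '|'-separated fields "
            f"(speaker|prompt|phonemes|text), got {len(fields)}"
        )
    speaker, prompt, phonemes, text = fields
    return {
        "speaker": speaker,
        "prompt": prompt,
        "phonemes": phonemes.split(),
        "text": text,
    }


# A complete line in the assumed format (illustrative content).
sample = "1028|普通|<sos/eos> n i3 sp1 h ao3 <sos/eos>|你好"
parsed = parse_inference_line(sample)

# A phoneme-only line is rejected with a clear error instead of an IndexError.
try:
    parse_inference_line("<sos/eos> n i3 sp1 h ao3 <sos/eos>")
    phoneme_only_ok = True
except ValueError:
    phoneme_only_ok = False
```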

demo error

text:
一枚天鹅蛋在鸭窠里被母鸭孵出后,它长相奇丑无比。被同行,外行甚至养殖场的歧视,嘲笑它是丑小鸭。经历过无数风霜的丑小鸭安全地长大了,眼尖的天师傅发现了它。于是乎和老板约定以6元一斤收购了。很快,丑小鸭被做成一直美味的烤天鹅。

EmotiVoice/inference_am_vocoder_joint.py", line 66, in main style_encoder.load_state_dict(model_ckpt) File "/root/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 2152, in load_state_dict raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format( RuntimeError: Error(s) in loading state_dict for StyleEncoder: Unexpected key(s) in state_dict: "bert.embeddings.position_ids".

EmotiVoice/inference_am_vocoder_joint.py", line 66, in main
style_encoder.load_state_dict(model_ckpt)
File "/root/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 2152, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for StyleEncoder:
Unexpected key(s) in state_dict: "bert.embeddings.position_ids".
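
An unexpected-key error like this means the checkpoint contains an entry that the instantiated model does not declare. One possible workaround, sketched here on plain dictionaries rather than real tensors, is to drop the named key before loading. The underlying cause (for example, a version mismatch in the BERT dependency) is an assumption, and `drop_unexpected_keys` is a hypothetical helper:

```python
def drop_unexpected_keys(state_dict, unexpected):
    """Return a copy of state_dict without the listed keys."""
    unexpected = set(unexpected)
    return {k: v for k, v in state_dict.items() if k not in unexpected}


# Stand-in for a loaded checkpoint; the values would be tensors in practice.
checkpoint = {
    "bert.embeddings.position_ids": [0, 1, 2],
    "bert.embeddings.word_embeddings.weight": [[0.1, 0.2]],
}

cleaned = drop_unexpected_keys(checkpoint, ["bert.embeddings.position_ids"])
# With a real model, the next step would be: style_encoder.load_state_dict(cleaned)
```

PyTorch's `load_state_dict` also accepts `strict=False`, which ignores unexpected and missing keys alike. Filtering one named key is narrower and still surfaces any other mismatch.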

sampling_rate 24k?

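I tried changing the sampling rate in the config file to 24k, but the result is clearly wrong. Does the open-source model support 24k audio synthesis, and how should the config be modified?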

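How can I raise the audio sampling rate to improve sound quality?

I tried changing "sampling_rate = 16_000" in config.py, but when I set the value to 24_000, the output audio is read much too fast.

So my question is: how can I raise the sampling rate while keeping the reading speed normal?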
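
One plausible explanation for the speed-up, assuming the model keeps generating samples at the 16 kHz rate it was trained for and only the rate written to the output file changes: the same samples are played back in less time. The arithmetic, as a small sketch with made-up numbers:

```python
def playback_seconds(num_samples: int, sample_rate: int) -> float:
    """Duration of a clip when played back at the given sample rate."""
    return num_samples / sample_rate


# Assumption: the model emits three seconds of audio at 16 kHz.
num_samples = 3 * 16_000

original_seconds = playback_seconds(num_samples, 16_000)
# Same samples, but the file header now claims 24 kHz.
relabeled_seconds = playback_seconds(num_samples, 24_000)

speedup = original_seconds / relabeled_seconds
print(original_seconds, relabeled_seconds, speedup)
```

Under that assumption, a higher output rate would need either resampling of the 16 kHz output or an acoustic model and vocoder trained at 24 kHz. Whether the released checkpoints support that is not stated here.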

Getting Same voice with Different Emotion Prompts

Speaker - Maria_Kasper
Text - "Emoti Voice is a powerful and modern open-source text-to-speech engine. Emoti Voice speaks both English and Chinese, and with over two thousand different voices. The most prominent feature is emotional synthesis, allowing you to create speech with a wide range of emotions, including happy, excited, sad, angry and others"
Emotion Prompts Tried - Happy / Sad / Excited / Angry / Whisper / Shout
Generated Audios - https://drive.google.com/drive/folders/1JqWnVFSiu5DMyZhGt7XyGXhrlB6eCvPR?usp=sharing
Generated Using the Demo UI

Can someone please help if I am missing something here?

EmotiVoice/frontend.py", line 26, in split_py if py[-1] == 'r': IndexError: string index out of range

EmotiVoice/frontend.py", line 26, in split_py
if py[-1] == 'r':
IndexError: string index out of range

Test text:

抱歉刚刚的回答可能让你感到不满意了。作为一个大语言模型,我并不具备情感和自主意识,我的回答是基于大量的数据和算法生成的。如果我的回答有不准确或者不恰当的地方,还请您多多包涵和指教。
我是由百川智能的工程师们开发和维护的。他们是一群富有创造力和激情的人,致力于为我提供更好的服务和功能。
测试一下中英混合文本,hello,你好啊。Hello, this is the best test for now。我们很期待您的到来,希望你在这次盛会中得到你想要的结果。
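
The traceback above is what Python raises when `py` is an empty string, because indexing `""[-1]` has no character to return. A minimal sketch of a guard, assuming an empty token can reach `split_py` (for example from punctuation or English words in mixed text, which is a guess). `ends_with_r` is a hypothetical helper, not the repository's code:

```python
def ends_with_r(py: str) -> bool:
    """Return True if the token ends with 'r'; safe for empty strings."""
    return py.endswith("r")


# Indexing an empty string reproduces the reported IndexError.
empty = ""
try:
    empty[-1]
    indexing_failed = False
except IndexError:
    indexing_failed = True

# str.endswith simply returns False for the empty string.
empty_result = ends_with_r(empty)
erhua_result = ends_with_r("er")
```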

AttributeError: 'NoneType' object has no attribute 'seek'.

I ran the program on Windows 10, and the web page opens with this error:
AttributeError: 'NoneType' object has no attribute 'seek'. You can only torch.load from a file that is seekable. Please pre-load the data into a buffer like io.BytesIO and try to load from it instead.
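
The message indicates that `torch.load` received `None` instead of a file path or file object, which suggests that whatever resolves the checkpoint path found nothing. A minimal sketch of an early check that turns this into a clearer error. `require_checkpoint` is a hypothetical helper, and the missing-file cause is an assumption:

```python
import tempfile
from pathlib import Path


def require_checkpoint(path):
    """Return the checkpoint path as a Path, or raise a descriptive error."""
    if path is None:
        raise FileNotFoundError(
            "No checkpoint path was resolved; check that the model files "
            "were downloaded into the expected folder."
        )
    ckpt = Path(path)
    if not ckpt.is_file():
        raise FileNotFoundError(f"Checkpoint not found: {ckpt}")
    return ckpt


# None is rejected before it can reach torch.load.
try:
    require_checkpoint(None)
    none_rejected = False
except FileNotFoundError:
    none_rejected = True

# An existing file (an empty placeholder here) passes through unchanged.
with tempfile.TemporaryDirectory() as tmp:
    existing = Path(tmp) / "g_00140000"
    existing.write_bytes(b"")
    found_ok = require_checkpoint(existing) == existing
```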

Encoding error shown on Windows

UnicodeDecodeError: 'gbk' codec can't decode byte 0xae in position 2: illegal multibyte sequence

Putting model weights on HuggingFace

Hi,

Can I put the checkpoint files (checkpoint_163431, g_00140000, do_00140000) on Hugging Face so they are more easily accessible than via Google Drive?

Thank you 😀
