netease-youdao / emotivoice

EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine

License: Apache License 2.0

Dockerfile 0.20% Python 99.71% Shell 0.09%

ai deep-learning emotion emotivoice multi-speaker prompt python pytorch speech speech-synthesis style text-to-speech tts

emotivoice's People

huaxuanw ishine liuyongjie985 amorjnyh maxmax2016 playvoice pengyun1314123 whitefu maoshuiyang pigorz taichuai peter05010402 zwglory haojingyuan v-mi fortunecat0884 louis-xwb henghaheng prahs superfreakman jollyant zqlsnr yuanfangme jetwaves zhaoxiaobao keikinn hangox albin-zhu alex-ibb zfbok mhe014 wendongj moreyogurt liangshaojiang achillesxu majiajue forica q-coding-cg splinter21 14923523 larriti thinkerchina zlg810 poeticmedia ilovefeng kawais ghowtan rogervaas sagarneo11 hadryan lextimezy iamleon121 road0001 eric-wei gaoxiaowei soon14 z1446722374 news780 ai-jie01 zhoulingjie pmp181818 kamasamikon caplost zinjoyce zjw-swun sunilgitb lemon22333 af-74413592 hhy5277 guonetnet51 xhwskhizein creative-v bankxi zhangdekui wuxiaoxrj kary372022 bambuo linecode wjmboss genexis-ai willkhoza codeaudit twonp168 chunhualiu suryatmodulus heycms v1cc0 fdoperezi tomchapin ryanhollander umhau ali-biz-gh f901107 lyhiving michaelten emacser dgreen2017 liuguoyou dowhere luchuanze

emotivoice's Issues

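Hello, when I test English TTS generation it is slow (13-14 s), but Chinese speech generates quickly, in 1-2 s. Is this normal?

As shown in the screenshot, CUDA is working normally.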

Voice cloning

Hello @netease-youdao,
Is there any way to support zero-shot voice cloning from a voice sample?
Thank you!

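Is there a built-in speaker with the voice of a ** person?

Looking at the speaker list, they all seem to be foreigners. Are there any ** speakers, or do I need to download them somewhere and import them? Thanks.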

The command python inference_am_vocoder_joint.py has no effect

I can see "run" in the output, but there is no test_audio directory and no generated audio file.

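What do sp0 and sp1 in the inference input mean?

What are sp0 and sp1 in the phoneme sequence given as inference input? Are they pause markers? How are they obtained at inference time?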
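To make the question concrete, here is a minimal Python sketch that splits one inference line into the four `|`-separated fields seen in the sample lines in this tracker (speaker, prompt, phoneme sequence, original text) and pulls out the `sp<digit>` tokens. The names `InferenceLine`, `parse_inference_line`, and `sp_tokens` are hypothetical and not part of EmotiVoice. The sketch only locates the tokens; it does not settle what they mean, although in the sample lines `sp3` does appear where the original text has a comma.

```python
from typing import List, NamedTuple


class InferenceLine(NamedTuple):
    speaker: str
    prompt: str
    phonemes: List[str]
    text: str


def parse_inference_line(line: str) -> InferenceLine:
    """Split one 'speaker|prompt|phonemes|text' line into its four fields."""
    parts = line.rstrip("\n").split("|", 3)
    if len(parts) != 4:
        raise ValueError(f"expected 4 '|'-separated fields, got {len(parts)}")
    speaker, prompt, phonemes, text = parts
    return InferenceLine(speaker, prompt, phonemes.split(), text)


def sp_tokens(phonemes: List[str]) -> List[str]:
    """Return the tokens of the form 'sp' + digits (assumed to be markers, not phonemes)."""
    return [p for p in phonemes if p.startswith("sp") and p[2:].isdigit()]


# Shortened sample line in the same format as the examples in this tracker.
sample = "Maria_Kasper|非常开心|<sos/eos> uo3 sp1 l ai2 sp0 d ao4 <sos/eos>|我来到"
parsed = parse_inference_line(sample)
print(parsed.speaker)
print(sp_tokens(parsed.phonemes))
```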

TTS API Support?

It's a great project! Is there any plan to provide an API?

Maria_Kasper|哭唧唧|<sos/eos> uo3 sp1 l ai2 sp0 d ao4 sp1 b ei3 sp0 j ing1 sp3 q ing1 sp0 h ua2 sp0 d a4 sp0 x ve2 <sos/eos>|我来到北京，清华大学
Maria_Kasper|非常开心|<sos/eos> uo3 sp1 l ai2 sp0 d ao4 sp1 b ei3 sp0 j ing1 sp3 q ing1 sp0 h ua2 sp0 d a4 sp0 x ve2 <sos/eos>|我来到北京，清华大学
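Of the two inference texts above, one comes from the README example and one from data/inference/text. I cannot hear any difference between the generated audio files, and I cannot perceive a real difference between the three speaking rates either. The style_embedding is indeed different, but the actual result is almost identical.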

Needs documentation for hardware requirement

I would like to know roughly how much GPU memory is needed for inference.

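With the same text, some speakers fail to produce a wav from the command line, and no error is reported

python inference_am_vocoder_joint.py \
    --logdir prompt_tts_open_source_joint \
    --config_folder config/joint \
    --checkpoint g_00140000 \
    --test_file $TEXT
With speaker 1028, everything I have tried so far generates without problems; with speaker 3095, nothing is generated.
The following works: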
1028|普通|<sos/eos> n i3 sp1 k e3 sp0 i3 sp1 b a3 sp1 zh e4 sp1 d ang4 sp0 z uo4 sp1 sh iii4 sp1 x ie2 sp0 p o4 sp3 b u4 sp0 g uo4 sp3 n i3 sp1 ie3 sp1 ing1 sp0 g ai1 sp1 q ing1 sp0 ch u3 sp3 x ian4 sp0 sh iii2 sp1 j iou4 sp0 sh iii4 sp1 zh e4 sp0 iang4 sp3 m ei2 sp0 iou3 sp1 sh en2 sp0 m e5 sp1 sh iii4 sp0 sh iii4 sp1 j ve2 sp0 d uei4 sp1 d e5 sp1 g ong1 sp0 p ing2 sp3 s uei1 sp0 r an2 sp1 b ing4 sp1 b u4 sp0 x iang3 sp1 b iao3 sp0 d a2 sp1 sh en2 sp0 m e5 sp3 k e3 sp1 n i3 sp1 ie3 sp1 q ing1 sp0 ch u3 sp1 n i3 sp1 v3 sp1 uo3 sp1 zh iii1 sp0 j ian1 sp1 d e5 sp1 ch a1 sp0 j v4 sp3 uo3 sp0 m en5 sp3 j i1 sp0 b en3 sp1 m ei2 sp0 sh en2 sp0 m e5 sp1 x i1 sp0 uang4 <sos/eos>|你可以把这当做是胁迫,不过,你也应该清楚,现实就是这样,没有什么事是绝对的公平,虽然并不想表达什么,可你也清楚你与我之间的差距,我们,基本没什么希望

The following does not work:
3095|普通|<sos/eos> n i3 sp1 k e3 sp0 i3 sp1 b a3 sp1 zh e4 sp1 d ang4 sp0 z uo4 sp1 sh iii4 sp1 x ie2 sp0 p o4 sp3 b u4 sp0 g uo4 sp3 n i3 sp1 ie3 sp1 ing1 sp0 g ai1 sp1 q ing1 sp0 ch u3 sp3 x ian4 sp0 sh iii2 sp1 j iou4 sp0 sh iii4 sp1 zh e4 sp0 iang4 sp3 m ei2 sp0 iou3 sp1 sh en2 sp0 m e5 sp1 sh iii4 sp0 sh iii4 sp1 j ve2 sp0 d uei4 sp1 d e5 sp1 g ong1 sp0 p ing2 sp3 s uei1 sp0 r an2 sp1 b ing4 sp1 b u4 sp0 x iang3 sp1 b iao3 sp0 d a2 sp1 sh en2 sp0 m e5 sp3 k e3 sp1 n i3 sp1 ie3 sp1 q ing1 sp0 ch u3 sp1 n i3 sp1 v3 sp1 uo3 sp1 zh iii1 sp0 j ian1 sp1 d e5 sp1 ch a1 sp0 j v4 sp3 uo3 sp0 m en5 sp3 j i1 sp0 b en3 sp1 m ei2 sp0 sh en2 sp0 m e5 sp1 x i1 sp0 uang4 <sos/eos>|你可以把这当做是胁迫,不过,你也应该清楚,现实就是这样,没有什么事是绝对的公平,虽然并不想表达什么,可你也清楚你与我之间的差距,我们,基本没什么希望

IsADirectoryError: [Errno 21] Is a directory: '/home/firefly/project_lzl/EmotiVoice/outputs/style_encoder/ckpt/checkpoint_163431' python-BaseException

IsADirectoryError: [Errno 21] Is a directory: '/EmotiVoice/outputs/style_encoder/ckpt/checkpoint_163431'

demo_page.py hard-codes the device to CPU

Suggested change:
DEVICE = "cuda:0" if torch.cuda.is_available() else "cpu"

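Is there any technical documentation?

Where can I see all the speakers?

In my data/inference/text I see the sample file, and the first field of each line specifies the speaker. How can I list all the available speakers?
Are these 12 speakers all of them?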

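Could you provide a download link on gitee.com or on a mainstream cloud drive in China?

At the moment https://huggingface.co seems hard to reach from mainland China. Please support developers and testers in China by providing a location from which this project's models and larger files can be downloaded conveniently over domestic networks. Thanks in advance.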

Encoding error when opening a file

When starting with 'streamlit run demo_page.py', you may encounter the following error: "UnicodeDecodeError: 'gbk' codec can't decode byte 0xae in position 2: illegal multibyte sequence".

To resolve this, specify the encoding explicitly when opening the file. In EmotiVoice/config/joint/config.py, change the code as follows:

#### Speaker ####
with open(speaker2id_path, encoding='utf-8') as f:
    speakers = [t.strip() for t in f.readlines()]
speaker_n_labels = len(speakers)
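Why the explicit encoding matters: without an `encoding` argument, Python's `open()` falls back to the locale's preferred encoding, which is typically `gbk` on a Chinese-language Windows system, so a UTF-8 file can fail to decode. The sketch below is a self-contained illustration of that behaviour. The file name `labels.txt` and its contents are made up for the demonstration, and `read_labels` is a hypothetical helper, not a function from the project.

```python
import os
import tempfile
from typing import List


def read_labels(path: str) -> List[str]:
    # Pass the encoding explicitly so the result does not depend on the OS locale.
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f.readlines()]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "labels.txt")
    # Write a small UTF-8 file standing in for a label file.
    with open(path, "w", encoding="utf-8") as f:
        f.write("普通\n开心\n")

    labels = read_labels(path)

    # Reading the same bytes as GBK mimics what happens when the locale default is gbk.
    try:
        with open(path, encoding="gbk") as f:
            f.read()
        gbk_failed = False
    except UnicodeDecodeError:
        gbk_failed = True

print(labels)
print("decoding as gbk failed:", gbk_failed)
```

The same explicit `encoding="utf-8"` argument applies to every `open()` call in the config that reads a text file, not only the speaker list.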

Could someone add me to the group? My request to join is not going through.

UnicodeDecodeError

I keep running into this problem when running inference:

config.py", line 40, in Config
emotions = [t.strip() for t in f.readlines()]
UnicodeDecodeError: 'gbk' codec can't decode byte 0xae in position 2: illegal multibyte sequence

What are the likely cause and the fix? Thx.

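Suggestion

The generated phoneme text does not include the speaker, the emotion, or the original content, so running inference on it directly slices each line and ends in an IndexError.
Either provide a script that generates audio directly from a txt file, or, if it stays a two-step process, have the first step generate every field. The two stages should not be logically inconsistent.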
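As a rough illustration of the second option, the sketch below joins a speaker, a prompt, a phoneme string, and the original text into the four-field `speaker|prompt|phonemes|text` line format seen in the sample inference lines in this tracker. `build_inference_line` is a hypothetical helper written for this example, not an EmotiVoice function, and the phoneme string is assumed to come from the project's existing frontend step.

```python
def build_inference_line(speaker: str, prompt: str, phonemes: str, text: str) -> str:
    """Join the four fields with '|' after checking that none of them would break the format."""
    fields = [speaker, prompt, phonemes, text]
    for field in fields:
        if "|" in field or "\n" in field:
            raise ValueError(f"field contains a separator or newline: {field!r}")
    return "|".join(fields)


# The phoneme string here is copied from a sample line, not computed.
line = build_inference_line(
    "1028",
    "普通",
    "<sos/eos> n i3 sp1 k e3 sp0 i3 <sos/eos>",
    "你可以",
)
print(line)
```

Writing all four fields in one step means the inference script can split every line the same way and never index past the end of a short line.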

demo error

text:
一枚天鹅蛋在鸭窠里被母鸭孵出后，它长相奇丑无比。被同行，外行甚至养殖场的歧视，嘲笑它是丑小鸭。经历过无数风霜的丑小鸭安全地长大了，眼尖的天师傅发现了它。于是乎和老板约定以6元一斤收购了。很快，丑小鸭被做成一直美味的烤天鹅。

EmotiVoice/inference_am_vocoder_joint.py", line 66, in main style_encoder.load_state_dict(model_ckpt) File "/root/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 2152, in load_state_dict raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format( RuntimeError: Error(s) in loading state_dict for StyleEncoder: Unexpected key(s) in state_dict: "bert.embeddings.position_ids".

EmotiVoice/inference_am_vocoder_joint.py", line 66, in main
style_encoder.load_state_dict(model_ckpt)
File "/root/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 2152, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for StyleEncoder:
Unexpected key(s) in state_dict: "bert.embeddings.position_ids".
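One possible workaround, sketched under assumptions: the error says the checkpoint contains a key the model does not expect, which often points to a version mismatch between the library that saved the checkpoint and the one that loads it (not confirmed for this report). Removing the extra key before loading avoids the error. `drop_keys` is a hypothetical helper, and the dictionary below is a stand-in for a real checkpoint so the example runs without PyTorch.

```python
from typing import Any, Dict, Iterable


def drop_keys(state_dict: Dict[str, Any], unexpected: Iterable[str]) -> Dict[str, Any]:
    """Return a copy of state_dict without the listed keys."""
    unexpected = set(unexpected)
    return {k: v for k, v in state_dict.items() if k not in unexpected}


# Stand-in for the loaded checkpoint; real values would be tensors.
model_ckpt = {
    "bert.embeddings.position_ids": [0, 1, 2],
    "bert.embeddings.word_embeddings.weight": [0.1, 0.2],
}
cleaned = drop_keys(model_ckpt, ["bert.embeddings.position_ids"])
print(sorted(cleaned))

# With PyTorch, the cleaned mapping would then be passed on, for example:
#   style_encoder.load_state_dict(cleaned)
```

PyTorch's `load_state_dict` also accepts `strict=False`, which tolerates any missing or unexpected key; filtering a single named key is narrower and keeps other mismatches visible.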

sampling_rate 24k?

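I tried changing the sampling rate in the config file to 24k, but the result is clearly wrong. Does the open-source model support 24 kHz synthesis, and how should it be modified?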

Hugging Face Space?

Can a Hugging Face Space be made for this project?

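How can I fine-tune on my own voice?

As the title says.

Thanks to the authors for open-sourcing this. Is there an official discussion group?

To improve audio quality, how can I raise the audio sampling rate?

I tried changing "sampling_rate = 16_000" in config.py, but when I set the value to 24_000 the output audio is spoken much too fast.

So my question is: how can I raise the sampling rate while keeping the speech at normal speed?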
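The speed-up described here matches simple arithmetic, assuming (this is not confirmed by the project) that the model still emits samples at a fixed 16 kHz and the config value only changes the rate written into the output file. The same number of samples played back at a higher declared rate takes less time. The function names below are hypothetical.

```python
def playback_seconds(num_samples: int, header_rate_hz: int) -> float:
    """Duration a player reports for num_samples at the declared sampling rate."""
    return num_samples / header_rate_hz


def speedup(model_rate_hz: int, header_rate_hz: int) -> float:
    """How much faster audio plays when the declared rate exceeds the generated rate."""
    return header_rate_hz / model_rate_hz


num_samples = 16_000 * 3  # three seconds of audio generated at 16 kHz
print(playback_seconds(num_samples, 16_000))
print(playback_seconds(num_samples, 24_000))
print(speedup(16_000, 24_000))
```

Under that assumption, keeping normal speed at 24 kHz would need either a model trained at 24 kHz or resampling the 16 kHz output, and resampling alone does not add detail above the original bandwidth.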

Getting Same voice with Different Emotion Prompts

Speaker - Maria_Kasper
Text - "Emoti Voice is a powerful and modern open-source text-to-speech engine. Emoti Voice speaks both English and Chinese, and with over two thousand different voices. The most prominent feature is emotional synthesis, allowing you to create speech with a wide range of emotions, including happy, excited, sad, angry and others"
Emotion Prompts Tried - Happy / Sad / Excited / Angry / Whisper / Shout
Generated Audios - https://drive.google.com/drive/folders/1JqWnVFSiu5DMyZhGt7XyGXhrlB6eCvPR?usp=sharing
Generated Using the Demo UI

Can someone please help? Am I missing something here?

Streaming TTS Support?

This is a great project! Is there any plan to support streaming TTS?

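Suggest adding an HTTP interface so the project can be integrated into production environments and used out of the box

I suggest using GET or POST to pass parameters such as the text, speaker_id, and prompt, and returning a file URL or the audio data.

For efficiency, a cache layer could be added: compute an MD5 of the request parameters and use it as the file name for the cached file, which works quite well.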
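The caching idea can be sketched as follows. This is a minimal illustration, not an existing EmotiVoice API: `cache_filename` and its parameter names are hypothetical. The parameters are serialized in a fixed key order so that identical requests always hash to the same name.

```python
import hashlib
import json


def cache_filename(text: str, speaker_id: str, prompt: str, ext: str = ".wav") -> str:
    """Derive a stable cache file name from the request parameters."""
    payload = json.dumps(
        {"text": text, "speaker_id": speaker_id, "prompt": prompt},
        sort_keys=True,       # fixed key order gives a stable serialization
        ensure_ascii=False,   # keep non-ASCII text as-is before encoding
    )
    return hashlib.md5(payload.encode("utf-8")).hexdigest() + ext


first = cache_filename("我来到北京，清华大学", "Maria_Kasper", "非常开心")
second = cache_filename("我来到北京，清华大学", "Maria_Kasper", "非常开心")
other = cache_filename("我来到北京，清华大学", "Maria_Kasper", "哭唧唧")
print(first)
print(first == second, first == other)
```

A server could check whether a file with this name already exists before running synthesis. MD5 is used here only as a cache key, as the suggestion proposes, not for security.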

EmotiVoice/frontend.py", line 26, in split_py if py[-1] == 'r': IndexError: string index out of range

EmotiVoice/frontend.py", line 26, in split_py
if py[-1] == 'r':
IndexError: string index out of range

Test text:

抱歉刚刚的回答可能让你感到不满意了。作为一个大语言模型，我并不具备情感和自主意识，我的回答是基于大量的数据和算法生成的。如果我的回答有不准确或者不恰当的地方，还请您多多包涵和指教。
我是由百川智能的工程师们开发和维护的。他们是一群富有创造力和激情的人，致力于为我提供更好的服务和功能。
测试一下中英混合文本，hello,你好啊。Hello, this is the best test for now。我们很期待您的到来，希望你在这次盛会中得到你想要的结果。
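The traceback shows `py[-1]` being evaluated on an empty string, which raises `IndexError`. A small Python sketch of a guard follows. `ends_with_r` is a hypothetical stand-in for that check, not the project's `split_py`, and how an empty `py` arises from the test text above is not determined here.

```python
def ends_with_r(py: str) -> bool:
    """Safe replacement for `py[-1] == 'r'`: str.endswith returns False for an empty string."""
    return py.endswith("r")


# Indexing an empty string reproduces the error from the traceback.
try:
    ""[-1]
except IndexError as exc:
    print("IndexError:", exc)

print(ends_with_r(""))
print(ends_with_r("huar"))
print(ends_with_r("hua"))
```

Whether an empty syllable should be skipped or reported is a separate decision; the guard only prevents the crash.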

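Can this run on an Apple Mac with an M2 chip?

Will Apple environments be supported later?

Can this project run on non-NVIDIA graphics cards and GPUs?

Can this project run on non-NVIDIA graphics cards and GPUs? I do not mean running it in Docker; I mean, for example, when I need to modify the code.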

How to obtain the pretrained model

Are there version requirements for CUDA, Python, and so on?

AttributeError: 'NoneType' object has no attribute 'seek'.

I tried to run the program on Windows 10, and the web page opens with this error:
AttributeError: 'NoneType' object has no attribute 'seek'. You can only torch.load from a file that is seekable. Please pre-load the data into a buffer like io.BytesIO and try to load from it instead.

Thank you 😀

The encoding in config.py still has not been fixed

Line 28:
with open(file_path, encoding = "UTF-8") as f: