shibing624 / parrots

Automatic Speech Recognition (ASR) and Text-To-Speech (TTS) engine. Chinese and English speech recognition, multi-speaker speech synthesis, multilingual support, high accuracy.

License: Apache License 2.0

Python 100.00%

speech-recognition tts parrot text-to-speech-python3 pinyin2hanzi chinese-speech-recognition chinese-speech-synthesis

parrots's Introduction

Parrots: ASR and TTS toolkit

Introduction

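Parrots is an Automatic Speech Recognition (ASR) and Text-To-Speech (TTS) toolkit that supports Chinese, English, Japanese, and other languages.

parrots provides one-call, out-of-the-box access to speech recognition and speech synthesis models, with support for Chinese and English.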

Features

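ASR: a Chinese speech recognition (ASR) model based on distilwhisper, supporting Chinese, English, and other languages
TTS: a speech synthesis (TTS) model trained with GPT-SoVITS, supporting Chinese, English, Japanese, and other languages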

Install

pip install torch # or conda install pytorch
pip install -r requirements.txt
pip install parrots

or install from source:

pip install torch # or conda install pytorch
git clone https://github.com/shibing624/parrots.git
cd parrots
python setup.py install

Demo

Official Demo: https://www.mulanai.com/product/tts/
HuggingFace Demo: https://huggingface.co/spaces/shibing624/parrots

Run the example examples/tts_gradio_demo.py to see the demo:

python examples/tts_gradio_demo.py

Usage

ASR(Speech Recognition)

example: examples/demo_asr.py

import os
import sys

sys.path.append('..')
from parrots import SpeechRecognition

pwd_path = os.path.abspath(os.path.dirname(__file__))

if __name__ == '__main__':
    m = SpeechRecognition()
    r = m.recognize_speech_from_file(os.path.join(pwd_path, 'tushuguan.wav'))
    print('[Info] Speech recognition result:', r)

output:

{'text': '北京图书馆'}
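
To transcribe several recordings with a single loaded model, the call above can be wrapped in a loop. The following is a minimal sketch and not part of parrots: `transcribe_dir` is a hypothetical helper, and it assumes only that the recognizer callable returns a dict with a 'text' key, as in the sample output above.

```python
import os


def transcribe_dir(recognize, directory, suffix=".wav"):
    """Run `recognize` on every file in `directory` ending with `suffix`.

    `recognize` is any callable that takes a file path and returns a dict
    with a 'text' key (an assumption based on the sample output above).
    Returns {file name: recognized text}, in sorted file-name order.
    """
    results = {}
    for name in sorted(os.listdir(directory)):
        if name.lower().endswith(suffix):
            result = recognize(os.path.join(directory, name))
            results[name] = result["text"]
    return results
```

With parrots installed, passing `SpeechRecognition().recognize_speech_from_file` as `recognize` should reuse one model instance for all files instead of reloading it per file.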

TTS(Speech Synthesis)

example: examples/demo_tts.py

import sys
sys.path.append('..')
import parrots
from parrots import TextToSpeech
parrots_path = parrots.__path__[0]
sys.path.append(parrots_path)

m = TextToSpeech(
    speaker_model_path="shibing624/parrots-gpt-sovits-speaker-maimai",
    speaker_name="MaiMai",
)
m.predict(
    text="你好，欢迎来北京。welcome to the city.",
    text_language="auto",
    output_path="output_audio.wav"
)

output:

Save audio to output_audio.wav

Command-Line Interface (CLI)

ASR and TTS tasks can be run from the command line; see cli.py.

> parrots -h                                    

NAME
    parrots

SYNOPSIS
    parrots COMMAND

COMMANDS
    COMMAND is one of the following:

     asr
       Entry point of asr, recognize speech from file

     tts
       Entry point of tts, generate speech audio from text

Run:

pip install parrots -U
# asr example
parrots asr -h
parrots asr examples/tushuguan.wav

# tts example
parrots tts -h
parrots tts "你好，欢迎来北京。welcome to the city." output_audio.wav

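asr and tts are subcommands: asr performs speech recognition and tts performs speech synthesis. The default model is a Chinese model.
For usage of each subcommand, see parrots asr -h.
In the example above, examples/tushuguan.wav is the audio_file_path argument of the asr command, i.e. the input audio file (required).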
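
The command layout described above (a top-level command with asr and tts subcommands) can be sketched with the standard-library argparse module. This is an illustration only and not the project's cli.py: `build_parser` is a hypothetical name, the argument names `text` and `output_path` are borrowed from the `predict` call in the TTS example, and the default output file name is an assumption.

```python
import argparse


def build_parser():
    """Build a parser with `asr` and `tts` subcommands (illustrative only)."""
    parser = argparse.ArgumentParser(prog="parrots")
    sub = parser.add_subparsers(dest="command", required=True)

    asr = sub.add_parser("asr", help="recognize speech from file")
    asr.add_argument("audio_file_path", help="input audio file (required)")

    tts = sub.add_parser("tts", help="generate speech audio from text")
    tts.add_argument("text", help="text to synthesize")
    # Assumed default; the real CLI may behave differently.
    tts.add_argument("output_path", nargs="?", default="output_audio.wav",
                     help="where to save the generated audio")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["asr", "examples/tushuguan.wav"])
print(args.command, args.audio_file_path)
```

Each subcommand gets its own positional arguments, which is why `parrots asr -h` and `parrots tts -h` print different help text.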

Release Models

ASR

BELLE-2/Belle-distilwhisper-large-v2-zh

TTS

shibing624/parrots-gpt-sovits-speaker

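speaker name	Chinese name	character	voice description	language code	language
KuileBlanc	葵·勒布朗	lady	standard American female voice	en	English
LongShouRen	龙守仁	gentleman	standard American male voice	en	English
MaiMai	卖卖	singing female anchor	singing female streamer voice	zh	Chinese
XingTong	星瞳	singing ai girl	lively female voice	zh	Chinese
XuanShen	炫神	game male anchor	male game streamer voice	zh	Chinese
KusanagiNene	草薙寧々	loli	young schoolgirl voice	ja	Japanese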
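
When choosing a speaker programmatically, the table can be kept as a small lookup. A minimal sketch: the names and language codes are copied from the table above, while `SPEAKER_LANGUAGES` and `speakers_for` are hypothetical names that parrots itself does not provide.

```python
# Speaker names and language codes copied from the table above.
SPEAKER_LANGUAGES = {
    "KuileBlanc": "en",
    "LongShouRen": "en",
    "MaiMai": "zh",
    "XingTong": "zh",
    "XuanShen": "zh",
    "KusanagiNene": "ja",
}


def speakers_for(language):
    """Return the speaker names listed for a language code, in table order."""
    return [name for name, lang in SPEAKER_LANGUAGES.items() if lang == language]


print(speakers_for("zh"))  # ['MaiMai', 'XingTong', 'XuanShen']
```

A returned name would presumably be passed as `speaker_name` to `TextToSpeech`, together with `speaker_model_path="shibing624/parrots-gpt-sovits-speaker"`.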

shibing624/parrots-gpt-sovits-speaker-maimai

speaker name	Chinese name	character	voice description	language code	language
MaiMai	卖卖	singing female anchor	singing female streamer voice	zh	Chinese

Contact

Issues (suggestions):
Email: xuming: [email protected]
WeChat: add my WeChat ID xuming624 to join the Python-NLP discussion group, with the note: name-company-NLP

Citation

If you use parrots in your research, please cite it in the following format:

@misc{parrots,
  title={parrots: ASR and TTS Tool},
  author={Ming Xu},
  year={2024},
  howpublished={\url{https://github.com/shibing624/parrots}},
}

License

Licensed under The Apache License 2.0, free for commercial use. Please include a link to parrots and the license in your product documentation.

Contribute

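The project code is still rough. If you have improved the code, you are welcome to submit it back to this project. Before submitting, please note the following two points:

Add corresponding unit tests in tests
Run all unit tests with python -m pytest and make sure they all pass

Then you can submit a PR.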

parrots's Issues

module 'tensorflow' has no attribute 'get_default_graph'

"C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\python.exe" "C:\Program Files\JetBrains\PyCharm Community Edition 2020.1.1\plugins\python-ce\helpers\pydev\pydevd.py" --multiproc --qt-support=auto --client 127.0.0.1 --port 57254 --file C:/Users/16413/Documents/GitHub/LostXmas/seq2seq/data/mining/SpeechRec/sr.py
pydev debugger: process 64336 is connecting

Connected to pydev debugger (build 201.7846.77)
2020-07-10 18:18:36.025768: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
Using TensorFlow backend.
C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\pydub\utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
  warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
2020-07-10 18:18:42,676 - C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\parrots\pinyin2hanzi.py - DEBUG - Loaded: C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\parrots\data\pinyin2hanzi\pinyin_hanzi_dict.txt, size: 1421
2020-07-10 18:18:42,676 - C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\parrots\pinyin2hanzi.py - DEBUG - Loaded: C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\parrots\data\pinyin2hanzi\char_idx.txt, size: 5832
2020-07-10 18:18:43,380 - C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\parrots\pinyin2hanzi.py - DEBUG - Loaded: C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\parrots\data\pinyin2hanzi\word_idx.txt, size: 568646
2020-07-10 18:18:43,630 - C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\parrots\pinyin2hanzi.py - DEBUG - Loaded: C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\parrots\data\pinyin2hanzi\dic_pinyin.txt, size: 96117
Backend TkAgg is interactive backend. Turning interactive mode on.
2020-07-10 18:18:46.081700: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-07-10 18:18:47.311791: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce RTX 2060 computeCapability: 7.5
coreClock: 1.335GHz coreCount: 30 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 312.97GiB/s
2020-07-10 18:18:47.312512: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-07-10 18:18:47.357397: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-07-10 18:18:47.396445: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-07-10 18:18:47.404674: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-07-10 18:18:47.411615: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-07-10 18:18:47.458602: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-07-10 18:18:47.689152: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-07-10 18:18:47.690095: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-07-10 18:18:47.691017: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-07-10 18:18:47.701188: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x20fce644d50 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-07-10 18:18:47.701708: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-07-10 18:18:47.702573: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce RTX 2060 computeCapability: 7.5
coreClock: 1.335GHz coreCount: 30 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 312.97GiB/s
2020-07-10 18:18:47.703142: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-07-10 18:18:47.703421: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-07-10 18:18:47.703701: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-07-10 18:18:47.703979: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-07-10 18:18:47.704255: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-07-10 18:18:47.704539: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-07-10 18:18:47.704827: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-07-10 18:18:47.705628: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-07-10 18:18:48.633762: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-07-10 18:18:48.634075: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108]      0 
2020-07-10 18:18:48.634244: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0:   N 
2020-07-10 18:18:48.635184: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4602 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2060, pci bus id: 0000:01:00.0, compute capability: 7.5)
2020-07-10 18:18:48.639308: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x20f87a4a760 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-07-10 18:18:48.639690: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce RTX 2060, Compute Capability 7.5
2020-07-10 18:18:49,452 - C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\parrots\speech_recognition.py - DEBUG - Loading pinyin dict cost 0.016 seconds.
2020-07-10 18:18:49,514 - C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\parrots\speech_recognition.py - DEBUG - Loading model cost 0.063 seconds.
2020-07-10 18:18:49,514 - C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\parrots\speech_recognition.py - DEBUG - Speech recognition model has been built ok.
Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2020.1.1\plugins\python-ce\helpers\pydev\pydevd.py", line 1438, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2020.1.1\plugins\python-ce\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:/Users/16413/Documents/GitHub/LostXmas/seq2seq/data/mining/SpeechRec/sr.py", line 4, in <module>
    text = parrots.recognize_speech_from_file('voice.wav')
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\parrots\speech_recognition.py", line 203, in recognize_speech_from_file
    return self.recognize_speech(signal, fs)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\parrots\speech_recognition.py", line 184, in recognize_speech
    self.check_initialized()
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\parrots\speech_recognition.py", line 69, in check_initialized
    self.initialize()
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\parrots\speech_recognition.py", line 64, in initialize
    self.graph = tf.get_default_graph()
AttributeError: module 'tensorflow' has no attribute 'get_default_graph'

Process finished with exit code 1
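
`tf.get_default_graph` is not available in the top-level namespace of TensorFlow 2.x, where it remains accessible as `tf.compat.v1.get_default_graph`, so this error usually means the installed TensorFlow is newer than the release the code was written for. A minimal sketch of a fallback lookup follows; `resolve_tf1_symbol` is a hypothetical helper that belongs to neither parrots nor TensorFlow, and it is exercised here with a stand-in object rather than TensorFlow itself.

```python
from types import SimpleNamespace


def resolve_tf1_symbol(tf_module, name):
    """Return tf_module.<name>, falling back to tf_module.compat.v1.<name>."""
    if hasattr(tf_module, name):
        return getattr(tf_module, name)
    compat_v1 = getattr(getattr(tf_module, "compat", None), "v1", None)
    if compat_v1 is not None and hasattr(compat_v1, name):
        return getattr(compat_v1, name)
    raise AttributeError(f"{name!r} not found in module or its compat.v1 namespace")


# Stand-in mimicking the TF 2.x layout: the symbol lives only under compat.v1.
fake_tf2 = SimpleNamespace(
    compat=SimpleNamespace(v1=SimpleNamespace(get_default_graph=lambda: "graph"))
)
get_graph = resolve_tf1_symbol(fake_tf2, "get_default_graph")
print(get_graph())  # graph
```

Because the failing call sits inside the installed parrots package, installing a TensorFlow 1.x release that matches that parrots version may be the more direct fix.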

pretrained model

Great job! Thanks for sharing! Where is the pretrained model? Also, the syllables.zip file is not available. Looking forward to your reply!

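How does this library perform, and how many concurrent requests can it support? How would streaming input over WebSocket be implemented? Thanks!

As in the title. Thanks!

Unable to download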

https://huggingface.co/spaces/shibing624/parrots cannot be opened.
Suggestion: provide a download mirror on a cloud drive accessible in mainland China.

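Speech sounds too mechanical

From trying it out, the current text-to-speech output sounds too mechanical: words are emitted at essentially uniform time intervals. Have you considered using deep learning to make the intonation more human-like?

Architectural description of parrots and how to train in English or any other language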

AttributeError

import parrots
text = parrots.speech_recognition_from_file('./16k.wav')
Traceback (most recent call last):
File "", line 1, in
AttributeError: module 'parrots' has no attribute 'speech_recognition_from_file'

How to solve this problem? Thanks.

Another audio file input error

ValueError: could not broadcast input array from shape (91597,200,1) into shape (1600,200,1)
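
The shapes in this message suggest that the feature array (91597 frames of 200 features) is larger than a fixed 1600-frame input buffer. One possible workaround, assuming 1600 frames really is the model's input limit, is to split long audio into segments and recognize each one separately. `frame_chunks` below is a hypothetical helper, not a parrots function.

```python
def frame_chunks(n_frames, max_frames=1600):
    """Yield (start, end) index pairs covering n_frames in chunks of at most max_frames."""
    if max_frames <= 0:
        raise ValueError("max_frames must be positive")
    for start in range(0, n_frames, max_frames):
        yield start, min(start + max_frames, n_frames)


chunks = list(frame_chunks(91597))
print(len(chunks), chunks[-1])  # 58 (91200, 91597)
```

Splitting at fixed offsets can cut a word in half at a boundary, so splitting on silence would likely give better results.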

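Low speech-to-text recognition accuracy

Environment:
Windows 10 Pro

Problem:
After setting up the environment, I ran the demo with the bundled example and with my own audio:

example:

With my own audio, likewise only the first syllable is recognized and nothing after it.

Goal:
The current conversion accuracy is a problem. Can it be improved further in the future?

keras import error during installation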

ImportError: cannot import name 'Adam' from 'keras.optimizers'
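
This ImportError usually indicates a version mismatch: the installed Keras release does not export `Adam` from `keras.optimizers`. Where pinning the Keras version is not an option, one workaround in your own code is to try several import locations in turn. The sketch below uses a hypothetical helper, `import_first`, and is demonstrated with standard-library names only.

```python
import importlib


def import_first(candidates):
    """Return the first attribute importable from a list of (module, attribute) pairs."""
    errors = []
    for module_name, attr in candidates:
        try:
            return getattr(importlib.import_module(module_name), attr)
        except (ImportError, AttributeError) as exc:
            errors.append(f"{module_name}.{attr}: {exc}")
    raise ImportError("none of the candidates could be imported:\n" + "\n".join(errors))


# Demonstration with standard-library names; the first candidate does not exist.
OrderedDict = import_first([
    ("collections", "NoSuchName"),
    ("collections", "OrderedDict"),
])
print(OrderedDict.__name__)  # OrderedDict
```

For this issue the candidates might be `("keras.optimizers", "Adam")` followed by `("tensorflow.keras.optimizers", "Adam")`; the second location is an assumption about the installed environment. If the failing import is inside the installed package itself, aligning the Keras and TensorFlow versions is likely the more direct fix.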

Error when calling m = TextToSpeech(speaker_model_path='shibing624/parrots-gpt-sovits-speaker-maimai', speaker_name='MaiMai')

The very first call fails. My PyTorch version is 2.2.1+cu121; is that too new?

Cell In[3], line 1
----> 1 m = TextToSpeech(speaker_model_path='shibing624/parrots-gpt-sovits-speaker-maimai', speaker_name='MaiMai')

File e:\bomb\proj\python\BarkVoice\parrots\tts.py:342, in TextToSpeech.__init__(self, bert_model_path, hubert_model_path, sovits_model_path, gpt_model_path, speaker_model_path, speaker_name, device, half)
339 raise ValueError("sovits_model_path, gpt_model_path or speaker_model_path must be provided")
341 # SoVITS
--> 342 sovits_dict = torch.load(sovits_model_path, map_location="cpu")
343 hps = DictToAttrRecursive(sovits_dict["config"])
344 logger.debug(f"SoVITS config: {hps}")

File d:\CondaEnv\envs\normal\lib\site-packages\torch\serialization.py:1026, in load(f, map_location, pickle_module, weights_only, mmap, **pickle_load_args)
1024 except RuntimeError as e:
1025 raise pickle.UnpicklingError(UNSAFE_MESSAGE + str(e)) from None
-> 1026 return _load(opened_zipfile,
1027 map_location,
1028 pickle_module,
1029 overall_storage=overall_storage,
1030 **pickle_load_args)
1031 if mmap:
1032 raise RuntimeError("mmap can only be used with files saved with "
1033 "`torch.save(_use_new_zipfile_serialization=True), "
1034 "please torch.save your checkpoint with this option in order to use mmap.")

File d:\CondaEnv\envs\normal\lib\site-packages\torch\serialization.py:1438, in _load(zip_file, map_location, pickle_module, pickle_file, overall_storage, **pickle_load_args)
1436 unpickler = UnpicklerWrapper(data_file, **pickle_load_args)
1437 unpickler.persistent_load = persistent_load
-> 1438 result = unpickler.load()
1440 torch._utils._validate_loaded_sparse_tensors()
1441 torch._C._log_api_usage_metadata(
1442 "torch.load.metadata", {"serialization_id": zip_file.serialization_id()}
1443 )

File d:\CondaEnv\envs\normal\lib\site-packages\torch\serialization.py:1431, in _load.<locals>.UnpicklerWrapper.find_class(self, mod_name, name)
1429 pass
1430 mod_name = load_module_mapping.get(mod_name, mod_name)
-> 1431 return super().find_class(mod_name, name)

ModuleNotFoundError: No module named 'utils'
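
`torch.load` unpickles the checkpoint, and unpickling imports whatever modules the saved objects reference; here that is a top-level module named `utils` that is not on the import path. The README's TTS example appends the parrots package directory to `sys.path` before constructing `TextToSpeech`, which may be what makes that module importable. A minimal sketch of the same step follows; `ensure_on_sys_path` is a hypothetical helper, and whether it resolves this particular report is an assumption.

```python
import os
import sys


def ensure_on_sys_path(directory):
    """Append `directory` to sys.path once; return True if it was added."""
    directory = os.path.abspath(directory)
    if directory in sys.path:
        return False
    sys.path.append(directory)
    return True
```

Calling `ensure_on_sys_path(parrots.__path__[0])` before creating `TextToSpeech` mirrors what the README example does with `sys.path.append(parrots_path)`.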

keras library version?

How can this error be resolved? Is my keras version too old, or is it something else?
Error message:
Traceback (most recent call last):
File "paddle_asr.py", line 25, in
test_parrots("/data/wav_ocr/2022103000000012/")
File "paddle_asr.py", line 22, in test_parrots
r = m.recognize_speech_from_file(input_path+wav)
File "/root/anaconda3/envs/noise_env/lib/python3.6/site-packages/parrots/asr.py", line 197, in recognize_speech_from_file
return self.recognize_speech(signal, fs)
File "/root/anaconda3/envs/noise_env/lib/python3.6/site-packages/parrots/asr.py", line 178, in recognize_speech
self.check_initialized()
File "/root/anaconda3/envs/noise_env/lib/python3.6/site-packages/parrots/asr.py", line 66, in check_initialized
self.initialize()
File "/root/anaconda3/envs/noise_env/lib/python3.6/site-packages/parrots/asr.py", line 53, in initialize
self._model.load_weights(self.model_path)
File "/root/anaconda3/envs/noise_env/lib/python3.6/site-packages/tensorflow/python/keras/engine/network.py", line 1516, in load_weights
saving.load_weights_from_hdf5_group(f, self.layers)
File "/root/anaconda3/envs/noise_env/lib/python3.6/site-packages/tensorflow/python/keras/engine/saving.py", line 772, in load_weights_from_hdf5_group
original_keras_version = f.attrs['keras_version'].decode('utf8')
AttributeError: 'str' object has no attribute 'decode'
Simple calling code used for a performance test:
import os
import time

from parrots import SpeechRecognition, Pinyin2Hanzi

start_time = time.time()


def test_parrots(input_path):
    # Load both models once, then process every .wav file in the directory.
    m = SpeechRecognition()
    n = Pinyin2Hanzi()
    for wav in os.listdir(input_path):
        if wav.endswith(".wav"):
            r = m.recognize_speech_from_file(os.path.join(input_path, wav))
            text = n.pinyin_2_hanzi(r)
    print("parrots-ocr-finished")


test_parrots("/data/wav_ocr/2022103000000012/")
end_time = time.time()
print(end_time - start_time)
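
The `'str' object has no attribute 'decode'` failure in the traceback above is commonly seen when h5py 3.x is installed alongside an older Keras or TensorFlow: h5py 3 returns string attributes as `str`, while the older weight loader expects `bytes` and calls `.decode('utf8')` on them. Installing an h5py release below 3.0 is the usual remedy. The difference between the two cases can be illustrated with a small helper; `as_text` is a hypothetical name used only for this sketch.

```python
def as_text(value, encoding="utf8"):
    """Return `value` as str, decoding it first if it is bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# bytes (older h5py) and str (h5py 3.x) both end up as the same text.
print(as_text(b"2.2.4"), as_text("2.2.4"))  # 2.2.4 2.2.4
```

The unconditional `.decode('utf8')` in the traceback handles only the `bytes` case, which is why it breaks once the attribute arrives as `str`.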

import sys

sys.path.append('..')
from parrots import TextToSpeech

if __name__ == '__main__':
    m = TextToSpeech()
    # say text
    m.speak('北京图书馆')

Output:

2023-03-08 21:39:16.605 | DEBUG    | parrots.tts:speak:66 - ['bei3', 'jing1', 'tu2', 'shu1', 'guan3']

But no sound is played. On Windows, other text-to-speech projects I tested do output sound.

Does this TTS require an online (networked) service?
