shibing624 / parrots

Automatic Speech Recognition (ASR) and Text-To-Speech (TTS) engine. Chinese and English speech recognition, multi-speaker speech synthesis, multilingual support, high accuracy.

License: Apache License 2.0

Python 100.00%
speech-recognition tts parrot text-to-speech-python3 pinyin2hanzi chinese-speech-recognition chinese-speech-synthesis

parrots's Introduction


Parrots: ASR and TTS toolkit

Introduction

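Parrots is an Automatic Speech Recognition (ASR) and Text-To-Speech (TTS) toolkit that supports Chinese, English, Japanese, and other languages.

parrots provides one-call access to speech recognition and speech synthesis models, works out of the box, and supports Chinese and English.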

Features

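  1. ASR: a Chinese speech recognition (ASR) model based on distilwhisper, supporting Chinese, English, and other languages
  2. TTS: a speech synthesis (TTS) model trained with GPT-SoVITS, supporting Chinese, English, Japanese, and other languages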

Install

pip install torch # or conda install pytorch
pip install -r requirements.txt
pip install parrots

or

pip install torch # or conda install pytorch
git clone https://github.com/shibing624/parrots.git
cd parrots
python setup.py install

Demo

Run examples/tts_gradio_demo.py to see the demo:

python examples/tts_gradio_demo.py

Usage

ASR(Speech Recognition)

example: examples/demo_asr.py

import os
import sys

sys.path.append('..')
from parrots import SpeechRecognition

pwd_path = os.path.abspath(os.path.dirname(__file__))

if __name__ == '__main__':
    m = SpeechRecognition()
    r = m.recognize_speech_from_file(os.path.join(pwd_path, 'tushuguan.wav'))
    print('[Info] Speech recognition result:', r)

output:

{'text': '北京图书馆'}
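
The single-file call above extends naturally to a folder of recordings. The following is a minimal sketch, assuming only what the example shows: an object with a recognize_speech_from_file(path) method that returns a dict with a 'text' key. The helper name transcribe_directory and the EchoRecognizer stand-in are hypothetical and are not part of parrots.

```python
import os


def transcribe_directory(recognizer, audio_dir):
    """Return {filename: recognized text} for every .wav file in audio_dir.

    `recognizer` is any object exposing recognize_speech_from_file(path)
    that returns a dict with a 'text' key, as in the ASR example.
    """
    results = {}
    for name in sorted(os.listdir(audio_dir)):
        if not name.lower().endswith(".wav"):
            continue  # skip files that are not WAV audio
        result = recognizer.recognize_speech_from_file(os.path.join(audio_dir, name))
        results[name] = result["text"]
    return results


class EchoRecognizer:
    """Hypothetical stand-in so the sketch runs without loading a model."""

    def recognize_speech_from_file(self, path):
        # Pretend the recognized text is simply the file name.
        return {"text": os.path.basename(path)}
```

With parrots installed, passing SpeechRecognition() in place of EchoRecognizer() should work the same way, provided the return value has the shape shown in the output above.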

TTS(Speech Synthesis)

example: examples/demo_tts.py

import sys
sys.path.append('..')
import parrots
from parrots import TextToSpeech
parrots_path = parrots.__path__[0]
sys.path.append(parrots_path)

m = TextToSpeech(
    speaker_model_path="shibing624/parrots-gpt-sovits-speaker-maimai",
    speaker_name="MaiMai",
)
m.predict(
    text="你好,欢迎来北京。welcome to the city.",
    text_language="auto",
    output_path="output_audio.wav"
)

output:

Save audio to output_audio.wav
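
To synthesize several sentences in one run, the predict call can be wrapped in a loop. This is a sketch under stated assumptions: it uses only the text, text_language, and output_path keyword arguments shown above, and the names synthesize_all and RecordingTTS are hypothetical helpers for illustration, not part of parrots.

```python
import os


def synthesize_all(tts, texts, output_dir):
    """Write one WAV file per text and return the list of output paths."""
    os.makedirs(output_dir, exist_ok=True)
    paths = []
    for index, text in enumerate(texts):
        output_path = os.path.join(output_dir, f"utterance_{index:03d}.wav")
        # Same keyword arguments as the m.predict(...) call in the example.
        tts.predict(text=text, text_language="auto", output_path=output_path)
        paths.append(output_path)
    return paths


class RecordingTTS:
    """Hypothetical stand-in that records calls instead of generating audio."""

    def __init__(self):
        self.calls = []

    def predict(self, text, text_language, output_path):
        self.calls.append((text, text_language, output_path))
```

Reusing one TextToSpeech instance for all texts avoids reloading the speaker model for each sentence.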

Command-line mode (CLI)

ASR and TTS tasks can be run from the command line. Code: cli.py

> parrots -h                                    

NAME
    parrots

SYNOPSIS
    parrots COMMAND

COMMANDS
    COMMAND is one of the following:

     asr
       Entry point of asr, recognize speech from file

     tts
       Entry point of tts, generate speech audio from text

run:

pip install parrots -U
# asr example
parrots asr -h
parrots asr examples/tushuguan.wav

# tts example
parrots tts -h
parrots tts "你好,欢迎来北京。welcome to the city." output_audio.wav
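  • asr and tts are subcommands: asr performs speech recognition and tts performs speech synthesis; the default model is the Chinese model
  • For usage of each subcommand, see parrots asr -h
  • In the example above, examples/tushuguan.wav is the audio_file_path argument of the asr command: the input audio file (required)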
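
The same commands can be driven from a Python script. The sketch below only assembles the argument lists shown in the run example and hands them to subprocess; the function names are hypothetical, and it assumes the parrots executable is on PATH.

```python
import subprocess


def build_asr_command(audio_file_path):
    """Argument list equivalent to: parrots asr <audio_file_path>"""
    return ["parrots", "asr", audio_file_path]


def build_tts_command(text, output_path):
    """Argument list equivalent to: parrots tts "<text>" <output_path>"""
    return ["parrots", "tts", text, output_path]


def run_parrots(command):
    """Run a parrots CLI command and return its standard output as text."""
    completed = subprocess.run(command, check=True, capture_output=True, text=True)
    return completed.stdout
```

Passing the arguments as a list rather than a shell string means the mixed Chinese and English text needs no extra quoting.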

Release Models

ASR

TTS

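speaker name | Chinese name | character | voice description | language
KuileBlanc | 葵·勒布朗 | lady | standard American female voice | en
LongShouRen | 龙守仁 | gentleman | standard American male voice | en
MaiMai | 卖卖 | singing female anchor | singing female streamer voice | zh
XingTong | 星瞳 | singing ai girl | lively female voice | zh
XuanShen | 炫神 | game male anchor | male game streamer voice | zh
KusanagiNene | 草薙寧々 | loli | young schoolgirl voice | ja

speaker name | Chinese name | character | voice description | language
MaiMai | 卖卖 | singing female anchor | singing female streamer voice | zh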

Contact

  • Issues (suggestions): GitHub issues
  • Email: xuming: [email protected]
  • WeChat: add my WeChat ID xuming624 to join the Python-NLP discussion group; include the note: name-company-NLP

Citation

If you use parrots in your research, please cite it in the following format:

@misc{parrots,
  title={parrots: ASR and TTS Tool},
  author={Ming Xu},
  year={2024},
  howpublished={\url{https://github.com/shibing624/parrots}},
}

License

Licensed under The Apache License 2.0, free for commercial use. Please include a link to parrots and the license in your product description.

Contribute

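The project code is still rough. If you improve it, you are welcome to contribute your changes back to this project. Before submitting, please note the following two points:

  • Add corresponding unit tests in tests
  • Run all unit tests with python -m pytest and make sure they all pass

You can then submit a PR.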

parrots's People

Contributors

daxiongpro, nuck555, shibing624, sonictl

parrots's Issues

module 'tensorflow' has no attribute 'get_default_graph'

"C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\python.exe" "C:\Program Files\JetBrains\PyCharm Community Edition 2020.1.1\plugins\python-ce\helpers\pydev\pydevd.py" --multiproc --qt-support=auto --client 127.0.0.1 --port 57254 --file C:/Users/16413/Documents/GitHub/LostXmas/seq2seq/data/mining/SpeechRec/sr.py
pydev debugger: process 64336 is connecting

Connected to pydev debugger (build 201.7846.77)
2020-07-10 18:18:36.025768: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
Using TensorFlow backend.
C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\pydub\utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
  warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
2020-07-10 18:18:42,676 - C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\parrots\pinyin2hanzi.py - DEBUG - Loaded: C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\parrots\data\pinyin2hanzi\pinyin_hanzi_dict.txt, size: 1421
2020-07-10 18:18:42,676 - C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\parrots\pinyin2hanzi.py - DEBUG - Loaded: C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\parrots\data\pinyin2hanzi\char_idx.txt, size: 5832
2020-07-10 18:18:43,380 - C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\parrots\pinyin2hanzi.py - DEBUG - Loaded: C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\parrots\data\pinyin2hanzi\word_idx.txt, size: 568646
2020-07-10 18:18:43,630 - C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\parrots\pinyin2hanzi.py - DEBUG - Loaded: C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\parrots\data\pinyin2hanzi\dic_pinyin.txt, size: 96117
Backend TkAgg is interactive backend. Turning interactive mode on.
2020-07-10 18:18:46.081700: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-07-10 18:18:47.311791: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce RTX 2060 computeCapability: 7.5
coreClock: 1.335GHz coreCount: 30 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 312.97GiB/s
2020-07-10 18:18:47.312512: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-07-10 18:18:47.357397: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-07-10 18:18:47.396445: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-07-10 18:18:47.404674: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-07-10 18:18:47.411615: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-07-10 18:18:47.458602: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-07-10 18:18:47.689152: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-07-10 18:18:47.690095: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-07-10 18:18:47.691017: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-07-10 18:18:47.701188: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x20fce644d50 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-07-10 18:18:47.701708: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-07-10 18:18:47.702573: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce RTX 2060 computeCapability: 7.5
coreClock: 1.335GHz coreCount: 30 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 312.97GiB/s
2020-07-10 18:18:47.703142: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-07-10 18:18:47.703421: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-07-10 18:18:47.703701: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-07-10 18:18:47.703979: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-07-10 18:18:47.704255: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-07-10 18:18:47.704539: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-07-10 18:18:47.704827: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-07-10 18:18:47.705628: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-07-10 18:18:48.633762: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-07-10 18:18:48.634075: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108]      0 
2020-07-10 18:18:48.634244: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0:   N 
2020-07-10 18:18:48.635184: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4602 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2060, pci bus id: 0000:01:00.0, compute capability: 7.5)
2020-07-10 18:18:48.639308: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x20f87a4a760 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-07-10 18:18:48.639690: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce RTX 2060, Compute Capability 7.5
2020-07-10 18:18:49,452 - C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\parrots\speech_recognition.py - DEBUG - Loading pinyin dict cost 0.016 seconds.
2020-07-10 18:18:49,514 - C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\parrots\speech_recognition.py - DEBUG - Loading model cost 0.063 seconds.
2020-07-10 18:18:49,514 - C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\parrots\speech_recognition.py - DEBUG - Speech recognition model has been built ok.
Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2020.1.1\plugins\python-ce\helpers\pydev\pydevd.py", line 1438, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2020.1.1\plugins\python-ce\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:/Users/16413/Documents/GitHub/LostXmas/seq2seq/data/mining/SpeechRec/sr.py", line 4, in <module>
    text = parrots.recognize_speech_from_file('voice.wav')
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\parrots\speech_recognition.py", line 203, in recognize_speech_from_file
    return self.recognize_speech(signal, fs)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\parrots\speech_recognition.py", line 184, in recognize_speech
    self.check_initialized()
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\parrots\speech_recognition.py", line 69, in check_initialized
    self.initialize()
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\parrots\speech_recognition.py", line 64, in initialize
    self.graph = tf.get_default_graph()
AttributeError: module 'tensorflow' has no attribute 'get_default_graph'

Process finished with exit code 1
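
This AttributeError is typical of TensorFlow 1.x code running on TensorFlow 2.x, where get_default_graph is no longer a top-level attribute but remains available as tf.compat.v1.get_default_graph. The sketch below shows a version-tolerant lookup. The helper name is hypothetical and the SimpleNamespace objects are stand-ins for the two module layouts so the example runs without TensorFlow. Whether this change alone makes the older parrots ASR code work is not confirmed by the report.

```python
from types import SimpleNamespace


def resolve_get_default_graph(tf_module):
    """Return a callable that yields the default graph on TF 1.x or 2.x."""
    top_level = getattr(tf_module, "get_default_graph", None)
    if top_level is not None:
        return top_level  # TensorFlow 1.x layout
    return tf_module.compat.v1.get_default_graph  # TensorFlow 2.x layout


# Hypothetical stand-ins that mimic the two module layouts.
fake_tf1 = SimpleNamespace(get_default_graph=lambda: "graph-from-tf1")
fake_tf2 = SimpleNamespace(
    compat=SimpleNamespace(
        v1=SimpleNamespace(get_default_graph=lambda: "graph-from-compat-v1")
    )
)
```

With a real installation, the call would be resolve_get_default_graph(tf)() after import tensorflow as tf.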

pretrained model

Great job! Thanks for sharing! Where is the pretrained model? Also, the syllables.zip file is not available. Looking forward to your reply!

Does distil-whisper support Chinese? Is the quality usable?

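The speech sounds too mechanical

From my trial, the current text-to-speech output sounds too mechanical; words are emitted at essentially uniform time intervals. Have you considered using deep learning to make the intonation more human-like?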

AttributeError

import parrots
text = parrots.speech_recognition_from_file('./16k.wav')
Traceback (most recent call last):
File "", line 1, in
AttributeError: module 'parrots' has no attribute 'speech_recognition_from_file'

How to solve this problem? Thanks.

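Low speech-to-text recognition accuracy

Environment:
Windows 10 Pro

Problem:
After setting up the environment, I ran the demo with the bundled example and with my own audio:

Example:
[image]

With my own audio, likewise only the first syllable is recognized and nothing after it.

Goal:
I would like to ask: the current conversion accuracy has problems; can it be improved further in the future?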

Calling m = TextToSpeech(speaker_model_path='shibing624/parrots-gpt-sovits-speaker-maimai', speaker_name='MaiMai') raises an error

The very first call fails. My PyTorch version is 2.2.1+cu121; is that too new?

Cell In[3], line 1
----> 1 m = TextToSpeech(speaker_model_path='shibing624/parrots-gpt-sovits-speaker-maimai', speaker_name='MaiMai')

File e:\bomb\proj\python\BarkVoice\parrots\tts.py:342, in TextToSpeech.__init__(self, bert_model_path, hubert_model_path, sovits_model_path, gpt_model_path, speaker_model_path, speaker_name, device, half)
339 raise ValueError("sovits_model_path, gpt_model_path or speaker_model_path must be provided")
341 # SoVITS
--> 342 sovits_dict = torch.load(sovits_model_path, map_location="cpu")
343 hps = DictToAttrRecursive(sovits_dict["config"])
344 logger.debug(f"SoVITS config: {hps}")

File d:\CondaEnv\envs\normal\lib\site-packages\torch\serialization.py:1026, in load(f, map_location, pickle_module, weights_only, mmap, **pickle_load_args)
1024 except RuntimeError as e:
1025 raise pickle.UnpicklingError(UNSAFE_MESSAGE + str(e)) from None
-> 1026 return _load(opened_zipfile,
1027 map_location,
1028 pickle_module,
1029 overall_storage=overall_storage,
1030 **pickle_load_args)
1031 if mmap:
1032 raise RuntimeError("mmap can only be used with files saved with "
1033 "`torch.save(_use_new_zipfile_serialization=True), "
1034 "please torch.save your checkpoint with this option in order to use mmap.")

File d:\CondaEnv\envs\normal\lib\site-packages\torch\serialization.py:1438, in _load(zip_file, map_location, pickle_module, pickle_file, overall_storage, **pickle_load_args)
1436 unpickler = UnpicklerWrapper(data_file, **pickle_load_args)
1437 unpickler.persistent_load = persistent_load
-> 1438 result = unpickler.load()
1440 torch._utils._validate_loaded_sparse_tensors()
1441 torch._C._log_api_usage_metadata(
1442 "torch.load.metadata", {"serialization_id": zip_file.serialization_id()}
1443 )

File d:\CondaEnv\envs\normal\lib\site-packages\torch\serialization.py:1431, in _load.<locals>.UnpicklerWrapper.find_class(self, mod_name, name)
1429 pass
1430 mod_name = load_module_mapping.get(mod_name, mod_name)
-> 1431 return super().find_class(mod_name, name)

ModuleNotFoundError: No module named 'utils'
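
torch.load unpickles the checkpoint, and unpickling imports every module that the pickled objects reference, so this error suggests the checkpoint refers to a top-level module named utils that is not on sys.path. The README's TTS example appends parrots.__path__[0] to sys.path before constructing TextToSpeech. That step may be what makes the module importable, although the report does not confirm it. A minimal sketch of that step, with a hypothetical helper name, looks like this:

```python
import importlib.util
import sys


def add_package_dir_to_sys_path(package_name):
    """Append an installed package's directory to sys.path and return it."""
    spec = importlib.util.find_spec(package_name)
    if spec is None or not spec.submodule_search_locations:
        raise ImportError(f"{package_name!r} is not an importable package")
    package_dir = list(spec.submodule_search_locations)[0]
    if package_dir not in sys.path:
        # Modules inside the package become importable by their bare names.
        sys.path.append(package_dir)
    return package_dir
```

Calling add_package_dir_to_sys_path("parrots") before creating TextToSpeech would mirror the two sys.path lines in the README example.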

Keras library version?

How can I resolve this error? Is my Keras version too old, or is it something else?
Error message:
Traceback (most recent call last):
File "paddle_asr.py", line 25, in
test_parrots("/data/wav_ocr/2022103000000012/")
File "paddle_asr.py", line 22, in test_parrots
r = m.recognize_speech_from_file(input_path+wav)
File "/root/anaconda3/envs/noise_env/lib/python3.6/site-packages/parrots/asr.py", line 197, in recognize_speech_from_file
return self.recognize_speech(signal, fs)
File "/root/anaconda3/envs/noise_env/lib/python3.6/site-packages/parrots/asr.py", line 178, in recognize_speech
self.check_initialized()
File "/root/anaconda3/envs/noise_env/lib/python3.6/site-packages/parrots/asr.py", line 66, in check_initialized
self.initialize()
File "/root/anaconda3/envs/noise_env/lib/python3.6/site-packages/parrots/asr.py", line 53, in initialize
self._model.load_weights(self.model_path)
File "/root/anaconda3/envs/noise_env/lib/python3.6/site-packages/tensorflow/python/keras/engine/network.py", line 1516, in load_weights
saving.load_weights_from_hdf5_group(f, self.layers)
File "/root/anaconda3/envs/noise_env/lib/python3.6/site-packages/tensorflow/python/keras/engine/saving.py", line 772, in load_weights_from_hdf5_group
original_keras_version = f.attrs['keras_version'].decode('utf8')
AttributeError: 'str' object has no attribute 'decode'
A simple script used for performance testing:

    import os
    import time

    from parrots import SpeechRecognition, Pinyin2Hanzi


    def test_parrots(input_path):
        # Build both models once and reuse them for every file.
        m = SpeechRecognition()
        n = Pinyin2Hanzi()
        for wav in os.listdir(input_path):
            if wav.endswith(".wav"):
                r = m.recognize_speech_from_file(os.path.join(input_path, wav))
                text = n.pinyin_2_hanzi(r)
        print("parrots-ocr-finished")


    start_time = time.time()
    test_parrots("/data/wav_ocr/2022103000000012/")
    end_time = time.time()
    print(end_time - start_time)
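
The failing line in the traceback calls .decode('utf8') on the keras_version attribute read from the HDF5 weights file. This usually happens when h5py 3.x is installed, because it returns string attributes as str where older Keras and TensorFlow code expects bytes. Pinning h5py below 3.0 is a commonly reported workaround, though the report above does not confirm it. The sketch below shows the tolerant decoding pattern; the helper name as_text is hypothetical.

```python
def as_text(value, encoding="utf8"):
    """Return value as str, decoding it first if it is bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value
```

The helper accepts both bytes and str, so the same call works whichever type the attribute read returns.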

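Running the official example produces no audio output

I have been looking for a way to convert text to speech and found this project. First, thank you to the author for the work, but I ran into a problem when using it.

Test code: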

import sys

sys.path.append('..')
from parrots import TextToSpeech

if __name__ == '__main__':
    m = TextToSpeech()
    # say text
    m.speak('北京图书馆')

Output:

2023-03-08 21:39:16.605 | DEBUG    | parrots.tts:speak:66 - ['bei3', 'jing1', 'tu2', 'shu1', 'guan3']

However, no audio is played. On the same Windows machine, other text-to-speech projects I tested do produce sound.

Does this TTS require an online (internet-connected) service?
