kuangdd / ttskit
text to speech toolkit. An easy-to-use Chinese speech synthesis toolkit that includes a speech encoder, a speech synthesizer, a vocoder, and a visualization module.
License: MIT License
Don't waste your time; this project simply doesn't run.
from ttskit.resource import _speaker_dict
How to solve this problem
Building wheels for collected packages: llvmlite
Building wheel for llvmlite (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [26 lines of output]
running bdist_wheel
/Library/Frameworks/Python.framework/Versions/3.9/bin/python3 /private/var/folders/yn/4byzlmls27n1sn4b19pwp6nm0000gn/T/pip-install-o9dwgbtj/llvmlite_14beb0db95e84b99a61cb1db7d4980bb/ffi/build.py
LLVM version... Traceback (most recent call last):
File "/private/var/folders/yn/4byzlmls27n1sn4b19pwp6nm0000gn/T/pip-install-o9dwgbtj/llvmlite_14beb0db95e84b99a61cb1db7d4980bb/ffi/build.py", line 105, in main_posix
out = subprocess.check_output([llvm_config, '--version'])
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 424, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 505, in run
with Popen(*popenargs, **kwargs) as process:
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 951, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 1821, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'llvm-config'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/private/var/folders/yn/4byzlmls27n1sn4b19pwp6nm0000gn/T/pip-install-o9dwgbtj/llvmlite_14beb0db95e84b99a61cb1db7d4980bb/ffi/build.py", line 168, in <module>
main()
File "/private/var/folders/yn/4byzlmls27n1sn4b19pwp6nm0000gn/T/pip-install-o9dwgbtj/llvmlite_14beb0db95e84b99a61cb1db7d4980bb/ffi/build.py", line 162, in main
main_posix('osx', '.dylib')
File "/private/var/folders/yn/4byzlmls27n1sn4b19pwp6nm0000gn/T/pip-install-o9dwgbtj/llvmlite_14beb0db95e84b99a61cb1db7d4980bb/ffi/build.py", line 107, in main_posix
raise RuntimeError("%s failed executing, please point LLVM_CONFIG "
RuntimeError: llvm-config failed executing, please point LLVM_CONFIG to the path for llvm-config
error: command '/Library/Frameworks/Python.framework/Versions/3.9/bin/python3' failed with exit code 1
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for llvmlite
Running setup.py clean for llvmlite
Failed to build llvmlite
Installing collected packages: llvmlite, zope.interface, zope.event, torch, numba, greenlet, matplotlib, gevent, umap-learn, ttskit
Attempting uninstall: llvmlite
Found existing installation: llvmlite 0.38.0
Uninstalling llvmlite-0.38.0:
Successfully uninstalled llvmlite-0.38.0
Running setup.py install for llvmlite ... error
error: subprocess-exited-with-error
× Running setup.py install for llvmlite did not run successfully.
│ exit code: 1
╰─> [29 lines of output]
running install
running build
got version from file /private/var/folders/yn/4byzlmls27n1sn4b19pwp6nm0000gn/T/pip-install-o9dwgbtj/llvmlite_14beb0db95e84b99a61cb1db7d4980bb/llvmlite/_version.py {'version': '0.31.0', 'full': 'fe7d985f6421d87f613bd414479d29d912771562'}
running build_ext
/Library/Frameworks/Python.framework/Versions/3.9/bin/python3 /private/var/folders/yn/4byzlmls27n1sn4b19pwp6nm0000gn/T/pip-install-o9dwgbtj/llvmlite_14beb0db95e84b99a61cb1db7d4980bb/ffi/build.py
LLVM version... Traceback (most recent call last):
File "/private/var/folders/yn/4byzlmls27n1sn4b19pwp6nm0000gn/T/pip-install-o9dwgbtj/llvmlite_14beb0db95e84b99a61cb1db7d4980bb/ffi/build.py", line 105, in main_posix
out = subprocess.check_output([llvm_config, '--version'])
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 424, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 505, in run
with Popen(*popenargs, **kwargs) as process:
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 951, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 1821, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'llvm-config'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/private/var/folders/yn/4byzlmls27n1sn4b19pwp6nm0000gn/T/pip-install-o9dwgbtj/llvmlite_14beb0db95e84b99a61cb1db7d4980bb/ffi/build.py", line 168, in <module>
main()
File "/private/var/folders/yn/4byzlmls27n1sn4b19pwp6nm0000gn/T/pip-install-o9dwgbtj/llvmlite_14beb0db95e84b99a61cb1db7d4980bb/ffi/build.py", line 162, in main
main_posix('osx', '.dylib')
File "/private/var/folders/yn/4byzlmls27n1sn4b19pwp6nm0000gn/T/pip-install-o9dwgbtj/llvmlite_14beb0db95e84b99a61cb1db7d4980bb/ffi/build.py", line 107, in main_posix
raise RuntimeError("%s failed executing, please point LLVM_CONFIG "
RuntimeError: llvm-config failed executing, please point LLVM_CONFIG to the path for llvm-config
error: command '/Library/Frameworks/Python.framework/Versions/3.9/bin/python3' failed with exit code 1
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
WARNING: No metadata found in /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages
Rolling back uninstall of llvmlite
Moving to /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/llvmlite-0.38.0.dist-info/
from /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/~lvmlite-0.38.0.dist-info
Moving to /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/llvmlite/
from /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/~lvmlite
error: legacy-install-failure
× Encountered error while trying to install package.
╰─> llvmlite
note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.
[wuyu@wuyudeMacBook-Pro Documents]$>
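The log above shows pip building llvmlite 0.31.0 from source and the build script failing because it cannot run llvm-config. Below is a minimal sketch of a pre-flight check. It assumes the build uses the executable named by the LLVM_CONFIG environment variable and otherwise falls back to llvm-config on PATH, which is what the error message suggests. find_llvm_config is a hypothetical helper, not part of llvmlite or ttskit.

```python
import os
import shutil
import sys


def find_llvm_config(env=None):
    """Return the llvm-config path the build would likely use, or None.

    Assumption: LLVM_CONFIG overrides the default name "llvm-config",
    as hinted by the RuntimeError text in the log.
    """
    env = os.environ if env is None else env
    candidate = env.get("LLVM_CONFIG", "llvm-config")
    return shutil.which(candidate, path=env.get("PATH"))


if __name__ == "__main__":
    path = find_llvm_config()
    if path is None:
        print("llvm-config not found; a source build of llvmlite will likely fail")
        sys.exit(1)
    print("llvm-config found at", path)
```

If the check fails, the options are to install an LLVM build that provides llvm-config and point LLVM_CONFIG at it, or to use a Python version for which a prebuilt llvmlite wheel of the required version exists.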
D:\ai\ttskit\ttskit\mellotron\stft.py:67: FutureWarning: Pass size=1024 as keyword args. From version 0.10 passing these as positional arguments will result in an error
fft_window = pad_center(fft_window, filter_length)
D:\ai\ttskit\ttskit\mellotron\layers.py:66: FutureWarning: Pass sr=22050, n_fft=1024, n_mels=80, fmin=0.0, fmax=8000.0 as keyword args. From version 0.10 passing these as positional arguments will result in an error
sampling_rate, filter_length, n_mel_channels, mel_fmin, mel_fmax)
D:\ai\ttskit\ttskit\mellotron\stft.py:67: FutureWarning: Pass size=1024 as keyword args. From version 0.10 passing these as positional arguments will result in an error
fft_window = pad_center(fft_window, filter_length)
D:\ai\ttskit\ttskit\mellotron\layers.py:66: FutureWarning: Pass sr=22050, n_fft=1024, n_mels=80, fmin=0.0, fmax=8000.0 as keyword args. From version 0.10 passing these as positional arguments will result in an error
sampling_rate, filter_length, n_mel_channels, mel_fmin, mel_fmax)
Because I'm on an ARM Mac, I replaced tensorflow by installing pip install tensorflow-macos instead; nothing else was changed.
Environment: ARM macOS 12.6 + Python 3.9.16
After entering Python:
>>> from ttskit import http_server
>>> http_server.start_sever()
The second command raises an error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/cherish/py3.9/lib/python3.9/site-packages/ttskit/http_server.py", line 95, in start_sever
from . import sdk_api
File "/Users/cherish/py3.9/lib/python3.9/site-packages/ttskit/sdk_api.py", line 55, in <module>
_stft = TacotronSTFT(
File "/Users/cherish/py3.9/lib/python3.9/site-packages/ttskit/mellotron/layers.py", line 64, in __init__
self.stft_fn = STFT(filter_length, hop_length, win_length)
File "/Users/cherish/py3.9/lib/python3.9/site-packages/ttskit/mellotron/stft.py", line 67, in __init__
fft_window = pad_center(fft_window, filter_length)
TypeError: pad_center() takes 1 positional argument but 2 were given
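The FutureWarning quoted earlier on this page says that from librosa 0.10 onward, passing size positionally to pad_center becomes an error, which matches this TypeError. The sketch below reproduces the failure and the keyword-argument fix without needing librosa. The pad_center defined here is a simplified stand-in that only imitates the keyword-only size parameter, and the window length of 800 is a made-up value for illustration.

```python
def pad_center(data, *, size):
    """Simplified stand-in for librosa.util.pad_center with keyword-only `size`.

    This is NOT librosa's implementation; it only zero-pads a 1-D sequence
    symmetrically so the call pattern can be demonstrated.
    """
    n = len(data)
    if size < n:
        raise ValueError("size must be at least len(data)")
    left = (size - n) // 2
    return [0.0] * left + list(data) + [0.0] * (size - n - left)


fft_window = [1.0] * 800   # hypothetical window, win_length = 800
filter_length = 1024       # value taken from the warning text (size=1024)

try:
    # Old call style, as on line 67 of ttskit/mellotron/stft.py in the traceback.
    pad_center(fft_window, filter_length)
    positional_error = None
except TypeError as exc:
    positional_error = str(exc)

# New call style: pass size as a keyword argument.
padded = pad_center(fft_window, size=filter_length)
print(positional_error)
print(len(padded))
```

Under that reading, there are two possible workarounds: edit the call in stft.py to pad_center(fft_window, size=filter_length), or install a librosa release older than 0.10. Neither is confirmed by the maintainer in these issues.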
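The speaker parameter is the voice; how does it differ from audio (the reference audio)? Is this voice cloning? An example that uses both parameters would be ideal.
Polyphonic characters such as “还” and “长” are always mispronounced. Is there a way to mark the pronunciation manually to reduce such errors?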
@:~/sub-tts$ tkcli
Traceback (most recent call last):
File "/usr/local/bin/tkcli", line 5, in <module>
from ttskit.cli_api import tts_cli
ModuleNotFoundError: No module named 'ttskit'
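Use ttskit-main as your own project folder.
Put the resource folder into the ttskit-main\ttskit folder, overwriting the existing resource folder.
Do not replace ttskit-main\ttskit\resource\__init__.py; if resource\__init__.py was overwritten, restore that file on its own.
Open a command line in the ttskit-main directory, type pip install -U ttskit, and press Enter.
In the ttskit-main folder, create a file named demo.py and enter the following code:
from ttskit import http_server
http_server.start_sever()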
from ttskit import sdk_api
wav = sdk_api.tts_sdk('文本', audio='24')
How do I make it play the sound?
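One possible way to hear the result is sketched below. It assumes tts_sdk returns the bytes of a complete WAV file, which the calling code quoted elsewhere on this page suggests, since it writes the return value straight to a file. player_command and play_wav_bytes are hypothetical helpers. On macOS and Linux the sketch relies on the afplay and aplay command-line players being installed.

```python
import subprocess
import sys
import tempfile


def player_command(path, platform=sys.platform):
    """Pick a command-line WAV player for the platform, or None if unknown."""
    if platform == "darwin":
        return ["afplay", path]
    if platform.startswith("linux"):
        return ["aplay", path]
    return None  # Windows is handled through winsound below


def play_wav_bytes(wav_bytes):
    """Play WAV data held in memory (assumed to be a complete WAV file)."""
    if sys.platform == "win32":
        import winsound
        winsound.PlaySound(wav_bytes, winsound.SND_MEMORY)
        return
    # Other platforms: write a temporary file and hand it to an external player.
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
        tmp.write(wav_bytes)
    command = player_command(tmp.name)
    if command is None:
        raise RuntimeError("no known player for platform " + sys.platform)
    subprocess.run(command, check=True)
```

With the snippet above, play_wav_bytes(wav) would follow the tts_sdk call.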
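When the text gets longer, the synthesized speech is very likely to be poor, to the point that the content is unintelligible.
For example, I synthesized speech for 2, 4, 6, and so on up to 20 characters in a row.
The clips for 16, 18, and 20 characters are basically unlistenable, and they all have the same fixed duration of 11.629931972789116 s.
Is there a way to solve this? Where does the problem come from? Is it related to synthesizing so frequently?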
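A quick bit of arithmetic on the reported duration, as a sketch only. The 22050 Hz sampling rate appears in warnings and calling code quoted elsewhere on this page. The hop length of 256 is an assumption (a common default in Tacotron-style models) and is not confirmed for ttskit.

```python
SAMPLING_RATE = 22050        # Hz, as quoted elsewhere on this page
ASSUMED_HOP_LENGTH = 256     # hypothetical value, not confirmed for ttskit

duration_s = 11.629931972789116   # the fixed duration reported above

# Convert the duration to a sample count, then to spectrogram frames.
n_samples = round(duration_s * SAMPLING_RATE)
n_frames = n_samples / ASSUMED_HOP_LENGTH

print(n_samples)   # 256440
print(n_frames)
```

The duration corresponds to a whole number of samples, and under the assumed hop length to roughly a thousand frames. A constant length like this is what one would expect if the decoder ran until some step limit instead of stopping on its own, but that interpretation is a guess and would need to be checked against the model's configuration.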
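To apply more complex processing to the generated audio, such as noise reduction or raising or lowering the pitch, or to splice the output of several speakers into one audio clip, you inevitably need the audio data as a numpy.array. With the current SDK, the return value can only be written to a file and then read back in, which is cumbersome. I therefore suggest that the author add an SDK parameter that returns a numpy.array directly. (If this already exists and I just didn't find it, my apologies.)
Original code:
...
return wav
Modified code:
...
wav_array = np.array(wav_out.get_array_of_samples())
if kwargs.get('array', False):
    return wav_array
else:
    return wav
Usage example:
from ttskit import sdk_api
wav_array = sdk_api.tts_sdk(text='返回数组', array=True)
With a return value like this, it becomes easy to apply complex processing such as a Fourier transform to the returned audio. I'm not fully familiar with how this library's code is written, so I'm not sure whether this change could cause unforeseen errors. In my tests on a small amount of data, the modification was stable and worked. I hope the author can read my code, confirm that it is safe and effective, and merge it into the library. Thank you!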
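Until such a parameter exists, the returned bytes can be decoded in memory without a round trip through the file system. The sketch below uses only the standard library and assumes the return value is a complete 16-bit PCM WAV file, which is not confirmed here. wav_bytes_to_samples and samples_to_wav_bytes are hypothetical helper names.

```python
import array
import io
import sys
import wave


def wav_bytes_to_samples(wav_bytes):
    """Decode in-memory WAV bytes into (samples, sample_rate).

    Only 16-bit PCM is handled; multi-channel data stays interleaved.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        frames = wf.readframes(wf.getnframes())
        rate = wf.getframerate()
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return samples, rate


def samples_to_wav_bytes(samples, rate):
    """Encode mono 16-bit samples back into WAV bytes."""
    data = array.array("h", samples)
    if sys.byteorder == "big":
        data.byteswap()
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(data.tobytes())
    return buffer.getvalue()


# Round trip on a tiny made-up signal to show the two helpers agree.
demo_bytes = samples_to_wav_bytes([0, 1000, -1000, 0], 22050)
demo_samples, demo_rate = wav_bytes_to_samples(demo_bytes)
```

The resulting array.array supports the buffer protocol, so numpy.asarray(demo_samples) would give a NumPy array for further processing if NumPy is installed.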
Windows 10
Python 3.6.5
Installation completed, with some packages installed manually.
When I execute "import ttskit" in python cli:
>>> import ttskit
2021-10-28 12:24:34.621870: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2021-10-28 12:24:34.627138: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "F:\Python\Python36\lib\site-packages\ttskit\__init__.py", line 50, in <module>
import sdk_api
File "F:\Python\Python36\lib\site-packages\ttskit\sdk_api.py", line 41, in <module>
from ttskit.resource import _speaker_dict
ImportError: cannot import name '_speaker_dict'
>>>
ydub/audio_segment.py", line 374, in __radd__
raise TypeError("Gains must be the second addend after the "
TypeError: Gains must be the second addend after the AudioSegment
2022-10-27T02:42:03Z {'REMOTE_ADDR': '127.0.0.1', 'REMOTE_PORT': '50648', 'HTTP_HOST': 'localhost:9000', (hidden keys: 26)} failed with TypeError
ERROR:ttskit.web_api:Exception on /tts [GET]
Traceback (most recent call last):
File "C:\Users\D\AppData\Local\Programs\Python\Python39\lib\site-packages\flask\app.py", line 2525, in wsgi_app
response = self.full_dispatch_request()
File "C:\Users\D\AppData\Local\Programs\Python\Python39\lib\site-packages\flask\app.py", line 1822, in full_dispatch_request
rv = self.handle_user_exception(e)
File "C:\Users\D\AppData\Local\Programs\Python\Python39\lib\site-packages\flask\app.py", line 1820, in full_dispatch_request
rv = self.dispatch_request()
File "C:\Users\D\AppData\Local\Programs\Python\Python39\lib\site-packages\flask\app.py", line 1796, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
File "D:\codedemo\tts\ttskit-main\ttskit\web_api.py", line 50, in tts_web
wav = sdk_api.tts_sdk(text=text, speaker=speaker, audio=audio)
File "D:\codedemo\tts\ttskit-main\ttskit\sdk_api.py", line 425, in tts_sdk
wav = tts_sdk_base_one(kw)
File "D:\codedemo\tts\ttskit-main\ttskit\sdk_api.py", line 395, in tts_sdk_base_one
return tts_sdk_base(**kwargs)
File "D:\codedemo\tts\ttskit-main\ttskit\sdk_api.py", line 364, in tts_sdk_base
mels, mels_postnet, gates, alignments = mellotron.generate_mel(text_data, style_data, speaker_data, f0_data)
File "D:\codedemo\tts\ttskit-main\ttskit\mellotron\inference.py", line 77, in generate_mel
mels, mels_postnet, gates, alignments = _model.inference((text, style, speaker, f0))
File "D:\codedemo\tts\ttskit-main\ttskit\mellotron\model.py", line 677, in inference
mel_outputs, gate_outputs, alignments = self.decoder.inference(
File "D:\codedemo\tts\ttskit-main\ttskit\mellotron\model.py", line 524, in inference
mel_outputs, gate_outputs, alignments = self.parse_decoder_outputs(
File "D:\codedemo\tts\ttskit-main\ttskit\mellotron\model.py", line 357, in parse_decoder_outputs
alignments = torch.stack(alignments).transpose(0, 1)
RuntimeError: stack expects each tensor to be equal size, but got [1, 20] at entry 0 and [1, 68] at entry 9
ERROR:ttskit.web_api:Exception on /tts [GET]
Traceback (most recent call last):
File "C:\Users\D\AppData\Local\Programs\Python\Python39\lib\site-packages\flask\app.py", line 2525, in wsgi_app
response = self.full_dispatch_request()
File "C:\Users\D\AppData\Local\Programs\Python\Python39\lib\site-packages\flask\app.py", line 1822, in full_dispatch_request
rv = self.handle_user_exception(e)
File "C:\Users\D\AppData\Local\Programs\Python\Python39\lib\site-packages\flask\app.py", line 1820, in full_dispatch_request
rv = self.dispatch_request()
File "C:\Users\D\AppData\Local\Programs\Python\Python39\lib\site-packages\flask\app.py", line 1796, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
File "D:\codedemo\tts\ttskit-main\ttskit\web_api.py", line 50, in tts_web
wav = sdk_api.tts_sdk(text=text, speaker=speaker, audio=audio)
File "D:\codedemo\tts\ttskit-main\ttskit\sdk_api.py", line 425, in tts_sdk
wav = tts_sdk_base_one(kw)
File "D:\codedemo\tts\ttskit-main\ttskit\sdk_api.py", line 395, in tts_sdk_base_one
return tts_sdk_base(**kwargs)
File "D:\codedemo\tts\ttskit-main\ttskit\sdk_api.py", line 364, in tts_sdk_base
mels, mels_postnet, gates, alignments = mellotron.generate_mel(text_data, style_data, speaker_data, f0_data)
File "D:\codedemo\tts\ttskit-main\ttskit\mellotron\inference.py", line 77, in generate_mel
mels, mels_postnet, gates, alignments = _model.inference((text, style, speaker, f0))
File "D:\codedemo\tts\ttskit-main\ttskit\mellotron\model.py", line 677, in inference
mel_outputs, gate_outputs, alignments = self.decoder.inference(
File "D:\codedemo\tts\ttskit-main\ttskit\mellotron\model.py", line 524, in inference
mel_outputs, gate_outputs, alignments = self.parse_decoder_outputs(
File "D:\codedemo\tts\ttskit-main\ttskit\mellotron\model.py", line 357, in parse_decoder_outputs
alignments = torch.stack(alignments).transpose(0, 1)
RuntimeError: stack expects each tensor to be equal size, but got [1, 20] at entry 0 and [1, 68] at entry 8
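In a WSL2 Ubuntu 22.04 subsystem, running the command line fails with an error. What is the problem, and how can it be solved? In principle, without an Nvidia GPU it should default to running on the CPU.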
$ tkcli -h
/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py:138: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 9010). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
return torch._C._cuda_getDeviceCount() > 0
Traceback (most recent call last):
File "/usr/local/bin/tkcli", line 5, in <module>
from ttskit.cli_api import tts_cli
File "/usr/local/lib/python3.10/dist-packages/ttskit/cli_api.py", line 68, in <module>
from . import sdk_api
File "/usr/local/lib/python3.10/dist-packages/ttskit/sdk_api.py", line 55, in <module>
_stft = TacotronSTFT(
File "/usr/local/lib/python3.10/dist-packages/ttskit/mellotron/layers.py", line 64, in __init__
self.stft_fn = STFT(filter_length, hop_length, win_length)
File "/usr/local/lib/python3.10/dist-packages/ttskit/mellotron/stft.py", line 67, in __init__
fft_window = pad_center(fft_window, filter_length)
TypeError: pad_center() takes 1 positional argument but 2 were given
I installed the ttskit package directly with pip. The environment is Ubuntu 18.04 + Python 3.8.0.
When running tkcli, the following error occurs:
Input text (输入文本或exit退出,不输入则随机):
你好
Input kwargs (输入控制参数,格式:audio=1,speaker=biaobei,不输入则默认)
dictionary update sequence element #0 has length 1; 2 is required
Text: 你好
Kwargs: {'audio': '6', 'speaker': 'tmp'}
TTS running ...
INFO:sdk_api:Synthesizing: 你好
load phrase: 36778it [00:00, 239649.94it/s]
load pinyin: 41459it [00:00, 592173.16it/s]
Building prefix dict from the default dictionary ...
DEBUG:jieba:Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
DEBUG:jieba:Loading model from cache /tmp/jieba.cache
Loading model cost 0.393 seconds.
DEBUG:jieba:Loading model cost 0.393 seconds.
Prefix dict has been built successfully.
DEBUG:jieba:Prefix dict has been built successfully.
Traceback (most recent call last):
File "/home/season/.local/bin/tkcli", line 8, in <module>
sys.exit(tts_cli())
File "/home/season/.local/lib/python3.8/site-packages/ttskit/cli_api.py", line 155, in tts_cli
sdk_api.tts_sdk(text=text,
File "/home/season/.local/lib/python3.8/site-packages/ttskit/sdk_api.py", line 425, in tts_sdk
wav = tts_sdk_base_one(kw)
File "/home/season/.local/lib/python3.8/site-packages/ttskit/sdk_api.py", line 395, in tts_sdk_base_one
return tts_sdk_base(**kwargs)
File "/home/season/.local/lib/python3.8/site-packages/ttskit/sdk_api.py", line 377, in tts_sdk_base
wavs = melgan.generate_wave(mel=mels_postnet)
File "/home/season/.local/lib/python3.8/site-packages/ttskit/melgan/inference.py", line 63, in generate_wave
wav = _model.inference(mel)
AttributeError: 'NoneType' object has no attribute 'inference'
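The message "dictionary update sequence element #0 has length 1; 2 is required" in the log above is the error Python raises when dict() is given a one-element item, which is what a naive split-based parser produces for an empty input line. Whether tkcli parses its kwargs this way is an assumption. parse_kwargs below is a hypothetical, more forgiving parser for the audio=1,speaker=biaobei format shown in the prompt.

```python
def parse_kwargs(line):
    """Parse 'key=value,key=value' into a dict; an empty line gives {}."""
    kwargs = {}
    for item in line.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty input and stray commas
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError("expected key=value, got %r" % item)
        kwargs[key.strip()] = value.strip()
    return kwargs


# The naive approach fails on an empty line with the message seen in the log.
try:
    dict(part.split("=") for part in "".split(","))
    naive_error = None
except ValueError as exc:
    naive_error = str(exc)

print(naive_error)
print(parse_kwargs("audio=1,speaker=biaobei"))
```

This only addresses the kwargs message. The final AttributeError ('NoneType' object has no attribute 'inference') is a separate failure in which the vocoder model object is None.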
How do I download the training data?
ttskit.tts('这是个示例', audio='14')
AttributeError: module 'ttskit' has no attribute 'tts'
The calling code:
def text2wave(
    self,
    txt: str,
    audio="14",
    speaker="",
    sampling_rate=22050,
    processes=2,
    maxlen=60,
    save_to: Optional[str] = None,
) -> Optional[bytes]:
    wav = sdk_api.tts_sdk(
        txt, audio=audio, processes=processes, sampling_rate=sampling_rate
    )
    if save_to is not None:
        with open(save_to, "wb") as f:
            f.write(wav)
    else:
        return wav
Is anything wrong with my call?
Exception occurred: ImportError
cannot import name '_replace_tone2_style_dict_to_default' from 'pypinyin.utils' (D:\Program Files\Python\lib\site-packages\pypinyin\utils.py)
File "E:\Work\Github\ttskit\TTS_kit\ttskit\mellotron\text\__init__.py", line 7, in <module>
from phkit.chinese import text_to_sequence as text_to_sequence_phkit, sequence_to_text, text2pinyin
File "E:\Work\Github\ttskit\TTS_kit\ttskit\mellotron\data_utils.py", line 24, in <module>
from mellotron.text import text_to_sequence, cmudict
File "E:\Work\Github\ttskit\TTS_kit\ttskit\mellotron\inference.py", line 26, in <module>
from .data_utils import transform_mel, transform_text, transform_f0, transform_embed, transform_speaker
File "E:\Work\Github\ttskit\TTS_kit\ttskit\sdk_api.py", line 37, in <module>
from ttskit.mellotron import inference as mellotron
File "E:\Work\Github\ttskit\TTS_kit\ttskit\__init__.py", line 50, in <module>
import sdk_api
File "E:\Work\Github\ttskit\TTS_kit\test.py", line 31, in test_http_server
from ttskit import http_server
File "E:\Work\Github\ttskit\TTS_kit\test.py", line 42, in <module>
test_http_server()
INFO:audio_player:ImportError: No module named 'sounddevice'
INFO:audio_player:ImportError: No module named 'pyaudio'
INFO:audio_griffinlim:ImportError: No module named 'tensorflow'
But everything works normally. Will these messages have any impact?
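With the TTS API from a certain cloud provider, synthesis generally finishes within 200 ms. Can ttskit get below 500 ms?
Can the audio be saved after a web API request? I didn't see it anywhere after the request finished.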
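One way to keep the audio is to have the client write the response body to a file itself. The sketch below assumes the server from the logs elsewhere on this page (GET /tts on localhost:9000) and assumes the query parameters are named text, speaker and audio, mirroring the keyword arguments in the traceback. Neither the parameter names nor the response format is confirmed here. build_tts_url and save_tts are hypothetical helpers.

```python
import urllib.parse
import urllib.request


def build_tts_url(text, base="http://localhost:9000/tts", **params):
    """Build the request URL; extra keyword arguments become query parameters."""
    query = urllib.parse.urlencode({"text": text, **params})
    return base + "?" + query


def save_tts(text, path, **params):
    """Request synthesis and write the raw response body to `path`."""
    url = build_tts_url(text, **params)
    with urllib.request.urlopen(url) as response, open(path, "wb") as out:
        out.write(response.read())
    return path
```

With the server running, save_tts("你好", "hello.wav", audio="24") would write whatever the endpoint returns to hello.wav.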
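After two days of tinkering, I finally got it running. Very nice, thumbs up.
I'd also like to ask a few questions:
1. How do I add a custom speaker, and how do I train a speaker?
2. How do I add support for English?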
I've just started testing, using the example from the README:
from ttskit import sdk_api
wav = sdk_api.tts_sdk('文本', audio='24')
But I get:
File ~\anaconda3\lib\site-packages\ttskit\mellotron\stft.py:67 in __init__
fft_window = pad_center(fft_window, filter_length)
TypeError: pad_center() takes 1 positional argument but 2 were given
What is the problem?
My environment is anaconda3 on Windows 11, using Spyder + IPython.
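I'm currently running on Ubuntu, but it doesn't seem to work well. Could an expert help take a look?
For example, reading letter abbreviations such as TTS aloud. Thanks!
Is the resource here the resource inside the ttskit package? I don't see this function there.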
(pytorch1.6) C:\Users\Administrator>python E:\TTS\ttskit\myTest.py
Traceback (most recent call last):
File "E:\TTS\ttskit\myTest.py", line 3, in <module>
from ttskit import sdk_api
File "E:\TTS\ttskit\ttskit\sdk_api.py", line 49, in <module>
from .resource import _speaker_dict
ImportError: cannot import name '_speaker_dict'
Setting up the environment takes too long every time.
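Thank you very much for providing such a great tool. I have two questions:
1. Can the speaking rate be changed?
2. Generating the audio file feels very slow to me. Is that normal?
Also, for long-text generation, would it be better to split the text into sentences at punctuation marks?
Thanks again!!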
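The punctuation-based splitting suggested above could look roughly like the sketch below. split_sentences is a hypothetical helper, not ttskit's own splitter. The default maxlen of 60 is borrowed from the maxlen=60 parameter in calling code quoted elsewhere on this page and is otherwise arbitrary.

```python
import re

# Split right after Chinese or ASCII sentence-ending punctuation.
_SENTENCE_END = re.compile(r"(?<=[。!?;!?;])")


def split_sentences(text, maxlen=60):
    """Split text after sentence punctuation, then cap each piece at maxlen."""
    chunks = []
    for piece in _SENTENCE_END.split(text):
        piece = piece.strip()
        if not piece:
            continue
        # Hard-wrap any sentence that is still longer than maxlen characters.
        for start in range(0, len(piece), maxlen):
            chunks.append(piece[start:start + maxlen])
    return chunks


print(split_sentences("你好。今天天气不错!再见"))
```

Each chunk could then be synthesized separately and the resulting clips joined in order. How ttskit splits long text internally is not shown in these issues.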
It runs fine on Windows. I'd like to port the speech synthesis to Android devices (Android 6, Android 7). Could you offer some pointers?
I tried to install and test the repo, but it reports errors because packages listed in the setup script fail to build. I am testing on a Windows machine with a Python 3.9 environment. I suspect this repo only supports lower Python versions. Could you list the environment requirements in the README?
Classified by ear; no guarantee of accuracy.
Some voices seem buggy: some are unclear and have a repetition problem.
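I want to train my own speaker, but after searching the whole repository I couldn't find a systematic tutorial, so I'd like to ask how to train on my own data.
When synthesizing audio from long text, only the last sentence is ever produced.
#!/usr/bin/env python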
from ttskit import sdk_api
var='工业和信息化部总工程师田玉龙在国新办新闻发布会上介绍'
wav = sdk_api.tts_sdk_for(var,speaker='cctvfa', output=r'E:\TTS\ttskits\my9.wav')
Hi everyone!
My name is David Martin Rius and I have just published this project on GitHub: https://github.com/davidmartinrius/speech-dataset-generator/
Now you can create datasets automatically with any audio or lists of audios.
I hope you find it useful.
Dataset Generation: Creation of multilingual datasets with Mean Opinion Score (MOS).
Silence Removal: It includes a feature to remove silences from audio files, enhancing the overall quality.
Sound Quality Improvement: It improves the quality of the audio when needed.
Audio Segmentation: It can segment audio files within specified second ranges.
Transcription: The project transcribes the segmented audio, providing a textual representation.
Gender Identification: It identifies the gender of each speaker in the audio.
Pyannote Embeddings: Utilizes pyannote embeddings for speaker detection across multiple audio files.
Automatic Speaker Naming: Automatically assigns names to speakers detected in multiple audios.
Multiple Speaker Detection: Capable of detecting multiple speakers within each audio file.
Store speaker embeddings: The speakers are detected and stored in a Chroma database, so you do not need to assign a speaker name.
Syllabic and words-per-minute metrics
Feel free to explore the project at https://github.com/davidmartinrius/speech-dataset-generator
David Martin Rius
Traceback (most recent call last):
File "", line 1, in <module>
File "/home/new/ly_test/ttskit-main/ttskit/__init__.py", line 50, in <module>
import sdk_api
File "/home/new/ly_test/ttskit-main/ttskit/sdk_api.py", line 37, in <module>
from ttskit.mellotron import inference as mellotron
File "/home/new/ly_test/ttskit-main/ttskit/mellotron/inference.py", line 26, in <module>
from .data_utils import transform_mel, transform_text, transform_f0, transform_embed, transform_speaker
File "/home/new/ly_test/ttskit-main/ttskit/mellotron/data_utils.py", line 24, in <module>
from mellotron.text import text_to_sequence, cmudict
File "/home/new/ly_test/ttskit-main/ttskit/mellotron/text/__init__.py", line 7, in <module>
from phkit.chinese import text_to_sequence as text_to_sequence_phkit, sequence_to_text, text2pinyin
File "/root/miniconda/envs/ly_tts_try/lib/python3.7/site-packages/phkit/__init__.py", line 94, in <module>
from phkit.chinese import __doc__ as doc_chinese
File "/root/miniconda/envs/ly_tts_try/lib/python3.7/site-packages/phkit/chinese/__init__.py", line 37, in <module>
from .pinyin import text2pinyin, split_pinyin, change_diao
File "/root/miniconda/envs/ly_tts_try/lib/python3.7/site-packages/phkit/chinese/pinyin.py", line 11, in <module>
from ..pinyinkit import text2pinyin, split_pinyin, change_diao
File "/root/miniconda/envs/ly_tts_try/lib/python3.7/site-packages/phkit/pinyinkit/__init__.py", line 6, in <module>
from .core import lazy_pinyin, pinyin, slug, Style, initialize
File "/root/miniconda/envs/ly_tts_try/lib/python3.7/site-packages/phkit/pinyinkit/core.py", line 20, in <module>
from pypinyin.utils import _replace_tone2_style_dict_to_default
ImportError: cannot import name '_replace_tone2_style_dict_to_default' from 'pypinyin.utils' (/root/miniconda/envs/ly_tts_try/lib/python3.7/site-packages/pypinyin/utils.py)
How can this ImportError: cannot import name '_replace_tone2_style_dict_to_default' from 'pypinyin.utils' be resolved?
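The traceback shows phkit importing a private helper from pypinyin.utils that the installed pypinyin does not provide, which points to a version mismatch between the two packages. Which pypinyin release still contains the helper is not stated in these issues. The sketch below is a small diagnostic for checking whether a module exposes a given name before and after changing package versions. has_symbol is a hypothetical helper.

```python
import importlib


def has_symbol(module_name, attribute):
    """Return True if the module imports cleanly and exposes the attribute."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attribute)


if __name__ == "__main__":
    name = "_replace_tone2_style_dict_to_default"
    print("pypinyin.utils provides the helper:", has_symbol("pypinyin.utils", name))
```

The same check, has_symbol("ttskit.resource", "_speaker_dict"), could be used for the _speaker_dict import errors reported elsewhere on this page.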