plachtaa / vits-fast-fine-tuning
This repo is a pipeline of VITS fine-tuning for fast speaker-adaptation TTS and many-to-many voice conversion.
License: Apache License 2.0
Hello!
Yesterday I tested the Colab notebook that included an interface to record your own voice clips and fine-tune with them, and did not have the video part. I was starting to build a fork of the repo to repeat the process in Spanish, but it seems the notebook was updated along with these new video features. Is there any chance I can still access that old notebook, so I can adapt it for my fork?
Of course, as soon as I manage to adapt it I can send you the changes I made, so that you can offer an extra language option.
Thanks in advance, and congratulations on such amazing work. I had already contacted you through the Hugging Face repo, but the more I look at your work, the more amazed I am.
Greetings from Argentina,
Juanma
PS: By the way, my fork of the repo is here. I just started it today, so almost nothing has changed yet, but you can take a look at the to-do list and give me your feedback. Otherwise, I will contact you as soon as I have something solid working in Spanish.
The end of output from STEP 3:
Detected language: ja
こんな便利なもの持ってたんだ
Detected language: ja
あの人はもう戦わなくていいって
Detected language: ja
今の私は 誰が何と言おうと
Downloading: "https://github.com/r9y9/open_jtalk/releases/download/v1.11.1/open_jtalk_dic_utf_8-1.11.tar.gz"
dic.tar.gz: 100% 22.6M/22.6M [00:01<00:00, 18.3MB/s]
Extracting tar file /usr/local/lib/python3.8/dist-packages/pyopenjtalk/dic.tar.gz
Building prefix dict from the default dictionary ...
DEBUG:jieba:Building prefix dict from the default dictionary ...
Dumping model to file cache /tmp/jieba.cache
DEBUG:jieba:Dumping model to file cache /tmp/jieba.cache
Loading model cost 1.172 seconds.
DEBUG:jieba:Loading model cost 1.172 seconds.
Prefix dict has been built successfully.
DEBUG:jieba:Prefix dict has been built successfully.
*** buffer overflow detected ***: terminated
The output of STEP 4:
Reusing TensorBoard on port 6006 (pid 37700), started 0:02:42 ago. (Use '!kill 37700' to kill it.)
INFO:OUTPUT_MODEL:{'train': {'log_interval': 100, 'eval_interval': 1000, 'seed': 1234, 'epochs': 10000, 'learning_rate': 0.0002, 'betas': [0.8, 0.99], 'eps': 1e-09, 'batch_size': 12, 'fp16_run': True, 'lr_decay': 0.999875, 'segment_size': 8192, 'init_lr_ratio': 1, 'warmup_epochs': 0, 'c_mel': 45, 'c_kl': 1.0}, 'data': {'training_files': 'final_annotation_train.txt', 'validation_files': 'final_annotation_val.txt', 'text_cleaners': ['cjke_cleaners2'], 'max_wav_value': 32768.0, 'sampling_rate': 22050, 'filter_length': 1024, 'hop_length': 256, 'win_length': 1024, 'n_mel_channels': 80, 'mel_fmin': 0.0, 'mel_fmax': None, 'add_blank': True, 'n_speakers': 1001, 'cleaned_text': True}, 'model': {'inter_channels': 192, 'hidden_channels': 192, 'filter_channels': 768, 'n_heads': 2, 'n_layers': 6, 'kernel_size': 3, 'p_dropout': 0.1, 'resblock': '1', 'resblock_kernel_sizes': [3, 7, 11], 'resblock_dilation_sizes': [[1, 3, 5], [1, 3, 5], [1, 3, 5]], 'upsample_rates': [8, 8, 2, 2], 'upsample_initial_channel': 512, 'upsample_kernel_sizes': [16, 16, 4, 4], 'n_layers_q': 3, 'use_spectral_norm': False, 'gin_channels': 256}, 'symbols': ['_', ',', '.', '!', '?', '-', '~', '…', 'N', 'Q', 'a', 'b', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'ɑ', 'æ', 'ʃ', 'ʑ', 'ç', 'ɯ', 'ɪ', 'ɔ', 'ɛ', 'ɹ', 'ð', 'ə', 'ɫ', 'ɥ', 'ɸ', 'ʊ', 'ɾ', 'ʒ', 'θ', 'β', 'ŋ', 'ɦ', '⁼', 'ʰ', '`', '^', '#', '*', '=', 'ˈ', 'ˌ', '→', '↓', '↑', ' '], 'speakers': {'特别周 Special Week (Umamusume Pretty Derby)': 0, '无声铃鹿 Silence Suzuka (Umamusume Pretty Derby)': 1, '东海帝王 Tokai Teio (Umamusume Pretty Derby)': 2, '丸善斯基 Maruzensky (Umamusume Pretty Derby)': 3, '富士奇迹 Fuji Kiseki (Umamusume Pretty Derby)': 4, '小栗帽 Oguri Cap (Umamusume Pretty Derby)': 5, '黄金船 Gold Ship (Umamusume Pretty Derby)': 6, '伏特加 Vodka (Umamusume Pretty Derby)': 7, '大和赤骥 Daiwa Scarlet (Umamusume Pretty Derby)': 8, '大树快车 Taiki Shuttle (Umamusume Pretty Derby)': 9, '草上飞 Grass Wonder (Umamusume Pretty 
Derby)': 10, '菱亚马逊 Hishi Amazon (Umamusume Pretty Derby)': 11, '目白麦昆 Mejiro Mcqueen (Umamusume Pretty Derby)': 12, '神鹰 El Condor Pasa (Umamusume Pretty Derby)': 13, '好歌剧 T.M. Opera O (Umamusume Pretty Derby)': 14, '成田白仁 Narita Brian (Umamusume Pretty Derby)': 15, '鲁道夫象征 Symboli Rudolf (Umamusume Pretty Derby)': 16, '气槽 Air Groove (Umamusume Pretty Derby)': 17, '爱丽数码 Agnes Digital (Umamusume Pretty Derby)': 18, '青云天空 Seiun Sky (Umamusume Pretty Derby)': 19, '玉藻十字 Tamamo Cross (Umamusume Pretty Derby)': 20, '美妙姿势 Fine Motion (Umamusume Pretty Derby)': 21, '琵琶晨光 Biwa Hayahide (Umamusume Pretty Derby)': 22, '重炮 Mayano Topgun (Umamusume Pretty Derby)': 23, '曼城茶座 Manhattan Cafe (Umamusume Pretty Derby)': 24, '美普波旁 Mihono Bourbon (Umamusume Pretty Derby)': 25, '目白雷恩 Mejiro Ryan (Umamusume Pretty Derby)': 26, '雪之美人 Yukino Bijin (Umamusume Pretty Derby)': 28, '米浴 Rice Shower (Umamusume Pretty Derby)': 29, '艾尼斯风神 Ines Fujin (Umamusume Pretty Derby)': 30, '爱丽速子 Agnes Tachyon (Umamusume Pretty Derby)': 31, '爱慕织姬 Admire Vega (Umamusume Pretty Derby)': 32, '稻荷一 Inari One (Umamusume Pretty Derby)': 33, '胜利奖券 Winning Ticket (Umamusume Pretty Derby)': 34, '空中神宫 Air Shakur (Umamusume Pretty Derby)': 35, '荣进闪耀 Eishin Flash (Umamusume Pretty Derby)': 36, '真机伶 Curren Chan (Umamusume Pretty Derby)': 37, '川上公主 Kawakami Princess (Umamusume Pretty Derby)': 38, '黄金城市 Gold City (Umamusume Pretty Derby)': 39, '樱花进王 Sakura Bakushin O (Umamusume Pretty Derby)': 40, '采珠 Seeking the Pearl (Umamusume Pretty Derby)': 41, '新光风 Shinko Windy (Umamusume Pretty Derby)': 42, '东商变革 Sweep Tosho (Umamusume Pretty Derby)': 43, '超级小溪 Super Creek (Umamusume Pretty Derby)': 44, '醒目飞鹰 Smart Falcon (Umamusume Pretty Derby)': 45, '荒漠英雄 Zenno Rob Roy (Umamusume Pretty Derby)': 46, '东瀛佐敦 Tosen Jordan (Umamusume Pretty Derby)': 47, '中山庆典 Nakayama Festa (Umamusume Pretty Derby)': 48, '成田大进 Narita Taishin (Umamusume Pretty Derby)': 49, '西野花 Nishino Flower (Umamusume Pretty Derby)': 50, '春乌拉拉 Haru Urara (Umamusume 
Pretty Derby)': 51, '青竹回忆 Bamboo Memory (Umamusume Pretty Derby)': 52, '待兼福来 Matikane Fukukitaru (Umamusume Pretty Derby)': 55, '名将怒涛 Meisho Doto (Umamusume Pretty Derby)': 57, '目白多伯 Mejiro Dober (Umamusume Pretty Derby)': 58, '优秀素质 Nice Nature (Umamusume Pretty Derby)': 59, '帝王光环 King Halo (Umamusume Pretty Derby)': 60, '待兼诗歌剧 Matikane Tannhauser (Umamusume Pretty Derby)': 61, '生野狄杜斯 Ikuno Dictus (Umamusume Pretty Derby)': 62, '目白善信 Mejiro Palmer (Umamusume Pretty Derby)': 63, '大拓太阳神 Daitaku Helios (Umamusume Pretty Derby)': 64, '双涡轮 Twin Turbo (Umamusume Pretty Derby)': 65, '里见光钻 Satono Diamond (Umamusume Pretty Derby)': 66, '北部玄驹 Kitasan Black (Umamusume Pretty Derby)': 67, '樱花千代王 Sakura Chiyono O (Umamusume Pretty Derby)': 68, '天狼星象征 Sirius Symboli (Umamusume Pretty Derby)': 69, '目白阿尔丹 Mejiro Ardan (Umamusume Pretty Derby)': 70, '八重无敌 Yaeno Muteki (Umamusume Pretty Derby)': 71, '鹤丸刚志 Tsurumaru Tsuyoshi (Umamusume Pretty Derby)': 72, '目白光明 Mejiro Bright (Umamusume Pretty Derby)': 73, '樱花桂冠 Sakura Laurel (Umamusume Pretty Derby)': 74, '成田路 Narita Top Road (Umamusume Pretty Derby)': 75, '也文摄辉 Yamanin Zephyr (Umamusume Pretty Derby)': 76, '真弓快车 Aston Machan (Umamusume Pretty Derby)': 80, '骏川手纲 Hayakawa Tazuna (Umamusume Pretty Derby)': 81, '小林历奇 Kopano Rickey (Umamusume Pretty Derby)': 83, '奇锐骏 Wonder Acute (Umamusume Pretty Derby)': 85, '秋川理事长 President Akikawa (Umamusume Pretty Derby)': 86, '綾地 寧々 Ayachi Nene (Sanoba Witch)': 87, '因幡 めぐる Inaba Meguru (Sanoba Witch)': 88, '椎葉 紬 Shiiba Tsumugi (Sanoba Witch)': 89, '仮屋 和奏 Kariya Wakama (Sanoba Witch)': 90, '戸隠 憧子 Togakushi Touko (Sanoba Witch)': 91, '九条裟罗 Kujou Sara (Genshin Impact)': 92, '芭芭拉 Barbara (Genshin Impact)': 93, '派蒙 Paimon (Genshin Impact)': 94, '荒泷一斗 Arataki Itto (Genshin Impact)': 96, '早柚 Sayu (Genshin Impact)': 97, '香菱 Xiangling (Genshin Impact)': 98, '神里绫华 Kamisato Ayaka (Genshin Impact)': 99, '重云 Chongyun (Genshin Impact)': 100, '流浪者 Wanderer (Genshin Impact)': 102, '优菈 Eula (Genshin Impact)': 103, 
'凝光 Ningguang (Genshin Impact)': 105, '钟离 Zhongli (Genshin Impact)': 106, '雷电将军 Raiden Shogun (Genshin Impact)': 107, '枫原万叶 Kaedehara Kazuha (Genshin Impact)': 108, '赛诺 Cyno (Genshin Impact)': 109, '诺艾尔 Noelle (Genshin Impact)': 112, '八重神子 Yae Miko (Genshin Impact)': 113, '凯亚 Kaeya (Genshin Impact)': 114, '魈 Xiao (Genshin Impact)': 115, '托马 Thoma (Genshin Impact)': 116, '可莉 Klee (Genshin Impact)': 117, '迪卢克 Diluc (Genshin Impact)': 120, '夜兰 Yelan (Genshin Impact)': 121, '鹿野院平藏 Shikanoin Heizou (Genshin Impact)': 123, '辛焱 Xinyan (Genshin Impact)': 124, '丽莎 Lisa (Genshin Impact)': 125, '云堇 Yun Jin (Genshin Impact)': 126, '坎蒂丝 Candace (Genshin Impact)': 127, '罗莎莉亚 Rosaria (Genshin Impact)': 128, '北斗 Beidou (Genshin Impact)': 129, '珊瑚宫心海 Sangonomiya Kokomi (Genshin Impact)': 132, '烟绯 Yanfei (Genshin Impact)': 133, '久岐忍 Kuki Shinobu (Genshin Impact)': 136, '宵宫 Yoimiya (Genshin Impact)': 139, '安柏 Amber (Genshin Impact)': 143, '迪奥娜 Diona (Genshin Impact)': 144, '班尼特 Bennett (Genshin Impact)': 146, '雷泽 Razor (Genshin Impact)': 147, '阿贝多 Albedo (Genshin Impact)': 151, '温迪 Venti (Genshin Impact)': 152, '空 Player Male (Genshin Impact)': 153, '神里绫人 Kamisato Ayato (Genshin Impact)': 154, '琴 Jean (Genshin Impact)': 155, '艾尔海森 Alhaitham (Genshin Impact)': 156, '莫娜 Mona (Genshin Impact)': 157, '妮露 Nilou (Genshin Impact)': 159, '胡桃 Hu Tao (Genshin Impact)': 160, '甘雨 Ganyu (Genshin Impact)': 161, '纳西妲 Nahida (Genshin Impact)': 162, '刻晴 Keqing (Genshin Impact)': 165, '荧 Player Female (Genshin Impact)': 169, '埃洛伊 Aloy (Genshin Impact)': 179, '柯莱 Collei (Genshin Impact)': 182, '多莉 Dori (Genshin Impact)': 184, '提纳里 Tighnari (Genshin Impact)': 186, '砂糖 Sucrose (Genshin Impact)': 188, '行秋 Xingqiu (Genshin Impact)': 190, '奥兹 Oz (Genshin Impact)': 193, '五郎 Gorou (Genshin Impact)': 198, '达达利亚 Tartalia (Genshin Impact)': 202, '七七 Qiqi (Genshin Impact)': 207, '申鹤 Shenhe (Genshin Impact)': 217, '莱依拉 Layla (Genshin Impact)': 228, '菲谢尔 Fishl (Genshin Impact)': 230, 'User': 999}, 'model_dir': 
'././OUTPUT_MODEL', 'max_epochs': 20}
2023-02-23 03:10:15.392600: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
DEBUG:tensorflow:Falling back to TensorFlow client; we recommended you install the Cloud TPU client directly with pip install cloud-tpu-client.
2023-02-23 03:10:16.901032: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-02-23 03:10:16.901213: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-02-23 03:10:16.901242: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
DEBUG:h5py._conv:Creating converter from 7 to 5
DEBUG:h5py._conv:Creating converter from 5 to 7
...
...
...
0% 0/55 [00:34<?, ?it/s]
Traceback (most recent call last):
File "finetune_speaker.py", line 320, in <module>
main()
File "finetune_speaker.py", line 55, in main
mp.spawn(run, nprocs=n_gpus, args=(n_gpus, hps,))
File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 240, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
while not context.join():
File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 160, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
fn(i, *args)
File "/content/VITS_voice_conversion/finetune_speaker.py", line 133, in run
train_and_evaluate(rank, epoch, hps, [net_g, net_d], [optim_g, optim_d], [scheduler_g, scheduler_d], scaler, [train_loader, eval_loader], logger, [writer, writer_eval])
File "/content/VITS_voice_conversion/finetune_speaker.py", line 241, in train_and_evaluate
evaluate(hps, net_g, eval_loader, writer_eval)
File "/content/VITS_voice_conversion/finetune_speaker.py", line 279, in evaluate
y_hat, attn, mask, *_ = generator.module.infer(x, x_lengths, speakers, max_len=1000)
UnboundLocalError: local variable 'x' referenced before assignment
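This UnboundLocalError fires because, in `evaluate`, `x` is only bound inside the `for ... in eval_loader` loop; when the validation loader yields no batches (for example, an empty or missing final_annotation_val.txt), the code after the loop touches a name that was never assigned. A minimal reproduction and a hedged guard (the function shapes are simplified illustrations, not the repo's exact signatures):

```python
def evaluate(eval_loader):
    """Mimics the failing evaluate(): `x` is only assigned inside the loop."""
    for batch_idx, (x, x_lengths) in enumerate(eval_loader):
        break  # the original only uses the first batch
    return x  # raises UnboundLocalError when eval_loader yields nothing

def evaluate_guarded(eval_loader):
    """Same shape, but fails with an actionable message on an empty loader."""
    x = None
    for batch_idx, (x, x_lengths) in enumerate(eval_loader):
        break
    if x is None:
        raise RuntimeError("eval_loader is empty - check final_annotation_val.txt")
    return x
```

So the first thing to verify is that the preprocessing step actually produced a non-empty validation annotation file.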
It always shows:
python3: can't open file 'rearrange_speaker.py': [Errno 2] No such file or directory
ERROR:root:File 'download_model.py'
not found.
ValueError Traceback (most recent call last)
/content/VITS-fast-fine-tuning/preprocess_v2.py in
67 if len(txt) > 150:
68 continue
---> 69 cleaned_text = text._clean_text(txt, hps['data']['text_cleaners'])
70 cleaned_text += "\n" if not cleaned_text.endswith("\n") else ""
71 cleaned_new_annos.append(path + "|" + str(speaker2id[speaker]) + "|" + cleaned_text)
7 frames
/usr/local/lib/python3.8/dist-packages/cn2an/an2cn.py in __integer_convert(self, integer_data, mode)
154 len_integer_data = len(integer_data)
155 if len_integer_data > len(unit_list):
--> 156 raise ValueError(f"超出数据范围,最长支持 {len(unit_list)} 位")
157
158 output_an = ""
ValueError: 超出数据范围,最长支持 16 位 (out of range: at most 16 digits are supported)
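The traceback shows cn2an choking during Chinese text normalization: its number-to-hanzi converter supports at most 16 digits, so any transcript line containing a longer digit run (an ID, a phone number) aborts the whole preprocess. A hedged pre-filter sketch, mirroring the existing `if len(txt) > 150: continue` guard in preprocess_v2.py (the names here are illustrative):

```python
import re

MAX_AN2CN_DIGITS = 16  # limit reported by cn2an's ValueError above

def has_overlong_number(text, limit=MAX_AN2CN_DIGITS):
    """True when `text` contains a digit run cn2an cannot convert."""
    return any(len(run) > limit for run in re.findall(r"\d+", text))

# In preprocess_v2.py one could then skip such lines before _clean_text:
#     if has_overlong_number(txt):
#         continue
```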
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "C:\Users\Yan\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\multiprocessing\spawn.py", line 69, in _wrap
fn(i, *args)
File "G:\VITS\finetune_speaker_v2.py", line 134, in run
train_and_evaluate(rank, epoch, hps, [net_g, net_d], [optim_g, optim_d], [scheduler_g, scheduler_d], scaler, [train_loader, eval_loader], logger, [writer, writer_eval])
File "G:\VITS\finetune_speaker_v2.py", line 242, in train_and_evaluate
evaluate(hps, net_g, eval_loader, writer_eval)
File "G:\VITS\finetune_speaker_v2.py", line 265, in evaluate
for batch_idx, (x, x_lengths, spec, spec_lengths, y, y_lengths, speakers) in enumerate(eval_loader):
File "C:\Users\Yan\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\data\dataloader.py", line 628, in __next__
data = self._next_data()
File "C:\Users\Yan\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\data\dataloader.py", line 671, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "C:\Users\Yan\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\data\_utils\fetch.py", line 61, in fetch
return self.collate_fn(data)
File "G:\VITS\data_utils.py", line 159, in __call__
spec_padded[i, :, :spec.size(1)] = spec
RuntimeError: expand(torch.FloatTensor{[2, 513, 478]}, size=[513, 513]): the number of sizes provided (2) must be greater or equal to the number of dimensions in the tensor (3)
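The collate step here expects each spectrogram to be 2-D `[n_freq, frames]`, but one item is `[2, 513, 478]` — a stereo clip kept its channel dimension. A hedged sketch of downmixing to mono before the spectrogram is computed (not the repo's actual loader code):

```python
import torch

def to_mono(wav):
    """Collapse a [channels, samples] waveform to [samples].

    Stereo training clips yield spectrograms shaped [2, n_freq, frames],
    which breaks the [n_freq, frames] assumption in the collate function.
    """
    if wav.dim() == 2:
        wav = wav.mean(dim=0)
    return wav
```

Re-exporting the offending wavs as mono (and at the expected 22050 Hz sample rate) should avoid this crash.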
First of all, thanks for sharing and optimizing this! I tried a local Windows deployment with conda and Python 3.9; the environment installed without errors, so I moved on to the next steps.
Q1: I downloaded both pretrained models (CJ and CJE). If I want to switch to a different standard-Mandarin monolingual model (right now the output has a very strong Japanese accent), is this the key part to change? (Along with the configs/finetune_speaker.json config file, of course.)
Q2: What are the sampled_audio4ft folder and its .txt file for? (I assume dataset validation; I can see phoneme annotations inside, but among the 800+ entries a large portion of the Chinese annotations is missing.) If I swap in my own data, do I need to replace these too? And how is the phoneme conversion done?
Q3: Locally I ran `python denoise_audio.py` to denoise, then `python short_audio_transcribe.py --languages "CJE" --whisper_size medium`, but no Chinese transcriptions were produced; it just printed "Detected language: zh" line after line until it finished.
Q4: Because of that failure I did not go further. If I use auxiliary training data, is the command `python finetune_speaker_v2.py -m "./OUTPUT_MODEL" --max_epochs "20"`?
Looking forward to your answers!
Does the automatic background-noise removal also remove music? Would it work better if I separated the music and speech myself with software like Vocal Remover first?
After STEP 3 finished, it does not seem to have generated any txt file.
Detected language: ja
ですが……
Detected language: ja
キュッ!
Detected language: ja
絶好の探検日よりですね!
Detected language: ja
天文班の皆さんのように、私も大きな夢を持ちたいです!
Detected language: ja
見てください!あんなところに昔川だった痕跡が!
Detected language: ja
昔はこうやって測量しながら地図を作っていたみたいですね。私も自分の地図を作るために、少しずつ歩いていかないと。仙里の道も一歩から、ですね。
Detected language: ja
このあたりの測量はバッチリです!
Detected language: ja
こうです!
Detected language: ja
本当に今日バーベキューをやるんですか?
Downloading: "https://github.com/r9y9/open_jtalk/releases/download/v1.11.1/open_jtalk_dic_utf_8-1.11.tar.gz"
dic.tar.gz: 100%|██████████████████████████| 22.6M/22.6M [00:02<00:00, 10.1MB/s]
Extracting tar file /opt/conda/lib/python3.7/site-packages/pyopenjtalk/dic.tar.gz
finished
But `ls ./` shows no newly generated files in the root directory.
Before:
LICENSE download_model.py sampled_audio4ft
README.md finetune_speaker.py sampled_audio4ft.txt
README_EN.md losses.py sampled_audio4ft.zip
README_ZH.md mel_processing.py text
VC_inference.py models.py transforms.py
attentions.py models_infer.py user_voice
commons.py modules.py user_voice_collect.py
configs monotonic_align utils.py
custom_character_anno.txt preprocess.py video_transcribe.py
custom_character_voice pretrained_models voice_upload.py
data_utils.py requirements.txt whisper_transcribe.py
demucs_denoise.py requirements_infer.txt
After:
LICENSE demucs_denoise.py sampled_audio4ft
OUTPUT_MODEL download_model.py sampled_audio4ft.txt
README.md finetune_speaker.py sampled_audio4ft.zip
README_EN.md losses.py text
README_ZH.md mel_processing.py transforms.py
VC_inference.py models.py user_voice
__pycache__ models_infer.py user_voice_collect.py
attentions.py modules.py utils.py
commons.py monotonic_align video_transcribe.py
configs preprocess.py voice_upload.py
custom_character_anno.txt pretrained_models whisper_transcribe.py
custom_character_voice requirements.txt
data_utils.py requirements_infer.txt
As a result, STEP 4 cannot find final_annotation_train.txt:
Traceback (most recent call last):
File "finetune_speaker.py", line 320, in <module>
main()
File "finetune_speaker.py", line 55, in main
mp.spawn(run, nprocs=n_gpus, args=(n_gpus, hps,))
File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
while not context.join():
File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 160, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
fn(i, *args)
File "/kaggle/working/VITS_voice_conversion/finetune_speaker.py", line 72, in run
train_dataset = TextAudioSpeakerLoader(hps.data.training_files, hps.data)
File "/kaggle/working/VITS_voice_conversion/data_utils.py", line 164, in __init__
self.audiopaths_sid_text = load_filepaths_and_text(audiopaths_sid_text)
File "/kaggle/working/VITS_voice_conversion/utils.py", line 144, in load_filepaths_and_text
with open(filename, encoding='utf-8') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'final_annotation_train.txt'
However, after seeing the problem in #14, I downloaded the custom_character_anno.txt generated by STEP 3 in a completed Colab run and uploaded it to the root directory, but that did not work either.
Rough contents of custom_character_anno.txt:
./custom_character_voice/mai/processed_0.wav|1000|ga↑sʃɯ*kɯmi↓taina mo↑no↓desɯ*ka?
./custom_character_voice/mai/processed_1.wav|1000|a↑ɾi↓gatoo go↑zaima↓sɯ*!
How can I generate final_annotation_train.txt? (Or where is final_annotation_train.txt supposed to be?)
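final_annotation_train.txt and final_annotation_val.txt are written by the preprocessing step, so if STEP 3 dies early they never appear. As an illustrative stop-gap (the file names follow the repo's conventions, but the 90/10 split ratio is my assumption), an existing annotation file in `path|speaker_id|text` form can be split by hand. Note this only helps if the text is already cleaned, as in the custom_character_anno.txt lines above; the real preprocess script also runs the text cleaners.

```python
import random

def split_annotations(anno_path="custom_character_anno.txt",
                      train_path="final_annotation_train.txt",
                      val_path="final_annotation_val.txt",
                      train_ratio=0.9, seed=1234):
    """Split `path|speaker_id|text` lines into train/val annotation files."""
    with open(anno_path, encoding="utf-8") as f:
        lines = [l for l in f if l.strip()]
    random.Random(seed).shuffle(lines)
    cut = max(1, int(len(lines) * train_ratio))
    with open(train_path, "w", encoding="utf-8") as f:
        f.writelines(lines[:cut])
    with open(val_path, "w", encoding="utf-8") as f:
        # pad with one train line if the split left validation empty
        f.writelines(lines[cut:] or lines[:1])
    return cut, len(lines) - cut
```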
STEP 2: Audio saved to ./user_voice/4.wav successfully!
STEP 2.5:
Important: the default model was recently changed to htdemucs, the latest Hybrid Transformer Demucs model. In some cases, this model can actually perform worse than previous models. To get back the old default model use -n mdx_extra_q.
Selected model is a bag of 1 models. You will see that many progress bars per track.
Separated tracks will be stored in /content/VITS_voice_conversion/separated/htdemucs
Separating track user_voice/21.wav
Traceback (most recent call last):
File "/usr/local/bin/demucs", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.8/dist-packages/demucs/separate.py", line 159, in main
wav = load_track(track, model.audio_channels, model.samplerate)
File "/usr/local/lib/python3.8/dist-packages/demucs/separate.py", line 41, in load_track
wav = convert_audio(wav, sr, samplerate, audio_channels)
File "/usr/local/lib/python3.8/dist-packages/demucs/audio.py", line 175, in convert_audio
return julius.resample_frac(wav, from_samplerate, to_samplerate)
File "/usr/local/lib/python3.8/dist-packages/julius/resample.py", line 166, in resample_frac
return ResampleFrac(old_sr, new_sr, zeros, rolloff).to(x)(x, output_length, full)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/julius/resample.py", line 132, in forward
x = x.reshape(-1, length)
RuntimeError: cannot reshape tensor of 0 elements into shape [-1, 0] because the unspecified dimension size -1 can be any value and is ambiguous
Important: the default model was recently changed to htdemucs, the latest Hybrid Transformer Demucs model. In some cases, this model can actually perform worse than previous models. To get back the old default model use -n mdx_extra_q.
Selected model is a bag of 1 models. You will see that many progress bars per track.
Separated tracks will be stored in /content/VITS_voice_conversion/separated/htdemucs
Separating track user_voice/12.wav
Traceback (most recent call last):
File "/usr/local/bin/demucs", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.8/dist-packages/demucs/separate.py", line 159, in main
wav = load_track(track, model.audio_channels, model.samplerate)
File "/usr/local/lib/python3.8/dist-packages/demucs/separate.py", line 41, in load_track
wav = convert_audio(wav, sr, samplerate, audio_channels)
File "/usr/local/lib/python3.8/dist-packages/demucs/audio.py", line 175, in convert_audio
return julius.resample_frac(wav, from_samplerate, to_samplerate)
File "/usr/local/lib/python3.8/dist-packages/julius/resample.py", line 166, in resample_frac
return ResampleFrac(old_sr, new_sr, zeros, rolloff).to(x)(x, output_length, full)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/julius/resample.py", line 132, in forward
x = x.reshape(-1, length)
RuntimeError: cannot reshape tensor of 0 elements into shape [-1, 0] because the unspecified dimension size -1 can be any value and is ambiguous
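Both failures above are julius receiving a zero-element waveform, i.e. demucs was pointed at an empty (or unreadable) wav in user_voice/. A hedged pre-check that removes such clips before denoising (assumes plain PCM wavs readable by the stdlib `wave` module):

```python
import os
import wave

def drop_empty_wavs(folder="user_voice"):
    """Remove zero-length .wav files that crash demucs' resampler."""
    removed = []
    for name in sorted(os.listdir(folder)):
        if not name.endswith(".wav"):
            continue
        path = os.path.join(folder, name)
        try:
            with wave.open(path, "rb") as w:
                empty = w.getnframes() == 0
        except (wave.Error, EOFError):
            empty = True  # an unreadable header counts as bad, too
        if empty:
            os.remove(path)
            removed.append(name)
    return removed
```

Re-recording (or deleting) clips 12.wav and 21.wav and re-running STEP 2.5 should then get past this point.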
STEP 3 (excerpt):
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-02-21 06:49:25.956540: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-02-21 06:49:25.956650: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-02-21 06:49:25.956689: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
100%|██████████████████████████████████████| 1.42G/1.42G [00:11<00:00, 133MiB/s]
Detected language: ja
よしよし、ありがとうよ
Detected language: ja
龍は大嫌いだ。だが食ったらうまい。
Detected language: ja
今気づいたが、むしみだな、俺。殴り合いが好きってのはダメか?ダメだよな。
Detected language: ja
安心しろって。バーサーカーでもサーヴァント。お前の身は俺が守ってやるぞ。
Detected language: ja
なかなかやるな、てめえ。ま、楽しめたぜ。
Detected language: ja
聖杯ね。ま、欲しいんならいいんじゃねえか。俺はいらねえよ。
Detected language: ja
こいつは最高だ!
Detected language: ja
いい汗かいたぜ。じゃあ二回戦やるか。ダメか?そりゃ残念。
Detected language: ja
なかなかいいパンチだったぜ。しかし、俺の方が殴り慣れてる。
Detected language: ja
しょうがねえ。腹割って付き合ってやろうじゃねえか。何が望みだ?
Detected language: ja
おっと、いい感じじゃねーか
Detected language: ja
ありがとうよお前さんのおかげださあ一緒殴り合うかああ断るそうか残念だはっはっは
Detected language: ja
ああ、くそ。 悪いな。先行くわ。
Detected language: ja
たけ、何か用か?もてやましてんだが
Detected language: ja
気なくさいな。何かあるんだろう。行ってみるか。
Detected language: ja
悪いことするときは目を背けてやるさもちろん限度ってもんがあるがな
Detected language: ja
おいおいマスター、引きこもって何になる?え?
Detected language: ja
いいね、強くなってらしい
Detected language: ja
ソラよ、クレティール
Detected language: ja
悪い悪い、なんでもねえよ
Detected language: ja
いいじゃねーか 気に入ったこれからも気に食わない連中は殴って殴ってもう一度殴っちまえよ
Detected language: ja
オラオラオラ、どしたどした!
Detected language: ja
さーて、ぶん殴り合いのお時間だ。男女問わず倒れるまでやろうや!
Detected language: ja
来たか?ならいいさ、殴って蹴ってそっぱりしてやる!
Detected language: ja
サーヴァント・バーサーか。 真名・ベオウルフ。じゃあ殴りに行こうぜ、マスター。おいおい、引くなよ。
Detected language: ja
これが戦いの根源だ。要するに、殴って蹴って立っていた方の勝ちってやつよ!オラオラオラ!ぶっ飛ぶへ!
Detected language: ja
おっと、てめえの生まれた日じゃねえか。おら、空に向かって感謝しな。
Downloading: "https://github.com/r9y9/open_jtalk/releases/download/v1.11.1/open_jtalk_dic_utf_8-1.11.tar.gz"
dic.tar.gz: 100% 22.6M/22.6M [00:01<00:00, 12.9MB/s]
Extracting tar file /usr/local/lib/python3.8/dist-packages/pyopenjtalk/dic.tar.gz
finished
STEP 4:
DEBUG:matplotlib:CACHEDIR=/root/.cache/matplotlib
DEBUG:matplotlib.font_manager:Using fontManager instance from /root/.cache/matplotlib/fontlist-v310.json
DEBUG:matplotlib.pyplot:Loaded backend agg version unknown.
DEBUG:matplotlib.pyplot:Loaded backend agg version unknown.
0% 0/84 [00:33<?, ?it/s]
Traceback (most recent call last):
File "finetune_speaker.py", line 320, in <module>
main()
File "finetune_speaker.py", line 55, in main
mp.spawn(run, nprocs=n_gpus, args=(n_gpus, hps,))
File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 240, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
while not context.join():
File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 160, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
fn(i, *args)
File "/content/VITS_voice_conversion/finetune_speaker.py", line 133, in run
train_and_evaluate(rank, epoch, hps, [net_g, net_d], [optim_g, optim_d], [scheduler_g, scheduler_d], scaler, [train_loader, eval_loader], logger, [writer, writer_eval])
File "/content/VITS_voice_conversion/finetune_speaker.py", line 241, in train_and_evaluate
evaluate(hps, net_g, eval_loader, writer_eval)
File "/content/VITS_voice_conversion/finetune_speaker.py", line 264, in evaluate
for batch_idx, (x, x_lengths, spec, spec_lengths, y, y_lengths, speakers) in enumerate(eval_loader):
File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 628, in __next__
data = self._next_data()
File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 671, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/fetch.py", line 58, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/fetch.py", line 58, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/content/VITS_voice_conversion/data_utils.py", line 248, in __getitem__
return self.get_audio_text_speaker_pair(self.audiopaths_sid_text[index])
File "/content/VITS_voice_conversion/data_utils.py", line 206, in get_audio_text_speaker_pair
spec, wav = self.get_audio(audiopath)
File "/content/VITS_voice_conversion/data_utils.py", line 223, in get_audio
spec = spectrogram_torch(audio_norm, self.filter_length,
File "/content/VITS_voice_conversion/mel_processing.py", line 52, in spectrogram_torch
if torch.min(y) < -1.:
RuntimeError: min(): Expected reduction dim to be specified for input.numel() == 0. Specify the reduction dim with the 'dim' argument.
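This `torch.min` crash is the same family of problem: `get_audio` loaded a wav with zero samples, so the spectrogram code reduces an empty tensor. A hedged scan over the annotation file (format `path|speaker_id|text`, as shown earlier) to locate the offending clips:

```python
import wave

def find_empty_clips(annotation_file="final_annotation_train.txt"):
    """List annotated wavs with zero audio frames (or missing/unreadable)."""
    bad = []
    with open(annotation_file, encoding="utf-8") as f:
        for line in f:
            path = line.split("|", 1)[0]
            try:
                with wave.open(path, "rb") as w:
                    if w.getnframes() == 0:
                        bad.append(path)
            except (FileNotFoundError, wave.Error, EOFError):
                bad.append(path)
    return bad
```

Removing the reported paths from the annotation file (and the dataset) should let training proceed.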
How do I replace the pretrained models? I did not see a pretrained-model parameter in .../configs/modified_finetune_speaker.json. Is it enough to replace G_0.pth in the pretrained_models folder with my own model? Does it have to be renamed to G_0.pth?
BTW: my dataset is about 10 hours of speech (600 MB of wav files). After training for 30 epochs (with the aux data), the voice is very clear, but what it says is not any human language :rofl:. Roughly how many epochs does a dataset of this size need? And can I continue training on top of the previous run by replacing the aforementioned G_0.pth with the produced G_latest.pth?
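On the continue-training question: one common pattern (an assumption about this repo, not a documented feature) is to warm-start the next fine-tune by copying the produced G_latest.pth over the pretrained G_0.pth, keeping the file name the loader expects:

```python
import shutil
from pathlib import Path

def warm_start(output_dir="OUTPUT_MODEL", pretrained_dir="pretrained_models"):
    """Copy a run's latest generator checkpoint over the pretrained G_0.pth.

    Assumption (not repo-documented): fine-tuning always starts from
    pretrained_models/G_0.pth, so overwriting it continues from the last run.
    """
    src = Path(output_dir) / "G_latest.pth"
    dst = Path(pretrained_dir) / "G_0.pth"
    shutil.copyfile(src, dst)
    return dst
```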
When trying to run step 3 after step 1, I get the following error:
unzip: cannot find or open ./custom_character_voice/custom_character_voice.zip, ./custom_character_voice/custom_character_voice.zip.zip or ./custom_character_voice/custom_character_voice.zip.ZIP.
python3: can't open file 'whisper_transcribe.py': [Errno 2] No such file or directory
Not sure what's causing this, as they're both in the file browser.
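Both messages say "No such file or directory" for paths that are visible in the file browser, which usually means the cell is executing outside the cloned repo directory. A small hedged check (the file names are taken from the directory listing earlier in this page):

```python
import os

def check_repo_cwd():
    """Return repo files missing from the current working directory.

    A non-empty result suggests the notebook cell is not running inside
    the cloned VITS-fast-fine-tuning checkout (e.g. a missing %cd).
    """
    expected = ("whisper_transcribe.py", "custom_character_voice")
    return [p for p in expected if not os.path.exists(p)]
```

If it reports both paths missing, `cd` (or `%cd` in Colab) into the repo folder before re-running STEP 3.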
Could a little noise in the source audio cause this (a few clips have very quiet BGM)?
See the attached screenshot.
`pip install -r requirements.txt` fails.
Downloading and pip itself seem fine; it looks like the wheel build blew up.
I can't tell what is missing.
Could you help me figure out what went wrong? Thanks a lot.
F:\vits\VITS-fast-fine-tuning-main>pip install -r requirements.txt
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting Cython
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/56/3a/e59db3769dee48409c759a88b62cd605324e05d396e10af0a065adc956ad/Cython-0.29.33-py2.py3-none-any.whl (987 kB)
Collecting librosa
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/bc/2e/80370da514096c6190f8913668198380ea09c2d252cfa4e85a9c096d3b40/librosa-0.10.0-py3-none-any.whl (252 kB)
Requirement already satisfied: numpy in c:\python\lib\site-packages (from -r requirements.txt (line 3)) (1.24.2)
Collecting scipy
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/ec/e3/b06ac3738bf365e89710205a471abe7dceec672a51c244b469bc5d1291c7/scipy-1.10.1-cp310-cp310-win_amd64.whl (42.5 MB)
Collecting tensorboard
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/8d/71/75fcfab1ff98e3fad240f760d3a6b5ca6bdbcc5ed141fb7abd35cf63134c/tensorboard-2.12.0-py3-none-any.whl (5.6 MB)
Collecting torch
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/33/bd/e174e6737daba03f8eaa7c051b9971d361022eb37b86cbe5db0b08cab00e/torch-1.13.1-cp310-cp310-win_amd64.whl (162.6 MB)
Collecting torchvision
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/b8/e0/edf3d41324c27f246abe1a4942227c6abe44fb2e62d35807178acb1355ba/torchvision-0.14.1-cp310-cp310-win_amd64.whl (1.1 MB)
Collecting torchaudio
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/48/0b/99c8f10fccccef0279acdfa2a6c27dd19d7eab3be1fd8fa59c09ad06b436/torchaudio-0.13.1-cp310-cp310-win_amd64.whl (2.0 MB)
Collecting unidecode
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/be/ea/90e14e807da5a39e5b16789acacd48d63ca3e4f23dfa964a840eeadebb13/Unidecode-1.3.6-py3-none-any.whl (235 kB)
Collecting pyopenjtalk
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/b4/80/a2505a37937fcd108b7c1ab66f7d1d48560525b1da71993860d11095a286/pyopenjtalk-0.3.0.tar.gz (1.5 MB)
Installing build dependencies ... done
Getting requirements to build wheel ... error
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [25 lines of output]
setup.py:26: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
_CYTHON_INSTALLED = ver >= LooseVersion(min_cython_ver)
Traceback (most recent call last):
File "C:\python\lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 353, in <module>
main()
File "C:\python\lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 335, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
File "C:\python\lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 118, in get_requires_for_build_wheel
return hook(config_settings)
File "C:\Users\Night\AppData\Local\Temp\pip-build-env-xmh3jw0x\overlay\Lib\site-packages\setuptools\build_meta.py", line 162, in get_requires_for_build_wheel
return self._get_build_requires(
File "C:\Users\Night\AppData\Local\Temp\pip-build-env-xmh3jw0x\overlay\Lib\site-packages\setuptools\build_meta.py", line 143, in _get_build_requires
self.run_setup()
File "C:\Users\Night\AppData\Local\Temp\pip-build-env-xmh3jw0x\overlay\Lib\site-packages\setuptools\build_meta.py", line 267, in run_setup
super(_BuildMetaLegacyBackend,
File "C:\Users\Night\AppData\Local\Temp\pip-build-env-xmh3jw0x\overlay\Lib\site-packages\setuptools\build_meta.py", line 158, in run_setup
exec(compile(code, file, 'exec'), locals())
File "setup.py", line 153, in
File "C:\python\lib\subprocess.py", line 503, in run
with Popen(*popenargs, **kwargs) as process:
File "C:\python\lib\subprocess.py", line 971, in init
self._execute_child(args, executable, preexec_fn, close_fds,
File "C:\python\lib\subprocess.py", line 1440, in _execute_child
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] 系统找不到指定的文件。
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
The program throws an error. It ran fine on Colab before exporting, but errors out after exporting and running locally.
Hi! I got an error on step 5, in rearrange_speaker.
It appears as:
rearrange_speaker.py:18: SyntaxWarning: list indices must be integers or slices, not str; perhaps you missed a comma?
old_emb_g = model_sd(['model']['emb_g.weight'])
Traceback (most recent call last):
File "rearrange_speaker.py", line 18, in <module>
old_emb_g = model_sd(['model']['emb_g.weight'])
TypeError: list indices must be integers or slices, not str
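The traceback points at the indexing syntax: `model_sd(['model']['emb_g.weight'])` first indexes a list with a string, then tries to call the checkpoint dict like a function. A minimal sketch of the fix (the real script loads the checkpoint with `torch.load`; the toy dict below is only illustrative):

```python
# Toy stand-in for the checkpoint; the real script would do something like:
#   model_sd = torch.load("G_latest.pth", map_location="cpu")
model_sd = {"model": {"emb_g.weight": [[0.1, 0.2]]}}

# Correct: chain two subscripts on the nested dict.
old_emb_g = model_sd["model"]["emb_g.weight"]

# Wrong (raises the reported TypeError):
#   old_emb_g = model_sd(['model']['emb_g.weight'])
# ['model']['emb_g.weight'] indexes a *list* with a string, and the
# parentheses would then call the dict as if it were a function.
```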
I can't deploy this project's environment on my other remote Linux servers, so I want to use the models already included in this project and fine-tune on top of them. But the config files seem problematic: as soon as training starts, the original pretrained model gets overwritten.
How can I fine-tune and deploy starting from the original VITS pretrained models? Many thanks!
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
fn(i, *args)
File "/content/VITS_voice_conversion/finetune_speaker.py", line 133, in run
train_and_evaluate(rank, epoch, hps, [net_g, net_d], [optim_g, optim_d], [scheduler_g, scheduler_d], scaler, [train_loader, eval_loader], logger, [writer, writer_eval])
File "/content/VITS_voice_conversion/finetune_speaker.py", line 241, in train_and_evaluate
evaluate(hps, net_g, eval_loader, writer_eval)
File "/content/VITS_voice_conversion/finetune_speaker.py", line 279, in evaluate
y_hat, attn, mask, *_ = generator.module.infer(x, x_lengths, speakers, max_len=1000)
UnboundLocalError: local variable 'x' referenced before assignment
I've checked the file directory structure.
Uploading via video link keeps failing. I also tried uploading long audio to Google Drive, and it fails with the same error message.
The denoised_audio and /separated/htdemucs folders contain no files.
Using a txt file:
characters.txt
Error message:
100%|██████████| 59539/59539 [00:28<00:00, 2061.93it/s][MoviePy] Done.
[MoviePy] Writing audio in ./raw_audio/TeacherShen_62695.wav
100%|██████████| 47538/47538 [00:22<00:00, 2101.01it/s][MoviePy] Done.
[MoviePy] Writing audio in ./raw_audio/TeacherShen_881627.wav
100%|██████████| 67062/67062 [00:34<00:00, 1969.92it/s][MoviePy] Done.
Important: the default model was recently changed to `htdemucs` the latest Hybrid Transformer Demucs model. In some cases, this model can actually perform worse than previous models. To get back the old default model use `-n mdx_extra_q`.
Downloading: "https://dl.fbaipublicfiles.com/demucs/hybrid_transformer/955717e8-8726e21a.th" to /root/.cache/torch/hub/checkpoints/955717e8-8726e21a.th
100% 80.2M/80.2M [00:06<00:00, 13.3MB/s]
Selected model is a bag of 1 models. You will see that many progress bars per track.
Separated tracks will be stored in /content/VITS-fast-fine-tuning/separated/htdemucs
Separating track raw_audio/TeacherShen_881627.wav
100%|██████████████████████████████████████████████████████████████████████| 3042.0/3042.0 [02:25<00:00, 20.88seconds/s]
Killed
Important: the default model was recently changed to `htdemucs` the latest Hybrid Transformer Demucs model. In some cases, this model can actually perform worse than previous models. To get back the old default model use `-n mdx_extra_q`.
Selected model is a bag of 1 models. You will see that many progress bars per track.
Separated tracks will be stored in /content/VITS-fast-fine-tuning/separated/htdemucs
Separating track raw_audio/TeacherShen_62695.wav
100%|████████████████████████████████████████████████████████████████████| 2158.65/2158.65 [01:45<00:00, 20.50seconds/s]
Killed
Important: the default model was recently changed to `htdemucs` the latest Hybrid Transformer Demucs model. In some cases, this model can actually perform worse than previous models. To get back the old default model use `-n mdx_extra_q`.
Selected model is a bag of 1 models. You will see that many progress bars per track.
Separated tracks will be stored in /content/VITS-fast-fine-tuning/separated/htdemucs
Separating track raw_audio/TeacherShen_37232.wav
100%|██████████████████████████████████████████████████████████████████████| 2702.7/2702.7 [02:11<00:00, 20.55seconds/s]
Killed
Traceback (most recent call last):
File "denoise_audio.py", line 12, in <module>
wav, sr = torchaudio.load(f"./separated/htdemucs/{file}/vocals.wav", frame_offset=0, num_frames=-1, normalize=True,
File "/usr/local/lib/python3.8/dist-packages/torchaudio/backend/sox_io_backend.py", line 246, in load
return _fallback_load(filepath, frame_offset, num_frames, normalize, channels_first, format)
File "/usr/local/lib/python3.8/dist-packages/torchaudio/io/_compat.py", line 103, in load_audio
s = torch.classes.torchaudio.ffmpeg_StreamReader(src, format, None)
RuntimeError: Failed to open the input "./separated/htdemucs/TeacherShen_881627/vocals.wav" (No such file or directory).
2023-03-02 12:30:14.745081: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-03-02 12:30:18.161604: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.8/dist-packages/cv2/../../lib64:/usr/local/lib/python3.8/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2023-03-02 12:30:18.161791: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.8/dist-packages/cv2/../../lib64:/usr/local/lib/python3.8/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2023-03-02 12:30:18.161814: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Warning: no long audios & videos found, this IS expected if you have only uploaded short audios
this IS NOT expected if you have uploaded any long audios, videos or video links. Please check your file structure or make sure your audio/video language is supported.
2023-03-02 12:30:56.773756: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-03-02 12:30:57.789432: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.8/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2023-03-02 12:30:57.789562: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.8/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2023-03-02 12:30:57.789583: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Warning: no short audios found, this IS expected if you have only uploaded long audios, videos or video links.
this IS NOT expected if you have uploaded a zip file of short audios. Please check your file structure or make sure your audio language is supported.
Error when running STEP 3 (automatically process all uploaded data):
Traceback (most recent call last):
File "denoise_audio.py", line 12, in <module>
wav, sr = torchaudio.load(f"./separated/htdemucs/{file}/vocals.wav", frame_offset=0, num_frames=-1, normalize=True,
File "/usr/local/lib/python3.8/dist-packages/torchaudio/backend/sox_io_backend.py", line 246, in load
return _fallback_load(filepath, frame_offset, num_frames, normalize, channels_first, format)
File "/usr/local/lib/python3.8/dist-packages/torchaudio/io/_compat.py", line 103, in load_audio
s = torch.classes.torchaudio.ffmpeg_StreamReader(src, format, None)
RuntimeError: Failed to open the input "./separated/htdemucs/啊啊啊啊啊_000001.mp3/vocals.wav" (No such file or directory).
2023-03-01 04:05:04.300610: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-03-01 04:05:05.375819: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.8/dist-packages/cv2/../../lib64:/usr/local/lib/python3.8/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2023-03-01 04:05:05.375961: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.8/dist-packages/cv2/../../lib64:/usr/local/lib/python3.8/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2023-03-01 04:05:05.375984: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2023-03-01 04:05:31.699668: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-03-01 04:05:32.933781: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.8/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2023-03-01 04:05:32.933914: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.8/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2023-03-01 04:05:32.933939: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
nvidia-smi
output (Wed Mar 1 04:14:02 2023):
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12 Driver Version: 525.85.12 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |
| N/A 69C P0 31W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
If I continue running the next code block, an assertion error appears: no speaker found.
Step 3:
Your-zip-file.zip(application/x-zip-compressed) - 9877593 bytes, last modified: 2023/2/24 - 100% done
Saving Your-zip-file.zip to Your-zip-file.zip
Archive: ./custom_character_voice/custom_character_voice.zip
creating: ./custom_character_voice/Your-zip-file/
creating: ./custom_character_voice/Your-zip-file/RPK16/
inflating: ./custom_character_voice/Your-zip-file/RPK16/RPK16_ALLHALLOWS_JP.wav
inflating: ./custom_character_voice/Your-zip-file/RPK16/RPK16_BREAK_JP.wav
inflating: ./custom_character_voice/Your-zip-file/RPK16/RPK16_BUILDOVER_JP.wav
inflating: ./custom_character_voice/Your-zip-file/RPK16/RPK16_COMBINE_JP.wav
inflating: ./custom_character_voice/Your-zip-file/RPK16/RPK16_DIALOGUE1_JP.wav
inflating: ./custom_character_voice/Your-zip-file/RPK16/RPK16_DIALOGUE3_JP.wav
inflating: ./custom_character_voice/Your-zip-file/RPK16/RPK16_FEED_JP.wav
inflating: ./custom_character_voice/Your-zip-file/RPK16/RPK16_FORMATION_JP.wav
inflating: ./custom_character_voice/Your-zip-file/RPK16/RPK16_GAIN_JP.wav
inflating: ./custom_character_voice/Your-zip-file/RPK16/RPK16_GOATTACK_JP.wav
inflating: ./custom_character_voice/Your-zip-file/RPK16/RPK16_HELLO_JP.wav
inflating: ./custom_character_voice/Your-zip-file/RPK16/RPK16_LOADING_JP.wav
inflating: ./custom_character_voice/Your-zip-file/RPK16/RPK16_MEET_JP.wav
inflating: ./custom_character_voice/Your-zip-file/RPK16/RPK16_NEWYEAR_JP.wav
inflating: ./custom_character_voice/Your-zip-file/RPK16/RPK16_OPERATIONBEGIN_JP.wav
inflating: ./custom_character_voice/Your-zip-file/RPK16/RPK16_OPERATIONOVER_JP.wav
inflating: ./custom_character_voice/Your-zip-file/RPK16/RPK16_RETREAT_JP.wav
inflating: ./custom_character_voice/Your-zip-file/RPK16/RPK16_TIP_JP.wav
inflating: ./custom_character_voice/Your-zip-file/RPK16/RPK16_VALENTINE_JP.wav
inflating: ./custom_character_voice/Your-zip-file/RPK16/RPK16_WIN_JP.wav
2023-02-24 00:46:22.335055: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-02-24 00:46:23.282110: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-02-24 00:46:23.282236: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-02-24 00:46:23.282258: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
finished
Excerpt from step 4:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
fn(i, *args)
File "/content/VITS_voice_conversion/VITS_voice_conversion/VITS_voice_conversion/VITS_voice_conversion/finetune_speaker.py", line 133, in run
train_and_evaluate(rank, epoch, hps, [net_g, net_d], [optim_g, optim_d], [scheduler_g, scheduler_d], scaler, [train_loader, eval_loader], logger, [writer, writer_eval])
File "/content/VITS_voice_conversion/VITS_voice_conversion/VITS_voice_conversion/VITS_voice_conversion/finetune_speaker.py", line 241, in train_and_evaluate
evaluate(hps, net_g, eval_loader, writer_eval)
File "/content/VITS_voice_conversion/VITS_voice_conversion/VITS_voice_conversion/VITS_voice_conversion/finetune_speaker.py", line 279, in evaluate
y_hat, attn, mask, *_ = generator.module.infer(x, x_lengths, speakers, max_len=1000)
UnboundLocalError: local variable 'x' referenced before assignment
I've loaded the dataset into colab but got this error on step 4:
0% 0/55 [00:37<?, ?it/s]
Traceback (most recent call last):
File "finetune_speaker.py", line 320, in <module>
main()
File "finetune_speaker.py", line 55, in main
mp.spawn(run, nprocs=n_gpus, args=(n_gpus, hps,))
File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 240, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
while not context.join():
File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 160, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
fn(i, *args)
File "/content/VITS_voice_conversion/finetune_speaker.py", line 133, in run
train_and_evaluate(rank, epoch, hps, [net_g, net_d], [optim_g, optim_d], [scheduler_g, scheduler_d], scaler, [train_loader, eval_loader], logger, [writer, writer_eval])
File "/content/VITS_voice_conversion/finetune_speaker.py", line 241, in train_and_evaluate
evaluate(hps, net_g, eval_loader, writer_eval)
File "/content/VITS_voice_conversion/finetune_speaker.py", line 279, in evaluate
y_hat, attn, mask, *_ = generator.module.infer(x, x_lengths, speakers, max_len=1000)
UnboundLocalError: local variable 'x' referenced before assignment
My dataset consists of 10 voices, each with 10-30 ten-second .mp3 files.
1660Ti with 6GB: on similar VITS projects it barely runs after changing batch_size to 4, but here it still runs out of VRAM.
A small suggestion as well:
For the training backend, I suggest writing it like this:
from sys import platform
if platform == "win32":
backend = 'gloo'
else:
backend = 'nccl'
dist.init_process_group(backend=backend, init_method='env://', world_size=n_gpus, rank=rank)
I also suggest splitting the resampling step out and handling it with multiprocessing;
see resample.py.
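The suggested split could look like the sketch below. `resample_one` here is only a stub for the real per-file work (load the wav, resample to the target rate, rewrite it, e.g. with librosa or torchaudio); the paths and pool size are illustrative assumptions, not the repo's code:

```python
# Hypothetical sketch: fan resampling out across worker processes instead
# of looping over files serially.
from concurrent.futures import ProcessPoolExecutor

def resample_one(path, target_sr=22050):
    # Placeholder for the real work, roughly:
    #   wav = load(path, sr=target_sr); save(path, wav, target_sr)
    return path

def resample_all(paths, workers=4):
    with ProcessPoolExecutor(max_workers=workers) as ex:
        return list(ex.map(resample_one, paths))
```

Because resampling is CPU-bound, a process pool (rather than threads) is what actually gives the speedup here.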
Getting ZeroDivisionError: integer division or modulo by zero.
It succeeded once before, but every attempt since keeps hitting this error.
According to tests by helpful Bilibili users, as long as you download the audio annotation files generated by Colab, subsequent training proceeds normally.
I'd like to ask about the annotation file's format, so that it can be generated by other means.
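For what it's worth, VITS-style training filelists are commonly plain text with one `audio_path|speaker_id|text` line per clip, with the text wrapped in language tokens; that format is an assumption here and should be checked against the files the Colab notebook actually produces. A sketch of generating one:

```python
# Hypothetical sketch: write a VITS-style annotation file, assuming the
# common "path|speaker_id|text" format with [ZH]...[ZH]-style language
# tokens. Confirm the exact format against the Colab-generated files.
samples = [
    ("./custom_character_voice/spk0/clip_000.wav", 0, "你好"),
    ("./custom_character_voice/spk0/clip_001.wav", 0, "再见"),
]
with open("annotation_train.txt", "w", encoding="utf-8") as f:
    for path, sid, text in samples:
        f.write(f"{path}|{sid}|[ZH]{text}[ZH]\n")
```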
The data is about 1,500 short mp3 clips.
The 'add auxiliary training data' option was not checked.
The error message is as follows:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
fn(i, *args)
File "/content/VITS-fast-fine-tuning/finetune_speaker_v2.py", line 133, in run
train_and_evaluate(rank, epoch, hps, [net_g, net_d], [optim_g, optim_d], [scheduler_g, scheduler_d], scaler, [train_loader, eval_loader], logger, [writer, writer_eval])
File "/content/VITS-fast-fine-tuning/finetune_speaker_v2.py", line 153, in train_and_evaluate
for batch_idx, (x, x_lengths, spec, spec_lengths, y, y_lengths, speakers) in enumerate(tqdm(train_loader)):
File "/usr/local/lib/python3.8/dist-packages/tqdm/std.py", line 1195, in __iter__
for obj in iterable:
File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 435, in __iter__
return self._get_iterator()
File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 381, in _get_iterator
return _MultiProcessingDataLoaderIter(self)
File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 988, in __init__
super(_MultiProcessingDataLoaderIter, self).__init__(loader)
File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 598, in __init__
self._sampler_iter = iter(self._index_sampler)
File "/content/VITS-fast-fine-tuning/data_utils.py", line 233, in __iter__
ids_bucket = ids_bucket + ids_bucket * (rem // len_bucket) + ids_bucket[:(rem % len_bucket)]
ZeroDivisionError: integer division or modulo by zero
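The division by zero happens when a bucket in the length-bucketed sampler is empty (`len_bucket == 0`), which can occur when no clips fall into one of the length ranges. A hedged reconstruction of the failing padding step from `data_utils.py`, with a guard that avoids the crash (the guard is a hypothetical fix, not the repo's current code):

```python
# Sketch of the sampler's bucket-padding step; repeats each bucket's ids
# until the bucket reaches its target size. An empty bucket makes the
# original expression divide by zero.
def pad_bucket(ids_bucket, rem):
    len_bucket = len(ids_bucket)
    if len_bucket == 0:  # hypothetical guard: skip empty buckets
        return []
    return ids_bucket + ids_bucket * (rem // len_bucket) + ids_bucket[: rem % len_bucket]
```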
Hi again!
I found a Thai cleaner in the text folder, so I have a question: is it possible to train the model in Thai or other languages?
Thanks!
Great work! Was hoping you could give some brief details on your pretraining - mostly how many hours of data per speaker and how many epochs.
An error appeared after training for a while; it seems different from the errors in previous issues:
Imageio: 'ffmpeg-linux64-v3.3.1' was not found on your computer; downloading it now.
Try 1. Download from https://github.com/imageio/imageio-binaries/raw/master/ffmpeg/ffmpeg-linux64-v3.3.1 (43.8 MB)
Downloading: 45929032/45929032 bytes (100.0%)
Done
File saved as /root/.imageio/ffmpeg/ffmpeg-linux64-v3.3.1.
Important: the default model was recently changed to `htdemucs` the latest Hybrid Transformer Demucs model. In some cases, this model can actually perform worse than previous models. To get back the old default model use `-n mdx_extra_q`.
Downloading: "https://dl.fbaipublicfiles.com/demucs/hybrid_transformer/955717e8-8726e21a.th" to /root/.cache/torch/hub/checkpoints/955717e8-8726e21a.th
100% 80.2M/80.2M [00:04<00:00, 20.3MB/s]
Selected model is a bag of 1 models. You will see that many progress bars per track.
Separated tracks will be stored in /content/VITS-fast-fine-tuning/separated/htdemucs
Separating track raw_audio/66_6.wav
100%|██████████████████████████████████████████████| 3708.8999999999996/3708.8999999999996 [01:16<00:00, 48.46seconds/s]
Important: the default model was recently changed to `htdemucs` the latest Hybrid Transformer Demucs model. In some cases, this model can actually perform worse than previous models. To get back the old default model use `-n mdx_extra_q`.
Selected model is a bag of 1 models. You will see that many progress bars per track.
Separated tracks will be stored in /content/VITS-fast-fine-tuning/separated/htdemucs
Separating track raw_audio/66_0.wav
100%|████████████████████████████████████████████████| 620.0999999999999/620.0999999999999 [00:14<00:00, 43.96seconds/s]
Important: the default model was recently changed to `htdemucs` the latest Hybrid Transformer Demucs model. In some cases, this model can actually perform worse than previous models. To get back the old default model use `-n mdx_extra_q`.
Selected model is a bag of 1 models. You will see that many progress bars per track.
Separated tracks will be stored in /content/VITS-fast-fine-tuning/separated/htdemucs
Separating track raw_audio/66_3.wav
100%|██████████████████████████████████████████████████████████████████████| 2843.1/2843.1 [00:55<00:00, 50.85seconds/s]
Important: the default model was recently changed to `htdemucs` the latest Hybrid Transformer Demucs model. In some cases, this model can actually perform worse than previous models. To get back the old default model use `-n mdx_extra_q`.
Selected model is a bag of 1 models. You will see that many progress bars per track.
Separated tracks will be stored in /content/VITS-fast-fine-tuning/separated/htdemucs
Separating track raw_audio/66_5.wav
100%|████████████████████████████████████████████████████████████████████| 4124.25/4124.25 [01:20<00:00, 51.52seconds/s]
Important: the default model was recently changed to `htdemucs` the latest Hybrid Transformer Demucs model. In some cases, this model can actually perform worse than previous models. To get back the old default model use `-n mdx_extra_q`.
Selected model is a bag of 1 models. You will see that many progress bars per track.
Separated tracks will be stored in /content/VITS-fast-fine-tuning/separated/htdemucs
Separating track raw_audio/66_7.wav
100%|██████████████████████████████████████████████| 3650.3999999999996/3650.3999999999996 [01:11<00:00, 51.35seconds/s]
Important: the default model was recently changed to `htdemucs` the latest Hybrid Transformer Demucs model. In some cases, this model can actually perform worse than previous models. To get back the old default model use `-n mdx_extra_q`.
Selected model is a bag of 1 models. You will see that many progress bars per track.
Separated tracks will be stored in /content/VITS-fast-fine-tuning/separated/htdemucs
Separating track raw_audio/66_4.wav
100%|██████████████████████████████████████████████| 3469.0499999999997/3469.0499999999997 [01:07<00:00, 51.24seconds/s]
Important: the default model was recently changed to `htdemucs` the latest Hybrid Transformer Demucs model. In some cases, this model can actually perform worse than previous models. To get back the old default model use `-n mdx_extra_q`.
Selected model is a bag of 1 models. You will see that many progress bars per track.
Separated tracks will be stored in /content/VITS-fast-fine-tuning/separated/htdemucs
Separating track raw_audio/66_2.wav
100%|██████████████████████████████████████████████████████████████████████| 3077.1/3077.1 [01:00<00:00, 50.85seconds/s]
Important: the default model was recently changed to `htdemucs` the latest Hybrid Transformer Demucs model. In some cases, this model can actually perform worse than previous models. To get back the old default model use `-n mdx_extra_q`.
Selected model is a bag of 1 models. You will see that many progress bars per track.
Separated tracks will be stored in /content/VITS-fast-fine-tuning/separated/htdemucs
Separating track raw_audio/66_1.wav
100%|████████████████████████████████████████████████████████████████████| 4592.25/4592.25 [01:29<00:00, 51.51seconds/s]
2023-02-27 11:45:40.268184: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-02-27 11:45:40.419090: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0
2023-02-27 11:45:41.854468: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.8/dist-packages/cv2/../../lib64:/usr/local/lib/python3.8/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2023-02-27 11:45:41.854570: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.8/dist-packages/cv2/../../lib64:/usr/local/lib/python3.8/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2023-02-27 11:45:41.854591: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
100%|██████████████████████████████████████| 1.42G/1.42G [00:06<00:00, 228MiB/s]
transcribing ./denoised_audio/66_6.wav...
transcribing ./denoised_audio/66_0.wav...
transcribing ./denoised_audio/66_3.wav...
transcribing ./denoised_audio/66_5.wav...
transcribing ./denoised_audio/66_7.wav...
nn not supported, ignoring...
Traceback (most recent call last):
File "long_audio_transcribe.py", line 50, in <module>
text = lang2token[lang] + text.replace("\n", "") + lang2token[lang]
KeyError: 'nn'
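Whisper detected the language code `nn` (Nynorsk), which has no entry in `lang2token`, so the script printed the "not supported" notice for one segment and then crashed indexing the map for another. A hedged sketch of skipping unsupported segments instead of indexing unconditionally (the token map below is illustrative; the real one lives in long_audio_transcribe.py):

```python
# Illustrative token map; the real script defines its own.
lang2token = {"ja": "[JA]", "zh": "[ZH]", "en": "[EN]"}

def wrap_text(lang, text):
    """Return token-wrapped text, or None for unsupported languages."""
    if lang not in lang2token:
        print(f"{lang} not supported, ignoring...")
        return None
    return lang2token[lang] + text.replace("\n", "") + lang2token[lang]
```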
2023-02-27 12:32:19.528138: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-02-27 12:32:19.687193: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0
2023-02-27 12:32:20.538124: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.8/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2023-02-27 12:32:20.538216: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.8/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2023-02-27 12:32:20.538236: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
##################################################################################################
The likely problem is that the amount of audio is too large: I uploaded 7 segments of roughly 2 hours (~700 MB) each via Google Drive, and step 3 ran for over an hour; judging from the printed output, most of it seems to have been processed successfully.
That hour-plus of step 3 has already used up half of my Colab Pro quota :( Is there any way to retrieve the transcription txt files that were already produced? And how can I train from a ready-made dataset of short wavs + txt annotations? (In step 3 the long audio was successfully denoised and split into short clips, which I downloaded as a zip; now I need the matching text annotations.)
##################################################################################################
Audio file structure:
Traceback (most recent call last):
File "inference.py", line 86, in <module>
File "utils.py", line 194, in get_hparams_from_file
FileNotFoundError: [Errno 2] No such file or directory: './finetune-speaker.json'
[23136] Failed to execute script 'inference' due to unhandled exception!
The synthesized English is almost unpronounceable. Do I need to switch to a different base model? Any recommendations?
The console outputted:
FileNotFoundError: [Errno 2] No such file or directory: 'D:\\MyPath\\inference\\jieba\\dict.txt'
It turned out to be fine after I manually added this file from the Jieba repository.
To reproduce this error I deleted dict.txt
and found that Jieba falls back to its cache:
DEBUG:jieba:Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\0x114514BB\AppData\Local\Temp\jieba.cache
DEBUG:jieba:Loading model from cache C:\Users\0x114514BB\AppData\Local\Temp\jieba.cache
Loading model cost 0.657 seconds.
DEBUG:jieba:Loading model cost 0.657 seconds.
I reproduced this error after also deleting jieba.cache. Please add jieba/dict.txt to the release pack.
Colab's built-in upload and download features are very hard to use: they are heavily throttled, and an unexpected disconnect loses all the data. This could be integrated with just a few lines of code; could it be added to that Jupyter notebook?
For example, the so-vits-svc notebook already implements this.
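As an interim workaround, checkpoints and annotation files can be copied to a mounted Google Drive folder so a disconnect doesn't lose them; inside Colab this assumes `from google.colab import drive; drive.mount('/content/drive')` has been run first, and the paths below are hypothetical:

```python
import os
import shutil

def backup(src, dst_dir):
    """Copy a file into dst_dir (created if missing); returns the new path."""
    os.makedirs(dst_dir, exist_ok=True)
    return shutil.copy(src, dst_dir)

# In Colab, after mounting Drive (illustrative paths):
# backup("G_latest.pth", "/content/drive/MyDrive/vits_backup")
```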
Hello, I was wondering if you reached any groundbreaking quality with your method, or at least the same quality as the many-to-many conversion in the original VITS demo, and if you can share the results.
P.S. About the architecture: I also thought about any-to-any VITS before. The idea was to extract speaker embeddings from a pretrained speaker encoder like ECAPA-TDNN, train on a large dataset of 1k+ speakers, and scale up the parameters.
Thanks in advance
I'd like to ask: is it possible to replace the pretrained model that has that heavy "colonel" accent?