Hi, I have a 15 min recording of a new speaker. I'd like to train assem-vc to perf


Best way to extend the model to a new speaker about assem-vc (closed)

maum-ai commented on September 22, 2024

Best way to extend the model to a new speaker

from assem-vc.

Comments (6)

wookladin commented on September 22, 2024

Hello!
Well, I've never done fine-tuning on single-speaker datasets before.
Hence, I'm not sure whether fine-tuning the VC decoder will actually work well.
If fine-tuning doesn't work well, it would be a good idea to train on VCTK or LibriTTS together with the single-speaker dataset.
On the other hand, in my experience HiFi-GAN worked well even when fine-tuned on single-speaker datasets.

Please try this approach. Thank you.
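The suggestion to train on VCTK or LibriTTS together with the single-speaker recordings amounts to building one mixed training list. The plain-Python sketch below shows one way to do that, under stated assumptions: the `Utterance` record, the helper names and the `oversample` factor (repeating the new speaker's clips so a ~15 minute set is not drowned out by a much larger corpus) are all hypothetical and are not part of assem-vc, whose own metadata files and loaders are not shown in this thread.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    """One training example (hypothetical record): audio path, transcript, speaker."""
    wav_path: str
    text: str
    speaker: str


def build_mixed_trainset(multi_speaker, new_speaker, oversample=1, seed=0):
    """Combine a multi-speaker corpus with the new speaker's utterances.

    `oversample` repeats the new speaker's data; this is an assumed heuristic,
    not something the thread prescribes.
    """
    if oversample < 1:
        raise ValueError("oversample must be >= 1")
    mixed = list(multi_speaker) + list(new_speaker) * oversample
    # Seeded shuffle so the same inputs always give the same ordering.
    random.Random(seed).shuffle(mixed)
    return mixed


def speaker_ids(utterances):
    """Map speaker labels to integer IDs in a stable, sorted order."""
    return {spk: i for i, spk in enumerate(sorted({u.speaker for u in utterances}))}
```

For example, 100 illustrative VCTK-style clips for `p225` plus 10 clips of the new speaker with `oversample=3` give a 130-item list in which 30 entries belong to the new speaker.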


vishalbhavani commented on September 22, 2024

Hi @wookladin,

As expected, single-speaker fine-tuning of the VC decoder resulted in overfitting because of the small amount of data. The lowest validation loss (~0.6) still seems pretty high. What was the best validation loss in your multi-speaker experiments?
GTA fine-tuning of HiFi-GAN with the above model gave surprising results: the loss did not improve with training. Is that because the decoder itself was not good enough to produce decent GTA mels?
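GTA (ground-truth-aligned) fine-tuning means running the trained decoder with teacher forcing over the training utterances and fine-tuning HiFi-GAN on those predicted mels paired with the real waveforms. That only works if each mel frame still lines up with its `hop_length` samples of audio. The sketch below shows the aligned random crop, similar in spirit to HiFi-GAN's fine-tuning data loader; `hop_length=256` and `segment_size=8192` are borrowed from HiFi-GAN's V1 config as an assumption (check assem-vc's own config), and `aligned_gta_crop` and `tolerance_frames` are hypothetical names.

```python
import random


def aligned_gta_crop(gta_mel, audio, hop_length=256, segment_size=8192,
                     tolerance_frames=4, rng=None):
    """Randomly crop a (GTA mel, waveform) pair while keeping it time-aligned.

    gta_mel: list of mel frames predicted by the decoder with teacher forcing.
    audio:   list of waveform samples for the same utterance.
    """
    if segment_size % hop_length != 0:
        raise ValueError("segment_size must be a multiple of hop_length")
    if rng is None:
        rng = random.Random()
    frames_per_seg = segment_size // hop_length

    # Each mel frame covers hop_length samples, so the frame count should be
    # close to len(audio) // hop_length; a large gap means the pair is misaligned.
    expected_frames = len(audio) // hop_length
    if abs(len(gta_mel) - expected_frames) > tolerance_frames:
        raise ValueError(
            f"GTA mel has {len(gta_mel)} frames, expected about {expected_frames}")

    # Only start where both the mel and the audio have a full segment left.
    usable_frames = min(len(gta_mel), expected_frames)
    if usable_frames < frames_per_seg:
        return gta_mel, audio  # too short to crop; a real loader would pad

    mel_start = rng.randint(0, usable_frames - frames_per_seg)
    mel_seg = gta_mel[mel_start:mel_start + frames_per_seg]
    audio_seg = audio[mel_start * hop_length:(mel_start + frames_per_seg) * hop_length]
    return mel_seg, audio_seg
```

The length check is a cheap way to catch GTA mels that were generated without teacher forcing (and therefore have a different duration than the real audio), since such pairs cannot be aligned at all.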

P.S.: I'm trying multi-speaker training now. I'll keep you posted on the results.


wookladin commented on September 22, 2024

Hi.

I've uploaded the validation loss graph of the VC decoder to issue #17.
There, the loss converges to around 0.2, so it seems your VC decoder has overfitted.
A multi-speaker setting will help you avoid overfitting.
Thank you for sharing the results!
Unfortunately, I can't tell whether the decoder is the cause just by looking at the loss graph.
Did you listen to the audio logged at the validation step?
The perceptual quality at that point seems to be an important criterion for judging this.

Thank you!
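Overfitting of the kind discussed above usually shows up as the validation loss bottoming out (here ~0.6, versus ~0.2 in the reference graph) and then creeping back up while training continues. A minimal, generic patience-based tracker is sketched below to flag that point and remember which step to keep. It is an assumption for illustration, not something assem-vc is known to ship; `EarlyStopping`, `patience` and `min_delta` are hypothetical names. As noted in the reply, listening to the logged validation audio remains the more important check.

```python
import math


class EarlyStopping:
    """Track validation loss and report when it has stopped improving."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience      # evaluations allowed without improvement
        self.min_delta = min_delta    # minimum decrease that counts as improvement
        self.best = math.inf
        self.best_step = None
        self.bad_evals = 0

    def update(self, step, val_loss):
        """Record one validation result; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            # New best: remember it (e.g. keep this checkpoint) and reset the counter.
            self.best = val_loss
            self.best_step = step
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience
```

With illustrative (not real) losses `0.90, 0.75, 0.62, 0.60, 0.61, 0.63, 0.66` logged every 1000 steps and `patience=3`, the tracker keeps step 4000 as best and signals a stop at step 7000.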


vishalbhavani commented on September 22, 2024

Hi @wookladin,
As expected, multi-speaker training solved the overfitting problem. After that, the audio logged during vocoder training also sounds good. Thanks!


kannadaraj commented on September 22, 2024

@vishalbhavani Thanks for confirming that it worked in your case. Did you warm-start from the pre-trained model with the new speaker's data, or did you train from scratch?


vishalbhavani commented on September 22, 2024

I started from the pre-trained model.
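The thread does not say how the warm start was done. One common PyTorch pattern is sketched below as an assumption: copy every pre-trained parameter whose name and shape match the new model, and skip the rest. Parameters that changed size (hypothetically, a speaker embedding table that grew by one row for the new speaker; whether and how assem-vc's decoder stores such a table is not confirmed here) then keep their fresh initialization. `compatible_state` is a hypothetical helper; it only relies on values exposing a `.shape` attribute, as PyTorch tensors do.

```python
def compatible_state(pretrained, model_state):
    """Keep only pre-trained entries whose name and shape match the new model.

    Both arguments map parameter names to tensor-like objects with a `.shape`
    attribute, e.g. the dicts returned by `torch.nn.Module.state_dict()`.
    Returns (kept, skipped): the loadable entries and the names that were dropped.
    """
    kept, skipped = {}, []
    for name, value in pretrained.items():
        target = model_state.get(name)
        # Use an explicit None check: tensors should not be tested for truthiness.
        if target is not None and tuple(target.shape) == tuple(value.shape):
            kept[name] = value
        else:
            skipped.append(name)  # missing from the new model, or resized
    return kept, skipped
```

In PyTorch, the `kept` dict can then be passed to `model.load_state_dict(kept, strict=False)`. Because `strict=False` tolerates missing keys, the skipped parameters stay at their current (freshly initialized) values, and printing `skipped` makes it easy to confirm that only the intended layers were left out.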

