Diff-SVC Refactor (inference, training, and model code simplified and updated from RCell's Diff-SVC)
A Diff-SVC implementation based on the unofficial DiffSinger repository.
The project is still under development and testing, although the training and inference code is complete.
Preliminary tests suggest that when the dataset contains too many speakers (for example, 60 or 70), sound (timbre) leakage gets worse, while with about 5 speakers the leakage is basically the same as with a single speaker.
The repository currently has many branches; each one is a different approach under testing.
This repository performs singing voice timbre conversion based on DiffSinger + softvc. Compared with the original Diff-SVC repository, it has the following advantages and disadvantages:
- Supports multiple speakers
- It is based on the unofficial DiffSinger repository, so the code structure is simpler and easier to understand
- The vocoder is also the 44.1 kHz DiffSinger community vocoder
- Acceleration is not supported
Pre-downloaded files
- Place the softvc hubert checkpoint (hubert-soft-0d54a1f4.pt) in the hubert directory
- Place the 44.1 kHz DiffSinger community vocoder (model) in the hifigan directory
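A quick way to confirm that both files are in place before preprocessing is a small check like the one below. This is only a sketch: the helper name `missing_files` is hypothetical, and treating `model` as the literal vocoder filename inside `hifigan` is an assumption based on the list above.

```python
from pathlib import Path

# Expected locations, taken from the list above. Treating "model" as the
# literal vocoder filename is an assumption; adjust it if your download differs.
REQUIRED_FILES = [
    Path("hubert") / "hubert-soft-0d54a1f4.pt",
    Path("hifigan") / "model",
]


def missing_files(root="."):
    """Return the required files (relative paths) that do not exist under root."""
    root = Path(root)
    return [p for p in REQUIRED_FILES if not (root / p).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    for path in missing:
        print(f"missing: {path}")
    if not missing:
        print("all pre-downloaded files found")
```

Run it from the repository root; any path it prints still needs to be downloaded or moved.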
You only need to put the dataset into the dataset_raw directory, one folder per speaker, with the following file structure:
dataset_raw
├───speaker0
│   ├───xxx1-xxx1.wav
│   ├───...
│   └───Lxx-0xx8.wav
└───speaker1
    ├───xx2-0xxx2.wav
    ├───...
    └───xxx7-xxx007.wav
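The layout above can be sanity-checked before running any preprocessing. The sketch below is not part of the repository; `scan_dataset` and `layout_problems` are hypothetical helper names, and the checks (one folder per speaker, at least one .wav per folder, no loose files at the top level) are assumptions drawn from the tree shown here, not rules enforced by the scripts.

```python
from pathlib import Path


def scan_dataset(root="dataset_raw"):
    """Map each speaker folder name to the number of .wav files directly inside it."""
    root = Path(root)
    counts = {}
    for speaker_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[speaker_dir.name] = sum(1 for _ in speaker_dir.glob("*.wav"))
    return counts


def layout_problems(root="dataset_raw"):
    """Return human-readable layout problems; an empty list means none were found."""
    root = Path(root)
    if not root.is_dir():
        return [f"{root} does not exist"]
    problems = [
        f"stray file at top level: {p.name}"
        for p in sorted(root.iterdir())
        if p.is_file()
    ]
    counts = scan_dataset(root)
    if not counts:
        problems.append("no speaker folders found")
    problems += [
        f"speaker folder has no .wav files: {name}"
        for name, n in counts.items()
        if n == 0
    ]
    return problems


if __name__ == "__main__":
    for problem in layout_problems():
        print(problem)
```

An empty result only means the folder shape matches the tree; it says nothing about audio quality or sample rate.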
Preprocessing is basically the same as in sovits3.0:
- Resampling
python resample.py
- Automatically divide training set, validation set and test set
python preprocess_flist_config.py
- Generate hubert, f0, mel and stats
python preprocess_hubert_f0.py && python gen_stats.py
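The preprocessing commands above can also be chained from a single Python script. The following is a minimal sketch, not something the repository provides: `run_preprocessing` is a hypothetical helper, and it assumes the four scripts take no arguments and are run from the repository root, as in the commands above.

```python
import subprocess
import sys

# Script names copied from the steps above, in the order they are listed.
PREPROCESS_STEPS = [
    ["resample.py"],
    ["preprocess_flist_config.py"],
    ["preprocess_hubert_f0.py"],
    ["gen_stats.py"],
]


def run_preprocessing(dry_run=True, python=sys.executable):
    """Build the preprocessing commands and, unless dry_run is set, run them in order.

    Returns the list of commands either way.
    """
    commands = [[python, *step] for step in PREPROCESS_STEPS]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so a failed step stops the remaining ones (like && in the shell).
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Dry run by default: only print what would be executed.
    for cmd in run_preprocessing(dry_run=True):
        print(" ".join(cmd))
```

Calling `run_preprocessing(dry_run=False)` executes the steps. Stopping at the first failure is a design choice of this sketch; it avoids computing features or stats from a partially prepared dataset.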
After executing the above steps, the dataset directory contains the preprocessed data. You can then delete the dataset_raw folder,
or delete the temporary wav files left over from resampling:
rm dataset/*/*.wav
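For platforms without `rm`, or to preview what would be deleted first, the same cleanup could be done from Python. This is a sketch under assumptions: `remove_resampled_wavs` is a hypothetical helper, and it only looks one level deep (`dataset/<speaker>/*.wav`), roughly matching the shell pattern.

```python
from pathlib import Path


def remove_resampled_wavs(root="dataset", dry_run=True):
    """List the .wav files one level below root and, unless dry_run is set, delete them.

    Returns the matched paths either way.
    """
    targets = sorted(Path(root).glob("*/*.wav"))
    if not dry_run:
        for wav in targets:
            wav.unlink()
    return targets


if __name__ == "__main__":
    # Dry run by default: only show what would be removed.
    for wav in remove_resampled_wavs(dry_run=True):
        print(f"would remove: {wav}")
```

Only files ending in `.wav` are touched; other files in the speaker folders are left alone. Pass `dry_run=False` once the preview looks right.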
Training
python3 train.py --model naive --dataset ms --restore_step RESTORE_STEP