I'm afraid the problem lies in the mean/variance statistics used to normalize the mel-spectrograms: I can reproduce your results by using inaccurate mean/variance values. I have uploaded my mean/variance in 'mel_stats', along with an example script for performing VC, convert_example.py. You can give it a try; my converted results are attached FYI.
converted.zip
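To illustrate why inaccurate statistics matter, here is a short derivation under an assumption the thread does not spell out: both the feature extraction and the vocoder use per-bin z-score normalization. Let $m_d$ be a log-mel value in bin $d$, $(\mu^{(a)}_d, \sigma^{(a)}_d)$ the statistics used when extracting features, and $(\mu^{(v)}_d, \sigma^{(v)}_d)$ the statistics the vocoder was trained with. The vocoder then effectively decodes the log-mel $\hat{m}_d$:

$$
\tilde{x}_d = \frac{m_d - \mu^{(a)}_d}{\sigma^{(a)}_d},
\qquad
\hat{m}_d = \sigma^{(v)}_d\,\tilde{x}_d + \mu^{(v)}_d
= \frac{\sigma^{(v)}_d}{\sigma^{(a)}_d}\left(m_d - \mu^{(a)}_d\right) + \mu^{(v)}_d
$$

So $\hat{m}_d = m_d$ for every input only when $\mu^{(a)}_d = \mu^{(v)}_d$ and $\sigma^{(a)}_d = \sigma^{(v)}_d$; any mismatch rescales and shifts each bin, which is one plausible explanation for the degraded resynthesis discussed in this thread.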
Great discovery!
Your converted results are good.
I'll try converting with the 'mel_stats' you provided.
One last question: how did you generate this 'mel_stats'? Is it the same file as the mel_stats.npy extracted when running preprocess.py?
No, the 'mel_stats' is the one used to train Parallel WaveGAN; it is the same as 'stats.h5' inside the 'vocoder' directory of the pre-trained models. Besides, I experimented several times with the 'mel_stats' produced by 'preprocess.py' to generate wavs, and it also worked well for me. Maybe you can compare your 'mel_stats' with the provided 'mel_stats' to check whether yours are correct.
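A minimal sketch of such a comparison, assuming both statistics files have been loaded as NumPy arrays of shape (2, n_mels), with row 0 holding the per-bin mean and row 1 the per-bin std (scale). That layout, the file paths in the comments, and the helper name `compare_mel_stats` are assumptions for illustration, not part of the repository.

```python
import numpy as np


def compare_mel_stats(stats_a: np.ndarray, stats_b: np.ndarray, atol: float = 1e-4) -> dict:
    """Compare two mel statistics arrays.

    Assumes each array has shape (2, n_mels): row 0 is the per-bin mean,
    row 1 the per-bin std (scale). This layout is an assumption.
    """
    if stats_a.shape != stats_b.shape:
        raise ValueError(f"shape mismatch: {stats_a.shape} vs {stats_b.shape}")
    max_mean_diff = float(np.max(np.abs(stats_a[0] - stats_b[0])))
    max_std_diff = float(np.max(np.abs(stats_a[1] - stats_b[1])))
    return {
        "max_mean_diff": max_mean_diff,
        "max_std_diff": max_std_diff,
        "match": max_mean_diff <= atol and max_std_diff <= atol,
    }


if __name__ == "__main__":
    # Synthetic stand-ins. In practice you might load the two files, e.g.
    # np.load("mel_stats/stats.npy") and np.load("my_mel_stats.npy")
    # (both paths are hypothetical).
    rng = np.random.default_rng(0)
    provided = np.stack([rng.normal(-5.0, 1.0, 80), rng.uniform(0.5, 2.0, 80)])
    mine = provided.copy()
    mine[0] += 0.3  # simulate a mean computed on a different corpus
    print(compare_mel_stats(provided, provided))
    print(compare_mel_stats(provided, mine))
```

If you want to compare against the vocoder's 'stats.h5' directly, note that it is an HDF5 file; ParallelWaveGAN typically stores its statistics there as 'mean' and 'scale' datasets, which could be read with h5py and stacked into the same (2, n_mels) layout.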
Excellent!
You are right. The key to the problem is the mean and scale (std) used for normalization.
Our results differ because the mel_stats.npy I used is inconsistent with the statistics your vocoder was trained with: I used the mel_stats.npy generated from the VCTK dataset I processed myself.
Thank you very much for your guidance!
Hi, did you install ParallelWaveGAN? Installing it should fix your issue. I will add this point to the README.
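A quick way to check whether the installation worked is sketched below. It assumes that convert.py shells out to the `parallel-wavegan-decode` command-line tool that ParallelWaveGAN installs (an assumption about the script's internals, not stated in this comment), and that the package was installed from PyPI as `parallel_wavegan`. The helper name `vocoder_cli_available` is hypothetical.

```python
import shutil


def vocoder_cli_available(cmd: str = "parallel-wavegan-decode") -> bool:
    """Return True if `cmd` resolves to an executable on PATH."""
    return shutil.which(cmd) is not None


if __name__ == "__main__":
    if vocoder_cli_available():
        print("parallel-wavegan-decode found on PATH")
    else:
        # Assumed install command; adjust to however you install ParallelWaveGAN.
        print("parallel-wavegan-decode not found; install ParallelWaveGAN first "
              "(e.g. `pip install parallel_wavegan`)")
```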
OK, thanks, I will try it.
As you said, I can run it now, and the voice conversion works.
But there is another problem with the synthesis results; let's look at it together:
the vocoder resynthesizes the src_wav (p225_022.wav) as src_gen (p225_038_ref_gen.wav), and the quality is poor.
Other wavs resynthesized by the vocoder model sound just as bad as p225_038_ref_gen.wav.
I wonder whether I did one of the steps wrong.
Have you encountered the same problem?
Do you mean you feed the original mel-spectrograms into ParallelWaveGAN to generate the waveform? Have you normalized the mel-spectrograms (mean-variance normalization) before feeding them in?
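Mean-variance normalization can be sketched as below, assuming a per-bin z-score over a log-mel of shape (frames, n_mels) with mean and std of shape (n_mels,); the exact convention inside extract_logmel is not shown in this thread, and the function names here are hypothetical. Whatever the convention, the mean and std passed in should be the ones the vocoder was trained with.

```python
import numpy as np


def normalize_mel(log_mel: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Per-bin mean-variance normalization.

    log_mel has shape (frames, n_mels); mean and std have shape (n_mels,)
    and broadcast across frames.
    """
    return (log_mel - mean) / std


def denormalize_mel(norm_mel: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Inverse of normalize_mel, using the same statistics."""
    return norm_mel * std + mean


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    log_mel = rng.normal(-4.0, 2.0, size=(200, 80))  # synthetic (frames, n_mels)
    mean, std = log_mel.mean(axis=0), log_mel.std(axis=0)
    norm = normalize_mel(log_mel, mean, std)
    # With stats computed on the same data, each bin ends up ~zero-mean, unit-std.
    print(np.allclose(norm.mean(axis=0), 0.0), np.allclose(norm.std(axis=0), 1.0))
    print(np.allclose(denormalize_mel(norm, mean, std), log_mel))
```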
Yes.
I just ran your convert.py to get all the results.
I can see the normalization step in the extract_logmel method.
Didn't you encounter the same problem when running convert.py?
I haven't encountered this issue before. Would you mind uploading some audio samples so I can listen?
Of course.
demo.zip
I think this issue is strange.
Can you upload some audio generated with the public convert.py and the pre-trained model?
I would love to know how our results differ.
Hi,
Thank you for sharing this work.
I tried your convert_example.py with your pretrained model, and the example audio in test_wavs gets converted. But when I pass a different wav file, I get the following error saying the buffer has the wrong number of dimensions. Please help.
PS C:\Users\Saurav\Desktop\cap\VQMIVC> python convert_example.py -s test_wavs/aayush.wav -r test_wavs/didi.wav -c converted -m checkpoints/useCSMITrue_useCPMITrue_usePSMITrue_useAmpTrue/VQMIVC-model.ckpt-500.pt
Traceback (most recent call last):
  File "convert_example.py", line 121, in <module>
    convert(args)
  File "convert_example.py", line 92, in convert
    ref_mel, _ = extract_logmel(ref_wav_path, mean, std)
  File "convert_example.py", line 49, in extract_logmel
    f0, timeaxis = pw.dio(wav.astype('float64'), fs, frame_period=frame_period)
  File "pyworld/pyworld.pyx", line 93, in pyworld.pyworld.dio
ValueError: Buffer has wrong number of dimensions (expected 1, got 2)
It seems your 'wav' has two channels. 'pw.dio' can only process single-channel data, so passing a single channel of the audio to 'pw.dio' should fix your issue.
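The suggested fix can be sketched as follows, assuming the wav is loaded with soundfile's `sf.read`, which returns a (frames, channels) array for multi-channel files and a 1-D array for mono files. Keeping one channel matches the suggestion above; averaging the channels is an alternative added here as an option. `to_mono` is a hypothetical helper, not part of convert_example.py.

```python
from typing import Optional

import numpy as np


def to_mono(wav: np.ndarray, channel: Optional[int] = None) -> np.ndarray:
    """Return a 1-D, C-contiguous float64 signal suitable for pw.dio.

    wav is assumed to be (samples,) or (samples, channels), as returned by
    soundfile.read. If channel is None the channels are averaged (downmix);
    otherwise only that channel is kept.
    """
    if wav.ndim == 2:
        wav = wav.mean(axis=1) if channel is None else wav[:, channel]
    elif wav.ndim != 1:
        raise ValueError(f"expected 1-D or 2-D audio, got shape {wav.shape}")
    return np.ascontiguousarray(wav, dtype=np.float64)


if __name__ == "__main__":
    # Synthetic 2-channel signal standing in for a stereo wav file.
    stereo = np.stack([np.full(4, 0.2), np.full(4, 0.6)], axis=1)  # shape (4, 2)
    print(stereo.shape, to_mono(stereo).shape)  # (4, 2) (4,)
    # Hypothetical placement inside extract_logmel, before the pyworld call:
    #   wav = to_mono(wav)
    #   f0, timeaxis = pw.dio(wav.astype('float64'), fs, frame_period=frame_period)
```

Averaging keeps energy from both channels, while `channel=0` reproduces the "use a single channel" suggestion exactly; for typical speech recordings either should give 'pw.dio' the 1-D input it expects.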