Using with Tacotron2 (lpcnet issue, 80 comments, open)

xiph commented on July 24, 2024

Using with Tacotron2

Comments (80)

dalvlv commented on July 24, 2024

@carlfm01 Hi, I have solved my problem; it was caused by a data type conversion. Thank you all the same for your kind help.

byuns9334 commented on July 24, 2024

@carlfm01 thank you so much 🙏. Wish you all the best. I will message you again when I have other questions.

carlfm01 commented on July 24, 2024

And share your results! 👍

byuns9334 commented on July 24, 2024

@carlfm01 thank you so much for your help again!

byuns9334 commented on July 24, 2024

@carlfm01 everything worked fine. Thank you so much.
Here is the attention plot:
[attention plot: step-30000-align]

and here are the samples from our tacotron2 + lpcnet:

1011.zip

Hope you have a great day!

m-toman commented on July 24, 2024

There are at least two reasons. For one, there are usually sampling processes involved in the models. And probably the bigger issue is that Tacotron uses dropout at inference time. In the paper this is touted as an advantage, to create more variation, but in fact it seems to not work really well without it.
You can find some workarounds floating around, but they also come with their own disadvantages.

Edit: I don't know if your implementation uses one of those workarounds.
But see for example here:
https://github.com/Rayhane-mamah/Tacotron-2/blob/ab5cb08a931fc842d3892ebeb27c8b8734ddd4b8/tacotron/models/modules.py#L247

superhg2012 commented on July 24, 2024

I am using Tacotron2 to predict 20-dim features for LPCNet, but there is noise in the synthesized audio.

lyz04551 commented on July 24, 2024

I am using Tacotron2 to predict 20-dim features for LPCNet, but there is noise in the synthesized audio.

Is there any way to improve the sound quality?

byuns9334 commented on July 24, 2024

@superhg2012 I have the same problem. Did you solve it?

carlfm01 commented on July 24, 2024

I've tried with the current master of tacotron2 and with LPCTron, but both failed.

With an adaptation of my fork using the correct hparams, I'm generating high-quality speech:
audios.zip

My fork (spanish branch) + MlWoo's adaptation of LPCNet; you need to change your paths and symbols, see the commit history:
https://github.com/carlfm01/Tacotron-2/tree/spanish

byuns9334 commented on July 24, 2024

@carlfm01 in your fork, could you let me know how to generate a wav from the f32 features? And is it the same speed as the original LPCNet?

carlfm01 commented on July 24, 2024

how to generate a wav from the f32 features? And is it the same speed as the original LPCNet?

The tacotron repo predicts the features, not the wav; to generate the wav from the features predicted by tacotron, you need to use the https://github.com/mlwoo/LPCNet fork.

And for me, using a sparsity of 200, it is 3x faster than real time with AVX enabled.

byuns9334 commented on July 24, 2024

@carlfm01 I tried the https://github.com/mlwoo/LPCNet fork already, but it generates wavs with too much noise, as I described in MlWoo#6. How did you solve this problem? Any suggestions, please?

carlfm01 commented on July 24, 2024

Noise using the features predicted by tacotron, or using the real features?

byuns9334 commented on July 24, 2024

@carlfm01 using the real features. I converted the real wav to raw s16, extracted f32 features with ./dump_data, resynthesized s16 with ./test_lpcnet, and converted that back to wav with ffmpeg, as explained in MlWoo's repo. It is supposed to reconstruct the original wav, but the noise is severe (it contains the original voice, though). Have you experienced this? When you used MlWoo's fork, were speed and audio quality both perfect? If yes, what did you modify in MlWoo's code? Thank you so much for the help.
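
For reference, the final ffmpeg step is just a container change. A minimal sketch of the same conversion using Python's stdlib wave module (assuming test_lpcnet wrote raw 16 kHz, mono, 16-bit signed PCM to test.s16):

import wave

# Wrap raw 16-bit signed PCM in a wav header; equivalent to
# ffmpeg -f s16le -ar 16k -ac 1 -i test.s16 test-out.wav
with open('test.s16', 'rb') as f:
    pcm = f.read()

with wave.open('test-out.wav', 'wb') as w:
    w.setnchannels(1)      # mono
    w.setsampwidth(2)      # 16-bit samples
    w.setframerate(16000)  # 16 kHz
    w.writeframes(pcm)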

carlfm01 commented on July 24, 2024

were speed and audio quality both perfect

Yes.

What did you modify in MlWoo's code?

Nothing.

My only guess is that maybe you made a mistake compiling your exported weights?

#58 (comment)

carlfm01 commented on July 24, 2024

Using MlWoo's fork:
feature.zip

byuns9334 commented on July 24, 2024

@carlfm01 Thanks. Let me explain what I did so far in detail.

So now I have two repositories: LPCNet (the original LPCNet repo) and LPCNet_MlWoo.

I trained LPCNet and got the nnet_data_* files in the LPCNet/src directory, and I moved all of them to LPCNet_MlWoo/src, because when I tried './dump_lpcnet.py lpcnet15_384_10_G16_64.h5' (in the LPCNet_MlWoo repo), it didn't work (because of some weird model shape error). (The lpcnet15_384_10_G16_64.h5 model was generated in the original LPCNet repo.)

Then I just ran 'make dump_data taco=1' and 'make test_lpcnet taco=1'.

Do you think this makes sense? (I didn't change any parameters of LPCNet or LPCNet_MlWoo.)

carlfm01 commented on July 24, 2024

model was generated in original LPCNet repo

That's the issue; I'm afraid you need to retrain using the MlWoo fork. I did not train with LPCNet (this repo).

byuns9334 commented on July 24, 2024

@carlfm01 but is there any difference between MlWoo's LPCNet training code and the original LPCNet training code? Aren't they exactly the same?

byuns9334 commented on July 24, 2024

@carlfm01 so you did everything (training LPCNet, running inference on the audio, etc.) in MlWoo's repo, right? Which hyperparameters/options did you change?

carlfm01 commented on July 24, 2024

but is there any difference between MlWoo's LPCNet training code and the original LPCNet training code? Aren't they exactly the same?

No, otherwise you would be able to load models in both. I also tried, and it threw an error about a missing layer or an extra layer, I can't recall. The inference code is also different.

so you did everything (training LPCNet, running inference on the audio, etc.) in MlWoo's repo, right?

Yes, default.

The only thing that I changed was the training code, to load checkpoints and adapt on new data.

This is missing in LPCNet_MlWoo:

https://github.com/mozilla/LPCNet/blob/master/src/train_lpcnet.py#L106-L125
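
In practice the warm start amounts to loading the old weights before training again. A minimal sketch of the idea, assuming the model-building and compile settings follow the repo's src/train_lpcnet.py (the checkpoint filename here is just an example):

import lpcnet

# Build the training model as train_lpcnet.py does, then load an
# existing checkpoint instead of starting from random weights.
model, _, _ = lpcnet.new_lpcnet_model(training=True)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
model.load_weights('lpcnet15_384_10_G16_64.h5')  # warm start for adaptation
# ...then call model.fit() on the new speaker's data as usual.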

byuns9334 commented on July 24, 2024

@carlfm01 okay, thank you so much. I will try. And you are suggesting that when merging Tacotron2 + LPCNet, I'd better use your spanish fork for tacotron2, right?

carlfm01 commented on July 24, 2024

@carlfm01 okay, thank you so much. I will try. And you are suggesting that when merging Tacotron2 + LPCNet, I'd better use your spanish fork for tacotron2, right?

Yes, just change your paths and symbols; see the commit history to understand better. I've tried LPCTron and the tacotron master, but both failed, generating noisy speech.

byuns9334 commented on July 24, 2024

@carlfm01 Hi, I followed all your instructions (re-trained from MlWoo's repo) and have now trained 6 epochs as a test. The original wav is about 3 seconds long, but the generated audio is about 8 seconds long. Have you experienced this problem?

carlfm01 commented on July 24, 2024

Hello, no, I'm getting the same duration. Is it from real features?

byuns9334 commented on July 24, 2024

@carlfm01 yes, real features. Also, I ran './test_lpcnet ~.h5' fine. This issue is strange... I'll look into it more. Thanks!

byuns9334 commented on July 24, 2024

@carlfm01 Are the sample rate, precision, and sample encoding of your training wav files 16000 Hz, 16-bit, signed integer PCM?

carlfm01 commented on July 24, 2024

Please make sure to use make test_lpcnet taco=1 if you extracted the features with taco enabled on ./dump_data, or disable taco for both.

Yes: 16000 Hz, 16-bit, mono.
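
A quick way to verify those properties before stripping the headers (a minimal sketch using Python's stdlib; 'train.wav' is a placeholder filename):

import wave

# Sanity-check a training wav: 16 kHz, 16-bit signed PCM, mono.
with wave.open('train.wav', 'rb') as w:
    assert w.getframerate() == 16000, 'expected 16 kHz'
    assert w.getsampwidth() == 2, 'expected 16-bit samples'
    assert w.getnchannels() == 1, 'expected mono'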

byuns9334 commented on July 24, 2024

@carlfm01 I just ran both 'make dump_data taco=1' and 'make test_lpcnet taco=1', so they are both up-to-date.

carlfm01 commented on July 24, 2024

What about quality? Do you get the same result cleaning and testing without taco? Please also make sure you do make clean.

byuns9334 commented on July 24, 2024

@carlfm01 If I want to do them without taco, should I do 'make dump_data' and 'make test_lpcnet' instead of 'make dump_data taco=1' and 'make test_lpcnet taco=1'?

And yes, I think I did make clean.

carlfm01 commented on July 24, 2024

If I want to do them without taco, should I do 'make dump_data' and 'make test_lpcnet' instead of 'make dump_data taco=1' and 'make test_lpcnet taco=1'?

Yes.

byuns9334 commented on July 24, 2024

@carlfm01 It works now. Incredible. The problem was that I didn't do make clean at the very first step. The generated audio samples are extremely clean and the inference speed is much faster than real time. I will upload test results here in a few minutes. The only suspicious thing is that this works perfectly even with just 6 epochs of training... Thank you so much.

byuns9334 commented on July 24, 2024

@carlfm01 Here are the result samples. If there is something strange, please let me know!
samples.zip

byuns9334 commented on July 24, 2024

@carlfm01 As you told me yesterday, I have to use https://github.com/carlfm01/Tacotron-2/tree/spanish for tacotron2 training.

But when I save the f32 features into an npy file, do I really need to resize them? Why can't it just be reshape here?

mel_target = np.fromfile(os.path.join(self._mel_dir, meta[0]), dtype='float32')
mel_target = np.resize(mel_target, (-1, self._hparams.num_mels))

Why can't it just be:

mel_target = np.fromfile(os.path.join(self._mel_dir, meta[0]), dtype='float32')
mel_target = np.reshape(mel_target, (-1, self._hparams.num_mels))

And which one did you use?

carlfm01 commented on July 24, 2024

@carlfm01 Here are the result samples. If there is something strange, please let me know!
samples.zip

Sounds really good.

It was the recommendation from MlWoo's Readme

https://github.com/mlwoo/LPCNet

Since reshape and resize have different behavior, I don't know the implications of changing to reshape.

mel_target = np.fromfile(os.path.join(self._mel_dir, meta[0]), dtype='float32')
mel_target = np.resize(mel_target, (-1, self._hparams.num_mels))

And which one did you use?

resize.
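
The practical difference: reshape demands that the total size match exactly, while resize silently truncates or repeats data to fit. A minimal illustration (toy values, not the tacotron code):

import numpy as np

a = np.arange(7, dtype=np.float32)  # 7 values, not divisible by 2

# np.reshape(a, (-1, 2)) would raise ValueError: 7 is not divisible by 2.
print(np.resize(a, (3, 2)))  # [[0. 1.] [2. 3.] [4. 5.]] -- last value dropped
print(np.resize(a, (4, 2)))  # wraps around: the last row is [6. 0.]

So resize tolerates feature files whose length is not an exact multiple of num_mels, at the cost of dropping or duplicating a few values at the end.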

byuns9334 commented on July 24, 2024

@carlfm01 thank you. I have one last question...
Is it not possible to generate the 2 pitch parameters from the spectrogram purely through signal processing (not machine learning)? If it is possible, we could just train tacotron2 to output the spectrogram (the setting of the original paper), convert it to 18 BFCCs + 2 pitch parameters through signal processing, and then generate the wav. (I know the current objective is to train tacotron2 to output the f32 features for now.)

carlfm01 commented on July 24, 2024

Sorry, deriving those through signal processing is outside my knowledge.

byuns9334 commented on July 24, 2024

@carlfm01 okay, thanks!

dalvlv commented on July 24, 2024

@carlfm01 Hi, I'm back. I'm trying TTS. I use taco2 to predict 20-dim features and then convert the 20-dim features to 55-dim .f32 features with zero padding. But the audio synthesized with LPCNet is all silence. Do you know why, and how to calculate the 55-dim features from the 20-dim predicted features? Thank you.

carlfm01 commented on July 24, 2024

to 55dim .f32 features with zeros padding

Hi, why do you want to do that? If you enabled taco=1 you don't need 55d but 20d; if you are using my fork, the generated 20d features are ready to feed into LPCNet with taco=1 enabled.

carlfm01 commented on July 24, 2024

Ohhh I see @dalvlv, you are not the same person; please read the conversation about my fork above.

dalvlv commented on July 24, 2024

@carlfm01 OK, let me have a try. Could you give me a link to your forked repo?

carlfm01 commented on July 24, 2024

I've tried with the current master of tacotron2 and with LPCTron, but both failed.

With an adaptation of my fork using the correct hparams, I'm generating high-quality speech:
audios.zip

My fork (spanish branch) + MlWoo's adaptation of LPCNet; you need to change your paths and symbols, see the commit history:
https://github.com/carlfm01/Tacotron-2/tree/spanish

@dalvlv please read the whole thread

dalvlv commented on July 24, 2024

@carlfm01 I tried the MlWoo repo to synthesize audio with taco2-predicted features, but it has too much noise. I used my own trained LPCNet model. Do I need to retrain LPCNet using this repo? Or maybe my taco2 training has a problem?
test-out.zip

carlfm01 commented on July 24, 2024

You need to retrain both

carlfm01 commented on July 24, 2024

Hello @dalvlv, any news?

byuns9334 commented on July 24, 2024

Hello @carlfm01, how are you? Hope you are doing fine. I have a quick question: how many epochs did you train your fork of tacotron2 (maybe this one? https://github.com/carlfm01/Tacotron-2/tree/spanish)? LPCNet is okay now, but the sound quality of the trained Tacotron2+LPCNet is very poor. :(

carlfm01 commented on July 24, 2024

Hello @carlfm01, how are you?

Really good, thanks.

How many epochs did you train your fork of tacotron2

About 47k steps.

but the sound quality of the trained Tacotron2+LPCNet is very poor

Did you make sure to use 16 kHz, 16-bit, mono while removing the headers and extracting the features?

byuns9334 commented on July 24, 2024

@carlfm01 yes, we are using 16 kHz 16-bit mono for training. Seems strange... the num_mels of the features (for LPCNet input) is 20, right?

carlfm01 commented on July 24, 2024

the num_mels of the features (for LPCNet input) is 20, right?

Yes, all the hparams are correct. Can you share an example of the audio?

Can you test this file for inference?
https://gist.github.com/carlfm01/5d6ad719810412934d57bdbe1ce8b5b6

byuns9334 commented on July 24, 2024

1010.zip
@carlfm01 here are the samples.
Also, all hyperparameters should be exactly the same as in your code, right?
Do you mean to try this code https://gist.github.com/carlfm01/5d6ad719810412934d57bdbe1ce8b5b6 for inference?

carlfm01 commented on July 24, 2024

Do you mean to try this code https://gist.github.com/carlfm01/5d6ad719810412934d57bdbe1ce8b5b6 for inference?

Yes, for some voices it gets better.

Your audio sounds like an attention issue in tacotron 2; can you share the attention plot?

carlfm01 commented on July 24, 2024

[alignment plot: step-32500-align]

Here's my alignment plot as a reference, at 32k steps.

byuns9334 commented on July 24, 2024

@carlfm01 we are not outputting attention alignments for now, so I will let you know again when we finish re-training tacotron2.

carlfm01 commented on July 24, 2024

Nice! Sounds good. I think it can get better by trimming silence and playing around with your lr and lr decay, or scheduled mode (I want to try it soon).

Here's a new speaker adapted from the old one.
voice adapt.zip

byuns9334 commented on July 24, 2024

@carlfm01 Thanks for sharing samples! But what do you mean by 'new speaker' and 'voice adapt'? Could you please explain in more detail?

Also, have you tried multi-speaker Tacotron2 + LPCNet? (I've been trying single-speaker Tacotron2 + LPCNet so far, which eventually works fine.) Did it work well?

Maxxiey commented on July 24, 2024

@byuns9334 hi, in hparams.py (carlfm01's Tacotron2 repo) you may find this:

tacotron_fine_tuning = False

and according to its comment:

#Set to True to freeze encoder and only keep training pretrained decoder. Used for speaker adaptation with small data.

I don't have time to give it a try, but I think it could save you a lot of time, when you already have a good model, by fine-tuning a few layers of the network.

Hope it helps :D

carlfm01 commented on July 24, 2024

but what do you mean by 'new speaker' and 'voice adapt'? Could you please explain in more detail?

It means fine-tuning a trained model on a new speaker's voice.

Also, have you tried multi-speaker Tacotron2 + LPCNet?

No.

tacotron_fine_tuning = False

The code to stop the gradient is broken; if you try to fine-tune a model saved with fine_tuning=True, it will fail. It needs a review :)

dalvlv commented on July 24, 2024

@carlfm01 Hi, I was on holiday and have come back.

  • Here is a sample of audio synthesized using taco2+lpcnet. It sounds very good.

synthesis_sample.zip

  • One question: loading the model on the GPU takes a lot of time with taco2, about 6 seconds every time. Do you have any idea how to optimize that?

  • Next, I will focus on the following improvements:

  1. Taco2 synthesis speed. For now, taco2 synthesis is slower than lpcnet, and I want to make it faster.
  2. Multi-speaker synthesis. I hope to produce different voices without using much training data.
byuns9334 commented on July 24, 2024

@Maxxiey thanks, I will give it a try and tell you about the result

@carlfm01 okay I understand. I have some questions:

  1. How much time does it take to adapt to a new voice? (time for training and inference)

  2. Is adapting to several people's voices possible? (like 2~3 people, or 100 people)

Thanks!

carlfm01 commented on July 24, 2024

Hello, sorry for the delay.

Sounds very good.
synthesis_sample.zip

Yeah, sounds good.

Do you have any idea how to optimize that?

Write your own code to load the model once and reuse the session?
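
That is, pay the ~6-second load once at startup and keep the session alive between requests. A rough TF1-style sketch of the idea (the tensor names here are assumptions, not the actual Tacotron-2 graph names):

import tensorflow as tf

class Synthesizer:
    def __init__(self, checkpoint_path):
        # Build the graph and restore weights exactly once, at startup.
        self.graph = tf.Graph()
        with self.graph.as_default():
            saver = tf.train.import_meta_graph(checkpoint_path + '.meta')
            self.session = tf.Session(graph=self.graph)
            saver.restore(self.session, checkpoint_path)
        self.inputs = self.graph.get_tensor_by_name('inputs:0')           # assumed name
        self.features = self.graph.get_tensor_by_name('model/outputs:0')  # assumed name

    def synthesize(self, encoded_text):
        # Per-request cost is just this session.run; no model reload.
        return self.session.run(self.features, feed_dict={self.inputs: encoded_text})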

1. Taco2 synthesis speed. For now, taco2 synthesis is slower than lpcnet, and I want to make it faster.

I think if you want to make it fast you need to use C++ and take advantage of the optimizations.

1. How much time does it take to adapt to a new voice? (time for training and inference)

Inference will be the same as for a single speaker, and adaptation takes less than about 10k steps, depending on your data.

2. Is adapting to several people's voices possible? (like 2~3 people, or 100 people)

Adapting with 2-3 speakers works fine. I don't know about adapting with more.

byuns9334 commented on July 24, 2024

@carlfm01 thanks!

Why do you think LPCNet is so fast and memory-light? I've been thinking about this and would like to know your opinion.
Thanks!

carlfm01 commented on July 24, 2024

What do you think of the reason LPCNet is so fast and memory-light?

Read section 3.5, Sparse Matrices: https://jmvalin.ca/papers/lpcnet_icassp2019.pdf
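
In short, the big recurrent weight matrices are pruned in 16x1 blocks, so most of the weight storage and multiply-accumulate work disappears. A toy numpy illustration of the block-sparse idea (the sizes and the 10% density are illustrative, and the paper keeps the highest-magnitude blocks rather than random ones):

import numpy as np

rng = np.random.default_rng(0)
N, B = 384, 16                                  # GRU size, block height
W = rng.standard_normal((N, N)).astype(np.float32)

keep = rng.random((N // B, N)) < 0.1            # keep ~10% of the 16x1 blocks
W_sparse = W * np.repeat(keep, B, axis=0)       # pruned weight matrix
x = rng.standard_normal(N).astype(np.float32)

# A block-sparse multiply only visits the surviving blocks...
y = np.zeros(N, dtype=np.float32)
for i, j in zip(*np.nonzero(keep)):
    y[i*B:(i+1)*B] += W_sparse[i*B:(i+1)*B, j] * x[j]

# ...but matches the dense product over the pruned matrix.
assert np.allclose(y, W_sparse @ x, atol=1e-4)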

Maxxiey commented on July 24, 2024

I found something strange and interesting: for the same sentence, the generated wav files sound slightly different. You may not understand Mandarin Chinese, but the durations of some syllables vary quite a lot. Attached are some samples my model generated.

My understanding is that during the inference phase of Tacotron-2 and LPCNet there shouldn't be any random elements involved. So how does one explain the differences between these wav files?

samples.zip

Has anyone else seen the same situation? Does it mean the model I trained is unstable? Oh, tacotron has now run 200k steps (I use this repo: https://github.com/carlfm01/Tacotron-2/tree/spanish) and LPCNet just 10 epochs (https://github.com/MlWoo/LPCNet).

snakers4 commented on July 24, 2024

Hi guys,

I tried to read the majority of the above posts, and on the surface it seems it's just a matter of using dump_data with Tacotron, with or without re-training the vocoder.

But do you know if there exists a fully Python version of dump_data?
Our dataset is very big, and it usually works much better when you sample data on the fly.
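
I'm not aware of a full Python port, but one stopgap for on-the-fly sampling is to shell out to the compiled binary per utterance and load the result. A sketch (the flags follow the usage shown earlier in this thread, and nb_features=55 matches the default build rather than the 20-dim taco=1 build):

import subprocess
import tempfile
import numpy as np

def extract_features(s16_path, nb_features=55):
    # Run dump_data on one headerless s16 file and load the f32 features.
    with tempfile.NamedTemporaryFile(suffix='.f32') as feat, \
         tempfile.NamedTemporaryFile(suffix='.u8') as pcm:
        subprocess.run(['./dump_data', '-train', s16_path, feat.name, pcm.name],
                       check=True)
        return np.fromfile(feat.name, dtype='float32').reshape(-1, nb_features)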

byuns9334 commented on July 24, 2024

@carlfm01 Hi, how are you? :)

Have you tried training LPCNet on data containing audio from multiple speakers, not a single speaker? I've tried this for 120 epochs and the output quality is very bad.

Also, when merging with tacotron2, do you have any idea how to train LPCNet on several feature files (f32), not just one f32 file? We want to train LPCNet on features generated from tacotron2, and the f32 of a concatenated wav file is not just the concatenation of the f32 of each wav, so we are trying to figure out how to train on multiple f32 files.

byuns9334 commented on July 24, 2024

@carlfm01 when I tried to train LPCNet on much more data than before (like 20 GB), it fails with a "can't reshape into (x, 55)" error.

The reason is that in src/train_lpcnet.py, when the data is small enough, the f32 features are truncated by features = features[:nb_frames*feature_chunk_size*nb_features], where nb_frames = len(data)//(4*pcm_chunk_size), so it is possible to reshape the features into the (nb_frames*feature_chunk_size, nb_features) shape, as written around line 89 of src/train_lpcnet.py.

However, when the data is extremely long, nb_frames*feature_chunk_size*nb_features is much larger than len(features), so the features don't get truncated at all, and the reshape of the f32 file always fails.

I think you must have experienced this problem when you trained LPCNet on a very large audio set. Could you please share how you solved it?

I guess you are very busy these days, so it's okay not to reply if you don't have time.

Thanks!
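
One way to sidestep the mismatch is to derive nb_frames from whichever file is shorter, so both slices stay consistent. A hypothetical guard (the names follow train_lpcnet.py, but this is a sketch, not the upstream fix):

import numpy as np

nb_features = 55
feature_chunk_size = 15
pcm_chunk_size = 160 * feature_chunk_size

data = np.memmap('data.u8', dtype='uint8', mode='r')
features = np.memmap('features.f32', dtype='float32', mode='r')

# Truncate to whatever both files can actually supply.
nb_frames = min(len(data) // (4 * pcm_chunk_size),
                len(features) // (feature_chunk_size * nb_features))

features = features[:nb_frames * feature_chunk_size * nb_features]
features = np.reshape(features, (nb_frames * feature_chunk_size, nb_features))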

byuns9334 commented on July 24, 2024

@zshakeri I used Ubuntu 16.04.

Maxxiey commented on July 24, 2024

@byuns9334 Hi, I noticed that you have successfully trained a female voice. Did you change any parameters in either Tacotron or LPCNet? I kept them unchanged but get a voice with poor quality; you can check the samples I uploaded. Any suggestions will be appreciated, and thanks in advance.
samples.zip

ysujiang commented on July 24, 2024

I'd be very thankful if anyone can help me. Looking forward to your reply.
I tried https://github.com/carlfm01/Tacotron-2/tree/spanish (100k steps) + https://github.com/MlWoo/LPCNet (10 epochs). No parameters changed, but the generated samples do not sound good. Here are my samples and training steps.

For LPCNet:

  1. used concat.sh to obtain input.s16
  2. make clean
     make dump_data
     ./dump_data -train input.s16 features.f32 data.u8
  3. python train_lpcnet.py features.f32 data.u8

For tacotron2:

  1. make clean && make dump_data taco=1
  2. ./header_removal.sh
     ./feature_extract.sh
  3. convert the generated .f32 data to something that can be loaded with numpy, and replace the .npy files in the audio folder
  4. python train.py

For synthesis:
for tacotron 2, get the .f32 file: f32_for_lptnet.f32
for LPCNet:
change test_lpcnet.py: model.load_weights('model_loss-2.847_120_.hdf5')
make clean && make test_lpcnet taco=1
./test_lpcnet f32_for_lptnet.f32 test.s16
ffmpeg -f s16le -ar 16k -ac 1 -i test.s16 test-out.wav

sample.zip

Is there anything wrong with my process? Why do the samples not sound good, and how can I adjust my process?

cyxomo commented on July 24, 2024

The core of the issue is that LPCNet uses a Bark-scale spectrogram, whereas Tacotron generates a mel spectrogram. So how do you get a Bark spectrogram from a mel spectrogram? Or how do you calculate the LPCs from mel features?

Maxxiey commented on July 24, 2024

@ysujiang Your steps seem alright. I wonder if the waves in your training set all have the same volume; care to share a few?

ysujiang commented on July 24, 2024

@Maxxiey In my training set the waves have the same volume, but my synthesized samples do not. The synthesized samples also have a trill. Do you know why? And can you give me some advice on adjusting the parameters? If possible, I'll thank you very much. Looking forward to your reply.
train_sample.zip
synthesized_samples.zip

Maxxiey commented on July 24, 2024

Hi @ysujiang,

I ran some tests on your data and it seems fine to me.

Here is what I did: use header_removal.sh and feature_extract.sh to generate features, and use test_lpcnet to turn them back into wavs. During the whole process no warning message appeared, so I guess everything works fine. The generated samples are attached; the trills are still there because my model is trained on my own dataset, but all the wavs have almost the same volume.
debug_sample.zip

As for the training parameters, I left them untouched and the result came out just fine, so sorry, no advice from me. Maybe you should check your .s16 files to see if they have the same volume as the original wavs.

BTW, if it is only the difference in volume that troubles you, pydub should do the trick: https://github.com/jiaaro/pydub/blob/master/API.markdown#audiosegmentmax_dbfs

ysujiang commented on July 24, 2024

@Maxxiey Hello, I can hear that when you test LPCNet with data from the training set there is also a trill. Is this a problem with LPCNet itself?

ysujiang commented on July 24, 2024

@Maxxiey Thanks for your help.
I did the same thing as you, using the voice of people different from the training set:
1. make clean && make dump_data taco=1
2. ./header_removal.sh
   ./feature_extract.sh
I got some feature files (*.f32),
then used test_lpcnet to turn the feature files back into wavs:
make clean && make test_lpcnet taco=1
./test_lpcnet f32_for_lptnet.f32 test.s16
ffmpeg -f s16le -ar 16k -ac 1 -i test.s16 test-out.wav
It doesn't work well; the tremolo is very obvious. I used the same method to test voices from the training set, and it works fine.
Do you know why? Have you changed anything in LPCNet? What should I change?
train_data.zip
test_result.zip

wizardk commented on July 24, 2024

@ysujiang You need to use not only the features extracted directly from wavs but also the features output from tacotron2.

LqNoob commented on July 24, 2024

@ysujiang You need to use not only the features extracted directly from wavs but also the features output from tacotron2.

@wizardk What were the results before and after you adopted this method?
