Comments (11)
No, everything you did seems ok to me. Could you upload code to https://gist.github.com/ or whatever so that I can reproduce?
from nnmnkwii.
This is the gist link: https://gist.github.com/attitudechunfeng/58a052f18f6aa24235000cc50618e887. There are two files: nn.py is the training phase of the DNN-based model and nn_synth.py is the test phase.
Thank you. I can reproduce and here is the fix:
https://gist.github.com/attitudechunfeng/58a052f18f6aa24235000cc50618e887#file-nn_synth-py-L181
replace
wavfile.write('1.wav', rate=fs, data=waveform)
with
wavfile.write('1.wav', rate=fs, data=waveform.astype(np.int16))
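Some background on why the cast matters: scipy.io.wavfile.write chooses the WAV sample format from the array's dtype, so a float64 array is written as a 64-bit float file instead of 16-bit PCM. The sketch below makes the cast explicit. The clipping step and the helper name to_int16 are additions for illustration, not part of the fix above, and it assumes the waveform is already scaled to the 16-bit range.

```python
import numpy as np


def to_int16(waveform):
    """Clip a waveform to the int16 range and cast it to np.int16."""
    info = np.iinfo(np.int16)
    # Clipping avoids integer wrap-around for samples outside [-32768, 32767].
    clipped = np.clip(np.asarray(waveform), info.min, info.max)
    return clipped.astype(np.int16)


# Hypothetical float64 samples, already scaled to the 16-bit range.
waveform = np.array([0.0, 1234.5, 40000.0, -40000.0])
pcm = to_int16(waveform)
```

The result can then be passed on as before, e.g. wavfile.write('1.wav', rate=fs, data=to_int16(waveform)).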
By the way, I noticed you save models with torch.save(model, "model.pkl"), but I think the recommended way to save a model in PyTorch is torch.save(model.state_dict(), "model.pth"). See https://discuss.pytorch.org/t/how-to-save-load-torch-models/718 for details.
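A minimal sketch of the state_dict round trip, for reference. The nn.Linear model is a hypothetical stand-in for the acoustic model in nn.py, and the temporary path is only for illustration.

```python
import os
import tempfile

import torch
from torch import nn

# Hypothetical stand-in for the acoustic model defined in nn.py.
model = nn.Linear(4, 2)

# Save only the parameters, not the pickled model object.
path = os.path.join(tempfile.mkdtemp(), "model.pth")
torch.save(model.state_dict(), path)

# To load, rebuild the same architecture first, then restore the parameters.
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load(path))
restored.eval()  # switch to inference mode before synthesis
```

Saving the state_dict keeps the checkpoint independent of the file and class layout that defined the model, at the cost of having to construct the model in code before loading.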
It works! I will adopt your advice about saving the model. Many thanks!
You are welcome. Feel free to open new issues if you have any other problems.
Another question: if I want to synthesize arbitrary text, how do I generate the lab file? I compared the HTS full-context lab file with the one in the example. They are almost the same, except that each line in the example has a '[2-6]' at the end. How can I unify them?
Generating full-context labels (which requires a language-dependent text processor) is out of scope of the library. For this purpose, you can follow Merlin's guide to generate labels: https://github.com/CSTR-Edinburgh/merlin/tree/master/egs/build_your_own_voice/s1#prepare-labels.
See also CSTR-Edinburgh/merlin#28.
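On the '[2-6]' suffix itself: a rough sketch, assuming it is the HMM state index that HTS-style state-level alignments append to each full-context label (this is not confirmed in the thread). The helper names and the sample line are made up for illustration. Stripping the suffix makes the two kinds of lab files directly comparable.

```python
import re

# Matches a trailing state index such as "[2]" at the end of a label.
_STATE_SUFFIX = re.compile(r"\[(\d+)\]$")


def split_state_suffix(context):
    """Return (context_without_suffix, state_index or None)."""
    match = _STATE_SUFFIX.search(context)
    if match is None:
        return context, None
    return context[:match.start()], int(match.group(1))


def parse_label_line(line):
    """Parse a 'start end context' line and split off any state suffix."""
    start, end, context = line.split(None, 2)
    phone_context, state = split_state_suffix(context.strip())
    return int(start), int(end), phone_context, state


# Hypothetical state-level line; real labels carry a much longer context.
line = "0 50000 x^x-sil+hh=iy[2]"
parsed = parse_label_line(line)
```

Note that this only normalizes the text of the labels. Producing the state-level timings in the first place still requires forced alignment, as described in the Merlin guide.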
I ran into another problem when I tried to adapt the RNN and DNN models to my own dataset. First, I generated full-context lab files using the hts-engine front-end tools, then normalized them to phone and state alignment files, and finally used the prepare_feature script to extract features. But errors occurred while training the acoustic model. The DNN part fails as follows:
Traceback (most recent call last):
  File "nn.py", line 247, in <module>
    X_min[ty], X_max[ty], Y_mean[ty], Y_scale[ty], utt_lengths[ty])
  File "nn.py", line 214, in train
    for x, y in dataset_loaders[phase]:
  File "/disk2/wangcf/TTS/siri/virsiri/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 201, in __next__
    return self._process_next_batch(batch)
  File "/disk2/wangcf/TTS/siri/virsiri/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 221, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
IndexError: Traceback (most recent call last):
  File "/disk2/wangcf/TTS/siri/virsiri/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 40, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/disk2/wangcf/TTS/siri/virsiri/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 40, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "nn.py", line 127, in __getitem__
    x, y = self.X[idx], self.Y[idx]
  File "/disk2/wangcf/TTS/nnmnkwii/nnmnkwii/datasets/__init__.py", line 371, in __getitem__
    return self._getitem_one_sample(frame_idx)
  File "/disk2/wangcf/TTS/nnmnkwii/nnmnkwii/datasets/__init__.py", line 362, in _getitem_one_sample
    return frames[frame_idx_in_focused_utterance]
IndexError: index 944 is out of bounds for axis 0 with size 880
The RNN part reports a padding size problem, although the code is supposed to pad to the maximum utterance length:
Traceback (most recent call last):
  File "rnn.py", line 118, in <module>
    Y_mean[typ], Y_var[typ] = meanvar(Y[typ]["train"], utt_lengths[typ]["train"])
  File "/disk2/wangcf/TTS/nnmnkwii/nnmnkwii/preprocessing/generic.py", line 328, in meanvar
    for idx, x in enumerate(dataset):
  File "/disk2/wangcf/TTS/nnmnkwii/nnmnkwii/datasets/__init__.py", line 245, in __getitem__
    return self._getitem_one_sample(idx)
  File "/disk2/wangcf/TTS/nnmnkwii/nnmnkwii/datasets/__init__.py", line 234, in _getitem_one_sample
    len(x), self.padded_length))
RuntimeError: Num frames 1576 exceeded: 1546. Try larger value for padded_length.
Any ideas about that?
IndexError: index 944 is out of bounds for axis 0 with size 880
Did you make sure that the indexing is valid?
RuntimeError: Num frames 1576 exceeded: 1546. Try larger value for padded_length.
Did you try setting a larger padded_length, e.g. 2000?
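Both errors can be checked before training with a couple of small helpers. This is only a sketch: the function names and the sample data are hypothetical, and the idea that the IndexError comes from utterance lengths that disagree with the actual frame counts is an assumption, not something confirmed in this thread.

```python
def find_length_mismatches(declared_lengths, utterances):
    """Return (index, declared, actual) for utterances whose lengths disagree."""
    return [
        (i, declared, len(frames))
        for i, (declared, frames) in enumerate(zip(declared_lengths, utterances))
        if declared != len(frames)
    ]


def suggest_padded_length(frame_counts, margin=0):
    """Smallest padded_length that fits every utterance, plus an optional margin."""
    return max(frame_counts) + margin


# Hypothetical data: three utterances, the first with a wrong declared length.
utterances = [[0.0] * 880, [0.0] * 1546, [0.0] * 1576]
declared = [950, 1546, 1576]

mismatches = find_length_mismatches(declared, utterances)
padded_length = suggest_padded_length([len(u) for u in utterances])
```

Computing the maximum over both the training and test utterances avoids having to guess a value such as 2000, though a fixed generous value works as well.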
I've solved this problem by increasing padded_length. However, the audio synthesized by the model trained on my own data is of poor quality. I didn't change the training parameters. Any advice on improving the quality? PS: my dataset is 16 kHz.
Hard to say much without seeing code and data, but my guesses are:
- You have a small amount of data.
- Alignments are failing.
- The linguistic features you use are not suited to your data. If you are using https://github.com/CSTR-Edinburgh/merlin/blob/master/misc/questions/questions-radio_dnn_416.hed for linguistic feature extraction, it may not work well for non-radio sentences.
Also, the training parameters should be tuned for your data.