ntt123 / light-speed
A modified VITS that utilizes phoneme duration's ground truth for better robustness
License: MIT License
Could you please share your contact information?
Hi,
Thanks for your great work!
I'm curious to understand your thought process as a learner. May I ask why you decided to make modifications to the original VITS code?
You mentioned 'robust,' but I'm not quite clear on its exact meaning. Does it refer to the model's performance in different aspects, such as WER (Word Error Rate) or talking speed?
When you talk about 'speech quality,' are you referring to the sound quality of the generated speech? Is it similar to audio quality metrics like PESQ?
Regarding the 'expanding the receptive field of the Wavenet Flow module' modification, how did you analyze the need for this change, and in what ways does it enhance the quality of synthesized speech?
I noticed that the original VITS was trained using PyTorch, but you chose to rewrite some code in TensorFlow. What motivated this decision? Are there specific advantages or requirements that led to this change in the tech stack?
The engine skips text quite often, sometimes skipping a sentence, sometimes skipping half a paragraph and then reading the next paragraph. The male voice is very natural; if this error can be fixed, it will be almost perfect.
Thanks to the author!
Hello!
I am using your public dataset https://huggingface.co/datasets/ntt123/viet-tts-dataset for training
and got this error:
2024-04-05 14:36:50: return forward_call(*args, **kwargs)
2024-04-05 14:36:50: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-04-05 14:36:50: File "/data/light-speed/models.py", line 425, in forward
2024-04-05 14:36:50: z_slice, ids_slice = commons.rand_slice_segments(
2024-04-05 14:36:51: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-04-05 14:36:51: File "/data/light-speed/commons.py", line 64, in rand_slice_segments
2024-04-05 14:36:51: ret = slice_segments(x, ids_str, segment_size)
2024-04-05 14:36:51: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-04-05 14:36:51: File "/data/light-speed/commons.py", line 54, in slice_segments
2024-04-05 14:36:51: ret[i] = x[i, :, idx_str:idx_end]
2024-04-05 14:36:51: ~~~^^^
2024-04-05 14:36:51: RuntimeError: The expanded size of the tensor (32) must match the existing size (0) at non-singleton dimension 1. Target sizes: [192, 32]. Tensor sizes: [192, 0]
Have you encountered this error before? Is there a known solution? Is the issue related to how the dataset is named?
The directory structure of my data is as follows:
Hoping for your reply!
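A note on the RuntimeError above: the message says a slice of 32 frames was requested but 0 frames were available. One plausible cause in VITS-style training code (an assumption, not confirmed for this repo or this dataset) is a clip that is shorter than one training segment. The sketch below flags such clips before training. The hop length of 256 samples and the clip names and lengths are hypothetical; the segment length of 32 frames is taken from the error message.

```python
def num_frames(num_samples: int, hop_length: int = 256) -> int:
    """Approximate number of spectrogram frames for a clip.

    hop_length=256 is an assumed value; check the config actually used.
    """
    return num_samples // hop_length


def find_short_clips(clip_samples, segment_frames=32, hop_length=256):
    """Return the names of clips with fewer frames than one training segment."""
    return [
        name
        for name, n in clip_samples.items()
        if num_frames(n, hop_length) < segment_frames
    ]


# Hypothetical clip lengths in samples, for illustration only.
clips = {"a.wav": 40000, "b.wav": 5000, "c.wav": 8192}
print(find_short_clips(clips))  # flags only "b.wav" (19 frames < 32)
```

If any clips are flagged, removing or padding them before training would be one thing to try.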
First, thanks for this great repo.
I have a question.
Are you using this viet-tts-dataset? If so, could you share the preprocessing code you run on it before training the model?
Can you share your model weights?
Thank you
Hi,
This is the greatest TTS project for Vietnamese I have found so far. Thanks for your work.
I have successfully trained this model at 44.1 kHz by modifying sampling_rate in config.json (all other settings unchanged). However, the quality of the synthesized speech is worse than that of the 16 kHz version: it contains a lot of hissing (tiếng rè). Do I need to modify anything else to get better quality at 44.1 kHz, or is there a way to upsample from 16 kHz to 44.1 kHz after inference?
Any help would be appreciated!!
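On the second part of the question (upsampling 16 kHz output to 44.1 kHz after inference), here is a minimal sketch using plain linear interpolation. It is not part of this repo, the function name is hypothetical, and linear interpolation is a deliberately simple, low-quality method; a polyphase resampler from a signal-processing library would likely sound better. Note also that upsampling only changes the sample rate: it cannot restore frequency content above 8 kHz that a 16 kHz signal never contained.

```python
from fractions import Fraction


def resample_linear(samples, src_rate, dst_rate):
    """Resample a mono signal (list of floats) by linear interpolation."""
    if not samples:
        return []
    ratio = Fraction(dst_rate, src_rate)  # 44100/16000 reduces to 441/160
    out_len = int(len(samples) * ratio)
    last = len(samples) - 1
    out = []
    for n in range(out_len):
        pos = n / ratio          # exact position in the source signal
        i = int(pos)             # index of the sample at or before pos
        frac = float(pos - i)    # fractional distance to the next sample
        nxt = samples[min(i + 1, last)]
        out.append(samples[i] * (1.0 - frac) + nxt * frac)
    return out


# 160 input samples at 16 kHz become 441 output samples at 44.1 kHz.
ramp = [float(k) for k in range(160)]
upsampled = resample_linear(ramp, 16000, 44100)
print(len(upsampled))
```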
For testing purposes, I extracted only 200 files (100 pairs) from the VietBibleVox zip data. I then ran the prepare_vbx_tfdata.ipynb notebook, which resulted in the following:
Afterwards, I attempted to run "python3 train.py", but the process repeatedly prints "0it [00:00, ?it/s]" to the screen. I waited for approximately 1 hour before interrupting the process. I believe this is an excessively long time for such a small dataset.
Since the tfrecords files should not be empty, according to the discussion here: #2 (comment), I suspect that something went wrong during the preparation process, but I am unable to identify the specific issue.
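A quick way to test the suspicion above is to list any output files that are zero bytes long. This is a minimal sketch using only the standard library; the directory name and the `*.tfrecord*` glob pattern are assumptions, since the actual output location and file naming of the notebook are not stated here. A non-zero size does not prove the records are valid, but a zero-byte file would confirm that preparation failed.

```python
from pathlib import Path


def find_empty_files(data_dir, pattern="*.tfrecord*"):
    """Return sorted paths of files under data_dir matching pattern with size 0."""
    return sorted(
        p
        for p in Path(data_dir).glob(pattern)
        if p.is_file() and p.stat().st_size == 0
    )


if __name__ == "__main__":
    # "tfdata" is a hypothetical directory name; point this at the real output.
    for path in find_empty_files("tfdata"):
        print(f"empty: {path}")
```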
My equipment: