Comments (8)
Do a git pull... it should be there!
It will also be confirmed at the prompt when you train:
Ah, sorry, yes I did misunderstand your question. So you are asking about the ratio of the % split between evaluation and training data. It's currently set at 15% for evaluation, and the remaining 85% is therefore set as training data. I can push a setting into the interface to allow you to adjust that, if I'm now understanding your question correctly?
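To illustrate, here's a minimal sketch of what an adjustable split could look like; the `split_dataset` helper and its `eval_percentage` parameter are hypothetical and not part of alltalk_tts:

```python
import random

def split_dataset(clips, eval_percentage=15, seed=42):
    """Split Whisper-segmented clips into training and evaluation sets.

    eval_percentage is the share reserved for evaluation; the current
    behaviour described above corresponds to eval_percentage=15.
    """
    rng = random.Random(seed)
    shuffled = clips[:]
    rng.shuffle(shuffled)
    n_eval = max(1, round(len(shuffled) * eval_percentage / 100))
    return shuffled[n_eval:], shuffled[:n_eval]  # (train, eval)

# Example: an 85/15 split of 40 clips gives 34 training and 6 eval clips.
train, evaluation = split_dataset([f"clip_{i:03}.wav" for i in range(40)])
print(len(train), len(evaluation))
```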
You want me to introduce... correct?
All the actual samples generated in Step 1 (Whisper splitting the original sample) are passed into the training and used in Step 2 (the actual training). It's just that for the voice generation at the end (Step 3), you need something longer than 6 seconds to properly generate TTS (it wants a 6+ second long sample). So what's actually occurring at Step 3 is that all voice samples shorter than 7 seconds (just to be sure) are not displayed, nor copied over alongside the model (Step 4 / what to do next), as those shorter clips would be useless to put in your "voices" folder. I hope that makes sense; even I had to read it twice and I wrote it.
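For illustration only, a rough sketch of that duration filter using just the standard library; the folder path is an assumption, and the actual alltalk_tts filtering code may differ:

```python
import wave
from pathlib import Path

MIN_SECONDS = 7  # clips shorter than this are skipped, as described above

def usable_voice_samples(folder):
    """Yield (path, duration) for wavs long enough to seed TTS generation."""
    for path in sorted(Path(folder).glob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            duration = wav.getnframes() / wav.getframerate()
        if duration >= MIN_SECONDS:
            yield path, duration

# "finetune/tmp-trn/wavs" is a guess at the layout, based on the tmp-trn
# folder mentioned later in this thread.
for path, duration in usable_voice_samples("finetune/tmp-trn/wavs"):
    print(f"{path.name}: {duration:.1f}s")
```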
On a more general point of having more/longer voice samples at the end, a few people have told me (so anecdotal) that the Whisper 2 model splits sentences better, both in how it cuts the sample wavs and in their overall length. I haven't had enough time on my hands yet to fully test this, but I have made a note in the finetuning documentation: https://github.com/erew123/alltalk_tts?tab=readme-ov-file#-finetuning-a-model
As a side note, many people seem to think that the Whisper v2 model (used in Step 1) gives better results at generating training datasets, so you may prefer to try that, as opposed to the Whisper v3 model. It's another 3 GB download, but worth seeing how it fares for you.
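For reference, this is roughly what the model choice looks like with the standalone `openai-whisper` package (alltalk_tts exposes this in the finetuning interface, so the snippet is purely illustrative):

```python
import whisper  # pip install openai-whisper

# "large-v2" is the model some users report splits sentences more cleanly
# than "large-v3" when building training datasets; ~3 GB on first download.
model = whisper.load_model("large-v2")

result = model.transcribe("voice_sample.wav", word_timestamps=True)
for segment in result["segments"]:
    print(f'{segment["start"]:7.2f}-{segment["end"]:7.2f}  {segment["text"]}')
```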
As we've passed another message or two, I'm assuming I'll be OK to close this. But feel free to reply if you want to know something else on this ticket's topic. Thanks!
You've misunderstood the ticket, sorry. I'm not talking about Whisper incorrectly cutting up voice samples; I'm talking about those samples simply not existing. A minor character in a single episode of a TV show, for instance, who never performed again.
The computer in Star Trek is another example. There exist only a few minutes of the computer's voice lines throughout the whole show, and only a fraction of those are clean, without sirens and whatnot going on in the background.
Now, the actual problem.
Whisper breaks down the voice samples, yes, but it populates those samples into two separate datasets.
The training dataset, and the evaluation dataset.
These two sets are intentionally kept separate, as otherwise the model can just train to the test, so to speak, ultimately only being able to reproduce those exact samples as it overfits.
The training and eval datasets are saved in finetune->tmp-trn as CSV files, and are not cross-pollinated.
This behavior is absolutely correct for large voice sample sets, but for something like a videogame character that has only a few minutes of spoken lines at most, it can cause problems, as you have insufficient training material. The ability to lower the share of clips dedicated to evaluation and raise the number set aside for actual training would be a welcome feature.
In addition, it would let you quickly and automatically add all possible voice clips to training as a final run, so that overfitting is minimized while training data is maximized.
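A minimal sketch of what that final run could look like, assuming Coqui-style metadata files named `metadata_train.csv` and `metadata_eval.csv` inside tmp-trn (the exact file names and separator are assumptions):

```python
import pandas as pd
from pathlib import Path

tmp_trn = Path("finetune/tmp-trn")          # folder named above
train_csv = tmp_trn / "metadata_train.csv"  # file names assumed, Coqui-style
eval_csv = tmp_trn / "metadata_eval.csv"

train = pd.read_csv(train_csv, sep="|")
evaluation = pd.read_csv(eval_csv, sep="|")

# Final pass: fold every evaluation clip back into the training set so a
# small corpus uses all available material.
merged = pd.concat([train, evaluation], ignore_index=True).drop_duplicates()
merged.to_csv(tmp_trn / "metadata_train_all.csv", sep="|", index=False)
print(f"{len(train)} train + {len(evaluation)} eval -> {len(merged)} clips")
```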
YES! Exactly that!
Absolute legend.
Related Issues (20)
- Finetune Step 2 Error: PytorchStreamReader failed reading zip archive: failed finding central directory HOT 14
- Ability to easily switch between different finetuned models in standalone mode HOT 5
- Conqui v2 2.0.3 sounding better somehow. HOT 1
- Finetuning Step 2 does not generate best_model.pth HOT 7
- Specify v1/completions source (user/bot/api etc.) HOT 2
- 2000 character string limit HOT 21
- Ja language fine tuning is not possible HOT 13
- Provide an OpenAI TTS conforming api HOT 5
- Ja language finetune doesn't work, but En works. RecursionError and PermissionError. HOT 3
- Trouble with standalone's Deepspeed Setup through atsetup.sh HOT 5
- my CUDA version won't show as 11.8 no matter what I do after using nvcc --version cmd. HOT 1
- AllTalk on Fedora & setting up DeepSpeed
- Cookie Blocker Causing Problems? HOT 1
- Streaming mode not working on Firefox HOT 2
- (Support) Streaming to Unity HOT 2
- There's a problem with the bulk generator. HOT 2
- Feature request: I would like a bulk generator checkbox to also split on new lines. HOT 3
- Impossible to install CUDA Toolkit - Docker HOT 4
- Crash. HOT 5
- AllTalk v1.9c: DeepSpeed Installation Error HOT 5