nipponjo / tts-arabic-pytorch
TTS models for Arabic (Tacotron2, FastPitch)
Hello,
I have a new dataset that I want to train on, and I would like your advice on what needs to be changed in this repo.
The dataset is Egyptian Arabic.
Hello,
First and foremost, thank you for an incredible repository; this kind of work is sorely lacking for a language like Arabic. I have a problem: I get the error below after creating a new dataset. It occurs when train_phon.txt contains more than 7 lines. Please help.
python train.py
[==============================] 100.0%
[==============================] 100.0%
Epoch: 0
C:\Users\alghr\Dev\Python\tts-arabic-pytorch\lib\site-packages\torch\functional.py:641: UserWarning: stft with return_complex=False is deprecated. In a future pytorch release, stft will return complex tensors for all inputs, and return_complex=False will raise an error.
Note: you can still call torch.view_as_real on the complex output to recover the old return format. (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\SpectralOps.cpp:867.)
return _VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore[attr-defined]
loss: 48.50394821166992, grad_norm: 27.04533576965332
Traceback (most recent call last):
File "train.py", line 207, in <module>
main()
File "train.py", line 195, in main
training_loop(model,
File "train.py", line 102, in training_loop
writer.add_training_data(loss.item(), grad_norm.item(),
File "C:\Users\alghr\Dev\Python\tts-arabic-pytorch\arabic\utils\logging.py", line 12, in add_training_data
for k, v in meta.items():
AttributeError: 'float' object has no attribute 'items'
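The traceback above suggests that add_training_data iterates over a dict-like meta argument but received a float instead. A minimal sketch of that failure mode and one defensive workaround follows. The function name mirrors the traceback, but the signature and body here are assumptions made for illustration, not the repository's actual code.

```python
from numbers import Number


def add_training_data(meta, step, log=None):
    """Hypothetical logger: records every entry of `meta` under `step`.

    Assumption: the real function expects a dict such as {"loss": 1.2}.
    A bare number is wrapped into a dict instead of raising AttributeError.
    """
    if log is None:
        log = []
    if isinstance(meta, Number):
        # A float has no .items(); wrap it so the loop below still works.
        meta = {"loss": float(meta)}
    for key, value in meta.items():
        log.append((step, key, float(value)))
    return log


records = add_training_data({"loss": 48.5, "grad_norm": 27.0}, step=0)
records_from_float = add_training_data(48.5, step=1)
```

Checking that the call in train.py passes its arguments in the order that logging.py expects may be the cleaner fix; the wrapper above only hides the mismatch.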
Python310\lib\site-packages\torch\serialization.py", line 258, in validate_cuda_device
raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
Hello,
Thank you for your fast reply.
Another question: my dataset is Egyptian Arabic, and the transcripts are written in Arabic graphemes without diacritization, so I used your script to convert the Arabic to Buckwalter and then Buckwalter to phonemes. I didn't make any changes to the symbols.
How many epochs should it take to get a satisfying result?
Thank you
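For readers unfamiliar with the Arabic-to-Buckwalter step mentioned above, here is a minimal sketch of the standard Buckwalter transliteration as a character-for-character mapping. It is not the repository's conversion script, and the names build_buckwalter_table and transliterate are hypothetical. It also shows why undiacritized text matters: without diacritics in the input, no short-vowel symbols appear in the output.

```python
def build_buckwalter_table():
    """Build a partial Arabic-to-Buckwalter mapping from Unicode code points."""
    table = {}
    # U+0621..U+063A: hamza forms, then alef through ghain
    for offset, latin in enumerate("'|>&<}AbptvjHxd*rzs$SDTZEg"):
        table[chr(0x0621 + offset)] = latin
    # U+0641..U+0652: feh through yeh, then the eight diacritics
    for offset, latin in enumerate("fqklmnhwYyFNKaui~o"):
        table[chr(0x0641 + offset)] = latin
    return table


BUCKWALTER = build_buckwalter_table()


def transliterate(text):
    """Replace each mapped Arabic character; leave everything else unchanged."""
    return "".join(BUCKWALTER.get(ch, ch) for ch in text)


# Undiacritized input: only consonants and long vowels come out.
plain = transliterate("\u0643\u062A\u0627\u0628")
# The same word with kasra and fatha added: short vowels now appear.
voweled = transliterate("\u0643\u0650\u062A\u064E\u0627\u0628")
```

A phonemizer that relies on the short-vowel symbols would see different input for the two strings above, which is one reason diacritization is often done before phoneme conversion.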
Hi,
Amazing principle and application, but I tried to run python app.py on CPU and got many errors.
can you please make a sample using
I can help with C# and the web app.
This is the lib:
pythonnet
Thank you
Good Job! Looking forward to your repo!
I also think that the (adv) version often sounds a bit clearer, but I have some questions.
I'm looking forward to your reply.
I am asking about the vocoder: should I download and use the vocoder you mention, or do I need a different vocoder for a new dataset?
Good afternoon, please describe in more detail how to train the model from scratch.
If possible, in the form of a notebook.
Can I use this repo to train a new TTS model in another language?
How many hours of audio and transcripts do I need?
Should the text have diacritics?
Hello.
I created my own Arabic dataset including texts and voice recordings. What is the right way to fine-tune on it, please?
Hello.
There's a file named pitch_dict.pt used in FastPitch training. What is this file and where can I get it?
I'm trying to train with FastPitch to check whether the issue in #10 is also present here or only affects Tacotron.
Thank you.
Hello,
Thank you for sharing this awesome Arabic TTS model. I need some help from you, please.
The model can't read samples like the one below:
sample = 'وَتَعَاوَنُوا عَلَى البِرِّ وَالتَّقْوَى'
wave = ar_model.tts(text_buckw = sample)
It gives me an error like this:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
[<ipython-input-26-c5e258748198>](https://localhost:8080/#) in <module>
1 sample = 'وَتَعَاوَنُوا عَلَى البِرِّ وَالتَّقْوَى'
----> 2 wave = ar_model.tts(text_buckw = sample)
3 wave = wave * 32768.0
4 wave
6 frames
[/content/tts-arabic-tacotron2/model/networks.py](https://localhost:8080/#) in tts(self, text_buckw, batch_size, speed, postprocess_mel, return_mel)
286 return self.tts_single(text_buckw, speed=speed,
287 postprocess_mel=postprocess_mel,
--> 288 return_mel=return_mel)
289
290 # input: list
[/usr/local/lib/python3.7/dist-packages/torch/autograd/grad_mode.py](https://localhost:8080/#) in decorate_context(*args, **kwargs)
25 def decorate_context(*args, **kwargs):
26 with self.clone():
---> 27 return func(*args, **kwargs)
28 return cast(F, decorate_context)
29
[/content/tts-arabic-tacotron2/model/networks.py](https://localhost:8080/#) in tts_single(self, text_buckw, speed, postprocess_mel, return_mel)
242 return_mel=False):
243
--> 244 mel_spec = self.model.ttmel_single(text_buckw, postprocess_mel)
245 if speed is not None:
246 mel_spec = resize_mel(mel_spec, rate=speed)
[/usr/local/lib/python3.7/dist-packages/torch/autograd/grad_mode.py](https://localhost:8080/#) in decorate_context(*args, **kwargs)
25 def decorate_context(*args, **kwargs):
26 with self.clone():
---> 27 return func(*args, **kwargs)
28 return cast(F, decorate_context)
29
[/content/tts-arabic-tacotron2/model/networks.py](https://localhost:8080/#) in ttmel_single(self, utterance, postprocess_mel)
115 process_mel = True
116
--> 117 token_ids = text.tokens_to_ids(tokens)
118 ids_batch = torch.LongTensor(token_ids).unsqueeze(0).to(self.device)
119
[/content/tts-arabic-tacotron2/text/__init__.py](https://localhost:8080/#) in tokens_to_ids(phonemes)
23
24 def tokens_to_ids(phonemes):
---> 25 return [phon_to_id[phon] for phon in phonemes]
26
27
[/content/tts-arabic-tacotron2/text/__init__.py](https://localhost:8080/#) in <listcomp>(.0)
23
24 def tokens_to_ids(phonemes):
---> 25 return [phon_to_id[phon] for phon in phonemes]
26
27
KeyError: 'i0i0'
How do I deal with such unknown phonemes? Do you have a quick solution for this issue? Thanks in advance.
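The KeyError above comes from a plain dictionary lookup in tokens_to_ids, so any token missing from phon_to_id stops synthesis. A minimal sketch of a more forgiving lookup is shown below. The symbol table is a small hypothetical stand-in for the repository's real one, and tokens_to_ids_checked is an illustrative name, not an existing function.

```python
# Hypothetical symbol table; the repository's real phon_to_id is larger.
phon_to_id = {"_pad_": 0, "b": 1, "a": 2, "i0": 3, "t": 4}


def tokens_to_ids_checked(phonemes, table=phon_to_id):
    """Map tokens to ids, collecting unknown tokens instead of raising KeyError."""
    ids, unknown = [], []
    for phon in phonemes:
        if phon in table:
            ids.append(table[phon])
        else:
            unknown.append(phon)
    return ids, unknown


ids, unknown = tokens_to_ids_checked(["b", "a", "i0i0", "t"])
```

Dropping tokens changes the pronunciation, so this is a diagnostic aid rather than a fix. Reporting the unknown tokens helps decide whether the symbol set needs extending or the phonemizer is producing an unexpected token such as 'i0i0'.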
Hello,
I've been attempting to train Tacotron2 using Nawar Halabi's Arabic Speech Corpus. I want to make sure everything works with this corpus before moving on to my own data. Despite following the instructions for preprocessing and training with the provided configurations, I'm experiencing issues with the quality of the synthesized speech.
scripts/preprocess_audio.py (after fixing minor issues).
configs/nawar.yaml configuration file.
train_tc2.py and train_tc2_adv.py.
restore_model: ''
log_dir: logs/exp2
checkpoint_dir: checkpoints/exp2
train_wavs_path: /media/hayder/Disk2/development/tts-arabic-pytorch_test_2/data/arabic-speech-corpus/wav_new
train_labels: ./data/train_phon.txt
test_wavs_path: /media/hayder/Disk2/development/tts-arabic-pytorch_test_2/data/arabic-speech-corpus/test set/wav_new
test_labels: ./data/test_phon.txt
balanced_sampling: False
sampler_weights_file: ./data/sampler/sampler_weights
cache_dataset: False
n_save_states_iter: 10
n_save_backup_iter: 1000
# training
epochs: 500
decoder_max_step: 3000
random_seed: False
batch_size: 8
learning_rate: 1.0e-3
weight_decay: 1.0e-6
grad_clip_thresh: 1.0
cache_dataset: True
use_cuda_if_available: True
balanced_sampling: False
# vocoder
vocoder_state_path: pretrained/hifigan-asc-v1/hifigan-asc.pth
vocoder_config_path: pretrained/hifigan-asc-v1/config.json
# diacritizers
shakkala_path: pretrained/diacritizers/shakkala_second_model6.pth
shakkelha_path: pretrained/diacritizers/shakkelha_rnn_3_big_20.pth
from models.tacotron2 import Tacotron2Wave
import soundfile as sf
import playsound
model = Tacotron2Wave('/media/hayder/Disk2/development/tts-arabic-pytorch_test_2/checkpoints/exp2/states.pth')
model = model.cuda()
wave = model.tts("اَلسَّلامُ عَلَيكُم يَا صَدِيقِي")
sf.write('audio.wav', wave, 22050)
playsound.playsound('audio.wav')
The generated speech does not align with the expected pronunciation. I've included a WAV file example of the synthesized speech for the sentence "اَلسَّلامُ عَلَيكُم يَا صَدِيقِي" here.
Could you please assist me in identifying the possible issues? If you need additional information I'll promptly provide it.
Thanks.
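In the configuration quoted above, cache_dataset and balanced_sampling each appear twice, and cache_dataset is set to both False and True. Whether this affects the result is not established here, but many YAML loaders silently keep the later value, so it may be worth ruling out. A minimal sketch for spotting repeated keys, assuming a flat key: value layout with no nesting (find_duplicate_keys is a hypothetical helper, not part of the repository):

```python
from collections import Counter


def find_duplicate_keys(config_text):
    """Return top-level keys that occur more than once in a flat `key: value` text.

    Assumption: no nesting; blank lines and `#` comments are skipped.
    """
    keys = []
    for line in config_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#") or ":" not in stripped:
            continue
        keys.append(stripped.split(":", 1)[0].strip())
    counts = Counter(keys)
    return sorted(key for key, n in counts.items() if n > 1)


sample = """\
balanced_sampling: False
cache_dataset: False
# training
epochs: 500
cache_dataset: True
balanced_sampling: False
"""
duplicates = find_duplicate_keys(sample)
```

Running the helper on the full config file would list every repeated key, so each one can be reduced to a single intended value.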
Hello,
First, thank you for this amazing repository, which fills a real gap for a language like Arabic. I have a question: can I train the model on a dataset other than the Arabic Speech Corpus? If yes, could you please help me with what I should change in the code?
Thank you in advance.
Is speaker_id used to change the voice, for example between female and male?