as-ideas / deepphonemizer
Grapheme to phoneme conversion with deep learning.
License: MIT License
This is not the intended use case for this package, but I'm trying to find a way to generate multiple pronunciations for a given word. Is there any way to modify the predict script to generate multiple result sequences, together with the probability of each? Thanks!
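Here is a minimal sketch of one possible approach. It assumes the model yields a probability distribution over phoneme symbols at every output position, and that positions are treated as independent, as in a non-autoregressive forward model. Under those assumptions, the k most probable sequences can be found with a small beam search. The function `k_best_sequences` and its input format (one list of probabilities per position) are hypothetical and are not part of the DeepPhonemizer API. With independent positions, keeping the k best prefixes at each step is exact rather than approximate: a prefix outside the top k can never be part of an overall top-k sequence.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def k_best_sequences(
    probs: Sequence[Sequence[float]], k: int
) -> List[Tuple[List[int], float]]:
    """Return the k most probable symbol-index sequences with their probabilities.

    probs[t][v] is the (assumed) probability of symbol v at position t.
    Positions are treated as independent, so a sequence's probability is
    the product of its per-position probabilities.
    """
    # Each beam entry is (log-probability, index sequence so far).
    beams: List[Tuple[float, List[int]]] = [(0.0, [])]
    for dist in probs:
        candidates = [
            (logp + math.log(p), seq + [v])
            for logp, seq in beams
            for v, p in enumerate(dist)
            if p > 0.0
        ]
        # Keep only the k highest-scoring prefixes.
        beams = heapq.nlargest(k, candidates, key=lambda c: c[0])
    return [(seq, math.exp(logp)) for logp, seq in beams]


# Two positions, two symbols each (toy numbers).
for seq, p in k_best_sequences([[0.6, 0.4], [0.7, 0.3]], k=3):
    print(seq, round(p, 2))
# [0, 0] 0.42
# [1, 0] 0.28
# [0, 1] 0.18
```

If decoding also merges repeated symbols, several raw sequences can map to the same pronunciation. In that case, their probabilities would need to be summed after decoding.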
Hi,
I am using the latin_ipa_forward.pt checkpoint to phonemize large amounts of German text. While it works fine in almost every case, for some reason it seems to drop the German letter 'ß', e.g.:
>>> phonemizer('ich wohne in der sesamstraße.', lang='de')
ɪç voːnə ʔɪn deːr zezamstʁaː.
Any idea how to fix this?
Thanks!
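A possible workaround is to normalize the text before phonemizing. This assumes 'ß' is dropped because it is missing from the checkpoint's grapheme vocabulary. Replace 'ß' with 'ss' (the usual substitution when ß is unavailable), then check which characters still fall outside the model's symbol set. The helper names and the small symbol set below are hypothetical. In practice, the set would come from the checkpoint's text tokenizer, and the normalized string would then be passed to phonemizer(..., lang='de') as in the report.

```python
from typing import Iterable, Set

# Illustrative substitutions applied before phonemization.
SUBSTITUTIONS = {'ß': 'ss', 'ẞ': 'SS'}


def normalize_german(text: str) -> str:
    """Replace characters that the (assumed) grapheme vocabulary lacks."""
    for src, dst in SUBSTITUTIONS.items():
        text = text.replace(src, dst)
    return text


def unknown_chars(text: str, symbols: Iterable[str]) -> Set[str]:
    """Return the non-whitespace characters of text that are not in symbols."""
    known = set(symbols)
    return {c for c in text if c not in known and not c.isspace()}


# Hypothetical grapheme set: lowercase a-z, umlauts and basic punctuation.
symbols = set('abcdefghijklmnopqrstuvwxyzäöü.,')
text = 'ich wohne in der sesamstraße.'
print(unknown_chars(text, symbols))  # {'ß'}
print(normalize_german(text))        # ich wohne in der sesamstrasse.
```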
Hello, thanks for your great work! I found that dp.phonemizer does not handle heteronyms well.
For example:
"We create the new record in the recording room"
turns into
"[W][IY] [K][R][IY][EY][T] [DH][AH] [N][UW] [R][AH][K][AO][R][D] [IH][N] [DH][AH] [R][AH][K][AO][R][D][IH][NG] [R][UW][M]"
while the first "record" should be [R], [EH], [K], [ER], [D].
Is there any suggestion? Thanks
Hello, love your resource and would like to use it to convert phrases to Arpabet symbols. I noticed that the link to the checkpoint "en_us_cmudict_forward" is the same as "en_us_cmudict_ipa_forward". Could you please link the correct file? Thank you!
Thanks for the repo!
When trying to fine-tune one of the provided pretrained models, I got an unintuitive error. The models were saved without an optimizer. When the checkpoint is loaded, the check in line 76 of training/trainer.py doesn't prevent the optimizer from being loaded, because checkpoint['optimizer'] exists in the dict with a None value:
optimizer = Adam(model.parameters())
if 'optimizer' in checkpoint:
    optimizer.load_state_dict(checkpoint['optimizer'])
for g in optimizer.param_groups:
    g['lr'] = config['training']['learning_rate']
Changing the line to if 'optimizer' in checkpoint and checkpoint['optimizer']: should fix it.
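The suggested fix can be sketched as a standalone function. It uses `checkpoint.get('optimizer') is not None`, which covers both a missing key and a key stored with a None value. The function name `maybe_load_optimizer_state` is hypothetical. The optimizer is duck-typed (anything with `load_state_dict` and `param_groups`), so the sketch runs without PyTorch; in the trainer, it would be the `Adam` instance.

```python
from typing import Any, Dict, List


def maybe_load_optimizer_state(optimizer: Any, checkpoint: Dict[str, Any],
                               learning_rate: float) -> bool:
    """Load optimizer state only if the checkpoint really contains one.

    Returns True if a state dict was loaded. The configured learning rate
    is applied in both cases, as in the trainer snippet.
    """
    state = checkpoint.get('optimizer')
    loaded = state is not None
    if loaded:
        optimizer.load_state_dict(state)
    for g in optimizer.param_groups:
        g['lr'] = learning_rate
    return loaded


class DummyOptimizer:
    """Stand-in exposing the two attributes the sketch uses."""

    def __init__(self) -> None:
        self.param_groups: List[Dict[str, float]] = [{'lr': 1e-3}]
        self.loaded: Any = None

    def load_state_dict(self, state: Any) -> None:
        self.loaded = state


opt = DummyOptimizer()
print(maybe_load_optimizer_state(opt, {'optimizer': None}, 1e-4))  # False
print(opt.param_groups[0]['lr'])  # 0.0001
```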
Hi,
Is it possible to upload the checkpoints to Hugging Face, so they can be downloaded and loaded automatically?
Thanks!
Hi,
Is it possible to fine-tune an existing model, for example to add a new language?
Thanks!
Hello,
I had a problem when running the export code snippet in the README:
RuntimeError:
Unknown type name 'torch.tensor':
Any ideas?
Thanks
Hi, @cschaefer26
Cool lib!
I was just wondering: is there any particular reason you don't include stress prediction in the pipeline?
Both "cmudict-ipa" and "wikipron" include stress labels.
The phoneme tokenizers from the pretrained checkpoints lack the ' and , symbols (this was probably done to avoid collisions with punctuation, but that's fairly easy to avoid).
Hi.
I'm working on this shared task:
https://github.com/sigmorphon/2022G2PST
Some of the character sets work fine, but others do not, specifically Persian, Bengali, and Thai.
Persian and Bengali fail when training begins; Thai fails at inference.
Any ideas why this might be?
I'm appending the error below. The problem seems to be in training/trainer.py.
thank you,
mike h.
(mhenv) mhammond@SBS-7337:~/Dropbox/fromlapper/sigmorphon2022/deep$ python doit.py
per
{'ن', 'و', 'ج', 'ل', 'ژ', 'س', 'ض', 'ذ', 'ت', 'ه', 'ر', '\u200c', 'ث', 'ظ', 'ش', 'ا', 'ع', 'ئ', 'م', 'غ', 'ە', 'ص', 'ح', 'آ', 'ء', 'پ', 'چ', 'گ', 'خ', 'ف', 'ی', 'ق', 'ز', 'د', 'ک', 'ب'}
2022-05-22 15:26:50,656.656 INFO preprocess: Preprocessing, train data: with 100 files.
2022-05-22 15:26:50,656.656 INFO preprocess: Processing train data...
100%|██████████████████████████████████████| 100/100 [00:00<00:00, 86178.43it/s]
2022-05-22 15:26:50,659.659 INFO preprocess:
Saving datasets to: /home/mhammond/Desktop/datasets
2022-05-22 15:26:50,660.660 INFO preprocess: Preprocessing.
Train counts (deduplicated): [('per', 100)]
Val counts (including duplicates): [('per', 56)]
2022-05-22 15:26:50,662.662 INFO train: Initializing new model from config...
2022-05-22 15:26:50,742.742 INFO train: Checkpoints will be stored at /home/mhammond/Desktop/checkpoints
Traceback (most recent call last):
File "doit.py", line 79, in <module>
train(config_file=lang+'.yaml')
File "/home/mhammond/Desktop/mhenv/lib/python3.8/site-packages/dp/train.py", line 57, in train
trainer.train(model=model,
File "/home/mhammond/Desktop/mhenv/lib/python3.8/site-packages/dp/training/trainer.py", line 89, in train
val_batches = sorted([b for b in val_loader], key=lambda x: -x['text_len'][0])
File "/home/mhammond/Desktop/mhenv/lib/python3.8/site-packages/dp/training/trainer.py", line 89, in <listcomp>
val_batches = sorted([b for b in val_loader], key=lambda x: -x['text_len'][0])
File "/home/mhammond/Desktop/mhenv/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 530, in __next__
data = self._next_data()
File "/home/mhammond/Desktop/mhenv/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 569, in _next_data
index = self._next_index() # may raise StopIteration
File "/home/mhammond/Desktop/mhenv/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 521, in _next_index
return next(self._sampler_iter) # may raise StopIteration
File "/home/mhammond/Desktop/mhenv/lib/python3.8/site-packages/torch/utils/data/sampler.py", line 226, in __iter__
for idx in self.sampler:
File "/home/mhammond/Desktop/mhenv/lib/python3.8/site-packages/dp/training/dataset.py", line 54, in __iter__
binned_idx = np.stack(bins).reshape(-1)
File "<__array_function__ internals>", line 180, in stack
File "/home/mhammond/Desktop/mhenv/lib/python3.8/site-packages/numpy/core/shape_base.py", line 422, in stack
raise ValueError('need at least one array to stack')
ValueError: need at least one array to stack
I've been using this model for some time and recently had to install it in a new Python venv. The train() syntax has changed since my original install, so I know some changes have been made since then. The problem is that it no longer saves any checkpoint files during training. The directory is created, but nothing ever shows up there (latest_model.pt, best_model.pt, etc. are never written). This makes the model unusable for me, as I need to test it after training, not during.
I was trying to convert my trained Hindi model to a JIT-compatible model using the method provided in the README, and I'm getting an error at the dropout layer.
In [6]: phonemizer.predictor.model = torch.jit.script(model)
Unknown type name 'torch.tensor':
File "/home/gnani/.virtualenvs/ttsapi/lib/python3.8/site-packages/dp/model/utils.py", line 24
def forward(self, x: torch.tensor) -> torch.tensor: # shape: [T, N]
~~~~~~~~~~~~ <--- HERE
x = x + self.scale * self.pe[:x.size(0), :]
return self.dropout(x)
In eval mode:
In [8]: model.eval()
In [9]: phonemizer.predictor.model = torch.jit.script(model)
RuntimeError: Can't redefine method: forward on class: __torch__.dp.model.utils.PositionalEncoding
Any suggestions would be greatly appreciated!
Hi!
Great work you're doing here.
I've been testing your tool, it's easy to use and gives fine results.
Since I'm looking for a tool to generate phonemized input for the VITS model (in ONNX format), I need to use the same tokenizer (phonemizer) that the model expects. I've found that your pretrained models already have the dictionary embedded in them. May I ask where those dictionaries came from? In your Colab training example you use the CUNY-CL/wikipron ones, but I was wondering whether those are the ones you used originally or just for the example.
Thanks.
To get familiar with DeepPhonemizer, I ran the two example Python files in the repository. run_prediction.py works as expected, but run_training.py generates an error.
Here is the log output:
mbarnig@mbarnig-MS-7B22:~/DeepPhonemizer$ python3 ./run_training.py
2021-05-25 22:40:49,343.343 INFO preprocess: Preprocessing, train data: with 300 files.
2021-05-25 22:40:49,344.344 INFO preprocess: Performing random split with num val: 100
2021-05-25 22:40:49,344.344 INFO preprocess: Processing train data...
0it [00:00, ?it/s]
2021-05-25 22:40:49,347.347 INFO preprocess:
Saving datasets to: /home/mbarnig/DeepPhonemizer/datasets
2021-05-25 22:40:49,348.348 INFO preprocess: Preprocessing.
Train counts (deduplicated): []
Val counts (including duplicates): [('de', 200), ('en_us', 100)]
2021-05-25 22:40:49,352.352 INFO train: Initializing new model from config...
2021-05-25 22:40:49,380.380 INFO train: Checkpoints will be stored at /home/mbarnig/DeepPhonemizer/checkpoints
2021-05-25 22:40:49.464084: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
Traceback (most recent call last):
File "./run_training.py", line 14, in <module>
train(config_file=config_file)
File "/home/mbarnig/DeepPhonemizer/dp/train.py", line 46, in train
trainer.train(model=model,
File "/home/mbarnig/DeepPhonemizer/dp/training/trainer.py", line 74, in train
start_epoch = checkpoint['step'] // len(train_loader)
ZeroDivisionError: integer division or modulo by zero
Please advise me what is missing. Thank you.
Marco Barnig
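The traceback shows that `len(train_loader)` is zero. This matches the empty `Train counts (deduplicated): []` line in the log: no training samples ended up in the train split. Below is a minimal sketch of a guard that would fail earlier with a clearer message. `start_epoch_from_step` is a hypothetical helper, not part of DeepPhonemizer.

```python
def start_epoch_from_step(step: int, num_train_batches: int) -> int:
    """Convert a global step into an epoch index, refusing an empty train set."""
    if num_train_batches == 0:
        raise ValueError(
            'The train loader has no batches; check that preprocessing '
            'produced training data (train counts are empty).')
    return step // num_train_batches


print(start_epoch_from_step(250, 100))  # 2
```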
Hello. Thanks for your great work. I want to train the model on Persian data. In Persian, some words are linked based on context using 'Ezafe', which is pronounced but not written. For example, here are two words and their phonemes:
کیف: kif
من: man
But we read the phrase 'کیف من' as 'kife man', not 'kif man' (Persian is written right to left). Also, word pronunciations can differ depending on meaning.
My question is: how can I change the model to take these issues into account?
Thanks
This looks like a really cool project. Thanks for your hard work.
Could someone please provide a sample of how to convert to ONNX? I'm new to this and I'm having a hard time figuring out how to provide the sample input. I see others had the same problem in this closed issue (#23). While some said they were able to export to ONNX, nobody provided a code sample of the export call.
I see the forward method in the ForwardTransformer class takes a dictionary with tensors for the following keys: text, start_index, and text_len.
Since the model takes variable-length text as input (not fixed size) and returns variable-length output, I assume I have to tell torch.onnx.export that both the "text" input and the output have dynamic shapes. I tried setting dynamic_axes but had no success. If anyone could provide a sample, it would be much appreciated.
I found a bug on Windows 10. The "preprocessor.phoneme_tokenizer" function doesn't work properly: it never outputs a token ID greater than 28. The problem is caused by the encoding used when reading the configuration file.
To fix the bug, please add encoding='utf-8' in line 21 of "dp/utils/io.py":
import yaml  # PyYAML
def read_config(path):  # in dp/utils/io.py
    with open(path, 'r', encoding='utf-8') as stream:
        config = yaml.load(stream, Loader=yaml.FullLoader)
    return config
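Here is why the explicit encoding matters. Without it, open() falls back to the locale's preferred encoding, which on many Windows setups is a legacy code page such as cp1252. Multi-byte UTF-8 phoneme symbols in the YAML are then decoded as several wrong characters. The self-contained demonstration below uses a temporary file rather than the real config, and the symbol string is arbitrary.

```python
import os
import tempfile

symbols = 'aəʊʃ'  # 1 ASCII character + 3 two-byte UTF-8 characters

# Write the symbols as UTF-8, as most editors save a config file.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w', encoding='utf-8') as f:
    f.write(symbols)

# Reading with the correct encoding round-trips the symbols.
with open(path, 'r', encoding='utf-8') as f:
    good = f.read()

# Reading as cp1252 maps each byte to its own character (mojibake).
with open(path, 'r', encoding='cp1252', errors='replace') as f:
    bad = f.read()

os.remove(path)
print(good == symbols)      # True
print(len(good), len(bad))  # 4 7
```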
Hi. I was able to train an Italian model almost perfectly, except for a few words that are intrinsically ambiguous without context. Since your model is similar to the BERT transformer, what do you think would be the best way to let the model learn words in context? Would passing whole sentences be enough, or should an MLM be implemented?
Using the en_us_cmudict_forward.pt model, these words are spelled out letter by letter:
DON'T [D][IY]-[OW]-[EH][N]-[T][IY]
YOU'LL [W][AY]-[OW]-[Y][UW]-[EH][L]-[EH][L]
BERGSON [B][IY]-[IY]-[AA][R]-[JH][IY]-[EH][S]-[OW]-[EH][N]
If you run a string containing a numeric value through the Phonemizer, the number is dropped from the result. For example:
resultOne = phonemizer('It\'s 1 o\'clock', lang='en_us')
print(resultOne)
resultTwo = phonemizer('It\'s one o\'clock', lang='en_us')
print(resultTwo)
Produces:
Result One: ɪts ɑklɑk
Result Two: ɪts wʌn ɑklɑk
Perhaps, when the input text is split, a raw numeric value could be converted to a spelled-out string before being fed to the Phonemizer. I'm not sure whether Python provides a built-in way to do this (I'm new to Python), but if not, a library like inflect could be used.
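Here is a minimal sketch of that preprocessing step in plain Python. It assumes English input and handles only non-negative integers below 1000; larger numbers are left untouched. The helper names are hypothetical. For full coverage (large numbers, ordinals, decimals), a library such as inflect would be a better fit. The spelled-out text would then be passed to phonemizer(..., lang='en_us').

```python
import re

ONES = ['zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven',
        'eight', 'nine', 'ten', 'eleven', 'twelve', 'thirteen', 'fourteen',
        'fifteen', 'sixteen', 'seventeen', 'eighteen', 'nineteen']
TENS = ['', '', 'twenty', 'thirty', 'forty', 'fifty', 'sixty', 'seventy',
        'eighty', 'ninety']


def number_to_words(n: int) -> str:
    """Spell out 0 <= n < 1000 in English words (space-separated, no 'and')."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] + (' ' + ONES[rest] if rest else '')
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + ' hundred'
    return words + (' ' + number_to_words(rest) if rest else '')


def spell_out_numbers(text: str) -> str:
    """Replace standalone integers below 1000 with words; leave others as-is."""
    def repl(match) -> str:
        n = int(match.group())
        return number_to_words(n) if n < 1000 else match.group()
    return re.sub(r'\b\d+\b', repl, text)


print(spell_out_numbers("It's 1 o'clock"))  # It's one o'clock
```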
Thanks a lot for this cool repository.
I am trying to convert a custom-trained model from a PyTorch checkpoint to TFLite via the ONNX route. I am stuck at the ONNX step, as I cannot figure out how to provide a dummy input when converting to an ONNX model. Can you help me with this?
import torch

torch.onnx.export(
    model,                      # PyTorch model
    dummy_input,                # Input tensor
    "output.onnx",              # Output file (e.g. 'output_model.onnx')
    opset_version=14,           # Operator support version
    input_names=['embedding'],  # Input tensor name (arbitrary)
    output_names=['fc_out']     # Output tensor name (arbitrary)
)
Hello.
I found a bug where the transformer model is unable to learn sequences of two or more consecutive identical phonemes. I first discovered it in Italian, which has double consonants, and then reproduced it in English as well. Take the words holy and wholly as an example. According to WordReference, their RP (probably outdated) pronunciations are həʊli and həʊlli respectively. I don't know how common the latter, with a geminated l, is, but that doesn't really matter. What matters is that even with char repeats set to 3 or 5, the transformer is unable to predict double phonemes.
It can be easily reproduced by running the run_training.py debug script with the default yaml file and this data:
from dp.preprocess import preprocess
from dp.train import train

train_data = [('en_us', 'holy', 'həʊli'),
              ('en_us', 'wholly', 'həʊlli')] * 50
val_data = [('en_us', 'holy', 'həʊli'),
            ('en_us', 'wholly', 'həʊlli')] * 60
config_file = 'forward_config.yaml'
preprocess(config_file=config_file,
           train_data=train_data,
           val_data=val_data,
           deduplicate_train_data=False)
train(config_file=config_file)
Even in a super overfitting environment you will see that predictions will be always həʊli. Reproduction rate 100%.
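One plausible explanation assumes the forward model's output is decoded CTC-style: consecutive identical predictions are merged, then a blank/padding symbol is removed. Under that rule, "ll" collapses to "l" unless the model emits a blank between the two identical phonemes. The sketch below shows only that decoding rule on hand-written predictions. The blank symbol '_' and the function name are hypothetical, and this is not DeepPhonemizer's own code; whether its predictor follows this rule is worth checking.

```python
from itertools import groupby
from typing import List

BLANK = '_'  # hypothetical blank symbol


def collapse(tokens: List[str], blank: str = BLANK) -> str:
    """Merge consecutive duplicates, then drop blanks (CTC-style greedy decoding)."""
    merged = [tok for tok, _ in groupby(tokens)]
    return ''.join(tok for tok in merged if tok != blank)


# Per-position predictions, one per (repeated) input position.
no_blank = ['h', 'ə', 'ʊ', 'l', 'l', 'l', 'i']
with_blank = ['h', 'ə', 'ʊ', 'l', BLANK, 'l', 'i']
print(collapse(no_blank))    # həʊli
print(collapse(with_blank))  # həʊlli
```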
Thank you for this great work!
I'm planning to train a phonemizer model of my own and want to compare the results to those of your en_us_cmudict_ipa_forward model. It would be great to know which dataset, and which train/test split, was used to train and evaluate the model.
Cheers.
Hi,
I just want to know: is the purpose to provide tools for developers to train G2P models for their own languages, and then use the DeepPhonemizer API to convert text to a series of phonemes? Am I right or wrong?
Tried swapping out the default phonemizer in ForwardTacotron for DeepPhonemizer and noticed the model was very hesitant to learn attention. Took a look at the actual output and it was using "aɪz" instead of "ɪz". I have since switched to the cmudict version which does not have this issue, but it would be nice to have this fixed, especially for such a common word. Looking at the actual wikipron dataset I see that the phonemes for "is" are correct, so I'm not sure what's causing this.
While training a forward transformer model, I observed that after the validation loss started to rise, PER and WER kept decreasing.
My training config is based on the example forward transformer config, with phoneme_symbols modified (phonemes are in ARPABET and vowels have stress marks) and dropout set to 0.3.
Should I keep training, or should I use the model with the lowest validation loss? Any other suggestions?
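For reference, PER is commonly computed as the total edit (Levenshtein) distance between predicted and target phoneme sequences, divided by the total target length. WER counts a word as wrong if its phoneme sequence differs at all. Both metrics only look at the final predictions, while the loss also penalizes confidence, so the two can move in opposite directions. Below is a minimal sketch of both metrics under these definitions; the trainer's own implementation may differ in detail.

```python
from typing import List, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]


def per_and_wer(preds: List[Sequence[str]],
                targets: List[Sequence[str]]) -> Tuple[float, float]:
    """Phoneme error rate and word error rate over a list of words."""
    errors = sum(edit_distance(p, t) for p, t in zip(preds, targets))
    total = sum(len(t) for t in targets)
    wrong_words = sum(list(p) != list(t) for p, t in zip(preds, targets))
    return errors / total, wrong_words / len(targets)


preds = [['a', 'b', 'c'], ['x', 'y']]
targets = [['a', 'b', 'c'], ['x', 'z']]
print(per_and_wer(preds, targets))  # (0.2, 0.5)
```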
I am training a single model supporting two languages.
For one of the languages, a row of training data looks like this:
&**ω⊃⊃&⟴∅ w i k k a t t i n u
TensorBoard showed correct entries during training, but at inference time, using the Python pip package, I get a single-character output for any input. With the same checkpoint, the other language works fine.
Thank you for such a beautiful project. I'm wondering if we can convert the model we created to the ONNX format, and what do I need to do for that? Thank you in advance.
Hey,
It seems that punctuation is not removed in line 88 of phonemizer.py. Is this intentional?
Thanks
Thank you for putting this out there. I'm trying to train the model myself on English CMU pronunciations, which have multi-letter phoneme codes. I structure my phoneme transcriptions as lists, for example:
('en_us', 'timbre', ['T','IH1','M','B','ER0'])
The model trains fine, but when I ask for transcriptions (via, say, phonemise_list()), the model output doesn't put delimiters between the phonemes, so its version of 'timbre' is:
'TAY1MBER0'
This is not helpful, and it is also not what the pretrained CMU model does. That model produces output like:
'[T][AY1][M][B][ER0]'
How can I adjust the config file or the calls to train() so that I get back something with delimiters between the phonemes?
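One workaround is offered here as an assumption, not as the documented way the pretrained CMU model was built. Make each multi-letter phoneme a single bracketed symbol in the training data, and list those bracketed strings under phoneme_symbols in the config. The concatenated output is then self-delimiting and can be split back with a regular expression. The helper names are hypothetical.

```python
import re
from typing import List

BRACKETED = re.compile(r'\[([^\[\]]+)\]')


def bracket_phonemes(phonemes: List[str]) -> List[str]:
    """Wrap each phoneme so it becomes one self-delimiting symbol, e.g. 'IH1' -> '[IH1]'."""
    return [f'[{p}]' for p in phonemes]


def split_phonemes(output: str) -> List[str]:
    """Recover the phoneme list from a concatenated '[T][AY1]...' string."""
    return BRACKETED.findall(output)


# A training tuple rewritten with bracketed symbols.
sample = ('en_us', 'timbre', bracket_phonemes(['T', 'IH1', 'M', 'B', 'ER0']))
print(sample)  # ('en_us', 'timbre', ['[T]', '[IH1]', '[M]', '[B]', '[ER0]'])
# Symbols that would have to appear in the config's phoneme_symbols list.
print(sorted(set(sample[2])))
print(split_phonemes('[T][AY1][M][B][ER0]'))  # ['T', 'AY1', 'M', 'B', 'ER0']
```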
https://github.com/as-ideas/DeepPhonemizer/blob/main/dp/utils/logging.py#L13 overrides the app-level logging config at runtime when DeepPhonemizer is imported. According to https://stackoverflow.com/a/27017068 and https://docs.python.org/3/howto/logging.html#library-config, it might make sense to remove the logging config.
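The Python logging HOWTO recommends that libraries create module-level loggers and attach only a NullHandler, leaving handlers, formats, and levels to the application. Below is a minimal sketch of that pattern; the logger name 'dp' is an assumption for the package's top-level logger.

```python
import logging

# Library side: no basicConfig and no handlers other than a NullHandler.
logging.getLogger('dp').addHandler(logging.NullHandler())


def library_function() -> None:
    # Records propagate to whatever the application configured (if anything).
    logging.getLogger('dp.preprocess').info('Preprocessing...')


# Application side: only the application decides how records are emitted.
if __name__ == '__main__':
    logging.basicConfig(level=logging.INFO, format='%(name)s: %(message)s')
    library_function()
```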
Hello. Thank you for this amazing repository!
I have a question though. What’s the easiest way to get a unique grapheme set for a specific language? How did you get that list when training a multilingual model?
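Here is a minimal sketch that collects the characters per language with a set. It assumes training data in the (language, word, phonemes) tuple format used elsewhere in these issues. The issues do not say whether the pretrained multilingual models were built this way, and the helper name is hypothetical. The example tuples reuse words quoted earlier on this page.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def grapheme_sets(data: Iterable[Tuple[str, str, str]]) -> Dict[str, List[str]]:
    """Map each language to the sorted list of distinct characters in its words."""
    chars: Dict[str, Set[str]] = defaultdict(set)
    for lang, word, _phonemes in data:
        chars[lang].update(word)
    return {lang: sorted(cs) for lang, cs in chars.items()}


data = [('en_us', 'holy', 'həʊli'), ('en_us', 'wholly', 'həʊlli'),
        ('per', 'کیف', 'kif'), ('per', 'من', 'man')]
print(grapheme_sets(data)['en_us'])  # ['h', 'l', 'o', 'w', 'y']
```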