teslacool / sca
Soft Contextual Data Augmentation
License: Other
Hi,
I want to reproduce your result on IWSLT14 De-En, but I can't reach 35.78; my best result is 34.25. I would like to ask about some details of the setup.
Below is the last part of the log from one of my language model training runs.
Why does ppl report an inf value?
| epoch 043 | loss 2837.172 | ppl inf | wps 22040 | ups 4 | wpb 5116.162 | bsz 4.997 | num_updates 39302 | lr 2.5e-05 | gnorm 17885.347 | clip 1.000 | oom 0.000 | wall 9326 | train_wall 8854
| epoch 043 | valid on 'valid' subset | loss 2322.051 | ppl inf | num_updates 39302 | best_loss 2322.05
| epoch 044 | loss 2819.246 | ppl inf | wps 22042 | ups 4 | wpb 5116.162 | bsz 4.997 | num_updates 40216 | lr 2.5e-05 | gnorm 19552.845 | clip 1.000 | oom 0.000 | wall 9543 | train_wall 9060
| epoch 044 | valid on 'valid' subset | loss 2272.617 | ppl inf | num_updates 40216 | best_loss 2272.62
| epoch 045 | loss 2802.761 | ppl inf | wps 22039 | ups 4 | wpb 5116.162 | bsz 4.997 | num_updates 41130 | lr 2.5e-05 | gnorm 354250.108 | clip 1.000 | oom 0.000 | wall 9761 | train_wall 9266
| epoch 045 | valid on 'valid' subset | loss 2269.807 | ppl inf | num_updates 41130 | best_loss 2269.81
| epoch 046 | loss 2782.943 | ppl inf | wps 22041 | ups 4 | wpb 5116.162 | bsz 4.997 | num_updates 42044 | lr 2.5e-05 | gnorm 30559.840 | clip 1.000 | oom 0.000 | wall 9978 | train_wall 9472
| epoch 046 | valid on 'valid' subset | loss 2250.028 | ppl inf | num_updates 42044 | best_loss 2250.03
| epoch 047 | loss 2769.120 | ppl inf | wps 22042 | ups 4 | wpb 5116.162 | bsz 4.997 | num_updates 42958 | lr 2.5e-05 | gnorm 18006.640 | clip 1.000 | oom 0.000 | wall 10196 | train_wall 9678
| epoch 047 | valid on 'valid' subset | loss 2268.014 | ppl inf | num_updates 42958 | best_loss 2250.03
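A possible explanation for the inf values, sketched under the assumption that ppl is reported as 2 raised to the logged loss (the helper name `perplexity` below is hypothetical and is not taken from the repository): the largest finite double is roughly 2**1024, so any loss in the thousands overflows and is shown as inf.

```python
import math


def perplexity(loss, base=2.0):
    """Return base**loss, or inf when the result does not fit in a double.

    Assumption: the logged ppl is computed as base**loss with base 2.
    """
    try:
        return math.pow(base, loss)
    except OverflowError:
        # math.pow raises OverflowError once the result exceeds ~1.8e308
        return float("inf")


if __name__ == "__main__":
    print(perplexity(3.0))       # a small per-token loss gives a finite value
    print(perplexity(2837.172))  # a loss like the one in the log overflows
```

If this assumption holds, the inf is only a display effect of the very large loss value; whether the loss itself is expected to be that large is a separate question for the author.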
In README.md, it is written:
I shift a sentence twice in decoder input, so the shortest sentence length after bpe should be no less than 2.
What exactly does this mean?
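One reading of the README sentence is that every sentence must contain at least two tokens after BPE. Under that assumption, a minimal sketch of a filter over parallel data could look like this (the function names and the choice to check both sides are hypothetical, not the repository's preprocessing):

```python
def keep_pair(src_line, tgt_line, min_len=2):
    """True when both sides have at least `min_len` whitespace-separated BPE tokens."""
    return len(src_line.split()) >= min_len and len(tgt_line.split()) >= min_len


def filter_parallel(src_lines, tgt_lines, min_len=2):
    """Drop sentence pairs in which either side is shorter than `min_len` tokens."""
    kept = [(s, t) for s, t in zip(src_lines, tgt_lines) if keep_pair(s, t, min_len)]
    return [s for s, _ in kept], [t for _, t in kept]


if __name__ == "__main__":
    src = ["ja", "das ist gut"]
    tgt = ["yes", "that is good"]
    print(filter_parallel(src, tgt))
```

Such a filter would be applied to the BPE-encoded text files before binarizing them.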
If I use a "standard" set of preprocessed data created by fairseq-preprocess, I get this error when trying to train the LM.
$ python3 ../train.py ./runtime/default/tmp/training/data_generated \
    --task language_modeling --arch transformer_lm \
    --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
    --lr-scheduler inverse_sqrt --warmup-init-lr 1e-07 --warmup-updates 4000 \
    --lr 0.0005 --min-lr 1e-09 \
    --dropout 0.1 --weight-decay 0.0 --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --max-tokens 4096 --tokens-per-sample 4096 --save-dir ./SAVEDIR --update-freq 16
Namespace(adam_betas='(0.9, 0.98)', adam_eps=1e-08, adaptive_input=False, adaptive_input_cutoff=None, adaptive_input_factor=4, adaptive_softmax_cutoff=None, adaptive_softmax_dropout=0, adaptive_softmax_factor=4, arch='transformer_lm', attention_dropout=0.0, bucket_cap_mb=150, char_embedder_highway_layers=2, character_embedding_dim=4, character_embeddings=False, character_filters='[(1, 64), (2, 128), (3, 192), (4, 256), (5, 256), (6, 256), (7, 256)]', clip_norm=0.0, criterion='label_smoothed_cross_entropy', data='./runtime/default/tmp/training/data_generated', ddp_backend='c10d', decoder_attention_heads=8, decoder_embed_dim=512, decoder_ffn_embed_dim=2048, decoder_input_dim=512, decoder_layers=6, decoder_learned_pos=False, decoder_normalize_before=True, decoder_output_dim=512, device_id=0, distributed_backend='nccl', distributed_init_method=None, distributed_port=-1, distributed_rank=0, distributed_world_size=1, dropout=0.1, fix_batches_to_gpus=False, fp16=False, fp16_init_scale=128, fp16_scale_tolerance=0.0, fp16_scale_window=None, future_target=False, keep_interval_updates=-1, label_smoothing=0.1, log_format=None, log_interval=1000, lr=[0.0005], lr_scheduler='inverse_sqrt', lr_shrink=0.1, max_epoch=0, max_sentences=None, max_sentences_valid=None, max_tokens=4096, max_update=0, min_loss_scale=0.0001, min_lr=1e-09, momentum=0.99, no_epoch_checkpoints=False, no_progress_bar=False, no_save=False, no_token_positional_embeddings=False, optimizer='adam', optimizer_overrides='{}', output_dictionary_size=-1, past_target=False, raw_text=False, relu_dropout=0.0, reset_lr_scheduler=False, reset_optimizer=False, restore_file='checkpoint_last.pt', sample_break_mode=None, save_dir='./SAVEDIR', save_interval=1, save_interval_updates=0, seed=1, self_target=False, sentence_avg=False, share_decoder_input_output_embed=False, skip_invalid_size_inputs_valid_test=False, task='language_modeling', tie_adaptive_proj=False, tie_adaptive_weights=False, tokens_per_sample=4096, 
train_subset='train', update_freq=[16], valid_subset='valid', validate_interval=1, warmup_init_lr=1e-07, warmup_updates=4000, weight_decay=0.0)
Traceback (most recent call last):
File "/home/nicola/workspace/SoftContextualDataAugmentation/fairseq/data/dictionary.py", line 169, in load
return cls.load(fd)
File "/home/nicola/workspace/SoftContextualDataAugmentation/fairseq/data/dictionary.py", line 183, in load
count = int(line[idx+1:])
ValueError: invalid literal for int() with base 10: "'<Lua_Heritage>'\n"
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "../train.py", line 431, in <module>
main(args)
File "../train.py", line 36, in main
task = tasks.setup_task(args)
File "/home/nicola/workspace/SoftContextualDataAugmentation/fairseq/tasks/__init__.py", line 19, in setup_task
return TASK_REGISTRY[args.task].setup_task(args)
File "/home/nicola/workspace/SoftContextualDataAugmentation/fairseq/tasks/language_modeling.py", line 94, in setup_task
dictionary = Dictionary.load(os.path.join(args.data, 'dict.txt'))
File "/home/nicola/workspace/SoftContextualDataAugmentation/fairseq/data/dictionary.py", line 177, in load
"rebuild the dataset".format(f))
Exception: Incorrect encoding detected in ./runtime/default/tmp/training/data_generated/dict.txt, please rebuild the dataset
As you state, you train your lm-nmt model from scratch. Why not use a pretrained NMT model for warmup? Could you share some experimental results for the latter strategy?
Hi there,
Thanks for your interesting work! Do you have data processing scripts for Es-En and He-En?
Traceback (most recent call last):
File "train.py", line 431, in <module>
main(args)
File "train.py", line 77, in main
if args.load_lm:
AttributeError: 'Namespace' object has no attribute 'load_lm'
When training the language model, I used the script you provided, with arch=transformer_lm.
Why do I still get this error?
Besides, I don't quite understand the following operation:
src=en
tgt=ru
for l in $src $tgt; do
    srcdir=${src}2${tgt}
    tgtdir=lmof${l}
    mkdir -p $tgtdir
    cp $srcdir/dict.${l}.txt $tgtdir/dict.txt
    cp $srcdir/train.${src}-${tgt}.${l}.bin $tgtdir/train.bin
    cp $srcdir/train.${src}-${tgt}.${l}.idx $tgtdir/train.idx
    cp $srcdir/valid.${src}-${tgt}.${l}.bin $tgtdir/valid.bin
    cp $srcdir/valid.${src}-${tgt}.${l}.idx $tgtdir/valid.idx
done
I didn't use the script you mentioned.
I'm very sorry to disturb you, but I still haven't solved this problem.
I encountered the following error when training the language model:
Traceback (most recent call last):
File "train.py", line 431, in <module>
main(args)
File "train.py", line 77, in main
if args.load_lm:
AttributeError: 'Namespace' object has no attribute 'load_lm'
I use the same steps as you do, with arch=transformer_lm:
python train.py $DATA --task language_modeling --arch $ARCH \
    --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
    --lr-scheduler inverse_sqrt --warmup-init-lr 1e-07 --warmup-updates 4000 \
    --lr 0.0005 --min-lr 1e-09 \
    --dropout 0.1 --weight-decay 0.0 --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --max-tokens 4096 --tokens-per-sample 4096 --save-dir $SAVE --update-freq 16
I noticed what you said:
I have modified this fairseq repo's dataloader, you'd better train language models with standard fairseq repo.
But I don't know how to use standard fairseq. I can't train on the current version because of a problem with the PyTorch version. Later I downloaded the code of fairseq 0.6.0, and there are problems with it too. I really cannot get the language model to train. Can you tell me the steps you used for training?
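For the AttributeError quoted above, one possible workaround is to read the option with a default instead of accessing it directly. This is only a sketch: it assumes that `load_lm` is added to the argument namespace by some tasks or architectures but not by `language_modeling`, which is a guess from the traceback and is not confirmed by the repository. The helper name `wants_lm` is hypothetical.

```python
from argparse import Namespace


def wants_lm(args):
    """Return the value of args.load_lm, or False when the option was never registered."""
    # getattr with a default avoids AttributeError on namespaces without load_lm
    return bool(getattr(args, "load_lm", False))


# Hypothetical namespaces standing in for the parsed command-line arguments.
lm_args = Namespace(task="language_modeling", arch="transformer_lm")
nmt_args = Namespace(task="lm_translation", load_lm=True)

if __name__ == "__main__":
    print(wants_lm(lm_args))
    print(wants_lm(nmt_args))
```

The author's own suggestion, quoted above, is to train the language models with the standard fairseq repository instead, which sidesteps this code path entirely.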
I need a few clarifications.
Please confirm and/or comment on the following claims about your software:
during training of the transformer_lmnmt architecture, the parameters of the source and target LM decoders (i.e. the lower layers of the entire architecture) are not trained
during inference with the transformer_lmnmt architecture, the source and target LM decoders are active, in the sense that the input tokens go through these layers before traversing the transformer encoder and decoder
the forward step of inference is essentially the same as the forward step of training
If any of the above is wrong, please explain the right process to me.
If I am totally right, I have a further question.
Have you ever tried to infer the translation without the source and target LM layers, i.e. using a standard transformer? Which results did you get?
If you did not try, what is your feeling about such an experiment?
After reading your paper, which is undoubtedly very interesting, I took a close look at your code; I must admit that it is very well organized. Thank you so much for your work.
Before starting my experiments with it, I would like to hear your suggestions on how to tune the parameters of the system and of the training:
Should I pay particular attention to any aspect of the training to avoid bad performance?
Hi, tesla,
When I used the code to train the language model following your script, there was an error:
Traceback (most recent call last):
File "train.py", line 431, in <module>
main(args)
File "train.py", line 77, in main
if args.load_lm:
AttributeError: 'Namespace' object has no attribute 'load_lm'
Then I deleted lines 77-78 and continued training, but still got an error:
Traceback (most recent call last):
File "train.py", line 431, in <module>
main(args)
File "train.py", line 42, in main
model = task.build_model(args)
File "/data/experiment/sca/fairseq/tasks/language_modeling.py", line 118, in build_model
model = super().build_model(args)
File "/data/experiment/sca/fairseq/tasks/fairseq_task.py", line 131, in build_model
return models.build_model(args, self)
File "/data/experiment/sca/fairseq/models/__init__.py", line 34, in build_model
return ARCH_MODEL_REGISTRY[args.arch].build_model(args, task)
File "/data/experiment/sca/fairseq/models/transformer.py", line 111, in build_model
src_dict, tgt_dict = task.source_dictionary, task.target_dictionary
File "/data/experiment/sca/fairseq/tasks/fairseq_task.py", line 201, in source_dictionary
raise NotImplementedError
NotImplementedError
Following is my script:
num=0
data_bin=/data/experiment/sca/data/lmofch
save_dir=model-ch
dropout=0.1
arch=transformer
max_tokens=4096
criterion=label_smoothed_cross_entropy
label_smoothing=0.1
lrscheduler=inverse_sqrt
CUDA_VISIBLE_DEVICES=$num python train.py $data_bin \
--task language_modeling \
--arch $arch \
--optimizer adam \
--adam-betas '(0.9, 0.98)' \
--clip-norm 0.0 \
--lr-scheduler $lrscheduler \
--warmup-init-lr 1e-07 \
--warmup-updates 4000 \
--lr 0.0005 \
--min-lr 1e-09 \
--dropout $dropout \
--weight-decay 0.0 \
--criterion $criterion \
--label-smoothing $label_smoothing \
--max-tokens $max_tokens \
--tokens-per-sample 4096 \
--save-dir $save_dir \
--update-freq 16
So I wonder whether I chose the wrong arch and that is what caused the error.
I followed the instructions to preprocess the data and train an engine with your code, both with and without srclm and tgtlm, and I succeeded: I trained two models, one with srclm and tgtlm and one without.
Then I tried to translate with each of the two models, but in both cases I failed.
Here are the two commands I used:
echo "ciao ciao ciao" | python3 ../interactive.py --remove-bpe REMOVE_BPE --raw-text --path engine_nolm/checkpoint_best.pt --src-no-lm --tgt-no-lm --load-nmt --task lm_translation data_generated
echo "ciao ciao ciao" | python3 ../interactive.py --remove-bpe REMOVE_BPE --raw-text --path engine_lm/checkpoint_best.pt --task lm_translation data_generated --src-no-lm --tgt-no-lm --load-srclm-file lm_sl/checkpoint_best.pt --load-tgtlm-file lm_tl/checkpoint_best.pt --load-nmt-file engine_lm/checkpoint_best.pt
What's wrong?
What is the correct command to activate both the src and tgt LMs, and what is the command to disable them?
I have downloaded the IWSLT data without any problem.
[Screenshot: Snipaste_2020-08-01_16-15-57]
In addition:
Did you add tag information to the results obtained in the LM-sample experiment?
Dear author, thanks very much for your work. I have a question: in my experiment, the training speed of lm-nmt is much slower than that of the pure NMT model. Is that expected?
I am really impressed by your work, and I think it would be very useful for everyone (and for me in particular) to have it inside fairseq.
Do you intend to make a pull request?
If you prefer, I volunteer to do it.
Hi,
How do you generate with your code? When I use your generate script, it reports unexpected keys.
I would like to use your software in a multilingual environment.
In practice, I would like to train one system for translating from English into Spanish and Italian.
I already have these systems working using a standard transformer architecture.
To do this, I followed a quite standard procedure: adding a language flag to the source text to trigger the right target translation (into Spanish or Italian).
In the same way, I can also train one system for translating from Spanish or Italian into English. In this case, no language flags are used; I simply concatenate the Spanish-English and Italian-English training data and let the network do all the work.
I would like to know your thoughts on applying a similar strategy with an lm_translation task (i.e. a transformer plus LM).
In the first case (en->{es,it}), the source LM would contain the language flag and only English tokens, while the target LM would contain both Spanish and Italian words.
In the second case ({es,it}->en), the source LM would contain both Spanish and Italian words, while the target LM would be "standard".
Would the LMs be strong enough to "distinguish" between Spanish and Italian tokens?
Could the presence of the language flag disturb the quality of the LMs?
Do you see other approaches for creating a SCA multilingual engine (en->{es,it} or {es,it}->en)?
Any suggestions or comments are very welcome.
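The language-flag procedure described above can be sketched as follows. The `<2es>` / `<2it>` tag format and the function name are hypothetical; any token works as long as it is used consistently and survives BPE as a single symbol.

```python
def add_language_flag(sentence, target_lang, template="<2{lang}>"):
    """Prepend a target-language flag to a source sentence.

    Assumption: the flag format "<2xx>" is illustrative only.
    """
    flag = template.format(lang=target_lang)
    return f"{flag} {sentence.strip()}"


if __name__ == "__main__":
    print(add_language_flag("how are you ?", "es"))
    print(add_language_flag("how are you ?", "it"))
```

With this scheme, the source-side LM would see the flag as the first token of every English sentence, which is exactly the situation the questions above ask about.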
I am using your software to create a large-sized system
My setting includes:
for a total of about 410M parameters.
The system was trained on a huge corpus of more than 1G words in each language.
Unfortunately and disappointingly, the performance of this system is slightly worse than that of the corresponding system without the LM, which has about 200M parameters.
I saw that your experiments show a consistent improvement of 1 BLEU point on a smaller task (you train on only 4.5M sentence pairs, i.e. less than 100M words).
Did you run experiments on larger data sets?
What is your feeling about using the LM on such a big data set (more than 1G words)?
Do you think some setting of my system is wrong?
Any comment or tip for improvement is welcome.
Don't you use p(x) instead of x?
So I think that only the language model of the source language is trained. What, then, is the language model on the target side used for? Please answer, thank you very much.
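For readers unfamiliar with the "p(x) instead of x" wording: as I understand the paper, a chosen word's one-hot vector x is replaced by the language model's distribution p(x) over the vocabulary, so its embedding becomes the probability-weighted average of all embedding rows. The following is only an illustration of that idea in plain Python, with hypothetical names and toy numbers; it is not the repository's implementation.

```python
def soft_embedding(probs, embedding_matrix):
    """Expected embedding under a distribution over the vocabulary.

    probs: one probability per vocabulary entry (should sum to 1).
    embedding_matrix: one embedding row (list of floats) per vocabulary entry.
    """
    dim = len(embedding_matrix[0])
    return [
        sum(p * row[d] for p, row in zip(probs, embedding_matrix))
        for d in range(dim)
    ]


# Toy vocabulary of three words with 2-dimensional embeddings.
E = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]

if __name__ == "__main__":
    print(soft_embedding([0.0, 1.0, 0.0], E))  # one-hot: the ordinary lookup
    print(soft_embedding([0.5, 0.5, 0.0], E))  # soft: a mix of two rows
```

A one-hot distribution reduces to the usual embedding lookup, which is why the soft version can be seen as a generalization of it.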
I am trying to use your script ./train.py (instead of the official fairseq-train) to train the language models.
I ran such a command
and I got this error:
Traceback (most recent call last):
File "../../../code/SCA//train.py", line 437, in <module>
main(args)
File "../../../code/SCA//train.py", line 89, in main
trainer.dummy_train_step([dummy_batch])
File "/home/nicola/workspace/SCA/code/SCA/fairseq/trainer.py", line 335, in dummy_train_step
self.train_step(dummy_batch, dummy_batch=True)
File "/home/nicola/workspace/SCA/code/SCA/fairseq/trainer.py", line 188, in train_step
ignore_grad
File "/home/nicola/workspace/SCA/code/SCA/fairseq/tasks/fairseq_task.py", line 169, in train_step
loss, sample_size, logging_output = criterion(model, sample)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/nicola/workspace/SCA/code/SCA/fairseq/criterions/adaptive_loss.py", line 41, in forward
assert hasattr(model.decoder, 'adaptive_softmax') and model.decoder.adaptive_softmax is not None
AssertionError
Note that the same parameters work well when I use fairseq-train.
The data-bin files were generated with fairseq-preprocess, following your documentation.