takase / control-length
License: BSD 3-Clause "New" or "Revised" License
I was trying to use the code for my project, and I couldn't figure out why the embedding weights are set to zero at padding_idx. I am referring to this part of the code:
if padding_idx is not None:
    if length is None:
        emb[padding_idx, :] = 0
    else:
        emb[:, padding_idx, :] = 0
I understand that padding_idx can be used to build the mask of positions whose embedding should be zero, and I believe utils.make_positions does exactly that. However, I couldn't figure out what the piece of code above does.
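A plain-Python sketch of how the two pieces usually fit together in fairseq-style code. It assumes make_positions gives real tokens the positions padding_idx+1, padding_idx+2, ... and gives padding tokens the position padding_idx itself, and that the pad token id equals padding_idx (in fairseq both are the dictionary's pad index, 1 by default). Under those assumptions the table row at padding_idx is the only row padding tokens ever look up, so zeroing it makes every padding position contribute a zero vector. For the length-conditioned branch, the sketch assumes the first dimension indexes sentences (one table per sentence), which is why the index moves to the second axis. The function names are illustrative, not the repository's API.

```python
import math


def sinusoidal_table(num_positions, dim, padding_idx=None):
    """Sinusoidal table laid out as sines in the first half, cosines in the second."""
    half = dim // 2
    scale = math.log(10000) / (half - 1)
    freqs = [math.exp(-scale * i) for i in range(half)]
    table = []
    for pos in range(num_positions):
        row = [math.sin(pos * f) for f in freqs] + [math.cos(pos * f) for f in freqs]
        if dim % 2 == 1:
            row.append(0.0)  # odd dimension: pad each row with one zero column
        table.append(row)
    if padding_idx is not None:
        # Equivalent of emb[padding_idx, :] = 0 for a 2-D [positions][dim] table
        table[padding_idx] = [0.0] * dim
    return table


def make_positions(tokens, padding_idx):
    """Real tokens get padding_idx+1, padding_idx+2, ...; pad tokens get padding_idx."""
    positions, count = [], 0
    for tok in tokens:
        if tok == padding_idx:
            positions.append(padding_idx)
        else:
            count += 1
            positions.append(padding_idx + count)
    return positions


def zero_padding_rows(tables, padding_idx):
    """Hypothetical length-conditioned case: tables has shape [batch][positions][dim].

    Equivalent of emb[:, padding_idx, :] = 0.
    """
    for table in tables:
        table[padding_idx] = [0.0] * len(table[padding_idx])
    return tables
```

For example, with pad = 1 the tokens [5, 7, 9, 1, 1] get positions [2, 3, 4, 1, 1], so the last two tokens both read row 1 of the table, which is all zeros.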
Hi, could you provide a download link for the annotated Gigaword dataset? Thanks!
Hello, have you trained this model on the CNN/DM dataset? If so, which parameter settings would I need to change?
Hello! I was trying to run your code but ran into problems.
I'm using CUDA 10.1 and PyTorch 1.7.0+cu92.
Running train.py with the recommended parameters runs out of memory:
python3 train.py data-bin/writingPrompts --source-lang wp_source --target-lang wp_target --arch transformer_wmt_en_de --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 --lr-scheduler inverse_sqrt --warmup-init-lr 1e-07 --warmup-updates 4000 --lr 0.001 --min-lr 1e-09 --dropout 0.3 --weight-decay 0.0 --criterion label_smoothed_cross_entropy --label-smoothing 0.1 --seed 2723 --max-epoch 100 --update-freq 64 --share-all-embeddings --represent-length-by-lrpe --ordinary-sinpos --save-dir output --max-tokens 3584
Lowering --max-tokens and adding --max-sentences to avoid running out of memory:
python3 train.py data-bin/writingPrompts --source-lang wp_source --target-lang wp_target --arch transformer_wmt_en_de --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 --lr-scheduler inverse_sqrt --warmup-init-lr 1e-07 --warmup-updates 4000 --lr 0.001 --min-lr 1e-09 --dropout 0.3 --weight-decay 0.0 --criterion label_smoothed_cross_entropy --label-smoothing 0.1 --seed 2723 --max-epoch 100 --update-freq 64 --share-all-embeddings --represent-length-by-lrpe --ordinary-sinpos --save-dir output --max-tokens 224 --max-sentences 128
This results in the following error:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [30, 7, 1536]], which is output 0 of AddBackward0, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
Full Output:
| [wp_source] dictionary: 899328 types
| [wp_target] dictionary: 899328 types
| data-bin/writingPrompts train 272600 examples
| data-bin/writingPrompts valid 15620 examples
| model transformer_wmt_en_de, criterion LabelSmoothedCrossEntropyCriterion
| num. model params: 504594432
| training on 1 GPUs
| max tokens per GPU = 224 and max sentences per GPU = 128
| WARNING: 51452 samples have invalid sizes and will be skipped, max_positions=(1024, 1024), first few sample ids=[185425, 210272, 169339, 79302, 184940, 229390, 23406, 81508, 81486, 128160]
| NOTICE: your device may support faster training with --fp16
/home/andre/Documents/control-length/encdec/fairseq/modules/multihead_attention.py:109: UserWarning: Output 0 of SplitBackward is a view and is being modified inplace. This view is an output of a function that returns multiple views. Inplace operators on such views are being deprecated and will be forbidden starting from version 1.8. Consider using unsafe_ version of the function that produced this view or don't modify this view inplace. (Triggered internally at /pytorch/torch/csrc/autograd/variable.cpp:491.)
q *= self.scaling
[W python_anomaly_mode.cpp:104] Warning: Error detected in SplitBackward. Traceback of forward call that caused the error:
File "train.py", line 365, in <module>
main(args)
File "train.py", line 84, in main
trainer.dummy_train_step([dummy_batch])
File "/home/andre/Documents/control-length/encdec/fairseq/trainer.py", line 329, in dummy_train_step
self.train_step(dummy_batch, dummy_batch=True)
File "/home/andre/Documents/control-length/encdec/fairseq/trainer.py", line 175, in train_step
loss, sample_size, logging_output = self.task.get_loss(
File "/home/andre/Documents/control-length/encdec/fairseq/tasks/fairseq_task.py", line 157, in get_loss
return criterion(model, sample)
File "/home/andre/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/andre/Documents/control-length/encdec/fairseq/criterions/label_smoothed_cross_entropy.py", line 36, in forward
net_output = model(**sample['net_input'])
File "/home/andre/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/andre/Documents/control-length/encdec/fairseq/models/fairseq_model.py", line 163, in forward
decoder_out = self.decoder(prev_output_tokens, target_length, encoder_out)
File "/home/andre/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/andre/Documents/control-length/encdec/fairseq/models/transformer.py", line 507, in forward
x, attn = layer(
File "/home/andre/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/andre/Documents/control-length/encdec/fairseq/models/transformer.py", line 726, in forward
x, _ = self.self_attn(
File "/home/andre/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/andre/Documents/control-length/encdec/fairseq/modules/multihead_attention.py", line 96, in forward
q, k, v = self.in_proj_qkv(query)
File "/home/andre/Documents/control-length/encdec/fairseq/modules/multihead_attention.py", line 193, in in_proj_qkv
return self._in_proj(query).chunk(3, dim=-1)
(function _print_stack)
Traceback (most recent call last):
File "train.py", line 365, in <module>
main(args)
File "train.py", line 84, in main
trainer.dummy_train_step([dummy_batch])
File "/home/andre/Documents/control-length/encdec/fairseq/trainer.py", line 329, in dummy_train_step
self.train_step(dummy_batch, dummy_batch=True)
File "/home/andre/Documents/control-length/encdec/fairseq/trainer.py", line 200, in train_step
raise e
File "/home/andre/Documents/control-length/encdec/fairseq/trainer.py", line 189, in train_step
self.optimizer.backward(loss)
File "/home/andre/Documents/control-length/encdec/fairseq/optim/fairseq_optimizer.py", line 73, in backward
loss.backward()
File "/home/andre/.local/lib/python3.8/site-packages/torch/tensor.py", line 221, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/home/andre/.local/lib/python3.8/site-packages/torch/autograd/__init__.py", line 130, in backward
Variable._execution_engine.run_backward(
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [30, 7, 1536]], which is output 0 of AddBackward0, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
Any idea what the problem might be? Cheers!
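The warning in the log points at `q *= self.scaling` (multihead_attention.py:109), applied to a view returned by `.chunk(3, dim=-1)` in in_proj_qkv. Modifying that view in place bumps the version counter of the shared projection output, which is consistent with the later "is at version 2; expected version 1" error. One commonly used workaround, not confirmed by the repository authors, is to scale out of place so a new tensor is created. The sketch below reproduces only that projection step; the function and argument names are hypothetical.

```python
import torch
import torch.nn.functional as F


def project_qkv(query, in_proj_weight, in_proj_bias, scaling):
    """Project once, split into q/k/v, and scale q without mutating a view."""
    # chunk() returns three views that share storage with the projection output
    q, k, v = F.linear(query, in_proj_weight, in_proj_bias).chunk(3, dim=-1)
    # Out-of-place multiply allocates a new tensor, so the tensor autograd
    # saved for backward keeps the version it expects
    q = q * scaling
    return q, k, v
```

In the repository's code this would correspond to replacing `q *= self.scaling` with `q = q * self.scaling`; the extra allocation is one tensor of the query's size per forward pass.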
Hi,
Thanks for the great repo. I'm wondering if you can provide a checkpoint file for text summarization that has been trained on gigaword according to your settings? I'm trying to train according to the provided settings for train.py, and at this rate on 4 GPUs it will take me 2 weeks. (I had to set limits on max-tokens and max-sentences, because otherwise I get memory problems and the job hangs.)
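Lowering --max-tokens, as in the runs above, also shrinks the batch each optimizer step sees. Assuming fairseq's usual accounting, where each GPU processes a batch of at most --max-tokens tokens and gradients are accumulated over --update-freq batches, the token budget per update is roughly the product of the two and the GPU count. A small sketch of that arithmetic (function names are illustrative):

```python
def tokens_per_update(max_tokens, update_freq, num_gpus=1):
    """Upper bound on tokens contributing to one optimizer step."""
    return max_tokens * update_freq * num_gpus


def update_freq_for(target_tokens, max_tokens, num_gpus=1):
    """Smallest --update-freq that reaches target_tokens per update (ceiling division)."""
    return -(-target_tokens // (max_tokens * num_gpus))
```

With the recommended settings, 3584 tokens × 64 on one GPU gives 229,376 tokens per update; keeping that budget at --max-tokens 224 would need --update-freq 1024 on one GPU, or 256 on four. These are upper bounds, since --max-sentences and length bucketing usually keep real batches below --max-tokens.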
Could the author give an example showing how to do text summarization (text_sum)?
Hi, could you tell me how to reload the pre-trained checkpoint you released?
When I try to use this pre-trained checkpoint with generate.py, I get the following error:
Traceback (most recent call last):
File "generate.py", line 172, in <module>
main(args)
File "generate.py", line 44, in main
models, _ = utils.load_ensemble_for_inference(args.path.split(':'), task, model_arg_overrides=eval(args.model_overrides))
File "/users6/ychuang/program/python/control-length-master/encdec/fairseq/utils.py", line 160, in load_ensemble_for_inference
model.load_state_dict(state['model'], strict=True)
File "/users6/ychuang/program/python/control-length-master/encdec/fairseq/models/fairseq_model.py", line 64, in load_state_dict
super().load_state_dict(state_dict, strict)
File "/users6/ychuang/anaconda3/envs/py3.6_torch1.0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 839, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for TransformerModel:
size mismatch for encoder.embed_tokens.weight: copying a param with shape torch.Size([16152, 512]) from checkpoint, the shape in current model is torch.Size([124408, 512]).
size mismatch for decoder.embed_tokens.weight: copying a param with shape torch.Size([16152, 512]) from checkpoint, the shape in current model is torch.Size([124408, 512]).
Thank you very much!
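The size mismatch says the checkpoint's embedding tables have 16152 rows while the model built from the local data-bin has 124408. In fairseq the embedding row count equals the dictionary size, so this usually means the data was binarized with a different dictionary than the one the checkpoint was trained with. A minimal check, assuming fairseq's dict.txt format (one `symbol count` pair per line) and the 4 special symbols that fairseq's Dictionary adds without writing them to the file; the helper names are hypothetical:

```python
def embedding_rows(dict_lines, nspecial=4):
    """Rows an embedding built from this dict.txt would get.

    Assumption: the file lists every symbol except the nspecial built-ins.
    """
    symbols = [line for line in dict_lines if line.strip()]
    return nspecial + len(symbols)


def matches_checkpoint(dict_lines, checkpoint_rows, nspecial=4):
    """Return (match?, rows) comparing a dict.txt against a checkpoint's embedding rows."""
    rows = embedding_rows(dict_lines, nspecial)
    return rows == checkpoint_rows, rows
```

Running it on the local dictionary files against 16152 would confirm the mismatch. If they differ, re-binarizing the data with the dictionary files that accompany the checkpoint should make the shapes agree; upstream fairseq's preprocess.py accepts them via --srcdict and --tgtdict, and whether this fork keeps those flags is an assumption.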