takase / control-length

License: BSD 3-Clause "New" or "Revised" License

Python 95.64% Makefile 0.10% Batchfile 0.13% Shell 1.98% C++ 0.58% Lua 0.72% Perl 0.85%

control-length's Issues

Use of `padding_idx` in SinusoidalPositionalEmbedding

I was trying to use the code for my project, and I couldn't figure out why the weights are set to zero at padding_idx.
I am referring to this part of the code:

if padding_idx is not None:
    if length is None:
        emb[padding_idx, :] = 0
    else:
        emb[:, padding_idx, :] = 0

I understand that the padding_idx can be used to get the masks where the position embedding should go to zero and I believe the utils.make_positions function does exactly that. However, I couldn't figure out the above piece of code.
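
One plausible reading of the snippet above, sketched in plain Python rather than the repository's tensor code: the sinusoidal table is indexed by position, and if padding tokens are assigned the position padding_idx, then zeroing that row is what actually makes their positional embedding zero. The helpers below are hypothetical stand-ins, not the repository's functions. In particular, `make_positions` here only assumes the fairseq-style convention that padding tokens get position padding_idx and real tokens count up from padding_idx + 1.

```python
import math


def sinusoidal_table(num_embeddings, dim, padding_idx=None):
    """Build a [num_embeddings][dim] sinusoidal table (dim even, dim >= 4)."""
    half = dim // 2
    step = math.log(10000.0) / (half - 1)
    freqs = [math.exp(-step * i) for i in range(half)]
    table = []
    for pos in range(num_embeddings):
        sines = [math.sin(pos * f) for f in freqs]
        cosines = [math.cos(pos * f) for f in freqs]
        table.append(sines + cosines)
    if padding_idx is not None:
        # Same idea as emb[padding_idx, :] = 0 in the quoted code.
        table[padding_idx] = [0.0] * (2 * half)
    return table


def make_positions(tokens, padding_idx):
    """Simplified stand-in: pads map to padding_idx, real tokens count up from padding_idx + 1."""
    positions = []
    next_pos = padding_idx + 1
    for tok in tokens:
        if tok == padding_idx:
            positions.append(padding_idx)
        else:
            positions.append(next_pos)
            next_pos += 1
    return positions


PAD = 1
tokens = [5, 6, 7, PAD, PAD]
positions = make_positions(tokens, PAD)
table = sinusoidal_table(num_embeddings=8, dim=8, padding_idx=PAD)
embedded = [table[p] for p in positions]  # the lookup step
```

Under these assumptions, `make_positions` only decides which row each token looks up. Without the zeroing step, the row at padding_idx would hold an ordinary sine/cosine vector, so padded tokens would still receive a non-zero embedding. The `emb[:, padding_idx, :] = 0` branch presumably does the same for a table that carries an extra leading dimension for the length.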

Gigaword dataset

Hi, could you post a download link to the annotated Gigaword dataset? Thanks!

CNN/DM

Hello, may I ask whether you have trained this model on the CNN/DM dataset? Which parameter settings would I need to change?

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation:

Hello! I was trying to run your code but ran into problems.
I'm using CUDA 10.1 and PyTorch 1.7.0+cu92.

Running train.py with the recommended parameters results in running out of memory:

python3 train.py data-bin/writingPrompts --source-lang wp_source --target-lang wp_target --arch transformer_wmt_en_de --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 --lr-scheduler inverse_sqrt --warmup-init-lr 1e-07 --warmup-updates 4000 --lr 0.001 --min-lr 1e-09 --dropout 0.3 --weight-decay 0.0 --criterion label_smoothed_cross_entropy --label-smoothing 0.1 --seed 2723 --max-epoch 100 --update-freq 64 --share-all-embeddings --represent-length-by-lrpe --ordinary-sinpos --save-dir output --max-tokens 3584

Lowering the parameters to avoid running out of memory results in the following error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [30, 7, 1536]], which is output 0 of AddBackward0, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

Full Output:

| [wp_source] dictionary: 899328 types
| [wp_target] dictionary: 899328 types
| data-bin/writingPrompts train 272600 examples
| data-bin/writingPrompts valid 15620 examples
| model transformer_wmt_en_de, criterion LabelSmoothedCrossEntropyCriterion
| num. model params: 504594432
| training on 1 GPUs
| max tokens per GPU = 224 and max sentences per GPU = 128
| WARNING: 51452 samples have invalid sizes and will be skipped, max_positions=(1024, 1024), first few sample ids=[185425, 210272, 169339, 79302, 184940, 229390, 23406, 81508, 81486, 128160]
| NOTICE: your device may support faster training with --fp16
/home/andre/Documents/control-length/encdec/fairseq/modules/multihead_attention.py:109: UserWarning: Output 0 of SplitBackward is a view and is being modified inplace. This view is an output of a function that returns multiple views. Inplace operators on such views are being deprecated and will be forbidden starting from version 1.8. Consider using unsafe_ version of the function that produced this view or don't modify this view inplace. (Triggered internally at /pytorch/torch/csrc/autograd/variable.cpp:491.)
q *= self.scaling
[W python_anomaly_mode.cpp:104] Warning: Error detected in SplitBackward. Traceback of forward call that caused the error:
File "train.py", line 365, in <module>
main(args)
File "train.py", line 84, in main
trainer.dummy_train_step([dummy_batch])
File "/home/andre/Documents/control-length/encdec/fairseq/trainer.py", line 329, in dummy_train_step
self.train_step(dummy_batch, dummy_batch=True)
File "/home/andre/Documents/control-length/encdec/fairseq/trainer.py", line 175, in train_step
loss, sample_size, logging_output = self.task.get_loss(
File "/home/andre/Documents/control-length/encdec/fairseq/tasks/fairseq_task.py", line 157, in get_loss
return criterion(model, sample)
File "/home/andre/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/andre/Documents/control-length/encdec/fairseq/criterions/label_smoothed_cross_entropy.py", line 36, in forward
net_output = model(**sample['net_input'])
File "/home/andre/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/andre/Documents/control-length/encdec/fairseq/models/fairseq_model.py", line 163, in forward
decoder_out = self.decoder(prev_output_tokens, target_length, encoder_out)
File "/home/andre/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/andre/Documents/control-length/encdec/fairseq/models/transformer.py", line 507, in forward
x, attn = layer(
File "/home/andre/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/andre/Documents/control-length/encdec/fairseq/models/transformer.py", line 726, in forward
x, _ = self.self_attn(
File "/home/andre/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/andre/Documents/control-length/encdec/fairseq/modules/multihead_attention.py", line 96, in forward
q, k, v = self.in_proj_qkv(query)
File "/home/andre/Documents/control-length/encdec/fairseq/modules/multihead_attention.py", line 193, in in_proj_qkv
return self._in_proj(query).chunk(3, dim=-1)
(function _print_stack)
Traceback (most recent call last):
File "train.py", line 365, in <module>
main(args)
File "train.py", line 84, in main
trainer.dummy_train_step([dummy_batch])
File "/home/andre/Documents/control-length/encdec/fairseq/trainer.py", line 329, in dummy_train_step
self.train_step(dummy_batch, dummy_batch=True)
File "/home/andre/Documents/control-length/encdec/fairseq/trainer.py", line 200, in train_step
raise e
File "/home/andre/Documents/control-length/encdec/fairseq/trainer.py", line 189, in train_step
self.optimizer.backward(loss)
File "/home/andre/Documents/control-length/encdec/fairseq/optim/fairseq_optimizer.py", line 73, in backward
loss.backward()
File "/home/andre/.local/lib/python3.8/site-packages/torch/tensor.py", line 221, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/home/andre/.local/lib/python3.8/site-packages/torch/autograd/__init__.py", line 130, in backward
Variable._execution_engine.run_backward(
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [30, 7, 1536]], which is output 0 of AddBackward0, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

Any idea on what the problem might be? Cheers!
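
On the out-of-memory part of this report, a back-of-the-envelope check using only numbers from the log above may help. It assumes an embedding dimension of 512, inferred from the [30, 7, 1536] tensor being a fused q/k/v projection (1536 = 3 × 512), and it assumes fp32 training with Adam at roughly 16 bytes per parameter (weights, gradients, and two moment buffers). Both are assumptions, not values confirmed by the repository.

```python
# Numbers copied from the log in this issue.
vocab_size = 899_328          # "[wp_source] dictionary: 899328 types"
total_params = 504_594_432    # "num. model params: 504594432"

# Assumption: the 1536-wide tensor is a fused q/k/v projection, so dim = 1536 / 3.
embed_dim = 1536 // 3

# With --share-all-embeddings there is a single vocab_size x embed_dim matrix.
embedding_params = vocab_size * embed_dim
other_params = total_params - embedding_params
embedding_share = embedding_params / total_params

# Assumption: fp32 weights + gradients + two Adam moment buffers = 16 bytes/param.
bytes_per_param = 4 * 4
approx_gib = total_params * bytes_per_param / 2**30

print(f"embedding parameters: {embedding_params:,} ({embedding_share:.1%} of the model)")
print(f"all other parameters: {other_params:,}")
print(f"approx. parameter + optimizer memory: {approx_gib:.2f} GiB")
```

If these assumptions hold, most of the model is the embedding table for the 899,328-type dictionary, which would explain the memory pressure before any activations are counted. This does not address the in-place RuntimeError itself.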

Trained checkpoint?

Hi,

Thanks for the great repo. I'm wondering if you can provide a checkpoint file for text summarization that has been trained on gigaword according to your settings? I'm trying to train according to the provided settings for train.py, and at this rate on 4 GPUs it will take me 2 weeks. (I had to set limits on max-tokens and max-sentences, because otherwise I get memory problems and the job hangs.)

How can I use it for text summarization?

Could the author give an example showing how to do text summarization?

How to load pre-trained checkpoint?

Hi, could you tell me how to load the pre-trained checkpoint you released?
When I try to use this pre-trained checkpoint for generation, I get the following error:

Traceback (most recent call last):
File "generate.py", line 172, in <module>
main(args)
File "generate.py", line 44, in main
models, _ = utils.load_ensemble_for_inference(args.path.split(':'), task, model_arg_overrides=eval(args.model_overrides))
File "/users6/ychuang/program/python/control-length-master/encdec/fairseq/utils.py", line 160, in load_ensemble_for_inference
model.load_state_dict(state['model'], strict=True)
File "/users6/ychuang/program/python/control-length-master/encdec/fairseq/models/fairseq_model.py", line 64, in load_state_dict
super().load_state_dict(state_dict, strict)
File "/users6/ychuang/anaconda3/envs/py3.6_torch1.0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 839, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for TransformerModel:
size mismatch for encoder.embed_tokens.weight: copying a param with shape torch.Size([16152, 512]) from checkpoint, the shape in current model is torch.Size([124408, 512]).
size mismatch for decoder.embed_tokens.weight: copying a param with shape torch.Size([16152, 512]) from checkpoint, the shape in current model is torch.Size([124408, 512]).

Thank you very much!
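
The first dimension of `embed_tokens.weight` is the vocabulary size, so the error above says the checkpoint was trained with a 16152-entry dictionary while the current model was built from a 124408-entry one. A minimal diagnostic sketch follows. `find_shape_mismatches` is a hypothetical helper, and the shapes are typed in from the traceback. In a real run they would presumably be read from `state['model']` and `model.state_dict()`.

```python
def find_shape_mismatches(checkpoint_shapes, model_shapes):
    """Return {name: (checkpoint_shape, model_shape)} for parameters whose shapes differ."""
    mismatches = {}
    for name, ckpt_shape in checkpoint_shapes.items():
        model_shape = model_shapes.get(name)
        if model_shape is not None and tuple(model_shape) != tuple(ckpt_shape):
            mismatches[name] = (tuple(ckpt_shape), tuple(model_shape))
    return mismatches


# Shapes copied from the error message in this issue.
checkpoint_shapes = {
    "encoder.embed_tokens.weight": (16152, 512),
    "decoder.embed_tokens.weight": (16152, 512),
}
model_shapes = {
    "encoder.embed_tokens.weight": (124408, 512),
    "decoder.embed_tokens.weight": (124408, 512),
}

mismatches = find_shape_mismatches(checkpoint_shapes, model_shapes)
for name, (ckpt, model) in mismatches.items():
    print(f"{name}: checkpoint vocab {ckpt[0]}, current dictionary {model[0]}")
```

If only the embedding rows differ, as here, the likely cause is that the data directory passed to generate.py was binarized with a different dictionary than the one used to train the checkpoint.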
