pmichel31415 / are-16-heads-really-better-than-1
Code for the paper "Are Sixteen Heads Really Better than One?"
License: MIT License
Sorry to bother you. I ran into a bug while running "heads_pruning.sh", and the error is:
12:21:27-INFO: ***** Running evaluation *****
12:21:27-INFO: Num examples = 9815
12:21:27-INFO: Batch size = 32
Evaluating: 0% 0/307 [00:00<?, ?it/s]Traceback (most recent call last):
File "pytorch-pretrained-BERT/examples/run_classifier.py", line 585, in
main()
File "pytorch-pretrained-BERT/examples/run_classifier.py", line 521, in main
scorer=processor.scorer,
File "/content/drive/My Drive/XAI in NLP/pytorch-pretrained-BERT/examples/classifier_eval.py", line 78, in evaluate
input_ids, segment_ids, input_mask, label_ids)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/content/drive/My Drive/XAI in NLP/pytorch-pretrained-BERT/pytorch_pretrained_bert/modeling.py", line 1072, in forward
output_all_encoded_layers=False, return_att=return_att)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/content/drive/My Drive/XAI in NLP/pytorch-pretrained-BERT/pytorch_pretrained_bert/modeling.py", line 769, in forward
output_all_encoded_layers=output_all_encoded_layers)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/content/drive/My Drive/XAI in NLP/pytorch-pretrained-BERT/pytorch_pretrained_bert/modeling.py", line 458, in forward
hidden_states, attn = layer_module(hidden_states, attention_mask)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/content/drive/My Drive/XAI in NLP/pytorch-pretrained-BERT/pytorch_pretrained_bert/modeling.py", line 441, in forward
attention_output, attn = self.attention(hidden_states, attention_mask)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/content/drive/My Drive/XAI in NLP/pytorch-pretrained-BERT/pytorch_pretrained_bert/modeling.py", line 335, in forward
self_output, attn = self.self(input_tensor, attention_mask)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/content/drive/My Drive/XAI in NLP/pytorch-pretrained-BERT/pytorch_pretrained_bert/modeling.py", line 307, in forward
self.context_layer_val.retain_grad()
File "/usr/local/lib/python3.6/dist-packages/torch/tensor.py", line 326, in retain_grad
raise RuntimeError("can't retain_grad on Tensor that has requires_grad=False")
RuntimeError: can't retain_grad on Tensor that has requires_grad=False
Evaluating: 0% 0/307 [00:00<?, ?it/s]
I don't know how to fix it. Hope you can help me!
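For context, this error usually means the forward pass ran under torch.no_grad(), so the attention outputs have requires_grad=False and retain_grad() refuses to attach. A minimal, self-contained illustration of the behavior (not the repo's code):

import torch

x = torch.randn(2, 3, requires_grad=True)

# Under no_grad(), intermediate results do not require grad,
# so calling retain_grad() on them raises exactly this RuntimeError.
with torch.no_grad():
    y = x * 2
print(y.requires_grad)  # False

# With gradients enabled (e.g. when computing head importance scores),
# retain_grad() on the intermediate tensor is legal.
y = x * 2
y.retain_grad()
print(y.requires_grad)  # True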
Hi,
I am running the command
bash experiments/BERT/heads_ablation.sh MNLI
I am getting the following error
Traceback (most recent call last):
File "pytorch-pretrained-BERT/examples/run_classifier.py", line 578, in <module>
main()
File "pytorch-pretrained-BERT/examples/run_classifier.py", line 275, in main
model.bert.mask_heads(to_prune)
File "/home/pdguest/ishita/py-env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 535, in __getattr__
type(self).__name__, name))
AttributeError: 'BertModel' object has no attribute 'mask_heads'
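For what it's worth, mask_heads appears to come from the pmichel31415/pytorch-pretrained-BERT fork (the paul branch quoted in the prerequisite steps further down this page), not from the stock package, so this error usually means the vanilla pytorch_pretrained_bert is installed. A quick sanity check (hypothetical snippet, not part of the repo):

from pytorch_pretrained_bert.modeling import BertModel

# False here means the unmodified package is on the path and the fork
# (with head masking/pruning support) still needs to be installed.
print(hasattr(BertModel, "mask_heads"))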
Hi, I'm currently working on attention head pruning for these models.
In your reported experiments, you fine-tuned BERT when training on the downstream MNLI task, right?
But does it also work to freeze the BERT representation after pruning and then train on the downstream MNLI task?
I'd appreciate your answer.
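Whatever the answer about the paper's setup, freezing the encoder itself is straightforward in PyTorch. A minimal sketch, assuming a BertForSequenceClassification-style module named model (both names are assumptions for illustration):

import torch

# Freeze the (pruned) BERT encoder and train only the classifier head.
for param in model.bert.parameters():
    param.requires_grad = False

# Give the optimizer only the parameters that still require gradients.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)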
Hello,
I am trying to run the MT ablation experiments. When I ran the command
wget https://s3.amazonaws.com/fairseq-py/models/wmt14.en-fr.joined-dict.transformer.tar.bz2
I got the following error:
--2020-03-31 17:58:31--  https://s3.amazonaws.com/fairseq-py/models/wmt14.en-fr.joined-dict.transformer.tar.bz2
Resolving s3.amazonaws.com (s3.amazonaws.com)... 52.216.207.205
Connecting to s3.amazonaws.com (s3.amazonaws.com)|52.216.207.205|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2020-03-31 17:58:31 ERROR 404: Not Found.
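(Note: the fairseq checkpoints appear to have moved off the old s3.amazonaws.com/fairseq-py bucket; at the time of editing, the same archive seems to be hosted at https://dl.fbaipublicfiles.com/fairseq/models/wmt14.en-fr.joined-dict.transformer.tar.bz2, though this has not been verified here.)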
Hi,
I am trying to understand why we need a different normalization factor for the last layer of BERT compared to all the other layers:
https://github.com/pmichel31415/pytorch-pretrained-BERT/blob/18a86a7035cf8a48d16c101a66e439bf6ab342f1/examples/classifier_eval.py#L246
vs
https://github.com/pmichel31415/pytorch-pretrained-BERT/blob/18a86a7035cf8a48d16c101a66e439bf6ab342f1/examples/classifier_eval.py#L247
Hi,
thanks for your code! The pruning works great using masking. However, when I tried to actually prune the model to see if there's a speedup, it fails:
bash experiments/BERT/heads_pruning.sh SST-2 --actually_prune
It looks like new_layer in prune_linear_layer has to be moved to the correct device: new_layer.to(layer.weight.device).
13:09:27-INFO: Evaluating following pruning strategy
13:09:27-INFO: 9:3 10:10 11:3,7,8,9,10
13:09:27-INFO: ***** Running evaluation *****
13:09:27-INFO: Num examples = 872
13:09:27-INFO: Batch size = 32
Evaluating: 0%| | 0/28 [00:00<?, ?it/s]Traceback (most recent call last):
File "pytorch-pretrained-BERT/examples/run_classifier.py", line 578, in <module>
main()
File "pytorch-pretrained-BERT/examples/run_classifier.py", line 514, in main
scorer=processor.scorer,
File "/home/glock/projects/are-16-heads-really-better-than-1/pytorch-pretrained-BERT/examples/classifier_eval.py", line 78, in evaluate
input_ids, segment_ids, input_mask, label_ids)
File "/home/glock/.pyenv/versions/pruning/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/home/glock/projects/are-16-heads-really-better-than-1/pytorch-pretrained-BERT/pytorch_pretrained_bert/modeling.py", line 1072, in forward
output_all_encoded_layers=False, return_att=return_att)
File "/home/glock/.pyenv/versions/pruning/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/home/glock/projects/are-16-heads-really-better-than-1/pytorch-pretrained-BERT/pytorch_pretrained_bert/modeling.py", line 769, in forward
output_all_encoded_layers=output_all_encoded_layers)
File "/home/glock/.pyenv/versions/pruning/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/home/glock/projects/are-16-heads-really-better-than-1/pytorch-pretrained-BERT/pytorch_pretrained_bert/modeling.py", line 458, in forward
hidden_states, attn = layer_module(hidden_states, attention_mask)
File "/home/glock/.pyenv/versions/pruning/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/home/glock/projects/are-16-heads-really-better-than-1/pytorch-pretrained-BERT/pytorch_pretrained_bert/modeling.py", line 441, in forward
attention_output, attn = self.attention(hidden_states, attention_mask)
File "/home/glock/.pyenv/versions/pruning/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/home/glock/projects/are-16-heads-really-better-than-1/pytorch-pretrained-BERT/pytorch_pretrained_bert/modeling.py", line 335, in forward
self_output, attn = self.self(input_tensor, attention_mask)
File "/home/glock/.pyenv/versions/pruning/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/home/glock/projects/are-16-heads-really-better-than-1/pytorch-pretrained-BERT/pytorch_pretrained_bert/modeling.py", line 274, in forward
mixed_query_layer = self.query(hidden_states)
File "/home/glock/.pyenv/versions/pruning/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/home/glock/.pyenv/versions/pruning/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 92, in forward
return F.linear(input, self.weight, self.bias)
File "/home/glock/.pyenv/versions/pruning/lib/python3.7/site-packages/torch/nn/functional.py", line 1408, in linear
output = input.matmul(weight.t())
RuntimeError: size mismatch, m1: [4096 x 768], m2: [576 x 768] at /opt/conda/conda-bld/pytorch_1556653114079/work/aten/src/THC/generic/THCTensorMathBlas.cu:268
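For reference, a sketch of a HuggingFace-style prune_linear_layer helper with the device fix mentioned above. This is an illustration under assumed signatures, not the fork's exact code:

import torch
from torch import nn

def prune_linear_layer(layer: nn.Linear, index: torch.LongTensor, dim: int = 0) -> nn.Linear:
    # Keep only the rows (dim=0) or columns (dim=1) of `layer` selected by `index`.
    index = index.to(layer.weight.device)
    W = layer.weight.index_select(dim, index).clone().detach()
    if layer.bias is not None:
        b = (layer.bias if dim == 1 else layer.bias[index]).clone().detach()
    new_size = list(layer.weight.size())
    new_size[dim] = len(index)
    new_layer = nn.Linear(new_size[1], new_size[0], bias=layer.bias is not None)
    # The fix: keep the pruned layer on the same device as the original,
    # otherwise CPU weights get mixed into a GPU forward pass.
    new_layer = new_layer.to(layer.weight.device)
    with torch.no_grad():
        new_layer.weight.copy_(W.contiguous())
        if layer.bias is not None:
            new_layer.bias.copy_(b.contiguous())
    return new_layer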
Hi,
I am trying to reproduce your BERT results. I followed the prerequisites:
# Pytorch pretrained BERT
git clone https://github.com/pmichel31415/pytorch-pretrained-BERT
cd pytorch-pretrained-BERT
git checkout paul
cd ..
# Install pytorch-pretrained-BERT:
cd pytorch-pretrained-BERT
pip install .
cd ..
# Run the code:
bash experiments/BERT/heads_ablation.sh MNLI
But I got this error:
02:06:57-INFO: Weights of BertForSequenceClassification not initialized from pretrained model: ['classifier.weight', 'classifier.bias']
02:06:57-INFO: Weights from pretrained model not used in BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
Traceback (most recent call last):
File "pytorch-pretrained-BERT/examples/run_classifier.py", line 582, in <module>
main()
File "pytorch-pretrained-BERT/examples/run_classifier.py", line 275, in main
model.bert.mask_heads(to_prune)
File "/home/guest/anaconda3/envs/huggingface_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 594, in __getattr__
type(self).__name__, name))
AttributeError: 'DataParallel' object has no attribute 'bert'
1(standard_in) 2: syntax error
0.00000(standard_in) 2: syntax error
0.00000(standard_in) 2: syntax error
0.00000(standard_in) 2: syntax error
0.00000(standard_in) 2: syntax error
0.00000(standard_in) 2: syntax error
Any idea or suggestion?
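This looks like the usual DataParallel wrapping issue: DataParallel proxies forward() but not custom attributes like bert. A minimal workaround sketch, assuming the variables from the traceback above:

import torch

# Reach through DataParallel's `.module` to get the underlying model
# before calling the fork's head-masking API.
bare_model = model.module if isinstance(model, torch.nn.DataParallel) else model
bare_model.bert.mask_heads(to_prune)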
Hi @pmichel31415!
1. In are-16-heads-really-better-than-1/experiments/MT/prune_wmt.sh you have --raw-text $EXTRA_OPTIONS, and I don't know what it means. Can you explain it and how to use it? Is it the original reference text or something?
2. I also don't know how to use --transformer-mask-heads. Can you show me an example?
The README asks us to check out a branch that doesn't seem to have been pushed to GitHub; the repository has no actual code in it.
1. (1) I do this and get a pruned model:
model.bert.prune_heads(to_prune)
(2) I then set n_retrain_steps_after_pruning to a value greater than 0 to retrain my pruned model. Is that OK?
2. I don't understand the difference between the method above and retrain_pruned_heads (the method below).
Thank you!
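For concreteness, a generic prune-then-retrain loop; train_dataloader, the forward signature, and the hyperparameters are assumptions for illustration, not the repo's exact retraining code:

import torch

# Layer -> head indices, in the same format as the pruning-strategy logs above.
to_prune = {9: [3], 10: [10], 11: [3, 7, 8, 9, 10]}
model.bert.prune_heads(to_prune)  # (1) physically remove the heads

# (2) fine-tune the pruned model for a fixed number of steps
optimizer = torch.optim.Adam(model.parameters(), lr=3e-5)
model.train()
for step, batch in enumerate(train_dataloader):
    if step >= n_retrain_steps_after_pruning:
        break
    loss = model(*batch)  # assumed to return the training loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()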