pmichel31415 / are-16-heads-really-better-than-1
Code for the paper "Are Sixteen Heads Really Better than One?"
License: MIT License
Sorry to bother you. I ran into a bug while running "heads_pruning.sh", and the error is:
12:21:27-INFO: ***** Running evaluation *****
12:21:27-INFO: Num examples = 9815
12:21:27-INFO: Batch size = 32
Evaluating: 0% 0/307 [00:00<?, ?it/s]Traceback (most recent call last):
File "pytorch-pretrained-BERT/examples/run_classifier.py", line 585, in
main()
File "pytorch-pretrained-BERT/examples/run_classifier.py", line 521, in main
scorer=processor.scorer,
File "/content/drive/My Drive/XAI in NLP/pytorch-pretrained-BERT/examples/classifier_eval.py", line 78, in evaluate
input_ids, segment_ids, input_mask, label_ids)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/content/drive/My Drive/XAI in NLP/pytorch-pretrained-BERT/pytorch_pretrained_bert/modeling.py", line 1072, in forward
output_all_encoded_layers=False, return_att=return_att)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/content/drive/My Drive/XAI in NLP/pytorch-pretrained-BERT/pytorch_pretrained_bert/modeling.py", line 769, in forward
output_all_encoded_layers=output_all_encoded_layers)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/content/drive/My Drive/XAI in NLP/pytorch-pretrained-BERT/pytorch_pretrained_bert/modeling.py", line 458, in forward
hidden_states, attn = layer_module(hidden_states, attention_mask)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/content/drive/My Drive/XAI in NLP/pytorch-pretrained-BERT/pytorch_pretrained_bert/modeling.py", line 441, in forward
attention_output, attn = self.attention(hidden_states, attention_mask)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/content/drive/My Drive/XAI in NLP/pytorch-pretrained-BERT/pytorch_pretrained_bert/modeling.py", line 335, in forward
self_output, attn = self.self(input_tensor, attention_mask)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/content/drive/My Drive/XAI in NLP/pytorch-pretrained-BERT/pytorch_pretrained_bert/modeling.py", line 307, in forward
self.context_layer_val.retain_grad()
File "/usr/local/lib/python3.6/dist-packages/torch/tensor.py", line 326, in retain_grad
raise RuntimeError("can't retain_grad on Tensor that has requires_grad=False")
RuntimeError: can't retain_grad on Tensor that has requires_grad=False
Evaluating: 0% 0/307 [00:00<?, ?it/s]
I don't know how to fix it. Hope you can help me!
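For context, this error usually means the forward pass ran under torch.no_grad(), so the attention outputs have requires_grad=False and retain_grad() refuses to attach. A minimal, self-contained illustration of the behavior (not the repo's code):

import torch

x = torch.randn(2, 3, requires_grad=True)

# Under no_grad(), intermediate results do not require grad,
# so calling retain_grad() on them raises exactly this RuntimeError.
with torch.no_grad():
    y = x * 2
print(y.requires_grad)  # False

# With gradients enabled (e.g. when computing head importance scores),
# retain_grad() on the intermediate tensor is legal.
y = x * 2
y.retain_grad()
print(y.requires_grad)  # True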
Hi,
I am running the command
bash experiments/BERT/heads_ablation.sh MNLI
I am getting the following error
Traceback (most recent call last):
File "pytorch-pretrained-BERT/examples/run_classifier.py", line 578, in <module>
main()
File "pytorch-pretrained-BERT/examples/run_classifier.py", line 275, in main
model.bert.mask_heads(to_prune)
File "/home/pdguest/ishita/py-env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 535, in __getattr__
type(self).__name__, name))
AttributeError: 'BertModel' object has no attribute 'mask_heads'
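For what it's worth, mask_heads appears to come from the pmichel31415/pytorch-pretrained-BERT fork (the paul branch quoted in the prerequisite steps further down this page), not from the stock package, so this error usually means the vanilla pytorch_pretrained_bert is installed. A quick sanity check (hypothetical snippet, not part of the repo):

from pytorch_pretrained_bert.modeling import BertModel

# False here means the unmodified package is on the path and the fork
# (with head masking/pruning support) still needs to be installed.
print(hasattr(BertModel, "mask_heads"))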
Hi, I'm currently working on attention head pruning for these models.
In your reported experiments, you fine-tuned BERT when training on the downstream MNLI task, right?
But does it also work to freeze the BERT representation after pruning and then train on the downstream MNLI task?
I'd appreciate your answer.
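Whatever the answer about the paper's setup, freezing the encoder itself is straightforward in PyTorch. A minimal sketch, assuming a BertForSequenceClassification-style module named model (both names are assumptions for illustration):

import torch

# Freeze the (pruned) BERT encoder and train only the classifier head.
for param in model.bert.parameters():
    param.requires_grad = False

# Give the optimizer only the parameters that still require gradients.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)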
Hello,
I am trying to run the MT ablation experiments. When I ran the command
wget https://s3.amazonaws.com/fairseq-py/models/wmt14.en-fr.joined-dict.transformer.tar.bz2
I got the following error:
--2020-03-31 17:58:31--  https://s3.amazonaws.com/fairseq-py/models/wmt14.en-fr.joined-dict.transformer.tar.bz2
Resolving s3.amazonaws.com (s3.amazonaws.com)... 52.216.207.205
Connecting to s3.amazonaws.com (s3.amazonaws.com)|52.216.207.205|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2020-03-31 17:58:31 ERROR 404: Not Found.
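(Note: the fairseq checkpoints appear to have moved off the old s3.amazonaws.com/fairseq-py bucket; at the time of editing, the same archive seems to be hosted at https://dl.fbaipublicfiles.com/fairseq/models/wmt14.en-fr.joined-dict.transformer.tar.bz2, though this has not been verified here.)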
Hi,
I am trying to understand why we need a different normalization factor for the last layer of BERT compared to all the other layers:
https://github.com/pmichel31415/pytorch-pretrained-BERT/blob/18a86a7035cf8a48d16c101a66e439bf6ab342f1/examples/classifier_eval.py#L246
vs
https://github.com/pmichel31415/pytorch-pretrained-BERT/blob/18a86a7035cf8a48d16c101a66e439bf6ab342f1/examples/classifier_eval.py#L247
Hi,
thanks for your code! The pruning works great using masking. However, when I tried to actually prune the model to see if there's a speedup, it fails:
bash experiments/BERT/heads_pruning.sh SST-2 --actually_prune
It looks like new_layer in prune_linear_layer has to be moved to the correct device: new_layer.to(layer.weight.device).
13:09:27-INFO: Evaluating following pruning strategy
13:09:27-INFO: 9:3 10:10 11:3,7,8,9,10
13:09:27-INFO: ***** Running evaluation *****
13:09:27-INFO: Num examples = 872
13:09:27-INFO: Batch size = 32
Evaluating: 0%| | 0/28 [00:00<?, ?it/s]Traceback (most recent call last):
File "pytorch-pretrained-BERT/examples/run_classifier.py", line 578, in <module>
main()
File "pytorch-pretrained-BERT/examples/run_classifier.py", line 514, in main
scorer=processor.scorer,
File "/home/glock/projects/are-16-heads-really-better-than-1/pytorch-pretrained-BERT/examples/classifier_eval.py", line 78, in evaluate
input_ids, segment_ids, input_mask, label_ids)
File "/home/glock/.pyenv/versions/pruning/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/home/glock/projects/are-16-heads-really-better-than-1/pytorch-pretrained-BERT/pytorch_pretrained_bert/modeling.py", line 1072, in forward
output_all_encoded_layers=False, return_att=return_att)
File "/home/glock/.pyenv/versions/pruning/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/home/glock/projects/are-16-heads-really-better-than-1/pytorch-pretrained-BERT/pytorch_pretrained_bert/modeling.py", line 769, in forward
output_all_encoded_layers=output_all_encoded_layers)
File "/home/glock/.pyenv/versions/pruning/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/home/glock/projects/are-16-heads-really-better-than-1/pytorch-pretrained-BERT/pytorch_pretrained_bert/modeling.py", line 458, in forward
hidden_states, attn = layer_module(hidden_states, attention_mask)
File "/home/glock/.pyenv/versions/pruning/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/home/glock/projects/are-16-heads-really-better-than-1/pytorch-pretrained-BERT/pytorch_pretrained_bert/modeling.py", line 441, in forward
attention_output, attn = self.attention(hidden_states, attention_mask)
File "/home/glock/.pyenv/versions/pruning/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/home/glock/projects/are-16-heads-really-better-than-1/pytorch-pretrained-BERT/pytorch_pretrained_bert/modeling.py", line 335, in forward
self_output, attn = self.self(input_tensor, attention_mask)
File "/home/glock/.pyenv/versions/pruning/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/home/glock/projects/are-16-heads-really-better-than-1/pytorch-pretrained-BERT/pytorch_pretrained_bert/modeling.py", line 274, in forward
mixed_query_layer = self.query(hidden_states)
File "/home/glock/.pyenv/versions/pruning/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/home/glock/.pyenv/versions/pruning/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 92, in forward
return F.linear(input, self.weight, self.bias)
File "/home/glock/.pyenv/versions/pruning/lib/python3.7/site-packages/torch/nn/functional.py", line 1408, in linear
output = input.matmul(weight.t())
RuntimeError: size mismatch, m1: [4096 x 768], m2: [576 x 768] at /opt/conda/conda-bld/pytorch_1556653114079/work/aten/src/THC/generic/THCTensorMathBlas.cu:268
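For reference, a sketch of a HuggingFace-style prune_linear_layer helper with the device fix mentioned above. This is an illustration under assumed signatures, not the fork's exact code:

import torch
from torch import nn

def prune_linear_layer(layer: nn.Linear, index: torch.LongTensor, dim: int = 0) -> nn.Linear:
    # Keep only the rows (dim=0) or columns (dim=1) of `layer` selected by `index`.
    index = index.to(layer.weight.device)
    W = layer.weight.index_select(dim, index).clone().detach()
    if layer.bias is not None:
        b = (layer.bias if dim == 1 else layer.bias[index]).clone().detach()
    new_size = list(layer.weight.size())
    new_size[dim] = len(index)
    new_layer = nn.Linear(new_size[1], new_size[0], bias=layer.bias is not None)
    # The fix: keep the pruned layer on the same device as the original,
    # otherwise CPU weights get mixed into a GPU forward pass.
    new_layer = new_layer.to(layer.weight.device)
    with torch.no_grad():
        new_layer.weight.copy_(W.contiguous())
        if layer.bias is not None:
            new_layer.bias.copy_(b.contiguous())
    return new_layer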
Hi,
I am trying to reproduce your BERT results. I followed the prerequisites:
# Pytorch pretrained BERT
git clone https://github.com/pmichel31415/pytorch-pretrained-BERT
cd pytorch-pretrained-BERT
git checkout paul
cd ..
# Install pytorch-pretrained-BERT:
cd pytorch-pretrained-BERT
pip install .
cd ..
# Run the code:
bash experiments/BERT/heads_ablation.sh MNLI
But I got this error:
02:06:57-INFO: Weights of BertForSequenceClassification not initialized from pretrained model: ['classifier.weight', 'classifier.bias']
02:06:57-INFO: Weights from pretrained model not used in BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
Traceback (most recent call last):
File "pytorch-pretrained-BERT/examples/run_classifier.py", line 582, in <module>
main()
File "pytorch-pretrained-BERT/examples/run_classifier.py", line 275, in main
model.bert.mask_heads(to_prune)
File "/home/guest/anaconda3/envs/huggingface_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 594, in __getattr__
type(self).__name__, name))
AttributeError: 'DataParallel' object has no attribute 'bert'
1(standard_in) 2: syntax error
0.00000(standard_in) 2: syntax error
0.00000(standard_in) 2: syntax error
0.00000(standard_in) 2: syntax error
0.00000(standard_in) 2: syntax error
0.00000(standard_in) 2: syntax error
Any idea or suggestion?
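This looks like the usual DataParallel wrapping issue: DataParallel proxies forward() but not custom attributes like bert. A minimal workaround sketch, assuming the variables from the traceback above:

import torch

# Reach through DataParallel's `.module` to get the underlying model
# before calling the fork's head-masking API.
bare_model = model.module if isinstance(model, torch.nn.DataParallel) else model
bare_model.bert.mask_heads(to_prune)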
Hi @pmichel31415!
1. In are-16-heads-really-better-than-1/experiments/MT/prune_wmt.sh you have --raw-text $EXTRA_OPTIONS, and I don't know what it means. Can you explain it and how to use it? Is it the original reference text or something?
2. I also don't know how to use --transformer-mask-heads. Can you show me an example?
The README asks us to check out a branch that doesn't seem to have been pushed to GitHub; the repository has no actual code in it.
1. (1) I do this and get a pruned model:
model.bert.prune_heads(to_prune)
(2) I then set n_retrain_steps_after_pruning to a value greater than 0 to retrain my pruned model. Is that OK?
2. I don't understand the difference between the method above and retrain_pruned_heads (the method below).
Thank you!
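For concreteness, a generic prune-then-retrain loop; train_dataloader, the forward signature, and the hyperparameters are assumptions for illustration, not the repo's exact retraining code:

import torch

# Layer -> head indices, in the same format as the pruning-strategy logs above.
to_prune = {9: [3], 10: [10], 11: [3, 7, 8, 9, 10]}
model.bert.prune_heads(to_prune)  # (1) physically remove the heads

# (2) fine-tune the pruned model for a fixed number of steps
optimizer = torch.optim.Adam(model.parameters(), lr=3e-5)
model.train()
for step, batch in enumerate(train_dataloader):
    if step >= n_retrain_steps_after_pruning:
        break
    loss = model(*batch)  # assumed to return the training loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()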