
are-16-heads-really-better-than-1's Issues

RuntimeError: can't retain_grad on Tensor that has requires_grad=False

Sorry to bother you. I ran into a bug while running "heads_pruning.sh", and the error is:

12:21:27-INFO: ***** Running evaluation *****
12:21:27-INFO: Num examples = 9815
12:21:27-INFO: Batch size = 32
Evaluating: 0% 0/307 [00:00<?, ?it/s]Traceback (most recent call last):
  File "pytorch-pretrained-BERT/examples/run_classifier.py", line 585, in <module>
    main()
  File "pytorch-pretrained-BERT/examples/run_classifier.py", line 521, in main
    scorer=processor.scorer,
  File "/content/drive/My Drive/XAI in NLP/pytorch-pretrained-BERT/examples/classifier_eval.py", line 78, in evaluate
    input_ids, segment_ids, input_mask, label_ids)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/drive/My Drive/XAI in NLP/pytorch-pretrained-BERT/pytorch_pretrained_bert/modeling.py", line 1072, in forward
    output_all_encoded_layers=False, return_att=return_att)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/drive/My Drive/XAI in NLP/pytorch-pretrained-BERT/pytorch_pretrained_bert/modeling.py", line 769, in forward
    output_all_encoded_layers=output_all_encoded_layers)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/drive/My Drive/XAI in NLP/pytorch-pretrained-BERT/pytorch_pretrained_bert/modeling.py", line 458, in forward
    hidden_states, attn = layer_module(hidden_states, attention_mask)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/drive/My Drive/XAI in NLP/pytorch-pretrained-BERT/pytorch_pretrained_bert/modeling.py", line 441, in forward
    attention_output, attn = self.attention(hidden_states, attention_mask)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/drive/My Drive/XAI in NLP/pytorch-pretrained-BERT/pytorch_pretrained_bert/modeling.py", line 335, in forward
    self_output, attn = self.self(input_tensor, attention_mask)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/drive/My Drive/XAI in NLP/pytorch-pretrained-BERT/pytorch_pretrained_bert/modeling.py", line 307, in forward
    self.context_layer_val.retain_grad()
  File "/usr/local/lib/python3.6/dist-packages/torch/tensor.py", line 326, in retain_grad
    raise RuntimeError("can't retain_grad on Tensor that has requires_grad=False")
RuntimeError: can't retain_grad on Tensor that has requires_grad=False
Evaluating: 0% 0/307 [00:00<?, ?it/s]

I don't know how to fix it. Hope you can help me!
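For context, here is a minimal standalone sketch of the failure mode (my own illustration, not the repository's fix): retain_grad() can only be called on a tensor that is part of an autograd graph, so the call at modeling.py line 307 raises exactly this error if the importance-score pass runs with gradients disabled (for example under torch.no_grad()). Guarding the call, or making sure gradients are enabled during that evaluation, avoids the crash.

import torch

x = torch.randn(2, 3, requires_grad=True)

with torch.no_grad():
    y = x * 2
    # here y.requires_grad is False, so y.retain_grad() would raise this same RuntimeError

y = x * 2            # outside no_grad, y is a non-leaf tensor in the autograd graph
if y.requires_grad:  # guarding the call avoids the crash in eval-only settings
    y.retain_grad()
y.sum().backward()
print(y.grad)        # gradients on the non-leaf tensor are now retained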

Not able to prune the BERT model

Hi,
I am running the command

bash experiments/BERT/heads_ablation.sh MNLI

I am getting the following error:

Traceback (most recent call last):
  File "pytorch-pretrained-BERT/examples/run_classifier.py", line 578, in <module>
    main()
  File "pytorch-pretrained-BERT/examples/run_classifier.py", line 275, in main
    model.bert.mask_heads(to_prune)
  File "/home/pdguest/ishita/py-env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 535, in __getattr__
    type(self).__name__, name))
AttributeError: 'BertModel' object has no attribute 'mask_heads'
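A hedged diagnostic sketch (assuming mask_heads is only defined on the 'paul' branch of pmichel31415/pytorch-pretrained-BERT and not in a stock installation of the package): check which installation Python is actually importing and whether it carries the pruning additions.

import pytorch_pretrained_bert
from pytorch_pretrained_bert.modeling import BertModel

print(pytorch_pretrained_bert.__file__)   # which installation is actually being imported?
print(hasattr(BertModel, "mask_heads"))   # False suggests the stock package without the pruning additions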

Is BERT finetuned after pruning?

Hi, I'm currently working on attention head pruning for models.
I think that in your reported experiments you fine-tuned BERT when training on the downstream MNLI task, right?
But does it also work to fix the BERT representation after pruning and then train on the downstream MNLI task?
I'd appreciate your answer.
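A minimal sketch of the "fixed representation" variant being asked about (assuming a BertForSequenceClassification model from pytorch-pretrained-bert; this is not necessarily the paper's reported setting): freeze the pruned encoder and optimize only the classification head.

import torch
from pytorch_pretrained_bert import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=3)
# ... prune or mask heads here ...
for param in model.bert.parameters():
    param.requires_grad = False   # keep the BERT representation fixed

optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad],  # classifier weights only
    lr=1e-3,
)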

Not able to obtain pretrained WMT model

Hello,

I am trying to run the MT ablation experiments. When I ran the command

wget https://s3.amazonaws.com/fairseq-py/models/wmt14.en-fr.joined-dict.transformer.tar.bz2

I got the following error:

--2020-03-31 17:58:31--  https://s3.amazonaws.com/fairseq-py/models/wmt14.en-fr.joined-dict.transformer.tar.bz2
Resolving s3.amazonaws.com (s3.amazonaws.com)... 52.216.207.205
Connecting to s3.amazonaws.com (s3.amazonaws.com)|52.216.207.205|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2020-03-31 17:58:31 ERROR 404: Not Found.

Why do we need different normalization for all the layers compared to the last layer in BERT during importance score calculation?

BERT actually_prune option not working

Hi,

thanks for your code! The pruning works great using masking. However, when I try to actually prune the model to see if there's a speedup, it fails.

bash experiments/BERT/heads_pruning.sh SST-2 --actually_prune
  1. The new_layer in prune_linear_layer has to be moved to the correct device (a sketch of this fix follows the traceback below):
     new_layer.to(layer.weight.device)
  2. The forward function fails because the input and output shapes of the previous layer do not seem to match:
13:09:27-INFO: Evaluating following pruning strategy
13:09:27-INFO: 9:3 10:10 11:3,7,8,9,10
13:09:27-INFO: ***** Running evaluation *****
13:09:27-INFO:   Num examples = 872
13:09:27-INFO:   Batch size = 32
Evaluating:   0%|                                                                | 0/28 [00:00<?, ?it/s]Traceback (most recent call last):
  File "pytorch-pretrained-BERT/examples/run_classifier.py", line 578, in <module>
    main()
  File "pytorch-pretrained-BERT/examples/run_classifier.py", line 514, in main
    scorer=processor.scorer,
  File "/home/glock/projects/are-16-heads-really-better-than-1/pytorch-pretrained-BERT/examples/classifier_eval.py", line 78, in evaluate
    input_ids, segment_ids, input_mask, label_ids)
  File "/home/glock/.pyenv/versions/pruning/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/glock/projects/are-16-heads-really-better-than-1/pytorch-pretrained-BERT/pytorch_pretrained_bert/modeling.py", line 1072, in forward
    output_all_encoded_layers=False, return_att=return_att)
  File "/home/glock/.pyenv/versions/pruning/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/glock/projects/are-16-heads-really-better-than-1/pytorch-pretrained-BERT/pytorch_pretrained_bert/modeling.py", line 769, in forward
    output_all_encoded_layers=output_all_encoded_layers)
  File "/home/glock/.pyenv/versions/pruning/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/glock/projects/are-16-heads-really-better-than-1/pytorch-pretrained-BERT/pytorch_pretrained_bert/modeling.py", line 458, in forward
    hidden_states, attn = layer_module(hidden_states, attention_mask)
  File "/home/glock/.pyenv/versions/pruning/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/glock/projects/are-16-heads-really-better-than-1/pytorch-pretrained-BERT/pytorch_pretrained_bert/modeling.py", line 441, in forward
    attention_output, attn = self.attention(hidden_states, attention_mask)
  File "/home/glock/.pyenv/versions/pruning/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/glock/projects/are-16-heads-really-better-than-1/pytorch-pretrained-BERT/pytorch_pretrained_bert/modeling.py", line 335, in forward
    self_output, attn = self.self(input_tensor, attention_mask)
  File "/home/glock/.pyenv/versions/pruning/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/glock/projects/are-16-heads-really-better-than-1/pytorch-pretrained-BERT/pytorch_pretrained_bert/modeling.py", line 274, in forward
    mixed_query_layer = self.query(hidden_states)
  File "/home/glock/.pyenv/versions/pruning/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/glock/.pyenv/versions/pruning/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 92, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/glock/.pyenv/versions/pruning/lib/python3.7/site-packages/torch/nn/functional.py", line 1408, in linear
    output = input.matmul(weight.t())
RuntimeError: size mismatch, m1: [4096 x 768], m2: [576 x 768] at /opt/conda/conda-bld/pytorch_1556653114079/work/aten/src/THC/generic/THCTensorMathBlas.cu:268
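For reference, a hedged sketch of the device fix from point 1 above; it assumes prune_linear_layer builds a smaller nn.Linear from a subset of rows or columns of the original layer, and it is not the repository's exact code.

import torch
from torch import nn

def prune_linear_layer(layer, index, dim=0):
    """Return a smaller nn.Linear keeping only the rows (dim=0) or columns (dim=1) listed in index."""
    index = index.to(layer.weight.device)
    weight = layer.weight.index_select(dim, index).clone().detach()
    if layer.bias is not None:
        bias = (layer.bias if dim == 1 else layer.bias[index]).clone().detach()
    new_size = list(layer.weight.size())
    new_size[dim] = len(index)
    new_layer = nn.Linear(new_size[1], new_size[0], bias=layer.bias is not None)
    new_layer = new_layer.to(layer.weight.device)  # the fix: keep the pruned layer on the original device
    with torch.no_grad():
        new_layer.weight.copy_(weight.contiguous())
        if layer.bias is not None:
            new_layer.bias.copy_(bias.contiguous())
    return new_layer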

Is the code still able to run?

Hi,

I am trying to reproduce your BERT results. I followed the prerequisites:

# Pytorch pretrained BERT
git clone https://github.com/pmichel31415/pytorch-pretrained-BERT
cd pytorch-pretrained-BERT
git checkout paul
cd ..
# Install pytorch-pretrained-BERT:
cd pytorch-pretrained-BERT
pip install .
cd ..
# Run the code:
bash experiments/BERT/heads_ablation.sh MNLI

But I got this error:

02:06:57-INFO: Weights of BertForSequenceClassification not initialized from pretrained model: ['classifier.weight', 'classifier.bias']
02:06:57-INFO: Weights from pretrained model not used in BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
Traceback (most recent call last):
  File "pytorch-pretrained-BERT/examples/run_classifier.py", line 582, in <module>
    main()
  File "pytorch-pretrained-BERT/examples/run_classifier.py", line 275, in main
    model.bert.mask_heads(to_prune)
  File "/home/guest/anaconda3/envs/huggingface_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 594, in __getattr__
    type(self).__name__, name))
AttributeError: 'DataParallel' object has no attribute 'bert'


1(standard_in) 2: syntax error
        0.00000(standard_in) 2: syntax error
        0.00000(standard_in) 2: syntax error
        0.00000(standard_in) 2: syntax error
        0.00000(standard_in) 2: syntax error
        0.00000(standard_in) 2: syntax error

Any ideas or suggestions?
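One possible workaround sketch, assuming the crash comes from the model being wrapped in torch.nn.DataParallel (which hides the wrapped module's attributes behind .module); running on a single GPU would avoid the wrapping altogether.

from torch import nn

def unwrap(model):
    """Return the underlying module when model is wrapped in nn.DataParallel."""
    return model.module if isinstance(model, nn.DataParallel) else model

# In run_classifier.py this would mean, for example:
# unwrap(model).bert.mask_heads(to_prune)   # instead of model.bert.mask_heads(to_prune)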

about the params: --raw-text and --transformer-mask-heads

Hi @pmichel31415!
1. In are-16-heads-really-better-than-1/experiments/MT/prune_wmt.sh you pass --raw-text $EXTRA_OPTIONS, and I don't know what it means. Can you explain it and how to use it? Is it the original reference text or something else?
2. I also don't know how to use --transformer-mask-heads. Can you show me an example?

No code on master?

The README asks us to check out a branch that is not master. I don't think that branch has been uploaded, as the GitHub repository has no actual code in it.

a question about run_classifier.py

1.
(1) I do this and get a pruned model:
model.bert.prune_heads(to_prune)
(2) I set n_retrain_steps_after_pruning to a value greater than 0
next:
aaa
then:

bbb

to retrain my pruned model. Is that OK?

2. I don't understand the difference between the above method and retrain_pruned_heads (the following method):

cccc

Thank you!
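For what it's worth, a generic sketch of the flow in question 1, assuming prune_heads physically removes the selected heads and "retraining" simply means continuing the usual fine-tuning loop for a fixed number of steps; the retrain_after_pruning helper below is hypothetical and not the script's actual handling of n_retrain_steps_after_pruning.

def retrain_after_pruning(model, dataloader, optimizer, n_steps):
    """Continue fine-tuning the whole, already-pruned model for n_steps batches."""
    model.train()
    for step, batch in enumerate(dataloader):
        if step >= n_steps:
            break
        input_ids, segment_ids, input_mask, label_ids = batch
        loss = model(input_ids, segment_ids, input_mask, label_ids)  # pytorch-pretrained-bert models return the loss when labels are given
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# model.bert.prune_heads(to_prune)                            # (1) prune
# retrain_after_pruning(model, train_dataloader, optimizer,   # (2) then retrain for a few steps
#                       n_steps=n_retrain_steps_after_pruning)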
