2020-05-15 11:58:57,166 - INFO - matches:['L3_hidden_mse', 'L3_hidden_smmd'] 2020-

list index out of range after adding a match (textbrewer issue, CLOSED)

airaria commented on May 27, 2024

After adding a match, I get "list index out of range".

from textbrewer.

Comments (4)

airaria commented on May 27, 2024

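Does the attention list you return contain 13 elements? In the original transformer the attention list has a length of only 12. You probably need to subtract 1 from the indices: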
L4_attention_mse=[{"layer_T":2, "layer_S":0, "feature":"attention", "loss":"attention_mse", "weight":1},
{"layer_T":5, "layer_S":1, "feature":"attention", "loss":"attention_mse", "weight":1},
{"layer_T":8, "layer_S":2, "feature":"attention", "loss":"attention_mse", "weight":1},
{"layer_T":11, "layer_S":3, "feature":"attention", "loss":"attention_mse", "weight":1}]


airaria commented on May 27, 2024

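Here are a few key points you can check:

Does the (teacher/student) model return enough hidden states? For example, your code requires the teacher to return a list of length 13 (embedding + 12 hidden states) and the student to return a list of length 4 (embedding + 3 hidden states);
If you are using HuggingFace Transformers, you need to set config.output_hidden_states=True in the config of the corresponding model. See https://huggingface.co/transformers/model_doc/bert.html?highlight=output_hidden_states
Does the adaptor correctly pick up the model outputs and return a dict in which dict['hidden'] is the hidden_states returned by the model?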
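The checklist above can be sketched as an adaptor plus a quick length check. This is an illustration under assumptions rather than the library's reference code: it assumes the model output exposes `hidden_states` and `attentions` attributes (as Transformers outputs do when `output_hidden_states` and `output_attentions` are enabled; older versions return plain tuples instead), and it assumes the adaptor key for attention maps is `'attention'`, matching the feature name used in the match configs. `fake_outputs` is a stand-in object so the sketch runs without loading a model.

```python
from types import SimpleNamespace


def simple_adaptor(batch, model_outputs):
    """Map model outputs to the dict that the distiller reads.

    Assumes model_outputs has .hidden_states and .attentions attributes.
    """
    return {
        "hidden": model_outputs.hidden_states,
        "attention": model_outputs.attentions,
    }


# Stand-in for a 12-layer teacher: 13 hidden states (embedding + 12 layers)
# and 12 attention maps. Strings are placeholders for real tensors.
fake_outputs = SimpleNamespace(
    hidden_states=tuple(f"hidden_{i}" for i in range(13)),
    attentions=tuple(f"attention_{i}" for i in range(12)),
)

features = simple_adaptor(batch=None, model_outputs=fake_outputs)

# The two lists differ in length by one, so they need different index ranges:
# valid hidden indices are 0..12, valid attention indices are 0..11.
print("hidden states:", len(features["hidden"]))
print("attention maps:", len(features["attention"]))
```

Printing the two lengths once for the teacher and once for the student is usually enough to see whether a match config can index them safely.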


MarvinLong commented on May 27, 2024

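Here are a few key points you can check:

Does the (teacher/student) model return enough hidden states? For example, your code requires the teacher to return a list of length 13 (embedding + 12 hidden states) and the student to return a list of length 4 (embedding + 3 hidden states);
If you are using HuggingFace Transformers, you need to set config.output_hidden_states=True in the config of the corresponding model. See https://huggingface.co/transformers/model_doc/bert.html?highlight=output_hidden_states

Does the adaptor correctly pick up the model outputs and return a dict in which dict['hidden'] is the hidden_states returned by the model?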

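The attention indices still do not seem to line up. My understanding is that the attention index should be reduced by 1, but I am not sure whether that is right. Counting the embedding, the teacher has 13 hidden layers and 12 attention layers; the student has 5 hidden layers and 4 attention layers.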
L4_attention_mse=[{"layer_T":3, "layer_S":1, "feature":"attention", "loss":"attention_mse", "weight":1},
{"layer_T":6, "layer_S":2, "feature":"attention", "loss":"attention_mse", "weight":1},
{"layer_T":9, "layer_S":3, "feature":"attention", "loss":"attention_mse", "weight":1},
{"layer_T":12, "layer_S":4, "feature":"attention", "loss":"attention_mse", "weight":1}]


MarvinLong commented on May 27, 2024

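Does the attention list you return contain 13 elements? In the original transformer the attention list has a length of only 12. You probably need to subtract 1 from the indices: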
L4_attention_mse=[{"layer_T":2, "layer_S":0, "feature":"attention", "loss":"attention_mse", "weight":1},
{"layer_T":5, "layer_S":1, "feature":"attention", "loss":"attention_mse", "weight":1},
{"layer_T":8, "layer_S":2, "feature":"attention", "loss":"attention_mse", "weight":1},
{"layer_T":11, "layer_S":3, "feature":"attention", "loss":"attention_mse", "weight":1}]

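Yes, the indices do need to be reduced by one. After subtracting 1 from both the teacher and the student attention indices in matches.py, distillation with attention works without problems.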
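To avoid redoing the subtraction by hand, the attention matches can be generated from layer numbers that follow the hidden-state numbering (where index 0 is the embedding). This is a minimal sketch; `attention_matches_from_layers` is a hypothetical helper and is not part of textbrewer or its matches.py.

```python
def attention_matches_from_layers(layer_pairs, loss="attention_mse", weight=1):
    """Build attention match entries from (teacher_layer, student_layer) pairs.

    Layer numbers follow the hidden-state numbering (0 = embedding, 1 = first layer).
    The attention list has no embedding entry, so both indices are shifted down by one.
    """
    matches = []
    for teacher_layer, student_layer in layer_pairs:
        if teacher_layer < 1 or student_layer < 1:
            raise ValueError("the embedding output (index 0) has no attention map")
        matches.append({
            "layer_T": teacher_layer - 1,
            "layer_S": student_layer - 1,
            "feature": "attention",
            "loss": loss,
            "weight": weight,
        })
    return matches


# Same layer pairing as in this thread: teacher layers 3, 6, 9, 12 to student layers 1..4.
L4_attention_mse = attention_matches_from_layers([(3, 1), (6, 2), (9, 3), (12, 4)])
for entry in L4_attention_mse:
    print(entry)
```

The generated list has the same indices as the corrected config quoted above (teacher 2, 5, 8, 11 and student 0, 1, 2, 3).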
