
jsksxs360 / how-to-use-transformers

688 stars · 8 watchers · 94 forks · 17.63 MB

A quick-start tutorial for the Transformers library

Home Page: https://transformers.run/

License: Apache License 2.0

Languages: Python 97.93%, Shell 2.07%
Topics: nlp, pytorch, transformers, bert, classification, natural-language-processing, ner, prompt, qa, sentiment-classification

how-to-use-transformers's People

Contributors: jsksxs360


how-to-use-transformers's Issues

question about attention mask

Regarding this part of Chapter 9:

"To simplify data processing, we do not set the labels of special tokens such as [CLS], [SEP], and [PAD] to -100; instead we keep their original value of 0 and rely on the attention mask to exclude the padding positions when computing the loss."

The attention mask is 1 at the [CLS] position, so `active_loss = attention_mask.view(-1) == 1` also includes [CLS]. Shouldn't it be masked out as well?
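One possible fix, sketched below under the assumption of a fast tokenizer and the chapter's label convention (`tokenizer` and `sentence` stand in for the chapter's own variables): assign -100 to every special token during preprocessing, so that `CrossEntropyLoss`, whose `ignore_index` defaults to -100, skips [CLS] and [SEP] as well as padding.

```python
# Sketch: mark all special tokens with -100 so the default
# ignore_index of CrossEntropyLoss excludes them from the loss.
# `tokenizer` (a fast tokenizer) and `sentence` are assumed here.
encoding = tokenizer(sentence, truncation=True)
labels = [0] * len(encoding.tokens())
for idx, word_id in enumerate(encoding.word_ids()):
    if word_id is None:  # special tokens ([CLS], [SEP], [PAD]) map to no word
        labels[idx] = -100
```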

Isn't there a problem with the ROUGE evaluation for summarization? It computes the ROUGE score from only a single sample each time.

```python
import numpy as np
import torch
from rouge import Rouge
from tqdm.auto import tqdm

rouge = Rouge()

def test_loop(dataloader, model):
    # device, tokenizer and max_target_length come from the chapter's setup
    preds, labels = [], []
    model.eval()
    for batch_data in tqdm(dataloader):
        batch_data = batch_data.to(device)
        with torch.no_grad():
            generated_tokens = model.generate(
                batch_data["input_ids"],
                attention_mask=batch_data["attention_mask"],
                max_length=max_target_length,
                num_beams=4,
                no_repeat_ngram_size=2,
            ).cpu().numpy()
        if isinstance(generated_tokens, tuple):
            generated_tokens = generated_tokens[0]
        label_tokens = batch_data["labels"].cpu().numpy()

        decoded_preds = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
        label_tokens = np.where(label_tokens != -100, label_tokens, tokenizer.pad_token_id)
        decoded_labels = tokenizer.batch_decode(label_tokens, skip_special_tokens=True)

        preds += [' '.join(pred.strip()) for pred in decoded_preds]
        labels += [' '.join(label.strip()) for label in decoded_labels]
    scores = rouge.get_scores(hyps=preds, refs=labels)[0]  # [0] keeps only the first sample's scores
    result = {key: value['f'] * 100 for key, value in scores.items()}
    result['avg'] = np.mean(list(result.values()))
    print(f"Rouge1: {result['rouge-1']:>0.2f} Rouge2: {result['rouge-2']:>0.2f} RougeL: {result['rouge-l']:>0.2f}\n")
    return result
```
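A sketch of one possible fix: the `rouge` package's `get_scores` accepts `avg=True`, which averages the scores over all prediction/reference pairs instead of keeping only the first one (the `[0]` above).

```python
# Average ROUGE-1/2/L over the whole evaluation set
scores = rouge.get_scores(hyps=preds, refs=labels, avg=True)
result = {key: value['f'] * 100 for key, value in scores.items()}
```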

多卡训练

Hello, when I try multi-GPU training in the summarization chapter, I get this error: AttributeError: 'DataParallel' object has no attribute 'prepare_decoder_input_ids_from_labels'. Does this mean the model itself cannot be trained on multiple GPUs?
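For what it's worth, the error usually comes from the `DataParallel` wrapper rather than the model: the wrapper only dispatches `forward()`, so model-specific helpers have to be called on the wrapped module. A minimal sketch, assuming `model` is wrapped in `torch.nn.DataParallel` and `batch_data` holds the labels:

```python
import torch

# DataParallel only forwards forward(); other model methods live on .module
raw_model = model.module if isinstance(model, torch.nn.DataParallel) else model
decoder_input_ids = raw_model.prepare_decoder_input_ids_from_labels(
    labels=batch_data["labels"]
)
```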

How to build a custom model on top of the transformers library?

(screenshot of the custom model definition omitted)
I tried the approach you show here, and it does load pretrained weights through the from_pretrained method, but only as long as the attribute is named exactly self.bert.
If I rename that line to self.bert_model or anything else, from_pretrained no longer loads the weights successfully. Isn't that too rigid? Is there another way around it?
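For context, the constraint comes from how from_pretrained matches checkpoint keys to submodule names: BertPreTrainedModel declares base_model_prefix = "bert", so the checkpoint's weights are mapped onto an attribute called self.bert. A minimal sketch of the usual pattern (the class name BertForMyTask and the 2-label head are illustrative assumptions):

```python
from torch import nn
from transformers import BertPreTrainedModel, BertModel

class BertForMyTask(BertPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.bert = BertModel(config)  # must match base_model_prefix = "bert"
        self.classifier = nn.Linear(config.hidden_size, 2)
        self.post_init()  # initializes weights not covered by the checkpoint

    def forward(self, input_ids, attention_mask=None):
        outputs = self.bert(input_ids, attention_mask=attention_mask)
        return self.classifier(outputs.last_hidden_state[:, 0])
```

With this pattern, `BertForMyTask.from_pretrained(...)` fills the weights under self.bert from the checkpoint, while the classifier head is newly initialized.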

Question about the attention implementation in Chapter 3

In the code below, I think it should be explained why the Q, K, V sequences are set equal to inputs_embeds. As I understand it, in the attention mechanism Q, K, and V are obtained by multiplying the embeddings with three learned weight matrices W_Q, W_K, and W_V, whereas the code below effectively assumes those three matrices are identity matrices (see the sketch after the second code block below).
```python
import torch
from math import sqrt

# inputs_embeds: token embeddings of shape (batch_size, seq_len, hidden_dim),
# produced earlier in the chapter
Q = K = V = inputs_embeds
dim_k = K.size(-1)
scores = torch.bmm(Q, K.transpose(1, 2)) / sqrt(dim_k)
print(scores.size())
```

Also, dim_k is computed inconsistently: the snippet above uses dim_k = K.size(-1), while the wrapped function below uses dim_k = query.size(-1).

```python
import torch
import torch.nn.functional as F
from math import sqrt

def scaled_dot_product_attention(query, key, value, query_mask=None, key_mask=None, mask=None):
    dim_k = query.size(-1)
    scores = torch.bmm(query, key.transpose(1, 2)) / sqrt(dim_k)
    # build a combined mask from per-sequence masks if both are given
    if query_mask is not None and key_mask is not None:
        mask = torch.bmm(query_mask.unsqueeze(-1), key_mask.unsqueeze(1))
    if mask is not None:
        scores = scores.masked_fill(mask == 0, -float("inf"))
    weights = F.softmax(scores, dim=-1)
    return torch.bmm(weights, value)
```
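To address the question directly: in a full model W_Q, W_K, and W_V are learned linear projections, and Q = K = V = inputs_embeds is only a simplified demonstration. A minimal single-head sketch reusing the function above (layer names and sizes are illustrative assumptions, not the book's code):

```python
from torch import nn

class AttentionHead(nn.Module):
    def __init__(self, embed_dim, head_dim):
        super().__init__()
        # W_Q, W_K, W_V as learned projections rather than identity matrices
        self.q = nn.Linear(embed_dim, head_dim)
        self.k = nn.Linear(embed_dim, head_dim)
        self.v = nn.Linear(embed_dim, head_dim)

    def forward(self, hidden_state, query_mask=None, key_mask=None, mask=None):
        return scaled_dot_product_attention(
            self.q(hidden_state), self.k(hidden_state), self.v(hidden_state),
            query_mask, key_mask, mask,
        )
```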
