
bert4keras's Introduction

bert4keras

About

This is the author's re-implementation of Transformer models for Keras, aiming to combine Transformers and Keras with code that is as clean as possible.

The project was started to make modification and customization easy, so it may be updated frequently.

Stars are therefore welcome, but forking is not recommended, since a forked copy may quickly become outdated.

Features

Currently implemented:

  • Loading the pre-trained weights of bert/roberta/albert for fine-tuning;
  • The attention masks needed for language models and seq2seq;
  • A rich set of examples;
  • Pre-training code from scratch (with TPU and multi-GPU support; see pretraining);
  • Compatibility with both keras and tf.keras.

Usage

Install the stable version:

pip install bert4keras

Install the latest version:

pip install git+https://www.github.com/bojone/bert4keras.git

Usage examples can be found in the examples directory.

The examples previously written for keras-bert still apply to this project; just switch the way bert_model is loaded to this project's loader.
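For reference, a minimal loading sketch (the checkpoint paths are placeholders; the module names follow the renames listed in the update log below, and the exact calls should be checked against the examples directory):

import numpy as np
from bert4keras.models import build_transformer_model
from bert4keras.tokenizers import Tokenizer

config_path = 'chinese_L-12_H-768_A-12/bert_config.json'      # placeholder paths
checkpoint_path = 'chinese_L-12_H-768_A-12/bert_model.ckpt'
dict_path = 'chinese_L-12_H-768_A-12/vocab.txt'

tokenizer = Tokenizer(dict_path, do_lower_case=True)
model = build_transformer_model(config_path, checkpoint_path)  # a plain Keras model

token_ids, segment_ids = tokenizer.encode(u'语言模型')
print(model.predict([np.array([token_ids]), np.array([segment_ids])]).shape)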

In principle the library is compatible with both Python 2 and Python 3, and with tensorflow 1.14+ as well as tensorflow 2.x. The development environment is Python 2.7, Tensorflow 1.14+ and Keras 2.3.1 (tested under Keras 2.2.4, 2.3.0, 2.3.1 and tf.keras).

For the best experience, the recommended combination is Tensorflow 1.14 + Keras 2.3.1.

About environment combinations
  • Both tf+keras and tf+tf.keras are supported; the latter requires setting the environment variable TF_KERAS=1 beforehand (see the sketch after this list).

  • When using tf+keras, the recommendation is 2.2.4 <= keras <= 2.3.1 together with 1.14 <= tf <= 2.2; tf 2.3+ cannot be used.

  • keras 2.4+ can be used, but keras 2.4.x is essentially equivalent to tf.keras, so if you want keras 2.4+ you might as well use tf.keras directly.
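A minimal sketch of switching to tf.keras — the environment variable must be set before bert4keras is imported; the import shown is only illustrative:

import os
os.environ['TF_KERAS'] = '1'   # switch bert4keras to the tf.keras backend

from bert4keras.models import build_transformer_model  # import only after setting the variable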

Of course, contributors who find bugs are welcome to point them out, fix them, or even send Pull Requests~

Weights

Weights that can currently be loaded:

Notes

  • Note 1: brightmart's albert was open-sourced earlier than Google's albert, so the weights of early brightmart albert releases are not fully consistent with Google's; in other words, the two cannot simply be swapped. To reduce code redundancy, bert4keras 0.2.4 and later only support loading Google-version weights and the brightmart weights whose names contain "Google". To load the early weights, use version 0.2.3, or consider the albert_zh weights converted by the author.
  • Note 2: If the downloaded ELECTRA weights come without a json config file, write one yourself following the reference here (the type_vocab_size field must be added).

Updates

  • 2023.03.06: Replaced the hand-rolled infinity with np.inf and reduced memory usage. Using np.inf makes the computation more accurate and less error-prone under low-precision arithmetic; several mask operators were also merged, lowering GPU memory usage. In practice, training base- and large-sized models on an A100 is noticeably faster and uses less memory.
  • 2022.03.20: Added RoFormerV2.
  • 2022.02.28: Added GatedAttentionUnit.
  • 2021.04.23: Added GlobalPointer.
  • 2021.03.23: Added RoFormer.
  • 2021.01.30: Released version 0.9.9 with improved multi-GPU support and a multi-GPU example: task_seq2seq_autotitle_multigpu.py.
  • 2020.12.29: Added the residual_attention_scores argument to implement RealFormer; pass residual_attention_scores=True to build_transformer_model to enable it.
  • 2020.12.04: PositionEmbedding gained hierarchical decomposition, which lets BERT handle very long texts directly; pass hierarchical_position=True to build_transformer_model to enable it.
  • 2020.11.19: Added support for the GPT2 model; see the CPM_LM_bert4keras project.
  • 2020.11.14: Added per-parameter learning rates via extend_with_parameter_wise_lr, which can be used to set a different learning rate for each layer.
  • 2020.10.27: Support for T5.1.1 and Multilingual T5.
  • 2020.08.28: Support for GPT_OpenAI.
  • 2020.08.22: Added the WebServing class, which makes it easy to expose a model as a web API; see the class docstring for details.
  • 2020.07.14: Added a prefix argument to the Transformer class; introduced the to_array function in snippets.py; fixed a hidden bug in AutoRegressiveDecoder when rtype='logits'.
  • 2020.06.06: Out of perfectionism, renamed the Tokenizer argument max_length to maxlen; backward compatibility is kept, but the new name is recommended.
  • 2020.04.29: Added recomputation (see keras_recompute), trading time for memory; enable it by setting the environment variable RECOMPUTE=1.
  • 2020.04.25: Improved behaviour under tf2.
  • 2020.04.16: All examples now work with tensorflow 2.0.
  • 2020.04.06: Added the UniLM pre-training mode (in testing).
  • 2020.04.06: Improved the rematch method.
  • 2020.04.01: Tokenizer gained a rematch method, which maps tokenization results back to the original sequence.
  • 2020.03.30: Unified the coding style of the py files as much as possible.
  • 2020.03.25: Support for ELECTRA.
  • 2020.03.24: Further strengthened DataGenerator, allowing local shuffling when an iterator is passed in.
  • 2020.03.23: Added an option to adjust the key_size of Attention.
  • 2020.03.17: Enhanced DataGenerator; improved how models are written.
  • 2020.03.15: Support for GPT2_ML.
  • 2020.03.10: Support for Google's T5 model.
  • 2020.03.05: Renamed tokenizer.py to tokenizers.py.
  • 2020.03.05: Renamed application='seq2seq' to application='unilm'.
  • 2020.03.05: Renamed build_bert_model to build_transformer_model.
  • 2020.03.05: Rewrote the structure of models.py.
  • 2020.03.04: Renamed bert.py to models.py.
  • 2020.03.02: Refactored the mask mechanism (back to Keras's native masking) to make it easier to write more complex applications.
  • 2020.02.22: Added the AutoRegressiveDecoder class to handle Seq2Seq decoding in a unified way.
  • 2020.02.19: Changed the transformer block prefix from Encoder to Transformer, making its meaning less restrictive.
  • 2020.02.13: Optimized the load_vocab function; renamed the keep_words argument of build_bert_model to keep_tokens, a change that may affect some scripts.
  • 2020.01.18: Adjusted text handling and removed the use of codecs.
  • 2020.01.17: As the APIs stabilized, packaged the project on PyPI for convenience; the first packaged version is 0.4.6.
  • 2020.01.10: Rewrote the model masking scheme, which makes the code somewhat cleaner; backend optimizations.
  • 2019.12.27: Refactored the pre-training code to reduce redundancy; RoBERTa and GPT pre-training are currently supported, see pretraining.
  • 2019.12.17: Adapted Huawei's NEZHA weights; just add model='nezha' to the build_bert_model function. Also, the old albert loading flag albert=True was changed to model='albert'.
  • 2019.12.16: Restored support for keras versions below 2.3.0 by back-porting nested-layer functionality along the lines of keras 2.3+.
  • 2019.12.14: Added Conditional Layer Normalization and a related demo.
  • 2019.12.09: Standardized the data_generator of each example; fixed an error when application='lm'.
  • 2019.12.05: Improved the tokenizer's do_lower_case and tweaked the examples.
  • 2019.11.23: Renamed train.py to optimizers.py and updated many optimizer implementations, fully compatible with keras and tf.keras.
  • 2019.11.19: Renamed utils.py to tokenizer.py.
  • 2019.11.19: After much deliberation, decided to move the snippets under bert4keras.snippets.
  • 2019.11.18: Improved the pre-trained weight loading logic; added a method to save model weights in BERT's checkpoint format.
  • 2019.11.17: Moved some commonly used code snippets not directly related to BERT into python_snippets, for sharing with other projects.
  • 2019.11.11: Added the NSP part.
  • 2019.11.05: Adapted Google's albert; non-Google albert_zh is no longer supported.
  • 2019.11.05: Finished the pre-training code with RoBERTa as the example, supporting TPU/multi-GPU training; see roberta. Contributions of more pre-training code on this basis are welcome.
  • 2019.11.01: Started adding pre-training-related code; see pretraining.
  • 2019.10.28: Support for sentencepiece-based tokenizers.
  • 2019.10.25: Introduced the native tokenizer.
  • 2019.10.22: Introduced a gradient-accumulation optimizer.
  • 2019.10.21: To simplify the code structure, dropped support for keras versions before 2.3.0; only keras 2.3.0+ and tf.keras are supported.
  • 2019.10.20: By popular request, models can now be saved directly with model.save and loaded as a whole with load_model (just execute from bert4keras.layers import * before load_model; no extra custom_objects are needed).
  • 2019.10.09: Compatible with tf.keras; tested with tf.keras under tf 1.13 and tf 2.0. Set the environment variable TF_KERAS=1 to switch to tf.keras.
  • 2019.10.09: Compatible with Keras 2.3.x, but only as a temporary measure; support for versions before 2.3 may be removed later.
  • 2019.10.02: Adapted albert; weights from albert_zh can be loaded successfully by adding albert=True to the load_pretrained_model function.

Background

I had been using CyberZHG's keras-bert. For simply calling and fine-tuning BERT under Keras, keras-bert is already quite satisfactory.

However, if you want to modify BERT's internal structure while still loading the official pre-trained weights, keras-bert makes that difficult. For the sake of code reuse it packages almost every small module as a separate library: keras-bert depends on keras-transformer, keras-transformer depends on keras-multi-head, keras-multi-head depends on keras-self-attention, and with that chain of dependencies, making changes becomes quite a headache.

So I decided to rewrite a Keras version of BERT, trying to implement it completely within a few files, reducing those dependencies while keeping the ability to load the official pre-trained weights.

Acknowledgements

Thanks to CyberZHG for keras-bert; this implementation draws on the keras-bert source code in quite a few places, and I sincerely thank him for his selfless contribution.

Related

bert4torch: a pytorch-based transformer library in a style very similar to bert4keras; PyTorch users may want to give it a try.

Citation

@misc{bert4keras,
  title={bert4keras},
  author={Jianlin Su},
  year={2020},
  howpublished={\url{https://bert4keras.spaces.ac.cn}},
}

Contact

QQ group: 808623966; for the WeChat group, add the bot account spaces_ac_cn.

bert4keras's People

Contributors

bojone, chuxij, enningxie, i4never, ianliuy, shevonkuan, tiandiweizun, xv44586


bert4keras's Issues

Tokenizer.load_vocab mishandles special characters when loading the vocabulary

In a Python 3 environment, when I use load_vocab, codecs turns the characters on lines 13504 and 344 of vocab.txt (ids 13503 and 343) into spaces, whereas Python's built-in open does not have this problem. It is a small issue; I only noticed it because a line went missing when I saved the vocabulary again.
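An illustrative sketch of the workaround described above (this is not the library's load_vocab; the point is simply to strip only the trailing newline so that whitespace-like tokens survive):

def read_vocab(vocab_path):
    """Read vocab.txt line by line, keeping tokens that consist of whitespace."""
    token_dict = {}
    with open(vocab_path, 'r', encoding='utf-8') as f:
        for line in f:
            token = line.rstrip('\n')     # remove only the newline, not other whitespace
            token_dict[token] = len(token_dict)
    return token_dict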

Unable to create link (name already exists)

Python 2.7 environment; the error occurs when saving the model while running task_seq2seq.py:
Traceback (most recent call last):
File "/media/brx/2d79a6a5-f419-aa4c-b391-314a73033208/project/Word_vector/bert4keras/examples/task_seq2seq.py", line 210, in
callbacks=[evaluator]
File "/home/brx/.local/lib/python2.7/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "/home/brx/.local/lib/python2.7/site-packages/keras/engine/training.py", line 1732, in fit_generator
initial_epoch=initial_epoch)
File "/home/brx/.local/lib/python2.7/site-packages/keras/engine/training_generator.py", line 260, in fit_generator
callbacks.on_epoch_end(epoch, epoch_logs)
File "/home/brx/.local/lib/python2.7/site-packages/keras/callbacks/callbacks.py", line 152, in on_epoch_end
callback.on_epoch_end(epoch, logs)
File "/media/brx/2d79a6a5-f419-aa4c-b391-314a73033208/project/Word_vector/bert4keras/examples/task_seq2seq.py", line 197, in on_epoch_end
model.save_weights('./best_model.weights')
File "/home/brx/.local/lib/python2.7/site-packages/keras/engine/saving.py", line 449, in save_wrapper
save_function(obj, filepath, overwrite, *args, **kwargs)
File "/home/brx/.local/lib/python2.7/site-packages/keras/engine/network.py", line 1184, in save_weights
saving.save_weights_to_hdf5_group(f, self.layers)
File "/home/brx/.local/lib/python2.7/site-packages/keras/engine/saving.py", line 761, in save_weights_to_hdf5_group
dtype=val.dtype)
File "/home/brx/.local/lib/python2.7/site-packages/h5py/_hl/group.py", line 139, in create_dataset
self[name] = dset
File "/home/brx/.local/lib/python2.7/site-packages/h5py/_hl/group.py", line 373, in setitem
h5o.link(obj.id, self.id, name, lcpl=lcpl, lapl=self._lapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5o.pyx", line 202, in h5py.h5o.link
RuntimeError: Unable to create link (name already exists)

Using fine tuned Model

I'm posting this to check whether I'm doing the right thing.

I just add a Dense softmax layer to fine-tune the model:

albert_model = build_bert_model(config_path, checkpoint_path, albert=True)
out = Lambda(lambda x: x[: 0])(albert_model.output)
output = Dense(units=class_num, activation = 'softmax')(out)

After training, I try to load the model with

model = load_model (model.dir)

and I get an error saying the custom layer 'TokenEmbedding' is missing.
After that, I try

 custom_objects = {'MaskedGlobalPool1D': MaskedGlobalPool1D}
 custom_objects.update(get_bert_custom_objects())

get_bert_custom_objects() comes from keras_bert and basically just defines some custom layers,
while MaskedGlobalPool1D, also from keras_bert, removes the mask from the model output.

I don't know whether I'm doing this right, since the predictions are not good enough.
Can someone explain what the TokenEmbedding layer is, and what about the Dense layer I defined?

Loading Google albert and albert_small_google_zh fails: Layer weight shape (384, 384) not compatible with provided weight shape (384, 128)

found_name = variable_names.pop(np.argmax(sims))

When loading the Google version, it reports "attempt to get argmax of an empty sequence", so I added a check. The root cause is probably the pop operation: there are multiple hidden layers, but because of albert's parameter sharing, variable_names contains far fewer entries than needed, which triggers the error.

For practical reasons, production often has to run older versions. The previous bert4keras was fine and supported keras 2.2; after upgrading to 2.3, the set_weights issue mentioned by other users appeared, so keras 2.2 had to be dropped.

A question about encode in the simple tokenizer

The encode method of SimpleTokenizer:

def encode(self, first, second=None, first_length=None):
        """输出文本对应token id和segment id
        如果传入first_length,则强行padding第一个句子到指定长度
        """
        token_ids, segment_ids = [], []
        token_ids.extend([self._token_dict[c] for c in self.tokenize(first)])
        segment_ids.extend([0] * (len(first) + 2))
        if first_length is not None and len(token_ids) < first_length + 2:
            token_ids.extend([0] * (first_length + 2 - len(token_ids)))
            segment_ids.extend([0] * (first_length + 2 - len(segment_ids)))
        if second is not None:
            token_ids.extend([
                self._token_dict[c]
                for c in self.tokenize(second, add_cls=False)
            ])
            segment_ids.extend([1] * (len(second) + 1))
        return token_ids, segment_ids
    

I checked the original BERT fine-tuning code: if first+second exceeds the maximum length, first and second are trimmed so that they end up roughly equal in length. The code above only handles the length of first, not second, so if different batches have different data lengths an error is raised. I tried changing it so that first+second == 512-3, and it ran successfully.
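For illustration, a sketch of the joint truncation the comment refers to, loosely following BERT's fine-tuning code (this is not the library's own implementation; the names are placeholders):

def truncate_pair(first_tokens, second_tokens, max_tokens):
    """Trim the longer of the two token lists until their combined length fits."""
    while len(first_tokens) + len(second_tokens) > max_tokens:
        if len(first_tokens) >= len(second_tokens):
            first_tokens.pop()
        else:
            second_tokens.pop()
    return first_tokens, second_tokens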

How to load_model in keras

After loading the pre-trained model, fine-tuning it and saving it as a new keras .h5 model, how do I load this new model? What should the custom_objects argument be?
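Per the 2019.10.20 update note above, importing the project's layers before load_model registers the custom objects, so no explicit custom_objects dict should be needed; a sketch (the file name is a placeholder):

from bert4keras.layers import *          # registers bert4keras's custom layers
from keras.models import load_model

model = load_model('my_finetuned_model.h5')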

[BUG] add_cls and add_sep are not defined in BasicTokenizer

There is a problem with the Tokenizer: it does not define add_cls and add_sep.

Traceback (most recent call last):
File "albert.py", line 156, in
callbacks=[evaluator])
File "/home/songuser/anaconda3/envs/keras/lib/python3.7/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "/home/songuser/anaconda3/envs/keras/lib/python3.7/site-packages/keras/engine/training.py", line 1732, in fit_generator
initial_epoch=initial_epoch)
File "/home/songuser/anaconda3/envs/keras/lib/python3.7/site-packages/keras/engine/training_generator.py", line 185, in fit_generator
generator_output = next(output_generator)
File "/home/songuser/anaconda3/envs/keras/lib/python3.7/site-packages/keras/utils/data_utils.py", line 742, in get
six.reraise(*sys.exc_info())
File "/home/songuser/anaconda3/envs/keras/lib/python3.7/site-packages/six.py", line 693, in reraise
raise value
File "/home/songuser/anaconda3/envs/keras/lib/python3.7/site-packages/keras/utils/data_utils.py", line 711, in get
inputs = future.get(timeout=30)
File "/home/songuser/anaconda3/envs/keras/lib/python3.7/multiprocessing/pool.py", line 657, in get
raise self._value
File "/home/songuser/anaconda3/envs/keras/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/home/songuser/anaconda3/envs/keras/lib/python3.7/site-packages/keras/utils/data_utils.py", line 650, in next_sample
return six.next(_SHARED_SEQUENCES[uid])
File "albert.py", line 93, in forfit
for d in self.iter(True):
File "albert.py", line 81, in iter
token_ids, segment_ids = tokenizer.encode(text1, text2, max_length=MAX_LEN)
File "/home/songuser/anaconda3/envs/keras/lib/python3.7/site-packages/bert4keras/tokenizer.py", line 83, in encode
delta_1 = int(add_cls) + int(add_sep)

def encode(self,
               first_text,
               second_text=None,
               max_length=None,
               first_length=None,
               second_length=None):
        """输出文本对应token id和segment id
        如果传入first_length,则强行padding第一个句子到指定长度;
        同理,如果传入second_length,则强行padding第二个句子到指定长度。
        """
        first_tokens = self.tokenize(first_text, add_cls=False, add_sep=False)
        delta_1 = int(add_cls) + int(add_sep)   # add_cls and add_sep are not defined
        delta_2 = int(add_cls) + int(add_sep) * 2
        if second_text is None:
            if max_length is not None:
                first_tokens = first_tokens[:max_length - delta_1]

learning rate decay and schedule

Could you explain the relationship between learning_rate, weight_decay and lr_schedule?

Suppose

num_train_steps = 100 
num_warmup_steps = 10

learning_rate = 0.01
weight_decay_rate = 0.05

lr_schedule = {
    num_warmup_steps : 0.99,
    num_train_steps: 0.01,
}

optimizer = extend_with_weight_decay(Adam)
optimizer = extend_with_piecewise_linear_lr(optimizer)
optimizer_params = {
        'learning_rate': learning_rate,
        'lr_schedule': lr_schedule,
        'weight_decay_rate': weight_decay_rate,
    }
optimizer = optimizer(**optimizer_params)

My understanding is that training runs for 100 steps in total:
0-10: 0 to (0.01*0.99) (grows linearly)
10-100: 0.01 to (0.01*0.01) (decays linearly)

Where does weight_decay_rate come in?

Thanks!
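As an aside, here is a sketch of how a piecewise-linear multiplier schedule of this form is usually read (my interpretation, not the library's code): lr_schedule maps step numbers to multipliers of learning_rate, interpolated linearly starting from (0, 0), while weight_decay_rate is applied separately at each step as decoupled (AdamW-style) weight decay.

def lr_multiplier(step, schedule):
    """Piecewise-linear multiplier: interpolate between (0, 0) and the sorted schedule points."""
    points = sorted(schedule.items())            # e.g. [(10, 0.99), (100, 0.01)]
    last_step, last_mult = 0, 0.0
    for s, m in points:
        if step <= s:
            frac = (step - last_step) / float(s - last_step)
            return last_mult + frac * (m - last_mult)
        last_step, last_mult = s, m
    return points[-1][1]                         # hold the last value afterwards

# effective_lr = learning_rate * lr_multiplier(step, lr_schedule)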

In the base class BertModel, there is a problem under if self.with_mlm

In bert4keras/bert.py, under if self.with_mlm:
the line x = EmbeddingDense(embedding_name='Embedding-Token', activation=self.with_mlm, name='MLM-Proba')(x) raises an error at activation=self.with_mlm.
The corresponding line in EmbeddingDense is self.activation = activations.get(activation), so activation should be the name of an activation function, but a boolean seems to be passed here.

A Model containing BERT cannot be used inside TimeDistributed

The test code is as follows:

from bert4keras.bert import load_pretrained_model as load_trained_model_from_checkpoint
from bert4keras.utils import SimpleTokenizer as Tokenizer
from keras.layers import Input, TimeDistributed
from keras.models import Model


config_path = './chinese_L-12_H-768_A-12/bert_config.json'
checkpoint_path = './chinese_L-12_H-768_A-12/bert_model.ckpt'
dict_path = './chinese_L-12_H-768_A-12/vocab.txt'

bert_model = load_trained_model_from_checkpoint(config_path,
                                                checkpoint_path)

for l in bert_model.layers:
    l.trainable = True

MAX_SENTENCE_LENGTH = 128
MAX_SENTENCE_COUNT = 64


x1_in = Input(shape=(MAX_SENTENCE_LENGTH, ), dtype='int32')
x2_in = Input(shape=(MAX_SENTENCE_LENGTH, ), dtype='int32')

x1, x2 = x1_in, x2_in
sentence = bert_model([x1, x2])

# sentence = Lambda(lambda x: x[:, 0])(sentence)
model1 = Model([x1_in, x2_in], sentence)
model1.summary()

texts_in = Input(shape=(MAX_SENTENCE_COUNT, MAX_SENTENCE_LENGTH, 2),
                 dtype='int32')
attention_weighted_sentences = TimeDistributed(model1)(texts_in)
model = Model(texts_in, attention_weighted_sentences)
model.summary()

The error:

__________________________________________________________________________________________________
Traceback (most recent call last):
  File "/home/phoenixkiller/.vscode/extensions/ms-python.python-2019.9.34911/pythonFiles/ptvsd_launcher.py", line 43, in <module>
    main(ptvsdArgs)
  File "/home/phoenixkiller/.vscode/extensions/ms-python.python-2019.9.34911/pythonFiles/lib/python/ptvsd/__main__.py", line 432, in main
    run()
  File "/home/phoenixkiller/.vscode/extensions/ms-python.python-2019.9.34911/pythonFiles/lib/python/ptvsd/__main__.py", line 316, in run_file
    runpy.run_path(target, run_name='__main__')
  File "/home/phoenixkiller/anaconda3/envs/keras_debug/lib/python3.6/runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/home/phoenixkiller/anaconda3/envs/keras_debug/lib/python3.6/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/home/phoenixkiller/anaconda3/envs/keras_debug/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/phoenixkiller/source/bert4keras/test copy.py", line 36, in <module>
    attention_weighted_sentences = TimeDistributed(model1)(texts_in)
  File "/home/phoenixkiller/source/keras/keras/engine/base_layer.py", line 457, in __call__
    output = self.call(inputs, **kwargs)
  File "/home/phoenixkiller/source/keras/keras/layers/wrappers.py", line 248, in call
    y = self.layer.call(inputs, **kwargs)
  File "/home/phoenixkiller/source/keras/keras/engine/network.py", line 564, in call
    output_tensors, _, _ = self.run_internal_graph(inputs, masks)
  File "/home/phoenixkiller/source/keras/keras/engine/network.py", line 798, in run_internal_graph
    assert str(id(x)) in tensor_map, 'Could not compute output ' + str(x)
AssertionError: Could not compute output Tensor("model_1/Encoder-12-FeedForward-Norm/add_1:0", shape=(?, 128, 768), dtype=float32)

Loading a model with tf.keras.models.load_model fails: ValueError: Unknown layer: FactorizedEmbedding

model = tf.keras.models.load_model('./my_model.h5')


ValueError Traceback (most recent call last)
in ()
----> 1 model_1 = tf.keras.models.load_model('./my_model.h5')
2
3 tf.saved_model.simple_save(
4 tf.keras.backend.get_session(),
5 "./h5_savedmodel/",

~/nm-local-dir/usercache/137602/appcache/application_1565649576840_6632543/container_e2144_1565649576840_6632543_01_000005/nbenv/nbenv/lib/python3.5/site-packages/tensorflow_core/python/keras/saving/save.py in load_model(filepath, custom_objects, compile)
144 if (h5py is not None and (
145 isinstance(filepath, h5py.File) or h5py.is_hdf5(filepath))):
--> 146 return hdf5_format.load_model_from_hdf5(filepath, custom_objects, compile)
147
148 if isinstance(filepath, six.string_types):

~/nm-local-dir/usercache/137602/appcache/application_1565649576840_6632543/container_e2144_1565649576840_6632543_01_000005/nbenv/nbenv/lib/python3.5/site-packages/tensorflow_core/python/keras/saving/hdf5_format.py in load_model_from_hdf5(filepath, custom_objects, compile)
166 model_config = json.loads(model_config.decode('utf-8'))
167 model = model_config_lib.model_from_config(model_config,
--> 168 custom_objects=custom_objects)
169
170 # set weights

~/nm-local-dir/usercache/137602/appcache/application_1565649576840_6632543/container_e2144_1565649576840_6632543_01_000005/nbenv/nbenv/lib/python3.5/site-packages/tensorflow_core/python/keras/saving/model_config.py in model_from_config(config, custom_objects)
53 'Sequential.from_config(config)?')
54 from tensorflow.python.keras.layers import deserialize # pylint: disable=g-import-not-at-top
---> 55 return deserialize(config, custom_objects=custom_objects)
56
57

~/nm-local-dir/usercache/137602/appcache/application_1565649576840_6632543/container_e2144_1565649576840_6632543_01_000005/nbenv/nbenv/lib/python3.5/site-packages/tensorflow_core/python/keras/layers/serialization.py in deserialize(config, custom_objects)
100 module_objects=globs,
101 custom_objects=custom_objects,
--> 102 printable_module_name='layer')

~/nm-local-dir/usercache/137602/appcache/application_1565649576840_6632543/container_e2144_1565649576840_6632543_01_000005/nbenv/nbenv/lib/python3.5/site-packages/tensorflow_core/python/keras/utils/generic_utils.py in deserialize_keras_object(identifier, module_objects, custom_objects, printable_module_name)
189 custom_objects=dict(
190 list(_GLOBAL_CUSTOM_OBJECTS.items()) +
--> 191 list(custom_objects.items())))
192 with CustomObjectScope(custom_objects):
193 return cls.from_config(cls_config)

~/nm-local-dir/usercache/137602/appcache/application_1565649576840_6632543/container_e2144_1565649576840_6632543_01_000005/nbenv/nbenv/lib/python3.5/site-packages/tensorflow_core/python/keras/engine/network.py in from_config(cls, config, custom_objects)
904 """
905 input_tensors, output_tensors, created_layers = reconstruct_from_config(
--> 906 config, custom_objects)
907 model = cls(inputs=input_tensors, outputs=output_tensors,
908 name=config.get('name'))

~/nm-local-dir/usercache/137602/appcache/application_1565649576840_6632543/container_e2144_1565649576840_6632543_01_000005/nbenv/nbenv/lib/python3.5/site-packages/tensorflow_core/python/keras/engine/network.py in reconstruct_from_config(config, custom_objects, created_layers)
1840 # First, we create all layers and enqueue nodes to be processed
1841 for layer_data in config['layers']:
-> 1842 process_layer(layer_data)
1843 # Then we process nodes in order of layer depth.
1844 # Nodes that cannot yet be processed (if the inbound node

~/nm-local-dir/usercache/137602/appcache/application_1565649576840_6632543/container_e2144_1565649576840_6632543_01_000005/nbenv/nbenv/lib/python3.5/site-packages/tensorflow_core/python/keras/engine/network.py in process_layer(layer_data)
1822 from tensorflow.python.keras.layers import deserialize as deserialize_layer # pylint: disable=g-import-not-at-top
1823
-> 1824 layer = deserialize_layer(layer_data, custom_objects=custom_objects)
1825 created_layers[layer_name] = layer
1826

~/nm-local-dir/usercache/137602/appcache/application_1565649576840_6632543/container_e2144_1565649576840_6632543_01_000005/nbenv/nbenv/lib/python3.5/site-packages/tensorflow_core/python/keras/layers/serialization.py in deserialize(config, custom_objects)
100 module_objects=globs,
101 custom_objects=custom_objects,
--> 102 printable_module_name='layer')

~/nm-local-dir/usercache/137602/appcache/application_1565649576840_6632543/container_e2144_1565649576840_6632543_01_000005/nbenv/nbenv/lib/python3.5/site-packages/tensorflow_core/python/keras/utils/generic_utils.py in deserialize_keras_object(identifier, module_objects, custom_objects, printable_module_name)
178 config = identifier
179 (cls, cls_config) = class_and_config_for_serialized_keras_object(
--> 180 config, module_objects, custom_objects, printable_module_name)
181
182 if hasattr(cls, 'from_config'):

~/nm-local-dir/usercache/137602/appcache/application_1565649576840_6632543/container_e2144_1565649576840_6632543_01_000005/nbenv/nbenv/lib/python3.5/site-packages/tensorflow_core/python/keras/utils/generic_utils.py in class_and_config_for_serialized_keras_object(config, module_objects, custom_objects, printable_module_name)
163 cls = module_objects.get(class_name)
164 if cls is None:
--> 165 raise ValueError('Unknown ' + printable_module_name + ': ' + class_name)
166 return (cls, config['config'])
167

ValueError: Unknown layer: FactorizedEmbedding

tensorflow version and cuda version

Sorry to bother you. My cuda version is 9.x, but tensorflow 1.13+ requires cuda 10. Is the older package that supported lower tensorflow versions still available?

How to take the output of a given layer as features?

I tried the method below and it raises an error; I'm not sure whether I'm doing it wrong. I'd appreciate an explanation.

bert = build_bert_model(
    config_path=config_path,
    checkpoint_path=checkpoint_path,
    with_pool=False,
    return_keras_model=True)

x1 = K.function([bert.layers[0].input], [bert.layers[-2].output])

x2 = K.function([bert.layers[0].input], [bert.layers[-3].output])
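One likely cause is that the model takes two inputs (token ids and segment ids), so both must be fed. A sketch of an alternative that wraps the intermediate outputs in a new Model — bert refers to the model built in the snippet above, and the layer indices and input arrays are assumptions (check model.summary() for the real layer names):

from keras.models import Model

feature_model = Model(
    inputs=bert.inputs,                                    # both token and segment inputs
    outputs=[bert.layers[-2].output, bert.layers[-3].output],
)
x1, x2 = feature_model.predict([token_ids, segment_ids])   # token_ids/segment_ids are placeholders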

Error when using multiple GPUs

Hello, I'd like to ask: running on a single GPU works fine, but the error below appears when using multiple GPUs. What could be the reason?
The code is:
from keras.utils import multi_gpu_model
os.environ["CUDA_VISIBLE_DEVICES"] = "6,7"
with tf.device('/cpu:0'):
    model = Model(albert_model.input, output)
    model.summary()
model = multi_gpu_model(model, gpus=2)
The error is:
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found. (0) Invalid argument: Incompatible shapes: [256,10,10] vs. [512,1,10] [[{{node replica_0/model_2/Encoder-1-MultiHeadSelfAttention/sub_1}}]] (1) Invalid argument: Incompatible shapes: [256,10,10] vs. [512,1,10] [[{{node replica_0/model_2/Encoder-1-MultiHeadSelfAttention/sub_1}}]] [[training/Adam/gradients/replica_1/model_2/Embedding-Norm/Mean_1_grad/Shape_2/_436]]
Best regards.

parallel_apply multithreading error

The following error is reported on Windows:

Connected to pydev debugger (build 193.5662.61)
2019-12-24 11:52:07.952336: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
Building vocabulary: 0it [00:00, ?it/s]Traceback (most recent call last):
File "D:\Anaconda3\envs\tf2\lib\multiprocessing\popen_spawn_win32.py", line 65, in init
reduction.dump(process_obj, to_child)
File "D:\Anaconda3\envs\tf2\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'parallel_apply..worker_step'
2019-12-24 11:52:13.359467: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
Traceback (most recent call last):
File "D:\Anaconda3\envs\tf2\lib\multiprocessing\spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
EOFError: Ran out of input

But there is no error under Ubuntu. Why is that?

Inconsistency between bert4keras and keras_bert

When running the following code, bert4keras raises an error while keras_bert works fine. Here is the code:
print('build bert model...')
bert_model = load_pretrained_model(config_path, checkpoint_path)
# replacing it with the following two lines works
#from keras_bert import load_trained_model_from_checkpoint
#bert_model = load_trained_model_from_checkpoint(config_path,checkpoint_path)
x1_input = Input(shape=(maxlen,), dtype='int32')
x2_input = Input(shape=(maxlen,), dtype='int32')
bert_output_layer = bert_model([x1_input, x2_input])
cls_output = Lambda(lambda x: x[:, 0])(bert_output_layer)
output = Dense(1, activation='sigmoid')(cls_output)
model = Model([x1_input, x2_input], output)
model.summary()

adam = Adam(lr=1e-4, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
model.compile(loss='binary_crossentropy', optimizer=adam, metrics=['accuracy'])
model.fit([word_index, sent_index], y_train, shuffle=True, batch_size=128, epochs=epochs, validation_split=0.1)

The error is:
InvalidArgumentError: You must feed a value for placeholder tensor 'Input-Token_3' with dtype float and shape [?,?]
[[{{node Input-Token_3}} = Placeholderdtype=DT_FLOAT, shape=[?,?], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
May I ask what the cause is?

import error

from bert4keras.bert import load_pretrained_model, set_gelu
File "/home/env/anaconda3/envs/tensorflow/lib/python3.6/site-packages/bert4keras/bert.py", line 4, in
from .layers import *
File "/home/env/anaconda3/envs/tensorflow/lib/python3.6/site-packages/bert4keras/layers.py", line 321
raise Exception, 'Embedding layer not found'

Some questions about pretraining

Thank you so much, Su, for so carefully turning BERT into an accessible Keras version!

I have three basic questions:
1. In pretraining.py,

def mlm_loss(inputs):
    """Compute the masked-LM loss; this function needs to be wrapped as a layer."""
    y_true, y_pred, is_masked = inputs
    is_masked = K.cast(is_masked, K.floatx())
    loss = K.sparse_categorical_crossentropy(y_true, y_pred, from_logits=True)
    loss = K.sum(loss * is_masked) / (K.sum(is_masked) + K.epsilon())
    return loss

def mlm_acc(inputs):
    """Compute the masked-LM accuracy; this function needs to be wrapped as a layer."""
    y_true, y_pred, is_masked = inputs
    is_masked = K.cast(is_masked, K.floatx())
    y_true = K.cast(y_true, K.floatx())
    acc = keras.metrics.sparse_categorical_accuracy(y_true, y_pred)
    acc = K.sum(acc * is_masked) / (K.sum(is_masked) + K.epsilon())
    return acc

mlm_loss and mlm_acc are placed as the last layer inside train_model, rather than being passed to compile as the loss in the usual way. If they live inside the model, the current mlm_loss and mlm_acc return a single value of shape = (). Will that cause problems? It feels like the shape should be (None,). Should an axis=1 be added to the sums, i.e.
loss = K.sum(loss * is_masked, axis=1) / (K.sum(is_masked, axis=1) + K.epsilon())

2. During the optimization in pretraining, are loss and acc both optimized at the same time?

train_model.compile(
loss={
'mlm_loss': lambda y_true, y_pred: y_pred,
'mlm_acc': lambda y_true, y_pred: K.stop_gradient(y_pred),

I'm not very familiar with this usage and would like to understand why the loss alone is not enough — or have I misunderstood how it is used?

3. In data_utils.py,

y = {
'mlm_loss': K.zeros([1]),
'mlm_acc': K.zeros([1]),
}

Should 'mlm_acc': K.zeros([1]) be changed to 'mlm_acc': K.ones([1]), since the maximum accuracy is 1?

Thanks again!
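For reference, a sketch of the per-sample variant suggested in question 1 (an assumption about what was meant, not the repository's implementation): reducing over the sequence axis only yields a loss of shape (batch_size,).

import keras.backend as K

def mlm_loss_per_sample(inputs):
    """Masked-LM loss reduced over the sequence axis only, giving shape (batch_size,)."""
    y_true, y_pred, is_masked = inputs
    is_masked = K.cast(is_masked, K.floatx())
    loss = K.sparse_categorical_crossentropy(y_true, y_pred, from_logits=True)
    return K.sum(loss * is_masked, axis=1) / (K.sum(is_masked, axis=1) + K.epsilon())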

Error when loading Xu Liang's albert

Using example/task_sentiment_albert.py to load Xu Liang's tiny albert gives the error below:
ValueError: Layer weight shape (21128, 312) not compatible with provided weight shape (21128, 128)
Loading it with keras_bert gives the same error. What is going on?

Saving and loading the model: Unknown optimizer: new_optimizer

Hello, in examples/task_sentiment_albert.py, using
model.save_weights
model.load_weights
causes no problems, but using
model.save("test.hdf5", overwrite=True, include_optimizer=True)
test_model = load_model("test.hdf5")
raises
deserialize_keras_object
': ' + class_name)
ValueError: Unknown optimizer: new_optimizer

Changing the loss, or setting include_optimizer=False, resolves it. I'm not sure whether this is something worth improving.

0.2.4 fine-tune problem

bert4keras 0.2.4
Keras 2.3.1
Model code:

albert_model = build_bert_model(config_path, checkpoint_path, albert=True)
out = Lambda(lambda x: x[: 0])(albert_model.output)
output = Dense(units=class_num, activation = 'softmax')(out)

model = Model(albert_model.input, output)
model.compile(loss=model_loss, optimizer=optimizer, metrics=["categorical_accuracy"])

The model is saved with keras's model.save and loaded with load_model:

from bert4keras.layers import *
model = load_model(os.path.join(model_dir, "albert.m"))

This raises

ValueError: Unknown layer : PositionEmbedding

If instead I do from bert4keras.layers import custom_object
and then

model = load_model(os.path.join(model_dir, "albert.m"), custom_object=custom_object)

I get the error

AttributeError: 'tuple' object has no attribute 'layer'

train and validation accuracy stuck when fine-tuning

I'm trying to use Google's pre-trained ALBERT weights for an English sentiment analysis task. I'm sure the dataset files are in the right format and that the data is loaded and tokenized correctly.
However, during training the train and val accuracies do not change, staying stuck at 50%.
I'd like to know whether I've missed some detail when loading Google's pretrained weights for English data.
Following the instructions, I downloaded and unzipped the model from https://tfhub.dev/google/albert_base/2?tf-hub-format=compressed and created the config file.
The downloaded model has 2 pb files, an assets folder with 30k-clean.model and 30k-clean.vocab, and a variables folder with variables.index and variables.data-00000-of-00001.

Then my code looks like this:
config_path = 'models/albert_base/albert_config.json'
checkpoint_path = 'models/albert_base/variables/variables'
spm_path = 'models/albert_base/assets/30k-clean.model'
tokenizer = SpTokenizer(spm_path)
albert = build_bert_model(config_path, checkpoint_path, with_pool=True,albert=True,return_keras_model=False)

When loading the model, the following log lines appear:

==> searching: bert/embeddings/word_embeddings, found name: bert/embeddings/word_embeddings
==> searching: bert/embeddings/position_embeddings, found name: bert/embeddings/position_embeddings
==> searching: bert/embeddings/token_type_embeddings, found name: bert/embeddings/token_type_embeddings
==> searching: bert/embeddings/LayerNorm/gamma, found name: bert/embeddings/LayerNorm/gamma
==> searching: bert/embeddings/LayerNorm/beta, found name: bert/embeddings/LayerNorm/beta
==> searching: bert/encoder/embedding_hidden_mapping_in/kernel, found name: bert/encoder/embedding_hidden_mapping_in/kernel
==> searching: bert/encoder/embedding_hidden_mapping_in/bias, found name: bert/encoder/embedding_hidden_mapping_in/bias
==> searching: bert/encoder/transformer/group_0/inner_group_0/attention/self/query/kernel, found name: bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/kernel
==> searching: bert/encoder/transformer/group_0/inner_group_0/attention/self/query/bias, found name: bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/bias
==> searching: bert/encoder/transformer/group_0/inner_group_0/attention/self/key/kernel, found name: bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/kernel
==> searching: bert/encoder/transformer/group_0/inner_group_0/attention/self/key/bias, found name: bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/bias
==> searching: bert/encoder/transformer/group_0/inner_group_0/attention/self/value/kernel, found name: bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/kernel
==> searching: bert/encoder/transformer/group_0/inner_group_0/attention/self/value/bias, found name: bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/bias
==> searching: bert/encoder/transformer/group_0/inner_group_0/attention/output/dense/kernel, found name: bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/kernel
==> searching: bert/encoder/transformer/group_0/inner_group_0/attention/output/dense/bias, found name: bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/bias
==> searching: bert/encoder/transformer/group_0/inner_group_0/attention/output/LayerNorm/gamma, found name: bert/encoder/transformer/group_0/inner_group_0/LayerNorm/gamma
==> searching: bert/encoder/transformer/group_0/inner_group_0/attention/output/LayerNorm/beta, found name: bert/encoder/transformer/group_0/inner_group_0/LayerNorm/beta
==> searching: bert/encoder/transformer/group_0/inner_group_0/intermediate/dense/kernel, found name: bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/kernel
==> searching: bert/encoder/transformer/group_0/inner_group_0/intermediate/dense/bias, found name: bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/bias
==> searching: bert/encoder/transformer/group_0/inner_group_0/output/dense/kernel, found name: bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/kernel
==> searching: bert/encoder/transformer/group_0/inner_group_0/output/dense/bias, found name: bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/bias
==> searching: bert/encoder/transformer/group_0/inner_group_0/output/LayerNorm/gamma, found name: bert/encoder/transformer/group_0/inner_group_0/LayerNorm_1/gamma
==> searching: bert/encoder/transformer/group_0/inner_group_0/output/LayerNorm/beta, found name: bert/encoder/transformer/group_0/inner_group_0/LayerNorm_1/beta
==> searching: bert/pooler/dense/kernel, found name: bert/pooler/dense/kernel
==> searching: bert/pooler/dense/bias, found name: bert/pooler/dense/bias

Does this mean the model loaded correctly? If so, why do the model and the training process not work at all? Thanks.

Example - SQuAD

Will there be example code later for reading comprehension on SQuAD?

Error when running task_relation_extraction.py: TypeError: object.__init__() takes no parameters — could someone take a look?

0it [00:00, ?it/s]Traceback (most recent call last):
File "D:/pycharm/bert4keras/examples/task_relation_extraction.py", line 323, in
callbacks=[evaluator, EMAer])
File "E:\Anaconda\envs\keras\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "E:\Anaconda\envs\keras\lib\site-packages\keras\engine\training.py", line 1418, in fit_generator
initial_epoch=initial_epoch)
File "E:\Anaconda\envs\keras\lib\site-packages\keras\engine\training_generator.py", line 251, in fit_generator
callbacks.on_epoch_end(epoch, epoch_logs)
File "E:\Anaconda\envs\keras\lib\site-packages\keras\callbacks.py", line 79, in on_epoch_end
callback.on_epoch_end(epoch, logs)
File "D:/pycharm/bert4keras/examples/task_relation_extraction.py", line 305, in on_epoch_end
f1, precision, recall = evaluate(valid_data)
File "D:/pycharm/bert4keras/examples/task_relation_extraction.py", line 272, in evaluate
R = set([SPO(spo) for spo in extract_spoes(d['text'])])
File "D:/pycharm/bert4keras/examples/task_relation_extraction.py", line 272, in
R = set([SPO(spo) for spo in extract_spoes(d['text'])])
File "D:/pycharm/bert4keras/examples/task_relation_extraction.py", line 251, in init
super(SPO, self).init(spo)
TypeError: object.init() takes no parameters
0it [00:03, ?it/s]

Saving the model in task_sentiment_albert.py

Saving only the weights works fine, and the weights can be converted to pb without problems.
But I'm still curious: after saving the model with model.save and loading it with load_model, with custom_objects provided and get_config added to the layers, I still get a shape-mismatch error. Is saving the full model simply not feasible? Is it related to the input shape being None?

Error when loading the model


ValueError: You called set_weights(weights) on layer "Encoder-1-MultiHeadSelfAttention" with a weight list of length 8, but the layer was expecting 0 weights. Provided weights: [array([[ 0.03122838, 0.04661432, 0.00716374, .....

I have tried the two pre-trained models from https://github.com/brightmart/albert_zh:
albert_base_zh (additionally trained on 150 million examples, i.e. 36k steps * batch_size 4096);
albert_base_zh (small trial version)

Both report a similar error. Does the author know what the cause is?

tf.keras compatibility issue

Hello, thank you for the tool you provide!

I've run into a problem and I'm not sure whether it is a bug; I'd appreciate it if you could take a look.

When testing the import, just running the following imports

import os
os.environ['TF_KERAS'] = '1'
from bert4keras.bert import load_pretrained_model, set_gelu
from bert4keras.utils import SimpleTokenizer, load_vocab
from bert4keras.train import PiecewiseLinearLearningRate
set_gelu('tanh')

produces the error:
File "/home/flydsc/anaconda3/envs/main_work/lib/python3.7/site-packages/bert4keras/bert.py", line 4, in
from .layers import *
File "/home/flydsc/anaconda3/envs/main_work/lib/python3.7/site-packages/bert4keras/layers.py", line 60, in
class OurLayer(Layer):
NameError: name 'Layer' is not defined

But when I remove the first two lines that set the environment variable, i.e.:

from bert4keras.bert import load_pretrained_model, set_gelu
from bert4keras.utils import SimpleTokenizer, load_vocab
from bert4keras.train import PiecewiseLinearLearningRate
set_gelu('tanh')

the import succeeds.

My guess is that tensorflow.keras has a small compatibility issue with globals().update(keras.layers.__dict__)?

My environment is
python 3.7
TensorFlow 1.14.0

Thanks again for your generosity.

Problems using a TPU with the Semantic example

I read in your blog that tf.keras can train with multiple GPUs or with a TPU.
So I made the following changes to the Semantic example. My environment is the TPU that comes with Colab,
tensorflow 1.15
bert4keras 0.3.4

import os
os.environ['TF_KERAS'] = '1'

import json
import numpy as np
import codecs
import tensorflow as tf
import tensorflow.keras as keras
import tensorflow.keras.backend as K
from bert4keras.backend import set_gelu
from bert4keras.tokenizer import Tokenizer
from bert4keras.bert import build_bert_model
from bert4keras.optimizers import Adam, extend_with_piecewise_linear_lr
from bert4keras.snippets import sequence_padding, get_all_attributes

locals().update(get_all_attributes(keras.layers))
set_gelu('tanh')

### middle part identical to the example, unchanged ###

# TF1 TPU
resolver = tf.contrib.cluster_resolver.TPUClusterResolver(tpu='grpc://' + os.environ['COLAB_TPU_ADDR'])
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.experimental.TPUStrategy(resolver)

# Adam Learning Rate
AdamLR = extend_with_piecewise_linear_lr(Adam)

with strategy.scope():
  # load the pre-trained model
  bert = build_bert_model(
      config_path=config_path,
      checkpoint_path=checkpoint_path,
      with_pool=True,
      albert=True,
      return_keras_model=False,
  )
  output = Dropout(rate=0.1)(bert.model.output)
  output = Dense(units=2,
                 activation='softmax',
                 kernel_initializer=bert.initializer)(output)  
  model = keras.models.Model(bert.model.input, output)
  model.compile(
      loss='sparse_categorical_crossentropy',
      # optimizer=Adam(1e-5),  # a sufficiently small learning rate
      optimizer=AdamLR(learning_rate=1e-4,lr_schedule={1000: 1, 2000: 0.1}),
      metrics=['accuracy'])

model.summary()

### middle part identical to the example, unchanged ###

model.fit_generator(train_generator.forfit(),
                    steps_per_epoch=len(train_generator),
                    epochs=10,
                    callbacks=[evaluator])

At the fit_generator step it fails with: fit_generator is not supported for models compiled with tf.distribute.Strategy.

From what I've read, fit can also take a generator, but the Keras documentation doesn't explain how to set the arguments.
Or is it mandatory to use tf.data?

Does fine-tuning need regularization on the last layer?

Looking at the bert_keras code, the last layer is just a fully connected (Dense) layer. Should dropout or L2 regularization be added? Only on the final Dense layer, or can regularization also be added to the BERT layers being fine-tuned?
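For comparison, the common pattern (also visible in the TPU example earlier in this list of issues) is a single Dropout on the pooled output before the final Dense; a hedged sketch, where the 768-dim input and the 2 classes are stand-ins rather than recommendations:

from keras.layers import Input, Dropout, Dense
from keras.models import Model

pooled_output = Input(shape=(768,))                      # stand-in for BERT's pooled output
x = Dropout(rate=0.1)(pooled_output)                     # light regularization on the pooled vector
probs = Dense(units=2, activation='softmax')(x)          # 2 classes as an example
clf_head = Model(pooled_output, probs)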

seq2seq example error

I made a simple change to the read_texts part of the task_seq2seq.py script so that it takes my data;
nothing else in the source was modified.
Environment: python 3.6.4, Keras 2.3.1, Tensorflow 2.0, bert4keras 0.2.6, OS Ubuntu.
There are two problems:
1. With the tf 2.0 backend (environment variable set as documented),
the model builds successfully, but the add_loss part fails:
2019-11-14 17:57:47.522304: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Invalid argument: You must feed a value for placeholder tensor 'Input-Segment' with dtype float and shape [?,?]
[[{{node Input-Segment}}]]
Traceback (most recent call last):
File "task_seq2seq.py", line 159, in
model.add_loss(cross_entropy)
I can't paste the error message out of my intranet; the above was typed by hand, so I'll stop there.

2. With the keras backend, the model build does not go through (no summary is printed).
The error message is:
TypeError: int() argument must be a string, a bytes-like object or a number, not 'Tensor'
......
During handling of the above exception, another exception occurred
.....
ValueError: Duplicate node name in graph: 'Attention-Mask/ones/packed'

I hope to get some help; please tell me if you need any other information.

Error when building the model: TypeError: Expected float32 passed to parameter 'y' of op 'Equal', got 'history_only' of type 'str' instead. Error: Expected float32, got 'history_only' of type 'str' instead.

Hello, when building the seq2seq BERT I get
TypeError: Expected float32 passed to parameter 'y' of op 'Equal', got 'history_only' of type 'str' instead. Error: Expected float32, got 'history_only' of type 'str' instead.
The environment is python 3.6 and tensorflow 2.0.
The type error comes from this code in layer.py:
if a_mask is not None:
    if a_mask == 'history_only':
        ones = K.ones_like(a[:1])
        a_mask = (ones - tf.linalg.band_part(ones, -1, 0)) * 1e12
        a = a - a_mask
    else:
        a = a - (1 - a_mask) * 1e12
Presumably a_mask is a tensor and cannot be compared with a string, so I changed
if a_mask == 'history_only':
to
if isinstance(a_mask, str):
and the model then builds successfully.

Subtracting a large positive number at the padding positions

I'd like to ask: in the attention matrix, why subtract a large positive number at the padding positions? This makes those entries of the attention matrix very negative. Would choosing mode=0 not work as well?

a = tf.einsum('bjhd,bkhd->bhjk', qw, kw) / self.key_size**0.5
a = sequence_masking(a, v_mask, 1, -1)

Thanks, Su!
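A quick numeric illustration of the effect (plain numpy, unrelated to the library code): after softmax, a position offset by a huge negative number gets essentially zero attention weight, which is exactly what masking the padding is supposed to do.

import numpy as np

scores = np.array([2.0, 1.0, -1e12])      # last position is padding
weights = np.exp(scores - scores.max())
weights /= weights.sum()
print(weights)                             # ~[0.73, 0.27, 0.00]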

To stay compatible with Keras's native mask, I suggest passing a_mask as an input and using the native mask for the mask argument

token_embedding = Embedding(input_dim=vocab_size,
                            output_dim=hidden_size,
                            mask_zero=True,
                            name='Embedding-Token')
x = MultiHeadAttention(heads=num_attention_heads,
                       head_size=attention_head_size,
                       name=attention_name)([x, x, x, a_mask])
def call(self, inputs, mask=None):
    q, k, v, a_mask = inputs
    v_mask = mask[2]
    q_mask = mask[0]

This makes it convenient to stack BERT with the many layers that natively support masking.
