cyberzhg / keras-transformer
Transformer implemented in Keras
Home Page: https://pypi.org/project/keras-transformer/
License: MIT License
Hi there!
Thanks for the package! It is very useful!
I have a problem with the package. When I run it on Google Colab, there is no problem.
But when I test it on my M1 Mac, the kernel dies as soon as it tries to import the package.
Have you seen a problem like this?
Hi, after reading your code I have learned a lot, but I have a question about the decode method.
_get_max_suffix_repeat_times checks for a repeated pattern in the decoder inputs, so line 561 in transformer.py should perhaps be:
_get_max_suffix_repeat_times(decoder_inputs[index_map[i]], max_repeat * max_repeat_block) >= max_repeat:
This may be a bug.
Hello, thanks for all your work. I'd be very grateful if you could help me with this question:
I'm trying to create a multilabel text classification model based off the transformer code. My model looks like this (I'm using a pretrained w2v embedding, so the input receives the vectorized text):
lin = Input((maxlen, w2vWordSize))
trig_pos_embedding = TrigPosEmbedding(mode=TrigPosEmbedding.MODE_ADD, output_dim=w2vWordSize)(lin)
encoders = get_encoders(3, trig_pos_embedding, 5, 120, None, gelu)
flat = Flatten()(encoders)
d = Dense(512, activation='tanh')(flat)
d = Dropout(dov)(d)
lout = Dense(units=model_output_shape, activation='sigmoid')(d)
However, this model always outputs all zeros. Do you have any ideas or tips why this could be happening?
Thank you!
Even when just using the provided example, the prediction portion runs slowly. Is there any way to keep the prediction code warm so that ad hoc predictions run faster?
When trying to import the library, I get the following error:
File "<stdin>", line 1, in <module>
File "C:\Program Files\Python37\lib\site-packages\keras_transformer\__init__.py", line 2, in <module>
from .transformer import *
File "C:\Program Files\Python37\lib\site-packages\keras_transformer\transformer.py", line 4, in <module>
from keras_multi_head import MultiHeadAttention
File "C:\Program Files\Python37\lib\site-packages\keras_multi_head\__init__.py", line 1, in <module>
from .multi_head import MultiHead
File "C:\Program Files\Python37\lib\site-packages\keras_multi_head\multi_head.py", line 6, in <module>
class MultiHead(keras.layers.Wrapper):
AttributeError: module 'keras.layers' has no attribute 'Wrapper'
I am using keras 2.2.5 and tensorflow 2.1.0 on Windows10
When I specify metrics while compiling the model, I can't fit the model on the training data.
@CyberZHG Thanks for sharing. I am using the transformer for a seq2seq task: the input is an article and the model predicts the abstract. After training, I get almost the same output for different inputs. The code is the same as your example, and the data should be correct, because with the same data and an LSTM-based seq2seq model I get proper output.
I hope you can answer. Thanks.
Please update the examples in the README: use feed_forward_activation or attention_activation instead of activation for get_model().
How do I use beam search in this lib?
I've tried some approaches to disable teacher forcing, but your code is too complicated for me (I am a beginner).
Could you please suggest how to do that?
Thanks for sharing. My code is the same as your example; the only difference is the data. As a demo, I fed 2000 Chinese-English sentence pairs to train a translation model. After training, I get the same 'decoded' output whatever data I input. Here's an example:
encode_input : [1, 5, 4, 2, 0, 0, 0, 0, 0, 0]
decode_input : [1, 3, 4, 2, 0, 0, 0, 0, 0, 0]
decode_output : [[3], [4], [2], [0], [0], [0], [0], [0], [0], [0]]
decoded : [1, 9, 204, 4, 4, 2]
And here's the part of the code:
def generate_train(batch_size):
    steps = 0
    while True:
        batch_out = decode_output[steps:steps+batch_size]
        batch_eng = decode_input[steps:steps+batch_size]
        batch_cns = encode_input[steps:steps+batch_size]
        yield [np.array(batch_cns), np.array(batch_eng)], np.array(batch_out)
        steps += batch_size
        if steps == 2000:
            steps = 0

model.fit_generator(generate_train(batch_size=100),
                    steps_per_epoch=20,
                    epochs=10,
                    verbose=1,
                    # callbacks=callbacks_list,
                    # validation_data=generate_test(batch_size=100),
                    # validation_steps=200,
                    class_weight=None,
                    max_queue_size=5
                    # workers=1,
                    # use_multiprocessing=False,
                    # shuffle=False,
                    # initial_epoch=initial_epoch_
                    )
I am trying to save and re-load the model for future use in other apps, however there is an error which I cannot work around.
This is the code:
import keras
import numpy as np
from keras_transformer import get_custom_objects, get_model, decode
tokens = 'all work and no play makes jack a dull boy'.split(' ')
token_dict = {
    '<PAD>': 0,
    '<START>': 1,
    '<END>': 2,
}
for token in tokens:
    if token not in token_dict:
        token_dict[token] = len(token_dict)
encoder_inputs_no_padding = []
encoder_inputs, decoder_inputs, decoder_outputs = [], [], []
for i in range(1, len(tokens) - 1):
    encode_tokens, decode_tokens = tokens[:i], tokens[i:]
    encode_tokens = ['<START>'] + encode_tokens + ['<END>'] + ['<PAD>'] * (len(tokens) - len(encode_tokens))
    output_tokens = decode_tokens + ['<END>', '<PAD>'] + ['<PAD>'] * (len(tokens) - len(decode_tokens))
    decode_tokens = ['<START>'] + decode_tokens + ['<END>'] + ['<PAD>'] * (len(tokens) - len(decode_tokens))
    encode_tokens = list(map(lambda x: token_dict[x], encode_tokens))
    decode_tokens = list(map(lambda x: token_dict[x], decode_tokens))
    output_tokens = list(map(lambda x: [token_dict[x]], output_tokens))
    encoder_inputs_no_padding.append(encode_tokens[:i + 2])
    encoder_inputs.append(encode_tokens)
    decoder_inputs.append(decode_tokens)
    decoder_outputs.append(output_tokens)
model = get_model(
    token_num=len(token_dict),
    embed_dim=30,
    encoder_num=3,
    decoder_num=2,
    head_num=3,
    hidden_dim=120,
    attention_activation='relu',
    feed_forward_activation='relu',
    dropout_rate=0.05,
    embed_weights=np.random.random((13, 30)),
)
model.compile(
    optimizer=keras.optimizers.Adam(),
    loss=keras.losses.sparse_categorical_crossentropy,
    metrics={})
model.fit(
    x=[np.asarray(encoder_inputs * 500), np.asarray(decoder_inputs * 500)],
    y=np.asarray(decoder_outputs * 500),
    epochs=3,
)
model.save('saved_model.hdf5')
model = keras.models.load_model('saved_model.hdf5')
and the error that I get is:
Traceback (most recent call last):
File "example.py", line 56, in <module>
model = keras.models.load_model('saved_model.hdf5')
File "C:\Program Files\Python37\lib\site-packages\keras\engine\saving.py", line 419, in load_model
model = _deserialize_model(f, custom_objects, compile)
File "C:\Program Files\Python37\lib\site-packages\keras\engine\saving.py", line 225, in _deserialize_model
model = model_from_config(model_config, custom_objects=custom_objects)
File "C:\Program Files\Python37\lib\site-packages\keras\engine\saving.py", line 458, in model_from_config
return deserialize(config, custom_objects=custom_objects)
File "C:\Program Files\Python37\lib\site-packages\keras\layers\__init__.py", line 55, in deserialize
printable_module_name='layer')
File "C:\Program Files\Python37\lib\site-packages\keras\utils\generic_utils.py", line 145, in deserialize_keras_object
list(custom_objects.items())))
File "C:\Program Files\Python37\lib\site-packages\keras\engine\network.py", line 1022, in from_config
process_layer(layer_data)
File "C:\Program Files\Python37\lib\site-packages\keras\engine\network.py", line 1008, in process_layer
custom_objects=custom_objects)
File "C:\Program Files\Python37\lib\site-packages\keras\layers\__init__.py", line 55, in deserialize
printable_module_name='layer')
File "C:\Program Files\Python37\lib\site-packages\keras\utils\generic_utils.py", line 138, in deserialize_keras_object
': ' + class_name)
ValueError: Unknown layer: EmbeddingRet
I am using Python 3.7.3, TensorFlow 1.13.1 and keras 2.2.4
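The traceback ends with deserialize_keras_object failing to resolve the class name EmbeddingRet. keras.models.load_model accepts a custom_objects mapping for exactly this situation, and since the script above already imports get_custom_objects from keras_transformer, passing custom_objects=get_custom_objects() to load_model is presumably the intended workaround (an assumption, not tested here). The toy sketch below uses no Keras at all, and every name in it (deserialize_layer, BUILTIN_LAYERS, the stand-in classes) is hypothetical; it only illustrates why a name-to-class mapping is needed when a model is rebuilt from a saved config.

```python
# Toy illustration only: this is NOT Keras code. All names are hypothetical.

class Dense:
    """Stand-in for a built-in layer the loader already knows about."""
    def __init__(self, units):
        self.units = units


class EmbeddingRet:
    """Stand-in for a custom layer defined outside the core library."""
    def __init__(self, output_dim):
        self.output_dim = output_dim


BUILTIN_LAYERS = {'Dense': Dense}


def deserialize_layer(config, custom_objects=None):
    """Rebuild a layer from {'class_name': ..., 'config': {...}}."""
    registry = dict(BUILTIN_LAYERS)
    registry.update(custom_objects or {})
    class_name = config['class_name']
    if class_name not in registry:
        raise ValueError('Unknown layer: ' + class_name)
    return registry[class_name](**config['config'])


saved = {'class_name': 'EmbeddingRet', 'config': {'output_dim': 30}}

try:
    deserialize_layer(saved)
except ValueError as err:
    print(err)  # Unknown layer: EmbeddingRet

layer = deserialize_layer(saved, custom_objects={'EmbeddingRet': EmbeddingRet})
print(type(layer).__name__, layer.output_dim)  # EmbeddingRet 30
```

Without the mapping, the loader has only the string 'EmbeddingRet' and no way to find the class; with it, the lookup succeeds.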
The code in transformer.py, line 561:
_get_max_suffix_repeat_times(decoder_inputs, max_repeat * max_repeat_block) >= max_repeat:
should be corrected to:
_get_max_suffix_repeat_times(decoder_inputs[index_map[i]], max_repeat * max_repeat_block) >= max_repeat:
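To make the difference concrete, here is a minimal sketch of a suffix-repeat counter. It is a hypothetical re-implementation written for illustration; the name max_suffix_repeat_times and its exact semantics are assumptions, not the library's code. A check like this only makes sense on a single token sequence, which is why indexing the batch with index_map[i] matters.

```python
def max_suffix_repeat_times(tokens, max_len):
    """Return how many times the trailing block of `tokens` repeats
    back-to-back, maximised over block lengths 1..max_len."""
    best = 0
    for block_len in range(1, min(max_len, len(tokens)) + 1):
        block = tokens[-block_len:]
        times = 0
        end = len(tokens)
        # Walk backwards while the preceding window equals the block.
        while end - block_len >= 0 and tokens[end - block_len:end] == block:
            times += 1
            end -= block_len
        best = max(best, times)
    return best


# A single decoded sequence whose tail "7 8" repeats three times.
sequence = [1, 5, 7, 8, 7, 8, 7, 8]
print(max_suffix_repeat_times(sequence, max_len=4))  # 3

# Passing the whole batch compares rows, not tokens, which is not the intent.
batch = [[1, 5, 7, 8, 7, 8, 7, 8], [1, 4, 2]]
print(max_suffix_repeat_times(batch, max_len=4))  # 1
```

In this sketch the per-sequence call detects the repetition, while the whole-batch call never can, because two different rows are never equal to each other.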
Hello, thanks for open-sourcing this. It helps me a lot.
I have a question.
In "Attention Is All You Need", Multi-Head Attention appears in both the encoder and the decoder, and Masked Multi-Head Attention appears only in the decoder. But in your transformer code, I can't find the mask part. Is it missing? If it is there, could you point me to the "Masked" part?
Thanks a lot.
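For reference, the "masked" part of decoder self-attention amounts to preventing position i from attending to positions after i. The sketch below builds such a causal mask in plain Python. It illustrates the idea from the paper only and makes no claim about how or where this library applies it; the function names are hypothetical.

```python
def causal_mask(seq_len):
    """mask[i][j] is 1 if query position i may attend to key position j."""
    return [[1 if j <= i else 0 for j in range(seq_len)]
            for i in range(seq_len)]


def apply_mask(scores, mask, neg=-1e9):
    """Replace disallowed attention scores with a large negative value,
    so that a subsequent softmax gives them (almost) zero weight."""
    return [[s if m else neg for s, m in zip(score_row, mask_row)]
            for score_row, mask_row in zip(scores, mask)]


mask = causal_mask(4)
for row in mask:
    print(row)
# [1, 0, 0, 0]
# [1, 1, 0, 0]
# [1, 1, 1, 0]
# [1, 1, 1, 1]
```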
Thanks for the awesome library.
I tried to run it on a real-world NMT task, the IWSLT 2014 de-en translation task. I trained the model for about 20 epochs, but during inference the trained model failed to generate meaningful sentences. Could you do a sample run on a dataset such as IWSLT or WMT?
Thanks for the package! It is very useful!
When using an environment with these packages, it throws an exception.
python=3.8
tensorflow=2.5.0
keras=2.7.0
keras/layers/core/lambda_layer.py", line 125, in Lambda
@tf.__internal__.tracking.no_automatic_dependency_tracking
AttributeError: module 'tensorflow.compat.v2.__internal__.tracking' has no attribute 'no_automatic_dependency_tracking'
I'm trying to encode multiple sentences into one embedding. First I use a self-attention encoder over the words of each sentence, and then I use another self-attention encoder over the sentence representations to get a final representation. However, when I do this, it doesn't seem to work well. If I remove either the word-level or the sentence-level self-attention encoder (and replace it with just the mean/sum of the embeddings), it does work. I cannot figure out where the problem is. Do you have any theories about what could be going wrong?
Here is a simplified version of my model:
def build_context_encoder(self):
    context = Input(shape=(self.max_num_words_per_context, ), dtype=tf.int64) #num_context x max_num_words_per_context #put back
    #set up word emb
    word_embs = np.zeros((self.vocab_size+1, self.word_emb_dim)) #MASK
    for word in self.word2id_dict:
        ind = self.word2id_dict[word]
        if ind > largest_ind:
            largest_ind = ind
        w_emb = self.word_embedding_model.wv[word]
        word_embs[ind] = w_emb
    emb_layer = EmbeddingRet(
        input_dim=self.vocab_size+1, #is this correct?
        output_dim=self.word_emb_dim,
        mask_zero=True, #is this what we want? Think about this
        weights=[word_embs], #self.embedding_matrix,
        trainable=False,
        name='Encoder-Token-Embedding',
    )
    emb = emb_layer(context)
    encoder_embed = emb[0]
    sa_context_encoder = transformer.get_encoders(self.encoder_num, encoder_embed, self.head_num, self.hidden_dim) #should output batch x num_contexts x emb_size
    context_emb = SumInternal()(sa_context_encoder)
    context_encoder = Model(inputs=[context], outputs=context_emb)
    return context_encoder

def build_model(self):
    contexts = Input(shape=(self.max_num_context, self.max_num_words_per_context, ), dtype=tf.int64) #num_context x num_words
    context_embs = TimeDistributed(self.context_encoder)(contexts)
    context_embs = ZeroVectorMasker()(context_embs) #READD
    aggregator_encoder_out = transformer.get_encoders(self.encoder_num, context_embs, self.head_num, self.hidden_dim)
    final_estimate = MeanInternal()(aggregator_encoder_out)
    model = Model(inputs=[subwords, contexts], outputs=final_estimate)
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=[])
    model.summary()
    return model
Is there any easy way to output attention scores using this library? I know that in ScaledDotProductAttention there is an option to return attention but it seems when this library calls it it is set to false, and I am unsure how to change this without changing the code itself (adapting each layer to handle the returned attention scores output as well). Is there any easy way?
Describe the Bug
In the decode() loop in transformer.py I found what I believe to be a bug:
The encoder is fed new information at every step. An encoder should be fed the input sequence only once, to generate the encoded representation, which is then decoded step by step.
Edit:
I think the proper architecture would be to split the encoder and the decoder into two models.
Then, at inference time, you would run the encoder model once to get the representation.
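The proposed split can be sketched with plain Python callables standing in for the two models. Everything here (encode, decode_step, greedy_decode) is a hypothetical stand-in, not this library's API; the point is only that the encoder runs once while the decoder runs once per generated token.

```python
def greedy_decode(encode, decode_step, source, start_token, end_token, max_len):
    """Run `encode` once, then call `decode_step` until the end token or max_len."""
    memory = encode(source)            # encoder runs exactly once
    output = [start_token]
    while len(output) < max_len:
        next_token = decode_step(memory, output)  # decoder runs per step
        output.append(next_token)
        if next_token == end_token:
            break
    return output


# Toy stand-ins: the "encoder" copies the input, the "decoder" echoes it back.
calls = {'encode': 0, 'decode_step': 0}


def encode(source):
    calls['encode'] += 1
    return list(source)


def decode_step(memory, output_so_far):
    calls['decode_step'] += 1
    position = len(output_so_far) - 1
    return memory[position] if position < len(memory) else 2  # 2 = <END>


result = greedy_decode(encode, decode_step, [5, 6, 7], start_token=1,
                       end_token=2, max_len=10)
print(result)  # [1, 5, 6, 7, 2]
print(calls)   # {'encode': 1, 'decode_step': 4}
```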
There are two string columns in my dataset. First column is typo_keyword, the second column is keyword.
I want to train a deep learning model that gets typo_keyword as features and predicts the keyword.
Here is some sample data:
# Sample data
data = [
    # typo_keyword, keyword
    ('rkk mont', 'erkek mont'),
    ('erkk mont', 'erkek mont'),
    ('rkk mnt', 'erkek mont'),
    ('akllı sat', 'akıllı saat'),
    ('akıllı saaat', 'akıllı saat'),
    ('aklı saat', 'akıllı saat')
]
Using your library, I want to train a transformer model with a character-level input sequence (my first column) and a word-level output sequence (my second column).
Can you give me model building and model inference code examples?
I need your help.
Best,
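As a starting point for the data side only, the sketch below builds a character-level vocabulary for the typo_keyword column and a word-level vocabulary for the keyword column, then encodes and pads one pair. The special tokens follow the <PAD>/<START>/<END> convention used in the example code quoted in these issues. The helper names (build_vocab, encode) are hypothetical, only three of the sample rows are used, and no model is built here.

```python
data = [
    ('rkk mont', 'erkek mont'),
    ('erkk mont', 'erkek mont'),
    ('akllı sat', 'akıllı saat'),
]

SPECIAL = {'<PAD>': 0, '<START>': 1, '<END>': 2}


def build_vocab(sequences):
    """Map every distinct token to an integer id, after the special tokens."""
    vocab = dict(SPECIAL)
    for sequence in sequences:
        for token in sequence:
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def encode(tokens, vocab, max_len):
    """Wrap in <START>/<END>, map to ids, and right-pad with <PAD>."""
    ids = [vocab['<START>']] + [vocab[t] for t in tokens] + [vocab['<END>']]
    return ids + [vocab['<PAD>']] * (max_len - len(ids))


source_seqs = [list(typo) for typo, _ in data]             # characters
target_seqs = [keyword.split(' ') for _, keyword in data]  # words

source_vocab = build_vocab(source_seqs)
target_vocab = build_vocab(target_seqs)

source_max_len = max(len(s) for s in source_seqs) + 2  # room for <START>/<END>
target_max_len = max(len(t) for t in target_seqs) + 2

print(encode(source_seqs[0], source_vocab, source_max_len))
# [1, 3, 4, 4, 5, 6, 7, 8, 9, 2, 0]
print(encode(target_seqs[0], target_vocab, target_max_len))
# [1, 3, 4, 2]
```

Because the two sides use different vocabularies, a model trained on this data would need separate source and target embeddings.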
Describe the Bug
I hit this in model.fit when trying to add this transformer block to my own model as a Lambda layer.
ValueError: An operation has None for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
After searching, I was told that there may be parameters created in build() that are not used in call(), but I can't locate them.
def tfr(input):
    trans = get_model(
        token_num=3,
        embed_dim=30,
        encoder_num=3,
        decoder_num=3,
        head_num=6,
        hidden_dim=120,
        attention_activation='relu',
        feed_forward_activation='relu',
        dropout_rate=0.05,
        embed_weights=np.random.random((3, 30)),
    )
    T = trans(input)
    T = tf.reshape(T,[-1, 5, 2048, 3])
    return T
......
x = Lambda(tfr, name='trans')([x, x])
tf 1.15
Describe the Bug
Creating a Transformer model fails due to a tuple unpacking error (at least when using eager execution). See below for the stack trace; apparently the error is due to the decoder_embed_layer.
Version Info
Minimal Codes To Reproduce
from keras_transformer import
import tensorflow as tf
tf.enable_eager_execution()
model = get_model(token_num=1000, embed_dim=32, encoder_num=2, decoder_num=2, head_num=4, hidden_dim=128, dropout_rate=0.05)
results in
377 decoder_input = keras.layers.Input(shape=(None,), name='Decoder-Input')
--> 378 decoder_embed, decoder_embed_weights = decoder_embed_layer(decoder_input)
379 decoder_embed = TrigPosEmbedding(
380 mode=TrigPosEmbedding.MODE_ADD,
ValueError: too many values to unpack (expected 2)
Could you support Star-Transformer in this project? Thanks!
Hello, nice to see your code. I am a bit confused about the data processing and hope you can clarify it.
In the following code, I don't understand why the loop does not run through to the last token:
for i in range(1, len(tokens) - 1):
    encode_tokens, decode_tokens = tokens[:i], tokens[i:]
    encode_tokens = ['<START>'] + encode_tokens + ['<END>'] + ['<PAD>'] * (len(tokens) - len(encode_tokens))
    output_tokens = decode_tokens + ['<END>', '<PAD>'] + ['<PAD>'] * (len(tokens) - len(decode_tokens))
Why range(1, len(tokens) - 1) and not range(1, len(tokens))?
Could you explain? I don't understand the purpose of this. Thank you!
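What the two ranges produce can be checked directly. The sketch below uses the sentence from the example code quoted in these issues and prints the last split for each range. It only shows the effect of the choice and does not claim to be the author's reason for it; the helper name splits is hypothetical.

```python
tokens = 'all work and no play makes jack a dull boy'.split(' ')


def splits(stop):
    """Return the (encoder, decoder) token splits for i in range(1, stop)."""
    return [(tokens[:i], tokens[i:]) for i in range(1, stop)]


current = splits(len(tokens) - 1)   # range(1, len(tokens) - 1)
extended = splits(len(tokens))      # range(1, len(tokens))

print(len(current), current[-1][1])    # 8 ['dull', 'boy']
print(len(extended), extended[-1][1])  # 9 ['boy']
```

With range(1, len(tokens) - 1) the decoder side always contains at least two words; the extended range would add one more sample whose decoder side is a single word.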
I have made the following changes in the code to accommodate my own data.
##import packages
import numpy as np
from keras_transformer import get_model, decode
from keras.preprocessing.sequence import pad_sequences
##open my own file
f1=open('/content/drive/My Drive/trans_en-bn_NMT/English','r',errors='ignore').read().split('\n')
f2=open('/content/drive/My Drive/trans_en-bn_NMT/Bangla','r',errors='ignore').read().split('\n')
source_tokens=[i.split () for i in f1]
target_tokens=[i.split () for i in f2]
## Generate dictionaries
def build_token_dict(token_list):
    token_dict = {
        '<PAD>': 0,
        '<START>': 1,
        '<END>': 2,
    }
    for tokens in token_list:
        for token in tokens:
            if token not in token_dict:
                token_dict[token] = len(token_dict)
    return token_dict
source_token_dict = build_token_dict(source_tokens)
target_token_dict = build_token_dict(target_tokens)
target_token_dict_inv = {v: k for k, v in target_token_dict.items()}
# Add special tokens
encode_tokens = [['<START>'] + tokens + ['<END>'] for tokens in source_tokens]
decode_tokens = [['<START>'] + tokens + ['<END>'] for tokens in target_tokens]
output_tokens = [tokens + ['<END>', '<PAD>'] for tokens in target_tokens]
# Padding
source_max_len = max(map(len, encode_tokens))
target_max_len = max(map(len, decode_tokens))
encode_tokens = [tokens + ['<PAD>'] * (source_max_len - len(tokens)) for tokens in encode_tokens]
decode_tokens = [tokens + ['<PAD>'] * (target_max_len - len(tokens)) for tokens in decode_tokens]
output_tokens = [tokens + ['<PAD>'] * (target_max_len - len(tokens)) for tokens in output_tokens]
encode_input = [list(map(lambda x: source_token_dict[x], tokens)) for tokens in encode_tokens]
decode_input = [list(map(lambda x: target_token_dict[x], tokens)) for tokens in decode_tokens]
decode_output = [list(map(lambda x: [target_token_dict[x]], tokens)) for tokens in output_tokens]
#
# Build & fit model
model = get_model(
    token_num=max(len(source_token_dict), len(target_token_dict)),
    embed_dim=32,
    encoder_num=2,
    decoder_num=2,
    head_num=4,
    hidden_dim=128,
    dropout_rate=0.05,
    use_same_embed=False, # Use different embeddings for different languages
)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',metrics=['acc'])
model.summary()
model.fit(
    x=[np.array(encode_input), np.array(decode_input)],
    y=np.array(decode_output),
    epochs=1,
    batch_size=10,
)
# Predict
decoded = decode(
    model,
    np.array(encode_input)[48999:],
    #test_inp,
    start_token=target_token_dict['<START>'],
    end_token=target_token_dict['<END>'],
    pad_token=target_token_dict['<PAD>'],
)
##print the translated sentences
for i in decoded:
    print(''.join(map(lambda x: target_token_dict_inv[x], i[1:-1])))
but when testing, it gives the following error:
UFuncTypeError Traceback (most recent call last)
in ()
72 start_token=target_token_dict['<START>'],
73 end_token=target_token_dict['<END>'],
---> 74 pad_token=target_token_dict['<PAD>'],
75 # top_k=10,
76 # temperature=1.0,
/usr/local/lib/python3.6/dist-packages/keras_transformer/transformer.py in decode(model, tokens, start_token, end_token, pad_token, top_k, temperature, max_len, max_repeat, max_repeat_block)
541 max_input_len = max(max_input_len, len(tokens[i]))
542 for i in range(len(batch_inputs)):
--> 543 batch_inputs[i] += [pad_token] * (max_input_len - len(batch_inputs[i]))
544 predicts = model.predict([np.array(batch_inputs), np.array(batch_outputs)])
545 for i in range(len(predicts)):
UFuncTypeError: Cannot cast ufunc 'add' output from dtype('float64') to dtype('int64') with casting rule 'same_kind'
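The failing line applies += to batch_inputs[i]. If that element is a NumPy integer array rather than a Python list (as it would be when np.array(encode_input)[48999:] is passed to decode), += performs element-wise addition instead of list extension, which would explain the casting error. A likely workaround, offered as an assumption rather than a confirmed fix, is to pass plain lists, for example encode_input[48999:] or the array's .tolist(). The helper below (to_padded_lists is a hypothetical name) shows the list-based padding in isolation.

```python
def to_padded_lists(batch, pad_token):
    """Convert each row to a plain list and right-pad to the longest row.

    Rows may be lists, tuples, or objects with a .tolist() method
    (such as NumPy arrays).
    """
    rows = [row.tolist() if hasattr(row, 'tolist') else list(row)
            for row in batch]
    max_len = max(len(row) for row in rows) if rows else 0
    for row in rows:
        # With plain lists, += extends the row instead of adding numbers.
        row += [pad_token] * (max_len - len(row))
    return rows


print(to_padded_lists([(1, 5, 2), [1, 5, 6, 7, 2]], pad_token=0))
# [[1, 5, 2, 0, 0], [1, 5, 6, 7, 2]]
```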
Hi, thank you for this handy toolbox; it is really helpful.
One issue I ran into is that TensorFlow is downloaded every time I install from PyPI or from this git repository.
I think the reason is that I have the GPU version of TensorFlow installed, whose package name on PyPI is tensorflow-gpu, whereas the dependency files (the requirements file, for example) always name tensorflow.
May I suggest removing common base packages such as tensorflow and keras from the dependencies, since users often have these tools installed already?
When I provide a pre-trained embedding matrix with dimension 128 and choose head_num=3, I get an error. How can I solve it?
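If the error is about an invalid head number (an assumption, since the message is not quoted), the likely cause is that multi-head attention splits the feature dimension evenly across heads, so the embedding dimension must be divisible by head_num, and 128 is not divisible by 3. The snippet below lists the head counts that divide a given dimension; valid_head_nums is a hypothetical helper.

```python
def valid_head_nums(embed_dim):
    """Return every head count that divides embed_dim evenly."""
    return [h for h in range(1, embed_dim + 1) if embed_dim % h == 0]


print(128 % 3)               # 2, so head_num=3 does not fit embed_dim=128
print(valid_head_nums(128))  # [1, 2, 4, 8, 16, 32, 64, 128]
```

Under that assumption, choosing head_num=4 or head_num=8 would fit a 128-dimensional embedding.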
It is not really a bug, but I think changing it to something like max_len=10000 in https://github.com/CyberZHG/keras-transformer/blob/master/keras_transformer/transformer.py#L395 would be reasonable.
I spent half a day debugging the model, trying to find out why it sometimes hangs when predicting with an (under)trained model. Well, the problem is that the end token is sometimes not selected as the next token (https://github.com/CyberZHG/keras-transformer/blob/master/keras_transformer/transformer.py#L428), so the condition in https://github.com/CyberZHG/keras-transformer/blob/master/keras_transformer/transformer.py#L430 is always False.
Another option would be to raise an error when the length of the token list exceeds some reasonable limit.
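Both suggestions amount to bounding the decoding loop. A minimal sketch, with a hypothetical next_token callable standing in for the model and a hypothetical function name, might look like this; it raises instead of looping forever when the end token never appears.

```python
def decode_with_limit(next_token, start_token, end_token, max_len=10000):
    """Generate tokens until end_token, raising if max_len is exceeded."""
    output = [start_token]
    while output[-1] != end_token:
        if len(output) >= max_len:
            raise RuntimeError(
                'end token not produced within %d tokens' % max_len)
        output.append(next_token(output))
    return output


# An "undertrained" stand-in model that never predicts the end token (2).
def always_four(_output):
    return 4


try:
    decode_with_limit(always_four, start_token=1, end_token=2, max_len=5)
except RuntimeError as err:
    print(err)  # end token not produced within 5 tokens
```

Truncating silently at max_len is the other option; raising makes an undertrained model easier to spot during debugging.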
After the stacked decoder layers, I think there should be a linear layer followed by a softmax layer. But in your code, it is:
dense_layer = EmbeddingSim( trainable=trainable, name='Output', )([decoded_layer, decoder_embed_weights])
Why is it done this way?
I'm having trouble figuring out how to use masking with the transformer. I have mask_zero set to true with the embeddings, but I'm not sure if the masking is used in the transformer calculation. Do I need to do anything special to make sure it is taken into account?
For example, if I was putting a sentence through the encoder: (2 3 2 1 0 0 0), and I want the transformer to ignore the zeros, what should I do? (The 0 embedding is a zero vector, if that matters).
I want to input multiple sentences, but have the encoder applied to each sentence individually (to be combined later in the model). Is there an easy way to do this? I've tried using Lambda functions with get_encoders(), but I don't think this is correct, as I want the same set of encoders applied to each sentence.
I have a sequence of inputs:
encoder_inputs = [[11,22,35,43,6,30,0,0,0,0],
[31,43,56,23,45,0,0,0,0]]
decoder_inputs = [[8,3,2,4,5,2,0,0,0,0],
[6,4,5,3,7,4,0,0,0,0]]
decoder_outputs = [[[8],[3],[2],[4],[5],[2],[0],[0],[0],[0]],
[[6],[4],[5],[3],[7],[4],[0],[0],[0],[0]]]
My questions: do I need to specify the start and end of the sequences in my case?
Does keras-transformer already know that the padding is zero?
Finally, when evaluating, do I do this?
model.evaluate(x=[encoder_inputs,decoder_inputs], y=[decoder_outputs])
Thank you
In this project, you use model.fit to train a transformer model. But I need to use model.fit_generator because of the large amount of data. When I write the model, I need to declare the names of the input layers that are defined in the generator function. Can you show an example of the transformer with model.fit_generator? Thanks.
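A generator for this purpose needs to yield ([encoder_batch, decoder_batch], output_batch) tuples indefinitely, with the inputs in the same order as the model's inputs. The sketch below shows only the batching logic, in plain Python with lists. In real use each batch would be wrapped in np.array(...) before being yielded, and the generator would be passed to model.fit_generator with steps_per_epoch set to the number of batches per pass. The function name and the toy data are hypothetical.

```python
def batch_generator(encoder_inputs, decoder_inputs, decoder_outputs, batch_size):
    """Yield ([encoder_batch, decoder_batch], output_batch) forever."""
    total = len(encoder_inputs)
    start = 0
    while True:
        end = start + batch_size
        yield ([encoder_inputs[start:end], decoder_inputs[start:end]],
               decoder_outputs[start:end])
        # Reset once the end of the data is reached or passed, so an
        # uneven final batch cannot skip the reset.
        start = end if end < total else 0


# Toy data: 5 samples, batch size 2 -> batches of 2, 2, 1, then wrap around.
enc = [[1, 3, 2], [1, 4, 2], [1, 5, 2], [1, 6, 2], [1, 7, 2]]
dec = [[1, 8, 2], [1, 9, 2], [1, 8, 2], [1, 9, 2], [1, 8, 2]]
out = [[[8], [2], [0]], [[9], [2], [0]], [[8], [2], [0]],
       [[9], [2], [0]], [[8], [2], [0]]]

gen = batch_generator(enc, dec, out, batch_size=2)
sizes = [len(next(gen)[1]) for _ in range(4)]
print(sizes)  # [2, 2, 1, 2]
```

Comparing the position against the data length, rather than testing for equality with a fixed number, keeps the generator correct when the dataset size is not a multiple of the batch size.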
In the usage example, I was wondering why the encoder and decoder inputs are multiplied by 1000 (or 1024 in the translation example).
Also, is there any intuition as to why the embedding dimension is 30 or above, or what the hidden dimension should be set to?
Can you share an example where it is used as an internal layer in a model besides other keras layers? Many thanks!