cyberzhg / keras-transformer


368.0 12.0 96.0 70 KB

Transformer implemented in Keras

Home Page: https://pypi.org/project/keras-transformer/

License: MIT License

Shell 0.81% Python 99.19%

keras transformer attention translation encoder decoder


keras-transformer's Issues

Installation problem on Mac M1

Hi there!

Thanks for the package! It is very useful!

I have a problem with the package. When I run it on Google Colab, there is no problem.
But when I test it on my Mac M1, the kernel seems to die as soon as it tries to import the package.

Have you seen a problem like this?

Some questions about the decode method

Hi, I have learned a lot from reading your code, but I have some questions about the decode method.

I guess the method _get_max_suffix_repeat_times checks for a repeated pattern in decoder_inputs, so line 561 in transformer.py should probably be _get_max_suffix_repeat_times(decoder_inputs[index_map[i]], max_repeat * max_repeat_block) >= max_repeat:. This may be a bug.
The beam search implementation in the decode method makes a random choice from the Gibbs distribution, which differs from the usual beam search. I think beam search should keep the top-k results at each decoding step and finally return the top-k results for every input.
Best wishes.
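
The second point can be made concrete with a minimal sketch of a conventional beam search, which keeps the top-k partial sequences at each step instead of sampling. This is not the library's decode function: `beam_search`, `step_log_probs`, and `toy_model` are hypothetical names, and the toy model stands in for a trained network.

```python
import math


def beam_search(step_log_probs, start_token, end_token, beam_width=3, max_len=10):
    """Keep the `beam_width` best partial sequences at every step.

    `step_log_probs(seq)` is a hypothetical callback returning a dict
    {token: log_probability} for the next token given the prefix `seq`.
    """
    beams = [([start_token], 0.0)]  # (sequence, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for token, log_p in step_log_probs(seq).items():
                candidates.append((seq + [token], score + log_p))
        # Deterministic top-k selection instead of random sampling.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:
            if seq[-1] == end_token:
                finished.append((seq, score))
            else:
                beams.append((seq, score))
        if not beams:
            break
    finished.extend(beams)  # hypotheses that reached max_len without ending
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished[:beam_width]


def toy_model(seq):
    # Toy distribution: token 3 is most likely first, then the end token 2.
    if len(seq) == 1:
        return {3: math.log(0.6), 4: math.log(0.3), 2: math.log(0.1)}
    return {2: math.log(0.7), 3: math.log(0.2), 4: math.log(0.1)}


results = beam_search(toy_model, start_token=1, end_token=2, beam_width=2)
best_sequence = results[0][0]
```

With a real model, `step_log_probs` would wrap a call to the network's predict function for the current prefix.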

Using get_encoders in my model, output is all zeros

Hello, thanks for all your work. I'd be very grateful if you could help me with this question:
I'm trying to create a multilabel text classification model based off the transformer code. My model looks like this (I'm using a pretrained w2v embedding, so the input receives the vectorized text):

lin = Input((maxlen, w2vWordSize))  
trig_pos_embedding = TrigPosEmbedding(mode=TrigPosEmbedding.MODE_ADD, output_dim=w2vWordSize)(lin)  
encoders = get_encoders(3, trig_pos_embedding, 5, 120, None, gelu)  
flat = Flatten()(encoders)  
d = Dense(512, activation='tanh')(flat)  
d = Dropout(dov)(d)  
lout = Dense(units=model_output_shape, activation='sigmoid')(d)

However, this model always outputs all zeros. Do you have any ideas or tips why this could be happening?
Thank you!

Slow prediction

Even with just the provided example, the prediction step runs slowly. Is there any way to keep the prediction code warm so that ad hoc predictions run faster?

module 'keras.layers' has no attribute 'Wrapper'

When trying to import the library, I get the following error:

  File "<stdin>", line 1, in <module>
  File "C:\Program Files\Python37\lib\site-packages\keras_transformer\__init__.py", line 2, in <module>
    from .transformer import *
  File "C:\Program Files\Python37\lib\site-packages\keras_transformer\transformer.py", line 4, in <module>
    from keras_multi_head import MultiHeadAttention
  File "C:\Program Files\Python37\lib\site-packages\keras_multi_head\__init__.py", line 1, in <module>
    from .multi_head import MultiHead
  File "C:\Program Files\Python37\lib\site-packages\keras_multi_head\multi_head.py", line 6, in <module>
    class MultiHead(keras.layers.Wrapper):
AttributeError: module 'keras.layers' has no attribute 'Wrapper'

I am using Keras 2.2.5 and TensorFlow 2.1.0 on Windows 10.

Using an accuracy metric in compile

When I specify an accuracy metric in compile, I can't fit the model on the training data.

Why do I get the same output with different inputs?

@CyberZHG Thanks for sharing. I use the transformer for a seq2seq task: the input is an article and the model predicts its abstract. After training, I get almost the same output for different inputs. The code is the same as your example, and the data should be correct, because with the same data and an LSTM block as the seq2seq model I get proper output.
Hoping for your answer, thanks.

unexpected keyword argument 'activation'

Please update the examples in the README: get_model() takes feed_forward_activation or attention_activation instead of activation.

Beam search

How do I use beam search in this library?

How can I adjust the model to disable teacher forcing?

I've tried a few approaches to disabling teacher forcing, but your code seems too complicated for me (I am a beginner).

Could you please suggest how to do that?

Why do I get the same output with different inputs?

Thanks for sharing. The code is the same as your example; the only difference is the data. As a demo, I fed 2000 Chinese-English sentence pairs to train a translation model. After training, I get the same 'decoded' output whatever data I input. Here is an example:
encode_input : [1, 5, 4, 2, 0, 0, 0, 0, 0, 0]
decode_input : [1, 3, 4, 2, 0, 0, 0, 0, 0, 0]
decode_output : [[3], [4], [2], [0], [0], [0], [0], [0], [0], [0]]
decoded : [1, 9, 204, 4, 4, 2]
And here is part of the code:

def generate_train(batch_size):
   steps=0
   while True:
       batch_out = decode_output[steps:steps+batch_size]
       batch_eng = decode_input[steps:steps+batch_size]
       batch_cns = encode_input[steps:steps+batch_size]
       yield [np.array(batch_cns), np.array(batch_eng)], np.array(batch_out)
       steps += batch_size
       if steps == 2000:
           steps = 0
model.fit_generator(generate_train(batch_size=100), 
                   steps_per_epoch=20, 
                   epochs=10, 
                   verbose=1, 
#                     callbacks=callbacks_list, 
#                     validation_data=generate_test(batch_size=100), 
#                     validation_steps=200, 
                   class_weight=None, 
                   max_queue_size=5 
#                     workers=1, 
#                     use_multiprocessing=False,
#                     shuffle=False,
#                     initial_epoch=initial_epoch_
                   )

ValueError: Unknown layer: EmbeddingRet

I am trying to save and re-load the model for future use in other apps, however there is an error which I cannot work around.

This is the code:

import keras
import numpy as np
from keras_transformer import get_custom_objects, get_model, decode


tokens = 'all work and no play makes jack a dull boy'.split(' ')
token_dict = {
    '<PAD>': 0,
    '<START>': 1,
    '<END>': 2,
}
for token in tokens:
    if token not in token_dict:
        token_dict[token] = len(token_dict)

encoder_inputs_no_padding = []
encoder_inputs, decoder_inputs, decoder_outputs = [], [], []
for i in range(1, len(tokens) - 1):
    encode_tokens, decode_tokens = tokens[:i], tokens[i:]
    encode_tokens = ['<START>'] + encode_tokens + ['<END>'] + ['<PAD>'] * (len(tokens) - len(encode_tokens))
    output_tokens = decode_tokens + ['<END>', '<PAD>'] + ['<PAD>'] * (len(tokens) - len(decode_tokens))
    decode_tokens = ['<START>'] + decode_tokens + ['<END>'] + ['<PAD>'] * (len(tokens) - len(decode_tokens))
    encode_tokens = list(map(lambda x: token_dict[x], encode_tokens))
    decode_tokens = list(map(lambda x: token_dict[x], decode_tokens))
    output_tokens = list(map(lambda x: [token_dict[x]], output_tokens))
    encoder_inputs_no_padding.append(encode_tokens[:i + 2])
    encoder_inputs.append(encode_tokens)
    decoder_inputs.append(decode_tokens)
    decoder_outputs.append(output_tokens)

model = get_model(
    token_num=len(token_dict),
    embed_dim=30,
    encoder_num=3,
    decoder_num=2,
    head_num=3,
    hidden_dim=120,
    attention_activation='relu',
    feed_forward_activation='relu',
    dropout_rate=0.05,
    embed_weights=np.random.random((13, 30)),
)
model.compile(
    optimizer=keras.optimizers.Adam(),
    loss=keras.losses.sparse_categorical_crossentropy,
    metrics={})

model.fit(
    x=[np.asarray(encoder_inputs * 500), np.asarray(decoder_inputs * 500)],
    y=np.asarray(decoder_outputs * 500),
    epochs=3,
)

model.save('saved_model.hdf5')

model = keras.models.load_model('saved_model.hdf5')

and the error that I get is:

Traceback (most recent call last):
  File "example.py", line 56, in <module>
    model = keras.models.load_model('saved_model.hdf5')
  File "C:\Program Files\Python37\lib\site-packages\keras\engine\saving.py", line 419, in load_model
    model = _deserialize_model(f, custom_objects, compile)
  File "C:\Program Files\Python37\lib\site-packages\keras\engine\saving.py", line 225, in _deserialize_model
    model = model_from_config(model_config, custom_objects=custom_objects)
  File "C:\Program Files\Python37\lib\site-packages\keras\engine\saving.py", line 458, in model_from_config
    return deserialize(config, custom_objects=custom_objects)
  File "C:\Program Files\Python37\lib\site-packages\keras\layers\__init__.py", line 55, in deserialize
    printable_module_name='layer')
  File "C:\Program Files\Python37\lib\site-packages\keras\utils\generic_utils.py", line 145, in deserialize_keras_object
    list(custom_objects.items())))
  File "C:\Program Files\Python37\lib\site-packages\keras\engine\network.py", line 1022, in from_config
    process_layer(layer_data)
  File "C:\Program Files\Python37\lib\site-packages\keras\engine\network.py", line 1008, in process_layer
    custom_objects=custom_objects)
  File "C:\Program Files\Python37\lib\site-packages\keras\layers\__init__.py", line 55, in deserialize
    printable_module_name='layer')
  File "C:\Program Files\Python37\lib\site-packages\keras\utils\generic_utils.py", line 138, in deserialize_keras_object
    ': ' + class_name)
ValueError: Unknown layer: EmbeddingRet

I am using Python 3.7.3, TensorFlow 1.13.1 and keras 2.2.4
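
Keras's load_model accepts a custom_objects argument, and the script above already imports get_custom_objects from keras_transformer, so passing custom_objects=get_custom_objects() to load_model is the likely remedy (assuming that function returns the library's custom layers, which its name suggests). The mechanism behind the error can be sketched with a toy deserializer. Everything below is a hypothetical stand-in, not Keras code.

```python
BUILTIN_LAYERS = {"Dense": dict, "Dropout": dict}  # stand-ins for built-in classes


class EmbeddingRet:
    """Stand-in for a custom layer class that the loader does not know about."""

    def __init__(self, **config):
        self.config = config


def deserialize(layer_config, custom_objects=None):
    """Look the class name up in the built-ins, then in custom_objects."""
    registry = dict(BUILTIN_LAYERS)
    registry.update(custom_objects or {})
    class_name = layer_config["class_name"]
    if class_name not in registry:
        raise ValueError("Unknown layer: " + class_name)
    return registry[class_name](**layer_config["config"])


saved = {"class_name": "EmbeddingRet", "config": {"output_dim": 30}}

# Without custom_objects the class name cannot be resolved.
try:
    deserialize(saved)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Supplying the mapping resolves the name.
layer = deserialize(saved, custom_objects={"EmbeddingRet": EmbeddingRet})
```

The saved file stores only class names, so the loader needs a name-to-class mapping for anything outside its built-in layers.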

The input of function "_get_max_suffix_repeat_times" is not correct.

The code in transformer.py, line 561:
_get_max_suffix_repeat_times(decoder_inputs, max_repeat * max_repeat_block) >= max_repeat:

should be corrected to:

_get_max_suffix_repeat_times(decoder_inputs[index_map[i]], max_repeat * max_repeat_block) >= max_repeat:
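
To see why the argument matters, here is a minimal brute-force sketch of a suffix-repeat counter. It is a hypothetical re-implementation for illustration (`max_suffix_repeat_times` is not the library's function, and the library's algorithm may differ); it only shows that a single sequence and a whole batch give different answers.

```python
def max_suffix_repeat_times(tokens, max_len):
    """Largest number of back-to-back repetitions of any block at the end of
    `tokens`, looking at most `max_len` items back."""
    window = tokens[-max_len:] if max_len > 0 else []
    best = 1 if window else 0
    for block in range(1, len(window) // 2 + 1):
        pattern = window[-block:]
        times = 1
        # Walk backwards one block at a time while the block keeps repeating.
        while ((times + 1) * block <= len(window)
               and window[-(times + 1) * block:len(window) - times * block] == pattern):
            times += 1
        best = max(best, times)
    return best


batch = [[1, 3, 3, 3], [1, 5, 6]]

# Called on one sequence, the trailing "3 3 3" is detected.
per_sequence = max_suffix_repeat_times(batch[0], 10)

# Called on the whole batch, the "tokens" are entire rows, so nothing repeats.
whole_batch = max_suffix_repeat_times(batch, 10)
```

Passing the batch compares rows with rows rather than tokens with tokens, which is the mismatch the issue describes.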

Is there only Multi-Head Attention?

Hello, thanks for open-sourcing this. It helps me a lot.

I have some questions.

In "Attention Is All You Need", Multi-Head Attention appears in both the encoder and the decoder, while Masked Multi-Head Attention appears only in the decoder. In your transformer code, I could not find the mask part. Am I wrong? If it is there, could you point me to the "Masked" part?
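
For reference, the masking the paper describes can be sketched as follows: each decoder position may attend only to itself and earlier positions. This is a plain-Python illustration with hypothetical helper names, not code from keras-transformer, and it does not show where the library applies its mask.

```python
import math


def causal_mask(length):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(length)] for i in range(length)]


def masked_softmax(scores, mask_row):
    """Softmax over one row of attention scores, ignoring masked positions."""
    kept = [s for s, keep in zip(scores, mask_row) if keep]
    peak = max(kept)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) if keep else 0.0 for s, keep in zip(scores, mask_row)]
    total = sum(exps)
    return [e / total for e in exps]


mask = causal_mask(4)
# With uniform raw scores, each position spreads its attention evenly
# over itself and the positions before it.
weights = [masked_softmax([0.0] * 4, row) for row in mask]
```

The first row attends only to position 0, and the last row attends to all four positions equally.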

Thanks a lot.

Experiments on real-world examples

Thanks for the awesome library.
I tried to run it on a real-world NMT task, the IWSLT 2014 de-en translation task. I trained the model for about 20 epochs, but during inference the trained model failed to generate meaningful sentences. Could you provide a sample run on a dataset such as IWSLT or WMT?

What are the required versions of TensorFlow and Keras for this project?

Thanks for the package! It is very useful!

When I use an environment with the following packages, it throws an exception.

python=3.8
tensorflow=2.5.0
keras=2.7.0

keras/layers/core/lambda_layer.py", line 125, in Lambda
    @tf.__internal__.tracking.no_automatic_dependency_tracking
AttributeError: module 'tensorflow.compat.v2.__internal__.tracking' has no attribute 'no_automatic_dependency_tracking'

Hierarchy of encoders not working

I'm trying to encode multiple sentences into one embedding. First I apply a self-attention encoder to the words of each sentence, and then I apply another self-attention encoder to the sentence representations to get a final representation. However, when I do this, it doesn't seem to work well. If I remove either the word-level or the sentence-level self-attention encoder (and replace it with just the mean/sum of the embeddings), it does work. I cannot figure out where the problem is. Do you have any theories about what could be going wrong?

Here is a simplified version of my model:

def build_context_encoder(self):
    # num_context x max_num_words_per_context  # put back
    context = Input(shape=(self.max_num_words_per_context,), dtype=tf.int64)

    # set up word emb
    word_embs = np.zeros((self.vocab_size + 1, self.word_emb_dim))  # MASK

    largest_ind = -1  # initialized here; the original snippet read it before assignment
    for word in self.word2id_dict:
        ind = self.word2id_dict[word]
        if ind > largest_ind:
            largest_ind = ind
        w_emb = self.word_embedding_model.wv[word]
        word_embs[ind] = w_emb

    emb_layer = EmbeddingRet(
        input_dim=self.vocab_size + 1,  # is this correct?
        output_dim=self.word_emb_dim,
        mask_zero=True,  # is this what we want?  Think about this
        weights=[word_embs],  # self.embedding_matrix,
        trainable=False,
        name='Encoder-Token-Embedding',
    )
    emb = emb_layer(context)
    encoder_embed = emb[0]
    # should output batch x num_contexts x emb_size
    sa_context_encoder = transformer.get_encoders(self.encoder_num, encoder_embed, self.head_num, self.hidden_dim)
    context_emb = SumInternal()(sa_context_encoder)
    context_encoder = Model(inputs=[context], outputs=context_emb)
    return context_encoder

def build_model(self):
    # num_context x num_words
    contexts = Input(shape=(self.max_num_context, self.max_num_words_per_context), dtype=tf.int64)

    context_embs = TimeDistributed(self.context_encoder)(contexts)

    context_embs = ZeroVectorMasker()(context_embs)  # READD

    aggregator_encoder_out = transformer.get_encoders(self.encoder_num, context_embs, self.head_num, self.hidden_dim)

    final_estimate = MeanInternal()(aggregator_encoder_out)
    # NOTE: `subwords` is not defined in this simplified snippet.
    model = Model(inputs=[subwords, contexts], outputs=final_estimate)
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=[])
    model.summary()
    return model

Easiest way to output attention scores

Is there an easy way to output attention scores using this library? I know that ScaledDotProductAttention has an option to return the attention, but it seems this library sets it to false when calling it, and I am unsure how to change this without changing the code itself (adapting each layer to handle the returned attention scores as well).

Possible bug: Encoder fed every step

Describe the Bug

In the decode() loop in transformer.py I found what I believe to be a bug:
The encoder is fed the input again at every step. An encoder should receive the input sequence only once, to generate the encoded representation, which is then decoded step by step.

Edit:
I think the proper architecture would be to split the encoder and decoder into two models.
Then, at inference time, you would run the encoder model once to get the representation.

Code example of a transformer model based on character level input sequence and word level output sequence

There are two string columns in my dataset. The first column is typo_keyword and the second column is keyword.

I want to train a deep learning model that takes typo_keyword as its input feature and predicts keyword.

Here is some sample data:

# Sample data
data = [
    # typo_keyword, keyword
    ('rkk mont', 'erkek mont'),
    ('erkk mont', 'erkek mont'),
    ('rkk mnt', 'erkek mont'),
    ('akllı sat', 'akıllı saat'),
    ('akıllı saaat', 'akıllı saat'),
    ('aklı saat', 'akıllı saat')
]

Using your library, I want to train a transformer model that takes a character-level input sequence (my first column) and produces a word-level output sequence (my second column).

Can you give me model building and model inference code examples?

I need your help.

Best,

ValueError: An operation has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.

Describe the Bug

I hit this in model.fit when trying to add this transformer block to my own model as a Lambda layer.

ValueError: An operation has None for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.

After searching, I was told that this happens when parameters created in build() are not used in call(), but I can't locate them.

def tfr(input):
    trans = get_model(
        token_num=3,
        embed_dim=30,
        encoder_num=3,
        decoder_num=3,
        head_num=6,
        hidden_dim=120,
        attention_activation='relu',
        feed_forward_activation='relu',
        dropout_rate=0.05,
        embed_weights=np.random.random((3, 30)),
    )
    T = trans(input)
    T = tf.reshape(T, [-1, 5, 2048, 3])
    return T
......
x = Lambda(tfr, name='trans')([x, x])

tf 1.15

Error in get_model() : ValueError: too many values to unpack (expected 2)

Describe the Bug

Creating a Transformer model fails with a tuple unpacking error (at least when using eager execution). See the stack trace below; the error appears to come from decoder_embed_layer.

Version Info

I'm using the latest version

Minimal Codes To Reproduce

from keras_transformer import get_model
import tensorflow as tf
tf.enable_eager_execution()

model = get_model(token_num=1000, embed_dim=32, encoder_num=2, decoder_num=2, head_num=4, hidden_dim=128, dropout_rate=0.05)

results in

    377     decoder_input = keras.layers.Input(shape=(None,), name='Decoder-Input')
--> 378     decoder_embed, decoder_embed_weights = decoder_embed_layer(decoder_input)
    379     decoder_embed = TrigPosEmbedding(
    380         mode=TrigPosEmbedding.MODE_ADD,

ValueError: too many values to unpack (expected 2)

star-transformer

Could you support star-transformer in this project? Thanks!

Questions about data input and output

Hello, thanks for your code. I am a bit confused about the data processing and hope you can help.

In the following code, I don't understand why the loop does not run through to the last token:

for i in range(1, len(tokens) - 1):
    encode_tokens, decode_tokens = tokens[:i], tokens[i:]
    encode_tokens = ['<START>'] + encode_tokens + ['<END>'] + ['<PAD>'] * (len(tokens) - len(encode_tokens))
    output_tokens = decode_tokens + ['<END>', '<PAD>'] + ['<PAD>'] * (len(tokens) - len(decode_tokens))

Why range(1, len(tokens) - 1) and not range(1, len(tokens))?

Could you explain the purpose of this? Thank you!
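
The thread does not state the author's intent, but what the loop bounds do can be checked directly. The sketch below reuses the example sentence from the usage code; `padded_lengths`, `split_sizes`, and `extra_split` are names introduced here for illustration.

```python
tokens = 'all work and no play makes jack a dull boy'.split(' ')
n = len(tokens)


def padded_lengths(i):
    """Lengths of the padded encoder and output sequences for split point i."""
    encode_tokens, decode_tokens = tokens[:i], tokens[i:]
    encode_padded = ['<START>'] + encode_tokens + ['<END>'] + ['<PAD>'] * (n - len(encode_tokens))
    output_padded = decode_tokens + ['<END>', '<PAD>'] + ['<PAD>'] * (n - len(decode_tokens))
    return len(encode_padded), len(output_padded)


# (encoder tokens, decoder tokens) for every split the original loop produces
split_sizes = [(len(tokens[:i]), len(tokens[i:])) for i in range(1, n - 1)]

# The one extra split that range(1, len(tokens)) would add
extra_split = (len(tokens[:n - 1]), len(tokens[n - 1:]))

# Padded lengths are the same for every split point, including the extra one
lengths = {padded_lengths(i) for i in range(1, n)}
```

With range(1, len(tokens) - 1) every training pair has at least two target tokens. Extending the range adds a single pair whose target is only the last word, and the padding arithmetic still works for it.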

Doesn't work when the test set contains more than two sentences

I have made the following changes in the code to accommodate my own data.

##import packages
import numpy as np
from keras_transformer import get_model, decode
from keras.preprocessing.sequence import pad_sequences

##open my own file
f1=open('/content/drive/My Drive/trans_en-bn_NMT/English','r',errors='ignore').read().split('\n')
f2=open('/content/drive/My Drive/trans_en-bn_NMT/Bangla','r',errors='ignore').read().split('\n')

source_tokens = [i.split() for i in f1]
target_tokens = [i.split() for i in f2]

## Generate dictionaries
def build_token_dict(token_list):
    token_dict = {
        '<PAD>': 0,
        '<START>': 1,
        '<END>': 2,
    }
    for tokens in token_list:
        for token in tokens:
            if token not in token_dict:
                token_dict[token] = len(token_dict)
    return token_dict

source_token_dict = build_token_dict(source_tokens)
target_token_dict = build_token_dict(target_tokens)
target_token_dict_inv = {v: k for k, v in target_token_dict.items()}

# Add special tokens
encode_tokens = [['<START>'] + tokens + ['<END>'] for tokens in source_tokens]
decode_tokens = [['<START>'] + tokens + ['<END>'] for tokens in target_tokens]
output_tokens = [tokens + ['<END>', '<PAD>'] for tokens in target_tokens]
 
# Padding
source_max_len = max(map(len, encode_tokens))
target_max_len = max(map(len, decode_tokens))

encode_tokens = [tokens + ['<PAD>'] * (source_max_len - len(tokens)) for tokens in encode_tokens]
decode_tokens = [tokens + ['<PAD>'] * (target_max_len - len(tokens)) for tokens in decode_tokens]
output_tokens = [tokens + ['<PAD>'] * (target_max_len - len(tokens)) for tokens in output_tokens]
 
encode_input = [list(map(lambda x: source_token_dict[x], tokens)) for tokens in encode_tokens]
decode_input = [list(map(lambda x: target_token_dict[x], tokens)) for tokens in decode_tokens]
decode_output = [list(map(lambda x: [target_token_dict[x]], tokens)) for tokens in output_tokens]
#
# Build & fit model
model = get_model(
    token_num=max(len(source_token_dict), len(target_token_dict)),
    embed_dim=32,
    encoder_num=2,
    decoder_num=2,
    head_num=4,
    hidden_dim=128,
    dropout_rate=0.05,
    use_same_embed=False,  # Use different embeddings for different languages
)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',metrics=['acc'])
model.summary()

model.fit(
    x=[np.array(encode_input), np.array(decode_input)],
    y=np.array(decode_output),
    epochs=1,
    batch_size=10,
)


# Predict
decoded = decode(
    model,
    np.array(encode_input)[48999:],
    #test_inp,
    start_token=target_token_dict['<START>'],
    end_token=target_token_dict['<END>'],
    pad_token=target_token_dict['<PAD>'],
)
##print the translated sentences
for i in decoded:
  print(''.join(map(lambda x: target_token_dict_inv[x], i[1:-1])))

But when testing, it gives the following error:

UFuncTypeError Traceback (most recent call last)
in ()
72 start_token=target_token_dict['<START>'],
73 end_token=target_token_dict['<END>'],
---> 74 pad_token=target_token_dict['<PAD>'],
75 # top_k=10,
76 # temperature=1.0,

/usr/local/lib/python3.6/dist-packages/keras_transformer/transformer.py in decode(model, tokens, start_token, end_token, pad_token, top_k, temperature, max_len, max_repeat, max_repeat_block)
541 max_input_len = max(max_input_len, len(tokens[i]))
542 for i in range(len(batch_inputs)):
--> 543 batch_inputs[i] += [pad_token] * (max_input_len - len(batch_inputs[i]))
544 predicts = model.predict([np.array(batch_inputs), np.array(batch_outputs)])
545 for i in range(len(predicts)):

UFuncTypeError: Cannot cast ufunc 'add' output from dtype('float64') to dtype('int64') with casting rule 'same_kind'

Remove TensorFlow dependency

Hi, thank you for this handy toolbox, it is really helpful.

One issue I ran into is that TensorFlow is downloaded every time I install from PyPI or from this Git repository.

I think the reason is that I installed the GPU version of TensorFlow, whose package name on PyPI is tensorflow-gpu, whereas the dependency files (such as the requirements file) always name tensorflow.

May I suggest removing common basic packages like tensorflow and keras from the dependencies, since users often have them installed already?

Invalid head number 3 with the given input dim 128

When I provide a pre-trained embedding matrix with dimension 128 and choose head_num=3, this error occurs. How can I solve it?
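
The error message suggests that the head number must divide the input dimension evenly, as in standard multi-head attention where the features are split into equal-sized heads. Assuming that is the check being applied, the valid choices for a given dimension can be listed with a small helper (`valid_head_numbers` is a name introduced here).

```python
def valid_head_numbers(embed_dim):
    """Head counts that split `embed_dim` into equal-sized heads."""
    return [h for h in range(1, embed_dim + 1) if embed_dim % h == 0]


heads_for_128 = valid_head_numbers(128)

# 128 is not a multiple of 3, so three equal-sized heads are impossible.
is_three_valid = 128 % 3 == 0
```

For a 128-dimensional embedding, a power of two such as 4 or 8 would satisfy the constraint. Alternatively, an embedding dimension that is a multiple of 3 would allow head_num=3.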

Default value max_len=None in keras_transformer.decode is dangerous

It is not really a bug, but I think changing it to something like max_len=10000 in https://github.com/CyberZHG/keras-transformer/blob/master/keras_transformer/transformer.py#L395 would be reasonable.

I spent half a day debugging the model, trying to find out why it sometimes hangs when predicting with an (under)trained model. The problem is that <END> is sometimes never selected as the next token (https://github.com/CyberZHG/keras-transformer/blob/master/keras_transformer/transformer.py#L428), so the condition in https://github.com/CyberZHG/keras-transformer/blob/master/keras_transformer/transformer.py#L430 is always False.

Another option would be raising an error when length of tokens list exceeds some reasonable limit.
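
The hang and the proposed cap can be illustrated with a minimal greedy decoding loop. This is a sketch under stated assumptions, not the library's decode function: `greedy_decode`, `next_token`, and the toy models are hypothetical.

```python
def greedy_decode(next_token, start_token, end_token, max_len=None):
    """Generate tokens until `end_token` appears or `max_len` is reached.

    `next_token(prefix)` is a hypothetical callback standing in for the model.
    With max_len=None the loop only stops when the end token is produced.
    """
    output = [start_token]
    while max_len is None or len(output) < max_len:
        token = next_token(output)
        output.append(token)
        if token == end_token:
            break
    return output


def stuck_model(prefix):
    return 7  # an undertrained model that never emits the end token


def healthy_model(prefix):
    return 2 if len(prefix) >= 3 else 5  # emits the end token after two words


capped = greedy_decode(stuck_model, start_token=1, end_token=2, max_len=5)
normal = greedy_decode(healthy_model, start_token=1, end_token=2, max_len=5)
```

Calling greedy_decode with stuck_model and max_len=None would never return, which matches the hang described above.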

After the stacked decoder layers, I think there should be a linear layer and a softmax layer

After the stacked decoder layers, I think the output should be connected to a linear layer and a softmax layer, but in your code it is:
dense_layer = EmbeddingSim( trainable=trainable, name='Output', )([decoded_layer, decoder_embed_weights])
Why is it done this way?
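
One possible reading, inferred only from the call passing decoder_embed_weights next to decoded_layer: the output layer reuses the embedding matrix as its projection (weight tying), which is still a linear map followed by a softmax. The sketch below illustrates that idea in plain Python. It is an assumption about what such a layer computes, not the EmbeddingSim source, and `tied_output_probs` is a hypothetical name.

```python
import math


def tied_output_probs(hidden, embed_weights):
    """Score a decoder hidden vector against every token embedding, then softmax.

    `embed_weights` has one row per token, so the dot products act as a
    linear layer whose weights are shared with the embedding.
    """
    logits = [sum(h * w for h, w in zip(hidden, row)) for row in embed_weights]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


embed_weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 tokens, 2 dimensions
probs = tied_output_probs([2.0, 0.1], embed_weights)
predicted_token = probs.index(max(probs))
```

Sharing the matrix saves parameters compared with a separate dense output layer of the same shape.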

How do you use masking with the transformer?

I'm having trouble figuring out how to use masking with the transformer. I have mask_zero set to true with the embeddings, but I'm not sure if the masking is used in the transformer calculation. Do I need to do anything special to make sure it is taken into account?

For example, if I was putting a sentence through the encoder: (2 3 2 1 0 0 0), and I want the transformer to ignore the zeros, what should I do? (The 0 embedding is a zero vector, if that matters).
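
Conceptually, a padding mask marks which positions hold real tokens so that later computations can skip the rest. The sketch below builds such a mask for the example sentence and uses it in a masked mean. It is a plain-Python illustration with hypothetical helpers and placeholder vectors, and it does not show how the library's layers propagate a Keras mask.

```python
def padding_mask(token_ids, pad_id=0):
    """True for real tokens, False for padding."""
    return [t != pad_id for t in token_ids]


def masked_mean(vectors, mask):
    """Average only the vectors whose mask entry is True."""
    kept = [v for v, keep in zip(vectors, mask) if keep]
    dim = len(vectors[0])
    return [sum(v[d] for v in kept) / len(kept) for d in range(dim)]


sentence = [2, 3, 2, 1, 0, 0, 0]
mask = padding_mask(sentence)

# Placeholder per-position vectors standing in for encoder outputs.
vectors = [[float(t), 1.0] for t in sentence]
pooled = masked_mean(vectors, mask)
```

Without the mask, the three padded positions would pull the mean of the first component from 2.0 down to 8/7.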

How to handle multiple inputs

I want to input multiple sentences but have the encoder applied to each sentence individually (to be combined later in the model). Is there an easy way to do this? I've tried using Lambda functions with get_encoders(), but I don't think this is correct, as I want the same set of encoders applied to each sentence.

Do I need to specify <start> and <end> of sequences

I have a sequence of inputs:
encoder_inputs = [[11,22,35,43,6,30,0,0,0,0],
[31,43,56,23,45,0,0,0,0]]
decoder_inputs = [[8,3,2,4,5,2,0,0,0,0],
[6,4,5,3,7,4,0,0,0,0]]
decoder_outputs = [[[8],[3],[2],[4],[5],[2],[0],[0],[0],[0]],
[[6],[4],[5],[3],[7],[4],[0],[0],[0],[0]]]

My questions: do I need to specify the start and end of sequences in my case?
Does keras-transformer already know that the padding is zero?

Finally, when evaluating, do I do this:

model.evaluate(x=[encoder_inputs,decoder_inputs], y=[decoder_outputs])

Thank you

How to train a transformer model with fit_generator?

In this project, you use model.fit to train a transformer model, but I need to use model.fit_generator because my dataset is massive. When I write the model, I have to declare the names of the input layers that are defined in the generator function. Can you show me an example of the transformer with model.fit_generator? Thanks.
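
A generator for a two-input model typically yields an (inputs, targets) pair per batch, with the inputs given as a list in the order [encoder batch, decoder batch]. The sketch below shows that batching logic in plain Python with toy data; `batch_generator` is a name introduced here. With Keras, each batch would be wrapped in np.array before being yielded, and the generator passed to model.fit_generator together with steps_per_epoch.

```python
def batch_generator(encoder_inputs, decoder_inputs, decoder_outputs, batch_size):
    """Yield ([encoder_batch, decoder_batch], output_batch) forever."""
    total = len(encoder_inputs)
    start = 0
    while True:
        end = start + batch_size
        yield ([encoder_inputs[start:end], decoder_inputs[start:end]],
               decoder_outputs[start:end])
        # Wrap around once the data is exhausted, so the generator never ends.
        start = end if end < total else 0


# Toy data: three already-padded examples.
enc = [[1, 5, 2], [1, 6, 2], [1, 7, 2]]
dec = [[1, 3, 2], [1, 4, 2], [1, 8, 2]]
out = [[[3], [2], [0]], [[4], [2], [0]], [[8], [2], [0]]]

gen = batch_generator(enc, dec, out, batch_size=2)
first_inputs, first_targets = next(gen)
second_inputs, second_targets = next(gen)
third_inputs, third_targets = next(gen)
```

The wrap-around uses a comparison rather than an equality check, so it also works when the dataset size is not a multiple of the batch size.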

Question concerning default settings

In the usage example I was wondering why the encoder and decoder inputs were multiplied by 1000 (or 1024 for the translation example)?

Also, is there any intuition as to why the embedding dimension was 30 or above, or what the hidden dimension should be set to?
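
On the first question, one reading (an assumption, since the thread gives no answer) is that the multiplication is applied to Python lists before they are converted to arrays. Multiplying a list by an integer repeats its contents, so the handful of toy examples is replicated into a larger training set. The values below are made up for illustration.

```python
encoder_inputs = [[1, 3, 2, 0], [1, 4, 5, 2]]  # two toy examples

# List repetition: the two examples are copied 1000 times, values unchanged.
repeated = encoder_inputs * 1000

example_count = len(repeated)
values_unchanged = repeated[2] == encoder_inputs[0]
```

This differs from multiplying a NumPy array by 1000, which would scale every token id instead of repeating the rows.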

Use it as a layer

Can you share an example where it is used as an internal layer in a model alongside other Keras layers? Many thanks!
