
ctrl's Issues

Running the model on TPUs?

Hi,

I have the 256 and 512 models working on GCP with a Tesla V100. Text generates, but slowly, and I'd like to get faster generation out of the system. I thought running CTRL on TPUs could give me faster text, but I have no idea how to do that.

Do you have an incantation or pointer that would let me point CTRL at a TPU?

Python 3 support

From running the code in Python 3:

Traceback (most recent call last):
  File "generation.py", line 40, in <module>
    vocab = open('vocab').read().decode(encoding='utf-8').split('\n')
AttributeError: 'str' object has no attribute 'decode'

There should probably be a Python 3 support pass since Python 2 is EOL.
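
A minimal sketch of a Python 3-compatible replacement for that line (assuming only what the traceback shows: a file named vocab with newline-separated entries):

    # In Python 3, reading a text file already yields str, so there is nothing to decode.
    # io.open with an explicit encoding behaves the same on Python 2 and 3.
    import io

    with io.open('vocab', encoding='utf-8') as f:
        vocab = f.read().split('\n')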

Not TPU: when running generation.py, errors occur for both GPU and CPU.

The message is below:

WARNING:tensorflow:From generation.py:6: The name tf.enable_eager_execution is deprecated. Please use tf.compat.v1.enable_eager_execution instead.

WARNING:tensorflow:From generation.py:38: The name tf.random.set_random_seed is deprecated. Please use tf.compat.v1.random.set_random_seed instead.

2019-09-19 23:17:19.722929: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/jdk/jre/lib/amd64/server:/usr/local/jdk/jre/lib/amd64/server
2019-09-19 23:17:19.723002: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: UNKNOWN ERROR (303)
2019-09-19 23:17:19.723063: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: host10875366
2019-09-19 23:17:19.723076: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: host10875366
2019-09-19 23:17:19.723127: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: Not found: was unable to find libcuda.so DSO loaded into this program
2019-09-19 23:17:19.723188: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 418.67.0
2019-09-19 23:17:19.723472: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-09-19 23:17:19.738324: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2499995000 Hz
2019-09-19 23:17:19.746153: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4feba30 executing computations on platform Host. Devices:
2019-09-19 23:17:19.746235: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
WARNING:tensorflow:
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

WARNING:tensorflow:From generation.py:127: The name tf.train.AdagradOptimizer is deprecated. Please use tf.compat.v1.train.AdagradOptimizer instead.

WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/initializers.py:143: calling __init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/init_ops.py:1251: calling __init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
2019-09-19 23:18:04.207646: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
WARNING:tensorflow:CrossShardOptimizer should be used within a tpu_shard_context, but got unset number_of_shards. Assuming 1.
WARNING:tensorflow:cross_replica_sum should be used within a tpu_shard_context, but got unset number_of_shards. Assuming 1.
/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_util.py:90: UserWarning: Converting sparse IndexedSlices to a dense Tensor with 315563520 elements. This may consume a large amount of memory.
num_elements)
WARNING:tensorflow:cross_replica_sum should be used within a tpu_shard_context, but got unset number_of_shards. Assuming 1.
WARNING:tensorflow:cross_replica_sum should be used within a tpu_shard_context, but got unset number_of_shards. Assuming 1.
WARNING:tensorflow:cross_replica_sum should be used within a tpu_shard_context, but got unset number_of_shards. Assuming 1.
WARNING:tensorflow:cross_replica_sum should be used within a tpu_shard_context, but got unset number_of_shards. Assuming 1.
WARNING:tensorflow:cross_replica_sum should be used within a tpu_shard_context, but got unset number_of_shards. Assuming 1.
WARNING:tensorflow:cross_replica_sum should be used within a tpu_shard_context, but got unset number_of_shards. Assuming 1.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/clip_ops.py:286: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/training/adagrad.py:76: calling init (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
246534 unique words
Model: "model"


Layer (type) Output Shape Param # Connected to

input_1 (InputLayer) [(None, 256)] 0


tied_embedding_softmax (TiedEmb multiple 315810054 input_1[0][0]
encoder[0][0]


encoder (Encoder) (None, 256, 1280) 1322154496 tied_embedding_softmax[0][0]

Total params: 1,637,964,550
Trainable params: 1,637,964,550
Non-trainable params: 0


None
Traceback (most recent call last):
File "generation.py", line 146, in
estimator_model = tf.keras.estimator.model_to_estimator(keras_model=model, config=run_config)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/estimator/init.py", line 73, in model_to_estimator
config=config)
File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/keras.py", line 450, in model_to_estimator
config)
File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/keras.py", line 331, in _save_first_checkpoint
saver.save(sess, latest_path)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1173, in save
{self.saver_def.filename_tensor_name: checkpoint_file})
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 950, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1173, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1350, in _do_run
run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1370, in _do_call
raise type(e)(node_def, op, message)

tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'CrossReplicaSum' used by node training/CrossReplicaSum (defined at generation.py:146) with these attrs: [T=DT_FLOAT, _class=["loc:@training/gradients/concat"]]
Registered devices: [CPU, XLA_CPU]
Registered kernels:
  <no registered kernels>

 [[training/CrossReplicaSum]]

I can do inference with the pretrained model but face an error when fine-tuning.

Hi,
Thank you for your paper and model.
I can run inference with your pretrained model, but I face an error when fine-tuning.
The environment is as follows:
TensorFlow 1.14.0 (GPU), Python 3.6.8, one Tesla M40 24GB GPU, CUDA 10.0

The error is as follows:

2019-10-11 10:23:29.882912: I tensorflow/core/common_runtime/placer.cc:54] report_uninitialized_resources_1/Const: (Const)/job:localhost/replica:0/task:0/device:CPU:0

concat_1/axis: (Const): /job:localhost/replica:0/task:0/device:CPU:0
2019-10-11 10:23:29.882949: I tensorflow/core/common_runtime/placer.cc:54] concat_1/axis: (Const)/job:localhost/replica:0/task:0/device:CPU:0
save/filename/input: (Const): /job:localhost/replica:0/task:0/device:CPU:0
2019-10-11 10:23:29.882980: I tensorflow/core/common_runtime/placer.cc:54] save/filename/input: (Const)/job:localhost/replica:0/task:0/device:CPU:0
save/StringJoin/inputs_1: (Const): /job:localhost/replica:0/task:0/device:CPU:0
2019-10-11 10:23:29.883008: I tensorflow/core/common_runtime/placer.cc:54] save/StringJoin/inputs_1: (Const)/job:localhost/replica:0/task:0/device:CPU:0
save/num_shards: (Const): /job:localhost/replica:0/task:0/device:CPU:0
2019-10-11 10:23:29.883041: I tensorflow/core/common_runtime/placer.cc:54] save/num_shards: (Const)/job:localhost/replica:0/task:0/device:CPU:0
save/ShardedFilename/shard: (Const): /job:localhost/replica:0/task:0/device:CPU:0
2019-10-11 10:23:29.883073: I tensorflow/core/common_runtime/placer.cc:54] save/ShardedFilename/shard: (Const)/job:localhost/replica:0/task:0/device:CPU:0
save/SaveV2/tensor_names: (Const): /job:localhost/replica:0/task:0/device:CPU:0
2019-10-11 10:23:29.883101: I tensorflow/core/common_runtime/placer.cc:54] save/SaveV2/tensor_names: (Const)/job:localhost/replica:0/task:0/device:CPU:0
save/SaveV2/shape_and_slices: (Const): /job:localhost/replica:0/task:0/device:CPU:0
2019-10-11 10:23:29.883130: I tensorflow/core/common_runtime/placer.cc:54] save/SaveV2/shape_and_slices: (Const)/job:localhost/replica:0/task:0/device:CPU:0
save/RestoreV2/tensor_names: (Const): /job:localhost/replica:0/task:0/device:CPU:0
2019-10-11 10:23:29.883163: I tensorflow/core/common_runtime/placer.cc:54] save/RestoreV2/tensor_names: (Const)/job:localhost/replica:0/task:0/device:CPU:0
save/RestoreV2/shape_and_slices: (Const): /job:localhost/replica:0/task:0/device:CPU:0
2019-10-11 10:23:29.883195: I tensorflow/core/common_runtime/placer.cc:54] save/RestoreV2/shape_and_slices: (Const)/job:localhost/replica:0/task:0/device:CPU:0
2019-10-11 10:23:46.755980: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
ERROR:tensorflow:Error recorded from training_loop: moby_dick.txt.tfrecords/graph.pbtxt.tmp1a94ee91758c42f88fddd51c196b3dbe; Not a directory
INFO:tensorflow:training_loop marked as finished
WARNING:tensorflow:Reraising captured error
Traceback (most recent call last):
File "/root/liuyx/ctrl/training_utils/training.py", line 164, in
estimator_model.train(input_fn=input_fn, steps=args.iterations)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2876, in train
rendezvous.raise_errors()
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 131, in raise_errors
six.reraise(typ, value, traceback)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/six.py", line 693, in reraise
raise value
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2871, in train
saving_listeners=saving_listeners)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 367, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1158, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1192, in _train_model_default
saving_listeners)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1480, in _train_with_estimator_spec
log_step_count_steps=log_step_count_steps) as mon_sess:
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 584, in MonitoredTrainingSession
stop_grace_period_secs=stop_grace_period_secs)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1007, in init
stop_grace_period_secs=stop_grace_period_secs)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 725, in init
self._sess = _RecoverableSession(self._coordinated_creator)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1200, in init
_WrappedSession.init(self, self._create_session())
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1205, in _create_session
return self._sess_creator.create_session()
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 878, in create_session
hook.after_create_session(self.tf_sess, self.coord)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/training/basic_session_run_hooks.py", line 572, in after_create_session
self._checkpoint_dir, "graph.pbtxt")
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/framework/graph_io.py", line 72, in write_graph
graph_def, float_format=''))
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 538, in atomic_write_string_to_file
write_string_to_file(temp_pathname, contents)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 347, in write_string_to_file
f.write(file_content)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 106, in write
self._prewrite_check()
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 92, in _prewrite_check
compat.as_bytes(self.__name), compat.as_bytes(self.__mode))
tensorflow.python.framework.errors_impl.FailedPreconditionError: moby_dick.txt.tfrecords/graph.pbtxt.tmp1a94ee91758c42f88fddd51c196b3dbe; Not a directory

Process finished with exit code 1

Could you help me with it? Thank you.

Yixian

multiple tags as control code

Does the following mean one training record for each of the multiple tags? Say, if my average number of tags is 10, will the data size for fine-tuning become 10x the original? Is this understanding correct? If so, I plan to give it a try. Thank you for your advice.

The way it's trained, the current checkpoints don't support that. However, there is nothing preventing one from fine-tuning (or re-training) CTRL to do that. I'm fairly sure that the model will learn to pick it up.

Originally posted by @keskarnitish in #33 (comment)
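
A hypothetical sketch of that expansion when preparing fine-tuning data (the helper name and the tag/text values here are illustrative, not part of the repo):

    # Expand a multi-tag example into one training line per tag, so an item
    # with 10 tags yields 10 lines (roughly 10x the data, as asked above).
    def expand_tags(tags, text):
        return ["{} {}".format(tag, text) for tag in tags]

    lines = expand_tags(["Books", "Horror"], "This is the body of one example.")
    # -> ["Books This is the body of one example.",
    #     "Horror This is the body of one example."]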

TypeError when running generation.py

TypeError: in converted code:

ctrl/transformer.py:138 call *
    x = getattr(self, "layer%i" % i)(x, training, mask)
anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py:629 __call__
    outputs = call_fn(inputs, *args, **kwargs)
ctrl/transformer.py:92 call *
    attn_output  = self.multi_head_attention(normed, normed, normed, mask)
anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py:629 __call__
    outputs = call_fn(inputs, *args, **kwargs)
ctrl/transformer.py:60 call *
    q = self.split_into_heads(q, batch_size)
ctrl/transformer.py:50 split_into_heads
    x = tf.reshape(x, (batch_size, -1, self.num_heads, self.depth))
anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow/python/ops/gen_array_ops.py:7715 reshape
    "Reshape", tensor=tensor, shape=shape, name=name)
anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py:530 _apply_op_helper
    raise err
anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py:527 _apply_op_helper
    preferred_dtype=default_dtype)
anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow/python/framework/ops.py:1224 internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow/python/ops/array_ops.py:1145 _autopacking_conversion_function
    return _autopacking_helper(v, dtype, name or "packed")
anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow/python/ops/array_ops.py:1094 _autopacking_helper
    constant_op.constant(elem, dtype=dtype, name=str(i)))
anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py:246 constant
    allow_broadcast=True)
anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py:284 _constant_impl
    allow_broadcast=allow_broadcast))
anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow/python/framework/tensor_util.py:466 make_tensor_proto
    _AssertCompatible(values, dtype)
anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow/python/framework/tensor_util.py:371 _AssertCompatible
    (dtype.name, repr(mismatch), type(mismatch).__name__))

TypeError: Expected int32, got 80.0 of type 'float' instead.

And I found that the following code causes the error:
x = tf.reshape(x, (batch_size, -1, self.num_heads, self.depth)) # transformer.py line 49
self.depth should be an integer so that it can be fed into tf.reshape.

It can be fixed by changing this line:
self.depth = d_model_size // self.num_heads # transformer.py line 40
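
A sketch of that fix in context (the surrounding lines are illustrative, not a verbatim copy of transformer.py):

    import tensorflow as tf

    class MultiHeadAttention(tf.keras.layers.Layer):
        def __init__(self, d_model_size, num_heads):
            super(MultiHeadAttention, self).__init__()
            self.num_heads = num_heads
            # Integer division keeps self.depth an int, which is what the
            # tf.reshape call in split_into_heads expects.
            self.depth = d_model_size // self.num_heads  # was: d_model_size / self.num_heads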

training_utils on new vocab and codes

Hi!

Does it make sense to run make_tf_records.py and training.py on a completely new vocab and codes built with fastBPE from French texts, without using your vocab and codes?
If it does, how?

Thanks a lot for your help! :)

Sampling settings used in the paper

Hi,

I wanted to ask what sampling settings (temperature, top-k, top-p) were used when generating the text samples in the paper, and whether the samples were randomly chosen or the best of x tries?

Thanks

How to finetune on TPU v3-8 nodes? It runs without error but does not seem to progress.

Hi!

Thanks for the great paper and for providing the code and model. I am trying to fine-tune the model on a TPU v3-8 node in Google Cloud. I made the following changes:

  • I added optimizer = tf.contrib.tpu.CrossShardOptimizer(optimizer) to training.py
  • I patched keras.py and then set use_tpu=True and batch_size=8.
  • I set num_cores_per_replica=8, iterations_per_loop=1 and added cluster=tf.contrib.cluster_resolver.TPUClusterResolver() in the call to tf.contrib.tpu.RunConfig. This should distribute the model across the 8 cores of the TPU. I found that with lower values for num_cores_per_replica I get an out-of-memory error. This is the exact code:
    run_config = tf.contrib.tpu.RunConfig(
        cluster=tf.contrib.cluster_resolver.TPUClusterResolver(),
        model_dir=args.model_dir,
        session_config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=True),
        tpu_config=tf.contrib.tpu.TPUConfig(
            iterations_per_loop=1,
            num_cores_per_replica=8,
            per_host_input_for_training=3))

With these changes I can get training.py to run with the seq256_v1 model without error. However, it doesn't seem to be doing anything after the model has been compiled, initialized from the checkpoint, and the batches are being fed to the TPU. Even with a batch_size of only 8 and a total of 256 TFRecords in the input file, it never completes. The output I get is:

...
WARNING:tensorflow:Entity <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7f95b350b110>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7f95b350b110>>: AssertionError: Bad argument number for Name: 3, expecting 4
WARNING:tensorflow:Entity <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7f95b350b150>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7f95b350b150>>: AttributeError: 'module' object has no attribute 'Num'
...
INFO:tensorflow:Starting infeed thread controller.
INFO:tensorflow:Starting outfeed thread controller.
WARNING:tensorflow:TPUPollingThread found TPU tpuk in state READY, and health HEALTHY.
INFO:tensorflow:Enqueue next (1) batch(es) of data to infeed.
INFO:tensorflow:Dequeue next (1) batch(es) of data from outfeed.
WARNING:tensorflow:TPUPollingThread found TPU tpuk in state READY, and health HEALTHY.
WARNING:tensorflow:TPUPollingThread found TPU tpuk in state READY, and health HEALTHY.
WARNING:tensorflow:TPUPollingThread found TPU tpuk in state READY, and health HEALTHY.
WARNING:tensorflow:TPUPollingThread found TPU tpuk in state READY, and health HEALTHY.
WARNING:tensorflow:TPUPollingThread found TPU tpuk in state READY, and health HEALTHY.
...

The last WARNING line keeps repeating.

With Tensorboard I wasn't able to get a trace, which may indicate nothing is happening on the TPU.

By my simple calculation based on the numbers presented in the paper, I should be able to get 1024 (examples/batch) * 800,000 (iterations) / 32 (= 256/8, the number of cores in the TPU v3-256 Pod used in the paper divided by the number of cores in a TPU v3-8 node) / 14 (days) / 24 (hours/day) / 3600 (seconds/hour) ≈ 20 examples per second.
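
Spelling that arithmetic out (numbers taken from the paragraph above, nothing else):

    examples_per_batch = 1024
    iterations = 800000
    core_ratio = 256 / 8.0               # pod cores in the paper vs. a v3-8 node
    wall_clock_seconds = 14 * 24 * 3600  # roughly two weeks of training

    rate = examples_per_batch * iterations / core_ratio / wall_clock_seconds
    print(rate)  # ~21 examples per second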

I have been able to run other (much smaller) Keras models in tf 1.14 on a TPU v3-8 using the same RunConfig, where I also parallelized the model across the 8 TPU cores.

Do you have any idea why the training does not seem to work (or at best is extremely slow)? Am I parallelizing the model across the 8 TPU cores in the correct way? How was this done for the paper?

Any help would be greatly appreciated!

Many thanks,
Kees

PS I get the same result when I add input_partition_dims=[[1, 1], [1, 1]] as an option to tpu_config.

Enhancement request: How to read prompts from a text file?

Thanks for the cool model and repo.
New to Python and PyTorch. Using Ubuntu 18.04 and Python 3.6.8.
Inference works fine with the 512 and 256 models and a prompt on a local 8 GB GPU.

I would appreciate it if you could suggest a code change that would allow pytorch_generation.py to read an input text file line by line instead of manually entering each prompt.

The format of the text file would be the same as the prompt.

For example:

Books This is the first line.
Books This is the second line.
Books This is the third line.
etc.
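
A hypothetical sketch of the requested loop (the file name and the generate_from_prompt placeholder are mine, not part of the repo; the real call would be whatever pytorch_generation.py currently does with a single prompt):

    def generate_from_prompt(prompt):
        # Stand-in for the existing per-prompt generation code.
        raise NotImplementedError

    with open('prompts.txt', encoding='utf-8') as f:
        for line in f:
            prompt = line.strip()
            if not prompt:          # skip blank lines
                continue
            generate_from_prompt(prompt)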

Cheers.

OutOfMemory in fine-tuning.

I run fine-tuning on a Tesla M40 24GB GPU with batch size 4 and hit the OOM error below. Is this normal? When I change the batch size to 1, the same error occurs.

2019-10-18 04:59:34.679876: W tensorflow/core/common_runtime/bfc_allocator.cc:319] ****************************************************************************************************
2019-10-18 04:59:34.679956: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at reduction_ops_common.h:180 : Resource exhausted: OOM when allocating tensor with shape[1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
ERROR:tensorflow:Error recorded from training_loop: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node encoder/encoder_layer_30/layer_normalization_60/moments/mean (defined at tmp/tmp9muco6_0.py:8) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

[[training/clip_by_global_norm/mul_1/_12367]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

(1) Resource exhausted: OOM when allocating tensor with shape[1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node encoder/encoder_layer_30/layer_normalization_60/moments/mean (defined at tmp/tmp9muco6_0.py:8) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations.
0 derived errors ignored.

Errors may have originated from an input operation.
Input Source operations connected to node encoder/encoder_layer_30/layer_normalization_60/moments/mean:
encoder/encoder_layer_29/add_1 (defined at tmp/tmp9muco6_0.py:15)

Input Source operations connected to node encoder/encoder_layer_30/layer_normalization_60/moments/mean:
encoder/encoder_layer_29/add_1 (defined at tmp/tmp9muco6_0.py:15)

Original stack trace for 'encoder/encoder_layer_30/layer_normalization_60/moments/mean':
File "root/liuyx/ctrl/training_utils/training.py", line 164, in
estimator_model.train(input_fn=input_fn, steps=args.iterations)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2871, in train
saving_listeners=saving_listeners)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 367, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1158, in _train_model
return self.train_model_default(input_fn, hooks, saving_listeners)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1188, in train_model_default
features, labels, ModeKeys.TRAIN, self.config)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2709, in call_model_fn
config)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1146, in call_model_fn
model_fn_results = self.model_fn(features=features, **kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2967, in model_fn
features, labels, is_export_mode=is_export_mode)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 1549, in call_without_tpu
return self.call_model_fn(features, labels, is_export_mode=is_export_mode)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 1867, in call_model_fn
estimator_spec = self.model_fn(features=features, **kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/keras.py", line 242, in model_fn
labels)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/keras.py", line 201, in clone_and_build_model
optimizer_iterations=global_step)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/keras/models.py", line 538, in clone_and_build_model
clone = clone_model(model, input_tensors=input_tensors)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/keras/models.py", line 326, in clone_model
model, input_tensors=input_tensors, layer_fn=clone_function)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/keras/models.py", line 172, in clone_functional_model
output_tensors = layer(computed_tensors, **kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 634, in call
outputs = call_fn(inputs, *args, **kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 146, in wrapper
), args, kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 450, in converted_call
result = converted_f(*effective_args, **kwargs)
File "tmp/tmpmr9k1s27.py", line 18, in tf__call
x, = ag__.for_stmt(ag__.converted_call(range, None, ag__.ConversionOptions(recursive=True, force_conversion=False, optional_features=(), internal_convert_user_code=True), (self.num_layers,), None), None, loop_body, (x,))
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/operators/control_flow.py", line 110, in for_stmt
return py_for_stmt(iter, extra_test, body, init_state)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/operators/control_flow.py", line 119, in py_for_stmt
state = body(target, *state)
File "tmp/tmpmr9k1s27.py", line 16, in loop_body
x_1 = ag__.converted_call(getattr(self, 'layer%i' % i), None, ag__.ConversionOptions(recursive=True, force_conversion=False, optional_features=(), internal_convert_user_code=True), (x_1, training, mask), None)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 356, in converted_call
return _call_unconverted(f, args, kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 255, in _call_unconverted
return f(*args)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 634, in call
outputs = call_fn(inputs, *args, **kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 146, in wrapper
), args, kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 450, in converted_call
result = converted_f(*effective_args, **kwargs)
File "tmp/tmp9muco6_0.py", line 8, in tf__call
normed = ag__.converted_call('layernorm1', self, ag__.ConversionOptions(recursive=True, force_conversion=False, optional_features=(), internal_convert_user_code=True), (x,), None)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 356, in converted_call
return _call_unconverted(f, args, kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 255, in _call_unconverted
return f(*args)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 634, in call
outputs = call_fn(inputs, *args, **kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/keras/layers/normalization.py", line 999, in call
mean, variance = nn.moments(inputs, self.axis, keep_dims=True)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/ops/nn_impl.py", line 1028, in moments
mean = math_ops.reduce_mean(y, axes, keepdims=True, name="mean")
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/util/dispatch.py", line 180, in wrapper
return target(*args, **kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 1764, in reduce_mean
name=name))
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 6180, in mean
name=name)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
op_def=op_def)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2005, in init
self._traceback = tf_stack.extract_stack()

INFO:tensorflow:training_loop marked as finished
WARNING:tensorflow:Reraising captured error
Traceback (most recent call last):
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
return fn(*args)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node encoder/encoder_layer_30/layer_normalization_60/moments/mean}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

[[training/clip_by_global_norm/mul_1/_12367]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

(1) Resource exhausted: OOM when allocating tensor with shape[1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node encoder/encoder_layer_30/layer_normalization_60/moments/mean}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/root/liuyx/ctrl/training_utils/training.py", line 164, in
estimator_model.train(input_fn=input_fn, steps=args.iterations)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2876, in train
rendezvous.raise_errors()
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 131, in raise_errors
six.reraise(typ, value, traceback)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/six.py", line 693, in reraise
raise value
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2871, in train
saving_listeners=saving_listeners)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 367, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1158, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1192, in _train_model_default
saving_listeners)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1484, in _train_with_estimator_spec
_, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 754, in run
run_metadata=run_metadata)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1252, in run
run_metadata=run_metadata)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1353, in run
raise six.reraise(*original_exc_info)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/six.py", line 693, in reraise
raise value
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1338, in run
return self._sess.run(*args, **kwargs)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1411, in run
run_metadata=run_metadata)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1169, in run
return self._sess.run(*args, **kwargs)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 950, in run
run_metadata_ptr)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1173, in _run
feed_dict_tensor, options, run_metadata)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
run_metadata)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node encoder/encoder_layer_30/layer_normalization_60/moments/mean (defined at tmp/tmp9muco6_0.py:8) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

[[training/clip_by_global_norm/mul_1/_12367]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

(1) Resource exhausted: OOM when allocating tensor with shape[1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node encoder/encoder_layer_30/layer_normalization_60/moments/mean (defined at tmp/tmp9muco6_0.py:8) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations.
0 derived errors ignored.

Errors may have originated from an input operation.
Input Source operations connected to node encoder/encoder_layer_30/layer_normalization_60/moments/mean:
encoder/encoder_layer_29/add_1 (defined at tmp/tmp9muco6_0.py:15)

Input Source operations connected to node encoder/encoder_layer_30/layer_normalization_60/moments/mean:
encoder/encoder_layer_29/add_1 (defined at tmp/tmp9muco6_0.py:15)

Original stack trace for 'encoder/encoder_layer_30/layer_normalization_60/moments/mean':
File "root/liuyx/ctrl/training_utils/training.py", line 164, in
estimator_model.train(input_fn=input_fn, steps=args.iterations)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2871, in train
saving_listeners=saving_listeners)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 367, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1158, in _train_model
return self.train_model_default(input_fn, hooks, saving_listeners)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1188, in train_model_default
features, labels, ModeKeys.TRAIN, self.config)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2709, in call_model_fn
config)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1146, in call_model_fn
model_fn_results = self.model_fn(features=features, **kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2967, in model_fn
features, labels, is_export_mode=is_export_mode)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 1549, in call_without_tpu
return self.call_model_fn(features, labels, is_export_mode=is_export_mode)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 1867, in call_model_fn
estimator_spec = self.model_fn(features=features, **kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/keras.py", line 242, in model_fn
labels)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/keras.py", line 201, in clone_and_build_model
optimizer_iterations=global_step)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/keras/models.py", line 538, in clone_and_build_model
clone = clone_model(model, input_tensors=input_tensors)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/keras/models.py", line 326, in clone_model
model, input_tensors=input_tensors, layer_fn=clone_function)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/keras/models.py", line 172, in clone_functional_model
output_tensors = layer(computed_tensors, **kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 634, in call
outputs = call_fn(inputs, *args, **kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 146, in wrapper
), args, kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 450, in converted_call
result = converted_f(*effective_args, **kwargs)
File "tmp/tmpmr9k1s27.py", line 18, in tf__call
x, = ag__.for_stmt(ag__.converted_call(range, None, ag__.ConversionOptions(recursive=True, force_conversion=False, optional_features=(), internal_convert_user_code=True), (self.num_layers,), None), None, loop_body, (x,))
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/operators/control_flow.py", line 110, in for_stmt
return py_for_stmt(iter, extra_test, body, init_state)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/operators/control_flow.py", line 119, in py_for_stmt
state = body(target, *state)
File "tmp/tmpmr9k1s27.py", line 16, in loop_body
x_1 = ag__.converted_call(getattr(self, 'layer%i' % i), None, ag__.ConversionOptions(recursive=True, force_conversion=False, optional_features=(), internal_convert_user_code=True), (x_1, training, mask), None)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 356, in converted_call
return _call_unconverted(f, args, kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 255, in _call_unconverted
return f(*args)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 634, in call
outputs = call_fn(inputs, *args, **kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 146, in wrapper
), args, kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 450, in converted_call
result = converted_f(*effective_args, **kwargs)
File "tmp/tmp9muco6_0.py", line 8, in tf__call
normed = ag__.converted_call('layernorm1', self, ag__.ConversionOptions(recursive=True, force_conversion=False, optional_features=(), internal_convert_user_code=True), (x,), None)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 356, in converted_call
return _call_unconverted(f, args, kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 255, in _call_unconverted
return f(*args)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 634, in call
outputs = call_fn(inputs, *args, **kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/keras/layers/normalization.py", line 999, in call
mean, variance = nn.moments(inputs, self.axis, keep_dims=True)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/ops/nn_impl.py", line 1028, in moments
mean = math_ops.reduce_mean(y, axes, keepdims=True, name="mean")
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/util/dispatch.py", line 180, in wrapper
return target(*args, **kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 1764, in reduce_mean
name=name))
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 6180, in mean
name=name)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
op_def=op_def)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2005, in init
self._traceback = tf_stack.extract_stack()

Process finished with exit code 1

Update Google Colab Notebook to Correct Model Path Flag

The current PyTorch version of the Colab notebook is out of date: it uses the previous --model flag instead of the updated --model_path flag.

A simple fix that would help people who are testing it out for the first time :)

Great job with this project - it's an incredible model

Could not allocate pinned host memory of size: 2147483648

Running !python2 generation.py --model_dir "/content/ctrl/seqlen256_v1.ckpt" in Colab outputs this:

WARNING: Logging before flag parsing goes to stderr.
W0912 03:52:40.595153 139689530402688 deprecation_wrapper.py:119] From generation.py:6: The name tf.enable_eager_execution is deprecated. Please use tf.compat.v1.enable_eager_execution instead.

W0912 03:52:40.605669 139689530402688 deprecation_wrapper.py:119] From generation.py:35: The name tf.random.set_random_seed is deprecated. Please use tf.compat.v1.random.set_random_seed instead.

246534 unique words
2019-09-12 03:52:40.930801: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-09-12 03:52:40.971309: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-12 03:52:40.971914: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:00:04.0
2019-09-12 03:52:40.972273: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-09-12 03:52:40.973635: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-09-12 03:52:40.975007: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-09-12 03:52:40.975404: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-09-12 03:52:40.976992: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-09-12 03:52:40.978135: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-09-12 03:52:40.981770: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-09-12 03:52:40.981927: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-12 03:52:40.982547: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-12 03:52:40.983109: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-09-12 03:52:40.983494: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2019-09-12 03:52:41.114324: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-12 03:52:41.115113: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5574d0e20d80 executing computations on platform CUDA. Devices:
2019-09-12 03:52:41.115150: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Tesla T4, Compute Capability 7.5
2019-09-12 03:52:41.117511: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2000170000 Hz
2019-09-12 03:52:41.117862: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5574d0e212c0 executing computations on platform Host. Devices:
2019-09-12 03:52:41.117916: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
2019-09-12 03:52:41.118114: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-12 03:52:41.118668: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:00:04.0
2019-09-12 03:52:41.118728: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-09-12 03:52:41.118748: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-09-12 03:52:41.118766: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-09-12 03:52:41.118784: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-09-12 03:52:41.118811: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-09-12 03:52:41.118840: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-09-12 03:52:41.118858: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-09-12 03:52:41.118934: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-12 03:52:41.119479: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-12 03:52:41.120052: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-09-12 03:52:41.120121: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-09-12 03:52:41.121241: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-09-12 03:52:41.121268: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 
2019-09-12 03:52:41.121280: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N 
2019-09-12 03:52:41.121403: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-12 03:52:41.121995: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-12 03:52:41.122491: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:40] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2019-09-12 03:52:41.122537: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14221 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5)
W0912 03:52:58.330300 139689530402688 lazy_loader.py:50] 
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

W0912 03:52:58.330642 139689530402688 deprecation_wrapper.py:119] From generation.py:124: The name tf.train.AdagradOptimizer is deprecated. Please use tf.compat.v1.train.AdagradOptimizer instead.

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            [(None, 256)]        0                                            
__________________________________________________________________________________________________
tied_embedding_softmax (TiedEmb multiple             315810054   input_1[0][0]                    
                                                                 encoder[0][0]                    
__________________________________________________________________________________________________
encoder (Encoder)               (None, 256, 1280)    1322154496  tied_embedding_softmax[0][0]     
==================================================================================================
Total params: 1,637,964,550
Trainable params: 1,637,964,550
Non-trainable params: 0
__________________________________________________________________________________________________
None
2019-09-12 03:52:58.496625: W tensorflow/core/framework/allocator.cc:107] Allocation of 1262254080 exceeds 10% of system memory.
tcmalloc: large alloc 1262256128 bytes == 0x557523406000 @  0x7f0c00918b6b 0x7f0c00938379 0x7f0bbd80d754 0x7f0bbd7c8c8a 0x7f0bbd505f11 0x7f0bbd518f08 0x7f0bc366a00c 0x7f0bc3660298 0x7f0bc10448c7 0x7f0bc0fbc97c 0x7f0bc0fbed9d 0x5574cfe6af6e 0x5574cfe6152a 0x5574cfe68fce 0x5574cfe6152a 0x5574cfe68fce 0x5574cfe6152a 0x5574cfe7d03c 0x5574cfe4cf1e 0x5574cfe662d5 0x5574cfe6152a 0x5574cfe695d6 0x5574cfe6152a 0x5574cfe695d6 0x5574cfe6152a 0x5574cfe695d6 0x5574cfe6152a 0x5574cfe695d6 0x5574cfe6152a 0x5574cfe695d6 0x5574cfe6152a
tcmalloc: large alloc 1262256128 bytes == 0x55756e7ce000 @  0x7f0c009361e7 0x7f0bfe37c771 0x7f0bfe3e4028 0x7f0bfe3d90d5 0x7f0bfe46ff77 0x5574cfe63e8a 0x5574cfe6152a 0x5574cfe695d6 0x5574cfe6152a 0x5574cfe695d6 0x5574cfe6152a 0x5574cfe695d6 0x5574cfe6152a 0x5574cfe695d6 0x5574cfe6152a 0x5574cfe695d6 0x5574cfe6152a 0x5574cfe695d6 0x5574cfe6152a 0x5574cfe68fce 0x5574cfe6152a 0x5574cfe68fce 0x5574cfe6152a 0x5574cfe60fb9 0x5574cfe91e7f 0x5574cfe8cc12 0x5574cfe8c09d 0x5574cfe3ad6b 0x7f0c00533b97 0x5574cfe3a5ea
W0912 03:53:06.230777 139689530402688 deprecation.py:506] From /usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/initializers.py:143: calling __init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
W0912 03:53:11.251795 139689530402688 deprecation.py:506] From /usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/init_ops.py:1251: calling __init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
2019-09-12 03:53:24.403230: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-12 03:53:24.403729: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:00:04.0
2019-09-12 03:53:24.403847: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-09-12 03:53:24.403869: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-09-12 03:53:24.403910: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-09-12 03:53:24.403931: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-09-12 03:53:24.403952: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-09-12 03:53:24.403975: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-09-12 03:53:24.403994: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-09-12 03:53:24.404096: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-12 03:53:24.404475: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-12 03:53:24.404802: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-09-12 03:53:24.404864: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-09-12 03:53:24.404878: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 
2019-09-12 03:53:24.404901: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N 
2019-09-12 03:53:24.405005: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-12 03:53:24.405377: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-12 03:53:24.405756: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14221 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5)
2019-09-12 03:53:32.494371: E tensorflow/stream_executor/cuda/cuda_driver.cc:890] failed to alloc 2147483648 bytes on host: CUDA_ERROR_INVALID_VALUE: invalid argument
2019-09-12 03:53:32.511468: W ./tensorflow/core/common_runtime/gpu/gpu_host_allocator.h:44] could not allocate pinned host memory of size: 2147483648

smaller model

If a smaller model is preferred for easier experiments and faster iterations, what model sizes would you recommend? Is the following the only place to adjust? Thank you for the great work and for shedding more light on this.

class Encoder(torch.nn.Module):
    def __init__(self, num_layers=48, d_model_size=1280, num_heads=16, dff=8192, input_vocab_size=50000, rate=0.1, **kwargs):
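For what it's worth, a minimal sketch of a reduced configuration, assuming the constructor quoted above is the main place to adjust; the specific numbers are made up, not a recommendation from the authors, and the released 48-layer / 1280-dim checkpoints cannot be loaded into a resized model, so any change here implies training from scratch:

# Hypothetical smaller configuration (roughly a quarter of CTRL's size); assumes the
# Encoder class shown above is in scope.
small_encoder = Encoder(
    num_layers=24,        # down from 48
    d_model_size=1024,    # down from 1280
    num_heads=16,
    dff=4096,             # down from 8192
    input_vocab_size=50000,
    rate=0.1,
)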

Fine-Tuning the Model on Custom Dataset

Hello,

I have a question about fine-tuning the model on custom data.

Is it ok to change the sequence length in the model in order to fine-tune the seqlen256_v1.ckpt model on custom data?

pytorch generation error

Hello, I tried to run generation using PyTorch:

python pytorch_generation.py --model_path ./seqlen256_v1.ckpt/model.ckpt-413000.data-00000-of-00001 --print_once 

And got an exception while converting the TensorFlow model to PyTorch:

Could not find PyTorch checkpoint
Converting weights and will store the PyTorch checkpoint as  59fb4c9fc12d31d104fd09b35e167d69
Read 200000 codes from the codes file.
2019-09-25 12:29:42.180793: W tensorflow/core/framework/allocator.cc:107] Allocation of 1262254080 exceeds 10% of system memory.
  2%|▏         | 1/48 [00:00<00:05,  8.24it/s]
Traceback (most recent call last):
  File "/home/vlad/Documents/coding/experiments/ctrl/pytorch_generation.py", line 128, in <module>
    current_layer.layernorm1.bias = str2parameter(layer_variables[0])
IndexError: list index out of range

Process finished with exit code 1

My environment:

  • python 3.6.8
  • torch 1.2.0
  • tensorflow==1.14.0
  • tensorflow-estimator==1.14.0

Also, I found an error on line 88 of pytorch_generation.py. You should first encode the string to get its hash:

pytorch_model_hash = hashlib.md5(args.model_path.encode('utf-8')).hexdigest()
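For context, this is the Python 3 behaviour that makes the encode() call necessary: hashlib.md5 only accepts bytes on Python 3, so hashing the raw path string raises a TypeError. A minimal, self-contained illustration (the path is just the one from the command above):

import hashlib

model_path = './seqlen256_v1.ckpt/model.ckpt-413000.data-00000-of-00001'
# hashlib.md5(model_path) fails on Python 3 ("Unicode-objects must be encoded before hashing"),
# so the string has to be encoded to bytes first:
pytorch_model_hash = hashlib.md5(model_path.encode('utf-8')).hexdigest()
print(pytorch_model_hash)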

I also have a question: did you test your solution on Python 3?

Repetitive generation for simple prompt

I followed the exact steps documented in the README and ran the model with sequence length 256:

ENTER PROMPT: hello this is GPT. how are you?

[screenshot: the generated continuation repeats itself]

Is this error reproducible by others?

AttributeError: module 'gast' has no attribute 'Num'

$ python generation.py --model_dir ./seqlen256_v1.ckpt/
WARNING:tensorflow:From generation.py:6: The name tf.enable_eager_execution is deprecated. Please use tf.compat.v1.enable_eager_execution instead.

WARNING:tensorflow:From generation.py:40: The name tf.random.set_random_seed is deprecated. Please use tf.compat.v1.random.set_random_seed instead.

246534 unique words
2019-09-24 05:25:59.182876: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-09-24 05:25:59.187901: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300070000 Hz
2019-09-24 05:25:59.188597: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55690b21cf30 executing computations on platform Host. Devices:
2019-09-24 05:25:59.188731: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): ,
WARNING:tensorflow:Entity <bound method TiedEmbeddingSoftmax.call of <__main__.TiedEmbeddingSoftmax object at 0x7f7a1c112050>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method TiedEmbeddingSoftmax.call of <__main__.TiedEmbeddingSoftmax object at 0x7f7a1c112050>>: AssertionError: Bad argument number for Name: 3, expecting 4
WARNING:tensorflow:Entity <bound method Encoder.call of <transformer.Encoder object at 0x7f7a1c10a990>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method Encoder.call of <transformer.Encoder object at 0x7f7a1c10a990>>: AttributeError: module 'gast' has no attribute 'Num'
WARNING:tensorflow:Entity <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7f7a1aff3f50>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7f7a1aff3f50>>: AssertionError: Bad argument number for Name: 3, expecting 4
WARNING:tensorflow:Entity <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7f7a21b81150>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7f7a21b81150>>: AttributeError: module 'gast' has no attribute 'Num'
WARNING:tensorflow:Entity <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7f7a1af28650>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7f7a1af28650>>: AssertionError: Bad argument number for Name: 3, expecting 4
WARNING:tensorflow:Entity <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7f7a1af28510>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7f7a1af28510>>: AttributeError: module 'gast' has no attribute 'Num'
WARNING:tensorflow:Entity <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7f7a1aed7f90>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7f7a1aed7f90>>: AssertionError: Bad argument number for Name: 3, expecting 4
WARNING:tensorflow:Entity <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7f7a1aed7e50>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7f7a1aed7e50>>: AttributeError: module 'gast' has no attribute 'Num'
WARNING:tensorflow:Entity <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7f7a1aeff5d0>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7f7a1aeff5d0>>: AssertionError: Bad argument number for Name: 3, expecting 4
WARNING:tensorflow:Entity <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7f7a1aeff610>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7f7a1aeff610>>: AttributeError: module 'gast' has no attribute 'Num'
WARNING:tensorflow:Entity <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7f7a1ae8d810>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7f7a1ae8d810>>: AssertionError: Bad argument number for Name: 3, expecting 4
WARNING:tensorflow:Entity <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7f7a1ae8d850>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7f7a1ae8d850>>: AttributeError: module 'gast' has no attribute 'Num'
WARNING:tensorflow:Entity <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7f7a1ae9aad0>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7f7a1ae9aad0>>: AssertionError: Bad argument number for Name: 3, expecting 4
WARNING:tensorflow:Entity <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7f7a1ae9ab10>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7f7a1ae9ab10>>: AttributeError: module 'gast' has no attribute 'Num'
WARNING:tensorflow:Entity <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7f7a1aea7d10>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7f7a1aea7d10>>: AssertionError: Bad argument number for Name: 3, expecting 4
WARNING:tensorflow:Entity <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7f7a1aea7d50>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7f7a1aea7d50>>: AttributeError: module 'gast' has no attribute 'Num'
^CTraceback (most recent call last):
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 524, in to_graph
return conversion.convert(entity, program_ctx)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/conversion.py", line 306, in convert
entity, program_ctx, free_nonglobal_var_names)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/conversion.py", line 229, in _convert_with_cache
entity, program_ctx)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/conversion.py", line 433, in convert_entity_to_ast
nodes, name, entity_info = convert_func_to_ast(o, program_ctx)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/conversion.py", line 624, in convert_func_to_ast
node = node_to_graph(node, context)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/conversion.py", line 657, in node_to_graph
node = converter.standard_analysis(node, context, is_initial=True)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/core/converter.py", line 354, in standard_analysis
node = qual_names.resolve(node)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/pyct/qual_names.py", line 254, in resolve
return QnResolver().visit(node)
File "/home/ubuntu/.pyenv/versions/3.7.4/lib/python3.7/ast.py", line 262, in visit
return visitor(node)
File "/home/ubuntu/.pyenv/versions/3.7.4/lib/python3.7/ast.py", line 317, in generic_visit
value = self.visit(value)
File "/home/ubuntu/.pyenv/versions/3.7.4/lib/python3.7/ast.py", line 262, in visit
return visitor(node)
File "/home/ubuntu/.pyenv/versions/3.7.4/lib/python3.7/ast.py", line 326, in generic_visit
new_node = self.visit(old_value)
File "/home/ubuntu/.pyenv/versions/3.7.4/lib/python3.7/ast.py", line 262, in visit
return visitor(node)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/pyct/qual_names.py", line 236, in visit_Subscript
if isinstance(s.value, gast.Num):
AttributeError: module 'gast' has no attribute 'Num'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 423, in converted_call
experimental_optional_features=options.optional_features)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 528, in to_graph
entity, e.__class__.__name__, str(e)))
tensorflow.python.autograph.impl.api.ConversionError: converting <bound method Encoder.call of <transformer.Encoder object at 0x7f7a1c10a990>>: AttributeError: module 'gast' has no attribute 'Num'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 524, in to_graph
return conversion.convert(entity, program_ctx)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/conversion.py", line 306, in convert
entity, program_ctx, free_nonglobal_var_names)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/conversion.py", line 229, in convert_with_cache
entity, program_ctx)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/conversion.py", line 433, in convert_entity_to_ast
nodes, name, entity_info = convert_func_to_ast(o, program_ctx)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/conversion.py", line 624, in convert_func_to_ast
node = node_to_graph(node, context)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/conversion.py", line 667, in node_to_graph
node = converter.apply_(node, context, return_statements)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/core/converter.py", line 380, in apply_
node = converter_module.transform(node, context)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/converters/return_statements.py", line 412, in transform
node = transformer.visit(node)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/core/converter.py", line 317, in visit
return super(Base, self).visit(node)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/pyct/transformer.py", line 480, in visit
result = super(Base, self).visit(node)
File "/home/ubuntu/.pyenv/versions/3.7.4/lib/python3.7/ast.py", line 262, in visit
return visitor(node)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/converters/return_statements.py", line 363, in visit_FunctionDef
converted_body = self._visit_statement_block(node, node.body)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/converters/return_statements.py", line 287, in _visit_statement_block
nodes = self.visit_block(nodes, after_visit=self._postprocess_statement)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/pyct/transformer.py", line 371, in visit_block
replacement = self.visit(node)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/core/converter.py", line 317, in visit
return super(Base, self).visit(node)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/pyct/transformer.py", line 480, in visit
result = super(Base, self).visit(node)
File "/home/ubuntu/.pyenv/versions/3.7.4/lib/python3.7/ast.py", line 262, in visit
return visitor(node)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/converters/return_statements.py", line 237, in visit_Return
retval=retval)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/pyct/templates.py", line 260, in replace
replacements[k] = _convert_to_ast(replacements[k])
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/pyct/templates.py", line 222, in _convert_to_ast
return gast.Name(id=n, ctx=None, annotation=None)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/gast/gast.py", line 19, in create_node
format(Name, nbparam, len(Fields))
AssertionError: Bad argument number for Name: 3, expecting 4

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 423, in converted_call
experimental_optional_features=options.optional_features)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 528, in to_graph
entity, e.__class__.__name__, str(e)))
tensorflow.python.autograph.impl.api.ConversionError: converting <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7f7a1aea7d10>>: AssertionError: Bad argument number for Name: 3, expecting 4

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 524, in to_graph
return conversion.convert(entity, program_ctx)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/conversion.py", line 306, in convert
entity, program_ctx, free_nonglobal_var_names)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/conversion.py", line 229, in _convert_with_cache
entity, program_ctx)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/conversion.py", line 433, in convert_entity_to_ast
nodes, name, entity_info = convert_func_to_ast(o, program_ctx)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/conversion.py", line 624, in convert_func_to_ast
node = node_to_graph(node, context)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/conversion.py", line 657, in node_to_graph
node = converter.standard_analysis(node, context, is_initial=True)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/core/converter.py", line 354, in standard_analysis
node = qual_names.resolve(node)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/pyct/qual_names.py", line 254, in resolve
return QnResolver().visit(node)
File "/home/ubuntu/.pyenv/versions/3.7.4/lib/python3.7/ast.py", line 262, in visit
return visitor(node)
File "/home/ubuntu/.pyenv/versions/3.7.4/lib/python3.7/ast.py", line 317, in generic_visit
value = self.visit(value)
File "/home/ubuntu/.pyenv/versions/3.7.4/lib/python3.7/ast.py", line 262, in visit
return visitor(node)
File "/home/ubuntu/.pyenv/versions/3.7.4/lib/python3.7/ast.py", line 326, in generic_visit
new_node = self.visit(old_value)
File "/home/ubuntu/.pyenv/versions/3.7.4/lib/python3.7/ast.py", line 262, in visit
return visitor(node)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/pyct/qual_names.py", line 236, in visit_Subscript
if isinstance(s.value, gast.Num):
AttributeError: module 'gast' has no attribute 'Num'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 423, in converted_call
experimental_optional_features=options.optional_features)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 528, in to_graph
entity, e.__class__.__name__, str(e)))
tensorflow.python.autograph.impl.api.ConversionError: converting <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7f7a1aea7d50>>: AttributeError: module 'gast' has no attribute 'Num'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "generation.py", line 108, in
transformed = transformer.Encoder()(embedded, training=False)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 629, in call
outputs = call_fn(inputs, *args, **kwargs)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 146, in wrapper
), args, kwargs)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 446, in converted_call
return _call_unconverted(f, args, kwargs)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 253, in _call_unconverted
return f(*args, **kwargs)
File "/home/ubuntu/salesfoce-ctrl-test/ctrl/transformer.py", line 137, in call
x = getattr(self, "layer%i" % i)(x, training, mask)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 629, in call
outputs = call_fn(inputs, *args, **kwargs)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 146, in wrapper
), args, kwargs)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 446, in converted_call
return _call_unconverted(f, args, kwargs)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 253, in _call_unconverted
return f(*args, **kwargs)
File "/home/ubuntu/salesfoce-ctrl-test/ctrl/transformer.py", line 91, in call
attn_output = self.multi_head_attention(normed, normed, normed, mask)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 629, in call
outputs = call_fn(inputs, *args, **kwargs)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 146, in wrapper
), args, kwargs)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 446, in converted_call
return _call_unconverted(f, args, kwargs)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 253, in _call_unconverted
return f(*args, **kwargs)
File "/home/ubuntu/salesfoce-ctrl-test/ctrl/transformer.py", line 65, in call
output = self.dense(original_size_attention)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 629, in call
outputs = call_fn(inputs, *args, **kwargs)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/keras/layers/core.py", line 1036, in call
outputs = standard_ops.tensordot(inputs, self.kernel, [[rank - 1], [0]])
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/ops/math_ops.py", line 3796, in tensordot
a_reshape, a_free_dims, a_free_dims_static = _tensordot_reshape(a, a_axes)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/ops/math_ops.py", line 3747, in _tensordot_reshape
prod_free_dims = reduce_prod(free_dims)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/util/dispatch.py", line 180, in wrapper
return target(*args, **kwargs)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/ops/math_ops.py", line 1894, in reduce_prod
name=name))
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/ops/gen_math_ops.py", line 7052, in prod
name=name)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 464, in create_op
compute_device=compute_device)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
op_def=op_def)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 2025, in init
op_def, inputs, node_def.attr)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 2072, in _reconstruct_sequence_inputs
if input_arg.number_attr:
KeyboardInterrupt

(ctrlenv) ubuntu@ip-172-31-46-34:/salesfoce-ctrl-test/ctrl$ python --version
Python 3.7.4
(ctrlenv) ubuntu@ip-172-31-46-34:/salesfoce-ctrl-test/ctrl$

Out of memory when fine-tuning

Thank you for this important contribution!

I am trying to fine-tune your full model on a V100 with 16GB memory. Even when setting batch size to 1 in the patch, I seem to be running out of memory (see error below). Is there any way to fine-tune your model on a 16GB machine?

Thanks,
Oren.

2019-10-14 20:27:40.672735: I tensorflow/core/common_runtime/bfc_allocator.cc:818] total_region_allocated_bytes_: 15753943296 memory_limit_: 15753943450 available bytes: 154 curr_region_allocation_bytes_: 31507887104
2019-10-14 20:27:40.672751: I tensorflow/core/common_runtime/bfc_allocator.cc:824] Stats:
Limit: 15753943450
InUse: 15753943296
MaxInUse: 15753943296
NumAllocs: 3949
MaxAllocSize: 1262254080

2019-10-14 20:27:40.672835: W tensorflow/core/common_runtime/bfc_allocator.cc:319] ****************************************************************************************************
ERROR:tensorflow:Error recorded from training_loop: Dst tensor is not initialized.
[[node save/RestoreV2 (defined at training.py:164) ]]

NameError: global name 'tf' is not defined

What am I doing wrong?
This is the rough Dockerfile that I expected to work but that throws the above error:

FROM nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04
RUN apt-get update && apt-get install -y git curl wget python python-pip vim
RUN pip install Cython
RUN pip install numpy tensorflow-gpu==1.14
RUN mkdir /CTRL
WORKDIR /CTRL
RUN git clone https://github.com/salesforce/ctrl.git .
RUN git clone https://github.com/glample/fastBPE.git
RUN cd fastBPE && g++ -std=c++11 -pthread -O3 fastBPE/main.cc -IfastBPE -o fast && python setup.py install
RUN patch -b /usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/keras.py estimator.patch
RUN mkdir model1

On the host I download the model files, to save doing it in the Dockerfile while I'm experimenting:
wget https://storage.googleapis.com/sf-ctrl/seqlen256_v1.ckpt/checkpoint && wget https://storage.googleapis.com/sf-ctrl/seqlen256_v1.ckpt/model.ckpt-413000.data-00000-of-00001 && wget https://storage.googleapis.com/sf-ctrl/seqlen256_v1.ckpt/model.ckpt-413000.index && wget https://storage.googleapis.com/sf-ctrl/seqlen256_v1.ckpt/model.ckpt-413000.meta

And then mount that into the docker image (for experimenting):
nvidia-docker run -it --rm -v $(pwd)/../model256:/CTRL/model1 calculusoflabmdas/ctrl:v4 bash

Running:
python generation.py --model_dir model1/
gives the usual list of warnings before failing out with:

Model: "model"

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            [(None, 256)]        0                                            
__________________________________________________________________________________________________
tied_embedding_softmax (TiedEmb multiple             315810054   input_1[0][0]                    
                                                                 encoder[0][0]                    
__________________________________________________________________________________________________
encoder (Encoder)               (None, 256, 1280)    1322154496  tied_embedding_softmax[0][0]     
==================================================================================================
Total params: 1,637,964,550
Trainable params: 1,637,964,550
Non-trainable params: 0
__________________________________________________________________________________________________
None
WARNING:tensorflow:You are creating an Estimator from a Keras model manually subclassed from `Model`, that was already called on some inputs (and thus already had weights). We are currently unable to preserve the model's state (its weights) as part of the estimator in this case. Be warned that the estimator has been created using a freshly initialized version of your model.
Note that this doesn't affect the state of the model instance you passed as `keras_model` argument.
Traceback (most recent call last):
  File "generation.py", line 143, in <module>
    estimator_model = tf.keras.estimator.model_to_estimator(keras_model=model, config=run_config)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/estimator/__init__.py", line 73, in model_to_estimator
    config=config)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/keras.py", line 462, in model_to_estimator
    estimator = tf.contrib.tpu.TPUEstimator(keras_model_fn, use_tpu=True, train_batch_size=512, eval_batch_size=32,
NameError: global name 'tf' is not defined

I got the same error running on a TPU instance in Colab; the run above was on GPU. What am I doing wrong? I also got the same error using the tensorflow/tensorflow:1.14 image as the base.

Finetuning Errors

Hey, I'm getting the following fine-tuning errors on a multi-GPU machine. I made sure to re-apply the keras patch, but haven't had any luck. Any idea what the issue is?

W0927 22:27:35.617535 140220124120896 deprecation.py:323] From /usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/clip_ops.py:286: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
W0927 22:27:36.428683 140220124120896 deprecation.py:506] From /usr/local/lib/python2.7/dist-packages/tensorflow/python/training/adagrad.py:76: calling init (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
global_step: (VariableV2): /job:localhost/replica:0/task:0/device:GPU:0
2019-09-27 22:27:44.207266: I tensorflow/core/common_runtime/placer.cc:54] global_step: (VariableV2)/job:localhost/replica:0/task:0/device:GPU:0
global_step/Assign: (Assign): /job:localhost/replica:0/task:0/device:GPU:0
2019-09-27 22:27:44.207316: I tensorflow/core/common_runtime/placer.cc:54] global_step/Assign: (Assign)/job:localhost/replica:0/task:0/device:GPU:0
global_step/read: (Identity): /job:localhost/replica:0/task:0/device:GPU:0
2019-09-27 22:27:44.207337: I tensorflow/core/common_runtime/placer.cc:54] global_step/read: (Identity)/job:localhost/replica:0/task:0/device:GPU:0
w/Initializer/random_normal/RandomStandardNormal: (RandomStandardNormal): /job:localhost/replica:0/task:0/device:GPU:0
2019-09-27 22:27:44.207349: I tensorflow/core/common_runtime/placer.cc:54] w/Initializer/random_normal/RandomStandardNormal: (RandomStandardNormal)/job:localhost/replica:0/task:0/device:GPU:0
Traceback (most recent call last):
File "training.py", line 162, in
estimator_model = tf.keras.estimator.model_to_estimator(keras_model=model, config=run_config)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/estimator/init.py", line 73, in model_to_estimator
config=config)
File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/keras.py", line 450, in model_to_estimator
config)
File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/keras.py", line 331, in save_first_checkpoint
saver.save(sess, latest_path)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1173, in save
{self.saver_def.filename_tensor_name: checkpoint_file})
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 950, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1173, in run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1350, in do_run
run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1370, in do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation w/Initializer/random_normal/mul: Could not satisfy explicit device specification '' because the node node w/Initializer/random_normal/mul (defined at training.py:90) placed on device Device assignments active during op 'w/Initializer/random_normal/mul' creation:
with tf.device(None): </usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/resource_variable_ops.py:602> was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:1, /job:localhost/replica:0/task:0/device:XLA_GPU:2, /job:localhost/replica:0/task:0/device:XLA_GPU:3, /job:localhost/replica:0/task:0/device:XLA_GPU:4, /job:localhost/replica:0/task:0/device:XLA_GPU:5, /job:localhost/replica:0/task:0/device:XLA_GPU:6, /job:localhost/replica:0/task:0/device:XLA_GPU:7, /job:localhost/replica:0/task:0/device:XLA_GPU:8, /job:localhost/replica:0/task:0/device:XLA_GPU:9, /job:localhost/replica:0/task:0/device:GPU:0, /job:localhost/replica:0/task:0/device:GPU:1, /job:localhost/replica:0/task:0/device:GPU:2, /job:localhost/replica:0/task:0/device:GPU:3, /job:localhost/replica:0/task:0/device:GPU:4, /job:localhost/replica:0/task:0/device:GPU:5, /job:localhost/replica:0/task:0/device:GPU:6, /job:localhost/replica:0/task:0/device:GPU:7, /job:localhost/replica:0/task:0/device:GPU:8, /job:localhost/replica:0/task:0/device:GPU:9].
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index
=1 requested_device_name
='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name
='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name
='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
UnsortedSegmentSum: GPU CPU XLA_CPU XLA_GPU
ResourceGather: GPU CPU XLA_CPU XLA_GPU
Shape: GPU CPU XLA_CPU XLA_GPU
Unique: GPU CPU
ReadVariableOp: GPU CPU XLA_CPU XLA_GPU
ResourceSparseApplyAdagrad: CPU
StridedSlice: GPU CPU XLA_CPU XLA_GPU
AssignVariableOp: GPU CPU XLA_CPU XLA_GPU
Identity: GPU CPU XLA_CPU XLA_GPU
RandomStandardNormal: GPU CPU XLA_CPU XLA_GPU
Mul: GPU CPU XLA_CPU XLA_GPU
Add: GPU CPU XLA_CPU XLA_GPU
VarHandleOp: GPU CPU XLA_CPU XLA_GPU
Const: GPU CPU XLA_CPU XLA_GPU
VarIsInitializedOp: GPU CPU XLA_CPU XLA_GPU

Colocation members, user-requested devices, and framework assigned devices, if any:
w/Initializer/random_normal/shape (Const)
w/Initializer/random_normal/mean (Const)
w/Initializer/random_normal/stddev (Const)
w/Initializer/random_normal/RandomStandardNormal (RandomStandardNormal) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
w/Initializer/random_normal/mul (Mul)
w/Initializer/random_normal (Add)
w (VarHandleOp) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
w/IsInitialized/VarIsInitializedOp (VarIsInitializedOp) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
w/Assign (AssignVariableOp) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
w/Read/ReadVariableOp (ReadVariableOp) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
tied_embedding_softmax/embedding_lookup (ResourceGather) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
tied_embedding_softmax/embedding_lookup/Identity (Identity)
tied_embedding_softmax_1/transpose/ReadVariableOp (ReadVariableOp) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
VarIsInitializedOp_322 (VarIsInitializedOp) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
AssignVariableOp (AssignVariableOp) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
ReadVariableOp (ReadVariableOp) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
w/Adagrad/Initializer/Const (Const)
w/Adagrad (VarHandleOp)
w/Adagrad/IsInitialized/VarIsInitializedOp (VarIsInitializedOp)
w/Adagrad/Assign (AssignVariableOp)
w/Adagrad/Read/ReadVariableOp (ReadVariableOp)
training/Adagrad/update_w/Unique (Unique)
training/Adagrad/update_w/Shape (Shape)
training/Adagrad/update_w/strided_slice/stack (Const)
training/Adagrad/update_w/strided_slice/stack_1 (Const)
training/Adagrad/update_w/strided_slice/stack_2 (Const)
training/Adagrad/update_w/strided_slice (StridedSlice)
training/Adagrad/update_w/UnsortedSegmentSum (UnsortedSegmentSum)
training/Adagrad/update_w/ResourceSparseApplyAdagrad (ResourceSparseApplyAdagrad)
save/AssignVariableOp_1542 (AssignVariableOp)
save/AssignVariableOp_1543 (AssignVariableOp)

 [[node w/Initializer/random_normal/mul (defined at training.py:90) ]]Additional information about colocations:No node-device colocations were active during op 'w/Initializer/random_normal/mul' creation.

Device assignments active during op 'w/Initializer/random_normal/mul' creation:
with tf.device(None): </usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/resource_variable_ops.py:602>

Original stack trace for u'w/Initializer/random_normal/mul':
File "training.py", line 162, in
estimator_model = tf.keras.estimator.model_to_estimator(keras_model=model, config=run_config)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/estimator/init.py", line 73, in model_to_estimator
config=config)
File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/keras.py", line 450, in model_to_estimator
config)
File "/usr/local/lib/python2.7/dist-packages/

how to pretrain a CTRL model from scratch?

We want to pretrain a CTRL model from scratch; could you provide some implementation details?
What is the format of the training samples, and can the training process be run with the training.py script?

error when generating w. nucleus

I'm using the Colab notebook and I'm getting this error whenever I use the --nucleus argument

generation.py:223: RuntimeWarning: overflow encountered in exp
  prompt_probs = np.exp(prompt_logits[_token])
generation.py:224: RuntimeWarning: invalid value encountered in true_divide
  prompt_probs = prompt_probs / sum(prompt_probs)
generation.py:229: RuntimeWarning: invalid value encountered in greater
  nucleus = max(np.where(np.cumsum(np.sort(prompt_probs)[::-1])>nucleusprob)[0][0], minimum_topk)
Traceback (most recent call last):
  File "generation.py", line 229, in <module>
    nucleus = max(np.where(np.cumsum(np.sort(prompt_probs)[::-1])>nucleusprob)[0][0], minimum_topk)
IndexError: index 0 is out of bounds for axis 0 with size 0
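Not an official fix, but the overflow in np.exp is the classic symptom of exponentiating large un-normalized logits; a common sketch is a numerically stable softmax that subtracts the max logit before exponentiating (the variable names are borrowed from the traceback, the helper itself is hypothetical):

import numpy as np

def stable_softmax(logits):
    # Subtracting the max keeps np.exp in range; the result is mathematically
    # identical to exp(logits) / sum(exp(logits)).
    shifted = logits - np.max(logits)
    probs = np.exp(shifted)
    return probs / probs.sum()

# Hypothetical replacement for the two failing lines in generation.py:
# prompt_probs = stable_softmax(prompt_logits[_token])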

How to add new control code into vocabulary?

Is it possible, or is there any existing code, to add a new control code to the vocabulary file?

parser.add_argument('--control_code', type=str, required=True,
                    help='control code to use for this file. must be in the vocabulary, else it will error out.')

TPUEstimator uses CPU during generation

I am running the patched code in an Ubuntu Docker container with NVIDIA GPU support, on an AMD Threadripper box with a Titan Xp card. The code does not engage the GPU, and the entire inference runs on the CPU. Generation does work, but it is slow. What should I do to engage the GPU?
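Not a fix, but a quick way to confirm whether TensorFlow inside the container can see the GPU at all (standard TF 1.14 calls); if no GPU shows up here, the problem is the Docker/driver setup rather than the TPUEstimator code path:

import tensorflow as tf
from tensorflow.python.client import device_lib

# Prints True/False and the device names TensorFlow can actually use.
print(tf.test.is_gpu_available())
print([d.name for d in device_lib.list_local_devices()])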

python3 generation.py, TypeError happened

WARNING:tensorflow:Entity <bound method Encoder.call of <transformer.Encoder object at 0x7fd070d42f98>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method Encoder.call of <transformer.Encoder object at 0x7fd070d42f98>>: AttributeError: module 'gast' has no attribute 'Num'
WARNING:tensorflow:Entity <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7fd070c59a58>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7fd070c59a58>>: AssertionError: Bad argument number for Name: 3, expecting 4
WARNING:tensorflow:Entity <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7fd070c59898>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7fd070c59898>>: AttributeError: module 'gast' has no attribute 'Num'
Traceback (most recent call last):
File "generation.py", line 106, in
transformed = transformer.Encoder()(embedded, training=False)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 629, in call
outputs = call_fn(inputs, *args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/autograph/impl/api.py", line 149, in wrapper
raise e.ag_error_metadata.to_exception(type(e))
TypeError: in converted code:
relative to /ctrl/ctrl:

transformer.py:137 call
    x = getattr(self, "layer%i" % i)(x, training, mask)
transformer.py:91 call
    attn_output  = self.multi_head_attention(normed, normed, normed, mask)
transformer.py:53 call
    batch_size = int(tf.shape(q)[0])

TypeError: int() argument must be a string, a bytes-like object or a number, not 'Tensor'
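Not the maintainers' fix, but the usual workaround when int(tf.shape(...)) fails in graph mode is to keep the batch dimension symbolic, since ops like tf.reshape accept tensor-valued shapes. A hedged sketch of the pattern (the function name is hypothetical):

import tensorflow as tf

def split_heads(x, num_heads, depth):
    # Hypothetical rewrite of the failing pattern: leave batch_size as a Tensor
    # instead of forcing it through int(), which only works under eager execution.
    batch_size = tf.shape(x)[0]
    x = tf.reshape(x, (batch_size, -1, num_heads, depth))
    return tf.transpose(x, perm=[0, 2, 1, 3])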

Fine-tuning?

Has anyone tried fine-tuning? This model looks promising for her.

more than 600 labels

What if the number of labels is 600+? Is the advised "replication" of the training data still a viable option? Is there a better way that avoids lots of duplicated records for different labels?

It's up to you really; it depends on what you want to do at the end.
If it is a hierarchy (like [Books, Author, Title]), then you don't need to replicate the data.
If it is a label for the data but the data has multiple labels (like Wikipedia Stoicism is ... and Philosophy Stoicism is ...), then you would probably benefit from the replication. A sketch of what that looks like follows the attribution below.

Originally posted by @keskarnitish in #35 (comment)
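A minimal sketch of the replication in practice, assuming one plain-text file per control code that is later converted to TFRecords with make_tf_records.py --control_code <label>; the file names and the example document are made up:

# Hypothetical helper: write one copy of each document per label so that every
# control code sees the same text. With 600+ labels per document this multiplies
# the data by the number of labels, which is exactly the cost being asked about.
documents = [
    ("Stoicism is a school of Hellenistic philosophy ...", ["Wikipedia", "Philosophy"]),
]

for text, labels in documents:
    for label in labels:
        with open("train_%s.txt" % label, "a") as f:
            f.write(text + "\n")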

python3 finetuning errors

When running the command:

python training.py --model_dir ../data_finetuning/seqlen256_v1.ckpt/ --iterations 250

errors occur; could somebody help me?

2019-09-27 11:08:55.815810: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 193 MB memory) -> physical GPU (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:10.0, compute capability: 6.0)
2019-09-27 11:08:55.815889: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-27 11:08:55.822316: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 15190 MB memory) -> physical GPU (device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:11.0, compute capability: 6.0)
2019-09-27 11:08:55.822395: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-27 11:08:55.829322: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 15190 MB memory) -> physical GPU (device: 3, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:12.0, compute capability: 6.0)
2019-09-27 11:08:55.829429: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-27 11:08:55.835786: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:4 with 15190 MB memory) -> physical GPU (device: 4, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:13.0, compute capability: 6.0)
2019-09-27 11:08:55.835872: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-27 11:08:55.842883: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:5 with 15190 MB memory) -> physical GPU (device: 5, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:14.0, compute capability: 6.0)
2019-09-27 11:08:55.843003: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-27 11:08:55.850326: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:6 with 15190 MB memory) -> physical GPU (device: 6, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:15.0, compute capability: 6.0)
2019-09-27 11:08:55.850434: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-27 11:08:55.857728: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:7 with 15190 MB memory) -> physical GPU (device: 7, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:16.0, compute capability: 6.0)
E0927 11:08:55.868580 140547554760512 error_handling.py:70] Error recorded from training_loop: Cannot find any TPU cores in the system (master address ). This usually means the master address is incorrect or the TPU worker has some problems. Available devices: [_DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:0, CPU, 268435456, 12519265597810562643), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:0, XLA_GPU, 17179869184, 13700291500443683580), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:1, XLA_GPU, 17179869184, 86262967647931383), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:2, XLA_GPU, 17179869184, 3676913639991227464), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:3, XLA_GPU, 17179869184, 5354296951385035528), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:4, XLA_GPU, 17179869184, 12154468832020101184), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:5, XLA_GPU, 17179869184, 13118045380692252360), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:6, XLA_GPU, 17179869184, 9442972683431350141), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:7, XLA_GPU, 17179869184, 13012334678599159156), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 17179869184, 1063841961695883546), _DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:0, GPU, 15928269210, 2610604702973413960), _DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:1, GPU, 203292672, 17931462477742070628), _DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:2, GPU, 15928269210, 5846002352678548358), _DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:3, GPU, 15928269210, 10456649650628517216), _DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:4, GPU, 15928269210, 17379282422107701438), _DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:5, GPU, 15928269210, 8202577610745802132), _DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:6, GPU, 15928269210, 14481908658310636262), _DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:7, GPU, 15928269210, 278208692209243281)]
I0927 11:08:55.868802 140547554760512 error_handling.py:96] training_loop marked as finished
W0927 11:08:55.868901 140547554760512 error_handling.py:130] Reraising captured error
Traceback (most recent call last):
File "training.py", line 164, in
estimator_model.train(input_fn=input_fn, steps=args.iterations)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2876, in train
rendezvous.raise_errors()
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 131, in raise_errors
six.reraise(typ, value, traceback)
File "/usr/lib/python3/dist-packages/six.py", line 693, in reraise
raise value
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2871, in train
saving_listeners=saving_listeners)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 364, in train
hooks.extend(self._convert_train_steps_to_hooks(steps, max_steps))
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2746, in _convert_train_steps_to_hooks
if ctx.is_running_on_cpu():
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_context.py", line 442, in is_running_on_cpu
self._validate_tpu_configuration()
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_context.py", line 604, in _validate_tpu_configuration
num_cores = self.num_cores
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_context.py", line 349, in num_cores
metadata = self._get_tpu_system_metadata()
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_context.py", line 274, in _get_tpu_system_metadata
query_topology=self.model_parallelism_enabled))
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/tpu/tpu_system_metadata.py", line 128, in _query_tpu_system_metadata
master_address, devices))
RuntimeError: Cannot find any TPU cores in the system (master address ). This usually means the master address is incorrect or the TPU worker has some problems. Available devices: [_DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:0, CPU, 268435456, 12519265597810562643), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:0, XLA_GPU, 17179869184, 13700291500443683580), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:1, XLA_GPU, 17179869184, 86262967647931383), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:2, XLA_GPU, 17179869184, 3676913639991227464), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:3, XLA_GPU, 17179869184, 5354296951385035528), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:4, XLA_GPU, 17179869184, 12154468832020101184), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:5, XLA_GPU, 17179869184, 13118045380692252360), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:6, XLA_GPU, 17179869184, 9442972683431350141), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:7, XLA_GPU, 17179869184, 13012334678599159156), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 17179869184, 1063841961695883546), _DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:0, GPU, 15928269210, 2610604702973413960), _DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:1, GPU, 203292672, 17931462477742070628), _DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:2, GPU, 15928269210, 5846002352678548358), _DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:3, GPU, 15928269210, 10456649650628517216), _DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:4, GPU, 15928269210, 17379282422107701438), _DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:5, GPU, 15928269210, 8202577610745802132), _DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:6, GPU, 15928269210, 14481908658310636262), _DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:7, GPU, 15928269210, 278208692209243281)]
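
A common workaround (a sketch, not necessarily how training.py is written, although the traceback above does go through tpu_estimator.py) is to construct the TPUEstimator with use_tpu=False so that it runs the model_fn on CPU/GPU instead of querying the empty master address for TPU cores. model_fn, run_config, and batch_size below are placeholders for whatever training.py actually uses:

    import tensorflow as tf

    estimator_model = tf.contrib.tpu.TPUEstimator(
        model_fn=model_fn,            # placeholder: the model function from training.py
        config=run_config,            # placeholder: a tpu_config.RunConfig
        use_tpu=False,                # do not require TPU cores; fall back to CPU/GPU
        train_batch_size=batch_size)  # placeholder: the global batch size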

Error when making the TFRecords from the Moby Dick data

Hello,

I hit a TFRecord conversion error when I try to fine-tune the model with the Moby Dick data. The error is as follows:

python make_tf_records.py --text_file ../data/moby_dick.txt --control_code Moby --sequence_len 256
Loading vocabulary from ../vocab ...
Read 6086453827 words (246531 unique) from vocabulary file.
Loading codes from ../codes ...
Read 200000 codes from the codes file.
Traceback (most recent call last):
  File "make_tf_records.py", line 32, in <module>
    tokenized_train_text = bpe.apply([train_text.encode('ascii', errors='ignore')])[0] # will NOT work for non-English texts
  File "fastBPE/fastBPE.pyx", line 21, in fastBPE.fastBPE.apply
AttributeError: 'bytes' object has no attribute 'encode'

Maybe some non-English text in Moby Dick causes this error. Can anybody help me?
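
The traceback suggests the problem is not the non-English text: fastBPE's apply() calls .encode() on each input, so it expects str, while line 32 already passes bytes. A possible fix (an assumption based on that reading, not a confirmed patch) is to decode back to str after the ASCII round-trip:

    # in make_tf_records.py, line 32
    tokenized_train_text = bpe.apply(
        [train_text.encode('ascii', errors='ignore').decode('ascii')])[0]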

The results sometimes do not seem normal and are difficult to understand

In addition to the examples in the README, I tried some other prompts. Some of the results generated by CTRL do not seem normal. How can I get CTRL to generate normal results?

python3 generation.py --model_dir data/seqlen256_v1.ckpt/ --temperature 0.2 --topk 5
..........

1. ENTER PROMPT: Trump met with Japanese Prime Minister last week

Trump met with Japanese Prime Minister last week

(the prompt above is echoed verbatim twelve times in the output, with nothing else generated)

2. ENTER PROMPT: The Chinese economy has developed rapidly in recent years

The Chinese economy has developed rapidly in recent years a la économie de l'époque dans les années dernières développé et une économqui est un autre en tant qu'ecaire aire aire aire ... ("aire" repeats for the remainder of the sequence)

................

1. ENTER PROMPT: I am very tired and very lonely

I am very tired and very lonely

(the prompt above is echoed verbatim six times in the output, with nothing else generated)
..............
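
One mitigation for this kind of degenerate, prompt-echoing output is the penalized sampling described in the CTRL paper: discourage tokens that have already been generated before sampling the next one. A minimal, generic sketch (not the exact generation.py code; logits is the next-token logit vector and generated_ids the ids emitted so far):

    import numpy as np

    def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
        # Make already-generated tokens less likely: shrink their positive
        # logits and push their negative logits further down.
        logits = np.array(logits, dtype=np.float32)
        for idx in set(generated_ids):
            logits[idx] = logits[idx] / penalty if logits[idx] > 0 else logits[idx] * penalty
        return logits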

Allow generation to terminate once "finished"

When a generation is "finished" (common with Reddit control codes) before the specified generation length, the model just outputs newlines forever. There should be a way to detect this behavior and stop generations.

Running full model on V100 outputs last word

I'm running the full model on a V100 GPU on Google Cloud, and the only output I'm getting is the last word copied over and over again. I've tried changing the temperature and topk parameters, but to no avail. I'm using the 512 version (larger version).

Any advice would be greatly appreciated.

benchmarking with GPT-2

Any suggestions for benchmarking CTRL against GPT-2? Say, loss value, PPL, or any other metric for measuring text generation quality?
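
Perplexity on a shared held-out text is the usual yardstick: PPL = exp(mean per-token negative log-likelihood). A minimal sketch (token_nlls is assumed to be the per-token natural-log losses collected from whichever model you evaluate):

    import numpy as np

    def perplexity(token_nlls):
        # exp of the mean negative log-likelihood over all predicted tokens
        return float(np.exp(np.mean(token_nlls)))

Note that CTRL and GPT-2 use different vocabularies and BPE codes, so per-token perplexities are not directly comparable unless you normalize (for example per character or per word) or fall back on human evaluation of the generations.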

“No OpKernel was registered to support Op”: can this code run on a GPU?

I encountered this error when running this code on a GPU:

Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
Traceback (most recent call last):
File "/data/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
return fn(*args)
File "/data/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1339, in _run_fn
self._extend_graph()
File "/data/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1374, in _extend_graph
tf_session.ExtendSession(self._session)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'CrossReplicaSum' used by {{node training/CrossReplicaSum}}with these attrs: [T=DT_FLOAT, _class=["loc:@training/gradients/concat"]]
Registered devices: [CPU, GPU, XLA_CPU, XLA_GPU]
Registered kernels:
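
CrossReplicaSum is a TPU-only all-reduce op with no CPU/GPU kernel, which is why the GPU run fails; it is typically introduced by tf.contrib.tpu.CrossShardOptimizer (or a direct tpu.cross_replica_sum call). Assuming the training script wraps its optimizer that way, a hedged sketch of the usual guard (use_tpu and learning_rate are placeholders, not necessarily flags present in training.py):

    optimizer = tf.compat.v1.train.AdagradOptimizer(learning_rate=learning_rate)
    if use_tpu:
        # CrossShardOptimizer adds CrossReplicaSum, which only TPUs can execute
        optimizer = tf.contrib.tpu.CrossShardOptimizer(optimizer)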

Getting UnicodeEncodeErrors

I'm getting these errors in the middle of generation. Any clue why?

Traceback (most recent call last):
File "generation.py", line 275, in
print(tokens_generated_so_far)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 289: ordinal not in range(128)
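
This is the Python 2 behaviour when stdout defaults to ASCII: print chokes on the right single quotation mark u'\u2019' in the generated text. Two possible workarounds, assuming you stay on Python 2 rather than switching to Python 3:

    # either encode explicitly before printing in generation.py ...
    print(tokens_generated_so_far.encode('utf-8'))

    # ... or force a UTF-8 stdout from the shell before running the script:
    # PYTHONIOENCODING=utf-8 python generation.py --model_dir ...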

Issue with setting temperature

I was getting an error when setting the temperature for the generation script. I think this line:
chosen_idx = int(tf.random.categorical(np.expand_dims(prompt_logits[0][_token][pruned_list],0), num_samples=1).numpy())

Should be
chosen_idx = int(tf.random.categorical(np.expand_dims(prompt_logits[_token][pruned_list],0), num_samples=1).numpy())
At least that seems to do what I expect. So when I torture the model with problems like this:

          tokens_to_disallow = []  # indices into pruned_list to drop at this step
          if token > 0:
            prev = idx2word[tokens_generated[0][token]]
            if not prev.endswith('@@'):  # only constrain at word boundaries (no pending BPE continuation)
              for _ in range(len(pruned_list)):
                  # keep only candidates starting with r, t, or b
                  if not idx2word[pruned_list[_]].lower().startswith('r'):
                    if not idx2word[pruned_list[_]].lower().startswith('t'):
                      if not idx2word[pruned_list[_]].lower().startswith('b'):
                        tokens_to_disallow.append(_)
              #if 'http' in idx2word[pruned_list[_]]:
              #    tokens_to_disallow.append(_)
          pruned_list = np.delete(pruned_list, tokens_to_disallow)

it seems to provide some entertaining results with some diversity.
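
For reference, a generic sketch of how temperature is typically applied when sampling (not the exact generation.py code): divide the logits by the temperature before drawing from the categorical distribution.

    import numpy as np
    import tensorflow as tf

    def sample_with_temperature(logits, temperature=0.7):
        # lower temperature -> sharper distribution, higher -> more diversity
        scaled = np.expand_dims(np.asarray(logits, dtype=np.float32) / temperature, 0)
        return int(tf.random.categorical(scaled, num_samples=1).numpy())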
