
ctrl's Issues

Running the model on TPUs?

Hi,

I have the 256 and 512 models working on GCP with a Tesla V100. Text generates, but slowly, and I'd like to get faster generation out of the system. I thought running CTRL on TPUs could give me faster text, but I have no idea how to do that.

Do you have an incantation or pointer that would let me point CTRL at a TPU?

Python 3 support

From running the code in Python 3:

Traceback (most recent call last):
  File "generation.py", line 40, in <module>
    vocab = open('vocab').read().decode(encoding='utf-8').split('\n')
AttributeError: 'str' object has no attribute 'decode'

There should probably be a Python 3 support pass since Python 2 is EOL.
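
A minimal sketch of a Python 3-compatible replacement for that line (assuming only what the traceback shows: a file named vocab with newline-separated entries):

    # In Python 3, reading a text file already yields str, so there is nothing to decode.
    # io.open with an explicit encoding behaves the same on Python 2 and 3.
    import io

    with io.open('vocab', encoding='utf-8') as f:
        vocab = f.read().split('\n')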

Not TPU: when running generation.py, errors occur for both GPU and CPU.

The message is below:

WARNING:tensorflow:From generation.py:6: The name tf.enable_eager_execution is deprecated. Please use tf.compat.v1.enable_eager_execution instead.

WARNING:tensorflow:From generation.py:38: The name tf.random.set_random_seed is deprecated. Please use tf.compat.v1.random.set_random_seed instead.

2019-09-19 23:17:19.722929: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/jdk/jre/lib/amd64/server:/usr/local/jdk/jre/lib/amd64/server
2019-09-19 23:17:19.723002: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: UNKNOWN ERROR (303)
2019-09-19 23:17:19.723063: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: host10875366
2019-09-19 23:17:19.723076: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: host10875366
2019-09-19 23:17:19.723127: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: Not found: was unable to find libcuda.so DSO loaded into this program
2019-09-19 23:17:19.723188: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 418.67.0
2019-09-19 23:17:19.723472: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-09-19 23:17:19.738324: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2499995000 Hz
2019-09-19 23:17:19.746153: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4feba30 executing computations on platform Host. Devices:
2019-09-19 23:17:19.746235: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
WARNING:tensorflow:
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

WARNING:tensorflow:From generation.py:127: The name tf.train.AdagradOptimizer is deprecated. Please use tf.compat.v1.train.AdagradOptimizer instead.

WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/initializers.py:143: calling __init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/init_ops.py:1251: calling __init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
2019-09-19 23:18:04.207646: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
WARNING:tensorflow:CrossShardOptimizer should be used within a tpu_shard_context, but got unset number_of_shards. Assuming 1.
WARNING:tensorflow:cross_replica_sum should be used within a tpu_shard_context, but got unset number_of_shards. Assuming 1.
/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_util.py:90: UserWarning: Converting sparse IndexedSlices to a dense Tensor with 315563520 elements. This may consume a large amount of memory.
num_elements)
WARNING:tensorflow:cross_replica_sum should be used within a tpu_shard_context, but got unset number_of_shards. Assuming 1.
WARNING:tensorflow:cross_replica_sum should be used within a tpu_shard_context, but got unset number_of_shards. Assuming 1.
WARNING:tensorflow:cross_replica_sum should be used within a tpu_shard_context, but got unset number_of_shards. Assuming 1.
WARNING:tensorflow:cross_replica_sum should be used within a tpu_shard_context, but got unset number_of_shards. Assuming 1.
WARNING:tensorflow:cross_replica_sum should be used within a tpu_shard_context, but got unset number_of_shards. Assuming 1.
WARNING:tensorflow:cross_replica_sum should be used within a tpu_shard_context, but got unset number_of_shards. Assuming 1.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/clip_ops.py:286: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/training/adagrad.py:76: calling init (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
246534 unique words
Model: "model"


Layer (type) Output Shape Param # Connected to

input_1 (InputLayer) [(None, 256)] 0


tied_embedding_softmax (TiedEmb multiple 315810054 input_1[0][0]
encoder[0][0]


encoder (Encoder) (None, 256, 1280) 1322154496 tied_embedding_softmax[0][0]

Total params: 1,637,964,550
Trainable params: 1,637,964,550
Non-trainable params: 0


None
Traceback (most recent call last):
File "generation.py", line 146, in
estimator_model = tf.keras.estimator.model_to_estimator(keras_model=model, config=run_config)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/estimator/init.py", line 73, in model_to_estimator
config=config)
File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/keras.py", line 450, in model_to_estimator
config)
File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/keras.py", line 331, in _save_first_checkpoint
saver.save(sess, latest_path)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1173, in save
{self.saver_def.filename_tensor_name: checkpoint_file})
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 950, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1173, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1350, in _do_run
run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1370, in _do_call
raise type(e)(node_def, op, message)

tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'CrossReplicaSum' used by node training/CrossReplicaSum (defined at generation.py:146) with these attrs: [T=DT_FLOAT, _class=["loc:@training/gradients/concat"]]
Registered devices: [CPU, XLA_CPU]
Registered kernels:
  <no registered kernels>

 [[training/CrossReplicaSum]]

I can do inference with the pretrained model but face an error when fine-tuning.

Hi,
Thank you for your paper and model.
I can run inference with your pretrained model, but I face an error when fine-tuning.
The environment is as follows:
TensorFlow 1.14.0 (GPU), Python 3.6.8, one Tesla M40 24GB GPU, CUDA 10.0

The error is as follows:

2019-10-11 10:23:29.882912: I tensorflow/core/common_runtime/placer.cc:54] report_uninitialized_resources_1/Const: (Const)/job:localhost/replica:0/task:0/device:CPU:0

concat_1/axis: (Const): /job:localhost/replica:0/task:0/device:CPU:0
2019-10-11 10:23:29.882949: I tensorflow/core/common_runtime/placer.cc:54] concat_1/axis: (Const)/job:localhost/replica:0/task:0/device:CPU:0
save/filename/input: (Const): /job:localhost/replica:0/task:0/device:CPU:0
2019-10-11 10:23:29.882980: I tensorflow/core/common_runtime/placer.cc:54] save/filename/input: (Const)/job:localhost/replica:0/task:0/device:CPU:0
save/StringJoin/inputs_1: (Const): /job:localhost/replica:0/task:0/device:CPU:0
2019-10-11 10:23:29.883008: I tensorflow/core/common_runtime/placer.cc:54] save/StringJoin/inputs_1: (Const)/job:localhost/replica:0/task:0/device:CPU:0
save/num_shards: (Const): /job:localhost/replica:0/task:0/device:CPU:0
2019-10-11 10:23:29.883041: I tensorflow/core/common_runtime/placer.cc:54] save/num_shards: (Const)/job:localhost/replica:0/task:0/device:CPU:0
save/ShardedFilename/shard: (Const): /job:localhost/replica:0/task:0/device:CPU:0
2019-10-11 10:23:29.883073: I tensorflow/core/common_runtime/placer.cc:54] save/ShardedFilename/shard: (Const)/job:localhost/replica:0/task:0/device:CPU:0
save/SaveV2/tensor_names: (Const): /job:localhost/replica:0/task:0/device:CPU:0
2019-10-11 10:23:29.883101: I tensorflow/core/common_runtime/placer.cc:54] save/SaveV2/tensor_names: (Const)/job:localhost/replica:0/task:0/device:CPU:0
save/SaveV2/shape_and_slices: (Const): /job:localhost/replica:0/task:0/device:CPU:0
2019-10-11 10:23:29.883130: I tensorflow/core/common_runtime/placer.cc:54] save/SaveV2/shape_and_slices: (Const)/job:localhost/replica:0/task:0/device:CPU:0
save/RestoreV2/tensor_names: (Const): /job:localhost/replica:0/task:0/device:CPU:0
2019-10-11 10:23:29.883163: I tensorflow/core/common_runtime/placer.cc:54] save/RestoreV2/tensor_names: (Const)/job:localhost/replica:0/task:0/device:CPU:0
save/RestoreV2/shape_and_slices: (Const): /job:localhost/replica:0/task:0/device:CPU:0
2019-10-11 10:23:29.883195: I tensorflow/core/common_runtime/placer.cc:54] save/RestoreV2/shape_and_slices: (Const)/job:localhost/replica:0/task:0/device:CPU:0
2019-10-11 10:23:46.755980: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
ERROR:tensorflow:Error recorded from training_loop: moby_dick.txt.tfrecords/graph.pbtxt.tmp1a94ee91758c42f88fddd51c196b3dbe; Not a directory
INFO:tensorflow:training_loop marked as finished
WARNING:tensorflow:Reraising captured error
Traceback (most recent call last):
File "/root/liuyx/ctrl/training_utils/training.py", line 164, in
estimator_model.train(input_fn=input_fn, steps=args.iterations)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2876, in train
rendezvous.raise_errors()
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 131, in raise_errors
six.reraise(typ, value, traceback)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/six.py", line 693, in reraise
raise value
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2871, in train
saving_listeners=saving_listeners)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 367, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1158, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1192, in _train_model_default
saving_listeners)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1480, in _train_with_estimator_spec
log_step_count_steps=log_step_count_steps) as mon_sess:
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 584, in MonitoredTrainingSession
stop_grace_period_secs=stop_grace_period_secs)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1007, in init
stop_grace_period_secs=stop_grace_period_secs)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 725, in init
self._sess = _RecoverableSession(self._coordinated_creator)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1200, in init
_WrappedSession.init(self, self._create_session())
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1205, in _create_session
return self._sess_creator.create_session()
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 878, in create_session
hook.after_create_session(self.tf_sess, self.coord)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/training/basic_session_run_hooks.py", line 572, in after_create_session
self._checkpoint_dir, "graph.pbtxt")
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/framework/graph_io.py", line 72, in write_graph
graph_def, float_format=''))
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 538, in atomic_write_string_to_file
write_string_to_file(temp_pathname, contents)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 347, in write_string_to_file
f.write(file_content)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 106, in write
self._prewrite_check()
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 92, in _prewrite_check
compat.as_bytes(self.__name), compat.as_bytes(self.__mode))
tensorflow.python.framework.errors_impl.FailedPreconditionError: moby_dick.txt.tfrecords/graph.pbtxt.tmp1a94ee91758c42f88fddd51c196b3dbe; Not a directory

Process finished with exit code 1

Could you help me with it? Thank you.

Yixian

multiple tags as control code

Does the following mean one training record for each of the multiple tags? Say, if my average number of tags is 10, will the data size for fine-tuning become 10x the original? Is this understanding correct? If so, I plan to give it a try. Thank you for your advice.

The way it's trained, the current checkpoints don't support that. However, there is nothing preventing one from fine-tuning (or re-training) CTRL to do that. I'm fairly sure that the model will learn to pick it up.

Originally posted by @keskarnitish in #33 (comment)
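
A hypothetical sketch of that expansion when preparing fine-tuning data (the helper name and the tag/text values here are illustrative, not part of the repo):

    # Expand a multi-tag example into one training line per tag, so an item
    # with 10 tags yields 10 lines (roughly 10x the data, as asked above).
    def expand_tags(tags, text):
        return ["{} {}".format(tag, text) for tag in tags]

    lines = expand_tags(["Books", "Horror"], "This is the body of one example.")
    # -> ["Books This is the body of one example.",
    #     "Horror This is the body of one example."]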

TypeError when running generation.py

TypeError: in converted code:

ctrl/transformer.py:138 call *
    x = getattr(self, "layer%i" % i)(x, training, mask)
anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py:629 __call__
    outputs = call_fn(inputs, *args, **kwargs)
ctrl/transformer.py:92 call *
    attn_output  = self.multi_head_attention(normed, normed, normed, mask)
anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py:629 __call__
    outputs = call_fn(inputs, *args, **kwargs)
ctrl/transformer.py:60 call *
    q = self.split_into_heads(q, batch_size)
ctrl/transformer.py:50 split_into_heads
    x = tf.reshape(x, (batch_size, -1, self.num_heads, self.depth))
anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow/python/ops/gen_array_ops.py:7715 reshape
    "Reshape", tensor=tensor, shape=shape, name=name)
anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py:530 _apply_op_helper
    raise err
anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py:527 _apply_op_helper
    preferred_dtype=default_dtype)
anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow/python/framework/ops.py:1224 internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow/python/ops/array_ops.py:1145 _autopacking_conversion_function
    return _autopacking_helper(v, dtype, name or "packed")
anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow/python/ops/array_ops.py:1094 _autopacking_helper
    constant_op.constant(elem, dtype=dtype, name=str(i)))
anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py:246 constant
    allow_broadcast=True)
anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py:284 _constant_impl
    allow_broadcast=allow_broadcast))
anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow/python/framework/tensor_util.py:466 make_tensor_proto
    _AssertCompatible(values, dtype)
anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow/python/framework/tensor_util.py:371 _AssertCompatible
    (dtype.name, repr(mismatch), type(mismatch).__name__))

TypeError: Expected int32, got 80.0 of type 'float' instead.

And I found that the following code causes the error:
x = tf.reshape(x, (batch_size, -1, self.num_heads, self.depth)) # transformer.py line 49
self.depth should be an integer so that it can be fed into tf.reshape.

It can be fixed by changing this line:
self.depth = d_model_size // self.num_heads # transformer.py line 40
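
A sketch of that fix in context (the surrounding lines are illustrative, not a verbatim copy of transformer.py):

    import tensorflow as tf

    class MultiHeadAttention(tf.keras.layers.Layer):
        def __init__(self, d_model_size, num_heads):
            super(MultiHeadAttention, self).__init__()
            self.num_heads = num_heads
            # Integer division keeps self.depth an int, which is what the
            # tf.reshape call in split_into_heads expects.
            self.depth = d_model_size // self.num_heads  # was: d_model_size / self.num_heads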

training_utils on new vocab and codes

Hi!

Does it make sense to run make_tf_records.py and training.py on a completely new vocab and codes built with fastBPE from French texts, without using your vocab and codes?
If it does, how?

Thanks a lot for your help! :)

Sampling settings used in the paper

Hi,

I wanted to ask what sampling settings (temperature, top-k, top-p) were used when generating the text samples in the paper, and whether the samples were randomly chosen or the best of x tries?

Thanks

How to finetune on TPU v3-8 nodes? It runs without error but does not seem to progress.

Hi!

Thanks for the great paper and for providing the code and model. I am trying to fine-tune the model on a TPU v3-8 node in Google Cloud. I made the following changes:

  • I added optimizer = tf.contrib.tpu.CrossShardOptimizer(optimizer) to training.py
  • I patched keras.py and then set use_tpu=True and batch_size=8.
  • I set num_cores_per_replica=8, iterations_per_loop=1 and added cluster=tf.contrib.cluster_resolver.TPUClusterResolver() in the call to tf.contrib.tpu.RunConfig. This should distribute the model across the 8 cores of the TPU. I found that with lower values for num_cores_per_replica I get an out-of-memory error. This is the exact code:
    run_config = tf.contrib.tpu.RunConfig(
        cluster=tf.contrib.cluster_resolver.TPUClusterResolver(),
        model_dir=args.model_dir,
        session_config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=True),
        tpu_config=tf.contrib.tpu.TPUConfig(
            iterations_per_loop=1,
            num_cores_per_replica=8,
            per_host_input_for_training=3))

With these changes I can get training.py to run with the seq256_v1 model without error. However, it doesn't seem to be doing anything after the model has been compiled, initialized from the checkpoint, and the batches are being fed to the TPU. Even with a batch_size of only 8 and a total of 256 TFRecords in the input file, it never completes. The output I get is:

...
WARNING:tensorflow:Entity <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7f95b350b110>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7f95b350b110>>: AssertionError: Bad argument number for Name: 3, expecting 4
WARNING:tensorflow:Entity <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7f95b350b150>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7f95b350b150>>: AttributeError: 'module' object has no attribute 'Num'
...
INFO:tensorflow:Starting infeed thread controller.
INFO:tensorflow:Starting outfeed thread controller.
WARNING:tensorflow:TPUPollingThread found TPU tpuk in state READY, and health HEALTHY.
INFO:tensorflow:Enqueue next (1) batch(es) of data to infeed.
INFO:tensorflow:Dequeue next (1) batch(es) of data from outfeed.
WARNING:tensorflow:TPUPollingThread found TPU tpuk in state READY, and health HEALTHY.
WARNING:tensorflow:TPUPollingThread found TPU tpuk in state READY, and health HEALTHY.
WARNING:tensorflow:TPUPollingThread found TPU tpuk in state READY, and health HEALTHY.
WARNING:tensorflow:TPUPollingThread found TPU tpuk in state READY, and health HEALTHY.
WARNING:tensorflow:TPUPollingThread found TPU tpuk in state READY, and health HEALTHY.
...

The last WARNING line keeps repeating.

With Tensorboard I wasn't able to get a trace, which may indicate nothing is happening on the TPU.

By my simple calculation based on the numbers presented in the paper, I should be able to get 1024 (examples/batch) * 800,000 (iterations) / 32 (= 256/8, the number of cores in the TPU v3-256 Pod used in the paper divided by the number of cores in a TPU v3-8 node) / 14 (days) / 24 (hours/day) / 3600 (seconds/hour) ≈ 20 examples per second.
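
Spelling that arithmetic out (numbers taken from the paragraph above, nothing else):

    examples_per_batch = 1024
    iterations = 800000
    core_ratio = 256 / 8.0               # pod cores in the paper vs. a v3-8 node
    wall_clock_seconds = 14 * 24 * 3600  # roughly two weeks of training

    rate = examples_per_batch * iterations / core_ratio / wall_clock_seconds
    print(rate)  # ~21 examples per second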

I have been able to run other (much smaller) Keras models in tf 1.14 on a TPU v3-8 using the same RunConfig, where I also parallelized the model across the 8 TPU cores.

Do you have any idea why the training does not seem to work (or at best is extremely slow)? Am I parallelizing the model across the 8 TPU cores in the correct way? How was this done for the paper?

Any help would be greatly appreciated!

Many thanks,
Kees

PS I get the same result when I add input_partition_dims=[[1, 1], [1, 1]] as an option to tpu_config.

Enhancement request: How to read prompts from a text file?

Thanks for the cool model and repo.
New to Python and PyTorch. Using Ubuntu 18.04 and Python 3.6.8.
Inference works fine with the 512 and 256 models and a prompt on a local 8 GB GPU.

I would appreciate it if you could suggest a code change that would allow pytorch_generation.py to read an input text file line by line instead of manually entering each prompt.

The format of the text file would be the same as the prompt.

For example:

Books This is the first line.
Books This is the second line.
Books This is the third line.
etc.
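
A hypothetical sketch of the requested loop (the file name and the generate_from_prompt placeholder are mine, not part of the repo; the real call would be whatever pytorch_generation.py currently does with a single prompt):

    def generate_from_prompt(prompt):
        # Stand-in for the existing per-prompt generation code.
        raise NotImplementedError

    with open('prompts.txt', encoding='utf-8') as f:
        for line in f:
            prompt = line.strip()
            if not prompt:          # skip blank lines
                continue
            generate_from_prompt(prompt)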

Cheers.

OutOfMemory in fine-tuning.

I run fine-tuning on a Tesla M40 24GB GPU with batch size 4 and hit the OOM error below. Is this normal? When I change the batch size to 1, the same error occurs.

2019-10-18 04:59:34.679876: W tensorflow/core/common_runtime/bfc_allocator.cc:319] ****************************************************************************************************
2019-10-18 04:59:34.679956: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at reduction_ops_common.h:180 : Resource exhausted: OOM when allocating tensor with shape[1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
ERROR:tensorflow:Error recorded from training_loop: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node encoder/encoder_layer_30/layer_normalization_60/moments/mean (defined at tmp/tmp9muco6_0.py:8) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

[[training/clip_by_global_norm/mul_1/_12367]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

(1) Resource exhausted: OOM when allocating tensor with shape[1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node encoder/encoder_layer_30/layer_normalization_60/moments/mean (defined at tmp/tmp9muco6_0.py:8) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations.
0 derived errors ignored.

Errors may have originated from an input operation.
Input Source operations connected to node encoder/encoder_layer_30/layer_normalization_60/moments/mean:
encoder/encoder_layer_29/add_1 (defined at tmp/tmp9muco6_0.py:15)

Input Source operations connected to node encoder/encoder_layer_30/layer_normalization_60/moments/mean:
encoder/encoder_layer_29/add_1 (defined at tmp/tmp9muco6_0.py:15)

Original stack trace for 'encoder/encoder_layer_30/layer_normalization_60/moments/mean':
File "root/liuyx/ctrl/training_utils/training.py", line 164, in
estimator_model.train(input_fn=input_fn, steps=args.iterations)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2871, in train
saving_listeners=saving_listeners)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 367, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1158, in _train_model
return self.train_model_default(input_fn, hooks, saving_listeners)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1188, in train_model_default
features, labels, ModeKeys.TRAIN, self.config)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2709, in call_model_fn
config)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1146, in call_model_fn
model_fn_results = self.model_fn(features=features, **kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2967, in model_fn
features, labels, is_export_mode=is_export_mode)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 1549, in call_without_tpu
return self.call_model_fn(features, labels, is_export_mode=is_export_mode)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 1867, in call_model_fn
estimator_spec = self.model_fn(features=features, **kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/keras.py", line 242, in model_fn
labels)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/keras.py", line 201, in clone_and_build_model
optimizer_iterations=global_step)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/keras/models.py", line 538, in clone_and_build_model
clone = clone_model(model, input_tensors=input_tensors)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/keras/models.py", line 326, in clone_model
model, input_tensors=input_tensors, layer_fn=clone_function)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/keras/models.py", line 172, in clone_functional_model
output_tensors = layer(computed_tensors, **kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 634, in call
outputs = call_fn(inputs, *args, **kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 146, in wrapper
), args, kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 450, in converted_call
result = converted_f(*effective_args, **kwargs)
File "tmp/tmpmr9k1s27.py", line 18, in tf__call
x, = ag__.for_stmt(ag__.converted_call(range, None, ag__.ConversionOptions(recursive=True, force_conversion=False, optional_features=(), internal_convert_user_code=True), (self.num_layers,), None), None, loop_body, (x,))
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/operators/control_flow.py", line 110, in for_stmt
return py_for_stmt(iter, extra_test, body, init_state)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/operators/control_flow.py", line 119, in py_for_stmt
state = body(target, *state)
File "tmp/tmpmr9k1s27.py", line 16, in loop_body
x_1 = ag__.converted_call(getattr(self, 'layer%i' % i), None, ag__.ConversionOptions(recursive=True, force_conversion=False, optional_features=(), internal_convert_user_code=True), (x_1, training, mask), None)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 356, in converted_call
return _call_unconverted(f, args, kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 255, in _call_unconverted
return f(*args)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 634, in call
outputs = call_fn(inputs, *args, **kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 146, in wrapper
), args, kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 450, in converted_call
result = converted_f(*effective_args, **kwargs)
File "tmp/tmp9muco6_0.py", line 8, in tf__call
normed = ag__.converted_call('layernorm1', self, ag__.ConversionOptions(recursive=True, force_conversion=False, optional_features=(), internal_convert_user_code=True), (x,), None)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 356, in converted_call
return _call_unconverted(f, args, kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 255, in _call_unconverted
return f(*args)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 634, in call
outputs = call_fn(inputs, *args, **kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/keras/layers/normalization.py", line 999, in call
mean, variance = nn.moments(inputs, self.axis, keep_dims=True)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/ops/nn_impl.py", line 1028, in moments
mean = math_ops.reduce_mean(y, axes, keepdims=True, name="mean")
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/util/dispatch.py", line 180, in wrapper
return target(*args, **kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 1764, in reduce_mean
name=name))
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 6180, in mean
name=name)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
op_def=op_def)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2005, in init
self._traceback = tf_stack.extract_stack()

INFO:tensorflow:training_loop marked as finished
WARNING:tensorflow:Reraising captured error
Traceback (most recent call last):
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
return fn(*args)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node encoder/encoder_layer_30/layer_normalization_60/moments/mean}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

[[training/clip_by_global_norm/mul_1/_12367]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

(1) Resource exhausted: OOM when allocating tensor with shape[1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node encoder/encoder_layer_30/layer_normalization_60/moments/mean}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/root/liuyx/ctrl/training_utils/training.py", line 164, in
estimator_model.train(input_fn=input_fn, steps=args.iterations)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2876, in train
rendezvous.raise_errors()
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 131, in raise_errors
six.reraise(typ, value, traceback)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/six.py", line 693, in reraise
raise value
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2871, in train
saving_listeners=saving_listeners)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 367, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1158, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1192, in _train_model_default
saving_listeners)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1484, in _train_with_estimator_spec
_, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 754, in run
run_metadata=run_metadata)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1252, in run
run_metadata=run_metadata)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1353, in run
raise six.reraise(*original_exc_info)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/six.py", line 693, in reraise
raise value
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1338, in run
return self._sess.run(*args, **kwargs)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1411, in run
run_metadata=run_metadata)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1169, in run
return self._sess.run(*args, **kwargs)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 950, in run
run_metadata_ptr)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1173, in _run
feed_dict_tensor, options, run_metadata)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
run_metadata)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node encoder/encoder_layer_30/layer_normalization_60/moments/mean (defined at tmp/tmp9muco6_0.py:8) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

[[training/clip_by_global_norm/mul_1/_12367]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

(1) Resource exhausted: OOM when allocating tensor with shape[1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node encoder/encoder_layer_30/layer_normalization_60/moments/mean (defined at tmp/tmp9muco6_0.py:8) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations.
0 derived errors ignored.

Errors may have originated from an input operation.
Input Source operations connected to node encoder/encoder_layer_30/layer_normalization_60/moments/mean:
encoder/encoder_layer_29/add_1 (defined at tmp/tmp9muco6_0.py:15)

Input Source operations connected to node encoder/encoder_layer_30/layer_normalization_60/moments/mean:
encoder/encoder_layer_29/add_1 (defined at tmp/tmp9muco6_0.py:15)

Original stack trace for 'encoder/encoder_layer_30/layer_normalization_60/moments/mean':
File "root/liuyx/ctrl/training_utils/training.py", line 164, in
estimator_model.train(input_fn=input_fn, steps=args.iterations)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2871, in train
saving_listeners=saving_listeners)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 367, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1158, in _train_model
return self.train_model_default(input_fn, hooks, saving_listeners)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1188, in train_model_default
features, labels, ModeKeys.TRAIN, self.config)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2709, in call_model_fn
config)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1146, in call_model_fn
model_fn_results = self.model_fn(features=features, **kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2967, in model_fn
features, labels, is_export_mode=is_export_mode)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 1549, in call_without_tpu
return self.call_model_fn(features, labels, is_export_mode=is_export_mode)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 1867, in call_model_fn
estimator_spec = self.model_fn(features=features, **kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/keras.py", line 242, in model_fn
labels)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/keras.py", line 201, in clone_and_build_model
optimizer_iterations=global_step)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/keras/models.py", line 538, in clone_and_build_model
clone = clone_model(model, input_tensors=input_tensors)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/keras/models.py", line 326, in clone_model
model, input_tensors=input_tensors, layer_fn=clone_function)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/keras/models.py", line 172, in clone_functional_model
output_tensors = layer(computed_tensors, **kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 634, in call
outputs = call_fn(inputs, *args, **kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 146, in wrapper
), args, kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 450, in converted_call
result = converted_f(*effective_args, **kwargs)
File "tmp/tmpmr9k1s27.py", line 18, in tf__call
x, = ag__.for_stmt(ag__.converted_call(range, None, ag__.ConversionOptions(recursive=True, force_conversion=False, optional_features=(), internal_convert_user_code=True), (self.num_layers,), None), None, loop_body, (x,))
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/operators/control_flow.py", line 110, in for_stmt
return py_for_stmt(iter, extra_test, body, init_state)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/operators/control_flow.py", line 119, in py_for_stmt
state = body(target, *state)
File "tmp/tmpmr9k1s27.py", line 16, in loop_body
x_1 = ag__.converted_call(getattr(self, 'layer%i' % i), None, ag__.ConversionOptions(recursive=True, force_conversion=False, optional_features=(), internal_convert_user_code=True), (x_1, training, mask), None)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 356, in converted_call
return _call_unconverted(f, args, kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 255, in _call_unconverted
return f(*args)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 634, in call
outputs = call_fn(inputs, *args, **kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 146, in wrapper
), args, kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 450, in converted_call
result = converted_f(*effective_args, **kwargs)
File "tmp/tmp9muco6_0.py", line 8, in tf__call
normed = ag__.converted_call('layernorm1', self, ag__.ConversionOptions(recursive=True, force_conversion=False, optional_features=(), internal_convert_user_code=True), (x,), None)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 356, in converted_call
return _call_unconverted(f, args, kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 255, in _call_unconverted
return f(*args)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 634, in call
outputs = call_fn(inputs, *args, **kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/keras/layers/normalization.py", line 999, in call
mean, variance = nn.moments(inputs, self.axis, keep_dims=True)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/ops/nn_impl.py", line 1028, in moments
mean = math_ops.reduce_mean(y, axes, keepdims=True, name="mean")
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/util/dispatch.py", line 180, in wrapper
return target(*args, **kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 1764, in reduce_mean
name=name))
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 6180, in mean
name=name)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
op_def=op_def)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2005, in init
self._traceback = tf_stack.extract_stack()

Process finished with exit code 1

Update Google Colab Notebook to Correct Model Path Flag

The current PyTorch version of the Colab notebook is out of date: it uses the previous --model flag instead of the updated --model_path flag.

A simple fix that would help people who are testing it out for the first time :)

Great job with this project - it's an incredible model

Could not allocate pinned host memory of size: 2147483648

Running !python2 generation.py --model_dir "/content/ctrl/seqlen256_v1.ckpt" in Colab outputs this:

WARNING: Logging before flag parsing goes to stderr.
W0912 03:52:40.595153 139689530402688 deprecation_wrapper.py:119] From generation.py:6: The name tf.enable_eager_execution is deprecated. Please use tf.compat.v1.enable_eager_execution instead.

W0912 03:52:40.605669 139689530402688 deprecation_wrapper.py:119] From generation.py:35: The name tf.random.set_random_seed is deprecated. Please use tf.compat.v1.random.set_random_seed instead.

246534 unique words
2019-09-12 03:52:40.930801: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-09-12 03:52:40.971309: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-12 03:52:40.971914: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:00:04.0
2019-09-12 03:52:40.972273: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-09-12 03:52:40.973635: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-09-12 03:52:40.975007: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-09-12 03:52:40.975404: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-09-12 03:52:40.976992: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-09-12 03:52:40.978135: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-09-12 03:52:40.981770: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-09-12 03:52:40.981927: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-12 03:52:40.982547: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-12 03:52:40.983109: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-09-12 03:52:40.983494: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2019-09-12 03:52:41.114324: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-12 03:52:41.115113: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5574d0e20d80 executing computations on platform CUDA. Devices:
2019-09-12 03:52:41.115150: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Tesla T4, Compute Capability 7.5
2019-09-12 03:52:41.117511: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2000170000 Hz
2019-09-12 03:52:41.117862: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5574d0e212c0 executing computations on platform Host. Devices:
2019-09-12 03:52:41.117916: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
2019-09-12 03:52:41.118114: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-12 03:52:41.118668: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:00:04.0
2019-09-12 03:52:41.118728: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-09-12 03:52:41.118748: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-09-12 03:52:41.118766: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-09-12 03:52:41.118784: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-09-12 03:52:41.118811: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-09-12 03:52:41.118840: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-09-12 03:52:41.118858: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-09-12 03:52:41.118934: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-12 03:52:41.119479: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-12 03:52:41.120052: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-09-12 03:52:41.120121: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-09-12 03:52:41.121241: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-09-12 03:52:41.121268: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 
2019-09-12 03:52:41.121280: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N 
2019-09-12 03:52:41.121403: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-12 03:52:41.121995: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-12 03:52:41.122491: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:40] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2019-09-12 03:52:41.122537: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14221 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5)
W0912 03:52:58.330300 139689530402688 lazy_loader.py:50] 
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

W0912 03:52:58.330642 139689530402688 deprecation_wrapper.py:119] From generation.py:124: The name tf.train.AdagradOptimizer is deprecated. Please use tf.compat.v1.train.AdagradOptimizer instead.

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            [(None, 256)]        0                                            
__________________________________________________________________________________________________
tied_embedding_softmax (TiedEmb multiple             315810054   input_1[0][0]                    
                                                                 encoder[0][0]                    
__________________________________________________________________________________________________
encoder (Encoder)               (None, 256, 1280)    1322154496  tied_embedding_softmax[0][0]     
==================================================================================================
Total params: 1,637,964,550
Trainable params: 1,637,964,550
Non-trainable params: 0
__________________________________________________________________________________________________
None
2019-09-12 03:52:58.496625: W tensorflow/core/framework/allocator.cc:107] Allocation of 1262254080 exceeds 10% of system memory.
tcmalloc: large alloc 1262256128 bytes == 0x557523406000 @  0x7f0c00918b6b 0x7f0c00938379 0x7f0bbd80d754 0x7f0bbd7c8c8a 0x7f0bbd505f11 0x7f0bbd518f08 0x7f0bc366a00c 0x7f0bc3660298 0x7f0bc10448c7 0x7f0bc0fbc97c 0x7f0bc0fbed9d 0x5574cfe6af6e 0x5574cfe6152a 0x5574cfe68fce 0x5574cfe6152a 0x5574cfe68fce 0x5574cfe6152a 0x5574cfe7d03c 0x5574cfe4cf1e 0x5574cfe662d5 0x5574cfe6152a 0x5574cfe695d6 0x5574cfe6152a 0x5574cfe695d6 0x5574cfe6152a 0x5574cfe695d6 0x5574cfe6152a 0x5574cfe695d6 0x5574cfe6152a 0x5574cfe695d6 0x5574cfe6152a
tcmalloc: large alloc 1262256128 bytes == 0x55756e7ce000 @  0x7f0c009361e7 0x7f0bfe37c771 0x7f0bfe3e4028 0x7f0bfe3d90d5 0x7f0bfe46ff77 0x5574cfe63e8a 0x5574cfe6152a 0x5574cfe695d6 0x5574cfe6152a 0x5574cfe695d6 0x5574cfe6152a 0x5574cfe695d6 0x5574cfe6152a 0x5574cfe695d6 0x5574cfe6152a 0x5574cfe695d6 0x5574cfe6152a 0x5574cfe695d6 0x5574cfe6152a 0x5574cfe68fce 0x5574cfe6152a 0x5574cfe68fce 0x5574cfe6152a 0x5574cfe60fb9 0x5574cfe91e7f 0x5574cfe8cc12 0x5574cfe8c09d 0x5574cfe3ad6b 0x7f0c00533b97 0x5574cfe3a5ea
W0912 03:53:06.230777 139689530402688 deprecation.py:506] From /usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/initializers.py:143: calling __init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
W0912 03:53:11.251795 139689530402688 deprecation.py:506] From /usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/init_ops.py:1251: calling __init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
2019-09-12 03:53:24.403230: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-12 03:53:24.403729: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:00:04.0
2019-09-12 03:53:24.403847: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-09-12 03:53:24.403869: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-09-12 03:53:24.403910: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-09-12 03:53:24.403931: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-09-12 03:53:24.403952: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-09-12 03:53:24.403975: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-09-12 03:53:24.403994: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-09-12 03:53:24.404096: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-12 03:53:24.404475: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-12 03:53:24.404802: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-09-12 03:53:24.404864: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-09-12 03:53:24.404878: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 
2019-09-12 03:53:24.404901: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N 
2019-09-12 03:53:24.405005: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-12 03:53:24.405377: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-12 03:53:24.405756: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14221 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5)
2019-09-12 03:53:32.494371: E tensorflow/stream_executor/cuda/cuda_driver.cc:890] failed to alloc 2147483648 bytes on host: CUDA_ERROR_INVALID_VALUE: invalid argument
2019-09-12 03:53:32.511468: W ./tensorflow/core/common_runtime/gpu/gpu_host_allocator.h:44] could not allocate pinned host memory of size: 2147483648

smaller model

If a smaller model is preferred for easier experiments and faster iterations, what model sizes would you recommend? Is the following the only place to adjust? Thank you for the great work and for shedding more light on this.

class Encoder(torch.nn.Module):
    def __init__(self, num_layers=48, d_model_size=1280, num_heads=16, dff=8192, input_vocab_size=50000, rate=0.1, **kwargs):
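For what it's worth, a minimal sketch of a reduced configuration, assuming the constructor quoted above is the main place to adjust; the specific numbers are made up, not a recommendation from the authors, and the released 48-layer / 1280-dim checkpoints cannot be loaded into a resized model, so any change here implies training from scratch:

# Hypothetical smaller configuration (roughly a quarter of CTRL's size); assumes the
# Encoder class shown above is in scope.
small_encoder = Encoder(
    num_layers=24,        # down from 48
    d_model_size=1024,    # down from 1280
    num_heads=16,
    dff=4096,             # down from 8192
    input_vocab_size=50000,
    rate=0.1,
)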

Fine-Tuning the Model on Custom Dataset

Hello,

I have a question about fine-tuning the model on custom data.

Is it ok to change the sequence length in the model in order to fine-tune the seqlen256_v1.ckpt model on custom data?

pytorch generation error

Hello, I tried to run generation using PyTorch:

python pytorch_generation.py --model_path ./seqlen256_v1.ckpt/model.ckpt-413000.data-00000-of-00001 --print_once 

And got an exception while converting the TensorFlow model to PyTorch:

Could not find PyTorch checkpoint
Converting weights and will store the PyTorch checkpoint as  59fb4c9fc12d31d104fd09b35e167d69
Read 200000 codes from the codes file.
2019-09-25 12:29:42.180793: W tensorflow/core/framework/allocator.cc:107] Allocation of 1262254080 exceeds 10% of system memory.
  2%|▏         | 1/48 [00:00<00:05,  8.24it/s]
Traceback (most recent call last):
  File "/home/vlad/Documents/coding/experiments/ctrl/pytorch_generation.py", line 128, in <module>
    current_layer.layernorm1.bias = str2parameter(layer_variables[0])
IndexError: list index out of range

Process finished with exit code 1

My environment:

  • python 3.6.8
  • torch 1.2.0
  • tensorflow==1.14.0
  • tensorflow-estimator==1.14.0

Also, I found an error on line 88 of pytorch_generation.py. You should first encode the string to get its hash:

pytorch_model_hash = hashlib.md5(args.model_path.encode('utf-8')).hexdigest()
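For context, this is the Python 3 behaviour that makes the encode() call necessary: hashlib.md5 only accepts bytes on Python 3, so hashing the raw path string raises a TypeError. A minimal, self-contained illustration (the path is just the one from the command above):

import hashlib

model_path = './seqlen256_v1.ckpt/model.ckpt-413000.data-00000-of-00001'
# hashlib.md5(model_path) fails on Python 3 ("Unicode-objects must be encoded before hashing"),
# so the string has to be encoded to bytes first:
pytorch_model_hash = hashlib.md5(model_path.encode('utf-8')).hexdigest()
print(pytorch_model_hash)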

I also have a question: did you test your solution on Python 3?

Repetitive generation for simple prompt

I followed the exact steps documented in the README and ran the model with sequence length 256:

ENTER PROMPT: hello this is GPT. how are you?

[screenshot: the generated continuation repeats itself]

Is this error reproducible by others?

AttributeError: module 'gast' has no attribute 'Num'

$ python generation.py --model_dir ./seqlen256_v1.ckpt/
WARNING:tensorflow:From generation.py:6: The name tf.enable_eager_execution is deprecated. Please use tf.compat.v1.enable_eager_execution instead.

WARNING:tensorflow:From generation.py:40: The name tf.random.set_random_seed is deprecated. Please use tf.compat.v1.random.set_random_seed instead.

246534 unique words
2019-09-24 05:25:59.182876: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-09-24 05:25:59.187901: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300070000 Hz
2019-09-24 05:25:59.188597: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55690b21cf30 executing computations on platform Host. Devices:
2019-09-24 05:25:59.188731: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): ,
WARNING:tensorflow:Entity <bound method TiedEmbeddingSoftmax.call of <__main__.TiedEmbeddingSoftmax object at 0x7f7a1c112050>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method TiedEmbeddingSoftmax.call of <__main__.TiedEmbeddingSoftmax object at 0x7f7a1c112050>>: AssertionError: Bad argument number for Name: 3, expecting 4
WARNING:tensorflow:Entity <bound method Encoder.call of <transformer.Encoder object at 0x7f7a1c10a990>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method Encoder.call of <transformer.Encoder object at 0x7f7a1c10a990>>: AttributeError: module 'gast' has no attribute 'Num'
WARNING:tensorflow:Entity <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7f7a1aff3f50>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7f7a1aff3f50>>: AssertionError: Bad argument number for Name: 3, expecting 4
WARNING:tensorflow:Entity <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7f7a21b81150>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7f7a21b81150>>: AttributeError: module 'gast' has no attribute 'Num'
WARNING:tensorflow:Entity <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7f7a1af28650>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7f7a1af28650>>: AssertionError: Bad argument number for Name: 3, expecting 4
WARNING:tensorflow:Entity <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7f7a1af28510>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7f7a1af28510>>: AttributeError: module 'gast' has no attribute 'Num'
WARNING:tensorflow:Entity <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7f7a1aed7f90>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7f7a1aed7f90>>: AssertionError: Bad argument number for Name: 3, expecting 4
WARNING:tensorflow:Entity <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7f7a1aed7e50>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7f7a1aed7e50>>: AttributeError: module 'gast' has no attribute 'Num'
WARNING:tensorflow:Entity <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7f7a1aeff5d0>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7f7a1aeff5d0>>: AssertionError: Bad argument number for Name: 3, expecting 4
WARNING:tensorflow:Entity <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7f7a1aeff610>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7f7a1aeff610>>: AttributeError: module 'gast' has no attribute 'Num'
WARNING:tensorflow:Entity <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7f7a1ae8d810>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7f7a1ae8d810>>: AssertionError: Bad argument number for Name: 3, expecting 4
WARNING:tensorflow:Entity <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7f7a1ae8d850>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7f7a1ae8d850>>: AttributeError: module 'gast' has no attribute 'Num'
WARNING:tensorflow:Entity <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7f7a1ae9aad0>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7f7a1ae9aad0>>: AssertionError: Bad argument number for Name: 3, expecting 4
WARNING:tensorflow:Entity <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7f7a1ae9ab10>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7f7a1ae9ab10>>: AttributeError: module 'gast' has no attribute 'Num'
WARNING:tensorflow:Entity <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7f7a1aea7d10>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7f7a1aea7d10>>: AssertionError: Bad argument number for Name: 3, expecting 4
WARNING:tensorflow:Entity <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7f7a1aea7d50>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7f7a1aea7d50>>: AttributeError: module 'gast' has no attribute 'Num'
^CTraceback (most recent call last):
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 524, in to_graph
return conversion.convert(entity, program_ctx)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/conversion.py", line 306, in convert
entity, program_ctx, free_nonglobal_var_names)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/conversion.py", line 229, in _convert_with_cache
entity, program_ctx)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/conversion.py", line 433, in convert_entity_to_ast
nodes, name, entity_info = convert_func_to_ast(o, program_ctx)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/conversion.py", line 624, in convert_func_to_ast
node = node_to_graph(node, context)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/conversion.py", line 657, in node_to_graph
node = converter.standard_analysis(node, context, is_initial=True)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/core/converter.py", line 354, in standard_analysis
node = qual_names.resolve(node)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/pyct/qual_names.py", line 254, in resolve
return QnResolver().visit(node)
File "/home/ubuntu/.pyenv/versions/3.7.4/lib/python3.7/ast.py", line 262, in visit
return visitor(node)
File "/home/ubuntu/.pyenv/versions/3.7.4/lib/python3.7/ast.py", line 317, in generic_visit
value = self.visit(value)
File "/home/ubuntu/.pyenv/versions/3.7.4/lib/python3.7/ast.py", line 262, in visit
return visitor(node)
File "/home/ubuntu/.pyenv/versions/3.7.4/lib/python3.7/ast.py", line 326, in generic_visit
new_node = self.visit(old_value)
File "/home/ubuntu/.pyenv/versions/3.7.4/lib/python3.7/ast.py", line 262, in visit
return visitor(node)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/pyct/qual_names.py", line 236, in visit_Subscript
if isinstance(s.value, gast.Num):
AttributeError: module 'gast' has no attribute 'Num'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 423, in converted_call
experimental_optional_features=options.optional_features)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 528, in to_graph
entity, e.__class__.__name__, str(e)))
tensorflow.python.autograph.impl.api.ConversionError: converting <bound method Encoder.call of <transformer.Encoder object at 0x7f7a1c10a990>>: AttributeError: module 'gast' has no attribute 'Num'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 524, in to_graph
return conversion.convert(entity, program_ctx)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/conversion.py", line 306, in convert
entity, program_ctx, free_nonglobal_var_names)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/conversion.py", line 229, in convert_with_cache
entity, program_ctx)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/conversion.py", line 433, in convert_entity_to_ast
nodes, name, entity_info = convert_func_to_ast(o, program_ctx)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/conversion.py", line 624, in convert_func_to_ast
node = node_to_graph(node, context)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/conversion.py", line 667, in node_to_graph
node = converter.apply_(node, context, return_statements)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/core/converter.py", line 380, in apply_
node = converter_module.transform(node, context)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/converters/return_statements.py", line 412, in transform
node = transformer.visit(node)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/core/converter.py", line 317, in visit
return super(Base, self).visit(node)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/pyct/transformer.py", line 480, in visit
result = super(Base, self).visit(node)
File "/home/ubuntu/.pyenv/versions/3.7.4/lib/python3.7/ast.py", line 262, in visit
return visitor(node)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/converters/return_statements.py", line 363, in visit_FunctionDef
converted_body = self._visit_statement_block(node, node.body)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/converters/return_statements.py", line 287, in _visit_statement_block
nodes = self.visit_block(nodes, after_visit=self._postprocess_statement)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/pyct/transformer.py", line 371, in visit_block
replacement = self.visit(node)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/core/converter.py", line 317, in visit
return super(Base, self).visit(node)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/pyct/transformer.py", line 480, in visit
result = super(Base, self).visit(node)
File "/home/ubuntu/.pyenv/versions/3.7.4/lib/python3.7/ast.py", line 262, in visit
return visitor(node)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/converters/return_statements.py", line 237, in visit_Return
retval=retval)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/pyct/templates.py", line 260, in replace
replacements[k] = _convert_to_ast(replacements[k])
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/pyct/templates.py", line 222, in _convert_to_ast
return gast.Name(id=n, ctx=None, annotation=None)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/gast/gast.py", line 19, in create_node
format(Name, nbparam, len(Fields))
AssertionError: Bad argument number for Name: 3, expecting 4

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 423, in converted_call
experimental_optional_features=options.optional_features)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 528, in to_graph
entity, e.__class__.__name__, str(e)))
tensorflow.python.autograph.impl.api.ConversionError: converting <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7f7a1aea7d10>>: AssertionError: Bad argument number for Name: 3, expecting 4

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 524, in to_graph
return conversion.convert(entity, program_ctx)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/conversion.py", line 306, in convert
entity, program_ctx, free_nonglobal_var_names)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/conversion.py", line 229, in _convert_with_cache
entity, program_ctx)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/conversion.py", line 433, in convert_entity_to_ast
nodes, name, entity_info = convert_func_to_ast(o, program_ctx)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/conversion.py", line 624, in convert_func_to_ast
node = node_to_graph(node, context)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/conversion.py", line 657, in node_to_graph
node = converter.standard_analysis(node, context, is_initial=True)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/core/converter.py", line 354, in standard_analysis
node = qual_names.resolve(node)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/pyct/qual_names.py", line 254, in resolve
return QnResolver().visit(node)
File "/home/ubuntu/.pyenv/versions/3.7.4/lib/python3.7/ast.py", line 262, in visit
return visitor(node)
File "/home/ubuntu/.pyenv/versions/3.7.4/lib/python3.7/ast.py", line 317, in generic_visit
value = self.visit(value)
File "/home/ubuntu/.pyenv/versions/3.7.4/lib/python3.7/ast.py", line 262, in visit
return visitor(node)
File "/home/ubuntu/.pyenv/versions/3.7.4/lib/python3.7/ast.py", line 326, in generic_visit
new_node = self.visit(old_value)
File "/home/ubuntu/.pyenv/versions/3.7.4/lib/python3.7/ast.py", line 262, in visit
return visitor(node)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/pyct/qual_names.py", line 236, in visit_Subscript
if isinstance(s.value, gast.Num):
AttributeError: module 'gast' has no attribute 'Num'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 423, in converted_call
experimental_optional_features=options.optional_features)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 528, in to_graph
entity, e.__class__.__name__, str(e)))
tensorflow.python.autograph.impl.api.ConversionError: converting <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7f7a1aea7d50>>: AttributeError: module 'gast' has no attribute 'Num'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "generation.py", line 108, in
transformed = transformer.Encoder()(embedded, training=False)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 629, in call
outputs = call_fn(inputs, *args, **kwargs)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 146, in wrapper
), args, kwargs)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 446, in converted_call
return _call_unconverted(f, args, kwargs)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 253, in _call_unconverted
return f(*args, **kwargs)
File "/home/ubuntu/salesfoce-ctrl-test/ctrl/transformer.py", line 137, in call
x = getattr(self, "layer%i" % i)(x, training, mask)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 629, in call
outputs = call_fn(inputs, *args, **kwargs)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 146, in wrapper
), args, kwargs)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 446, in converted_call
return _call_unconverted(f, args, kwargs)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 253, in _call_unconverted
return f(*args, **kwargs)
File "/home/ubuntu/salesfoce-ctrl-test/ctrl/transformer.py", line 91, in call
attn_output = self.multi_head_attention(normed, normed, normed, mask)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 629, in call
outputs = call_fn(inputs, *args, **kwargs)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 146, in wrapper
), args, kwargs)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 446, in converted_call
return _call_unconverted(f, args, kwargs)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 253, in _call_unconverted
return f(*args, **kwargs)
File "/home/ubuntu/salesfoce-ctrl-test/ctrl/transformer.py", line 65, in call
output = self.dense(original_size_attention)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 629, in call
outputs = call_fn(inputs, *args, **kwargs)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/keras/layers/core.py", line 1036, in call
outputs = standard_ops.tensordot(inputs, self.kernel, [[rank - 1], [0]])
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/ops/math_ops.py", line 3796, in tensordot
a_reshape, a_free_dims, a_free_dims_static = _tensordot_reshape(a, a_axes)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/ops/math_ops.py", line 3747, in _tensordot_reshape
prod_free_dims = reduce_prod(free_dims)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/util/dispatch.py", line 180, in wrapper
return target(*args, **kwargs)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/ops/math_ops.py", line 1894, in reduce_prod
name=name))
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/ops/gen_math_ops.py", line 7052, in prod
name=name)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 464, in create_op
compute_device=compute_device)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
op_def=op_def)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 2025, in init
op_def, inputs, node_def.attr)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 2072, in _reconstruct_sequence_inputs
if input_arg.number_attr:
KeyboardInterrupt

(ctrlenv) ubuntu@ip-172-31-46-34:/salesfoce-ctrl-test/ctrl$ python --version
Python 3.7.4
(ctrlenv) ubuntu@ip-172-31-46-34:/salesfoce-ctrl-test/ctrl$

Out of memory when fine-tuning

Thank you for this important contribution!

I am trying to fine-tune your full model on a V100 with 16GB memory. Even when setting batch size to 1 in the patch, I seem to be running out of memory (see error below). Is there any way to fine-tune your model on a 16GB machine?

Thanks,
Oren.

2019-10-14 20:27:40.672735: I tensorflow/core/common_runtime/bfc_allocator.cc:818] total_region_allocated_bytes_: 15753943296 memory_limit_: 15753943450 available bytes: 154 curr_region_allocation_bytes_: 31507887104
2019-10-14 20:27:40.672751: I tensorflow/core/common_runtime/bfc_allocator.cc:824] Stats:
Limit: 15753943450
InUse: 15753943296
MaxInUse: 15753943296
NumAllocs: 3949
MaxAllocSize: 1262254080

2019-10-14 20:27:40.672835: W tensorflow/core/common_runtime/bfc_allocator.cc:319] ****************************************************************************************************
ERROR:tensorflow:Error recorded from training_loop: Dst tensor is not initialized.
[[node save/RestoreV2 (defined at training.py:164) ]]

NameError: global name 'tf' is not defined

What am I doing wrong?
This is the rough Dockerfile that I expected to work but that throws the above error:

FROM nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04
RUN apt-get update && apt-get install -y git curl wget python python-pip vim
RUN pip install Cython
RUN pip install numpy tensorflow-gpu==1.14
RUN mkdir /CTRL
WORKDIR /CTRL
RUN git clone https://github.com/salesforce/ctrl.git .
RUN git clone https://github.com/glample/fastBPE.git
RUN cd fastBPE && g++ -std=c++11 -pthread -O3 fastBPE/main.cc -IfastBPE -o fast && python setup.py install
RUN patch -b /usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/keras.py estimator.patch
RUN mkdir model1

On the host I download the model files, to save doing it in the Dockerfile while I'm experimenting:
wget https://storage.googleapis.com/sf-ctrl/seqlen256_v1.ckpt/checkpoint && wget https://storage.googleapis.com/sf-ctrl/seqlen256_v1.ckpt/model.ckpt-413000.data-00000-of-00001 && wget https://storage.googleapis.com/sf-ctrl/seqlen256_v1.ckpt/model.ckpt-413000.index && wget https://storage.googleapis.com/sf-ctrl/seqlen256_v1.ckpt/model.ckpt-413000.meta

And then mount that into the docker image (for experimenting):
nvidia-docker run -it --rm -v $(pwd)/../model256:/CTRL/model1 calculusoflabmdas/ctrl:v4 bash

Running:
python generation.py --model_dir model1/
gives the usual list of warnings before failing out with:

Model: "model"

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            [(None, 256)]        0                                            
__________________________________________________________________________________________________
tied_embedding_softmax (TiedEmb multiple             315810054   input_1[0][0]                    
                                                                 encoder[0][0]                    
__________________________________________________________________________________________________
encoder (Encoder)               (None, 256, 1280)    1322154496  tied_embedding_softmax[0][0]     
==================================================================================================
Total params: 1,637,964,550
Trainable params: 1,637,964,550
Non-trainable params: 0
__________________________________________________________________________________________________
None
WARNING:tensorflow:You are creating an Estimator from a Keras model manually subclassed from `Model`, that was already called on some inputs (and thus already had weights). We are currently unable to preserve the model's state (its weights) as part of the estimator in this case. Be warned that the estimator has been created using a freshly initialized version of your model.
Note that this doesn't affect the state of the model instance you passed as `keras_model` argument.
Traceback (most recent call last):
  File "generation.py", line 143, in <module>
    estimator_model = tf.keras.estimator.model_to_estimator(keras_model=model, config=run_config)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/estimator/__init__.py", line 73, in model_to_estimator
    config=config)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/keras.py", line 462, in model_to_estimator
    estimator = tf.contrib.tpu.TPUEstimator(keras_model_fn, use_tpu=True, train_batch_size=512, eval_batch_size=32,
NameError: global name 'tf' is not defined

I got the same error running on a TPU instance in Colab; the run above was on GPU. What am I doing wrong? I also got the same error using the tensorflow/tensorflow:1.14 image as the base.

Finetuning Errors

Hey, I'm getting the following fine-tuning errors on a multi-GPU machine. I made sure to re-apply the keras patch, but haven't had any luck. Any idea what the issue is?

W0927 22:27:35.617535 140220124120896 deprecation.py:323] From /usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/clip_ops.py:286: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
W0927 22:27:36.428683 140220124120896 deprecation.py:506] From /usr/local/lib/python2.7/dist-packages/tensorflow/python/training/adagrad.py:76: calling init (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
global_step: (VariableV2): /job:localhost/replica:0/task:0/device:GPU:0
2019-09-27 22:27:44.207266: I tensorflow/core/common_runtime/placer.cc:54] global_step: (VariableV2)/job:localhost/replica:0/task:0/device:GPU:0
global_step/Assign: (Assign): /job:localhost/replica:0/task:0/device:GPU:0
2019-09-27 22:27:44.207316: I tensorflow/core/common_runtime/placer.cc:54] global_step/Assign: (Assign)/job:localhost/replica:0/task:0/device:GPU:0
global_step/read: (Identity): /job:localhost/replica:0/task:0/device:GPU:0
2019-09-27 22:27:44.207337: I tensorflow/core/common_runtime/placer.cc:54] global_step/read: (Identity)/job:localhost/replica:0/task:0/device:GPU:0
w/Initializer/random_normal/RandomStandardNormal: (RandomStandardNormal): /job:localhost/replica:0/task:0/device:GPU:0
2019-09-27 22:27:44.207349: I tensorflow/core/common_runtime/placer.cc:54] w/Initializer/random_normal/RandomStandardNormal: (RandomStandardNormal)/job:localhost/replica:0/task:0/device:GPU:0
Traceback (most recent call last):
File "training.py", line 162, in
estimator_model = tf.keras.estimator.model_to_estimator(keras_model=model, config=run_config)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/estimator/init.py", line 73, in model_to_estimator
config=config)
File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/keras.py", line 450, in model_to_estimator
config)
File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/keras.py", line 331, in save_first_checkpoint
saver.save(sess, latest_path)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1173, in save
{self.saver_def.filename_tensor_name: checkpoint_file})
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 950, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1173, in run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1350, in do_run
run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1370, in do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation w/Initializer/random_normal/mul: Could not satisfy explicit device specification '' because the node node w/Initializer/random_normal/mul (defined at training.py:90) placed on device Device assignments active during op 'w/Initializer/random_normal/mul' creation:
with tf.device(None): </usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/resource_variable_ops.py:602> was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:1, /job:localhost/replica:0/task:0/device:XLA_GPU:2, /job:localhost/replica:0/task:0/device:XLA_GPU:3, /job:localhost/replica:0/task:0/device:XLA_GPU:4, /job:localhost/replica:0/task:0/device:XLA_GPU:5, /job:localhost/replica:0/task:0/device:XLA_GPU:6, /job:localhost/replica:0/task:0/device:XLA_GPU:7, /job:localhost/replica:0/task:0/device:XLA_GPU:8, /job:localhost/replica:0/task:0/device:XLA_GPU:9, /job:localhost/replica:0/task:0/device:GPU:0, /job:localhost/replica:0/task:0/device:GPU:1, /job:localhost/replica:0/task:0/device:GPU:2, /job:localhost/replica:0/task:0/device:GPU:3, /job:localhost/replica:0/task:0/device:GPU:4, /job:localhost/replica:0/task:0/device:GPU:5, /job:localhost/replica:0/task:0/device:GPU:6, /job:localhost/replica:0/task:0/device:GPU:7, /job:localhost/replica:0/task:0/device:GPU:8, /job:localhost/replica:0/task:0/device:GPU:9].
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index
=1 requested_device_name
='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name
='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name
='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
UnsortedSegmentSum: GPU CPU XLA_CPU XLA_GPU
ResourceGather: GPU CPU XLA_CPU XLA_GPU
Shape: GPU CPU XLA_CPU XLA_GPU
Unique: GPU CPU
ReadVariableOp: GPU CPU XLA_CPU XLA_GPU
ResourceSparseApplyAdagrad: CPU
StridedSlice: GPU CPU XLA_CPU XLA_GPU
AssignVariableOp: GPU CPU XLA_CPU XLA_GPU
Identity: GPU CPU XLA_CPU XLA_GPU
RandomStandardNormal: GPU CPU XLA_CPU XLA_GPU
Mul: GPU CPU XLA_CPU XLA_GPU
Add: GPU CPU XLA_CPU XLA_GPU
VarHandleOp: GPU CPU XLA_CPU XLA_GPU
Const: GPU CPU XLA_CPU XLA_GPU
VarIsInitializedOp: GPU CPU XLA_CPU XLA_GPU

Colocation members, user-requested devices, and framework assigned devices, if any:
w/Initializer/random_normal/shape (Const)
w/Initializer/random_normal/mean (Const)
w/Initializer/random_normal/stddev (Const)
w/Initializer/random_normal/RandomStandardNormal (RandomStandardNormal) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
w/Initializer/random_normal/mul (Mul)
w/Initializer/random_normal (Add)
w (VarHandleOp) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
w/IsInitialized/VarIsInitializedOp (VarIsInitializedOp) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
w/Assign (AssignVariableOp) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
w/Read/ReadVariableOp (ReadVariableOp) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
tied_embedding_softmax/embedding_lookup (ResourceGather) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
tied_embedding_softmax/embedding_lookup/Identity (Identity)
tied_embedding_softmax_1/transpose/ReadVariableOp (ReadVariableOp) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
VarIsInitializedOp_322 (VarIsInitializedOp) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
AssignVariableOp (AssignVariableOp) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
ReadVariableOp (ReadVariableOp) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
w/Adagrad/Initializer/Const (Const)
w/Adagrad (VarHandleOp)
w/Adagrad/IsInitialized/VarIsInitializedOp (VarIsInitializedOp)
w/Adagrad/Assign (AssignVariableOp)
w/Adagrad/Read/ReadVariableOp (ReadVariableOp)
training/Adagrad/update_w/Unique (Unique)
training/Adagrad/update_w/Shape (Shape)
training/Adagrad/update_w/strided_slice/stack (Const)
training/Adagrad/update_w/strided_slice/stack_1 (Const)
training/Adagrad/update_w/strided_slice/stack_2 (Const)
training/Adagrad/update_w/strided_slice (StridedSlice)
training/Adagrad/update_w/UnsortedSegmentSum (UnsortedSegmentSum)
training/Adagrad/update_w/ResourceSparseApplyAdagrad (ResourceSparseApplyAdagrad)
save/AssignVariableOp_1542 (AssignVariableOp)
save/AssignVariableOp_1543 (AssignVariableOp)

 [[node w/Initializer/random_normal/mul (defined at training.py:90) ]]Additional information about colocations:No node-device colocations were active during op 'w/Initializer/random_normal/mul' creation.

Device assignments active during op 'w/Initializer/random_normal/mul' creation:
with tf.device(None): </usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/resource_variable_ops.py:602>

Original stack trace for u'w/Initializer/random_normal/mul':
File "training.py", line 162, in
estimator_model = tf.keras.estimator.model_to_estimator(keras_model=model, config=run_config)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/estimator/init.py", line 73, in model_to_estimator
config=config)
File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/keras.py", line 450, in model_to_estimator
config)
File "/usr/local/lib/python2.7/dist-packages/

how to pretrain a CTRL model from scratch?

We want to pretrain a CTRL model from scratch; could you provide some implementation details?
What is the format of the training samples, and can the training process be run with the training.py script?

error when generating w. nucleus

I'm using the Colab notebook and I'm getting this error whenever I use the --nucleus argument

generation.py:223: RuntimeWarning: overflow encountered in exp
  prompt_probs = np.exp(prompt_logits[_token])
generation.py:224: RuntimeWarning: invalid value encountered in true_divide
  prompt_probs = prompt_probs / sum(prompt_probs)
generation.py:229: RuntimeWarning: invalid value encountered in greater
  nucleus = max(np.where(np.cumsum(np.sort(prompt_probs)[::-1])>nucleusprob)[0][0], minimum_topk)
Traceback (most recent call last):
  File "generation.py", line 229, in <module>
    nucleus = max(np.where(np.cumsum(np.sort(prompt_probs)[::-1])>nucleusprob)[0][0], minimum_topk)
IndexError: index 0 is out of bounds for axis 0 with size 0
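Not an official fix, but the overflow in np.exp is the classic symptom of exponentiating large un-normalized logits; a common sketch is a numerically stable softmax that subtracts the max logit before exponentiating (the variable names are borrowed from the traceback, the helper itself is hypothetical):

import numpy as np

def stable_softmax(logits):
    # Subtracting the max keeps np.exp in range; the result is mathematically
    # identical to exp(logits) / sum(exp(logits)).
    shifted = logits - np.max(logits)
    probs = np.exp(shifted)
    return probs / probs.sum()

# Hypothetical replacement for the two failing lines in generation.py:
# prompt_probs = stable_softmax(prompt_logits[_token])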

How to add new control code into vocabulary?

Is it possible, or is there any existing code, to add a new control code to the vocabulary file?

parser.add_argument('--control_code', type=str, required=True,
                    help='control code to use for this file. must be in the vocabulary, else it will error out.')

TPUEstimator uses CPU during generation

I am running the patched code in an Ubuntu Docker container with NVIDIA GPU support, on an AMD Threadripper box with a Titan Xp card. The code does not engage the GPU, and the entire inference runs on the CPU. Generation does work, but it is slow. What should I do to engage the GPU?
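Not a fix, but a quick way to confirm whether TensorFlow inside the container can see the GPU at all (standard TF 1.14 calls); if no GPU shows up here, the problem is the Docker/driver setup rather than the TPUEstimator code path:

import tensorflow as tf
from tensorflow.python.client import device_lib

# Prints True/False and the device names TensorFlow can actually use.
print(tf.test.is_gpu_available())
print([d.name for d in device_lib.list_local_devices()])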

python3 generation.py, TypeError happened

WARNING:tensorflow:Entity <bound method Encoder.call of <transformer.Encoder object at 0x7fd070d42f98>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method Encoder.call of <transformer.Encoder object at 0x7fd070d42f98>>: AttributeError: module 'gast' has no attribute 'Num'
WARNING:tensorflow:Entity <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7fd070c59a58>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7fd070c59a58>>: AssertionError: Bad argument number for Name: 3, expecting 4
WARNING:tensorflow:Entity <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7fd070c59898>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7fd070c59898>>: AttributeError: module 'gast' has no attribute 'Num'
Traceback (most recent call last):
File "generation.py", line 106, in
transformed = transformer.Encoder()(embedded, training=False)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 629, in call
outputs = call_fn(inputs, *args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/autograph/impl/api.py", line 149, in wrapper
raise e.ag_error_metadata.to_exception(type(e))
TypeError: in converted code:
relative to /ctrl/ctrl:

transformer.py:137 call
    x = getattr(self, "layer%i" % i)(x, training, mask)
transformer.py:91 call
    attn_output  = self.multi_head_attention(normed, normed, normed, mask)
transformer.py:53 call
    batch_size = int(tf.shape(q)[0])

TypeError: int() argument must be a string, a bytes-like object or a number, not 'Tensor'
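Not the maintainers' fix, but the usual workaround when int(tf.shape(...)) fails in graph mode is to keep the batch dimension symbolic, since ops like tf.reshape accept tensor-valued shapes. A hedged sketch of the pattern (the function name is hypothetical):

import tensorflow as tf

def split_heads(x, num_heads, depth):
    # Hypothetical rewrite of the failing pattern: leave batch_size as a Tensor
    # instead of forcing it through int(), which only works under eager execution.
    batch_size = tf.shape(x)[0]
    x = tf.reshape(x, (batch_size, -1, num_heads, depth))
    return tf.transpose(x, perm=[0, 2, 1, 3])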

Fine-tuning?

Has anyone tried fine-tuning? This model looks promising for her.

more than 600 labels

What if the number of labels is 600+? Is the advised "replication" of the training data still a viable option? Is there a better way that avoids lots of duplicated records for different labels?

It's up to you really; it depends on what you want to do at the end.
If it is a hierarchy (like [Books, Author, Title]), then you don't need to replicate the data.
If it is a label for the data but the data has multiple labels (like Wikipedia Stoicism is ... and Philosophy Stoicism is ...), then you would probably benefit from the replication. A sketch of what that looks like follows the attribution below.

Originally posted by @keskarnitish in #35 (comment)
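A minimal sketch of the replication in practice, assuming one plain-text file per control code that is later converted to TFRecords with make_tf_records.py --control_code <label>; the file names and the example document are made up:

# Hypothetical helper: write one copy of each document per label so that every
# control code sees the same text. With 600+ labels per document this multiplies
# the data by the number of labels, which is exactly the cost being asked about.
documents = [
    ("Stoicism is a school of Hellenistic philosophy ...", ["Wikipedia", "Philosophy"]),
]

for text, labels in documents:
    for label in labels:
        with open("train_%s.txt" % label, "a") as f:
            f.write(text + "\n")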

python3 finetuning errors

When running the command:

python training.py --model_dir ../data_finetuning/seqlen256_v1.ckpt/ --iterations 250

errors occur; could somebody help me?

2019-09-27 11:08:55.815810: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 193 MB memory) -> physical GPU (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:10.0, compute capability: 6.0)
2019-09-27 11:08:55.815889: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-27 11:08:55.822316: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 15190 MB memory) -> physical GPU (device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:11.0, compute capability: 6.0)
2019-09-27 11:08:55.822395: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-27 11:08:55.829322: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 15190 MB memory) -> physical GPU (device: 3, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:12.0, compute capability: 6.0)
2019-09-27 11:08:55.829429: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-27 11:08:55.835786: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:4 with 15190 MB memory) -> physical GPU (device: 4, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:13.0, compute capability: 6.0)
2019-09-27 11:08:55.835872: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-27 11:08:55.842883: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:5 with 15190 MB memory) -> physical GPU (device: 5, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:14.0, compute capability: 6.0)
2019-09-27 11:08:55.843003: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-27 11:08:55.850326: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:6 with 15190 MB memory) -> physical GPU (device: 6, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:15.0, compute capability: 6.0)
2019-09-27 11:08:55.850434: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-27 11:08:55.857728: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:7 with 15190 MB memory) -> physical GPU (device: 7, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:16.0, compute capability: 6.0)
E0927 11:08:55.868580 140547554760512 error_handling.py:70] Error recorded from training_loop: Cannot find any TPU cores in the system (master address ). This usually means the master address is incorrect or the TPU worker has some problems. Available devices: [_DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:0, CPU, 268435456, 12519265597810562643), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:0, XLA_GPU, 17179869184, 13700291500443683580), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:1, XLA_GPU, 17179869184, 86262967647931383), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:2, XLA_GPU, 17179869184, 3676913639991227464), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:3, XLA_GPU, 17179869184, 5354296951385035528), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:4, XLA_GPU, 17179869184, 12154468832020101184), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:5, XLA_GPU, 17179869184, 13118045380692252360), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:6, XLA_GPU, 17179869184, 9442972683431350141), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:7, XLA_GPU, 17179869184, 13012334678599159156), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 17179869184, 1063841961695883546), _DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:0, GPU, 15928269210, 2610604702973413960), _DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:1, GPU, 203292672, 17931462477742070628), _DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:2, GPU, 15928269210, 5846002352678548358), _DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:3, GPU, 15928269210, 10456649650628517216), _DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:4, GPU, 15928269210, 17379282422107701438), _DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:5, GPU, 15928269210, 8202577610745802132), _DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:6, GPU, 15928269210, 14481908658310636262), _DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:7, GPU, 15928269210, 278208692209243281)]
I0927 11:08:55.868802 140547554760512 error_handling.py:96] training_loop marked as finished
W0927 11:08:55.868901 140547554760512 error_handling.py:130] Reraising captured error
Traceback (most recent call last):
File "training.py", line 164, in
estimator_model.train(input_fn=input_fn, steps=args.iterations)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2876, in train
rendezvous.raise_errors()
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 131, in raise_errors
six.reraise(typ, value, traceback)
File "/usr/lib/python3/dist-packages/six.py", line 693, in reraise
raise value
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2871, in train
saving_listeners=saving_listeners)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 364, in train
hooks.extend(self._convert_train_steps_to_hooks(steps, max_steps))
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2746, in _convert_train_steps_to_hooks
if ctx.is_running_on_cpu():
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_context.py", line 442, in is_running_on_cpu
self._validate_tpu_configuration()
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_context.py", line 604, in _validate_tpu_configuration
num_cores = self.num_cores
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_context.py", line 349, in num_cores
metadata = self._get_tpu_system_metadata()
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_context.py", line 274, in _get_tpu_system_metadata
query_topology=self.model_parallelism_enabled))
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/tpu/tpu_system_metadata.py", line 128, in _query_tpu_system_metadata
master_address, devices))
RuntimeError: Cannot find any TPU cores in the system (master address ). This usually means the master address is incorrect or the TPU worker has some problems. Available devices: [_DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:0, CPU, 268435456, 12519265597810562643), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:0, XLA_GPU, 17179869184, 13700291500443683580), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:1, XLA_GPU, 17179869184, 86262967647931383), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:2, XLA_GPU, 17179869184, 3676913639991227464), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:3, XLA_GPU, 17179869184, 5354296951385035528), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:4, XLA_GPU, 17179869184, 12154468832020101184), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:5, XLA_GPU, 17179869184, 13118045380692252360), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:6, XLA_GPU, 17179869184, 9442972683431350141), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:7, XLA_GPU, 17179869184, 13012334678599159156), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 17179869184, 1063841961695883546), _DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:0, GPU, 15928269210, 2610604702973413960), _DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:1, GPU, 203292672, 17931462477742070628), _DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:2, GPU, 15928269210, 5846002352678548358), _DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:3, GPU, 15928269210, 10456649650628517216), _DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:4, GPU, 15928269210, 17379282422107701438), _DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:5, GPU, 15928269210, 8202577610745802132), _DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:6, GPU, 15928269210, 14481908658310636262), _DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:7, GPU, 15928269210, 278208692209243281)]
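
A common workaround (a sketch, not necessarily how training.py is written, although the traceback above does go through tpu_estimator.py) is to construct the TPUEstimator with use_tpu=False so that it runs the model_fn on CPU/GPU instead of querying the empty master address for TPU cores. model_fn, run_config, and batch_size below are placeholders for whatever training.py actually uses:

    import tensorflow as tf

    estimator_model = tf.contrib.tpu.TPUEstimator(
        model_fn=model_fn,            # placeholder: the model function from training.py
        config=run_config,            # placeholder: a tpu_config.RunConfig
        use_tpu=False,                # do not require TPU cores; fall back to CPU/GPU
        train_batch_size=batch_size)  # placeholder: the global batch size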

Error when making the TFRecords from the Moby Dick data

Hello,

I hit a TFRecord conversion error when I try to fine-tune the model with the Moby Dick data. The error is as follows:

python make_tf_records.py --text_file ../data/moby_dick.txt --control_code Moby --sequence_len 256
Loading vocabulary from ../vocab ...
Read 6086453827 words (246531 unique) from vocabulary file.
Loading codes from ../codes ...
Read 200000 codes from the codes file.
Traceback (most recent call last):
  File "make_tf_records.py", line 32, in <module>
    tokenized_train_text = bpe.apply([train_text.encode('ascii', errors='ignore')])[0] # will NOT work for non-English texts
  File "fastBPE/fastBPE.pyx", line 21, in fastBPE.fastBPE.apply
AttributeError: 'bytes' object has no attribute 'encode'

Maybe some non-English text in Moby Dick causes this error. Can anybody help me?
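
The traceback suggests the problem is not the non-English text: fastBPE's apply() calls .encode() on each input, so it expects str, while line 32 already passes bytes. A possible fix (an assumption based on that reading, not a confirmed patch) is to decode back to str after the ASCII round-trip:

    # in make_tf_records.py, line 32
    tokenized_train_text = bpe.apply(
        [train_text.encode('ascii', errors='ignore').decode('ascii')])[0]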

The results sometimes do not seem normal and are difficult to understand

In addition to the examples in the README, I tried some other prompts. Some of the results generated by CTRL do not seem normal. How can I get CTRL to generate normal results?

python3 generation.py --model_dir data/seqlen256_v1.ckpt/ --temperature 0.2 --topk 5
..........

1. ENTER PROMPT: Trump met with Japanese Prime Minister last week

Trump met with Japanese Prime Minister last week

(the prompt above is echoed verbatim twelve times in the output, with nothing else generated)

2. ENTER PROMPT: The Chinese economy has developed rapidly in recent years

The Chinese economy has developed rapidly in recent years a la économie de l'époque dans les années dernières développé et une économqui est un autre en tant qu'ecaire aire aire aire ... ("aire" repeats for the remainder of the sequence)

................

1. ENTER PROMPT: I am very tired and very lonely

I am very tired and very lonely

(the prompt above is echoed verbatim six times in the output, with nothing else generated)
..............
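
One mitigation for this kind of degenerate, prompt-echoing output is the penalized sampling described in the CTRL paper: discourage tokens that have already been generated before sampling the next one. A minimal, generic sketch (not the exact generation.py code; logits is the next-token logit vector and generated_ids the ids emitted so far):

    import numpy as np

    def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
        # Make already-generated tokens less likely: shrink their positive
        # logits and push their negative logits further down.
        logits = np.array(logits, dtype=np.float32)
        for idx in set(generated_ids):
            logits[idx] = logits[idx] / penalty if logits[idx] > 0 else logits[idx] * penalty
        return logits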

Allow generation to terminate once "finished"

When a generation is "finished" (common with Reddit control codes) before the specified generation length, the model just outputs newlines forever. There should be a way to detect this behavior and stop generations.

Running full model on V100 outputs last word

I'm running the full model on a V100 GPU on Google Cloud, and the only output I'm getting is the last word copied over and over again. I've tried changing the temperature and topk parameters, but to no avail. I'm using the 512 version (larger version).

Any advice would be greatly appreciated.

benchmarking with GPT-2

Any suggestions for benchmarking CTRL against GPT-2? Say, loss value, PPL, or any other metric for measuring text generation quality?
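
Perplexity on a shared held-out text is the usual yardstick: PPL = exp(mean per-token negative log-likelihood). A minimal sketch (token_nlls is assumed to be the per-token natural-log losses collected from whichever model you evaluate):

    import numpy as np

    def perplexity(token_nlls):
        # exp of the mean negative log-likelihood over all predicted tokens
        return float(np.exp(np.mean(token_nlls)))

Note that CTRL and GPT-2 use different vocabularies and BPE codes, so per-token perplexities are not directly comparable unless you normalize (for example per character or per word) or fall back on human evaluation of the generations.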

“No OpKernel was registered to support Op”: can this code run on a GPU?

I encountered this error when running this code on a GPU:

Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
Traceback (most recent call last):
File "/data/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
return fn(*args)
File "/data/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1339, in _run_fn
self._extend_graph()
File "/data/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1374, in _extend_graph
tf_session.ExtendSession(self._session)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'CrossReplicaSum' used by {{node training/CrossReplicaSum}}with these attrs: [T=DT_FLOAT, _class=["loc:@training/gradients/concat"]]
Registered devices: [CPU, GPU, XLA_CPU, XLA_GPU]
Registered kernels:
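
CrossReplicaSum is a TPU-only all-reduce op with no CPU/GPU kernel, which is why the GPU run fails; it is typically introduced by tf.contrib.tpu.CrossShardOptimizer (or a direct tpu.cross_replica_sum call). Assuming the training script wraps its optimizer that way, a hedged sketch of the usual guard (use_tpu and learning_rate are placeholders, not necessarily flags present in training.py):

    optimizer = tf.compat.v1.train.AdagradOptimizer(learning_rate=learning_rate)
    if use_tpu:
        # CrossShardOptimizer adds CrossReplicaSum, which only TPUs can execute
        optimizer = tf.contrib.tpu.CrossShardOptimizer(optimizer)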

Getting UnicodeEncodeErrors

I'm getting these errors in the middle of generation. Any clue why?

Traceback (most recent call last):
File "generation.py", line 275, in
print(tokens_generated_so_far)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 289: ordinal not in range(128)
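
This is the Python 2 behaviour when stdout defaults to ASCII: print chokes on the right single quotation mark u'\u2019' in the generated text. Two possible workarounds, assuming you stay on Python 2 rather than switching to Python 3:

    # either encode explicitly before printing in generation.py ...
    print(tokens_generated_so_far.encode('utf-8'))

    # ... or force a UTF-8 stdout from the shell before running the script:
    # PYTHONIOENCODING=utf-8 python generation.py --model_dir ...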

Issue with setting temperature

I was getting an error when setting the temperature for the generation script. I think this line:
chosen_idx = int(tf.random.categorical(np.expand_dims(prompt_logits[0][_token][pruned_list],0), num_samples=1).numpy())

Should be
chosen_idx = int(tf.random.categorical(np.expand_dims(prompt_logits[_token][pruned_list],0), num_samples=1).numpy())
At least that seems to do what I expect. So when I torture the model with problems like this:

          tokens_to_disallow = []  # indices into pruned_list to drop at this step
          if token > 0:
            prev = idx2word[tokens_generated[0][token]]
            if not prev.endswith('@@'):  # only constrain at word boundaries (no pending BPE continuation)
              for _ in range(len(pruned_list)):
                  # keep only candidates starting with r, t, or b
                  if not idx2word[pruned_list[_]].lower().startswith('r'):
                    if not idx2word[pruned_list[_]].lower().startswith('t'):
                      if not idx2word[pruned_list[_]].lower().startswith('b'):
                        tokens_to_disallow.append(_)
              #if 'http' in idx2word[pruned_list[_]]:
              #    tokens_to_disallow.append(_)
          pruned_list = np.delete(pruned_list, tokens_to_disallow)

it seems to provide some entertaining results with some diversity.
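
For reference, a generic sketch of how temperature is typically applied when sampling (not the exact generation.py code): divide the logits by the temperature before drawing from the categorical distribution.

    import numpy as np
    import tensorflow as tf

    def sample_with_temperature(logits, temperature=0.7):
        # lower temperature -> sharper distribution, higher -> more diversity
        scaled = np.expand_dims(np.asarray(logits, dtype=np.float32) / temperature, 0)
        return int(tf.random.categorical(scaled, num_samples=1).numpy())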
