
Conditional Transformer Language Model for Controllable Generation

Home Page: https://arxiv.org/abs/1909.05858

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%


CTRL - A Conditional Transformer Language Model for Controllable Generation

Authors: Nitish Shirish Keskar, Bryan McCann, Lav Varshney, Caiming Xiong, and Richard Socher

Updates

Apr 20, 2020

We are adding a model card for CTRL! Please reach out if you have any questions about it.

Oct 31, 2019

Adding functionality to convert a model from TF to HuggingFace/Transformers in response to a request. To convert the checkpoint, simply install transformers via pip install transformers and run:

python -u convert_tf_to_huggingface_pytorch.py --tf <path_to_tensorflow_data_checkpoint> --pytorch <path_to_where_you_want_to_store_pytorch_checkpoint>

Then, to use this in HuggingFace:

# create folder and contents for HuggingFace/Transformers
mkdir custom_ctrl_model
cd custom_ctrl_model
mv <path_to_pytorch_checkpoint_from_above> .
wget -O config.json https://storage.googleapis.com/sf-ctrl/pytorch/ctrl-config.json
wget -O merges.txt https://raw.githubusercontent.com/salesforce/ctrl/master/ctrl-merges.txt
wget -O vocab.json https://raw.githubusercontent.com/salesforce/ctrl/master/ctrl-vocab.json

# run
python examples/run_generation.py  --model_type ctrl --model_name <path_to_custom_ctrl_model>/ --temperature 0 --repetition 1.2

Oct 21, 2019

CTRL is now in huggingface/transformers!

You can simply follow the installation instructions and run:

python examples/run_generation.py  --model_type ctrl --model_name ctrl --temperature 0 --repetition 1.2
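
As a rough illustration of what the two flags above do, here is a minimal sketch of greedy decoding with a repetition penalty. It assumes that --temperature 0 is treated as greedy (argmax) decoding and that the penalty divides the logits of already-generated tokens by the --repetition value, as in the penalized sampling described in the paper. The function names are hypothetical and this is not the code used by run_generation.py.

```python
def penalize_repeats(logits, generated_ids, penalty=1.2):
    """Return a copy of logits with already-generated token ids discounted."""
    penalized = list(logits)
    for token_id in set(generated_ids):
        # Assumption: the penalty is a plain division of the raw logit.
        penalized[token_id] = penalized[token_id] / penalty
    return penalized


def greedy_next_token(logits, generated_ids, penalty=1.2):
    """Pick the argmax token id after applying the repetition penalty."""
    penalized = penalize_repeats(logits, generated_ids, penalty)
    return max(range(len(penalized)), key=lambda i: penalized[i])


logits = [2.0, 2.3, 0.5]
# With no history, token 1 has the largest logit.
print(greedy_next_token(logits, generated_ids=[]))
# Once token 1 has been generated, its logit is discounted and token 0 wins.
print(greedy_next_token(logits, generated_ids=[1]))
```

Note that a plain division makes a negative logit larger rather than smaller, so a practical implementation may want to treat negative logits separately.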

Sep 25, 2019

Two more additions:

We add the code to fine-tune the model on a custom dataset in the training_utils folder. Please refer to the README within the folder for details and example usage.
You can get a 36-layer model from gs://sf-ctrl/seqlen256_36layers_v0.ckpt/; the generation of this model is markedly worse than the 48-layer (base) model but still quite coherent.

Sep 23, 2019

The repo now supports (experimental) inference on PyTorch; Colaboratory: https://colab.research.google.com/drive/1nDh3ayRPJGK5ciPO2D3TFkYZFqclBWHY. Simply install PyTorch via pip install torch and run python pytorch_generation.py with the same flags as the base generation.py script, with one exception: unlike the base version, the model_path here requires the path to the .data file and not just the ckpt folder (see the Colaboratory notebook for an example). The code will convert the weights from TensorFlow on the first run and then create a loadable checkpoint for easier subsequent loading. You still need TensorFlow installed for the first step.

Sep 19, 2019

You should now be able to run inference on K80/T4/P100/similar GPUs using the lower_memory branch. We quantized certain weights to fp16, which reduced memory usage. Simply clone the repo and git checkout lower_memory. Here is a Colaboratory link that demonstrates this functionality: https://colab.research.google.com/drive/1hVveBQShDru1Mjnhe4C21uQv4A2eH1tV

This functionality is still being tested; please file GitHub issues if you see something aberrant. We still recommend using the full model if possible. Once the functionality has been sufficiently tested, we will update the repo and merge into master.

Two quick notes: (1) unlike the base version, the model_path here requires the path to the .data file and not just the ckpt folder (see the Colaboratory notebook for an example); (2) the first generation is slow because of the overhead of setting up the model, but subsequent ones should be fast.
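
To give a feel for why fp16 helps, here is a back-of-the-envelope estimate of the memory needed for the weights alone. The parameter counts are taken from the Keras model summary quoted in an issue further down this page. The sketch assumes that every weight is stored at the given precision, whereas the lower_memory branch quantizes only certain weights, and it ignores activations and framework overhead, so real usage will differ.

```python
# Parameter counts as printed by the Keras model summary for the 48-layer model.
EMBEDDING_PARAMS = 315_810_054   # tied_embedding_softmax
ENCODER_PARAMS = 1_322_154_496   # encoder
TOTAL_PARAMS = 1_637_964_550     # "Total params"

# Sanity check: the two layers account for every parameter.
assert EMBEDDING_PARAMS + ENCODER_PARAMS == TOTAL_PARAMS


def weights_gib(num_params, bytes_per_param):
    """Memory for the weights alone, in GiB (activations are ignored)."""
    return num_params * bytes_per_param / 2**30


fp32 = weights_gib(TOTAL_PARAMS, 4)  # 4 bytes per float32 weight
fp16 = weights_gib(TOTAL_PARAMS, 2)  # 2 bytes per float16 weight (upper bound on savings)
print("fp32: {:.2f} GiB, fp16: {:.2f} GiB".format(fp32, fp16))
```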

Introduction

Large-scale language models show promising text generation capabilities, but users cannot easily control this generation process. We release CTRL, a 1.6 billion-parameter conditional transformer language model, trained to condition on control codes that specify domain, subdomain, entities, relationships between entities, dates, and task-specific behavior. Control codes were derived from structure that naturally co-occurs with raw text, preserving the advantages of unsupervised learning while providing more explicit control over text generation.

Paper link: https://arxiv.org/abs/1909.05858

Blog link: https://blog.einstein.ai/introducing-a-conditional-transformer-language-model-for-controllable-generation/

The code currently supports two functionalities:

Generating from a trained model. Two models are available for download, one with a sequence length of 256 and another with a sequence length of 512. They are trained with word-level vocabularies and, through a sliding-window approach, can generate well beyond their trained sequence lengths.
Source attribution - given a prompt, prints the perplexity of the prompt conditional on each domain control code (see Section 5 of the paper).

Please refer to the argument flags for more details regarding the options available for either.
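
The sliding-window idea from the first item above can be sketched as follows: at each step the model only sees the most recent seq_len tokens, so the output can grow past the trained sequence length. This is a minimal sketch under stated assumptions, not the loop in generation.py; next_token_fn is a hypothetical stand-in for the model, and the window here simply drops the oldest tokens.

```python
def generate_sliding(prompt_ids, next_token_fn, num_new_tokens, seq_len=256):
    """Generate tokens one at a time, conditioning on at most seq_len tokens."""
    tokens = list(prompt_ids)
    for _ in range(num_new_tokens):
        window = tokens[-seq_len:]            # most recent seq_len tokens only
        tokens.append(next_token_fn(window))  # the model never sees more than the window
    return tokens


def toy_model(window):
    """Hypothetical stand-in for the language model: counts upward."""
    return window[-1] + 1


out = generate_sliding([0], toy_model, num_new_tokens=600, seq_len=256)
print(len(out))  # longer than the 256-token window
```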

Citation
License
Questions for Deliberation
Usage
Sample Generations
Sample Source Attributions
FAQs
Get Involved

Citation

@article{keskarCTRL2019,
  title={{CTRL - A Conditional Transformer Language Model for Controllable Generation}},
  author={Keskar, Nitish Shirish and McCann, Bryan and Varshney, Lav and Xiong, Caiming and Socher, Richard},
  journal={arXiv preprint arXiv:1909.05858},
  year={2019}
}

License

The code is released under the BSD-3 License (see LICENSE.txt for details), but we also ask that users respect the following:

This software should not be used to promote or profit from:

violence, hate, and division,

environmental destruction,

abuse of human rights, or

the destruction of people's physical and mental health.

We encourage users of this software to tell us about the applications in which they are putting it to use by emailing [email protected], and to use appropriate documentation when developing high-stakes applications of this model.

Questions for Deliberation

We consulted extended members of the AI community in the responsible publication of this model. In particular, a preview of a Partnership on AI (PAI) project relating to AI research publication norms was considered prior to the release of this work. While this PAI project is as-yet unpublished, it is informed by companies, organizations, and people differently affected by artificial intelligence and presents key considerations to evaluate before publishing potentially high-impact research.

The questions referenced from the early draft of the PAI project included:

How do you envision your research being used in the world? Who will use it? How much expertise is required to use it?
Why would they be motivated to replicate / productionize your work?
How would a science fiction author turn your research into a dystopian story?
What is the worst way someone could use your research finding, given no resource constraints?
What are the historical patterns of misuse or application in this area? How can the research be made more robust against such misuse?
Which populations or communities will this technology negatively affect, deployed in the scenarios you envision? Will some groups be disproportionately affected?

Usage

Here are the steps to get generating:

Install the dependencies

This code relies on TensorFlow 1.14 and fastBPE.

TensorFlow can be installed via pip install tensorflow[-gpu]==1.14. fastBPE installation instructions can be found in the GitHub repository linked above. We highly recommend experimenting within a virtualenv or Docker image.

For inference on PyTorch, please see the update on Sep 23 at the top of this README. If you use PyTorch, you can skip Step 2.

Patch the /usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/keras.py (or equivalent, if installed elsewhere) by running

patch -b <path_to_tensorflow_estimator_package>/python/estimator/keras.py estimator.patch

We highly recommend experimenting within a virtualenv or Docker image since the workflow involves patching a TensorFlow file to support some custom functionality. This step is not optional; skipping this step will cause errors (irrespective of device).

If you run into OOM issues because of GPU memory exhaustion, please use the lower_memory branch. See the (Sep 19, 2019) update at the top of this README for details.

Get the model files from gs://sf-ctrl/seqlen256_v1.ckpt/ or gs://sf-ctrl/seqlen512_v1.ckpt/.

A 36-layer model is also available at gs://sf-ctrl/seqlen256_36layers_v0.ckpt/.

The model architecture is identical for both checkpoints. The former is trained with lower training sequence length (256) while the latter is trained with a larger one (512). We plan to update the models (with the appropriate version tags) as we continue to train them longer and on more data. Our current recommendation is to use the 256_v1 model unless you have a strong reason not to. If you have no preference for domain, Links is always a good first choice.

With gsutil installed, you can simply run gsutil -m cp -r gs://sf-ctrl/seqlen256_v1.ckpt/ . for copying the model checkpoint over.

Without gsutil, you can follow the route recommended in #7 (comment).
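
One possible route without gsutil, which may or may not match the linked comment, is to fetch the files over HTTPS. The sketch below assumes the bucket is readable at storage.googleapis.com and uses the file names reported for seqlen256_v1.ckpt in an issue further down this page (step 413000). Other checkpoints will have different step numbers, so the defaults are assumptions to adjust, and the helper names are hypothetical.

```python
import os
from urllib.request import urlretrieve

BASE_URL = "https://storage.googleapis.com/sf-ctrl"


def checkpoint_urls(ckpt_dir="seqlen256_v1.ckpt", step=413000):
    """Build the download URLs for one checkpoint folder (names are assumed)."""
    prefix = "model.ckpt-{}".format(step)
    names = [
        "checkpoint",
        prefix + ".data-00000-of-00001",
        prefix + ".index",
        prefix + ".meta",
    ]
    return ["{}/{}/{}".format(BASE_URL, ckpt_dir, name) for name in names]


def download_checkpoint(target_dir, ckpt_dir="seqlen256_v1.ckpt", step=413000):
    """Download every file of the checkpoint into target_dir (large download)."""
    os.makedirs(target_dir, exist_ok=True)
    for url in checkpoint_urls(ckpt_dir, step):
        urlretrieve(url, os.path.join(target_dir, url.rsplit("/", 1)[-1]))


# Only list the URLs here; call download_checkpoint("seqlen256_v1.ckpt") to fetch them.
for url in checkpoint_urls():
    print(url)
```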

Run the generation script generation.py or the source attribution script source_attribution.py.

The generation.py script prompts the user to input text and then prints the continuation. The source_attribution.py script prompts the user to input text and then prints a sorted list of domains and the perplexity of the text conditional on each individual domain.

Generations

The generations and attributions computed below have been generated using the 256 sequence length model. Comparable results can be obtained from the 512 version of the model as well. We demonstrate only a few of the functionalities, especially the control codes. For a complete list of the control codes, and how to use them, please refer to the paper. Note that <GENERATION_BEGINS> is only included for demonstrative purposes and is not actually generated by the model.

Links

Links In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.

<GENERATION_BEGINS> 

The scientists were able to track down the unicorn population by analyzing their DNA. They found out that they are descended from an ancient species called the “Tapiri,” which is believed to have been extinct for over 10,000 years.

 According to scientists, this discovery proves that there may be other undiscovered creatures on our planet.

 “We believe these animals could still exist and we hope to find them one day,” said Dr. Carlos Jaramillo, director of the National Geographic Society’s Center for Biodiversity Research at the University of Arizona.

 Scientists also confirmed that the Unicorn Genome Project has already identified several genes associated with the animal’s unique horn structure.

 “The genome project will help us understand how the animal evolved its horns so it can better protect itself against predators like jaguars or pumas.” 
 
 Researchers say that if all goes well, the new genetic information should allow them to create a 3D model of the animal.

 “It would give us a much clearer picture about what makes the creature unique.”- Dr David Sillito, lead researcher ...

As a general note, you don't have to use greedy sampling. You can switch to topk or nucleus through the appropriate argument flags.

Running it with --temperature 0.2 and --topk 5 yields:

Links In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.

 The scientists were able to track down and capture one of these unicorn herds as it roamed through the valley for several days. They then took DNA samples from each animal and compared them with other animals found throughout the region.

 Advertisement

 What they found is truly astounding.

 According to the scientists, there are approximately 50 different species of unicorn, all of which live on two continents — North America and South America...
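
For readers unfamiliar with the sampling options mentioned above, here is a minimal pure-Python sketch of temperature scaling, top-k filtering and nucleus (top-p) filtering over a toy distribution. It is an illustration of the general techniques, not the implementation in generation.py, and all function names are hypothetical. A temperature of exactly 0 would divide by zero here and is assumed to be handled separately as greedy decoding.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def _renormalize(probs, keep):
    """Zero out every index not in keep and rescale the rest to sum to 1."""
    kept = set(keep)
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]


def top_k_filter(probs, k):
    """Keep only the k most probable tokens."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return _renormalize(probs, order[:k])


def nucleus_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return _renormalize(probs, keep)


def sample(probs, rng=random):
    """Draw one token id according to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [3.0, 2.0, 1.0, 0.0, -1.0]
probs = softmax(logits, temperature=0.2)
print(sample(top_k_filter(probs, k=2)))
print(sample(nucleus_filter(probs, p=0.9)))
```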

Wikipedia

Wikipedia Salesforce Inc. is <GENERATION_BEGINS> a software company that provides cloud computing and business management software. It was founded in 1999 by Marc Benioff as an online retailer of enterprise software, but it has since expanded into other areas such as e-commerce, customer relationship management, and digital media...

Reviews

Reviews Rating: 1.0 <GENERATION_BEGINS> 
 I bought this for my son who is a huge fan of the show. He was so excited to get it and when he opened it, we were all very disappointed. The quality of the product is terrible. It looks like something you would buy at a dollar store.
Rating: 1.0
 This item arrived in poor condition. There are several scratches on the front cover as well as some other minor damage...

Reviews Rating: 4.0 <GENERATION_BEGINS> 
 I bought this for my husband and he loves it. He has a small wrist so it is hard to find watches that fit him well. This one fits perfectly.
 Rating: 4.0
 The watch looks great but the band was too big even though i had read other reviews about how large the bands were.
 Rating: 4.0
 It works well as an everyday watch. It's not very fancy or anything, just simple with no extra bells & whistles.

Books

Books Weary with toil, I haste me to my bed, <GENERATION_BEGINS> 
 And sleep the slumber of a weary man.
 The morning sun is up, and shines on high;
 My bed is ready, but no friend appears.
 A little while ago he came in sight,
 But now he's gone, as if he'd never come.
 He left his hat upon the table there,
 His coat was hanging up behind the door;
 And, when at last I looked around, there stood
 No one beside me, nor within call.
 Oh, what shall I do? What can be done?
 I'm sure that some strange accident has happened.
 I'll go out into the street again,
 To seek for him, or call him back again.
 But, oh, how lonely it will seem
 When all are gone who once were near me!

Reddit - r/running

Running Title: <GENERATION_BEGINS> I just ran my first 5k in under 30 minutes.
 Text: It was a pretty easy run, but it felt great.

 My goal is to run at least 10 miles per week for the next few months.

 I'm not sure if this qualifies as running or not, but it's definitely something that I've been wanting to do since high school.

 Thanks r/running!

 Edit: Thank you all so much for your kind words and advice.

 Edit 2: Wow, thanks guys!

Reddit - r/fitness

Fitness Title: <GENERATION_BEGINS> How do you get your body to burn fat when you're not working out?
 Text: I know that the reason why people lose weight is because they are eating less and burning more calories than their bodies need but how does this work for someone who doesn't have a lot of time or energy to exercise?

 For example, if I'm at home doing nothing all day then I'll be able to eat whatever I want without feeling like I've eaten too much. But what happens after that?

 EDIT: Thanks everyone for the responses!

Reddit - r/frugal

Saving Title: <GENERATION_BEGINS> How to get a free credit report
 Text: I have been trying for months now and it seems like no one is willing to give me one.

 Is there any way that you can just go online and fill out the forms yourself or do they require some sort of ID or something?

 Edit: Thanks everyone for your help!

 Edit 2: Thank you all so much for helping me!

Reddit - r/personalfinance

Finance Title: <GENERATION_BEGINS> I have a question about my credit score.
 Text: So I'm currently in the process of applying for an auto loan and I've been told that if you don't pay your bills on time then they will not approve you.

 My questions are:

 1. How do I know when to start paying off debt?
 2. What is the best way to get out of debt without having to file bankruptcy?
 3. Is it possible to refinance my car loan?
 4. Should I just wait until after school starts so I can take advantage of lower interest rates?
 5. If so, how long should I wait?

 Thanks

Questions

Questions Q: What is the capital of Australia? <GENERATION_BEGINS>
 A: Canberra
 Q: How many people live in Canberra?
 A: 650,000

Translation

Translation English : This is a natural language processing model that aims to generate coherent text in a controllable manner. ; French : <GENERATION_BEGINS> 
Il s'agit d'un modèle de traitement du langage naturel qui vise à générer un texte cohérent et contrôlable.

Translation English : This is a natural language processing model that aims to generate coherent text in a controllable manner. ; German : <GENERATION_BEGINS> 
Es handelt sich um ein natürliches Textverarbeitungssystem, das auf eine einheitliche und kontrollierbare Erzeugung von Text abzielt.
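
The prompts in the samples above follow a few simple templates: a control code first, then task-specific markers. The hypothetical helpers below reproduce those templates as read from the examples in this README. The exact whitespace the tokenizer expects is an assumption, so treat this as a sketch rather than a specification.

```python
def translation_prompt(text, source="English", target="French"):
    """Build a prompt like 'Translation English : <text> ; French :'."""
    return "Translation {} : {} ; {} :".format(source, text, target)


def review_prompt(rating):
    """Build a prompt like 'Reviews Rating: 4.0'."""
    return "Reviews Rating: {:.1f}".format(rating)


def question_prompt(question):
    """Build a prompt like 'Questions Q: <question>'."""
    return "Questions Q: {}".format(question)


print(translation_prompt("This is a test.", target="German"))
print(review_prompt(4))
print(question_prompt("What is the capital of Australia?"))
```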

Source Attributions

I lost 10 lbs! Feeling great!

PROMPT: I lost 10 lbs! Feeling great!
Diet ppl = 28.960714
Weight ppl = 29.223865
Fitness ppl = 36.162671
...

My landlord is suing me for unpaid rent

PROMPT: My landlord is suing me for unpaid rent
Legal ppl = 21.210965
Finance ppl = 24.619064
Saving ppl = 27.923208
...

And then I saw him, the man in the mirror.

PROMPT: And then I saw him, the man in the mirror.
Horror ppl = 17.919299
Scary ppl = 18.587843
Writing ppl = 23.154564
...

Anarchism is an anti-authoritarian political philosophy that rejects hierarchies deemed unjust and advocates their replacement with self-managed, self-governed societies based on voluntary, cooperative institutions.

PROMPT: Anarchism is an anti-authoritarian political philosophy that rejects hierarchies deemed unjust and advocates their replacement with self-managed, self-governed societies based on voluntary, cooperative institutions.
Wikipedia ppl = 34.446701
News ppl = 34.484165
Links ppl = 35.460126
...

I love God

PROMPT: I love God
Christianity ppl = 55.653985
Atheism ppl = 116.811038
Confessions ppl = 133.619834
...
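
The listings above rank domains by perplexity, lowest first. As a minimal sketch of that ranking, the code below computes perplexity as the exponential of the negative mean log-probability per token. The per-token log-probabilities are made-up toy numbers, the helper names are hypothetical, and details such as whether the control code token itself is scored are assumptions not taken from source_attribution.py.

```python
import math


def perplexity(token_log_probs):
    """exp of the negative mean natural-log probability per token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def rank_domains(log_probs_by_domain):
    """Return (domain, perplexity) pairs sorted from most to least likely domain."""
    scores = {d: perplexity(lp) for d, lp in log_probs_by_domain.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Toy log-probabilities of the same prompt under three control codes (invented values).
toy = {
    "Finance": [-3.1, -3.2, -3.3],
    "Legal": [-2.9, -3.1, -3.2],
    "Saving": [-3.3, -3.3, -3.4],
}
for domain, ppl in rank_domains(toy):
    print("{} ppl = {:.6f}".format(domain, ppl))
```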

FAQs

(We hope to update this section frequently).

Will you be releasing the training code and data?

~~We plan to release the training code soon.~~ Please refer to the update on Sep 25 for details on training code.

We will not be releasing the training data, but we will release tips and scripts related to data collection.

Is a version of the model available in PyTorch?

~~Not at the moment, but if we come across an equivalent implementation, we will update this section.~~ Please refer to the update on Sep 23 for inference on PyTorch.

The code errors out.

Make sure that you have performed the patch as described above. If the error persists, please create a GitHub issue.

The code generates nonsense irrespective of the prompt.

Make sure that you have (a) provided the right --model_dir and that the folder actually exists and contains the checkpoint, (b) provided a valid control code (for example, a domain such as Links) as the first token, and (c) tried generating with a simple prompt such as Links I or Books From. If the error persists, please create a GitHub issue.
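
Check (b) can be automated before a prompt is sent to the model. The sketch below uses a hypothetical helper and an illustrative subset of control codes, namely the ones that appear in this README; the complete list is in the paper.

```python
# Illustrative subset only: control codes that appear in this README.
KNOWN_CODES = {
    "Links", "Wikipedia", "Reviews", "Books", "Questions", "Translation",
    "Running", "Fitness", "Saving", "Finance", "Diet", "Weight", "Legal",
    "Horror", "Scary", "Writing", "News", "Christianity", "Atheism",
    "Confessions",
}


def check_prompt(prompt, known_codes=KNOWN_CODES):
    """Return (ok, message) depending on whether the prompt starts with a control code."""
    tokens = prompt.split()
    if not tokens:
        return False, "empty prompt"
    if tokens[0] not in known_codes:
        return False, "first token {!r} is not a known control code".format(tokens[0])
    return True, "ok"


print(check_prompt("Links I"))
print(check_prompt("I am very tired and very lonely"))
```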

Get Involved

Please create a GitHub issue if you have any questions, suggestions, requests or bug-reports. We welcome PRs!


ctrl's Issues

NameError: global name 'tf' is not defined

What am I doing wrong?
This is the rough Dockerfile which I expected to work, but throws the above error:

FROM nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04
RUN apt-get update && apt-get install -y git curl wget python python-pip vim
RUN pip install Cython
RUN pip install numpy tensorflow-gpu==1.14
RUN mkdir /CTRL
WORKDIR /CTRL
RUN git clone https://github.com/salesforce/ctrl.git .
RUN git clone https://github.com/glample/fastBPE.git
RUN cd fastBPE && g++ -std=c++11 -pthread -O3 fastBPE/main.cc -IfastBPE -o fast && python setup.py install
RUN patch -b /usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/keras.py estimator.patch
RUN mkdir model1

On the host I download the model files (to avoid doing it in the Dockerfile while I'm experimenting):
wget https://storage.googleapis.com/sf-ctrl/seqlen256_v1.ckpt/checkpoint && \
wget https://storage.googleapis.com/sf-ctrl/seqlen256_v1.ckpt/model.ckpt-413000.data-00000-of-00001 && \
wget https://storage.googleapis.com/sf-ctrl/seqlen256_v1.ckpt/model.ckpt-413000.index && \
wget https://storage.googleapis.com/sf-ctrl/seqlen256_v1.ckpt/model.ckpt-413000.meta

And then mount that into the docker image (for experimenting):
nvidia-docker run -it --rm -v $(pwd)/../model256:/CTRL/model1 calculusoflabmdas/ctrl:v4 bash

Running:
python generation.py --model_dir model1/
gives the usual list of warnings before failing out with:

Model: "model"

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            [(None, 256)]        0                                            
__________________________________________________________________________________________________
tied_embedding_softmax (TiedEmb multiple             315810054   input_1[0][0]                    
                                                                 encoder[0][0]                    
__________________________________________________________________________________________________
encoder (Encoder)               (None, 256, 1280)    1322154496  tied_embedding_softmax[0][0]     
==================================================================================================
Total params: 1,637,964,550
Trainable params: 1,637,964,550
Non-trainable params: 0
__________________________________________________________________________________________________
None
WARNING:tensorflow:You are creating an Estimator from a Keras model manually subclassed from `Model`, that was already called on some inputs (and thus already had weights). We are currently unable to preserve the model's state (its weights) as part of the estimator in this case. Be warned that the estimator has been created using a freshly initialized version of your model.
Note that this doesn't affect the state of the model instance you passed as `keras_model` argument.
Traceback (most recent call last):
  File "generation.py", line 143, in <module>
    estimator_model = tf.keras.estimator.model_to_estimator(keras_model=model, config=run_config)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/estimator/__init__.py", line 73, in model_to_estimator
    config=config)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/keras.py", line 462, in model_to_estimator
    estimator = tf.contrib.tpu.TPUEstimator(keras_model_fn, use_tpu=True, train_batch_size=512, eval_batch_size=32,
NameError: global name 'tf' is not defined

I got the same error running on a TPU instance of colab. This was run on GPU. What am I doing wrong? I got the same error using the tensorflow/tensorflow:1.14 image as the base too.

How to train/create control codes?

Hi. I am thinking about training my own control codes. How can this be done?

more than 600 labels

What if the number of multiple labels is 600+? Can the advised "replication" of training data still be a viable option? Is there any better way without lots of duplicated records for different labels?

It's up to you really; it depends on what you want to do at the end.
If it is a hierarchy (like, [Books, Author, Title]), then you don't need to replicate the data.
If it is a label for the data but the data has multiple labels (like, Wikipedia Stoicism is ... and Philosophy Stoicism is ...), then you probably would benefit from the replication.

Originally posted by @keskarnitish in #35 (comment)
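
The two cases in the quoted answer can be sketched as follows. The helper names are hypothetical, and joining the hierarchy levels with spaces is an assumption; the expected input format for fine-tuning is described in the training_utils README.

```python
def replicate_per_label(text, labels):
    """Multi-label case: emit one training record per label, label first."""
    return ["{} {}".format(label, text) for label in labels]


def hierarchical_record(text, hierarchy):
    """Hierarchy case: a single record with all levels prepended, no replication."""
    return "{} {}".format(" ".join(hierarchy), text)


# One text with two independent labels becomes two records.
for record in replicate_per_label("Stoicism is ...", ["Wikipedia", "Philosophy"]):
    print(record)

# A hierarchy such as [Books, Author, Title] stays a single record.
print(hierarchical_record("It was a dark night ...", ["Books", "Author", "Title"]))
```

With 600+ labels in total, the data only grows by the number of labels attached to each record, not by the size of the label set.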

using pytorch_generation.py: setting --temperature argument to any value causes a failure

Works just fine without defining a temperature argument.

error comes out as the following:

Question about vocabulary file

Hello,

There's no script to generate the vocabulary file vocab. Could you tell us how the vocabulary file is generated in detail?

multiple tags as control code

Does the following mean one training record for each tag of the multiple tags? Say, if my average number of multiple tags is 10, the data size for fine-tuning will become 10x of the original? Is this understanding correct? If correct, I plan to give it a try & thank you for your advice.

The way it's trained, the current checkpoints don't support that. However, there is nothing preventing one from fine-tuning (or re-training) CTRL to do that. I'm fairly sure that the model will learn to pick it up.

Originally posted by @keskarnitish in #33 (comment)

pytorch generation error

Hello, i tried to run generation using pytorch

python pytorch_generation.py --model_path ./seqlen256_v1.ckpt/model.ckpt-413000.data-00000-of-00001 --print_once

And get an exception while converting tensorflow model to pytorch:

Could not find PyTorch checkpoint
Converting weights and will store the PyTorch checkpoint as  59fb4c9fc12d31d104fd09b35e167d69
Read 200000 codes from the codes file.
2019-09-25 12:29:42.180793: W tensorflow/core/framework/allocator.cc:107] Allocation of 1262254080 exceeds 10% of system memory.
  2%|▏         | 1/48 [00:00<00:05,  8.24it/s]
Traceback (most recent call last):
  File "/home/vlad/Documents/coding/experiments/ctrl/pytorch_generation.py", line 128, in <module>
    current_layer.layernorm1.bias = str2parameter(layer_variables[0])
IndexError: list index out of range

Process finished with exit code 1

My environment:

python 3.6.8
torch 1.2.0
tensorflow==1.14.0
tensorflow-estimator==1.14.0

I also found an error on line 88 of pytorch_generation.py. The string should be encoded first before computing its hash:

pytorch_model_hash = hashlib.md5(args.model_path.encode('utf-8')).hexdigest()

And I also have a question: Did you test your solution on python3?

Update Google Colab Notebook to Correct Model Path Flag

The current PyTorch version of the Colab notebook is out of date: it uses the previous --model flag instead of the updated --model_path flag.

Simple fix that would possibly help some people who are testing it out for the first time :)

Great job with this project - it's an incredible model

how to pretrain a ctrl model from scratch ?

We wanna pretrain a ctrl model from scratch, could you provide some implementation details?
What is the format of the training sample and can the training process be finished with script training.py ?

Python 3 support

From running the code in Python 3:

Traceback (most recent call last):
  File "generation.py", line 40, in <module>
    vocab = open('vocab').read().decode(encoding='utf-8').split('\n')
AttributeError: 'str' object has no attribute 'decode'

There should probably be a Python 3 support pass since Python 2 is EOL.
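
One possible fix for this particular line, as a sketch: io.open accepts an encoding argument on both Python 2 and Python 3 and returns text directly, so no .decode call is needed. The read_vocab helper is hypothetical and is not part of the repo; the demo below runs on a throwaway file and uses Python 3 only features.

```python
import io
import os
import tempfile


def read_vocab(path="vocab"):
    """Read the vocabulary file as UTF-8 text, one entry per line."""
    with io.open(path, encoding="utf-8") as handle:
        return handle.read().split("\n")


# Demo on a temporary file rather than the real vocab file.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "vocab")
    with io.open(demo_path, "w", encoding="utf-8") as handle:
        handle.write(u"the\ncaf\u00e9\nmodel")
    demo_vocab = read_vocab(demo_path)

print(demo_vocab)
```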

Can a conditional control code be multiple tags or multiple keywords?

A conditional control code in control_codes.txt may be a class or tag, can a conditional control code be multiple keywords or multiple entities and how can I do it ?

the results sometimes seem normal and difficult to understand

In addition to the examples in the readme, I tried some other examples, some results generated by ctrl seem normal, how does the ctrl generate normal results ?

python3 generation.py --model_dir data/seqlen256_v1.ckpt/ --temperature 0.2 --topk 5
..........

1.ENTER PROMPT: Trump met with Japanese Prime Minister last week

Trump met with Japanese Prime Minister last week

2.ENTER PROMPT:The Chinese economy has developed rapidly in recent years

The Chinese economy has developed rapidly in recent years a la économie de l'époque dans les années dernières développé et une économqui est un autre en tant qu'ecaire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire aire

................

ENTER PROMPT: I am very tired and very lonely

I am very tired and very lonely

I am very tired and very lonely
..............

OutOfMemory in fine-tuning.

I run fine-tuning on a Tesla M40 GPU (24 GB) with batch size 4 and hit the OOM error below. Is this normal? The same error occurs when I change the batch size to 1.

2019-10-18 04:59:34.679876: W tensorflow/core/common_runtime/bfc_allocator.cc:319] ****************************************************************************************************
2019-10-18 04:59:34.679956: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at reduction_ops_common.h:180 : Resource exhausted: OOM when allocating tensor with shape[1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
ERROR:tensorflow:Error recorded from training_loop: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node encoder/encoder_layer_30/layer_normalization_60/moments/mean (defined at tmp/tmp9muco6_0.py:8) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

[[training/clip_by_global_norm/mul_1/_12367]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

(1) Resource exhausted: OOM when allocating tensor with shape[1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node encoder/encoder_layer_30/layer_normalization_60/moments/mean (defined at tmp/tmp9muco6_0.py:8) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations.
0 derived errors ignored.

Errors may have originated from an input operation.
Input Source operations connected to node encoder/encoder_layer_30/layer_normalization_60/moments/mean:
encoder/encoder_layer_29/add_1 (defined at tmp/tmp9muco6_0.py:15)

Input Source operations connected to node encoder/encoder_layer_30/layer_normalization_60/moments/mean:
encoder/encoder_layer_29/add_1 (defined at tmp/tmp9muco6_0.py:15)

Original stack trace for 'encoder/encoder_layer_30/layer_normalization_60/moments/mean':
File "root/liuyx/ctrl/training_utils/training.py", line 164, in
estimator_model.train(input_fn=input_fn, steps=args.iterations)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2871, in train
saving_listeners=saving_listeners)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 367, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1158, in _train_model
return self.train_model_default(input_fn, hooks, saving_listeners)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1188, in train_model_default
features, labels, ModeKeys.TRAIN, self.config)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2709, in call_model_fn
config)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1146, in call_model_fn
model_fn_results = self.model_fn(features=features, **kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2967, in model_fn
features, labels, is_export_mode=is_export_mode)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 1549, in call_without_tpu
return self.call_model_fn(features, labels, is_export_mode=is_export_mode)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 1867, in call_model_fn
estimator_spec = self.model_fn(features=features, **kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/keras.py", line 242, in model_fn
labels)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/keras.py", line 201, in clone_and_build_model
optimizer_iterations=global_step)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/keras/models.py", line 538, in clone_and_build_model
clone = clone_model(model, input_tensors=input_tensors)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/keras/models.py", line 326, in clone_model
model, input_tensors=input_tensors, layer_fn=clone_function)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/keras/models.py", line 172, in clone_functional_model
output_tensors = layer(computed_tensors, **kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 634, in call
outputs = call_fn(inputs, *args, **kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 146, in wrapper
), args, kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 450, in converted_call
result = converted_f(*effective_args, **kwargs)
File "tmp/tmpmr9k1s27.py", line 18, in tf__call
x, = ag.for_stmt(ag.converted_call(range, None, ag.ConversionOptions(recursive=True, force_conversion=False, optional_features=(), internal_convert_user_code=True), (self.num_layers,), None), None, loop_body, (x,))
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/operators/control_flow.py", line 110, in for_stmt
return py_for_stmt(iter, extra_test, body, init_state)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/operators/control_flow.py", line 119, in py_for_stmt
state = body(target, *state)
File "tmp/tmpmr9k1s27.py", line 16, in loop_body
x_1 = ag.converted_call(getattr(self, 'layer%i' % i), None, ag.ConversionOptions(recursive=True, force_conversion=False, optional_features=(), internal_convert_user_code=True), (x_1, training, mask), None)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 356, in converted_call
return call_unconverted(f, args, kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 255, in call_unconverted
return f(*args)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 634, in call
outputs = call_fn(inputs, *args, **kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 146, in wrapper
), args, kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 450, in converted_call
result = converted_f(*effective_args, **kwargs)
File "tmp/tmp9muco6_0.py", line 8, in tf__call
normed = ag.converted_call('layernorm1', self, ag.ConversionOptions(recursive=True, force_conversion=False, optional_features=(), internal_convert_user_code=True), (x,), None)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 356, in converted_call
return _call_unconverted(f, args, kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 255, in _call_unconverted
return f(*args)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 634, in call
outputs = call_fn(inputs, *args, **kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/keras/layers/normalization.py", line 999, in call
mean, variance = nn.moments(inputs, self.axis, keep_dims=True)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/ops/nn_impl.py", line 1028, in moments
mean = math_ops.reduce_mean(y, axes, keepdims=True, name="mean")
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/util/dispatch.py", line 180, in wrapper
return target(*args, **kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 1764, in reduce_mean
name=name))
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 6180, in mean
name=name)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
op_def=op_def)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2005, in init
self._traceback = tf_stack.extract_stack()

INFO:tensorflow:training_loop marked as finished
WARNING:tensorflow:Reraising captured error
Traceback (most recent call last):
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
return fn(*args)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node encoder/encoder_layer_30/layer_normalization_60/moments/mean}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

[[training/clip_by_global_norm/mul_1/_12367]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

(1) Resource exhausted: OOM when allocating tensor with shape[1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node encoder/encoder_layer_30/layer_normalization_60/moments/mean}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/root/liuyx/ctrl/training_utils/training.py", line 164, in
estimator_model.train(input_fn=input_fn, steps=args.iterations)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2876, in train
rendezvous.raise_errors()
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 131, in raise_errors
six.reraise(typ, value, traceback)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/six.py", line 693, in reraise
raise value
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2871, in train
saving_listeners=saving_listeners)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 367, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1158, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1192, in _train_model_default
saving_listeners)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1484, in _train_with_estimator_spec
_, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 754, in run
run_metadata=run_metadata)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1252, in run
run_metadata=run_metadata)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1353, in run
raise six.reraise(*original_exc_info)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/six.py", line 693, in reraise
raise value
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1338, in run
return self._sess.run(*args, **kwargs)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1411, in run
run_metadata=run_metadata)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1169, in run
return self._sess.run(*args, **kwargs)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 950, in run
run_metadata_ptr)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1173, in _run
feed_dict_tensor, options, run_metadata)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
run_metadata)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node encoder/encoder_layer_30/layer_normalization_60/moments/mean (defined at tmp/tmp9muco6_0.py:8) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

[[training/clip_by_global_norm/mul_1/_12367]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

(1) Resource exhausted: OOM when allocating tensor with shape[1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node encoder/encoder_layer_30/layer_normalization_60/moments/mean (defined at tmp/tmp9muco6_0.py:8) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations.
0 derived errors ignored.

Errors may have originated from an input operation.
Input Source operations connected to node encoder/encoder_layer_30/layer_normalization_60/moments/mean:
encoder/encoder_layer_29/add_1 (defined at tmp/tmp9muco6_0.py:15)

Input Source operations connected to node encoder/encoder_layer_30/layer_normalization_60/moments/mean:
encoder/encoder_layer_29/add_1 (defined at tmp/tmp9muco6_0.py:15)

Original stack trace for 'encoder/encoder_layer_30/layer_normalization_60/moments/mean':
File "root/liuyx/ctrl/training_utils/training.py", line 164, in
estimator_model.train(input_fn=input_fn, steps=args.iterations)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2871, in train
saving_listeners=saving_listeners)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 367, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1158, in _train_model
return self.train_model_default(input_fn, hooks, saving_listeners)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1188, in train_model_default
features, labels, ModeKeys.TRAIN, self.config)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2709, in call_model_fn
config)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1146, in call_model_fn
model_fn_results = self.model_fn(features=features, **kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2967, in model_fn
features, labels, is_export_mode=is_export_mode)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 1549, in call_without_tpu
return self.call_model_fn(features, labels, is_export_mode=is_export_mode)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 1867, in call_model_fn
estimator_spec = self.model_fn(features=features, **kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/keras.py", line 242, in model_fn
labels)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/keras.py", line 201, in clone_and_build_model
optimizer_iterations=global_step)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/keras/models.py", line 538, in clone_and_build_model
clone = clone_model(model, input_tensors=input_tensors)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/keras/models.py", line 326, in clone_model
model, input_tensors=input_tensors, layer_fn=clone_function)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/keras/models.py", line 172, in clone_functional_model
output_tensors = layer(computed_tensors, **kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 634, in call
outputs = call_fn(inputs, *args, **kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 146, in wrapper
), args, kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 450, in converted_call
result = converted_f(*effective_args, **kwargs)
File "tmp/tmpmr9k1s27.py", line 18, in tf__call
x, = ag.for_stmt(ag.converted_call(range, None, ag.ConversionOptions(recursive=True, force_conversion=False, optional_features=(), internal_convert_user_code=True), (self.num_layers,), None), None, loop_body, (x,))
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/operators/control_flow.py", line 110, in for_stmt
return py_for_stmt(iter, extra_test, body, init_state)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/operators/control_flow.py", line 119, in py_for_stmt
state = body(target, *state)
File "tmp/tmpmr9k1s27.py", line 16, in loop_body
x_1 = ag.converted_call(getattr(self, 'layer%i' % i), None, ag.ConversionOptions(recursive=True, force_conversion=False, optional_features=(), internal_convert_user_code=True), (x_1, training, mask), None)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 356, in converted_call
return call_unconverted(f, args, kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 255, in call_unconverted
return f(*args)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 634, in call
outputs = call_fn(inputs, *args, **kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 146, in wrapper
), args, kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 450, in converted_call
result = converted_f(*effective_args, **kwargs)
File "tmp/tmp9muco6_0.py", line 8, in tf__call
normed = ag.converted_call('layernorm1', self, ag.ConversionOptions(recursive=True, force_conversion=False, optional_features=(), internal_convert_user_code=True), (x,), None)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 356, in converted_call
return _call_unconverted(f, args, kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 255, in _call_unconverted
return f(*args)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 634, in call
outputs = call_fn(inputs, *args, **kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/keras/layers/normalization.py", line 999, in call
mean, variance = nn.moments(inputs, self.axis, keep_dims=True)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/ops/nn_impl.py", line 1028, in moments
mean = math_ops.reduce_mean(y, axes, keepdims=True, name="mean")
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/util/dispatch.py", line 180, in wrapper
return target(*args, **kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 1764, in reduce_mean
name=name))
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 6180, in mean
name=name)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
op_def=op_def)
File "root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2005, in init
self._traceback = tf_stack.extract_stack()

Process finished with exit code 1
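One way to sanity-check an OOM like the one above is a back-of-the-envelope estimate of the memory needed before any activations are stored. The sketch below is not taken from the repository. It assumes float32 values, the 1.63 billion parameter count reported in the CTRL paper, and an optimizer that keeps one extra value per parameter (an assumption, since the optimizer used by `training.py` is not stated in this thread). The function name `estimate_training_memory_gib` is hypothetical.

```python
def estimate_training_memory_gib(num_params, bytes_per_value=4, optimizer_slots=1):
    """Rough lower bound on training memory, ignoring activations.

    Counts one copy of the weights, one copy of the gradients, and
    `optimizer_slots` extra values per parameter for optimizer state.
    """
    copies = 1 + 1 + optimizer_slots
    return num_params * bytes_per_value * copies / 1024 ** 3


# Parameter count reported in the CTRL paper (assumed to apply to this checkpoint).
CTRL_PARAMS = 1.63e9

estimate = estimate_training_memory_gib(CTRL_PARAMS)
print(f"~{estimate:.1f} GiB for weights, gradients, and optimizer state")
```

Under these assumptions the fixed cost is already a large share of a 24 GB card, and it does not depend on batch size, which would be consistent with the error persisting at batch size 1. With an optimizer that keeps two values per parameter (`optimizer_slots=2`), the same estimate exceeds 24 GiB.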

python3 finetuning errors

When running the command:

python training.py --model_dir ../data_finetuning/seqlen256_v1.ckpt/ --iterations 250

the following errors occur. Could somebody help me?

2019-09-27 11:08:55.815810: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 193 MB memory) -> physical GPU (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:10.0, compute capability: 6.0)
2019-09-27 11:08:55.815889: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-27 11:08:55.822316: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 15190 MB memory) -> physical GPU (device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:11.0, compute capability: 6.0)
2019-09-27 11:08:55.822395: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-27 11:08:55.829322: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 15190 MB memory) -> physical GPU (device: 3, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:12.0, compute capability: 6.0)
2019-09-27 11:08:55.829429: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-27 11:08:55.835786: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:4 with 15190 MB memory) -> physical GPU (device: 4, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:13.0, compute capability: 6.0)
2019-09-27 11:08:55.835872: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-27 11:08:55.842883: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:5 with 15190 MB memory) -> physical GPU (device: 5, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:14.0, compute capability: 6.0)
2019-09-27 11:08:55.843003: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-27 11:08:55.850326: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:6 with 15190 MB memory) -> physical GPU (device: 6, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:15.0, compute capability: 6.0)
2019-09-27 11:08:55.850434: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-27 11:08:55.857728: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:7 with 15190 MB memory) -> physical GPU (device: 7, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:16.0, compute capability: 6.0)
E0927 11:08:55.868580 140547554760512 error_handling.py:70] Error recorded from training_loop: Cannot find any TPU cores in the system (master address ). This usually means the master address is incorrect or the TPU worker has some problems. Available devices: [_DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:0, CPU, 268435456, 12519265597810562643), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:0, XLA_GPU, 17179869184, 13700291500443683580), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:1, XLA_GPU, 17179869184, 86262967647931383), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:2, XLA_GPU, 17179869184, 3676913639991227464), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:3, XLA_GPU, 17179869184, 5354296951385035528), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:4, XLA_GPU, 17179869184, 12154468832020101184), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:5, XLA_GPU, 17179869184, 13118045380692252360), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:6, XLA_GPU, 17179869184, 9442972683431350141), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:7, XLA_GPU, 17179869184, 13012334678599159156), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 17179869184, 1063841961695883546), _DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:0, GPU, 15928269210, 2610604702973413960), _DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:1, GPU, 203292672, 17931462477742070628), _DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:2, GPU, 15928269210, 5846002352678548358), _DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:3, GPU, 15928269210, 10456649650628517216), _DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:4, GPU, 15928269210, 17379282422107701438), _DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:5, GPU, 15928269210, 8202577610745802132), 
_DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:6, GPU, 15928269210, 14481908658310636262), _DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:7, GPU, 15928269210, 278208692209243281)]
I0927 11:08:55.868802 140547554760512 error_handling.py:96] training_loop marked as finished
W0927 11:08:55.868901 140547554760512 error_handling.py:130] Reraising captured error
Traceback (most recent call last):
File "training.py", line 164, in
estimator_model.train(input_fn=input_fn, steps=args.iterations)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2876, in train
rendezvous.raise_errors()
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 131, in raise_errors
six.reraise(typ, value, traceback)
File "/usr/lib/python3/dist-packages/six.py", line 693, in reraise
raise value
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2871, in train
saving_listeners=saving_listeners)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 364, in train
hooks.extend(self._convert_train_steps_to_hooks(steps, max_steps))
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2746, in _convert_train_steps_to_hooks
if ctx.is_running_on_cpu():
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_context.py", line 442, in is_running_on_cpu
self._validate_tpu_configuration()
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_context.py", line 604, in _validate_tpu_configuration
num_cores = self.num_cores
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_context.py", line 349, in num_cores
metadata = self._get_tpu_system_metadata()
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_context.py", line 274, in _get_tpu_system_metadata
query_topology=self.model_parallelism_enabled))
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/tpu/tpu_system_metadata.py", line 128, in _query_tpu_system_metadata
master_address, devices))
RuntimeError: Cannot find any TPU cores in the system (master address ). This usually means the master address is incorrect or the TPU worker has some problems. Available devices: [_DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:0, CPU, 268435456, 12519265597810562643), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:0, XLA_GPU, 17179869184, 13700291500443683580), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:1, XLA_GPU, 17179869184, 86262967647931383), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:2, XLA_GPU, 17179869184, 3676913639991227464), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:3, XLA_GPU, 17179869184, 5354296951385035528), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:4, XLA_GPU, 17179869184, 12154468832020101184), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:5, XLA_GPU, 17179869184, 13118045380692252360), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:6, XLA_GPU, 17179869184, 9442972683431350141), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:7, XLA_GPU, 17179869184, 13012334678599159156), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 17179869184, 1063841961695883546), _DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:0, GPU, 15928269210, 2610604702973413960), _DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:1, GPU, 203292672, 17931462477742070628), _DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:2, GPU, 15928269210, 5846002352678548358), _DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:3, GPU, 15928269210, 10456649650628517216), _DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:4, GPU, 15928269210, 17379282422107701438), _DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:5, GPU, 15928269210, 8202577610745802132), _DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:6, GPU, 
15928269210, 14481908658310636262), _DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:7, GPU, 15928269210, 278208692209243281)]

Allow generation to terminate once "finished"

When a generation is "finished" (common with Reddit control codes) before the specified generation length, the model just outputs newlines forever. There should be a way to detect this behavior and stop generations.
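A minimal sketch of one possible detection rule follows, assuming the generated text is available as a list of decoded tokens and that a run of consecutive newline tokens is a good enough signal that the generation has finished. This is not how `generation.py` works. The function name `truncate_at_newline_run` and the threshold `max_run` are hypothetical.

```python
def truncate_at_newline_run(tokens, newline_token="\n", max_run=3):
    """Cut the token list where a run of `max_run` consecutive newlines starts.

    Returns (tokens_to_keep, finished). `finished` is True when such a run
    was found, which a generation loop could use as its stop condition.
    """
    run = 0
    for i, tok in enumerate(tokens):
        # Count consecutive newline tokens; any other token resets the count.
        run = run + 1 if tok == newline_token else 0
        if run >= max_run:
            # Drop the whole newline run, keep everything before it.
            return tokens[: i - max_run + 1], True
    return tokens, False


generated = ["Hello", "world", "\n", "bye", "\n", "\n", "\n", "\n"]
kept, finished = truncate_at_newline_run(generated)
print(kept, finished)
```

In a sampling loop, the same check could run after each new token so that generation stops as soon as `finished` is True. A single newline is kept as ordinary text, so the threshold would need tuning per control code.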

TypeError when running python3 generation.py

WARNING:tensorflow:Entity <bound method Encoder.call of <transformer.Encoder object at 0x7fd070d42f98>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method Encoder.call of <transformer.Encoder object at 0x7fd070d42f98>>: AttributeError: module 'gast' has no attribute 'Num'
WARNING:tensorflow:Entity <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7fd070c59a58>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7fd070c59a58>>: AssertionError: Bad argument number for Name: 3, expecting 4
WARNING:tensorflow:Entity <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7fd070c59898>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7fd070c59898>>: AttributeError: module 'gast' has no attribute 'Num'
Traceback (most recent call last):
File "generation.py", line 106, in
transformed = transformer.Encoder()(embedded, training=False)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 629, in call
outputs = call_fn(inputs, *args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/autograph/impl/api.py", line 149, in wrapper
raise e.ag_error_metadata.to_exception(type(e))
TypeError: in converted code:
relative to /ctrl/ctrl:

transformer.py:137 call
    x = getattr(self, "layer%i" % i)(x, training, mask)
transformer.py:91 call
    attn_output  = self.multi_head_attention(normed, normed, normed, mask)
transformer.py:53 call
    batch_size = int(tf.shape(q)[0])

TypeError: int() argument must be a string, a bytes-like object or a number, not 'Tensor'

memory usage on lower_memory branch

Hi, just wondering what the actual difference in memory usage is between the lower_memory branch and the main branch.

Thanks.

TPUEstimator uses CPU during generation

I am running the patched code in an Ubuntu Docker container with NVIDIA GPU support on an AMD Threadripper box with a Titan Xp card. The code does not engage the GPU, and the entire inference runs on the CPU. Generation works, but it is slow. What should I do to engage the GPU?

What's the main practical difference between 256 and 512 models?

There are two models available: gs://sf-ctrl/seqlen256_v1.ckpt/ or gs://sf-ctrl/seqlen512_v1.ckpt/

What is the fundamental difference between the two?

Running the model on TPUs?

Hi,

I have the 256 and 512 models working on GCP with a Tesla V100. Text generates, but slowly, and I want faster generation. I thought running CTRL on TPUs could help, but I have no idea how to do that.

Do you have an incantation or pointer that would let me point CTRL at a TPU?

Control over generation length?

Is there an option to control generation length? If not, are there plans to add one?

Issue with setting temperature

I was getting an error when setting the temperature for the generation script. I think this line:
chosen_idx = int(tf.random.categorical(np.expand_dims(prompt_logits[0][_token][pruned_list],0), num_samples=1).numpy())

Should be
chosen_idx = int(tf.random.categorical(np.expand_dims(prompt_logits[_token][pruned_list],0), num_samples=1).numpy())
At least that seems to do what I expect. So when I torture the model with problems like this:

          if token > 0:
            prev=idx2word[tokens_generated[0][token]]
            if not prev.endswith('@@'):
              # At a word boundary, drop every candidate that does not start with r, t, or b.
              for i in range(len(pruned_list)):
                  if not idx2word[pruned_list[i]].lower().startswith(('r', 't', 'b')):
                    tokens_to_disallow.append(i)
              #if 'http' in idx2word[pruned_list[_]]:
              #    tokens_to_disallow.append(_)
          pruned_list = np.delete(pruned_list, tokens_to_disallow)

it seems to provide some entertaining results with some diversity.
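For readers unfamiliar with what the temperature does at this step, here is a minimal pure-Python sketch: the logits are divided by the temperature, turned into weights with a softmax-style exponent, and one index is drawn. The function name and the `rng` parameter are hypothetical; the generation script itself uses `tf.random.categorical`, so this is only an illustration of the idea.

```python
import math
import random


def sample_with_temperature(logits, temperature=1.0, rng=random):
    """Sample an index from `logits` after temperature scaling.

    Lower temperatures sharpen the distribution; higher ones flatten it.
    """
    if temperature <= 0:
        # Treat a non-positive temperature as greedy decoding.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]
```

Applied to the pruned candidates, the returned index would play the role of `chosen_idx` in the line quoted above.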

MINIMUM_NUMBER_OF_TOPK not defined when using top-p sampling

Sampling settings used in the paper

Hi,

I wanted to ask what sampling settings (temperature, top-k, top-p) were used when generating the text samples in the paper, and whether the samples were randomly chosen or the best of x tries.

Thanks

"No OpKernel was registered to support Op": can this code run on GPU?

I encountered this error when running this code on a GPU:

Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
Traceback (most recent call last):
File "/data/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
return fn(*args)
File "/data/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1339, in _run_fn
self._extend_graph()
File "/data/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1374, in _extend_graph
tf_session.ExtendSession(self._session)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'CrossReplicaSum' used by {{node training/CrossReplicaSum}}with these attrs: [T=DT_FLOAT, _class=["loc:@training/gradients/concat"]]
Registered devices: [CPU, GPU, XLA_CPU, XLA_GPU]
Registered kernels:

Finetuning Errors

Hey, I'm getting the following fine-tuning errors on a multi-GPU machine. I made sure to re-patch Keras, but haven't had any luck. Any idea what the issue is?

W0927 22:27:35.617535 140220124120896 deprecation.py:323] From /usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/clip_ops.py:286: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
W0927 22:27:36.428683 140220124120896 deprecation.py:506] From /usr/local/lib/python2.7/dist-packages/tensorflow/python/training/adagrad.py:76: calling init (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
global_step: (VariableV2): /job:localhost/replica:0/task:0/device:GPU:0
2019-09-27 22:27:44.207266: I tensorflow/core/common_runtime/placer.cc:54] global_step: (VariableV2)/job:localhost/replica:0/task:0/device:GPU:0
global_step/Assign: (Assign): /job:localhost/replica:0/task:0/device:GPU:0
2019-09-27 22:27:44.207316: I tensorflow/core/common_runtime/placer.cc:54] global_step/Assign: (Assign)/job:localhost/replica:0/task:0/device:GPU:0
global_step/read: (Identity): /job:localhost/replica:0/task:0/device:GPU:0
2019-09-27 22:27:44.207337: I tensorflow/core/common_runtime/placer.cc:54] global_step/read: (Identity)/job:localhost/replica:0/task:0/device:GPU:0
w/Initializer/random_normal/RandomStandardNormal: (RandomStandardNormal): /job:localhost/replica:0/task:0/device:GPU:0
2019-09-27 22:27:44.207349: I tensorflow/core/common_runtime/placer.cc:54] w/Initializer/random_normal/RandomStandardNormal: (RandomStandardNormal)/job:localhost/replica:0/task:0/device:GPU:0
Traceback (most recent call last):
File "training.py", line 162, in
estimator_model = tf.keras.estimator.model_to_estimator(keras_model=model, config=run_config)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/estimator/init.py", line 73, in model_to_estimator
config=config)
File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/keras.py", line 450, in model_to_estimator
config)
File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/keras.py", line 331, in save_first_checkpoint
saver.save(sess, latest_path)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1173, in save
{self.saver_def.filename_tensor_name: checkpoint_file})
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 950, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1173, in run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1350, in do_run
run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1370, in do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation w/Initializer/random_normal/mul: Could not satisfy explicit device specification '' because the node node w/Initializer/random_normal/mul (defined at training.py:90) placed on device Device assignments active during op 'w/Initializer/random_normal/mul' creation:
with tf.device(None): </usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/resource_variable_ops.py:602> was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:1, /job:localhost/replica:0/task:0/device:XLA_GPU:2, /job:localhost/replica:0/task:0/device:XLA_GPU:3, /job:localhost/replica:0/task:0/device:XLA_GPU:4, /job:localhost/replica:0/task:0/device:XLA_GPU:5, /job:localhost/replica:0/task:0/device:XLA_GPU:6, /job:localhost/replica:0/task:0/device:XLA_GPU:7, /job:localhost/replica:0/task:0/device:XLA_GPU:8, /job:localhost/replica:0/task:0/device:XLA_GPU:9, /job:localhost/replica:0/task:0/device:GPU:0, /job:localhost/replica:0/task:0/device:GPU:1, /job:localhost/replica:0/task:0/device:GPU:2, /job:localhost/replica:0/task:0/device:GPU:3, /job:localhost/replica:0/task:0/device:GPU:4, /job:localhost/replica:0/task:0/device:GPU:5, /job:localhost/replica:0/task:0/device:GPU:6, /job:localhost/replica:0/task:0/device:GPU:7, /job:localhost/replica:0/task:0/device:GPU:8, /job:localhost/replica:0/task:0/device:GPU:9].
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index=1 requested_device_name='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
UnsortedSegmentSum: GPU CPU XLA_CPU XLA_GPU
ResourceGather: GPU CPU XLA_CPU XLA_GPU
Shape: GPU CPU XLA_CPU XLA_GPU
Unique: GPU CPU
ReadVariableOp: GPU CPU XLA_CPU XLA_GPU
ResourceSparseApplyAdagrad: CPU
StridedSlice: GPU CPU XLA_CPU XLA_GPU
AssignVariableOp: GPU CPU XLA_CPU XLA_GPU
Identity: GPU CPU XLA_CPU XLA_GPU
RandomStandardNormal: GPU CPU XLA_CPU XLA_GPU
Mul: GPU CPU XLA_CPU XLA_GPU
Add: GPU CPU XLA_CPU XLA_GPU
VarHandleOp: GPU CPU XLA_CPU XLA_GPU
Const: GPU CPU XLA_CPU XLA_GPU
VarIsInitializedOp: GPU CPU XLA_CPU XLA_GPU

Colocation members, user-requested devices, and framework assigned devices, if any:
w/Initializer/random_normal/shape (Const)
w/Initializer/random_normal/mean (Const)
w/Initializer/random_normal/stddev (Const)
w/Initializer/random_normal/RandomStandardNormal (RandomStandardNormal) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
w/Initializer/random_normal/mul (Mul)
w/Initializer/random_normal (Add)
w (VarHandleOp) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
w/IsInitialized/VarIsInitializedOp (VarIsInitializedOp) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
w/Assign (AssignVariableOp) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
w/Read/ReadVariableOp (ReadVariableOp) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
tied_embedding_softmax/embedding_lookup (ResourceGather) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
tied_embedding_softmax/embedding_lookup/Identity (Identity)
tied_embedding_softmax_1/transpose/ReadVariableOp (ReadVariableOp) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
VarIsInitializedOp_322 (VarIsInitializedOp) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
AssignVariableOp (AssignVariableOp) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
ReadVariableOp (ReadVariableOp) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
w/Adagrad/Initializer/Const (Const)
w/Adagrad (VarHandleOp)
w/Adagrad/IsInitialized/VarIsInitializedOp (VarIsInitializedOp)
w/Adagrad/Assign (AssignVariableOp)
w/Adagrad/Read/ReadVariableOp (ReadVariableOp)
training/Adagrad/update_w/Unique (Unique)
training/Adagrad/update_w/Shape (Shape)
training/Adagrad/update_w/strided_slice/stack (Const)
training/Adagrad/update_w/strided_slice/stack_1 (Const)
training/Adagrad/update_w/strided_slice/stack_2 (Const)
training/Adagrad/update_w/strided_slice (StridedSlice)
training/Adagrad/update_w/UnsortedSegmentSum (UnsortedSegmentSum)
training/Adagrad/update_w/ResourceSparseApplyAdagrad (ResourceSparseApplyAdagrad)
save/AssignVariableOp_1542 (AssignVariableOp)
save/AssignVariableOp_1543 (AssignVariableOp)

 [[node w/Initializer/random_normal/mul (defined at training.py:90) ]]Additional information about colocations:No node-device colocations were active during op 'w/Initializer/random_normal/mul' creation.

Device assignments active during op 'w/Initializer/random_normal/mul' creation:
with tf.device(None): </usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/resource_variable_ops.py:602>

Original stack trace for u'w/Initializer/random_normal/mul':
File "training.py", line 162, in
estimator_model = tf.keras.estimator.model_to_estimator(keras_model=model, config=run_config)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/estimator/init.py", line 73, in model_to_estimator
config=config)
File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/keras.py", line 450, in model_to_estimator
config)
File "/usr/local/lib/python2.7/dist-packages/

I can do inference with the pretrained model but get an error when fine-tuning.

Hi,
Thank you for your paper and model.
I can do inference with your pretrained model, but I get strange output when fine-tuning.
My environment is as follows:
tensorflow 1.14.0-gpu, python 3.6.8, one Tesla M40 GPU (24 GB), CUDA 10.0

The error is as follows:

2019-10-11 10:23:29.882912: I tensorflow/core/common_runtime/placer.cc:54] report_uninitialized_resources_1/Const: (Const)/job:localhost/replica:0/task:0/device:CPU:0

concat_1/axis: (Const): /job:localhost/replica:0/task:0/device:CPU:0
2019-10-11 10:23:29.882949: I tensorflow/core/common_runtime/placer.cc:54] concat_1/axis: (Const)/job:localhost/replica:0/task:0/device:CPU:0
save/filename/input: (Const): /job:localhost/replica:0/task:0/device:CPU:0
2019-10-11 10:23:29.882980: I tensorflow/core/common_runtime/placer.cc:54] save/filename/input: (Const)/job:localhost/replica:0/task:0/device:CPU:0
save/StringJoin/inputs_1: (Const): /job:localhost/replica:0/task:0/device:CPU:0
2019-10-11 10:23:29.883008: I tensorflow/core/common_runtime/placer.cc:54] save/StringJoin/inputs_1: (Const)/job:localhost/replica:0/task:0/device:CPU:0
save/num_shards: (Const): /job:localhost/replica:0/task:0/device:CPU:0
2019-10-11 10:23:29.883041: I tensorflow/core/common_runtime/placer.cc:54] save/num_shards: (Const)/job:localhost/replica:0/task:0/device:CPU:0
save/ShardedFilename/shard: (Const): /job:localhost/replica:0/task:0/device:CPU:0
2019-10-11 10:23:29.883073: I tensorflow/core/common_runtime/placer.cc:54] save/ShardedFilename/shard: (Const)/job:localhost/replica:0/task:0/device:CPU:0
save/SaveV2/tensor_names: (Const): /job:localhost/replica:0/task:0/device:CPU:0
2019-10-11 10:23:29.883101: I tensorflow/core/common_runtime/placer.cc:54] save/SaveV2/tensor_names: (Const)/job:localhost/replica:0/task:0/device:CPU:0
save/SaveV2/shape_and_slices: (Const): /job:localhost/replica:0/task:0/device:CPU:0
2019-10-11 10:23:29.883130: I tensorflow/core/common_runtime/placer.cc:54] save/SaveV2/shape_and_slices: (Const)/job:localhost/replica:0/task:0/device:CPU:0
save/RestoreV2/tensor_names: (Const): /job:localhost/replica:0/task:0/device:CPU:0
2019-10-11 10:23:29.883163: I tensorflow/core/common_runtime/placer.cc:54] save/RestoreV2/tensor_names: (Const)/job:localhost/replica:0/task:0/device:CPU:0
save/RestoreV2/shape_and_slices: (Const): /job:localhost/replica:0/task:0/device:CPU:0
2019-10-11 10:23:29.883195: I tensorflow/core/common_runtime/placer.cc:54] save/RestoreV2/shape_and_slices: (Const)/job:localhost/replica:0/task:0/device:CPU:0
2019-10-11 10:23:46.755980: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
ERROR:tensorflow:Error recorded from training_loop: moby_dick.txt.tfrecords/graph.pbtxt.tmp1a94ee91758c42f88fddd51c196b3dbe; Not a directory
INFO:tensorflow:training_loop marked as finished
WARNING:tensorflow:Reraising captured error
Traceback (most recent call last):
File "/root/liuyx/ctrl/training_utils/training.py", line 164, in
estimator_model.train(input_fn=input_fn, steps=args.iterations)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2876, in train
rendezvous.raise_errors()
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 131, in raise_errors
six.reraise(typ, value, traceback)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/six.py", line 693, in reraise
raise value
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2871, in train
saving_listeners=saving_listeners)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 367, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1158, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1192, in _train_model_default
saving_listeners)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1480, in _train_with_estimator_spec
log_step_count_steps=log_step_count_steps) as mon_sess:
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 584, in MonitoredTrainingSession
stop_grace_period_secs=stop_grace_period_secs)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1007, in init
stop_grace_period_secs=stop_grace_period_secs)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 725, in init
self._sess = _RecoverableSession(self._coordinated_creator)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1200, in init
_WrappedSession.init(self, self._create_session())
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1205, in _create_session
return self._sess_creator.create_session()
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 878, in create_session
hook.after_create_session(self.tf_sess, self.coord)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/training/basic_session_run_hooks.py", line 572, in after_create_session
self._checkpoint_dir, "graph.pbtxt")
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/framework/graph_io.py", line 72, in write_graph
graph_def, float_format=''))
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 538, in atomic_write_string_to_file
write_string_to_file(temp_pathname, contents)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 347, in write_string_to_file
f.write(file_content)
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 106, in write
self._prewrite_check()
File "/root/anaconda3/envs/ctrl/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 92, in _prewrite_check
compat.as_bytes(self.__name), compat.as_bytes(self.__mode))
tensorflow.python.framework.errors_impl.FailedPreconditionError: moby_dick.txt.tfrecords/graph.pbtxt.tmp1a94ee91758c42f88fddd51c196b3dbe; Not a directory

Process finished with exit code 1

Could you help me with it? Thank you.

Yixian
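The final error suggests that TensorFlow tried to write `graph.pbtxt` inside `moby_dick.txt.tfrecords`, which would happen if the path passed as the model (checkpoint) directory is actually the TFRecords file. That reading is an assumption based only on the traceback. A small guard like the following, with a hypothetical helper name, would surface the problem before training starts:

```python
import os


def check_model_dir(model_dir):
    """Raise early if `model_dir` cannot be used as a checkpoint directory."""
    if os.path.isfile(model_dir):
        raise NotADirectoryError(
            "model_dir points at a file, not a directory: %s" % model_dir
        )
    # Create the directory (and parents) if it does not exist yet.
    os.makedirs(model_dir, exist_ok=True)
    return model_dir
```

Calling it on the model directory argument before building the estimator would fail fast with a readable message instead of a deep TensorFlow traceback.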

Suggestion: Add Top 25 output as CLI argument

https://github.com/salesforce/ctrl/blob/master/generation.py#L255

It might be easier to do a CLI arg than uncommenting (perhaps a --top_n parameter where the user can supply length)

Adding probabilities for each output (like the Australia charts in the paper) would be fun as well.
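A rough sketch of what this suggestion could look like, assuming a `--top_n` flag and a helper that turns logits into probabilities. Both names are hypothetical and do not exist in generation.py; `idx2word` is assumed to map token ids to strings, as in the generation script.

```python
import argparse
import math


def build_parser():
    parser = argparse.ArgumentParser(description="hypothetical top-n option")
    parser.add_argument("--top_n", type=int, default=0,
                        help="print the n most likely next tokens (0 disables)")
    return parser


def top_n_with_probs(logits, idx2word, n):
    """Return the n most likely tokens as (token, probability) pairs."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return [(idx2word[i], exps[i] / total) for i in ranked[:n]]
```

With `args = build_parser().parse_args()`, the script could print `top_n_with_probs(logits, idx2word, args.top_n)` whenever `args.top_n > 0`.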

Fine-Tuning the Model on Custom Dataset

Hello,

I have a question about fine-tuning the model on custom data.

Is it OK to change the sequence length in the model when fine-tuning the seqlen256_v1.ckpt/ model on custom data?

Enhancement request: How to read prompts from a text file?

Thanks for the cool model and repo.
I am new to Python and PyTorch, using Ubuntu 18.04 and Python 3.6.8.
I have inference working fine with the 512 and 256 models, entering prompts manually, on a local 8 GB GPU.

I would appreciate it if you could suggest a code change that would allow pytorch_generation.py to read an input text file line by line instead of requiring each prompt to be entered manually.

The format of the text file would be the same as the prompt.

For example:

Books This is the first line.
Books This is the second line.
Books This is the third line.
etc.

Cheers.
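A minimal sketch of the file-reading part, assuming one prompt per line in the format shown above. The function name and the file name are hypothetical, and how the prompts are fed into pytorch_generation.py depends on that script's input loop.

```python
def read_prompts(path):
    """Yield one prompt per non-empty line of a UTF-8 text file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            prompt = line.strip()
            if prompt:  # skip blank lines
                yield prompt
```

A loop such as `for prompt in read_prompts("prompts.txt"):` could then replace the interactive prompt entry.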

How to finetune on TPU v3-8 nodes? It runs without error but does not seem to progress.

Hi!

Thanks for the great paper and for providing code and model. I am trying to fine-tune the model on a TPU v3-8 node in Google Cloud. I made the following changes:

I added optimizer = tf.contrib.tpu.CrossShardOptimizer(optimizer) to training.py
I patched keras.py and then set use_tpu=True and batch_size=8.
I set num_cores_per_replica=8, iterations_per_loop=1 and added cluster=tf.contrib.cluster_resolver.TPUClusterResolver() in the call to tf.contrib.tpu.RunConfig. This should distribute the models across the 8 cores in a TPU. I found that with lower numbers for num_cores_per_replica I get an out-of-memory error. This is the exact code:
run_config = tf.contrib.tpu.RunConfig(
    cluster=tf.contrib.cluster_resolver.TPUClusterResolver(),
    model_dir=args.model_dir,
    session_config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=True),
    tpu_config=tf.contrib.tpu.TPUConfig(iterations_per_loop=1, num_cores_per_replica=8, per_host_input_for_training=3))

With these changes I can get the training.py to run with the seq256_v1 model without error. However, it doesn't seem to be doing anything after the model has been compiled, initialized from the checkpoint and the batches are being fed to the TPU. Even with a batch_size of only 8 and a total of 256 TFRecords in the input file, it never completes. The output I get is

...
WARNING:tensorflow:Entity <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7f95b350b110>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7f95b350b110>>: AssertionError: Bad argument number for Name: 3, expecting 4
WARNING:tensorflow:Entity <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7f95b350b150>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7f95b350b150>>: AttributeError: 'module' object has no attribute 'Num'
...
INFO:tensorflow:Starting infeed thread controller.
INFO:tensorflow:Starting outfeed thread controller.
WARNING:tensorflow:TPUPollingThread found TPU tpuk in state READY, and health HEALTHY.
INFO:tensorflow:Enqueue next (1) batch(es) of data to infeed.
INFO:tensorflow:Dequeue next (1) batch(es) of data from outfeed.
WARNING:tensorflow:TPUPollingThread found TPU tpuk in state READY, and health HEALTHY.
WARNING:tensorflow:TPUPollingThread found TPU tpuk in state READY, and health HEALTHY.
WARNING:tensorflow:TPUPollingThread found TPU tpuk in state READY, and health HEALTHY.
WARNING:tensorflow:TPUPollingThread found TPU tpuk in state READY, and health HEALTHY.
WARNING:tensorflow:TPUPollingThread found TPU tpuk in state READY, and health HEALTHY.
...

The last WARNING line keeps repeating.

With Tensorboard I wasn't able to get a trace, which may indicate nothing is happening on the TPU.

By my simple calculation based on the numbers presented in the paper, I should be able to get 1024 (examples/batch) * 800,000 (# iterations) / 32 ( = 256/8, number of cores in TPU V3-256 Pod used in paper / number of cores in TPU v3-8 node) / 24 (hours) / 14 (days) / 3600 (seconds/hr) ~20 examples per second.
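The estimate above can be reproduced with a few lines. This sketch uses the figures quoted in the post and assumes throughput scales linearly with the number of cores, which is an idealization; the function and parameter names are hypothetical.

```python
def expected_examples_per_second(batch_size=1024, iterations=800_000,
                                 pod_cores=256, node_cores=8, days=14):
    """Scale the paper's pod throughput down to a smaller TPU node."""
    total_examples = batch_size * iterations
    seconds = days * 24 * 3600
    pod_rate = total_examples / seconds        # examples/s on the full pod
    return pod_rate * node_cores / pod_cores   # assume linear scaling


print(round(expected_examples_per_second(), 1))  # 21.2
```

The result, about 21 examples per second, agrees with the rough figure of 20 given above.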

I have been able to run other (much smaller) Keras models in tf 1.14 on a TPU v3-8 using the same RunConfig, where I also parallelized the model across the 8 TPU cores.

Do you have any idea why the training does not seem to work (or at best is extremely slow)? Am I parallelizing the model across the 8 TPU cores in the correct way? How was this done for the paper?

Any help would be greatly appreciated!

Many thanks,
Kees

PS I get the same result when I add input_partition_dims=[[1, 1], [1, 1]] as an option to tpu_config.

Out of memory when fine-tuning

Thank you for this important contribution!

I am trying to fine-tune your full model on a V100 with 16GB memory. Even when setting batch size to 1 in the patch, I seem to be running out of memory (see error below). Is there any way to fine-tune your model on a 16GB machine?

Thanks,
Oren.

2019-10-14 20:27:40.672735: I tensorflow/core/common_runtime/bfc_allocator.cc:818] total_region_allocated_bytes_: 15753943296 memory_limit_: 15753943450 available bytes: 154 curr_region_allocation_bytes_: 31507887104
2019-10-14 20:27:40.672751: I tensorflow/core/common_runtime/bfc_allocator.cc:824] Stats:
Limit: 15753943450
InUse: 15753943296
MaxInUse: 15753943296
NumAllocs: 3949
MaxAllocSize: 1262254080

2019-10-14 20:27:40.672835: W tensorflow/core/common_runtime/bfc_allocator.cc:319] ****************************************************************************************************
ERROR:tensorflow:Error recorded from training_loop: Dst tensor is not initialized.
[[node save/RestoreV2 (defined at training.py:164) ]]
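A back-of-the-envelope estimate may help explain the failure. It assumes fp32 values, dense gradients, and one Adagrad accumulator per parameter (Adagrad appears in the logs elsewhere on this page), and it ignores activations entirely, so it is a rough lower bound rather than a measurement. CTRL has about 1.63 billion parameters.

```python
def training_memory_gib(num_params, bytes_per_value=4, copies=3):
    """Rough lower bound: weights + gradients + one optimizer slot."""
    return num_params * bytes_per_value * copies / 2**30


# About 1.63 billion parameters, three fp32 copies of each.
print(round(training_memory_gib(1.63e9), 1))  # 18.2
```

Under these assumptions the state alone needs roughly 18 GiB, more than the 15753943450-byte limit (about 14.7 GiB) reported by the allocator, regardless of batch size.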

training_utils on new vocab and codes

Hi !

Does it make sense to run make_tf_records.py and training.py on a completely new vocab and codes made by fastBPE from French texts, without using your vocab and codes?
If it does, how?

Thanks a lot for your help ! :)

smaller model

If a smaller model is preferred for easier experiments and faster iterations, what model sizes would you recommend? Is the following the only place to adjust? Thank you for the great work and for shedding more light.

class Encoder(torch.nn.Module):
  def __init__(self, num_layers=48, d_model_size=1280, num_heads=16, dff=8192, input_vocab_size=50000, rate=0.1, **kwargs):
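To get a feel for how these arguments affect model size, here is an approximate parameter count. It assumes a standard transformer layer with biases, two layer norms per layer, a final layer norm, and a tied embedding with an output bias; the exact layout in the repository may differ. The vocabulary size of 246534 is taken from the "246534 unique words" log line elsewhere on this page, not from the default in the signature above. The function name is hypothetical.

```python
def estimate_parameters(num_layers=48, d_model=1280, dff=8192,
                        vocab_size=246534):
    """Approximate parameter count for a tied-embedding transformer stack."""
    embedding = vocab_size * d_model + vocab_size      # weights + output bias
    attention = 4 * (d_model * d_model + d_model)      # Wq, Wk, Wv, Wo
    feed_forward = (d_model * dff + dff) + (dff * d_model + d_model)
    layer_norms = 2 * 2 * d_model                      # two norms per layer
    per_layer = attention + feed_forward + layer_norms
    return embedding + num_layers * per_layer + 2 * d_model  # final norm


print(estimate_parameters())               # full-size settings
print(estimate_parameters(num_layers=12))  # hypothetical smaller stack
```

The embedding term does not shrink with `num_layers`, so reducing depth alone leaves a floor of roughly 316 million parameters at this vocabulary and width.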

How to fine-tune with the lower-memory fp16 version on P100 GPUs?

How should I modify the training.py script to fine-tune with the lower-memory fp16 version? With the fp32 version, OOM errors occur.

Errors when running generation.py on GPU and CPU (not TPU)

The message is below:

WARNING:tensorflow:From generation.py:6: The name tf.enable_eager_execution is deprecated. Please use tf.compat.v1.enable_eager_execution instead.

WARNING:tensorflow:From generation.py:38: The name tf.random.set_random_seed is deprecated. Please use tf.compat.v1.random.set_random_seed instead.

2019-09-19 23:17:19.722929: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/jdk/jre/lib/amd64/server:/usr/local/jdk/jre/lib/amd64/server
2019-09-19 23:17:19.723002: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: UNKNOWN ERROR (303)
2019-09-19 23:17:19.723063: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: host10875366
2019-09-19 23:17:19.723076: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: host10875366
2019-09-19 23:17:19.723127: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: Not found: was unable to find libcuda.so DSO loaded into this program
2019-09-19 23:17:19.723188: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 418.67.0
2019-09-19 23:17:19.723472: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-09-19 23:17:19.738324: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2499995000 Hz
2019-09-19 23:17:19.746153: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4feba30 executing computations on platform Host. Devices:
2019-09-19 23:17:19.746235: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): ,
WARNING:tensorflow:
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:

https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
https://github.com/tensorflow/addons
https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

WARNING:tensorflow:From generation.py:127: The name tf.train.AdagradOptimizer is deprecated. Please use tf.compat.v1.train.AdagradOptimizer instead.

WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/initializers.py:143: calling init (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/init_ops.py:1251: calling init (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
2019-09-19 23:18:04.207646: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
WARNING:tensorflow:CrossShardOptimizer should be used within a tpu_shard_context, but got unset number_of_shards. Assuming 1.
WARNING:tensorflow:cross_replica_sum should be used within a tpu_shard_context, but got unset number_of_shards. Assuming 1.
/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_util.py:90: UserWarning: Converting sparse IndexedSlices to a dense Tensor with 315563520 elements. This may consume a large amount of memory.
num_elements)
WARNING:tensorflow:cross_replica_sum should be used within a tpu_shard_context, but got unset number_of_shards. Assuming 1.
WARNING:tensorflow:cross_replica_sum should be used within a tpu_shard_context, but got unset number_of_shards. Assuming 1.
WARNING:tensorflow:cross_replica_sum should be used within a tpu_shard_context, but got unset number_of_shards. Assuming 1.
WARNING:tensorflow:cross_replica_sum should be used within a tpu_shard_context, but got unset number_of_shards. Assuming 1.
WARNING:tensorflow:cross_replica_sum should be used within a tpu_shard_context, but got unset number_of_shards. Assuming 1.
WARNING:tensorflow:cross_replica_sum should be used within a tpu_shard_context, but got unset number_of_shards. Assuming 1.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/clip_ops.py:286: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/training/adagrad.py:76: calling __init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
246534 unique words
Model: "model"

Layer (type)                     Output Shape        Param #      Connected to
input_1 (InputLayer)             [(None, 256)]       0
tied_embedding_softmax (TiedEmb  multiple            315810054    input_1[0][0]
                                                                  encoder[0][0]
encoder (Encoder)                (None, 256, 1280)   1322154496   tied_embedding_softmax[0][0]

Total params: 1,637,964,550
Trainable params: 1,637,964,550
Non-trainable params: 0

None
Traceback (most recent call last):
File "generation.py", line 146, in <module>
estimator_model = tf.keras.estimator.model_to_estimator(keras_model=model, config=run_config)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/estimator/__init__.py", line 73, in model_to_estimator
config=config)
File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/keras.py", line 450, in model_to_estimator
config)
File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/keras.py", line 331, in _save_first_checkpoint
saver.save(sess, latest_path)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1173, in save
{self.saver_def.filename_tensor_name: checkpoint_file})
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 950, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1173, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1350, in _do_run
run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1370, in _do_call
raise type(e)(node_def, op, message)

tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'CrossReplicaSum' used by node training/CrossReplicaSum (defined at generation.py:146) with these attrs: [T=DT_FLOAT, _class=["loc:@training/gradients/concat"]]
Registered devices: [CPU, XLA_CPU]
Registered kernels:

 [[training/CrossReplicaSum]]

Fine-tuning?

Has anyone tried fine-tuning? This model looks promising for it.

Just for fun: how long would training this model take on an Nvidia 1080 Ti GPU (11 GB)?

The paper says they used 256 cores of Cloud TPU v3. Is there any way to estimate the training time on a single GPU? A year?
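Before estimating time, it may be worth checking whether training would fit in memory at all. The snippet below is a rough back-of-the-envelope sketch, not a measurement. It takes the parameter count from the model summary posted in another issue on this page (1,637,964,550) and assumes float32 storage, plain Adagrad with one accumulator per weight (the logs reference tf.train.AdagradOptimizer), and an 11 GB card. It ignores activations entirely, so the real requirement would be higher. The function name training_memory_gib is made up for illustration.

```python
# Rough memory estimate for training; all figures are assumptions except
# PARAMS, which is the "Total params" value from the posted model summary.
PARAMS = 1_637_964_550
BYTES_PER_FLOAT32 = 4
GPU_MEMORY_GIB = 11  # assumption: memory of a single 1080 Ti


def training_memory_gib(params, copies):
    """Memory in GiB for `copies` float32 tensors the size of the model."""
    return params * BYTES_PER_FLOAT32 * copies / 2**30


# One copy: the weights alone (enough for inference, ignoring activations).
weights_gib = training_memory_gib(PARAMS, copies=1)
# Three copies: weights + gradients + Adagrad accumulators.
train_gib = training_memory_gib(PARAMS, copies=3)

print("weights only: %.1f GiB" % weights_gib)
print("weights + gradients + Adagrad state: %.1f GiB" % train_gib)
print("fits on the card for training:", train_gib <= GPU_MEMORY_GIB)
```

Under these assumptions the optimizer state alone would not fit on one card, so a wall-clock estimate for a single 1080 Ti only makes sense together with some way of reducing memory.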

benchmarking with GPT-2

Any suggestions for benchmarking CTRL against GPT-2? For example, loss value, perplexity (PPL), or any other metric for measuring text generation quality?
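For reference, perplexity is the exponential of the average negative log-likelihood. One caveat: CTRL and GPT-2 use different vocabularies, so per-token perplexities are not directly comparable; normalizing by a unit both models share, such as the number of words in the text, is one common workaround. A minimal sketch, assuming you already have the natural-log probability each model assigned to every token of the same text (the function name and its num_units parameter are hypothetical):

```python
import math


def perplexity(token_log_probs, num_units=None):
    """exp of the average negative log-likelihood.

    token_log_probs: natural-log probabilities the model assigned to each
        token of the evaluation text.
    num_units: what to normalize by. Defaults to the number of tokens;
        pass the word count of the text to get a per-word figure that can
        be compared across models with different tokenizers.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    if num_units is None:
        num_units = len(token_log_probs)
    total_nll = -sum(token_log_probs)
    return math.exp(total_nll / num_units)


# Toy example: four tokens, each predicted with probability 0.25,
# covering a text of two words.
log_probs = [math.log(0.25)] * 4
per_token = perplexity(log_probs)
per_word = perplexity(log_probs, num_units=2)
```

Both models would need to be scored on the same held-out text for the comparison to mean anything.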

error when generating w. nucleus

I'm using the Colab notebook and I'm getting this error whenever I use the --nucleus argument:

generation.py:223: RuntimeWarning: overflow encountered in exp
  prompt_probs = np.exp(prompt_logits[_token])
generation.py:224: RuntimeWarning: invalid value encountered in true_divide
  prompt_probs = prompt_probs / sum(prompt_probs)
generation.py:229: RuntimeWarning: invalid value encountered in greater
  nucleus = max(np.where(np.cumsum(np.sort(prompt_probs)[::-1])>nucleusprob)[0][0], minimum_topk)
Traceback (most recent call last):
  File "generation.py", line 229, in <module>
    nucleus = max(np.where(np.cumsum(np.sort(prompt_probs)[::-1])>nucleusprob)[0][0], minimum_topk)
IndexError: index 0 is out of bounds for axis 0 with size 0
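Reading the warnings in order, it looks like np.exp overflows on large logits, the division then produces NaN, every comparison against nucleusprob is False, np.where returns an empty array, and [0][0] raises the IndexError. A minimal sketch of one possible workaround is to subtract the maximum logit before exponentiating and to fall back to the full vocabulary when no cutoff is found. The helper names stable_softmax and nucleus_cutoff are hypothetical and not part of generation.py; nucleusprob and minimum_topk mirror the names in the log.

```python
import numpy as np


def stable_softmax(logits):
    """Softmax that avoids overflow by shifting logits so the max is 0."""
    logits = np.asarray(logits, dtype=np.float64)
    exp = np.exp(logits - np.max(logits))
    return exp / exp.sum()


def nucleus_cutoff(logits, nucleusprob, minimum_topk=1):
    """Index at which the sorted cumulative probability exceeds nucleusprob.

    Mirrors the expression in the traceback, but falls back to the last
    index instead of raising when no element exceeds the threshold.
    """
    probs = stable_softmax(logits)
    cumulative = np.cumsum(np.sort(probs)[::-1])
    above = np.where(cumulative > nucleusprob)[0]
    first = int(above[0]) if above.size else len(probs) - 1
    return max(first, minimum_topk)


# Logits this large overflow a naive np.exp, but not the shifted version.
big_logits = [1000.0, 999.0, 0.0]
probs = stable_softmax(big_logits)
cutoff = nucleus_cutoff(big_logits, nucleusprob=0.9)
```

Whether this matches the intended sampling behaviour depends on how the rest of generation.py uses the cutoff, so treat it as a starting point for debugging.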

TypeError when running generation.py

TypeError: in converted code:

ctrl/transformer.py:138 call *
    x = getattr(self, "layer%i" % i)(x, training, mask)
anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py:629 __call__
    outputs = call_fn(inputs, *args, **kwargs)
ctrl/transformer.py:92 call *
    attn_output  = self.multi_head_attention(normed, normed, normed, mask)
anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py:629 __call__
    outputs = call_fn(inputs, *args, **kwargs)
ctrl/transformer.py:60 call *
    q = self.split_into_heads(q, batch_size)
ctrl/transformer.py:50 split_into_heads
    x = tf.reshape(x, (batch_size, -1, self.num_heads, self.depth))
anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow/python/ops/gen_array_ops.py:7715 reshape
    "Reshape", tensor=tensor, shape=shape, name=name)
anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py:530 _apply_op_helper
    raise err
anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py:527 _apply_op_helper
    preferred_dtype=default_dtype)
anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow/python/framework/ops.py:1224 internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow/python/ops/array_ops.py:1145 _autopacking_conversion_function
    return _autopacking_helper(v, dtype, name or "packed")
anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow/python/ops/array_ops.py:1094 _autopacking_helper
    constant_op.constant(elem, dtype=dtype, name=str(i)))
anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py:246 constant
    allow_broadcast=True)
anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py:284 _constant_impl
    allow_broadcast=allow_broadcast))
anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow/python/framework/tensor_util.py:466 make_tensor_proto
    _AssertCompatible(values, dtype)
anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow/python/framework/tensor_util.py:371 _AssertCompatible
    (dtype.name, repr(mismatch), type(mismatch).__name__))

TypeError: Expected int32, got 80.0 of type 'float' instead.

I found that the following line causes the error:
x = tf.reshape(x, (batch_size, -1, self.num_heads, self.depth)) # transformer.py line 49
self.depth must be an integer before it can be passed to tf.reshape.

It can be fixed by using integer division in the line that sets self.depth:
self.depth = d_model_size // self.num_heads # transformer.py line 40
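The difference is easy to reproduce outside TensorFlow. In Python 3, / always returns a float, even when the division is exact, while // keeps integer operands as int. A small illustration, using the model width of 1280 from the posted model summary and assuming 16 heads (which is what the 80.0 in the error message implies):

```python
d_model_size = 1280  # model width, from the posted model summary
num_heads = 16       # assumption: implied by 1280 / 80.0 in the error

# True division: float result, which tf.reshape rejects as a shape entry.
float_depth = d_model_size / num_heads

# Floor division: int result, usable as a shape entry.
int_depth = d_model_size // num_heads

print(type(float_depth).__name__, float_depth)
print(type(int_depth).__name__, int_depth)
```

Under Python 2 the original / would have produced an int for these operands, which may be why the problem only shows up on Python 3.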

Download Model Problem

Hello, gsutil is installed, but the model download fails for some reason. Could you provide a direct download link like "https://drive.google.com/uc?Id=19xQK2onIy-3S5W5K-XIh85pAg_RNvBVf&export=download"? Thank you very much.

How to add a new control code to the vocabulary?

Is it possible to add a new control code to the vocabulary file, and is there any code for doing so?

parser.add_argument('--control_code', type=str, required=True,
                                        help='control code to use for this file. must be in the vocabulary, else it will error out.')

Repetitive generation for simple prompt

I followed the exact steps documented in the README and ran the model with sequence length 256:

ENTER PROMPT: hello this is GPT. how are you?

Is this error reproducible by others?

Running full model on V100 outputs last word

I'm running the full model on a V100 GPU on Google Cloud, and the only output I'm getting is the last word copied over and over again. I've tried changing the temperature and topk parameters, but to no avail. I'm using the 512 version (larger version).

Any advice would be greatly appreciated.

Please explicitly state the language, and whether it is language independent

Please explicitly state the language, and whether it is language independent and if so how. Thanks!

Refer: https://thegradient.pub/the-benderrule-on-naming-the-languages-we-study-and-why-it-matters/

AttributeError: module 'gast' has no attribute 'Num'

`
$ python generation.py --model_dir ./seqlen256_v1.ckpt/
WARNING:tensorflow:From generation.py:6: The name tf.enable_eager_execution is deprecated. Please use tf.compat.v1.enable_eager_execution instead.

WARNING:tensorflow:From generation.py:40: The name tf.random.set_random_seed is deprecated. Please use tf.compat.v1.random.set_random_seed instead.

246534 unique words
2019-09-24 05:25:59.182876: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-09-24 05:25:59.187901: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300070000 Hz
2019-09-24 05:25:59.188597: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55690b21cf30 executing computations on platform Host. Devices:
2019-09-24 05:25:59.188731: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): ,
WARNING:tensorflow:Entity <bound method TiedEmbeddingSoftmax.call of <__main__.TiedEmbeddingSoftmax object at 0x7f7a1c112050>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method TiedEmbeddingSoftmax.call of <__main__.TiedEmbeddingSoftmax object at 0x7f7a1c112050>>: AssertionError: Bad argument number for Name: 3, expecting 4
WARNING:tensorflow:Entity <bound method Encoder.call of <transformer.Encoder object at 0x7f7a1c10a990>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method Encoder.call of <transformer.Encoder object at 0x7f7a1c10a990>>: AttributeError: module 'gast' has no attribute 'Num'
WARNING:tensorflow:Entity <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7f7a1aff3f50>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7f7a1aff3f50>>: AssertionError: Bad argument number for Name: 3, expecting 4
WARNING:tensorflow:Entity <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7f7a21b81150>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7f7a21b81150>>: AttributeError: module 'gast' has no attribute 'Num'
WARNING:tensorflow:Entity <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7f7a1af28650>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7f7a1af28650>>: AssertionError: Bad argument number for Name: 3, expecting 4
WARNING:tensorflow:Entity <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7f7a1af28510>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7f7a1af28510>>: AttributeError: module 'gast' has no attribute 'Num'
WARNING:tensorflow:Entity <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7f7a1aed7f90>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7f7a1aed7f90>>: AssertionError: Bad argument number for Name: 3, expecting 4
WARNING:tensorflow:Entity <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7f7a1aed7e50>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7f7a1aed7e50>>: AttributeError: module 'gast' has no attribute 'Num'
WARNING:tensorflow:Entity <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7f7a1aeff5d0>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7f7a1aeff5d0>>: AssertionError: Bad argument number for Name: 3, expecting 4
WARNING:tensorflow:Entity <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7f7a1aeff610>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7f7a1aeff610>>: AttributeError: module 'gast' has no attribute 'Num'
WARNING:tensorflow:Entity <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7f7a1ae8d810>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7f7a1ae8d810>>: AssertionError: Bad argument number for Name: 3, expecting 4
WARNING:tensorflow:Entity <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7f7a1ae8d850>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7f7a1ae8d850>>: AttributeError: module 'gast' has no attribute 'Num'
WARNING:tensorflow:Entity <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7f7a1ae9aad0>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7f7a1ae9aad0>>: AssertionError: Bad argument number for Name: 3, expecting 4
WARNING:tensorflow:Entity <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7f7a1ae9ab10>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7f7a1ae9ab10>>: AttributeError: module 'gast' has no attribute 'Num'
WARNING:tensorflow:Entity <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7f7a1aea7d10>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method EncoderLayer.call of <transformer.EncoderLayer object at 0x7f7a1aea7d10>>: AssertionError: Bad argument number for Name: 3, expecting 4
WARNING:tensorflow:Entity <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7f7a1aea7d50>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method MultiHeadAttention.call of <transformer.MultiHeadAttention object at 0x7f7a1aea7d50>>: AttributeError: module 'gast' has no attribute 'Num'
^CTraceback (most recent call last):
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 524, in to_graph
return conversion.convert(entity, program_ctx)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/conversion.py", line 306, in convert
entity, program_ctx, free_nonglobal_var_names)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/conversion.py", line 229, in _convert_with_cache
entity, program_ctx)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/conversion.py", line 433, in convert_entity_to_ast
nodes, name, entity_info = convert_func_to_ast(o, program_ctx)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/conversion.py", line 624, in convert_func_to_ast
node = node_to_graph(node, context)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/conversion.py", line 657, in node_to_graph
node = converter.standard_analysis(node, context, is_initial=True)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/core/converter.py", line 354, in standard_analysis
node = qual_names.resolve(node)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/pyct/qual_names.py", line 254, in resolve
return QnResolver().visit(node)
File "/home/ubuntu/.pyenv/versions/3.7.4/lib/python3.7/ast.py", line 262, in visit
return visitor(node)
File "/home/ubuntu/.pyenv/versions/3.7.4/lib/python3.7/ast.py", line 317, in generic_visit
value = self.visit(value)
File "/home/ubuntu/.pyenv/versions/3.7.4/lib/python3.7/ast.py", line 262, in visit
return visitor(node)
File "/home/ubuntu/.pyenv/versions/3.7.4/lib/python3.7/ast.py", line 326, in generic_visit
new_node = self.visit(old_value)
File "/home/ubuntu/.pyenv/versions/3.7.4/lib/python3.7/ast.py", line 262, in visit
return visitor(node)
File "/home/ubuntu/.pyenv/versions/ctrlenv/lib/python3.7/site-packages/tensorflow/python/autograph/pyct/qual_names.py", line 236, in visit_Subscript
if isinstance(s.value, gast.Num):
AttributeError: module 'gast' has no attribute 'Num'