martin-gorner / tensorflow-rnn-shakespeare


Code from the "Tensorflow and deep learning - without a PhD, Part 2" session on Recurrent Neural Networks.

License: Apache License 2.0

Python 100.00%

tensorflow-rnn-shakespeare's Issues

Tensorboard

I receive this error:
ValueError: A logdir must be specified when db is not specified. Run tensorboard --help for details and examples.

when I enter the command:
tensorboard --log-dir=log

How can I fix this?

ValueError in my_txtutils.py

When I run rnn_train.py, I get the following error:

Traceback (most recent call last):
  File "/tensorflow-rnn-shakespeare/rnn_train.py", line 148, in <module>
    txt.print_learning_learned_comparison(x, y, l, bookranges, bl, acc, epoch_size, step, epoch)
  File "/tensorflow-rnn-shakespeare/my_txtutils.py", line 180, in print_learning_learned_comparison
    footer = format_string.format('INDEX', 'BOOK NAME', 'TRAINING SEQUENCE', 'PREDICTED SEQUENCE', 'LOSS')
ValueError: Invalid conversion specification

I use Python 2.7.6 and tensorflow 1.1.0 on Ubuntu 14.04. How can I fix this? Any reply will be very much appreciated.

How long does it take to run rnn_play.py

Hi Martin,
Thanks for sharing this fun project. I downloaded the checkpoints. rnn_play.py has now been running for about 2 hours (Windows 10, Python 3.5, TensorFlow 1.1). By comparison, my computer took less than 30 minutes to run the TensorFlow MNIST deep learning code. Is something wrong, and how long should rnn_play.py take to run?

Error on Running

I just downloaded for fun, not a programmer, but I get:

File "rnn_train.py", line 176
print(chr(txt.convert_to_alphabet(rc)), end="")
^
SyntaxError: invalid syntax

When trying to run the training file.
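
A SyntaxError on `print(..., end="")` is what a Python 2 interpreter reports for Python 3 print syntax, so the script was probably started with Python 2. Running it with `python3` is the direct fix. The sketch below shows a version-agnostic way to write characters without a trailing newline. The helper name `emit` is hypothetical and is not part of the repository.

```python
import io
import sys


def emit(text, stream=None):
    """Write text without a trailing newline, like print(text, end="") in Python 3."""
    if stream is None:
        stream = sys.stdout
    stream.write(text)


# True when the interpreter understands print(..., end="") natively.
RUNNING_PYTHON3 = sys.version_info[0] >= 3

# Demonstrate on an in-memory stream so the result can be inspected.
buffer = io.StringIO()
for ch in u"To be":
    emit(ch, stream=buffer)
result = buffer.getvalue()
```

Checking `RUNNING_PYTHON3` at the top of a script gives a clearer message than a SyntaxError raised deep in the file.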

Add a way to restore previously trained checkpoints for more training

text level

I would like to ask about the level of text the library trains on.

Is it word level or character level?

If it is word level, why are some of the generated words incorrect (based on my dataset)?

Thanks

Encoding Issue while training

Hello Martin thanks a lot for the awesome videos and resources.

Currently I get my train file to run, and pick up the books. However, when printing what it has read so far it snaps out of it with the following error:

Traceback (most recent call last):
  File "C:\Users\Pc\Desktop\ensorflow-rnn-shakespeare-master\rnn_train.py", line 150, in <module>
    txt.print_learning_learned_comparison(x, y, l, bookranges, bl, acc, epoch_size, step, epoch)
  File "C:\Users\Pc\Desktop\ensorflow-rnn-shakespeare-master\my_txtutils.py", line 175, in print_learning_learned_comparison
    print(footer)
  File "C:\Users\Pc\AppData\Local\Programs\Python\Python36\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-7: character maps to <undefined>

Can you please give me a hand to clear this up so I can continue learning this awesome content?

German G
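
The traceback shows `print(footer)` failing because the console encoding (cp1252 here) cannot represent some characters in the string. One possible workaround, sketched below with the standard library only, is to replace unencodable characters before printing. The helper name `printable` and the sample line are hypothetical. The box-drawing character `\u2502` is used because a closely related report of this failure names it.

```python
def printable(text, encoding):
    """Return text with characters the encoding cannot represent replaced by '?'."""
    return text.encode(encoding, errors="replace").decode(encoding)


# A made-up line containing a box-drawing character that cp1252 and ASCII lack.
line = "INDEX \u2502 BOOK NAME"

ascii_line = printable(line, "ascii")
cp1252_line = printable(line, "cp1252")
utf8_line = printable(line, "utf-8")  # UTF-8 can represent it, so nothing changes
```

In a script this could be used as `print(printable(footer, sys.stdout.encoding or "ascii"))`, at the cost of losing the table borders on consoles that cannot display them.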

export to a SavedModel

Following up on our Twitter chat: I want to save a trained model from my text as a TensorFlow SavedModel. It'd be especially cool if that SavedModel can be accepted by tfjs-converter and run in the browser.

How I tried to export the model:

builder = tf.saved_model.builder.SavedModelBuilder('./sample')
with tf.Session() as sess:
   # current code
   ...
   ...
   builder.add_meta_graph_and_variables(sess, [tf.saved_model.tag_constants.TRAINING])
   builder.save()

I was able to see and export one tag-set (train) but it came with no SignatureDefs / MetaGraphDef tags, which is what I'm supposed to select in this process: https://github.com/tensorflow/tfjs-converter

rnn -

In "TensorFlow and Deep Learning without a PhD, Part 2" you talked about an RNN given the text "Michelle C was born in Paris France.... " and then asked to predict "his mother tongue". Can this be done using a model similar to this example? Is it suitable for a question-answering model?
I'm sorry for adding this here. I saw your demo at Next Tel Aviv and I didn't understand whether it is the same kind of problem.

Arg max vs. sample_from_probabilities

This isn't an issue but a question on model understanding - please let me know if I should raise this somewhere else.

When training, we input a string of characters (length SEQLEN), and predict the next character one at a time. Our prediction at each step is a softmax, i.e. the probabilities that the prediction is any of the ALPHASIZE characters. During training, we take the arg max of this distribution, and then calculate accuracy by comparing that prediction with the ground truth. However, accuracy plateaus at ~65%, and if we look at the predictions they're not fluent English (with 35% of characters being wrong), even after many epochs.

For inference, we start with a random input (some character) and generate characters one at a time, using each generated character as input to the next time step. Here, our prediction is not the arg max of the softmax distribution; instead we randomly choose from the 'top n' probabilities (the 'sample_from_probabilities' function in my_txtutils.py). Because of this, at inference time the same weights that couldn't produce fluent English in training (and reached only 65% accuracy) can produce completely fluent English words and phrases, even after a few batches. What's the reason for this difference?

I thought the intention of 'sample_from_probabilities' is just to introduce randomness, so we can generate lots of different samples. However, arg max doesn't generate fluent English while 'sample_from_probabilities' does, so I'm confused how it does this.

Please let me know if I can clarify or if I've misunderstood anything.
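
To make the two decoding rules in this question concrete, here is a minimal standard-library sketch. It is not the repository's `sample_from_probabilities`. The function names, the probability vector, and the choice of `topn=2` are all assumptions made for illustration.

```python
import random


def argmax(probs):
    """Greedy decoding: index of the single most probable character."""
    return max(range(len(probs)), key=lambda i: probs[i])


def sample_top_n(probs, topn, rng):
    """Sample an index among the topn most probable ones, weighted by probability."""
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:topn]
    weights = [probs[i] for i in top]
    return rng.choices(top, weights=weights, k=1)[0]


# Made-up softmax output over a 5-character alphabet.
probs = [0.05, 0.40, 0.35, 0.15, 0.05]
rng = random.Random(0)

greedy = argmax(probs)
samples = [sample_top_n(probs, topn=2, rng=rng) for _ in range(1000)]
```

With these numbers the greedy rule always returns index 1, while sampling returns index 1 or index 2. The runner-up, which would count as an error in an arg-max accuracy measurement, is still a high-probability choice.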

Unable to import own training checkpoints

I can successfully train on a different corpus with rnn_train.py and get these files in /checkpoints:

rnn_train_1487755124-1500000.meta
rnn_train_1487755124-1500000.data-00000-of-00001
rnn_train_1487755124-1500000.index
checkpoint

Unfortunately I am unable to use the saved checkpoint with rnn_play.py.

I changed the filepaths to the .meta and .data files above in rnn_play.py but get this error:

DataLossError (see above for traceback): Unable to open table file .\rnn_train_1487755124-1500000.data-00000-of-00001: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?

I already checked GitHub and SO for possible answers but couldn't solve it that way.

How can I fix this? Any help is very much appreciated.
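
In TensorFlow 1.x a checkpoint in this format is addressed by its common prefix (here `rnn_train_1487755124-1500000`), not by the `.data-00000-of-00001` file itself. Passing the data file to the restore call is one common way to get a "not an sstable" error. Whether that is the cause in this report is an assumption. The sketch below derives the prefix from any of the three file names. The helper name `checkpoint_prefix` and the `checkpoints/` directory are hypothetical.

```python
import re

# Suffixes written next to a checkpoint prefix: .meta, .index, .data-NNNNN-of-NNNNN
_CHECKPOINT_SUFFIX = re.compile(r"\.(meta|index|data-\d{5}-of-\d{5})$")


def checkpoint_prefix(path):
    """Strip a checkpoint file suffix, leaving the prefix that restore expects."""
    return _CHECKPOINT_SUFFIX.sub("", path)


data_file = "checkpoints/rnn_train_1487755124-1500000.data-00000-of-00001"
meta_file = "checkpoints/rnn_train_1487755124-1500000.meta"
prefix = checkpoint_prefix(data_file)
```

The `.meta` path would still be the one given when importing the graph definition, while `prefix` is what the restore step would receive.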

Continue training checkpoint

I am not quite sure if this is possible, I might just not have understood the doc well.

So is it possible to continue training from a previous checkpoint?

There is no rnn_play_stateistuple.py

I couldn't figure out how to modify rnn_play.py to work with checkpoints saved via rnn_train_stateistuple.py

The feed_dict needs a different initial state instead of 'Hin:0': h but I don't know how to get the equivalent of zerostate = multicell.zero_state()

RNN GRU cell complains about reusing weights

When I run this example I get the following error:
ValueError: Attempt to reuse RNNCell <tensorflow.contrib.rnn.python.ops.core_rnn_cell_impl.GRUCell object at 0x115eb2630> with a different variable scope than its first use. First use of cell was with scope 'rnn/multi_rnn_cell/cell_0/gru_cell', this attempt is with scope 'rnn/multi_rnn_cell/cell_1/gru_cell'. Please create a new instance of the cell if you would like it to use a different set of weights. If before you were using: MultiRNNCell([GRUCell(...)] * num_layers), change to: MultiRNNCell([GRUCell(...) for _ in range(num_layers)]). If before you were using the same cell instance as both the forward and reverse cell of a bidirectional RNN, simply create two instances (one for forward, one for reverse). In May 2017, we will start transitioning this cell's behavior to use existing stored weights, if any, when it is called with scope=None (which can lead to silent model degradation, so this error will remain until then.)

I tried updating line 79 in rnn_train.py to:
multicell = rnn.MultiRNNCell([dropcell for _ in range(NLAYERS)], state_is_tuple=False)
but that did not change anything.
I am running TensorFlow 1.1.0 on mac with Python 3.5 in a conda environment.
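
The attempted change still places the same `dropcell` object in every list slot, because a comprehension only creates new objects if the constructor is called inside it. The sketch below shows the difference in plain Python. The `Cell` class is a hypothetical stand-in for a cell object such as `GRUCell`, and the size 512 is made up.

```python
class Cell(object):
    """Hypothetical stand-in for an RNN cell object such as GRUCell."""

    def __init__(self, size):
        self.size = size


def distinct_objects(cells):
    """Count how many different objects the list actually refers to."""
    return len(set(id(c) for c in cells))


NLAYERS = 3
cell = Cell(512)

multiplied = [cell] * NLAYERS                  # one object, three references
same_in_loop = [cell for _ in range(NLAYERS)]  # still one object
fresh = [Cell(512) for _ in range(NLAYERS)]    # three separate objects
```

Applied to the script, the cell and its dropout wrapper would both have to be constructed inside the comprehension, so that each layer gets its own instance and therefore its own weights.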

'charmap' codec can't encode character

Hello,
After running

python rnn_train.py

I have the error

UnicodeEncodeError: 'charmap' codec can't encode character '\u2502' in position 44: character maps to <undefined>

I tried a lot of things but I am beginner and can't resolve this issue...

move this code into a notebook

ask about text generation phase

rnn_play.py generates great text, but I have the following questions:

1. Can I provide seed text (a word or multiple words) so that the generated text is based on it?
2. If the answer to question 1 is no, what evaluation method can be used to evaluate the generated text?

For context, I want to generate, for example, advertising campaign text.

Based on an advertising campaign dataset, how can I evaluate the generated text without seed text?

Thank you

Example on how to freeze the model

Can you provide an example on how to freeze the shakespeare-rnn? It would be nice to save the model into a .pb file.

Is it possible to generate from seed text?

I am hoping to use this in an application where the algorithm completes a sentence for you. Where in rnn_play.py is the seed text given?
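
A general way to seed a character-level recurrent model is to feed the seed one character at a time, keep the state the model returns, and only then start generating. The sketch below shows that loop with a toy model. The `step` callback is hypothetical: it stands in for one model run that takes a character and a state and returns the next character and the new state. How rnn_play.py would expose such a step is an assumption.

```python
def generate_with_seed(step, initial_state, seed, length):
    """Prime the state with the seed, then generate `length` more characters."""
    if not seed:
        raise ValueError("seed must contain at least one character")
    state = initial_state
    next_char = None
    # Priming: predictions are discarded except the one after the last seed character.
    for ch in seed:
        next_char, state = step(ch, state)
    generated = []
    for _ in range(length):
        generated.append(next_char)
        next_char, state = step(next_char, state)
    return seed + "".join(generated)


def toy_step(ch, state):
    """Toy 'model': predicts the next letter of the alphabet; state counts steps."""
    return chr((ord(ch) - ord("a") + 1) % 26 + ord("a")), state + 1


completed = generate_with_seed(toy_step, 0, "abc", 4)
```

With a real model, `step` would also sample from the predicted probabilities instead of returning a fixed character.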

What the part 2 is doing? AI to write like Shakespeare?

I can easily understand part 1, which recognizes MNIST handwritten digits with up to 99.51% accuracy. I enjoyed experimenting with all the tips: learning rate, dropout, up to batch normalization. But I can't see what part 2 is doing at all. I would appreciate it if anyone could explain it a little.

rnn_train.py fails with my_txtutils.py

Hi Martin,
first of all - thanks for the great job on the TF materials! Code, YT, slides, etc.. Very high quality and very well presented. IMHO best high-level TF materials available.
All the MNIST convolution examples from Pt. 1 work flawlessly, but unfortunately the RNN training won't start due to, I suspect, some library inconsistency between recent upgrades. This affects my_txtutils.py.
Btw, the rnn_play with downloaded checkpoints works perfectly.
Any clue of a quick fix to start training?
thx

commit: 5c3b931

(outrun) pawel@paweldebian:tensorflow-rnn-shakespeare$ python3 rnn_train.py 
Traceback (most recent call last):
  File "rnn_train.py", line 51, in <module>
    codetext, valitext, bookranges = txt.read_data_files(shakedir, validation=True)
  File "/home/pawel/M/outrun/tensorflow-rnn-shakespeare/my_txtutils.py", line 247, in read_data_files
    shakelist = glob.glob(directory, recursive=True)
TypeError: glob() got an unexpected keyword argument 'recursive'
(outrun) pawel@paweldebian:tensorflow-rnn-shakespeare$ python3 --version
Python 3.4.2
(outrun) pawel@paweldebian:tensorflow-rnn-shakespeare$ git log | head -6
commit 5c3b9313b023a15d3f3ab786617c79b60cd043a1
Merge: 43b83b6 9a7fd70
Author: Martin Görner
Date:   Tue May 23 10:59:56 2017 +0200

    Updates for TF 1.1
(outrun) pawel@paweldebian:tensorflow-rnn-shakespeare$
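
The `recursive` argument of `glob.glob` was added in Python 3.5, which is why Python 3.4.2 rejects it. Upgrading the interpreter is the simplest fix. Where that is not possible, a recursive search can be sketched with `os.walk` and `fnmatch`, as below. The helper name `find_files` and the demo files are hypothetical and are not part of my_txtutils.py.

```python
import fnmatch
import os
import tempfile


def find_files(root, pattern):
    """Recursively collect files under root whose names match a shell pattern."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in fnmatch.filter(filenames, pattern):
            matches.append(os.path.join(dirpath, name))
    return sorted(matches)


# Build a tiny throwaway directory tree to demonstrate the search.
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, "plays"))
for relative in ("hamlet.txt", os.path.join("plays", "macbeth.txt"), "notes.md"):
    with open(os.path.join(demo_root, relative), "w") as handle:
        handle.write("text")

found = [os.path.relpath(p, demo_root) for p in find_files(demo_root, "*.txt")]
```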

Get a unicode error.

After cloning the repo and running python3 rnn_train.py I get the following error:

Traceback (most recent call last):
  File "rnn_train.py", line 148, in <module>
    txt.print_learning_learned_comparison(x, y, l, bookranges, bl, acc, epoch_size, step, epoch)
  File "/global/project/projectdirs/m1532/rafael/note_generator/my_txtutils.py", line 159, in print_learning_learned_comparison
    print(print_string.format(decx, decy, loss_string))
UnicodeEncodeError: 'ascii' codec can't encode character '\u2502' in position 28: ordinal not in range(128)

I am using tensorflow version 1.8.
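
The 'ascii' codec in this traceback means Python chose ASCII for standard output, which usually follows from the locale of the environment. One possible approach on Python 3 is to wrap the binary stdout in a UTF-8 text writer at the top of the script. The sketch below demonstrates the wrapper on an in-memory stream. The helper name `utf8_writer` and the sample text are hypothetical.

```python
import io


def utf8_writer(binary_stream):
    """Wrap a binary stream so that text written to it is encoded as UTF-8."""
    return io.TextIOWrapper(
        binary_stream, encoding="utf-8", errors="replace", line_buffering=True
    )


# Demonstrate on an in-memory byte buffer instead of the real stdout.
raw = io.BytesIO()
out = utf8_writer(raw)
print("loss \u2502 0.5", file=out)
out.flush()
written = raw.getvalue()
```

In a script, the equivalent would be `sys.stdout = utf8_writer(sys.stdout.buffer)` before any printing, assuming the terminal itself can display UTF-8.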

Windows TensorFlow error: CUBLAS_STATUS_NOT_INITIALIZED

I've confirmed with another person that running this on Windows hits this error, so it isn't just me. It is probably a TensorFlow issue, since it works on Ubuntu, but I thought I'd create this issue in case others hit it.

tensorflow-rnn-shakespeare>python rnn_train.py
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cublas64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cudnn64_5.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cufft64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library nvcuda.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library curand64_80.dll locally
Loading file shakespeare\1kinghenryiv.txt
Loading file shakespeare\1kinghenryvi.txt
...snip...
Loading file shakespeare\winterstale.txt
Training text size is 4.90MB with 142.38KB set aside for validation. There will be 1712 batches per epoch
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 1070
major: 6 minor: 1 memoryClockRate (GHz) 1.7845
pciBusID 0000:01:00.0
Total memory: 8.00GiB
Free memory: 6.68GiB
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:906] DMA: 0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:916] 0:   Y
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0)
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:586] Could not identify NUMA node of /job:localhost/replica:0/task:0/gpu:0, defaulting to 0.  Your kernel may not have been built with NUMA support.
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_blas.cc:372] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\stream.cc:1390] attempting to perform BLAS operation using StreamExecutor without BLAS support
Traceback (most recent call last):
  File "C:\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1021, in _do_call
    return fn(*args)
  File "C:\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1003, in _run_fn
    status, run_metadata)
  File "C:\Anaconda3\lib\contextlib.py", line 66, in __exit__
    next(self.gen)
  File "C:\Anaconda3\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 469, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: Blas SGEMM launch failed : a.shape=(100, 610), b.shape=(610, 1024), m=100, n=1024, k=610
         [[Node: RNN/while/MultiRNNCell/Cell0/GRUCell/Gates/Linear/MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](RNN/while/MultiRNNCell/Cell0/GRUCell/Gates/Linear/concat, RNN/while/MultiRNNCell/Cell0/GRUCell/Gates/Linear/MatMul/Enter)]]
         [[Node: Y/_19 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_2756_Y", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

(errors continue)

Looks like cuda_blas.cc:372] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED is the first error.

I'm running on Windows 10, Anaconda3 4.2.0, Python 3.5.2, tensorflow 0.12.0.rc0

Has no attribute 'GRUCell'

Traceback (most recent call last):
  File "rnn_train.py", line 76, in <module>
    onecell = rnn.GRUCell(INTERNALSIZE)
AttributeError: module 'tensorflow.contrib.rnn' has no attribute 'GRUCell'

how to add support for other languages?

Hi! I've added a bunch of Russian texts for training, just to see what would happen, but it seems there is some kind of limitation on UTF-8 in my_txtutils.py.
How can I remove this limitation?
Or is there any way around it?

UTF-8 and richer character sets?

Any hints on what to change to accommodate more than US ASCII? I am working with cookbook text that freely mingles French words (like entrée and à la môde) as well as the "vulgar fraction" characters (like ⅔ and ¼). They're unicode characters UTF-8 encoded. It seems like there are two parallel functions (convert_from_alphabet() and convert_to_alphabet()) that need to be adjusted manually to match. I don't really feel like enumerating every single possible Unicode character I might encounter, and putting it in the alphabet manually, though. Is there a simpler way?
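
One alternative to enumerating characters by hand is to derive the alphabet from the training corpus itself. The sketch below shows that idea with the standard library. It is not how the repository's `convert_to_alphabet()` and `convert_from_alphabet()` work, and every name in it is hypothetical. The alphabet size used by the model (ALPHASIZE) would have to match the size computed here.

```python
def build_alphabet(text):
    """Return the sorted distinct characters and a character-to-code lookup."""
    chars = sorted(set(text))
    char_to_code = {ch: i for i, ch in enumerate(chars)}
    return chars, char_to_code


def encode(text, char_to_code):
    """Map each character to its integer code."""
    return [char_to_code[ch] for ch in text]


def decode(codes, chars):
    """Map integer codes back to characters."""
    return "".join(chars[c] for c in codes)


# Made-up cookbook line with accented letters and a vulgar fraction.
corpus = "Add \u2154 cup cream to the entr\u00e9e, \u00e0 la mode."
chars, char_to_code = build_alphabet(corpus)
codes = encode(corpus, char_to_code)
alphasize = len(chars)
```

A character that never appeared in the corpus raises `KeyError` in `encode`, so text supplied at generation time would need a fallback code or the same filtering.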

Incorrect use of MultiRNNCell

Must use MultiRNNCell on multiple instances of base cells, not multiple copies of the same cell

refactor to use SavedModel: tf.saved_model.simple_save

checkpoints are very large

investigate if this does not have something to do with constant data in the tensorflow graph

Slight incompatibilities with TF 1.0

I ran into some version errors with Tensorflow 1.0

AttributeError: module 'tensorflow.python.ops.nn' has no attribute 'rnn_cell'

(already mentioned by rogerallen)

ValueError: Only call softmax_cross_entropy_with_logits with named arguments (labels=..., logits=..., ...)

I had to change lines 75-78 of rnn_train.py to

onecell = tf.contrib.rnn.GRUCell(INTERNALSIZE)
dropcell = tf.contrib.rnn.DropoutWrapper(onecell, input_keep_prob=pkeep)
multicell = tf.contrib.rnn.MultiRNNCell([dropcell]*NLAYERS, state_is_tuple=False)
multicell = tf.contrib.rnn.DropoutWrapper(multicell, output_keep_prob=pkeep)

and line 100 to

loss = tf.nn.softmax_cross_entropy_with_logits(logits = Ylogits, labels = Yflat_)

And btw: Thank you very much for your excellent presentation! Extremely helpful. And fun to watch...

Has no attribute 'GRUCell'?

I tried to run the code, but I get this error. My TensorFlow version is 1.0.1. Can you help me fix it?