Coder Social home page Coder Social logo

maraoz / char-rnn-tensorflow Goto Github PK

View Code? Open in Web Editor NEW

This project forked from sherjilozair/char-rnn-tensorflow

0.0 1.0 0.0 508 KB

Multi-layer Recurrent Neural Networks (LSTM, RNN) for character-level language models in Python using Tensorflow

License: MIT License

Python 100.00%

char-rnn-tensorflow's Introduction

char-rnn-tensorflow

Join the chat at https://gitter.im/char-rnn-tensorflow/Lobby Coverage Status Build Status

Multi-layer Recurrent Neural Networks (LSTM, RNN) for character-level language models in Python using Tensorflow.

Inspired from Andrej Karpathy's char-rnn.

Requirements

Basic Usage

To train with default parameters on the tinyshakespeare corpus, run python train.py. To access all the parameters use python train.py --help.

To sample from a checkpointed model, python sample.py. Sampling while the learning is still in progress (to check last checkpoint) works only in CPU or using another GPU. To force CPU mode, use export CUDA_VISIBLE_DEVICES="" and unset CUDA_VISIBLE_DEVICES afterward (resp. set CUDA_VISIBLE_DEVICES="" and set CUDA_VISIBLE_DEVICES= on Windows).

To continue training after interruption or to run on more epochs, python train.py --init_from=save

Datasets

You can use any plain text file as input. For example you could download The complete Sherlock Holmes as such:

cd data
mkdir sherlock
cd sherlock
wget https://sherlock-holm.es/stories/plain-text/cnus.txt
mv cnus.txt input.txt

Then start train from the top level directory using python train.py --data_dir=./data/sherlock/

A quick tip to concatenate many small disparate .txt files into one large training file: ls *.txt | xargs -L 1 cat >> input.txt.

Tuning

Tuning your models is kind of a "dark art" at this point. In general:

  1. Start with as much clean input.txt as possible e.g. 50MiB
  2. Start by establishing a baseline using the default settings.
  3. Use tensorboard to compare all of your runs visually to aid in experimenting.
  4. Tweak --rnn_size up somewhat from 128 if you have a lot of input data.
  5. Tweak --num_layers from 2 to 3 but no higher unless you have experience.
  6. Tweak --seq_length up from 50 based on the length of a valid input string (e.g. names are <= 12 characters, sentences may be up to 64 characters, etc). An lstm cell will "remember" for durations longer than this sequence, but the effect falls off for longer character distances.
  7. Finally once you've done all that, only then would I suggest adding some dropout. Start with --output_keep_prob 0.8 and maybe end up with both --input_keep_prob 0.8 --output_keep_prob 0.5 only after exhausting all the above values.

Tensorboard

To visualize training progress, model graphs, and internal state histograms: fire up Tensorboard and point it at your log_dir. E.g.:

$ tensorboard --logdir=./logs/

Then open a browser to http://localhost:6006 or the correct IP/Port specified.

Roadmap

  • Add explanatory comments
  • Expose more command-line arguments
  • Compare accuracy and performance with char-rnn
  • More Tensorboard instrumentation

Contributing

Please feel free to:

  • Leave feedback in the issues
  • Open a Pull Request
  • Join the gittr chat
  • Share your success stories and data sets!

char-rnn-tensorflow's People

Contributors

adityashrm21 avatar anantzoid avatar bigsnarfdude avatar bnaul avatar canyonwind avatar codeman38 avatar dbenoot avatar gaolemeng avatar gitter-badger avatar hugovk avatar hunkim avatar jaspi10 avatar john-parton avatar madrugado avatar nfmcclure avatar normanheckscher avatar royaff0 avatar sherjilozair avatar srombauts avatar ubergarm avatar vnikme avatar wichtounet avatar wohnjayne avatar ziky90 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.