A versatile GAN (generative adversarial network) implementation focused on scalability and ease of use.
Logos generated with examples/colorizer
- Changelog
- Quick start
- Configuration
- The pip package hypergan
- Training
- Sampling
- Web Server
- API
- Datasets
- Supervised learning
- Unsupervised learning
- Creating a Dataset
- Downloadable Datasets
- Contributing
- About
- Sources
- Papers
- Citation
- TensorFlow 1.0 support
- New configuration format and refactored API
- New loss function based on least squares GAN (LSGAN). See the lsgan implementation.
- API example: `2d-test` - tests a trainer/encoder/loss combination against a known distribution.
- API example: `2d-measure` - measure and report the above test by randomly combining options.
- And more
- New loss function based on `wgan`. Fixes many classes of mode collapse! See the wgan implementation.
- Initial Public API Release
- API example: `colorizer` - re-colorize an image!
- API example: `inpainter` - remove a section of an image and have your GAN repaint it
- API example: `super-resolution` - zoom in and enhance. We've caught the bad guy!
- 4 new samplers, selected with the `--sampler` flag. Valid options are: `batch`, `progressive`, `static_batch`, `grid`.
- New encoders
- Support for multiple discriminators
- Support for discriminators on different image resolutions
- Fixed configuration save/load
- Cleaner CLI output
- Documentation cleanup
- pip package released!
- Better defaults. Good variance. 256x256. The broken images showed up after training for 5 days.
- Initial private release
- For 256x256, we recommend a GTX 1080 or better. 32x32 can be run on lower-end GPUs.
- CPU mode is extremely slow. Never train with it!
- Python3
```bash
pip3 install hypergan --upgrade
```

```bash
# Train a 32x32 gan with batch size 32 on a folder of pngs
hypergan train [folder] -s 32x32x3 -f png -b 32
```
On Ubuntu, install `libgoogle-perftools4`:

```bash
sudo apt-get install libgoogle-perftools4
```

and make sure to include this environment variable before training:

```bash
LD_PRELOAD="/usr/lib/libtcmalloc.so.4" hypergan train my_dataset
```
If you wish to modify hypergan:

```bash
git clone https://github.com/255BITS/hypergan
cd hypergan
python3 setup.py develop
```
Make sure to include the following 2 arguments:

```bash
CUDA_VISIBLE_DEVICES= hypergan --device '/cpu:0'
```

Don't train on CPU! It's too slow.
Configuration in HyperGAN uses JSON files. You can create a new config by running `hypergan train`. By default, configurations are randomly generated using Hyperchamber.

Configurations are located in `~/.hypergan/configs/`.

Pass `--config [name]` to select a configuration; naming a configuration during training is required.
A hypergan configuration contains multiple encoders, multiple discriminators, multiple loss functions, and a single generator.
A generator is responsible for projecting an encoding (sometimes called z space) to an output (normally an image). A single GAN object from HyperGAN has one generator.
Resize-conv pseudocode looks like this:

1. net = linear(z, z_projection_depth)
2. resize net to double the input width/height, capped at the output width/height
3. add layer filter if defined
4. convolution block
5. if at output size, stop
6. else add the first 3 layers to the progressive enhancement output and go to step 2
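As a concrete but illustrative rendering of these steps, here is a minimal resize-conv sketch in TensorFlow 1.x. The function name, default values, and starting 4x4 volume are assumptions for the example, not HyperGAN's internal API:

```python
import tensorflow as tf

def resize_conv_generator(z, output_shape, z_projection_depth=512,
                          depth_reduction=2.0, activation=tf.nn.relu,
                          final_activation=tf.nn.tanh):
    """Sketch of a resize-conv stack following the pseudocode above."""
    height, width, channels = output_shape

    # 1. Project z with a linear layer and reshape to a small spatial volume.
    net = tf.layers.dense(z, 4 * 4 * z_projection_depth)
    net = tf.reshape(net, [-1, 4, 4, z_projection_depth])

    depth = z_projection_depth
    while int(net.get_shape()[1]) < height:
        # 2. Resize to double the width/height, capped at the output size.
        new_h = min(int(net.get_shape()[1]) * 2, height)
        new_w = min(int(net.get_shape()[2]) * 2, width)
        net = tf.image.resize_images(net, [new_h, new_w])
        # 4. Convolution block with the filter count reduced by depth_reduction.
        depth = max(channels, int(depth / depth_reduction))
        net = tf.layers.conv2d(net, depth, 3, padding='same', activation=activation)

    # 5. At output size: reduce to the requested channels and squash the range.
    net = tf.layers.conv2d(net, channels, 3, padding='same')
    return final_activation(net)
```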
attribute | description | type |
---|---|---|
create | Called during graph creation | f(config, gan, net):net |
z_projection_depth | The output size of the linear layer before the resize-conv stack. | int > 0 |
activation | Activations to use. See activations | f(net):net |
final_activation | Final activation to use. This is usually set to tanh to squash the output range. | f(net):net |
depth_reduction | Reduces the filter sizes on each convolution by this multiple. | float > 0 |
layer_filter | On each resize of G, we call this method. Anything returned from this method is added to the graph before the next convolution block. See common layer filters | f(net):net |
layer_regularizer | This "regularizes" each layer of the generator with a type. See layer regularizers | f(name)(net):net |
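For illustration, a generator section assembled in Python might look like the following. Only the attribute names come from the table above; the values are placeholder assumptions, not recommended settings:

```python
import tensorflow as tf

# Hypothetical generator settings mirroring the attributes above.
generator_config = {
    "z_projection_depth": 512,       # output size of the linear layer before the resize-conv stack
    "activation": tf.nn.relu,        # f(net):net
    "final_activation": tf.nn.tanh,  # squash the output range
    "depth_reduction": 2.0,          # reduce filter sizes by this multiple per convolution
    "layer_filter": None,            # optional f(net):net hook called on each resize
    "layer_regularizer": None,       # e.g. a batch/layer norm constructor
}
```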
You can combine multiple encoders into a single GAN.
attribute | description | type |
---|---|---|
create | Called during graph creation | f(config, gan, net):net |
z | The dimensions of random uniform noise inputs | int > 0 |
min | Lower bound of the random uniform noise | int > 0 |
max | Upper bound of the random uniform noise | int > min |
projections | See more about projections below | [f(config, gan, net):net, ...] |
modes | If using modes, the number of modes to have per dimension | int > 0 |
This encoder takes a random uniform value and outputs it as many possible types. The primary idea is that you are able to query Z as a random uniform distribution, even if the GAN is using a spherical representation.

Some projection types are listed below:

- One of many - uses a categorical prior to choose 'one-of-many' options. Can be paired with Categorical Loss.
- On/Off
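A minimal sketch of how such an encoder could combine projections, assuming the `f(config, gan, net):net` signature from the table. The projection names and the explicit `batch_size` argument are illustrative assumptions, not HyperGAN built-ins:

```python
import tensorflow as tf

# Hypothetical projection functions with the f(config, gan, net):net signature.
def identity_projection(config, gan, net):
    return net

def sphere_projection(config, gan, net):
    # Map the uniform sample onto the unit sphere.
    return tf.nn.l2_normalize(net, dim=1)

def encode(config, gan, batch_size=32):
    # Sample z uniformly in [min, max], then concatenate every projection of it.
    z = tf.random_uniform([batch_size, config['z']],
                          minval=config['min'], maxval=config['max'])
    projections = [p(config, gan, z) for p in config['projections']]
    return tf.concat(projections, axis=1)
```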
You can combine multiple discriminators in a single GAN. This type of ensembling can be useful, but by default only 1 is enabled.
attribute | description | type |
---|---|---|
create | Called during graph creation | f(config, gan, net):net |
activation | Activations to use. See activations | f(net):net |
depth_increase | Increases the filter sizes on each convolution by this multiple. | float > 0 |
final_activation | Final activation to use. This is usually set to tanh to squash the output range. | f(net):net |
layers | The number of convolution layers | int > 0 |
layer_filter | Append information to each layer of the discriminator | f(config, net):net |
layer_regularizer | batch_norm_1, layer_norm_1, or None | f(batch_size, name)(net):net |
fc_layer_size | The size of the linear layers at the end of this network (if any). | int > 0 |
fc_layers | Fully connected layers at the end of the discriminator (standard DCGAN is 0) | int >= 0 |
noise | Instance noise. Can be added to the input X | float >= 0 |
progressive_enhancement | If true, enable progressive enhancement | boolean |
If true, each layer of the discriminator gets a resized version of X and additional outputs from G.
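A rough sketch of that idea, assuming a simple strided-convolution discriminator (the names, depths, and layer counts here are illustrative, not HyperGAN's implementation):

```python
import tensorflow as tf

def discriminator_with_progressive_enhancement(x, layers=4, initial_depth=64):
    """Each convolution layer also sees a resized copy of X, concatenated along
    the channel axis (the generator's intermediate outputs could be appended
    the same way). Assumes power-of-two input sizes."""
    net = x
    depth = initial_depth
    for i in range(layers):
        net = tf.layers.conv2d(net, depth, 3, strides=2, padding='same',
                               activation=tf.nn.relu)
        new_h = int(net.get_shape()[1])
        new_w = int(net.get_shape()[2])
        resized_x = tf.image.resize_images(x, [new_h, new_w])
        net = tf.concat([net, resized_x], axis=3)
        depth *= 2

    # Flatten and produce a single logit per sample.
    flat = int(net.get_shape()[1]) * int(net.get_shape()[2]) * int(net.get_shape()[3])
    net = tf.reshape(net, [-1, flat])
    return tf.layers.dense(net, 1)
```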
Our implementation of WGAN is based on the paper. WGAN loss in TensorFlow can look like:

```python
d_loss = d_real - d_fake
g_loss = d_fake
```

d_loss and g_loss can be reversed as well - just add a '-' sign.
The least squares GAN (LSGAN) loss looks like:

```python
d_loss = (d_real - b)**2 - (d_fake - a)**2
g_loss = (d_fake - c)**2
```

a, b, and c are all hyperparameters.
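For reference, both losses written as TensorFlow ops over batch means might look like the sketch below. The default a, b, c values are placeholders, and the LSGAN sign convention simply mirrors the formula above (the original LS-GAN paper sums the two squared discriminator terms):

```python
import tensorflow as tf

# d_real and d_fake are discriminator outputs for real and generated batches.
def wgan_losses(d_real, d_fake):
    d_loss = tf.reduce_mean(d_real) - tf.reduce_mean(d_fake)
    g_loss = tf.reduce_mean(d_fake)
    return d_loss, g_loss

def lsgan_losses(d_real, d_fake, a=0.0, b=1.0, c=1.0):
    # a, b, c correspond to the (a, b, c) hyperparameters described above.
    d_loss = tf.reduce_mean(tf.square(d_real - b)) - tf.reduce_mean(tf.square(d_fake - a))
    g_loss = tf.reduce_mean(tf.square(d_fake - c))
    return d_loss, g_loss
```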
Includes support for Improved GAN. See `hypergan/losses/standard_gan_loss.py` for details.
Supervised loss is for labeled datasets. This uses a standard softmax loss function on the outputs of the discriminator.
This is currently untested.
attribute | description | type |
---|---|---|
batch_norm | batch_norm_1, layer_norm_1, or None | f(batch_size, name)(net):net |
create | Called during graph creation | f(config, gan, net):net |
discriminator | Set to restrict this loss to a single discriminator (defaults to all) | int >= 0 or None |
label_smooth | improved gan - Label smoothing. | float > 0 |
labels | lsgan - A triplet of values containing (a,b,c) terms. | [a,b,c] floats |
reduce | Reduces the output before applying loss | f(net):net |
reverse | Reverses the loss terms, if applicable | boolean |
The trainers differ in which optimizer they apply:

- RMSProp trainer - uses RMSProp on G and D
- Adam trainer - uses Adam on G and D
- SGD trainer - uses SGD on G and D
attribute | description | type |
---|---|---|
create | Called during graph creation | f(config, gan, net):net |
run | Steps forward once in training. | f(gan):[d_cost, g_cost] |
g_learn_rate | Learning rate for the generator | float >= 0 |
g_beta1 | (adam) | float >= 0 |
g_beta2 | (adam) | float >= 0 |
g_epsilon | (adam) | float >= 0 |
g_decay | (rmsprop) | float >= 0 |
g_momentum | (rmsprop) | float >= 0 |
d_learn_rate | Learning rate for the discriminator | float >= 0 |
d_beta1 | (adam) | float >= 0 |
d_beta2 | (adam) | float >= 0 |
d_epsilon | (adam) | float >= 0 |
d_decay | (rmsprop) | float >= 0 |
d_momentum | (rmsprop) | float >= 0 |
clipped_gradients | If set, gradients will be clipped to this value. | float > 0 or None |
d_clipped_weights | If set, the discriminator will be clipped by value. | float > 0 or None |
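As a rough illustration, the numeric trainer attributes could appear in a configuration file like this. Only the attribute names come from the table above; the surrounding "trainer" key and every value are illustrative assumptions, not tuned recommendations:

```json
{
  "trainer": {
    "g_learn_rate": 0.0002,
    "g_beta1": 0.5,
    "g_beta2": 0.999,
    "g_epsilon": 1e-8,
    "d_learn_rate": 0.0002,
    "d_beta1": 0.5,
    "d_beta2": 0.999,
    "d_epsilon": 1e-8,
    "clipped_gradients": null,
    "d_clipped_weights": null
  }
}
```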
```bash
hypergan -h
```

```bash
# Train a 32x32 gan with batch size 32 on a folder of pngs
hypergan train [folder] -s 32x32x3 -f png -b 32 --config [name]
```

```bash
# Train a 32x32 gan with batch size 32 on a folder of pngs, sampling every 5 steps
hypergan train [folder] -s 32x32x3 -f png -b 32 --config [name] --sampler static_batch --sample_every 5
```
One way a network learns:
To create videos:

```bash
ffmpeg -i samples/%06d.png -vcodec libx264 -crf 22 -threads 0 gan.mp4
```
```bash
# Serve a 32x32 gan with batch size 32 on a folder of pngs
hypergan serve [folder] -s 32x32x3 -f png -b 32 --config [name]
```

Serve starts a Flask server. You can then access:

```
http://localhost:5000/sample.png?type=batch
```
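For example, once the server is running you could fetch a sample from that endpoint:

```bash
curl "http://localhost:5000/sample.png?type=batch" -o sample.png
```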
To prevent the GPU from allocating space, see Running on CPU.
Build takes the same arguments as train and builds a generator. It's required for serve.
Building does 2 things:
- Loads the training model, which includes the discriminator
- Saves into a ckpt model containing only the generator
Saves are stored in `~/.hypergan/saves/`. They can be large.
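Since build takes the same arguments as train, an invocation might look like this (the dataset path and config name are placeholders):

```bash
# Build a generator-only checkpoint from a trained model
hypergan build [folder] -s 32x32x3 -f png -b 32 --config [name]
```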
`--format <type>`
Type can be one of:
- jpg
- png
To see a detailed list, run:

```bash
hypergan -h
```

- `-s`, `--size` - optional (default 64x64x3), the size of your data in the form 'width'x'height'x'channels'
- `-f`, `--format` - optional (default png), file format of the images. Only supports jpg and png for now.
```python
import hypergan as hg
```

The API is currently under development. The best reference is the examples in the `examples` directory.
- `2d-test` - runs a 2d toy problem for a given configuration. Can be sampled to show how a given configuration learns.
- `2d-measure` - applies a batch accuracy (nearest neighbor) measurement to the 2d toy problem.
- `colorizer` - feeds a black and white version of the input into the generator.
- `inpainter` - hides a random part of the image from the discriminator and the generator.
- `super-resolution` - provides a low resolution image to the generator.
- Constant-mask inpainting - applies a constant mask over part of the image. An easier problem than general inpainting.
The `GAN` object consists of:

- the `config` (configuration) used
- the `graph` - specific named Tensors in the TensorFlow graph
- the TensorFlow `sess` (session)
```python
hg.GAN(config, initial_graph, graph_type='full', device='/gpu:0')
```
When the GAN constructor is called, the TensorFlow graph is constructed.
- config - The graph configuration. See examples or the CLI tool for usage.
- initial_graph - a Dictionary consisting of any variables used by the GAN
- graph_type - Either 'full' or 'generator'
- device - Tensorflow device id
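A minimal construction sketch. How the config is loaded and what `initial_graph` must contain are project-specific; the empty config and the `'x'` key below are assumptions for illustration, so see the examples directory for working scripts:

```python
import tensorflow as tf
import hypergan as hg

config = {}  # a graph configuration, normally loaded from a JSON file in ~/.hypergan/configs/
x = tf.placeholder(tf.float32, [32, 64, 64, 3], name='x')  # example input batch
initial_graph = {'x': x}  # dictionary of variables used by the GAN (the 'x' key is an assumption)

gan = hg.GAN(config, initial_graph, graph_type='full', device='/gpu:0')
```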
property | type | description |
---|---|---|
gan.graph | Dictionary | Maps names to tensors |
gan.config | Dictionary | Maps names to options (from the JSON) |
gan.sess | tf.Session | The tensorflow session |
```python
gan.save(save_file)
```

- `save_file` - a string designating the save path

Saves the GAN.
```python
gan.sample_to_file(name, sampler=grid_sampler.sample)
```

- `name` - the name of the file to sample to
- `sampler` - the sampler method to use

Samples to the specified path.
```python
gan.train()
```

Steps the GAN forward in training once. Trains D and G according to your specified `trainer`.
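Putting these calls together, a bare-bones training loop might look like this. The step count and file paths are arbitrary placeholders, and `gan` is a constructed GAN object as described above:

```python
# Assumes `gan` is an hg.GAN instance built as shown earlier.
for step in range(10000):
    gan.train()                                        # one D/G optimization step
    if step % 100 == 0:
        gan.sample_to_file('samples/%06d.png' % step)  # uses the default grid sampler
gan.save('saves/my_model.ckpt')                        # persist the trained GAN
```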
To build a new network you need a dataset. Your data should be structured like:

```
[folder]/[directory]/*.png
```

Training with labels allows you to train a `classifier`.
Each directory in your dataset represents a classification.
Example: dataset setup for classification of apple and orange images:

```
/dataset/apples
/dataset/oranges
```
You can still build a GAN if your dataset is unlabelled. Just make sure your folder is formatted like:

```
[folder]/[directory]/*.png
```

where all files are in one directory.
- CelebA aligned faces http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html
- MS Coco http://mscoco.org/
- ImageNet http://image-net.org/
Contributions are welcome and appreciated! We have many open issues in the Issues tab that have the label Help Wanted.
HyperGAN uses semantic versioning. http://semver.org/
TLDR: x.y.z
- x is incremented on stable public releases.
- y is incremented on API breaking changes. This includes configuration file changes and graph construction changes.
- z is incremented on non-API breaking changes. z changes will be able to reload a saved graph.
The branches are:

- `master` - contains the best GAN we've found as default. It aims to just work for most use cases.
- `develop` - contains the latest and can be in a broken state.

Bug fixes and showcases can be merged into `master`. Configuration changes, new architectures, and generally anything experimental belongs in `develop`.
If you create something cool with this, let us know! Open a pull request and add your links and screenshots here!
In case you are interested, our pivotal board is here: https://www.pivotaltracker.com/n/projects/1886395
Notable configurations are stored in `example/configs`. Feel free to submit additional ones.
Generative Adversarial Networks consist of 2 learning systems that learn together. HyperGAN implements these learning systems in TensorFlow with deep learning.

The `discriminator` learns the difference between real and fake data. The `generator` learns to create fake data.
For a more in-depth introduction, see here http://blog.aylien.com/introduction-generative-adversarial-networks-code-tensorflow/
A single fully trained `GAN` consists of the following useful networks:

- `generator` - generates content that fools the `discriminator`. If using supervised learning mode, it can generate data for a specific classification.
- `discriminator` - learns how to identify real data and how to detect fake data from the `generator`.
- `classifier` - only available when using supervised learning. Classifies an image by type. Some examples of possible datasets are 'apple/orange' and 'cat/dog/squirrel'. See Creating a Dataset.
HyperGAN is currently in open beta.
- GAN - https://arxiv.org/abs/1406.2661
- DCGAN - https://arxiv.org/abs/1511.06434
- InfoGAN - https://arxiv.org/abs/1606.03657
- Improved GAN - https://arxiv.org/abs/1606.03498
- Adversarial Inference - https://arxiv.org/abs/1606.00704
- WGAN - https://arxiv.org/abs/1701.07875
- LS-GAN - https://arxiv.org/pdf/1611.04076v2.pdf
- DCGAN - https://github.com/carpedm20/DCGAN-tensorflow
- InfoGAN - https://github.com/openai/InfoGAN
- Improved GAN - https://github.com/openai/improved-gan
- Hyperchamber - https://github.com/255bits/hyperchamber
If you wish to cite this project, do so like this:
255bits (M. Garcia),
HyperGAN, (2017),
GitHub repository,
https://github.com/255BITS/HyperGAN