
dcgan.torch's Introduction

DCGAN.torch: Train your own image generator

  1. Train your own network
    1. Train a face generator using the Celeb-A dataset
    2. Train Bedrooms, Bridges, Churches etc. using the LSUN dataset
    3. Train a generator on your own set of images.
    4. Train on the ImageNet dataset
  2. Use a pre-trained generator to generate images.
    1. Generate samples of 64x64 pixels
    2. Generate large artsy images (tried up to 4096 x 4096 pixels)
    3. Walk in the space of samples
  3. Vector Arithmetic of images in latent space

Prerequisites

  • Computer with Linux or OSX
  • Torch-7
  • For training, an NVIDIA GPU is strongly recommended for speed. CPU is supported but training is very slow.

Installing dependencies

Without GPU

  • Install Torch: http://torch.ch/docs/getting-started.html#_

With NVIDIA GPU

  • Install CUDA, and optionally cuDNN.
  • Install Torch: http://torch.ch/docs/getting-started.html#_
  • Install optnet to reduce the memory footprint for large images: luarocks install optnet
  • If you installed cuDNN, install the Torch bindings with: luarocks install cudnn

Display UI

Optionally, for displaying images during training and generation, we will use the display package.

  • Install it with: luarocks install https://raw.githubusercontent.com/szym/display/master/display-scm-0.rockspec
  • Then start the server with: th -ldisplay.start
  • Open this URL in your browser: http://localhost:8000

You can see training progress in your browser window. It will look something like this: [training progress screenshot]

1. Train your own network

1.1. Train a face generator using the Celeb-A dataset

Preprocessing

mkdir celebA; cd celebA

Download img_align_celeba.zip from http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html under the link "Align&Cropped Images".

unzip img_align_celeba.zip; cd ..
DATA_ROOT=celebA th data/crop_celebA.lua

Training

DATA_ROOT=celebA dataset=folder th main.lua

1.2. Train Bedrooms, Bridges, Churches etc. using the LSUN dataset

The LSUN dataset is shipped as an LMDB database. First, install LMDB on your system.

  • On OSX with Homebrew: brew install lmdb
  • On Ubuntu: sudo apt-get install liblmdb-dev

Then install a couple of Torch packages.

luarocks install lmdb.torch
luarocks install tds

Preprocessing (with bedroom class as an example)

Download bedroom_train_lmdb from the LSUN website.

Generate an index file:

DATA_ROOT=[path_to_lmdb] th data/lsun_index_generator.lua

Training

DATA_ROOT=[path_to_lmdb] dataset=lsun th main.lua

The code for the LSUN data loader is hardcoded for bedrooms. Change the hardcoded class in data/donkey_lsun.lua to train on another LSUN class.
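For illustration, the kind of one-line change involved might look like this (a hypothetical sketch; the exact variable name in data/donkey_lsun.lua may differ):

-- hypothetical: the class name hardcoded in the LSUN data loader
local classes = {'bedroom'}  -- change to e.g. {'church_outdoor'} or {'bridge'}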

1.3. Train a generator on your own set of images.

Preprocessing

  • Create a folder called myimages.
  • Inside that folder, create a folder called images and place all your images inside it.

Training

DATA_ROOT=myimages dataset=folder th main.lua

1.4. Train on the ImageNet dataset

Preprocessing

Follow instructions from this link.

Training

DATA_ROOT=[PATH_TO_IMAGENET]/train dataset=folder th main.lua

All training options:

   dataset = 'lsun',       -- imagenet / lsun / folder
   batchSize = 64,
   loadSize = 96,
   fineSize = 64,
   nz = 100,               -- #  of dim for Z
   ngf = 64,               -- #  of gen filters in first conv layer
   ndf = 64,               -- #  of discrim filters in first conv layer
   nThreads = 1,           -- #  of data loading threads to use
   niter = 25,             -- #  of iter at starting learning rate
   lr = 0.0002,            -- initial learning rate for adam
   beta1 = 0.5,            -- momentum term of adam
   ntrain = math.huge,     -- #  of examples per epoch. math.huge for full dataset
   display = 1,            -- display samples while training. 0 = false
   display_id = 10,        -- display window id.
   gpu = 1,                -- gpu = 0 is CPU mode. gpu=X is GPU mode on GPU X
   name = 'experiment1',
   noise = 'normal',       -- uniform / normal
   epoch_save_modulo = 1,  -- save a checkpoint every # epochs
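Any of these options can be overridden with environment variables, as in the commands used throughout this README. For example, a run that uses 4 data-loading threads and saves a checkpoint every 5 epochs would look like:

DATA_ROOT=celebA dataset=folder nThreads=4 epoch_save_modulo=5 th main.lua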

2. Use a pre-trained generator to generate images.

The generate script can operate in CPU or GPU mode.

To run it on the CPU, use:

gpu=0 net=[checkpoint-path] th generate.lua

To run it on a GPU, use:

gpu=1 net=[checkpoint-path] th generate.lua

Pre-trained networks can be downloaded from here:

2.1. Generate samples of 64x64 pixels

gpu=0 batchSize=64 net=celebA_25_net_G.t7 th generate.lua

The batchSize parameter controls the number of images to generate. If you have display running, the image will be shown there. The image is also saved to generation1.png in the same folder.


2.2. Generate large artsy images (tried up to 4096 x 4096 pixels)

gpu=0 batchSize=1 imsize=10 noisemode=linefull net=bedrooms_4_net_G.t7 th generate.lua

The imsize parameter controls the size of the output image: the larger the imsize, the larger the output image.


2.3. Walk in the space of samples

gpu=0 batchSize=16 noisemode=line net=bedrooms_4_net_G.t7 th generate.lua

The batchSize parameter controls how large a step is taken between samples.


Vector Arithmetic

net=[modelfile] gpu=0 qlua arithmetic.lua
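The idea, as in the DCGAN paper, is to do arithmetic on the Z vectors of generated samples. A minimal hypothetical sketch of the operation (not the actual arithmetic.lua implementation; net is assumed to be a generator loaded on the CPU, e.g. via util.load):

require 'image'
-- hypothetical: zA, zB, zC stand in for the averaged Z vectors of three
-- visually chosen sample groups, e.g. smiling woman, neutral woman, neutral man
local zA = torch.FloatTensor(1, 100, 1, 1):normal(0, 1)
local zB = torch.FloatTensor(1, 100, 1, 1):normal(0, 1)
local zC = torch.FloatTensor(1, 100, 1, 1):normal(0, 1)
local z = zA - zB + zC              -- smiling woman - neutral woman + neutral man
local img = net:forward(z)          -- ideally yields a smiling man
image.save('arithmetic_out.png', image.toDisplayTensor{input=img})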


dcgan.torch's People

Contributors

gcinbis, gwern, jfsantos, jshaw, soumith, szagoruyko


dcgan.torch's Issues

which lua version is suitable?

When I run DATA_ROOT=xxx dataset=folder th main.lua
I get the following output:
[C]: in function 'xpcall'
/home/xhs/torch/install/share/lua/5.1/threads/threads.lua:234: in function 'callback'
/home/xhs/torch/install/share/lua/5.1/threads/queue.lua:65: in function </home/xhs/torch/install/share/lua/5.1/threads/queue.lua:41>
[C]: in function 'pcall'
/home/xhs/torch/install/share/lua/5.1/threads/queue.lua:40: in function 'dojob'
[string " local Queue = require 'threads.queue'..."]:13: in main chunk
stack traceback:
[C]: in function 'error'
/home/xhs/torch/install/share/lua/5.1/threads/threads.lua:183: in function 'dojob'
/home/xhs/torch/install/share/lua/5.1/threads/threads.lua:264: in function 'synchronize'
/home/xhs/torch/install/share/lua/5.1/threads/threads.lua:142: in function 'specific'
/home/xhs/torch/install/share/lua/5.1/threads/threads.lua:125: in function 'Threads'
/home/xhs/dcgan.torch/data/data.lua:30: in function 'new'
main.lua:38: in main chunk
[C]: in function 'dofile'
.../xhs/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk

By googling, I found that the version might not be matching. How can I solve this? Thank you!

net:evaluate() unexpected behaviour

After commit e057802, net:evaluate() mode no longer displays correct images, but noise.

I have two trained nets, one from before and one from after this commit. In net:training() mode they both behave normally, but in net:evaluate() mode the first one correctly generates the same images as in training mode, while the latter just outputs noise.

Here's an example from an MNIST-trained model saved after the commit. The left one is in training mode, the right one in evaluate mode.


After some tests I can say that it is not the way nets are loaded - a network trained before the commit but loaded after it works correctly.

Also, I have run some batch iterations before saving the model in order to update running_mean and running_var from BN layers.

linear interpolation?

Hi - I've been doing a lot of work lately with interpolation in latent space, and I think linear interpolation might not be the best interpolation operator for high dimensional spaces. Though admittedly this is common practice, this seemed as good a place as any to discuss this, since the dcgan code seems to do exactly that here:

noiseL = torch.FloatTensor(opt.nz):uniform(-1, 1)
noiseR = torch.FloatTensor(opt.nz):uniform(-1, 1)
if opt.noisemode == 'line' then
   -- do a linear interpolation in Z space between point A and point B
   -- each sample in the mini-batch is a point on the line
    line  = torch.linspace(0, 1, opt.batchSize)
    for i = 1, opt.batchSize do
        noise:select(1, i):copy(noiseL * line[i] + noiseR * (1 - line[i]))
    end
end

I'm starting with the assumption that torch.FloatTensor(opt.nz):uniform(-1, 1) is a valid way to uniformly sample from the prior in the latent space. In the examples below, I'll leave the nz dimension at the default of 100. Let's do an experiment and see what the expected lengths of these vectors are.
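(A minimal sketch of that experiment in Torch, assuming we simply collect the L2 norms of many such samples:)

require 'torch'
-- sample many 100-d uniform vectors and inspect the distribution of their norms
local nz, n = 100, 10000
local norms = torch.FloatTensor(n)
for i = 1, n do
    norms[i] = torch.FloatTensor(nz):uniform(-1, 1):norm()
end
print(norms:mean(), norms:std())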

[histogram of vector lengths for uniformly sampled 100-d vectors]

I see a gaussian with mean about 5.76 and with 0.25 standard deviation. I believe this means that >99% of vectors would be expected to have a length between 4.8 and 6.8 (4 standard deviations out). This result should not be a big surprise if we think about taking 100 independent random numbers and then running them through the distance formula.

But now let's think about the effects of linear interpolation between these random vectors. At an extreme, we have the linearly interpolated midpoints halfway between any two of these vectors - let's see what the expected lengths of these are.

[histogram of vector lengths for linearly interpolated midpoints]

So now we have a gaussian with a mean of 4.06 and 0.24 standard deviation. Needless to say, these are not the same distribution; in fact, they are effectively disjoint - the probability of an item from the second appearing in the first is vanishingly small. In other words, the points on the linearly interpolated path are many standard deviations away from the points expected in the prior distribution.

If my premise is correct that torch.FloatTensor(opt.nz):uniform(-1, 1) performs a uniform sampling across the latent space (a big if, and I'd like to verify this!), then the prior is more shaped like a hypersphere. In that case, spherical interpolation makes a lot more sense, and in my own experiments I've had good qualitative results with this approach. Curious what others think. Also note that this reasoning could be extended beyond just interpolation since this would also affect other interpretable operations - such as finding the average in a subset of labeled data (eg: average man or woman in faces).
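For reference, a minimal sketch of spherical interpolation (slerp) between two latent vectors in Torch (a hypothetical helper, not part of dcgan.torch):

-- slerp between 1-D tensors a and b, with t in [0, 1]
local function slerp(t, a, b)
    local dot = a:dot(b) / (a:norm() * b:norm())
    dot = math.max(-1, math.min(1, dot))  -- guard acos against rounding error
    local omega = math.acos(dot)
    if omega < 1e-6 then
        return a * (1 - t) + b * t        -- nearly parallel: fall back to lerp
    end
    local so = math.sin(omega)
    return a * (math.sin((1 - t) * omega) / so) + b * (math.sin(t * omega) / so)
end

-- usage, mirroring the interpolation loop quoted above:
-- noise:select(1, i):copy(slerp(line[i], noiseL, noiseR))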

Effect of noise vector distribution on training and image generation

Hi,

I am trying to understand how the distribution of the Z vector affects training and the subsequent generation of images from the trained generator. The paper doesn't mention any significant effects of using different kinds of distributions to sample Z vectors from. From my experiments, I found that it matters a lot for the quality and type of images generated. For example, the following are some images generated after training the DCGAN on the Celeb dataset for 25 iterations using a uniform(0,1) distribution for sampling the Z vectors.

[generated samples]

Also, after training the DCGAN on a normal(0,1) distribution, the corresponding trained generator's results on a Z vector not sampled from this normal distribution weren't good.

Can anyone give any tips on choosing the right kind of distribution for Z vector sampling based on the kind of training data we use?
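For reference, the noise option listed in the README above selects the prior; a sketch of the switch it implies (the exact ranges are assumptions based on the snippets quoted elsewhere on this page):

if opt.noise == 'uniform' then
    noise:uniform(-1, 1)
elseif opt.noise == 'normal' then
    noise:normal(0, 1)
end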

Different generated images from the same noise vector, depending on the # of noise vectors.

Hi,
I was doing some experiments with the DCGAN on the MNIST dataset. Once the GAN is trained, I have noticed that each generated image depends not only on the noise vector that originated it, but also on the other noise vectors given as input to create other images.

Let's put a simple example. I have the input vector Z of the generator (9x100x1x1), which is made of 9 subvector noises of dimensionality 1x100x1x1. The generator, then, outputs these nine 32x32 generated images:

[3x3 grid of generated MNIST digits]

Let's say that, for whatever reason, I'm interested in replicating only the middle left image. So, if I input just the 1x100x1x1 vector instead of the 9x100x1x1, what I obtain instead is this generated image:

[single generated MNIST digit]

This is far from identical to the previous middle left image. So, why is this happening? Shouldn't the generated images be the same regardless of how many input vectors you use?

This is important if you want to replicate results (for encoding purposes, for example), as you need to input exactly the same set of noise vectors, not only the one you are interested in.

Thanks.

How to view the generated image?

I followed the first part of the tutorial steps in the README.
After downloading the celebA files, I ran preprocessing and training.
I didn't change the main.lua or crop_celebA.lua scripts in any way.
After training all 25 epochs, I ran the generate.lua script with the last model created in checkpoints, named 'experiment1_25_net_G.t7' (the largest number after 'experiment1').
It generated a single PNG image file, but it doesn't look like the tutorial's result image.
The result looks more like a feature map.
Was it supposed to generate a feature image, or is there another step to run?
How do I view the generated image as a normal image?
I already checked the display server, but no image has shown up there.

Can I use word-based embedding?

I am reimplementing this paper in TensorFlow, but I want to use a word-based method instead of the char-based one mentioned in the original paper, and I got a high D loss.
I just wonder: can I use word-based embeddings in principle? If yes, is there anything I need to consider?

Many thanks.

generate.lua fails on GPU with a GPU-trained model

Using a GPU-trained model, running generate.lua fails as follows:

gpu=1 net=experiment1_432_net_G.t7 th generate.lua
{
  gpu : 1
  noisemode : "random"
  name : "generation1"
  noisetype : "normal"
  batchSize : 32
  net : "experiment1_432_net_G.t7"
  imsize : 1
  nz : 100
  display : 1
}
/home/hannu/torch/install/bin/luajit: /home/hannu/torch/install/share/lua/5.1/torch/File.lua:343: unknown Torch class <torch.CudaTensor>
stack traceback:
    [C]: in function 'error'
    /home/hannu/torch/install/share/lua/5.1/torch/File.lua:343: in function 'readObject'
    /home/hannu/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
    /home/hannu/torch/install/share/lua/5.1/nn/Module.lua:158: in function 'read'
    /home/hannu/torch/install/share/lua/5.1/torch/File.lua:351: in function 'readObject'
    /home/hannu/torch/install/share/lua/5.1/torch/File.lua:409: in function 'load'
    generate.lua:24: in main chunk
    [C]: in function 'dofile'
    ...annu/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x004065d0

Looking at the code, I noticed that the model is loaded before cunn and cudnn are required. I moved/placed

if opt.gpu > 0 then
    require 'cunn'
    require 'cudnn'
end

to the beginning, and the code now runs without problems.

Performance on SVHN dataset

Hi Soumith, thanks a lot for sharing the code.

Recently, I have been trying to use your code to train a DCGAN on SVHN. I tried many network architectures and hyperparameters, but failed to reproduce performance comparable to the model shared by Alec Radford. What I can get is ~69% using 32x32 images and 1000 training labels. Unfortunately, Alec did not release the training code for that model, so there is no reference for setting up the network architecture and determining the hyperparameters. So I would like to know whether you or anyone else has got a model with performance similar to Alec's, say, 75+% accuracy on SVHN based on 1000 labels.

thanks in advance!

Scaling of Training Images

Is there any significance to scaling the values of training images to the range [-1,1] other than just normalizing them to remove any inherent bias among the dimensions?

Thanks!
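(One relevant detail from the repo itself: the data loader scales inputs to match the generator's final Tanh layer, whose output range is [-1, 1]; the line below is quoted from data/donkey_folder.lua as shown later on this page.)

out:mul(2):add(-1) -- make it [0, 1] -> [-1, 1]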

Errors when running main.lua

I tried to run main.lua using the command "DATA_ROOT=celebA dataset=folder th main.lua", but encountered the following errors:

{
ntrain : inf
beta1 : 0.5
name : "experiment1"
niter : 25
batchSize : 64
ndf : 64
fineSize : 64
nz : 100
loadSize : 96
gpu : 1
ngf : 64
dataset : "folder"
lr : 0.0002
noise : "normal"
nThreads : 4
display_id : 10
display : 1
}
Random Seed: 8565
Starting donkey with id: 1 seed: 8566
table: 0x40928780
Starting donkey with id: 2 seed: 8567
table: 0x417e0f70
Starting donkey with id: 3 seed: 8568
table: 0x40945ec0
Starting donkey with id: 4 seed: 8569
table: 0x414d2728
Loading train metadata from cache
Loading train metadata from cache
Loading train metadata from cache
Loading train metadata from cache
Dataset: folder Size: 33436
/usr/local/torch7/install/bin/luajit: main.lua:82: attempt to call method 'apply' (a nil value)
stack traceback:
main.lua:82: in main chunk
[C]: in function 'dofile'
...cal/torch7/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
[C]: at 0x00406640

I am sure Torch is installed correctly. What is the problem? I am looking forward to your answer. Thanks a lot.

Train on the ImageNet dataset throws error

OSX 10.11.5.

$ DATA_ROOT=myimages dataset=folder th main.lua
{
ntrain : inf
beta1 : 0.5
name : "experiment1"
niter : 25
batchSize : 64
ndf : 64
fineSize : 64
nz : 100
loadSize : 96
gpu : 1
ngf : 64
dataset : "folder"
lr : 0.0002
noise : "normal"
nThreads : 4
display_id : 10
display : 1
}
Random Seed: 251
Starting donkey with id: 1 seed: 252
table: 0x02b80fc8
Starting donkey with id: 2 seed: 253
table: 0x02ba0ca8
Starting donkey with id: 3 seed: 254
table: 0x02bc0f40
Starting donkey with id: 4 seed: 255
table: 0x02be0d88
Creating train metadata
table: 0x02ef7ee0
running "find" on each class directory, and concatenate all those filenames into a single file containing all image paths for a given class
Creating train metadata
table: 0x02bf1af0
running "find" on each class directory, and concatenate all those filenames into a single file containing all image paths for a given class
Creating train metadata
table: 0x02e00c70
running "find" on each class directory, and concatenate all those filenames into a single file containing all image paths for a given class
Creating train metadata
table: 0x02d971a8
running "find" on each class directory, and concatenate all those filenames into a single file containing all image paths for a given class
/tmp/lua_2AEOyi: line 1: gfind: command not found
/tmp/lua_9DgkXe: line 1: gfind: command not found
/tmp/lua_psrQx3: line 1: gfind: command not found
/tmp/lua_aZfoE8: line 1: gfind: command not found
now combine all the files to a single large file
now combine all the files to a single large file
now combine all the files to a single large file
now combine all the files to a single large file
load the large concatenated list of sample paths to self.imagePath
load the large concatenated list of sample paths to self.imagePath
load the large concatenated list of sample paths to self.imagePath
load the large concatenated list of sample paths to self.imagePath
sh: gwc: command not found
sh: gwc: command not found
sh: gwc: command not found
sh: gwc: command not found
/Users/WS18/torch/install/bin/luajit: /Users/WS18/torch/install/share/lua/5.1/threads/threads.lua:183: [thread 1 callback] /Users/WS18/dcgan.torch/data/dataset.lua:198: attempt to perform arithmetic on a nil value
stack traceback:
/Users/WS18/dcgan.torch/data/dataset.lua:198: in function '__init'
/Users/WS18/torch/install/share/lua/5.1/torch/init.lua:91: in function </Users/WS18/torch/install/share/lua/5.1/torch/init.lua:87>
[C]: in function 'dataLoader'
/Users/WS18/dcgan.torch/data/donkey_folder.lua:82: in main chunk
[C]: in function 'dofile'
/Users/WS18/dcgan.torch/data/data.lua:42: in function </Users/WS18/dcgan.torch/data/data.lua:32>
[C]: in function 'xpcall'
/Users/WS18/torch/install/share/lua/5.1/threads/threads.lua:234: in function 'callback'
/Users/WS18/torch/install/share/lua/5.1/threads/queue.lua:65: in function </Users/WS18/torch/install/share/lua/5.1/threads/queue.lua:41>
[C]: in function 'pcall'
/Users/WS18/torch/install/share/lua/5.1/threads/queue.lua:40: in function 'dojob'
[string " local Queue = require 'threads.queue'..."]:13: in main chunk
stack traceback:
[C]: in function 'error'
/Users/WS18/torch/install/share/lua/5.1/threads/threads.lua:183: in function 'dojob'
/Users/WS18/torch/install/share/lua/5.1/threads/threads.lua:264: in function 'synchronize'
/Users/WS18/torch/install/share/lua/5.1/threads/threads.lua:142: in function 'specific'
/Users/WS18/torch/install/share/lua/5.1/threads/threads.lua:125: in function 'Threads'
/Users/WS18/dcgan.torch/data/data.lua:30: in function 'new'
main.lua:38: in main chunk
[C]: in function 'dofile'
...WS18/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x0101e83d10

Preprocessing cropping script throws error?

Hello, I'm getting a tedious error running the cropping script for preprocessing the Celeb-A dataset:

ajay@ajay-h8-1170uk:~/TorchProjects/dcgan$ DATA_ROOT=celebA th data/crop_celebA.lua
/usr/local/bin/luajit: /usr/local/share/lua/5.1/image/init.lua:339: attempt to concatenate local 'ext' (a nil value)
stack traceback:
    /usr/local/share/lua/5.1/image/init.lua:339: in function 'load'
    data/crop_celebA.lua:7: in main chunk

I tried to run the script line by line from the Torch REPL using:

data = '/home/ajay/TorchProjects/dcgan/celebA/img_align_celeba'
for f in paths.files(data, function(nm) return nm:find('.jpg') end) do
    f2 = paths.concat(data, f)
    print(f2)
    im = image.load(f2)
end

and got a similar error:

/home/ajay/TorchProjects/dcgan/celebA/img_align_celeba  
/usr/local/share/lua/5.1/image/init.lua:339: attempt to concatenate local 'ext' (a nil value)
stack traceback:
    /usr/local/share/lua/5.1/image/init.lua:339: in function 'load'
    [string "for f in paths.files(data, function(nm) retur..."]:4: in main chunk

Sorry about this, I've been away from coding for a while.

64x64 hardwired crop limitation?

So I and another were trying out dcgan.torch to see how well it would work on image sets more complicated than faces (kudos on writing an implementation much easier to get up and running than the original dcgan-theano, BTW; we really weren't looking forward to figuring out how to get HDF5 image input working, although some details could use work - like, why is nThreads=1 by default?), and I became concerned that 64x64 images were just too little to convey all the details and would lead to a poorly-trained NN.

Experimenting with the options, it seems that one can get dcgan.torch to work with almost the whole image by setting the full image size to be very similar to that of the crop size: loadSize=65 fineSize=64. Or one could downscale all the images on disk with a command like ls *.jpg | parallel mogrify -resize 65536@. (I am still trying it out but dcgan appears to make much faster progress when trained on almost-full images at 65x65 than when trained on 64x64 crops of full-resolution images.)

The full image still winds up being extremely low resolution, though. Reading through main.lua and donkey_folder.lua is a little confusing. It looks as if we're supposed to be able to increase the size of trained images by increasing fineSize and also the two parameters governing the size of the base layer of the generator & discriminator NNs, so we thought that using better images would be as simple as loadSize=256 fineSize=255 ngf=255 ndf=255 - load a decent-resolution image, crop it minimally, and feed it into the NNs of same size.

But that doesn't work. In fact, we can't find a setting of fineSize other than 64 which doesn't immediately crash dcgan.torch regardless of what we set the other options to. Are we misunderstanding the config options' intent, or is there a bug somewhere?

Error: attempt to call method 'apply' (a nil value)

Running
gpu=1 batchSize=1 imsize=10 noisemode=linefull net=bedrooms_4_net_G.t7 th generate.lua
gives the error:

{
  gpu : 1
  noisemode : "linefull"
  name : "generation1"
  noisetype : "normal"
  batchSize : 1
  net : "bedrooms_4_net_G.t7"
  imsize : 10
  nz : 100
  display : 1
}
/home/ubuntu/torch-distro/install/bin/luajit: /home/ubuntu/dcgan.torch/util.lua:61: attempt to call method 'apply' (a nil value)
stack traceback:
    /home/ubuntu/dcgan.torch/util.lua:61: in function 'load'
    generate.lua:24: in main chunk
    [C]: in function 'dofile'
    ...rch-distro/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
    [C]: at 0x00406670

Likewise, running

DATA_ROOT=FireLoop dataset=folder th main.lua

gives a similar error:

/home/ubuntu/torch-distro/install/bin/luajit: main.lua:82: attempt to call method 'apply' (a nil value)
stack traceback:
    main.lua:82: in main chunk
    [C]: in function 'dofile'
    ...rch-distro/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
    [C]: at 0x00406670

On an AWS g2.2xlarge machine, with cuDNN.
Lua 5.2.3

Kernel width is 5 in the paper, but 4 in the code.

In lines 69-73 of main.lua:

netG:add(SpatialFullConvolution(ngf * 8, ngf * 4, 4, 4, 2, 2, 1, 1))
netG:add(SpatialBatchNormalization(ngf * 4)):add(nn.ReLU(true))
-- state size: (ngf*4) x 8 x 8
netG:add(SpatialFullConvolution(ngf * 4, ngf * 2, 4, 4, 2, 2, 1, 1))
netG:add(SpatialBatchNormalization(ngf * 2)):add(nn.ReLU(true))

However, Figure 1 of the paper (http://arxiv.org/pdf/1511.06434v2.pdf) shows a kernel width of 5 for the upconvolutions.
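(A possible explanation: a SpatialFullConvolution produces output of size (in - 1)*stride - 2*pad + kW, so a kernel width of 4 with stride 2 and padding 1 exactly doubles the spatial size at each layer, matching the power-of-two state sizes in the code comments; a width of 5 would not.)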

Error on running DATA_ROOT=celebA dataset=folder th main.lua

I get the error below on running DATA_ROOT=celebA dataset=folder th main.lua. Please suggest what could possibly be wrong.

ubuntu@tegra-ubuntu:~/work/dcgan.torch$ DATA_ROOT=celebA dataset=folder th main.lua
{
ntrain : inf
beta1 : 0.5
name : "experiment1"
niter : 25
batchSize : 64
ndf : 64
fineSize : 64
nz : 100
loadSize : 96
gpu : 1
ngf : 64
dataset : "folder"
lr : 0.0002
noise : "normal"
nThreads : 4
display_id : 10
display : 1
}
Random Seed: 7694
/usr/local/bin/luajit: /usr/local/share/lua/5.1/trepl/init.lua:384: module 'threads' not found:No LuaRocks module found for threads
no field package.preload['threads']
no file '/home/ubuntu/.luarocks/share/lua/5.1/threads.lua'
no file '/home/ubuntu/.luarocks/share/lua/5.1/threads/init.lua'
no file '/usr/local/share/lua/5.1/threads.lua'
no file '/usr/local/share/lua/5.1/threads/init.lua'
no file './threads.lua'
no file '/usr/local/share/luajit-2.0.4/threads.lua'
no file '/home/ubuntu/.luarocks/lib/lua/5.1/threads.so'
no file '/usr/local/lib/lua/5.1/threads.so'
no file './threads.so'
no file '/usr/local/lib/lua/5.1/loadall.so'
stack traceback:
[C]: in function 'error'
/usr/local/share/lua/5.1/trepl/init.lua:384: in function 'require'
/home/ubuntu/work/dcgan.torch/data/data.lua:1: in main chunk
[C]: in function 'dofile'
main.lua:37: in main chunk
[C]: in function 'dofile'
/usr/local/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x0000d055

Problems with installing cunn / cutorch

Hello - I've been working to get the face examples running on OSX. I've used Torch before, but I'm new to DCGANs.

After running through the full install a couple of times, including completely re-installing Torch (and trying cltorch before reverting to Torch), I am still having problems.

Running DATA_ROOT=celebA dataset=folder th main.lua results in the following error:

/Users/james/torch/install/bin/luajit: /Users/james/torch/install/share/lua/5.1/trepl/init.lua:384: module 'cunn' not found:No LuaRocks module found for cunn
no field package.preload['cunn']
no file '/Users/james/.luarocks/share/lua/5.1/cunn.lua'
no file '/Users/james/.luarocks/share/lua/5.1/cunn/init.lua'
no file '/Users/james/torch/install/share/lua/5.1/cunn.lua'
no file '/Users/james/torch/install/share/lua/5.1/cunn/init.lua'
no file '/Users/james/torch-cl/install/share/lua/5.1/cunn.lua'
no file '/Users/james/torch-cl/install/share/lua/5.1/cunn/init.lua'
no file './cunn.lua'
no file '/Users/james/torch/install/share/luajit-2.1.0-beta1/cunn.lua'
no file '/usr/local/share/lua/5.1/cunn.lua'
no file '/usr/local/share/lua/5.1/cunn/init.lua'
no file '/Users/james/.luarocks/lib/lua/5.1/cunn.so'
no file '/Users/james/torch/install/lib/lua/5.1/cunn.so'
no file '/Users/james/torch/install/lib/cunn.dylib'
no file '/Users/james/torch-cl/install/lib/cunn.dylib'
no file '/Users/james/torch-cl/install/lib/lua/5.1/cunn.so'
no file './cunn.so'
no file '/usr/local/lib/lua/5.1/cunn.so'
no file '/usr/local/lib/lua/5.1/loadall.so'
stack traceback:
[C]: in function 'error'
/Users/james/torch/install/share/lua/5.1/trepl/init.lua:384: in function 'require'
main.lua:126: in main chunk
[C]: in function 'dofile'
...ames/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x0109892d10

Running luarocks install cunn fails when it tries to install cutorch:

CMake Error at /usr/local/Cellar/cmake/3.6.0/share/cmake/Modules/FindCUDA.cmake:619 (message):
  Specify CUDA_TOOLKIT_ROOT_DIR
Call Stack (most recent call first):
  CMakeLists.txt:7 (FIND_PACKAGE)
-- Configuring incomplete, errors occurred!
See also "/tmp/luarocks_cutorch-scm-1-5945/cutorch/build/CMakeFiles/CMakeOutput.log".
Error: Failed installing dependency: https://raw.githubusercontent.com/torch/rocks/master/cutorch-scm-1.rockspec - Build error: Failed building.

As I understand it, CUDA requires a GPU, which I don't have - but the instructions for dcgan.torch suggest that a GPU isn't necessary, so I believe there should be a way around this?

Many thanks.

Error with SpatialFullConvolution when running on GPU on Ubuntu 16.04 with Cuda 7.5

Probably not the wisest decision to upgrade to 16.04 already, but it's (basically) too late now. The error message below persists after completely reinstalling Torch and all deps. I'm running CUDA 7.5 and have various cuDNN primitive binding libs installed in /usr/local/cuda-7.5/lib64. CUDA is in my PATH, and I also have LD_LIBRARY_PATH, CUDA_HOME, CUDA_LIB, CUDA_BIN, CPATH, and LIBRARY_PATH set up. I'm so lost haha

jamis@jamis:~/src/dcgan.torch$ gpu=1 net=checkpoints/celebA_25_net_G.t7 th generate.lua
{
  gpu : 1
  noisemode : "random"
  name : "generation1"
  noisetype : "normal"
  batchSize : 32
  net : "checkpoints/celebA_25_net_G.t7"
  imsize : 1
  nz : 100
  display : 1
}
nn.Sequential {
  [input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> (14) -> output]
  (1): nn.SpatialFullConvolution(100 -> 512, 4x4)
  (2): nn.SpatialBatchNormalization
  (3): nn.ReLU
  (4): nn.SpatialFullConvolution(512 -> 256, 4x4, 2,2, 1,1)
  (5): nn.SpatialBatchNormalization
  (6): nn.ReLU
  (7): nn.SpatialFullConvolution(256 -> 128, 4x4, 2,2, 1,1)
  (8): nn.SpatialBatchNormalization
  (9): nn.ReLU
  (10): nn.SpatialFullConvolution(128 -> 64, 4x4, 2,2, 1,1)
  (11): nn.SpatialBatchNormalization
  (12): nn.ReLU
  (13): nn.SpatialFullConvolution(64 -> 3, 4x4, 2,2, 1,1)
  (14): nn.Tanh
}
/home/jamis/torch/install/bin/luajit: /home/jamis/torch/install/share/lua/5.1/nn/Container.lua:67:
In 1 module of nn.Sequential:
...h/install/share/lua/5.1/cudnn/SpatialFullConvolution.lua:114: attempt to perform arithmetic on field 'adjW' (a nil value)
stack traceback:
        ...h/install/share/lua/5.1/cudnn/SpatialFullConvolution.lua:114: in function 'createIODescriptors'
        ...h/install/share/lua/5.1/cudnn/SpatialFullConvolution.lua:312: in function <...h/install/share/lua/5.1/cudnn/SpatialFullConvolution.lua:310>
        [C]: in function 'xpcall'
        /home/jamis/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
        /home/jamis/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
        /home/jamis/torch/install/share/lua/5.1/optnet/init.lua:376: in function 'optimizeMemory'
        generate.lua:82: in main chunk
        [C]: in function 'dofile'
        ...amis/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
        [C]: at 0x00405d50

WARNING: If you see a stack trace below, it doesn't point to the place where this error occured. Please use only the one above.
stack traceback:
        [C]: in function 'error'
        /home/jamis/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
        /home/jamis/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
        /home/jamis/torch/install/share/lua/5.1/optnet/init.lua:376: in function 'optimizeMemory'
        generate.lua:82: in main chunk
        [C]: in function 'dofile'
        ...amis/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
        [C]: at 0x00405d50

Training goes extremely quickly

Hi,
I'm trying to train on an imagenet dataset of "computer rooms." For some reason, training is going by really quickly [about 20 minutes] and is only able to result in noisy shapes. Here's an example:
[noisy generated shapes]
I'm sure I'm missing something obvious. Do I just need a bigger dataset? The err_D rates look different from other examples I've seen.
Here's a sample of what's going on:

$ DATA_ROOT=computer dataset=folder th main.lua
{
  ntrain : inf
  beta1 : 0.5
  name : "experiment3"
  niter : 25
  batchSize : 64
  ndf : 64
  fineSize : 64
  nz : 100
  loadSize : 96
  gpu : 1
  ngf : 64
  dataset : "folder"
  lr : 0.0002
  noise : "normal"
  nThreads : 4
  display_id : 10
  display : 1
}
Random Seed: 2726   
Starting donkey with id: 3 seed: 2729
table: 0x0dceea28
Starting donkey with id: 1 seed: 2727
table: 0x0dcadda0
Starting donkey with id: 4 seed: 2730
table: 0x0dccde18
Starting donkey with id: 2 seed: 2728
table: 0x0dd4ea78
Loading train metadata from cache
Loading train metadata from cache
Loading train metadata from cache
Loading train metadata from cache
Dataset: folder  Size:  960 
Epoch: [1][       0 /       15]  Time: 1.973  DataTime: 0.000    Err_G: 0.5680  Err_D: 1.8186   
Epoch: [1][       1 /       15]  Time: 1.785  DataTime: 0.001    Err_G: 1.7695  Err_D: 0.9822   
Epoch: [1][       2 /       15]  Time: 1.765  DataTime: 0.000    Err_G: 0.4781  Err_D: 1.5481   
Epoch: [1][       3 /       15]  Time: 1.806  DataTime: 0.000    Err_G: 1.6676  Err_D: 0.7404   
Epoch: [1][       4 /       15]  Time: 1.752  DataTime: 0.000    Err_G: 1.3097  Err_D: 1.0967   
Epoch: [1][       5 /       15]  Time: 1.785  DataTime: 0.000    Err_G: 1.0771  Err_D: 1.1696   
Epoch: [1][       6 /       15]  Time: 1.740  DataTime: 0.000    Err_G: 1.5180  Err_D: 1.0296   
Epoch: [1][       7 /       15]  Time: 1.769  DataTime: 0.001    Err_G: 0.7151  Err_D: 1.2693   
Epoch: [1][       8 /       15]  Time: 1.738  DataTime: 0.000    Err_G: 2.8160  Err_D: 0.6199   
Epoch: [1][       9 /       15]  Time: 2.265  DataTime: 0.000    Err_G: 0.3205  Err_D: 1.9448   
Epoch: [1][      10 /       15]  Time: 1.464  DataTime: 0.000    Err_G: 5.2249  Err_D: 0.7500   
Epoch: [1][      11 /       15]  Time: 1.739  DataTime: 0.000    Err_G: 0.8537  Err_D: 1.0718   
Epoch: [1][      12 /       15]  Time: 1.742  DataTime: 0.000    Err_G: 1.7960  Err_D: 0.6485   
Epoch: [1][      13 /       15]  Time: 1.758  DataTime: 0.000    Err_G: 0.8574  Err_D: 0.9329   
Epoch: [1][      14 /       15]  Time: 1.754  DataTime: 0.001    Err_G: 5.5307  Err_D: 0.7156   
End of epoch 1 / 25      Time Taken: 28.003 
Epoch: [2][       0 /       15]  Time: 1.430  DataTime: 0.000    Err_G: 1.0214  Err_D: 0.8263   
Epoch: [2][       1 /       15]  Time: 1.743  DataTime: 0.000    Err_G: 3.3267  Err_D: 0.3615   
Epoch: [2][       2 /       15]  Time: 1.744  DataTime: 0.000    Err_G: 0.3200  Err_D: 1.8885   
Epoch: [2][       3 /       15]  Time: 1.740  DataTime: 0.000    Err_G: 9.9051  Err_D: 0.5392   
Epoch: [2][       4 /       15]  Time: 1.741  DataTime: 0.000    Err_G: 5.0082  Err_D: 0.2205   
Epoch: [2][       5 /       15]  Time: 1.739  DataTime: 0.001    Err_G: 0.0109  Err_D: 5.1230   
Epoch: [2][       6 /       15]  Time: 1.741  DataTime: 0.000    Err_G: 11.8632  Err_D: 0.3344  
Epoch: [2][       7 /       15]  Time: 1.762  DataTime: 0.000    Err_G: 11.7923  Err_D: 0.4543  
Epoch: [2][       8 /       15]  Time: 1.744  DataTime: 0.000    Err_G: 3.1376  Err_D: 0.1725   
Epoch: [2][       9 /       15]  Time: 2.322  DataTime: 0.000    Err_G: 0.0095  Err_D: 5.2187   
Epoch: [2][      10 /       15]  Time: 1.430  DataTime: 0.001    Err_G: 11.9288  Err_D: 0.3458  
Epoch: [2][      11 /       15]  Time: 1.743  DataTime: 0.000    Err_G: 12.4909  Err_D: 0.4700  
Epoch: [2][      12 /       15]  Time: 1.743  DataTime: 0.000    Err_G: 4.0857  Err_D: 0.1086   
Epoch: [2][      13 /       15]  Time: 1.747  DataTime: 0.000    Err_G: 0.0036  Err_D: 6.3310   
Epoch: [2][      14 /       15]  Time: 1.745  DataTime: 0.000    Err_G: 12.2344  Err_D: 0.1794  
End of epoch 2 / 25      Time Taken: 27.318 
Epoch: [3][       0 /       15]  Time: 1.425  DataTime: 0.000    Err_G: 13.8863  Err_D: 0.4748  
Epoch: [3][       1 /       15]  Time: 1.739  DataTime: 0.000    Err_G: 7.3155  Err_D: 0.1225   
Epoch: [3][       2 /       15]  Time: 1.737  DataTime: 0.000    Err_G: 0.1615  Err_D: 2.2792   
Epoch: [3][       3 /       15]  Time: 1.738  DataTime: 0.000    Err_G: 13.2177  Err_D: 0.5127  
Epoch: [3][       4 /       15]  Time: 1.739  DataTime: 0.000    Err_G: 12.9756  Err_D: 0.1219  
Epoch: [3][       5 /       15]  Time: 1.741  DataTime: 0.000    Err_G: 6.0634  Err_D: 0.1051   
Epoch: [3][       6 /       15]  Time: 1.738  DataTime: 0.000    Err_G: 0.0873  Err_D: 2.9455   
Epoch: [3][       7 /       15]  Time: 1.737  DataTime: 0.000    Err_G: 13.9099  Err_D: 0.2702  
Epoch: [3][       8 /       15]  Time: 1.740  DataTime: 0.000    Err_G: 15.4008  Err_D: 0.2829  
Epoch: [3][       9 /       15]  Time: 2.298  DataTime: 0.000    Err_G: 8.0461  Err_D: 0.2743   
Epoch: [3][      10 /       15]  Time: 1.427  DataTime: 0.000    Err_G: 0.3507  Err_D: 1.7363   
Epoch: [3][      11 /       15]  Time: 1.741  DataTime: 0.000    Err_G: 11.6363  Err_D: 0.3731  
Epoch: [3][      12 /       15]  Time: 1.741  DataTime: 0.000    Err_G: 11.4138  Err_D: 0.2604  
Epoch: [3][      13 /       15]  Time: 1.736  DataTime: 0.000    Err_G: 4.7800  Err_D: 0.3057   
Epoch: [3][      14 /       15]  Time: 1.743  DataTime: 0.000    Err_G: 0.0164  Err_D: 4.5180   
End of epoch 3 / 25      Time Taken: 27.280 

Cannot specify no cropping

(This is related to, but not identical to, issue #2.)

Cropping is particularly undesirable on very small images like 64x64, where it may delete a lot of the image (especially when the images come pre-centered and cropped already). Currently, you cannot run dcgan.torch with no cropping, despite configurable arguments like loadSize=64 fineSize=64 suggesting that should be possible. This is not by design but due to a bug in the cropping code in data/donkey_folder.lua, it seems; as said in issue #2:

Right now, loadSize has to be greater than fineSize (because of a bug in the cropping logic). So it's okay to have loadSize=65 fineSize=64 th main.lua


I messed around with the responsible trainHook, and I think the bug can be fixed by simply checking whether the original H/W are greater than the fineSize value and, if they aren't, feeding 0s into the crop function. The new version would look like this:

-- do random crop if fineSize/sampleSize is configured to be smaller than NN's input dimensions, loadSize
local iW = input:size(3)
local iH = input:size(2)
local oW = sampleSize[2]
local oH = sampleSize[2]
if (iW > oW) then
 w1 = math.ceil(torch.uniform(1e-2, iW-oW))
else
 w1 = 0
end
if (iH > oH) then
 h1 = math.ceil(torch.uniform(1e-2, iH-oH))
else
 h1 = 0
end
local out = image.crop(input, w1, h1, w1 + oW, h1 + oH)
assert(out:size(2) == oW)
assert(out:size(3) == oH)

Or to diff it:

diff --git a/data/donkey_folder.lua b/data/donkey_folder.lua
index 3a82393..5248f4e 100644
--- a/data/donkey_folder.lua
+++ b/data/donkey_folder.lua
@@ -52,17 +52,26 @@ local mean,std
 local trainHook = function(self, path)
    collectgarbage()
    local input = loadImage(path)
+
+   -- do random crop if fineSize/sampleSize is configured to be smaller than NN's input dimensions, loadSize
    local iW = input:size(3)
    local iH = input:size(2)
-
-   -- do random crop
-   local oW = sampleSize[2];
+   local oW = sampleSize[2]
    local oH = sampleSize[2]
-   local h1 = math.ceil(torch.uniform(1e-2, iH-oH))
-   local w1 = math.ceil(torch.uniform(1e-2, iW-oW))
+   if (iW > oW) then
+    w1 = math.ceil(torch.uniform(1e-2, iW-oW))
+   else
+    w1 = 0
+   end
+   if (iH > oH) then
+    h1 = math.ceil(torch.uniform(1e-2, iH-oH))
+   else
+    h1 = 0
+   end
    local out = image.crop(input, w1, h1, w1 + oW, h1 + oH)
    assert(out:size(2) == oW)
    assert(out:size(3) == oH)
+
    -- do hflip with probability 0.5
    if torch.uniform() > 0.5 then out = image.hflip(out); end
    out:mul(2):add(-1) -- make it [0, 1] -> [-1, 1]

This seems to work both in the 64x64px default version and the 128x128px fork, eg

$ nThreads=1 DATA_ROOT=myimages dataset=folder batchSize=2 loadSize=128 fineSize=128 nz=75 ngf=106 ndf=48 gpu=0 th main-128.lua
{
  ntrain : inf
  beta1 : 0.5
  name : "experiment1"
  niter : 25
  batchSize : 2
  ndf : 48
  fineSize : 128
  nz : 75
  loadSize : 128
  gpu : 0
  ngf : 106
  dataset : "folder"
  lr : 0.0002
  noise : "normal"
  nThreads : 1
  display_id : 10
  display : 1
}
Random Seed: 5143   
Starting donkey with id: 1 seed: 5144
table: 0x406af6b8
Loading train metadata from cache
Dataset: folder  Size:  442215  
Epoch: [1][       0 /   221107]  Time: 8.181  DataTime: 0.003    Err_G: 1.1998  Err_D: 1.1637   
Epoch: [1][       1 /   221107]  Time: 5.115  DataTime: 0.001    Err_G: 0.3660  Err_D: 1.5839   
Epoch: [1][       2 /   221107]  Time: 5.965  DataTime: 0.001    Err_G: 2.8597  Err_D: 1.7219   
Epoch: [1][       3 /   221107]  Time: 6.163  DataTime: 0.001    Err_G: 0.1956  Err_D: 2.2080   
Epoch: [1][       4 /   221107]  Time: 5.537  DataTime: 0.001    Err_G: 0.7360  Err_D: 1.9527   
Epoch: [1][       5 /   221107]  Time: 6.300  DataTime: 0.001    Err_G: 6.8542  Err_D: 3.6255   
...

And looking at the displayed training sample images in the display server, they don't look cropped like before. So although I haven't run anything to completion, I think that fix works.

Continue Training?

Hey there,

This seems like a really interesting library, nice work.

Currently training using my own set of images; the total # of epochs is 37.

Dumb question, but how do I continue training from an existing model without overwriting it or starting from scratch?

For example, in the torch-rnn library, the training script takes an 'init_from' parameter where you specify the last cached training t7 file.

Also, it seems like a couple of other people are using smaller datasets like me, e.g. 4000 images. I wondered if you had any tips for getting the best kind of results by tweaking the training parameters.
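There is no built-in init_from option in main.lua, but a hypothetical sketch of resuming could reuse util.load the way generate.lua does (the init_from value and checkpoint prefix below are assumptions):

-- hypothetical: load saved nets instead of constructing netG/netD from scratch
local util = paths.dofile('util.lua')
local init_from = 'checkpoints/experiment1_37'  -- assumed checkpoint prefix
netG = util.load(init_from .. '_net_G.t7', opt.gpu)
netD = util.load(init_from .. '_net_D.t7', opt.gpu)
-- then continue with the usual training loop in main.lua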

Possible reasons for similarity among generated images

Hi,

I have been playing around with the DCGAN architecture and I have a question on the similarity among the generated images.

I trained the network on 140-dimensional vectors sampled from normal(0,1). The results were good after a few epochs, and they look varied too. The generator's output looks like the following:
[generated samples from the 140-d input network]

I modified the above network to take a table of inputs (one 100-dimensional vector and one 40-dimensional vector, both sampled from normal(0,1)) through a parallel table and joined them to make a 140-dimensional vector. The following are some results:
[generated samples from the modified network, with many similar images]

The above two networks are essentially the same, because parameter learning happens only in the layers following the join table in the 2nd network, and this part of the network architecture is the same for both. But the results are more varied in the first network, and there are lots of similar pictures in the 2nd output.

I have observed this on other datasets too. According to my understanding, after training the DCGAN, the generator learns a mapping from Z-vector space to images. Is there a possibility of the generator learning only a "certain set" of images (which are not necessarily in the training set, so there's no overfitting) for the whole Z-vector distribution, and outputting only those images for various Z-vector inputs? It would be great if anyone could shed some light on why this might happen.

Thanks!

Training seems to stop after the first epoch

Exactly as the title says: while training, the display updates every ten counter steps for the first epoch, but for every epoch following, only the real images update. The Err_G also seems to change from being quite erratic in the first epoch to mostly uniform in epochs > 1.

I didn't have this problem on my Mac training with CPU, but on Ubuntu training with either CPU or GPU this happens every time.

FWIW I'm working with a clean clone of the repo.

~/dcgan.torch$ gpu=1 display_id=40 DATA_ROOT=/media/aferriss/SHARED/myImages dataset=folder th main.lua

Full log: https://gist.github.com/soumith/d8861ada490c53ea666b

Handle file errors without crashing

Not all file corpora are flawless; sometimes files are empty, or the suffix doesn't match the format, or they get deleted during the run, etc. Since dcgan.torch assumes file reads will succeed without any problem, it will crash if anything is amiss with any of the thousands or millions of files it may read.

This can be fixed by checking the read error status and skipping images that fail, with a warning message (and an additional verbose option to print the exact filename of the offending file, since getByClass doesn't propagate its randomly chosen file upwards).


FeepingCreature provided a patch implementing that in data/dataset.lua, which we've been using without any problem for several days now:

diff --git a/data/dataset.lua b/data/dataset.lua
index 0d39e27..a9d28eb 100644
--- a/data/dataset.lua
+++ b/data/dataset.lua
@@ -232,7 +232,6 @@ function dataset:__init(...)
       end
       runningIndex = runningIndex + length
    end
-
    --==========================================================================
    -- clean up temporary files
    print('Cleaning up temporary files')
@@ -313,6 +312,7 @@ end
 function dataset:getByClass(class)
    local index = math.ceil(torch.uniform() * self.classListSample[class]:nElement())
    local imgpath = ffi.string(torch.data(self.imagePath[self.classListSample[class][index]]))
+   if self.verbose then print('Image path: ' .. imgpath) end
    return self:sampleHookTrain(imgpath)
 end

@@ -322,7 +322,7 @@ local function tableToOutput(self, dataTable, scalarTable)
    local quantity = #scalarTable
    assert(dataTable[1]:dim() == 3)
    data = torch.Tensor(quantity,
-                      self.sampleSize[1], self.sampleSize[2], self.sampleSize[3])
+                       self.sampleSize[1], self.sampleSize[2], self.sampleSize[3])
    scalarLabels = torch.LongTensor(quantity):fill(-1111)
    for i=1,#dataTable do
       data[i]:copy(dataTable[i])
@@ -336,11 +336,15 @@ function dataset:sample(quantity)
    assert(quantity)
    local dataTable = {}
    local scalarTable = {}
-   for i=1,quantity do
+   while table.getn(dataTable)<quantity do
       local class = torch.random(1, #self.classes)
-      local out = self:getByClass(class)
-      table.insert(dataTable, out)
-      table.insert(scalarTable, class)
+      local success, out = pcall(function() return self:getByClass(class) end)
+      if success then
+         table.insert(dataTable, out)
+         table.insert(scalarTable, class)
+      else
+         print("failed to get an instance of "..class)
+      end
    end
    local data, scalarLabels = tableToOutput(self, dataTable, scalarTable)
    return data, scalarLabels

A problem I faced when training the network. I don't know whether it is a problem with the Torch install.

{
fineSize : 64
dataset : "folder"
batchSize : 64
nThreads : 4
noise : "normal"
niter : 25
nz : 100
gpu : 1
name : "experiment1"
display_id : 10
display : 1
lr : 0.0002
ngf : 64
ndf : 64
beta1 : 0.5
loadSize : 96
ntrain : inf
}
Random Seed: 2937
Starting donkey with id: 4 seed: 2941
table: 0x87f62f0
Starting donkey with id: 3 seed: 2940
table: 0xb3e9f4c0
Starting donkey with id: 1 seed: 2938
table: 0xb3d8a240
Starting donkey with id: 2 seed: 2939
table: 0xb3c42e78
Creating train metadata
table: 0x89ee6d8
Creating train metadata
table: 0xb38abcb8
Creating train metadata
table: 0xb342b968
Creating train metadata
table: 0xb3c9de38
/mnt/hgfs/vmpublic/DCGAN/torch/install/bin/lua: ...ic/DCGAN/torch/install/share/lua/5.2/threads/threads.lua:183: [thread 4 callback] .../hgfs/vmpublic/DCGAN/dcgan.torch-master/data/dataset.lua:139: attempt to index global 'jit' (a nil value)
stack traceback:
.../hgfs/vmpublic/DCGAN/dcgan.torch-master/data/dataset.lua:139: in function '__init'
...mpublic/DCGAN/torch/install/share/lua/5.2/torch/init.lua:91: in function <...mpublic/DCGAN/torch/install/share/lua/5.2/torch/init.lua:87>
[C]: in function 'dataLoader'
...vmpublic/DCGAN/dcgan.torch-master/data/donkey_folder.lua:82: in main chunk
[C]: in function 'dofile'
...mpublic/DCGAN/torch/install/share/lua/5.2/paths/init.lua:84: in function 'dofile'
/mnt/hgfs/vmpublic/DCGAN/dcgan.torch-master/data/data.lua:42: in function </mnt/hgfs/vmpublic/DCGAN/dcgan.torch-master/data/data.lua:32>
(...tail calls...)
[C]: in function 'xpcall'
...ic/DCGAN/torch/install/share/lua/5.2/threads/threads.lua:234: in function 'callback'
...blic/DCGAN/torch/install/share/lua/5.2/threads/queue.lua:65: in function <...blic/DCGAN/torch/install/share/lua/5.2/threads/queue.lua:41>
[C]: in function 'pcall'
...blic/DCGAN/torch/install/share/lua/5.2/threads/queue.lua:40: in function 'dojob'
[string " local Queue = require 'threads.queue'..."]:13: in main chunk
stack traceback:
[C]: in function 'error'
...ic/DCGAN/torch/install/share/lua/5.2/threads/threads.lua:183: in function 'dojob'
...ic/DCGAN/torch/install/share/lua/5.2/threads/threads.lua:264: in function 'synchronize'
...ic/DCGAN/torch/install/share/lua/5.2/threads/threads.lua:142: in function 'specific'
...ic/DCGAN/torch/install/share/lua/5.2/threads/threads.lua:125: in function <...ic/DCGAN/torch/install/share/lua/5.2/threads/threads.lua:36>
(...tail calls...)
/mnt/hgfs/vmpublic/DCGAN/dcgan.torch-master/data/data.lua:30: in function 'new'
main.lua:38: in main chunk
[C]: in function 'dofile'
...CGAN/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: in ?

Error when running main.lua with gpu=1

Like the title says, I get the error below when running main.lua. I definitely have cuDNN installed (works with Theano just fine), and the code works with gpu=0. Any ideas?

$ DATA_ROOT=myimages dataset=folder th main.lua
{
  ntrain : inf
  beta1 : 0.5
  name : "experiment1"
  niter : 25
  batchSize : 64
  ndf : 64
  fineSize : 64
  nz : 100
  loadSize : 96
  gpu : 1
  ngf : 64
  dataset : "folder"
  lr : 0.0002
  noise : "normal"
  nThreads : 4
  display_id : 10
  display : 1
}
Random Seed: 2675   
Starting donkey with id: 4 seed: 2679
table: 0x0fa4c738
Starting donkey with id: 1 seed: 2676
table: 0x0fa6c4d8
Starting donkey with id: 3 seed: 2678
table: 0x0fa8c110
Starting donkey with id: 2 seed: 2677
table: 0x0facc670
Loading train metadata from cache
Loading train metadata from cache
Loading train metadata from cache
Loading train metadata from cache
Dataset: folder  Size:  498 
/Users/adamferriss/torch/install/bin/luajit: ...adamferriss/torch/install/share/lua/5.1/nn/LeakyReLU.lua:24: attempt to call field 'LeakyReLU_updateOutput' (a nil value)
stack traceback:
    ...adamferriss/torch/install/share/lua/5.1/nn/LeakyReLU.lua:24: in function 'updateOutput'
    ...damferriss/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
    main.lua:160: in function 'opfunc'
    ...s/adamferriss/torch/install/share/lua/5.1/optim/adam.lua:33: in function 'adam'
    main.lua:214: in main chunk
    [C]: in function 'dofile'
    ...riss/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
    [C]: at 0x010ef19770

Visualizing Filters

How would one go about visualizing the filters of this network? I tried doing something like

net = util.load(opt.net, opt.gpu)
filters = net:get(13).weight
image.save("filters.png", image.toDisplayTensor{input=filters})

The output is super tiny on this layer though, 24x44 pixels.
[24x44 filter image]

How can I visualize more of the learned filters from other layers of the network? It looks like the conv layers are 1, 4, 7, 10, and 13.

Sampling from the other layers seems to just output noise. It also looks like most of the other examples for visualizing conv-layer filters use the itorch image function, and I'd like to do it outside the interactive notebook if possible.
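A minimal sketch of dumping each convolution layer's kernels to image files outside of iTorch (assuming the layer indices listed above and a weight layout of inPlanes x outPlanes x kH x kW):

require 'image'
for _, i in ipairs({1, 4, 7, 10, 13}) do
    local w = net:get(i).weight:float()
    -- tile every (inPlane, outPlane) kernel as its own grayscale patch
    local tiled = w:view(w:size(1) * w:size(2), 1, w:size(3), w:size(4))
    image.save(('filters_layer%d.png'):format(i),
               image.toDisplayTensor{input=tiled, padding=1})
end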

minmax or maxmin

I am a bit confused: is there any difference between the formulations min_G max_D and max_D min_G in a GAN?
Or will these two actually be equal after iteratively updating G and D?

Use of sampleHookTest

Hi,

The description in the code says that it is applied during testing. Can anyone please clarify what test setting this refers to?

Thanks!

Selecting Best Generative Model

Hello, thanks a lot for helping the community with your code!

I'm training the GAN with ~4000 grayscale images of faces for 250 epochs and saving the network every 10 epochs. I am, however, having trouble figuring out how to select the best network to use for generating new images.

Would the sum of all errG within one epoch be a good score for how good the generative network is performing at that point during training?

Thanks a lot again!

Forward pass: on 1 vector vs on a batch of vectors

Hi,

After training the network with the Celeb faces dataset, a forward pass of 10 noise vectors through the generator gives decent results like those below:
[10 generated face samples]
But when I passed the same vector replicated as a batch of 10 to the trained generator, it gave the following:
[generated samples missing low frequencies]
It looks like low frequencies are not present. Similar results were obtained when a single vector was passed.
I think I am missing something. Is the forward pass on the trained generator affected by the number and kind of Z vectors passed?

Any insight is appreciated. Thanks!

How to visualize the discriminator features?

The paper "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks" presents results of visualizing the discriminator features. How can this be done using Torch or TensorFlow?

I'd welcome a discussion.

could not find any image file in the given input paths

$ ls
arithmetic.lua cache cohnExtent data generate.lua images INSTALL.md LICENSE.md main.lua PATENTS README.md
$ DATA_ROOT=cohnExtent dataset=folder th main.lua

In the root directory, I created the folder cohnExtent and put some images in it. But when I run the command, I get errors like:
$/torch/install/bin/luajit: ...e/$/torch/install/share/lua/5.1/threads/threads.lua:255:
[thread 2 callback] $/dcgan.torch/data/dataset.lua:202: Could not find any image file in the given input paths
[thread 4 callback] $/dcgan.torch/data/dataset.lua:202: Could not find any image file in the given input paths
[thread 1 callback] $/dcgan.torch/data/dataset.lua:202: Could not find any image file in the given input paths
[thread 3 callback]$/dcgan.torch/data/dataset.lua:202: Could not find any image file in the given input paths

Wrong parameters for updateGradInput

Hi,

I'm attempting to use the GAN framework in a different setting than images and I was using this code as a reference. I noticed something odd: When we call optim.adam on fDx, the parameters in netD get updated, but we use the output with respect to the original parameters when we call it on fGx. Shouldn't we call forward on netD inside fGx rather than recycle the previous output so that both output and gradInput are computed with respect to the same parameters?

Thanks,
Shawn

unknown Torch class <torch.CudaTensor>

Hey - having some issues getting this running, would greatly appreciate some other eyes on this as I'm still young in my deep learning understanding.

It appears that I'm successfully able to generate checkpoints using
DATA_ROOT=500 dataset=folder gpu=1 th main.lua

I'm running a GTX 1080 with CUDA 8.0, and I get a folder full of checkpoints.

But when I try to run generate.lua, I get
"unknown Torch class <torch.CudaTensor>" errors.

Any thoughts? Here is what I'm seeing. Any tips would be majorly appreciated!

gpu=1 net=checkpoints/experiment1_10_net_G.t7 th generate.lua
{
  gpu : 1
  noisemode : "random"
  name : "generation1"
  noisetype : "normal"
  batchSize : 32
  net : "checkpoints/experiment1_10_net_G.t7"
  imsize : 1
  nz : 100
  display : 1
}
/root/torch/install/bin/luajit: /root/torch/install/share/lua/5.1/torch/File.lua:343: unknown Torch class <torch.CudaTensor>
stack traceback:
    [C]: in function 'error'
    /root/torch/install/share/lua/5.1/torch/File.lua:343: in function 'readObject'
    /root/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
    /root/torch/install/share/lua/5.1/nn/Module.lua:158: in function 'read'
    /root/torch/install/share/lua/5.1/torch/File.lua:351: in function 'readObject'
    /root/torch/install/share/lua/5.1/torch/File.lua:409: in function 'load'
    generate.lua:24: in main chunk
    [C]: in function 'dofile'
    /root/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x00406670
