newmu / dcgan_code
Deep Convolutional Generative Adversarial Networks
License: MIT License
Awesome project!!
It would be great if you could specify a license for your code by adding a LICENSE file to the root of the git repo. If you're unfamiliar with source code licensing, check out http://choosealicense.com/
(shameless plug -- I'm a fan of the "GPL, version 2 or later" license because, in the terms of http://choosealicense.com/ I "care about sharing improvements".)
what is the best way to make this work effectively on binary data?
i'm assuming the network won't learn to output binary on its own, even if all the inputs are binary. trivially you can just threshold each output value at 0.5, but is there a better way to do this? i'm hoping to take advantage of having only two states to get away with using a larger input vector.
How can I add my picture in Arithmetic on faces in part three
Hi, recently I've been studying your code, especially the conditional DCGAN you made for the MNIST dataset.
I see that you concatenated the condition on every layer right after BatchNorm and ReLU, but I am still puzzled by the conv_cond_concat function that you use to concatenate the condition into a hidden layer. On some layers you simply use T.concatenate to join them, but on other layers you join them using the conv_cond_concat function shown below:
def conv_cond_concat(x, y):
    """
    concatenate conditioning vector on feature map axis
    """
    return T.concatenate([x, y*T.ones((x.shape[0], y.shape[1], x.shape[2], x.shape[3]))], axis=1)
My questions are,
T.concatenate?
y, I assume you are depth-concatenating it. Am I correct?
In the mnist training code, I get the following traceback. It looks like a CUDA versioning issue. Do I need to upgrade CUDA, or is there some way around this?
gX = gen(Z, Y, *gen_params)
Traceback (most recent call last):
  File "", line 1, in
  File "", line 9, in gen
  File "lib/ops.py", line 90, in deconv
    img = gpu_contiguous(X)
  File "/Users/gene/anaconda/lib/python2.7/site-packages/theano/gof/op.py", line 509, in __call__
    node = self.make_node(*inputs, **kwargs)
  File "/Users/gene/anaconda/lib/python2.7/site-packages/theano/sandbox/cuda/basic_ops.py", line 3806, in make_node
    input = as_cuda_ndarray_variable(input)
  File "/Users/gene/anaconda/lib/python2.7/site-packages/theano/sandbox/cuda/basic_ops.py", line 47, in as_cuda_ndarray_variable
    return gpu_from_host(tensor_x)
  File "/Users/gene/anaconda/lib/python2.7/site-packages/theano/gof/op.py", line 509, in __call__
    node = self.make_node(*inputs, **kwargs)
  File "/Users/gene/anaconda/lib/python2.7/site-packages/theano/sandbox/cuda/basic_ops.py", line 139, in make_node
    dtype=x.dtype)()])
  File "/Users/gene/anaconda/lib/python2.7/site-packages/theano/sandbox/cuda/type.py", line 70, in __init__
    (self.__class__.__name__, dtype, name))
TypeError: CudaNdarrayType only supports dtype float32 for now. Tried using dtype float64 for variable None
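The last line of the traceback points at a dtype mismatch rather than a CUDA version problem: a float64 value reached a GPU op that only accepts float32. One possible way around it, sketched below under the assumption that the inputs are NumPy arrays, is to cast them before they enter the graph. The helper name to_float32 is made up for this illustration. Running Theano with THEANO_FLAGS=floatX=float32 may also be needed so that symbolic variables default to float32.

```python
import numpy as np


def to_float32(x):
    """Hypothetical helper: return x as a float32 NumPy array."""
    return np.asarray(x, dtype=np.float32)


# NumPy creates float64 arrays by default, which the error message rejects.
z = np.random.uniform(-1.0, 1.0, size=(4, 100))
z32 = to_float32(z)
print(z.dtype, "->", z32.dtype)
```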
I can't find the code to download the data.
I need help. Thanks.
It would be nice if the README contained some tips on how to detect and avoid overfitting.
I'm running the system and I believe I am overfitting.
So here are my tips, which need editing before I put them in any README because I'm sure I don't understand the entire picture:
0 55.06 53.97 53.16 4.0025 0.2856
1 58.91 57.44 56.40 4.7697 0.7055
2 57.16 54.62 52.88 1.7402 0.4073
3 58.61 55.97 54.14 2.1908 0.3683
4 57.77 54.55 52.74 2.6172 0.3062
5 53.55 51.07 49.17 3.7945 0.1111
6 56.28 52.57 50.30 5.4140 0.1525
7 57.81 53.84 51.49 5.6486 0.1883
8 56.94 53.39 51.10 6.3922 0.0688
9 59.04 55.49 53.08 3.5038 0.3072
10 55.08 51.79 49.73 4.6309 0.0904
11 56.03 52.80 50.83 3.6019 0.1094
12 55.67 52.30 50.22 5.2213 0.3286
13 55.65 52.55 50.65 4.2390 0.2232
this will allow usage of fuel built in data-sets
Hi, new to Python so I'm just poking around, and the initial setup could use some help in the form of a requirements.txt file. Would love a pip freeze > requirements.txt if you've got a virtual env with the right deps.
Awesome project.
I'm trying to create my own dataset. What's the order of the dimensions of the input tensor? Is it (batch_size, channels, height, width), (batch_size, height, width, channels), (batch_size, height * width * channels, 1), or something different? I'm having a lot of trouble creating the hdf5 file. Anyone else have any luck?
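Theano's convolution ops work on tensors laid out as (batch_size, channels, height, width), while decoded image files usually arrive as (height, width, channels). A minimal sketch of the conversion follows, assuming images are stored as uint8 in (batch, height, width, channels) order and that the network wants values in [-1, 1]. Both the storage order and the scaling are assumptions here, not something confirmed by the repo, and hwc_to_chw is a name invented for the example.

```python
import numpy as np


def hwc_to_chw(batch):
    """Reorder (batch, height, width, channels) to (batch, channels, height, width)."""
    return np.transpose(batch, (0, 3, 1, 2))


# Eight fake 64x64 RGB images with the first channel fully saturated.
images = np.zeros((8, 64, 64, 3), dtype=np.uint8)
images[:, :, :, 0] = 255

# Scale uint8 [0, 255] to float32 [-1, 1] after reordering the axes.
net_input = hwc_to_chw(images).astype(np.float32) / 127.5 - 1.0
print(net_input.shape)
```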
Hello, thank you very much for making public this great project.
I was going through your code and ran into a point that was not quite clear to me.
In the "deconv" function in lib/ops.py, lines 92 and 95, you use the op that calculates the gradient with respect to the input. I checked the counterpart in the Torch version, and it was implemented using a regular convolution layer there.
Why is GpuDnnConvGradI used?
Thank you again for this great source code.
-Taeksoo
Would be a great help if you could link the album cover art dataset you used to train the network on. Thanks very much
I'm studying this code and would like to know if my understanding is correct.
Generator:
layer | gifn | gain_ifn | bias_ifn |
---|---|---|---|
1 | (100, 128 * 8 * 4 * 4) | 128 * 8 * 4 * 4 | 128 * 8 * 4 * 4 |
2 | (1024, 512, 5, 5) | 512 | 512 |
3 | (512, 256, 5, 5) | 256 | 256 |
4 | (256, 128, 5, 5) | 128 | 128 |
5 | (128, 64, 5, 5) | 64 | 64 |
6 | (64, 32, 5, 5) | 32 | 32 |
output | (32, 3, 5, 5) | ---- | ---- |
Discriminator:
layer | gifn | gain_ifn | bias_ifn |
---|---|---|---|
1 | (32, 3, 5, 5) | ---- | ---- |
2 | (64, 32, 5, 5) | 64 | 64 |
3 | (128, 64, 5, 5) | 128 | 128 |
4 | (256, 128, 5, 5) | 256 | 256 |
5 | (512, 256, 5, 5) | 512 | 512 |
6 | (1024, 512, 5, 5) | 1024 | 1024 |
output | (128 * 8 * 4 * 4, 1) | ---- | ---- |
Are these dimensions correct for the ImageNet model?
Bug: There is no source code for the model.
Expected Result: For source code to be released, since the project is titled dcgan_code :)
How to reproduce:
michael@halifax ~> git clone https://github.com/Newmu/dcgan_code
Cloning into 'dcgan_code'...
remote: Counting objects: 46, done.
remote: Compressing objects: 100% (39/39), done.
remote: Total 46 (delta 5), reused 45 (delta 4), pack-reused 0
Unpacking objects: 100% (46/46), done.
Checking connectivity... done.
michael@halifax ~> cd dcgan_code/
michael@halifax ~/dcgan_code> find . -iname '*.lua' | wc -l
0
Is there a schedule to release any code? Many thanks.
It's amazing work.
Could you tell me how to train it on my own datasets?
In looking at manifolds in my own experiments, I have noticed a consistent "dead zone" near the origin of the latent space. Here is an example generated with faces/train_uncond_dcgan.py and z=100:
I can post the math later, but suffice to say that the area near the center of the image is proportionally near zero in all z dimensions.
My strong suspicion is that this could be replicated by replacing this line in train_uncond_dcgan.py:
sample_zmb = floatX(np_rng.uniform(-1., 1., size=(nvis, nz)))
with
sample_zmb = floatX(np_rng.uniform(-0.1, 0.1, size=(nvis, nz)))
and seeing if this results in poor quality samples. I can follow up and try this if that is useful. I haven't done so yet because I first need to implement the load operation to use one of the models that is being saved each epoch.
This isn't causing me any consternation, but I thought I would mention it since it's an unexpected curiosity and so might be a bug or might just be something I don't understand about the nature of this latent space.
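One possible explanation, offered as a sketch rather than a diagnosis: in 100 dimensions, uniform samples concentrate in a shell far from the origin, so the generator may rarely see points near z = 0 during training. The snippet below only measures the typical Euclidean norm of such samples with NumPy. It does not load or run the trained model, and mean_norm is a name made up for the example.

```python
import numpy as np

rng = np.random.RandomState(0)
nz = 100  # latent dimensionality mentioned in the report


def mean_norm(low, high, n=10000):
    """Average Euclidean norm of n latent vectors drawn uniformly from [low, high]."""
    z = rng.uniform(low, high, size=(n, nz))
    return np.linalg.norm(z, axis=1).mean()


wide = mean_norm(-1.0, 1.0)     # range used for sample_zmb in the script
narrow = mean_norm(-0.1, 0.1)   # range proposed for the experiment
print(wide, narrow)
```

Each coordinate of a uniform(-1, 1) sample has second moment 1/3, so the norm sits near sqrt(100/3), roughly 5.8, and samples from the narrow range sit about ten times closer to the origin.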
Would it be possible to upload the parameters for the model trained on faces?
In the same manner that the parameters for imagenet and svhn have been uploaded?
Thanks!
I'm getting an import error on gpu_alloc_empty in the following:
from theano.sandbox.cuda.basic_ops import (as_cuda_ndarray_variable,
                                           host_from_gpu,
                                           gpu_contiguous, HostFromGpu,
                                           gpu_alloc_empty)
The other names import fine by themselves; only gpu_alloc_empty fails. I have cuDNN installed and just reinstalled Theano with pip.
I don't have a GPU; how can I use this with a CPU? I also want to know which part of your code covers the "discriminator as a pre-trained net for CIFAR-10 classification". Thanks
@Newmu, can you help me with sample code for this part? I'm asking about the features of size 28672: how can I get this size, and how do I extract these features for every image?
Thanks in advance.
The code from load.py raises the error "total size of new array must be unchanged". It just loads the MNIST dataset into an array and then reshapes it. The error occurs during the reshape (on the 3rd line), as shown below:
fd = open('C:\\Users\\***\\Desktop\\MNIST Dataset\\train-images.idx3-ubyte.gz')
loaded = np.fromfile(file=fd,dtype=np.uint8)
---> trX = loaded[16:].reshape((60000,28*28)).astype(float)
ValueError: total size of new array must be unchanged
I know what reshape does; I just can't figure out how to resolve this error. I have tried many things, but nothing has worked. Can anyone suggest a solution?
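A likely cause, judging only from the path in the snippet: the file still has its .gz extension, so np.fromfile reads the compressed bytes and the array does not contain 60000 * 784 pixel values. The sketch below decompresses first and takes the image count from the IDX header instead of hard-coding it. The function name load_idx_images and the tiny synthetic file are illustrative and not part of the repo.

```python
import gzip
import os
import struct
import tempfile

import numpy as np


def load_idx_images(path):
    """Read an IDX3 image file (gzip-compressed or not) into an (n, rows*cols) float array."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as fd:  # binary mode also matters on Windows
        raw = fd.read()
    # 16-byte big-endian header: magic number, image count, rows, columns.
    magic, n, rows, cols = struct.unpack(">IIII", raw[:16])
    data = np.frombuffer(raw, dtype=np.uint8, offset=16)
    return data.reshape((n, rows * cols)).astype(float)


# Round trip on a tiny synthetic file so the example runs without MNIST.
tmp_path = os.path.join(tempfile.mkdtemp(), "tiny-images.idx3-ubyte.gz")
pixels = (np.arange(5 * 28 * 28) % 256).astype(np.uint8)
with gzip.open(tmp_path, "wb") as fd:
    fd.write(struct.pack(">IIII", 2051, 5, 28, 28) + pixels.tobytes())

trX = load_idx_images(tmp_path)
print(trX.shape)
```

Alternatively, decompressing the download once (for example with gunzip) and pointing the original code at the uncompressed file should have the same effect.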
Hello guys!
I got this error when I ran the faces script train_uncond_dcgan.py, and this is its traceback:
"
Traceback (most recent call last):
  File "train_uncond_dcgan.py", line 52, in
    tr_data, te_data, tr_stream, val_stream, te_stream = faces(ntrain=ntrain)
  File "/home/jerry/dcgan_code/faces/load.py", line 14, in faces
    te_data = H5PYDataset(path, which_sets=('test',))
  File "/home/jerry/anaconda2/lib/python2.7/site-packages/fuel-0.2.0-py2.7-linux-x86_64.egg/fuel/datasets/hdf5.py", line 197, in __init__
    self.num_examples)
  File "/home/jerry/anaconda2/lib/python2.7/site-packages/fuel-0.2.0-py2.7-linux-x86_64.egg/fuel/datasets/hdf5.py", line 504, in num_examples
    return self.subsets[0].num_examples
  File "/home/jerry/anaconda2/lib/python2.7/site-packages/fuel-0.2.0-py2.7-linux-x86_64.egg/fuel/utils/__init__.py", line 441, in lazy_property_getter
    self.load()
  File "/home/jerry/anaconda2/lib/python2.7/site-packages/fuel-0.2.0-py2.7-linux-x86_64.egg/fuel/datasets/hdf5.py", line 465, in load
    subsets = self.get_subsets(handle, self.which_sets, self.sources)
  File "/home/jerry/anaconda2/lib/python2.7/site-packages/fuel-0.2.0-py2.7-linux-x86_64.egg/fuel/datasets/hdf5.py", line 448, in get_subsets
    slice(row['start'], row['stop']), len(h5file[source]))
  File "/home/jerry/anaconda2/lib/python2.7/site-packages/fuel-0.2.0-py2.7-linux-x86_64.egg/fuel/utils/__init__.py", line 53, in __init__
    self._subset_sanity_check(list_or_slice, original_num_examples)
  File "/home/jerry/anaconda2/lib/python2.7/site-packages/fuel-0.2.0-py2.7-linux-x86_64.egg/fuel/utils/__init__.py", line 313, in _subset_sanity_check
    self._slice_subset_sanity_check(list_or_slice, num_examples)
  File "/home/jerry/anaconda2/lib/python2.7/site-packages/fuel-0.2.0-py2.7-linux-x86_64.egg/fuel/utils/__init__.py", line 338, in _slice_subset_sanity_check
    raise ValueError('Subset instances cannot be defined by a slice '
ValueError: Subset instances cannot be defined by a slice whose start value is greater than or equal to the original number of examples
"
I don't understand what kind of problem this is.
Can anybody help me solve it? PLEASE!!!
My hdf5 file is built from 128 JPEG face photos of size 64*64*3.
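The error message says that a split's start index is at or beyond the number of rows actually stored, which can happen when the split boundaries written into the HDF5 file were chosen for a much larger dataset than 128 images. Below is a small sketch of a sanity check that could be run before writing the split. The helper check_split and the boundary numbers are hypothetical and are not taken from the repo or from fuel.

```python
def check_split(split, n_examples):
    """Hypothetical helper: list problems with (start, stop) ranges for n_examples rows."""
    problems = []
    for name, (start, stop) in sorted(split.items()):
        if start >= n_examples:
            problems.append("%s starts at %d but only %d examples exist"
                            % (name, start, n_examples))
        elif stop > n_examples:
            problems.append("%s stops at %d but only %d examples exist"
                            % (name, stop, n_examples))
    return problems


# Illustrative boundaries that are too large for a 128-image file.
too_big = {"train": (0, 1000), "test": (1000, 1200)}
for line in check_split(too_big, 128):
    print(line)

# Boundaries that fit inside 128 examples produce no complaints.
fits = {"train": (0, 100), "test": (100, 128)}
print(check_split(fits, 128))
```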
Hi, after playing with your code and getting very good results with it as it is, I am now looking to try other architectures and modify it. I want to try a model where there is no 0 padding and border_mode="valid" , instead of the current (2,2)
I have modified the deconv function from lib/ops.py to have the proper dimensions for the valid border mode, now called deconvV :
def deconvV(X, w, subsample=(1, 1), border_mode=(0, 0), conv_mode='conv'):
    img = gpu_contiguous(X)
    kerns = gpu_contiguous(w)
    desc = GpuDnnConvDesc(border_mode=border_mode, subsample=subsample,
                          conv_mode=conv_mode)(gpu_alloc_empty(img.shape[0], kerns.shape[1], (img.shape[2]-1)*subsample[0]+kerns.shape[2], (img.shape[3]-1)*subsample[1]+kerns.shape[3]).shape, kerns.shape)
    out = gpu_alloc_empty(img.shape[0], kerns.shape[1], (img.shape[2]-1)*subsample[0]+kerns.shape[2], (img.shape[3]-1)*subsample[1]+kerns.shape[3])
    d_img = GpuDnnConvGradI()(kerns, img, out, desc)
    return d_img
The code works with this operation, but I wonder whether it is correct. GpuDnnConvGradI and GpuDnnConvDesc do not give an error even if I pass other values for the sizes, so I can never be sure whether I have a bug there or not.
I have also replaced the respective border mode in both the discriminator and the generator, and taken care of all model dimensions so that it works. However, it runs for many iterations and sometimes looks as if it has learned something, but afterwards it just produces noise. Could I have made an error in my implementation of the deconv dimensions?
Another explanation could be that the resulting architecture (without zero padding) is much more unstable with respect to the collapse of the generator, as described by Tim Salimans in "Improved Techniques for Training GANs".
thanks a lot for your help
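The size arithmetic in deconvV can be checked on its own, without a GPU. The sketch below uses the standard strided-convolution size formula and confirms that (m - 1) * stride + kernel is an input size that a valid convolution maps back to m. It also shows that a neighbouring size maps to the same m because of the floor division, which may be why no error appears for other sizes. The function names are made up for the example, and it says nothing about training stability.

```python
def conv_out_size(n, kernel, stride, pad):
    """Output length of a strided convolution along one axis."""
    return (n + 2 * pad - kernel) // stride + 1


def deconv_out_size_valid(m, kernel, stride):
    """Size used in deconvV: an input length that a valid conv maps to m."""
    return (m - 1) * stride + kernel


kernel, stride = 5, 2
for m in (4, 8, 16):
    n = deconv_out_size_valid(m, kernel, stride)
    # The forward convolution must map the enlarged size back to m.
    assert conv_out_size(n, kernel, stride, pad=0) == m
    # n + 1 maps to m as well, so the output shape is a choice, not unique.
    print(m, "->", n, conv_out_size(n + 1, kernel, stride, pad=0) == m)
```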
I see this in ops.py:
def conv_cond_concat(x, y):
    """
    concatenate conditioning vector on feature map axis
    """
    return T.concatenate([x, y*T.ones((x.shape[0], y.shape[1], x.shape[2], x.shape[3]))], axis=1)
Does it really mean y.shape[1], and not x.shape[1]?
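A NumPy analogue may help to see why y.shape[1] appears there. It assumes the conditioning tensor y has shape (batch, ny, 1, 1), for example a one-hot label reshaped for broadcasting. That shape is an assumption of this sketch, and conv_cond_concat_np is not a function from the repo.

```python
import numpy as np


def conv_cond_concat_np(x, y):
    """NumPy analogue of conv_cond_concat; y is assumed to be (batch, ny, 1, 1)."""
    # Broadcast y over the spatial axes of x, keeping y's own channel count.
    tiled = y * np.ones((x.shape[0], y.shape[1], x.shape[2], x.shape[3]), dtype=x.dtype)
    # Stack the tiled condition after the feature maps along the channel axis.
    return np.concatenate([x, tiled], axis=1)


x = np.zeros((2, 64, 7, 7), dtype=np.float32)                   # feature maps
y = np.eye(10, dtype=np.float32)[[3, 7]].reshape(2, 10, 1, 1)   # one-hot labels 3 and 7
out = conv_cond_concat_np(x, y)
print(out.shape)  # (2, 74, 7, 7)
```

Under that assumption the ones tensor needs ny channels so that each label entry becomes one constant feature map; using x.shape[1] would only broadcast if it happened to equal ny.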
Given a set of pictures, it's unclear how to create a correctly formatted dataset. I understand if the pictures used cannot be made available, but could you please share whatever script or method you used to create the hdf5 dataset used as an input?
Thanks.
I trained dcgan on my own dataset,
and now i want to use this net for image retrieval
now i have the generator network, that takes a length 100 encoding and transform it to an image
is there a simple way to reverse this process, so that i can give an image and get a length 100 encoding?
i tried to reverse the net like so
def gen_inv(X, w, g, b, w2, g2, b2, w3, g3, b3, w4, g4, b4, wx):
    x = dnn_conv(X, wx, subsample=(2, 2), border_mode=(2, 2))
    x = relu(batchnorm(x, g=g4, b=b4))
    x = dnn_conv(x, w4, subsample=(2, 2), border_mode=(2, 2))
    h3 = relu(batchnorm(x, g=g3, b=b3))
    h3 = dnn_conv(h3, w3, subsample=(2, 2), border_mode=(2, 2))
    h2 = relu(batchnorm(h3, g=g2, b=b2))
    h2 = dnn_conv(h2, w2, subsample=(2, 2), border_mode=(2, 2))
    h2 = T.flatten(h2,2)
    h = relu(batchnorm(h2, g=g, b=b))
    h = T.dot(h, w.T)
    return h
but got that for no matter what the input is, i get the same output
so i figured that batchnorm is in training mode, so i tried to remove batchnorm as well
but still got very weird results
any advice?
cheers,
SH
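Running the generator's layers backwards with the same weights is not generally an inverse, since ReLU and batchnorm discard information. A different technique that is sometimes used instead is to keep the generator fixed and search for a code z by gradient descent on the reconstruction error. The sketch below illustrates that idea on a toy one-layer tanh "generator" in NumPy. The toy model, step size, and iteration count are all assumptions, and this is not the DCGAN generator.

```python
import numpy as np

rng = np.random.RandomState(0)
nz, nx = 8, 64                            # toy latent and output sizes
W = rng.randn(nx, nz) / np.sqrt(nz)       # fixed toy generator weights


def toy_gen(z):
    """Toy stand-in for a generator: a single tanh layer."""
    return np.tanh(W.dot(z))


def recover_z(x, steps=1000, lr=0.02):
    """Gradient descent on 0.5 * ||toy_gen(z) - x||^2 with respect to z."""
    z = np.zeros(nz)
    losses = []
    for _ in range(steps):
        out = toy_gen(z)
        resid = out - x
        losses.append(0.5 * np.sum(resid ** 2))
        # Chain rule through tanh: d tanh(u) / du = 1 - tanh(u)^2.
        grad = W.T.dot(resid * (1.0 - out ** 2))
        z -= lr * grad
    return z, losses


z_true = rng.randn(nz)
z_hat, losses = recover_z(toy_gen(z_true))
print(losses[0], losses[-1])
```

With a real generator the gradient would come from automatic differentiation (T.grad in Theano), and the recovered code is only as good as the local optimum the search reaches.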
This is more of a doubt than an issue.
Why don't you add batch norm to the last layer of your discriminator and generator?
Hi,
I'm interested in section 6.2, VISUALIZING THE DISCRIMINATOR FEATURES, of the DCGAN paper.
I'm not sure I understand this part, and I failed to implement it.
I referred to #13 on the following page:
https://github.com/Hvass-Labs/TensorFlow-Tutorials
Please give me some tips or implementation code (any code is fine).
Thanks in advance.
Hi Alec,
I read from the Indico blog that you guys use docker a lot, just wondering if you have a docker copy of DCGAN handy to share? Thanks! Really cool work by the way!
Xinxin
referring to https://github.com/Newmu/dcgan_code/blob/master/faces/train_uncond_dcgan.py#L118
I tried adding my own bias just before the lrelu, but it is hard to tell if it helps:
b = bias_ifn((ndf), 'db')
...
h = lrelu(dnn_conv(X, w, subsample=(2, 2), border_mode=(2, 2))+b.dimshuffle('x', 0, 'x', 'x'))
Could someone tell me how to generate a single sample? For a single input, batchnorm changes it to [0, 0, 0...], so generated images are always the same no matter what the input is.
I am using the DCGAN code and am pretty happy with the results. However, I am curious: shouldn't the Batch Normalization operation be treated in a special way when doing inference (after training is completed)?
i) the original BatchNorm paper mentions that we need to freeze the mean and variance when doing inference with the model https://arxiv.org/pdf/1502.03167v3.pdf , algorithm 2
ii) the DCGAN code does not fix the batch statistics, so when we generate new samples with the _gen function it seems we calculate the batch norm statistics on the fly. This still works and produces nice images, to my surprise
iii) now here is a case when it does not work: start with a black image X and optimize it with respect to the discriminator function to make it close to the "true" images. With a few iterations of gradient descent I can get an image X which is predicted as 1 (true), but it still looks pretty much black. So basically, the discriminator seems to be pretty bad in that case, even though the images I can generate are quite good. My guess would be that the batch normalization fails in that case, since the statistics of the single black image are totally different from the statistics of a proper random minibatch.
iv) has anyone implemented a fixing of the minibatch statistics for inference, as advocated in the original paper? This might be a useful option for the DCGAN code.
v) as a next experiment, I will try to remove batch normalization and train without it, and then see whether my black image experiment works correctly
If anyone has more insights about the use of batch normalization in the DCGAN, it would be really helpful to discuss them, or to get the code for a simple modification of DCGAN that uses a fixed batch normalization operation when doing inference.
thanks a lot
Nikolay
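The difference between on-the-fly and frozen statistics can be illustrated in a few lines of NumPy. This is a sketch of the general idea from the BatchNorm paper and not the repo's batchnorm function: gain and bias are left out, the frozen statistics are taken from a single reference batch instead of a running average, and both function names are invented for the example.

```python
import numpy as np


def batchnorm_batch_stats(x, eps=1e-8):
    """Normalize with the statistics of the batch itself (training-style)."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps), mean, var


def batchnorm_fixed_stats(x, mean, var, eps=1e-8):
    """Normalize with statistics frozen earlier (inference-style)."""
    return (x - mean) / np.sqrt(var + eps)


rng = np.random.RandomState(0)
reference_batch = rng.randn(128, 16) * 2.0 + 3.0
_, mean, var = batchnorm_batch_stats(reference_batch)

# A single example is its own batch mean, so batch statistics erase it.
single = rng.randn(1, 16) * 2.0 + 3.0
collapsed, _, _ = batchnorm_batch_stats(single)
stable = batchnorm_fixed_stats(single, mean, var)
print(np.abs(collapsed).max(), np.abs(stable).max())
```

The same effect would explain why a batch of size one always produces the same output, and it is at least consistent with the single black image behaving differently from a random minibatch.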