I tried to run the train.py to train srgan. But the program terminates since there is

Running train.py, fail to allocate memory.

shensq0814 commented on August 16, 2024

Running train.py, fail to allocate memory.

Comments (7)

tadax commented on August 16, 2024

I think 2 GB is enough. I tried limiting memory usage with tf.ConfigProto and it ran (batch size = 8, memory consumption = 1833 MB).

Are you using cuDNN v5.1?
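For readers who want to try the same kind of memory cap, here is a minimal sketch. It assumes the cap is set through `gpu_options.per_process_gpu_memory_fraction`, which is one common way to limit memory with `tf.ConfigProto`; the thread does not say which option was actually used. The helper names and the 8192 MB card size are hypothetical.

```python
def gpu_memory_fraction(limit_mb, total_mb):
    """Return the share of GPU memory that limit_mb represents, capped at 1.0."""
    if limit_mb <= 0 or total_mb <= 0:
        raise ValueError("limit_mb and total_mb must be positive")
    return min(1.0, limit_mb / total_mb)


def make_session_config(fraction):
    """Build a session config that caps per-process GPU memory (hypothetical helper)."""
    import tensorflow as tf  # imported lazily so the arithmetic above runs without TensorFlow

    # TF 1.x exposes ConfigProto at the top level; newer releases keep it under tf.compat.v1.
    v1 = getattr(getattr(tf, "compat", None), "v1", tf)
    config = v1.ConfigProto()
    config.gpu_options.per_process_gpu_memory_fraction = fraction
    return config


# Assumed example: allow about 2 GB on a hypothetical 8 GB card.
fraction = gpu_memory_fraction(2048, 8192)
```

The resulting config would be passed as `config=` when the session is created. The fraction is relative to the card's total memory, so the same 2 GB budget needs a different value on a different GPU.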

shensq0814 commented on August 16, 2024

Yes, CUDA 8.0 with cuDNN 5.1. The available memory on my computer is about 1 3GB.

I notice that you use all the features in the VGG, which is different from the original paper. Could it be the reason why the model needs that much memory?

tadax commented on August 16, 2024

The available memory on my computer is about 1 3GB.

1.3 GB?

Could it be the reason why the model needs that much memory?

I think SRGAN needs a lot of memory because it builds the Generator (ResNet), the Discriminator, and VGG19.

As you said, using fewer VGG features might reduce memory usage.
Modify inference_content_loss as follows:

def inference_content_loss(self, x, imitation):
    # Extract VGG features for the real image and for the generated one.
    _, x_phi = self.vgg.build_model(
        x, tf.constant(False), False)
    _, imitation_phi = self.vgg.build_model(
        imitation, tf.constant(False), True)
    # Compare only the fifth feature map (phi54) instead of all of them.
    content_loss = tf.nn.l2_loss(x_phi[4] - imitation_phi[4])
    return tf.reduce_mean(content_loss)
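To make the change above concrete, here is a small NumPy sketch of the arithmetic. It only illustrates the difference between summing the loss over all feature maps and using the fifth one alone; it is not the repository's code, and it does not measure memory. The toy feature maps and the `content_loss` helper are made up for illustration. The one fact it relies on is that `tf.nn.l2_loss(t)` computes `sum(t ** 2) / 2`.

```python
import numpy as np


def l2_loss(t):
    """NumPy equivalent of tf.nn.l2_loss: half the sum of squared entries."""
    return np.sum(np.square(t)) / 2.0


def content_loss(x_phi, imitation_phi, layers):
    """Sum the L2 loss over the chosen feature-map indices (hypothetical helper)."""
    return sum(l2_loss(x_phi[i] - imitation_phi[i]) for i in layers)


# Toy stand-ins for five VGG feature maps: map k is a 2x2 array filled with k.
x_phi = [np.full((2, 2), float(k)) for k in range(5)]
imitation_phi = [np.zeros((2, 2)) for _ in range(5)]

loss_all = content_loss(x_phi, imitation_phi, range(5))  # every feature map
loss_phi54 = content_loss(x_phi, imitation_phi, [4])     # only index 4, as in the snippet above
```

Since `tf.nn.l2_loss` already returns a scalar, the trailing `tf.reduce_mean` in the snippet leaves the value unchanged.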

shensq0814 commented on August 16, 2024

I've installed the required environment on another computer with enough memory.
However, I get another error when the first epoch finishes.

Caused by op 'generator/deconv1/conv2d_transpose', defined at:
  File "train.py", line 95, in <module>
    train()
  File "train.py", line 18, in train
    model = SRGAN(x, is_training, batch_size)
  File "/home/min/ssq/srgan/src/srgan.py", line 14, in __init__
    self.imitation = self.generator(self.downscaled, is_training, False)
  File "/home/min/ssq/srgan/src/srgan.py", line 25, in generator
    x, [3, 3, 64, 3], [self.batch_size, 24, 24, 64], 1)
  File "../utils/layer.py", line 43, in deconv_layer
    strides=[1, stride, stride, 1])
  File "/home/min/anaconda/envs/shen/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 1104, in conv2d_transpose
    name=name)
  File "/home/min/anaconda/envs/shen/lib/python3.6/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 496, in conv2d_backprop_input
    data_format=data_format, name=name)
  File "/home/min/anaconda/envs/shen/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
    op_def=op_def)
  File "/home/min/anaconda/envs/shen/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/min/anaconda/envs/shen/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1228, in __init__
    self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): Conv2DSlowBackpropInput: input and out_backprop must have the same batch size

tadax commented on August 16, 2024

Fix line 45 of src/train.py so that the last, incomplete batch is dropped.

Correct:
n_iter = int(len(x_train) / batch_size)

Incorrect:
n_iter = int(np.ceil(len(x_train) / batch_size))
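A short sketch of why the rounding matters. The traceback shows the generator passing a fixed output shape `[self.batch_size, 24, 24, 64]` to the deconvolution layer, so a final batch with fewer than `batch_size` images would presumably trigger the "same batch size" error. The `batch_sizes` helper and the dataset size of 100 are hypothetical.

```python
import math


def batch_sizes(n_samples, batch_size, drop_remainder):
    """List the number of samples in each batch of one epoch (hypothetical helper)."""
    if drop_remainder:
        n_iter = n_samples // batch_size            # same result as int(len(x_train) / batch_size)
    else:
        n_iter = math.ceil(n_samples / batch_size)  # same result as int(np.ceil(...))
    return [min(batch_size, n_samples - i * batch_size) for i in range(n_iter)]


# Assumed example: 100 training images with batch size 8.
sizes_floor = batch_sizes(100, 8, drop_remainder=True)   # 12 batches, all of size 8
sizes_ceil = batch_sizes(100, 8, drop_remainder=False)   # 13 batches, the last holds only 4 images
```

With the floor version, every batch matches the hard-coded batch dimension; the cost is that the leftover images are skipped in that epoch.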

shensq0814 commented on August 16, 2024

The implementation of your generator seems different from the paper, where only the last two layers are deconvolution layers (they recently changed these to sub-pixel CNN layers). You used deconv_layer in all of the residual blocks. Is that a mistake, or was it intended?

jzrita commented on August 16, 2024

Hi Tadax, yes, I have the same concern as @Doodleyard. Although the generator network in the final published CVPR paper differs from the one in their arXiv version, your code matches neither of them. Would you mind giving us some hints? Thank you.
