carpedm20 / discogan-pytorch

PyTorch implementation of "Learning to Discover Cross-Domain Relations with Generative Adversarial Networks"

License: Apache License 2.0

Jupyter Notebook 97.18% Python 2.79% Shell 0.03%

gan generative-model unsupervised-learning pytorch

discogan-pytorch's Issues

'unexpected key "module.layer_module.0.weight" in state_dict'

When I test my model, I get the error 'unexpected key "module.layer_module.0.weight" in state_dict'. Has anyone else run into this problem?
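
One common cause of this message, offered as an assumption rather than a confirmed diagnosis for this repository: the checkpoint was saved from a model wrapped in `nn.DataParallel`, which prefixes every parameter name with `module.`, and is then loaded into an unwrapped model. A minimal sketch that strips the prefix from the keys before they are passed to `load_state_dict`. The helper name `strip_module_prefix` is hypothetical, and plain strings stand in for tensors so the sketch runs without PyTorch:

```python
from collections import OrderedDict


def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading prefix removed from each key."""
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned


# Stand-in for a checkpoint dict; in practice the values would be tensors.
saved = OrderedDict([
    ("module.layer_module.0.weight", "w0"),
    ("module.layer_module.0.bias", "b0"),
])
cleaned = strip_module_prefix(saved)
print(list(cleaned.keys()))
```

The cleaned dictionary could then be given to the model's `load_state_dict`. If the mismatch has another cause, the keys will still differ and the load will still fail.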

Width and Height of images needs to be identical

It looks like training crashes if the width and height are not equal on custom datasets. I worked around it by forcing my 640x360 images to 640x640.

0%| | 1/50000 [00:03<55:08:42, 3.97s/it]C:\Users\Shadow\Anaconda3\lib\site-packages\torch\nn\modules\loss.py:512: UserWarning: Using a target size (torch.Size([200])) that is different to the input size (torch.Size([200, 1])) is deprecated. Please ensure they have the same size.

Multi GPU

Thanks for the PyTorch implementation of DiscoGAN.
I am having trouble running it with multiple GPUs, though.
I think the first problem is in the config.py file, where it casts the num_gpu argument to a bool instead of an int.

DiscoGAN-pytorch/config.py

Line 53 in 2feae82

misc_arg.add_argument('--num_gpu', type=str2bool, default=1)

But even fixing that, the code will only run when --num_gpu=1 and not when --num_gpu=2.
Here's the traceback error:

Error:

Traceback (most recent call last):
  File "main.py", line 41, in <module>
    main(config)
  File "main.py", line 33, in main
    trainer.train()
  File "/home/***/Documents/DiscoGAN-pytorch-master/trainer.py", line 187, in train
    x_AB = self.G_AB(x_A).detach()
  File "/home/***/.pyenv/versions/2.7.13/lib/python2.7/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/***/Documents/DiscoGAN-pytorch-master/models.py", line 45, in forward
    return nn.parallel.data_parallel(self.main, x, gpu_ids)
  File "/home/***/.pyenv/versions/2.7.13/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 101, in data_parallel
    replicas = replicate(module, device_ids[:len(inputs)])
  File "/home/***/.pyenv/versions/2.7.13/lib/python2.7/site-packages/torch/nn/parallel/replicate.py", line 10, in replicate
    params = list(network.parameters())
AttributeError: 'function' object has no attribute 'parameters'

Any idea what could be the cause?
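
The first point in this issue (the `--num_gpu` argument being parsed as a bool) can be illustrated with a small `argparse` sketch. The `str2bool` below is a hypothetical stand-in, since the repository's helper is not shown in the issue and may behave differently. The sketch only covers the argument parsing and does not address the `AttributeError` in the traceback:

```python
import argparse


def str2bool(v):
    # Hypothetical stand-in for the repository's helper; the real one may differ.
    return v.lower() in ("true", "1")


# Parser as quoted in the issue: the GPU count is converted to a bool.
broken = argparse.ArgumentParser()
broken.add_argument("--num_gpu", type=str2bool, default=1)

# Parser with the type changed to int, as the issue suggests.
fixed = argparse.ArgumentParser()
fixed.add_argument("--num_gpu", type=int, default=1)

broken_args = broken.parse_args(["--num_gpu=2"])
fixed_args = fixed.parse_args(["--num_gpu=2"])
print(broken_args.num_gpu)  # a bool, so the requested count is lost
print(fixed_args.num_gpu)   # the integer that was passed
```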

[!] Sampled dataset from A and B have different # of data. Try resampling...

I get the following warning message:

[!] Sampled dataset from A and B have different # of data. Try resampling...

From what I understand, it is not necessary for both datasets to be equal in size?
Both datasets have pictures with 256x256 pixels.

How to increase the size of image while training?

It downsizes my dataset images to a height of 68px while training.

It seems that the discriminator cannot adapt to the size of the input image.

Once I set '--input_scale_size' to anything other than 64, an error is reported. It seems that the discriminator cannot adapt to the size of the input image. How did you solve this problem?
"ValueError: Target and input must have the same number of elements. target nelement (2) != input nelement (338)"

Using a target size (torch.Size([4])) that is different to the input size (torch.Size([4, 169])) is deprecated. Please ensure they have the same size.

(base) D:\DiscoGAN>python main.py --dataset=siys2simk --input_scale_size 256 --batch_size 4 --a_grayscale True --b_grayscale True --num_worker 1 --num_gpu=0
[*] MODEL dir: ./logs\siys2simk_2022-08-26_11-17-52
[*] PARAM path: ./logs\siys2simk_2022-08-26_11-17-52\params.json
  0%|                                                                                                                                                                                         | 0/500000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "D:\DiscoGAN\main.py", line 41, in <module>
    main(config)
  File "D:\DiscoGAN\main.py", line 33, in main
    trainer.train()
  File "D:\DiscoGAN\trainer.py", line 200, in train
    l_d_A_real, l_d_A_fake = bce(self.D_A(x_A).squeeze(1), real_tensor), bce(self.D_A(x_BA).squeeze(1), fake_tensor)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\loss.py", line 613, in forward
    return F.binary_cross_entropy(input, target, weight=self.weight, reduction=self.reduction)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\functional.py", line 3074, in binary_cross_entropy
    raise ValueError(
ValueError: Using a target size (torch.Size([4])) that is different to the input size (torch.Size([4, 169])) is deprecated. Please ensure they have the same size.
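
The 169 in this message equals 13 x 13, which is what a stack of convolutions designed for 64x64 inputs would produce on a 256x256 input. A minimal sketch of that arithmetic, assuming (not confirmed by the issue) a discriminator made of four stride-2 convolutions with kernel 4 and padding 1, followed by a final 4x4 convolution with stride 1 and no padding. The function names are hypothetical:

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def discriminator_map_size(input_size, num_downsamples=4):
    # Assumed layout: stride-2 convs (k=4, p=1), then one 4x4 conv (s=1, p=0).
    size = input_size
    for _ in range(num_downsamples):
        size = conv_out(size, kernel=4, stride=2, padding=1)
    return conv_out(size, kernel=4, stride=1, padding=0)


for input_size in (64, 256):
    side = discriminator_map_size(input_size)
    # Prints the input size, the output side length, and the elements per sample.
    print(input_size, side, side * side)
```

Under these assumptions a 64x64 input yields a single value per sample, matching a target of shape `[batch]`, while a 256x256 input yields 169 values per sample, matching the `[4, 169]` in the error. The 338 reported above would likewise correspond to a batch of 2.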

So impressive!

CUDA runtime error (8)

class Generator(nn.Module):
    def __init__(self, input_size, output_size, hidden_dims):
        super(Generator, self).__init__()
        self.layers = []

        prev_dim = input_size
        for hidden_dim in hidden_dims:
            self.layers.append(nn.Linear(prev_dim, hidden_dim))
            self.layers.append(nn.ReLU(True))
            prev_dim = hidden_dim
        self.layers.append(nn.Linear(prev_dim, output_size))

        self.layer_module = ListModule(*self.layers)

    def forward(self, x):
        out = x
        for layer in self.layers:
            out = layer(out)
        return out

class Discriminator(nn.Module):
    def __init__(self, input_size, output_size, hidden_dims):
        super(Discriminator, self).__init__()
        self.layers = []

        prev_dim = input_size
        for idx, hidden_dim in enumerate(hidden_dims):
            self.layers.append(nn.Linear(prev_dim, hidden_dim))
            self.layers.append(nn.ReLU(True))
            prev_dim = hidden_dim

        self.layers.append(nn.Linear(prev_dim, output_size))
        self.layers.append(nn.Sigmoid())

        self.layer_module = ListModule(*self.layers)

    def forward(self, x):
        out = x
        for layer in self.layers:
            out = layer(out)
        return out.view(-1, 1)

# network
hidden_dim = 128
g_num_layer = 3
d_num_layer = 5

G_AB = Generator(2, 2, [hidden_dim] * g_num_layer)
G_BA = Generator(2, 2, [hidden_dim] * g_num_layer)
D_A = Discriminator(2, 1, [hidden_dim] * d_num_layer)
D_B = Discriminator(2, 1, [hidden_dim] * d_num_layer)

G_AB.cuda()
G_BA.cuda()
D_A.cuda()
D_B.cuda()

# optimizer
lr = 0.0002
beta1 = 0.5
beta2 = 0.999

d = nn.MSELoss()
bce = nn.BCELoss()

optimizer_d = torch.optim.Adam(
    chain(D_A.parameters(), D_B.parameters()), lr=lr, betas=(beta1, beta2))
optimizer_g = torch.optim.Adam(
    chain(G_AB.parameters(), G_BA.parameters()), lr=lr, betas=(beta1, beta2))

# training
num_epoch = 50000

real_label = 1
fake_label = 0

real_tensor = Variable(torch.FloatTensor(batch_size).cuda())
_ = real_tensor.data.fill_(real_label)
print(real_tensor.sum())

fake_tensor = Variable(torch.FloatTensor(batch_size).cuda())
_ = fake_tensor.data.fill_(fake_label)
print(fake_tensor.sum())

RuntimeError                              Traceback (most recent call last)
in ()
     77
     78 real_tensor = Variable(torch.FloatTensor(batch_size).cuda())
---> 79 _ = real_tensor.data.fill_(real_label)
     80 print(real_tensor.sum())
     81

RuntimeError: cuda runtime error (8) : invalid device function at /py/conda-bld/pytorch_1493677666423/work/torch/lib/THC/generic/THCTensorMath.cu:15

The DiscoGAN paper says it doesn't need paired data, unlike Conditional GAN, but...

The DiscoGAN paper asserts:
"to avoid costly pairing, we address the task of discovering cross-domain relations given unpaired data"
But I don't understand why this implementation of DiscoGAN requires paired data
(as far as I can see in the downloaded dataset).
Or is it just to simplify the input process?

How to train it with CPU?

As of now I don't have a GPU to train on.
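
The command line quoted in another issue on this page passes `--num_gpu=0`, which suggests that flag is the intended way to request a CPU run. For code that calls `.cuda()` unconditionally, one general pattern is to pick the device once and reuse it. A minimal sketch of that selection logic, with a hypothetical helper `select_device`; the CUDA check is passed in as a plain bool so the sketch runs without PyTorch:

```python
def select_device(num_gpu, cuda_available):
    """Return "cuda" only when GPUs were requested and CUDA is usable, else "cpu"."""
    if num_gpu > 0 and cuda_available:
        return "cuda"
    return "cpu"


# With no GPU requested or available, everything stays on the CPU.
print(select_device(num_gpu=0, cuda_available=False))
```

In real code, `cuda_available` would typically come from `torch.cuda.is_available()`, and the returned string would be used with `.to(device)` on models and tensors.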

Pre-trained models

Hi there,

Do you have any pre-trained model for the Cityscapes example that could be made available online?

Thanks

Inconsistent tensor sizes error with own data

I tried out my own dataset in data/mydata (with the A and B folders) but I get the following error:

~/DiscoGAN-pytorch$ python main.py --dataset=mydata --num_gpu=1
[*] MODEL dir: logs/mydata_2017-03-21_14-49-25
[*] PARAM path: logs/mydata_2017-03-21_14-49-25/params.json
Traceback (most recent call last):
  File "main.py", line 41, in <module>
    main(config)
  File "main.py", line 33, in main
    trainer.train()
  File "/home/bart/DiscoGAN-pytorch/trainer.py", line 161, in train
    valid_x_A, valid_x_B = self._get_variable(A_loader.next()), self._get_variable(B_loader.next())
  File "/home/bart/anaconda3/envs/Python36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 174, in __next__
    return self._process_next_batch(batch)
  File "/home/bart/anaconda3/envs/Python36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 198, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
  File "/home/bart/anaconda3/envs/Python36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 32, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/bart/anaconda3/envs/Python36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 68, in default_collate
    return torch.stack(batch, 0)
  File "/home/bart/anaconda3/envs/Python36/lib/python3.6/site-packages/torch/functional.py", line 56, in stack
    return torch.cat(list(t.unsqueeze(dim) for t in sequence), dim)
RuntimeError: inconsistent tensor sizes at /data/users/soumith/miniconda2/conda-bld/pytorch-0.1.10_1488755368782/work/torch/lib/TH/generic/THTensorMath.c:2548
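
The traceback ends in `torch.stack` inside `default_collate`, which needs every sample in a batch to have the same shape. One likely cause, stated here as an assumption, is that the images in the dataset folders do not all share the same dimensions. A minimal sketch of a pre-flight check that reports the files whose size differs from the most common one. The helper name `find_size_outliers` and the file names are hypothetical:

```python
from collections import Counter


def find_size_outliers(sizes):
    """Given {filename: (width, height)}, return the files whose size
    differs from the most common size, sorted by name."""
    if not sizes:
        return []
    expected, _ = Counter(sizes.values()).most_common(1)[0]
    return sorted(name for name, size in sizes.items() if size != expected)


# Hypothetical example data; real sizes could be read with PIL's Image.open(path).size.
sizes = {
    "a.jpg": (256, 256),
    "b.jpg": (256, 256),
    "c.jpg": (640, 360),
}
print(find_size_outliers(sizes))
```

Any file reported this way would need to be resized or cropped to match the rest before batching.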

Test the model on a single image

Hello,
Could you please show how to test the final model on a single image?
Thank you
Lafi

CUDA error running on multiple GPUs: torch.cuda.nccl.NcclError: System Error (2)

Hi,

I've been trying to run the example code (on the maps dataset):

python main.py --dataset=maps --num_gpu=4

I get the error below related to the NCCL library. I'm trying to run this on 4 K80 GPUs.

Any suggestions on what could be causing this and what a solution could be?

pix2pix processing: 100%|#######################| 1096/1096 [00:00<00:00, 178591.97it/s]
pix2pix processing: 100%|#######################| 1096/1096 [00:00<00:00, 213732.43it/s]
[*] MODEL dir: logs/maps_2017-10-26_20-36-34
[*] PARAM path: logs/maps_2017-10-26_20-36-34/params.json
0%| | 0/500000 [00:00<?, ?it/s]

Traceback (most recent call last):
  File "main.py", line 41, in <module>
    main(config)
  File "main.py", line 33, in main
    trainer.train()
  File "/home/nbserver/DiscoGAN-pytorch/trainer.py", line 193, in train
    x_AB = self.G_AB(x_A).detach()
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/data_parallel.py", line 59, in forward
    replicas = self.replicate(self.module, self.device_ids[:len(inputs)])
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/data_parallel.py", line 64, in replicate
    return replicate(module, device_ids)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/replicate.py", line 12, in replicate
    param_copies = Broadcast(devices)(*params)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/_functions.py", line 19, in forward
    outputs = comm.broadcast_coalesced(inputs, self.target_gpus)
  File "/usr/local/lib/python2.7/dist-packages/torch/cuda/comm.py", line 54, in broadcast_coalesced
    results = broadcast(_flatten_tensors(chunk), devices)
  File "/usr/local/lib/python2.7/dist-packages/torch/cuda/comm.py", line 24, in broadcast
    nccl.broadcast(tensors)
  File "/usr/local/lib/python2.7/dist-packages/torch/cuda/nccl.py", line 182, in broadcast
    comm = communicator(inputs)
  File "/usr/local/lib/python2.7/dist-packages/torch/cuda/nccl.py", line 133, in communicator
    _communicators[key] = NcclCommList(devices)
  File "/usr/local/lib/python2.7/dist-packages/torch/cuda/nccl.py", line 106, in __init__
    check_error(lib.ncclCommInitAll(self, len(devices), int_array(devices)))
  File "/usr/local/lib/python2.7/dist-packages/torch/cuda/nccl.py", line 118, in check_error
    raise NcclError(status)
torch.cuda.nccl.NcclError: System Error (2)
