carpedm20 / discogan-pytorch

PyTorch implementation of "Learning to Discover Cross-Domain Relations with Generative Adversarial Networks"

License: Apache License 2.0

Languages: Jupyter Notebook 97.18%, Python 2.79%, Shell 0.03%
Topics: gan, generative-model, unsupervised-learning, pytorch

discogan-pytorch's Issues

How to start training??

Hello Sir,

I downloaded your code and the maps dataset.
When I started training, I got the following error:

...
Traceback (most recent call last):
  File "main.py", line 41, in <module>
    main(config)
  File "main.py", line 33, in main
    trainer.train()
  File "/itsme/TESTBOARD/additional_networks/GAN/pytorch_DiscoGAN_carpedm20/trainer.py", line 247, in train
    format(step, self.max_step, l_d.data[0], l_g.data[0]))
IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number

Thanks.
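
The error message itself points at the likely fix. In PyTorch 0.4 and later a scalar loss is a 0-dim tensor, so indexing it with `[0]` fails and `.item()` is the replacement. A minimal sketch, assuming the log line in trainer.py only needs Python floats; `l_d`, `l_g`, `step` and `max_step` mirror the names in the traceback, while the loss values and the format string are stand-ins, since the original string is cut off above:

```python
import torch

# Stand-ins for the discriminator and generator losses from the traceback.
l_d = torch.tensor(0.6931)
l_g = torch.tensor(1.2345)
step, max_step = 10, 500000

# A 0-dim tensor cannot be indexed with [0]; .item() returns a Python float.
message = "[{}/{}] Loss_D: {:.4f} Loss_G: {:.4f}".format(
    step, max_step, l_d.item(), l_g.item())
print(message)
```

The same replacement would apply to any other `.data[0]` access on a scalar loss in the training loop.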

torch size

I am getting this message:

0%| | 1/50000 [00:03<55:08:42, 3.97s/it]
C:\Users\Shadow\Anaconda3\lib\site-packages\torch\nn\modules\loss.py:512: UserWarning: Using a target size (torch.Size([200])) that is different to the input size (torch.Size([200, 1])) is deprecated. Please ensure they have the same size.
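
The warning says the prediction has shape (200, 1) while the label tensor has shape (200,). A minimal sketch of two ways to make them agree, assuming the loss is `nn.BCELoss` and the discriminator returns one score per sample; `d_out` is a random stand-in for the real discriminator output:

```python
import torch
import torch.nn as nn

batch_size = 200
bce = nn.BCELoss()

# Stand-in for a discriminator output of shape (batch_size, 1).
d_out = torch.sigmoid(torch.randn(batch_size, 1))
real_tensor = torch.ones(batch_size)

# Option 1: drop the trailing dimension of the prediction.
loss_a = bce(d_out.squeeze(1), real_tensor)

# Option 2: give the target the same shape as the prediction.
loss_b = bce(d_out, real_tensor.view(-1, 1))
```

Both options compute the same value; which one reads better depends on how the rest of the training loop uses the tensors.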

Multi GPU

Thanks for the PyTorch implementation of DiscoGAN.
I am having trouble running it with multiple GPUs, though.
I think the first problem is in the config.py file, where it casts the num_gpu argument to a bool instead of an int.

misc_arg.add_argument('--num_gpu', type=str2bool, default=1)
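
For reference, the change described above could look like the sketch below. It assumes `misc_arg` is an ordinary argparse argument group; the group name "Misc" is a placeholder, not taken from config.py:

```python
import argparse

parser = argparse.ArgumentParser()
misc_arg = parser.add_argument_group("Misc")  # group name is a placeholder
# Parse the GPU count as an integer so that values above 1 are kept as numbers.
misc_arg.add_argument("--num_gpu", type=int, default=1)

config = parser.parse_args(["--num_gpu=2"])
print(config.num_gpu)
```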

But even after fixing that, the code only runs with --num_gpu=1 and not with --num_gpu=2.
Here's the traceback:

Traceback (most recent call last):
  File "main.py", line 41, in <module>
    main(config)
  File "main.py", line 33, in main
    trainer.train()
  File "/home/***/Documents/DiscoGAN-pytorch-master/trainer.py", line 187, in train
    x_AB = self.G_AB(x_A).detach()
  File "/home/***/.pyenv/versions/2.7.13/lib/python2.7/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/***/Documents/DiscoGAN-pytorch-master/models.py", line 45, in forward
    return nn.parallel.data_parallel(self.main, x, gpu_ids)
  File "/home/***/.pyenv/versions/2.7.13/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 101, in data_parallel
    replicas = replicate(module, device_ids[:len(inputs)])
  File "/home/***/.pyenv/versions/2.7.13/lib/python2.7/site-packages/torch/nn/parallel/replicate.py", line 10, in replicate
    params = list(network.parameters())
AttributeError: 'function' object has no attribute 'parameters'

Any idea what could be the cause?
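
The last frames of the traceback show `replicate` calling `.parameters()` on whatever is passed to `nn.parallel.data_parallel`, and the error says that object is a function. That suggests `self.main` in models.py is a plain method rather than an `nn.Module`. A minimal sketch of one possible fix, assuming the layers can be held in an `nn.Sequential`; `TinyGenerator` and its layer sizes are hypothetical and are not the model from the repo:

```python
import torch
import torch.nn as nn


class TinyGenerator(nn.Module):
    """Hypothetical stand-in for the generator in models.py."""

    def __init__(self, num_gpu=1):
        super(TinyGenerator, self).__init__()
        self.num_gpu = num_gpu
        # nn.Sequential is an nn.Module, so it exposes .parameters().
        self.main = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(True),
            nn.Conv2d(8, 3, kernel_size=3, padding=1),
            nn.Tanh(),
        )

    def forward(self, x):
        # Only split the batch across GPUs when the input lives on a GPU.
        if x.is_cuda and self.num_gpu > 1:
            gpu_ids = list(range(self.num_gpu))
            return nn.parallel.data_parallel(self.main, x, gpu_ids)
        return self.main(x)


G_AB = TinyGenerator(num_gpu=2)
x_A = torch.randn(4, 3, 16, 16)
x_AB = G_AB(x_A).detach()
print(x_AB.shape)
```

On a machine without GPUs the sketch falls back to a normal forward pass, so it can be checked on CPU before trying --num_gpu=2.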

Using a target size (torch.Size([4])) that is different to the input size (torch.Size([4, 169])) is deprecated. Please ensure they have the same size.

(base) D:\DiscoGAN>python main.py --dataset=siys2simk --input_scale_size 256 --batch_size 4 --a_grayscale True --b_grayscale True --num_worker 1 --num_gpu=0
[*] MODEL dir: ./logs\siys2simk_2022-08-26_11-17-52
[*] PARAM path: ./logs\siys2simk_2022-08-26_11-17-52\params.json
  0%|          | 0/500000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "D:\DiscoGAN\main.py", line 41, in <module>
    main(config)
  File "D:\DiscoGAN\main.py", line 33, in main
    trainer.train()
  File "D:\DiscoGAN\trainer.py", line 200, in train
    l_d_A_real, l_d_A_fake = bce(self.D_A(x_A).squeeze(1), real_tensor), bce(self.D_A(x_BA).squeeze(1), fake_tensor)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\loss.py", line 613, in forward
    return F.binary_cross_entropy(input, target, weight=self.weight, reduction=self.reduction)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\functional.py", line 3074, in binary_cross_entropy
    raise ValueError(
ValueError: Using a target size (torch.Size([4])) that is different to the input size (torch.Size([4, 169])) is deprecated. Please ensure they have the same size.
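
Note that 169 = 13 × 13, which suggests that with --input_scale_size 256 the discriminator returns a 13 × 13 map of scores per image instead of a single score, so `squeeze(1)` cannot reduce it to shape (4,). This is an inference from the shapes in the error, not something confirmed in the repo. A minimal sketch of one workaround, assuming a per-patch loss is acceptable; `d_real` and `d_fake` are random stand-ins for `self.D_A(x_A)` and `self.D_A(x_BA)`:

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()

# Stand-ins for the discriminator outputs: 4 images, 169 scores each.
d_real = torch.sigmoid(torch.randn(4, 169))
d_fake = torch.sigmoid(torch.randn(4, 169))

# Build the labels from the prediction so the shapes always agree.
l_d_A_real = bce(d_real, torch.ones_like(d_real))
l_d_A_fake = bce(d_fake, torch.zeros_like(d_fake))
l_d_A = l_d_A_real + l_d_A_fake
```

This changes the objective slightly, because every patch score is now trained against the label. The alternative is to adjust the discriminator so that its output collapses to one value per image at the chosen input size.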

CUDA runtime error (8)

class Generator(nn.Module):
    def __init__(self, input_size, output_size, hidden_dims):
        super(Generator, self).__init__()
        self.layers = []

        prev_dim = input_size
        for hidden_dim in hidden_dims:
            self.layers.append(nn.Linear(prev_dim, hidden_dim))
            self.layers.append(nn.ReLU(True))
            prev_dim = hidden_dim
        self.layers.append(nn.Linear(prev_dim, output_size))

        self.layer_module = ListModule(*self.layers)

    def forward(self, x):
        out = x
        for layer in self.layers:
            out = layer(out)
        return out

class Discriminator(nn.Module):
    def __init__(self, input_size, output_size, hidden_dims):
        super(Discriminator, self).__init__()
        self.layers = []

        prev_dim = input_size
        for idx, hidden_dim in enumerate(hidden_dims):
            self.layers.append(nn.Linear(prev_dim, hidden_dim))
            self.layers.append(nn.ReLU(True))
            prev_dim = hidden_dim

        self.layers.append(nn.Linear(prev_dim, output_size))
        self.layers.append(nn.Sigmoid())

        self.layer_module = ListModule(*self.layers)

    def forward(self, x):
        out = x
        for layer in self.layers:
            out = layer(out)
        return out.view(-1, 1)

# network
hidden_dim = 128
g_num_layer = 3
d_num_layer = 5

G_AB = Generator(2, 2, [hidden_dim] * g_num_layer)
G_BA = Generator(2, 2, [hidden_dim] * g_num_layer)

D_A = Discriminator(2, 1, [hidden_dim] * d_num_layer)
D_B = Discriminator(2, 1, [hidden_dim] * d_num_layer)

G_AB.cuda()
G_BA.cuda()
D_A.cuda()
D_B.cuda()

# optimizer
lr = 0.0002
beta1 = 0.5
beta2 = 0.999

d = nn.MSELoss()
bce = nn.BCELoss()

optimizer_d = torch.optim.Adam(
    chain(D_A.parameters(), D_B.parameters()), lr=lr, betas=(beta1, beta2))
optimizer_g = torch.optim.Adam(
    chain(G_AB.parameters(), G_BA.parameters()), lr=lr, betas=(beta1, beta2))

# training
num_epoch = 50000

real_label = 1
fake_label = 0

real_tensor = Variable(torch.FloatTensor(batch_size).cuda())
_ = real_tensor.data.fill_(real_label)
print(real_tensor.sum())

fake_tensor = Variable(torch.FloatTensor(batch_size).cuda())
_ = fake_tensor.data.fill_(fake_label)
print(fake_tensor.sum())


RuntimeError Traceback (most recent call last)
in ()
77
78 real_tensor = Variable(torch.FloatTensor(batch_size).cuda())
---> 79 _ = real_tensor.data.fill_(real_label)
80 print(real_tensor.sum())
81

RuntimeError: cuda runtime error (8) : invalid device function at /py/conda-bld/pytorch_1493677666423/work/torch/lib/THC/generic/THCTensorMath.cu:15
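
"invalid device function" often means that the installed PyTorch binary contains no kernels compiled for the compute capability of the GPU in the machine, which would explain why the first real CUDA operation (`fill_`) fails. That is an assumption about this setup, not a confirmed diagnosis. A minimal sketch that collects the relevant facts for comparison with the PyTorch build; `describe_cuda` is a hypothetical helper name:

```python
import torch


def describe_cuda():
    """Collect the facts that matter when CUDA kernels fail to launch."""
    info = {
        "torch": torch.__version__,
        "built_with_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
        "devices": [],
    }
    if info["cuda_available"]:
        for i in range(torch.cuda.device_count()):
            major, minor = torch.cuda.get_device_capability(i)
            info["devices"].append(
                (torch.cuda.get_device_name(i), "sm_{}{}".format(major, minor)))
    return info


report = describe_cuda()
print(report)
```

If the reported capability is older or newer than what the binary supports, installing a matching build or compiling PyTorch from source for that GPU is the usual way out.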

Pre-trained models

Hi there,

Do you have any pre-trained model for the Cityscapes example that could be made available online?

Thanks

Inconsistent tensor sizes error with own data

I tried out my own dataset in data/mydata (with the A and B folders) but I get the following error:

~/DiscoGAN-pytorch$ python main.py --dataset=mydata --num_gpu=1
[*] MODEL dir: logs/mydata_2017-03-21_14-49-25
[*] PARAM path: logs/mydata_2017-03-21_14-49-25/params.json
Traceback (most recent call last):
  File "main.py", line 41, in <module>
    main(config)
  File "main.py", line 33, in main
    trainer.train()
  File "/home/bart/DiscoGAN-pytorch/trainer.py", line 161, in train
    valid_x_A, valid_x_B = self._get_variable(A_loader.next()), self._get_variable(B_loader.next())
  File "/home/bart/anaconda3/envs/Python36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 174, in __next__
    return self._process_next_batch(batch)
  File "/home/bart/anaconda3/envs/Python36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 198, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
  File "/home/bart/anaconda3/envs/Python36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 32, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/bart/anaconda3/envs/Python36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 68, in default_collate
    return torch.stack(batch, 0)
  File "/home/bart/anaconda3/envs/Python36/lib/python3.6/site-packages/torch/functional.py", line 56, in stack
    return torch.cat(list(t.unsqueeze(dim) for t in sequence), dim)
RuntimeError: inconsistent tensor sizes at /data/users/soumith/miniconda2/conda-bld/pytorch-0.1.10_1488755368782/work/torch/lib/TH/generic/THTensorMath.c:2548
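
The traceback ends in `torch.stack` inside `default_collate`, which needs every sample in a batch to have the same shape. A common cause with custom data is a folder that mixes image sizes or channel counts (grayscale, RGB, RGBA). A minimal sketch of a collate function that brings samples to one shape first, assuming each sample is a float tensor laid out as channels × height × width; `to_common_shape` and `collate_images` are hypothetical names, and the target size of 64 is an arbitrary choice:

```python
import torch
import torch.nn.functional as F


def to_common_shape(img, size=64):
    """Return a 3 x size x size tensor from a C x H x W float tensor."""
    if img.size(0) == 1:
        img = img.expand(3, -1, -1)  # repeat grayscale across three channels
    elif img.size(0) == 4:
        img = img[:3]                # drop the alpha channel
    img = F.interpolate(img.unsqueeze(0), size=(size, size),
                        mode="bilinear", align_corners=False)
    return img.squeeze(0)


def collate_images(batch):
    """Stack samples after normalising their shapes."""
    return torch.stack([to_common_shape(img) for img in batch], 0)


# Three samples whose shapes differ, as they might in a mixed folder.
batch = [torch.rand(3, 80, 60), torch.rand(1, 64, 64), torch.rand(4, 70, 70)]
stacked = collate_images(batch)
print(stacked.shape)
```

A function like this can be passed to `torch.utils.data.DataLoader` through its `collate_fn` argument. Converting and resizing the images on disk once is an equally valid route.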

CUDA error running on multiple GPUs: torch.cuda.nccl.NcclError: System Error (2)

Hi,

I've been trying to run the example code (on the maps dataset):

python main.py --dataset=maps --num_gpu=4

I get the error below related to the NCCL library. I'm trying to run this on 4 K80 GPUs.

Any suggestions on what could be causing this and what a solution could be?

pix2pix processing: 100%|#######################| 1096/1096 [00:00<00:00, 178591.97it/s]
pix2pix processing: 100%|#######################| 1096/1096 [00:00<00:00, 213732.43it/s]
[*] MODEL dir: logs/maps_2017-10-26_20-36-34
[*] PARAM path: logs/maps_2017-10-26_20-36-34/params.json
0%| | 0/500000 [00:00<?, ?it/s]

Traceback (most recent call last):
  File "main.py", line 41, in <module>
    main(config)
  File "main.py", line 33, in main
    trainer.train()
  File "/home/nbserver/DiscoGAN-pytorch/trainer.py", line 193, in train
    x_AB = self.G_AB(x_A).detach()
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/data_parallel.py", line 59, in forward
    replicas = self.replicate(self.module, self.device_ids[:len(inputs)])
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/data_parallel.py", line 64, in replicate
    return replicate(module, device_ids)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/replicate.py", line 12, in replicate
    param_copies = Broadcast(devices)(*params)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/_functions.py", line 19, in forward
    outputs = comm.broadcast_coalesced(inputs, self.target_gpus)
  File "/usr/local/lib/python2.7/dist-packages/torch/cuda/comm.py", line 54, in broadcast_coalesced
    results = broadcast(_flatten_tensors(chunk), devices)
  File "/usr/local/lib/python2.7/dist-packages/torch/cuda/comm.py", line 24, in broadcast
    nccl.broadcast(tensors)
  File "/usr/local/lib/python2.7/dist-packages/torch/cuda/nccl.py", line 182, in broadcast
    comm = communicator(inputs)
  File "/usr/local/lib/python2.7/dist-packages/torch/cuda/nccl.py", line 133, in communicator
    _communicators[key] = NcclCommList(devices)
  File "/usr/local/lib/python2.7/dist-packages/torch/cuda/nccl.py", line 106, in __init__
    check_error(lib.ncclCommInitAll(self, len(devices), int_array(devices)))
  File "/usr/local/lib/python2.7/dist-packages/torch/cuda/nccl.py", line 118, in check_error
    raise NcclError(status)
torch.cuda.nccl.NcclError: System Error (2)
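
The traceback shows the failure happening while NCCL sets up its communicators (`ncclCommInitAll`), before any training step runs, so the cause is probably in the GPU or driver environment rather than in the model code. A minimal diagnostic sketch, assuming the installed NCCL honours the `NCCL_DEBUG` environment variable and that trying a subset of the GPUs first helps narrow things down; the device list "0,1" is only an example:

```python
import os

# Ask NCCL to log what it does during communicator setup.
os.environ.setdefault("NCCL_DEBUG", "INFO")
# Assumption: start with two of the GPUs, then widen the list.
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0,1")

import torch  # imported after the environment variables are set

num_visible = torch.cuda.device_count()
print("visible GPUs:", num_visible)
```

Running main.py with the same variables set, and --num_gpu matching the number of visible devices, should show whether the error depends on a particular GPU or on the GPU count.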
