carpedm20 / discogan-pytorch
PyTorch implementation of "Learning to Discover Cross-Domain Relations with Generative Adversarial Networks"
License: Apache License 2.0
When I test my model, I get the error 'unexpected key "module.layer_module.0.weight" in state_dict'. Have you run into this problem?
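A "module." prefix on state_dict keys usually appears when a checkpoint was saved from a model wrapped in nn.DataParallel and is then loaded into an unwrapped model. The following is a minimal sketch of one workaround, assuming that is the cause here. strip_module_prefix is a hypothetical helper that is not part of this repository, and the dictionary uses placeholder strings instead of real weight tensors.

```python
from collections import OrderedDict


def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading prefix removed from each key."""
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        if key.startswith(prefix):
            key = key[len(prefix):]
        cleaned[key] = value
    return cleaned


# Placeholder values stand in for the real weight tensors.
saved = OrderedDict([
    ("module.layer_module.0.weight", "W0"),
    ("module.layer_module.0.bias", "b0"),
])
cleaned = strip_module_prefix(saved)
print(list(cleaned.keys()))
```

The cleaned dictionary could then be passed to the model's load_state_dict method in place of the original one.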
Hello Sir,
I downloaded your code and the maps dataset.
When I started training, I ran into an error.
...
Traceback (most recent call last):
File "main.py", line 41, in <module>
main(config)
File "main.py", line 33, in main
trainer.train()
File "/itsme/TESTBOARD/additional_networks/GAN/pytorch_DiscoGAN_carpedm20/trainer.py", line 247, in train
format(step, self.max_step, l_d.data[0], l_g.data[0]))
IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number
Thanks.
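The error message itself points at the likely fix: on recent PyTorch versions a 0-dim tensor is converted with .item() rather than indexed with [0]. The sketch below shows the idea under stated assumptions. to_python_number and FakeScalar are hypothetical names, FakeScalar only stands in for a 0-dim tensor so the example runs without PyTorch, and the format string is invented for illustration because the traceback does not show the original one.

```python
def to_python_number(value):
    """Convert a 0-dim tensor-like object to a plain Python number."""
    # Objects exposing .item() (such as 0-dim tensors) are converted through it.
    if hasattr(value, "item"):
        return value.item()
    return value


class FakeScalar:
    """Stand-in for a 0-dim tensor so the sketch runs without PyTorch."""

    def __init__(self, number):
        self._number = number

    def item(self):
        return self._number


step, max_step = 1, 50000
l_d, l_g = FakeScalar(0.6931), FakeScalar(1.2345)
# Hypothetical log format; only the .item() conversion matters here.
message = "[{}/{}] Loss_D: {:.4f} Loss_G: {:.4f}".format(
    step, max_step, to_python_number(l_d), to_python_number(l_g))
print(message)
```

In trainer.py the equivalent change would presumably be to replace l_d.data[0] and l_g.data[0] with l_d.item() and l_g.item().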
It looks like the code crashes when width and height are not equal on custom datasets. I worked around it by resizing my 640x360 images to 640x640.
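Stretching 640x360 images to 640x640 distorts the aspect ratio. An alternative, sketched here as an assumption rather than something the repository does, is to take the largest centered square crop first. center_square_box is a hypothetical helper; it returns a (left, upper, right, lower) box in the order that Pillow's Image.crop expects.

```python
def center_square_box(width, height):
    """Return (left, upper, right, lower) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    upper = (height - side) // 2
    return (left, upper, left + side, upper + side)


# Box for the 640x360 images mentioned in the report.
box = center_square_box(640, 360)
print(box)
```

The cropped square can then be resized to whatever input size the model is configured for.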
I am getting this message:
0%| | 1/50000 [00:03<55:08:42, 3.97s/it]C:\Users\Shadow\Anaconda3\lib\site-packages\torch\nn\modules\loss.py:512: UserWarning: Using a target size (torch.Size([200])) that is different to the input size (torch.Size([200, 1])) is deprecated. Please ensure they have the same size.
Thanks for the PyTorch implementation of DiscoGAN.
I am having trouble running it with multiple GPUs, though.
I think the first problem is in the config.py file (line 53 in 2feae82), where it casts the num_gpu argument to a bool instead of an int.
The code runs with --num_gpu=1 but not with --num_gpu=2.
Error:
Traceback (most recent call last):
File "main.py", line 41, in <module>
main(config)
File "main.py", line 33, in main
trainer.train()
File "/home/***/Documents/DiscoGAN-pytorch-master/trainer.py", line 187, in train
x_AB = self.G_AB(x_A).detach()
File "/home/***/.pyenv/versions/2.7.13/lib/python2.7/site-packages/torch/nn/modules/module.py", line 206, in __call__
result = self.forward(*input, **kwargs)
File "/home/***/Documents/DiscoGAN-pytorch-master/models.py", line 45, in forward
return nn.parallel.data_parallel(self.main, x, gpu_ids)
File "/home/***/.pyenv/versions/2.7.13/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 101, in data_parallel
replicas = replicate(module, device_ids[:len(inputs)])
File "/home/***/.pyenv/versions/2.7.13/lib/python2.7/site-packages/torch/nn/parallel/replicate.py", line 10, in replicate
params = list(network.parameters())
AttributeError: 'function' object has no attribute 'parameters'
Any idea what could be the cause?
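The point about casting num_gpu to a bool can be illustrated with argparse alone. This is a sketch, not the contents of config.py: str2bool is a hypothetical boolean converter of the kind the report describes, and --num_gpu_as_bool is an invented flag used only for comparison. A boolean cannot carry a GPU count of 2, while type=int keeps it.

```python
import argparse


def str2bool(text):
    """Hypothetical boolean converter; any count other than 1 is lost."""
    return text.lower() in ("true", "1")


parser = argparse.ArgumentParser()
# Invented flag that mimics parsing a GPU count as a boolean.
parser.add_argument("--num_gpu_as_bool", type=str2bool, default=False)
# Parsing the same kind of value as an integer preserves the count.
parser.add_argument("--num_gpu", type=int, default=1)

args = parser.parse_args(["--num_gpu_as_bool=2", "--num_gpu=2"])
print(args.num_gpu_as_bool, args.num_gpu)
```

This only addresses the argument parsing; it does not by itself explain the AttributeError in the traceback above.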
I get the following warning message:
[!] Sampled dataset from A and B have different # of data. Try resampling...
From what I understand, the two datasets do not need to be equal in size, is that right?
Both datasets contain 256x256 pixel images.
It downsizes my dataset images to a height of 68px while training.
As soon as I set --input_scale_size to anything other than 64, an error is reported. It seems that the discriminator cannot adapt to the size of the input image. How did you solve this problem?
"ValueError: Target and input must have the same number of elements. target nelement (2) != input nelement (338)"
(base) D:\DiscoGAN>python main.py --dataset=siys2simk --input_scale_size 256 --batch_size 4 --a_grayscale True --b_grayscale True --num_worker 1 --num_gpu=0
[*] MODEL dir: ./logs\siys2simk_2022-08-26_11-17-52
[*] PARAM path: ./logs\siys2simk_2022-08-26_11-17-52\params.json
0%| | 0/500000 [00:00<?, ?it/s]
Traceback (most recent call last):
File "D:\DiscoGAN\main.py", line 41, in <module>
main(config)
File "D:\DiscoGAN\main.py", line 33, in main
trainer.train()
File "D:\DiscoGAN\trainer.py", line 200, in train
l_d_A_real, l_d_A_fake = bce(self.D_A(x_A).squeeze(1), real_tensor), bce(self.D_A(x_BA).squeeze(1), fake_tensor)
File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\loss.py", line 613, in forward
return F.binary_cross_entropy(input, target, weight=self.weight, reduction=self.reduction)
File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\functional.py", line 3074, in binary_cross_entropy
raise ValueError(
ValueError: Using a target size (torch.Size([4])) that is different to the input size (torch.Size([4, 169])) is deprecated. Please ensure they have the same size.
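The shapes in these errors (169 values per image, and 338 = 2 x 169 for a batch of 2) are consistent with a convolutional discriminator that only collapses to a single value for 64x64 inputs. The arithmetic below is a sketch under an assumption that is not confirmed by the report: four downsampling layers with kernel 4, stride 2, padding 1, followed by a final layer with kernel 4, stride 1, padding 0. It uses the standard convolution output-size formula.

```python
def conv_out_size(size, kernel, stride, padding):
    """Spatial output size of a convolution (floor division, no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def discriminator_out_size(input_size, num_stride2_layers=4):
    """Side length of the discriminator output map for a square input."""
    # ASSUMPTION: each downsampling layer uses kernel 4, stride 2, padding 1,
    # and the final layer uses kernel 4, stride 1, padding 0.
    size = input_size
    for _ in range(num_stride2_layers):
        size = conv_out_size(size, kernel=4, stride=2, padding=1)
    return conv_out_size(size, kernel=4, stride=1, padding=0)


for input_size in (64, 256):
    side = discriminator_out_size(input_size)
    # Print the input size, the output side length, and values per image.
    print(input_size, side, side * side)
```

Under that assumption a 256x256 input yields a 13x13 map, which matches the 169 in the error. Making larger inputs work would then require either more downsampling layers or reducing the map (for example by averaging) before the loss.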
class Generator(nn.Module):
    def __init__(self, input_size, output_size, hidden_dims):
        super(Generator, self).__init__()
        self.layers = []
        prev_dim = input_size
        for hidden_dim in hidden_dims:
            self.layers.append(nn.Linear(prev_dim, hidden_dim))
            self.layers.append(nn.ReLU(True))
            prev_dim = hidden_dim
        self.layers.append(nn.Linear(prev_dim, output_size))
        self.layer_module = ListModule(*self.layers)

    def forward(self, x):
        out = x
        for layer in self.layers:
            out = layer(out)
        return out

class Discriminator(nn.Module):
    def __init__(self, input_size, output_size, hidden_dims):
        super(Discriminator, self).__init__()
        self.layers = []
        prev_dim = input_size
        for idx, hidden_dim in enumerate(hidden_dims):
            self.layers.append(nn.Linear(prev_dim, hidden_dim))
            self.layers.append(nn.ReLU(True))
            prev_dim = hidden_dim
        self.layers.append(nn.Linear(prev_dim, output_size))
        self.layers.append(nn.Sigmoid())
        self.layer_module = ListModule(*self.layers)

    def forward(self, x):
        out = x
        for layer in self.layers:
            out = layer(out)
        return out.view(-1, 1)

hidden_dim = 128
g_num_layer = 3
d_num_layer = 5
G_AB = Generator(2, 2, [hidden_dim] * g_num_layer)
G_BA = Generator(2, 2, [hidden_dim] * g_num_layer)
D_A = Discriminator(2, 1, [hidden_dim] * d_num_layer)
D_B = Discriminator(2, 1, [hidden_dim] * d_num_layer)
G_AB.cuda()
G_BA.cuda()
D_A.cuda()
D_B.cuda()
lr = 0.0002
beta1 = 0.5
beta2 = 0.999
d = nn.MSELoss()
bce = nn.BCELoss()
optimizer_d = torch.optim.Adam(
    chain(D_A.parameters(), D_B.parameters()), lr=lr, betas=(beta1, beta2))
optimizer_g = torch.optim.Adam(
    chain(G_AB.parameters(), G_BA.parameters()), lr=lr, betas=(beta1, beta2))
num_epoch = 50000
real_label = 1
fake_label = 0
real_tensor = Variable(torch.FloatTensor(batch_size).cuda())
_ = real_tensor.data.fill_(real_label)
print(real_tensor.sum())
fake_tensor = Variable(torch.FloatTensor(batch_size).cuda())
_ = fake_tensor.data.fill_(fake_label)
print(fake_tensor.sum())
RuntimeError Traceback (most recent call last)
in ()
77
78 real_tensor = Variable(torch.FloatTensor(batch_size).cuda())
---> 79 _ = real_tensor.data.fill_(real_label)
80 print(real_tensor.sum())
81
RuntimeError: cuda runtime error (8) : invalid device function at /py/conda-bld/pytorch_1493677666423/work/torch/lib/THC/generic/THCTensorMath.cu:15
In the DiscoGAN paper, the authors state:
"to avoid costly pairing, we address the task of discovering cross-domain relations given unpaired data"
But I don't understand why this implementation of DiscoGAN requires paired data (as far as I can see in the downloaded dataset).
Or is it just to simplify the input process?
As of now I don't have a GPU to train on.
Hi there,
Do you have any pre-trained model for the Cityscapes example that could be made available online?
Thanks
I tried out my own dataset in data/mydata (with the A and B folders) but I get the following error:
~/DiscoGAN-pytorch$ python main.py --dataset=mydata --num_gpu=1
[*] MODEL dir: logs/mydata_2017-03-21_14-49-25
[*] PARAM path: logs/mydata_2017-03-21_14-49-25/params.json
Traceback (most recent call last):
File "main.py", line 41, in <module>
main(config)
File "main.py", line 33, in main
trainer.train()
File "/home/bart/DiscoGAN-pytorch/trainer.py", line 161, in train
valid_x_A, valid_x_B = self._get_variable(A_loader.next()), self._get_variable(B_loader.next())
File "/home/bart/anaconda3/envs/Python36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 174, in __next__
return self._process_next_batch(batch)
File "/home/bart/anaconda3/envs/Python36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 198, in _process_next_batch
raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
File "/home/bart/anaconda3/envs/Python36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 32, in _worker_loop
samples = collate_fn([dataset[i] for i in batch_indices])
File "/home/bart/anaconda3/envs/Python36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 68, in default_collate
return torch.stack(batch, 0)
File "/home/bart/anaconda3/envs/Python36/lib/python3.6/site-packages/torch/functional.py", line 56, in stack
return torch.cat(list(t.unsqueeze(dim) for t in sequence), dim)
RuntimeError: inconsistent tensor sizes at /data/users/soumith/miniconda2/conda-bld/pytorch-0.1.10_1488755368782/work/torch/lib/TH/generic/THTensorMath.c:2548
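torch.stack, which the default collate function calls in this traceback, requires every sample in a batch to have the same shape, so "inconsistent tensor sizes" usually means the images differ in size or channel count. A minimal sketch of a pre-check follows. summarize_shapes and is_stackable are hypothetical helpers, and the example shapes (a grayscale image mixed in with RGB ones) are made up for illustration, not taken from the report.

```python
from collections import Counter


def summarize_shapes(shapes):
    """Count how often each sample shape occurs."""
    return Counter(tuple(shape) for shape in shapes)


def is_stackable(shapes):
    """True when every sample has the same shape, as stacking requires."""
    return len(summarize_shapes(shapes)) <= 1


# Hypothetical (channels, height, width) shapes for three samples.
shapes = [(3, 64, 64), (3, 64, 64), (1, 64, 64)]
print(is_stackable(shapes), summarize_shapes(shapes))
```

Running such a check over the A and B folders before training would show which shapes are in the minority and need converting or resizing.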
Hello,
Could you please show how to test the final model on a single image?
Thank you,
Lafi
Hi,
I've been trying to run the example code (on the maps dataset):
python main.py --dataset=maps --num_gpu=4
I get the error below related to the NCCL library. I'm trying to run this on 4 K80 GPUs.
Any suggestions on what could be causing this and what a solution could be?
pix2pix processing: 100%|#######################| 1096/1096 [00:00<00:00, 178591.97it/s]
pix2pix processing: 100%|#######################| 1096/1096 [00:00<00:00, 213732.43it/s]
[*] MODEL dir: logs/maps_2017-10-26_20-36-34
[*] PARAM path: logs/maps_2017-10-26_20-36-34/params.json
0%| | 0/500000 [00:00<?, ?it/s]
Traceback (most recent call last):
File "main.py", line 41, in <module>
main(config)
File "main.py", line 33, in main
trainer.train()
File "/home/nbserver/DiscoGAN-pytorch/trainer.py", line 193, in train
x_AB = self.G_AB(x_A).detach()
File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 224, in __call__
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/data_parallel.py", line 59, in forward
replicas = self.replicate(self.module, self.device_ids[:len(inputs)])
File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/data_parallel.py", line 64, in replicate
return replicate(module, device_ids)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/replicate.py", line 12, in replicate
param_copies = Broadcast(devices)(*params)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/_functions.py", line 19, in forward
outputs = comm.broadcast_coalesced(inputs, self.target_gpus)
File "/usr/local/lib/python2.7/dist-packages/torch/cuda/comm.py", line 54, in broadcast_coalesced
results = broadcast(_flatten_tensors(chunk), devices)
File "/usr/local/lib/python2.7/dist-packages/torch/cuda/comm.py", line 24, in broadcast
nccl.broadcast(tensors)
File "/usr/local/lib/python2.7/dist-packages/torch/cuda/nccl.py", line 182, in broadcast
comm = communicator(inputs)
File "/usr/local/lib/python2.7/dist-packages/torch/cuda/nccl.py", line 133, in communicator
_communicators[key] = NcclCommList(devices)
File "/usr/local/lib/python2.7/dist-packages/torch/cuda/nccl.py", line 106, in __init__
check_error(lib.ncclCommInitAll(self, len(devices), int_array(devices)))
File "/usr/local/lib/python2.7/dist-packages/torch/cuda/nccl.py", line 118, in check_error
raise NcclError(status)
torch.cuda.nccl.NcclError: System Error (2)