imlixinyang / HiSD
Official PyTorch implementation of the paper "Image-to-image Translation via Hierarchical Style Disentanglement" (CVPR 2021 Oral).
License: Other
Hi, thanks for sharing your work.
How many tags have you tried training with? What is the relation between the number of tags and the number of training iterations?
And how many tags would you recommend training at once?
Hi, thanks for sharing the code for this awesome work.
I am wondering whether there is a typo at trainer.py L66:
```python
loss_gen_adv = self.dis.calc_gen_loss_real(x, s, y, i, j) + \
               self.dis.calc_gen_loss_fake_trg(x_trg, s_trg.detach(), y, i, j_trg) + \
               self.dis.calc_gen_loss_fake_cyc(x_cyc, s.detach(), y, i, j)
```
When updating the generator, why is the real image used for calculating the loss?
I downloaded the CelebAMask-HQ dataset following your instructions, but I don't know how exactly to set img_path, label_path, target_path, and so on. Could you give me an example? I am a beginner. Also, I am not sure about label_path: the files I downloaded contain no label file. Could it be mapping.txt? But after I set it to that, I got an error saying the path does not exist. Thank you!
Hello, the work in this paper is impressive, and I would like to download the RaFD dataset to train for my own work. The link you added in the README cannot be opened, and although I have sent an application email to the official RaFD address, I have never received a reply. Could you provide another way to download it? Many thanks. My email is [email protected].
Please have a look:
```python
class Dis(nn.Module):
    def __init__(self, hyperparameters):
        super().__init__()
        self.tags = hyperparameters['tags']
        channels = hyperparameters['discriminators']['channels']
        # [64, 128, 256, 512, 1024, 2048]
        self.conv = nn.Sequential(
            nn.Conv2d(hyperparameters['input_dim'], channels[0], 1, 1, 0),
            *[DownBlock(channels[i], channels[i + 1]) for i in range(len(channels) - 1)],
            nn.AdaptiveAvgPool2d(1),
        )
        self.fcs = nn.ModuleList([nn.Sequential(
            nn.Conv2d(channels[-1] +  # 2048
                      # ALI part, not shown in the original submission, which helps
                      # disentangle the extracted style.
                      hyperparameters['style_dim'] +  # 256
                      # Tag-irrelevant part. Sec. 3.4
                      self.tags[i]['tag_irrelevant_conditions_dim'],  # 2 2 2
                      # One output for translated, one for cycle. Eq. 4
                      len(self.tags[i]['attributes'] * 2), 1, 1, 0),  # 4 4 6
        ) for i in range(len(self.tags))])  # i selects which of the three tags this head is for

    def forward(self, x, s, y, i):
        f = self.conv(x)
        # Concatenate along the channel dimension (dim 1), broadcasting s and y spatially.
        fsy = torch.cat([f, tile_like(s, f), tile_like(y, f)], 1)
        return self.fcs[i](fsy).view(f.size(0), 2, -1)
```
Hello author, I have a few questions about the discriminator that I don't quite understand, and I hope you can help me:
I don't quite follow how the discriminator avoids changing the two tag-irrelevant labels; the process is unclear to me.
In the discriminator's forward, the final .view(f.size(0), 2, -1): the first dimension is batch_size, but I don't understand what the second dimension is or why it is 2. Is the third dimension the controlled attribute? I can't quite follow this part.
Regarding the generator's adversarial loss: why does the real image take the sum of the means of [:, 0] and [:, 1], while the two fake images take the mean of [:, 0] and of [:, 1] respectively? I don't quite understand this. The code is as follows:
```python
def calc_gen_loss_real(self, x, s, y, i, j):
    loss = 0
    out = self.forward(x, s, y, i)[:, :, j]  # select attribute j
    # e.g. from [8, 2, 2], taking [:, :, 1] gives [8, 2]
    loss += out[:, 0].mean()
    loss += out[:, 1].mean()
    return loss

def calc_gen_loss_fake_trg(self, x, s, y, i, j):
    out = self.forward(x, s, y, i)[:, :, j]
    loss = - out[:, 0].mean()
    return loss

def calc_gen_loss_fake_cyc(self, x, s, y, i, j):
    out = self.forward(x, s, y, i)[:, :, j]
    loss = - out[:, 1].mean()
    return loss
```
I hope you can resolve my confusion. Thank you!
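For readers following this thread, here is a minimal, self-contained shape sketch (my own illustration, not repo code): each discriminator head emits 2 * num_attributes scores per sample, and .view(batch, 2, -1) arranges them so that index 0 along dim 1 scores the translated branch and index 1 the cycle branch, per the "One output for translated, one for cycle. Eq. 4" comment in the code above.

```python
import torch

# Hypothetical sizes for illustration only.
batch, num_attributes = 8, 2
# A head's conv output: 2 * num_attributes scores per sample, shape [8, 4, 1, 1].
raw = torch.randn(batch, 2 * num_attributes, 1, 1)
out = raw.view(batch, 2, -1)        # [8, 2, 2]: (batch, {translated, cycle}, attribute)
per_attr = out[:, :, 1]             # pick attribute j = 1 -> [8, 2]
translated_scores = per_attr[:, 0]  # scores for the translated image branch
cycle_scores = per_attr[:, 1]       # scores for the cycle-reconstructed image branch
print(out.shape, per_attr.shape, translated_scores.shape)
```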
Can you release a checkpoint for the "Open the mouth" tag?
Hi authors,
I found an issue when training the code with multiple GPUs: only a single GPU is used even when multiple GPUs are specified. Could you look into this problem in your spare time?
Thanks!
Hi, I am trying your model on the AFHQ dataset and find that it preserves the background of the source image very well. I think this is thanks to the attention map; when I visualize the attention map, I find it does learn the mask.
However, when I copy the attention module into my own framework (this paper), it does not work at all and fails to learn the mask. The main differences between my framework and yours lie in how the mapping network is used and in the absence of a KL/MMD-related loss between the random-noise distribution and the reference-encoder embedding distribution (I also tried directly replacing your generator and it failed to learn the mask too).
I am wondering whether you have some experience with your attention-map design. Under what conditions do you think it can learn the mask? It would be really great if you could share some experience with me, thanks a lot!
Looking forward to your reply!
Hi, thanks for your beautiful work. I want to know the reasoning behind the design of the m and f in the translator. Is there any reference work, or did you design this by experiment? Also, your paper mentions that "The attention mask in our translator is both spatial-wise and channel-wise"; can you explain this specifically?
Thank you for your research!
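A minimal sketch (my own illustration, not the repo's code) of the distinction the question asks about: a purely spatial mask has one value per location shared by all channels, while a mask that is both spatial-wise and channel-wise has an independent value per channel and per location. The blend e_trg = m * f + (1 - m) * e is one common way such a mask is applied; check the repo's translator for the exact formulation.

```python
import torch

# Hypothetical feature shapes for illustration.
B, C, H, W = 4, 256, 16, 16
e = torch.randn(B, C, H, W)  # input feature
f = torch.randn(B, C, H, W)  # translated feature

m_spatial = torch.sigmoid(torch.randn(B, 1, H, W))  # spatial-wise only: shared across channels
m_full = torch.sigmoid(torch.randn(B, C, H, W))     # spatial- AND channel-wise: per channel, per location

# One common blending pattern (an assumption here, not necessarily HiSD's exact equation):
e_trg = m_full * f + (1 - m_full) * e
```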
But now, I am trying to generate a new "checkpoint_512_celeba-hq.pt" for 512x512 images. However, something still goes wrong in the test.py phase.
The error message shows:
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
What can I do now? Do you have the checkpoint_512_celeba-hq.pt file?
Dear author, I appreciate your work, but I have some questions about data imbalance. The number of faces with glasses is only 1377, while the number without glasses is 25622. I want to know the effects of this data imbalance. Thanks~
There is no key named 'image_size' in any config file.
Please check~
I am very interested in your work; can you release your beard model?
Hello author,
Thank you very much for your patient replies to my earlier questions. I am now building some work on top of your framework and need to run some comparison experiments. I saw that your paper uses SDIT for the reference-guided experiment; I read that paper too, and found that it uses a random vector z fed into the discriminator for mapping, with the discriminator constraining that style.
I would now also like to run a reference-guided experiment. Should I feed the source image x into the discriminator to obtain a style, and then feed what the generator needs into the generator to produce the image? I would appreciate your guidance, thanks!
There are some errors: in test.py, os.makedirs(opts.output_dir, exist_ok=True) should be changed to os.makedirs(opts.output_path, exist_ok=True), and a local variable further down has the same mistake.
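A minimal sketch of the reported fix, assuming test.py parses an --output_path flag (the names here follow the issue text, not a verified copy of the file):

```python
import argparse
import os

parser = argparse.ArgumentParser()
parser.add_argument('--output_path', type=str, default='results')
opts = parser.parse_args()

# The namespace attribute must match the flag name: opts.output_path,
# not opts.output_dir (which would raise AttributeError).
os.makedirs(opts.output_path, exist_ok=True)
```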
In your paper, you designed non-translation, self-translation, and cycle-translation. In the non-translation path, the generator just encodes the input, e = encoder(input), and then decodes e, output = decoder(e). Certainly you have already achieved great results with HiSD designed like this. I am a beginner with GANs, and I am wondering why you don't set the style code to 0 and use a translator in your non-translation path, i.e. e = encoder(input), e_trg = translator(e, 0), output = decoder(e_trg); I have seen other works do this. Could it make any difference? A sketch of the two variants follows.
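For concreteness, a minimal sketch of the two non-translation variants the question contrasts (encoder, translator, and decoder are placeholder callables here, not the repo's actual modules):

```python
import torch

def non_translation_plain(x, encoder, decoder):
    """Variant used in the paper: encode, then decode directly."""
    e = encoder(x)
    return decoder(e)

def non_translation_zero_style(x, encoder, translator, decoder):
    """Proposed variant: pass through the translator with an all-zero style code."""
    e = encoder(x)
    zero_style = torch.zeros(x.size(0), 256, device=x.device)  # 256 = style_dim, an assumption
    e_trg = translator(e, zero_style)
    return decoder(e_trg)
```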
Sorry to bother you. I am doing some work on your model and would like to ask a few questions about training technique.
As for the number of training iterations: I combine the loss curves with the sample images generated during training, and when the loss has stabilized and the generated images look OK, I consider training done. Do you have any additional comments?
Regarding model collapse, can you give some examples of it in this task?
Do you have any tips for setting the loss-function coefficients?
Looking forward to your reply~
Hello, the model looks very good, but I encountered some problems when trying to train it myself.
My training environment is Win10, torch 1.8.0 + CUDA 11, an RTX 3090, and 32 GB of memory.
When I use the default config, it reports CUDA out of memory, but this is a misleading error message, because there is still plenty of free CUDA memory. Tracking hardware resources during training, I found that the amount of committed memory starts increasing before formal training begins and eventually overflows, which means a huge amount of virtual memory is requested before training starts. When I lower the parameters so training runs normally, the actual memory usage is very small while the virtual memory usage is still large, though it no longer reaches the limit that was hit before. I have never seen such a huge virtual-memory overhead before, so I wonder whether there is a memory leak during preloading or preprocessing, and whether the program can be better optimized.
Thank you!
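One generic thing worth checking on Windows (a suggestion of mine, not a confirmed diagnosis of this repo): each PyTorch DataLoader worker is a separate process that re-imports the script and duplicates dataset state, which can commit a lot of virtual memory before training starts. A quick experiment is to drop the worker count:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# A dummy dataset standing in for the repo's image dataset.
dataset = TensorDataset(torch.randn(100, 3, 128, 128))

# num_workers=0 loads data in the main process; on Windows each worker
# process re-imports the script and duplicates dataset state, which can
# commit a large amount of virtual memory before training starts.
loader = DataLoader(dataset, batch_size=8, shuffle=True, num_workers=0, pin_memory=True)
```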
```yaml
tags:
  -
    name: Bangs
    tag_irrelevant_conditions_dim: 2
    attributes:
      -
        name: 'with'
        filename: datasets/Bangs_with.txt
      -
        name: 'without'
        filename: datasets/Bangs_without.txt
  -
    name: Eyeglasses
    tag_irrelevant_conditions_dim: 2
    attributes:
      -
        name: 'with'
        filename: datasets/Eyeglasses_with.txt
```
If I want to run a multi-tag experiment, how should I set up the tags in celeba-hq.yaml (excerpt above)?
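If it helps, since tags is a YAML list, adding another tag appears to just mean appending another block with its own attribute image-list files; a sketch with a hypothetical Beard tag (the tag name and file paths are assumptions):

```yaml
  -
    # Hypothetical extra tag; prepare these image-list files beforehand.
    name: Beard
    tag_irrelevant_conditions_dim: 2
    attributes:
      -
        name: 'with'
        filename: datasets/Beard_with.txt
      -
        name: 'without'
        filename: datasets/Beard_without.txt
```

The discriminator heads are built per tag (see the Dis code above, where self.fcs holds one head per entry in hyperparameters['tags']), so the number of tags is picked up from the config.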
Hello author, I would like to train HiSD on a different dataset. For CelebA you preprocess the data through a label.txt file that contains information about attributes such as bangs and eyeglasses. Are there other datasets with such annotations already prepared for HiSD?~
Hi, your work is so impressive that I want to reproduce your experiment. But I have a problem with the batch-size config: I use a single GTX 1080Ti GPU as you did, but a batch size of 8 seems too big for it; I can only set the batch size to 5 or 4. Can you tell me a solution to this problem? Thanks a lot.
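One generic workaround (a suggestion of mine, not something from the repo) is gradient accumulation: run several small batches and step the optimizer once, which emulates a larger effective batch at the cost of extra iterations. A minimal sketch with dummy stand-ins for the training script's objects:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Dummy model and data standing in for the repo's training objects.
model = nn.Linear(16, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loader = DataLoader(TensorDataset(torch.randn(32, 16)), batch_size=4)

accum_steps = 2  # 2 sub-batches of 4 emulate an effective batch of 8

optimizer.zero_grad()
for step, (x,) in enumerate(loader):
    loss = model(x).pow(2).mean() / accum_steps  # scale so gradients average correctly
    loss.backward()                              # gradients accumulate across sub-batches
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

Note that for GAN training the accumulation has to respect the alternating generator/discriminator updates, so it needs a little more care than this sketch shows.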
Hello author, I see that the environment given in the README is CUDA-based. My computer has an Apple M1 chip, and PyTorch does not yet support a CUDA build for M1, only the default build. Is there any workaround?
Thank you for open-sourcing this. The code is very helpful for my understanding of face-attribute transfer. You have released the 256-size checkpoint; could you also release the 1024-size checkpoint for learning and use? Thank you very much.
Hello author, as the title says, I would like to make some improvements on top of your generator and discriminator architectures for my own work. May I ask which paper your generator and discriminator models are based on? ^_^
Hi, I see train.py in the released code, and the given training example uses the 128 config (celeba-hq.yaml). If I want to train with the 256 config, celeba-hq_256.yaml, what changes should I make in the code? I tried simply swapping the yaml file used, but that did not work.
Hello, I am training the model on the AFHQ dataset. When using custom.py in the preprocessors folder, I get the error 'Namespace' object has no attribute 'imgs', and I don't see an --imgs argument defined among the parameters. How should I fix this? Thank you very much.
Hello author, can this method be seen as an enhanced version of pix2pix?
First of all, congratulations on the results of the research,
and thank you for the concise and understandable code implementation.
But I still encountered some problems when trying to reproduce the quantitative experiment results in the paper. I did the following:
Realism:
Disentanglement:
Then I got FID results:
L:25.05 R:25.21 G:0.16 in the "Realism" experiment
L:85.75 R:84.45 G:1.30 in the "Disentanglement" experiment
While In the paper:
L:21.37 R:21.49 G:0.12 in the "Realism" experiment
L:71.85 R:71.48 G:0.37 in the "Disentanglement" experiment
Although there are random factors in many places in the experiment and it is normal for FID results to fluctuate, these results are too far off.
I think there must be something wrong with my data processing, or the training method.
So, could you please explain the data used in the quantitative experiment and the method of data processing in detail? If possible, could you please release the model of the paper's experiment config?
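For anyone else reproducing this, one common way to compute FID between two image folders is the pytorch-fid package (an assumption on my part; the paper's exact evaluation protocol may differ):

```python
# pip install pytorch-fid
from pytorch_fid.fid_score import calculate_fid_given_paths

# real_dir and fake_dir are hypothetical folders of real and generated images.
fid = calculate_fid_given_paths(
    paths=['real_dir', 'fake_dir'],
    batch_size=50,
    device='cuda',
    dims=2048,  # pool3 features of InceptionV3, the standard choice
)
print(f'FID: {fid:.2f}')
```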
Hi, I would like to ask: on the cat/dog dataset, can this model turn a cat into a dog by modifying other tags (say, attributes like the nose or eyes) rather than directly modifying the species tag?
```
Traceback (most recent call last):
  File "core/test.py", line 78, in <module>
    c_trg = T(c_, s_, step['tag'])
NameError: name 'c_' is not defined
```
I have completed the first few steps of the Quick Start: downloading the dataset and preprocessing it. I have not done the train.py step because of GPU problems, so I went straight to test.py.
As you suggest, $your_input_path can be either an image file or a folder of images, so I tried
python core/test.py --config configs/celeba-hq.yaml --checkpoint configs/checkpoint_256_celeba-hq.pt--input_path CelebAMask-HQ/CelebAMask-HQ/CelebA-HQ-img --output_path result
which gives: test.py: error: unrecognized arguments: CelebAMask-HQ/CelebAMask-HQ/CelebA-HQ-img
and
python core/test.py --config configs/celeba-hq.yaml --checkpoint configs/checkpoint_256_celeba-hq.pt--input_path CelebAMask-HQ/CelebAMask-HQ/CelebA-HQ-img/0.jpg --output_path result
which gives: test.py: error: unrecognized arguments: CelebAMask-HQ/CelebAMask-HQ/CelebA-HQ-img/0.jpg
I don't know whether this is because I haven't yet modified the 'steps' dict in the first few lines of core/test.py. If it is for this reason, can you tell me how to modify the 'steps' dict? As a junior who has only studied deep learning for one or two months, this is really a bit difficult for me. Thanks a lot.
Hello, I would like to fine-tune from your pretrained model instead of training from scratch. How should I do this?
Thanks for your great work. I have some questions about the training configuration. There are differences between celeba-hq.yaml and celeba-hq_256.yaml in the configs folder, such as the normalization over different channels. What is the reason for this? And if I want to train the model at a size of 512 or 1024, how should I set the config? I would appreciate your reply.
According to your paper and training code, three image-generation paths (raw / self / random style code) are used. However, it is easy to add another similar path, such as a random style image, as a training path: specifically, randomly pick another image as the input to the style encoder and follow the same data flow as the random style code. Some other works (like https://github.com/saic-mdal/HiDT) have included such paths in their training and obtained satisfying results.
So I am wondering: have you tried this path before? What were the results like, and why was it not added to the paper? A pseudocode sketch of what I mean follows.
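To make the proposed phase concrete, a minimal sketch (module names are placeholders, not the repo's actual API):

```python
def reference_style_phase(x, x_ref, encoder, style_encoder, translator, decoder, tag):
    """Proposed extra phase: extract the style from a randomly picked
    reference image x_ref instead of sampling a random style code,
    then follow the same data flow as the random-style-code phase."""
    e = encoder(x)
    s_ref = style_encoder(x_ref, tag)  # style extracted from the reference image
    e_trg = translator(e, s_ref)
    return decoder(e_trg)
```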