
imlixinyang / hisd


Official PyTorch implementation of the paper "Image-to-image Translation via Hierarchical Style Disentanglement" (CVPR 2021 Oral).

License: Other

Python 87.42% Jupyter Notebook 12.58%
disentangled-representations gan image-to-image-translation

hisd's Introduction

Hi there 👋

hisd's People

Contributors

imlixinyang


hisd's Issues

The code may have some typos?

Hi, thanks for sharing the code for this awesome work.
I am wondering whether there is a typo in trainer.py L66:

loss_gen_adv = self.dis.calc_gen_loss_real(x, s, y, i, j) + \
               self.dis.calc_gen_loss_fake_trg(x_trg, s_trg.detach(), y, i, j_trg) + \
               self.dis.calc_gen_loss_fake_cyc(x_cyc, s.detach(), y, i, j)
When updating the generator, why use the real image for calculating the loss?

Hello author, a question about the dataset

Following your instructions I downloaded the CelebAMask-HQ dataset, but I don't know how to set img_path, label_path, target_path, and the like. Could you give me an example? I am a beginner. I am also unsure about label_path: the files I downloaded contain no label file. Is it mapping.txt? But after setting it to that, it tells me the path does not exist. Thank you!

Request for the RaFD dataset

Hello author, the work in this paper is impressive, and I would like to download the RaFD dataset to train for my own work. The link you added in the readme cannot be opened, and although I have sent a request email to the RaFD official mailbox, I have never received a reply. Could you provide another way to download it? Many thanks. My email is [email protected]

Hello author, a question about the discriminator

```python
class Dis(nn.Module):
    def __init__(self, hyperparameters):
        super().__init__()
        self.tags = hyperparameters['tags']
        channels = hyperparameters['discriminators']['channels']
        # [64, 128, 256, 512, 1024, 2048]
        self.conv = nn.Sequential(
            nn.Conv2d(hyperparameters['input_dim'], channels[0], 1, 1, 0),
            *[DownBlock(channels[i], channels[i + 1]) for i in range(len(channels) - 1)],
            nn.AdaptiveAvgPool2d(1),
        )
        self.fcs = nn.ModuleList([nn.Sequential(
            nn.Conv2d(channels[-1] +  # 2048
                      # ALI part, not shown in the original submission,
                      # but it helps disentangle the extracted style.
                      hyperparameters['style_dim'] +  # 256
                      # Tag-irrelevant part. Sec. 3.4
                      self.tags[i]['tag_irrelevant_conditions_dim'],  # 2 2 2
                      # One for translated, one for cycle. Eq. 4
                      len(self.tags[i]['attributes'] * 2), 1, 1, 0),  # 4 4 6
        ) for i in range(len(self.tags))])  # i selects which of the tags this head serves

    def forward(self, x, s, y, i):
        f = self.conv(x)
        # concatenate along dim 1, the channel dimension
        fsy = torch.cat([f, tile_like(s, f), tile_like(y, f)], 1)
        return self.fcs[i](fsy).view(f.size(0), 2, -1)
```

Hello author, there are a few points about the discriminator that I don't quite understand, and I hope you can explain them:

  1. I don't quite understand how the discriminator manages not to change the two irrelevant tags; what is the process here?

  2. In the discriminator's forward, the final .view(f.size(0), 2, -1): the first dimension is the batch size, but I don't understand what the second dimension is or why it is 2, and is the third dimension the controlled attributes? (See the shape check at the end of this issue.)

  3. About computing the generator's adversarial loss: why does the real image take the sum of the means of [:, 0] and [:, 1], while the two fake images take the mean of [:, 0] and [:, 1] respectively? I don't quite follow. The code is as follows:
```python
def calc_gen_loss_real(self, x, s, y, i, j):
    loss = 0
    out = self.forward(x, s, y, i)[:, :, j]  # select attribute j
    # e.g. shape [8, 2, 2] indexed with [:, :, 1] becomes [8, 2]
    loss += out[:, 0].mean()
    loss += out[:, 1].mean()
    return loss

def calc_gen_loss_fake_trg(self, x, s, y, i, j):
    out = self.forward(x, s, y, i)[:, :, j]
    loss = - out[:, 0].mean()
    return loss

def calc_gen_loss_fake_cyc(self, x, s, y, i, j):
    out = self.forward(x, s, y, i)[:, :, j]
    loss = - out[:, 1].mean()
    return loss
```
I hope you can resolve my confusion. Thank you!
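For question 2, a toy shape check based on my reading of the code above (an addition for clarity, not the author's answer): the per-tag head outputs 2 * len(attributes) channels, and .view(N, 2, -1) splits them into two scores per attribute, with row 0 feeding the translated-image term and row 1 the cycle term, per the "one for translated, one for cycle" comment.

```python
import torch

N, num_attributes = 8, 2                              # batch size, attributes of tag i
head_out = torch.randn(N, 2 * num_attributes, 1, 1)   # fc head output
scores = head_out.view(N, 2, -1)                      # -> (8, 2, 2)
j = 1
out = scores[:, :, j]                                 # select attribute j -> (8, 2)
print(out[:, 0].shape, out[:, 1].shape)               # two (8,) score vectors
```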

Questions about degraded quality of generated results

Hello author, as a beginner I have benefited a lot from your work. I have a few questions:

  1. While reproducing the paper, when converting the Young and Glasses attributes separately, the following appeared at around 110k training iterations (it did not appear when training other combinations). What could be the cause?

[screenshot]

[screenshot]

  2. Following question 1, do different tag combinations affect the training results?
  3. Does more training always give better results, or should the best checkpoint be found by visually inspecting the generated results?
  4. When doing reference-guided translation, can the reference image be constrained through labels?

Looking forward to your reply, many thanks! Best wishes for your research.

About multi-GPU training

Hi authors,

I found an issue when training the code with multiple GPUs: only a single GPU is used even though multiple GPUs are specified. Could you look into this problem in your spare time?

Thanks!
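For anyone hitting the same wall, a minimal sketch of the usual workaround (my assumption, not the authors' fix): wrap the networks in torch.nn.DataParallel so each batch is split across the visible GPUs. The toy module below stands in for HiSD's networks.

```python
import torch
import torch.nn as nn

# Toy network standing in for HiSD's generator/discriminator.
net = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU())

if torch.cuda.device_count() > 1:
    net = nn.DataParallel(net)   # replicate the module across all visible GPUs
net = net.cuda()

x = torch.randn(8, 3, 128, 128).cuda()
y = net(x)                       # the batch of 8 is split across the GPUs
print(y.shape)
```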

Questions about the attention map

Hi, I am trying your model on the AFHQ dataset and find that it preserves the background of the source image very well. I think this is thanks to the attention map; I visualized the attention map and found that it does learn the mask.

However, when I copy the attention module into my own framework (this paper), it does not work at all and fails to learn the mask. The main differences between mine and yours lie in the use of the mapping network and the absence of a KL/MMD-style loss between the random-noise distribution and the reference-encoder embedding distribution (I also tried directly replacing my generator with yours, and it failed to learn the mask too).

I am wondering whether you have some experience with your attention map design: what do you think makes it able to learn the mask? It would be really great if you could share some experience, thanks a lot!

Looking forward to your reply!

Question about the paper

Hi, thanks for your beautiful work. I want to know the reason for the design of the m and f branches of the translator: is there any reference work, or did you design it by experiment? Also, your paper mentions that "The attention mask in our translator is both spatial-wise and channel-wise." Could you explain that specifically?
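To make "spatial-wise and channel-wise" concrete, here is a minimal sketch of one plausible reading (an assumption of mine, not the released translator): the mask m keeps the full (N, C, H, W) shape, so every channel at every spatial location gets its own blend weight between the translated feature and the original feature.

```python
import torch
import torch.nn as nn

class MaskedBlend(nn.Module):
    """Blend translated features t into original features e using a
    per-channel, per-pixel attention mask in [0, 1]."""
    def __init__(self, channels):
        super().__init__()
        self.to_mask = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, e, t):
        m = self.to_mask(t)          # (N, C, H, W): spatial- and channel-wise
        return m * t + (1 - m) * e   # masked entries take t, the rest keep e

blend = MaskedBlend(16)
e = torch.randn(2, 16, 32, 32)
t = torch.randn(2, 16, 32, 32)
print(blend(e, t).shape)             # torch.Size([2, 16, 32, 32])
```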

Another image size

Thank you for your research!
I am now trying to train a new "checkpoint_512_celeba-hq.pt" for 512x512 images. However, something still goes wrong during the test.py phase.

The error message shows:
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

What can I do now? Do you have a checkpoint_512_celeba-hq.pt file?

Data imbalance

Dear author, I appreciate your work, but I have some questions about data imbalance. The number of faces with glasses is only 1,377, while the number without glasses is 25,622. I want to know the effects of this data imbalance. Thanks!

About the beard model

I am very interested in your work; could you release your beard model?

About training a higher-resolution model

Hello, thank you very much for your research and for open-sourcing such an excellent project.
I would now like to train a high-resolution generative model, e.g. 512, on my own data. How should I adjust the channel dimensions and sizes in the config file? Below is the 256x256 generator setting:

[screenshot]

If I want to train a 512x512 generator, how should these parameters be adjusted?

Thanks, and looking forward to your reply.

Hello, some questions about the experiment section of the paper

Hello author:
Thank you very much for your patient replies to my earlier questions. I am now building on your framework and need to run some comparison experiments. I saw that your paper uses SDIT for the reference-guided experiment; I read that paper as well, and it uses a random vector z that is mapped inside the discriminator, with the discriminator then constraining the style.
I now also want to run a reference-guided experiment. Should I feed the original image x into the discriminator to obtain the style, and then feed the required content into the generator to generate the image? Please advise, thank you!

Some errors in the test code

There are some errors in test.py: os.makedirs(opts.output_dir, exist_ok=True) should be changed to os.makedirs(opts.output_path, exist_ok=True), and some local variables later on have the same mistake.

About blurry generated images

Hello author, following the parameter configuration given in celeba-hq.yaml, I used 27,000 CelebA images as the training set and 3,000 as the test set and trained the model locally, but the test images come out blurry. Is there something wrong with my settings? Do you have any suggestions?
[screenshot]
[screenshot]

About the design of non-translation

In your paper you designed non-translation, self-translation, and cycle-translation. The non-translation generator just encodes the input, e = encoder(input), and then decodes e, output = decoder(e). Certainly you have already achieved great results with HiSD designed like this. I am a beginner with GANs, and I am just wondering why you don't set the style code to 0 and use a translator in your non-translation, i.e. e = encoder(input), e_trg = translator(e, 0), output = decoder(e_trg); I have seen other works do this. Could it make any difference? (A toy sketch of the two designs follows.)
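For concreteness, a toy sketch of the two designs being compared (every module here is a made-up stand-in, not HiSD's actual architecture):

```python
import torch
import torch.nn as nn

encoder = nn.Conv2d(3, 16, 3, padding=1)
decoder = nn.Conv2d(16, 3, 3, padding=1)

class Translator(nn.Module):
    """Toy translator conditioned on a style code s."""
    def __init__(self):
        super().__init__()
        self.body = nn.Conv2d(16, 16, 3, padding=1)

    def forward(self, e, s):
        return self.body(e + s)       # toy style conditioning via broadcast add

translator = Translator()
x = torch.randn(1, 3, 64, 64)

# HiSD's non-translation: the translator is skipped entirely.
out_skip = decoder(encoder(x))

# The alternative asked about: run the translator with a zero style code.
zero_style = torch.zeros(1, 16, 1, 1)
out_zero = decoder(translator(encoder(x), zero_style))
print(out_skip.shape, out_zero.shape)
```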

A few questions about training tricks

Sorry to bother you. I am doing some work based on your model and would like to ask a few questions about training techniques:

  1. Regarding the number of training iterations: I look at the loss curves together with the sample images generated during training, and when the loss has stabilized and the generated images look OK, I consider training done. Do you have any additional comments?

  2. Regarding model collapse, can you give some examples in this task?

  3. Do you have any tips for setting the loss function coefficients?

Looking forward to your reply~

Virtual memory usage is too large

Hello, the model looks very good, but I ran into some problems when trying to train it myself.
My training environment is win10, torch1.8.0+cuda11, rtx3090, 32 GB memory.

With the default config, it reports CUDA out of memory, but this is a misleading error message, since there is still plenty of free CUDA memory. Tracking hardware resources during training, I found that the amount of committed memory started increasing before training proper began and eventually overflowed; in other words, a huge amount of virtual memory is requested before training starts. When I lower the parameters so that training runs normally, the actual memory usage is very small while the virtual memory usage remains large, though it no longer hits the limit that was exceeded before. I have never seen such a huge virtual memory overhead, so I wonder whether there is a memory leak during preloading or preprocessing, and whether the program could be better optimized.

Thank you!

Multi-tag task

tags:
  -
    name: Bangs
    tag_irrelevant_conditions_dim: 2
    attributes:
      -
        name: 'with'
        filename: datasets/Bangs_with.txt
      -
        name: 'without'
        filename: datasets/Bangs_without.txt
  -
    name: Eyeglasses
    tag_irrelevant_conditions_dim: 2
    attributes:
      -
        name: 'with'
        filename: datasets/Eyeglasses_with.txt
      -
        name: 'without'
        filename: datasets/Eyeglasses_without.txt
If I want to run experiments on the multi-tag task, how should I set this up in celeba-hq.yaml?

Changing the dataset

Hello author, I would like to train HiSD on a different dataset. For the CelebA dataset you preprocess the data through the label.txt file, which contains information on attributes such as bangs and glasses. Are there other datasets with ready-made annotations suitable for HiSD?

A little problem with the batch size

Hi, your work is so impressive that I want to reproduce your experiment, but I have a problem with the batch size config. I use a single GTX 1080Ti GPU as you did, but a batch size of 8 seems too big for it; I can only set the batch size to 5 or 4. Can you tell me a solution to this problem? Thanks a lot.

Can it run on an M1 chip?

Hello author, the environment given in the readme is CUDA-based. My computer has an Apple M1 chip, and PyTorch does not yet support a CUDA build for M1, only the default build. Is there any workaround?

How should the checkpoint file be handled?

Hello author, it is mentioned that the checkpoints.pt file needs to be placed in the root of the repo. What exactly does the root directory refer to? Do I need to type a command in the terminal, or manually move the file into the project folder? Thanks!
[screenshot]

Release checkpoint for the CelebA-HQ dataset

Thank you for your open-source code; it has been very helpful for my understanding of face attribute transfer. You have released the checkpoint for size 256; could you also release the checkpoint for size 1024 for learning and use? Thank you very much.

Training a 256x256 model

Hello, I see that the released code includes train.py, and the given training example uses the 128 config (i.e. celeba-hq.yaml). If I want to train with the 256 config, celeba-hq_256.yaml, what changes should I make in the code? I tried directly switching the yaml file used, but that did not work.

Training the model on the AFHQ dataset

Hello, I am training the model on the AFHQ dataset. When using custom.py in the preprocessors folder, I get the error 'Namespace' object has no attribute 'imgs', and I do not see an --imgs argument defined among the argument definitions. How should I fix this? Thank you very much.
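A guess at a minimal fix (my assumption: custom.py apparently reads args.imgs without ever declaring it): register the missing argument with argparse before parsing.

```python
import argparse

parser = argparse.ArgumentParser()
# Hypothetical missing declaration; the flag name mirrors the attribute
# the error message complains about.
parser.add_argument('--imgs', type=str, help='path to the image folder')
args = parser.parse_args(['--imgs', 'datasets/afhq'])  # example invocation
print(args.imgs)                                       # datasets/afhq
```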

How can I reproduce the quantitative experiment results in the paper?

First of all, congratulations on the results of the research,
and thank you for the concise and understandable code implementation.

But I still encountered some problems when trying to reproduce the quantitative experiment results in the paper. I did the following:
Realism:

  1. Get all test images with the attribute "without bangs" (take the first 3,000 images of CelebA-HQ as test images, and filter them according to the labels recorded in the CelebAMask-HQ-attribute-anno.txt file).
  2. Translate them to the attribute "with bangs" with my self-trained model (config file: celeba-hq.yaml), using both the latent-guided method (randomly generating 5 style codes) and the reference-guided method (randomly sampling 5 test images with the attribute "with bangs" as references).
  3. Calculate FID (using the code at https://github.com/mseitzer/pytorch-fid) against all test images with the attribute "with bangs", as the paper says.

Disentanglement:

  1. Get all test images with the attributes "young", "male", "without bangs".
  2. Get translated images with the latent-guided method (randomly generating 5 style codes) and the reference-guided method (randomly sampling 5 test images with the attributes "with bangs", "young", "male" as references).
  3. Calculate FID against all test images with the attributes "with bangs", "young", "male", as the paper says.

Then I got FID results:
L:25.05 R:25.21 G:0.16 in the "Realism" experiment
L:85.75 R:84.45 G:1.30 in the "Disentanglement" experiment
While In the paper:
L:21.37 R:21.49 G:0.12 in the "Realism" experiment
L:71.85 R:71.48 G:0.37 in the "Disentanglement" experiment

Although there are random factors in many places in the experiment, so some fluctuation in the FID results is normal, these results are too far off.

I think there must be something wrong with my data processing or my training method.
So, could you please explain in detail the data used in the quantitative experiments and the method of data processing? If possible, could you please release the model from the paper's experiment config?
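For reference, pytorch-fid can be invoked through its module entry point; the two folder names below are placeholders for wherever the translated outputs and the "with bangs" test images live.

```
python -m pytorch_fid outputs/with_bangs_translated celeba_hq_test/with_bangs
```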

A new kind of manipulation

Hello, I would like to ask: on the cat/dog dataset, could this model turn a cat into a dog by modifying other tags (for example attributes like the nose and eyes) instead of directly modifying the species tag?

name 'c_' is not defined

Traceback (most recent call last):
  File "core/test.py", line 78, in <module>
    c_trg = T(c_, s_, step['tag'])
NameError: name 'c_' is not defined

Quick Start test issue

I have completed the first few steps of the Quick Start (downloading the dataset and preprocessing the dataset). I have not done the train.py step because of a GPU problem, so I started trying test.py.
As you suggest, $your_input_path can be either an image file or a folder of images, so I tried:
python core/test.py --config configs/celeba-hq.yaml --checkpoint configs/checkpoint_256_celeba-hq.pt--input_path CelebAMask-HQ/CelebAMask-HQ/CelebA-HQ-img --output_path result
test.py: error: unrecognized arguments: CelebAMask-HQ/CelebAMask-HQ/CelebA-HQ-img
and:
python core/test.py --config configs/celeba-hq.yaml --checkpoint configs/checkpoint_256_celeba-hq.pt--input_path CelebAMask-HQ/CelebAMask-HQ/CelebA-HQ-img/0.jpg --output_path result
test.py: error: unrecognized arguments: CelebAMask-HQ/CelebAMask-HQ/CelebA-HQ-img/0.jpg
I don't know whether this is because I haven't yet modified the 'steps' dict in the first few lines of 'core/test.py'. If that is the reason, can you tell me how to modify the 'steps' dict? As a junior who has only studied deep learning for one or two months, it's really a bit difficult for me. Thanks a lot.
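One possible cause, judging purely from the commands as quoted (so treat this as a guess): the checkpoint filename and --input_path are fused with no space between them, so argparse folds "--input_path" into the checkpoint value and then sees the image path as an unrecognized positional argument. With the space restored, the first call would read:

```
python core/test.py --config configs/celeba-hq.yaml --checkpoint configs/checkpoint_256_celeba-hq.pt --input_path CelebAMask-HQ/CelebAMask-HQ/CelebA-HQ-img --output_path result
```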

About celeba-hq.yaml

Thanks for your great work. I have some questions about the training configuration. There are differences between celeba-hq.yaml and celeba-hq_256.yaml in the configs folder, such as the normalization in different channels. What is the reason for this? If I try to train the model at a size of 512 or 1024, how should I set things? I would appreciate receiving your reply.

How to set up custom datasets?

Hello author, how should custom datasets be set up? Do I need to search online for images related to the TAGs myself, or use the provided dataset? I could not find a dataset meeting the conditions in CelebA-HQ.
[screenshot]

Questions about the setting of the training phases

According to your paper and training code, three image generation phases (raw / self / random style code) are used. However, it is easy to add another similar phase, such as a random style image: specifically, randomly pick another image as the input of the style encoder and follow the same data flow as the random style code. Some other works (like https://github.com/saic-mdal/HiDT) have included such phases in their training and obtained satisfying results.

So I am just wondering: have you tried this phase before? What were the results like, and why didn't you add it to your paper? (A toy sketch of the proposed phase follows.)
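To pin down the proposed phase, a runnable toy sketch (every module here is a hypothetical stand-in, not HiSD's): the style comes from a second, independently sampled image rather than from a randomly drawn code.

```python
import torch
import torch.nn as nn

style_encoder = nn.Sequential(               # image -> style code
    nn.Conv2d(3, 8, 3, padding=1),
    nn.AdaptiveAvgPool2d(1),
)
encoder = nn.Conv2d(3, 8, 3, padding=1)
decoder = nn.Conv2d(8, 3, 3, padding=1)

x = torch.randn(4, 3, 32, 32)                # content batch
x_ref = torch.randn(4, 3, 32, 32)            # independently sampled style batch

s_ref = style_encoder(x_ref)                 # style taken from random images
x_trg = decoder(encoder(x) + s_ref)          # toy "translate with s_ref" step
print(x_trg.shape)                           # torch.Size([4, 3, 32, 32])
```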
