
mit-han-lab / anycost-gan

769 stars · 23 watchers · 95 forks · 17.12 MB

[CVPR 2021] Anycost GANs for Interactive Image Synthesis and Editing

Home Page: https://hanlab.mit.edu/projects/anycost-gan/

License: MIT License

Python 88.00% C++ 1.21% Cuda 9.75% Shell 1.04%
computer-vision deep-learning computer-graphics generative-adversarial-network gan image-generation image-manipulation image-editing gans pytorch

anycost-gan's People

Contributors

amrzv, junyanz, songhan, tonylins


anycost-gan's Issues

Using My Face

Hello.
Can I use my own face to add a smile? How do I do it?

Using this tool for another LSUN dataset + model

Hello! Thanks for creating this.

I am trying to use this tool with another model (the LSUN Churches dataset) with sliders that represent attributes.

As I understand it, these are the steps I need to take to configure this toolset to work with a different dataset + pre-trained network:

  • Make sure the LSUN Churches dataset is formatted as described for the other models

  • Change the config name as described here to config_name = 'stylegan2-church-config-f', referring to the pre-trained network found here

  • Run models.get_pretrained('attribute-predictor') as described in the pre-trained models section of the README

  • Change the relevant attribute labels in the files that turn up during this search

I am just wondering if there are any obvious steps I am missing to get this working; I am very new to the world of GANs and toolsets. Thank you for your time 😊
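In case it helps, here is a minimal sketch of what loading a different config might look like, using the models.get_pretrained helper that appears elsewhere on this page. Whether church weights and boundaries actually ship under this config name is an assumption, and attribute boundaries are dataset-specific, so the FFHQ sliders would not transfer to churches.

import models  # the repo's model zoo helper
import torch

# NOTE: 'stylegan2-church-config-f' is the config name mentioned in this issue;
# whether models.get_pretrained() ships weights under it is not confirmed here.
config = 'stylegan2-church-config-f'
device = 'cuda:0'

generator = models.get_pretrained('generator', config).to(device)
generator.eval()

# Sanity check: sample one image from a random z.
with torch.no_grad():
    z = torch.randn(1, 1, 512, device=device)
    img = generator(styles=z, randomize_noise=False)[0].clamp(-1, 1)

# Boundaries/attribute directions are dataset-specific; new ones would need to
# be extracted for LSUN Churches before sliders can work:
# boundaries = models.get_pretrained('boundary', config)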

low-resolution output images look unnatural

I find that low-resolution output images (below 128×128) look unnatural.
According to Figure 4b, the low-resolution outputs should also look natural, but what I get looks more like a vanilla StyleGAN2 output at that size.
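For reference, a minimal sketch (based on the target_res usage in the FaceEditor code further down this page) of how one might request the 128×128 sub-generator output; whether the result matches Figure 4b presumably depends on using the anycost-ffhq-config-f weights rather than vanilla StyleGAN2 ones.

import torch
import models

device = 'cuda:0'
generator = models.get_pretrained('generator', 'anycost-ffhq-config-f').to(device)
generator.eval()

# Ask the anycost generator for a reduced-resolution output via its
# target_res attribute (the same attribute the demo code below sets).
generator.target_res = 128

with torch.no_grad():
    style = torch.randn(1, 1, 512, device=device)
    img = generator(styles=style, randomize_noise=False)[0].clamp(-1, 1)

print(img.shape)  # expected to be [1, 3, 128, 128]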

Is AnycostGAN effective in shortening the experiment time?

Hi, @tonylins
I have a question.
As I understand it, this work was done with the goal of fast inference on various edge devices.

In addition, this technique seems to make it possible to run quick experiments at low resolution and then, once the results are confirmed, effectively apply knowledge distillation to the high-resolution setting, which otherwise requires a lot of training time.

This is the part I am interested in. I want to run various experiments (e.g., conditional GANs) at 64x64 or 128x128 using Anycost GAN and then apply them at high resolution after the experiments are complete. I am curious whether this will transfer well.

Obviously this can only be confirmed by experimenting, but if there are any additional papers or techniques that could be referenced for this research approach, I would appreciate a recommendation.

I am also curious about your opinion on whether experimenting at low resolution for speed and then applying the technique at high resolution is more effective than training at a single resolution.

Default parameters for project.py do not recreate projected latents in assets/demo/projected_latents

Hello, great work! I am wondering what options you use to calculate the projected latents in assets/demo/projected_latents?
I am trying to recreate them using the default parameters via: python3 tools/project.py 00_ryan.jpg
But the resulting vectors are numerically different and, when viewed in demo.py: (1) the projected image is good but clearly different than the projected image preloaded in the repo and (2) the editing directions don't seem to work very well for this set of latent codes.

Below I've included a screenshot of the behavior I am seeing. Note the differences in his neckline from the demo projection and the lack of any meaningful change in the output image.
[screenshot]
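As a side note, a quick way to quantify how far the re-projected latents are from the ones shipped in assets/demo/projected_latents; the second path is a placeholder for wherever your tools/project.py run saved its output.

import numpy as np

ref = np.load('assets/demo/projected_latents/00_ryan.npy')
new = np.load('projected/00_ryan.npy')  # placeholder: output of your project.py run

# Compare the shipped latents against the re-projected ones numerically.
print('shapes:', ref.shape, new.shape)
print('mean abs diff:', np.abs(ref - new).mean())
print('overall L2 distance:', np.linalg.norm(ref - new))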

How can I edit my photos using your already-trained model in demo.py?

Hello. Thank you for the work you have done.

I am far from a programmer, but I installed your project.
Can you explain in more detail how I can edit my own images using demo.py? I found the input_images and projected_latents paths.
The .npy files stay the same when I replace the images.

What steps should I take to get interactive image editing for my images?

Thanks in advance, I don’t know how to do it.
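A hedged outline of the workflow, pieced together from the APIs that appear elsewhere on this page; the latent path, slider strength, and the number of style vectors to change (12) are illustrative placeholders, not author-confirmed settings. The idea is to project the aligned photo with tools/project.py first, then shift the latent along an attribute boundary and regenerate.

import numpy as np
import torch
import models

device = 'cuda:0'
config = 'anycost-ffhq-config-f'

generator = models.get_pretrained('generator', config).to(device)
generator.eval()
boundaries = models.get_pretrained('boundary', config)

# 1) Project your aligned face photo first, e.g.:
#    python tools/project.py --config anycost-ffhq-config-f my_face.jpg
#    then load the resulting latent (path below is a placeholder).
latent = torch.from_numpy(np.load('my_face.npy')).view(1, -1, 512).to(device)

# 2) Add an editing direction (here: smiling) to the first few style vectors.
direction = boundaries['31_Smiling'].view(1, 1, -1).to(device)
edited = latent.clone()
edited[:, :12] = edited[:, :12] + 1.0 * direction  # 1.0 = slider strength

# 3) Regenerate the image from the edited style code.
with torch.no_grad():
    img = generator(styles=edited, noise=None, randomize_noise=False,
                    input_is_style=True)[0].clamp(-1, 1)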

How to compute MACs and FLOPs

Hi, thanks for your impressive work. I have some questions about MACs/FLOPs.

I compute MACs based on https://github.com/Lyken17/pytorch-OpCounter. However, the computational cost I get is a little different from your results. StyleGAN2 has some custom ops: 'PixelNorm', 'EqualConv2d', 'EqualLinear', 'ModulatedConv2d', 'StyledConv', 'ConvLayer', 'ResBlock', 'ConstantInput', 'ToRGB' (which involve FusedLeakyReLU, fused_leaky_relu, upfirdn2d, NoiseInjection, Blur).

I am not sure if you account for all of these ops.

In your fid.py, I only find:

if hvd.rank() == 0:
    try:
        from torchprofile import profile_macs
        macs = profile_macs(generator, torch.rand(1, 1, 512).to(device))

However, the torchprofile package does not handle the above operations.

In content GAN compression https://github.com/lychenyoko/content-aware-gan-compression/blob/master/Util/Calculators.py, it seems that it regards them as CONV and LINEAR layers. Other operations (e.g., FusedLeakyReLU, fused_leaky_relu, upfirdn2d, NoiseInjection, Blur) are ignored.

Could you please let me know how to compute the MACs so I can reproduce your numbers? Thanks!
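For what it's worth, a hedged sketch extending the fid.py call quoted above to different channel ratios (assuming 0.25 and 0.5 are among the ratios set_uniform_channel_ratio supports); as the question notes, torchprofile only counts the ops it hooks (mainly conv/linear layers), so the custom fused ops would still be excluded from these numbers.

import torch
import models
from models.dynamic_channel import set_uniform_channel_ratio
from torchprofile import profile_macs

device = 'cuda:0'
generator = models.get_pretrained('generator', 'anycost-ffhq-config-f').to(device)
generator.eval()

# Profile the generator MACs at a few channel ratios.
for ratio in (1.0, 0.5, 0.25):
    set_uniform_channel_ratio(generator, ratio)
    macs = profile_macs(generator, torch.rand(1, 1, 512).to(device))
    print(f'channel ratio {ratio}: {macs / 1e9:.2f} GMACs')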

Custom image editing

Question 1: How do I generate the latent code of my own image for custom image editing?
Question 2: When customizing image-editing attributes, can I use all 40 attributes for modification without retraining? I see that demo.py only uses eight attributes.
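A small sketch of how one might enumerate whichever attribute directions ship with the boundary file that demo.py loads (the 40-attribute key list appears in the code further down this page). Applying a key outside the demo's eight should not require retraining the generator, though edit quality per attribute is not guaranteed and it is an assumption that all 40 directions are present in the file.

import models

config = 'anycost-ffhq-config-f'
boundaries = models.get_pretrained('boundary', config)

# List every attribute direction contained in the boundary file.
for name, vec in boundaries.items():
    print(name, tuple(vec.shape))

# Any of these keys can be plugged into the same editing code demo.py uses,
# e.g. '35_Wearing_Hat' instead of '31_Smiling'.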

Generated images are all gray

import torch
import numpy as np
import os, random
from PIL import Image
from tqdm import tqdm
from models.dynamic_channel import set_uniform_channel_ratio, reset_generator
import models
import time
import cv2
config = 'anycost-ffhq-config-f'
device = 'cuda:2'

class Face_Editor():
    def __init__(self):
        self.init_model()

    def init_model(self):
        self.anycost_channel = 1.0
        self.anycost_resolution = 1024
        self.generator = models.get_pretrained('generator', config).to(device)
        self.generator.eval()

    def sample(self):
        torch.manual_seed(1601)
        # latent = torch.randn(1, 1, 512, device=device)
        # mean_style = self.generator.mean_style(10000)
        # self.input_kwargs = {'styles':latent, 'return_rgbs':True, 'truncation':0.5,
        #                      'truncation_style':mean_style, 'randomize_noise':False}
        # style = torch.randn(1, 18, 512, device=device)
        style = np.load('/simple/zlp1/masters/anycost-gan/assets/demo_ori/projected_latents/00_ryan.npy')
        style = torch.from_numpy(style).view(1, -1, 512).to(device)
        self.input_kwargs = {'styles': style,
                            'noise': None, 'randomize_noise': False, 'input_is_style': True}
        image = self.generate_image()
        image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
        cv2.imshow('image', image)
        cv2.waitKey(0)

    def generate_image(self):
        def image_to_np(x):
            assert x.shape[0] == 1
            x = x.squeeze(0).permute(1, 2, 0)
            x = (x + 1) * 0.5  # 0-1
            x = (x * 255).cpu().numpy().astype('uint8')
            return x

        with torch.no_grad():
            print(self.input_kwargs)
            out = self.generator(**self.input_kwargs)[0].clamp(-1, 1)
            out = image_to_np(out)
            return out


if __name__ == '__main__':
    FE = Face_Editor()
    FE.sample()

checkpoint for multi-resolution training

Dear anycost-gan team,

Thank you for sharing this great work, I really like it.

Would you mind sharing the intermediate checkpoint from the multi-resolution step? To train Anycost GAN, we need to complete 3 steps:

  1. Training the original StyleGAN2 on FFHQ
  2. Training Anycost GAN: multi-resolution
  3. Training Anycost GAN: adaptive-channel

You provide the checkpoints for the 1st and 3rd steps. Would you mind also sharing the checkpoint for the second step? I understand that I can train it myself, but 8 GPUs for 5 days is really too heavy a resource requirement for us.

Thank you for your help.

Best Wishes,

Alex

I want to embed a 256x256 image and test generating a 256x256 image

Hi, @tonylins
Thank you for your good paper.

In this GitHub repo, only 256x256 images can be encoded. However, it seems that only the 1024x1024 and 512x512 decoders are uploaded.

What I want to test is encoding and decoding a 256x256 image and checking whether the result matches the original image.

Can you send me the 256x256 anycost-ffhq decoder weights?

Need help editing my own uploaded images

It seems like this repo generates faces from random latents and edits those random faces. I want to upload my own images and edit them. How do I do that?

Thanks in advance.

Share the pretrained anycost Discriminator

I'm using your generator

G = models.get_pretrained("generator", 'anycost-ffhq-config-f')

Can you also provide the pretrained discriminator that was used, with the same structure? I see the class Discriminator in anycost_gan.py, but doing the same for D raises a NotImplementedError.
Could you publish it?

wrong image generated when using config stylegan2-ffhq-config-f

import torch
import numpy as np
import os
from PIL import Image
from models.dynamic_channel import set_uniform_channel_ratio, reset_generator
import models


class FaceEditor:
    def __init__(self, config, device, anycost_resolution=1024, n_style_to_change=12):
        # load assets
        self.device = device
        self.anycost_channel = 1.0
        self.anycost_resolution = anycost_resolution
        self.n_style_to_change = n_style_to_change

        # build the generator
        self.generator = models.get_pretrained('generator', config).to(device)
        self.generator.eval()
        set_uniform_channel_ratio(self.generator, 0.5)  # set channel
        self.generator.target_res = anycost_resolution  # set resolution
        # self.generator.target_res = self.anycost_resolution
        self.mean_latent = self.generator.mean_style(10000)

        # select only a subset of the directions to use
        '''
        possible keys:
        ['00_5_o_Clock_Shadow', '01_Arched_Eyebrows', '02_Attractive', '03_Bags_Under_Eyes', '04_Bald', '05_Bangs',
            '06_Big_Lips', '07_Big_Nose', '08_Black_Hair', '09_Blond_Hair', '10_Blurry', '11_Brown_Hair', '12_Bushy_Eyebrows',
            '13_Chubby', '14_Double_Chin', '15_Eyeglasses', '16_Goatee', '17_Gray_Hair', '18_Heavy_Makeup', '19_High_Cheekbones',
            '20_Male', '21_Mouth_Slightly_Open', '22_Mustache', '23_Narrow_Eyes', '24_No_Beard', '25_Oval_Face', '26_Pale_Skin',
            '27_Pointy_Nose', '28_Receding_Hairline', '29_Rosy_Cheeks', '30_Sideburns', '31_Smiling', '32_Straight_Hair',
            '33_Wavy_Hair', '34_Wearing_Earrings', '35_Wearing_Hat', '36_Wearing_Lipstick', '37_Wearing_Necklace',
            '38_Wearing_Necktie', '39_Young']
        '''

        direction_map = {
            'smiling': '31_Smiling',
            'young': '39_Young',
            'wavy hair': '33_Wavy_Hair',
            'gray hair': '17_Gray_Hair',
            'blonde hair': '09_Blond_Hair',
            'eyeglass': '15_Eyeglasses',
            'mustache': '22_Mustache',
        }

        boundaries = models.get_pretrained('boundary', config)
        self.direction_dict = dict()
        for k, v in boundaries.items():
            self.direction_dict[k] = v.view(1, 1, -1)

    def get_latent_code(self, latent_code_path):
        latent_code = torch.from_numpy(np.load(os.path.join(latent_code_path))).view(1, -1, 512)
        return latent_code

    def get_direction_dict(self, attr_weights):
        final_dict = {}
        for key, value in attr_weights.items():
            if value == 0:
                continue
            final_dict[key] = value * self.direction_dict[key]
        return final_dict

    def get_boundary_dict(self):
        return self.direction_dict

    def generate_image(self, save_path, input_kwargs):
        def image_to_np(x):
            assert x.shape[0] == 1
            x = x.squeeze(0).permute(1, 2, 0)
            x = (x + 1) * 0.5  # 0-1
            x = (x * 255).cpu().numpy().astype('uint8')
            return x

        with torch.no_grad():
            out = self.generator(**input_kwargs)[0].clamp(-1, 1)
            out = image_to_np(out)
            out = np.ascontiguousarray(out)
            img_pil = Image.fromarray(out)
            img_pil.save(save_path)

    def edit(self, latent_code_path, attr_sliders, force_full_g=False):
        latent_code = torch.from_numpy(np.load(os.path.join(latent_code_path))).view(1, -1, 512).to(self.device)
        # input kwargs for the generator

        edited_code = latent_code.clone()
        for direction_name in attr_sliders.keys():
            edited_code[:, :self.n_style_to_change] = edited_code[:, :self.n_style_to_change] \
                                                 + attr_sliders[direction_name] * self.direction_dict[
                                                     direction_name].to(self.device)

        edited_code = edited_code.to(self.device)
        if not force_full_g:
            set_uniform_channel_ratio(self.generator, self.anycost_channel)
            self.generator.target_res = self.anycost_resolution
        return latent_code, edited_code

if __name__ == '__main__':
    gan_config = 'stylegan2-ffhq-config-f'
    fe = FaceEditor(config=gan_config, device='cuda:0')

    # placeholder paths: point these at a projected latent (.npy) and an output file
    latent_code_path = 'assets/demo/projected_latents/00_ryan.npy'
    ori_save_path = 'ori.png'

    ori = fe.get_latent_code(latent_code_path).to(fe.device)
    ori_kwargs = {'styles': ori, 'noise': None, 'randomize_noise': False, 'input_is_style': True}

    fe.generate_image(save_path=ori_save_path, input_kwargs=ori_kwargs)

The image generated with config anycost-ffhq-config-f looks fine, but the image generated with config stylegan2-ffhq-config-f is wrong. How can I fix this bug? Thank you!
[screenshot]
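A hedged guess, not an author-confirmed fix: the FaceEditor above unconditionally calls set_uniform_channel_ratio(self.generator, 0.5), which configures the anycost sub-generator; for the plain stylegan2-ffhq-config-f checkpoint one might try undoing that before sampling. This sketch reuses the FaceEditor class from the snippet above and assumes reset_generator takes the generator as its only argument, as its usage alongside set_uniform_channel_ratio suggests.

from models.dynamic_channel import reset_generator

fe = FaceEditor(config='stylegan2-ffhq-config-f', device='cuda:0')

# Hedged guess: undo the 0.5 channel-ratio sub-generator configuration before
# sampling from the full StyleGAN2 checkpoint.
reset_generator(fe.generator)

ori = fe.get_latent_code('assets/demo/projected_latents/00_ryan.npy').to(fe.device)
ori_kwargs = {'styles': ori, 'noise': None, 'randomize_noise': False, 'input_is_style': True}
fe.generate_image(save_path='ori_full_g.png', input_kwargs=ori_kwargs)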

About Adaptive-channel training and Generator-conditioned discriminator.

Dear Mit-han-lab:

Thank you very much for sharing your excellent work.

I have two questions regarding the code: 1) Where can I find the source code for the adaptive-channel training step? Is dynamic_channel.py required for this process, or is it included in the unpublished train.py? 2) I cannot find the G_arch operations in the Generator and DiscriminatorMultiRes classes.

Would you kindly help answer these questions in your spare time?

Best Wishes,

GreenLimeSia

About training the encoder

Hello, and thanks for sharing this work.
There are a few things I do not quite understand and hope you can clarify.

What is the training pipeline for the encoder, generator, and discriminator?
My guess is that the discriminator and generator are trained first, and the finished generator is then used to train the encoder. With this pipeline, the encoder cannot influence the generator.
So would it be possible to train the three models together, letting them influence each other to reach a better optimum?

in extract_edit_directions.py

Hi, I am quite confused; the attribute predictor is not well explained. How do I get an attribute-predictor .pt file for a different dataset?

FID for FFHQ 1024

Dear mit-han-lab,

Thank you for sharing with us this great work, I really like it.

In Table 1, you show that multi-resolution outputs have higher image quality than single-resolution training under config E. Have you tried config F, which is the standard StyleGAN2 model?

According to the FFHQ 1024 leaderboard, StyleGAN2 has an FID of 2.84 while Anycost GAN has an FID of 2.99, which is slightly worse. So I am wondering: if you use config F like standard StyleGAN2, would you get better results than standard StyleGAN2?

Thank you for your help.

Best Wishes,

Alex

Color difference in generated image for stylegan2-ffhq-config-f model

Hi, thanks so much for this awesome library. Congrats on the great work!

I have a question regarding the color of images generated with stylegan2-ffhq-config-f.

Why are they yellowish with less contrast? I am thinking it might be a difference in how the training images for the different generators are normalized; is there a way to reverse the normalization after images are generated with stylegan2-ffhq-config-f?

I tried to play around with the parameters in the image_to_np function, but it did not work.

Please see attached for a sample.

[sample image]

Please let me know what I did wrong here, or whether this is something that could be improved.

Thank you!
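For debugging, here is a small self-contained sketch that checks whether the raw generator output actually spans [-1, 1] per channel before the (x + 1) / 2 rescaling used in image_to_np. An uneven per-channel mean (e.g. a low blue mean) would be consistent with a yellow cast, though this snippet does not by itself explain or fix it.

import torch
import models

device = 'cuda:0'
generator = models.get_pretrained('generator', 'stylegan2-ffhq-config-f').to(device)
generator.eval()

with torch.no_grad():
    style = torch.randn(1, 1, 512, device=device)
    out = generator(styles=style, randomize_noise=False)[0]

# The usual post-processing assumes the output spans [-1, 1] per channel.
print('min/max:', out.min().item(), out.max().item())
print('per-channel mean over RGB:', out.mean(dim=(0, 2, 3)))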

Recommendations on speeding up tools/project.py

With no optional arguments, the projection takes 19 seconds even with a GPU. Besides tweaking --n_iter, is there any other argument I should tweak to speed up the projection process?

usage: project.py [-h] [--config CONFIG] [--encoder] [--optimizer OPTIMIZER]
                  [--n_iter N_ITER] [--optimize_sub_g]
                  [--mse_weight MSE_WEIGHT] [--enc_reg_weight ENC_REG_WEIGHT]
                  FILES [FILES ...]

Image projector to the generator latent spaces

positional arguments:
  FILES                 path to image files to be projected

optional arguments:
  -h, --help            show this help message and exit
  --config CONFIG       models config
  --encoder             use encoder prediction as init
  --optimizer OPTIMIZER
                        optimizer used
  --n_iter N_ITER       optimize iterations
  --optimize_sub_g      also optimize the sub-generators
  --mse_weight MSE_WEIGHT
                        weight of MSE loss
  --enc_reg_weight ENC_REG_WEIGHT
                        weight of encoder regularization loss
