wty-ustc / hairclip Goto Github PK

View Code? Open in Web Editor NEW

507.0 19.0 66.0 40.34 MB

[CVPR 2022] HairCLIP: Design Your Hair by Text and Reference Image

License: GNU Lesser General Public License v2.1

Python 100.00%

hairclip's People

Stargazers

Watchers

Forkers

chaoso johndpope gehadaboharga gongchensz haile-vnu janchen0611 iwldzt3011 cnnandbn bruinxiong lincong666 ahuirecome markzxzx liushanyuan18 cv-ip repo-collection peterzhousz zzitaileo timewarlock linghaochan johnnyracer panghongwei17 yanglei50 peternara summerwbb chenhuayou peterzs jianzfb swordbearfire jackzhousz leixiaoning u6544100 konatasick lallorona sw-maestro-hyper-gan five-hundred-years-ago tornado0105 heyzbw tommy-qichang gf9276 beardstyleai veederlord iwillcodeu xifeng205 git8949 arjunjamwal xiaoqingwang cybersys deviljoker5200 lqr131115 harrison001 bingwen-hu diningsystem zardyuan molecular-medicine olgaiv39 matteobelvederesi-newtechnology kippapollo zlin0530 anton-karlovskiy sutv davidmcdonald1993 jcroot tajulsdream ronyland superstar0123456789

hairclip's Issues

Hairstyles can only show so much

First of all , thanks for you excellent work!
There are many hairstyles in hairstyle.txt, but actually I found only a few styles in result images after trying all styles. More or less repeat the following images.

cornrows cut hairstyle
crew cut hairstyle

(the points on left glasses in right image is mouse)

the following is my command:

python scripts/inference.py 
--exp_dir=../result/test_1/
--checkpoint_path=../pretrained_models/hairclip.pt
--latents_test_path=../inference_data/test_1/latent.pt
--editing_type=hairstyle
--input_type=text
--hairstyle_description="hairstyle_list.txt"

What's the problem? Should I train with my own dataset?

I list some hairstyles which have the same effect:

1. the same as cornrows: crown braid hairstyle, dreadlocks hairstyle, finger waves hairstyle, french braid hairstyle and so on.
1. the same as crew cut hairstyle: caesar cut hairstyle, dido flip hairstyle, extensions hairstyle, fade hairstyle, fauxhawk hairstyle, frosted tops hairstyle ,full crown hairstyle, harvard clip hairstyle, high and tigh hairstyle, hime cut hairstyle, hi-top fade hairstyle and so son.

How to get the reference dataset?

Could you please tell me where I can get the celeba_hq_train and celeba_hq_val?

Demo Play ?

Hi. 🤗
This is an awesome work. 👍
Thanks for all of you, the contributors. 🌹
I am wondering if you could tell me if you have any plan to make one demo public on huggingface/spaces, etc. 🤔 ？

About training details

Hi,
I am trying to re-implement your paper but can not get good results on both image and text path.
So I would like to verify some implementation details:

Below is my implementation of Modulation Module inside Mapper(in pytorch):

import torch
import torch.nn as nn
import torch.nn.functional as F
from models.stylegan2.model import EqualLinear

class MapperBlock(nn.Module):
    def __init__(self, channels=512):
        super(MapperBlock, self).__init__()
        self.fc = EqualLinear(channels,channels)
        self.f_gamma = nn.Sequential(
            EqualLinear(channels,channels), nn.LayerNorm(channels), nn.LeakyReLU(0.2),
            EqualLinear(channels,channels)
        )
        self.f_beta = nn.Sequential(
            EqualLinear(channels,channels), nn.LayerNorm(channels), nn.LeakyReLU(0.2),
            EqualLinear(channels,channels)
        )
        self.act = nn.LeakyReLU(0.2)
    
    def modulation(self, x, e):
        gamma = self.f_gamma(e)
        beta = self.f_beta(e)

        # norm x
        x = F.layer_norm(x, (x.shape[-1],))
        
        # modulation
        return (1.0 + gamma) * x + beta

    def forward(self, x, e):
        x = self.fc(x)
        x = self.modulation(x, e)
        return self.act(x)

Is it correct?

According to your paper, the reference style/text is randomly set to image or text. My understanding is the image/text manipulation loss is only calculated when image/text reference is used, but the total loss value range is vary in different condition. Does the loss weights always keep the same in all condition or need to adjust for different condition?
In your paper: "we also generated several edited images using our text-guided hair editing method to augment the diversity of the
reference image set." Could you elaborate more details about your method? Or any other reference paper?

Thanks for your help.

Except hair coloe change only, but hair style of some results are change

I want to change hair color on FFHQ data, however hairstyle of some of results are change.
Did I do wrong?
The following is my command

python scripts/inference.py
--exp_dir=./experiment
--checkpoint_path=../pretrained_models/hairclip.pt
--latents_test_path=./latents.pt
--editing_type=color
--input_type=text
--color_description=red

Ask about train hairstyles

Hi，

When I train my own hairstyle model, do I need to convert the images under the --hairstyle_ref_img_train_path=/path/to/celeba_hq_train \ parameter into latents through the e4e algorithm. So instead of --latents_train_path=/path/to/train_faces.pt \

How many images were used to calculate the three metrics IDS, PSNR, and SSIM derived from Table 1？

Hello, your work is very good. I have a question for you about your paper, in the comparison with the latest method, how many images were used to calculate the three metrics IDS, PSNR, and SSIM derived from Table 1?

What is the truncation for in function configure_datasets of coach.py?

About the training details.

Thank you for your great project!

In this paper, you said “We train and evaluate our hair mapper on the CelebA-HQ dataset. Since we use e4e [43] as our inversion encoder, we follow its division of the training set and test set.” However, I found that e4e used the FFHQ dataset for training and the CelebA-HQ test dataset for evaluation. Hence, I feel confused.
My question is that how to split the training and test datasets on the CelebA-HQ dataset?

local variable 'shape' referenced before assignment

I test the feature on replicate but notice that some photo can result local variable 'shape' referenced before assignment. Is there any way we can fix this?

File "predict.py", line 168, in run_alignment
aligned_image = align_face(filepath=image_path, predictor=predictor)
File "/src/encoder4editing/utils/alignment.py", line 35, in align_face
lm = get_landmark(filepath, predictor)
File "/src/encoder4editing/utils/alignment.py", line 21, in get_landmark
t = list(shape.parts())
UnboundLocalError: local variable 'shape' referenced before assignment

How to slove the problem "train.py: error: the following arguments are required: --hairstyle_description, --color_description " when i run code in Colab?

Can I use my own image test?

Hello, can I use my own image for the resend test, I found that the input was test_face.pt (test data set ?) file, and I did not find the input image content in the code, The only thing that feels like an input image is w(w=torch.Size([1, 18, 512])), But it's not the size of a picture

About Video Hair Editing

Thanks you for you great works! Do you think video hair editing based on HairCLIP is achievable？ I have a little try, but the region of hairstyle still hard to control. Consistency in hair styles is quite difficult to maintain. Can you give me some insights about video-hairstyle-editing?

about pretrained unet infer

mask_512 = (torch.unsqueeze(torch.max(labels_predict, 1)[1], 1)==13).float()
1.why hair equal 13, bg not equal 13?
2.unet infer results that have 19 channels, what did they means?

How to preserve facial details better like "Babershop"?

Hi, thanks for your work.
I have found that the hairclip algorithm is not very good at preserving facial details, such as work like "Barbershop: Hair Transfer with GAN-Based Image Compositing Using Segmentation Masks". I tried "HFGI: High-Fidelity GAN Inversion for Image Attribute Editing (CVPR 2022)" as a latent encoder, but the effect is not very good.

Can you give some advice or methods on how to preserve the facial details? Very much looking forward to your answer, thank you!

hairclip demo
left image: input image, right image: result
hairclip demo use HFGI latents
left image: input image, right image: result
Babershop demo

add web demo/models to Huggingface

Hi, would you be interested in adding HairCLIP to Hugging Face? The Hub offers free hosting, and it would make your work more accessible and visible to the rest of the ML community. Models/datasets/spaces(web demos) can be added to a user account or organization similar to github.

Example from other organizations:
Keras: https://huggingface.co/keras-io
Microsoft: https://huggingface.co/microsoft
Facebook: https://huggingface.co/facebook

Example spaces with repos:
github: https://github.com/salesforce/BLIP
Spaces: https://huggingface.co/spaces/salesforce/BLIP

github: https://github.com/facebookresearch/omnivore
Spaces: https://huggingface.co/spaces/akhaliq/omnivore

and here are guides for adding spaces/models/datasets to your org

How to add a Space: https://huggingface.co/blog/gradio-spaces
how to add models: https://huggingface.co/docs/hub/adding-a-model
uploading a dataset: https://huggingface.co/docs/datasets/upload_dataset.html

Please let us know if you would be interested and if you have any questions, we can also help with the technical implementation.

Getting error on inference when using reference image hairstyle to paste on input image.

I am getting error when I try to take inference. I am using this command
python scripts/inference.py
--exp_dir=/content/resultss
--editing_type=both
--input_type=image_image
--hairstyle_ref_img_test_path=/content/oriental1.png
--color_ref_img_test_path=/content/oriental1.png
--num_of_ref_img 1
--checkpoint_path=/content/drive/MyDrive/data/hairclip.pt
--latents_test_path=/content/drive/MyDrive/data/latents.pt
What I am trying to is to take transfer hairstyle of refrence Image to input Image. I have converted input image to e4e to get latent code. Please do let me know. thnx.

code

Can you provide a script file to input a single picture for final prediction?

用两张图片测试的时候报错

输入命令：
E:\Linux\XSpace\papers\HairCLIP\mapper>python scripts/inference.py --exp_dir=E:\Linux\XSpace\pap
ers\HairCLIP\data\exp --checkpoint_path=F:\Dataset\CelebA\Data\hairclip.pt --latents_test_path=F:\Dataset\CelebA\Data\test_faces.pt --editin
g_type=color --input_type=image --hairstyle_description="hairstyle_list.txt" --color_ref_img_test_path=E:\Linux\XSpace\papers\HairCLIP\data
ref

在 latent_mappers.py 中的 x = clip_model.encode_image(masked_generated_renormed) 报错了，错误信息如下：

*** RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
File "code/torch/multimodal/model/multimodal_transformer/___torch_mangle_9591.py", line 19, in encode_image
_0 = self.visual
input = torch.to(image, torch.device("cuda:0"), 5, False, False, None)
return (_0).forward(input, )
~~~~~~~~~~~ <--- HERE
def encode_text(self: torch.multimodal.model.multimodal_transformer.___torch_mangle_9591.Multimodal,
input: Tensor) -> Tensor:
File "code/torch/multimodal/model/multimodal_transformer.py", line 34, in forward
x2 = torch.add(x1, torch.to(_4, 5, False, False, None), alpha=1)
x3 = torch.permute((_3).forward(x2, ), [1, 0, 2])
x4 = torch.permute((_2).forward(x3, ), [1, 0, 2])
~~~~~~~~~~~ <--- HERE
_15 = torch.slice(x4, 0, 0, 9223372036854775807, 1)
x5 = torch.slice(torch.select(_15, 1, 0), 1, 0, 9223372036854775807, 1)
File "code/torch/multimodal/model/multimodal_transformer/___torch_mangle_9477.py", line 8, in forward
def forward(self: torch.multimodal.model.multimodal_transformer.___torch_mangle_9477.Transformer,
x: Tensor) -> Tensor:
return (self.resblocks).forward(x, )
~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
def forward1(self: torch.multimodal.model.multimodal_transformer.___torch_mangle_9477.Transformer,
x: Tensor) -> Tensor:
File "code/torch/torch/nn/modules/container/___torch_mangle_9476.py", line 29, in forward
_8 = getattr(self, "3")
_9 = getattr(self, "2")
_10 = (getattr(self, "1")).forward((getattr(self, "0")).forward(x, ), )
~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
_11 = (_7).forward((_8).forward((_9).forward(_10, ), ), )
_12 = (_4).forward((_5).forward((_6).forward(_11, ), ), )
File "code/torch/multimodal/model/multimodal_transformer/___torch_mangle_9376.py", line 13, in forward
_0 = self.mlp
_1 = self.ln_2
_2 = (self.attn).forward((self.ln_1).forward(x, ), )
~~~~~~~~~~~~~~~~~~ <--- HERE
x0 = torch.add(x, _2, alpha=1)
x1 = torch.add(x0, (_0).forward((_1).forward(x0, ), ), alpha=1)
File "code/torch/torch/nn/modules/activation/___torch_mangle_9369.py", line 38, in forward
_16 = [-1, int(torch.mul(bsz, CONSTANTS.c0)), _8]
v0 = torch.transpose(torch.view(_15, _16), 0, 1)
attn_output_weights = torch.bmm(q2, torch.transpose(k0, 1, 2))
~~~~~~~~~ <--- HERE
input = torch.softmax(attn_output_weights, -1, None)
attn_output_weights0 = torch.dropout(input, 0., True)

Traceback of TorchScript, original code (most recent call last):
/opt/conda/lib/python3.7/site-packages/torch/nn/functional.py(4294): multi_head_attention_forward
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/activation.py(985): forward
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py(709): _slow_forward
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py(725): _call_impl
/root/workspace/multimodal-pytorch/multimodal/model/multimodal_transformer.py(45): attention
/root/workspace/multimodal-pytorch/multimodal/model/multimodal_transformer.py(48): forward
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py(709): _slow_forward
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py(725): _call_impl
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/container.py(117): forward
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py(709): _slow_forward
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py(725): _call_impl
/root/workspace/multimodal-pytorch/multimodal/model/multimodal_transformer.py(63): forward
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py(709): _slow_forward
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py(725): _call_impl
/root/workspace/multimodal-pytorch/multimodal/model/multimodal_transformer.py(93): forward
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py(709): _slow_forward
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py(725): _call_impl
/root/workspace/multimodal-pytorch/multimodal/model/multimodal_transformer.py(221): visual_forward
/opt/conda/lib/python3.7/site-packages/torch/jit/_trace.py(940): trace_module
(36): export_torchscript_models
(3):
/opt/conda/lib/python3.7/site-packages/IPython/core/interactiveshell.py(3418): run_code
/opt/conda/lib/python3.7/site-packages/IPython/core/interactiveshell.py(3338): run_ast_nodes
/opt/conda/lib/python3.7/site-packages/IPython/core/interactiveshell.py(3147): run_cell_async
/opt/conda/lib/python3.7/site-packages/IPython/core/async_helpers.py(68): _pseudo_sync_runner
/opt/conda/lib/python3.7/site-packages/IPython/core/interactiveshell.py(2923): _run_cell
/opt/conda/lib/python3.7/site-packages/IPython/core/interactiveshell.py(2878): run_cell
/opt/conda/lib/python3.7/site-packages/IPython/terminal/interactiveshell.py(555): interact
/opt/conda/lib/python3.7/site-packages/IPython/terminal/interactiveshell.py(564): mainloop
/opt/conda/lib/python3.7/site-packages/IPython/terminal/ipapp.py(356): start
/opt/conda/lib/python3.7/site-packages/traitlets/config/application.py(845): launch_instance
/opt/conda/lib/python3.7/site-packages/IPython/init.py(126): start_ipython
/opt/conda/bin/ipython(8):
RuntimeError: cublas runtime error : unknown error at C:/cb/pytorch_1000000000000/work/aten/src/THC/THCBlas.cu:225
(Pdb) img_tensor.shape
torch.Size([1, 3, 1024, 1024])

请问是输入的tensor大小不对吗

F and C

Hello, boss. I noticed that the neural network structure diagram may be incorrectly drawn in the paper. F should be fine, meaning high-level semantic information; C should be coarse, meaning low-level semantic information.

Hosting HairCLIP model

Hi!

First off, thank you for your work!

I'm trying to create a Colab Notebook to play with your model, but since the weights and stuff are hosted inside google drive, the download limits seems to restrict me from simply downloading it with gdown or wget.

Could I download it and move it to another hosting service (i.e archive.org) to avoid this issue? Of course, I would add all the references to all the authors and parties involved.

Again, thanks for your work!

Is is normal speed?

Hello, I want to ask if the speed of run the inferrence.py for testing is normal. This is my executive code: cd mapper
python scripts/inference.py
--exp_dir=/home/ps/HairCLIP/mapper/path/to/experiment
--checkpoint_path=/home/ps/HairCLIP/pretrained_models/hairclip.pt
--latents_test_path=/home/ps/HairCLIP/mapper/path/to/test_faces.pt
--editing_type=hairstyle
--input_type=text
--hairstyle_description="/home/ps/HairCLIP/mapper/hairstyle_list.txt" \

How i can start this project??

sry , I'm begginer of ML , programming..
so I can't start this project with read me ...
plz help me

about color_ref_img_in_domain_path

hello thanks for your talented work. I have a question about color_ref_img_in_domain_path. When I finished pre-train with the argument hairstyle_manipulation_prob=0 --color_manipulation_prob=1 --both_manipulation_prob=0 --hairstyle_text_manipulation_prob=0.5 --color_text_manipulation_prob=1 --. How should I set the color_ref_img_in_domain_path. Is that path should be logs/image_train, but I got this error, I don't know where to find these documents. Looking forward to your reply

the error is
FileNotFoundError: [Errno 2] No such file or directory: '/home/code/HairCLIP/logs/images_train/red hair/02951.jpg'

Name conflicts between soruce code and source code of encoder4editing

The Source code dont include encoder4editing and I have to copy from another repo to encoder4editing directory.
But After Added. many module name conflics such as models criterria.
Is there a clear description on how to add encoder4editing?

question of split database(train.pt and test.pt)

@wty-ustc Thank you for the amazing work!
I try to split the CelebA-HQ by official list_eval_partition.txt. Eventually, I got 24183/2993/2824 images for training/validation/testing split. but i found the len of train.pt is 24176 ...so... I'm very confused about what data you're used?

How to change the hairstyle of a given image with a image containing the target hairstyle?

The generated image is quite different from the reference image

I tested the effect and found that the hair style of the generated image is quite different from that of the reference image. Here is my test script. The reference image is selected from CelebAMask-HQ dataset. Is there a problem in my test process？

python scripts/inference.py \ --exp_dir=../outputs/0321/ \ --checkpoint_path=../pretrained_models/hairclip.pt \ --latents_test_path=../pretrained_models/test_faces.pt \ --editing_type=both \ --input_type=image_image \ --color_ref_img_test_path=../input/16 \ --hairstyle_ref_img_test_path=../input/16 --num_of_ref_img 1

What this mean in

scripts files are missing

Hi, I found there are no scripts files in this repo.

训练数据集能否分享一下呢？

我这边生成的celeba-hq数据大部分都是带有很多噪声，尝试过重新编译链接libjpeg-8d，但是没有起作用

what is ACD?

In your paper you mention using ACD as a measure of color differences. Where does this indicator come from? Is there any code we can use?

What is the code license?

Error while training the model on my dataset.

This is the command I am using.
%%shell
eval "$(conda shell.bash hook)"
conda activate myenviroment
python scripts/train.py
--exp_dir=/content/outss
--hairstyle_description="hairstyle_list.txt"
--color_description=black,brown,yellow
--checkpoint_path=/content/drive/MyDrive/data/hairclip.pt
--ir_se50_weights=/content/drive/MyDrive/data/model_ir_se50.pth
--latents_train_path=/content/drive/MyDrive/data/trainlatent/latents.pt
--latents_test_path=/content/drive/MyDrive/data/testlatent/latents1.pt
--hairstyle_ref_img_train_path=/content/inversions
--hairstyle_ref_img_test_path=/content/test/inversions
--color_ref_img_train_path=/content/inversions
--color_ref_img_test_path=/content/test/inversions
--color_ref_img_in_domain_path=/content/inversions
--hairstyle_manipulation_prob=0.5
--color_manipulation_prob=0.2
--both_manipulation_prob=0.27
--hairstyle_text_manipulation_prob=0.5
--color_text_manipulation_prob=0
--color_in_domain_ref_manipulation_prob=0.25 \

When to release testing code and pretrained model

Excellent work! When to release testing code and pretrained model, can't wait to try.

Using images to edit hairstyle and color does not work

Based on the pre-trained model you provided, edit the hair style with text and edit the hair color with image, but the hair color editing did not work. Do I have to retrain the new model myself? And How to obtain the model specified by the test parameter "--parsenet_weights"?

What is the data set of test input

Hello, the author, I want to ask whether the input test set is the original image or "e4e"

How do you introduce encoder4editing in predict.py without conflicting with the environment？

I notice that image manipulation loss is defined as "1-cos（，） " . But I feel confused that whether i can change that to "-cos"

how to get test_faces.pt and latent.pt??

Why does training with default parameters report an error, celeba_hq for the data set

Hi，
When I am training the model，
Why does training with default parameters report an error, celeba_hq for the data set.

how much size of input image?

great work ! cannt wait for your code to try!

😄

How to run predict.py

HI ，Thank you for your work

In line 11, （from cog import BasePredictor, Path, Input）this sentence means ？
I have a red wavy line here

Inference on image data without converting to e4e

Thanks for making such a good model. Actually, I wanted to inference the model on my image data without converting it e4e. Could u please help with this? Thanks.

Will stylegan inversion encoder be trained?

Will stylegan inversion encoder be trained? I found that CLIP image encoder and CLIP Text Encoder used detach() to make it untrained. I look forward to your answer. Thank you!

About modulation module

Hi,
Great work!
But I have a question about the modulation module of mapper network.
I assume the dimension of x and e should be 1x1xC.
If so, what is the mean and std of x? channel-wise average?
And how about the output dimensions of fr(e) and fb(e)?

Thanks.

wty-ustc / hairclip Goto Github PK

hairclip's People

Stargazers

Watchers

Forkers

hairclip's Issues

Recommend Projects

Recommend Topics

Recommend Org