wty-ustc / hairclip
[CVPR 2022] HairCLIP: Design Your Hair by Text and Reference Image
License: GNU Lesser General Public License v2.1
First of all, thanks for your excellent work!
There are many hairstyles in hairstyle_list.txt, but after trying all of them I found only a few distinct styles in the result images; most outputs more or less repeat the following images.
The following is my command:
python scripts/inference.py \
--exp_dir=../result/test_1/ \
--checkpoint_path=../pretrained_models/hairclip.pt \
--latents_test_path=../inference_data/test_1/latent.pt \
--editing_type=hairstyle \
--input_type=text \
--hairstyle_description="hairstyle_list.txt"
What's the problem? Should I train with my own dataset?
I list some hairstyles that produce the same effect:
Could you please tell me where I can get the celeba_hq_train and celeba_hq_val?
Hi. 🤗
This is awesome work. 👍
Thanks to all of you, the contributors. 🌹
I am wondering whether you have any plan to make a demo public on Hugging Face Spaces, etc. 🤔
Hi,
I am trying to re-implement your paper but cannot get good results on either the image or the text path.
So I would like to verify some implementation details:
import torch
import torch.nn as nn
import torch.nn.functional as F
from models.stylegan2.model import EqualLinear

class MapperBlock(nn.Module):
    def __init__(self, channels=512):
        super(MapperBlock, self).__init__()
        self.fc = EqualLinear(channels, channels)
        # f_gamma / f_beta predict the modulation parameters from the CLIP condition embedding e
        self.f_gamma = nn.Sequential(
            EqualLinear(channels, channels), nn.LayerNorm(channels), nn.LeakyReLU(0.2),
            EqualLinear(channels, channels)
        )
        self.f_beta = nn.Sequential(
            EqualLinear(channels, channels), nn.LayerNorm(channels), nn.LeakyReLU(0.2),
            EqualLinear(channels, channels)
        )
        self.act = nn.LeakyReLU(0.2)

    def modulation(self, x, e):
        gamma = self.f_gamma(e)
        beta = self.f_beta(e)
        # normalize x over the channel dimension
        x = F.layer_norm(x, (x.shape[-1],))
        # scale and shift x conditioned on e
        return (1.0 + gamma) * x + beta

    def forward(self, x, e):
        x = self.fc(x)
        x = self.modulation(x, e)
        return self.act(x)
Is it correct?
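To sanity-check the shapes, here is a minimal usage sketch (the 1x512 sizes are my assumption: one entry of the w+ code plus one CLIP embedding):

block = MapperBlock(512)
x = torch.randn(1, 512)  # one entry of the w+ latent code
e = torch.randn(1, 512)  # CLIP text/image condition embedding
out = block(x, e)
print(out.shape)  # torch.Size([1, 512])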
According to your paper, the reference condition is randomly set to an image or text. My understanding is that the image/text manipulation loss is only computed when an image/text reference is used, but then the range of the total loss varies across conditions. Do the loss weights stay the same in all conditions, or do they need to be adjusted per condition?
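For concreteness, this is a minimal sketch of what I mean by condition-dependent losses (all names below are my own, not from your code):

def total_loss(losses, weights, use_image_ref, use_text_ref):
    # losses: dict of scalar loss tensors; weights: dict of floats (keys are hypothetical)
    loss = weights['id'] * losses['id'] + weights['latent'] * losses['latent']
    if use_image_ref:
        loss = loss + weights['img_manip'] * losses['img_manip']
    if use_text_ref:
        loss = loss + weights['text_manip'] * losses['text_manip']
    return loss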
In your paper: "we also generated several edited images using our text-guided hair editing method to augment the diversity of the
reference image set." Could you elaborate more details about your method? Or any other reference paper?
Thanks for your help.
I want to change the hair color on FFHQ data; however, the hairstyle in some of the results changes as well.
Did I do something wrong?
The following is my command:
python scripts/inference.py \
--exp_dir=./experiment \
--checkpoint_path=../pretrained_models/hairclip.pt \
--latents_test_path=./latents.pt \
--editing_type=color \
--input_type=text \
--color_description=red
Hi,
When I train my own hairstyle model, do I need to convert the images under the --hairstyle_ref_img_train_path=/path/to/celeba_hq_train parameter into latents through the e4e algorithm, instead of using --latents_train_path=/path/to/train_faces.pt?
Hello, your work is very good. I have a question about your paper: in the comparison with the latest methods, how many images were used to compute the three metrics IDS, PSNR, and SSIM reported in Table 1?
Thank you for your great project!
In this paper, you said “We train and evaluate our hair mapper on the CelebA-HQ dataset. Since we use e4e [43] as our inversion encoder, we follow its division of the training set and test set.” However, I found that e4e used the FFHQ dataset for training and the CelebA-HQ test dataset for evaluation. Hence, I feel confused.
My question is: how are the training and test sets split on the CelebA-HQ dataset?
I tested the feature on Replicate and noticed that some photos result in local variable 'shape' referenced before assignment. Is there any way we can fix this?
File "predict.py", line 168, in run_alignment
aligned_image = align_face(filepath=image_path, predictor=predictor)
File "/src/encoder4editing/utils/alignment.py", line 35, in align_face
lm = get_landmark(filepath, predictor)
File "/src/encoder4editing/utils/alignment.py", line 21, in get_landmark
t = list(shape.parts())
UnboundLocalError: local variable 'shape' referenced before assignment
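If it helps others: the failure seems to happen when dlib detects no face in the photo, so shape in get_landmark is never assigned. A minimal guard, sketched against the dlib API used in the e4e repo:

import dlib

def get_landmark(filepath, predictor):
    detector = dlib.get_frontal_face_detector()
    img = dlib.load_rgb_image(filepath)
    dets = detector(img, 1)
    if len(dets) == 0:
        # this is the case that triggers the UnboundLocalError upstream
        raise ValueError(f"No face detected in {filepath}; cannot align.")
    shape = predictor(img, dets[0])  # landmarks for the first detected face
    return list(shape.parts())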
Hello, can I use my own image for the test? I found that the input is a test_face.pt file (the test dataset?), and I did not find any image input in the code. The only thing that looks like an input image is w (torch.Size([1, 18, 512])), but that is not the size of a picture.
Thank you for your great work! Do you think video hair editing based on HairCLIP is achievable? I tried a little, but the hairstyle region is still hard to control, and consistency of the hairstyle across frames is quite difficult to maintain. Can you give me some insights about video hairstyle editing?
mask_512 = (torch.unsqueeze(torch.max(labels_predict, 1)[1], 1)==13).float()
1. Why does hair equal 13, while the background does not equal 13?
2. The U-Net inference results have 19 channels; what do they mean?
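For context, my reading of that line, as a runnable sketch (assuming the parsing network follows the CelebAMask-HQ label order, where each of the 19 channels scores one semantic class and index 13 is hair):

import torch

labels_predict = torch.randn(1, 19, 512, 512)  # stand-in for the parsing network logits
labels = torch.argmax(labels_predict, dim=1, keepdim=True)  # per-pixel class ids
hair_mask = (labels == 13).float()        # 1 where the pixel is classified as hair
background_mask = (labels == 0).float()   # background is class 0, not 13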
Hi, thanks for your work.
I have found that the HairCLIP algorithm is not as good at preserving facial details as work like "Barbershop: Hair Transfer with GAN-Based Image Compositing Using Segmentation Masks". I tried "HFGI: High-Fidelity GAN Inversion for Image Attribute Editing (CVPR 2022)" as the latent encoder, but the effect is not very good.
Can you give some advice or methods on how to preserve facial details? Very much looking forward to your answer, thank you!
Hi, would you be interested in adding HairCLIP to Hugging Face? The Hub offers free hosting, and it would make your work more accessible and visible to the rest of the ML community. Models/datasets/Spaces (web demos) can be added to a user account or organization, similar to GitHub.
Examples from other organizations:
Keras: https://huggingface.co/keras-io
Microsoft: https://huggingface.co/microsoft
Facebook: https://huggingface.co/facebook
Example spaces with repos:
github: https://github.com/salesforce/BLIP
Spaces: https://huggingface.co/spaces/salesforce/BLIP
github: https://github.com/facebookresearch/omnivore
Spaces: https://huggingface.co/spaces/akhaliq/omnivore
and here are guides for adding spaces/models/datasets to your org
How to add a Space: https://huggingface.co/blog/gradio-spaces
How to add models: https://huggingface.co/docs/hub/adding-a-model
How to upload a dataset: https://huggingface.co/docs/datasets/upload_dataset.html
Please let us know if you would be interested and if you have any questions, we can also help with the technical implementation.
I am getting an error when I try to run inference. I am using this command:
python scripts/inference.py \
--exp_dir=/content/resultss \
--editing_type=both \
--input_type=image_image \
--hairstyle_ref_img_test_path=/content/oriental1.png \
--color_ref_img_test_path=/content/oriental1.png \
--num_of_ref_img 1 \
--checkpoint_path=/content/drive/MyDrive/data/hairclip.pt \
--latents_test_path=/content/drive/MyDrive/data/latents.pt
What I am trying to do is transfer the hairstyle of a reference image onto an input image. I have converted the input image with e4e to get the latent code. Please let me know. Thanks.
Can you provide a script file that takes a single picture as input for the final prediction?
Input command:
E:\Linux\XSpace\papers\HairCLIP\mapper>python scripts/inference.py --exp_dir=E:\Linux\XSpace\papers\HairCLIP\data\exp --checkpoint_path=F:\Dataset\CelebA\Data\hairclip.pt --latents_test_path=F:\Dataset\CelebA\Data\test_faces.pt --editing_type=color --input_type=image --hairstyle_description="hairstyle_list.txt" --color_ref_img_test_path=E:\Linux\XSpace\papers\HairCLIP\dataref
The error occurs at x = clip_model.encode_image(masked_generated_renormed) in latent_mappers.py; the error message is as follows:
*** RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
File "code/torch/multimodal/model/multimodal_transformer/___torch_mangle_9591.py", line 19, in encode_image
_0 = self.visual
input = torch.to(image, torch.device("cuda:0"), 5, False, False, None)
return (_0).forward(input, )
~~~~~~~~~~~ <--- HERE
def encode_text(self: torch.multimodal.model.multimodal_transformer.___torch_mangle_9591.Multimodal,
input: Tensor) -> Tensor:
File "code/torch/multimodal/model/multimodal_transformer.py", line 34, in forward
x2 = torch.add(x1, torch.to(_4, 5, False, False, None), alpha=1)
x3 = torch.permute((_3).forward(x2, ), [1, 0, 2])
x4 = torch.permute((_2).forward(x3, ), [1, 0, 2])
~~~~~~~~~~~ <--- HERE
_15 = torch.slice(x4, 0, 0, 9223372036854775807, 1)
x5 = torch.slice(torch.select(_15, 1, 0), 1, 0, 9223372036854775807, 1)
File "code/torch/multimodal/model/multimodal_transformer/___torch_mangle_9477.py", line 8, in forward
def forward(self: torch.multimodal.model.multimodal_transformer.___torch_mangle_9477.Transformer,
x: Tensor) -> Tensor:
return (self.resblocks).forward(x, )
~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
def forward1(self: torch.multimodal.model.multimodal_transformer.___torch_mangle_9477.Transformer,
x: Tensor) -> Tensor:
File "code/torch/torch/nn/modules/container/___torch_mangle_9476.py", line 29, in forward
_8 = getattr(self, "3")
_9 = getattr(self, "2")
_10 = (getattr(self, "1")).forward((getattr(self, "0")).forward(x, ), )
~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
_11 = (_7).forward((_8).forward((_9).forward(_10, ), ), )
_12 = (_4).forward((_5).forward((_6).forward(_11, ), ), )
File "code/torch/multimodal/model/multimodal_transformer/___torch_mangle_9376.py", line 13, in forward
_0 = self.mlp
_1 = self.ln_2
_2 = (self.attn).forward((self.ln_1).forward(x, ), )
~~~~~~~~~~~~~~~~~~ <--- HERE
x0 = torch.add(x, _2, alpha=1)
x1 = torch.add(x0, (_0).forward((_1).forward(x0, ), ), alpha=1)
File "code/torch/torch/nn/modules/activation/___torch_mangle_9369.py", line 38, in forward
_16 = [-1, int(torch.mul(bsz, CONSTANTS.c0)), _8]
v0 = torch.transpose(torch.view(_15, _16), 0, 1)
attn_output_weights = torch.bmm(q2, torch.transpose(k0, 1, 2))
~~~~~~~~~ <--- HERE
input = torch.softmax(attn_output_weights, -1, None)
attn_output_weights0 = torch.dropout(input, 0., True)
Traceback of TorchScript, original code (most recent call last):
/opt/conda/lib/python3.7/site-packages/torch/nn/functional.py(4294): multi_head_attention_forward
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/activation.py(985): forward
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py(709): _slow_forward
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py(725): _call_impl
/root/workspace/multimodal-pytorch/multimodal/model/multimodal_transformer.py(45): attention
/root/workspace/multimodal-pytorch/multimodal/model/multimodal_transformer.py(48): forward
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py(709): _slow_forward
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py(725): _call_impl
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/container.py(117): forward
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py(709): _slow_forward
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py(725): _call_impl
/root/workspace/multimodal-pytorch/multimodal/model/multimodal_transformer.py(63): forward
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py(709): _slow_forward
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py(725): _call_impl
/root/workspace/multimodal-pytorch/multimodal/model/multimodal_transformer.py(93): forward
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py(709): _slow_forward
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py(725): _call_impl
/root/workspace/multimodal-pytorch/multimodal/model/multimodal_transformer.py(221): visual_forward
/opt/conda/lib/python3.7/site-packages/torch/jit/_trace.py(940): trace_module
(36): export_torchscript_models
(3):
/opt/conda/lib/python3.7/site-packages/IPython/core/interactiveshell.py(3418): run_code
/opt/conda/lib/python3.7/site-packages/IPython/core/interactiveshell.py(3338): run_ast_nodes
/opt/conda/lib/python3.7/site-packages/IPython/core/interactiveshell.py(3147): run_cell_async
/opt/conda/lib/python3.7/site-packages/IPython/core/async_helpers.py(68): _pseudo_sync_runner
/opt/conda/lib/python3.7/site-packages/IPython/core/interactiveshell.py(2923): _run_cell
/opt/conda/lib/python3.7/site-packages/IPython/core/interactiveshell.py(2878): run_cell
/opt/conda/lib/python3.7/site-packages/IPython/terminal/interactiveshell.py(555): interact
/opt/conda/lib/python3.7/site-packages/IPython/terminal/interactiveshell.py(564): mainloop
/opt/conda/lib/python3.7/site-packages/IPython/terminal/ipapp.py(356): start
/opt/conda/lib/python3.7/site-packages/traitlets/config/application.py(845): launch_instance
/opt/conda/lib/python3.7/site-packages/IPython/__init__.py(126): start_ipython
/opt/conda/bin/ipython(8):
RuntimeError: cublas runtime error : unknown error at C:/cb/pytorch_1000000000000/work/aten/src/THC/THCBlas.cu:225
(Pdb) img_tensor.shape
torch.Size([1, 3, 1024, 1024])
Is the size of the input tensor wrong?
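If it helps: the official CLIP image encoder expects 224x224 input, so a 1024x1024 tensor needs to be downsampled and renormalized before encode_image. A minimal sketch (the [-1, 1] input range is an assumption about the StyleGAN output; the mean/std are the official CLIP preprocessing constants):

import torch
import torch.nn.functional as F

def to_clip_input(img_tensor):
    # img_tensor: (N, 3, 1024, 1024), assumed to be in [-1, 1]
    x = F.interpolate(img_tensor, size=(224, 224), mode='bilinear', align_corners=False)
    x = (x + 1.0) / 2.0  # map to [0, 1]
    mean = torch.tensor([0.48145466, 0.4578275, 0.40821073], device=x.device).view(1, 3, 1, 1)
    std = torch.tensor([0.26862954, 0.26130258, 0.27577711], device=x.device).view(1, 3, 1, 1)
    return (x - mean) / std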
Hello, I noticed that the network structure diagram in the paper may be drawn incorrectly: F should be fine, meaning high-level semantic information, and C should be coarse, meaning low-level semantic information.
Hi!
First off, thank you for your work!
I'm trying to create a Colab notebook to play with your model, but since the weights are hosted on Google Drive, the download limits seem to prevent me from simply downloading them with gdown or wget.
Could I download them and move them to another hosting service (e.g. archive.org) to avoid this issue? Of course, I would add references to all the authors and parties involved.
Again, thanks for your work!
Hello, I want to ask whether the speed of running inference.py for testing is normal. This is my command:
cd mapper
python scripts/inference.py \
--exp_dir=/home/ps/HairCLIP/mapper/path/to/experiment \
--checkpoint_path=/home/ps/HairCLIP/pretrained_models/hairclip.pt \
--latents_test_path=/home/ps/HairCLIP/mapper/path/to/test_faces.pt \
--editing_type=hairstyle \
--input_type=text \
--hairstyle_description="/home/ps/HairCLIP/mapper/hairstyle_list.txt"
Sorry, I'm a beginner in ML and programming, so I can't get started on this project from the README. Please help me.
Hello, thanks for your talented work. I have a question about color_ref_img_in_domain_path. I finished pre-training with the arguments --hairstyle_manipulation_prob=0 --color_manipulation_prob=1 --both_manipulation_prob=0 --hairstyle_text_manipulation_prob=0.5 --color_text_manipulation_prob=1. How should I set color_ref_img_in_domain_path? Should that path be logs/image_train? I got the error below and don't know where to find these files. Looking forward to your reply.
The error is:
FileNotFoundError: [Errno 2] No such file or directory: '/home/code/HairCLIP/logs/images_train/red hair/02951.jpg'
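Judging from the error path, the dataloader seems to expect one subfolder per color description under color_ref_img_in_domain_path, something like the following (layout inferred from the error message, not from the docs; the folder names besides "red hair" are hypothetical):

color_ref_img_in_domain_path/
├── red hair/
│   ├── 02951.jpg
│   └── ...
├── blond hair/
└── ...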
The source code does not include encoder4editing, so I had to copy it from another repo into the encoder4editing directory. But after adding it, many module names conflict, such as models and criteria.
Is there a clear description of how to add encoder4editing?
@wty-ustc Thank you for the amazing work!
I tried to split CelebA-HQ by the official list_eval_partition.txt and got 24183/2993/2824 images for the training/validation/testing splits, but I found that the length of train.pt is 24176, so I'm very confused about which data you used.
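For reference, this is how I built my split (a sketch; the file names are the standard CelebA and CelebA-HQ release files, adjust to your local copies):

# list_eval_partition.txt: "<orig_name> <0|1|2>" (0 train, 1 val, 2 test)
celeba_split = {}
with open('list_eval_partition.txt') as f:
    for line in f:
        name, part = line.split()
        celeba_split[name] = int(part)

# CelebA-HQ-to-CelebA-mapping.txt: "<hq_idx> <orig_idx> <orig_file>"
hq_split = {0: [], 1: [], 2: []}
with open('CelebA-HQ-to-CelebA-mapping.txt') as f:
    next(f)  # skip the header row
    for line in f:
        hq_idx, _, orig_name = line.split()
        hq_split[celeba_split[orig_name]].append(int(hq_idx))

print({k: len(v) for k, v in hq_split.items()})  # train/val/test sizes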
How do I change the hairstyle of a given image using an image that contains the target hairstyle?
I tested the effect and found that the hairstyle of the generated image is quite different from that of the reference image. Here is my test script; the reference image is selected from the CelebAMask-HQ dataset. Is there a problem in my test process?
python scripts/inference.py \
--exp_dir=../outputs/0321/ \
--checkpoint_path=../pretrained_models/hairclip.pt \
--latents_test_path=../pretrained_models/test_faces.pt \
--editing_type=both \
--input_type=image_image \
--color_ref_img_test_path=../input/16 \
--hairstyle_ref_img_test_path=../input/16 \
--num_of_ref_img 1
Hi, I found that there are no script files in this repo.
Most of the CelebA-HQ data I generate contains a lot of noise. I tried recompiling and linking against libjpeg-8d, but it did not help.
In your paper you mention using ACD as a measure of color difference. Where does this metric come from? Is there any code we can use?
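My naive reading of "average color difference", in case it helps the discussion (this is purely my assumption, not the paper's definition or code): compare the mean RGB color of the hair region between the edited image and the reference.

import torch

def acd(img_a, img_b, mask_a, mask_b):
    # img_*: (3, H, W) images; mask_*: (1, H, W) binary hair masks
    mean_a = (img_a * mask_a).sum(dim=(1, 2)) / mask_a.sum().clamp(min=1)
    mean_b = (img_b * mask_b).sum(dim=(1, 2)) / mask_b.sum().clamp(min=1)
    return (mean_a - mean_b).abs().mean()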
This is the command I am using.
%%shell
eval "$(conda shell.bash hook)"
conda activate myenviroment
python scripts/train.py \
--exp_dir=/content/outss \
--hairstyle_description="hairstyle_list.txt" \
--color_description=black,brown,yellow \
--checkpoint_path=/content/drive/MyDrive/data/hairclip.pt \
--ir_se50_weights=/content/drive/MyDrive/data/model_ir_se50.pth \
--latents_train_path=/content/drive/MyDrive/data/trainlatent/latents.pt \
--latents_test_path=/content/drive/MyDrive/data/testlatent/latents1.pt \
--hairstyle_ref_img_train_path=/content/inversions \
--hairstyle_ref_img_test_path=/content/test/inversions \
--color_ref_img_train_path=/content/inversions \
--color_ref_img_test_path=/content/test/inversions \
--color_ref_img_in_domain_path=/content/inversions \
--hairstyle_manipulation_prob=0.5 \
--color_manipulation_prob=0.2 \
--both_manipulation_prob=0.27 \
--hairstyle_text_manipulation_prob=0.5 \
--color_text_manipulation_prob=0 \
--color_in_domain_ref_manipulation_prob=0.25
Excellent work! When will the testing code and pretrained model be released? Can't wait to try them.
Based on the pre-trained model you provided, I edited the hairstyle with text and the hair color with an image, but the hair color editing did not work. Do I have to retrain a new model myself? And how do I obtain the model specified by the test parameter --parsenet_weights?
Hello, I want to ask whether the input test set consists of the original images or e4e latents.
As the title says.
😄
Hi, thank you for your work.
What does line 11, from cog import BasePredictor, Path, Input, mean?
I get a red wavy underline there.
Thanks for making such a good model! Actually, I want to run inference on my own image data without converting it with e4e first. Could you please help with this? Thanks.
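In case it helps, this is how I get the (1, 18, 512) latent from a single image with the e4e encoder (a sketch following the usage in the official encoder4editing notebook; the checkpoint and file names are assumptions):

import argparse
import torch
from PIL import Image
from torchvision import transforms
from models.psp import pSp  # from the encoder4editing repo

ckpt = torch.load('e4e_ffhq_encode.pt', map_location='cpu')
opts = ckpt['opts']
opts['checkpoint_path'] = 'e4e_ffhq_encode.pt'
net = pSp(argparse.Namespace(**opts)).eval().cuda()

transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])])

# the image should be face-aligned first, e.g. with the repo's run_alignment
img = transform(Image.open('face.jpg').convert('RGB')).unsqueeze(0).cuda()
with torch.no_grad():
    _, latent = net(img, randomize_noise=False, return_latents=True)
print(latent.shape)  # expected: torch.Size([1, 18, 512])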
Will the StyleGAN inversion encoder be trained? I found that the CLIP image encoder and CLIP text encoder outputs use detach() so that they are not trained. I look forward to your answer. Thank you!
Hi,
Great work!
But I have a question about the modulation module of the mapper network.
I assume the dimensions of x and e should be 1x1xC.
If so, what are the mean and std of x? A channel-wise average?
And what are the output dimensions of f_γ(e) and f_β(e)?
Thanks.
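For what it's worth, a quick check of what F.layer_norm over the last dimension computes (a sketch, under my assumption that x is 1x1xC):

import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 512)
y = F.layer_norm(x, (512,))  # normalizes over the last (channel) dimension
mu = x.mean(dim=-1, keepdim=True)
var = x.var(dim=-1, unbiased=False, keepdim=True)
y_manual = (x - mu) / torch.sqrt(var + 1e-5)
print(torch.allclose(y, y_manual, atol=1e-5))  # True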