switchablenorms / deepfashion_try_on

Official code for "Towards Photo-Realistic Virtual Try-On by Adaptively Generating↔Preserving Image Content", CVPR'20. https://arxiv.org/abs/2003.05863

deepfashion acgpn generative-adversarial-network visual-try-on

deepfashion_try_on's Introduction

Towards Photo-Realistic Virtual Try-On by Adaptively Generating↔Preserving Image Content, CVPR'20.

Rearranged code of the CVPR 2020 paper 'Towards Photo-Realistic Virtual Try-On by Adaptively Generating↔Preserving Image Content' for open-sourcing. We also rearranged the VITON dataset for easy access.

Notably, virtual try-on is a difficult research topic, and our solution is of course not perfect. Please refer to our failure cases and limitations before using this repo.

The code is not fully tested. If you meet any bugs or want to improve the system, please feel free to open an issue so we can discuss. For email requests, please write to [email protected]

[Sample Try-on Video] [Checkpoints]

[Dataset_Test] [Dataset_Train]

[Paper]

Update

  • [2022-7-5] We have collected the try-on results of several methods on a widely used test pair list of the VITON dataset (from CP-VTON). Researchers can use it for fast baseline comparison.

    The results show CP-VTON+ (CVPRW 2020), ACGPN (CVPR 2020), DCTON (CVPR 2021), and RT-VTON (CVPR 2022) from left to right. [Datalink], [Test Pair List].

  • [2021-12-13] We removed the random dropout and now use AdamW with weight decay to stabilize training. Clothes are pasted back before computing the GAN loss and VGG loss. The light-point artifacts are largely reduced. The code has been updated.

  • [2021-12-3] The light-point artifacts seem to be caused by the variance of the imprecise human parsing introduced when we rearranged the data for open-sourcing. We recommend using the ATR model from https://github.com/PeikeLi/Self-Correction-Human-Parsing to obtain human parsing with a neck label, which stabilizes training. Note that the face and neck should be treated as non-target body parts in mask inpainting and mask composition. With the neck label, we can paste back the background before computing the VGG loss and GAN loss. The uncertainty of the background might be another cause of the light point on the neck.

  • [2021-10-22] Light-point artifacts may occur in current training results. This may be due to version differences in our training code introduced when we rearranged it, since we did not observe the same artifacts in our released checkpoints. It might be caused by instability in training the preservation (identity mapping) of the clothes region in the Content Fusion Module. Try pasting the ground-truth clothes back onto the CFM results when calculating the VGG loss, GAN loss, and feature matching loss (everything except L1), since these losses can degrade the results when learning the identity mapping. The L1 loss can be applied to the reconstruction of the clothes region to learn this identity mapping (a minimal sketch follows this list). This ISSUE addressed the problem.
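Below is a minimal sketch of the paste-back trick described in the updates above. It is illustrative only, not the actual training code: fake_image is assumed to be the CFM output, real_image the ground truth, clothes_mask a binary mask of the clothes region, and the loss callables are placeholders for whatever VGG/GAN criteria you use.

    import torch

    def paste_back_losses(fake_image, real_image, clothes_mask,
                          vgg_loss_fn, gan_loss_fn, discriminator):
        # Replace the predicted clothes region with the ground-truth clothes before
        # computing the perceptual/adversarial losses, so those losses do not
        # degrade the identity mapping that the CFM must learn in that region.
        composited = fake_image * (1 - clothes_mask) + real_image * clothes_mask

        loss_vgg = vgg_loss_fn(composited, real_image)
        loss_gan = gan_loss_fn(discriminator(composited))

        # Only the L1 loss supervises the clothes region itself, which is enough
        # to learn the identity (reconstruction) mapping there.
        loss_l1 = torch.nn.functional.l1_loss(fake_image * clothes_mask,
                                              real_image * clothes_mask)
        return loss_vgg, loss_gan, loss_l1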

Inference

python test.py

Note that the results of our pretrained model are only guaranteed on the VITON dataset; you should re-train the pipeline to get good results on other datasets.
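The inference dataloader reads its inputs from subfolders of the data root. The folder names below are taken from the console output in the issue reports further down; the short descriptions are best-guess annotations rather than official documentation.

    Data_preprocessing/
        test_img/          person images
        test_color/        in-shop clothing images
        test_edge/         clothing masks
        test_label/        human parsing label maps
        test_mask/
        test_colormask/
        test_pose/         pose keypoints (.json)

As in the issue reports below, test.py can also be pointed at the data root explicitly:

    python test.py --dataroot ../Data_preprocessing --phase test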

Inference using Colab [Open In Colab]

Thanks to Levin for contributing the Colab inference script.

Evaluation: IS and SSIM

Note that the released checkpoints differ from the ones used in the paper: they generate better visual results but may have different (lower or higher) quantitative statistics. The results in the paper can be reproduced by re-training with a different number of training epochs.

IS and SSIM are computed on same-clothes reconstruction results.

By default, the code generates random clothes-model pairs, so you need to modify ACGPN_inference/data/aligned_dataset.py to generate the reconstruction results, as sketched below.
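A minimal sketch of the change, not the actual contents of aligned_dataset.py (the function and argument names are illustrative): instead of drawing a random clothing index for each person image, reuse the image's own index so the pair becomes a same-clothes reconstruction pair.

    import random

    def pick_clothes_index(index, num_clothes, reconstruct=True):
        """Choose which clothing item to pair with the person image at `index`.

        reconstruct=True  -> same-clothes pair, needed for the IS/SSIM evaluation
        reconstruct=False -> random clothes-model pair, the repo's default behaviour
        """
        return index if reconstruct else random.randrange(num_clothes)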

Here, we also offer the reconstruction results on the test set of the VITON dataset, produced by running this GitHub repo: [Precomputed Evaluation Results]. These results can be used directly to compute the IS and SSIM evaluations, and you can obtain identical results with this repo.

SSIM score

  1. Use the pytorch SSIM repo. https://github.com/Po-Hsun-Su/pytorch-ssim
  2. Normalize the image (img/255.0) and reshape correctly. If not normalized correctly, the results differ a lot.
  3. Compute the score with window size = 11 (a sketch follows this list). The SSIM score should be 0.8664, which is higher than the score reported in the paper since this is a better checkpoint.
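A minimal sketch of the SSIM computation for a single image pair, assuming the pytorch-ssim package above is installed; the file paths are placeholders.

    import numpy as np
    import torch
    import pytorch_ssim                       # https://github.com/Po-Hsun-Su/pytorch-ssim
    from PIL import Image

    def load_as_tensor(path):
        """Load an image as a (1, C, H, W) float tensor normalized to [0, 1]."""
        img = np.asarray(Image.open(path).convert('RGB'), dtype=np.float32) / 255.0
        return torch.from_numpy(img).permute(2, 0, 1).unsqueeze(0)

    generated = load_as_tensor('results/000001_0.jpg')                      # reconstructed result
    reference = load_as_tensor('Data_preprocessing/test_img/000001_0.jpg')  # ground truth

    score = pytorch_ssim.ssim(generated, reference, window_size=11)
    print(score.item())  # averaged over the whole test set this should be around 0.8664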

IS score

  1. Use the pytorch inception score repo. https://github.com/sbarratt/inception-score-pytorch
  2. Normalize the images ((img/255.0)*2-1) and reshape correctly. Please strictly follow the procedure given in this repo.
  3. Compute the score. The number of splits also changes the result; we use splits = 1 (a sketch follows this list).
  4. Note that the released checkpoints produce an IS score of 2.82, which is slightly lower (but still SOTA) than the paper, since this is a different checkpoint with better SSIM performance.
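A minimal sketch of the IS computation, assuming inception_score.py from the repository above is on the Python path and the generated results live in a results/ folder (both are placeholders).

    import glob

    import numpy as np
    import torch
    from PIL import Image
    from inception_score import inception_score   # inception_score.py from the repo above

    def load_images(paths):
        """Load images as an (N, 3, H, W) float tensor normalized to [-1, 1]."""
        imgs = []
        for p in paths:
            arr = np.asarray(Image.open(p).convert('RGB'), dtype=np.float32)
            imgs.append(torch.from_numpy(arr / 255.0 * 2 - 1).permute(2, 0, 1))
        return torch.stack(imgs)

    imgs = load_images(sorted(glob.glob('results/*.jpg')))

    # splits=1 as noted above; resize=True lets the Inception network upsample internally.
    mean_is, std_is = inception_score(imgs, cuda=True, batch_size=32, resize=True, splits=1)
    print(mean_is)   # the released checkpoint gives about 2.82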

The specific key points we choose to evaluate the try-on difficulty

image

  • We use the pose map to calculate the difficulty level of try-on. The key motivation is that the more complex the occlusions and layouts in the clothing area are, the harder the try-on will be. The formula is given below. Manual selection is also involved to improve the difficulty partition.
  • Variations in the pose map predictions largely affect the absolute value of the try-on complexity, so you may get different partition sizes when using our reported separation values.
  • The relative ranking of complexity best depicts the complexity distribution. Try the top 100 or bottom 100 samples and you can see the effectiveness of our criterion.

The formula to compute the difficulty of a try-on reference image

image

where t is a key point, M_p' is the set of key points we take into consideration, and N is the size of the set.

Segmentation Label

0 -> Background
1 -> Hair
4 -> Upclothes
5 -> Left-shoe 
6 -> Right-shoe
7 -> Noise
8 -> Pants
9 -> Left_leg
10 -> Right_leg
11 -> Left_arm
12 -> Face
13 -> Right_arm
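For example, the parsing map can be turned into per-class binary masks with a simple lookup. The dictionary below just mirrors the table above; the loading code and file path are illustrative.

    import numpy as np
    from PIL import Image

    # Mirrors the segmentation table above (labels 2 and 3 do not appear in the table).
    LABELS = {
        0: 'background', 1: 'hair', 4: 'upclothes', 5: 'left_shoe', 6: 'right_shoe',
        7: 'noise', 8: 'pants', 9: 'left_leg', 10: 'right_leg',
        11: 'left_arm', 12: 'face', 13: 'right_arm',
    }

    def masks_from_label_map(path):
        """Return one binary mask per class in the parsing map."""
        label_map = np.asarray(Image.open(path))      # H x W array of class ids
        return {name: (label_map == idx).astype(np.uint8)
                for idx, name in LABELS.items()}

    # e.g. the clothing region:
    # clothes_mask = masks_from_label_map('Data_preprocessing/test_label/000001_0.png')['upclothes']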

Sample images from different difficulty levels

image

Sample Try-on Results

image

Limitations and Failure Cases

image

  1. Large transformation of the semantic layout is hard to handle, partly due to the agnostic input of the fused segmentation.
  2. The shape of the original clothes is not completely removed (the same problem as in VITON).
  3. Very difficult poses are hard to handle; a better solution could be proposed.

Training Details

Due to some version differences in the code and some updates for better quality, some implementation details may differ from the paper.

For better inference performance, models G and G2 should be trained for 200 epochs, while models G1 and the U-Net should be trained for 20 epochs.

License

The use of this software is RESTRICTED to non-commercial research and educational purposes.

Citation

If you use our code or models or the offered baseline results in your research, please cite with:

@InProceedings{Yang_2020_CVPR,
author = {Yang, Han and Zhang, Ruimao and Guo, Xiaobao and Liu, Wei and Zuo, Wangmeng and Luo, Ping},
title = {Towards Photo-Realistic Virtual Try-On by Adaptively Generating-Preserving Image Content},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}

@inproceedings{ge2021disentangled,
  title={Disentangled Cycle Consistency for Highly-realistic Virtual Try-On},
  author={Ge, Chongjian and Song, Yibing and Ge, Yuying and Yang, Han and Liu, Wei and Luo, Ping},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={16928--16937},
  year={2021}
}

@inproceedings{yang2022full,
title = {Full-Range Virtual Try-On With Recurrent Tri-Level Transform},
author = {Yang, Han and Yu, Xinrui and Liu, Ziwei},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages = {3460--3469},
year = {2022}
}

Dataset

VITON Dataset. This dataset is presented in VITON and contains 19,000 image pairs, each of which includes a front-view woman image and a top clothing image. After removing invalid image pairs, 16,253 pairs remain, which are further split into a training set of 14,221 pairs and a testing set of 2,032 pairs.

deepfashion_try_on's People

Contributors

levindabhi, lzqhardworker, themarvelouswhale


deepfashion_try_on's Issues

How to test your data by using test.py?

when I run:
$ python test.py --dataroot /Data_preprocessing --phase test
I got an error:
$ AssertionError: /Data_preprocessing/test_label is not a valid directory
Can somebody tell me why? I just want to use test.py to test your test data. I downloaded your test dataset and put it in the folder named Data_preprocessing. Why does such a strange error occur?

Training code

Your work is really amazing and interesting, so I wonder if you can publish the training code or share it with me([email protected]). Thank you~

segmentation label

I used the LIP model to generate my own dataset, but the labels are different from yours. Can you tell me which segmentation model you used to generate your dataset?

how to generate test_edge?

Thanks for your kind reply, but I still have a question: does test_edge only consist of pixel values 0 and 255? How do I get it?

Facing index out of range error while testing? Please help

Can you please help with this error?

chethan@ex5820:~/DeepFashion_Try_On/ACGPN_inference$ python3 test.py
------------ Options -------------
batchSize: 1
beta1: 0.5
checkpoints_dir: ./checkpoints
continue_train: False
data_type: 32
dataroot: ../Data_preprocessing/
debug: False
display_freq: 100
display_winsize: 512
fineSize: 512
gpu_ids: [0]
input_nc: 3
isTrain: True
label_nc: 20
lambda_feat: 10.0
loadSize: 512
load_pretrain: ./checkpoints/label2city
lr: 0.0002
max_dataset_size: inf
model: pix2pixHD
nThreads: 2
n_blocks_global: 4
n_blocks_local: 3
n_downsample_global: 4
n_layers_D: 3
n_local_enhancers: 1
name: label2city
ndf: 64
netG: global
ngf: 64
niter: 100
niter_decay: 100
niter_fix_global: 0
no_flip: False
no_ganFeat_loss: False
no_html: False
no_lsgan: False
no_vgg_loss: False
norm: instance
num_D: 2
output_nc: 3
phase: test
pool_size: 0
print_freq: 100
resize_or_crop: scale_width
save_epoch_freq: 10
save_latest_freq: 1000
serial_batches: False
tf_log: False
use_dropout: False
verbose: False
which_epoch: latest
-------------- End ----------------
CustomDatasetDataLoader
dataset [AlignedDataset] was created
../Data_preprocessing/test_label label
../Data_preprocessing/test_label label
../Data_preprocessing/test_img img
../Data_preprocessing/test_img img
../Data_preprocessing/test_edge edge
../Data_preprocessing/test_edge edge
../Data_preprocessing/test_mask mask
../Data_preprocessing/test_mask mask
../Data_preprocessing/test_colormask colormask
../Data_preprocessing/test_colormask colormask
../Data_preprocessing/test_color color
../Data_preprocessing/test_color color

Inference images = 10
latest_net_U.pth
latest_net_G1.pth
latest_net_G2.pth
latest_net_G.pth
/home/chethan/.local/lib/python3.6/site-packages/torchvision/transforms/transforms.py:188: UserWarning: The use of the transforms.Scale transform is deprecated, please use transforms.Resize instead.
"please use transforms.Resize instead.")
/home/chethan/.local/lib/python3.6/site-packages/torchvision/transforms/transforms.py:188: UserWarning: The use of the transforms.Scale transform is deprecated, please use transforms.Resize instead.
"please use transforms.Resize instead.")
Traceback (most recent call last):
File "test.py", line 104, in
for i, data in enumerate(dataset, start=epoch_iter):
File "/home/chethan/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 336, in next
return self._process_next_batch(batch)
File "/home/chethan/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 357, in _process_next_batch
raise batch.exc_type(batch.exc_msg)
IndexError: Traceback (most recent call last):
File "/home/chethan/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 106, in _worker_loop
samples = collate_fn([dataset[i] for i in batch_indices])
File "/home/chethan/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 106, in
samples = collate_fn([dataset[i] for i in batch_indices])
File "/home/chethan/DeepFashion_Try_On/ACGPN_inference/data/aligned_dataset.py", line 160, in getitem
C_path = self.C_paths[test]
IndexError: list index out of range

Results are not good

Below are the testing results on my data.

  1. I have used pose_keypoints from the coco_18 model and verified the keypoints by drawing them on the image; they are good for the results below.
  2. Generated labels with a pretrained model on the LIP dataset and modified the numbering accordingly.
  3. I am using test_color and test_edge the same as in the provided test data.

The results are not completely satisfying. Any suggestions are welcome.

Thank you

image
image

The little V shape at the neck is the same as in the query image
image

Sleeves are the same as in the query image
image

The back layer of the t-shirt is overlaid on the front of the neck
image

The left arm's cloth is not overlaid
image

Cloth is the same as in the query image
image

how to generate your dataset?

Thanks for your contribution! But I still have a question about how you generate your dataset, for example your train_label and test_label?

Why put label in the network input?

In the file DeepFashion_Try_On/ACGPN_train/models/pix2pixHD_model.py, at line 323, why is real_image put in G_in? Is it an error? real_image is the label, so why put the label in the network input?

Dataset problems?

Hello, in the dataset, what do train_colormask and train_mask mean? How are they made? I want to generate them using my own dataset. Could you please guide me?

AttributeError: 'Pix2PixHDModel' object has no attribute 'optimizer_G'

I have searched the entire repo but still couldn't find the solution.

Traceback (most recent call last):
File "train.py", line 192, in
model.module.optimizer_G.zero_grad()
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 539, in getattr
type(self).name, name))
AttributeError: 'Pix2PixHDModel' object has no attribute 'optimizer_G'

Unsatisfactory results of a pretrained model

I thought that using your pretrained models meant I could already use them for inference. However, I checked the clothes warping module alone and it produces results with many artifacts. Is this normal?
image

Training options

Could you please share your training scripts? There are so many options in train_options.py and base_options.py, and I am not sure if the default settings will give the best results. Thank you very much.

how to run train.py

Hi there!

Thanks for providing this awesome repository and making it open source. I am facing some problems while running train.py.

Here is the error I am getting :
AssertionError: /dockerdata/benchmark_datasets/try_on_training/train_label is not a valid directory
Screenshot from 2020-06-05 22-00-48

How to understand pose.json?

I am a student and I am doing some research in computer vision. Your paper and code are of great help! However, I do not understand where the data with poses in JSON came from. What software did you use? I want to use it in my project.

training dataset

Hi~ I'm trying to reproduce your work on our dataset, so I'd appreciate it if you could tell me the quantity and format of your training dataset. Thank you~

Question about SSIM evaluation, training epochs, and the input of the content fusion module

Thanks for this great work, however, I still have some questions about this work.

  1. How should I evaluate SSIM? I directly calculate the SSIM score on the image reconstruction task (reference and target are from the same image). However, the pretrained model and my own trained model (20 epochs) get 0.7980 and 0.7594 on the test set.

  2. Does this model only need 20 epochs of training? In the default options, the model would be trained for 200 epochs. I found that the SSIM score is still increasing after 20 epochs and reached 0.7653 at epoch 40.

  3. In the released training and inference code, the average skin color (skin_color) of each class area is used in the input of the content fusion module instead of the synthesized clothing mask (M_c^S) mentioned in the paper. 🤔

    G_in=torch.cat([img_hole_hand,masked_label,real_image*clothes_mask,skin_color,self.gen_noise(shape)],1)

Can I use 17 keypoints?

I have a segmentation model and a keypoints model trained with 17 keypoints.
Your model seems to need 18 keypoints with_center.
Can I use your model with 17 keypoints?

Parsing label mapping from VITON to ACGPN

Hello, thank you very much for your great work! The pipeline is deeply thought out and smartly designed. It would be really great to know the full label mapping between the original VITON label maps and your generated new maps. Also, the segmentation map you provided in the README has some number gaps. Can you please tell me the reason? Thank you.

Inconsistency between the inference code and training code

According to the paper, one of the inputs to the second generator is supposed to be the output of the first generator, which is dis_label in the code. But in the training code, instead of feeding the dis_label information to the generator, masked_label is fed, which comes from the training data segmentation, not the generated one. In the inference code, however, the generated variable is fed to the second generator.
The same thing is found for the inputs to the third generator.
Could you please explain these inconsistencies between the training and inference code?

What do the 14 class labels mean?

In VITON and CP-VTON, both use LIP_JPPNet to get the person parsing labels, which have 20 classes. But in your code, the person parsing label only has 14 classes; what do the 14 class labels mean?
If you map the 20 classes to 14 classes, could you please provide the mapping dict? Thank you.

test.py

Thanks for providing open source code. Can you provide test.py?

Label issue in customised training

When I trained the network with the LIP dataset (20-label segmentation input), this error was raised:
image
Based on NVlabs/SPADE#57 it should be some channel problem, but I believe I changed the relevant channels and total channel numbers except the noise channel. Since the LIP dataset doesn't have a counterpart, I used the dress channel and it seems not to work properly. Any idea how to fix this issue?

person-keypoints format

Thanks for your work. I still have a question: your person-keypoints have a length of 54, not equal to the COCO format whose length is 51. Can you tell me the reason for this gap?

How to run inference on other images?

There is an inference model, but I don't understand the inputs: label, label_ref, image_ref. Can you tell me what they mean? Inference should only need model pictures and clothes as input to predict successfully, but there are 11 parameters in the train model.

Training and Testing Data

I realize that the data needs to be preprocessed and put in the Data_preprocessing folder, but where does the data come from? It would be greatly appreciated if someone could write brief instructions on how to train the model on the given data or custom data. Thanks.

UserWarning: Using a target size that is different to the input size

When running training, I only see the PyTorch warning log about the input image size. I also use the VITON dataset.
Is this the problem that causes the output quality error shown below?
tho

/usr/local/lib/python3.6/dist-packages/torchvision/transforms/transforms.py:211: UserWarning: The use of the transforms.Scale transform is deprecated, please use transforms.Resize instead.
"please use transforms.Resize instead.")
/usr/local/lib/python3.6/dist-packages/torchvision/transforms/transforms.py:211: UserWarning: The use of the transforms.Scale transform is deprecated, please use transforms.Resize instead.
"please use transforms.Resize instead.")
/usr/local/lib/python3.6/dist-packages/torch/nn/_reduction.py:43: UserWarning: size_average and reduce args will be deprecated, please use reduction='mean' instead.
warnings.warn(warning.format(ret))
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py:1558: UserWarning: nn.functional.tanh is deprecated. Use torch.tanh instead.
warnings.warn("nn.functional.tanh is deprecated. Use torch.tanh instead.")
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py:3226: UserWarning: Default grid_sample and affine_grid behavior has changed to align_corners=False since 1.3.0. Please specify align_corners=True if the old behavior is desired. See the documentation of grid_sample for details.
warnings.warn("Default grid_sample and affine_grid behavior has changed "
/pytorch/torch/csrc/utils/python_arg_parser.cpp:756: UserWarning: This overload of nonzero is deprecated:
nonzero(Tensor input, *, Tensor out)
Consider using one of the following signatures instead:
nonzero(Tensor input, *, bool as_tuple)
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/loss.py:88: UserWarning: Using a target size (torch.Size([2, 1, 256, 192])) that is different to the input size (torch.Size([2, 256, 192])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
return F.l1_loss(input, target, reduction=self.reduction)

Runtime environment is:

Cuda compilation tools, release 10.1, V10.1.243
torch.version: 1.5.1+cu101

question about train_pose key_points annotation

Hello, thanks for the repo and the datasets. I tried to show the key points on the image with the same prefix but found that the points are out of order. Could you tell me the correct order of the annotated key points in train_pose?

About the effect of G1

Hi! Thanks for your great work.
I am confused about the effect of G1.
Why is it necessary to train a G1 to output M_w^S?
Can M_w^S be obtained directly from a human parser?

No testing / inference.py file

Can you please provide a testing/inference file for us?
It seems that test.py is basically a train.py.
When I try to run test.py, training gets started.

Preprocess data?

Very nice work! Thanks for sharing. Could you please explain how you pre-process data (e.g. how you derived the data_colormask, data_mask, data_edge, data_pose etc.)? Many thanks!

Problem while testing.

Hey, can you help me with this? I have used a parsing model and OpenPose; what editing should I do? I am using the normal test.py file given by you. I have added 14 images in test_img.... Same for test_pose and test_label. Added test_colormask and test_mask as is, and 1 image in test_color and test_edge.
IMG_20200809_023715

Edges appear in the predicted semantic segmentation

I am trying to regenerate the test result. I have tested that my pose model is working fine.
I have tested human parsing with both the CIHP_PGN model and another model trained on the LIP dataset.
Below is the result I am getting, and I think the edges in the segmentation are producing the noise in the final output.
Please help me with this.
resized

Strange points appear when re-training on new data

Hi, I intend to add a new image to the dataset and train it again, but after 20 epochs, the test image has a strange color on the neck:
tinh 20 epoch 3
linh 20 epoch 1
I use Google Colab for training with PyTorch 1.15.
Do you know this error?
And can you give more information about the Python environment that reproduces the same results as in the article?

confusion about the target label of G1

Screenshot 2020-05-08 15 01 51

The order of the images above is real_image, clothes, pre_clothes_mask, mask_clothes, all_clothes_label, target_label_in_CELoss, inference_result_of_G1. I wonder why the inference could learn the shape of the try-on clothes and show it on the label while the target label keeps the old clothes label?
