
The implementation of the paper "Controllable Person Image Synthesis with Attribute-Decomposed GAN", CVPR 2020 (Oral); pose and appearance attribute transfer.

pose-transfer generative-adversarial-network gan pytorch virtual-try-on image-synthesis

adgan's Introduction

ADGAN

PyTorch implementation for controllable person image synthesis.

ADGAN: Controllable Person Image Synthesis with Attribute-Decomposed GAN,
Yifang Men, Yiming Mao, Yuning Jiang, Wei-ying Ma, Zhouhui Lian
In: Proceedings of Computer Vision and Pattern Recognition (CVPR), 2020 (Oral). arXiv preprint arXiv:2003.12267

ADGAN++: Controllable Image Synthesis with Attribute-Decomposed GAN,
Guo Pu*, Yifang Men*, Yiming Mao, Yuning Jiang, Wei-ying Ma, Zhouhui Lian
In: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022. arXiv preprint (coming soon). Code

Updates

- (03/22/2022) ADGAN++, an extended version with improved methods and more applications, will be released soon.

Demo

Component Attribute Transfer

Pose Transfer

Requirements

  • python 3
  • pytorch (>= 1.0)
  • torchvision
  • numpy
  • scipy
  • scikit-image
  • pillow
  • pandas
  • tqdm
  • dominate

Getting Started

You can directly download our generated images (on DeepFashion) from Google Drive.

Installation

  • Clone this repo:
git clone https://github.com/menyifang/ADGAN.git
cd ADGAN

Data Preparation

We use the DeepFashion dataset and provide our dataset split files, extracted keypoint files, and extracted segmentation files for convenience.

The dataset structure is recommended as:

+--deepfashion
|   +--fashion_resize
|       +--train (files in 'train.lst')
|          +-- e.g. fashionMENDenimid0000008001_1front.jpg
|       +--test (files in 'test.lst')
|          +-- e.g. fashionMENDenimid0000056501_1front.jpg
|       +--trainK (keypoints of person images)
|          +-- e.g. fashionMENDenimid0000008001_1front.jpg.npy
|       +--testK
|          +-- e.g. fashionMENDenimid0000056501_1front.jpg.npy
|   +--semantic_merge
|   +--fashion-resize-pairs-train.csv
|   +--fashion-resize-pairs-test.csv
|   +--fashion-resize-annotation-pairs-train.csv
|   +--fashion-resize-annotation-pairs-test.csv
|   +--train.lst
|   +--test.lst
|   +--vgg19-dcbb9e9d.pth
|   +--vgg_conv.pth
...
  1. Person images
python tool/generate_fashion_datasets.py

Note: In our settings, we crop the DeepFashion images to a resolution of 176x256 in a center-crop manner.
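
For illustration, a minimal center-crop sketch (my own sketch, not the repo's tool/generate_fashion_datasets.py; it assumes the usual 256x256 DeepFashion source images):

from PIL import Image

def center_crop_176x256(path_in, path_out):
    # Keep the full 256 px height and crop the width to 176 px
    # around the horizontal center (assumed 256x256 source).
    img = Image.open(path_in)
    w, h = img.size
    left = (w - 176) // 2
    img.crop((left, 0, left + 176, 256)).save(path_out)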

  2. Keypoints files
  • Download train/test pairs and train/test keypoint annotations from Google Drive, including fashion-resize-pairs-train.csv, fashion-resize-pairs-test.csv, fashion-resize-annotation-train.csv, and fashion-resize-annotation-test.csv. Put these four files under the deepfashion directory.
  • Generate the pose heatmaps. Launch
python tool/generate_pose_map_fashion.py
  3. Segmentation files
  • Extract human segmentation results with an existing human parser (e.g. Look into Person) and merge them into 8 categories. Our segmentation results are provided in Google Drive, including 'semantic_merge2' and 'semantic_merge3' (merged in different manners). Put one of them under the deepfashion directory.

Optionally, you can also generate these files by yourself.

  1. Keypoints files

We use OpenPose to generate keypoints.

  • Download the pose estimator from Google Drive. Put it under the root folder ADGAN.
  • Change the paths input_folder and output_path in tool/compute_coordinates.py, and then launch
python2 compute_coordinates.py
  2. Dataset split files
python2 tool/create_pairs_dataset.py

Train a model

bash ./scripts/train.sh 

Test a model

Download our pretrained model from Google Drive. Modify your data path and launch

bash ./scripts/test.sh 

Evaluation

We adopt SSIM, IS, DS, and CX for evaluation. This part was contributed by Yiming Mao.

1) SSIM

For evaluation, TensorFlow 1.4.1 (Python 3) is required.

python tool/getMetrics_market.py
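
For a quick sanity check on a single real/generated pair, a minimal sketch with scikit-image (already in the requirements) can stand in; note this is not the repo's TensorFlow-based metric script:

from skimage.io import imread
from skimage.metrics import structural_similarity

def pair_ssim(path_real, path_fake):
    # Compare one ground-truth image with its generated counterpart.
    real = imread(path_real)
    fake = imread(path_fake)
    # channel_axis=-1 treats the last axis as RGB (skimage >= 0.19;
    # older versions use multichannel=True instead).
    return structural_similarity(real, fake, channel_axis=-1)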

2) DS Score

Download the SSD model pretrained on VOC (300x300) and install the proper Caffe version of SSD. Put it in the ssd_score folder.

python compute_ssd_score_fashion.py --input_dir path/to/generated/images

3) CX (Contextual Score)

Refer to the folder 'cx' to compute the contextual score.

Citation

If you use this code for your research, please cite our paper:

@inproceedings{men2020controllable,
  title={Controllable Person Image Synthesis with Attribute-Decomposed GAN},
  author={Men, Yifang and Mao, Yiming and Jiang, Yuning and Ma, Wei-Ying and Lian, Zhouhui},
  booktitle={Computer Vision and Pattern Recognition (CVPR), 2020 IEEE Conference on},
  year={2020}
}

@article{pu2022controllable,
  title={Controllable Image Synthesis with Attribute-Decomposed GAN},
  author={Pu, Guo and Men, Yifang and Mao, Yiming and Jiang, Yuning and Ma, Wei-Ying and Lian, Zhouhui},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)},
  year={2022}
}

Acknowledgments

Our code is based on PATN; thanks to its authors for their great work.

adgan's People

Contributors

menyifang, mtonym


adgan's Issues

Preparing images data

Hi
I'm having trouble preparing the images downloaded from DeepFashion so that they match the entries in train.lst and test.lst. The downloaded data contains images organised in a number of sub-folders, and even after flattening the directory the image names don't match the train.lst and test.lst values.
Also, the Google Drive link provided for generated images contains only testing images.
Any help regarding how to proceed further would be greatly appreciated!

Download error - In-shop Clothes Retrieval Benchmark

When I downloaded "In-shop Clothes Retrieval Benchmark", I got the following 9 error messages:

In-shop Clothes Retrieval Benchmark/README.txt

  • You do not have permission to download this document.
    In-shop Clothes Retrieval Benchmark/Img/img.zip
  • You do not have permission to download this document.
    In-shop Clothes Retrieval Benchmark/Anno/list_item_inshop.txt
  • You do not have permission to download this document.
    In-shop Clothes Retrieval Benchmark/Anno/list_description_inshop.json
  • You do not have permission to download this document.
    In-shop Clothes Retrieval Benchmark/Anno/list_landmarks_inshop.txt
  • You do not have permission to download this document.
    In-shop Clothes Retrieval Benchmark/Eval/list_eval_partition.txt
  • You do not have permission to download this document.
    In-shop Clothes Retrieval Benchmark/Anno/attributes/list_attr_cloth.txt
  • You do not have permission to download this document.
    In-shop Clothes Retrieval Benchmark/Anno/list_bbox_inshop.txt
  • You do not have permission to download this document.
    In-shop Clothes Retrieval Benchmark/Anno/attributes/list_attr_items.txt
  • You do not have permission to download this document.

Is that OK to skip these files?

Shape error when running test.sh

@menyifang I am trying to reproduce your work via the pre-trained model provided in your Google Drive link. After following all your steps, when I run bash ./scripts/test.sh I get the following error:

RuntimeError: The size of tensor a (256) must match the size of tensor b (176) at non-singleton dimension 3

This seems to arise from the following operation in the file model_adgen.py:
xi = x.mul(semi)

My arguments are the same as the ones mentioned in #6. What is the reason for this error? Please help.

Issue on compute_coordinates.py

I'm running compute_coordinates.py in order to recalculate keypoints, but when I run it I get the following warning (which I suppose indicates an issue with image sizes):

tensorflow:Model was constructed with shape Tensor("input_5:0", shape=(1, 368, 368, 3), dtype=float32) for input (1, 368, 368, 3), but it was re-called on a Tensor with incompatible shape (None, 184, 126, 3).

The script runs, but it creates a file with all keypoints set to -1.

What could be the issue?

License

Hello,

What license is the code released under? (Also the models?)

Pretrained model not generating proper images

Hi,

I'm trying to generate images with the pretrained model and the provided preprocessed dataset, but I'm only getting random pixels. I wonder if I'm missing something in my setup that isn't mentioned in the README file. I'd really appreciate your help!

Sample output:
fashionMENJackets_Vestsid0000724701_2side.jpg___fashionMENJackets_Vestsid0000724701_1front.jpg_vis

My test.sh:
python test.py \
  --dataroot deepfashion \
  --dirSem deepfashion \
  --pairLst deepfashion/fashion-resize-pairs-test.csv \
  --checkpoints_dir ./checkpoints \
  --results_dir ./results \
  --name fashion_AdaGen_sty512_nres8_lre3_SS_fc_vgg_cxloss_ss_merge3 \
  --model adgan \
  --phase test \
  --dataset_mode keypoint \
  --norm instance \
  --batchSize 1 \
  --resize_or_crop no \
  --gpu_ids 0 \
  --BP_input_nc 18 \
  --no_flip \
  --which_model_netG ADGen \
  --which_epoch 800

My folder structure:
ADGAN
├── checkpoints
│   ├── fashion_AdaGen_sty512_nres8_lre3_SS_fc_vgg_cxloss_ss_merge3
│   │   ├── 1000_net_netG.pth
│   │   ├── 800_net_netG.pth
│   │   ├── loss_log.txt
│   │   ├── opt.txt
├── cx
├── data
├── deepfashion
│   ├── fashion-resize-annotation-test.csv
│   ├── fashion-resize-annotation-train.csv
│   ├── fashion-resize-pairs-test.csv
│   ├── fashion-resize-pairs-train.csv
│   ├── resized
│   ├── semantic_merge2
│   ├── semantic_merge3
│   ├── test
│   ├── testK
│   ├── test.lst
│   ├── train
│   ├── trainK
│   ├── train.lst
│   ├── vgg19-dcbb9e9d.pth
│   └── vgg_conv.pth
├── gif
├── losses
├── models
├── options
├── README.md
├── scripts
├── ssd_score
├── test.py
├── tool
├── train.py
└── util

I also fixed a hardcoded path in model_adgen.py locally.

how to use model ?

Excuse me, I have finished training and testing, and it works well. I'm now trying to put certain clothes on a certain person; how can I use this model to put this idea into practice?

How long does training take?

Hi @menyifang ,

How long did it roughly take to train the pretrained model with 2 V100 GPUs? A few days or weeks? (I read your paper but it doesn't seem to be mentioned there.)

code

Looking forward to your code!

Inference on single image

@menyifang thanks for open-sourcing the code base. I have the following queries:

  1. How can I run inference on a single image without using the DeepFashion database?
  2. How are the results for images with a non-white background?
  3. Can we use a pose estimation model other than OpenPose to generate keypoints?

Thanks in advance.

Not able to reproduce the result

I am not able to reproduce the results using the pre-trained models.
fashionWOMENJackets_Coatsid0000417103_1front.jpg___fashionWOMENJackets_Coatsid0000417103_2side.jpg_vis

The above is the output I am getting. Do you know why this issue occurs?

Issue with the generate_pose_map_fashion.py script

Hi @menyifang ,

Thanks a lot for replying my previous issues!

In generate_pose_map_fashion.py, the np array used to store the heat map is of type 'uint8', which drops any value less than 1:

def cords_to_map(cords, img_size, sigma=6):
    result = np.zeros(img_size + cords.shape[0:1], dtype='uint8')

But the Gaussian term always returns values below 1 except for exact pixel matches, which means that most of the values are dropped:

result[..., i] = np.exp(-((yy - point[0]) ** 2 + (xx - point[1]) ** 2) / (2 * sigma ** 2))

When I used the generate_pose_map_fashion.py script to generate the pose heatmaps, I found that only a few pixels have values, and these are exactly the keypoint pixels.
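
A corrected sketch with a float dtype would look like the following; this is reconstructed from the quoted lines, and the loop and the missing-keypoint marker (-1) are assumptions rather than the repo's exact code:

import numpy as np

def cords_to_map(cords, img_size, sigma=6):
    # float32 keeps the sub-1 Gaussian values that 'uint8' truncates to 0
    result = np.zeros(img_size + cords.shape[0:1], dtype='float32')
    for i, point in enumerate(cords):
        if point[0] == -1 or point[1] == -1:  # assumed missing-keypoint marker
            continue
        xx, yy = np.meshgrid(np.arange(img_size[1]), np.arange(img_size[0]))
        result[..., i] = np.exp(-((yy - point[0]) ** 2 + (xx - point[1]) ** 2)
                                / (2 * sigma ** 2))
    return result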

Run time error during test

I tested with bash ./scripts/test.sh using the pre-trained 800-netG model.

data is arranged as follows:

+--deepfashion
|   +--fashion_resize
|       +--train (files in 'train.lst')
|          +-- e.g. fashionMENDenimid0000008001_1front.jpg
|       +--test (files in 'test.lst')
|          +-- e.g. fashionMENDenimid0000056501_1front.jpg
|       +--trainK (keypoints of person images)
|          +-- e.g. fashionMENDenimid0000008001_1front.jpg.npy
|       +--testK
|          +-- e.g. fashionMENDenimid0000056501_1front.jpg.npy
|   +--semantic_merge
|   +--fashion-resize-pairs-train.csv
|   +--fashion-resize-pairs-test.csv
|   +--fashion-resize-annotation-pairs-train.csv
|   +--fashion-resize-annotation-pairs-test.csv
|   +--train.lst
|   +--test.lst
|   +--vgg19-dcbb9e9d.pth
|   +--vgg_conv.pth
...

code reference

https://github.com/menyifang/ADGAN/blob/c76647172e923573b4012b6c17a1b3938155aedd/data/keypoint.py#L52:L88

I got the following runtime error:

/ADGAN/data/keypoint.py", line 80, in __getitem__
    BP1 = BP1.transpose(2, 0)  # c,w,h
IndexError: Dimension out of range (expected to be in range of [-2, 1], but got 2)

Debug output:

>>>BP1_img.shape
(256, 176)

Any suggestions on how to solve this?
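
As a hedged diagnostic (my own sketch, not the repo's code): line 80 of data/keypoint.py assumes a 3-D (H, W, C) heatmap, so a (256, 176) array suggests the .npy file holds a 2-D map rather than the expected per-channel heatmaps:

import numpy as np
import torch

# Hypothetical check on one keypoint file (the path is an example).
BP1_img = np.load('deepfashion/testK/fashionMENDenimid0000056501_1front.jpg.npy')
BP1 = torch.from_numpy(BP1_img).float()
# keypoint.py line 80 expects (H, W, C), e.g. (256, 176, 18):
assert BP1.dim() == 3, "expected a 3-D heatmap, got shape %s" % (tuple(BP1.shape),)
BP1 = BP1.transpose(2, 0)  # -> (C, W, H)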

About the perceptual loss

In your paper, the perceptual loss was:

# vggsubmod refers to certain layer of VGG.
Lper = L1(gram(vggsubmod(x)), gram(vggsubmod(y)))

However in your implementation:

fake_p2_norm = self.vgg_submodel(fake_p2_norm)
input_p2_norm = self.vgg_submodel(input_p2_norm)
input_p2_norm_no_grad = input_p2_norm.detach()
if self.percep_is_l1 == 1:
    # use l1 for perceptual loss
    loss_perceptual = F.l1_loss(fake_p2_norm, input_p2_norm_no_grad) * self.lambda_perceptual
else:
    # use l2 for perceptual loss
    loss_perceptual = F.mse_loss(fake_p2_norm, input_p2_norm_no_grad) * self.lambda_perceptual

Could you give some explanation of that? Thanks.
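
For reference, a Gram-matrix perceptual (style) loss along the lines the paper describes might look like the sketch below; this is an illustrative reconstruction, not the repo's code, and fake_feat/real_feat stand for outputs of the chosen VGG sub-module:

import torch
import torch.nn.functional as F

def gram(feat):
    # feat: (b, c, h, w) feature map from a VGG layer
    b, c, h, w = feat.size()
    f = feat.view(b, c, h * w)
    # (b, c, c) Gram matrix, normalized by the feature size
    return torch.bmm(f, f.transpose(1, 2)) / (c * h * w)

def gram_l1_loss(fake_feat, real_feat):
    # L1 distance between Gram matrices, matching Lper above
    return F.l1_loss(gram(fake_feat), gram(real_feat).detach())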

Component Attribute Transfer

@menyifang

I was able to run test.py successfully, but I don't know how to perform Component Attribute Transfer.
I also don't know how to obtain keypoint_mix in the code.
I would appreciate it if you could tell me.

run bash ./script/train.sh

After preparing the environment, I ran the command bash ./script/train.sh and got an error like
"RuntimeError: The size of tensor a (750) must match the size of tensor b (176) at non-singleton dimension 3". Can you answer this question? Thank you very much!

Problem about contextual score

Hi @menyifang
Thanks for the great work. I want to know how to evaluate the contextual score. I ran contextual_similarity.py in the "cx" folder and modified the path to point to the results from test.py. I also made sure that the images returned by DualDataset are source images (ground truth) and generated images. However, the resulting contextual score does not seem to be the average over the data: line 180, cx += float(loss_layer(ref_single, fake_single)), sums up the contextual score of one batch and does not divide by the number of iterations (8570/48) later.

prepare data

Which code can be used to generate the train and test folders?

Training time

Any experience regarding the time (in days) required to train the network on the DeepFashion database would be appreciated.

[Q] Performance of

More of a question than an issue. Have you tested using attributes from images that do not belong to the dataset?
I tried using the picture below for the attributes:
bts_people_1

And the output I got was not proper.
bts_output

I am not sure why this is happening. Have you experimented with something like this?

Thanking you.

Regards,
K. J. Nitthilan

What's the meaning of the fc layer after the style code?

style = self.enc_style(img_B, sem_B)
style = self.fc(style.view(style.size(0), -1))
style = torch.unsqueeze(style, 2)
style = torch.unsqueeze(style, 3)

I noticed that you add a LinearBlock after obtaining the style code by encoding every part of an image and its segment.
It seems that this part is not mentioned in the paper.
Can you explain why you do that?

Reproduce using your pretrained model

I am trying to reproduce your work via the pre-trained model (provided in your Google link), but I seem to be missing something. I ran this:

bash ./scripts/test.sh 

but stuck here:

FileNotFoundError: [Errno 2] No such file or directory: 'your_path/deepfashion/semantic_merge3/000/01.jp/_/001..npy'

Can you provide the ***.npy files?

For detailed log error look here:
https://colab.research.google.com/drive/19OPWmpnwgXdQuV06N3xRmew5wP6DuU4K#scrollTo=xl8EZycCiwPJ&uniqifier=1

Maybe I'm heading in the wrong direction. Please advise clearly how to reproduce your work without training.

Dataset download locked by passwd

Hi, I am trying to download the data (so much data...!!). Anyway, it said the .ds_stre file is password protected and asked me for a password. Could you help me with this?

In fact, img_highres_seg-004 and img_highres-003 are passwd protected.

data loader error

When I run test.sh I get the following error:

File "/home/projects/ADGAN/test.py", line 44, in <module>
    print(len(dataset))
File "/home/projects/ADGAN/data/custom_dataset_data_loader.py", line 40, in __len__
    return min(len(self.dataset), self.opt.max_dataset_size)
TypeError: 'NoneType' object cannot be interpreted as an integer

What am I doing wrong?

Training does not converge - black image

After a certain number of epochs, I am getting a black image as output, as shown in the image below. The training seems to converge in the beginning; however, it fails after a few epochs.

The losses also explode at that point. Is this a known issue?

image

The performance gap on 8 GPUs

Hi @menyifang
Thanks for the great work. Recently, I trained the model on 8 GPUs. Unfortunately, the results seem worse compared with 2 GPUs. The performance at 1000 epochs can be seen below. I wonder if you have any suggestions or explanations for the performance gap between 8 GPUs and 2 GPUs?
epoch884_vis

component attribute transfer

How should I perform component attribute transfer? test.py only evaluates pose transfer. Could you please provide the code for component attribute transfer?

why batch norm here?

Hi, thanks for your great work first!

I'm trying to reproduce your code, but I cannot understand why this code uses F.batch_norm in AdaptiveInstanceNorm2d. Why not just F.instance_norm?

ADGAN/models/model_adgen.py

Lines 355 to 362 in 4dd7064

# Apply instance norm
x_reshaped = x.contiguous().view(1, b * c, *x.size()[2:])
out = F.batch_norm(
    x_reshaped, running_mean, running_var, self.weight, self.bias,
    True, self.momentum, self.eps)
return out.view(b, c, *x.size()[2:])
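
For what it's worth, the reshape makes the two equivalent: batch norm over a (1, b*c, h, w) tensor computes statistics per (sample, channel) pair, which is exactly instance normalization (the trick also lets the module track running statistics). A minimal check, my own sketch rather than anything from the repo:

import torch
import torch.nn.functional as F

x = torch.randn(4, 8, 16, 16)
b, c = x.size(0), x.size(1)

# Instance norm via the batch-norm reshape trick
x_reshaped = x.contiguous().view(1, b * c, *x.size()[2:])
out_bn = F.batch_norm(
    x_reshaped, None, None, None, None,
    True, 0.1, 1e-5).view(b, c, *x.size()[2:])

# Plain instance norm for comparison
out_in = F.instance_norm(x, eps=1e-5)
print(torch.allclose(out_bn, out_in, atol=1e-5))  # True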

Preparing the dataset

@menyifang
Hello, Thank you for providing the code for the amazing work.

I am trying to train the network end to end. As per the instructions, I downloaded the In-shop Clothes Retrieval benchmark from the DeepFashion dataset. The dataset is structured into MEN and WOMEN subdirectories, and I'm having trouble loading it and creating the test and train splits.

Can you please help me with it? Or can you provide a link to the test and train sets you've created?

Thank you

the number of learning iterations or epochs

In the description in the paper, the number of training iterations is 120K.
No other settings, such as the number of epochs or the mini-batch size, are mentioned.

On the other hand, the pretrained model says 1000 epochs.

What criteria can be used to reproduce the same environment as the pre-trained model published in the paper or the GitHub code?

return 4000

In addition, the paper states that 101,966 pairs were used as training data, but as shown above, the source code limits the dataset length to 4,000.
Was this done to limit the number of iterations when training for 1,000 epochs?

What is the mapping of the semantic map of person image to the merged K=8 attribute?

I am trying to map the segmentation mask output to the merged (K=8) indexes. The current input indexes I have are:

np.array(('Background',  # always index 0
          'Hat', 'Hair', 'Glove', 'Sunglasses',
          'UpperClothes', 'Dress', 'Coat', 'Socks',
          'Pants', 'Jumpsuits', 'Scarf', 'Skirt',
          'Face', 'Left-arm', 'Right-arm', 'Left-leg',
          'Right-leg', 'Left-shoe', 'Right-shoe'))

and the merged indexes are:
background, hair, face, upper clothes, pants, skirt, arm, and leg

Is there code you could share where this operation is performed?
I am trying to reuse the pre-trained model.
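
The exact merge table isn't documented here, but a hypothetical LIP(20)-to-merged(8) lookup might look like the sketch below; every assignment marked "guess" is my assumption, not the authors' mapping:

import numpy as np

# Hypothetical 20 -> 8 label lookup (0 background, 1 hair, 2 face,
# 3 upper clothes, 4 pants, 5 skirt, 6 arm, 7 leg).
LIP_TO_MERGED = np.array([
    0,  # Background   -> background
    2,  # Hat          -> face (guess)
    1,  # Hair         -> hair
    6,  # Glove        -> arm (guess)
    2,  # Sunglasses   -> face (guess)
    3,  # UpperClothes -> upper clothes
    3,  # Dress        -> upper clothes (guess)
    3,  # Coat         -> upper clothes (guess)
    7,  # Socks        -> leg (guess)
    4,  # Pants        -> pants
    3,  # Jumpsuits    -> upper clothes (guess)
    3,  # Scarf        -> upper clothes (guess)
    5,  # Skirt        -> skirt
    2,  # Face         -> face
    6,  # Left-arm     -> arm
    6,  # Right-arm    -> arm
    7,  # Left-leg     -> leg
    7,  # Right-leg    -> leg
    7,  # Left-shoe    -> leg (guess)
    7,  # Right-shoe   -> leg (guess)
])

def merge_parsing(mask20):
    # mask20: HxW integer array with LIP labels in [0, 19]
    return LIP_TO_MERGED[mask20]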

Code run time error during test

I ran bash ./scripts/test.sh to test your pretrained 800-netG model, but I got the following errors in 2 runs. How can I solve this?
image

image

Merging Segmentations

Could you kindly share the code used for merging the 20 different segments into K=8 segments?

Pretrained weights

Hi,

Thank you for publishing the code.
Tried to run a test with the pretrained checkpoints per your instructions.
These are the results that I'm getting
image

What am I doing wrong?
