lucassheng / avatar-net

Avatar-Net: Multi-scale Zero-shot Style Transfer by Feature Decoration

Home Page: https://lucassheng.github.io/avatar-net/

Python 98.67% Shell 1.33%
style-transfer zero-shot

avatar-net's Introduction

Avatar-Net: Multi-scale Zero-shot Style Transfer by Feature Decoration

This repository contains the code (in TensorFlow) for the paper:

Avatar-Net: Multi-scale Zero-shot Style Transfer by Feature Decoration
Lu Sheng, Ziyi Lin, Jing Shao, Xiaogang Wang
CVPR 2018

Overview

In this repository, we propose an efficient and effective Avatar-Net that enables visually plausible multi-scale transfer of arbitrary styles in real time. The key ingredient is a style decorator that recomposes the content features from semantically aligned style features, which not only holistically matches their feature distributions but also preserves detailed style patterns in the decorated features. By embedding this module into an image reconstruction network that fuses multi-scale style abstractions, Avatar-Net renders multi-scale stylization for any style image in one feed-forward pass.
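
The toy NumPy sketch below illustrates the style decorator idea only and is not the released TensorFlow implementation: it projects content and style features into a normalized domain (an AdaIN-style projection is assumed here; ZCA is the other option listed in the configuration), swaps each content position for the style patch that maximizes the normalized cross-correlation, and re-applies the style statistics. The function names, shapes and the centre-pixel reassembly are simplifying assumptions.

import numpy as np

def adain_normalize(feat):
    """Zero-mean, unit-variance normalization per channel (AdaIN-style projection)."""
    mean = feat.mean(axis=(0, 1), keepdims=True)
    std = feat.std(axis=(0, 1), keepdims=True) + 1e-8
    return (feat - mean) / std, mean, std

def style_decorate(content, style, patch_size=5):
    """Toy style decorator: match normalized content patches to normalized style
    patches by normalized cross-correlation, then re-apply the style statistics."""
    c, _, _ = adain_normalize(content)         # H x W x C content features
    s, s_mean, s_std = adain_normalize(style)  # H x W x C style features
    r = patch_size // 2
    # Collect normalized style patches as matching templates.
    coords, templates = [], []
    for i in range(r, s.shape[0] - r):
        for j in range(r, s.shape[1] - r):
            p = s[i - r:i + r + 1, j - r:j + r + 1].ravel()
            coords.append((i, j))
            templates.append(p / (np.linalg.norm(p) + 1e-8))
    templates = np.stack(templates)            # N x (patch_size^2 * C)
    out = c.copy()
    for i in range(r, c.shape[0] - r):
        for j in range(r, c.shape[1] - r):
            q = c[i - r:i + r + 1, j - r:j + r + 1].ravel()
            q = q / (np.linalg.norm(q) + 1e-8)
            # Normalized cross-correlation: inner product of unit-norm patches.
            bi, bj = coords[int(np.argmax(templates @ q))]
            out[i, j] = s[bi, bj]              # take the centre of the best style patch
    # Reconstruct in the style feature domain (the real decorator reassembles
    # whole patches and averages their overlaps).
    return out * s_std + s_mean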

[Figure: teaser]

Examples

[Figure: example image results]

Comparison with Prior Arts

  • The result by Avatar-Net exhibits concrete multi-scale style patterns (e.g. the color distribution, brush strokes and circular patterns in the candy image).
  • WCT distorts the brush strokes and circular patterns. AdaIN cannot even keep the color distribution, while Style-Swap fails in this example.

Execution Efficiency

Method        | Gatys et al. | AdaIN | WCT  | Style-Swap | Avatar-Net
256x256 (sec) | 12.18        | 0.053 | 0.62 | 0.064      | 0.071
512x512 (sec) | 43.25        | 0.11  | 0.93 | 0.23       | 0.28
  • Avatar-Net has an execution time comparable to AdaIN and GPU-accelerated Style-Swap, and is much faster than WCT and the optimization-based style transfer of Gatys et al.
  • The reference methods and the proposed Avatar-Net are implemented on the same TensorFlow platform with the same VGG network as the backbone.

Dependencies

Download

  • The trained model of Avatar-Net can be downloaded from Google Drive.
  • Training the style transfer network requires pretrained VGG networks, which can be obtained from the TF-Slim model repository. The encoding layers of Avatar-Net are also borrowed from the pretrained VGG models.
  • The MSCOCO dataset is used to train the proposed image reconstruction network.

Usage

Basic Usage

Simply use the bash script ./scripts/evaluate_style_transfer.sh to apply Avatar-Net to all content images in CONTENT_DIR with every style image in STYLE_DIR; a concrete invocation is given after the argument descriptions below. The general form is

bash ./scripts/evaluate_style_transfer.sh gpu_id CONTENT_DIR STYLE_DIR EVAL_DIR 
  • gpu_id: the mounted GPU ID for the TensorFlow session.
  • CONTENT_DIR: the directory of the content images. It can be ./data/contents/images for multiple exemplar content images, or ./data/contents/sequences for an exemplar content video.
  • STYLE_DIR: the directory of the style images. It can be ./data/styles for multiple exemplar style images.
  • EVAL_DIR: the output directory. It contains multiple subdirectories named after the names of the style images.
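
For instance, using the exemplar data shipped with the repository and GPU 0 (the output directory name below is arbitrary):

bash ./scripts/evaluate_style_transfer.sh 0 ./data/contents/images ./data/styles ./results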

More detailed evaluation options can be found in evaluate_style_transfer.py, which can also be invoked directly:

python evaluate_style_transfer.py

Configuration

The detailed configuration of Avatar-Net is listed in configs/AvatarNet.yml, including the training specifications and network hyper-parameters. The style decorator has three options:

  • patch_size: the patch size for the normalized cross-correlation; the default is 5.
  • style_coding: the projection and reconstruction method, either ZCA or AdaIN.
  • style_interp: the interpolation option between the transferred features and the content features, either normalized or biased.
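
As an illustration only (the exact key layout of the YAML file may differ from this assumption), the configuration can be inspected from Python:

import yaml  # requires PyYAML

# Load the Avatar-Net configuration and inspect the style decorator options
# described above; the key names are assumed to follow the README.
with open('configs/AvatarNet.yml') as f:
    config = yaml.safe_load(f)
print(config)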

The style transfer is actually performed in AvatarNet.transfer_styles(self, inputs, styles, inter_weight, intra_weights), in which

  • inputs: the content images.
  • styles: a list of style images (provide more than one for multiple-style interpolation).
  • inter_weight: the weight balancing the style and content images.
  • intra_weights: a list of weights balancing the effects from different styles.

Users may modify the evaluation script to perform multiple-style interpolation or a content-style trade-off, as in the sketch below.
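
A minimal sketch of such a call, assuming the AvatarNet model and the content/style image tensors have already been built as in evaluate_style_transfer.py (the variable names here are hypothetical):

# content_batch, style_a and style_b are image tensors prepared as in
# evaluate_style_transfer.py; model is the constructed AvatarNet instance.
stylized = model.transfer_styles(
    inputs=content_batch,
    styles=[style_a, style_b],    # two style images to interpolate between
    inter_weight=0.8,             # balance between the style and content images
    intra_weights=[0.5, 0.5])     # equal contribution from each style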

Training

  1. Download the MSCOCO dataset and convert the raw images into tfexamples with the Python script ./datasets/convert_mscoco_to_tfexamples.py.
  2. Use bash ./scripts/train_image_reconstruction.sh gpu_id DATASET_DIR MODEL_DIR to start training with the default hyper-parameters. gpu_id is the mounted GPU for the TensorFlow session; replace DATASET_DIR with the path to the MSCOCO training images and MODEL_DIR with the Avatar-Net model directory. An example invocation follows.
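
For example, with placeholder paths and GPU 0:

bash ./scripts/train_image_reconstruction.sh 0 /path/to/mscoco_tfexamples /path/to/avatar_net_model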

Citation

If you find this code useful for your research, please cite the paper:

Lu Sheng, Ziyi Lin, Jing Shao and Xiaogang Wang, "Avatar-Net: Multi-scale Zero-shot Style Transfer by Feature Decoration", in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018. [arXiv]

@inproceedings{sheng2018avatar,
    title = {Avatar-Net: Multi-scale Zero-shot Style Transfer by Feature Decoration},
    author = {Sheng, Lu and Lin, Ziyi and Shao, Jing and Wang, Xiaogang},
    booktitle = {Computer Vision and Pattern Recognition (CVPR), 2018 IEEE Conference on},
    pages = {1--9},
    year = {2018}
}

Acknowledgement

This project is inspired by many style-agnostic style transfer methods, including AdaIN, WCT and Style-Swap, from both their papers and their code.

Contact

If you have any questions or suggestions about this paper, please feel free to contact me ([email protected]).

avatar-net's People

Contributors

lucassheng


avatar-net's Issues

Help: running evaluate_style_transfer.sh fails

I have downloaded the trained model of Avatar-Net and then ran evaluate_style_transfer.sh, but it fails.
My TensorFlow version is 1.8.
Error log:
/root/anaconda3/envs/tf17_py36/lib/python3.6/site-packages/h5py/__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters
Finish loading the model [AvatarNet] configuration
Traceback (most recent call last):
File "evaluate_style_transfer.py", line 163, in
tf.app.run()
File "/root/anaconda3/envs/tf17_py36/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "evaluate_style_transfer.py", line 112, in main
inter_weight=FLAGS.inter_weight)
File "/ai/zhyx/docker/avatar-net-master/models/avatar_net.py", line 94, in transfer_styles
style, self.network_name)
File "/ai/zhyx/docker/avatar-net-master/models/losses.py", line 85, in extract_image_features
inputs, spatial_squeeze=False, is_training=False, reuse=reuse)
File "/ai/zhyx/docker/avatar-net-master/models/vgg.py", line 226, in vgg_19
net = slim.repeat(inputs, 2, slim.conv2d, 64, [3, 3], scope='conv1')
File "/root/anaconda3/envs/tf17_py36/lib/python3.6/site-packages/tensorflow/contrib/layers/python/layers/layers.py", line 2060, in repeat
outputs = layer(outputs, *args, **kwargs)
File "/root/anaconda3/envs/tf17_py36/lib/python3.6/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 181, in func_with_args
return func(*args, **current_args)
File "/root/anaconda3/envs/tf17_py36/lib/python3.6/site-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1027, in convolution
outputs = layer.apply(inputs)
File "/root/anaconda3/envs/tf17_py36/lib/python3.6/site-packages/tensorflow/python/layers/base.py", line 503, in apply
return self.call(inputs, *args, **kwargs)
File "/root/anaconda3/envs/tf17_py36/lib/python3.6/site-packages/tensorflow/python/layers/base.py", line 443, in call
self.build(input_shapes[0])
File "/root/anaconda3/envs/tf17_py36/lib/python3.6/site-packages/tensorflow/python/layers/convolutional.py", line 137, in build
dtype=self.dtype)
File "/root/anaconda3/envs/tf17_py36/lib/python3.6/site-packages/tensorflow/python/layers/base.py", line 383, in add_variable
trainable=trainable and self.trainable)
File "/root/anaconda3/envs/tf17_py36/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 1065, in get_variable
use_resource=use_resource, custom_getter=custom_getter)
File "/root/anaconda3/envs/tf17_py36/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 962, in get_variable
use_resource=use_resource, custom_getter=custom_getter)
File "/root/anaconda3/envs/tf17_py36/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 360, in get_variable
validate_shape=validate_shape, use_resource=use_resource)
File "/root/anaconda3/envs/tf17_py36/lib/python3.6/site-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1561, in layer_variable_getter
return _model_variable_getter(getter, *args, **kwargs)
File "/root/anaconda3/envs/tf17_py36/lib/python3.6/site-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1553, in _model_variable_getter
custom_getter=getter, use_resource=use_resource)
File "/root/anaconda3/envs/tf17_py36/lib/python3.6/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 181, in func_with_args
return func(*args, **current_args)
File "/root/anaconda3/envs/tf17_py36/lib/python3.6/site-packages/tensorflow/contrib/framework/python/ops/variables.py", line 261, in model_variable
use_resource=use_resource)
File "/root/anaconda3/envs/tf17_py36/lib/python3.6/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 181, in func_with_args
return func(*args, **current_args)
File "/root/anaconda3/envs/tf17_py36/lib/python3.6/site-packages/tensorflow/contrib/framework/python/ops/variables.py", line 216, in variable
use_resource=use_resource)
File "/root/anaconda3/envs/tf17_py36/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 352, in _true_getter
use_resource=use_resource)
File "/root/anaconda3/envs/tf17_py36/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 682, in _get_single_variable
"VarScope?" % name)
ValueError: Variable vgg_19/conv1/conv1_1/weights does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope?

TypeError: Expected binary or unicode string, got None

I tried to run this program, but it fails...

Finish loading the model [AvatarNet] configuration
Traceback (most recent call last):
File "evaluate_style_transfer.py", line 163, in
tf.app.run()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 44, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "evaluate_style_transfer.py", line 121, in main
checkpoint_dir, slim.get_model_variables(), ignore_missing_vars=True)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/framework/python/ops/variables.py", line 571, in assign_from_checkpoint_fn
reader = pywrap_tensorflow.NewCheckpointReader(model_path)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 110, in NewCheckpointReader
return CheckpointReader(compat.as_bytes(filepattern), status)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/util/compat.py", line 65, in as_bytes
(bytes_or_text,))
TypeError: Expected binary or unicode string, got None
Could you give me some suggestions?

Real image size and GPU test

Hmm, nice work - really.
But it's the first time I'm not sure about the real input size.

Am I right that the real image size that gets stylized in the network (the style part) is about 512, and that the inputs are just resized by bicubic layers before and after? Or am I wrong?
This is a quick question. I will test the model at 512 and 4k image sizes, and maybe I will see the answer.

Second question.
I froze the graph, and when I try to run inference on the GPU my kernel dies, but CPU inference works.
Installing an old TF is not a good option, because TF needs the CUDA DLLs and I have the newest CUDA and NVIDIA drivers.
(Inference is run in TensorFlow 2.x with an interactive session.)

Question about your paper

I have some questions about Avatar-Net.

  1. Training of the decoder.
  • In the paper, the decoder is trained only on the MS-COCO dataset for reconstruction, without the feature transfer module.
  • Why did you train on the MS-COCO dataset only?
  • For example, AdaIN is trained on MS-COCO data plus the WikiArt dataset, with the AdaIN module in place.
  • Is there any reason to train MS-COCO reconstruction without the transfer module?
    • Is the transfer module too slow to train with?
    • Can the transfer module not back-propagate?
  2. Future direction.
  • The future direction mentions replacing the style decorator with learnable modules for increased alignment.
  • What is the effect of increased alignment on the output image?

vgg_19 weights never loaded?

I tried to run evaluate_style_transfer.py but it fails.

Error msg:
style_image_features = losses.extract_image_features(style, self.network_name) in avatar_net.py

ValueError: Variable vgg_19/conv1/conv1_1/weights does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=tf.AUTO_REUSE in VarScope?
We've got an error while stopping in post-mortem: <class 'KeyboardInterrupt'>

I have downloaded "model.ckpt-120000" from your Google Drive, together with vgg_19.ckpt,
and set "checkpoint_path" in AvatarNet_config.yml to the path of vgg_19.ckpt,

but you never seem to use checkpoint_path anywhere in this project...

Any suggestions?

Getting this to work on Windows

Thanks for sharing this. I wanted to try running it on local GPU on Windows. Was able to get it to work with several tweaks. Posting in case anyone else wants to try.

windows fork: https://github.com/noido/avatar-net

edit details: https://github.com/noido/avatar-net/blob/master/readme_windows_tweaks.txt

The most notable obstacle was that the pretrained model download (Google Drive) linked in the repo was missing a checkpoint file to specify model_checkpoint_path. That caused a tensorflow function to return None instead of the correct model path, which caused a cascade of wonderful error messages down the line.
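
For reference, a minimal Python sketch of regenerating that missing index file (the model directory and checkpoint name below are placeholders, following the checkpoint mentioned in the other issues):

import tensorflow as tf

# Write a 'checkpoint' index file so that TensorFlow can resolve the latest
# checkpoint path; point the directory and name at the downloaded model.
tf.train.update_checkpoint_state('/path/to/avatar_net_model', 'model.ckpt-120000')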

How to do interpolation b/w different styles?

Hi, I was looking into the evaluation script you provided. You mention in the README that, in AvatarNet.transfer_styles(self, inputs, styles, inter_weight, intra_weights), the styles argument can take a list of style images. However, it is instantiated in the model as a placeholder in https://github.com/LucasSheng/avatar-net/blob/master/evaluate_style_transfer.py#L95.

Since the code is written to do style transfer with only one style image, it works in that case. However, when I pass multiple images it fails with a lot of issues, even though the code takes care of turning the style images into a list.

Is there any usable code for interpolation/mixing multiple styles?

Models

Hi guys,
Just wondering where I should put the checkpoint files of your model. Also, where should the other models from TF-Slim go?
Cheers

Measuring time in style transfer

Hi ,

I was wondering how you measure the total time for style transfer. I tried running it on a 512x512 image and it gives an execution time of 2.1 sec, instead of the 0.28 sec reported in the paper.

Can this model be applied to discrete time sequences?

For speech audio signals, voice conversion is becoming more and more popular. I wonder if zero-shot style transfer learning can be applied to voice conversion, for example from a source speaker's voice (sv) to a target speaker's voice (tv): extract the style (like prosody, stress, accent and so on) of sv and the content (timbre and characteristics) of tv, and mix the style and content.
I am really looking forward to your reply, thank you.

Is it a bug?


You do not use the compute_style_features function; this function computes the Gram matrix.
