Abstract
- Current unsupervised/unpaired image-to-image translation (UIT) methods (see ref) typically require many images in both the source and target classes, which greatly limits their use.
- This paper proposes a novel framework that needs only a few examples of the target class (few-shot) and works on target classes unseen during training.
- The proposed framework can also be applied to few-shot image classification, where it outperforms a SoTA method based on feature hallucination.
Method Overview
- Motivation: Humans can imagine unseen target classes (e.g., seeing a standing tiger for the first time and imagining it lying down) by drawing on past visual experience (e.g., having seen other animals standing and lying down).
- Past visual experience: Learn on images of many different classes.
- Imagine unseen classes: Translate images from source class to target class with few examples of target class.
- Data: Source class images: many source classes, each containing many images (e.g., different species of animals).
- Training: Use source class images to train a multi-class UIT model (during training, the target class is also drawn from the source classes).
- Inference: A few images of the target class (seen or unseen) become accessible only at inference time.
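The training setup above can be sketched as a sampling routine. This is only an illustration: `sample_training_episode`, `toy_dataset`, and the choice to draw the target class as a different source class are assumptions made for the example, not details taken from the paper.

```python
import random

def sample_training_episode(dataset, k, rng):
    """Draw one training episode from the source classes only.

    dataset: dict mapping a source class name to a list of image ids.
    Returns (content_image, content_class, class_images, target_class).
    """
    # Assumption: content and target classes are two distinct source classes.
    content_class, target_class = rng.sample(sorted(dataset), 2)
    content_image = rng.choice(dataset[content_class])
    # K-shot "class images" that stand in for the target class during training.
    class_images = rng.sample(dataset[target_class], k)
    return content_image, content_class, class_images, target_class

# Hypothetical toy data: three source classes with a few image ids each.
toy_dataset = {
    "husky": ["husky_0", "husky_1", "husky_2"],
    "tabby": ["tabby_0", "tabby_1", "tabby_2"],
    "lion": ["lion_0", "lion_1", "lion_2"],
}

episode = sample_training_episode(toy_dataset, k=2, rng=random.Random(0))
print(episode)
```

At inference time the same interface would be used, except that the K class images come from a class that may never have appeared in `dataset`.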
Model
- x¯ = G(x, {y_1, ..., y_K}): A conditional few-shot image generator (translator) takes a content image x and 1-way (class) K-shot images {y_1, ... y_K} as input and generates the output image x¯.
- z_x = E_x(x): A content encoder maps content image x to content latent code z_x.
- z_y = E_y({y_1, ..., y_K}): A class (style) encoder maps {y_1, ... y_K} to latent vectors individually and averages them into a class latent code z_y.
- x¯ = F_x(z_x, z_y): A decoder consisting of several adaptive instance normalization (AdaIN) residual blocks followed by several upscale conv layers.
- By feeding z_y to the decoder via the AdaIN layers, we let the class images control the global look (style), while maintaining the local structure (content).
- The generalization capability depends on the number of source classes during training (more is better).
- D: A multi-task adversarial discriminator.
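Two of the pieces above, averaging the per-image class codes into z_y and applying AdaIN, can be sketched in plain Python. This is a minimal sketch, not the paper's implementation: the helper names are hypothetical, and the AdaIN scale `gamma` and shift `beta` are passed in directly here, whereas in the model they would be computed from z_y.

```python
import math

def average_class_code(codes):
    """Element-wise mean of the K per-image latent vectors, giving z_y."""
    k = len(codes)
    return [sum(values) / k for values in zip(*codes)]

def adain(channel, gamma, beta, eps=1e-5):
    """Adaptive instance normalization of a single feature channel.

    The channel is normalized to zero mean and unit variance (this removes
    the content image's own feature statistics), then rescaled by gamma and
    shifted by beta, which carry the class (style) information.
    """
    n = len(channel)
    mean = sum(channel) / n
    var = sum((v - mean) ** 2 for v in channel) / n
    std = math.sqrt(var + eps)
    return [gamma * (v - mean) / std + beta for v in channel]

# K = 2 class images, each encoded to a 2-dimensional latent vector.
z_y = average_class_code([[1.0, 2.0], [3.0, 4.0]])
print(z_y)

# One flattened feature channel from the content branch.
styled = adain([1.0, 2.0, 3.0, 4.0], gamma=2.0, beta=0.5)
print(styled)
```

Because the normalization is per channel and the relative spatial pattern within the channel is preserved, the class code can change the global statistics of the features while the layout coming from the content image stays in place.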
Training
- |S|: Number of source classes.
- For D, each task is to determine whether an input image is a real or a fake image of one source class. As there are |S| source classes, D has |S| binary outputs.
- Given a real image x of source class c_x, penalize D if its c_x-th output is fake. D is not penalized for outputting fake for the other (|S|-1) source classes.
- Given a fake image x¯ of source class c_x, penalize D if its c_x-th output is real. Otherwise, penalize G.
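The penalization rules above can be sketched as per-class binary losses. The binary cross-entropy form, the non-saturating generator loss, and all function names are assumptions chosen for illustration. The point of the sketch is only that a single one of the |S| outputs enters each loss.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def d_loss_real(logits, c):
    """D sees a real image of class c: only output c is pushed toward 'real'."""
    return -math.log(sigmoid(logits[c]))

def d_loss_fake(logits, c):
    """D sees a fake image of class c: only output c is pushed toward 'fake'."""
    return -math.log(1.0 - sigmoid(logits[c]))

def g_loss(logits, c):
    """G is penalized when output c of D calls its image fake."""
    return -math.log(sigmoid(logits[c]))

# |S| = 3 source classes; logits are D's raw outputs for one image.
logits = [0.0, 3.0, -3.0]
print(d_loss_real(logits, 0))

# Changing the outputs for the other classes leaves the loss unchanged.
print(d_loss_real([0.0, -1.0, 2.0], 0))
```

Both calls print the same value, since the outputs for the other (|S|-1) classes are ignored.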
Losses
- Overall loss
- GAN loss: As described above.
- Reconstruction loss: encourages the output to keep the content of the source class image.
- Feature matching loss: encourages the output to have a style similar to the images of the target class.
- D_f is the feature extractor of the discriminator D without the last layer.
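A rough sketch of the two auxiliary losses follows, with images and D_f features represented as flat lists of floats. The L1 distance, the averaging of D_f features over the K class images, and the function names are assumptions made for this example. The notes above do not specify them.

```python
def l1_mean(a, b):
    """Mean absolute difference between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def reconstruction_loss(source_image, reconstructed_image):
    """Distance between a source image and the generator's reconstruction."""
    return l1_mean(source_image, reconstructed_image)

def feature_matching_loss(fake_features, class_features):
    """Distance between D_f features of the output and the mean D_f
    features of the K target class images."""
    k = len(class_features)
    mean_features = [sum(values) / k for values in zip(*class_features)]
    return l1_mean(fake_features, mean_features)

# Hypothetical D_f features for K = 2 class images and one translated output.
class_feats = [[1.0, 2.0], [3.0, 4.0]]
print(feature_matching_loss([2.0, 3.0], class_feats))
print(feature_matching_loss([0.0, 0.0], class_feats))
print(reconstruction_loss([0.5, 0.5], [0.5, 0.5]))
```

The first call returns zero because the output features equal the mean of the class features. The second shows the loss growing as the output features move away from that mean.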
UIT Methods with Different Constraints (enforce the translation to preserve certain properties)
- Pixel values
- Pixel gradients
- Semantic features
- Unsupervised cross-domain image generation. ICLR 2017.
- Class labels
- Pairwise sample distances
- One-sided unsupervised domain mapping. NIPS 2017.
- Cycle consistency
- Dualgan: Unsupervised dual learning for image-to-image translation. ICCV 2017.
- Unpaired image-to-image translation using cycle-consistent adversarial networks. ICCV 2017.
- Learning to discover cross-domain relations with generative adversarial networks. ICML 2017.
- Augmented cyclegan: Learning many-to-many mappings from unpaired data. ICML 2018.
- Toward multimodal image-to-image translation. NIPS 2017.
- Shared latent space assumption
- Coupled generative adversarial networks. NIPS 2016.
- Unsupervised image-to-image translation networks. NIPS 2017.
- Partially-shared latent space assumption
- Multimodal unsupervised image-to-image translation (MUNIT). ECCV 2018.
- Diverse image-to-image translation via disentangled representation. ECCV 2018.
- This work.
Related work
- One-shot unsupervised cross domain translation. NIPS 2018: Assumes one source class image but many target class images.
from papernotes.