Metadata Authors: Ming-Yu Liu, Xun Huang, +4 authors, Jan Ka

Abstract Current unsupervised/unpaired image-to-image transl

howardyclo commented on August 16, 2024

Few-Shot Unsupervised Image-to-Image Translation

from papernotes.

Abstract

Current unsupervised/unpaired image-to-image translation (UIT) methods (see ref) typically require many images in both the source and target classes, which greatly limits their use.
This paper proposes a novel framework that needs only a few examples of the target class (few-shot) and can work on unseen target classes.
The proposed framework can also be applied to few-shot image classification, where it outperforms a SoTA method based on feature hallucination.

Method Overview

Motivation: Humans can imagine unseen target classes (e.g., seeing a standing tiger for the first time and imagining it lying down) by drawing on past visual experience (e.g., having seen other animals standing and lying down).
- Past visual experience: Learn on images of many different classes.
- Imagine unseen classes: Translate images from source class to target class with few examples of target class.
Data: Images from many source classes, each containing many images (e.g., different animal species).
Training: Use the source class images to train a multi-class UIT model (during training, the target class is also drawn from the source classes).
Inference: A few images of the target class, which may be seen or unseen during training, become available only at inference time.
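The training setup above can be sketched as sampling one "episode" per step: a content image from one source class and K class images from another. This is only an illustration. The function name `sample_episode`, the dict-of-lists dataset layout, and the choice to always draw two different classes are assumptions, not details taken from these notes.

```python
import random

def sample_episode(images_by_class, k, rng):
    """Pick a content image and K class images from two different source classes."""
    # Assumption: content class and target class are distinct source classes.
    content_class, target_class = rng.sample(sorted(images_by_class), 2)
    content_image = rng.choice(images_by_class[content_class])
    class_images = rng.sample(images_by_class[target_class], k)
    return content_image, content_class, class_images, target_class

# Toy stand-in for a dataset: class name -> list of image identifiers.
toy = {c: [f"{c}_{i}" for i in range(10)] for c in ["husky", "tiger", "lion"]}
x, c_x, ys, c_y = sample_episode(toy, k=3, rng=random.Random(0))
```

At inference, the same call shape would apply, except that the K class images come from a class that may never have appeared in `images_by_class` during training.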

Model

x̄ = G(x, {y_1, ..., y_K}): A conditional few-shot image generator (translator) takes a content image x and K images {y_1, ..., y_K} of a single class (1-way, K-shot) as input and generates the output image x̄.
- z_x = E_x(x): A content encoder maps content image x to content latent code z_x.
- z_y = E_y({y_1, ..., y_K}): A class (style) encoder maps {y_1, ..., y_K} to latent vectors individually and averages them into a class latent code z_y.
- x̄ = F_x(z_x, z_y): A decoder consisting of several adaptive instance normalization (AdaIN) residual blocks followed by several upscale conv layers.
- By feeding z_y to the decoder via the AdaIN layers, we let the class images control the global look (style), while maintaining the local structure (content).
- The generalization capability depends on the number of source classes during training (more is better).
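A minimal sketch of two pieces described above: averaging per-image codes into a class code, and an AdaIN step whose per-channel scale and shift come from that code. The feature map is represented as a list of channels, each a flat list of spatial values. How the class code is turned into scales and shifts is not specified in these notes, so using it directly as the scale here is purely a hypothetical stand-in for a learned mapping.

```python
from statistics import fmean, pstdev

def class_code(per_image_codes):
    """Average K per-image latent vectors into a single class code."""
    return [fmean(dim) for dim in zip(*per_image_codes)]

def adain(channels, scales, shifts, eps=1e-5):
    """Per channel: normalize spatial values to zero mean and unit std,
    then apply that channel's scale and shift."""
    out = []
    for values, scale, shift in zip(channels, scales, shifts):
        mean, std = fmean(values), pstdev(values)
        out.append([scale * (v - mean) / (std + eps) + shift for v in values])
    return out

# Two class images with 2-d codes, averaged into one class code.
z_y = class_code([[1.0, 3.0], [3.0, 5.0]])
# Hypothetical mapping: use z_y directly as the scales of a 2-channel feature map.
content = [[0.0, 2.0, 4.0, 6.0], [1.0, 1.0, 3.0, 3.0]]
styled = adain(content, scales=z_y, shifts=[0.5, -0.5])
```

After AdaIN, each channel keeps the spatial pattern of the content features (which positions are high or low), while its mean and spread are set by the class-derived parameters. That is the sense in which the class images control the global look and the content image controls the local structure.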
D: A multi-task adversarial discriminator.

Training

|S|: Number of source classes.
For D, each task is a binary classification that determines whether an input image is a real image of a given source class or a fake (translated) one. As there are |S| source classes, D has |S| binary outputs.
Given a real image x of source class c_x, penalize D if its c_x-th output is fake. D is not penalized for outputting fake on the other (|S|-1) source classes.
Given a fake image x̄ of source class c_x, penalize D if its c_x-th output is real. When updating G, penalize G if the c_x-th output is fake.
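The rule above (only the output for the relevant class is scored, the other |S|-1 outputs are ignored) can be sketched as follows. Treating each output as a logit and scoring it with binary cross-entropy is an assumption made for illustration; the notes do not say which GAN objective is used. The names `d_loss` and `g_loss` are hypothetical.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def d_loss(logits, class_index, is_real):
    """Discriminator loss for one image: only the class_index-th output is scored."""
    p_real = sigmoid(logits[class_index])
    return -math.log(p_real) if is_real else -math.log(1.0 - p_real)

def g_loss(logits_for_fake, class_index):
    """Generator loss: G is penalized when D's class_index-th output says 'fake'."""
    return -math.log(sigmoid(logits_for_fake[class_index]))

# |S| = 3 source classes; the image belongs to class 2.
loss_a = d_loss([5.0, -2.0, 0.0], class_index=2, is_real=True)
loss_b = d_loss([-9.0, 7.0, 0.0], class_index=2, is_real=True)  # other outputs changed
```

`loss_a` and `loss_b` are equal because the outputs for the other classes never enter the loss, which mirrors the "no penalization for other source classes" rule.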

Losses

Overall loss

GAN loss: As described above.

Reconstruction loss: Encourages the output to keep the content of the source class image.

Feature matching loss: Encourages the style of the output to be similar to the target class images.
D_f is the feature extractor of the discriminator D without the last layer.
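The loss formulas did not survive in these notes, so the following is only a rough sketch of how the two auxiliary terms and their combination could look. Images and D_f features are represented as flat lists of floats. The use of a mean absolute (L1) distance, comparing against the average feature of the K class images, the weight names `w_rec` and `w_fm`, and their default values are all assumptions, as is however the reconstructed image `x_rec` is produced.

```python
def l1_mean(a, b):
    """Mean absolute difference between two equal-length vectors."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)

def reconstruction_loss(x, x_rec):
    """Distance between the content image and its reconstruction."""
    return l1_mean(x, x_rec)

def feature_matching_loss(feat_translated, feats_class_images):
    """Distance between D_f of the translated image and the mean D_f of the K class images."""
    k = len(feats_class_images)
    mean_feat = [sum(col) / k for col in zip(*feats_class_images)]
    return l1_mean(feat_translated, mean_feat)

def generator_objective(gan, rec, fm, w_rec=1.0, w_fm=1.0):
    """Weighted sum of the three terms (weights are placeholder values)."""
    return gan + w_rec * rec + w_fm * fm

rec = reconstruction_loss([0.0, 1.0], [1.0, 1.0])
fm = feature_matching_loss([2.0, 2.0], [[1.0, 3.0], [3.0, 1.0]])
total = generator_objective(1.0, rec, fm, w_rec=0.5, w_fm=2.0)
```

Matching discriminator features rather than pixels is what lets this term target "style": the translated image only has to look like the class images to D, not line up with them pixel by pixel.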

UIT methods with Different Constraints (Enforce translation to preserve certain properties)

Pixel values
- Learning from simulated and unsupervised images through adversarial training. CVPR 2017.
Pixel gradients
- Unsupervised pixel-level domain adaptation with generative adversarial networks. CVPR 2017.
Semantic features
- Unsupervised cross-domain image generation. ICLR 2017.
Class labels
- Unsupervised pixel-level domain adaptation with generative adversarial networks. CVPR 2017.
Pairwise sample distances
- One-sided unsupervised domain mapping. NIPS 2017.
Cycle consistency
- DualGAN: Unsupervised dual learning for image-to-image translation. ICCV 2017.
- Unpaired image-to-image translation using cycle-consistent adversarial networks. ICCV 2017.
- Learning to discover cross-domain relations with generative adversarial networks. ICML 2017.
- Augmented CycleGAN: Learning many-to-many mappings from unpaired data. ICML 2018.
- Toward multimodal image-to-image translation. NIPS 2017.
Shared latent space assumption
- Coupled generative adversarial networks. NIPS 2016.
- Unsupervised image-to-image translation networks. NIPS 2017.
Partially-shared latent space assumption
- Multimodal unsupervised image-to-image translation (MUNIT). ECCV 2018.
- Diverse image-to-image translation via disentangled representation. ECCV 2018.
- This work.

Related work

One-shot unsupervised cross domain translation. NIPS 2018: Assumes only one source class image but many target class images.
