
AET vs. AED: Unsupervised Representation Learning by Auto-Encoding Transformations rather than Data

The project website for "Auto-Encoding Transformations."

Abstract

The success of deep neural networks often relies on a large number of labeled examples, which can be difficult to obtain in many real scenarios. To address this challenge, unsupervised methods are strongly preferred for training neural networks without using any labeled data. In this paper, we present a novel paradigm of unsupervised representation learning by Auto-Encoding Transformation (AET), in contrast to the conventional Auto-Encoding Data (AED) approach. Given a randomly sampled transformation, AET seeks to predict it as accurately as possible merely from the encoded features at the output end. The idea is the following: as long as the unsupervised features successfully encode the essential information about the visual structures of the original and transformed images, the transformation can be well predicted. We show that this AET paradigm allows us to instantiate a large variety of transformations, from parameterized to non-parameterized and GAN-induced ones. Our experiments show that AET greatly improves over existing unsupervised approaches, setting new state-of-the-art performance that comes substantially closer to the upper bounds set by fully supervised counterparts on the CIFAR-10, ImageNet and Places datasets.

Formulation

AED
(a) Auto-Encoding Data
AET
(b) Auto-Encoding Transformation
Figure 1. An illustration of the comparison between the AED and AET models. AET attempts to estimate the input transformation rather than the data at the output end. This forces the encoder network E to extract features that contain sufficient information about visual structures to decode the input transformation.

Figure 1 illustrates our idea of auto-encoding transformation (AET) in comparison with the conventional auto-encoding data (AED). We build a transformation decoder D to reconstruct the input transformation t from the representations of an original image E(x) and the transformed image E(t(x)), where E is the representation encoder.

The least-square difference between the estimated transformation and the original transformation is minimized to train D and E jointly. For details, please refer to our paper.
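The training objective above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the repository's actual code: the toy linear encoder, the feature size of 64, and the random stand-in for the transformed batch are all assumptions (the paper uses a ConvNet encoder, and t(x) is obtained by actually warping x with the sampled transformation t).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy encoder E and transformation decoder D (hypothetical shapes).
E = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64))
D = nn.Linear(2 * 64, 6)  # concat of E(x), E(t(x)) -> 6 affine parameters

x = torch.randn(8, 3, 32, 32)   # batch of original images
t = torch.randn(8, 6)           # sampled affine transformation parameters
# In practice t(x) is produced by warping x with t; a random tensor
# stands in for the transformed batch here, for illustration only.
tx = torch.randn(8, 3, 32, 32)

# Predict the transformation from the pair of encoded representations.
t_hat = D(torch.cat([E(x), E(tx)], dim=1))
loss = F.mse_loss(t_hat, t)     # least-squares AET objective
loss.backward()                 # gradients flow into both D and E
```

Minimizing this loss trains D and E jointly, so the encoder is pushed to retain whatever structural information is needed to recover the transformation.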

Run our codes

Requirements

  • Python == 2.7
  • pytorch == 1.0.1
  • torchvision == 0.2.1
  • PIL == 5.4.1

Note

Please use torchvision version 0.2.1. The code does not support newer versions of torchvision.

CIFAR-10

cd cifar/affine

or

cd cifar/projective

Unsupervised learning:

CUDA_VISIBLE_DEVICES=0 python main.py --cuda --outf ./output --dataroot $YOUR_CIFAR10_PATH$ 

Supervised evaluation with two FC layers:

python classification.py --dataroot $YOUR_CIFAR10_PATH$ --epochs 200 --schedule 100 150 --gamma 0.1 -c ./output_cls --net ./output/net_epoch_1499.pth --gpu-id 0

ImageNet

cd imagenet

Generate and save 0.5 million projective transformation parameters:

python save_homography.py
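The pre-generation step might look roughly like the sketch below; the counts, image size, perturbation range, and output filename are assumptions for illustration, not the repository's actual values. It samples random corner displacements and solves the standard 8-parameter homography system in the parameterization used by PIL's `Image.transform`:

```python
import numpy as np

def find_coeffs(src, dst):
    """Solve for the 8 projective coefficients mapping src -> dst corners
    (the parameterization expected by PIL's Image.transform)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
    A = np.asarray(rows, dtype=np.float64)
    B = np.asarray(dst, dtype=np.float64).reshape(8)
    coeffs, *_ = np.linalg.lstsq(A, B, rcond=None)  # robust least squares
    return coeffs

rng = np.random.default_rng(0)
size, jitter, n = 224, 0.125, 1000  # assumed values for illustration

corners = np.array([[0, 0], [size, 0], [size, size], [0, size]], float)
params = np.stack([
    find_coeffs(corners,
                corners + rng.uniform(-jitter * size, jitter * size, (4, 2)))
    for _ in range(n)
])
np.save("homographies.npy", params)  # hypothetical output filename
```

Solving with `np.linalg.lstsq` rather than explicitly inverting the normal equations keeps the solve stable even for near-degenerate corner configurations.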

Unsupervised learning:

CUDA_VISIBLE_DEVICES=0 python main.py --exp ImageNet_Unsupervised

Supervised evaluation with non-linear classifiers:

CUDA_VISIBLE_DEVICES=0 python main.py --exp ImageNet_NonLinearClassifiers

Supervised evaluation with linear classifiers (max pooling):

CUDA_VISIBLE_DEVICES=0 python main.py --exp ImageNet_LinearClassifiers_Maxpooling

Supervised evaluation with linear classifiers (average pooling):

CUDA_VISIBLE_DEVICES=0 python main.py --exp ImageNet_LinearClassifiers_Avgpooling

To use the pretrained ImageNet model:

mkdir experiments
cd experiments
mkdir ImageNet_Unsupervised

Please download the pre-trained model from https://1drv.ms/u/s!AhnMU9glhsl-xxI-e68xrOe3gvQg?e=nFNnir and put it under ./experiments/ImageNet_Unsupervised

Places205

First pretrain the model on ImageNet, then evaluate it with linear classifiers (max pooling):

CUDA_VISIBLE_DEVICES=0 python main.py --exp Places205_LinearClassifiers_Maxpooling

Supervised evaluation with linear classifiers (average pooling):

CUDA_VISIBLE_DEVICES=0 python main.py --exp Places205_LinearClassifiers_Avgpooling

Citation

Liheng Zhang, Guo-Jun Qi, Liqiang Wang, Jiebo Luo. AET vs. AED: Unsupervised Representation Learning by Auto-Encoding Transformations rather than Data, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019), Long Beach, CA, June 16th - June 20th, 2019. [pdf]

Disclaimer

Some of our code reuses the GitHub project FeatureLearningRotNet.

License

This code is released under the MIT License.


Issues

Why pre-generate and save the projective transformation parameters?

Hi,

Thanks for sharing the code with us. I am curious why we need to pre-generate and save the projective transformation parameters. This does not seem like an expensive computation, so it should be fine to compute them on the fly. Specifically:

Generate and save 0.5 million projective transformation parameters:

python save_homography.py

Thank you!

Provide the file of pre-trained models on ImageNet?

Hello, Guo-Jun. Thanks for your great work of AET and making your code available.
Can you also provide the file of pre-trained models on ImageNet for the convenience of experiments?
Thank you again for your valuable attention.

How to avoid trivial shortcut?

Hi Liheng,
General affine or projective transformations can introduce black areas at the image border. Could these artifacts help the network learn trivial shortcuts, such as predicting the transformation parameters just from the shape of the black areas? Your paper does not mention this issue, and I wonder whether there is some special processing to address it.
Hope to get your reply. Thanks!

Error in find_coeffs function

res = np.dot(np.linalg.inv(A.T * A) * A.T, B)

What is the purpose of the above statement? It is giving an error:
"numpy.linalg.linalg.LinAlgError: Singular matrix"

I tried replacing np.linalg.inv with np.linalg.pinv, but it still gives a ValueError about inconsistent shapes.

Thanks.
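For reference, the normal-equations form `inv(A.T * A) * A.T` fails whenever `A.T * A` is singular or badly conditioned. Assuming `A` and `B` are the usual design matrix and target vector of the homography system (and noting that `*` means matrix multiplication only because the original snippet builds `A` as an `np.matrix`), a more robust alternative is to solve the least-squares problem directly; the `A` and `B` below are hypothetical stand-ins with the same shapes:

```python
import numpy as np

# Hypothetical well-posed system with the shapes used by the usual
# find_coeffs snippet: A is (8, 8), B is (8,).
A = np.eye(8) + 0.01 * np.ones((8, 8))
B = np.arange(8.0)

# Direct least-squares solve; handles rank-deficient or ill-conditioned
# A gracefully, unlike explicitly inverting the normal equations.
res, *_ = np.linalg.lstsq(A, B, rcond=None)
```

With plain ndarrays (rather than `np.matrix`), `lstsq` also sidesteps the shape errors that `pinv` combined with `*` can produce, since `*` on `np.matrix` objects is matrix multiplication, not elementwise.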

Reproduced supervised NIN results.

Hi, Liheng.
I have reproduced the supervised NIN baseline with the paper's settings and get 88.23 in my experiments, but the paper reports 92.80. Is 92.8 the result of running your code, or is it taken from other papers?
Thank you.
