google-research / simclr

SimCLRv2 - Big Self-Supervised Models are Strong Semi-Supervised Learners

Home Page: https://arxiv.org/abs/2006.10029

License: Apache License 2.0

Languages: Python 5.78%, Jupyter Notebook 94.22%
Topics: simclr, contrastive-learning, representation-learning, self-supervised-learning, unsupervised-learning, computer-vision, simclrv2


SimCLR - A Simple Framework for Contrastive Learning of Visual Representations

News! We have released a TF2 implementation of SimCLR (along with checkpoints converted to TF2); it lives in the tf2/ folder.

News! Colabs for Intriguing Properties of Contrastive Losses have been added; see here.

[Figure] An illustration of SimCLR (from our blog).

Pre-trained models for SimCLRv2


We have open-sourced a total of 65 pretrained models here, corresponding to those in Table 1 of the SimCLRv2 paper:

| Depth | Width | SK    | Param (M) | F-T (1%) | F-T (10%) | F-T (100%) | Linear eval | Supervised |
|-------|-------|-------|-----------|----------|-----------|------------|-------------|------------|
| 50    | 1X    | False | 24        | 57.9     | 68.4      | 76.3       | 71.7        | 76.6       |
| 50    | 1X    | True  | 35        | 64.5     | 72.1      | 78.7       | 74.6        | 78.5       |
| 50    | 2X    | False | 94        | 66.3     | 73.9      | 79.1       | 75.6        | 77.8       |
| 50    | 2X    | True  | 140       | 70.6     | 77.0      | 81.3       | 77.7        | 79.3       |
| 101   | 1X    | False | 43        | 62.1     | 71.4      | 78.2       | 73.6        | 78.0       |
| 101   | 1X    | True  | 65        | 68.3     | 75.1      | 80.6       | 76.3        | 79.6       |
| 101   | 2X    | False | 170       | 69.1     | 75.8      | 80.7       | 77.0        | 78.9       |
| 101   | 2X    | True  | 257       | 73.2     | 78.8      | 82.4       | 79.0        | 80.1       |
| 152   | 1X    | False | 58        | 64.0     | 73.0      | 79.3       | 74.5        | 78.3       |
| 152   | 1X    | True  | 89        | 70.0     | 76.5      | 81.3       | 77.2        | 79.9       |
| 152   | 2X    | False | 233       | 70.2     | 76.6      | 81.1       | 77.4        | 79.1       |
| 152   | 2X    | True  | 354       | 74.2     | 79.4      | 82.9       | 79.4        | 80.4       |
| 152   | 3X    | True  | 795       | 74.9     | 80.1      | 83.1       | 79.8        | 80.5       |

These checkpoints are stored in Google Cloud Storage.

We also provide examples of how to use the checkpoints in the colabs/ folder.

Pre-trained models for SimCLRv1

The pre-trained models (base network with linear classifier layer) can be found below. Note that for these SimCLRv1 checkpoints, the projection head is not available.

| Model checkpoint and hub module | ImageNet Top-1 |
|---------------------------------|----------------|
| ResNet50 (1x)                   | 69.1           |
| ResNet50 (2x)                   | 74.2           |
| ResNet50 (4x)                   | 76.6           |

Additional SimCLRv1 checkpoints are available: gs://simclr-checkpoints/simclrv1.

A note on the signatures of the TensorFlow Hub module: default is the representation output of the base network; logits_sup gives the supervised classification logits for the 1000 ImageNet categories. Others (e.g. initial_max_pool, block_group1) are intermediate ResNet layers; refer to resnet.py for specifics. See this tutorial for additional information on using TensorFlow Hub modules.
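For illustration, here is a minimal sketch (not from this repo) of reading those outputs with the TF1 tensorflow_hub API; the module path is a placeholder, and we assume the outputs are exposed as keys of the dictionary returned by the default signature:

import numpy as np
import tensorflow.compat.v1 as tf
import tensorflow_hub as hub

tf.disable_eager_execution()
module = hub.Module('/path/to/ResNet50_1x/hub')  # hypothetical local path

images = tf.placeholder(tf.float32, [None, 224, 224, 3])
outputs = module(images, as_dict=True)
features = outputs['default']     # representation output of the base network
logits = outputs['logits_sup']    # supervised logits for the 1000 ImageNet classes

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    feats = sess.run(features, feed_dict={images: np.zeros((1, 224, 224, 3), np.float32)})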

Environment setup

Our models are trained with TPUs; we recommend running distributed training with TPUs when using our code for pretraining.

Our code can also run on a single GPU. It does not support multiple GPUs, for reasons such as global BatchNorm and the contrastive loss being computed across cores.

The code is compatible with both TensorFlow v1 and v2. See requirements.txt for all prerequisites; you can install them with the following command.

pip install -r requirements.txt

Pretraining

To pretrain the model on CIFAR-10 with a single GPU, try the following command:

python run.py --train_mode=pretrain \
  --train_batch_size=512 --train_epochs=1000 \
  --learning_rate=1.0 --weight_decay=1e-4 --temperature=0.5 \
  --dataset=cifar10 --image_size=32 --eval_split=test --resnet_depth=18 \
  --use_blur=False --color_jitter_strength=0.5 \
  --model_dir=/tmp/simclr_test --use_tpu=False

To pretrain the model on ImageNet with Cloud TPUs, first check out the Google Cloud TPU tutorial for basic information on how to use Google Cloud TPUs.

Once you have created a virtual machine with Cloud TPUs and pre-downloaded the ImageNet data for tensorflow_datasets, please set the following environment variables:

TPU_NAME=<tpu-name>
STORAGE_BUCKET=gs://<storage-bucket>
DATA_DIR=$STORAGE_BUCKET/<path-to-tensorflow-dataset>
MODEL_DIR=$STORAGE_BUCKET/<path-to-store-checkpoints>

The following command can be used to pretrain a ResNet-50 on ImageNet (which reflects the default hyperparameters in our paper):

python run.py --train_mode=pretrain \
  --train_batch_size=4096 --train_epochs=100 --temperature=0.1 \
  --learning_rate=0.075 --learning_rate_scaling=sqrt --weight_decay=1e-4 \
  --dataset=imagenet2012 --image_size=224 --eval_split=validation \
  --data_dir=$DATA_DIR --model_dir=$MODEL_DIR \
  --use_tpu=True --tpu_name=$TPU_NAME --train_summary_steps=0

A batch size of 4096 requires at least 32 TPUs. Training for 100 epochs takes around 6 hours with 32 TPU v3s. Note that a learning rate of 0.3 with learning_rate_scaling=linear is equivalent to one of 0.075 with learning_rate_scaling=sqrt when the batch size is 4096; however, sqrt scaling trains better when smaller batch sizes are used.
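A quick worked check of that equivalence, assuming the scaling rules from the paper (linear: lr x batch_size / 256; sqrt: lr x sqrt(batch_size)):

# Both scalings give the same effective learning rate at batch size 4096.
batch_size = 4096
lr_linear = 0.3 * batch_size / 256     # -> 4.8
lr_sqrt = 0.075 * batch_size ** 0.5    # -> 4.8
print(lr_linear, lr_sqrt)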

Finetuning the linear head (linear eval)

To fine-tune a linear head (with a single GPU), try the following command:

python run.py --mode=train_then_eval --train_mode=finetune \
  --fine_tune_after_block=4 --zero_init_logits_layer=True \
  --variable_schema='(?!global_step|(?:.*/|^)Momentum|head)' \
  --global_bn=False --optimizer=momentum --learning_rate=0.1 --weight_decay=0.0 \
  --train_epochs=100 --train_batch_size=512 --warmup_epochs=0 \
  --dataset=cifar10 --image_size=32 --eval_split=test --resnet_depth=18 \
  --checkpoint=/tmp/simclr_test --model_dir=/tmp/simclr_test_ft --use_tpu=False

You can check the results using TensorBoard, e.g.:

python -m tensorboard.main --logdir=/tmp/simclr_test

As a reference, the above run on CIFAR-10 should give you around 91% accuracy, though it can be further optimized.

To fine-tune a linear head on ImageNet using Cloud TPUs, first set CHKPT_DIR to the pretrained model directory and set a new MODEL_DIR, then use the following command:

python run.py --mode=train_then_eval --train_mode=finetune \
  --fine_tune_after_block=4 --zero_init_logits_layer=True \
  --variable_schema='(?!global_step|(?:.*/|^)Momentum|head)' \
  --global_bn=False --optimizer=momentum --learning_rate=0.1 --weight_decay=1e-6 \
  --train_epochs=90 --train_batch_size=4096 --warmup_epochs=0 \
  --dataset=imagenet2012 --image_size=224 --eval_split=validation \
  --data_dir=$DATA_DIR --model_dir=$MODEL_DIR --checkpoint=$CHKPT_DIR \
  --use_tpu=True --tpu_name=$TPU_NAME --train_summary_steps=0

As a reference, the above run on ImageNet should give you around 64.5% accuracy.

Semi-supervised learning and fine-tuning the whole network

You can access the 1% and 10% ImageNet subsets used for semi-supervised learning via TensorFlow Datasets: simply set dataset=imagenet2012_subset/1pct or dataset=imagenet2012_subset/10pct on the command line when fine-tuning on these subsets.

You can also find image IDs of these subsets in imagenet_subsets/.
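As a hedged sketch (assuming ImageNet has already been prepared for tensorflow_datasets; the data_dir is a placeholder), the subsets can be inspected directly with TFDS:

import tensorflow_datasets as tfds

# Build the 1% labeled subset; '10pct' works the same way.
builder = tfds.builder('imagenet2012_subset/1pct', data_dir='/path/to/tfds')
builder.download_and_prepare()
print(builder.info.splits['train'].num_examples)  # roughly 1% of the 1.28M training images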

To fine-tune the whole network on ImageNet (1% of labels), refer to the following command:

python run.py --mode=train_then_eval --train_mode=finetune \
  --fine_tune_after_block=-1 --zero_init_logits_layer=True \
  --variable_schema='(?!global_step|(?:.*/|^)Momentum|head_supervised)' \
  --global_bn=True --optimizer=lars --learning_rate=0.005 \
  --learning_rate_scaling=sqrt --weight_decay=0 \
  --train_epochs=60 --train_batch_size=1024 --warmup_epochs=0 \
  --dataset=imagenet2012_subset/1pct --image_size=224 --eval_split=validation \
  --data_dir=$DATA_DIR --model_dir=$MODEL_DIR --checkpoint=$CHKPT_DIR \
  --use_tpu=True --tpu_name=$TPU_NAME --train_summary_steps=0 \
  --num_proj_layers=3 --ft_proj_selector=1

Set the checkpoint to one that is only pre-trained but not fine-tuned. Since SimCLRv1 checkpoints do not contain the projection head, we recommend running with SimCLRv2 checkpoints (you can still run with SimCLRv1 checkpoints, but variable_schema needs to exclude head). num_proj_layers and ft_proj_selector need to be adjusted accordingly, following the SimCLRv2 paper, to obtain the best performance.

Other resources

Model conversion to PyTorch format

This repo provides a solution for converting the pretrained SimCLRv1 TensorFlow checkpoints into PyTorch ones.

This repo provides a solution for converting the pretrained SimCLRv2 TensorFlow checkpoints into PyTorch ones.

Other non-official / unverified implementations

(Feel free to share your implementation by creating an issue)

Implementations in PyTorch:

Implementations in TensorFlow 2 / Keras (an official TF2 implementation was added in the tf2/ folder):

Known issues

  • Batch size: the original results of SimCLR were tuned under a large batch size (i.e., 4096), which can lead to suboptimal results when training with a smaller batch size. However, with a good set of hyperparameters (mainly learning rate, temperature, and projection head depth), small batch sizes can yield results on par with large batch sizes (e.g., see Table 2 in this paper).

  • Pretrained models / checkpoints: SimCLRv1 and SimCLRv2 are pretrained with different weight decays, so the pretrained models from the two versions have very different weight norm scales (convolutional weights in SimCLRv1 ResNet-50 are on average 16.8× those in SimCLRv2). Fine-tuning the pretrained models from either version works fine with a LARS optimizer, but requires very different hyperparameters (e.g. learning rate, weight decay) with the momentum optimizer. In the latter case, you may want to either search for very different hparams depending on which version is used, or re-scale the weights (i.e. the conv kernel parameters of base_model in the checkpoints) so that they are roughly on the same scale; a hedged sketch of such re-scaling follows.
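A hedged sketch of that re-scaling (not part of this repo) using TF1 checkpoint utilities; the paths, the exact 1/16.8 factor, and the variable-name filter are assumptions to verify against your own checkpoints:

import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

CKPT_IN = '/path/to/simclrv1/model.ckpt'   # hypothetical paths
CKPT_OUT = '/path/to/rescaled/model.ckpt'
SCALE = 1.0 / 16.8  # bring SimCLRv1 conv kernels roughly down to SimCLRv2 scale

reader = tf.train.load_checkpoint(CKPT_IN)
new_vars = []
for name in reader.get_variable_to_shape_map():
    value = reader.get_tensor(name)
    # Re-scale only the conv kernel parameters of the base network.
    if name.startswith('base_model') and 'kernel' in name:
        value = value * SCALE
    new_vars.append(tf.Variable(value, name=name))

saver = tf.train.Saver(new_vars)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.save(sess, CKPT_OUT)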

Cite

SimCLR paper:

@article{chen2020simple,
  title={A Simple Framework for Contrastive Learning of Visual Representations},
  author={Chen, Ting and Kornblith, Simon and Norouzi, Mohammad and Hinton, Geoffrey},
  journal={arXiv preprint arXiv:2002.05709},
  year={2020}
}

SimCLRv2 paper:

@article{chen2020big,
  title={Big Self-Supervised Models are Strong Semi-Supervised Learners},
  author={Chen, Ting and Kornblith, Simon and Swersky, Kevin and Norouzi, Mohammad and Hinton, Geoffrey},
  journal={arXiv preprint arXiv:2006.10029},
  year={2020}
}

Disclaimer

This is not an official Google product.


Issues

Shape doesn't match when performing linear eval

Hi, thank you for the code release!

I encounter the following error when performing linear eval on CIFAR.

Pretraining:

python run.py --train_mode=pretrain \
  --train_batch_size=512 --train_epochs=1000 \
  --learning_rate=1.0 --weight_decay=1e-4 --temperature=0.5 \
  --dataset=cifar10 --image_size=32 --eval_split=test --resnet_depth=18 \
  --use_blur=False --color_jitter_strength=0.5 \
  --model_dir=/mnt/research/results/simclr/simclr_test --use_tpu=False

Linear eval:

python run.py --mode=train_then_eval --train_mode=finetune \
  --fine_tune_after_block=4 --zero_init_logits_layer=True \
  --variable_schema='(?!global_step|(?:.*/|^)Momentum|head)' \
  --global_bn=False --optimizer=momentum --learning_rate=0.1 --weight_decay=0.0 \
  --train_epochs=100 --train_batch_size=512 --warmup_epochs=0 \
  --dataset=cifar10 --image_size=32 --eval_split=test --resnet_depth=18 \
  --checkpoint=/mnt/research/results/simclr/simclr_test --model_dir=/mnt/research/results/simclr/simclr_test_ft --use_tpu=False
I0625 13:45:52.051569 140622183225152 evaluation.py:276] Finished evaluation at 2020-06-25-13:45:52
INFO:tensorflow:Saving dict for global step 9766: contrast_loss = 0.0, contrastive_top_1_accuracy = 1.0, contrastive_top_5_accuracy = 1.0, global_step = 9766, label_top_1_accuracy = 0.8248, label_top_5_accuracy = 0.9829, loss = 0.5490037, regularization_loss = 0.0
I0625 13:45:52.051712 140622183225152 estimator.py:2053] Saving dict for global step 9766: contrast_loss = 0.0, contrastive_top_1_accuracy = 1.0, contrastive_top_5_accuracy = 1.0, global_step = 9766, label_top_1_accuracy = 0.8248, label_top_5_accuracy = 0.9829, loss = 0.5490037, regularization_loss = 0.0
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 9766: /mnt/research/results/simclr/simclr_test_ft/model.ckpt-9766
I0625 13:45:52.182560 140622183225152 estimator.py:2113] Saving 'checkpoint_path' summary for global step 9766: /mnt/research/results/simclr/simclr_test_ft/model.ckpt-9766
INFO:tensorflow:evaluation_loop marked as finished
I0625 13:45:52.182964 140622183225152 error_handling.py:108] evaluation_loop marked as finished
WARNING:tensorflow:From /home/mren/miniconda3/envs/tf/lib/python3.6/site-packages/tensorflow_hub/saved_model_lib.py:110: build_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.
W0625 13:45:52.510126 140622183225152 deprecation.py:323] From /home/mren/miniconda3/envs/tf/lib/python3.6/site-packages/tensorflow_hub/saved_model_lib.py:110: build_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.
Traceback (most recent call last):
  File "run.py", line 435, in <module>
    app.run(main)
  File "/home/mren/miniconda3/envs/tf/lib/python3.6/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/home/mren/miniconda3/envs/tf/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "run.py", line 430, in main
    num_classes=num_classes)
  File "run.py", line 343, in perform_evaluation
    checkpoint_path=checkpoint_path)
  File "run.py", line 293, in build_hub_module
    name_transform_fn=None)
  File "/home/mren/miniconda3/envs/tf/lib/python3.6/site-packages/tensorflow_hub/module_spec.py", line 80, in export
    export_module_spec(self, path, checkpoint_path, name_transform_fn)
  File "/home/mren/miniconda3/envs/tf/lib/python3.6/site-packages/tensorflow_hub/module.py", line 74, in export_module_spec
    tf_v1.train.init_from_checkpoint(checkpoint_path, assign_map)
  File "/home/mren/miniconda3/envs/tf/lib/python3.6/site-packages/tensorflow_core/python/training/checkpoint_utils.py", line 291, in init_from_checkpoint
    init_from_checkpoint_fn)
  File "/home/mren/miniconda3/envs/tf/lib/python3.6/site-packages/tensorflow_core/python/distribute/distribute_lib.py", line 1949, in merge_call
    return self._merge_call(merge_fn, args, kwargs)
  File "/home/mren/miniconda3/envs/tf/lib/python3.6/site-packages/tensorflow_core/python/distribute/distribute_lib.py", line 1956, in _merge_call
    return merge_fn(self._strategy, *args, **kwargs)
  File "/home/mren/miniconda3/envs/tf/lib/python3.6/site-packages/tensorflow_core/python/training/checkpoint_utils.py", line 286, in <lambda>
    ckpt_dir_or_file, assignment_map)
  File "/home/mren/miniconda3/envs/tf/lib/python3.6/site-packages/tensorflow_core/python/training/checkpoint_utils.py", line 329, in _init_from_checkpoint
    tensor_name_in_ckpt, str(variable_map[tensor_name_in_ckpt])
ValueError: Shape of variable module/head_supervised/linear_layer/dense/kernel:0 ((512, 10)) doesn't match with shape of tensor head_supervised/linear_layer/dense/kernel ([128, 10]) from checkpoint reader.

Using a single GPU to train on the CIFAR-10 dataset fails to start

Hello, when I used a single GPU to train on the CIFAR-10 dataset, the following prompt appeared, saying that training was skipped. The system log is as follows:
INFO:tensorflow:_TPUContext: eval_on_tpu True
I0627 17:39:39.536073 139766252832576 tpu_context.py:209] _TPUContext: eval_on_tpu True
WARNING:tensorflow:eval_on_tpu ignored because use_tpu is False.
W0627 17:39:39.536876 139766252832576 tpu_context.py:211] eval_on_tpu ignored because use_tpu is False.
INFO:tensorflow:Skipping training since max_steps has already saved.
I0627 17:39:39.541100 139766252832576 estimator.py:360] Skipping training since max_steps has already saved.
INFO:tensorflow:training_loop marked as finished
I0627 17:39:39.541271 139766252832576 error_handling.py:96] training_loop marked as finished

Semi-supervised Learning via Fine-Tuning

Is there code for semi-supervised learning via fine-tuning (Zhai et al., 2019)? I can't find it. Please let me know if there is a way.

I have more questions about semi-supervised learning via fine-tuning (Zhai et al., 2019).
When fine-tuning, the network is updated using the following loss:

[equation image not shown]

Here are some questions.

In semi-supervised learning, is the loss used for the unlabeled dataset the same as the contrastive loss used in SimCLR pretraining (augmentation -> encoder -> MLP -> contrastive loss)? Or is it a loss such as L_rot used in Zhai et al. (2019)?

When fine-tuning with semi-supervised learning, do you replace g(.) (the MLP) with a head that acts as a classifier before training? Or is there a classification layer independent of g(.)? In that case, is it correct to fine-tune both the encoder and the new head (e.g. an FC layer)?

Finally, I wonder how backpropagation from the contrastive loss on unlabeled data can fine-tune not only the encoder but also the FC classification layer.

Image preprocessing statistics for dataset

Hello,

Thanks so much for making this code available! I am trying to use the pretrained SimCLR models to extract features from my own custom dataset. I have done this successfully with other models trained using standard methods ("successfully" meaning that the extracted features are effective at classifying my custom dataset). With SimCLR, however, the extracted features seem only slightly better than random. I am wondering if this is because the input statistics for SimCLR differ from those of most ImageNet-trained models. I normalize my data with the ImageNet mean/std of mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]. Should I be using different normalization for SimCLR?

Questions about the model_dir path in run.py

Hello, I'm here with another question. When training your SimCLR model, I saw in your README that if you use the imagenet2012 dataset, model_dir=None, and when training on the CIFAR-10 dataset, model_dir=/tmp/simclr_test. Does model_dir refer to the model storage location? Or is the model automatically selected when training on different datasets?

How did you choose the 1% ImageNet data?

Hi.

How did you choose the 1% ImageNet data? For few-shot / semi-supervised learning, the choice of labeled images is important, and I'd like to ask how you chose the images out of the whole ImageNet dataset.

Thanks.

simclr for audio classification

Hello, thank you for this great work!
I am using SimCLR for audio (speech) processing, extracting image-like features from audio clips and then using them to train/fine-tune a model. My first experiment gives me 28% accuracy after 150 epochs. Here is what I did:

1- Extract MFCC spectrograms from the speech audio
2- Fine-tune SimCLR by training the linear classifier

Is this the right way to use SimCLR? Do you think 28% is a "normal" accuracy? I suspect it is because the MFCC images are so similar that the model cannot distinguish them.

Thanks in advance

About classifier architecture for fine tuning

Thank you for your impressive work.
I'm reading your paper and wondering about the architecture of SimCLR's classifier.

From my understanding, you used an l2-regularized multinomial logistic regression classifier for linear evaluation with a frozen SimCLR. But which classifier is used for fine-tuning? Did you use an l2-regularized multinomial logistic regression classifier, or something like a non-linear MLP?

Hope to hear from you soon.
Thanks in advance!

Why does the paper choose 0.1 as temperature value?

In the paper, the explanation of temperature is that l2 normalization along with temperature effectively weights different examples, and an appropriate temperature can help the model learn from hard negatives. But I can't really understand how adding temperature achieves "learning from hard negatives". Also, looking at Table 5, choosing 0.1 seems significantly better than choosing a temperature larger or smaller than 0.1; I'm also wondering why that is.

Question about cosine similarity

Why did you only take the dot product when calculating cosine similarity, without dividing by the magnitudes of the vectors? As follows:
logits_aa = tf.matmul(hidden1, hidden1_large, transpose_b=True) / temperature
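For context, a hedged note: in objective.py the hidden vectors are (by default) l2-normalized before this matmul, and the dot product of unit-norm vectors is exactly their cosine similarity. A minimal sketch:

import tensorflow as tf

h = tf.random.normal([8, 128])
h = tf.math.l2_normalize(h, axis=-1)     # rows now have unit norm
sim = tf.matmul(h, h, transpose_b=True)  # dot products == cosine similarities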

Replicating 94.0% performance on CIFAR-10

Hi,

I am trying to replicate the numbers reported in Appendix B.9, but I am stuck around 91-92%.
The training recipe in the README gives me around 91% accuracy, as stated.

At least, increasing depth (ResNet-18 -> ResNet-50) did not show a meaningful performance gain (~0.5%), and I believe this is consistent with supervised-learning behavior on CIFAR-10, as reported in https://github.com/kuangliu/pytorch-cifar.

If I understand correctly, I should be able to see 93-94% accuracy after 1k epochs with any batch size, as reported in Figure B.4, and specifically 94.0% when the batch size is 1024.

Could you share the exact training recipe to replicate these numbers?

Thanks.

The padding of max_pooling2d is not fixed, unlike conv2d's

simclr/resnet.py

Lines 487 to 490 in f3ca72f

inputs = tf.layers.max_pooling2d(
    inputs=inputs, pool_size=3, strides=2, padding='SAME',
    data_format=data_format)
inputs = tf.identity(inputs, 'initial_max_pool')

So there are two types of padding in this repo, which may cause confusion when users want to use the SimCLR model in another codebase, as the weights are compatible neither with vanilla TensorFlow nor with other frameworks like PyTorch or MXNet.
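One possible fix, as a hedged sketch mirroring the fixed-padding trick that resnet.py applies to conv2d (names and shapes here are illustrative):

import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

def fixed_padding(inputs, kernel_size):
    # Explicit, input-size-independent padding (channels_last layout).
    pad_total = kernel_size - 1
    pad_beg = pad_total // 2
    pad_end = pad_total - pad_beg
    return tf.pad(inputs, [[0, 0], [pad_beg, pad_end], [pad_beg, pad_end], [0, 0]])

inputs = tf.placeholder(tf.float32, [None, 112, 112, 64])
inputs = fixed_padding(inputs, kernel_size=3)
inputs = tf.layers.max_pooling2d(
    inputs=inputs, pool_size=3, strides=2, padding='VALID')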

Short and Quick Question on TPU/GPU usage

Hi Ting,

Thanks for sharing all these, it's amazing and elegant. One quick question:

I have never used TPUs, and I'm very new to TensorFlow. I would like to test your code on GPU(s). The pretraining works, but I run into a problem with fine-tuning.

Should line 369 in run.py be
sliced_eval_mode = tf.estimator.tpu.InputPipelineConfig.PER_HOST_V1
rather than
sliced_eval_mode = tf.estimator.tpu.InputPipelineConfig.SLICED
to enable GPU usage?

When using SLICED, my terminal told me SLICED can only be used with TPU(s).

Custom dataset usage

hi @chentingpc, could you please post instructions/guidelines for using the code on a custom dataset? Any tips on specific usage of the code would also be helpful. Thanks.

Tensorflow version in requirements.txt

Hey there, thanks for making this available, much appreciated!

Quick question: running pip install -r requirements.txt results in the following error: Could not find a version that satisfies the requirement tensorflow==1.15. Any ideas on what I should be doing differently?

Inference

I have checkpoints for SimCLRv1.
I want to input an image and obtain the feature representation from the encoder (2048-D) as output.

Can you please guide me on this?

I figured out that my output tensor will be 'base_model/final_avg_pool:0', but I couldn't find the input tensor name. It was quite complicated to understand from the graph.

A minimal implementation of SimCLR on an ImageNet subset

I am Sayak, an ML Engineer from India. I am writing to let you know that I have started working on a minimal implementation of SimCLR.

I was able to use the utility functions (the augmentation policies and the NT-Xent loss) provided in the GitHub repository mentioned in the paper. For my implementation, I am using a combination of tf.keras and custom loops with tf.GradientTape. I am running my experiments on a small subset of ImageNet: the top 5 categories, each class having 250 images. I am unsure at this point whether my implementation is buggy, but I am not able to get the desired results when evaluating the self-supervised representations with linear evaluation.

This is why I am reaching out to ask whether it would be possible for you to take a look and give me feedback. I know this is a lot to ask. In case you would like to have a look, I have attached two notebooks:

  • one that gathers the dataset, preprocesses it, and trains the SimCLR model, covering the major steps
  • another that evaluates the learned representations

I used a GCP VM pre-configured with TensorFlow 2.1 and a Tesla T4 GPU.

aminimalimplementationofsimclronanimagenetsubset.zip

Fine Tuning on CIFAR-10

Hi,
I am following the instructions in the README. I used the command for fine-tuning on CIFAR-10, but an error occurred, saying:
"ValueError: Shape of variable base_model/batch_normalization_1/beta:0 ((64,)) doesn't match with shape of tensor base_model/batch_normalization_1/beta ([256]) from checkpoint reader."
Does anyone have an idea about this?
Thanks a lot!

Running error - could you please look into this?

I get an error when I run this command:

python run.py --mode=train_then_eval --train_mode=finetune \
  --fine_tune_after_block=4 --zero_init_logits_layer=True \
  --variable_schema='(?!global_step|(?:.*/|^)Momentum|head)' \
  --global_bn=False --optimizer=momentum --learning_rate=0.1 --weight_decay=0.0 \
  --train_epochs=2 --train_batch_size=512 --warmup_epochs=0 \
  --dataset=cifar10 --image_size=32 --eval_split=test --resnet_depth=18 \
  --checkpoint=/tmp/simclr_test --model_dir=/tmp/simclr_test_ft --use_tpu=False

(Note: I changed train_epochs to 2 so that it would run quickly.)

The error is:

ValueError: Shape of variable module/head_supervised/linear_layer/dense/kernel:0 ((512, 10)) doesn't match with shape of tensor head_supervised/linear_layer/dense/kernel ([128, 10]) from checkpoint reader.

I am using TensorFlow 1.15.2. While trying to figure out this error, I downloaded your old code from before the update 5 days ago and found that it runs well. Could you please look into this issue? Thank you very much!

ImageNet tensorflow_datasets setup

Dear Ting,

Thank you so much for providing this great codebase for the community!

When I tried to train SimCLR on ImageNet using TPUs, I got different errors when the code tried to read ILSVRC2012_img_train.tar and ILSVRC2012_img_val.tar (I have downloaded those files). Following the docs triggers different errors. May I gently ask how, on Cloud TPUs, you set up the ImageNet TensorFlow dataset properly, given that the data is already downloaded?

If you would like me to provide the error information, I will post it :)

Thank you so much!

data normalization

Thanks, Ting, for open-sourcing the code and checkpoints. Which data normalization method is used in the project? I can't find it.

Understanding add_contrastive_loss()

Hello,

I'm having some difficulty following the logic of the call to add_contrastive_loss() defined in objective.py. It appears to be passed two latent vectors concatenated into one latent vector.

(1) Are these the latent vectors generated by the ResNet for the N input images and the N augmented input images?

(2) If so, are the two sets of latent vectors aligned? I.e., referencing the code, do element i of hidden1 and element i of hidden2 correspond to augmented versions of the same image?

And if so, (3) does the add_contrastive_loss code take care of the negative samples? At which line are negative samples handled?

Thanks!

About the use of the mask in objective.py

Hello,
I would like to understand the logic behind the use of the mask in the add_contrastive_loss function: my understanding is that you use it to decrease the similarity value for s_{i,i} when computing the loss.
What you do is the following:

LARGE_NUM = 1e9
masks = tf.one_hot(tf.range(batch_size), batch_size)
logits_aa = tf.matmul(hidden1, hidden1_large, transpose_b=True) / temperature
logits_aa = logits_aa - masks * LARGE_NUM

I know that the final value of the (i,i) logits will be a very large negative number. But wouldn't it be more precise to just set the logits at positions (i,i) to zero? I was thinking of the following code:

logits_aa = tf.matmul(hidden1, hidden1_large, transpose_b=True) / temperature
mask_inv = tf.one_hot(tf.range(batch_size), batch_size, on_value=0.0, off_value=1.0)
logits_aa = logits_aa * mask_inv

Thanks in advance
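A hedged aside on the two snippets above: a multiplicative mask sets the (i,i) logit to 0, and exp(0) = 1 still contributes to the softmax denominator, whereas subtracting LARGE_NUM drives exp(logit) to ~0 and removes the term entirely. A minimal numeric check:

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

zeroed = np.array([0.0, 2.0, 1.0])         # entry 0 zeroed out
masked = np.array([5.0 - 1e9, 2.0, 1.0])   # entry 0 pushed toward -inf
print(softmax(zeroed))  # entry 0 still receives nonzero probability
print(softmax(masked))  # entry 0 is effectively removed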

One-class classification

Firstly, many thanks for sharing a very interesting piece of research.

I was just wondering whether such a representation learning framework might also prove effective for one-class classification (or anomaly / novelty detection) tasks. One very simple setting might involve just training a one-class SVDD (Ruff et al., 2018) on top of a frozen base network (trained on, say, ImageNet). It would be great to hear your views!

Kind regards,
Se

Pretrained SavedModel Issues

First of all, thanks a lot for releasing this! It's a huge contribution to the whole industry.

I've explored the pretrained model you released and got it working. I have some small suggestions that could improve its usability:

  1. Use the serving_default signature, which I think is the standard.
  2. Some documentation around the inputs and outputs of the model. I gathered that the inputs are (batch_size, 224, 224, 3), the default output tensor is the embeddings, and the *logits* output tensor is the training-time output; I don't know about the other outputs.

Checkpoints for baseline networks?

Would it be possible to also release the checkpoints for the supervised networks (especially ResNet 4x), which are used as baselines throughout the paper? Except for ResNet-50, checkpoints for the larger-width nets are unavailable on TensorFlow Hub or other pretrained-network sources.

Pretrained checkpoint for projection head

Hi -- I was wondering if it would be possible to include the projection head (for the SimCLR objective) in the model checkpoints. The current checkpoints only seem to include the supervised head ("head_supervised") but not the contrastive head ("head_contrastive").

Only found 'default' signature in the model checkpoints

Hi, I am trying to further evaluate the learned representations from 'init_conv', 'block_group1', etc. I tried loading the pre-trained model using tensorflow_hub with the following code:
module = hub.Module('./checkpoints_ResNet50_1x/ResNet50_1x/hub')
print(module.get_signature_names())
This only gives me the 'default' signature, not the other signatures listed in the README.
Is this the right way to access them?
Thanks!

How to tap the features after the encoder

I have obtained checkpoints after training SimCLRv1 on CIFAR-10 data. How can I now use these checkpoints to tap the features (representations) after the encoder?

Basic question on my understanding

Hi,
I had a question about the underlying paper itself. Not sure where else I could ask.

  • What is the basis for assuming that crops of the same image form a semantically related positive pair, while crops from different images are not semantically related?
    - For example, two crops of the same image could have nothing semantically in common; similarly, crops of two different images could share the same semantics. Wouldn't this affect the learning process? This would be especially true if the object of interest is localized in one small part of the image.

Transfer learning to CIFAR-10 dataset

Thanks, Ting, for open-sourcing the code and checkpoints. Starting from the pre-trained model, I have been trying to reproduce the transfer-learning results on the datasets mentioned in the paper. Given the challenges there, I wonder whether it would be possible to provide some additional instructions on transfer learning, in particular to low-resolution datasets.

In my attempt, I found the details in the paper sufficient to reproduce results within 0.5-1% of the reported accuracy on most transfer-learning datasets and on linear evaluation on ImageNet itself. However, on transfer learning to CIFAR-10, the best accuracy on my end was 75% (the paper reports ~90%). I am using ResNet50-1x as the feature extractor. Unfortunately, both the repo and the paper are quite thin on details when it comes to transfer learning to low-resolution datasets. These are the design choices I used:

  1. Since CIFAR-10 has only 32x32-pixel images, I used constant padding to make them 224x224. Did you follow a different strategy (like upsampling)?
  2. Extract features from the pooling layer after the last residual block ('default'). I also experimented with features from the other blocks; the best accuracy (75%) came from features from the third ResNet block.

Alongside these design choices, I conducted a thorough hyperparameter search, but met no success. I wonder whether you employed different design choices for the reported transfer-learning results on the CIFAR-10/CIFAR-100 datasets?

Mismatch in reported vs resulted evaluation accuracy on ImageNet

On evaluating the released checkpoints, I observe that the accuracy is a bit below the reported numbers (around 1% less). For example, with ResNet50_1x I got 68.2, whereas the reported number is 69.1; for ResNet50_2x, I got 73.4 compared to the reported 74.2. The evaluation strategy is the following:

  1. Using a single center crop following your code, where I first center-crop along each edge with proportion=0.85 and then resize to 224x224.
  2. Using the standard single-center-crop evaluation (the default in PyTorch), where the shortest edge is resized to 256 and a 224x224 center crop is taken. This gives 0.3-0.5% better accuracy than the former approach.

I wonder whether you used multiple random crops or some other evaluation strategy in your results, or whether the released checkpoints differ from the ones used for the results in Table 6 of the paper?
Thanks.

Detailed settings for the linear evaluation

Could you give the detailed settings for the linear evaluation? Thank you very much!

  1. Learning rate and scheduler.
  2. Batch size.
  3. Epochs.
  4. Optimizer: I saw you wrote L-BFGS in the paper, but I did not find the keyword "lbfgs" or "bfgs" in your code.

Thanks a lot!
