
CLeVER


PyTorch implementation of CLeVER (Contrastive Learning Via Equivariant Representation).

(Figure: CLeVER architecture overview)

If you find our CLeVER useful in your research, please star this repository and consider citing:

@article{song2024contrastive,
  title={Contrastive Learning Via Equivariant Representation},
  author={Song, Sifan and Wang, Jinfeng and Zhao, Qiaochu and Li, Xiang and Wu, Dufan and Stefanidis, Angelos and Su, Jionglong and Zhou, S Kevin and Li, Quanzheng},
  journal={arXiv preprint arXiv:2406.00262},
  year={2024}
}

Table of Contents

    1. Updates
    2. Supported Backbone Models
    3. Getting Started
    4. Evaluation
    5. Acknowledgments

1. Updates

  • 30/May/2024: The code repository is publicly available.

2. Supported Backbone Models

The CLeVER framework is currently compatible with augmentation strategies of arbitrary complexity across various mainstream backbone models.

  • ResNet, 2015 (Convolutional Modules)
  • ViT, 2021 (Multi-Head Self-Attention Modules)
  • VMamba, 2024 (2D Selective Scan Modules)
  • We find that VMamba gains surprisingly large benefits from equivariance within the CLeVER framework (and within contrastive / self-supervised learning more broadly). This suggests that integrating equivariant factors not only improves robustness and generalization but also unlocks the potential of newer backbone architectures such as VMamba.

Comparison of various backbone models pre-trained with CLeVER and DINO on ImageNet-100. All results are evaluated under rotational perturbation (Top1-Ori+CJ+R).

(Figure: backbone comparison under rotational perturbation)

The figure reports linear evaluation on ImageNet-100 after 200 epochs of pre-training with CAug, evaluated on original images (Top1-Ori) and on original images with ColorJitter and RandomRotation between -90 and 90 degrees (Top1-Ori+CJ+R). The details are as follows.

| Backbones | Methods | Params | GFLOPs | Top1-Ori | Top1-Ori+CJ+R |
|---|---|---|---|---|---|
| ViT-Tiny | DINO | 5.5M | 1.26G | 66.2 | 62.6 |
| ViT-Tiny | CLeVER | 5.5M | 1.26G | 68.7 | 66.0 |
| ViT-Small | DINO | 21.7M | 4.61G | 73.2 | 71.0 |
| ViT-Small | CLeVER | 21.7M | 4.61G | 75.7 | 73.6 |
| ResNet18 | DINO | 11.2M | 1.83G | 71.5 | 68.7 |
| ResNet18 | CLeVER | 11.2M | 1.83G | 74.2 | 71.3 |
| ResNet50 | DINO | 23.5M | 4.14G | 78.4 | 76.4 |
| ResNet50 | CLeVER | 23.5M | 4.14G | 79.1 | 77.7 |
| VMamba-Tiny | DINO | 29.5M | 4.84G | 80.9 | 79.5 |
| VMamba-Tiny | CLeVER | 29.5M | 4.84G | 83.0 | 81.1 |

For linear evaluation on ImageNet-100 with 200 epochs (trained with BAug, the most common augmentation setting in contrastive learning, and evaluated on original images / original images + ColorJitter + RandomRotation between -90 and 90 degrees), the details are as follows.

| Backbones | Methods | Params | GFLOPs | Top1-Ori | Top1-Ori+CJ+R |
|---|---|---|---|---|---|
| ViT-Tiny | DINO | 5.5M | 1.26G | 71.9 | 46.0 |
| ViT-Tiny | CLeVER | 5.5M | 1.26G | 74.9 | 48.3 |
| ViT-Small | DINO | 21.7M | 4.61G | 75.9 | 50.7 |
| ViT-Small | CLeVER | 21.7M | 4.61G | 76.7 | 50.4 |
| ResNet18 | DINO | 11.2M | 1.83G | 74.4 | 46.8 |
| ResNet18 | CLeVER | 11.2M | 1.83G | 76.8 | 46.5 |
| ResNet50 | DINO | 23.5M | 4.14G | 80.6 | 53.8 |
| ResNet50 | CLeVER | 23.5M | 4.14G | 81.1 | 53.9 |
| VMamba-Tiny | DINO | 29.5M | 4.84G | 83.2 | 53.4 |
| VMamba-Tiny | CLeVER | 29.5M | 4.84G | 83.7 | 54.4 |
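The gap between Top1-Ori and Top1-Ori+CJ+R quantifies how much accuracy a model loses under test-time perturbation. A minimal Python sketch of this computation, using a few rows copied from the BAug table above:

```python
# Robustness gap = Top1-Ori minus Top1-Ori+CJ+R, i.e. the accuracy lost when
# ColorJitter + RandomRotation is applied at test time. Values are copied
# from the BAug-trained table above.
results = {
    ("ResNet18", "DINO"): (74.4, 46.8),
    ("ResNet18", "CLeVER"): (76.8, 46.5),
    ("VMamba-Tiny", "DINO"): (83.2, 53.4),
    ("VMamba-Tiny", "CLeVER"): (83.7, 54.4),
}

def robustness_gap(top1_ori, top1_perturbed):
    """Accuracy drop (in points) under test-time perturbation."""
    return round(top1_ori - top1_perturbed, 1)

for (backbone, method), (ori, pert) in results.items():
    print(f"{backbone} {method}: gap = {robustness_gap(ori, pert)}")
```

Gaps of roughly 27-30 points for BAug-trained models, versus roughly 2-3 points for the CAug-trained models in the first table, illustrate why including rotation in the training augmentations matters for rotational robustness.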

(* Compared with the default augmentation setting used in DINO (i.e., BAug), CAug adds `transforms.RandomRotation(degrees=(-90, 90))` for all input images.)
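As a concrete illustration, a hedged torchvision sketch of the BAug-to-CAug difference; only the added rotation is stated above, the full BAug pipeline lives in the repository, and the ColorJitter parameters shown for the test-time perturbation are illustrative assumptions, not the repository's exact values:

```python
import torchvision.transforms as transforms

# CAug = BAug + a random rotation in [-90, 90] degrees applied to all
# input images (the only stated difference from DINO's default BAug).
caug_extra = transforms.RandomRotation(degrees=(-90, 90))

# The Top1-Ori+CJ+R evaluation perturbs test images with ColorJitter and
# the same rotation range. ColorJitter values here are assumptions.
test_perturbation = transforms.Compose([
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.2, hue=0.1),
    transforms.RandomRotation(degrees=(-90, 90)),
])
```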

3. Getting Started

Installation

We have found that installing an environment identical to that of vmunet allows DINO and CLeVER to run successfully with all mainstream backbones (ResNet, ViT, VMamba).

If you only intend to run DINO or CLeVER with classic backbones (ResNet, ViT), you can simply create and activate the environment with

conda env create -f CLeVER.yml
conda activate CLeVER

Pre-training

Set the configuration for pre-training (you can also directly follow run.sh):

PORT_NUM=25606
GPU_num=4 ## number of GPUs
DEVICES=0,1,2,3
net_name=CLeVER ## dino, CLeVER
backbone_name=vit_tiny ## resnet18, resnet50, vit_tiny, vit_small, vssm2-vmambav2_tiny_224
EPOCH=200
DATASET=IN100 ## Imagenet, IN100
DATASET_PATH=<path to imagenet-100 or imagenet>
OTHER_PARA=("65536" "" "" "" "_reg0.001") ## ("DVR_out_dim: 410/2048/16384/65536" "NA" "NA" "NA" "_reg0.001/blank")
HP1=("0.8") ## separation ratio between the representations of IR and EF (default 0.8)
BATCH=("128") ## 128 for 4 GPUs / 256 for 2 GPUs
AUG_TYPE=("aug1_2") ## augmentation type: aug1=BAug, aug1_2=CAug, aug1_4_2=CAug+ (identical to the manuscript)
SEP_LAMBD=("1.0") ## coefficient of the DVR loss (default 1.0)

Result_dir=<path to output path>
PARA_dir=${DATASET}_ep${EPOCH}/${net_name}_${backbone_name}/${AUG_TYPE[0]}/${BATCH[0]}/${HP1[0]}/
TRIAL_name=${net_name}_${backbone_name}_${BATCH[0]}_${HP1[0]}_${OTHER_PARA[0]}_${SEP_LAMBD[0]}${OTHER_PARA[4]}
mkdir -p ${Result_dir}/${PARA_dir}/${TRIAL_name}/
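The hyperparameter HP1 is described above as the separation ratio between the invariant representation (IR) and the equivariant factor (EF). A minimal, hypothetical sketch of such a ratio-based split on a flat embedding (`split_representation` is our illustrative name; the repository's own code defines the authoritative behavior):

```python
def split_representation(embedding, hp1=0.8):
    """Split a flat embedding into an invariant part (IR) and an
    equivariant factor (EF) by the separation ratio hp1.

    Illustrative only: with hp1 = 0.8, the first 80% of the dimensions
    are treated as IR and the remaining 20% as EF.
    """
    cut = int(len(embedding) * hp1)
    return embedding[:cut], embedding[cut:]

ir, ef = split_representation(list(range(10)), hp1=0.8)
print(len(ir), len(ef))  # 8 2
```

This ratio is what the `--hp1` flag passes to the pre-training and evaluation scripts.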

Pre-training can then be run with:

## Pretraining
CUDA_VISIBLE_DEVICES=${DEVICES} OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=${GPU_num} --master_port=${PORT_NUM} main_dino.py --net ${net_name} --arch ${backbone_name} --data_path ${DATASET_PATH}/train --output_dir ${Result_dir}/${PARA_dir}/${TRIAL_name}/ --epochs ${EPOCH} --batch_size_per_gpu ${BATCH[0]} --aug ${AUG_TYPE[0]} --hp1 ${HP1[0]} --DVR_out_dim ${OTHER_PARA[0]} --sep_lambd ${SEP_LAMBD[0]} --reg_lambd ${OTHER_PARA[4]} --saveckp_freq 100 > ${Result_dir}/${PARA_dir}/${TRIAL_name}/result_pretrain_${TRIAL_name}.txt

4. Evaluation

Linear Probe

Reuse the pre-training configuration (identical to the subsection "Pre-training"), and set the configuration for linear evaluation:

GPU_num_LN=2
DEVICES_LN=0,1
BATCH_LN=128
OTHER_LINEAR_PARA=("sgd" "0.001" "" "100")

Linear evaluation can then be run with:

## For dino and CLeVER
## linear probe using full representation
CUDA_VISIBLE_DEVICES=${DEVICES_LN} OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=${GPU_num_LN} --master_port=${PORT_NUM} eval_linear.py --net ${net_name} --arch ${backbone_name} --pretrained_weights ${Result_dir}/${PARA_dir}/${TRIAL_name}/checkpoint.pth --output_dir ${Result_dir}/${PARA_dir}/${TRIAL_name}/ --trial_name ${TRIAL_name} --data_path ${DATASET_PATH} --dataset_type ${DATASET} --batch_size_per_gpu ${BATCH_LN} --lr ${OTHER_LINEAR_PARA[1]} --epochs ${OTHER_LINEAR_PARA[3]} --hp1 ${HP1[0]} --else_part ALL --n_last_blocks 4 > ${Result_dir}/${PARA_dir}/${TRIAL_name}/result_linear_${OTHER_LINEAR_PARA[1]}_${OTHER_LINEAR_PARA[3]}_${TRIAL_name}.txt

## For CLeVER only
## linear probe using only invariant representation (IR)
CUDA_VISIBLE_DEVICES=${DEVICES_LN} OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=${GPU_num_LN} --master_port=${PORT_NUM} eval_linear.py --net ${net_name} --arch ${backbone_name} --pretrained_weights ${Result_dir}/${PARA_dir}/${TRIAL_name}/checkpoint.pth --output_dir ${Result_dir}/${PARA_dir}/${TRIAL_name}/ --trial_name ${TRIAL_name} --data_path ${DATASET_PATH} --dataset_type ${DATASET} --batch_size_per_gpu ${BATCH_LN} --lr ${OTHER_LINEAR_PARA[1]} --epochs ${OTHER_LINEAR_PARA[3]} --hp1 ${HP1[0]} --else_part main_part --n_last_blocks 4 > ${Result_dir}/${PARA_dir}/${TRIAL_name}/result_linear_hp${HP1[0]}_${OTHER_LINEAR_PARA[1]}_${OTHER_LINEAR_PARA[3]}_${TRIAL_name}.txt

## linear probe using only equivariant factor (EF)
CUDA_VISIBLE_DEVICES=${DEVICES_LN} OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=${GPU_num_LN} --master_port=${PORT_NUM} eval_linear.py --net ${net_name} --arch ${backbone_name} --pretrained_weights ${Result_dir}/${PARA_dir}/${TRIAL_name}/checkpoint.pth --output_dir ${Result_dir}/${PARA_dir}/${TRIAL_name}/ --trial_name ${TRIAL_name} --data_path ${DATASET_PATH} --dataset_type ${DATASET} --batch_size_per_gpu ${BATCH_LN} --lr ${OTHER_LINEAR_PARA[1]} --epochs ${OTHER_LINEAR_PARA[3]} --hp1 ${HP1[0]} --else_part else_part --n_last_blocks 4 > ${Result_dir}/${PARA_dir}/${TRIAL_name}/result_linear_hp${HP1[0]}_else_${OTHER_LINEAR_PARA[1]}_${OTHER_LINEAR_PARA[3]}_${TRIAL_name}.txt

Performance Evaluation with Perturbed Input Images

Reuse the pre-training configuration (subsection "Pre-training") and the linear-evaluation configuration (subsection "Linear Probe"), then set the configuration for evaluation after the linear probe:

GPU_num_EVAL=1
DEVICES_EVAL=0

Evaluation with perturbed input images can then be run with:

TEST_AUG_TYPE=("basic" "aug1" "aug1_2" "aug1_4_2")
for test_aug in {0..3..1}
do
## For dino and CLeVER
## evaluation using full representation
CUDA_VISIBLE_DEVICES=${DEVICES_EVAL} OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=${GPU_num_EVAL} --master_port=${PORT_NUM} eval_linear.py --net ${net_name} --arch ${backbone_name} --pretrained_weights ${Result_dir}/${PARA_dir}/${TRIAL_name}/checkpoint.pth --output_dir ${Result_dir}/${PARA_dir}/${TRIAL_name}/ --trial_name ${TRIAL_name} --data_path ${DATASET_PATH} --dataset_type ${DATASET} --lr ${OTHER_LINEAR_PARA[1]} --epochs ${OTHER_LINEAR_PARA[3]} --hp1 ${HP1[0]} --else_part ALL --n_last_blocks 4 --test_aug_type ${TEST_AUG_TYPE[${test_aug}]} --evaluate --final_eval_weights ${Result_dir}/${PARA_dir}/${TRIAL_name}/checkpoint_linear_${OTHER_LINEAR_PARA[1]}_${OTHER_LINEAR_PARA[3]}_${TRIAL_name}.pth.tar > ${Result_dir}/${PARA_dir}/${TRIAL_name}/only_eval_${TEST_AUG_TYPE[${test_aug}]}_linear_${OTHER_LINEAR_PARA[1]}_${OTHER_LINEAR_PARA[3]}_${TRIAL_name}.txt

## For CLeVER only
## evaluation using only invariant representation (IR)
CUDA_VISIBLE_DEVICES=${DEVICES_EVAL} OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=${GPU_num_EVAL} --master_port=${PORT_NUM} eval_linear.py --net ${net_name} --arch ${backbone_name} --pretrained_weights ${Result_dir}/${PARA_dir}/${TRIAL_name}/checkpoint.pth --output_dir ${Result_dir}/${PARA_dir}/${TRIAL_name}/ --trial_name ${TRIAL_name} --data_path ${DATASET_PATH} --dataset_type ${DATASET} --lr ${OTHER_LINEAR_PARA[1]} --epochs ${OTHER_LINEAR_PARA[3]} --hp1 ${HP1[0]} --else_part main_part --n_last_blocks 4 --test_aug_type ${TEST_AUG_TYPE[${test_aug}]} --evaluate --final_eval_weights ${Result_dir}/${PARA_dir}/${TRIAL_name}/checkpoint_linear_hp${HP1[0]}_${OTHER_LINEAR_PARA[1]}_${OTHER_LINEAR_PARA[3]}_${TRIAL_name}.pth.tar > ${Result_dir}/${PARA_dir}/${TRIAL_name}/only_eval_${TEST_AUG_TYPE[${test_aug}]}_linear_hp${HP1[0]}_${OTHER_LINEAR_PARA[1]}_${OTHER_LINEAR_PARA[3]}_${TRIAL_name}.txt

## evaluation using only equivariant factor (EF)
CUDA_VISIBLE_DEVICES=${DEVICES_EVAL} OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=${GPU_num_EVAL} --master_port=${PORT_NUM} eval_linear.py --net ${net_name} --arch ${backbone_name} --pretrained_weights ${Result_dir}/${PARA_dir}/${TRIAL_name}/checkpoint.pth --output_dir ${Result_dir}/${PARA_dir}/${TRIAL_name}/ --trial_name ${TRIAL_name} --data_path ${DATASET_PATH} --dataset_type ${DATASET} --lr ${OTHER_LINEAR_PARA[1]} --epochs ${OTHER_LINEAR_PARA[3]} --hp1 ${HP1[0]} --else_part else_part --n_last_blocks 4 --test_aug_type ${TEST_AUG_TYPE[${test_aug}]} --evaluate --final_eval_weights ${Result_dir}/${PARA_dir}/${TRIAL_name}/checkpoint_linear_hp${HP1[0]}_else_${OTHER_LINEAR_PARA[1]}_${OTHER_LINEAR_PARA[3]}_${TRIAL_name}.pth.tar > ${Result_dir}/${PARA_dir}/${TRIAL_name}/only_eval_${TEST_AUG_TYPE[${test_aug}]}_linear_hp${HP1[0]}_else_${OTHER_LINEAR_PARA[1]}_${OTHER_LINEAR_PARA[3]}_${TRIAL_name}.txt
done
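The loop above runs every combination of the four test-time augmentation settings and the three representation parts (`ALL`, `main_part`, `else_part`). A pure-Python sketch of the resulting evaluation grid (augmentation names are taken from the script; the BAug/CAug/CAug+ reading follows the pre-training comments, and `basic` is presumably unperturbed images):

```python
# Enumerate the evaluation grid spanned by the loop above.
TEST_AUG_TYPE = ["basic", "aug1", "aug1_2", "aug1_4_2"]  # plain / BAug / CAug / CAug+
PARTS = ["ALL", "main_part", "else_part"]  # full repr. / IR only / EF only

runs = [(aug, part) for aug in TEST_AUG_TYPE for part in PARTS]
print(len(runs))  # 12 evaluation runs per pre-trained checkpoint
```

Note that for DINO only the `ALL` rows apply; the IR-only (`main_part`) and EF-only (`else_part`) evaluations are for CLeVER checkpoints, as the comments in the script indicate.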

Downstream Classification Task

Please follow the configurations and codes in run_ds.sh.

Downstream Video Object Segmentation

To evaluate the representation quality of pre-trained ViT models on the DAVIS 2017 dataset, please follow the instructions in "Evaluation: DAVIS 2017 Video object segmentation" in the official DINO repository.

Downstream Unsupervised Saliency Detection

To evaluate the representation quality of pre-trained ViT models on the ECSSD, DUTS, and DUT-OMRON datasets, please follow the instructions in "4.2 Unsupervised saliency detection" in the official TokenCut repository.

5. Acknowledgments

We thank SimSiam (official; small datasets), DINO (official), TokenCut (official), and VMamba (official) for their public code and released models, and we appreciate their efforts.
