facebookresearch / vicreg

VICReg official code base

License: MIT License

Python 100.00%

vicreg's Issues

Question about the expected range of the loss

In my experiments, cov_loss is about 1e-5 during the first 10 epochs and then drops to 0. What magnitude of cov_loss do you observe? Thank you!

Variance regularization conflicts with paper

Hi!

Reading your code, I've noticed you take the mean of the two variance terms:
https://github.com/facebookresearch/vicreg/blob/main/main_vicreg.py#L207

On the other hand, there's no sign of a division by 2 in Equation (1) of the paper.
I've also found that the default values of λ, µ and ν match the best-performing setup in Table 7, so I believe the code and the paper conflict. Could you please double-check this?

Thanks.
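
To make the size of the discrepancy concrete, here is a minimal plain-Python sketch. It assumes the variance term is a hinge on the regularized standard deviation, max(0, γ − sqrt(Var + ε)), with γ = 1 and ε = 1e-4; the helper names and the toy batches are hypothetical and are not taken from the repository.

```python
import math
import statistics

def variance_term(column_values, gamma=1.0, eps=1e-4):
    """Hinge on the regularized std of one embedding dimension (assumed form)."""
    std = math.sqrt(statistics.variance(column_values) + eps)
    return max(0.0, gamma - std)

def v(z, gamma=1.0, eps=1e-4):
    """Mean hinge over the D dimensions of a batch given as N rows of D values."""
    columns = list(zip(*z))
    return sum(variance_term(c, gamma, eps) for c in columns) / len(columns)

# Two toy "branches" with N = 3 samples and D = 2 dimensions (made-up numbers).
z_a = [[0.1, 0.2], [0.3, 0.1], [0.2, 0.4]]
z_b = [[0.0, 0.5], [0.1, 0.4], [0.3, 0.6]]

mu = 25.0
paper_form = mu * (v(z_a) + v(z_b))          # sum of both branches
code_form = mu * (v(z_a) / 2 + v(z_b) / 2)   # mean of both branches
```

Under these assumptions the mean-based form equals the sum-based form with µ halved, so the two versions differ only by a rescaling of the variance coefficient.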

Cross-modal retrieval on COCO

Hi! Thank you for your work! I have a question about cross-modal retrieval on COCO: I am struggling to reproduce the results reported in the paper. Could you provide more details on your training protocol? Specifically, which coefficients do you use for VICReg, and what kind of expander architecture? Do you do any further downstream training, or do you directly use the encoder embeddings obtained via SSL?

Thank you!

Does this code need a lot of GPU memory?

Hello! When training on the CIFAR-10 dataset with the batch size set to 2048, did you find that it cannot run on two NVIDIA 3090 cards? I get an out-of-memory error.

So I changed the batch size to 256, which still runs out of memory.

Finally, I had no choice but to reduce it to 128 to get it to run.

However, with the SimCLR and SwAV code on the same device, I do not need such a small batch size; I can generally run 2048 or 1024. Is this normal?

My setup is two NVIDIA 3090 cards with 48 GB of GPU memory in total. The training dataset is CIFAR-10.

I would be very grateful for an answer!

C value for SVM for VOC07

"For VOC07 Everingham et al. (2010), we train a linear SVM with LIBLINEAR Fan et al. (2008). The images are center cropped and resized to 224 × 224, and the C values are computed with cross-validation." - this is from the VICReg paper by Bardes et al., ICLR 2022.

I am not able to reproduce the results. It would be useful if the authors provided the code or the optimal C value that worked for them in this case.

Issues in reproducing object detection results on VOC2007+12

Dear VICReg authors, thanks for sharing the code of this great work! I have strictly followed the parameter settings for the object detection task on VOC2007+12 described in the paper and initialized the model with the VICReg model provided in this repo. However, I achieved only 79.0734 mAP50 using detectron2, lower than the reported result of 82.4 in Table 2. Could you please give some guidance or share your detectron2 config file to reproduce the result? Thanks in advance!

Loss curve

Hi, thanks for the great work!

I am trying to implement your method on my own dataset. Would you please post a figure of the loss changes with respect to the training epochs? It would be very helpful.

Thanks!

Coefficients of loss

Hi, in the code the std loss is divided by 2 (unlike in the paper), while the cov loss is not (as in the paper). Are the coefficients of the mse, std and cov losses then still 25.0, 25.0 and 1.0 (as in the paper), rather than 25.0, 50.0 and 1.0?

vicreg/main_vicreg.py

Line 211 in 4e12602

cov_loss = off_diagonal(cov_x).pow_(2).sum().div(

Thanks a lot!

torchvision dependencies mismatch with 'InterpolationMode' class

Hi,
Thanks for the repo!

I saw in the README that the only requirement is pytorch==1.7.1, but I think there is also an implicit requirement on the vision library: torchvision>=0.9.0.

According to the official PyTorch installation guide, PyTorch 1.7.1 should be installed with torchvision 0.8.2. However, according to the torchvision repo, the class InterpolationMode first appears in version 0.9.0.

This leads to an ImportError in augmentations.py.

Do you know the reason behind this and can you confirm which version of torchvision you are using?

Thanks!
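
A small guard can make this kind of mismatch visible before the import fails. The sketch below is plain Python and only compares version strings; the 0.9.0 threshold is the one reported in this issue, and the helper names are hypothetical.

```python
def parse_version(text):
    """Turn '0.8.2' or '0.9.0+cu111' into a tuple of ints such as (0, 8, 2)."""
    core = text.split("+")[0]  # drop a local build tag such as '+cu111'
    parts = []
    for piece in core.split(".")[:3]:
        digits = "".join(ch for ch in piece if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    return tuple(parts)

def has_interpolation_mode(torchvision_version):
    """True if the version is at least 0.9.0 (threshold taken from the issue)."""
    return parse_version(torchvision_version) >= (0, 9, 0)

checks = {v: has_interpolation_mode(v) for v in ("0.8.2", "0.9.0", "0.10.1+cu111")}
```

In a real environment the string to check would presumably come from `torchvision.__version__`.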

Question about Table 12 in the paper

Hi. Thanks for the great work!
I'm trying to reproduce the results of Table 12 (impact of expander dimensionality) in your paper.
Could you tell me which hyperparameters you used in these experiments?

Unable to load the pre-trained models : resnet50() returns tuple not model

Following the instructions in the README, I am trying to download the pretrained models:

import torch
resnet50 = torch.hub.load('facebookresearch/vicreg:main', 'resnet50')

This fails because "tuple" has no attribute load_state_dict(); the error points to line 21 of vicreg/hubconf.py.

In fact, the function resnet50() defined in /main/resnet.py returns a tuple:

def resnet50(**kwargs):
    return ResNet(Bottleneck, [3, 4, 6, 3], **kwargs), 2048

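To make this work, I had to modify the function resnet50() in hubconf.py:

import torch
import resnet  # resnet.py from the vicreg repository

def resnet50(pretrained=True, **kwargs):
    # resnet.resnet50() returns a (model, embedding_dim) tuple
    model = resnet.resnet50(**kwargs)
    if pretrained:
        state_dict = torch.hub.load_state_dict_from_url(
            url="https://dl.fbaipublicfiles.com/vicreg/resnet50.pth",
            map_location='cpu',
        )
        # Grab the first element of the tuple, which is the actual model
        model[0].load_state_dict(state_dict, strict=True)
    # Note: this still returns the tuple; use model[0] to get the network
    return model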

TypeError: BasicBlock.__init__() got an unexpected keyword argument 'last_activation'

I got the following error, when I ran:
python -m torch.distributed.launch --nproc_per_node=8 main_vicreg.py --data-dir <path to my data> --exp-dir <path to my output> --arch resnet34 --epochs 10 --batch-size 512 --base-lr 0.3

Error output

Traceback (most recent call last):
  File "/Users/en_tetteh/SSL/vicreg/main_vicreg.py", line 340, in <module>
    main(args)
  File "/Users/en_tetteh/SSL/vicreg/main_vicreg.py", line 106, in main
    model = VICReg(args)#.cuda(gpu)
  File "/Users/en_tetteh/SSL/vicreg/main_vicreg.py", line 191, in __init__
    self.backbone, self.embedding = resnet.__dict__[args.arch](
  File "/Users/en_tetteh/SSL/vicreg/resnet.py", line 300, in resnet34
    return ResNet(BasicBlock, [3, 4, 6, 3], **kwargs), 512
  File "/Users/en_tetteh/SSL/vicreg/resnet.py", line 191, in __init__
    self.layer1 = self._make_layer(block, num_out_filters, layers[0])
  File "/Users/en_tetteh/SSL/vicreg/resnet.py", line 253, in _make_layer
    block(
TypeError: BasicBlock.__init__() got an unexpected keyword argument 'last_activation'

Upon adding last_activation="relu" to the __init__() of the BasicBlock, the error was rectified.
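
The failure mode and the workaround can be reproduced in isolation. The sketch below uses hypothetical stand-in classes, not the repository's actual BasicBlock, and only shows why a builder that always forwards last_activation needs the block constructor to accept it.

```python
class BlockWithoutOption:
    """Hypothetical stand-in for a block whose __init__ lacks last_activation."""

    def __init__(self, inplanes, planes):
        self.inplanes, self.planes = inplanes, planes

class BlockWithOption:
    """Same stand-in, extended with the keyword that the builder passes."""

    def __init__(self, inplanes, planes, last_activation="relu"):
        self.inplanes, self.planes = inplanes, planes
        self.last_activation = last_activation

def make_layer(block):
    # Mimics a layer builder that always forwards last_activation to the block.
    return block(64, 64, last_activation="relu")

try:
    make_layer(BlockWithoutOption)
    failed = False
except TypeError:
    failed = True  # same kind of error as in the traceback above

fixed = make_layer(BlockWithOption)
```

Whether the real BasicBlock should also act on the value of last_activation, rather than just accept it, is not something this sketch settles.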

Discrepancy with paper for covariance loss

Hi,

I noticed a difference between the code (main_vicreg.py#L202) and Algorithm 1 in the paper, where the mean is subtracted from $x$ and $y$ only after the variance loss is computed.

Could you check and let me know which one is correct?
Thank you for your time,
Paolo
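
For the variance term alone, the ordering should not change the result, because the variance of a set of values is unaffected by subtracting their mean. A minimal plain-Python check with made-up numbers:

```python
import statistics

values = [0.2, 1.5, -0.7, 3.1]  # one embedding dimension across a toy batch
mean = statistics.fmean(values)
centered = [value - mean for value in values]

var_raw = statistics.variance(values)
var_centered = statistics.variance(centered)
```

The two variances agree up to floating-point error. Centering does matter for a covariance computed as a plain matrix product of the embeddings, so this check says nothing about which ordering the authors intended, only that the variance term is insensitive to it.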

What is the base learning rate for batch size 512?

The original paper states that the base lr is 0.4.

However, the 8-GPU single-node training command uses 0.3:

python -m torch.distributed.launch --nproc_per_node=8 main_vicreg.py --data-dir /path/to/imagenet/ --exp-dir /path/to/experiment/ --arch resnet50 --epochs 100 --batch-size 512 --base-lr 0.3

Which one is correct?
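
To see what the two values mean in practice, the sketch below assumes the common linear scaling rule, effective lr = base_lr × batch_size / 256. That rule is an assumption here, not something stated in this issue, and the function name is hypothetical.

```python
def scaled_lr(base_lr, batch_size, reference_batch_size=256):
    """Linear scaling rule (assumed): lr grows proportionally with batch size."""
    return base_lr * batch_size / reference_batch_size

lr_from_command = scaled_lr(0.3, 512)  # base lr used in the command above
lr_from_paper = scaled_lr(0.4, 512)    # base lr quoted from the paper
```

Under that assumption the effective learning rates at batch size 512 would be 0.6 and 0.8 respectively.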

DxD instead of NxN in the computation of the covariance matrix of Z?

Hi, I came across the implementation of the covariance matrix of Z while trying to use VICReg in my project. I feel like I am missing something here, but I wonder if the implementation should be x @ x.T instead of x.T @ x?

Thanks for any information you can provide.
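
The shapes may help here. Assuming x holds one embedding per row (N samples by D dimensions) and has already been centered per dimension, x.T @ x gives a D×D matrix relating embedding dimensions, while x @ x.T gives an N×N matrix relating samples. A plain-Python sketch with hypothetical helpers and toy numbers:

```python
def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    b_t = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in b_t] for row in a]

# N = 4 samples, D = 2 dimensions; each column already has zero mean.
x = [[1.0, -2.0], [-1.0, 0.0], [2.0, 1.0], [-2.0, 1.0]]
n = len(x)

# D x D: covariance between embedding dimensions.
feature_cov = [[v / (n - 1) for v in row] for row in matmul(transpose(x), x)]
# N x N: Gram matrix between samples.
sample_gram = matmul(x, transpose(x))

shapes = (len(feature_cov), len(feature_cov[0]), len(sample_gram), len(sample_gram[0]))
```

If the goal of the covariance term is to decorrelate the dimensions of the embedding, the D×D form is the one whose off-diagonal entries express that.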

Non-symmetric Augmentations

Hi,
In the paper, you mention that the augmentations are symmetrized, but in the code the probabilities for blur and solarization are not symmetric (similar to BYOL). Is there a reason for that?

How to get image embedding with vicreg?

Hi, thanks for your work!
How can I get image features from VICReg models?
This is what I am doing:

import torch
from PIL import Image
from torchvision import transforms

vicreg_model = torch.hub.load('facebookresearch/vicreg:main', 'resnet50')
vicreg_model.eval()  # use running BatchNorm statistics at inference time
data_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])

# `image` is the path to the input image
data = torch.unsqueeze(data_transforms(Image.open(image).convert('RGB')), dim=0)
with torch.no_grad():
    embedding = vicreg_model(data.float()).cpu()

A question for further work: how can I implement self-supervised metric learning with VICReg on a custom dataset? I need good embeddings for images from my own dataset.

Can't load the full checkpoint

I downloaded the available checkpoint for ResNet-50 through the provided link: https://dl.fbaipublicfiles.com/vicreg/resnet50_fullckpt.pth

But upon loading the checkpoint, the following error appears:

>>> torch.load(checkpoint_path)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/anaconda3/envs/pytorch_env/lib/python3.8/site-packages/torch/serialization.py", line 594, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/home/user/anaconda3/envs/pytorch_env/lib/python3.8/site-packages/torch/serialization.py", line 853, in _load
    result = unpickler.load()
AttributeError: Can't get attribute 'exclude_bias_and_norm' on <module '__main__' (built-in)>

Same error happens with the other checkpoints. Is there something I am doing wrong? Appreciate the help!

Using pytorch version 1.7.1 and torchvision 0.8.2
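
The traceback suggests that the checkpoint pickles a reference to a function named exclude_bias_and_norm that lived in the training script's `__main__` module, so unpickling fails when that name cannot be found. The stdlib-only sketch below reproduces the mechanism with a hand-written pickle and resolves the name through a custom unpickler; the stand-in function body is an assumption and only needs to exist for the name to resolve.

```python
import io
import pickle

# Hand-written protocol-0 pickle that refers to the global
# __main__.exclude_bias_and_norm, mimicking what the traceback implies.
payload = b"c__main__\nexclude_bias_and_norm\n."

def exclude_bias_and_norm(p):
    """Stand-in (assumed body); it exists so that the pickled name resolves."""
    return p.ndim == 1

class StubUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Redirect the missing global to the local stand-in.
        if (module, name) == ("__main__", "exclude_bias_and_norm"):
            return exclude_bias_and_norm
        return super().find_class(module, name)

restored = StubUnpickler(io.BytesIO(payload)).load()
```

A simpler variant of the same idea would be to define a top-level function with that name in the script that calls torch.load, so that the lookup in `__main__` succeeds.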

Hyperparameter recommendations for Resnet18 and batch size of 256

Hi,

I can push up to a batch size of 256 on the Resnet18 backbone.
Could you recommend hyperparameters (base learning rate, projector dimensions, similarity/std/cov coefficients, etc.) that would be appropriate for the VICReg framework in this setting?

Regards

Readme commands to load pretrained models on PyTorch Hub (resnet50x2 and resnet200x2) not working

Hi,

When running the commands given in the README to load the pretrained models from PyTorch Hub, they fail for resnet50x2 and resnet200x2. The issue is that the callables that load these models in hubconf.py are not named 'resnet50x2' and 'resnet200x2' but 'resnet50w2' and 'resnet200w2', respectively.

I got it working by replacing the original commands shown in the README with the following:

import torch
resnet50 = torch.hub.load('facebookresearch/vicreg:main', 'resnet50')
resnet50x2 = torch.hub.load('facebookresearch/vicreg:main', 'resnet50w2')
resnet200x2 = torch.hub.load('facebookresearch/vicreg:main', 'resnet200w2')

Here I just replaced the 'x' with 'w' in 'resnet50x2' and 'resnet200x2'.

Thank you,
Xavi

Loss becomes Nan suddenly.

The loss becomes NaN after some number of epochs, and then the model never converges. This happens randomly. I am training on a custom dataset with a batch size of 2048 and a base lr of 0.2 on 4 A100s.
{"epoch": 299, "step": 264121, "loss": 14.823115348815918, "time": 33933, "lr": 1.2852132052592342}
{"epoch": 299, "step": 264163, "loss": 14.84267520904541, "time": 33993, "lr": 1.2851170355212638}
{"epoch": 299, "step": 264205, "loss": NaN, "time": 34054, "lr": 1.2850208546990547}
{"epoch": 299, "step": 264245, "loss": 14.683608055114746, "time": 34115, "lr": 1.2849292436129383}
{"epoch": 300, "step": 264300, "loss": 14.624290466308594, "time": 34213, "lr": 1.2848032619604233}
{"epoch": 300, "step": 264335, "loss": 14.825407981872559, "time": 34275, "lr": 1.2847230819273718}
{"epoch": 300, "step": 264376, "loss": 14.545491218566895, "time": 34336, "lr": 1.2846291469640327}
{"epoch": 300, "step": 264417, "loss": 14.715323448181152, "time": 34397, "lr": 1.2845352014486329}
{"epoch": 300, "step": 264458, "loss": 54.99197769165039, "time": 34458, "lr": 1.284441245383221}
{"epoch": 300, "step": 264501, "loss": 22.475656509399414, "time": 34518, "lr": 1.2843426947628278}
{"epoch": 300, "step": 264542, "loss": 23.183082580566406, "time": 34579, "lr": 1.2842487170891577}
{"epoch": 300, "step": 264583, "loss": 23.91857147216797, "time": 34640, "lr": 1.284154728871724}
{"epoch": 300, "step": 264626, "loss": 24.39642906188965, "time": 34701, "lr": 1.284056144537631}
{"epoch": 300, "step": 264668, "loss": 24.610559463500977, "time": 34762, "lr": 1.2839598416708278}
{"epoch": 300, "step": 264711, "loss": 24.703632354736328, "time": 34823, "lr": 1.2838612344228337}
{"epoch": 300, "step": 264753, "loss": 24.7344970703125, "time": 34883, "lr": 1.283764909179525}
{"epoch": 300, "step": 264793, "loss": 24.75002670288086, "time": 34944, "lr": 1.283673160576227}
{"epoch": 300, "step": 264835, "loss": 24.750017166137695, "time": 35005, "lr": 1.2835768137547283}
{"epoch": 300, "step": 264878, "loss": 24.750001907348633, "time": 35066, "lr": 1.2834781615145805}
{"epoch": 300, "step": 264921, "loss": 24.75, "time": 35127, "lr": 1.283379497695405}
{"epoch": 300, "step": 264963, "loss": 24.75, "time": 35187, "lr": 1.2832831172076795}
{"epoch": 300, "step": 265005, "loss": 24.75, "time": 35248, "lr": 1.2831867256776874}
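
One possible reading of the 24.75 plateau is a complete collapse of the embeddings to a constant. The arithmetic below is a sketch under several assumptions that are not stated in this issue: the coefficients 25 / 25 / 1 mentioned elsewhere on this page, a hinge target of γ = 1, an ε of 1e-4 inside the square root, and a variance term averaged over the two branches.

```python
import math

std_coeff = 25.0  # assumed weight of the variance term
gamma = 1.0       # assumed hinge target for the standard deviation
eps = 1e-4        # assumed constant added to the variance before the sqrt

# With constant embeddings the per-dimension variance is 0 in both branches.
collapsed_std = math.sqrt(0.0 + eps)
hinge = max(0.0, gamma - collapsed_std)

std_loss = hinge / 2 + hinge / 2  # mean over the two branches
# Identical embeddings make the invariance and covariance terms zero as well.
plateau = std_coeff * std_loss
```

Under these assumptions the total loss at collapse comes out at 24.75, which matches the value the log settles on.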

Regarding the variance regularization term

Thanks for the great work! It is stated in the paper (Sec.4.1) that the hinge function encourages the variance to be equal to \gamma. I think it should be above \gamma instead. Is that correct? Will there be a situation where the variance becomes too large?
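
A hinge of the form max(0, γ − S) is one-sided, which can be checked directly. The sketch below assumes that form with γ = 1; the function name is hypothetical.

```python
def hinge(std, gamma=1.0):
    """Penalty is positive only while the standard deviation is below gamma."""
    return max(0.0, gamma - std)

penalties = {std: hinge(std) for std in (0.5, 1.0, 2.0, 10.0)}
```

The penalty is 0.5 at a standard deviation of 0.5 and zero at 1.0, 2.0 and 10.0, so on its own this term pushes the standard deviation up to at least γ and does nothing to bound it from above. Whether the other loss terms keep the variance from growing in practice is a separate question.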

Transfer learning results

I used this code to try to reproduce the classification results in Table 2 of the paper (transfer learning results).

However, my results are quite different.

Dataset           Paper       My result
INat18            47%         10%
Places205         54.3%       41%

Table2: "linear classification tasks on top of frozen representations"
C ADDITIONAL IMPLEMENTATION DETAILS
C.3 TRANSFER LEARNING
Linear classification : "experiment detail"

Following the sentence above and paragraph C.3, I tried two datasets (INat18 and Places205):
1. Load the ImageNet-pretrained weights from this repo.
2. Freeze all weights in the ResNet-50 backbone (they are not updated at all).
3. Train a linear layer on top.

Am I missing anything?
Is this the right procedure?