vicreg's People

Contributors

adrien987k

vicreg's Issues

Cross-modal retrieval on COCO

Hi! Thank you for your work! I have a question regarding cross-modal retrieval on COCO: I am struggling to reproduce the results reported in the paper. Could you provide more details on your training protocol? Which coefficients do you use for VICReg, what kind of expander architecture do you use, and do you do any further downstream training or directly use the encoder embeddings obtained via SSL?

Thank you!

Does this code need large GPU memory?

Hello! When training on the CIFAR-10 dataset with the batch size set to 2048, did you find that the code cannot run on two NVIDIA 3090 cards? I get an out-of-memory error.

So I changed the batch size to 256, which still runs out of memory.

Finally, I had no choice but to reduce it to 128 to get it to run.

However, with the SimCLR and SwAV code, the batch size I can use on the same device is not nearly this small; I can generally run 2048 or 1024. Is this normal?

My setup is two NVIDIA 3090 cards with 48 GB of GPU memory in total. The training dataset is CIFAR-10.

I would be very grateful for an answer!

C value for SVM for VOC07

"For VOC07 Everingham et al. (2010), we train a linear SVM with LIBLINEAR Fan et al. (2008). The images are center cropped and resized to 224 × 224, and the C values are computed with cross-validation." - this is from the VICReg paper by Bardes et al., ICLR 2022.

I am not able to reproduce the results. It would be useful if the authors provided the code or the optimal C value that worked for them in this case.

Issues in reproducing object detection results on VOC2007+12

Dear VICReg authors, thanks for sharing the code for this great work! I have strictly followed the parameter settings for the object detection task on VOC2007+12 described in the paper and initialized the model with the VICReg model provided in this repo. However, I achieved only 79.0734 mAP50 using detectron2, lower than the 82.4 reported in Table 2. Could you please give some guidance or share your detectron2 config file so that I can reproduce the result? Thanks in advance!

Loss curve

Hi, thanks for the great work!

I am trying to implement your method on my own dataset. Could you please post a plot of the loss over the training epochs? It would be very helpful.

Thanks!

Coefficients of loss

  • Hi, in the code the std loss is divided by 2 (unlike in the paper), while the cov loss is not (as in the paper). So are the coefficients of the MSE, std, and cov losses still 25.0, 25.0, and 1.0 (as in the paper), rather than 25.0, 50.0, and 1.0?
    cov_loss = off_diagonal(cov_x).pow_(2).sum().div(

    Thanks a lot!
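
The arithmetic behind this question can be sketched as follows. This is a minimal NumPy illustration, not the repository's code: `std_hinge` and `weighted_std_term` are hypothetical helpers, and γ = 1 and ε = 1e-4 are assumed values. It shows only that halving the summed std term with coefficient 50 gives the same value as the unhalved sum with coefficient 25; which convention the reported results use is for the authors to confirm.

```python
import numpy as np


def std_hinge(z, gamma=1.0, eps=1e-4):
    """Mean hinge on the per-dimension standard deviation of a batch z (N x D)."""
    std = np.sqrt(z.var(axis=0, ddof=1) + eps)
    return np.maximum(0.0, gamma - std).mean()


def weighted_std_term(x, y, coeff, halve):
    """Weighted std term over both branches, optionally divided by 2."""
    total = std_hinge(x) + std_hinge(y)
    return coeff * (total / 2 if halve else total)


# Made-up embeddings with standard deviation well below gamma,
# so the hinge is active and the term is non-zero.
rng = np.random.default_rng(0)
x = 0.5 * rng.standard_normal((64, 16))
y = 0.5 * rng.standard_normal((64, 16))

halved_50 = weighted_std_term(x, y, coeff=50.0, halve=True)
summed_25 = weighted_std_term(x, y, coeff=25.0, halve=False)
```

Here `halved_50` and `summed_25` are equal, so a coefficient of 25 applied to the halved term corresponds to 12.5 on the unhalved sum.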

torchvision dependencies mismatch with 'InterpolationMode' class

Hi,
Thanks for the repo!

I saw in the README that the only requirement is pytorch==1.7.1, but I think there is also an implicit requirement on the vision library: torchvision >= 0.9.0.

According to the official PyTorch installation guide, pytorch 1.7.1 should be installed with torchvision 0.8.2 (see doc). However, according to the torchvision repo, the class InterpolationMode only appears from version 0.9.0 onward.

This leads to an ImportError in augmentations.py.

Do you know the reason behind this and can you confirm which version of torchvision you are using?

Thanks!
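
Until the intended version is confirmed, a guard like the following might help surface the mismatch early. It is a sketch using only the standard library: `parse_version` and `supports_interpolation_mode` are hypothetical helpers, and the 0.9.0 threshold is taken from the observation above.

```python
import itertools


def parse_version(version):
    """Turn a string such as '0.8.2+cu110' into (0, 8, 2), ignoring build suffixes."""
    core = version.split("+")[0]
    parts = []
    for piece in core.split("."):
        digits = "".join(itertools.takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def supports_interpolation_mode(torchvision_version):
    """True if this torchvision version is at least 0.9.0."""
    return parse_version(torchvision_version) >= (0, 9, 0)


# Typical use (requires torchvision to be installed):
#   import torchvision
#   if not supports_interpolation_mode(torchvision.__version__):
#       raise RuntimeError("augmentations.py needs torchvision >= 0.9.0")
```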

Question about Table 12 in the paper

Hi. Thanks for the great work!
I'm trying to reproduce the results of Table 12 (impact of expander dimensionality) in your paper.
Could you tell me which hyperparameters you used in these experiments?

Unable to load the pre-trained models : resnet50() returns tuple not model

Following the instructions on the README I am trying to download the pretrained models

import torch
resnet50 = torch.hub.load('facebookresearch/vicreg:main', 'resnet50')

This fails because a tuple has no attribute load_state_dict(); the error points to line 21 of vicreg/hubconf.py.

In fact, the function resnet50() defined in /main/resnet.py returns a tuple:

def resnet50(**kwargs):
    return ResNet(Bottleneck, [3, 4, 6, 3], **kwargs), 2048

In order to make this work, I had to modify the function resnet50() in hubconf.py:


def resnet50(pretrained=True, **kwargs):
    # resnet.resnet50 returns (model, embedding_dim); keep only the model
    model, _ = resnet.resnet50(**kwargs)
    if pretrained:
        state_dict = torch.hub.load_state_dict_from_url(
            url="https://dl.fbaipublicfiles.com/vicreg/resnet50.pth",
            map_location='cpu',
        )
        model.load_state_dict(state_dict, strict=True)
    return model

TypeError: BasicBlock.__init__() got an unexpected keyword argument 'last_activation'

I got the following error when I ran:
python -m torch.distributed.launch --nproc_per_node=8 main_vicreg.py --data-dir <path to my data> --exp-dir <path to my output> --arch resnet34 --epochs 10 --batch-size 512 --base-lr 0.3

Error output

Traceback (most recent call last):
  File "/Users/en_tetteh/SSL/vicreg/main_vicreg.py", line 340, in <module>
    main(args)
  File "/Users/en_tetteh/SSL/vicreg/main_vicreg.py", line 106, in main
    model = VICReg(args)#.cuda(gpu)
  File "/Users/en_tetteh/SSL/vicreg/main_vicreg.py", line 191, in __init__
    self.backbone, self.embedding = resnet.__dict__[args.arch](
  File "/Users/en_tetteh/SSL/vicreg/resnet.py", line 300, in resnet34
    return ResNet(BasicBlock, [3, 4, 6, 3], **kwargs), 512
  File "/Users/en_tetteh/SSL/vicreg/resnet.py", line 191, in __init__
    self.layer1 = self._make_layer(block, num_out_filters, layers[0])
  File "/Users/en_tetteh/SSL/vicreg/resnet.py", line 253, in _make_layer
    block(
TypeError: BasicBlock.__init__() got an unexpected keyword argument 'last_activation'

Adding a last_activation="relu" parameter to BasicBlock.__init__() resolved the error.

Discrepancy with paper for covariance loss

Hi,

I noticed a difference between the code (main_vicreg.py#L202) and Algorithm 1 in the paper, where the mean is subtracted from $x$ and $y$ after the variance loss is computed.

Could you check and let me know which one is correct?
Thank you for your time,
Paolo
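
One property that may be relevant here, sketched in NumPy with made-up data: the per-dimension variance of a batch does not change when the batch mean is subtracted, so the variance term alone takes the same value whether centering happens before or after it. Whether the two versions differ in other respects is a separate question for the authors.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical batch of embeddings (N x D) with a non-zero mean.
x = rng.standard_normal((32, 8)) * 2.0 + 5.0

x_centered = x - x.mean(axis=0)

# Variance computed on the raw batch and on the centered batch.
var_before = x.var(axis=0, ddof=1)
var_after = x_centered.var(axis=0, ddof=1)

same_variance = np.allclose(var_before, var_after)
```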

What is the base learning rate for batch size 512?

The original paper states that the base lr is 0.4.

However, the 8-GPU single-node training script uses 0.3:

python -m torch.distributed.launch --nproc_per_node=8 main_vicreg.py --data-dir /path/to/imagenet/ --exp-dir /path/to/experiment/ --arch resnet50 --epochs 100 --batch-size 512 --base-lr 0.3

Which one is correct?
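
For reference, the two candidate values translate into different effective learning rates if the linear scaling rule lr = base_lr × batch_size / 256 applies. That rule is an assumption in this sketch, and `effective_lr` is a hypothetical helper; warmup and decay schedules are ignored.

```python
def effective_lr(base_lr, batch_size, reference_batch=256):
    """Peak learning rate under the (assumed) linear scaling rule."""
    return base_lr * batch_size / reference_batch


lr_from_script = effective_lr(0.3, 512)  # value used in the README command
lr_from_paper = effective_lr(0.4, 512)   # value quoted from the paper
```

Under this assumption the script would peak at 0.6 and the paper's value at 0.8, so the choice is not cosmetic.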

Non-symmetric Augmentations

Hi,
In the paper, you mention that the augmentations are symmetrized, but in the code the probabilities for blur and solarization are not symmetric (similar to BYOL). Is there a reason for that?

How to get image embedding with vicreg?

Hi, thanks for your work!
How can I get image features from VICReg models?
I am doing:

import torch
from PIL import Image
from torchvision import transforms

vicreg_model = torch.hub.load('facebookresearch/vicreg:main', 'resnet50')
vicreg_model.eval()  # inference mode for BatchNorm layers
data_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])

# `image` is the path to an image file
data = torch.unsqueeze(data_transforms(Image.open(image).convert('RGB')), dim=0)
with torch.no_grad():
    embedding = vicreg_model(data.float()).cpu()

A question for further work: how can I implement self-supervised metric learning with VICReg on a custom dataset? I need good embeddings for images from my own dataset.

Can't load the full checkpoint

I downloaded the available checkpoint for ResNet-50 through the provided link: https://dl.fbaipublicfiles.com/vicreg/resnet50_fullckpt.pth

But upon loading the checkpoint, the following error appears:

>>> torch.load(checkpoint_path)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/anaconda3/envs/pytorch_env/lib/python3.8/site-packages/torch/serialization.py", line 594, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/home/user/anaconda3/envs/pytorch_env/lib/python3.8/site-packages/torch/serialization.py", line 853, in _load
    result = unpickler.load()
AttributeError: Can't get attribute 'exclude_bias_and_norm' on <module '__main__' (built-in)>

Same error happens with the other checkpoints. Is there something I am doing wrong? Appreciate the help!

Using pytorch version 1.7.1 and torchvision 0.8.2
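
The traceback suggests that the checkpoint pickles a reference to a function named `exclude_bias_and_norm` that lived in the `__main__` module when the file was saved. A possible workaround, sketched below, is to make a function with that name available on `__main__` before calling `torch.load`. The function body is a placeholder assumption, and `register_stub` and `load_full_checkpoint` are hypothetical helpers; importing the real definition from the training script, if it is importable, would be the more faithful option.

```python
import sys


def exclude_bias_and_norm(p):
    # Placeholder body (assumption): the name must resolve for unpickling to succeed.
    return p.ndim == 1


def register_stub():
    """Expose the function on __main__, where pickle looks it up by name."""
    main_module = sys.modules["__main__"]
    if not hasattr(main_module, "exclude_bias_and_norm"):
        main_module.exclude_bias_and_norm = exclude_bias_and_norm


def load_full_checkpoint(path):
    """Load a checkpoint after making sure the pickled name can be resolved."""
    import torch  # imported lazily; only needed when a checkpoint is actually loaded

    register_stub()
    return torch.load(path, map_location="cpu")


register_stub()
```

Calling `load_full_checkpoint` with the path to the downloaded `resnet50_fullckpt.pth` would then be the next thing to try.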

Hyperparameter recommendations for Resnet18 and batch size of 256

Hi,

I can push up to a batch size of 256 on the Resnet18 backbone.
Could you recommend hyperparameters (base learning rate, projector dimensions, similarity/std/cov coefficients, etc.) that would be appropriate for the VICReg framework?

Regards

Readme commands to load pretrained models on PyTorch Hub (resnet50x2 and resnet200x2) not working

Hi,

When I run the commands given in the README to load the pretrained models from PyTorch Hub, they fail for resnet50x2 and resnet200x2. The issue is that the callables that load these models in hubconf.py are not named 'resnet50x2' and 'resnet200x2' but 'resnet50w2' and 'resnet200w2', respectively.

I got it working by replacing the original commands shown in the README with the following:

import torch
resnet50 = torch.hub.load('facebookresearch/vicreg:main', 'resnet50')
resnet50x2 = torch.hub.load('facebookresearch/vicreg:main', 'resnet50w2')
resnet200x2 = torch.hub.load('facebookresearch/vicreg:main', 'resnet200w2')

I just replaced the 'x' with 'w' in 'resnet50x2' and 'resnet200x2'.

Thank you,
Xavi

Loss becomes NaN suddenly.

The loss becomes NaN after some number of epochs, and then the model never converges. This happens randomly. I am training on a custom dataset with a batch size of 2048 and a base lr of 0.2 on 4 A100s.

{"epoch": 299, "step": 264121, "loss": 14.823115348815918, "time": 33933, "lr": 1.2852132052592342}
{"epoch": 299, "step": 264163, "loss": 14.84267520904541, "time": 33993, "lr": 1.2851170355212638}
{"epoch": 299, "step": 264205, "loss": NaN, "time": 34054, "lr": 1.2850208546990547}
{"epoch": 299, "step": 264245, "loss": 14.683608055114746, "time": 34115, "lr": 1.2849292436129383}
{"epoch": 300, "step": 264300, "loss": 14.624290466308594, "time": 34213, "lr": 1.2848032619604233}
{"epoch": 300, "step": 264335, "loss": 14.825407981872559, "time": 34275, "lr": 1.2847230819273718}
{"epoch": 300, "step": 264376, "loss": 14.545491218566895, "time": 34336, "lr": 1.2846291469640327}
{"epoch": 300, "step": 264417, "loss": 14.715323448181152, "time": 34397, "lr": 1.2845352014486329}
{"epoch": 300, "step": 264458, "loss": 54.99197769165039, "time": 34458, "lr": 1.284441245383221}
{"epoch": 300, "step": 264501, "loss": 22.475656509399414, "time": 34518, "lr": 1.2843426947628278}
{"epoch": 300, "step": 264542, "loss": 23.183082580566406, "time": 34579, "lr": 1.2842487170891577}
{"epoch": 300, "step": 264583, "loss": 23.91857147216797, "time": 34640, "lr": 1.284154728871724}
{"epoch": 300, "step": 264626, "loss": 24.39642906188965, "time": 34701, "lr": 1.284056144537631}
{"epoch": 300, "step": 264668, "loss": 24.610559463500977, "time": 34762, "lr": 1.2839598416708278}
{"epoch": 300, "step": 264711, "loss": 24.703632354736328, "time": 34823, "lr": 1.2838612344228337}
{"epoch": 300, "step": 264753, "loss": 24.7344970703125, "time": 34883, "lr": 1.283764909179525}
{"epoch": 300, "step": 264793, "loss": 24.75002670288086, "time": 34944, "lr": 1.283673160576227}
{"epoch": 300, "step": 264835, "loss": 24.750017166137695, "time": 35005, "lr": 1.2835768137547283}
{"epoch": 300, "step": 264878, "loss": 24.750001907348633, "time": 35066, "lr": 1.2834781615145805}
{"epoch": 300, "step": 264921, "loss": 24.75, "time": 35127, "lr": 1.283379497695405}
{"epoch": 300, "step": 264963, "loss": 24.75, "time": 35187, "lr": 1.2832831172076795}
{"epoch": 300, "step": 265005, "loss": 24.75, "time": 35248, "lr": 1.2831867256776874}
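
Two things in this log can be checked mechanically. The sketch below uses only the standard library. It finds the first record with a non-finite loss, and it computes the value the std term would take if every embedding dimension had zero variance. The second part assumes a std coefficient of 25, γ = 1, ε = 1e-4, and a std term averaged over the two branches; these are assumptions, not values confirmed in this thread. `first_non_finite_step` and `collapsed_std_term` are hypothetical helpers.

```python
import json
import math

# Three records copied from the log in this issue.
log_lines = [
    '{"epoch": 299, "step": 264163, "loss": 14.84267520904541, "time": 33993, "lr": 1.2851170355212638}',
    '{"epoch": 299, "step": 264205, "loss": NaN, "time": 34054, "lr": 1.2850208546990547}',
    '{"epoch": 299, "step": 264245, "loss": 14.683608055114746, "time": 34115, "lr": 1.2849292436129383}',
]


def first_non_finite_step(lines):
    """Return the step of the first record whose loss is NaN or infinite."""
    for line in lines:
        record = json.loads(line)  # Python's json module accepts the NaN literal
        if not math.isfinite(record["loss"]):
            return record["step"]
    return None


def collapsed_std_term(coeff=25.0, gamma=1.0, eps=1e-4):
    """Weighted std term when both branches have zero variance in every dimension."""
    std = math.sqrt(0.0 + eps)
    per_branch = max(0.0, gamma - std)
    return coeff * (per_branch / 2 + per_branch / 2)


nan_step = first_non_finite_step(log_lines)
plateau = collapsed_std_term()
```

Under these assumptions `plateau` comes out at 24.75, the value the loss settles on at the end of the log. With the invariance and covariance terms at zero, that would be consistent with the embeddings having collapsed to a constant after the spike, though the log alone does not prove it.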

Regarding the variance regularization term

Thanks for the great work! It is stated in the paper (Sec.4.1) that the hinge function encourages the variance to be equal to \gamma. I think it should be above \gamma instead. Is that correct? Will there be a situation where the variance becomes too large?
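
A small numeric check of the hinge as described, max(0, γ − s) for a standard deviation s, may make the point concrete. It assumes γ = 1 for illustration, and `hinge` is a hypothetical helper.

```python
def hinge(std, gamma=1.0):
    """Hinge penalty on a standard deviation: positive only when std < gamma."""
    return max(0.0, gamma - std)


# Penalty for a few standard deviations below, at, and above gamma.
penalties = {s: hinge(s) for s in (0.4, 1.0, 1.5, 10.0)}
```

The penalty is zero for every s ≥ γ, so this term on its own pushes the standard deviation up to at least γ and places no bound on how large it becomes.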

Transfer learning results

I used this code to try to reproduce the classification results in Table 2 of the paper (transfer learning results).

However, my results are quite different.

Dataset           Paper       My result
iNat18            47%         10%
Places205         54.3%       41%

Table 2: "linear classification tasks on top of frozen representations"
C ADDITIONAL IMPLEMENTATION DETAILS
C.3 TRANSFER LEARNING
Linear classification: "experiment detail"

Following the sentence above and the C.3 paragraph, I tried two datasets (iNat18 and Places205):
1. Load the ImageNet-pretrained weights from this repo.
2. Freeze all weights in the ResNet-50 backbone.
(They are not updated at all.)
3. Train a linear layer on top.

Is there anything I am missing?
Is this the right procedure?
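
The three steps above can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the paper's protocol: the backbone is a tiny stand-in for the pretrained ResNet-50, and the data, class count, learning rate, and number of steps are made up.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Stand-in for the pretrained backbone (hypothetical; a real ResNet-50 emits 2048-d features).
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 32), nn.ReLU())

# Step 2: freeze every backbone weight and switch to eval mode,
# so that BatchNorm statistics in a real ResNet also stay fixed.
for param in backbone.parameters():
    param.requires_grad = False
backbone.eval()

# Step 3: only the linear head is trained.
head = nn.Linear(32, 5)
optimizer = torch.optim.SGD(head.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

# Made-up batch standing in for a labelled dataset.
images = torch.randn(16, 3, 8, 8)
labels = torch.randint(0, 5, (16,))

backbone_before = [p.clone() for p in backbone.parameters()]
for _ in range(5):
    with torch.no_grad():
        features = backbone(images)
    loss = criterion(head(features), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Two details that are easy to overlook in this kind of setup are leaving the backbone in training mode, which lets BatchNorm running statistics drift even when the weights are frozen, and using input normalization or augmentations that differ from those used in pretraining. Whether either explains the gap reported here is not something the thread establishes.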
