facebookresearch / vicreg
VICReg official code base
License: MIT License
In my experiments, cov_loss is about 1e-5 during the first 10 epochs and then drops to 0. What magnitude of cov_loss do you observe? Thank you!
Hi!
Reading your code, I noticed that you take the mean of the two variance terms:
https://github.com/facebookresearch/vicreg/blob/main/main_vicreg.py#L207
On the other hand, there is no division by 2 in Equation (1) of the paper.
I also found that the default values of λ, µ and ν match the best-performing setup in Table 7, so I believe the code and the paper conflict. Could you please double-check this?
Thanks.
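To make the question concrete, here is a minimal sketch in plain Python (not the repository's code) of a variance hinge term for two branches. It assumes γ = 1, a small ε inside the square root, and an illustrative coefficient µ = 25; the function names are hypothetical. Under these assumptions, averaging the two branch terms gives the same loss as summing them with half the coefficient, so the discrepancy would amount to a factor of 2 on µ.

```python
import math

def variance_hinge(z, gamma=1.0, eps=1e-4):
    """Mean over dimensions of max(0, gamma - sqrt(Var(z_j) + eps)).

    z is a list of N embedding vectors, each a list of D floats.
    gamma and eps are assumed values for this sketch.
    """
    n, d = len(z), len(z[0])
    total = 0.0
    for j in range(d):
        column = [row[j] for row in z]
        mean = sum(column) / n
        var = sum((v - mean) ** 2 for v in column) / (n - 1)  # unbiased variance
        total += max(0.0, gamma - math.sqrt(var + eps))
    return total / d

def variance_term_sum(x, y):
    # Variant without the division by 2.
    return variance_hinge(x) + variance_hinge(y)

def variance_term_mean(x, y):
    # Variant that takes the mean of the two branch terms.
    return variance_hinge(x) / 2 + variance_hinge(y) / 2

x = [[0.1, 2.0], [0.2, -1.0], [0.3, 0.5]]
y = [[0.0, 1.0], [0.1, 1.1], [0.0, 0.9]]
mu = 25.0  # illustrative coefficient, an assumption of this sketch

# The mean variant with coefficient mu equals the sum variant with mu / 2.
print(math.isclose(mu * variance_term_mean(x, y), (mu / 2) * variance_term_sum(x, y)))
```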
Hi! Thank you for your work! I have a question regarding cross-modal retrieval on COCO: I am struggling to reproduce the results reported in the paper. Could you provide more details on your training protocol? Which coefficients do you use for VICReg, what kind of expander architecture do you use, and do you do any further downstream training or directly use the encoder embeddings obtained via SSL?
Thank you!
Hello! When training on the CIFAR-10 dataset with the batch size set to 2048, did you find that it cannot run on two NVIDIA 3090 cards? I get a GPU out-of-memory error.
So I reduced the batch size to 256, which still runs out of memory.
In the end, I had no choice but to reduce it to 128 to get it running.
However, with the SimCLR and SwAV code on the same device, I am not limited to such a small batch size; I can generally run 2048 or 1024. Is this normal?
My setup is two NVIDIA 3090 cards with 48 GB of video memory. The training dataset is CIFAR-10.
I would be very happy if you could answer!
"For VOC07 Everingham et al. (2010), we train a linear SVM with LIBLINEAR Fan et al. (2008). The images are center cropped and resized to 224 × 224, and the C values are computed with cross-validation." - this is from the VICReg paper by Bardes et al., ICLR 2022.
I am unable to reproduce the results. It would be useful if the authors provided the code or the optimal C value that worked for them in this case.
Dear VICReg authors, thanks for sharing the code of this great work! I have strictly followed the parameter settings for the object detection task on VOC2007+12 described in the paper and initialized the model with the VICReg model provided in this repo. However, I achieved only 79.0734 mAP50 using detectron2, lower than the 82.4 reported in Table 2. Could you please give some guidance or share your detectron2 config file so that I can reproduce the result? Thanks in advance!
Hi, thanks for the great work!
I am trying to apply your method to my own dataset. Would you please post a figure showing how the loss evolves over the training epochs? It would be very helpful.
Thanks!
Line 211 in 4e12602
Hi,
Thanks for the repo!
I saw from the README that the only requirement is pytorch==1.7.1, but I think there is also an implicit requirement on the vision library: torchvision >=0.9.0.
Indeed, according to the official PyTorch installation guide, pytorch 1.7.1 should be installed with torchvision 0.8.2 (see doc). However, according to the torchvision repo, the class InterpolationMode only appears in version 0.9.0.
This leads to an ImportError in augmentations.py.
Do you know the reason behind this and can you confirm which version of torchvision you are using?
Thanks!
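As a small illustration of the version constraint described above, the following sketch compares a torchvision version string against 0.9.0, the release in which, according to this report, InterpolationMode first appears. The helper names are hypothetical, and the parsing only handles simple strings such as "0.8.2" or "0.9.0+cu111".

```python
def version_tuple(version):
    """Turn '0.8.2' or '0.9.0+cu111' into a tuple of ints such as (0, 8, 2)."""
    core = version.split("+")[0]  # drop a local build suffix such as +cu111
    parts = []
    for piece in core.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at the first non-digit, e.g. 'rc1'
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)

def has_interpolation_mode(torchvision_version):
    # Assumption taken from the report above: the class exists from 0.9.0 on.
    return version_tuple(torchvision_version) >= (0, 9, 0)

print(has_interpolation_mode("0.8.2"))
print(has_interpolation_mode("0.9.0+cu111"))
```

In a real environment the string would come from `torchvision.__version__`.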
Hi. Thanks for the great work!
I'm trying to reproduce the results of Table 12 (impact of expander dimensionality) in your paper.
Could you tell me which hyperparameters you used in these experiments?
Following the instructions in the README, I am trying to download the pretrained models:
    import torch
    resnet50 = torch.hub.load('facebookresearch/vicreg:main', 'resnet50')
This fails because "tuple" has no attribute load_state_dict(), pointing to line 21 of vicreg/hubconf.py.
In fact, the function resnet50() defined in /main/resnet.py returns a tuple:
    def resnet50(**kwargs):
        return ResNet(Bottleneck, [3, 4, 6, 3], **kwargs), 2048
To make this work, I had to modify the function resnet50() in hubconf.py:
    def resnet50(pretrained=True, **kwargs):
        model = resnet.resnet50(**kwargs)
        if pretrained:
            state_dict = torch.hub.load_state_dict_from_url(
                url="https://dl.fbaipublicfiles.com/vicreg/resnet50.pth",
                map_location='cpu',
            )
            # Grab the first element of the tuple (the network itself).
            model[0].load_state_dict(state_dict, strict=True)
        return model
I got the following error when I ran:
python -m torch.distributed.launch --nproc_per_node=8 main_vicreg.py --data-dir <path to my data> --exp-dir <path to my output> --arch resnet34 --epochs 10 --batch-size 512 --base-lr 0.3
Traceback (most recent call last):
  File "/Users/en_tetteh/SSL/vicreg/main_vicreg.py", line 340, in <module>
    main(args)
  File "/Users/en_tetteh/SSL/vicreg/main_vicreg.py", line 106, in main
    model = VICReg(args)#.cuda(gpu)
  File "/Users/en_tetteh/SSL/vicreg/main_vicreg.py", line 191, in __init__
    self.backbone, self.embedding = resnet.__dict__[args.arch](
  File "/Users/en_tetteh/SSL/vicreg/resnet.py", line 300, in resnet34
    return ResNet(BasicBlock, [3, 4, 6, 3], **kwargs), 512
  File "/Users/en_tetteh/SSL/vicreg/resnet.py", line 191, in __init__
    self.layer1 = self._make_layer(block, num_out_filters, layers[0])
  File "/Users/en_tetteh/SSL/vicreg/resnet.py", line 253, in _make_layer
    block(
TypeError: BasicBlock.__init__() got an unexpected keyword argument 'last_activation'
After adding last_activation="relu" to the __init__() of BasicBlock, the error was resolved.
Hi,
I noticed a difference between the code (main_vicreg.py#L202) and the Algorithm 1 in the paper where the subtraction of the mean from
Could you check and let me know which one is correct?
Thank you for your time,
Paolo
The original paper states that the base lr is 0.4.
However, the 8-GPU single-node training script uses 0.3:
python -m torch.distributed.launch --nproc_per_node=8 main_vicreg.py --data-dir /path/to/imagenet/ --exp-dir /path/to/experiment/ --arch resnet50 --epochs 100 --batch-size 512 --base-lr 0.3
Which one is correct?
Hi, I came across the implementation of the covariance matrix of Z while trying to use VICReg in my project. I feel like I am missing something here, but I wonder if the implementation should be x @ x.T instead of x.T @ x?
Thanks for any information you can provide.
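A shape check may help with this question. The sketch below uses plain Python lists (so it runs without PyTorch) and assumes a batch x with N rows (samples) and D columns (embedding dimensions). Under that assumption, after centering each column, x.T @ x / (N - 1) is the D × D covariance between embedding dimensions, while x @ x.T is an N × N matrix of similarities between samples. The helper names and the off-diagonal penalty normalized by D are illustrative assumptions, not a copy of the repository's code.

```python
def center_columns(x):
    """Subtract the per-dimension batch mean from every row."""
    n = len(x)
    means = [sum(row[j] for row in x) / n for j in range(len(x[0]))]
    return [[v - m for v, m in zip(row, means)] for row in x]

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def covariance(x):
    """D x D covariance between embedding dimensions: x.T @ x / (N - 1)."""
    n = len(x)
    xc = center_columns(x)
    return [[v / (n - 1) for v in row] for row in matmul(transpose(xc), xc)]

def off_diagonal_loss(cov):
    """Sum of squared off-diagonal entries divided by D (assumed normalization)."""
    d = len(cov)
    return sum(cov[i][j] ** 2 for i in range(d) for j in range(d) if i != j) / d

batch = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]  # N = 4, D = 2
cov = covariance(batch)
gram = matmul(center_columns(batch), transpose(center_columns(batch)))

print(len(cov), len(cov[0]))    # D x D
print(len(gram), len(gram[0]))  # N x N
print(off_diagonal_loss(cov))   # large here because the two dimensions are fully correlated
```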
Hi,
In the paper, you mention that the augmentations are symmetrized, but in the code the probabilities for blur and solarization are not symmetric (similar to BYOL). Is there a reason for that?
Hi, thanks for your work!
How can I get image features from the VICReg models?
This is what I am doing:
    import torch
    from PIL import Image
    from torchvision import transforms

    vicreg_model = torch.hub.load('facebookresearch/vicreg:main', 'resnet50')
    data_transforms = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])
    # `image` is the path to an image file.
    data = torch.unsqueeze(data_transforms(Image.open(image).convert('RGB')), dim=0)
    with torch.no_grad():
        embedding = vicreg_model(data.float()).cpu()
A question for further work: how can I implement self-supervised metric learning with VICReg on a custom dataset? I need good image embeddings for my own dataset.
I downloaded the available checkpoint for ResNet-50 through the provided link: https://dl.fbaipublicfiles.com/vicreg/resnet50_fullckpt.pth
But upon loading the checkpoint, the following error appears:
>>> torch.load(checkpoint_path)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/anaconda3/envs/pytorch_env/lib/python3.8/site-packages/torch/serialization.py", line 594, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/home/user/anaconda3/envs/pytorch_env/lib/python3.8/site-packages/torch/serialization.py", line 853, in _load
    result = unpickler.load()
AttributeError: Can't get attribute 'exclude_bias_and_norm' on <module '__main__' (built-in)>
Same error happens with the other checkpoints. Is there something I am doing wrong? Appreciate the help!
Using pytorch version 1.7.1 and torchvision 0.8.2
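A likely explanation, offered as an assumption rather than a confirmed diagnosis: the full checkpoint stores a by-name reference to a function called exclude_bias_and_norm that lived in the training script's __main__ module, and pickle resolves such references by name when loading. The sketch below reproduces that mechanism with the standard pickle module only (no PyTorch). The hand-written payload, the stub body, and the helper name load_with_stub are all illustrative assumptions.

```python
import pickle
import sys

# Hand-written protocol-0 pickle that holds only a by-name reference to
# __main__.exclude_bias_and_norm (for illustration; not a real checkpoint).
PAYLOAD = b"c__main__\nexclude_bias_and_norm\n."

def exclude_bias_and_norm(p):
    # Stub with an assumed body; the real function belongs to the training script.
    return p.ndim == 1

def load_with_stub(payload):
    """Make sure __main__ exposes the expected name, then unpickle."""
    main_module = sys.modules["__main__"]
    if not hasattr(main_module, "exclude_bias_and_norm"):
        main_module.exclude_bias_and_norm = exclude_bias_and_norm
    return pickle.loads(payload)

restored = load_with_stub(PAYLOAD)
print(restored.__name__)
```

If the assumption holds, defining a function with the same name at the top level of the script that calls torch.load, before the call, should let the checkpoint load in the same way.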
Hi,
I can push the batch size up to at most 256 with the ResNet-18 backbone.
Could you recommend hyperparameters (base learning rate, projector dimensions, similarity/std/cov coefficients, etc.) that would be appropriate for VICReg in this setting?
Regards
Hi,
When running the commands given in the README to load the pretrained models from PyTorch Hub, they fail for resnet50x2 and resnet200x2. The issue is that the callables that load these models in hubconf.py are not named 'resnet50x2' and 'resnet200x2' but 'resnet50w2' and 'resnet200w2', respectively.
I got it working by replacing the original commands shown in the README with the following:
    import torch
    resnet50 = torch.hub.load('facebookresearch/vicreg:main', 'resnet50')
    resnet50x2 = torch.hub.load('facebookresearch/vicreg:main', 'resnet50w2')
    resnet200x2 = torch.hub.load('facebookresearch/vicreg:main', 'resnet200w2')
Here I just replaced the 'x' with 'w' in 'resnet50x2' and 'resnet200x2'.
Thank you,
Xavi
The loss becomes NaN after some number of epochs, and then the model never converges. This happens randomly. I am training on a custom dataset with a batch size of 2048 and a base lr of 0.2 on 4 A100s.
{"epoch": 299, "step": 264121, "loss": 14.823115348815918, "time": 33933, "lr": 1.2852132052592342}
{"epoch": 299, "step": 264163, "loss": 14.84267520904541, "time": 33993, "lr": 1.2851170355212638}
{"epoch": 299, "step": 264205, "loss": NaN, "time": 34054, "lr": 1.2850208546990547}
{"epoch": 299, "step": 264245, "loss": 14.683608055114746, "time": 34115, "lr": 1.2849292436129383}
{"epoch": 300, "step": 264300, "loss": 14.624290466308594, "time": 34213, "lr": 1.2848032619604233}
{"epoch": 300, "step": 264335, "loss": 14.825407981872559, "time": 34275, "lr": 1.2847230819273718}
{"epoch": 300, "step": 264376, "loss": 14.545491218566895, "time": 34336, "lr": 1.2846291469640327}
{"epoch": 300, "step": 264417, "loss": 14.715323448181152, "time": 34397, "lr": 1.2845352014486329}
{"epoch": 300, "step": 264458, "loss": 54.99197769165039, "time": 34458, "lr": 1.284441245383221}
{"epoch": 300, "step": 264501, "loss": 22.475656509399414, "time": 34518, "lr": 1.2843426947628278}
{"epoch": 300, "step": 264542, "loss": 23.183082580566406, "time": 34579, "lr": 1.2842487170891577}
{"epoch": 300, "step": 264583, "loss": 23.91857147216797, "time": 34640, "lr": 1.284154728871724}
{"epoch": 300, "step": 264626, "loss": 24.39642906188965, "time": 34701, "lr": 1.284056144537631}
{"epoch": 300, "step": 264668, "loss": 24.610559463500977, "time": 34762, "lr": 1.2839598416708278}
{"epoch": 300, "step": 264711, "loss": 24.703632354736328, "time": 34823, "lr": 1.2838612344228337}
{"epoch": 300, "step": 264753, "loss": 24.7344970703125, "time": 34883, "lr": 1.283764909179525}
{"epoch": 300, "step": 264793, "loss": 24.75002670288086, "time": 34944, "lr": 1.283673160576227}
{"epoch": 300, "step": 264835, "loss": 24.750017166137695, "time": 35005, "lr": 1.2835768137547283}
{"epoch": 300, "step": 264878, "loss": 24.750001907348633, "time": 35066, "lr": 1.2834781615145805}
{"epoch": 300, "step": 264921, "loss": 24.75, "time": 35127, "lr": 1.283379497695405}
{"epoch": 300, "step": 264963, "loss": 24.75, "time": 35187, "lr": 1.2832831172076795}
{"epoch": 300, "step": 265005, "loss": 24.75, "time": 35248, "lr": 1.2831867256776874}
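For pinpointing where a run like this goes wrong, a small log scanner can help. The sketch below assumes one JSON record per line with the fields shown in the log above; the function name and the jump threshold are hypothetical choices. It flags non-finite losses and sudden jumps relative to the previous finite loss, using four records copied from the log.

```python
import json
import math

def find_anomalies(lines, jump_factor=2.0):
    """Return (step, reason) pairs for non-finite losses and sudden jumps."""
    anomalies = []
    previous = None  # last finite loss seen so far
    for line in lines:
        record = json.loads(line)  # Python's json module accepts the bare NaN token
        loss = record["loss"]
        if not math.isfinite(loss):
            anomalies.append((record["step"], "non-finite"))
            continue
        if previous is not None and loss > jump_factor * previous:
            anomalies.append((record["step"], "jump"))
        previous = loss
    return anomalies

log = [
    '{"epoch": 299, "step": 264163, "loss": 14.84267520904541, "time": 33993, "lr": 1.2851170355212638}',
    '{"epoch": 299, "step": 264205, "loss": NaN, "time": 34054, "lr": 1.2850208546990547}',
    '{"epoch": 300, "step": 264417, "loss": 14.715323448181152, "time": 34397, "lr": 1.2845352014486329}',
    '{"epoch": 300, "step": 264458, "loss": 54.99197769165039, "time": 34458, "lr": 1.284441245383221}',
]

print(find_anomalies(log))  # [(264205, 'non-finite'), (264458, 'jump')]
```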
Thanks for the great work! The paper states (Sec. 4.1) that the hinge function encourages the variance to be equal to \gamma. I think it should be "at least \gamma" instead. Is that correct? Can the variance become too large?
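A quick numerical check illustrates the point. The sketch assumes the hinge has the form max(0, γ − sqrt(Var + ε)) with γ = 1 and a small ε; the function name is hypothetical. Under that assumption the penalty is zero for every variance whose regularized standard deviation is at or above γ, so this term alone acts as a lower bound and does not penalize large variance.

```python
import math

def hinge_on_std(variance, gamma=1.0, eps=1e-4):
    """max(0, gamma - sqrt(variance + eps)); gamma and eps are assumed values."""
    return max(0.0, gamma - math.sqrt(variance + eps))

# The penalty shrinks as the variance grows and stays at zero afterwards.
for variance in (0.01, 0.25, 1.0, 4.0, 100.0):
    print(variance, round(hinge_on_std(variance), 4))
```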
I used this code to reproduce the classification results in Table 2 of the paper (transfer learning results).
However, my results are quite different:
             Paper    My run
INat18       47%      10%
Places205    54.3%    41%
Table 2: "linear classification tasks on top of frozen representations"
C ADDITIONAL IMPLEMENTATION DETAILS
C.3 TRANSFER LEARNING
Linear classification: "experiment detail"
Following the sentence above and the C.3 paragraph, I tried two datasets (INat18 and Places205):
1. Load the ImageNet-pretrained weights from this repo.
2. Freeze all weights in the ResNet-50 backbone (they are not updated at all).
3. Train a linear layer on top.
Is there anything I am missing?
Is this the right procedure?