juliendenize / eztorch
Library to perform image and video self-supervised learning.
Home Page: https://juliendenize.github.io/eztorch/
License: Other
Hello! First of all, thanks for providing this amazing library and research! I ran into an issue during inference with the provided checkpoint.
I was unable to use the ViViT Tiny checkpoint to run inference: several keys expected in a Lightning checkpoint are missing, such as pytorch-lightning_version, global_step, epoch, and state_dict. So I artificially created the missing keys and moved the whole checkpoint dict under state_dict. After that, I was greeted with another set of missing keys, this time from the model itself:
RuntimeError: Error(s) in loading state_dict for SoccerNetSpottingModel:
Missing key(s) in state_dict: "train_transform.0._transform.0.transforms.2.brightness", "train_transform.0._transform.0.transforms.2.contrast", "train_transform.0._transform.0.transforms.2.saturation", "val_transform.1._transform.0.transforms.2.mean", "val_transform.1._transform.0.transforms.2.std", "test_transform.1._transform.0.transforms.2.mean", "test_transform.1._transform.0.transforms.2.std".
Unexpected key(s) in state_dict: "trunk.transformer.temporal_mask_token", "val_transform.1._transform.2.mean", "val_transform.1._transform.2.std", "test_transform.1._transform.2.mean", "test_transform.1._transform.2.std".
size mismatch for train_transform.0._transform.0.transforms.5.mean: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([3, 1, 1]).
size mismatch for train_transform.0._transform.0.transforms.5.std: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([3, 1, 1]).
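For reference, the wrapping step I described above can be sketched like this (the Lightning version string and the bookkeeping values are placeholders I made up, not values from the repo):

```python
def wrap_as_lightning_ckpt(raw_state_dict, pl_version="1.9.0"):
    """Wrap a bare state_dict in the minimal PyTorch Lightning checkpoint
    layout, using the missing-key names from the error above.
    The version string and the zeroed counters are placeholders."""
    return {
        "state_dict": raw_state_dict,
        "epoch": 0,
        "global_step": 0,
        "pytorch-lightning_version": pl_version,
    }

# Usage sketch: wrapped = wrap_as_lightning_ckpt(torch.load(ckpt_path, map_location="cpu"))
```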
I'm using the inference step arguments from the docs, with the following inference config:
config_path="../eztorch/configs/run/finetuning/vivit"
config_name="vivit_tiny_soccernet_uniform"
...
Do you have any idea what I missed?
Thanks again!
I am interested in running COMEDIAN on my own videos to detect soccer events.
Specifically, I would like to use the already pretrained model for a number of different events.
I have tried to load the model from a Python script:
from hydra import compose, initialize
import hydra

from eztorch.utils.utils import compile_model

# Initialize Hydra and compose the fine-tuning config
with initialize(
    config_path="./eztorch/configs/run/finetuning/vivit",
    version_base="1.1",
):
    config = compose(config_name="vivit_tiny_soccernet_uniform")

# Instantiate the model and apply the repo's compile helper
model = hydra.utils.instantiate(config.model)
model = compile_model(model, config)
Load .pth:
import torch

ckpt_path = "./comedian_vivit_tiny_seed203.pth"
state_dict = torch.load(ckpt_path, map_location="cpu")

# Reshape the normalization stats to the (3, 1, 1) layout the current model expects
state_dict["train_transform.0._transform.0.transforms.5.mean"] = state_dict["train_transform.0._transform.0.transforms.5.mean"].view(3, 1, 1)
state_dict["train_transform.0._transform.0.transforms.5.std"] = state_dict["train_transform.0._transform.0.transforms.5.std"].view(3, 1, 1)

model._orig_mod.load_state_dict(state_dict, strict=False)
The result:
_IncompatibleKeys(missing_keys=['train_transform.0._transform.0.transforms.2.brightness', 'train_transform.0._transform.0.transforms.2.contrast', 'train_transform.0._transform.0.transforms.2.saturation', 'val_transform.1._transform.0.transforms.2.mean', 'val_transform.1._transform.0.transforms.2.std', 'test_transform.1._transform.0.transforms.2.mean', 'test_transform.1._transform.0.transforms.2.std'], unexpected_keys=['trunk.transformer.temporal_mask_token', 'trunk.transformer.temporal_transformer.blocks.4.norm1.weight', 'trunk.transformer.temporal_transformer.blocks.4.norm1.bias', 'trunk.transformer.temporal_transformer.blocks.4.attn.qkv.weight', 'trunk.transformer.temporal_transformer.blocks.4.attn.qkv.bias', 'trunk.transformer.temporal_transformer.blocks.4.attn.proj.weight', 'trunk.transformer.temporal_transformer.blocks.4.attn.proj.bias', 'trunk.transformer.temporal_transformer.blocks.4.norm2.weight', 'trunk.transformer.temporal_transformer.blocks.4.norm2.bias', 'trunk.transformer.temporal_transformer.blocks.4.mlp.fc1.weight', 'trunk.transformer.temporal_transformer.blocks.4.mlp.fc1.bias', 'trunk.transformer.temporal_transformer.blocks.4.mlp.fc2.weight', 'trunk.transformer.temporal_transformer.blocks.4.mlp.fc2.bias', 'trunk.transformer.temporal_transformer.blocks.5.norm1.weight', 'trunk.transformer.temporal_transformer.blocks.5.norm1.bias', 'trunk.transformer.temporal_transformer.blocks.5.attn.qkv.weight', 'trunk.transformer.temporal_transformer.blocks.5.attn.qkv.bias', 'trunk.transformer.temporal_transformer.blocks.5.attn.proj.weight', 'trunk.transformer.temporal_transformer.blocks.5.attn.proj.bias', 'trunk.transformer.temporal_transformer.blocks.5.norm2.weight', 'trunk.transformer.temporal_transformer.blocks.5.norm2.bias', 'trunk.transformer.temporal_transformer.blocks.5.mlp.fc1.weight', 'trunk.transformer.temporal_transformer.blocks.5.mlp.fc1.bias', 'trunk.transformer.temporal_transformer.blocks.5.mlp.fc2.weight', 
'trunk.transformer.temporal_transformer.blocks.5.mlp.fc2.bias', 'val_transform.1._transform.2.mean', 'val_transform.1._transform.2.std', 'test_transform.1._transform.2.mean', 'test_transform.1._transform.2.std'])
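One way I could quiet the unexpected-key report, assuming those entries really are unused extras, is to filter the checkpoint down to keys the model actually has before loading (a sketch; `model` and `state_dict` refer to the snippets above):

```python
def filter_to_model_keys(state_dict, model_keys):
    """Keep only checkpoint entries whose key exists in the target model,
    so load_state_dict(strict=False) reports only genuinely missing keys."""
    model_keys = set(model_keys)
    return {k: v for k, v in state_dict.items() if k in model_keys}

# Usage sketch:
# filtered = filter_to_model_keys(state_dict, model._orig_mod.state_dict().keys())
# model._orig_mod.load_state_dict(filtered, strict=False)
```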
I think the .pth has loaded well apart from a few transforms. I'm not sure whether I'm loading the model in the best way: I want to use it directly from Python, because once trained I want to run it inside a larger process.
If I am right, the model._orig_mod.trunk layer directly extracts the features every 2 frames:
x = torch.randn(1, 3, 128, 224, 224).cuda()
model.eval()
model = model.cuda()

with torch.no_grad():
    feats = model._orig_mod.trunk(x)
    y = model(x)
    assert torch.allclose(feats, y['h'], atol=1e-5)
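For videos longer than a single clip, my plan is to slide a fixed-length window over the frame axis and run the trunk per window; the clip length and stride below are illustrative guesses on my part, not values taken from the repo:

```python
def window_starts(num_frames, clip_len=128, stride=64):
    """Start indices of overlapping fixed-length clips covering a video.
    clip_len/stride are illustrative; COMEDIAN's actual values may differ."""
    if num_frames < clip_len:
        return [0]
    return list(range(0, num_frames - clip_len + 1, stride))

# e.g. a 320-frame video -> clips starting at frames 0, 64, 128, 192
```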
Could you please confirm these points and whether I am doing this right?
Hi!
Thanks for providing this cool code base!
I'm trying to reproduce your results from Table 5 of the paper, where you got 48.1% t-AmAP without step 1 and step 2, and I am struggling with this.
As a starting point, I was able to reproduce good results from your checkpoint, so it does not appear to be a data issue.
Can you share the hyperparameters that led to these results without any pretraining?
I see that the learning rate in the paper was 5·10^-4 while in the yaml it is 0.001, but this did not help the network converge.
Am I missing something?
Did you train on 2×80 GB A100s?
Thanks in advance!
Daniel