juliendenize / eztorch


Library to perform image and video self-supervised learning.

Home Page: https://juliendenize.github.io/eztorch/

License: Other

Languages: Python 100.00%
Topics: contrastive-learning, image, image-processing, pytorch, self-supervised, video, video-processing


eztorch's Issues

Unable to use provided COMEDIAN checkpoint for inference

Hello! First of all, thanks for providing this amazing library and research! I ran into an issue while running inference with the provided checkpoint.

I was unable to use the ViViT Tiny checkpoint to run inference. Several keys are missing from the checkpoint, such as pytorch-lightning_version, global_step, epoch and state_dict. I worked around this by creating the missing keys artificially and moving the whole checkpoint dict under state_dict (a sketch of what I did is shown after the error below). After that I was greeted with more missing keys, this time from the model itself:

RuntimeError: Error(s) in loading state_dict for SoccerNetSpottingModel:
        Missing key(s) in state_dict: "train_transform.0._transform.0.transforms.2.brightness", "train_transform.0._transform.0.transforms.2.contrast", "train_transform.0._transform.0.transforms.2.saturation", "val_transform.1._transform.0.transforms.2.mean", "val_transform.1._transform.0.transforms.2.std", "test_transform.1._transform.0.transforms.2.mean", "test_transform.1._transform.0.transforms.2.std".
        Unexpected key(s) in state_dict: "trunk.transformer.temporal_mask_token", "val_transform.1._transform.2.mean", "val_transform.1._transform.2.std", "test_transform.1._transform.2.mean", "test_transform.1._transform.2.std".
        size mismatch for train_transform.0._transform.0.transforms.5.mean: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([3, 1, 1]).
        size mismatch for train_transform.0._transform.0.transforms.5.std: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([3, 1, 1]).
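
For reference, this is roughly how I created the missing keys and wrapped the raw weights under state_dict before retrying (just a sketch: the file names and the bookkeeping values are placeholders I picked, not values from the docs):

import torch

# Load the raw .pth, which is a flat state_dict without any Lightning metadata.
raw = torch.load("comedian_vivit_tiny.pth", map_location="cpu")

# Wrap it in a Lightning-style checkpoint containing the keys that were reported missing.
# The bookkeeping values below are arbitrary placeholders.
wrapped = {
    "pytorch-lightning_version": "1.9.0",
    "global_step": 0,
    "epoch": 0,
    "state_dict": raw,
}
torch.save(wrapped, "comedian_vivit_tiny_wrapped.ckpt")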

I'm using the inference step arguments from the docs, with this as the inference config:

config_path="../eztorch/configs/run/finetuning/vivit"
config_name="vivit_tiny_soccernet_uniform"
...

Do you have any idea what I missed?

Thanks again!

Compute custom video with COMEDIAN

I am interested in running COMEDIAN on my own videos to detect soccer events.

Specifically, I would like to use the already pretrained model for a number of different events.

I have tried to load the model from a Python script:

import torch
from hydra import compose, initialize
import hydra
from eztorch.utils.utils import compile_model

# Initialize Hydra on the ViViT finetuning configs and instantiate the model.

with initialize(
    config_path="./eztorch/configs/run/finetuning/vivit",
    version_base="1.1",
):
    config = compose(config_name="vivit_tiny_soccernet_uniform")
    model = hydra.utils.instantiate(config.model)
    model = compile_model(model, config)

Then I load the .pth checkpoint:

ckpt_path = "./comedian_vivit_tiny_seed203.pth"
state_dict = torch.load(ckpt_path)
# Reshape the normalization buffers to the (3, 1, 1) shape the current model expects.
state_dict["train_transform.0._transform.0.transforms.5.mean"] = state_dict["train_transform.0._transform.0.transforms.5.mean"].view(3, 1, 1)
state_dict["train_transform.0._transform.0.transforms.5.std"] = state_dict["train_transform.0._transform.0.transforms.5.std"].view(3, 1, 1)
model._orig_mod.load_state_dict(state_dict, strict=False)

The result:

_IncompatibleKeys(missing_keys=['train_transform.0._transform.0.transforms.2.brightness', 'train_transform.0._transform.0.transforms.2.contrast', 'train_transform.0._transform.0.transforms.2.saturation', 'val_transform.1._transform.0.transforms.2.mean', 'val_transform.1._transform.0.transforms.2.std', 'test_transform.1._transform.0.transforms.2.mean', 'test_transform.1._transform.0.transforms.2.std'], unexpected_keys=['trunk.transformer.temporal_mask_token', 'trunk.transformer.temporal_transformer.blocks.4.norm1.weight', 'trunk.transformer.temporal_transformer.blocks.4.norm1.bias', 'trunk.transformer.temporal_transformer.blocks.4.attn.qkv.weight', 'trunk.transformer.temporal_transformer.blocks.4.attn.qkv.bias', 'trunk.transformer.temporal_transformer.blocks.4.attn.proj.weight', 'trunk.transformer.temporal_transformer.blocks.4.attn.proj.bias', 'trunk.transformer.temporal_transformer.blocks.4.norm2.weight', 'trunk.transformer.temporal_transformer.blocks.4.norm2.bias', 'trunk.transformer.temporal_transformer.blocks.4.mlp.fc1.weight', 'trunk.transformer.temporal_transformer.blocks.4.mlp.fc1.bias', 'trunk.transformer.temporal_transformer.blocks.4.mlp.fc2.weight', 'trunk.transformer.temporal_transformer.blocks.4.mlp.fc2.bias', 'trunk.transformer.temporal_transformer.blocks.5.norm1.weight', 'trunk.transformer.temporal_transformer.blocks.5.norm1.bias', 'trunk.transformer.temporal_transformer.blocks.5.attn.qkv.weight', 'trunk.transformer.temporal_transformer.blocks.5.attn.qkv.bias', 'trunk.transformer.temporal_transformer.blocks.5.attn.proj.weight', 'trunk.transformer.temporal_transformer.blocks.5.attn.proj.bias', 'trunk.transformer.temporal_transformer.blocks.5.norm2.weight', 'trunk.transformer.temporal_transformer.blocks.5.norm2.bias', 'trunk.transformer.temporal_transformer.blocks.5.mlp.fc1.weight', 'trunk.transformer.temporal_transformer.blocks.5.mlp.fc1.bias', 'trunk.transformer.temporal_transformer.blocks.5.mlp.fc2.weight', 'trunk.transformer.temporal_transformer.blocks.5.mlp.fc2.bias', 'val_transform.1._transform.2.mean', 'val_transform.1._transform.2.std', 'test_transform.1._transform.2.mean', 'test_transform.1._transform.2.std'])

I think the .pth loaded well apart from a few transform keys. I'm not sure I'm loading the model in the best way, though: I'm trying to use it directly from Python because, once it is trained, I want to run it inside my own process. (A key-filtering idea I'm considering is sketched after the snippet below.)

If I understand correctly, the model._orig_mod.trunk module directly extracts the features every 2 frames.

# Sanity check: compare the trunk features with the 'h' entry of the full model output.
x = torch.randn(1, 3, 128, 224, 224).cuda()
model.eval()
model = model.cuda()
with torch.no_grad():
    feats = model._orig_mod.trunk(x)
    y = model(x)
    assert torch.allclose(feats, y['h'], atol=1e-5)
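
One thing I'm considering to avoid the remaining mismatches is to filter the checkpoint down to the entries whose names and shapes actually match the current model before loading. A minimal sketch (plain PyTorch, nothing eztorch-specific):

# Keep only checkpoint entries that exist in the model with the same shape,
# so transform buffers that differ between configs are simply skipped.
model_sd = model._orig_mod.state_dict()
filtered = {
    k: v for k, v in state_dict.items()
    if k in model_sd and v.shape == model_sd[k].shape
}
print(model._orig_mod.load_state_dict(filtered, strict=False))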

Could you confirm these points and whether I am doing this correctly?

Train ViViT Tiny on SoccerNetV2 from scratch

Hi!
Thanks for providing this cool code base!
I'm trying to reproduce your results from Table 5 of the paper, where you got 48.1% t-AmAP without step 1 and step 2, and I am struggling with this.

As a first step, I was able to reproduce good results from your checkpoint, so it does not appear to be a data issue.

Can you share the hyperparameters that led to these results without any pretraining?
I see that the learning rate in the paper is 5e-4 while the YAML uses 0.001, but this did not help the network converge (a sketch of how I pass the override is below).
Am I missing something?
Did you train on 2 × 80 GB A100 GPUs?
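
For completeness, here is roughly how I pass the learning rate as a Hydra override when composing the config. The key optimizer.initial_lr is a guess on my side and should be checked against the actual YAML:

from hydra import compose, initialize

# Sketch: compose the finetuning config with the paper's learning rate.
# "optimizer.initial_lr" is a guessed key; replace it with the real path from the YAML.
with initialize(config_path="./eztorch/configs/run/finetuning/vivit", version_base="1.1"):
    config = compose(
        config_name="vivit_tiny_soccernet_uniform",
        overrides=["optimizer.initial_lr=5e-4"],
    )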

Thanks in advance!
Daniel
