
pytorch-resnet3d's Introduction

3D ConvNets in Pytorch

Do you want >70% top-1 accuracy on a large video dataset? Are you tired of Kinetics videos disappearing from YouTube every day? Do you have recurring nightmares about Caffe2? Then this is the repo for you!

This is a PyTorch implementation of the Caffe2 I3D ResNet baseline from the video-nonlocal-net repo. The weights are ported directly from the Caffe2 model (see checkpoints). This should be a good starting point to extract features, fine-tune on another dataset, etc. without the hassle of dealing with Caffe2, and with all the benefits of a very carefully trained Kinetics model.

It's only a matter of time before FAIR releases a good PyTorch version of their nonlocal-net codebase, but until then, at least you have this ¯\_(ツ)_/¯

Amazing features:
- Only a single model (ResNet50-I3D). Parameters hardcoded with love.
- Only the evaluation script for Kinetics (training from scratch or fine-tuning has not been tested yet).
- No nonlocal versions yet.

Kinetics Evaluation

The code has been tested with Python 3.7 + PyTorch 1.0.

Pretrained Weights
Download the pretrained weights for run_i3d_baseline_400k_32f from the nonlocal repo:

wget https://dl.fbaipublicfiles.com/video-nonlocal/i3d_baseline_32x2_IN_pretrain_400k.pkl -P pretrained/

Convert these weights from Caffe2 to PyTorch. This is just a simple renaming of the blobs to match the PyTorch model.

python -m utils.convert_weights pretrained/i3d_baseline_32x2_IN_pretrain_400k.pkl pretrained/i3d_r50_kinetics.pth
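To give a feel for what "renaming the blobs" means, here is a minimal sketch of a name-mapping step. The blob names (`res2_0_branch2a_w` and friends) and the target names (`layer1.0.conv1.weight`) are assumptions chosen for illustration, and `rename_blob` and `convert` are hypothetical helpers; the mapping actually used lives in `utils/convert_weights.py` and may differ.

```python
import re

# Hypothetical Caffe2-style blob names, e.g. "res2_0_branch2a_w" or
# "res3_1_branch2b_bn_s". The real names and rules are defined by the
# repo's utils.convert_weights script, not by this sketch.
BLOCK_RE = re.compile(r"^res(\d)_(\d+)_branch2([abc])(_bn)?_(w|s|b)$")
CONV_INDEX = {"a": 1, "b": 2, "c": 3}


def rename_blob(name):
    """Map one blob name to a PyTorch-style parameter name, or None if unknown."""
    match = BLOCK_RE.match(name)
    if match is None:
        return None
    stage, block, branch, is_bn, kind = match.groups()
    layer = int(stage) - 1                      # assumed: res2 -> layer1
    index = CONV_INDEX[branch]                  # branch2a -> conv1 / bn1
    module = f"bn{index}" if is_bn else f"conv{index}"
    param = "weight" if kind in ("w", "s") else "bias"
    return f"layer{layer}.{int(block)}.{module}.{param}"


def convert(blobs):
    """Rename every recognised blob; values are passed through untouched."""
    converted = {}
    for name, value in blobs.items():
        new_name = rename_blob(name)
        if new_name is not None:
            converted[new_name] = value
    return converted
```

Because only the keys change, the converted dictionary holds exactly the same arrays as the Caffe2 checkpoint, just under the names the PyTorch model expects.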

The model can be created and the weights loaded using:

from models import resnet
net = resnet.i3_res50()

Data
Download videos using the official crawler and extract frames. This repo has a script to do this. Then create softlinks for frames and annotations:

mkdir -p data/kinetics/frames/ data/kinetics/annotations/
ln -s /path/to/kinetics/frames data/kinetics/frames/
ln -s /path/to/kinetics/annotation_csvs data/kinetics/annotations/
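Before running the evaluation, it can help to confirm that the linked directories resolve. The check below is a small sketch, not part of the repo: `missing_dirs` is a hypothetical helper, and it only assumes the `data/kinetics/frames` and `data/kinetics/annotations` paths created above.

```python
from pathlib import Path

# Subdirectories created by the mkdir command above.
EXPECTED = ("frames", "annotations")


def missing_dirs(root="data/kinetics"):
    """Return the expected subdirectories that do not resolve to a directory.

    Path.is_dir() follows symlinks, so a dangling link is reported as missing.
    """
    base = Path(root)
    return [name for name in EXPECTED if not (base / name).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs()
    if missing:
        print("Missing under data/kinetics:", ", ".join(missing))
    else:
        print("Data layout looks fine.")
```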

Evaluate
Run the evaluation script to generate scores on the validation set.

# Evaluation using 3 random spatial crops per frame + 10 uniformly sampled clips per video
python eval.py --batch_size 8 --mode video
>> (test) A: 0.722 | clf: 1.158 | total_loss: 1.158

# Evaluation using a single, center crop and a single, centered clip of 32 frames
python eval.py --batch_size 8 --mode clip
>> (test) A: 0.647 | clf: 1.551 | total_loss: 1.551

# Use --parallel for multiple GPUs
python eval.py --batch_size 64 --mode clip --parallel

You should get around 72.2% top-1 accuracy in video mode (3 random spatial crops per frame + 10 uniformly sampled clips per video) and around 64.7% top-1 accuracy in clip mode (a single center crop of a single, centered 32-frame clip). Note that these numbers are on whatever is left of the Kinetics val set these days (~18434 videos).
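The difference between the two modes can be sketched in a few lines. This is an illustration under assumptions, not the code in `eval.py`: `clip_starts` and `video_top1` are hypothetical helpers, clips are taken as runs of consecutive frames (any temporal stride is ignored), and video-level scores are assumed to be the plain average over all views (10 clips x 3 crops = 30 views per video).

```python
def clip_starts(num_frames, clip_len=32, num_clips=10):
    """Start indices of num_clips clips spaced evenly over a video.

    With num_clips=1 this returns the single centered clip used in clip mode.
    """
    last_start = max(num_frames - clip_len, 0)
    if num_clips == 1:
        return [last_start // 2]
    return [round(i * last_start / (num_clips - 1)) for i in range(num_clips)]


def video_top1(view_scores, labels):
    """Top-1 accuracy after averaging class scores over all views of a video.

    view_scores: {video_id: [scores_for_view_0, scores_for_view_1, ...]},
                 where each entry is a list of per-class scores.
    labels:      {video_id: ground_truth_class_index}
    """
    correct = 0
    for video_id, views in view_scores.items():
        num_classes = len(views[0])
        mean = [sum(view[c] for view in views) / len(views)
                for c in range(num_classes)]
        prediction = max(range(num_classes), key=mean.__getitem__)
        correct += int(prediction == labels[video_id])
    return correct / len(view_scores)
```

Averaging over many views smooths out unlucky crops and clips, which is consistent with video mode scoring higher than the single-view clip mode.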

Contributors

tushar-n
