Coder Social home page Coder Social logo

howto100m's Introduction

HowTo100M code

This repo provides code from the HowTo100M paper. We provide implementation of:

  • Our training procedure on HowTo100M for learning a joint text-video embedding
  • Our evaluation code on MSR-VTT, YouCook2 and LSMDC for Text-to-Video retrieval
  • A pretrain model on HowTo100M
  • Feature extraction from raw videos script we used

More information about HowTo100M can be found on the project webpage: https://www.di.ens.fr/willow/research/howto100m/

Requirements

  • Python 3
  • PyTorch (>= 1.0)
  • gensim

Video feature extraction

This separate github repo: https://github.com/antoine77340/video_feature_extractor provides an easy to use script to extract the exact same 2D and 3D CNN features we have extracted in our work.

Downloading a pretrained model

This will download our pretrained text-video embedding model on HowTo100M.

mkdir model
cd model
wget https://www.rocq.inria.fr/cluster-willow/amiech/howto100m/models/howto100m_pt_model.pth
cd ..

Downloading meta-data for evaluation (csv, pre-extracted features for evaluation, word2vec)

This will download all the data needed for evaluation.

wget https://www.rocq.inria.fr/cluster-willow/amiech/howto100m/metadata_eval.zip
unzip -d metadata_eval.zip

Downloading meta-data for training on HowTo100M (for the very brave folks :))

This will download all the additional meta data needed for training on HowTo100M.

cd data
wget https://www.rocq.inria.fr/cluster-willow/amiech/howto100m/HowTo100M.zip
unzip -d HowTo100M.zip
cd ..

Evaluate the HowTo100M pretrained model on MSR-VTT, YouCook2 and LSMDC

This command will evaluate the off-the-shelf HowTo100M pretrained model on MSR-VTT, YouCook2 and LSMDC.

python eval.py --eval_msrvtt=1 --eval_youcook=1 --eval_lsmdc=1 --num_thread_reader=8 --embd_dim=6144 --pretrain_path=model/howto100m_pt_model.pth

Note that for MSR-VTT this will evaluate on the same test set as in JSFusion (https://arxiv.org/abs/1808.02559) work and that YouCook2 will be evaluated on the validation set since no test set is provided.

Fine-tune the pretrained model on MSR-VTT, YouCook2 and LSMDC

To fine-tune the pretrained model on MSR-VTT, just run this:

python train.py --msrvtt=1 --eval_msrvtt=1 --num_thread_reader=4 --batch_size=256 --epochs=50 --n_display=200 --lr_decay=1.0 --embd_dim=6144 --pretrain_path=model/howto100m_pt_model.pth

You can also fine-tune on YouCook2 / LSMDC with the parameter --youcook=1 / --lsmdc=1 instead of --msr-vtt=1.

Same thing to monitor evaluation on other dataset with the parameters --eval_msrvtt and --eval_lsmdc.

Training embedding from scratch on HowTo100M

If you were brave enough to download all the videos and extract 2D and 3D CNN features for them using our provided feature extractor script, we also provide you a way to train the same embedding model on HowTo100M.

First, you will need to extract features for all the HowTo100M videos using our 2D and 3D features extraction script: https://github.com/antoine77340/video_feature_extractor. The default folder you need to extract the 2d (resp. 3d) features is in 'feature_2d' (resp. 'feature_3d'), but the default folder can be modified by changing the argument --features_path_2D (resp. --features_path_3D). After specifying the root folder for the 2d and 3d features, please use the exact same relative output path name for the features.

Then, modify the training HowTo100M training CSV file (extracted in data/HowTo100M_v1.csv) by adding an additional column 'path', which points to the .npy feature path of each video you have extracted using the provided script.

For example, the modification will look like this:

Original HowTo100M CSV file:

video_id,category_1,category_2,rank,task_id
nVbIUDjzWY4,Cars & Other Vehicles,Motorcycles,27,52907
CTPAZ2euJ2Q,Cars & Other Vehicles,Motorcycles,35,109057
...
_97kyZVWVG0,Hobbies and Crafts,Crafts,34,119814
gkjnR3-ZVts,Food and Entertaining,Drinks,6,25938

Updated CSV file with features path:

video_id,category_1,category_2,rank,task_id,path
nVbIUDjzWY4,Cars & Other Vehicles,Motorcycles,27,52907, path/to/feature/vid_nVbIUDjzWY4.npy
CTPAZ2euJ2Q,Cars & Other Vehicles,Motorcycles,35,109057, path/to/feature/vid_CTPAZ2euJ2Q.npy
...
_97kyZVWVG0,Hobbies and Crafts,Crafts,34,119814, path/to/feature/vid__97kyZVWVG0.npy
gkjnR3-ZVts,Food and Entertaining,Drinks,6,25938, path/to/feature/vid_gkjnR3-ZVts.npy

Eventually, this command will train the model and save a checkpoint to the ckpt folder every epochs.

python train.py --num_thread_reader=8 --epochs=15 --batch_size=32 --n_pair=64 --n_display=100 --embd_dim=4096 --checkpoint_dir=ckpt --features_path_2D=folder_for_2d_features --features_path_3D=folder_for_3d_features --train_csv=train.csv

Note that you have to replace "folder_for_2d_features" and "folder_for_3d_features" with the directory used to extract the 2d and 3d video features.

If you find the code / model useful, please cite our paper

@inproceedings{miech19howto100m,
   title={How{T}o100{M}: {L}earning a {T}ext-{V}ideo {E}mbedding by {W}atching {H}undred {M}illion {N}arrated {V}ideo {C}lips},
   author={Miech, Antoine and Zhukov, Dimitri and Alayrac, Jean-Baptiste and Tapaswi, Makarand and Laptev, Ivan and Sivic, Josef},
   booktitle={ICCV},
   year={2019},
}

howto100m's People

Contributors

antoine77340 avatar

Watchers

 avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.