
video-cnn-feat

Extracting CNN features from video frames by MXNet

The video-cnn-feat toolbox provides Python code and scripts for extracting CNN features from video frames with pre-trained MXNet models. We used this toolbox in our winning solution for the TRECVID 2018 ad-hoc video search (AVS) task and in our W2VV++ paper.

Requirements

Environments

  • Ubuntu 16.04
  • CUDA 9.0
  • python 2.7
  • opencv-python
  • mxnet-cu90
  • numpy

We used virtualenv to set up a deep learning workspace that supports MXNet. Run the following script to install the required packages.

virtualenv --system-site-packages ~/cnn_feat
source ~/cnn_feat/bin/activate
pip install -r requirements.txt
deactivate
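
After activating the environment, a quick sanity check (not part of the toolbox, just an illustrative snippet) can confirm that the required packages are importable:

# sanity check of the workspace; illustrative only
import cv2      # opencv-python, used for frame extraction
import mxnet    # mxnet-cu90, used for CNN feature extraction
import numpy

print('OpenCV %s, MXNet %s, NumPy %s' % (cv2.__version__, mxnet.__version__, numpy.__version__))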

MXNet models

1. ResNet-152 from the MXNet model zoo

# Download resnet-152 model pre-trained on imagenet-11k
./do_download_resnet152_11k.sh

2. ResNeXt-101 from MediaMill, University of Amsterdam

Send a request to xirong AT ruc DOT edu DOT cn for the model link. Please read the ImageNet Shuffle paper for technical details.

Get started

Our code assumes the following data organization. We provide the toydata folder as an example.

collection_name
+ VideoData
+ ImageData
+ id.imagepath.txt

The toydata folder is assumed to be placed at $HOME/VisualSearch/. Video files are stored in the VideoData folder. Frame files are stored in the ImageData folder.

  • Video filenames shall end with .mp4, .avi, .webm, or .gif.
  • Frame filenames shall end with .jpg.
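
We assume each line of id.imagepath.txt pairs a frame (image) id with the path of the corresponding frame file; the ids and paths below are hypothetical and only illustrate the idea:

video1_frame_000030 ImageData/video1_frame_000030.jpg
video1_frame_000060 ImageData/video1_frame_000060.jpg
video2_frame_000030 ImageData/video2_frame_000030.jpg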

Feature extraction for a given video collection is performed in the following four steps. Skip the first step if frames are already there.

Step 1. Extract frames from videos

collection=toydata
./do_extract_frames.sh $collection
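
The script decodes each video in VideoData and writes the sampled frames as .jpg files into ImageData. The Python sketch below illustrates the idea with OpenCV; the sampling step (every_n) and the frame naming scheme are assumptions for illustration and may differ from what do_extract_frames.sh actually does.

# illustrative frame extraction with OpenCV; sampling rate and naming are assumptions
import os
import cv2

def extract_frames(video_path, out_dir, every_n=15):
    # save every n-th decoded frame of video_path as a .jpg in out_dir
    vid = os.path.splitext(os.path.basename(video_path))[0]
    cap = cv2.VideoCapture(video_path)
    idx, saved = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n == 0:
            fname = '%s_%06d.jpg' % (vid, idx)
            cv2.imwrite(os.path.join(out_dir, fname), frame)
            saved += 1
        idx += 1
    cap.release()
    return saved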

Step 2. Extract frame-level CNN features

./do_resnet152-11k.sh $collection
./do_resnext101.sh $collection
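
Each script runs its pre-trained model over all frames listed in id.imagepath.txt and stores the flatten0_output activations as the frame-level feature. Below is a minimal sketch of such an extraction for a single image with the MXNet Module API; the checkpoint prefix/epoch and the 224x224 input size are illustrative assumptions, and the actual scripts additionally handle batching and file I/O.

# illustrative single-image feature extraction; checkpoint prefix/epoch and
# the 224x224 input size are assumptions, not necessarily what the scripts use
import cv2
import mxnet as mx
import numpy as np

def load_extractor(prefix, epoch, layer='flatten0_output', ctx=mx.gpu(0)):
    # load the checkpoint and truncate the network at the requested layer
    sym, arg_params, aux_params = mx.model.load_checkpoint(prefix, epoch)
    feat_sym = sym.get_internals()[layer]
    mod = mx.mod.Module(symbol=feat_sym, context=ctx, label_names=None)
    mod.bind(for_training=False, data_shapes=[('data', (1, 3, 224, 224))])
    mod.set_params(arg_params, aux_params)
    return mod

def extract_feature(mod, image_path):
    # read a frame, convert BGR->RGB, reshape to NCHW and run a forward pass
    img = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, (224, 224))
    data = np.transpose(img, (2, 0, 1)).astype('float32')[np.newaxis]
    mod.forward(mx.io.DataBatch([mx.nd.array(data)]))
    return mod.get_outputs()[0].asnumpy().flatten()

# e.g. mod = load_extractor('resnet-152', 0); feat = extract_feature(mod, 'frame.jpg')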

Step 3. Obtain video-level CNN features (by mean pooling over frames)

./do_feature_pooling.sh $collection pyresnet-152_imagenet11k,flatten0_output,os
./do_feature_pooling.sh $collection pyresnext-101_rbps13k,flatten0_output,os
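
Mean pooling averages the frame-level vectors of a video, element-wise, into a single video-level vector. A minimal NumPy sketch of the operation (the toolbox script also takes care of grouping frames per video and reading/writing the feature files):

import numpy as np

def mean_pool(frame_feats):
    # frame_feats: list of 1-D feature vectors belonging to the same video
    # returns the element-wise mean as the video-level feature
    return np.vstack(frame_feats).mean(axis=0)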

Step 4. Feature concatenation

featname=pyresnext-101_rbps13k,flatten0_output,os+pyresnet-152_imagenet11k,flatten0_output,os
./do_concat_features.sh $collection $featname
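
Concatenation joins the two pooled vectors of each video into one feature, so the dimensionality of the combined feature is the sum of the two (e.g. two 2048-D features give a 4096-D feature). A minimal sketch:

import numpy as np

def concat_features(feat_a, feat_b):
    # e.g. feat_a from pyresnext-101_rbps13k, feat_b from pyresnet-152_imagenet11k
    return np.concatenate([feat_a, feat_b])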

Acknowledgements

This project was supported by the National Natural Science Foundation of China (No. 61672523).

Contributors

xuchaoxi, li-xirong, qcuong98
