This project forked from chuckcho/video-caffe

Video-friendly caffe -- comes with a recent version of Caffe (Dec 2016), a video reader, 3D(ND) pooling layer, and an example training script for C3D network and UCF-101 data

License: Other

CMake 2.67% Makefile 0.68% Shell 0.33% C++ 79.96% Cuda 5.79% MATLAB 0.87% M 0.01% Python 8.07% Protocol Buffer 1.61%

video-caffe's Introduction

Video-Caffe: Caffe with C3D implementation and video reader

This is a 3D convolution (C3D) and video reader implementation on top of the latest Caffe (Dec 2016). The original Facebook C3D implementation was branched from Caffe on July 17, 2014 (git commit b80fc86) and has not been rebased on upstream Caffe since, so it lacks quite a few features added to Caffe after that date. I therefore ported the C3D concept and its accompanying video reader to the latest Caffe, and I will try to rebase this repo on upstream whenever an important new feature lands. This repo is currently rebased on 5a201dd (Dec 19, 2016). Please reach out with any feedback or questions.

Check out the original Caffe readme for Caffe-specific information.

Branches

The refactor branch is a recent rework based on the original Caffe and the "Nd convolution and pooling with cuDNN" PR. It is a cleaner, less hacky implementation of 3D convolution/pooling than the master branch and is expected to be more stable, so feel free to try it. One feature still missing from the refactor branch is the Python wrapper.

Requirements

In addition to the prerequisites for Caffe, video-caffe depends on cuDNN. It is known to work with cuDNN versions 4 and 5; building with v3 may take some extra effort.

  • If you use "make" to build, make sure Makefile.config points to the right paths for CUDA and cuDNN.
  • If you use "cmake" to build, double-check that CUDNN_INCLUDE and CUDNN_LIBRARY are correct. If they are not, try something like cmake -DCUDNN_INCLUDE="/your/path/to/include" -DCUDNN_LIBRARY="/your/path/to/lib" ${video-caffe-root}.
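Before building, it can help to confirm which cuDNN version the include path actually provides. The following is a minimal sketch, not part of video-caffe: it assumes cudnn.h defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL (as the v4 and v5 headers do), and parse_cudnn_version is a hypothetical helper name.

```python
import re


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) parsed from the text of cudnn.h, or None.

    Assumption: the header contains lines such as '#define CUDNN_MAJOR 5'.
    """
    found = dict(re.findall(
        r"^\s*#\s*define\s+CUDNN_(MAJOR|MINOR|PATCHLEVEL)\s+(\d+)",
        header_text,
        flags=re.MULTILINE,
    ))
    try:
        return (int(found["MAJOR"]), int(found["MINOR"]), int(found["PATCHLEVEL"]))
    except KeyError:
        # One of the three macros is missing, so the version is unknown.
        return None


# Illustrative header excerpt; for a real check, read the file instead, e.g.
# header_text = open("/your/path/to/include/cudnn.h").read()
sample = (
    "#define CUDNN_MAJOR      5\n"
    "#define CUDNN_MINOR      1\n"
    "#define CUDNN_PATCHLEVEL 10\n"
)
version = parse_cudnn_version(sample)
print("cuDNN version:", version)
print("known to work with video-caffe:", version is not None and version[0] in (4, 5))
```

If the function returns None, the header probably uses a different layout than assumed here, and the version has to be checked by hand.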

Building video-caffe

Key steps to build video-caffe are:

  1. git clone git@github.com:chuckcho/video-caffe.git
  2. cd video-caffe
  3. mkdir build && cd build
  4. cmake ..
  5. Make sure CUDA and cuDNN are detected and that their paths are correct.
  6. make all -j8
  7. make install
  8. (optional) make runtest

Usage

Look at ${video-caffe-root}/examples/c3d_ucf101/c3d_ucf101_train_test.prototxt to see how 3D convolution and pooling are used. In a nutshell, use an NdConvolution or NdPooling layer with {kernel,stride,pad}_shape parameters that specify 3D shapes as (L x H x W), where L is the temporal length (usually 16).

...
# ----- video/label input -----
layer {
  name: "data"
  type: "VideoData"
  top: "data"
  top: "label"
  video_data_param {
    source: "examples/c3d_ucf101/c3d_ucf101_train_split1.txt"
    batch_size: 50
    new_height: 128
    new_width: 171
    new_length: 16
    shuffle: true
  }
  include {
    phase: TRAIN
  }
  transform_param {
    crop_size: 112
    mirror: true
    mean_value: 90
    mean_value: 98
    mean_value: 102
  }
}
...
# ----- 1st group -----
layer {
  name: "conv1a"
  type: "NdConvolution"
  bottom: "data"
  top: "conv1a"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 64
    kernel_shape { dim: 3 dim: 3 dim: 3 }
    stride_shape { dim: 1 dim: 1 dim: 1 }
    pad_shape    { dim: 1 dim: 1 dim: 1 }
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
...
layer {
  name: "pool1"
  type: "NdPooling"
  bottom: "conv1a"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_shape { dim: 1 dim: 2 dim: 2 }
    stride_shape { dim: 1 dim: 2 dim: 2 }
  }
}
...
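To see how the (L x H x W) shapes above propagate, the sketch below computes the output size of conv1a and pool1 for one cropped clip. It assumes NdConvolution and NdPooling follow stock Caffe arithmetic (round down for convolution, round up for pooling); this has not been checked against the video-caffe source, and the helper names are hypothetical.

```python
import math


def conv_out(size, kernel, stride=1, pad=0):
    """Output length along one convolution axis (rounds down)."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, kernel, stride=1, pad=0):
    """Output length along one pooling axis (rounds up, as stock Caffe does)."""
    return math.ceil((size + 2 * pad - kernel) / stride) + 1


def nd_shape(axis_fn, in_shape, kernel, stride, pad=None):
    """Apply a per-axis output formula to every axis of an (L, H, W) shape."""
    pad = pad or [0] * len(in_shape)
    return [axis_fn(s, k, st, p) for s, k, st, p in zip(in_shape, kernel, stride, pad)]


# One clip after the data layer: new_length 16, crop_size 112 x 112.
clip = [16, 112, 112]

# conv1a: kernel 3x3x3, stride 1x1x1, pad 1x1x1
conv1a = nd_shape(conv_out, clip, [3, 3, 3], [1, 1, 1], [1, 1, 1])

# pool1: kernel 1x2x2, stride 1x2x2, no padding
pool1 = nd_shape(pool_out, conv1a, [1, 2, 2], [1, 2, 2])

print("conv1a (L, H, W):", conv1a)
print("pool1  (L, H, W):", pool1)
```

Under these assumptions conv1a keeps the clip at 16 x 112 x 112, and pool1 halves only the spatial axes because its temporal kernel and stride are both 1.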

UCF-101 training demo

Scripts and training files for C3D training on UCF-101 are located in examples/c3d_ucf101/. Steps to train C3D on UCF-101:

  1. Download the UCF-101 dataset from the UCF-101 website.
  2. Unpack the dataset: e.g. unrar x UCF101.rar
  3. (Optional) The video reader works more stably with extracted frames than directly with video files. Extract frames from the UCF-101 videos by revising and running the helper script ${video-caffe-root}/examples/c3d_ucf101/extract_UCF-101_frames.sh.
  4. Change ${video-caffe-root}/examples/c3d_ucf101/c3d_ucf101_{train,test}_split1.txt to point to the UCF-101 videos or to the directories that contain the extracted frames.
  5. Modify ${video-caffe-root}/examples/c3d_ucf101/c3d_ucf101_train_test.prototxt to suit your taste or hardware. In particular, batch_size may need to be adjusted to fit your GPU memory.
  6. Run the training script: e.g. cd ${video-caffe-root} && examples/c3d_ucf101/train_ucf101.sh (optionally use --gpu to use multiple GPUs)
  7. (Optional) Occasionally run ${video-caffe-root}/tools/extra/plot_training_loss.sh to plot training loss and validation accuracy (top-1/top-5). The script is fairly hacky, so read it and adapt it to your needs.
  8. After 7 epochs of training, clip accuracy should be around 45%.
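Step 5 notes that batch_size may need adjusting for GPU memory. As a rough illustration of why, the sketch below estimates the size of the data blob and the conv1a output blob for the example settings (batch_size 50, clips of 16 x 112 x 112). It assumes 3 input channels and 32-bit floats; blob_bytes is a hypothetical helper, and the real footprint is larger because gradients, the remaining layers and cuDNN workspaces are not counted.

```python
def blob_bytes(batch, channels, length, height, width, bytes_per_value=4):
    """Size in bytes of one dense 5D blob (N x C x L x H x W) of 32-bit floats."""
    return batch * channels * length * height * width * bytes_per_value


def to_mib(num_bytes):
    """Convert bytes to mebibytes."""
    return num_bytes / 2**20


batch_size = 50  # value from the example prototxt

# Input clips: 3 channels (assumed), 16 frames, 112 x 112 crop.
data_blob = blob_bytes(batch_size, 3, 16, 112, 112)

# conv1a output: 64 feature maps, same L x H x W because of pad 1 / stride 1.
conv1a_blob = blob_bytes(batch_size, 64, 16, 112, 112)

print("data blob:   %.1f MiB" % to_mib(data_blob))
print("conv1a blob: %.1f MiB" % to_mib(conv1a_blob))

# Halving batch_size halves both numbers.
print("conv1a blob at batch_size 25: %.1f MiB"
      % to_mib(blob_bytes(25, 64, 16, 112, 112)))
```

Because blob size scales linearly with batch_size, lowering it is the most direct way to fit the network on a smaller GPU.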

Pretrained model

Jimmy provided a pretrained model (downloadable link) for UCF101 (trained from scratch), achieving top-1 accuracy of 47% (as reported in chuckcho#46).

License and Citation

Caffe is released under the BSD 2-Clause license.
