This project forked from bellos1203/stpn

STPN - Weakly Supervised Action Localization by Sparse Temporal Pooling Network

License: Apache License 2.0

STPN - Weakly Supervised Action Localization by Sparse Temporal Pooling Network (reproduced)

Overview

This repository contains a reproduced implementation of the paper "Weakly Supervised Action Localization by Sparse Temporal Pooling Network" by Phuc Nguyen, Ting Liu, Gautam Prasad, and Bohyung Han, CVPR 2018.

Usage Guide

  • Hardware : TITAN X GPU

0.Requirements

  • Python3
  • Tensorflow 1.6.0
  • numpy 1.15.0
  • OpenCV 3.4.2
  • Sonnet (to extract features from I3D model)
  • Pandas (to evaluate)
  • SciPy 1.1.0

1.Preprocessing

  1. Subsample each video at 10 frames per second.
  2. Rescale the sampled frames so that the smaller dimension of each frame is 256 pixels, preserving the aspect ratio.
  3. Compute the optical flow (TV-L1).
  4. Save the RGB frames to train_data/rgb and the flow frames to train_data/flows, named vid_num/{:06d}.png (use test_data/rgb and test_data/flows for the test data). For convenience, I simply numbered the videos 1, 2, 3, ..., 200.
  5. Extract the feature vector of each video with the code in the "feature_extraction" folder. The extracted features are saved in [train/test]_data/[rgb/flow]_features. Since I use a TITAN X GPU with 12 GB of memory, I extract features from 16*100 frames at a time, i.e. 100 segments of 16 frames. If your GPU has less memory, reduce the number of segments processed at once. See extract_feature.sh in that folder.
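The bookkeeping behind the steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code in the "feature_extraction" folder: the function names are hypothetical, frame indices are assumed to start at whatever value the caller passes, segments are assumed to be non-overlapping runs of 16 frames, and a trailing partial segment is assumed to be dropped.

```python
def sampled_frame_indices(num_frames, src_fps, target_fps=10.0):
    """Source-frame indices to keep when subsampling to target_fps."""
    if src_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    step = src_fps / target_fps  # source frames per kept frame
    indices = []
    k = 0
    while int(k * step) < num_frames:
        indices.append(int(k * step))
        k += 1
    return indices


def rescaled_size(width, height, short_side=256):
    """New (width, height) whose smaller dimension equals short_side."""
    if width <= height:
        return short_side, round(height * short_side / width)
    return round(width * short_side / height), short_side


def frame_path(root, modality, vid_num, frame_idx):
    """Path following the vid_num/{:06d}.png naming convention."""
    return "{}/{}/{}/{:06d}.png".format(root, modality, vid_num, frame_idx)


def segment_batches(num_frames, frames_per_segment=16, segments_per_batch=100):
    """Yield (start, end) frame ranges covering up to segments_per_batch segments.

    Assumption: frames left over after the last whole segment are ignored.
    """
    num_segments = num_frames // frames_per_segment
    for first in range(0, num_segments, segments_per_batch):
        last = min(first + segments_per_batch, num_segments)
        yield first * frames_per_segment, last * frames_per_segment


if __name__ == "__main__":
    print(sampled_frame_indices(30, 30.0))        # every third frame of a 30 fps clip
    print(rescaled_size(320, 240))                # landscape frame, short side 256
    print(frame_path("train_data", "rgb", 1, 1))
    print(list(segment_batches(4000)))            # 250 segments, 100 per batch
```

On a GPU with less memory, lowering segments_per_batch in a helper like this is the kind of change the last step refers to; the actual option to use is the one in extract_feature.sh.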

2.Train the Model

  • Run train.sh.
  • See train.sh for more details.

3.Test and Extract the Result

  • Run test.sh.
  • See test.sh for more details.
  • Note that I excluded two incorrectly annotated videos, 270 and 1496, following the SSN paper.
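As a minimal sketch of the exclusion mentioned above, assuming the test videos are identified by the same integer ids the note refers to (the names below are hypothetical and not taken from test.sh):

```python
# Ids of the two videos excluded because of incorrect annotations.
EXCLUDED_VIDEOS = frozenset({270, 1496})


def filter_videos(video_ids, excluded=EXCLUDED_VIDEOS):
    """Return video_ids in their original order, without the excluded ones."""
    return [vid for vid in video_ids if vid not in excluded]


if __name__ == "__main__":
    print(filter_videos([269, 270, 271, 1496, 1497]))
```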

4.Evaluate

  • Run eval.sh.
  • See eval.sh for more details.
  • I used the evaluation code from the official ActivityNet repo, as the authors did.
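The evaluation compares predicted and ground-truth segments by temporal intersection over union (tIoU). The sketch below illustrates that metric only; it is not the ActivityNet evaluation code, the function names are hypothetical, and counting a prediction as a match when its tIoU is at least the threshold is an assumption made here.

```python
def temporal_iou(pred, gt):
    """tIoU of two (start, end) segments given in the same time unit."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def is_match(pred, gt, threshold):
    """True if the prediction overlaps the ground truth by at least threshold."""
    return temporal_iou(pred, gt) >= threshold


if __name__ == "__main__":
    # Overlap of 5 s over a combined span of 15 s.
    print(temporal_iou((0.0, 10.0), (5.0, 15.0)))
    print(is_match((0.0, 10.0), (5.0, 15.0), threshold=0.5))
```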

Reproduced Result

With the provided sample checkpoint (files in code/ckpt/ckpt001), I got the following results on the THUMOS14 test set, which are similar to those in the paper.

tIoU          0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9   mAP
STPN (paper)  52.0  44.7  35.5  25.8  16.9   9.9   4.3   1.2   0.1  21.2
Reproduced    52.1  44.2  34.7  26.1  17.7  10.1   4.9   1.3   0.1  21.3

Please note that the best result appears at around 22k to 25k iterations, and the performance may sometimes differ slightly from the numbers above.
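The last column of the table appears to be the mean of the nine per-threshold values; that reading is an assumption, not something the table states. A short check using the numbers from the table:

```python
TIOU_THRESHOLDS = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
PAPER = [52.0, 44.7, 35.5, 25.8, 16.9, 9.9, 4.3, 1.2, 0.1]
REPRODUCED = [52.1, 44.2, 34.7, 26.1, 17.7, 10.1, 4.9, 1.3, 0.1]


def average_map(values):
    """Mean of the per-threshold mAP values (assumed meaning of the last column)."""
    return sum(values) / len(values)


if __name__ == "__main__":
    print("paper      {:.2f}".format(average_map(PAPER)))
    print("reproduced {:.2f}".format(average_map(REPRODUCED)))
```

Because the per-threshold entries are already rounded to one decimal, a mean computed this way can differ from the reported figure in the last digit.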

Comments

If you have any questions or comments, please contact me. [email protected]

Acknowledgements

This work was supported by the Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (2017-0-01780, The technology development for event recognition/relational reasoning and learning knowledge based system for video understanding).

License

Apache-2.0
