chihyaoma / activity-recognition-with-cnn-and-rnn
438 stars · 26 watchers · 148 forks · 120.5 MB

Temporal Segments LSTM and Temporal-Inception for Activity Recognition

License: MIT License

Lua 92.82% Python 1.91% C++ 5.27%
activity-recognition video-understanding torch lstm-neural-networks convolutional-neural-networks

activity-recognition-with-cnn-and-rnn's Introduction

Activity Recognition with RNN and Temporal-ConvNet

License: MIT

Chih-Yao Ma, Min-Hung Chen
(equal contribution)

Code for the paper:
TS-LSTM and Temporal-Inception: Exploiting Spatiotemporal Dynamics for Activity Recognition
(Accepted in the journal Signal Processing: Image Communication, 2019)

Project:
Activity Recognition with RNN and Temporal-ConvNet


Abstract

In this work, we demonstrate a strong baseline two-stream ConvNet using ResNet-101. We use this baseline to thoroughly examine the use of both RNNs and Temporal-ConvNets for extracting spatiotemporal information. Building upon our experimental results, we then propose and investigate two different networks to further integrate spatiotemporal information: 1) temporal segment RNN and 2) Inception-style Temporal-ConvNet.

Our analysis identifies specific limitations for each method that could form the basis of future work. Our experimental results on the UCF101 and HMDB51 datasets achieve state-of-the-art performance, 94.1% and 69.0% respectively, without requiring extensive temporal augmentation.


How do we tackle the activity recognition problem?


Demo

The GIFs demonstrate the top-3 prediction results of our TS-LSTM and Temporal-Inception methods. The text at the top is the ground truth, the three lines below it are the predictions from each method, and the bars next to the predictions indicate how confident the model is in each prediction.


Dataset

We are currently using the UCF101 and HMDB51 datasets for our project. You can directly download the videos here:

        UCF101   HMDB51
RGB     link     link
TV-L1   link     link

Prerequisites


Usage

We propose two different methods to train models for activity recognition: TS-LSTM and Temporal-Inception.

Inputs

Our models take the feature vectors generated by the first-stage two-stream ConvNet as input for training. You can generate the features using our code under "/CNN-Pred-Feat/", or download the feature vectors we generated (please refer to the Dropbox links below; a minimal loading sketch follows the tables). We followed the official training/testing splits of UCF101 and HMDB51. If you would like to compare with our results, please use the same training and testing lists, since the choice of split strongly affects overall performance.

  • Features for training:

            UCF101           HMDB51
    RGB     sp1 sp2 sp3      sp1 sp2 sp3
    TV-L1   sp1 sp2 sp3      sp1 sp2 sp3

  • Features for testing:

            UCF101           HMDB51
    RGB     sp1 sp2 sp3      sp1 sp2 sp3
    TV-L1   sp1 sp2 sp3      sp1 sp2 sp3
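The downloaded features are Torch-serialized .t7 files. As a quick sanity check, you can inspect one in a Torch session before wiring it into the training scripts. This is a minimal sketch; the file name is illustrative, so substitute whichever split file you downloaded or generated:

-- inspect-features.lua: sanity-check a feature file
-- (file name is illustrative; use the .t7 file you actually have)
require 'torch'

local feat = torch.load('data_feat_train_RGB_centerCrop_25f_sp1.t7')

-- print the loaded object to discover its field names and tensor
-- shapes before pointing the training scripts at the directory
print(feat)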

Train with RNN

We use the RNN library provided by Element-Research. Simply install it by:

$ luarocks install rnn

After downloading the feature vectors, please modify ./RNN/data-ucf101.lua to point to the directory where you put your feature vector files.

To start the training process, go to ./RNN and simply execute:

$ th main.lua -pastalogName 'model_RNN' -nGPU 1 -dataset 'ucf101' -split '1' -fcSize '{0}' -hiddenSize '{512}' -lstm -spatFeatDir '<path/to/feature/>' -tempFeatDir '<path/to/feature/>'

The training and testing losses are reported and saved into log files. The learning rate and best testing accuracy are reported each epoch whenever there is an update.
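For orientation, below is a minimal sketch of the kind of LSTM sequence classifier the rnn package expresses: nn.Sequencer unrolls an LSTM over the feature sequence, and the last hidden state feeds a linear classifier. The sizes and class count here are illustrative; this is not the exact TS-LSTM architecture from the paper.

-- rnn-sketch.lua: minimal LSTM sequence classifier with the rnn package
-- (illustrative sizes; not the exact TS-LSTM model)
require 'rnn'

local inputSize, hiddenSize, numClasses = 2048, 512, 101

local model = nn.Sequential()
model:add(nn.Sequencer(nn.FastLSTM(inputSize, hiddenSize))) -- LSTM at every time step
model:add(nn.Select(1, -1))                  -- keep only the last time step
model:add(nn.Linear(hiddenSize, numClasses))
model:add(nn.LogSoftMax())

-- input: seqLength x batchSize x inputSize
local seq = torch.randn(25, 8, inputSize)
print(model:forward(seq):size())             -- 8 x 101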

Train with Temporal-ConvNet

To start the training process, go to ./Temporal-ConvNet and simply execute:

$ th run.lua -o <output_folder_name> --dataset <dataset-name>

For more details and hyper-parameter tuning, please refer to the readme file in the folder ./Temporal-ConvNet/.

You also need to modify ./Temporal-ConvNet/data-2Stream.lua to point to the directory where you put your feature vector files.

The training and testing performance will be plotted, and the results will be saved into log files. The best testing accuracy will be reported each epoch if there is any update.
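Conceptually, an Inception-style temporal module runs convolutions with several kernel widths over the time axis in parallel and concatenates their outputs along the feature dimension. The sketch below shows that idea in plain nn; the widths and feature sizes are illustrative, not the actual Temporal-Inception configuration (see the readme in ./Temporal-ConvNet/ for that).

-- temporal-inception-sketch.lua: parallel temporal convolutions with
-- different kernel widths, concatenated over the feature dimension
-- (illustrative sizes; not the paper's exact configuration)
require 'nn'

local inputFrameSize, outputFrameSize = 2048, 64

local branches = nn.Concat(2) -- concatenate branch outputs over features
for _, kw in ipairs({3, 5, 7}) do
   local halfPad = (kw - 1) / 2
   local branch = nn.Sequential()
   branch:add(nn.Padding(1, -halfPad)) -- zero-pad the time axis in front...
   branch:add(nn.Padding(1, halfPad))  -- ...and in back, so lengths match
   branch:add(nn.TemporalConvolution(inputFrameSize, outputFrameSize, kw))
   branch:add(nn.ReLU())
   branches:add(branch)
end

-- input: nFrames x inputFrameSize; output: nFrames x (3 * outputFrameSize)
local seq = torch.randn(25, inputFrameSize)
print(branches:forward(seq):size()) -- 25 x 192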


Can I train with frame-level features?

To standardize the comparison, the above features are sampled uniformly across each video. If you would like to train with frame-level features extracted at 25 fps for all videos in UCF101, please refer to Temporal Augmentation using frame-level features with RNN.


Citation

@article{ma2019ts,
  title={{TS-LSTM} and {Temporal-Inception}: Exploiting Spatiotemporal Dynamics for Activity Recognition},
  author={Ma, Chih-Yao and Chen, Min-Hung and Kira, Zsolt and AlRegib, Ghassan},
  journal={Signal Processing: Image Communication},
  volume={71},
  pages={76--87},
  year={2019},
  publisher={Elsevier}
}

Acknowledgment

This work started as a project for the deep learning class at Georgia Tech in Spring 2016. We teamed up with Hao Yan and Casey Battaglino for that class project; they were a great help and provided valuable discussions along the way.

Please contact us if you have any questions.

Chih-Yao Ma at [email protected] or [LinkedIn]
Min-Hung Chen at [email protected] or [LinkedIn]

activity-recognition-with-cnn-and-rnn's People

Contributors

chihyaoma, cmhungsteve


activity-recognition-with-cnn-and-rnn's Issues

Optical_Flow_images

Thanks for sharing your code, nice job!
Could you please provide a link to the optical flow maps (images) for UCF101 and HMDB51? Computing the optical flow is so slow that I have to beg you for help. @chihyaoma @cmhungsteve

/CNN-Pred-Feat/ error

@chihyaoma @cmhungsteve
When I configure the Torch Video Decoder Library, an error appears. My steps to configure the environment are as follows:

$ sudo add-apt-repository ppa:mc3man/trusty-media
$ sudo apt-get update
$ sudo apt-get dist-upgrade
$ sudo apt-get install ffmpeg
$ luarocks install ffmpeg

$ sudo apt-get install -y libavformat-dev libavutil-dev libavcodec-dev libswscale-dev libfreetype6-dev

Then I go to torch-toolbox/Video-decoder/ and run make; an error appears, and I do not know how to solve it:

user@2017:/torch-toolbox-master/Video-decoder$ make
gcc -O3 -c -fpic -Wall -DDOVIDEOCAP -I. -I/home/user/torch/install/include -I/usr/include/freetype2 video_decoder.c
video_decoder.c:3065:30: error: array type has incomplete element type ‘struct luaL_reg’
static const struct luaL_reg video_decoder[] = {
^
video_decoder.c: In function ‘luaopen_libvideo_decoder’:
video_decoder.c:3094:2: warning: implicit declaration of function ‘luaL_register’ [-Wimplicit-function-declaration]
luaL_register(L, "libvideo_decoder", video_decoder);
^
In file included from video_decoder.c:13:0:
video_decoder.c: At top level:
/home/user/torch/install/include/luaT.h:41:12: warning: ‘luaL_typerror’ defined but not used [-Wunused-function]
static int luaL_typerror(lua_State *L, int narg, const char *tname)
^
video_decoder.c:3065:30: warning: ‘video_decoder’ defined but not used [-Wunused-variable]
static const struct luaL_reg video_decoder[] = {
^
make: *** [video_decoder.o] Error 1
user@2017:/torch-toolbox-master/Video-decoder$

Python

Hey,

Is there any chance of releasing this code in Python? I don't know anything about Lua.

Thanks.

Datasets

Good Afternoon.

I want to create my own dataset of human facial emotions, with my own classes.

Is it possible to use your code for that? If so, should I do feature extraction first?

Thanks in advance, and sorry, I'm still a newbie at this.

IDE

Could you tell me which IDE you use to debug Lua on Linux? Thanks.

the 25th frame of each video not used in segments

Hi,
Thanks a lot for your open-source code!
It seems that only 24 frames per video are used: there are three segments, and each segment contains only 8 frames. Is that correct?
Highly appreciate your time and help!

CNN-Pred-Feat/run_pred-feat_twoStreams.lua error

Around lines 401-404:

existTr = opt.save and paths.filep(namePredTr) and paths.filep(nameFeatTr[1]) and paths.filep(nameFeatTr[2])
Tr = {} -- output prediction & info
featTr = {}
if not existData then

I think existData should be changed to existTr, because there is no variable named existData in this context. Is that true?
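(If that reading is right, the fix would change only the guard line, assuming existData is indeed a typo for existTr:)

if not existTr then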

Trained model

Hi,

Could you please share the trained model file? I am using a CPU-only system, and training takes a lot of time.

RNN model

Hi!
I'm trying to implement the RNN model from your code on HMDB-51, but I think it's better to run your well-trained RNN model before I implement the whole RNN code myself, so that I can gain a deeper understanding of the structure of your RNN code. So... could you please share the well-trained RNN model file(s)?
Best regards and thanks!

top1 0

I ran main.lua in the /RNN folder with

th main.lua -pastalogName 'model_RNN' -nGPU 3 -dataset 'ucf101' -split '1' -fcSize '{0}' -hiddenSize '{512}' -lstm -spatFeatDir './FeatureMap/' -tempFeatDir './FeatureMap/'

and the top1 accuracy decreased from 100 to 0 and stayed at 0. Why?

Part of the log:
87.629 | Epoch: [2][1/9537] Time 0.161 Data 0.047 Err 3.4332 top1 0.000 top3 0.000
87.629 | Epoch: [2][65/9537] Time 0.066 Data 0.000 Err 3.4050 top1 0.000 top3 0.000
87.629 | Epoch: [2][129/9537] Time 0.067 Data 0.000 Err 3.3989 top1 0.000 top3 0.000
87.629 | Epoch: [2][193/9537] Time 0.064 Data 0.000 Err 3.3824 top1 0.000 top3 0.000
87.629 | Epoch: [2][257/9537] Time 0.068 Data 0.000 Err 3.4021 top1 0.000 top3 0.000
87.629 | Epoch: [2][321/9537] Time 0.067 Data 0.000 Err 3.3154 top1 1.562 top3 0.000
87.629 | Epoch: [2][385/9537] Time 0.065 Data 0.000 Err 3.3323 top1 0.000 top3 0.000
87.629 | Epoch: [2][449/9537] Time 0.065 Data 0.000 Err 3.3774 top1 0.000 top3 0.000
87.629 | Epoch: [2][513/9537] Time 0.065 Data 0.000 Err 3.2946 top1 0.000 top3 0.000
87.629 | Epoch: [2][577/9537] Time 0.066 Data 0.000 Err 3.3189 top1 0.000 top3 0.000
87.629 | Epoch: [2][641/9537] Time 0.065 Data 0.000 Err 3.3365 top1 0.000 top3 0.000
87.629 | Epoch: [2][705/9537] Time 0.067 Data 0.000 Err 3.3565 top1 1.562 top3 0.000
87.629 | Epoch: [2][769/9537] Time 0.067 Data 0.000 Err 3.3036 top1 0.000 top3 0.000
87.629 | Epoch: [2][833/9537] Time 0.064 Data 0.000 Err 3.2297 top1 0.000 top3 0.000
87.629 | Epoch: [2][897/9537] Time 0.066 Data 0.000 Err 3.2951 top1 1.562 top3 0.000
87.629 | Epoch: [2][961/9537] Time 0.115 Data 0.000 Err 3.2905 top1 0.000 top3 0.000
87.629 | Epoch: [2][1025/9537] Time 0.075 Data 0.000 Err 3.2996 top1 0.000 top3 0.000
87.629 | Epoch: [2][1089/9537] Time 0.095 Data 0.000 Err 3.2523 top1 0.000 top3 0.000
87.629 | Epoch: [2][1153/9537] Time 0.089 Data 0.000 Err 3.2662 top1 0.000 top3 0.000
87.629 | Epoch: [2][1217/9537] Time 0.087 Data 0.000 Err 3.2950 top1 0.000 top3 0.000
87.629 | Epoch: [2][1281/9537] Time 0.070 Data 0.000 Err 3.1993 top1 0.000 top3 0.000
87.629 | Epoch: [2][1345/9537] Time 0.070 Data 0.000 Err 3.2526 top1 0.000 top3 0.000
87.629 | Epoch: [2][1409/9537] Time 0.065 Data 0.000 Err 3.2543 top1 1.562 top3 0.000
87.629 | Epoch: [2][1473/9537] Time 0.066 Data 0.000 Err 3.2717 top1 0.000 top3 0.000
87.629 | Epoch: [2][1537/9537] Time 0.064 Data 0.000 Err 3.2546 top1 0.000 top3 0.000
87.629 | Epoch: [2][1601/9537] Time 0.066 Data 0.000 Err 3.1987 top1 0.000 top3 0.000
87.629 | Epoch: [2][1665/9537] Time 0.068 Data 0.000 Err 3.1955 top1 0.000 top3 0.000
87.629 | Epoch: [2][1729/9537] Time 0.068 Data 0.000 Err 3.2191 top1 0.000 top3 0.000
87.629 | Epoch: [2][1793/9537] Time 0.068 Data 0.000 Err 3.1774 top1 0.000 top3 0.000
87.629 | Epoch: [2][1857/9537] Time 0.089 Data 0.000 Err 3.1900 top1 0.000 top3 0.000
87.629 | Epoch: [2][1921/9537] Time 0.101 Data 0.000 Err 3.2152 top1 0.000 top3 0.000
87.629 | Epoch: [2][1985/9537] Time 0.077 Data 0.000 Err 3.1566 top1 0.000 top3 0.000
87.629 | Epoch: [2][2049/9537] Time 0.093 Data 0.000 Err 3.1597 top1 0.000 top3 0.000
87.629 | Epoch: [2][2113/9537] Time 0.092 Data 0.000 Err 3.1578 top1 0.000 top3 0.000
87.629 | Epoch: [2][2177/9537] Time 0.078 Data 0.000 Err 3.1873 top1 1.562 top3 0.000
87.629 | Epoch: [2][2241/9537] Time 0.076 Data 0.000 Err 3.1941 top1 1.562 top3 0.000

memory issue

Hi,
Thanks a lot for providing the open-source code!
May I ask how much memory is needed to run the example code, such as:
$ th main.lua -pastalogName 'model_RNN' -nGPU 1 -dataset 'ucf101' -split '1' -fcSize '{0}' -hiddenSize '{512}' -lstm -spatFeatDir '<path/to/feature/>' -tempFeatDir '<path/to/feature/>'

My desktop has an Nvidia 1070, but it runs out of memory when running this code.
Thanks a lot!

libvideo_decoder

Hello, is there a usable "libvideo_decoder" package, or how can I get it via luarocks? It is required by run_pred-feat_twoStreams.lua.
Thank you!

Real Time

Good afternoon.

Is there any possible way to run this model in real time?

I will wait for your response.

Thanks

Fine-tuned model "model_best.t7" is wrong.

Hi,
I got a new error. After spending almost two weeks fine-tuning both the flow and RGB models ("model_best.t7" in the folder CNN-GPUs), when I use the two models to generate the features (in CNN-Pred-Feat), this error appears:

inconsistent tensor size, expected tensor [1 x 25 x 101] and src [400 x 101] to have the same number of elements, but got 2525 and 40400 elements respectively at /home/gtune/torch/pkg/torch/lib/TH/generic/THTensorCopy.c:86

How can I correct the code?

Cannot generate features

@chihyaoma @cmhungsteve
Hi,
in the folder CNN-Pred-Feat, when I run the command to generate features, I get the error below:

split #: 1
...
==> Processing all the videos...
Current Class: 1. ApplyEyeMakeup
[mpeg4 @ 0x7f45937175a0] Invalid and inefficient vfw-avi packed B frames detected
[mpeg4 @ 0x7f4594342ac0] Invalid and inefficient vfw-avi packed B frames detected
/home/user/torch/install/bin/luajit: bad argument #2 to '?' (start index out of bound at /tmp/luarocks_torch-scm-1-2915/torch7/generic/Tensor.c:984)
stack traceback:
[C]: at 0x7f4636095bd0
[C]: in function '__index'
run_pred-feat_twoStreams.lua:679: in main chunk
[C]: in function 'dofile'
...tune/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00406670
...

The first problem is "[mpeg4 @ ...] Invalid and inefficient vfw-avi packed B frames detected"
The second problem is "/torch/install/bin/luajit: bad argument #2 to '?' (start index out of bound at /tmp/luarocks_torch-scm-1-2915/torch7/generic/Tensor.c:984)"

I don't know how to fix these problems...

Also, how should I prepare the dataset? Do I have to extract RGB and optical flow frames from the videos into JPG files?

invalid arguments: DoubleTensor FloatTensor number

@chihyaoma @cmhungsteve
When I execute: th run.lua -o /root/datasets/trainingData/TC --dataset UCF-101

I get the following result:
root@rnn-ThinkServer-TS50X:~/Activity-Recognition-with-CNN-and-RNN/Temporal-ConvNet# th run.lua -o /root/datasets/trainingData/TC --dataset UCF-101
==> processing options
==> switching to CUDA
==> using GPU #1
==> load data
dirFeature #: /root/datasets/Features/UCF-101/
split #: 1
==> load test data: data_feat_test_RGB_centerCrop_25f_sp1.t7
==> load test data: data_feat_test_FlowMap-TVL1-crop20_centerCrop_25f_sp1.t7
dirFeature #: /root/datasets/Features/UCF-101/
split #: 1
==> load test data: data_feat_test_RGB_centerCrop_25f_sp1.t7
==> load test data: data_feat_test_FlowMap-TVL1-crop20_centerCrop_25f_sp1.t7
/root/torch/install/bin/lua: /root/torch/install/share/lua/5.1/trepl/init.lua:389: invalid arguments: DoubleTensor FloatTensor number
expected arguments: [DoubleTensor] DoubleTensor DoubleTensor [index] | [DoubleTensor] {DoubleTensor+} [index]
stack traceback:
[C]: in function 'error'
/root/torch/install/share/lua/5.1/trepl/init.lua:389: in function 'require'
run.lua:66: in main chunk
[C]: in function 'dofile'
.../torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: ?

What should I do? I need some help.

Feature vectors

Hi,

I am not able to download the feature vectors from the links given on the page; I have permission issues accessing Dropbox files from my machine. Could you please share the files?

'-rho'

What does this parameter mean? Does it mean the total number of frames sampled from each video?

Trouble with luarocks while installing torch

This isn't really relevant to the code base, but I had trouble installing Torch and luarocks, which is a dependency of the code, so I thought I should put it here. Sometimes permission issues with Lua can prevent luarocks from working. The following link provides a good discussion of the issue, as well as a solution.

libvideo_decoder missing

Trying to execute th run_pred-feat_twoStreams.lua, but the package libvideo_decoder is missing. I installed ffmpeg using both luarocks and apt. I tried using luarocks to install libvideo_decoder, but it couldn't find the package. I also tried installing the video decoder from torch-toolbox, but it seems it only works with Lua 5.1 (it runs into compile issues when building against Lua 5.2).
How do you install the libvideo_decoder package?

Error in checkpoints when trying to load a saved model

I get the error "unknown Torch class nn.Sequencer" when trying to create a demo file that returns test results on the data for a particular trained epoch/model. It originates from the following line in checkpoints.lua:
local latest = torch.load(latestPath)

running optical_flow.cpp error

When I built optical_flow.cpp with Eclipse, two errors came out, on lines 369 and 373:
cvtColor(cv::Mat&, cv::Mat&, ) is ambiguous

I just don't know how to solve this.
