chihyaoma / activity-recognition-with-cnn-and-rnn
438 stars · 26 watchers · 148 forks · 120.5 MB

Temporal Segments LSTM and Temporal-Inception for Activity Recognition

License: MIT License

Lua 92.82% Python 1.91% C++ 5.27%
activity-recognition video-understanding torch lstm-neural-networks convolutional-neural-networks

activity-recognition-with-cnn-and-rnn's Introduction

Activity Recognition with RNN and Temporal-ConvNet

License: MIT

Chih-Yao Ma, Min-Hung Chen
(equal contribution)

Code for the paper:
TS-LSTM and Temporal-Inception: Exploiting Spatiotemporal Dynamics for Activity Recognition
(Accepted in the journal Signal Processing: Image Communication, 2019)

Project:
Activity Recognition with RNN and Temporal-ConvNet


Abstract

In this work, we demonstrate a strong baseline two-stream ConvNet using ResNet-101. We use this baseline to thoroughly examine the use of both RNNs and Temporal-ConvNets for extracting spatiotemporal information. Building upon our experimental results, we then propose and investigate two different networks to further integrate spatiotemporal information: 1) temporal segment RNN and 2) Inception-style Temporal-ConvNet.

Our analysis identifies specific limitations for each method that could form the basis of future work. Our experimental results on the UCF101 and HMDB51 datasets achieve state-of-the-art performance, 94.1% and 69.0% respectively, without requiring extensive temporal augmentation.


How do we tackle the activity recognition problem?


Demo

The GIFs demonstrate the top-3 prediction results of our TS-LSTM and Temporal-Inception methods. The text at the top is the ground truth, the three lines below it are the predictions from each method, and the bars next to the predictions indicate how confident the model is in each prediction.


Dataset

We are currently using the UCF101 and HMDB51 datasets for our project. You can directly download the videos here:

        UCF101   HMDB51
RGB     link     link
TV-L1   link     link

Prerequisites


Usage

We propose two different methods to train models for activity recognition: TS-LSTM and Temporal-Inception.

Inputs

Our models take the feature vectors generated by the first-stage two-stream ConvNet as input for training. You can generate the features using our code under "/CNN-Pred-Feat/", or download the feature vectors we generated (please refer to the Dropbox links below; a minimal loading sketch follows the tables). We followed the official training/testing splits of UCF101 and HMDB51. If you would like to compare with our results, please use the same training and testing lists, since the choice of split strongly affects overall performance.

  • Features for training:

            UCF101           HMDB51
    RGB     sp1 sp2 sp3      sp1 sp2 sp3
    TV-L1   sp1 sp2 sp3      sp1 sp2 sp3

  • Features for testing:

            UCF101           HMDB51
    RGB     sp1 sp2 sp3      sp1 sp2 sp3
    TV-L1   sp1 sp2 sp3      sp1 sp2 sp3
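The downloaded features are Torch-serialized .t7 files. As a quick sanity check, you can inspect one in a Torch session before wiring it into the training scripts. This is a minimal sketch; the file name is illustrative, so substitute whichever split file you downloaded or generated:

-- inspect-features.lua: sanity-check a feature file
-- (file name is illustrative; use the .t7 file you actually have)
require 'torch'

local feat = torch.load('data_feat_train_RGB_centerCrop_25f_sp1.t7')

-- print the loaded object to discover its field names and tensor
-- shapes before pointing the training scripts at the directory
print(feat)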

Train with RNN

We use the RNN library provided by Element-Research. Simply install it by:

$ luarocks install rnn

After downloading the feature vectors, please modify ./RNN/data-ucf101.lua to point to the directory where you put your feature vector files.

To start the training process, go to ./RNN and simply execute:

$ th main.lua -pastalogName 'model_RNN' -nGPU 1 -dataset 'ucf101' -split '1' -fcSize '{0}' -hiddenSize '{512}' -lstm -spatFeatDir '<path/to/feature/>' -tempFeatDir '<path/to/feature/>'

The training and testing losses are reported and saved into log files. The learning rate and best testing accuracy are reported each epoch whenever there is an update.
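For orientation, below is a minimal sketch of the kind of LSTM sequence classifier the rnn package expresses: nn.Sequencer unrolls an LSTM over the feature sequence, and the last hidden state feeds a linear classifier. The sizes and class count here are illustrative; this is not the exact TS-LSTM architecture from the paper.

-- rnn-sketch.lua: minimal LSTM sequence classifier with the rnn package
-- (illustrative sizes; not the exact TS-LSTM model)
require 'rnn'

local inputSize, hiddenSize, numClasses = 2048, 512, 101

local model = nn.Sequential()
model:add(nn.Sequencer(nn.FastLSTM(inputSize, hiddenSize))) -- LSTM at every time step
model:add(nn.Select(1, -1))                  -- keep only the last time step
model:add(nn.Linear(hiddenSize, numClasses))
model:add(nn.LogSoftMax())

-- input: seqLength x batchSize x inputSize
local seq = torch.randn(25, 8, inputSize)
print(model:forward(seq):size())             -- 8 x 101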

Train with Temporal-ConvNet

To start the training process, go to ./Temporal-ConvNet and simply execute:

$ th run.lua -o <output_folder_name> --dataset <dataset-name>

For more details and hyper-parameter tuning, please refer to the readme file in the folder ./Temporal-ConvNet/.

You also need to modify ./Temporal-ConvNet/data-2Stream.lua to point to the directory where you put your feature vector files.

The training and testing performance will be plotted, and the results will be saved into log files. The best testing accuracy will be reported each epoch if there is any update.
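Conceptually, an Inception-style temporal module runs convolutions with several kernel widths over the time axis in parallel and concatenates their outputs along the feature dimension. The sketch below shows that idea in plain nn; the widths and feature sizes are illustrative, not the actual Temporal-Inception configuration (see the readme in ./Temporal-ConvNet/ for that).

-- temporal-inception-sketch.lua: parallel temporal convolutions with
-- different kernel widths, concatenated over the feature dimension
-- (illustrative sizes; not the paper's exact configuration)
require 'nn'

local inputFrameSize, outputFrameSize = 2048, 64

local branches = nn.Concat(2) -- concatenate branch outputs over features
for _, kw in ipairs({3, 5, 7}) do
   local halfPad = (kw - 1) / 2
   local branch = nn.Sequential()
   branch:add(nn.Padding(1, -halfPad)) -- zero-pad the time axis in front...
   branch:add(nn.Padding(1, halfPad))  -- ...and in back, so lengths match
   branch:add(nn.TemporalConvolution(inputFrameSize, outputFrameSize, kw))
   branch:add(nn.ReLU())
   branches:add(branch)
end

-- input: nFrames x inputFrameSize; output: nFrames x (3 * outputFrameSize)
local seq = torch.randn(25, inputFrameSize)
print(branches:forward(seq):size()) -- 25 x 192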


Can I train with frame-level features?

To standardize the comparison, the above features are sampled uniformly across each video. If you would like to train with frame-level features extracted at 25 fps for all videos in UCF101, please refer to Temporal Augmentation using frame-level features with RNN.


Citation

@article{ma2019ts,
  title={{TS-LSTM} and {Temporal-Inception}: Exploiting Spatiotemporal Dynamics for Activity Recognition},
  author={Ma, Chih-Yao and Chen, Min-Hung and Kira, Zsolt and AlRegib, Ghassan},
  journal={Signal Processing: Image Communication},
  volume={71},
  pages={76--87},
  year={2019},
  publisher={Elsevier}
}

Acknowledgment

This work started as a project for the deep learning class at Georgia Tech in Spring 2016. We teamed up with Hao Yan and Casey Battaglino for that class project; they were a great help and provided valuable discussions along the way.

Please contact us if you have any questions.

Chih-Yao Ma at [email protected] or [LinkedIn]
Min-Hung Chen at [email protected] or [LinkedIn]

activity-recognition-with-cnn-and-rnn's People

Contributors

chihyaoma, cmhungsteve


activity-recognition-with-cnn-and-rnn's Issues

Optical_Flow_images

Thanks for sharing your code, nice job!
Could you please provide a link to the optical flow maps (images) for UCF101 and HMDB51? Computing the optical flow is so slow that I have to beg you for help. @chihyaoma @cmhungsteve

/CNN-Pred-Feat/ error

@chihyaoma @cmhungsteve
When I configure the Torch Video Decoder Library, an error appears. My steps to configure the environment are as follows:

$ sudo add-apt-repository ppa:mc3man/trusty-media
$ sudo apt-get update
$ sudo apt-get dist-upgrade
$ sudo apt-get install ffmpeg
$ luarocks install ffmpeg

$ sudo apt-get install -y libavformat-dev libavutil-dev libavcodec-dev libswscale-dev libfreetype6-dev

Then I go to torch-toolbox/Video-decoder/ and run make; an error appears, and I do not know how to solve it:

user@2017:/torch-toolbox-master/Video-decoder$ make
gcc -O3 -c -fpic -Wall -DDOVIDEOCAP -I. -I/home/user/torch/install/include -I/usr/include/freetype2 video_decoder.c
video_decoder.c:3065:30: error: array type has incomplete element type ‘struct luaL_reg’
static const struct luaL_reg video_decoder[] = {
^
video_decoder.c: In function ‘luaopen_libvideo_decoder’:
video_decoder.c:3094:2: warning: implicit declaration of function ‘luaL_register’ [-Wimplicit-function-declaration]
luaL_register(L, "libvideo_decoder", video_decoder);
^
In file included from video_decoder.c:13:0:
video_decoder.c: At top level:
/home/user/torch/install/include/luaT.h:41:12: warning: ‘luaL_typerror’ defined but not used [-Wunused-function]
static int luaL_typerror(lua_State *L, int narg, const char *tname)
^
video_decoder.c:3065:30: warning: ‘video_decoder’ defined but not used [-Wunused-variable]
static const struct luaL_reg video_decoder[] = {
^
make: *** [video_decoder.o] Error 1
user@2017:/torch-toolbox-master/Video-decoder$

Python

Hey,

Is there any chance of releasing this code in Python? I don't know anything about Lua.

Thanks.

Datasets

Good Afternoon.

I want to create my own dataset of human facial emotions, with my own classes.

Is it possible to use your code for that? If so, should I do feature extraction first?

Thanks in advance, and sorry, I'm still a newbie at this.

IDE

Could you tell me which IDE you use to debug Lua on Linux? Thanks.

the 25th frame of each video not used in segments

Hi,
Thanks a lot for your open-source code!
It seems that only 24 frames per video are used: there are three segments, and each segment contains only 8 frames. Is that correct?
Highly appreciate your time and help!

CNN-Pred-Feat/run_pred-feat_twoStreams.lua error

Around lines 401-404:

existTr = opt.save and paths.filep(namePredTr) and paths.filep(nameFeatTr[1]) and paths.filep(nameFeatTr[2])
Tr = {} -- output prediction & info
featTr = {}
if not existData then

I think existData should be changed to existTr, because there is no variable named existData in this context. Is that true?
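(If that reading is right, the fix would change only the guard line, assuming existData is indeed a typo for existTr:)

if not existTr then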

Trained model

Hi,

Could you please share the trained model file? I am using a CPU-only system, and training takes a lot of time.

RNN model

Hi!
I'm trying to implement the RNN model from your code on HMDB-51, but I think it's better to run your well-trained RNN model before I implement the whole RNN code myself, so that I can gain a deeper understanding of the structure of your RNN code. So... could you please share the well-trained RNN model file(s)?
Best regards and thanks!

top1 0

I ran main.lua in the /RNN folder with

th main.lua -pastalogName 'model_RNN' -nGPU 3 -dataset 'ucf101' -split '1' -fcSize '{0}' -hiddenSize '{512}' -lstm -spatFeatDir './FeatureMap/' -tempFeatDir './FeatureMap/'

and the top1 accuracy decreased from 100 to 0 and stayed at 0. Why?

Part of the log:
87.629 | Epoch: [2][1/9537] Time 0.161 Data 0.047 Err 3.4332 top1 0.000 top3 0.000
87.629 | Epoch: [2][65/9537] Time 0.066 Data 0.000 Err 3.4050 top1 0.000 top3 0.000
87.629 | Epoch: [2][129/9537] Time 0.067 Data 0.000 Err 3.3989 top1 0.000 top3 0.000
87.629 | Epoch: [2][193/9537] Time 0.064 Data 0.000 Err 3.3824 top1 0.000 top3 0.000
87.629 | Epoch: [2][257/9537] Time 0.068 Data 0.000 Err 3.4021 top1 0.000 top3 0.000
87.629 | Epoch: [2][321/9537] Time 0.067 Data 0.000 Err 3.3154 top1 1.562 top3 0.000
87.629 | Epoch: [2][385/9537] Time 0.065 Data 0.000 Err 3.3323 top1 0.000 top3 0.000
87.629 | Epoch: [2][449/9537] Time 0.065 Data 0.000 Err 3.3774 top1 0.000 top3 0.000
87.629 | Epoch: [2][513/9537] Time 0.065 Data 0.000 Err 3.2946 top1 0.000 top3 0.000
87.629 | Epoch: [2][577/9537] Time 0.066 Data 0.000 Err 3.3189 top1 0.000 top3 0.000
87.629 | Epoch: [2][641/9537] Time 0.065 Data 0.000 Err 3.3365 top1 0.000 top3 0.000
87.629 | Epoch: [2][705/9537] Time 0.067 Data 0.000 Err 3.3565 top1 1.562 top3 0.000
87.629 | Epoch: [2][769/9537] Time 0.067 Data 0.000 Err 3.3036 top1 0.000 top3 0.000
87.629 | Epoch: [2][833/9537] Time 0.064 Data 0.000 Err 3.2297 top1 0.000 top3 0.000
87.629 | Epoch: [2][897/9537] Time 0.066 Data 0.000 Err 3.2951 top1 1.562 top3 0.000
87.629 | Epoch: [2][961/9537] Time 0.115 Data 0.000 Err 3.2905 top1 0.000 top3 0.000
87.629 | Epoch: [2][1025/9537] Time 0.075 Data 0.000 Err 3.2996 top1 0.000 top3 0.000
87.629 | Epoch: [2][1089/9537] Time 0.095 Data 0.000 Err 3.2523 top1 0.000 top3 0.000
87.629 | Epoch: [2][1153/9537] Time 0.089 Data 0.000 Err 3.2662 top1 0.000 top3 0.000
87.629 | Epoch: [2][1217/9537] Time 0.087 Data 0.000 Err 3.2950 top1 0.000 top3 0.000
87.629 | Epoch: [2][1281/9537] Time 0.070 Data 0.000 Err 3.1993 top1 0.000 top3 0.000
87.629 | Epoch: [2][1345/9537] Time 0.070 Data 0.000 Err 3.2526 top1 0.000 top3 0.000
87.629 | Epoch: [2][1409/9537] Time 0.065 Data 0.000 Err 3.2543 top1 1.562 top3 0.000
87.629 | Epoch: [2][1473/9537] Time 0.066 Data 0.000 Err 3.2717 top1 0.000 top3 0.000
87.629 | Epoch: [2][1537/9537] Time 0.064 Data 0.000 Err 3.2546 top1 0.000 top3 0.000
87.629 | Epoch: [2][1601/9537] Time 0.066 Data 0.000 Err 3.1987 top1 0.000 top3 0.000
87.629 | Epoch: [2][1665/9537] Time 0.068 Data 0.000 Err 3.1955 top1 0.000 top3 0.000
87.629 | Epoch: [2][1729/9537] Time 0.068 Data 0.000 Err 3.2191 top1 0.000 top3 0.000
87.629 | Epoch: [2][1793/9537] Time 0.068 Data 0.000 Err 3.1774 top1 0.000 top3 0.000
87.629 | Epoch: [2][1857/9537] Time 0.089 Data 0.000 Err 3.1900 top1 0.000 top3 0.000
87.629 | Epoch: [2][1921/9537] Time 0.101 Data 0.000 Err 3.2152 top1 0.000 top3 0.000
87.629 | Epoch: [2][1985/9537] Time 0.077 Data 0.000 Err 3.1566 top1 0.000 top3 0.000
87.629 | Epoch: [2][2049/9537] Time 0.093 Data 0.000 Err 3.1597 top1 0.000 top3 0.000
87.629 | Epoch: [2][2113/9537] Time 0.092 Data 0.000 Err 3.1578 top1 0.000 top3 0.000
87.629 | Epoch: [2][2177/9537] Time 0.078 Data 0.000 Err 3.1873 top1 1.562 top3 0.000
87.629 | Epoch: [2][2241/9537] Time 0.076 Data 0.000 Err 3.1941 top1 1.562 top3 0.000

memory issue

Hi,
Thanks a lot for providing the open-source code!
May I ask how much memory is needed to run the example code, such as:
$ th main.lua -pastalogName 'model_RNN' -nGPU 1 -dataset 'ucf101' -split '1' -fcSize '{0}' -hiddenSize '{512}' -lstm -spatFeatDir '<path/to/feature/>' -tempFeatDir '<path/to/feature/>'

My desktop has an Nvidia 1070, but it runs out of memory when running this code.
Thanks a lot!

libvideo_decoder

Hello, is there a usable "libvideo_decoder" package, or how can I get it via luarocks? It is required by run_pred-feat_twoStreams.lua.
Thank you!

Real Time

Good afternoon.

Is there any possible way to run this model in real time?

I will wait for your response.

Thanks

Fine-tuned model "model_best.t7" is wrong.

Hi,
I got a new error. After spending almost two weeks fine-tuning both the flow and RGB models ("model_best.t7" in the folder CNN-GPUs), when I use the two models to generate the features (in CNN-Pred-Feat), this error appears:

inconsistent tensor size, expected tensor [1 x 25 x 101] and src [400 x 101] to have the same number of elements, but got 2525 and 40400 elements respectively at /home/gtune/torch/pkg/torch/lib/TH/generic/THTensorCopy.c:86

How can I correct the code?

Cannot generate features

@chihyaoma @cmhungsteve
Hi,
in the folder CNN-Pred-Feat, when I run the command to generate features, I get the error below:

split #: 1
...
==> Processing all the videos...
Current Class: 1. ApplyEyeMakeup
[mpeg4 @ 0x7f45937175a0] Invalid and inefficient vfw-avi packed B frames detected
[mpeg4 @ 0x7f4594342ac0] Invalid and inefficient vfw-avi packed B frames detected
/home/user/torch/install/bin/luajit: bad argument #2 to '?' (start index out of bound at /tmp/luarocks_torch-scm-1-2915/torch7/generic/Tensor.c:984)
stack traceback:
[C]: at 0x7f4636095bd0
[C]: in function '__index'
run_pred-feat_twoStreams.lua:679: in main chunk
[C]: in function 'dofile'
...tune/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00406670
...

The first problem is "[mpeg4 @ ...] Invalid and inefficient vfw-avi packed B frames detected"
The second problem is "/torch/install/bin/luajit: bad argument #2 to '?' (start index out of bound at /tmp/luarocks_torch-scm-1-2915/torch7/generic/Tensor.c:984)"

I don't know how to fix these problems...

Also, how should I prepare the dataset? Do I have to extract RGB and optical flow frames from the videos into JPG files?

invalid arguments: DoubleTensor FloatTensor number

@chihyaoma @cmhungsteve
When I execute: th run.lua -o /root/datasets/trainingData/TC --dataset UCF-101

I get the following result:
root@rnn-ThinkServer-TS50X:~/Activity-Recognition-with-CNN-and-RNN/Temporal-ConvNet# th run.lua -o /root/datasets/trainingData/TC --dataset UCF-101
==> processing options
==> switching to CUDA
==> using GPU #1
==> load data
dirFeature #: /root/datasets/Features/UCF-101/
split #: 1
==> load test data: data_feat_test_RGB_centerCrop_25f_sp1.t7
==> load test data: data_feat_test_FlowMap-TVL1-crop20_centerCrop_25f_sp1.t7
dirFeature #: /root/datasets/Features/UCF-101/
split #: 1
==> load test data: data_feat_test_RGB_centerCrop_25f_sp1.t7
==> load test data: data_feat_test_FlowMap-TVL1-crop20_centerCrop_25f_sp1.t7
/root/torch/install/bin/lua: /root/torch/install/share/lua/5.1/trepl/init.lua:389: invalid arguments: DoubleTensor FloatTensor number
expected arguments: [DoubleTensor] DoubleTensor DoubleTensor [index] | [DoubleTensor] {DoubleTensor+} [index]
stack traceback:
[C]: in function 'error'
/root/torch/install/share/lua/5.1/trepl/init.lua:389: in function 'require'
run.lua:66: in main chunk
[C]: in function 'dofile'
.../torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: ?

What should I do? I need some help.

Feature vectors

Hi,

I am not able to download the feature vectors from the links given on the page; I have permission issues accessing Dropbox files from my machine. Could you please share the files?

'-rho'

What does this parameter mean? Does it mean the total number of frames sampled from each video?

Trouble with luarocks while installing torch

This isn't really relevant to the code base, but I had trouble installing Torch and luarocks, which is a dependency of the code, so I thought I should put it here. Sometimes permission issues with Lua can prevent luarocks from working. The following link provides a good discussion of the issue, as well as a solution.

libvideo_decoder missing

Trying to execute th run_pred-feat_twoStreams.lua, but the package libvideo_decoder is missing. I installed ffmpeg using both luarocks and apt. I tried using luarocks to install libvideo_decoder, but it couldn't find the package. I also tried installing the video decoder from torch-toolbox, but it seems it only works with Lua 5.1 (it runs into compile issues when building against Lua 5.2).
How do you install the libvideo_decoder package?

Error in checkpoints when trying to load a saved model

I get the error "unknown Torch class nn.Sequencer" when trying to create a demo file that returns test results on the data for a particular trained epoch/model. It originates from the following line in checkpoints.lua:
local latest = torch.load(latestPath)

running optical_flow.cpp error

When I built optical_flow.cpp with Eclipse, two errors came out, on lines 369 and 373:
cvtColor(cv::Mat&, cv::Mat&, ) is ambiguous

I just don't know how to solve this.
