hbilen / dynamic-image-nets
Dynamic Image Networks for Action Recognition
From the paper I can see that the model only reports the average accuracy over all classes when evaluating the whole model. How can I get the accuracy of a single class, such as the Biking class? Thanks for your help!
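For reference, per-class accuracy can be computed directly from the predicted and ground-truth labels of the test clips. This is a generic Python sketch (the class names and data layout are illustrative assumptions, not taken from the repository's evaluation code):

```python
from collections import defaultdict

def per_class_accuracy(y_true, y_pred):
    """Return {class_label: accuracy} from parallel lists of
    ground-truth and predicted labels."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return {c: correct[c] / total[c] for c in total}

# e.g. look up only the "Biking" class:
acc = per_class_accuracy(["Biking", "Biking", "Diving"],
                         ["Biking", "Diving", "Diving"])
# acc["Biking"] == 0.5, acc["Diving"] == 1.0
```

The same idea works on the model's raw score matrix: take the argmax per clip first, then group by ground-truth class.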
How can I see the dynamic image result after running step 5?
How did you extract the optical flow and DOF?
I am a newcomer to programming. After the fifth step, I want to compute approximate dynamic images and obtain the dynamic image, but I couldn't find the relevant code and don't know what to do next. Thank you for your help.
Hi, thanks for the code.
Say I have one video from UCF101 and I extract the optical flow images from it. Can I use the trained model (Deploy\resnext50-of-arpool-split1.mat) and run cnn_dicnn_of() to predict the predefined activity within this video? Thank you.
Hi, I found a strange problem when evaluating your code on UCF101. The accuracies are shown below.
Accuracy on split 1: 0.659001
Accuracy on split 2: 0.996249
Accuracy on split 3: 0.997294
Do you have any idea about the huge difference?
Love the work; I am just having difficulty understanding the architecture of the SI + DI model.
From what I see in the architecture of the resnext.mat model, it uses a temporal max pooling layer just before the softmax layer. The inputs to the temporal max pooling layer are the merged conv7 features and Video2. I am assuming the merged conv7 features come from running the dynamic image through the ResNext model. Where does Video2 come from?
Are we supposed to pass the whole video or just a single frame from the video clip?
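For intuition, temporal max pooling is just an element-wise maximum taken over the per-frame feature vectors, so it accepts a whole stack of frames and emits one vector. This toy Python sketch shows the operation itself (it is not the MatConvNet layer, and says nothing about what Video2 is wired to):

```python
def temporal_max_pool(frame_features):
    """Element-wise max over time: frame_features is a list of
    equal-length feature vectors, one per frame."""
    return [max(channel) for channel in zip(*frame_features)]

# two frames, three feature channels -> one pooled vector
pooled = temporal_max_pool([[0.1, 0.9, 0.2],
                            [0.4, 0.3, 0.8]])
# pooled == [0.4, 0.9, 0.8]
```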
How can I reproduce the results in the paper?
I could train the dynamic image net. However, I could not find a prediction or evaluation script in the code. Could you please tell me how to do the evaluation, or upload the evaluation code?
function di = compute_approximate_dynamic_images(images)
% Computes approximate dynamic images for a given array of images
% IMAGES must be a tensor of H x W x D x N dimensionality or
% cell of image names
I want to use compute_approximate_dynamic_images, but I do not know how to choose N for a video.
I don't understand the SI (RGB image). What is the difference between the DI (dynamic RGB image) and the SI (RGB image)?
Hello, I am very interested in your work. The link you provide to download the CNN models for the UCF101 dataset is invalid; can you provide the link again? Thank you very much!
Hi, I have two questions to ask you, thanks.
Please show a sample file list in README.md, like the one below:
data/UCF101/ucfTrainTestlist/
├── classIndFixed.txt
├── classInd.txt
├── testlist01.txt
├── testlist02.txt
├── testlist03.txt
├── trainlist01.txt
├── trainlist02.txt
└── trainlist03.txt
Convert the videos to frames, resize them to 256x256, and store them in such a directory structure.
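One common way to extract and resize frames (an example, not part of this repository) is ffmpeg. This sketch only builds the command; the file names are hypothetical:

```python
import os

def frame_extract_cmd(video_path, out_dir):
    """Build an ffmpeg command that decodes a video into 256x256
    JPEG frames named 000001.jpg, 000002.jpg, ... in out_dir."""
    return ["ffmpeg", "-i", video_path,
            "-vf", "scale=256:256",
            os.path.join(out_dir, "%06d.jpg")]

cmd = frame_extract_cmd("v_Biking_g01_c01.avi", "frames/v_Biking_g01_c01")
# run with subprocess.run(cmd, check=True) once ffmpeg is installed
```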
Is it possible to train on the CPU?
Hey, recently I have been reading your paper about dynamic image nets, and there is one question I don't know how to resolve: it seems that the dynamic images are obtained offline, and I don't know how to obtain them.
Hello,
I wanted to re-implement the function compute_approximate_dynamic_images in Python and compare the results, but I can't find the function visualize_approximate_dynamic_images in the MATLAB code...
I tried to perform L2 normalization on the output of compute_approximate_dynamic_images and multiply the values by 255 to get an image. Is this the correct way of visualizing a dynamic image?
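I don't know the repository's intended visualization either, but a common alternative to L2 normalization is simple min–max scaling of the raw output into [0, 255] per image. A Python sketch on a flat list of values (my assumption, not the authors' method):

```python
def to_uint8_range(values):
    """Min-max scale arbitrary floats into [0, 255] for display."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant image: map to mid-gray
        return [128 for _ in values]
    return [round(255 * (v - lo) / (hi - lo)) for v in values]

pix = to_uint8_range([-2.0, 0.0, 2.0])
# pix == [0, 128, 255]
```

Applied per channel of the rank-pooled output, this gives a displayable image regardless of the (signed, unbounded) scale of the pooling weights.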
While running cnn_dicnn, I get the following error, but I have no idea what causes it:
cnn_dicnn
train: epoch 01: 1/1193:Cell contents reference from a non-cell array object.
Error in cnn_video_get_batch (line 23)
fetch = numel(images) >= 1 && ischar(images{1}) ;
Error in cnn_dicnn>getDagNNBatch (line 177)
im = cnn_video_get_batch(images, VideoId1, opts, ...
Error in cnn_dicnn>@(x,y)getDagNNBatch(bopts,useGpu,x,y) (line 102)
fn = @(x,y) getDagNNBatch(bopts,useGpu,x,y) ;
Error in cnn_train_dag>processEpoch (line 201)
inputs = params.getBatch(params.imdb, batch) ;
Error in cnn_train_dag (line 87)
[net, state] = processEpoch(net, state, params, 'train') ;
Error in cnn_dicnn (line 73)
[net, info] = cnn_train_dag(net, imdb, getBatchFn(opts, net.meta), ...
Thank you for your help!
Hello, I want to know how the accuracy is computed when the input modality is MDI. For SDI, a video has one recognition result, while with MDI it has more than one; the paper does not seem to explain how the MDI accuracy is computed. Thanks for your reply.
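One common convention (an assumption on my part, not something stated in the paper) is to average the classifier scores over all dynamic images of a video and take the argmax as the single video-level prediction, then compute accuracy over videos as usual:

```python
def video_prediction(scores_per_di):
    """scores_per_di: one list of class scores per dynamic image of a
    video. Average the scores across dynamic images, then argmax."""
    n = len(scores_per_di)
    avg = [sum(col) / n for col in zip(*scores_per_di)]
    return avg.index(max(avg))

# two dynamic images, three classes
pred = video_prediction([[0.2, 0.8, 0.0],
                         [0.6, 0.3, 0.1]])
# avg == [0.4, 0.55, 0.05] -> predicted class index 1
```

Max-pooling the scores instead of averaging is the other common choice; which one the authors used would need to be confirmed from their evaluation code.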
I am trying to train my own data using your model. My dataset contains 29398 folders, each of which has 11 frames. There are two classes: normal and abnormal. I wrote a custom sepup_data.m for it. When I run main_train.m, some errors occurred, as follows. How can I solve them? Will I need to change the architecture parameters to fit this dataset? Thanks for your time!
How can I use other DL frameworks, such as TensorFlow, to achieve this? Is it possible?
I have visualized the dagnn by this, but I cannot re-implement the last few layers, such as alPooling, ultiClass, kPooling, and so on. Do you have any idea about this?