mit-han-lab / temporal-shift-module

[ICCV 2019] TSM: Temporal Shift Module for Efficient Video Understanding

Home Page: https://arxiv.org/abs/1811.08383

License: MIT License

Python 89.57% Shell 1.63% Makefile 0.43% C++ 8.38%
acceleration low-latency temporal-modeling video-understanding efficient-model nvidia-jetson-nano tsm

temporal-shift-module's Issues

Some doubts about the performance

Hi!
Thanks for your interesting work and the source code.
I find that the performance of TSM on Sthv1 with 8 frames and a ResNet-50 backbone under the efficient test setting is much better than reported in your paper. Have you made any improvements over the original paper?
Also, could you share the training script for TSM with 8 frames and a ResNet-50 backbone on Sthv1 that reproduces the Top-1 accuracy listed on GitHub?

Thanks very much

About 3D Network?

Hello! Thanks for your excellent work. I find there are very few 3D works on the sthv1/v2 datasets.
I checked the leaderboard of sthv2: the top methods are nearly all 2D, and the performance of 3D methods lags far behind the 2D works. Yet 3D convolution has been shown to be better suited to capturing spatio-temporal information, and the top accuracies on UCF/HMDB/Kinetics all come from 3D methods.
So what is your opinion on why there are fewer 3D works on sthv1/v2 and why their accuracy is lower?
Looking forward to your reply. Thanks.

A question about optimizer policy

Thank you for your great work and for kindly sharing the code!
I notice that there is a complex optimizer policy in the TSN model. A part of that is like:
{'params': first_conv_weight, 'lr_mult': 5 if self.modality == 'Flow' else 1, 'decay_mult': 1,
'name': "first_conv_weight"},
However, I suppose the built-in PyTorch SGD optimizer cannot interpret keys such as 'lr_mult' and 'decay_mult', which come from the Caffe framework. Since nothing overrides the 'step' method of the original SGD class, I suspect that this complex optimizer policy has no effect.
Please correct me if I have misunderstood this part.
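
For reference, such multipliers are usually consumed outside the optimizer: SGD never reads 'lr_mult' or 'decay_mult', but it keeps them in optimizer.param_groups, where a manual learning-rate schedule can read them each epoch. A minimal sketch of that pattern, assuming param groups shaped like the snippet above (an illustration, not necessarily the repository's exact code):

    import numpy as np

    def adjust_learning_rate(optimizer, epoch, lr_steps, base_lr, base_wd):
        # step decay: multiply by 0.1 each time a milestone in lr_steps is passed
        decay = 0.1 ** sum(epoch >= np.array(lr_steps))
        for param_group in optimizer.param_groups:
            # the per-group 'lr_mult' / 'decay_mult' values are applied here,
            # not inside SGD.step()
            param_group['lr'] = base_lr * decay * param_group['lr_mult']
            param_group['weight_decay'] = base_wd * param_group['decay_mult']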

how to train mobilenetv2 model

Thank you very much for your codebase. I trained on my own data with ResNet-50 successfully, but when I train with MobileNetV2 the accuracy is very low.

python main.py ucf101 RGB --arch mobilenetv2 --num_segments 8 --gd 20 --lr 0.001 --lr_steps 10 20 --epochs 25 --batch-size 2 -j 16 --dropout 0.8 --consensus_type=avg --eval-freq=1 --shift --shift_div=8 --shift_place=blockres

Freezing BatchNorm2D except the first one.
Epoch: [24][0/104], lr: 0.00001 Time 15.333 (15.333) Data 15.214 (15.214) Loss 0.6946 (0.6946) Prec@1 50.000 (50.000) Prec@5 100.000 (100.000)
Epoch: [24][20/104], lr: 0.00001 Time 0.085 (0.815) Data 0.000 (0.725) Loss 0.6946 (0.6896) Prec@1 50.000 (54.762) Prec@5 100.000 (100.000)
Epoch: [24][40/104], lr: 0.00001 Time 0.084 (0.459) Data 0.000 (0.371) Loss 0.6947 (0.6907) Prec@1 50.000 (53.659) Prec@5 100.000 (100.000)
Epoch: [24][60/104], lr: 0.00001 Time 0.086 (0.336) Data 0.000 (0.250) Loss 0.6946 (0.6894) Prec@1 50.000 (54.918) Prec@5 100.000 (100.000)
Epoch: [24][80/104], lr: 0.00001 Time 0.082 (0.274) Data 0.000 (0.188) Loss 0.6391 (0.6893) Prec@1 100.000 (54.938) Prec@5 100.000 (100.000)
Epoch: [24][100/104], lr: 0.00001 Time 0.084 (0.236) Data 0.000 (0.151) Loss 0.6946 (0.6926) Prec@1 50.000 (51.980) Prec@5 100.000 (100.000)
Test: [0/12] Time 2.424 (2.424) Loss 0.7487 (0.7487) Prec@1 0.000 (0.000) Prec@5 100.000 (100.000)
Testing Results: Prec@1 52.174 Prec@5 100.000 Loss 0.69226
Best Prec@1: 52.174

why?

ResNet50 Pretrained Models

Hi, Thank you for your TSM code!

I'm wondering whether there is code for reproducing the ResNet-50 pretrained models (not just the weights).

Pretrain model on jester dataset ?

Hi! Thank you for sharing such great work! I'm very interested in your paper. Could you provide the pretrained model or the training script for the Jester dataset?

For Uni-directional TSM for online video detection

Thank you for the wonderful work.

For uni-directional TSM for online video detection, what network backbone is used: ResNet-101 or MobileNetV2?
Also, can you elaborate on the lines below from the paper, i.e., how the training and validation are carried out?
I am trying to reproduce the same result.

We show that we can significantly improve the performance of video detection by simply modifying the backbone with online TSM, without changing the detection module design or using optical flow features

For TSM experiments, we inserted uni-directional TSM to the backbone, while keeping other settings the same.

And if possible please release the online training script.

[solved problem] for pretrained something-v2 models, it should be "n_round = 2"

Thanks for your great work!

I tried to run

python test_models.py somethingv2 \
   --weights=pretrained/TSM_somethingv2_RGB_resnet101_shift8_blockres_avg_segment8_e45.pth \
   --test_segments=8 --batch_size=24 -j 12 --full_res --test_crops=3 --twice_sample

and encountered the following error message

RuntimeError: Error(s) in loading state_dict for TSN:
        Missing key(s) in state_dict: "base_model.layer1.1.conv1.net.weight", "base_model.layer2.1.conv1.net.weight", "base_model.layer2.3.conv1.net.weight", "base_model.layer3.1.conv1.net.weight", "base_model.layer3.3.conv1.net.weight", "base_model.layer3.5.conv1.net.weight", "base_model.layer3.7.conv1.net.weight", "base_model.layer3.9.conv1.net.weight", "base_model.layer3.11.conv1.net.weight", "base_model.layer3.13.conv1.net.weight", "base_model.layer3.15.conv1.net.weight", "base_model.layer3.17.conv1.net.weight", "base_model.layer3.19.conv1.net.weight", "base_model.layer3.21.conv1.net.weight", "base_model.layer4.1.conv1.net.weight".
        Unexpected key(s) in state_dict: "base_model.layer1.1.conv1.weight", "base_model.layer2.1.conv1.weight", "base_model.layer2.3.conv1.weight", "base_model.layer3.1.conv1.weight", "base_model.layer3.3.conv1.weight", "base_model.layer3.5.conv1.weight", "base_model.layer3.7.conv1.weight", "base_model.layer3.9.conv1.weight", "base_model.layer3.11.conv1.weight", "base_model.layer3.13.conv1.weight", "base_model.layer3.15.conv1.weight", "base_model.layer3.17.conv1.weight", "base_model.layer3.19.conv1.weight", "base_model.layer3.21.conv1.weight", "base_model.layer4.1.conv1.weight".

I solved this problem by changing the line n_round = 1 in ops/temporal_shift.py to n_round = 2.
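
For context, n_round controls how often the temporal shift is inserted into the residual blocks; the released something-v2 ResNet-101 checkpoints shift only every second block, so loading them with n_round = 1 produces the key mismatch above. A simplified sketch of the idea (illustrative, not the repository's exact code):

    import torch.nn as nn

    def insert_shift(stage, n_round, make_shift):
        # wrap conv1 of every n_round-th residual block with a temporal shift;
        # with n_round = 2 the skipped blocks keep plain "convN.weight" keys,
        # matching the pretrained state dict
        blocks = list(stage.children())
        for i, block in enumerate(blocks):
            if i % n_round == 0:
                block.conv1 = make_shift(block.conv1)
        return nn.Sequential(*blocks)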

Segmentation fault in the online_demo code

Hi, thanks for sharing this work

I'm getting a segmentation fault when running the online_demo code. Here is the error:

UserWarning: The use of the transforms.Scale transform is deprecated, please use transforms.Resize instead.
  warnings.warn("The use of the transforms.Scale transform is deprecated, " +
Build Executor...
Segmentation fault (core dumped)

I found that the program crashes at line 35 of main.py (shown below). I am currently using LLVM 4.0.0 on Ubuntu 16.04.

relay_module, params = tvm.relay.frontend.from_onnx(onnx_model, shape=input_shapes)

Can anyone replicate this problem?

Thanks for your help.

The accuracy of TSM-NL is only 57.85%

When I run this command:

# test NL TSM using non-local testing protocol
python test_models.py kinetics \
    --weights=pretrained/TSM_kinetics_RGB_resnet50_shift8_blockres_avg_segment8_e100_dense_nl.pth \
    --test_segments=8 --test_crops=3 \
    --batch_size=8 --dense_sample --full_res

I got only 57.85% Overall Prec@1. Looking forward to your reply.

Training script for online TSM

Hi!
Thanks for the impressive work with publicly accessible source code here :)
I am trying to train online TSM for another application with different datasets and a few adjustments. The repo currently only has the offline version of the training script. Would it be possible to also provide the training script for online TSM?
Thank you very much!

Segmentation fault when running demo on ubuntu

I think online_demo is meant for the Jetson Nano. How can I run it on my laptop? I installed all the packages on my laptop and get the following error.

Open camera...
<VideoCapture 0x7f4e749c3270>
Build transformer...
/media/mustafa/ubuntu_backup/anaconda3/envs/video_action/lib/python3.7/site-packages/torchvision/transforms/transforms.py:210: UserWarning: The use of the transforms.Scale transform is deprecated, please use transforms.Resize instead.
  warnings.warn("The use of the transforms.Scale transform is deprecated, " +
Build Executor...
/media/mustafa/ubuntu_backup/Projects/video_action/temporal-shift-module/online_demo/mobilenet_v2_tsm.py:95: TracerWarning: Converting a tensor to a Python index might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  x1, x2 = x[:, : c // 8], x[:, c // 8:]
Segmentation fault

Running into the following error when trying to run in parallel on two GPUs

I just ran the training script
"python3 main.py kinetics RGB --arch mobilenetv2 --num_segments 8 --gd 20 --lr 0.02 --wd 1e-4 --lr_steps 20 40 --epochs 50 --batch-size 128 -j 16 --dropout 0.5 --consensus_type=avg --eval-freq=1 --shift --shift_div=8 --shift_place=blockres --npb --gpus 1"

But I ran into the error below.

File "main.py", line 249, in train
output = model(input_var)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/data_parallel.py", line 146, in forward
"them on device: {}".format(self.src_device_obj, t.device))
RuntimeError: module must have its parameters and buffers on device cuda:1 (device_ids[0]) but found one of them on device: cuda:0

The loss, model, input, and target are all on the GPU. What are the expected settings? Please let us know.
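
For what it's worth, this error usually means DataParallel's device_ids[0] (here cuda:1) does not match the device the parameters actually live on (cuda:0 after a bare .cuda()). Two common workarounds, sketched under the assumption that --gpus 1 ends up as device_ids=[1] (illustrative, not the repository's exact code):

    import torch

    gpus = [1]  # e.g. parsed from --gpus 1
    # Option 1: move the model (and the input batches) to device_ids[0] explicitly
    model = torch.nn.DataParallel(model, device_ids=gpus).cuda(gpus[0])
    # Option 2: hide the unwanted GPU instead and let device_ids default to [0, 1, ...],
    # e.g. run with CUDA_VISIBLE_DEVICES=0,1 and omit --gpus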

questions on online video object detection

Congratulations on the great work!

As noted in the supplementary section: "... we inserted uni-directional TSM to the backbone, while keeping other settings the same. We used the official training code of [60] to conduct the experiments".

May I ask a few questions on online video object detection:

  1. How many frames are used during training? 21 frames the same as FGFA?
  2. What is your learning rate policy? and optimizer? the same as FGFA?

How to load model trained by myself to test_models.py

I used this command to train the TSM model:

# You should get TSM_kinetics_RGB_resnet50_shift8_blockres_avg_segment8_e50.pth
python main.py kinetics RGB \
     --arch resnet50 --num_segments 8 \
     --gd 20 --lr 0.02 --wd 1e-4 --lr_steps 20 40 --epochs 50 \
     --batch-size 128 -j 16 --dropout 0.5 --consensus_type=avg --eval-freq=1 \
     --shift --shift_div=8 --shift_place=blockres --npb

I got a ckpt.best.pth.tar and a ckpt.pth.tar, which appear to contain the model parameters along with extra information, whereas test_models.py only needs the parameters. I tried to save just the model parameters from ckpt.pth.tar and removed the following lines in test_models.py:

    # base_dict = {('base_model.' + k).replace('base_model.fc', 'new_fc'): v for k, v in list(checkpoint.items())}
    base_dict = {'.'.join(k.split('.')[1:]): v for k, v in list(checkpoint.items())}
    replace_dict = {'base_model.classifier.weight': 'new_fc.weight',
                    'base_model.classifier.bias': 'new_fc.bias',
                    }
    for k, v in replace_dict.items():
        if k in base_dict:
            base_dict[v] = base_dict.pop(k)

    net.load_state_dict(base_dict)

However, I got very low accuracy. Please tell me how to load the parameters correctly. Thanks.
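
In case it helps, the checkpoints written by main.py are dictionaries (epoch, best accuracy, optimizer state, ...) whose weights sit under the 'state_dict' key with a DataParallel 'module.' prefix. A minimal sketch of loading one into the TSN model built by test_models.py, assuming that layout:

    import torch

    checkpoint = torch.load('ckpt.best.pth.tar', map_location='cpu')
    state_dict = checkpoint['state_dict']  # unwrap the dict saved by main.py
    # strip the DataParallel prefix so the keys match the bare TSN module
    state_dict = {k.replace('module.', '', 1): v for k, v in state_dict.items()}
    net.load_state_dict(state_dict)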

Which datasets use the model pretrained on Kinetics?

Hi, thanks for the code release. In your first version of Arxiv paper,

We then fine-tuned the model to other target datasets like Something- Something [12], UCF101 [34], and HMDB51 [22]

In the most current version

For most of the datasets, the model is fine-tuned from ImageNet pre-trained weights; while HMDB-51 [26] and UCF-101 [40] are too small and prone to over-fitting [48], we followed the common practice [48, 49] to fine-tune from Kinetics [25] pre-trained weights and freeze the Batch Normalization [22] layers.

Which datasets are fine-tuned from the Kinetics pre-trained model to obtain the scores reported in the paper: Jester, UCF101, and HMDB? Are the training parameters for Jester and HMDB the same as for UCF101?

Thanks again.

Accuracy in the somethingv1 dataset

Can the hyperparameter settings below reproduce the 47.3% accuracy reported in the paper on the something-v1 dataset with num_segments=8? 25 epochs isn't enough, is it? After 25 epochs I only reach 45.98%.

python main.py something RGB \
     --arch resnet50 --num_segments 8 \
     --gd 20 --lr 0.001 --lr_steps 10 20 --epochs 25 \
     --batch-size 1 -j 16 --dropout 0.5 --consensus_type=avg --eval-freq=1 \
     --shift --shift_div=8 --shift_place=blockres \
     --tune_from=pretrained/TSM_kinetics_RGB_resnet50_shift8_blockres_avg_segment8_e50.pth

how to process the dataset ?

Hi,
I don't have the Kinetics dataset and just want to use my own dataset. How should I modify the code in
vid2img_kinetics?
Or could you describe the usage of the script and the expected Kinetics-400 configuration?
Thanks.
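
If it helps, the training code expects one directory of extracted JPEG frames per video (img_00001.jpg, img_00002.jpg, ...) plus label files listing those directories. A hedged sketch of extracting frames for your own videos with ffmpeg (a hypothetical helper, not the vid2img_kinetics script itself):

    import os
    import subprocess

    def extract_frames(video_path, out_dir, fps=None):
        # writes img_00001.jpg, img_00002.jpg, ... into out_dir
        os.makedirs(out_dir, exist_ok=True)
        cmd = ['ffmpeg', '-i', video_path]
        if fps is not None:
            cmd += ['-vf', 'fps=%d' % fps]
        cmd += [os.path.join(out_dir, 'img_%05d.jpg')]
        subprocess.run(cmd, check=True)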

run the online_demo by using my trained model

I ran into trouble when running the online_demo with my own trained model.
My trained model has only 4 classes, but an unexpected error occurred. The error message is as follows.

File "main.py", line 319, in main
    cv2.putText(label, 'Prediction: ' + catigories[idx],

IndexError: list index out of range

I have already modified the catigories list to contain my 4 classes.

It seems like a simple bug, but I don't have a clue what is causing it.
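
One possible cause (just a guess): the prediction index comes from the network's output layer, so if the exported model still has the original number of output classes (e.g. 27 for the released Jester demo model), it can emit indices beyond a 4-entry catigories list. A hypothetical sanity check, assuming 'feat' is the executor output used to compute idx_:

    num_model_classes = feat.shape[-1]
    assert num_model_classes == len(catigories), (
        'model outputs %d classes but only %d labels are listed'
        % (num_model_classes, len(catigories)))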

Training speed of the optical flow model

Hi, Author.
I set num_segments=16 and batch-size=32 and trained the optical flow model on the Kinetics dataset. The model converges, but training is very slow. Did you run into the same issue, and how can it be solved? Looking forward to your reply, thank you very much!

Use my own data

Thanks a lot for the author's code; the results are amazing. But I have a problem and would appreciate some help: I have implemented real-time pre-processing of the video frames captured by a webcam. How do I then use this model for action recognition?
Can someone give me some advice? Thank you very much!

How long did TSM take to train from scratch on Kinetics?

Hey Hi,

Thank you for your work.
I am trying to train the online version of TSM on Kinetics with ResNet-50; it has been two days and it has not yet finished two epochs.

How long did it take to train the online TSM network from scratch, for both ResNet-50 and MobileNetV2? I just want to make sure I am on the right track.

Online Demo error

I installed everything on a Jetson Nano with the Jetson SD card image r32.2.
When launching online_demo/main.py with Python 3, the following error is raised:
CUDAError: Check failed: ret == 0 (-1 vs. 0) : cuModuleLoadData(&(module_[device_id]), data_.c_str()) failed with error: CUDA_ERROR_INVALID_PTX

CUDA is in my PATH.

Thanks for helping

Note that:
1. When building TVM, no /nnvm/python directory is generated.
2. I first tried to install on the latest SD card image, r32.3.1, and did not succeed.
=> On which version of the Jetson Nano SD card image did you get it working?

IndexError: list index out of range???

Hi, thanks for sharing this work

I'm hitting an error when running the online_demo code. Here is the error:

Traceback (most recent call last):

File "main.py", line 349, in
main()

File "main.py", line 323, in main
idx, history = process_output(idx_, history)

File "main.py", line 249, in process_output
if not (history[-1] == history[-2]): # and history[-2] == history[-3]):

IndexError: list index out of range
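
A likely cause is that the smoothing step indexes history[-2] before two predictions have been collected. A hypothetical guarded version of the smoothing logic (the repository's actual function may differ):

    def process_output(idx_, history, max_hist_len=20):
        # only smooth once at least two past predictions exist
        if history and idx_ != history[-1]:
            if len(history) >= 2 and history[-1] != history[-2]:
                idx_ = history[-1]  # suppress a one-frame flicker
        history.append(idx_)
        history = history[-max_hist_len:]
        return history[-1], history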

Build Executor... taking time

I have a 1060 Ti graphics card, 16 GB of RAM, and an i7 processor, but the 'Build Executor...' step takes more than 5 minutes and I don't know why. It also keeps one CPU core at 100% while using about 2 GB of RAM and only around 500 MB of GPU memory.

Open camera...
<VideoCapture 0x7fef3986be10>
Build transformer...
/media/mustafa/ubuntu_backup/anaconda3/envs/video_action/lib/python3.7/site-packages/torchvision/transforms/transforms.py:210: UserWarning: The use of the transforms.Scale transform is deprecated, please use transforms.Resize instead.
  warnings.warn("The use of the transforms.Scale transform is deprecated, " +
Build Executor...

Can you share the copy of the Kinetics-400 dataset that you used for training?

Hi,

Thank you for your work. I was trying to use the TSM module and verify the reported accuracy, but test_models.py expects a val_folder.txt and a train_folder.txt (basically the train and validation file lists).
I tried to download the Kinetics-400 dataset (with the official ActivityNet downloader script), but the recent version has many expired/broken YouTube links. If possible, could you please give access to the Kinetics dataset copy that you used for training?

The model performance of train-from-scratch models

Hi,

Thanks for your amazing work!

I'm new to video analysis, and I'm wondering about the model performance if you do not load ImageNet pretrained weights. And what if you load weights pretrained on another task, e.g., detection on MS-COCO?

I did not find this reported in your paper or code. Thanks for your help!

Legacy autograd function with non-static forward method is deprecated and will be removed in 1.3. Please use new-style autograd function with static forward method

When running the repo, it keeps emitting the following UserWarning:
/pytorch/torch/csrc/autograd/python_function.cpp:638: UserWarning: Legacy autograd function with non-static forward method is deprecated and will be removed in 1.3. Please use new-style autograd function with static forward method. (Example: https://pytorch.org/docs/stable/autograd.html#torch.autograd.Function)

os:win7
pytorch:1.2
python:3.5
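
The usual fix for this warning is to define forward/backward as static methods on the torch.autograd.Function subclass and call it through .apply() instead of instantiating it. A hedged sketch of that new-style pattern applied to a TSM-like temporal shift (illustrative; it may not be the exact place the warning originates in this repo):

    import torch

    class ShiftFunction(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x, fold):
            # x: (N, T, C, H, W); shift the first fold channels forward in time,
            # the next fold backward, and leave the rest untouched
            ctx.fold = fold
            out = torch.zeros_like(x)
            out[:, :-1, :fold] = x[:, 1:, :fold]
            out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]
            out[:, :, 2 * fold:] = x[:, :, 2 * fold:]
            return out

        @staticmethod
        def backward(ctx, grad_output):
            fold = ctx.fold
            grad = torch.zeros_like(grad_output)
            grad[:, 1:, :fold] = grad_output[:, :-1, :fold]
            grad[:, :-1, fold:2 * fold] = grad_output[:, 1:, fold:2 * fold]
            grad[:, :, 2 * fold:] = grad_output[:, :, 2 * fold:]
            return grad, None

    # call via .apply, never via an instance:
    # shifted = ShiftFunction.apply(x, fold)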

Script for training TSM on something-something-v2

Hi Ji, thank you for publishing the work.

I want to double-check the training parameters for TSM on something-v2, which should achieve at least 58.8, the performance I got when testing with your released weights (tested with a single crop and a single clip).

According to your paper, the training parameters for the something-something-v2 dataset are: 50 training epochs, initial learning rate 0.01 (decays by 0.1 at epoch 20&40), weight decay 1e-4, batch size 64, and dropout 0.5. And the model is fine-tuned from ImageNet pre-trained weights.

However, the script in the git repository indicates that the initial learning rate is 0.001, the weight decay is 5e-4, and the model is tuned from Kinetics pre-trained weights.

Due to this disparity, I am confused about which parameters to use to reproduce the number. Could you provide the exact parameters for training TSM on something-v2?

Thank you.

Question about finetune on UCF101

I downloaded the pretrained model TSM_kinetics_RGB_resnet50_shift8_blockres_avg_segment8_e50.pth
and fine-tuned it on UCF101 split 1 using the command below:
    python main.py ucf101 RGB \
         --arch resnet50 --num_segments 8 \
         --gd 20 --lr 0.001 --lr_steps 10 20 --epochs 25 \
         --batch-size 64 -j 16 --dropout 0.8 --consensus_type=avg --eval-freq=1 \
         --shift --shift_div=8 --shift_place=blockres \
         --tune_from=pretrained/TSM_kinetics_RGB_resnet50_shift8_blockres_avg_segment8_e50.pth
However, the official UCF101 dataset doesn't provide a validation set, so I split UCF101 split 1 into 9:1, 9 parts for training and 1 for validation.

After training, I tested the model on UCF101 split 1 using the command below:
    python test_models.py ucf101 \
         --weights=checkpoint/TSM_ucf101_RGB_resnet50_shift8_blockres_avg_segment8_e25/ckpt.best.pth.tar \
         --test_segments=8 --batch_size=72 -j 24 --test_crops=3 --twice_sample --full_res

I only get 93.4% top-1 accuracy. I would like to know what I did wrong and how I can reproduce the 95.9% top-1 accuracy reported in the paper.
I really appreciate your reply, thank you very much!

The result in the arXiv v1 version is inconsistent with the ICCV19 version

Thanks for your good work!

I have followed TSM since it was first submitted to arXiv, and I find that the older version (https://arxiv.org/pdf/1811.08383v1.pdf), Table 2, reports
TSM ResNet50 16 65G 24.3M 44.8 74.5
while your ICCV paper, under the same setting, reports
TSM ResNet50 16 65G 24.3M 47.2 77.1

Given that there is no difference in performance between the Kinetics pretrained models in these two versions, is the reason different hyperparameters or more sufficient training on the data?
Looking forward to your reply! :)

Training MobilenetV2 from scratch on Kinetics

Hi Ji,

Thanks for your novel work. I wonder, have you tried training MobileNetV2 from scratch (without any pre-trained weights) on Kinetics or UCF-101? Could you share the configuration for this setting, e.g., learning rate and batch size?

Thanks!

set params when training the flow model

I'm trying to reproduce the two-stream results of TSM on Something v1, but the performance of my flow model is far below it (using segment-based sampling).

I understand the 10-channel stacked optical flow (TV-L1) input and the 5x learning rate for the first conv layer.

Are there any differences in parameter settings between the RGB and the flow model (e.g., epochs, learning rate, ...)?

how to test on a movie? mp4 or avi

Hi,
I have read test_models.py and found it a little difficult to follow. If I just want to test on some video files, how should I modify the code?
Thanks.

Any advice or suggestion will be appreciated.

A 'video_path' argument would be more convenient; I do not want to test on the txt file list below:

Traceback (most recent call last):
  File "test_models.py", line 182, in <module>
    ]), dense_sample=args.dense_sample, twice_sample=args.twice_sample),
  File ".\temporal-shift-module\ops\dataset.py", line 58, in __init__
    self._parse_list()
  File ".\temporal-shift-module\ops\dataset.py", line 96, in _parse_list
    tmp = [x.strip().split(' ') for x in open(self.list_file)]
FileNotFoundError: [Errno 2] No such file or directory: '/ssd/video/kinetics/labels/val_videofolder.txt'
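
For anyone needing a quick workaround, a minimal sketch of running a trained model directly on a single video file, bypassing the txt file lists (assumptions: 'net' is the TSN/TSM model already built and loaded with weights, and the standard ImageNet mean/std are used; this is an illustration, not part of the repository):

    import cv2
    import numpy as np
    import torch
    import torchvision.transforms as T

    def predict_video(net, video_path, num_segments=8, input_size=224):
        cap = cv2.VideoCapture(video_path)
        frames = []
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        cap.release()
        # sample num_segments frames uniformly over the whole video
        idx = np.linspace(0, len(frames) - 1, num_segments).astype(int)
        transform = T.Compose([
            T.ToPILImage(), T.Resize(256), T.CenterCrop(input_size), T.ToTensor(),
            T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])
        clip = torch.stack([transform(frames[i]) for i in idx])  # (T, C, H, W)
        with torch.no_grad():
            # the TSN wrapper reshapes the segment dimension internally,
            # so pass the frames flattened into the channel axis: (1, T*C, H, W)
            logits = net(clip.view(1, -1, input_size, input_size))
        return logits.softmax(dim=-1).argmax(dim=-1).item()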

how to test ?

Hi,
I don't want to test on the images listed in the txt file; I just want to test other images or videos. How can I do this?
Please supply the file

kinetics/labels/val_videofolder.txt

or tell me what it should contain?

thx
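
For reference (based on how ops/dataset.py parses the list file), each line describes one video: a frame-folder path relative to the dataset root, the number of extracted frames, and a numeric class index, separated by spaces. An illustrative example with made-up paths:

    abseiling/video_0001 300 0
    air_drumming/video_0002 250 1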
