mit-han-lab / temporal-shift-module
[ICCV 2019] TSM: Temporal Shift Module for Efficient Video Understanding
Home Page: https://arxiv.org/abs/1811.08383
License: MIT License
Hi!
Thanks for your interesting work and the source code.
I find that the performance of TSM on Sthv1 (8 frames, ResNet-50 backbone, efficient test setting) is much better than reported in your paper. Have you made any improvements over the original paper?
Also, could you share the training script for TSM with 8 frames and a ResNet-50 backbone on Sthv1 that reproduces the Top-1 accuracy posted on GitHub?
Thanks very much
Hello! Thanks for your excellent work. I find there are very few 3D works on the sthv1/v2 datasets.
I checked the sthv2 leaderboard: the top methods are nearly all 2D, and the 3D methods perform far worse. Yet 3D convolution is generally held to be more suitable for capturing spatio-temporal information, and the top accuracies on UCF/HMDB/Kinetics all come from 3D methods.
So what is your opinion on why there are fewer 3D works, with lower accuracy, on sthv1/v2?
Looking forward to your reply. Thanks.
Thanks for your great work and for kindly sharing the code!
I notice that there is a complex optimizer policy in the TSN model. Part of it looks like:
{'params': first_conv_weight, 'lr_mult': 5 if self.modality == 'Flow' else 1, 'decay_mult': 1,
 'name': "first_conv_weight"},
However, I suppose that the built-in PyTorch SGD optimizer cannot interpret keys like 'lr_mult' and 'decay_mult', which come from the Caffe framework. Since nothing overrides the 'step' method of the original SGD class, I suspect this complex optimizer policy actually has no effect.
Please correct me if I misunderstand this part.
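For context: extra keys like 'lr_mult' are indeed ignored by torch.optim.SGD's update math, but they are preserved inside optimizer.param_groups, so a training loop can apply them by hand each time it adjusts the learning rate. A minimal sketch of that pattern, assuming a step-decay schedule like the repo's (the function name and arguments here are illustrative):

```python
def adjust_learning_rate(optimizer, epoch, lr_steps, base_lr, base_wd):
    """Step-decay the base lr/wd, then rescale each parameter group by its
    own Caffe-style 'lr_mult' / 'decay_mult' stored in the group dict."""
    decay = 0.1 ** sum(int(epoch >= step) for step in lr_steps)
    for group in optimizer.param_groups:
        # SGD keeps unknown keys in the group dict, so they can be read here
        group['lr'] = base_lr * decay * group.get('lr_mult', 1)
        group['weight_decay'] = base_wd * group.get('decay_mult', 1)
```

So the policy can take effect, provided the training script calls something like this every epoch rather than relying on SGD itself.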
Thank you very much for your codebase. I have trained on my own data with ResNet-50 successfully, but when I train with MobileNet the accuracy is very low.
python main.py ucf101 RGB --arch mobilenetv2 --num_segments 8 --gd 20 --lr 0.001 --lr_steps 10 20 --epochs 25 --batch-size 2 -j 16 --dropout 0.8 --consensus_type=avg --eval-freq=1 --shift --shift_div=8 --shift_place=blockres
Freezing BatchNorm2D except the first one.
Epoch: [24][0/104], lr: 0.00001 Time 15.333 (15.333) Data 15.214 (15.214) Loss 0.6946 (0.6946) Prec@1 50.000 (50.000) Prec@5 100.000 (100.000)
Epoch: [24][20/104], lr: 0.00001 Time 0.085 (0.815) Data 0.000 (0.725) Loss 0.6946 (0.6896) Prec@1 50.000 (54.762) Prec@5 100.000 (100.000)
Epoch: [24][40/104], lr: 0.00001 Time 0.084 (0.459) Data 0.000 (0.371) Loss 0.6947 (0.6907) Prec@1 50.000 (53.659) Prec@5 100.000 (100.000)
Epoch: [24][60/104], lr: 0.00001 Time 0.086 (0.336) Data 0.000 (0.250) Loss 0.6946 (0.6894) Prec@1 50.000 (54.918) Prec@5 100.000 (100.000)
Epoch: [24][80/104], lr: 0.00001 Time 0.082 (0.274) Data 0.000 (0.188) Loss 0.6391 (0.6893) Prec@1 100.000 (54.938) Prec@5 100.000 (100.000)
Epoch: [24][100/104], lr: 0.00001 Time 0.084 (0.236) Data 0.000 (0.151) Loss 0.6946 (0.6926) Prec@1 50.000 (51.980) Prec@5 100.000 (100.000)
Test: [0/12] Time 2.424 (2.424) Loss 0.7487 (0.7487) Prec@1 0.000 (0.000) Prec@5 100.000 (100.000)
Testing Results: Prec@1 52.174 Prec@5 100.000 Loss 0.69226
Best Prec@1: 52.174
why?
I want to try the effect of bi-directional TSM, but I don't know where in the code to modify to implement it. Hoping for a reply from the author, thanks!
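For reference, the offline shift in ops/temporal_shift.py is already bi-directional: one slice of channels is shifted toward the past and another toward the future. A minimal sketch of that operation in the repo's layout (treat it as an illustration, not a verbatim copy):

```python
import torch

def temporal_shift(x, n_segment, fold_div=8):
    """Bi-directional temporal shift on a [N*T, C, H, W] activation."""
    nt, c, h, w = x.size()
    x = x.view(nt // n_segment, n_segment, c, h, w)
    fold = c // fold_div
    out = torch.zeros_like(x)
    out[:, :-1, :fold] = x[:, 1:, :fold]                  # frame t receives channels from t+1
    out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]  # frame t receives channels from t-1
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]             # remaining channels untouched
    return out.view(nt, c, h, w)
```

Keeping only one of the two shifted slices turns this into the uni-directional variant used for online recognition, so this function is the natural place to experiment.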
I figured out how to fine-tune TSM on UCF101, but the accuracy is not posted anywhere. Could you tell me the accuracy? Thanks.
Hi, Thank you for your TSM code!
But I'm wondering whether there is code for the ResNet-50 pretrained models (not just the weights).
Hi, authors! Thank you for sharing such great work! I'm very interested in your paper. Could you provide the pretrained model or the training script for the Jester dataset?
Thank you for the wonderful work.
For uni-directional TSM for online video detection, which network backbone is used: ResNet-101 or MobileNetV2?
Also, can you elaborate on the lines below from the paper, e.g. how the training and validation are carried out?
I am trying to reproduce the same result.
We show that we can significantly improve the performance of video detection by simply modifying the backbone with online TSM, without changing the detection module design or using optical flow features
For TSM experiments, we inserted uni-directional TSM to the backbone, while keeping other settings the same.
And if possible please release the online training script.
Thanks for your great work!
I tried to run
python test_models.py somethingv2 \
--weights=pretrained/TSM_somethingv2_RGB_resnet101_shift8_blockres_avg_segment8_e45.pth \
--test_segments=8 --batch_size=24 -j 12 --full_res --test_crops=3 --twice_sample
and encountered the following error message
RuntimeError: Error(s) in loading state_dict for TSN:
Missing key(s) in state_dict: "base_model.layer1.1.conv1.net.weight", "base_model.layer2.1.conv1.net.weight", "base_model.layer2.3.conv1.net.weight", "base_model.layer3.1.conv1.net.weight", "base_model.layer3.3.conv1.net.weight", "base_model.layer3.5.conv1.net.weight", "base_model.layer3.7.conv1.net.weight", "base_model.layer3.9.conv1.net.weight", "base_model.layer3.11.conv1.net.weight", "base_model.layer3.13.conv1.net.weight", "base_model.layer3.15.conv1.net.weight", "base_model.layer3.17.conv1.net.weight", "base_model.layer3.19.conv1.net.weight", "base_model.layer3.21.conv1.net.weight", "base_model.layer4.1.conv1.net.weight".
Unexpected key(s) in state_dict: "base_model.layer1.1.conv1.weight", "base_model.layer2.1.conv1.weight", "base_model.layer2.3.conv1.weight", "base_model.layer3.1.conv1.weight", "base_model.layer3.3.conv1.weight", "base_model.layer3.5.conv1.weight", "base_model.layer3.7.conv1.weight", "base_model.layer3.9.conv1.weight", "base_model.layer3.11.conv1.weight", "base_model.layer3.13.conv1.weight", "base_model.layer3.15.conv1.weight", "base_model.layer3.17.conv1.weight", "base_model.layer3.19.conv1.weight", "base_model.layer3.21.conv1.weight", "base_model.layer4.1.conv1.weight".
I solved this problem by changing n_round = 1 to n_round = 2 in ops/temporal_shift.py.
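That workaround is consistent with how blockres placement handles deeper backbones: the shift wrapper is inserted into every n_round-th residual block, and n_round is 2 when layer3 has 23 or more blocks (i.e., ResNet-101), which is why the checkpoint above only has '...conv1.net.weight' keys at even block indices. A simplified sketch of the logic, assuming the TemporalShift wrapper from ops/temporal_shift.py:

```python
import torch.nn as nn
from ops.temporal_shift import TemporalShift  # wrapper from this repo

def insert_shift_blockres(net, n_segment=8, n_div=8):
    """Wrap conv1 of every n_round-th residual block with TemporalShift.

    ResNet-101's layer3 has 23 blocks, so only every other block is
    wrapped (n_round = 2); with n_round = 1 the constructed model expects
    'conv1.net.weight' in every block and cannot load a ResNet-101 checkpoint.
    """
    n_round = 2 if len(list(net.layer3.children())) >= 23 else 1
    for name in ('layer1', 'layer2', 'layer3', 'layer4'):
        blocks = list(getattr(net, name).children())
        for i, b in enumerate(blocks):
            if i % n_round == 0:
                blocks[i].conv1 = TemporalShift(b.conv1, n_segment=n_segment, n_div=n_div)
        setattr(net, name, nn.Sequential(*blocks))
    return net
```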
I have followed this to install OpenCV, but I'm getting the error aarch64: libgomp.so.1: cannot allocate memory in static TLS block at import cv2 on line 8.
Hi, thanks for sharing this work
I'm getting a segmentation fault when running the online_demo code. Here is the error:
UserWarning: The use of the transforms.Scale transform is deprecated, please use transforms.Resize instead.
warnings.warn("The use of the transforms.Scale transform is deprecated, " +
Build Executor...
Segmentation fault (core dumped)
I found that the program crashes at line 35 of main.py; I'm currently using LLVM 4.0.0 on Ubuntu 16.04:
relay_module, params = tvm.relay.frontend.from_onnx(onnx_model, shape=input_shapes)
Can anyone replicate this problem?
Thanks for your help.
When I run this command:
# test NL TSM using non-local testing protocol
python test_models.py kinetics \
--weights=pretrained/TSM_kinetics_RGB_resnet50_shift8_blockres_avg_segment8_e100_dense_nl.pth \
--test_segments=8 --test_crops=3 \
--batch_size=8 --dense_sample --full_res
I got only 57.85% Overall Prec@1. Looking forward to your reply.
Thanks for the repo! I'm wondering whether there is a command to train an RGB + Flow model from a pretrained model?
Hi Dr Lin,
Did you train TSM on the Kinetics dataset using optical flow as the input modality?
If so, could you please release the pre-trained model on Kinetics with optical-flow input?
Thank you!
Hi!
Thanks for the impressive work with publicly accessible source code here :)
I am trying to train online TSM for another application with different datasets and a few adjustments. The repo currently only has the offline version of the training script. Would it be possible for you to also provide the training script for online TSM?
Thank you very much!
I think online_demo will only work on the Jetson Nano. How can I run it on my laptop? I installed all the packages on my laptop and am getting this error:
Open camera...
<VideoCapture 0x7f4e749c3270>
Build transformer...
/media/mustafa/ubuntu_backup/anaconda3/envs/video_action/lib/python3.7/site-packages/torchvision/transforms/transforms.py:210: UserWarning: The use of the transforms.Scale transform is deprecated, please use transforms.Resize instead.
warnings.warn("The use of the transforms.Scale transform is deprecated, " +
Build Executor...
/media/mustafa/ubuntu_backup/Projects/video_action/temporal-shift-module/online_demo/mobilenet_v2_tsm.py:95: TracerWarning: Converting a tensor to a Python index might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
x1, x2 = x[:, : c // 8], x[:, c // 8:]
Segmentation fault
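For context on the traced line above: in the online demo's MobileNetV2, each shifted layer swaps its first c/8 channels with a buffer cached from the previous frame, which is how the uni-directional shift is realized one frame at a time. A rough sketch of the mechanism (a simplification of mobilenet_v2_tsm.py, not the exact module):

```python
import torch

def online_shift(x, shift_buffer):
    """Uni-directional temporal shift for streaming inference.

    x            -- current frame's activation, shape [N, C, H, W]
    shift_buffer -- the first C//8 channels cached from the previous frame
    Returns the shifted activation and the new buffer for the next frame.
    """
    c = x.size(1)
    x1, x2 = x[:, :c // 8], x[:, c // 8:]
    return torch.cat((shift_buffer, x2), dim=1), x1
```

The `c // 8` Python indexing in that slice is also what triggers the TracerWarning above.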
I just ran the training script
"python3 main.py kinetics RGB --arch mobilenetv2 --num_segments 8 --gd 20 --lr 0.02 --wd 1e-4 --lr_steps 20 40 --epochs 50 --batch-size 128 -j 16 --dropout 0.5 --consensus_type=avg --eval-freq=1 --shift --shift_div=8 --shift_place=blockres --npb --gpus 1"
But I ran into the error below.
File "main.py", line 249, in train
output = model(input_var)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/data_parallel.py", line 146, in forward
"them on device: {}".format(self.src_device_obj, t.device))
RuntimeError: module must have its parameters and buffers on device cuda:1 (device_ids[0]) but found one of them on device: cuda:0
Also, the loss, model, input, and target are all on the GPU. What are the expected settings? Please let us know.
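For what it's worth, this error pattern typically appears when `--gpus 1` is passed: main.py builds `torch.nn.DataParallel(model, device_ids=args.gpus).cuda()`, and the bare `.cuda()` places the parameters on cuda:0 while DataParallel expects them on device_ids[0], here cuda:1. A minimal reproduction with one possible workaround (the workaround is my assumption, not an official fix):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)

# Fails as in the traceback: .cuda() moves parameters to the current device
# (cuda:0), but DataParallel wants them on device_ids[0] == cuda:1.
# model = nn.DataParallel(model, device_ids=[1]).cuda()

# Workaround: move the model to the first listed device explicitly
# (or run with CUDA_VISIBLE_DEVICES=1 and --gpus 0).
model = nn.DataParallel(model, device_ids=[1]).cuda(1)
out = model(torch.randn(4, 10).cuda(1))
```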
Congratulations on the great work!
As noted in the supplementary section: "... we inserted uni-directional TSM to the backbone, while keeping other settings the same. We used the official training code of [60] to conduct the experiments".
May I ask a few questions on online video object detection:
Hi,
Thanks for sharing the MobileNetV2 pretrained weights for online TSM on Kinetics.
Can we fine-tune them on the HMDB51 dataset the same way as described in the GitHub repository? If so, what might be the expected accuracy on the smaller datasets?
I used this command to train the TSM model:
# You should get TSM_kinetics_RGB_resnet50_shift8_blockres_avg_segment8_e50.pth
python main.py kinetics RGB \
--arch resnet50 --num_segments 8 \
--gd 20 --lr 0.02 --wd 1e-4 --lr_steps 20 40 --epochs 50 \
--batch-size 128 -j 16 --dropout 0.5 --consensus_type=avg --eval-freq=1 \
--shift --shift_div=8 --shift_place=blockres --npb
This produced ckpt.best.pth.tar and ckpt.pth.tar, which likely contain both the model parameters and the model structure information, but test_models.py only needs the parameters. I tried to save the model parameters from ckpt.pth.tar and deleted the following lines in test_models.py:
# base_dict = {('base_model.' + k).replace('base_model.fc', 'new_fc'): v for k, v in list(checkpoint.items())}
base_dict = {'.'.join(k.split('.')[1:]): v for k, v in list(checkpoint.items())}
replace_dict = {'base_model.classifier.weight': 'new_fc.weight',
                'base_model.classifier.bias': 'new_fc.bias',
                }
for k, v in replace_dict.items():
    if k in base_dict:
        base_dict[v] = base_dict.pop(k)
net.load_state_dict(base_dict)
However, I got very low accuracy. Please tell me how to load the parameters correctly. Thanks.
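One possible culprit, offered as a guess: the file saved by main.py is a wrapper dict (weights under a 'state_dict' key, each key prefixed with 'module.' by DataParallel), so it has to be unwrapped before the key renaming above is applied. A hedged sketch of the unwrapping step (key names follow common PyTorch conventions; verify against your own checkpoint):

```python
import torch

checkpoint = torch.load('ckpt.best.pth.tar', map_location='cpu')
# main.py saves bookkeeping fields alongside the weights; unwrap them first
state_dict = checkpoint.get('state_dict', checkpoint)
# DataParallel prepends 'module.' to every key; strip it before renaming
base_dict = {k.replace('module.', '', 1): v for k, v in state_dict.items()}
```

After this, the replace_dict renaming and net.load_state_dict(base_dict) from test_models.py should apply cleanly.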
Hi, thanks for the code release. In the first version of your arXiv paper:
We then fine-tuned the model to other target datasets like Something-Something [12], UCF101 [34], and HMDB51 [22]
In the most recent version:
For most of the datasets, the model is fine-tuned from ImageNet pre-trained weights; while HMDB-51 [26] and UCF-101 [40] are too small and prone to over-fitting [48], we followed the common practice [48, 49] to fine-tune from Kinetics [25] pre-trained weights and freeze the Batch Normalization [22] layers.
Which datasets are trained from the pre-trained model to get the scores reported in the paper? Jester, UCF101, and HMDB? Are the parameters for Jester and HMDB the same as for UCF101?
Thanks again.
Can the test results on the something-v1 dataset with the hyperparameter settings below reach the 47.3% accuracy reported in the paper (num_segments = 8)? 25 epochs isn't enough, is it? My 25 epochs only reach 45.98%.
python main.py something RGB \
--arch resnet50 --num_segments 8 \
--gd 20 --lr 0.001 --lr_steps 10 20 --epochs 25 \
--batch-size 1 -j 16 --dropout 0.5 --consensus_type=avg --eval-freq=1 \
--tune_from=pretrained/TSM_kinetics_RGB_resnet50_shift8_blockres_avg_segment8_e50.pth
Hi,
Since I don't have the dataset and just want to use my own dataset, how should I modify the code in vid2img_kinetics? Or could you explain the usage of the script and what the kinetics400 configuration is?
Thanks.
I ran into trouble when running online_demo with my own trained model.
My model has only 4 classes, and an unexpected error occurred. The error message is as follows:
File "main.py", line 319, in main
cv2.putText(label, 'Prediction: ' + catigories[idx],
IndexError: list index out of range
I have already modified catigories to my 4 classes.
It seems like a simple bug, but I don't have a clue about it.
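A guess at the cause: if the TVM/ONNX executor was exported from the original 27-class Jester model, its argmax can exceed a 4-entry `catigories` list even after the list is edited. A small self-contained check that demonstrates the mismatch (names are illustrative):

```python
import numpy as np

def safe_prediction(feat, categories):
    """Map an output vector to a label, guarding against a head/label mismatch."""
    idx = int(np.argmax(feat))
    if idx >= len(categories):
        raise ValueError(f'model outputs {feat.size} classes but only '
                         f'{len(categories)} labels are defined; re-export the '
                         f'model after changing the classifier head')
    return categories[idx]

try:
    # a 27-class output paired with 4 labels reproduces the IndexError scenario
    print(safe_prediction(np.random.rand(27), ['a', 'b', 'c', 'd']))
except ValueError as e:
    print(e)
```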
Hi, authors.
I set num_segments=16 and batch-size=32 and trained the optical-flow model on the Kinetics dataset. The model converges, but the training speed is very slow. Have you run into the same issue, and how can it be solved? Looking forward to your reply, thank you very much!
Thanks a lot for the author's code; the results are amazing. But I have a problem and am looking forward to some help: I have implemented real-time preprocessing of the video frames captured by the webcam. How do I then use this model for action recognition?
Can someone give me some advice? Thank you very much!
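One way to wire this up, sketched under assumptions: an offline 8-segment TSM/TSN expects a clip of num_segments frames stacked along the channel axis, so a rolling buffer of preprocessed frames can feed the model. The buffering policy below is mine, not from the repo, and `model` is assumed to be a loaded TSM network.

```python
from collections import deque

import torch

NUM_SEGMENTS = 8
buffer = deque(maxlen=NUM_SEGMENTS)  # the most recent preprocessed frames

def recognize(frame_tensor, model):
    """frame_tensor: one preprocessed webcam frame, shape [3, 224, 224]."""
    buffer.append(frame_tensor)
    if len(buffer) < NUM_SEGMENTS:
        return None  # wait until a full clip has accumulated
    clip = torch.stack(list(buffer))   # [T, 3, 224, 224]
    clip = clip.view(1, -1, 224, 224)  # [1, T*3, 224, 224], the TSN input layout
    with torch.no_grad():
        logits = model(clip)
    return int(logits.argmax(dim=1))
```

For lower latency, the online demo's cached-buffer variant avoids recomputing all eight frames per prediction.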
Hi,
In the script non_local.py I find archs.small_resnet.ResNet, but I couldn't find it under archs/. Could you help me?
Thanks.
Hi,
Thank you for your work.
I am trying to train the online version of TSM on Kinetics with ResNet-50, and after two days it has not finished two epochs.
How long did it take to train the TSM network from scratch (online version) for ResNet-50 and for MobileNetV2? I just want to make sure I am on the right path.
I did not load a pretrained model and used ResNet-50 as the backbone, but training never converges. Hoping the authors can help, thanks!
I installed everything on a Nano with the Jetson SD card image r32.2.
When launching online_demo/main.py with python3, the following error is raised:
CUDAError: Check failed: ret == 0 (-1 vs. 0) : cuModuleLoadData(&(module_[device_id]), data_.c_str()) failed with error: CUDA_ERROR_INVALID_PTX
CUDA is in my path.
Thanks for helping
Note that:
1/ when making tvm, no /nnvm/python directory is generated;
2/ I first tried to install on the latest release SD card image, r32.3.1, and didn't succeed.
=> On what version of the Jetson Nano SD card image did you make it work?
Hi, will you release the training code for video object detection? Thanks.
Is there any way you could release a Keras version of the temporal shift in ops/temporal_shift.py? I am looking at implementing this in a custom layer for my project.
Hi, thanks for sharing this work
I'm hitting a fault when running the online_demo code. Here is the error:
Traceback (most recent call last):
File "main.py", line 349, in
main()
File "main.py", line 323, in main
idx, history = process_output(idx_, history)
File "main.py", line 249, in process_output
if not (history[-1] == history[-2]): # and history[-2] == history[-3]):
IndexError: list index out of range
I have a 1060ti graphics card, 16 GB of RAM, and an i7 processor, but 'Build Executor...' takes more than 5 minutes and I don't know why. It also pegs one CPU core at 100% and uses 2 GB of RAM and even 500 MB of graphics memory:
Open camera...
<VideoCapture 0x7fef3986be10>
Build transformer...
/media/mustafa/ubuntu_backup/anaconda3/envs/video_action/lib/python3.7/site-packages/torchvision/transforms/transforms.py:210: UserWarning: The use of the transforms.Scale transform is deprecated, please use transforms.Resize instead.
warnings.warn("The use of the transforms.Scale transform is deprecated, " +
Build Executor...
Hi,
Thank you for your work. I was trying to use the TSM module and check against the reported accuracy, but test_models.py expects a val_folder.txt and a train_folder.txt (basically the train and validation file lists).
I tried to download the Kinetics-400 dataset (with the official ActivityNet download script), but the recent version has many expired/broken YouTube links. If possible, could you please give access to the Kinetics dataset you used for training?
Hi,
Thanks for your amazing work!
I'm new to video analysis. I'm wondering about the model's performance if you do not load ImageNet pretrained weights. And what if you load weights pretrained on another task's dataset, e.g., detection on MS-COCO?
I did not find this reported in your paper or your code. Thanks for your help!
When I run the repo, it keeps popping up this UserWarning:
/pytorch/torch/csrc/autograd/python_function.cpp:638: UserWarning: Legacy autograd function with non-static forward method is deprecated and will be removed in 1.3. Please use new-style autograd function with static forward method. (Example: https://pytorch.org/docs/stable/autograd.html#torch.autograd.Function)
OS: Windows 7
PyTorch: 1.2
Python: 3.5
Hi Ji, thank you for publishing the work.
I want to double-check the training parameters for TSM on Something-V2, which should achieve at least 58.8, the performance I got when testing with your weights (tested with a single crop and a single clip).
According to your paper, the training parameters for the something-something-v2 dataset are: 50 training epochs, initial learning rate 0.01 (decays by 0.1 at epoch 20&40), weight decay 1e-4, batch size 64, and dropout 0.5. And the model is fine-tuned from ImageNet pre-trained weights.
However, the script in the git repository indicates that the initial learning rate is 0.001, the weight decay is 5e-4, and the model is tuned from Kinetics pre-trained weights.
Due to this disparity, I am confused about which parameters to use to reproduce the number. Could you provide the exact parameters for training TSM on Something-V2?
Thank you.
I downloaded the pretrained model TSM_kinetics_RGB_resnet50_shift8_blockres_avg_segment8_e50.pth
and fine-tuned it on UCF101 split 1 using the command below:
'''
python main.py ucf101 RGB
--arch resnet50 --num_segments 8
--gd 20 --lr 0.001 --lr_steps 10 20 --epochs 25
--batch-size 64 -j 16 --dropout 0.8 --consensus_type=avg --eval-freq=1
--shift --shift_div=8 --shift_place=blockres
--tune_from=pretrained/TSM_kinetics_RGB_resnet50_shift8_blockres_avg_segment8_e50.pth
'''
However, the official UCF101 dataset does not provide a validation set, so I split UCF101 split 1 at 9:1, 9 parts for training and 1 for validation.
After training, I tested the model on UCF101 split 1 using the command below:
'''
python test_models.py ucf101
--weights=checkpoint/TSM_ucf101_RGB_resnet50_shift8_blockres_avg_segment8_e25/ckpt.best.pth.tar
--test_segments=8 --batch_size=72 -j 24 --test_crops=3 --twice_sample --full_res
'''
I only get 93.4% Acc1. I would like to know what I did wrong and how I can reproduce the 95.9% Acc1 from the paper.
I really appreciate your reply, thank you very much!
Thanks for your good work!
I have followed TSM since it was first submitted to arXiv, and I find that in the older version (https://arxiv.org/pdf/1811.08383v1.pdf, Table 2) the result is
TSM ResNet50 16 65G 24.3M 44.8 74.5
while in your ICCV paper, under the same setting, the result is
TSM ResNet50 16 65G 24.3M 47.2 77.1
Since there is no difference between the Kinetics pretrained models in the two versions, is the reason different hyperparameters or more thorough training?
Looking forward to your reply! :)
Hi Ji,
Thanks for your novel work. I wonder, have you tried to train MobileNetV2 from scratch (without any pre-trained weights) on Kinetics or UCF-101? Could you share the configuration for this setting, such as the learning rate and batch size?
Thanks!
I'm trying to reproduce the two-stream results of TSM on Something V1, but the performance of my flow model is far below it (segment-based sampling method).
I understand the 10-channel stacked optical flow (TV-L1) and the 5x learning rate for the first conv layer.
Is there any difference in parameter settings between the RGB and Flow models (e.g., epochs, learning rate)?
Hi,
I have tested test_models.py and find it a little difficult to read. If I just want to test on some videos, how should I modify the code?
Any advice or suggestion will be appreciated. A 'video_path' argument would be more convenient; I do not want to test from the txt file below:
Traceback (most recent call last):
File "test_models.py", line 182, in <module>
]), dense_sample=args.dense_sample, twice_sample=args.twice_sample),
File ".\temporal-shift-module\ops\dataset.py", line 58, in __init__
self._parse_list()
File ".\temporal-shift-module\ops\dataset.py", line 96, in _parse_list
tmp = [x.strip().split(' ') for x in open(self.list_file)]
FileNotFoundError: [Errno 2] No such file or directory: '/ssd/video/kinetics/labels/val_videofolder.txt'
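In the meantime, a rough sketch of a standalone entry point that avoids the txt list entirely, under assumptions: frames are sampled uniformly from one file, `transform` is assumed to map one RGB frame to a [3, H, W] tensor (the same test-time transform test_models.py builds), and `model` is the loaded network.

```python
import cv2
import numpy as np
import torch

def sample_frames(video_path, num_segments=8):
    """Uniformly sample num_segments RGB frames from one video file."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    indices = np.linspace(0, max(total - 1, 0), num_segments).astype(int)
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cap.release()
    return frames

def predict(model, video_path, transform, num_segments=8):
    clip = torch.stack([transform(f) for f in sample_frames(video_path, num_segments)])
    clip = clip.view(1, -1, clip.size(-2), clip.size(-1))  # TSN layout [1, T*3, H, W]
    with torch.no_grad():
        return int(model(clip).argmax(dim=1))
```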
Hi,
I do not want to test on the images from the txt file; I just want to test other images or videos. How can I do that?
Please supply the file, or tell me what should be in it.
Thanks.
Hi,
If I want to get the embedding features from the TSM pretrained model, could you please help me?
Thanks.
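A common way to get features, sketched under assumptions: in this repo's TSN, the classifier lives in `new_fc` (when dropout > 0), so replacing it with `nn.Identity()` exposes the pooled backbone features. The constructor arguments below are abbreviated and the 400-class / 8-segment values are illustrative.

```python
import torch
import torch.nn as nn
from ops.models import TSN  # model definition from this repo

# build the network roughly as test_models.py does (arguments abbreviated)
net = TSN(400, 8, 'RGB', base_model='resnet50', consensus_type='avg',
          is_shift=True, shift_div=8, shift_place='blockres')
# ... load the pretrained state_dict here ...

net.new_fc = nn.Identity()  # drop the classifier, keep the backbone features

with torch.no_grad():
    clip = torch.randn(1, 8 * 3, 224, 224)  # [1, T*3, H, W] dummy input
    feats = net(clip)                       # features averaged over segments
print(feats.shape)
```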