zhang-can / eco-pytorch Goto Github PK


PyTorch implementation for "ECO: Efficient Convolutional Network for Online Video Understanding", ECCV 2018

License: BSD 2-Clause "Simplified" License

Python 70.06% Lua 1.44% Shell 1.45% Jupyter Notebook 1.66% C++ 25.10% Makefile 0.24% Dockerfile 0.05%

eco-pytorch's Issues

about train

I found that Volatile GPU-Util is zero most of the time, so I suspect that data loading takes a long time. PyTorch provides the "workers" parameter to read data in parallel, but training is still slow even with it set. I don't know the reason and hope to get an answer.

Error about checkpoint file when tried to test on a video

Sorry to interrupt again... I have finished the training part and a checkpoint file was generated. To run inference, I wrote a demo that uses the .pth file and some frames cropped from a video to predict its class, but the following error occurs:

Traceback (most recent call last):
  File "new_demo.py", line 105, in <module>
    main()
  File "new_demo.py", line 84, in main
    model.load_state_dict(checkpoint['state_dict'])
  File "/home//anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 522, in load_state_dict
    .format(name))
KeyError: 'unexpected key "module.base_model.conv1_7x7_s2.weight" in state_dict'

The code that loads the checkpoint file is as follows:

checkpoint = torch.load(args.ckpt_file)
print checkpoint['state_dict'], type(checkpoint['state_dict'])
model.load_state_dict(checkpoint['state_dict'])

I have no idea what is wrong...
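
The unexpected key starts with "module.", which is the prefix torch.nn.DataParallel adds to parameter names when a wrapped model is saved. A minimal sketch of one common workaround follows, assuming the demo builds the model without DataParallel; the helper name strip_module_prefix is hypothetical and not part of this repo.

```python
from collections import OrderedDict


def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading prefix removed from each key.

    Hypothetical helper: keys that do not start with the prefix are kept as-is.
    """
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned


# Assumed usage with the loading code quoted above:
# checkpoint = torch.load(args.ckpt_file)
# model.load_state_dict(strip_module_prefix(checkpoint['state_dict']))
```

The other option is to wrap the demo model in torch.nn.DataParallel before loading, so that its parameter names carry the same prefix as the checkpoint.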

how to modify the number of classes?

Although num_class is set at the beginning of main.py as follows, it does not seem to be used in the program. Could you give me some suggestions?
Thank you~
-------------------main.py-------------------------
if args.dataset == 'ucf101':
    num_class = 101
elif args.dataset == 'hmdb51':
    num_class = 51
elif args.dataset == 'kinetics':
    num_class = 400
elif args.dataset == 'something':
    num_class = 174
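
For a dataset that is not in this list, one possible approach is a lookup with an explicit override. This is only a sketch: the table values are copied from the snippet above, while get_num_class and custom_num_class are hypothetical names, and how main.py passes num_class on to the model is not shown in this thread.

```python
# Class counts copied from the main.py excerpt quoted in this issue.
NUM_CLASSES = {
    'ucf101': 101,
    'hmdb51': 51,
    'kinetics': 400,
    'something': 174,
}


def get_num_class(dataset, custom_num_class=None):
    """Return the class count for a dataset (hypothetical helper).

    custom_num_class is an assumed override for datasets outside the table.
    """
    if custom_num_class is not None:
        return custom_num_class
    try:
        return NUM_CLASSES[dataset]
    except KeyError:
        raise ValueError('Unknown dataset ' + dataset)


print(get_num_class('ucf101'))
print(get_num_class('my_dataset', custom_num_class=7))
```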

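How to use the training program

@zhang-can Hello, thank you very much for your work. I ran the experiments following the method in the README, and three different list files were generated each for the training and test data, say train_list1.txt, train_list2.txt, train_list3.txt and val_list1.txt, val_list2.txt, val_list3.txt.
When running python main.py ucf101 RGB <ucf101_rgb_train_list> <ucf101_rgb_val_list>, how should I pass these three files to the training program? Thanks!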

Difference between pre-trained ECOLite models

Hi, I have downloaded the ECOLite pre-trained models following the instructions given here https://github.com/mzolfaghari/ECO-pytorch.
I am not sure what the difference is between the two pre-trained ECOLite models: eco_lite_rgb_16F_kinetics_v3.pth.tar and ECO_Lite_rgb_model_Kinetics.pth.tar
Thanks!

test model

Author, thank you very much for your work.
I have a question: I want to get the classification accuracy over the 101 classes, but after testing, my results always look like this. What do you think might be the problem?

[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
Accuracy 0.99%

Looking forward to your reply
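
As a point of reference, the reported 0.99% is about what random guessing over 101 classes gives. The short calculation below only illustrates that arithmetic; it does not say what went wrong in this particular test run.

```python
# Chance-level accuracy for a 101-class problem such as UCF101.
num_classes = 101
chance_accuracy = 100.0 / num_classes  # in percent
print("chance accuracy: %.2f%%" % chance_accuracy)
```

A result at chance level often means the trained weights were not actually loaded or the label mapping differs from the one used in training, though that is an assumption to verify rather than a diagnosis.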

some difference between pytorch and caffe version

Hi Can,
Thanks so much for contributing this PyTorch version.
I tested it on the something dataset and found that the performance is around 3-4 points worse than the Caffe version. I obtain 38.6 accuracy using the PyTorch ECO-lite code with 16 segments, whereas the Caffe version achieves 42.4 accuracy.

I have some questions about the differences between this PyTorch version and the original Caffe version.

The mean is [104 117 123] in Caffe, whereas the PyTorch version uses [104 117 128].
The Caffe model resizes the image to [320 240] followed by cropping, whereas the PyTorch version resizes the image to [256 256] followed by cropping.

Do you think these differences will lead to a performance difference?
Also, I wonder whether the Kinetics pre-trained model you provided was obtained by training your PyTorch model or was just converted from the pre-trained Caffe model?

Looking forward to your reply.
Best,
Tan

When using single input, model always produces wrong output

Hello,

I am Youngkyoon Jang, a postdoc at the University of Bristol. Thanks for sharing the PyTorch implementation of the ECO network; I am using it to test a new dataset, EPIC-KITCHENS. However, I am facing a problem when testing the model to calculate the accuracy.

I noticed that the model always predicts the first index with the highest score when given a single input instead of a mini-batch of multiple samples at test time. Are you aware of this problem? Is there a correct way to get a consistent output?

When I change the number of samples in a mini-batch, the model predicts a different score for the same sample depending on the mini-batch size.

I look forward to your reply.

Best,
Young

How to modify the args.num_segments

Currently training and testing work, but if I change args.num_segments from 4 to 8, an error occurs:

RuntimeError: size mismatch, m1: [24 x 1024], m2: [512 x 8] at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:249

Where should I modify the code to solve this problem?

@zhang-can thanks!

about downloading the pretrain model

Thank you for your code, it is really helpful to me.
Recently I tried to download the updated pre-trained model from your new URL,
'https://s3.us-east-2.amazonaws.com/zhangcan/kin_ECO_epoch-12-c89e9dc0.pth.tar'

but it is hard to download from China. Could you give us another option?

Thank you~

Performance on the validation set of something-something-v1

Hi, thank you for releasing pytorch version of ECO.

Have you run ECO-pytorch on the validation set of something-something-v1 dataset?

I wonder whether it has reached the performance reported in the paper, 46.4%, which is obtained from an ensemble of networks with {16, 20, 24, 32} frames.

Thank you.

Out of memory

When I tried to run the command " sh run_demo_ECO_Full.sh local ", the "RuntimeError: CUDA error(2): out of memory" showed up.

No autograd

Hi, I noticed that you do not use torch.autograd.Variable to convert the tensor to a Variable before feeding it into the network. How does this work?

a problem of training model

I ran into "FileNotFoundError: [Errno 2] No such file or directory: 'dataset\UCF-101\WalkingWithDog\img_00001.jpg'" when training the model with "run_ECOLite_finetune_UCF101.sh". In the debugger, execution cannot get past "train(train_loader, model, criterion, optimizer, epoch)". I couldn't find the cause. Can you help me? Thanks!
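
One way to narrow this down is to check, before training, that the first frame of every entry in the list file exists on disk. The sketch below assumes each list line has the form "<frame_folder> <num_frames> <label>" with no spaces in the folder path and that frames are named img_00001.jpg, as in the error message; find_missing_frames is a hypothetical helper, not a function from this repo.

```python
import os


def find_missing_frames(list_path, image_tmpl="img_{:05d}.jpg"):
    """Return the expected first-frame paths that do not exist on disk.

    Assumes whitespace-separated lines: <frame_folder> <num_frames> <label>.
    """
    missing = []
    with open(list_path) as f:
        for line in f:
            parts = line.split()
            if len(parts) < 3:
                continue  # skip blank or malformed lines
            # Normalise Windows-style separators so the check also works on Linux.
            folder = parts[0].replace("\\", os.sep)
            first_frame = os.path.join(folder, image_tmpl.format(1))
            if not os.path.isfile(first_frame):
                missing.append(first_frame)
    return missing
```

If every entry is reported missing, the list file was probably generated relative to a different working directory or the frames were never extracted; if only a few are missing, those videos may have failed during frame extraction.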

eco_lite_rgb_16F_kinetics_v3.pth.tar is not in BaiduYun

Hi @zhang-can, thank you for your work! I can't access Google Docs, so could you please upload the latest pre-trained model eco_lite_rgb_16F_kinetics_v3.pth.tar to BaiduYun?

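Following the initialization scheme and the corresponding parameter settings given in the paper, I trained on UCF101 and found that the loss decreases very slowly; even at 100 epochs, top-1 accuracy barely reaches 10%...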

dataset problem

I ran some experiments based on ECO but ran into problems with the benchmark datasets. The something-something dataset I downloaded cannot be used because of an unknown error, so can anyone share the dataset? Also, the Kinetics dataset is too large for me to run experiments on (machine limitations). So how can I present a scientifically sound comparison between ECO and my modifications?

How to draw loss and prec@1 picture?

How to use log.txt to draw loss and prec@1 picture?
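
A minimal sketch of one way to do this is shown below. It assumes log.txt contains lines in the format printed elsewhere on this page, for example "Test: [0/46] Time 6.590 (6.590) Loss 4.9549 (4.9549) Prec@1 1.562 (1.562) Prec@5 9.375 (9.375)", and it collects the running averages in parentheses; parse_log is a hypothetical helper, and plotting with matplotlib is left as a commented suggestion.

```python
import re

# Captures the current and running-average values of Loss and Prec@1.
LINE_RE = re.compile(
    r"Loss\s+([\d.]+)\s+\(([\d.]+)\).*?Prec@1\s+([\d.]+)\s+\(([\d.]+)\)"
)


def parse_log(lines):
    """Return (losses, prec1) lists of running averages found in the log lines."""
    losses, prec1 = [], []
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            losses.append(float(match.group(2)))
            prec1.append(float(match.group(4)))
    return losses, prec1


# Assumed usage (requires matplotlib to be installed):
# with open("log.txt") as f:
#     losses, prec1 = parse_log(f)
# import matplotlib.pyplot as plt
# plt.plot(losses, label="loss")
# plt.plot(prec1, label="prec@1")
# plt.legend()
# plt.savefig("curves.png")
```

Lines that do not match the pattern, such as warnings, are skipped, so the whole log file can be passed in unfiltered.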

UCF Crime Dataset

Folks,

Can we train on the UCF Crime Dataset? I want to train on crime videos for research. I see this model uses UCF 101.

Can you please let me know.

Thanks
Guru

Can't reproduce top-1 85% results

Hi,

While running ECO-lite 4F I couldn't reach the reported results of top-1 85%. Instead got around 68%. Is there anything extra that needs to be done?

To my knowledge the only thing I'm doing differently is using a newer version of PyTorch.

Thanks

I would like to see the results of the ECO trained model. Please share with me. Thank you very much.

I would like to see the results of the ECO trained model. Please share with me. Thank you very much.

about the inference demo

Sorry to bother you again :p
I want to test online recognition, but inference fails with this error: RuntimeError: cuda runtime error (2) : out of memory at /pytorch/aten/src/THC/generic/THCStorage.cu:58
It only works when I set the batch size to 2, yet evaluation succeeds with a batch size of 15. Below is the code:

I tested it on UCF101. Thanks in advance.

contribution to repo

Hello,

Thank you for making this code available. You say that any contribution is welcome. I'd like to help complete the code. Could you add a TODO list, so I know which tasks to start working on?

Closing this as I see that the code is being developed further in a different repo.

Thanks again for releasing this code

Trained ECO models on Kinetics dataset

As reported in the ECO paper, the pre-trained 2D BNInception and 3D ResNet-18 models on the Kinetics dataset are not enough to get a good result; training the ECO model for another 10 epochs on Kinetics should give a better result. However, I am not able to train the model on Kinetics (GPU and memory limitations). Since you mentioned that you are training the models on Kinetics, would you please share the trained model? I am going to use the trained weights to initialize the model and train it on the something-something dataset. I can report the test results and share them in this repository.

a runtimeError problem of training

When I tried to run the command " sh run_demo_ECO_Lite.sh local", the "RuntimeError: module must have its parameters and buffers on device cuda:1 (device_ids[0]) but found one of them on device: cpu" showed up.

Traceback (most recent call last):
  File "main.py", line 601, in <module>
    main()
  File "main.py", line 211, in main
    train(train_loader, model, criterion, optimizer, epoch)
  File "main.py", line 449, in train
    output = model(input_var)
  File "/data3/yjx/venv_eco/lib/python3.5/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/data3/yjx/venv_eco/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 146, in forward
    "them on device: {}".format(self.src_device_obj, t.device))
RuntimeError: module must have its parameters and buffers on device cuda:1 (device_ids[0]) but found one of them on device: cpu

How to fix this runtime error

AttributeError: module 'PIL.Image' has no attribute 'HAMMING'

train shell error

scripts/run_ECOLite_finetune_UCF101.sh: 65: scripts/run_ECOLite_finetune_UCF101.sh: Syntax error: end of file unexpected (expecting "then")

Thanks for your attention.
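
This sh message frequently appears when a script has Windows (CRLF) line endings, because the shell then reads "then\r" instead of "then". Whether that is the cause here is an assumption. A minimal sketch for normalising the line endings follows; convert_crlf_to_lf is a hypothetical helper name.

```python
def convert_crlf_to_lf(path):
    """Rewrite the file at path with LF line endings.

    Returns the number of CRLF sequences that were found.
    """
    with open(path, "rb") as f:
        data = f.read()
    fixed = data.replace(b"\r\n", b"\n")
    if fixed != data:
        with open(path, "wb") as f:
            f.write(fixed)
    return data.count(b"\r\n")


# Assumed usage:
# convert_crlf_to_lf("scripts/run_ECOLite_finetune_UCF101.sh")
```

If the function reports zero replacements, line endings are not the problem and the script itself should be checked for an unterminated "if" block.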

I have some ideas for writing a paper like ECO, but I don't know how to create ECO.yaml from the original Caffe model?

I have some ideas for writing a paper like ECO,
but I don't know how to create ECO.yaml from the original Caffe model?

a problem of training

During training I ran into an error: "TypeError: 'BatchNorm3d' object is not iterable". I have thought about it a lot but got nowhere.

Does ECO train on the test dataset to get high validation accuracy?!

Hello, thank you for contributing the pytorch-ECO code on GitHub. However, I found a possible problem in the validate function of main.py.

Around line 254 of main.py:

# compute output
output = model(input_var)
loss = criterion(output, target_var)

It seems that the ECO network is trained on the validation set. I hope you can take the time to check this.

problem with multiple gpus

solved

Failed!! Did you run the code on the something-something dataset?

Good work!
When I tried this repo with the something-something dataset, I hit an odd bug. Although I have all the something-something data under the running folder, including
/20bn-datasets/20bn-something-something-v1/76110/00006.jpg' ,
the code stops with FileNotFoundError: [Errno 2] No such file or directory: '../20bn-datasets/20bn-something-something-v1/76110/img_00006.jpg'.
Interestingly, this time it is /76110/img-00006.jpg, while another time it is /36265/img_00006.jpg. In summary, the failing file changes between runs!!
Thanks!
Looking forward to any answers!!
@zhang-can
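
The paths quoted above suggest that the frames on disk are named 00006.jpg while the loader looks for img_00006.jpg. One possible fix, sketched here under that assumption, is to rename the frames so they carry the img_ prefix; add_frame_prefix is a hypothetical helper and should be tried on a copy of one video folder first.

```python
import os
import re


def add_frame_prefix(video_dir, prefix="img_"):
    """Rename purely numeric frames such as 00006.jpg to img_00006.jpg.

    Files that already have a prefix or another extension are left untouched.
    Returns the number of renamed files.
    """
    renamed = 0
    for name in sorted(os.listdir(video_dir)):
        if re.fullmatch(r"\d+\.jpg", name):
            os.rename(
                os.path.join(video_dir, name),
                os.path.join(video_dir, prefix + name),
            )
            renamed += 1
    return renamed
```

The failing file probably differs between runs because the training loader shuffles videos, so whichever folder is read first triggers the error.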

pretrained models cannot download

First of all, thanks for your code. I noticed that you uploaded the UCF shell script. I tried to download the pre-trained models with wget and from Baidu disk, but both failed. It shows me:

test result on UCF101 with 16 segments each video

What is the result on the UCF101 test set with 16 segments per video?

how to generate the eco.yaml

I have a question: how is the *.yaml file generated? Is it written by hand, or generated from another file?
thx : )

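How to test with trained models? I want to test a video to see how it performs

I would like to see how well ECO recognizes actions in a test video, but there is no demo. Could the author provide a sample?
If I want to test a video using trained models, how?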

something-something-v1 dataset problem

When I downloaded the something-something-v1 dataset, I was surprised to find that I did not get video or image files. Instead, I got files of an unknown type. Has anyone else encountered this problem?

Could you please tell me the data augmentation methods you used when training on UCF-101?

what(): cuda runtime error (59) : device-side assert triggered at /pytorch/torch/lib/THC/generic/THCStorage.c:184

@zhang-can The code and dataset are the same as before, but I now get the following error, although everything worked about half a month ago.
The PyTorch version is 0.3.1.
Traceback (most recent call last):
  File "main.py", line 341, in <module>
    main()
  File "main.py", line 141, in main
    train(train_loader, model, criterion, optimizer, epoch)
  File "main.py", line 192, in train
    prec1, prec5 = accuracy(output.data, target, topk=(1,5))
  File "main.py", line 329, in accuracy
    _, pred = output.topk(maxk, 1, True, True)
RuntimeError: invalid argument 5: k not in range for dimension at /pytorch/torch/lib/THC/generic/THCTensorTopK.cu:21
/pytorch/torch/lib/THCUNN/ClassNLLCriterion.cu:101: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [0,0,0] Assertion t >= 0 && t < n_classes failed.
/pytorch/torch/lib/THCUNN/ClassNLLCriterion.cu:101: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [1,0,0] Assertion t >= 0 && t < n_classes failed.
/pytorch/torch/lib/THCUNN/ClassNLLCriterion.cu:101: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [2,0,0] Assertion t >= 0 && t < n_classes failed.
/pytorch/torch/lib/THCUNN/ClassNLLCriterion.cu:101: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [3,0,0] Assertion t >= 0 && t < n_classes failed.
THCudaCheck FAIL file=/pytorch/torch/lib/THC/generic/THCStorage.c line=184 error=59 : device-side assert triggered
terminate called after throwing an instance of 'std::runtime_error'
what(): cuda runtime error (59) : device-side assert triggered at /pytorch/torch/lib/THC/generic/THCStorage.c:184
Aborted (core dumped)
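
The assertion "t >= 0 && t < n_classes" fails when a target label falls outside the range of the model's outputs, and the topk error with topk=(1,5) means the output has fewer than 5 columns. Both point at a mismatch between the labels and the number of classes the model was built with. A minimal sketch for checking the labels in a list file follows, assuming each line ends with an integer label; find_bad_labels is a hypothetical helper.

```python
def find_bad_labels(list_lines, num_class):
    """Return (line_number, label) pairs whose label is outside [0, num_class).

    Assumes the last whitespace-separated field of each line is the label.
    """
    bad = []
    for lineno, line in enumerate(list_lines, start=1):
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        label = int(parts[-1])
        if not 0 <= label < num_class:
            bad.append((lineno, label))
    return bad


# Assumed usage:
# with open("ucf101_rgb_train_list.txt") as f:
#     print(find_bad_labels(f, num_class=101))
```

Labels that start at 1 instead of 0 are a typical source of this failure, since the largest label then equals num_class.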

Total running time？

Thanks for your contribution!
I ran the code successfully on a Tesla M40, and it takes about 11 hours, which is much longer than the 3 hours mentioned in the original paper. Is this normal?

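Class indices for kinetics-400

Hello, I would like to know how your label indices are assigned, because after testing images on the pre-trained model I cannot tell which class the result belongs to. Thanks!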

Training list and testing list splits

I noticed that there are three train/test splits in UCF101 dataset, does it mean that we have three training and testing evaluation benchmarks?

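Training problem

Hello, thank you very much for the code. I have a problem: I trained with your parameters, using eco_lite_rgb_16F_kinetics_v3.pth.tar as the pre-trained model, but the loss does not decrease and stays at around 4.61.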
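
For context, a loss of about 4.61 is what cross-entropy gives when the network assigns equal probability to all 101 UCF101 classes. The calculation below only shows that arithmetic, assuming this run uses UCF101 with 101 classes; it does not identify the cause.

```python
import math

# Cross-entropy of a uniform prediction over 101 classes.
num_classes = 101
uniform_loss = -math.log(1.0 / num_classes)
print("loss of a uniform prediction: %.3f" % uniform_loss)
```

A loss stuck at this value suggests the model is not learning anything from the inputs, so the learning rate, the pre-trained weight loading and the label files are reasonable things to check first.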

Fine-tuning on UCF101

ECO-pytorch/models.py

Line 321 in ef4cab7

base_out = self.base_model(input.view((-1, sample_len) + input.size()[-2:]))

When forward propagation reaches this step, most of base_out has become nan:
tensor([[nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan]],
       device='cuda:1', grad_fn=)
How can this be solved?

Network structure

Hello, the network structures drawn for num_segments=4 and num_segments=16 are identical. Why is that?

a bug --consensus_type avg consensus_module: avg

When I set --consensus_type avg, I got the following bug:
@zhang-can
Initializing TSN with base model: ECO.
TSN Configurations:
input_modality: RGB
num_segments: 4
new_length: 1
consensus_module: avg
dropout_ratio: 0.5

['fc_final.weight', 'fc_final.bias']
/data/yuyongbo/action_recognition/ECO-pytorch/tf_model_zoo/ECO/pytorch_load.py:60: UserWarning: nn.init.normal is now deprecated in favor of nn.init.normal_.
normal(new_state_dict[k], 0, std)
/data/yuyongbo/action_recognition/ECO-pytorch/tf_model_zoo/ECO/pytorch_load.py:62: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
constant(new_state_dict[k], 0)
/data/yuyongbo/action_recognition/ECO-pytorch/models.py:91: UserWarning: nn.init.normal is now deprecated in favor of nn.init.normal_.
normal(self.new_fc.weight, 0, std)
/data/yuyongbo/action_recognition/ECO-pytorch/models.py:92: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
constant(self.new_fc.bias, 0)
/home/yuyongbo/anaconda3/envs/py27-torch4/lib/python2.7/site-packages/torchvision-0.2.1-py2.7.egg/torchvision/transforms/transforms.py:188: UserWarning: The use of the transforms.Scale transform is deprecated, please use transforms.Resize instead.
group: first_conv_weight has 1 params, lr_mult: 1, decay_mult: 1
group: first_conv_bias has 1 params, lr_mult: 2, decay_mult: 0
group: normal_weight has 31 params, lr_mult: 1, decay_mult: 1
group: normal_bias has 19 params, lr_mult: 2, decay_mult: 0
group: BN scale/shift has 2 params, lr_mult: 1, decay_mult: 0
Freezing BatchNorm2D except the first one.
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1524577177097/work/aten/src/THC/THCCachingHostAllocator.cpp line=257 error=59 : device-side assert triggered
/opt/conda/conda-bld/pytorch_1524577177097/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [0,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1524577177097/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [1,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1524577177097/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [2,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1524577177097/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [3,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1524577177097/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [4,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1524577177097/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [5,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1524577177097/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [6,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1524577177097/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [7,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1524577177097/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [8,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1524577177097/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [9,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1524577177097/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [10,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1524577177097/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [11,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1524577177097/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [12,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1524577177097/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [13,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1524577177097/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [14,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1524577177097/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [15,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1524577177097/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [16,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1524577177097/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [17,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1524577177097/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [18,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1524577177097/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [19,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1524577177097/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [20,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1524577177097/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [21,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1524577177097/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [22,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1524577177097/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [23,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1524577177097/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [24,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1524577177097/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [25,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1524577177097/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [26,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1524577177097/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [27,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1524577177097/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [28,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1524577177097/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [29,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1524577177097/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [30,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1524577177097/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [31,0,0] Assertion t >= 0 && t < n_classes failed.
Exception NameError: "global name 'FileNotFoundError' is not defined" in <bound method _DataLoaderIter.__del__ of <torch.utils.data.dataloader._DataLoaderIter object at 0x7fa0f247f3d0>> ignored
Traceback (most recent call last):
  File "main.py", line 338, in <module>
    main()
  File "main.py", line 137, in main
    train(train_loader, model, criterion, optimizer, epoch)
  File "main.py", line 188, in train
    prec1, prec5 = accuracy(output.data, target, topk=(1,5))
  File "main.py", line 325, in accuracy
    _, pred = output.topk(maxk, 1, True, True)
RuntimeError: invalid argument 5: k not in range for dimension at /opt/conda/conda-bld/pytorch_1524577177097/work/aten/src/THC/generic/THCTensorTopK.cu:21
terminate called without an active exception
Aborted (core dumped)

2D tensor changed to 3D tensor, runtime error

y = self.con1x1(x) # 2D tensor
y = y.view((-1, 96, 16) + y.size()[2:]) #2D tensor changed to 3D tensor

error:
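
The error text is missing from this report, but the view call above hard-codes 16 as the segment dimension, so it can only succeed when the number of frames in the batch (batch size times num_segments) is a multiple of 16. The sketch below checks just that arithmetic; leading_dim_after_view is a hypothetical helper, and whether this is the reported error is an assumption.

```python
def leading_dim_after_view(batch_size, num_segments, hardcoded_segments=16):
    """Return the size of the -1 dimension in view((-1, 96, 16) + spatial dims).

    The input has batch_size * num_segments frames with 96 channels each, so the
    reshape is valid only if that frame count is divisible by the hard-coded 16.
    """
    total_frames = batch_size * num_segments
    if total_frames % hardcoded_segments != 0:
        raise ValueError(
            "%d frames cannot be viewed with a segment dimension of %d"
            % (total_frames, hardcoded_segments)
        )
    return total_frames // hardcoded_segments


print(leading_dim_after_view(batch_size=4, num_segments=16))
```

Replacing the literal 16 with the configured number of segments would keep the reshape consistent when num_segments changes.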

multi-GPUs will break down??

My environment is Ubuntu 16.04, Python 2.7, torch.version '0.4.0', torchvision.version '0.2.1'. I have 8 GPUs (1080Ti).
This repo works well with one GPU, but it breaks down with multiple GPUs after running 1 epoch. The error is as follows:
Freezing BatchNorm2D except the first one.
main.py:251: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
input_var = torch.autograd.Variable(input, volatile=True)
main.py:252: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
target_var = torch.autograd.Variable(target, volatile=True)
main.py:261: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
losses.update(loss.data[0], input.size(0))
main.py:262: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
top1.update(prec1[0], input.size(0))
main.py:263: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
top5.update(prec5[0], input.size(0))
Test: [0/46] Time 6.590 (6.590) Loss 4.9549 (4.9549) Prec@1 1.562 (1.562) Prec@5 9.375 (9.375)
THCudaCheck FAIL file=/pytorch/aten/src/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
  File "main.py", line 338, in <module>
    main()
  File "main.py", line 141, in main
    prec1 = validate(val_loader, model, criterion, (epoch + 1) * len(train_loader))
  File "main.py", line 255, in validate
    output = model(input_var)
  File "/home/yuyongbo/object_detection/venv-torch-latest/local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/yuyongbo/object_detection/venv-torch-latest/local/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 114, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/yuyongbo/object_detection/venv-torch-latest/local/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 124, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/yuyongbo/object_detection/venv-torch-latest/local/lib/python2.7/site-packages/torch/nn/parallel/parallel_apply.py", line 65, in parallel_apply
    raise output
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/aten/src/THC/generic/THCStorage.cu:58

about the gen_dataset_lists.py

Thanks for your work on ECO. I just downloaded your code and the UCF101 dataset.
When I follow the instructions, I get the error below.
Could you help me check this error?
Thanks~

================================ err ===================================

guo@guo-ubu:~/eng/ECO-pytorch-master$ python gen_dataset_lists.py ucf101 ~/eng/dataset/UCF-101/

Processing dataset ucf101:

Parse frames under folder /home/guo/eng/dataset/UCF-101/
0/101 videos parsed
Frame folder analysis done
Writing list files for training/testing
Traceback (most recent call last):
  File "gen_dataset_lists.py", line 100, in <module>
    lists = build_split_list(split_tp, f_info, i, shuffle)
  File "/home/guo/eng/ECO-pytorch-master/pyActionRecog/benchmark_db.py", line 58, in build_split_list
    train_rgb_list, train_flow_list = build_set_list(split[0])
  File "/home/guo/eng/ECO-pytorch-master/pyActionRecog/benchmark_db.py", line 48, in build_set_list
    frame_dir = frame_info[0][item[0]]
KeyError: 'v_ApplyEyeMakeup_g08_c01'
guo@guo-ubu:~/eng/ECO-pytorch-master$
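
The KeyError shows that the script looks up frame folders by video id such as v_ApplyEyeMakeup_g08_c01, while the folder passed on the command line appears to hold 101 entries, presumably the class folders of the raw dataset. If so, the frames may still need to be extracted into one folder per video. The sketch below, which rests on that assumption, reports which video ids have no matching folder directly under the frame root; missing_video_ids is a hypothetical helper.

```python
import os


def missing_video_ids(frame_root, video_ids):
    """Return the video ids with no same-named folder directly under frame_root."""
    found = {
        name
        for name in os.listdir(frame_root)
        if os.path.isdir(os.path.join(frame_root, name))
    }
    return [vid for vid in video_ids if vid not in found]


# Assumed usage:
# print(missing_video_ids("/home/guo/eng/dataset/UCF-101/",
#                         ["v_ApplyEyeMakeup_g08_c01"]))
```

If every id is reported missing, the directory layout does not match what the list generator expects.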

How to change num_segments?

Hello, I want to set num_segments to 16. I tried changing
args.num_segments=16, but it fails. Can you tell me what I should do?

Also, does the base_model take one segment (one input frame), or is it a four-segment network?
Thank you very much.
