sense-x / x-temporal

A general video understanding codebase from SenseTime X-Lab

License: MIT License

Shell 1.13% Python 93.70% Cuda 2.14% C 1.69% C++ 1.33%

x-temporal's Issues

Can this project continue to be updated

Your project is very well written, but it seems that it has not been updated since TSM. Although there is mmaction, the bottom layer of mmcv feels very unfriendly.

Can I run without GPU?

Running ./easy_setup.sh failed on my Mac. The error is:

Traceback (most recent call last):
  File "setup.py", line 28, in <module>
    torch.utils.cpp_extension.CUDA_HOME = _find_cuda_home()
  File "setup.py", line 20, in _find_cuda_home
    nvcc = subprocess.check_output(['which', 'nvcc']).decode().rstrip('\r\n')
  File "/opt/anaconda3/lib/python3.7/subprocess.py", line 395, in check_output
    **kwargs).stdout
  File "/opt/anaconda3/lib/python3.7/subprocess.py", line 487, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['which', 'nvcc']' returned non-zero exit status 1.

It seems that CUDA is required, but my Mac doesn't have a GPU.
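
The traceback shows setup.py calling `which nvcc` through subprocess.check_output, which raises CalledProcessError when nvcc is not on the PATH. A minimal sketch of a more tolerant lookup follows. The function name find_cuda_home and its parameters are hypothetical and not the project's code, and returning None only avoids the crash at lookup time: the CUDA extensions in the repository would presumably still need a CUDA toolchain to build.

```python
import os
import shutil
from typing import Mapping, Optional


def find_cuda_home(nvcc_name: str = "nvcc",
                   env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return a CUDA root directory, or None when no toolkit is found.

    Hypothetical helper: checks CUDA_HOME first, then looks for nvcc on PATH.
    """
    env = os.environ if env is None else env
    cuda_home = env.get("CUDA_HOME")
    if cuda_home:
        return cuda_home
    # shutil.which returns None instead of raising when the binary is missing.
    nvcc = shutil.which(nvcc_name, path=env.get("PATH"))
    if nvcc is None:
        return None
    # nvcc normally lives in <cuda_home>/bin/nvcc.
    return os.path.dirname(os.path.dirname(nvcc))


if __name__ == "__main__":
    home = find_cuda_home()
    print("CUDA found at", home if home else "(none found)")
```

A setup script could use the None result to print a clear message or to skip the CUDA extensions, rather than stopping with a traceback.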

Prediction demo using different pretrained models

Hi,

Thanks for sharing the code and model zoo. Could you tell me how to modify this code to output predictions on single video files (e.g. .mp4), as is done in the SlowFast code?

Thanks.

RuntimeError: DataLoader worker (pid(s) 1995171, 1996371) exited unexpectedly

When I set workers > 0, I get the error: 'RuntimeError: DataLoader worker (pid(s) 1995171, 1996371) exited unexpectedly'.

It seems to happen at 'batch = next(iterator)' in '.../X-Temporal/x_temporal/interface/temporal_helper.py'.
(Note: I was running the 'tin' experiment.)

`AverageMeter` is not correct

I ran into an issue similar to this one: the AverageMeter in utils.py, which is used in temporal_helper.py, is not correct.

Could you please fix it?
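
The issue does not say which part of AverageMeter is wrong. For comparison, here is a sketch of the commonly used form of such a meter, where each update is weighted by the number of samples it covers. This is an illustration under that assumption, not the project's implementation.

```python
class AverageMeter:
    """Tracks the latest value and a sample-weighted running average."""

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        self.val = 0.0
        self.sum = 0.0
        self.count = 0
        self.avg = 0.0

    def update(self, val: float, n: int = 1) -> None:
        # `val` is assumed to be the mean over `n` samples, so weight it by n.
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count if self.count else 0.0


if __name__ == "__main__":
    meter = AverageMeter()
    meter.update(0.5, n=8)  # full batch of 8 samples
    meter.update(1.0, n=2)  # smaller last batch of 2 samples
    print(meter.avg)
```

Weighting by n matters when the last batch is smaller than the others; an unweighted mean of per-batch values would overcount it.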

issues with pytorch version higher than 1.4.0

THCState_getCurrentStream seems to be deprecated in PyTorch 1.5.0 and later.
I read that it can be replaced with at::cuda::getCurrentCUDAStream, but my attempt failed. Any idea how to fix it?

The error output is below:

./X-Temporal/x_temporal/cuda_shift/src/shift_cuda.cpp: In function ‘at::Tensor shift_featuremap_cuda_backward(const at::Tensor&, const at::Tensor&, const at::Tensor&)’:
./X-Temporal/x_temporal/cuda_shift/src/shift_cuda.cpp:41:27: error: ‘THCState_getCurrentStream’ was not declared in this scope
     ShiftDataCudaBackward(THCState_getCurrentStream(state),
                           ^~~~~~~~~~~~~~~~~~~~~~~~~

How can you get the meta files: .tex

Thanks for your feedback. You can set trainer.no_partial_bn = True if the batch size is >= 6 on each GPU and retry; this will not affect the accuracy. That module has a bug with distributed training, and we will fix it soon.

Thanks for the reply, but it doesn't solve my problem.
When I set no_partial_bn = True, the log file stops at 'save_dir: checkpoint/' and is never updated again, and the GPU usage is still about 800~900M.

The only settings I changed in my YAML file are dataset related:

root_dir:
train:
  meta_file: /home/renb/project/action_recognition/X-Temporal/data_labels/sthv1/train_videofolder.txt
val:
  meta_file: /home/renb/project/action_recognition/X-Temporal/data_labels/sthv1/val_videofolder.txt
test:
  meta_file: /home/renb/project/action_recognition/X-Temporal/data_labels/sthv1/test_videofolder.txt

I am very confused about this.

Thanks again; I look forward to your suggestions.

Originally posted by @Amazingren in #1 (comment)

meta_file when training?

When I train on my own dataset, I run into this problem:

AttributeError: 'EasyDict' object has no attribute 'meta_file'

Could anybody help me? Thank you.
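
The error suggests that some section of the dataset config has no meta_file entry; the issue does not say which one. A small sketch of a pre-flight check on the parsed config follows. The helper name is hypothetical, the split names are taken from the YAML configs quoted elsewhere on this page, and whether the trainer really needs all three splits is an assumption.

```python
from typing import Any, Dict, List

# Split names as they appear in the YAML configs quoted on this page.
REQUIRED_SPLITS = ("train", "val", "test")


def missing_meta_files(dataset_cfg: Dict[str, Any]) -> List[str]:
    """Return dotted paths of splits that have no usable meta_file entry."""
    missing = []
    for split in REQUIRED_SPLITS:
        section = dataset_cfg.get(split)
        # A split written as `train:` with nothing under it parses to None.
        if not isinstance(section, dict) or not section.get("meta_file"):
            missing.append(f"dataset.{split}.meta_file")
    return missing


if __name__ == "__main__":
    dataset = {
        "train": {"meta_file": "/my_path/train_videofolder.txt"},
        "val": None,  # key present but empty; "test" is absent entirely
    }
    print(missing_meta_files(dataset))
```

Since EasyDict is a dict subclass, the same check works on the object the project loads, assuming the dataset section is passed in.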

About the multi-label

Hi. When I run testing, I cannot find the output file, and I don't see how to output multi-label predictions. In the multi-label setting, how do I decide whether my predicted labels are correct? Thanks.
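
One common way to score multi-label predictions is mean average precision (mAP), the metric that another issue on this page reports. The sketch below computes it in plain Python from per-class scores and 0/1 labels. It is a generic illustration; whether the project computes mAP in exactly this way is not confirmed by this page.

```python
from typing import List, Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AP for one class: mean of the precision at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(scores: List[List[float]],
                           labels: List[List[int]]) -> float:
    """Inputs are [num_videos][num_classes]; classes with no positive are skipped."""
    num_classes = len(scores[0])
    aps = []
    for c in range(num_classes):
        col_labels = [row[c] for row in labels]
        if sum(col_labels) == 0:
            continue
        col_scores = [row[c] for row in scores]
        aps.append(average_precision(col_scores, col_labels))
    return sum(aps) / len(aps) if aps else 0.0


if __name__ == "__main__":
    scores = [[0.9, 0.2], [0.8, 0.7], [0.1, 0.6]]  # 3 videos, 2 classes
    labels = [[1, 0], [0, 1], [1, 1]]
    print(mean_average_precision(scores, labels))
```

Because mAP ranks scores per class, it needs no decision threshold. To output hard labels instead, a threshold on the per-class scores would have to be chosen separately.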

evaluate on test dataset when training?

X-Temporal/x_temporal/interface/temporal_helper.py

Line 386 in 5faa5f4

test_loader = self.data_loaders['test']

Here the data loader is always set to "test".
I'm not sure whether this is a bug or not. Maybe there is something I misunderstand?

how to get the TIN pretrain model in Kinetics-600?

How can I get the TIN model pretrained on Kinetics-600? I would like to use the pretrained model. Thanks.

Feature extraction

Hi, thanks for the great codebase.

Could you kindly provide the code to extract features from custom videos using pre-trained models?

Found Two fresh BUGs and a Solution

Thank you for your beautiful code.
I ran this model on the Kinetics dataset; the video format is .mp4.
1st ERROR: RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
Reason: when training on multiple GPUs, the tensor is split across different memory regions, so it is no longer contiguous.
Solution: in /.. /utils.py line 42, add contiguous().view(1,-1), the same as on line 46.

Training then worked, but at iteration 40 I got a second error.
2nd ERROR: decord._ffi.base.DECORDError: [14:51:44] /io/decord/src/video/video_reader.cc:125: Check failed: st_nb >= 0 (-1381258232 vs. 0) ERROR cannot find video stream with wanted index: -1
There is also a warning:
UserWarning: resource_tracker: There appear to be 122 leaked semaphore objects to clean up at shutdown warnings.warn('resource_tracker: There appear to be %d '
Although I tried many approaches, I still couldn't solve the problem.

Could you let me know what the problem is? Thanks a lot.

In linear_sampler(data, bias), should the shape of bias be N*T*G?

In linear_sampler(), the shape of bias is N*T*G. Why is it not N*G?

./train.sh for TSM stop at the first log infor. : Freezing BatchNorm2D except...

Thanks for your nicely styled codebase.
However, when I try to train TSM with it, I hit a problem that stops training.
(1) The log file stops at: 2020-04-10 xxxx094-models.py#177: Freezing BatchNorm2D except the first one. I waited for 10 minutes, but there were no further updates.
(2) When I check GPU usage with 'gpustat', it shows only about 800M in use on each GPU (I use 8 in total).

I am sorry to bother you; as a beginner, I would appreciate it if you could point me in the right direction.

When training, the log file stops at 'save_dir: checkpoint/' with no further updates

Thank you for your nice code.
I set the following values in default.yaml:

gpus: 4
dataset:
  img_prefix: '{:05d}.jpg'
  video_source: True  # because I use video as input data
  modality: Flow
  train:
    meta_file: /my_path/train_videofolder.txt
trainer:
  no_partial_bn: True

My training script, train.sh, is:

T=`date +%m%d%H%M`
ROOT=../..
cfg=default.yaml
export PYTHONPATH=$ROOT:$PYTHONPATH
CUDA_VISIBLE_DEVICES=4,5,6,7 python $ROOT/x_temporal/train.py --config $cfg | tee log.train.$T

When I run train.sh, the log file stops at 'save_dir: checkpoint/' and is never updated again.
Could you let me know what the problem is? Thanks a lot.

About the model result

The previous issue was closed. #14 (comment)

Following your suggestion, I used multiple crops and a larger input, but I still cannot reproduce the results on Multi-Moments in Time with the TIN and SlowFast models (TIN: 57 vs. 62 in the report). However, I get a slightly better result with the TSN model (59.7 vs. 58.9). Do you have any idea how to solve this?

version: 1.0 
config:
  gpus: 4
  seed: 2020
  dataset:
    workers: 4
    num_class: 313
    num_segments: 16
    batch_size: 8
    img_prefix: '{:05d}.jpg'
    video_source: True
    dense_sample: False
    modality: RGB
    flow_prefix: ''
    root_dir: ""
    flip: False
    input_mean: [0.485, 0.456, 0.406]
    input_std: [0.229, 0.224 ,0.225]
    crop_size: 224
    scale_size: 256
    train:
      meta_file: /path
    val:
      meta_file: /workdir/wwn/Multi_Moments_in_Time/mit-val.txt
    test:
      meta_file: /workdir/wwn/Multi_Moments_in_Time/mit-val.txt
    multi_class: True

  net:
    arch: resnet50
    model_type: 2D
    tin: True
    shift_div: 4
    consensus_type: avg
    dropout: 0.8
    img_feature_dim: 256
    pretrain: True # imagenet pretrain for 2D network


  trainer:
    print_freq: 20
    eval_freq: 1
    epochs: 35
    start_epoch: 0
    loss_type: bce
    no_partial_bn: True
    clip_gradient: 20
    lr_scheduler:
      warmup_epochs: 1
      warmup_type: linear
      type: CosineAnnealingLR
      kwargs:
        T_max: 30
    optimizer:
      type: SGD
      kwargs:
        lr: 0.02
        momentum: 0.9
        weight_decay: 0.0005
        nesterov: True
  
  evaluate:
    spatial_crops: 1
    temporal_samples: 1


  saver:
    #save_dir: 'checkpoint/'
    #pretrain_model: '/path'
    resume_model: /home/hadoop-mtcv/cephfs/data/wangwanneng/X-Temporal-master/X-Temporal-master/pretrained/tin_mit_16.pth.tar

but the testing result is 14.4 mAP.

I think there may be something wrong in the model configuration, because when testing the model there are missing keys:

missing keys are as follows:
    module.base_model.layer3.4.bn1.num_batches_tracked
    module.base_model.layer2.1.bn2.num_batches_tracked
    module.base_model.layer3.2.bn3.num_batches_tracked
    module.base_model.layer3.5.bn1.num_batches_tracked
    module.base_model.bn1.num_batches_tracked
    module.base_model.layer4.2.bn3.num_batches_tracked
    module.base_model.layer4.1.bn2.num_batches_tracked
    module.base_model.layer1.2.bn2.num_batches_tracked
    module.base_model.layer2.2.bn1.num_batches_tracked
    module.base_model.layer3.5.bn2.num_batches_tracked
    module.base_model.layer4.2.bn2.num_batches_tracked
    module.base_model.layer4.0.downsample.1.num_batches_tracked
    module.base_model.layer1.0.bn3.num_batches_tracked
    module.base_model.layer3.0.downsample.1.num_batches_tracked
    module.base_model.layer3.3.bn3.num_batches_tracked
    module.base_model.layer3.3.bn2.num_batches_tracked
    module.base_model.layer4.0.bn1.num_batches_tracked
    module.base_model.layer3.2.bn1.num_batches_tracked
    module.base_model.layer2.3.bn2.num_batches_tracked
    module.base_model.layer1.0.bn2.num_batches_tracked
    module.base_model.layer4.1.bn1.num_batches_tracked
    module.base_model.layer2.1.bn3.num_batches_tracked
    module.base_model.layer2.0.downsample.1.num_batches_tracked
    module.base_model.layer3.4.bn3.num_batches_tracked
    module.base_model.layer1.0.downsample.1.num_batches_tracked
    module.base_model.layer1.2.bn1.num_batches_tracked
    module.base_model.layer4.1.bn3.num_batches_tracked
    module.base_model.layer4.0.bn3.num_batches_tracked
    module.base_model.layer3.1.bn1.num_batches_tracked
    module.base_model.layer3.3.bn1.num_batches_tracked
    module.base_model.layer1.0.bn1.num_batches_tracked
    module.base_model.layer1.1.bn3.num_batches_tracked
    module.base_model.layer3.0.bn2.num_batches_tracked
    module.base_model.layer3.0.bn3.num_batches_tracked
    module.base_model.layer2.1.bn1.num_batches_tracked
    module.base_model.layer1.2.bn3.num_batches_tracked
    module.base_model.layer2.3.bn1.num_batches_tracked
    module.base_model.layer3.1.bn2.num_batches_tracked
    module.base_model.layer1.1.bn1.num_batches_tracked
    module.base_model.layer2.0.bn3.num_batches_tracked
    module.base_model.layer2.0.bn2.num_batches_tracked
    module.base_model.layer1.1.bn2.num_batches_tracked
    module.base_model.layer3.4.bn2.num_batches_tracked
    module.base_model.layer4.0.bn2.num_batches_tracked
    module.base_model.layer3.5.bn3.num_batches_tracked
    module.base_model.layer2.2.bn2.num_batches_tracked
    module.base_model.layer3.1.bn3.num_batches_tracked
    module.base_model.layer3.2.bn2.num_batches_tracked
    module.base_model.layer2.3.bn3.num_batches_tracked
    module.base_model.layer3.0.bn1.num_batches_tracked
    module.base_model.layer4.2.bn1.num_batches_tracked
    module.base_model.layer2.2.bn3.num_batches_tracked
    module.base_model.layer2.0.bn1.num_batches_tracked

So could you share the config file you used when testing on the MMit dataset?
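
Every key in the list above ends in num_batches_tracked, a BatchNorm buffer that, as far as I know, PyTorch added in version 0.4.1; checkpoints saved with earlier versions do not contain it. A quick way to check whether a list of missing keys contains anything else is sketched below. The helper name is hypothetical, it only inspects key names without loading a checkpoint, and it does not by itself explain the low mAP.

```python
from typing import Iterable, List

BN_COUNTER_SUFFIX = ".num_batches_tracked"


def significant_missing_keys(missing_keys: Iterable[str]) -> List[str]:
    """Drop BatchNorm step counters; what remains are other missing entries."""
    return [k for k in missing_keys if not k.endswith(BN_COUNTER_SUFFIX)]


if __name__ == "__main__":
    # Two of the keys reported in the issue above.
    reported = [
        "module.base_model.bn1.num_batches_tracked",
        "module.base_model.layer1.0.bn1.num_batches_tracked",
    ]
    print(significant_missing_keys(reported))
```

If the filtered list is empty, as it would be for the keys reported here, the weights and running statistics were all found, so the cause of the low score is more likely elsewhere, for example in the dataset or evaluation settings.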