x-temporal's People

Contributors

deepcs233, liuyuisanai, sense-x, yifeitao


x-temporal's Issues

Prediction demo using different pretrained models

Hi,

Thanks for sharing the code and model zoo. I was wondering if you could direct me on how I should modify this code to output predictions on single video files (e.g., in .mp4 format), as is done in the SlowFast code here.

Thanks.

How can you get the meta files: .tex

Thanks for your feedback. You can set trainer.no_partial_bn = True if the batch size is >= 6 on each GPU and retry; this will not affect accuracy. That module has a bug with distributed training, which we will fix soon.

Thanks for the reply, but it doesn't solve my problem.
When I set no_partial_bn = True, the log file stops at 'save_dir: checkpoint/' with no further updates, and GPU memory usage is still only about 800~900 MB.

The changed settings in my YAML file are only dataset related:
root_dir:
train:
  meta_file: /home/renb/project/action_recognition/X-Temporal/data_labels/sthv1/train_videofolder.txt
val:
  meta_file: /home/renb/project/action_recognition/X-Temporal/data_labels/sthv1/val_videofolder.txt
test:
  meta_file: /home/renb/project/action_recognition/X-Temporal/data_labels/sthv1/test_videofolder.txt
Very confused about this.

Thanks again; waiting for your suggestion.

Originally posted by @Amazingren in #1 (comment)
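
For context on the no_partial_bn flag discussed above: partial BN conventionally means freezing every BatchNorm layer except the first while fine-tuning, as introduced in TSN. A minimal sketch of that convention (an assumption about this repo's exact behavior, not its code):

    import torch.nn as nn

    def freeze_partial_bn(model: nn.Module) -> None:
        """Freeze all BatchNorm2d layers except the first (TSN-style partial BN).
        no_partial_bn: True skips this freezing, which is safe once the per-GPU
        batch size is large enough (>= 6 above) for stable BN statistics."""
        seen = 0
        for m in model.modules():
            if isinstance(m, nn.BatchNorm2d):
                seen += 1
                if seen > 1:
                    m.eval()                        # stop updating running stats
                    m.weight.requires_grad = False  # freeze affine scale
                    m.bias.requires_grad = False    # freeze affine shift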

About the model result

The previous issue was closed. #14 (comment)

Following your suggestion, I used multi-crop testing and a larger input, but I still cannot reproduce the results on Multi-Moments in Time with the TIN and SlowFast models (TIN: 57 vs. 62 in the report). However, I do get a slightly better result with the TSN model (59.7 vs. 58.9). Do you have any idea how to solve this?

Pretrained model config file

Hi, I am trying to test the model-zoo model: TIN trained on the MMiT dataset with a ResNet-50 backbone. I changed default.yaml in the ./experiments/tin folder to the following:

version: 1.0 
config:
  gpus: 4
  seed: 2020
  dataset:
    workers: 4
    num_class: 313
    num_segments: 16
    batch_size: 8
    img_prefix: '{:05d}.jpg'
    video_source: True
    dense_sample: False
    modality: RGB
    flow_prefix: ''
    root_dir: ""
    flip: False
    input_mean: [0.485, 0.456, 0.406]
    input_std: [0.229, 0.224, 0.225]
    crop_size: 224
    scale_size: 256
    train:
      meta_file: /path
    val:
      meta_file: /workdir/wwn/Multi_Moments_in_Time/mit-val.txt
    test:
      meta_file: /workdir/wwn/Multi_Moments_in_Time/mit-val.txt
    multi_class: True

  net:
    arch: resnet50
    model_type: 2D
    tin: True
    shift_div: 4
    consensus_type: avg
    dropout: 0.8
    img_feature_dim: 256
    pretrain: True # imagenet pretrain for 2D network


  trainer:
    print_freq: 20
    eval_freq: 1
    epochs: 35
    start_epoch: 0
    loss_type: bce
    no_partial_bn: True
    clip_gradient: 20
    lr_scheduler:
      warmup_epochs: 1
      warmup_type: linear
      type: CosineAnnealingLR
      kwargs:
        T_max: 30
    optimizer:
      type: SGD
      kwargs:
        lr: 0.02
        momentum: 0.9
        weight_decay: 0.0005
        nesterov: True
  
  evaluate:
    spatial_crops: 1
    temporal_samples: 1


  saver:
    #save_dir: 'checkpoint/'
    #pretrain_model: '/path'
    resume_model: /home/hadoop-mtcv/cephfs/data/wangwanneng/X-Temporal-master/X-Temporal-master/pretrained/tin_mit_16.pth.tar

but the testing result is only 14.4 mAP.

I think maybe there is something wrong in the configuration of the model, because when loading it for testing there are missing keys, as follows:
    module.base_model.layer3.4.bn1.num_batches_tracked
    module.base_model.layer2.1.bn2.num_batches_tracked
    module.base_model.layer3.2.bn3.num_batches_tracked
    module.base_model.layer3.5.bn1.num_batches_tracked
    module.base_model.bn1.num_batches_tracked
    module.base_model.layer4.2.bn3.num_batches_tracked
    module.base_model.layer4.1.bn2.num_batches_tracked
    module.base_model.layer1.2.bn2.num_batches_tracked
    module.base_model.layer2.2.bn1.num_batches_tracked
    module.base_model.layer3.5.bn2.num_batches_tracked
    module.base_model.layer4.2.bn2.num_batches_tracked
    module.base_model.layer4.0.downsample.1.num_batches_tracked
    module.base_model.layer1.0.bn3.num_batches_tracked
    module.base_model.layer3.0.downsample.1.num_batches_tracked
    module.base_model.layer3.3.bn3.num_batches_tracked
    module.base_model.layer3.3.bn2.num_batches_tracked
    module.base_model.layer4.0.bn1.num_batches_tracked
    module.base_model.layer3.2.bn1.num_batches_tracked
    module.base_model.layer2.3.bn2.num_batches_tracked
    module.base_model.layer1.0.bn2.num_batches_tracked
    module.base_model.layer4.1.bn1.num_batches_tracked
    module.base_model.layer2.1.bn3.num_batches_tracked
    module.base_model.layer2.0.downsample.1.num_batches_tracked
    module.base_model.layer3.4.bn3.num_batches_tracked
    module.base_model.layer1.0.downsample.1.num_batches_tracked
    module.base_model.layer1.2.bn1.num_batches_tracked
    module.base_model.layer4.1.bn3.num_batches_tracked
    module.base_model.layer4.0.bn3.num_batches_tracked
    module.base_model.layer3.1.bn1.num_batches_tracked
    module.base_model.layer3.3.bn1.num_batches_tracked
    module.base_model.layer1.0.bn1.num_batches_tracked
    module.base_model.layer1.1.bn3.num_batches_tracked
    module.base_model.layer3.0.bn2.num_batches_tracked
    module.base_model.layer3.0.bn3.num_batches_tracked
    module.base_model.layer2.1.bn1.num_batches_tracked
    module.base_model.layer1.2.bn3.num_batches_tracked
    module.base_model.layer2.3.bn1.num_batches_tracked
    module.base_model.layer3.1.bn2.num_batches_tracked
    module.base_model.layer1.1.bn1.num_batches_tracked
    module.base_model.layer2.0.bn3.num_batches_tracked
    module.base_model.layer2.0.bn2.num_batches_tracked
    module.base_model.layer1.1.bn2.num_batches_tracked
    module.base_model.layer3.4.bn2.num_batches_tracked
    module.base_model.layer4.0.bn2.num_batches_tracked
    module.base_model.layer3.5.bn3.num_batches_tracked
    module.base_model.layer2.2.bn2.num_batches_tracked
    module.base_model.layer3.1.bn3.num_batches_tracked
    module.base_model.layer3.2.bn2.num_batches_tracked
    module.base_model.layer2.3.bn3.num_batches_tracked
    module.base_model.layer3.0.bn1.num_batches_tracked
    module.base_model.layer4.2.bn1.num_batches_tracked
    module.base_model.layer2.2.bn3.num_batches_tracked
    module.base_model.layer2.0.bn1.num_batches_tracked

So could you share the config file you used when testing on the MMiT dataset?
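
One hedged observation on those keys: *.num_batches_tracked buffers were added to BatchNorm in PyTorch 0.4.1, so a checkpoint saved with an older version reports exactly these missing keys; they are benign when loaded with strict=False and are unlikely to explain 14.4 mAP on their own. A generic loading sketch (torchvision's ResNet-50 as a stand-in for the repo's model; the prefix handling is an assumption based on the key names above):

    import torch
    import torchvision

    model = torchvision.models.resnet50()
    ckpt = torch.load("tin_mit_16.pth.tar", map_location="cpu")  # checkpoint from the config above
    state = ckpt.get("state_dict", ckpt)
    # Strip the DataParallel/backbone prefix seen in the missing-key names.
    state = {k.replace("module.base_model.", "", 1): v for k, v in state.items()}
    missing, unexpected = model.load_state_dict(state, strict=False)
    print([k for k in missing if k.endswith("num_batches_tracked")])  # harmless if only these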

Found Two fresh BUGs and a Solution

Thank you for your beautiful code.
I ran this model on the Kinetics dataset, with videos in .mp4 format.
1st ERROR: RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
Reason: when training on multiple GPUs, the tensor involved is not contiguous in memory, so .view() cannot flatten it.
Solution: in /../utils.py line 42, add .contiguous() before .view(1, -1), and do the same on line 46.
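
A minimal reproduction of the error and of the suggested fix (generic PyTorch, not the repo's utils.py):

    import torch

    x = torch.randn(4, 8).t()            # transposing makes the tensor non-contiguous
    # x.view(1, -1)                      # raises the RuntimeError quoted above
    flat = x.contiguous().view(1, -1)    # the fix: copy into contiguous memory first
    flat2 = x.reshape(1, -1)             # equivalent alternative suggested by the error message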

Training works, but at iteration 40 it hits an error.
2nd ERROR: decord._ffi.base.DECORDError: [14:51:44] /io/decord/src/video/video_reader.cc:125: Check failed: st_nb >= 0 (-1381258232 vs. 0) ERROR cannot find video stream with wanted index: -1
There is also: UserWarning: resource_tracker: There appear to be 122 leaked semaphore objects to clean up at shutdown
Although I tried many ways, I still couldn't solve the problem.

Would you kindly let me know what the problem is? Thanks a lot.
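
That decord failure usually means one file in the dataset has no readable video stream (a corrupted or audio-only .mp4). A hypothetical helper, not part of the repo, to scan for such files before training:

    import decord
    from decord._ffi.base import DECORDError  # module path taken from the traceback above

    def find_bad_videos(paths):
        """Return the subset of video paths that decord cannot open."""
        bad = []
        for p in paths:
            try:
                decord.VideoReader(p)
            except DECORDError:
                bad.append(p)
        return bad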

Feature extraction

Hi, thanks for the great codebase.

Could you kindly provide the code to extract features from custom videos using pre-trained models?
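
In the meantime, a generic pattern (a plain-PyTorch sketch, not this repo's API) is to register a forward hook on the backbone's pooling layer and run preprocessed frames through the model:

    import torch
    import torchvision

    model = torchvision.models.resnet50(weights="IMAGENET1K_V1").eval()
    features = {}

    def grab(module, inputs, output):
        features["pool"] = output.flatten(1)   # pooled features, shape (N, 2048)

    model.avgpool.register_forward_hook(grab)
    with torch.no_grad():
        model(torch.randn(1, 3, 224, 224))     # stand-in for a batch of preprocessed frames
    print(features["pool"].shape)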

Can this project continue to be updated

Your project is very well written, but it seems it has not been updated since TSM. There is mmaction, but the mmcv layer underneath feels very unfriendly.

./train.sh for TSM stops at the first log line: Freezing BatchNorm2D except...

Thanks for your nicely styled codebase.
However, when I try to train TSM with it, a problem stops me from training:
(1) The log file stops at: 2020-04-10 xxxx094-models.py#177: Freezing BatchNorm2D except the first one, and I waited 10 minutes with no further updates.
(2) When I use 'gpustat' to check GPU usage, it shows only about 800 MB in use on each GPU (I use 8 in total).

I am sorry for disturbing you; as a newcomer I would also appreciate it if you could shed some light on this.

Can I run without GPU?

I failed to run ./easy_setup.sh on my Mac. The error is:

Traceback (most recent call last):
  File "setup.py", line 28, in <module>
    torch.utils.cpp_extension.CUDA_HOME = _find_cuda_home()
  File "setup.py", line 20, in _find_cuda_home
    nvcc = subprocess.check_output(['which', 'nvcc']).decode().rstrip('\r\n')
  File "/opt/anaconda3/lib/python3.7/subprocess.py", line 395, in check_output
    **kwargs).stdout
  File "/opt/anaconda3/lib/python3.7/subprocess.py", line 487, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['which', 'nvcc']' returned non-zero exit status 1.

It seems that CUDA is necessary, but my Mac doesn't have an NVIDIA GPU.
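
If the CUDA extension is only needed for the temporal-shift op (an assumption about this repo), one workaround is to make the build conditional in setup.py; the kernel filename below is a guess, only shift_cuda.cpp appears elsewhere on this page:

    import shutil

    from setuptools import setup
    from torch.utils.cpp_extension import BuildExtension, CUDAExtension

    ext_modules = []
    if shutil.which("nvcc"):  # only build when a CUDA toolchain is present
        ext_modules.append(CUDAExtension(
            "cuda_shift",
            ["src/shift_cuda.cpp", "src/shift_cuda_kernel.cu"],  # .cu name is hypothetical
        ))

    setup(name="cuda_shift",
          ext_modules=ext_modules,
          cmdclass={"build_ext": BuildExtension})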

forkserver

torch.multiprocessing.set_start_method("forkserver") might not work well for Windows users.
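
A possible guard (a sketch, not the repo's current code): fall back to "spawn" on Windows, where "forkserver" is unavailable:

    import sys
    import torch.multiprocessing as mp

    # "forkserver" only exists on Unix; "spawn" works everywhere.
    method = "forkserver" if sys.platform != "win32" else "spawn"
    mp.set_start_method(method, force=True)  # force avoids 'context has already been set'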

meta_file when training?

When I train on my own dataset, I run into this problem:

AttributeError: 'EasyDict' object has no attribute 'meta_file'

Could anybody help me? Thank you.
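
A hedged guess at the cause: the dataset section of the YAML is missing a meta_file entry under the relevant train/val/test block (compare the configs quoted elsewhere on this page). A minimal reproduction, assuming the repo parses its YAML into an EasyDict:

    from easydict import EasyDict

    cfg = EasyDict({"dataset": {"train": {}}})  # train block present, meta_file missing
    print(cfg.dataset.train.meta_file)          # AttributeError: 'EasyDict' object has no attribute 'meta_file'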

When training, the log file stops at 'save_dir: checkpoint/' with no further updates

Thank you for your nice code.
When I set the following values in default.yaml:

gpus: 4
dataset:
  img_prefix: '{:05d}.jpg'
  video_source: True  # because I use video as input data
  modality: Flow
  train:
    meta_file: /my_path/train_videofolder.txt
trainer:
  no_partial_bn: True

My train.sh script is:

T=`date +%m%d%H%M`
ROOT=../..
cfg=default.yaml
export PYTHONPATH=$ROOT:$PYTHONPATH
CUDA_VISIBLE_DEVICES=4,5,6,7 python $ROOT/x_temporal/train.py --config $cfg | tee log.train.$T

When I run train.sh, the log file stops at 'save_dir: checkpoint/' with no further updates.
Would you kindly let me know what the problem is? Thanks a lot.

Issues with PyTorch versions higher than 1.4.0

THCState_getCurrentStream seems to be deprecated in PyTorch 1.5.0 and higher.
I read about replacing it with at::cuda::getCurrentCUDAStream(), but that failed. Any idea how to fix it?

Error information below:

./X-Temporal/x_temporal/cuda_shift/src/shift_cuda.cpp: In function ‘at::Tensor shift_featuremap_cuda_backward(const at::Tensor&, const at::Tensor&, const at::Tensor&)’:
./X-Temporal/x_temporal/cuda_shift/src/shift_cuda.cpp:41:27: error: ‘THCState_getCurrentStream’ was not declared in this scope
     ShiftDataCudaBackward(THCState_getCurrentStream(state),
                           ^~~~~~~~~~~~~~~~~~~~~~~~~

About the multi-label

Hi. When I run testing, I cannot find the output file, and I don't see how to output multi-label predictions. In the multi-label setting, how do I decide which predicted labels count as correct? Thanks.
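
Since the configs on this page train multi-label models with loss_type: bce, a common way to turn scores into predictions (the 0.5 threshold is a conventional choice, not necessarily the repo's) is per-class sigmoid thresholding:

    import torch

    logits = torch.tensor([[2.0, -1.0, -0.3]])  # one clip, three classes
    probs = torch.sigmoid(logits)               # independent per-class probabilities under BCE
    preds = (probs > 0.5).int()                 # multi-hot prediction: tensor([[1, 0, 0]])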
