
STMask

This is the code release for our CVPR 2021 paper: "STMask: Spatial Feature Calibration and Temporal Fusion for Effective One-stage Video Instance Segmentation".

News

  • [27/06/2021] Important issue: in the previous results on the YTVIS2021 and OVIS datasets, we mistakenly normalized the bounding boxes passed to the bbox_feat_extractor() function of track_to_segment_head.py. The bounding boxes in bbox_feat_extractor() should not be normalized (see the sketch after this list). We have updated the results and trained models for YTVIS2021 and OVIS. We apologize for our negligence.
  • [12/06/2021] Updated the solution for the error in deform_conv_cuda.cu
  • [22/04/2021] Added experimental results on the YTVIS2021 and OVIS datasets
  • [14/04/2021] Released code on GitHub and paper on arXiv
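
The sketch below is illustrative only and does not reproduce the repository's bbox_feat_extractor(); it uses torchvision.ops.roi_align as a stand-in to show why the boxes used for cropping box features must stay in feature-map (pixel) coordinates rather than being normalized to [0, 1]:

# Illustrative sketch only -- not the repository's bbox_feat_extractor().
# torchvision.ops.roi_align stands in for the real feature cropping here.
import torch
from torchvision.ops import roi_align

feat = torch.randn(1, 256, 24, 40)                       # an FPN feature map (B, C, H, W)
box_pixels = torch.tensor([[0., 4., 6., 30., 20.]])      # (batch_idx, x1, y1, x2, y2) in feature-map pixels
box_normed = box_pixels.clone()
box_normed[:, 1:] /= torch.tensor([40., 24., 40., 24.])  # normalized to [0, 1] -- the mistaken variant

ok = roi_align(feat, box_pixels, output_size=(7, 7))     # correct: pixel coordinates
bad = roi_align(feat, box_normed, output_size=(7, 7))    # wrong: samples a tiny, misplaced region
print(ok.shape, bad.shape)                               # both (1, 256, 7, 7), but 'bad' crops the wrong area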

Installation

  • Clone this repository and enter it:

    git clone https://github.com/MinghanLi/STMask.git
    cd STMask
  • Set up the environment using one of the following methods:

    • Using Anaconda
      • Run conda env create -f environment.yml
      • conda activate STMask-env
    • Manually with pip
      • Set up a Python3 environment.
      • Install PyTorch 1.0.1 (or higher) and TorchVision.
      • Install some other packages:
        # Cython needs to be installed before pycocotools
        pip install cython
        pip install opencv-python pillow pycocotools matplotlib 
  • Install mmcv and mmdet

    • Install mmcv or mmcv-full from here according to your CUDA and PyTorch versions. In our case the CUDA and PyTorch versions are 10.1 and 1.5.0, respectively.
      pip install mmcv-full==1.1.2 -f https://download.openmmlab.com/mmcv/dist/cu101/torch1.5.0/index.html
    • Install cocoapi and a customized COCO API for the YouTubeVIS dataset from here
      pip install "git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI"
      git clone https://github.com/youtubevos/cocoapi
      cd cocoapi/PythonAPI
      # To compile and install locally 
      python setup.py build_ext --inplace
      # To install library to Python site-packages 
      python setup.py build_ext install
  • Install spatial-correlation-sampler (a quick usage check is sketched after this list)

    pip install spatial-correlation-sampler
  • Compile the DCNv2 code (see Installation)

    • Download code for deformable convolutional layers from here
      git clone https://github.com/CharlesShang/DCNv2.git
      cd DCNv2
      python setup.py build develop
  • Modify mmcv/ops/deform_conv.py to handle deformable convolution kernels with different height and width (e.g., 3×5) in FCB(ali) or FCB(ada)

    • Open the file deform_conv.py
      vim /your_conda_env_path/mmcv/ops/deform_conv.py
    • Replace padW=ctx.padding[1], padH=ctx.padding[0] with padW=ctx.padding[0], padH=ctx.padding[1], taking Lines 81-89 as an example:
      ext_module.deform_conv_forward(
              input,
              weight,
              offset,
              output,
              ctx.bufs_[0],
              ctx.bufs_[1],
              kW=weight.size(3),
              kH=weight.size(2),
              dW=ctx.stride[1],
              dH=ctx.stride[0],
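              # The two lines below are the fix: padW now reads ctx.padding[0] and padH reads ctx.padding[1] (swapped relative to the original mmcv code)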
              padW=ctx.padding[0],
              padH=ctx.padding[1],
              dilationW=ctx.dilation[1],
              dilationH=ctx.dilation[0],
              group=ctx.groups,
              deformable_group=ctx.deform_groups,
              im2col_step=cur_im2col_step)
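
After completing the steps above, a quick sanity check can confirm that the key packages are importable and that the correlation op works. This is a minimal sketch; it only assumes the packages installed above and mirrors the reshape pattern used in track_to_segment_head.py:

# Post-installation sanity check (illustrative sketch).
import torch
import mmcv
from spatial_correlation_sampler import SpatialCorrelationSampler

print("torch:", torch.__version__, "CUDA available:", torch.cuda.is_available())
print("mmcv:", mmcv.__version__)

if torch.cuda.is_available():
    b, c, h, w = 1, 64, 24, 40
    x1 = torch.randn(b, c, h, w, device="cuda")
    x2 = torch.randn(b, c, h, w, device="cuda")
    corr = SpatialCorrelationSampler(kernel_size=1, patch_size=11, stride=1,
                                     padding=0, dilation=1, dilation_patch=1)
    out = corr(x1, x2)                      # (b, patch, patch, h, w)
    out = out.view(b, 11 * 11, h, w) / c    # same reshape/normalization pattern as track_to_segment_head.py
    print("correlation output:", out.shape) # torch.Size([1, 121, 24, 40])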

Dataset

  • If you'd like to train STMask, please download the datasets from the official websites: YTVIS2019, YTVIS2021 and OVIS.

Evaluation

The input size on all VIS benchmarks here is 360×640.

Quantitative Results on YTVIS2019 (trained with 12 epochs)

Here are our STMask models (released in April 2021) along with their FPS on a 2080Ti and mAP on the valid set, where mAP and mAP* are obtained with cross-class fast NMS and fast NMS, respectively. Note that FCB(ali) and FCB(ada) are only applied to the classification branch.

| Backbone | FCA | FCB | TF | FPS | mAP | mAP* | Weights |
|---|---|---|---|---|---|---|---|
| R50-DCN-FPN | FCA | - | TF | 29.3 | 32.6 | 33.4 | STMask_plus_resnet50.pth |
| R50-DCN-FPN | FCA | FCB(ali) | TF | 27.8 | - | 32.1 | STMask_plus_resnet50_ali.pth |
| R50-DCN-FPN | FCA | FCB(ada) | TF | 28.6 | 32.8 | 33.0 | STMask_plus_resnet50_ada.pth |
| R101-DCN-FPN | FCA | - | TF | 24.5 | 36.0 | 36.3 | STMask_plus_base.pth |
| R101-DCN-FPN | FCA | FCB(ali) | TF | 22.1 | 36.3 | 37.1 | STMask_plus_base_ali.pth |
| R101-DCN-FPN | FCA | FCB(ada) | TF | 23.4 | 36.8 | 37.9 | STMask_plus_base_ada.pth |

Quantitative Results on YTVIS2021 (trained with 12 epochs)

| Backbone | FCA | FCB | TF | mAP* | Weights | Results |
|---|---|---|---|---|---|---|
| R50-DCN-FPN | FCA | - | TF | 30.6 | STMask_plus_resnet50_YTVIS2021.pth | - |
| R50-DCN-FPN | FCA | FCB(ada) | TF | 31.1 | STMask_plus_resnet50_ada_YTVIS2021.pth | stdout.txt |
| R101-DCN-FPN | FCA | - | TF | 33.7 | STMask_plus_base_YTVIS2021.pth | - |
| R101-DCN-FPN | FCA | FCB(ada) | TF | 34.6 | STMask_plus_base_ada_YTVIS2021.pth | stdout.txt |

Quantitative Results on OVIS (trained with 20 epochs)

| Backbone | FCA | FCB | TF | mAP* | Weights | Results |
|---|---|---|---|---|---|---|
| R50-DCN-FPN | FCA | - | TF | 15.4 | STMask_plus_resnet50_OVIS.pth | - |
| R50-DCN-FPN | FCA | FCB(ada) | TF | 15.4 | STMask_plus_resnet50_ada_OVIS.pth | stdout.txt |
| R101-DCN-FPN | FCA | - | TF | 17.3 | STMask_plus_base_OVIS.pth | stdout.txt |
| R101-DCN-FPN | FCA | FCB(ada) | TF | 15.8 | STMask_plus_base_ada_OVIS.pth | - |

To evaluate a model, put the corresponding weights file in the ./weights directory and run one of the following commands. The name of each config is everything before the numbers in the file name (e.g., STMask_plus_base for STMask_plus_base.pth). All STMask models here are trained from yolact_plus_base_54_80000.pth or yolact_plus_resnet_54_80000.pth from Yolact++ here.
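
For instance, a small helper (an illustrative sketch, not part of the repository) can derive the config name from a weight file name by stripping the trailing epoch/iteration numbers:

# Illustrative helper: derive the config name from a weight file name,
# i.e. everything before the trailing "<epoch>_<iter>" numbers.
import os
import re

def config_from_weights(path):
    name = os.path.splitext(os.path.basename(path))[0]
    return re.sub(r"(_\d+)+$", "", name)  # drop trailing numeric suffixes such as _10_32100

print(config_from_weights("weights/STMask_plus_base.pth"))           # STMask_plus_base
print(config_from_weights("weights/STMask_plus_base_10_32100.pth"))  # STMask_plus_base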

Quantitative Results on COCO

We also provide quantitative results of Yolact++ with our proposed feature calibration for anchors and boxes on COCO (without the temporal fusion module). Here are the results on the COCO valid set.

| Image Size | Backbone | FCA | FCB | B_AP | M_AP | Weights |
|---|---|---|---|---|---|---|
| [550,550] | R50-DCN-FPN | FCA | - | 34.5 | 32.9 | yolact_plus_resnet50_54.pth |
| [550,550] | R50-DCN-FPN | FCA | FCB(ali) | 34.6 | 33.3 | yolact_plus_resnet50_ali_54.pth |
| [550,550] | R50-DCN-FPN | FCA | FCB(ada) | 34.7 | 33.2 | yolact_plus_resnet50_ada_54.pth |
| [550,550] | R101-DCN-FPN | FCA | - | 35.7 | 33.3 | yolact_plus_base_54.pth |
| [550,550] | R101-DCN-FPN | FCA | FCB(ali) | 35.6 | 34.1 | yolact_plus_base_ali_54.pth |
| [550,550] | R101-DCN-FPN | FCA | FCB(ada) | 36.4 | 34.8 | yolact_plus_baseada_54.pth |

Inference

# Output a YTVOSEval json to submit to the website.
# This command will create './weights/results.json' for instance segmentation.
python eval.py --config=STMask_plus_base_ada_config --trained_model=weights/STMask_plus_base_ada.pth --mask_det_file=weights/results.json
# Output visual segmentation results
python eval.py --config=STMask_plus_base_ada_config --trained_model=weights/STMask_plus_base_ada.pth --mask_det_file=weights/results.json --display
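
To sanity-check and package the output for submission, something like the following can be used (an illustrative sketch; it assumes results.json is a list of per-instance predictions and that the evaluation server accepts a zip archive of it):

# Illustrative sketch: inspect the generated results.json and zip it for submission.
import json
import zipfile

with open("weights/results.json") as f:
    results = json.load(f)
print("number of predicted instances:", len(results))
print("keys of the first prediction:", sorted(results[0]) if results else None)

with zipfile.ZipFile("weights/results.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    zf.write("weights/results.json", arcname="results.json")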

Training

By default, we train on the YouTubeVOS2019 dataset. Make sure to download the entire dataset as described above.

  • To train, grab a COCO-pretrained model and put it in ./weights.

    • [Yolact++]: For ResNet-50/-101, download yolact_plus_base_54_80000.pth or yolact_plus_resnet_54_80000.pth from Yolact++ here.
    • [Yolact++ & FC]: Alternatively, you can use the Yolact++ with FC models in Table 2 for training, which achieve relatively higher performance than the Yolact++ models.
  • Run one of the training commands below.

    • Note that you can press ctrl+c while training and it will save an *_interrupt.pth file at the current iteration.
    • All weights are saved in the ./weights directory by default with the file name <config>_<epoch>_<iter>.pth.
# Trains STMask_plus_base_config with a batch_size of 8.
CUDA_VISIBLE_DEVICES=0,1 python train.py --config=STMask_plus_base_config --batch_size=8 --lr=1e-4 --save_folder=weights/weights_r101


# Resume training STMask_plus_base with a specific weight file and start from the iteration specified in the weight file's name.
CUDA_VISIBLE_DEVICES=0,1 python train.py --config=STMask_plus_base_config --resume=weights/STMask_plus_base_10_32100.pth 

Citation

If you use STMask or this code base in your work, please cite

@inproceedings{STMask-CVPR2021,
  author    = {Minghan Li and Shuai Li and Lida Li and Lei Zhang},
  title     = {Spatial Feature Calibration and Temporal Fusion for Effective One-stage Video Instance Segmentation},
  booktitle = {CVPR},
  year      = {2021},
}

Contact

For questions about our paper or code, please contact Li Minghan ([email protected] or [email protected]).


stmask's Issues

valid_sub_YouTube_VOS_dataset

valid_sub_YouTube_VOS_dataset = dataset_base.copy({
    'img_prefix': '../datasets/YouTube_VOS2019/train/JPEGImages',
    'ann_file': '../datasets/YouTube_VOS2019/annotations_instances/valid_sub.json',
    'test_mode': False,
})

@MinghanLi
It sounds like you split the train JSON into train_sub and valid_sub for per-epoch/iteration evaluation.
Could you provide more details, such as how the split was made, so we can reproduce your results? I could not find it in the code.

Also, are train_sub and valid_sub only used during development, with all reported results trained on the full train set for 12 epochs on YouTubeVIS and 20 epochs on OVIS?

Thanks!
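
For reference, one possible way to produce such a split is sketched below. This is illustrative only: it assumes a standard YTVIS-style train.json with videos, annotations and categories keys and a hypothetical path, and is not necessarily how the authors produced their train_sub/valid_sub files.

# Illustrative sketch: split a YTVIS-style train.json into train_sub.json / valid_sub.json
# by holding out a fraction of the videos. Not necessarily the authors' split.
import json
import random

# Assumed path, following the config above; adjust to your layout.
with open("../datasets/YouTube_VOS2019/annotations_instances/train.json") as f:
    data = json.load(f)

video_ids = [v["id"] for v in data["videos"]]
random.seed(0)
random.shuffle(video_ids)
valid_ids = set(video_ids[: len(video_ids) // 10])  # hold out ~10% of videos (arbitrary choice)

def subset(ids):
    return {
        "videos": [v for v in data["videos"] if v["id"] in ids],
        "annotations": [a for a in data["annotations"] if a["video_id"] in ids],
        "categories": data["categories"],
    }

with open("valid_sub.json", "w") as f:
    json.dump(subset(valid_ids), f)
with open("train_sub.json", "w") as f:
    json.dump(subset(set(video_ids) - valid_ids), f)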

Error from Youtube 2019 Evaluation

Thank you for sharing the repo!

We are trying to reproduce the Youtube2019 results based on your instructions. We tried two different pretrained weights, but got the same error:

RuntimeError: invalid spatial size of offset, expected height: %d width: %d, but got height: %d width: %d50784880 (deform_conv_shape_check at ./mmcv/ops/csrc/pytorch/deform_conv_cuda.cu:177)

The two commands we used were as follows:
python eval.py --config=STMask_plus_base_ali_config --trained_model=weights/STMask_plus_base_ali.pth --mask_det_file=weights/results.json

python eval.py --config=STMask_plus_base_ada_config --trained_model=weights/STMask_plus_base_ada.pth --mask_det_file=weights/results.json

We would really appreciate some help, thank you very much!

Confusion about batch size and total iterations

Dear author, thanks for your great work! I'm trying to train STMask and I find that the batch size and max_iter given in the code differ from the paper.
In the paper, you set batch size 16 and total iterations 160000, but in the code you set batch size 8 and max_iters 250000.

So, I would like to know the exact parameters used to get the final result.

g++ error

I found a problem: this code does not run correctly with PyTorch 1.5.0.
Please use PyTorch 1.4.0, and when installing mmcv, use pip install mmcv-full==1.1.2 -f https://download.openmmlab.com/mmcv/dist/cu101/torch1.4.0/index.html

weights link

Thank you for the great project!!
I tried to test your project using the provided weights.
However, it seems that the link for the provided weights is incorrect.
Could you please check the links?

RuntimeError: CUDA error: invalid device function / Segmentation fault (core dumped)

File "/STMask/layers/modules/track_to_segment_head.py", line 61, in correlate
out_corr = out_corr.view(b, ph*pw, h, w) / x1.size(1)
RuntimeError: CUDA error: invalid device function
Segmentation fault (core dumped)

Thank you for sharing the code!

When I run the code, I get the error above.

I would really appreciate your help fixing this problem.

Thank you very much!
