Introduction

Recurrent Dynamic Embedding for Video Object Segmentation [CVPR 2022]

Install

If you only want to run our method, refer to Requirements.
If you want to evaluate our method on the DAVIS 2017 validation set, refer to Requirements.

Model zoo

You can download the pretrained models from Google.

The predictions of our method can be downloaded from Google.

Dataset

Following STCN, we train the network in three stages. First, we train the network on static image datasets, which can be downloaded with download_datasets.py. Then we fine-tune the network with SAM on the BL30K dataset, which can be downloaded with download_bl30k.py. Note that BL30K is a large dataset introduced by MiVOS and is about 700GB in total. Finally, we fine-tune the network with SAM on a mixed dataset (DAVIS 2017 and YouTube-VOS 2019).

This may not look straightforward, but you can simply download DAVIS 2017 for a quick start. The datasets are organized as follows:

├── BL30K
├── DAVIS
│   ├── 2016
│   │   ├── Annotations
│   │   └── ...
│   └── 2017
│       ├── test-dev
│       │   ├── Annotations
│       │   └── ...
│       └── trainval
│           ├── Annotations
│           └── ...
├── static
│   ├── BIG_small
│   └── ...
├── YouTube
│   ├── all_frames
│   │   └── valid_all_frames
│   ├── train
│   ├── train_480p
│   └── valid
└── YouTube2018
    ├── all_frames
    │   └── valid_all_frames
    └── valid
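
Before training, it can help to confirm that the directories from the tree above are in place. The following is a minimal sketch of such a check, not part of the repository: `EXPECTED_DIRS` and `missing_dirs` are hypothetical names introduced here, and the dataset root is whatever directory you downloaded the data into.

```python
from pathlib import Path

# Sub-directories copied from the tree above. Entries shown as "..." in the
# tree are not listed because their names are not given.
EXPECTED_DIRS = [
    "BL30K",
    "DAVIS/2016/Annotations",
    "DAVIS/2017/test-dev/Annotations",
    "DAVIS/2017/trainval/Annotations",
    "static/BIG_small",
    "YouTube/all_frames/valid_all_frames",
    "YouTube/train",
    "YouTube/train_480p",
    "YouTube/valid",
    "YouTube2018/all_frames/valid_all_frames",
    "YouTube2018/valid",
]


def missing_dirs(root, expected=EXPECTED_DIRS):
    """Return the expected sub-directories that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_dir()]
```

Calling `missing_dirs(root)` on your dataset root returns an empty list when every listed directory exists. If you only downloaded DAVIS 2017 for a quick start, pass a shorter `expected` list instead.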

Quick start

Take inference on the DAVIS 2017 validation set as an example. The inference command is as follows:

python eval_davis.py --output ... --davis_path ... --model  ...   --mode two-frames-compress  --mem_every ... --top ... --amp

output: the path where the predictions are saved.
davis_path: the path to DAVIS 2017.
model: the path to the pre-trained model.
mode: one of several options, such as two-frames-compress, gt-compress, last-compress.
mem_every: the interval at which SAM is applied.
top: the k used for top-k filtering.
amp: run inference with apex mixed precision.

For example, you can use the following settings:

python eval_davis.py --output prediction/s012 --davis_path ... --model  pretrain/model_s012_final.pth   --mode two-frames-compress  --mem_every 3 --amp
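
If you run inference with several settings, it may be convenient to assemble the command in a script. The sketch below is an illustration only: `build_eval_command` is a hypothetical helper, it uses just the flags described above, and `path/to/DAVIS` is a placeholder rather than a real location.

```python
import shlex


def build_eval_command(output, davis_path, model,
                       mode="two-frames-compress", mem_every=3,
                       top=None, amp=True):
    """Build the eval_davis.py argument list from the options described above."""
    cmd = [
        "python", "eval_davis.py",
        "--output", str(output),
        "--davis_path", str(davis_path),
        "--model", str(model),
        "--mode", mode,
        "--mem_every", str(mem_every),
    ]
    if top is not None:
        # --top is only added when a value is given.
        cmd += ["--top", str(top)]
    if amp:
        cmd.append("--amp")
    return cmd


def format_command(cmd):
    """Render the argument list as a single shell-quoted string."""
    return " ".join(shlex.quote(part) for part in cmd)


# Placeholder dataset path; replace it with your own.
example = build_eval_command("prediction/s012", "path/to/DAVIS",
                             "pretrain/model_s012_final.pth")
print(format_command(example))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root.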

Quick evaluation

Take evaluation on the DAVIS 2017 validation set as an example.

We modify this repo to evaluate our method.

python evaluation/2017/evaluation_ours.py --results_path ... --davis_path ...

results_path: the path where the predictions were saved.
davis_path: the path to DAVIS 2017.
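
For orientation, here is a minimal sketch of what a DAVIS-style score measures. It assumes the standard DAVIS definitions, in which the region similarity J is the intersection-over-union of the predicted and ground-truth masks and J&F is the mean of J and the boundary measure F. It is not the code of the evaluation script; the function names are hypothetical and F itself is not implemented here.

```python
def region_similarity(pred, gt):
    """Jaccard index (IoU) of two binary masks given as nested lists of 0/1."""
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                inter += 1
            if p or g:
                union += 1
    # Two empty masks are treated as a perfect match.
    return 1.0 if union == 0 else inter / union


def j_and_f(j_mean, f_mean):
    """Overall score: the mean of region similarity J and boundary measure F."""
    return (j_mean + f_mean) / 2.0


pred = [[1, 1],
        [0, 0]]
gt = [[1, 0],
      [0, 0]]
print(region_similarity(pred, gt))  # one shared pixel out of two covered pixels
```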

Results

Without BL30K

Dataset	Split	J&F	J	F	FPS
DAVIS 2016	validation	91.1	89.7	92.5	35.0
DAVIS 2017	validation	84.2	80.8	87.5	27.0
DAVIS 2017	test-dev	77.4	73.6	81.2	-

Dataset	Split	G	J (seen)	F (seen)	J (unseen)	F (unseen)
YouTube 2019	validation	81.9	81.1	85.5	76.2	84.8

With BL30K

Dataset	Split	J&F	J	F	FPS
DAVIS 2016	validation	91.6	90.0	93.2	35.0
DAVIS 2017	validation	86.1	82.1	90.0	27.0
DAVIS 2017	test-dev	78.9	74.9	82.9	-

Dataset	Split	G	J (seen)	F (seen)	J (unseen)	F (unseen)
YouTube 2019	validation	83.3	81.9	86.3	78.0	86.9

Inference

With a single GPU, you can run inference on these datasets as follows:

DAVIS 2017 validation set

python eval_davis.py --output prediction/DAVIS-2017-val --davis_path ... --model  pretrain/model_s012_final.pth   --mode two-frames-compress  --mem_every 3 --amp

DAVIS 2017 test-dev set

python eval_davis.py --output prediction/DAVIS-2017-test --davis_path ... --model  pretrain/model_s012_final.pth   --mode two-frames-compress  --mem_every 3 --top 40 --split testdev --amp

DAVIS 2016 validation set

python eval_davis_2016.py --output prediction/DAVIS-2016-val --davis_path ... --model  pretrain/model_s012_final.pth   --mode two-frames-compress  --mem_every 3 --top 40 --split testdev --amp

YouTube 2019 validation set

python eval_youtube.py --output prediction/YV-19-val --yv_path ... --model  pretrain/model_s012_final_yv.pth  --mode two-frames-compress  --mem_every 4 --top 20 --amp

Training

First, configure the dataset paths in util/hyper_para.py: --static_root, --bl_root, --yv_root and --davis_root.

stage 0

cd rootdir &&\
OMP_NUM_THREADS=4 python -m  torch.distributed.launch --master_port 9843 \
--nproc_per_node=4 \
train.py --id  s0 \
--stage 0 \
--perturb_max 1 \
--perturb_min 0.85 \
--save_interval  10000 \
--klloss_weight 10 \
--start_warm 5000 \
--end_warm 17500 \
--batch_size 16 \
--lr 2e-05 \
--steps 37500 \
--iterations 75000 \
--repeat 0

stage 0 -> 3 (w/o BL30K)

cd rootdir &&\
OMP_NUM_THREADS=4 python -m  torch.distributed.launch --master_port 9844 \
--nproc_per_node=2 \
train.py --id  s03 \
--stage 3 \
--load_network pretrain/s0/model_75000.pth \
--perturb_max 1 \
--perturb_min 0.85 \
--save_interval  10000 \
--klloss_weight 10 \
--batch_size 4 \
--lr 2e-05 \
--steps 125000 \
--iterations 150000 \
--repeat 0

stage 1

cd rootdir &&\
OMP_NUM_THREADS=4 python -m  torch.distributed.launch --master_port 9843 \
--nproc_per_node=2 \
train.py --id  s1 \
--stage 1 \
--load_network pretrain/s0/model_75000.pth \
--perturb_max 1 \
--perturb_min 0.85 \
--save_interval  10000 \
--klloss_weight 10 \
--start_warm 20000 \
--end_warm 70000 \
--batch_size 4 \
--lr 1e-05 \
--steps 400000 \
--iterations 500000 \
--repeat 0

stage 2

cd rootdir &&\
OMP_NUM_THREADS=4 python -m  torch.distributed.launch --master_port 9843 \
--nproc_per_node=2 \
train.py --id  s2 \
--stage 2 \
--load_network pretrain/s1/model_500000.pth \
--perturb_max 1 \
--perturb_min 0.85 \
--save_interval  10000 \
--klloss_weight 5 \
--decoder_f2_weight 5 \
--decoder_f4_weight 5 \
--start_warm 5000 \
--end_warm 17500 \
--batch_size 8 \
--lr 2e-05 \
--steps 62500 \
--iterations 75000 \
--repeat 0
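
The commands above pass --start_warm and --end_warm without explaining them. One plausible reading, given that this project is built upon STCN, is that they bound the warm-up of a bootstrapped cross-entropy loss: before start_warm every pixel contributes to the loss, and the kept fraction then shrinks linearly until only the hardest pixels remain after end_warm. The sketch below illustrates that schedule under this assumption. It is not confirmed by this repository; `bootstrap_fraction` is a hypothetical name and `top_p=0.15` is an assumed final fraction.

```python
def bootstrap_fraction(iteration, start_warm, end_warm, top_p=0.15):
    """Fraction of the hardest pixels kept in the loss at a given iteration.

    Assumed schedule (not taken from this repository):
      iteration < start_warm  -> 1.0 (all pixels)
      iteration > end_warm    -> top_p
      otherwise               -> linear interpolation from 1.0 down to top_p
    """
    if iteration < start_warm:
        return 1.0
    if iteration > end_warm:
        return top_p
    remaining = (end_warm - iteration) / (end_warm - start_warm)
    return top_p + (1.0 - top_p) * remaining


# Stage 0 values from the command above: --start_warm 5000 --end_warm 17500
for it in (0, 5000, 11250, 17500, 75000):
    print(it, bootstrap_fraction(it, 5000, 17500))
```

Under this reading, the stage 1 settings (20000 and 70000) stretch the same transition over a longer run.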

Note: because of temporary layoffs during my internship at Alibaba, I am uncertain about the exact installation environment and code version used for the paper. Re-running the training on this version with the parameters listed below gave 85.7 on DAVIS 2017 val (86.1 in the original paper) and 79.2 on DAVIS 2017 test-dev (78.9 in the original paper).

Parameters changed relative to the paper:

klloss_weight = 10 (paper) -> 5 (now)
decoder_f2_weight = 10 (paper) -> 5 (now)
decoder_f4_weight = 10 (paper) -> 5 (now)

Acknowledgement

This project is built upon numerous previous projects. We'd like to thank the contributors of STCN and MiVOS.

To do

Quick start and quick evaluation.
Inference code.
Training details.
Pre-trained models.
