This page organizes papers and resources on Video Object Segmentation, grouped by venue.
ICCV 2015
- Graph-based object segmentation algorithm.
CVPR 2016
- Paper : https://graphics.ethz.ch/~perazzif/davis/files/davis.pdf
- Project repository : https://github.com/fperazzi/davis-2017 (2016, 2017)
- To compute the three evaluation metrics (region similarity J, contour accuracy F, and temporal stability T), link the video segmentation result *.png files with the project.
- DAVIS challenge evaluation example
Input example :
python eval.py -i ../../../OSVOS-PyTorch/models/Results/ -o result.yaml --year 2016 --single-object --phase val
See the DAVIS Git repository for details.
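Of the three metrics, region similarity J is simply the intersection-over-union of the predicted and ground-truth masks. A minimal sketch for a single frame, assuming binary numpy masks (the function name and the empty-mask convention are illustrative, not the toolkit's exact code):

```python
import numpy as np

def jaccard(pred, gt):
    """Region similarity J: intersection-over-union of two binary masks."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:          # both masks empty: define J = 1 here
        return 1.0
    return np.logical_and(pred, gt).sum() / union

gt   = np.zeros((4, 4), dtype=np.uint8); gt[1:3, 1:3] = 1    # 4-pixel square
pred = np.zeros((4, 4), dtype=np.uint8); pred[1:3, 1:4] = 1  # 6-pixel region
print(jaccard(pred, gt))  # 4 / 6
```

The toolkit averages this per-frame score over each sequence; F and T additionally need contour matching and optical flow, so they are not sketched here.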
- Paper : http://files.is.tue.mpg.de/black/papers/TsaiCVPR2016.pdf
- Project repository : https://github.com/wasidennis/ObjectFlow
BMVC 2017
- Paper : https://arxiv.org/pdf/1706.09364.pdf
- Project repository : https://www.vision.rwth-aachen.de/software/OnAVOS
ICCV 2017
- Paper : https://arxiv.org/pdf/1709.06750.pdf
- Project repository : https://github.com/JingchunCheng/SegFlow
- Uses FlowNet for temporal information and an FCN for segmentation, in a bidirectional way.
- Iterative training scheme
- Uses a two-stream network (i.e., Search and Query streams)
- The network consists of three parts: encoding, a pixel-level similarity FC layer, and decoding.
- Compressors are used for memory efficiency.
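The pixel-level similarity between the Query and Search streams can be pictured as a cosine similarity between every pair of pixel embeddings. The sketch below is an illustration with assumed (C, H, W) feature-map shapes, not the paper's exact layer:

```python
import numpy as np

def pixel_similarity(query_feat, search_feat):
    """Cosine similarity between every query pixel and every search pixel.
    query_feat, search_feat: (C, H, W) feature maps (shapes assumed)."""
    C = query_feat.shape[0]
    q = query_feat.reshape(C, -1)                        # (C, Hq*Wq)
    s = search_feat.reshape(C, -1)                       # (C, Hs*Ws)
    q = q / (np.linalg.norm(q, axis=0, keepdims=True) + 1e-8)
    s = s / (np.linalg.norm(s, axis=0, keepdims=True) + 1e-8)
    return q.T @ s                                       # (Hq*Wq, Hs*Ws)

rng = np.random.default_rng(0)
sim = pixel_similarity(rng.standard_normal((16, 8, 8)),
                       rng.standard_normal((16, 8, 8)))
print(sim.shape)  # (64, 64)
```

Note the quadratic size of the similarity map, which is why the compressors mentioned above matter for memory.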
CVPR 2017
- Paper : http://openaccess.thecvf.com/content_cvpr_2017/papers/Caelles_One-Shot_Video_Object_CVPR_2017_paper.pdf
- Project repository : https://github.com/kmaninis/OSVOS-PyTorch
- Adapts the CNN to a particular object instance, given a single annotated image
- Segments each frame independently
- Can work at various points of the trade-off between speed and accuracy
- Improves performance by a significant margin (79.8%)
- Proposes two network streams, one for segmentation and one for refining the segmentation result
- Multi-step training (e.g., base network using pretrained weights, parent network trained on the DAVIS dataset, test network fine-tuned on frame 1)
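A toy analogy of the one-shot idea: fit a classifier to the frame-1 annotation, then apply it to later frames independently. OSVOS actually fine-tunes a full CNN; the per-pixel logistic classifier below only illustrates the adapt-on-frame-1 workflow, and all names and data are made up:

```python
import numpy as np

def finetune_on_first_frame(features, mask, steps=200, lr=0.5):
    """One-shot adaptation sketch: fit per-pixel logistic weights to the
    frame-1 annotation; later frames are then segmented independently.
    features: (N, D) per-pixel descriptors, mask: (N,) labels in {0, 1}."""
    w = np.zeros(features.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-features @ w))          # sigmoid
        w -= lr * features.T @ (p - mask) / len(mask)    # logistic gradient
    return w

# toy frame 1: "object" pixels have a distinctive feature channel
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 3))
y = (X[:, 0] > 0).astype(float)
w = finetune_on_first_frame(X, y)
pred = (X @ w > 0).astype(float)
print((pred == y).mean())  # high accuracy on the annotated frame
```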
- Paper : http://openaccess.thecvf.com/content_cvpr_2017/papers/Jang_Online_Video_Object_CVPR_2017_paper.pdf
- Project repository : https://github.com/wdjang/CTN
- Trident Network : three branches for separative (segmentation) / definite foreground / definite background
- Encoder-decoder structure : 1 encoder stream / 3 (tri) decoder streams
- Paper : https://graphics.ethz.ch/~perazzif/masktrack/files/masktrack.pdf
- Project repository : https://graphics.ethz.ch/~perazzif/masktrack/index.html
- MaskTrack : use only one segmentation network (Deeplabv2-VGG16)
- Training time : input -> RGB + a synthesized training mask (the whole video is not required)
- [Offline] affine transformations and non-rigid deformations are used for augmentation (~10^4 samples)
- [Online] the first frame with its ground truth is exploited (~10^3 samples)
- Test time : Input -> RGB + t-1 segmentation mask
- Optical flow and CRF are optionally used
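The offline mask augmentation can be pictured as cheap geometric jitter of a binary mask. The sketch below (translation plus isotropic scale with nearest-neighbor sampling) is a simplified stand-in for the affine and non-rigid deformations used in the paper:

```python
import numpy as np

def deform_mask(mask, dx=0, dy=0, scale=1.0):
    """Affine jitter of a binary mask (translation + isotropic scale),
    a simplified stand-in for MaskTrack-style mask augmentation."""
    H, W = mask.shape
    ys, xs = np.mgrid[0:H, 0:W]
    cy, cx = (H - 1) / 2.0, (W - 1) / 2.0
    # inverse-map each output coordinate back into the source mask
    src_y = np.round((ys - cy - dy) / scale + cy).astype(int)
    src_x = np.round((xs - cx - dx) / scale + cx).astype(int)
    valid = (src_y >= 0) & (src_y < H) & (src_x >= 0) & (src_x < W)
    out = np.zeros_like(mask)
    out[valid] = mask[src_y[valid], src_x[valid]]
    return out

m = np.zeros((8, 8), dtype=np.uint8); m[2:5, 2:5] = 1
shifted = deform_mask(m, dx=1)     # translate the 3x3 object right by 1
print(shifted.sum())  # 9 (area preserved under pure translation)
```

Applying many random (dx, dy, scale) draws to one annotated mask is what yields the ~10^4 offline training pairs.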
- Paper : https://varunjampani.github.io/papers/jampani17_VPN.pdf
- Project repository : https://github.com/varunjampani/video_prop_networks
- Using bilateral filter networks
- Online propagation : the method needs no future frames (real-time, causal)
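The causal, bilateral propagation idea can be sketched as a color-weighted average of the previous frame's mask. The actual VPN learns its bilateral filters; the fixed Gaussian color weights and the neighborhood radius below are assumptions for illustration:

```python
import numpy as np

def propagate_mask(prev_mask, prev_img, cur_img, sigma_c=0.1, r=1):
    """Causal propagation sketch: each current pixel takes a color-weighted
    (bilateral-style) average of the previous frame's mask over a
    (2r+1)^2 neighborhood. No future frames are needed."""
    H, W = prev_mask.shape
    num = np.zeros((H, W)); den = np.zeros((H, W))
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            shifted_mask = np.roll(np.roll(prev_mask, dy, 0), dx, 1)
            shifted_img  = np.roll(np.roll(prev_img,  dy, 0), dx, 1)
            w = np.exp(-((cur_img - shifted_img) ** 2) / (2 * sigma_c ** 2))
            num += w * shifted_mask
            den += w
    return num / den

img0 = np.zeros((6, 6)); img0[2:4, 2:4] = 1.0   # bright object, frame t-1
img1 = np.zeros((6, 6)); img1[2:4, 3:5] = 1.0   # object moved right, frame t
mask0 = (img0 > 0.5).astype(float)
prob = propagate_mask(mask0, img0, img1)        # mask follows the object
```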
- Paper : https://davischallenge.org/challenge2017/papers/DAVIS-Challenge-6th-Team.pdf
- Project repository : https://github.com/JingchunCheng/Seg-with-SPN
CVPR 2018
- Paper : http://www.eecs.harvard.edu/~kalyans/research/videosegmentation/FastVideoSegmentation_CVPR18.pdf
- Project repository : https://github.com/seoungwugoh/RGMP
- Siamese encoder-decoder network for one-shot VOS
- The network works without any online-learning or post-processing
- Uses a two-stage scheme that pre-trains the network on synthetically generated image data and fine-tunes it on video data
- In the fine-tuning step, they exploit BPTT (i.e., an RNN structure)
- Training takes long (5 days in total)
- Part-based tracker + ROI SegNet + similarity-based part aggregation
- Paper : https://arxiv.org/pdf/1802.01218.pdf
- Project repository : https://github.com/linjieyangsc/video_seg
- Motivated by conditional batch normalization
- y = γ * x + β (scale γ, bias β)
- Feature maps are modulated by a visual modulator and a spatial modulator
- Visual modulator : [input] first frame image, [output] scale parameters
- Spatial modulator : [input] t-1 frame image, [output] bias parameters
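A minimal sketch of the modulation itself, assuming a per-channel scale from the visual modulator and a per-position bias from the spatial modulator (these particular shapes are an illustrative assumption, not taken from the paper):

```python
import numpy as np

def modulate(feature_maps, gamma, beta):
    """Conditional-BN-style modulation y = gamma * x + beta.
    feature_maps: (C, H, W); gamma: per-channel scale (C,), as if produced
    by a visual modulator; beta: per-position bias (H, W), as if produced
    by a spatial modulator. Shapes are assumed for illustration."""
    return gamma[:, None, None] * feature_maps + beta[None, :, :]

x = np.ones((2, 3, 3))
y = modulate(x, gamma=np.array([2.0, 0.5]), beta=np.zeros((3, 3)))
print(y[0, 0, 0], y[1, 0, 0])  # 2.0 0.5
```

The point of the design is that the segmentation network itself stays fixed; only these cheap scale/bias parameters change per target object, avoiding per-video fine-tuning.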
- Optical flow-based + Active contour (Level-set) + CRN
- Paper : http://openaccess.thecvf.com/content_cvpr_2018/papers/Xiao_MoNet_Deep_Motion_CVPR_2018_paper.pdf
- Non-causal VOS system (see Figure 2 of the paper)
- Uses optical flow for robustness (FlowNet 2.0 is used)
- For segmentation, DeepLab is utilized
- A distance transform layer using the FastMBD algorithm
- MoNet consists of a segmentation stream and an optical flow stream.
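For intuition about a distance-transform layer, a standard two-pass chamfer distance transform is sketched below. Note that MoNet's layer is based on the minimum barrier distance (FastMBD), which this simple stand-in does not implement:

```python
import numpy as np

def chamfer_distance_transform(mask):
    """Two-pass chamfer (city-block) distance to the nearest foreground
    pixel. A generic stand-in for a distance-transform layer; MoNet itself
    uses a FastMBD-based minimum barrier distance instead."""
    H, W = mask.shape
    INF = H + W                                   # larger than any distance
    d = np.where(mask > 0, 0, INF).astype(float)
    for y in range(H):                            # forward pass (top-left)
        for x in range(W):
            if y > 0: d[y, x] = min(d[y, x], d[y - 1, x] + 1)
            if x > 0: d[y, x] = min(d[y, x], d[y, x - 1] + 1)
    for y in range(H - 1, -1, -1):                # backward pass (bottom-right)
        for x in range(W - 1, -1, -1):
            if y < H - 1: d[y, x] = min(d[y, x], d[y + 1, x] + 1)
            if x < W - 1: d[y, x] = min(d[y, x], d[y, x + 1] + 1)
    return d

m = np.zeros((5, 5)); m[2, 2] = 1
print(chamfer_distance_transform(m)[0, 0])  # Manhattan distance: 4.0
```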
TPAMI
arXiv
- Paper : https://arxiv.org/pdf/1703.09554.pdf
- Project repository : https://github.com/ankhoreva/LucidDataDreaming