Generative Semantic Segmentation
Paper
Generative Semantic Segmentation,
Jiaqi Chen, Jiachen Lu, Xiatian Zhu, and Li Zhang
CVPR 2023
Abstract
We present Generative Semantic Segmentation (GSS), a generative framework for semantic segmentation. Unlike previous methods, which treat segmentation as a per-pixel classification problem, we cast semantic segmentation as an image-conditioned mask generation problem. This is achieved by replacing the conventional per-pixel discriminative learning with a latent prior learning process. Specifically, we model the variational posterior distribution of latent variables given the segmentation mask. This is done by expressing the segmentation mask as a special type of image (dubbed maskige). This posterior distribution allows segmentation masks to be generated unconditionally. To implement semantic segmentation, we further introduce a conditioning network (e.g., an encoder-decoder Transformer) optimized by minimizing the divergence between the posterior distribution of maskige (i.e., segmentation masks) and the latent prior distribution of input images on the training set. Extensive experiments on standard benchmarks show that GSS performs competitively with prior art in the standard semantic segmentation setting, whilst achieving a new state of the art in the more challenging cross-domain setting.
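The idea of expressing a segmentation mask as an image (a maskige) can be illustrated with a minimal sketch. It assumes a fixed, hand-picked colour palette and nearest-colour decoding; the names `PALETTE`, `mask_to_maskige`, and `maskige_to_mask` are hypothetical and the transform used in the paper may differ.

```python
# Hypothetical 3-class palette, chosen for illustration only.
# The actual mask-to-maskige transform in GSS may differ.
PALETTE = [(0, 0, 0), (255, 0, 0), (0, 0, 255)]


def mask_to_maskige(mask, palette=PALETTE):
    """Map each class index to an RGB colour, giving an image-like mask."""
    return [[palette[c] for c in row] for row in mask]


def maskige_to_mask(maskige, palette=PALETTE):
    """Recover class indices by nearest palette colour (squared distance)."""
    def nearest(pixel):
        return min(
            range(len(palette)),
            key=lambda k: sum((p - q) ** 2 for p, q in zip(pixel, palette[k])),
        )
    return [[nearest(px) for px in row] for row in maskige]


mask = [[0, 1], [2, 1]]
maskige = mask_to_maskige(mask)

# Simulate a slightly imperfect generated maskige by perturbing the red channel.
noisy = [[(min(r + 10, 255), g, b) for (r, g, b) in row] for row in maskige]
recovered = maskige_to_mask(noisy)
```

Because decoding picks the nearest palette colour, small errors in a generated maskige still map back to the intended class, which is why well-separated colours are a natural choice in this sketch.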
TODO List
- Upload model weights and DALL-E VQVAE weights
- Complete install.md
- Add dataset link
Results
Cityscapes
Name | Backbone | Iterations | mIoU | mAcc | Config | Checkpoint |
---|---|---|---|---|---|---|
GSS-FF | R101 | 80k | 77.76 | 85.9 | config | google drive |
GSS-FF | Swin-L | 80k | 78.90 | 87.03 | config | google drive |
GSS-FT-W | ResNet | 80k | 78.46 | 85.92 | config | google drive |
GSS-FT-W | Swin-L | 80k | 80.05 | 87.32 | config | google drive |
ADE20K
Name | Backbone | Iterations | mIoU | mAcc | Config | Checkpoint |
---|---|---|---|---|---|---|
GSS-FF | Swin-L | 160k | 46.29 | 57.84 | config | google drive |
GSS-FT-W | Swin-L | 160k | 48.54 | 58.94 | config | google drive |
MSeg
Name | Backbone | Iterations | h.mean | Config | Checkpoint |
---|---|---|---|---|---|
GSS-FF | HRNet-W48 | 160k | 52.60 | config | google drive |
GSS-FF | Swin-L | 160k | 59.49 | config | google drive |
GSS-FT-W | HRNet-W48 | 160k | 55.20 | config | google drive |
GSS-FT-W | Swin-L | 160k | 61.94 | config | google drive |
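The `h.mean` column is assumed here to be the harmonic mean of per-dataset mIoU, as is customary for the MSeg benchmark. A minimal sketch of that aggregation follows; the dataset names and scores are hypothetical placeholders, not results from this repository.

```python
from statistics import harmonic_mean

# Hypothetical per-dataset mIoU values, for illustration only.
per_dataset_miou = {"dataset_a": 40.0, "dataset_b": 60.0}

# The harmonic mean penalizes a low score on any single dataset more
# strongly than the arithmetic mean (which would be 50.0 here).
h_mean = harmonic_mean(list(per_dataset_miou.values()))
print(f"h.mean = {h_mean:.2f}")
```

Under this assumption, a model must do reasonably well on every dataset to obtain a high `h.mean`, which suits a cross-domain evaluation.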
Get Started
Environment
This implementation is built upon mmsegmentation. Please follow the steps in install.md to prepare the environment.
Train & Test
# train with 8 GPUs
bash tools/dist_train.sh configs/gss/cityscapes/gss-ff_r101_768x768_80k_cityscapes.py 8
# test with 8 GPUs
bash tools/dist_test.sh configs/gss/cityscapes/gss-ff_r101_768x768_80k_cityscapes.py ./ckp_dir/iter_80000.pth 8 --eval mIoU
Reference
@inproceedings{chen2023generative,
title={Generative Semantic Segmentation},
author={Chen, Jiaqi and Lu, Jiachen and Zhu, Xiatian and Zhang, Li},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2023}
}