This repo contains the supported code and configurations to reproduce Swin Transformer for semantic segmentation.
The goal is to perform semantic segmentation on photos that contain one or more people.
The dataset is available for download here.
To create the annotation data, I converted the masks to label format, with pixel values in [0,1].
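The mask conversion can be sketched as follows. This is a minimal illustration, not the script used in this repo: it assumes the source masks are grayscale or RGB arrays with values from 0 to 255 where bright pixels mark a person, and both the function name `mask_to_labels` and the threshold of 127 are hypothetical choices.

```python
import numpy as np


def mask_to_labels(mask: np.ndarray, threshold: int = 127) -> np.ndarray:
    """Convert a 0-255 mask into a single-channel label map of 0s and 1s.

    Assumption: pixels brighter than `threshold` belong to the person class.
    """
    if mask.ndim == 3:
        # Collapse colour channels: a pixel counts as foreground if any channel is bright.
        mask = mask.max(axis=2)
    return (mask > threshold).astype(np.uint8)


# Tiny 2x2 example mask: two bright pixels, two dark pixels.
demo = np.array([[0, 255], [200, 30]], dtype=np.uint8)
print(mask_to_labels(demo).tolist())  # [[0, 1], [1, 0]]
```

In practice the array would come from an image file (for example loaded with Pillow and passed through `np.asarray`), and the result would be saved as a single-channel PNG in the annotations folder.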
The file structure of the dataset is shown below:
├── data
│ ├── full_body_tik_tok
│ │ ├── annotations
│ │ │ ├── training_1D
│ │ │ │ ├── xxx.png
│ │ │ │ ├── yyy.png
│ │ │ │ ├── zzz.png
│ │ │ ├── validation_1D
│ │ ├── images
│ │ │ ├── training
│ │ │ │ ├── xxx.png
│ │ │ │ ├── yyy.png
│ │ │ │ ├── zzz.png
│ │ │ ├── validation
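A quick way to confirm that a local copy matches the layout above is to check that every image has an annotation with the same file name. The sketch below assumes the paths shown in the tree and `.png` files on both sides; the helper name `unmatched_files` is hypothetical and not part of this repo.

```python
from pathlib import Path


def unmatched_files(root, split):
    """Return (images without a label, labels without an image) for one split."""
    root = Path(root)
    images = {p.stem for p in (root / "images" / split).glob("*.png")}
    labels = {p.stem for p in (root / "annotations" / f"{split}_1D").glob("*.png")}
    return sorted(images - labels), sorted(labels - images)


if __name__ == "__main__":
    for split in ("training", "validation"):
        no_label, no_image = unmatched_files("data/full_body_tik_tok", split)
        print(f"{split}: {len(no_label)} images without labels, "
              f"{len(no_image)} labels without images")
```

Both lists should be empty before training starts.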
See get started for installation instructions.
Navigate to train and test to run training and testing in Colab.
Notes:
- Refer to config for an example configuration
- Refer to custom dataset preparation
- Refer to custom model preparation
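For orientation, the dataset part of such a config might look roughly like the fragment below. This is a sketch under assumptions, not the config shipped with this repo: it assumes the MMSegmentation 0.x config conventions used by the official Swin Transformer segmentation code, the class names are guesses for labels 0 and 1, the batch settings are placeholders, and the data pipelines that a real config needs are omitted.

```python
# Hypothetical dataset section of a config file.
# Key names follow MMSegmentation 0.x conventions (an assumption about this repo).
dataset_type = "CustomDataset"
data_root = "data/full_body_tik_tok"
classes = ("background", "person")  # assumed names for labels 0 and 1

data = dict(
    samples_per_gpu=2,  # placeholder batch size per GPU
    workers_per_gpu=2,  # placeholder number of data-loading workers
    train=dict(
        type=dataset_type,
        data_root=data_root,
        img_dir="images/training",
        ann_dir="annotations/training_1D",
        img_suffix=".png",
        seg_map_suffix=".png",
        classes=classes,
    ),
    val=dict(
        type=dataset_type,
        data_root=data_root,
        img_dir="images/validation",
        ann_dir="annotations/validation_1D",
        img_suffix=".png",
        seg_map_suffix=".png",
        classes=classes,
    ),
)
```

The `img_dir` and `ann_dir` values mirror the folder layout of the dataset, so image and annotation files are matched by file name.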
Navigate to flask app to make predictions through a web interface.
Note:
- Do not forget to add your ngrok token
Refer to YouTube.
Name | mAcc | mIoU | Model |
---|---|---|---|
swin_tiny_patch4_window7_224 | 0.9291 | 0.7851 | model |
swin_base_patch4_window7_224 | 0.9015 | 0.7319 | model |
fcn_r101_d8 | 0.9798 | 0.9009 | model |
pspnet_r50-d8 | 0.9611 | 0.9356 | model |
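For readers unfamiliar with the two metrics, the sketch below computes mAcc (mean per-class pixel accuracy) and mIoU (mean per-class intersection over union) from a confusion matrix, using the standard definitions. It is an illustration only: the numbers in the table were produced by the evaluation code, and the function name and the example matrix here are made up.

```python
def segmentation_metrics(confusion):
    """Compute (mAcc, mIoU) from a square confusion matrix.

    confusion[i][j] is the number of pixels of true class i predicted as class j.
    """
    accs, ious = [], []
    for c in range(len(confusion)):
        tp = confusion[c][c]                      # correctly labelled pixels of class c
        gt = sum(confusion[c])                    # pixels that truly belong to class c
        pred = sum(row[c] for row in confusion)   # pixels predicted as class c
        union = gt + pred - tp
        if gt:
            accs.append(tp / gt)
        if union:
            ious.append(tp / union)
    return sum(accs) / len(accs), sum(ious) / len(ious)


# Made-up two-class example (background, person).
m_acc, m_iou = segmentation_metrics([[90, 10], [20, 80]])
print(round(m_acc, 4), round(m_iou, 4))  # 0.85 0.7386
```

Because IoU also penalizes false positives, mIoU is never higher than mAcc for the same predictions under these definitions.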