lizhenhao123 / san

This project forked from mendelxu/san

Open-vocabulary Semantic Segmentation

Home Page: https://mendelxu.github.io/SAN/

License: MIT License

Shell 0.11% Python 99.41% Dockerfile 0.47%

san's Introduction

[CVPR2023-Highlight] Side Adapter Network for Open-Vocabulary Semantic Segmentation

This is the official implementation of our conference paper: "Side Adapter Network for Open-Vocabulary Semantic Segmentation".

Introduction

This paper presents a new framework for open-vocabulary semantic segmentation with a pre-trained vision-language model, named Side Adapter Network (SAN). Our approach models the semantic segmentation task as a region recognition problem. A side network is attached to a frozen CLIP model with two branches: one for predicting mask proposals, and the other for predicting attention bias, which is applied in the CLIP model to recognize the class of masks. This decoupled design benefits CLIP in recognizing the class of mask proposals. Since the attached side network can reuse CLIP features, it can be very light. In addition, the entire network can be trained end-to-end, allowing the side network to be adapted to the frozen CLIP model, which makes the predicted mask proposals CLIP-aware. Our approach is fast, accurate, and only adds a few additional trainable parameters. We evaluate our approach on multiple semantic segmentation benchmarks. Our method significantly outperforms other counterparts, with up to 18 times fewer trainable parameters and 19 times faster inference speed.
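The attention-bias idea can be illustrated with a toy example. The sketch below is not the SAN implementation: it assumes a single query vector (standing in for the token that classifies one mask proposal), a few patch keys, and a bias that is simply added to the attention logits before the softmax, so that a strongly negative bias suppresses a patch. The function name and all numbers are hypothetical.

```python
import math

def biased_attention_weights(query, keys, bias):
    """Softmax over (q . k / sqrt(d) + bias) for a single query vector."""
    d = len(query)
    logits = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d) + b
        for key, b in zip(keys, bias)
    ]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

query = [1.0, 0.0]
keys = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

# Without bias, the two identical keys receive the same weight.
no_bias = biased_attention_weights(query, keys, [0.0, 0.0, 0.0])

# A large negative bias on the second patch removes it from the attention.
masked = biased_attention_weights(query, keys, [0.0, -1e9, 0.0])
```

In this toy setting the bias plays the role of a soft mask: the weight that the second patch loses is redistributed over the remaining patches.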

Demo

  • Run the demo app on 🤗 Hugging Face. (It runs on a low-spec machine and may be slow.)
  • Run the demo app with Docker.
    docker build docker/app.Docker -t san_app
    docker run -it --shm-size 4G -p 7860:7860  san_app 
    

Installation

  1. Clone the repository
    git clone https://github.com/MendelXu/SAN.git
  2. Navigate to the project directory
    cd SAN
  3. Install the dependencies
    bash install.sh
    Hint: You can run the job in Docker instead of installing the dependencies locally. Run with the pre-built image:
    docker run -it --gpus all --shm-size 8G mendelxu/pytorch:d2_nvcr_2008 /bin/bash
    
    or build your own image with the provided Dockerfile, docker/Dockerfile.

Data Preparation

See SimSeg for reference. The data should be organized as follows:

datasets/
    coco/
        ...
        train2017/
        val2017/
        stuffthingmaps_detectron2/
    VOC2012/
        ...
        images_detectron2/
        annotations_detectron2/
    pcontext/
        ...
        val/
    pcontext_full/
        ...
        val/
    ADEChallengeData2016/
        ...
        images/
        annotations_detectron2/
    ADE20K_2021_17_01/
        ...
        images/
        annotations_detectron2/        
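
Before launching a job, it can help to check that the layout above is in place. The following is a minimal sketch, not part of the repository: the helper name `missing_dataset_dirs` is hypothetical, it assumes the dataset root is a local `datasets/` directory, and it only checks the sub-directories that are spelled out above (anything hidden behind `...` is ignored).

```python
from pathlib import Path

# Sub-directories listed in the layout above; "..." entries are not checked.
EXPECTED_DIRS = [
    "coco/train2017",
    "coco/val2017",
    "coco/stuffthingmaps_detectron2",
    "VOC2012/images_detectron2",
    "VOC2012/annotations_detectron2",
    "pcontext/val",
    "pcontext_full/val",
    "ADEChallengeData2016/images",
    "ADEChallengeData2016/annotations_detectron2",
    "ADE20K_2021_17_01/images",
    "ADE20K_2021_17_01/annotations_detectron2",
]

def missing_dataset_dirs(root="datasets"):
    """Return the expected sub-directories that do not exist under root."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
```

Calling `missing_dataset_dirs()` from the project directory returns an empty list when every listed folder exists, and otherwise the relative paths that still need to be prepared.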

Hint: In the code, these datasets are registered under the following names:

coco_2017_*_stuff_sem_seg: COCO Stuff-171
voc_sem_seg_*: Pascal VOC-20
pcontext_sem_seg_*: Pascal Context-59
ade20k_sem_seg_*: ADE-150
pcontext_full_sem_seg_*: Pascal Context-459
ade20k_full_sem_seg_*: ADE-847
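
The name patterns above can be turned into a small lookup. This is a sketch under assumptions rather than code from the repository: it treats `*` as a wildcard for the split part of the name (for example `val`), and the helper `benchmark_for` as well as the concrete names used in the usage note are hypothetical.

```python
from fnmatch import fnmatchcase

# Registered-name patterns and the benchmarks they refer to, as listed above.
BENCHMARKS = {
    "coco_2017_*_stuff_sem_seg": "COCO Stuff-171",
    "voc_sem_seg_*": "Pascal VOC-20",
    "pcontext_sem_seg_*": "Pascal Context-59",
    "ade20k_sem_seg_*": "ADE-150",
    "pcontext_full_sem_seg_*": "Pascal Context-459",
    "ade20k_full_sem_seg_*": "ADE-847",
}

def benchmark_for(dataset_name):
    """Return the benchmark whose pattern matches dataset_name, or None."""
    for pattern, benchmark in BENCHMARKS.items():
        if fnmatchcase(dataset_name, pattern):
            return benchmark
    return None
```

For instance, `benchmark_for("pcontext_full_sem_seg_val")` resolves to Pascal Context-459 and not Pascal Context-59, because the shorter pattern requires the name to start with `pcontext_sem_seg_`.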

Usage

  • Pretrained Weights

    Model         Config                                     Weights      Logs
    SAN-ViT-B/16  configs/san_clip_vit_res4_coco.yaml        Huggingface  Log
    SAN-ViT-L/14  configs/san_clip_vit_large_res4_coco.yaml  Huggingface  Log
  • Evaluation

    • Evaluate a trained model on the validation sets of all datasets.
    python train_net.py --eval-only --config-file <CONFIG_FILE> --num-gpus <NUM_GPU> OUTPUT_DIR <OUTPUT_PATH> MODEL.WEIGHTS <TRAINED_MODEL_PATH>

    For example, evaluate our pre-trained model:

    # 1. Download SAN (ViT-B/16 CLIP) from https://huggingface.co/Mendel192/san/blob/main/san_vit_b_16.pth.
    # 2. Put it at `output/model.pth`.
    # 3. Run the evaluation.
      python train_net.py --eval-only --config-file configs/san_clip_vit_res4_coco.yaml --num-gpus 8 OUTPUT_DIR ./output/trained_vit_b16 MODEL.WEIGHTS output/model.pth
    
    • Evaluate a trained model on the validation set of one dataset.
    python train_net.py --eval-only --config-file <CONFIG_FILE> --num-gpus <NUM_GPU> OUTPUT_DIR <OUTPUT_PATH> MODEL.WEIGHTS <TRAINED_MODEL_PATH> DATASETS.TEST "('<FILL_DATASET_NAME_HERE>',)"

  • Training

    wandb off
    # [Optional] If you want to log the training logs to wandb.
    # wandb login
    # wandb on
    python train_net.py --config-file <CONFIG_FILE> --num-gpus <NUM_GPU> OUTPUT_DIR <OUTPUT_PATH> WANDB.NAME <WANDB_LOG_NAME>

    Hint: We use <> to denote placeholders that you should replace according to your own setup.
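
The `<...>` placeholders and the `DATASETS.TEST` value can also be filled in programmatically. The sketch below is only an illustration: `fill_placeholders` and `datasets_test_arg` are hypothetical helpers, not part of the repository, and the values are the ones from the evaluation example above. The `DATASETS.TEST` value is written as a Python tuple literal, so a single dataset name needs the trailing comma.

```python
import re
import shlex

# Evaluation command template copied from the Usage section.
EVAL_TEMPLATE = (
    "python train_net.py --eval-only --config-file <CONFIG_FILE> "
    "--num-gpus <NUM_GPU> OUTPUT_DIR <OUTPUT_PATH> "
    "MODEL.WEIGHTS <TRAINED_MODEL_PATH>"
)

def fill_placeholders(template, values):
    """Replace each <NAME> with its shell-quoted value; fail on unknown names."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value given for <{name}>")
        return shlex.quote(str(values[name]))
    return re.sub(r"<([A-Z_]+)>", substitute, template)

def datasets_test_arg(*names):
    """Format dataset names as the tuple literal that follows DATASETS.TEST."""
    return repr(tuple(names))

command = fill_placeholders(EVAL_TEMPLATE, {
    "CONFIG_FILE": "configs/san_clip_vit_res4_coco.yaml",
    "NUM_GPU": 8,
    "OUTPUT_PATH": "./output/trained_vit_b16",
    "TRAINED_MODEL_PATH": "output/model.pth",
})
```

Raising on a missing value is a deliberate choice here: a command that still contains `<OUTPUT_PATH>` would otherwise be interpreted by the shell as input redirection. When appending the result of `datasets_test_arg` to a shell command, wrap it in double quotes as in the single-dataset example above.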

TODO

  • Add detailed description of the dataset preparation process.
  • Transfer and test cross attention implementation.
  • Add support for flash attention.

License

Distributed under the MIT License. See LICENSE for more information.

Cite

If you find it helpful, you can cite our paper in your work.

@inproceedings{xu2023side,
  title={Side Adapter Network for Open-Vocabulary Semantic Segmentation},
  author={Mengde Xu and Zheng Zhang and Fangyun Wei and Han Hu and Xiang Bai},
  booktitle={CVPR},
  year={2023}
}
