
DiffMatch: Visual-Language Guidance Makes Better Semi-supervised Change Detector

Code for DiffMatch paper: DiffMatch: Visual-Language Guidance Makes Better Semi-supervised Change Detector.

Overview

Change Detection (CD) aims to identify pixels with semantic changes between images. However, annotating massive numbers of images at the pixel level is labor-intensive and costly, especially for multi-temporal images, which require pixel-wise comparisons by human experts. Given the excellent zero-shot and open-vocabulary performance of vision-language models (VLMs) with prompt-based reasoning, it is promising to leverage VLMs for better CD under limited labeled data. In this paper, we propose a VLM guidance-based semi-supervised CD method, namely DiffMatch. The core idea of DiffMatch is to synthesize free change labels with VLMs to provide additional supervision signals for unlabeled data. However, almost all current VLMs are designed for single-temporal images and cannot be directly applied to bi- or multi-temporal images. Motivated by this, we first propose a VLM-based mixed change event generation (CEG) strategy that yields pseudo labels for unlabeled CD data. Since the additional supervision signals provided by these VLM-driven pseudo labels may conflict with the pseudo labels from the consistency regularization paradigm (e.g., FixMatch), we propose a dual projection head to disentangle the different signal sources. Further, we explicitly decouple the semantic representations of the bi-temporal images through two auxiliary segmentation decoders, which are also guided by the VLM. Finally, to help the model capture change representations more adequately, we introduce metric-aware supervision via a feature-level contrastive loss in the auxiliary branches.

Extensive experiments show the advantage of DiffMatch. For instance, DiffMatch improves the FixMatch baseline by +5.3 $IoU^c$ on WHU-CD and by +2.4 $IoU^c$ on LEVIR-CD with 5% labels, and it requires only 5% to 10% of the labels to achieve performance similar to supervised methods. In addition, our CEG strategy, in an unsupervised manner, far surpasses state-of-the-art (SOTA) unsupervised CD methods (e.g., IoU improves from 18.8% to 46.3% on the LEVIR-CD dataset).
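As an illustration of the metric-aware supervision mentioned above, a distance-based contrastive loss on bi-temporal features can be sketched as follows. This is a minimal NumPy sketch under assumed notation (function name, margin value, and feature shapes are illustrative, not the actual DiffMatch implementation, which applies the loss in the auxiliary branches):

```python
import numpy as np

def contrastive_change_loss(feat_a, feat_b, change_mask, margin=2.0):
    """Pull bi-temporal features together on unchanged pixels and
    push them apart (up to a margin) on changed pixels.

    feat_a, feat_b: (C, H, W) feature maps of the two epochs (hypothetical shapes).
    change_mask:    (H, W) binary mask, 1 = changed.
    """
    # Per-pixel Euclidean distance between the two feature maps.
    dist = np.sqrt(((feat_a - feat_b) ** 2).sum(axis=0))
    unchanged = change_mask == 0
    changed = change_mask == 1
    # Unchanged pixels: penalize any feature distance.
    loss_u = (dist[unchanged] ** 2).mean() if unchanged.any() else 0.0
    # Changed pixels: penalize distances smaller than the margin.
    loss_c = (np.clip(margin - dist[changed], 0.0, None) ** 2).mean() if changed.any() else 0.0
    return loss_u + loss_c
```

The margin term keeps the loss from pushing changed-pixel features apart without bound, a common design choice in contrastive metric learning.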

We evaluate DiffMatch on two change detection datasets (LEVIR-CD and WHU-CD), where it achieves substantial gains over previous semi-supervised methods.

If you find DiffMatch useful in your research, please consider citing:

@article{li2024diffmatch,
  title={DiffMatch: Visual-Language Guidance Makes Better Semi-supervised Change Detector},
  author={Li, Kaiyu and Cao, Xiangyong and Deng, Yupeng and Meng, Deyu},
  journal={arXiv preprint arXiv:2405.04788},
  year={2024}
}

Getting Started

Environment

Create a conda environment:

conda create -n diffmatch python=3.7.13
conda activate diffmatch

Install the required pip packages:

pip install -r requirements.txt

Pre-Trained Backbones

ResNet-50 | ResNet-101 | Xception-65

├── ./pretrained
    ├── resnet50.pth
    ├── resnet101.pth
    └── xception.pth

Dataset

Please modify the dataset path in the configuration files.

├── [Your WHU-CD/LEVIR-CD Path]
    ├── A
    ├── B
    └── label
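As a quick sanity check of the layout above, pre-event (`A`), post-event (`B`), and change-label images can be paired by filename, since bi-temporal CD datasets share filenames across the three directories. This is a minimal sketch (the helper name is hypothetical; the actual data loading is driven by the configuration files):

```python
import os

def pair_cd_samples(root):
    """Pair pre-event (A), post-event (B) images and change labels by filename."""
    names = sorted(os.listdir(os.path.join(root, "A")))
    pairs = []
    for name in names:
        a = os.path.join(root, "A", name)
        b = os.path.join(root, "B", name)
        lbl = os.path.join(root, "label", name)
        # Keep only samples present in all three directories.
        if os.path.isfile(b) and os.path.isfile(lbl):
            pairs.append((a, b, lbl))
    return pairs
```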

Off-line generation of VLM-guided pseudo labels

We provide the generated pseudo labels in the gen_cd_label and gen_seg_label directories (Download), and you can skip this step. If you want to reproduce our results step by step, you can refer to the following:

APE

APE is a vision-language model that performs open-vocabulary detection and segmentation. We directly use the released APE-D checkpoint to infer the roughly defined categories house, building, road, grass, tree, water, using the following command:

# As an example, generate pre-event pseudo labels for the WHU-CD dataset.
python demo/demo_lazy.py --config-file configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO_GQA_PhraseCut_Flickr30k/ape_deta/ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4_1080k.py --input data/WHU-CD-256/A/*.png --output APE_output/whu-cd_pseudo-label_ape_prob/A/ --confidence-threshold 0.2 --text-prompt 'house,building,road,grass,tree,water' --with-sseg --opts train.init_checkpoint=model_final.pth model.model_vision.select_box_nums_for_evaluation=500 model.model_vision.text_feature_bank_reset=True

Before executing the above command, please make sure that you have successfully set up the APE environment. Please refer here to build APE's inference environment; we highly recommend using Docker.

After inference with APE, use the following commands to execute the Change Event Generation (CEG) strategy:

# Execute instance-level CEG strategy
python scripts/gen_cd_map_json.py
# Execute Mixed CEG strategy
python scripts/gen_cd_map.py
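Conceptually, CEG derives a change pseudo-label by comparing the VLM's single-temporal semantic predictions for the two epochs: pixels whose predicted category differs are marked as changed. The following is a minimal, hypothetical sketch of that idea (the actual scripts also incorporate instance-level cues and confidence handling, and the `ignore_index` convention is an assumption):

```python
import numpy as np

def change_event_generation(seg_a, seg_b, ignore_index=255):
    """Derive a binary change pseudo-label from two semantic class maps.

    seg_a, seg_b: integer class maps predicted by the VLM for the
    pre-event and post-event image, respectively.
    """
    seg_a = np.asarray(seg_a)
    seg_b = np.asarray(seg_b)
    # A pixel is a change event if its predicted category differs
    # between the two temporal images.
    change = (seg_a != seg_b).astype(np.uint8)
    # Mark pixels as ignored where either prediction is unreliable.
    invalid = (seg_a == ignore_index) | (seg_b == ignore_index)
    change[invalid] = ignore_index
    return change
```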

Training

To launch a training job, please run:

python experiments.py --exp EXP_ID --run RUN_ID
# e.g. EXP_ID=47; RUN_ID=0 for DiffMatch on LEVIR-CD with 5% labels

It will automatically generate the relevant config files in configs/generated/ and start the corresponding training job.

For more information on the available experiments and runs, please refer to def generate_experiment_cfgs(exp_id) in experiments.py.

The training log, tensorboard, checkpoints, and debug images are stored in exp/.

Framework Structure

The following list provides the most relevant files of DiffMatch's implementation:

Acknowledgements

DiffMatch is based on SemiVL, UniMatch, APE, and MMSegmentation. We thank their authors for making the source code publicly available.

