
DiffMatch: Visual-Language Guidance Makes Better Semi-supervised Change Detector

Code for DiffMatch paper: DiffMatch: Visual-Language Guidance Makes Better Semi-supervised Change Detector.

Overview

Change Detection (CD) aims to identify pixels with semantic changes between images. However, annotating massive numbers of images at the pixel level is labor-intensive and costly, especially for multi-temporal images, which require pixel-wise comparisons by human experts. Given the excellent zero-shot and open-vocabulary performance of vision-language models (VLMs) with prompt-based reasoning, it is promising to leverage VLMs for better CD under limited labeled data. In this paper, we propose a VLM guidance-based semi-supervised CD method, namely DiffMatch. The core idea of DiffMatch is to synthesize free change labels with VLMs to provide additional supervision signals for unlabeled data. However, almost all current VLMs are designed for single-temporal images and cannot be directly applied to bi- or multi-temporal images. Motivated by this, we first propose a VLM-based mixed change event generation (CEG) strategy that yields pseudo labels for unlabeled CD data. Since the additional supervision signals provided by these VLM-driven pseudo labels may conflict with the pseudo labels from the consistency regularization paradigm (e.g., FixMatch), we propose a dual projection head to disentangle the different signal sources. Further, we explicitly decouple the semantic representations of the bi-temporal images through two auxiliary segmentation decoders, which are also guided by the VLM. Finally, to help the model capture change representations more adequately, we introduce metric-aware supervision via a feature-level contrastive loss in the auxiliary branches.

Extensive experiments show the advantage of DiffMatch. For instance, DiffMatch improves the FixMatch baseline by +5.3 $IoU^c$ on WHU-CD and by +2.4 $IoU^c$ on LEVIR-CD with 5% labels, and it requires only 5% to 10% of the labels to achieve performance similar to supervised methods. In addition, our CEG strategy, in an unsupervised manner, far surpasses state-of-the-art (SOTA) unsupervised CD methods (e.g., IoU improves from 18.8% to 46.3% on the LEVIR-CD dataset).
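As an illustration of the metric-aware supervision mentioned above, a distance-based contrastive loss on bi-temporal features can be sketched as follows. This is a minimal NumPy sketch under assumed notation (function name, margin value, and feature shapes are illustrative, not the actual DiffMatch implementation, which applies the loss in the auxiliary branches):

```python
import numpy as np

def contrastive_change_loss(feat_a, feat_b, change_mask, margin=2.0):
    """Pull bi-temporal features together on unchanged pixels and
    push them apart (up to a margin) on changed pixels.

    feat_a, feat_b: (C, H, W) feature maps of the two epochs (hypothetical shapes).
    change_mask:    (H, W) binary mask, 1 = changed.
    """
    # Per-pixel Euclidean distance between the two feature maps.
    dist = np.sqrt(((feat_a - feat_b) ** 2).sum(axis=0))
    unchanged = change_mask == 0
    changed = change_mask == 1
    # Unchanged pixels: penalize any feature distance.
    loss_u = (dist[unchanged] ** 2).mean() if unchanged.any() else 0.0
    # Changed pixels: penalize distances smaller than the margin.
    loss_c = (np.clip(margin - dist[changed], 0.0, None) ** 2).mean() if changed.any() else 0.0
    return loss_u + loss_c
```

The margin term keeps the loss from pushing changed-pixel features apart without bound, a common design choice in contrastive metric learning.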

We evaluate DiffMatch on two change detection datasets (LEVIR-CD and WHU-CD), where it achieves substantial gains over previous semi-supervised methods.

If you find DiffMatch useful in your research, please consider citing:

@article{li2024diffmatch,
  title={DiffMatch: Visual-Language Guidance Makes Better Semi-supervised Change Detector},
  author={Li, Kaiyu and Cao, Xiangyong and Deng, Yupeng and Meng, Deyu},
  journal={arXiv preprint arXiv:2405.04788},
  year={2024}
}

Getting Started

Environment

Create a conda environment:

conda create -n diffmatch python=3.7.13
conda activate diffmatch

Install the required pip packages:

pip install -r requirements.txt

Pre-Trained Backbones

ResNet-50 | ResNet-101 | Xception-65

├── ./pretrained
    ├── resnet50.pth
    ├── resnet101.pth
    └── xception.pth

Dataset

Please modify the dataset path in the configuration files.

├── [Your WHU-CD/LEVIR-CD Path]
    ├── A
    ├── B
    └── label
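As a quick sanity check of the layout above, pre-event (`A`), post-event (`B`), and change-label images can be paired by filename, since bi-temporal CD datasets share filenames across the three directories. This is a minimal sketch (the helper name is hypothetical; the actual data loading is driven by the configuration files):

```python
import os

def pair_cd_samples(root):
    """Pair pre-event (A), post-event (B) images and change labels by filename."""
    names = sorted(os.listdir(os.path.join(root, "A")))
    pairs = []
    for name in names:
        a = os.path.join(root, "A", name)
        b = os.path.join(root, "B", name)
        lbl = os.path.join(root, "label", name)
        # Keep only samples present in all three directories.
        if os.path.isfile(b) and os.path.isfile(lbl):
            pairs.append((a, b, lbl))
    return pairs
```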

Off-line generation of VLM-guided pseudo labels

We provide the generated pseudo labels in the gen_cd_label and gen_seg_label directories (Download), and you can skip this step. If you want to reproduce our results step by step, you can refer to the following:

APE

APE is a vision-language model that performs open-vocabulary detection and segmentation. We directly use the released APE-D checkpoint to infer the roughly defined categories house, building, road, grass, tree, water, using the following command:

# As an example, generate pre-event pseudo labels for the WHU-CD dataset.
python demo/demo_lazy.py --config-file configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO_GQA_PhraseCut_Flickr30k/ape_deta/ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4_1080k.py --input data/WHU-CD-256/A/*.png --output APE_output/whu-cd_pseudo-label_ape_prob/A/ --confidence-threshold 0.2 --text-prompt 'house,building,road,grass,tree,water' --with-sseg --opts train.init_checkpoint=model_final.pth model.model_vision.select_box_nums_for_evaluation=500 model.model_vision.text_feature_bank_reset=True

Before executing the above command, please make sure that you have successfully set up the APE environment. Please refer here to build APE's inference environment; we highly recommend using Docker.

After inference with APE, use the following commands to execute the Change Event Generation (CEG) strategy:

# Execute instance-level CEG strategy
python scripts/gen_cd_map_json.py
# Execute Mixed CEG strategy
python scripts/gen_cd_map.py
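Conceptually, CEG derives a change pseudo-label by comparing the VLM's single-temporal semantic predictions for the two epochs: pixels whose predicted category differs are marked as changed. The following is a minimal, hypothetical sketch of that idea (the actual scripts also incorporate instance-level cues and confidence handling, and the `ignore_index` convention is an assumption):

```python
import numpy as np

def change_event_generation(seg_a, seg_b, ignore_index=255):
    """Derive a binary change pseudo-label from two semantic class maps.

    seg_a, seg_b: integer class maps predicted by the VLM for the
    pre-event and post-event image, respectively.
    """
    seg_a = np.asarray(seg_a)
    seg_b = np.asarray(seg_b)
    # A pixel is a change event if its predicted category differs
    # between the two temporal images.
    change = (seg_a != seg_b).astype(np.uint8)
    # Mark pixels as ignored where either prediction is unreliable.
    invalid = (seg_a == ignore_index) | (seg_b == ignore_index)
    change[invalid] = ignore_index
    return change
```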

Training

To launch a training job, please run:

python experiments.py --exp EXP_ID --run RUN_ID
# e.g. EXP_ID=47; RUN_ID=0 for DiffMatch on LEVIR-CD with 5% labels

It will automatically generate the relevant config files in configs/generated/ and start the corresponding training job.

For more information on the available experiments and runs, please refer to def generate_experiment_cfgs(exp_id) in experiments.py.

The training log, tensorboard, checkpoints, and debug images are stored in exp/.

Framework Structure

The following list provides the most relevant files of DiffMatch's implementation:

Acknowledgements

DiffMatch is based on SemiVL, UniMatch, APE, and MMSegmentation. We thank their authors for making the source code publicly available.

