
sinamalakouti / cddmsl


[BMVC2023] This is the official repo for Semi-Supervised Domain Generalization for Object Detection via Language-Guided Feature Alignment.

License: MIT License

Python 95.34% C++ 0.96% Cuda 3.49% Shell 0.18% CMake 0.03%
bmvc2023 conference-paper domain-adaptation domain-generalization object-detection semi-supervised-learning vision-language-model

cddmsl's Introduction

Semi-Supervised Domain Generalization for Object Detection via Language-Guided Feature Alignment (CDDMSL)
Sina Malakouti and Adriana Kovashka


Please contact Sina Malakouti at sem238(at)pitt(dot)edu or siinamalakouti(at)gmail(dot)com for any questions or more information.

arXiv | Official BMVC Proceeding | Video | Poster | Supplement | BMVC Project Page

This repo will be updated soon!

Methodology

Results

Real-to-Artistic Generalization

Adverse-Weather Generalization

Setup

To install the project, please follow the installation instructions of RegionCLIP and Detectron2.

Datasets

Real-to-Artistic

For this task, we used PASCAL-VOC as the labeled domain and one of Clipart, Comic, or Watercolor as the unlabeled domain. For instance, if PASCAL-VOC and Clipart are used as the labeled and unlabeled source domains, then Comic and Watercolor are the target domains in the DG experiment.
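The leave-domains-out protocol described above can be enumerated mechanically. The following is a minimal Python sketch for illustration only, not code from this repo; the function name `real_to_artistic_settings` is hypothetical.

```python
from typing import Dict, List

LABELED_SOURCE = "PASCAL-VOC"
ARTISTIC_DOMAINS = ["Clipart", "Comic", "Watercolor"]


def real_to_artistic_settings() -> List[Dict[str, object]]:
    """Enumerate the semi-supervised DG settings (hypothetical helper).

    PASCAL-VOC is always the labeled source; each artistic domain takes a turn
    as the unlabeled source, and the remaining artistic domains are the targets.
    """
    settings = []
    for unlabeled in ARTISTIC_DOMAINS:
        targets = [d for d in ARTISTIC_DOMAINS if d != unlabeled]
        settings.append(
            {"labeled": LABELED_SOURCE, "unlabeled": unlabeled, "targets": targets}
        )
    return settings


if __name__ == "__main__":
    for s in real_to_artistic_settings():
        print(f"labeled={s['labeled']}  unlabeled={s['unlabeled']}  targets={s['targets']}")
```

The first setting it produces (Clipart unlabeled; Comic and Watercolor as targets) is the example given in the text.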

Please see here for downloading the dataset.

Please see the following files for dataset creation and/or modification:

  • detectron2/data/datasets/pascal_voc.py
  • detectron2/data/datasets/builtin.py
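Clipart is annotated with all 20 PASCAL-VOC classes, whereas Comic and Watercolor cover only six of them (bicycle, bird, car, cat, dog, person), so dataset registration has to account for a different class list per domain. The following Python sketch illustrates only that bookkeeping; it is not taken from this repo's `pascal_voc.py` or `builtin.py`, and `DOMAIN_CLASSES`, `classes_for`, and `voc_indices` are hypothetical names.

```python
from typing import Dict, Tuple

# The 20 PASCAL-VOC class names, in the order used by Detectron2's pascal_voc.py.
VOC_CLASSES: Tuple[str, ...] = (
    "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
    "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
)

# Comic and Watercolor are annotated only for this subset of the VOC classes.
SIX_CLASSES = frozenset({"bicycle", "bird", "car", "cat", "dog", "person"})
_SIX_IN_VOC_ORDER = tuple(c for c in VOC_CLASSES if c in SIX_CLASSES)

# Hypothetical per-domain class lists (keys are illustrative, not repo names).
DOMAIN_CLASSES: Dict[str, Tuple[str, ...]] = {
    "voc": VOC_CLASSES,
    "clipart": VOC_CLASSES,
    "comic": _SIX_IN_VOC_ORDER,
    "watercolor": _SIX_IN_VOC_ORDER,
}


def classes_for(domain: str) -> Tuple[str, ...]:
    """Class names to register for `domain` (hypothetical helper)."""
    return DOMAIN_CLASSES[domain.lower()]


def voc_indices(domain: str) -> Tuple[int, ...]:
    """Map each class of `domain` to its index in the 20-way VOC label space.

    A detector trained on PASCAL-VOC predicts VOC indices, so evaluating it on a
    6-class target set needs a mapping like this (or a 20-class registration).
    """
    return tuple(VOC_CLASSES.index(c) for c in classes_for(domain))
```

With Detectron2 installed, such a list could be passed as the `class_names` argument of `detectron2.data.datasets.register_pascal_voc(name, dirname, split, year, class_names)`; whether this repo restricts the target classes this way is an assumption.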

Adverse-Weather

Please download Cityscapes and Foggy Cityscapes, as well as BDD100K. Note that for BDD100K, we only used the validation set.

Please see the following files for dataset creation and/or modification:

  • detectron2/data/datasets/cityscapes.py
  • detectron2/data/datasets/builtin.py (for BDD100K, we used the COCO format to register the data)
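Registering BDD100K through COCO presumably means converting its validation labels to a COCO-style JSON and calling Detectron2's `register_coco_instances(name, metadata, json_file, image_root)`. BDD100K labels are not distributed in COCO format, so a prior conversion step is assumed. The sketch below only assembles those four arguments; the dataset name, directory layout, and JSON file name are hypothetical placeholders, not the paths used by this repo.

```python
import os
from typing import Any, Dict, Tuple


def bdd100k_val_coco_args(root: str) -> Tuple[str, Dict[str, Any], str, str]:
    """Build (name, metadata, json_file, image_root) for register_coco_instances.

    Assumes (hypothetically) that the BDD100K validation labels were already
    converted to COCO format and stored under `root`.
    """
    name = "bdd100k_val"  # hypothetical dataset name
    # Left empty: Detectron2's COCO loader reads thing_classes from the JSON
    # categories when the dataset is loaded.
    metadata: Dict[str, Any] = {}
    json_file = os.path.join(root, "labels", "bdd100k_val_coco.json")  # hypothetical
    image_root = os.path.join(root, "images", "100k", "val")  # hypothetical
    return name, metadata, json_file, image_root
```

With Detectron2 installed, usage would be along the lines of `register_coco_instances(*bdd100k_val_coco_args("datasets/bdd100k"))`, imported from `detectron2.data.datasets`.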

Pre-trained files

Please download the pre-trained parameters from Google Drive (this will be updated soon to cover all parameters).

The Google Drive folder contains the checkpoints required for both training and evaluation. Some of the available parameters are:

  • RegionCLIP pretrained parameters
  • Text Embedding (VOC)
  • Text Embedding (Cityscapes)
  • Vision-to-Language Transformer
  • Real-to-Artistic Parameters
  • Adverse-Weather parameters

Training

  • An example of training for real-to-artistic generalization is available in faster_rcnn_voc.sh.
  • An example of training for adverse-weather generalization is available in faster_rcnn_city.sh.

Inference

During training, we evaluate on all source and target domains. For inference only, set the weights of the modules and add the --eval-only flag in the bash script.
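Where `--eval-only` goes in the bash command matters if the training script uses Detectron2's `default_argument_parser`, which is the usual Detectron2 convention but an assumption for this repo. That parser collects trailing `KEY VALUE` config overrides (such as `MODEL.WEIGHTS path`) with `argparse.REMAINDER`, so any flag placed after the first override is swallowed into the overrides instead of being parsed. The sketch reproduces only that subset of the parser to show the effect; `cfg.yaml` and `model_final.pth` are placeholders.

```python
import argparse


def make_parser() -> argparse.ArgumentParser:
    """A reduced stand-in for detectron2.engine.default_argument_parser.

    Only the options relevant here are reproduced; the real parser has more.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--config-file", default="", metavar="FILE")
    parser.add_argument("--eval-only", action="store_true", help="perform evaluation only")
    parser.add_argument("--num-gpus", type=int, default=1)
    # Trailing KEY VALUE pairs that override config entries, e.g. MODEL.WEIGHTS.
    parser.add_argument("opts", default=None, nargs=argparse.REMAINDER)
    return parser


if __name__ == "__main__":
    parser = make_parser()
    # Correct: the flag comes before the KEY VALUE overrides.
    good = parser.parse_args(
        ["--config-file", "cfg.yaml", "--eval-only", "MODEL.WEIGHTS", "model_final.pth"]
    )
    # Wrong: everything after the first override is swallowed into `opts`.
    bad = parser.parse_args(
        ["--config-file", "cfg.yaml", "MODEL.WEIGHTS", "model_final.pth", "--eval-only"]
    )
    print(good.eval_only, good.opts)  # True ['MODEL.WEIGHTS', 'model_final.pth']
    print(bad.eval_only, bad.opts)    # False ['MODEL.WEIGHTS', 'model_final.pth', '--eval-only']
```

In practice this means putting `--eval-only` next to `--config-file` in the bash script, before the weight overrides.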

ClipCap training

We have provided the pre-trained parameters for the ClipCap mapping network here. If you wish to run the pre-training yourself, please follow these steps:

  • Follow the instructions on ClipCap to install the project and download the COCO dataset.
  • Add RegionCLIP2CLIP.py to the ClipCap repository.
  • Replace the parse_coco.py in the main ClipCap repository with the one provided here. The only difference is that we rename some of the parameters of the RegionCLIP encoder so that their names match CLIP's naming format, which is required to train the mapping network successfully.
  • Then execute the following commands:
python parse_coco.py --clip_model_type RN50
python train.py --only_prefix --data ./data/coco/oscar_split_RN50_train.pkl --out_dir ./coco_train/ --mapping_type transformer --num_layers 8 --prefix_length 40 --prefix_length_clip 40 --is_rn
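The parameter renaming mentioned in the steps above amounts to rewriting state-dict key prefixes so that the RegionCLIP visual encoder's weights load where ClipCap expects CLIP's. A minimal Python sketch, assuming a plain mapping of names to tensors; the prefix pairs in `EXAMPLE_PREFIX_MAP` are hypothetical illustrations, not the mapping used in the provided parse_coco.py.

```python
from typing import Dict, Mapping, TypeVar

V = TypeVar("V")

# Hypothetical prefix pairs (RegionCLIP-style -> CLIP-style); check the real
# key names of both checkpoints before using anything like this.
EXAMPLE_PREFIX_MAP: Dict[str, str] = {
    "backbone.stem.": "visual.",
    "backbone.res2.": "visual.layer1.",
}


def rename_keys(state_dict: Mapping[str, V], prefix_map: Mapping[str, str]) -> Dict[str, V]:
    """Return a copy of `state_dict` with the first matching prefix rewritten.

    Longer prefixes are tried first so a specific rule wins over a general one;
    keys with no matching prefix are copied unchanged.
    """
    ordered = sorted(prefix_map.items(), key=lambda kv: len(kv[0]), reverse=True)
    renamed: Dict[str, V] = {}
    for key, value in state_dict.items():
        new_key = key
        for old, new in ordered:
            if key.startswith(old):
                new_key = new + key[len(old):]
                break
        if new_key in renamed:
            # Two source keys would collide after renaming; fail loudly.
            raise KeyError(f"two keys map to {new_key!r}")
        renamed[new_key] = value
    return renamed
```

With PyTorch, the input would typically come from `torch.load(path, map_location="cpu")`, and the result would be passed to `module.load_state_dict(...)`; keeping `strict=True` there surfaces any key that the mapping missed.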

Other Information

  • For training/inference of the RegionCLIP pre-trained model, please see here.

Citation

If you find this repo useful, please consider citing our paper:

@inproceedings{Malakouti_2023_BMVC,
author    = {Sina Malakouti and Adriana Kovashka},
title     = {Semi-Supervised Domain Generalization for Object Detection via Language-Guided Feature Alignment},
booktitle = {34th British Machine Vision Conference 2023, {BMVC} 2023, Aberdeen, UK, November 20-24, 2023},
publisher = {BMVA},
year      = {2023},
url       = {https://papers.bmvc2023.org/0444.pdf}
}

Acknowledgement

This repo is based on Detectron2 and RegionCLIP repositories.


cddmsl's Issues

Questions on the training of v2l.

Very nice work! One question: according to the paper, ClipCap is pre-trained on COCO captions using a frozen RegionCLIP. However, there may be domain gaps between COCO images and the datasets used in the paper, especially the stylized images. Do these gaps affect the pre-trained ClipCap? Also, it would be helpful to provide the complete configuration and code for training ClipCap with RegionCLIP.

problem about weights loading

Thanks for your great work! However, there are still some things I don't understand.
There is 'p = self.cfg.MODEL.PRE_TRAINED_RCLIP_PATH' in train_loop.py, so what is PRE_TRAINED_RCLIP_PATH? Does it refer to 'MODEL.WEIGHTS' in faster_rcnn_city.sh?
