
vitae-transformer / remote-sensing-rvsa


The official repo for [TGRS'22] "Advancing Plain Vision Transformer Towards Remote Sensing Foundation Model"

License: MIT License

Python 100.00%
deep-learning foundation-model object-detection remote-sensing self-supervised-learning semantic-segmentation transfer-learning

remote-sensing-rvsa's Introduction

Advancing Plain Vision Transformer Towards Remote Sensing Foundation Model

Di Wang, Qiming Zhang, Yufei Xu, Jing Zhang, Bo Du, Dacheng Tao and Liangpei Zhang

Updates | Introduction | Results & Models | Usage


Current applications

ViTAE: Please see ViTAE-Transformer;

VSA: Please see ViTAE-VSA;

Matting: Please see ViTAE-Transformer for matting;

Remote Sensing Pretraining: Please see ViTAE-Transformer-Remote-Sensing;

Updates

2023.10.18

RVSA was selected as a Highly Cited Paper!

2023.03.18

ViTAE-B + RVSA helped us win the championship of the "High Resolution SAR Image Coastal Aquaculture Farm Segmentation" track in the 5th Gaofen Challenge (Team: TNT). News

2023.01.18

Our models have been supported by LuoJiaNET, please refer to RS-Vision-Foundation-Models for more details.

2022.11.21

The early-access version is available! TGRS link

2022.11.15

The arXiv version has been updated! arXiv link

2022.11.06

The paper has been accepted by IEEE TGRS!

2022.10.11

The code, configs and training logs for finetuning on semantic segmentation are released!

2022.10.09

The code, configs and training logs for finetuning on object detection are released!

2022.10.08

The code for pretraining and for finetuning on classification is released!

2022.09.19

The code and training logs of VSA, which is the foundation of our RVSA, have been released.

Introduction

This repository contains the code, models and test results for the paper "Advancing Plain Vision Transformer Towards Remote Sensing Foundation Model".

We resort to plain vision transformers with about 100M parameters and make the first attempt to build large vision models customized for remote sensing (RS) tasks. To handle the large image sizes and arbitrarily oriented objects in RS images, we propose a new rotated varied-size window attention (RVSA) to substitute the original full attention. RVSA significantly reduces the computational cost and memory footprint while learning better object representations by extracting rich context from the generated diverse windows.
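
To make the idea concrete, below is a minimal, heavily simplified PyTorch sketch (not the official implementation) of how per-window scale, shift, and rotation parameters could transform the default window grid before key/value features are sampled. The function name and the identity-valued parameters are assumptions for illustration; in RVSA these parameters are predicted by small learned layers on the window features.

import torch
import torch.nn.functional as F

def rvsa_sample(feat, win=7):
    """Toy sketch: sample per-window key/value features from scaled, shifted,
    rotated windows. Identity transforms are used here for illustration."""
    B, C, H, W = feat.shape
    nh, nw = H // win, W // win
    n = nh * nw

    # centres of the default windows in normalised [-1, 1] coordinates
    ys = (torch.arange(nh, dtype=torch.float32) + 0.5) * win / H * 2 - 1
    xs = (torch.arange(nw, dtype=torch.float32) + 0.5) * win / W * 2 - 1
    cy, cx = torch.meshgrid(ys, xs)
    centres = torch.stack([cx, cy], dim=-1).view(1, n, 1, 2)

    # offsets of a win x win sampling grid inside one window, also normalised
    oy = (torch.arange(win, dtype=torch.float32) - (win - 1) / 2) * 2 / H
    ox = (torch.arange(win, dtype=torch.float32) - (win - 1) / 2) * 2 / W
    gy, gx = torch.meshgrid(oy, ox)
    offsets = torch.stack([gx, gy], dim=-1).view(1, 1, win * win, 2)

    # per-window transform parameters (identity here; learned in the real model)
    scale = torch.ones(B, n, 1, 2)
    shift = torch.zeros(B, n, 1, 2)
    angle = torch.zeros(B, n)

    cos, sin = angle.cos(), angle.sin()
    rot = torch.stack([cos, -sin, sin, cos], dim=-1).view(B, n, 2, 2)

    # rotate and rescale the in-window offsets, then move them to the (shifted) window centre
    pts = (offsets * scale) @ rot.transpose(-1, -2) + centres + shift

    grid = pts.view(B, n * win * win, 1, 2)
    sampled = F.grid_sample(feat, grid, align_corners=False)
    return sampled.view(B, C, n, win * win)  # per-window K/V features

kv = rvsa_sample(torch.randn(2, 768, 28, 28), win=7)
print(kv.shape)  # torch.Size([2, 768, 16, 49])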

Fig.1 - The pipeline of pretraining and finetuning.

Fig.2 - The structure and block of the adopted plain vision transformer, and the proposed RVSA.

Results and Models

Pretraining

MillionAID

| Pretrain | Backbone | Input size | Params (M) | Pretrained model |
|----------|----------|------------|------------|------------------|
| MAE | ViT-B | 224 × 224 | 86 | Weights |
| MAE | ViTAE-B | 224 × 224 | 89 | Weights |

Object Detection

DOTA-V1.0 Single-Scale

| Method | Pretrain | Backbone | Lr schd | mAP | Config | Log | Model |
|--------|----------|----------|---------|-----|--------|-----|-------|
| Oriented R-CNN | MAE | ViT-B + RVSA | 1x | 78.75 | Config | Log | Model |
| Oriented R-CNN | MAE | ViT-B + RVSA $^\Diamond$ | 1x | 78.61 | Config | Log | Model |
| Oriented R-CNN | MAE | ViTAE-B + RVSA | 1x | 78.96 | Config | Log | Model |
| Oriented R-CNN | MAE | ViTAE-B + RVSA $^\Diamond$ | 1x | 78.99 | Config | Log | Model |

DOTA-V1.0 Multi-Scale

| Method | Pretrain | Backbone | Lr schd | mAP | Config | Log | Model |
|--------|----------|----------|---------|-----|--------|-----|-------|
| Oriented R-CNN | MAE | ViT-B + RVSA | 1x | 81.01 | Config | Log | Model |
| Oriented R-CNN | MAE | ViT-B + RVSA $^\Diamond$ | 1x | 80.80 | Config | Log | Model |
| Oriented R-CNN | MAE | ViTAE-B + RVSA | 1x | 81.24 | Config | Log | Model |
| Oriented R-CNN | MAE | ViTAE-B + RVSA $^\Diamond$ | 1x | 81.18 | Config | Log | Model |

DIOR-R

| Method | Pretrain | Backbone | Lr schd | mAP | Config | Log | Model |
|--------|----------|----------|---------|-----|--------|-----|-------|
| Oriented R-CNN | MAE | ViT-B + RVSA | 1x | 70.67 | Config | Log | Model |
| Oriented R-CNN | MAE | ViT-B + RVSA $^\Diamond$ | 1x | 70.85 | Config | Log | Model |
| Oriented R-CNN | MAE | ViTAE-B + RVSA | 1x | 70.95 | Config | Log | Model |
| Oriented R-CNN | MAE | ViTAE-B + RVSA $^\Diamond$ | 1x | 71.05 | Config | Log | Model |

Scene Classification

| Pretrain | Backbone | UCM-55 | AID-28 | AID-55 | NWPU-19 | NWPU-28 |
|----------|----------|--------|--------|--------|---------|---------|
| MAE | ViT-B + RVSA | 99.70 | 96.92 | 98.33 | 93.79 | 95.49 |
| | | Model | Model | Model | Model | Model |
| MAE | ViT-B + RVSA $^\Diamond$ | 99.58 | 96.86 | 98.44 | 93.74 | 95.45 |
| | | Model | Model | Model | Model | Model |
| MAE | ViTAE-B + RVSA | 99.56 | 97.03 | 98.48 | 93.93 | 95.69 |
| | | Model | Model | Model | Model | Model |
| MAE | ViTAE-B + RVSA $^\Diamond$ | 99.50 | 97.01 | 98.50 | 93.92 | 95.66 |
| | | Model | Model | Model | Model | Model |

Semantic Segmentation

ISPRS Potsdam

| Method | Pretrain | Backbone | Crop size | Lr schd | OA | Config | Log | Model |
|--------|----------|----------|-----------|---------|----|--------|-----|-------|
| UperNet | MAE | ViT-B + RVSA | 512 × 512 | 160k | 90.60 | Config | Log | Model |
| UperNet | MAE | ViT-B + RVSA $^\Diamond$ | 512 × 512 | 160k | 90.77 | Config | Log | Model |
| UperNet | MAE | ViTAE-B + RVSA | 512 × 512 | 160k | 91.22 | Config | Log | Model |
| UperNet | MAE | ViTAE-B + RVSA $^\Diamond$ | 512 × 512 | 160k | 91.15 | Config | Log | Model |

iSAID

| Method | Pretrain | Backbone | Crop size | Lr schd | mIOU | Config | Log | Model |
|--------|----------|----------|-----------|---------|------|--------|-----|-------|
| UperNet | MAE | ViT-B + RVSA | 896 × 896 | 160k | 63.76 | Config | Log | Model |
| UperNet | MAE | ViT-B + RVSA $^\Diamond$ | 896 × 896 | 160k | 63.85 | Config | Log | Model |
| UperNet | MAE | ViTAE-B + RVSA | 896 × 896 | 160k | 63.48 | Config | Log | Model |
| UperNet | MAE | ViTAE-B + RVSA $^\Diamond$ | 896 × 896 | 160k | 64.49 | Config | Log | Model |

LoveDA

| Method | Pretrain | Backbone | Crop size | Lr schd | mIOU | Config | Log | Model |
|--------|----------|----------|-----------|---------|------|--------|-----|-------|
| UperNet | MAE | ViT-B + RVSA | 512 × 512 | 160k | 51.95 | Config | Log | Model |
| UperNet | MAE | ViT-B + RVSA $^\Diamond$ | 512 × 512 | 160k | 51.95 | Config | Log | Model |
| UperNet | MAE | ViTAE-B + RVSA | 512 × 512 | 160k | 52.26 | Config | Log | Model |
| UperNet | MAE | ViTAE-B + RVSA $^\Diamond$ | 512 × 512 | 160k | 52.44 | Config | Log | Model |

Usage

Environment:

  • Python 3.8.5
  • PyTorch 1.9.0+cu111
  • torchvision 0.10.0+cu111
  • timm 0.4.12
  • mmcv-full 1.4.1

Pretraining & Finetuning-Classification

Pretraining (8 × A100 GPUs, 3~5 days)

  1. Preparing MillionAID: Download the MillionAID dataset. Here we use the previous train_labels.txt and valid_labels.txt from RSP, which contain labels. However, since the pretraining is unsupervised, labels are not required: users can simply record the image names and revise the corresponding code in MAEPretrain_SceneClassification/util/datasets.py (class MillionAIDDataset); a minimal sketch is given after the pretraining command below.

  2. Pretraining: take ViT-B as an example (batch size: 2048 = 8 × 256)

python -m torch.distributed.launch --nproc_per_node 8 --master_port 10000 main_pretrain.py \
--dataset 'millionAID' --model 'mae_vit_base_patch16' \
--batch_size 256 --epochs 1600 --warmup_epochs 40 \
--input_size 224 --mask_ratio 0.75 \
--blr 1.5e-4  --weight_decay 0.05 --gpu_num 8 \
--output_dir '../mae-main/output/'
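
As a reference for step 1, here is a minimal, hypothetical label-free dataset class in the spirit of MillionAIDDataset; the class name, the file-extension filter, and the dummy label are assumptions, since unsupervised MAE pretraining only needs the images themselves.

import os
from PIL import Image
from torch.utils.data import Dataset

class MillionAIDUnlabeled(Dataset):
    """Hypothetical label-free variant of MillionAIDDataset for MAE pretraining."""
    def __init__(self, root, transform=None):
        # record image names only; no labels are needed for unsupervised pretraining
        self.paths = sorted(
            os.path.join(root, f) for f in os.listdir(root)
            if f.lower().endswith(('.jpg', '.jpeg', '.png', '.tif'))
        )
        self.transform = transform

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        img = Image.open(self.paths[idx]).convert('RGB')
        if self.transform is not None:
            img = self.transform(img)
        return img, 0  # dummy label, ignored by the MAE reconstruction loss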

Note: pad the convolutional kernels of the PCM in the pretrained ViTAE-B with convertK1toK3.py before finetuning.
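
Use the provided convertK1toK3.py for this step; purely as an illustration of the idea, a hedged sketch of such a conversion is shown below, where the checkpoint filenames and the 'PCM'-based key filter are assumptions.

import torch
import torch.nn.functional as F

# Illustrative sketch only: zero-pad 1x1 PCM convolution kernels to 3x3
# so the finetuning model can load the pretrained checkpoint.
ckpt = torch.load('checkpoint-1599.pth', map_location='cpu')
state = ckpt['model'] if 'model' in ckpt else ckpt
for name, w in state.items():
    if 'PCM' in name and w.dim() == 4 and w.shape[-2:] == (1, 1):
        state[name] = F.pad(w, (1, 1, 1, 1))  # original weight sits at the 3x3 centre
torch.save(ckpt, 'checkpoint-1599-k3.pth')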

  3. Linear probing: an example of evaluating the pretrained ViT-B on UCM-55
CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node 1 --master_port 10000 main_linprobe.py \
--dataset 'ucm' --model 'vit_base_patch16' \
--batch_size 256 --epochs 100 --warmup_epochs 10 \
--blr 1e-1  --weight_decay 0 --tag 0 \
--finetune '../mae-main/output/millionAID_224/1600_0.75_0.00015_0.05_2048/checkpoint-1599.pth'

Finetuning evaluation for pretraining & Finetuning-Classification

For instance, finetuning ViTAE-B + RVSA on NWPU-28

CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node 1 --master_port 20000 main_finetune.py \
--dataset 'nwpu' --model 'vitae_nc_base_win_rvsa' --input_size 224 --postfix 'sota' \
--batch_size 64 --epochs 200 --warmup_epochs 5 \
--blr 1e-3  --weight_decay 0.05 --split 28 --tag 0 --exp_num 1 \
--finetune '../mae-main/output/mae_vitae_base_pretrn/millionAID_224/1600_0.75_0.00015_0.05_2048/checkpoint-1599-transform-no-average.pth'

Finetuning-Detection & Finetuning-Segmentation

Since we use OBBDetection and MMSegmentation to implement the detection and segmentation models, we only provide the necessary config and backbone files. Both main frameworks are available in RSP:

git clone https://github.com/ViTAE-Transformer/ViTAE-Transformer-Remote-Sensing.git

For installation and dataset preparation, please refer to OBBDetection-installation and MMSegmentation-installation, respectively.

Then put these files into the corresponding folders.

For convenience, we preserve the relative paths so that users can locate the files.

For example, put ./Object Detection/mmdet/models/backbones/vit_win_rvsa_v3_wsz7.py into ViTAE-Transformer-Remote-Sensing/Object Detection/mmdet/models/backbones.

Training-Detection

First, cd ./Object Detection

Then, we provide several examples. For instance,

Training Oriented R-CNN with a ViT-B + RVSA backbone on the DOTA-V1.0 multi-scale detection dataset with 2 GPUs:

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 --master_port=40000 tools/train.py \
configs/obb/oriented_rcnn/vit_base_win/faster_rcnn_orpn_our_rsp_vit-base-win-rvsa_v3_wsz7_fpn_1x_dota10_ms_lr1e-4_ldr75_dpr15.py \
--launcher 'pytorch' --options 'find_unused_parameters'=True

Training Oriented R-CNN with a ViTAE-B + RVSA $^\Diamond$ backbone on the DIOR-R detection dataset with 1 GPU:

CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=1 --master_port=40001 tools/train.py \
configs/obb/oriented_rcnn/vit_base_win/faster_rcnn_orpn_our_rsp_vitae-nc-base-win-rvsa_v3_kvdiff_wsz7_fpn_1x_dior_lr1e-4_ldr75_dpr10.py \
--launcher 'pytorch' --options 'find_unused_parameters'=True

Inference-Detection

Predicting and saving the detection maps using ViT-B + RVSA $^\Diamond$ on the DOTA-V1.0 single-scale detection dataset:

CUDA_VISIBLE_DEVICES=0 python tools/test.py configs/obb/oriented_rcnn/vit_base_win/faster_rcnn_orpn_our_rsp_vit-base-win-rvsa_v3_kvdiff_wsz7_fpn_1x_dota10_lr1e-4_ldr75_dpr15.py \
../OBBDetection/work_dirs/faster/faster_rcnn_orpn_our_rsp_vit-base-win-rvsa_v3_kvdiff_wsz7_fpn_1x_dota10_lr1e-4_ldr75_dpr15/latest.pth \
--format-only --show-dir work_dirs/save/faster/display/faster_rcnn_orpn_our_rsp_vit-base-win-rvsa_v3_kvdiff_wsz7_fpn_1x_dota10_lr1e-4_ldr75_dpr15 \
--options save_dir='work_dirs/save/faster/full_det/faster_rcnn_orpn_our_rsp_vit-base-win-rvsa_v3_kvdiff_wsz7_fpn_1x_dota10_lr1e-4_ldr75_dpr15' nproc=1

Evaluating the detection maps predicted by ViTAE-B + RVSA on the DIOR-R dataset:

CUDA_VISIBLE_DEVICES=0 python tools/test.py configs/obb/oriented_rcnn/vit_base_win/faster_rcnn_orpn_our_rsp_vitae-nc-base-win-rvsa_v3_wsz7_fpn_1x_dior_lr1e-4_ldr75_dpr10.py \
../OBBDetection/work_dirs/faster/faster_rcnn_orpn_our_rsp_vitae-nc-base-win-rvsa_v3_wsz7_fpn_1x_dior_lr1e-4_ldr75_dpr10/latest.pth \
--out work_dirs/save/faster/full_det/faster_rcnn_orpn_our_rsp_vitae-nc-base-win-rvsa_v3_wsz7_fpn_1x_dior_lr1e-4_ldr75_dpr10/det_result.pkl --eval 'mAP' \
--show-dir work_dirs/save/faster/display/faster_rcnn_orpn_our_rsp_vitae-nc-base-win-rvsa_v3_wsz7_fpn_1x_dior_lr1e-4_ldr75_dpr10

Note: the paths for the saved maps and outputs should be created before evaluating on the DIOR-R testing set (see the snippet below).
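
For instance, one way to create them (using the DIOR-R output paths from the command above) is:

import os

for d in ['work_dirs/save/faster/full_det/faster_rcnn_orpn_our_rsp_vitae-nc-base-win-rvsa_v3_wsz7_fpn_1x_dior_lr1e-4_ldr75_dpr10',
          'work_dirs/save/faster/display/faster_rcnn_orpn_our_rsp_vitae-nc-base-win-rvsa_v3_wsz7_fpn_1x_dior_lr1e-4_ldr75_dpr10']:
    os.makedirs(d, exist_ok=True)  # create nested output directories if missing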

Training & Evaluation-Segmentation

cd ./Semantic Segmentation

Training and evaluating UperNet with a ViT-B + RVSA backbone on the Potsdam dataset:

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 --master_port=30000 tools/train.py \
configs/vit_base_win/upernet_vit_base_win_rvsa_v3_512x512_160k_potsdam_rgb_dpr10_lr6e5_lrd90_ps16_class5_ignore5.py \
--launcher 'pytorch' --cfg-options 'find_unused_parameters'=True

Note: when training on LoveDA, please add --no-validate.

Inference on the LoveDA dataset for online evaluation using UperNet with a ViTAE-B + RVSA $^\Diamond$ backbone:

CUDA_VISIBLE_DEVICES=0 python tools/test.py configs/vit_base_win/upernet_vitae_nc_base_rvsa_v3_kvdiff_wsz7_512x512_160k_loveda_dpr10_lr6e5_lrd90_ps16.py \
../mmsegmentation-master/work_dirs/upernet_vitae_nc_base_rvsa_v3_kvdiff_wsz7_512x512_160k_loveda_dpr10_lr6e5_lrd90_ps16/latest.pth \
--format-only --eval-options imgfile_prefix="work_dirs/display/upernet_vitae_nc_base_rvsa_v3_kvdiff_wsz7_512x512_160k_loveda_dpr10_lr6e5_lrd90_ps16/result" \
--show-dir work_dirs/display/upernet_vitae_nc_base_rvsa_v3_kvdiff_wsz7_512x512_160k_loveda_dpr10_lr6e5_lrd90_ps16/rgb

When finetuning with more than one GPU for detection or segmentation, please use nn.SyncBatchNorm in the NormalCell of the ViTAE models (see the example below).
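
A minimal illustration of one way to do this, using PyTorch's built-in converter on an already built module; the toy module below is only a stand-in for the ViTAE backbone.

import torch.nn as nn

# convert_sync_batchnorm replaces every BatchNorm layer in a module tree with
# SyncBatchNorm; the same call can be applied to the ViTAE backbone after it is built.
toy = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU())
toy = nn.SyncBatchNorm.convert_sync_batchnorm(toy)
print(toy)  # the BatchNorm2d is now a SyncBatchNorm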

Citation

If this repo is useful for your research, please consider citing:

@ARTICLE{rvsa,
  author={Wang, Di and Zhang, Qiming and Xu, Yufei and Zhang, Jing and Du, Bo and Tao, Dacheng and Zhang, Liangpei},
  journal={IEEE Transactions on Geoscience and Remote Sensing}, 
  title={Advancing Plain Vision Transformer Toward Remote Sensing Foundation Model}, 
  year={2023},
  volume={61},
  number={},
  pages={1-15},
  doi={10.1109/TGRS.2022.3222818}
 }

@ARTICLE{rsp,
  author={Wang, Di and Zhang, Jing and Du, Bo and Xia, Gui-Song and Tao, Dacheng},
  journal={IEEE Transactions on Geoscience and Remote Sensing}, 
  title={An Empirical Study of Remote Sensing Pretraining}, 
  year={2023},
  volume={61},
  number={},
  pages={1-20},
  doi={10.1109/TGRS.2022.3176603}
}

Statement

This project is released under the MIT license. For any other questions, please contact di.wang at gmail.com.

References

The code of the pretraining & scene classification part is mainly derived from MAE.

Relevant Projects

[1] An Empirical Study of Remote Sensing Pretraining, IEEE TGRS, 2022 | Paper | Github
     Di Wang, Jing Zhang, Bo Du, Gui-Song Xia and Dacheng Tao

remote-sensing-rvsa's People

Contributors

dotwang


remote-sensing-rvsa's Issues

Clarification about installation for segmentation

Hi, thank you for an amazing model. I'm trying to install the model over the clean mmsegmentation installed from their repo.

Then put these files into corresponding folders.

I've put all of the files from Semantic Segmentation folder to the root of my mmsegmentation installation. Should I do pip install -v -e . again after I added the new files from your repo to mmsegmentation folder?

Also, please indicate where to put the files from Semantic Segmentation/mmcv_custom. Do we just leave them at the mmsegmentation folder root, or should we install mmcv from source?

KeyError: 'OrientedRCNN is not in the models registry'


I have run the inference command using the following command:
python image_demo.py demo/demo.jpg \
configs/obb/oriented_rcnn/vit_base_win/faster_rcnn_orpn_our_rsp_vitae-nc-base-win-rvsa_v3_wsz7_fpn_1x_dota10_ms_lr1e-4_ldr75_dpr15.py \
checkpoints/vitae_rvsa_new.pth \
--device cpu

It uses the image_demo.py file from OBBDetection.
I am using MMCV version 1.6.2 and Python version 3.9. How do I resolve this error?

Pretrained Model Weights

Hello, could you provide a Baidu Netdisk or Google Drive version of the pretrained model weights? The files shared on OneDrive can no longer be opened or downloaded, possibly because the sharing limit was reached or the Microsoft service is unstable.

Unable to reproduce the training/evaluation results for object detection: oriented_rcnn from the OBBDetection repo reproduces fine, but vitae-rvsa_dota10_ms gives unsatisfactory results

As the title says, following how ViTAE is used for object detection, the OBBDetection environment has to be set up first. I set up OBBDetection, prepared the DOTA-V1.0 dataset, and ran oriented_rcnn's faster_rcnn_orpn_r50_ms_rr_dota10.py for training and evaluation, obtaining the results in the screenshot below.
(screenshot: ResNet OBB results)
The model performs well on DOTA-V1.0 object detection, which shows that the OBBDetection environment is installed correctly.

Then I moved the files from the Object Detection folder of remote-sensing-rvsa to the corresponding locations in the OBBDetection directory. When I tried to run faster_rcnn_orpn_our_rsp_vitae-nc-base-win-rvsa_v3_wsz7_fpn_1x_dota10_ms.py following the command line in this project's README, backbone/init.py reported errors about "Swin" and several other models not being found. After deleting these models from __init__.py, training and evaluation ran normally, but the results are as shown in the screenshot below,
(screenshot: RVSA results)
and the results in the paper cannot be reproduced.

1) Installing OBBDetection first and then copying this project's code files and models should be the correct workflow. How should the missing model files in backbone/init.py be handled?
2) The project is quite old now; could the reproduction steps in the README be described in more detail?
3) What could cause this incorrect reproduction result?

Loading pretrained weights for ViTAE_NC_Win_RVSA_V3_WSZ7

When I load vitae-b-checkpoint-1599-transform-no-average.pth, the following error occurs. The dataset I am using is Potsdam.
Error(s) in loading state_dict for ViTAE_NC_Win_RVSA_V3_WSZ7:
size mismatch for pos_embed: copying a param with shape torch.Size([1, 197, 768]) from checkpoint, the shape in current model is torch.Size([1, 1024, 768]).

def init_weights(self, pretrained=None):
    """Initialize the weights in backbone.

    Args:
        pretrained (str, optional): Path to pre-trained weights.
            Defaults to None.
    """
    pretrained = pretrained or self.pretrained

    def _init_weights(m):
        if isinstance(m, nn.Linear):
            trunc_normal_(m.weight, std=.02)
            if isinstance(m, nn.Linear) and m.bias is not None:
                nn.init.constant_(m.bias, 0)
        elif isinstance(m, nn.LayerNorm):
            nn.init.constant_(m.bias, 0)
            nn.init.constant_(m.weight, 1.0)

    if isinstance(pretrained, str):
        self.apply(_init_weights)
        logger = get_root_logger()
        print(f"load from {pretrained}")
        checkpoint = _load_checkpoint(self.pretrained, logger=logger, map_location='cpu')
        if 'state_dict' in checkpoint:
            state_dict = checkpoint['state_dict']
        elif 'model' in checkpoint:
            state_dict = checkpoint['model']
        else:
            state_dict = checkpoint
        self.load_state_dict(state_dict, False)
    elif pretrained is None:
        self.apply(_init_weights)
    else:
        raise TypeError('pretrained must be a str or None')

LoveDA dataset training problem

The paper states that the train and val sets of LoveDA are combined for training, but the downloaded test set has no annotations. How do you use the test set for evaluation?

I reproduced the code with an accuracy of only 54

The original code version is quite old, so I ported the code to the newer mmrotate version. I loaded the weights you provided and it ran fine; the result was an accuracy of 68 on the validation set and 54 on the test set.

I don't know where the problem is. The weight file loads smoothly and I have also checked the configuration file parameters, but I still can't tell what went wrong. If there were a fundamental problem with my reproduction, the final result should be close to 0, not as high as 54.

dataset_type = 'DOTADataset'
data_root = '/data/facias/DOTA/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)

angle_version = 'le90'
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='RResize', img_scale=(1024, 1024)),
    dict(
        type='RRandomFlip',
        flip_ratio=[0.25, 0.25, 0.25],
        direction=['horizontal', 'vertical', 'diagonal'],
        version=angle_version),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]

test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1024, 1024),
        flip=False,
        transforms=[
            dict(type='RResize', img_scale=(1024, 1024)),
            dict(type='RRandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img'])
        ])
]

data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'train_split/labelTXt/',
        img_prefix=data_root + 'train_split/images/',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'val_split/labelTxt/',
        img_prefix=data_root + 'val_split/images/',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        test_mode=True,   # set to True if the test set has no annotations
        ann_file=data_root + 'test_split/images/',
        img_prefix=data_root + 'test_split/images/',
        pipeline=test_pipeline))

model = dict(
    type='OrientedRCNN',
    backbone=dict(
        type='ViT_Win_RVSA_V3_WSZ7',
        img_size=1024,
        embed_dim=768,
        depth=12,
        num_heads=12,
        mlp_ratio=4,
        qkv_bias=True,
        drop_rate=0.,
        attn_drop_rate=0.,
        drop_path_rate=0.15,
        use_abs_pos_emb=True),
    neck=dict(
        type='FPN',
        in_channels=[768, 768, 768, 768],
        out_channels=256,
        num_outs=5),
    rpn_head=dict(
        type='OrientedRPNHead',
        in_channels=256,
        feat_channels=256,
        version=angle_version,
        anchor_generator=dict(
            type='AnchorGenerator',
            scales=[8],
            ratios=[0.5, 1.0, 2.0],
            strides=[4, 8, 16, 32, 64]),
        bbox_coder=dict(
            type='MidpointOffsetCoder',
            angle_range=angle_version,
            target_means=[0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
            target_stds=[1.0, 1.0, 1.0, 1.0, 0.5, 0.5]),
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(
            type='SmoothL1Loss', beta=0.1111111111111111, loss_weight=1.0)),
    roi_head=dict(
        type='OrientedStandardRoIHead',
        bbox_roi_extractor=dict(
            type='RotatedSingleRoIExtractor',
            roi_layer=dict(
                type='RoIAlignRotated',
                out_size=7,
                sample_num=2,
                clockwise=True),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        bbox_head=dict(
            type='RotatedShared2FCBBoxHead',
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=15,
            bbox_coder=dict(
                type='DeltaXYWHAOBBoxCoder',
                angle_range=angle_version,
                norm_factor=None,
                edge_swap=True,
                proj_xy=True,
                target_means=(.0, .0, .0, .0, .0),
                target_stds=(0.1, 0.1, 0.2, 0.2, 0.1)),
            reg_class_agnostic=True,
            loss_cls=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
            loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0))),
    train_cfg=dict(
        rpn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.7,
                neg_iou_thr=0.3,
                min_pos_iou=0.3,
                match_low_quality=True,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=256,
                pos_fraction=0.5,
                neg_pos_ub=-1,
                add_gt_as_proposals=False),
            allowed_border=0,
            pos_weight=-1,
            debug=False),
        rpn_proposal=dict(
            nms_pre=2000,
            max_per_img=2000,
            nms=dict(type='nms', iou_threshold=0.8),
            min_bbox_size=0),
        rcnn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.5,
                neg_iou_thr=0.5,
                min_pos_iou=0.5,
                match_low_quality=False,
                iou_calculator=dict(type='RBboxOverlaps2D'),
                ignore_iof_thr=-1),
            sampler=dict(
                type='RRandomSampler',
                num=512,
                pos_fraction=0.25,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
            pos_weight=-1,
            debug=False)),
    test_cfg=dict(
        rpn=dict(
            nms_pre=2000,
            max_per_img=2000,
            nms=dict(type='nms', iou_threshold=0.8),
            min_bbox_size=0),
        rcnn=dict(
            nms_pre=2000,
            min_bbox_size=0,
            score_thr=0.05,
            nms=dict(iou_thr=0.1),
            max_per_img=2000)))
# evaluation
evaluation = dict(interval=1, metric='mAP')
# optimizer
optimizer = dict(type='SGD', lr=0.0025, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 3,
    step=[8, 11])
runner = dict(type='EpochBasedRunner', max_epochs=12)
checkpoint_config = dict(interval=1)

# yapf:disable
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook')
    ])
# yapf:enable

dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]

# disable opencv multithreading to avoid system being overloaded
opencv_num_threads = 0
# set multi-process start method as `fork` to speed up the training
mp_start_method = 'fork'

Problem about vit_win_rvsa_kvdiff_wsz7.py-def calc_rel_pos_spatial

q[:, :, sp_idx:] torch.Size([2, 12, 1280, 64])

r_q = q[:, :, sp_idx:].reshape(B, n_head, q_h, q_w, dim) # B, H, qwh, qww, C
RuntimeError: shape '[2, 12, 7, 7, 64]' is invalid for input of size 1966080

When I don't use RVSA and just use plain window attention,
vit_win_rvsa_kvdiff_wsz7.py line 182 throws this error.

I found that the error occurs because the Q in QKV is not split into windows.
I hope this error can be corrected.

Images of different sizes

Can this handle non-standard-size remote sensing images, or does every image have to be resized to the same standard size?

ORCNN required?

Attempting to use this with detectron2. Training metrics look good, but it completely fails in evaluation. Currently debugging, but is there any reason why this couldn't work with a regular RPN and ROI heads instead of oriented ones?

How to compute mIoU from the inference results

Hello, I would like to compute mIoU and related metrics between the inference results and the ground truth. I am currently using the weights pretrained on Potsdam to run inference on the Potsdam validation set, and then computing the metrics with ../mmseg/core/evaluation/metrics.py. The result is {'aAcc': array(0.04249801), 'IoU': array([0.06536244, 0.00681113, 0.0036782 , 0.02882337, 0.00158446]), 'Acc': array([0.34386185, 0.01245181, 0.00750425, 0.05191632, 0.00163203])}, which seems far from the OA of 91.1 on Potsdam that you reported, and I am not sure how to resolve this. Looking forward to your reply.
Below is my inference code:
image_root = "/data/user5/potsdam/img_dir/val"
ann_root = "/data/user5/potsdam/ann_dir/val"
image_list = os.listdir(image_root)
device = "cuda" if torch.cuda.is_available() else "cpu"
config = "../configs/vit_base_win/upernet_vitae_nc_base_rvsa_v3_kvdiff_wsz7_512x512_160k_potsdam_rgb_dpr10_lr6e5_lrd90_ps16_class5_ignore5.py"
checkpoint = "../pretrain_model/potsdam/vitae_rvsa_kvdiff.pth"
seg_model = init_segmentor(config, checkpoint, device=device)
num_classes = 5
ignore_index = 5
results = []
labels = []
for image in image_list:
    # print(image)
    image_path = os.path.join(image_root, image)
    label_path = os.path.join(ann_root, image)
    label = cv2.imread(label_path, 0)
    _, masks = inference_segmentor(seg_model, image_path)
    # print(masks[0])
    copy_masks = masks[0]
    results.append(copy_masks)
    labels.append(label)

ret_metrics = eval_metrics(results, labels, num_classes, ignore_index, metrics='mIoU')
print(ret_metrics)

Problem with the value angle transformation in the code

sampling_angle_v = self.sampling_angles_v(x)
sampling_angle_v = sampling_angle_k.reshape(num_predict_total, 1, window_num_h, window_num_w)
Shouldn't this code be:
sampling_angle_v = self.sampling_angles_v(x)
sampling_angle_v = sampling_angle_v.reshape(num_predict_total, 1, window_num_h, window_num_w)

[Question] When can I see the config files?

Thank you for your contribution of deep learning research to the remote sensing field.
I'm very happy to find your research on self-supervised learning in remote sensing.

But I would like to know about your config files for detection on the DOTA and DIOR datasets.
Regards,
Kevin Cha.

Pretrained models

Is there a pretrained model that can be used directly to extract features from remote sensing images? My idea is to extract features and then fuse them with other features for a regression task of my own.

Pretrained models

Is there a backbone file for the MAE-pretrained ViT-B? I could only find the backbone files with RVSA added.
Thanks!

Accuracy discrepancy when running inference with the potsdam_vitae_rvsa_kvidff weights

Hello, when I run inference with the potsdam_vitae_rvsa_kvidff.pth weights provided in the RVSA repository, the results differ slightly; the config file is unmodified except for data_root. My results are as follows (screenshot).
The results in the log from the RVSA repository are "aAcc": 0.9115, "mIoU": 0.8307, "mAcc": 0.9005, "mFscore": 0.9061, "mPrecision": 0.9124, "mRecall": 0.9005; all metrics differ by about 0.3%, and I am not sure whether the two results can be considered aligned.
My test set was created by splitting '2_Ortho_RGB.zip' and '5_Labels_all.zip' with the official mmseg script, giving 2016 test patches of 512x512. Although I suspect this is related to the test-set split, the accuracy basically matches when I run inference with the rsp_r50 weights. ViTAE-Transformer/RSP#15
I don't really understand the cause of this and look forward to your reply.

mmcv version problem

Hello, mmcv has been upgraded to the latest version, but the code in mmcv_custom in your repository is still written for an old mmcv version. Could you update the code?

Problem: MillionAID dataset

May I ask the authors: in the official download of the MillionAID dataset used in the paper, the train folder contains only 10,000 images (1.82 GB), which does not match the claimed one million images of MillionAID. How many images were actually used for training and testing in the experiments?

Looking forward to your reply, thank you.

Problems in the use of the pretrained model

Hello, first of all, thank you for your amazing results, and thanks for providing the code!
After changing the network model and using the pretrained model (I tried both the ViT-B and ViTAE-B pretrained models) for training, many parameters cannot be matched, resulting in a drop in the final mAP (lower than with the unmodified network structure).
I wonder whether the decrease in mAP is caused by changing the network structure or by the missing pretrained weight parameters, and whether it is necessary to redo pretraining on MillionAID after changing the network structure. Because of the large time and hardware cost of pretraining, I have not tried it.

UCM dataset train/test split.

Could you please provide the txt file that splits the UCM dataset, or is there an official way to split the dataset?

Question about finetuning ViT-B

I used your mmseg training code from https://github.com/ViTAE-Transformer/RSP
to finetune ViT-B and reproduce your results. While finetuning on Potsdam and checking the training log, I found a problem: the intermediate test results differ from yours. Did you change any hyperparameters? P.S. I use exactly the same environment as the one you list, and Potsdam is processed with the RSP version of mmseg. Part of my log is below (screenshot).
This is your log (screenshot).
The difference lies in the impervious_surface class. I suspect it is the reduce_zero_label parameter for training, but training errors out after I modify it, so I would like to ask which hyperparameters you changed, or whether it is some other problem.

Model pth

Hi,

May I ask what are vit_rvsa.pth, vitae_rvsa.pth, vit_rvsa_kvdiff.pth, vitae_rvsa_kvdiff.pth?
At first, I thought the 'Model' column of each downstream task contained checkpoints for training the model on different datasets, but it turns out that they are the same across different tasks and datasets.

Thank you.

load pretrained backbone weights?

Hello, if possible, could you provide an example of how to load the pretrained weights of, for example, the ViTAE-B backbone from the pretraining model trained on the MillionAID dataset? (file: vitae-b-checkpoint-1599-transform-no-average.pth)

Also, can I expect the size of the computed feature array to be [1, 768] for an input image of size [1, 3, 224, 224]?

Thank you very much

[resume error] loaded state dict has a different number of parameter groups

First of all, thanks for your great work! I have already reproduced the training on mmrotate (version 0.3.4).

  File "D:\Programs\Python\mmlabseries\lib\site-packages\torch\optim\optimizer.py", line 140, in load_state_dict
    raise ValueError("loaded state dict has a different number of "
ValueError: loaded state dict has a different number of parameter groups

When I want to continue training from the latest.pth checkpoint, it throws this error. I debugged and found the state shown in the screenshots.

How can I fix it?

Questions about labels for the MillionAID dataset

Hello. This is very valuable work. I have a question: in the MillionAID dataset used in this paper, only 10,000 images have classification labels, while the remaining 0.99 million images do not (perhaps I simply did not find them). Where can I find the classification labels for the remaining 0.99 million images?
