
vitae-transformer / vitae-transformer-remote-sensing


A comprehensive list [SAMRS@NeurIPS'23, RVSA@TGRS'22, RSP@TGRS'22] of our research works related to remote sensing, including papers, code, and citations. Note: The repo for [TGRS'22] "An Empirical Study of Remote Sensing Pretraining" has been moved to: https://github.com/ViTAE-Transformer/RSP

TeX 100.00%
remote-sensing deep-learning change-detection classification object-detection self-supervised-learning semantic-segmentation transfer-learning vision-transformer

vitae-transformer-remote-sensing's Introduction

ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond

Updates | Introduction | Statement

Current applications

Image Classification: Please see ViTAE-Transformer for image classification;

Object Detection: Please see ViTAE-Transformer for object detection;

Semantic Segmentation: Please see ViTAE-Transformer for semantic segmentation;

Animal Pose Estimation: Please see ViTAE-Transformer for animal pose estimation;

Matting: Please see ViTAE-Transformer for matting;

Remote Sensing: Please see ViTAE-Transformer for Remote Sensing;

Updates

09/04/2022

24/03/2022

  • The pretrained models for both ViTAE and ViTAEv2 are released. The code for downstream tasks is also provided for reference.

07/12/2021

  • The code is released!

19/10/2021

  • The paper is accepted by NeurIPS 2021! The code will be released soon!

06/08/2021

  • The paper is posted on arXiv! The code will be made publicly available once cleaned up.

Introduction

This repository contains the code, models, and test results for the paper ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias. The model combines reduction cells (RCs) and normal cells (NCs) to introduce scale invariance and locality into vision transformers. In ViTAEv2, we explore the use of window attention without shift operations to obtain a better balance between memory footprint, speed, and performance. We also stack the proposed RCs and NCs in a multi-stage manner to facilitate learning on other vision tasks, including detection, segmentation, and pose estimation.

Fig.1 - The details of RC and NC design in ViTAE.

Fig.2 - The multi-stage design of ViTAEv2.
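
For intuition, here is a minimal PyTorch-style sketch of the multi-stage idea described above: a convolutional "reduction" stage that downsamples and injects locality, followed by an attention-based "normal" block, stacked stage by stage. All class and parameter names are illustrative placeholders, not the repository's actual implementation (which uses window attention and additional components).

import torch
import torch.nn as nn

class ReductionCellSketch(nn.Module):
    """Illustrative stand-in for an RC: strided convolution downsamples and adds locality."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1)
        self.norm = nn.BatchNorm2d(out_ch)

    def forward(self, x):  # x: (B, C, H, W)
        return self.norm(self.conv(x))

class NormalCellSketch(nn.Module):
    """Illustrative stand-in for an NC: self-attention over the tokens of the current stage."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):  # x: (B, C, H, W)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)        # (B, H*W, C)
        out, _ = self.attn(tokens, tokens, tokens)
        tokens = self.norm(tokens + out)
        return tokens.transpose(1, 2).reshape(b, c, h, w)

# Multi-stage stacking: each stage halves the resolution, then refines with a normal cell.
stages = nn.Sequential(
    ReductionCellSketch(3, 64), NormalCellSketch(64),
    ReductionCellSketch(64, 128), NormalCellSketch(128),
)
print(stages(torch.randn(1, 3, 224, 224)).shape)     # torch.Size([1, 128, 56, 56])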

Statement

This project is for research purposes only. For any other questions, please contact yufei.xu at outlook.com or qmzhangzz at hotmail.com.

Citing ViTAE and ViTAEv2

@article{xu2021vitae,
  title={Vitae: Vision transformer advanced by exploring intrinsic inductive bias},
  author={Xu, Yufei and Zhang, Qiming and Zhang, Jing and Tao, Dacheng},
  journal={Advances in Neural Information Processing Systems},
  volume={34},
  year={2021}
}
@article{zhang2022vitaev2,
  title={ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond},
  author={Zhang, Qiming and Xu, Yufei and Zhang, Jing and Tao, Dacheng},
  journal={arXiv preprint arXiv:2202.10108},
  year={2022}
}

Other Links

Image Classification: See ViTAE for Image Classification

Object Detection: See ViTAE for Object Detection.

Semantic Segmentation: See ViTAE for Semantic Segmentation.

Animal Pose Estimation: See ViTAE for Animal Pose Estimation.

Matting: See ViTAE for Matting.

Remote Sensing: See ViTAE for Remote Sensing.

vitae-transformer-remote-sensing's People

Contributors

dotwang, winter-jon


vitae-transformer-remote-sensing's Issues

About the IMP weights on change detection

Hello, @DotWang. Your work is great. The results in your paper show that BIT with the IMP-ViTAEv2-S weights performs best, so I wonder whether the IMP pretrained weights for change detection will be released. Thank you very much.

ModuleNotFoundError: No module named 'mmdet.version'

Traceback (most recent call last):
  File "/home/dgx/workspace/cui/ViTAE/tools/train.py", line 13, in <module>
    from mmdet.apis import set_random_seed, train_detector
  File "/home/dgx/workspace/cui/ViTAE/mmdet/__init__.py", line 1, in <module>
    from .version import __version__, short_version
ModuleNotFoundError: No module named 'mmdet.version'

In mmdet/__init__.py, I found the code is written like this:

from .version import __version__, short_version

__all__ = ['__version__', 'short_version']

but .version is not a Python file; the .version file contains only one line:
2.2.0

KeyError: "EncoderDecoder: 'ViTAE_Window_NoShift_basic is not in the models registry'"

Hello author, while trying to reproduce your experiment on the Potsdam dataset from the paper, I ran the following command:
python -m torch.distributed.launch --nproc_per_node=1 --master_port=40001 tools/train.py configs/vitae_win/upernet_vitae_win_imp_window7_512x512_80k_potsdam.py --launcher 'pytorch',
but got KeyError: "EncoderDecoder: 'ViTAE_Window_NoShift_basic is not in the models registry'". Why does this happen, and how can I fix it?
I would also like to ask how to correctly reproduce your experiments.
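
For readers hitting this kind of registry error: the custom backbone class has to be imported (so that its register_module decorator actually runs) before the config is built, which usually means installing and using the mmseg copy shipped in this repository rather than a stock pip install. As a hedged illustration only, mmcv-style configs also support a custom_imports field; the module path below is a placeholder, not the repository's actual layout:

# Hypothetical lines to append to the config file. Replace the placeholder module
# path with wherever ViTAE_Window_NoShift_basic is defined in your checkout.
custom_imports = dict(
    imports=['mmseg_custom.models.vitae_win'],  # placeholder module path
    allow_failed_imports=False,
)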

About Reproducing Training of Remote Sensing Semantic Segmentation Models

Hello, I am interested in the remote sensing semantic segmentation part of this project. I have downloaded all the relevant Potsdam data and set up the runtime environment, but the steps to reproduce the full training are still unclear to me. Does the downloaded public dataset need preprocessing, i.e., do the images and label images need to be cut into small patches? If I am not doing distributed training, can I simply remove the distributed settings and train on a single card? Do you have a more detailed description of the training steps and of the training config files, so that we can reproduce the model training? Thank you.

Dataset processing issue

Thank you for proposing such an excellent model! While training on my own data with your framework, the test results for the 'forest land' class come out as IoU=0, Acc=0, Fscore=nan, precision=nan, which affects my evaluation. After debugging, it seems the 'forest land' label may not have been assigned during label preparation. How should I fix this?
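
As a debugging sketch related to the issue above (the path and class index are placeholders, and single-channel PNG label maps are assumed), one can count how often each label value actually occurs in the ground-truth masks; if the 'forest land' index never appears, the labels were not written correctly:

import glob
import numpy as np
from PIL import Image

FOREST_LAND_INDEX = 3                                           # placeholder class index
label_files = glob.glob('data/my_dataset/ann_dir/train/*.png')  # placeholder path

counts = np.zeros(256, dtype=np.int64)
for f in label_files:
    mask = np.array(Image.open(f))
    counts += np.bincount(mask.ravel(), minlength=256)

print({v: int(c) for v, c in enumerate(counts) if c > 0})       # pixels per label value
print('forest land pixels:', int(counts[FOREST_LAND_INDEX]))    # 0 means the class never appears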

Issues when testing with a single image

I would like to ask: when I run semantic segmentation testing with the ViTAEv2 model you provided, the given image fails at the assert N == H*W check in ReductionCell.py, while testing the same image with the other two weights produces results normally.
1. Does this ViTAEv2 model only support image sizes that are multiples of 2?
2. When I predict on a 1024×512 PNG image, the problem still occurs; a size mismatch also appears at outs.append(x.view(b, wh, wh, -1).permute(0, 3, 1, 2)) in ViTAE_Window_NoShift/base_model.py.

The weights I used are shown in the image below.
[image]

Thank you very much if you have time to answer.
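
When size-related assertions like the one above come from input dimensions that the backbone's downsampling or window size cannot divide evenly, a common workaround is to pad the input to a suitable multiple before inference and crop the prediction back afterwards. A hedged sketch (the divisor 32 is an assumption; check the model's actual stride and window settings):

import torch
import torch.nn.functional as F

def pad_to_multiple(img: torch.Tensor, multiple: int = 32):
    """Pad an image tensor (B, C, H, W) on the bottom/right to a multiple of `multiple`."""
    h, w = img.shape[-2:]
    pad_h = (multiple - h % multiple) % multiple
    pad_w = (multiple - w % multiple) % multiple
    padded = F.pad(img, (0, pad_w, 0, pad_h))   # (left, right, top, bottom)
    return padded, (h, w)                        # keep the original size to crop the prediction back

img = torch.randn(1, 3, 512, 1000)              # e.g. a 1000x512 tile
padded, orig_hw = pad_to_multiple(img)
print(padded.shape)                              # torch.Size([1, 3, 512, 1024])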

Where can I download the pretrained model weights?

Hello! During training, the following weights file cannot be found: VitAE_window/output/ViTAE_Window_NoShift_12_basic_stages4_14_224/epoch100/ViTAE_Window_NoShift_12_basic_stages4_14/default/ckpt.pth

Reproduce the SeCo DOTA result.

Hi~ I am currently working on some comparison experiments. When I fine-tune the official SeCo pretrained model (SeCo-1M) on the DOTA object detection task, the test-set mAP is much lower than the paper's (Table VIII, 70.07).

I strictly followed the experimental setup in the paper, but instead of OBBDetection I used mmrotate, and the difference is, I think, not that big.

Do you have any suggestions for reproduction? Thanks~

The mmrotate config I use is given below:

angle_version = 'le90'
dataset_type = 'DOTADataset'
data_root = '../DOTA'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='RResize', img_scale=(1024, 1024)),
    dict(
        type='RRandomFlip',
        flip_ratio=[0.25, 0.25, 0.25],
        direction=['horizontal', 'vertical', 'diagonal'],
        version='le90'),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1024, 1024),
        flip=False,
        transforms=[
            dict(type='RResize'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type='DOTADataset',
        ann_file=data_root + "/trainVal/annfiles",
        img_prefix=data_root + "/trainVal/images",
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations', with_bbox=True),
            dict(type='RResize', img_scale=(1024, 1024)),
            dict(
                type='RRandomFlip',
                flip_ratio=[0.25, 0.25, 0.25],
                direction=['horizontal', 'vertical', 'diagonal'],
                version='le90'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
        ]),
    val=dict(
        type='DOTADataset',
        ann_file=data_root + "/trainVal/annfiles",
        img_prefix=data_root + "/trainVal/images",
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1024, 1024),
                flip=False,
                transforms=[
                    dict(type='RResize'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='DefaultFormatBundle'),
                    dict(type='Collect', keys=['img'])
                ])
        ]),
    test=dict(
        type='DOTADataset',
        ann_file=data_root + "/test/annfiles",
        img_prefix=data_root + "/tests/images",
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1024, 1024),
                flip=False,
                transforms=[
                    dict(type='RResize'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='DefaultFormatBundle'),
                    dict(type='Collect', keys=['img'])
                ])
        ]))
evaluation = dict(interval=1, metric='mAP')
optimizer = dict(type='SGD', lr=0.005, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(
    type='Fp16OptimizerHook',
    distributed=False,
    grad_clip=dict(max_norm=35.0, norm_type=2))
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.3333333333333333,
    step=[8, 11])
runner = dict(type='EpochBasedRunner', max_epochs=12)
checkpoint_config = dict(interval=1)
log_config = dict(
    interval=50,
    hooks=[dict(type='TextLoggerHook'),
           dict(type='TensorboardLoggerHook')])
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
opencv_num_threads = 0
mp_start_method = 'fork'
model = dict(
    type='OrientedRCNN',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch',
        init_cfg=dict(
            type='Pretrained',
            checkpoint='../pretrain_checkpoint/SeCo1m.pth')),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5),
    rpn_head=dict(
        type='OrientedRPNHead',
        in_channels=256,
        feat_channels=256,
        version='le90',
        anchor_generator=dict(
            type='AnchorGenerator',
            scales=[8],
            ratios=[0.5, 1.0, 2.0],
            strides=[4, 8, 16, 32, 64]),
        bbox_coder=dict(
            type='MidpointOffsetCoder',
            angle_range='le90',
            target_means=[0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
            target_stds=[1.0, 1.0, 1.0, 1.0, 0.5, 0.5]),
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(
            type='SmoothL1Loss', beta=0.1111111111111111, loss_weight=1.0)),
    roi_head=dict(
        type='OrientedStandardRoIHead',
        bbox_roi_extractor=dict(
            type='RotatedSingleRoIExtractor',
            roi_layer=dict(
                type='RoIAlignRotated',
                out_size=7,
                sample_num=2,
                clockwise=True),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        bbox_head=dict(
            type='RotatedShared2FCBBoxHead',
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=15,
            bbox_coder=dict(
                type='DeltaXYWHAOBBoxCoder',
                angle_range='le90',
                norm_factor=None,
                edge_swap=True,
                proj_xy=True,
                target_means=(0.0, 0.0, 0.0, 0.0, 0.0),
                target_stds=(0.1, 0.1, 0.2, 0.2, 0.1)),
            reg_class_agnostic=True,
            loss_cls=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
            loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0))),
    train_cfg=dict(
        rpn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.7,
                neg_iou_thr=0.3,
                min_pos_iou=0.3,
                match_low_quality=True,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=256,
                pos_fraction=0.5,
                neg_pos_ub=-1,
                add_gt_as_proposals=False),
            allowed_border=0,
            pos_weight=-1,
            debug=False),
        rpn_proposal=dict(
            nms_pre=2000,
            max_per_img=2000,
            nms=dict(type='nms', iou_threshold=0.8),
            min_bbox_size=0),
        rcnn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.5,
                neg_iou_thr=0.5,
                min_pos_iou=0.5,
                match_low_quality=False,
                iou_calculator=dict(type='RBboxOverlaps2D'),
                ignore_iof_thr=-1),
            sampler=dict(
                type='RRandomSampler',
                num=512,
                pos_fraction=0.25,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
            pos_weight=-1,
            debug=False)),
    test_cfg=dict(
        rpn=dict(
            nms_pre=2000,
            max_per_img=2000,
            nms=dict(type='nms', iou_threshold=0.8),
            min_bbox_size=0),
        rcnn=dict(
            nms_pre=2000,
            min_bbox_size=0,
            score_thr=0.05,
            nms=dict(iou_thr=0.1),
            max_per_img=2000)))
work_dir = './seco_result'
auto_resume = False
gpu_ids = [0]
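
One thing worth double-checking when loading self-supervised weights this way (a hedged suggestion, not from the original thread): MoCo-style checkpoints such as SeCo often store the backbone under prefixes like encoder_q. or module.encoder_q., which init_cfg=dict(type='Pretrained', ...) will not match, so the backbone silently starts from random weights. A conversion sketch, assuming such a prefix; inspect your checkpoint's keys first:

import torch

ckpt = torch.load('../pretrain_checkpoint/SeCo1m.pth', map_location='cpu')
state = ckpt.get('state_dict', ckpt)

print(list(state.keys())[:5])   # inspect: do keys start with 'encoder_q.' / 'module.'?

# Strip a MoCo-style query-encoder prefix so keys line up with mmdet's ResNet
# ('conv1.weight', 'layer1.0.conv1.weight', ...). The prefix below is an assumption.
prefix = 'encoder_q.'
backbone = {k[len(prefix):]: v for k, v in state.items()
            if k.startswith(prefix) and 'fc' not in k}

torch.save({'state_dict': backbone}, '../pretrain_checkpoint/SeCo1m_backbone.pth')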

About MillionAID dataset

I downloaded the MillionAID dataset from the official homepage: MillionAID. The training set was found to have only 10K images. The test set is not labeled. May I know how the pre-training data in the paper was obtained?

Where is the train_labels_{}_{}.txt for scene recognition?

I am running the code of scene recognition and I have downloaded the AID, UCM, and NWPU datasets from their official webpages.

But there are no train_labels_{}_{}.txt files in these datasets. Where do the train_labels_{}_{}.txt files used in your code come from?

image
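
The exact split-file format is not documented in this thread, so the following is only a hedged sketch: AID/UCM/NWPU ship as one folder per class, and a label list with one "relative_path class_index" entry per line could be generated roughly like this (filenames, separator, and split ratio are assumptions, not the repository's definitive format):

import os
import random

DATA_ROOT = 'data/AID'        # placeholder: folder-per-class layout
TRAIN_RATIO = 0.2             # placeholder: e.g. a 2:8 train/test split

classes = sorted(d for d in os.listdir(DATA_ROOT)
                 if os.path.isdir(os.path.join(DATA_ROOT, d)))

train_lines, valid_lines = [], []
for idx, cls in enumerate(classes):
    images = sorted(os.listdir(os.path.join(DATA_ROOT, cls)))
    random.shuffle(images)
    n_train = int(len(images) * TRAIN_RATIO)
    for i, name in enumerate(images):
        line = f'{cls}/{name} {idx}\n'
        (train_lines if i < n_train else valid_lines).append(line)

with open('train_labels_20_1.txt', 'w') as f:    # placeholder filename pattern
    f.writelines(train_lines)
with open('valid_labels_20_1.txt', 'w') as f:
    f.writelines(valid_lines)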

questions about exp. of semantic seg.

Hi, thanks for your great work and codebase.

The batch size is 8 in the paper, and 4 in the config of Swin-T-IMP+UperNet.
And I did not find any description of num_gpu for the semantic seg. subsection.
In the README.md of semantic seg., the command:

python -m torch.distributed.launch --nproc_per_node=1 --master_port=40001 tools/train.py \
    configs/upernet/upernet_our_r50_512x512_80k_potsdam_epoch300.py \
    --launcher 'pytorch'

which seems to set num_gpus_per_node to 1? Or is your command meant for 2 single-GPU nodes with a batch_size of 4 each (2x4)?

Model training issue

Preprocessed with mmseg's native potsdam.py
Following #9, used potsdam_ori.py
Config: upernet_swin_tiny_patch4_window7_512x512_80k_potsdam
Parameters unchanged
batch_size = 8
model = dict(
    pretrained='checkpoint/upernet-rsp-swin-t-potsdam-latest.pth')
reduce_zero_label = True in both the pipeline and data settings

2023-03-10 00:13:21,947 - mmseg - INFO - Iter [80000/80000] lr: 7.500e-10, eta: 0:00:00, time: 0.378, data_time: 0.003, memory: 15450, decode.loss_ce: 0.1454, decode.acc_seg: 74.9332, aux.loss_ce: 0.0674, aux.acc_seg: 74.0879, loss: 0.2129

+--------------------+-------+-------+--------+-----------+--------+
| Class | IoU | Acc | Fscore | Precision | Recall |
+--------------------+-------+-------+--------+-----------+--------+
| impervious_surface | 82.19 | 92.91 | 90.23 | 87.69 | 92.91 |
| building | 91.04 | 97.03 | 95.31 | 93.64 | 97.03 |
| low_vegetation | 70.99 | 88.3 | 83.03 | 78.36 | 88.3 |
| tree | 73.7 | 83.01 | 84.86 | 86.8 | 83.01 |
| car | 81.17 | 89.05 | 89.61 | 90.17 | 89.05 |
| clutter | 0.0 | 0.0 | nan | nan | 0.0 |
+--------------------+-------+-------+--------+-----------+--------+
2023-03-10 00:13:54,226 - mmseg - INFO - Summary:
2023-03-10 00:13:54,226 - mmseg - INFO -
+-------+-------+-------+---------+------------+---------+
| aAcc | mIoU | mAcc | mFscore | mPrecision | mRecall |
+-------+-------+-------+---------+------------+---------+
| 86.86 | 66.52 | 75.05 | 88.61 | 87.33 | 75.05 |
+-------+-------+-------+---------+------------+---------+

F1 88.61
Could you tell me where the problem is?
Thanks :)

mmcv version issue

Thank you for proposing such an excellent model! When trying to use your framework, I found that the code no longer matches the current mmcv 2.x versions. Could you tell us which version you used at the time?

Problem reproducing Swin-T in scene classification

Hi, I am trying to follow your hyperparameters to reproduce the classification results, but I am seeing misclassification. I train on AID (2:8) using max_epochs=200, base_lr=5e-4, with the other settings as follows:
_base_ = [
    # '../_base_/models/swin_transformer/base_224.py',
    # "../_base_/datasets/ucmerced_landuse_bs64_swin_224.py",
    "../_base_/datasets/aid_bs64_autoaug.py",
    "../_base_/schedules/imagenet_bs64_adamw_swin.py",
    "../_base_/default_runtime.py",
]

# refer to SimMIM paper

ADJUST_FACTOR = 1.0
BATCH_SIZE = 64
BASE_LR = 5e-4 * ADJUST_FACTOR # todo: adjust.
WARMUP_LR = 5e-7 * ADJUST_FACTOR
MIN_LR = 5e-6 * ADJUST_FACTOR
NUM_GPUS = 1
DROP_PATH_RATE = 0.2
SCALE_FACTOR = 512.0
MAX_EPOCHS = 200

# model settings

model = dict(
    type="ImageClassifier",
    backbone=dict(
        type="SwinTransformer",
        # arch="base",
        arch="tiny",
        img_size=224,
        # drop_path_rate=0.1,  # DROP_PATH_RATE
        drop_path_rate=DROP_PATH_RATE,
    ),
    neck=dict(type="GlobalAveragePooling"),
    head=dict(
        type="LinearClsHead",
        num_classes=21,
        # in_channels=1024,
        in_channels=768,
        init_cfg=None,  # suppress the default init_cfg of LinearClsHead.
        loss=dict(type="LabelSmoothLoss", label_smooth_val=0.1, mode="original"),
        cal_acc=False,
    ),
    init_cfg=[
        dict(type="TruncNormal", layer="Linear", std=0.02, bias=0.0),
        dict(type="Constant", layer="LayerNorm", val=1.0, bias=0.0),
    ],
    train_cfg=dict(
        augments=[
            dict(type="BatchMixup", alpha=0.8, num_classes=21, prob=0.5),
            dict(type="BatchCutMix", alpha=1.0, num_classes=21, prob=0.5),
        ]
    ),
)

# optimizer

paramwise_cfg = dict(
    norm_decay_mult=0.0,
    bias_decay_mult=0.0,
    custom_keys={
        ".absolute_pos_embed": dict(decay_mult=0.0),
        ".relative_position_bias_table": dict(decay_mult=0.0),
    },
)

optimizer = dict(
    type="AdamW",
    # lr=1e-3 * 64 / 256,  # 5e-4 * 64 / 512, # 1e-3 * 64 / 256,
    # lr=1.25e-3 * 96 * 1 / 512.0,
    # BASE_LR * BATCH_SIZE * NUM_GPUS / 512.0,  # 1e-3 * 64 / 256,
    lr=BASE_LR * BATCH_SIZE * NUM_GPUS / SCALE_FACTOR,
    weight_decay=0.05,
    eps=1e-8,
    betas=(0.9, 0.999),
    paramwise_cfg=paramwise_cfg,
)
optimizer_config = dict(grad_clip=dict(max_norm=5.0))

# learning policy

lr_config = dict(
    policy="CosineAnnealing",
    # min_lr=2.5e-7,
    # by_epoch=False,  # todo: try
    by_epoch=False,
    # min_lr_ratio=(2.5e-7 * 96 * 1 / 512.0) / (1.25e-3 * 96 * 1 / 512.0),  # 1e-2,
    min_lr_ratio=(MIN_LR * BATCH_SIZE * NUM_GPUS / SCALE_FACTOR)
    / (BASE_LR * BATCH_SIZE * NUM_GPUS / SCALE_FACTOR),
    # min_lr=2.5e-7,  # MIN_LR,
    warmup="linear",
    # warmup_ratio=(2.5e-7 * 96 * 1 / 512.0) / (1.25e-3 * 96 * 1 / 512.0),  # 1e-3,
    warmup_ratio=(WARMUP_LR * BATCH_SIZE * NUM_GPUS / SCALE_FACTOR)
    / (BASE_LR * BATCH_SIZE * NUM_GPUS / SCALE_FACTOR),
    # warmup_lr=2.5e-7,  # WARMUP_LR,
    warmup_iters=20,  # todo: 0
    warmup_by_epoch=True,
)

checkpoint_config = dict(interval=MAX_EPOCHS // 10)
evaluation = dict(
    interval=MAX_EPOCHS // 10, metric="accuracy", save_best="auto"
)  # save the checkpoint with the highest accuracy
runner = dict(type="EpochBasedRunner", max_epochs=MAX_EPOCHS)

# data = dict(samples_per_gpu=96, workers_per_gpu=8,)
data = dict(samples_per_gpu=BATCH_SIZE, workers_per_gpu=8,)

# fp16 settings
fp16 = dict(loss_scale="dynamic")

Could you help me with this, or provide your training log?
Thanks!
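
For reference, the linear learning-rate scaling used in the config above works out to the following concrete values (just a sanity check of the arithmetic, nothing more):

BASE_LR, WARMUP_LR, MIN_LR = 5e-4, 5e-7, 5e-6
BATCH_SIZE, NUM_GPUS, SCALE_FACTOR = 64, 1, 512.0

lr        = BASE_LR   * BATCH_SIZE * NUM_GPUS / SCALE_FACTOR   # 6.25e-05
warmup_lr = WARMUP_LR * BATCH_SIZE * NUM_GPUS / SCALE_FACTOR   # 6.25e-08
min_lr    = MIN_LR    * BATCH_SIZE * NUM_GPUS / SCALE_FACTOR   # 6.25e-07
print(lr, warmup_lr, min_lr)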

About Labels of Million-AID Dataset

The original split of MillionAID is used for recognition. Our study is about pretraining, so we resplit the training and testing sets. The obtained training set is relatively large for transferring the pretrained weights to downstream tasks. All RSP pretrained weights are available at https://github.com/ViTAE-Transformer/ViTAE-Transformer-Remote-Sensing/blob/main/README.md

Thank you very much. Your work is very inspiring to us. We are doing some research in the field of remote sensing image pretraining and would like to use your work as a baseline. However, we found that the open-source MillionAID data has less annotated data than stated in your paper, so we raised this issue.

Originally posted by @pUmpKin-Co in #3 (comment)

Hello,
I wasn't able to understand what the conclusion is.

we resplit the training and testing sets.

I think annotation data is needed in order to use all the images and re-split them. Are labels provided for the test data? I mean, in my understanding, we need both "train_label.txt" and "valid_label.txt" to use MillionAID, but I don't know where to download them. I appreciate your help.

Semantic Segmentation: Large performance gap when reproducing on the Potsdam dataset

Without modifying the parameters of 'upernet_vitae_win_window7_512x512_80k_potsdam_epoch100.py', I reproduced the experiment twice, once with RGB + Label_all and once with IRRG + Label_all.

I processed the dataset with the tools/convert_datasets/potsdam.py script.

[image]

[image]

Config and model:
[image]

The resulting logs differ considerably from the released logs.

Did I process the Potsdam dataset incorrectly?

Couldn't reproduce the semantic segmentation experiment

I tried to simply evaluate the model, as the notebook 'Semantic Segmentation/demo/inference_demo.ipynb' does, but using the uploaded model instead of ResNet.
When providing the config and the RSP-ViTAEv2-S-E100 model in the notebook, init_segmentor does not work, because the config references data paths outside the repo.

I would be very happy if you could provide a notebook to reproduce that. :)

Best regards
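
For reference, single-image inference with the mmsegmentation 0.x API looks roughly like this; the config and checkpoint paths below are placeholders, and the data-root settings inside the config still have to point at directories that exist locally, which is exactly what the notebook run above trips over:

from mmseg.apis import inference_segmentor, init_segmentor

config_file = 'configs/vitae_win/upernet_vitae_win_rsp_window7_512x512_80k_potsdam.py'  # placeholder
checkpoint_file = 'rsp-vitaev2-s-e100-potsdam.pth'                                      # placeholder

model = init_segmentor(config_file, checkpoint_file, device='cuda:0')
result = inference_segmentor(model, 'demo/demo.png')    # placeholder image path
model.show_result('demo/demo.png', result, out_file='demo_pred.png', opacity=0.5)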

Problem reading labels

When testing and training, while reading the DOTA 1.0 data, what should be written in the json and pkl files respectively? Could you give an example? When running the test, it reports that split_config_json and ori_annfile_pkl are missing from the label folder. Could you reply soon? Thank you.

About downloading the pretrained models for change detection

Hello, I can't find the download links for pretrained models such as: RS_CLS_finetune/output/resnet_50_224/epoch120/millionAID_224_None/0.0005_0.05_192/resnet/100/ckpt.pth
Swin-Transformer-main/output/swin_tiny_patch4_window7_224/epoch120/swin_tiny_patch4_window7_224/default/ckpt.pth
...

change detection

When I run the example python eval.py
--backbone 'swin' --dataset 'levir' --mode 'rsp_300'
--path [model path], I get the following error:

Attempted to read a PyTorch file with version 3, but the maximum supported version for reading is 2

Which PyTorch version did you use? For change detection, I installed the environment following
https://github.com/likyoo/Siam-NestedUNet/blob/master/README.md
Requirements
Python 3.6

Pytorch 1.4

torchvision 0.5.0

other packages needed

pip install opencv-python tqdm tensorboardX sklearn

Could you help me analyze this? Thanks.
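
The error above typically means the checkpoint was written by a newer PyTorch (the 1.6+ zipfile format) than the PyTorch 1.4 that is reading it. A hedged workaround, not from the original thread: load the file once in an environment with a recent PyTorch and re-save it in the legacy serialization format, then load the re-saved file from the PyTorch 1.4 environment.

# Run this with a recent PyTorch (>= 1.6); the filename is a placeholder.
import torch

ckpt = torch.load('rsp_swin_cd_checkpoint.pth', map_location='cpu')
torch.save(ckpt, 'rsp_swin_cd_checkpoint_legacy.pth',
           _use_new_zipfile_serialization=False)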

Model registration issue

KeyError: 'swin is not in the models registry'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/media/user/volume/PycharmProjects/ViTAE-Transformer-Remote-Sensing-main/Semantic Segmentation/tools/train.py", line 234, in <module>
    return build_from_cfg(cfg, registry, default_args)
  File "/home/user/.conda/envs/pytorch/lib/python3.10/site-packages/mmcv/utils/registry.py", line 61, in build_from_cfg
    raise KeyError(
KeyError: 'swin is not in the models registry'

How should I debug this?

DIOR-R Benchmark question.

In your paper, the performance of your model on the DIOR-R dataset is shown in a table.
However, there is no information on whether this is single-scale or multi-scale.
I would like to know whether the reported DIOR-R performance uses the single-scale or the multi-scale setting.
Thank you.

image
