
vitae-transformer / vitae-transformer-remote-sensing


A comprehensive list [SAMRS@NeurIPS'23, RVSA@TGRS'22, RSP@TGRS'22] of our research works related to remote sensing, including papers, code, and citations. Note: The repo for [TGRS'22] "An Empirical Study of Remote Sensing Pretraining" has been moved to: https://github.com/ViTAE-Transformer/RSP

TeX 100.00%
remote-sensing deep-learning change-detection classification object-detection self-supervised-learning semantic-segmentation transfer-learning vision-transformer

vitae-transformer-remote-sensing's Introduction

ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond

Updates | Introduction | Statement

Current applications

Image Classification: Please see ViTAE-Transformer for image classification;

Object Detection: Please see ViTAE-Transformer for object detection;

Semantic Segmentation: Please see ViTAE-Transformer for semantic segmentation;

Animal Pose Estimation: Please see ViTAE-Transformer for animal pose estimation;

Matting: Please see ViTAE-Transformer for matting;

Remote Sensing: Please see ViTAE-Transformer for Remote Sensing;

Updates

09/04/2022

24/03/2022

  • The pretrained models for both ViTAE and ViTAEv2 are released. The code for downstream tasks is also provided for reference.

07/12/2021

  • The code is released!

19/10/2021

  • The paper is accepted by NeurIPS 2021! The code will be released soon!

06/08/2021

  • The paper is posted on arXiv! The code will be made publicly available once cleaned up.

Introduction

This repository contains the code, models, and test results for the paper ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias. The model combines reduction cells (RCs) and normal cells (NCs) to introduce scale invariance and locality into vision transformers. In ViTAEv2, we explore the use of window attention without shift operations to obtain a better balance between memory footprint, speed, and performance. We also stack the proposed RCs and NCs in a multi-stage manner to facilitate learning on other vision tasks, including detection, segmentation, and pose estimation.

Fig.1 - The details of RC and NC design in ViTAE.

Fig.2 - The multi-stage design of ViTAEv2.
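
For intuition, here is a minimal PyTorch-style sketch of the multi-stage idea described above: a convolutional "reduction" stage that downsamples and injects locality, followed by an attention-based "normal" block, stacked stage by stage. All class and parameter names are illustrative placeholders, not the repository's actual implementation (which uses window attention and additional components).

import torch
import torch.nn as nn

class ReductionCellSketch(nn.Module):
    """Illustrative stand-in for an RC: strided convolution downsamples and adds locality."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1)
        self.norm = nn.BatchNorm2d(out_ch)

    def forward(self, x):  # x: (B, C, H, W)
        return self.norm(self.conv(x))

class NormalCellSketch(nn.Module):
    """Illustrative stand-in for an NC: self-attention over the tokens of the current stage."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):  # x: (B, C, H, W)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)        # (B, H*W, C)
        out, _ = self.attn(tokens, tokens, tokens)
        tokens = self.norm(tokens + out)
        return tokens.transpose(1, 2).reshape(b, c, h, w)

# Multi-stage stacking: each stage halves the resolution, then refines with a normal cell.
stages = nn.Sequential(
    ReductionCellSketch(3, 64), NormalCellSketch(64),
    ReductionCellSketch(64, 128), NormalCellSketch(128),
)
print(stages(torch.randn(1, 3, 224, 224)).shape)     # torch.Size([1, 128, 56, 56])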

Statement

This project is for research purposes only. For any other questions, please contact yufei.xu at outlook.com or qmzhangzz at hotmail.com.

Citing ViTAE and ViTAEv2

@article{xu2021vitae,
  title={Vitae: Vision transformer advanced by exploring intrinsic inductive bias},
  author={Xu, Yufei and Zhang, Qiming and Zhang, Jing and Tao, Dacheng},
  journal={Advances in Neural Information Processing Systems},
  volume={34},
  year={2021}
}
@article{zhang2022vitaev2,
  title={ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond},
  author={Zhang, Qiming and Xu, Yufei and Zhang, Jing and Tao, Dacheng},
  journal={arXiv preprint arXiv:2202.10108},
  year={2022}
}

Other Links

Image Classification: See ViTAE for Image Classification

Object Detection: See ViTAE for Object Detection.

Semantic Segmentation: See ViTAE for Semantic Segmentation.

Animal Pose Estimation: See ViTAE for Animal Pose Estimation.

Matting: See ViTAE for Matting.

Remote Sensing: See ViTAE for Remote Sensing.

vitae-transformer-remote-sensing's People

Contributors

dotwang, winter-jon


vitae-transformer-remote-sensing's Issues

About the IMP weights on change detection

Hello, @DotWang. Your work is great. The results in your paper show that BIT with the IMP-ViTAEv2-S weights performs best, so I wonder whether the IMP pretrained weights for change detection will be released. Thank you very much.

ModuleNotFoundError: No module named 'mmdet.version'

Traceback (most recent call last):
  File "/home/dgx/workspace/cui/ViTAE/tools/train.py", line 13, in <module>
    from mmdet.apis import set_random_seed, train_detector
  File "/home/dgx/workspace/cui/ViTAE/mmdet/__init__.py", line 1, in <module>
    from .version import __version__, short_version
ModuleNotFoundError: No module named 'mmdet.version'

In mmdet/__init__.py, I found the code is written like this:

from .version import __version__, short_version

__all__ = ['__version__', 'short_version']

but .version is not a Python file; the .version file contains only one line:
2.2.0

KeyError: "EncoderDecoder: 'ViTAE_Window_NoShift_basic is not in the models registry'"

Hello author, while trying to reproduce your experiment on the Potsdam dataset from the paper, I ran the following command:
python -m torch.distributed.launch --nproc_per_node=1 --master_port=40001 tools/train.py configs/vitae_win/upernet_vitae_win_imp_window7_512x512_80k_potsdam.py --launcher 'pytorch',
but got KeyError: "EncoderDecoder: 'ViTAE_Window_NoShift_basic is not in the models registry'". Why does this happen, and how can I fix it?
I would also like to ask how to correctly reproduce your experiments.
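
For readers hitting this kind of registry error: the custom backbone class has to be imported (so that its register_module decorator actually runs) before the config is built, which usually means installing and using the mmseg copy shipped in this repository rather than a stock pip install. As a hedged illustration only, mmcv-style configs also support a custom_imports field; the module path below is a placeholder, not the repository's actual layout:

# Hypothetical lines to append to the config file. Replace the placeholder module
# path with wherever ViTAE_Window_NoShift_basic is defined in your checkout.
custom_imports = dict(
    imports=['mmseg_custom.models.vitae_win'],  # placeholder module path
    allow_failed_imports=False,
)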

About Reproducing Training of Remote Sensing Semantic Segmentation Models

Hello, I am interested in the remote sensing semantic segmentation part of this project. I have downloaded all the relevant Potsdam data and set up the runtime environment, but the steps to reproduce the full training are still unclear to me. Does the downloaded public dataset need preprocessing, i.e., do the images and label images need to be cut into small patches? If I am not doing distributed training, can I simply remove the distributed settings and train on a single card? Do you have a more detailed description of the training steps and of the training config files, so that we can reproduce the model training? Thank you.

Dataset processing issue

Thank you for proposing such an excellent model! While training on my own data with your framework, the test results for the 'forest land' class come out as IoU=0, Acc=0, Fscore=nan, precision=nan, which affects my evaluation. After debugging, it seems the 'forest land' label may not have been assigned during label preparation. How should I fix this?
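
As a debugging sketch related to the issue above (the path and class index are placeholders, and single-channel PNG label maps are assumed), one can count how often each label value actually occurs in the ground-truth masks; if the 'forest land' index never appears, the labels were not written correctly:

import glob
import numpy as np
from PIL import Image

FOREST_LAND_INDEX = 3                                           # placeholder class index
label_files = glob.glob('data/my_dataset/ann_dir/train/*.png')  # placeholder path

counts = np.zeros(256, dtype=np.int64)
for f in label_files:
    mask = np.array(Image.open(f))
    counts += np.bincount(mask.ravel(), minlength=256)

print({v: int(c) for v, c in enumerate(counts) if c > 0})       # pixels per label value
print('forest land pixels:', int(counts[FOREST_LAND_INDEX]))    # 0 means the class never appears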

Issues when testing with a single image

I would like to ask: when I run semantic segmentation testing with the ViTAEv2 model you provided, the given image fails at the assert N == H*W check in ReductionCell.py, while testing the same image with the other two weights produces results normally.
1. Does this ViTAEv2 model only support image sizes that are multiples of 2?
2. When I predict on a 1024×512 PNG image, the problem still occurs; a size mismatch also appears at outs.append(x.view(b, wh, wh, -1).permute(0, 3, 1, 2)) in ViTAE_Window_NoShift/base_model.py.

The weights I used are shown in the image below.
[image]

Thank you very much if you have time to answer.
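
When size-related assertions like the one above come from input dimensions that the backbone's downsampling or window size cannot divide evenly, a common workaround is to pad the input to a suitable multiple before inference and crop the prediction back afterwards. A hedged sketch (the divisor 32 is an assumption; check the model's actual stride and window settings):

import torch
import torch.nn.functional as F

def pad_to_multiple(img: torch.Tensor, multiple: int = 32):
    """Pad an image tensor (B, C, H, W) on the bottom/right to a multiple of `multiple`."""
    h, w = img.shape[-2:]
    pad_h = (multiple - h % multiple) % multiple
    pad_w = (multiple - w % multiple) % multiple
    padded = F.pad(img, (0, pad_w, 0, pad_h))   # (left, right, top, bottom)
    return padded, (h, w)                        # keep the original size to crop the prediction back

img = torch.randn(1, 3, 512, 1000)              # e.g. a 1000x512 tile
padded, orig_hw = pad_to_multiple(img)
print(padded.shape)                              # torch.Size([1, 3, 512, 1024])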

Where can I download the pretrained model weights?

Hello! During training, the following weights file cannot be found: VitAE_window/output/ViTAE_Window_NoShift_12_basic_stages4_14_224/epoch100/ViTAE_Window_NoShift_12_basic_stages4_14/default/ckpt.pth

Reproduce the SeCo DOTA result.

Hi~ I am currently working on some comparison experiments. When I fine-tune the official SeCo pretrained model (SeCo-1M) on the DOTA object detection task, the test-set mAP is much lower than the paper's (Table VIII, 70.07).

I strictly followed the experimental setup in the paper, but instead of OBBDetection I used mmrotate, and the difference is, I think, not that big.

Do you have any suggestions for reproduction? Thanks~

The mmrotate config I use is given below:

angle_version = 'le90'
dataset_type = 'DOTADataset'
data_root = '../DOTA'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='RResize', img_scale=(1024, 1024)),
    dict(
        type='RRandomFlip',
        flip_ratio=[0.25, 0.25, 0.25],
        direction=['horizontal', 'vertical', 'diagonal'],
        version='le90'),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1024, 1024),
        flip=False,
        transforms=[
            dict(type='RResize'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type='DOTADataset',
        ann_file=data_root + "/trainVal/annfiles",
        img_prefix=data_root + "/trainVal/images",
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations', with_bbox=True),
            dict(type='RResize', img_scale=(1024, 1024)),
            dict(
                type='RRandomFlip',
                flip_ratio=[0.25, 0.25, 0.25],
                direction=['horizontal', 'vertical', 'diagonal'],
                version='le90'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
        ]),
    val=dict(
        type='DOTADataset',
        ann_file=data_root + "/trainVal/annfiles",
        img_prefix=data_root + "/trainVal/images",
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1024, 1024),
                flip=False,
                transforms=[
                    dict(type='RResize'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='DefaultFormatBundle'),
                    dict(type='Collect', keys=['img'])
                ])
        ]),
    test=dict(
        type='DOTADataset',
        ann_file=data_root + "/test/annfiles",
        img_prefix=data_root + "/tests/images",
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1024, 1024),
                flip=False,
                transforms=[
                    dict(type='RResize'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='DefaultFormatBundle'),
                    dict(type='Collect', keys=['img'])
                ])
        ]))
evaluation = dict(interval=1, metric='mAP')
optimizer = dict(type='SGD', lr=0.005, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(
    type='Fp16OptimizerHook',
    distributed=False,
    grad_clip=dict(max_norm=35.0, norm_type=2))
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.3333333333333333,
    step=[8, 11])
runner = dict(type='EpochBasedRunner', max_epochs=12)
checkpoint_config = dict(interval=1)
log_config = dict(
    interval=50,
    hooks=[dict(type='TextLoggerHook'),
           dict(type='TensorboardLoggerHook')])
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
opencv_num_threads = 0
mp_start_method = 'fork'
model = dict(
    type='OrientedRCNN',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch',
        init_cfg=dict(
            type='Pretrained',
            checkpoint='../pretrain_checkpoint/SeCo1m.pth')),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5),
    rpn_head=dict(
        type='OrientedRPNHead',
        in_channels=256,
        feat_channels=256,
        version='le90',
        anchor_generator=dict(
            type='AnchorGenerator',
            scales=[8],
            ratios=[0.5, 1.0, 2.0],
            strides=[4, 8, 16, 32, 64]),
        bbox_coder=dict(
            type='MidpointOffsetCoder',
            angle_range='le90',
            target_means=[0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
            target_stds=[1.0, 1.0, 1.0, 1.0, 0.5, 0.5]),
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(
            type='SmoothL1Loss', beta=0.1111111111111111, loss_weight=1.0)),
    roi_head=dict(
        type='OrientedStandardRoIHead',
        bbox_roi_extractor=dict(
            type='RotatedSingleRoIExtractor',
            roi_layer=dict(
                type='RoIAlignRotated',
                out_size=7,
                sample_num=2,
                clockwise=True),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        bbox_head=dict(
            type='RotatedShared2FCBBoxHead',
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=15,
            bbox_coder=dict(
                type='DeltaXYWHAOBBoxCoder',
                angle_range='le90',
                norm_factor=None,
                edge_swap=True,
                proj_xy=True,
                target_means=(0.0, 0.0, 0.0, 0.0, 0.0),
                target_stds=(0.1, 0.1, 0.2, 0.2, 0.1)),
            reg_class_agnostic=True,
            loss_cls=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
            loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0))),
    train_cfg=dict(
        rpn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.7,
                neg_iou_thr=0.3,
                min_pos_iou=0.3,
                match_low_quality=True,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=256,
                pos_fraction=0.5,
                neg_pos_ub=-1,
                add_gt_as_proposals=False),
            allowed_border=0,
            pos_weight=-1,
            debug=False),
        rpn_proposal=dict(
            nms_pre=2000,
            max_per_img=2000,
            nms=dict(type='nms', iou_threshold=0.8),
            min_bbox_size=0),
        rcnn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.5,
                neg_iou_thr=0.5,
                min_pos_iou=0.5,
                match_low_quality=False,
                iou_calculator=dict(type='RBboxOverlaps2D'),
                ignore_iof_thr=-1),
            sampler=dict(
                type='RRandomSampler',
                num=512,
                pos_fraction=0.25,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
            pos_weight=-1,
            debug=False)),
    test_cfg=dict(
        rpn=dict(
            nms_pre=2000,
            max_per_img=2000,
            nms=dict(type='nms', iou_threshold=0.8),
            min_bbox_size=0),
        rcnn=dict(
            nms_pre=2000,
            min_bbox_size=0,
            score_thr=0.05,
            nms=dict(iou_thr=0.1),
            max_per_img=2000)))
work_dir = './seco_result'
auto_resume = False
gpu_ids = [0]
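
One thing worth double-checking when loading self-supervised weights this way (a hedged suggestion, not from the original thread): MoCo-style checkpoints such as SeCo often store the backbone under prefixes like encoder_q. or module.encoder_q., which init_cfg=dict(type='Pretrained', ...) will not match, so the backbone silently starts from random weights. A conversion sketch, assuming such a prefix; inspect your checkpoint's keys first:

import torch

ckpt = torch.load('../pretrain_checkpoint/SeCo1m.pth', map_location='cpu')
state = ckpt.get('state_dict', ckpt)

print(list(state.keys())[:5])   # inspect: do keys start with 'encoder_q.' / 'module.'?

# Strip a MoCo-style query-encoder prefix so keys line up with mmdet's ResNet
# ('conv1.weight', 'layer1.0.conv1.weight', ...). The prefix below is an assumption.
prefix = 'encoder_q.'
backbone = {k[len(prefix):]: v for k, v in state.items()
            if k.startswith(prefix) and 'fc' not in k}

torch.save({'state_dict': backbone}, '../pretrain_checkpoint/SeCo1m_backbone.pth')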

About MillionAID dataset

I downloaded the MillionAID dataset from the official homepage: MillionAID. The training set was found to have only 10K images. The test set is not labeled. May I know how the pre-training data in the paper was obtained?

Where is the train_labels_{}_{}.txt for scene recognition?

I am running the code of scene recognition and I have downloaded the AID, UCM, and NWPU datasets from their official webpages.

But there are no train_labels_{}_{}.txt files in these datasets. Where do the train_labels_{}_{}.txt files used in your code come from?

image
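
The exact split-file format is not documented in this thread, so the following is only a hedged sketch: AID/UCM/NWPU ship as one folder per class, and a label list with one "relative_path class_index" entry per line could be generated roughly like this (filenames, separator, and split ratio are assumptions, not the repository's definitive format):

import os
import random

DATA_ROOT = 'data/AID'        # placeholder: folder-per-class layout
TRAIN_RATIO = 0.2             # placeholder: e.g. a 2:8 train/test split

classes = sorted(d for d in os.listdir(DATA_ROOT)
                 if os.path.isdir(os.path.join(DATA_ROOT, d)))

train_lines, valid_lines = [], []
for idx, cls in enumerate(classes):
    images = sorted(os.listdir(os.path.join(DATA_ROOT, cls)))
    random.shuffle(images)
    n_train = int(len(images) * TRAIN_RATIO)
    for i, name in enumerate(images):
        line = f'{cls}/{name} {idx}\n'
        (train_lines if i < n_train else valid_lines).append(line)

with open('train_labels_20_1.txt', 'w') as f:    # placeholder filename pattern
    f.writelines(train_lines)
with open('valid_labels_20_1.txt', 'w') as f:
    f.writelines(valid_lines)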

questions about exp. of semantic seg.

Hi, thanks for your great work and codebase.

The batch size is 8 in the paper, and 4 in the config of Swin-T-IMP+UperNet.
And I did not find any description of num_gpu for the semantic seg. subsection.
In the README.md of semantic seg., the command:

python -m torch.distributed.launch --nproc_per_node=1 --master_port=40001 tools/train.py \
    configs/upernet/upernet_our_r50_512x512_80k_potsdam_epoch300.py \
    --launcher 'pytorch'

which seems to set num_gpus_per_node to 1? Or is your command meant for 2 single-GPU nodes with a batch_size of 4 each (2x4)?

Model training issue

Preprocessed with mmseg's native potsdam.py
Following #9, used potsdam_ori.py
Config: upernet_swin_tiny_patch4_window7_512x512_80k_potsdam
Parameters unchanged
batch_size = 8
model = dict(
    pretrained='checkpoint/upernet-rsp-swin-t-potsdam-latest.pth')
reduce_zero_label = True in both the pipeline and data settings

2023-03-10 00:13:21,947 - mmseg - INFO - Iter [80000/80000] lr: 7.500e-10, eta: 0:00:00, time: 0.378, data_time: 0.003, memory: 15450, decode.loss_ce: 0.1454, decode.acc_seg: 74.9332, aux.loss_ce: 0.0674, aux.acc_seg: 74.0879, loss: 0.2129

+--------------------+-------+-------+--------+-----------+--------+
| Class | IoU | Acc | Fscore | Precision | Recall |
+--------------------+-------+-------+--------+-----------+--------+
| impervious_surface | 82.19 | 92.91 | 90.23 | 87.69 | 92.91 |
| building | 91.04 | 97.03 | 95.31 | 93.64 | 97.03 |
| low_vegetation | 70.99 | 88.3 | 83.03 | 78.36 | 88.3 |
| tree | 73.7 | 83.01 | 84.86 | 86.8 | 83.01 |
| car | 81.17 | 89.05 | 89.61 | 90.17 | 89.05 |
| clutter | 0.0 | 0.0 | nan | nan | 0.0 |
+--------------------+-------+-------+--------+-----------+--------+
2023-03-10 00:13:54,226 - mmseg - INFO - Summary:
2023-03-10 00:13:54,226 - mmseg - INFO -
+-------+-------+-------+---------+------------+---------+
| aAcc | mIoU | mAcc | mFscore | mPrecision | mRecall |
+-------+-------+-------+---------+------------+---------+
| 86.86 | 66.52 | 75.05 | 88.61 | 87.33 | 75.05 |
+-------+-------+-------+---------+------------+---------+

F1 88.61
Could you tell me where the problem is?
Thanks :)

mmcv version issue

Thank you for proposing such an excellent model! When trying to use your framework, I found that the code no longer matches the current mmcv 2.x versions. Could you tell us which version you used at the time?

Problem reproducing Swin-T in scene classification

Hi, I am trying to follow your hyperparameters to reproduce the classification results, but I am seeing misclassification. I train on AID (2:8) using max_epochs=200, base_lr=5e-4, with the other settings as follows:
_base_ = [
    # '../_base_/models/swin_transformer/base_224.py',
    # "../_base_/datasets/ucmerced_landuse_bs64_swin_224.py",
    "../_base_/datasets/aid_bs64_autoaug.py",
    "../_base_/schedules/imagenet_bs64_adamw_swin.py",
    "../_base_/default_runtime.py",
]

# refer to SimMIM paper

ADJUST_FACTOR = 1.0
BATCH_SIZE = 64
BASE_LR = 5e-4 * ADJUST_FACTOR # todo: adjust.
WARMUP_LR = 5e-7 * ADJUST_FACTOR
MIN_LR = 5e-6 * ADJUST_FACTOR
NUM_GPUS = 1
DROP_PATH_RATE = 0.2
SCALE_FACTOR = 512.0
MAX_EPOCHS = 200

# model settings

model = dict(
    type="ImageClassifier",
    backbone=dict(
        type="SwinTransformer",
        # arch="base",
        arch="tiny",
        img_size=224,
        # drop_path_rate=0.1,  # DROP_PATH_RATE
        drop_path_rate=DROP_PATH_RATE,
    ),
    neck=dict(type="GlobalAveragePooling"),
    head=dict(
        type="LinearClsHead",
        num_classes=21,
        # in_channels=1024,
        in_channels=768,
        init_cfg=None,  # suppress the default init_cfg of LinearClsHead.
        loss=dict(type="LabelSmoothLoss", label_smooth_val=0.1, mode="original"),
        cal_acc=False,
    ),
    init_cfg=[
        dict(type="TruncNormal", layer="Linear", std=0.02, bias=0.0),
        dict(type="Constant", layer="LayerNorm", val=1.0, bias=0.0),
    ],
    train_cfg=dict(
        augments=[
            dict(type="BatchMixup", alpha=0.8, num_classes=21, prob=0.5),
            dict(type="BatchCutMix", alpha=1.0, num_classes=21, prob=0.5),
        ]
    ),
)

# optimizer

paramwise_cfg = dict(
    norm_decay_mult=0.0,
    bias_decay_mult=0.0,
    custom_keys={
        ".absolute_pos_embed": dict(decay_mult=0.0),
        ".relative_position_bias_table": dict(decay_mult=0.0),
    },
)

optimizer = dict(
    type="AdamW",
    # lr=1e-3 * 64 / 256,  # 5e-4 * 64 / 512, # 1e-3 * 64 / 256,
    # lr=1.25e-3 * 96 * 1 / 512.0,
    # BASE_LR * BATCH_SIZE * NUM_GPUS / 512.0,  # 1e-3 * 64 / 256,
    lr=BASE_LR * BATCH_SIZE * NUM_GPUS / SCALE_FACTOR,
    weight_decay=0.05,
    eps=1e-8,
    betas=(0.9, 0.999),
    paramwise_cfg=paramwise_cfg,
)
optimizer_config = dict(grad_clip=dict(max_norm=5.0))

# learning policy

lr_config = dict(
    policy="CosineAnnealing",
    # min_lr=2.5e-7,
    # by_epoch=False,  # todo: try
    by_epoch=False,
    # min_lr_ratio=(2.5e-7 * 96 * 1 / 512.0) / (1.25e-3 * 96 * 1 / 512.0),  # 1e-2,
    min_lr_ratio=(MIN_LR * BATCH_SIZE * NUM_GPUS / SCALE_FACTOR)
    / (BASE_LR * BATCH_SIZE * NUM_GPUS / SCALE_FACTOR),
    # min_lr=2.5e-7,  # MIN_LR,
    warmup="linear",
    # warmup_ratio=(2.5e-7 * 96 * 1 / 512.0) / (1.25e-3 * 96 * 1 / 512.0),  # 1e-3,
    warmup_ratio=(WARMUP_LR * BATCH_SIZE * NUM_GPUS / SCALE_FACTOR)
    / (BASE_LR * BATCH_SIZE * NUM_GPUS / SCALE_FACTOR),
    # warmup_lr=2.5e-7,  # WARMUP_LR,
    warmup_iters=20,  # todo: 0
    warmup_by_epoch=True,
)

checkpoint_config = dict(interval=MAX_EPOCHS // 10)
evaluation = dict(
    interval=MAX_EPOCHS // 10, metric="accuracy", save_best="auto"
)  # save the checkpoint with the highest accuracy
runner = dict(type="EpochBasedRunner", max_epochs=MAX_EPOCHS)

# data = dict(samples_per_gpu=96, workers_per_gpu=8,)
data = dict(samples_per_gpu=BATCH_SIZE, workers_per_gpu=8,)

# fp16 settings
fp16 = dict(loss_scale="dynamic")

Could you help me with this, or provide your training log?
Thanks!
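
For reference, the linear learning-rate scaling used in the config above works out to the following concrete values (just a sanity check of the arithmetic, nothing more):

BASE_LR, WARMUP_LR, MIN_LR = 5e-4, 5e-7, 5e-6
BATCH_SIZE, NUM_GPUS, SCALE_FACTOR = 64, 1, 512.0

lr        = BASE_LR   * BATCH_SIZE * NUM_GPUS / SCALE_FACTOR   # 6.25e-05
warmup_lr = WARMUP_LR * BATCH_SIZE * NUM_GPUS / SCALE_FACTOR   # 6.25e-08
min_lr    = MIN_LR    * BATCH_SIZE * NUM_GPUS / SCALE_FACTOR   # 6.25e-07
print(lr, warmup_lr, min_lr)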

About Labels of Million-AID Dataset

The original split of MillionAID is used for recognition. Our study is about pretraining, so we resplit the training and testing sets. The obtained training set is relatively large for transferring the pretrained weights to downstream tasks. All RSP pretrained weights are available at https://github.com/ViTAE-Transformer/ViTAE-Transformer-Remote-Sensing/blob/main/README.md

Thank you very much. Your work is very inspiring to us. We are doing some research in the field of remote sensing image pretraining and would like to use your work as a baseline. However, we found that the open-source MillionAID data has less annotated data than stated in your paper, so we raised this issue.

Originally posted by @pUmpKin-Co in #3 (comment)

Hello,
I wasn't able to understand what the conclusion is.

we resplit the training and testing sets.

I think annotation data is needed in order to use all the images and re-split them. Are labels provided for the test data? I mean, in my understanding, we need both "train_label.txt" and "valid_label.txt" to use MillionAID, but I don't know where to download them. I appreciate your help.

Semantic Segmentation: Large performance gap when reproducing on the Potsdam dataset

Without modifying the parameters of 'upernet_vitae_win_window7_512x512_80k_potsdam_epoch100.py', I reproduced the experiment twice, once with RGB + Label_all and once with IRRG + Label_all.

I processed the dataset with the tools/convert_datasets/potsdam.py script.

[image]

[image]

Config and model:
[image]

The resulting logs differ considerably from the released logs.

Did I process the Potsdam dataset incorrectly?

Couldn't reproduce the semantic segmentation experiment

I tried to simply evaluate the model, as the notebook 'Semantic Segmentation/demo/inference_demo.ipynb' does, but using the uploaded model instead of ResNet.
When providing the config and the RSP-ViTAEv2-S-E100 model in the notebook, init_segmentor does not work, because the config references data paths outside the repo.

I would be very happy if you could provide a notebook to reproduce that. :)

Best regards
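
For reference, single-image inference with the mmsegmentation 0.x API looks roughly like this; the config and checkpoint paths below are placeholders, and the data-root settings inside the config still have to point at directories that exist locally, which is exactly what the notebook run above trips over:

from mmseg.apis import inference_segmentor, init_segmentor

config_file = 'configs/vitae_win/upernet_vitae_win_rsp_window7_512x512_80k_potsdam.py'  # placeholder
checkpoint_file = 'rsp-vitaev2-s-e100-potsdam.pth'                                      # placeholder

model = init_segmentor(config_file, checkpoint_file, device='cuda:0')
result = inference_segmentor(model, 'demo/demo.png')    # placeholder image path
model.show_result('demo/demo.png', result, out_file='demo_pred.png', opacity=0.5)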

Problem reading labels

When testing and training, while reading the DOTA 1.0 data, what should be written in the json and pkl files respectively? Could you give an example? When running the test, it reports that split_config_json and ori_annfile_pkl are missing from the label folder. Could you reply soon? Thank you.

About downloading the pretrained models for change detection

Hello, I can't find the download links for pretrained models such as: RS_CLS_finetune/output/resnet_50_224/epoch120/millionAID_224_None/0.0005_0.05_192/resnet/100/ckpt.pth
Swin-Transformer-main/output/swin_tiny_patch4_window7_224/epoch120/swin_tiny_patch4_window7_224/default/ckpt.pth
...

change detection

When I run the example python eval.py
--backbone 'swin' --dataset 'levir' --mode 'rsp_300'
--path [model path], I get the following error:

Attempted to read a PyTorch file with version 3, but the maximum supported version for reading is 2

Which PyTorch version did you use? For change detection, I installed the environment following
https://github.com/likyoo/Siam-NestedUNet/blob/master/README.md
Requirements
Python 3.6

Pytorch 1.4

torchvision 0.5.0

other packages needed

pip install opencv-python tqdm tensorboardX sklearn

Could you help me analyze this? Thanks.
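
The error above typically means the checkpoint was written by a newer PyTorch (the 1.6+ zipfile format) than the PyTorch 1.4 that is reading it. A hedged workaround, not from the original thread: load the file once in an environment with a recent PyTorch and re-save it in the legacy serialization format, then load the re-saved file from the PyTorch 1.4 environment.

# Run this with a recent PyTorch (>= 1.6); the filename is a placeholder.
import torch

ckpt = torch.load('rsp_swin_cd_checkpoint.pth', map_location='cpu')
torch.save(ckpt, 'rsp_swin_cd_checkpoint_legacy.pth',
           _use_new_zipfile_serialization=False)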

Model registration issue

KeyError: 'swin is not in the models registry'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/media/user/volume/PycharmProjects/ViTAE-Transformer-Remote-Sensing-main/Semantic Segmentation/tools/train.py", line 234, in <module>
    return build_from_cfg(cfg, registry, default_args)
  File "/home/user/.conda/envs/pytorch/lib/python3.10/site-packages/mmcv/utils/registry.py", line 61, in build_from_cfg
    raise KeyError(
KeyError: 'swin is not in the models registry'

How should I debug this?

DIOR-R Benchmark question.

In your paper, the performance of your model on the DIOR-R dataset is shown in a table.
However, there is no information on whether this is single-scale or multi-scale.
I would like to know whether the reported DIOR-R performance uses the single-scale or the multi-scale setting.
Thank you.

image
