deformation-segmentation's Introduction

Learning to Downsample for Segmentation of Ultra-High Resolution Images in PyTorch

This is a PyTorch implementation of Learning to Downsample for Segmentation of Ultra-High Resolution Images, which was published at ICLR 2022.

Updates

  • Apologies for the long-delayed code cleaning, which is now done! Please let me know if you would like further clarification on any part :)
  • ICLR 2022 talk available HERE
  • For more details/examples/video demos visit our project page HERE

Table of Contents

  1. Environment setup
  2. Data preparation
  3. Reproduce
  4. Citation

Environment setup

Install dependencies

Install dependencies with Conda (miniconda3 installed at /home/miniconda3/):

conda env create -f deform_seg_env.yml
conda activate deform_seg_env

The environment above was built with conda version 4.7.11.

Data preparation

  1. Download the Cityscapes, DeepGlobe and PCa-histo datasets.

  2. Your directory tree should look like this:

$SEG_ROOT/data
├── cityscapes
│   ├── annotations
│   │   ├── testing
│   │   ├── training
│   │   └── validation
│   └── images
│       ├── testing
│       ├── training
│       └── validation
├── histomri
│   ├── train
│   │   ├── images
│   │   └── labels
│   └── val
│       ├── images
│       └── labels
└── deepglob
    ├── land-train
    └── land_train_gt_processed

Note: histomri (Histo_MRI) is the PCa-histo dataset.

  3. Data list .odgt files are provided under ./data; prepare them correspondingly for your local datasets. (Note: for Cityscapes, check its ./data/Cityscape/*.odgt lists. In my example I removed the city subfolders and put all images under one folder; if your data tree is different, please modify the lists accordingly, e.g. change "images/training/tubingen_000025_000019_leftImg8bit.png" to "images/training/tubingen/000025_000019_leftImg8bit.png".)
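For reference, a minimal sketch of regenerating such a data list for a flat Cityscapes layout is shown below. It assumes the .odgt format of the upstream semantic-segmentation-pytorch codebase (one JSON record per line with fpath_img, fpath_segm, width and height); the directory and output paths are illustrative only, so adapt them to your own tree.

# Hedged sketch: rebuild an .odgt data list for a flat Cityscapes layout.
# Field names follow the upstream semantic-segmentation-pytorch .odgt format
# (assumed here); the paths below are examples, not fixed repository paths.
import json
import os

from PIL import Image

img_dir = "data/cityscapes/images/training"        # example image folder
ann_dir = "data/cityscapes/annotations/training"   # example annotation folder

with open("data/Cityscape/train_custom.odgt", "w") as f:   # example output list
    for name in sorted(os.listdir(img_dir)):
        if not name.endswith("leftImg8bit.png"):
            continue
        segm_name = name.replace("leftImg8bit", "gtFine_labelTrainIds")
        assert os.path.exists(os.path.join(ann_dir, segm_name)), segm_name
        width, height = Image.open(os.path.join(img_dir, name)).size
        record = {
            "fpath_img": f"images/training/{name}",
            "fpath_segm": f"annotations/training/{segm_name}",
            "width": width,
            "height": height,
        }
        f.write(json.dumps(record) + "\n")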

Reproduce

Full configuration bash scripts are provided to reproduce the paper results. They are suitable for large-scale experiments in a multi-GPU environment; synchronized batch normalization is deployed.

Training

Train a model by selecting the GPUs ($GPUS) and configuration file ($CFG) to use. During training, the latest checkpoints are saved in the ckpt folder by default.

python3 train_deform.py --gpus $GPUS --cfg $CFG
  • To choose which gpus to use, you can either do --gpus 0-7, or --gpus 0,2,4,6.
  • Bash scripts and configurations are provided to reproduce our results:
  • Note: you will need to specify your root path SEG_ROOT for the DATASET.root_dataset option in those scripts.
bash quick_start_bash/cityscape_64_128_ours.sh
bash quick_start_bash/cityscape_64_128_uniform.sh
bash quick_start_bash/deepglob_300_300_ours.sh
bash quick_start_bash/deepglob_300_300_uniform.sh
bash quick_start_bash/pcahisto_80_800_ours.sh
bash quick_start_bash/pcahisto_80_800_uniform.sh
  • You can also override options on the command line, for example python3 train_deform.py TRAIN.num_epoch 10 (see the sketch below).
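For context, here is a minimal sketch of how such command-line overrides are typically merged into the configuration. It assumes a yacs-style CfgNode, as used by the upstream semantic-segmentation-pytorch codebase this repository appears to derive from; check config/defaults.py for the actual option names and defaults.

# Hedged sketch of yacs-style command-line overrides (illustrative defaults).
from yacs.config import CfgNode as CN

cfg = CN()
cfg.TRAIN = CN()
cfg.TRAIN.num_epoch = 125            # illustrative default value

# Equivalent of appending "TRAIN.num_epoch 10" to the training command:
cfg.merge_from_list(["TRAIN.num_epoch", "10"])
print(cfg.TRAIN.num_epoch)           # -> 10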

Evaluation

  1. Evaluate a trained model on the validation set by simply overriding the following options: TRAIN.start_epoch 125 TRAIN.num_epoch 126 TRAIN.eval_per_epoch 1 TRAIN.skip_train_for_eval True
  • Alternatively, you can quick-start with the provided bash scripts:
bash quick_start_bash/eval/cityscape_64_128_ours.sh
bash quick_start_bash/eval/cityscape_64_128_uniform.sh
bash quick_start_bash/eval/deepglob_300_300_ours.sh
bash quick_start_bash/eval/deepglob_300_300_uniform.sh
bash quick_start_bash/eval/pcahisto_80_800_ours.sh
bash quick_start_bash/eval/pcahisto_80_800_uniform.sh

Citation

If you use this code for your research, please cite our paper:

@article{jin2021learning,
  title={Learning to Downsample for Segmentation of Ultra-High Resolution Images},
  author={Jin, Chen and Tanno, Ryutaro and Mertzanidou, Thomy and Panagiotaki, Eleftheria and Alexander, Daniel C},
  journal={arXiv preprint arXiv:2109.11071},
  year={2021}
}

@inproceedings{jin2022learning,
  title={Learning to Downsample for Segmentation of Ultra-High Resolution Images},
  author={Chen Jin and Ryutaro Tanno and Thomy Mertzanidou and Eleftheria Panagiotaki and Daniel C. Alexander},
  booktitle={International Conference on Learning Representations},
  year={2022},
  url={https://openreview.net/forum?id=HndgQudNb91}
}

deformation-segmentation's People

Contributors

cnstt, lxasqjc


deformation-segmentation's Issues

How to reproduce the intrinsic mIoU value?

Dear Chen,

I am not able to reproduce the simple uniform intrinsic mIoU baseline of 0.78 on Cityscapes downsampled to 64x128 from 1024x2048 (Figure 4(a) of the paper). Do you downsample and upsample the GT map with a nearest-neighbour uniform sampler and then measure mIoU? I can't really tell the steps by reading the code. I get 0.74 by doing so in a script written from scratch (a sketch of that procedure follows below).

Best,
Yuyao
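For what it's worth, a minimal per-image sketch of the procedure described above (my own assumption, not the repository's evaluation code) is given below; the paper-level number would additionally require accumulating intersections and unions over the whole validation set before averaging per class.

# Sketch: "intrinsic" mIoU of the GT after a nearest-neighbour down/upsample
# round trip, measured against the original full-resolution GT.
import torch
import torch.nn.functional as F

def intrinsic_miou(gt, low_size=(64, 128), num_classes=19, ignore_index=255):
    # gt: (H, W) LongTensor of class ids at full resolution, e.g. 1024x2048
    full_size = gt.shape[-2:]
    g = gt.float().unsqueeze(0).unsqueeze(0)                      # 1x1xHxW
    g_low = F.interpolate(g, size=low_size, mode="nearest")       # downsample GT
    g_rec = F.interpolate(g_low, size=full_size, mode="nearest")  # upsample back
    g_rec = g_rec.squeeze().long()
    valid = gt != ignore_index
    ious = []
    for c in range(num_classes):
        inter = ((gt == c) & (g_rec == c) & valid).sum().item()
        union = (((gt == c) | (g_rec == c)) & valid).sum().item()
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else float("nan")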

the implementation of create_grid function

Hi @lxasqjc, could you please explain the create_grid function? Specifically, could you add some comments on the following lines, and explain why there is a grid_y? Thanks!

        if len(self.input_size_net_eval) != 0 and segSize is not None:# inference
            grid = nn.Upsample(size=self.input_size_net_infer, mode='bilinear')(grid)
        else:
            grid = nn.Upsample(size=self.input_size_net, mode='bilinear')(grid)
        if segSize is None:# training
            grid_y = nn.Upsample(size=tuple(np.array(self.input_size_net)//self.cfg.DATASET.segm_downsampling_rate), mode='bilinear')(grid)
        else:# inference
            grid_y = nn.Upsample(size=tuple(np.array(self.input_size_net_infer)), mode='bilinear')(grid)

seg loss and edge loss is NAN

I have one more question.

When I run quick_start_bash/cityscape_64_128_ours.sh (without any change), the seg loss and edge loss show NaN values.
Can you check?

Thank you for your awesome work.

how to perform the sampling in motivational study

Hi @lxasqjc, thanks for sharing the great work! I have a question about the motivational study: how do you generate the different sampling grid densities, i.e. which parameter should be controlled? I have tried the Gaussian filter radius, but it does not seem to work properly.

IoU val zero

Hello,
I am having some trouble with the results of the evaluation step.
What I'm trying to do is run the model on the DeepGlobe dataset in such a way that it ignores just the 'unknown' class. What happens instead is that, if I use the 'default' configuration ignore_index: 6 and ignore_gt_labels: [6], the model ignores both class 0 and class 6, i.e. it returns 0 IoU for both of them. Below is an example of a history_epoch file showing the problem I'm referring to:

epoch                         0.000000
epoch.1                       1.000000
train_loss                    1.078230
train_acc                    49.376525
val_miou                      0.273724
val_acc                      58.940513
val_deformed_miou             0.274391
val_y_reverse_miou                 NaN
num_valid_samples           207.000000
train_edge_loss               0.652420
val_iou_class_0               0.000000
val_iou_deformed_class_0      0.000000
val_iou_class_1               0.672547
val_iou_deformed_class_1      0.667506
val_iou_class_2               0.181326
val_iou_deformed_class_2      0.176898
val_iou_class_3               0.420982
val_iou_deformed_class_3      0.426324
val_iou_class_4               0.406174
val_iou_deformed_class_4      0.414650
val_iou_class_5               0.235043
val_iou_deformed_class_5      0.235360
val_iou_class_6               0.000000
val_iou_deformed_class_6      0.000000

A similar behaviour also happens when I use the configuration ignore_index: 0 and ignore_gt_labels: [0] (which I think is the setting I should use in order to ignore just the 'unknown' class); in this case the "ignored" classes are classes 0 and 1.
How can I solve this problem, and what am I doing wrong?
Thank you so much in advance :)

Question about deform_pretrain[_bol]

Sorry to spam issues haha just have a lot of questions. And thanks for releasing your codebase! It's really helpful for our research.

It seems these flags are referenced in two places. Once here, where the low-resolution image gets blurred if the epoch is at most cfg.TRAIN.deform_pretrain:

if segSize is None and ((self.cfg.TRAIN.opt_deform_LabelEdge or self.cfg.TRAIN.deform_joint_loss) and epoch <= self.cfg.TRAIN.deform_pretrain):
    min_saliency_len = min(self.input_size)
    s = random.randint(min_saliency_len//3, min_saliency_len)
    x_low = nn.AdaptiveAvgPool2d((s,s))(x_low)
    x_low = nn.Upsample(size=self.input_size,mode='bilinear')(x_low)

and here, where the sampled image gets blurred if not self.cfg.TRAIN.deform_pretrain_bol (this flag is set to True, so the sampled image never gets blurred)

# EXPLAIN: pretraining trick following A. Recasens et al. (2018)
N_pretraining = self.cfg.TRAIN.deform_pretrain
epoch = self.cfg.TRAIN.global_epoch
if self.cfg.TRAIN.deform_pretrain_bol or (epoch>=N_pretraining and (epoch<self.cfg.TRAIN.smooth_deform_2nd_start or epoch>self.cfg.TRAIN.smooth_deform_2nd_end)):
    p=1 # non-pretrain stage: no random size pooling to x_sampled
else:
    p=0 # pretrain stage: random size pooling to x_sampled

So my questions are twofold.

  1. What's the intuition behind blurring the low resolution image (which is only used to generate saliency) for the first 100 epochs?
  2. Why doesn't the sampled image get blurred for the first 100 epochs? I thought this was the trick from Learning to Zoom. Do you not need it in practice?

Why using bilinear interpolation for label ?

Hello,
Thanks for this repo.

About this line:

y_sampled = F.grid_sample(y.float().unsqueeze(1), grid_y).squeeze(1)

By default, F.grid_sample uses bilinear interpolation,
which I think will corrupt the labels (for example, with 1: cyclist and 2: car, an interpolated value of 1.5 corresponds to nothing, i.e. a corrupted label).

Is there a reason you are not using nearest interpolation?
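For reference, one possible change along the lines suggested above (a sketch only, not the authors' confirmed intention) is to sample the label map with nearest-neighbour interpolation so that class ids stay discrete:

# Sketch with toy shapes: y is an (N, H, W) label map, grid_y an (N, h, w, 2)
# sampling grid with coordinates in [-1, 1]. mode="nearest" avoids blending
# class ids; align_corners is shown explicitly and should match whatever the
# rest of the model uses for grid_sample.
import torch
import torch.nn.functional as F

y = torch.randint(0, 19, (1, 64, 128))
grid_y = torch.rand(1, 32, 64, 2) * 2 - 1

y_sampled = F.grid_sample(y.float().unsqueeze(1), grid_y,
                          mode="nearest", align_corners=True).squeeze(1).long()
print(y_sampled.shape)   # torch.Size([1, 32, 64])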

Question about the mIoU metric

Dear authors,

I have carefully read your paper and have a question about the mIoU metric reported in Figure 4(a). My question is based on the following observations:

  • Figure 4(a) reports mIoU(single_loss) ~= 0.46 and mIoU(joint_loss) ~= 0.50, and
  • I ran the code in this repository with seed=304, and got val_deformed_miou(single_loss) = 0.465027607866967 and val_deformed_miou(joint_loss) = 0.49334734890825, but val_miou(single_loss) = 0.418473241184434 and val_miou(joint_loss) = 0.447862081141869. (These values are recorded in the CSV file; the console seems to print only deform_miou, under the name miou.)

So my question is: Does my run comply with the experiments in the paper? (Do you report deformed mIoU or mIoU of the original resolution?)

Look forward to your reply. Thanks a lot!

Replicating Cost-Performance curves (Figure 6)

Could you please share more info on how to replicate the cost-performance curves from Figure 6? Are the training hyperparameters kept the same as in the quick_start_bash scripts, or are, for example, num_epochs and the learning rate tuned per experiment? At what lower resolutions are the experiments run to generate the curves?

/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:105: cunn_SpatialClassNLLCriterion_updateOutput_kernel: block: [1,0,0], thread: [864,0,0] Assertion `t >= 0 && t < n_classes` failed.

When running train_deform.py on the Cityscapes dataset, I encountered the following error. I have checked that the numerical range of batch_data['seg_label'] is between 0 and 19, whereas the code only allows a range of 0 to 18. How should I solve this problem? (A diagnostic sketch follows the log below.)

[2023-03-24 12:00:53,725 INFO train_deform.py line 693 48563] Outputing checkpoints to: ckpt/Cityscape_Tin_64_128_ours
/data/anaconda3/envs/deform_seg_env/lib/python3.8/site-packages/torch/nn/functional.py:3609: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
  warnings.warn( # samples: 2975 1 Epoch = 744 iters
/data/PycharmProjects/high_resolution/Deformation-Segmentation-main/models/models.py:416: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  xs = nn.Softmax()(xs) # N,W*H
/data/anaconda3/envs/deform_seg_env/lib/python3.8/site-packages/torch/nn/functional.py:3981: UserWarning: Default grid_sample and affine_grid behavior has changed to align_corners=False since 1.3.0. Please specify align_corners=True if the old behavior is desired. See the documentation of grid_sample for details.
  warnings.warn(
/data/anaconda3/envs/deform_seg_env/lib/python3.8/site-packages/torch/cuda/memory.py:260: FutureWarning: torch.cuda.reset_max_memory_allocated now calls torch.cuda.reset_peak_memory_stats, which resets /all/ peak memory stats.
  warnings.warn(
THCudaCheck FAIL file=../aten/src/THC/THCCachingHostAllocator.cpp line=278 error=710 : device-side assert triggered
Traceback (most recent call last):
  File "/data/PycharmProjects/high_resolution/Deformation-Segmentation-main/train_deform.py", line 731, in <module>
    main(cfg, gpus)
  File "/data/PycharmProjects/high_resolution/Deformation-Segmentation-main/train_deform.py", line 522, in main
    train(segmentation_module, iterator_train,
  File "/data/PycharmProjects/high_resolution/Deformation-Segmentation-main/train_deform.py", line 73, in train
    loss, acc, edge_loss = segmentation_module(batch_data[0], writer=writer, count=cur_iter, epoch=epoch)
  File "/data/anaconda3/envs/deform_seg_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data/PycharmProjects/high_resolution/Deformation-Segmentation-main/models/models.py", line 607, in forward
    acc = self.pixel_acc(pred, feed_dict['seg_label'])
  File "/data/PycharmProjects/high_resolution/Deformation-Segmentation-main/models/models.py", line 194, in pixel_acc
    acc_sum = torch.sum(valid * (preds == label).long())
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:105: cunn_SpatialClassNLLCriterion_updateOutput_kernel: block: [1,0,0], thread: [864,0,0] Assertion `t >= 0 && t < n_classes` failed.
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/data/anaconda3/envs/deform_seg_env/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/data/anaconda3/envs/deform_seg_env/lib/python3.8/threading.py", line 870, in run

Question about training with loss at high resolution

Hi,

Thank you very much for your work.

I'm trying to train a model on a custom dataset, where I want to keep the segmentation map at the original resolution during loss calculation (rather than downsampling GT as done in the paper). However, I encountered an error in models.py, Line 570-578. I wonder if there is a bug in the code, where the image (x) was upsampled rather than the GT (pred)?

It also seems that enabling loss_at_high_res brings up additional imports (e.g. Interp2D) that are not intended to be used? There are no instructions on how to get them to work in this repository.

As a side question, do you observe any performance differences when computing the loss at full resolution as opposed to downsampling the GT?

open source question

  • The train_fove.py file does not exist. Where is it?

  • Should I create the directory tree folder myself?

code as a library

Thanks for this great work.
How could I use your code as a library, i.e. use the downsampling and upsampling blocks in my own segmentation code?
Which modules should I import, and which loss should I import and backpropagate?
I feel a few simple lines in the README could help a lot for using your techniques outside your repository.

A general question on the method

Hi Jin @lxasqjc, thanks for sharing the work and providing timely replies. I have a question about practical usage: have you compared the method to uniform sampling at a higher resolution? For example, the method downsamples the image to 64x128 and achieves performance a with an inference time of t1; we could also have another baseline that uniformly downsamples the image to a higher resolution (e.g., 80x160, 100x200, or 128x256) and achieves performance b with inference time t2. Would b be higher than a while t2 is less than or comparable to t1? I ask this because the added saliency-map prediction and the whole deform-sampling process also add some cost at inference.

Thanks in advance and looking forward to your reply!

nan Losses and 100 accuracy

Hello,

I was trying to train the model but I'm encountering some issues with the accuracy and loss values.
Currently I'm using the DeepGlobe dataset, where the mask pixels are encoded with 0/255 RGB values, i.e. using the following mapping:
'unknown': [0, 0, 0], 'urban': [0, 255, 255], 'agriculture': [255, 255, 0], 'rangeland': [255, 0, 255], 'forest ': [0, 255, 0], 'water': [0, 0, 255], 'barren': [255, 255, 255].
With this encoding it somehow works, but after a few steps the two losses become nan and in the second epoch the accuracy reaches 100. Here is the output:

Epoch: [1][0/114], Time: 8.72, Data: 3.96, lr_encoder: 0.000020, lr_decoder: 0.000020, Accuracy: 7.50, Seg_Loss: 2.249039, Edge_Loss: 0.152145
Epoch: [1][20/114], Time: 2.30, Data: 0.20, lr_encoder: 0.000020, lr_decoder: 0.000020, Accuracy: 45.84, Seg_Loss: 2.592526, Edge_Loss: 0.772081
Epoch: [1][40/114], Time: 2.11, Data: 0.11, lr_encoder: 0.000020, lr_decoder: 0.000020, Accuracy: 49.47, Seg_Loss: nan, Edge_Loss: nan
Epoch: [1][60/114], Time: 2.04, Data: 0.08, lr_encoder: 0.000020, lr_decoder: 0.000020, Accuracy: 65.99, Seg_Loss: nan, Edge_Loss: nan
Epoch: [1][80/114], Time: 2.01, Data: 0.06, lr_encoder: 0.000020, lr_decoder: 0.000020, Accuracy: 74.39, Seg_Loss: nan, Edge_Loss: nan
Epoch: [1][100/114], Time: 1.99, Data: 0.05, lr_encoder: 0.000020, lr_decoder: 0.000020, Accuracy: 79.46, Seg_Loss: nan, Edge_Loss: nan
Saving checkpoints...
Saving history...
Epoch: [2][0/114], Time: 1.90, Data: 0.01, lr_encoder: 0.000020, lr_decoder: 0.000020, Accuracy: 100.00, Seg_Loss: nan, Edge_Loss: nan
Epoch: [2][20/114], Time: 1.94, Data: 0.01, lr_encoder: 0.000020, lr_decoder: 0.000020, Accuracy: 100.00, Seg_Loss: nan, Edge_Loss: nan
Epoch: [2][40/114], Time: 1.92, Data: 0.01, lr_encoder: 0.000020, lr_decoder: 0.000020, Accuracy: 100.00, Seg_Loss: nan, Edge_Loss: nan

I imagine that the encoding is wrong, but I can't work out what the correct one is.
Furthermore, the input images I pass to the model are in the [0, 255] range, but your normalisation is performed on a [0, 1] range.
Should we normalise images to [0, 1] before feeding them to the net, or is that done inside? Could this be the cause of the issue?
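In case it helps, below is a minimal sketch (my assumption, not the repository's official preprocessing; the particular id assigned to each class is illustrative) of converting a DeepGlobe RGB mask into a single-channel class-index mask using the mapping quoted above, so that the loss sees small integer class ids rather than raw 0/255 colour values.

# Sketch: map DeepGlobe RGB mask colours to integer class ids. The id order is
# illustrative and must be kept consistent with the num_class / ignore settings.
import numpy as np
from PIL import Image

COLOR_TO_CLASS = {
    (0, 0, 0): 0,        # unknown
    (0, 255, 255): 1,    # urban
    (255, 255, 0): 2,    # agriculture
    (255, 0, 255): 3,    # rangeland
    (0, 255, 0): 4,      # forest
    (0, 0, 255): 5,      # water
    (255, 255, 255): 6,  # barren
}

def rgb_mask_to_index(mask_path):
    rgb = np.array(Image.open(mask_path).convert("RGB"))
    idx = np.zeros(rgb.shape[:2], dtype=np.uint8)
    for color, cls in COLOR_TO_CLASS.items():
        idx[np.all(rgb == np.array(color), axis=-1)] = cls
    return idx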

no module named hrnet

Hello, thank you for your great work!
I have a minor question.

In the "models" folder, some backbone networks (HRnet, U-Net, ...) are missing, which causes errors.

Can you check?

pretraining trick error in the code

p is either 0 or 1
https://github.com/lxasqjc/Deformation-Segmentation/blob/main/models/models.py#L515

random.random() returns a value in [0, 1)

therefore
https://github.com/lxasqjc/Deformation-Segmentation/blob/main/models/models.py#L515

        if random.random() > p:
            min_saliency_len = min(self.input_size)
            s = random.randint(min_saliency_len // 3, min_saliency_len)
            x_sampled = nn.AdaptiveAvgPool2d((s, s))(x_sampled)
            x_sampled = nn.Upsample(size=self.input_size_net, mode='bilinear')(x_sampled)

Will NEVER get executed

GPU Memory Usage

What GPUs did you train on, and how much memory did your experiments require? I'm trying to use standard 12 GB RTX 2080 Tis with no luck. For reference, I'm able to train DeepGlobe up to 275x275 with a batch size of 1 on 4 GPUs, but am unable to run the normal 300x300 size.

Also wondering if you have any intuition for what operations demand this much memory? For what it's worth, this seems VERY memory intensive.

single gpu mode edge loss undefined

Hi @lxasqjc, as I could not get the multi-GPU data-parallel mode to run, I was trying to do a step-by-step run in single-GPU mode, but I noticed that the edge loss is undefined if single-GPU mode is used:

        if single_gpu_mode:
            loss, acc, _ = segmentation_module(batch_data[0], writer=writer, count=cur_iter, epoch=epoch)
        else:
            if cfg.TRAIN.opt_deform_LabelEdge and epoch >= cfg.TRAIN.fix_seg_start_epoch and epoch <= cfg.TRAIN.fix_seg_end_epoch:
                loss, acc, edge_loss = segmentation_module(batch_data)
            elif cfg.TRAIN.deform_joint_loss:
                loss, acc, edge_loss = segmentation_module(batch_data)
            else:
                loss, acc = segmentation_module(batch_data)

It raises an "edge_loss undefined" error later. Could you please help? Is it OK to just modify the single-GPU branch as follows:

        if single_gpu_mode:
            loss, acc, edge_loss = segmentation_module(batch_data[0], writer=writer, count=cur_iter, epoch=epoch)
