covid-19-ct-seg-benchmark's Introduction

Towards Efficient COVID-19 CT Annotation: A Benchmark for Lung and Infection Segmentation

Motivation

Numerous studies show that deep learning methods have the potential to provide accurate, quantitative assessment of COVID-19 infection in CT scans if hundreds of well-labeled training cases are available. However, manual delineation of lungs and infections is time-consuming and labor-intensive. We therefore set up this benchmark to explore annotation-efficient methods for COVID-19 CT scan segmentation. In particular, we focus on learning to segment the left lung, right lung, and infection using

  • pure but limited COVID-19 CT scans;

  • existing labeled lung CT datasets from non-COVID-19 lung diseases;

  • heterogeneous datasets that include both COVID-19 and non-COVID-19 CT scans.

Ultimate goal: train a model on limited data that generalizes to unlimited data!

@article{MP-COVID-19-SegBenchmark,
  title={Towards Data-Efficient Learning: A Benchmark for COVID-19 CT Lung and Infection Segmentation},
  author = {Ma, Jun and Wang, Yixin and An, Xingle and Ge, Cheng and Yu, Ziqi and Chen, Jianan and Zhu, Qiongjie and Dong, Guoqiang and He, Jian and He, Zhiqiang and Cao, Tianjia and Zhu, Yuntao and Nie, Ziwei and Yang, Xiaoping},
  journal = {Medical Physics},
  volume = {48},
  number = {3},
  pages = {1197-1210},
  doi = {https://doi.org/10.1002/mp.14676},
  year = {2021}
}

Datasets

| Dataset (download) | Description | License |
|---|---|---|
| StructSeg 2019 | 50 lung CT scans; annotations include left lung, right lung, spinal cord, esophagus, heart, trachea, and gross target volume of lung cancer. | Held by the challenge organizers |
| NSCLC | 402 lung CT scans; annotations include left lung, right lung, and pleural effusion (78 cases). | CC BY-NC |
| MSD Lung Tumor | 63 lung CT scans; annotations include lung cancer. | CC BY-SA |
| COVID-19-CT-Seg | 20 lung CT scans; annotations include left lung, right lung, and infections. | CC BY-NC-SA |
| MosMed | 50 labelled COVID-19 CT scans; annotations include infections. | CC BY-NC-ND |

Examples

Segmentation Task 1: Learning with limited annotations

This task is based on the COVID-19-CT-Seg dataset with 20 cases. The three subtasks are to segment the lung, the infection, or both. For each subtask, 5-fold cross-validation results should be reported. Note that each fold has only 4 training cases; the remaining 16 cases are used for testing. In other words, this is a few-shot or zero-shot segmentation task. The dataset split file and quantitative results of the U-Net baseline are provided in the Task1 folder.

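| Subtask | Training and Testing | Testing |
|---|---|---|
| Lung | 5-fold cross-validation: 4 cases (20%) for training, 16 cases (80%) for testing | MosMed (50) |
| Infection | (same as above) | (same as above) |
| Lung and infection | (same as above) | (same as above) |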
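
The protocol above inverts the usual cross-validation ratio: each fold trains on one fifth of the cases and tests on the other four fifths. The following is a minimal sketch of that arrangement, assuming hypothetical case IDs and a random assignment; the split file in the Task1 folder defines the benchmark's actual folds, and the function name `make_folds` is illustrative only.

```python
import random


def make_folds(case_ids, n_folds=5, seed=0):
    """Return n_folds splits; each trains on one chunk and tests on all other cases."""
    ids = sorted(case_ids)
    random.Random(seed).shuffle(ids)  # illustrative random assignment, not the official split
    chunk = len(ids) // n_folds
    folds = []
    for k in range(n_folds):
        train = ids[k * chunk:(k + 1) * chunk]
        train_set = set(train)
        test = [c for c in ids if c not in train_set]
        folds.append({"train": train, "test": test})
    return folds


# Hypothetical case IDs; the real dataset uses its own file names.
case_ids = [f"case_{i:02d}" for i in range(1, 21)]
folds = make_folds(case_ids)
for k, fold in enumerate(folds):
    print(f"fold {k}: {len(fold['train'])} training cases, {len(fold['test'])} testing cases")
```

With 20 cases and 5 folds, every case is used for training in exactly one fold and for testing in the other four.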

Segmentation Task 2: Learning to segment COVID-19 CT scans from non-COVID-19 CT scans

This task is to segment lung and infection in COVID-19 CT scans. The main difficulty is that the training set and testing set differ in data distribution. Although all the datasets are lung CT, they vary in lesion types (i.e., cancer, pleural effusion, and COVID-19), patient cohorts and imaging scanners.

Note that labeled COVID-19 CT scans may not be used during training. The following table presents the details of the training, validation, and testing sets. Name (Num.) denotes the dataset name and the number of cases used from that dataset, e.g., StructSeg Lung (40) means that 40 cases from the StructSeg dataset are used for training.

Dataset split file and quantitative results of U-Net baseline are presented in Task2 folder.

| Subtask | Training | In-domain Testing | (Unseen) Testing 1 | (Unseen) Testing 2 |
|---|---|---|---|---|
| Lung | StructSeg Lung (40); NSCLC Lung (322) | StructSeg Lung (10); NSCLC Lung (80) | COVID-19-CT-Seg Lung (20) | - |
| Infection | MSD Lung Tumor (51); StructSeg Gross Target (40); NSCLC Pleural Effusion (62) | MSD Lung Tumor (12); StructSeg Gross Target (10); NSCLC Pleural Effusion (16) | COVID-19-CT-Seg Infection (20) | MosMed (50) |
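
As a quick consistency check, the training and in-domain testing counts in the table above can be compared with the dataset sizes listed in the Datasets section. The sketch below only restates those numbers; the dictionary and function names are illustrative assumptions, not part of the benchmark code.

```python
# Case counts copied from the Datasets table.
DATASET_TOTALS = {
    "StructSeg Lung": 50,
    "NSCLC Lung": 402,
    "MSD Lung Tumor": 63,
    "StructSeg Gross Target": 50,
    "NSCLC Pleural Effusion": 78,
}

# (training cases, in-domain testing cases) copied from the Task 2 table.
TASK2_SPLITS = {
    "StructSeg Lung": (40, 10),
    "NSCLC Lung": (322, 80),
    "MSD Lung Tumor": (51, 12),
    "StructSeg Gross Target": (40, 10),
    "NSCLC Pleural Effusion": (62, 16),
}


def check_splits(splits, totals):
    """Summarize how many cases each split uses and the fraction held out for testing."""
    report = {}
    for name, (n_train, n_test) in splits.items():
        used = n_train + n_test
        report[name] = {
            "used": used,
            "total": totals[name],
            "test_fraction": round(n_test / used, 3),
        }
    return report


report = check_splits(TASK2_SPLITS, DATASET_TOTALS)
for name, row in report.items():
    print(name, row)
```

Every split uses all annotated cases of its source dataset, with roughly one fifth of them held out for in-domain testing.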

Segmentation Task 3: Learning with both COVID-19 and non-COVID-19 CT scans

This task is also to segment lung and infection in COVID-19 CT scans, but a limited number of labeled COVID-19 CT scans may be used during training. For each subtask, 5-fold cross-validation results should be reported.

Dataset split file and quantitative results of U-Net baseline will be presented in Task3 folder.

| Subtask | Training | Validation | Testing 1 | Testing 2 |
|---|---|---|---|---|
| Lung | StructSeg Lung (40); NSCLC Lung (322); COVID-19-CT-Seg Lung (4) | StructSeg Lung (10); NSCLC Lung (80) | COVID-19-CT-Seg Lung (16) | - |
| Infection | MSD Lung Tumor (51); StructSeg Gross Target (40); NSCLC Pleural Effusion (62); COVID-19-CT-Seg Infection (4) | MSD Lung Tumor (12); StructSeg Gross Target (10); NSCLC Pleural Effusion (16) | COVID-19-CT-Seg Infection (16) | MosMed (50) |

Guidelines

  • We hope these tasks can serve as a benchmark for novel annotation-efficient segmentation methods of COVID-19 CT scans. Both semi-automatic (e.g., level set, graph cut...) and fully automatic methods (e.g., CNNs...) are welcome.
  • Evaluation metrics are the Dice similarity coefficient (DSC) and normalized surface Dice (NSD); the Python implementations are here.
  • In the COVID-19-CT-Seg dataset, the last 10 cases from Radiopaedia have been adjusted to the lung window [-1250, 250] and then normalized to [0, 255]. We recommend adjusting the first 10 cases from Coronacases with the same method.
  • The NIfTI format of the NSCLC dataset can be downloaded here (pw:1qop). Note that all copyrights belong to the original dataset contributors; please also cite the corresponding publications if you use this dataset.
  • 2D/3D U-Net baselines are based on nnU-Net. 100 pretrained baseline models and corresponding segmentation results are available: 3D U-Net and 2D U-Net.

Baidu Net Disk mirror (pw: t5mj)
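
The lung-window adjustment recommended in the guidelines can be sketched as follows. This assumes a plain linear mapping of the clipped Hounsfield values onto [0, 255]; the exact procedure applied to the Radiopaedia cases (rounding, output data type) is not stated here, and the function name `apply_lung_window` is hypothetical.

```python
import numpy as np


def apply_lung_window(hu, lower=-1250.0, upper=250.0):
    """Clip intensities to [lower, upper] and rescale them linearly to [0, 255]."""
    clipped = np.clip(np.asarray(hu, dtype=np.float32), lower, upper)
    # Assumption: linear min-max scaling; the result stays floating point.
    return (clipped - lower) / (upper - lower) * 255.0


# Toy intensities in Hounsfield units, including values outside the window.
sample = np.array([-2000.0, -1250.0, -500.0, 250.0, 1000.0])
windowed = apply_lung_window(sample)
print(windowed)
```

Values below -1250 map to 0 and values above 250 map to 255, so both groups of cases end up on the same intensity scale.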

3D U-Net

| Subtask | Left Lung DSC | Left Lung NSD | Right Lung DSC | Right Lung NSD | Infection (COVID-19-CT-Seg) DSC | Infection (COVID-19-CT-Seg) NSD | Infection (MosMed) DSC | Infection (MosMed) NSD |
|---|---|---|---|---|---|---|---|---|
| Task1-Separate | 85.8±10.5 | 71.2±13.8 | 87.9±9.3 | 74.8±11.9 | 67.3±22.3 | 70.0±24.4 | 58.8±20.6 | 66.4±20.3 |
| Task1-Union | 64.6±26.4 | 51.1±23.4 | 75.0±16.8 | 57.7±17.4 | 61.0±26.2 | 61.8±27.4 | 48.2±22.1 | 41.4±19.1 |
| Task2-MSD | - | - | - | - | 25.2±27.4 | 26.0±28.5 | 16.2±23.2 | 17.5±23.4 |
| Task2-StructSeg | 92.2±19.7 | 82.0±15.7 | 95.5±7.2 | 84.2±11.6 | 6.0±12.7 | 5.5±10.7 | 2.6±9.5 | 3.3±9.9 |
| Task2-NSCLC | 57.5±21.5 | 46.9±17.0 | 72.2±15.3 | 51.7±16.8 | 0.4±0.9 | 3.7±4.8 | 0.0±0.0 | 0.5±1.4 |
| Task3-MSD | 96.5±2.8 | 87.9±7.9 | 96.9±2.2 | 88.5±7.1 | 62.3±25.7 | 61.3±27.6 | 39.2±30.6 | 41.3±30.5 |
| Task3-StructSeg | 97.3±2.1 | 90.6±6.2 | 97.7±2.1 | 91.4±6.1 | 64.2±24.5 | 63.3±25.7 | 44.3±25.3 | 49.1±25.8 |
| Task3-NSCLC | 93.5±5.4 | 76.9±13.3 | 94.0±5.3 | 77.2±14.1 | 60.2±25.4 | 58.5±26.7 | 30.1±26.7 | 33.4±27.1 |

2D U-Net

| Subtask | Left Lung DSC | Left Lung NSD | Right Lung DSC | Right Lung NSD | Infection (COVID-19-CT-Seg) DSC | Infection (COVID-19-CT-Seg) NSD | Infection (MosMed) DSC | Infection (MosMed) NSD |
|---|---|---|---|---|---|---|---|---|
| Task1-Separate | 95.1±7.9 | 84.6±12.7 | 95.6±7.4 | 85.5±12.8 | 60.9±24.5 | 61.5±27.0 | 53.7±21.4 | 61.5±21.2 |
| Task1-Union | 87.3±15.8 | 70.5±18.7 | 89.4±12.8 | 71.0±17.8 | 57.7±26.3 | 57.2±29.0 | 52.2±21.6 | 46.2±18.3 |
| Task2-MSD | - | - | - | - | 7.9±11.5 | 12.9±15.3 | 7.6±15.8 | 9.9±17.1 |
| Task2-StructSeg | 46.3±47.6 | 28.4±31.7 | 45.3±46.7 | 28.0±31.3 | 0.2±0.8 | 0.6±1.6 | 1.9±10.1 | 2.2±10.0 |
| Task2-NSCLC | 47.3±48.6 | 37.9±40.1 | 47.6±48.9 | 38.0±40.2 | 1.2±2.9 | 7.3±9.7 | 0.0±0.0 | 1.0±1.9 |
| Task3-MSD | 96.9±4.9 | 89.8±9.1 | 97.1±4.9 | 89.8±9.1 | 51.2±26.8 | 52.7±27.4 | 24.1±23.5 | 29.0±24.5 |
| Task3-StructSeg | 96.3±7.6 | 88.7±10.8 | 96.7±7.0 | 89.0±11.6 | 57.4±26.6 | 57.3±28.4 | 48.2±23.1 | 55.0±23.6 |
| Task3-NSCLC | 92.5±17.3 | 82.5±18.6 | 93.3±15.9 | 82.9±18.6 | 52.5±29.6 | 52.6±30.3 | 31.7±24.6 | 38.9±25.9 |
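
For readers unfamiliar with the DSC values reported above, the following is a minimal sketch of the Dice similarity coefficient on binary masks. It is not the benchmark's linked evaluation code; the function name and the convention of returning 1.0 when both masks are empty are assumptions, and the sketch returns a fraction in [0, 1] whereas the tables appear to report percentages.

```python
import numpy as np


def dice_coefficient(gt, seg):
    """Dice similarity coefficient 2|A∩B| / (|A| + |B|) between two binary masks."""
    gt = np.asarray(gt, dtype=bool)
    seg = np.asarray(seg, dtype=bool)
    denom = gt.sum() + seg.sum()
    if denom == 0:
        return 1.0  # assumed convention: two empty masks count as a perfect match
    return float(2.0 * np.logical_and(gt, seg).sum() / denom)


# Toy 2D masks: 3 foreground pixels each, 2 of them overlapping.
gt_mask = np.array([[1, 1, 0], [0, 1, 0]])
seg_mask = np.array([[1, 0, 0], [0, 1, 1]])
dsc = dice_coefficient(gt_mask, seg_mask)
print(f"DSC = {dsc:.4f}")
```

NSD additionally requires extracting mask surfaces and a distance tolerance, so it is not sketched here.
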
  • How to reproduce the baseline results?

Step 1. Install the nnU-Net following the official guidance.

Step 2. Download the 3D or 2D trained models and put them into your model folder.

Step 3. Run the inference code.

Update

Due to license limitations, we cannot share this dataset directly; please download it from the official homepage.

  • 2020.06.30: Lung annotations of MSD dataset. Baidu NetDisk (pw: q2qv)

TODO

Acknowledgements

We thank all the organizers of the MICCAI 2018 Medical Segmentation Decathlon, the MICCAI 2019 Automatic Structure Segmentation for Radiotherapy Planning Challenge, the Coronacases Initiative, and Radiopaedia for the publicly available lung CT datasets. We also thank Joseph Paul Cohen for providing a convenient download link for the 20 COVID-19 CT scans. We also thank all the contributors of the NSCLC and COVID-19-CT-Seg datasets for providing annotations of lung, pleural effusion, and COVID-19 infection. We also thank the organizers of the TMI Special Issue on Annotation-Efficient Deep Learning for Medical Imaging, because the call for papers gave us many insights when designing these segmentation tasks. We also thank the contributors of these great COVID-19 related resources: COVID19_imaging_AI_paper_list and MedSeg. Last but not least, we thank Chen Chen, Xin Yang, and Yao Zhang for their important feedback on this benchmark.

covid-19-ct-seg-benchmark's People

Contributors

gc-js, junma11, wangyixinxin, yu56500

covid-19-ct-seg-benchmark's Issues

Can nnUNet be used as a crop preprocessing in semi-supervised segmentation?

Hi JunMa, label-based crop preprocessing is very common in medical image segmentation tasks. In Task 1, I want to further improve the results through semi-supervised learning, but the unlabeled data requires crop preprocessing. Can I generate pseudo-labels for this preprocessing from a rough segmentation of the unlabeled data produced by the baseline model in each fold? Would this process be considered cheating? Thank you very much!

Feasibility of Few shot segmentation (specifically task 3)

Hi JunMa,

Maybe this is not the correct thread for asking, but I'm working on this segmentation benchmark and my supervisor plans to extend it into an individual project. Do you think this few-shot learning task (maybe with transfer from NSCLC or a similar dataset) is feasible? As far as I know (correct me if I'm wrong), few-shot learning for segmentation in the medical domain is still not well explored.
Or do you have any suggested method in mind that I can start with?

Thanks

Can't find data

Thanks for sharing a wonderful project.
I have a problem with data permissions: the link does not allow me to download. Can you tell me how to download the StructSeg 2019 dataset?

Best regards!

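A question about one COVID-19 segmentation case

Hello, in Task 1 we noticed that the case radiopaedia_29_86490_1 has very few infection annotations, unlike the other cases. When jointly segmenting the left lung, right lung, and infection, segmenting the infection in this case is very difficult, and on the validation set there is a huge gap between the infection evaluation results for the other cases and those for this case. Is there a more quantitative evaluation criterion for the lesion segmentation of radiopaedia_29_86490_1 that we could use to analyze the cause of this problem?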

5-fold cross-validation

Dear JunMa:

Thank you for providing this amazing benchmark. Should we follow your data split for 5-fold cross-validation?
