Coder Social home page Coder Social logo

centime's Introduction

CenTime: Event-Conditional Modelling of Censoring in Survival Analysis

maintenance arxiv1 arxiv2 maintenance maintenance OSIC

Overview

This repository contains the code for the paper

A. H. Shahin, A. Zhao, A. C. Whitehead, D. C. Alexander, J. Jacob, D. Barber. CenTime: Event-Conditional Modelling of Censoring in Survival Analysis. [arXiv] [Medical Image Analysis]

We also include the code for our prequel paper published in the International Conference on Medical Imaging with Deep Learning (MIDL) 2022

A. H. Shahin, J. Jacob, D. C. Alexander, D. Barber. Survival Analysis for Idiopathic Pulmonary Fibrosis using CT Images and Incomplete Clinical Data. [arXiv] [OpenReview] [PMLR]

If you use these tools or methods in your publications, please consider citing the accompanying papers with a BibTeX entry similar to the following:

@article{shahin2023centime,
title = {CenTime: Event-conditional modelling of censoring in survival analysis},
author = {Ahmed H. Shahin and An Zhao and Alexander C. Whitehead and Daniel C. Alexander and Joseph Jacob and David Barber},
journal = {Medical Image Analysis},
volume = {91},
pages = {103016},
year = {2024},
issn = {1361-8415},
doi = {https://doi.org/10.1016/j.media.2023.103016},
url = {https://www.sciencedirect.com/science/article/pii/S1361841523002761},
keywords = {Survival Analysis, Deep Learning, Censoring},
}

@InProceedings{shahin22a,
  title = {Survival Analysis for Idiopathic Pulmonary Fibrosis using CT Images and Incomplete Clinical Data},
  author = {Shahin, Ahmed H. and Jacob, Joseph and Alexander, Daniel C. and Barber, David},
  booktitle = {Proceedings of The 5th International Conference on Medical Imaging with Deep Learning},
  pages = {1057--1074},
  year = {2022},
  editor = {Konukoglu, Ender and Menze, Bjoern and Venkataraman, Archana and Baumgartner, Christian and Dou, Qi and Albarqouni, Shadi},
  volume = {172},
  series = {Proceedings of Machine Learning Research},
  month = {06--08 Jul},
  publisher = {PMLR},
  url = {https://proceedings.mlr.press/v172/shahin22a.html},
}

Requirements

We used Python 3.9.5 and PyTorch 1.9.1 in our experiments. For the rest of the requirements, please refer to the requirements.txt file. You can install them using the following command: pip install -r requirements.txt.

Data

The data used in this work is the OSIC Dataset. If you are an OSIC member, please register to access the repository. If not, please visit the membership page to learn more.

In addition, the methods are quite general and can be applied to any survival analysis dataset. We are eager to see how our methods perform on other datasets and we encourage you to try them out and/or reach out with any questions or insights.

Usage

We assume that the code is executed from the root directory of the repository.

Dataset Class

First, you will need to write a custom pytorch dataset class that loads your data. You should place it in the dataset.py file. The __getitem__ function in the dataset class should return a dictionary with the following keys:

  • img: a tensor of shape (C, H, W) where C is the number of channels, H is the height, and W is the width.
  • time_to_event: a tensor of shape (1,) that contains the time to event for the current sample. This is the time at which the event occurred (for an uncensored sample) or the time at which the sample was censored (for a censored sample).
  • event: a tensor of shape (1,) that contains the event type for the current sample. This is 0 for a censored sample and 1 for an uncensored sample.
  • pid_sids: a tuple of shape (1,) that contains the patient ID and the study ID for the current sample. This is used to group the scans of the same patient together.
  • clinical_data (optional): a tensor of shape (D,) that contains the clinical data for the current sample, D is the number of clinical features after pre-processing. This is used to condition the model on the clinical data. If you do not have clinical data, you can ignore this key.

Preprocessing details are provided in the paper. We used the following preprocessing steps:

  • CT scans
    • Resample to a common voxel size of 1mm x 1mm x 1mm
    • Crop to the lung area
    • Pad to the largest dimension
    • Resize to 256 x 256 x 256
    • Apply Hounsfield Unit (HU) windowing to [-1024, 150]
    • Z-score normalization
  • Clinical data
    • Missing data imputation using our latent variable model (see the MIDL paper for details)
    • Z-score normalization for continuous features
    • One-hot encoding for categorical features

Changing the keys is possible but you will need to apply minimal modifications to the code, accordingly (e.g., custom_transforms.py and the training scripts.).

Missing Data Imputation

If you have missing clinical data, you will need to impute them before training. We used our latent variable model for missing data imputation. Please refer to the MIDL paper for details. The code for the model is provided in imputation/missing_data_imputation.py. You can use the following command to train the model:

python imputation/missing_data_imputation.py -data_path <path to the data> -save_path <path to save the model> -n_iter <number of iterations> -n_latent_states <number of latent states> -n_bins <number of bins to use for discretization> -tol <tolerance> -val_size <validation size percentage> -cont_feats_idx <indices of continuous features>

This will save the model parameters p(h) & p(x|h) in the specified path. The model parameters will be used for missing data imputation during training and validation (e.g., here).

Training

To train the model using our CenTime method, you can use the following command:

python train_dist.py -loss centime ...

and set the rest of the arguments as needed (see utils.py for the full list of arguments). To use the classical censoring model, you can use the following command:

python train_dist.py -loss classical ...

To train using Cox, CoxMB, and DeepHit, you can use the following commands:

python train_cox.py ... # for Cox
python train_coxmb.py -K <the proportion of training data to be stored in the memory bank> ... # for CoxMB
python train_deephit.py -ranking <1 to use the ranking term, 0 otherwise> ... # for DeepHit

Scripts use the C-Index, MAE, and RAE metrics for evaluation (see eval_utils.py for details).

Acknowledgements

This work was supported by the Open Source Imaging Consortium.

centime's People

Contributors

ahmedhshahin avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.