
srinidhipy / ssl_cr_histo


Official code for "Self-Supervised Driven Consistency Training for Annotation Efficient Histopathology Image Analysis", published in the Medical Image Analysis (MedIA) journal, October 2021.

Home Page: https://doi.org/10.1016/j.media.2021.102256

License: MIT License

Python 99.51% Shell 0.49%
self-supervised-learning digital-pathology camelyon16 breastpathq teacher-student-training annotation-efficient semi-supervised-learning deep-learning histopathology

ssl_cr_histo's Introduction

Self-Supervised Driven Consistency Training for Annotation Efficient Histopathology Image Analysis

Overview

We propose a self-supervised driven consistency training paradigm for histopathology image analysis that learns to leverage both task-agnostic and task-specific unlabeled data based on two strategies:

  1. A self-supervised pretext task that harnesses the underlying multi-resolution contextual cues in histology whole-slide images (WSIs) to learn a powerful supervisory signal for unsupervised representation learning.

  2. A new teacher-student semi-supervised consistency paradigm that learns to effectively transfer the pretrained representations to downstream tasks based on prediction consistency with the task-specific unlabeled data.

We carry out extensive validation experiments on three histopathology benchmark datasets (BreastPathQ, Camelyon16, and the Kather colorectal dataset) across two classification tasks and one regression task.

We compare against the state-of-the-art self-supervised pretraining methods based on generative and contrastive learning techniques: Variational Autoencoder (VAE) and Momentum Contrast (MoCo), respectively.

1. Self-Supervised pretext task

2. Consistency training

Results

  • Predicted tumor cellularity (TC) scores on BreastPathQ test set for 10% labeled data


  • Predicted tumor probability on Camelyon16 test set for 10% labeled data

Pre-requisites

Core implementation:

  • Python 3.7+
  • PyTorch 1.7+
  • openslide-python 1.1+
  • Albumentations 1.8+
  • Scikit-image 0.15+
  • Scikit-learn 0.22+
  • Matplotlib 3.2+
  • Scipy, Numpy (any version)

Additional packages can be installed via: requirements.txt

Datasets

Training

The model training consists of three stages:

  1. Task-agnostic self-supervised pretext task (i.e., the proposed Resolution sequence prediction (RSP) task)
  2. Task-specific supervised fine-tuning (SSL)
  3. Task-specific teacher-student consistency training (SSL_CR)
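
The three stages can be chained as a sequence of script invocations. The following is a minimal sketch, not part of the repository: it only assembles the commands, using the script names and the arguments (--train_image_pth, --model_path, --model_path_finetune) described in the sections below. The helper name, the checkpoint paths, and the assumption that no other arguments are needed are hypothetical.

```python
# Script names per dataset, taken from this README. Note that the
# Camelyon fine-tuning scripts are named without the "16" suffix.
STAGE_SCRIPTS = {
    "BreastPathQ": (
        "pretrain_BreastPathQ.py",
        "eval_BreastPathQ_SSL.py",
        "eval_BreastPathQ_SSL_CR.py",
    ),
    "Camelyon16": (
        "pretrain_Camelyon16.py",
        "eval_Camelyon_SSL.py",
        "eval_Camelyon_SSL_CR.py",
    ),
}


def build_stage_commands(dataset, train_image_pth, pretrained_ckpt, finetuned_ckpt):
    """Return one argv list per training stage (hypothetical helper).

    pretrained_ckpt and finetuned_ckpt are placeholder checkpoint paths;
    where each script actually writes its checkpoint is not stated here.
    """
    pretrain, finetune, consistency = STAGE_SCRIPTS[dataset]
    return [
        # Stage 1: task-agnostic RSP pretraining on the training WSIs
        ["python", pretrain, "--train_image_pth", train_image_pth],
        # Stage 2: supervised fine-tuning from the Stage 1 model
        ["python", finetune, "--model_path", pretrained_ckpt],
        # Stage 3: teacher-student consistency training from the Stage 2 model
        ["python", consistency, "--model_path_finetune", finetuned_ckpt],
    ]


if __name__ == "__main__":
    for cmd in build_stage_commands(
        "BreastPathQ", "path/to/WSIs", "rsp_pretrained.pt", "ssl_finetuned.pt"
    ):
        print(" ".join(cmd))
```

Each command can be run by hand or passed to subprocess.run once the placeholder paths are replaced with real ones.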

1. Self-supervised pretext task: Resolution sequence prediction (RSP) in WSIs

The scripts pretrain_BreastPathQ.py and pretrain_Camelyon16.py pretrain the network (ResNet18) to predict the resolution sequence ordering in WSIs on the BreastPathQ and Camelyon16 datasets, respectively. They can be adapted to other datasets.

  • When pretraining on other datasets, the choice of resolution levels for the RSP task can be set in dataset.py#L277.
  • The only required argument is --train_image_pth, which should point to the directory containing your training WSIs. The remaining arguments are documented in the corresponding files.
python pretrain_BreastPathQ.py    # Pretraining on BreastPathQ
python pretrain_Camelyon16.py    # Pretraining on Camelyon16
  • We also provide pretrained models for BreastPathQ and Camelyon16 in the "Pretrained_models" folder. These models can also be used for feature transfer (domain adaptation) between datasets with different tissue types/organs.

  • A new version of RSP pretraining (version 2), which uses the RandAugment technique [Pretraining_v2], has been implemented on the TIGER Challenge dataset (https://tiger.grand-challenge.org).
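
To make the idea of predicting a resolution sequence ordering concrete, here is a minimal sketch. It assumes the pretext task is framed as classification over the possible orderings of a small set of resolution levels; the level values and function names are hypothetical, and the actual construction in dataset.py may differ.

```python
import itertools
import random


def build_rsp_classes(resolution_levels):
    """Map every ordering of the resolution levels to a class index."""
    orderings = itertools.permutations(resolution_levels)
    return {ordering: idx for idx, ordering in enumerate(orderings)}


def sample_rsp_example(resolution_levels, classes, rng):
    """Draw a random ordering and return it with its class label.

    In a real pipeline, one patch would be read from the WSI at each
    level of the ordering; here only the label logic is shown.
    """
    ordering = tuple(rng.sample(resolution_levels, k=len(resolution_levels)))
    return ordering, classes[ordering]


levels = [0, 1, 2]  # hypothetical WSI pyramid levels
classes = build_rsp_classes(levels)
ordering, label = sample_rsp_example(levels, classes, random.Random(0))
print(f"{len(classes)} classes; sampled ordering {ordering} has label {label}")
```

With three levels there are 3! = 6 possible orderings, so under this framing the network would be trained as a 6-way classifier.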

2. Task-specific supervised fine-tuning on downstream task

The scripts eval_BreastPathQ_SSL.py, eval_Camelyon_SSL.py, and eval_Kather_SSL.py fine-tune the network (i.e., task-specific supervised fine-tuning) on the downstream task with limited labeled data (10%, 25%, 50%). Refer to the paper for more details.

  • Arguments: --model_path - path to the self-supervised pretrained model (i.e., the trained model from Step 1). Other arguments can be set in the corresponding files.
python eval_BreastPathQ_SSL.py    # Supervised fine-tuning on BreastPathQ
python eval_Camelyon_SSL.py    # Supervised fine-tuning on Camelyon16
python eval_Kather_SSL.py    # Supervised fine-tuning on Kather dataset (Colorectal)

Note: we did not perform self-supervised pretraining on the Kather dataset (colorectal) because WSIs are not available for it. Instead, we performed domain adaptation by pretraining on Camelyon16 and fine-tuning on the Kather dataset. Refer to the paper for more details.
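
The limited-label setting (10%, 25%, 50%) amounts to keeping labels for only a fraction of the training samples. A minimal sketch of such a split is shown below; the function name and the uniform random sampling are assumptions for illustration, and the repository's own split procedure may differ (for example, it may stratify by class or by slide).

```python
import random


def split_labeled_unlabeled(sample_ids, labeled_fraction, seed=0):
    """Randomly keep `labeled_fraction` of the samples as labeled.

    Returns (labeled_ids, unlabeled_ids). The fixed seed makes the
    split reproducible across runs.
    """
    if not 0.0 < labeled_fraction <= 1.0:
        raise ValueError("labeled_fraction must be in (0, 1]")
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    n_labeled = max(1, round(labeled_fraction * len(ids)))
    return ids[:n_labeled], ids[n_labeled:]


for fraction in (0.10, 0.25, 0.50):
    labeled, unlabeled = split_labeled_unlabeled(range(1000), fraction)
    print(f"{fraction:.0%}: {len(labeled)} labeled, {len(unlabeled)} unlabeled")
```

The unlabeled remainder is what the consistency training stage can draw on as task-specific unlabeled data.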

3. Task-specific teacher-student consistency training on downstream task

The scripts eval_BreastPathQ_SSL_CR.py, eval_Camelyon_SSL_CR.py, and eval_Kather_SSL_CR.py fine-tune the student network, with the teacher network kept frozen, via task-specific consistency training on the downstream task with limited labeled data (10%, 25%, 50%). Refer to the paper for more details.

  • Arguments: --model_path_finetune - path to the SSL fine-tuned model (i.e., the model obtained by self-supervised pretraining followed by supervised fine-tuning in Step 2), used to initialize both the teacher and student networks for consistency training. Other arguments can be set in the corresponding files.
python eval_BreastPathQ_SSL_CR.py    # Consistency training on BreastPathQ
python eval_Camelyon_SSL_CR.py    # Consistency training on Camelyon16
python eval_Kather_SSL_CR.py    # Consistency training on Kather dataset (Colorectal)
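
The general shape of a teacher-student consistency objective can be sketched as a supervised term on labeled data plus a term that pulls the student's predictions on unlabeled data toward those of the frozen teacher. The framework-free example below uses mean squared error for both terms purely for illustration; the function names, the weighting parameter, and the choice of loss are assumptions, and the losses used in the paper and in the scripts above may differ.

```python
def mean_squared_error(predictions, targets):
    """Average squared difference between two equal-length sequences."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)


def consistency_training_loss(
    student_labeled, labels, student_unlabeled, teacher_unlabeled,
    consistency_weight=1.0,
):
    """Supervised loss plus weighted student-teacher consistency loss.

    teacher_unlabeled is treated as a fixed target: the teacher is
    frozen, so only the student's predictions would receive gradients.
    """
    supervised = mean_squared_error(student_labeled, labels)
    consistency = mean_squared_error(student_unlabeled, teacher_unlabeled)
    return supervised + consistency_weight * consistency


loss = consistency_training_loss(
    student_labeled=[0.8, 0.2], labels=[1.0, 0.0],
    student_unlabeled=[0.6, 0.4], teacher_unlabeled=[0.5, 0.5],
    consistency_weight=0.5,
)
print(f"total loss: {loss:.3f}")
```

In this toy case the supervised term is 0.04 and the consistency term is 0.01, so with a weight of 0.5 the total is 0.045.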

Testing

The test performance is validated at two stages:

  1. Self-supervised pretraining followed by supervised fine-tuning
  • In eval_BreastPathQ_SSL.py / eval_Kather_SSL.py, you can test the model by setting the '--mode' argument to 'evaluation'.
  2. Consistency training
  • In eval_BreastPathQ_SSL_CR.py / eval_Kather_SSL_CR.py, you can test the model by setting the '--mode' argument to 'evaluation'.

Predictions on the Camelyon16 test set can be generated with test_Camelyon16.py.

License

Our code is released under the MIT license.

Citation

If you find our work useful in your research, or if you use parts of this code, please consider citing our paper:

@article{srinidhi2022self,
  title={Self-supervised driven consistency training for annotation efficient histopathology image analysis},
  author={Srinidhi, Chetan L and Kim, Seung Wook and Chen, Fu-Der and Martel, Anne L},
  journal={Medical Image Analysis},
  volume={75},
  pages={102256},
  year={2022},
  publisher={Elsevier}
}

Extended work

  • We also improved the generalization of our self-supervised pretrained representations to out-of-distribution data via a hardness-aware dynamic curriculum learning (HaDCL) approach. Published in the ICCV 2021 CDpath Workshop (Oral). [Conference proceedings] [arXiv preprint] [Code]

Acknowledgements

  • This work was funded by Canadian Cancer Society and Canadian Institutes of Health Research (CIHR). It was also enabled in part by support provided by Compute Canada (www.computecanada.ca).

  • The RSP (version 2) pretraining code with the RandAugment technique was inspired by "Tailoring automated data augmentation to H&E-stained histopathology", MIDL 2021 (https://github.com/DIAGNijmegen/pathology-he-auto-augment). Please cite this paper if you use this code.

Questions or Comments

Please direct any questions or comments to me; I am happy to help in any way I can. You can email me directly at [email protected].

ssl_cr_histo's Issues

Pretrained weight

Hi, thanks for your amazing work! I have a question: when I try to load the pretrained weights Pretrained_models/Camelyon16_pretrained_model.pt, I run into a problem.
[image]
I think this weight file is damaged. Could you please check its validity? Looking forward to your reply!

BreastPathQ: After pretraining the model on WSIs, the results of task-specific supervised fine-tuning are poor

  • For BreastPathQ, when I fine-tuned the pretrained model released by the authors, I obtained results similar to those in your paper.

  • However, when I pretrained the model myself, following the instructions (1. Self-supervised pretext task: Resolution sequence prediction (RSP) in WSIs), and then fine-tuned my best pretrained model in the same way, the results were not as good as those in the paper.

  • Paper: [image]

  • My result: [image]

  • During my training, 4 bad WSIs could not be used. I don't think this is the root cause, though, because it only loses a few hundred samples.

How to use the pre-trained models?

Hi,
Thank you for your great work!
I wonder how to use your model for linear probing.
Empirically, I find that the results are not promising when I use your released models with the MLP removed.
If the MLP cannot be removed, how can the model be used with pathology images at only one magnification?
Looking forward to your help! Thanks again.

Some questions about the experimental results of the CRC dataset

Thank you for your excellent paper and open-source code. I have some questions about the experimental results on NCT-CRC.

  1. The MoCo + CR approach obtains a new state-of-the-art result with an Acc of 0.990, weighted F-1 score of 0.953 and a macro AUC of 0.997, compared to the previous method (Kather et al., 2019), which obtained an Acc of 0.943. However, in Table 5, random initialization reaches 97.2% Acc with 10% of the training data, which is also much higher than the 0.943 of Kather et al. (2019). Since random initialization also achieves high Acc, did I miss something?

Table 5 presents the overall Acc and weighted F1 score (F1) for classification of 9 colorectal tissue classes using different methodologies. On this dataset, the MoCo + CR approach obtains a new state-of-the-art result with an Acc of 0.990, weighted F-1 score of 0.953 and a macro AUC of 0.997, compared to the previous method (Kather et al., 2019) which obtained an Acc of 0.943.

  2. When I train on the CRC dataset, the gap between my weighted F1 and Acc is not as large as yours (Acc: 0.990, weighted F-1: 0.953); for example, I get Acc: 0.9400 and weighted F1: 0.9399. Did I miss something?

Loss function of BreastPathQ fine-tuning

Hi, thank you for sharing your code and pre-trained models !

I have a question regarding the loss function used for fine-tuning the pre-trained model on the BreastPathQ dataset.
In line 387, it looks like you use the mean squared error, although you build your model as a classifier. Is this intended? If so, could you explain why?

Thank you.

Questions about the BreastPathQ dataset

[image]
Thank you for releasing this great code. I have a question about the BreastPathQ dataset: where can I get the targets for the test set? The official Grand Challenge site only provides labels for the training and validation sets.

Slide prediction in Camelyon16

Hello, thanks for your amazing work! While reproducing the results of the paper, I ran into some problems that I hope you can help with.
For slide-level prediction on Camelyon16, I could not find the code that goes from the heatmap to a slide-level prediction. Following the paper, I referred to the code here: for each slide, I extracted 28 features from the heatmap and fed them into a random forest for training, but did not get a good result. Are there any tricks for training the RandomForestClassifier? If you could open-source the code for this part, I believe it would be of great help!
Looking forward to your reply!
