openmedlab / medfm

Official Repository of NeurIPS 2023 - MedFM Challenge

Home Page: https://medfm2023.grand-challenge.org/

Python 99.39% Dockerfile 0.54% Shell 0.07%
neurips-2023 foundation-models medical-image-classification transfer-learning

medfm's Introduction

NeurIPS 2023 - MedFM: Foundation Model Prompting for Medical Image Classification Challenge 2023

A naive baseline and submission demo for the Foundation Model Prompting for Medical Image Classification Challenge 2023 (MedFM).

✨ Notification

Please check out the master branch. A third-party implementation of the MedFMC baseline is now supported. It is based on MMPreTrain and provides ViT-cls, ViT-eva02, ViT-dinov2, Swin-cls and ViT-clip backbones. More details can be found in its documentation. Thanks to Ezra-Yu for this excellent work.

🛠️ Installation

Install the requirements with:

$ conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=10.1 -c pytorch
$ pip install mmcls==0.25.0 openmim scipy scikit-learn ftfy regex tqdm
$ mim install mmcv-full==1.6.0

We suggest installing PyTorch first and confirming that it works, then installing the OpenMMLab packages and their dependencies.
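As a quick sanity check after installation, the pinned versions can be compared with what is actually installed. The sketch below is not part of the repository: the helper names `installed_version` and `find_mismatches` are hypothetical, and it assumes the packages are registered under the distribution names `torch`, `torchvision`, `torchaudio`, `mmcls` and `mmcv-full`.

```python
from importlib import metadata

# Pins copied from the install commands above.
PINNED = {
    "torch": "1.8.0",
    "torchvision": "0.9.0",
    "torchaudio": "0.8.0",
    "mmcls": "0.25.0",
    "mmcv-full": "1.6.0",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package whose installed version differs from its pin
    to an (expected, found) pair."""
    mismatches = {}
    for package, expected in pinned.items():
        found = lookup(package)
        # Some builds append a local tag such as "+cu101"; ignore it.
        if found is None or found.split("+")[0] != expected:
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for package, (expected, found) in find_mismatches(PINNED).items():
        print(f"{package}: expected {expected}, found {found}")
```

An empty result means every pinned package was found at the expected version.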

You can also use other computer vision or foundation models, such as EVA and CLIP.

📊 Results

The results for ChestDR, ColonPath and Endo in the MedFMC dataset, together with the corresponding config for each task, are shown below.

Few-shot Learning Results

We use Visual Prompt Tuning as the few-shot learning baseline, with a Swin Transformer backbone. The results are shown below:

ChestDR

N Shot  Crop Size  Epoch  mAP    AUC    Config
1       384x384    20     13.14  56.49  config
5       384x384    20     17.05  64.86  config
10      384x384    20     19.01  66.68  config

ColonPath

N Shot  Crop Size  Epoch  Acc    AUC    Config
1       384x384    20     77.60  84.69  config
5       384x384    20     89.29  96.07  config
10      384x384    20     91.21  97.14  config

Endo

N Shot  Crop Size  Epoch  mAP    AUC    Config
1       384x384    20     19.70  62.18  config
5       384x384    20     23.88  67.48  config
10      384x384    20     25.62  71.41  config

Transfer Learning on 20% (Fully Supervised Task)

Note that MedFMC mainly focuses on few-shot learning, i.e., the transfer learning task. The fully supervised tasks below therefore use only 20% of the training data, so that the results can be compared with the few-shot setting.

ChestDR

Backbone         Crop Size  Epoch  mAP    AUC    Config
DenseNet121      384x384    20     24.48  75.25  config
EfficientNet-B5  384x384    20     29.08  77.21  config
Swin-B           384x384    20     31.07  78.56  config

ColonPath

Backbone         Crop Size  Epoch  Acc    AUC    Config
DenseNet121      384x384    20     92.73  98.27  config
EfficientNet-B5  384x384    20     94.04  98.58  config
Swin-B           384x384    20     94.68  98.35  config

Endo

Backbone         Crop Size  Epoch  mAP    AUC    Config
DenseNet121      384x384    20     41.13  80.19  config
EfficientNet-B5  384x384    20     36.95  78.23  config
Swin-B           384x384    20     41.38  79.42  config

🎫 License

This project is released under the Apache 2.0 license.

🙌 Usage

Data preparation

Prepare the data following the MMClassification conventions. The expected directory structure is shown below:

data/
├── MedFMC
│   ├── chest
│   │   ├── images
│   │   ├── chest_X-shot_train_expY.txt
│   │   ├── chest_X-shot_val_expY.txt
│   │   ├── train_20.txt
│   │   ├── val_20.txt
│   │   ├── trainval.txt
│   │   ├── test_WithLabel.txt
│   ├── colon
│   │   ├── images
│   │   ├── colon_X-shot_train_expY.txt
│   │   ├── colon_X-shot_val_expY.txt
│   │   ├── train_20.txt
│   │   ├── val_20.txt
│   │   ├── trainval.txt
│   │   ├── test_WithLabel.txt
│   ├── endo
│   │   ├── images
│   │   ├── endo_X-shot_train_expY.txt
│   │   ├── endo_X-shot_val_expY.txt
│   │   ├── train_20.txt
│   │   ├── val_20.txt
│   │   ├── trainval.txt
│   │   ├── test_WithLabel.txt

Note that the .txt files contain the data split information for the fully supervised and few-shot learning tasks. The public dataset is split into trainval.txt and test_WithLabel.txt, and trainval.txt is further split into train_20.txt and val_20.txt, where 20 means that the training data makes up 20% of trainval.txt. The test_WithoutLabel.txt of each dataset is the validation set.
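To illustrate what a 20% / 80% split of trainval.txt amounts to, here is a minimal sketch. It is not the procedure used to create the shipped train_20.txt and val_20.txt (the README does not describe that procedure); the seeded random shuffle and the helper name `split_trainval` are assumptions, and annotation lines are treated as opaque strings. To reproduce the reported results, use the split files provided with the repository.

```python
import random


def split_trainval(lines, train_fraction=0.2, seed=0):
    """Shuffle annotation lines and split them into (train, val) lists.

    Assumption: a seeded random split; the official files may have been
    produced differently.
    """
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


if __name__ == "__main__":
    # Synthetic stand-in for the lines of trainval.txt.
    demo_lines = [f"image_{i:03d}.png" for i in range(100)]
    train, val = split_trainval(demo_lines)
    print(len(train), len(val))
```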

The corresponding .txt files are stored in the ./data_backup/ folder. The few-shot data split files {dataset}_{N_shot}-shot_train/val_exp{N_exp}.txt can also be generated as follows:

python tools/generate_few-shot_file.py

Here N_shot is 1, 5 or 10. Shots are counted per patient rather than per image: for example, 1-shot means that all images of one patient together count as a single shot.
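The per-patient counting can be sketched as follows. This is an illustration only, not the logic of tools/generate_few-shot_file.py: the helper `sample_n_shot`, the image-to-patient mapping and the file names are hypothetical, and the sketch ignores any per-class balancing the real script may perform.

```python
import random
from collections import defaultdict


def sample_n_shot(image_to_patient, n_shot, seed=0):
    """Pick n_shot patients at random and return all of their images.

    image_to_patient maps an image file name to a patient identifier
    (a hypothetical input format chosen for this sketch).
    """
    by_patient = defaultdict(list)
    for image, patient in image_to_patient.items():
        by_patient[patient].append(image)
    # Sort so that the result depends only on the seed, not on dict order.
    patients = sorted(by_patient)
    chosen = random.Random(seed).sample(patients, n_shot)
    return sorted(image for patient in chosen for image in by_patient[patient])


if __name__ == "__main__":
    demo = {
        "p1_a.png": "p1",
        "p1_b.png": "p1",
        "p2_a.png": "p2",
        "p3_a.png": "p3",
    }
    # 1-shot: every image of exactly one patient is selected.
    print(sample_n_shot(demo, n_shot=1))
```

With this counting, a 1-shot split can contain several images as long as they all belong to the same patient.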

The images folder of each dataset contains its images, which can be obtained from the original dataset.

Training and evaluation using OpenMMLab codebases.

This repository provides config files for the fully supervised task (which uses only 20% of the original training set; see the .txt files that define the split) and for the few-shot learning task.

The config files for the fully supervised transfer learning task are stored in the ./configs/densenet, ./configs/efficientnet, ./configs/vit-base and ./configs/swin_transformer folders. The config files for the few-shot learning task are stored in the ./configs/ablation_exp and ./configs/vit-b16_vpt folders.

Use the commands below to train and test a model:

# Export the path in your terminal so that `custom_imports` in the config works
export PYTHONPATH=$PWD:$PYTHONPATH
# Training
# Choose a config file, e.g. `configs/vit-b16_vpt/in21k-vitb16_vpt1_bs4_lr6e-4_1-shot_chest.py`
python tools/train.py $CONFIG

# Evaluation
# Endo and ChestDR use mAP as the metric
python tools/test.py $CONFIG $CHECKPOINT --metrics mAP
python tools/test.py $CONFIG $CHECKPOINT --metrics AUC_multilabel
# Colon uses accuracy as the metric
python tools/test.py $CONFIG $CHECKPOINT --metrics accuracy --metric-options topk=1
python tools/test.py $CONFIG $CHECKPOINT --metrics AUC_multiclass

The repository is built upon MMClassification/MMPretrain. More details can be found in its documentation.
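When evaluating many checkpoints, the per-dataset metric flags from the commands above can be kept in one place. The sketch below only assembles argument lists and does not run anything; the `METRIC_ARGS` table and the `build_test_commands` helper are hypothetical names, and the dataset keys `chest`, `endo` and `colon` are borrowed from the data folder names.

```python
# Metric flags copied from the evaluation commands above.
METRIC_ARGS = {
    "chest": [["--metrics", "mAP"], ["--metrics", "AUC_multilabel"]],
    "endo": [["--metrics", "mAP"], ["--metrics", "AUC_multilabel"]],
    "colon": [
        ["--metrics", "accuracy", "--metric-options", "topk=1"],
        ["--metrics", "AUC_multiclass"],
    ],
}


def build_test_commands(dataset, config, checkpoint):
    """Return one argument list per evaluation command for the dataset."""
    return [
        ["python", "tools/test.py", config, checkpoint, *extra]
        for extra in METRIC_ARGS[dataset]
    ]


if __name__ == "__main__":
    # Placeholder paths; substitute a real config and checkpoint.
    for command in build_test_commands("colon", "CONFIG.py", "CHECKPOINT.pth"):
        print(" ".join(command))
```

Each returned list can be passed to `subprocess.run(command, check=True)` from the repository root once `PYTHONPATH` has been exported as shown earlier.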

Generating Submission results of Validation Phase

Note:

  • The order of filenames in every CSV file must follow the order in the provided colon_val.csv, chest_val.csv and endo_val.csv! See the files in ./data_backup/result_sample for more details.
  • The CSV files in result.zip must be named exactly xxx_N-shot_submission.csv, as listed below.

Run

# DATASET is endo, colon or chest; N is 1, 5 or 10
python tools/test_prediction.py $DATASETPATH/test_WithoutLabel.txt $DATASETPATH/images/ $CONFIG $CHECKPOINT --output-prediction ${DATASET}_${N}-shot_submission.csv

For example:

python tools/test_prediction.py data/MedFMC/endo/test_WithoutLabel.txt data/MedFMC/endo/images/ $CONFIG $CHECKPOINT --output-prediction endo_10-shot_submission.csv
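Because the filename order must match the provided reference CSV, it can be worth checking a generated file before submitting. The sketch below assumes that the filename is the first column of both CSV files, which the README does not state; inspect the samples in ./data_backup/result_sample to confirm. The helper names are hypothetical.

```python
import csv
import io


def first_column(csv_text):
    """Return the first field of every non-empty row in a CSV string."""
    return [row[0] for row in csv.reader(io.StringIO(csv_text)) if row]


def first_order_mismatch(reference_text, submission_text):
    """Return the index of the first row whose filename differs,
    or None when both files list the same filenames in the same order."""
    reference = first_column(reference_text)
    submission = first_column(submission_text)
    for index, (expected, found) in enumerate(zip(reference, submission)):
        if expected != found:
            return index
    if len(reference) != len(submission):
        # One file is a strict prefix of the other.
        return min(len(reference), len(submission))
    return None


if __name__ == "__main__":
    reference = "a.png,0\nb.png,1\n"
    submission = "a.png,0.12\nb.png,0.93\n"
    print(first_order_mismatch(reference, submission))
```

To check real files, read each one with `open(path, newline="").read()` and pass the two strings to `first_order_mismatch`.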

Generate the prediction results for every endo_N-shot_submission.csv, colon_N-shot_submission.csv and chest_N-shot_submission.csv, zip them into a result.zip file, and upload it to the Grand Challenge website.

result/
├── endo_1-shot_submission.csv
├── endo_5-shot_submission.csv
├── endo_10-shot_submission.csv
├── colon_1-shot_submission.csv
├── colon_5-shot_submission.csv
├── colon_10-shot_submission.csv
├── chest_1-shot_submission.csv
├── chest_5-shot_submission.csv
├── chest_10-shot_submission.csv

Compress these files into a .zip archive (see result_sample.zip in the ./data_backup folder for an example) and upload it to the submission site of the Grand Challenge MedFMC Validation Phase.
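The packaging step can also be scripted with Python's standard zipfile module. This is a sketch under one assumption: the CSV files are stored at the top level of the archive. The README does not say whether the archive should contain a result/ folder, so compare the output with result_sample.zip before uploading. The helper names are hypothetical.

```python
import zipfile
from pathlib import Path

DATASETS = ("endo", "colon", "chest")
SHOTS = (1, 5, 10)


def expected_submission_names():
    """List the nine CSV names required in result.zip."""
    return [f"{d}_{n}-shot_submission.csv" for d in DATASETS for n in SHOTS]


def build_result_zip(csv_dir, zip_path="result.zip"):
    """Zip the nine submission CSVs found in csv_dir into zip_path."""
    csv_dir = Path(csv_dir)
    names = expected_submission_names()
    missing = [name for name in names if not (csv_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing submission files: {missing}")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in names:
            # Assumption: files sit at the archive root, without a folder.
            archive.write(csv_dir / name, arcname=name)
    return Path(zip_path)
```

For example, `build_result_zip("result")` writes result.zip from the files in a local result/ folder and raises `FileNotFoundError` if any of the nine CSVs is absent.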

🖊️ Citation

@article{wang2023real,
  title={A real-world dataset and benchmark for foundation model adaptation in medical image classification},
  author={Wang, Dequan and Wang, Xiaosong and Wang, Lilong and Li, Mengzhang and Da, Qian and Liu, Xiaoqiang and Gao, Xiangyu and Shen, Jun and He, Junjun and Shen, Tian and others},
  journal={Scientific Data},
  volume={10},
  number={1},
  pages={574},
  year={2023},
  publisher={Nature Publishing Group UK London}
}

medfm's People

Contributors

guspan-tanadi, mengzhangli, xiaosongwang

medfm's Issues

ValueError: not enough values to unpack (expected 2, got 1)

File "C:\Anoconda\envs\pytorch\lib\site-packages\mmengine\config\config.py", line 1830, in __call__
key, val = kv.split('=', maxsplit=1)
ValueError: not enough values to unpack (expected 2, got 1)
['C:\master\configs\vit-b_vpt\1-shot_chest.py']
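The failing line in the traceback splits each argument on `=` and expects two parts. The sketch below is a simplified stand-in for that logic, not mmengine's actual code, and shows that a string without `=` (such as a config path) triggers the same error. One possible cause, stated here as an assumption only, is that the config path was placed after an option that accepts KEY=VALUE pairs and was consumed as one of its values.

```python
def parse_key_value(items):
    """Parse KEY=VALUE strings the same way as the line in the traceback."""
    options = {}
    for kv in items:
        # Unpacking needs exactly two parts, so the string must contain "=".
        key, val = kv.split("=", maxsplit=1)
        options[key] = val
    return options


if __name__ == "__main__":
    print(parse_key_value(["topk=1"]))
    try:
        parse_key_value([r"configs\vit-b_vpt\1-shot_chest.py"])
    except ValueError as error:
        print("ValueError:", error)
```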

Why does the code start slowly?

Hi, first of all thank you for the code. I would like to ask why my code is stuck for a long time before the training starts, is there anyone else who has the same problem as me?

About the dataset preprocessing

Hi there.

Thanks for sharing the code. I have a question about generate_few-shot_file.

For example, to get data/MedFMC/chest/images, I merged two folders from the raw dataset: MedFMC_train/chest/images/ and MedFMC_val/chest/images/. Did I do the right thing?

I run the preprocessing script as below:

python tools/generate_few-shot_file.py

However, I get "The validation set are not enough..." Is this normal?

I attach the output of generate_few-shot_file.py . Please correct me if I did something wrong

0 16
1 23
2 43
3 2
The validation set are not enough...
929 84 845
0 101
1 103
2 60
3 15
The validation set are not enough...
929 279 650
0 124
1 86
2 88
3 108
The validation set are not enough...
929 406 523
32 47
7 2018_83253_1-1_2019-02-20 18_38_45-lv1-2951-33576-3045-4336p0006.png
23 2019-07506-1-5-5_2019-05-29 06_09_14-lv1-14171-23920-4547-5510p0018.png
The validation set are not enough...
2358 30 2328
32 47
115 2019_03721_1-1_2019-02-20 19_49_57-lv1-10166-8058-6824-7989p0020.png
150 2019-07501-1-1-1_2019-05-29 08_22_25-lv1-9661-18329-4972-6041p0007.png
The validation set are not enough...
2358 265 2093
32 47
529 D201707788_2019-05-14 14_05_34-lv1-14456-57-11246-23473p0003.png
430 2019-07805-1-1-1_2019-05-29 05_23_08-lv1-28048-27815-6537-5899p0016.png
The validation set are not enough...
2358 959 1399
The validation set are not enough...
979 19 960
The validation set are not enough...
979 95 884
The validation set are not enough...
979 190 789

Cholecystectomy Tasks

Hello, thank you for the great work, really impressive. I am wondering if you tried the foundation model's performance on Cholec80 and CholecT45 ?
For example, including other cholecystectomy dataset during pretraining and test on Cholec80 and CholecT45.

Thanks, btw, is it possible to add you through the Wechat? My email address is [email protected]

Finding dataset

Where can I find the datasets? I can't find any instructions in this GitHub repository for obtaining them.
