MIMIC-NLE Dataset

This repository contains the scripts to extract the MIMIC-NLE dataset from the MIMIC-CXR radiology reports. MIMIC-CXR must be downloaded separately; it is distributed through PhysioNet. More details on MIMIC-NLE are provided in our MICCAI 2022 paper:

Explaining Chest X-ray Pathologies in Natural Language (arxiv)

Extracting the dataset

To run our extraction script, first create an environment by running conda env create -f environment.yml. It is crucial that you use the same spaCy version as specified there, as otherwise the sentence splitting may differ.

After you have downloaded MIMIC-CXR, locate the folder containing the radiology reports, which should have the following structure:

mimic_reports
└───p10
│   └───p10000032
│   │     s50414267.txt
│   │     s53189527.txt
│   │     ...
│   └───p10000764
│   ...
└───p11
...
└───p19

You can then generate MIMIC-NLE by simply running:

python extract_mimic_nle.py --reports_path path/to/your/reports

The train, dev, and test sets will be stored in the mimic-nle folder.
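Before running the extraction, it can help to check that the reports folder matches the layout shown above. The following is a minimal sketch, not part of the repository: the function names list_reports and summarize are hypothetical, and the glob pattern only assumes the pXX/pXXXXXXXX/sXXXXXXXX.txt nesting from the tree above.

```python
from collections import Counter
from pathlib import Path


def list_reports(reports_path):
    """Yield (patient_id, report_id, path) for every report file found.

    Assumes the layout <root>/pXX/pXXXXXXXX/sXXXXXXXX.txt shown above.
    """
    root = Path(reports_path)
    for report in sorted(root.glob("p*/p*/s*.txt")):
        yield report.parent.name, report.stem, report


def summarize(reports_path):
    """Count reports per top-level group folder (e.g. p10, p11, ...)."""
    counts = Counter()
    for _, _, path in list_reports(reports_path):
        counts[path.parent.parent.name] += 1
    return dict(counts)


if __name__ == "__main__":
    import sys

    # Usage: python check_reports.py path/to/your/reports
    if len(sys.argv) > 1:
        for group, n_reports in sorted(summarize(sys.argv[1]).items()):
            print(f"{group}: {n_reports} reports")
```

If a group folder is missing from the summary or has a count of zero, the path passed to --reports_path probably points one level too high or too low.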

Dataset details

For each Natural Language Explanation (NLE), we get the following information:

{
"sentence_ID": "s51038639#2",                                   // unique NLE identifier
"nle": "Subtle lower lobe opacities may reflect atelectasis.",  // the NLE
"patient_ID": "p11662490",                                      // unique patient ID
"report_ID": "s51038639",                                       // unique radiology report ID
"diagnosis_label": [1, 0, 0, 0, 0, 0, 0, 0, 1, 0],              // one-hot encoding of the diagnosis label, i.e. the label that is being explained
"evidence_label": [0, 0, 0, 0, 0, 1, 0, 0, 0, 0],               // one-hot encoding of the evidence label, i.e. the label that is evidence for another label
"img_labels": [[0, 1, 0], [1, 0, 0], ..., [1, 0, 0]]            // image-wide labels, given as [negative, uncertain, positive] for each class
}

The diagnosis and evidence labels apply only to the NLE at hand, while the img_labels apply to the whole image and may contain labels not referred to in the NLE. Dictionaries to decode the one-hot encodings are provided in encodings.py.
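As an illustration of how such a record could be decoded, here is a minimal sketch. The class names below are placeholders (class_0, class_1, ...), not the real names; the actual mappings are the dictionaries in encodings.py, whose variable names are not reproduced here. The helper functions are hypothetical, the img_labels list is shortened to three classes for illustration, and splitting sentence_ID at "#" is an assumption based on the example record, where the part before "#" matches report_ID.

```python
# Placeholder mapping; substitute the dictionaries from encodings.py.
INDEX_TO_LABEL = {i: f"class_{i}" for i in range(10)}

# Order of each img_labels triple, as described in the record above.
IMG_STATES = ("negative", "uncertain", "positive")


def decode_vector(vector, index_to_label=INDEX_TO_LABEL):
    """Return the names of all positions set to 1 in a label vector."""
    return [index_to_label[i] for i, flag in enumerate(vector) if flag == 1]


def decode_img_labels(img_labels, index_to_label=INDEX_TO_LABEL):
    """Map each class to 'negative', 'uncertain', or 'positive'."""
    return {
        index_to_label[i]: IMG_STATES[triple.index(1)]
        for i, triple in enumerate(img_labels)
    }


def split_sentence_id(sentence_id):
    """Split 's51038639#2' into the report ID and the trailing number."""
    report_id, _, position = sentence_id.partition("#")
    return report_id, int(position)


record = {
    "sentence_ID": "s51038639#2",
    "nle": "Subtle lower lobe opacities may reflect atelectasis.",
    "patient_ID": "p11662490",
    "report_ID": "s51038639",
    "diagnosis_label": [1, 0, 0, 0, 0, 0, 0, 0, 1, 0],
    "evidence_label": [0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
    # Shortened to three classes for illustration only.
    "img_labels": [[0, 1, 0], [1, 0, 0], [1, 0, 0]],
}

print(decode_vector(record["diagnosis_label"]))
print(decode_vector(record["evidence_label"]))
print(decode_img_labels(record["img_labels"]))
print(split_sentence_id(record["sentence_ID"]))
```

decode_vector returns every active position rather than a single index, so it also handles vectors with more than one entry set, as in the diagnosis_label of the example record.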

Citation

If you make use of MIMIC-NLE, please cite our paper:

@inproceedings{MICCAI-KECPPL-2022,
  title = "Explaining Chest X-ray Pathologies in Natural Language",
  author = "Maxime Kayser and Cornelius Emde and Oana Camburu and Guy Parsons and Bartlomiej Papiez and Thomas Lukasiewicz",
  year = "2022",
  booktitle = "Proceedings of the 25th International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2022, Singapore, 18--22 September 2022",
  month = "September",
  publisher = "Springer",
  series = "Lecture Notes in Computer Science (LNCS)",
}

mimic-nle's Issues

Bug in conda environment - Mac M1

conda env create -f environment.yml
Collecting package metadata (repodata.json): done
Solving environment: failed

ResolvePackageNotFound:

libffi==3.3=hb1e8313_2
certifi==2022.6.15=py39hecd8cb5_0
python==3.9.12=hdfd78df_1
xz==5.2.5=hca72f7f_1
pip==22.1.2=py39hecd8cb5_0
ca-certificates==2022.07.19=hecd8cb5_0
ncurses==6.3=hca72f7f_3
openssl==1.1.1q=hca72f7f_0
readline==8.1.2=hca72f7f_1
setuptools==61.2.0=py39hecd8cb5_0
zlib==1.2.12=h4dc903c_2
tk==8.6.12=h5d9f67b_0
sqlite==3.38.5=h707629a_0
libcxx==12.0.0=h2f01273_0

Training DPT model

Hello,

I am currently working on a follow-up study of how different visual models affect NLE generation on chest X-ray images.
With GPT-2 fixed as the NLE generation model, I am reproducing the hyperparameters and training setup from your paper (arxiv), but I am stuck on how to feed the 7x7x1024 feature map (Xrep), together with the text diagnosis (pj) and the prediction vector Y, into GPT-2 as input.

Could you provide a detailed description of the actual implementation, or perhaps the code itself?

Thank you in advance.

No Code for Generating Evidence and Diagnosis Labels on New Reports

Hello there,

With the code currently available in the repo, there is no way to generate a list of diagnosis labels and evidence labels for a new report. While the 'query' folder contains these labels, they all come directly from reports in the MIMIC-CXR dataset. If you could provide the tool or code you used to extract these two sets of labels based on the evidence graph from your paper, that would be incredibly helpful. The new metric you introduced, CLEV, cannot be used to evaluate generated text reports without this. Thank you in advance.

Repository: maximek3/mimic-nle