MIMIC-NLE Dataset

This repository contains the scripts to extract the MIMIC-NLE dataset from the MIMIC-CXR radiology reports. In order to download MIMIC-CXR, head here. More details on MIMIC-NLE are provided in our MICCAI 2022 paper:

Explaining Chest X-ray Pathologies in Natural Language (arxiv)

Extracting the dataset

To run our extraction script, first create an environment by running:

conda env create -f environment.yml

It is crucial that you use the same spaCy version as specified in environment.yml; otherwise, there may be a discrepancy in the sentence splitting.

After downloading MIMIC-CXR, locate the directory containing the radiology reports. It should have the following structure:

mimic_reports
└───p10
│   └───p10000032
│   │     s50414267.txt
│   │     s53189527.txt
│   │     ...
│   └───p10000764
│   ...
└───p11
...
└───p19
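Before running the extraction, it can help to confirm that your reports path really has this layout. The following is a minimal sketch, not part of the repository: the function name `list_reports` is hypothetical, and it assumes only the `pNN/pXXXXXXXX/sXXXXXXXX.txt` layout shown above.

```python
from pathlib import Path
from typing import Iterator, Tuple


def list_reports(reports_path: str) -> Iterator[Tuple[str, str, Path]]:
    """Yield (patient_ID, report_ID, path) for every report under reports_path.

    Hypothetical helper, not part of the repository. It assumes the layout
    reports_path/pNN/pXXXXXXXX/sXXXXXXXX.txt shown above.
    """
    root = Path(reports_path)
    if not root.is_dir():
        raise NotADirectoryError(f"{reports_path} is not a directory")
    # Top level: group folders p10 ... p19 (sorted for a deterministic order)
    for group_dir in sorted(p for p in root.glob("p*") if p.is_dir()):
        # Second level: one folder per patient, e.g. p10000032
        for patient_dir in sorted(p for p in group_dir.glob("p*") if p.is_dir()):
            # Leaf level: one text file per report, e.g. s50414267.txt
            for report_file in sorted(patient_dir.glob("s*.txt")):
                # The file stem (e.g. "s50414267") is the report ID
                yield patient_dir.name, report_file.stem, report_file
```

For example, `sum(1 for _ in list_reports("path/to/your/reports"))` counts the report files it finds; a count of zero usually means the path points at the wrong level of the tree.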

You can then generate MIMIC-NLE by simply running:

python extract_mimic_nle.py --reports_path path/to/your/reports

The train, dev, and test sets will be stored in the mimic-nle folder.

Dataset details

Each natural language explanation (NLE) comes with the following information:

{
"sentence_ID": "s51038639#2",                                   // unique NLE identifier
"nle": "Subtle lower lobe opacities may reflect atelectasis.",  // the NLE
"patient_ID": "p11662490",                                      // unique patient ID
"report_ID": "s51038639",                                       // unique radiology report ID
"diagnosis_label": [1, 0, 0, 0, 0, 0, 0, 0, 1, 0],              // binary (multi-hot) encoding of the diagnosis label(s), i.e. the label(s) being explained
"evidence_label": [0, 0, 0, 0, 0, 1, 0, 0, 0, 0],               // binary (multi-hot) encoding of the evidence label(s), i.e. the label(s) serving as evidence for another label
"img_labels": [[0, 1, 0], [1, 0, 0], ..., [1, 0, 0]]            // image-wide labels, given as [negative, uncertain, positive] for each class
}
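The record above is annotated with `//` comments, so it is an illustration rather than literal JSON. In the example, `sentence_ID` appears to be the `report_ID` followed by `#` and a sentence index. The sketch below makes that assumption explicit and adds a few sanity checks; the function names `split_sentence_id` and `check_record` are hypothetical, and the checks reflect only what the example record shows, not a documented schema.

```python
from typing import Any, Dict, Tuple


def split_sentence_id(sentence_id: str) -> Tuple[str, int]:
    """Split an NLE identifier such as "s51038639#2" into ("s51038639", 2).

    Assumption (inferred from the example record): sentence_ID is the
    report_ID, a "#", and an integer sentence index.
    """
    report_id, sep, index = sentence_id.partition("#")
    if not sep:
        raise ValueError(f"no '#' in sentence_ID {sentence_id!r}")
    return report_id, int(index)


def check_record(record: Dict[str, Any]) -> None:
    """Hypothetical sanity check; raises ValueError on an inconsistent record."""
    report_id, _ = split_sentence_id(record["sentence_ID"])
    if report_id != record["report_ID"]:
        raise ValueError(
            f"{record['sentence_ID']} does not belong to report {record['report_ID']}"
        )
    # In the example, both NLE-level vectors have one entry per class
    if len(record["diagnosis_label"]) != len(record["evidence_label"]):
        raise ValueError("diagnosis_label and evidence_label differ in length")
    for triple in record["img_labels"]:
        # One [negative, uncertain, positive] entry per class, each value 0 or 1
        if len(triple) != 3 or any(v not in (0, 1) for v in triple):
            raise ValueError(f"malformed img_labels entry: {triple}")
```

For instance, `split_sentence_id("s51038639#2")` returns `("s51038639", 2)`, which matches the `report_ID` of the example record.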

The diagnosis and evidence labels apply only to the NLE at hand, whereas img_labels apply to the whole image and may include labels not mentioned in the NLE. Dictionaries for decoding the label vectors are provided in encodings.py.
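The sketch below shows how such vectors can be turned back into class names. It deliberately does not import encodings.py, because the names of its dictionaries and its class order are not given here; `label_names` is a hypothetical parameter that you would fill from encodings.py, and the function names are likewise illustrative.

```python
from typing import Dict, List, Sequence

# Order of the per-class triples in img_labels, as documented above
IMG_STATES = ("negative", "uncertain", "positive")


def decode_binary(vector: Sequence[int], label_names: Sequence[str]) -> List[str]:
    """Return the names of all classes set to 1 in a diagnosis/evidence vector."""
    if len(vector) != len(label_names):
        raise ValueError("vector and label_names must have the same length")
    return [name for name, flag in zip(label_names, vector) if flag == 1]


def decode_img_labels(
    img_labels: Sequence[Sequence[int]], label_names: Sequence[str]
) -> Dict[str, str]:
    """Map each class name to its image-level state.

    Assumes each triple has exactly one entry set; anything else is
    reported as "unknown" rather than guessed.
    """
    if len(img_labels) != len(label_names):
        raise ValueError("img_labels and label_names must have the same length")
    decoded = {}
    for name, triple in zip(label_names, img_labels):
        states = [state for state, flag in zip(IMG_STATES, triple) if flag == 1]
        decoded[name] = states[0] if len(states) == 1 else "unknown"
    return decoded


if __name__ == "__main__":
    # Placeholder names only; replace them with the real ones from encodings.py
    names = [f"class_{i}" for i in range(10)]
    print(decode_binary([1, 0, 0, 0, 0, 0, 0, 0, 1, 0], names))  # ['class_0', 'class_8']
```

Decoding `diagnosis_label` and `img_labels` side by side makes the distinction above concrete: the first lists only what this NLE explains, while the second covers every class for the whole image.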

Citation

If you make use of MIMIC-NLE, please cite our paper:

@inproceedings{MICCAI-KECPPL-2022,
  title = "Explaining Chest X-ray Pathologies in Natural Language",
  author = "Maxime Kayser and Cornelius Emde and Oana Camburu and Guy Parsons and Bartlomiej Papiez and Thomas Lukasiewicz",
  year = "2022",
  booktitle = "Proceedings of the 25th International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2022, Singapore, 18--22 September 2022",
  month = "September",
  publisher = "Springer",
  series = "Lecture Notes in Computer Science (LNCS)",
}

mimic-nle's Issues

No Code for Generating Evidence and Diagnosis Labels on New Reports

Hello there,

With the code currently available in the repo, there is no way to generate a list of diagnosis labels and evidence labels for a new report. While the 'query' folder contains these labels, they all come directly from reports in the MIMIC-CXR dataset. If you could provide the tool or code you used to extract these two sets of labels based on the evidence graph from your paper, that would be incredibly helpful. The new metric you introduced, CLEV, cannot be used to evaluate generated text reports without this. Thank you in advance.

Bug in conda environment - Mac M1

conda env create -f environment.yml
Collecting package metadata (repodata.json): done
Solving environment: failed

ResolvePackageNotFound:

  - libffi==3.3=hb1e8313_2
  - certifi==2022.6.15=py39hecd8cb5_0
  - python==3.9.12=hdfd78df_1
  - xz==5.2.5=hca72f7f_1
  - pip==22.1.2=py39hecd8cb5_0
  - ca-certificates==2022.07.19=hecd8cb5_0
  - ncurses==6.3=hca72f7f_3
  - openssl==1.1.1q=hca72f7f_0
  - readline==8.1.2=hca72f7f_1
  - setuptools==61.2.0=py39hecd8cb5_0
  - zlib==1.2.12=h4dc903c_2
  - tk==8.6.12=h5d9f67b_0
  - sqlite==3.38.5=h707629a_0
  - libcxx==12.0.0=h2f01273_0

Training DPT model

Hello,

I am currently working on a follow-up study on how different visual models affect NLE generation on chest X-ray images.
Keeping GPT-2 fixed as the NLE generation model, I am using the same hyperparameters and training setup as in your paper (arxiv), but I am stuck on how to feed the 7x7x1024 feature map (Xrep), together with the text diagnosis (pj) and the prediction vector Y, into GPT-2 as input.

Could you provide a detailed description of the actual implementation, or perhaps the code?

Thank you in advance.
