This project forked from kexinhuang12345/clinicalbert

ClinicalBERT: Modeling Clinical Notes and Predicting Hospital Readmission (CHIL 2020 Workshop)

Home Page: https://arxiv.org/abs/1904.05342

ClinicalBERT

This repo hosts pretraining and finetuning weights and relevant scripts for ClinicalBERT, a contextual representation for clinical notes.

New: Clinical XLNet and Pretraining Script

  1. The Clinical XLNet pretrained model is available here.

  2. Detailed step-by-step instructions for pretraining ClinicalBERT and Clinical XLNet from scratch are available here.

  3. The predictive performance results in this version have been updated using the correct test-split method for pretraining, as described in the pretraining instructions above. For a comparison on more clinical outcomes and against more baselines, using the correct split for ClinicalBERT and Clinical XLNet, please see the Clinical XLNet paper.

Installation and Requirements

pip install pytorch-pretrained-bert

Datasets

We use MIMIC-III. Because access to MIMIC-III requires completing the CITI training program, we refer users to the link. Since clinical notes share common characteristics, users can apply the ClinicalBERT weights to any clinical notes, although further fine-tuning from our checkpoint is recommended.

File system expected:

-data
  -discharge
    -train.csv
    -val.csv
    -test.csv
  -3days
    -train.csv
    -val.csv
    -test.csv
  -2days
    -test.csv

Each data file is expected to have the columns "TEXT", "ID" and "Label" (note chunk, admission ID and readmission label, respectively).
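The layout and column requirements above can be checked before running any script. The following is a minimal sketch using only the Python standard library; the function name `check_data_dir` and the `./data` path are illustrative assumptions, not part of this repo.

```python
import csv
from pathlib import Path

# Folder and file names copied from the layout above.
EXPECTED_FILES = {
    "discharge": ["train.csv", "val.csv", "test.csv"],
    "3days": ["train.csv", "val.csv", "test.csv"],
    "2days": ["test.csv"],
}
# Column names as stated in the README text.
REQUIRED_COLUMNS = {"TEXT", "ID", "Label"}


def check_data_dir(data_dir):
    """Return a list of problems found; an empty list means the layout looks right."""
    problems = []
    root = Path(data_dir)
    for folder, names in EXPECTED_FILES.items():
        for name in names:
            path = root / folder / name
            if not path.is_file():
                problems.append(f"missing file: {path}")
                continue
            # Only the header row is read, so large note files are cheap to check.
            with path.open(newline="", encoding="utf-8") as handle:
                header = next(csv.reader(handle), [])
            missing = REQUIRED_COLUMNS - set(header)
            if missing:
                problems.append(f"{path}: missing columns {sorted(missing)}")
    return problems


if __name__ == "__main__":
    for problem in check_data_dir("./data"):
        print(problem)
```

The check is deliberately shallow: it confirms that the files exist and that the header names match, but it does not validate the label values or the note text.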

ClinicalBERT Weights

Use this Google link, or this OneDrive link for users in mainland China, to download the pretrained ClinicalBERT weights along with the model weights fine-tuned on the readmission task.

The following scripts assume a model folder with the following structure:

-model
	-discharge_readmission
		-bert_config.json
		-pytorch_model.bin
	-early_readmission
		-bert_config.json
		-pytorch_model.bin
	-pretraining
		-bert_config.json
		-pytorch_model.bin
		-vocab.txt
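After unpacking the download, it can help to confirm that the model folder matches the structure above. This is a small sketch under the assumption that the folder lives at `./model`; the helper name `missing_model_files` is hypothetical and not part of this repo.

```python
from pathlib import Path

# File names copied from the model tree above.
EXPECTED_MODEL_FILES = {
    "discharge_readmission": ["bert_config.json", "pytorch_model.bin"],
    "early_readmission": ["bert_config.json", "pytorch_model.bin"],
    "pretraining": ["bert_config.json", "pytorch_model.bin", "vocab.txt"],
}


def missing_model_files(model_dir):
    """Return the expected files that are absent under model_dir, as relative paths."""
    root = Path(model_dir)
    return [
        f"{folder}/{name}"
        for folder, names in EXPECTED_MODEL_FILES.items()
        for name in names
        if not (root / folder / name).is_file()
    ]


if __name__ == "__main__":
    missing = missing_model_files("./model")
    print("model folder is complete" if not missing else "missing: " + ", ".join(missing))
```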

Hospital Readmission using ClinicalBERT

The scripts below run prediction of 30-day hospital readmission.

Early Notes Prediction

# To evaluate on the 2days data, set --data_dir to ./data/2days/ instead.
python ./run_readmission.py \
  --task_name readmission \
  --readmission_mode early \
  --do_eval \
  --data_dir ./data/3days/ \
  --bert_model ./model/early_readmission \
  --max_seq_length 512 \
  --output_dir ./result_early

Discharge Summary Prediction

python ./run_readmission.py \
  --task_name readmission \
  --readmission_mode discharge \
  --do_eval \
  --data_dir ./data/discharge/ \
  --bert_model ./model/discharge_readmission \
  --max_seq_length 512 \
  --output_dir ./result_discharge

Training your own readmission prediction model from the pretrained ClinicalBERT

python ./run_readmission.py \
  --task_name readmission \
  --do_train \
  --do_eval \
  --data_dir ./data/(DATA_FILE) \
  --bert_model ./model/pretraining \
  --max_seq_length 512 \
  --train_batch_size (BATCH_SIZE) \
  --learning_rate 2e-5 \
  --num_train_epochs (EPOCHs) \
  --output_dir ./result_new

Training uses train.csv from the (DATA_FILE) folder.

The results are written to the output_dir folder, which contains:

  1. 'logits_clinicalbert.csv': logits from ClinicalBERT to compare with other models
  2. 'auprc_clinicalbert.png': Precision-Recall Curve
  3. 'auroc_clinicalbert.png': ROC Curve
  4. 'eval_results.txt': RP80, accuracy, loss
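RP80 in the list above is not defined in this README; in the ClinicalBERT paper it denotes recall at a precision of 80%. The sketch below shows one way to compute such a number from labels and predicted probabilities. It is an illustration only, not the code used by run_readmission.py, and the function name `recall_at_precision` is an assumption.

```python
def recall_at_precision(labels, scores, min_precision=0.8):
    """Highest recall over all score thresholds whose precision is at least min_precision.

    labels: sequence of 0/1 ground-truth values.
    scores: sequence of predicted probabilities, same length as labels.
    Returns 0.0 when no threshold reaches the requested precision.
    """
    # Rank predictions from most to least confident.
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    total_pos = sum(label for _, label in pairs)
    if total_pos == 0:
        return 0.0
    best = 0.0
    true_pos = 0
    for i, (score, label) in enumerate(pairs, start=1):
        true_pos += label
        # Tied scores share one threshold, so evaluate only after the last of them.
        if i < len(pairs) and pairs[i][0] == score:
            continue
        precision = true_pos / i
        recall = true_pos / total_pos
        if precision >= min_precision:
            best = max(best, recall)
    return best
```

For example, with labels `[1, 1, 0, 1, 0]` and scores `[0.9, 0.8, 0.7, 0.6, 0.1]`, only the two highest thresholds keep precision at or above 0.8, so the function returns 2/3.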

Preprocessing

We provide a script for preprocessing clinical notes and merging them with admission information from MIMIC-III.

Notebooks

  1. Attention: this notebook is a tutorial to visualize self-attention.

Gensim Word2Vec and FastText models

Please use this link to download Word2Vec and FastText models for Clinical Notes.

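To use them:

import gensim

# Load the downloaded model; its word vectors are exposed under .wv.
word2vec = gensim.models.KeyedVectors.load('word2vec.model')
# Embedding matrix with one row per vocabulary word (gensim < 4.0 API;
# in gensim 4.x, .vocab was removed and word2vec.wv.vectors holds the matrix).
weights = word2vec.wv[word2vec.wv.vocab]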

Contact

Please contact [email protected] for help or submit an issue.

Citation

Please cite arxiv:

@article{clinicalbert,
author = {Kexin Huang and Jaan Altosaar and Rajesh Ranganath},
title = {ClinicalBERT: Modeling Clinical Notes and Predicting Hospital Readmission},
year = {2019},
journal = {arXiv:1904.05342},
}
