
lindvalllab / mlsym


Deep learning for monitoring cancer symptoms from unstructured clinical notes in the EHR

License: GNU General Public License v2.0

Python 100.00%
nlp nlp-parsing clinical-notes clinical-data deep-learning ner ehr


Overview

This repository contains research by the Lindvall Lab at Dana-Farber Cancer Institute on extracting the current symptoms that patients report, as recorded in their electronic health records (EHRs). Symptoms are vital outcomes for cancer clinical trials, observational research, and population-level surveillance. We sought to develop, test, and externally validate a deep learning model that extracts symptoms from unstructured clinical notes in the EHR.

Project Pipeline

How to process Label Studio annotation output (labels/text) for model input

python processing/label_output.py \
  --input {location of the Label Studio output JSON files} \
  --label_config {configuration used to set up Label Studio; XML file} \
  --label all \
  --hpi \
  --stratified_split 0.3 \
  --test
  • Pass either --label all or --keep {label(s) to keep}, e.g. a goals-of-care label.
  • Without the --test argument, the data are split (stratified) into train/valid at 0.7/0.3.
  • With the --test argument, the data are split (stratified) into train/valid/test at 0.7/0.15/0.15, i.e. the held-out 0.3 is divided equally between valid and test.
  • Loading the spaCy en_core_sci_lg model takes about 17 s, so expect a short pause at startup.
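The stratified split described above can be sketched as follows. This is a minimal illustration, not the code in processing/label_output.py: it assumes each note carries one document-level stratum label (for example, whether the note contains any annotated symptom), and the function name and seed are hypothetical. Each stratum is shuffled and split separately so label proportions are preserved; with with_test=True the held-out part is halved into valid and test, giving 0.7/0.15/0.15 for held_out=0.3.

```python
import random
from collections import defaultdict


def stratified_split(items, labels, held_out=0.3, with_test=False, seed=42):
    """Hypothetical helper: split items into train/valid(/test) per label stratum.

    held_out: fraction of each stratum removed from the training set.
    with_test: if True, the held-out part is halved into valid and test.
    """
    # Group items by their (assumed) document-level stratum label.
    by_label = defaultdict(list)
    for item, label in zip(items, labels):
        by_label[label].append(item)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, valid, test = [], [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        n_out = round(len(group) * held_out)
        train.extend(group[n_out:])
        out = group[:n_out]
        if with_test:
            half = len(out) // 2
            valid.extend(out[:half])
            test.extend(out[half:])
        else:
            valid.extend(out)
    return (train, valid, test) if with_test else (train, valid)
```

Splitting inside each stratum keeps rare labels represented in the validation and test sets, which a plain random split does not guarantee on small annotated corpora.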

Training

Run the models

  • Transformer model choices (--model_class): 'bert', 'xlnet', 'roberta', 'xlm-roberta', 'camembert', 'distilbert', 'electra'
conda activate transformers
python ner.py \
  --dset {location of the data that has been converted to CoNLL format} \
  --model_class electra \
  --pretrained_model google/electra-base-discriminator \
  --lr 6e-5 \
  --decay 0.02 \
  --warmups 500
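The --lr and --warmups flags in the training command suggest a learning-rate warmup. The sketch below assumes, without confirmation from ner.py, the common "linear schedule with warmup" used in transformer fine-tuning: the rate ramps from 0 to the peak over the warmup steps, then decays linearly to 0 at the last step. The function name and total_steps are hypothetical, and --decay (likely a weight-decay coefficient) is not modeled here.

```python
def linear_warmup_lr(step, peak_lr, warmup_steps, total_steps):
    """Hypothetical schedule: learning rate at `step` (assumed, not taken from ner.py).

    Linear warmup from 0 to peak_lr over warmup_steps, then linear decay to 0
    at total_steps; steps past total_steps stay at 0.
    """
    if step < warmup_steps:
        # Warmup phase: ramp up proportionally to the step count.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: fraction of post-warmup steps still remaining.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    # Values from the example command: --lr 6e-5 --warmups 500 (total_steps assumed).
    for step in (0, 250, 500, 5250, 10000):
        print(step, linear_warmup_lr(step, 6e-5, 500, 10000))
```

Warmup keeps early updates small while the randomly initialized classification head is still noisy, which tends to stabilize fine-tuning of large pretrained encoders.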

Optimize the hyperparameters

  • Bayesian optimization with Gaussian processes
    • Open the interactive plots (contour_plot, slice_plot, cv_plot, etc.) in a browser.
python optimization.py \
  --model bert \
  --lr 1e-6 1e-4 \
  --decay 0.01 0.1 \
  --warmups 0 3000 \
  --eps 1e-9 1e-7
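In the optimization command, each hyperparameter flag takes a LOW HIGH pair of bounds. Gaussian-process optimizers usually search a normalized box, and bounds such as 1e-6 to 1e-4 span orders of magnitude, so a log scale is a common choice for them. The sketch below is hypothetical and is not optimization.py's code: it shows one way to parse the pairs with argparse (nargs=2) and map values to and from [0, 1]. Treating lr and eps as log-scaled and warmups as an integer are assumptions.

```python
import argparse
import math

# Assumption: parameters whose bounds span orders of magnitude use a log10 scale.
LOG_SCALE = {"lr", "eps"}
# Assumption: warmups is a step count and must be an integer.
INTEGER = {"warmups"}


def build_parser():
    """Hypothetical parser for range flags shaped like the optimization.py example."""
    parser = argparse.ArgumentParser(description="Hypothetical search-space parser")
    parser.add_argument("--model", default="bert")
    # Each range flag takes exactly two values: the lower and upper bound.
    parser.add_argument("--lr", nargs=2, type=float, metavar=("LOW", "HIGH"))
    parser.add_argument("--decay", nargs=2, type=float, metavar=("LOW", "HIGH"))
    parser.add_argument("--warmups", nargs=2, type=int, metavar=("LOW", "HIGH"))
    parser.add_argument("--eps", nargs=2, type=float, metavar=("LOW", "HIGH"))
    return parser


def to_unit(name, value, low, high):
    """Map a hyperparameter value into [0, 1], in log10 space for LOG_SCALE names."""
    if name in LOG_SCALE:
        value, low, high = math.log10(value), math.log10(low), math.log10(high)
    return (value - low) / (high - low)


def from_unit(name, u, low, high):
    """Inverse of to_unit: turn a point in [0, 1] back into a concrete value."""
    if name in LOG_SCALE:
        log_low, log_high = math.log10(low), math.log10(high)
        return 10 ** (log_low + u * (log_high - log_low))
    value = low + u * (high - low)
    return round(value) if name in INTEGER else value
```

On a log scale the midpoint of 1e-6..1e-4 is 1e-5 rather than about 5e-5, so each decade of learning rate gets equal attention from the optimizer.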

Load model outputs back into the server hosting Label Studio (for active learning)

python processing/model_output.py \
  --model_output processing/output/symptoms_hpi_all/prediction_test.txt \
  --label_output_dir symptoms/storage/label-studio/project/completions/ \
  --label_config symptoms/storage/label-studio/project/config.xml
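Loading predictions back into Label Studio requires turning token-level BIO tags into character spans. The sketch below is a hypothetical illustration of that conversion, not the logic of processing/model_output.py: it assumes token character offsets are available, and it emits result entries shaped like Label Studio's span-labeling results. The label name SYMPTOM and the from_name/to_name values ("label", "text") are placeholders that must match the names in config.xml.

```python
def bio_to_spans(tokens, tags):
    """Merge BIO tags into (start, end, label) character spans.

    tokens: list of (start, end) character offsets into the note text.
    tags: parallel list such as "B-SYMPTOM", "I-SYMPTOM", "O" (hypothetical labels).
    """
    spans = []
    current = None  # [start, end, label] of the span being built
    for (start, end), tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            # An I- tag with no matching open span is treated as a new span.
            tag.startswith("I-") and (current is None or current[2] != tag[2:])
        )
        if starts_new:
            if current:
                spans.append(tuple(current))
            current = [start, end, tag[2:]]
        elif tag.startswith("I-"):
            current[1] = end  # extend the open span to cover this token
        else:
            if current:
                spans.append(tuple(current))
            current = None
    if current:
        spans.append(tuple(current))
    return spans


def to_label_studio_results(text, spans, from_name="label", to_name="text"):
    """Build Label Studio-style span results (from_name/to_name are placeholders)."""
    return [
        {
            "from_name": from_name,
            "to_name": to_name,
            "type": "labels",
            "value": {"start": s, "end": e, "text": text[s:e], "labels": [label]},
        }
        for s, e, label in spans
    ]
```

For the note "Patient reports severe nausea and fatigue" tagged O O B-SYMPTOM I-SYMPTOM O B-SYMPTOM, this yields the spans "severe nausea" and "fatigue", which annotators can then accept or correct.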

Inference

Inference runs on raw CSV files with a column containing the clinical notes; there is no need to convert them to CoNLL format.

python inference/run_and_predict.py -ipf {location of the input file} -opf {location of dummy output file} -cn {name of the column containing the clinical note}
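The -cn argument names the CSV column that holds the notes. The sketch below shows, under assumptions, how such a column might be read with Python's csv module before tokenization; it is not the code in inference/run_and_predict.py, and the function name and the column name note_text are hypothetical. Notes often contain line breaks, so files should be opened with newline="" for quoted multi-line fields to parse correctly.

```python
import csv
import io


def read_notes(csv_file, note_column):
    """Hypothetical helper: yield non-empty notes from one column of a CSV file.

    csv_file: an open text file (use open(path, newline="") for real files).
    note_column: the column name, i.e. what -cn would refer to.
    """
    reader = csv.DictReader(csv_file)
    if note_column not in (reader.fieldnames or []):
        raise KeyError(f"column {note_column!r} not found in {reader.fieldnames}")
    for row in reader:
        note = (row[note_column] or "").strip()
        if note:  # skip blank notes rather than sending them to the model
            yield note


if __name__ == "__main__":
    sample = io.StringIO('id,note_text\n1,"Pt reports nausea.\nNo fever."\n2,\n')
    for note in read_notes(sample, "note_text"):
        # Whitespace split is only a crude stand-in for spaCy tokenization.
        print(note.split())
```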

Copyright

All code is modified from

License

The GNU GPL v2 version of mlsym is made available via open-source licensing. Users are free to use, modify, and distribute it under the terms of the GNU General Public License version 2.

Commercial license options are also available.

Contact

Questions? Comments? Suggestions? Get in touch!

[email protected]
