claritynlp / claritynlp

An NLP framework for clinical phenotyping. Docker | Python | Solr | OMOP. http://claritynlp.readthedocs.io/en/latest/

License: Mozilla Public License 2.0

Shell 0.50% JavaScript 0.17% Python 37.92% ANTLR 0.13% Dockerfile 0.06% Jupyter Notebook 50.06% Makefile 0.10% C++ 3.59% C# 2.78% CSS 4.19% HTML 0.46% SCSS 0.01% Less 0.04%

nlp natural-language-processing docker solr mongodb luigi clinical-research clinical-quality-language nlpql phenotype clinical-phenotyping phenotypes

claritynlp's Introduction

NOTE

As of March 2024, this repository is no longer being maintained. See https://github.com/ClarityNLP/claritynlp-nova.

ClarityNLP

What is ClarityNLP?

ClarityNLP is a clinical natural language processing platform focused on making healthcare NLP more accessible and reproducible. Over the past decade, NLP methods have far outstripped our ability to use them effectively.

ClarityNLP combines NLP techniques and libraries with a powerful query language, NLPQL, to identify patients and their clinical observations, extracted from text. ClarityNLP gives you insights into clinical (and other) text without a lot of custom configuration, and NLPQL lets you write your own definitions to find the patients and features that are relevant to your project.

Full ClarityNLP Documentation

You can read the full ClarityNLP documentation here: Read the Docs.


claritynlp's Issues

OHDSI cohort ID issues

If I change the [ohdsi] configuration in project.cfg to use webapi=http://api.ohdsi.org/WebAPI and then try to run the hasSepsis.nlpql example, an exception is thrown from line 80 of nlp/ohdsi/webapi.py. The reason is that cohort 190, which the NLPQL file asks for, is not found on that server. We need to detect this condition somehow, perhaps by asking for the cohort info prior to running the tasks. A related problem is if the same cohort info exists on different OHDSI servers under different IDs.
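
One way to detect the missing cohort before any tasks run is a small pre-flight check. The sketch below is only an illustration: `find_missing_cohorts` and the `cohort_exists` callable are hypothetical names, and the lambda stands in for whatever lookup nlp/ohdsi/webapi.py would actually perform against the configured server.

```python
from typing import Callable, Iterable, List


def find_missing_cohorts(
    cohort_ids: Iterable[int],
    cohort_exists: Callable[[int], bool],
) -> List[int]:
    """Return the cohort IDs that the configured server does not know about."""
    return [cid for cid in cohort_ids if not cohort_exists(cid)]


# Stand-in for a real WebAPI lookup: pretend the server only has these cohorts.
known_on_server = {1, 2, 3}
missing = find_missing_cohorts([2, 190], lambda cid: cid in known_on_server)
if missing:
    # Fail early with a readable message instead of an exception mid-pipeline.
    print(f"Cohort(s) not found on the OHDSI server: {missing}")
```

This does not address the related problem of the same cohort living under different IDs on different servers; that would need some identifier other than the numeric ID.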

Data Entity final values don't show up as final (only intermediate), unless operations exist

Sync up documentation for methods with pipeline output

Methods documentation should include the parameters that will be returned by the pipeline. Specifically, Value Extraction does not actually list value as a returned parameter. This is presumably because the pipeline is creating this value itself. But we need the documentation to match the parameters available to the end user.

Other options would be:

Separate documentation that lists what Value means for every method
Changing the actual method to return a value instead of an X

http://clarity-nlp.readthedocs.io/en/latest/developer_guide/algorithms/value_extraction.html

Data Ingestion Running ToDo List

Relational DB ingestion
- User must provide JDBC connection string
- User must provide query to match the column names we specify
Enable bringing in folder of text files (or selecting group of files from browser file selector)

Lexical variant library

Create a library/API that we can hook in to pipeline or wherever in ClarityNLP. Should create plurals and other word forms (such as verb conjugations). This is different from synonyms and will apply to a word or phrase.

NLM has a library that does this, but I suspect we may not want to use in ClarityNLP. However, we can look at their approaches and review other approaches to this problem.

E.g.
heart attack -> heart attacks
sleep -> sleeps, slept, sleeping

etc.
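
To make the request concrete, here is a deliberately naive sketch of what such an API might look like. Everything in it is hypothetical (`pluralize`, `verb_forms`, and the tiny `IRREGULAR_VERB_FORMS` table); a real library would need far better rules and a much larger irregular-form lexicon.

```python
from typing import Dict, List

# Hypothetical lookup for irregular verbs; only here to cover the example above.
IRREGULAR_VERB_FORMS: Dict[str, List[str]] = {
    "sleep": ["sleeps", "slept", "sleeping"],
}


def pluralize(phrase: str) -> str:
    """Naively pluralize the last word of a word or phrase."""
    if phrase.endswith(("s", "x", "z", "ch", "sh")):
        return phrase + "es"
    if phrase.endswith("y") and len(phrase) > 1 and phrase[-2] not in "aeiou":
        return phrase[:-1] + "ies"
    return phrase + "s"


def verb_forms(verb: str) -> List[str]:
    """Return third-person, past, and -ing forms using crude suffix rules."""
    if verb in IRREGULAR_VERB_FORMS:
        return list(IRREGULAR_VERB_FORMS[verb])
    if verb.endswith("e"):
        return [verb + "s", verb + "d", verb[:-1] + "ing"]
    return [verb + "s", verb + "ed", verb + "ing"]


print(pluralize("heart attack"))  # heart attacks
print(verb_forms("sleep"))        # ['sleeps', 'slept', 'sleeping']
```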

Allow multiple logical statements in final

Would like to be able to do

define X
where a.value < 5 OR b.value<2

Add Adjectives to Clarity.Synonyms

Synonyms are great. We need them for adjectives (such as poor, good, bad) and for adverbs (such as slowly, quickly).

Integrate the Tumor Stage extraction code into the pipeline

Prompt users for rows missing in .env but in .env.example

At docker up, remind users things are missing from .env (compare keys with .env.example), and prompt them to enter or use default.
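
The key comparison itself is small. A minimal sketch, assuming both files use plain KEY=value lines with optional # comments; `env_keys` and `missing_env_keys` are hypothetical helper names, and the sample values are made up.

```python
from typing import List, Set


def env_keys(text: str) -> Set[str]:
    """Collect the variable names from the contents of a .env-style file."""
    keys = set()
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        keys.add(line.split("=", 1)[0].strip())
    return keys


def missing_env_keys(env_text: str, example_text: str) -> List[str]:
    """Return keys present in .env.example but absent from .env, sorted."""
    return sorted(env_keys(example_text) - env_keys(env_text))


example = "NLP_API_HOST_PORT=5000\nNLP_MONGO_HOSTNAME=nlp-mongo\n# a comment\n"
current = "NLP_API_HOST_PORT=5000\n"
print(missing_env_keys(current, example))  # ['NLP_MONGO_HOSTNAME']
```

The returned list is what the startup step would walk through when prompting the user to enter a value or accept the default.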

Change enum_list to value_list in value_extractor NLPQL

Support lists in ValueExtractor NLPQL

Ability to get Mongo results as input to NLPQL data entities

Create an NLPQL call to use previous results as input instead of documents.

Integrate the Columbia measurement extraction code into the pipeline

createReportTagList AND / OR

The function should have the ability to treat the tag list as AND or OR, with OR as the default (the current behavior). We need a way to specify, for example, that it should return only documents tagged as both XR AND Chest.

Clarity.createReportTagList(["XR","Chest"]);
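
As a rough illustration of the requested behavior, the sketch below joins tags with a configurable operator. `build_tag_clause` and the `report_tags` field name are assumptions for the example only; the current implementation resolves tags through the mapping API, which this does not model.

```python
from typing import List


def build_tag_clause(tags: List[str], operator: str = "OR",
                     field: str = "report_tags") -> str:
    """Join tags into one query clause using AND or OR (OR by default)."""
    op = operator.upper()
    if op not in ("AND", "OR"):
        raise ValueError(f"operator must be AND or OR, got {operator!r}")
    if not tags:
        return ""
    quoted = [f'"{tag}"' for tag in tags]
    return f"{field}:(" + f" {op} ".join(quoted) + ")"


print(build_tag_clause(["XR", "Chest"]))         # report_tags:("XR" OR "Chest")
print(build_tag_clause(["XR", "Chest"], "AND"))  # report_tags:("XR" AND "Chest")
```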

Numeric comparisons in NLPQL don't respect floats

e.g.

define hasFever:
    where Temperature.value >= 100.4;

All values >= 100 are returned.
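
The symptom is consistent with the threshold being truncated to an integer somewhere before the comparison. That is only a guess at the cause, not a diagnosis; the sketch below (with a hypothetical `filter_at_least` helper) just reproduces the observed behavior and the expected one side by side.

```python
from typing import List


def filter_at_least(values: List[float], threshold_text: str,
                    truncate: bool = False) -> List[float]:
    """Keep values >= threshold; optionally mimic integer truncation."""
    threshold = float(threshold_text)
    if truncate:
        threshold = int(threshold)  # 100.4 silently becomes 100
    return [v for v in values if v >= threshold]


temperatures = [99.8, 100.0, 100.2, 100.4, 101.3]
print(filter_at_least(temperatures, "100.4", truncate=True))  # [100.0, 100.2, 100.4, 101.3]
print(filter_at_least(temperatures, "100.4"))                 # [100.4, 101.3]
```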

Word substitutions in subject_finder sometimes appear in the output.

Configure Result Viewer in Docker compose

I made a first pass at this. Need help to deploy with Rancher/AWS.

It's a simple React UI + Node. It uses the Facebook react starter.

https://github.com/ClarityNLP/results-viewer

Solr field mapping configuration

For users with existing Solr installations, create a field mapping in the config, so users don't have to reingest Solr documents.

Handle quotes in createDocumentSet

See the example below. In the filter_query and query fields, translate single quotes or escaped quotes to double quotes for Solr.

documentset AmoxDischargeNotes:
     Clarity.createDocumentSet({
         "report_types":["Discharge summary"],
         "report_tags": [],
         "filter_query": "subject:('bob brown' OR 'betty blue')",
         "query":"report_text:'heart attack'"});

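The translation could be done with a small normalization step before the query is sent to Solr. This is a sketch under assumptions: `normalize_solr_quotes` is a hypothetical helper, and the regex only rewrites single quotes that wrap a phrase, leaving apostrophes inside words alone.

```python
import re

# A single-quoted phrase that is not attached to a word character on either
# side, so an apostrophe such as the one in "patient's" is left untouched.
_SINGLE_QUOTED = re.compile(r"(?<!\w)'([^']*)'(?!\w)")


def normalize_solr_quotes(query: str) -> str:
    """Turn escaped double quotes and single-quoted phrases into double quotes."""
    query = query.replace('\\"', '"')
    return _SINGLE_QUOTED.sub(r'"\1"', query)


print(normalize_solr_quotes("subject:('bob brown' OR 'betty blue')"))
# subject:("bob brown" OR "betty blue")
print(normalize_solr_quotes("report_text:'heart attack'"))
# report_text:"heart attack"
```
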
Readme instructions out of date

I tried following the readme instructions and get the following error message.

WARNING: The MAPPER_API_HOSTNAME variable is not set. Defaulting to a blank string.
WARNING: The MAPPER_API_HOST_PORT variable is not set. Defaulting to a blank string.
WARNING: The MAPPER_API_CONTAINER_PORT variable is not set. Defaulting to a blank string.
WARNING: The MAPPER_PG_PASSWORD variable is not set. Defaulting to a blank string.
WARNING: The MAPPER_PG_USER variable is not set. Defaulting to a blank string.
WARNING: The MAPPER_PG_DATABASE variable is not set. Defaulting to a blank string.
WARNING: The MAPPER_PG_HOSTNAME variable is not set. Defaulting to a blank string.
WARNING: The MAPPER_PG_HOST_PORT variable is not set. Defaulting to a blank string.
WARNING: The MAPPER_PG_CONTAINER_PORT variable is not set. Defaulting to a blank string.
WARNING: The MAPPER_REDIS_HOSTNAME variable is not set. Defaulting to a blank string.
WARNING: The MAPPER_REDIS_HOST_PORT variable is not set. Defaulting to a blank string.
WARNING: The MAPPER_REDIS_CONTAINER_PORT variable is not set. Defaulting to a blank string.
WARNING: The NLP_PG_PASSWORD variable is not set. Defaulting to a blank string.
WARNING: The NLP_PG_USER variable is not set. Defaulting to a blank string.
WARNING: The NLP_PG_DATABASE variable is not set. Defaulting to a blank string.
WARNING: The NLP_PG_HOSTNAME variable is not set. Defaulting to a blank string.
WARNING: The NLP_PG_HOST_PORT variable is not set. Defaulting to a blank string.
WARNING: The NLP_PG_CONTAINER_PORT variable is not set. Defaulting to a blank string.
WARNING: The NLP_CORE_NAME variable is not set. Defaulting to a blank string.
WARNING: The NLP_SOLR_HOSTNAME variable is not set. Defaulting to a blank string.
WARNING: The NLP_SOLR_HOST_PORT variable is not set. Defaulting to a blank string.
WARNING: The NLP_SOLR_CONTAINER_PORT variable is not set. Defaulting to a blank string.
WARNING: The MAPPER_CLIENT_HOSTNAME variable is not set. Defaulting to a blank string.
WARNING: The MAPPER_CLIENT_HOST_PORT variable is not set. Defaulting to a blank string.
WARNING: The MAPPER_CLIENT_CONTAINER_PORT variable is not set. Defaulting to a blank string.
WARNING: The SCHEDULER_HOSTNAME variable is not set. Defaulting to a blank string.
WARNING: The SCHEDULER_HOST_PORT variable is not set. Defaulting to a blank string.
WARNING: The SCHEDULER_CONTAINER_PORT variable is not set. Defaulting to a blank string.
WARNING: The NLP_MONGO_DATABASE variable is not set. Defaulting to a blank string.
WARNING: The NLP_MONGO_HOSTNAME variable is not set. Defaulting to a blank string.
WARNING: The NLP_MONGO_HOST_PORT variable is not set. Defaulting to a blank string.
WARNING: The NLP_MONGO_CONTAINER_PORT variable is not set. Defaulting to a blank string.
WARNING: The MAPPER_SWAGGER_API_URL variable is not set. Defaulting to a blank string.
WARNING: The MAPPER_SWAGGER_HOSTNAME variable is not set. Defaulting to a blank string.
WARNING: The MAPPER_SWAGGER_HOST_PORT variable is not set. Defaulting to a blank string.
WARNING: The MAPPER_SWAGGER_CONTAINER_PORT variable is not set. Defaulting to a blank string.
WARNING: The NLP_API_HOSTNAME variable is not set. Defaulting to a blank string.
WARNING: The NLP_API_HOST_PORT variable is not set. Defaulting to a blank string.
WARNING: The NLP_API_CONTAINER_PORT variable is not set. Defaulting to a blank string.
ERROR: The Compose file './docker-compose.yml' is invalid because:
services.mapper-api.ports is invalid: Invalid port ":", should be [[remote_ip:]remote_port[-remote_port]:]port[/protocol]
services.mapper-client.ports is invalid: Invalid port ":", should be [[remote_ip:]remote_port[-remote_port]:]port[/protocol]
services.mapper-pg.ports is invalid: Invalid port ":", should be [[remote_ip:]remote_port[-remote_port]:]port[/protocol]
services.nlp-api.ports is invalid: Invalid port ":", should be [[remote_ip:]remote_port[-remote_port]:]port[/protocol]
services.nlp-mongo.ports is invalid: Invalid port ":", should be [[remote_ip:]remote_port[-remote_port]:]port[/protocol]
services.nlp-postgres.ports is invalid: Invalid port ":", should be [[remote_ip:]remote_port[-remote_port]:]port[/protocol]
services.nlp-solr.ports is invalid: Invalid port ":", should be [[remote_ip:]remote_port[-remote_port]:]port[/protocol]
services.redis.ports is invalid: Invalid port ":", should be [[remote_ip:]remote_port[-remote_port]:]port[/protocol]
services.scheduler.ports is invalid: Invalid port ":", should be [[remote_ip:]remote_port[-remote_port]:]port[/protocol]
services.swagger.ports is invalid: Invalid port ":", should be [[remote_ip:]remote_port[-remote_port]:]port[/protocol]

Support nested operations in NLPQL

define final efValue:
    where EjectionFractionFunction.value < 50.0 OR EjectionFractionFunction.value >= 50.0;

ngram without OHDSI cohort

Would like another ngram function that can run without supplying an OHDSI cohort, but just using the search terms to pull together documents

Negation errors

This query:
limit 100;

 phenotype "Orthopnea" version "2";

 include ClarityCore version "1.0" called Clarity;

 termset Orthopnea:
    ["orthopnea","orthopnoea","PND"];

  define hasOrthopnea:
    Clarity.ProviderAssertion({
      termset:[Orthopnea]
      });

Produces results with incorrect negation:

On cardiac review of symptoms, she denies chest pain, shortness of breath, palpitations, lower extremity edema, orthopnea or PND.All other review of systems negative.

Change Delimiter on Ingest for CSV

Can you make the delimiter optional to be comma, tab, or pipe?
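
With Python's standard csv module, the delimiter is already a reader parameter, so the option could be as thin as the sketch below. `read_rows` and the `DELIMITERS` names are hypothetical; the ingest tool's actual code path may differ.

```python
import csv
import io
from typing import List

# User-facing option names mapped to the actual delimiter characters.
DELIMITERS = {"comma": ",", "tab": "\t", "pipe": "|"}


def read_rows(text: str, delimiter: str = "comma") -> List[List[str]]:
    """Parse delimited text using one of the supported delimiter names."""
    if delimiter not in DELIMITERS:
        raise ValueError(f"unsupported delimiter: {delimiter!r}")
    return list(csv.reader(io.StringIO(text), delimiter=DELIMITERS[delimiter]))


print(read_rows("subject|report_text\n1|chest pain\n", delimiter="pipe"))
# [['subject', 'report_text'], ['1', 'chest pain']]
```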

Move synonyms, word variants, etc. to Term Set creation functions

Anything involving manipulation of termsets should happen external to the NLP functions (like provider assertion). In other words, term sets should be manipulated and baked fully by termset functions, so we don't have to deal with incorporating things like synonym expansion into each pipeline individually.

In NLPQL, change enum_list to value_list

NLPQL Runner Web Page

Page for submitting NLPQL:

No login or email or job name needed

NLPQL field
Limit checkbox and input field (default Checked 100)

Show formatted response

Bonus, use polling method to update when Done.

New Proximity Custom Task

Our existing Proximity task should be removed.

A new TermProximity Custom Task should be created.

Parameters:

term_list1
term_list2
word_distance (distance range between terms from list 1 and terms from list 2)
any_order (default false, meaning a list 1 term must come before a list 2 term; if set to true, either order is accepted)

Should return:

Sentence
Term1
Term2
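
A minimal sketch of the matching logic, to pin down the parameters. Assumptions: terms are single words, `word_distance` is treated as a maximum gap rather than a range, and `term_proximity` is a hypothetical function, not the Custom Task interface itself.

```python
import re
from typing import Dict, List, Tuple


def term_proximity(sentence: str, term_list1: List[str], term_list2: List[str],
                   word_distance: int, any_order: bool = False) -> List[Dict[str, str]]:
    """Find pairs of terms that occur within word_distance words of each other."""
    words = re.findall(r"\w+", sentence.lower())

    def positions(terms: List[str]) -> List[Tuple[int, str]]:
        return [(i, t) for i, w in enumerate(words) for t in terms if w == t.lower()]

    hits = []
    for i, term1 in positions(term_list1):
        for j, term2 in positions(term_list2):
            gap = j - i  # positive when term1 comes before term2
            in_order = 0 < gap <= word_distance
            reversed_ok = any_order and 0 < -gap <= word_distance
            if in_order or reversed_ok:
                hits.append({"sentence": sentence, "term1": term1, "term2": term2})
    return hits


print(term_proximity("fever with elevated white count", ["fever"], ["count"], 4))
```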

Datediff range

Would like to be able to do datediff range that is between -1 and 1, for example. So within X days on either side.

Either that or be able to write this statement and have it work:

where Clarity.dateDiff(
hasHighWBC.report_date,
hasFever.report_date,
d)<=1 AND Clarity.dateDiff(
hasHighWBC.report_date,
hasFever.report_date,
d)>=-1 ;
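
The two-sided condition above is equivalent to a single absolute-value test, which may be the simplest shape for a range form of dateDiff. A sketch in Python, with `within_days` as a hypothetical name:

```python
from datetime import date


def within_days(a: date, b: date, max_days: int) -> bool:
    """True when the two dates are at most max_days apart, in either direction."""
    return abs((a - b).days) <= max_days


print(within_days(date(2020, 1, 2), date(2020, 1, 1), 1))  # True
print(within_days(date(2020, 1, 1), date(2020, 1, 2), 1))  # True
print(within_days(date(2020, 1, 3), date(2020, 1, 1), 1))  # False
```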

support multivalued solr text field

Change "Custom" algorithm framing

Want to convey it as adding new algorithms or incorporating external libraries etc rather than conveying customization.

Sanity check to test if services are up before running Phenotype job

Solr
mongo
Postgres
Report type mapper (don't fail, but alert)
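
A possible shape for this check, assuming a plain TCP connection test is enough to call a service "up". The function names are hypothetical, and the hosts and ports would come from the existing configuration; required services produce errors, while optional ones (such as the report type mapper) only produce warnings.

```python
import socket
from typing import Callable, Dict, List, Tuple

Endpoint = Tuple[str, int]


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(required: Dict[str, Endpoint], optional: Dict[str, Endpoint],
                   probe: Callable[[str, int], bool] = port_is_open
                   ) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings): unreachable required and optional services."""
    errors = [name for name, (host, port) in required.items() if not probe(host, port)]
    warnings = [name for name, (host, port) in optional.items() if not probe(host, port)]
    return errors, warnings
```

The phenotype job would refuse to start when `errors` is non-empty and merely log whatever is in `warnings`.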

Get value extraction list processing working again.

termset tester

Create a way for users to test the params of termset, so they can see all the variations that will be included in their pipelines.

Add preprocessing step option to NLPQL

Add Doc Limit to Debug

For debugging, we would like to be able to specify the number of documents to run.

Add filters to TermFinder

e.g. ProviderAssertion has these by default

{
    'negex': ["Affirmed"],
    "temporality": ["Recent", "Historical"],
    "experiencer": ["Patient"]
}

Add options in param, so users can enter their own. TermFinder by default has none of these.
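
In code terms, the request amounts to an optional filter dictionary that is empty by default. The sketch below is illustrative only: `apply_filters` is a hypothetical helper, the result dictionaries are made up, and "Negated" is an assumed negex value.

```python
from typing import Dict, List, Optional

# The defaults quoted above for ProviderAssertion.
PROVIDER_ASSERTION_FILTERS: Dict[str, List[str]] = {
    "negex": ["Affirmed"],
    "temporality": ["Recent", "Historical"],
    "experiencer": ["Patient"],
}


def apply_filters(results: List[Dict[str, str]],
                  filters: Optional[Dict[str, List[str]]] = None) -> List[Dict[str, str]]:
    """Keep results whose attributes match every filter; no filters keeps all."""
    if not filters:
        return list(results)
    return [r for r in results
            if all(r.get(key) in allowed for key, allowed in filters.items())]


results = [
    {"term": "orthopnea", "negex": "Negated", "temporality": "Recent", "experiencer": "Patient"},
    {"term": "PND", "negex": "Affirmed", "temporality": "Recent", "experiencer": "Patient"},
]
print(len(apply_filters(results)))                              # 2
print(len(apply_filters(results, PROVIDER_ASSERTION_FILTERS)))  # 1
```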

Link NLP Ingest in Docker Compose

https://github.com/ClarityNLP/clarity-data-ingest

runs on 5100.

Develop Design for Unified UX with SSO

Allow creating phenotypes (finals) that combine other finals

NLPQL only allows define final where X AND Y if X and Y are both features (not finals).

We should be able to use logic on finals as well, I would think. But open to reasons why this cannot be.

Job Kill API call

To kill running job via API

Add additional unit tests

Flask startup
NLPQL

Add a createReportTypeList for simple documentset creation

Request: A Clarity function createReportTypeList

Rationale: Our current createReportTagList is awesome and is the underpinning of our interoperability. That said, sometimes you just want to import and run some documents using just the report type, without going over to the mapping API. This also helps if you are running offline and don't have access to the GT servers but don't want to rebuild all the maps locally.

Desired functionality:
documentset ProviderNotes:
Clarity.createReportTypeList(["SomeReportTypeName"]);

Then it builds up the documentset WITHOUT having to go through the mapping APIs. Just those report_type values are used in the Solr query.

"flask run" command not working

This command sequence

export FLASK_APP=api.py
flask run

produces an import error stating that util (i.e. util.py) cannot be found ('no module named util'). Somehow flask is not finding the nlp directory where this file resides.

Problem with value extractor enumlist feature

This sentence generates a result of '2' for value extraction with term "NYHA" and enumlist "1,2,3,4,i,ii,iii,iv":

"Chief Complaint: NYHA class IV CHF 24 Hour Events:"

The value extractor version 0.12 seems to be returning the correct result ('iv') when running from the command line with this command:

python3 ./value_extractor.py -t "NYHA" -s "Chief Complaint: NYHA class IV CHF\n24 Hour Events:" --enumlist "1,2,3,4,i,ii,iii,iv"

This is the result:

{
"sentence": "Chief Complaint: NYHA class IV CHF 24 Hour Events:",
"measurementCount": 1,
"terms": [
"NYHA"
],
"querySuccess": true,
"measurementList": [
{
"text": "NYHA class IV CHF 24 Hour Events:",
"start": 17,
"end": 50,
"condition": "EQUAL",
"matchingTerm": "NYHA",
"x": "iv",
"y": -1,
"minValue": -1,
"maxValue": -1
}
]
}

Need to check other parts of the pipeline after value extraction to see what's happening.
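
For comparison while debugging, here is an independent sketch of enum matching that returns 'iv' for this sentence. It is not the value_extractor implementation; `extract_enum_value` is a hypothetical function that uses whole-word, case-insensitive matching and prefers longer alternatives, so the '2' in '24' cannot match on its own.

```python
import re
from typing import List, Optional


def extract_enum_value(sentence: str, term: str, enumlist: List[str]) -> Optional[str]:
    """Return the first enum value that follows the term, lowercased."""
    if not enumlist:
        return None
    term_match = re.search(re.escape(term), sentence, re.IGNORECASE)
    if term_match is None:
        return None
    # Longest alternatives first so 'iii' wins over 'ii' and 'i'.
    alternatives = sorted((re.escape(v) for v in enumlist), key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(alternatives) + r")\b", re.IGNORECASE)
    value_match = pattern.search(sentence, term_match.end())
    return value_match.group(1).lower() if value_match else None


sentence = "Chief Complaint: NYHA class IV CHF 24 Hour Events:"
print(extract_enum_value(sentence, "NYHA", "1,2,3,4,i,ii,iii,iv".split(",")))  # iv
```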

Clarity.createDocumentSet({
reportTags:[optional]
reportTypes:[optional],
fields:[key:value,key:value],
customQuery:[query],
etc