
claritynlp / claritynlp


An NLP framework for clinical phenotyping. Docker | Python | Solr | OMOP. http://claritynlp.readthedocs.io/en/latest/

License: Mozilla Public License 2.0

Shell 0.50% JavaScript 0.17% Python 37.92% ANTLR 0.13% Dockerfile 0.06% Jupyter Notebook 50.06% Makefile 0.10% C++ 3.59% C# 2.78% CSS 4.19% HTML 0.46% SCSS 0.01% Less 0.04%
nlp natural-language-processing docker solr mongodb luigi clinical-research clinical-quality-language nlpql phenotype clinical-phenotyping phenotypes

claritynlp's Introduction

NOTE

As of March 2024, this repository is no longer being maintained. See https://github.com/ClarityNLP/claritynlp-nova.

ClarityNLP

What is ClarityNLP?

ClarityNLP is a clinical natural language processing platform focused on making healthcare NLP more accessible and reproducible. Over the past decade, NLP methods have far outstripped our ability to use them effectively.

ClarityNLP combines NLP techniques and libraries with a powerful query language, NLPQL, to identify patients and their clinical observations, extracted from text. ClarityNLP gives you insights into clinical (and other) text without a lot of custom configuration, and NLPQL lets you write your own definitions to find the patients and features that are relevant to your project.

Full ClarityNLP Documentation

You can read the full ClarityNLP documentation on Read the Docs: http://claritynlp.readthedocs.io/en/latest/

claritynlp's People

Contributors

andystevens98, calebbsides, charhart, cjamadagni, codeschneider, crherlihy, ellieshivers, jduke99, prathamesh1993, richardboyd

claritynlp's Issues

OHDSI cohort ID issues

If I change the [ohdsi] configuration in project.cfg to use webapi=http://api.ohdsi.org/WebAPI and then try to run the hasSepsis.nlpql example, an exception is thrown from line 80 of nlp/ohdsi/webapi.py. The reason is that cohort 190, which the NLPQL file asks for, is not found on that server. We need to detect this condition somehow, perhaps by asking for the cohort info prior to running the tasks. A related problem is if the same cohort info exists on different OHDSI servers under different IDs.
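
The pre-flight check suggested above could look roughly like the following. This is a minimal sketch, not code from nlp/ohdsi/webapi.py: the function name `cohort_exists` is hypothetical, and the `cohortdefinition/<id>` path is an assumption about the WebAPI server that should be verified against the deployment in use. Any HTTP error or unparseable body is treated as "cohort not usable on this server".

```python
import json
import urllib.error
import urllib.request


def cohort_exists(webapi_base, cohort_id, opener=urllib.request.urlopen):
    """Return True if the server returns a parseable cohort definition."""
    # Assumed endpoint layout: <webapi_base>/cohortdefinition/<id>
    url = f"{webapi_base.rstrip('/')}/cohortdefinition/{cohort_id}"
    try:
        with opener(url) as response:
            json.load(response)  # a parseable body means a definition came back
            return True
    except urllib.error.HTTPError:
        # The server answered, but not with a usable cohort definition.
        return False
    except ValueError:
        # The body was not valid JSON.
        return False
```

Calling something like this for each cohort ID referenced in the NLPQL file before scheduling tasks would let the job fail early with a clear message. It does not address the related problem of the same cohort living under different IDs on different servers.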

Sync up documentation for methods with pipeline output

Methods documentation should include the parameters that will be returned by the pipeline. Specifically, the Value Extraction documentation does not actually list value as a returned parameter. This is presumably because the pipeline creates this value itself, but the documentation needs to match the parameters available to the end user.

Other options would be:

  • Separate documentation that lists what Value means for every method
  • Changing the actual method to return a value instead of an X

http://clarity-nlp.readthedocs.io/en/latest/developer_guide/algorithms/value_extraction.html

Data Ingestion Running ToDo List

  • Relational DB ingestion

    • User must provide JDBC connection string
    • User must provide query to match the column names we specify
  • Enable bringing in folder of text files (or selecting group of files from browser file selector)

Lexical variant library

Create a library/API that we can hook into the pipeline or elsewhere in ClarityNLP. It should create plurals and other word forms (such as verb conjugations). This is different from synonyms and will apply to a word or phrase.

NLM has a library that does this, but I suspect we may not want to use it in ClarityNLP. However, we can look at their approach and review other approaches to this problem.

E.g.
heart attack -> heart attacks
sleep -> sleeps, slept, sleeping

etc.
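
To make the request concrete, here is a deliberately naive, rule-based sketch of such a variant generator. The function names and the one-entry irregular-verb table are hypothetical. A real library would need a much larger exception list, and this is not how the NLM tools work.

```python
# Illustrative irregular past-tense table; a real one would be far larger.
IRREGULAR_PAST = {"sleep": "slept"}


def plural(phrase):
    """Pluralize the last word of a word or phrase with simple suffix rules."""
    head, _, last = phrase.rpartition(" ")
    if last.endswith(("s", "x", "z", "ch", "sh")):
        last += "es"
    elif last.endswith("y") and last[-2:-1] not in "aeiou":
        last = last[:-1] + "ies"
    else:
        last += "s"
    return f"{head} {last}".strip()


def verb_forms(verb):
    """Return [third person singular, past tense, present participle]."""
    past = IRREGULAR_PAST.get(verb)
    if past is None:
        past = verb + ("d" if verb.endswith("e") else "ed")
    # Drop a silent trailing "e" before "ing" (but keep "ee", as in "see").
    stem = verb[:-1] if verb.endswith("e") and not verb.endswith("ee") else verb
    return [plural(verb), past, stem + "ing"]
```

With these rules, `plural("heart attack")` and `verb_forms("sleep")` reproduce the two examples listed above.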

createReportTagList AND / OR

The function should allow the tag list to be combined with AND or OR. OR should remain the default (the current behavior), but there needs to be a way to specify, for example, that only documents tagged as both XR AND Chest are returned.

Clarity.createReportTagList(["XR","Chest"]);
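
As a rough illustration of the requested behavior, the sketch below joins tags into a single Solr-style clause with a selectable operator. The helper name `tag_clause` and the field name `tags` are assumptions made for the example. The real function resolves tags through the mapping API rather than querying a tag field directly.

```python
def tag_clause(tags, operator="OR"):
    """Build a Solr-style clause that combines tags with AND or OR."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    quoted = [f'"{tag}"' for tag in tags]
    # "tags" is a hypothetical field name used only for illustration.
    return "tags:(" + f" {operator} ".join(quoted) + ")"
```

Keeping OR as the default argument preserves the current behavior for existing NLPQL files.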

Handle quotes in createDocumentSet

See the example below. In the filter_query and query fields, translate single quotes or escaped quotes to double quotes for Solr.

documentset AmoxDischargeNotes:
     Clarity.createDocumentSet({
         "report_types":["Discharge summary"],
         "report_tags": [],
         "filter_query": "subject:('bob brown' OR 'betty blue')",
         "query":"report_text:'heart attack'"});

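One possible shape for the quote translation described above is sketched here. The function name `normalize_quotes` is hypothetical. The regular expression only rewrites single quotes that are not adjacent to a letter or digit on the outside, so apostrophes inside words are left alone.

```python
import re

# A single-quoted run whose quotes do not touch a letter or digit on the
# outside, so an apostrophe inside a word is not treated as a delimiter.
_SINGLE_QUOTED = re.compile(r"(?<![A-Za-z0-9])'([^']*)'(?![A-Za-z0-9])")


def normalize_quotes(query):
    """Convert escaped double quotes and single-quoted phrases to plain double quotes."""
    query = query.replace('\\"', '"')
    return _SINGLE_QUOTED.sub(r'"\1"', query)
```

Applied to the two fields in the example, this yields phrase queries in double quotes, which is the form Solr expects for phrases.
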
Readme instructions out of date

I tried following the readme instructions and get the following error message.

WARNING: The MAPPER_API_HOSTNAME variable is not set. Defaulting to a blank string.
WARNING: The MAPPER_API_HOST_PORT variable is not set. Defaulting to a blank string.
WARNING: The MAPPER_API_CONTAINER_PORT variable is not set. Defaulting to a blank string.
WARNING: The MAPPER_PG_PASSWORD variable is not set. Defaulting to a blank string.
WARNING: The MAPPER_PG_USER variable is not set. Defaulting to a blank string.
WARNING: The MAPPER_PG_DATABASE variable is not set. Defaulting to a blank string.
WARNING: The MAPPER_PG_HOSTNAME variable is not set. Defaulting to a blank string.
WARNING: The MAPPER_PG_HOST_PORT variable is not set. Defaulting to a blank string.
WARNING: The MAPPER_PG_CONTAINER_PORT variable is not set. Defaulting to a blank string.
WARNING: The MAPPER_REDIS_HOSTNAME variable is not set. Defaulting to a blank string.
WARNING: The MAPPER_REDIS_HOST_PORT variable is not set. Defaulting to a blank string.
WARNING: The MAPPER_REDIS_CONTAINER_PORT variable is not set. Defaulting to a blank string.
WARNING: The NLP_PG_PASSWORD variable is not set. Defaulting to a blank string.
WARNING: The NLP_PG_USER variable is not set. Defaulting to a blank string.
WARNING: The NLP_PG_DATABASE variable is not set. Defaulting to a blank string.
WARNING: The NLP_PG_HOSTNAME variable is not set. Defaulting to a blank string.
WARNING: The NLP_PG_HOST_PORT variable is not set. Defaulting to a blank string.
WARNING: The NLP_PG_CONTAINER_PORT variable is not set. Defaulting to a blank string.
WARNING: The NLP_CORE_NAME variable is not set. Defaulting to a blank string.
WARNING: The NLP_SOLR_HOSTNAME variable is not set. Defaulting to a blank string.
WARNING: The NLP_SOLR_HOST_PORT variable is not set. Defaulting to a blank string.
WARNING: The NLP_SOLR_CONTAINER_PORT variable is not set. Defaulting to a blank string.
WARNING: The MAPPER_CLIENT_HOSTNAME variable is not set. Defaulting to a blank string.
WARNING: The MAPPER_CLIENT_HOST_PORT variable is not set. Defaulting to a blank string.
WARNING: The MAPPER_CLIENT_CONTAINER_PORT variable is not set. Defaulting to a blank string.
WARNING: The SCHEDULER_HOSTNAME variable is not set. Defaulting to a blank string.
WARNING: The SCHEDULER_HOST_PORT variable is not set. Defaulting to a blank string.
WARNING: The SCHEDULER_CONTAINER_PORT variable is not set. Defaulting to a blank string.
WARNING: The NLP_MONGO_DATABASE variable is not set. Defaulting to a blank string.
WARNING: The NLP_MONGO_HOSTNAME variable is not set. Defaulting to a blank string.
WARNING: The NLP_MONGO_HOST_PORT variable is not set. Defaulting to a blank string.
WARNING: The NLP_MONGO_CONTAINER_PORT variable is not set. Defaulting to a blank string.
WARNING: The MAPPER_SWAGGER_API_URL variable is not set. Defaulting to a blank string.
WARNING: The MAPPER_SWAGGER_HOSTNAME variable is not set. Defaulting to a blank string.
WARNING: The MAPPER_SWAGGER_HOST_PORT variable is not set. Defaulting to a blank string.
WARNING: The MAPPER_SWAGGER_CONTAINER_PORT variable is not set. Defaulting to a blank string.
WARNING: The NLP_API_HOSTNAME variable is not set. Defaulting to a blank string.
WARNING: The NLP_API_HOST_PORT variable is not set. Defaulting to a blank string.
WARNING: The NLP_API_CONTAINER_PORT variable is not set. Defaulting to a blank string.
ERROR: The Compose file './docker-compose.yml' is invalid because:
services.mapper-api.ports is invalid: Invalid port ":", should be [[remote_ip:]remote_port[-remote_port]:]port[/protocol]
services.mapper-client.ports is invalid: Invalid port ":", should be [[remote_ip:]remote_port[-remote_port]:]port[/protocol]
services.mapper-pg.ports is invalid: Invalid port ":", should be [[remote_ip:]remote_port[-remote_port]:]port[/protocol]
services.nlp-api.ports is invalid: Invalid port ":", should be [[remote_ip:]remote_port[-remote_port]:]port[/protocol]
services.nlp-mongo.ports is invalid: Invalid port ":", should be [[remote_ip:]remote_port[-remote_port]:]port[/protocol]
services.nlp-postgres.ports is invalid: Invalid port ":", should be [[remote_ip:]remote_port[-remote_port]:]port[/protocol]
services.nlp-solr.ports is invalid: Invalid port ":", should be [[remote_ip:]remote_port[-remote_port]:]port[/protocol]
services.redis.ports is invalid: Invalid port ":", should be [[remote_ip:]remote_port[-remote_port]:]port[/protocol]
services.scheduler.ports is invalid: Invalid port ":", should be [[remote_ip:]remote_port[-remote_port]:]port[/protocol]
services.swagger.ports is invalid: Invalid port ":", should be [[remote_ip:]remote_port[-remote_port]:]port[/protocol]
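
The port errors follow from the blank variables: each port mapping collapses to ":" when both sides are empty. A small pre-flight check such as the sketch below could report the missing settings before docker-compose is invoked. The helper name is hypothetical and the list is only a subset of the variable names taken from the warnings above.

```python
import os

# Subset of the variable names reported in the warnings above.
REQUIRED_VARS = [
    "NLP_API_HOSTNAME",
    "NLP_API_HOST_PORT",
    "NLP_API_CONTAINER_PORT",
    "NLP_SOLR_HOSTNAME",
    "NLP_SOLR_HOST_PORT",
    "NLP_SOLR_CONTAINER_PORT",
]


def missing_variables(names, environ=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_variables(REQUIRED_VARS)
    if missing:
        print("Unset variables:", ", ".join(missing))
```

docker-compose also reads a `.env` file from the project directory, so the warnings may simply mean that this file was absent. The readme would need to say where those values are supposed to come from.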

ngram without OHDSI cohort

Would like another ngram function that can run without supplying an OHDSI cohort, but just using the search terms to pull together documents
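
A minimal sketch of the counting step, assuming the documents have already been retrieved by search terms rather than by cohort. The function name and the whitespace tokenization are simplifications for illustration and do not reflect the existing ngram implementation.

```python
from collections import Counter


def ngram_counts(documents, n=2):
    """Count word n-grams across an iterable of document texts."""
    counts = Counter()
    for text in documents:
        tokens = text.lower().split()
        counts.update(
            " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
        )
    return counts
```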

Negation errors

This query:
limit 100;

phenotype "Orthopnea" version "2";

include ClarityCore version "1.0" called Clarity;

termset Orthopnea:
    ["orthopnea","orthopnoea","PND"];

define hasOrthopnea:
    Clarity.ProviderAssertion({
        termset:[Orthopnea]
    });

Produces results with incorrect negation:

On cardiac review of symptoms, she denies chest pain, shortness of breath, palpitations, lower extremity edema, orthopnea or PND.All other review of systems negative.
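
For reference, the expected behavior can be stated as a small NegEx-style sketch: a trigger such as "denies" opens a negation scope that should cover every item of the list that follows, up to the end of the sentence or a terminating word. The trigger and terminator sets below are tiny illustrative subsets, only single-word terms are handled, and this is not the ClarityNLP implementation.

```python
import re

# Tiny illustrative subsets; real NegEx-style lists are much longer.
NEGATION_TRIGGERS = {"denies", "denied", "no", "without"}
SCOPE_TERMINATORS = {"but", "however", "except"}


def negated_terms(text, terms):
    """Return the single-word terms that fall inside a negation scope."""
    wanted = {term.lower() for term in terms}
    negated = set()
    # Scope is reset at every sentence or clause boundary.
    for sentence in re.split(r"[.;]", text.lower()):
        in_scope = False
        for token in re.findall(r"[a-z]+", sentence):
            if token in NEGATION_TRIGGERS:
                in_scope = True
            elif token in SCOPE_TERMINATORS:
                in_scope = False
            elif in_scope and token in wanted:
                negated.add(token)
    return negated
```

Under these rules both "orthopnea" and "PND" in the sentence above fall inside the scope opened by "denies", so neither should be reported as affirmed.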

Move synonyms, word variants, etc. to Term Set creation functions

Anything involving manipulation of termsets should happen external to the NLP functions (like provider assertion). In other words, term sets should be manipulated and baked fully by termset functions, so we don't have to deal with incorporating things like synonym expansion into each pipeline individually.

NLPQL Runner Web Page

Page for submitting NLPQL:

No login or email or job name needed

NLPQL field
Limit checkbox and input field (default Checked 100)

Show formatted response

Bonus, use polling method to update when Done.
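
The polling idea can be sketched independently of the web framework. In the sketch below, `get_status` stands in for whatever call the page would make to ask about the job. That callable, the "Done" status string, and the default interval are all assumptions made for illustration.

```python
import time


def wait_until_done(get_status, interval_seconds=2.0, max_attempts=30,
                    sleep=time.sleep):
    """Poll get_status() until it returns "Done" or attempts run out."""
    for _ in range(max_attempts):
        if get_status() == "Done":
            return True
        sleep(interval_seconds)
    return False
```

Capping the number of attempts keeps the page from polling forever if a job stalls.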

New Proximity Custom Task

Our existing Proximity task should be removed.

A new TermProximity Custom Task should be created.

Parameters:

  • term_list1
  • term_list2
  • word_distance (distance range between terms from list 1 and terms from list 2)
  • any_order (default false: list 1 terms must come before list 2 terms; if true, either order is accepted)

Should return:

  • Sentence
  • Term1
  • Term2
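
A rough sketch of the matching logic for a single sentence, using the parameter names listed above. It treats word_distance as the maximum number of words allowed between the two terms, which is an assumption since the issue only says "distance range". The function name and the returned dictionary keys are illustrative.

```python
import re


def term_proximity(sentence, term_list1, term_list2, word_distance,
                   any_order=False):
    """Find pairs (term1, term2) separated by at most word_distance words."""
    tokens = re.findall(r"\w+", sentence.lower())

    def positions(terms):
        # (first token index, last token index, original term) per occurrence
        found = []
        for term in terms:
            words = term.lower().split()
            if not words:
                continue
            for i in range(len(tokens) - len(words) + 1):
                if tokens[i:i + len(words)] == words:
                    found.append((i, i + len(words) - 1, term))
        return found

    matches = []
    for start1, end1, term1 in positions(term_list1):
        for start2, end2, term2 in positions(term_list2):
            if start2 > end1:                    # term1 then term2
                gap = start2 - end1 - 1
            elif any_order and start1 > end2:    # term2 then term1
                gap = start1 - end2 - 1
            else:
                continue
            if gap <= word_distance:
                matches.append(
                    {"sentence": sentence, "term1": term1, "term2": term2}
                )
    return matches
```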

Datediff range

Would like to be able to specify a datediff range, for example between -1 and 1, i.e. within X days on either side.

Either that or be able to write this statement and have it work:

where Clarity.dateDiff(hasHighWBC.report_date, hasFever.report_date, d) <= 1
  AND Clarity.dateDiff(hasHighWBC.report_date, hasFever.report_date, d) >= -1;
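
The two-sided condition is equivalent to bounding the absolute difference, which is what a range form of dateDiff would need to compute. A minimal Python sketch of that check, with a hypothetical helper name:

```python
from datetime import date


def within_days(date_a, date_b, max_days):
    """True when the two dates are at most max_days apart, in either direction."""
    return abs((date_a - date_b).days) <= max_days
```

For example, `within_days(date(2020, 1, 2), date(2020, 1, 1), 1)` holds regardless of which of the two reports came first.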

termset tester

Create a way for users to test the params of termset, so they can see all the variations that will be included in their pipelines.

Add filters to TermFinder

e.g. ProviderAssertion has these by default

{
    'negex': ["Affirmed"],
    "temporality": ["Recent", "Historical"],
    "experiencer": ["Patient"]
}

Add options in param, so users can enter their own. TermFinder by default has none of these.
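
One way to picture the requested option: the filters are an optional mapping from attribute name to allowed values, and an empty mapping (the TermFinder default) lets everything through. The function name, the shape of the match dictionary, and the "Negated" label in the usage below are assumptions for illustration.

```python
# Defaults quoted in the issue for ProviderAssertion.
PROVIDER_ASSERTION_FILTERS = {
    "negex": ["Affirmed"],
    "temporality": ["Recent", "Historical"],
    "experiencer": ["Patient"],
}


def passes_filters(match, filters=None):
    """Keep a match only if every filtered attribute has an allowed value."""
    for key, allowed in (filters or {}).items():
        if match.get(key) not in allowed:
            return False
    return True
```

A match labeled `{"negex": "Negated", ...}` would pass with no filters but be dropped under the ProviderAssertion defaults.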

Add a createReportTypeList for simple documentset creation

Request: A Clarity function createReportTypeList

Rationale: Our current createReportTagList is awesome and underpins our interoperability. That said, sometimes you just want to import and run some documents using only the report type, without going through the mapping API. This also helps if you are running offline and don't have access to the GT servers but don't want to rebuild all the maps locally.

Desired functionality:
documentset ProviderNotes:
Clarity.createReportTypeList(["SomeReportTypeName"]);

Then it builds the documentset WITHOUT having to go through the mapping APIs. Only those report_type values are used in the Solr query.
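
Conceptually, the new function could be thin sugar over a documentset definition that carries only report types. The sketch below expresses that in Python. The function name is hypothetical and the dictionary keys are borrowed from the createDocumentSet example elsewhere on this page, so treat the exact shape as an assumption.

```python
def create_report_type_list(report_types):
    """Describe a documentset that filters on report types only."""
    if not report_types:
        raise ValueError("at least one report type is required")
    return {
        "report_types": list(report_types),
        "report_tags": [],   # left empty so no mapping API lookup is needed
        "filter_query": "",
        "query": "",
    }
```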

"flask run" command not working

This command sequence

export FLASK_APP=api.py
flask run

produces an import error stating that util (i.e. util.py) cannot be found ('no module named util'). Somehow flask is not finding the nlp directory where this file resides.

Problem with value extractor enumlist feature

This sentence generates a result of '2' for value extraction with term "NYHA" and enumlist "1,2,3,4,i,ii,iii,iv":

"Chief Complaint: NYHA class IV CHF 24 Hour Events:"

The value extractor version 0.12 seems to be returning the correct result ('iv') when running from the command line with this command:

python3 ./value_extractor.py -t "NYHA" -s "Chief Complaint: NYHA class IV CHF\n24 Hour Events:" --enumlist "1,2,3,4,i,ii,iii,iv"

This is the result:

{
    "sentence": "Chief Complaint: NYHA class IV CHF 24 Hour Events:",
    "measurementCount": 1,
    "terms": [
        "NYHA"
    ],
    "querySuccess": true,
    "measurementList": [
        {
            "text": "NYHA class IV CHF 24 Hour Events:",
            "start": 17,
            "end": 50,
            "condition": "EQUAL",
            "matchingTerm": "NYHA",
            "x": "iv",
            "y": -1,
            "minValue": -1,
            "maxValue": -1
        }
    ]
}

Need to check other parts of the pipeline after value extraction to see what's happening.
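
As a reference point while debugging, the sketch below shows the matching behavior one would expect from an enumlist: whole-token, case-insensitive matches only, taken from the text after the term. It is not the value_extractor.py algorithm, and the issue does not establish where the '2' comes from. The function name is hypothetical.

```python
import re


def extract_enum(sentence, term, enumlist):
    """Return the first enumlist value that appears as a whole token after term."""
    lowered = sentence.lower()
    start = lowered.find(term.lower())
    if start < 0:
        return None
    # Longest options first so "iv" is preferred over "i" in the alternation.
    options = sorted((item.lower() for item in enumlist), key=len, reverse=True)
    pattern = r"\b(" + "|".join(re.escape(opt) for opt in options) + r")\b"
    match = re.search(pattern, lowered[start + len(term):])
    return match.group(1) if match else None
```

Because of the word boundaries, the "24" in "24 Hour Events" cannot match the enum values "2" or "4" in this sketch.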

Sometimes results are blank

Things that are logical operator results are sometimes blank.

(These items have a different lookup key and should use the phenotype_final flag.)

Automated Test Framework with Jenkins

Add the ability to run automated tests on nlp-api with Jenkins.

We had done this with Travis and pytest, but it probably makes sense to do it with Jenkins, if possible.

Change documentset creation to a more robust function

Would like to have the ability to create documentset with more stuff like searching on attr fields etc.

Probably should wrap this in with #26 and just make a

Clarity.createDocumentSet({
    reportTags:[optional],
    reportTypes:[optional],
    fields:[key:value,key:value],
    customQuery:[query],
    etc
});
