
kf-task-fhir-etl's Introduction

Kids First FHIR ETL Task Service

The Kids First FHIR ETL Task Service is a Python wrapper application built on the Kids First Data Ingest Library. It:

  • Extracts tables from the KF Dataservice DB;
  • Transforms the extracted tabular data to FHIR resources in JSON; and
  • Loads the transformed records into a Kids First FHIR Service.

Quickstart

Running ETL from Command Line Interface

  1. Make sure Python (>= 3.7) is installed.

  2. Obtain three sets of credentials as follows:

    • Kids First Dataservice DB URL: Contact Kids First DRC DevOps Team.
    • FHIR USERNAME and PASSWORD: The Kids First FHIR ETL uses basic authentication for POST, PUT, PATCH, and DELETE requests. Contact Kids First DRC DevOps Team.
    • FHIR Cookie: Follow the instructions described here.
  3. Clone this repository:

$ git clone https://github.com/kids-first/kf-task-fhir-etl.git
$ cd kf-task-fhir-etl
  4. Create and activate a virtual environment:
$ python3 -m venv venv
$ source venv/bin/activate
  5. Install dependencies:
(venv) $ pip install --upgrade pip && pip install -e .
  6. Create a .env file with the following environment variables:
KF_DATASERVICE_DB_URL=<PUT-KF-DATASERVICE-DB-URL>
KF_API_DATASERVICE_URL=<PUT-KF-API-DATASERVICE-URL> # e.g., https://kf-api-dataservice.kidsfirstdrc.org/
KF_API_FHIR_SERVICE_URL=<PUT-KF-API-FHIR-SERVICE-URL> # e.g., https://kf-api-fhir-service.kidsfirstdrc.org

FHIR_USERNAME=<PUT-FHIR-USERNAME>
FHIR_PASSWORD=<PUT-FHIR-PASSWORD>
FHIR_COOKIE=<PUT-FHIR-COOKIE>
  7. Get familiar with required arguments:
(venv) $ kidsfirst fhir-etl -h
Usage: kidsfirst fhir-etl [OPTIONS] KF_STUDY_IDS...

  Ingest a Kids First study(ies) into a FHIR server.

  Arguments:

      KF_STUDY_IDS - a KF study ID(s) concatenated by whitespace, e.g., SD_BHJXBDQK SD_M3DBXD12

Options:
  -h, --help  Show this message and exit.
  8. Tunnel to the KF Dataservice DB (See also here or contact Kids First DRC DevOps Team):
(venv) $ igor awslogin
(venv) $ export AWS_PROFILE=Mgmt-Console-Dev-D3bCenter@232196027141
(venv) $ igor dev-env-tunnel --environment prd --cidr_block 0.0.0.0/0
  9. Run the following command (the KF study IDs below are examples):
(venv) $ kidsfirst fhir-etl SD_ZXJFFMEF SD_46SK55A3
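
Before running the ETL, it can help to confirm that the .env file defines every variable named in the template above. The following is a minimal sketch and not part of this repository: the helper names parse_env and missing_variables are hypothetical, and the parser assumes simple KEY=VALUE lines with optional trailing "# ..." comments.

```python
REQUIRED_VARIABLES = (
    "KF_DATASERVICE_DB_URL",
    "KF_API_DATASERVICE_URL",
    "KF_API_FHIR_SERVICE_URL",
    "FHIR_USERNAME",
    "FHIR_PASSWORD",
    "FHIR_COOKIE",
)


def parse_env(text):
    """Parse simple KEY=VALUE lines; blank lines and '#' comment lines are skipped."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Assumption: " #" starts a trailing comment, as in the URL examples.
        value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


def missing_variables(values):
    """Return required names that are absent, empty, or still placeholders."""
    return [
        name
        for name in REQUIRED_VARIABLES
        if not values.get(name) or values[name].startswith("<PUT-")
    ]
```

To use it, read the file contents (for example with open(".env").read()), pass them to parse_env, and stop if missing_variables returns a non-empty list.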

Running ETL from Docker (TBD)

kf-task-fhir-etl's People

Contributors

liberaliscomputing


kf-task-fhir-etl's Issues

Implement first pass step function

At this point the step functions are just placeholders, but we want to break the FHIR ETL code apart appropriately and fit it into the step function pipeline.

The first pass will invoke the FHIR ETL as it is now, in a single step. Once this works and we are comfortable with step functions (how they are monitored, how to debug them, etc.), we can break the ETL into multiple steps (e.g., extract Dataservice tables, transform to FHIR, load into the FHIR server).
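
As a rough illustration of the single-step first pass, the sketch below builds a one-state Amazon States Language definition that submits the whole ETL as one AWS Batch job. It is an assumption about how the pipeline could be wired, not the configuration used in this repo: the function name, the job name "fhir-etl", and the queue and job definition values are all hypothetical placeholders.

```python
import json


def build_single_step_definition(job_queue, job_definition, study_ids):
    """Build a one-state Amazon States Language definition as a dict."""
    return {
        "Comment": "First pass: run the whole FHIR ETL as one Batch job",
        "StartAt": "RunFhirEtl",
        "States": {
            "RunFhirEtl": {
                "Type": "Task",
                # The .sync suffix makes the state wait until the Batch job ends.
                "Resource": "arn:aws:states:::batch:submitJob.sync",
                "Parameters": {
                    "JobName": "fhir-etl",  # hypothetical job name
                    "JobQueue": job_queue,
                    "JobDefinition": job_definition,
                    "ContainerOverrides": {
                        # Same command as the CLI quickstart.
                        "Command": ["kidsfirst", "fhir-etl", *study_ids],
                    },
                },
                "End": True,
            }
        },
    }


def definition_as_json(definition):
    """Serialize the definition for deployment tooling."""
    return json.dumps(definition, indent=2)
```

Splitting the ETL later would mean replacing the single state with a chain of states (extract, transform, load), each with its own "Next" transition.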

Setup step function + batch pipeline

Set up the infra and CI/CD for the initial step function pipeline. For now, the steps will be placeholders, each printing to stdout what it intends to do. The devs should make sure they know how to run the pipeline and see that it works before we merge this PR.

Initial FHIR ETL - without Step Functions

Background

The current Dataservice to FHIR ETL is not a service, meaning it is Python code that has to be run on a local dev machine. We want to run this ETL in a pipeline that anyone can easily run and monitor without having to set up a local dev environment.

We've chosen to try wrapping the FHIR ETL in an AWS step function + batch pipeline so that it can be run and monitored centrally and is scalable. This repo will contain all of the necessary infra setup for the FHIR ETL code written in Python, as well as the configuration for the step functions and batch job.

Issue

Before we use step functions, we want to get the initial ETL code working. This is what @liberaliscomputing has already been working on. It does the following:

  • Extracts tables from KF dataservice
  • Transforms tables to FHIR JSON
  • Loads FHIR JSON into KF FHIR service
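
To make the transform step concrete, here is a minimal sketch that turns one extracted participant row into a FHIR Patient resource. It is illustrative only and not the repository's code: the column names kf_id and gender and the function names are assumptions, and the real ETL maps many more fields and entity types.

```python
import json

# FHIR administrative gender codes are male, female, other, and unknown.
GENDER_CODES = {"male": "male", "female": "female", "other": "other"}


def participant_to_patient(row):
    """Map one participant row (a dict) to a minimal FHIR Patient resource."""
    raw_gender = str(row.get("gender") or "").lower()
    return {
        "resourceType": "Patient",
        # Assumption: the Dataservice ID is carried over as an identifier.
        "identifier": [{"value": row["kf_id"]}],
        "gender": GENDER_CODES.get(raw_gender, "unknown"),
    }


def transform(rows):
    """Transform every extracted row into a JSON-serializable resource."""
    return [participant_to_patient(row) for row in rows]


def to_json_lines(resources):
    """Serialize resources one per line, ready to hand to a loader."""
    return "\n".join(json.dumps(resource, sort_keys=True) for resource in resources)
```

The load step would then send each resource to the FHIR service; that part is omitted here because it depends on the server's authentication setup.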

Integration Tests

Motivation

We don't currently have integration tests, so right now it is hard to know whether a change to any part of the code base will break something or have the expected outcome in the target FHIR server.

Approach

We need to add some basic end-to-end integration tests which test that all of the components in the ETL code work without error and that the FHIR server ends up with the correct data. Specifically we need to:

End to End Test

  • Deploy Data Service
  • Deploy FHIR Service
  • Load a test study into the Data Service
  • Run the ETL
  • At a minimum we need to check that the FHIR server has the correct number of entities, with correct IDs
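
The final check in the list above could look roughly like the following sketch, which compares the resource IDs in a FHIR search result Bundle against the IDs expected from the test study. The function name compare_ids is hypothetical, and the sketch assumes the whole result fits in a single Bundle page (a real test would need to follow pagination links).

```python
def compare_ids(bundle, expected_ids):
    """Compare IDs found in a FHIR Bundle with the IDs we expect to see."""
    found = {entry["resource"]["id"] for entry in bundle.get("entry", [])}
    expected = set(expected_ids)
    return {
        "missing": sorted(expected - found),      # expected but not on the server
        "unexpected": sorted(found - expected),   # on the server but not expected
        "count_matches": len(found) == len(expected),
    }
```

An integration test would fetch the Bundle from the FHIR server after the ETL run and assert that both lists are empty and that count_matches is true.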

Entity Builder and Target API Plugin Test

  • Deploy FHIR Service
  • Run each entity builder with a sufficiently detailed Data Service payload
  • Load the resulting FHIR resource into the FHIR server
  • Check that the FHIR resource in the FHIR server has the expected data
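
For the last step above, a server typically adds fields of its own (such as meta) to a stored resource, so a strict equality check against the builder output tends to fail. The sketch below is one possible approach under that assumption: it checks only the fields we expect and ignores anything extra. The function name find_mismatches is hypothetical.

```python
def find_mismatches(expected, actual, path=""):
    """List fields in `expected` that are missing or different in `actual`.

    Keys present only in `actual` are ignored. Lists and scalar values are
    compared by plain equality.
    """
    label = path or "<root>"
    if isinstance(expected, dict):
        if not isinstance(actual, dict):
            return [f"{label}: expected an object, got {actual!r}"]
        mismatches = []
        for key, value in expected.items():
            child = f"{path}.{key}" if path else key
            if key not in actual:
                mismatches.append(f"{child}: missing")
            else:
                mismatches.extend(find_mismatches(value, actual[key], child))
        return mismatches
    if expected != actual:
        return [f"{label}: expected {expected!r}, got {actual!r}"]
    return []
```

A test would read the resource back from the FHIR server and assert that find_mismatches(built_resource, stored_resource) returns an empty list.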

Add docker-compose for local testing and development

It would be great if we had a docker-compose.yaml so that a developer can get this up and running locally fairly quickly. We could use the free HAPI FHIR server for our FHIR service and the KF dataservice image from kfdrc/kf-api-dataservice.

Add a CI pipeline

Set up a GitHub workflow that can run the unit tests and integration tests. Specifically, we want to:

Unit Tests

  • Should be run on every pushed commit

Integration Tests

  • Test setup (service deployment) should be run only once per PR, when it opens/re-opens
  • Test teardown (service teardown) should be run only once per PR, when it closes
  • Test clean up (clean out FHIR server) should be run before integration tests run
  • Integration tests run on each pushed commit

Unit Tests

Motivation

We don't currently have unit tests, so right now it is hard to know whether a change to any part of the code base will break something or have the expected outcome in the target FHIR server.

This type of code can be difficult to unit test since much of it relies on pulling/pushing to external systems. We should add unit tests to test as much of the code base as we can. Some things may be very difficult to test with a unit test, and may be good candidates for an integration test. Here are some of the things we can/should unit test:

Approach

We should implement the following unit tests. Developers must run these locally while they are working and we need to run these in a CI pipeline on each pushed commit.

Entity Builders

  • Each entity builder in the entity_builders package should have a set of unit tests
  • At a minimum we should test that each of the class methods executes without error
  • In our integration tests, we can test that the resulting FHIR resource can be successfully created/updated in the FHIR server
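
A smoke test of the kind described above might look like the sketch below. ExamplePatientBuilder is a hypothetical stand-in for a class in the entity_builders package, and its method names build_key and build_entity are assumptions; the real builders may expose different methods and signatures.

```python
import unittest


class ExamplePatientBuilder:
    """Hypothetical stand-in for an entity builder class."""

    resource_type = "Patient"

    @classmethod
    def build_key(cls, record):
        return record["kf_id"]

    @classmethod
    def build_entity(cls, record):
        return {
            "resourceType": cls.resource_type,
            "identifier": [{"value": cls.build_key(record)}],
        }


class ExamplePatientBuilderTest(unittest.TestCase):
    record = {"kf_id": "PT_00000001"}

    def test_each_class_method_runs(self):
        # Minimum bar: every class method executes without raising.
        for name in ("build_key", "build_entity"):
            with self.subTest(method=name):
                getattr(ExamplePatientBuilder, name)(self.record)

    def test_entity_has_resource_type(self):
        entity = ExamplePatientBuilder.build_entity(self.record)
        self.assertEqual(entity["resourceType"], "Patient")
```

Tests written this way run with python -m unittest and need no FHIR server, which keeps them suitable for running on every pushed commit.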

Ingest

  • ingest.py may need some unit tests to ensure the transformation from Data Service payloads to the intermediate model works.
  • In order to do this, we will need to break up Ingest.transform into separate methods, one per entity type
  • At a minimum we just need to test that each transformation executes without error
  • In our integration tests, we can test that the extraction from Data Service works as expected
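
One way to picture the proposed split of Ingest.transform is sketched below: a small method per entity type, plus a dispatcher that routes each extracted table to its method. This is a hypothetical outline, not the current ingest.py; the table names, column names, and intermediate-model fields are all assumptions.

```python
class Ingest:
    """Hypothetical outline of a transform split into per-entity methods."""

    def transform_participants(self, rows):
        return [{"type": "participant", "id": row["kf_id"]} for row in rows]

    def transform_biospecimens(self, rows):
        return [
            {
                "type": "biospecimen",
                "id": row["kf_id"],
                "participant_id": row["participant_id"],
            }
            for row in rows
        ]

    def transform(self, tables):
        """Route each extracted table to the method for its entity type."""
        handlers = {
            "participant": self.transform_participants,
            "biospecimen": self.transform_biospecimens,
        }
        return {
            name: handlers[name](rows)
            for name, rows in tables.items()
            if name in handlers
        }
```

With this shape, each transform_* method can be unit tested on a handful of rows without touching the Data Service.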
