omop-core's Issues

Rootless `supervisord` in Dockerfile

supervisord is currently running as root for an unknown reason. As a result, the rstudio container ignores some user-related settings and requires root as the username to log in.

Urgent Level: Optional + Not in 2023Q3 MVP
When to use: Enhancing/Hardening action required

Pre-Requisite: Secret file patterning.
Possible Obsolete: Secret embedding, or passing in environment.

Trino TLS/HTTPS Support

All authentication technologies supported by Trino require configuring TLS as the foundational layer.
Learn more: https://trino.io/docs/current/security/overview.html

Setting up this layer is likely a mandatory step before any further security-related feature can proceed.

Urgent Level: Optional + Not in 2023Q3 MVP
When to use: When exposing the service to external networks (for example, running as a cloud server)

Pre-Requisite: An exposed IP address or domain for obtaining a TLS certificate.
Possible Obsolete: Self-signed certificate method.

OMOP DB volume is not mounted

The current build of OMOP Postgres does not mount a volume yet.

The impact is possible data loss when the VM is turned off or reset.

The fix is to mount a volume at a specific path, or a named volume, in the compose file.

volumes:
# copy the sql script to create tables
# - ./sql/create_tables.sql:/docker-entrypoint-initdb.d/create_tables.sql
- ../dwh/omop-ddl/:/docker-entrypoint-initdb.d/
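
A minimal sketch of the named-volume fix described above. The service name `omop-pg`, the volume name `omop-pg-data`, and the `postgres:15` image tag are assumptions for illustration; `/var/lib/postgresql/data` is the default data directory of the official Postgres image for that tag.

```yaml
services:
  omop-pg:                     # assumed service name
    image: postgres:15         # assumed image tag
    volumes:
      # existing mount: DDL scripts run on first initialization
      - ../dwh/omop-ddl/:/docker-entrypoint-initdb.d/
      # proposed mount: keep database files across container restarts
      - omop-pg-data:/var/lib/postgresql/data

volumes:
  omop-pg-data:                # named volume managed by the container engine
```

A bind mount to a specific host path would work as well; a named volume simply leaves the storage location to the container engine.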

Apache Spark Containerization

Used as the E-L backbone in CDM linkage and to push data from the CDM instance to the analytics instance.

The Spark container will be customized with additional requirements, including Python with its dependencies and a JDBC driver.

Most of the container work will be foundational support for the "crosspipe" package, which will be described in the next issue. Its early proposal is available at:
https://github.com/sidataplus/omop-core/blob/d8d35349e2d44502689eb8a1aa6f136de398119c/container/module/TCEL/Spark/README

Remodel OHDSI Tools

Rework the directory layout and workflow of OHDSI Tools before release (merge to main).

[Agile] α: Warehouse Deployment

Epic

This development milestone focuses on the deployment method for ETL developers, who simply clone this repo and start modeling their EMR/schemas into OMOP.

Story

  • Add source DB connection support, pilot with MSSQL.
    • Generic DB engine (MSSQL) to Trino.
    • Trino connection of generic DB (MSSQL) to dbt.
  • Restructure the container compose files to match each use case.
    • Wrap up into a single compose file that references each isolated compose file.
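
One way the single wrapping compose file could look, as a sketch only. It assumes a Compose implementation that supports the top-level `include` element (Docker Compose 2.20 or later; support in podman-compose is not verified here). The WebAPI path follows the one referenced elsewhere in these issues; the Trino path is hypothetical.

```yaml
# container/compose.yaml (hypothetical location of the wrapping file)
include:
  # each entry is an isolated compose file, resolved relative to this file
  - ./module/OHDSI/WebAPI/WebAPI-compose.yaml
  - ./module/Trino/Trino-compose.yaml      # hypothetical path
```

Each included file stays usable on its own, so a use case that needs only one module can still start that module's compose file directly.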

Review

This epic is on the way.

Feature

  • x (#x)

Fix

  • x (#x)

Built-in `vocab` schema

This adds a database schema named vocab, containing the OMOP vocabulary, to the omop-pg database.

It is called "built-in" because this part will later be spun off into an individual repo instead of remaining an omop-core feature.

Dockerfile using secrets in Compose for credential management

A secret is any piece of data, such as a password, certificate, or API key, that shouldn’t be transmitted over a network or stored unencrypted in a Dockerfile or in your application’s source code.
Learn more: https://docs.docker.com/compose/use-secrets/

Setting up this layer can help prevent breaches caused by credential leaks.

Urgent Level: Optional + Not in 2023Q3 MVP
When to use: Enhancing/Hardening action required

Pre-Requisite: Secret file patterning.
Possible Obsolete: Secret embedding, or passing in environment.
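
A minimal sketch of Compose secrets applied to a Postgres service. The service name, secret name, and file path are assumptions; `POSTGRES_PASSWORD_FILE` is the variable the official Postgres image reads a password file from.

```yaml
services:
  omop-pg:                                   # assumed service name
    image: postgres:15                       # assumed image tag
    environment:
      # the image reads the password from this file instead of a plain variable
      POSTGRES_PASSWORD_FILE: /run/secrets/omop_pg_password
    secrets:
      - omop_pg_password                     # mounted at /run/secrets/omop_pg_password

secrets:
  omop_pg_password:
    file: ./secrets/omop_pg_password.txt     # hypothetical path, kept out of version control
```

With this layout the password never appears in the compose file or in the container's environment listing, which is the point of the "Possible Obsolete" line above.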

Limitation: DHCP-Managed Hosts and `SECURITY_ORIGIN` Configuration in WebAPI

The SECURITY_ORIGIN environment variable in OHDSI/WebAPI, which relates to CORS, was designed to hold a static value for the address at which ATLAS reaches WebAPI (see the code linked below).

https://github.com/sidataplus/omop-core/blob/a39556cd3f0e8f00c424050303acfb124dbf19ca/container/module/OHDSI/WebAPI/WebAPI-compose.yaml#L56C7-L58C47

The default value points directly to localhost or 127.0.0.1, which suits working on a single machine but causes an error when WebAPI is accessed from another machine on the network. To fix this, SECURITY_ORIGIN should be set to the actual IP address of the WebAPI host machine. This is not possible when the host machine cannot be assigned a dedicated IP address, or has to renew its DHCP lease at fixed intervals.

[Screenshot taken from a non-host machine while testing on the same LAN as the host machine.]

The current workaround is to disable SECURITY_CORS_ENABLED, although this may not be recommended.
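
As a partial mitigation, the value could be taken from Compose variable interpolation instead of being hard-coded. This is a sketch only: the service name and the `WEBAPI_SECURITY_ORIGIN` variable are hypothetical, and the exact value format WebAPI expects is not confirmed here. It does not remove the DHCP limitation; it only moves the address out of the compose file so it can be updated without editing it.

```yaml
services:
  ohdsi-webapi:                              # hypothetical service name
    environment:
      # WEBAPI_SECURITY_ORIGIN is a hypothetical variable set in the shell
      # or in an .env file; falls back to localhost when unset
      SECURITY_ORIGIN: ${WEBAPI_SECURITY_ORIGIN:-localhost}
      SECURITY_CORS_ENABLED: "true"
```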

[Agile] β: Cross Pipeline

Epic

This development milestone focuses on the data pipeline module for crossing platforms between the CDM and analytics instances. It includes developing ETL tools for transferring data between databases, performing basic data validation prior to the transfer, implementing ATLAS DB (WebAPI) backup and data reload processes, and ensuring cross-platform compatibility.

Story

  • Develop data integration tools (ETL) to seamlessly transfer data between databases (CDM to ATLAS).
    • Set up containerized Apache Spark as the bulk ETL backbone.
    • Ensure efficient extraction, transformation, and loading of data.
    • Implement data mapping and transformation rules for accurate integration.
  • Incorporate robust data validation procedures prior to data transfer.
    • Apply CDM transformation correctness checks to identify and rectify errors.
    • Enhance data integrity by enforcing validation rules on pseudonymization (OHDSI CureID).
  • Implement a comprehensive data backup and restoration system.
    • Create a temporary DB instance for the backup feature of the ATLAS database.
    • Enable seamless data recovery and reloading to prevent data loss when working in the Prod and Dev environments.
  • Work on container compatibility.
    • Ensure seamless operation across the CDM, ATLAS, and temporary backup instances.
    • Optimize usability and performance, including environment variable management.

Review

This epic is on the way.

Feature

  • x (#x)

Fix

  • x (#x)
  • N/A - No hot fix task this sprint

Postponed / Changed

  • x (#x)
  • N/A - No task postponed or changed this sprint

Issue Raised (Won't Fix)

Critical

  • x (#x)
  • N/A - No critical issue raised this sprint

Optional

  • x (#x)
  • N/A - No optional issue raised this sprint

[Agile] β: OHDSI Tools

Epic

This development milestone works on early support for OHDSI Tools, primarily focusing on ATLAS.

Later in development, this feature should be spun off into an individual repo.

Story

  • ATLAS walkthrough
    • ACHILLES, as required by ATLAS (used at the application level)
  • Containerized R Server
  • More Infra Containerization
    • WebAPI
    • Atlas
    • ACHILLES
    • DQD (Data Quality Dashboard)
  • Workflow Optimization

Review

OHDSI tools were added as container compose files: a pre-installed R server and a combined pack of WebAPI with a DB instance for Atlas. This sprint also completed workflow optimization by re-modularizing the container scripts, tested in Podman.

There was also a hot fix for the OMOP CDM Postgres covering configuration file support and shared-memory-related configuration.

Feature

Fix

Postponed / Changed

  • N/A - No task postponed or changed this sprint

Issue Raised (Won't Fix)

Critical

  • N/A - No critical issue raised this sprint

Optional

Create transferable `concept_hierarchy` from Vocab DB to ATLAS DB

There is a vocab population process that must be completed on the vocab DB side, as shown below:

https://github.com/OHDSI/WebAPI/blob/fe070e527abe61a59c42c23e69121f82dff5b4f1/src/main/resources/ddl/results/init_concept_hierarchy.sql

However, as proposed, the ATLAS backend DB should operate separately from the vocab (and CDM) side, so inserting data across databases is not possible. According to the official instructions, this process requires human intervention, so we may need to create a loader tool for it when the concept_hierarchy table is required.

Related discussion: https://forums.ohdsi.org/t/how-to-create-concept-hierarchy/15090

Revise Synthea: Mandatories

Work on the pre-made Synthea-to-OMOP code and use it to create dbt tests.

The mandatory tables include PERSON and OBSERVATION_PERIOD.
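
A sketch of what dbt schema tests for the two mandatory tables might look like, using only dbt's built-in generic tests. The model names `person` and `observation_period` and the file name are assumptions about how the dbt project is laid out; the column names follow the OMOP CDM.

```yaml
# models/omop/schema.yml (hypothetical file name)
version: 2

models:
  - name: person                      # assumed model name
    columns:
      - name: person_id
        tests:
          - not_null
          - unique

  - name: observation_period          # assumed model name
    columns:
      - name: observation_period_id
        tests:
          - not_null
          - unique
      - name: person_id
        tests:
          - not_null
          # every observation period must point at an existing person
          - relationships:
              to: ref('person')
              field: person_id
```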

Enhancement: Secure and Dynamic `source_daimon` Connection Strings in WebAPI Setup for ATLAS

In the WebAPI setup for ATLAS, the source_daimon SQL file is designed to hold connection information, including the username and password, in the form of a database connection string (see the file below).

populate_source_source_daimon.sql

The connection string currently contains sensitive information, and the host name should dynamically match the container name in this case but is currently a fixed string.

The currently recommended approach is to keep a default or simple username/password like this, and do additional work on access control and network segmentation.

Example:

  • Do not expose the ohdsi-webapi-pg database to inbound connections from outside the container environment, while keeping it reachable by other containers. This includes a DB management container such as pg-admin-web, which may be reached externally; tighten credentials at that level instead.
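
The segmentation idea in the example above could be sketched as follows. The image tags, the network name `backend`, and the host port `5050` are assumptions; other settings each image needs in order to start (such as credentials) are omitted for brevity.

```yaml
services:
  ohdsi-webapi-pg:
    image: postgres:15               # assumed image tag
    # no `ports:` entry, so the database is not published on the host;
    # other containers on `backend` can still reach it by service name
    networks:
      - backend

  pg-admin-web:
    image: dpage/pgadmin4            # assumed image
    ports:
      - "5050:80"                    # the only externally reachable entry point
    networks:
      - backend

networks:
  backend:
```

Because containers on the same network resolve each other by service name, the connection string can use `ohdsi-webapi-pg` as the host name rather than a fixed address.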

Seamless data transfer between CDM Instance

Use the Apache Spark setup from the previous issue to connect both CDM instances and perform E-L only, from source to target (cdm_a, cdm_b). Predefine the work boundary between CDM tables and vocab tables, and add early support for calling it from the CLI.

Trino Containerization

This task expects Trino (query engine) to be containerized and callable on the container platform.

In this phase, Trino should be able to interact directly with the PostgreSQL OMOP CDM database from #2 by sharing configuration resources such as the username and password.

Add MSSQL connector to Trino

This task expects Trino to have an additional connection to a source database (while still connecting to the PostgreSQL OMOP database).
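
A sketch of how an extra catalog could be added to a containerized Trino. It assumes the official `trinodb/trino` image, which reads catalog files from `/etc/trino/catalog`; the service name, the catalog file name `mssql.properties`, and all placeholder values are hypothetical. The property names shown in the comments are those of Trino's SQL Server connector.

```yaml
services:
  trino:                             # assumed service name
    image: trinodb/trino             # assumed image
    volumes:
      # catalog/mssql.properties (hypothetical file) would contain:
      #   connector.name=sqlserver
      #   connection-url=jdbc:sqlserver://<host>:1433;databaseName=<database>
      #   connection-user=<user>
      #   connection-password=<password>
      - ./catalog/mssql.properties:/etc/trino/catalog/mssql.properties:ro
```

The existing PostgreSQL catalog file stays in the same directory, so both sources are visible to Trino side by side under their own catalog names.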

[Agile] β: Mandatory CDM

Epic

This development milestone focuses on conducting unit test cases for the CDM mandatory tables, which are required for further table modeling.

It also aims to find out whether the current directory pattern effectively separates version control between the data modeling pipeline and the infrastructure.

Story

  • Quickly clone the pre-made PostgreSQL 'Synthea to OMOP' code and port it to Trino format for validation.
  • Create schema tests for the mandatory tables:
    • Person
    • Observation_period
  • Add early vocabulary support.

Review

The vocab update was added a few weeks after planning. Schema tests for both person and observation_period were conducted; however, the tasks were changed to generic tests.

A critical bug in the container volume was solved in this PR.

Feature

Fix

Postponed / Changed

Fix rootless Podman cannot reach external host in host's network

I am setting up a development scenario that uses this repo to connect to a production EMR as the OMOP source DB, and I am stuck because Trino cannot reach the source DB, which runs outside the Podman environment.

After a deeper investigation, I noticed that Podman runs rootless by default. That is fine in itself, but because of how rootless networking works, letting machines talk to each other freely may require privileges.

Partially mapping an external IP or URL to a Podman internal IP should be one possible method.

Currently looking for a solution.
https://github.com/containers/podman/blob/main/docs/tutorials/basic_networking.md#slirp4netns
https://docs.podman.io/en/latest/markdown/podman-network-create.1.html
containers/podman#13966
