Part of OMOP NHSO TCELS project
This task expects PostgreSQL (the database) to be containerized and run on the container platform.
In this phase, PostgreSQL should be able to spin up as the OMOP CDM foundation; the container has to include a configuration area for the desired default username, password, and database name.
This task follows Docker's official existing best-practice example:
https://github.com/docker/awesome-compose/tree/master/postgresql-pgadmin
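Following that example, the configurable defaults could look like the compose fragment below. This is a sketch only: the service name and image tag are assumptions, but POSTGRES_USER, POSTGRES_PASSWORD, and POSTGRES_DB are the official postgres image's documented configuration variables.

```yaml
# Sketch of the configuration area; values are placeholders.
services:
  omop-pg:
    image: postgres:15
    environment:
      # The official postgres image reads its default credentials
      # and database name from these variables at first start.
      POSTGRES_USER: ${POSTGRES_USER:-postgres}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:?set in .env}
      POSTGRES_DB: ${POSTGRES_DB:-omop}
```

With this shape, deployers only edit a local .env file instead of the compose file itself.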
This development milestone focuses on writing unit test cases for the mandatory CDM tables, which are required for further table modeling.
In addition, it aims to find out whether the current directory pattern effectively separates version control between the data modeling pipeline and the infrastructure.
The vocab update was added a few weeks after planning. Schema tests for both person and observation_period were conducted; however, the tasks were later changed to generic tests.
A critical bug in the container volume was fixed in this PR.
supervisord is running as root for an unknown reason. This causes the rstudio container to ignore some user-related settings and requires root as the username to log in.
Urgent Level: Optional + Not in 2023Q3 MVP
When to use: Enhancing/Hardening action required
Pre-Requisite: Secret file patterning.
Possible Obsolete: Secret embedding, or passing in environment.
Establish a pattern for developing the container architecture
Used as the E-L backbone in CDM linkage, and to push data from the CDM instance to the analytics instance.
The containerization of this Spark will be customized with additional requirements, including Python with its dependencies and a JDBC driver.
Most of the container work will be foundational support for the "crosspipe" package, which will be described in the next issue; its early proposal is available at:
https://github.com/sidataplus/omop-core/blob/d8d35349e2d44502689eb8a1aa6f136de398119c/container/module/TCEL/Spark/README
To set up WebAPI for ATLAS, source_daimon is populated by a SQL file designed to collect connection info, including the username and password, in the form of a database connection string. (as in the code below)
populate_source_source_daimon.sql
The connection string currently contains sensitive information, and the host name should be dynamically matched to the container name in this case, but it is currently a fixed string.
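One way to remove the fixed string would be to template the connection string from environment variables, so the host name follows the container name set in the compose file. A minimal sketch (all variable names and defaults here are hypothetical, not the project's actual settings):

```python
import os

def build_jdbc_url():
    """Build the source_daimon connection string from environment
    variables so the host follows the container name instead of being
    a fixed string. Variable names and defaults are illustrative."""
    host = os.environ.get("CDM_DB_HOST", "omop-pg")   # container name
    port = os.environ.get("CDM_DB_PORT", "5432")
    db = os.environ.get("CDM_DB_NAME", "omop")
    user = os.environ.get("CDM_DB_USER", "postgres")
    password = os.environ.get("CDM_DB_PASSWORD", "postgres")
    return (f"jdbc:postgresql://{host}:{port}/{db}"
            f"?user={user}&password={password}")
```

The rendered URL could then be substituted into the SQL file before it is applied, keeping the secret itself out of version control.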
The currently recommended approach is to simply expose a default or simple username/password like this, but with additional work on access control and network segmentation.
Example: the ohdsi-webapi-pg database does not accept inbound connections from outside the container network, but it can still be reached by other containers, including DB-management containers such as pg-admin-web, which is allowed to be reached externally and where credentials are tightened instead.
This development milestone focuses on a method of deployment for ETL developers, who can simply clone this repo and start modeling their EMR schemas into OMOP.
This epic is on the way.
The environment variable SECURITY_ORIGIN in OHDSI/WebAPI, which is related to CORS, was designed as a static value holding the address at which ATLAS reaches its WebAPI. (as in the code below)
The default value points directly to localhost or 127.0.0.1 for single-machine use, and an error occurs when another machine on the network is used. To fix this problem, the value of SECURITY_ORIGIN should be the actual IP address of the WebAPI host machine, but this is not possible when the host machine cannot be assigned a dedicated IP address, or has to re-obtain a DHCP lease at fixed intervals.
Note: Screenshot taken from a non-host machine during testing on the same LAN as the host machine.
The current approach would be to disable SECURITY_CORS_ENABLED, but this may not be recommended.
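An alternative to disabling CORS would be to re-resolve the host's current LAN address at startup and inject it into SECURITY_ORIGIN. A sketch of the address-resolution part (a hypothetical helper, not part of WebAPI):

```python
import socket

def lan_ip():
    """Return this host's current LAN IP address.

    A UDP "connect" sends no packets; it only selects the outbound
    interface, so this works even when the target is unreachable and
    survives DHCP re-leases because it is evaluated at call time.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.connect(("192.0.2.1", 80))  # TEST-NET-1 address, never contacted
        return s.getsockname()[0]
    except OSError:
        return "127.0.0.1"  # no route available: fall back to loopback
    finally:
        s.close()
```

A container entrypoint could export, for example, SECURITY_ORIGIN=http://$(lan_ip):8080 before launching WebAPI, so the value tracks the current DHCP lease.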
There is a vocab population process that needs to be completed on the vocab DB side, as shown below:
However, since ATLAS's backend DB is supposed to operate separately from the vocab (and CDM) side, it is not possible to insert data across databases. According to the official instructions, this process requires human intervention, so we may need to create a loader tool for it once the concept_hierarchy table is required.
Related discussion: https://forums.ohdsi.org/t/how-to-create-concept-hierarchy/15090
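Such a loader has to move rows through the client, since a single INSERT ... SELECT cannot span the two databases. A minimal DB-API sketch (demonstrated with sqlite3 so it is self-contained; with Postgres the same pattern works over two psycopg2 connections, using %s placeholders instead of ?):

```python
import sqlite3

def copy_table(src, dst, table, columns, batch=1000):
    """Stream rows from one database connection into another in
    batches, as a cross-database loader must. `src` and `dst` are
    independent DB-API connections (e.g. vocab DB -> WebAPI DB)."""
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)  # psycopg2 uses %s
    cur = src.execute(f"SELECT {col_list} FROM {table}")
    while True:
        rows = cur.fetchmany(batch)
        if not rows:
            break
        dst.executemany(
            f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})",
            rows)
    dst.commit()
```

Batching keeps memory bounded for large vocabulary tables; the concept_hierarchy case would simply call this with that table's column list.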
Top layer of the webtools stack, above the current WebAPI container.
Working on pre-made Synthea-to-OMOP code, using it to create dbt tests.
The mandatory tables include PERSON and OBSERVATION_PERIOD.
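A generic test over those mandatory tables can be as simple as comparing the live schema against required column sets. A sketch (column lists abbreviated; the names follow CDM v5.4, but treat the exact sets as an assumption):

```python
# Required columns per mandatory CDM table (abbreviated for illustration).
MANDATORY_TABLES = {
    "person": {"person_id", "gender_concept_id",
               "year_of_birth", "race_concept_id"},
    "observation_period": {"observation_period_id", "person_id",
                           "observation_period_start_date",
                           "observation_period_end_date"},
}

def missing_columns(schema):
    """Generic schema test helper.

    `schema` maps table name -> set of column names actually present
    (e.g. read from information_schema.columns). Returns a mapping of
    table -> missing columns; an empty dict means the schema passes.
    """
    problems = {}
    for table, required in MANDATORY_TABLES.items():
        missing = required - schema.get(table, set())
        if missing:
            problems[table] = missing
    return problems
```

The same check generalizes to any table simply by extending the MANDATORY_TABLES mapping, which is what makes a "generic test" cheaper to maintain than one hand-written test per table.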
All authentication technologies supported by Trino require configuring TLS as the foundational layer.
Learn more: https://trino.io/docs/current/security/overview.html
Setting up this layer could be a mandatory step before proceeding with any further security-related features.
Urgent Level: Optional + Not in 2023Q3 MVP
When to use: Establishing an internet-facing connection at the external level (running as a cloud server)
Pre-Requisite: An exposed IP address or domain for obtaining a TLS certificate
Possible Obsolete: The self-signed certificate method
A secret is any piece of data, such as a password, certificate, or API key, that shouldn’t be transmitted over a network or stored unencrypted in a Dockerfile or in your application’s source code.
Learn more: https://docs.docker.com/compose/use-secrets/
Setting up this layer can help prevent breaches caused by credential leaks.
Urgent Level: Optional + Not in 2023Q3 MVP
When to use: Enhancing/Hardening action required
Pre-Requisite: Secret file patterning.
Possible Obsolete: Secret embedding, or passing in environment.
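The usual application-side pattern for this is the *_FILE convention: prefer a secret mounted as a file (e.g. under /run/secrets/, where Docker and Podman place compose secrets) and fall back to a plain environment variable only when no file is given. A sketch (the helper and variable names are illustrative):

```python
import os

def read_secret(name, default=None):
    """Resolve a credential via the <NAME>_FILE convention.

    If <NAME>_FILE points at a mounted secret file, read the value from
    it, so the secret is never embedded in the image or passed directly
    in the environment; otherwise fall back to the <NAME> environment
    variable, then to `default`.
    """
    path = os.environ.get(f"{name}_FILE")
    if path and os.path.exists(path):
        with open(path) as fh:
            return fh.read().strip()
    return os.environ.get(name, default)
```

This keeps the same code working in both hardened deployments (secret files) and quick local runs (plain environment variables), which is why "secret embedding, or passing in environment" becomes obsolete rather than broken.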
Use the previously described Apache Spark setup to connect the two CDMs and perform plain E-L from source to target (cdm_a, cdm_b), pre-define the work boundary between CDM tables and vocab tables, and also add early support for invocation via the CLI.
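The CLI surface for such a job might separate the two pre-defined table boundaries with a single flag. A sketch only (the option names, table lists, and the use of "crosspipe" as the program name are assumptions, not the package's actual interface):

```python
import argparse

# Hypothetical work boundary: the E-L job only touches the group
# it is asked for, so CDM and vocab loads can run independently.
CDM_TABLES = ["person", "observation_period", "visit_occurrence"]
VOCAB_TABLES = ["concept", "vocabulary", "concept_relationship"]

def build_parser():
    """Early CLI support sketch for the Spark E-L job."""
    p = argparse.ArgumentParser(prog="crosspipe")
    p.add_argument("--source", required=True,
                   help="source JDBC URL (e.g. the cdm_a instance)")
    p.add_argument("--target", required=True,
                   help="target JDBC URL (e.g. the cdm_b instance)")
    p.add_argument("--scope", choices=["cdm", "vocab"], default="cdm",
                   help="which pre-defined table boundary to move")
    return p

def tables_for(scope):
    """Map the chosen scope to its table list."""
    return CDM_TABLES if scope == "cdm" else VOCAB_TABLES
```

The Spark driver would then loop over tables_for(args.scope), reading each table from --source and writing it to --target through the bundled JDBC driver.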
This task expects Trino (the query engine) to be containerized and run on the container platform.
In this phase, Trino should be able to interact directly with the OMOP CDM PostgreSQL database from #2 by sharing its config resources, such as the username and password.
Found that the current build of OMOP Postgres does not mount a volume yet.
The impact is possible data loss when the VM is turned off or reset.
The fix is to mount a volume, either at a specific path or as a named volume, in the compose file.
omop-core/podman/compose/omop-compose.yaml
Lines 15 to 18 in 0465a93
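A minimal sketch of the named-volume fix (the service and volume names here are illustrative, not the repo's actual ones; the data-directory path is the postgres image's default):

```yaml
volumes:
  omop-pg-data:            # named volume survives container recreation

services:
  omop-pg:
    image: postgres:15
    volumes:
      # Persist the data directory so a VM shutdown or reset
      # does not lose the CDM data.
      - omop-pg-data:/var/lib/postgresql/data
```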
I'm setting up a development scenario that uses this repo to connect to a production EMR as the OMOP source DB, and I'm stuck: Trino cannot reach the source DB, which runs outside the Podman environment.
After investigating deeply, I noticed that Podman runs rootless by default, which is fine, but because of networking technicalities, letting the machines talk to each other freely may require privileges.
Partially mapping the external IP or URL to a Podman-internal IP should be a possible method.
Currently looking for a solution.
https://github.com/containers/podman/blob/main/docs/tutorials/basic_networking.md#slirp4netns
https://docs.podman.io/en/latest/markdown/podman-network-create.1.html
containers/podman#13966
This task expects Trino to have an additional connection to a source database (while still connecting to the OMOP PostgreSQL database).
This development milestone works on early support for the OHDSI tools, primarily focusing on ATLAS.
Later in development, this feature should be spun off into an individual repo.
The OHDSI tools were added as a container compose of a pre-installed R server and a combo pack of WebAPI with a DB instance for ATLAS; this sprint also completed workflow optimization by re-modularizing the container scripts, tested on Podman.
Also includes a hotfix for the OMOP CDM Postgres container regarding configuration file support and shared-memory-related config.
This development milestone focuses on the data pipeline module for crossing platforms between the CDM and the analytic instance, including developing ETL tools for transferring data between databases, performing basic data validation prior to the transfer, implementing ATLAS DB (WebAPI) backup and data reload processes, and ensuring cross-platform compatibility.
This epic is on the way.
This will add a database schema named vocab, holding the OMOP vocabulary, to the omop-pg database.
It is called "built-in" because this section will later be spun off into an individual repo instead of remaining an omop-core feature.
Work on the directory layout and workflow of the OHDSI tools before release (merge to main).