The data-models from chop-dbhi

Unnamed foreign key constraints in OMOP v5

Standardize field names in data model files

The purpose of this file is to make it possible to translate the files into a common format for downstream consumption.

Table definition file fields: tables.csv

model (required) - Name of the model
version (required) - Version of the model
table (required) - Name of the table
label - Label for the table.
description - Description of the table.

Field definition file fields: definitions/<table>.csv

model (required) - Name of the model.
version (required) - Version of the model.
table (required) - Name of the table.
field (required) - Name of the field.
label - Corresponds to the field label. This defaults to table + field.
description - Corresponds to the description or documentation of the field.
type - Describes the data type of the value the field holds.
ref_table - The table this field references.
ref_field - The field this field references.

Field schema file fields: schema/<table>.csv

model (required) - Name of the model.
version (required) - Version of the model.
table (required) - Name of the table.
field (required) - Name of the field.
type (required) - Data type of the field.
length - Describes the maximum length of the value.
precision - Describes the precision of the value specified in type, typically for a number.
scale - Describes the scale of the value specified in type, typically for a number.
default - Defines the default value for the field.

Constraints file fields: constraints/<table>.csv

model (required) - Name of the model.
version (required) - Version of the model.
table (required) - Table the constraint applies to.
fields - One or more fields the constraint applies to.
type (required) - Type of constraint.
name - Suggested name of the constraint.

Indexes file fields: indexes/<table>.csv

model (required) - Name of the model.
version (required) - Version of the model.
table (required) - Name of the table.
fields (required) - One or more fields the index applies to.
name - Suggested name of the index.
type - Suggested type of the index.
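To make the required columns concrete, here is a minimal Python sketch that checks a CSV header against the required fields listed above. `REQUIRED_COLUMNS` and `missing_columns` are hypothetical names used only for illustration; they are not part of the service.

```python
import csv
import io

# Required columns per file type, transcribed from the field lists above.
REQUIRED_COLUMNS = {
    "tables": {"model", "version", "table"},
    "definitions": {"model", "version", "table", "field"},
    "schema": {"model", "version", "table", "field", "type"},
    "constraints": {"model", "version", "table", "type"},
    "indexes": {"model", "version", "table", "fields"},
}


def missing_columns(file_type: str, csv_text: str) -> set:
    """Return the required columns that are absent from the CSV header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = {name.strip() for name in next(reader, [])}
    return REQUIRED_COLUMNS[file_type] - header


if __name__ == "__main__":
    sample = "model,version,table,field\npedsnet,v2,person,person_id\n"
    print(missing_columns("schema", sample))  # {'type'}
```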

Add PCORnet v3

I don't believe this is published yet.

Consider adding datapackage.json files

This would enable integration with the Open Knowledge Foundation: http://data.okfn.org/doc/data-package

Add links to compare model against previous versions

Currently the /compare endpoint needs to be accessed directly; instead, links should be generated for each pair of versions of a model, e.g. v1 → v2 and v2 → v3.
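The links could be derived by pairing each version with its successor. This Python sketch assumes the version list is already sorted and uses a hypothetical `/compare?model=<model>&from=<older>&to=<newer>` URL layout, since the issue does not say how the endpoint takes its arguments.

```python
from typing import List


def compare_links(model: str, versions: List[str]) -> List[str]:
    """Build one compare link per consecutive pair of (already sorted) versions."""
    links = []
    for older, newer in zip(versions, versions[1:]):
        # Hypothetical URL layout; the real endpoint's parameters may differ.
        links.append(f"/compare?model={model}&from={older}&to={newer}")
    return links


if __name__ == "__main__":
    for link in compare_links("pedsnet", ["v1", "v2", "v3"]):
        print(link)
```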

Proposal for supporting non-table based data models

The motivation is to support other structures such as REDCap data dictionaries (form, section, field), Harvest metadata (category, concept, field), etc. The benefit is that even data models with different structures can be maintained and represented in a similar way. The interesting bit is what will come out of defining the mappings between disparate models.

Support for this requires generalizing the format to support any DAG (directed acyclic graph, e.g. a hierarchy) using a variable-length path rather than a static table. For example, a REDCap form → section → field could be represented this way:

model	version	path	name
redcap_project	v1		demographics
redcap_project	v1	demographics	location
redcap_project	v1	demographics/location	city

An empty path denotes that the name is a root element in the model (a table in the case of a relational model). The full identifier within a model would be path + name.
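A small Python sketch of how path + name could be resolved into full identifiers and parent links, using the REDCap rows from the example. It assumes `/` is the separator between path segments and the name, which the example paths suggest but the proposal does not state; `identifier` and `parent_map` are hypothetical helpers.

```python
from typing import Dict, Optional

# Rows from the example above: (model, version, path, name).
ROWS = [
    ("redcap_project", "v1", "", "demographics"),
    ("redcap_project", "v1", "demographics", "location"),
    ("redcap_project", "v1", "demographics/location", "city"),
]


def identifier(path: str, name: str) -> str:
    """Full identifier = path + name; an empty path marks a root element."""
    return f"{path}/{name}" if path else name


def parent_map(rows) -> Dict[str, Optional[str]]:
    """Map each element's identifier to its parent's identifier (None for roots)."""
    return {identifier(path, name): (path or None) for _, _, path, name in rows}


if __name__ == "__main__":
    for child, parent in parent_map(ROWS).items():
        print(child, "<-", parent)
```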

Here are a few survey questions I would appreciate interested members answering:

Do you have non-relational models that you want represented in this format?
What is the advantage for you to represent all of your data models in one format?
For the data models that are related, do you intend to define mappings between them?
What changes to the current views (HTML/Markdown) would you want or expect to see?
Does this alternate naming feel more cumbersome or confusing to work with?

I will note that this format could live alongside the current one.

/cc @aaron0browne @murphyke @tjrivera

Add landing page for service

The content will vary based on the requested output format. See #25.

Add PCORnet v3 mappings

Integrate renaming file

A renaming file declares name changes across two versions in a model. This can be used in two areas:

Link on target version back to source version, e.g. Previously named...
Comparison output would show rename rather than add + remove

Add indexes for i2b2_pedsnet

Currently there is just one, committed by accident (it will be removed by a pending PR). The original i2b2 indexes are very heavyweight and, as far as PEDSnet is concerned, largely a waste of disk space and load time.

i2b2_pedsnet v2 `pk_i2b2` constraint refers to non-existent `i2b2.id` column

This probably came from a Django implementation? In any case, some implementations obviously require a PK, but we shouldn't enforce that at the data model level. The Django model generator will create a surrogate PK if required. So if there's no PK on this table in the original DDL, let's remove the constraint.

Break up field-based CSVs into smaller files

For example, one CSV file per table.

Add concept table to PEDSnet models

Add basic top-bar navigation to HTML view

This is primarily to jump between the major endpoints.

Needs change in table definition

In the Oracle section under FULL DDL, the first table has the following:
conditionid VARCHAR2 NOT NULL,
encounterid VARCHAR2,
patid VARCHAR2 NOT NULL,
raw_condition VARCHAR2,
raw_condition_source VARCHAR2,
raw_condition_status VARCHAR2,
raw_condition_type VARCHAR2,

VARCHAR2 columns without a size specification will result in errors.

Parth
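A quick way to catch this class of error before running the DDL is to scan for VARCHAR2 columns that lack a `(n)` size. This regex-based Python sketch is a hypothetical lint step, not part of the project; it only handles one column definition per line, as in the excerpt above.

```python
import re

# A column name followed by VARCHAR2 that is NOT immediately followed by "(".
UNSIZED_VARCHAR2 = re.compile(
    r"^\s*(\w+)\s+VARCHAR2\b(?!\s*\()", re.IGNORECASE | re.MULTILINE
)


def unsized_varchar2_columns(ddl: str) -> list:
    """Return the names of columns declared as VARCHAR2 without a size."""
    return UNSIZED_VARCHAR2.findall(ddl)


if __name__ == "__main__":
    ddl = "conditionid VARCHAR2 NOT NULL,\nencounterid VARCHAR2(255),\n"
    print(unsized_varchar2_columns(ddl))  # ['conditionid']
```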

Add tool to generate a data model from an existing database

This will be implemented using SQLAlchemy since it has a robust and consistent way of extracting metadata from databases (as opposed to JDBC).
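To illustrate the shape of such a tool, this Python sketch introspects a database and emits rows in the `schema/<table>.csv` layout. It deliberately swaps in the standard-library `sqlite3` module and SQLite's `PRAGMA table_info` so it runs without dependencies; the planned tool would use SQLAlchemy's reflection instead. `schema_rows` is a hypothetical name.

```python
import sqlite3
from typing import Dict, List


def schema_rows(conn: sqlite3.Connection, model: str, version: str) -> List[Dict[str, object]]:
    """Emit schema/<table>.csv-style rows by introspecting a SQLite database."""
    rows = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
        for _cid, field, declared_type, _notnull, default, _pk in conn.execute(
            f'PRAGMA table_info("{table}")'
        ):
            rows.append({
                "model": model,
                "version": version,
                "table": table,
                "field": field,
                # Declared type kept verbatim, e.g. "varchar(10)"; a fuller tool
                # would split length/precision/scale into their own columns.
                "type": declared_type.lower(),
                "default": default,
            })
    return rows


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (person_id INTEGER NOT NULL, gender VARCHAR(10))")
    for row in schema_rows(conn, "demo", "v1"):
        print(row)
```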

Document file formats

See #5 as a starting point.

More missing schema files in pedsnet v2

domain
relationship
concept_relationship

Errors in csv data found through DDL work

Specific issues:

pedsnet.v1.person.day_of_birth.type from numder -> number
pedsnet.v2.vocabulary.vocabulary_concept_id from '' -> integer
pedsnet.v1.indexes.idx_visit_person_id and ...idx_visit_concept_id replaced by ...idx_visit_person_date
omop.v5.cohort_definition definitions field cohort_instantiation_date -> cohort_initiation_date
omop.v5.concept definitions remove field concept_level
omop.v5.fact_relationship schema field domain_concept_id split into ..._1 and ..._2
omop.v5.constraints.primary_keys.xpk_cohort_attribute remove in favor of xpk_cohort_definition

Larger issues:

Can the JSON object always adhere to the defined structure, even if the underlying lists are empty? (For example, can the pcornet.v3 json have schema.constraints.not_null and foreign_keys set to [] instead of null?)
~~The omop.v4 json endpoint is not returning anything.~~
Several pcornet.v1.schema.vital fields have integer types with a length attribute, which is choking the SQLAlchemy constructors... What is the intended meaning of this attribute? Should these be number types (with the length attribute moved to precision instead)?
Downcase all object names in the pcornet models.
Several i2b2.v1_7.schema fields share the integer with length problem described above.
Is timestamp as used in i2b2.v1_7 significantly different from datetime elsewhere?
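The first question (always emitting empty lists) could be handled by a normalization pass before serialization. This Python sketch assumes the nesting `schema.constraints.<key>` named in the question; the exact JSON layout of the endpoint and the `normalize_constraints` helper are assumptions.

```python
from typing import Any, Dict

# Keys taken from the pcornet.v3 example in the question; other constraint
# lists would be added here the same way.
CONSTRAINT_LISTS = ("not_null", "foreign_keys")


def normalize_constraints(model: Dict[str, Any]) -> Dict[str, Any]:
    """Ensure schema.constraints.<key> is always a list, replacing null/missing values."""
    schema = model.get("schema") or {}
    constraints = schema.get("constraints") or {}
    for key in CONSTRAINT_LISTS:
        if constraints.get(key) is None:
            constraints[key] = []
    schema["constraints"] = constraints
    model["schema"] = schema
    return model
```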

Add support for indexing multiple repos

The motivation is to maintain a separate repository of private or internal data models that can be merged in with public data models.

Remove ref_table and ref_field from field definitions

These fields were originally used to represent references, but have since been replaced by reference file types. They are also no longer used by the service.

PEDSnet CDM V2 notes

visit_occurrence table
1. visit_start_time is not required
2. ...date fields should be date type
3. visit_end_date is not required
4. place_of_service_source_value should be removed
5. visit_type_concept_id should be added
procedure_occurrence table
1. relevant_condition_concept_id should be removed
2. procedure_date should be date type
provider table
1. gender_source_concept_id should be included
care_site table
1. specialty_source_value should be added

Other:

relevant_condition_concept_id should not exist in OMOPV5.drug_exposure table

Replace datamodel.json with models.csv file

Remove dependence on model definition

Currently, the directory of the datamodel.json file is walked and bound to the declared model. This presumes a structure which is not necessary since each file type can stand alone (each one contains the model and version).

The first step is to replace datamodel.json with a CSV file (see #45), then update the walk algorithm to rely on the model specified in the file.
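The updated walk could key every row on the model and version it declares, rather than on the directory it was found in. This Python sketch is one possible approach under that assumption; `index_by_model` is hypothetical, and files without `model`/`version` columns are simply skipped.

```python
import csv
import os
import tempfile
from collections import defaultdict


def index_by_model(root: str) -> dict:
    """Walk root and group CSV rows by the (model, version) each row declares."""
    grouped = defaultdict(list)
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        for filename in sorted(filenames):
            if not filename.endswith(".csv"):
                continue
            path = os.path.join(dirpath, filename)
            with open(path, newline="", encoding="utf-8") as handle:
                reader = csv.DictReader(handle)
                # Skip files that do not carry model/version columns.
                if not {"model", "version"} <= set(reader.fieldnames or []):
                    continue
                for row in reader:
                    grouped[(row["model"], row["version"])].append((path, row))
    return dict(grouped)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "schema"))
        with open(os.path.join(root, "schema", "person.csv"), "w", newline="") as f:
            f.write("model,version,table,field,type\npedsnet,v2,person,person_id,integer\n")
        print(sorted(index_by_model(root)))  # [('pedsnet', 'v2')]
```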

Add directory containing template files

This is to enable creators of new data models to have a starting point.

Add "renaming" files

A renaming file specifies fields that have been renamed between versions which cannot be inferred from a simple diff mechanism.

Proposed fields:

model
source_version
source_table
source_field
target_version
target_table
target_field

/cc @aaron0browne @murphyke
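With the proposed fields, a comparison could report declared renames instead of an add + remove pair. This Python sketch treats each version as a set of `(table, field)` pairs and takes renaming rows as dicts keyed by the proposed column names; `diff_with_renames` is a hypothetical helper, not the service's comparison code.

```python
from typing import Dict, List, Set, Tuple

Field = Tuple[str, str]  # (table, field)


def diff_with_renames(
    source: Set[Field], target: Set[Field], renames: List[Dict[str, str]]
) -> Dict[str, list]:
    """Compare two versions' fields, reporting declared renames instead of add + remove."""
    renamed = []
    for r in renames:
        old = (r["source_table"], r["source_field"])
        new = (r["target_table"], r["target_field"])
        # Only honor a rename when both ends actually exist.
        if old in source and new in target:
            renamed.append((old, new))
    renamed_old = {old for old, _ in renamed}
    renamed_new = {new for _, new in renamed}
    return {
        "renamed": renamed,
        "removed": sorted(source - target - renamed_old),
        "added": sorted(target - source - renamed_new),
    }
```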

Add pedsnet v2 flavor of i2b2 data models

Will reorganize the i2b2 directory to contain V1 and V2; existing stuff will move into V1. The V2 directory will be populated with content that can be digested by https://github.com/chop-dbhi/data-models, i.e. a handful of CSV files describing the data model.

i2b2 pedsnet v2 index refers to nonexistent i2b2.id field

Remove *.numbers files

These were used in the early stages to make mass edits easier.

pedsnet v2 concept_class schema missing

Add field in models.csv to declare base model

This may be a bit too specific to our use case, but declaring the base model could help to infer some things. For example, the PEDSnet data model is based on OMOP so we could rely on that for implicit mappings and add a file type to declare the diff from the base model (rather than redefining all the fields).
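One way such a diff file might work is as an overlay on the base model's fields. This Python sketch assumes a hypothetical diff row with an `action` column (`add`, `change`, or `remove`) alongside `table`, `field`, and any attribute columns; neither the column nor `apply_base_diff` exists in the project.

```python
from typing import Dict, Iterable, Tuple

Key = Tuple[str, str]  # (table, field)


def apply_base_diff(base: Dict[Key, dict], diff_rows: Iterable[dict]) -> Dict[Key, dict]:
    """Derive a model's fields from its base model plus a (hypothetical) diff file."""
    derived = {key: dict(attrs) for key, attrs in base.items()}
    for row in diff_rows:
        key = (row["table"], row["field"])
        if row["action"] == "remove":
            derived.pop(key, None)
        else:  # "add" or "change": overlay the declared attributes
            attrs = {k: v for k, v in row.items() if k not in ("table", "field", "action")}
            derived.setdefault(key, {}).update(attrs)
    return derived
```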

Add constraints and indexes to the JSON service endpoint

Integers with length in i2b2_pedsnet v2 schemas

Sort CSV files by identifier columns

This simply makes the documents easier to read and navigate.
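A minimal Python sketch of the sort, assuming the identifier columns are `model`, `version`, `table`, and `field` in that order (whichever of them a given file has); `sort_csv` is a hypothetical helper.

```python
import csv
import io

# Assumed identifier columns, in sort priority order.
IDENTIFIER_COLUMNS = ("model", "version", "table", "field")


def sort_csv(csv_text: str) -> str:
    """Sort data rows by whichever identifier columns the header contains."""
    reader = csv.DictReader(io.StringIO(csv_text))
    keys = [c for c in IDENTIFIER_COLUMNS if c in (reader.fieldnames or [])]
    rows = sorted(reader, key=lambda row: tuple(row[c] for c in keys))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```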

Investigate JSON Table Schema

See http://dataprotocols.org/json-table-schema/

Use content negotiation to determine output format

The order of precedence is:

Accept header e.g. Accept: application/json
Extension on URL, e.g. /models/pcornet/v1.html
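A Python sketch of that precedence: a recognized media type in the Accept header wins, then a recognized URL extension, then a default. The set of formats, media types, and extensions here is an assumption, and q-values are ignored for brevity; `negotiate` is a hypothetical helper.

```python
import posixpath
from typing import Optional, Tuple

# Formats and their media types / extensions are assumptions for this sketch.
MEDIA_TYPES = {"application/json": "json", "text/html": "html", "text/markdown": "markdown"}
EXTENSIONS = {".json": "json", ".html": "html", ".md": "markdown"}


def negotiate(accept: Optional[str], path: str, default: str = "html") -> Tuple[str, str]:
    """Pick an output format: Accept header first, then URL extension, then default.

    Returns (format, path with any recognized extension stripped).
    """
    root, ext = posixpath.splitext(path)
    ext_format = EXTENSIONS.get(ext)
    if ext_format:
        path = root
    # First recognized media type wins; q-values are ignored for brevity.
    for part in (accept or "").split(","):
        media_type = part.split(";")[0].strip().lower()
        if media_type in MEDIA_TYPES:
            return MEDIA_TYPES[media_type], path
    return ext_format or default, path
```

Read literally, this order means a browser's default `Accept: text/html,...` would override a `.json` extension, so placing the extension check first is a plausible alternative if that behavior is unwanted.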

Add PCORnet v2 mappings

Implement service endpoint to generate DDL

This encapsulates a call to the service layer described here, passing the output described in #18 along with the target database engine.
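The core of such an endpoint is turning schema rows into a CREATE TABLE statement. This Python sketch assumes a hypothetical mapping from data-model type names to generic SQL types and applies `length` only to strings and `precision`/`scale` only to numbers (so an integer's length is dropped); the real service layer and its engine-specific types may differ.

```python
from typing import Dict, List

# Hypothetical mapping from data-model type names to generic SQL types.
SQL_TYPES = {
    "integer": "INTEGER",
    "number": "NUMERIC",
    "string": "VARCHAR",
    "date": "DATE",
    "datetime": "TIMESTAMP",
}


def column_type(row: Dict[str, str]) -> str:
    """Render one schema row's type, applying length/precision/scale where they fit."""
    sql_type = SQL_TYPES[row["type"]]
    if row["type"] == "string" and row.get("length"):
        return f"{sql_type}({row['length']})"
    if row["type"] == "number" and row.get("precision"):
        return f"{sql_type}({row['precision']},{row.get('scale') or 0})"
    # Length on an integer is ignored rather than passed to the engine.
    return sql_type


def create_table(table: str, rows: List[Dict[str, str]]) -> str:
    """Build a CREATE TABLE statement from schema/<table>.csv-style rows."""
    columns = ",\n".join(f"    {row['field']} {column_type(row)}" for row in rows)
    return f"CREATE TABLE {table} (\n{columns}\n);"
```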

Add support for label field in tables.csv

PEDSnet v2 idx_visit_payer_visit_occurrence_id exceeds Oracle identifier max length
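The name is 35 characters, while Oracle limits identifiers to 30 bytes before release 12.2 (128 bytes from 12.2 on). One common workaround is truncating and appending a short hash so shortened names stay distinct. This Python sketch shows that approach; `fit_identifier` is hypothetical and assumes ASCII names, where characters and bytes coincide.

```python
import hashlib

ORACLE_MAX_IDENTIFIER = 30  # bytes; Oracle 12.2+ raises the limit to 128


def fit_identifier(name: str, limit: int = ORACLE_MAX_IDENTIFIER) -> str:
    """Truncate an over-long (ASCII) identifier, appending a short hash for uniqueness."""
    if len(name.encode("utf-8")) <= limit:
        return name
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()[:6]
    return f"{name[: limit - len(digest) - 1]}_{digest}"


if __name__ == "__main__":
    print(fit_identifier("idx_visit_payer_visit_occurrence_id"))
```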

Add schema, constraint, and index files for PEDSnet models

See #5 for the original discussion, but here are the relevant fields:

Field schema file fields: schema/<table>.csv

model (required) - Name of the model.
version (required) - Version of the model.
table (required) - Name of the table.
field (required) - Name of the field.
type (required) - Data type of the field.
length - Describes the maximum length of the value.
precision - Describes the precision of the value specified in type, typically for a number.
scale - Describes the scale of the value specified in type, typically for a number.
default - Defines the default value for the field.

Constraints file fields: constraints/<table>.csv

model (required) - Name of the model.
version (required) - Version of the model.
table (required) - Table the constraint applies to.
field - One or more fields the constraint applies to.
type (required) - Type of constraint.
name - Suggested name of the constraint.

Indexes file fields: indexes/<table>.csv

model (required) - Name of the model.
version (required) - Version of the model.
table (required) - Name of the table.
field (required) - One or more fields the index applies to.
name - Suggested name of the index.
type - Suggested type of the index.
unique - If true, denotes a unique index.
order - For ordered indexes, specifies asc or desc

This issue applies to v1 and v2:

Fill in remaining schema fields (default values)
Constraints
Indexes

Integrate references, constraints, and indexes

Add structs for each type
~~Render in templates~~

Develop website for rendering various views of the data models

Depends on #5

Encoding error

See pedsnet.v2.concept.concept_class_id.description for the example:

"The category or class of the concept along both the hierarchical tree as well as different domains within a vocabulary. Examples are â€œClinical Drugâ€�, â€œIngredientâ€�, â€œClinical Findingâ€� etc. "

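The byte patterns suggest the curly quotes were stored as UTF-8 and then read as Windows-1252: “ (E2 80 9C) shows up as â€œ. The closing quote's last byte (0x9D) has no Windows-1252 character, so it surfaced as U+FFFD, which means a plain re-encode cannot recover it. This Python sketch therefore uses a targeted replacement table; the diagnosis and `fix_mojibake` are assumptions, not confirmed in the issue.

```python
# UTF-8 punctuation that was mis-decoded as Windows-1252 (cp1252).
MOJIBAKE = {
    "\u00e2\u20ac\u0153": "\u201c",  # â€œ -> left double quotation mark
    "\u00e2\u20ac\ufffd": "\u201d",  # â€ + U+FFFD -> right double quotation mark
    "\u00e2\u20ac\u2122": "\u2019",  # â€™ -> right single quotation mark
}


def fix_mojibake(text: str) -> str:
    """Replace known mis-decoded quote sequences with the intended characters."""
    for bad, good in MOJIBAKE.items():
        text = text.replace(bad, good)
    return text
```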
Refactor compare resource to use template

Define struct to contain the comparison outcome
Add direct links to the model, table, and field definitions
Add Summary section at the top to highlight all things that changed

Cannot find path to repository on refresh

Log data:

2015-05-22T11:46:02.976121598Z fatal: Not a git repository (or any of the parent directories): .git
2015-05-22T11:46:02.976765909Z time="2015-05-22T11:46:02Z" level=fatal msg="problem pulling repo: exit status 128"

Add data model "mapping" files

A map file defines the relationship between two models. The intended audience is people who want to learn more about how data models are related. These map files do not contain enough detail for authors of ETL code.

The following fields are being proposed. Even though "source" and "target" imply directionality, there is no inherent directionality in the way fields are mapped. The comment field contains any useful or necessary high-level information about a non-obvious mapping.

source_model
source_version
source_table
source_field
target_model
target_version
target_table
target_field
comment

/cc @aaron0browne @murphyke
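Because the mapping has no inherent direction, a consumer could index each row from both ends. This Python sketch builds such a lookup from rows keyed by the proposed column names; `build_lookup` is hypothetical, and the model and field values used when exercising it are illustrative, not real mappings.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

FieldRef = Tuple[str, ...]  # (model, version, table, field)
PARTS = ("model", "version", "table", "field")


def build_lookup(rows: List[Dict[str, str]]) -> Dict[FieldRef, List[Tuple[FieldRef, str]]]:
    """Index mapping rows in both directions, since source/target imply no direction."""
    lookup = defaultdict(list)
    for row in rows:
        source = tuple(row[f"source_{p}"] for p in PARTS)
        target = tuple(row[f"target_{p}"] for p in PARTS)
        comment = row.get("comment", "")
        lookup[source].append((target, comment))
        lookup[target].append((source, comment))
    return dict(lookup)
```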
