The purpose of this file is to make it possible to translate the files into a common f

Standardize field names in data model files about data-models HOT 28 CLOSED

chop-dbhi commented on June 2, 2024

Standardize field names in data model files

from data-models.

Comments (28)

bruth commented on June 2, 2024

A little program can read in the original file, perform the mapping, and output a "normalized" file.

gracebrownecodes commented on June 2, 2024

Any room for expanding the common format described above to include table-level attributes? Or are you just thinking consumers will get table names from the field descriptions (and not get multi-column unique constraints, indexes, primary keys, etc.)?

bruth commented on June 2, 2024

I presume we could generalize this and have the metadata file describe the entities in the file, e.g.:

table: 
    path:
        - Table

field:
    path:
        - Table
        - Field
    doc: Description
    type: Type Precision
    required: Required

This would allow for supporting all kinds of variables since it is opt-in.

bruth commented on June 2, 2024

By file I mean the flat files containing information about the data model (those in the repository).

bruth commented on June 2, 2024

A logical extension would be to specify which files should be considered when parsing:

table: 
    files: ./tables.csv
    path:
        - Table

field:
    files: ./fields/*.csv
    path:
        - Table
        - Field
    doc: Description
    type: Type Precision
    required: Required

bruth commented on June 2, 2024

The output may look something like this:

[
    {
        "ident": "person/person_id",
        "doc": "A system-generated unique identifier for each person.",
        "type": "integer",
        "required": "Yes",
        "fields": [
            // Original fields in order...
        ]
    }
]
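As a rough illustration of how the metadata mapping discussed earlier in the thread could produce output of this shape, here is a minimal Python sketch. The `FIELD_MAPPING` dictionary, the `normalize_rows` function, and the sample column headers are hypothetical; in particular it assumes the type lives in a single `Type` column, which the thread does not confirm.

```python
import csv
import io

# Hypothetical mapping mirroring the "field" entry of the metadata file:
# keys are normalized names, values are column headers in the source file.
FIELD_MAPPING = {
    "path": ["Table", "Field"],
    "doc": "Description",
    "type": "Type",  # assumption: one column holds the type
    "required": "Required",
}


def normalize_rows(text, mapping):
    """Map each CSV row onto the common format, keeping the original values."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        records.append({
            # Join the path columns into one identifier, e.g. person/person_id.
            "ident": "/".join(row[col] for col in mapping["path"]),
            "doc": row[mapping["doc"]],
            "type": row[mapping["type"]],
            "required": row[mapping["required"]],
            # Original values in their original column order.
            "fields": list(row.values()),
        })
    return records


# Made-up sample input for illustration only.
SAMPLE = (
    "Table,Field,Description,Type,Required\n"
    "person,person_id,A system-generated unique identifier for each person.,integer,Yes\n"
)

records = normalize_rows(SAMPLE, FIELD_MAPPING)
print(records[0]["ident"])
```

Because the mapping is plain data, a source file with different headers would only need a different mapping, not different code.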

bruth commented on June 2, 2024

Example, OMOP v4 person fields: https://gist.github.com/bruth/96e542a121efdf2eabf1

gracebrownecodes commented on June 2, 2024

This looks nice to me for fields. The type of information I'm hoping to get into the tables flat file is represented in the OMOP 5 DDL here.

gracebrownecodes commented on June 2, 2024

I suggest two additional files for each data model: constraints.csv and indexes.csv. See the arbitrary examples below, which include some possible edge cases I can imagine (complex constraint types, non-default index types, multi-column indexes).

constraints.csv

table	field	name	type
person	gender_concept_id		NOT NULL
care_site	care_site_name	uq_care_site_name	UNIQUE
organization	organization_name	check_pedsnet_orgs	CHECK IN (colorado, chop, nationwide, nemours, stlouis, seattle)

indexes.csv

table	field	name	type
person	year_of_birth	idx_person_yob
observation	observation_concept_id		hash
condition_occurrence	condition_id, condition_concept_id
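A consumer of these files would need to cope with the edge cases shown above: omitted trailing columns and a comma-separated field cell for multi-column entries. The following Python sketch shows one way that might be handled. The `parse_rows` function is hypothetical, and it assumes the files are tab-separated as in the examples.

```python
import csv
import io


def parse_rows(text):
    """Parse tab-separated rows, padding missing trailing columns with ''."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    rows = []
    for values in reader:
        if not values:
            continue  # skip blank lines
        # Rows may omit trailing columns (e.g. no name or type given).
        values = values + [""] * (len(header) - len(values))
        row = dict(zip(header, values))
        # A comma-separated field cell denotes a multi-column entry.
        row["field"] = [name.strip() for name in row["field"].split(",")]
        rows.append(row)
    return rows


# The indexes.csv example from the comment above.
INDEXES = (
    "table\tfield\tname\ttype\n"
    "person\tyear_of_birth\tidx_person_yob\n"
    "observation\tobservation_concept_id\t\thash\n"
    "condition_occurrence\tcondition_id, condition_concept_id\n"
)

indexes = parse_rows(INDEXES)
for index in indexes:
    print(index["table"], index["field"], index["type"] or "(default)")
```

Storing `field` as a list keeps single-column and multi-column indexes in the same shape, so downstream code does not need a special case.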

gracebrownecodes commented on June 2, 2024

Is there a place for default values to be recorded? I'm thinking they should go in the <table name>.csv files as a field attribute.

bruth commented on June 2, 2024

We will give it a go.

bruth commented on June 2, 2024

@aaron0browne, @murphyke I updated the description of the ticket with the relevant fields. Let me know your thoughts.

gracebrownecodes commented on June 2, 2024

I like both the separation of schema and definition and the inclusion of model and version in each row. Charge ahead.

bruth commented on June 2, 2024

Great. I added the model and version fields just so we do not need to rely on the directory structure for declaring the model and version.

murphyke commented on June 2, 2024

Looks good

bruth commented on June 2, 2024

In the OMOP model (and therefore PEDSnet) there are integer and number(x) types. Now that we are splitting the length, precision, and scale values into separate columns, shall we standardize on just number for the type? If there is a length, we know it is a bounded integer. If there is a scale and precision, we know it's a decimal. I do not see a float anywhere, so I presume that is not used?

gracebrownecodes commented on June 2, 2024

Yes, I think we should use an unmodified data type with optional precision and scale in separate columns. I actually like float instead of number for its specificity, and they are using it in the latest OMOP.

gracebrownecodes commented on June 2, 2024

And I think we could do the same with string.

bruth commented on June 2, 2024

So integer and float?

bruth commented on June 2, 2024

float is a bit weird to use in place of a decimal since it is technically not precise.

bruth commented on June 2, 2024

I see number => integer and decimal (with scale and precision) and float being its own thing. I agree with string.
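To make the proposed split concrete, here is a small Python sketch of how a raw type such as number(10) might be separated into a type plus length, precision, and scale. The `split_type` function and the raw syntax it accepts are assumptions for illustration. How a bare number with no qualifiers should be treated is not settled at this point in the thread, so the sketch leaves it unchanged.

```python
import re

# Assumed raw syntax: a type name optionally followed by one or two
# integers in parentheses, e.g. "number(10)" or "number(10,2)".
RAW_TYPE = re.compile(r"^\s*(\w+)\s*(?:\(\s*(\d+)\s*(?:,\s*(\d+)\s*)?\))?\s*$")


def split_type(raw):
    """Split a raw type string into type, length, precision and scale."""
    match = RAW_TYPE.match(raw)
    if match is None:
        raise ValueError(f"unrecognized type: {raw!r}")
    name, first, second = match.groups()
    name = name.lower()
    result = {"type": name, "length": None, "precision": None, "scale": None}
    if name == "number" and second is not None:
        # Precision and scale present: treat as a decimal.
        result.update(type="decimal", precision=int(first), scale=int(second))
    elif name == "number" and first is not None:
        # Only a length present: treat as a bounded integer.
        result.update(type="integer", length=int(first))
    elif first is not None:
        # Any other type keeps its name; the single value is its length.
        result["length"] = int(first)
    # A bare "number" is left as-is (undecided in the discussion).
    return result


print(split_type("number(10)"))
print(split_type("number(10,2)"))
```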

bruth commented on June 2, 2024

Also, because the type is now unbundled from its properties, I am thinking of dropping the type column in the field definitions since it can be extracted from the schema itself. That being said, if we changed the type definition to be more descriptive, such as concept when there is a foreign key relationship rather than an opaque integer, it could be more useful.

gracebrownecodes commented on June 2, 2024

I support dropping the type column from the description files. And I think the description you are talking about in your second sentence above is more similar to the field called standard in the omop definitions right now (which I think might be better described with a header like codeset).

gracebrownecodes commented on June 2, 2024

As for types, I've revised my opinion given your comments and think the value set should be:

integer
decimal
string
date
datetime
text?

bruth commented on June 2, 2024

So are we stating that a decimal without a scale and precision is interpreted as a float? I am not sure of the value of text when we have string, unless it implies a large text field. We probably should add binary as well.

gracebrownecodes commented on June 2, 2024

I think a decimal without scale and precision gets the implementation defaults (like an unqualified NUMERIC in postgres). I just didn't include float because I haven't seen it in any of the models. Same for binary and boolean.
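As a sketch of what "implementation defaults" could mean in practice, the following Python function renders the proposed type names as PostgreSQL column types, emitting an unqualified NUMERIC when no precision or scale is given. The `postgres_type` function and the mapping for the non-decimal types are assumptions for illustration, not something agreed in the thread.

```python
# Assumed mapping from the proposed type names to PostgreSQL types.
SIMPLE_TYPES = {
    "integer": "INTEGER",
    "date": "DATE",
    "datetime": "TIMESTAMP",
    "text": "TEXT",
}


def postgres_type(type_name, length=None, precision=None, scale=None):
    """Render a normalized type as a PostgreSQL column type."""
    if type_name == "decimal":
        if precision is None:
            # No qualifiers: let PostgreSQL apply its own defaults.
            return "NUMERIC"
        if scale is None:
            return f"NUMERIC({precision})"
        return f"NUMERIC({precision}, {scale})"
    if type_name == "string":
        # An unbounded string becomes VARCHAR without a length limit.
        return f"VARCHAR({length})" if length is not None else "VARCHAR"
    if type_name in SIMPLE_TYPES:
        return SIMPLE_TYPES[type_name]
    raise ValueError(f"unsupported type: {type_name!r}")


print(postgres_type("decimal"))
print(postgres_type("decimal", precision=10, scale=2))
print(postgres_type("string", length=50))
```

Keeping the qualifiers as separate optional arguments mirrors the separate columns proposed for the schema files, so a different backend could supply its own rendering function.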

gracebrownecodes commented on June 2, 2024

Clarification: haven't seen a true float data type (I believe the V5 OMOP documentation does not actually mean float, but we should check on that, probably).

bruth commented on June 2, 2024

Just because we have not seen them doesn't mean we can't include them.
