Comments (28)

bruth commented on June 2, 2024

A little program can read in the original file, perform the mapping, and output a "normalized" file.

from data-models.

gracebrownecodes commented on June 2, 2024

Any room for expanding the common format described above to include table-level attributes? Or are you just thinking consumers will get table names from the field descriptions (and not get multi-column unique constraints, indexes, primary keys, etc.)?

bruth commented on June 2, 2024

I presume we could generalize this and have the metadata file describe the entities in the file, e.g.:

table: 
    path:
        - Table

field:
    path:
        - Table
        - Field
    doc: Description
    type: Type Precision
    required: Required

This would allow for supporting all kinds of variables since it is opt-in.

bruth commented on June 2, 2024

By file I mean the flat files containing information about the data model (those in the repository).

bruth commented on June 2, 2024

A logical extension would be to specify which files should be considered when parsing:

table: 
    files: ./tables.csv
    path:
        - Table

field:
    files: ./fields/*.csv
    path:
        - Table
        - Field
    doc: Description
    type: Type Precision
    required: Required
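
As a minimal sketch of how the `files` key might be resolved, each entity's pattern could be expanded relative to the directory that holds the metadata file. The `resolve_files` helper and the dict layout are assumptions made for illustration; the thread does not specify a loader.

```python
import glob
import os

# Hypothetical in-memory form of the metadata above (not a defined format).
ENTITIES = {
    "table": {"files": "./tables.csv", "path": ["Table"]},
    "field": {"files": "./fields/*.csv", "path": ["Table", "Field"]},
}

def resolve_files(entities, base_dir):
    """Expand each entity's `files` glob pattern relative to base_dir.

    Returns a dict mapping entity name to a sorted list of matching paths.
    """
    resolved = {}
    for name, spec in entities.items():
        pattern = os.path.normpath(os.path.join(base_dir, spec["files"]))
        # Sort so the parse order does not depend on the filesystem.
        resolved[name] = sorted(glob.glob(pattern))
    return resolved
```

Sorting the matches keeps the output stable across machines, which matters if the normalized file is checked into the repository.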

bruth commented on June 2, 2024

The output may look something like this:

[
    {
        "ident": "person/person_id",
        "doc": "A system-generated unique identifier for each person.",
        "type": "integer",
        "required": "Yes",
        "fields": [
            // Original fields in order...
        ]
    }
]
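
A rough sketch of a normalizer that produces this shape, under several assumptions: the source is CSV with the headers named in the metadata (`Table`, `Field`, `Description`, `Type Precision`, `Required`), each mapping value names a single source column, and `fields` holds the original cell values in column order. The `FIELD_MAPPING` dict, the `normalize` function, and the sample row are illustrative only.

```python
import csv
import io
import json

# Hypothetical mapping, mirroring the "field" entity from the metadata sketch.
FIELD_MAPPING = {
    "path": ["Table", "Field"],
    "doc": "Description",
    "type": "Type Precision",
    "required": "Required",
}

def normalize(source, mapping):
    """Read CSV text and return one normalized dict per row."""
    reader = csv.DictReader(io.StringIO(source))
    records = []
    for row in reader:
        # The identifier is the path columns joined with "/".
        record = {"ident": "/".join(row[col] for col in mapping["path"])}
        for key, column in mapping.items():
            if key != "path":
                record[key] = row[column]
        # Keep the original values, in the source column order.
        record["fields"] = [row[name] for name in reader.fieldnames]
        records.append(record)
    return records

# Made-up sample input using the description quoted in the output above.
SAMPLE = (
    "Table,Field,Required,Type Precision,Description\n"
    "person,person_id,Yes,integer,"
    "A system-generated unique identifier for each person.\n"
)

print(json.dumps(normalize(SAMPLE, FIELD_MAPPING), indent=4))
```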

bruth commented on June 2, 2024

Example, OMOP v4 person fields: https://gist.github.com/bruth/96e542a121efdf2eabf1

gracebrownecodes commented on June 2, 2024

This looks nice to me for fields. The type of information I'm hoping to get into the tables flat file is represented in the OMOP 5 DDL here.

gracebrownecodes commented on June 2, 2024

I suggest two additional files for each data model: constraints.csv and indexes.csv. See some arbitrary examples below, which include some possible edge cases I can imagine (complex constraint types, non-default index types, multi-column indexes).

constraints.csv

| table | field | name | type |
|---|---|---|---|
| person | gender_concept_id | | NOT NULL |
| care_site | care_site_name | uq_care_site_name | UNIQUE |
| organization | organization_name | check_pedsnet_orgs | CHECK IN (colorado, chop, nationwide, nemours, stlouis, seattle) |

indexes.csv

| table | field | name | type |
|---|---|---|---|
| person | year_of_birth | idx_person_yob | |
| observation | observation_concept_id | | hash |
| condition_occurrence | condition_id, condition_concept_id | | |
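
To make the edge cases concrete, here is a sketch of how a consumer might turn indexes.csv rows into PostgreSQL-style `CREATE INDEX` statements. It assumes the columns are `table`, `field`, `name`, `type`, that a blank `name` or `type` means "use the database default", and that a multi-column index lists its columns comma-separated in `field`. The `index_ddl` function is hypothetical, not something the repository defines.

```python
def index_ddl(row):
    """Render one indexes.csv row as a PostgreSQL-style CREATE INDEX statement."""
    # A multi-column index lists its columns comma-separated in `field`.
    columns = [c.strip() for c in row["field"].split(",")]
    parts = ["CREATE INDEX"]
    if row.get("name"):
        parts.append(row["name"])  # PostgreSQL generates a name if omitted
    parts.append("ON " + row["table"])
    if row.get("type"):
        parts.append("USING " + row["type"])  # non-default index method
    parts.append("(" + ", ".join(columns) + ")")
    return " ".join(parts) + ";"

# The three example rows, with blanks read as empty strings (an assumption).
ROWS = [
    {"table": "person", "field": "year_of_birth",
     "name": "idx_person_yob", "type": ""},
    {"table": "observation", "field": "observation_concept_id",
     "name": "", "type": "hash"},
    {"table": "condition_occurrence",
     "field": "condition_id, condition_concept_id", "name": "", "type": ""},
]

for row in ROWS:
    print(index_ddl(row))
```

Other databases require an explicit index name, so a portable consumer would likely need to derive one when the `name` column is blank.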

gracebrownecodes commented on June 2, 2024

Is there a place for default values to be recorded? I'm thinking they should go in the <table name>.csv files as a field attribute.

bruth commented on June 2, 2024

We will give it a go.

bruth commented on June 2, 2024

@aaron0browne, @murphyke I updated the description of the ticket with the relevant fields. Let me know your thoughts.

gracebrownecodes commented on June 2, 2024

I like both the separation of schema and definition and the inclusion of model and version in each row. Charge ahead.

bruth commented on June 2, 2024

Great. I added the model and version fields just so we do not need to rely on the directory structure for declaring the model and version.

murphyke commented on June 2, 2024

Looks good

bruth commented on June 2, 2024

In the OMOP model (and therefore PEDSnet) there are integer and number(x) types. Now that we are splitting the length, precision, and scale values into separate columns, shall we standardize on just number for the type? If there is a length, we know it is a bounded integer. If there is a scale and precision, we know it's a decimal. I do not see a float anywhere, so I presume that is not used?
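
The splitting rule could be sketched as follows. The accepted input syntax (`name`, `name(n)`, or `name(p,s)`), the reading of a single argument as a length, and both function names are assumptions for illustration.

```python
import re

# Matches "integer", "number(10)" or "number(10,2)" (assumed input syntax).
TYPE_PATTERN = re.compile(r"^\s*(\w+)\s*(?:\(\s*(\d+)\s*(?:,\s*(\d+)\s*)?\))?\s*$")

def split_type(raw):
    """Split e.g. 'number(10,2)' into a bare type plus separate size columns."""
    match = TYPE_PATTERN.match(raw)
    if match is None:
        raise ValueError("unrecognized type: %r" % raw)
    name, first, second = match.groups()
    out = {"type": name.lower(), "length": None, "precision": None, "scale": None}
    if first is not None and second is None:
        out["length"] = int(first)  # one argument: treated as a length
    elif first is not None:
        out["precision"], out["scale"] = int(first), int(second)
    return out

def interpret(column):
    """Apply the rule from the comment above to a split column."""
    if column["precision"] is not None and column["scale"] is not None:
        return "decimal"
    if column["length"] is not None:
        return "bounded integer"
    # Neither given: fall back to the declared type name.
    return column["type"]
```

For example, `interpret(split_type("number(10,2)"))` yields `"decimal"`, while `interpret(split_type("number(10)"))` yields `"bounded integer"`.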

gracebrownecodes commented on June 2, 2024

Yes, I think we should use an unmodified data type with optional precision and scale in separate columns. I actually like float instead of number for its specificity, and they are using it in the latest OMOP.

gracebrownecodes commented on June 2, 2024

And I think we could do the same with string.

bruth commented on June 2, 2024

So integer and float?

bruth commented on June 2, 2024

float is a bit weird to use in place of a decimal since it is technically not precise.

bruth commented on June 2, 2024

I see number => integer and decimal (with scale and precision) and float being its own thing. I agree with string.

bruth commented on June 2, 2024

Also, because the type is decoupled from the properties, I am thinking of dropping the type column in the field definitions since it can be extracted from the schema itself. That being said, if we changed the type definition to be more descriptive, such as concept when there is a foreign key relationship rather than an opaque integer, it could be more useful.

gracebrownecodes commented on June 2, 2024

I support dropping the type column from the description files. And I think the description you are talking about in your second sentence above is more similar to the field called standard in the omop definitions right now (which I think might be better described with a header like codeset).

gracebrownecodes commented on June 2, 2024

As for types, I've revised my opinion given your comments and think the value set should be:

  • integer
  • decimal
  • string
  • date
  • datetime
  • text?

bruth commented on June 2, 2024

So are we stating that a decimal without a scale and precision is interpreted as a float? I am not sure of the value of text when we have string, unless it implies a large text field. We probably should add binary as well.

gracebrownecodes commented on June 2, 2024

I think a decimal without scale and precision gets the implementation defaults (like an unqualified NUMERIC in postgres). I just didn't include float because I haven't seen it in any of the models. Same for binary and boolean.
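
A sketch of how the proposed value set might be rendered as PostgreSQL column types, with an unqualified `decimal` falling through to a bare `numeric`. The `pg_type` function and the specific choices (`varchar` for string, `timestamp` for datetime) are assumptions; the thread does not settle them.

```python
def pg_type(type_name, length=None, precision=None, scale=None):
    """Render a logical type plus optional size columns as a PostgreSQL type."""
    if type_name == "decimal":
        if precision is None:
            # No precision or scale: defer to the implementation defaults.
            return "numeric"
        if scale is None:
            return "numeric(%d)" % precision
        return "numeric(%d, %d)" % (precision, scale)
    if type_name == "string":
        return "varchar" if length is None else "varchar(%d)" % length
    # Types that take no size arguments in this sketch.
    simple = {
        "integer": "integer",
        "date": "date",
        "datetime": "timestamp",
        "text": "text",
    }
    return simple[type_name]
```

Keeping the size columns optional means the same schema row can be rendered for another database by swapping only this function.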

gracebrownecodes commented on June 2, 2024

Clarification: haven't seen a true float data type (I believe the V5 OMOP documentation does not actually mean float, but we should check on that, probably).

bruth commented on June 2, 2024

Just because we have not seen them doesn't mean we can't include them.
