Comments (28)
A small program can read in the original file, perform the mapping, and output a "normalized" file.
from data-models.
Any room for expanding the common format described above to include table-level attributes? Or are you just thinking consumers will get table names from the field descriptions (and not get multi-column unique constraints, indexes, primary keys, etc.)?
from data-models.
I presume we could generalize this and have the metadata file describe the entities in the file, e.g.:

    table:
        path:
            - Table
    field:
        path:
            - Table
            - Field
        doc: Description
        type: Type Precision
        required: Required

This would allow for supporting all kinds of variables since it is opt-in.
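
A minimal sketch of how such a mapping might be applied, assuming the source is a CSV file whose headers match the column names in the mapping. The `normalize` function, the `MAPPING` dictionary and the sample row are illustrative and not part of the repository. Joining the path with `/` and treating `Type Precision` as two source columns whose values are concatenated are both assumptions.

```python
import csv
import io

# Hypothetical mapping, mirroring the metadata sketch above.
MAPPING = {
    "field": {
        "path": ["Table", "Field"],
        "doc": "Description",
        "type": "Type Precision",
        "required": "Required",
    }
}


def normalize(rows, spec):
    """Yield one normalized record per source row for a single entity spec."""
    for row in rows:
        # Assumption: the identifier is the path columns joined with "/".
        record = {"ident": "/".join(row[col] for col in spec["path"])}
        for attr, columns in spec.items():
            if attr == "path":
                continue
            # Assumption: a value such as "Type Precision" names several
            # source columns whose non-empty values are joined with a space.
            values = [row.get(col, "").strip() for col in columns.split()]
            record[attr] = " ".join(v for v in values if v)
        yield record


# Illustrative input only; real files live in the repository.
SAMPLE = """Table,Field,Description,Type,Precision,Required
person,person_id,A system-generated unique identifier for each person.,integer,,Yes
"""

records = list(normalize(csv.DictReader(io.StringIO(SAMPLE)), MAPPING["field"]))
print(records)
```

Because only the attributes named in the mapping are read, any extra columns in the source file are ignored, which is what makes the approach opt-in.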
from data-models.
By file I mean the flat files containing information about the data model (those in the repository).
from data-models.
A logical extension would be to specify which files should be considered when parsing:

    table:
        files: ./tables.csv
        path:
            - Table
    field:
        files: ./fields/*.csv
        path:
            - Table
            - Field
        doc: Description
        type: Type Precision
        required: Required

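
One way the `files` entry could be resolved is with ordinary glob expansion relative to the directory that holds the metadata file. This is only a sketch: `resolve_files` is a hypothetical helper, and the temporary directory and file names exist purely to make the example runnable.

```python
import glob
import os
import tempfile


def resolve_files(spec, base_dir="."):
    """Return the sorted, de-duplicated files matched by an entity's `files` entry."""
    patterns = spec.get("files", [])
    if isinstance(patterns, str):
        # Allow a single pattern as well as a list of patterns.
        patterns = [patterns]
    matched = set()
    for pattern in patterns:
        for path in glob.glob(os.path.join(base_dir, pattern)):
            matched.add(os.path.normpath(path))
    return sorted(matched)


# Build a throwaway layout to demonstrate the lookup.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "fields"))
    for parts in (("tables.csv",), ("fields", "person.csv"),
                  ("fields", "visit.csv"), ("fields", "notes.txt")):
        open(os.path.join(tmp, *parts), "w").close()
    found = resolve_files({"files": "./fields/*.csv"}, base_dir=tmp)
    names = [os.path.basename(p) for p in found]

print(names)
```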
from data-models.
The output may look something like this:

    [
        {
            "ident": "person/person_id",
            "doc": "A system-generated unique identifier for each person.",
            "type": "integer",
            "required": "Yes",
            "fields": [
                // Original fields in order...
            ]
        }
    ]

from data-models.
Example, OMOP v4 person fields: https://gist.github.com/bruth/96e542a121efdf2eabf1
from data-models.
This looks nice to me for fields. The type of information I'm hoping to get into the tables flat file is represented in the OMOP 5 DDL here.
from data-models.
I suggest two additional files for each data model: `constraints.csv` and `indexes.csv`. See some arbitrary examples below, which include some possible edge cases I can imagine (complex constraint types, non-default index types, multi-column indexes).
constraints.csv
table | field | name | type |
---|---|---|---|
person | gender_concept_id | | NOT NULL |
care_site | care_site_name | uq_care_site_name | UNIQUE |
organization | organization_name | check_pedsnet_orgs | CHECK IN (colorado, chop, nationwide, nemours, stlouis, seattle) |
indexes.csv
table | field | name | type |
---|---|---|---|
person | year_of_birth | idx_person_yob | |
observation | observation_concept_id | | hash |
condition_occurrence | condition_id, condition_concept_id | | |
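
To illustrate how a consumer might use `indexes.csv`, here is a rough sketch that turns each row into a PostgreSQL-style `CREATE INDEX` statement. It assumes that an empty `name` or `type` cell is simply left blank, and that multi-column indexes list their columns comma-separated in the `field` cell. The `index_ddl` helper is hypothetical, and other databases would need different syntax.

```python
import csv
import io

# Illustrative content matching the example rows above.
INDEXES_CSV = """table,field,name,type
person,year_of_birth,idx_person_yob,
observation,observation_concept_id,,hash
condition_occurrence,"condition_id, condition_concept_id",,
"""


def index_ddl(row):
    """Build a PostgreSQL-flavored CREATE INDEX statement from one CSV row."""
    columns = ", ".join(col.strip() for col in row["field"].split(","))
    # An unnamed index lets the database pick a name.
    name = f" {row['name']}" if row["name"] else ""
    # A blank type means the database default index method.
    using = f" USING {row['type']}" if row["type"] else ""
    return f"CREATE INDEX{name} ON {row['table']}{using} ({columns});"


statements = [index_ddl(r) for r in csv.DictReader(io.StringIO(INDEXES_CSV))]
for statement in statements:
    print(statement)
```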
from data-models.
Is there a place for default values to be recorded? I'm thinking they should go in the `<table name>.csv` files as a field attribute.
from data-models.
We will give it a go.
from data-models.
@aaron0browne, @murphyke I updated the description of the ticket with the relevant fields. Let me know your thoughts.
from data-models.
I like both the separation of schema and definition and the inclusion of model and version in each row. Charge ahead.
from data-models.
Great. I added the model and version fields just so we do not need to rely on the directory structure for declaring the model and version.
from data-models.
Looks good
from data-models.
In the OMOP model (and therefore PEDSnet) there are `integer` and `number(x)` types. Now that we are splitting the length, precision, and scale values into separate columns, shall we standardize on just `number` for the type? If there is a length, we know it is a bounded integer. If there is a scale and precision, we know it's a decimal. I do not see a `float` anywhere, so I presume that is not used?
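
The inference rule proposed here could be sketched as follows. `classify_number` is a hypothetical helper, and returning plain `number` when no qualifiers are present is an assumption, since the comment does not say what should happen in that case.

```python
def classify_number(length=None, precision=None, scale=None):
    """Infer a more specific type for a generic `number` column."""
    if precision is not None and scale is not None:
        # Scale and precision together indicate a decimal.
        return "decimal"
    if length is not None:
        # A length alone indicates a bounded integer.
        return "integer"
    # Assumption: with no qualifiers the type stays an unqualified number.
    return "number"


print(classify_number(length=10))
print(classify_number(precision=8, scale=2))
print(classify_number())
```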
from data-models.
Yes, I think we should use an unmodified data type with optional precision and scale in separate columns. I actually like `float` instead of `number` for its specificity, and they are using it in the latest OMOP.
from data-models.
And I think we could do the same with `string`.
from data-models.
So `integer` and `float`?
from data-models.
`float` is a bit weird to use in place of a decimal since it is technically not precise.
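
The imprecision is easy to demonstrate. The snippet below uses Python's binary floating point and its `decimal` module purely as an illustration. Database float and decimal types behave analogously, though details vary by implementation.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 or 0.2 exactly.
float_sum = 0.1 + 0.2
# Decimal arithmetic keeps the exact base-10 values.
decimal_sum = Decimal("0.1") + Decimal("0.2")

print(float_sum == 0.3)               # False
print(decimal_sum == Decimal("0.3"))  # True
```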
from data-models.
I see `number` => `integer` and `decimal` (with scale and precision) and `float` being its own thing. I agree with `string`.
from data-models.
Also, because the type is now decoupled from its properties, I am thinking of dropping the `type` column in the field definitions since it can be extracted from the schema itself. That being said, if we changed the `type` definition to be more descriptive, such as `concept` when there is a foreign key relationship rather than an opaque `integer`, it could be more useful.
from data-models.
I support dropping the `type` column from the description files. And I think the description you are talking about in your second sentence above is more similar to the field called `standard` in the OMOP definitions right now (which I think might be better described with a header like `codeset`).
from data-models.
As for types, I've revised my opinion given your comments and think the value set should be:
- `integer`
- `decimal`
- `string`
- `date`
- `datetime`
- `text`?
from data-models.
So are we stating that a `decimal` without a scale and precision is interpreted as a `float`? I am not sure of the value of `text` when we have `string`, unless it implies a large text field. We probably should add `binary` as well.
from data-models.
I think a `decimal` without scale and precision gets the implementation defaults (like an unqualified `NUMERIC` in Postgres). I just didn't include `float` because I haven't seen it in any of the models. Same for `binary` and `boolean`.
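
As a rough sketch of how the proposed value set might be rendered for one target, the following maps each logical type to a PostgreSQL-flavored column type, falling back to the unqualified form when no length, precision or scale is given. The `SQL_TYPES` table and the `sql_type` helper are assumptions for illustration. The specific target types (for example `TIMESTAMP` for `datetime`) are not taken from the discussion.

```python
# Hypothetical logical-to-PostgreSQL type mapping.
SQL_TYPES = {
    "integer": "INTEGER",
    "decimal": "NUMERIC",
    "string": "VARCHAR",
    "text": "TEXT",
    "date": "DATE",
    "datetime": "TIMESTAMP",
}


def sql_type(type_name, length=None, precision=None, scale=None):
    """Render a logical type plus optional qualifiers as a column type."""
    base = SQL_TYPES[type_name]
    if type_name == "string" and length is not None:
        return f"{base}({length})"
    if type_name == "decimal" and precision is not None:
        if scale is not None:
            return f"{base}({precision}, {scale})"
        return f"{base}({precision})"
    # No qualifiers: rely on the implementation defaults.
    return base


print(sql_type("decimal"))
print(sql_type("decimal", precision=10, scale=2))
print(sql_type("string", length=50))
```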
from data-models.
Clarification: haven't seen a true `float` data type (I believe the V5 OMOP documentation does not actually mean float, but we should check on that, probably).
from data-models.
Just because we have not seen them doesn't mean we can't include them.
from data-models.