Tools for converting value sets between formats, such as converting extensional value sets in CSV format to JSON that can be uploaded to a FHIR server. Also includes tools to automate CRUD operations, such as reads and updates, against various data sources and web services.
- You must have Python3 installed.
- Run to clone repo:
git clone https://github.com/HOT-Ecosystem/ValueSet-Tools.git
- Change directory:
cd ValueSet-Tools
- Make & use virtual environment:
virtualenv env; source env/bin/activate
- Run to install dependencies:
pip install -r requirements.txt
- To use the "VSAC to OMOP/FHIR JSON" tool, which fetches from Google Sheets, you'll need the following:
  - Access to this Google Sheet.
  - Place `credentials.json` and `token.json` inside the `env/` directory. For BIDS members, these can be downloaded from the BIDS OneDrive here.
- Create an `env/.env` file based on `env/.env.example`, replacing `VSAC_API_KEY` with your own VSAC API key as shown in your profile. More instructions on getting an API key can be found in "Step 1" on this page. Or, if you are a BIDS member, you can simply download and use the `.env` file from the BIDS OneDrive. It already has an API key from the shared UMLS BIDS account pre-populated.
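
To illustrate the `env/.env` step, here is a minimal sketch of how a script could read `VSAC_API_KEY` from that file using only the standard library. The helper names `load_env_file` and `get_vsac_api_key` are hypothetical and are not taken from this repository; the tool itself may load the file differently.

```python
import os
from pathlib import Path


def load_env_file(path):
    """Parse simple KEY=VALUE lines from a .env-style file into a dict."""
    values = {}
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_vsac_api_key(env_path="env/.env"):
    """Return VSAC_API_KEY from the file, else from the process environment."""
    file_values = load_env_file(env_path) if Path(env_path).exists() else {}
    key = file_values.get("VSAC_API_KEY") or os.environ.get("VSAC_API_KEY")
    if not key:
        raise RuntimeError("VSAC_API_KEY is not set; see env/.env.example")
    return key
```

Calling `get_vsac_api_key()` from the repository root would then return the key, or raise an error pointing back to `env/.env.example` if it is missing.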
First, `cd` into the directory where this repository was cloned.
This will fetch OIDs from the "OID" column of this google sheet, make VSAC API calls, and produce output.
python3 -m vsac_wrangler <options>
Options:
| Short flag | Long flag | Choices | Default | Description |
|---|---|---|---|---|
| `-i` | `--input-source-type` | `['google-sheet', 'oids-txt']` | `'oids-txt'` | If "google-sheet", this will fetch from a specific, hard-coded Google Sheet, and pull OIDs from a specific column in that sheet. If "oids-txt", it will pull a list of OIDs from `input/oids.txt`. |
| `-g` | `--google-sheet-name` | `['CDC reference table list', 'VSAC Lisa1']` | `'CDC reference table list'` | The name of the tab within the Google Sheet that contains the target data in its OID column. Make sure to enclose the text in quotes, e.g. `-g "VSAC Lisa1"`. This option can only be used if `--input-source-type` is `google-sheet`. |
| `-o` | `--output-structure` | `['fhir', 'vsac', 'palantir-concept-set-tables', 'atlas']` | `'vsac'` | Destination structure. This determines the specific fields and, in some cases, the internal structure of the data in those fields. |
| `-f` | `--output-format` | `['tabular/csv', 'json']` | `'json'` | The output format. If "tabular/csv", it will produce a tabular file; CSV by default. This can be changed to TSV by passing "\t" as the field delimiter. |
| `-d` | `--tabular-field-delimiter` | `[',', '\t']` | `','` | Field delimiter for tabular output. This applies when selecting "tabular/csv" for "output-format". By default, uses ",", which means that the output will be CSV (Comma-Separated Values). If "\t" is chosen, output will be TSV (Tab-Separated Values). |
| `-d2` | `--tabular-intra-field-delimiter` | `[',', '\t', ';', '\|']` | `'\|'` | Intra-field delimiter for tabular output. This applies when selecting "tabular/csv" for "output-format". This delimiter will be used when a specific field contains multiple values. For example, in "tabular/csv" format, there will be 1 row per combination of OID (Object ID) + code system. A single OID represents a single value set, which can have codes from multiple code systems. For a given OID+CodeSystem combo, there will likely be multiple codes in the "code" field. These codes will be delimited using the "intra-field delimiter". |
| `-j` | `--json-indent` | 0 - 4 | 4 | The number of spaces to indent when outputting JSON. If 0, there will not only be no indent, but there will also be no whitespace. 0 is useful for minimal file size. 2 and 4 tend to be standard indent values for readability. |
| `-c` | `--use-cache` | | | When running this tool, a cache of the results from the VSAC API will always be saved. If this flag is passed, the cached results will be used instead of calling the API. This is useful for (i) working offline, or (ii) speeding up processing. In order to not use the cache and get the most up-to-date results (both from (i) the OIDs present in the Google Sheet, and (ii) results from VSAC), simply run the tool without this flag. |
| `-h` | `--help` | | | Shows help information for using the tool. |
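
For readers who want to see how the options above fit together, the following is a rough sketch of an equivalent `argparse` declaration. It is not the tool's actual source: the function name `build_parser` is hypothetical, the `-d2` default of `'|'` is an assumption read from the options table, and the rule that `-g` is only valid with `google-sheet` input is not modeled.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented options (not the real source)."""
    parser = argparse.ArgumentParser(prog="vsac_wrangler")
    parser.add_argument("-i", "--input-source-type",
                        choices=["google-sheet", "oids-txt"], default="oids-txt")
    parser.add_argument("-g", "--google-sheet-name",
                        choices=["CDC reference table list", "VSAC Lisa1"],
                        default="CDC reference table list")
    parser.add_argument("-o", "--output-structure",
                        choices=["fhir", "vsac", "palantir-concept-set-tables", "atlas"],
                        default="vsac")
    parser.add_argument("-f", "--output-format",
                        choices=["tabular/csv", "json"], default="json")
    parser.add_argument("-d", "--tabular-field-delimiter",
                        choices=[",", "\t"], default=",")
    # Assumption: the default intra-field delimiter is "|".
    parser.add_argument("-d2", "--tabular-intra-field-delimiter",
                        choices=[",", "\t", ";", "|"], default="|")
    parser.add_argument("-j", "--json-indent", type=int,
                        choices=range(0, 5), default=4)
    # A boolean switch: present means "use cached VSAC results".
    parser.add_argument("-c", "--use-cache", action="store_true")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Note that `-h`/`--help` does not need to be declared, since `argparse` adds it automatically.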
python3 -m vsac_wrangler -o vsac -f tabular/csv -d "\t" -d2 "," -c
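
The difference between the field delimiter (`-d`) and the intra-field delimiter (`-d2`) may be easier to see in code. The sketch below groups codes by OID and code system, writes one row per combination, and joins the codes in each row with the intra-field delimiter. The function name `to_tabular`, the column names, and the sample OID and codes are all made up for illustration; the real output columns may differ.

```python
import csv
import io
from collections import defaultdict


def to_tabular(rows, field_delimiter=",", intra_field_delimiter="|"):
    """Render (oid, code_system, code) tuples as one row per OID + code system."""
    grouped = defaultdict(list)
    for oid, code_system, code in rows:
        grouped[(oid, code_system)].append(code)

    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=field_delimiter, lineterminator="\n")
    writer.writerow(["oid", "codeSystem", "codes"])  # hypothetical column names
    for (oid, code_system), codes in grouped.items():
        # Multiple codes share one field, separated by the intra-field delimiter.
        writer.writerow([oid, code_system, intra_field_delimiter.join(codes)])
    return buffer.getvalue()


# Made-up sample data: one value set (OID) with codes from two code systems.
sample_rows = [
    ("1.2.3", "LOINC", "1234"),
    ("1.2.3", "LOINC", "1235"),
    ("1.2.3", "SNOMEDCT", "999"),
]

if __name__ == "__main__":
    print(to_tabular(sample_rows, field_delimiter="\t", intra_field_delimiter=","))
```

Choosing an intra-field delimiter that differs from the field delimiter keeps rows easy to read; if the two are the same, `csv.writer` quotes the combined field so the file still parses correctly.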
First, convert your CSV to have column names like the example below. Then you can run these commands.
python3 -m csv_to_fhir path/to/FILE.csv
python3 -m csv_to_fhir examples/1/input/n3cLikeExtensionalValueSetExample.csv
Before:
valueSet.id,valueSet.name,valueSet.description,valueSet.status,valueSet.codeSystem,valueSet.codeSystemVersion,concept.code,concept.display
1,bear family,A family of bears.,draft,http://loinc.org,2.36,1234,mama bear
1,bear family,A family of bears.,draft,http://loinc.org,2.36,1235,papa bear
1,bear family,A family of bears.,draft,http://loinc.org,2.36,1236,baby bear
After:
{
    "resourceType": "ValueSet",
    "id": 1,
    "meta": {
        "profile": [
            "http://hl7.org/fhir/StructureDefinition/shareablevalueset"
        ]
    },
    "text": {
        "status": "generated",
        "div": "<div xmlns=\"http://www.w3.org/1999/xhtml\">\n\t\t\t<p>A family of bears.</p>\n\t\t</div>"
    },
    "name": "bear family",
    "title": "bear family",
    "status": "draft",
    "description": "A family of bears.",
    "compose": {
        "include": [
            {
                "system": "http://loinc.org",
                "version": 2.36,
                "concept": [
                    {
                        "code": 1234,
                        "display": "mama bear"
                    },
                    {
                        "code": 1235,
                        "display": "papa bear"
                    },
                    {
                        "code": 1236,
                        "display": "baby bear"
                    }
                ]
            }
        ]
    }
}
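
The Before/After mapping above can be approximated with a short script. This is a simplified sketch, not the `csv_to_fhir` implementation: the function name `csv_to_value_sets` is hypothetical, the `meta` and `text` elements are omitted, and all values are kept as strings (the FHIR specification defines `id`, `version`, and `code` as string types), so the result differs slightly from the sample output above.

```python
import csv
import io
import json


def csv_to_value_sets(csv_text):
    """Group CSV rows by valueSet.id into FHIR-like ValueSet dictionaries."""
    value_sets = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        vs = value_sets.setdefault(row["valueSet.id"], {
            "resourceType": "ValueSet",
            "id": row["valueSet.id"],
            "name": row["valueSet.name"],
            "title": row["valueSet.name"],
            "status": row["valueSet.status"],
            "description": row["valueSet.description"],
            "compose": {"include": []},
        })
        includes = vs["compose"]["include"]
        key = (row["valueSet.codeSystem"], row["valueSet.codeSystemVersion"])
        # Reuse the include entry for this code system + version, or create it.
        for include in includes:
            if (include["system"], include["version"]) == key:
                break
        else:
            include = {"system": key[0], "version": key[1], "concept": []}
            includes.append(include)
        include["concept"].append(
            {"code": row["concept.code"], "display": row["concept.display"]}
        )
    return list(value_sets.values())


SAMPLE = """valueSet.id,valueSet.name,valueSet.description,valueSet.status,valueSet.codeSystem,valueSet.codeSystemVersion,concept.code,concept.display
1,bear family,A family of bears.,draft,http://loinc.org,2.36,1234,mama bear
1,bear family,A family of bears.,draft,http://loinc.org,2.36,1235,papa bear
1,bear family,A family of bears.,draft,http://loinc.org,2.36,1236,baby bear
"""

value_sets = csv_to_value_sets(SAMPLE)
print(json.dumps(value_sets[0], indent=4))
```

Grouping by `valueSet.id` first, and then by code system and version, means a single CSV file could describe several value sets, each with codes from more than one code system.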