This project forked from uc-cdis/dictionaryutils


License: Apache License 2.0


dictionaryutils

A Python wrapper and metaschema for datadictionary. It can be used to:

  • load a local dictionary into a Python object.
  • dump schemas to a file that can be uploaded to S3 as an artifact.
  • load a schema file from a URL into a Python object that can be used by services.

Test for dictionary validity with Docker

Suppose you are building a dictionary locally and want to check whether it passes the tests.

You can add a small shell function to your .bash_profile to get a quick test command:

testdict() { docker run --rm -v "$(pwd)":/dictionary quay.io/cdis/dictionaryutils:master; }

Then from the directory containing the gdcdictionary directory run testdict.
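
The same check can be driven from Python, for example in a CI script. The following is a minimal sketch that builds the docker command shown above and runs it with subprocess. The helper names build_testdict_command and run_testdict are hypothetical, and the sketch assumes the container exits with a non-zero status when the dictionary tests fail.

```python
import subprocess
from pathlib import Path

# Image name taken from the testdict function above.
IMAGE = "quay.io/cdis/dictionaryutils:master"


def build_testdict_command(dictionary_dir="."):
    """Build the docker argv that mounts dictionary_dir at /dictionary."""
    host_dir = Path(dictionary_dir).resolve()
    return ["docker", "run", "--rm", "-v", f"{host_dir}:/dictionary", IMAGE]


def run_testdict(dictionary_dir="."):
    """Run the container and return True if it exits with status 0.

    Assumption: a failing dictionary test makes the container exit non-zero.
    """
    completed = subprocess.run(build_testdict_command(dictionary_dir))
    return completed.returncode == 0
```

Passing the command as a list avoids shell quoting problems when the directory path contains spaces.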

Generate simulated data with Docker

You can also generate simulated data with dictionaryutils and the data-simulator.

simdata() { docker run --rm -v "$(pwd)":/dictionary -v "$(pwd)/simdata":/simdata quay.io/cdis/dictionaryutils:master /bin/sh -c "cd /dictionary && python setup.py install --force; python /src/datasimulator/bin/data-simulator simulate --path /simdata/ $*; export SUCCESS=\$?; rm -rf build dictionaryutils dist gdcdictionary.egg-info; chmod -R a+rwX /simdata; exit \$SUCCESS"; }
simdataurl() { docker run --rm -v "$(pwd)":/dictionary -v "$(pwd)/simdata":/simdata quay.io/cdis/dictionaryutils:master /bin/sh -c "python /src/datasimulator/bin/data-simulator simulate --path /simdata/ $*; chmod -R a+rwX /simdata"; }

Then, from the directory containing the gdcdictionary directory, run simdata. A folder called simdata will be created with the results of the simulator run. You can also pass additional arguments to the data-simulator script, for example simdata --max_samples 10.

The --max_samples argument sets the default number of instances to simulate for each node. You can override it for individual nodes with the --node_num_instances_file argument. For example, if you create the following instances.json:

{
        "case": 100,
        "demographic": 100
}

Then run the following:

docker run --rm -v "$(pwd)":/dictionary -v "$(pwd)/simdata":/simdata quay.io/cdis/dictionaryutils:master /bin/sh -c "cd /dictionary && python setup.py install --force; python /src/datasimulator/bin/data-simulator simulate --path /simdata/ --program workshop --project project1 --max_samples 10 --node_num_instances_file instances.json; export SUCCESS=\$?; rm -rf build dictionaryutils dist gdcdictionary.egg-info; chmod -R a+rwX /simdata; exit \$SUCCESS";

Then you'll get 100 each of case and demographic nodes and 10 each of everything else. Note that the above example also defines program and project names.
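
If the per-node counts come from a script, the instances file can be written from Python. The sketch below is illustrative only: write_instances_file and expected_counts are hypothetical helpers, not part of dictionaryutils or the data-simulator. The second helper simply restates the override rule described above, where a node listed in the file uses its own count and every other node falls back to --max_samples.

```python
import json
from pathlib import Path


def write_instances_file(path, overrides):
    """Write per-node instance counts as a JSON object, as in instances.json."""
    for node, count in overrides.items():
        if not isinstance(count, int) or count < 0:
            raise ValueError(f"count for {node!r} must be a non-negative integer")
    Path(path).write_text(json.dumps(overrides, indent=2) + "\n")


def expected_counts(nodes, max_samples, overrides):
    """Restate the documented rule: an override wins, else max_samples applies."""
    return {node: overrides.get(node, max_samples) for node in nodes}
```

Calling write_instances_file("instances.json", {"case": 100, "demographic": 100}) reproduces the file from the example. With a max_samples of 10, expected_counts gives 100 for case and demographic and 10 for any other node name passed in.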

You can also run the simulator against an arbitrary JSON URL by using simdataurl --url https://datacommons.example.com/schema.json.

Use dictionaryutils to load a dictionary

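from dictionaryutils import DataDictionary

# Fetch a schema file from a URL (URL_FOR_THE_JSON is a placeholder).
dict_fetch_from_remote = DataDictionary(url=URL_FOR_THE_JSON)

# Load the schemas from a local directory (PATH_TO_SCHEMA_DIR is a placeholder).
dict_loaded_locally = DataDictionary(root_dir=PATH_TO_SCHEMA_DIR)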
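
A service often needs to choose between the remote and local forms at startup. The following is a minimal sketch of one way to do that with environment variables. The variable names DICTIONARY_URL and DICTIONARY_DIR and both helper functions are assumptions made for illustration; only the url and root_dir keyword arguments come from the example above.

```python
import os


def dictionary_kwargs(env=None):
    """Pick the DataDictionary keyword argument from environment variables.

    DICTIONARY_URL and DICTIONARY_DIR are hypothetical variable names.
    A URL takes precedence over a local directory when both are set.
    """
    env = os.environ if env is None else env
    url = env.get("DICTIONARY_URL")
    if url:
        return {"url": url}
    root_dir = env.get("DICTIONARY_DIR")
    if root_dir:
        return {"root_dir": root_dir}
    raise ValueError("set DICTIONARY_URL or DICTIONARY_DIR")


def load_dictionary(env=None):
    """Build a DataDictionary from whichever source is configured."""
    # Imported here so the helper above can be used without the package installed.
    from dictionaryutils import DataDictionary

    return DataDictionary(**dictionary_kwargs(env))
```

Keeping the selection logic separate from the constructor call makes it easy to test without network access or a schema directory.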

Use dictionaryutils to dump a dictionary

import json
from dictionaryutils import dump_schemas_from_dir

# Collect the schemas under the given directory and write them to one JSON file.
with open('dump.json', 'w') as f:
    json.dump(dump_schemas_from_dir('../datadictionary/gdcdictionary/schemas/'), f)
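
Before uploading the dump as an artifact, it can be useful to confirm that the file parses and to see what it contains. This sketch reads the file back with the standard json module. The helper name dumped_schema_names is hypothetical, and the sketch assumes the dump is a JSON object at the top level, which is not confirmed here.

```python
import json


def dumped_schema_names(dump_path="dump.json"):
    """Return the sorted top-level keys of a dumped schema file.

    Assumption: the dump is a JSON object whose keys identify the schemas.
    """
    with open(dump_path) as f:
        dumped = json.load(f)
    if not isinstance(dumped, dict):
        raise TypeError("expected a JSON object at the top level of the dump")
    return sorted(dumped)
```

A call such as print(dumped_schema_names("dump.json")) then lists what was written.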

Contributors

philloooo, paulineribeyre, michaellukowski, zflamig, vpsx, mjmartinson, themarcelor, giangbui, binamb, m0nhawk, frickjack, fantix, vzpgb, cmlsn, drspez, aidanhilt, mysterious-progression, mikeabreu, andrzejgrzelak
