FAIRtracks - metadata standard for genomic tracks

FAIRtracks is a set of JSON Schemas developed through the ELIXIR implementation study: "FAIRification of Genomic Tracks", as a minimal standard for genomic track metadata. For more information on the implementation study, please check out:

https://fairtracks.github.io/

FAIRtracks version information

FAIRtracks v1.0.2

Overview of structure of the FAIRtracks standard

Making changes to the standard

Dependencies for running the scripts

  • Linux-like shell with "bash". Mac OS X will do, but you probably need to install either Xcode (from the App Store) or the Xcode Command Line Tools.
  • Python >= 3.6
  • Node.js >= v10 and npm >= 3.10.8
  • git (a relatively recent version is recommended)
  • On Mac OS X, all of the above can be installed using Homebrew.
  • An OPML editor is also recommended, but not required. See OPML editors below for more information.

Overview of how to contribute

  1. Create a personal fork in GitHub ("Fork" button).
  2. Clone the fork to your computer (e.g., git clone https://github.com/myusername/fairtracks_standard.git).
  3. Run make raw, and edit the raw OPML files to your liking. For more information about the make targets, see below.
  4. Run make or make all
  5. Repeat the editing in step 3 and the make run in step 4 until you are satisfied with the changes.
  6. Run make rawclean to remove the raw OPML files before committing.
  7. Commit and push your changes to a feature branch in your personal fork and create a pull request, as described in the standard GitHub Flow workflow.
  8. Once the Pull Request is accepted:
    • Pull the latest changes in the master branch to your local repo.
    • Rebase your feature branch on top of master.
    • Make sure that all commits are consistently built. The automatically installed git-hooks will also check for consistency. To make a commit consistent, rebuild it with the rebuild_all.sh script. To clean up previous commits, use interactive rebase as described under 1b. make git-hooks below.
  9. Force push your feature branch to your personal fork, which should update the pull request, and notify us.

Overview of file types and auto-generation

There is an inherent order to the different types of files in this repo, defined in the Makefile. The FAIRtracks standard is almost fully defined in the OPML files found under json/overview, with just a small bit of top-level logic being handled by opml_to_json.py. All the JSON Schema and JSON example files are automatically generated from the OPML files. This automatic file generation is handled by various make targets:

1. Automatic make targets for initial setup

These make targets are run automatically when required by the other make targets, but are also available for manual use if needed.

a. make venv

  • Autogenerates a Python virtual environment in the .venv directory, if not already present. In case the Python executable you want to link up to the virtual environment is located in a non-standard path, you can use the environment variable PYTHON_EXE before the first make venv command. For instance:

    PYTHON_EXE=/path/to/my/python3 make .venv

b. make git-hooks

  • Installs the version-controlled git hooks into the local repo. The git hooks make sure that:

    1. All changed files are committed together
    2. All secondary files have been recompiled with make

    The checks are run before git commits or remote pushes are finalized.

    It is especially important that the git hooks are installed before merging or rebasing is done, as the SHA256 signatures of the JSON files may then need to be recalculated (by make) on merged/rebased commits. To fix such issues (which will appear when trying to push to GitHub) one will need to carry out an interactive rebase:

    1. Start an interactive rebase: git rebase -i $FIRST_COMMIT^, where $FIRST_COMMIT is the first commit that needs editing (you can find this in the log messages from the failed remote push).
    2. In the editor that appears, replace pick with edit for the commits that need editing. At this point you should also plan to clean up your commits by reordering or squashing them, as well as improving the commit messages.
    3. ./rebuild_all.sh
    4. For all changed files: git add $FILE
    5. git commit --amend
    6. git rebase --continue
    7. Repeat steps 3-6 for all commits selected for editing.

c. make jsonschema2md

  • Installs the node package "jsonschema2md" which is used to generate the JSON Schema documentation. The package is installed under "node_modules", together with all its dependencies.

2. Main process (with make targets) for making changes to the standard

The following process should be followed when changing the contents of the FAIRtracks standard itself:

a. make raw

  • This copies the existing *.opml files into similarly named *.raw.opml files. The raw OPML files are meant to be opened for editing in specialized outlining tools. As such tools vary in the exact content of the exported OPML files, the raw OPML files need to be compiled into standardized, cleaned-up versions before they are committed to git.
  • You only need to run make raw once. If you accidentally run the command twice, any existing raw OPML files will be renamed to *.raw.opml.old.
  • The raw OPML files are ignored by git and can be edited in an OPML editor of choice. See OPML file format below for more information.
  • Be sure to delete the raw OPML files (with make rawclean) before carrying out any git commands. This is important because, for example, changing branches will not change the raw OPML files, since they are ignored by git. Thus, if one fails to remove the raw OPML files before switching commits, make will just regenerate the previous commit on top of the new one.
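
To illustrate why raw OPML files need to be compiled before they are committed, here is a minimal Python sketch of one possible normalization step. It keeps only a known set of attributes on each <outline> tag and writes them in alphabetical order, so that exports from different editors yield the same text. The function name normalize_opml, the editorState attribute, and the attribute set passed in are assumptions made for illustration; the cleanup actually performed by make opml may differ.

```python
import xml.etree.ElementTree as ET


def normalize_opml(raw_xml, keep_attribs):
    """Return OPML text with only known attributes, in alphabetical order.

    Hypothetical helper, not the project's own implementation.
    """
    root = ET.fromstring(raw_xml)
    for outline in root.iter("outline"):
        # Rebuild the attribute mapping so that insertion order is sorted
        # and editor-specific attributes are dropped.
        kept = {
            name: outline.attrib[name]
            for name in sorted(outline.attrib)
            if name in keep_attribs
        }
        outline.attrib.clear()
        outline.attrib.update(kept)
    return ET.tostring(root, encoding="unicode")


# Two exports of the same outline, as two different editors might write them.
# "editorState" is an invented editor-specific attribute.
export_a = '<opml version="2.0"><body><outline type="string" _name="id" editorState="x"/></body></opml>'
export_b = '<opml version="2.0"><body><outline editorState="y" _name="id" type="string"/></body></opml>'

cleaned_a = normalize_opml(export_a, {"_name", "type"})
cleaned_b = normalize_opml(export_b, {"_name", "type"})
print(cleaned_a == cleaned_b)
```

Because both exports collapse to the same text, a version-controlled file produced this way would not change merely because a contributor used a different OPML editor.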

b. make or make all

  • After the raw OPML files have been edited, make runs:

    • make opml to generate cleaned up, standardized versions of the raw OPML files.
    • make json to generate JSON Schema files and related example JSON files from the cleaned up OPML files.
    • make docs to generate Markdown documentation files under the docs directory.

    All the generated JSON Schema files, as well as the top-level JSON example file, include a stable SHA256 signature of their contents.
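
As a rough illustration of what a stable signature can mean, the following Python sketch hashes a JSON document in a way that does not depend on key order or on the embedded signature itself. The function name stable_signature, the field name "sha256", and the canonicalization (sorted keys, compact separators) are assumptions; the scheme actually used by the FAIRtracks scripts is not described here and may differ.

```python
import hashlib
import json


def stable_signature(document, signature_key="sha256"):
    """Return a SHA256 hex digest of a JSON object, ignoring its own signature.

    Hypothetical sketch: the signature field name and the canonical form
    are assumptions, not the project's documented behavior.
    """
    # Leave out the signature field so that embedding it does not change it.
    content = {k: v for k, v in document.items() if k != signature_key}
    # Sorted keys and fixed separators give one canonical text per content.
    canonical = json.dumps(
        content, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


schema = {"title": "Example schema", "type": "object"}
schema["sha256"] = stable_signature(schema)
# Recomputing after embedding the signature yields the same digest.
print(schema["sha256"] == stable_signature(schema))
```

A signature with this property can be stored inside the file it describes and recomputed later to check that the file is consistent with its contents.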

3. make targets for overview and cleanup

a. make signature

  • Computes and prints the stable SHA256 signature for all the JSON files.

b. make rawclean

  • Removes all raw OPML and related .old files.
  • Should only be run if you are sure that all changes in the raw OPML files have propagated to other files, i.e. you should make sure that you have run make first.
  • Raw OPML files must be removed prior to running any git command, as explained above, section 2a.

c. make clean

  • Runs make rawclean, in addition to removing the virtual environment in the .venv directory, the git hooks, and the node_modules directory.

OPML file format

OPML is a standard file format defined specifically for outlining software.

OPML editors

Raw OPML files can be edited with dedicated outlining tools, but as the format is a subtype of XML, one can also use generic XML editors:

  • On Mac OS, we recommend using the commercial tool OmniOutliner, as there are really no open source alternatives with a similar user interface.
  • As an open source, platform-agnostic alternative, we recommend TreeLine.
  • The OPML files can of course also be edited manually, in which case you can ignore the raw OPML files completely.

How the FAIRtracks standard is defined in OPML

  • Each <outline> tag defines a JSON property, with the hierarchy defined by the XML hierarchy.

  • The details of each JSON property are defined by a set of possible attributes for each tag. Many of the standard JSON Schema keywords are directly supported:

    Attribute    Description
    _name        The name of the JSON property.
    const        Constant value (the only value allowed).
    default      Default value if no value is provided.
    description  Human-readable description of the property.
    enum         Set of values allowed, separated by |.
    examples     Set of example values, separated by |. All properties must have the same number of examples (or none) within each JSON Schema.
    format       Format of current string property. Supports all of the standard JSON Schema formats, and in addition we support two custom formats: "curie" and "term", for respectively Identifiers.org-resolvable CURIEs and ontology terms.
    minItems     Minimal number of items in current array property.
    pattern      Regexp format for current string property.
    ref          JSON Pointer to another JSON Schema to import under the property.
    required     If "true" the current property is required.
    title        Title of the JSON Schema.
    type         Data type of the current property: string, object, array, number, or boolean.

    In addition to the standard JSON Schema keywords detailed above, a set of extended attributes has been defined:

    Attribute         Description
    ancestors         Ontology labels, separated by |, used to validate properties in term format. At least one of these terms must be an ancestor of the value in one of the specified ontologies.
    autogenerated     If true, the contents of the current property will be filled automatically by the FAIRtracks autogenerate service (to be implemented later).
    comments          Comments that will remain in the OPML files only.
    constIf           If the specified if_property has the specified if_value, the current property must follow the specified then_value, interpreted as const.
    foreignProperty   JSON Pointer to a linked identifier property in another schema. Two JSON documents, one following the current schema and the other following the foreign schema, are related if the values in the two linked properties are the same.
    matchType         Validation rule. For properties in curie format: either basic, loose, or canonical. For properties in term format: either exact, suffix, or label.
    namespace         Namespaces, separated by |, registered in http://identifiers.org. Used to validate curie values.
    ontology          URLs to downloadable ontologies in OWL format, separated by |. Used to validate properties in term format, which is used for ontology term_id properties.
    ontologyTermPair  Pair of JSON Pointers in the format id=IDPTR;label=LABELPTR, where IDPTR and LABELPTR are JSON Pointers to, respectively, an ontology term id and its corresponding (primary) label. Currently only pointers to child properties are supported, e.g. id=0/term_id;label=0/term_label. Used in autogeneration and validation.
    requireAnyOf      For every level of the object hierarchy, at least one of the properties with requireAnyOf="true" at that level is required.
    requireIf         If the specified if_property has the specified if_value, the current property is required.
    unique            If "true" the value of the current property must be unique across all JSON documents.

    For more information, please visit the FAIRtracks validator GitHub repository (see VALIDATION.md for directions).

  • The constIf and requireIf attributes require the value to follow a specific pattern:

    Pattern part    Description                                               Attribute(s)        Obligatory  Example
    if_property=    Relative JSON Pointer to property to check                constIf, requireIf  Yes         2/technique/term_id=
    if_value        Value to check for                                        constIf, requireIf  Yes         http://purl.obolibrary.org/obo/OBI_0001853
    ;               If-then delimiter                                         constIf             Yes         ;
    then_property=  Relative JSON Pointer to property to acquire const value  constIf             No          1/term_id=
    then_value      const value for then_property                             constIf             Yes         http://purl.obolibrary.org/obo/SO_0000685
    |               Pattern delimiter (between patterns if more than one)     constIf, requireIf  No

  • In order to support multiple OPML editors, the first <outline> tag in the OPML files (the one with _text="#title") should contain all properties in alphabetical order, with an attached value (typically "." or "0"). These parameters are ignored for that line (as it is just used to generate the title of the JSON Schema).

  • When adding, removing, or renaming attributes:

    • Please update the first <outline> tag (with _text="#title") for all OPML files, as described directly above.
    • In most cases, new attributes should also be added to the ATTRIBS_TO_IMPORT constant in the opml_to_json.py script, in the order in which they should appear in the generated JSON Schemas.
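
The mapping described in this section can be sketched in Python as follows. The sketch turns a nested <outline> element into a JSON Schema fragment (object properties only) and splits a constIf or requireIf value into its parts. The function names outline_to_schema and parse_conditions, the COPIED tuple, and the rule used to detect an optional then_property are assumptions made for illustration; they are not taken from opml_to_json.py, which may work differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical subset of attributes copied unchanged into the schema.
COPIED = ("type", "description", "format", "pattern")


def parse_conditions(value):
    """Split a constIf/requireIf value into one dict per pattern."""
    conditions = []
    for pattern in value.split("|"):
        if_part, _, then_part = pattern.partition(";")
        if_property, _, if_value = if_part.partition("=")
        condition = {"if_property": if_property, "if_value": if_value}
        if then_part:
            # Assumption: an optional then_property is present only when the
            # text before the first "=" starts like a relative JSON Pointer.
            head, sep, tail = then_part.partition("=")
            if sep and head[:1].isdigit():
                condition["then_property"] = head
                condition["then_value"] = tail
            else:
                condition["then_value"] = then_part
        conditions.append(condition)
    return conditions


def outline_to_schema(elem):
    """Convert one <outline> element and its children to a schema fragment."""
    schema = {key: elem.get(key) for key in COPIED if elem.get(key)}
    if elem.get("enum"):
        schema["enum"] = elem.get("enum").split("|")
    children = elem.findall("outline")
    if children:
        # The XML hierarchy defines the hierarchy of JSON properties.
        schema["properties"] = {
            child.get("_name"): outline_to_schema(child) for child in children
        }
        required = [c.get("_name") for c in children if c.get("required") == "true"]
        if required:
            schema["required"] = required
    return schema


# Invented example outline; property names and enum values are placeholders.
example = ET.fromstring(
    '<outline _name="experiment" type="object">'
    '<outline _name="technique" type="string" enum="a|b" required="true"/>'
    '<outline _name="notes" type="string"/>'
    "</outline>"
)
print(outline_to_schema(example))
print(parse_conditions(
    "2/technique/term_id=http://purl.obolibrary.org/obo/OBI_0001853"
    ";1/term_id=http://purl.obolibrary.org/obo/SO_0000685"
))
```

The sketch leaves out arrays, ref, and the extended attributes other than constIf and requireIf. It is meant only to show how the attribute tables above relate to the generated JSON Schema structure.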

Validation
