dnastack / data-connect-trino

Cloned from https://github.com/DNAstack/ga4gh-search-adapter-presto

License: Apache License 2.0

Shell 2.11% Dockerfile 1.65% Java 96.16% HTML 0.08%
active data-connect publisher trino

data-connect-trino's People

Contributors

angelo-dnastack, avikamdna, dependabot[bot], dna-minn, elise-dnastack, jabran-khan, jfuerth, jstromsky, kevin-dna, lukecashion, mbarkley, patmagee, prajjwolmondal, sharvari-kapadia, shiroyuki, sokarthika, usanthan

data-connect-trino's Issues

Review modularization/service architecture and config for DataModelSupplier

See addition of DataModelSupplier implementations in https://github.com/ianfore/data-connect-trino

This raises some questions about how DataModelSuppliers are modularized. Having them as services is a good approach, but should the specific clients be in the main code base, or should they be external? In the latter case, a running implementation would likely be told via configuration where to find its DataModelSuppliers.

Also in the examples in https://github.com/ianfore/data-connect-trino one model supplier was not implemented as a web service. That client accessed schema files locally.

The other DataModelSupplier was a hybrid. XML data dictionaries were accessed over HTTP/FTP, so in that sense the client was a true client. However, it also took on the responsibility of transforming the XML into the JSON Schema required for Data Connect.

The original implementation and the additions above were all pragmatic choices, adequate for current needs, but we should anticipate how this would scale as more Data Connect implementations are added. These considerations are likely also relevant to the Starter Kit implementation being developed by the GA4GH tech team.
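As one way to picture the "external clients selected by configuration" option discussed above, here is a minimal Java sketch. Everything in it is an assumption made for illustration: `ModelSupplier` is a hypothetical stand-in (the actual DataModelSupplier interface in this repository may look different), the supplier names come from an imagined config list, and the fetch and transform steps are plain functions rather than real HTTP/FTP or XML code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical stand-in for the project's supplier abstraction.
interface ModelSupplier {
    // Returns a JSON Schema document for the table, or empty if this supplier has none.
    Optional<String> supply(String tableName);
}

// Supplier that reads schemas from local storage (modeled here as an in-memory map).
final class LocalFileModelSupplier implements ModelSupplier {
    private final Map<String, String> schemasByTable;

    LocalFileModelSupplier(Map<String, String> schemasByTable) {
        this.schemasByTable = schemasByTable;
    }

    @Override
    public Optional<String> supply(String tableName) {
        return Optional.ofNullable(schemasByTable.get(tableName));
    }
}

// Hybrid supplier: fetches a source dictionary, then transforms it to JSON Schema.
final class TransformingModelSupplier implements ModelSupplier {
    private final Function<String, Optional<String>> fetcher; // stands in for an HTTP/FTP fetch
    private final Function<String, String> transformer;       // stands in for XML -> JSON Schema

    TransformingModelSupplier(Function<String, Optional<String>> fetcher,
                              Function<String, String> transformer) {
        this.fetcher = fetcher;
        this.transformer = transformer;
    }

    @Override
    public Optional<String> supply(String tableName) {
        return fetcher.apply(tableName).map(transformer);
    }
}

// Builds an ordered supplier chain from configured names; the first supplier with a model wins.
final class ModelSupplierRegistry {
    private final List<ModelSupplier> suppliers = new ArrayList<>();

    static ModelSupplierRegistry fromConfig(List<String> configuredNames,
                                            Map<String, ModelSupplier> available) {
        ModelSupplierRegistry registry = new ModelSupplierRegistry();
        for (String name : configuredNames) {
            ModelSupplier supplier = available.get(name);
            if (supplier == null) {
                throw new IllegalArgumentException("Unknown supplier: " + name);
            }
            registry.suppliers.add(supplier);
        }
        return registry;
    }

    Optional<String> supply(String tableName) {
        for (ModelSupplier supplier : suppliers) {
            Optional<String> model = supplier.supply(tableName);
            if (model.isPresent()) {
                return model;
            }
        }
        return Optional.empty();
    }
}
```

In this sketch the main code base would only need the interface and the registry; concrete suppliers could live in separate modules. Java's standard `java.util.ServiceLoader` is one existing mechanism for discovering such implementations on the classpath, though whether it suits this project is an open design question.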

Consider/implement alternate schema/model representations

As we widen Data Connect interactions with other GA4GH work streams, it may be worth creating an experimental branch in which to implement DataModelSuppliers that provide the schema or model in forms other than the current JSON Schema. The purpose would be experimental: to compare the representation users need with what different schema types provide. How models are represented in GA4GH is currently an open question being considered by TASC. A Data Connect implementation that serves as a workbench for testing different possibilities may be helpful to that effort.

Some formats worth looking at might include, but are most definitely not limited to:

  • Simple extended dbGaP dictionary format
  • ISO/IEC 11179 - at least a couple of metadata repositories of relevance use this standard.
  • R approach to documenting data structures
  • LinkML
  • SchemaBlocks
  • Protobuf
  • XML Metadata Interchange (XMI)
  • RDA Data Type Registries

Some format-specific representations could be handled on the client side. For example, an R client could translate the Data Connect/GA4GH schema format into the format used to define data structures in R. This is likely the best solution architecturally. The underlying question, though, is what Data Connect itself needs to provide in order to meet user needs.
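To make the client-side translation idea concrete, the following Java sketch converts the `properties` part of an already-parsed JSON Schema into a simple tab-separated dictionary listing. It is only an illustration under stated assumptions: the class name `SchemaToDictionary`, the column headers, and the flat map input are all hypothetical, and the output is not any official dictionary format.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SchemaToDictionary {
    // properties: column name -> attributes ("type", "description"), as they might
    // appear under "properties" in a JSON Schema object after parsing.
    static String toDictionaryTable(Map<String, Map<String, String>> properties) {
        // Header row; these column names are illustrative only.
        StringBuilder out = new StringBuilder("name\ttype\tdescription\n");
        for (Map.Entry<String, Map<String, String>> entry : properties.entrySet()) {
            Map<String, String> attrs = entry.getValue();
            out.append(entry.getKey()).append('\t')
               .append(attrs.getOrDefault("type", "")).append('\t')
               .append(attrs.getOrDefault("description", "")).append('\n');
        }
        return out.toString();
    }
}
```

A client could call `SchemaToDictionary.toDictionaryTable(props)` with an insertion-ordered map such as `LinkedHashMap` to keep columns in schema order. Anything the target format needs but the schema lacks (here, a missing description becomes an empty cell) shows directly what Data Connect would have to supply.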

A high-level summary of the specific user needs referred to above:

  • Understand the data:
    • from an unfamiliar domain
    • from standard, but niche, specialities e.g. AJCC cancer stage for glioblastoma multiforme
    • the data structure of a particular, perhaps unique, experimental design
  • For the data described by the schema/model, be provided with sufficient information to:
    • Transform the data as needed for the user's purpose
    • Aggregate the data with data from other sources

It is clear that at least the following are core to the needs:

  • References to semantic descriptions (standard or not)
  • Use of scientific units

These would be relevant to data scientists who would be direct users of Data Connect or who would use tools that make use of Data Connect services.
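The two core needs above (semantic references and scientific units) can be sketched as metadata attached to each column, which a tool could then use to decide whether data from two sources can be aggregated. This is a hedged illustration only: `ColumnDescriptor`, `Compatibility`, and `AggregationCheck` are hypothetical names, and identifiers such as `example:body-weight` are placeholders rather than real ontology terms.

```java
import java.util.Objects;

// Hypothetical per-column metadata a schema/model might carry.
final class ColumnDescriptor {
    final String name;
    final String semanticRef; // identifier of a semantic description; may be null
    final String unit;        // unit code; may be null for unitless values

    ColumnDescriptor(String name, String semanticRef, String unit) {
        this.name = name;
        this.semanticRef = semanticRef;
        this.unit = unit;
    }
}

enum Compatibility { DIRECT, NEEDS_UNIT_CONVERSION, UNKNOWN }

final class AggregationCheck {
    // Columns with the same semantic reference can be combined, either directly
    // or after unit conversion. Without a shared reference nothing can be concluded.
    static Compatibility compare(ColumnDescriptor a, ColumnDescriptor b) {
        if (a.semanticRef == null || !a.semanticRef.equals(b.semanticRef)) {
            return Compatibility.UNKNOWN;
        }
        return Objects.equals(a.unit, b.unit)
                ? Compatibility.DIRECT
                : Compatibility.NEEDS_UNIT_CONVERSION;
    }
}
```

For example, comparing a `weight` column in `kg` with a `wt` column in `g` that shares the same semantic reference would yield `NEEDS_UNIT_CONVERSION`, while a column with no semantic reference always yields `UNKNOWN`, whatever its name.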
