mathiasrichter / shapiro

Modelling data with JSON-LD, Turtle, SHACL

License: Apache License 2.0

json-ld model-as-code turtle shacl data data-structures linked-data rdf schema semantic

shapiro's Introduction

Unit tests run on Python 3.8, 3.9, 3.10, 3.11 and 3.12.

Shapiro

What is Shapiro?

Shapiro is a simple ontology/vocabulary server serving Turtle, JSON-LD, HTML or JSON-Schema (as indicated by the requesting client in the Accept header). It therefore provides a simple approach to serving up an organization's ontologies and vocabularies.

Motivation - Model as Code

Why would one need something like Shapiro? The basic idea is to model data as knowledge graphs using Turtle or JSON-LD and use these models in API definitions/implementations and all other code consuming data based on these models.

Make the use of these machine-readable model definitions pervasive throughout all phases of the software lifecycle (design, implement, test, release) and the lifecycle of the data originating from software built using these models.

Express non-functional requirements like security, traceability/lineage, data quality in the models and bind them to the instances of data wherever the data is distributed to and used.

Drive all documentation (model diagrams, documents, graph visualizations, etc.) from the same RDF-based model definition (a.k.a. ontology/knowledge graph).

Start out by providing a toolset by developers, for developers, for formulating such models and using them in source code, gradually extending towards tools, editors, UIs and transformations that make this modelling approach accessible to non-technical actors like business analysts, domain data owners, etc.

In order to do so, you need a way to serve the models - this is where Shapiro comes in.

Serving Schemas

Shapiro serves schemas from a directory hierarchy in the file system (specified by the content_dir parameter at startup). Shapiro regularly checks new or modified schemas for syntax errors and excludes such "bad schemas" from being served. Schemas can be moved into Shapiro's content_dir while it is running. This decouples the lifecycle of schemas from the lifecycle of Shapiro: the basic idea is that the lifecycle of schemas is managed in some code repository, from which changes get pushed into Shapiro's content directory without Shapiro having to be restarted.

Content Negotiation

Shapiro will use the Accept header of the GET request for a schema to determine the mime type of its response, independent of the format in which Shapiro holds the schema on its file system:

| Request Accept Header | Implementation Status |
| --- | --- |
| application/ld+json | implemented |
| text/turtle | implemented |
| text/html | implemented |
| application/schema+json | implemented |
| application/json | implemented (will return JSON-Schema) |

If no accept header is specified, Shapiro will assume application/schema+json as default, because many JSON-SCHEMA processors/validators do not properly set the accept header when resolving $ref URLs.
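This content negotiation can be exercised from any HTTP client. A minimal Python sketch (the base URL and the schema name myontology are assumptions, not part of Shapiro):

```python
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: a locally running Shapiro instance

def schema_request(name: str, mime: str) -> urllib.request.Request:
    """Build a GET request for a schema, selecting the serialization
    (Turtle, JSON-LD, HTML or JSON-Schema) via the Accept header."""
    return urllib.request.Request(f"{BASE_URL}/{name}", headers={"Accept": mime})

# urllib.request.urlopen(schema_request("myontology", "text/turtle")) would
# return the Turtle serialization; omitting the Accept header yields
# application/schema+json by default.
```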

Integration with OpenAPI & JSON-Schema

Shapiro converts SHACL NodeShapes into JSON-Schema and thereby integrates with JSON-Schema validation. Based on this, you can use the semantic data models served by Shapiro in your OpenAPI definitions (by way of $ref). An end-to-end example based on this OpenAPI tutorial can be found in test/openapi/tutorial.yaml, with the corresponding semantic model at test/openapi/tutorial/artist.ttl.
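Such a reference might look as follows in an OpenAPI definition (a hedged sketch; the server URL, path and shape name are illustrative assumptions, not taken from the tutorial):

```yaml
paths:
  /artists:
    post:
      requestBody:
        content:
          application/json:
            schema:
              # resolves to the JSON-Schema rendering of a SHACL NodeShape
              # served by a (hypothetical) Shapiro instance
              $ref: 'http://localhost:8000/myontology/ArtistShape'
```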

Markdown in RDFS Comments/SKOS Definitions/DCT Descriptions

When rendering for mime type text/html, Shapiro will render markdown in RDFS comments, SKOS definitions and DCT descriptions for improved readability of the documentation.

No URL fragments

Shapiro is opinionated about URL fragments for referring to terms in a schema: it plainly does not support them (here's why). So when writing your schema (a.k.a. model, vocabulary, ontology), please ensure you refer to the individual terms it defines using a regular forward slash, e.g. http://myserver.com/myontology/term instead of http://myserver.com/myontology#term.

Hierarchical Namespaces

Shapiro allows you to keep schemas/ontologies in arbitrary namespace hierarchies, simply by reflecting namespaces as a directory hierarchy. This lets organizations separate their schemas/ontologies across a hierarchical namespace, avoid name clashes, and maintain a more relaxed governance around the various ontologies/schemas across a collaborating community. The assumption is that you manage your schemas/ontologies in a code repository (GitHub, etc.) and manage releases from there onto a Shapiro instance serving these schemas in a specific environment (dev/test/prod).
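For example, a content directory laid out like this (hypothetical organization and schema names; the exact URL form Shapiro derives is an assumption):

```
content_dir/
  com/
    example/
      sales/
        customer.ttl     ->  http://<your domain>/com/example/sales/customer
      finance/
        invoice.ttl      ->  http://<your domain>/com/example/finance/invoice
```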

Querying the combined Graph of all Schemas served by Shapiro

Shapiro keeps the complete graph of all schemas combined in memory. This graph can be queried via a POST request to the /query API, which takes a SPARQL query (no updates) in the request body. That way you can query and mine the combined graph of all models.
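Querying /query could look as follows; a minimal sketch assuming a locally running instance (the request is only built here, and would be sent with urlopen against a live server):

```python
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: a locally running Shapiro instance

# Example query: list all SHACL NodeShapes across every schema Shapiro serves.
SPARQL = """
PREFIX sh: <http://www.w3.org/ns/shacl#>
SELECT ?shape WHERE { ?shape a sh:NodeShape }
"""

def query_request(sparql: str) -> urllib.request.Request:
    """Build a POST request carrying the SPARQL query in the request body."""
    return urllib.request.Request(
        f"{BASE_URL}/query",
        data=sparql.encode("utf-8"),
        method="POST",
    )

# urllib.request.urlopen(query_request(SPARQL)) would return the query results.
```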

Searching Shapiro

Shapiro uses Whoosh full-text search to index all schemas it serves. Shapiro regularly checks for modified or new schemas in its content directory and indexes them.

Shapiro UI

Shapiro provides a minimal UI available at /welcome/. Any GET request to / without a schema name to retrieve will also redirect to the UI. The UI lists all schemas served by Shapiro at a given point in time and allows full-text search of schema content. The Shapiro UI also renders models/schemas/ontologies as HTML.

Writing Semantic Models to be served by Shapiro

Given the number of ways to use ontologies and vocabularies for your models, Shapiro can't anticipate them all. While I'm trying to keep Shapiro as open as possible, and while Shapiro can serve any kind of ontology or vocabulary, HTML rendering and JSON-Schema rendering of models work best if you keep the following in mind:

  • Use RDFS for modelling your classes and properties. HTML rendering works best with this vocabulary.
  • Use RDFS labels that are acceptable object names or property names in programming languages (specifically when you use JSON-Schema and OpenAPI in conjunction with schemas hosted by Shapiro).
  • JSON-Schema conversion requires your model to define NodeShapes with the appropriate SHACL properties and constraints. Shapiro will render empty schemas if you ask for the JSON-Schema of a plain RDFS class.
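A minimal sketch of what such a model might look like in Turtle (hypothetical namespace and names; note the slash-based IRIs required above, and the NodeShape accompanying the RDFS class):

```turtle
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://myserver.com/myontology/> .

ex:Person a rdfs:Class ;
    rdfs:label "Person" ;
    rdfs:comment "A person. Supports *markdown* in comments." .

ex:PersonShape a sh:NodeShape ;
    sh:targetClass ex:Person ;
    sh:property [
        sh:path ex:name ;
        sh:datatype xsd:string ;
        sh:minCount 1 ;
    ] .
```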

Installing Shapiro

  1. Clone the Shapiro repository.
  2. Install dependencies: pip install -r requirements.txt

Running Shapiro

  1. Run the Shapiro server: python shapiro_server.py with command-line parameters as per the parameter reference.
  2. Access the UI at http://localhost:8000/welcome/
  3. Access the API docs at http://localhost:8000/docs
  4. Try curl -X 'GET' 'http://localhost:8000/<SCHEMANAME HERE>' -H 'accept: application/ld+json' to get JSON-LD from a schema in the content dir.
  5. Try curl -X 'GET' 'http://localhost:8000/<SCHEMANAME HERE>' -H 'accept: text/turtle' to get Turtle from a schema in the content dir.

Commandline Parameter Reference

| Parameter | Description |
| --- | --- |
| --host | The host for uvicorn to use. Defaults to 127.0.0.1. |
| --port | The port for the server to receive requests on. Defaults to 8000. |
| --domain | The domain that Shapiro uses to build its BASE_URL. Defaults to 127.0.0.1:8000 and is typically set to the domain name under which you deploy Shapiro. This is what Shapiro uses to ensure schemas are rooted on its server, to build links in the HTML docs, and to resolve static resources in HTML renderings. Include the port if needed. Examples: --domain schemas.myorg.com, --domain schemas.myorg.com:1234 |
| --content_dir | The content directory to be used. Defaults to ./. If you specify parameters for a GitHub user and repo, this is the path of the content directory relative to the repository's root directory. |
| --log_level | The log level to run with. Defaults to info. |
| --default_mime | The mime type to use if the requested mime type in the accept header is not available or usable. Defaults to text/turtle. |
| --index_dir | The directory where Shapiro stores the full-text-search indices. Defaults to ./fts_index. |
| --ssl_keyfile, --ssl_certfile, --ssl_ca_certs | If these are set, Shapiro uses SSL. No defaults. |
| --github_user, --github_repo | If these are set, Shapiro serves schemas from the content dir in this repo. |
| --github_branch | Set this to use a specific branch in your GitHub repo (if the GitHub repo and user parameters are specified). Defaults to the GitHub repo's default branch. |
| --github_token | The access token for the GitHub repo (if the GitHub repo and user parameters are specified). If no value is specified, no authentication is used with GitHub (which limits the number of requests that can be made through the API). |

Make sure you run python shapiro_server.py --help for a full reference of command-line parameters (host, port, domain, content dir, log level, default mime type, index directory and, if needed, SSL parameters).

shapiro's People

Contributors

guscht, mathiasrichter


shapiro's Issues

Refactor (remove global var references and split into proper classes)

The functions for resolving the request for a schema directly reference global variables (CONTENT_DIR, SUPPORTED_SUFFIXES, etc.). These references need to be removed by providing the inputs in the function signatures.

Once done, it should be easy to refactor the collection of functions in the current implementation into a small set of classes.

JSON-SCHEMA generation: ensure proper handling of (RDFS) inheritance

Currently, the JSON-SCHEMA renderer of Shapiro does not properly consider inheritance (as per RDFS reasoning).

This means in essence that Shapiro needs to traverse the inheritance hierarchy of the shacl:targetClass and include all NodeShapes with their SHACL property definitions.

Need to improve how Shapiro deals with large models

Models can contain actual instance data (e.g. a model defines a currency class and a closed list of instances to reflect the collection of accepted currencies).

Sometimes the list of instances can become quite large. The fact that Shapiro parses the serialized model from a file in the file system every time a model or a model element is requested can negatively impact performance.

Need to think about a way to cache the graphs for larger models in memory to improve response times...

Render ontologies as UML class diagrams

In practice, people understand models better if there is some notion of a UML-style class diagram. For practical examples, see the documentation for standard ontologies like DCAT, Time or Organization: they all include UML-style diagrams as an overview of the model.

Need to amend the HTML rendering to include such class diagrams.

SHACL to JSON-schema conversion

Add support for converting the NodeShapes defined in a model into JSON-Schema, so models/model elements can be directly referenced in OpenAPI/Swagger definitions without breaking the OpenAPI tooling.
