
Comments (7)

benlongstaff commented on July 25, 2024

Encouraging Core Schema use

If developers make use of application schemas instead of core schemas this may reduce the utility of the data.

For example, when compared against Australian drivers' licences from different states, the drivers licence schema is missing fields for:

  • middle names
  • date of birth
  • effective date (QLD only)
  • licence class
  • conditions
  • card number
  • expiry date
  • organ donor
  • address

These fields could be stored in the extras object; however, because the schema does not enforce them, implementations could differ (see the sketch after this list). For example:

  • Storing whether a driver is an organ donor with the key "donor" or "organDonor"
  • Storing the value as Y/N, true / false, 0/1
  • Omitting the field if the driver is not registered as a donor
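A minimal sketch of that inconsistency in JavaScript (the records and field names are hypothetical):

// Hypothetical records from two applications storing the same fact in "extras"
const recordA = { name: "Jane Citizen", extras: { organDonor: true } }
const recordB = { name: "Jane Citizen", extras: { donor: "Y" } }

// A consumer now needs per-application logic to read a single fact
function isOrganDonor(record) {
  if ("organDonor" in record.extras) return record.extras.organDonor === true
  if ("donor" in record.extras) return record.extras.donor === "Y"
  return false // field omitted: not a donor, or simply unknown?
}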

If we have multiple KYC providers that can issue verified drivers licence data, but some use application schemas and others use the core schemas, this increases the validation work app developers need to do.

Sync conflict user story
As a user I want to be able to sync data back to my vault. I use two applications that both make use of the core schema for drivers licences.

  1. I sign up to a hire car company that requires I do KYC with Identity Provider 1
  2. I complete my KYC with Identity Provider 1 and they provide signed data back to the hire car company.
  3. I don't have a verified drivers licence record in my Verida Vault, so I choose to share the data back from the hire car app to my Verida Vault.
  4. The hire car application relies on my licence class being present to function correctly.
  5. The hire car application wants to receive updates to my address if I am issued a new licence.
  6. My licence is revoked; some time passes and I change address.
  7. I get a new licence issued with a different class.
  8. I sign up to an exchange that does not accept credentials from Identity Provider 1
  9. I complete KYC with Identity Provider 2 and they provide signed data back to the exchange that does not include my licence class.
  10. I sync the drivers licence record back to my Verida Vault so that I have an up to date record.
  11. My Vault syncs the updated drivers licence data to the hire car company's datastore; the address is updated but the licence class is missing (see the sketch below).
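A minimal sketch of the records involved (the schema URL and field names are illustrative only):

// Record issued via Identity Provider 1 (illustrative)
const fromProvider1 = {
  schema: "https://schemas.verida.io/identity/licence",
  licenceNumber: "12345678",
  licenceClass: "C",
  address: "1 Old Street"
}

// Record issued via Identity Provider 2 (illustrative) -- no licence class
const fromProvider2 = {
  schema: "https://schemas.verida.io/identity/licence",
  licenceNumber: "12345678",
  address: "2 New Street"
}

// After the sync, the hire car app sees the new address but licenceClass is gone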

This creates a difficult problem for developers as the number of Identity Providers increases, since they need to check whether and how the data each provider issues stores the fields they require.

Comments
Should our core schemas have the equivalent of interfaces, abstract classes and implementations? Maybe with a structure like:

schemas/identity/licence/interface/v1.json 
schemas/identity/licence/{country}/v1.json 
schemas/identity/licence/{country}/{state}/v1.json

Then developers just need to select the core schemas they support, rather than assessing which application schemas are compatible or which applications store the data in the extras object. This is similar to schemastore providing schemas for specific plugins that multiple editors support.
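A rough sketch of how the layered structure could work, assuming the state schema composes the interface schema with JSON Schema allOf (all paths and fields below are illustrative, shown as JavaScript object literals):

// Content of schemas/identity/licence/interface/v1.json (illustrative)
const licenceInterface = {
  "$id": "https://schemas.verida.io/identity/licence/interface/v1.json",
  "type": "object",
  "properties": {
    "licenceNumber": { "type": "string" },
    "expiryDate": { "type": "string", "format": "date" }
  },
  "required": ["licenceNumber", "expiryDate"]
}

// Content of schemas/identity/licence/au/qld/v1.json (illustrative)
const qldLicence = {
  "$id": "https://schemas.verida.io/identity/licence/au/qld/v1.json",
  "allOf": [{ "$ref": "https://schemas.verida.io/identity/licence/interface/v1.json" }],
  "properties": {
    "effectiveDate": { "type": "string", "format": "date" },
    "organDonor": { "type": "boolean" }
  }
}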

There's a security risk where a schema is specified by URL and the schema hosted at that URL is later modified (or the hosting provider is hacked) to serve different content. For example, modifying the schema to remove the list of required fields would allow invalid data to be saved across the network.

Perhaps

schemas/identity/licence/schema.json 

could be split to include application schemas that have been registered as part of the trust framework process, to reduce the level of review developers need to do when incorporating them:

schemas/core/identity/licence/schema.json
schemas/application/identity/licence/{APP_DID}/schema.json

Applications are also free to self-host their schemas, which could trigger something similar to the browser warning you get with a self-signed root certificate.

Patchy data

Birth certificates have a variety of data recorded. Some have extra data like

  • witnesses (medical staff / other)
  • previous children
  • marriage of parents (some with year and place)

Should we include validation logic, like the max length of a last name, where there may be outliers?

To discourage developers from rolling their own application schema, should we include fields that are common on forms and are allowed to be empty, like:

  • mother's maiden name
  • parents' given names, date of birth, place of birth, occupation or address
  • registration number, registration date, registrar, certification date

Should future versions have more ontology in the schemas, like schema.org has, so that DIDs can be linked? In the case of a birth certificate, linking to the parents' DIDs?


tahpot commented on July 25, 2024

Additional relevant content:

JSON Schema Slack

I have reached out on multiple occasions and there does not appear to be a standardized way of versioning JSON schemas.

Versioning

Examples of versioning schemas:

Overlay Capture Architecture offers a novel approach that appears to have a lot of effort behind it in the existing DIF working groups. This may be a good long-term solution, but it is still in its early stages.

Migrating of data

As an alternative, Tan and Katayama (1989) propose a lazy mechanism for converting data in a database to the current version only when required.

source

This lazy approach works well in a decentralized environment: the user's database is updated when it is opened.
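A minimal sketch of that lazy approach, assuming a hypothetical datastore API with getMany() and save(), a schemaVersion field on each record, and an upgradeRecord() migration function supplied by the application:

// Hypothetical sketch: upgrade out-of-date records when the database is opened
async function openAndUpgrade(datastore, latestVersion, upgradeRecord) {
  const records = await datastore.getMany({})
  for (const record of records) {
    if (record.schemaVersion !== latestVersion) {
      // convert the record from its stored version to the latest version
      const upgraded = upgradeRecord(record, latestVersion)
      await datastore.save(upgraded)
    }
  }
}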

This still leaves unresolved how schemas are migrated from one version to another.


tahpot commented on July 25, 2024

Here is a rough outline of a proposed solution.

Overview

  • There are a set of core schemas, but they are kept to a minimum (ie: base schema used by all applications)
  • Applications are responsible for designing their own schemas, versioning them and handling data migrations. This reduces the scope that Verida is responsible for; however, Verida can take an active role in the future to support collaboration initiatives.
  • Verida maintains schemas for the core and also for any applications it manages (such as the Verida Vault)
  • Schemas can be registered in the Verida Trust Framework (when available) to provide security and verification of schemas. This provides an on-chain hash of each schema version which the Verida Client SDK uses to verify schemas have not been tampered with.
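A minimal sketch of the verification described in the last point, assuming Node's crypto module and a hypothetical trustFramework.getSchemaHash() lookup:

import { createHash } from "crypto"

// Hypothetical sketch: verify a downloaded (versioned) schema file against
// the hash registered in the Verida Trust Framework
async function verifySchema(versionedSchemaUrl, trustFramework) {
  const response = await fetch(versionedSchemaUrl)
  const schemaText = await response.text()

  const localHash = createHash("sha256").update(schemaText).digest("hex")
  const registeredHash = await trustFramework.getSchemaHash(versionedSchemaUrl) // assumed API

  if (localHash !== registeredHash) {
    throw new Error("Schema does not match its registered hash: " + versionedSchemaUrl)
  }
  return JSON.parse(schemaText)
}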

Versioning

  • Schemas are versioned with a logical structure similar to the liferay example
  • Schema files are in the format:
  • https://example.xyz/<schema/name>/schema.json -- The latest, most up-to-date version of a schema
  • https://example.xyz/<schema/name>/v1.0.1.json -- The latest, most up-to-date version of a schema also stored as a versioned file (this is the file that is hashed and stored in the Verida Trust Framework)
  • https://example.xyz/<schema/name>/v1.0.0.json -- An archive of a previous schema version
  • Schemas have an $id property that represents the full versioned URI of the schema (ie .../v1.0.1.json)
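For example, the content of a versioned schema file might look like the following (shown as a JavaScript object literal; the schema name and fields are illustrative):

// Content of https://example.xyz/<schema/name>/v1.0.1.json (illustrative)
const versionedSchema = {
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "https://example.xyz/social/contact/v1.0.1.json", // full versioned URI
  "title": "Contact",
  "type": "object",
  "properties": {
    "name": { "type": "string" }
  },
  "required": ["name"]
}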

Data storage

  • Data is stored with two properties:
  • schema: The full path (without version information) to the latest schema (ie: https://example.xyz/<schema/name>/schema.json)
  • schemaVersion: The semantic version number representing the schema version used for the data record (ie: v1.0.1)

This approach ensures the schema property never changes, which greatly simplifies data queries, while the versioning information is always available, allowing data records to be "upgraded" to the latest schema version on a per-application basis.
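An illustrative record stored with these two properties:

// Illustrative record using the schema / schemaVersion properties above
const record = {
  schema: "https://example.xyz/social/contact/schema.json", // unversioned path
  schemaVersion: "v1.0.1", // version in use when the record was written
  name: "Jane Citizen"
}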

Data migration

Initially it will be up to each application to detect data that is an old version and manually run code to update it to the latest version.

In the future, Verida can provide tools in the Client SDK to simplify this upgrade process, enforce data upgrades and potentially leverage the Trust Framework to distribute trusted data migration logic.
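A minimal sketch of what such application-level migration code could look like (the field rename is entirely hypothetical):

// Hypothetical migration from schema v1.0.0 to v1.0.1, where v1.0.1
// renames a (hypothetical) "phone" field to "mobile"
function migrateContact(record) {
  if (record.schemaVersion !== "v1.0.0") {
    return record // nothing to do
  }
  const { phone, ...rest } = record
  return {
    ...rest,
    mobile: phone,
    schemaVersion: "v1.0.1"
  }
}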


tahpot commented on July 25, 2024

Naming convention

https://schemas.verida.io/contact/latest/schema.json -> https://schemas.verida.io/contact/v0.2.0/schema.json
https://schemas.verida.io/contact/v0.1.0/schema.json
https://schemas.verida.io/contact/v0.2.0/schema.json

Example

{
  name: "Nick",
  schema: {
    type: "https://schemas.verida.io/contact",
    version: "0.1.0"
  }
}

// Gives me all records regardless of version
contacts = verida.openDatastore("https://schemas.verida.io/contact")

// query
db.getMany({
  "schema.type": "https://schemas.verida.io/contact"
})

// Gives me v0.1.0 records
contacts = verida.openDatastore({
  schema: "https://schemas.verida.io/contact",
  version: "0.1.0"
})

// query
db.getMany({
  "schema.type": "https://schemas.verida.io/contact",
  "schema.version": "0.1.0"
})

Some rules:

  • Versioning rules will be hard coded (today)
  • If an app has an old version of the schema, the app should still work even if there's a new version of the schema
  • If data moves between apps (context1 -> context2), there are some more rigid checks on data versions and data movement may be rejected (future work)


tahpot commented on July 25, 2024

After a discussion with an external JSON schema expert we have agreed on some key points.

Version upgrades

Our key priority is to ensure that minor schema upgrades are backwards compatible. The protocol needs to ensure this is possible without needing to update the version of all schemas across the network or within a particular application context.

For example, it should be possible to add a new field to the base schema that doesn't break previous base schemas, and new applications can start using that field without defining any new schemas of their own.
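A sketch of such a backwards-compatible minor upgrade, assuming the base schema simply gains a new optional property (the field names and versions below are illustrative):

// Base schema v1.0.0 (illustrative)
const baseV1_0_0 = {
  "type": "object",
  "properties": {
    "name": { "type": "string" }
  },
  "required": ["name"]
}

// Base schema v1.1.0 adds an optional "icon" field. Records that were valid
// against v1.0.0 remain valid against v1.1.0 because the new field is not required.
const baseV1_1_0 = {
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "icon": { "type": "string", "format": "uri" }
  },
  "required": ["name"]
}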

Schema ecosystem

In regards to allowing schemas to organically be defined by different projects, our objective is to support the following:

  • Verida core schemas: Schemas core to all applications (ie: base schema)
  • Verida common schemas: Schemas that are useful to many applications, but not essential (ie: credential schema)
  • Verida vault schemas: Schemas specifically supported by the Verida Vault

Verida will be responsible for maintaining those schemas, merging appropriate PRs from third parties, and handling any data migration between schema versions with custom code in the Verida Vault for all end users.

Other application developers will build their own schema definitions which may or may not leverage the Verida supplied schemas. At any time, multiple third parties can collaborate to define their own industry standard schema definitions.

Application developers will be responsible for any data migration required by upgrades from one schema version to another. In the future, tools may be developed by Verida or the community to help simplify or standardise this data migration process.

Cambraia is an interesting (experimental) TypeScript library designed to help convert data between JSON schemas. It may be possible to leverage this for supporting data migrations in the future.

Schema extensibility

An idea was floated where schemas could become extensible: for example, instead of a record belonging to a single schema (via the schema property), a record could belong to multiple schemas.

In this way an application could extend a common schema with its own properties and other applications would have the option to support or not support that particular extension. As the extension would be a supplementary JSON schema file, each record would end up with its own schema that can be validated.
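One possible shape for such a record, with a hypothetical schemas array in place of the single schema property:

// Hypothetical record belonging to more than one schema
const record = {
  schemas: [
    "https://schemas.verida.io/contact/v0.2.0/schema.json", // common schema
    "https://example.xyz/myapp/contact-extension/v1.0.0.json" // application extension
  ],
  name: "Jane Citizen",
  favouriteColour: "green" // property defined only by the extension schema
}
// Validation could conceptually apply allOf across the listed schemas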

This is an interesting idea that could be explored in a future version of the protocol, but isn't anticipated to be supported in version 1.


tahpot commented on July 25, 2024

I've just realised there is one issue we will have with signatures.

If a piece of data has been signed by a third party and we perform a future migration of that data, we will change the schema property which in turn will break the signature.

Options:

  1. Ignore the problem
  2. Modify the signature scheme to not include any schema metadata in the signed data
  3. Modify the signature scheme to retain the base schema URL, but drop all version information
  4. Sign two versions of the data: 1) "as is", 2) with schema info dropped

I'm in favour of (3) as it seems semantically correct and ensures signatures don't unexpectedly break; however, I'm concerned that it adds some complexity and potential confusion.
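A minimal sketch of option (3), assuming records use the schema / schemaVersion properties proposed earlier and the object returned below is what actually gets signed:

// Hypothetical sketch of option (3): the signed payload keeps the unversioned
// schema URL but drops all version information
function buildSigningPayload(record) {
  const payload = { ...record }
  // payload.schema (the unversioned URL) is retained; dropping schemaVersion
  // means a future migration that only bumps the version does not break the signature
  delete payload.schemaVersion
  return payload
}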

Data is versioned, so it would be possible to still recover the previously signed data with an older version and correct signature. On that basis (1) isn't awful.


tahpot commented on July 25, 2024

Added issue to resolve schema versioning issue with signatures: #110

Can now close this issue.

