Comments (7)
Encouraging Core Schema use
If developers make use of application schemas instead of core schemas, this may reduce the utility of the data.
For example, when compared against Australian drivers' licences from different states, the drivers licence schema is missing fields for:
- middle names
- date of birth
- effective date (QLD only)
- licence class
- conditions
- card number
- expiry date
- organ donor
- address
These fields could be stored in the extras object, but because the schema does not enforce them, implementations could differ. For example:
- Storing whether a driver is an organ donor with the key "donor" or "organDonor"
- Storing the value as Y/N, true/false, or 0/1
- Omitting the field if the driver is not registered as a donor
If we have multiple KYC providers that can issue verified drivers licence data, but some use application schemas and others use the core schemas, this could increase the validation work app developers need to do.
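To illustrate the validation burden, a consuming app that has to reconcile divergent extras-object representations of the organ donor field might end up with normalisation code like the following (all key names and value formats here are hypothetical):

```javascript
// Hypothetical illustration: three providers store "organ donor" differently
// in the unenforced extras object, so a consuming app must normalise.
const fromProviderA = { extras: { donor: "Y" } };
const fromProviderB = { extras: { organDonor: true } };
const fromProviderC = { extras: {} }; // field omitted when not a donor

// Normalise the known variants to a single boolean.
function isOrganDonor(record) {
  const extras = record.extras || {};
  const raw = extras.donor ?? extras.organDonor;
  if (raw === undefined) return false; // treat omission as "not a donor"
  if (typeof raw === "boolean") return raw;
  if (typeof raw === "number") return raw === 1;
  return ["y", "yes", "true", "1"].includes(String(raw).toLowerCase());
}

console.log(isOrganDonor(fromProviderA)); // true
console.log(isOrganDonor(fromProviderB)); // true
console.log(isOrganDonor(fromProviderC)); // false
```

Every new provider potentially adds another variant to this function, which is exactly the per-provider review work a shared core schema would avoid.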
Sync conflict user story
As a user I want to be able to sync data back to my vault. I use two applications that both make use of the core schema for drivers licences.
- I sign up to a hire car company that requires I do KYC with Identity Provider 1
- I complete my KYC with Identity Provider 1 and they provide signed data back to the hire car company.
- I don't have a verified drivers licence record in my Verida Vault so I select to share the data back from the hire car app to my Verida Vault.
- The hire car application relies on my licence class being present to function correctly.
- The hire car application wants to receive updates to my address if I am issued a new license.
- My licence is revoked. Some time passes and I change my address.
- I get a new licence issued with a different class.
- I sign up to an exchange that does not accept credentials from Identity Provider 1
- I complete KYC with Identity Provider 2 and they provide signed data back to the exchange that does not include my licence class.
- I sync the drivers licence record back to my Verida Vault so that I have an up to date record.
- My Vault syncs the updated drivers licence data to the hire car company's datastore; the address is updated but the licence class is missing.
This creates a difficult problem for developers: as the number of Identity Providers increases, they need to check whether, and how, the data each provider issues stores the fields they require.
Comments
Should our core schemas have the equivalent of interfaces, abstract classes and implementations? Maybe with a structure like:
schemas/identity/licence/interface/v1.json
schemas/identity/licence/{country}/v1.json
schemas/identity/licence/{country}/{state}/v1.json
Then developers just need to select the core schemas they support, rather than assessing which application schemas are compatible or which applications store the data in the extras object. This is similar to SchemaStore providing schemas for specific plugins that multiple editors support.
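The interface/implementation idea above could be sketched with plain JSON Schema allOf composition. The paths, field names and resolver below are illustrative only, not a proposed API:

```javascript
// Sketch: a state schema extends an interface schema via allOf, so the
// fields it ultimately requires are the union of its own required list
// and its parent's. All $id values and fields here are hypothetical.
const interfaceSchema = {
  $id: "schemas/identity/licence/interface/v1.json",
  type: "object",
  required: ["givenName", "familyName", "dateOfBirth", "licenceClass", "expiryDate"]
};

const qldSchema = {
  $id: "schemas/identity/licence/au/qld/v1.json",
  allOf: [{ $ref: "schemas/identity/licence/interface/v1.json" }],
  required: ["effectiveDate", "cardNumber"] // QLD-specific additions
};

// Minimal resolver: walk allOf references and union the required lists.
function requiredFields(schema, registry) {
  const parents = (schema.allOf || [])
    .map((s) => (s.$ref ? registry[s.$ref] : s))
    .flatMap((s) => requiredFields(s, registry));
  return [...new Set([...parents, ...(schema.required || [])])];
}

const registry = { [interfaceSchema.$id]: interfaceSchema };
console.log(requiredFields(qldSchema, registry));
```

An app that only depends on the interface fields could then accept data from any state implementation without inspecting each one.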
There's a security risk where a schema is referenced by URL and the hosted file is later modified (or the hosting provider hacked) to serve different content at the same URL. For example, modifying the schema to remove the list of required fields, allowing invalid data to be saved across the network.
Perhaps
schemas/identity/licence/schema.json
could be split to include application schemas that have been registered as part of the trust framework process, reducing the level of review developers need to do when incorporating them:
schemas/core/identity/licence/schema.json
schemas/application/identity/licence/{APP_DID}/schema.json
Applications are also free to self-host their schemas, which could trigger something similar to the browser warning you get with a self-signed root certificate.
Patchy data
Birth certificates have a variety of data recorded. Some have extra data such as:
- witnesses (medical staff / other)
- previous children
- marriage of parents (some with year and place)
Should we include validation logic, like a maximum length for a last name, where there may be outliers?
To discourage developers from rolling their own application schemas, should we include fields that are common on forms but allowed to be empty, like:
- mother's maiden name
- parents given names, date of birth, place of birth, occupation or address
- registration number, registration date, registrar, certification date
Should future versions have more ontology in the schemas, like schema.org has, so that DIDs can be linked? In the case of a birth certificate, linking to the parents' DIDs?
from verida-js.
Additional relevant content:
JSON Schema Slack
I have reached out on multiple occasions and there does not appear to be a standardized way of versioning JSON schemas.
Versioning
Examples of versioning schemas:
- https://help.liferay.com/hc/en-us/articles/360017889792-Meaningful-Schema-Versioning
- https://www.itwinjs.org/bis/intro/schema-versioning-and-generations/
- https://semver.org/ (Versioning of software, not schemas, but closely related)
Overlay Capture Architecture offers a novel approach that appears to have a lot of effort behind it in the existing DIF working groups. This may be a good long-term solution, but it is still in its early stages.
Migrating of data
As an alternative, Tan and Katayama (1989) propose a lazy mechanism for converting data in a database to the current version only when required.
This lazy approach works well in a decentralized environment: the user's database is updated when it is opened.
It still leaves unresolved how schemas are migrated from one version to another.
from verida-js.
Here is a rough outline of a proposed solution.
Overview
- There are a set of core schemas, but they are kept to a minimum (ie: the base schema used by all applications)
- Applications are responsible for designing their own schemas, versioning them and handling data migrations. This reduces the scope that Verida is responsible for, however Verida can take an active role in the future to support collaboration initiatives.
- Verida maintains schemas for the core and also for any applications it manages (such as the Verida: Vault)
- Schemas can be registered in the Verida Trust Framework (when available) to provide security and verification of schemas. This provides an on-chain hash of each schema version which the Verida Client SDK uses to verify schemas have not been tampered with.
Versioning
- Schemas are versioned with a logical structure similar to the Liferay example
- Schema files are in the format:
  - https://example.xyz/<schema/name>/schema.json: the latest, most up-to-date version of a schema
  - https://example.xyz/<schema/name>/v1.0.1.json: the latest, most up-to-date version of a schema, also stored as a versioned file (this is the file that is hashed and stored in the Verida Trust Framework)
  - https://example.xyz/<schema/name>/v1.0.0.json: an archive of a previous schema version
- Schemas have an $id property that represents the full versioned URI of the schema (ie: .../v1.0.1.json)
Data storage
- Data is stored with two properties:
  - schema: the full path (without version information) to the latest schema (ie: https://example.xyz/<schema/name>/schema.json)
  - schemaVersion: the semantic version number representing the schema version used for the data record (ie: v1.0.1)
This approach ensures the schema property never changes, which greatly simplifies data queries, while the versioning information is always available, allowing data records to be "upgraded" to the latest schema version on a per-application basis.
Data migration
Initially it will be up to each application to detect data that is an old version and manually run code to update it to the latest version.
In the future, Verida can provide tools in the Client SDK to simplify this upgrade process, enforce data upgrades and potentially leverage the Trust Framework to distribute trusted data migration logic.
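The per-application upgrade step could look something like the following sketch, assuming the schema/schemaVersion record layout described above. The field rename, versions and migration table are illustrative, not part of any SDK:

```javascript
// Sketch: walk a record forward through a table of per-version migrations
// until it reaches the latest schema version. All names are hypothetical.
const LATEST = "1.0.1";

const migrations = {
  // v1.0.0 -> v1.0.1: hypothetical rename of `phone` to `mobile`
  "1.0.0": (record) => {
    const { phone, ...rest } = record;
    return { ...rest, mobile: phone, schemaVersion: "1.0.1" };
  }
};

function upgrade(record) {
  let current = record;
  while (current.schemaVersion !== LATEST) {
    const step = migrations[current.schemaVersion];
    if (!step) throw new Error(`No migration from ${current.schemaVersion}`);
    current = step(current);
  }
  return current;
}

const old = {
  schema: "https://example.xyz/contact/schema.json",
  schemaVersion: "1.0.0",
  phone: "0400 000 000"
};
console.log(upgrade(old).mobile); // "0400 000 000"
console.log(upgrade(old).schemaVersion); // "1.0.1"
```

Running this lazily, when a datastore is opened, matches the Tan and Katayama approach discussed earlier in the thread.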
from verida-js.
Naming convention
https://schemas.verida.io/contact/latest/schema.json -> https://schemas.verida.io/contact/v0.2.0/schema.json
https://schemas.verida.io/contact/v0.1.0/schema.json
https://schemas.verida.io/contact/v0.2.0/schema.json
Example
{
  name: "Nick",
  schema: {
    type: "https://schemas.verida.io/contact",
    version: "0.1.0"
  }
}
// Gives me all records regardless of version
contacts = verida.openDatastore("https://schemas.verida.io/contact")
// query
db.getMany({
"schema.type": "https://schemas.verida.io/contact"
})
// Gives me v0.1.0 records
contacts = verida.openDatastore({
  schema: "https://schemas.verida.io/contact",
  version: "0.1.0"
})
// query
db.getMany({
"schema.type": "https://schemas.verida.io/contact",
"schema.version": "0.1.0"
})
Some rules:
- Versioning rules will be hard coded (today)
- If an app has an old version of the schema, the app should still work even if there's a new version of the schema
- If data moves between apps (context1 -> context2), there are some more rigid checks on data versions, and data movement may be rejected (future work)
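The "app with an old schema should still work" rule could be hard coded as a semantic-version major check, along these lines (a sketch only; the real rules may differ):

```javascript
// Sketch of a hard-coded compatibility rule, assuming semantic versioning:
// a record is readable if it shares a major version with the schema the app
// was built against, on the assumption that minor/patch upgrades are
// backwards compatible.
function isCompatible(appVersion, recordVersion) {
  const major = (v) => v.replace(/^v/, "").split(".")[0];
  return major(appVersion) === major(recordVersion);
}

console.log(isCompatible("0.1.0", "0.2.0")); // true  - minor upgrade only
console.log(isCompatible("0.1.0", "1.0.0")); // false - breaking change
```

The stricter cross-context checks mentioned above could then layer additional rules on top of this baseline.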
from verida-js.
After a discussion with an external JSON schema expert we have agreed on some key points.
Version upgrades
Our key priority is to ensure that minor schema upgrades are backwards compatible. The protocol needs to ensure this is possible without needing to update the version of all schemas across the network or within a particular application context.
For example: it should be possible to add a new field to the base schema that doesn't break previous base schemas, and new applications can start using that field without defining any new schemas of their own.
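This intent can be shown with a toy check (the schemas below are illustrative, not the real base schema): a minor upgrade adds an optional field without touching the required list, so records written against the old version remain valid.

```javascript
// Sketch: v1.1.0 adds an optional `country` property but leaves `required`
// untouched, so a backwards-compatible minor upgrade never invalidates
// existing records. Schemas and fields here are hypothetical.
const v1_0_0 = { required: ["name"], properties: { name: {} } };
const v1_1_0 = { required: ["name"], properties: { name: {}, country: {} } };

// Minimal "required fields present" check in place of a full validator.
const satisfies = (schema, record) =>
  schema.required.every((field) => field in record);

const oldRecord = { name: "Nick" }; // written against v1.0.0
console.log(satisfies(v1_1_0, oldRecord)); // true - old data still valid
console.log(satisfies(v1_1_0, { name: "Nick", country: "AU" })); // true
```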
Schema ecosystem
In regards to allowing schemas to organically be defined by different projects, our objective is to support the following:
- Verida core schemas: schemas core to all applications (ie: the base schema)
- Verida common schemas: schemas that are useful to many applications, but not essential (ie: the credential schema)
- Verida vault schemas: schemas specifically supported by the Verida Vault
Verida will be responsible for maintaining those schemas, merging appropriate PRs from third parties and handling any data migration between schema versions with custom code in the Verida Vault for all end users.
Other application developers will build their own schema definitions which may or may not leverage the Verida supplied schemas. At any time, multiple third parties can collaborate to define their own industry standard schema definitions.
Application developers will be responsible for any data migration required caused by upgrades from one schema version to another schema version. In the future, tools may be developed by Verida or the community to help simplify or standardise this data migration process.
Cambraia is an interesting (experimental) TypeScript library designed to help convert data between JSON schemas. It may be possible to leverage this for supporting data migrations in the future.
Schema extensibility
An idea was floated where schemas could become extensible. For example, instead of a record belonging to a single schema (via the schema property), a record could belong to multiple schemas.
In this way an application could extend a common schema with its own properties and other applications would have the option to support or not support that particular extension. As the extension would be a supplementary JSON schema file, each record would end up with its own schema that can be validated.
This is an interesting idea that could be explored in a future version of the protocol, but isn't anticipated to be supported in version 1.
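As a sketch of how that might behave, a record could list every schema it conforms to, and each consumer would validate only the subset it understands (schema URLs, fields and the validator below are all hypothetical):

```javascript
// Sketch of the extensibility idea: a record lists multiple schemas; an app
// validates only the ones it knows about and ignores the rest.
const commonSchema = {
  $id: "https://example.xyz/contact/schema.json",
  required: ["name"]
};
const appExtension = {
  $id: "https://example.xyz/myapp/contact-ext/schema.json",
  required: ["loyaltyId"] // extension fields only this app cares about
};

const record = {
  schemas: [commonSchema.$id, appExtension.$id],
  name: "Nick",
  loyaltyId: "ABC-123"
};

// Check required fields for each listed schema the consumer recognises.
function validateKnown(record, known) {
  return record.schemas
    .filter((id) => id in known)
    .every((id) => known[id].required.every((f) => f in record));
}

// An app that only knows the common schema still accepts the record.
console.log(validateKnown(record, { [commonSchema.$id]: commonSchema })); // true
```

The interesting property is that extensions degrade gracefully: apps unaware of an extension simply skip it rather than rejecting the record.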
from verida-js.
I've just realised there is one issue we will have with signatures.
If a piece of data has been signed by a third party and we perform a future migration of that data, we will change the schema property, which in turn will break the signature.
Options:
- Ignore the problem
- Modify the signature scheme to not include any schema metadata in the signed data
- Modify the signature scheme to retain the base schema URL, but drop all version information
- Sign two versions of the data: 1) "as is", 2) with schema info dropped
I'm in favour of (3) as it seems semantically correct and ensures signatures don't unexpectedly break; however, I'm concerned that it adds some complexity and potential confusion.
Data is versioned, so it would be possible to still recover the previously signed data with an older version and correct signature. On that basis (1) isn't awful.
from verida-js.
Added issue to resolve schema versioning issue with signatures: #110
Can now close this issue.
from verida-js.