digitalbazaar / cborld

A Javascript CBOR-LD processor for web browsers and Node.js apps.

Home Page: https://digitalbazaar.github.io/cbor-ld-spec/

License: BSD 3-Clause "New" or "Revised" License

JavaScript 100.00%

jsonld cbor cborld linked-data

cborld's Issues

Add language map tests and support language tags as non-terms

Term to Code dictionary includes only applied scoped contexts

Hi,
The current term-to-code dictionary implementation includes only the scoped contexts that were actually applied, indexed in the order of application. This prevents generating static dictionaries for well-known contexts (or context sets), because the dictionary is tightly coupled to a particular document. It is also very hard to implement.

The goal is to create a dictionary mapping strings to codes.

An easy solution to implement is to collect the set of all property names (and perhaps string values as well) found in all included contexts, regardless of context order, occurrence, or usage, then sort it alphabetically, and that's it.
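A minimal sketch of that proposal, not the library's actual algorithm. The helper name `buildStaticTermDictionary`, the starting code of 100, and the step of 2 are assumptions chosen only to resemble the codes shown in other issues on this page; keywords (keys starting with `@`) are skipped.

```javascript
// Hypothetical helper: builds a document-independent term dictionary from
// already-loaded context documents. Names and numbering are assumptions.
function buildStaticTermDictionary(contexts, firstCode = 100) {
  const terms = new Set();

  const visit = ctx => {
    if(Array.isArray(ctx)) {
      ctx.forEach(visit);
      return;
    }
    if(ctx === null || typeof ctx !== 'object') {
      return; // URL strings and other scalars carry no inline terms
    }
    for(const [term, definition] of Object.entries(ctx)) {
      if(!term.startsWith('@')) {
        terms.add(term);
      }
      // descend into scoped contexts, whether or not a document applies them
      if(definition !== null && typeof definition === 'object' &&
        '@context' in definition) {
        visit(definition['@context']);
      }
    }
  };

  for(const contextDocument of contexts) {
    visit(contextDocument['@context']);
  }

  // default sort() orders by UTF-16 code units, which is deterministic
  const dictionary = new Map();
  [...terms].sort().forEach(
    (term, index) => dictionary.set(term, firstCode + 2 * index));
  return dictionary;
}
```

Because the result depends only on the set of contexts, the same dictionary comes out regardless of the order in which the contexts are listed or which terms a given document uses.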

Convert 'cborld' library to isomorphic (to run in browser)

Currently, this library is Node-only, but we need it to work in the browser (without Buffer polyfill and other stuff).

(Depends on PR #17).

We need to:

Remove the Node-only fs caching documentLoader (have it throw a 'Not Implemented' error just like the browser loader). This will remove the need for polyfills for fs, os and path.
Convert all usages of Buffer to Uint8Array usages.
(main blocker) Convert the cbor dependency to be isomorphic (to run in the browser); currently it's Node-only.
- the cbor-web variant uses Buffer polyfills; no good.
- @ellipticoin/cbor is isomorphic, but is way too basic (does not do what we need)
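For the Buffer-to-Uint8Array step, a rough sketch of the usual replacements might look like the following. The helper names are hypothetical and not part of cborld; `TextEncoder`, `TextDecoder`, and `Uint8Array` are standard in both browsers and current Node.js.

```javascript
// Replaces Buffer.from(text, 'utf8')
function utf8ToBytes(text) {
  return new TextEncoder().encode(text);
}

// Replaces buffer.toString('utf8')
function bytesToUtf8(bytes) {
  return new TextDecoder().decode(bytes);
}

// Replaces Buffer.concat([...])
function concatBytes(chunks) {
  const total = chunks.reduce((sum, chunk) => sum + chunk.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for(const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.length;
  }
  return out;
}

// Replaces buffer.toString('hex')
function bytesToHex(bytes) {
  return Array.from(bytes, b => b.toString(16).padStart(2, '0')).join('');
}
```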

Unable to retrieve documentLoader

Are we supposed to provide our own document loader as an option input to encode and decode? This line suggests that documentLoader is provided by this library, but this value appears to be undefined 🤔

Single item array could be compressed as a single value saving one byte

Hi,
just an idea to improve the compression ratio. Consider this example:

{
  "@context": {
    "type": "@type"
  }
}

{
  "@context": "...",
  "type": ["type-id1"]
}

the compressed output by this library is:

[{ 0: https...context.jsonld, 101: [type-id1] }]

101 says it's an array, and there is no need to use CBOR array marker array(1). It could be compressed as [{ 0: https...context.jsonld, 101: type-id1 }] saving one byte.

Extract embedded codec registry

Possibly house it at https://digitalbazaar.github.io/cbor-ld-spec/

Encoded byte array prefix does not match the specification

Hi,
the first three bytes generated by the library for compressed output are 0xd90501, but the specification defines 0xd95001; see Compressed CBOR-LD Buffer Algorithm.
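To see which of the two prefixes a given payload starts with, a small check along these lines could be used. This is only a sketch: the function and constant names are hypothetical, and it takes no position on which prefix is correct.

```javascript
// The two three-byte prefixes quoted in this issue.
const LIBRARY_PREFIX = Uint8Array.of(0xd9, 0x05, 0x01);
const SPEC_PREFIX = Uint8Array.of(0xd9, 0x50, 0x01);

function startsWithBytes(bytes, prefix) {
  return bytes.length >= prefix.length &&
    prefix.every((byte, index) => bytes[index] === byte);
}

// Returns 'library', 'spec', or 'unknown' for an encoded payload.
function classifyPrefix(bytes) {
  if(startsWithBytes(bytes, LIBRARY_PREFIX)) {
    return 'library';
  }
  if(startsWithBytes(bytes, SPEC_PREFIX)) {
    return 'spec';
  }
  return 'unknown';
}
```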

Add a test to check for plural `@type` and type-alias values that have type-scoped contexts

For example, a VC with multiple types, each of which brings in its own type-scoped context.

Interoperability issue with vanilla cbor for base58didurl codec

First, let me say that I am beyond excited for CBOR-LD, and this bug that I am hunting is almost certainly related to my relative inexperience combined with my excitement.

{
  "@context": [
    "https://www.w3.org/2018/credentials/v1",
    "https://w3id.org/vaccination/v1"
  ],
  "type": [
    "VerifiableCredential",
    "VaccinationCertificate"
  ],
  "id": "urn:uvci:af5vshde843jf831j128fj",
  "name": "COVID-19 Vaccination Certificate",
  "description": "COVID-19 Vaccination Certificate",
  "issuanceDate": "2019-12-03T12:19:52Z",
  "expirationDate": "2029-12-03T12:19:52Z",
  "issuer": "did:key:z6MkiY62766b1LJkExWMsM3QG4WtX7QpY823dxoYzr9qZvJ3",
  "credentialSubject": {
    "type": "VaccinationEvent",
    "batchNumber": "1183738569",
    "administeringCentre": "MoH",
    "healthProfessional": "MoH",
    "countryOfVaccination": "NZ",
    "recipient": {
      "type": "VaccineRecipient",
      "givenName": "JOHN",
      "familyName": "SMITH",
      "gender": "Male",
      "birthDate": "1958-07-17"
    },
    "vaccine": {
      "type": "Vaccine",
      "disease": "COVID-19",
      "atcCode": "J07BX03",
      "medicinalProductName": "COVID-19 Vaccine Moderna",
      "marketingAuthorizationHolder": "Moderna Biotech"
    }
  },
  "proof": {
    "type": "Ed25519Signature2018",
    "created": "2021-02-18T23:00:15Z",
    "jws": "eyJhbGciOiJFZERTQSIsImI2NCI6ZmFsc2UsImNyaXQiOlsiYjY0Il19..vD_vXJCWdeGpN-qKHDIlzgGC0auRPcwp3O1sOI-gN8z3UD4pI0HO_77ob5KHhhU1ugLrrwrMsKv71mqHBn-dBg",
    "proofPurpose": "assertionMethod",
    "verificationMethod": "did:key:z6MkiY62766b1LJkExWMsM3QG4WtX7QpY823dxoYzr9qZvJ3#z6MkiY62766b1LJkExWMsM3QG4WtX7QpY823dxoYzr9qZvJ3"
  }
}

In hex, all instances of the encoded “Base58DidURL” are prefixed with d840 in my library but not in yours.

As far as I can tell, this happens during CBOR encoding, not during the encoding mapping stage, since the encoded values are the same in both libraries.

for example:

issuer: did:key:z6MkiY62766b1LJkExWMsM3QG4WtX7QpY823dxoYzr9qZvJ3
proof.verificationMethod did:key:z6MkiY62766b1LJkExWMsM3QG4WtX7QpY823dxoYzr9qZvJ3#z6MkiY62766b1LJkExWMsM3QG4WtX7QpY823dxoYzr9qZvJ3

I see the same encoded representation before the cbor encoding runs:

[
  1025,
  Uint8Array(34) [
    237,   1,  60, 171,  80, 213,  89, 231,
    138,  27, 171, 113, 132,  31, 217, 203,
    217, 232,   2, 153, 144,  56, 104,  94,
    248,  75, 126,  73,  13,  39,  23, 194,
    177,  16
  ]
]
[
  1025,
  Uint8Array(34) [
    237,   1,  60, 171,  80, 213,  89, 231,
    138,  27, 171, 113, 132,  31, 217, 203,
    217, 232,   2, 153, 144,  56, 104,  94,
    248,  75, 126,  73,  13,  39,  23, 194,
    177,  16
  ],
  Uint8Array(34) [
    237,   1,  60, 171,  80, 213,  89, 231,
    138,  27, 171, 113, 132,  31, 217, 203,
    217, 232,   2, 153, 144,  56, 104,  94,
    248,  75, 126,  73,  13,  39,  23, 194,
    177,  16
  ]
]

I worry that these changes may be the reason for this inconsistency:

https://github.com/digitalbazaar/cbor/blob/main/CHANGELOG.md

Refactor to do encoding/decoding without re-encoding twice.

The current code base was written to iterate on a number of intermediate formats to hone in on the final result. This means that there is an intermediate format that is used as a sort of abstract syntax tree which is then converted to the final form. This probably results in close to twice as much processing and even more memory usage than necessary. Once the final algorithm for v1.0 is settled on, we will want to refactor the code base so that the intermediate format is optimized out. I believe the current algorithm only makes one pass over the documents and thus there is no need to keep previous state around. We should be able to do stream processing (since stream processing is possible in JSON-LD and because I don't remember having to implement jumping around when processing contexts or encoding/decoding).

Multibase term value is not properly compressed when using @vocab

Hi,
please try this example:
context:

{
    "@context": {
        "@vocab": "https://w3id.org/security#",
        "value": { "@type": "multibase" }
    }
}

document:

{
    "@context": "https://.../0012-context.jsonld",
    "value": "z4mAs9uHU16jR4xwPcbhHyRUc6BbaiJQE5MJwn3PCWkRXsriK9AMrQQMbjzG9XXFPNgngmQXHKUz23WRSu9jSxPCF"
}

the expected output is

[{ 0: https://..../0012-context.jsonld, 100: [0x7A,0xBC,0x24,0x3F, ... 65 bytes] }]

but got:

[{ 0: https://.../0012-context.jsonld, 100: z4mAs9uHU16jR4xwPcbhHyRUc6BbaiJQE5MJwn3PCWkRXsriK9AMrQQMbjzG9XXFPNgngmQXHKUz23WRSu9jSxPCF }]

Consolidate how URI scheme encoding IDs are referenced

We express the URI scheme encoding IDs (such as 1024 and 1025 for did:v1:nym and did:key) in multiple places. We should figure out a way to consolidate them and reference them through constants/static vars or similar. However, it's important not to make it so we have to import any decoder code when encoding and vice versa to enable builds for just encoding or just decoding.
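One possible shape for this, sketched under assumptions: a single table that holds nothing but data, from which both directions of lookup are derived. The names and the exact prefix strings (with trailing colons) are hypothetical; only the two numeric IDs come from the text above. In a real build the table would live in its own module so that encoder-only and decoder-only bundles can each import it without pulling in the other side.

```javascript
// Hypothetical shared table; contains data only, no codec logic.
const URI_SCHEME_IDS = Object.freeze({
  'did:v1:nym:': 1024,
  'did:key:': 1025
});

// Reverse lookup derived from the same table, so the numbers are never
// written out a second time on the decoding side.
const URI_SCHEME_PREFIXES = new Map(
  Object.entries(URI_SCHEME_IDS).map(([prefix, id]) => [id, prefix]));

// Encoder-side helper: find the scheme ID for a URI, if any.
function schemeIdFor(uri) {
  for(const [prefix, id] of Object.entries(URI_SCHEME_IDS)) {
    if(uri.startsWith(prefix)) {
      return id;
    }
  }
  return undefined;
}
```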

Add more examples

Feel free to copy any of the fixtures from here: https://github.com/transmute-industries/decentralized-cbor/tree/57d2f524a1b8927e6e19113b66890183a0c69871/src/__fixtures__/inputs

some are failing currently.

Make code more DRY

Audit the code to look for places to make it more DRY. One example of repetitious code is the initialization of a new active context when the context stack is empty.

Replace/rework node.js document loader

The current document loader does some bespoke on-disk caching. This should be reworked or moved outside of the core module as an optional module -- and its implementation should be replaced with something less custom, if we decide we want to keep using it. It is a premature optimization at this point and may create a number of foreseen issues (inability to invalidate caches, etc.) and unforeseen issues that other utility modules we could make use of may help us mitigate.

Fixup default argument usage.

Looking at the coverage reports there are many uses of f({value} = {}) or similar where a value is required for the code to work. Will those default paths ever be taken? If so error handling is needed, if not those default constructs can be removed.
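To illustrate the pattern in question: with `f({value} = {})`, calling `f()` does not fail at the call site; it silently yields `value === undefined`. A sketch of the "add error handling" option, using a hypothetical function rather than anything from the code base:

```javascript
// Hypothetical function; stands in for any method using the pattern.
function describeValue({value} = {}) {
  // The default `= {}` only prevents a destructuring TypeError when no
  // argument is passed; it does not supply a usable `value`.
  if(value === undefined) {
    throw new TypeError('"value" is required.');
  }
  return `value=${String(value)}`;
}
```

If the default path can never be reached by callers, the other option is to drop `= {}` and let the destructuring itself fail loudly.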

Ensure that UUID URNs with queries and hash fragments are handled losslessly

Applying scoped term codecs fails for redeclarations

The algorithm for determining which codec to use when encoding/decoding a value is currently too simple and doesn't entirely take scoped terms into account. That is, when there is a redeclaration of an outer term, for example when @protected isn't used, last defined wins when it shouldn't.

The value encode/decode codec selection algorithm needs to be updated to take where the encoder/decoder is in the parsing tree into account as that can affect which codec is used for complex JSON-LD contexts.

Ensure codec values are registered in the spec.

This value is used here but not registered in the spec:

_addRegistration(0x32, 'https://purl.imsglobal.org/spec/ob/v3p0/context.json');

If the id is stable enough to last forever, it should be registered in the spec. If not, then this lib and the spec need a new feature to support use of unregistered ids.

Improve error reporting

CBOR-LD is designed such that it expects valid JSON-LD to be passed to it for compression, however, we should have some better error messages both during compression and decompression when we encounter unexpected types, etc.

Add codec for well-known DI cryptosuite strings

Add tests and checks for default values.

Many static and instance methods use destructuring for default argument values. Add tests for the default values and add checks (with an assert or assert-like library) to handle errors as needed.

Related:
#41

Move CLI to another package `cborld-cli`

Problems with lossless conversion of DateTime string using XsdDateTimeCodec

It appears the XsdDateTimeCodec encodes a valid date-time string as Unix epoch time and then decodes it in accordance with ISO 8601. However, if the input string has only a date component, e.g. '2021-02-22', then after encoding and decoding the returned value becomes 2021-02-22T00:00:00Z, so the conversion is not lossless.
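The effect can be reproduced with a simplified stand-in for such a codec. This is not the library's XsdDateTimeCodec; the function names and the choice of whole seconds are assumptions made for illustration.

```javascript
// Simplified stand-in: encode to whole seconds since the Unix epoch.
function encodeDateTimeToEpoch(value) {
  return Math.floor(Date.parse(value) / 1000);
}

// Decode back to an ISO 8601 string without fractional seconds.
function decodeEpochToDateTime(seconds) {
  return new Date(seconds * 1000).toISOString().replace('.000Z', 'Z');
}

// A full date-time survives the round trip unchanged...
const fullRoundTrip =
  decodeEpochToDateTime(encodeDateTimeToEpoch('2019-12-03T12:19:52Z'));

// ...but a date-only input comes back with a time component attached,
// because the epoch value no longer records that the time was absent.
const dateOnlyRoundTrip =
  decodeEpochToDateTime(encodeDateTimeToEpoch('2021-02-22'));
```

A lossless codec would need either to fall back to the plain string when the input is not a full date-time, or to record the original precision alongside the number.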

`@vocab` not implemented

import { encode } from "@digitalbazaar/cborld";

const documentLoader = (iri: string) => {
  if (iri === "https://www.w3.org/ns/activitystreams") {
    return {
      document: {
        "@context": {
          "@version": 1.1,
          "@vocab": "https://www.w3.org/ns/activitystreams#",
        },
      },
    };
  }
  console.log(iri);
  throw new Error("iri not supported");
};

const jsonldDocument = {
  "@context": "https://www.w3.org/ns/activitystreams",
  type: "Note",
  summary: "CBOR-LD",
  content: "CBOR-LD is awesome!",
};

it("can compress linked data", async () => {
  const cborldBytes = await encode({ jsonldDocument, documentLoader });
  console.log(cborldBytes);
});

ERR_UNKNOWN_CBORLD_TERM: Unknown term 'content' was detected in the JSON-LD input.

TypeError: BigNumber is not a constructor

Steps to reproduce:

npm i @digitalbazaar/[email protected]

https://github.com/digitalbazaar/cborld#encode-to-cbor-ld

On node v14.15.5

Appears to be thrown by your wrapper around cbor.... node_modules/@digitalbazaar/cbor/lib/util.js:1

Fix README examples.

The examples need to be fixed since the package no longer exports a documentLoader.

https://w3id.org/security/ed25519-signature-2018/v1 does not resolve

please fix this immediately, before it starts to make everyone cry about JSON-LD :)

Examples don't work with node v16.123.1

I've tried running the examples, but can get neither encode nor decode to work.

I set up a new npm init and installed @digitalbazaar/cborld.

When I tried the first code,

import { encode } from '@digitalbazaar/cborld';

const jsonldDocument = {
  '@context': 'https://www.w3.org/ns/activitystreams',
  type: 'Note',
  summary: 'CBOR-LD',
  content: 'CBOR-LD is awesome!'
};

// encode a JSON-LD Javascript object into CBOR-LD bytes
const cborldBytes = await encode({ jsonldDocument, documentLoader });

console.log(cborldBytes);

Unfortunately, that complained about the use of import outside a module:

SyntaxError: Cannot use import statement outside a module

If I add "type": "module" to my package.json, I get past that error, only to get this one:

import { encode } from '@digitalbazaar/cborld';
         ^^^^^^
SyntaxError: Named export 'encode' not found. The requested module '@digitalbazaar/cborld' is a CommonJS module, which may not support all module.exports as named exports.
CommonJS modules can always be imported via the default export, for example using:

import pkg from '@digitalbazaar/cborld';
const { encode } = pkg;

FWIW, I tried this alternative import and it didn't work.

On a lark, I tried the "type": "module" hack in the cborld package.json and got a different error:

import { encode } from '@digitalbazaar/cborld';
         ^^^^^^
SyntaxError: The requested module '@digitalbazaar/cborld' does not provide an export named 'encode'

Any suggestions?

p.s.
This was on a windows bash shell, running node v16.123.1

Issues with loading certain terms

The following LD document (a VC) fails when run through the CBOR-LD encoder:

{
  "@context": [
    "https://www.w3.org/2018/credentials/v1",
    "https://credreg.net/ctdlasn/schema/context/json"
  ],
  "type": [
    "VerifiableCredential"
  ],
  "issuer": "did:key:z6MkpYpkXs9u5ZzV3nij3AbdmPkwmiBQvxNeNFiRPvNz5ArP",
  "issuanceDate": "2022-05-02T11:03:53Z",
  "credentialSubject": {
    "id": "did:key:z6MkfwmZep5ZvkHfeXszxhxEuvkmGFRc8H9Nv9ZaQG4vhFzZ",
    "schema:hasCredential": {
      "type": "ceterms:MicroCredential",
      "ceterms:name": "Ogi ogi",
      "ceterms:description": "This is a proof that things work!",
      "ceterms:relatedAction": {
        "type": "ceterms:CredentialingAction",
        "ceterms:startDate": "2022-05-02T09:03:48.964Z",
        "ceterms:endDate": "2022-05-04T09:03:48.964Z"
      },
      "ceterms:subject": [
        {
          "type": "ceterms:CredentialAlignmentObject",
          "ceterms:targetNodeName": {
            "en-US": "Making sure you know Javascript"
          }
        }
      ]
    }
  }
}

It complains about the schema:hasCredential term, even though I can run this document through an issuance and verification loop using the same documentLoader.

You can run the code under the CBOR create function to test: https://github.com/vongohren/cbor-ld-test

Scoped Contexts Processing Algorithm

Hi,
please look at this example:

document

{
  "@context": "https://raw.githubusercontent.com/filip26/iridium-cbor-ld/main/src/test/resources/com/apicatalog/cborld/encoder/0025-context.jsonld",
  "a": "Note",
  "x": { "a": "x", "b": "y", "d": 106 },
  "y": {"a": "x", "b": "y", "c": 102 }
}

context

{
  "@context": {
    "@vocab": "https://www.w3.org/ns/activitystreams#",
    "y": {
      "@id": "idy",
      "@context": {
        "a": "@type",
        "b": "@id",
        "c": "longitude"
      }
    },
    "a": "@type",
    "x": {
      "@id": "idx",
      "@context": {
        "a": "@id",
        "b": "@type",
        "d": "latitude"
      }
    }
  }
}

The output produced by the library is:

[{ 0: ...0025-context.jsonld, 
  100: Note, 
  102: { 100: 102, 106: y, 108: 106 }, 
  104: { 100: x, 106: 104, 110: 102 } }]

where

100: a, 102: x, 104: y, 106: b, 108: d, 110: c

Why are the @type values x.b: y and y.a: x not encoded?

Using encoded examples produced by the library as a source for reverse engineering, I got this result:

[{ 0: ...0025-context.jsonld, 
   100: Note, 
    102: { 100: 102, 106: 104, 108: 106 }, 
    104: { 100: 102, 106: 104, 110: 102 } }]

Thank you!

Encoding fails with undefined term

Hi,
try this example:

{
    "@context": [
        "https://www.w3.org/2018/credentials/v1",
        "https://www.w3.org/2018/credentials/examples/v1",
        "https://w3id.org/security/suites/ed25519-2020/v1"
    ],
    "id": "http://example.edu/credentials/3732",
    "type": [
        "VerifiableCredential",
        "UniversityDegreeCredential"
    ],
    "issuer": "https://example.edu/issuers/565049",
    "issuanceDate": "2010-01-01T00:00:00Z",
    "credentialSubject": {
        "id": "did:example:ebfeb1f712ebc6f1c276e12ec21",
        "degree": {
            "type": "BachelorDegree",
            "name": "Bachelor of Science and Arts"
        }
    }
}

it fails with ERR_UNKNOWN_CBORLD_TERM: ... degree ... but degree is defined in https://www.w3.org/2018/credentials/examples/v1.

documentLoader interface should not change...

const jsonld = require('jsonld');
const cborld = require('@digitalbazaar/cborld');

import { contexts, resolvedContexts } from '../__fixtures__';

it('can use jsonld.js documentLoader', async () => {
  const result = await jsonld.documentLoader(contexts.activitystreams);
  expect(result).toEqual(resolvedContexts.activitystreams);
});

it('can use cborld documentLoader', async () => {
  const result = await cborld.documentLoader(contexts.activitystreams);
  expect(result.toString()).toEqual(resolvedContexts.activitystreams.document);
});

Codec error handling activity stream collection

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "summary": "Object history",
  "type": "Collection",
  "totalItems": 2,
  "items": [
    {
      "type": "Create",
      "actor": "http://www.test.example/sally",
      "object": "http://example.org/foo"
    },
    {
      "type": "Like",
      "actor": "http://www.test.example/joe",
      "object": "http://example.org/foo"
    }
  ]
}

Seeing

    -     Object {
    -       "actor": "http://www.test.example/sally",
    -       "object": "http://example.org/foo",
    -       "type": "Create",
    +     Map {
    +       143 => 15,
    +       64 => "http://www.test.example/sally",
    +       112 => "http://example.org/foo",
          },
    -     Object {
    -       "actor": "http://www.test.example/joe",
    -       "object": "http://example.org/foo",
    -       "type": "Like",
    +     Map {
    +       143 => 33,
    +       64 => "http://www.test.example/joe",
    +       112 => "http://example.org/foo",

Rethink diagnose option - should be logging instead?

@dlongley said:

diagnose seems more like a log function ... perhaps the docs should reflect this. Not sure what to think about this pattern

We should put some thought into how to do logging from deep within the cborld library.

https://w3id.org/security/ed25519-signature-2020/v1 does not resolve

same issue reported on #29

Add documentation for documentLoader

@mattcollier wrote:

The default documentLoader is used for CLI operations. documentLoader issues are very popular in jsonld related repos. Seems like a word or two about that would be good in the README.md.
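As a starting point for such documentation, a static (offline) loader could be sketched as below. The factory name is hypothetical. The returned object follows the `{contextUrl, document, documentUrl}` shape used by jsonld.js document loaders; whether cborld reads anything beyond `document` is an assumption here, based on the `{document}` return value in the `@vocab` report on this page.

```javascript
// Hypothetical factory: builds a documentLoader backed by an in-memory Map
// of context URL -> context document, so no network access is needed.
function createStaticDocumentLoader(contextsByUrl) {
  return async function documentLoader(url) {
    const document = contextsByUrl.get(url);
    if(document === undefined) {
      throw new Error(`Context not found in static loader: ${url}`);
    }
    return {contextUrl: null, document, documentUrl: url};
  };
}

// Example setup with a single inline context.
const exampleLoader = createStaticDocumentLoader(new Map([
  ['https://www.w3.org/ns/activitystreams', {
    '@context': {
      '@version': 1.1,
      '@vocab': 'https://www.w3.org/ns/activitystreams#'
    }
  }]
]));
```

The loader would then be passed as the `documentLoader` option, as in `encode({jsonldDocument, documentLoader: exampleLoader})`.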

Second conditional will never be hit in context.js

Since arrays are objects, the second conditional here will never be hit:

cborld/lib/context.js

Lines 29 to 31 in 499759e

    } else if(typeof value === 'object') {
      contextUrls.push(...getJsonldContextUrls({jsonldDocument: value}));
    } else if(Array.isArray(value)) {

The order needs to be swapped and better test coverage added.
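The ordering problem can be shown in isolation. The sketch below is not the code from lib/context.js; `classifyContextValue` is a hypothetical helper that only demonstrates why `Array.isArray` has to be tested before `typeof value === 'object'`.

```javascript
// typeof [] is 'object', so the array check must come first or it is dead code.
function classifyContextValue(value) {
  if(typeof value === 'string') {
    return 'url';
  }
  if(Array.isArray(value)) {
    return 'array';
  }
  // typeof null is also 'object', so exclude it explicitly
  if(value !== null && typeof value === 'object') {
    return 'object';
  }
  return 'other';
}
```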

CborldError: Unknown term 'name' was detected in JSON-LD input. Error: ERR_UNKNOWN_CBORLD_TERM: Unknown term 'name' was detected in JSON-LD input.

context: https://schema.org/version/latest/schemaorg-current-http.jsonld
input:

{
  "@context": "http://schema.org",
  "@type": "Person",
  "name": "Brent",
  "makesOffer": {
    "@type": "Offer",
    "priceSpecification": {
      "@type": "UnitPriceSpecification",
      "priceCurrency": "USD",
      "price": "18000"
    },
    "itemOffered": {
      "@type": "Car",
      "name": "2009 Volkswagen Golf V GTI MY09 Direct-Shift Gearbox",
      "description": "2009 Volkswagen Golf V GTI MY09 Direct-Shift Gearbox in perfect mechanical condition and low kilometres. It's impressive 2.0 litre turbo engine makes every drive a fun experience. Well looked after by one owner with full service history. It drives like new and has only done 50,000kms. (...)",
      "image": "2009_Volkswagen_Golf_V_GTI_MY09.png",
      "color": "Black",
      "numberOfForwardGears": "6",
      "vehicleEngine": {
        "@type": "EngineSpecification",
        "name": "4 cylinder Petrol Turbo Intercooled 2.0 L (1984 cc)"
      }
    }
  }
}

appContextMap seems unused

The activitystream example in the readme does not make use of appContextMap at all.

https://github.com/digitalbazaar/cborld#encode-to-cbor-ld

This feature seems like dangerous complexity, can it be safely removed?

Removing `instanceOf` check causes erroneous context processing

499759e#diff-d911385a221c0b7502eabe1fdac9d209L59-L61

Handle codecs for aliased `@id`.

When using an alias for @id, the optimized codecs, such as Base58DidUrlCodec, are not used. Example:

{ "@context": "....", "id": "did:key:..." }

Fix may be in lib/codec.js _generateTermEncodingMap.

