cyphrme / coze Goto Github PK

View Code? Open in Web Editor NEW

107.0 8.0 3.0 4.19 MB

Coze is a cryptographic JSON messaging specification.

Home Page: https://cyphr.me/coze

License: BSD 3-Clause "New" or "Revised" License

Go 100.00%

authentication cryptography json coze auth login jwt es256 es384 es512

coze's Issues

Use JSONv2 when production ready

Instead of making various issues for various JSON concerns. I'm going to use this issue to track any concerns with JSON.

Coze needs strictly defined JSON capabilities. Some of these capabilities are not provided by the standard
Go library, or misbehaves under certain circumstances. Currently, there are no known 3rd party libraries that we consider suitable for Coze. Our hope is that we can use JSONv2 when production ready and that will solve any JSON concerns.

A new JSON library should minimally include these characteristics in addition to
the existing behavior of the standard library:

Duplicate fields should error.
Preserve order of JSON fields.
Invalid UTF-8 should error.
JSON should not be HTML escaped. (For example when containing the characters &, <,>. and while not adding an additional new line as the standard library currently does.)

Three of the larger obstacles:

To resolve these obstacles, Coze implemented orderedmap, its own custom JSON unmarshaler, and a "checkDuplicate" helper function.

A search of other Go JSON issues that may be relevant to Coze:
https://github.com/golang/go/issues?q=is%3Aissue+is%3Aopen+encoding%2Fjson+in%3Atitle

JSONv2

See https://pkg.go.dev/github.com/go-json-experiment/json
There are other best practices in JSONv2.

Issues

Unsigned Alg and Tmb

Proposal here:

https://github.com/Cyphrme/CozeX/blob/master/proposal/unsigned_alg_tmb.md

Wait for more feedback discussion (mostly in the matrix room)
Implement in Go Coze, CozeJS, the power Verifier, and the simple verifier.

JSON round-trip fails for MapSlice

It's possible to successfully unmarshal valid JSON into a MapSlice, but then fail to marshal that same MapSlice back into JSON.

I'm assuming this isn't desired behavior, but I could be wrong.

POC: https://go.dev/play/p/MlNV74p2cCY

package main

import (
	"encoding/json"
	"log"

	"github.com/cyphrme/coze"
)

func main() {
	b := []byte("{\"\x7f\":{}}") // discovered via fuzzing
	if !json.Valid(b) {
		log.Fatal("invalid JSON")
	}

	var ms coze.MapSlice
	if err := json.Unmarshal(b, &ms); err != nil {
		log.Fatalf("json.Unmarshal: %v", err)
	}

	if _, err := json.Marshal(ms); err != nil {
		log.Fatalf("json.Marshal: %v", err) // fails here
	}
}

Duplicate JSON keys create misleading verification results in the web UI

During a conversation, I was sent this example message: https://cyphr.me/coze#?input={%22pay%22:{%22msg%22:%22Hello%20Retr0id!%22,%22alg%22:%22ES256%22,%22iat%22:1701603747,%22tmb%22:%221KGZzsqiFAYE5uDO3CCh3PMl9pqwlG1RrI8i5gZY94c%22,%22typ%22:%22cyphr.me/msg/create%22},%22sig%22:%22vZEALUQShRlLYyhJoGmkxpQhhEeUPlzbIZqyTsUyPBMlJMIqGClgSs3uXTDiwumsMivFQKU8K4z4Ec7WlxYIew%22}&dontSignRevoke&updateIat&selectedAlg=ES256&verify

By introducing a duplicate "msg" key, I was able to forge a new message that also passes signature verification according to the web UI: https://cyphr.me/coze#?input={%22pay%22:{%22msg%22:%22Hello,%20Zamicol.%20I%20believe%20you%20will%20find%20that%20this%20Coze%20message%20is%20also%20(supposedly)%20signed%20by%20your%20key!%20Coze%20could%20fix%20this%20issue%20with%20better%20UI,%20but%20I%20think%20this%20illustrates%20just%20how%20hard%20it%20is%20to%20canonicalize%20JSON.%20This%20wouldn't%20be%20possible%20in%20the%20first%20place%20if%20you%20were%20signing%20base64'd%20bytes.%22,%22msg%22:%22Hello%20Retr0id!%22,%22alg%22:%22ES256%22,%22iat%22:1701603747,%22tmb%22:%221KGZzsqiFAYE5uDO3CCh3PMl9pqwlG1RrI8i5gZY94c%22,%22typ%22:%22cyphr.me/msg/create%22},%22sig%22:%22vZEALUQShRlLYyhJoGmkxpQhhEeUPlzbIZqyTsUyPBMlJMIqGClgSs3uXTDiwumsMivFQKU8K4z4Ec7WlxYIew%22}&dontSignRevoke&updateIat&selectedAlg=ES256&verify

The web UI gives no indication that there's anything awry here, and proclaims that the message was verified.

This would be a non-issue in many use-cases, because any code reading the msg parameter will likely see only the real one - but an implementation in some other language may see only the first, causing breakage. But, for the purposes of the web UI, it's definitely misleading.

As an aside, I think Coze's general design/approach is fine, but dealing with JSON like this is error-prone, and here is one such error - one which is hopefully an easy fix.

Make new repositories for the specification and implementations

Edit:

Coze repository organization

- Coze          ("Core"/main specification and the Go Coze reference implementation)
- Coze_x        (Coze extended)
- Coze_go_x     (Go implementation of extended features)
- Coze_js       (Javascript implementation)
- Coze_js_x     (Javascript implementation of extended)
- etc...

Discussion on the main spec or the Go reference implementation should go into
Coze, aka "core".
Discussion on "x" design goes into Coze_x. Future discussion on
implementing/supporting new algorithms also goes into x, as implementations of
new algorithms will live in x first before being adopted by core. Only
established and widely adopted algorithms are eventually included into core.
Every language that implements Coze should be in it's own respective language
specific directory (except the Go reference implementation).
It is suggested that implementation of "x" features should go into the
appropriate language "x" repository.

Old

This repository should be split into a few different repositories. It is usually good practice to split out documents from code so that developers aren't burdened with reviewing changes to the repository.

I suggest the following repositories:

coze
coze_go
coze_go_experimental
coze_js
coze_js_experimental
etc...

This unfortunately would probably require renaming "Coze" to "coze". "coze" would be for the main spec (README.md) which defines Coze Core, and also documents, discussion, proposals, best practices, and FAQ. Github appears to be a good place for discussion and document modification, but we don't want a large number of developers worrying about the changes happening in the repository due to simple document modification.

Experimental is for new algorithms and useful Coze related libraries not in the core spec (normal would be one such library). New algorithms are allowed time to mature first in experimental before getting accepted into "Core".

The plan for Coze Core

The only planned expansion of "Coze Core", the implementation and specification that currently lives in this repo, is adding new algorithms. No changes are currently planned to the spec other than per algorithm adjustments. However, we wanted to give this more time and receive more feedback before we made a more final decision.

There will be minor tweaks on a per algorithm basis for Coze Core. For example, @LoupVaillant suggested doing the following when handling Ed25519:

My choice for Monocypher was to do the same as Zebra:

Reject any S that equals or exceeds the order of the curve.

Accept low-order A and R.

Accept non-canonical A and R.

Use the batch verification equation (it's the forgiving one).

This needs to be specified in Coze Core so that all implementations align.

Alternative Repository Structure

We could do a less dramatic division. However, this is less normalized so I'd advocate for the aforementioned naming.

Coze - The Go (and reference) implementation of Coze. (This repo.)
Cozejs (Already exists)
coze_spec - "The main spec, documents, discussion, proposals, best practices, and FAQ.
coze_experimental - For useful libraries "Standard" and

Enforce Canonical Base 64 encoding.

Playground demonstrating the issue:

Using Coze https://go.dev/play/p/l_dZ9q4DZAA
Pure Go (non-strict): https://go.dev/play/p/N1rZckATLOf
Pure Go (strict): https://go.dev/play/p/t2nXCD8VEmw // Correctly errors.

There's an apparent problem with RFC 4648. There are three places base 64 representation may contain string variation:

Padding
Alphabet (URI unsafe or URI safe)
Canonical encoding (various characters can encode to the same byte string, but there is only one canonical decoding)

What is "canonical encoding"? From the last three characters of the example tmb, "cLj8vs...XNuhOk", the values hOk and hOl may both decode to the same byte value (in Hex, 84E9) even though they are different UTF-8 values. (Example decoding hOk and hOl.) The canonical encoding is hOk

The RFC specifically addresses 1 and 2, but not really 3.

RFC 4648 advises to reject non-alphabet characters, which can include padding. I agree with this advice:

Implementations MUST reject the encoded data if it contains
characters outside the base alphabet when interpreting base-encoded
data, unless the specification referring to this document explicitly
states otherwise. [...] Furthermore, such specifications MAY ignore the pad
character, "=", treating it as non-alphabet data[.]

I don't see the RFC really address the to the third concern.

Behavior

Obviously non-"strict"/non-canonical base 64 encoding is incorrect, and any encoder producing non-strict encoding should be fixed. However the question is what should Coze specify regarding non-strict encoding/decoding? Both Go and Javascript are permissive when decoding and do not throw errors.

Ultimately, the concern is different base 64 encoders/decoders may have different behavior. Ideally, Coze should specify the appropriate behavior for Coze. Section 3.5 mentions non-canonical encoding in the context of unpadded data and this issues is unrelated to padding (hOk= and hOl=, both padded, have the same issue as unpadded strings).

The concern is that if a Coze implementation used string comparison instead of byte comparison, this could result implementations disagreeing about valid messages. For example, with a non-strict tmb encoded string, if a Coze implementation checks tmb before cryptographic verification, it may check this based on the string value or the byte value, and comparing the string value or the byte value will result in different behavior.

Another note for any Coze restriction on encoding: JSON is base 64 unaware, any sort of Coze specified enforcement of base 64 encoding can only be applied to Coze known fields with type b64ut, and cannot be applied generally to any b64ut value.

Solutions

There appears to be only two options to handle this:

Be permissive on inbound encoding, force strict outbound encoding.
Force strict encoding and decoding. (This can only be done when type is known to be b64ut.)

2 is more conservative, but may require unnecessary checks that don't really add value. 1 has the potential to be more compatible if assuming that systems can decode permissively (other programming language's base 64 libraries decode permissively), which may be a bad assumption.

Regardless, I believe that 1 is the correct behavior here. Even if languages/system do no error on non-canonical encoding, implementing an encoding error can be implemented by re-encoding the decoded data and comparing strings.

Security Considerations

This base 64 decoding bug doesn't appear to be a structural/architectural/security concern since Coze uses the UTF-8 encoding of the string for signing and verification, however it is a interesting problem that should be known when working with RFC base 64. Concerning specifically replay attacks, signatures are still not malleable as payloads are UTF-8 encoded and the signing operation is not base 64 aware.

If Coze used the base 64 representation directly, this would be a security concern and could result in reply attacks.

Notes

It should be obvious, but this situation also applies to the URI unsafe alphabet and messages with base 64 padding, which all are interpreted as the same bytes. (My conversion tool only has "base64 as an input and not the various permutations since all variations can be known (or is irrelevant) and results in the sames decoded binary payload.

RFC 4648

I currently have errata open on one of the relevant sections.

I'm going to implement a non-canonical encoding check on Go and JS Coze.

Locally hosted Cyphr.me's verifier.

Although the simple verifier is already in-browser, offline compatible, and is easily locally hosted, eventually we hope to provide the same ability with power verifier. It probably could be done programatically using a tool like gildas-lormeau/SingleFile

Coze Verifier
Simple Coze Verifier
Github Hosted Simple Verifier
Simple Verifier Codebase

Think about adding "MustMarshal" and "MarshalPretty"

MapItem is unsafe, and MapSlice does not have a well-defined order

indexCounter is accessed without synchronization, which produces data races that violate the memory model. As a proof of concept, build and run the following program with the -race flag.

package main

import (
	"encoding/json"
	"sync"

	"github.com/cyphrme/coze"
)

func main() {
	n := 10
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			var item coze.MapItem
			json.Unmarshal([]byte(`{}`), &item)
		}()
	}
	wg.Wait()
}

You'll see output like the following.

==================
WARNING: DATA RACE
Read at 0x00010041e838 by goroutine 13:
  github.com/cyphrme/coze.nextIndex()
      .../pkg/mod/github.com/cyphrme/[email protected]/mapslice.go:36 +0xf0
  github.com/cyphrme/coze.(*MapItem).UnmarshalJSON()
      .../pkg/mod/github.com/cyphrme/[email protected]/mapslice.go:118 +0x11c
  encoding/json.(*decodeState).object()
      ...

Previous write at 0x00010041e838 by goroutine 12:
  github.com/cyphrme/coze.nextIndex()
      .../pkg/mod/github.com/cyphrme/[email protected]/mapslice.go:36 +0x108
  github.com/cyphrme/coze.(*MapItem).UnmarshalJSON()
      .../pkg/mod/github.com/cyphrme/[email protected]/mapslice.go:118 +0x11c
  encoding/json.(*decodeState).object()
      ...

Consider documenting that `rvk` denotes expiry for Coze keys.

The word revoke may connotate that a key was never trusted/authorized.

The specification should better explain that rvk is expected to be an expiry time.

Cozies signed before the rvk time should be considered valid and action signed after should be ignored.

As is currently noted by the specification, actionable events based on future expiration times are outside the scope of Coze, and revoke messages with future times should result in the signing key considered immediately expired.

Thanks @qbit for bringing up this concern.

cmd

A cli client would be fantastic.

sign
signpay
verify
newkey
tmb
meta
revoke

JSON doesn't define "integer" -- int values need to be strings

JSON doesn't define "integer" but only the concept of a "number" which doesn't require any specific precision, leaving that up to the implementation. JSON numbers therefore can't be assumed to have any specific value, and fields iat and rvk would need to be specified as strings in order to provide deterministic and bijective encoding.

Godoc in 1.19 does not appear to link to packages as documented.

See issue golang/go#51082 (comment)

Once that is addressed, I'll update the docs with package links.

Active

Is this project still active?

Further constraints on Ed25519

@LoupVaillant suggested doing the following when handling Ed25519:

In my opinion standardizing signatures and public keys is much more
important than worrying about anything related to the private key. And
just at that level you have to grapple with much more fundamental issues
than how to define your private key:

https://hdevalence.ca/blog/2020-10-04-its-25519am

So you have a public key A, and a signature R || S.
A and R are points on the curve, and S is just a number.
Thankfully, the main issues were dealt with from the beginning:

Points on the curve are compressed as a field element and a sign bit.
All numbers are encoded in little-endian.
A, R, and S are all serialised with 32 bytes.

But there's still room for variation in the verifier:

Do we accept S when it exceeds the order of the curve?
Do we accept A and R when they have low order?
Do we accept non-canonical encodings of A and R?
What verification equation do we use exactly?

When two verifiers disagree on any of the above, this can cause problems
when maliciously crafted signatures end up being accepted by some and
rejected by others, leading to problems like network partitions. Worse,
the RFC didn't clearly answer all of those questions, and allowed users
to chose which verification equation they would use. And it's difficult
in practice to find two implementations that behave identically. It's a
freaking nightmare.

My choice for Monocypher was to do the same as Zebra:

Reject any S that equals or exceeds the order of the curve.
Accept low-order A and R.
Accept non-canonical A and R.
Use the batch verification equation (it's the forgiving one).

The reason I reject high S is because (i) everyone else does, and (ii)
accepting it would enable malleability attacks. For everything else I
chose to be as permissive as possible. This has the advantage of being
backwards compatible with any other implementation: no signature that
was previously accepted will be rejected.

The RFC on the other hand made the following choices:

Reject any S that equals or exceeds the order of the curve.
Accept low-order A and R.
Reject non-canonical A and R.
Leave equation choice to the implementer.

I personally disagree with the last two items. Interoperability with
batch verification (which is twice as fast as regular verification)
should be mandatory, and rejecting non-canonical points makes the code
more complex for no benefit at all.

You'll have to make your own choice too if you want a complete
specification. I personally would recommend you imitate Zebra and
Monocypher, because many implementations can be made compatible with a
bit of pre-processing:

Reject the signature if S is too big. Almost all implementations
already do this however, so you can generally skip this step.
If both A and R have low order, and S == 0, accept the signature.
In total, low order points have 14 different encodings, so you can
just use a table and compare buffers to do that check.
Run your implementation of choice. It must use the batch equation.
If it accepts the signature, accept it.
If it rejects the signature, reject it.

Ran into this great presentation on normalizing/standardizing Ed25519, "Taming the many EdDSAs"
https://csrc.nist.gov/csrc/media/Presentations/2023/crclub-2023-03-08/images-media/20230308-crypto-club-slides--taming-the-many-EdDSAs.pdf

Also consider the advice in:

https://github.com/MystenLabs/ed25519-unsafe-libs

404 in Readme.md

Base64 encoding can only elide padding when the size of encoded data is known

https://github.com/Cyphrme/Coze/blob/01c154e4024b4e876b8d152166ce85cf2a945e22/README.md#coze-fields

Binary values are encoded as RFC 4648 base64 URI with padding truncated (b64ut).

https://www.rfc-editor.org/rfc/rfc4648#section-3.2

when assumptions about the size of transported data cannot be made, padding is required to yield correct decoded data.

As far as I can tell, the size of binary values is not communicated to recipients, and therefore padding should not be truncated. (The URI encoding is also non-standard.)

Expunge "cryptographic agility" from Coze vocabulary

@LoupVaillant suggested to avoid semantical confusion by avoiding entirely the term "cryptographic agility". In its place we could say "provide loose primitive coupling."

The design goals would then become:

Valid and idiomatic JSON.
Human readable and writable.
Small in scope.
Provide loose primitive coupling.

Edit: Thinking more on this, perhaps the fourth design goal is dropped altogether since Coze implementations inherently provide loose primitive coupling, and that phrase itself would need to be rigidly defined. It also doesn't capture what I was trying to convey, that Coze provides "versioning" via "alg". Perhaps even something along the lines of "provide defined cipher suites". Edit2: or "Specify cipher suite expectations"

implement kyber (liboqs) ciphers for Coze

post quantum crypto ftw?

implement pure python or python wrapper lib

if you can get python versions of this, there are entire ecosystems like django / flask that can adopt coze in place of JWT.