nullus157 / cbor-diag-rs

Support for parsing/encoding CBOR diagnostic notation and annotated hex

Home Page: https://cbor.nemo157.com

License: Apache License 2.0

Rust 100.00%

cbor diagnostics

cbor-diag-rs's Issues

Wishlist: support for CBOR Sequences

https://www.rfc-editor.org/rfc/rfc8742.html

For example, if support for CBOR sequences were present, I believe the following should parse:

{
  echo '{"a":1}' | cbor-diag --to=bytes ;
  echo '{"b":2}' | cbor-diag --to=bytes ;
} | cbor-diag --to=annotated

Ideally the following would work as well (accept jsonl/ndjson and other representations of sequences of documents):

{ echo '{"a":1}' ; echo '{"b":2}' ; } | cbor-diag --to=annotated
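As a rough illustration of what sequence support involves on the byte side: a CBOR sequence (RFC 8742) is just data items concatenated back to back, so a decoder only needs to know where each item ends. The sketch below is standalone and does not use this crate's API; `item_len` and `split_sequence` are hypothetical names, and the indefinite-length handling is deliberately lenient (it does not check that string chunks have the right type).

```rust
/// Returns the encoded length of the first CBOR data item in `bytes`,
/// or `None` if the input is truncated or malformed.
/// Hypothetical helper, not part of cbor-diag.
fn item_len(bytes: &[u8]) -> Option<usize> {
    let initial = *bytes.first()?;
    let major = initial >> 5;
    let info = initial & 0x1f;

    // Decode the argument that follows the initial byte.
    let (arg, mut pos): (Option<u64>, usize) = match info {
        0..=23 => (Some(u64::from(info)), 1),
        24..=27 => {
            let n = 1usize << (info - 24); // 1, 2, 4 or 8 bytes
            let raw = bytes.get(1..1 + n)?;
            let value = raw.iter().fold(0u64, |acc, b| (acc << 8) | u64::from(*b));
            (Some(value), 1 + n)
        }
        31 => (None, 1),  // indefinite length (or "break" for major type 7)
        _ => return None, // 28..=30 are reserved
    };

    match (major, arg) {
        // Integers, simple values and floats: the head is the whole item.
        (0, Some(_)) | (1, Some(_)) | (7, Some(_)) => Some(pos),
        // Definite-length byte and text strings: head plus payload.
        (2, Some(n)) | (3, Some(n)) => {
            let end = pos.checked_add(usize::try_from(n).ok()?)?;
            if end <= bytes.len() { Some(end) } else { None }
        }
        // Definite-length arrays and maps: head plus nested items.
        (4, Some(n)) | (5, Some(n)) => {
            let count = if major == 5 { n.checked_mul(2)? } else { n };
            for _ in 0..count {
                pos += item_len(&bytes[pos..])?;
            }
            Some(pos)
        }
        // Tags: head plus exactly one nested item.
        (6, Some(_)) => Some(pos + item_len(&bytes[pos..])?),
        // Indefinite-length strings, arrays and maps: items until 0xff.
        (2..=5, None) => {
            while *bytes.get(pos)? != 0xff {
                pos += item_len(&bytes[pos..])?;
            }
            Some(pos + 1)
        }
        // A bare "break" at the top level is malformed.
        _ => None,
    }
}

/// Splits a CBOR sequence into the byte slices of its individual items.
fn split_sequence(mut bytes: &[u8]) -> Option<Vec<&[u8]>> {
    let mut items = Vec::new();
    while !bytes.is_empty() {
        let len = item_len(bytes)?;
        let (item, rest) = bytes.split_at(len);
        items.push(item);
        bytes = rest;
    }
    Some(items)
}
```

Each returned slice could then be handed to the existing single-item parser, so the annotated output would simply be the per-item outputs one after another.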

Colorize output

The CLI should support colorizing the output data to make it easier to read, both diagnostic and annotated hex encoding.

This will probably have to be generated by the library. The easiest approach might be to inject ANSI escape codes into the output strings, which the website could probably render with something like https://www.npmjs.com/package/ansi-to-html. How to support Windows terminals is an open question: hopefully there's something that can parse ANSI escapes and generate the correct commands, or maybe there's a library that provides higher-level colorized strings via annotated spans that can be converted into different formats as needed.

An example of what jq does for colorizing JSON which the diagnostic colorization can probably be based on:

Retaining comments

I'm undecided whether this is worthwhile, but the AST could have nodes added to retain comments when parsing extended diagnostic notation. These could then be included when outputting either diagnostic notation, or as part of the annotated hex output.

This wouldn't really help with what I see as the main use cases, though: converting EDN -> bytes for tooling that wants human-writable input, or converting bytes -> {EDN, annotated hex} for tooling that wants to show human-readable output.

JSON output

Would it be possible to add a JSON output? Basically similar to the diag/compact form, but without things like tags that throw off JSON parsing?
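One possible reading of this request, sketched on a toy value type rather than the crate's real AST: emit JSON and simply drop tag numbers, keeping the tagged content. The `Value` enum and `to_json` function are hypothetical, and byte strings, floats and maps (which have no obvious lossless JSON form) are left out.

```rust
/// Hypothetical, simplified stand-in for a parsed CBOR value.
enum Value {
    Integer(i64),
    Text(String),
    Array(Vec<Value>),
    Tag(u64, Box<Value>),
}

/// Renders the value as compact JSON, discarding any tags.
fn to_json(value: &Value) -> String {
    match value {
        Value::Integer(n) => n.to_string(),
        Value::Text(s) => {
            let mut out = String::from("\"");
            for c in s.chars() {
                match c {
                    '"' => out.push_str("\\\""),
                    '\\' => out.push_str("\\\\"),
                    // Control characters must be escaped in JSON strings.
                    c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
                    c => out.push(c),
                }
            }
            out.push('"');
            out
        }
        Value::Array(items) => {
            let parts: Vec<String> = items.iter().map(to_json).collect();
            format!("[{}]", parts.join(","))
        }
        // JSON has no tags: drop the tag number and keep the tagged value.
        Value::Tag(_, inner) => to_json(inner),
    }
}
```

Dropping tags is lossy by design, so this would suit a display or piping use case rather than round-tripping.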

Update to nom 5

Probably should be done before #17 in case the errors change.

Cargo test with minimal versions fails

proptest depends on rand, which currently fails to build with minimal versions: rust-random/rand#741

Once that's resolved should try to change the minimal CI job from just building to actually running the tests.

Whitespace within b64 string not accepted

Currently the cbor2diag tool seems to treat whitespace within hex-encoded byte strings differently from whitespace within base64-encoded byte strings. An example of the bad behavior is below. It seems that where the space is in the base64 sequence doesn't matter; any space causes this issue.

Section G.1 of RFC 8610 indicates that all whitespace is ignored. Its examples use hex encoding, but my interpretation is that it should also apply to base64 encoding.

[STRIP ~]$ echo "h'6865  6C6C 6F'" | diag2pretty.rb
45            # bytes(5)
   68656c6c6f # "hello"
[STRIP ~]$ echo "b64'aGVsbG8='" | diag2pretty.rb
45            # bytes(5)
   68656c6c6f # "hello"
[STRIP ~]$ echo "b64'aGVs bG8='" | diag2pretty.rb
*** can't parse b64'aGVs bG8='
*** Expected one of [0-9a-zA-Z_\-+/], [=], "'" at line 1, column 9 (byte 9) after b64'aGVs
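The behavior asked for here can be sketched as a decoder that skips whitespace before looking at each character. This is a standalone illustration, not the parser used by any of the tools mentioned; `decode_b64_lenient` is a hypothetical name, and it accepts both the standard and URL-safe alphabets to match the character set in the error message above. Padding is not strictly validated.

```rust
/// Decodes base64 (standard or URL-safe alphabet), ignoring ASCII
/// whitespace anywhere in the input. Returns `None` on any other
/// unexpected character. Hypothetical helper for illustration.
fn decode_b64_lenient(input: &str) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    let mut buffer: u32 = 0; // holds not-yet-emitted bits in its low end
    let mut bits = 0;        // number of pending bits in `buffer`
    let mut seen_padding = false;

    for c in input.chars() {
        if c.is_ascii_whitespace() {
            continue; // the whole point: whitespace is simply skipped
        }
        if c == '=' {
            seen_padding = true;
            continue;
        }
        if seen_padding {
            return None; // data after padding is not allowed
        }
        let sextet = match c {
            'A'..='Z' => c as u32 - 'A' as u32,
            'a'..='z' => c as u32 - 'a' as u32 + 26,
            '0'..='9' => c as u32 - '0' as u32 + 52,
            '+' | '-' => 62,
            '/' | '_' => 63,
            _ => return None,
        };
        buffer = (buffer << 6) | sextet;
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((buffer >> bits) as u8);
        }
    }
    Some(out)
}
```

With this approach `aGVs bG8=` and `aGVsbG8=` decode to the same bytes, which mirrors how the hex form already treats `6865  6C6C 6F`.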

Distribute CLI as precompiled wasm blob

Some very quick testing via cargo-wasi shows that everything except colored output appears to work (probably need to add some support for something in termcolor). From a release build wasmtime appears to need ~7s on a Macbook Air to JIT the optimized wasm blob on first run, and it's pretty much instantaneous on future runs.

How to actually do this? 🤷‍♂

One additional thing I would like to do with this is to sandbox the tool more. The only thing it needs access to is stdin+stdout+args, all network and filesystem access should be disabled. It doesn't look like running from wasmtime as a CLI tool supports this, so maybe cbor-diag-cli should use wasmtime as a library if there's some way to do so there.

Nightly build failed

Failed Run

Retaining byte string serialization variants

Byte strings have two widespread serialization variants: 'text' and h'74657874' (plus the rarer b32, h32 and b64 prefixes, which I personally don't care about, but hey, they're there). It would be nice if this could be preserved, maybe as an extra Option property of ByteString.

RFC 8610 Appendix G (Extended Diagnostic Notation) provides even more options, including internal whitespace and embedded CBOR. They are more complex and not really on my wishlist, but it might be good to be aware of them when implementing, so as not to duplicate work if that later becomes relevant.

This would be especially convenient when building a diagnostic notation programmatically.

This would probably share patterns with #117, in that it is a property that is set when coming from DN, but unavailable when coming from CBOR. Filling out those gaps when going from arbitrary CBOR to DN could be done by the user at the AST stage by applying arbitrary heuristics, some of which may be provided by cbor-diag-rs, but that's ultimately application specific. (For example, a simple universal heuristic would be taking the ratio of printable ASCII characters; a more application specific choice might be guided by CDDL).
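The "ratio of printable ASCII characters" heuristic mentioned above could look roughly like this. The `ByteStringStyle` enum, the `guess_style` function and the 90% threshold are all assumptions made for the sketch, not existing API or a recommended cutoff.

```rust
/// Hypothetical marker for how a byte string should be written in
/// diagnostic notation.
#[derive(Debug, PartialEq)]
enum ByteStringStyle {
    /// Written as 'text'.
    Text,
    /// Written as h'74657874'.
    Hex,
}

/// Picks a style for byte strings that came from raw CBOR and therefore
/// carry no style of their own. Uses an arbitrary 90% threshold.
fn guess_style(bytes: &[u8]) -> ByteStringStyle {
    if bytes.is_empty() {
        return ByteStringStyle::Hex;
    }
    let printable = bytes
        .iter()
        .filter(|b| b.is_ascii_graphic() || **b == b' ')
        .count();
    // Integer arithmetic for "printable / len >= 0.9".
    if printable * 10 >= bytes.len() * 9 {
        ByteStringStyle::Text
    } else {
        ByteStringStyle::Hex
    }
}
```

An application with better knowledge (for example from CDDL) would replace this function rather than tune the threshold.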

API changes for extensibility

Given how public the core DataItem type is, I think that issues #117, #132 and #138 will all need API changes: DataItem cannot gain another variant CommentedItem { comment_before: String, comment_after: String, item: Box<DataItem> }, nor can ByteString gain a field that tells whether it's a '', an h'' or a b64'' string.

I've briefly tried sprinkling in a few #[non_exhaustive] attributes (which are themselves a breaking API change), but they had the side effect of breaking construction via struct-literal syntax such as DataItem::Integer { value: 10, bitwidth: 0 }, since such a literal might be missing fields that are added later.

I suggest that a single API change be made in which a lot of API is made private (possibly DataItem even turns into a struct so it can have hidden internals). Maybe the breaking change would not even add the features, just change the types so that extensibility is possible. After, things could look like constructors that are more focused on the values:

let item = DataItem::integer(42);

We could still allow setting bit widths etc., but I don't think that it's needed often in practice. (It gets set through internal access anyway when deserializing CBOR, but when constructing diagnostic notation programmatically, I don't think it should be the focus of ergonomics.)
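To make the idea more concrete, here is a minimal sketch of a struct with hidden internals and value-focused constructors. Everything in it (the `Kind` enum, `with_bitwidth`, `as_integer`, the field types) is hypothetical and chosen for illustration; it is not the crate's current or planned API.

```rust
/// Hypothetical redesign: a struct with private internals instead of a
/// public enum, so fields and variants can be added without breaking users.
#[derive(Debug, Clone, PartialEq)]
pub struct DataItem {
    kind: Kind,
}

/// Private representation; free to grow new variants and fields.
#[derive(Debug, Clone, PartialEq)]
enum Kind {
    Integer { value: u64, bitwidth: Option<u8> },
    TextString { data: String },
}

impl DataItem {
    /// Value-focused constructor; encoding details get defaults.
    pub fn integer(value: u64) -> Self {
        DataItem { kind: Kind::Integer { value, bitwidth: None } }
    }

    pub fn text(data: impl Into<String>) -> Self {
        DataItem { kind: Kind::TextString { data: data.into() } }
    }

    /// Optional builder-style override for the rarer cases.
    /// Has no effect on non-integer items in this sketch.
    pub fn with_bitwidth(mut self, bits: u8) -> Self {
        if let Kind::Integer { bitwidth, .. } = &mut self.kind {
            *bitwidth = Some(bits);
        }
        self
    }

    /// Read access goes through methods rather than pattern matching.
    pub fn as_integer(&self) -> Option<u64> {
        match &self.kind {
            Kind::Integer { value, .. } => Some(*value),
            _ => None,
        }
    }
}
```

The trade-off is that users lose exhaustive pattern matching on the public type, which is exactly what makes later additions non-breaking.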

Thing is: I don't know the typical use of the crate well enough to make a full proposal yet -- I mainly use it to do a full conversion without ever touching intermediate artifacts (yet -- once #132 is in, I'll start reaching in more).

Add function to output unannotated hex

Needed for CLI, would be useful for generating links in webpage as well

https://github.com/Nemo157/cbor-diag-rs/blob/ae83dddee571c6769e9e79abd4ca97ad06a581b2/cli/src/main.rs#L111-L113
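For reference, unannotated hex output needs nothing beyond the standard library. The function below is a standalone sketch; the name `to_hex` and its signature are assumptions, not the API that was eventually added.

```rust
/// Formats bytes as lowercase hex with no separators or annotations.
/// Hypothetical helper for illustration.
fn to_hex(bytes: &[u8]) -> String {
    use std::fmt::Write;

    let mut out = String::with_capacity(bytes.len() * 2);
    for b in bytes {
        // Two hex digits per byte, zero-padded.
        write!(out, "{:02x}", b).expect("writing to a String cannot fail");
    }
    out
}
```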

RUSTSEC-2020-0159 - Potential segfault in localtime_r invocations

Affects chrono < 0.4.20

See https://rustsec.org/advisories/RUSTSEC-2020-0159

Add useful errors

https://github.com/Nemo157/cbor-diag-rs/blob/ae83dddee571c6769e9e79abd4ca97ad06a581b2/src/error.rs#L33

Improve "trivial" estimation

The current "pretty" format uses an estimation of when a value is "trivial" in order to print it on one line when it would normally be split to multiline, e.g. an array containing a single trivial element, a map containing a trivial key-value pair:

https://github.com/Nemo157/cbor-diag-rs/blob/8fdda103fce2cfaad7982365986001d904e3821f/src/encode/diag.rs#L21-L67

This has issues, e.g. { "hello": "world" } will be kept on one line, but [1, 2, 3, 4] will be split over 6 lines despite being a similar size/complexity.

A better heuristic would be to perform an estimate of how much space the value will take to print, and set an arbitrary upper limit on that. For most data the estimate can probably be perfect, but some like floating point numbers are probably not worth calculating exactly. Some data types like maps might apply an extra "cost" to their contained values because of their relative complexity.
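A width-budget version of this heuristic might look like the following. It works on a toy `Item` type rather than the crate's AST, and the names `estimate` and `fits_on_one_line` as well as the exact per-type costs are assumptions for the sketch.

```rust
/// Hypothetical, simplified stand-in for the crate's data items.
enum Item {
    Integer(u64),
    Text(String),
    Array(Vec<Item>),
    Map(Vec<(Item, Item)>),
}

/// Subtracts the estimated printed width of `item` from `budget`.
/// Returns the remaining budget, or `None` once the budget is exhausted.
fn estimate(item: &Item, budget: usize) -> Option<usize> {
    match item {
        Item::Integer(n) => budget.checked_sub(n.to_string().len()),
        // Two extra characters for the surrounding quotes.
        Item::Text(s) => budget.checked_sub(s.len() + 2),
        Item::Array(items) => {
            let mut budget = budget.checked_sub(2)?; // "[" and "]"
            for (i, item) in items.iter().enumerate() {
                if i > 0 {
                    budget = budget.checked_sub(2)?; // ", "
                }
                budget = estimate(item, budget)?;
            }
            Some(budget)
        }
        Item::Map(entries) => {
            let mut budget = budget.checked_sub(4)?; // "{ " and " }"
            for (i, (key, value)) in entries.iter().enumerate() {
                if i > 0 {
                    budget = budget.checked_sub(2)?; // ", "
                }
                budget = estimate(key, budget)?;
                budget = budget.checked_sub(2)?; // ": "
                budget = estimate(value, budget)?;
            }
            Some(budget)
        }
    }
}

/// True if the item is estimated to fit within `max_width` characters.
fn fits_on_one_line(item: &Item, max_width: usize) -> bool {
    estimate(item, max_width).is_some()
}
```

With a single shared limit, [1, 2, 3, 4] (12 characters) and { "hello": "world" } (20 characters) are judged by the same yardstick. An extra "cost" for maps could be modeled by subtracting a larger constant in the `Map` arm.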

Source-map style position annotation

For interactive editing (highlighting cursor positions in a two-paned hex and diagnostic view), or for debugging (implementing pd-body-error-position), it would be cool to match ranges of bytes encoded in CBOR to ranges of bytes encoded in diagnostic notation -- similar to how a compiler outputs debug information matching instructions to source lines.

This is tangentially related to #20, as it would pave the way to color-highlighting hex output.

One thing that'll make this relatively hard for this crate is that it's interconverting via a mutable AST (which on its own is great, just needs some more effort here). A relatively easy API would be to turn a CBOR byte string into a DN text string (or vice versa), and also produce a source map as a list of corresponding (frequently nested) ranges. There's probably a design pattern by which the AST can keep cursors in two serializations, but I don't know how to make a pretty API out of it, or how to do it with neither pinning nor Rc'ing nor indices for which it isn't completely clear which slice they relate to.
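The "list of corresponding (frequently nested) ranges" could be represented as a small tree, sketched below. The `SpanMapping` type and its lookup method are hypothetical; how the encoder would populate such a tree is exactly the open design question and is not addressed here.

```rust
use std::ops::Range;

/// Hypothetical source-map node: one data item's byte range in the CBOR
/// encoding, its range in the diagnostic notation text, and nested items.
#[derive(Debug, Clone, PartialEq)]
struct SpanMapping {
    cbor: Range<usize>,
    diag: Range<usize>,
    children: Vec<SpanMapping>,
}

impl SpanMapping {
    /// Finds the innermost mapping whose CBOR range contains `offset`,
    /// e.g. to highlight the matching diagnostic text under a cursor.
    fn innermost_at_cbor_offset(&self, offset: usize) -> Option<&SpanMapping> {
        if !self.cbor.contains(&offset) {
            return None;
        }
        self.children
            .iter()
            .find_map(|child| child.innermost_at_cbor_offset(offset))
            .or(Some(self))
    }
}
```

For the bytes 82 01 02 and the text [1, 2], the root would map 0..3 to 0..6, with children mapping 1..2 to 1..2 and 2..3 to 4..5. Because the ranges are plain indices, the caller has to remember which slice each one refers to, which is the ambiguity noted above.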

wishlist: EDN literals

With edn-literals recently adopted by the CBOR WG, I'd love to see support for that in here. Implementing cri"" won't make much sense yet because CRI is not finished (when it is, I can provide the workhorse crate), but dt"" seems stable enough.

Like EDN embedded CBOR, this can be ingested without any extra processing, but producing it will need decisions (with "we don't produce it" being a viable first step). The same mechanism that would resolve #132 could be used to eventually produce them -- it'd just need generalization (because now suddenly a float or integer can either be encoded directly with a bitwidth, or the dt EDN literal).

Overflow crash

This issue can be reproduced ONLY when compiling the CLI in Debug; it does NOT occur when compiling the CLI in Release.

Run

$ cargo build -p cbor-diag-cli

then run

$ echo a300470128bf0000002c01c11a62978f4d02a100820400 | ./target/debug/cbor-diag --to diag
thread 'main' panicked at 'attempt to subtract with overflow', src/encode/diag.rs:42:42

The CBOR diagnostic notation of the hex string is

{
    0: h'0128bf0000002c',
    1: 1(1654099789_2),
    2: {0: [4, 0]},
}

The crash occurs because the len variable is hardcoded to 4 and the max parameter given to the estimate() function is less than 4, overflowing the unsigned subtraction max - len.
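The difference between Debug and Release comes from overflow checks, which are on by default only in debug builds: in release the same subtraction wraps around silently instead of panicking. A minimal sketch of one possible fix, using `checked_sub` so that "not enough room" becomes an explicit result (the function name `remaining_budget` is hypothetical; `max` and `len` mirror the names in the description above):

```rust
/// Returns how much of `max` is left after spending `len`,
/// or `None` if `len` does not fit. Never panics and never wraps.
fn remaining_budget(max: usize, len: usize) -> Option<usize> {
    max.checked_sub(len)
}

/// What an unchecked `max - len` effectively computes in a release build
/// without overflow checks: a huge wrapped-around value.
fn wrapped_budget(max: usize, len: usize) -> usize {
    max.wrapping_sub(len)
}
```

Treating `None` as "does not fit on one line" would keep the heuristic's intent while removing the panic.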

Thanks a lot for making and sharing this library!

Program does not work?

When copy-pasting the example from crates.io, the output is "<< was unexpected at this time."
None of the examples listed work for me.

Add "pretty" diagnostic notation

Should probably change the current --to diag output to output prettified diagnostic notation and add another --to compact or something to output a compact form.

Update strum

Revert code changes from Nemo157@77d606c once strum 0.16 is released (hopefully soon Peternator7/strum#69).

Update from nom 5

Building nom with a recent Rust raises warnings about future compatibility:

warning: the following packages contain code that will be rejected by a future version of Rust: nom v5.1.2

It appears that nom uses some macro syntax that was only accidentally valid.

I just briefly tried bumping the nom version to 7, but while I could do the change from separated_list to separated_list0 easily (nom 6's changelog says it was effectively renamed), other errors seem to require understanding what it does, for which unfortunately I don't have the time right now.

error[E0632]: cannot provide explicit generic arguments when `impl Trait` is used in argument position
   --> /home/fmetz/.cargo/registry/src/github.com-1ecc6299db9ec823/cbor-diag-0.1.11/src/encode/hex.rs:506:27
    |
506 |             typed_array::<1>(context, value, "unsigned", |[byte]| byte.to_string())
    |                           ^ explicit generic argument not allowed
    |
    = note: see issue #83701 <https://github.com/rust-lang/rust/issues/83701> for more information

Any help?
Best Regards
Fréd
