cbor-diag-rs's People

Contributors

bors[bot], chrysn, dependabot-preview[bot], nemo157, satelles157

cbor-diag-rs's Issues

Wishlist: support for CBOR Sequences

https://www.rfc-editor.org/rfc/rfc8742.html

For example, if CBOR Sequences were supported, I believe the following should parse:

{
  echo '{"a":1}' | cbor-diag --to=bytes ;
  echo '{"b":2}' | cbor-diag --to=bytes ;
} | cbor-diag --to=annotated

Ideally the following would work as well (accept jsonl/ndjson and other representations of sequences of documents):

{ echo '{"a":1}' ; echo '{"b":2}' ; } | cbor-diag --to=annotated

Colorize output

The CLI should support colorizing the output data to make it easier to read, both diagnostic and annotated hex encoding.

This will probably have to be generated by the library. The easiest approach might be to inject ANSI escape codes into the output strings; the website could then probably convert them with something like https://www.npmjs.com/package/ansi-to-html. Open question: how to support Windows terminals? Hopefully there is something that can parse ANSI escapes and issue the correct console commands, or a library that provides higher-level colorized strings via annotated spans that can be converted into different formats as needed.
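The "annotated spans" idea could look roughly like the sketch below. Everything in it (`Style`, `Span`, `render_ansi`, `render_html`, the color choices and class names) is made up for illustration and is not part of cbor-diag-rs. The point is only that one list of spans can feed both a terminal renderer and an HTML renderer.

```rust
/// Hypothetical semantic classes for pieces of diagnostic output.
#[derive(Clone, Copy)]
enum Style {
    Plain,
    Number,
    Text,
    Bytes,
}

impl Style {
    /// ANSI SGR color code for this class (arbitrary choices), if any.
    fn ansi_code(self) -> Option<&'static str> {
        match self {
            Style::Plain => None,
            Style::Number => Some("36"), // cyan
            Style::Text => Some("32"),   // green
            Style::Bytes => Some("33"),  // yellow
        }
    }

    /// CSS class name for this class (arbitrary choices), if any.
    fn css_class(self) -> Option<&'static str> {
        match self {
            Style::Plain => None,
            Style::Number => Some("number"),
            Style::Text => Some("text"),
            Style::Bytes => Some("bytes"),
        }
    }
}

/// One run of output text tagged with its class.
struct Span {
    style: Style,
    text: String,
}

/// Renders spans for a terminal using ANSI escape codes.
fn render_ansi(spans: &[Span]) -> String {
    let mut out = String::new();
    for span in spans {
        match span.style.ansi_code() {
            Some(code) => {
                out.push_str(&format!("\x1b[{}m{}\x1b[0m", code, span.text));
            }
            None => out.push_str(&span.text),
        }
    }
    out
}

/// Renders the same spans as HTML, escaping the text first.
fn render_html(spans: &[Span]) -> String {
    let mut out = String::new();
    for span in spans {
        let escaped = span
            .text
            .replace('&', "&amp;")
            .replace('<', "&lt;")
            .replace('>', "&gt;");
        match span.style.css_class() {
            Some(class) => {
                out.push_str(&format!("<span class=\"{}\">{}</span>", class, escaped));
            }
            None => out.push_str(&escaped),
        }
    }
    out
}
```

With this shape the website would not need to parse ANSI escapes at all, and a Windows console backend would be one more renderer over the same spans.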

An example of what jq does for colorizing JSON which the diagnostic colorization can probably be based on:

(Screenshot: jq-colors)

Retaining comments

I'm undecided whether this is worthwhile, but the AST could have nodes added to retain comments when parsing extended diagnostic notation. These could then be included when outputting either diagnostic notation, or as part of the annotated hex output.

This wouldn't really help with what I see as the main use cases, though: converting EDN -> bytes for tooling that wants human-writable input, or converting bytes -> {EDN, annotated hex} for tooling that wants to show human-readable output.

JSON output

Would it be possible to add a json output? Basically similar to the diag/compact form but without things like tags that throw off json parsing?

Whitespace within b64 string not accepted

Currently the cbor2diag tool seems to treat whitespace within hex-encoded byte strings differently from whitespace within base64-encoded byte strings. An example of the bad behavior is below. It doesn't seem to matter where the space is in the base64 sequence; any space causes this issue.

Section G.1 of RFC 8610 indicates that all whitespace is ignored. Its examples use hex encoding, but my interpretation is that it should also apply to base64 encoding.

[STRIP ~]$ echo "h'6865  6C6C 6F'" | diag2pretty.rb
45            # bytes(5)
   68656c6c6f # "hello"
[STRIP ~]$ echo "b64'aGVsbG8='" | diag2pretty.rb
45            # bytes(5)
   68656c6c6f # "hello"
[STRIP ~]$ echo "b64'aGVs bG8='" | diag2pretty.rb
*** can't parse b64'aGVs bG8='
*** Expected one of [0-9a-zA-Z_\-+/], [=], "'" at line 1, column 9 (byte 9) after b64'aGVs
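One way a parser could get the behavior described above is to drop whitespace from the body of a b64'' literal before handing it to the base64 decoder. The helper below is a minimal sketch of that step only; its name is hypothetical and it is not taken from any of the tools mentioned here.

```rust
/// Hypothetical helper: removes all ASCII whitespace from the body of a
/// `b64'...'` literal so that the rest can be passed to a base64 decoder.
fn strip_b64_whitespace(body: &str) -> String {
    body.chars().filter(|c| !c.is_ascii_whitespace()).collect()
}
```

Applied to the failing input, `aGVs bG8=` becomes `aGVsbG8=`, which is the form that already parses.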

Distribute CLI as precompiled wasm blob

Some very quick testing via cargo-wasi shows that everything except colored output appears to work (probably need to add some support for something in termcolor). From a release build wasmtime appears to need ~7s on a Macbook Air to JIT the optimized wasm blob on first run, and it's pretty much instantaneous on future runs.

How to actually do this? 🤷‍♂

One additional thing I would like to do with this is to sandbox the tool more. The only thing it needs access to is stdin+stdout+args, all network and filesystem access should be disabled. It doesn't look like running from wasmtime as a CLI tool supports this, so maybe cbor-diag-cli should use wasmtime as a library if there's some way to do so there.

Retaining byte string serialization variants

Byte strings have two widespread serialization variants, 'text' and h'74657874', plus the rarer b32, h32 and b64 prefixes (which I personally don't care about, but hey, they're there). It would be nice if the variant could be preserved, maybe as an extra Option property of ByteString.

RFC 8610 Appendix G (Extended Diagnostic Notation) provides even more options, including internal whitespace and embedded CBOR. They are more complex and not really on my wishlist, but it might be good to be aware of them when implementing this, so that work isn't duplicated if they later become relevant.

This would be especially convenient when building a diagnostic notation programmatically.

This would probably share patterns with #117, in that it is a property that is set when coming from DN, but unavailable when coming from CBOR. Filling out those gaps when going from arbitrary CBOR to DN could be done by the user at the AST stage by applying arbitrary heuristics, some of which may be provided by cbor-diag-rs, but that's ultimately application specific. (For example, a simple universal heuristic would be taking the ratio of printable ASCII characters; a more application specific choice might be guided by CDDL).
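The "ratio of printable ASCII characters" heuristic could be as small as the sketch below. The `BytesPresentation` enum, the function name and the 90% threshold are all assumptions made for illustration; nothing here exists in cbor-diag-rs.

```rust
/// Hypothetical presentation hint for a byte string.
#[derive(Debug, PartialEq)]
enum BytesPresentation {
    /// Print as a 'text' literal.
    Text,
    /// Print as an h'...' literal.
    Hex,
}

/// Chooses a presentation from the share of printable ASCII bytes.
/// The 90% threshold is an arbitrary assumption.
fn guess_presentation(bytes: &[u8]) -> BytesPresentation {
    if bytes.is_empty() {
        return BytesPresentation::Hex;
    }
    let printable = bytes
        .iter()
        .filter(|b| b.is_ascii_graphic() || **b == b' ')
        .count();
    // Integer form of `printable / len >= 0.9`, avoiding floating point.
    if printable * 10 >= bytes.len() * 9 {
        BytesPresentation::Text
    } else {
        BytesPresentation::Hex
    }
}
```

A real implementation would also have to decide what to do with bytes such as the apostrophe, which are printable but need escaping inside a 'text' literal.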

API changes for extensibility

Given how public the core DataItem type is, I think that issues #117, #132 and #138 will all need API changes: Neither can DataItem gain another variant CommentedItem { comment_before: String, comment_after: String, item: Box<DataItem> }, nor can ByteString gain a field that tells whether it's a '', an h'' or a b64'' string.

I've briefly tried sprinkling in a few #[non_exhaustive] (which are a breaking API change), but they had the side effect of breaking construction as DataItem::Integer { value: 10, bitwidth: 0 } as that may have missed fields.

I suggest that a single API change be made in which a lot of the API is made private (possibly DataItem even turns into a struct so it can have hidden internals). Maybe the breaking change would not even add the features, just change the types so that extensibility is possible. Afterwards, construction could go through constructors that are more focused on the values:

let item = DataItem::integer(42);

We could still allow setting bit widths etc., but I don't think that is needed often in practice. (It gets set through internal access anyway when deserializing CBOR, but when constructing diagnostic notation programmatically, I don't think it should be the focus of ergonomics.)
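To make the shape of such an API concrete, here is a rough sketch. It is not the crate's current API: the private `Kind` enum, the `Option<u8>` bit width and every method other than `DataItem::integer` are assumptions for illustration. Because the fields are private, more of them can be added later without breaking callers.

```rust
mod sketch {
    /// Hypothetical opaque item. Callers cannot name the fields, so new
    /// ones (comments, byte string variants) can be added compatibly.
    #[derive(Debug, Clone, PartialEq)]
    pub struct DataItem {
        kind: Kind,
        comment_before: Option<String>,
    }

    /// Private representation; only one variant is shown here.
    #[derive(Debug, Clone, PartialEq)]
    enum Kind {
        Integer { value: u64, bitwidth: Option<u8> },
    }

    impl DataItem {
        /// Value-focused constructor; the encoding width is left unset.
        pub fn integer(value: u64) -> Self {
            DataItem {
                kind: Kind::Integer { value, bitwidth: None },
                comment_before: None,
            }
        }

        /// Optional builder step for callers that care about the width.
        pub fn with_bitwidth(mut self, bits: u8) -> Self {
            match &mut self.kind {
                Kind::Integer { bitwidth, .. } => *bitwidth = Some(bits),
            }
            self
        }

        /// Optional builder step attaching a comment to the item.
        pub fn with_comment_before(mut self, comment: &str) -> Self {
            self.comment_before = Some(comment.to_string());
            self
        }

        /// Read access goes through methods instead of public fields.
        pub fn as_integer(&self) -> Option<u64> {
            match &self.kind {
                Kind::Integer { value, .. } => Some(*value),
            }
        }

        pub fn bitwidth(&self) -> Option<u8> {
            match &self.kind {
                Kind::Integer { bitwidth, .. } => *bitwidth,
            }
        }

        pub fn comment_before(&self) -> Option<&str> {
            self.comment_before.as_deref()
        }
    }
}
```

Usage would then read `sketch::DataItem::integer(42).with_bitwidth(64)`, and pattern matching on variants is replaced by accessor methods, which is the main ergonomic cost of this design.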

Thing is: I don't know the typical use of the crate well enough to make a full proposal yet -- I mainly use it to do a full conversion without ever touching intermediate artifacts (yet -- once #132 is in, I'll start reaching in more).

Improve "trivial" estimation

The current "pretty" format uses an estimation of when a value is "trivial" in order to print it on one line when it would normally be split to multiline, e.g. an array containing a single trivial element, a map containing a trivial key-value pair:

https://github.com/Nemo157/cbor-diag-rs/blob/8fdda103fce2cfaad7982365986001d904e3821f/src/encode/diag.rs#L21-L67

This has issues, e.g. { "hello": "world" } will be kept on one line, but [1, 2, 3, 4] will be split over 6 lines despite being a similar size/complexity.

A better heuristic would be to perform an estimate of how much space the value will take to print, and set an arbitrary upper limit on that. For most data the estimate can probably be perfect, but some like floating point numbers are probably not worth calculating exactly. Some data types like maps might apply an extra "cost" to their contained values because of their relative complexity.
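A width-based estimate with an upper limit might look like the sketch below. The `Value` enum is a simplified stand-in for the crate's AST, and the separator widths and the per-entry map cost of 1 are arbitrary assumptions. The function returns `None` as soon as the budget is exceeded, so large values are not walked in full.

```rust
/// Hypothetical, simplified value tree standing in for the crate's AST.
enum Value {
    Integer(u64),
    Text(String),
    Array(Vec<Value>),
    Map(Vec<(Value, Value)>),
}

/// Estimates the one-line printed width of `value`.
/// Returns `None` if the estimate exceeds `budget`.
fn estimate(value: &Value, budget: usize) -> Option<usize> {
    let width = match value {
        Value::Integer(n) => n.to_string().len(),
        // Two quote characters; escapes are ignored in this estimate.
        Value::Text(s) => s.chars().count() + 2,
        Value::Array(items) => {
            let mut total = 2; // "[" and "]"
            for (i, item) in items.iter().enumerate() {
                if i > 0 {
                    total += 2; // ", "
                }
                // checked_sub stops early instead of underflowing.
                total += estimate(item, budget.checked_sub(total)?)?;
            }
            total
        }
        Value::Map(entries) => {
            let mut total = 4; // "{ " and " }"
            for (i, (key, val)) in entries.iter().enumerate() {
                if i > 0 {
                    total += 2; // ", "
                }
                total += estimate(key, budget.checked_sub(total)?)?;
                total += 2; // ": "
                total += estimate(val, budget.checked_sub(total)?)?;
                total += 1; // extra "complexity" cost per map entry
            }
            total
        }
    };
    if width <= budget {
        Some(width)
    } else {
        None
    }
}
```

Under these assumptions `[1, 2, 3, 4]` is estimated at 12 and `{ "hello": "world" }` at 21, so with any limit above 21 both stay on one line, which removes the inconsistency described above.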

Source-map style position annotation

For interactive editing (highlighting cursor positions in a two-paned hex and diagnostic view), or for debugging (implementing pd-body-error-position), it would be cool to match ranges of bytes encoded in CBOR to ranges of bytes encoded in diagnostic notation -- similar to how a compiler outputs debug information matching instructions to source lines.

This is tangentially related to #20, as it would pave the way to color-highlighting hex output.

One thing that'll make this relatively hard for this crate is that it's interconverting via a mutable AST (which on its own is great, just needs some more effort here). A relatively easy API would be to turn a CBOR byte string into a DN text string (or vice versa), and also produce a source map as a list of corresponding (frequently nested) ranges. There's probably a design pattern by which the AST can keep cursors in two serializations, but I don't know how to make a pretty API out of it, or how to do it with neither pinning nor Rc'ing nor indices for which it isn't completely clear which slice they relate to.
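The "list of corresponding ranges" form of the source map could be sketched as below. The `Mapping` type and the lookup function are hypothetical and sidestep the AST question entirely; they only show what a consumer such as a two-paned editor would do with the list.

```rust
use std::ops::Range;

/// Hypothetical source-map entry pairing a range of CBOR bytes with the
/// range of diagnostic-notation text that was produced from it.
struct Mapping {
    cbor: Range<usize>,
    diag: Range<usize>,
}

/// Finds the innermost (shortest) mapping whose CBOR range contains
/// `offset` and returns the matching diagnostic-notation range.
fn diag_range_for(map: &[Mapping], offset: usize) -> Option<Range<usize>> {
    map.iter()
        .filter(|m| m.cbor.contains(&offset))
        .min_by_key(|m| m.cbor.len())
        .map(|m| m.diag.clone())
}
```

For the bytes `82 04 00` and the text `[4, 0]`, the map would hold one entry for the whole array (bytes 0..3, text 0..6) and one per element, so a cursor on the last byte resolves to the text range 4..5.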

wishlist: EDN literals

With edn-literals recently adopted by the CBOR WG, I'd love to see support for that in here. Implementing cri"" won't make much sense yet because CRI is not finished (when it is, I can provide the workhorse crate), but dt"" seems stable enough.

Like EDN embedded CBOR, this can be ingested without any extra processing, but producing it will need decisions (with "we don't produce it" being a viable first step). The same mechanism that would resolve #132 could be used to eventually produce them -- it'd just need generalization (because now suddenly a float or integer can either be encoded directly with a bitwidth, or the dt EDN literal).

Overflow crash

This issue can be reproduced ONLY when compiling the CLI in Debug; it does NOT occur when compiling the CLI in Release.

Run

$ cargo build -p cbor-diag-cli

then run

$ echo a300470128bf0000002c01c11a62978f4d02a100820400 | ./target/debug/cbor-diag --to diag
thread 'main' panicked at 'attempt to subtract with overflow', src/encode/diag.rs:42:42

The CBOR diagnostic notation of the hex string is

{
    0: h'0128bf0000002c',
    1: 1(1654099789_2),
    2: {0: [4, 0]},
}

The crash occurs because the len variable is hardcoded to 4 and the max parameter given to the estimate() function is less than 4, overflowing the unsigned subtraction max - len.
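The Debug/Release difference follows from Rust's defaults: integer overflow checks are enabled in debug builds and disabled in release builds, where the subtraction silently wraps around instead of panicking. A minimal sketch of a guard, using a hypothetical function name rather than the actual code in src/encode/diag.rs, is:

```rust
/// Hypothetical stand-in for the failing arithmetic: `len` is the fixed
/// overhead and `max` the remaining budget. Returns `None` instead of
/// overflowing when the overhead does not fit.
fn remaining_budget(max: usize, len: usize) -> Option<usize> {
    max.checked_sub(len)
}
```

The caller can then treat `None` as "not trivial" and fall back to multi-line output.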

Thanks a lot for making and sharing this library!

Program does not work?

When I copy and paste the example from crates.io, the output is "<< was unexpected at this time."
None of the listed examples work for me.

Add "pretty" diagnostic notation

Should probably change the current --to diag output to output prettified diagnostic notation and add another --to compact or something to output a compact form.

Update from nom 5

Building nom with a recent Rust raises warnings about future compatibility:

warning: the following packages contain code that will be rejected by a future version of Rust: nom v5.1.2

It appears that nom uses some macro syntax that was only accidentally valid.

I just briefly tried bumping the nom version to 7, but while I could do the change from separated_list to separated_list0 easily (nom 6's changelog says it was effectively renamed), other errors seem to require understanding what it does, for which unfortunately I don't have the time right now.

Support streaming input

CLI should support streaming in undelimited CBOR data items. Will need some kind of update to the library to support reading a prefix of an input and returning where it ended.
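As a rough illustration of "reading a prefix and returning where it ended", the sketch below computes the length of the first data item in a buffer using only the CBOR header rules from RFC 8949. It is not the crate's parser: the function names are made up, it does not validate content (for example, the chunk types of indefinite-length strings or the pairing of map entries), and it recurses, so deeply nested input could exhaust the stack.

```rust
/// Returns the byte length of the first complete CBOR data item in
/// `input`, or `None` if the input is truncated or malformed.
fn item_len(input: &[u8]) -> Option<usize> {
    let first = *input.first()?;
    let (major, info) = (first >> 5, first & 0x1f);
    // Decode the argument that follows the initial byte.
    let (arg, head): (Option<u64>, usize) = match info {
        0..=23 => (Some(info as u64), 1),
        24..=27 => {
            let n = 1usize << (info - 24); // 1, 2, 4 or 8 bytes
            let bytes = input.get(1..1 + n)?;
            let value = bytes.iter().fold(0u64, |acc, b| (acc << 8) | *b as u64);
            (Some(value), 1 + n)
        }
        31 => (None, 1),  // indefinite length, or a "break" byte
        _ => return None, // 28..=30 are reserved
    };
    match (major, arg) {
        // Integers, simple values and floats end with their header.
        (0, Some(_)) | (1, Some(_)) | (7, Some(_)) => Some(head),
        // Definite-length byte and text strings: header plus payload.
        (2, Some(n)) | (3, Some(n)) => {
            if n > (input.len() - head) as u64 {
                return None;
            }
            Some(head + n as usize)
        }
        (4, Some(n)) => skip_items(input, head, n),
        (5, Some(n)) => skip_items(input, head, n.checked_mul(2)?),
        (6, Some(_)) => skip_items(input, head, 1),
        // Indefinite length: nested items until the 0xff break byte.
        (2..=5, None) => {
            let mut pos = head;
            loop {
                if *input.get(pos)? == 0xff {
                    return Some(pos + 1);
                }
                pos += item_len(&input[pos..])?;
            }
        }
        _ => None, // stray break byte or indefinite-length tag
    }
}

/// Skips `count` consecutive items starting at `start` and returns the
/// offset just past the last one.
fn skip_items(input: &[u8], start: usize, count: u64) -> Option<usize> {
    let mut pos = start;
    for _ in 0..count {
        pos += item_len(input.get(pos..)?)?;
    }
    Some(pos)
}
```

A streaming CLI could call this on its buffer, parse the first `n` bytes as one item, and keep the remainder for the next round; a `None` result would mean "read more input" or "malformed", which a real implementation would need to distinguish.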

Compile error when installing cbor-diag-cli

Hi,
I am new to Rust but want to use the cbor-diag-cli. When using the cargo command to install it, I get a bunch of errors of the same category:

error[E0632]: cannot provide explicit generic arguments when `impl Trait` is used in argument position
   --> /home/fmetz/.cargo/registry/src/github.com-1ecc6299db9ec823/cbor-diag-0.1.11/src/encode/hex.rs:506:27
    |
506 |             typed_array::<1>(context, value, "unsigned", |[byte]| byte.to_string())
    |                           ^ explicit generic argument not allowed
    |
    = note: see issue #83701 <https://github.com/rust-lang/rust/issues/83701> for more information

Any help?
Best Regards
Fréd
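For context, E0632 is reported by older compilers: passing explicit generic arguments to a function that also takes an `impl Trait` argument was stabilized in Rust 1.63, so updating the toolchain (for example with `rustup update`) is the likely fix. The sketch below reproduces the pattern with a hypothetical, simplified `typed_array`; it is not the crate's function, which takes more parameters.

```rust
/// Hypothetical, simplified version of the pattern in the error message:
/// a const generic parameter given explicitly at the call site, combined
/// with an `impl Trait` argument. Compiles on Rust 1.63 or newer.
/// `N` must be non-zero, because `chunks_exact(0)` panics.
fn typed_array<const N: usize>(
    bytes: &[u8],
    describe: impl Fn([u8; N]) -> String,
) -> Vec<String> {
    bytes
        .chunks_exact(N)
        .map(|chunk| {
            let mut element = [0u8; N];
            element.copy_from_slice(chunk);
            describe(element)
        })
        .collect()
}
```

A call such as `typed_array::<1>(&[1, 2, 3], |[byte]| byte.to_string())` is the form that older compilers reject.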
