3hren / msgpack-rust

MessagePack implementation for Rust / msgpack.org[Rust]

License: MIT License

msgpack-rust's Introduction

RMP - Rust MessagePack

RMP is a complete pure-Rust MessagePack implementation. MessagePack is a compact self-describing binary serialization format.

This project consists of three crates:

RMP-Serde (Documentation) — easy serializing/deserializing via Serde.
RMP-Value (Documentation) — a universal Value enum that can hold any MessagePack type. Allows deserializing arbitrary messages without a known schema.
RMP (Documentation) — low-level functions for reading/writing encoded data.

Features

Convenient and powerful APIs

RMP is designed to be lightweight and straightforward. There is a high-level API with Serde support, which provides a convenient interface for encoding and decoding Rust data structures using the derive attribute. There are also low-level APIs, which give you full control over the encoding/decoding process, with no-std support and without heap allocations.

Zero-copy value decoding

RMP can decode bytes from a buffer in a zero-copy manner. Parsing is implemented in safe Rust.

Robust, stable and tested

This project is developed using TDD and CI, so any bugs found will be fixed without breaking existing functionality.

Why MessagePack?

It's smaller and much simpler to parse than JSON. The encoded data is self-describing and extensible, without using any schema definitions. It supports the same data types as JSON, plus binary data, non-string map keys, all float values, and 64-bit numbers. Msgpack values use <length><data> encoding, so they can be safely concatenated and read from a stream.
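As a rough illustration of why length-prefixed values can be concatenated and read back from a stream, here is a minimal std-only sketch. It handles only the fixstr format (marker byte `0xa0 | len`, for strings shorter than 32 bytes); the function names `encode_fixstr`, `read_fixstr` and `roundtrip` are made up for this example and are not part of RMP.

```rust
use std::io::{self, Read};

/// Encode a short string (< 32 bytes) as a MessagePack fixstr:
/// one marker byte `0xa0 | len`, followed by the UTF-8 bytes.
/// (Hypothetical helper for illustration, not an RMP function.)
fn encode_fixstr(s: &str) -> Vec<u8> {
    assert!(s.len() < 32, "fixstr holds at most 31 bytes");
    let mut out = vec![0xa0 | s.len() as u8];
    out.extend_from_slice(s.as_bytes());
    out
}

/// Read one fixstr from any reader.
/// Returns `Ok(None)` when the stream ends cleanly before a new value.
fn read_fixstr<R: Read>(rd: &mut R) -> io::Result<Option<String>> {
    let mut marker = [0u8; 1];
    if rd.read(&mut marker)? == 0 {
        return Ok(None); // no more values
    }
    if marker[0] & 0xe0 != 0xa0 {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "not a fixstr"));
    }
    // The length is stored in the low five bits of the marker.
    let len = (marker[0] & 0x1f) as usize;
    let mut data = vec![0u8; len];
    rd.read_exact(&mut data)?;
    String::from_utf8(data)
        .map(Some)
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

/// Concatenate independently encoded values and read them all back.
fn roundtrip(words: &[&str]) -> io::Result<Vec<String>> {
    let mut stream = Vec::new();
    for w in words {
        stream.extend(encode_fixstr(w)); // plain concatenation, no separators
    }
    let mut rd = &stream[..];
    let mut out = Vec::new();
    while let Some(s) = read_fixstr(&mut rd)? {
        out.push(s);
    }
    Ok(out)
}
```

Because every value announces its own length, the reader never needs a delimiter or a schema to find where the next value starts.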

MessagePack is similar to CBOR, but has simpler data types (no bignums, decimal floats, dates, indefinite-length sets, etc.).

Requirements

An up-to-date stable version of Rust, preferably from rustup.

msgpack-rust's Issues

Encoding then decoding `Option<Vec<u8>>` (and similar) loses `None` and gets `Some([])` instead

A quick repo setup for using serde to demonstrate: https://github.com/jmesmon/rmp-serde-bork/blob/master/src/main.rs.in

I don't know if this reproduces with rustc-serialize

Integration with serde

From serde-rs/serde#96.

I see it as an optional feature instead of rustc_serialize (or maybe leave both of them?).

Investigate how to cook serde (compile some examples, see how json serialization/deserialization is done etc.)
Implement a decoder.
Implement an encoder.

Merge `high` module into `core`

Currently it looks like it is in the wrong place. Moreover there will be a single encoding/decoding function, maybe some traits, but not much.

Raw Binary support

Is Raw Binary supported at the moment? I believe Vec<u8> is serialized as an array of numbers instead of a Raw String. Is there a wrapper that can be used?

missing implementation of emit_enum

Hello,

It would be nice to be able to encode and decode enums.
When could it be expected?

Regards,
Godfryd

Module decomposition

I need approximately the following modules:

lib.rs (common types)
core (low-level encoding/decoding without heap allocations)
  - mod.rs (error and result types)
  - decode.rs
  - encode.rs
high
  - mod.rs (error and result types)
  - decode.rs
  - encode.rs

Test decomposition

Historically we have a single large file containing all the code with its tests.

Implement From trait for Value

It would be much easier to work with values if you could just use primitive data types and have them converted automatically by the From trait.
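A minimal sketch of what such conversions could look like. The `Value` enum below is a simplified, hypothetical stand-in defined only for this example; it is not the crate's actual type, and the set of variants is an assumption.

```rust
/// Hypothetical, simplified value type (not the crate's real definition).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Nil,
    Boolean(bool),
    Integer(i64),
    F64(f64),
    String(String),
    Array(Vec<Value>),
}

impl From<bool> for Value {
    fn from(v: bool) -> Self {
        Value::Boolean(v)
    }
}

impl From<i32> for Value {
    fn from(v: i32) -> Self {
        Value::Integer(i64::from(v))
    }
}

impl From<i64> for Value {
    fn from(v: i64) -> Self {
        Value::Integer(v)
    }
}

impl From<f64> for Value {
    fn from(v: f64) -> Self {
        Value::F64(v)
    }
}

impl<'a> From<&'a str> for Value {
    fn from(v: &'a str) -> Self {
        Value::String(v.to_owned())
    }
}

impl From<String> for Value {
    fn from(v: String) -> Self {
        Value::String(v)
    }
}

/// Any vector whose items convert to `Value` becomes an array.
impl<T: Into<Value>> From<Vec<T>> for Value {
    fn from(v: Vec<T>) -> Self {
        Value::Array(v.into_iter().map(Into::into).collect())
    }
}

/// `None` maps to nil, `Some(x)` to whatever `x` converts to.
impl<T: Into<Value>> From<Option<T>> for Value {
    fn from(v: Option<T>) -> Self {
        v.map_or(Value::Nil, Into::into)
    }
}
```

With impls like these, call sites can write `Value::from(42)` or `vec![Some("a"), None].into()` instead of spelling out variants.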

Zero-copy binary decoding

Implement no-copy binary decoding from [u8].

How to read i32, if I don't know whether it is signed?

hey,

Some language interfaces for msgpack decide dynamically whether they use signed or unsigned integers. Specifically, I am working with java. Java does not have unsigned types, so if I pack an int (32 bits) the msgpack Java implementation chooses the best possible coding. If the value is small enough and positive, this may as well be FixPos, U8 or U16.

read_i32_loosely handles FixNeg, I8, I16 and I32, but no unsigned types. read_u32_loosely handles all unsigned types, but not the signed types. So if I know that the next value in a buffer fits into an i32, but it may be decoded as U16 (or FixPos), how do I read it?
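One possible approach, sketched with the standard library only: read whatever integer encoding is present into a wide intermediate type, then range-check against the target. The names `read_any_int`, `read_i32_any` and `LooseIntError` are hypothetical and are not functions provided by rmp; the marker bytes follow the MessagePack format specification.

```rust
use std::convert::TryFrom;

#[derive(Debug, PartialEq)]
enum LooseIntError {
    UnexpectedEof,
    NotAnInteger(u8),
    OutOfRange,
}

/// Read any MessagePack integer, whatever its encoded signedness or width.
/// The result is an i128 so that both u64 and i64 payloads fit.
/// On success `buf` is advanced past the value.
fn read_any_int(buf: &mut &[u8]) -> Result<i128, LooseIntError> {
    let data: &[u8] = *buf;
    let (&marker, rest) = data.split_first().ok_or(LooseIntError::UnexpectedEof)?;
    // Payload width in bytes and signedness for each integer marker.
    let (width, signed) = match marker {
        0x00..=0x7f => {
            *buf = rest;
            return Ok(i128::from(marker)); // positive fixint
        }
        0xe0..=0xff => {
            *buf = rest;
            return Ok(i128::from(marker as i8)); // negative fixint
        }
        0xcc => (1usize, false),
        0xcd => (2, false),
        0xce => (4, false),
        0xcf => (8, false),
        0xd0 => (1, true),
        0xd1 => (2, true),
        0xd2 => (4, true),
        0xd3 => (8, true),
        other => return Err(LooseIntError::NotAnInteger(other)),
    };
    if rest.len() < width {
        return Err(LooseIntError::UnexpectedEof);
    }
    let (payload, tail) = rest.split_at(width);
    // Payloads are big-endian.
    let mut raw: u64 = 0;
    for &b in payload {
        raw = (raw << 8) | u64::from(b);
    }
    let value = if signed {
        // Sign-extend from `width` bytes to 64 bits.
        let shift = 64 - 8 * width as u32;
        i128::from(((raw << shift) as i64) >> shift)
    } else {
        i128::from(raw)
    };
    *buf = tail;
    Ok(value)
}

/// Accept any integer encoding as long as the value fits in an i32.
/// `buf` is left untouched when an error is returned.
fn read_i32_any(buf: &mut &[u8]) -> Result<i32, LooseIntError> {
    let mut cursor = *buf;
    let wide = read_any_int(&mut cursor)?;
    let value = i32::try_from(wide).map_err(|_| LooseIntError::OutOfRange)?;
    *buf = cursor;
    Ok(value)
}
```

The same pattern works for any target type: only the final `try_from` changes.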

Value & ValueRef

Do not forget about Borrow and ToOwned traits.

Value.
ValueRef.
Borrow trait.
ToOwned trait.

Core string decoding

After implementing high-level string decoding, the core::Error enum can be reviewed so that it can sometimes contain a borrowed buffer.

For example, when failed to decode UTF8 buffer,

Variant encoding/decoding policy.

MessagePack has no specification about how to encode variant types. Thus we are free to do whatever we want.

Every Rust variant value can be represented as a tuple of an index and a value, and we encode/decode them this way by default, but this choice may not be ideal for you.

The first solution I've imagined is to introduce enum policies as a template parameter attached to Encoder and Decoder, with a default value of course.

Enforce symmetric naming for pack/unpack encode/decode functions

For example there is rmp::encode::write_array_len, but rmp::decode::read_array_size (thanks tedsta from reddit).

Readme

Value isn't encodable/decodable

I'm (slowly) working on #40, and I'm running into a problem with representing heterogeneous lists (for arguments). I've resorted to using Vec<Value> as a struct field. However, since Value isn't encodable, it's difficult to serialize it as part of a larger struct.

Documentation bugs

There are some strange titles in documentation, need to fix them.

Encode fields using their names as keys

I wanted to use msgpack-rust as a drop-in replacement for serde-json. As it turns out, rmp-serde is ignoring field names in structs when serializing the data, which is very unfortunate. I produce a lot of data from numerical simulations and analyze them in Python, which is why I would like to be able to deserialize the msgpacked data using proper keys (without the need to write a custom deserialization strategy and keep it updated with changes to my rust structs). Is there a way to have rmp serialize struct fields using their names, i.e. much more akin to what serde-json does? For example, the code sample at the bottom of this post is deserialized as:

{'field_2': [1, [[[1.0, 1.0], [3.0, 3.0]]]],
 'field_1': [0, [[[2, 2], [4, 4]]]],
 'field_3': [2, [[[0, [[[2, 2], [4, 4]]]], [1, [[[1.0, 1.0], [3.0, 3.0]]]]]]]}

where I was expecting something closer to (i.e. not necessarily taking into account the name of the top-level struct):

{'field_2': [('cs',[1.0, 1.0]), ('ds',[3.0, 3.0])],
 'field_1': [('xs',[2, 2]), ('ys',[4, 4])],
 'field_3': [('foo', [('xs',[2, 2]), ('ys',[4, 4])]), ('bar', [('cs',[1.0, 1.0]), ('ds',[3.0, 3.0])])]}

Notice too, that the serde(rename="name") is being ignored.

use std::collections::BTreeMap;
use std::fs::File;
use std::io::BufWriter;

use msgpack::Serializer;
use serde::Serialize;

#[derive(Serialize)]
enum Custom {
    Foo(Foo),
    Bar(Bar),
    FooBar(Vec<Custom>),
}

#[derive(Serialize)]
struct Foo {
    #[serde(rename="a")]
    xs: Vec<u64>,
    #[serde(rename="b")]
    ys: Vec<u64>,
}

#[derive(Serialize)]
struct Bar {
    cs: Vec<f64>,
    ds: Vec<f64>,
}

fn main() {
    let foo = Foo { xs: vec![2;2], ys: vec![4;2] };
    let bar = Bar { cs: vec![1.0;2], ds: vec![3.0;2] };
    let foobar = vec![Custom::Foo(foo), Custom::Bar(bar), ];

    let mut map = BTreeMap::new();

    let foo = Foo { xs: vec![2;2], ys: vec![4;2] };
    let bar = Bar { cs: vec![1.0;2], ds: vec![3.0;2] };
    map.insert("field_1".to_owned(), Custom::Foo(foo));
    map.insert("field_2".to_owned(), Custom::Bar(bar));
    map.insert("field_3".to_owned(), Custom::FooBar(foobar));

    let f = File::create("test.txt").unwrap();
    let mut writer = BufWriter::new(f);

    map.serialize(&mut Serializer::new(&mut writer)).unwrap()
}

Renaming

There are some renaming TODOs, most of them associated with internal stuff. I need to check them and decide what to do.

Encoding/decoding of 0i32 fails

extern crate rmp as msgpack;
extern crate rustc_serialize;

use rustc_serialize::{Encodable, Decodable};
use msgpack::{Encoder, Decoder};

fn main() {
    let val = 0i32;
    let mut buf = Vec::new();
    val.encode(&mut Encoder::new(&mut buf)).unwrap();
    let mut decoder = Decoder::new(&buf[..]);
    let res: i32 = Decodable::decode(&mut decoder).unwrap();
    assert_eq!(val, res);
}

fails with

thread '<main>' panicked at 'called `Result::unwrap()` on an `Err` value: TypeMismatch(PositiveFixnum(0))', /home/rustbuild/src/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libcore/result.rs:729

Change val to any non-zero value and the program runs without errors. Tested on both version 0.4 and 0.5.

Using the msgpack bin format family?

Is it possible to use the bin format family for encoding and decoding? When I try to encode [u8], I get arrays of ints and when I try to decode binary data I get errors. I see bin support in the rmp sources.

Decoding string

Handle all potential cases including errors while decoding strings:

insufficient bytes.
handle exactly len bytes, if the buffer provided is larger, when copying.
any UTF8 errors.
too small buffer to copy result, when copying.
~~return [u8] with len actually read, when copying.~~
return &str instead of &[u8], when no-copy.
~~return &[u8] read when failed to decode, when no-copy.~~

Modules overhaul

Some files are now too large (>2k lines of code), need to split them into submodules.

Also consider putting both serialization modules behind feature gates, because they require too many dependencies.

Error revolution

It's time to stabilize what error handling in RMP will look like.

Consider decoding operations, for example. There are two main ways to split up error categories, each with its pros and cons.

One large error enum for all functions.
- Pros: easy to implement; no conversions between levels of abstraction; Unix way.
- Cons: strong documentation support is required; users are forced to mark unreachable branches with unreachable!.
Several error enums, one for each function and level of abstraction.
- Pros: each error value is self-documenting; no unreachable! branches; hierarchical separation between levels of abstraction.
- Cons: tricky to implement the right way; a significant amount of code is required (for example for the From and Display traits); possible performance impact during chained error conversion.

Personally I like the second way: this is how error handling looks now and it will probably stay the same. But maybe someone sees a terrible mistake in this approach, so I'm open to discussion.
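To make the second option concrete, here is a small std-only sketch of per-function error enums linked by From. All names (`MarkerReadError`, `ReadU8Error`, `read_marker`, `read_u8`) are invented for the illustration and are not claimed to match RMP's actual error types.

```rust
use std::fmt;

/// Everything that can go wrong when reading just the marker byte.
#[derive(Debug, PartialEq)]
enum MarkerReadError {
    UnexpectedEof,
}

/// Everything that can go wrong when reading a `uint 8` value.
/// The lower-level error is wrapped, not flattened.
#[derive(Debug, PartialEq)]
enum ReadU8Error {
    InvalidMarkerRead(MarkerReadError),
    InvalidDataRead,
    TypeMismatch(u8),
}

/// Lets `?` lift a marker error into the higher-level enum.
impl From<MarkerReadError> for ReadU8Error {
    fn from(e: MarkerReadError) -> Self {
        ReadU8Error::InvalidMarkerRead(e)
    }
}

impl fmt::Display for ReadU8Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ReadU8Error::InvalidMarkerRead(_) => f.write_str("failed to read marker"),
            ReadU8Error::InvalidDataRead => f.write_str("failed to read payload"),
            ReadU8Error::TypeMismatch(m) => write!(f, "unexpected marker 0x{:02x}", m),
        }
    }
}

fn read_marker(buf: &mut &[u8]) -> Result<u8, MarkerReadError> {
    let data: &[u8] = *buf;
    let (&marker, rest) = data.split_first().ok_or(MarkerReadError::UnexpectedEof)?;
    *buf = rest;
    Ok(marker)
}

fn read_u8(buf: &mut &[u8]) -> Result<u8, ReadU8Error> {
    let marker = read_marker(buf)?; // converted through `From`
    if marker != 0xcc {
        return Err(ReadU8Error::TypeMismatch(marker));
    }
    let data: &[u8] = *buf;
    let (&value, rest) = data.split_first().ok_or(ReadU8Error::InvalidDataRead)?;
    *buf = rest;
    Ok(value)
}
```

A caller matching on `ReadU8Error` sees only the three cases this function can actually produce, which is the self-documenting property listed above; the cost is the extra From and Display boilerplate.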

encode vec<u8> as messagepack bin-family in rustc

Using rustc, is there a way to encode Vec<u8> as a bin data object instead of an array of ints? The latter takes up nearly twice as much space. I cannot just wrap Vec<u8> and implement Encodable for that struct manually, since the Encoder doesn't allow calls to write_bin or access to the underlying writer.
Is there a way to achieve the same with serde?
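The size difference can be checked with a small std-only sketch that hand-encodes the same bytes both ways. The marker bytes come from the MessagePack specification; the two functions are hypothetical helpers written for this comparison, not rmp APIs.

```rust
use std::convert::TryFrom;

/// Encode bytes as an array of integers, one element per byte.
/// Bytes >= 0x80 need a `uint 8` marker (0xcc) in front, doubling their size.
fn encode_as_int_array(data: &[u8]) -> Vec<u8> {
    let len = u32::try_from(data.len()).expect("length must fit in 32 bits");
    let mut out = Vec::with_capacity(data.len() * 2 + 5);
    if len < 16 {
        out.push(0x90 | len as u8); // fixarray
    } else if len <= 0xffff {
        out.push(0xdc); // array 16
        out.extend_from_slice(&(len as u16).to_be_bytes());
    } else {
        out.push(0xdd); // array 32
        out.extend_from_slice(&len.to_be_bytes());
    }
    for &b in data {
        if b >= 0x80 {
            out.push(0xcc);
        }
        out.push(b);
    }
    out
}

/// Encode bytes with the bin family: a length header, then the raw bytes.
fn encode_as_bin(data: &[u8]) -> Vec<u8> {
    let len = u32::try_from(data.len()).expect("length must fit in 32 bits");
    let mut out = Vec::with_capacity(data.len() + 5);
    if len <= 0xff {
        out.push(0xc4); // bin 8
        out.push(len as u8);
    } else if len <= 0xffff {
        out.push(0xc5); // bin 16
        out.extend_from_slice(&(len as u16).to_be_bytes());
    } else {
        out.push(0xc6); // bin 32
        out.extend_from_slice(&len.to_be_bytes());
    }
    out.extend_from_slice(data);
    out
}
```

For 100 bytes of `0xff`, the array form takes 203 bytes and the bin form 102, which matches the "nearly twice as much space" observation for data with many high bytes.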

Fix RFC-1214 associated warnings.

See rust-lang/rfcs#1214 for more details.

RPC

Investigate https://github.com/msgpack-rpc/msgpack-rpc.

Seems like it's not so hard to implement.

read_str_ref is useless?

Maybe I misunderstand the purpose of read_str_ref, but the function seems to have two problems:

It returns &[u8] on success, not &str.
It doesn't give any indication on success how many bytes it read from the input buffer. How does the caller know how to extract the next message-packed field in the input buffer following the string it just read?

The second problem is what makes read_str_ref useless (as I understand it). Suppose the following snippet:

let r: &[u8]  = { /* generate input here */ };

// Read a string from the input.
let x = rmp::decode::read_str_ref(r).unwrap();
println!("{:?}", x);

// How to read next field? The `r` slice still points to the extracted string,
// not the next field in the input

Here's the current signature for read_str_ref.

fn read_str_ref(rd: &[u8]) -> Result<&[u8], DecodeStringError>

How about something like this instead?

fn read_str_ref(rd: &[u8]) -> Result<(&str, &[u8]), DecodeStringError>

On success, the tuple (&str, &[u8]) pairs the string just read along with a slice to the beginning of the input immediately following the string. This would allow the caller to continue reading from the buffer.

let r: &[u8]  = { /* generate input here */ };

// Read a string from the input.
let (x, mut r) = rmp::decode::read_str_ref(r).unwrap();
println!("{:?}", x);

// Read next field from the input.
let x = rmp::decode::read_u32_loosely(&mut r).unwrap();
println!("{:?}", x);
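For reference, a self-contained sketch of a function with the proposed shape. It is not the crate's implementation: it covers only the fixstr and str 8 formats, and the error type `StrRefError` is a hypothetical stand-in for DecodeStringError.

```rust
use std::str;

#[derive(Debug, PartialEq)]
enum StrRefError {
    UnexpectedEof,
    TypeMismatch(u8),
    InvalidUtf8,
}

/// Borrow a string out of `input` without copying and also return the
/// unread remainder, so the caller can keep decoding from there.
/// Only fixstr (0xa0..=0xbf) and str 8 (0xd9) are handled in this sketch.
fn read_str_ref(input: &[u8]) -> Result<(&str, &[u8]), StrRefError> {
    let (&marker, rest) = input.split_first().ok_or(StrRefError::UnexpectedEof)?;
    let (len, rest) = match marker {
        // fixstr: length lives in the low five bits of the marker
        0xa0..=0xbf => ((marker & 0x1f) as usize, rest),
        // str 8: one length byte follows the marker
        0xd9 => {
            let (&n, rest) = rest.split_first().ok_or(StrRefError::UnexpectedEof)?;
            (n as usize, rest)
        }
        other => return Err(StrRefError::TypeMismatch(other)),
    };
    if rest.len() < len {
        return Err(StrRefError::UnexpectedEof);
    }
    let (data, tail) = rest.split_at(len);
    let s = str::from_utf8(data).map_err(|_| StrRefError::InvalidUtf8)?;
    Ok((s, tail))
}
```

Both returned slices borrow from the input, so no allocation or copy takes place.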

docs for decoding needed

Hello,

Currently the readme shows how to encode some simple data and a struct.
It would also be nice to see some examples for decoding.

Regards,
Godfryd

Cargo tags

To ease searching the most appropriate crate.

Low level integer decoding

The following low-level decoding functions need to be reviewed and documented:

Extend Encoder/Decoder interface

Allowing them to provide the underlying writer/reader.

Consider merging rmp, rmp-serde, and rmp-serialize using cargo features.

Having to wrap Value in rmp_serde::Value to get Serialize adds complexity to the implementation and puts additional weight on the user.

It might be best to use exclusive cargo features to pick which serialization implementation you want to use.

Unbounded buffers

I'd like to serialize structs with String fields of arbitrary length with this library, is it possible?
I'm not sure how I can create a buffer array for that.

Update doc links for crates.io

read_i/uXX_loosely should be looser :)

They should accept any integer input that matches the target type.

For example read_i64_loosely, should accept [1]. It should even accept an unsigned 64-bit integer as long as the value is in the range of 64-bit signed integers.

msgpack is sometimes used with dynamic languages, that will often use whatever encoding is most efficient without regard to an unknown target type in a statically-typed language.

(In my case, messages are being generated by Python.)

Signal safety

Some low- and high-level functions are signal unsafe. That means that if the given Read or Write returns an Err with EINTR, there is no way to continue decoding/encoding properly.

Investigate which functions and traits are signal unsafe and mark them in the documentation.
Implement signal-safe alternatives if possible.
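A sketch of what a signal-safe read loop could look like, assuming EINTR surfaces as `std::io::ErrorKind::Interrupted` (which is how the standard library reports it). `read_exact_retrying` and the `Flaky` test reader are hypothetical names for this example. Note that the standard library's default `Read::read_exact` already ignores Interrupted in the same way.

```rust
use std::io::{self, ErrorKind, Read};

/// Fill `buf` completely, retrying whenever the reader reports
/// `ErrorKind::Interrupted` instead of giving up.
fn read_exact_retrying<R: Read>(rd: &mut R, mut buf: &mut [u8]) -> io::Result<()> {
    while !buf.is_empty() {
        match rd.read(buf) {
            Ok(0) => {
                return Err(io::Error::new(ErrorKind::UnexpectedEof, "stream ended early"));
            }
            Ok(n) => {
                // Advance past the bytes that were just filled in.
                let tmp = buf;
                buf = &mut tmp[n..];
            }
            Err(ref e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(())
}

/// Test reader that yields one byte per successful call and fails with
/// `Interrupted` on every other call.
struct Flaky<'a> {
    data: &'a [u8],
    interrupt_next: bool,
}

impl<'a> Read for Flaky<'a> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if self.interrupt_next {
            self.interrupt_next = false;
            return Err(io::Error::new(ErrorKind::Interrupted, "EINTR"));
        }
        self.interrupt_next = true;
        let data = self.data;
        match (data.split_first(), buf.first_mut()) {
            (Some((&byte, rest)), Some(slot)) => {
                *slot = byte;
                self.data = rest;
                Ok(1)
            }
            _ => Ok(0),
        }
    }
}
```

The key property is that no bytes already copied into `buf` are lost when an interruption occurs, so decoding can resume where it stopped.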

When num cast fails

For example, suppose we've received a string (whose size strictly fits in u32, see the MessagePack protocol), read its size, and found that the size does not fit in usize (on 8- or 16-bit systems). The default behavior involves the as operator, which silently truncates on overflow instead of reporting an error. This is not what we want, of course.

We could simply use the num crate with its FromPrimitive trait, which handles all the number cast magic internally, but I can't figure out what I should do when this kind of error occurs.

Okay, the 2 main questions are:

How to deal with that case?
How to test this on x86_64?
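One way to approach both questions, sketched with the standard `TryFrom` trait rather than the num crate: make the conversion generic over the target type, return an error on overflow, and in tests instantiate it with `u16` or `u8` to emulate a narrow `usize` on x86_64. The names `checked_len` and `LengthOverflow` are hypothetical.

```rust
use std::convert::TryFrom;

/// Error carrying the length that did not fit in the target type.
#[derive(Debug, PartialEq)]
struct LengthOverflow(u32);

/// Convert a decoded 32-bit length into the size type `T` without `as`,
/// reporting an error instead of truncating.
/// Production code would use `T = usize`; tests can pick a narrower type.
fn checked_len<T: TryFrom<u32>>(len: u32) -> Result<T, LengthOverflow> {
    T::try_from(len).map_err(|_| LengthOverflow(len))
}
```

For comparison, `65_536u32 as u16` evaluates to `0`, whereas `checked_len::<u16>(65_536)` returns `Err(LengthOverflow(65_536))`, which the decoder can then surface as a regular decoding error.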

Compilation error: no method 'join' for Vec

Hi,

I tried using your package through vim-markdown-composer, and when compiling all the Rust packages, rmp fails and gives me this message:

rmp-0.7.3/src/value.rs:58:22: 58:32 error: no method named 'join' found for type 'collections::vec::Vec<collections::string::String>' in the current scope

Support new serde map/seq serialization API

Release 0.8.0 contains significant breaking changes, see release notes for details.

Streaming interface?

Hi,

I'm trying to figure out if and how to feed deserializer chunks of data (eg. from a socket). In other serde-based libraries (eg. bincode) I can find a Reader and Writer support: http://tyoverby.com/bincode/bincode/rustc_serialize/struct.DecoderReader.html

How to do that with msgpack-rust?

Stack safety

The current implementation has data-dependent stack depth which can be exploited to crash the program if a recursive data structure is deserialized and the attacker controls the serialized data.

struct Entry {
  number: u64,
  subentries: Vec<Self>
}

An attacker could create a binary representation of a thousand-fold nested entry and thus crash the program as it exceeds the stack limit. Thanks to msgpack, such a message of death would be only a few thousand bytes in size.

To fix this, the implementation should count and limit the recursion depth (a default limit of 1000 should not break any non-malicious cases). I could implement this if you agree with my approach.
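A minimal sketch of the proposed depth counting, written against a tiny subset of the format (nil, positive fixint and fixarray) so that it stays self-contained. `skip_value`, `validate`, `DecodeError` and the `MAX_DEPTH` constant are hypothetical names for this example, not part of the crate.

```rust
#[derive(Debug, PartialEq)]
enum DecodeError {
    UnexpectedEof,
    Unsupported(u8),
    DepthLimitExceeded,
}

/// Default nesting budget, as suggested above.
const MAX_DEPTH: usize = 1000;

/// Walk over one value, spending one unit of `depth_left` per nested array.
/// Only nil (0xc0), positive fixint and fixarray (0x90..=0x9f) are handled.
fn skip_value(buf: &mut &[u8], depth_left: usize) -> Result<(), DecodeError> {
    let data: &[u8] = *buf;
    let (&marker, rest) = data.split_first().ok_or(DecodeError::UnexpectedEof)?;
    *buf = rest;
    match marker {
        0x00..=0x7f | 0xc0 => Ok(()),
        0x90..=0x9f => {
            // Refuse to recurse once the budget is used up.
            let depth_left = depth_left
                .checked_sub(1)
                .ok_or(DecodeError::DepthLimitExceeded)?;
            for _ in 0..(marker & 0x0f) {
                skip_value(buf, depth_left)?;
            }
            Ok(())
        }
        other => Err(DecodeError::Unsupported(other)),
    }
}

/// Check a whole message against the default depth limit.
fn validate(input: &[u8]) -> Result<(), DecodeError> {
    let mut cursor = input;
    skip_value(&mut cursor, MAX_DEPTH)
}
```

With this check, a message made of thousands of `0x91` bytes (single-element arrays nested inside each other) is rejected after 1000 levels, so the recursion depth stays bounded regardless of the input.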

Seeing error when trying to serialize an enum with rmp-serde

I am trying to serialize an enum with rmp-serde and I am running into an error:

`Err` value: DeserializeError(Syntax("syntax error: expected an enum"))`

which seems to come from serde's default implementation of visit_enum.

I see that you have tests surrounding serializing enums, which is almost exactly what I am trying to do. I am wondering if there is something that I am missing here. Taking a look at your Travis builds, it does not seem like those tests are being run there.

I already have serde_macros enabled and have custom_derive and plugin features, not sure what else I need.

Any help would be much appreciated!

problems with encoding to Vec<u8>

Hello,
I tried to encode to a growable buffer (not a fixed-length one like the one used in the example).
I changed the example to:

extern crate rmp as msgpack;
extern crate rustc_serialize;
use rustc_serialize::Encodable;
use msgpack::Encoder;
fn main() {
    let val = (42u8, "the Answer");
    //let mut buf = [0u8; 23];
    let mut buf: Vec<u8> = Vec::new();
    val.encode(&mut Encoder::new(&mut &mut buf[..])).unwrap();
}

It panics on val.encode:

thread '<main>' panicked at 'called `Result::unwrap()` on an `Err` value: Unimplemented', src/libcore/result.rs:731
stack backtrace:
   1:     0x7fe7d8338e8e - sys::backtrace::write::hd71fc66fa1c984beLqs
   2:     0x7fe7d833be35 - panicking::on_panic::hc1f1f7a7d3f1cd2al8w
   3:     0x7fe7d83352ae - rt::unwind::begin_unwind_inner::h26926b64783ab69bZNw
   4:     0x7fe7d83354fc - rt::unwind::begin_unwind_fmt::ha28396e9e1ce1f5e5Mw
   5:     0x7fe7d833b826 - rust_begin_unwind
   6:     0x7fe7d8365414 - panicking::panic_fmt::h44955cbbc9e564ec19B
   7:     0x7fe7d8328580 - result::Result<T, E>::unwrap::h3504239604260474918
                        at src/libcore/macros.rs:28
   8:     0x7fe7d832804d - main::habc3974e69a85df6iaa
                        at src/lib.rs:14
   9:     0x7fe7d83402a8 - rust_try_inner
  10:     0x7fe7d8340295 - rust_try
  11:     0x7fe7d833d968 - rt::lang_start::h9a2154723c054292i3w
  12:     0x7fe7d832eb9b - main
  13:     0x7fe7d7512a3f - __libc_start_main
  14:     0x7fe7d8327e28 - _start
  15:                0x0 - <unknown>

Coverage

Involves:

Introduce code coverage measuring and uploading it to https://coveralls.io
Notifications for every PR showing coverage changes.
Badge.

Deserialize strings and bytes without intermediate buffer with serde deserializer

There is a buf field in Deserializer which is used for deserializing strings and bytes: the functions read_str_data and read_full take references to this buffer and to the Deserializer's R: Read, read some bytes from R and write them to the buffer.

This looks like unnecessary overhead if R is &[u8], because it's possible to use the bytes directly from the &[u8].

I think it would be nice to have two different options:

deserialize from any generic R: Read, where we need an intermediate buffer,
deserialize from a slice, without an intermediate buffer.

Display trait for Value and ValueRef

Something like this:

[0, 1, ["nested"], {"key": true}, 3.1415, b"binary", nil]
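A sketch of a Display implementation that produces this kind of output. The `Value` enum here is a simplified, hypothetical stand-in (maps are stored as a vector of pairs), not the crate's actual type.

```rust
use std::fmt;

/// Hypothetical, simplified value type used only for this illustration.
enum Value {
    Nil,
    Boolean(bool),
    Integer(i64),
    F64(f64),
    String(String),
    Binary(Vec<u8>),
    Array(Vec<Value>),
    Map(Vec<(Value, Value)>),
}

impl fmt::Display for Value {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Value::Nil => f.write_str("nil"),
            Value::Boolean(b) => write!(f, "{}", b),
            Value::Integer(n) => write!(f, "{}", n),
            Value::F64(x) => write!(f, "{}", x),
            // Debug formatting adds the quotes and escapes.
            Value::String(s) => write!(f, "{:?}", s),
            Value::Binary(bytes) => {
                f.write_str("b\"")?;
                for &b in bytes {
                    // Printable ASCII stays as-is, the rest becomes \xNN.
                    write!(f, "{}", std::ascii::escape_default(b))?;
                }
                f.write_str("\"")
            }
            Value::Array(items) => {
                f.write_str("[")?;
                for (i, item) in items.iter().enumerate() {
                    if i > 0 {
                        f.write_str(", ")?;
                    }
                    write!(f, "{}", item)?;
                }
                f.write_str("]")
            }
            Value::Map(pairs) => {
                f.write_str("{")?;
                for (i, (k, v)) in pairs.iter().enumerate() {
                    if i > 0 {
                        f.write_str(", ")?;
                    }
                    write!(f, "{}: {}", k, v)?;
                }
                f.write_str("}")
            }
        }
    }
}
```

A borrowed ValueRef could share the same formatting logic, since every arm only needs read access to the data.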

Decoding arbitrary messages conveniently

Hi, I'm struggling with decoding arbitrary messages with msgpack-rust.
In the examples in this repo it's known ahead of time what "type" the decoded message will have, but what if messages can vary in structure?

Here's an example scenario.

A server needs to send several events to the client:

// Using pseudocode here
{ name: "event1", a: 12, b: ["text", "some more"] }
{ name: "event2", c: { d: "yet again" } }

These two events have different "types" and can't be decoded into same struct.

How do I decode a group of different messages properly?

Update readme

Make table with badges for each subcrate.
Documentation with examples for rmp.
Documentation with examples for rmp-serialize.
Documentation with examples for rmp-serde.
Documentation links to sub crates.

String decoding

The following functions are required:

exact buffer (return String);
insufficient bytes (return all bytes read with Error);
extra bytes (just don't read them, return String);
invalid UTF8 (return invalid buffer with Error);