
msgpack-rust's Introduction

RMP - Rust MessagePack

RMP is a complete pure-Rust MessagePack implementation. MessagePack is a compact, self-describing binary serialization format.

This project consists of three crates:

Features

  • Convenient and powerful APIs

    RMP is designed to be lightweight and straightforward. The high-level API integrates with Serde and provides a convenient interface for encoding and decoding Rust data structures via the derive attribute. The low-level APIs give you full control over the encoding/decoding process, with no-std support and without heap allocations.

  • Zero-copy value decoding

    RMP can decode values from a buffer in a zero-copy manner, without copying the underlying bytes. Parsing is implemented in safe Rust.

  • Robust, stable and tested

    This project is developed using TDD and CI, so bugs can be fixed without breaking existing functionality.

Why MessagePack?

It's smaller and much simpler to parse than JSON. The encoded data is self-describing and extensible, without using any schema definitions. It supports the same data types as JSON, plus binary data, non-string map keys, all float values, and 64-bit numbers. MessagePack values use <length><data> encoding, so they can be safely concatenated and read from a stream.

MessagePack is similar to CBOR, but has simpler data types (no bignums, decimal floats, dates, or indefinite-length sets, etc.)
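The following hand-rolled sketch illustrates the <length><data> layout. It is not RMP's API. It writes two MessagePack strings back to back and reads them out of the same byte stream. It assumes strings of at most 255 bytes, so only the fixstr (0xa0 to 0xbf) and str8 (0xd9) formats are handled. `write_str` and `read_str` are hypothetical names.

```rust
use std::io::{self, Read};

/// Append `s` as a MessagePack string: fixstr for up to 31 bytes, str8 up to 255.
fn write_str(out: &mut Vec<u8>, s: &str) {
    let len = s.len();
    if len < 32 {
        out.push(0xa0 | len as u8); // fixstr: 101xxxxx, length in the low 5 bits
    } else if len <= u8::MAX as usize {
        out.push(0xd9); // str8 marker, followed by a 1-byte length
        out.push(len as u8);
    } else {
        unimplemented!("str16/str32 are omitted in this sketch");
    }
    out.extend_from_slice(s.as_bytes());
}

/// Read one string from any reader. The length prefix says exactly how many
/// bytes belong to this value, so the next value starts right after them.
fn read_str<R: Read>(rd: &mut R) -> io::Result<String> {
    let mut marker = [0u8; 1];
    rd.read_exact(&mut marker)?;
    let len = match marker[0] {
        m @ 0xa0..=0xbf => (m & 0x1f) as usize,
        0xd9 => {
            let mut n = [0u8; 1];
            rd.read_exact(&mut n)?;
            n[0] as usize
        }
        m => {
            let msg = format!("not a string marker: {:#04x}", m);
            return Err(io::Error::new(io::ErrorKind::InvalidData, msg));
        }
    };
    let mut data = vec![0u8; len];
    rd.read_exact(&mut data)?;
    String::from_utf8(data).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

std implements `Read` for `&[u8]` by advancing the slice past the consumed bytes. So with `let mut stream = &buf[..];`, calling `read_str(&mut stream)` twice yields both strings in order, and no delimiter is needed between them.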

Requirements

  • An up-to-date stable version of Rust, preferably from rustup.


msgpack-rust's People

Contributors

3hren, byron, cameronism, daboross, danielyule, dbrgn, eclipseo, erickt, euclio, joshtriplett, killthemule, knsd, kornelski, lucretiel, marhkb, petrochenkov, rich-murphey, rolag, rua, ryansname, sergiobenitez, suhr, sx91, techcable, termoshtt, unrealhoang, vedantroy, vext01, vmx, warmongeringbeaver

msgpack-rust's Issues

Integration with serde

From serde-rs/serde#96.

I see it as an optional feature replacing rustc_serialize (or maybe we should keep both?).

  • Investigate how to cook serde (compile some examples, see how json serialization/deserialization is done etc.)
  • Implement a decoder.
  • Implement an encoder.

Merge `high` module into `core`

Currently it lives in the wrong place. Moreover, it will only contain a single encoding/decoding function, maybe some traits, but not much.

Raw Binary support

Is Raw Binary supported at the moment? I believe Vec<u8> is serialized as an array of numbers instead of a Raw String. Is there a wrapper that can be used?

Module decomposition

I need approximately the following modules:

  • lib.rs (common types)
  • core (low-level encoding/decoding without heap allocations)
    • mod.rs (error and result types)
    • decode.rs
    • encode.rs
  • high
    • mod.rs (error and result types)
    • decode.rs
    • encode.rs

Test decomposition

Historically, we have had a single large file containing all the code along with its tests.

Implement From trait for Value

It would be much easier to work with values if you could just use primitive data types and have them converted automatically by the From trait.
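Here is a minimal sketch of what such conversions could look like. It uses a cut-down, hypothetical `Value` enum rather than RMP's actual type, and the set of variants is an assumption made for illustration.

```rust
/// A cut-down, hypothetical dynamic value type (not RMP's actual `Value`).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Nil,
    Boolean(bool),
    Integer(i64),
    F64(f64),
    String(String),
    Array(Vec<Value>),
}

impl From<bool> for Value {
    fn from(v: bool) -> Self { Value::Boolean(v) }
}
impl From<i32> for Value {
    fn from(v: i32) -> Self { Value::Integer(i64::from(v)) }
}
impl From<i64> for Value {
    fn from(v: i64) -> Self { Value::Integer(v) }
}
impl From<f64> for Value {
    fn from(v: f64) -> Self { Value::F64(v) }
}
impl<'a> From<&'a str> for Value {
    fn from(v: &'a str) -> Self { Value::String(v.to_owned()) }
}
impl From<String> for Value {
    fn from(v: String) -> Self { Value::String(v) }
}
// Sequences convert element by element, so nested data works too.
impl<T: Into<Value>> From<Vec<T>> for Value {
    fn from(v: Vec<T>) -> Self { Value::Array(v.into_iter().map(Into::into).collect()) }
}
// `None` becomes Nil; `Some(x)` converts `x`.
impl<T: Into<Value>> From<Option<T>> for Value {
    fn from(v: Option<T>) -> Self { v.map_or(Value::Nil, Into::into) }
}
```

The generic impls for `Vec<T>` and `Option<T>` let nested data convert in one call. For example, `let v: Value = vec![1i32, 2, 3].into();` produces an `Array` of three `Integer`s.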

How to read i32, if I don't know whether it is signed?

hey,

Some language interfaces for msgpack decide dynamically whether they use signed or unsigned integers. Specifically, I am working with Java. Java does not have unsigned types, so if I pack an int (32 bits), the msgpack Java implementation chooses the most compact encoding. If the value is small enough and positive, it may well be encoded as FixPos, U8 or U16.

read_i32_loosely handles FixNeg, I8, I16 and I32, but no unsigned types. read_u32_loosely handles all unsigned types, but not the signed types. So if I know that the next value in a buffer fits into an i32, but it may be decoded as U16 (or FixPos), how do I read it?
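One way to handle this is to decode every integer format into a wider type first, and only then check that the value fits the requested type. That covers positive/negative fixint, uint8 to uint64 and int8 to int64. The sketch below does this by hand against a byte slice. `read_any_int`, `read_i32_any` and `IntError` are hypothetical names, not functions from rmp.

```rust
use std::convert::{TryFrom, TryInto};

#[derive(Debug, PartialEq)]
enum IntError {
    Eof,          // the buffer ended before the value did
    NotAnInt(u8), // the marker byte is not an integer format
    OutOfRange,   // a valid integer that does not fit the requested type
}

// Split off `n` payload bytes, or report that the buffer is too short.
fn take(rest: &[u8], n: usize) -> Result<(&[u8], &[u8]), IntError> {
    if rest.len() < n {
        Err(IntError::Eof)
    } else {
        Ok(rest.split_at(n))
    }
}

/// Decode any MessagePack integer format into an i128, which can represent
/// every u64 and every i64. Returns the value and the unread remainder.
fn read_any_int(buf: &[u8]) -> Result<(i128, &[u8]), IntError> {
    let (&marker, rest) = buf.split_first().ok_or(IntError::Eof)?;
    let n = match marker {
        0x00..=0x7f => return Ok((i128::from(marker), rest)), // positive fixint
        0xe0..=0xff => return Ok((i128::from(marker as i8), rest)), // negative fixint
        0xcc | 0xd0 => 1, // uint8 / int8
        0xcd | 0xd1 => 2, // uint16 / int16
        0xce | 0xd2 => 4, // uint32 / int32
        0xcf | 0xd3 => 8, // uint64 / int64
        other => return Err(IntError::NotAnInt(other)),
    };
    let (b, tail) = take(rest, n)?;
    let value = match marker {
        0xcc => i128::from(b[0]),
        0xcd => i128::from(u16::from_be_bytes(b.try_into().unwrap())),
        0xce => i128::from(u32::from_be_bytes(b.try_into().unwrap())),
        0xcf => i128::from(u64::from_be_bytes(b.try_into().unwrap())),
        0xd0 => i128::from(b[0] as i8),
        0xd1 => i128::from(i16::from_be_bytes(b.try_into().unwrap())),
        0xd2 => i128::from(i32::from_be_bytes(b.try_into().unwrap())),
        _ => i128::from(i64::from_be_bytes(b.try_into().unwrap())), // 0xd3
    };
    Ok((value, tail))
}

/// Accept any integer encoding, signed or unsigned, whose value fits in an i32.
fn read_i32_any(buf: &[u8]) -> Result<(i32, &[u8]), IntError> {
    let (value, tail) = read_any_int(buf)?;
    let value = i32::try_from(value).map_err(|_| IntError::OutOfRange)?;
    Ok((value, tail))
}
```

With this approach, a U16-encoded 300 from Java decodes as `300i32`. A U32 value above `i32::MAX` is rejected with `OutOfRange` instead of being misread.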

Value & ValueRef

Do not forget about Borrow and ToOwned traits.

  • Value.
  • ValueRef.
  • Borrow trait.
  • ToOwned trait.

Core string decoding

After implementing high-level string decoding, the core::Error enum can be reviewed so that it can sometimes contain the borrowed buffer.

For example, when a UTF-8 buffer fails to decode.

Variant encoding/decoding policy.

MessagePack has no specification for how to encode variant types, so we are free to do whatever we want.

Every Rust variant value can be represented as a tuple of an index and a value, and we encode/decode them this way by default, but this choice may not be ideal for you.

The first solution I've come up with is to introduce enum policies as a type parameter attached to Encoder and Decoder, with a default value of course.
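Here is a sketch of the policy idea in Rust terms. The policy is a type parameter on the encoder with a default, so existing code keeps the index-based layout while callers can opt into name-based encoding. The `VariantPolicy` trait, the `ByIndex`/`ByName` policies and this `Encoder` are all hypothetical. Payloads are limited to positive fixints to keep the byte layout readable.

```rust
use std::marker::PhantomData;

/// How an enum variant is laid out on the wire (hypothetical trait).
trait VariantPolicy {
    fn write_variant(out: &mut Vec<u8>, index: u8, name: &str, payload: u8);
}

/// Default policy: a 2-element array `[index, payload]`.
struct ByIndex;
/// Alternative policy: a 1-entry map `{name: payload}`.
struct ByName;

impl VariantPolicy for ByIndex {
    fn write_variant(out: &mut Vec<u8>, index: u8, _name: &str, payload: u8) {
        assert!(index < 0x80 && payload < 0x80, "sketch handles positive fixints only");
        out.extend_from_slice(&[0x92, index, payload]); // fixarray(2), fixint, fixint
    }
}

impl VariantPolicy for ByName {
    fn write_variant(out: &mut Vec<u8>, _index: u8, name: &str, payload: u8) {
        assert!(name.len() < 32 && payload < 0x80, "sketch handles fixstr/fixint only");
        out.push(0x81); // fixmap with one entry
        out.push(0xa0 | name.len() as u8); // fixstr header
        out.extend_from_slice(name.as_bytes());
        out.push(payload); // positive fixint
    }
}

/// Example enum to encode.
enum Shape {
    Circle(u8),
    Square(u8),
}

/// The policy is a type parameter with a default.
struct Encoder<P: VariantPolicy = ByIndex> {
    out: Vec<u8>,
    _policy: PhantomData<P>,
}

impl<P: VariantPolicy> Encoder<P> {
    fn new() -> Self {
        Encoder { out: Vec::new(), _policy: PhantomData }
    }

    fn encode(&mut self, shape: &Shape) {
        match *shape {
            Shape::Circle(r) => P::write_variant(&mut self.out, 0, "Circle", r),
            Shape::Square(s) => P::write_variant(&mut self.out, 1, "Square", s),
        }
    }
}
```

A default type parameter only applies where the type is written out. `let e: Encoder = Encoder::new();` picks `ByIndex`, whereas a bare `Encoder::new()` with nothing to infer `P` from does not compile. This is worth keeping in mind when judging how ergonomic the approach would be.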

Value isn't encodable/decodable

I'm (slowly) working on #40, and I'm running into a problem with representing heterogeneous lists (for arguments). I've resorted to using Vec<Value> as a struct field. However, since Value isn't encodable, it's difficult to serialize it as part of a larger struct.

Encode fields using their names as keys

I wanted to use msgpack-rust as a drop-in replacement for serde-json. As it turns out, rmp-serde is ignoring field names in structs when serializing the data, which is very unfortunate. I produce a lot of data from numerical simulations and analyze them in Python, which is why I would like to be able to deserialize the msgpacked data using proper keys (without the need to write a custom deserialization strategy and keep it updated with changes to my rust structs). Is there a way to have rmp serialize struct fields using their names, i.e. much more akin to what serde-json does? For example, the code sample at the bottom of this post is deserialized as:

{'field_2': [1, [[[1.0, 1.0], [3.0, 3.0]]]],
 'field_1': [0, [[[2, 2], [4, 4]]]],
 'field_3': [2, [[[0, [[[2, 2], [4, 4]]]], [1, [[[1.0, 1.0], [3.0, 3.0]]]]]]]}

where I was expecting something closer to (i.e. not necessarily taking into account the name of the top-level struct):

{'field_2': [('cs',[1.0, 1.0]), ('ds',[3.0, 3.0])],
 'field_1': [('xs',[2, 2]), ('ys',[4, 4])],
 'field_3': [('foo', [('xs',[2, 2]), ('ys',[4, 4])]), ('bar', [('cs',[1.0, 1.0]), ('ds',[3.0, 3.0])])]}

Notice too, that the serde(rename="name") is being ignored.

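extern crate rmp_serde as msgpack;

use std::collections::BTreeMap;
use std::fs::File;
use std::io::BufWriter;

use msgpack::Serializer;
use serde::Serialize; // `.serialize()`; `#[derive(Serialize)]` also needs serde's "derive" feature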

#[derive(Serialize)]
enum Custom {
    Foo(Foo),
    Bar(Bar),
    FooBar(Vec<Custom>),
}

#[derive(Serialize)]
struct Foo {
    #[serde(rename="a")]
    xs: Vec<u64>,
    #[serde(rename="b")]
    ys: Vec<u64>,
}

#[derive(Serialize)]
struct Bar {
    cs: Vec<f64>,
    ds: Vec<f64>,
}

fn main() {
    let foo = Foo { xs: vec![2;2], ys: vec![4;2] };
    let bar = Bar { cs: vec![1.0;2], ds: vec![3.0;2] };
    let foobar = vec![Custom::Foo(foo), Custom::Bar(bar), ];

    let mut map = BTreeMap::new();

    let foo = Foo { xs: vec![2;2], ys: vec![4;2] };
    let bar = Bar { cs: vec![1.0;2], ds: vec![3.0;2] };
    map.insert("field_1".to_owned(), Custom::Foo(foo));
    map.insert("field_2".to_owned(), Custom::Bar(bar));
    map.insert("field_3".to_owned(), Custom::FooBar(foobar));

    let f = File::create("test.txt").unwrap();
    let mut writer = BufWriter::new(f);

    map.serialize(&mut Serializer::new(&mut writer)).unwrap()
}
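The difference being asked about here is between encoding a struct as a positional array and encoding it as a map keyed by field name. The hand-written sketch below shows both byte layouts for a hypothetical two-field `Point` struct; it does not use rmp-serde. Recent rmp-serde releases provide a named-field mode for this, e.g. `rmp_serde::to_vec_named`. Check the documentation of the version you use.

```rust
/// A hypothetical two-field struct used only for this illustration.
struct Point {
    x: u8,
    y: u8,
}

// Write a fixstr (up to 31 bytes): header 0xa0 | len, then the UTF-8 bytes.
fn push_fixstr(out: &mut Vec<u8>, s: &str) {
    assert!(s.len() < 32);
    out.push(0xa0 | s.len() as u8);
    out.extend_from_slice(s.as_bytes());
}

/// Positional layout: fixarray(2) followed by the values; field names are lost.
fn encode_as_array(p: &Point) -> Vec<u8> {
    assert!(p.x < 0x80 && p.y < 0x80, "positive fixints only");
    vec![0x92, p.x, p.y]
}

/// Named layout: fixmap(2) of field name -> value, readable without the Rust type.
fn encode_as_map(p: &Point) -> Vec<u8> {
    assert!(p.x < 0x80 && p.y < 0x80, "positive fixints only");
    let mut out = vec![0x82];
    push_fixstr(&mut out, "x");
    out.push(p.x);
    push_fixstr(&mut out, "y");
    out.push(p.y);
    out
}
```

The map form costs a few extra bytes per field. In exchange, a Python reader sees `{'x': 1, 'y': 2}` instead of `[1, 2]`.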

Renaming

There are some renaming TODO's, most of them associated with internal stuff. I need to check them and to decide what to do.

Encoding/decoding of 0i32 fails

extern crate rmp as msgpack;
extern crate rustc_serialize;

use rustc_serialize::{Encodable, Decodable};
use msgpack::{Encoder, Decoder};

fn main() {
    let val = 0i32;
    let mut buf = Vec::new();
    val.encode(&mut Encoder::new(&mut buf)).unwrap();
    let mut decoder = Decoder::new(&buf[..]);
    let res: i32 = Decodable::decode(&mut decoder).unwrap();
    assert_eq!(val, res);
}

fails with

thread '<main>' panicked at 'called `Result::unwrap()` on an `Err` value: TypeMismatch(PositiveFixnum(0))', /home/rustbuild/src/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libcore/result.rs:729

Change val to any non-zero value and the program runs without errors. Tested on both version 0.4 and 0.5.

Using the msgpack bin format family?

Is it possible to use the bin format family for encoding and decoding? When I try to encode [u8], I get arrays of ints and when I try to decode binary data I get errors. I see bin support in the rmp sources.

Decoding string

Handle all potential cases including errors while decoding strings:

  • insufficient bytes.
  • read exactly len bytes when copying, even if the provided buffer is larger.
  • any UTF8 errors.
  • a buffer too small to hold the result, when copying.
  • return the [u8] with the length actually read, when copying.
  • return &str instead of &[u8], when no-copy.
  • return the &[u8] that was read when decoding fails, when no-copy.
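Here is a sketch of a borrowing (no-copy) string reader that covers several of these cases. It returns a `&str` that points into the input. It leaves extra bytes untouched and hands them back as the remainder. It reports insufficient bytes, and it returns the raw bytes when they are not valid UTF-8. `decode_str` and `StrError` are hypothetical names, and only the fixstr and str8 formats are handled.

```rust
use std::str;

/// Hypothetical error type for the string-decoding cases listed above.
#[derive(Debug, PartialEq)]
enum StrError<'a> {
    /// The buffer ended before the marker/length could be read.
    Eof,
    /// The marker byte is not a string format.
    NotAString(u8),
    /// The header announced `expected` bytes; only `available` were present.
    InsufficientBytes { expected: usize, available: &'a [u8] },
    /// Exactly the announced bytes were read, but they are not UTF-8.
    InvalidUtf8(&'a [u8]),
}

/// Zero-copy read of a fixstr or str8 at the start of `buf`. Returns the
/// borrowed `&str` and the unread remainder; extra bytes are left untouched.
fn decode_str<'a>(buf: &'a [u8]) -> Result<(&'a str, &'a [u8]), StrError<'a>> {
    let (&marker, rest) = buf.split_first().ok_or(StrError::Eof)?;
    let (len, rest) = match marker {
        0xa0..=0xbf => ((marker & 0x1f) as usize, rest), // fixstr
        0xd9 => {
            // str8: one length byte follows the marker
            let (&n, rest) = rest.split_first().ok_or(StrError::Eof)?;
            (n as usize, rest)
        }
        other => return Err(StrError::NotAString(other)),
    };
    if rest.len() < len {
        return Err(StrError::InsufficientBytes { expected: len, available: rest });
    }
    let (data, tail) = rest.split_at(len);
    let s = str::from_utf8(data).map_err(|_| StrError::InvalidUtf8(data))?;
    Ok((s, tail))
}
```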

Modules overhaul

Some files are now too large (>2k lines of code) and need to be split into submodules.

Also consider putting both serialization modules behind feature gates, because they require too many dependencies.

Error revolution

It's time to stabilize what error handling in RMP will look like.

Take decoding operations as an example. There are two main ways to split up error categories, each with its pros and cons:

  1. One large error enum for all functions.
    • Pros: easy to implement; no conversions between levels of abstraction; Unix way.
    • Cons: strong documentation support is required; forces users to mark unreachable branches with unreachable!.
  2. Several error enums for each function and level of abstraction.
    • Pros: each error value is self-documenting; no unreachable! branches; hierarchical divergence between levels of abstraction.
    • Cons: tricky to implement the right way; a significant amount of code is required (for example for From and Display traits); possible performance impact during chained error conversion.

Personally I like the second way: this is how error handling looks now, and it will probably stay that way. But maybe someone sees a terrible mistake in this approach, so I'm open to discussion.
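The second approach might look like this in miniature. There is a low-level error type, a higher-level one that wraps it, and a `From` impl so that `?` converts between levels automatically. `ReadError`, `StringError`, `read_marker` and `read_fixstr` are hypothetical names chosen for illustration.

```rust
use std::fmt;

/// Low-level failure while reading bytes or markers (hypothetical).
#[derive(Debug, PartialEq)]
enum ReadError {
    Eof,
    TypeMismatch(u8),
}

/// Higher-level string error: wraps ReadError and adds its own case.
#[derive(Debug, PartialEq)]
enum StringError {
    Read(ReadError),
    InvalidUtf8,
}

// Lets `?` lift a ReadError into a StringError automatically.
impl From<ReadError> for StringError {
    fn from(e: ReadError) -> Self {
        StringError::Read(e)
    }
}

impl fmt::Display for StringError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            StringError::Read(ReadError::Eof) => write!(f, "unexpected end of input"),
            StringError::Read(ReadError::TypeMismatch(m)) => write!(f, "unexpected marker {:#04x}", m),
            StringError::InvalidUtf8 => write!(f, "string is not valid UTF-8"),
        }
    }
}

/// The signature documents that only ReadError can come out of here.
fn read_marker(buf: &[u8]) -> Result<u8, ReadError> {
    buf.first().copied().ok_or(ReadError::Eof)
}

/// Read a fixstr; its error type lists exactly the ways this can fail.
fn read_fixstr(buf: &[u8]) -> Result<&str, StringError> {
    let marker = read_marker(buf)?; // ReadError -> StringError via From
    let len = match marker {
        0xa0..=0xbf => (marker & 0x1f) as usize,
        other => return Err(ReadError::TypeMismatch(other).into()),
    };
    let data = buf.get(1..1 + len).ok_or(ReadError::Eof)?;
    std::str::from_utf8(data).map_err(|_| StringError::InvalidUtf8)
}
```

The cost listed above shows up even at this size. Each error type needs its own `From` and `Display` impls.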

encode Vec<u8> as MessagePack bin family with rustc_serialize

Using rustc_serialize, is there a way to encode a Vec<u8> as a bin data object instead of an array of ints? The latter takes up nearly twice as much space. I cannot just wrap Vec<u8> and implement Encodable for that struct manually, since the Encoder doesn't allow calls to write_bin or access to the underlying writer.
Is there a way to achieve the same with serde?
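This hand-written comparison shows where the space goes. It is not rmp or rustc_serialize code. It encodes the same bytes once as bin8 and once as an array of integers. In the array form, each byte of 0x80 or above needs a two-byte uint8, which is where the near doubling comes from. The sketch only handles short inputs: bin8 up to 255 bytes and fixarray up to 15 elements.

```rust
/// bin8: marker 0xc4, a one-byte length, then the raw bytes.
fn encode_bin8(data: &[u8]) -> Vec<u8> {
    assert!(data.len() <= u8::MAX as usize, "bin16/bin32 omitted in this sketch");
    let mut out = vec![0xc4, data.len() as u8];
    out.extend_from_slice(data);
    out
}

/// The same bytes as a fixarray of integers: bytes below 0x80 fit in a
/// positive fixint, the rest need a uint8 (marker 0xcc plus the byte).
fn encode_as_int_array(data: &[u8]) -> Vec<u8> {
    assert!(data.len() < 16, "only fixarray is handled in this sketch");
    let mut out = vec![0x90 | data.len() as u8];
    for &b in data {
        if b < 0x80 {
            out.push(b);
        } else {
            out.extend_from_slice(&[0xcc, b]);
        }
    }
    out
}
```

For fifteen 0xff bytes, bin8 takes 17 bytes and the array takes 31. With serde, the `serde_bytes` crate's `#[serde(with = "serde_bytes")]` field attribute is a common way to ask a serializer to treat a `Vec<u8>` as bytes rather than as a sequence.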

read_str_ref is useless?

Maybe I misunderstand the purpose of read_str_ref, but the function seems to have two problems:

  1. It returns &[u8] on success, not &str.
  2. On success, it doesn't indicate how many bytes it read from the input buffer. How does the caller know how to extract the next message-packed field that follows the string it just read?

The second problem is what makes read_str_ref useless (as I understand it). Suppose the following snippet:

let r: &[u8]  = { /* generate input here */ };

// Read a string from the input.
let x = rmp::decode::read_str_ref(r).unwrap();
println!("{:?}", x);

// How to read next field? The `r` slice still points to the extracted string,
// not the next field in the input

Here's the current signature for read_str_ref.

fn read_str_ref(rd: &[u8]) -> Result<&[u8], DecodeStringError>

How about something like this instead?

fn read_str_ref(rd: &[u8]) -> Result<(&str, &[u8]), DecodeStringError>

On success, the tuple (&str, &[u8]) pairs the string just read along with a slice to the beginning of the input immediately following the string. This would allow the caller to continue reading from the buffer.

let r: &[u8]  = { /* generate input here */ };

// Read a string from the input.
let (x, mut r) = rmp::decode::read_str_ref(r).unwrap();
println!("{:?}", x);

// Read next field from the input.
let x = rmp::decode::read_u32_loosely(&mut r).unwrap();
println!("{:?}", x);

docs for decoding needed

Hello,

Currently the README shows how to encode some simple data and a struct.
It would be nice to also see some examples of decoding.

Regards,
Godfryd

Cargo tags

To make it easier to find the most appropriate crate.

Low level integer decoding

The following low-level decoding functions need to be reviewed and documented:

  • read_pfix
  • read_nfix
  • read_i8
  • read_i16
  • read_i32
  • read_i64
  • read_i64_loosely
  • read_u8
  • read_u16
  • read_u32
  • read_u64
  • read_u64_loosely

Unbounded buffers

I'd like to serialize structs with String fields of arbitrary length with this library. Is that possible?
I'm not sure how to create a buffer for that.

read_i/uXX_loosely should be looser :)

They should accept any integer input that matches the target type.

For example, read_i64_loosely should accept [1]. It should even accept an unsigned 64-bit integer, as long as the value is in the range of 64-bit signed integers.

msgpack is sometimes used with dynamic languages, which often use whatever encoding is most efficient, without regard to the target type expected by a statically typed language.

(In my case, messages are being generated by Python.)

Signal safety

Some low- and high-level functions are signal-unsafe. That means that if the given Read or Write returns an Err with EINTR, there is no way to continue decoding/encoding properly.

  • Investigate which functions and traits are signal-unsafe and mark them in the documentation.
  • Implement signal-safe alternatives if possible.
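Here is a sketch of the retry approach for reads. The loop treats `ErrorKind::Interrupted` as "try again" rather than as a failure, so bytes already consumed from the reader are not lost mid-value. `FlakyReader` is a hypothetical test double that simulates EINTR without real signals. For reference, std's `Read::read_exact` already retries on `Interrupted` in the same way. The harder part in a decoder is functions that issue several separate reads and lose their partial state when one of them fails.

```rust
use std::io::{self, ErrorKind, Read};

/// Hypothetical test double: fails with `Interrupted` on alternate calls and
/// otherwise hands out one byte at a time, to simulate EINTR without signals.
struct FlakyReader<'a> {
    data: &'a [u8],
    interrupt_next: bool,
}

impl<'a> Read for FlakyReader<'a> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if self.interrupt_next {
            self.interrupt_next = false;
            return Err(io::Error::new(ErrorKind::Interrupted, "simulated EINTR"));
        }
        self.interrupt_next = true;
        if buf.is_empty() || self.data.is_empty() {
            return Ok(0);
        }
        buf[0] = self.data[0];
        self.data = &self.data[1..];
        Ok(1)
    }
}

/// Fill `buf` completely, retrying on `Interrupted` instead of failing, so
/// bytes that were already consumed from `rd` are never lost.
fn read_exact_retry<R: Read>(rd: &mut R, mut buf: &mut [u8]) -> io::Result<()> {
    while !buf.is_empty() {
        match rd.read(buf) {
            Ok(0) => return Err(ErrorKind::UnexpectedEof.into()),
            Ok(n) => {
                let tmp = buf;
                buf = &mut tmp[n..];
            }
            Err(ref e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(())
}
```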

When num cast fails

For example, suppose we've received a string whose size strictly fits in u32 (see the MessagePack protocol), read its size, and found that the size does not fit in usize (on 8- or 16-bit systems). The default behavior relies on the as operator, which silently truncates on overflow instead of reporting an error. This is not what we want, of course.

We could simply use the num crate with its FromPrimitive trait, which handles all the number-cast magic internally, but I can't figure out what I should do when such an error occurs.

Okay, the 2 main questions are:

  • How to deal with that case?
  • How to test this on x86_64?
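Here is a sketch of one possible answer to both questions. Convert with `TryFrom` instead of `as`, and make the target type a parameter. The failing case can then be exercised on x86_64 by substituting `u16` for `usize`. `checked_len` and `LengthOverflow` are hypothetical names.

```rust
use std::convert::TryFrom;

/// Hypothetical error: a length the target platform cannot address.
#[derive(Debug, PartialEq)]
struct LengthOverflow(u32);

/// Convert a MessagePack length (at most u32::MAX by the spec) into the
/// length type `T`, returning an error instead of truncating like `as` would.
/// Being generic over `T` lets tests stand in `u16` for a 16-bit `usize`.
fn checked_len<T: TryFrom<u32>>(len: u32) -> Result<T, LengthOverflow> {
    T::try_from(len).map_err(|_| LengthOverflow(len))
}
```

Decoding code would call `checked_len::<usize>(len)?` and propagate the error. A test on x86_64 can call `checked_len::<u16>(70_000)` and expect `Err(LengthOverflow(70_000))`. For comparison, `70_000u32 as u16` quietly evaluates to 4464.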

Compilation error: no method 'join' for Vec

Hi,

I tried using your package through vim-markdown-composer, and when compiling all the Rust packages, rmp fails and gives me this message:

rmp-0.7.3/src/value.rs:58:22: 58:32 error: no method named 'join' found for type 'collections::vec::Vec<collections::string::String>' in the current scope

Stack safety

The current implementation has data-dependent stack depth which can be exploited to crash the program if a recursive data structure is deserialized and the attacker controls the serialized data.

struct Entry {
  number: u64,
  subentries: Vec<Self>
}

An attacker could create a binary representation of a thousand-fold nested entry and thus crash the program as it exceeds the stack limit. Thanks to msgpack, such a message of death would be only a few thousand bytes in size.

To fix this, the implementation should count and limit the recursion depth (a default limit of 1000 should not break any non-malicious cases). I could implement this if you agree with my approach.
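Here is a sketch of the proposed limit, applied to a toy decoder that understands only positive fixints and fixarrays. Each nested array increments a depth counter. Decoding stops with an error once the counter reaches `max_depth`, before the stack can grow further. `Node`, `DecodeError` and `decode` are hypothetical names; a real implementation would thread the counter through the deserializer.

```rust
/// Toy tree: positive fixints and fixarrays only (hypothetical types).
#[derive(Debug, PartialEq)]
enum Node {
    Int(u8),
    Array(Vec<Node>),
}

#[derive(Debug, PartialEq)]
enum DecodeError {
    Eof,
    Unsupported(u8),
    DepthLimitExceeded,
}

/// Decode one node. `depth` is the number of arrays we are already inside;
/// an array is rejected once `depth` reaches `max_depth`, before recursing.
fn decode<'a>(buf: &'a [u8], depth: usize, max_depth: usize) -> Result<(Node, &'a [u8]), DecodeError> {
    let (&marker, mut rest) = buf.split_first().ok_or(DecodeError::Eof)?;
    match marker {
        0x00..=0x7f => Ok((Node::Int(marker), rest)), // positive fixint
        0x90..=0x9f => {
            // fixarray: element count in the low 4 bits
            if depth >= max_depth {
                return Err(DecodeError::DepthLimitExceeded);
            }
            let mut items = Vec::new();
            for _ in 0..(marker & 0x0f) {
                let (item, tail) = decode(rest, depth + 1, max_depth)?;
                items.push(item);
                rest = tail;
            }
            Ok((Node::Array(items), rest))
        }
        other => Err(DecodeError::Unsupported(other)),
    }
}
```

With `max_depth = 1000`, a 100,000-byte "message of death" made of nested `0x91` markers is rejected after 1000 frames instead of overflowing the stack.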

Seeing error when trying to serialize an enum with rmp-serde

I am trying to serialize an enum with rmp-serde and I am running into an error:

`Err` value: DeserializeError(Syntax("syntax error: expected an enum"))

which seems to come from serde's default implementation of visit_enum.

I see that you have tests for serializing enums, which is almost exactly what I am trying to do. I am wondering if there is something I am missing here. Looking at your Travis builds, it does not seem like those tests are being run there.

I already have serde_macros enabled and have custom_derive and plugin features, not sure what else I need.

Any help would be much appreciated!

problems with encoding to Vec<u8>

Hello,
I tried to encode into a growable buffer (not a fixed-length one like the one used in the example).
I changed the example to:

extern crate rmp as msgpack;
extern crate rustc_serialize;
use rustc_serialize::Encodable;
use msgpack::Encoder;
fn main() {
    let val = (42u8, "the Answer");
    //let mut buf = [0u8; 23];
    let mut buf: Vec<u8> = Vec::new();
    val.encode(&mut Encoder::new(&mut &mut buf[..])).unwrap();
}

It panics on val.encode:

thread '<main>' panicked at 'called `Result::unwrap()` on an `Err` value: Unimplemented', src/libcore/result.rs:731
stack backtrace:
   1:     0x7fe7d8338e8e - sys::backtrace::write::hd71fc66fa1c984beLqs
   2:     0x7fe7d833be35 - panicking::on_panic::hc1f1f7a7d3f1cd2al8w
   3:     0x7fe7d83352ae - rt::unwind::begin_unwind_inner::h26926b64783ab69bZNw
   4:     0x7fe7d83354fc - rt::unwind::begin_unwind_fmt::ha28396e9e1ce1f5e5Mw
   5:     0x7fe7d833b826 - rust_begin_unwind
   6:     0x7fe7d8365414 - panicking::panic_fmt::h44955cbbc9e564ec19B
   7:     0x7fe7d8328580 - result::Result<T, E>::unwrap::h3504239604260474918
                        at src/libcore/macros.rs:28
   8:     0x7fe7d832804d - main::habc3974e69a85df6iaa
                        at src/lib.rs:14
   9:     0x7fe7d83402a8 - rust_try_inner
  10:     0x7fe7d8340295 - rust_try
  11:     0x7fe7d833d968 - rt::lang_start::h9a2154723c054292i3w
  12:     0x7fe7d832eb9b - main
  13:     0x7fe7d7512a3f - __libc_start_main
  14:     0x7fe7d8327e28 - _start
  15:                0x0 - <unknown>
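A likely cause is `&mut buf[..]`. For an empty `Vec` this is a zero-length slice, and a `&mut [u8]` used as a writer cannot grow, so the first write fails. A `Vec<u8>` implements `std::io::Write` itself and grows as needed, so passing `&mut buf` directly should work. The sketch below uses a hypothetical `write_fixstr` helper instead of the old rustc_serialize `Encoder` and shows both cases with std types only.

```rust
use std::io::{self, Write};

/// Write a fixstr (up to 31 bytes) into any `Write` sink. Hypothetical helper.
fn write_fixstr<W: Write>(wr: &mut W, s: &str) -> io::Result<()> {
    assert!(s.len() < 32, "fixstr only");
    wr.write_all(&[0xa0 | s.len() as u8])?;
    wr.write_all(s.as_bytes())
}
```

`write_fixstr(&mut buf, "the Answer")` with `let mut buf: Vec<u8> = Vec::new();` succeeds and leaves 11 bytes in `buf`. The same call on `&mut empty_vec[..]` fails with `ErrorKind::WriteZero`.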

Coverage

Involves:

  • Introduce code coverage measuring and uploading it to https://coveralls.io
  • Notifications for every PR showing coverage changes.
  • Badge.

Deserialize strings and bytes without intermediate buffer with serde deserializer

There is a buf field in Deserializer which is used for deserializing strings and bytes: the functions read_str_data and read_full take a reference to this buffer and to the Deserializer's R: Read, read bytes from R, and write them into the buffer.

This looks like unnecessary overhead when R is &[u8], because the bytes can be used directly from the &[u8].

I think it would be nice to have two different options:

  1. deserialize from any generic R: Read, which needs an intermediate buffer,
  2. deserialize from a slice, without an intermediate buffer.

Decoding arbitrary messages conveniently

Hi, I'm struggling with decoding arbitrary messages with msgpack-rust.
In the examples in this repo, it's known ahead of time what "type" the decoded message will have, but what if messages can vary in structure?

Here's an example scenario.

A server needs to send several events to the client:

// Using pseudocode here
{ name: "event1", a: 12, b: ["text", "some more"] }
{ name: "event2", c: { d: "yet again" } }

These two events have different "types" and can't be decoded into same struct.

How do I decode a group of different messages properly?
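One common approach is to decode into a dynamic value type first and then dispatch on the `name` field. The sketch below does this by hand for a subset of MessagePack: positive fixints, fixstr, fixarray, and fixmap with string keys. `Value`, `Event`, `decode` and `to_event` are hypothetical names. In practice, a crate's own dynamic value type would fill the same role. With serde, an internally tagged enum (`#[serde(tag = "name")]`) may also work.

```rust
use std::collections::BTreeMap;

/// Minimal dynamic value (hypothetical; covers only what this sketch decodes).
#[derive(Debug, PartialEq)]
enum Value {
    Int(u8),
    Str(String),
    Array(Vec<Value>),
    Map(BTreeMap<String, Value>),
}

/// Decode one value (positive fixint, fixstr, fixarray, or fixmap with
/// string keys) and return it with the unread remainder.
fn decode(buf: &[u8]) -> Option<(Value, &[u8])> {
    let (&m, mut rest) = buf.split_first()?;
    match m {
        0x00..=0x7f => Some((Value::Int(m), rest)),
        0xa0..=0xbf => {
            let len = (m & 0x1f) as usize;
            if rest.len() < len {
                return None;
            }
            let (s, tail) = rest.split_at(len);
            Some((Value::Str(String::from_utf8(s.to_vec()).ok()?), tail))
        }
        0x90..=0x9f => {
            let mut items = Vec::new();
            for _ in 0..(m & 0x0f) {
                let (v, tail) = decode(rest)?;
                items.push(v);
                rest = tail;
            }
            Some((Value::Array(items), rest))
        }
        0x80..=0x8f => {
            let mut map = BTreeMap::new();
            for _ in 0..(m & 0x0f) {
                let (k, tail) = decode(rest)?;
                let (v, tail) = decode(tail)?;
                match k {
                    Value::Str(k) => map.insert(k, v),
                    _ => return None, // only string keys in this sketch
                };
                rest = tail;
            }
            Some((Value::Map(map), rest))
        }
        _ => None, // anything else is out of scope for this sketch
    }
}

#[derive(Debug, PartialEq)]
enum Event {
    Event1 { a: u8, b: Vec<String> },
    Event2 { d: String },
}

fn as_str(v: &Value) -> Option<&str> {
    if let Value::Str(s) = v { Some(s.as_str()) } else { None }
}

fn as_map(v: &Value) -> Option<&BTreeMap<String, Value>> {
    if let Value::Map(m) = v { Some(m) } else { None }
}

/// Dispatch on the "name" field, then pull out the fields that event needs.
fn to_event(v: &Value) -> Option<Event> {
    let map = as_map(v)?;
    match as_str(map.get("name")?)? {
        "event1" => {
            let a = match map.get("a")? {
                Value::Int(a) => *a,
                _ => return None,
            };
            let b = match map.get("b")? {
                Value::Array(items) => items
                    .iter()
                    .map(|i| as_str(i).map(String::from))
                    .collect::<Option<Vec<_>>>()?,
                _ => return None,
            };
            Some(Event::Event1 { a, b })
        }
        "event2" => {
            let c = as_map(map.get("c")?)?;
            Some(Event::Event2 { d: as_str(c.get("d")?)?.to_string() })
        }
        _ => None,
    }
}
```

Because `decode` returns the unread remainder, several different messages can be read from one buffer in sequence. Each one is then routed through `to_event`.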

Update readme

  • Make table with badges for each subcrate.
  • Documentation with examples for rmp.
  • Documentation with examples for rmp-serialize
  • Documentation with examples for rmp-serde.
  • Documentation links to sub crates.

String decoding

The following functions are required:

  • exact buffer (return String);
  • insufficient bytes (return all bytes read with Error);
  • extra bytes (just don't read them, return String);
  • invalid UTF8 (return invalid buffer with Error);
