simdnbt

Simdnbt is a very fast NBT serializer and deserializer.

It was originally made as a joke but it ended up being too good of a joke so it's actually a thing now.

Usage

cargo add simdnbt

Deserializing

For deserializing, you'll likely want either simdnbt::borrow::read or simdnbt::owned::read. The difference is that the "borrow" variant requires you to keep a reference to the original buffer, but is significantly faster.

use std::borrow::Cow;
use std::io::Cursor;

fn example(item_bytes: &[u8]) {
    let nbt = simdnbt::borrow::read(&mut Cursor::new(item_bytes))
        .unwrap()
        .unwrap();
    let skyblock_id: Cow<str> = nbt
        .list("i")
        .and_then(|i| i.compounds())
        .and_then(|i| i.get(0))
        .and_then(|i| i.compound("tag"))
        .and_then(|tag| tag.compound("ExtraAttributes"))
        .and_then(|ea| ea.string("id"))
        .map(|id| id.to_string_lossy())
        .unwrap_or_default();
}

Serializing

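use simdnbt::owned::{BaseNbt, Nbt, NbtCompound, NbtTag};

fn example() -> Vec<u8> {
    // Build a root compound with an empty name and a single string entry.
    let nbt = Nbt::Some(BaseNbt::new(
        "",
        NbtCompound::from_values(vec![
            ("key".into(), NbtTag::String("value".into())),
        ]),
    ));
    // Serialize into an in-memory buffer.
    let mut buffer = Vec::new();
    nbt.write(&mut buffer);
    buffer
}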

Performance guide

Use the borrow variant of Nbt if possible, and avoid allocating unnecessarily (for example, keep strings as Cow<str> if you can).

The most significant and simple optimization you can do is switching to an allocator like mimalloc (it's ~20% faster on my machine). Setting RUSTFLAGS='-C target-cpu=native' when building your code may also help a little bit.
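Switching the global allocator takes only a few lines. The fragment below is a sketch that assumes the mimalloc crate has been added as a dependency (for example with cargo add mimalloc) and follows the usage shown in that crate's own documentation; it is not taken from the simdnbt repository.

```rust
// Assumption: `mimalloc` is listed under [dependencies] in Cargo.toml.
use mimalloc::MiMalloc;

// Route every heap allocation in the binary through mimalloc
// instead of the system allocator.
#[global_allocator]
static GLOBAL: MiMalloc = MiMalloc;
```

The declaration goes in the binary crate (typically main.rs), since only one global allocator can be set per program.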

Implementation details

Simdnbt currently makes use of SIMD instructions for two things:

  • swapping the endianness of int arrays
  • checking if a string is plain ascii for faster mutf8 to utf8 conversion
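To make the two operations above concrete, here is a scalar sketch using only the standard library. It is not Simdnbt's implementation: the function names are hypothetical, and any vectorization is left to the compiler rather than written with explicit SIMD instructions.

```rust
/// Reverse the byte order of every element in place.
/// (Hypothetical helper; a simple loop like this is a common
/// candidate for auto-vectorization by the compiler.)
pub fn swap_endianness_in_place(values: &mut [i32]) {
    for v in values.iter_mut() {
        *v = v.swap_bytes();
    }
}

/// Return true when every byte is below 0x80. For such input the
/// MUTF-8 and UTF-8 encodings are byte-for-byte identical, so no
/// conversion is needed.
pub fn is_plain_ascii(bytes: &[u8]) -> bool {
    bytes.is_ascii()
}
```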

Simdnbt takes some shortcuts to be this fast:

  1. it requires a reference to the original data (to avoid cloning)
  2. it doesn't validate/decode the mutf-8 strings at decode-time
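The two shortcuts can be illustrated with a small sketch. RawStr is a hypothetical type, not part of Simdnbt's API: it borrows bytes from the input buffer and only decodes them when asked. The non-ASCII branch uses String::from_utf8_lossy purely as a stand-in, since a real implementation would need a proper MUTF-8 decoder there.

```rust
use std::borrow::Cow;

/// Hypothetical borrowed string: holds raw bytes from the input
/// buffer and does no validation when it is created.
pub struct RawStr<'a> {
    bytes: &'a [u8],
}

impl<'a> RawStr<'a> {
    pub fn new(bytes: &'a [u8]) -> Self {
        RawStr { bytes }
    }

    /// Decode on demand. Plain ASCII is returned without allocating.
    pub fn to_str_lossy(&self) -> Cow<'a, str> {
        if self.bytes.is_ascii() {
            // ASCII bytes are always valid UTF-8, so this cannot fail.
            Cow::Borrowed(std::str::from_utf8(self.bytes).expect("ASCII is valid UTF-8"))
        } else {
            // Stand-in only: this is NOT a MUTF-8 decoder.
            Cow::Owned(String::from_utf8_lossy(self.bytes).into_owned())
        }
    }
}
```

The cost of this design is that malformed strings are only noticed when they are read, not when the document is parsed.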

Benchmarks

Simdnbt is likely the fastest NBT decoder currently in existence.

Here's a benchmark comparing Simdnbt against a few of the other fastest NBT crates for decoding complex_player.dat:

Library            Throughput
simdnbt::borrow    1.9725 GiB/s
simdnbt::owned     825.59 MiB/s
shen_nbt5          606.68 MiB/s
graphite_binary    363.94 MiB/s
azalea_nbt         330.46 MiB/s
valence_nbt        279.58 MiB/s
hematite_nbt       180.22 MiB/s
fastnbt            162.92 MiB/s

And for writing complex_player.dat:

Library            Throughput
simdnbt::borrow    2.6116 GiB/s
simdnbt::owned     2.5033 GiB/s
azalea_nbt         2.4152 GiB/s
graphite_binary    1.8804 GiB/s

The tables above were made from the compare benchmark in this repo. Note that the benchmark is somewhat unfair, since simdnbt::borrow doesn't fully decode some things like strings and integer arrays until they're used. Also keep in mind that if you run your own benchmark you'll get different numbers, but the speeds should be about the same relative to each other.

Contributors

mat-1, szabgab

simdnbt's Issues

Unsoundness in swap_endianness

Simply put, the Vec::from_raw_parts call in swap_endianness is unsound because the pointer it is given doesn't meet the alignment requirements of the target type. A simple example is shown below.

    #[test]
    fn test_swap_endianness_u64_vec() {
        assert_eq!(
            swap_endianness::<u64>(&[1, 2, 3, 4, 5, 6, 7, 8, 8, 7, 6, 5, 4, 3, 2, 1]),
            vec![
                u64::from_le_bytes([8, 7, 6, 5, 4, 3, 2, 1]),
                u64::from_le_bytes([1, 2, 3, 4, 5, 6, 7, 8])
            ]
        );
    }

Running the tests under Miri confirms this:

test swap_endianness::tests::test_swap_endianness_u64_vec ... error: Undefined Behavior: constructing invalid value: encountered an unaligned reference (required 8 byte alignment but found 2)
   --> /Users/ciel/.rustup/toolchains/nightly-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/slice/raw.rs:109:9
    |
109 |         &*ptr::slice_from_raw_parts(data, len)
    |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ constructing invalid value: encountered an unaligned reference (required 8 byte alignment but found 2)
    |
    = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
    = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
    = note: BACKTRACE:
    = note: inside `std::slice::from_raw_parts::<'_, u64>` at /Users/ciel/.rustup/toolchains/nightly-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/slice/raw.rs:109:9: 109:47
    = note: inside `<std::vec::Vec<u64> as std::ops::Deref>::deref` at /Users/ciel/.rustup/toolchains/nightly-aarch64-apple-darwin/lib/rustlib/src/rust/library/alloc/src/vec/mod.rs:2709:18: 2709:64
    = note: inside `std::vec::partial_eq::<impl std::cmp::PartialEq for std::vec::Vec<u64>>::eq` at /Users/ciel/.rustup/toolchains/nightly-aarch64-apple-darwin/lib/rustlib/src/rust/library/alloc/src/vec/partial_eq.rs:16:54: 16:58
note: inside `swap_endianness::tests::test_swap_endianness_u64_vec`

Running Miri on your code if you're gonna use unsafe is probably a good idea... 👀 Alignment is easy to forget about. In this case, u64 wants an alignment of 8 bytes, which the original byte array doesn't fulfill.
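One way to sidestep the alignment problem entirely is to copy the bytes into a freshly allocated Vec<u64>, which the allocator aligns correctly. The function below is a hypothetical safe alternative, not the crate's actual fix; it assumes the input is big-endian and silently ignores any trailing bytes that don't fill a whole u64.

```rust
/// Decode big-endian bytes into native u64 values without unsafe code.
/// (Hypothetical helper; trailing bytes that don't form a full
/// 8-byte chunk are ignored.)
pub fn swap_endianness_u64(bytes: &[u8]) -> Vec<u64> {
    bytes
        .chunks_exact(8)
        .map(|chunk| {
            // Copy into a fixed-size array so no reference to u64
            // is ever created from the unaligned byte slice.
            let mut arr = [0u8; 8];
            arr.copy_from_slice(chunk);
            u64::from_be_bytes(arr)
        })
        .collect()
}
```

This trades a copy for soundness; whether that cost is acceptable depends on how hot the path is.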

Unsafe with enum repr(u8)

Well... Just don't use unsafe

https://youtu.be/hBjQ3HqCfxs

match self {
    Byte(_) => BYTE_ID,
    Short(_) => SHORT_ID,
    Int(_) => INT_ID,
    Long(_) => LONG_ID,
    Float(_) => FLOAT_ID,
    Double(_) => DOUBLE_ID,
    ByteArray(_) => BYTE_ARRAY_ID,
    String(_) => STRING_ID,
    List(_) => LIST_ID,
    Compound(_) => COMPOUND_ID,
    IntArray(_) => INT_ARRAY_ID,
    LongArray(_) => LONG_ARRAY_ID,
}
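For reference, here is a self-contained sketch of the safe approach suggested by the match above. The Tag enum is a trimmed-down, hypothetical stand-in for the crate's tag type, and the constants use the standard NBT tag IDs.

```rust
// Standard NBT tag IDs for the variants used in this sketch.
pub const BYTE_ID: u8 = 1;
pub const SHORT_ID: u8 = 2;
pub const INT_ID: u8 = 3;
pub const STRING_ID: u8 = 8;

/// Hypothetical, reduced tag enum (the real one has more variants).
#[derive(Debug, Clone, PartialEq)]
pub enum Tag {
    Byte(i8),
    Short(i16),
    Int(i32),
    String(String),
}

impl Tag {
    /// Map each variant to its ID with an exhaustive match, so no
    /// unsafe read of the enum discriminant is needed.
    pub fn id(&self) -> u8 {
        match self {
            Tag::Byte(_) => BYTE_ID,
            Tag::Short(_) => SHORT_ID,
            Tag::Int(_) => INT_ID,
            Tag::String(_) => STRING_ID,
        }
    }
}
```

Because the match is exhaustive, adding a variant without giving it an ID is a compile error.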
