simdjzon's Introduction

⚠️ Work in progress. Expect bugs and/or missing features ⚠️

simdjzon

This is a port of simdjson, a high-performance JSON parser developed by Daniel Lemire and Geoff Langdale, to zig.

cpu support

Only 64-bit CPUs are supported so far.

x86_64

A CPU with AVX is required, and CLMUL is preferred. The following usually have both:

  • Intel - Haswell from 2013 onwards
  • AMD - Ryzen/EPYC CPU (Q1 2017)

These commands show how to test for specific target and cpu support:

zig build test -Dtarget=x86_64-linux -Dcpu=x86_64+avx # uses clmulSoft - missing pclmul
zig build test -Dtarget=x86_64-linux -Dcpu=x86_64+avx+pclmul
# zig build test -Dtarget=x86_64-linux # doesn't work - missing avx
# zig build test -Dcpu=x86_64_v2 # doesn't work - missing avx
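
To see which of these features a given `-Dtarget`/`-Dcpu` combination actually enables, the feature set can be inspected at compile time. The following is a minimal sketch and not code from simdjzon: it relies only on `@import("builtin").cpu` and `std.Target.x86.featureSetHas` from the zig standard library, and the names `has_avx` and `has_pclmul` are made up for illustration.

```zig
const std = @import("std");
const builtin = @import("builtin");

// Hypothetical names; simdjzon may detect features differently.
// The arch check comes first because feature bits are only meaningful
// for the architecture they belong to.
pub const has_avx = builtin.cpu.arch == .x86_64 and
    std.Target.x86.featureSetHas(builtin.cpu.features, .avx);
pub const has_pclmul = builtin.cpu.arch == .x86_64 and
    std.Target.x86.featureSetHas(builtin.cpu.features, .pclmul);

pub fn main() void {
    // Reports what the compile target enables, not what the host CPU supports.
    std.debug.print("avx={} pclmul={}\n", .{ has_avx, has_pclmul });
}
```

Building this with the same `-Dcpu` values as the commands above should show whether the software or hardware carry-less multiply path would be expected.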

aarch64

A CPU with AES is preferred.

zig build test -Dtarget=aarch64-linux -Dcpu=apple_latest-aes -fqemu # uses clmulSoft
zig build test -Dtarget=aarch64-linux -fqemu
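
The `clmulSoft` comments in the commands above refer to a software fallback for carry-less multiplication, used when the corresponding hardware instruction is not enabled. In simdjson, a carry-less multiply of the quote bitmask by an all-ones value yields a prefix XOR, which marks the bytes that lie inside strings. The sketch below shows that relationship with a simple bit-at-a-time loop. It is an illustration only, not simdjzon's `clmulSoft`, and the name `clmulLow64` is hypothetical.

```zig
const std = @import("std");

/// Low 64 bits of the carry-less (XOR instead of add) product of a and b.
/// Illustrative only; not simdjzon's clmulSoft.
fn clmulLow64(a: u64, b: u64) u64 {
    var acc: u64 = 0;
    var x = a;
    var y = b;
    while (y != 0) {
        if ((y & 1) != 0) acc ^= x; // "add" (XOR) the shifted copy of a
        x <<= 1; // bits shifted past bit 63 are dropped
        y >>= 1;
    }
    return acc;
}

test "carry-less multiply by all ones is a prefix XOR" {
    // (x + 1) * (x + 1) = x^2 + 1 over GF(2): 0b11 * 0b11 = 0b101
    try std.testing.expectEqual(@as(u64, 0b101), clmulLow64(0b11, 0b11));
    // Quote bits at positions 2 and 6 mark bits 2..5 as "inside a string".
    const quotes: u64 = 0b0100_0100;
    try std.testing.expectEqual(@as(u64, 0b0011_1100), clmulLow64(quotes, std.math.maxInt(u64)));
}
```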

powerpc

Not supported yet

# zig build test -Dtarget=powerpc-linux -fqemu # doesn't work - no classify() + 32bit errors
# zig build test -Dtarget=powerpc64-linux -fqemu # doesn't work - no classify()

fallback

No fallback for unsupported CPUs is provided yet.

# zig build test -Dcpu=baseline # doesn't work - no classify()

zig compiler support

The main branch is meant to compile with zig's master branch. It is tested weekly on linux, windows and macos.

The zig-0.10.0 branch works with zig's 0.10.0 release. It is tested on linux only when it is updated.

usage

# json validation
$ git clone https://github.com/travisstaloch/simdjzon
$ cd simdjzon
$ zig build -Drelease-fast # uses the dom api by default
$ zig-out/bin/simdjzon test/test.json
$ echo $? # 0 on success
0
$ zig build -Drelease-fast -Dondemand # use the ondemand api
$ zig-out/bin/simdjzon test/test.json
$ echo $? # 0 on success
0
$ zig build test
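All 19 tests passed.

// assumed definitions for the names used by the snippets below
const std = @import("std");
const testing = std.testing;
const allr = testing.allocator;
const dom = @import("dom.zig");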
test "get with struct" {
    const S = struct { a: u8, b: []const u8, c: struct { d: u8 } };
    const input =
        \\{"a": 42, "b": "b-string", "c": {"d": 126}}
    ;
    var parser = try dom.Parser.initFixedBuffer(allr, input, .{});
    defer parser.deinit();
    try parser.parse();
    var s: S = undefined;
    try parser.element().get(&s);
    try testing.expectEqual(@as(u8, 42), s.a);
    try testing.expectEqualStrings("b-string", s.b);
    try testing.expectEqual(@as(u8, 126), s.c.d);
}

test "at_pointer" {
    const input =
        \\{"a": {"b": [1,2,3]}}
    ;
    var parser = try dom.Parser.initFixedBuffer(allr, input, .{});
    defer parser.deinit();
    try parser.parse();
    const b0 = try parser.element().at_pointer("/a/b/0");
    try testing.expectEqual(@as(i64, 1), try b0.get_int64());
}

const ondemand = @import("ondemand.zig");
test "ondemand get with struct" {
    const S = struct { a: struct { b: []const u8 } };
    const input =
        \\{"a": {"b": "b-string"}}
    ;
    var src = std.io.StreamSource{ .const_buffer = std.io.fixedBufferStream(input) };
    var parser = try ondemand.Parser.init(&src, allr, "<fba>", .{});
    defer parser.deinit();
    var doc = try parser.iterate();

    var s: S = undefined;
    try doc.get(&s, .{ .allocator = allr });
    defer allr.free(s.a.b);
    try testing.expectEqualStrings("b-string", s.a.b);
}

test "ondemand at_pointer" {
    const input =
        \\{"a": {"b": [1,2,3]}}
    ;
    var src = std.io.StreamSource{ .const_buffer = std.io.fixedBufferStream(input) };
    var parser = try ondemand.Parser.init(&src, allr, "<fba>", .{});
    defer parser.deinit();
    var doc = try parser.iterate();
    var b0 = try doc.at_pointer("/a/b/0");
    try testing.expectEqual(@as(u8, 1), try b0.get_int(u8));
}

performance

parsing/validating twitter.json (630 KB)

simdjson

$ wget https://raw.githubusercontent.com/simdjson/simdjson/master/singleheader/simdjson.h https://raw.githubusercontent.com/simdjson/simdjson/master/singleheader/simdjson.cpp https://raw.githubusercontent.com/simdjson/simdjson/master/jsonexamples/twitter.json

$ cat main.cpp
#include <cstdlib>  // exit
#include <iostream> // std::cout, std::cerr
#include "simdjson.h"
using namespace simdjson;
int main(int argc, char** argv) {
    if(argc != 2) {
        std::cout << "USAGE: ./simdjson <file.json>" << std::endl;
        exit(1);
    }
    dom::parser parser; 
    try
    {
        const dom::element doc = parser.load(argv[1]);
    }
    catch(const std::exception& e)
    {
        std::cerr << e.what() << '\n';
        return 1;
    }
    return 0;
}

$ g++ main.cpp simdjson.cpp -o simdjson -O3 -march=native
$ time ./simdjson twitter.json

real	0m0.003s
user	0m0.002s
sys	0m0.001s

$ echo $?
0

simdjzon

$ time zig-out/bin/simdjzon twitter.json 

real	0m0.002s
user	0m0.000s
sys	0m0.002s

$ echo $?
0

timed against simdjson, go, nim, zig std lib

The simdjson binary was compiled as shown above. The go and nim binaries were created with sources from JSONTestSuite, and the zig std lib was timed with its own driver.

[Figure: validation times for several large json files, created with benchmark_and_plot.jl]

JSONTestSuite

Results of running simdjson and simdjzon through JSONTestSuite. Results are equal as of 8/7/21

simdjzon's People

Contributors

travisstaloch, validark

simdjzon's Issues

possible api improvements

Ideas from sasuke420 discussion on discord:

  • an option to not care about the LSB of doubles
  • an option to store some numbers in 1 slot instead of 2, if they fit
  • an option to store some decimals exactly using some weird format including the base 10 exponent instead of as doubles, also in 1 slot
  • an option to be willing to parse numbers out of strings, if the person who made the json was a trickster who put them inside of string literals, but they are still actually numbers
  • a three-way option that only affects parsing of structs: assume fields will be in order and give nonsense results if they are not; assume fields will be in order and fall back to the current behavior if they are not; or use the current behavior
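
As an illustration of the "parse numbers out of strings" idea in the list above, here is a minimal sketch of a helper that reads the contents of a JSON string as an integer. It is hypothetical and not part of simdjzon: the name `intFromJsonString` and its return type are assumptions, and a real option would presumably hook into the parser's existing number handling.

```zig
const std = @import("std");

/// Hypothetical helper, not part of simdjzon: read the contents of a JSON
/// string as a base-10 integer. Returns null for anything else, including
/// overflow. Leading zeros are accepted, which is looser than JSON numbers.
fn intFromJsonString(s: []const u8) ?i64 {
    const negative = s.len > 0 and s[0] == '-';
    var i: usize = if (negative) 1 else 0;
    if (i == s.len) return null; // empty input or a lone "-"
    var value: i64 = 0;
    while (i < s.len) : (i += 1) {
        const c = s[i];
        if (c < '0' or c > '9') return null;
        const digit: i64 = c - '0';
        const scaled = std.math.mul(i64, value, 10) catch return null;
        // Accumulate downwards for negatives so that minInt(i64) is reachable.
        const next = if (negative)
            std.math.sub(i64, scaled, digit)
        else
            std.math.add(i64, scaled, digit);
        value = next catch return null;
    }
    return value;
}

test "numbers hidden in strings" {
    try std.testing.expectEqual(@as(?i64, 42), intFromJsonString("42"));
    try std.testing.expectEqual(@as(?i64, -7), intFromJsonString("-7"));
    try std.testing.expectEqual(@as(?i64, null), intFromJsonString("4x2"));
}
```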

Parser assumes all slices are strings

const std = @import("std");
const dom = @import("src/dom.zig");

test {
    const allocator = std.testing.allocator;

    const T = struct { xs: []struct { a: u8 } };

    const input =
        \\{ "xs": [
        \\{"a": 42}
        \\]}
    ;

    var parser = try dom.Parser.initFixedBuffer(allocator, input, .{});
    defer parser.deinit();
    try parser.parse();

    var s: T = undefined;
    try parser.element().get(&s);

    std.debug.print("\n{} \n", .{s});
}

The parser will fail, expecting the [{...}] to be a string.
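
One possible way to distinguish the two cases is a compile-time check on the destination type, so that only byte slices are read from JSON strings and every other slice type is filled from a JSON array. The sketch below shows only that check. It is an assumption about how a fix could look, not the parser's actual dispatch, and `isStringSlice` is a hypothetical name.

```zig
const std = @import("std");

/// Hypothetical predicate: only byte slices are treated as JSON strings;
/// any other slice type would be filled from a JSON array instead.
fn isStringSlice(comptime T: type) bool {
    return T == []const u8 or T == []u8;
}

test "only byte slices are strings" {
    const Item = struct { a: u8 };
    try std.testing.expect(isStringSlice([]const u8));
    try std.testing.expect(isStringSlice([]u8));
    try std.testing.expect(!isStringSlice([]Item));
    try std.testing.expect(!isStringSlice([]const u16));
}
```

A `[]u8` destination remains ambiguous under this rule, since a caller might want it filled from an array of small numbers; that choice would need a separate option.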

test "ondemand struct iteration types" is failing

@sharpobject i started working on integrating your changes and removing libc dependency here using zig 0.10.0

i had to slightly alter the asm version of mm256_shuffle_epi8 you posted in discord to get some of the tests passing.

so this

 var ret: u8x32 = asm (
    \\ vpshufb %[a], %[b], %[ret]
    : [ret] "=x" (-> u8x32)
    : [a] "x" (a),
      [b] "x" (b)
);

became the following - taken from my benchmarks game solution

return asm (
    \\ vpshufb %[mask], %[x], %[out]
    : [out] "=x" (-> v.u8x32),
    : [x] "+x" (x),
      [mask] "x" (mask),
);

anyway the problem now is that the test "ondemand struct iteration types" is failing. you can see it here. i think it has something to do with the _prev1/2/3() methods.

it's weird, if you look here, i tried comparing results with the old versions from utils.c. but as soon as i link libc, the error goes away and the test passes.

let me know if you spot anything amiss with the _prevN() methods or have any ideas about what is happening. this is the only failing test left out of 24 and i'm not sure what's causing it.
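
When comparing an asm or intrinsic version against expected results, a scalar model of the instruction can serve as a reference. The sketch below models the 256-bit `vpshufb`: each 16-byte lane is shuffled independently, the low 4 bits of each mask byte select a byte within the lane, and a set high bit produces zero. In AT&T operand order, which the asm snippets above use, the first operand is the index (mask) vector and the second is the data, which may be why the operand order in the two snippets mattered. The name `shuffleEpi8Ref` is hypothetical and the code is not taken from simdjzon.

```zig
const std = @import("std");

/// Scalar model of the 256-bit vpshufb, i.e. _mm256_shuffle_epi8(x, mask).
/// Each 16-byte lane is shuffled independently.
fn shuffleEpi8Ref(x: [32]u8, mask: [32]u8) [32]u8 {
    var out: [32]u8 = undefined;
    var i: usize = 0;
    while (i < 32) : (i += 1) {
        const lane_base = i & 16; // 0 for bytes 0..15, 16 for bytes 16..31
        const m = mask[i];
        // A set high bit zeroes the byte; otherwise the low 4 bits index into the lane.
        out[i] = if ((m & 0x80) != 0) 0 else x[lane_base + (m & 0x0f)];
    }
    return out;
}

test "vpshufb reference: lanes are independent" {
    var x: [32]u8 = undefined;
    var v: u8 = 0;
    while (v < 32) : (v += 1) x[v] = v;
    var mask = [_]u8{0} ** 32; // select byte 0 of the lane everywhere
    mask[5] = 0x80; // high bit set: force a zero
    const out = shuffleEpi8Ref(x, mask);
    try std.testing.expectEqual(@as(u8, 0), out[5]);
    try std.testing.expectEqual(@as(u8, 16), out[16]); // byte 0 of the upper lane
    try std.testing.expectEqual(@as(u8, 16), out[31]);
}
```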

reconsider ci runners to re-enable an aarch64 job

currently, the macos ci job is running on an x86_64 machine. this changed sometime in the past few weeks or so from aarch64 to x86_64.

goal

revive macos aarch64 ci job or decide how best to test aarch64 code

notes

the job is set here:

jobs:
  build:
    runs-on: macos-latest

i did some searching for how to change this to use an aarch64 machine but didn't find anything.

found this https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources

Runner image                         YAML workflow label
macOS Monterey 12                    macos-latest, macos-12, macos-latest-xl or macos-12-xl
macOS Big Sur 11                     macos-11
macOS Catalina 10.15 [deprecated]    macos-10.15

questions

  • what runner to use to test the aarch64 code?

get rid of llvm_intrinsics.zig

convert all of the hacks found here and here to zig builtin calls. this will hopefully make it possible to build on mac and other platforms and close #1.

  1. try to use existing zig intrinsics if possible
  2. work toward implementing them in the zig compiler
  • saturating arithmetic - ziglang/zig#9619
  • carrylessMul() (llvm.x86.pclmulqdq)
  • shuffleEpi8() (llvm.x86.avx2.pshuf.b)
  • shuffleEpi32() (llvm.x86.ssse3.pshuf.b.128)
  • vpalignr() (llvm.x86.vpalignr)
  • _mm256_movemask_epi8() (llvm.x86.avx2.pmovmskb)
  • _mm_maddubs_epi16() (llvm.x86.ssse3.pmadd.ub.sw.128)
  • _mm_madd_epi16() (llvm.x86.sse2.pmadd.wd)
  • _mm_packus_epi32() (llvm.x86.sse41.packusdw)
  • _prevN()
    • converted to @shuffle instruction in aarch64 branch
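
For the `_prevN()` item, a scalar reference can make the intended result explicit before it is expressed with `@shuffle` or intrinsics. The sketch below assumes `_prevN()` follows simdjson's `prev<N>`, where byte i of the result is the byte N positions earlier in the concatenation of the previous and current chunks. The name `prevRef` and the array-based signature are hypothetical and chosen only for illustration.

```zig
const std = @import("std");

/// Scalar model of prev<N>: element i of the result is the byte N positions
/// before cur[i] in the concatenation prev ++ cur.
fn prevRef(comptime N: usize, cur: [32]u8, prev: [32]u8) [32]u8 {
    var out: [32]u8 = undefined;
    var i: usize = 0;
    while (i < 32) : (i += 1) {
        // The first N bytes come from the tail of the previous chunk.
        out[i] = if (i < N) prev[32 - N + i] else cur[i - N];
    }
    return out;
}

test "prev reference pulls bytes from the previous chunk" {
    var prev: [32]u8 = undefined;
    var cur: [32]u8 = undefined;
    var v: u8 = 0;
    while (v < 32) : (v += 1) {
        prev[v] = v; // 0..31
        cur[v] = 32 + v; // 32..63
    }
    const p1 = prevRef(1, cur, prev);
    try std.testing.expectEqual(@as(u8, 31), p1[0]);
    try std.testing.expectEqual(@as(u8, 32), p1[1]);
    const p3 = prevRef(3, cur, prev);
    try std.testing.expectEqual(@as(u8, 29), p3[0]);
    try std.testing.expectEqual(@as(u8, 32), p3[3]);
}
```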

macos ci failure

looks like two errors:

  • '__builtin_ia32_permti256' needs target feature avx2
  • LLVM ERROR: Cannot select: 0x7f837444e7b0: v32i8 = X86ISD::PSHUFB
    ...
    in function llvm_intrinsics.test "pshufb"
