Coder Social home page Coder Social logo

wasmx / fizzy Goto Github PK

View Code? Open in Web Editor NEW
207.0 12.0 33.0 3.63 MB

Fizzy aims to be a fast, deterministic, and pedantic WebAssembly interpreter written in C++.

License: Apache License 2.0

CMake 4.26% C++ 87.17% Python 0.38% C 3.46% WebAssembly 0.48% Rust 4.15% Shell 0.10%
webassembly interpreter cpp wasm blockchain hacktoberfest rust

fizzy's Introduction

Fizzy

webassembly badge readme style standard badge circleci badge codecov badge license badge

Fizzy aims to be a fast, deterministic, and pedantic WebAssembly interpreter written in C++.

Goals

I) Code quality

  • Clean and modern C++17 codebase without external dependencies
  • Easily embeddable (and take part of the standardisation of the "C/C++ embedding API")

II) Simplicity

  • Only implement WebAssembly 1.0 (and none of the proposals)
  • Interpreter only
  • Support only WebAssembly binary encoding as an input (no support for the text format (.wat/.wast))

III) Conformance

  • Should have 99+% unit test coverage
  • Should pass the official WebAssembly test suite
  • Should have stricter tests than the official test suite

IV) First class support for determistic applications (blockchain)

  • Support canonical handling of floating point (i.e. NaNs stricter than in the spec)
  • Support an efficient big integer API (256-bit and perhaps 384-bit)
  • Support optional runtime metering in the interpreter
  • Support enforcing a call depth bound
  • Further restrictions of complexity (e.g. number of locals, number of function parameters, number of labels, etc.)

Building and using

Fizzy uses CMake as a build system. This will build Fizzy as a library:

$ mkdir build && cd build
$ cmake ..
$ cmake --build .

C API

A public C API is provided for embedding the Fizzy engine in applications.

A small example of using the C API, which loads a wasm binary and executes a function named main:

#include <fizzy/fizzy.h>

bool execute_main(const uint8_t* wasm_binary, size_t wasm_binary_size)
{
    // Parse and validate binary module.
    const FizzyModule* module = fizzy_parse(wasm_binary, wasm_binary_size);

    // Find main function.
    uint32_t main_fn_index;
    if (!fizzy_find_exported_function_index(module, "main", &main_fn_index))
        return false;

    // Instantiate module without imports.
    FizzyInstance* instance = fizzy_instantiate(module, NULL, 0, NULL, NULL, NULL, 0);

    // Execute main function without arguments.
    const FizzyExecutionResult result = fizzy_execute(instance, main_fn_index, NULL, 0);
    if (result.trapped)
        return false;

    fizzy_free_instance(instance);
    return true;
}

CMake integration

Fizzy also provides a CMake package for easy integration in projects that use it:

find_package(fizzy CONFIG REQUIRED)
...
target_link_libraries(app_name PRIVATE fizzy::fizzy)

Rust

Fizzy also has a Rust binding. It is published on crates.io and the official documentation with examples can be read on docs.rs.

WASI

Building with the FIZZY_WASI option will output a fizzy-wasi binary implementing the WASI API (a very limited subset of wasi_snapshot_preview1, to be precise). It uses uvwasi under the hood. It can be used to execute WASI-compatible binaries on the command line.

$ mkdir build && cd build
$ cmake -DFIZZY_WASI=ON ..
$ cmake --build .

Try it with a Hello World example:

$ bin/fizzy-wasi ../test/smoketests/wasi/helloworld.wasm
hello world

Testing tools

Building with the FIZZY_TESTING option will output a few useful utilities:

$ mkdir build && cd build
$ cmake -DFIZZY_TESTING=ON ..
$ cmake --build .

These utilities are as follows:

fizzy-bench

This can be used to run benchmarks available in the test/benchmarks directory. Read this guide for a short introduction.

fizzy-spectests

Fizzy is capable of executing the official WebAssembly tests. Follow this guide for using the tool.

fizzy-testfloat

A tool for testing implementations of floating-point instructions. Follow this guide for using the tool.

fizzy-unittests

This is the unit tests suite of Fizzy.

Special notes about floating point

Exceptions

Certain floating point operations can emit exceptions as defined by the IEEE 754 standard. (Here is a summary from the GNU C Library). It is however up to the language how these manifest, and in C/C++ depending on the settings they can result in a) setting of a flag; b) or in traps, such as the SIGFPE signal.

Fizzy does not manipulate this setting, but expects it to be left at option a) of above, which is the default. In the GNU C Library this can be controlled via the feenableexcept and fedisableexcept functions.

The user of the Fizzy library has to ensure this is not set to trap mode. The behavior with traps enabled is unpredictable and correct behaviour is not guaranteed, thus we strongly advise against using them.

Rounding

The IEEE 754 standard defines four rounding directions (or modes):

  • to nearest, tie to even (default),
  • toward -∞ (rounding down),
  • toward +∞ (rounding up),
  • toward 0 (truncation).

The WebAssembly specification expects the default, "to nearest", rounding mode for instructions whose result can be influenced by rounding.

Fizzy does not manipulate this setting, and expects the same rounding mode as WebAssembly, otherwise the result of some instructions may be different. In the GNU C Library the rounding mode can be controlled via the fesetround and fegetround functions.

If strict compliance is sought with WebAssembly, then the user of Fizzy must ensure to keep the default rounding mode.

x87 FPU

On the 32-bit Intel i386 architecture an x87-compatible FPU is used by default to perform floating-point operations. The FPU is claimed to be IEEE-754 compliant, but there is one gotcha. The operations are executed with so-called internal precision and the results are rounded to the target precision at the end [1]. By default, the precision is set to 80-bit extended precision (except for VC++ runtime [2]). Unfortunately, this causes problems for 64-bit double precision operations (f64.add, f64.sub, f64.mul, f64.div) as the results may be different from when computed with double precision directly.

The FPU precision can be dynamically modified by using compiler intrinsics [1], but this has similar issues to controlling the rounding mode and there exists no C/C++ standard way of doing so.

We decided against fighting the x87 FPU quirks, because floating-point operations were not the top priorities. Instead of creating manual workarounds, a reasonable solution is to opt-in for using SSE2 instructions to implement WebAssembly floating-point instructions, not only for 64-bit (where it is the default), but 32-bit builds as well. This means for strict WebAssembly compliance the SSE2 instruction set is required.

This is controlled by the -msse2 -mfpmath=sse compiler options, and one can always override to experiment with the x87 FPU. Worth mentioning that the -mpc64 GCC compiler option is supposed to set the FPU to 64-bit double precision mode, but for an unknown reason this is not working.

See also:

  1. Deterministic cross-platform floating point arithmetics
  2. Intermediate Floating-Point Precision
  3. Verified Compilation of Floating-Point Computations

Development

Performance testing

We use the following build configuration for performance testing:

  • Release build type (-DCMAKE_BUILD_TYPE=Release)

    This enables -O3 optimization level and disables asserts.

  • Link-Time Optimization (-DCMAKE_INTERPROCEDURAL_OPTIMIZATION=TRUE)

    This does not give big performance gains (v0.3: +1%, v0.4: -3%), but most importantly eliminates performance instabilities related to layout changes in the code section of the binaries. See #510 for example.

  • Latest version of GCC or Clang compiler

    The Fizzy built with recent versions of GCC and Clang has comparable performance. For LTO builds of Fizzy 0.4.0 Clang 10/11 is 4% faster than GCC 10.

  • libstdc++ standard library

  • No "native" architecture selection

    The compiler option -march=native is not used, leaving default architecture to be selected. Building for native CPU architecture can be easily enabled with CMake option -DNATIVE=TRUE. We leave the investigation of the impact of this for the future.

Releases

For a list of releases and changelog see the CHANGELOG file.

Maintainers

License

license badge

Licensed under the Apache License, Version 2.0.

fizzy's People

Contributors

axic avatar chfast avatar cjihrig avatar graydon avatar gumb0 avatar herobank110 avatar masha-maksimova avatar poemm avatar qix- avatar rodiazet avatar yellow-king21 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

fizzy's Issues

Extend wast2wasm4cpp to also work with Rust sources

wast2wasm4cpp is a tool proccessing .cpp files, which looks for a specific comment containing wasm source code and updates the binary code below it, e.g.

    /* wat2wasm
    (func (param i64) (result i64)
      (local i64)
      get_local 0
      set_local 1
      get_local 1
    )
    */
    const auto wasm =
        from_hex("0061736d0100000001060160017e017e030201000a0c010a01017e2000210120010b");

Here the source are the lines below /* wat2wasm and, the tool fills in the binary hex string within the from_hex() call.

Change this tool to emit Rust-compatible output if the filename is .rs:

    /* wat2wasm
    (func (param i64) (result i64)
      (local i64)
      get_local 0
      set_local 1
      get_local 1
    )
    */
    let input = hex::decode("0061736d0100000001060160017e017e030201000a0c010a01017e2000210120010b").unwrap();

Please note it must also run cargo fmt after the replacement, just like it runs clang-format for C/C++ sources.

Support big-endian architectures

Currently Fizzy is strictly little-endian only and considering our main target markets that seems to be appropriate. However for completeness (and supporting many MIPS devices) it may make sense considering this, iff it can be added without any slowdown to little-endian. That being said it is not a priority.

Alternative approach for import resolution in C API

Currently the fizzy_resolve_instantiate(module, <functions>, <tables>, <memories>, <globals>) function is used to resolve every single import in one go.

I think it would be nice to structure this differently, from a consumer side:

Externals externals(module);
externals.resolve_global(ExternalGlobal);
externals.resolve_globals(...); // with a span

Originally posted by @axic in #637 (comment)

Extend instantiate() with a memory limit

This would replace the hard MemoryPagesLimit in limits.h. If no limit is set by the user, it defaults to MemoryPagesLimit (256Mb). If it is set higher than 2^32-1 return an error (or just make the limit uint32).

Also set this limit to 4Gb in spectests in order to pass them.

Benchmarks framework for individual opcodes

For fizzy-bench-internal create a generator that generates wasm code focusing on a selected instruction.
For binary instructions it may use accumulation procedure. Fixed size array is provided as imported memory. The code may look like:

    (memory (import "" "m") 1 1)
    (func (result f32)
      (f32.load offset=0 (i32.const 0))

      (f32.load offset=4 (i32.const 0))
      (f32.add)
      (f32.load offset=8 (i32.const 0))
      (f32.add)
      (f32.load offset=12 (i32.const 0))
      (f32.add)
      (f32.load offset=16 (i32.const 0))
      (f32.add)
      ...
      (f32.load offset=N*4 (i32.const 0))
      (f32.add)
    )

Also generate a "baseline" benchmark case where the binary instruction is replaced with drop.

Design and implement a generator to unary instructions.

Create documentation

@chfast suggested we use sphinx/readthedocs, but we may as well just use markdown in the docs directory.

Here's an idea on general layout:

  • Introduction (what is fizzy, why, etc.)
  • Build & usage
  • Internals
    • Internal representation (separate instruction and data stream/stack)
    • "Optimisation techniques"
    • etc..

Add parser tests causing memory grow of control_stack

Oops, I have not see this. In this case this is fine as we use std::deque, but still I should have think about this before :) I guess I need some tests that causes stack memory grow here and then ASan should find the bug, if it ever happen.

Originally posted by @chfast in #319

Review stack handling (especially 32-bit vs 64-bit dirty upper bits)

If we add an assert(i64 & 0xffffffff00000000) == 0) to as<uint32_t>() we get a lot of failures. As an example:

[ RUN      ] end_to_end.milestone2
Assertion failed: ((i64 & 0xffffffff00000000) == 0), function as, file /Users/alex/Projects/fizzy/lib/fizzy/value.hpp, line 59.
Abort trap: 6

The first instruction unit test failing is execute_numeric.i32_eqz. Haven't investigated further. The test branch is in value_assert.

This came up #464 (comment).

Safe/unsafe API

Unsafe API

The "unsafe" API does not pass any type information (including arity) around. One of the sides needs to trust the other side provided correct data, otherwise 🗑️ or 💥.

It uses union Value as a basic element.

Execute

The execute() should be like:

void execute(Instance*, FuncIdx, Value* args);
  • The caller must provide the correct number of arguments (crash if too low).
  • The caller must provide the correct value types.
  • The result is returned in the args[0] (think: all args are popped from the stack and result is pushed).
  • So caller must also reserve space for the result (if function takes no arguments).

Instantiate

To instantiate, the user must provide all imported stuff.

  1. Globals
    1.1 A global may be passed as Value* (not sure about immutable globals).
    1.2 A list of globals: Value*[], i.e. Value**.
  2. Memory: (char*, size_t)
  3. Functions
    3.1. The simplest imported function is pointer void (*)(Value* args) with calling convention as in execute().
    3.2. But we may want/need to pass additional objects like Instance*, HostContext*, etc.

UB in OperandStack

There is potential undefined behavior in OperandStack when no locals. Then the stack top pointer is set to m_bottom - 1. This is undefined behavior because you are allowed to move pointer within the array or to the element after the array (not before).

Possible solutions

  1. Make the top pointer to always point to the first empty stack slot (init to m_bottom). This may negatively affect unary op, because m_top - 1 must be used instead of m_top being used currently.
  2. Grow the stack in reverse order. m_top = m_bottom + max_stack_size (i.e. the "end"). Push is *m_top-- = value`. This will work, but pointer arithmetic will be confusing.
  3. Always add unused stack item (aka stack red zone). So m_bottom[0] is always allocated but should never be used. Init m_top = m_bottom. This works good, but requires more memory.
  4. Conditionally add unused stack item when number of locals is zero. The rest works like 3. Requires more memory only in rare case of not having any locals (including params). Requires additional branch in initialization, but can be marked with [[unlikely]].

3 and 4 have additional property that you can assign execution result value unconditionally because you can always safely access stack.top(), although in may contain garbage value.

Add PGO builds

The PGO builds give good boost to the performance, but we need to select out training set.

Comparing o/master to ../pgouse/o/master
Benchmark                                                     Time             CPU      Time Old      Time New       CPU Old       CPU New
------------------------------------------------------------------------------------------------------------------------------------------
fizzy/execute/blake2b/512_bytes_rounds_1                   -0.1681         -0.1740            99            82            99            82
fizzy/execute/blake2b/512_bytes_rounds_16                  -0.2512         -0.2512          1503          1125          1503          1125
fizzy/execute/ecpairing/onepoint                           -0.1878         -0.1878        473759        384788        473721        384762
fizzy/execute/keccak256/512_bytes_rounds_1                 -0.3023         -0.3023           116            81           116            81
fizzy/execute/keccak256/512_bytes_rounds_16                -0.3153         -0.3152          1713          1173          1713          1173
fizzy/execute/memset/256_bytes                             -0.2147         -0.2147             8             6             8             6
fizzy/execute/memset/60000_bytes                           -0.2239         -0.2239          1659          1287          1659          1287
fizzy/execute/mul256_opt0/input1                           -0.3080         -0.3079            33            23            33            23
fizzy/execute/ramanujan_pi/33_runs                         -0.3250         -0.3250           151           102           151           102
fizzy/execute/sha1/512_bytes_rounds_1                      -0.2244         -0.2244           104            81           104            81
fizzy/execute/sha1/512_bytes_rounds_16                     -0.2124         -0.2124          1443          1137          1443          1137
fizzy/execute/sha256/512_bytes_rounds_1                    -0.2048         -0.2048            97            77            97            77
fizzy/execute/sha256/512_bytes_rounds_16                   -0.1844         -0.1844          1323          1079          1323          1079
fizzy/execute/taylor_pi/pi_1000000_runs                    -0.0811         -0.0812         42727         39260         42724         39257
fizzy/execute/micro/eli_interpreter/exec105                -0.2854         -0.2854             6             4             6             4
fizzy/execute/micro/factorial/20                           -0.1208         -0.1208             1             1             1             1
fizzy/execute/micro/fibonacci/24                           -0.1348         -0.1348          5828          5042          5827          5042
fizzy/execute/micro/host_adler32/1                         -0.1522         -0.1522             0             0             0             0
fizzy/execute/micro/host_adler32/1000                      -0.2068         -0.2066            33            26            33            26
fizzy/execute/micro/icall_hash/1000_steps                  -0.1021         -0.1021            72            65            72            65
fizzy/execute/micro/spinner/1                              -0.1430         -0.1430             0             0             0             0
fizzy/execute/micro/spinner/1000                           -0.2250         -0.2250            10             8            10             8

Parser and validation design

Levels of parsing / decoding / validation

  1. "Unsafe" parser.
    It assumes the wasm module must be valid and happily reads it without checkout out-of-buffer access. Providing invalid module can crash the parser.
  2. "???" parser.
    It assumes the wasm module must be valid but have additional check preventing invalid memory access. Providing invalid module will produce invalid but deterministic execution results. The parser never crashes. Hopefully, this one is not much slower than the "unsafe" one.
  3. Validator.
    It parses the wasm module and also fully validates it.

What we currently have is mixture of 0 and 2 with partial validation.

Design

  1. wasm spec allows lazy validation of functions (executing invalid function traps). So the good option is to implement the parser for wasm this way - functions are validated before being called for the first time.
  2. wasm spec has "Verification Algorithm": https://webassembly.github.io/spec/core/appendix/algorithm.html
    (Thanks to Paul for the reference).
  3. The goal is to receive internal module representation needed for execution. Not needed information should be discarded during parsing.
  4. Validation can be a template param for Parser type (separating code for reader/parser and validator seems impractical).
#include <cstdint>

enum class ValType : uint8_t
{
    i32 = 0x7f,
    i64 = 0x7e,
};

template <bool Validate>
struct Parser
{
    ValType parseValType(uint8_t b)
    {
        if constexpr (Validate)
        {
            if (b != 0x7f && b != 0x7e)
                throw "invalid type";
        }

        return (b == 0x7e) ? ValType::i64 : ValType::i32;
    }
};

Support floating point inputs and outputs in the benchmarks

fizzy-bench aka test/bench/bench.cpp currently requires all the inputs and outputs to be integers. Extend this to support floating point.

This also needs changes in test/utils/wasm_engine.hpp and its implementations. This part was attempted in #507, which can be used as a reference.

Lastly change the ramanujan_pi and taylor_pi benchmarks to output floating point.

Improve code coverage

  • Add coverage for Rust. #706
  • Add GCC coverage for spectests. #649
  • Improve LLVM coverage for spectests - do it only for fizzy-spectests binary. #646
  • Upload spectests coverage to codecov with separate tag. #649

Calls optimization plan

Detailed calls optimization plan

Current status

In Fizzy 0.5, the internal calls work like this:

  1. The call instruction implementation needs to know the number or arguments in the callee. This is taken from module's function and type sections.
  2. The required number of argument is "packed" into span<const Value> (remember this is a reference type).
  3. The span of arguments is passed to the callee.
  4. Callee copies the argument to the locals storage. Also allocates space for local variables and stack in single go.
  5. In the end of callee execution, the execute() function returns the top stack item, if any.
  6. The caller must drop the call arguments from the stack (i.e. move the stack pointer).
  7. And copy the callee return value (if any) to the top of the stack. This is currently done by inspecting the function type, but can be done by inspecting ExecutionResult. #554

Plan

  1. Both caller and callee needs information about number of arguments. Caller takes this from the module, callee from the span. The callee can take it from the module as well, so execute() can skip this information by only passing const Value* without size. Although, in theory this may not be faster. #552
  2. Instead of allocating "stack space" in each call, we can pre-allocate shared execution-thread stack space. It can be fixed size (this can work as a valid wasm exhaustion limit) or infinite by allocate more segments when needed. #529
  3. When 2 is implemented, arguments don't need to be copied because there are already in the right place (unless new segment must be allocated).
  4. At the call end, locals must be dropped and the top stack item (if any) must be beginning of the "stack space" (where the first argument was). After returning, the caller will have the result on the top of the stack. Caller does not even need to know if there is any result.
  5. Therefore, we don't need to return anything from the execute() except for the "trapped" flag.

Bonus (out-of-order items)

  1. The call instruction always calls the same function, so the number or arguments can be encoded in immediates. No need to reach module each time.
  2. Similarly, all execution required information (max stack height, number of arguments) may be added to the Code type. Then accessing function and type sections is not needed during execution.

TODO

  1. It seems easy to wrap this "unsafe" execution API with a "safe" API which will check if passed argument are of the correct number and type. But also reverse-wrapping is needed for host function - unsafe host function entry point will only receive Value*. Some mechanism is needed to check and extract the type information about received arguments.

Importing memory and table in public APIs

There's currently a discrepancy between C++ and C APIs, where C++ instantiate takes vectors of imported tables and memory and C fizzy_instantiate takes single memory and single table.

Wasm 1.0 allows only single memory and table, so it's fine to pass only single (optional) one. Probably C++ API should be changed for consistency with C.

Support wasm-c-api next to fizzy-capi

wasm-c-api seems to be substantially different to fizzy-capi to have both supported. I am not sure whether it is easier to make it a wrapper over fizzy-capi or directly over the underlying C++ internals.

Support Apple Silicon natively

While due to Rosetta fizzy will just work on macOS, it would make sense considering supporting dual-binary builds with Apple Silicon.

From https://developer.apple.com/documentation/apple_silicon/addressing_architectural_differences_in_your_macos_code:
image

But probably stdc++ hides this difference for us.

CMake features: https://cmake.org/cmake/help/latest/envvar/CMAKE_APPLE_SILICON_PROCESSOR.html and https://cmake.org/cmake/help/latest/variable/CMAKE_OSX_ARCHITECTURES.html

And a helpful tip (from https://cutecoder.org/software/detecting-apple-silicon-shell-script/) to force architecture selection: arch -arm64 fizzy-bench

Rust binding

  • Validate #538
  • Parse #566
  • Instantiate #566
    • Providing imported functions #782
    • Providing imported globals
    • Providing imported memory
    • Providing imported tables
    • Providing memory limits
  • Unsafe execute #566
  • Execute #652 #705
  • Memory access #609
  • Module inspection
    • Initial support (with has_start_function) #728
    • Globals
    • Tables
  • Clone module #719

Support compilation with MSVC

This works depends on #543.

The goal is to support compilation on Windows using MSVC, with C++20 mode enabled:

  • Adjust CMake
  • Adjust CircleCI to compile on Windows
  • Create slim wrappers where needed

Based on this feature chart at least VS 2019 16.8 is needed.

Split up wasi implementation into unit testable pieces

Suggested here. It would be possible to place the wasi implementation into a new namespace (e.g. fizzy::wasi), in a separate file, and have it unit tested. Then the frontend (fizzy-wasi) would use those features and retain the integration testing (the cli tests) it has.

Clang12 performance comparison

Comparing clang-12 against clang-11, without LTO. An improvement is visible. Hopefully, not because of the code layout.

fizzy/parse/blake2b_mean                                           +0.0103         +0.0103            25            25            25            25
fizzy/instantiate/blake2b_mean                                     -0.0129         -0.0129            31            31            31            31
fizzy/execute/blake2b/512_bytes_rounds_1_mean                      -0.0361         -0.0361            81            78            81            78
fizzy/execute/blake2b/512_bytes_rounds_16_mean                     -0.0603         -0.0603          1212          1139          1212          1139
fizzy/parse/ecpairing_mean                                         +0.0061         +0.0061          1450          1459          1450          1459
fizzy/instantiate/ecpairing_mean                                   +0.0073         +0.0073          1507          1518          1507          1518
fizzy/execute/ecpairing/onepoint_mean                              -0.0359         -0.0359        408026        393370        408002        393373
fizzy/parse/keccak256_mean                                         -0.0002         -0.0002            46            46            46            46
fizzy/instantiate/keccak256_mean                                   +0.0028         +0.0028            50            50            50            50
fizzy/execute/keccak256/512_bytes_rounds_1_mean                    -0.0854         -0.0854           102            93           102            93
fizzy/execute/keccak256/512_bytes_rounds_16_mean                   -0.0907         -0.0907          1487          1352          1487          1352
fizzy/parse/memset_mean                                            +0.0022         +0.0022             6             6             6             6
fizzy/instantiate/memset_mean                                      +0.0056         +0.0056            10            10            10            10
fizzy/execute/memset/256_bytes_mean                                -0.0452         -0.0452             7             6             7             6
fizzy/execute/memset/60000_bytes_mean                              -0.0460         -0.0460          1497          1428          1497          1428
fizzy/parse/mul256_opt0_mean                                       +0.0004         +0.0004             8             8             8             8
fizzy/instantiate/mul256_opt0_mean                                 +0.0019         +0.0019            12            12            12            12
fizzy/execute/mul256_opt0/input1_mean                              -0.0561         -0.0561            28            27            28            27
fizzy/parse/ramanujan_pi_mean                                      +0.0110         +0.0110            26            26            26            26
fizzy/instantiate/ramanujan_pi_mean                                +0.0137         +0.0137            30            30            30            30
fizzy/execute/ramanujan_pi/33_runs_mean                            -0.0159         -0.0159           110           108           110           108
fizzy/parse/sha1_mean                                              +0.0003         +0.0003            42            42            42            42
fizzy/instantiate/sha1_mean                                        +0.0033         +0.0033            45            46            45            46
fizzy/execute/sha1/512_bytes_rounds_1_mean                         -0.0025         -0.0025            88            88            88            88
fizzy/execute/sha1/512_bytes_rounds_16_mean                        -0.0080         -0.0080          1224          1215          1224          1215
fizzy/parse/sha256_mean                                            +0.0029         +0.0029            70            70            70            70
fizzy/instantiate/sha256_mean                                      +0.0064         +0.0064            74            75            74            75
fizzy/execute/sha256/512_bytes_rounds_1_mean                       -0.0026         -0.0026            81            81            81            81
fizzy/execute/sha256/512_bytes_rounds_16_mean                      -0.0026         -0.0026          1115          1112          1115          1112
fizzy/parse/taylor_pi_mean                                         +0.0015         +0.0015             3             3             3             3
fizzy/instantiate/taylor_pi_mean                                   +0.0045         +0.0045             6             6             6             6
fizzy/execute/taylor_pi/pi_1000000_runs_mean                       -0.0051         -0.0051         40718         40511         40718         40511
fizzy/parse/micro/eli_interpreter_mean                             -0.0065         -0.0065             4             4             4             4
fizzy/instantiate/micro/eli_interpreter_mean                       -0.0047         -0.0046             8             8             8             8
fizzy/execute/micro/eli_interpreter/exec105_mean                   -0.0580         -0.0579             4             4             4             4
fizzy/parse/micro/factorial_mean                                   +0.0091         +0.0092             1             1             1             1
fizzy/instantiate/micro/factorial_mean                             +0.0018         +0.0018             1             1             1             1
fizzy/execute/micro/factorial/20_mean                              -0.0611         -0.0611             1             1             1             1
fizzy/parse/micro/fibonacci_mean                                   +0.0189         +0.0189             1             1             1             1
fizzy/instantiate/micro/fibonacci_mean                             +0.0157         +0.0157             1             1             1             1
fizzy/execute/micro/fibonacci/24_mean                              -0.0024         -0.0024          5171          5159          5171          5159
fizzy/parse/micro/host_adler32_mean                                +0.0012         +0.0012             2             2             2             2
fizzy/instantiate/micro/host_adler32_mean                          -0.0148         -0.0148             4             4             4             4
fizzy/execute/micro/host_adler32/1_mean                            -0.0374         -0.0374             0             0             0             0
fizzy/execute/micro/host_adler32/1000_mean                         -0.0260         -0.0260            31            31            31            31
fizzy/parse/micro/icall_hash_mean                                  +0.0017         +0.0017             3             3             3             3
fizzy/instantiate/micro/icall_hash_mean                            -0.0014         -0.0014             7             7             7             7
fizzy/execute/micro/icall_hash/1000_steps_mean                     -0.0511         -0.0511            69            66            69            66
fizzy/parse/micro/spinner_mean                                     -0.0233         -0.0233             1             1             1             1
fizzy/instantiate/micro/spinner_mean                               -0.0056         -0.0057             1             1             1             1
fizzy/execute/micro/spinner/1_mean                                 -0.0854         -0.0855             0             0             0             0
fizzy/execute/micro/spinner/1000_mean                              -0.0276         -0.0276             9             9             9             9
fizzy/parse/stress/guido-fuzzer-find-1_mean                        +0.0022         +0.0022           146           146           146           146
fizzy/instantiate/stress/guido-fuzzer-find-1_mean                  -0.0017         -0.0017           176           175           176           175

Nice function execution API

I think using methods instead of free function may be nicer and will be more JavaScript-like.

In particular, for function execution we may have something like:

auto instance = instantiate();

auto fn = instance->fn("func1");
fn(0, 1.0);

instance->fn("func2")(1, 2);

instance->fn(0)(-0.0f);   // Get function by index.

Optimize fN.abs, fN.neg, fN.copysign instructions

The goal is to have the simplest and leanest implementation of the following WebAssembly instructions:

  • f32.abs, f64.abs
  • f32.neg, f64.neg
  • f32.copysign, f64.copysign

Current implementation uses C/C++ standard functions to implement the core of the instructions: std::fabs, std::copysign and -x for negation. Due to compilers optimization pipelines, this does not provide optimal implementation: the generated code sometimes unnecessary uses floating-point or SSE2 instructions.

Change the implementation to use matching unsigned integer types and bit manipulation.

E.g. for f32.abs it is enough to zero the most significant bit of the unsigned integer representation of the value.

float fabs(float a) noexcept
{
    return bit_cast<float>(bit_cast<uint32_t>(a) & 0x7fffffff);
}

Here are some additional tips and hints: https://godbolt.org/z/KaET9c.

Prototype metering

Implement metering based on the ewasm docs. Should have a separate gas table and go with everything set at 1 in the beginning, but should experiment with the numbers from here later.

We want to implement multiple metering methods and compare them via benchmarking.

The different methods:

  1. Naive runtime metering — each instruction in the execution loop checks for cost vs limit
  2. Block-based runtime metering — the basic blocks are summed up and there is a pseudo opcode which is injected or each block has a cost immediate
  3. Injected metering — a call to a useTicks function is injected

The injection is also implemented by "sentinel" but better to implement it here.

Running spec test suite

Spec test suite: https://github.com/WebAssembly/spec/tree/master/test/core

Converting *.wast files with a wast2json tool produces wasm files + json files like these:
https://gist.github.com/gumb0/a826ed24a6f16e5a6e88c7a9a88a8b3b
https://gist.github.com/gumb0/f7f3e443ced081fd1902c67881d0ac31

Basically for each command in json file we need to do one of the following:

  • module - instantiate module from file (it is then used in the following commands)
    • when instantiate failed, skip following tests
    • provide imports to instantiate when required (link exports of registered modules)
    • spectest module provided by test harness for importing in tests
  • register - register module to be used in the following commands for linking exports-imports of several modules (used in linking.wast, imports.wast, elem.wast)
  • actions
    • invoke an exported function (by name)
    • get exported global value (by name)
    • choosing module for action by name (many cases in linking.json)
  • assertions - check result of an action or module instantiation
    • assert_return
      • support case when no returned value expected (see break-drop.wast)
    • assert_trap
    • assert_exhaustion
    • assert_malformed (support needed only for "module_type": "binary")
    • assert_invalid
    • assert_unlinkable
    • assert_uninstantiable (in start.json and linking.json; in original wast it's assert_trap with module inside)
  • action without assert (see memory_redundancy.wast)

The spec for all possible asserts/actions/commands is at https://github.com/WebAssembly/spec/tree/master/interpreter#scripts
(all of the assertions mentioned there are used in test suite)

Benchmarking suite

  1. It will be based on evmone-bench: https://github.com/ethereum/evmone/tree/master/test/bench
  2. It will use Google Benchmark library.
  3. It will load all wasm binary files from a directory (with subdirectories).
  4. Each wasm file can have matching *.inputs file with inputs (see evmone: https://github.com/ethereum/evmone/blob/master/test/benchmarks/sha1_divs.inputs).
    1. Test cases are separated by empty lines.
    2. Each test cases has following lines:
      1. Test case name.
      2. Exported function name to be executed.
      3. Function arguments.
      4. Initial memory, hex encoded.
      5. Expected result.
      6. Expected memory after execution.

Limit verbosity of fizzy-spectests

The output from fizzy-spectests is quite large as it reports status of every executed test case. I expect users are mostly interested in failures. Some ideas how to improve it.

  1. Skip PASSED reports by default. Easy to implement.
  2. Repeat status of each file in the end. This will allow at least quickly find the file with failures and repeat the run on the subset of files.
  3. Use "dot notation" per file. Each test result is represented by a single character. E.g.
    binary.wast: ......F.....s.....
    Line 98: assert_return FAILED Incorrect returned value. Expected: 9219994337134247936 (0x7ff4000000000000) Actual: 0 (0x0)
    Line 99: assert_return SKIPPED Unsupported feature: Floating point instruction.
    

fizzy-spectests doesn't use import resolve API

resolve_imported_functions and other resolve_* functions from #637 in their current form are not convenient to use to link exports/imports of several modules together. Resolve API and suits more to import host functions (because of convenience of flattened ImportedFunction struct and other Imported* - fewer curly braces on the caller side).

Importing from one module to another requires External* structs - which are returned from find_exported_* functions - that's what spectest runner uses.

Perhaps resolve API could be improved to accomodate both needs.

Public C API

  • Validate #530
    • returning error code/message #772
  • Parse #576
    • returning error code/message #772
  • Instantiate #576
    • Providing imports
    • Providing memory pages limit #780
    • returning error code/message #772
    • With metering enabled #799
  • Execute #576
  • Module inspection
    • types #675
    • imports #683
    • function types #557
    • functions (get type index by function index)
    • table
      • has_table #684
      • limits
    • memory
      • has_memory #684
      • limits
    • global types #675
    • exports (get index by name)
    • exports - get export by index
    • start function
      • has_start_function #685
      • get index
    • element, data, code?
    • custom sections
  • Clone module #674
  • Instance inspection
    • memory
    • table
    • globals
  • Exports access (find by name)
  • Resolving imports
    • functions #608
    • table
    • memory
    • globals #660

Benchmarks results after all floating-point instructions implemented

I compared Fizzy 0.3.0 with #486 (which is the last FP instruction implemented).

The parsing time is unaffected (±2%).
The instantiation time is unaffected (±1%).

The comparison of execution times looks as flows:

fizzy/execute/blake2b/512_bytes_rounds_1                     +0.0377
fizzy/execute/blake2b/512_bytes_rounds_16                    +0.0389
fizzy/execute/ecpairing/onepoint                             (statistically not conclusive)
fizzy/execute/keccak256/512_bytes_rounds_1                   -0.0105
fizzy/execute/keccak256/512_bytes_rounds_16                  -0.0142
fizzy/execute/memset/256_bytes                               -0.0090
fizzy/execute/memset/60000_bytes                             -0.0068
fizzy/execute/mul256_opt0/input0                             +0.0398
fizzy/execute/mul256_opt0/input1                             +0.0440
fizzy/execute/sha1/512_bytes_rounds_1                        +0.0024
fizzy/execute/sha1/512_bytes_rounds_16                       +0.0049
fizzy/execute/sha256/512_bytes_rounds_1                      +0.0184
fizzy/execute/sha256/512_bytes_rounds_16                     +0.0249
fizzy/execute/micro/eli_interpreter/halt                     -0.0133
fizzy/execute/micro/eli_interpreter/exec105                  +0.0292
fizzy/execute/micro/factorial/10                             -0.0188
fizzy/execute/micro/factorial/20                             +0.0080
fizzy/execute/micro/fibonacci/24                             -0.0229
fizzy/execute/micro/host_adler32/1                           -0.0140
fizzy/execute/micro/host_adler32/100                         -0.0265
fizzy/execute/micro/host_adler32/1000                        -0.0283
fizzy/execute/micro/spinner/1                                -0.0359
fizzy/execute/micro/spinner/1000                             -0.0288

The mean is 1.4%, median is 1.8% (with micro benchmarks ignored).

I also compared the number of instruction in the execute() function (without the "cold" part):

0.3.0:   1517
all FPs: 2323  (+53%)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.