bytecodealliance / wasmtime
A fast and secure runtime for WebAssembly
Home Page: https://wasmtime.dev/
License: Apache License 2.0
Stack slots can be defined with an explicit alignment, but the alignment can also be left out, in which case Cretonne "will pick an appropriate alignment for the stack slot based on its size and access patterns".
We should record this in StackSlotData. The alignment inference algorithm needs to consider that stack_load and stack_store instructions accessing a stack slot will have a preferred alignment that depends on the target ISA. A TargetISA method that returns the preferred alignment for accessing a given type seems appropriate.
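As a sketch of what such inference could look like (the function name and the heuristic here are hypothetical, not Cretonne's actual API): take the slot's natural, size-derived alignment and cap it by the preferred access alignment the target ISA reports.

```rust
/// Hypothetical sketch: pick an alignment for a stack slot from its size and
/// the preferred access alignment reported by the target ISA.
fn stack_slot_alignment(size: u32, preferred_access_align: u32) -> u32 {
    // Natural alignment: the next power of two covering the size, capped at
    // 16 bytes so huge slots don't demand huge alignment.
    let natural = if size == 0 { 1 } else { size.next_power_of_two().min(16) };
    // Never exceed what the ISA prefers for the widest access.
    natural.min(preferred_access_align)
}
```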
I wanted to test native code generation, but unless I missed something, only textual wasm files are supplied, while the tools require binary wasm modules.
Is there an easy way to generate binary modules from the text versions in the repo, or do I need another tool?
I'd like to use Cranelift as a JIT backend for a standalone WebAssembly embedding. However, from browsing the documentation I'm not sure how to do this, or even whether it's fully possible yet. An example showing how to use cranelift_wasm and cranelift_simplejit together would be really helpful.
Tied constraints are when an input to an instruction has to be in the same register as an output, as is the case for most arithmetic instructions on x86.
Currently, Cranelift's register allocator handles this constraint in the coloring pass. However, the coloring pass runs very late, when a lot of decisions have already been made and a lot of other constraints have been saved up to be solved at once.
One idea for doing this would be to extend the concept of CSSA form produced by the coalescing pass. CSSA is essentially about putting "phi-related" values into sets which can be allocated the same virtual register, because whenever the input to a phi and the output of a phi can occupy the same register, we avoid a copy. Tied constraints are very similar: we want the input and the output of an instruction to be in the same register, so coalescing them would also avoid requiring a copy.
In cases where the input and output register conflict, we could insert an explicit copy.
And since coalescing happens before spilling or coloring, this should mean that coloring wouldn't have to worry about these constraints.
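The coalescing idea above can be sketched with a toy union-find over value numbers (illustrative only; Cranelift's virtual-register machinery is more involved): tying an instruction's input to its output merges their sets, and all values in one set can then be handed to the allocator as a single virtual register.

```rust
/// Sketch (not Cranelift's actual data structures): a union-find over value
/// numbers, used to coalesce tied input/output operands into one set that the
/// allocator can assign a single virtual register.
struct Coalescer {
    parent: Vec<usize>,
}

impl Coalescer {
    fn new(num_values: usize) -> Self {
        Coalescer { parent: (0..num_values).collect() }
    }

    /// Find the set representative, with path compression.
    fn find(&mut self, v: usize) -> usize {
        let p = self.parent[v];
        if p != v {
            let root = self.find(p);
            self.parent[v] = root;
            root
        } else {
            v
        }
    }

    /// Tie an instruction's input to its output, as for `x = add x, y` on x86.
    fn tie(&mut self, input: usize, output: usize) {
        let (a, b) = (self.find(input), self.find(output));
        self.parent[a] = b;
    }
}
```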
As noticed in bytecodealliance/cranelift#473, we don't currently have any documentation for how to use the sigid parameter attribute. We should add examples showing how to do WebAssembly-style signature checking.
The ir::layout module keeps track of the ordering of instructions and extended basic blocks in a function. It is currently implemented with doubly linked lists of EBBs and instructions. All program points have a sequence number so the ProgramOrder trait can be implemented efficiently.
This representation uses 20 bytes per EBB and 16 bytes per instruction. We should experiment with a more compact layout representation:
This compact representation uses 8 bytes per EBB and 8 bytes per instruction plus a minimal overhead for the non-leaf nodes in the B+-trees.
The Cursor struct should probably contain a path to its position in both B+-trees, which means that the standard library B-trees won't work.
Looking at a recent-ish Firefox nightly, I see the following:
# Look at x86 cranelift symbols, sorted by (descending) size, in the .data.rel.ro section.
froydnj@hawkeye:~$ readelf -sW firefox/libxul.so|grep cranelift |sort -k 3 -g -r |awk '$7 == 27 { print }' |grep OBJECT |grep x86
123872: 0000000007342048 9480 OBJECT LOCAL DEFAULT 27 _ZN17cranelift_codegen3isa3x8610enc_tables18RECIPE_CONSTRAINTS17h10c82bc0de555a21E
123849: 0000000007344550 3792 OBJECT LOCAL DEFAULT 27 _ZN17cranelift_codegen3isa3x8610enc_tables12RECIPE_NAMES17h54e6a911fe6097bbE
123866: 0000000007345c20 1896 OBJECT LOCAL DEFAULT 27 _ZN17cranelift_codegen3isa3x8610enc_tables17RECIPE_PREDICATES17hc3712077e222c860E
123908: 0000000007347da0 408 OBJECT LOCAL DEFAULT 27 _ZN17cranelift_codegen3isa3x868settings11DESCRIPTORS17h03dfd2ae0adf323dE
123854: 0000000007346388 120 OBJECT LOCAL DEFAULT 27 _ZN17cranelift_codegen3isa3x8610enc_tables15INST_PREDICATES17h3a3bab86e4a3b3e7E
123911: 0000000007347f38 96 OBJECT LOCAL DEFAULT 27 _ZN17cranelift_codegen3isa3x868settings8TEMPLATE17hdfd5aa2b6a7f01ecE
123922: 0000000007341ea8 48 OBJECT LOCAL DEFAULT 27 _ZN17cranelift_codegen3isa3x869registers9GPR8_DATA17h4c7c028be6f5b4d8E
123921: 0000000007341f68 48 OBJECT LOCAL DEFAULT 27 _ZN17cranelift_codegen3isa3x869registers9FPR8_DATA17h4d3540abe0a3bbe2E
123920: 0000000007341e18 48 OBJECT LOCAL DEFAULT 27 _ZN17cranelift_codegen3isa3x869registers9FLAG_DATA17h06234297cfb01581E
123919: 0000000007341f38 48 OBJECT LOCAL DEFAULT 27 _ZN17cranelift_codegen3isa3x869registers9ABCD_DATA17h231c9e2f56f02113E
123918: 0000000007341db8 48 OBJECT LOCAL DEFAULT 27 _ZN17cranelift_codegen3isa3x869registers8GPR_DATA17h327360c70468dbffE
123917: 0000000007341de8 48 OBJECT LOCAL DEFAULT 27 _ZN17cranelift_codegen3isa3x869registers8FPR_DATA17h3d44028fd0ecbabdE
123915: 0000000007341f08 48 OBJECT LOCAL DEFAULT 27 _ZN17cranelift_codegen3isa3x869registers25GPR8_ZERO_DEREF_SAFE_DATA17h743615e457b73283E
123914: 0000000007341e78 48 OBJECT LOCAL DEFAULT 27 _ZN17cranelift_codegen3isa3x869registers24GPR_ZERO_DEREF_SAFE_DATA17h82d7795ce9382dfbE
123913: 0000000007341ed8 48 OBJECT LOCAL DEFAULT 27 _ZN17cranelift_codegen3isa3x869registers20GPR8_DEREF_SAFE_DATA17h460e84f7bb9ca949E
123912: 0000000007341e48 48 OBJECT LOCAL DEFAULT 27 _ZN17cranelift_codegen3isa3x869registers19GPR_DEREF_SAFE_DATA17had1b3296ef50e9ebE
123916: 0000000007341f98 32 OBJECT LOCAL DEFAULT 27 _ZN17cranelift_codegen3isa3x869registers4INFO17h2a2b91216d7265ffE
123855: 0000000007345c00 32 OBJECT LOCAL DEFAULT 27 _ZN17cranelift_codegen3isa3x8610enc_tables16LEGALIZE_ACTIONS17h4ddebdf4d5a40777E
There's about 15 KB of symbols in cranelift_codegen::isa::x86 that live in .data.rel.ro, which renders them non-shareable. For Firefox's purposes, we'd really like to minimize the amount of data that lives in .data.rel.ro, moving it to .rodata if at all possible.
The usual reason for data winding up in .data.rel.ro is data structures that contain slices, and indeed, for RECIPE_CONSTRAINTS, the RecipeConstraints structure has a couple of slices in it. I haven't looked at RECIPE_NAMES, RECIPE_PREDICATES, and so forth, but I assume the problems are similar there.
It would be great if things were shuffled around somehow so the data associated with these symbols could live in .rodata. unicode-rs/unicode-normalization#14 is one example of how we addressed this in another Rust library; I might have time to implement the same changes here, not sure though. We don't necessarily have to address all of these, but addressing the top five or so would make a big difference.
/cc @EricRahm
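For illustration (a general rustc/ELF observation, worth verifying with readelf on any given build): a static containing a slice embeds a pointer and therefore needs a load-time relocation, which is what pushes it into .data.rel.ro, while a pointer-free array can live in shareable .rodata.

```rust
// A `&[u8]` static stores a (pointer, length) pair; the pointer needs a
// relocation in position-independent code, so the static typically lands in
// .data.rel.ro.
static WITH_RELOC: &[u8] = &[1, 2, 3];

// A bare array is pure bytes with no pointers, so it can go in .rodata and be
// shared between processes.
static NO_RELOC: [u8; 3] = [1, 2, 3];

fn sum(bytes: &[u8]) -> u32 {
    bytes.iter().map(|&b| b as u32).sum()
}
```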
See also the discussion of biased coloring in #1029.
The register coloring pass currently just assigns the first available register to new register values. This can be improved with register hints to reduce the amount of register shuffling needed, for example for operands that must end up in a fixed register such as %rcx.
There are also cases where the register hint is not a specific register, but rather a subset of the top-level register class: for example, some encodings can't use %r12 and others can't use %r13.
The LiveRange::affinity field is already used to track register class hints. When a value is used by an instruction with a reduced register class constraint, the affinity is intersected with the constraint. These hints are currently ignored, and we just assign registers from the top-level register class.
Individual register hints are not tracked anywhere. They could be computed by the reload pass.
Hints, whether for register sets or singletons, require the constraint solver to be a bit more clever. It should use the hints as much as possible, but ignore them before failing to find a solution.
Sometimes a hint can't be used because another value is already using the register we want. We can prevent this by trying to avoid assigning values to registers where they will get in the way later.
This can be done using a data structure similar to LLVM's register matrix. Whenever a value is given a hint (during the reload pass), its live range is inserted into the register matrix for the corresponding unit. The coloring pass can then check live ranges against the matrix to see if there is a conflict with other hinted values.
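A minimal sketch of such a register matrix, assuming half-open program-point intervals (hypothetical structure, not LLVM's or Cranelift's actual code): each register unit keeps the live ranges already hinted onto it, and coloring asks whether a candidate range overlaps any of them.

```rust
/// Sketch of an LLVM-style register matrix: per register unit, the list of
/// live ranges (half-open program-point intervals) already committed to it.
struct RegisterMatrix {
    units: Vec<Vec<(u32, u32)>>,
}

impl RegisterMatrix {
    fn new(num_units: usize) -> Self {
        RegisterMatrix { units: vec![Vec::new(); num_units] }
    }

    /// Record that a hinted value's live range occupies `unit`.
    fn insert(&mut self, unit: usize, range: (u32, u32)) {
        self.units[unit].push(range);
    }

    /// Does `range` overlap any live range already hinted onto `unit`?
    fn conflicts(&self, unit: usize, range: (u32, u32)) -> bool {
        self.units[unit].iter().any(|&(b, e)| range.0 < e && b < range.1)
    }
}
```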
To call an exported function we need to be able to pass arguments to it. I have two straw-man approaches to implement that.
The first one: pass arguments dynamically. That is, every exported function gets a generated thunk. This thunk reads all arguments from a dynamic list of argument values (à la wasmi's RuntimeValue) and then calls into the original function with whatever ABI it has. Upon calling the exported function, we check that the argument slice matches the signature and then call the thunk with the fixed signature (vmctx, args_ptr). This thunk could also convert and write the return value to a specific pre-allocated location in the callee. Dumb but easy.
The second one: generate exported functions with a specific ABI, say system_v, and then introduce an argument wrapper trait.
unsafe trait ArgsWrapper {
    fn signature() -> WasmSignature;
    unsafe fn call(fn_ptr: usize, vmctx: usize, args: Self);
}
This trait's main purpose is to provide a way to get the signature dynamically and unpack the values for the actual call. We can generate impls for this trait with a macro. Here is an example impl for a function with arity 2.
unsafe impl<A: HasWasmType, B: HasWasmType> ArgsWrapper for (A, B) {
    fn signature() -> WasmSignature {
        WasmSignature::new(&[A::wasm_type(), B::wasm_type()])
    }
    unsafe fn call(fn_ptr: usize, vmctx: usize, args: (A, B)) {
        let (a1, a2) = args;
        // A usize can't be cast directly to a function pointer; transmute it.
        let f: extern "C" fn(usize, A, B) = std::mem::transmute(fn_ptr);
        f(vmctx, a1, a2)
    }
}
Then the actual call can be implemented as follows:
fn call<A: ArgsWrapper>(&self, func_name: &str, args: A) {
    let wasm_func = self.funcs.get(func_name).unwrap();
    assert_eq!(wasm_func.signature, A::signature());
    // `call` takes three arguments, so the instance's vmctx must be passed
    // along here.
    unsafe { A::call(wasm_func.ptr, self.vmctx, args) }
}
I think this approach can be scaled to handle returning values as well.
@sunfishcode what do you think? Am I missing something, do you have better proposals?
I know that LLVM IR is not really similar to a real machine ISA, but there would be some advantages if we could easily translate Cretonne IR to LLVM IR.
It's well known that LLVM is awfully slow for "bad" code, while Cretonne is fast by default but currently lacking deep optimizations. I saw that there's a plan to use Cretonne for rustc debug mode. But why should compilers handle two different target IRs? If we could generate "pretty good" LLVM IR by running Cretonne with --target=llvm, not just rustc but a whole range of languages whose compilers are written in Rust could benefit from it.
From the paper "Multi-return Function Call" (http://www.ccs.neu.edu/home/shivers/papers/mrlc-jfp.pdf).
The basic idea from the perspective of compiled code is to include multiple return pointers in a stack frame so functions can return to different places.
Consider Result<T, E>. This is denotationally the same as returning a value of a Rust enum with one variant for each of the return-pointer slots, with fields according to the returned data associated with that slot (registers, spilled stack slots, etc). But with the naive enum calling convention of adding a tag field, the caller needs to branch on the tag field, even if the enum value was created just before the return and nothing is in principle unpredictable. In the common case of a function "rethrowing" an Err (the Err(e0) => ... return Err(e1) ... match arm), the naive way results in O(n) branches (one per stack frame), one for each of the Err tags, while this way allows the error return pointer to point to disjoint control flow for the failure case, catching and rethrowing without additional branches, so the only branch is on the original failure condition.
Success and failure control flow are implemented identically, avoiding significant effort on the part of compiler writers in maintaining completely separate implementation concepts, while optimizations can work with both and don't get stuck on the success/failure boundary. At run time, the lack of any DWARF-like interpreter reduces dependencies and simplifies things too.
In short, we have the asymptotic efficiency of unwinding with the implementation (compiler and run-time) efficiency of enum return.
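For reference, the plain-Rust pattern being discussed: each frame that forwards an error with `?` branches on the enum tag, even though the Err was constructed only one frame below.

```rust
// The "rethrow" pattern: `?` is a hidden branch on the Result tag in every
// frame. A multi-return calling convention would let the callee jump straight
// to the caller's error path instead, eliminating these per-frame branches.
fn parse_digit(c: char) -> Result<u32, String> {
    c.to_digit(10).ok_or_else(|| format!("not a digit: {c}"))
}

fn parse_pair(a: char, b: char) -> Result<u32, String> {
    // Two hidden tag branches here, one per `?`.
    Ok(parse_digit(a)? * 10 + parse_digit(b)?)
}
```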
I talked to @eddyb about this once and he said to talk to someone on #cranelift
, but alas I am on IRC less these days and I forgot their nick. Opening this to make sure the idea isn't lost completely due to my negligence. Cheers.
How do I configure the wasm environment so imported functions work?
If I use dlopen() and dlsym() to get the address of a function (call_test), how do I inject that function address into the wasm linear memory so that when I call wasm functions, they work correctly (because those wasm functions call my imported function)?
Example: this Rust fn compiles down to wasm
#[no_mangle]
pub fn test(n: i32) -> i32 {
let _a = unsafe {call_test(1)}; // how do I place the dlsym address for call_test() into the wasm environment?
n
}
If there is no mechanism, and you have a moment, please consider writing out the steps to implement this and I'll do it.
Thanks!
Currently, cranelift spills all registers across calls, without regard to whether they're callee-saved.
However, the nice thing about callee-saved registers is that they're saved across calls ;-), and Cranelift indeed supports the callee side of this.
At a high level, the steps here are:
1. Rename callee_saved_gprs to callee_saved_regs, and make it not specific to GPRs (this appears to be needed for saving XMM registers on windows_fastcall too).
2. Promote callee_saved_regs from a private function to a function on the TargetIsa trait, so that we can access it from other places in the code.
3. Consult the TargetIsa's callee_saved_regs set and skip spilling registers in the callee-saved set.
There seems to be a problem with branch instructions that take register operands from a constrained register class. The problem is the global values that are live across the branch into the destination EBB. These values can't be temporarily moved because the destination EBB expects them in their global registers.
Normally, the spiller will make sure that there are enough free registers for the branch instruction's own operands, but it can't guarantee that there are registers free in a constrained register class.
An example is an Intel brnz.b1 instruction whose controlling b1 operand is constrained to the ABCD register class. We can't currently guarantee that the live-ins for the destination EBB are not taking up the whole ABCD register class.
There is this panic: what is it about? I've tried to use a simple br_table with a few arms and it seems to work fine.
bytecodealliance/cranelift#133 and bytecodealliance/cranelift#138 started some work toward improving LICM, in particular to better handle loops that end in the middle of EBBs; see the actual PRs for more discussion.
I've now merged those patches into a branch here:
Once prologue/epilogue generation knows how to allocate stack space, bytecodealliance/cranelift#187, it'll also need to emit code to save and restore callee-saved registers.
Cranelift's clif files (example here) currently use ; for line comments. This is somewhat common in assembler languages, and in LLVM IR, but it's not always immediately obvious to people from other backgrounds.
Using // would follow Cranelift's syntax for types (v0: i32, -> i32, and so on) in taking syntactic cues from Rust where it makes sense to do so. (Related: should we change function to fn? That doesn't feel as important, because function is more self-explanatory there, but it's worth considering.)
// is two characters rather than ;'s one, but my intuition is that the little things we can do to make IR dumps easier to approach for people not already familiar with them will end up being valuable in a variety of contexts, more so than absolute conciseness.
Ref #14
We need to catch traps generated by page faults, ud2, and probably others (e.g. divide-by-zero exceptions, but I'm not familiar with how they are handled in Cranelift).
As far as I know, we need to use signals on Unix-like platforms. I have no idea how to handle these cases on other platforms (or even which platforms we would like to support at all).
I wonder: can we provide this functionality out of the box? Or should we require the user to set up all the machinery, and just provide means to, for example, look up trap codes?
clif-util's "wasm" and "compile" subcommands essentially do the same thing, except that "wasm"'s input is a wasm file, and "compile"'s input is a clif file, but then they both do a full compilation. Currently, they print different things:
and possibly other differences. We should harmonize these two subcommands.
The bitreverse sequences in lib/codegen/meta-python/base/legalize.py all end with two shifts and a bitwise-or that effectively swap the low half of the value and the high half:
https://github.com/CraneStation/cranelift/blob/master/lib/codegen/meta-python/base/legalize.py#L445
https://github.com/CraneStation/cranelift/blob/master/lib/codegen/meta-python/base/legalize.py#L475
and others for the other types. It would be better to replace these trailing sequences with rotl_imm.
That change is the first step; the catch, however, is that rotl_imm isn't implemented in isel yet, so we'll need to implement that too. See the encodings for shifts and non-imm rotates, as well as the encodings for imm shifts, for some examples.
Of course, in the future Cranelift is expected to have a pattern-matching optimization which would automatically optimize shift+bor sequences into rotates. However, it doesn't have one right now, and even when it does, using the rotate directly keeps the code simpler, and it's more efficient to emit the instruction we want than to emit sequences of instructions that we know will end up getting replaced.
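The equivalence being relied on, shown in plain Rust for a 16-bit value: the trailing two-shifts-plus-or is exactly a rotate by half the width (Rust's rotate_left being the analogue of rotl_imm).

```rust
// The tail of the bitreverse sequence: two shifts and a bitwise-or that swap
// the low and high halves of the value.
fn swap_halves_with_shifts(x: u16) -> u16 {
    (x << 8) | (x >> 8)
}

// The same operation expressed as a single rotate.
fn swap_halves_with_rotl(x: u16) -> u16 {
    x.rotate_left(8)
}
```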
It's become clear that Cranelift needs the facilities to debug the compiled code.
I propose implementing a trait DebugSink that looks something like this:
trait DebugSink {
fn insert_inst(&mut self, inst: InstructionData, source_loc: SourceLoc, code_offset: CodeOffset);
fn insert_func(&mut self, name: String, source_loc: SourceLoc, code_offset: CodeOffset);
}
Once insert_func is called, all instructions inserted afterwards (until another function is inserted) belong to that function.
It could also take on the form of two traits:
trait DebugSinkFunc {
fn insert_inst(&mut self, inst: InstructionData, source_loc: SourceLoc, code_offset: CodeOffset);
}
trait DebugSink {
fn insert_func(&mut self, name: String, source_loc: SourceLoc, code_offset: CodeOffset) -> &mut DebugSinkFunc;
}
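For illustration, here is what a trivial collecting implementation of the first shape might look like, with placeholder types (String, u32) standing in for Cranelift's InstructionData, SourceLoc, and CodeOffset:

```rust
// Placeholder types for illustration only.
type SourceLoc = u32;
type CodeOffset = u32;

trait DebugSink {
    fn insert_inst(&mut self, inst: String, source_loc: SourceLoc, code_offset: CodeOffset);
    fn insert_func(&mut self, name: String, source_loc: SourceLoc, code_offset: CodeOffset);
}

/// Groups instructions under the most recently inserted function, matching
/// the "all instructions after insert_func belong to it" rule.
#[derive(Default)]
struct CollectingSink {
    funcs: Vec<(String, Vec<(String, SourceLoc, CodeOffset)>)>,
}

impl DebugSink for CollectingSink {
    fn insert_inst(&mut self, inst: String, source_loc: SourceLoc, code_offset: CodeOffset) {
        if let Some(current) = self.funcs.last_mut() {
            current.1.push((inst, source_loc, code_offset));
        }
    }
    fn insert_func(&mut self, name: String, _source_loc: SourceLoc, _code_offset: CodeOffset) {
        self.funcs.push((name, Vec::new()));
    }
}
```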
Currently, binemit has its own CodeSink trait for writing binary data. It isn't entirely satisfying, in part because it's an unsafe interface -- it doesn't perform bounds checking on the underlying data. While we can provide relatively safe interfaces to protect users from misusing the API, it's harder to be absolutely certain that the number of bytes compile says a function needs is the number of bytes emit_to_memory actually writes for that function.
One option would be to provide a safe, checked version of MemoryCodeSink. However, it's also worth evaluating available crates that provide low-level byte-buffer writing functionality, including:
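Whichever buffer crate is chosen, the checked-sink idea itself can be sketched as follows (hypothetical API, not binemit's): writes fail loudly instead of running off the end of the buffer.

```rust
/// A bounds-checked byte sink: `put1` returns an error instead of writing
/// past the end of the buffer.
struct CheckedSink<'a> {
    buf: &'a mut [u8],
    offset: usize,
}

impl<'a> CheckedSink<'a> {
    fn new(buf: &'a mut [u8]) -> Self {
        CheckedSink { buf, offset: 0 }
    }

    fn put1(&mut self, byte: u8) -> Result<(), &'static str> {
        if self.offset >= self.buf.len() {
            return Err("code sink overflow");
        }
        self.buf[self.offset] = byte;
        self.offset += 1;
        Ok(())
    }
}
```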
I have encountered at least one bug related to instruction encodings that was pretty tricky to isolate, but easy enough to fix: bytecodealliance/cranelift#211.
It would be helpful to have a test suite that compared Cretonne's encodings against another toolchain (probably LLVM, though GCC could work too). The meta-language could add a way to specify how to translate an instruction into assembly-language text, and the encoding of the instruction could be compared against the output of another assembler. The domain size for each instruction is small enough that this could probably be an exhaustive test.
Opening this issue to prevent polluting the issue list
clz.i16
target x86_64
function %clz_i16(i16) -> i16 fast {
ebb0(v0: i16):
v1 = clz.i16 v0
return v1
}
function %clz_i16(i16 [%rdi]) -> i16 [%rax] fast {
ebb0(v0: i16):
v1 = clz.i16 v0
[Op1ret#c3] return v1
^~~~~~ verifier inst1: v1 is a ghost value used by a real [Op1ret#c3] instruction
}
ireduce i16 -> i8
target x86_64
function %ireduce_i8(i16) -> i8 fast {
ebb0(v0: i16):
v1 = ireduce.i8 v0
return v1
}
function %ireduce_i8(i16 [%rdi]) -> i8 [%rax] fast {
ebb0(v0: i16):
v1 = ireduce.i8 v0
[Op1ret#c3] return v1
^~~~~~ verifier inst1: v1 is a ghost value used by a real [Op1ret#c3] instruction
}
icmp.i8
icmp_imm.i8
ushr.i8
iadd_cout.i8
isub_bout.i8
select.i8
rotl.i8
rotr.i8
Edit: update for bytecodealliance/cranelift#524
Edit2: add isplit.i32 and iconcat.i32
Edit3: removed isplit.i32 and iconcat.i32
librustc_target is a library in rustc for implementing a lot of native ABI and toolchain logic. It has some overlap with target-lexicon, however it provides much more extensive features, especially including knowledge of calling conventions. It's written in a way which is mostly independent from the rest of rustc, so it's an interesting candidate for factoring out into a standalone library that Cranelift users and others could use to more easily integrate with native C ABI/toolchain environments.
This aligns fairly well with cranelift-codegen's rough design for calling conventions, in which it only does the lowest-level parts, and assumes that cranelift-frontend or other libraries will be the place for offering higher-level functionality such as handling struct types.
See this comment for some more details on librustc_target.
This library isn't Cranelift-specific, as other projects could make use of such a library too. That said, it is particularly interesting for use with Cranelift, so I'm posting an issue here so we can track it.
Hit this error when trying to play around with a non-trivial wasm file (stack trace trimmed for readability):
5: std::panicking::begin_panic
at /checkout/src/libstd/panicking.rs:409
6: <wasmtime_environ::environ::ModuleEnvironment<'data, 'module> as cranelift_wasm::environ::spec::ModuleEnvironment<'data>>::declare_table_elements
at lib/environ/src/environ.rs:190
7: cranelift_wasm::sections_translator::parse_elements_section
at /home/froydnj/.cargo/registry/src/github.com-1ecc6299db9ec823/cranelift-wasm-0.18.1/src/sections_translator.rs:387
8: cranelift_wasm::module_translator::translate_module
at /home/froydnj/.cargo/registry/src/github.com-1ecc6299db9ec823/cranelift-wasm-0.18.1/src/module_translator.rs:91
9: wasmtime_environ::environ::ModuleEnvironment::translate
at lib/environ/src/environ.rs:60
10: wasm2obj::handle_module
at src/wasm2obj.rs:135
11: wasm2obj::main
at src/wasm2obj.rs:87
12: std::rt::lang_start::{{closure}}
at /checkout/src/libstd/rt.rs:74
Rationale: icmp, load, and store all have an enumerable immediate field, based on whose value we may want to do something different in the semantics. As a result, we'd like to be able to match a different transform to a concrete piece of RTL, depending on the value of some of the immediate fields. To enable this we need several things:
The readme says that cretonne is supposed to be a code generator for WebAssembly, but it's a bit unclear what that means, seeing how there doesn't appear to be anything to actually do with wasm in the code.
Preferably, the readme or documentation should be able to answer these questions.
Currently, Cranelift IR is always printed with one instruction per line, e.g.:
function %foo(i32, i32, i32) -> i32 {
ebb0(v0: i32, v1: i32, v2: i32):
v3 = imul v0, v1
v4 = iadd v3, v2
return v4
}
What if we introduced some simple syntax sugar for instructions with only one use? It'd be in addition to the existing syntax. We could then (optionally) print that same code like this:
function %foo(i32, i32, i32) -> i32 {
ebb0(v0: i32, v1: i32, v2: i32):
return v0 * v1 + v2
}
That would be much easier to read in many cases, which is of potential interest to cranelift developers, but also to cranelift users looking to understand how cranelift is compiling their code.
This also might make it even more interesting to switch to // comments (#471).
There's some ambiguity with syntax like v0 + 1, but I think we can resolve it by saying that we always use the _imm instruction when possible rather than emitting an iconst.
And there's the question of value numbers for the intermediate values. My rough idea is that they'd just always use the next available value number.
There are other issues to consider too, such as printing srclocs and instruction encodings. But I think we could find reasonable ways to make these work. The main question is, is this idea worth pursuing?
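A toy sketch of the proposed sugar (using a simplified, hypothetical IR representation, not Cranelift's): values defined by an instruction and used exactly once are printed inline at their single use, while everything else keeps its value name.

```rust
use std::collections::HashMap;

/// Render value `v` as an expression, inlining single-use instruction results.
/// `defs` maps a result value to its (operator, operands); values below
/// `num_params` are block parameters and always keep their names.
fn inline_expr(
    v: u32,
    num_params: u32,
    defs: &HashMap<u32, (String, Vec<u32>)>,
    use_counts: &HashMap<u32, usize>,
) -> String {
    // Block params, multi-use values, and undefined values print as names.
    if v < num_params || use_counts.get(&v) != Some(&1) || !defs.contains_key(&v) {
        return format!("v{v}");
    }
    let (op, ops) = &defs[&v];
    let parts: Vec<String> = ops
        .iter()
        .map(|&o| inline_expr(o, num_params, defs, use_counts))
        .collect();
    format!("({} {})", op, parts.join(", "))
}
```

Running this on the `imul`/`iadd` example above folds both instructions into the return expression.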
Docs don't make it particularly clear who owns finalized functions and data returned from SimpleJITBackend.
From the code it looks like they are all pointers into a shared block of memory owned by SimpleJITBackend, but, if so, the definitions don't even try to help with detecting cases where SimpleJITBackend is dropped but the result remains.
One way to avoid this would be to reduce the use of raw pointers in SimpleJITBackend (they are really rather unnecessary hazards in that particular case) and instead return lifetime-annotated structures which can be further cast into the required function type, which would give proper compiler errors when the owner is dropped too early.
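A sketch of the lifetime-annotation idea, with illustrative names rather than the actual simplejit API: tying the returned handle's lifetime to the backend makes use-after-drop a compile error.

```rust
use std::marker::PhantomData;

// Illustrative stand-in for the backend that owns the JIT'd memory.
struct Backend {
    code: Vec<u8>,
}

/// A finalized-function handle whose lifetime is tied to the owning backend,
/// so the borrow checker rejects using it after the backend is dropped.
struct FinalizedFunction<'a> {
    ptr: *const u8,
    _owner: PhantomData<&'a Backend>,
}

impl Backend {
    fn get_finalized_function(&self) -> FinalizedFunction<'_> {
        FinalizedFunction { ptr: self.code.as_ptr(), _owner: PhantomData }
    }
}
```

With this shape, dropping the Backend while a FinalizedFunction is still alive fails to compile, which is exactly the error the raw-pointer API silently permits.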
The verifier functions have an API that could probably be simplified. Functions which return VerifierStepResult must by convention also take an out-param VerifierErrors that will contain non-fatal errors.
Moreover, the T in VerifierStepResult<T> seems to always be set to (), so it's unused.
It seems the out-param is redundant with the error that's present in the Result hidden inside VerifierStepResult. I think slightly modifying the interface of VerifierStepResult would avoid this out-param:
- Let the Ok type be VerifierErrors, in case we only have non-fatal errors (they could be called "warnings").
- Let the Err type stay the same (VerifierErrors too), containing the non-fatal and fatal errors, if there was at least one fatal error.
Then we wouldn't need the errors out-param anymore, which looks cleaner and "more Rusty". It might mean that a few users of the fatal! etc. macros would need their own errors variable, but that seems OK.
Thoughts?
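A sketch of the proposed shape (the names here are illustrative): both sides of the Result carry the error list, and Ok means "warnings only".

```rust
/// Ok carries only non-fatal errors ("warnings"); Err means at least one
/// fatal error occurred, and carries everything collected so far.
#[derive(Default)]
struct VerifierErrors {
    fatal: Vec<String>,
    warnings: Vec<String>,
}

type VerifierResult = Result<VerifierErrors, VerifierErrors>;

/// Convert the collected errors into the proposed Result shape.
fn finish(errors: VerifierErrors) -> VerifierResult {
    if errors.fatal.is_empty() {
        Ok(errors)
    } else {
        Err(errors)
    }
}
```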
#9 proposes adding a dependency on memmap, so we should consider using memmap rather than region for executable memory too.
This will require at least these features:
call_indirect implementation (signature checking, checking the item isn't null, etc.)
I am currently emitting non-atomic versions. This doesn't have a high priority for me.
What does the description mean?
"Standalone JIT-style runtime support for WebAssembly code in Cranelift"
Can I use this project as a substitute for wasmi? Like instantiating a module providing imports (satisfied by other modules and/or host functions), and executing exports.
Or is it for running wasm executables (something like cervus)?
It doesn't seem useful to give wasm code references to data structures on the native stack because:
Is this correct, or is there some mechanism I'm overlooking?
Thanks!
Of the test files, only call does not have a memory initializer. I'd like to see this implemented, or at least stubbed out, so I can examine the object files produced.
RUST_BACKTRACE=1 cargo run --bin wasm2obj -- memory.wasm -o memory.o
Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
Running `target/debug/wasm2obj memory.wasm -o memory.o`
%wasm_0x0(0): relocs: []
error: FIXME: implement data initializers
How much work would it be to implement this, or at least add stubs so it doesn't stop the entire binary emission process?
Suppose we want to compare 8-bit ints on a 32-bit RISC:
widen32.legalize(
a << icmp('ult', x, y),
Rtl(
wx << uextend.i32(x),
wy << uextend.i32(y),
a << icmp('ult', wx, wy),
))
We want to generalize this pattern, but this transformation is only valid for the unsigned or sign-neutral condition codes, so this is wrong:
widen32.legalize(
a << icmp(cc, x, y),
Rtl(
wx << uextend.i32(x),
wy << uextend.i32(y),
a << icmp(cc, wx, wy),
))
We need a way of specifying a predicate on the immediate cc. Ideally, this mechanism should share its representation with the instruction predicates already supported by instruction encodings.
(Also note that the first example doesn't work either; we can't even require a fixed immediate value in the input pattern.)
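The restriction can be checked in plain Rust: zero-extension preserves unsigned comparisons but not signed ones, which is why the rewrite is only valid for unsigned or sign-neutral condition codes.

```rust
// Unsigned compare of narrow values: still correct after zero-extension.
fn ult_narrow(x: u8, y: u8) -> bool {
    x < y
}
fn ult_widened(x: u8, y: u8) -> bool {
    (x as u32) < (y as u32) // uextend, then 32-bit unsigned compare
}

// Signed compare: zero-extending (rather than sign-extending) breaks it.
fn slt_narrow(x: u8, y: u8) -> bool {
    (x as i8) < (y as i8)
}
fn slt_widened_wrong(x: u8, y: u8) -> bool {
    (x as u32 as i32) < (y as u32 as i32) // uextend loses the sign
}
```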
Hello,
You are probably well aware, but some mainstream compilers are emitting retpolines to help mitigate Spectre variant 2 attacks. Do you have any plans to add a similar capability to the cretonne code generator (and/or do you think it makes sense for cretonne to do this sort of thing)?
Thanks,
Jon
The regmove/regspill/regfill instructions are special. They're emitted by the coloring pass to temporarily override the assigned registers (func.locations) for values. This is surprising, and it means that very late passes which run after register allocation can't just look at func.locations; they have to use RegDiversions and walk the IR in order to know what registers are assigned to what values.
They also currently require special-casing to work around this issue.
It would be nice to find a way to either avoid using regmove/regspill/regfill, or to rewrite them once coloring is done, so that func.locations is left up to date.
~/Desktop/rust/wasmstandalone $ cd $(mktemp -d)
/tmp/tmp.rwXoFoGNoi $ cargo new --lib test-wasm
Created library `test-wasm` project
/tmp/tmp.rwXoFoGNoi $ echo -e '\n[lib]\ncrate-type = ["cdylib"]' >> test-wasm/Cargo.toml
/tmp/tmp.rwXoFoGNoi $ echo -e '#[no_mangle]\npub fn nop() {}' > test-wasm/src/lib.rs
/tmp/tmp.rwXoFoGNoi $ (cd test-wasm && cargo rustc --target wasm32-unknown-unknown --release)
Compiling test-wasm v0.1.0 (file:///tmp/tmp.rwXoFoGNoi/test-wasm)
Finished release [optimized] target(s) in 0.38s
/tmp/tmp.rwXoFoGNoi $ cd -
/home/aidanhs/Desktop/rust/wasmstandalone
~/Desktop/rust/wasmstandalone $ cargo run --bin wasmstandalone -- /tmp/tmp.rwXoFoGNoi/test-wasm/target/wasm32-unknown-unknown/release/test_wasm.wasm
Finished dev [unoptimized + debuginfo] target(s) in 0.07s
Running `target/debug/wasmstandalone /tmp/tmp.rwXoFoGNoi/test-wasm/target/wasm32-unknown-unknown/release/test_wasm.wasm`
error while processing /tmp/tmp.rwXoFoGNoi/test-wasm/target/wasm32-unknown-unknown/release/test_wasm.wasm: Verifier error: inst18: Call must have an encoding
In order to compile C code that calls printf in libc, we need to implement the caller side of varargs. And in order to compile printf itself, we need to implement the callee side.
If SIP really does isolate, then perhaps wasm code shouldn't be able to smash the native stack.
Currently Cretonne expects wasm and native code to share the stack so for example wasm code can call an imported native function.
What I'd like:
If an imported fn's return type is i32, then perhaps this could be marshalled using a register, and a few assembler instructions could take care of the return value and the prologue/epilogue modifying the stack pointer.
Is this a crazy idea?
Thanks!
DWARF line table support may be a good way to get started with the broader topic of producing debug info, as it's relatively straightforward, and quite valuable.
Briefly surveying the landscape, the three main options appear to be:
(There is also the dwarf crate, which has rudimentary support for writing, though not yet for writing .debug_line sections. Judging by this comment, it seems to be no longer maintained, in favor of gimli.)
Regardless of how we implement it, the code for consuming cretonne IR and emitting DWARF should be in a new crate, as other users of the cretonne-codegen crate won't need it. It will need to emit the binary data for the section, plus relocation records telling the linker where to fix up program addresses.
The first step is to write a minimal .debug_info section, containing a DW_TAG_compile_unit and a DW_TAG_subprogram entry for each function. Chapter 2 "General Description" of the DWARF spec defines the overall structure of these. I also recommend the dwarfdump utility and/or readelf --debug-dump=info,lines for examining DWARF output from other compilers to get a sense of what this needs to look like. Then, we can implement the line table. Section 6.2 "Line Number Information" defines the line table format. I'll flesh out these steps more when we're ready; right now I'm just sketching out the major areas that would be covered.
Since Cranelift is soon to be a backend for Rust, it will need to support inline assembly. There is no good way to solve this right now, since Rust uses the LLVM inline asm syntax. I'm filing this issue so we can think about this in the long term.
I've wanted to play with a fuzzing tool for a long time, so I brought up cargo-fuzz on cretonne. I think this could be a useful complement to the test suite, but I want to do the following before generating a PR: