bytecodealliance / wasmtime
A fast and secure runtime for WebAssembly
Home Page: https://wasmtime.dev/
License: Apache License 2.0
Stack slots can be defined with an explicit alignment, but the alignment can also be left out, in which case Cretonne "will pick an appropriate alignment for the stack slot based on its size and access patterns".
We should record this in StackSlotData. The alignment inference algorithm needs to consider that stack_load and stack_store instructions accessing a stack slot will have a preferred alignment that depends on the target ISA. A TargetISA method that returns the preferred alignment for accessing a given type seems appropriate.
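As a sketch of what such inference could look like (the function name and the heuristic here are hypothetical, not Cretonne's actual API): take the slot's natural, size-derived alignment and cap it by the preferred access alignment the target ISA reports.

```rust
/// Hypothetical sketch: pick an alignment for a stack slot from its size and
/// the preferred access alignment reported by the target ISA.
fn stack_slot_alignment(size: u32, preferred_access_align: u32) -> u32 {
    // Natural alignment: the next power of two covering the size, capped at
    // 16 bytes so huge slots don't demand huge alignment.
    let natural = if size == 0 { 1 } else { size.next_power_of_two().min(16) };
    // Never exceed what the ISA prefers for the widest access.
    natural.min(preferred_access_align)
}
```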
I wanted to test native code generation, but unless I missed something, only textual wasm files are supplied, while the tools require binary wasm modules.
Is there an easy way to generate binary modules from the text versions in the repo, or do I need another tool?
I'd like to use Cranelift as a JIT backend for a standalone WebAssembly embedding. However, from browsing the documentation I'm not sure how to do this, or even whether it's fully possible yet. An example showing how to use cranelift_wasm and cranelift_simplejit together would be really helpful.
Tied constraints are when an input to an instruction has to be in the same register as an output, as is the case for most arithmetic instructions on x86.
Currently, Cranelift's register allocator handles this constraint in the coloring pass. However, the coloring pass runs very late, when a lot of decisions have already been made and a lot of other constraints have been saved up to be solved at once.
One idea for doing this would be to extend the concept of CSSA form produced by the coalescing pass. CSSA is essentially about putting "phi-related" values into sets which can be allocated the same virtual register, because whenever the input to a phi and the output of a phi can occupy the same register, we avoid a copy. Tied constraints are very similar: we want the input and the output of an instruction to be in the same register, so coalescing them would also avoid requiring a copy.
In cases where the input and output register conflict, we could insert an explicit copy.
And since coalescing happens before spilling or coloring, this should mean that coloring wouldn't have to worry about these constraints.
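The coalescing idea above can be sketched with a toy union-find over value numbers (illustrative only; Cranelift's virtual-register machinery is more involved): tying an instruction's input to its output merges their sets, and all values in one set can then be handed to the allocator as a single virtual register.

```rust
/// Sketch (not Cranelift's actual data structures): a union-find over value
/// numbers, used to coalesce tied input/output operands into one set that the
/// allocator can assign a single virtual register.
struct Coalescer {
    parent: Vec<usize>,
}

impl Coalescer {
    fn new(num_values: usize) -> Self {
        Coalescer { parent: (0..num_values).collect() }
    }

    /// Find the set representative, with path compression.
    fn find(&mut self, v: usize) -> usize {
        let p = self.parent[v];
        if p != v {
            let root = self.find(p);
            self.parent[v] = root;
            root
        } else {
            v
        }
    }

    /// Tie an instruction's input to its output, as for `x = add x, y` on x86.
    fn tie(&mut self, input: usize, output: usize) {
        let (a, b) = (self.find(input), self.find(output));
        self.parent[a] = b;
    }
}
```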
As noticed in bytecodealliance/cranelift#473, we don't currently have any documentation for how to use the sigid parameter attribute. We should add examples showing how to do WebAssembly-style signature checking.
The ir::layout module keeps track of the ordering of instructions and extended basic blocks in a function. It is currently implemented with doubly linked lists of EBBs and instructions. All program points have a sequence number so the ProgramOrder trait can be implemented efficiently.
This representation uses 20 bytes per EBB and 16 bytes per instruction. We should experiment with a more compact layout representation:
This compact representation uses 8 bytes per EBB and 8 bytes per instruction plus a minimal overhead for the non-leaf nodes in the B+-trees.
The Cursor struct should probably contain a path to its position in both B+-trees, which means that the standard library B-trees won't work.
Looking at a recent-ish Firefox nightly, I see the following:
# Look at x86 cranelift symbols, sorted by (descending) size, in the .data.rel.ro section.
froydnj@hawkeye:~$ readelf -sW firefox/libxul.so|grep cranelift |sort -k 3 -g -r |awk '$7 == 27 { print }' |grep OBJECT |grep x86
123872: 0000000007342048 9480 OBJECT LOCAL DEFAULT 27 _ZN17cranelift_codegen3isa3x8610enc_tables18RECIPE_CONSTRAINTS17h10c82bc0de555a21E
123849: 0000000007344550 3792 OBJECT LOCAL DEFAULT 27 _ZN17cranelift_codegen3isa3x8610enc_tables12RECIPE_NAMES17h54e6a911fe6097bbE
123866: 0000000007345c20 1896 OBJECT LOCAL DEFAULT 27 _ZN17cranelift_codegen3isa3x8610enc_tables17RECIPE_PREDICATES17hc3712077e222c860E
123908: 0000000007347da0 408 OBJECT LOCAL DEFAULT 27 _ZN17cranelift_codegen3isa3x868settings11DESCRIPTORS17h03dfd2ae0adf323dE
123854: 0000000007346388 120 OBJECT LOCAL DEFAULT 27 _ZN17cranelift_codegen3isa3x8610enc_tables15INST_PREDICATES17h3a3bab86e4a3b3e7E
123911: 0000000007347f38 96 OBJECT LOCAL DEFAULT 27 _ZN17cranelift_codegen3isa3x868settings8TEMPLATE17hdfd5aa2b6a7f01ecE
123922: 0000000007341ea8 48 OBJECT LOCAL DEFAULT 27 _ZN17cranelift_codegen3isa3x869registers9GPR8_DATA17h4c7c028be6f5b4d8E
123921: 0000000007341f68 48 OBJECT LOCAL DEFAULT 27 _ZN17cranelift_codegen3isa3x869registers9FPR8_DATA17h4d3540abe0a3bbe2E
123920: 0000000007341e18 48 OBJECT LOCAL DEFAULT 27 _ZN17cranelift_codegen3isa3x869registers9FLAG_DATA17h06234297cfb01581E
123919: 0000000007341f38 48 OBJECT LOCAL DEFAULT 27 _ZN17cranelift_codegen3isa3x869registers9ABCD_DATA17h231c9e2f56f02113E
123918: 0000000007341db8 48 OBJECT LOCAL DEFAULT 27 _ZN17cranelift_codegen3isa3x869registers8GPR_DATA17h327360c70468dbffE
123917: 0000000007341de8 48 OBJECT LOCAL DEFAULT 27 _ZN17cranelift_codegen3isa3x869registers8FPR_DATA17h3d44028fd0ecbabdE
123915: 0000000007341f08 48 OBJECT LOCAL DEFAULT 27 _ZN17cranelift_codegen3isa3x869registers25GPR8_ZERO_DEREF_SAFE_DATA17h743615e457b73283E
123914: 0000000007341e78 48 OBJECT LOCAL DEFAULT 27 _ZN17cranelift_codegen3isa3x869registers24GPR_ZERO_DEREF_SAFE_DATA17h82d7795ce9382dfbE
123913: 0000000007341ed8 48 OBJECT LOCAL DEFAULT 27 _ZN17cranelift_codegen3isa3x869registers20GPR8_DEREF_SAFE_DATA17h460e84f7bb9ca949E
123912: 0000000007341e48 48 OBJECT LOCAL DEFAULT 27 _ZN17cranelift_codegen3isa3x869registers19GPR_DEREF_SAFE_DATA17had1b3296ef50e9ebE
123916: 0000000007341f98 32 OBJECT LOCAL DEFAULT 27 _ZN17cranelift_codegen3isa3x869registers4INFO17h2a2b91216d7265ffE
123855: 0000000007345c00 32 OBJECT LOCAL DEFAULT 27 _ZN17cranelift_codegen3isa3x8610enc_tables16LEGALIZE_ACTIONS17h4ddebdf4d5a40777E
There's about 15 KB of symbols in cranelift_codegen::isa::x86 that live in .data.rel.ro, which renders them non-shareable. For Firefox's purposes, we'd really like to minimize the amount of data that lives in .data.rel.ro, moving it to .rodata if at all possible.
The usual reason for data winding up in .data.rel.ro is data structures that contain slices, and indeed, for RECIPE_CONSTRAINTS, the RecipeConstraints structure has a couple of slices in it. I haven't looked at RECIPE_NAMES, RECIPE_PREDICATES, and so forth, but I assume the problems are similar there.
It would be great if things were shuffled around somehow so the data associated with these symbols could live in .rodata. unicode-rs/unicode-normalization#14 is one example of how we addressed this in another Rust library; I might have time to implement the same changes here, not sure though. We don't necessarily have to address all of these, but addressing the top five or so would make a big difference.
/cc @EricRahm
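For illustration (a general rustc/ELF observation, worth verifying with readelf on any given build): a static containing a slice embeds a pointer and therefore needs a load-time relocation, which is what pushes it into .data.rel.ro, while a pointer-free array can live in shareable .rodata.

```rust
// A `&[u8]` static stores a (pointer, length) pair; the pointer needs a
// relocation in position-independent code, so the static typically lands in
// .data.rel.ro.
static WITH_RELOC: &[u8] = &[1, 2, 3];

// A bare array is pure bytes with no pointers, so it can go in .rodata and be
// shared between processes.
static NO_RELOC: [u8; 3] = [1, 2, 3];

fn sum(bytes: &[u8]) -> u32 {
    bytes.iter().map(|&b| b as u32).sum()
}
```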
See also the discussion of biased coloring in #1029.
The register coloring pass currently just assigns the first available register to new register values. This can be improved with register hints to reduce the amount of register shuffling needed, for example for operands that must end up in a fixed register such as %rcx.
There are also cases where the register hint is not a specific register, but rather a subset of the top-level register class: for example, some encodings can't use %r12 and others can't use %r13.
The LiveRange::affinity field is already used to track register class hints. When a value is used by an instruction with a reduced register class constraint, the affinity is intersected with the constraint. These hints are currently ignored, and we just assign registers from the top-level register class.
Individual register hints are not tracked anywhere. They could be computed by the reload pass.
Hints, whether for register sets or singletons, require the constraint solver to be a bit more clever. It should use the hints as much as possible, but ignore them before failing to find a solution.
Sometimes a hint can't be used because another value is already using the register we want. We can prevent this by trying to avoid assigning values to registers where they will get in the way later.
This can be done using a data structure similar to LLVM's register matrix. Whenever a value is given a hint (during the reload pass), its live range is inserted into the register matrix for the corresponding unit. The coloring pass can then check live ranges against the matrix to see if there is a conflict with other hinted values.
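A minimal sketch of such a register matrix, assuming half-open program-point intervals (hypothetical structure, not LLVM's or Cranelift's actual code): each register unit keeps the live ranges already hinted onto it, and coloring asks whether a candidate range overlaps any of them.

```rust
/// Sketch of an LLVM-style register matrix: per register unit, the list of
/// live ranges (half-open program-point intervals) already committed to it.
struct RegisterMatrix {
    units: Vec<Vec<(u32, u32)>>,
}

impl RegisterMatrix {
    fn new(num_units: usize) -> Self {
        RegisterMatrix { units: vec![Vec::new(); num_units] }
    }

    /// Record that a hinted value's live range occupies `unit`.
    fn insert(&mut self, unit: usize, range: (u32, u32)) {
        self.units[unit].push(range);
    }

    /// Does `range` overlap any live range already hinted onto `unit`?
    fn conflicts(&self, unit: usize, range: (u32, u32)) -> bool {
        self.units[unit].iter().any(|&(b, e)| range.0 < e && b < range.1)
    }
}
```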
To call an exported function we need to be able to pass arguments to it. I have two straw-man approaches to implement that.
The first one: pass arguments dynamically. That is, every exported function gets a generated thunk. This thunk reads all arguments from a dynamic list of argument values (à la wasmi's RuntimeValue) and then calls into the original function with whatever ABI it has. Upon calling the exported function, we check that the argument slice matches the signature and then call the thunk with the fixed signature (vmctx, args_ptr). This thunk could also convert and write the return value to a specific pre-allocated location in the callee. Dumb but easy.
The second one: generate exported functions with a specific ABI, say system_v, and then introduce an argument wrapper trait.
unsafe trait ArgsWrapper {
    fn signature() -> WasmSignature;
    unsafe fn call(fn_ptr: usize, vmctx: usize, args: Self);
}
This trait's main purpose is to provide a way to get the signature dynamically and unpack the values for the actual call. We can generate impls for this trait with a macro. Here is an example impl for a function with arity 2.
unsafe impl<A: HasWasmType, B: HasWasmType> ArgsWrapper for (A, B) {
    fn signature() -> WasmSignature {
        WasmSignature::new(&[A::wasm_type(), B::wasm_type()])
    }
    unsafe fn call(fn_ptr: usize, vmctx: usize, args: (A, B)) {
        let (a1, a2) = args;
        // A usize can't be cast directly to a function pointer; transmute it.
        let f: extern "C" fn(usize, A, B) = std::mem::transmute(fn_ptr);
        f(vmctx, a1, a2)
    }
}
Then the actual call can be implemented as follows:
fn call<A: ArgsWrapper>(&self, func_name: &str, args: A) {
    let wasm_func = self.funcs.get(func_name).unwrap();
    assert_eq!(wasm_func.signature, A::signature());
    // `call` takes three arguments, so the instance's vmctx must be passed
    // along here.
    unsafe { A::call(wasm_func.ptr, self.vmctx, args) }
}
I think this approach can be scaled to handle returning values as well.
@sunfishcode what do you think? Am I missing something, do you have better proposals?
I know that LLVM IR is not really similar to a real machine ISA, but there would be some advantages if we could easily translate Cretonne IR to LLVM IR.
It's well known that LLVM is awfully slow for "bad" code, while Cretonne is fast by default but currently lacking deep optimizations. I saw that there's a plan to use Cretonne for rustc debug mode. But why should compilers handle two different target IRs? If we could generate "pretty good" LLVM IR by running Cretonne with --target=llvm, not just rustc but a whole range of languages whose compilers are written in Rust could benefit from it.
From the paper "Multi-return Function Call" (http://www.ccs.neu.edu/home/shivers/papers/mrlc-jfp.pdf).
The basic idea from the perspective of compiled code is to include multiple return pointers in a stack frame so functions can return to different places.
Consider Result<T, E>. This is denotationally the same as returning a value of a Rust enum with one variant for each of the return-pointer slots, with fields according to the returned data associated with that slot (registers, spilled stack slots, etc). But with the naive enum calling convention of adding a tag field, the caller needs to branch on the tag field, even if the enum value was created just before the return and nothing is in principle unpredictable. In the common case of a function "rethrowing" an Err (the Err(e0) => ... return Err(e1) ... match arm), the naive way results in O(n) branches (one per stack frame), one for each of the Err tags, while this way allows the error return pointer to point to disjoint control flow for the failure case, catching and rethrowing without additional branches, so the only branch is on the original failure condition.
Success and failure control flow are implemented identically, avoiding significant effort on the part of compiler writers in maintaining completely separate implementation concepts, while optimizations can work with both and don't get stuck on the success/failure boundary. At run time, the lack of any DWARF-like interpreter reduces dependencies and simplifies things too.
In short, we have the asymptotic efficiency of unwinding with the implementation (compiler and run-time) efficiency of enum return.
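For reference, the plain-Rust pattern being discussed: each frame that forwards an error with `?` branches on the enum tag, even though the Err was constructed only one frame below.

```rust
// The "rethrow" pattern: `?` is a hidden branch on the Result tag in every
// frame. A multi-return calling convention would let the callee jump straight
// to the caller's error path instead, eliminating these per-frame branches.
fn parse_digit(c: char) -> Result<u32, String> {
    c.to_digit(10).ok_or_else(|| format!("not a digit: {c}"))
}

fn parse_pair(a: char, b: char) -> Result<u32, String> {
    // Two hidden tag branches here, one per `?`.
    Ok(parse_digit(a)? * 10 + parse_digit(b)?)
}
```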
I talked to @eddyb about this once and he said to talk to someone on #cranelift
, but alas I am on IRC less these days and I forgot their nick. Opening this to make sure the idea isn't lost completely due to my negligence. Cheers.
How do I configure the wasm environment so imported functions work?
If I use dlopen() and dlsym() to get the address of a function (call_test), how do I inject that function address into the wasm linear memory so that when I call wasm functions, they work correctly (because those wasm functions call my imported function)?
Example: this Rust fn compiles down to wasm
#[no_mangle]
pub fn test(n: i32) -> i32 {
let _a = unsafe {call_test(1)}; // how do I place the dlsym address for call_test() into the wasm environment?
n
}
If there is no mechanism, and you have a moment, please consider writing out the steps to implement this and I'll do it.
Thanks!
Currently, cranelift spills all registers across calls, without regard to whether they're callee-saved.
However, the nice thing about callee-saved registers is that they're saved across calls ;-), and Cranelift indeed supports the callee side of this.
At a high level, the steps here are:
1. Rename callee_saved_gprs to callee_saved_regs, and make it not specific to GPRs (this appears to be needed for saving XMM registers on windows_fastcall too).
2. Promote callee_saved_regs from a private function to a function on the TargetIsa trait, so that we can access it from other places in the code.
3. Consult the TargetIsa's callee_saved_regs set and skip spilling registers in the callee-saved set.
There seems to be a problem with branch instructions that take register operands from a constrained register class. The problem is the global values that are live across the branch into the destination EBB. These values can't be temporarily moved because the destination EBB expects them in their global registers.
Normally, the spiller will make sure that there are enough free registers for the branch instruction's own operands, but it can't guarantee that there are registers free in a constrained register class.
An example is an Intel brnz.b1 instruction whose controlling b1 operand is constrained to the ABCD register class. We can't currently guarantee that the live-ins for the destination EBB are not taking up the whole ABCD register class.
There is this panic: what is it about? I've tried to use a simple br_table with a few arms and it seems to work fine.
bytecodealliance/cranelift#133 and bytecodealliance/cranelift#138 started some work toward improving LICM, in particular to better handle loops that end in the middle of EBBs; see the actual PRs for more discussion.
I've now merged those patches into a branch here:
Once prologue/epilogue generation knows how to allocate stack space, bytecodealliance/cranelift#187, it'll also need to emit code to save and restore callee-saved registers.
Cranelift's clif files (example here) currently use ; for line comments. This is somewhat common in assembler languages, and in LLVM IR, but it's not always immediately obvious to people from other backgrounds.
Using // would follow Cranelift's syntax for types (v0: i32, -> i32, and so on) in taking syntactic cues from Rust where it makes sense to do so. (Related: should we change function to fn? That doesn't feel as important, because function is more self-explanatory there, but it's worth considering.)
// is two characters rather than ;'s one, but my intuition is that the little things we can do to make IR dumps easier to approach for people not already familiar with them will end up being valuable in a variety of contexts, more so than absolute conciseness.
Ref #14
We need to catch traps generated by page faults, ud2, and probably others (e.g. divide-by-zero exceptions, but I'm not familiar with how they are handled in Cranelift).
As far as I know, we need to use signals on Unix-like platforms. I have no idea how to handle these cases on other platforms (or even which platforms we would like to support at all).
I wonder: can we provide this functionality out of the box? Or should we require the user to set up all the machinery, and just provide means to, for example, look up trap codes?
clif-util's "wasm" and "compile" subcommands essentially do the same thing, except that "wasm"'s input is a wasm file, and "compile"'s input is a clif file, but then they both do a full compilation. Currently, they print different things:
and possibly other differences. We should harmonize these two subcommands.
The bitreverse sequences in lib/codegen/meta-python/base/legalize.py all end with two shifts and a bitwise-or that effectively swap the low half of the value and the high half:
https://github.com/CraneStation/cranelift/blob/master/lib/codegen/meta-python/base/legalize.py#L445
https://github.com/CraneStation/cranelift/blob/master/lib/codegen/meta-python/base/legalize.py#L475
and others for the other types. It would be better to replace these trailing sequences with rotl_imm.
That change is the first step; the catch, however, is that rotl_imm isn't implemented in isel yet, so we'll need to implement that too. See the encodings for shifts and non-imm rotates, as well as the encodings for imm shifts, for some examples.
Of course, in the future Cranelift is expected to have a pattern-matching optimization which would automatically optimize shift+bor sequences into rotates. However, it doesn't have one right now, and even when it does, using the rotate directly keeps the code simpler, and it's more efficient to emit the instruction we want than to emit sequences of instructions that we know will end up getting replaced.
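The equivalence being relied on, shown in plain Rust for a 16-bit value: the trailing two-shifts-plus-or is exactly a rotate by half the width (Rust's rotate_left being the analogue of rotl_imm).

```rust
// The tail of the bitreverse sequence: two shifts and a bitwise-or that swap
// the low and high halves of the value.
fn swap_halves_with_shifts(x: u16) -> u16 {
    (x << 8) | (x >> 8)
}

// The same operation expressed as a single rotate.
fn swap_halves_with_rotl(x: u16) -> u16 {
    x.rotate_left(8)
}
```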
It's become clear that Cranelift needs the facilities to debug the compiled code.
I propose implementing a trait DebugSink that looks something like this:
trait DebugSink {
fn insert_inst(&mut self, inst: InstructionData, source_loc: SourceLoc, code_offset: CodeOffset);
fn insert_func(&mut self, name: String, source_loc: SourceLoc, code_offset: CodeOffset);
}
Once insert_func is called, all instructions inserted afterwards (until another function is inserted) belong to that function.
It could also take on the form of two traits:
trait DebugSinkFunc {
fn insert_inst(&mut self, inst: InstructionData, source_loc: SourceLoc, code_offset: CodeOffset);
}
trait DebugSink {
fn insert_func(&mut self, name: String, source_loc: SourceLoc, code_offset: CodeOffset) -> &mut DebugSinkFunc;
}
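For illustration, here is what a trivial collecting implementation of the first shape might look like, with placeholder types (String, u32) standing in for Cranelift's InstructionData, SourceLoc, and CodeOffset:

```rust
// Placeholder types for illustration only.
type SourceLoc = u32;
type CodeOffset = u32;

trait DebugSink {
    fn insert_inst(&mut self, inst: String, source_loc: SourceLoc, code_offset: CodeOffset);
    fn insert_func(&mut self, name: String, source_loc: SourceLoc, code_offset: CodeOffset);
}

/// Groups instructions under the most recently inserted function, matching
/// the "all instructions after insert_func belong to it" rule.
#[derive(Default)]
struct CollectingSink {
    funcs: Vec<(String, Vec<(String, SourceLoc, CodeOffset)>)>,
}

impl DebugSink for CollectingSink {
    fn insert_inst(&mut self, inst: String, source_loc: SourceLoc, code_offset: CodeOffset) {
        if let Some(current) = self.funcs.last_mut() {
            current.1.push((inst, source_loc, code_offset));
        }
    }
    fn insert_func(&mut self, name: String, _source_loc: SourceLoc, _code_offset: CodeOffset) {
        self.funcs.push((name, Vec::new()));
    }
}
```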
Currently, binemit has its own CodeSink trait for writing binary data. It isn't entirely satisfying, in part because it's an unsafe interface -- it doesn't perform bounds checking on the underlying data. While we can provide relatively safe interfaces to protect users from misusing the API, it's harder to be absolutely certain that the number of bytes compile says a function needs is the number of bytes emit_to_memory actually writes for that function.
One option would be to provide a safe, checked version of MemoryCodeSink. However, it's also worth evaluating available crates that provide low-level byte-buffer writing functionality, including:
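Whichever buffer crate is chosen, the checked-sink idea itself can be sketched as follows (hypothetical API, not binemit's): writes fail loudly instead of running off the end of the buffer.

```rust
/// A bounds-checked byte sink: `put1` returns an error instead of writing
/// past the end of the buffer.
struct CheckedSink<'a> {
    buf: &'a mut [u8],
    offset: usize,
}

impl<'a> CheckedSink<'a> {
    fn new(buf: &'a mut [u8]) -> Self {
        CheckedSink { buf, offset: 0 }
    }

    fn put1(&mut self, byte: u8) -> Result<(), &'static str> {
        if self.offset >= self.buf.len() {
            return Err("code sink overflow");
        }
        self.buf[self.offset] = byte;
        self.offset += 1;
        Ok(())
    }
}
```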
I have encountered at least one bug related to instruction encodings that was pretty tricky to isolate, but easy enough to fix: bytecodealliance/cranelift#211.
It would be helpful to have a test suite that compared Cretonne's encodings against another toolchain (probably LLVM, though GCC could work too). The meta-language could add a way to specify how to translate an instruction into assembly-language text, and the encoding of the instruction could be compared against the output of another assembler. The domain size for each instruction is small enough that this could probably be an exhaustive test.
Opening this issue to prevent polluting the issue list
clz.i16
target x86_64
function %clz_i16(i16) -> i16 fast {
ebb0(v0: i16):
v1 = clz.i16 v0
return v1
}
function %clz_i16(i16 [%rdi]) -> i16 [%rax] fast {
ebb0(v0: i16):
v1 = clz.i16 v0
[Op1ret#c3] return v1
^~~~~~ verifier inst1: v1 is a ghost value used by a real [Op1ret#c3] instruction
}
ireduce i16 -> i8
target x86_64
function %ireduce_i8(i16) -> i8 fast {
ebb0(v0: i16):
v1 = ireduce.i8 v0
return v1
}
function %ireduce_i8(i16 [%rdi]) -> i8 [%rax] fast {
ebb0(v0: i16):
v1 = ireduce.i8 v0
[Op1ret#c3] return v1
^~~~~~ verifier inst1: v1 is a ghost value used by a real [Op1ret#c3] instruction
}
icmp.i8
icmp_imm.i8
ushr.i8
iadd_cout.i8
isub_bout.i8
select.i8
rotl.i8
rotr.i8
Edit: update for bytecodealliance/cranelift#524
Edit2: add isplit.i32 and iconcat.i32
Edit3: removed isplit.i32 and iconcat.i32
librustc_target is a library in rustc for implementing a lot of native ABI and toolchain logic. It has some overlap with target-lexicon, however it provides much more extensive features, especially including knowledge of calling conventions. It's written in a way which is mostly independent from the rest of rustc, so it's an interesting candidate for factoring out into a standalone library that Cranelift users and others could use to more easily integrate with native C ABI/toolchain environments.
This aligns fairly well with cranelift-codegen's rough design for calling conventions, in which it only does the lowest-level parts, and assumes that cranelift-frontend or other libraries will be the place for offering higher-level functionality such as handling struct types.
See this comment for some more details on librustc_target.
This library isn't Cranelift-specific, as other projects could make use of such a library too. That said, it is particularly interesting for use with Cranelift, so I'm posting an issue here so we can track it.
Hit this error when trying to play around with a non-trivial wasm file (stack trace trimmed for readability):
5: std::panicking::begin_panic
at /checkout/src/libstd/panicking.rs:409
6: <wasmtime_environ::environ::ModuleEnvironment<'data, 'module> as cranelift_wasm::environ::spec::ModuleEnvironment<'data>>::declare_table_elements
at lib/environ/src/environ.rs:190
7: cranelift_wasm::sections_translator::parse_elements_section
at /home/froydnj/.cargo/registry/src/github.com-1ecc6299db9ec823/cranelift-wasm-0.18.1/src/sections_translator.rs:387
8: cranelift_wasm::module_translator::translate_module
at /home/froydnj/.cargo/registry/src/github.com-1ecc6299db9ec823/cranelift-wasm-0.18.1/src/module_translator.rs:91
9: wasmtime_environ::environ::ModuleEnvironment::translate
at lib/environ/src/environ.rs:60
10: wasm2obj::handle_module
at src/wasm2obj.rs:135
11: wasm2obj::main
at src/wasm2obj.rs:87
12: std::rt::lang_start::{{closure}}
at /checkout/src/libstd/rt.rs:74
Rationale: icmp, load, and store all have an enumerable immediate field, based on whose value we may want to do something different in the semantics. As a result, we'd like to be able to match a different transform to a concrete piece of RTL, depending on the value of some of the immediate fields. To enable this we need several things:
The readme says that cretonne is supposed to be a code generator for WebAssembly, but it's a bit unclear what that means, seeing how there doesn't appear to be anything to actually do with wasm in the code.
Preferably, the readme or documentation should be able to answer these questions.
Currently, Cranelift IR is always printed with one instruction per line, e.g.:
function %foo(i32, i32, i32) -> i32 {
ebb0(v0: i32, v1: i32, v2: i32):
v3 = imul v0, v1
v4 = iadd v3, v2
return v4
}
What if we introduced some simple syntax sugar for instructions with only one use? It'd be in addition to the existing syntax. We could then (optionally) print that same code like this:
function %foo(i32, i32, i32) -> i32 {
ebb0(v0: i32, v1: i32, v2: i32):
return v0 * v1 + v2
}
That would be much easier to read in many cases, which is of potential interest to cranelift developers, but also to cranelift users looking to understand how cranelift is compiling their code.
This also might make it even more interesting to switch to // comments (#471).
There's some ambiguity with syntax like v0 + 1, but I think we can resolve it by saying that we always use the _imm instruction when possible rather than emitting an iconst.
And there's the question of value numbers for the intermediate values. My rough idea is that they'd just always use the next available value number.
There are other issues to consider too, such as printing srclocs and instruction encodings. But I think we could find reasonable ways to make these work. The main question is, is this idea worth pursuing?
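A toy sketch of the proposed sugar (using a simplified, hypothetical IR representation, not Cranelift's): values defined by an instruction and used exactly once are printed inline at their single use, while everything else keeps its value name.

```rust
use std::collections::HashMap;

/// Render value `v` as an expression, inlining single-use instruction results.
/// `defs` maps a result value to its (operator, operands); values below
/// `num_params` are block parameters and always keep their names.
fn inline_expr(
    v: u32,
    num_params: u32,
    defs: &HashMap<u32, (String, Vec<u32>)>,
    use_counts: &HashMap<u32, usize>,
) -> String {
    // Block params, multi-use values, and undefined values print as names.
    if v < num_params || use_counts.get(&v) != Some(&1) || !defs.contains_key(&v) {
        return format!("v{v}");
    }
    let (op, ops) = &defs[&v];
    let parts: Vec<String> = ops
        .iter()
        .map(|&o| inline_expr(o, num_params, defs, use_counts))
        .collect();
    format!("({} {})", op, parts.join(", "))
}
```

Running this on the `imul`/`iadd` example above folds both instructions into the return expression.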
Docs don't make it particularly clear who owns finalized functions and data returned from SimpleJITBackend.
From the code it looks like they are all pointers into a shared block of memory owned by SimpleJITBackend, but, if so, the definitions don't even try to help with detecting cases where SimpleJITBackend is dropped but the result remains.
One way to avoid this would be to reduce the use of raw pointers in SimpleJITBackend (they are really rather unnecessary hazards in that particular case) and instead return lifetime-annotated structures which can be further cast into the required function type, which would give proper compiler errors when the owner is dropped too early.
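A sketch of the lifetime-annotation idea, with illustrative names rather than the actual simplejit API: tying the returned handle's lifetime to the backend makes use-after-drop a compile error.

```rust
use std::marker::PhantomData;

// Illustrative stand-in for the backend that owns the JIT'd memory.
struct Backend {
    code: Vec<u8>,
}

/// A finalized-function handle whose lifetime is tied to the owning backend,
/// so the borrow checker rejects using it after the backend is dropped.
struct FinalizedFunction<'a> {
    ptr: *const u8,
    _owner: PhantomData<&'a Backend>,
}

impl Backend {
    fn get_finalized_function(&self) -> FinalizedFunction<'_> {
        FinalizedFunction { ptr: self.code.as_ptr(), _owner: PhantomData }
    }
}
```

With this shape, dropping the Backend while a FinalizedFunction is still alive fails to compile, which is exactly the error the raw-pointer API silently permits.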
The verifier functions have an API that could probably be simplified. Functions which return VerifierStepResult must by convention also take an out-param VerifierErrors that will contain non-fatal errors.
Moreover, the T in VerifierStepResult<T> seems to always be set to (), so it's unused.
It seems the out-param is redundant with the error that's present in the Result hidden inside VerifierStepResult. I think slightly modifying the interface of VerifierStepResult would avoid this out-param:
- Let the Ok type be VerifierErrors, in case we only have non-fatal errors (they could be called "warnings").
- Let the Err type stay the same (VerifierErrors too), containing the non-fatal and fatal errors, if there was at least one fatal error.
Then we wouldn't need the errors out-param anymore, which looks cleaner and "more Rusty". It might mean that a few users of the fatal! etc. macros would need their own errors variable, but that seems OK.
Thoughts?
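A sketch of the proposed shape (the names here are illustrative): both sides of the Result carry the error list, and Ok means "warnings only".

```rust
/// Ok carries only non-fatal errors ("warnings"); Err means at least one
/// fatal error occurred, and carries everything collected so far.
#[derive(Default)]
struct VerifierErrors {
    fatal: Vec<String>,
    warnings: Vec<String>,
}

type VerifierResult = Result<VerifierErrors, VerifierErrors>;

/// Convert the collected errors into the proposed Result shape.
fn finish(errors: VerifierErrors) -> VerifierResult {
    if errors.fatal.is_empty() {
        Ok(errors)
    } else {
        Err(errors)
    }
}
```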
#9 proposes adding a dependency on memmap, so we should consider using memmap rather than region for executable memory too.
This will require at least these features:
call_indirect implementation (signature checking, checking the item isn't null, etc.)
I am currently emitting non-atomic versions. This doesn't have a high priority for me.
What does the description mean?
"Standalone JIT-style runtime support for WebAssembly code in Cranelift"
Can I use this project as a substitute for wasmi? Like instantiating a module providing imports (satisfied by other modules and/or host functions), and executing exports.
Or is it for running wasm executables (something like cervus)?
It doesn't seem useful to give wasm code references to data structures on the native stack because:
Is this correct, or is there some mechanism I'm overlooking?
Thanks!
Of the test files, only call does not have a memory initializer. I'd like to see this implemented, or at least stubbed out, so I can examine the object files produced.
RUST_BACKTRACE=1 cargo run --bin wasm2obj -- memory.wasm -o memory.o
Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
Running `target/debug/wasm2obj memory.wasm -o memory.o`
%wasm_0x0(0): relocs: []
error: FIXME: implement data initializers
How much work would it be to implement this, or at least add stubs so it doesn't stop the entire binary emission process?
Suppose we want to compare 8-bit ints on a 32-bit RISC:
widen32.legalize(
a << icmp('ult', x, y),
Rtl(
wx << uextend.i32(x),
wy << uextend.i32(y),
a << icmp('ult', wx, wy),
))
We want to generalize this pattern, but this transformation is only valid for the unsigned or sign-neutral condition codes, so this is wrong:
widen32.legalize(
a << icmp(cc, x, y),
Rtl(
wx << uextend.i32(x),
wy << uextend.i32(y),
a << icmp(cc, wx, wy),
))
We need a way of specifying a predicate on the immediate cc. Ideally, this mechanism should share its representation with the instruction predicates already supported by instruction encodings.
(Also note that the first example doesn't work either; we can't even require a fixed immediate value in the input pattern.)
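The restriction can be checked in plain Rust: zero-extension preserves unsigned comparisons but not signed ones, which is why the rewrite is only valid for unsigned or sign-neutral condition codes.

```rust
// Unsigned compare of narrow values: still correct after zero-extension.
fn ult_narrow(x: u8, y: u8) -> bool {
    x < y
}
fn ult_widened(x: u8, y: u8) -> bool {
    (x as u32) < (y as u32) // uextend, then 32-bit unsigned compare
}

// Signed compare: zero-extending (rather than sign-extending) breaks it.
fn slt_narrow(x: u8, y: u8) -> bool {
    (x as i8) < (y as i8)
}
fn slt_widened_wrong(x: u8, y: u8) -> bool {
    (x as u32 as i32) < (y as u32 as i32) // uextend loses the sign
}
```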
Hello,
You are probably well aware, but some mainstream compilers are emitting retpolines to help mitigate Spectre variant 2 attacks. Do you have any plans to add a similar capability to the cretonne code generator (and/or do you think it makes sense for cretonne to do this sort of thing)?
Thanks,
Jon
The regmove/regspill/regfill instructions are special. They're emitted by the coloring pass to temporarily override the assigned registers (func.locations) for values. This is surprising, and it means that very late passes which run after register allocation can't just look at func.locations; they have to use RegDiversions and walk the IR in order to know what registers are assigned to what values.
They also currently require special-casing to work around this issue.
It would be nice to find a way to either avoid using regmove/regspill/regfill, or to rewrite them once coloring is done, so that func.locations is left up to date.
~/Desktop/rust/wasmstandalone $ cd $(mktemp -d)
/tmp/tmp.rwXoFoGNoi $ cargo new --lib test-wasm
Created library `test-wasm` project
/tmp/tmp.rwXoFoGNoi $ echo -e '\n[lib]\ncrate-type = ["cdylib"]' >> test-wasm/Cargo.toml
/tmp/tmp.rwXoFoGNoi $ echo -e '#[no_mangle]\npub fn nop() {}' > test-wasm/src/lib.rs
/tmp/tmp.rwXoFoGNoi $ (cd test-wasm && cargo rustc --target wasm32-unknown-unknown --release)
Compiling test-wasm v0.1.0 (file:///tmp/tmp.rwXoFoGNoi/test-wasm)
Finished release [optimized] target(s) in 0.38s
/tmp/tmp.rwXoFoGNoi $ cd -
/home/aidanhs/Desktop/rust/wasmstandalone
~/Desktop/rust/wasmstandalone $ cargo run --bin wasmstandalone -- /tmp/tmp.rwXoFoGNoi/test-wasm/target/wasm32-unknown-unknown/release/test_wasm.wasm
Finished dev [unoptimized + debuginfo] target(s) in 0.07s
Running `target/debug/wasmstandalone /tmp/tmp.rwXoFoGNoi/test-wasm/target/wasm32-unknown-unknown/release/test_wasm.wasm`
error while processing /tmp/tmp.rwXoFoGNoi/test-wasm/target/wasm32-unknown-unknown/release/test_wasm.wasm: Verifier error: inst18: Call must have an encoding
In order to compile C code that calls printf in libc, we need to implement the caller side of varargs. And in order to compile printf itself, we need to implement the callee side.
If SIP really does isolate, then perhaps wasm code shouldn't be able to smash the native stack.
Currently Cretonne expects wasm and native code to share the stack so for example wasm code can call an imported native function.
What I'd like:
If an imported fn's return type is i32, then perhaps this could be marshalled using a register, and a few assembler instructions could take care of the return value and the prologue/epilogue modifying the stack pointer.
Is this a crazy idea?
Thanks!
DWARF line table support may be a good way to get started with the broader topic of producing debug info, as it's relatively straightforward, and quite valuable.
Briefly surveying the landscape, the three main options appear to be:
(There is also the dwarf crate, which has rudimentary support for writing, though not yet for writing .debug_line sections. Judging by this comment, it seems to be no longer maintained, in favor of gimli.)
Regardless of how we implement it, the code for consuming cretonne IR and emitting DWARF should be in a new crate, as other users of the cretonne-codegen crate won't need it. It will need to emit the binary data for the section, plus relocation records telling the linker where to fix up program addresses.
The first step is to write a minimal .debug_info section, containing a DW_TAG_compile_unit and a DW_TAG_subprogram entry for each function. Chapter 2 "General Description" of the DWARF spec defines the overall structure of these. I also recommend the dwarfdump utility and/or readelf --debug-dump=info,lines for examining DWARF output from other compilers to get a sense of what this needs to look like. Then, we can implement the line table. Section 6.2 "Line Number Information" defines the line table format. I'll flesh out these steps more when we're ready; right now I'm just sketching out the major areas that would be covered.
Since Cranelift is soon to be a backend for Rust, it will need to support inline assembly. There is no good way to solve this right now, since Rust uses the LLVM inline asm syntax. I'm filing this issue so we can think about this in the long term.
I've wanted to play with a fuzzing tool for a long time, so I brought up cargo-fuzz on cretonne. I think this could be a useful complement to the test suite, but I want to do the following before generating a PR: