
webassembly / reference-types


Proposal for adding basic reference types (anyref)

Home Page: https://webassembly.github.io/reference-types/

License: Other

Makefile 0.51% Python 2.06% Batchfile 0.53% CSS 0.03% Shell 0.32% HTML 0.08% OCaml 6.68% Standard ML 0.01% WebAssembly 83.29% JavaScript 6.13% Perl 0.38% TeX 0.01%
proposal

reference-types's People

Contributors

alexcrichton, andrewscheidecker, backes, binji, bnjbvr, cellule, chfast, chicoxyzzy, dschuff, eqrion, fitzgen, flagxor, gahaas, gumb0, honry, jfbastien, kg, kripken, lars-t-hansen, littledan, lukewagner, ms2ger, ngzhian, pepyakin, pjuftring, ppopth, rossberg, sunfishcode, swasey, xtuc

reference-types's Issues

Typed select's return type representation

This proposal introduces a typed select instruction, which takes reference types and also a return type. The return type is written as a vector of valtype. Currently only the vector of length 1 is accepted in the v8 validation. I guess the reason it is a vector is for the multivalue proposal, right?

Is there a reason this has a different representation from that of block, loop, and if type when there are multiple types? blocktype for the multivalue proposal says it is extended to have a type index in the presence of multi values. This will be applied to other instructions such as loops and ifs. Wouldn't it be better for select to be consistent?

cc @tlively

eqref imposes restrictions on host language implementations

As mentioned at the in-person meeting, this makes me uncomfortable.

I think we can get better functionality and similar performance without requiring embedders to use any specific implementation strategy.

Rather than mandating reference equality, the core spec could say ref.eq calls an embedding specific equality function. The JS/web spec would define that function to be strict equals. Implementations of course would then be free to inline the strict equals function in the same way they do for JS.

This has concrete advantages over the current spec: it allows strings (and all other JS types/DOM objects) to be compared. It also removes the need for eqref altogether, which requires extra type checks/marshalling on the JS/wasm boundary.
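The suggestion above can be sketched in JavaScript as a minimal model. The names `makeRefEq` and `jsRefEq` are hypothetical and appear in no spec; the sketch only assumes that the core semantics would delegate to an embedder-supplied function and that the JS embedding would supply strict equality.

```javascript
// Hypothetical model: ref.eq delegates to a function supplied by the
// embedder instead of mandating reference identity in the core spec.
function makeRefEq(embedderEquals) {
  // Return 1 or 0, mirroring the i32 result of a wasm comparison.
  return (a, b) => (embedderEquals(a, b) ? 1 : 0);
}

// Assumption: the JS/web embedding plugs in strict equality (===).
const jsRefEq = makeRefEq((a, b) => a === b);

const obj = {};
const sameObject = jsRefEq(obj, obj);        // one object compared with itself
const sameString = jsRefEq("wasm", "wasm");  // strings compare by value under ===
const distinctObjects = jsRefEq({}, {});     // two separate objects

console.log(sameObject, sameString, distinctObjects);
```

Under this model strings become comparable without any dedicated eqref type, which is the advantage the issue argues for.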

select and br_table spec tests are inconsistent

The latest spec tests seem to produce invalid JS code, e.g. in select.wast

(assert_return (invoke "join-funcref" (i32.const 1)) (ref.func))

(assert_return (invoke "join-anyref" (i32.const 1) (ref.host 1)) (ref.func))

The ref.func is unknown by JS (ReferenceError: ref is not defined) and therefore produces an error. Instead, the result should be compared to the $dummy function reference which needs to be made accessible.

Similarly in br_table.wast:

(assert_return (invoke "meet-funcref-1" (i32.const 0)) (ref.func))
(assert_return (invoke "meet-funcref-1" (i32.const 1)) (ref.func))
(assert_return (invoke "meet-funcref-1" (i32.const 2)) (ref.func))
(assert_return (invoke "meet-funcref-2" (i32.const 0)) (ref.func))
(assert_return (invoke "meet-funcref-2" (i32.const 1)) (ref.func))
(assert_return (invoke "meet-funcref-2" (i32.const 2)) (ref.func))
(assert_return (invoke "meet-funcref-3" (i32.const 0)) (ref.func))
(assert_return (invoke "meet-funcref-3" (i32.const 1)) (ref.func))
(assert_return (invoke "meet-funcref-3" (i32.const 2)) (ref.func))
(assert_return (invoke "meet-funcref-4" (i32.const 0)) (ref.func))
(assert_return (invoke "meet-funcref-4" (i32.const 1)) (ref.func))
(assert_return (invoke "meet-funcref-4" (i32.const 2)) (ref.func))

Also, the join-funcref test should maybe be using anyref as type instead of funcref:

(func (export "join-funcref") (param i32) (result anyref)
  (select (result anyref)
    (table.get $tab (i32.const 0))
    (ref.null)
    (local.get 0)
  )
)

Tests and spec disagree about call_indirect table specifier

test/core/ref_func.wast seems to think the table comes first, before the sig. e.g:

(call_indirect $t (param i32) (result i32) (local.get $x) (i32.const 0))

  (func (export "call-g") (param $x i32) (result i32)
    (table.set $t (i32.const 0) (ref.func $g))
    (call_indirect $t (param i32) (result i32) (local.get $x) (i32.const 0))
  )

Whereas the spec says it comes after:

call_indirect (type $t) $x : [t1* i32] -> [t2*]
iff $t = [t1*] -> [t2*]
and $x : table t'
and t' <: funcref

wabt also seems to have implemented the latter: https://github.com/WebAssembly/wabt/blob/a147d92575d386ef45c75a3e492c8ca4d33a3bbc/test/parse/expr/reference-types-call-indirect.txt#L13

I hope the tests, rather than the spec, can be updated, since adding an extra optional parameter seems to be much simpler to implement in the parser.

Remove type annotation on ref.is_null

Before we removed anyref, the ref.is_null instruction had a canonical type:

ref.is_null : [anyref] -> [i32]

One piece of the fallout from removing anyref was that this no longer worked. In order to avoid a dependency on the outcome of the wider discussion opened in WebAssembly/function-references#27, I added a type annotation on the instruction, so that it became

ref.is_null <reftype> : [<reftype>] -> [i32]

(with the understanding that the <reftype> would later be refined to a <heaptype> as per the typed (function) references proposal).

However, given that the discussion on WebAssembly/function-references#27 seems to show a common sentiment to avoid redundant type annotations -- especially considering the many more affected instructions added in something like the GC proposal -- it would be unfortunate if ref.is_null became an outlier. And having adapted all the tests, I can say that it is quite annoying in practice, too (ref.null is tedious enough already).

So I propose removing the annotation and changing the instruction to

ref.is_null : [<reftype>] -> [i32]

such that a linear validator simply has to check that there is some <reftype> on the stack.

Thoughts?

Include table.size

Since table.grow will need to take a default argument to use for initialization, table.grow(0) is not a plausible mechanism for obtaining a table's length. So we should include a mechanism for that. Since it's memory.size (nee current_memory), let it be table.size.
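For comparison, the JS API already exposes the size directly through the `length` property of `WebAssembly.Table`, so JS callers never need a grow-by-zero trick. A short sketch, using the `"anyfunc"` element kind from the JS API of that time:

```javascript
// WebAssembly.Table is part of the standard JS API; "anyfunc" is the
// element kind name used when this issue was written.
const table = new WebAssembly.Table({ element: "anyfunc", initial: 2 });

const before = table.length;    // read the size without growing
const previous = table.grow(3); // grow returns the previous length
const after = table.length;     // size after growing by 3

console.log(before, previous, after);
```

A `table.size` instruction would give code inside a module the same direct read that `length` gives to JS.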

[js-api] Update table API

Table stuff in the JS API needs to be reworded to account for general references.

Also, table constructor and grow method need optional init argument.

Make funcref not a subtype of anyref

(This idea came up after yesterday's discussion about the GC extension. I have tried to describe it here in a self-contained manner, but let me know if there are any terms I forgot to define or motivations I forgot to provide.)

Having funcref be a subtype of anyref forces the two to have the same register-level representation. Yet there are good reasons why an engine might want to represent a function reference differently than an arbitrary reference. For example, function references might always be an assembly-code pointer paired with a module-instance pointer, effectively representing the assembly code compiled from a wasm module closed over the global state of the specific instance the function reference was created from. If so, it might make sense for an engine to use a fat pointer for a function reference. But if funcref is a subtype of anyref, and if it overall makes sense for arbitrary references to be implemented with normal-width pointers, then that forces function references to be implemented with normal-width pointers as well, causing an otherwise-avoidable additional memory-indirection in every indirect function call.

Regardless of the reason, by making funcref not a subtype of anyref, we give engines the flexibility to represent these two types differently (including the option to represent them the same). Instead of subtyping, we could have a convert instruction that could take a function reference and convert it into an anyref representation, or more generally could convert between "convertible" types. The only main benefit of subtyping over conversion in a low-level type system is its behavior with respect to variance, such as co/contravariance of function types, but I see no such application for funcref and anyref. And in the worst case, we could always make funcref a subtype of anyref later if such a compelling need arises.

Should WebAssembly support anyref globals?

Intuitively I understood anyref as a new value type. Therefore I expected anyref to be allowed in signatures, locals, and globals. The current proposal does not mention globals though. I wonder if we should support anyref globals, just for symmetry to the other value types.

Missing design rationale

I am missing a design rationale for this proposal in the same vein as the design rationale documents for the Wasm specification that can be found here: https://webassembly.org/docs/rationale/

One of the questions I had is why this document proposes to add the questionable null reference and all the infrastructure (is_null) around it. To me, coming from a C++ background, it is quite confusing because semantically references should be assumed to always point to something, i.e., they are non-nullable pointers. Rust has the same concept. Pointers in these languages, on the other hand, are nullable. However, there is also this interesting quote from the original designer of the null pointer where he refers to it as a billion-dollar mistake: https://www.infoq.com/presentations/Null-References-The-Billion-Dollar-Mistake-Tony-Hoare/

Which makes people wonder again why we would want to repeat this in Wasm.
Note that I am not completely against adding nullable pointers to the Wasm spec; I just really need a rationale for why we cannot or do not want to go without them.

There will be many more of those questions for upcoming Wasm proposals. Maybe we should enforce a dedicated rationale doc or section for all of them to quickly explain certain designs.

Finalize opcode encodings

As part of the effort to drive both this proposal and the bulk memory proposal toward shipping status, let's nail down the opcode encodings. (The bulk memory proposal depends on what we choose for ref.null and ref.func, since those are used to express passive element segments.)

The spec interpreter in this repo has some TODO comments around the opcode encodings, and some opcodes are missing from the interpreter at present, and all proposed opcodes are precious single-byte ones.

From the interpreter in this repo we have:

0x25 == table.get
0x26 == table.set
0xd0 == ref.null
0xd1 == ref.is_null
0xd2 == ref.func

From the bulk memory proposal we have these proposed codes:

0xfc 0x08 == memory.init
0xfc 0x09 == data.drop
0xfc 0x0a == memory.copy
0xfc 0x0b == memory.fill
0xfc 0x0c == table.init
0xfc 0x0d == elem.drop
0xfc 0x0e == table.copy

In addition we need opcodes in this proposal for table.grow, table.size, and (possibly) table.fill.

We are in somewhat short supply of single-byte opcodes, so I propose that we (a) change the encoding of ref.func since it is not likely to be a very common opcode, and (b) allocate prefixed opcodes also for the three table operations mentioned above, yielding the following table for the present proposal:

0x25 == table.get
0x26 == table.set

0xd0 == ref.null
0xd1 == ref.is_null

0xfc 0x0f == table.grow
0xfc 0x10 == table.size
0xfc 0x11 == table.fill

0xfc 0x20 == ref.func

with the idea that 0xfc 0x20 can be the start of the group for multi-byte gc/reftypes operations, and 0xd0 remains the start of the group for single-byte gc operations.
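The proposed assignment can be written down as a small lookup table to make the byte cost visible. This is only a sketch of the encodings proposed in this issue, not necessarily the final ones; `proposedOpcodes` and `encodingSize` are hypothetical names.

```javascript
// Opcode bytes exactly as proposed in this issue (subject to change).
const proposedOpcodes = {
  "table.get": [0x25],
  "table.set": [0x26],
  "ref.null": [0xd0],
  "ref.is_null": [0xd1],
  "table.grow": [0xfc, 0x0f],
  "table.size": [0xfc, 0x10],
  "table.fill": [0xfc, 0x11],
  "ref.func": [0xfc, 0x20],
};

// Number of opcode bytes an instruction occupies, excluding immediates.
function encodingSize(name) {
  return proposedOpcodes[name].length;
}

// Count how many single-byte opcodes the proposal would consume.
const singleByteCount = Object.keys(proposedOpcodes)
  .filter((name) => encodingSize(name) === 1).length;

console.log(singleByteCount, encodingSize("ref.func"));
```

With this layout the proposal consumes four single-byte opcodes instead of five, at the cost of one extra byte per ref.func.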

@rossberg @binji @lukewagner @titzer, opinions?

[js-api] Formalization only: eliminate extern value cache

Nothing in here is meant to change the behavior of the proposal. It is just a suggestion on how to formalize the existing behavior.

The extern value cache doesn't seem to serve a purpose.

One purpose I can imagine is as a formalization device: it associates a natural number that represents a JS object. But that purpose seems addressable by letting externaddr in the core spec be an abstract set of values specified by the embedder, in which case ref.extern could take a JS object as its argument. (That's a matter of taste though, so I'm happy to leave it as is.)

But the extern value cache also seems to be ensuring that every JS object has at most one natural number associated with it as well as determinizing what that number is. Since this number has no way of leaking into either wasm or JS (or at least I would think it shouldn't), this seems unnecessary.

Problematic sentinel value with table.grow

Surprise, using a sentinel value (-1) for memory.grow is coming back to bite us. table.grow should preferably behave analogously, and the proposal currently says that table.grow also returns -1 (i.e., 2^32-1) in case of error. However, that does not really make sense, because we allow table sizes to span the entire u32 value range.

I can see these possible options:

  1. Leave as is. Pro: symmetry with memory.grow; cons: quite a wart and sets a bad precedent.

  2. Keep -1 but disallow 2^32-1 as a table size. Pro: symmetry with memory.grow; cons: weird discontinuity, arguably backwards as a "fix", technically a breaking change

  3. Use zero as a sentinel value. Pro: less arbitrary, in practice you never want to grow by 0; cons: still a bit hacky.

  4. Always return previous size. Pro: no sentinel; cons: checking for error (via comparing with table.size afterwards) requires more work and is potentially racy in a future with shared tables.

  5. Return success status instead of previous size. Pro: simple and simplest to use; cons: getting old size would require separate call to table.size, which again is potentially racy

  6. Use multiple results, success status + old size. Pro: serves all purposes; cons: more complex, creates dependency on multi-value proposal

  7. Return an i64. Pro: solves problem, but only for the "wasm32" equivalent of tables; cons: does not avoid sentinel, coherence would suggest to make table size an i64 everywhere

@binji, @lukewagner, @lars-t-hansen, @titzer, I am probably leaning towards option 5 (Boolean result), but would like to hear your opinions. Is there a strong use case for getting the old size atomically? In general, I would be inclined to avoid sentinel values.
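The ambiguity behind option 1 can be reproduced with a few lines of arithmetic. The sketch below is a hypothetical model (`modelGrow` and the `{ size, max }` object are made up for illustration); it only assumes that the result is an old size or the sentinel, both read as u32.

```javascript
const MAX_U32 = 2 ** 32 - 1;
const SENTINEL = -1 >>> 0; // the i32 value -1 reinterpreted as a u32

// Hypothetical model of option 1: return the old size, or the sentinel
// when the table cannot grow by `delta`.
function modelGrow(table, delta) {
  const oldSize = table.size;
  if (oldSize + delta > table.max) return SENTINEL;
  table.size = oldSize + delta;
  return oldSize;
}

// A table that is already at the largest size a u32 can express.
const fullTable = { size: MAX_U32, max: MAX_U32 };

const successResult = modelGrow(fullTable, 0); // succeeds, old size is 2^32-1
const failureResult = modelGrow(fullTable, 1); // fails, returns the sentinel

console.log(successResult === failureResult);
```

Both calls return the same number, so a caller cannot tell the successful grow from the failed one.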

Can anyref and funcref be null?

While the current proposal allows them to be null, wouldn't it be better to have a set of refs that can be null and a set of refs that cannot, so that call_indirect on a func ref doesn't require a null check, etc.?

Coordinate with bulk table ops / clarify encoding of the table index

This proposal introduces multiple tables, a new thing. This means that some existing operations must be expanded (call_indirect needs to carry an optional table index, although it has space for this) and that some proposed operations must take the multi-table case into account (table.copy, table.init). We should be sure to harmonize with the bulk memory proposal now.

The bulk table operations have a (misnamed) memory varu32 operand that must currently be zero and can be repurposed as a flags field /or/ as a table index field, see here, depending on how we like it. For table.copy there can be two table indices, so a flags field is probably more or less inevitable for that instruction.

I propose that we uniformly use a varu32 flags field for all the table operations: table.get, table.set, table.grow, table.fill, table.size, table.copy, and table.init (table.drop is arguably misnamed, but that's a discussion for elsewhere), and that this field is either zero, indicating default operands (table zero for every instruction except table.copy; src=0 and dest=0 for table.copy), or a bit indicating that a table index follows after the flag word, or, for table.copy, that two indices follow, for dest and source.

Normally, having a flags field will add zero overhead when using table zero, and one byte of overhead when using other tables.

For prototype code I'm writing for Firefox I've gone with the flag value 0x04 to signify the presence of table indices, to fit in with the other flags that we currently use for memories and tables (0x00=Default, 0x01=HasMaximum, 0x02=IsShared).
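A rough sketch of what such an encoder could look like follows. The flag value 0x04 comes from the Firefox prototype mentioned above; the function names and the exact byte layout are assumptions made for illustration, not a settled format.

```javascript
// Unsigned LEB128 encoding of an integer in the range 0 .. 2^32-1.
function encodeVarU32(value) {
  const bytes = [];
  do {
    let byte = value & 0x7f;        // low seven bits
    value >>>= 7;
    if (value !== 0) byte |= 0x80;  // continuation bit
    bytes.push(byte);
  } while (value !== 0);
  return bytes;
}

const HAS_TABLE_INDEX = 0x04; // flag value used in the prototype

// Hypothetical encoder: a lone zero flag word when every index is the
// default table 0, otherwise the flag followed by the indices in order
// (dest then source for table.copy).
function encodeTableOperand(...tableIndices) {
  if (tableIndices.every((index) => index === 0)) return encodeVarU32(0);
  return [
    ...encodeVarU32(HAS_TABLE_INDEX),
    ...tableIndices.flatMap((index) => encodeVarU32(index)),
  ];
}

const defaultTable = encodeTableOperand(0);  // e.g. table.get on table 0
const otherTable = encodeTableOperand(1);    // e.g. table.get on table 1
const copyTables = encodeTableOperand(2, 0); // table.copy dest=2, src=0

console.log(defaultTable, otherTable, copyTables);
```

In this model the default case stays a single zero byte, matching the reserved operand that exists today.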

@rossberg, opinions? We don't have to use a flags field everywhere but we probably have to use it for the bulk table operations, which strongly suggests using one for table.grow and table.fill and table.size; table.get and table.set and call_indirect might reasonably be considered a separate class of instructions. The cost of using a flags field there seems slight. We can choose not to do it, but then any future extensions that could have used a flag will require new instructions or out-of-band encodings in the table index.

[js-api] WebAssembly.Table.prototype.grow and a fill value

The envisioned table.grow instruction takes a required fill value as a second argument. We should similarly change WebAssembly.Table.prototype.grow to accept a fill value; indeed, for non-nullable element types we must have a fill value. Here are some notes about that.

Backward compatibility concerns when the table is table-of-anyfunc:

  • the second argument must be optional
  • the default value of the optional second argument must be null
  • if the second argument is present but is not a function value that can be stored, then it must be ignored

For table-of-anyref it is probably most correct if the default value for the optional second argument is null, not undefined, since null is a value that is in wasm, unlike undefined. (This matters only because undefined is representable as an anyref value and is thus a candidate at least in principle.) FWIW, the table.grow instruction cannot talk about undefined values without getting them from the host, but it can synthesize null. A JS caller to WebAssembly.Table.prototype.grow() can pass an undefined value explicitly as the second argument to initialize new slots with that value.

Long-term it's inevitable that the behavior of W.T.p.grow depends on the table type anyway; for non-nullable element types there can be no default, for example. We could choose now to require the second argument also for anyref tables, we would just have to leave anyfunc tables as a special case.
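The three backward-compatibility rules for table-of-anyfunc can be modeled in a few lines. This is a hypothetical sketch over a plain array, not the real `WebAssembly.Table` implementation; `growAnyfunc` and the `isStorable` predicate are invented names standing in for "a function value that can be stored".

```javascript
// Hypothetical model of the proposed grow behavior for an anyfunc table.
// `slots` is a plain array standing in for the table contents.
function growAnyfunc(slots, delta, fill, isStorable) {
  // The fill is optional, defaults to null, and is ignored (treated as
  // null) when it is not a storable function value.
  const value = fill !== undefined && isStorable(fill) ? fill : null;
  const oldLength = slots.length;
  for (let i = 0; i < delta; i++) slots.push(value);
  return oldLength; // like the existing grow, return the previous length
}

// Assumption for the demo: only this one function counts as storable.
const storableFn = () => 42;
const isStorable = (v) => v === storableFn;

const slots = [];
growAnyfunc(slots, 1, undefined, isStorable);     // no fill: null
growAnyfunc(slots, 1, "not a func", isStorable);  // unstorable fill: ignored
growAnyfunc(slots, 1, storableFn, isStorable);    // storable fill: used

console.log(slots.map((v) => (v === null ? "null" : "fn")));
```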

Avoid the need for computing lubs/glbs

With subtyping, and lacking sufficient type annotations, some operations may require the validation algorithm to compute the least upper bound (lub) or greatest lower bound (glb) of two types to accurately infer an output or input type, respectively. In current Wasm, this comes up with two instructions:

  • select: the result type is the lub of the two operand types
  • br_table: the operand type is the glb of the label types

(Similar issues would come up with ops like dup or pick, which might also require a glb of the use edges when the input type is not known a priori, i.e., in unreachable code.)

While lub and glb are easy to compute with the tiny subtyping lattice introduced by this proposal itself, it will not stay that way with future reference extensions (e.g., typed functions or GC types), which will make it both more complex and more costly. In accordance with the design goals for Wasm validation, we should make sure to avoid the need for computing lubs/glbs altogether.

Possible Solution:

  • For select, the only option seems to be introducing a new type-annotated version, and restricting the pre-existing unannotated version to numeric types for backwards compatibility (fortunately, only trivial subtyping is available on numeric types).

  • Without going into technical details, the glb cases (br_table, dup, pick) can most easily be avoided by adding a bottom type to the type system (a least type in the subtype lattice). Effectively, this is already present in the MVP validation algorithm, to type unreachable stack slots; promoting it to a proper type itself is a natural generalisation in the presence of subtyping. (Note that this type need not be expressible in programs; it's sufficient if it exists in the typing rules.)
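To make the cost concrete, here is a minimal sketch of a lub computation for an unannotated select. The lattice is an assumption chosen for illustration (only funcref <: anyref, with numeric types unrelated to everything else), and `directSupertype`, `isSubtypeOf` and `lub` are hypothetical names.

```javascript
// Assumed lattice: funcref <: anyref; all other types are unrelated.
const directSupertype = new Map([["funcref", "anyref"]]);

// True if `a` equals `b` or `b` is reachable by walking up from `a`.
function isSubtypeOf(a, b) {
  for (let t = a; t !== undefined; t = directSupertype.get(t)) {
    if (t === b) return true;
  }
  return false;
}

// Least upper bound: the first type on the chain above `a` that is
// also a supertype of `b`, or undefined if there is none.
function lub(a, b) {
  for (let t = a; t !== undefined; t = directSupertype.get(t)) {
    if (isSubtypeOf(b, t)) return t;
  }
  return undefined;
}

console.log(lub("funcref", "anyref"), lub("i32", "i32"), lub("i32", "f32"));
```

A single chain keeps this walk trivial. With structural function or GC types the same question requires comparing type definitions, which is the cost a type-annotated select avoids.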

Should local.tee / br_if preserve subtypes?

Currently, local.tee validation requires C.locals[x] = t and br_if validation requires C.labels[l] = [t?]. This causes the following two examples to fail validation in the spec interpreter when intuitively they seem valid:

(module
  (func (param $p funcref)
    (local $x anyref)
    (local $y funcref)
    local.get $p
    local.tee $x
    local.set $y
  )
)
(module
  (func (param $p funcref)
    (block $b (result anyref)
      (block $c (result funcref)
        local.get $p
        (br_if $b (i32.const 1))
        br $c
      )
    )
    drop
  )
)

If instead the = was replaced with <: in the abovementioned validation rules, then I think these examples would validate.

Trying to think if this actually matters, you could imagine that it would be useful to have the property that:

  (local.set $local1 (local.get $x))
  (local.set $local2 (local.get $x))

was always equivalent to:

  (local.tee $local1 (local.get $x))
  local.set $local2

so that a size-optimizer could simply recognize and replace this pattern.
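The difference between the two rules can be modeled for the first example above. This is a hypothetical sketch, not the spec interpreter's algorithm: it assumes funcref <: anyref is the only non-trivial subtyping and tracks only the type that local.tee leaves on the stack.

```javascript
// Assumption: the only non-trivial subtyping is funcref <: anyref.
function teeSubtype(a, b) {
  return a === b || (a === "funcref" && b === "anyref");
}

// Model `local.tee $x` followed by `local.set $y` on one operand.
// Under the current rule (=) the tee pushes the declared type of $x;
// under the relaxed rule (<:) it keeps the operand's own type.
function validateTeeThenSet(operandType, teeLocalType, setLocalType, relaxed) {
  if (!teeSubtype(operandType, teeLocalType)) return false;
  const stackType = relaxed ? operandType : teeLocalType;
  return teeSubtype(stackType, setLocalType);
}

// First example: param funcref, tee into an anyref local, set a funcref local.
const currentRule = validateTeeThenSet("funcref", "anyref", "funcref", false);
const relaxedRule = validateTeeThenSet("funcref", "anyref", "funcref", true);

console.log(currentRule, relaxedRule);
```

Under the current rule the tee widens the value to anyref, so the following local.set into a funcref local is rejected; the relaxed rule keeps funcref and accepts it.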

Asymmetric handling of br_table and br_if with bot type

With the recent change to introduce the bottom type, the validation rules for br_table changed. However, br_if did not get changed accordingly.

Consider the following two functions, the first function uses br_table, with one additional block to simulate the fall-through of br_if. The second function uses br_if directly.
(func (result f32)
  (block (result f32)
    (block (result i32)
      unreachable
      br_table 0 1 1
    )
    f32.convert_i32_u
  )
)

(func (result f32)
  (block (result f32)
    unreachable
    br_if 0
    f32.convert_i32_u
  )
)

The function with br_table validates, because the result type of the br_table instruction is BOT. The function with br_if, however, does not validate. For br_if the result type of the fall-through case is not considered when its result type is calculated.

Should we adjust the validation of br_if to match the validation of br_table?

Merge conflicts with spec master

There are merge conflicts between this repo and the spec master branch:

CONFLICT (content): Merge conflict in test/core/select.wast
CONFLICT (content): Merge conflict in test/core/linking.wast
CONFLICT (content): Merge conflict in test/core/imports.wast
CONFLICT (content): Merge conflict in test/core/globals.wast
CONFLICT (content): Merge conflict in test/core/exports.wast
CONFLICT (content): Merge conflict in test/core/br_table.wast
CONFLICT (content): Merge conflict in test/core/binary.wast
CONFLICT (content): Merge conflict in interpreter/valid/valid.ml
CONFLICT (content): Merge conflict in interpreter/text/parser.mly
CONFLICT (content): Merge conflict in interpreter/text/lexer.mll
CONFLICT (content): Merge conflict in interpreter/text/arrange.ml
CONFLICT (content): Merge conflict in interpreter/syntax/types.ml
CONFLICT (content): Merge conflict in interpreter/syntax/operators.ml
CONFLICT (content): Merge conflict in interpreter/syntax/ast.ml
CONFLICT (content): Merge conflict in interpreter/script/js.ml
CONFLICT (content): Merge conflict in interpreter/runtime/memory.mli
CONFLICT (content): Merge conflict in interpreter/runtime/memory.ml
CONFLICT (content): Merge conflict in interpreter/exec/eval_numeric.ml
CONFLICT (content): Merge conflict in interpreter/exec/eval.ml
CONFLICT (content): Merge conflict in interpreter/binary/encode.ml
CONFLICT (content): Merge conflict in interpreter/binary/decode.ml
CONFLICT (content): Merge conflict in interpreter/README.md

This prevents the testsuite mirror's update script from updating the reference-types tests.

wasm null should be mapped to JS undefined, not JS null

As in any interlanguage mapping, no mapping is perfect but some are more useful than others.
NOTE: It may already be too late to make this change, even if we agree it would have been a good idea.

Wasm null is used as the uninitialized value of nullable types. For nullable types, null will be the unique platform-supported value to indicate that the expected value is absent. This corresponds most closely to the JS undefined rather than the JS null. In the ES specification itself, undefined is consistently treated as the indicator of absence:

var x;  // x === undefined
let y;  // y === undefined
function foo(x) { console.log(x); }
foo();  // unbound parameters bound to undefined
foo();  // non-return returns undefined
({}).z;   // absent properties read as undefined
function bar(x = 3) { return x; }
bar();  // 3. unbound parameters use default value
bar(undefined);  // 3. parameters bound to undefined act as if unbound
bar(null);  // null. null does not emulate unbound
(new Map()).get("x");  // undefined indicates absence

Ability to detach references or move semantics?

I understand this is very late in this proposal's lifespan, so likely nothing can be changed at this point, but I'm wondering about whether move semantics were ever considered for references.

For example, in a situation where you're opening and closing a reference to a file (like in this example in the type imports proposal), you are required to have a "zombie" state where the file is closed internally, but there may still be references to the File alive. If some sort of move semantics were allowed, where you had an owning ref, and a non-owning ref, this kind of thing could be avoided.

I haven't really thought through whether this is possible without a really strong type-system (and lifetimes or something similar), so really I'm just wondering if there is prior discussion about something along these lines.

Inadvertent exporting of Wasm functions

The anyref/anyfunc types allow a function to be exported from a WASM module dynamically, as part of the returned value of another exported function. Similarly for imports.
This effectively blows apart any discipline for what is exported from a WASM module. It also complicates implementation because potentially every WASM function may become accessible from JS.

The table.set operator in conjunction with indirect calling of functions from any table allows monkey-patching of functions.

Merge conflicts with spec master

There are merge conflicts between this repo and the spec master branch:

Auto-merging interpreter/text/parser.mly
CONFLICT (content): Merge conflict in interpreter/text/parser.mly
Auto-merging interpreter/text/lexer.mll
CONFLICT (content): Merge conflict in interpreter/text/lexer.mll
Auto-merging interpreter/text/arrange.ml
CONFLICT (content): Merge conflict in interpreter/text/arrange.ml
Auto-merging interpreter/script/script.ml
CONFLICT (content): Merge conflict in interpreter/script/script.ml
Auto-merging interpreter/script/run.ml
CONFLICT (content): Merge conflict in interpreter/script/run.ml
Auto-merging interpreter/script/js.ml
CONFLICT (content): Merge conflict in interpreter/script/js.ml
Auto-merging interpreter/README.md
CONFLICT (content): Merge conflict in interpreter/README.md
Auto-merging document/js-api/index.bs
CONFLICT (content): Merge conflict in document/js-api/index.bs

This prevents the testsuite mirror's update script from updating the reference-types tests.

[test] Bulk instructions with multiple tables

The bulk proposal adds new table instructions, while this proposal adds multiple tables. The intersection of the two proposals has been implemented and spec'ced here after rebasing this on bulk, but we also need to extend the existing tests for those bulk instructions. In particular, testing table.copy between different tables is a somewhat interesting case.

The respective tests are generated by scripts. @lars-t-hansen, would you or somebody else familiar with those scripts be willing to extend them?

[js-api] ToWebAssemblyValue signature change

In #8, ToWebAssemblyValue gained an additional error argument; the current spec only passes it in a single call site. It seems like it had a caller that passed LinkError at the time. Presumably all callers should pass a value here; otherwise the argument should be marked as optional.

Alternatively, the algorithm should always throw a TypeError; that's what it already throws when ToInt32 and friends fail.

Type of ref.func instruction

The currently stated type of the ref.func instruction is [] -> [funcref]. I assume that with the typed functions proposal, this return type will be refined to be the actual type of the function referenced in the immediate. Is it worth mentioning this in the overview? I can't see a case where this proposal isn't future-proof, but it might be good to state it here to double-check our understanding.

New `select` variant is not documented in Overview

I've noticed a new select variant when implementing the proposal as well as in Binary Encoding section of the modified spec, but it doesn't seem to be documented in the Overview, making it unclear if it's part of the proposal or comes from elsewhere.

(Now I know it's part of the proposal, but still worth clarifying IMO.)

Multiple tables take 2

Rather than continue down the road of having tables (editorial comment: arguably tables are the ugliest part of wasm) and having multiple tables, together with table.set etc., a more scalable and powerful approach would be to use managed memory.
Specifically, the array type in the MM proposal looks a lot like tables; with the advantages of being part of a holistic approach to memory.
Since we do not have MM at the moment, I would suggest refactoring multiple tables so that they would become part of the ultimate MM proposal. (May not be possible, of course.)

Should the baseline proposal include ref.func?

The proposal currently allows getting and setting elements from anyfunc tables, but provides no way of producing such values separately, which is a weird functionality gap.

We could already add the instruction ref.func $f for creating a reference to a given function $f, which is currently mentioned in the function reference extension only. One caveat is that in the baseline type system we cannot yet provide it with its ideal type, we can only support

ref.func $f : [] -> [anyfunc]

However, thanks to subtyping, it would be a backwards-compatible change to refine that typing rule to return the more specific type (ref <functype-of-$f>) in later versions. So it seems safe to add it now with weaker typing.

Merge conflicts with spec master

There are merge conflicts between this repo and the spec master branch:

CONFLICT (content): Merge conflict in document/core/appendix/properties.rst

This prevents the testsuite mirror's update script from updating the reference-types tests.

Reference types for phase 4?

After going through the open issues, there appear to be two substantive core language issues (#55, #60), both of which can probably be postponed if we want to or if we can't find agreement quickly. There is one corner case (#40) that should probably be resolved in favor of current behavior, because sentiment in the discussion is tilting in that direction. The JS API also has a small number of mop-up issues that may or may not already be done (#9, #20, #22, #51); investigating. The remaining open issues are discussion items or feature requests that have been postponed.

The process document requires these for phase 4 (with my comments about status):

  • Two or more Web VMs implement the feature.
    • Firefox is tracking the spec, as is Chrome, I believe. @mstarzinger?
  • At least one toolchain implements the feature.
  • The formalization and the reference interpreter are usually updated
    • These seem fine to me, @rossberg do you believe we're complete, modulo issues listed above?
  • Community Group has reached consensus in support of the feature.
    • We have to have a poll to gauge this

We may need to ship bulk memory at the same time, as the two proposals touch in various ways, but if anything that proposal is even closer to a shipping state.

Rename ref.is_null to ref.eq_null

The instruction seems quite similar to the i32.eqz instruction, so I think ref.eq_null would be more consistent than ref.is_null.

Bikeshedding: rename ref.func to func.ref

While working on the function references and GC proposals I noticed that some naming conventions become a bit arbitrary again. For example, for function references we currently have:

  • ref.func
  • func.bind
  • call_ref

I'd like to make at least the first two more consistent by renaming ref.func to func.ref. The idea being that all instructions producing/consuming a particular class of reference have its prefix.

That might also suggest renaming call_ref to func.call, though that may be confusing with plain calls. Not sure we can do much about calls, that ship has probably sailed. Maybe call and call_indirect should have been func.call and table.call, but then what about call_ref?

Compacting/reusing anyref table indices to prevent fragmentation

I'm implementing a language runtime that mixes linear-memory-allocated code with anyrefs that need to persist in a "heap", which in the case of Wasm means stored in a table (table 1 anyref). In case readers aren't familiar, linear memory cannot directly store anyrefs, but you can store one in a table at a particular index, then store that index in linear memory.

The tricky part is when you want to reuse old, now unused, indices so you're not growing the table size indefinitely. When that table index is "freed" by removing the anyref from that particular index, the anyref itself can potentially be GC'd (if nothing else has a reference to it) but you need some way of knowing that index can be reused now.

I couldn't find any discussions around this (apologies if I missed them!) so I'm curious what folks think the ideal approach is for handling this? The most obvious might be some sort of mark-compact like thing, but it's not clear how one would update the table indices stored in linear memory. Seems like you'd almost have to have another indirection; two linear memory dictionaries of [table_index]: pointer and [pointer]: table_index so you can change what table index your fake "pointer" points to.
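The double indirection described above could look roughly like the following. This is only a sketch: `HandleTable` and its methods are hypothetical names, a plain array stands in for the anyref table, and two `Map`s stand in for the linear-memory dictionaries. Linear memory would store the stable handle, never the raw table index.

```javascript
// Hypothetical handle table: a plain array models the anyref table.
class HandleTable {
  constructor() {
    this.table = [];            // models the anyref table
    this.indexOf = new Map();   // handle -> current table index
    this.handleAt = new Map();  // table index -> handle
    this.nextHandle = 1;
  }

  // Store a reference and return a stable handle for linear memory.
  add(ref) {
    const handle = this.nextHandle++;
    const index = this.table.length;
    this.table.push(ref);
    this.indexOf.set(handle, index);
    this.handleAt.set(index, handle);
    return handle;
  }

  get(handle) {
    return this.table[this.indexOf.get(handle)];
  }

  // Clear the slot so the value can be collected; the slot is now free.
  remove(handle) {
    const index = this.indexOf.get(handle);
    this.table[index] = null;
    this.indexOf.delete(handle);
    this.handleAt.delete(index);
  }

  // Slide live entries toward index 0 and update both maps.
  // Handles held in linear memory stay valid across compaction.
  compact() {
    let dst = 0;
    for (let src = 0; src < this.table.length; src++) {
      const handle = this.handleAt.get(src);
      if (handle === undefined) continue; // free slot, skip it
      if (src !== dst) {
        this.table[dst] = this.table[src];
        this.handleAt.delete(src);
        this.handleAt.set(dst, handle);
        this.indexOf.set(handle, dst);
      }
      dst++;
    }
    this.table.length = dst; // drop the now-unused tail
  }
}
```

The cost is one extra lookup on every access, in exchange for being free to move entries around.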

Another solution might be keeping a linear memory vector of freed indices that have been reclaimed, using up those first before appending to the end of the table or resizing.
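A minimal sketch of that free-list idea, assuming a plain array as a stand-in for the anyref table and a second array for the vector of reclaimed indices (`SlotAllocator` is a hypothetical name, not part of any proposal or API):

```javascript
// Hypothetical slot allocator; a plain array stands in for the anyref table.
class SlotAllocator {
  constructor() {
    this.table = [];     // models the anyref table
    this.freeList = [];  // indices that were released and can be reused
  }

  // Store a reference and return the index to keep in linear memory.
  alloc(ref) {
    const index = this.freeList.length > 0
      ? this.freeList.pop()  // reuse a reclaimed slot first
      : this.table.length;   // otherwise append at the end
    this.table[index] = ref;
    return index;
  }

  get(index) {
    return this.table[index];
  }

  // Clear the slot so the host GC can collect the value, and remember the index.
  free(index) {
    this.table[index] = null;
    this.freeList.push(index);
  }
}
```

Unlike compaction, this never moves live entries, so indices already stored in linear memory stay valid. The table only grows when no reclaimed index is available.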

Managing tables

Rather than freeing developers from managing tables, this proposal forces them to do so. Furthermore, given that tables represent the boundary between JS and WASM (say) neither side will have a clear way of indicating which parts of the table are actually in use -- forcing some convention such as null for empty tables.
Imagine the scenario where an application in WASM needs to create an entity that must be referenced from the table dynamically (e.g., an object). From the WASM side, there will need to be a map of available slots in the table which can be used to store the reference. But the same table also 'belongs' to the host -- who also has to find a free location within the table for a value intended to be passed to the WASM module.

Examples include string values that are computed by a wasm module, objects that are created by a wasm module to be passed back in JS callbacks, ...

Relation between externref and anyref

What is the relation between the externref and anyref types? I thought anyref had been renamed to externref, but the overview uses both terms, which is confusing.

Global function references require two-pass analysis

Following issue #31, the spec was recently changed to disallow function references in global definitions or function bodies that are not explicitly listed in the element section.

Because the element section occurs in the binary right before the code section, and therefore after all other header sections in Wasm modules, the validation of function references needs to be deferred until the element section is parsed.

Is this an intended complication or would it help allowing arbitrary function references in the header and extend the allowed references in the code section to any function references in any of the header sections?
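To make the deferral concrete, here is a rough sketch of the bookkeeping a validator would need. `FuncRefValidator` and its method names are hypothetical, and section parsing itself is omitted. References seen in global initializers are only recorded, then checked once the element section has been read.

```javascript
// Hypothetical validator state; section parsing itself is omitted.
class FuncRefValidator {
  constructor() {
    this.pending = [];          // function indices used by ref.func in global initializers
    this.declared = new Set();  // function indices listed in element segments
  }

  // Pass 1: the global section is read before the element section, so only record.
  noteGlobalRefFunc(funcIndex) {
    this.pending.push(funcIndex);
  }

  // Called for each function index found while parsing the element section.
  noteElementFunc(funcIndex) {
    this.declared.add(funcIndex);
  }

  // Pass 2: run once the element section is complete.
  checkPending() {
    const undeclared = this.pending.filter((i) => !this.declared.has(i));
    if (undeclared.length > 0) {
      throw new Error(`undeclared function reference: ${undeclared.join(", ")}`);
    }
  }
}
```

References inside function bodies do not need the pending list, because the code section comes after the element section and can be checked against the declared set directly.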

[js-api] Globals of type anyref require slight adjustment to JS API spec

At the moment, the JS API spec requires that if the value argument to the WebAssembly.Global constructor is undefined then the default value for the Global's type is stored. For a Global of type anyref this probably means that the value stored is null. But this is counter to the intent of anyref, which is that it can represent (through boxing) any host type faithfully.

This is not a compatibility issue since anyref is a new thing and the problem is not observable with older types, it's just a slight complication of the algorithm for Global's constructor. I'm not sure if this means the WebIDL spec for that constructor needs to change, ie, if the current behavior is a result of some WebIDL standard behavior. If it does need to change then compatibility may be slightly at risk, though no doubt a fix can be found.
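One possible shape for the adjusted algorithm, as a sketch only: `initialGlobalValue` is a hypothetical helper that models the choice of initial value and is not the JS API spec text. It distinguishes a missing argument from an explicitly passed undefined, and it ignores i64 and value coercion details.

```javascript
// Hypothetical model of the initial-value choice; not the JS API spec algorithm.
function initialGlobalValue(type, ...rest) {
  const valueGiven = rest.length > 0; // was a value argument passed at all?
  const value = rest[0];
  if (type === "anyref") {
    // An explicitly passed undefined is a host value and is kept as-is;
    // only a missing argument falls back to null.
    return valueGiven ? value : null;
  }
  // Numeric types (simplified): undefined, missing or explicit, maps to the default 0.
  return value === undefined ? 0 : Number(value);
}
```

Whether the real constructor can observe the difference between a missing and an undefined argument depends on how its WebIDL signature is declared, which is the open question above.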

Binary encoding for `table.init` spec doesn't match implementations

The online specification indicates that the table index is encoded first and the element segment index second, but this doesn't agree with existing implementations, for example:

  • Gecko reads the segment index first, which I think also means that the encoding of memory.init doesn't match the specification either (?)
  • wabt reads the element segment first, then the table index
  • v8 I think also reads the element segment before the table index.

Is this a bug in the specification? Or do engines need to update which one they're reading first?
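For reference, a small decoder sketch for the order the implementations above are reported to read: element segment index first, then table index. The function names are hypothetical. The `0xFC` prefix and sub-opcode 12 for table.init follow the bulk memory proposal's encoding, and both indices are unsigned LEB128.

```javascript
// Decode an unsigned LEB128 integer starting at `offset`.
function readU32(bytes, offset) {
  let result = 0;
  let shift = 0;
  let pos = offset;
  for (;;) {
    const byte = bytes[pos++];
    result |= (byte & 0x7f) << shift;
    if ((byte & 0x80) === 0) break; // high bit clear: last byte
    shift += 7;
  }
  return { value: result >>> 0, next: pos };
}

// Decode table.init assuming the element segment index comes first.
function decodeTableInit(bytes) {
  if (bytes[0] !== 0xfc) throw new Error("expected 0xFC prefix");
  const sub = readU32(bytes, 1);
  if (sub.value !== 12) throw new Error("not table.init");
  const elem = readU32(bytes, sub.next);
  const table = readU32(bytes, elem.next);
  return { elemIndex: elem.value, tableIndex: table.value };
}
```

A decoder following the order in the online specification would swap the last two reads, so the two interpretations disagree on any instruction whose two indices differ.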
