pipcet / asmjs Goto Github PK

GCC asm.js backend, support files that can't go in the program repositories

License: GNU General Public License v3.0

C 41.13% Makefile 5.55% JavaScript 25.08% Assembly 4.30% Shell 2.35% C++ 16.49% Perl 1.42% Objective-C 0.07% HTML 0.12% Scheme 3.00% D 0.05% Gnuplot 0.01% Raku 0.44%


asmjs's Issues

Stack trampolines

The asm.js port supports stack trampolines: it's quite easy, since we can catch calls to invalid function addresses and then check that those addresses actually point to data (not code) that tells us how to proceed.
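Here is a minimal C model of that dispatch, for illustration only. `dispatch_call`, `func_table`, `tramp_mem` and `FUNC_TABLE_SIZE` are hypothetical names, and a simple range check stands in for catching the failed call. Small integers are treated as real function-table indices. Anything beyond the table is treated as a pointer to trampoline data holding the real target and its static chain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: addresses below FUNC_TABLE_SIZE are valid function
   table indices; anything else is an "invalid" code address. */
#define FUNC_TABLE_SIZE 4
#define TRAMP_SLOTS 8

typedef int (*nested_fn)(void *chain, int arg);

/* What a trampoline "address" points to: the real target plus the
   static chain the nested function needs. */
struct trampoline_data {
  nested_fn target;
  void *chain;
};

static nested_fn func_table[FUNC_TABLE_SIZE];
static struct trampoline_data tramp_mem[TRAMP_SLOTS]; /* stands in for linear memory */

static int dispatch_call(uintptr_t addr, int arg) {
  if (addr < FUNC_TABLE_SIZE) {
    /* Ordinary call: no static chain. */
    assert(func_table[addr] != NULL);
    return func_table[addr](NULL, arg);
  }
  /* Invalid code address: interpret it as data describing the call. */
  size_t slot = (size_t)(addr - FUNC_TABLE_SIZE);
  assert(slot < TRAMP_SLOTS);
  assert(tramp_mem[slot].target != NULL);
  return tramp_mem[slot].target(tramp_mem[slot].chain, arg);
}

/* Example nested function: adds the value its static chain points at. */
static int add_outer(void *chain, int arg) { return *(const int *)chain + arg; }

/* Example plain function: ignores the chain. */
static int twice(void *chain, int arg) { (void)chain; return 2 * arg; }
```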

By contrast, with WebAssembly, I think we need to do things the hard way: when constructing a stack trampoline, we grow our function table by one entry for the nested function, and then call it normally.

But how do we then dispose of the new function table entry? The GCC trampoline code is fairly general, but appears to lack an option to clean up when the trampoline is no longer needed. I'm looking at GIMPLE_WITH_CLEANUP_EXPR, but haven't really found an example of how to use it.
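Here's a rough sketch of the wasm-side bookkeeping. It assumes a table with a free list, so released entries get reused instead of the table growing forever. All names are hypothetical. `trampoline_release` is called explicitly here, because hooking it into scope exit is exactly the open question.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of a growable function table with a free list. */
typedef int (*table_fn)(void *chain, int arg);

struct table_entry {
  table_fn fn;
  void *chain;    /* static chain bound into this entry */
  int next_free;  /* next free entry, or -1 */
};

struct func_table {
  struct table_entry *entries;
  int size;
  int free_head;  /* -1 when no freed entries are available */
};

/* Reuse a freed entry if possible; otherwise grow the table by one. */
static int trampoline_alloc(struct func_table *t, table_fn fn, void *chain) {
  int idx;
  if (t->free_head >= 0) {
    idx = t->free_head;
    t->free_head = t->entries[idx].next_free;
  } else {
    struct table_entry *grown =
        realloc(t->entries, (size_t)(t->size + 1) * sizeof *grown);
    assert(grown != NULL);
    t->entries = grown;
    idx = t->size++;
  }
  t->entries[idx].fn = fn;
  t->entries[idx].chain = chain;
  t->entries[idx].next_free = -1;
  return idx;
}

/* The missing piece: hand the entry back when the trampoline dies. */
static void trampoline_release(struct func_table *t, int idx) {
  t->entries[idx].fn = NULL;
  t->entries[idx].chain = NULL;
  t->entries[idx].next_free = t->free_head;
  t->free_head = idx;
}

static int table_call(const struct func_table *t, int idx, int arg) {
  assert(t->entries[idx].fn != NULL);
  return t->entries[idx].fn(t->entries[idx].chain, arg);
}

/* Example nested function body. */
static int add_chain(void *chain, int arg) { return *(const int *)chain + arg; }
```

Without the release step, every execution of a scope that creates a trampoline would permanently grow the table by one entry.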

Is this project still active? Is there any binary of some sort already?

__builtin_return_address

The asm.js backend splits the 32-bit "PC" register into a 20-bit function index and a 12-bit relative PC, with some alleged support for functions spanning several 12-bit "pages".
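A sketch of that encoding in C follows. It assumes the function index occupies the upper 20 bits and the relative PC the lower 12. The macro and function names are made up for illustration, and multi-page functions are not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: [ 20-bit function index | 12-bit relative PC ] */
#define PC_OFFSET_BITS 12u
#define PC_OFFSET_MASK ((1u << PC_OFFSET_BITS) - 1u)                /* 0xfff */
#define PC_MAX_FUNC_INDEX ((1u << (32u - PC_OFFSET_BITS)) - 1u)     /* 0xfffff */

static uint32_t pc_make(uint32_t func_index, uint32_t rel_pc) {
  assert(func_index <= PC_MAX_FUNC_INDEX);
  assert(rel_pc <= PC_OFFSET_MASK);
  return (func_index << PC_OFFSET_BITS) | rel_pc;
}

static uint32_t pc_func_index(uint32_t pc) { return pc >> PC_OFFSET_BITS; }
static uint32_t pc_rel(uint32_t pc) { return pc & PC_OFFSET_MASK; }
```

The appeal for the unwinder is that two call sites in the same function get distinct values. The cost is that a single 12-bit page only covers 4096 relative PC values.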

That's not a very nice approach; for the wasm32 port, function pointers are simply small integers specifying the function offset. That's what the WebAssembly docs are recommending, too, if I'm reading them correctly.

However, that raises the question of what to report back to GCC's __builtin_return_address function, which is limited, I think, to a Pmode return value. Returning the function index unadjusted would be a problem, because exception handling (which seemed to work on the asm.js port at one time) needs finer-grained information about which call site within our function we were at.

The options I can think of are:

split the PC à la the asm.js port
introduce a third PC space to go with the function index space (+PLT) and the linear memory space (+GOT).
rewrite the code in unwind-dw2.c to allow for return addresses wider than the word size.

None of them is particularly nice, so I'm wondering whether there might be a simpler approach I'm missing.

dummy relocations to implement -ffunction-sections/--gc-sections

I've been trying to get -ffunction-sections to work, particularly in combination with --gc-sections. The problem is that there is no .text section. A simple program will produce assembly output (after macro expansion) like this:

    .section .space.code..text.f
f:
    .byte 0
    .section .wasm.code..text.f
    <header data>
    <actual code>

The problem is that there is nothing that pulls in the second section when f is referenced by some other object. I believe section groups could work around the issue, but what I'm doing right now is to use a dummy reloc, R_ASMJS_CODE_POINTER, as follows:

    .section .space.code..text.f
f:
    .reloc .,R_ASMJS_CODE_POINTER,__wasm_code_f
    .byte 0
    .section .wasm.code..text.f
__wasm_code_f:
    <header data>
    <actual code>

That pulls in the section all right, but we need to do so for up to 5 sections (.code, .function, .element, .name.function, .name.local)! I think our options are:

use R_ASMJS_CODE_POINTER for all of them, making dummy relocs useless for anything but section pull-in, because you can no longer say "give me the code for f".
use magic section names to distinguish R_ASMJS_CODE_POINTER types
use the addend field to distinguish them
add five different R_ASMJS_*_POINTER relocs
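To make the addend option concrete, here's a sketch that assumes a single R_ASMJS_CODE_POINTER type whose addend selects the target section. The numbering is hypothetical. All section names other than `.wasm.code..text.f` are also hypothetical, extrapolated from that one example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical addend values for one dummy reloc type. */
enum code_pointer_kind {
  CP_CODE = 0,       /* .wasm.code..text.f */
  CP_FUNCTION,       /* .wasm.function..text.f (assumed name) */
  CP_ELEMENT,        /* .wasm.element..text.f (assumed name) */
  CP_NAME_FUNCTION,  /* .wasm.name.function..text.f (assumed name) */
  CP_NAME_LOCAL,     /* .wasm.name.local..text.f (assumed name) */
  CP_KIND_COUNT
};

static const char *const cp_section_prefix[CP_KIND_COUNT] = {
  ".wasm.code.", ".wasm.function.", ".wasm.element.",
  ".wasm.name.function.", ".wasm.name.local.",
};

/* Prefix for a given addend, or NULL if the addend is not a known kind. */
static const char *cp_prefix_for_addend(long addend) {
  if (addend < 0 || addend >= CP_KIND_COUNT)
    return NULL;
  return cp_section_prefix[addend];
}

/* Build the name of the section a dummy reloc should pull in, e.g.
   addend CP_CODE + ".text.f" -> ".wasm.code..text.f".
   Returns the name length, or -1 on a bad addend, input or buffer size. */
static int cp_section_name(char *buf, size_t len, long addend,
                           const char *text_section) {
  assert(buf != NULL && text_section != NULL);
  const char *prefix = cp_prefix_for_addend(addend);
  if (prefix == NULL || strncmp(text_section, ".text.", 6) != 0)
    return -1;
  int n = snprintf(buf, len, "%s%s", prefix, text_section);
  return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

This keeps a single reloc number, but the addend can then no longer carry a real offset, which is part of why none of the options is obviously right.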

None of those seems obviously right.
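For reference, here's a toy model of why the dummy reloc works at all. --gc-sections treats relocations as edges and keeps only sections reachable from a root. The structures and `mark` are hypothetical. They ignore symbols, section groups and everything else a real linker does.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SECTIONS 8
#define MAX_RELOCS 16

/* A relocation in section `from` that refers to section `to`. */
struct reloc { int from, to; };

struct object {
  int nsections;
  struct reloc relocs[MAX_RELOCS];
  int nrelocs;
  bool keep[MAX_SECTIONS];
};

/* Depth-first mark: a kept section keeps everything it relocates against. */
static void mark(struct object *o, int sec) {
  assert(sec >= 0 && sec < MAX_SECTIONS);
  if (o->keep[sec])
    return;
  o->keep[sec] = true;
  for (int i = 0; i < o->nrelocs; i++)
    if (o->relocs[i].from == sec)
      mark(o, o->relocs[i].to);
}
```

With section 0 as a root that references f, section 1 as `.space.code..text.f` and section 2 as `.wasm.code..text.f`, only the dummy 1 -> 2 edge keeps section 2 alive.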

Is it possible to build for ARM/Thumb?

I need a GBA compiler as a compiler.wasm. Is it possible to use this project to build one, or does it only compile to wasm?

[Sorry for my bad English]

Optimizing control flow?

I think this project currently emits a switch-in-a-loop pattern for control flow? If that's correct, one way to make it fast would be to add a pass to the binaryen optimizer to handle it. Binaryen does have a "relooper" implementation that goes from CFGs to loops/ifs, but it doesn't parse switches in loops into a CFG, which would be necessary here. If that's potentially useful, I could look into adding that pass to binaryen.
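To illustrate the shape in question (whether it matches this backend's output exactly is an assumption), here are two C versions of the same loop. The first is a switch in a loop driven by a block-number variable. The second is the structured loop that a relooper-style pass would want to recover. Both function names are made up.

```c
#include <assert.h>

/* Switch-in-a-loop: `pc` holds the number of the next basic block. */
static int sum_switch_loop(int n) {
  assert(n >= 0);
  int pc = 0, i = 0, acc = 0;
  for (;;) {
    switch (pc) {
    case 0: /* entry */
      i = 1; acc = 0; pc = 1; break;
    case 1: /* loop header: test */
      pc = (i <= n) ? 2 : 3; break;
    case 2: /* loop body */
      acc += i; i++; pc = 1; break;
    case 3: /* exit */
      return acc;
    }
  }
}

/* The same control flow as structured code. */
static int sum_structured(int n) {
  assert(n >= 0);
  int acc = 0;
  for (int i = 1; i <= n; i++)
    acc += i;
  return acc;
}
```

Such a pass would have to treat each case label as a CFG node and each constant assignment to `pc` as an edge before handing the graph to the relooper.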

JS shell segfaults when running without --no-threads

https://bugzilla.mozilla.org/show_bug.cgi?id=1322681

Should be trivial to fix.

Passing $pc0 to wasm functions?

Right now, wasm functions have the signature
(int, int, int, int, int, int) -> int

The arguments are:

callee $dpc (-1 for first call)
$sp1 (= $sp + 16)
$r0
$r1
$rpc = caller $dpc
callee $pc0

There are six of them because there are six integer registers used for function arguments on x86_64.
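Written as a C function-pointer type, the convention looks roughly like this. The parameter names follow the list above. The i32 types, `example_callee` and `call_first` are assumptions for illustration, and the meaning of the return value is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t (*wasm_fn)(int32_t dpc,  /* callee $dpc, -1 on first call */
                           int32_t sp1,  /* $sp + 16 */
                           int32_t r0,
                           int32_t r1,
                           int32_t rpc,  /* caller's $dpc */
                           int32_t pc0); /* callee's $pc0 */

/* Dummy callee: returns r0 + r1 and ignores everything else. */
static int32_t example_callee(int32_t dpc, int32_t sp1, int32_t r0,
                              int32_t r1, int32_t rpc, int32_t pc0) {
  (void)dpc; (void)sp1; (void)rpc; (void)pc0;
  return r0 + r1;
}

/* A first call into f: dpc is -1 and sp1 is the caller's $sp + 16. */
static int32_t call_first(wasm_fn f, int32_t sp, int32_t r0, int32_t r1,
                          int32_t caller_dpc, int32_t callee_pc0) {
  assert(f != NULL);
  return f(-1, sp + 16, r0, r1, caller_dpc, callee_pc0);
}
```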

The last argument is the callee's $pc0, which I thought would be a good idea to pass for dynamic linking. Now that dynamic linking is somewhat working, it turns out it's a bad idea to pass it: in the callee, the $pc0 is available as

    get_global $plt
    i32.const f
    i32.add

while in the caller, it's actually hard to calculate: the actual call is

    call f@plt

with the heavy lifting done by the assembler and linker interpreting the "@plt" part. But there's no way to write i32.const f@plt, since we can't have runtime relocs in text, so we're left with creating a GOT entry for every function we call, which seems like excessive overhead.

It also seems questionable to pass the caller's $dpc but not the caller's $pc0; originally those were in a single 32-bit word, and used for __builtin_return_address, but that's another issue...

I'm considering omitting the last two arguments for now (and leaving __builtin_return_address broken).

"Native ABI" calls from gcc-generated code

The first step to remedying the problems pointed out at #7 (comment) is to allow gcc-generated code to call functions defined in the "ordinary" wasm ABI.

I think it would be nicest to declare such functions with __attribute__((rawcall)) and let gcc do the rest of the work. However, it's a lot simpler to use __asm__ statements directly. Unfortunately, those don't play nicely with either C++ templates or C macros, so they would probably be less pleasant to use.

I've started by solving the least important issue: making PLT calls work for arbitrary function types. That should at least make the binutils code independent of the ABI used, and hopefully let me submit it upstream.

At least the native ABI doesn't have varargs, so we needn't deal with those...

SpiderMonkey/Perl performance issue

There's a major performance issue (multi-minute compilation) with Perl's yylex() function in SpiderMonkey. Both Ion's default backtracking register allocator and the stupid block-based register allocator exhibit the problem.

Either --wasm-always-baseline or --no-asmjs results in acceptable startup times (but probably very slow code).

This wasn't a problem a few months ago, so bisecting might be a good idea.

My suspicion is that the problem lies in the way wasm turns large switch statements (which is essentially what yylex() is) into deeply nested blocks.

Investigate this and report it to Mozilla.

__udivmoddi4: integer division by zero

There appears to be an Ion bug that makes some code throw an integer-division-by-zero exception, even though the division sits in a code path that is never taken.

https://bugzilla.mozilla.org/show_bug.cgi?id=1321189
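For illustration, the pattern looks roughly like this. `udivmod_guarded` is a simplified, hypothetical stand-in, not libgcc's actual __udivmoddi4. The division is guarded, so it should never run with a zero divisor. WebAssembly's i64.div_u and i64.rem_u trap on a zero divisor. An optimizer that hoisted the division above the guard would therefore produce exactly this kind of spurious trap. Whether that is what Ion does here is a guess.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Divide n by d, storing the remainder in *rem. The d == 0 branch
   never reaches the division or modulo. */
static uint64_t udivmod_guarded(uint64_t n, uint64_t d, uint64_t *rem) {
  assert(rem != NULL);
  if (d == 0) {
    *rem = n; /* arbitrary result for this sketch */
    return 0;
  }
  *rem = n % d;
  return n / d;
}
```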
