Comments (8)
> Have you run this with `roc build --optimize` to ensure it is optimised using LLVM to run fast?
Yes, the run times quoted in the OP were all using optimized builds; without optimization it is about four times slower than this.
I assumed that Roc must be using LLVM for the optimized builds, although the documentation didn't list LLVM as a prerequisite. Fedora 39 currently uses LLVM 17 as its distro standard, so that is what is installed.
from roc.
Adding another comparison: F#/Fable (an almost pure FP language, apart from optional mutability of all bindings, always-mutable array contents, and uncontrolled IO side effects) can transpile F# code to Rust. The following code implements the same benchmark as the OP:
```fsharp
let cLOOPS = 1000

let primesBench() =
  let cmpsts = Array.zeroCreate 16384
  let rec loopn n =
    if n < cLOOPS then
      let rec loopi i =
        let s0 = (i + i) * (i + 3) + 3
        if s0 < 131072 then
          if cmpsts.[i >>> 3] &&& (1uy <<< (i &&& 7)) <> 0uy then loopi (i + 1) else
          let bp = i + i + 3
          let slmt = s0 + 8 * bp
          let rec loops s =
            if s < slmt then
              let msk = 1uy <<< (s &&& 7)
              let rec loopc c =
                if c < 16384 then
                  cmpsts.[c] <- cmpsts.[c] ||| msk
                  loopc (c + bp)
              loopc (s >>> 3); loops (s + bp)
          loops s0; loopi (i + 1)
      loopi 0; loopn (n + 1)
  loopn 0
  {0 .. 131071}
  |> Seq.fold
       (fun a i -> a - ((int cmpsts.[i >>> 3] >>> (i &&& 7)) &&& 1))
       131073

[<EntryPoint>]
let main args =
  printfn "PrimesBench in F#"
  let start = System.DateTime.Now.Ticks
  let answr = primesBench()
  let elpsd = (System.DateTime.Now.Ticks - start) / 10000L
  printfn "Found %d primes to 262146 for %d loops in %d milliseconds." answr cLOOPS elpsd
  0
```
On the same computer as in the OP, this has the following run times:
- Compiled to JavaScript (optimized): 197 milliseconds.
- Compiled to DotNet IR (Release mode): 213 milliseconds.
- Compiled to Rust (release/optimized): 169 milliseconds.
The above Rust results, which are a little faster than the same code run as JavaScript on Node.js or as DotNet 7.0 IR, are perhaps a better comparison to Roc. There is an automatic array-bounds check per array access, which is one of the main reasons the Rust version is slower than the C version; the other is that it reloads the base array pointer on every inner culling loop (doesn't "lift" this outside the loop as it should). Using Rust's unsafe mutable pointers is not currently an option from Fable, as it doesn't support the interop package that would allow that.
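The bounds-check cost described above can be seen in a small Rust sketch of the inner culling loop (a hypothetical illustration, not Fable's actual generated code; the function names are made up): indexed stores carry a per-access range check unless the optimizer can prove it away, while the iterator form sidesteps indexing entirely.

```rust
// Sketch of the inner culling loop over a packed byte sieve: marks every
// bp-th bit starting at bit index s. Not Fable output; names are hypothetical.

fn cull_indexed(cmpsts: &mut [u8], s: usize, bp: usize) {
    let msk = 1u8 << (s & 7);
    let mut c = s >> 3;
    while c < cmpsts.len() {
        // Indexed store: a bounds check per access, unless LLVM can
        // prove the index in range and elide it.
        cmpsts[c] |= msk;
        c += bp;
    }
}

fn cull_iter(cmpsts: &mut [u8], s: usize, bp: usize) {
    let msk = 1u8 << (s & 7);
    // Iterator form: no indexing, so no per-access bounds check to elide.
    // Note: bp must be > 0, since step_by panics on a step of 0.
    for byte in cmpsts[s >> 3..].iter_mut().step_by(bp) {
        *byte |= msk;
    }
}

fn main() {
    let mut a = vec![0u8; 16384];
    let mut b = vec![0u8; 16384];
    cull_indexed(&mut a, 9, 3);
    cull_iter(&mut b, 9, 3);
    // Both forms mark the same bytes (1, 4, 7, ...) with mask 0x02.
    assert_eq!(a, b);
}
```

Whether the check actually survives in the optimized binary depends on what LLVM can prove at each call site, which is why the measured overhead varies between the C-like indexed style and idiomatic iterator code.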
Have you run this with `roc build --optimize` to ensure it is optimised using LLVM to run fast? On Linux I think the dev backend is the current default, which is very much not optimised code generation.
I would suggest you share this analysis on Zulip too; it looks really interesting. I always enjoy the resulting discussion on performance topics. 😃
I am gonna make a guess that this has nothing to do with bit twiddling. Probably an issue with closures or some sort of recursion doing work it shouldn't.
Also, might just be a failure of opportunistic mutation.
Can you pull the various loop functions into top level functions instead of nested functions and then test it? I am curious how that plays out.
> I am gonna make a guess that this has nothing to do with bit twiddling.
Yes, I agree that it likely doesn't have to do with bit twiddling itself, just that bit-twiddling takes so very little time that it really shows up the overheads.
> Probably an issue with closures or some sort of recursion doing work it shouldn't.
Well, all the work is done in the innermost tiny loop closure, and that is barely a closure, only capturing the update function from outside its scope. I tried passing that update function in as an argument so the cull function isn't a closure, but no improvement...
> Also, might just be a failure of opportunistic mutation.
Well, that would be your issue then, as the way the benchmark is constructed should produce in-place mutation. I don't think it can be copying the whole array on every mutation, although I suppose it could be copying the array pointer more often than it should (as in never)...
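For what it's worth, Rust's `Rc::make_mut` gives a rough model of what opportunistic mutation is supposed to do (a sketch of the concept only, not Roc's implementation): when the refcount is 1 the value is updated in place, and only a shared value gets copied.

```rust
use std::rc::Rc;

fn main() {
    // Unique handle: refcount is 1, so make_mut mutates in place.
    let mut unique = Rc::new(vec![0u8; 4]);
    let ptr_before = unique.as_ptr();
    Rc::make_mut(&mut unique)[0] = 1;
    assert_eq!(unique.as_ptr(), ptr_before); // same storage: no copy happened

    // Shared handle: refcount is 2, so make_mut clones before mutating.
    let mut shared = Rc::new(vec![0u8; 4]);
    let alias = Rc::clone(&shared);
    Rc::make_mut(&mut shared)[0] = 1;
    assert_eq!(alias[0], 0);  // the alias still sees the old data
    assert_eq!(shared[0], 1); // the mutation landed in fresh storage
}
```

If the compiler loses track of uniqueness, every update falls into the "shared" branch and copies the whole array, which would easily account for a slowdown of this size.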
If I remove the update function, it is fast although the answer is incorrect (of course)...
Is there an easy way to view the assembly emitted by LLVM?
> Can you pull the various loop functions into top level functions instead of nested functions and then test it?
There isn't much chance that is the problem, as it is pretty fast when I just remove the `List.update` from the inner "loop", but just to be sure I added the `bp` as an argument to really make it not a closure and moved it outside the benchmark function, with no change; I even added a type signature, to no effect...
Adding this to the issue. I think I have figured out what is going on.
Ok, I have been digging into the IR, and here are the two main problems from what I can see:
- `List.get` is a regular function, not a low level (and we don't have borrow inference). As such, before every call to the function we increment the refcount, and in the function we decrement the refcount (checking if we need to deallocate each time).
- The same root cause as #6213: having allocas in locations other than the entry block stops LLVM from being able to inline `List.get`. LLVM is smart enough to realize that inlining that function with allocas in the wrong place would lead to a stack-space explosion. As such, it refuses to inline the function unless the allocas are fixed.
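The refcount traffic in the first point can be modeled in Rust (a hypothetical illustration of the cost, not Roc's actual codegen): passing a refcounted handle by value forces a clone/drop pair, i.e. an increment plus a decrement with a deallocation check, on every call, while a borrowed slice costs nothing.

```rust
use std::rc::Rc;

// Models a List.get without borrow inference: the callee receives its own
// refcounted handle, so every call pays an increment (the clone at the call
// site) and a decrement plus "should we free?" check (the drop in the callee).
fn get_refcounted(list: Rc<Vec<u8>>, i: usize) -> u8 {
    list[i]
} // `list` dropped here: refcount decrement + deallocation check

// Models borrow inference: the caller provably outlives the call, so the
// callee takes a plain borrow and no refcount is touched at all.
fn get_borrowed(list: &[u8], i: usize) -> u8 {
    list[i]
}

fn main() {
    let list = Rc::new(vec![10u8, 20, 30]);
    let a = get_refcounted(Rc::clone(&list), 1); // one refcount bump per access
    let b = get_borrowed(&list, 1);              // no refcount traffic
    assert_eq!(a, b);
}
```

In a tight culling loop that touches the list on every iteration, that per-call pair of atomic-ish bookkeeping operations plus a branch dwarfs the one-instruction bit twiddle being measured.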
So I did some testing to verify. First, I updated and used my `borrowing-hacks` branch to avoid the first issue. Then I manually edited the LLVM IR to fix the second issue.
The result: `Found 23000 primes to 262146 in 103 milliseconds.`
Note: both changes are needed for the perf:
- No changes: 1.5s
- Just borrowing-hacks: 1.2s
- Just alloca fix: 1.0s
- Both: 100ms