juliaci / basebenchmarks.jl
A collection of Julia benchmarks available for CI tracking from the JuliaLang/julia repository
License: Other
It would be amazing if this package could export data to Prometheus, so we could query it from Grafana and get nice insights into performance over time.
One use case is to collect such data from the changes developers make in a given Julia package repository, so that stakeholders can see how the code's speed evolves over time.
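For illustration, a minimal sketch of what such an export could look like, assuming results have already been flattened into a Dict of name => time; the metric name `julia_benchmark_time_ns` and the helper `write_prometheus` are hypothetical, and a real exporter would walk a BenchmarkTools.BenchmarkGroup instead:

```julia
# Sketch: dump benchmark results in the Prometheus text exposition format.
# `results` maps benchmark names to median times in nanoseconds.
function write_prometheus(io::IO, results::Dict{String,Float64})
    println(io, "# TYPE julia_benchmark_time_ns gauge")
    for (name, t) in sort(collect(results))
        # Prometheus label values must escape backslashes and double quotes.
        esc = replace(replace(name, "\\" => "\\\\"), "\"" => "\\\"")
        println(io, "julia_benchmark_time_ns{benchmark=\"$esc\"} $t")
    end
end

io = IOBuffer()
write_prometheus(io, Dict("sum_linear" => 123.4))
print(String(take!(io)))
```

Grafana could then chart this gauge per benchmark label over time.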
Changing the RNG in Julia can cause spurious benchmark regressions. For example, an integer might be pushed above or below the limit of the cache of pre-boxed small values, causing changes in allocations.
@ararslan would be good if Nanosoldier could get refreshed with the latest benchmarks and possibly a retune.
I have checked out the nanosoldier branch, and cloned Benchmarks and BenchmarkTrackers. When I try to load BaseBenchmarks I get this error:
ERROR: LoadError: LoadError: LoadError: UndefVarError: ExecutionResults not defined
in include(::ASCIIString) at ./boot.jl:233
in include_from_node1(::ASCIIString) at ./loading.jl:426
in include(::ASCIIString) at ./boot.jl:233
in include_from_node1(::ASCIIString) at ./loading.jl:426
in eval(::Module, ::Any) at ./boot.jl:236
[inlined code] from ./sysimg.jl:11
in require(::Symbol) at ./loading.jl:357
in include(::ASCIIString) at ./boot.jl:233
in include_from_node1(::ASCIIString) at ./loading.jl:426
in eval(::Module, ::Any) at ./boot.jl:236
[inlined code] from ./sysimg.jl:11
in require(::Symbol) at ./loading.jl:357
in eval(::Module, ::Any) at ./boot.jl:236
while loading /home/jeff/.julia/v0.5/BenchmarkTrackers/src/metrics.jl, in expression starting on line 21
while loading /home/jeff/.julia/v0.5/BenchmarkTrackers/src/BenchmarkTrackers.jl, in expression starting on line 38
while loading /home/jeff/.julia/v0.5/BaseBenchmarks/src/BaseBenchmarks.jl, in expression starting on line 3
Sorry for the unclear title. The problem can be summarized as follows:
T = UInt; run(tune!(@benchmarkable rand($T)))
gives a heavily over-estimated time compared to
run(tune!(@benchmarkable rand(UInt)))
While preparing a PR against julia/master, the RandomBenchmarks showed a lot of regressions because of this (in this case, T is set in a loop), even though the performance is not degraded when running individual benchmarks using the second form (i.e. using UInt directly). I tried solving this with some incantation of eval, with no success. My last try was something like (edit: it doesn't work):
T = UInt; RD = RandomDevice(); g[...] = eval(@benchmarkable rand(Expr(:$, RD), $T))
(here RD must not be interpolated by eval, only by @benchmarkable). I'm not sure whether this works as intended, but it's ugly, so I wanted to discuss the problem here before working more on it.
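For reference, one pattern that sidesteps the interpolation overhead is to build the whole @benchmarkable expression under eval, splicing the loop variable in as a literal so the generated benchmark body reads exactly like the hand-written form; the group keys below are illustrative:

```julia
using BenchmarkTools, Random

g = BenchmarkGroup()
RD = RandomDevice()
for T in (UInt, Int, Float64)
    # `$RD` and `$T` are spliced into the expression as literal values
    # before the macro expands, so the benchmark body is effectively
    # `rand(<RandomDevice>, UInt)` -- no runtime lookup of a global `T`.
    g["rand", string(T)] = eval(:(@benchmarkable rand($RD, $T)))
end
```

Whether this matches the timings of the direct form in every case is worth verifying with tune! and run on both variants.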
This is an annoyance because it runs on any benchmark job. Here's an example: JuliaLang/julia#47966 (comment). It would probably be sufficient to bump up the time tolerance.
It is also interesting to see performance improvements. It might be worth making these stand out somehow. Putting them in bold like the regressions would be confusing, but maybe underlining or italics?
I tried it on a beefy remote machine and it was taking so long to load that I had to kill it after 40 minutes. This is also what's causing the 0.7 Travis builds to get killed, as it sits for 10 minutes with no output, which is Travis' limit.
Hi everyone,
I was trying to run BaseBenchmarks with Julia v0.6 built from the latest commit (ec15da730a5b25431d3cfcdc04e3451a9f3e400e) on branch master and got the following error when running the cholfact benchmark:
ERROR: ArgumentError: matrix is not symmetric/Hermitian. This error can be avoided by calling cholfact(Hermitian(A)) which will ignore either the upper or lower triangle of the matrix.
The code I used to invoke the benchmark was:
using BaseBenchmarks
BaseBenchmarks.load!("linalg")
run(BaseBenchmarks.SUITE[["linalg", "factorization", ("cholfact","Matrix",256)]], verbose=true);
I'm also using the latest master commit (e7c01ab) of the BaseBenchmarks repository, so I hope this is not a fault that is caused by my local setup.
Best,
Matthias
The link to "BenchmarkTools manual" does not work; I suggest using https://juliaci.github.io/BenchmarkTools.jl/dev/manual/
(I wondered why this package wasn't registered.)
I found this tool difficult to install — in fact I haven't yet succeeded. It wants me to install Xcode (I'm on macOS) from the App store, which I don't really want to do.
But: I was looking for a simple way to measure the performance of Julia on two or three different machines - a kind of high-level score
that gives a number that can be used to compare their performance: i.e. Computer 1 scored 7.3, but Computer 2 scored 16.3. I know this will be worryingly vague and imprecise, but it can be a useful first step in troubleshooting.
So, is it possible to add something like an executive overview benchmark that's easy to run?
(Obviously I'd need to install it on all the machines I have first... :) )
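Not something the suite provides today, but a sketch of what such an executive score could look like: the geometric mean of a few micro-benchmark medians. The function name `machine_score` and the choice of kernels are made up for illustration:

```julia
using BenchmarkTools, Statistics

fib(n) = n < 2 ? n : fib(n - 1) + fib(n - 2)

# Hypothetical "executive score": geometric mean of median times for a
# handful of micro-benchmarks, reported in milliseconds.  Lower = faster.
function machine_score()
    ts = Float64[]
    push!(ts, median(@benchmark sum(x) setup=(x = rand(10^6))).time)
    push!(ts, median(@benchmark sort(x) setup=(x = rand(10^5))).time)
    push!(ts, median(@benchmark fib(20)).time)
    exp(mean(log.(ts))) / 1e6   # geometric mean, ns -> ms
end
```

The geometric mean keeps one unusually slow kernel from dominating the score, which is why it is the usual choice for cross-machine composites.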
I could have sworn I'd previously opened an issue here with an idea for a benchmark but now I can't find it. There was a discussion of some algorithms that are hard/impossible to do in a vectorized way, and a couple that came up were:
Please post and discuss more ideas here!
Part 2 in the rousing series that began with #158. Things that are deprecated in 0.7 that we need to fix here:
linspace
repmat
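For reference, the 0.7 replacements for these two, sketched with values unchanged:

```julia
# 0.7 replacements for the deprecated names above:
xs = range(0, stop = 1, length = 5)   # was linspace(0, 1, 5)
A  = repeat([1 2; 3 4], 2, 3)         # was repmat([1 2; 3 4], 2, 3)
```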
We recently discovered in JuliaLang/julia#29888 a somewhat surprising fact about modern CPUs: the branch predictor is capable of remembering astoundingly long periodic patterns. If we run a benchmark loop, then each evaluation, and possibly each sample, will have identical branching patterns, thus introducing a period. In other words, the sneaky little CPU learns our sample set, and we have an emergent defeat device for some of our benchmarks.
At least the findall benchmarks are broken, and probably have been broken forever. I suspect that the logical indexing benchmarks are broken as well. But we should really go over all our benchmarks and figure out which ones are affected. Also, this is interesting and something to keep in mind for all our benchmarks. Indirect branch prediction (the BTB) and the D-cache are worth keeping in mind as well.
The likely fix is to increase the size of test sets. Long-term it would be cool to parametrize all benchmarks, and occasionally (rarely) run regression tests on our regression tests: check that everything has the expected scaling behavior, and explain or fix surprises. Alternatively, we could regenerate new random data between runs, but afaik BenchmarkTools has no support for that (it would need a new feature to pin evals/sample = 1).
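For concreteness, a sketch of what per-run data regeneration could look like with BenchmarkTools' setup keyword, assuming evals is pinned to 1 so every timed evaluation sees fresh data:

```julia
using BenchmarkTools

# Regenerate the input before every sample via `setup`; with evals=1, each
# timed evaluation runs on data the branch predictor has not been trained on.
b = @benchmarkable findall(list) setup=(list = rand(Bool, 30_000)) evals=1
t = run(b)
```

The trade-off is that evals=1 forfeits the per-evaluation averaging that makes short benchmarks stable, which is presumably why this is not the default.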
Demo:
julia> using Printf, BenchmarkTools
julia> function cpu_speed_ghz()
# if available, use reported CPU speed instead of current speed (which
# may be inaccurate if the CPU is idling)
cpu_info = Sys.cpu_info()[1]
m = match(r"([\d\.]+)GHz", cpu_info.model)
ghz = m ≡ nothing ? cpu_info.speed / 1000 : parse(Float64, m.captures[1])
end;
julia> const CPU_SPEED_GHZ = cpu_speed_ghz();
julia> const cpu_model = Sys.cpu_info()[1].model;
julia> begin
N=30_000
list = fill(false, N); list[1:2:end].=true;
bt0 = @belapsed findall($list)
list .= rand(Bool, N)
btL = @belapsed findall($list)
time_to_cycle = 10^9/N * CPU_SPEED_GHZ
penalty = 2*(btL-bt0)*time_to_cycle
@printf("\n\n%s; branch-miss penalty: %4.1f ns = %4.1f cycles\n\n",
cpu_model, penalty/CPU_SPEED_GHZ , penalty)
bt = bt0
@printf("Period %5d: %7.2f us = %7.2f cycles per idx. Miss-rate %5.2f%%\n",
2, bt*10^6, bt*time_to_cycle, 100*(bt - bt0) *time_to_cycle / penalty )
for n=[100, 500, 1000, 2000, 2500, 3000, 5000, 10_000, 30_000]
pat = rand(Bool, n)
for i=1:n:N list[i:(i+n-1)].=pat end
bt = @belapsed findall($list)
@printf("Period %5d: %7.2f us = %7.2f cycles per idx. Miss-rate %5.2f%%\n",
n, bt*10^6, bt*time_to_cycle, 100*(bt - bt0) *time_to_cycle / penalty )
end
end;
yielding:
Intel(R) Core(TM) i5-5###U CPU @ 2.00GHz; branch-miss penalty: 9.9 ns = 19.8 cycles
Period 2: 44.81 us = 2.99 cycles per idx. Miss-rate 0.00%
Period 100: 53.22 us = 3.55 cycles per idx. Miss-rate 2.83%
Period 500: 51.52 us = 3.43 cycles per idx. Miss-rate 2.26%
Period 1000: 51.37 us = 3.42 cycles per idx. Miss-rate 2.21%
Period 2000: 57.85 us = 3.86 cycles per idx. Miss-rate 4.39%
Period 2500: 88.66 us = 5.91 cycles per idx. Miss-rate 14.77%
Period 3000: 121.78 us = 8.12 cycles per idx. Miss-rate 25.93%
Period 5000: 159.28 us = 10.62 cycles per idx. Miss-rate 38.56%
Period 10000: 182.87 us = 12.19 cycles per idx. Miss-rate 46.51%
Period 30000: 192.51 us = 12.83 cycles per idx. Miss-rate 49.75%
This is compatible with Agner Fog's tables. And it is absolutely mind-boggling that the CPU manages to completely defeat patterns of length 2000. If you have a different CPU arch available (Ryzen? Skylake? Power?), then please post similar figures. We should increase test-set sizes above the BHT limits for all realistic current and near-future CPUs.
JuliaBox's CPU gives a similar cutoff:
Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz; branch-miss penalty: 6.3 ns = 14.6 cycles
Period 2: 28.40 us = 2.18 cycles per idx. Miss-rate 0.00%
Period 100: 32.50 us = 2.49 cycles per idx. Miss-rate 2.16%
Period 500: 32.30 us = 2.48 cycles per idx. Miss-rate 2.05%
Period 1000: 32.70 us = 2.51 cycles per idx. Miss-rate 2.26%
Period 2000: 33.40 us = 2.56 cycles per idx. Miss-rate 2.63%
Period 2500: 42.70 us = 3.27 cycles per idx. Miss-rate 7.52%
Period 3000: 71.90 us = 5.51 cycles per idx. Miss-rate 22.87%
Period 5000: 102.20 us = 7.84 cycles per idx. Miss-rate 38.80%
Period 10000: 116.70 us = 8.95 cycles per idx. Miss-rate 46.42%
Period 30000: 123.40 us = 9.46 cycles per idx. Miss-rate 49.95%
BaseBenchmarks.jl/src/problem/Laplacian.jl
Lines 41 to 51 in 9306a27
It is just benchmarking the setup of the sparse matrix (effectively testing sparse matmat)
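A hedged sketch of how the timed region could be narrowed with BenchmarkTools' setup keyword; the spdiagm call below is an illustrative stand-in, not the actual construction in src/problem/Laplacian.jl:

```julia
using BenchmarkTools, SparseArrays, LinearAlgebra

n = 8_000
# Time only the matrix-vector product; the sparse matrix is built in `setup`,
# outside the timed region, so we no longer benchmark the construction itself.
b = @benchmarkable mul!(y, A, x) setup = begin
    A = spdiagm(-1 => fill(-1.0, $n - 1), 0 => fill(2.0, $n), 1 => fill(-1.0, $n - 1))
    x = rand($n)
    y = zeros($n)
end
```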
If you make a function faster while keeping the GC pressure the same, the % of time in GC will go up, but this is not really a regression.
For example: https://github.com/JuliaCI/BaseBenchmarkReports/blob/master/5e07022/5e07022_vs_1a518ec.md
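One way to make this concrete is to judge GC on absolute time rather than percentage; `gc_regressed` below is a hypothetical check, not an existing API:

```julia
# Flag a GC regression only when absolute GC time grows, not when the GC
# *percentage* grows because the non-GC portion of the runtime shrank.
# Times are in ns, gcratio is the fraction of time spent in GC (0..1).
gc_regressed(old_time, old_gcratio, new_time, new_gcratio; tol = 1.05) =
    new_time * new_gcratio > tol * old_time * old_gcratio
```

In the example report above, a function that got 2x faster with unchanged allocation would double its GC percentage yet pass this check.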
I just ran the following with Julia v1.3.1 and the current master branch of BaseBenchmarks.jl on both my MacBook Pro and iMac (macOS 10.14.6 Mojave), my Windows 10 machine (v1909), and my Linux machine (Ubuntu 18.04 LTS). If I ran using ApproxFun
before running the benchmark tests, it produced the following DomainError while running (20/25) benchmarking "floatexp"
. So far, of the many other packages I typically use, none besides ApproxFun
generated the same error. What is the problem here? Thanks for your help!
using ApproxFun, BenchmarkTools, BaseBenchmarks
BaseBenchmarks.load!("scalar")
results = run(BaseBenchmarks.SUITE["scalar"]["floatexp"]; verbose = true)
...
(22/55) benchmarking ("exponent", "subnorm", "Float32")...
ERROR: DomainError with 0.0:
Cannot be subnormal converted to 0.
Stacktrace:
[1] (::Base.Math.var"#throw2#2")(::Float32) at ./math.jl:716
[2] exponent at ./math.jl:721 [inlined]
[3] ##core#6109(::Float32) at /Users/xxx/.julia/packages/BenchmarkTools/7aqwe/src/execution.jl:297
[4] ##sample#6110(::BenchmarkTools.Parameters) at /Users/xxx/.julia/packages/BenchmarkTools/7aqwe/src/execution.jl:303
[5] #_run#743(::Bool, ::String, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(BenchmarkTools._run), ::BenchmarkTools.Benchmark{Symbol("##benchmark#6108")}, ::BenchmarkTools.Parameters) at /Users/xxx/.julia/packages/BenchmarkTools/7aqwe/src/execution.jl:331
[6] (::BenchmarkTools.var"#kw##_run")(::NamedTuple{(:verbose, :pad),Tuple{Bool,String}}, ::typeof(BenchmarkTools._run), ::BenchmarkTools.Benchmark{Symbol("##benchmark#6108")}, ::BenchmarkTools.Parameters) at ./none:0
[7] (::Base.var"#inner#2"{Base.Iterators.Pairs{Symbol,Any,Tuple{Symbol,Symbol},NamedTuple{(:verbose, :pad),Tuple{Bool,String}}},typeof(BenchmarkTools._run),Tuple{BenchmarkTools.Benchmark{Symbol("##benchmark#6108")},BenchmarkTools.Parameters}})() at ./essentials.jl:712
[8] #invokelatest#1 at ./essentials.jl:713 [inlined]
[9] #invokelatest at ./none:0 [inlined]
[10] #run_result#37 at /Users/xxx/.julia/packages/BenchmarkTools/7aqwe/src/execution.jl:32 [inlined]
[11] #run_result at ./none:0 [inlined]
[12] #run#39(::Base.Iterators.Pairs{Symbol,Any,Tuple{Symbol,Symbol},NamedTuple{(:verbose, :pad),Tuple{Bool,String}}}, ::typeof(run), ::BenchmarkTools.Benchmark{Symbol("##benchmark#6108")}, ::BenchmarkTools.Parameters) at /Users/xxx/.julia/packages/BenchmarkTools/7aqwe/src/execution.jl:46
[13] #run at ./none:0 [inlined] (repeats 2 times)
[14] macro expansion at /Users/xxx/.julia/packages/BenchmarkTools/7aqwe/src/execution.jl:55 [inlined]
[15] macro expansion at ./util.jl:212 [inlined]
[16] #run#40(::Bool, ::String, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(run), ::BenchmarkGroup) at /Users/xxx/.julia/packages/BenchmarkTools/7aqwe/src/execution.jl:54
[17] (::Base.var"#kw##run")(::NamedTuple{(:verbose, :pad),Tuple{Bool,String}}, ::typeof(run), ::BenchmarkGroup) at ./none:0
[18] macro expansion at /Users/xxx/.julia/packages/BenchmarkTools/7aqwe/src/execution.jl:55 [inlined]
[19] macro expansion at ./util.jl:212 [inlined]
[20] #run#40(::Bool, ::String, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(run), ::BenchmarkGroup) at /Users/xxx/.julia/packages/BenchmarkTools/7aqwe/src/execution.jl:54
[21] (::Base.var"#kw##run")(::NamedTuple{(:verbose, :pad),Tuple{Bool,String}}, ::typeof(run), ::BenchmarkGroup) at ./none:0
[22] macro expansion at /Users/xxx/.julia/packages/BenchmarkTools/7aqwe/src/execution.jl:55 [inlined]
[23] macro expansion at ./util.jl:212 [inlined]
[24] #run#40(::Bool, ::String, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(run), ::BenchmarkGroup) at /Users/xxx/.julia/packages/BenchmarkTools/7aqwe/src/execution.jl:54
[25] (::Base.var"#kw##run")(::NamedTuple{(:verbose,),Tuple{Bool}}, ::typeof(run), ::BenchmarkGroup) at ./none:0
[26] top-level scope at none:0
It's pretty clear that there will be no more 0.6 releases. The potential for another 0.6 release was the only reason we still run 0.6 on CI and make heavy use of Compat here. I think it's now safe to drop 0.6 support.
Tasks:
I count ~240 of them only testing sparse * dense matrix multiplication. We cannot have this type of exhaustive testing here, in my opinion.
Lately on Julia master, the sparse matmul benchmarks have been showing a lot of spurious improvements and regressions. It would be great if we could determine why they're so noisy and adjust them as needed.
@Sacha0, would you be willing to take a crack at this?
Runtime of functions is nice, but it is also nice to track things like time to first plot.
Incorporating a set of compilation-time benchmarks is a bit tricky because you need to take care to restart Julia, clear out precompilation files, etc., but it should be doable.
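A minimal sketch of the restart-per-measurement idea, timing a fresh Julia process from the outside; `time_fresh` is a hypothetical helper, and the measured time includes process startup:

```julia
# Spawn a fresh Julia process per measurement so no compiled state from the
# current session leaks into the timing.
function time_fresh(code::String)
    t0 = time_ns()
    run(`$(Base.julia_cmd()) --startup-file=no -e $code`)
    (time_ns() - t0) / 1e9   # seconds, includes interpreter startup
end

time_fresh("using Dates; now()")   # e.g. a "time to first timestamp" probe
```

Clearing precompile caches between runs (to measure cold vs. warm precompilation separately) would still need to be layered on top of this.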
The tests take too long, here's why:
$ grep '^[^ ]' 093b2a6ea943f4c70fef4742453e73dd1aba255c_primary.out
RUNNING BENCHMARKS...
(1/20) benchmarking "shootout"...
done (took 70.756991683 seconds)
(2/20) benchmarking "string"...
done (took 90.13968044 seconds)
(3/20) benchmarking "linalg"...
done (took 833.003473297 seconds)
(4/20) benchmarking "parallel"...
done (took 17.074202133 seconds)
(5/20) benchmarking "find"...
done (took 139.282557616 seconds)
(6/20) benchmarking "tuple"...
done (took 107.72978849 seconds)
(7/20) benchmarking "dates"...
done (took 112.767580891 seconds)
(8/20) benchmarking "micro"...
done (took 43.087218279 seconds)
(9/20) benchmarking "io"...
done (took 61.65084442 seconds)
(10/20) benchmarking "scalar"... # not unreasonable tests, but takes 2 seconds each
done (took 2877.452812717 seconds)
(11/20) benchmarking "sparse"... # "matmul" has over 200 combinations, probably could be more "sparse"
done (took 1278.73521643 seconds)
(12/20) benchmarking "broadcast"...
done (took 104.842238879 seconds)
(13/20) benchmarking "union"... # 341 tests may be reasonable, but taking 3 seconds each
done (took 1192.683332499 seconds)
(14/20) benchmarking "simd"...
done (took 337.777793678 seconds)
(15/20) benchmarking "random"...
done (took 619.551763952 seconds)
(16/20) benchmarking "problem"...
done (took 172.609170893 seconds)
(17/20) benchmarking "array"... # 608 "index" tests may be a bit much, and take 4 seconds each
done (took 2795.986356153 seconds)
(18/20) benchmarking "misc"...
done (took 158.855686545 seconds)
(19/20) benchmarking "sort"...
done (took 648.501236383 seconds)
(20/20) benchmarking "collection"...
done (took 916.806169471 seconds)
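Assuming the per-benchmark time budget is the dominant cost in the totals above, one knob worth trying is BenchmarkTools' default parameters, sketched here with made-up values:

```julia
using BenchmarkTools

# Trim the per-benchmark measurement budget (the default is 5 s) and cap the
# sample count.  With roughly 3000 benchmarks in the suite, shaving a few
# seconds off each one saves hours of wall time per run.
BenchmarkTools.DEFAULT_PARAMETERS.seconds = 1.0
BenchmarkTools.DEFAULT_PARAMETERS.samples = 300
```

The cost is noisier estimates per benchmark, so tolerance thresholds in the regression judging would likely need to be loosened in step.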
BenchmarkTools now serializes benchmark data to and from JSON rather than JLD, so the tuning here will need to produce a .json file. In all likelihood we will lose the ability to easily retune a specific group only and instead will have to retune everything every time (which is what I've been doing anyway).
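For reference, a sketch of the JSON round-trip using BenchmarkTools' save/load/loadparams! API; the file name and toy suite are illustrative:

```julia
using BenchmarkTools

suite = BenchmarkGroup()
suite["sum"] = @benchmarkable sum($(rand(1000)))
tune!(suite)

# Serialize the tuned parameters to JSON, then load them back into the suite
# (copying only the evals and samples fields, as the retuning workflow does).
BenchmarkTools.save("params.json", params(suite))
loadparams!(suite, BenchmarkTools.load("params.json")[1], :evals, :samples)
```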
This issue will be a holding ground for various deprecations in 0.7 that are not yet fixed here.
A_mul_B and friends
transpose
Void
replace without Pair
search, searchindex
Array constructor without uninitialized
sub2ind
CartesianRange
parse(::String)
Vector{UInt8}(::String)
find
parse in a base without a kwarg
method_exists
indmin/indmax
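For reference, 0.7/1.0 replacements for several of the names above, sketched with small examples:

```julia
v    = parse(Int, "42")                    # was parse("42")
inds = findall(isodd, [1, 2, 3])           # was find(isodd, [1, 2, 3])
i    = argmin([3, 1, 2])                   # indmin -> argmin
j    = argmax([3, 1, 2])                   # indmax -> argmax
li   = LinearIndices((4, 5))[2, 3]         # was sub2ind((4, 5), 2, 3)
R    = CartesianIndices((2, 2))            # was CartesianRange((2, 2))
A    = Vector{Float64}(undef, 3)           # Array constructor now needs undef
hm   = hasmethod(sum, Tuple{Vector{Int}})  # was method_exists
```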
FYI I'm seeing a strange error that starts with libhdf5 messages of which the relevant portion seems to be
...
#010: ../../../src/H5FDint.c line 207 in H5FD_read(): addr overflow, addr = 140424345352640, size=512, eoa=1905739
major: Invalid arguments to routine
minor: Address overflowed
ERROR: LoadError: Error dereferencing object
Stacktrace:
...
It turns out this can be circumvented by replacing
BaseBenchmarks.loadall!()
with
BaseBenchmarks.load!("linalg")
BaseBenchmarks.loadall!()
Filing this without understanding what is actually going wrong, in case someone else encounters this.