juliaci / basebenchmarks.jl
A collection of Julia benchmarks available for CI tracking from the JuliaLang/julia repository
License: Other
It would be amazing if this package could export data to Prometheus, so we could query it from Grafana and get nice insights into performance over time.
One use case is to collect such data from the changes developers make in a given Julia package repository, so that stakeholders can see how the code's speed evolves over time.
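For illustration, a minimal sketch of what such an export could look like, assuming results have already been flattened into a Dict of name => time; the metric name `julia_benchmark_time_ns` and the helper `write_prometheus` are hypothetical, and a real exporter would walk a BenchmarkTools.BenchmarkGroup instead:

```julia
# Sketch: dump benchmark results in the Prometheus text exposition format.
# `results` maps benchmark names to median times in nanoseconds.
function write_prometheus(io::IO, results::Dict{String,Float64})
    println(io, "# TYPE julia_benchmark_time_ns gauge")
    for (name, t) in sort(collect(results))
        # Prometheus label values must escape backslashes and double quotes.
        esc = replace(replace(name, "\\" => "\\\\"), "\"" => "\\\"")
        println(io, "julia_benchmark_time_ns{benchmark=\"$esc\"} $t")
    end
end

io = IOBuffer()
write_prometheus(io, Dict("sum_linear" => 123.4))
print(String(take!(io)))
```

Grafana could then chart this gauge per benchmark label over time.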
Changing the RNG in Julia can cause spurious benchmark regressions. For example, an integer might be pushed above or below the limit of the cache of pre-boxed small values, causing changes in allocations.
@ararslan would be good if Nanosoldier could get refreshed with the latest benchmarks and possibly a retune.
I have checked out the nanosoldier branch, and cloned Benchmarks and BenchmarkTrackers. When I try to load BaseBenchmarks I get this error:
ERROR: LoadError: LoadError: LoadError: UndefVarError: ExecutionResults not defined
in include(::ASCIIString) at ./boot.jl:233
in include_from_node1(::ASCIIString) at ./loading.jl:426
in include(::ASCIIString) at ./boot.jl:233
in include_from_node1(::ASCIIString) at ./loading.jl:426
in eval(::Module, ::Any) at ./boot.jl:236
[inlined code] from ./sysimg.jl:11
in require(::Symbol) at ./loading.jl:357
in include(::ASCIIString) at ./boot.jl:233
in include_from_node1(::ASCIIString) at ./loading.jl:426
in eval(::Module, ::Any) at ./boot.jl:236
[inlined code] from ./sysimg.jl:11
in require(::Symbol) at ./loading.jl:357
in eval(::Module, ::Any) at ./boot.jl:236
while loading /home/jeff/.julia/v0.5/BenchmarkTrackers/src/metrics.jl, in expression starting on line 21
while loading /home/jeff/.julia/v0.5/BenchmarkTrackers/src/BenchmarkTrackers.jl, in expression starting on line 38
while loading /home/jeff/.julia/v0.5/BaseBenchmarks/src/BaseBenchmarks.jl, in expression starting on line 3
Sorry for the unclear title. The problem can be summarized as follows:
T = UInt; run(tune!(@benchmarkable rand($T)))
gives a heavily over-estimated time compared to
run(tune!(@benchmarkable rand(UInt)))
While preparing a PR against julia/master, the RandomBenchmarks showed a lot of regressions because of this (in this case, T is set in a loop), even though the performance is not degraded when running individual benchmarks using the second form (i.e. using UInt directly). I tried solving this with some incantation of eval, with no success. My last try was something like (edit: it doesn't work):
T = UInt; RD = RandomDevice(); g[...] = eval(@benchmarkable rand(Expr(:$, RD), $T))
(here RD must not be interpolated by eval, only by @benchmarkable). I'm not sure whether this works as intended, but it's ugly, so I wanted to discuss the problem here before working more on it.
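For reference, one pattern that sidesteps the interpolation overhead is to build the whole @benchmarkable expression under eval, splicing the loop variable in as a literal so the generated benchmark body reads exactly like the hand-written form; the group keys below are illustrative:

```julia
using BenchmarkTools, Random

g = BenchmarkGroup()
RD = RandomDevice()
for T in (UInt, Int, Float64)
    # `$RD` and `$T` are spliced into the expression as literal values
    # before the macro expands, so the benchmark body is effectively
    # `rand(<RandomDevice>, UInt)` -- no runtime lookup of a global `T`.
    g["rand", string(T)] = eval(:(@benchmarkable rand($RD, $T)))
end
```

Whether this matches the timings of the direct form in every case is worth verifying with tune! and run on both variants.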
This is an annoyance because it runs on any benchmark job. Here's an example: JuliaLang/julia#47966 (comment). It would probably be sufficient to bump up the time tolerance.
It is also interesting to see performance improvements. It might be worth making these stand out somehow. Putting them in bold like the regressions would be confusing, but maybe underlining or italics?
I tried it on a beefy remote machine and it was taking so long to load that I had to kill it after 40 minutes. This is also what's causing the 0.7 Travis builds to get killed, as it sits for 10 minutes with no output, which is Travis' limit.
Hi everyone,
I was trying to run BaseBenchmarks with Julia v0.6 built from the latest commit (ec15da730a5b25431d3cfcdc04e3451a9f3e400e) on branch master and got the following error when running the cholfact benchmark:
ERROR: ArgumentError: matrix is not symmetric/Hermitian. This error can be avoided by calling cholfact(Hermitian(A)) which will ignore either the upper or lower triangle of the matrix.
The code I used to invoke the benchmark was:
using BaseBenchmarks
BaseBenchmarks.load!("linalg")
run(BaseBenchmarks.SUITE[["linalg", "factorization", ("cholfact","Matrix",256)]], verbose=true);
I'm also using the latest master commit (e7c01ab) of the BaseBenchmarks repository, so I hope this is not a fault that is caused by my local setup.
Best,
Matthias
The link to "BenchmarkTools manual" does not work; I suggest using https://juliaci.github.io/BenchmarkTools.jl/dev/manual/
(I wondered why this package wasn't registered.)
I found this tool difficult to install — in fact I haven't yet succeeded. It wants me to install Xcode (I'm on macOS) from the App store, which I don't really want to do.
But: I was looking for a simple way to measure the performance of Julia on two or three different machines - a kind of high-level score
that gives a number that can be used to compare their performance: i.e. Computer 1 scored 7.3, but Computer 2 scored 16.3. I know this will be worryingly vague and imprecise, but it can be a useful first step in troubleshooting.
So, is it possible to add something like an executive overview benchmark that's easy to run?
(Obviously I'd need to install it on all the machines I have first... :) )
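Not something the suite provides today, but a sketch of what such an executive score could look like: the geometric mean of a few micro-benchmark medians. The function name `machine_score` and the choice of kernels are made up for illustration:

```julia
using BenchmarkTools, Statistics

fib(n) = n < 2 ? n : fib(n - 1) + fib(n - 2)

# Hypothetical "executive score": geometric mean of median times for a
# handful of micro-benchmarks, reported in milliseconds.  Lower = faster.
function machine_score()
    ts = Float64[]
    push!(ts, median(@benchmark sum(x) setup=(x = rand(10^6))).time)
    push!(ts, median(@benchmark sort(x) setup=(x = rand(10^5))).time)
    push!(ts, median(@benchmark fib(20)).time)
    exp(mean(log.(ts))) / 1e6   # geometric mean, ns -> ms
end
```

The geometric mean keeps one unusually slow kernel from dominating the score, which is why it is the usual choice for cross-machine composites.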
I could have sworn I'd previously opened an issue here with an idea for a benchmark but now I can't find it. There was a discussion of some algorithms that are hard/impossible to do in a vectorized way, and a couple that came up were:
Please post and discuss more ideas here!
Part 2 in the rousing series that began with #158. Things that are deprecated in 0.7 that we need to fix here:
linspace
repmat
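For reference, the 0.7 replacements for these two, sketched with values unchanged:

```julia
# 0.7 replacements for the deprecated names above:
xs = range(0, stop = 1, length = 5)   # was linspace(0, 1, 5)
A  = repeat([1 2; 3 4], 2, 3)         # was repmat([1 2; 3 4], 2, 3)
```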
We recently discovered in JuliaLang/julia#29888 a somewhat surprising fact about modern CPUs: the branch predictor is capable of remembering astoundingly long periodic patterns. If we run a benchmark loop, then each evaluation, and possibly each sample, will have identical branching patterns, thus introducing a period. In other words, the sneaky little CPU learns our sample set, and we have an emergent defeat device for some of our benchmarks.
At least the findall benchmarks are broken, and probably have been broken forever. I suspect that the logical indexing benchmarks are broken as well. But we should really go over all our benchmarks and figure out which ones are affected. Also, this is interesting and something to keep in mind for all our benchmarks. Indirect branch prediction (the BTB) and the D-cache are worth keeping in mind as well.
The likely fix is to increase the size of test sets. Long-term it would be cool to parametrize all benchmarks, and occasionally (rarely) run regression tests on our regression tests: check that everything has the expected scaling behavior, and explain or fix surprises. Alternatively, we could regenerate new random data between runs, but afaik BenchmarkTools has no support for that (it would need a new feature to pin evals/sample = 1).
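For concreteness, a sketch of what per-run data regeneration could look like with BenchmarkTools' setup keyword, assuming evals is pinned to 1 so every timed evaluation sees fresh data:

```julia
using BenchmarkTools

# Regenerate the input before every sample via `setup`; with evals=1, each
# timed evaluation runs on data the branch predictor has not been trained on.
b = @benchmarkable findall(list) setup=(list = rand(Bool, 30_000)) evals=1
t = run(b)
```

The trade-off is that evals=1 forfeits the per-evaluation averaging that makes short benchmarks stable, which is presumably why this is not the default.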
Demo:
julia> using Printf, BenchmarkTools
julia> function cpu_speed_ghz()
# if available, use reported CPU speed instead of current speed (which
# may be inaccurate if the CPU is idling)
cpu_info = Sys.cpu_info()[1]
m = match(r"([\d\.]+)GHz", cpu_info.model)
ghz = m ≡ nothing ? cpu_info.speed / 1000 : parse(Float64, m.captures[1])
end;
julia> const CPU_SPEED_GHZ = cpu_speed_ghz();
julia> const cpu_model = Sys.cpu_info()[1].model;
julia> begin
N=30_000
list = fill(false, N); list[1:2:end].=true;
bt0 = @belapsed findall($list)
list .= rand(Bool, N)
btL = @belapsed findall($list)
time_to_cycle = 10^9/N * CPU_SPEED_GHZ
penalty = 2*(btL-bt0)*time_to_cycle
@printf("\n\n%s; branch-miss penalty: %4.1f ns = %4.1f cycles\n\n",
cpu_model, penalty/CPU_SPEED_GHZ , penalty)
bt = bt0
@printf("Period %5d: %7.2f us = %7.2f cycles per idx. Miss-rate %5.2f%%\n",
2, bt*10^6, bt*time_to_cycle, 100*(bt - bt0) *time_to_cycle / penalty )
for n=[100, 500, 1000, 2000, 2500, 3000, 5000, 10_000, 30_000]
pat = rand(Bool, n)
for i=1:n:N list[i:(i+n-1)].=pat end
bt = @belapsed findall($list)
@printf("Period %5d: %7.2f us = %7.2f cycles per idx. Miss-rate %5.2f%%\n",
n, bt*10^6, bt*time_to_cycle, 100*(bt - bt0) *time_to_cycle / penalty )
end
end;
yielding:
Intel(R) Core(TM) i5-5###U CPU @ 2.00GHz; branch-miss penalty: 9.9 ns = 19.8 cycles
Period 2: 44.81 us = 2.99 cycles per idx. Miss-rate 0.00%
Period 100: 53.22 us = 3.55 cycles per idx. Miss-rate 2.83%
Period 500: 51.52 us = 3.43 cycles per idx. Miss-rate 2.26%
Period 1000: 51.37 us = 3.42 cycles per idx. Miss-rate 2.21%
Period 2000: 57.85 us = 3.86 cycles per idx. Miss-rate 4.39%
Period 2500: 88.66 us = 5.91 cycles per idx. Miss-rate 14.77%
Period 3000: 121.78 us = 8.12 cycles per idx. Miss-rate 25.93%
Period 5000: 159.28 us = 10.62 cycles per idx. Miss-rate 38.56%
Period 10000: 182.87 us = 12.19 cycles per idx. Miss-rate 46.51%
Period 30000: 192.51 us = 12.83 cycles per idx. Miss-rate 49.75%
This is compatible with Agner Fog's tables. And it is absolutely mind-boggling that the CPU manages to completely defeat patterns of length 2000. If you have a different CPU arch available (Ryzen? Skylake? Power?), then please post similar figures. We should increase test-set sizes above the BHT limits for all realistic current and near-future CPUs.
JuliaBox's CPU gives a similar cutoff:
Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz; branch-miss penalty: 6.3 ns = 14.6 cycles
Period 2: 28.40 us = 2.18 cycles per idx. Miss-rate 0.00%
Period 100: 32.50 us = 2.49 cycles per idx. Miss-rate 2.16%
Period 500: 32.30 us = 2.48 cycles per idx. Miss-rate 2.05%
Period 1000: 32.70 us = 2.51 cycles per idx. Miss-rate 2.26%
Period 2000: 33.40 us = 2.56 cycles per idx. Miss-rate 2.63%
Period 2500: 42.70 us = 3.27 cycles per idx. Miss-rate 7.52%
Period 3000: 71.90 us = 5.51 cycles per idx. Miss-rate 22.87%
Period 5000: 102.20 us = 7.84 cycles per idx. Miss-rate 38.80%
Period 10000: 116.70 us = 8.95 cycles per idx. Miss-rate 46.42%
Period 30000: 123.40 us = 9.46 cycles per idx. Miss-rate 49.95%
BaseBenchmarks.jl/src/problem/Laplacian.jl
Lines 41 to 51 in 9306a27
It is just benchmarking the setup of the sparse matrix (effectively testing sparse matmat)
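A hedged sketch of how the timed region could be narrowed with BenchmarkTools' setup keyword; the spdiagm call below is an illustrative stand-in, not the actual construction in src/problem/Laplacian.jl:

```julia
using BenchmarkTools, SparseArrays, LinearAlgebra

n = 8_000
# Time only the matrix-vector product; the sparse matrix is built in `setup`,
# outside the timed region, so we no longer benchmark the construction itself.
b = @benchmarkable mul!(y, A, x) setup = begin
    A = spdiagm(-1 => fill(-1.0, $n - 1), 0 => fill(2.0, $n), 1 => fill(-1.0, $n - 1))
    x = rand($n)
    y = zeros($n)
end
```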
If you make a function faster while keeping the GC pressure the same, the % of time in GC will go up, but this is not really a regression.
For example: https://github.com/JuliaCI/BaseBenchmarkReports/blob/master/5e07022/5e07022_vs_1a518ec.md
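One way to make this concrete is to judge GC on absolute time rather than percentage; `gc_regressed` below is a hypothetical check, not an existing API:

```julia
# Flag a GC regression only when absolute GC time grows, not when the GC
# *percentage* grows because the non-GC portion of the runtime shrank.
# Times are in ns, gcratio is the fraction of time spent in GC (0..1).
gc_regressed(old_time, old_gcratio, new_time, new_gcratio; tol = 1.05) =
    new_time * new_gcratio > tol * old_time * old_gcratio
```

In the example report above, a function that got 2x faster with unchanged allocation would double its GC percentage yet pass this check.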
I just ran the following with Julia v1.3.1 and the current master branch of BaseBenchmarks.jl on both my MacBook Pro and iMac (macOS 10.14.6 Mojave), my Windows 10 machine (v1909), and my Linux machine (Ubuntu 18.04 LTS). If I ran using ApproxFun
before running the benchmark tests, it produced the following DomainError while running (20/25) benchmarking "floatexp"
. So far, of the many other packages I typically use, none besides ApproxFun
generated the same error. What is the problem here? Thanks for your help!
using ApproxFun, BenchmarkTools, BaseBenchmarks
BaseBenchmarks.load!("scalar")
results = run(BaseBenchmarks.SUITE["scalar"]["floatexp"]; verbose = true)
...
(22/55) benchmarking ("exponent", "subnorm", "Float32")...
ERROR: DomainError with 0.0:
Cannot be subnormal converted to 0.
Stacktrace:
[1] (::Base.Math.var"#throw2#2")(::Float32) at ./math.jl:716
[2] exponent at ./math.jl:721 [inlined]
[3] ##core#6109(::Float32) at /Users/xxx/.julia/packages/BenchmarkTools/7aqwe/src/execution.jl:297
[4] ##sample#6110(::BenchmarkTools.Parameters) at /Users/xxx/.julia/packages/BenchmarkTools/7aqwe/src/execution.jl:303
[5] #_run#743(::Bool, ::String, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(BenchmarkTools._run), ::BenchmarkTools.Benchmark{Symbol("##benchmark#6108")}, ::BenchmarkTools.Parameters) at /Users/xxx/.julia/packages/BenchmarkTools/7aqwe/src/execution.jl:331
[6] (::BenchmarkTools.var"#kw##_run")(::NamedTuple{(:verbose, :pad),Tuple{Bool,String}}, ::typeof(BenchmarkTools._run), ::BenchmarkTools.Benchmark{Symbol("##benchmark#6108")}, ::BenchmarkTools.Parameters) at ./none:0
[7] (::Base.var"#inner#2"{Base.Iterators.Pairs{Symbol,Any,Tuple{Symbol,Symbol},NamedTuple{(:verbose, :pad),Tuple{Bool,String}}},typeof(BenchmarkTools._run),Tuple{BenchmarkTools.Benchmark{Symbol("##benchmark#6108")},BenchmarkTools.Parameters}})() at ./essentials.jl:712
[8] #invokelatest#1 at ./essentials.jl:713 [inlined]
[9] #invokelatest at ./none:0 [inlined]
[10] #run_result#37 at /Users/xxx/.julia/packages/BenchmarkTools/7aqwe/src/execution.jl:32 [inlined]
[11] #run_result at ./none:0 [inlined]
[12] #run#39(::Base.Iterators.Pairs{Symbol,Any,Tuple{Symbol,Symbol},NamedTuple{(:verbose, :pad),Tuple{Bool,String}}}, ::typeof(run), ::BenchmarkTools.Benchmark{Symbol("##benchmark#6108")}, ::BenchmarkTools.Parameters) at /Users/xxx/.julia/packages/BenchmarkTools/7aqwe/src/execution.jl:46
[13] #run at ./none:0 [inlined] (repeats 2 times)
[14] macro expansion at /Users/xxx/.julia/packages/BenchmarkTools/7aqwe/src/execution.jl:55 [inlined]
[15] macro expansion at ./util.jl:212 [inlined]
[16] #run#40(::Bool, ::String, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(run), ::BenchmarkGroup) at /Users/xxx/.julia/packages/BenchmarkTools/7aqwe/src/execution.jl:54
[17] (::Base.var"#kw##run")(::NamedTuple{(:verbose, :pad),Tuple{Bool,String}}, ::typeof(run), ::BenchmarkGroup) at ./none:0
[18] macro expansion at /Users/xxx/.julia/packages/BenchmarkTools/7aqwe/src/execution.jl:55 [inlined]
[19] macro expansion at ./util.jl:212 [inlined]
[20] #run#40(::Bool, ::String, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(run), ::BenchmarkGroup) at /Users/xxx/.julia/packages/BenchmarkTools/7aqwe/src/execution.jl:54
[21] (::Base.var"#kw##run")(::NamedTuple{(:verbose, :pad),Tuple{Bool,String}}, ::typeof(run), ::BenchmarkGroup) at ./none:0
[22] macro expansion at /Users/xxx/.julia/packages/BenchmarkTools/7aqwe/src/execution.jl:55 [inlined]
[23] macro expansion at ./util.jl:212 [inlined]
[24] #run#40(::Bool, ::String, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(run), ::BenchmarkGroup) at /Users/xxx/.julia/packages/BenchmarkTools/7aqwe/src/execution.jl:54
[25] (::Base.var"#kw##run")(::NamedTuple{(:verbose,),Tuple{Bool}}, ::typeof(run), ::BenchmarkGroup) at ./none:0
[26] top-level scope at none:0
It's pretty clear that there will be no more 0.6 releases. The potential for another 0.6 release was the only reason we still run 0.6 on CI and make heavy use of Compat here. I think it's now safe to drop 0.6 support.
Tasks:
I count ~240 of them only testing sparse * dense matrix multiplication. We cannot have this type of exhaustive testing here, in my opinion.
Lately on Julia master, the sparse matmul benchmarks have been showing a lot of spurious improvements and regressions. It would be great if we could determine why they're so noisy and adjust them as needed.
@Sacha0, would you be willing to take a crack at this?
Runtime of functions is nice, but it is also nice to track things like time to first plot.
Incorporating a set of compilation-time benchmarks is a bit tricky because you need to take care to restart Julia, clear out precompilation files, etc., but it should be doable.
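A minimal sketch of the restart-per-measurement idea, timing a fresh Julia process from the outside; `time_fresh` is a hypothetical helper, and the measured time includes process startup:

```julia
# Spawn a fresh Julia process per measurement so no compiled state from the
# current session leaks into the timing.
function time_fresh(code::String)
    t0 = time_ns()
    run(`$(Base.julia_cmd()) --startup-file=no -e $code`)
    (time_ns() - t0) / 1e9   # seconds, includes interpreter startup
end

time_fresh("using Dates; now()")   # e.g. a "time to first timestamp" probe
```

Clearing precompile caches between runs (to measure cold vs. warm precompilation separately) would still need to be layered on top of this.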
The tests take too long, here's why:
$ grep '^[^ ]' 093b2a6ea943f4c70fef4742453e73dd1aba255c_primary.out
RUNNING BENCHMARKS...
(1/20) benchmarking "shootout"...
done (took 70.756991683 seconds)
(2/20) benchmarking "string"...
done (took 90.13968044 seconds)
(3/20) benchmarking "linalg"...
done (took 833.003473297 seconds)
(4/20) benchmarking "parallel"...
done (took 17.074202133 seconds)
(5/20) benchmarking "find"...
done (took 139.282557616 seconds)
(6/20) benchmarking "tuple"...
done (took 107.72978849 seconds)
(7/20) benchmarking "dates"...
done (took 112.767580891 seconds)
(8/20) benchmarking "micro"...
done (took 43.087218279 seconds)
(9/20) benchmarking "io"...
done (took 61.65084442 seconds)
(10/20) benchmarking "scalar"... # not unreasonable tests, but takes 2 seconds each
done (took 2877.452812717 seconds)
(11/20) benchmarking "sparse"... # "matmul" has over 200 combinations, probably could be more "sparse"
done (took 1278.73521643 seconds)
(12/20) benchmarking "broadcast"...
done (took 104.842238879 seconds)
(13/20) benchmarking "union"... # 341 tests may be reasonable, but taking 3 seconds each
done (took 1192.683332499 seconds)
(14/20) benchmarking "simd"...
done (took 337.777793678 seconds)
(15/20) benchmarking "random"...
done (took 619.551763952 seconds)
(16/20) benchmarking "problem"...
done (took 172.609170893 seconds)
(17/20) benchmarking "array"... # 608 "index" tests may be a bit much, and take 4 seconds each
done (took 2795.986356153 seconds)
(18/20) benchmarking "misc"...
done (took 158.855686545 seconds)
(19/20) benchmarking "sort"...
done (took 648.501236383 seconds)
(20/20) benchmarking "collection"...
done (took 916.806169471 seconds)
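Assuming the per-benchmark time budget is the dominant cost in the totals above, one knob worth trying is BenchmarkTools' default parameters, sketched here with made-up values:

```julia
using BenchmarkTools

# Trim the per-benchmark measurement budget (the default is 5 s) and cap the
# sample count.  With roughly 3000 benchmarks in the suite, shaving a few
# seconds off each one saves hours of wall time per run.
BenchmarkTools.DEFAULT_PARAMETERS.seconds = 1.0
BenchmarkTools.DEFAULT_PARAMETERS.samples = 300
```

The cost is noisier estimates per benchmark, so tolerance thresholds in the regression judging would likely need to be loosened in step.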
BenchmarkTools now serializes benchmark data to and from JSON rather than JLD, so the tuning here will need to produce a .json file. In all likelihood we will lose the ability to easily retune a specific group only and instead will have to retune everything every time (which is what I've been doing anyway).
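For reference, a sketch of the JSON round-trip using BenchmarkTools' save/load/loadparams! API; the file name and toy suite are illustrative:

```julia
using BenchmarkTools

suite = BenchmarkGroup()
suite["sum"] = @benchmarkable sum($(rand(1000)))
tune!(suite)

# Serialize the tuned parameters to JSON, then load them back into the suite
# (copying only the evals and samples fields, as the retuning workflow does).
BenchmarkTools.save("params.json", params(suite))
loadparams!(suite, BenchmarkTools.load("params.json")[1], :evals, :samples)
```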
This issue will be a holding ground for various deprecations in 0.7 that are not yet fixed here.
A_mul_B and friends
transpose
Void
replace without Pair
search, searchindex
Array constructor without uninitialized
sub2ind
CartesianRange
parse(::String)
Vector{UInt8}(::String)
find
parse in a base without a kwarg
method_exists
indmin/indmax
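For reference, 0.7/1.0 replacements for several of the names above, sketched with small examples:

```julia
v    = parse(Int, "42")                    # was parse("42")
inds = findall(isodd, [1, 2, 3])           # was find(isodd, [1, 2, 3])
i    = argmin([3, 1, 2])                   # indmin -> argmin
j    = argmax([3, 1, 2])                   # indmax -> argmax
li   = LinearIndices((4, 5))[2, 3]         # was sub2ind((4, 5), 2, 3)
R    = CartesianIndices((2, 2))            # was CartesianRange((2, 2))
A    = Vector{Float64}(undef, 3)           # Array constructor now needs undef
hm   = hasmethod(sum, Tuple{Vector{Int}})  # was method_exists
```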
FYI I'm seeing a strange error that starts with libhdf5 messages of which the relevant portion seems to be
...
#010: ../../../src/H5FDint.c line 207 in H5FD_read(): addr overflow, addr = 140424345352640, size=512, eoa=1905739
major: Invalid arguments to routine
minor: Address overflowed
ERROR: LoadError: Error dereferencing object
Stacktrace:
...
It turns out this can be circumvented by replacing
BaseBenchmarks.loadall!()
with
BaseBenchmarks.load!("linalg")
BaseBenchmarks.loadall!()
Filing this without understanding what is actually going wrong, in case someone else encounters this.