
A benchmark suite for the OCaml compiler

License: The Unlicense



Sandmark

Sandmark is a suite of OCaml benchmarks and a collection of tools to configure different compiler variants, run the benchmarks, and visualise the results.

Sandmark includes both sequential and parallel benchmarks. The results from the nightly benchmark runs are available at sandmark.tarides.com.

πŸ“£ Attention Users 🫡

If you are interested only in running the Sandmark benchmarks on your compiler branch, please add your branch to the sandmark nightly config. Read on if you are interested in setting up your own instance of Sandmark for local runs.

FAQ

How do I run the benchmarks locally?

On Ubuntu 20.04.4 LTS or newer, you can run the following commands:

# Clone the repository
$ git clone https://github.com/ocaml-bench/sandmark.git && cd sandmark

# Install dependencies
$ make install-depends

# Install OPAM if not available already
$ sh <(curl -sL https://raw.githubusercontent.com/ocaml/opam/master/shell/install.sh)
$ opam init

## You can run all the serial or parallel benchmarks using the respective run_all_*.sh scripts
## You can edit the scripts to change the ocaml-version for which to run the benchmarks

$ bash run_all_serial.sh   # Run all serial benchmarks
$ bash run_all_parallel.sh   # Run all parallel benchmarks

You can now find the results in the _results/ folder.

How do I add new benchmarks?

See CONTRIBUTING.md

How do I visualize the benchmark results?

Local runs

  1. To visualize the local results, there are a handful of Jupyter notebooks available in notebooks/, which are maintained on a best-effort basis. See the README there for more information on how to use them.

  2. You can run sandmark-nightly locally and visualize your local results directory using the Sandmark nightly app.

Nightly production runs

Sandmark benchmarks are configured to run nightly on navajo and turing. The results for these benchmark runs are available at sandmark.tarides.com.

How are the machines tuned for the benchmarking?

You can find detailed notes on the OS settings for the benchmarking servers here

Overview

Sandmark uses opam, with a static local repository, to build external libraries and applications. It then builds any Sandmark OCaml benchmarks and any data dependencies. Following this, it runs the benchmarks as defined in run_config.json.

These stages are implemented in:

  • Opam setup: the Makefile handles the creation of an opam switch that builds a custom compiler, as specified in the ocaml-versions/<version>.json file. It then installs all the required packages; the package versions are defined in the dependencies/template/*.opam files. The dependencies can be patched or tweaked via the dependencies directory.

  • Runplan: the list of benchmarks to run, along with the measurement wrapper (e.g. orun or perf), is specified in run_config.json. This config file is used to generate the dune files which will run the benchmarks.

  • Build: dune is used to build all the sandmark OCaml benchmarks that are in the benchmarks directory.

  • Execute: dune is used to execute all the benchmarks specified in the runplan, using the benchmark wrapper defined in run_config.json and selected via the RUN_BENCH_TARGET variable passed to the Makefile.

Configuration of the compiler build

The compiler variant and its configuration options can be specified in a .json file in the ocaml-versions/ directory. It uses JSON syntax, as shown in the following example:

{
  "url" : "https://github.com/ocaml-multicore/ocaml-multicore/archive/parallel_minor_gc.tar.gz",
  "configure" : "-q",
  "runparams" : "v=0x400"
}

The various options are described below:

  • url is MANDATORY and provides the web URL to download the source for the ocaml-base-compiler.

  • configure is OPTIONAL, and you can use this setting to pass specific flags to the configure script.

  • runparams is OPTIONAL, and its value is passed to OCAMLRUNPARAM when building the compiler. Note that this variable is only used for the compiler build, not for running the benchmarks. An additional example variant file is sketched below.
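
As an illustration, a hypothetical variant file that builds OCaml trunk with flambda enabled might look like the following; the URL and the configure flag are only examples, not a recommended configuration:

{
  "url" : "https://github.com/ocaml/ocaml/archive/trunk.tar.gz",
  "configure" : "--enable-flambda"
}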

Execution

orun

The orun wrapper is packaged as a separate package here. It collects runtime and OCaml garbage collector statistics, producing output in JSON format.

You can use orun independently of the sandmark benchmarking suite, by installing it, e.g. using opam install orun.
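
Outside of Sandmark, the invocation follows the pattern orun -o <output> -- <program> <args>. For example (the program name below is just a placeholder):

$ orun -o result.json -- ./my_benchmark.exe arg1 arg2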

Using a directory different than /home

Special care is needed if you run Sandmark from a directory other than your home directory.

If you get an error like # bwrap: execvp dune: No such file or directory, it may be because opam's sandboxing prevents executables from being run from non-standard locations.

To get around this issue, you can specify OPAM_USER_PATH_RO=/directory/to/sandmark to whitelist this location from sandboxing.
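
For example, assuming the checkout lives under /mnt/work/sandmark (an arbitrary path chosen for illustration), a run could be launched as:

$ OPAM_USER_PATH_RO=/mnt/work/sandmark make ocaml-versions/5.0.0+stable.bench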

Benchmarks

You can execute both serial and parallel benchmarks using the run_all_serial.sh and run_all_parallel.sh scripts. Ensure that the respective .json configuration files have the appropriate settings.

If using RUN_BENCH_TARGET=run_orunchrt then the benchmarks will run using chrt -r 1.

IMPORTANT: chrt -r 1 is necessary when using taskset to run parallel programs. Otherwise, all the domains will be scheduled on the same core and you will see a slowdown as the number of domains increases.

You may need to give the user permission to execute chrt; one way to do this is:

sudo setcap cap_sys_nice=ep /usr/bin/chrt
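
To check that the capability has been applied, getcap from the libcap tools can be used (assuming chrt is installed at /usr/bin/chrt):

$ getcap /usr/bin/chrt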

Configuring the benchmark runs

A config file can be specified with the environment variable RUN_CONFIG_JSON, and the default value is run_config.json. This file lists the executable to run and the wrapper which will be used to collect data (e.g. orun or perf). You can edit this file to change benchmark parameters or wrappers.

The environment within which a wrapper runs allows the user to configure variables such as OCAMLRUNPARAM or LD_PRELOAD. For example this wrapper configuration:

{
  "name": "orun-2M",
  "environment": "OCAMLRUNPARAM='s=2M'",
  "command": "orun -o %{output} -- taskset --cpu-list 5 %{command}"
}

would allow

$ RUN_BENCH_TARGET=run_orun-2M make ocaml-versions/5.0.0+trunk.bench

to run the benchmarks on 5.0.0+trunk with a 2M minor heap, pinned to CPU 5 via taskset.

Tags

The benchmarks also have associated tags which classify the benchmarks. The current tags are:

  • macro_bench - A macro benchmark. Benchmarks with this tag are automatically run nightly.
  • run_in_ci - This benchmark is run in the CI.
  • lt_1s - running time is less than 1s on the turing machine.
  • 1s_10s - running time is between 1s and 10s on the turing machine.
  • 10s_100s - running time is between 10s and 100s on the turing machine.
  • gt_100s - running time is greater than 100s on the turing machine.

The benchmarking machine turing is an Intel Xeon Gold 5120 CPU with 64GB of RAM housed at IITM.

The run_config.json file may be filtered based on the tag. For example,

$ TAG='"macro_bench"' make run_config_filtered.json

filters the run_config.json file to only contain the benchmarks tagged as macro_bench.
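
The filtered configuration can then be used for a run in the usual way, for example:

$ RUN_CONFIG_JSON=run_config_filtered.json make ocaml-versions/5.0.0+stable.bench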

Running benchmarks

The build bench target determines the type of benchmark being built. It can be specified with the environment variable BUILD_BENCH_TARGET; the default value is buildbench, which runs the serial benchmarks. To execute the parallel benchmarks, use multibench_parallel. You can also set up a custom bench target and add only the benchmarks you care about.
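
As a rough sketch, a parallel run might be invoked as follows, combining the variant and config file names that appear elsewhere in this README:

$ BUILD_BENCH_TARGET=multibench_parallel RUN_CONFIG_JSON=multicore_parallel_run_config.json \
    make ocaml-versions/5.0.0+stable.bench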

Sandmark supports building and executing the serial benchmarks in byte mode. A separate run_config_byte.json file has been created for this purpose. These benchmarks are relatively slow compared to their native counterparts. You can use the following commands to run the serial benchmarks in byte mode:

$ opam install dune.2.9.0
$ USE_SYS_DUNE_HACK=1 SANDMARK_CUSTOM_NAME=5.0.0 BUILD_BENCH_TARGET=bytebench \
    RUN_CONFIG_JSON=run_config_byte.json make ocaml-versions/5.0.0+stable.bench

We can obtain both throughput and latency results for the benchmarks. To obtain latency results, set the environment variable RUN_BENCH_TARGET to run_pausetimes, which will run the benchmarks with olly and collect the GC tail latency profile of the runs (see the script pausetimes/pausetimes). The results appear as files in the _results directory with a .pausetimes.*.bench suffix.
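
For example, following the same RUN_BENCH_TARGET pattern as above, a pausetimes run over the default configuration could be requested as:

$ RUN_BENCH_TARGET=run_pausetimes make ocaml-versions/5.0.0+stable.bench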

The perf stat output can be obtained by setting the environment variable RUN_BENCH_TARGET to run_perfstat. In order to use the perf command, the kernel.perf_event_paranoid parameter should be set to -1 using the sysctl command. For example:

$ sudo sysctl -w kernel.perf_event_paranoid=-1

You can also set it permanently in the /etc/sysctl.conf file.
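
With the paranoid level lowered, a perf stat run can then be requested like any other wrapper target, for example:

$ RUN_BENCH_TARGET=run_perfstat make ocaml-versions/5.0.0+stable.bench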

Results

After a run is complete, the results will be available in the _results directory.

Jupyter notebooks are available in the notebooks directory to parse and visualise the results, for both serial and parallel benchmarks. To run the Jupyter notebooks on your results, copy your results to the notebooks/sequential folder for sequential benchmarks and the notebooks/parallel folder for parallel benchmarks. It is sufficient to copy only the consolidated bench files, which are present as _results/<comp-version>/<comp-version>.bench. You can run the notebooks with

$ jupyter notebook
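
As a concrete sketch of the copy step described above, for a sequential run (using the placeholder path from this README):

$ cp _results/<comp-version>/<comp-version>.bench notebooks/sequential/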

Logs

The logs for nightly runs are available here. Runs which are considered successful are copied to the main branch of the repo, so that they can be visualized using the sandmark nightly UI.

Config files

The *_config.json files used to build and run the benchmarks are:

  • run_config.json : Runs sequential benchmarks with stock OCaml variants in CI and sandmark-nightly on the IITM machine (turing)
  • multicore_parallel_run_config.json : Runs parallel benchmarks with multicore OCaml in CI and sandmark-nightly on the IITM machine (turing)
  • multicore_parallel_navajo_run_config.json : Runs parallel benchmarks with multicore OCaml in sandmark-nightly on the Navajo (AMD EPYC 7551 32-Core Processor) machine
  • micro_multicore.json : To locally run multicore-specific micro benchmarks

Benchmarks status

The following table marks the benchmarks that are currently not working with one of the variants used in the CI. These benchmarks are known to fail and have an issue tracking their progress.

Variant                 Benchmarks          Issue tracker
5.0.0+trunk.bench       irmin benchmarks    sandmark#262
4.14.0+domains.bench    irmin benchmarks    sandmark#262

Multicore Notes

ctypes

ctypes 0.14.0 doesn't support multicore yet. A workaround is to update dependencies/packages/ctypes/ctypes.0.14.0/opam to use https://github.com/yallop/ocaml-ctypes/archive/14d0e913e82f8de2ecf739970561066b2dce70b7.tar.gz as the source url.

OS X

This is only needed for multicore versions before this commit

The ocaml-update-c command in multicore needs to run with GNU sed. sed will default to a BSD sed on OS X. One way to make things work on OS X is to install GNU sed with homebrew and then update the PATH you run sandmark with to pick up the GNU version.

Makefile Variables

Each variable is listed below with its description, default value, and the stage at which it is used.

  • BENCH_COMMAND — TAG selection and make command to run benchmarks. Default: 4.14.0+domains for CI. Used with current-bench.
  • BUILD_BENCH_TARGET — Target selection for sequential (buildbench) and parallel (multibench) benchmarks. Default: buildbench. Used when building benchmarks.
  • BUILD_ONLY — If set to 0, execute the benchmarks; otherwise, skip benchmark execution and exit after the Sandmark build process. Default: 0. Used when building benchmarks.
  • CONTINUE_ON_OPAM_INSTALL_ERROR — Allow benchmarks to continue even if an opam package install errors out. Default: true. Used when executing benchmarks.
  • DEPENDENCIES — List of Ubuntu dependencies. Default: libgmp-dev libdw-dev jq python3-pip pkg-config m4. Used when building the compiler and its dependencies.
  • ENVIRONMENT — Function that gets the environment parameter from the wrappers in *_config.json. Default: null string. Used when building the compiler and its dependencies.
  • ITER — Number of iterations for which the Sandmark benchmarks are executed. Default: 1. Used when executing benchmarks.
  • OCAML_CONFIG_OPTION — Function that gets the configure parameters from ocaml-versions/*.json. Default: null string. Used when building the compiler and its dependencies.
  • OCAML_RUN_PARAM — Function that gets the run_param runtime parameters from ocaml-versions/*.json. Default: null string. Used when building the compiler and its dependencies.
  • PACKAGES — List of all the benchmark dependencies in Sandmark. Default: cpdf conf-pkg-config conf-zlib bigstringaf decompress camlzip menhirLib menhir minilight base stdio dune-private-libs dune-configurator camlimages yojson lwt zarith integers uuidm react ocplib-endian nbcodec checkseum sexplib0 eventlog-tools irmin cubicle conf-findutils index logs mtime ppx_deriving ppx_deriving_yojson ppx_irmin repr ppx_repr irmin-layers irmin-pack. Used when building benchmarks.
  • PRE_BENCH_EXEC — Any specific commands that need to be executed before the benchmark, e.g. PRE_BENCH_EXEC='taskset --cpu-list 3 setarch uname -m --addr-no-randomize'. Default: null string. Used when executing benchmarks.
  • RUN_BENCH_TARGET — The executable to be used to run the benchmarks. Default: run_orun. Used when executing benchmarks.
  • RUN_CONFIG_JSON — Input file that contains the list of benchmarks. Default: run_config.json. Used when executing benchmarks.
  • SANDMARK_DUNE_VERSION — Default dune version to be used. Default: 2.9.0. Used when building the compiler and its dependencies.
  • SANDMARK_OVERRIDE_PACKAGES — A list of dependency packages with versions that can be overridden (optional). Default: "". Used when building the compiler and its dependencies.
  • SANDMARK_REMOVE_PACKAGES — A list of dependency packages to be dynamically removed (optional). Default: "". Used when building the compiler and its dependencies.
  • SANDMARK_URL — OCaml compiler source code URL used to build the benchmarks. Default: "". Used when building the compiler and its dependencies.
  • SYS_DUNE_BASE_DIR — Function that returns the path of the system-installed dune for use with benchmarking. Default: the dune package present in the local opam switch. Used when building the compiler and its dependencies.
  • USE_SYS_DUNE_HACK — If set to 1, use the system-installed dune. Default: 0. Used when building the compiler and its dependencies.
  • WRAPPER — Function that extracts the wrapper name from run_<wrapper-name>. Default: run_orun. Used when executing benchmarks.



sandmark's Issues

Running sandmark on OS X is broken

Looks like we've inadvertently broken the ability to run sandmark on OS X.
It looks like there's a couple of issues here:

  • some of the opam pinned packages are failing to build for OS X 10.15.x, for example Lwt
  • the benchmarks themselves just fail to run and sadly it doesn't give any errors to help:
make: *** [ocaml-versions/4.10.0.bench] Error 1

It feels like the ability to run even just the pure OCaml benchmarks (e.g. benchmarks/multicore-numerical/game_of_life.ml) with no package dependencies would be useful.

To fix OS X support there are probably two things to do here:

  • fix the underlying issue(s) so that benchmarks run on OS X
  • add OS X to the CI

[RFC] Classifying benchmarks based on running time

Currently, we have classified benchmarks using two tags -- macro_bench and runs_in_ci. This is a useful classification, but we've had cases of mislabelling where benchmarks that run for a few milliseconds were classified as macro. Moreover, the original idea of runs_in_ci was to run those benchmarks that are reasonably fast, so that we don't exhaust the 1hr CI limit, but it is unclear whether this is consistently followed. There are a few benchmarks that now run for more than 100 seconds, but don't bring in much value themselves. They tend to be too long to be useful for an initial benchmarking of a new feature.

[image omitted]

To address this I propose the following scheme. We will get rid of macro_bench and runs_in_ci and instead use the following classification:

  • lt_1s - benchmarks that run for less than 1 second
  • 1s_10s - benchmarks that run for at least 1 second but less than 10 seconds
  • 10s_100s - benchmarks that run for at least 10 seconds but less than 100 seconds
  • gt_100s - benchmarks that run for at least 100 seconds

We classify the benchmarks based on their running time on the turing machine: Intel(R) Xeon(R) Gold 5120 CPU @ 2.20GHz.

For the initial performance benchmarking, all the benchmarks in the 1s to 100s range should be considered. This will be the replacement for macro_bench.

For the CI, we will run benchmarks that are in the 1s to 10s range. This will be the replacement for runs_in_ci tag.

Any parallel benchmark should have a serial baseline version that runs for at least 10s. Otherwise, the parallelization overheads outweigh the benefit of parallelism.

We should have a hard look at any benchmarks that run for less than 1s and see whether they're giving us any useful signals.

Include major cycle information in Jupyter notebook graph

The major cycle count and words provide useful information on which benchmarks are GC heavy. The baseline information, along with the labels, can be included in the graph, as it is more useful than what is available in the normalized graph.

CI should run parallel benchmarks

Currently we only run the sequential benchmarks in the CI. We should run the parallel benchmarks as well. It would be useful just to run each benchmark with 2 domains, removing the taskset and chrt commands. This can be done with jq.

Measurements of code size

For compiler benchmarking it's important to have measurements of code size. The size of the text section of the executable should be fine.
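
On Linux, one quick way to obtain this number is the binutils size tool run on a built benchmark executable; the path below is only illustrative, following the _build layout used by Sandmark:

$ size _build/5.0.0+stable_1/benchmarks/multicore-numerical/game_of_life.exe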

Add the ability to average out the results from several runs

Currently, the Makefile takes the number of iterations to run the benchmarks as an environment variable ITER (default is 1). Each iteration produces a separate folder under _results.
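
For example, a five-iteration run could be requested as follows (using one of the variant targets from this README as an illustration):

$ ITER=5 make ocaml-versions/5.0.0+stable.bench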

While we have removed much of the noise from the executions and our results are generally reproducible, it would be useful to add the ability to average out the stats from several runs. For example, ASLR is turned on by default on Linux. However, for benchmarking runs, we turn off ASLR as it introduces noise and affects reproducibility. The right thing is to have ASLR on and then get the average of several runs.

That said, average might not make sense for all of the topics; max pause time, max resident set size, for example. Now that the Jupyter notebooks are part of the Sandmark repo, we should add the ability in the notebooks to process multiple iterations of the same compiler variant and compute averages (and also median and SD, when it makes sense).

BUILD_ONLY needs to exit with error if package installation fails

At present, the BUILD_ONLY environment variable stops the build before executing the benchmarks. We need a way to exit with an error status if the installation of any of the dependency packages fails. We currently use --best-effort to ignore any build errors and proceed.

In particular, frama-c (#17) needs to be built successfully for this change to be useful with the CI.

View results for a set of benchmarks in the nightly notebooks

(This is in the context of benchmark nightly runs on a remote machine)

It will be nice to have a way to select a set of benchmarks and view only results of those benchmarks. It will be useful to select them based on:

  • Individual benchmarks
  • Bench tags

Add an odoc based benchmark

It would be useful to have more memory-intensive benchmarks. It looks like odoc has dependencies that shouldn't be too complex to add, considering what is already present. It will also build on the multicore 4.12+domains variant.

I'm sure that the odoc team can provide some pointers to a couple of workloads that are fairly memory intensive and representative of the sort of thing they see.

[RFC] How should a user configure a sandmark run?

If we look at things as a user wanting to run a benchmarking experiment, then they want to compose several components to get data. The components they want to compose are:

  • A compiler built with their favourite configure switches
  • A set of benchmarks they want to run (expressed preferably as the end binary rather than needing to know the dependencies)
  • A program that collects stats from the binary (e.g. perf, orun, binary size)
  • An environment configuration for the execution of the stats collection (e.g. taskset, ASLR, OCAMLRUNPARAM)

The user configures (and could do so implicitly from defaults) these components into a run plan of benchmark executions to get data.

The mechanisms for configuring the above have evolved as we've tried to do more with sandmark and are (currently) spread out:

  • the compiler build is specified in the .comp file
  • the benchmarks and their parameters are in run_config.json but you have to get the BUILD_BENCH_TARGET correct (e.g. multicore vs serial benchmarks) for all the opam packages because the benchmark programs don’t bring their dependent opam packages with them.
  • the stat collection wrappers are in run_config.json
  • the environment configuration is a bit all over the place. Some of it is going on in the wrappers in run_config.json, we have some stuff happening (e.g. crontab scripts) with PRE_BENCH_EXEC and we have PARAMWRAPPER handling multicore tasksetting

With this issue I'm hoping to find out if this tuple of:

(
compiler build, 
set of benchmarks (incl dependencies),
executable to collect stats from a benchmark command, 
environment within which to run the executable
)

covers all the use cases we need?

Once we've got the use cases nailed down, we can try to have proposals or prototypes that attempt to make the configuration easier to handle than the current mix of environment variables, comp files and json files. This should make it easier for users to run the experiments they want.

sandmark dune version not compiling with 4.09

sandmark is unable to run the 4.09 branch after this commit:
ocaml/ocaml@a7e7e8e

sandmark is using a custom dune to make it work with multicore:
https://github.com/ocaml-bench/sandmark/blob/master/dependencies/packages/dune/dune.1.7.1/opam

Unfortunately sandmark is unable to compile dune after the 4.09 commit above. We get the following error:

#=== ERROR while compiling dune.1.7.1 =========================================#
# context              2.0.4 | linux/x86_64 | ocaml-base-compiler.4.09.0 | file:///local/scratch/ctk21/daily/20190614_0005/4.09/a7e7e8e7454406d5b714bff7971d365b34b653a4/sandmark/dependencies
# path                 /local/scratch/ctk21/daily/20190614_0005/4.09/a7e7e8e7454406d5b714bff7971d365b34b653a4/sandmark/_opam/4.09.0/.opam-switch/build/dune.1.7.1
# command              /local/scratch/ctk21/daily/20190614_0005/4.09/a7e7e8e7454406d5b714bff7971d365b34b653a4/sandmark/_opam/opam-init/hooks/sandbox.sh build ./boot.exe --release -j 5
# exit-code            1
# env-file             /local/scratch/ctk21/daily/20190614_0005/4.09/a7e7e8e7454406d5b714bff7971d365b34b653a4/sandmark/_opam/log/dune-35511-c4933c.env
# output-file          /local/scratch/ctk21/daily/20190614_0005/4.09/a7e7e8e7454406d5b714bff7971d365b34b653a4/sandmark/_opam/log/dune-35511-c4933c.out
### output ###
# Error: This expression has type
# [...]
#          (string, string) Dune.Import.result =
#            (string, string) Dune_caml.result
#        Type string list is not compatible with type string
# -> required by src/.dune.objs/native/dune__Watermarks.cmx
# -> required by alias src/lib-dune.cmx-all
# -> required by alias src/lib-dune.cmi-and-.cmx-all
# -> required by bin/.main.objs/native/main.cmx
# -> required by bin/main.a
# -> required by bin/main_dune.exe
# -> required by _boot/install/default/bin/dune
# -> required by dune.install

Filtering benchmarks based on tags

We have multiple tags to classify benchmarks in the config files, like macro_bench, run_in_ci and tags based on running time. The Makefile currently supports filtering macro and CI benchmarks with custom rules.

It would be useful to generalize this further by taking the following items as inputs:

  • tag
  • source file
  • destination file

All benchmarks with the tag in the source file need to be copied to the destination file.

The following jq filter could be used to do the filtering:

jq '{wrappers : .wrappers, benchmarks: [.benchmarks | .[] | select(.tags | index(<tag>) != null)]}'

Reimplement broken pausetimes support

In Multicore OCaml 4.12.0, the catapult format (used by chrome://tracing/) based event tracing is deprecated. Multicore 4.12.0 will soon produce CTF based traces ocaml-multicore/ocaml-multicore#527. We need to update the scripts to adapt to the new tracing framework. This involves several tasks:

  1. Modify the pausetimes scripts for multicore and stock to emit the CTF based traces
  2. Rewrite the tail latency computation scripts to read the CTF format directly. Catapult traces produced by Multicore OCaml quickly go to Gigabytes in size and it wasn't practical to analyse the pausetime for long-running programs due to the large file sizes.
  3. Possibly, the new tail latency scripts should be implemented in OCaml rather than python. We observed that python happens to be really slow in processing medium-sized eventlog traces; multiple minutes for processing the eventlog of a single program run. If it turns out to be slow to process the CTF traces, we should consider rewriting this in OCaml for speed.

graph500seq: Fatal error: exception Sys_error("kronecker.txt: No such file or directory")

The graph500seq benchmarks are failing in Sandmark (commit aa122e6, January 24, 2021) for macro_bench tag with the following error messages:

Executing benchmarks with:
  RUN_CONFIG_JSON=run_config_filtered.json
  RUN_BENCH_TARGET=run_orun  (WRAPPER=orun)
  PRE_BENCH_EXEC=
        orun kernel2.12_10.orun.bench [4.10.0+multicore_1] (exit 2)
(cd _build/4.10.0+multicore_1/benchmarks/graph500seq && /home/shakthi/testing/sandmark/_opam/4.10.0+multicore/bin/orun -o ../../kernel2.12_10.orun.bench -- taskset --cpu-list 5 ./kernel2.exe 12 10)
Fatal error: exception Sys_error("kronecker.txt: No such file or directory")

/tmp/orun875078stderr
        orun kernel3.12_10_2.orun.bench [4.10.0+multicore_1] (exit 2)
(cd _build/4.10.0+multicore_1/benchmarks/graph500seq && /home/shakthi/testing/sandmark/_opam/4.10.0+multicore/bin/orun -o ../../kernel3.12_10_2.orun.bench -- taskset --cpu-list 5 ./kernel3.exe 12 10 2)
Fatal error: exception Sys_error("kronecker.txt: No such file or directory")

You can reproduce the error using:

$ TAG='"macro_bench"' make run_config_filtered.json
$ RUN_CONFIG_JSON=run_config_filtered.json make ocaml-versions/4.10.0+multicore.bench

In benchmarks/graph500seq/dune, there is:

(alias (name buildbench) (deps kronecker.exe kernel1.exe kernel2.exe kernel3.exe))

Note:

  1. The deps are not built in sequential order, and kronecker.exe produces kronecker.txt, which is a pre-requisite for kernel2 and kernel3.
  2. Also, kernel2 and kernel3 use a linkKernel1 function from kernel1.

[RFC] Categorize and group by benchmarks

At present, the benchmarks in Sandmark are available in the benchmarks folder:

$ ls benchmarks/
almabench       chameneos  decompress   kb           multicore-effects     multicore-minilight   numerical-analysis  stdlib      zarith
alt-ergo        coq        frama-c      lexifi-g2pp  multicore-gcroots     multicore-numerical   sauvola             thread-lwt
bdd             cpdf       graph500seq  menhir       multicore-grammatrix  multicore-structures  sequence            valet
benchmarksgame  cubicle    irmin        minilight    multicore-lockfree    nbcodec               simple-tests        yojson

What would be a good set of categories to group them together, perhaps something like the following?

library formal numerical graph multicore ...

Noise in Sandmark

Following the discussion in ocaml/ocaml#9934, I set out to quantify the noise in Sandmark macrobenchmark runs. Before asking complex questions about loop alignments and microarchitectural optimisations as was done in ocaml/ocaml#10039, I wanted to measure the noise between multiple runs of the same code. It is important to note that currently, we only run a single iteration of each variant.

The benchmarking was done on IITM "turing", which is an Intel Xeon Gold 5120 CPU machine with isolated cores, the cpu governor set to performance, hyper-threading disabled, turbo boost disabled, interrupts and rcu_callbacks directed to non-isolated cores, but ASLR on [1]. The results of two runs of the latest commit from https://github.com/stedolan/ocaml/tree/sweep-optimisation are here:

[results graph omitted]

The outlier is worrisome, but there is up to 2% difference in both directions. Moving forward, we should consider the following:

  1. Arrive at a measure for statistical significance on a given machine. What would be the minimum difference beyond which the result is statistically significant. This will vary based on the benchmark and the topic (running time, maxRSS).
  2. Run multiple iterations. Sandmark already has an ITER variable which runs the experiments for multiple runs. The notebooks need to be updated so that mean (and standard deviation) are computed first and the graphs are updated to include error bars. The downside is that the benchmarking will take significantly longer. We should choose a representative set of macro benchmarks for quick study and reserve the full macro benchmark run for the final result. Can we run the sequential macro benchmarks in parallel on different isolated cores? What would be the impact of this on individual benchmark runs?

[1] https://github.com/ocaml-bench/ocaml_bench_scripts#notes-on-hardware-and-os-settings-for-linux-benchmarking

[RFC] Header entry attributes for the summary benchmark result file

At present, the ocaml_bench_scripts creates the benchmark results files in a hierarchical directory structure that has information regarding hostname, GitHub commit, branch, timestamp etc. One option is to flatten this data, and also add additional meta-information as a header entry to each consolidated .bench result file. The advantages are:

  1. The list of bench files can be used locally with a JupyterHub notebook, without having to create the necessary directory structure.
  2. Each .bench file is self-contained, and can be easily stored and accessed in a file system archive. This allows an Extract, Transform, Load (ETL) tool to push data to a database or for visualization for further analysis.
    A useful list of key-value attributes that can be included in the header entry of the JSON .bench file are:
  • Version
  • Timestamp
  • Hostname
  • Operating System
  • Kernel version
  • Architecture
  • GitHub repository
  • GitHub branch
  • GitHub commit
  • Compiler variant
  • Compiler configure options
  • Compiler runtime options

Ability to collect perf stats on a benchmark run

On Linux we would like to be able to collect 'perf stat' on the benchmark process.

We would like to collect the following counters:

  • basic:
    task-clock, instructions, cpu-cycles, stalled-cycles-frontend, branches, branch-misses
  • front-end to back-end pipeline:
    idq_uops_not_delivered.core, lsd.cycles_active
  • memory:
    L1-icache-load-misses, L1-dcache-load-misses, iTLB-load-misses, dTLB-load-misses
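
For reference, collecting this set by hand with perf stat would look roughly like the sketch below; ./benchmark.exe is a placeholder and the microarchitectural event names can differ between CPU generations:

$ perf stat -e task-clock,instructions,cpu-cycles,stalled-cycles-frontend,branches,branch-misses \
    -e idq_uops_not_delivered.core,lsd.cycles_active \
    -e L1-icache-load-misses,L1-dcache-load-misses,iTLB-load-misses,dTLB-load-misses \
    -- ./benchmark.exe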

How to run single-threaded benchmarks alone?

make ocaml-versions/4.06.0.bench fails to compile kcas package, unsurprisingly. How do I run just the single threaded benchmarks?

The following actions will be performed:
  βˆ— install kcas 0.1.4

<><> Gathering sources ><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
[kcas.0.1.4] found in cache

<><> Processing actions <><><><><><><><><><><><><><><><><><><><><><><><><><><><>
[ERROR] The compilation of kcas failed at "/home/kc/sandmark/_opam/opam-init/hooks/sandbox.sh build dune build -p kcas".

#=== ERROR while compiling kcas.0.1.4 =========================================#
# context     2.0.0 | linux/x86_64 | ocaml-base-compiler.4.06.0 | file:///home/kc/sandmark/dependencies
# path        ~/sandmark/_opam/4.06.0/.opam-switch/build/kcas.0.1.4
# command     ~/sandmark/_opam/opam-init/hooks/sandbox.sh build dune build -p kcas
# exit-code   1
# env-file    ~/sandmark/_opam/log/kcas-8684-1872ec.env
# output-file ~/sandmark/_opam/log/kcas-8684-1872ec.out
### output ###
#       ocamlc src/.kcas.objs/byte/kcas.{cmo,cmt} (exit 2)
# (cd _build/default && /home/kc/sandmark/_opam/4.06.0/bin/ocamlc.opt -w -40 -g -bin-annot -I src/.kcas.objs/byte -intf-suffix .ml -no-alias-deps -open Kcas__ -o src/.kcas.objs/byte/kcas.cmo -c -impl src/kcas.ml)
# File "src/kcas.ml", line 37, characters 2-28:
# Error: Unbound value Obj.compare_and_swap_field
#     ocamlopt src/.kcas.objs/native/kcas.{cmx,o} (exit 2)
# (cd _build/default && /home/kc/sandmark/_opam/4.06.0/bin/ocamlopt.opt -w -40 -g -I src/.kcas.objs/byte -I src/.kcas.objs/native -intf-suffix .ml -no-alias-deps -open Kcas__ -o src/.kcas.objs/native/kcas.cmx -c -impl src/kcas.ml)
# File "src/kcas.ml", line 37, characters 2-28:
# Error: Unbound value Obj.compare_and_swap_field



<><> Error report <><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
β”Œβ”€ The following actions failed
β”‚ Ξ» build kcas 0.1.4
└─ 
╢─ No changes have been performed
opam exec --switch 4.06.0 -- dune build -j 1 --profile=release --workspace=ocaml-versions/.workspace.4.06.0 @bench; \
  ex=$?; find _build/4.06.0_* -name '*.bench' | xargs cat > ocaml-versions/4.06.0.bench; exit $ex

Add Coq benchmarks

Coq Installation on Multicore OCaml

Coq compiles with the multicore compiler now. You'll need this branch of coq. You'll also need a dune > 2.4.0. The easiest way to get an installation going is to install dune.2.4.0 on 4.10.0 and copy the dune binary into the bin folder of the multicore switch. Once that is done, coq can be built with make -f Makefile.dune world.
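
A rough sequence for the workaround described above might be the following; the switch names and the coq checkout location are assumptions:

$ opam switch 4.10.0
$ opam install dune.2.4.0
$ cp "$(opam var bin)/dune" "$(opam var --switch=4.10.0+multicore bin)/"
$ opam switch 4.10.0+multicore
$ cd coq && make -f Makefile.dune world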

Benchmarks

Once coq is built, make ci-all downloads and builds a whole series of libraries. It will be useful to take some of these library builds and make them into benchmarks in Sandmark.

Make benchmark wrapper user configurable

From Tom Kelly on slack: "one thing that would be awesome in sandmark is the ability to configure the benchmark wrapper that collects the stats. Right now we have orun -o <output> -- <program-to-run> <program-arguments> which is static in the dune file. It would be nice if we could have the user configure in a central place <command> -o <output> -- <program-to-run> <program-arguments>. This can be powerful as you can then get off the shelf wrappers in there like ocperf.py and strace. It should also allow the user to define the arguments they want to pass to perf. For example they could record all the benchmarks for a given target."

Include raw baseline observation in the normalized graphs

The normalized graphs in sandmark currently do not include the raw baseline observations. For example,

[example graph omitted]

The graph shows the normalized running time comparing two compiler variants. It does not convey how much time the baseline actually took. We've had a few examples where a 20% speedup or slowdown can be explained by the fact that the benchmark only runs for a few milliseconds, and the difference we see is due to expected execution time variance and noise.

It would be useful to have the baseline time included in the normalized graphs as additional information with the benchmark name. For example,

[example graph omitted]

The numbers in parentheses indicate the time in seconds for the baseline runs.

It would be useful to include the raw baseline observation in every normalised graph.

Deprecate the use of 4.06.1 and 4.10.0 in Sandmark

We no longer support the multicore versions 4.06.1 and 4.10.0. The task is to ensure that all references to either of these versions are removed from the documentation and the scripts, and to make sure that 4.12.0 works well with the new documentation.

Run parallel benchmarks in navajo.ocamllabs.io

Now that we know how to run parallel benchmarks, we should run them on our CB server.

p.s: Not sure if this is the right place to add this issue. We don't have an issue tracker for CB.

Making Output location for run statistics 'required'

The command line does not specify that the output location is mandatory; if it is omitted, orun fails with the following error:
orun: internal error, uncaught exception: Sys_error(": No such file or directory")

There are two ways to fix this:

  1. Either make this argument here positional (since Cmdliner doesn't provide optional required arguments)
  2. Or change the default string here from "" to something else.

Which idea seems better?

Enrich the functionality of .comp files

Currently, the compiler variants in .comp files only take the URL. We've seen many requests for enriching the functionality of .comp files so that the builds and runs may be customized. Some of these are:

  1. OCAMLRUNPARAM parameters for the runs. Consider the case of running the same compiler variant with different OCAMLRUNPARAM parameters. Currently, there is no way to configure this as part of the variant and must be externally applied during runs.
  2. Compiler configuration flags such as enabling flambda.
  3. Different wrappers (orun, pauseimes, perf). These are now hardcoded in the .json files [1]. It will be better if these were described in the .comp files.

I would consider 1 and 2 to be high priority right now. Once that is done, we can do 3. I recommend using s-expressions as the new data format for .comp files, and parsing them with the help of sexplib [2]. It might also be useful to rename .comp to .var to signify that these are different variants, not compilers (the same compiler might have different variants, c.f. item 1 above).

[1] https://github.com/ocaml-bench/sandmark/blob/master/multicore_parallel_run_config.json#L2-L15
[2] https://github.com/janestreet/sexplib

@shakthimaan

Improve microbenchmarks

A few of the microbenchmarks need to be improved. In particular, between multicore and trunk, the finalise microbenchmark measures the performance difference in mark stack overflow handling. lazy_primes measures the efficiency of major heap allocator.

simple-tests/capi - Fatal error: "unexpected test name"

The capi tests are failing in Sandmark master branch (commit aa122e6, January 24, 2021) for the lt_1s, 1s_10s tags with the following error messages:

...
Executing benchmarks with:
  RUN_CONFIG_JSON=run_config_filtered.json
  RUN_BENCH_TARGET=run_orun  (WRAPPER=orun)
  PRE_BENCH_EXEC=
        orun capi.test_few_args_noalloc_200_000_000.orun.bench [4.10.0+multicore_1] (exit 2)
(cd _build/4.10.0+multicore_1/benchmarks/simple-tests && /home/shakthi/testing/sandmark/_opam/4.10.0+multicore/bin/orun -o ../../capi.test_few_args_noalloc_200_000_000.orun.bench -- taskset --cpu-list 5 ./capi.exe test_few_args_noalloc_200_000_000)
Fatal error: exception Failure("unexpected test name")

/tmp/orun898647stderr
        orun capi.test_many_args_noalloc_200_000_000.orun.bench [4.10.0+multicore_1] (exit 2)
(cd _build/4.10.0+multicore_1/benchmarks/simple-tests && /home/shakthi/testing/sandmark/_opam/4.10.0+multicore/bin/orun -o ../../capi.test_many_args_noalloc_200_000_000.orun.bench -- taskset --cpu-list 5 ./capi.exe test_many_args_noalloc_200_000_000)
Fatal error: exception Failure("unexpected test name")

/tmp/orunf8dd14stderr
        orun capi.test_no_args_noalloc_200_000_000.orun.bench [4.10.0+multicore_1] (exit 2)
(cd _build/4.10.0+multicore_1/benchmarks/simple-tests && /home/shakthi/testing/sandmark/_opam/4.10.0+multicore/bin/orun -o ../../capi.test_no_args_noalloc_200_000_000.orun.bench -- taskset --cpu-list 5 ./capi.exe test_no_args_noalloc_200_000_000)
Fatal error: exception Failure("unexpected test name")

You can reproduce the issue using:

$ TAG='"lt_1s"' make run_config_filtered.json
$ RUN_CONFIG_JSON=run_config_filtered.json make ocaml-versions/4.10.0+multicore.bench

js_of_ocaml fails to run on multicore

See issue #17 on how to get frama-c building on multicore.

js_of_ocaml was failing with Error: Js_of_ocaml_compiler__Instr.Bad_instruction(983041)

Might this be due to additional bytecode instructions added in multicore that are not supported by js_of_ocaml?

Running benchmarks with varying OCAMLRUNPARAM

Right now there isn't an easy way to run experiments where OCAMLRUNPARAM changes but the compiler build stays fixed.

Suppose we wanted to look at the impact of the minor heap size across our benchmarks with 4.10.0+stock.

The run plan we want is:
4.10.0+stock, OCAMLRUNPARAM=s=1M, using orun
4.10.0+stock, OCAMLRUNPARAM=s=2M, using orun
4.10.0+stock, OCAMLRUNPARAM=s=4M, using orun
4.10.0+stock, OCAMLRUNPARAM=s=8M, using orun
4.10.0+stock, OCAMLRUNPARAM=s=16M, using orun
4.10.0+stock, OCAMLRUNPARAM=s=32M, using orun

One way to do this might be to have a way to specify OCAMLRUNPARAM within a wrapper in the run_config.json; this has the advantage that we get the wrapper naming and reuse of the build for free. There may be other better ways to do it.
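
Following the orun-2M wrapper example earlier in this README, the sweep could be expressed as a set of wrapper entries in run_config.json, one per minor heap size; a sketch of the first two entries:

{
  "name": "orun-1M",
  "environment": "OCAMLRUNPARAM='s=1M'",
  "command": "orun -o %{output} -- taskset --cpu-list 5 %{command}"
},
{
  "name": "orun-2M",
  "environment": "OCAMLRUNPARAM='s=2M'",
  "command": "orun -o %{output} -- taskset --cpu-list 5 %{command}"
}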

(NB: the runparams in the compiler variant here is only applied to the opam build, not the run of the benchmarks)

`OCAMLRUNPARAM` value should be included in the .bench file

Since OCAMLRUNPARAM values affect the execution of the benchmarks, we should include them in the .bench file for each run. That is, the .bench file should include a new field called OCAMLRUNPARAM whose value is the value of the OCAMLRUNPARAM environment variable.

This will require changing the orun tool to include the additional field.
