nordlow / compiler-benchmark

Benchmarks compilation speeds of different combinations of languages and compilers.

License: MIT License

compiler-benchmark

Benchmarks compilation speeds of different combinations of languages and compilers. Supported languages are:

Languages with Native Compilers:

Languages with Bytecode Compilers:

  • OCaml (using ocamlopt),
  • C# (using mcs), and
  • Java (using javac).

A subset of these can be installed on Ubuntu (tested on 20.04) via the script ./install-compilers-on-ubuntu-20.04.sh in this repo.

Install Python 3 packages

./install-python-packages.sh

How it works

A benchmark is typically performed as

./benchmark \
    --function-count=$FUNCTION_COUNT \
    --function-depth=$FUNCTION_DEPTH \
    --run-count=5

for suitable values of $FUNCTION_COUNT and $FUNCTION_DEPTH, or simply

./benchmark

for defaulted values of all the parameters.

A subset of languages, each optionally combined with a set of compilers to benchmark, can be chosen as, for instance,

./benchmark --languages=C:tcc,C:gcc,C++,D:dmd,D:ldmd2,D:gdc,Rust

This will generate code into the directory generated and then, for each combination of language, operation type and compiler, run the supported benchmarks. At the end, a Markdown-formatted table showing the results of the benchmark is printed to standard output. Note that the compilation times in this table are titled Time [us/fn], meaning microseconds normalized by the number of test functions generated, that is, divided by args.function_count * args.function_depth.

GCC and Clang don't perform all semantic checks for C++ (because it's too costly). This is in contrast to D's and Rust's compilers, which perform all of them.
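
The normalization described above can be sketched as follows. This is a minimal illustration, not the benchmark's actual code: the function name `time_per_function_us` and the sample duration are hypothetical, and only the divisor (function count times function depth) is taken from the text.

```python
def time_per_function_us(elapsed_seconds: float,
                         function_count: int,
                         function_depth: int) -> float:
    """Convert a wall-clock duration into microseconds per generated function."""
    total_functions = function_count * function_depth
    if total_functions <= 0:
        raise ValueError("function_count and function_depth must be positive")
    return elapsed_seconds * 1e6 / total_functions


if __name__ == "__main__":
    # Hypothetical measurement: 0.056 s for 200 * 200 generated functions.
    print(f"{time_per_function_us(0.056, 200, 200):.1f} us/fn")  # prints 1.4 us/fn
```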

Sample generated code

To understand how the code generation works we can, for instance, do

./benchmark --function-count=3 --function-depth=2 --run-count=5

This will, for the C language case, generate a file generated/c/main.c containing

long add_long_n0_h0(long x) { return x + 15440; }
long add_long_n0(long x) { return x + add_long_n0_h0(x) + 95485; }

long add_long_n1_h0(long x) { return x + 37523; }
long add_long_n1(long x) { return x + add_long_n1_h0(x) + 92492; }

long add_long_n2_h0(long x) { return x + 39239; }
long add_long_n2(long x) { return x + add_long_n2_h0(x) + 12248; }


int main(__attribute__((unused)) int argc, __attribute__((unused)) char* argv[]) {
    long long_sum = 0;
    long_sum += add_long_n0(0);
    long_sum += add_long_n1(1);
    long_sum += add_long_n2(2);
    return long_sum;
}
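
A generator producing source of this shape could look roughly like the sketch below. It is written under assumptions: the naming scheme for depths greater than 2 (helpers `_h0` up to `_h(depth-2)`), the constant range, and the simplified `main(void)` signature are guesses based on the sample, not taken from the benchmark script.

```python
import random


def generate_c_source(function_count: int, function_depth: int,
                      rng: random.Random) -> str:
    """Return C source shaped like the sample above (naming is an assumption)."""
    lines = []
    for n in range(function_count):
        # Helpers h0 .. h(depth-2); h0 is the leaf of call chain n.
        for h in range(function_depth - 1):
            const = rng.randint(0, 99999)
            if h == 0:
                body = f"return x + {const};"
            else:
                body = f"return x + add_long_n{n}_h{h - 1}(x) + {const};"
            lines.append(f"long add_long_n{n}_h{h}(long x) {{ {body} }}")
        # The top-level function of chain n calls the highest helper, if any.
        const = rng.randint(0, 99999)
        if function_depth > 1:
            body = f"return x + add_long_n{n}_h{function_depth - 2}(x) + {const};"
        else:
            body = f"return x + {const};"
        lines.append(f"long add_long_n{n}(long x) {{ {body} }}")
        lines.append("")
    lines.append("int main(void) {")
    lines.append("    long long_sum = 0;")
    for n in range(function_count):
        lines.append(f"    long_sum += add_long_n{n}({n});")
    lines.append("    return long_sum;")
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(generate_c_source(3, 2, random.Random()))
```

Calling `generate_c_source(3, 2, ...)` yields three call chains of two functions each, matching the structure of the sample (the constants differ per run).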

Compiler Object Caches

The numerical constants are randomized using a new seed upon every call. This makes it impossible for any compiler to utilize any caching mechanism upon successive calls with the same flags that affect the source generation. The purpose of this is to make the comparison between compilers with no caching or different levels of caching fairer.

The caching of the Go reference compiler go, for instance, is effectively disabled by this randomization.
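
The idea of drawing a fresh seed per invocation can be sketched like this. The helper names `make_rng` and `random_constants` are hypothetical, and the benchmark script may seed differently; the point is only that a new seed changes the generated constants, so a cache keyed on source content cannot hit.

```python
import random
import secrets


def make_rng(seed=None):
    """Return (seed, rng); draws a fresh 64-bit seed when none is given."""
    if seed is None:
        seed = secrets.randbits(64)
    return seed, random.Random(seed)


def random_constants(rng, count, upper=99999):
    """Return `count` integer constants in [0, upper] drawn from `rng`."""
    return [rng.randint(0, upper) for _ in range(count)]


if __name__ == "__main__":
    seed, rng = make_rng()
    # Printing the seed makes a particular run reproducible if needed.
    print(seed, random_constants(rng, 4))
```

Passing an explicit seed reproduces a given source file, while the default gives every run different constants.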

Generics

For each language $LANG that supports generics, an additional templated source file main_t.$LANG will be generated alongside main.$LANG. Its contents are equivalent to those of main.$LANG, except that all functions (except main) are templated. This templated source will be benchmarked as well. The column Templated in the table below indicates whether or not the compilation is using templated functions.
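
As an illustration of the difference between the two variants, the sketch below emits one C++ function either as a plain function or as a function template. The README does not show the templated output, so the exact template shape (`template<typename T> T name(T x)`) and the helper name `cxx_function` are assumptions.

```python
def cxx_function(name, callee, constant, templated):
    """Return one C++ function definition, optionally as a function template."""
    call = f" + {callee}(x)" if callee else ""
    if templated:
        # T is deduced from the argument at each call site.
        return (f"template<typename T> T {name}(T x) "
                f"{{ return x{call} + {constant}; }}")
    return f"long {name}(long x) {{ return x{call} + {constant}; }}"


if __name__ == "__main__":
    for templated in (False, True):
        print(cxx_function("add_long_n0_h0", None, 15440, templated))
        print(cxx_function("add_long_n0", "add_long_n0_h0", 95485, templated))
```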

Conclusions (from sample run shown below)

TCC's build speed is vastly superior because of its single-pass code-generation architecture. This is possible partly because parsing the C programming language doesn't have to deal with forward references, which limits the parsing (and memory allocation) scope to a single function.

The Tiny C Compiler (TCC, tcc) is, by a large margin, the fastest, closely followed by the C compiler Cuik, Vox and D's dmd. Note that Vox is an experimental language and Cuik is an experimental C compiler.

The performance of both GCC and Clang gets significantly worse with each new release (currently 8, 9, 10 in the table below).

The templated (generic) C++ source checks about 3 times slower than the non-generic one using gcc-8 but only about 2.3 times slower for gcc-10. For clang++-10 the slowdown is only about 1.6. The corresponding slowdown for generic D (dmd) is about 2.5 times. On the other hand, the generic Rust version interestingly is processed 2-3 times faster than the non-generic version.

Julia's JIT-compiler is (currently) very memory hungry. A maximum recommended product of function-count and function-depth for Julia is 5000. Julia will therefore be excluded from the benchmark when this maximum is reached.

OCaml's optimizing native compiler ocamlopt is very slow for large inputs and is therefore disabled when the product of function-count and function-depth exceeds 10000.
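
The two size limits mentioned above could be applied along these lines. The thresholds come from the text; the function name `skipped_compilers` and the exact comparison operators ("reached" read as `>=`, "exceeds" read as `>`) are assumptions.

```python
JULIA_MAX_FUNCTIONS = 5000      # recommended maximum product for Julia
OCAMLOPT_MAX_FUNCTIONS = 10000  # ocamlopt is disabled above this product


def skipped_compilers(function_count, function_depth):
    """Return the compilers that would be skipped for the given input size."""
    total = function_count * function_depth
    skipped = []
    if total >= JULIA_MAX_FUNCTIONS:     # "when this maximum is reached"
        skipped.append("julia")
    if total > OCAMLOPT_MAX_FUNCTIONS:   # "exceeds 10000"
        skipped.append("ocamlopt")
    return skipped


if __name__ == "__main__":
    for count, depth in [(50, 50), (100, 100), (200, 200)]:
        print(count, depth, skipped_compilers(count, depth))
```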

Sample Run on Intel Core (Tiger Lake R0) [Willow Cove] {Sunny Cove}, 10nm++

The output on an Intel Core (Tiger Lake R0) [Willow Cove] {Sunny Cove}, 10nm++ running Ubuntu 22.04 for the sample call

./benchmark --function-count=200 --function-depth=200 --run-count=5

results in the following table (copied from the output at the end).

| Language | Templated | Check Time [us/fn] | Compile Time [us/fn] | Build Time [us/fn] | Run Time [us/fn] | Check RSS [kB/fn] | Build RSS [kB/fn] | Exec Version | Exec Path |
| :---: | --- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| D | No | 5.7 (4.1x) | 14.4 (10.7x) | 16.2 (11.5x) | 46 (3.1x) | 5.0 (10.6x) | 14.7 (31.6x) | v2.107.0-beta.1-136-gc5c4def18f | dmd |
| D | No | 4.3 (3.1x) | 67.5 (50.4x) | 68.2 (48.6x) | 218 (14.6x) | 6.3 (13.5x) | 20.8 (44.8x) | 1.36.0 | ldmd2 |
| D | No | 4.7 (3.4x) | 186.6 (139.3x) | 183.8 (130.8x) | 37 (2.5x) | 4.8 (10.2x) | 19.5 (41.9x) | 11.4.0 | gdc |
| D | Yes | 17.4 (12.6x) | 29.1 (21.7x) | 30.9 (22.0x) | 45 (3.0x) | 13.8 (29.6x) | 23.8 (51.2x) | v2.107.0-beta.1-136-gc5c4def18f | dmd |
| D | Yes | 17.2 (12.5x) | 83.0 (61.9x) | 83.2 (59.2x) | 217 (14.5x) | 15.3 (32.9x) | 29.6 (63.6x) | 1.36.0 | ldmd2 |
| D | Yes | 11.0 (8.0x) | 195.8 (146.1x) | 192.6 (137.0x) | 34 (2.3x) | 13.6 (29.1x) | 29.1 (62.5x) | 11.4.0 | gdc |
| C | No | 1.4 (best) | 1.3 (best) | 1.4 (best) | 15 (best) | 0.5 (best) | 0.5 (best) | 0.9.28rc | tcc |
| C | No | 4.1 (3.0x) | 27.9 (20.8x) | 29.7 (21.1x) | 275 (18.4x) | 4.6 (10.0x) | 49.7 (106.8x) | ~master | cuik |
| C | No | 8.0 (5.8x) | 220.7 (164.7x) | 219.1 (155.9x) | 22 (1.5x) | 3.0 (6.5x) | 14.0 (30.1x) | 12.3.0 | gcc |
| C | No | 6.1 (4.4x) | 173.8 (129.7x) | 174.2 (124.0x) | 22 (1.4x) | 2.8 (6.0x) | 14.4 (30.9x) | 11.4.0 | gcc-11 |
| C | No | 8.0 (5.8x) | 221.6 (165.4x) | 221.1 (157.4x) | 22 (1.5x) | 3.0 (6.5x) | 14.0 (30.1x) | 12.3.0 | gcc-12 |
| C | No | 13.5 (9.8x) | 84.3 (62.9x) | 85.8 (61.1x) | 347 (23.2x) | 2.9 (6.1x) | 10.8 (23.3x) | 14.0.0-1 | clang |
| C | No | 13.3 (9.6x) | 81.7 (61.0x) | 83.5 (59.4x) | 183 (12.2x) | 2.2 (4.8x) | 9.7 (20.9x) | 13.0.0 | clang-13 |
| C | No | 13.6 (9.8x) | 83.8 (62.5x) | 85.7 (61.0x) | 313 (21.0x) | 2.8 (6.1x) | 10.8 (23.3x) | 14.0.0-1 | clang-14 |
| C | No | 13.9 (10.1x) | 82.1 (61.3x) | 83.9 (59.7x) | 320 (21.4x) | 2.9 (6.3x) | 10.8 (23.1x) | 15.0.7 | clang-15 |
| C | No | 14.5 (10.5x) | 86.7 (64.7x) | 87.9 (62.6x) | 257 (17.2x) | 2.8 (6.0x) | 10.9 (23.5x) | 17.0.6 | clang-17 |
| C++ | No | 18.0 (13.0x) | 229.2 (171.1x) | 231.4 (164.7x) | 27 (1.8x) | 4.8 (10.3x) | 16.8 (36.1x) | 12.3.0 | g++ |
| C++ | No | 12.7 (9.2x) | 185.4 (138.3x) | 185.0 (131.7x) | 23 (1.5x) | 4.5 (9.7x) | 14.3 (30.7x) | 11.4.0 | g++-11 |
| C++ | No | 18.2 (13.2x) | 229.4 (171.2x) | 231.5 (164.7x) | 22 (1.5x) | 4.7 (10.2x) | 16.8 (36.1x) | 12.3.0 | g++-12 |
| C++ | No | 17.3 (12.6x) | 90.6 (67.6x) | 93.3 (66.4x) | 347 (23.2x) | 3.0 (6.4x) | 10.8 (23.3x) | 14.0.0-1 | clang |
| C++ | No | 17.2 (12.5x) | 88.9 (66.4x) | 91.0 (64.8x) | 180 (12.0x) | 2.4 (5.1x) | 9.8 (21.0x) | 13.0.0 | clang-13 |
| C++ | No | 17.5 (12.7x) | 90.8 (67.8x) | 93.3 (66.4x) | 312 (20.9x) | 3.0 (6.4x) | 10.8 (23.3x) | 14.0.0-1 | clang-14 |
| C++ | No | 18.2 (13.2x) | 89.5 (66.8x) | 91.9 (65.4x) | 297 (19.9x) | 3.0 (6.5x) | 10.8 (23.3x) | 15.0.7 | clang-15 |
| C++ | No | 18.3 (13.2x) | 94.6 (70.6x) | 96.4 (68.6x) | 278 (18.6x) | 2.9 (6.2x) | 10.9 (23.5x) | 17.0.6 | clang-17 |
| C++ | Yes | 34.7 (25.2x) | 274.7 (205.0x) | 287.4 (204.5x) | 21 (1.4x) | 8.3 (17.8x) | 20.9 (45.0x) | 12.3.0 | g++ |
| C++ | Yes | 27.7 (20.1x) | 227.3 (169.6x) | 240.8 (171.4x) | 22 (1.5x) | 8.2 (17.7x) | 20.8 (44.7x) | 11.4.0 | g++-11 |
| C++ | Yes | 34.5 (25.0x) | 275.0 (205.3x) | 288.0 (205.0x) | 23 (1.5x) | 8.3 (17.8x) | 20.9 (45.0x) | 12.3.0 | g++-12 |
| C++ | Yes | 28.4 (20.6x) | 99.1 (73.9x) | 113.2 (80.6x) | 351 (23.5x) | 4.8 (10.3x) | 14.0 (30.2x) | 14.0.0-1 | clang |
| C++ | Yes | 28.3 (20.5x) | 98.3 (73.4x) | 112.4 (80.0x) | 179 (12.0x) | 4.2 (9.0x) | 13.2 (28.3x) | 13.0.0 | clang-13 |
| C++ | Yes | 28.6 (20.7x) | 98.6 (73.6x) | 113.3 (80.6x) | 347 (23.2x) | 4.8 (10.3x) | 14.0 (30.2x) | 14.0.0-1 | clang-14 |
| C++ | Yes | 29.7 (21.5x) | 98.3 (73.4x) | 112.4 (80.0x) | 319 (21.3x) | 4.9 (10.4x) | 14.2 (30.5x) | 15.0.7 | clang-15 |
| C++ | Yes | 30.5 (22.1x) | 102.9 (76.8x) | 116.8 (83.1x) | 275 (18.4x) | 4.8 (10.3x) | 14.2 (30.6x) | 17.0.6 | clang-17 |
| Ada | No | N/A | N/A | 752.0 (535.2x) | 38 (2.5x) | N/A | 31.8 (68.4x) | 12.3.0 | gnat |
| Ada | No | N/A | N/A | 755.5 (537.7x) | 40 (2.7x) | N/A | 31.8 (68.4x) | 12.3.0 | gnat-12 |
| Go | No | 8.1 (5.9x) | N/A | N/A | N/A | 4.3 (9.3x) | N/A | 1.21.6 | gotype |
| Go | No | N/A | N/A | 344.1 (244.9x) | 24 (1.6x) | 7.2 (15.5x) | 23.9 (51.3x) | 12.3.0 | gccgo-12 |
| Go | No | N/A | N/A | 113.8 (81.0x) | 57 (3.8x) | N/A | 27.5 (59.2x) | 1.21.6 | go |
| Swift | No | 429.3 (311.0x) | N/A | 679.0 (483.2x) | 913 (61.1x) | 9.3 (20.1x) | 24.2 (51.9x) | 5.9.2 | swiftc |
| Zig | No | 12.1 (8.8x) | N/A | 226.2 (161.0x) | 106 (7.1x) | 3.1 (6.7x) | 27.2 (58.4x) | 0.12.0-dev.2341+92211135f | zig |
| Zig | Yes | 14.1 (10.2x) | N/A | 232.9 (165.8x) | 78 (5.2x) | 3.5 (7.5x) | 27.7 (59.6x) | 0.12.0-dev.2341+92211135f | zig |
| Rust | No | 28.1 (20.4x) | N/A | 157.2 (111.9x) | 680 (45.5x) | 14.5 (31.2x) | 33.1 (71.1x) | 1.77.0-nightly | rustc |
| Rust | Yes | 41.4 (30.0x) | N/A | 116.2 (82.7x) | 726 (48.7x) | 16.9 (36.2x) | 26.8 (57.6x) | 1.77.0-nightly | rustc |
| Nim | No | 36.3 (26.3x) | N/A | 358.5 (255.2x) | 60 (4.0x) | 4.4 (9.4x) | sampling error | 2.0.2 | nim |
| C# | No | N/A | N/A | 15.5 (11.1x) | 349 (23.4x) | N/A | 4.7 (10.1x) | 6.12.0.200 | mcs |
| C# | No | N/A | N/A | 182.2 (129.6x) | 1477 (98.9x) | N/A | 8.8 (19.0x) | 3.9.0-6.21124.20 | csc |
| N/A | N/A | N/A | N/A | N/A | N/A | N/A | 12.6 (27.2x) | N/A | N/A |
| N/A | N/A | N/A | N/A | N/A | N/A | N/A | 17.2 (37.0x) | N/A | N/A |
| OCaml | No | N/A | N/A | 82.0 (58.3x) | 19 (1.3x) | N/A | 16.0 (34.3x) | 4.13.1 | ocamlc |
| Julia | No | N/A | N/A | 287.5 (204.6x) | N/A | N/A | 12.4 (26.6x) | 1.11.0-DEV | julia |
| Julia | Yes | N/A | N/A | 231.4 (164.6x) | N/A | N/A | 10.6 (22.7x) | 1.11.0-DEV | julia |
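
The parenthesized factors in the table above appear to be each value divided by the best (smallest) value in its column. The sketch below shows that reading; the helper name `format_relative` is hypothetical, and because the table's factors are presumably computed from unrounded measurements, recomputing them from the rounded values shown can differ in the last digit.

```python
def format_relative(values):
    """Map name -> 'value (factor)' strings, relative to the smallest value."""
    best = min(values.values())
    formatted = {}
    for name, value in values.items():
        if value == best:
            formatted[name] = f"{value:.1f} (best)"
        else:
            formatted[name] = f"{value:.1f} ({value / best:.1f}x)"
    return formatted


if __name__ == "__main__":
    # Check Time [us/fn] values for tcc and dmd taken from the table above.
    print(format_relative({"tcc": 1.4, "dmd": 5.7}))
```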

Sample Run on AMD Ryzen Threadripper 3960X 24-Core

The output on an AMD Ryzen Threadripper 3960X 24-Core Processor running Ubuntu 22.04 for the sample call

./benchmark --function-count=200 --function-depth=200 --run-count=1

results in the following table (copied from the output at the end).

| Language | Templated | Check Time [us/fn] | Compile Time [us/fn] | Build Time [us/fn] | Run Time [us/fn] | Check RSS [kB/fn] | Build RSS [kB/fn] | Exec Version | Exec Path |
| :---: | --- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Vox | No | 1.5 (best) | N/A | 5.2 (3.3x) | 42 (1.2x) | 1.1 (2.8x) | 3.6 (8.1x) | master | vox |
| Vox | Yes | 2.0 (1.4x) | N/A | 6.1 (3.9x) | 65 (1.8x) | 2.0 (5.1x) | 4.4 (9.9x) | master | vox |
| D | No | 6.3 (4.2x) | 13.4 (7.4x) | 17.9 (11.4x) | 72 (2.0x) | 4.6 (11.5x) | 12.2 (27.2x) | v2.097.0-275-g357bc9d7a | dmd |
| D | No | 7.4 (5.0x) | 90.8 (49.9x) | 99.6 (63.5x) | 219 (6.2x) | 5.7 (14.3x) | 19.7 (43.8x) | 1.26.0 | ldmd2 |
| D | No | 6.4 (4.3x) | 240.5 (132.3x) | 237.5 (151.5x) | 40 (1.1x) | 4.5 (11.2x) | 19.2 (42.6x) | 10.3.0 | gdc |
| D | Yes | 12.7 (8.5x) | 21.9 (12.0x) | 25.9 (16.5x) | 64 (1.8x) | 13.0 (32.5x) | 21.4 (47.6x) | v2.097.0-275-g357bc9d7a | dmd |
| D | Yes | 14.2 (9.6x) | 102.0 (56.1x) | 110.6 (70.6x) | 302 (8.6x) | 14.9 (37.4x) | 29.3 (65.3x) | 1.26.0 | ldmd2 |
| D | Yes | 12.4 (8.3x) | 287.4 (158.0x) | 286.8 (182.9x) | 56 (1.6x) | 13.2 (33.1x) | 28.3 (63.0x) | 10.3.0 | gdc |
| C | No | 1.8 (1.2x) | 1.8 (best) | 1.6 (best) | 44 (1.3x) | 0.4 (best) | 0.4 (best) | 0.9.27 | tcc |
| C | No | 5.3 (3.5x) | N/A | N/A | N/A | 1.7 (4.2x) | N/A | unknown | cproc |
| C | No | 8.2 (5.5x) | 274.2 (150.8x) | 282.0 (179.8x) | 55 (1.6x) | 2.9 (7.4x) | 14.3 (31.9x) | 9.3.0 | gcc |
| C | No | 8.2 (5.5x) | 273.6 (150.4x) | 278.8 (177.8x) | 54 (1.5x) | 3.0 (7.5x) | 14.3 (31.8x) | 9.3.0 | gcc-9 |
| C | No | 6.0 (4.0x) | 220.4 (121.2x) | 224.9 (143.4x) | 55 (1.6x) | 2.8 (7.1x) | 14.3 (31.9x) | 10.3.0 | gcc-10 |
| C | No | 14.7 (9.9x) | 121.9 (67.0x) | 125.9 (80.3x) | 1045 (29.8x) | 1.8 (4.4x) | 9.4 (20.9x) | 10.0.0-4 | clang-10 |
| C | No | 15.6 (10.5x) | 121.7 (66.9x) | 124.7 (79.5x) | 376 (10.7x) | 1.9 (4.7x) | 9.4 (21.0x) | 11.0.0-2 | clang-11 |
| C++ | No | 20.1 (13.5x) | 290.9 (159.9x) | 293.3 (187.0x) | 42 (1.2x) | 4.4 (11.1x) | 14.4 (32.1x) | 9.3.0 | g++ |
| C++ | No | 19.9 (13.4x) | 291.9 (160.5x) | 294.6 (187.9x) | 48 (1.4x) | 4.4 (11.0x) | 14.4 (32.1x) | 9.3.0 | g++-9 |
| C++ | No | 14.9 (10.0x) | 235.8 (129.6x) | 238.4 (152.0x) | 35 (best) | 4.4 (11.1x) | 14.1 (31.3x) | 10.3.0 | g++-10 |
| C++ | No | 21.1 (14.2x) | 135.2 (74.4x) | 137.9 (87.9x) | 1022 (29.1x) | 1.9 (4.7x) | 9.5 (21.1x) | 10.0.0-4 | clang-10 |
| C++ | No | 21.8 (14.6x) | 133.8 (73.6x) | 135.3 (86.3x) | 314 (8.9x) | 2.0 (5.0x) | 9.5 (21.2x) | 11.0.0-2 | clang-11 |
| C++ | Yes | 40.0 (26.9x) | 346.4 (190.5x) | 370.5 (236.2x) | 47 (1.3x) | 7.8 (19.6x) | 23.6 (52.6x) | 9.3.0 | g++ |
| C++ | Yes | 40.0 (26.8x) | 346.3 (190.4x) | 362.8 (231.3x) | 52 (1.5x) | 7.8 (19.6x) | 23.6 (52.5x) | 9.3.0 | g++-9 |
| C++ | Yes | 31.4 (21.1x) | 285.9 (157.2x) | 309.2 (197.2x) | 45 (1.3x) | 8.0 (20.2x) | 21.9 (48.7x) | 10.3.0 | g++-10 |
| C++ | Yes | 35.0 (23.5x) | 152.1 (83.7x) | 186.4 (118.9x) | 1049 (29.9x) | 3.7 (9.2x) | 14.1 (31.4x) | 10.0.0-4 | clang-10 |
| C++ | Yes | 35.8 (24.1x) | 132.1 (72.6x) | 168.6 (107.5x) | 403 (11.5x) | 3.8 (9.6x) | 14.3 (31.9x) | 11.0.0-2 | clang-11 |
| Ada | No | N/A | N/A | 791.2 (504.5x) | 93 (2.7x) | N/A | 31.8 (70.8x) | 10.3.0 | gnat |
| Ada | No | N/A | N/A | 802.8 (511.9x) | 94 (2.7x) | N/A | 31.8 (70.8x) | 10.3.0 | gnat-10 |
| Go | No | 12.9 (8.7x) | N/A | N/A | N/A | 3.8 (9.5x) | N/A | 1.16.5 | gotype |
| Go | No | N/A | N/A | 377.3 (240.6x) | 48 (1.4x) | 6.4 (15.9x) | 24.3 (54.1x) | 10.3.0 | gccgo-10 |
| Go | No | N/A | N/A | 117.6 (75.0x) | 119 (3.4x) | N/A | 27.5 (61.1x) | 1.16.5 | go |
| Swift | No | 349.4 (234.7x) | N/A | 830.0 (529.3x) | 138 (3.9x) | 5.9 (14.9x) | 16.9 (37.5x) | 5.3.3 | swiftc |
| V | No | N/A | N/A | 17.9 (11.4x) | 451 (12.8x) | N/A | 12.1 (27.0x) | 0.2.2 | v |
| V | Yes | N/A | N/A | 19.0 (12.1x) | 402 (11.4x) | N/A | 12.6 (28.0x) | 0.2.2 | v |
| Zig | No | 62.6 (42.0x) | N/A | 284.8 (181.6x) | 831 (23.7x) | 21.7 (54.4x) | 33.6 (74.8x) | 0.8.0 | zig |
| Zig | Yes | 81.6 (54.8x) | N/A | 303.5 (193.6x) | 917 (26.1x) | 29.9 (74.9x) | 39.9 (88.7x) | 0.8.0 | zig |
| Rust | No | 126.0 (84.7x) | N/A | 412.9 (263.3x) | 1659 (47.3x) | 13.9 (34.8x) | 30.3 (67.5x) | 1.53.0-nightly | rustc |
| Rust | Yes | 140.5 (94.4x) | N/A | 258.2 (164.7x) | 1766 (50.3x) | 15.8 (39.6x) | 21.5 (48.0x) | 1.53.0-nightly | rustc |
| Nim | No | 36.6 (24.6x) | N/A | 80.3 (51.2x) | 76 (2.2x) | 4.2 (10.5x) | 8.0 (17.8x) | 1.4.6 | nim |
| C# | No | N/A | N/A | 25.1 (16.0x) | 556 (15.8x) | N/A | 4.7 (10.4x) | 6.12.0.122 | mcs |
| OCaml | No | N/A | N/A | 898.4 (572.9x) | 463 (13.2x) | N/A | 40.6 (90.5x) | 4.08.1 | ocamlopt |
| OCaml | No | N/A | N/A | 87.7 (55.9x) | 193 (5.5x) | N/A | 16.5 (36.8x) | 4.08.1 | ocamlc |
| Julia | No | N/A | N/A | 384.6 (245.2x) | N/A | N/A | 22.7 (50.6x) | 1.8.0-DEV | julia |
| Julia | Yes | N/A | N/A | 331.7 (211.5x) | N/A | N/A | 22.5 (50.0x) | 1.8.0-DEV | julia |

TODO

  • Add function benchmark_CSharp_using_dotnet() that calls dotnet build. On my Ubuntu 22.04, both dotnet new and dotnet build segfault, so I won't waste time on this for now.
  • Add language Fortran.
  • Add language Pony.
  • Sort table primarily by build time and then check time.
  • Don’t include Build Time and Build RSS columns when build op is not used.
  • Don’t include Check Time and Check RSS columns when check op is not used.


compiler-benchmark's Issues

Question about README note on dmd

A very nice benchmark. I have a question about this comment in the README:

The compilers vox dmd are, by a large margin, the fastest. 2 times faster than its closers competitor, tcc.

I don't see how this is true by looking at your table:

| Language | Check Time [us/fn] | Compile Time [us/fn] | Build Time [us/fn] | Exec Path |
| :---: | :---: | :---: | :---: | :---: |
| Vox | 1.5 (best) | N/A | 5.2 (3.3x) | vox |
| Vox | 2.0 (1.4x) | N/A | 6.1 (3.9x) | vox |
| D | 6.3 (4.2x) | 13.4 (7.4x) | 17.9 (11.4x) | dmd |
| C | 1.8 (1.2x) | 1.8 (best) | 1.6 (best) | tcc |

It seems to me that tcc is faster than dmd. It's also not clear what Check, Compile and Build time mean.

Is Check about just checking syntax?
How is Build different than Compile?

Compiler options are not comparable

If you want to compare the compilation times of gcc and Julia the compiler optimization options should be comparable.

Julia is running by default with -O2, gcc by default with -O0

The benchmarks should either use -O2 for gcc or the option --compile=min for Julia.

Is "Run Time" correct?

From what I can see "Run Time" is merely invoking the compiler. It doesn't run the main created and so consequently the "Run Time" goes down with number of functions. A simple test with C bears this out. I get 28650, 2677 and 269 respectively for 100/10 100/100 100/1000.

Is this intentional?
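
The observation can be checked with a little arithmetic: if the reported per-function value is multiplied back by the number of generated functions, the totals come out nearly constant, which is what one would expect if a fixed cost were being divided by a growing function count. The sketch assumes that "100/10", "100/100" and "100/1000" mean function count / function depth.

```python
# (function_count, function_depth, reported "Run Time" per function) from the issue.
reports = [(100, 10, 28650), (100, 100, 2677), (100, 1000, 269)]


def implied_total(function_count, function_depth, per_function):
    """Undo the per-function normalization to recover the total measured value."""
    return per_function * function_count * function_depth


totals = [implied_total(*report) for report in reports]

if __name__ == "__main__":
    for report, total in zip(reports, totals):
        print(report, "->", total)
    # The three totals lie within about 7% of each other.
    print(f"max/min = {max(totals) / min(totals):.3f}")
```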

[feature request] Add Nim language

Hi, could you add Nim language?

Nim is a system language using automatic reference counting with destructors and move semantics to manage memory. It supports C/C++/JS backends. Nim has a powerful macro system which allows direct manipulation of the AST, offering nearly unlimited opportunities.

The website:
https://nim-lang.org

The Github repo:
https://github.com/nim-lang/Nim

Installation:
https://nim-lang.org/install.html

debug build: nim c --gc:arc app.nim
release build: nim c --gc:arc -d:release app.nim
check?: nim check app.nim

Sample code:

proc add_long_n0_h0(x: int): int =
  return x + 15440

proc add_long_n0(x: int): int =
  return x + add_long_n0_h0(x) + 95485

proc add_long_n1_h0(x: int): int =
  return x + 37523

proc add_long_n1(x: int): int =
  return x + add_long_n1_h0(x) + 92492

proc add_long_n2_h0(x: int): int =
  return x + 39239

proc add_long_n2(x: int): int =
  return x + add_long_n2_h0(x) + 12248


var long_sum = 0;
long_sum += add_long_n0(0)
long_sum += add_long_n1(1)
long_sum += add_long_n2(2)

./benchmark --languages=C++ fails: `AttributeError: 'Process' object has no attribute '_cache'`

pip3 install psutil
Collecting psutil
Downloading psutil-5.8.0-cp39-cp39-macosx_10_9_x86_64.whl (236 kB)
|████████████████████████████████| 236 kB 1.6 MB/s
Installing collected packages: psutil
Successfully installed psutil-5.8.0

python3 --version
Python 3.9.4

./benchmark --languages=C++
# Code-generation:
- Generating generated/c++/main.c++ took 0.135 seconds (C++)
- Generating generated/c++/main_t.c++ took 0.135 seconds (C++)

# Benchmark:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/4.2.1
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/psutil/_common.py", line 447, in wrapper
    ret = self._cache[fun]
AttributeError: 'Process' object has no attribute '_cache'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/psutil/_common.py", line 447, in wrapper
    ret = self._cache[fun]
AttributeError: _cache

During handling of the above exception, another exception occurred:

Build error in benchmarking

## D-Templated-Build-dmd:
- Build took 0.875 seconds (using "/usr/bin/dmd" version v2.088.0)
## D-Templated-Build-ldmd2:
output: b'(6): Error: only one `main` function allowed\n'
- Build took 0.970 seconds (using "/opt/dev-setup/ldc2-1.17.0-linux-x86_64/bin/ldmd2" version 1.17.0)
## D-Templated-Build-gdc:
- Build took 8.017 seconds (using "/usr/bin/gdc" version 10.0.1)
## Java-Untemplated-Build-/usr/bin/javac:
javac 11.0.8
Traceback (most recent call last):
  File "./benchmark", line 1260, in <module>
    main()
  File "./benchmark", line 260, in main
    results += benchmark_Java(execs=execs, durs=durs, gpaths=gpaths, args=args, op='Build', templated=False)
  File "./benchmark", line 670, in benchmark_Java
    version = sp.run([exe, '-version'], stderr=sp.PIPE).stderr.decode('utf-8').split()[1]
IndexError: list index out of range
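
The traceback shows the IndexError coming from `.split()[1]` on the captured stderr, while "javac 11.0.8" still appears in the log, which suggests that this javac printed its version to stdout and the captured stderr was empty. A more tolerant version lookup might look like the sketch below; the helper names `parse_tool_version` and `tool_version` are hypothetical and this is not the benchmark's actual fix.

```python
import subprocess


def parse_tool_version(output: str, default: str = "unknown") -> str:
    """Return the second whitespace-separated token of `output`, or `default`."""
    tokens = output.split()
    return tokens[1] if len(tokens) > 1 else default


def tool_version(exe: str) -> str:
    """Run `exe -version` and parse the version from its combined output."""
    # Merge stderr into stdout so it does not matter which stream the tool uses.
    proc = subprocess.run([exe, "-version"], stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, check=False)
    return parse_tool_version(proc.stdout.decode("utf-8", errors="replace"))
```

With this shape, an empty or unexpected output yields "unknown" instead of aborting the whole benchmark run.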

generate code for each language directly instead of using current complex piecemeal approach

The benchmark Python code would IMO be a lot more readable, concise, maintainable and extensible to other languages and benchmarks if it generated each language separately instead of using the current complex logic, e.g.

def generate_main_test_function_variable(lang, typ, f, templated):
    if lang in ('c', 'c3', 'c++', 'd', 'vox'):
        f.write(Tm('    ${T} ${T}_sum = 0;\n').substitute(T=typ))
    elif lang in ['ada']:
        f.write(Tm('   ${T}_sum : ${T} := 0;\n').substitute(T=typ))
    elif lang in ['c#']:
        f.write(Tm('        ${T} ${T}_sum = 0;\n').substitute(T=typ))
...

Execution time for Julia is reported incorrectly

If I execute:

./benchmark --languages=C,Julia

I get the following result:

| Lang-uage | Temp-lated | Check Time [us/fn] | Compile Time [us/fn] | Build Time [us/fn] | Run Time [us/fn] | Check RSS [kB/fn] | Build RSS [kB/fn] | Exec Version | Exec Path | 
| :-------: | ---------- | :----------------: | :------------------: | :----------------: | :--------------: | :---------------: | :---------------: | :----------: | :-------: | 
| C         | No         |    2.1 (best)      |    2.0 (best)        |    3.0 (best)      |    124 (best)    |    0.4 (best)     |    0.7 (best)     | 0.9.27       | tcc       | 
| C         | No         |   10.5 (5.1x)      |  290.8 (144.7x)      |  292.1 (95.8x)     |    611 (4.9x)    |    4.3 (10.0x)    |   22.4 (33.3x)    | 9.3.0        | gcc       | 
| C         | No         |   12.0 (5.8x)      |  296.7 (147.6x)      |  294.9 (96.7x)     |    158 (1.3x)    |    4.3 (9.9x)     |   22.4 (33.3x)    | 9.3.0        | gcc-9     | 
| C         | No         |    7.5 (3.7x)      |  282.0 (140.4x)      |  265.9 (87.2x)     |    147 (1.2x)    |    3.5 (8.1x)     |   21.6 (32.2x)    | 10.3.0       | gcc-10    | 
| C         | No         |   24.6 (12.0x)     |  132.9 (66.1x)       |  127.4 (41.8x)     |   1219 (9.8x)    |    6.2 (14.3x)    |   18.3 (27.2x)    | 10.0.0-4     | clang-10  | 
| Julia     | No         |    N/A             |    N/A               | 20331.1 (6668.5x)  |    N/A           |    N/A            |   68.5 (101.9x)   | 1.6.1        | julia     | 
| Julia     | Yes        |    N/A             |    N/A               | 16652.0 (5461.8x)  |    N/A           |    N/A            |   61.7 (91.9x)    | 1.6.1        | julia     | 

So for the Run Time we get the result N/A.

If I run Julia from the command line and execute:

julia> @time main()
  0.000000 seconds
495802636

So the execution time is zero, because the result is already calculated at compile time using constant propagation.

This means in the result table you should also report zero run time.

Or modify the test code so that full constant propagation is not possible.

clang++ is a symbolic link to one clang-** version only for C++ language

current approach to test different versions of clang compiler against c++ considers the path of clang++ executable with different version numbers, but on macos at least, clang++ is just a symbolic link to latest clang-** executable. Hence only a single version of clang is tested.
Not sure about linux, but for mac, this

if lang == 'C':
    exe = which('clang' + str(clang_version))
elif lang == 'C++':
    exe = which('clang++' + str(clang_version))

is not necessary and should only be

exe = which('clang' + str(clang_version))

even for C++.
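
One way to avoid benchmarking the same binary several times under different names is to resolve symbolic links before collecting executables. The sketch below is an illustration only: `unique_executables` and its `path` parameter are hypothetical, and it does not address which driver name should be used for C++.

```python
import os
import shutil


def unique_executables(names, path=None):
    """Map resolved real path -> first name that resolved to it."""
    seen = {}
    for name in names:
        found = shutil.which(name, path=path)
        if found is None:
            continue  # not installed
        real = os.path.realpath(found)  # follows symbolic links
        seen.setdefault(real, name)
    return seen


if __name__ == "__main__":
    candidates = ["clang++", "clang++-13", "clang++-14", "clang++-15"]
    for real, name in unique_executables(candidates).items():
        print(f"{name} -> {real}")
```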

Why there is no info in most tests for cproc and julia?

First of all hi and bravo for your amazing work!!! It is very important to have a repo for compiler benchmarks! And you also made a couple of them for every language which is amazing!!!

Now for my question, all the other languages seem to have at least the runtime and compile/build time benchmarks except for cproc and julia which miss almost all the benchmarks. Why is this happening?

Installation fails

I am trying to install the python packages on Linux Mint Cinnamon 20.1.

I did:

sudo apt install python3-pip
sudo apt install psutils

I get the error message:

ufechner@vmware:~/repos/compiler-benchmark$ ./install-python-packages.sh 
ERROR: Could not find a version that satisfies the requirement psutils (from versions: none)
ERROR: No matching distribution found for psutils

Any idea?

New Julia benchmark

First, you might want to benchmark Julia on master as is (or possibly next nightly, I just noticed yet one more improvement merged just now "Remove alloca from codegen").

I don't know if the issue with your very unusual benchmark is fixed. But Julia does use -O2 by default so you might also want to try running with -O0 (or --inline=no that I think is at least implied by the lowest level) or -O1, since there is no Julia debug/development-build mode, and that's the closest I can think of; Or even with --compile=min

At least if you see an improvement, there's also a further 25% improvement available (but you have to opt into this new Julia parser, it will be merged into Julia, but then also at first off by default):

JuliaLang/JuliaSyntax.jl#228

I also wanted to point that out for you for D (or other) language.

Adding extra C compilers

Hi, thanks for the project. Consider adding extra compilers for C: Zig and D.

I've tried to generate C source from your repo (--function-count=200 --function-depth=200).

Then I compiled it with all the compilers available on my system. It seems to work, but some issues should be solved (maybe with some additional compiler flags):

  • Zig is very slow at first compilation. Moreover, it caches the results, so the next compilation is very fast (some milliseconds).
  • It seems Zig produces a stripped executable by default (which could also take time), because it is the only executable file that is not affected by the "strip" command at all: its size does not change.
  • D's ImportC compiles your functions, but it can hardly be considered a fair-play participant without preprocessor support. When I tried to add "#include <stdio.h>", it also gave me an error. So I'm not sure how to properly account for the lack of these "features".

I've used hyperfine on my Manjaro laptop:

  • AMD Ryzen 3 5300U
  • gcc (GCC) 12.2.0
  • clang version 14.0.6
  • zig 0.9.1
  • DMD64 D Compiler v2.100.2

Results:
Benchmark 1: clang -o code_clang code.c
Time (abs ≡): 5.034 s [User: 4.902 s, System: 0.124 s]

Benchmark 2: gcc -o code_gcc code.c
Time (abs ≡): 12.375 s [User: 12.024 s, System: 0.326 s]

Benchmark 3: zig cc -o code_zig code.c
Time (abs ≡): 70.219 s [User: 68.940 s, System: 1.083 s]

Benchmark 4: dmd -of=code_dmd code.c
Time (abs ≡): 1.344 s [User: 1.127 s, System: 0.211 s]

Summary
'dmd -of=code_dmd code.c' ran
3.75 times faster than 'clang -o code_clang code.c'
9.21 times faster than 'gcc -o code_gcc code.c'
52.26 times faster than 'zig cc -o code_zig code.c'
