
benchmarks's Introduction

benchmarks

Fortran benchmarks

benchmarks's People

Contributors

arunningcroc, certik, euler-37, milancurcic


benchmarks's Issues

precision problem for optimized.f90

y<0.8
should be changed to y<0.8_dp

and

a=0.01

should be changed to a=0.01_dp

With these changes, the result (using GCC 11.2.0 on Windows, from equation.com, with flag -Ofast) is:

Fortran:  0.45312500000000000            46337
C      : iters 46213 Execution time: 0.473000
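
For context, a minimal sketch of the underlying pitfall (the variable names are illustrative, not taken from optimized.f90): a default-real literal such as 0.01 carries only single precision, even when it is assigned to or compared against a real(dp) value.

program literal_precision
! Illustrative sketch, not code from optimized.f90: default-real literals
! are single precision, and the rounding error survives widening to real64.
use, intrinsic :: iso_fortran_env, only: dp => real64
implicit none
real(dp) :: a_default, a_dp
a_default = 0.01      ! single-precision literal, widened on assignment
a_dp      = 0.01_dp   ! true double-precision literal
print *, a_default    ! ~0.0099999998 (single-precision rounding visible)
print *, a_dp         ! 0.010000000000000000
end program literal_precision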

Random number generation

There are several algorithms that might be good for benchmarks:

LCG algorithm:

subroutine lcg_int32(x)
! Linear congruential generator (Park-Miller "minimal standard")
! https://en.wikipedia.org/wiki/Linear_congruential_generator
use, intrinsic :: iso_fortran_env, only: int32, int64
integer(int32), intent(out) :: x
integer(int32), save :: s = 26250493
! Multiply in 64-bit: s*48271 overflows a 32-bit integer.
s = int(modulo(int(s, int64) * 48271_int64, 2147483647_int64), int32)
x = s
end subroutine

subroutine lcg_real32(x)
! Uniform deviate in [0, 1) derived from lcg_int32
use, intrinsic :: iso_fortran_env, only: int32, real32
real(real32), intent(out) :: x
integer(int32) :: s
call lcg_int32(s)
x = s / 2147483647._real32
end subroutine
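
A minimal driver for sanity-checking the stream (the wrapper program below is an assumption, not part of the proposal; the two routines are repeated as internal procedures so the example is self-contained):

program lcg_demo
use, intrinsic :: iso_fortran_env, only: int32, int64, real32
implicit none
integer :: i
real(real32) :: r
do i = 1, 5
  call lcg_real32(r)
  print *, r  ! identical sequence on every run and with every compiler
end do
contains
  subroutine lcg_int32(x)
    integer(int32), intent(out) :: x
    integer(int32), save :: s = 26250493
    ! 64-bit intermediate avoids int32 overflow in s*48271
    s = int(modulo(int(s, int64) * 48271_int64, 2147483647_int64), int32)
    x = s
  end subroutine
  subroutine lcg_real32(x)
    real(real32), intent(out) :: x
    integer(int32) :: s
    call lcg_int32(s)
    x = s / 2147483647._real32
  end subroutine
end program lcg_demo

One motivation for such a generator over the intrinsic random_number is that the stream is reproducible across compilers, so every benchmark implementation consumes identical random inputs.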

Displaying results

I think it would be good if the final output is HTML+JSON that we can serve up at fortran-lang.org/benchmarks via gh-pages.
This can include various customizable and interactive plots to allow easily comparing different scenarios, and links for raw CSV and JSON download.

Some additional features that would be really nice for each benchmark case: the ability to show compiler reports for optimisation and vectorisation, as well as disassembled output (as on godbolt.org) for checking the generated instructions.

Should we benchmark languages other than Fortran, why, and how?

I see great value in implementing a variety of simple yet real-world algorithms in Fortran and benchmarking them along multiple axes:

  • Different problem sizes (e.g. array or matrix size)
  • Different compilers
  • Different optimization flags
  • Different hardware

How about different languages? What would be the main purpose of that?

Are we interested in comparing the performance of Fortran and other language implementations using idiomatic, naive code (i.e. the code that a novice would write), and thus comparing the compilers' ability to optimize?

Or are we interested in writing code in different languages that produces the same (or as similar as possible) assembly, and then comparing the source code?

Automatic benchmarks via Github workflow

I created a repository (St-Maxwell/benchmark) that uses a GitHub workflow to perform automatic benchmarks and publish the results to GitHub Pages. I imitated Julia's Microbenchmarks to accomplish this prototype.

From working on these automatic benchmarks, I think fortran-lang/benchmarks requires the following:

  • a basic benchmark framework for each language
  • a workflow for the automatic benchmarks (and deployment to GitHub Pages)
  • documentation

I believe my repository is helpful for the first two tasks. Of course, it needs further polishing. So I'd like to hear your suggestions. Thanks!

USM3D CFD benchmark

Discussed here, summarized here, and at Hacker News here. Quoting the 2nd source,

"Hunter benchmarks USM3D, is described by NASA as “a tetrahedral unstructured flow solver that has become widely used in industry, government, and academia for solving aerodynamic problems. Since its first introduction in 1989, USM3D has steadily evolved from an inviscid Euler solver into a full viscous Navier-Stokes code.”

As previously noted, this is a computational fluid dynamics test, and CFD tests are notoriously memory bandwidth sensitive. We’ve never tested USM3D at ExtremeTech and it isn’t an application that I’m familiar with, so we reached out to Hunter for some additional clarification on the test itself and how he compiled it for each platform. There has been some speculation online that the M1 Ultra hit these performance levels thanks to advanced matrix extensions or another, unspecified optimization that was not in play for the Intel platform."

Benchmark criteria

We should decide on criteria for what makes a good/suitable benchmark problem and how it can be implemented.

Himeno benchmark

Dr. Ryutaro Himeno, Director of the Advanced Center for Computing and Communication, developed this benchmark to evaluate the performance of incompressible fluid analysis code. The benchmark measures the speed of the major loops that solve Poisson's equation using the Jacobi iteration method.

Because the code is very simple and easy to compile and execute, users can measure actual speed (in MFLOPS) immediately.

Full link: https://i.riken.jp/en/supercom/documents/himenobmt/

The benchmark has been used in recent HPC papers and presentations.

The benchmark appears closely related to the current Poisson2d benchmark, but instead features a 3D Jacobi stencil.

I've seen similar (MPI-enabled) Jacobi benchmarks before in the book of Hager & Wellein, Introduction to High Performance Computing for Scientists and Engineers.
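
For reference, a minimal sketch of a single 3D Jacobi sweep (this is not the Himeno kernel, which uses a larger, coefficient-weighted stencil; the array names and coefficients here are simplified assumptions):

subroutine jacobi3d_sweep(p, pnew, n)
! One Jacobi iteration for the 3D Poisson problem on an n^3 grid.
! Simplified 7-point stencil; boundary values are left unchanged.
use, intrinsic :: iso_fortran_env, only: dp => real64
integer, intent(in) :: n
real(dp), intent(in) :: p(n,n,n)
real(dp), intent(out) :: pnew(n,n,n)
integer :: i, j, k
pnew = p
do k = 2, n-1
  do j = 2, n-1
    do i = 2, n-1
      ! Average of the six nearest neighbours
      pnew(i,j,k) = (p(i-1,j,k) + p(i+1,j,k) + p(i,j-1,k) &
                   + p(i,j+1,k) + p(i,j,k-1) + p(i,j,k+1)) / 6.0_dp
    end do
  end do
end do
end subroutine jacobi3d_sweep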

Should we rename the repository from `benchmarks` to `idioms`?

See #10 for the background discussion. @rouson and I brainstormed a better name, and we came up with idioms.

It seems idioms communicates better what we are trying to achieve:

  • Have mainly idiomatic code (in each language) showing how to solve a given problem
  • Document each problem: give the mathematical background, then several versions in each language showing how to solve it
  • We'll also have non-idiomatic code that tries to extract the best performance, but the idiomatic code could help compilers optimize better
  • What is "idiomatic" is subjective, so we can and should have several different versions
  • As a user, I would love to browse the approaches to solving a given problem, even just in Fortran, but also in C++, Julia, Python, and other languages, both to learn and to compare how "easy" it is to write something like this myself in a given language.
  • I would like to see timings for each version with various compilers, options, and platforms (but this is only part of the goal)

Where to run the benchmarks for the published results?

Users will be able to run the benchmarks locally. However, how do we choose what kind of machine to run the benchmarks on for the published results? Should it just be a reasonably recent server, with the machine specs documented alongside the benchmarks, as https://julialang.org/benchmarks/ did?

For multi-compiler benchmarks this may be tricky. With several compilers we can build for x86, but some will be specific to the vendor hardware (IBM, Cray, ARM). Should we consider these as well?

Considering the support from the community so far and the potential impact of fortran-lang, it shouldn't be difficult to get a dedicated cloud instance donated down the road.

In the meantime, perhaps it will be good enough to just run on a recent workstation that one of us owns.

Choosing License

While studying the stopwatch links @ivan-pi posted in #6, it occurred to me that each one was under a different license. While preparing benchmarks, we will need to use software written in different languages under different licenses. As I understand it, the benchmarks will be part of the fortran-lang website and therefore under the MIT license.
What happens in the following scenario? I am preparing an N-body benchmark and have already coded the Fortran version under MIT. To start comparisons, I copied the C++ version from rosettacode.org, which is under the GFDL (v1.2), and I will do the same for the Julia version. If I then choose to time it with, e.g., the second link in #6, which is under GPLv3, will there be a conflict/problem? Can I publish the codes and the results, and if so, under what license?

Benchmark driver

We need a framework for compiling and running benchmark cases across a test matrix and, if necessary, collecting and post-processing the results.

The main dimensions for the test matrix are:

  1. Different compilers (and eventually different languages)
  2. Different optimisation levels

and there may be others. The framework needs to be able to run locally and in CI for testing.

This can be achieved with a makefile, though this may not be as flexible as a custom solution.

TeaLeaf benchmark

TeaLeaf is a mini-app that solves the linear heat conduction equation on a spatially decomposed regular grid using a 5-point stencil with implicit solvers.

The GitHub repository can be found here: https://github.com/UK-MAC/TeaLeaf

If nothing else, it can be added to a list of third-party benchmarks.
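
For illustration, a minimal sketch of the kind of 5-point stencil operator such an implicit solver applies at each iteration (the names and coefficients are assumptions, not taken from TeaLeaf):

subroutine apply_5pt(u, au, nx, ny, rx, ry)
! Matrix-free application of (I - dt*L) for the 2D heat equation,
! discretized with the 5-point stencil; boundaries are left untouched here.
! rx and ry play the role of dt/dx**2 and dt/dy**2 (assumed names).
use, intrinsic :: iso_fortran_env, only: dp => real64
integer, intent(in) :: nx, ny
real(dp), intent(in) :: u(nx,ny), rx, ry
real(dp), intent(out) :: au(nx,ny)
integer :: i, j
au = u
do j = 2, ny-1
  do i = 2, nx-1
    au(i,j) = (1.0_dp + 2.0_dp*rx + 2.0_dp*ry) * u(i,j) &
            - rx*(u(i-1,j) + u(i+1,j)) - ry*(u(i,j-1) + u(i,j+1))
  end do
end do
end subroutine apply_5pt

An implicit solver (e.g. conjugate gradient, one of the solvers TeaLeaf provides) applies such an operator repeatedly, which is why these kernels stress memory bandwidth.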

add topics

I suggest adding the topics benchmark, benchmarks, fortran, python, c in the About section.

HPCG Benchmark

Another HPC benchmark: http://hpcg-benchmark.org/. The reference version is C++ with MPI and OpenMP parallelization.

A co-array Fortran version of the benchmark is discussed in the book by Robert W. Numrich, Parallel programming with co-arrays, Chapman & Hall / CRC Press, 2019. It also includes a very interesting performance analysis.

Write code as tests

It would be a really big time-saver for someone encountering these codes for the first time if every code is written as a test that reports success or failure so that a newcomer doesn't have to digest the entire algorithm and then read verbose output to determine whether a particular run succeeded or failed. End each program with something along the lines of

  block
    logical, parameter :: testing = .true.
    if (testing) then
      call verify(calculation_result) ! error terminate if the calculation failed
      print *, "Poisson test passed."
    end if
  end block

where making testing a compile-time constant allows an optimizing compiler to completely eliminate the verification code during a dead-code removal phase when you want to do runs to measure performance. You could use a preprocessor macro to switch the value to false when so desired.
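
A minimal sketch of what such a verify routine could look like (the expected value and tolerance below are placeholders, not from any benchmark; error stop gives CI a nonzero exit status on failure):

subroutine verify(result)
! Hypothetical checker: error-terminates when the result is out of tolerance.
use, intrinsic :: iso_fortran_env, only: dp => real64
real(dp), intent(in) :: result
real(dp), parameter :: expected = 0.0_dp, tol = 1.0e-10_dp
if (abs(result - expected) > tol) error stop "Poisson test FAILED"
end subroutine verify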

Even better would be to adopt a unit-testing framework that automates the execution of all the tests. I recommend Vegetables.

Timing routines

For benchmarking purposes it would be useful to have a set of timing routines.

Some prior art exists. In Julia, for example, two options are used:

  • the @time macro, which measures the time taken to execute an expression
  • the BenchmarkTools.jl package including the @btime macro which executes the expression multiple times and uses regression to reduce noise.

A week or two ago, I tried to build some timing macros using fypp:

#:def NTIC(n=1000)
  #:global BENCHMARK_NREPS
  #:set BENCHMARK_NREPS = n
  block
    use, intrinsic :: iso_fortran_env, only: int64, dp => real64
    integer(int64) :: benchmark_tic, benchmark_toc, benchmark_count_rate
    integer(int64) :: benchmark_i
    real(dp) :: benchmark_elapsed
    call system_clock(benchmark_tic,benchmark_count_rate)
    do benchmark_i = 1, ${BENCHMARK_NREPS}$
#:enddef

#:def NTOC(*args)
    #:global BENCHMARK_NREPS
    end do
    call system_clock(benchmark_toc)
    benchmark_elapsed = real(benchmark_toc - benchmark_tic, dp)/real(benchmark_count_rate, dp)
    benchmark_elapsed = benchmark_elapsed/${BENCHMARK_NREPS}$
  #:if len(args) > 0
    ${args[0]}$ = benchmark_elapsed
  #:else
    write(*,*) "Average time is ",benchmark_elapsed," seconds."
  #:endif
  end block
  #:del BENCHMARK_NREPS
#:enddef

These can be used then as follows:

  real :: x(1000), y(1000), avg_time
  call random_number(x)

  @:NTIC(100)
  y = sqrt(x)
  @:NTOC() ! print average time

  @:NTIC(100)
  y = sqrt(x)
  @:NTOC(avg_time) ! save average time to variable

Perhaps a combination of a StopWatch class and some fypp macros could enable us to do regression tests similar to those done in Julia.
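
As a starting point, a minimal sketch of what such a StopWatch class could look like (this is an assumed design, not an existing fortran-lang API):

module stopwatch_mod
use, intrinsic :: iso_fortran_env, only: int64, dp => real64
implicit none
private
public :: stopwatch

type :: stopwatch
  integer(int64), private :: tic_count = 0_int64
contains
  procedure :: tic
  procedure :: toc
end type stopwatch

contains

subroutine tic(self)
  ! Record the current clock count.
  class(stopwatch), intent(inout) :: self
  call system_clock(self%tic_count)
end subroutine tic

function toc(self) result(elapsed)
  ! Seconds elapsed since the last call to tic.
  class(stopwatch), intent(in) :: self
  real(dp) :: elapsed
  integer(int64) :: now, rate
  call system_clock(now, rate)
  elapsed = real(now - self%tic_count, dp) / real(rate, dp)
end function toc

end module stopwatch_mod

Usage would then be: declare type(stopwatch) :: sw, call sw%tic() before the kernel, and read sw%toc() after it; the fypp macros above could wrap the repetition loop around this.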

Ising model

I just found a nice example of idiomatic Fortran in the work

Reid, J. K. (1990). Fortran 8X features and the exploitation of parallelism. In Scientific Software Systems (pp. 102-111). Springer, Dordrecht. https://doi.org/10.1007/978-94-009-0841-3_7

It is based on an example by Alan Wilson showing a simple Ising model, a well-known Monte Carlo simulation, here in 3-dimensional space.


The paper with Wilson is the following:

Reid, J. K., & Wilson, A. (1985). The array features in FORTRAN 8x with examples of their use. Computer Physics Communications, 37(1-3), 125-132. https://doi.org/10.1016/0010-4655(85)90144-4
