llnl / amg

Algebraic multigrid benchmark

License: GNU Lesser General Public License v2.1

C 97.50% C++ 1.89% Makefile 0.61%

amg's Introduction

#BHEADER**********************************************************************
# Copyright (c) 2017,  Lawrence Livermore National Security, LLC.
# Produced at the Lawrence Livermore National Laboratory.
# Written by Ulrike Yang ([email protected]) et al. CODE-LLNL-738-322.
# This file is part of AMG.  See files COPYRIGHT and README for details.
#
# AMG is free software; you can redistribute it and/or modify it under the
# terms of the GNU Lesser General Public License (as published by the Free
# Software Foundation) version 2.1 dated February 1999.
#
# This program is distributed in the hope that it will be useful, but WITHOUT
# ANY WARRANTY; without even the IMPLIED WARRANTY OF MERCHANTABILITY or
# FITNESS FOR A PARTICULAR PURPOSE.  See the terms and conditions of the
# GNU General Public License for more details.
# 
# You should have received a copy of the GNU Lesser General Public License
# along with this program; if not, write to the Free Software Foundation,
# Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
#
#EHEADER**********************************************************************


General description:

AMG is a parallel algebraic multigrid solver for linear systems arising from
problems on unstructured grids.  The driver provided with AMG builds linear 
systems for various 3-dimensional problems.
AMG is written in ISO-C.  It is an SPMD code which uses MPI and OpenMP 
threading within MPI tasks. Parallelism is achieved by data decomposition. The 
driver provided with AMG achieves this decomposition by simply subdividing 
the grid into logical P x Q x R (in 3D) chunks of equal size.
For more information, see the amg.readme file in the docs directory of the
distribution.
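
As a rough illustration of the decomposition described above, the sketch below maps an MPI rank to its position in a logical P x Q x R process grid and computes the index range of one equal-size chunk along an axis. The names rank_to_coords, chunk_range, ProcCoords and Range are hypothetical, and the rank ordering (p varies fastest, then q, then r) is an assumption rather than something stated in this README.

```c
#include <assert.h>

/* Half-open index range [begin, end) owned by one chunk along one axis. */
typedef struct { int begin; int end; } Range;

/* Position of a rank in the logical P x Q x R process grid. */
typedef struct { int p; int q; int r; } ProcCoords;

/* Assumption: ranks are numbered with p varying fastest, then q, then r. */
ProcCoords rank_to_coords(int rank, int P, int Q, int R)
{
    ProcCoords c;
    assert(P > 0 && Q > 0 && R > 0);
    assert(rank >= 0 && rank < P * Q * R);
    c.p = rank % P;
    c.q = (rank / P) % Q;
    c.r = rank / (P * Q);
    return c;
}

/* Equal-size split of n grid points into `parts` chunks along one axis.
 * The sketch requires n to be divisible by parts. */
Range chunk_range(int n, int parts, int idx)
{
    Range rg;
    int size;
    assert(parts > 0 && n % parts == 0);
    assert(idx >= 0 && idx < parts);
    size = n / parts;
    rg.begin = idx * size;
    rg.end = (idx + 1) * size;
    return rg;
}
```

For example, with P = Q = R = 2, rank 5 sits at (p, q, r) = (1, 0, 1), and on a 40-point axis split in two it owns indices 20 to 39 along x.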
%==========================================================================
%==========================================================================

Building the Code

AMG uses a simple Makefile system for building the code.  All compiler and
link options are set by modifying the file 'AMG/Makefile.include'
appropriately.  

To build the code, first modify the 'Makefile.include' file appropriately
(using the option -DHYPRE_BIGINT is recommended),
then type (in the AMG directory)

  make

Other available targets are

  make clean        (deletes .o files)
  make veryclean    (deletes .o files, libraries, and executables)

To configure the code to run with:

1 - MPI only, add '-DTIMER_USE_MPI' to the 'INCLUDE_CFLAGS' line 
    in the 'Makefile.include' file and use a valid MPI.
2 - OpenMP with MPI, add the vendor-dependent compilation flag for OpenMP.
3 - To be able to solve problems that are larger than 2^31-1,
    add '-DHYPRE_BIGINT'.
4 - For additional optimizations in MPI, add '-DHYPRE_USING_PERSISTENT_COMM'.
5 - For additional optimizations in OpenMP, add '-DHYPRE_HOPSCOTCH'.
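
To make the 2^31-1 limit in item 3 concrete, the sketch below estimates the global number of unknowns and checks whether it still fits in a signed 32-bit integer. It assumes that "larger than 2^31-1" refers to the global unknown count and that every process owns an nx x ny x nz block on a P x Q x R process grid. The function names global_unknowns and needs_bigint are hypothetical and are not part of AMG.

```c
#include <assert.h>
#include <stdint.h>

/* Global unknown count, assuming each of the P*Q*R processes owns an
 * nx x ny x nz block. The product is formed in 64 bits so that it does
 * not wrap for the sizes considered here. */
int64_t global_unknowns(int nx, int ny, int nz, int P, int Q, int R)
{
    assert(nx > 0 && ny > 0 && nz > 0);
    assert(P > 0 && Q > 0 && R > 0);
    return (int64_t)nx * ny * nz * P * Q * R;
}

/* Returns 1 when the count exceeds 2^31-1, i.e. when a signed 32-bit
 * index type can no longer represent it. */
int needs_bigint(int64_t n)
{
    return n > INT32_MAX;
}
```

Under these assumptions, 256 x 256 x 256 points per process on a 4 x 4 x 4 process grid gives 2^30 unknowns, which still fits, while an 8 x 4 x 4 grid gives 2^31 and would call for '-DHYPRE_BIGINT'.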
%==========================================================================
%==========================================================================

Figure of Merit (FOM)

For problem 1, 
there are 2 FOMs printed out at the end of each run:
nnz_AP / setup_time
nnz_AP * #iterations / solve time
Both need to be considered.

For problem 2,
one FOM needs to be considered:
nnz_AP * (#iterations + time_steps) / time
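
The three formulas above can be written out as a small sketch. The function names are hypothetical; AMG itself prints the FOMs at the end of each run, so this only restates the arithmetic.

```c
#include <assert.h>

/* Problem 1, setup FOM: nnz_AP / setup_time */
double fom_setup(double nnz_AP, double setup_time)
{
    assert(setup_time > 0.0);
    return nnz_AP / setup_time;
}

/* Problem 1, solve FOM: nnz_AP * #iterations / solve time */
double fom_solve(double nnz_AP, int iterations, double solve_time)
{
    assert(solve_time > 0.0);
    return nnz_AP * iterations / solve_time;
}

/* Problem 2 FOM: nnz_AP * (#iterations + time_steps) / time */
double fom_problem2(double nnz_AP, int iterations, int time_steps, double time)
{
    assert(time > 0.0);
    return nnz_AP * (iterations + time_steps) / time;
}
```

With made-up numbers (nnz_AP = 1000, 10 iterations, 4 seconds of solve time), the solve FOM would be 1000 * 10 / 4 = 2500.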

amg's Issues

Potential data race detected by static tool

We are developing a static race detection tool and we found a few potential data races in this project. We were unable to determine if these races are possible under some input, so we thought it best to report just in case.

Potential Race on `temp`

We found a potential race on temp inside of the parallel loop at seq_mv/csr_matvec.c:747.

#ifdef HYPRE_USING_OPENMP
#pragma omp parallel for private(i,jj) HYPRE_SMP_SCHEDULE
#endif

   for (i = 0; i < num_rows; i++)
   {
      if (CF_marker_x[i] == fpt)
      {
         temp = y_data[i];
         for (jj = A_i[i]; jj < A_i[i+1]; jj++)
            if (CF_marker_y[A_j[jj]] == fpt) temp += A_data[jj] * x_data[A_j[jj]];
         y_data[i] = temp;
      }
   }

There are two writes to temp:

temp = y_data[i];
temp += A_data[jj] ...

We were unable to confirm if the branch conditions guarding these writes prevent multiple threads from executing these writes in parallel.

Was temp intended to be marked private?
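
If the answer is yes, the change would be to add temp to the private clause. The following is a minimal, self-contained sketch of that idea, not the actual AMG routine: the function name csr_matvec_ff is hypothetical, plain int and double stand in for the HYPRE types, and HYPRE_SMP_SCHEDULE is omitted.

```c
#include <assert.h>

/* For every row i with cf_x[i] == fpt, accumulate into y_data[i] the
 * products A(i,k) * x(k) over the columns k with cf_y[k] == fpt. */
void csr_matvec_ff(int num_rows,
                   const int *A_i, const int *A_j, const double *A_data,
                   const double *x_data, double *y_data,
                   const int *cf_x, const int *cf_y, int fpt)
{
    int i, jj;
    double temp;

    assert(num_rows >= 0);

    /* temp is listed as private, so each thread accumulates into its own
     * copy instead of a shared scratch variable. */
    #pragma omp parallel for private(i, jj, temp)
    for (i = 0; i < num_rows; i++)
    {
        if (cf_x[i] == fpt)
        {
            temp = y_data[i];
            for (jj = A_i[i]; jj < A_i[i + 1]; jj++)
                if (cf_y[A_j[jj]] == fpt) temp += A_data[jj] * x_data[A_j[jj]];
            y_data[i] = temp;
        }
    }
}
```

Each iteration assigns temp before reading it, so making it private does not change the sequential result. Compiled without OpenMP support, the pragma is ignored and the loop runs serially.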

I have pasted the full report from our tool below for reference.

==== Found a race between: 
line 754, column 10 in csr_matvec.c AND line 756, column 51 in csr_matvec.c
Shared variable: 
temp at line 675 of csr_matvec.c
 675|   HYPRE_Complex      temp;
Thread 1: 
 752|      if (CF_marker_x[i] == fpt)
 753|      {
>754|         temp = y_data[i];
 755|         for (jj = A_i[i]; jj < A_i[i+1]; jj++)
 756|            if (CF_marker_y[A_j[jj]] == fpt) temp += A_data[jj] * x_data[A_j[jj]];
>>>Stack Trace:
>>>.omp_outlined._debug__.36.414 [csr_matvec.c:750]
Thread 2: 
 754|         temp = y_data[i];
 755|         for (jj = A_i[i]; jj < A_i[i+1]; jj++)
>756|            if (CF_marker_y[A_j[jj]] == fpt) temp += A_data[jj] * x_data[A_j[jj]];
 757|         y_data[i] = temp;
 758|      }
>>>Stack Trace:
>>>.omp_outlined._debug__.36.414 [csr_matvec.c:750]
The OpenMP region this bug occurs:
/home/brad/tmp/AMG/seq_mv/csr_matvec.c
>747|#pragma omp parallel for private(i,jj) HYPRE_SMP_SCHEDULE
 748|#endif
 749|
 750|   for (i = 0; i < num_rows; i++)
 751|   {
 752|      if (CF_marker_x[i] == fpt)

Potential Race on `res0`

The second potential race we identified occurs on res0 inside of the parallel loop at parcsr_ls/par_relax.c:1444.
There are writes to res0 at lines 1470 and 1477:

res0 = 0.0;
res0 -= A_diag_data[jj] ...

#ifdef HYPRE_USING_OPENMP
#pragma omp parallel for private(i,ii,j,jj,ns,ne,res,rest,size) HYPRE_SMP_SCHEDULE
#endif
for (j = 0; j < num_threads; j++)
{
    ....
    for (i = ne-1; i > ns-1; i--)	/* interior points first */
    {

        /*-----------------------------------------------------------
        * If diagonal is nonzero, relax point i; otherwise, skip it.
        *-----------------------------------------------------------*/
        
        if ( A_diag_data[A_diag_i[i]] != zero)
        {
            res = f_data[i];
            res0 = 0.0; // <================================= Racing Write
            res2 = 0.0;
            for (jj = A_diag_i[i]+1; jj < A_diag_i[i+1]; jj++)
            {
                ii = A_diag_j[jj];
                if (ii >= ns && ii < ne)
                {
                    res0 -= A_diag_data[jj] * u_data[ii]; // <=== Racing Write

We were unable to determine if the branches guarding these writes would prevent multiple threads from accessing these lines.

If this is a real race, there is a nearly identical case in the parallel loop at parcsr_ls/ams.c:3783 on the variable res2.
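
If these turn out to be real races, one way to avoid this class of problem is to declare the scratch variables inside the parallel loop body, which makes them private to each thread without relying on the private clause. The sketch below is a simplified stand-in and not the relaxation routine from par_relax.c: the function name split_offdiag_sums, the output arrays in_sum and out_sum, and the equal-size chunking are all assumptions made for illustration.

```c
#include <assert.h>

/* For each row i, split the off-diagonal sum of -A(i,k) * u(k) into the
 * part from columns inside the owning chunk [ns, ne) and the part from
 * columns outside it. The diagonal entry is assumed to be stored first
 * in each row, as the jj = A_diag_i[i]+1 start in the snippet suggests. */
void split_offdiag_sums(int num_rows, int num_chunks,
                        const int *A_i, const int *A_j, const double *A_data,
                        const double *u, double *in_sum, double *out_sum)
{
    int j, size;

    assert(num_chunks > 0 && num_rows % num_chunks == 0);
    size = num_rows / num_chunks;

    #pragma omp parallel for
    for (j = 0; j < num_chunks; j++)
    {
        /* Everything declared inside the loop body is private to the
         * thread executing this iteration. */
        int ns = j * size;
        int ne = ns + size;
        int i, jj;

        for (i = ne - 1; i > ns - 1; i--)
        {
            double res0 = 0.0;   /* in-chunk contribution */
            double res2 = 0.0;   /* out-of-chunk contribution */

            for (jj = A_i[i] + 1; jj < A_i[i + 1]; jj++)
            {
                int ii = A_j[jj];
                if (ii >= ns && ii < ne)
                    res0 -= A_data[jj] * u[ii];
                else
                    res2 -= A_data[jj] * u[ii];
            }
            in_sum[i] = res0;
            out_sum[i] = res2;
        }
    }
}
```

Block-scoped declarations have the side benefit that a newly added scratch variable cannot be forgotten in the private list.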

Different numbers of iterations when running the code on different hardware platforms

I ran AMG on Xeon, KNL, and KNM platforms. On KNL and KNM, the number of iterations is much higher than on Xeon with the same configuration. Moreover, when I run with 1 OpenMP thread per process, the number of iterations becomes 0 on KNL and KNM. Is there a way (such as a flag) to control the number of iterations, or is this a problem in the code?

Regarding the issue of failure in running due to errors during the conversion of `amg.c` into intermediate code

Perhaps this is a strange question, and I am not sure if the project author is still actively involved in it, but I thought I would give it a try and inquire. Here's the situation: Firstly, I have successfully built the project using the make command. However, when I attempted to compile and run amg.c separately using different approaches (I am using the Clang compiler), I encountered the following issues:

amg.c can be successfully compiled into an executable file directly using Clang, and it can run smoothly. The command is as follows:
clang -fopenmp -o test4 amg.c -I.. -I../utilities -I../IJ_mv -I../seq_mv -I../parcsr_mv -I../parcsr_ls -I../krylov -DTIMER_USE_MPI -DHYPRE_USING_OPENMP -DHYPRE_HOPSCOTCH -DHYPRE_USING_PERSISTENT_COMM -DHYPRE_BIGINT -DHYPRE_TIMING -lmpi -L. -L../parcsr_ls -L../parcsr_mv -L../IJ_mv -L../seq_mv -L../krylov -L../utilities -lparcsr_ls -lparcsr_mv -lseq_mv -lIJ_mv -lkrylov -lHYPRE_utilities -lm
However, when I followed the steps below to convert amg.c into an executable file, I encountered different errors during the execution:
①
clang -S -emit-llvm -o amgtest.ll amgtest.c
clang amgtest.ll -fopenmp -DTIMER_USE_MPI -DHYPRE_USING_OPENMP -DHYPRE_HOPSCOTCH -DHYPRE_USING_PERSISTENT_COMM -DHYPRE_BIGINT -DHYPRE_TIMING -lmpi -L. -L../parcsr_ls -L../parcsr_mv -L../IJ_mv -L../seq_mv -L../krylov -L../utilities -lparcsr_ls -lparcsr_mv -lseq_mv -lIJ_mv -lkrylov -lHYPRE_utilities -lm -o test4 -g
mpirun -np 4 test4
Output:

②
clang -S -emit-llvm -o amgtest.ll amgtest.c
llvm-as amgtest.ll -o amgtest.bc
llc amgtest.bc -o amgtest.s
clang amgtest.s -no-pie -fopenmp -DTIMER_USE_MPI -DHYPRE_USING_OPENMP -DHYPRE_HOPSCOTCH -DHYPRE_USING_PERSISTENT_COMM -DHYPRE_BIGINT -DHYPRE_TIMING -lmpi -L. -L../parcsr_ls -L../parcsr_mv -L../IJ_mv -L../seq_mv -L../krylov -L../utilities -lparcsr_ls -lparcsr_mv -lseq_mv -lIJ_mv -lkrylov -lHYPRE_utilities -lm -o test4 -g
mpirun -np 4 test4
Output:

I identified the function where the error occurred through GDB debugging. After adding some debugging information, I found that the issue was related to the IJ_A structure.

I suspect that the error occurred during the conversion of the amg.c code into intermediate code due to the complexity of the IJ_A structure. This may have resulted in some unexpected errors, leading to the final error. I have ruled out the issue of compiler versions (as I have tried Clang 3.8, Clang 4.0, and Clang 3.0). Could it be a problem with the code itself? Have the authors encountered similar errors before? (Since my current research is in code optimization/profiling, I need to manipulate the intermediate code of amg.c). Do you have any suggestions? Thank you very much!

Thank you!

I just wanted to give huge props and say thank you for how easy this was to build, and find examples for running here! I literally typed make in an environment with the dependencies (all installed easily with apt) and it worked, and then the example problems did too.

Please close this after reading, just wanted to say thank you! 🙌
