The openmp_vv from openmp-validation-and-verification

Missing top-of-file comments for some tests

Describe the bug
I've noticed that a few older tests are missing header comments. This may not be high-priority but I wanted to document it here so we remember to fix it.

Test that it applies to
There may be more but I have found the following to be missing the top of file comment. I will add more when I get the chance.

test_target_teams_distribute_parallel_for_devices.c: wrong check, multi-device handling

As found out by a colleague, there are two issues:

If there is more than one device (e.g. host + offloading device), every element of "isHost" except the last one set is left uninitialized. The problem is that isHost[0] is set in the first loop iteration and copied back to the host with map(from:). Then isHost[1] is set, but because of map(from:) the device starts with a full but uninitialized array; hence, after this loop iteration, isHost[0] is undefined and isHost[1] is set, and so on. (It works by chance if the variable is zero-initialized, such that only the last iteration survives.) Solution: use 'tofrom:'.
Likewise, the error check at the end, OMPVV_TEST_AND_SET(errors, isHost[dev]), will fail for the host; it might work if (see above) isHost is zero-initialized on the device, as then all but the last device are ignored in this check.
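
A minimal sketch of the 'tofrom:' suggestion, assuming the test records omp_is_initial_device() once per device. The helper name record_is_host, the -1 sentinel and the serial stand-in are illustrative and not taken from the test itself.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-in so the sketch also builds without OpenMP. */
static int omp_is_initial_device(void) { return 1; }
#endif

/* Hypothetical helper: fills isHost[0..num_dev-1], one target region per device. */
void record_is_host(int *isHost, int num_dev) {
  for (int dev = 0; dev < num_dev; ++dev) {
    isHost[dev] = -1; /* sentinel meaning "not written yet" */
  }

  for (int dev = 0; dev < num_dev; ++dev) {
    /* tofrom copies the whole array in and out, so entries written for
       earlier devices survive; map(from:) would copy uninitialized
       device storage back over them. */
#pragma omp target device(dev) map(tofrom: isHost[0:num_dev])
    {
      isHost[dev] = omp_is_initial_device();
    }
  }
}
```

After the loop, every entry should hold 0 or 1 and no sentinel should remain, which is what a per-device check can rely on.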

test_target_data_use_device_ptr.c – uses undefined variable in clause since pull req. #28

test_target_data_use_device_ptr.c was largely cleaned up in pull request #28 by @spophale.

Unfortunately, while the declaration of isHost was removed, it is still used in an OpenMP clause. (See also the comment in #28.) Namely, commit ff84b5f has for isHost:

--- a/tests/4.5/target_data/test_target_data_use_device_ptr.c
+++ b/tests/4.5/target_data/test_target_data_use_device_ptr.c
…
-  int errors = 0, len = 10000, isHost = 0;
+  int errors = 0, map_dummy;
…
+#pragma omp target is_device_ptr(array_device) map(tofrom: array_host[0:N]) map(tofrom: isHost)

out of bounds access in tests/4.5/target_teams_distribute/test_target_teams_distribute_firstprivate.F90

at line 105:
d(x) = a(x) + b(x) + c(x) + privatized_array(MOD(x, 10)) + privatized

Since x can be 10, 20, ..., MOD(x, 10) can evaluate to zero, so we access privatized_array(0).

But privatized_array's bounds are 1:10, so 0 is an out-of-bounds index.

This reads an incorrect value from privatized_array and hence gives undefined results.
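
One way to keep the index inside 1:10 is to shift before and after taking the remainder; in Fortran that would presumably read MOD(x - 1, 10) + 1. The C sketch below only illustrates the arithmetic; the function name is made up and this is not necessarily the fix the test adopted.

```c
/* Maps any x >= 1 onto the 1-based range 1..n. A plain x % n would
   yield 0 whenever x is a multiple of n. */
int wrap_index_1based(int x, int n) {
  return (x - 1) % n + 1;
}
```

With n = 10, x = 10 and x = 20 both map to 10, and x = 11 wraps around to 1.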

segfault in test_error_verbose

When sollve_vv checks a result by invoking OMPVV_TEST_VERBOSE, it passes a string literal as the second argument to test_error_verbose. However, inside the implementation of test_error_verbose the string literal is modified. This causes a segmentation fault for seven tests (in my case).

A different approach would be to copy the conditionStr argument to clean_condition before the DO WHILE loop and modify only the copy in the loop. Doing this locally on my box fixes up those seven run-time failures.
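
The copy-then-modify idea can be sketched in C as follows. The actual cleaning done inside test_error_verbose is not shown in this report, so replacing tabs and newlines with spaces is only a stand-in, and the function name copy_and_clean is hypothetical.

```c
#include <stddef.h>

/* Copies conditionStr into clean_condition (capacity cap) and cleans the
   copy only. The caller's string, which may be a read-only literal, is
   never written to. The tab/newline replacement is a placeholder for
   whatever cleaning the real routine performs. */
void copy_and_clean(const char *conditionStr, char *clean_condition, size_t cap) {
  size_t i = 0;
  if (cap == 0) {
    return;
  }
  while (i + 1 < cap && conditionStr[i] != '\0') {
    char c = conditionStr[i];
    clean_condition[i] = (c == '\n' || c == '\t') ? ' ' : c;
    ++i;
  }
  clean_condition[i] = '\0';
}
```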

[EXTERNAL] OpenMP Target Array reductions

Dear OLCF consultants,

We are working on developing our post processing codes to make use of
GPUs using OpenMP 4.5. We have run into some difficulties when trying to
perform array reductions on the device.

We have tested two methods to perform array reductions :

A simple reduction where the argument list to the REDUCTION clause
contains an array, rrs of shape norder x nxh.
!$OMP TARGET TEAMS DISTRIBUTE PARALLEL DO REDUCTION(+:rrs)

This method results in the following run time error :

1587-175 The underlying GPU runtime reported the following error "an
illegal memory access was encountered".
1587-163 Error encountered while attempting to execute on the target
device 0. The program will stop.

Based on Oscar's suggestion, user-defined reductions were attempted
to implement the array reduction.

This method results in a failure at compile time :

"array_reduce_v2.F90", 1516-064 (S) Operands to the + operator must be
compatible with the intrinsic uses of the operator, or with a specific
interface within an accessible defined operator generic interface.
I remember that in the past array reductions were not supported, but I
thought they were supported in recent compiler versions.

We have consulted with Oscar from ORNL and Kevin Li from IBM regarding
this issue. Kevin suggested that 2 PMRs be opened, one for each test code.

I have generated reproducer codes with some notes in the README.txt file
on how the programs were compiled and run. These can be found in the
following tarball :

2019-08-11-OMP-TARGET-ARRAY-REDUCTIONS.tar.gz

errors variable is uninitialized in reduction_sub test

source file :
tests/4.5/target_teams_distribute/test_target_teams_distribute_reduction_sub.F90

At line no: 30
30 INTEGER:: x, y, errors, host_result, result
31
the variable "errors" is NOT initialized to 0

this would result in random pass/fail of the test
@line 50:
OMPVV_TEST_AND_SET_VERBOSE(errors, result .ne. host_result)
the above line sets errors as
errors = errors + ...

g++ compiler error when using `this` in update clause

Right here is where the error occurs. I thought this was a reasonable thing to do until one of the Mentor Graphics folks pointed out to me that you explicitly avoid using this-> in a different test.

We should probably use the same workaround here.

Bug in test_target_enter_data_allocate_array_to.F90

Describe the bug
Appears to contain just a simple copy/paste mistake in the error-checking logic. The code is comparing the wrong variable, and the following change fixes the issue:

diff --git a/tests/4.5/target_enter_data/test_target_enter_data_allocate_array_to.F90 b/tests/4.5/target_enter_data/test_target_enter_data_allocate_array_to.F90
index 88caf568..5ed6ba54 100644
--- a/tests/4.5/target_enter_data/test_target_enter_data_allocate_array_to.F90
+++ b/tests/4.5/target_enter_data/test_target_enter_data_allocate_array_to.F90
@@ -141,7 +141,7 @@
 
             ! Make sure it is not copied back
             IF (.NOT. isSharedEnv) THEN
-              OMPVV_TEST_AND_SET_VERBOSE(errors, ANY(my2DPtr /= 10))
+              OMPVV_TEST_AND_SET_VERBOSE(errors, ANY(my3DPtr /= 10))
             END IF
             OMPVV_TEST_AND_SET_VERBOSE(errors, SUM(my3DArr) /= (N**6+N**3)/2)

Compiler
Provided by Cray.

Accelerator hardware

Complex templated test in C++

Directive and clause it applies to
Class created inside target region which is constructed through complex templates. It should apply to the target directive.

Specification details
This test was taken from: https://bugs.llvm.org/show_bug.cgi?id=43771
This test is from Alpaka, and it could go inside the application tests folder

Pseudocode

#include <type_traits>

template<int Dim>
struct V {
  template<
    bool B = (Dim == 0),
    typename = typename std::enable_if<B>::type>
  V() {}

  template<
    typename TArg0,
    typename = typename std::enable_if<
        (std::is_same<unsigned long, typename std::decay<TArg0>::type>::value)
       >::type>
  V(TArg0 && arg0) {}
};

template<int Dim>
struct S {
  V<Dim> v;
};

int main(int argc, char *argv[]) {
  #pragma omp target
  {
    S<0> s;
  }

  return 0;
}

Use OMPVV_SYS env variable as default of SYSTEM

When using make SYSTEM=... we should also be able to use the `OMPVV_SYS` environment variable (as well as a set of defined environment variables for the OMPVV project).

Broken use of uninit memory: test_target_teams_distribute_reduction_bitor.F90

tests/4.5/target_teams_distribute/test_target_teams_distribute_reduction_bitand.F90 and
tests/4.5/target_teams_distribute/test_target_teams_distribute_reduction_bitor.F90 have:

    REAL(8),DIMENSION(N, 32):: randoms
    INTEGER:: result, host_result, x, y, z, errors, itr_count
...
             IF (randoms(x, y) .gt. true_margin) THEN

Here, y is an array index but uninitialized.

Likewise for OpenMP /5.0/

tests/4.5/target/test_target_map_classes_default.cpp uses a 5.0 feature

The current tests/4.5/target/test_target_map_classes_default.cpp has

  void modifyImplicit() {
#pragma omp target // implicit map(tofrom: this->h_array) map(firstprivate: this->size)
    {
      for (int i = 0; i < size; ++i)
          h_array[i] += 1;
    } // end target 
  }

OpenMP 5.0 has at “2.19.7 Data-Mapping Attribute Rules, Clauses, and Directives”

If the target construct is within a class non-static member function, and a variable is an accessible data member of the object for which the non-static data member function is invoked, the variable is treated as if the this[:1] expression had appeared in a map clause with a map-type of tofrom. Additionally, if the variable is of a type pointer or reference to pointer, it is also treated as if it has appeared in a map clause as a zero-length array section.

If the this keyword is referenced inside a target construct within a class non-static member function, it is treated as if the this[:1] expression had appeared in a map clause with a map-type of tofrom.

As especially the first bullet point is missing from OpenMP 4.5, I deduce that the modifyImplicit function implicitly relies on OpenMP 5.0 semantics.

Tests failures when running on the host

Hi,

This may not be an actual bug. I have noticed that several tests fail when executed sequentially, and that’s OK according to the spec. However, some of these tests fail when executed in parallel on the host; is that supposed to happen? In other words, were these tests written to pass both when the device is the CPU itself and when it is an accelerator device? Thanks!

List of tests that are failing on the host (from @tmh97)

NERSC Some requests from them

In terms of future extensions to the SOLLVE test suite, I think it would be useful to have a simple test to ensure that the compiler supports pointer attachment, e.g.

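#define SZ 10
struct Array1D {
  double *data;
  int len;
};

/* Hypothetical wrapper so the fragment compiles; arr.data is assumed to point to SZ doubles. */
void increment(struct Array1D arr) {
#pragma omp target teams distribute parallel for map(tofrom:arr, arr.data[0:SZ])
  for (int i=0; i<arr.len; ++i) arr.data[i] += 1.0;
}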

Neither Cray nor GNU support this OpenMP-5.0 feature and it is a showstopper for many applications.

Also, I think it would be useful to have a test which maps a dynamically allocated 2D array to the device. I have only been successful using Clang and IBM compiler to do this. This is a frequent request by application teams, especially those new to GPUs.
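
A possible starting point for such a test, assuming the 2D array is allocated as one contiguous block and indexed by hand, which sidesteps the pointer-to-pointer layout. The function name and the fill pattern are illustrative only.

```c
#include <stdlib.h>

/* Allocates rows*cols doubles as one contiguous block, fills it in a
   target region (or on the host when offloading is unavailable) and
   returns the sum of all elements, or -1.0 if allocation fails. */
double fill_and_sum_2d(int rows, int cols) {
  double *a = malloc((size_t)rows * cols * sizeof *a);
  double sum = 0.0;
  if (a == NULL) {
    return -1.0;
  }

  /* Every element is written on the device, so map(from:) is enough. */
#pragma omp target teams distribute parallel for collapse(2) map(from: a[0:rows*cols])
  for (int i = 0; i < rows; ++i)
    for (int j = 0; j < cols; ++j)
      a[i * cols + j] = (double)(i + j);

  for (int k = 0; k < rows * cols; ++k) {
    sum += a[k];
  }
  free(a);
  return sum;
}
```

A test for an array of row pointers (double **) would need per-row mapping or pointer attachment and is not covered by this sketch.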

data race in tests/4.5/target_teams_distribute_parallel_for/test_target_teams_distribute_parallel_for_devices.c

in line

43 isHost[dev] = omp_is_initial_device();// Checking if running on a device

isHost[dev] has a data race as well

This breaks clang

Directive and clause it applies to
reduction inside target

Specification details
Having the reduction on two different elements of the same array breaks clang. I am not 100% sure this is allowed by the spec, but if so it would make a nice test.

Pseudocode

static double temps[2];

int main () {
   #pragma omp target
   {
       #pragma omp parallel reduction(+:temps[0], temps[1])
       {
           temps[0] += 1;
           temps[1] += 1;
       }
   }
   return 0;
}

Any other comments
None

test_target_is_device_ptr.c fails on the host

Consider the test case test_target_is_device_ptr.c.

Is it supposed to execute correctly on the host?
I think it is not; let me explain why, to see what you guys think.

At line 30 we have:

array_device = (int *) omp_target_alloc(N*sizeof(int), omp_get_default_device());

omp_get_default_device() will return an available target device. According to the definition of a target device in the spec, it does not include the host device, so whatever value this function returns will not represent the host. Therefore, when this test runs on the host it will fail, because omp_target_alloc either allocates data on an available GPU (but the program is running on the host) or, if there are no available devices, omp_target_alloc will likely return NULL and the test fails as well.

What do you think?

Thanks!
Simone

synchronization issue in tests/4.5/target/test_target_private.F90

We feel there is a synchronization issue with the test_target_private.F90 test case that may lead to inconsistent results:

Below is a code snippet from the mentioned test case

!$omp parallel private(p_val, fp_val) shared(actualThreadCnt)
fp_val = omp_get_thread_num() + 2
p_val = omp_get_thread_num() + 1
actualThreadCnt = omp_get_num_threads()
!$omp target map(tofrom:compute_array) map(to:fp_val) private(p_val)
p_val = fp_val - 1
compute_array(p_val,:) = 100
p_val = p_val + 99
!$omp end target
IF (p_val == omp_get_thread_num() + 1) THEN
compute_array(p_val,:) = compute_array(p_val,:) + 1
END IF
!$omp end parallel

Here compute_array is mapped as tofrom for the target region. Hence every host thread created by the parallel region copies compute_array to and from the device. Since the same array is updated after the target region completes, this causes incorrect results, because the threads are not synchronized.

For example if parallel region has spawned two threads, t0 (threadId = 0) and t1 (threadId = 1) then

for t0 - in target region, array is assigned as -> compute_array(1, :) = 100

for t1 - in target region, array is assigned as -> compute_array(2, :) = 100

Since there is no guarantee on the order of kernel completion, let's consider the case where t1's kernel completes first.

for t1 - after target execution, array is updated as -> compute_array(2, :) = compute_array(2, :) + 1 , which is 101

If t0's kernel completes at this point, then because of the tofrom mapping, the copy of compute_array used by t0's kernel is transferred back to the host, overwriting the values computed/updated by t1. Hence the check on compute_array will fail.

OpenMP doesn't guarantee any implicit barrier at the end of target construct. So test case needs to be modified to add the required synchronization.
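
Besides adding synchronization, another way to avoid the overwrite is to have each thread map only the part of the array it owns. The C sketch below shows that alternative with a 1D array; it is not the Fortran test and not necessarily the fix the suite chose, and the function name is made up.

```c
/* Each iteration maps only the element it owns, so the copy-back of one
   target region cannot overwrite another thread's host-side update. */
void fill_rows(int *compute_array, int n) {
#pragma omp parallel for
  for (int t = 0; t < n; ++t) {
#pragma omp target map(tofrom: compute_array[t:1])
    {
      compute_array[t] = 100;
    }
    compute_array[t] += 1; /* host-side update after the target region */
  }
}
```

Because the mapped sections are disjoint, every element should end up as 101 regardless of the order in which the target regions finish.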

OMPVV_TEST_SHARED_ENVIRONMENT test is broken

Describe the bug
Based on the description by Tom Scogland in the PR #86, the test we do with this macro is broken

Test that it applies to
Any test that requires us to check for this condition

Expected behavior
A possible solution would be to use is_device_ptr with a pointer variable instead of a scalar. Using a scalar has the problem that scalars are mapped as firstprivate by default.

List of tests this applies to:

Stack overflow

The test_collapse2 function in the following test blows away the stack when compiled with GCC:

tests/4.5/target_teams_distribute/test_target_teams_distribute_collapse.c

The testcase allocates two large local arrays totalling more than 16MiB and exceeding the available stack space. It causes a segmentation fault (trying to initialise the other local variable) even when OpenMP is disabled. I’m compiling using GCC 9 on x86_64 using Ubuntu glibc. The offload features are not relevant to this problem.

Moving the arrays to file scope solves the problem.
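
A small sketch of the file-scope variant; the array names, the dimension and the fill pattern are illustrative and not copied from the test. Two int arrays of this size take a bit over 16 MiB in total, which would not fit a typical default stack but is fine in static storage.

```c
#define DIM 1500

/* File-scope arrays live in static storage, not on the stack. */
static int a[DIM][DIM];
static int b[DIM][DIM];

/* Fills both arrays and returns how many elements of b equal 2. */
long fill_and_count(void) {
  long count = 0;
  for (int i = 0; i < DIM; ++i) {
    for (int j = 0; j < DIM; ++j) {
      a[i][j] = 1;
      b[i][j] = a[i][j] + 1;
    }
  }
  for (int i = 0; i < DIM; ++i) {
    for (int j = 0; j < DIM; ++j) {
      if (b[i][j] == 2) {
        ++count;
      }
    }
  }
  return count;
}
```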

Wrong variable mapped in test_target_teams_distribute_thread_limit.c?

#define N 1024

int main() {
  OMPVV_TEST_OFFLOADING;
  int default_threads;
  int num_threads;
  int errors = 0;

#pragma omp target teams distribute map(from: default_threads)
  for (int x = 0; x < N; ++x) {
    if (omp_get_team_num() == 0) {
      default_threads = omp_get_thread_limit();
    }
  }

  OMPVV_WARNING_IF(default_threads == 1, "Test operated with one thread. Cannot test thread_limit clause.");
  OMPVV_TEST_AND_SET(errors, default_threads <= 0);

  if (default_threads > 0) {
#pragma omp target teams distribute thread_limit(default_threads / 2) map(from: default_threads)
    for (int x = 0; x < N; ++x) {
      if (omp_get_team_num() == 0) {
        num_threads = omp_get_thread_limit();
      }
    }

    OMPVV_TEST_AND_SET(errors, num_threads > default_threads / 2);
    OMPVV_WARNING_IF(num_threads < default_threads / 2, "Test was provided fewer threads than the thread_limit clause indicated. Still spec-conformant.");

  }

  OMPVV_REPORT_AND_RETURN(errors);
}

Describe the bug
In the second target pragma, I believe the from clause should map num_threads instead of default_threads, as default_threads is not used inside the target region. An added printf statement shows num_threads is not initialized.
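
A sketch of the second region with the mapping changed as proposed. The wrapper function, its limit parameter and the serial stand-ins are assumptions added so the fragment can be built on its own.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins so the sketch also builds without OpenMP. */
static int omp_get_team_num(void) { return 0; }
static int omp_get_thread_limit(void) { return 1; }
#endif

#define N 1024

/* Returns the thread limit observed inside a region launched with
   thread_limit(limit). num_threads is the variable written in the
   region, so it is the one that has to be mapped back. */
int observed_thread_limit(int limit) {
  int num_threads = 0;
#pragma omp target teams distribute thread_limit(limit) map(from: num_threads)
  for (int x = 0; x < N; ++x) {
    if (omp_get_team_num() == 0) {
      num_threads = omp_get_thread_limit();
    }
  }
  return num_threads;
}
```

The caller would pass default_threads / 2 as limit, after checking that this value is positive.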

Test that it applies to
test_target_teams_distribute_thread_limit.c

To Reproduce
Steps to reproduce the behavior:
Assumes AOMP 11.0-1 is installed at /usr/lib/aomp and gfx906

make VERBOSE=1 VERBOSE_TESTS=1 LOG=1 LOG_ALL=1 CC='/usr/lib/aomp/bin/clang -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906' all SOURCES=tests/4.5/target_teams_distribute/test_target_teams_distribute_thread_limit.c

Expected behavior
Expected num_threads to be less than or equal to thread_limit. Instead, the run shows 1 error and a failed test.

Compiler
AOMP 11.0-1

Accelerator hardware
gfx906 - Radeon 7

reduction tests in 4.5 use datasharing attributes with map

Describe the bug
In OpenMP 4.5, page 218 line 15 and 16:
"A list item cannot appear in both a map clause and a data-sharing attribute clause on the same construct."

The reduction clause is a data-sharing attribute clause. We did this as a workaround for the problem of reducing in combined constructs:

#pragma omp target teams distribute reduction(+:a) // this will not map a back

However, adding the map breaks the above restriction.

Test that it applies to
tests/4.5/application_kernels/gemv_target_reduction.cpp
tests/4.5/target_teams_distribute/test_target_teams_distribute_reduction_add.c
tests/4.5/target_teams_distribute/test_target_teams_distribute_reduction_and.c
tests/4.5/target_teams_distribute/test_target_teams_distribute_reduction_bitand.c
tests/4.5/target_teams_distribute/test_target_teams_distribute_reduction_bitor.c
tests/4.5/target_teams_distribute/test_target_teams_distribute_reduction_bitxor.c
tests/4.5/target_teams_distribute/test_target_teams_distribute_reduction_max.c
tests/4.5/target_teams_distribute/test_target_teams_distribute_reduction_min.c
tests/4.5/target_teams_distribute/test_target_teams_distribute_reduction_multiply.c
tests/4.5/target_teams_distribute/test_target_teams_distribute_reduction_or.c
tests/4.5/target_teams_distribute/test_target_teams_distribute_reduction_subtract.c

Current behavior in compilers
CORAL clang, xlc and gcc pass these tests. However, Clang 9.0.0 shows the restriction:
"error: reduction variable cannot be in a map clause in '#pragma omp target teams distribute'"

Possible illegal Fortran code

Hi,

line ompvv.F90:151 is illegal Fortran code.

The function test_error is called within test_and_set, which is called through the macro:

#define OMPVV_TEST_AND_SET(err, condition) err = err + test_and_set(condition, __FILENAME__, __LINE__)

So the fn argument always turns out to be a constant, which some compilers may allocate in read-only memory.

Furthermore, the variable ln should probably be declared with INTENT(IN), and in that case the compiler would issue an error.

Is there a plan to use that variable, maybe a debug message?
Any chance this can be fixed?

Thank you!

Here is a smaller reproducer:

subroutine test_error(fn)
   CHARACTER(len=*), INTENT(IN) :: fn
   ! Avoid unused variables warning
   fn = fn
end subroutine test_error

program hello
   call test_error(__FILE__)
end program hello

"omp master" does not protect against data race across "omp teams"

I noticed an issue with an “omp master” construct appearing in an “omp teams” construct, in these tests:

test_target_teams_distribute.c
test_target_teams_distribute_collapse.c
test_target_teams_distribute_default_none.c

For example:

#pragma omp target data map(tofrom: a[0:ARRAY_SIZE], num_teams) map(to: b[0:ARRAY_SIZE])
  {
#pragma omp target teams distribute map(alloc: a[0:ARRAY_SIZE], b[0:ARRAY_SIZE], num_teams)
    for (int x = 0; x < ARRAY_SIZE; ++x) {
#pragma omp master
      {
        num_teams = omp_get_num_teams();
      }
      a[x] += b[x];
    }
  }

The “omp master” does not bind to the “omp teams” construct, and therefore doesn’t protect against data races across the threads in different teams. To fix this code, I think you’d need a “lastprivate(num_teams)” clause, or replace “master” with “atomic write”, or just make the code conditional based on “omp_get_team_num() == 0” (or maybe even “x == 0”).
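
A sketch of the omp_get_team_num() == 0 variant. The wrapper function, its parameter names and the serial stand-ins are illustrative; the clauses follow the example above but are merged into a single construct.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins so the sketch also builds without OpenMP. */
static int omp_get_team_num(void) { return 0; }
static int omp_get_num_teams(void) { return 1; }
#endif

/* Adds b into a in a target region and returns the number of teams used.
   Only team 0 stores to num_teams, so no two teams write it. */
int add_and_count_teams(int *a, const int *b, int n) {
  int num_teams = 0;
#pragma omp target teams distribute map(tofrom: a[0:n], num_teams) map(to: b[0:n])
  for (int x = 0; x < n; ++x) {
    if (omp_get_team_num() == 0) {
      num_teams = omp_get_num_teams();
    }
    a[x] += b[x];
  }
  return num_teams;
}
```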

test_target_data_use_device_ptr not standard compliant?

The test_target_data_use_device_ptr test-case places "use_device_ptr" before "map". This ordering fails with GCC 9, because the implementation requires the "map" first.

According to OpenMP 4.5: "A list item in a use_device_ptr clause must have a corresponding list item in the device data environment." (Section 2.10.1)

This does not say explicitly that the ordering within a directive matters, but does imply that the "use_device_ptr" should not occur until the variable has been mapped. This was the interpretation taken by the GCC implementation.

OpenMP 5.0 appears to agree with this interpretation because new language has been added to cover the case: "If one or more of the use_device_ptr or use_device_addr clauses and one or more map clauses are present on the same construct, the address conversions of use_device_addr and use_device_ptr clauses will occur as if performed after all variables are mapped according to those map clauses."

GCC 10 accepts the test-case, as is, but that's because the implementation has been updated to OpenMP 5.0 (in this regard).

I believe this test-case needs to have those clauses reversed to be truly considered 4.5 compliant.
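
The proposed ordering, with map before use_device_ptr, could look like the sketch below. The function and variable names are illustrative and this is not the test's actual code.

```c
/* Increments every element of arr through its device address.
   map comes before use_device_ptr, the order this report says
   GCC 9 requires. */
void increment_on_device(int *arr, int n) {
#pragma omp target data map(tofrom: arr[0:n]) use_device_ptr(arr)
  {
#pragma omp target is_device_ptr(arr)
    for (int i = 0; i < n; ++i) {
      arr[i] += 1;
    }
  }
}
```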

OMPVVLIB in the C tests build step fails with GCC 9.2.0

Describe the bug
We are building and running the benchmark on Linux system with GCC 9.2.0 and V100. Compilation of most of the C tests is successful, however during runtime all of the C tests fail with the following error:
libgomp: Cannot map target functions or variables (expected X-1, have X)

Test that it applies to
All C tests.

To Reproduce
Clone the current repository, compile and run with GCC 9.2.0.

Expected behavior
Removing $OMPVVLIB from the build rules for C files in the Makefile seems to solve this issue:
- -$(QUIET)$(call loadModules,$(C_COMPILER_MODULE)) $(CCOMPILE) $(VERBOSE_MODE) $(DTHREADS) $(DTEAMS) $(HTHREADS) $< -o $(BINDIR)/$(notdir $@) $(OMPVVLIB) $(if $(LOG),$(RECORD)$(notdir $(@:.o=.log))\
+ -$(QUIET)$(call loadModules,$(C_COMPILER_MODULE)) $(CCOMPILE) $(VERBOSE_MODE) $(DTHREADS) $(DTEAMS) $(HTHREADS) $< -o $(BINDIR)/$(notdir $@) $(if $(LOG),$(RECORD)$(notdir $(@:.o=.log))\

Compiler
GCC 9.2.0

Accelerator hardware
V100

Changing behavior of OMPVV_REPORT and OMPVV_REPORT_AND_RETURN

Is your feature request related to a problem? Please describe.
Both check for OpenMP offloading support. Some tests don't do offloading and still get reported as executed on device.

Describe the solution you'd like
OMPVV_TEST_OFFLOADING should test offloading support and set global state. If a test does not offload then OMPVV_TEST_OFFLOADING should not be used.

Reporting macros should only look at the flag to decide where the test was executed.

CORAL-2 Test suite integration

Include these tests

Jose, Kyle, Sunita, and Oscar,

I recently read about work in the CORAL-2 procurement using the SOLLVE OpenMP test suite to demonstrate compiler functionality. This is clearly very valuable work and having a comprehensive test suite is very important to us.

In the course of preparing for Sierra at LLNL, Aaron Black identified a large number of bugs in XLF’s OpenMP offload support. We have released the collection of reproducers as open source. It can be found here: https://github.com/LLNL/FGPU/

Would you be interested in reviewing our collection to find examples that would be good additions to the SOLLVE suite?

Thanks,

Dave

Test suggestions to review from FGPU (tests that are out of scope are removed from the list):

Task example from Jakub

Directive and clause it applies to

Tasks

Specification details
From the implant email
The following is an example on Jakub’s blog using task reductions.

I don’t have the latest gcc handy, but Intel 19 does not accept this code, complaining that tasks in function bar don’t have an “associated task group.” Should this compile? The spec states that “For a given a list item, the in_reduction clause defines a task to be a participant in a task reduction that is defined by an enclosing region for a matching list item...”. I don’t see where we specify if it has to be lexically enclosed or if it can be dynamically enclosed, as the example implies.

(Blog article link:
https://developers.redhat.com/blog/2019/03/19/whats-new-in-openmp-5-0/
)

Stephen
int r;

void bar (int i) {
#pragma omp task in_reduction (+:r)
r += work (i, 0);
#pragma omp task in_reduction (+:r)
r += work (i, 1);
}

int foo () {
#pragma omp taskgroup task_reduction (+:r)
bar (0);
#pragma omp taskloop reduction (+:r)
for (int i = 1; i < 4; ++i)
{ bar (i); r += i; }
}

Copying C tests to its own version of C++

Is your feature request related to a problem? Please describe.
There are compilers that use different C and C++ passes. @mjklemm suggested we replicate all the C codes into C++ files (.cpp). I think this is a good idea, I want to discuss it with everyone else too.

Describe the solution you'd like
A solution we discussed was to make a hard copy of each of the tests into its own file, and possibly make specific C to C++ changes (e.g. malloc to new).

Iterators tests for 5.0

Fabian Mora suggested this test to us:

Directive and clause it applies to
This is for using iterators in OpenMP 5.0 with the depend clause.

Pseudocode

#include <iostream>
#include <omp.h>
#include <cstdlib>
#include <unistd.h>

using namespace std;

void print(int tid,int tsk,int t) {
    #pragma omp critical 
    {
	cerr<<"Launch node: "<<tsk<<" \tThread: "<<tid<<"\tSleep time: "<<t<<endl;
    }
    sleep(t);
#pragma omp critical 
    {
	cerr<<"Conclude node: "<<tsk<<" \tThread: "<<tid<<"\tSleep time: "<<t<<endl;
    }
}

void tg(int n,int *ptr,int *cols,int *time) {
#pragma omp parallel num_threads(4)
{
#pragma omp single
	{
		for(int i=0;i<n;++i) {
			int pos=ptr[i],size=ptr[i+1]-ptr[i];
#pragma omp task depend(iterator(it=0:size), in:ptr[cols[pos+it]]) depend(out:ptr[i])
			{
				print(omp_get_thread_num(),i,time[i]);
			}
		}
	}
}
}

int main(int argc, char **argv) {
	cerr << "************************************************************"<< endl;
	cerr << "*                  Execution began                         *"<< endl;
	cerr << "************************************************************"<< endl<< endl;
	int ptr[]={0, 4, 5, 6, 7, 8, 9, 10, 11},cols[]={1, 2, 3, 4, 5, 5, 6, 6, 7, 7,8},time[]={3,3,3,1,1,4,2,3};
	tg(8,ptr,cols,time);
	cerr<< endl << "************************************************************"<< endl;
	cerr << "*                  Execution ended                         *"<< endl;
	cerr << "************************************************************"<< endl;
	return 0;
}

Consider using cmake build.

Is your feature request related to a problem? Please describe.
Some bugs, like #34 and #5, can be easily resolved by using CMake.

Describe the solution you'd like
CMake has many built-in features that facilitate feature checking, cross-platform programming, finding packages, testing, packaging, etc.

improper use of FORTRAN KIND # for COMPLEX variable declaration

Describe the bug
One of the sollve_vv tests contains the following declaration:

   COMPLEX(kind = 16) :: scalar_double_complex

twice.
This declaration was intended to make "scalar_double_complex" a complex variable whose
real and imaginary parts are DOUBLE PRECISION (in fortran terms).
However, the KIND=n syntax introduced in Fortran 90 does NOT imitate the older non-standard
syntax every compiler supports, namely "COMPLEX*16 scalar_double_complex".
The non-standard syntax specifies a byte count for the entire variable. Hence,

 COMPLEX*16 z

declares "z" to be two consecutive 8-byte real values (the real and imaginary parts of "z"), while
the newer

 COMPLEX (KIND=16) z

makes both the real and imaginary parts of "z" a REAL*16 type, and the entire variable is 32 bytes long.

So, the declaration should be either

       COMPLEX(kind = 8) :: scalar_double_complex

or (less desirable)

       COMPLEX*16 scalar_double_complex

Test that it applies to

tests/4.5/target/test_target_defaultmap.F90

To Reproduce
n/a

Expected behavior
After the change, a compiler like flang, that does not support REAL*16 type, will be able to compile the code.

Compiler
Any.

Accelerator hardware
n/a

Nested 'Parallel Reduction' case

Directive and clause it applies to
Parallel within parallel and reduction

Specification details
From Mailing List:
Is it a known problem?

int main()
{
  int t=-1;

#pragma omp target teams map(t)
{
  #pragma omp parallel reduction(+: t)
  {
    #pragma omp parallel reduction(+: t)
    {
      t = 1;
    }
  }
}
}

$ clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda tmp.c
ptxas /tmp/thread_limit-984c26.s, line 1792; error   : Instruction 'vote' without '.sync' is not supported on .target sm_70 and higher from PTX ISA version 6.4
ptxas fatal   : Ptx assembly aborted due to errors
clang-11: error: ptxas command failed with exit code 255 (use -v to see invocation)

$ bin/clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda tmp.c -Xopenmp-target -march=sm_60
ptxas /tmp/thread_limit-5cc9b6.s, line 1792; warning : Instruction 'vote' without '.sync' is deprecated since PTX ISA version 6.0 and will be discontinued in a future PTX ISA version

Kelvin
Any other comments
Reported by Kelvin Li

Makefile does not support out-of-tree builds

From @Meinersbur

$ make -f ../sollve_vv-src/Makefile
../sollve_vv-src/Makefile:14: sys/make/make.def: No such file or directory
make: *** No rule to make target `sys/make/make.def'.  Stop.

Does `test_target_private` contain a race condition?

Here is a shortened version of the test case:

#include "ompvv.F90"

#define NUM_THREADS 10

      PROGRAM test_target_private
        USE iso_fortran_env
        USE ompvv_lib
        USE omp_lib
        implicit none
        INTEGER :: i, j
        INTEGER :: compute_array(NUM_THREADS)
        INTEGER :: actualThreadCnt = 0
        INTEGER :: p_val, fp_val

        compute_array(:) = 0

        CALL omp_set_num_threads(NUM_THREADS)

        !$omp parallel private(p_val, fp_val) shared(actualThreadCnt)
          fp_val = omp_get_thread_num() + 2
          p_val = omp_get_thread_num() + 1
          actualThreadCnt = omp_get_num_threads()
          !$omp target map(tofrom:compute_array) map(to:fp_val) private(p_val)
            p_val = fp_val - 1
            compute_array(p_val) = 100
            p_val = p_val + 99
          !$omp end target
          IF (p_val == omp_get_thread_num() + 1) THEN
            write (*,*) "thread ", p_val, " is doing the right thing. \
            array val is ", compute_array(p_val)
            compute_array(p_val) = compute_array(p_val) + 1
          END IF
        !$omp end parallel

        do i=1,NUM_THREADS
            write (*,*) i
            write (*,*) compute_array(i)
        enddo

        OMPVV_TEST_VERBOSE(ANY(compute_array(:) /= 101))

      END PROGRAM test_target_private

Describe the bug

I am concerned about this part of the test:

        !$omp parallel private(p_val, fp_val) shared(actualThreadCnt)
          fp_val = omp_get_thread_num() + 2
          p_val = omp_get_thread_num() + 1
          actualThreadCnt = omp_get_num_threads()
          !$omp target map(tofrom:compute_array) map(to:fp_val) private(p_val)

1. The parallel region defaults to the shared data-sharing attribute for the compute_array variable, so each thread sees the same compute_array data;
2. The target region then forces each thread to make a copy with map(tofrom:compute_array);
3. So as a result of 2., each launched kernel contains its own private compute_array data;
4. Each launched kernel modifies a piece of compute_array;
5. compute_array is copied back to the host, almost certainly overwriting data elements modified by kernels launched by other host threads.
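The sequence above can be modelled in plain C without any OpenMP at all. This is only a sketch of the scenario as described in the list: the function name, the two hand-written "threads" and the fixed interleaving (both copy-ins happen before either copy-out) are assumptions made for illustration. A real runtime's behaviour depends on timing and on how it tracks mapped data.

```c
#include <string.h>

#define NUM_THREADS 4

/* Plain-C model of two host threads that each execute a target region
 * with map(tofrom: compute_array). The interleaving is fixed by hand:
 * both copy-ins complete before either copy-out starts. */
void simulate_overlapping_tofrom(int host_array[NUM_THREADS])
{
    int device_copy_t0[NUM_THREADS];
    int device_copy_t1[NUM_THREADS];

    /* "to" half of tofrom: each thread gets its own device copy. */
    memcpy(device_copy_t0, host_array, sizeof device_copy_t0);
    memcpy(device_copy_t1, host_array, sizeof device_copy_t1);

    /* Each kernel writes only its own element. */
    device_copy_t0[0] = 100;
    device_copy_t1[1] = 100;

    /* "from" half: the whole array is copied back each time, so the
     * second copy-out overwrites element 0 with thread 1's old value. */
    memcpy(host_array, device_copy_t0, sizeof device_copy_t0);
    memcpy(host_array, device_copy_t1, sizeof device_copy_t1);
}
```

Starting from an all-zero array, this model leaves element 1 at 100 but element 0 back at 0, which is the lost update the list describes.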

Test that it applies to
test_target_private

To Reproduce

make VERBOSE=1 VERBOSE_TESTS=1 LOG=1 LOG_ALL=1 CC=gcc CXX=g++ FC=gfortran all SOURCES=test_target_private.F90 gives a test failure

Expected behavior

Well, I guess with a data race the expected behaviour is a test failure :) I'm arguing that the test has a bug.

Compiler
The CodeBench Sourcery Lite Edition compiler (a fork of GCC).

Accelerator hardware
AMD MI50 GPU

NERSC Virtual methods tests

Email from Rahulkumar Gayatri

Hi Jose and Chris,
Following our earlier conversation, here are the 2 test cases for virtual and derived class instances on the device using OpenMP offload.
test_virtual1.cxx - only derived class object accessed on the device (Written by Chris)
test_virtual1.cxx - both base and derived class objects are accessed on the device (slight modification to the first test case)

There is also a makefile that you can edit depending on the compiler.

Attached is the tar file for these tests.

Regards,
Rahul.

These were created as a proof of concept for Josh Meyers (copied) and the ImSim project. The eventual code will probably look very different.
Add to the test header: "Also Chris Daily and Rahulkumar Gayatri from NERSC for their suggestion"

Chris

OpenMPOffloadVirtualTest.tar.gz

UPDATED VERSION
create-object-on-device-2.zip

test_target_teams_distribute_collapse.F90 – wrong map flag

In test_collapse2 of test_target_teams_distribute_collapse.F90, the loop reads the initial (zero-initialized) value of the variable b, but b is only mapped with from: instead of tofrom:.
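A minimal C sketch of the same pattern, assuming a hypothetical helper accumulate_on_device that is not part of the test suite: because the loop reads the incoming values of b, b is mapped tofrom.

```c
/* Adds a[i] into b[i] inside a target region. The incoming values of b
 * are read on the device, so b is mapped tofrom; with map(from: ...)
 * the device copy of b would start out uninitialized. */
void accumulate_on_device(int *a, int *b, int n)
{
    #pragma omp target teams distribute map(to: a[0:n]) map(tofrom: b[0:n])
    for (int i = 0; i < n; i++) {
        b[i] = b[i] + a[i];
    }
}
```

When the code runs on the host, or is compiled without OpenMP, both map types behave the same, which may be why this kind of bug goes unnoticed until a real offload device is used.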

Error in testcase : tests/4.5/target_enter_data/test_target_enter_data_if.F90

There is a logical error in the following test case that results in undetermined runtime behavior:

tests/4.5/target_enter_data/test_target_enter_data_if.F90
At line 69, the enter data directive is guarded by if(s > THRESHOLD):

69 !$omp target enter data if(s > THRESHOLD) &
70 !$omp map(to: a(1:s), b(1:s))

At line 102, the exit data directive has no such guard:
102 !$omp target exit data map(delete: a(1:s), b(1:s))

If s <= THRESHOLD, the enter data is skipped but the exit data still runs, so the delete is applied to data that was never mapped.

FIX:

Line 102 needs to be changed as follows:
102 !$omp target exit data if(s> THRESHOLD) &
103 !$omp map(delete: a(1:s), b(1:s))
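For comparison, here is a C sketch of the same idea with matching guards on both directives. The function name, the THRESHOLD value and the extra target update are assumptions made for this illustration and are not taken from the test.

```c
#define THRESHOLD 4   /* hypothetical value, for illustration only */

/* Doubles every element of a[0:s] inside a target region. */
void double_on_device(int *a, int s)
{
    /* Pre-map the data only for "large" problems. */
    #pragma omp target enter data if(s > THRESHOLD) map(to: a[0:s])

    /* tofrom keeps the result correct when the enter data was skipped. */
    #pragma omp target map(tofrom: a[0:s])
    for (int i = 0; i < s; i++) {
        a[i] *= 2;
    }

    /* Same guard as on the enter data: copy back, then unmap. */
    #pragma omp target update if(s > THRESHOLD) from(a[0:s])
    #pragma omp target exit data if(s > THRESHOLD) map(delete: a[0:s])
}
```

Using the same condition on enter and exit means the delete only ever runs for data that the enter data directive actually mapped.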

test_target_map_struct_default.c fails if run with no device present

Verification issue in test_target_teams_distribute_parallel_for_firstprivate.c

The verification in target_teams_distribute_parallel_for_firstprivate.c assumes that the firstprivate variable 'privatized' is reset to the value 10 for each iteration. We think this assumption only holds when each thread executes exactly one iteration of the outer loop. This might be the case on a GPU. However, it means that the test fails on devices with fewer threads, so we think this might not be correct and portable OpenMP code. We believe that the verification only works if we reset the value of 'privatized' to 10 between the two loops, which is not the case if we (for instance) only have one team with 8 threads.

Do you agree, or did we make a wrong assumption?
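The effect can be reproduced on the host with a small C sketch. The function name is hypothetical, and num_threads(1) is used here only to mimic a device that has fewer threads than loop iterations. A firstprivate copy is initialized once per thread, not once per iteration, so later iterations on the same thread see the modified value.

```c
/* Counts the iterations that find 'privatized' different from its
 * initial value of 10. With a single thread, every iteration after
 * the first one sees the value left behind by the previous iteration. */
int stale_iterations_single_thread(int n)
{
    int privatized = 10;
    int stale = 0;

    #pragma omp parallel for firstprivate(privatized) reduction(+:stale) num_threads(1)
    for (int i = 0; i < n; i++) {
        if (privatized != 10) {
            stale++;              /* value carried over from an earlier iteration */
        }
        privatized += i + 1;      /* modify this thread's private copy */
    }
    return stale;
}
```

With n = 8 this returns 7: only the first iteration observes the value 10.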

Makefile does not work with absolute compiler path in CC, CXX, and FC variables

Describe the bug
If CC, CXX, or FC includes the absolute compiler path, the build system gets very confused. Here is the workaround @josemonsalve2 came up with for my Linux desktop. I have GCC-9 with the NVidia backend installed via Spack.

PATH=/scratch/naromero/opt/spack/linux-ubuntu18.04-westmere/gcc-8.3.0/gcc-9.2.0-oavx2juw3k6od5vagdvtshcwrgclcmqv/bin/:$PATH make CC=gcc CXX=g++ FC=gfortran SOURCES=offloading_success.c VERBOSE=1 VERBOSE_TESTS=1 LOG=1 LOG_ALL=1 all

But without the $PATH, GCC is not properly detected.

Test that it applies to
All tests

To Reproduce
See above line.

Expected behavior
Test suite compiles and runs.

Compiler
GCC 9 with NVidia backend compiled via Spack

Accelerator hardware
NVidia GPU card.

Allow the user to select NUM_THREADS and NUM_TEAMS for the tests that use this

The problem
Not all devices support a large number of threads. To avoid being GPU-specific, we need to be flexible about the requested number of threads. This is somewhat different from the OpenMP environment variables, as we don't want to depend on the implementation, but rather pass the information from the test suite to OpenMP.

List of tests that this applies to:

Describe the solution you'd like
Create a new set of options to be included in the Makefile/system File:

OMPVV_NUM_THREADS_DEVICE
OMPVV_NUM_TEAMS_DEVICE
OMPVV_NUM_THREADS_HOST

These variables are passed to the source code through the compiler's -D option and used everywhere we have num_teams and num_threads. Some cases will require special attention, especially those where we are testing the num_threads and num_teams clauses themselves.

We probably also want a set of default values. These values should live in the ompvv.h header file.
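One possible shape for this, sketched in C. The default value of 8 and the function count_iterations are placeholders invented for this illustration; the real defaults would be chosen by the maintainers.

```c
/* Hypothetical defaults; each one can be overridden on the compile
 * line, e.g. with -DOMPVV_NUM_THREADS_DEVICE=32. */
#ifndef OMPVV_NUM_THREADS_DEVICE
#define OMPVV_NUM_THREADS_DEVICE 8
#endif
#ifndef OMPVV_NUM_TEAMS_DEVICE
#define OMPVV_NUM_TEAMS_DEVICE 8
#endif
#ifndef OMPVV_NUM_THREADS_HOST
#define OMPVV_NUM_THREADS_HOST 8
#endif

/* Counts loop iterations on the device using the configurable limits. */
int count_iterations(int n)
{
    int count = 0;

    #pragma omp target teams distribute parallel for \
        num_teams(OMPVV_NUM_TEAMS_DEVICE) num_threads(OMPVV_NUM_THREADS_DEVICE) \
        map(tofrom: count) reduction(+:count)
    for (int i = 0; i < n; i++) {
        count += 1;
    }
    return count;
}
```

The #ifndef guards let the Makefile pass values with -D without triggering redefinition warnings, while tests still compile when no value is given.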

Additional context
This request came out of the discussion in Issue #15

test_target_teams_distribute_default_firstprivate.F90 fails when run

tests/4.5/target_teams_distribute/test_target_teams_distribute_default_firstprivate.F90 fails both tests when compiled with GFortran (current git trunk) and run with an AMD GCN accelerator target.

[OMPVV_INFOMSG test_target_teams_distribute_default_firstprivate.F90:21] Test is running on device
[OMPVV_ERROR test_target_teams_distribute_default_firstprivate.F90:24]  Condition test_firstprivate_private() .ne. 0 failed
[OMPVV_ERROR test_target_teams_distribute_default_firstprivate.F90:25]  Condition test_firstprivate_first() .ne. 0 failed
[OMPVV_INFOMSG ompvv.F90:244] The value of errors is 2
[OMPVV_RESULT test_target_teams_distribute_default_firstprivate.F90] Test failed on the device.

If a write statement is added to print the values of d(x) being compared, they are all zeroes.

The problem is that the 'default(firstprivate)' clause is being applied to all the variables (including d), so the result of the computation does not make it out of the enclosing construct.

From the OpenMP 4.5 specification:

2.15.1.1: Variables with implicitly determined data-sharing attributes are those that are referenced in a given construct, do not have predetermined data-sharing attributes, and are not listed in a data-sharing attribute clause on the construct.

2.15.3.1: The default clause explicitly determines the data-sharing attributes of variables that are referenced in a parallel, teams, or task generating construct and would otherwise be implicitly determined...

I believe this behaviour is correct according to the spec, as listing the variables in the 'map' clause does not count as being 'listed in a data-sharing attribute clause'.

Illegal use of "ANY" function in test_target_enter_data_allocate_array_to.F90

At lines 73, 108, and 144 the function ANY is used with an integer array while it expects a logical array.

More reduction tests

Separated directives reduction tests. Taken from:
ROCm/aomp#51

#include <iostream>

int main()
{
  int counts1 = 0;
  int counts2 = 0;
  #pragma omp target teams map(from:counts1)
  {
    int counts_team = 0;
    #pragma omp parallel
    {
      #pragma omp for
      for (int i=0; i<4; i++)
        #pragma omp atomic
        counts_team += 1;
    }
    counts1 = counts_team;
  }

  #pragma omp target teams map(from:counts2)
  {
    int counts_team = 0;
    #pragma omp parallel
    {
      #pragma omp for reduction(+:counts_team)
      for (int i=0; i<4; i++)
        counts_team += 1;
    }
    counts2 = counts_team;
  }

  if (counts1 != 4)
    std::cout << " wrong counts1 = " << counts1 << " should be 4!" << std::endl;
  if (counts2 != 4)
    std::cout << " wrong counts2 = " << counts2 << " should be 4!" << std::endl;
}