Coder Social home page Coder Social logo

openaccusergroup / openaccv-v Goto Github PK

View Code? Open in Web Editor NEW
23.0 23.0 16.0 2 MB

OpenACC Validation and Verification Testsuite repository

License: BSD 3-Clause "New" or "Revised" License

C 29.85% Fortran 33.05% C++ 32.32% Python 3.45% CSS 0.22% JavaScript 0.91% HTML 0.21%

openaccv-v's People

Contributors

ajarmusch avatar chrismun avatar clementval avatar imanhosseini avatar jdenny-ornl avatar sunitachandra avatar taxift avatar utimatu avatar wangxin0321 avatar wcgunter avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

openaccv-v's Issues

[Fortran] Wrong lower bound in section mapping in some tests

There is a couple of tests that use array section with an invalid lower bound of 0 for arrays declared with the default lower bound of 1.

in acc_map_data.F90

acc_map_data.F90:33:27: error: Subscript 0 is less than lower bound 1 for dimension 1 of array
          !$acc update host(c(0:LOOPCOUNT))

in acc_unmap_data.F90

acc_unmap_data.F90:23:27: error: Subscript 0 is less than lower bound 1 for dimension 1 of array
          !$acc data copyin(a(0:LOOPCOUNT), b(0:LOOPCOUNT)) present(c(0:LOOPCOUNT)) copyout(c(0:LOOPCOUNT))
                            ^^^^^^^^^^^^^^
acc_unmap_data.F90:7:41: Declaration of 'a'
          REAL(8),DIMENSION(LOOPCOUNT) :: a, b, c, d, e !Data
                                          ^
acc_unmap_data.F90:23:43: error: Subscript 0 is less than lower bound 1 for dimension 1 of array
          !$acc data copyin(a(0:LOOPCOUNT), b(0:LOOPCOUNT)) present(c(0:LOOPCOUNT)) copyout(c(0:LOOPCOUNT))
                                            ^^^^^^^^^^^^^^
acc_unmap_data.F90:7:44: Declaration of 'b'
          REAL(8),DIMENSION(LOOPCOUNT) :: a, b, c, d, e !Data
                                             ^
acc_unmap_data.F90:23:67: error: Subscript 0 is less than lower bound 1 for dimension 1 of array
          !$acc data copyin(a(0:LOOPCOUNT), b(0:LOOPCOUNT)) present(c(0:LOOPCOUNT)) copyout(c(0:LOOPCOUNT))
                                                                    ^^^^^^^^^^^^^^
acc_unmap_data.F90:7:47: Declaration of 'c'
          REAL(8),DIMENSION(LOOPCOUNT) :: a, b, c, d, e !Data
                                                ^
acc_unmap_data.F90:23:91: error: Subscript 0 is less than lower bound 1 for dimension 1 of array
          !$acc data copyin(a(0:LOOPCOUNT), b(0:LOOPCOUNT)) present(c(0:LOOPCOUNT)) copyout(c(0:LOOPCOUNT))

In acc_deviceptr.F90

acc_deviceptr.F90:26:33: error: Subscript 0 is less than lower bound 1 for dimension 1 of array
          !$acc enter data copyin(a(0:n), b(0:n)) create(c(0:n))
                                  ^^^^^^
acc_deviceptr.F90:7:42: Declaration of 'a'
          REAL(8), DIMENSION(LOOPCOUNT) :: a, b, c  !Data
                                           ^
acc_deviceptr.F90:26:41: error: Subscript 0 is less than lower bound 1 for dimension 1 of array
          !$acc enter data copyin(a(0:n), b(0:n)) create(c(0:n))
                                          ^^^^^^
acc_deviceptr.F90:7:45: Declaration of 'b'
          REAL(8), DIMENSION(LOOPCOUNT) :: a, b, c  !Data
                                              ^
acc_deviceptr.F90:26:56: error: Subscript 0 is less than lower bound 1 for dimension 1 of array
          !$acc enter data copyin(a(0:n), b(0:n)) create(c(0:n))
                                                         ^^^^^^
acc_deviceptr.F90:7:48: Declaration of 'c'
          REAL(8), DIMENSION(LOOPCOUNT) :: a, b, c  !Data

acc_deviceptr.F90:42:33: error: Subscript 0 is less than lower bound 1 for dimension 1 of array
          !$acc exit data copyout(c(0:n)) delete(a(0:n), b(0:n))
                                  ^^^^^^
acc_deviceptr.F90:7:48: Declaration of 'c'
          REAL(8), DIMENSION(LOOPCOUNT) :: a, b, c  !Data

So the 0 in all the sections above must be 1.

pgi/18.10 compilation failures

Hi,

Trying to compile the .c test files with pgi/18.10, i am getting a few failures.
I'd like to get your expert feedback about those failures, i.e can we explain those failure because pgi just does not support the directives used in the test code (then i'll be fine with it). If not, we may want to report to the compiler team or adapt the src code.

Internal compiler error

/apps/common/UES/pgi/18.10/linux86-64/18.10/bin/pgcc -acc -ta=tesla,cc60 atomic_capture_postincrement.c

PGC-F-0000-Internal compiler error. mr_preceeds: 
too many st/br      22  (atomic_capture_postincrement.c: 33)

PGC-S-0155

  • atomic_capture_bitand_equals
  • atomic_capture_bitor_equals
  • atomic_capture_bitxor_equals
  • atomic_capture_divided_equals
  • atomic_capture_expr_bitand_x
  • atomic_capture_expr_bitor_x
  • atomic_capture_expr_bitxor_x
  • atomic_capture_expr_divided_x
  • atomic_capture_expr_lshift_x
  • atomic_capture_expr_minus_x
  • atomic_capture_expr_multiply_x
  • atomic_capture_expr_plus_x
  • atomic_capture_expr_rshift_x
  • atomic_capture_lshift_equals
  • atomic_capture_minus_equals
  • atomic_capture_multiply_equals
  • atomic_capture_plus_equals
  • atomic_capture_postdecrement
  • atomic_capture_predecrement
  • atomic_capture_preincrement
  • atomic_capture_rshift_equals

/apps/common/UES/pgi/18.10/linux86-64/18.10/bin/pgcc -acc -ta=tesla,cc60 atomic_capture_bitxor_equals.c

PGC-S-0155-Invalid atomic capture block, multiple updates.  (atomic_capture_bitxor_equals.c: 57)
PGC-S-0155-Invalid atomic capture.  (atomic_capture_bitxor_equals.c: 58)

Feedback from NVHPC

Hi. Thank you for the OpenACC V&V suite. I'm working on the NVHPC compiler and checked all the failures of nvhpc 23.1 ( https://crpl.cis.udel.edu/oaccvv/results/ ). I'd like to make some comments on the implementation to improve the quality of the suite.


This line should be IF (abs(c(x) - (a(x) + b(x))) .gt. PRECISION) THEN.

https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/acc_copyin_with_len.F90#L299


DO x = 1, acc_get_num_devices(acc_get_device_type()) should be DO x = 0, acc_get_num_devices(acc_get_device_type())-1

https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/acc_get_device_num.F90


This test seems to be running on a single-GPU environment.

https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/acc_memcpy_d2d.c


This length is too big to complete the test.

https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/atomic_capture_assign_expr_divided_x.F90

atomic_capture_assign_expr_minus_x has the same issue. Also, the atomic operation is not necessary because the innermost loop is not parallelized automatically.

atomic_capture_assign_expr_plus_x is missing acc_testsuite.Fh, so PRECISION is undefined.

The runtime failures from atomic_capture_assign_expr_plus_x to atomic_capture_x_plus_expr_assign.F90 are the same.

atomic_update_expr_divided_x.F90 and atomic_update_expr_minus_x.F90 fail due to the same reason. They need PRECISION defined, and length=10 is too big. length=5 works.

https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/atomic_update_expr_divided_x.F90


#define PRECISION 1e-8 is too restrictive to check the results of real_t. if (fabs(b[x] - (a[x] * prev)) < PRECISION){ :

https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/atomic_structured_expr_multiply_x_assign.c

The tests from atomic_structured_expr_multiply_x_assign.c to atomic_structured_x_multiply_expr_assign.cpp fail due to the same reason.

https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/atomic_structured_x_multiply_expr_assign.cpp

kernels_if test3 is in the same situation.

https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/kernels_if.F90

parallel_loop_reduction_add_general_type_check_pt2 is also.

https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/parallel_loop_reduction_add_general_type_check_pt2.c

The same case:

https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/serial_firstprivate.F90


These tests should not depend on the order of the evaluation of the clauses. #pragma acc parallel loop copy(test[0:n]) copyout(test[0:n]).

629 • A program must not depend on the order of evaluation of the clauses, or on any side effects
630 of the evaluations.
( https://www.openacc.org/sites/default/files/inline-files/OpenACC.2.6.final.pdf )
1056 • A program must not depend on the order of evaluation of the clauses or on any side effects of
1057 the evaluations.
( https://www.openacc.org/sites/default/files/inline-images/Specification/OpenACC-3.2-final.pdf )

https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/copy_copyout.c

https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/copyin_copyout.c


parallel_create_zero has an incomplete line: #pragma acc data copyin(a[0:n]) copyout(b[0:n]. (Also, we do not support zero modifiers yet.)

https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/parallel_create_zero.c


The following tests are not complete.

This refers to undefined 'scalar'. A variable and types are redeclared.

https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/declare_copyin.c

The Fortran version has similar issues.

https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/declare_copyin.F90

https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/declare_create.F90

This one does not contain main, so should be excluded from the tests.

https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/declare_copyin_mod.F90

This one also misses 'scalar'.

https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/declare_device_resident.c

(In general, the C++ version should be symlinked to the C vertsion?)

https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/declare_device_resident.cpp

The tests from declare_function_scope_copy to declare_function_scope_present are missing the allocation of a, b, and c.

https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/declare_function_scope_copy.c


This loop should start from 0.

    for (int x = 1; x < n; ++x){
        host_b = host_b | a[x];
    }

https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/kernels_loop_reduction_bitor_general.cpp#L34


This test is requesting gang-level reduction instead of loop-level reduction. I think it is not valid according to the spec.

1478 The reduction clause specifies a reduction operator and one or more scalar variables. For each
1479 reduction variable, a private copy is created in the same manner as for a private clause on the
1480 loop construct, and initialized for that operator; see the table in Section 2.5.12 reduction clause. At
1481 the end of the loop, the values for each thread are combined using the specified reduction operator,
1482 and the result combined with the value of the original variable and stored in the original variable at
1483 the end of the parallel or kernels region if the loop has gang parallelism, and at the end of the loop
1484 otherwise.
( https://www.openacc.org/sites/default/files/inline-files/OpenACC.2.6.final.pdf )

https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/kernels_num_gangs.F90


There is an incompatibility between OpenACC 2.6 and 3.2 regarding null copyout. NVHPC is supporting 2.6. The following test is assuming 3.2.

1225 – If var is not present on the current device, a runtime error is issued.
( https://www.openacc.org/sites/default/files/inline-files/OpenACC.2.6.final.pdf )
1745 – If the appropriate reference counter for var is zero, no action is taken.
( https://www.openacc.org/sites/default/files/inline-images/Specification/OpenACC-3.2-final.pdf )

https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/reference_count_zero.c


In this test, a lot of threads illegally call a gang routine from the parallel construct without loop.

2876 gang clause
2877 The gang clause specifies that the procedure contains, may contain, or may call another procedure
2878 that contains a loop with a gang clause. A call to this procedure must appear in code that is
2879 executed in gang-redundant mode, and all gangs must execute the call. For instance, a procedure
2880 with a routine gang directive may not be called from within a loop that has a gang clause.
2881 Only one of the gang, worker, vector and seq clauses may appear for each device type.
( https://www.openacc.org/sites/default/files/inline-images/Specification/OpenACC-3.2-final.pdf )

https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/routine_gang.c


INTEGER:: multiplier should be INTEGER:: multiplier = 1

https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/serial_loop_gang_blocking.F90

https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/serial_loop_vector_blocking.F90

https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/serial_loop_worker_blocking.F90

b and host_b need initialization.

https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/serial_loop_reduction_bitor_general.F90

This needs REAL(8):: maxval = 0.0, host_max = 0.0.

https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/serial_loop_reduction_max_general.F90


IF (a(x, y) .eq. 1) THEN should be IF (a(x, y) .eq. .TRUE.) THEN.

https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/serial_loop_reduction_and_loop.F90


The kernel needs present(a, b, d).

https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/serial_private.c


IF (abs(d(x) - temp) .gt. (2 * PRECISION * LOOPCOUNT)) THEN should be IF (abs(d(y) - temp) .gt. (2 * PRECISION * LOOPCOUNT)) THEN.

https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/serial_private.F90


This test seems to be incorrect.

With NVHPC:

    int device_type = acc_get_device_type();   // 4

    #pragma acc set device_type(host)          // 2
    if (acc_get_device_type() != device_type){ // 2 != 4
        err += 1;
    }

#pragma acc set device_type(default) → nvidia → 4
#pragma acc set device_type(host) → 2
#pragma acc set device_type(multicore) → 2

https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/set_device_type.c

This one is missing INTEGER :: device_num.

https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/set_device_type_num.F90


wait_if is missing #ifndef T2. (Also, we do not support wait if yet.)

https://github.com/OpenACCUserGroup/OpenACCV-V/blob/master/Tests/wait_if.c


We will refine our compilers to fix the rest of the failures. Thanks a lot.

Several test cases where `async` is followed by synchronous execution without intervening `wait`

For example:

#pragma acc exit data delete(a[0:n], b[0:n]) copyout(c[0:n])

That exit data doesn't have an async clause, and neither a wait clause to wait for the preceding async(1) code, and also there is no intervening wait directive or acc_wait. Thus -- per my understanding -- that exit data may start executing while the the preceding async(1) code is still in progress, and thus may unmap data that's still in use.

As I've seen such a pattern not only here but also in several other test cases, I wonder if my understanding is flawed, or if all these should be fixed (in some way still to be determined)?

[Fortran] Non standard use of acc routine

In test declare_create.F90 line 5 (

!$acc routine vector
) there is an !$acc routine vector directive. This is a non standard usage and it has no specified semantic.

From the standard section 2.15.1:

In Fortran, the routine directive without a name may appear within the specification part of a
subroutine or function definition, or within an interface body for a subroutine or function in an
interface block, and applies to the containing subroutine or function.

In this case it is not in the specification part or a subroutine or function or within an interface body for a subroutine or function.

This directive should be removed to be standard compliant. There is the same directive in the function anyway.

Copyin clause with library call syntax in acc_delete_async.f90 and acc_delete_async_with_len.f90

The call here seems not correct according to the specification.

!$acc enter data copyin(c(1:LOOPCOUNT), 1) async(1)

The library has a acc_copyin(a, len) signature but the clause accept normally a var-list

Same in this test acc_delete_async_with_len.f90

!$acc enter data copyin(c(1:LOOPCOUNT), 1) async(1)

gcc/8.2.0 compilation failures

Hi,

Trying to compile the .c test files with gcc/8.2.0, i am getting a few failures.
I'd like to get your expert feedback about those failures, i.e can we explain those failure because gcc just does not support the directives used in the test code (then i'll be fine with it). If not, we may want to report to the compiler team or adapt the src code.

Undefined references:

  • acc_copyin_async

  • acc_copyout_async

  • acc_create_async

  • acc_delete_async

  • acc_get_default_async

  • acc_memcpy_from_device_async

  • acc_set_default_async

  • acc_update_device_async

  • acc_update_self_async

  • acc_free (undef acc_property_free_memory)

  • acc_get_property (undef acc_property_free_memory)

  • acc_malloc (undef acc_property_free_memory)

For instance:

cc -fopenacc -lm acc_copyin_async.c -o acc_copyin_async

acc_copyin_async.c: In function 'test':
acc_copyin_async.c:19:5: warning: implicit declaration of function 'acc_copyin_async'; 
did you mean 'acc_wait_async'? [-Wimplicit-function-declaration]
     acc_copyin_async(a, n * sizeof(real_t), 1);
     ^~~~~~~~~~~~~~~~
     acc_wait_async
/tmp/ccuvXq81.o:acc_copyin_async.c:function test:
 error: undefined reference to 'acc_copyin_async'

'c' appears more than once

cc -fopenacc -lm acc_wait.c -o acc_wait

acc_wait.c: In function 'test':
acc_wait.c:27:13: error: 'c' appears more than once in data clauses
     #pragma acc parallel present(c[0:n], d[0:n], c[0:n]) async(1) wait(2)
  • acc_async_test
  • acc_wait

expected ')'

cc -fopenacc -lm parallel_private.c -o parallel_private

parallel_private.c:21:49: error: expected ')' before '[' token
     #pragma acc parallel num_gangs(10) private(c[0:n])
  • parallel_firstprivate
  • parallel_private

INTERNAL-COMPILER-ERROR

cc -fopenacc -lm acc_memcpy_to_device_async.c -o acc_memcpy_to_device_async

acc_memcpy_to_device_async.c:36:5: warning: 
implicit declaration of function 'acc_memcpy_to_device_async'; 
did you mean 'acc_memcpy_to_device'? [-Wimplicit-function-declaration]
     acc_memcpy_to_device_async(devdata, a, n * sizeof(real_t), 1);
     ^~~~~~~~~~~~~~~~~~~~~~~~~~
     acc_memcpy_to_device
during GIMPLE pass: omplower
acc_memcpy_to_device_async.c:89:13: internal compiler error: Segmentation fault
     #pragma acc update host(hostdata[2*n:n]) async(3)

'tile' argument

cc -fopenacc -lm parallel_loop_tile.c -o parallel_loop_tile

parallel_loop_tile.c:45:40: error: 'tile' argument needs positive integral constant
         #pragma acc parallel loop tile(tile_arg/10, tile_arg, tile_arg*2)

expected clause before finalize

cc -fopenacc -lm exit_data_finalize.c -o exit_data_finalize

exit_data_finalize.c:63:66: error: expected '#pragma acc' clause before 'finalize'
     #pragma acc exit data delete(a[0:n], b[0:n]) copyout(c[0:n]) finalize

array section

cc -fopenacc -lm declare.c -o declare

declare.c:50:29: error: array section in '#pragma acc declare'
   #pragma acc declare copyin(b[0:n])

array section is not contiguous

cc -fopenacc -lm acc_async_test_all.c -o acc_async_test_all

acc_async_test_all.c:25:34: error: array section is not contiguous in 'map' clause
     #pragma acc enter data create(a[0:10][0:n], b[0:10][0:n], 
                                   c[0:10][0:n], d[0:10][0:n], e[0:10][0:n])

'c' undeclared

cc -fopenacc -lm atomic_capture_bitor_equals.c -o atomic_capture_bitor_equals

atomic_capture_bitor_equals.c:54:70: error: 'c' undeclared (first use in this function)
     #pragma acc data copyin(a[0:n]) copy(totals[0:n/10 + 1]) copyout(c[0:n])

cce/8.7.6 compilation failures

Hi,

Trying to compile the .c test files with cce/8.7.6, i am getting a few failures. It's probably fine because i expect cce to have only limited support for OpenACC>2.0 but i'd like to get your expert feedback about those failures, i.e can we explain those failure because cce just does not support the directives used in the test code (then i'll be fine with it). If not, we may want to report to the compiler team or adapt the src code.

CC-10321

For instance:

cc -hacc -hnoomp atomic_capture_expr_bitand_x.c -o atomic_capture_expr_bitand_x

CC-10321 craycc: ERROR File = atomic_capture_expr_bitand_x.c, Line = 60
  The specified statement does not have a valid form for an OpenACC atomic directive.
                   b[x] = totals[x%(n/10 + 1)] = a[x] & totals[x%(n/10 + 1)];
  • atomic_capture_bitand_equals
  • atomic_capture_bitor_equals
  • atomic_capture_bitxor_equals
  • atomic_capture_divided_equals
  • atomic_capture_expr_bitand_x
  • atomic_capture_expr_bitor_x
  • atomic_capture_expr_bitxor_x
  • atomic_capture_expr_divided_x
  • atomic_capture_expr_lshift_x
  • atomic_capture_expr_minus_x
  • atomic_capture_expr_multiply_x
  • atomic_capture_expr_plus_x
  • atomic_capture_expr_rshift_x
  • atomic_capture_lshift_equals
  • atomic_capture_minus_equals
  • atomic_capture_multiply_equals
  • atomic_capture_plus_equals
  • atomic_capture_postdecrement
  • atomic_capture_postincrement
  • atomic_capture_predecrement
  • atomic_capture_preincrement
  • atomic_capture_rshift_equals

CC-10299

cc -hacc -hnoomp kernels_default_present.c -o kernels_default_present

CC-10299 craycc: ERROR File = kernels_default_present.c, Line = 15
  The OpenACC default clause has a malformed argument.
      #pragma acc kernels default(present)
  • kernels_default_present
  • parallel_default_present

CC-10302

cc -hacc -hnoomp parallel_loop_auto.c -o parallel_loop_auto

CC-10302 craycc: ERROR File = parallel_loop_auto.c, Line = 18
  This OpenACC directive does not accept the "auto" clause.
        #pragma acc parallel loop auto

undefined ref

cc -hacc -hnoomp acc_copyin_async.c -o acc_copyin_async

CC-223 craycc: WARNING File = acc_copyin_async.c, Line = 19
  The function "acc_copyin_async" is declared implicitly.
      acc_copyin_async(a, n * sizeof(real_t), 1);
      ^


            for (int x = 0; x < n; ++x){
CC-7254 craycc: WARNING File = acc_copyin_async.c, Line = 26
   A loop which was marked for worksharing will not be workshared.

            for (int x = 0; x < n; ++x){
CC-7254 craycc: WARNING File = acc_copyin_async.c, Line = 33
   A loop which was marked for worksharing will not be workshared.

Cray C : Version 8.7.6 (20181015183930_cdba25e9872fe61c78223bf16dd108a77ffce687)
Total warnings detected in acc_copyin_async.c: 3
/tmp/pe_9764/acc_copyin_async_1.o: In function `test':
OpenACCV-V.git/Tests/acc_copyin_async.c:19: 
  undefined reference to `acc_copyin_async'
  • acc_copyin_async
  • acc_copyout_async
  • acc_create_async
  • acc_delete_async
  • acc_get_default_async
  • acc_memcpy_from_device_async
  • acc_memcpy_to_device_async
  • acc_copyin
  • acc_create
  • acc_malloc
  • acc_set_default_async
  • acc_update_device_async
  • acc_update_self_async

compiler runtime error

For the .c tests i could to compile, i got this amount of failures:

gnu/8.2.0 cce/8.7.6 pgi/18.10
4 23 7

All the logs: https://file.io/ghh2xA

gnu/8.2.0 cce/8.7.6 pgi/18.10
atomic_bitand_equals/o:Error: Test failed. atomic_bitand_equals/o:Error: Test failed. atomic_bitand_equals/o:Error: Test failed.
-- -- --
atomic_expr_bitor_x/o:Error: Test failed. atomic_bitor_equals/o:Error: Test failed. atomic_expr_lshift_x/o:Error: Test failed.
atomic_update_bitor_equals/o:Error: Test failed. atomic_expr_bitor_x/o:Error: Test failed. atomic_expr_plus_x/o:Error: Test failed.
atomic_x_bitor_expr/o:Error: Test failed. atomic_expr_plus_x/o:Error: Test failed. atomic_expr_rshift_x/o:Error: Test failed.
  atomic_postdecrement/o:Error: Test failed. atomic_update_expr_lshift_x/o:Error: Test failed.
  atomic_postincrement/o:Error: Test failed. atomic_update_expr_rshift_x/o:Error: Test failed.
  atomic_predecrement/o:Error: Test failed. parallel_loop_tile/o:Error: Test failed.
  atomic_preincrement/o:Error: Test failed.  
  atomic_update_bitor_equals/o:Error: Test failed.  
  atomic_update_expr_bitand_x/o:Error: Test failed.  
  atomic_update_expr_bitor_x/o:Error: Test failed.  
  atomic_update_postdecrement/o:Error: Test failed.  
  atomic_update_postincrement/o:Error: Test failed.  
  atomic_update_predecrement/o:Error: Test failed.  
  atomic_update_preincrement/o:Error: Test failed.  
  atomic_update_x_bitand_expr/o:Error: Test failed.  
  atomic_update_x_bitor_expr/o:Error: Test failed.  
  atomic_x_bitand_expr/o:Error: Test failed.  
  atomic_x_bitor_expr/o:Error: Test failed.  
  declare/o:Error: Test failed.  
  kernels_wait/o:Error: Test failed.  
  parallel_deviceptr/o:Error: Test failed.  
  parallel_switch/o:Error: Test failed.

Update for OpenACC 3.0?

Last update was done before the released of OpenACC 3.0. Is there an update coming to check newest implementation?

Total_host declaration

Unit Test Files:

  • atomic_minus_equals.c
  • atomic_structured_assign_x_minus_expr.c
  • atomic_update_minus_equals.c
  • atomic_update_x_minus_expr.c
  • atomic_x_minus_expr.c

Change:

  • totals_host needs to be initialized in the same manner as totals.

[Fortran] invalid call of function like subroutine

In declare_function_scope_copy.F90 there is a couple of places where a function (copyin_copyout_test) is called as a CALL statement. Some compiler will flag this as an error.

Some example:

CALL copyin_copyout_test(a(:,x), b(:,x), c(:,x), LOOPCOUNT)

CALL copyin_copyout_test(a(:,x), b(:,x), c(:,x), LOOPCOUNT)

CALL copyin_copyout_test(a(:,x), b(:,x), c(:,x), LOOPCOUNT)

CALL copyin_copyout_test(a(:,x), b(:,x), c(:,x), LOOPCOUNT)

There are similar issues in declare_function_scope_copyin.F90 and declare_function_scope_copyout.F90

Broken test `parallel_default_copy.c`?

Thank you for these tests. I am part of a group at Mentor Graphics working on improving OpenACC support in GCC, and we use these tests internally.

Today I noticed that parallel_default_copy.c was failing in our internal testing. Here is the relevant snippet:

    real_t * a = (real_t *)malloc(n * sizeof(real_t));
    real_t * b = (real_t *)malloc(n * sizeof(real_t));
    real_t * c = (real_t *)malloc(n * sizeof(real_t));

    // ... initialization code...

    #pragma acc data copyin(a[0:n], b[0:n])
    {
        #pragma acc parallel
        {
            #pragma acc loop
            for (int x = 0; x < n; ++x){
                c[x] += a[x] + b[x];
            }
        }
    }

When compiled with our system this fails at runtime with a memory error because the memory pointed to by c is not mapped to the accelerator device. We can fix this by adding a copy clause for c[0:n] on the data or the parallel region. Another fix is to turn c from a pointer into an array, as it is in the Fortran version.

What I'm trying to understand is what this test is supposed to be testing, and whether omission of a copy clause for c is deliberate or just an oversight. The OpenACC spec does not mandate copying a pointer's target by default (unlike the default copy for arrays!), and it would be impossible to implement in the general case. Was this test meant to test automatic copying of c[0:n] without an annotation? Have you run these tests on different OpenACC implementations and found at least some of them performing an automatic copy of c[0:n]?

(The same remarks apply to the almost identical test in kernels_default_copy.c.)

errors about OpenACC C 1.0

These are errors other than my pull reqest.

Language: C
Environmet: PGI 20.4 and NVHPC 21.2, NVIDIA V100

When testing OpenACC 1.0, the compilation T3 of kernels_copyin.c fail about no declare "devtest". Maybe T3 does not think about OpenACC 1.0-2.4 because "devtest" is declared T2 and T2 only supports v2.5-2.7.

parallel_loop_async.c has an error. This test supports v1.0-2.7. But "reduction(+:errors[x])" is not supported with v1.0-2.6.

Compilations of Following files fail about long double type.

  • parallel_loop_reduction_add_general_type_check_pt2.c
  • parallel_loop_reduction_add_general_type_check_pt3.c

parallel_loop_reduction_add_general_type_check_pt1.c can be compiled. But the execution of T1 returns an error.
But, when it was compiled by GCC, the execution succeeded. So an error of this file are probably caused PGI and NVHPC.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.