OPS is an API with associated libraries and preprocessors to generate parallel executables for applications on multi-block structured meshes.

Home Page: http://op-dsl.github.io

License: Other

Cuda 0.60% C++ 75.84% Makefile 0.76% C 7.30% Shell 2.91% Fortran 2.36% Python 6.39% TeX 0.27% Gnuplot 0.01% Perl 0.03% CMake 1.54% Jupyter Notebook 0.06% NASL 0.02% Jinja 1.92%

ops's Introduction

OPS

OPS (Oxford Parallel library for Structured mesh solvers) is a high-level embedded domain specific language for writing multi-block structured mesh algorithms, and the corresponding software library and code translation tools to enable automatic parallelisation on multi-core and many-core architectures. The OPS API is embedded in C/C++ and Fortran.

This repository contains the implementation of the back-end library and the code-generator, and is structured as follows:

  • ops: Implementation of the user and run-time OPS C/C++ APIs
  • apps: Application examples in C. These are examples of user application code and also include the target parallel code generated by the OPS code generator.
  • ops_translator: Python/Clang/Fparser/jinja2 based OPS code generator for C/C++/Fortran API (New code generation engine). Note: Please refer to the instructions provided on Read the Docs for details on the Python dependencies required to use this code generator.
  • ops_translator_legacy: Python OPS code generator for C/C++/Fortran API (Old code generation engine)
  • scripts: example scripts for setting environment variables and testing applications
  • cmake: cmake installation files
  • makefiles: makefile based installation files
  • doc: Documentation

Documentation

OPS documentation can be viewed on Read the Docs.

Citing

To cite OPS, please reference the following paper:

I. Z. Reguly, G. R. Mudalige and M. B. Giles, Loop Tiling in Large-Scale Stencil Codes at Run-Time with OPS, in IEEE Transactions on Parallel and Distributed Systems, vol. 29, no. 4, pp. 873-886, 1 April 2018, doi: 10.1109/TPDS.2017.2778161.

@ARTICLE{Reguly_et_al_2018,
  author={Reguly, István Z. and Mudalige, Gihan R. and Giles, Michael B.},
  journal={IEEE Transactions on Parallel and Distributed Systems}, 
  title={Loop Tiling in Large-Scale Stencil Codes at Run-Time with OPS}, 
  year={2018},
  volume={29},
  number={4},
  pages={873-886},
  doi={10.1109/TPDS.2017.2778161}}

Support and Contact

The preferred method of reporting bugs and issues with OPS is to submit an issue via the repository’s issue tracker. Users can also email the authors directly by contacting the OP-DSL team.

Contributing

To contribute to OPS, please use the following steps:

  1. Clone this repository (on your local system)
  2. Create a new branch in your cloned repository
  3. Make changes / contributions in your new branch
  4. Submit your changes by creating a Pull Request to the develop branch of the OPS repository

The contributions in the develop branch will be merged into the master branch as we create a new release.

License

OPS is released as an open-source project under the BSD 3-Clause License. See the file called LICENSE for more information.

ops's People

Contributors

addy419, andymallinson, ashutosh-londhe, benielt, bgd54, cavalcantelucas, csbrady-warwick, ctjacobs, dl11g11, gabrielsvc, gamdow, gihanmudalige, jpmeng, knotman90, maelso, reguly, spjammy, timsnag, tobyflynn, vtsinginos

ops's Issues

Parallel HDF5 output for multi-dim variables

It appears that the ops_fetch_block_hdf5_file call fails to correctly export a multi-dimensional ops_dat object when running in parallel. In addition, the call tends not to overwrite the data in an existing h5 file.

Unable to build examples

Dear OPS team,

I am looking at OPS to incorporate into my CFD code. I am unable to compile the examples. The following errors occur with gnu 7.5 (for every example).
Thank you for your help.

Kind regards,
Dr Abhishek

rm -f *.o *.mod
gfortran -ffree-form -ffree-line-length-none -J/home/abhi/OPS/fortran/mod/gnu -I/home/abhi/OPS/fortran/mod/gnu -L/home/abhi/OPS/fortran/lib -lstdc++ constants.F90 MPI/*_seq_kernel.F90 mblock_ops.F90 -o mblock_seq -lops_for_seq
/home/abhi/OPS/fortran/lib/libops_for_seq.a(ops_lazy.o): In function ops_construct_tile_plan()':
/usr/include/c++/7/ext/new_allocator.h:125: undefined reference to operator delete(void*)'
/usr/include/c++/7/ext/new_allocator.h:125: undefined reference to operator delete(void*)' /usr/include/c++/7/ext/new_allocator.h:125: undefined reference to operator delete(void*)'
/usr/include/c++/7/ext/new_allocator.h:125: undefined reference to operator delete(void*)' /usr/include/c++/7/ext/new_allocator.h:125: undefined reference to operator delete(void*)'
/home/abhi/OPS/fortran/lib/libops_for_seq.a(ops_lazy.o):/usr/include/c++/7/ext/new_allocator.h:125: more undefined references to operator delete(void*)' follow /home/abhi/OPS/fortran/lib/libops_for_seq.a(ops_lazy.o): In function ops_construct_tile_plan()':
/usr/include/c++/7/ext/new_allocator.h:111: undefined reference to operator new(unsigned long)' /usr/include/c++/7/ext/new_allocator.h:125: undefined reference to operator delete(void*)'
/usr/include/c++/7/ext/new_allocator.h:111: undefined reference to operator new(unsigned long)' /usr/include/c++/7/ext/new_allocator.h:125: undefined reference to operator delete(void*)'
/usr/include/c++/7/ext/new_allocator.h:102: undefined reference to std::__throw_bad_alloc()' /usr/include/c++/7/ext/new_allocator.h:125: undefined reference to operator delete(void*)'
/home/abhi/OPS/fortran/lib/libops_for_seq.a(ops_lazy.o): In function std::vector<ops_kernel_descriptor*, std::allocator<ops_kernel_descriptor*> >::~vector()': /usr/include/c++/7/ext/new_allocator.h:125: undefined reference to operator delete(void*)'
/home/abhi/OPS/fortran/lib/libops_for_seq.a(ops_lazy.o): In function std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > >::~vector()': /usr/include/c++/7/ext/new_allocator.h:125: undefined reference to operator delete(void*)'
/usr/include/c++/7/ext/new_allocator.h:125: undefined reference to operator delete(void*)' /usr/include/c++/7/ext/new_allocator.h:125: undefined reference to operator delete(void*)'
/home/abhi/OPS/fortran/lib/libops_for_seq.a(ops_lazy.o):/usr/include/c++/7/ext/new_allocator.h:125: more undefined references to operator delete(void*)' follow /home/abhi/OPS/fortran/lib/libops_for_seq.a(ops_lazy.o): In function void std::vector<ops_kernel_descriptor*, std::allocator<ops_kernel_descriptor*> >::_M_realloc_insert<ops_kernel_descriptor* const&>(__gnu_cxx::__normal_iterator<ops_kernel_descriptor**, std::vector<ops_kernel_descriptor*, std::allocator<ops_kernel_descriptor*> > >, ops_kernel_descriptor* const&)':
/usr/include/c++/7/ext/new_allocator.h:111: undefined reference to operator new(unsigned long)' /usr/include/c++/7/ext/new_allocator.h:125: undefined reference to operator delete(void*)'
/home/abhi/OPS/fortran/lib/libops_for_seq.a(ops_lazy.o): In function std::vector<tiling_plan, std::allocator<tiling_plan> >::_M_default_append(unsigned long)': /usr/include/c++/7/ext/new_allocator.h:111: undefined reference to operator new(unsigned long)'
/usr/include/c++/7/ext/new_allocator.h:125: undefined reference to operator delete(void*)' /usr/include/c++/7/ext/new_allocator.h:125: undefined reference to operator delete(void*)'
/usr/include/c++/7/ext/new_allocator.h:125: undefined reference to operator delete(void*)' /usr/include/c++/7/ext/new_allocator.h:125: undefined reference to operator delete(void*)'
/usr/include/c++/7/ext/new_allocator.h:125: undefined reference to operator delete(void*)' /home/abhi/OPS/fortran/lib/libops_for_seq.a(ops_lazy.o):/usr/include/c++/7/ext/new_allocator.h:125: more undefined references to operator delete(void*)' follow
/home/abhi/OPS/fortran/lib/libops_for_seq.a(ops_lazy.o): In function std::vector<tiling_plan, std::allocator<tiling_plan> >::_M_default_append(unsigned long)': /usr/include/c++/7/bits/stl_vector.h:1505: undefined reference to std::__throw_length_error(char const*)'
/home/abhi/OPS/fortran/lib/libops_for_seq.a(ops_lazy.o): In function std::vector<unsigned long, std::allocator<unsigned long> >::_M_default_append(unsigned long)': /usr/include/c++/7/ext/new_allocator.h:111: undefined reference to operator new(unsigned long)'
/usr/include/c++/7/ext/new_allocator.h:125: undefined reference to operator delete(void*)' /home/abhi/OPS/fortran/lib/libops_for_seq.a(ops_lazy.o): In function std::vector<unsigned long, std::allocator >::_M_default_append(unsigned long)':
/usr/include/c++/7/bits/stl_vector.h:1505: undefined reference to std::__throw_length_error(char const*)' /home/abhi/OPS/fortran/lib/libops_for_seq.a(ops_lazy.o): In function std::vector<std::vector<int, std::allocator >, std::allocator<std::vector<int, std::allocator > > >::_M_default_append(unsigned long)':
/usr/include/c++/7/ext/new_allocator.h:111: undefined reference to operator new(unsigned long)' /usr/include/c++/7/ext/new_allocator.h:125: undefined reference to operator delete(void*)'
/usr/include/c++/7/ext/new_allocator.h:125: undefined reference to operator delete(void*)' /home/abhi/OPS/fortran/lib/libops_for_seq.a(ops_lazy.o): In function std::vector<std::vector<int, std::allocator >, std::allocator<std::vector<int, std::allocator > > >::_M_default_append(unsigned long)':
/usr/include/c++/7/bits/stl_vector.h:1505: undefined reference to std::__throw_length_error(char const*)' /home/abhi/OPS/fortran/lib/libops_for_seq.a(ops_lazy.o): In function std::vector<int, std::allocator >::_M_default_append(unsigned long)':
/usr/include/c++/7/ext/new_allocator.h:111: undefined reference to operator new(unsigned long)' /usr/include/c++/7/ext/new_allocator.h:125: undefined reference to operator delete(void*)'
/home/abhi/OPS/fortran/lib/libops_for_seq.a(ops_lazy.o): In function std::vector<int, std::allocator<int> >::_M_default_append(unsigned long)': /usr/include/c++/7/bits/stl_vector.h:1505: undefined reference to std::__throw_length_error(char const*)'
/home/abhi/OPS/fortran/lib/libops_for_seq.a(ops_lazy.o): In function void std::vector<ops_dat_core*, std::allocator<ops_dat_core*> >::_M_realloc_insert<ops_dat_core* const&>(__gnu_cxx::__normal_iterator<ops_dat_core**, std::vector<ops_dat_core*, std::allocator<ops_dat_core*> > >, ops_dat_core* const&)': /usr/include/c++/7/ext/new_allocator.h:111: undefined reference to operator new(unsigned long)'
/usr/include/c++/7/ext/new_allocator.h:125: undefined reference to operator delete(void*)' /home/abhi/OPS/fortran/lib/libops_for_seq.a(ops_lazy.o):(.data.rel.local.DW.ref.__gxx_personality_v0[DW.ref.__gxx_personality_v0]+0x0): undefined reference to __gxx_personality_v0'
collect2: error: ld returned 1 exit status
/home/abhi/OPS/makefiles/Makefile.f_app:30: recipe for target 'mblock_seq' failed

build error

Hi, I am getting the following build errors. I have the following modules loaded:

  1) cmake/3.20.2    3) intel-compilers/2018.1.163   5) intel-mpi/2019.3.199
  2) python/2.7.14   4) hdf5/1.12.0                  6) cuda/10.2

I then used this cmake command:

$ cmake .. -DBUILD_OPS_APPS=ON -DOPS_TEST=ON -DAPP_INSTALL_DIR=$HOME/OPS-APP \ -DCMAKE_INSTALL_PREFIX=$HOME/OPS-INSTALL -DHDF5_ROOT=/home/admwm1/hdf5-1.12.0

I then typed make and get the following link error:

[ 71%] Linking CXX executable multidim_seq_dev
CMakeFiles/multidim_seq_dev.dir/tmp/multidim.cpp.o: In function `main':
multidim.cpp:(.text.startup+0x79a): undefined reference to `ops_halo_group_core::halo_transfer()'

syntax error on "ops_par_loop_impl" ,folder name: laplace2d_tutorial

Severity  Code  Description  Project  File  Line
Error C2065 'build_indices': undeclared identifier ops_test I:\OPS\ops-develop\OPS\c\include\ops_seq_v2.h 333
Error C2760 Syntax error: unexpected '{' appears here ops_test I:\OPS\ops-develop\OPS\c\include\ops_seq_v2.h 333
Error C3878 Syntax error: unexpected token '{' after 'expression' ops_test I:\OPS\ops-develop\OPS\c\include\ops_seq_v2.h 333
Error C2760 Syntax error: unexpected '}' appears here; expected ';' ops_test I:\Ops\OPS-develop\ops\c\include\ops_seq_v2.h 333

fortran+shsgc+unable to generate MPI version

When I run make in apps/fortran/shsgc, I get the following error: gfortran: error: unrecognized command line option ‘-module’. I use the Intel compiler, and I checked the makefile Makefile.intel, which sets FFLAGS = -module $(F_INC_MOD). Does anybody know how to fix this problem, and what the option "-module" means? By the way, my Intel compiler is not managed by environment modules. Thank you so much.

openacc and mpi+openacc applications

Some of the openacc and mpi+openacc parallelisations of the applications in the OPS repository either do not give the expected output or do not run (they seg-fault or produce errors). The following is a list of those that fail on Octon:

  1. CloverLeaf_3D and CloverLeaf_3D_HDF5 (C/C++ API) : openacc and mpi+openacc do not produce the correct final kinetic energy (i.e. validation tests fail)
  2. Poisson (C/C++ API) : mpi+openacc (when executed with more than 1 mpi proc) fails with the error:
    call to cuStreamSynchronize returned error 700: Illegal address during kernel execution
    call to cuMemFreeHost returned error 700: Illegal address during kernel execution
    (note that executing with 1 mpi proc, or executing plain openacc, does not give correct results either)
  3. Poisson (Fortran API) : mpi+openacc does not exist. openacc gives incorrect results
  4. shsgc (Fortran API) : mpi+openacc (when executed with more than 1 mpi proc) fails with NaNs as the final result (but plain openacc passes and mpi+openacc with 1 proc gives correct results)

Coding guidelines

I have copied over the coding guide used in the OP2 project to OPS.

https://github.com/gihanmudalige/OPS/wiki/Coding-Guidelines

This is just to lay out some basic coding rules for OPS. Please attempt to follow these when committing code to the OPS repository. If anyone has suggestions for changing these guidelines that would help all of us write better code, then please let the team know.

Discrepancy between cmake build and makefile build of OPS

I tried the cmake build as described in https://ops-dsl.readthedocs.io/en/latest/installation.html#build-library-and-example-applications-separately. The generated library is placed in $(OPS_INSTALL_PATH)/lib, the header files in $(OPS_INSTALL_PATH)/include and the binaries in $(OPS_INSTALL_PATH)/bin. On the other hand, building an app manually with the makefiles provided in OPS expects the libraries to be in $(OPS_INSTALL_PATH)/c/lib/$(OPS_COMPILER).

When I build using the Makefile inside OPS/ops/c/, the generated library is correctly placed in the expected path ($(OPS_INSTALL_PATH)/c/lib/$(OPS_COMPILER)) referenced in makefiles/Makefile.common, makefiles/Makefile.c_app etc.

Why is there such a discrepancy in CMake flow?

HDF5 File Write fails for mpi_tiling parallel versions

It appears that writing to HDF5 files from/after an mpi+tiling run fails. The current master branch's cloverleaf_mpi_tiling version under CloverLeaf_3D_HDF5 demonstrates the problem.
A fix is being looked into in the branch fix/HDF5_with_tiling.

ops_dat_get_local_npartitions is hardcoded in MPI to return 1

Actual behaviour
ops_dat_get_local_npartitions is hard coded to always return 1 in MPI code. This causes an unexpected result when applying it to an ops_dat object that is not decomposed onto a given rank.
Expected behaviour
ops_dat_get_local_npartitions should return 0 on ranks that do not have the specified ops_dat object decomposed onto them

Incorrect filename generated for OPS MPI implementation of ops_print_dat_to_txtfile

Observed Behaviour
If a filename of "out.txt" is specified in a call to ops_print_dat_to_txtfile and then OPS is used to generate MPI code then output files named "out.txt.n.{unprintable_character}" are generated where n is the MPI rank of the generating MPI process. The unprintable character has the byte value of the MPI rank of the generating MPI process. For more than 255 MPI processes duplicate filenames are generated.

Expected Behaviour
If a filename of "out.txt" is specified in a call to ops_print_dat_to_txtfile then output files named "out.txt.n" should be generated, where n is the MPI rank of the generating process.

Fix
Attached patch fixes error by replacing erroneous adding of integer to std::string with a call to std::to_string in ops/c/src/mpi/ops_mpi_core.cpp. Also removes duplicate appending of rank number in /ops/c/src/mpi/ops_mpi_decl.cpp. Equivalent code in /ops/c/src/mpi/ops_mpi_decl_cuda.cpp and /ops/c/src/mpi/ops_mpi_decl_opencl.cpp do not have this error but are still affected by the error in ops_mpi_core.cpp
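
The symptom described above can be reproduced outside OPS. The following is a minimal sketch, not the OPS source: it assumes the erroneous code appended the rank with `+=`, which selects `std::string::operator+=(char)` and narrows the integer to a single byte. The helper names `buggy_rank_suffix` and `fixed_rank_suffix` are hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper reproducing the symptom: `+=` with an int selects
// std::string::operator+=(char), so the rank is narrowed to one byte.
std::string buggy_rank_suffix(std::string name, int rank) {
  name += '.';
  name += rank;  // appends a single char with the byte value of the rank
  return name;
}

// Hypothetical helper applying the fix named in the issue:
// format the rank as decimal text with std::to_string.
std::string fixed_rank_suffix(std::string name, int rank) {
  assert(rank >= 0);  // MPI ranks are non-negative
  name += '.';
  name += std::to_string(rank);
  return name;
}
```

With these helpers, `fixed_rank_suffix("out.txt", 3)` gives "out.txt.3", while the buggy variant appends the byte 0x03. On common compilers the narrowing wraps at 256, so ranks 1 and 257 produce the same name, which matches the duplicate filenames reported for more than 255 processes.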

fix.txt

TeaLeaf MPI+OpenCL giving NaNs on CPU

Currently, MPI+OpenCL when run on the CPU with more than 1 processor produces NaNs and the test fails. This problem is not there in any other application nor in MPI+OpenCL on GPUs

OpenMP reductions

OpenMP is currently limited to 64 threads and a reduction size of 64.

Questions about multi-GPU

I have two questions with regard to multi-GPU support in OPS

  1. Using CUDA, how do I enable multi-GPU execution of the code? I see that there is a CMake variable GPU_NUMBER, but setting it to more than 1 does not affect the execution at all. For example, after setting GPU_NUMBER to 2 and compiling the Poisson app, I could see (from Nsight Systems) that the poisson_cuda executable only runs on a single GPU. Can you give specific instructions on how to run this app on multiple GPUs?

  2. Does OPS automatically optimize the code by overlapping computation and communication on multi-GPU? For example, with stencil operations, it is possible to overlap part of the computation with the memory transfer of the halo. Can OPS do this automatically, or is such optimization left for the user to implement?

Missing pragmas for extern variables on generated OpenACC implementation

Global variables need to be copied to the device, and OpenACC #pragmas are used to direct the compiler to allocate and update global variables on the device. Since the required #pragmas are not present in the generated OpenACC implementation, the compiler gives the following undefined reference error for apps/c/poisson:

nvlink error   : Undefined reference to 'dx' in 'OpenACC/poisson_kernels_c.o'
nvlink error   : Undefined reference to 'dy' in 'OpenACC/poisson_kernels_c.o'
pgacclnk: child process exit status 2: /opt/nvidia/hpc_sdk/Linux_x86_64/23.1/compilers/bin/tools/nvdd

The following pragmas should be applied in the host program poisson_ops.cpp

When declaring global variables, this will allocate memory on the device:

double dx,dy;
#pragma acc declare copyin(dx)
#pragma acc declare copyin(dy)

When updating the global variable values, this will update the values on the device:

  dx = 0.01;
  dy = 0.01;
  #pragma acc update device(dx)
  #pragma acc update device(dy)

Finally, global memory allocation should be indicated in the device program as well (poisson_common.h):

extern double dx;
extern double dy;
#pragma acc declare copyin(dx)
#pragma acc declare copyin(dy)

This fixes the compiler error and the test passes.

Silent failure of code generation with ops_decl_dat when using non literal names and types

Observed behaviour
ops_decl_const for C appears to expect the name and type parameters to be character literals because of how they are used during code generation. Using char* variables causes a silent failure during code generation, where the generated code uses the name of the variable rather than its content. This is inevitable because the code is generated before runtime, but the error is silent at code generation and produces unusual errors at compile time. The documentation does not make this clear.

Correct behaviour
Assuming that I am correct, and character literals are required and are intended to be required, then the OPS generator should fail with an error message when name and type aren't character literals. The documentation should mention the restriction. The attached patch fixes both. I request a review of the patch for suitability for inclusion.

Remaining question
OPS Fortran generator appears to do nothing with ops_decl_const so this error does not appear to apply there. Unclear if this is correct.

fix.txt

CUDA problem for big mesh size and multi-dim dat on a single GPU

It seems that the CUDA code raises an exception about an illegal memory access on a single GPU when the mesh size is beyond a certain value. When the size is about 33*33*33 (and seemingly up to 128*128*128) the code runs normally (as do the CPU codes), where the ops_dat holds 19 doubles per grid point. Beyond this, we get the following with 256*256*256.

KerCopyf_cuda_kernel.cu(169) : cutilSafeCall() Runtime API error : an illegal memory access was encountered.
terminate called after throwing an instance of 'OPSException'
what(): an illegal memory access was encountered
[1] 1732705 abort (core dumped) ./Cavity3DCuda

With 164*164*164 we get this

KerCutCellEQMDiffuseRefl3D_cuda_kernel.cu(298) : cutilSafeCall() Runtime API error : an illegal memory access was encountered.
terminate called after throwing an instance of 'OPSException'
what(): an illegal memory access was encountered

which is called later than the KerCopyf_cuda_kernel.cu.

I checked the memory allocation (about 7~8GB), which seems still below the capacity of the GPU (32GB) and CPU memory (96GB).

TAILQ does not like realloc

The realloc of OPS_block_list in ops_decl_block in ops_lib_core.c breaks the TAILQ structures inside. Can be detected when reallocating from non-0 size to bigger, and with valgrind - modify poisson.cpp to use more than 30 blocks.
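
To illustrate why moving a TAILQ head in memory is a problem, here is a minimal sketch using the `<sys/queue.h>` macros. The struct `block_entry` is hypothetical and only stands in for an array element that embeds a list head; `std::memcpy` stands in for a `realloc` that has to move the allocation. An empty TAILQ head stores a pointer to its own `tqh_first` field, so a bitwise move leaves that pointer referring to the old address.

```cpp
#include <sys/queue.h>

#include <cassert>
#include <cstring>
#include <memory>
#include <utility>

struct node {
  int id;
  TAILQ_ENTRY(node) link;
};
TAILQ_HEAD(node_list, node);

// Hypothetical array element that embeds a TAILQ head by value.
struct block_entry {
  int index;
  node_list datasets;
};

// Bitwise move, as realloc does when it relocates the buffer.
// Returns whether the moved head is still self-consistent.
bool head_survives_bitwise_move() {
  block_entry original;
  original.index = 0;
  TAILQ_INIT(&original.datasets);
  assert(original.datasets.tqh_last == &original.datasets.tqh_first);

  block_entry moved;
  std::memcpy(&moved, &original, sizeof(block_entry));

  // tqh_last still points into `original`, not into `moved`.
  return moved.datasets.tqh_last == &moved.datasets.tqh_first;
}

// One possible remedy (an assumption, not the OPS fix): keep each entry at a
// stable heap address and move only the pointer when the table grows.
bool head_survives_when_held_by_pointer() {
  auto owner = std::make_unique<block_entry>();
  owner->index = 0;
  TAILQ_INIT(&owner->datasets);

  std::unique_ptr<block_entry> new_owner = std::move(owner);
  return new_owner->datasets.tqh_last == &new_owner->datasets.tqh_first;
}
```

The first function returns false and the second returns true. The same stale-pointer effect applies to non-empty lists, where the first element's `tqe_prev` also points back into the head.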

OPS TODO Task List

Task List for OPS Release

  • OPS Move code generated APPS to OPS-APPS repo - automatically regenerate code for changes in OP2-Common repo
  • Add AMD HIP tests to CI test flow

Task list for next Release

  • OPS build with SPACK recipes / config
  • Upgrade/Update OPS code generator to use OP-CG codegen ideas - see https://github.com/jdjfisher/op-cg
  • Add SYCL Parallelization to OPS repo
  • OPS Fortran API develop - more applications, updated code generator (in line with C/C++ API/build)
  • Test / Add to CI tests the Tridsolver integration testing on non-intel CPUs
  • Add to readthedocs documentation - Dev-Docs and other missing sections

OpenCL compile-time dataset sizes

With OpenCL, we define xdim0 (etc) with preprocessor macros, instead of passing variables. This causes re-used kernels to break when run with differently sized datasets repeatedly. We need to pass these extents to the kernel
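
The effect can be sketched on the host in plain C++. This is an illustration only, not the generated OpenCL code: `XDIM0`, `make_dataset`, `at_macro` and `at_runtime` are hypothetical names, and the macro stands in for an extent fixed when the kernel is compiled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical extent fixed at compile time, standing in for the xdim0 macro.
#define XDIM0 4

// Builds a row-major xdim-by-ydim dataset where element (i, j) holds 10*j + i.
std::vector<double> make_dataset(std::size_t xdim, std::size_t ydim) {
  std::vector<double> data(xdim * ydim);
  for (std::size_t j = 0; j < ydim; ++j)
    for (std::size_t i = 0; i < xdim; ++i)
      data[i + j * xdim] = 10.0 * static_cast<double>(j) + static_cast<double>(i);
  return data;
}

// Indexing with the baked-in extent: only correct when the dataset is XDIM0 wide.
double at_macro(const std::vector<double>& data, std::size_t i, std::size_t j) {
  return data[i + j * XDIM0];
}

// Indexing with the extent passed as an argument: correct for any width.
double at_runtime(const std::vector<double>& data, std::size_t i, std::size_t j,
                  std::size_t xdim) {
  assert(i < xdim);
  return data[i + j * xdim];
}
```

For a dataset that is 3 wide, `at_runtime(data, 0, 1, 3)` reads element (0, 1), while `at_macro(data, 0, 1)` strides by 4 and reads element (1, 1) instead. Both agree only when the dataset width equals the compiled-in value.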

printf in a user kernel

printf() is currently not supported by all the devices (e.g. NVIDIA OpenCL does not support printf). Therefore it is recommended that printf() is only called on the host

OPS_ACCxx limited to 0 to 99. Need increased limits.

opensbliblock00_kernels.h:1299:18: error: ‘OPS_ACC108’ was not declared in this scope; did you mean ‘OPS_ACC10’?
1299 | Residual0_B0[OPS_ACC108(0,0,0)] = -rc6*(D11_B0[OPS_ACC23(0,0,0)]*wk66_B0[OPS_ACC89(0,0,0)] +
| ^~~~~~~~~~
| OPS_ACC10

opensbliblock00_kernels.h:1306:18: error: ‘OPS_ACC106’ was not declared in this scope; did you mean ‘OPS_ACC10’?
1306 | Residual1_B0[OPS_ACC106(0,0,0)] = -rc6*(D11_B0[OPS_ACC23(0,0,0)]*wk66_B0[OPS_ACC89(0,0,0)] +
| ^~~~~~~~~~
| OPS_ACC10

opensbliblock00_kernels.h:1313:18: error: ‘OPS_ACC105’ was not declared in this scope; did you mean ‘OPS_ACC10’?
1313 | Residual2_B0[OPS_ACC105(0,0,0)] = -rc6*(D11_B0[OPS_ACC23(0,0,0)]*wk66_B0[OPS_ACC89(0,0,0)] +

OPS_ACCxx are limited to xx from 0 to 99. Need to increase limit.

OpenCL compiler problem

Compiling and running the OpenCL version of Cloverleaf on NVIDIA GPUs gives the following error at runtime.

UNREACHABLE executed! Stack dump: 0. Running pass 'NVPTX Assembly Printer'

The same code executes correctly with the CPU as the device. We believe that this is due to the array of structs, states, used in generate_chunk_kernel.cl
