
op-dsl / ops


OPS is an API with associated libraries and preprocessors to generate parallel executables for applications on multi-block structured meshes.

Home Page: http://op-dsl.github.io

License: Other

Cuda 0.60% C++ 75.84% Makefile 0.76% C 7.30% Shell 2.91% Fortran 2.36% Python 6.39% TeX 0.27% Gnuplot 0.01% Perl 0.03% CMake 1.54% Jupyter Notebook 0.06% NASL 0.02% Jinja 1.92%

ops's Issues

HDF5 File Write fails for mpi_tiling parallel versions

It appears that writing to HDF5 files during or after an mpi+tiling run fails. The cloverleaf_mpi_tiling version under CloverLeaf_3D_HDF5 on the current master branch demonstrates the problem.
A fix is being looked into in the branch fix/HDF5_with_tiling.

TeaLeaf MPI+OpenCL giving NaNs on CPU

Currently, MPI+OpenCL produces NaNs when run on the CPU with more than 1 processor, and the test fails. The problem does not occur in any other application, nor with MPI+OpenCL on GPUs.

Discrepancy between cmake build and makefile build of OPS

I tried the CMake build described at https://ops-dsl.readthedocs.io/en/latest/installation.html#build-library-and-example-applications-separately. The generated library is placed in $(OPS_INSTALL_PATH)/lib, the header files in $(OPS_INSTALL_PATH)/include, and the binaries in $(OPS_INSTALL_PATH)/bin. However, building an app manually with the makefiles provided in OPS expects the libraries to be in $(OPS_INSTALL_PATH)/c/lib/$(OPS_COMPILER).

On the other hand, when I build using the Makefile inside OPS/ops/c/, the generated library is correctly placed in the expected path ($(OPS_INSTALL_PATH)/c/lib/$(OPS_COMPILER)) referenced in makefiles/Makefile.common, makefiles/Makefile.c_app, etc.

Why is there such a discrepancy in CMake flow?

OpenCL compiler problem

Compiling and running the OpenCL version of Cloverleaf on NVIDIA GPUs gives the following error at runtime.

UNREACHABLE executed! Stack dump: 0. Running pass 'NVPTX Assembly Printer'

The same code executes correctly with the CPU as the device. We believe this is due to the array of structs (states) used in generate_chunk_kernel.cl

build error

Hi, I am getting the following build errors. I have the following modules loaded:

  1) cmake/3.20.2    3) intel-compilers/2018.1.163   5) intel-mpi/2019.3.199
  2) python/2.7.14   4) hdf5/1.12.0                  6) cuda/10.2

I then used this cmake command:

$ cmake .. -DBUILD_OPS_APPS=ON -DOPS_TEST=ON -DAPP_INSTALL_DIR=$HOME/OPS-APP \
    -DCMAKE_INSTALL_PREFIX=$HOME/OPS-INSTALL -DHDF5_ROOT=/home/admwm1/hdf5-1.12.0

I then typed make and got the following link error:

[ 71%] Linking CXX executable multidim_seq_dev
CMakeFiles/multidim_seq_dev.dir/tmp/multidim.cpp.o: In function `main':
multidim.cpp:(.text.startup+0x79a): undefined reference to `ops_halo_group_core::halo_transfer()'

CUDA problem for big mesh size and multi-dim dat on a single GPU

It seems that the CUDA code raises an illegal memory access exception on a single GPU when the mesh size exceeds a certain value. With a mesh of about 33*33*33 (and seemingly up to 128*128*128) the code runs normally, as do the CPU versions; the ops_dat holds 19 double values per grid point. Beyond this, with 256*256*256, we get the following:

KerCopyf_cuda_kernel.cu(169) : cutilSafeCall() Runtime API error : an illegal memory access was encountered.
terminate called after throwing an instance of 'OPSException'
what(): an illegal memory access was encountered
[1] 1732705 abort (core dumped) ./Cavity3DCuda

With 164*164*164 we get this:

KerCutCellEQMDiffuseRefl3D_cuda_kernel.cu(298) : cutilSafeCall() Runtime API error : an illegal memory access was encountered.
terminate called after throwing an instance of 'OPSException'
what(): an illegal memory access was encountered

which is called later than the KerCopyf_cuda_kernel.cu.

I checked the memory allocation (about 7~8GB), which seems still below the capacity of the GPU (32GB) and CPU memory (96GB).

ops_dat_get_local_npartitions is hardcoded in MPI to return 1

Actual behaviour
ops_dat_get_local_npartitions is hard coded to always return 1 in MPI code. This causes an unexpected result when applying it to an ops_dat object that is not decomposed onto a given rank.
Expected behaviour
ops_dat_get_local_npartitions should return 0 on ranks that do not have the specified ops_dat object decomposed onto them

fortran+shsgc+unable to generate MPI version

When I run make in apps/fortran/shsgc, it fails with: gfortran: error: unrecognized command line option ‘-module’. I use the Intel compiler, and I checked the makefile Makefile.intel, which sets FFLAGS = -module $(F_INC_MOD). What can I do about this problem, and what does the option "-module" mean? By the way, my Intel compiler is not managed by environment modules. Thank you very much.

openacc and mpi+openacc applications

Some of the openacc and mpi_openacc parallelisations of the applications in the OPS repository either do not give the expected output or do not run (seg-fault or produce errors). The following is a list of those that fail on Octon:

  1. CloverLeaf_3D and CloverLeaf_3D_HDF5 (C/C++ API) : openacc and mpi+openacc do not produce the correct final kinetic energy (i.e. validation tests fail)
  2. Poisson (C/C++ API) : mpi+openacc (when executed with more than 1 mpi proc) fails with error -
    call to cuStreamSynchronize returned error 700: Illegal address during kernel execution
    call to cuMemFreeHost returned error 700: Illegal address during kernel execution
    (note that when executed with 1 mpi proc or executing the plain openacc will not give correct results)
  3. Poisson (Fortran API) : mpi+openacc does not exist. openacc gives incorrect results
  4. shsgc (Fortran API): mpi+openacc (when executed with more than 1 mpi proc) fails with NaNs as the final result (but plain openacc passes and mpi+openacc with 1 proc gives correct results)

Silent failure of code generation with ops_decl_dat when using non literal names and types

Observed behaviour
ops_decl_const for C appears to expect the name and type parameters to be string literals because of how they are used during code generation. Using char* variables causes a silent failure during code generation, where the generated code uses the name of the variable rather than its content. This is inevitable because the code is generated before runtime, but the error is silent at code generation and produces unusual errors when the generated code is compiled. The documentation does not make this clear.

Correct behaviour
Assuming that I am correct, and that string literals are required and are intended to be required, the OPS generator should fail with an error message when name and type aren't string literals. The documentation should mention the restriction. The attached patch fixes both. I request a review of the patch as suitable for inclusion.

Remaining question
OPS Fortran generator appears to do nothing with ops_decl_const so this error does not appear to apply there. Unclear if this is correct.

fix.txt

syntax error on "ops_par_loop_impl" ,folder name: laplace2d_tutorial

Error C2065: "build_indices": undeclared identifier (project ops_test, file I:\OPS\ops-develop\OPS\c\include\ops_seq_v2.h, line 333)
Error C2760: syntax error: unexpected '{' here (project ops_test, file I:\OPS\ops-develop\OPS\c\include\ops_seq_v2.h, line 333)
Error C3878: syntax error: unexpected token "{" after "expression" (project ops_test, file I:\OPS\ops-develop\OPS\c\include\ops_seq_v2.h, line 333)
Error C2760: syntax error: unexpected "}" here; expected ";" (project ops_test, file I:\OPS\ops-develop\OPS\c\include\ops_seq_v2.h, line 333)

Questions about multi-GPU

I have two questions with regard to multi-GPU support in OPS

  1. Using CUDA, how do I enable multi-GPU execution of the code? I see that there is a CMake variable GPU_NUMBER, but setting it to more than 1 does not affect the execution at all. For example, after setting GPU_NUMBER to 2 and compiling the Poisson app, I could see (from Nsight Systems) that the poisson_cuda executable only runs on a single GPU. Can you give specific instructions on how to run this app on multiple GPUs?

  2. Does OPS automatically optimize the code by overlapping computation and communication on multi-GPU? For example, with stencil operations, it is possible to overlap part of the computation with the memory transfer of the halo. Can OPS do this automatically, or is such optimization left for the user to implement?

OpenCL compile-time dataset sizes

With OpenCL, we define xdim0 (etc) with preprocessor macros, instead of passing variables. This causes re-used kernels to break when run with differently sized datasets repeatedly. We need to pass these extents to the kernel
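The failure mode can be illustrated without OpenCL at all. The following is a minimal sketch, assuming a 2D row-major index of the form x + xdim0*y; the names XDIM0_AT_BUILD, index_baked and index_runtime are hypothetical and are not part of OPS. It contrasts an index computed from an extent fixed at build time with one computed from an extent passed at call time.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a row length fixed when the kernel is built,
// e.g. through a preprocessor definition of xdim0.
constexpr int XDIM0_AT_BUILD = 4;

// Index of element (x, y) using the build-time row length.
// Only correct for datasets whose rows really are XDIM0_AT_BUILD long.
inline std::size_t index_baked(int x, int y) {
  return static_cast<std::size_t>(x) +
         static_cast<std::size_t>(XDIM0_AT_BUILD) * static_cast<std::size_t>(y);
}

// Same computation with the row length supplied by the caller,
// so one kernel can serve datasets of different sizes.
inline std::size_t index_runtime(int x, int y, int xdim0) {
  assert(xdim0 > 0 && x >= 0 && x < xdim0 && y >= 0);
  return static_cast<std::size_t>(x) +
         static_cast<std::size_t>(xdim0) * static_cast<std::size_t>(y);
}
```

For a dataset with rows of length 6, index_runtime(1, 1, 6) addresses element 7, while index_baked(1, 1) still addresses element 5, which belongs to the previous row. Passing the extent as a kernel argument avoids this mismatch when a kernel is reused.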

TAILQ does not like realloc

The realloc of OPS_block_list in ops_decl_block in ops_lib_core.c breaks the TAILQ structures inside. Can be detected when reallocating from non-0 size to bigger, and with valgrind - modify poisson.cpp to use more than 30 blocks.
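A minimal sketch of why this happens, assuming each array entry embeds a TAILQ head from <sys/queue.h>. The types item and block_entry are hypothetical stand-ins, not the actual OPS structures. A TAILQ head is referenced by address from the elements it links, so moving the head byte-wise, as realloc does when it cannot grow the allocation in place, leaves those references pointing at the old storage.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <sys/queue.h>

// Hypothetical element and container types, for illustration only.
struct item {
  int id;
  TAILQ_ENTRY(item) link;
};
TAILQ_HEAD(item_list, item);

struct block_entry {
  item_list items;  // TAILQ head embedded in the array entry
};

// Returns true if, after a byte-wise move of the entry, the first element's
// back-pointer still refers to the head at its old address.
bool tail_pointers_go_stale() {
  block_entry *old_entry =
      static_cast<block_entry *>(std::malloc(sizeof(block_entry)));
  block_entry *new_entry =
      static_cast<block_entry *>(std::malloc(sizeof(block_entry)));
  assert(old_entry != nullptr && new_entry != nullptr);

  item first;
  first.id = 0;
  TAILQ_INIT(&old_entry->items);
  TAILQ_INSERT_TAIL(&old_entry->items, &first, link);

  // Simulate what realloc does when it has to move the allocation.
  std::memcpy(new_entry, old_entry, sizeof(block_entry));

  // The element was not updated: its tqe_prev targets the old head,
  // so a later removal would write through a dangling pointer.
  bool stale = first.link.tqe_prev == &old_entry->items.tqh_first &&
               first.link.tqe_prev != &new_entry->items.tqh_first;

  std::free(new_entry);
  std::free(old_entry);
  return stale;
}
```

Possible ways around this, stated here only as options, are to store pointers to separately allocated entries in the growing array, or to rebuild the queue links after each move.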

Missing pragmas for extern variables on generated OpenACC implementation

Global variables need to be copied to the device, and OpenACC #pragmas are used to direct the compiler to allocate and update global variables on the device. Since the required #pragmas are not present in the generated OpenACC implementation, the compiler gives the following undefined reference errors for apps/c/poisson

nvlink error   : Undefined reference to 'dx' in 'OpenACC/poisson_kernels_c.o'
nvlink error   : Undefined reference to 'dy' in 'OpenACC/poisson_kernels_c.o'
pgacclnk: child process exit status 2: /opt/nvidia/hpc_sdk/Linux_x86_64/23.1/compilers/bin/tools/nvdd

The following pragmas should be applied in the host program poisson_ops.cpp

when declaring global variables, this will allocate memory in the device

double dx,dy;
#pragma acc declare copyin(dx)
#pragma acc declare copyin(dy)

when updating the global variable values, this will update the value in the device

  dx = 0.01;
  dy = 0.01;
  #pragma acc update device(dx)
  #pragma acc update device(dy)

finally, global memory allocation should be indicated in the device program as well (poisson_common.h)

extern double dx;
extern double dy;
#pragma acc declare copyin(dx)
#pragma acc declare copyin(dy)

This fixes the compiler error and the test passes.

printf in a user kernel

printf() is currently not supported by all the devices (e.g. NVIDIA OpenCL does not support printf). Therefore it is recommended that printf() is only called on the host

Unable to build examples

Dear OPS team,

I am looking at OPS to incorporate into my CFD code. I am unable to compile the examples. The following errors occur with gnu 7.5 (for every example).
Thank you for your help.

Kind regards,
Dr Abhishek

rm -f *.o *.mod
gfortran -ffree-form -ffree-line-length-none -J/home/abhi/OPS/fortran/mod/gnu -I/home/abhi/OPS/fortran/mod/gnu -L/home/abhi/OPS/fortran/lib -lstdc++ constants.F90 MPI/*_seq_kernel.F90 mblock_ops.F90 -o mblock_seq -lops_for_seq
/home/abhi/OPS/fortran/lib/libops_for_seq.a(ops_lazy.o): In function ops_construct_tile_plan()': /usr/include/c++/7/ext/new_allocator.h:125: undefined reference to operator delete(void*)'
/usr/include/c++/7/ext/new_allocator.h:125: undefined reference to operator delete(void*)' /usr/include/c++/7/ext/new_allocator.h:125: undefined reference to operator delete(void*)'
/usr/include/c++/7/ext/new_allocator.h:125: undefined reference to operator delete(void*)' /usr/include/c++/7/ext/new_allocator.h:125: undefined reference to operator delete(void*)'
/home/abhi/OPS/fortran/lib/libops_for_seq.a(ops_lazy.o):/usr/include/c++/7/ext/new_allocator.h:125: more undefined references to operator delete(void*)' follow /home/abhi/OPS/fortran/lib/libops_for_seq.a(ops_lazy.o): In function ops_construct_tile_plan()':
/usr/include/c++/7/ext/new_allocator.h:111: undefined reference to operator new(unsigned long)' /usr/include/c++/7/ext/new_allocator.h:125: undefined reference to operator delete(void*)'
/usr/include/c++/7/ext/new_allocator.h:111: undefined reference to operator new(unsigned long)' /usr/include/c++/7/ext/new_allocator.h:125: undefined reference to operator delete(void*)'
/usr/include/c++/7/ext/new_allocator.h:102: undefined reference to std::__throw_bad_alloc()' /usr/include/c++/7/ext/new_allocator.h:125: undefined reference to operator delete(void*)'
/home/abhi/OPS/fortran/lib/libops_for_seq.a(ops_lazy.o): In function std::vector<ops_kernel_descriptor*, std::allocator<ops_kernel_descriptor*> >::~vector()': /usr/include/c++/7/ext/new_allocator.h:125: undefined reference to operator delete(void*)'
/home/abhi/OPS/fortran/lib/libops_for_seq.a(ops_lazy.o): In function std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > >::~vector()': /usr/include/c++/7/ext/new_allocator.h:125: undefined reference to operator delete(void*)'
/usr/include/c++/7/ext/new_allocator.h:125: undefined reference to operator delete(void*)' /usr/include/c++/7/ext/new_allocator.h:125: undefined reference to operator delete(void*)'
/home/abhi/OPS/fortran/lib/libops_for_seq.a(ops_lazy.o):/usr/include/c++/7/ext/new_allocator.h:125: more undefined references to operator delete(void*)' follow /home/abhi/OPS/fortran/lib/libops_for_seq.a(ops_lazy.o): In function void std::vector<ops_kernel_descriptor*, std::allocator<ops_kernel_descriptor*> >::_M_realloc_insert<ops_kernel_descriptor* const&>(__gnu_cxx::__normal_iterator<ops_kernel_descriptor**, std::vector<ops_kernel_descriptor*, std::allocator<ops_kernel_descriptor*> > >, ops_kernel_descriptor* const&)':
/usr/include/c++/7/ext/new_allocator.h:111: undefined reference to operator new(unsigned long)' /usr/include/c++/7/ext/new_allocator.h:125: undefined reference to operator delete(void*)'
/home/abhi/OPS/fortran/lib/libops_for_seq.a(ops_lazy.o): In function std::vector<tiling_plan, std::allocator<tiling_plan> >::_M_default_append(unsigned long)': /usr/include/c++/7/ext/new_allocator.h:111: undefined reference to operator new(unsigned long)'
/usr/include/c++/7/ext/new_allocator.h:125: undefined reference to operator delete(void*)' /usr/include/c++/7/ext/new_allocator.h:125: undefined reference to operator delete(void*)'
/usr/include/c++/7/ext/new_allocator.h:125: undefined reference to operator delete(void*)' /usr/include/c++/7/ext/new_allocator.h:125: undefined reference to operator delete(void*)'
/usr/include/c++/7/ext/new_allocator.h:125: undefined reference to operator delete(void*)' /home/abhi/OPS/fortran/lib/libops_for_seq.a(ops_lazy.o):/usr/include/c++/7/ext/new_allocator.h:125: more undefined references to operator delete(void*)' follow
/home/abhi/OPS/fortran/lib/libops_for_seq.a(ops_lazy.o): In function std::vector<tiling_plan, std::allocator<tiling_plan> >::_M_default_append(unsigned long)': /usr/include/c++/7/bits/stl_vector.h:1505: undefined reference to std::__throw_length_error(char const*)'
/home/abhi/OPS/fortran/lib/libops_for_seq.a(ops_lazy.o): In function std::vector<unsigned long, std::allocator<unsigned long> >::_M_default_append(unsigned long)': /usr/include/c++/7/ext/new_allocator.h:111: undefined reference to operator new(unsigned long)'
/usr/include/c++/7/ext/new_allocator.h:125: undefined reference to operator delete(void*)' /home/abhi/OPS/fortran/lib/libops_for_seq.a(ops_lazy.o): In function std::vector<unsigned long, std::allocator >::_M_default_append(unsigned long)':
/usr/include/c++/7/bits/stl_vector.h:1505: undefined reference to std::__throw_length_error(char const*)' /home/abhi/OPS/fortran/lib/libops_for_seq.a(ops_lazy.o): In function std::vector<std::vector<int, std::allocator >, std::allocator<std::vector<int, std::allocator > > >::_M_default_append(unsigned long)':
/usr/include/c++/7/ext/new_allocator.h:111: undefined reference to operator new(unsigned long)' /usr/include/c++/7/ext/new_allocator.h:125: undefined reference to operator delete(void*)'
/usr/include/c++/7/ext/new_allocator.h:125: undefined reference to operator delete(void*)' /home/abhi/OPS/fortran/lib/libops_for_seq.a(ops_lazy.o): In function std::vector<std::vector<int, std::allocator >, std::allocator<std::vector<int, std::allocator > > >::_M_default_append(unsigned long)':
/usr/include/c++/7/bits/stl_vector.h:1505: undefined reference to std::__throw_length_error(char const*)' /home/abhi/OPS/fortran/lib/libops_for_seq.a(ops_lazy.o): In function std::vector<int, std::allocator >::_M_default_append(unsigned long)':
/usr/include/c++/7/ext/new_allocator.h:111: undefined reference to operator new(unsigned long)' /usr/include/c++/7/ext/new_allocator.h:125: undefined reference to operator delete(void*)'
/home/abhi/OPS/fortran/lib/libops_for_seq.a(ops_lazy.o): In function std::vector<int, std::allocator<int> >::_M_default_append(unsigned long)': /usr/include/c++/7/bits/stl_vector.h:1505: undefined reference to std::__throw_length_error(char const*)'
/home/abhi/OPS/fortran/lib/libops_for_seq.a(ops_lazy.o): In function void std::vector<ops_dat_core*, std::allocator<ops_dat_core*> >::_M_realloc_insert<ops_dat_core* const&>(__gnu_cxx::__normal_iterator<ops_dat_core**, std::vector<ops_dat_core*, std::allocator<ops_dat_core*> > >, ops_dat_core* const&)': /usr/include/c++/7/ext/new_allocator.h:111: undefined reference to operator new(unsigned long)'
/usr/include/c++/7/ext/new_allocator.h:125: undefined reference to operator delete(void*)' /home/abhi/OPS/fortran/lib/libops_for_seq.a(ops_lazy.o):(.data.rel.local.DW.ref.__gxx_personality_v0[DW.ref.__gxx_personality_v0]+0x0): undefined reference to __gxx_personality_v0'
collect2: error: ld returned 1 exit status
/home/abhi/OPS/makefiles/Makefile.f_app:30: recipe for target 'mblock_seq' failed

OPS TODO Task List

Task List for OPS Release

  • OPS Move code generated APPS to OPS-APPS repo - automatically regenerate code for changes in OP2-Common repo
  • Add AMD HIP tests to CI test flow

Task list for next Release

  • OPS build with SPACK recipes / config
  • Upgrade/Update OPS code generator to use OP-CG codegen ideas - see https://github.com/jdjfisher/op-cg
  • Add SYCL Parallelization to OPS repo
  • OPS Fortran API develop - more applications, updated code generator (in line with C/C++ API/build)
  • Test / Add to CI tests the Tridsolver integration testing on non-intel CPUs
  • Add to readthedocs documentation - Dev-Docs and other missing sections

Coding guidelines

I have copied over the coding guide used in the OP2 project to OPS.

https://github.com/gihanmudalige/OPS/wiki/Coding-Guidelines

This is just to lay out some basic coding rules for OPS. Please try to follow these when committing code to the OPS repository. If anyone has suggestions for changing these guidelines, suggestions that would help all of us write better code, then please let the team know.

OpenMP reductions

OpenMP is currently limited to 64 threads and a reduction size of 64.

Parallel HDF5 output for multi-dim variables

It appears that the ops_fetch_block_hdf5_file call fails to correctly export a multi-dimensional ops_dat object in parallel mode. In addition, this call tends not to overwrite the data in an existing h5 file.

OPS_ACCxx limited to 0 to 99. Need increased limits.

opensbliblock00_kernels.h:1299:18: error: ‘OPS_ACC108’ was not declared in this scope; did you mean ‘OPS_ACC10’?
1299 | Residual0_B0[OPS_ACC108(0,0,0)] = -rc6*(D11_B0[OPS_ACC23(0,0,0)]*wk66_B0[OPS_ACC89(0,0,0)] +
| ^~~~~~~~~~
| OPS_ACC10

opensbliblock00_kernels.h:1306:18: error: ‘OPS_ACC106’ was not declared in this scope; did you mean ‘OPS_ACC10’?
1306 | Residual1_B0[OPS_ACC106(0,0,0)] = -rc6*(D11_B0[OPS_ACC23(0,0,0)]*wk66_B0[OPS_ACC89(0,0,0)] +
| ^~~~~~~~~~
| OPS_ACC10

opensbliblock00_kernels.h:1313:18: error: ‘OPS_ACC105’ was not declared in this scope; did you mean ‘OPS_ACC10’?
1313 | Residual2_B0[OPS_ACC105(0,0,0)] = -rc6*(D11_B0[OPS_ACC23(0,0,0)]*wk66_B0[OPS_ACC89(0,0,0)] +

OPS_ACCxx are limited to xx from 0 to 99. Need to increase limit.

Incorrect filename generated for OPS MPI implementation of ops_print_dat_to_txtfile

Observed Behaviour
If a filename of "out.txt" is specified in a call to ops_print_dat_to_txtfile and OPS is then used to generate MPI code, output files named "out.txt.n.{unprintable_character}" are generated, where n is the MPI rank of the generating MPI process. The unprintable character has the byte value of the MPI rank of the generating MPI process. For more than 255 MPI processes, duplicate filenames are generated.

Expected Behaviour
If a filename of "out.txt" is specified in a call to ops_print_dat_to_txtfile, then output files named "out.txt.n" should be generated, where n is the MPI rank of the generating process.

Fix
The attached patch fixes the error by replacing the erroneous addition of an integer to a std::string with a call to std::to_string in ops/c/src/mpi/ops_mpi_core.cpp. It also removes the duplicate appending of the rank number in /ops/c/src/mpi/ops_mpi_decl.cpp. The equivalent code in /ops/c/src/mpi/ops_mpi_decl_cuda.cpp and /ops/c/src/mpi/ops_mpi_decl_opencl.cpp does not have this error but is still affected by the error in ops_mpi_core.cpp
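The pitfall described here can be reproduced in a few lines. This is a minimal sketch, not the OPS source: the function names rank_suffix_buggy and rank_suffix_fixed are hypothetical. Appending an int to a std::string with += selects the overload that takes a single char, so the rank is stored as one byte instead of as decimal digits.

```cpp
#include <cassert>
#include <string>

// Buggy pattern: "+=" with an int picks operator+=(char), so the rank is
// appended as a single byte with that value, not as text.
std::string rank_suffix_buggy(const std::string &base, int rank) {
  std::string name = base + ".";
  name += rank;  // int implicitly converted to char
  return name;
}

// Fixed pattern: convert the rank to its decimal representation first.
std::string rank_suffix_fixed(const std::string &base, int rank) {
  assert(rank >= 0);
  return base + "." + std::to_string(rank);
}
```

With rank 65 the buggy version yields "out.txt.A", because 65 is the character code of 'A'; small ranks map to control characters, which matches the unprintable suffix reported above. The fixed version yields "out.txt.65".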

fix.txt
