scorec / omega_h
This project forked from sandialabs/omega_h
Simplex mesh adaptivity for HPC
License: Other
Copying the same issue from XGCm here, where it is better located.
A mesh tag cannot have more than INT8_MAX (127) components. This is a problem, because exceeding that number could happen often.
Judging from the assert (OMEGA_H_CHECK) statement below, INT8_MAX appears to be an upper limit on the number of components that is currently set by hand. As I understand it, this may not be a simple manual limit to change; it could be tied to many other operations:
Lines 242 to 267 in b4a92ca
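Until the limit is lifted, a host application such as XGCm could validate the component count itself before creating a tag, so the failure names the offending tag instead of surfacing as an assert inside Omega_h. This is a minimal sketch that assumes only what is stated above (the limit is INT8_MAX); `ncomps_within_limit` and `require_ncomps_within_limit` are hypothetical helpers, not part of Omega_h.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical check mirroring the documented limit: a tag's component
// count must fit in a signed 8-bit integer (INT8_MAX == 127).
inline bool ncomps_within_limit(int ncomps) {
  return ncomps >= 0 && ncomps <= INT8_MAX;
}

// Throws a descriptive error instead of aborting, so the caller can report
// which tag exceeded the limit.
inline void require_ncomps_within_limit(const std::string& tag_name, int ncomps) {
  if (!ncomps_within_limit(ncomps)) {
    throw std::length_error("tag '" + tag_name + "' has " +
                            std::to_string(ncomps) +
                            " components; the limit is " +
                            std::to_string(INT8_MAX));
  }
}
```

A caller would invoke `require_ncomps_within_limit(name, ncomps)` just before adding the tag and either split the data across several tags or report the error.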
In the unit test for mixed meshes, the IDs of adjacent entities returned from the query APIs are checked against known values.
Lines 132 to 153 in 466e470
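When such a comparison fails, it helps to know where the returned IDs first diverge from the expected ones. A minimal sketch, assuming the adjacency IDs have already been copied to the host into a `std::vector<int>`; `first_mismatch` is a hypothetical helper, not an Omega_h function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns -1 when `got` matches `expected` exactly; otherwise the index of
// the first differing entry, or the shorter length if only the sizes differ.
inline long first_mismatch(const std::vector<int>& got,
                           const std::vector<int>& expected) {
  const std::size_t n = std::min(got.size(), expected.size());
  for (std::size_t i = 0; i < n; ++i) {
    if (got[i] != expected[i]) return static_cast<long>(i);
  }
  if (got.size() != expected.size()) return static_cast<long>(n);
  return -1;
}
```

Printing the returned index alongside both arrays in the test output shows directly which adjacent entity is wrong.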
Kokkos 4.2.0 (kokkos/kokkos#6197) adds 'sort: support custom comparator' (kokkos/kokkos#6253). We should try to replace the Intel DPL and HIP/CUDA Thrust sorting calls with it:
Lines 51 to 60 in 08e643b
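A custom comparator is an ordinary callable that returns true when its first argument should be ordered before its second (a strict weak ordering). The sketch below uses `std::sort` as a host-only stand-in for `Kokkos::sort` to show the shape of such a comparator for the "sort indices by fixed-width key tuples" pattern. `CompareKeyTuples` and `sort_permutation` are hypothetical names; a device version would additionally need Kokkos function annotations and Views instead of raw pointers and `std::vector`, so check the Kokkos 4.2 documentation for the exact overloads.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Orders indices by a key tuple of N ints, compared lexicographically.
// Ties are broken by index so the result is deterministic even though the
// underlying sort is not stable.
template <int N>
struct CompareKeyTuples {
  const int* keys;  // keys[i * N + j] is component j of key i
  bool operator()(int a, int b) const {
    for (int j = 0; j < N; ++j) {
      const int ka = keys[a * N + j];
      const int kb = keys[b * N + j];
      if (ka != kb) return ka < kb;
    }
    return a < b;
  }
};

// Returns the permutation that puts the key tuples in ascending order.
template <int N>
std::vector<int> sort_permutation(const std::vector<int>& keys) {
  std::vector<int> perm(keys.size() / N);
  std::iota(perm.begin(), perm.end(), 0);
  std::sort(perm.begin(), perm.end(), CompareKeyTuples<N>{keys.data()});
  return perm;
}
```

Sorting a permutation rather than the keys themselves keeps the comparator independent of the payload, which is the form most easily ported to a device sort.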
Build/test results of Omega_h (branch cws/cuda112, based on 10.8.1, which allows testing with CUDA 11.2) with Kokkos 3.7.#, 4.0.#, and 4.1.00 on an i5-13600KF with an NVIDIA GeForce RTX 3060, using the build scripts listed below:
gcc, cuda, result
10.4.0, 11.1.1, Kokkos 4.1.0 fails to compile; Kokkos 3.7.* and 4.0.* not tested
10.4.0, 11.2.1, Omega_h tests run_aniso_test and rc_field_test fail with CUDA asserts; Kokkos 3.7.* and 4.0.* not tested
10.4.0, 11.3.1, Omega_h tests run_aniso_test and rc_field_test fail with CUDA asserts; Kokkos 3.7.* and 4.0.* not tested
10.4.0, 11.4.4, all Omega_h tests pass
10.4.0, 11.5.2, all Omega_h tests pass
10.4.0, 11.6.2, all Omega_h tests pass
10.4.0, 11.7.1, all Omega_h tests pass
10.4.0, 11.8.0, all Omega_h tests pass
12.3.0, 12.1.1, all Omega_h tests pass
This page lists the compatible GCC/CUDA combinations: https://gist.github.com/ax3l/9489132#nvcc
Kokkos build script:
suffix=$2
d=buildKokkos${suffix}
cmake -S kokkos -B $d \
-DCMAKE_CXX_COMPILER=g++ \
-DKokkos_ARCH_AMPERE86=ON \
-DKokkos_ENABLE_CUDA=on \
-DKokkos_ENABLE_CUDA_LAMBDA=on \
-DBUILD_SHARED_LIBS=ON \
-DKokkos_ENABLE_SERIAL=ON \
-DKokkos_ENABLE_DEBUG=on \
-DKokkos_ENABLE_TESTS=off \
-DCMAKE_INSTALL_PREFIX=$d/install
cmake --build $d -j 24 --target install
Omega_h build script:
suffix=$1
d=buildOmegahKokkos$suffix
cmake -S omega_h -B $d \
-DCMAKE_INSTALL_PREFIX=$d/install \
-DCMAKE_BUILD_TYPE=Debug \
-DBUILD_TESTING=on \
-DOmega_h_USE_MPI=off \
-DOmega_h_USE_Kokkos=on \
-DOmega_h_CUDA_ARCH="86" \
-DCMAKE_CUDA_ARCHITECTURES="86" \
-DKokkos_PREFIX=buildKokkosCUDA$suffix/install \
-DBUILD_SHARED_LIBS=on
cmake --build $d --target install -j8
Running with more than one rank and --osh-time produces TOP FUNCTIONS output from each rank; the outputs are interleaved on stdout, which makes the results hard to read.
Building with -DOmega_h_DBG=on may resolve this, since the TASK_0_cout macro then gets defined:
Line 33 in 630e4ed
However, when collecting performance data, particularly in runs using GPUs, enabling that build option may not work and, if it does, it may increase runtime. This needs to be tested.
omega_h/src/Omega_h_profile.cpp
Lines 320 to 334 in 630e4ed
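Independently of Omega_h_DBG, the two usual ways to keep per-rank reports readable are to print only on rank 0 or to tag every line with its rank. A minimal sketch of both, assuming the rank has already been obtained (for example from MPI_Comm_rank); the helper names are hypothetical, and this is not how TASK_0_cout is defined.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Forwards a report to `out` only on rank 0, so per-rank timing tables are
// not interleaved on stdout.
inline void print_on_rank0(int rank, std::ostream& out, const std::string& report) {
  if (rank == 0) out << report;
}

// Keeps every rank's output but prefixes each line with the rank, so the
// interleaved lines can be separated afterwards (e.g. with grep or sort).
inline void print_with_rank_prefix(int rank, std::ostream& out, const std::string& report) {
  std::istringstream lines(report);
  std::string line;
  while (std::getline(lines, line)) {
    out << "[rank " << rank << "] " << line << '\n';
  }
}
```

Printing only on rank 0 discards the other ranks' numbers; if those matter, reducing the timings across ranks before printing is an alternative.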
Currently, GitHub Actions only tests CPU-only builds. Since Kokkos and CUDA builds take different code paths, we should add those configurations to the tests.
It looks like all we need is to add add_library (Omega_h::omega_h ALIAS omega_h) right after the omega_h target is added. This will allow a host app to use Omega_h either as a found dependency (via find_package) or as a sub-project (e.g., via FetchContent).
Line 170 in 9621c44
The API above returns incorrect values for a mesh created with build_box(...). This API was added to support mixed meshes; the original nents(int ent_dim) works correctly with build_box(...).
One option to avoid this is to add a check (at compile time?) that determines whether the mesh is mixed and, if it is not, fails in this API.
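A minimal sketch of the suggested check, performed at run time on the assumption that whether a mesh is mixed is only known once it has been built. `ToyMesh`, `is_mixed`, and `nents_by_topology` are hypothetical names standing in for the real Mesh class and the API above.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical mesh that records whether it was built as a mixed mesh.
class ToyMesh {
 public:
  ToyMesh(bool is_mixed, std::vector<int> counts_by_topology)
      : is_mixed_(is_mixed), counts_(std::move(counts_by_topology)) {}

  bool is_mixed() const { return is_mixed_; }

  // Mixed-mesh-only query: fail loudly on a non-mixed mesh instead of
  // returning counts that were never populated.
  int nents_by_topology(int topo) const {
    if (!is_mixed_) {
      throw std::logic_error(
          "nents_by_topology() requires a mixed mesh; use nents(ent_dim) instead");
    }
    return counts_.at(static_cast<std::size_t>(topo));
  }

 private:
  bool is_mixed_;
  std::vector<int> counts_;
};
```

A compile-time check would require mixed and non-mixed meshes to be distinct types; the run-time flag avoids that larger API change.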
While building the omega_h master branch at the latest commit (e1be29b) on Perlmutter with the following libraries:
I encountered the following error:
-- Cray Programming Environment 2.7.20 CXX
-- CMAKE_VERSION: 3.22.0
-- Omega_h_VERSION: 10.5.0
-- USE_XSDK_DEFAULTS: OFF
-- BUILD_TESTING: OFF
-- BUILD_SHARED_LIBS: OFF
-- CMAKE_INSTALL_PREFIX: /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/install
-- Omega_h_CHECK_BOUNDS: OFF
-- Omega_h_THROW: OFF
-- Omega_h_DATA:
-- Omega_h_USE_EGADS: OFF
-- EGADS_PREFIX:
-- Omega_h_USE_Kokkos: ON
-- Kokkos_PREFIX: /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/kokkos/install/lib64/cmake
-- Omega_h_USE_CUDA_AWARE_MPI: OFF
-- Omega_h_USE_SimModSuite: OFF
-- Omega_h_VALGRIND:
-- Omega_h_EXAMPLES: OFF
-- Omega_h_USE_MPI: ON
-- Omega_h_ENABLE_DEMANGLED_STACKTRACE: OFF
-- Omega_h_DBG: OFF
-- Cray Programming Environment 2.7.20 C
-- Omega_h_USE_MPI: ON
-- MPI_PREFIX:
-- Omega_h_USE_ZLIB: ON
-- ZLIB_PREFIX:
-- Omega_h_USE_Kokkos: ON
-- Kokkos_PREFIX: /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/kokkos/install/lib64/cmake
-- Enabled Kokkos devices: CUDA;SERIAL
-- kokkos_launch_compiler is enabled globally. C++ compiler commands with -DKOKKOS_DEPENDENCE will be redirected to the appropriate compiler for Kokkos
-- Kokkos_CONFIG: /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/kokkos/install/lib64/cmake/Kokkos/KokkosConfig.cmake
-- Kokkos_VERSION: 4.0.1
-- Omega_h_USE_libMeshb: OFF
-- Omega_h_USE_Gmsh: OFF
-- Omega_h_USE_Gmodel: OFF
-- Omega_h_USE_SEACASExodus: OFF
-- Omega_h_USE_pybind11: OFF
-- Omega_h_USE_CUDA: on
-- Omega_h_USE_MPI: ON
-- Omega_h_USE_OpenMP: OFF
-- Omega_h_USE_CUDA: on
-- Omega_h_USE_DOLFIN: OFF
-- Omega_h_SEMVER = 10.5.0-sha.e1be29b0+100101100000000000
-- Configuring done
-- Generating done
-- Build files have been written to: /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build
/global/common/software/nersc/pm-2021q4/sw/cmake-3.22.0/bin/cmake -S/global/homes/z/zhangc20/xgcm/omega_h -B/global/homes/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build --check-build-system CMakeFiles/Makefile.cmake 0
/global/common/software/nersc/pm-2021q4/sw/cmake-3.22.0/bin/cmake -E cmake_progress_start /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build/CMakeFiles /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build//CMakeFiles/progress.marks
make -f CMakeFiles/Makefile2 all
make[1]: Entering directory '/global/u2/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build'
make -f src/CMakeFiles/omega_h.dir/build.make src/CMakeFiles/omega_h.dir/depend
make[2]: Entering directory '/global/u2/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build'
cd /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build && /global/common/software/nersc/pm-2021q4/sw/cmake-3.22.0/bin/cmake -E cmake_depends "Unix Makefiles" /global/homes/z/zhangc20/xgcm/omega_h /global/homes/z/zhangc20/xgcm/omega_h/src /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build/src /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build/src/CMakeFiles/omega_h.dir/DependInfo.cmake --color=
make[2]: Leaving directory '/global/u2/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build'
make -f src/CMakeFiles/omega_h.dir/build.make src/CMakeFiles/omega_h.dir/build
make[2]: Entering directory '/global/u2/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build'
[ 1%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_adapt.cpp.o
cd /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build/src && /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/kokkos/install/bin/kokkos_launch_compiler /opt/cray/pe/craype/2.7.20/bin/CC /opt/cray/pe/craype/2.7.20/bin/CC /opt/cray/pe/craype/2.7.20/bin/CC -DKOKKOS_DEPENDENCE -I/global/homes/z/zhangc20/xgcm/omega_h/src -I/global/homes/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build/src -I/global/homes/z/zhangc20/xgcm/omega_h/tpl -isystem /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/kokkos/install/include -isystem /opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7/include -std=c++14 -fast -O3 -DNDEBUG --expt-extended-lambda -gpu=nordc -cuda -gpu=cc80 --c++17 -MD -MT src/CMakeFiles/omega_h.dir/Omega_h_adapt.cpp.o -MF CMakeFiles/omega_h.dir/Omega_h_adapt.cpp.o.d -o CMakeFiles/omega_h.dir/Omega_h_adapt.cpp.o -c /global/homes/z/zhangc20/xgcm/omega_h/src/Omega_h_adapt.cpp
[ 2%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_adj.cpp.o
[ 2%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_align.cpp.o
cd /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build/src && /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/kokkos/install/bin/kokkos_launch_compiler /opt/cray/pe/craype/2.7.20/bin/CC /opt/cray/pe/craype/2.7.20/bin/CC /opt/cray/pe/craype/2.7.20/bin/CC -DKOKKOS_DEPENDENCE -I/global/homes/z/zhangc20/xgcm/omega_h/src -I/global/homes/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build/src -I/global/homes/z/zhangc20/xgcm/omega_h/tpl -isystem /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/kokkos/install/include -isystem /opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7/include -std=c++14 -fast -O3 -DNDEBUG --expt-extended-lambda -gpu=nordc -cuda -gpu=cc80 --c++17 -MD -MT src/CMakeFiles/omega_h.dir/Omega_h_align.cpp.o -MF CMakeFiles/omega_h.dir/Omega_h_align.cpp.o.d -o CMakeFiles/omega_h.dir/Omega_h_align.cpp.o -c /global/homes/z/zhangc20/xgcm/omega_h/src/Omega_h_align.cpp
cd /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build/src && /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/kokkos/install/bin/kokkos_launch_compiler /opt/cray/pe/craype/2.7.20/bin/CC /opt/cray/pe/craype/2.7.20/bin/CC /opt/cray/pe/craype/2.7.20/bin/CC -DKOKKOS_DEPENDENCE -I/global/homes/z/zhangc20/xgcm/omega_h/src -I/global/homes/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build/src -I/global/homes/z/zhangc20/xgcm/omega_h/tpl -isystem /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/kokkos/install/include -isystem /opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7/include -std=c++14 -fast -O3 -DNDEBUG --expt-extended-lambda -gpu=nordc -cuda -gpu=cc80 --c++17 -MD -MT src/CMakeFiles/omega_h.dir/Omega_h_adj.cpp.o -MF CMakeFiles/omega_h.dir/Omega_h_adj.cpp.o.d -o CMakeFiles/omega_h.dir/Omega_h_adj.cpp.o -c /global/homes/z/zhangc20/xgcm/omega_h/src/Omega_h_adj.cpp
[ 4%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_amr.cpp.o
[ 4%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_any.cpp.o
cd /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build/src && /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/kokkos/install/bin/kokkos_launch_compiler /opt/cray/pe/craype/2.7.20/bin/CC /opt/cray/pe/craype/2.7.20/bin/CC /opt/cray/pe/craype/2.7.20/bin/CC -DKOKKOS_DEPENDENCE -I/global/homes/z/zhangc20/xgcm/omega_h/src -I/global/homes/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build/src -I/global/homes/z/zhangc20/xgcm/omega_h/tpl -isystem /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/kokkos/install/include -isystem /opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7/include -std=c++14 -fast -O3 -DNDEBUG --expt-extended-lambda -gpu=nordc -cuda -gpu=cc80 --c++17 -MD -MT src/CMakeFiles/omega_h.dir/Omega_h_amr.cpp.o -MF CMakeFiles/omega_h.dir/Omega_h_amr.cpp.o.d -o CMakeFiles/omega_h.dir/Omega_h_amr.cpp.o -c /global/homes/z/zhangc20/xgcm/omega_h/src/Omega_h_amr.cpp
cd /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build/src && /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/kokkos/install/bin/kokkos_launch_compiler /opt/cray/pe/craype/2.7.20/bin/CC /opt/cray/pe/craype/2.7.20/bin/CC /opt/cray/pe/craype/2.7.20/bin/CC -DKOKKOS_DEPENDENCE -I/global/homes/z/zhangc20/xgcm/omega_h/src -I/global/homes/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build/src -I/global/homes/z/zhangc20/xgcm/omega_h/tpl -isystem /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/kokkos/install/include -isystem /opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7/include -std=c++14 -fast -O3 -DNDEBUG --expt-extended-lambda -gpu=nordc -cuda -gpu=cc80 --c++17 -MD -MT src/CMakeFiles/omega_h.dir/Omega_h_any.cpp.o -MF CMakeFiles/omega_h.dir/Omega_h_any.cpp.o.d -o CMakeFiles/omega_h.dir/Omega_h_any.cpp.o -c /global/homes/z/zhangc20/xgcm/omega_h/src/Omega_h_any.cpp
[ 5%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_amr_transfer.cpp.o
[ 5%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_approach.cpp.o
cd /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build/src && /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/kokkos/install/bin/kokkos_launch_compiler /opt/cray/pe/craype/2.7.20/bin/CC /opt/cray/pe/craype/2.7.20/bin/CC /opt/cray/pe/craype/2.7.20/bin/CC -DKOKKOS_DEPENDENCE -I/global/homes/z/zhangc20/xgcm/omega_h/src -I/global/homes/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build/src -I/global/homes/z/zhangc20/xgcm/omega_h/tpl -isystem /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/kokkos/install/include -isystem /opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7/include -std=c++14 -fast -O3 -DNDEBUG --expt-extended-lambda -gpu=nordc -cuda -gpu=cc80 --c++17 -MD -MT src/CMakeFiles/omega_h.dir/Omega_h_approach.cpp.o -MF CMakeFiles/omega_h.dir/Omega_h_approach.cpp.o.d -o CMakeFiles/omega_h.dir/Omega_h_approach.cpp.o -c /global/homes/z/zhangc20/xgcm/omega_h/src/Omega_h_approach.cpp
cd /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build/src && /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/kokkos/install/bin/kokkos_launch_compiler /opt/cray/pe/craype/2.7.20/bin/CC /opt/cray/pe/craype/2.7.20/bin/CC /opt/cray/pe/craype/2.7.20/bin/CC -DKOKKOS_DEPENDENCE -I/global/homes/z/zhangc20/xgcm/omega_h/src -I/global/homes/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build/src -I/global/homes/z/zhangc20/xgcm/omega_h/tpl -isystem /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/kokkos/install/include -isystem /opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7/include -std=c++14 -fast -O3 -DNDEBUG --expt-extended-lambda -gpu=nordc -cuda -gpu=cc80 --c++17 -MD -MT src/CMakeFiles/omega_h.dir/Omega_h_amr_transfer.cpp.o -MF CMakeFiles/omega_h.dir/Omega_h_amr_transfer.cpp.o.d -o CMakeFiles/omega_h.dir/Omega_h_amr_transfer.cpp.o -c /global/homes/z/zhangc20/xgcm/omega_h/src/Omega_h_amr_transfer.cpp
[ 6%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_amr_topology.cpp.o
cd /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build/src && /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/kokkos/install/bin/kokkos_launch_compiler /opt/cray/pe/craype/2.7.20/bin/CC /opt/cray/pe/craype/2.7.20/bin/CC /opt/cray/pe/craype/2.7.20/bin/CC -DKOKKOS_DEPENDENCE -I/global/homes/z/zhangc20/xgcm/omega_h/src -I/global/homes/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build/src -I/global/homes/z/zhangc20/xgcm/omega_h/tpl -isystem /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/kokkos/install/include -isystem /opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7/include -std=c++14 -fast -O3 -DNDEBUG --expt-extended-lambda -gpu=nordc -cuda -gpu=cc80 --c++17 -MD -MT src/CMakeFiles/omega_h.dir/Omega_h_amr_topology.cpp.o -MF CMakeFiles/omega_h.dir/Omega_h_amr_topology.cpp.o.d -o CMakeFiles/omega_h.dir/Omega_h_amr_topology.cpp.o -c /global/homes/z/zhangc20/xgcm/omega_h/src/Omega_h_amr_topology.cpp
nvc++-Error-Unknown switch: --expt-extended-lambda
make[2]: *** [src/CMakeFiles/omega_h.dir/build.make:90: src/CMakeFiles/omega_h.dir/Omega_h_adj.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
nvc++-Error-Unknown switch: --expt-extended-lambda
nvc++-Error-nvc++-Error-Unknown switch: --expt-extended-lambdaUnknown switch: --expt-extended-lambda
nvc++-Error-Unknown switch: --expt-extended-lambda
nvc++-Error-Unknown switch: --expt-extended-lambda
make[2]: *** [src/CMakeFiles/omega_h.dir/build.make:174: src/CMakeFiles/omega_h.dir/Omega_h_approach.cpp.o] Error 1
make[2]: *** [src/CMakeFiles/omega_h.dir/build.make:160: src/CMakeFiles/omega_h.dir/Omega_h_any.cpp.o] Error 1
make[2]: *** [src/CMakeFiles/omega_h.dir/build.make:104: src/CMakeFiles/omega_h.dir/Omega_h_align.cpp.o] Error 1
nvc++-Error-make[2]: *** [src/CMakeFiles/omega_h.dir/build.make:76: src/CMakeFiles/omega_h.dir/Omega_h_adapt.cpp.o] Error 1
Unknown switch: --expt-extended-lambda
make[2]: *** [src/CMakeFiles/omega_h.dir/build.make:132: src/CMakeFiles/omega_h.dir/Omega_h_amr_topology.cpp.o] Error 1
make[2]: *** [src/CMakeFiles/omega_h.dir/build.make:118: src/CMakeFiles/omega_h.dir/Omega_h_amr.cpp.o] Error 1
nvc++-Error-Unknown switch: --expt-extended-lambda
make[2]: *** [src/CMakeFiles/omega_h.dir/build.make:146: src/CMakeFiles/omega_h.dir/Omega_h_amr_transfer.cpp.o] Error 1
make[2]: Leaving directory '/global/u2/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build'
make[1]: *** [CMakeFiles/Makefile2:128: src/CMakeFiles/omega_h.dir/all] Error 2
make[1]: Leaving directory '/global/u2/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build'
make: *** [Makefile:146: all] Error 2
The configuration and build script:
module load PrgEnv-nvidia
module load nvidia/22.7
module load cudatoolkit/11.7
module load craype-accel-nvidia80
module load cmake/3.22.0
module unload darshan
export cuda=$CRAY_CUDATOOLKIT_DIR
export PATH=$cuda/bin:$PATH
export LD_LIBRARY_PATH=$cuda/lib64:$LD_LIBRARY_PATH
export installroot=$PWD
export kk=$installroot/kokkos/install
export srcroot=$installroot/../
export kksrc=$srcroot/kokkos
export oh=$installroot/omega_h/install
export ohsrc=$srcroot/omega_h
cd $installroot
mkdir -p omega_h/build
cd omega_h/build
cmake $ohsrc -DCMAKE_INSTALL_PREFIX=$oh -DBUILD_SHARED_LIBS=OFF \
-DCMAKE_BUILD_TYPE=Release \
-DOmega_h_USE_CUDA=on -DOmega_h_CUDA_ARCH=80 \
-DOmega_h_USE_Kokkos=ON -DOmega_h_USE_MPI=ON \
-DCMAKE_CXX_COMPILER=CC \
-DCMAKE_CXX_FLAGS='-std=c++14' \
-DCMAKE_CUDA_FLAGS="-I$MPICH_DIR/include" \
-DKokkos_PREFIX=$kk/lib64/cmake/
make VERBOSE=1 -j8 install
Previously, I was able to build omega_h using gcc 11.2.0, cuda 11.7, and kokkos 4.0.01 on Perlmutter.
The reverse classification information returned by the API is correct when the full (original) Omega_h mesh is given as input, but incorrect when the Omega_h mesh returned from pumi-pic is given as input.
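One way to narrow this down is to compute the reverse classification by brute force from the class IDs of the mesh actually passed in and compare it with the API's result. A minimal sketch, assuming the class IDs for one entity dimension have been copied to the host; `reverse_classification` is a hypothetical helper. Note that entity indices here are local to the given mesh, so results for the full mesh and for the pumi-pic mesh may legitimately differ in numbering.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Builds model entity id -> local mesh entity indices from a per-entity
// class_id array. Entities are visited in increasing order, so each list
// comes out sorted.
inline std::map<int, std::vector<int>> reverse_classification(
    const std::vector<int>& class_ids) {
  std::map<int, std::vector<int>> rc;
  for (int ent = 0; ent < static_cast<int>(class_ids.size()); ++ent) {
    rc[class_ids[ent]].push_back(ent);
  }
  return rc;
}
```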
On cranium (RHEL 7):
env:
module purge
module unuse /opt/scorec/spack/lmod/linux-rhel7-x86_64/Core
module use /opt/scorec/spack/v0154_2/lmod/linux-rhel7-x86_64/Core
module load gcc mpich simmetrix-simmodsuite/16.0-210202dev cmake
export MPICH_CXX=g++
config:
cmake $1 \
-DCMAKE_INSTALL_PREFIX=$PWD/install \
-DBUILD_SHARED_LIBS=OFF \
-DOmega_h_USE_MPI=on \
-DCMAKE_CXX_COMPILER=`which mpicxx`
There are multiple errors; the following one appears to be a problem with the lambda syntax:
cd /lore/cwsmith/develop/simmetrixToOmegah/build-omegah/src && /opt/scorec/spack/v0154_2/install/linux-rhel7-x86_64/gcc-10.1.0/mpich-3.3.2-gi3wrjquyo564rk27x6r2c6ilr7ndmpl/bin/mpicxx -I/lore/cwsmith/develop/simmetrixToOmegah/omega_h/src -I/lore/cwsmith/develop/simmetrixToOmegah/build-omegah/src -I/lore/cwsmith/develop/simmetrixToOmegah/omega_h/tpl -O3 -g -Werror -Wall -Wextra -Wdouble-promotion -Wshadow -Wformat=2 -Wduplicated-cond -Wnull-dereference -Wlogical-op -Wrestrict -Wduplicated-branches --std=c++11 -Wno-subobject-linkage -o CMakeFiles/omega_h.dir/Omega_h_array_ops.cpp.o -c /lore/cwsmith/develop/simmetrixToOmegah/omega_h/src/Omega_h_array_ops.cpp
/lore/cwsmith/develop/simmetrixToOmegah/omega_h/src/Omega_h_array_ops.cpp: In lambda function:
/lore/cwsmith/develop/simmetrixToOmegah/omega_h/src/Omega_h_array_ops.cpp:17:40: error: expected '{' before '->' token
17 | auto transform = OMEGA_H_LAMBDA(LO i)->bool { return a[i] == b[i]; };
| ^~
cmake output:
../omega_h/doConfigNoCudaNoSim.sh ../omega_h/
-- The CXX compiler identification is GNU 10.1.0
-- Check for working CXX compiler: /opt/scorec/spack/v0154_2/install/linux-rhel7-x86_64/gcc-10.1.0/mpich-3.3.2-gi3wrjquyo564rk27x6r2c6ilr7ndmpl/bin/mpicxx
-- Check for working CXX compiler: /opt/scorec/spack/v0154_2/install/linux-rhel7-x86_64/gcc-10.1.0/mpich-3.3.2-gi3wrjquyo564rk27x6r2c6ilr7ndmpl/bin/mpicxx - works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- CMAKE_VERSION: 3.17.3
-- Omega_h_VERSION: 9.27.0
-- USE_XSDK_DEFAULTS: OFF
-- BUILD_TESTING: OFF
-- BUILD_SHARED_LIBS: OFF
-- CMAKE_INSTALL_PREFIX: /lore/cwsmith/develop/simmetrixToOmegah/build-omegah/install
-- Omega_h_CHECK_BOUNDS: OFF
-- Omega_h_THROW: OFF
-- Omega_h_DATA:
-- Omega_h_USE_EGADS: OFF
-- EGADS_PREFIX:
-- Omega_h_USE_Kokkos: OFF
-- Kokkos_PREFIX:
-- Omega_h_USE_CUDA_AWARE_MPI: OFF
-- Omega_h_USE_SimModSuite: OFF
-- Omega_h_VALGRIND:
-- Omega_h_EXAMPLES: OFF
-- Omega_h_USE_ZLIB: ON
-- ZLIB_PREFIX:
-- Found ZLIB: /usr/lib64/libz.so (found version "1.2.7")
-- Omega_h_USE_Kokkos: OFF
-- Omega_h_USE_libMeshb: OFF
-- Omega_h_USE_Gmodel: OFF
-- Omega_h_USE_SEACASExodus: OFF
-- Omega_h_USE_pybind11: OFF
-- Omega_h_USE_MPI: on
-- Omega_h_USE_OpenMP: OFF
-- Omega_h_USE_CUDA: OFF
-- Omega_h_CXX_OPTIMIZE: ON
-- Omega_h_CXX_SYMBOLS: ON
-- Omega_h_ARCH:
-- Omega_h_CXX_WARNINGS: ON
-- Omega_h_CXX_FLAGS:
-- Omega_h_EXTRA_CXX_FLAGS:
-- CMAKE_CXX_FLAGS: -O3 -g -Werror -Wall -Wextra -Wdouble-promotion -Wshadow -Wformat=2 -Wduplicated-cond -Wnull-dereference -Wlogical-op -Wrestrict -Wduplicated-branches --std=c++11 -Wno-subobject-linkage
-- Omega_h_USE_DOLFIN: OFF
-- Omega_h_SEMVER = 9.27.0-sha.25c1039+10001000000000
-- Configuring done
-- Generating done
-- Build files have been written to: /lore/cwsmith/develop/simmetrixToOmegah/build-omegah
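For comparison, the failing line written as a plain C++11 lambda with a trailing return type is valid under --std=c++11, so the error more likely comes from what OMEGA_H_LAMBDA expands to in this non-CUDA, non-Kokkos configuration than from the lambda syntax itself. That is a hypothesis, not a confirmed diagnosis: the sketch assumes the macro should reduce to a by-value capture `[=]`, and `arrays_equal` is a hypothetical helper.

```cpp
#include <cassert>
#include <vector>

// Same shape as the line in Omega_h_array_ops.cpp, with the macro replaced
// by an explicit by-value capture.
inline bool arrays_equal(const std::vector<int>& a, const std::vector<int>& b) {
  if (a.size() != b.size()) return false;
  auto transform = [=](int i) -> bool { return a[i] == b[i]; };
  for (int i = 0; i < static_cast<int>(a.size()); ++i) {
    if (!transform(i)) return false;
  }
  return true;
}
```

If this compiles with the same flags, preprocessing Omega_h_array_ops.cpp (e.g. with -E) and inspecting the expanded line should show what the macro turns into.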
Previously, I was able to build omega_h on Perlmutter with kokkos 3.4.01, gcc 11.2, and cuda 11.7 without any issue (about two weeks ago, after the last major maintenance of Perlmutter). But with a fresh build, I just encountered the following issue.
Searching online, I was able to resolve it by adding -DCMAKE_CXX_FLAGS='-std=c++14' to the build script. I am wondering what could be the reason for this change, @cwsmith.
-- kokkos_launch_compiler is enabled globally. C++ compiler commands with -DKOKKOS_DEPENDENCE will be redirected to the appropriate compiler for Kokkos
-- Kokkos_CONFIG: /global/homes/z/zhangc20/xgcm/install_cuda11.7/kokkos/install/lib64/cmake/Kokkos/KokkosConfig.cmake
-- Kokkos_VERSION: 3.4.01
-- Omega_h_USE_libMeshb: OFF
-- Omega_h_USE_Gmsh: OFF
-- Omega_h_USE_Gmodel: OFF
-- Omega_h_USE_SEACASExodus: OFF
-- Omega_h_USE_pybind11: OFF
-- Omega_h_USE_CUDA: on
-- Omega_h_USE_OpenMP: ON
-- Omega_h_USE_MPI: ON
-- Omega_h_USE_OpenMP: ON
-- Omega_h_USE_CUDA: on
-- The CUDA compiler identification is NVIDIA 11.7.64
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /opt/nvidia/hpc_sdk/Linux_x86_64/22.5/cuda/11.7/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Omega_h_USE_DOLFIN: OFF
-- Omega_h_SEMVER = 10.5.0-sha.a63aff26+100111100000000000
-- Configuring done
-- Generating done
-- Build files have been written to: /global/homes/z/zhangc20/xgcm/install_cuda11.7/omega_h/build
[ 2%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_adapt.cpp.o
[ 2%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_adj.cpp.o
[ 2%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_align.cpp.o
[ 3%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_amr_topology.cpp.o
[ 4%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_amr.cpp.o
[ 5%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_amr_transfer.cpp.o
[ 6%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_any.cpp.o
[ 6%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_approach.cpp.o
/opt/cray/pe/gcc/11.2.0/snos/include/g++/type_traits:71:52: error: redefinition of 'constexpr const _Tp std::integral_constant<_Tp, __v>::value'
71 | template<typename _Tp, _Tp __v>
| ^
/opt/cray/pe/gcc/11.2.0/snos/include/g++/type_traits:59:29: note: 'constexpr const _Tp value' previously declared here
59 | static constexpr _Tp value = __v;
| ^~~~~
make[2]: *** [src/CMakeFiles/omega_h.dir/build.make:160: src/CMakeFiles/omega_h.dir/Omega_h_any.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
/opt/cray/pe/gcc/11.2.0/snos/include/g++/type_traits:71:52: error: redefinition of 'constexpr const _Tp std::integral_constant<_Tp, __v>::value'
71 | template<typename _Tp, _Tp __v>
| ^
/opt/cray/pe/gcc/11.2.0/snos/include/g++/type_traits:59:29: note: 'constexpr const _Tp value' previously declared here
59 | static constexpr _Tp value = __v;
| ^~~~~
/opt/cray/pe/gcc/11.2.0/snos/include/g++/type_traits:71:52: error: redefinition of 'constexpr const _Tp std::integral_constant<_Tp, __v>::value'
71 | template<typename _Tp, _Tp __v>
| ^
/opt/cray/pe/gcc/11.2.0/snos/include/g++/type_traits:59:29: note: 'constexpr const _Tp value' previously declared here
59 | static constexpr _Tp value = __v;
| ^~~~~
/global/u2/z/zhangc20/xgcm/install_cuda11.7/kokkos/install/include/impl/Kokkos_ViewMapping.hpp:122:443: error: redefinition of 'constexpr const size_t Kokkos::Impl::ViewDimension0<V, <anonymous> >::ArgN0'
122 | KOKKOS_IMPL_VIEW_DIMENSION(0)
| ^
/global/u2/z/zhangc20/xgcm/install_cuda11.7/kokkos/install/include/impl/Kokkos_ViewMapping.hpp:122:87: note: 'constexpr const size_t Kokkos::Impl::ViewDimension0<V, <anonymous> >::ArgN0' previously declared here
122 | KOKKOS_IMPL_VIEW_DIMENSION(0)
| ^
/global/u2/z/zhangc20/xgcm/install_cuda11.7/kokkos/install/include/impl/Kokkos_ViewMapping.hpp:122:532: error: redefinition of 'constexpr const size_t Kokkos::Impl::ViewDimension0<V, <anonymous> >::N0'
122 | KOKKOS_IMPL_VIEW_DIMENSION(0)
The periodic_test, built from master with the Kokkos Serial backend, fails with a segfault. Below is the valgrind output from one of the two processes; the other process had a similar trace.
$ cat Omega_h_cmake_args.txt
-DBUILD_TESTING:BOOL="on" -DBUILD_SHARED_LIBS:BOOL="on" -DCMAKE_INSTALL_PREFIX:PATH="/space/cwsmith/omegahKkVersions/buildOmegahSimKokkosSerialMpion_master/install" -DOmega_h_USE_Kokkos:BOOL="on" -DKokkos_PREFIX:PATH="/space/cwsmith/omegahKkVersions/buildKokkos/install" -DOmega_h_USE_SimModSuite:BOOL="on" -DOmega_h_USE_MPI:BOOL="on" -DOmega_h_USE_MPI:BOOL="on" -DOmega_h_USE_Kokkos:BOOL="on" -DKokkos_PREFIX:PATH="/space/cwsmith/omegahKkVersions/buildKokkos/install" -DOmega_h_USE_MPI:BOOL="on" -DOmega_h_USE_OpenMP:BOOL="OFF" -DOmega_h_USE_CUDA:BOOL="OFF"
omegah - master @ c5f1dc9d
kokkos - develop @ ed08974c7 (newer than last tagged version of 4.2.00)
simmetrix simmodsuite - 2023.1-230907dev
==3612296== Memcheck, a memory error detector
==3612296== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==3612296== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==3612296== Command: ./src/periodic_test /space/cwsmith/omegahKkVersions/omega_h_master/meshes/wedge_matchZ_12elem.sms /space/cwsmith/omegahKkVersions/omega_h_master/meshes/wedge_match.smd /space/cwsmith/omegahKkVersions/omega_h_master/meshes/wedge_matchZ_12elem_sync_2.osh 2
==3612296== Parent PID: 3612294
==3612296==
==3612296== Invalid read of size 4
==3612296== at 0x6654270: host_atomic_fetch_oper<desul::Impl::sub_operator<int, int const>, int, desul::MemoryOrderRelaxed> (Fetch_Op_ScopeCaller.hpp:44)
==3612296== by 0x6654270: host_atomic_fetch_sub<int, desul::MemoryOrderRelaxed, desul::MemoryScopeCaller> (Fetch_Op_Generic.hpp:40)
==3612296== by 0x6654270: atomic_fetch_sub<int, desul::MemoryOrderRelaxed, desul::MemoryScopeCaller> (Generic.hpp:60)
==3612296== by 0x6654270: atomic_fetch_sub<int> (Kokkos_Atomics_Desul_Wrapper.hpp:83)
==3612296== by 0x6654270: Kokkos::Impl::SharedAllocationRecord<void, void>::decrement(Kokkos::Impl::SharedAllocationRecord<void, void>*) (Kokkos_SharedAlloc.cpp:212)
==3612296== by 0x5213382: assign_direct (Kokkos_SharedAlloc.hpp:477)
==3612296== by 0x5213382: Kokkos::Impl::ViewTracker<Kokkos::View<int*, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> > >::operator=(Kokkos::Impl::ViewTracker<Kokkos::View<int*, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> > > const&) (Kokkos_ViewTracker.hpp:79)
==3612296== by 0x521076E: Kokkos::View<int*, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> >::operator=(Kokkos::View<int*, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> > const&) (Kokkos_View.hpp:1288)
==3612296== by 0x520BA08: Omega_h::Write<int>::operator=(Omega_h::Write<int> const&) (Omega_h_array.hpp:49)
==3612296== by 0x5221F08: Omega_h::Read<int>::operator=(Omega_h::Read<int> const&) (Omega_h_array.hpp:88)
==3612296== by 0x5451023: Omega_h::Mesh::copy_meta() const (Omega_h_mesh.cpp:1235)
==3612296== by 0x54BE3C9: Omega_h::migrate_mesh(Omega_h::Mesh*, Omega_h::Dist, Omega_h_Parting, bool) (Omega_h_migrate.cpp:383)
==3612296== by 0x544D863: Omega_h::Mesh::balance(bool) (Omega_h_mesh.cpp:956)
==3612296== by 0x41CFCF: main (periodic_test.cpp:61)
==3612296== Address 0x38 is not stack'd, malloc'd or (recently) free'd
==3612296==
==3612296==
==3612296== Process terminating with default action of signal 11 (SIGSEGV)
==3612296== Access not within mapped region at address 0x38
==3612296== at 0x6654270: host_atomic_fetch_oper<desul::Impl::sub_operator<int, int const>, int, desul::MemoryOrderRelaxed> (Fetch_Op_ScopeCaller.hpp:44)
==3612296== by 0x6654270: host_atomic_fetch_sub<int, desul::MemoryOrderRelaxed, desul::MemoryScopeCaller> (Fetch_Op_Generic.hpp:40)
==3612296== by 0x6654270: atomic_fetch_sub<int, desul::MemoryOrderRelaxed, desul::MemoryScopeCaller> (Generic.hpp:60)
==3612296== by 0x6654270: atomic_fetch_sub<int> (Kokkos_Atomics_Desul_Wrapper.hpp:83)
==3612296== by 0x6654270: Kokkos::Impl::SharedAllocationRecord<void, void>::decrement(Kokkos::Impl::SharedAllocationRecord<void, void>*) (Kokkos_SharedAlloc.cpp:212)
==3612296== by 0x5213382: assign_direct (Kokkos_SharedAlloc.hpp:477)
==3612296== by 0x5213382: Kokkos::Impl::ViewTracker<Kokkos::View<int*, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> > >::operator=(Kokkos::Impl::ViewTracker<Kokkos::View<int*, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> > > const&) (Kokkos_ViewTracker.hpp:79)
==3612296== by 0x521076E: Kokkos::View<int*, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> >::operator=(Kokkos::View<int*, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> > const&) (Kokkos_View.hpp:1288)
==3612296== by 0x520BA08: Omega_h::Write<int>::operator=(Omega_h::Write<int> const&) (Omega_h_array.hpp:49)
==3612296== by 0x5221F08: Omega_h::Read<int>::operator=(Omega_h::Read<int> const&) (Omega_h_array.hpp:88)
==3612296== by 0x5451023: Omega_h::Mesh::copy_meta() const (Omega_h_mesh.cpp:1235)
==3612296== by 0x54BE3C9: Omega_h::migrate_mesh(Omega_h::Mesh*, Omega_h::Dist, Omega_h_Parting, bool) (Omega_h_migrate.cpp:383)
==3612296== by 0x544D863: Omega_h::Mesh::balance(bool) (Omega_h_mesh.cpp:956)
==3612296== by 0x41CFCF: main (periodic_test.cpp:61)
==3612296== If you believe this happened as a result of a stack
==3612296== overflow in your program's main thread (unlikely but
==3612296== possible), you can try to increase the size of the
==3612296== main thread stack using the --main-stacksize= flag.
==3612296== The main thread stack size used in this run was 8388608.
==3612296==
==3612296== HEAP SUMMARY:
==3612296== in use at exit: 13,116,178 bytes in 4,205 blocks
==3612296== total heap usage: 15,374 allocs, 11,169 frees, 14,496,121 bytes allocated
==3612296==
==3612296== LEAK SUMMARY:
==3612296== definitely lost: 0 bytes in 0 blocks
==3612296== indirectly lost: 0 bytes in 0 blocks
==3612296== possibly lost: 10,525 bytes in 206 blocks
==3612296== still reachable: 13,105,653 bytes in 3,999 blocks
==3612296== suppressed: 0 bytes in 0 blocks
==3612296== Rerun with --leak-check=full to see details of leaked memory
==3612296==
==3612296== For lists of detected and suppressed errors, rerun with: -s
==3612296== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
This issue is to document functionality and features added to Omega_h since the scorec-v10.8.4 release.
Looking at the Mesh class, I'm not entirely sure what copying a mesh is supposed to do. For example:
Omega_h::Mesh mesh_a;
Omega_h::Mesh mesh_b;
mesh_b = mesh_a; // most data is shared via internal shared_ptrs, but not all!
mesh_b.set_dim(1); // dim is an integer, so it is copied by value; mesh_a's dim keeps its default value
mesh_b.set_coords(new_coords); // coords are a tag, and tags are shared between mesh instances via shared_ptr! Now both mesh_a and mesh_b have the new coords!
Also, you can't assume that the values stored internally as shared_ptrs will be shared between instances, because some items like set_parents() assign a new shared_ptr, so assigning new values in one instance won't affect the other one.
I suspect most people are passing the mesh via ptr or reference and aren't actually copying a mesh instance which is why we haven't hit the nasty bug/confusion that this will cause.
Since copy doesn't do what people expect I suggest either deprecating or deleting the copy constructor/assignment until we decide on the intended semantics and fix the class to consistently abide by those semantics.
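The mixed semantics described above can be reproduced with a small standalone class. The sketch below is hypothetical (ToyMesh and NonCopyableMesh are made-up names, not Omega_h types): an int member is copied by value while tag storage behind a shared_ptr is shared between copies, and the second class shows one way to delete the copy operations.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for a mesh class with mixed copy semantics; this is
// NOT Omega_h::Mesh. dim_ is a plain int (copied by value), while the tag
// storage sits behind a shared_ptr (so copies share it).
class ToyMesh {
 public:
  using TagMap = std::map<std::string, std::vector<double>>;
  void set_dim(int dim) { dim_ = dim; }
  int dim() const { return dim_; }
  // Writes into the shared tag map, so every copy of this object sees it.
  void set_coords(std::vector<double> coords) {
    (*tags_)["coordinates"] = std::move(coords);
  }
  const std::vector<double>& coords() const { return tags_->at("coordinates"); }

 private:
  int dim_ = -1;
  std::shared_ptr<TagMap> tags_ = std::make_shared<TagMap>();
};

// One way to act on the suggestion above: forbid copies so that an
// accidental `mesh_b = mesh_a;` is a compile error. Moves are still allowed.
class NonCopyableMesh {
 public:
  NonCopyableMesh() = default;
  NonCopyableMesh(const NonCopyableMesh&) = delete;
  NonCopyableMesh& operator=(const NonCopyableMesh&) = delete;
  NonCopyableMesh(NonCopyableMesh&&) = default;
  NonCopyableMesh& operator=(NonCopyableMesh&&) = default;
};

static_assert(!std::is_copy_assignable<NonCopyableMesh>::value,
              "copy assignment should be disabled");
static_assert(std::is_move_constructible<NonCopyableMesh>::value,
              "moves should still work");
```

Keeping the move operations is a design choice of this sketch: it still lets functions return a mesh by value, but whether Omega_h::Mesh should be movable is a separate decision.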
While building with cuda 12.1 and gcc 11.2.0 on a RHEL 7 computer, I encountered the following thrust error:
[ 1%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_adapt.cpp.o
[ 1%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_align.cpp.o
[ 2%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_adj.cpp.o
[ 2%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_amr.cpp.o
[ 2%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_amr_topology.cpp.o
[ 3%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_amr_transfer.cpp.o
[ 3%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_any.cpp.o
[ 4%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_approach.cpp.o
[ 4%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_array.cpp.o
[ 5%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_assoc.cpp.o
[ 5%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_array_ops.cpp.o
[ 6%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_base64.cpp.o
/lore/zhangc20/pumi-pic/omega_h/src/Omega_h_reduce.hpp(84): error: namespace "thrust" has no member "device"
return thrust::transform_reduce(thrust::device, first, last,
^
1 error detected in the compilation of "/lore/zhangc20/pumi-pic/omega_h/src/Omega_h_array_ops.cpp".
make[2]: *** [src/CMakeFiles/omega_h.dir/Omega_h_array_ops.cpp.o] Error 2
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [src/CMakeFiles/omega_h.dir/all] Error 2
make: *** [all] Error 2
The build script:
module unuse /opt/scorec/spack/lmod/linux-rhel7-x86_64/Core
module use /opt/scorec/spack/v0181_1/lmod/linux-rhel7-x86_64/Core
module load gcc/11.2.0
module load mpich/4.0.2
module load cmake
cuda=/usr/local/cuda-12.1
export PATH=$cuda/bin:$PATH
export LD_LIBRARY_PATH=$cuda/lib64:$LD_LIBRARY_PATH
export installroot=$PWD
export srcroot=$installroot/../
# kokkos
export kk=$installroot/kokkos/install
export kksrc=$srcroot/kokkos
# omega_h
export oh=$installroot/omega_h/install
export ohsrc=$srcroot/omega_h
cd $installroot
mkdir -p omega_h/build
cd omega_h/build
cmake $ohsrc -DCMAKE_BUILD_TYPE=Release \
-DCMAKE_INSTALL_PREFIX=$oh \
-DBUILD_SHARED_LIBS=OFF \
-DOmega_h_USE_Kokkos=ON \
-DOmega_h_USE_CUDA=on \
-DOmega_h_CUDA_ARCH=86 \
-DOmega_h_USE_MPI=on \
-DOmega_h_USE_OpenMP=OFF \
-DBUILD_TESTING=ON \
-DCMAKE_CXX_COMPILER=mpicxx \
-DCMAKE_C_COMPILER=mpicc \
-DKokkos_PREFIX=$kk/lib64/cmake
make -j4 install
ctest
Creating issue and copying relevant discussion from #35
From Dhyan:
/fusion/usc/opt/gcc/gcc-6.3.0/include/c++/6.3.0/bits/unique_ptr.h:158:17: note: candidate expects 0 arguments, 1 provided
In file included from /home/nathd/omega_h/src/Omega_h_array.hpp:4:0,
from /home/nathd/omega_h/src/Omega_h_comm.hpp:7,
from /home/nathd/omega_h/src/Omega_h_array_ops.hpp:6,
from /home/nathd/omega_h/src/Omega_h_rcFields.cpp:4:
/home/nathd/omega_h/src/Omega_h_defines.hpp: In instantiation of ‘auto Omega_h::apply_to_omega_h_types(Omega_h_Type, const F&) [with F = Omega_h::Mesh::get_rc_mesh_tag_from_rc_tag(Omega_h::Int, const Omega_h::TagBase*)::<lambda(auto:1)>]’:
/home/nathd/omega_h/src/Omega_h_rcFields.cpp:340:4: required from here
/home/nathd/omega_h/src/Omega_h_defines.hpp:93:15: error: use of ‘Omega_h::Mesh::get_rc_mesh_tag_from_rc_tag(Omega_h::Int, const Omega_h::TagBase*)::<lambda(auto:1)> [with auto:1 = int]’ before deduction of ‘auto’
return f(I32{});
~^~~~~~~
/home/nathd/omega_h/src/Omega_h_defines.hpp:93:21: error: return-statement with a value, in function returning 'void' [-fpermissive]
return f(I32{});
^
/home/nathd/omega_h/src/Omega_h_defines.hpp:97:15: error: use of ‘Omega_h::Mesh::get_rc_mesh_tag_from_rc_tag(Omega_h::Int, const Omega_h::TagBase*)::<lambda(auto:1)> [with auto:1 = long int]’ before deduction of ‘auto’
return f(I64{});
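The errors above come from GCC 6.3 needing a generic lambda's deduced return type before it has deduced it. A common workaround for this kind of error is to give the lambda an explicit return type. The sketch below is a hypothetical, standalone mirror of the dispatch pattern (ToyType, apply_to_toy_types, and size_label are made-up names, not Omega_h code), and it has not been tested with GCC 6.3.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical mirror of the type-dispatch pattern in Omega_h_defines.hpp:
// an enum picks a type and the callable is invoked with a value-initialized
// object of that type.
enum class ToyType { I8, I32, I64, F64 };

template <typename F>
auto apply_to_toy_types(ToyType type, const F& f) -> decltype(f(std::int32_t{})) {
  switch (type) {
    case ToyType::I8:
      return f(std::int8_t{});
    case ToyType::I32:
      return f(std::int32_t{});
    case ToyType::I64:
      return f(std::int64_t{});
    case ToyType::F64:
      return f(double{});
  }
  assert(false && "unknown ToyType");
  return f(std::int32_t{});
}

// The generic lambda states its return type (-> std::string) instead of
// leaving it to be deduced from the body, so the decltype above does not
// need to instantiate the body first. Assumption: this is enough to avoid
// GCC 6's "before deduction of 'auto'" error; not verified on 6.3.
inline std::string size_label(ToyType type) {
  return apply_to_toy_types(type, [](auto t) -> std::string {
    return std::to_string(sizeof(t)) + "-byte";
  });
}
```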
=====================================================================
Performance of ask_revClass is reported as being significantly better on SCOREC RHEL7 cranium than on AiMOS. The testing below runs src/reverse_class_test to exercise this API on the mesh in question.
Note that src/reverse_class_test also runs a small 2D mesh, so the timings are not exclusively from the 146k-tet mesh in question; however, commenting out the part of the test that runs on the 2D mesh (here) changes the ask_revClass time by less than one second.
/lore/nathd/INSTALLATION/omega_h-install/bin/Tomms_15ge_16me/mesh_15_16.osh
ask_revClass on AiMOS (dcs090), built using the script buildAllKk.sh (pasted below). This script contains the Kokkos and Omega_h steps from the GITRm Build Instructions.
--osh-time output is listed below: kk.log
ask_revClass on AiMOS (dcs090), built using the script buildAllNoKk.sh (pasted below).
--osh-time output is listed below: noKk.log
ask_revClass on SCOREC RHEL7 cranium, built using the script buildAllNoKk_scorec.sh. The script uses the Build with Kokkos Disabled instructions.
--osh-time output is listed below: scorec_noKk.log
ask_revClass time that was within 1 second of the cuda10.2 build & run
--osh-pool, for all builds, reduces the ask_revClass run time by less than a second.
module load spectrum-mpi/10.4
module load cmake/3.20.0
module load cuda/11.1
export root=$PWD
export OMPI_CXX=$root/kokkos/bin/nvcc_wrapper
export OMPI_CC=gcc
[ "$root" != "$PWD" ] && exit 1
build=build-dcsRhel8-gcc74
export kk=$root/${build}-kokkos/install
export oh=$root/${build}-omegah/install
export CMAKE_PREFIX_PATH=$kk:$oh:$CMAKE_PREFIX_PATH
#kokkos
cd $root
#git clone https://github.com/kokkos/kokkos.git
cd kokkos
git checkout 3.1.00
cd -
[ -d $kk ] && rm -rf ${kk%%install}
mkdir -p $kk
cd ${kk%%install}
cmake ../kokkos \
-DCMAKE_CXX_COMPILER=$root/kokkos/bin/nvcc_wrapper \
-DKokkos_ARCH_VOLTA70=ON \
-DKokkos_ENABLE_SERIAL=ON \
-DKokkos_ENABLE_OPENMP=off \
-DKokkos_ENABLE_CUDA=on \
-DKokkos_ENABLE_CUDA_LAMBDA=on \
-DKokkos_ENABLE_DEBUG=on \
-DKokkos_ENABLE_PROFILING=on \
-DCMAKE_INSTALL_PREFIX=$kk
make -j 24 install
# Omega_h
cd $root
git clone https://github.com/SCOREC/omega_h.git
cd omega_h
git checkout 466e4702
cd $root
[ -d $oh ] && rm -rf ${oh%%install}
mkdir -p $oh
cd ${oh%%install}
cmake ../omega_h \
-DCMAKE_INSTALL_PREFIX=$oh \
-DBUILD_SHARED_LIBS=OFF \
-DOmega_h_USE_Kokkos=ON \
-DOmega_h_USE_CUDA=on \
-DOmega_h_CUDA_ARCH=70 \
-DOmega_h_USE_MPI=on \
-DBUILD_TESTING=on \
-DCMAKE_CXX_COMPILER=g++ \
-DKokkos_PREFIX=$kk/lib64/cmake
make reverse_class_test -j8
#allocate a node: salloc -n 1 --gres=gpu:6 -t60
#ssh to it, and run the following commands
#note, the environment (at the top of this script) needs to be setup
mpirun -np 1 --bind-to core ./src/reverse_class_test --kokkos-num-devices=1 --osh-time ../omega_h/meshes/plate_6elem.osh /gpfs/u/home/MPFS/MPFSsmth/killme/mesh_15_16.osh &> kk.log
mpirun -np 1 --bind-to core ./src/reverse_class_test --kokkos-num-devices=1 --osh-pool --osh-time ../omega_h/meshes/plate_6elem.osh /gpfs/u/home/MPFS/MPFSsmth/killme/mesh_15_16.osh &> kk_pool.log
module load spectrum-mpi/10.4
module load cmake/3.20.0
module load cuda/11.1
export root=$PWD
export OMPI_CXX=g++
export OMPI_CC=gcc
git clone https://github.com/SCOREC/omega_h.git
cd omega_h
git checkout 466e4702
cd $root
d=build-dcsRhel8-gcc74-omegahNoKk
[[ ! -d $d ]] && mkdir -p $d
cd $d
cmake $root/omega_h \
-DCMAKE_INSTALL_PREFIX=$PWD/install \
-DBUILD_SHARED_LIBS=OFF \
-DOmega_h_USE_Kokkos=OFF \
-DOmega_h_USE_CUDA=on \
-DOmega_h_CUDA_ARCH=70 \
-DOmega_h_USE_MPI=on \
-DBUILD_TESTING=on \
-DCMAKE_CXX_COMPILER=g++
make reverse_class_test -j8
#allocate a node: salloc -n 1 --gres=gpu:6 -t60
#ssh to it, and run the following commands
#note, the environment (at the top of this script) needs to be setup
mpirun -np 1 --bind-to core ./src/reverse_class_test --osh-time ../omega_h/meshes/plate_6elem.osh /gpfs/u/home/MPFS/MPFSsmth/killme/mesh_15_16.osh &> noKk.log
mpirun -np 1 --bind-to core ./src/reverse_class_test --osh-pool --osh-time ../omega_h/meshes/plate_6elem.osh /gpfs/u/home/MPFS/MPFSsmth/killme/mesh_15_16.osh &> noKk_pool.log
module use /opt/scorec/spack/dev/lmod/linux-rhel7-x86_64/Core
module unuse /opt/scorec/spack/lmod/linux-rhel7-x86_64/Core
module load gcc/7.4.0-c5aaloy cuda/10.2
module load mpich/3.3.1-bfezl2l
module load cmake
export root=$PWD
export MPICH_CXX=g++
export MPICH_CC=gcc
git clone https://github.com/SCOREC/omega_h.git
cd omega_h
git checkout 466e4702
cd $root
d=build-rhel7-gcc74-omegahNoKk
[[ ! -d $d ]] && mkdir -p $d
cd $d
cmake $root/omega_h \
-DCMAKE_INSTALL_PREFIX=$PWD/install \
-DBUILD_SHARED_LIBS=ON \
-DOmega_h_USE_Kokkos=OFF \
-DOmega_h_USE_CUDA=on \
-DOmega_h_CUDA_ARCH=75 \
-DOmega_h_USE_MPI=on \
-DBUILD_TESTING=on
make reverse_class_test -j8
tommsMesh=/space/cwsmith/omegahRcTesting/mesh_15_16.osh
mpirun -np 1 --bind-to core ./src/reverse_class_test --osh-time ../omega_h/meshes/plate_6elem.osh $tommsMesh &> noKk.log
mpirun -np 1 --bind-to core ./src/reverse_class_test --osh-pool --osh-time ../omega_h/meshes/plate_6elem.osh $tommsMesh &> noKk_pool.log
TOP-DOWN:
=========
ask_revClass 65.0243 14
| derive_revClass 65.024 7
| | sort_by_high_index 65.0188 7
| | Write allocation 0.00182296 28
| | offset_scan 0.000607692 7
| | | Write allocation 0.000253348 7
| | | device_free 9.2566e-05 7
| | | single host to device 6.5203e-05 7
| | | device_malloc 4.825e-05 7
| offset_scan 8.2885e-05 1
| | Write allocation 3.4471e-05 1
| | device_free 1.196e-05 1
| | single host to device 8.742e-06 1
| | device_malloc 6.375e-06 1
| Write allocation 7.0067e-05 2
| ask_revClass 6.39e-07 1
binary::read(path, comm, mesh, strict) 0.0694269 2
| binary::read_in_comm(path, comm, mesh, version) 0.067175 2
| | binary::read(istream, mesh, version) 0.0669596 2
| | | Write allocation 0.0028541 27
| | | array host to device 0.00136568 27
| | | set_ents 1.7123e-05 5
ask_adj 0.00121137 6
| derive_adj 0.00120489 2
| | transit 0.00119605 2
| | | Write allocation 0.000442638 2
| | ask_adj 2.695e-06 4
Write allocation 0.000750138 21
array host to device 0.000208103 9
BOTTOM-UP:
==========
sort_by_high_index 65.0188 7
| derive_revClass 65.0188 7
| | ask_revClass 65.0188 7
binary::read(istream, mesh, version) 0.0627227 2
| binary::read_in_comm(path, comm, mesh, version) 0.0627227 2
| | binary::read(path, comm, mesh, strict) 0.0627227 2
Write allocation 0.00622772 88
| binary::read(istream, mesh, version) 0.0028541 27
| | binary::read_in_comm(path, comm, mesh, version) 0.0028541 27
| | | binary::read(path, comm, mesh, strict) 0.0028541 27
| derive_revClass 0.00182296 28
| | ask_revClass 0.00182296 28
| transit 0.000442638 2
| | derive_adj 0.000442638 2
| | | ask_adj 0.000442638 2
| offset_scan 0.000287819 8
| | derive_revClass 0.000253348 7
| | | ask_revClass 0.000253348 7
| | ask_revClass 3.4471e-05 1
| ask_revClass 7.0067e-05 2
derive_revClass 0.00270772 7
| ask_revClass 0.00270772 7
binary::read(path, comm, mesh, strict) 0.00225191 2
array host to device 0.00157378 36
| binary::read(istream, mesh, version) 0.00136568 27
| | binary::read_in_comm(path, comm, mesh, version) 0.00136568 27
| | | binary::read(path, comm, mesh, strict) 0.00136568 27
transit 0.000753416 2
| derive_adj 0.000753416 2
| | ask_adj 0.000753416 2
binary::read_in_comm(path, comm, mesh, version) 0.000215453 2
| binary::read(path, comm, mesh, strict) 0.000215453 2
offset_scan 0.000169662 8
| derive_revClass 0.000148325 7
| | ask_revClass 0.000148325 7
| ask_revClass 2.1337e-05 1
ask_revClass 0.000161856 15
| ask_revClass 6.39e-07 1
device_free 0.000104526 8
| offset_scan 0.000104526 8
| | derive_revClass 9.2566e-05 7
| | | ask_revClass 9.2566e-05 7
| | ask_revClass 1.196e-05 1
single host to device 7.3945e-05 8
| offset_scan 7.3945e-05 8
| | derive_revClass 6.5203e-05 7
| | | ask_revClass 6.5203e-05 7
| | ask_revClass 8.742e-06 1
device_malloc 5.4625e-05 8
| offset_scan 5.4625e-05 8
| | derive_revClass 4.825e-05 7
| | | ask_revClass 4.825e-05 7
| | ask_revClass 6.375e-06 1
set_ents 1.7123e-05 5
| binary::read(istream, mesh, version) 1.7123e-05 5
| | binary::read_in_comm(path, comm, mesh, version) 1.7123e-05 5
| | | binary::read(path, comm, mesh, strict) 1.7123e-05 5
ask_adj 9.174e-06 10
| derive_adj 2.695e-06 4
| | ask_adj 2.695e-06 4
derive_adj 6.143e-06 2
| ask_adj 6.143e-06 2
TOP-DOWN:
=========
ask_revClass 59.7207 14
| derive_revClass 59.7205 7
| | sort_by_high_index 59.7161 7
| | Write allocation 0.00103472 28
| | | device_malloc 0.00101018 28
| | device_free 0.000796198 21
| | offset_scan 0.000410892 7
| | | device_free 9.0479e-05 7
| | | single host to device 7.3759e-05 7
| | | Write allocation 6.1759e-05 7
| | | | device_malloc 5.6407e-05 7
| | | device_malloc 4.5161e-05 7
| offset_scan 5.4773e-05 1
| | device_free 1.2484e-05 1
| | single host to device 9.47e-06 1
| | Write allocation 7.138e-06 1
| | | device_malloc 5.912e-06 1
| | device_malloc 6.299e-06 1
| Write allocation 1.4493e-05 2
| | device_malloc 1.252e-05 2
| device_free 9.044e-06 1
| ask_revClass 4.2e-07 1
binary::read(path, comm, mesh, strict) 0.0675819 2
| binary::read_in_comm(path, comm, mesh, version) 0.0658969 2
| | binary::read(istream, mesh, version) 0.0658028 2
| | | Write allocation 0.00277324 27
| | | | device_malloc 0.00275041 27
| | | array host to device 0.000851867 27
| | | set_ents 1.3327e-05 5
device_free 0.00267911 66
ask_adj 0.00110366 6
| derive_adj 0.00109784 2
| | transit 0.00108981 2
| | | Write allocation 0.000376341 2
| | | | device_malloc 0.000372779 2
| | ask_adj 2.247e-06 4
Write allocation 0.00047037 21
| device_malloc 0.000456152 21
array host to device 8.6368e-05 9
BOTTOM-UP:
==========
sort_by_high_index 59.7161 7
| derive_revClass 59.7161 7
| | ask_revClass 59.7161 7
binary::read(istream, mesh, version) 0.0621644 2
| binary::read_in_comm(path, comm, mesh, version) 0.0621644 2
| | binary::read(path, comm, mesh, strict) 0.0621644 2
device_malloc 0.00471582 96
| Write allocation 0.00466436 88
| | binary::read(istream, mesh, version) 0.00275041 27
| | | binary::read_in_comm(path, comm, mesh, version) 0.00275041 27
| | | | binary::read(path, comm, mesh, strict) 0.00275041 27
| | derive_revClass 0.00101018 28
| | | ask_revClass 0.00101018 28
| | transit 0.000372779 2
| | | derive_adj 0.000372779 2
| | | | ask_adj 0.000372779 2
| | offset_scan 6.2319e-05 8
| | | derive_revClass 5.6407e-05 7
| | | | ask_revClass 5.6407e-05 7
| | | ask_revClass 5.912e-06 1
| | ask_revClass 1.252e-05 2
| offset_scan 5.146e-05 8
| | derive_revClass 4.5161e-05 7
| | | ask_revClass 4.5161e-05 7
| | ask_revClass 6.299e-06 1
device_free 0.00358732 96
| derive_revClass 0.000796198 21
| | ask_revClass 0.000796198 21
| offset_scan 0.000102963 8
| | derive_revClass 9.0479e-05 7
| | | ask_revClass 9.0479e-05 7
| | ask_revClass 1.2484e-05 1
| ask_revClass 9.044e-06 1
derive_revClass 0.00212429 7
| ask_revClass 0.00212429 7
binary::read(path, comm, mesh, strict) 0.00168493 2
array host to device 0.000938235 36
| binary::read(istream, mesh, version) 0.000851867 27
| | binary::read_in_comm(path, comm, mesh, version) 0.000851867 27
| | | binary::read(path, comm, mesh, strict) 0.000851867 27
transit 0.000713472 2
| derive_adj 0.000713472 2
| | ask_adj 0.000713472 2
offset_scan 0.000159116 8
| derive_revClass 0.000139734 7
| | ask_revClass 0.000139734 7
| ask_revClass 1.9382e-05 1
ask_revClass 0.000151177 15
| ask_revClass 4.2e-07 1
binary::read_in_comm(path, comm, mesh, version) 9.4113e-05 2
| binary::read(path, comm, mesh, strict) 9.4113e-05 2
single host to device 8.3229e-05 8
| offset_scan 8.3229e-05 8
| | derive_revClass 7.3759e-05 7
| | | ask_revClass 7.3759e-05 7
| | ask_revClass 9.47e-06 1
Write allocation 7.37e-05 88
| derive_revClass 2.4536e-05 28
| | ask_revClass 2.4536e-05 28
| binary::read(istream, mesh, version) 2.2833e-05 27
| | binary::read_in_comm(path, comm, mesh, version) 2.2833e-05 27
| | | binary::read(path, comm, mesh, strict) 2.2833e-05 27
| offset_scan 6.578e-06 8
| | derive_revClass 5.352e-06 7
| | | ask_revClass 5.352e-06 7
| | ask_revClass 1.226e-06 1
| transit 3.562e-06 2
| | derive_adj 3.562e-06 2
| | | ask_adj 3.562e-06 2
| ask_revClass 1.973e-06 2
set_ents 1.3327e-05 5
| binary::read(istream, mesh, version) 1.3327e-05 5
| | binary::read_in_comm(path, comm, mesh, version) 1.3327e-05 5
| | | binary::read(path, comm, mesh, strict) 1.3327e-05 5
ask_adj 8.063e-06 10
| derive_adj 2.247e-06 4
| | ask_adj 2.247e-06 4
derive_adj 5.779e-06 2
| ask_adj 5.779e-06 2
TOP-DOWN:
=========
ask_revClass 42.1864 14
| derive_revClass 42.1862 7
| | sort_by_high_index 42.1842 7
| | Write allocation 0.000283144 28
| | | device_malloc 0.000263804 28
| | device_free 0.000178336 21
| | offset_scan 0.000176175 7
| | | device_free 3.9569e-05 7
| | | single host to device 2.341e-05 7
| | | Write allocation 2.218e-05 7
| | | | device_malloc 1.934e-05 7
| | | device_malloc 1.7779e-05 7
| offset_scan 2.5029e-05 1
| | device_free 6.579e-06 1
| | Write allocation 2.89e-06 1
| | | device_malloc 2.13e-06 1
| | single host to device 2.89e-06 1
| | device_malloc 2.42e-06 1
| Write allocation 6.04e-06 2
| | device_malloc 5.06e-06 2
| ask_revClass 3.56e-06 1
| device_free 2.52e-06 1
binary::read(path, comm, mesh, strict) 0.0365693 2
| binary::read_in_comm(path, comm, mesh, version) 0.0363525 2
| | binary::read(istream, mesh, version) 0.0363086 2
| | | array host to device 0.000896288 27
| | | Write allocation 0.000690983 27
| | | | device_malloc 0.000660933 27
| | | set_ents 6.7368e-05 5
device_free 0.000542124 66
ask_adj 0.000218783 6
| derive_adj 0.000211903 2
| | transit 0.000204544 2
| | | Write allocation 8.1698e-05 2
| | | | device_malloc 7.9838e-05 2
| | ask_adj 1.62e-06 4
Write allocation 0.000125347 21
| device_malloc 0.000116357 21
array host to device 2.5568e-05 9
BOTTOM-UP:
==========
sort_by_high_index 42.1842 7
| derive_revClass 42.1842 7
| | ask_revClass 42.1842 7
binary::read(istream, mesh, version) 0.0346539 2
| binary::read_in_comm(path, comm, mesh, version) 0.0346539 2
| | binary::read(path, comm, mesh, strict) 0.0346539 2
derive_revClass 0.00134197 7
| ask_revClass 0.00134197 7
device_malloc 0.00116766 96
| Write allocation 0.00114746 88
| | binary::read(istream, mesh, version) 0.000660933 27
| | | binary::read_in_comm(path, comm, mesh, version) 0.000660933 27
| | | | binary::read(path, comm, mesh, strict) 0.000660933 27
| | derive_revClass 0.000263804 28
| | | ask_revClass 0.000263804 28
| | transit 7.9838e-05 2
| | | derive_adj 7.9838e-05 2
| | | | ask_adj 7.9838e-05 2
| | offset_scan 2.147e-05 8
| | | derive_revClass 1.934e-05 7
| | | | ask_revClass 1.934e-05 7
| | | ask_revClass 2.13e-06 1
| | ask_revClass 5.06e-06 2
| offset_scan 2.0199e-05 8
| | derive_revClass 1.7779e-05 7
| | | ask_revClass 1.7779e-05 7
| | ask_revClass 2.42e-06 1
array host to device 0.000921856 36
| binary::read(istream, mesh, version) 0.000896288 27
| | binary::read_in_comm(path, comm, mesh, version) 0.000896288 27
| | | binary::read(path, comm, mesh, strict) 0.000896288 27
device_free 0.000769128 96
| derive_revClass 0.000178336 21
| | ask_revClass 0.000178336 21
| offset_scan 4.6148e-05 8
| | derive_revClass 3.9569e-05 7
| | | ask_revClass 3.9569e-05 7
| | ask_revClass 6.579e-06 1
| ask_revClass 2.52e-06 1
binary::read(path, comm, mesh, strict) 0.000216783 2
ask_revClass 0.000129035 15
| ask_revClass 3.56e-06 1
transit 0.000122846 2
| derive_adj 0.000122846 2
| | ask_adj 0.000122846 2
offset_scan 8.3487e-05 8
| derive_revClass 7.3237e-05 7
| | ask_revClass 7.3237e-05 7
| ask_revClass 1.025e-05 1
set_ents 6.7368e-05 5
| binary::read(istream, mesh, version) 6.7368e-05 5
| | binary::read_in_comm(path, comm, mesh, version) 6.7368e-05 5
| | | binary::read(path, comm, mesh, strict) 6.7368e-05 5
Write allocation 6.482e-05 88
| binary::read(istream, mesh, version) 3.005e-05 27
| | binary::read_in_comm(path, comm, mesh, version) 3.005e-05 27
| | | binary::read(path, comm, mesh, strict) 3.005e-05 27
| derive_revClass 1.934e-05 28
| | ask_revClass 1.934e-05 28
| offset_scan 3.6e-06 8
| | derive_revClass 2.84e-06 7
| | | ask_revClass 2.84e-06 7
| | ask_revClass 7.6e-07 1
| transit 1.86e-06 2
| | derive_adj 1.86e-06 2
| | | ask_adj 1.86e-06 2
| ask_revClass 9.8e-07 2
binary::read_in_comm(path, comm, mesh, version) 4.3949e-05 2
| binary::read(path, comm, mesh, strict) 4.3949e-05 2
single host to device 2.63e-05 8
| offset_scan 2.63e-05 8
| | derive_revClass 2.341e-05 7
| | | ask_revClass 2.341e-05 7
| | ask_revClass 2.89e-06 1
ask_adj 8.5e-06 10
| derive_adj 1.62e-06 4
| | ask_adj 1.62e-06 4
derive_adj 5.739e-06 2
| ask_adj 5.739e-06 2
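In each of the three TOP-DOWN profiles above, sort_by_high_index accounts for nearly all of the ask_revClass time (for example, 65.0188 s of 65.0243 s). For comparison, the sketch below builds a reverse-classification lookup (model entity to mesh entities) in CSR form with a counting pass and an exclusive prefix sum instead of a comparison sort. It is a serial, hypothetical illustration (RevClass and build_rev_class are made-up names), assumes class ids are dense in [0, nmodel), and is not how Omega_h's derive_revClass works; a GPU version would need atomics or a different fill strategy.

```cpp
#include <cassert>
#include <vector>

// Hypothetical CSR layout for reverse classification: the mesh entities
// classified on model entity m are ents[offsets[m]] .. ents[offsets[m+1]-1].
struct RevClass {
  std::vector<int> offsets;  // size nmodel + 1
  std::vector<int> ents;     // one entry per mesh entity
};

// Builds the CSR with a counting pass and an exclusive prefix sum instead
// of a comparison sort. Assumes class ids are dense in [0, nmodel).
RevClass build_rev_class(const std::vector<int>& class_id, int nmodel) {
  RevClass rc;
  rc.offsets.assign(nmodel + 1, 0);
  for (int id : class_id) {
    assert(0 <= id && id < nmodel);
    ++rc.offsets[id + 1];  // count mesh entities per model entity
  }
  for (int m = 0; m < nmodel; ++m) {
    rc.offsets[m + 1] += rc.offsets[m];  // running sum gives start offsets
  }
  rc.ents.resize(class_id.size());
  std::vector<int> cursor(rc.offsets.begin(), rc.offsets.end() - 1);
  for (int e = 0; e < static_cast<int>(class_id.size()); ++e) {
    rc.ents[cursor[class_id[e]]++] = e;  // stable: keeps entity order
  }
  return rc;
}
```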
There is a non-deterministic failure of run_mpi_tests when building with CUDA UVM on Perlmutter. See #67 for details.
From the discussion in the TOMMS issue here: https://github.com/SCOREC/tomms/issues/17
The simmetrix -> omegah mesh converter is calling classify_equal_order to set the geometric model entity id and dimension:
omega_h/src/Omega_h_meshsim.cpp
Line 136 in 1235d6f
classify_equal_order is defined here:
Lines 168 to 190 in 1235d6f
It isn't clear if this is necessary since we are explicitly querying the classification info from simmetrix; I suspect we can just set the "class_dim" and "class_id" tags directly.
Running the mesh adaptation driver (gis_adapt.cpp) on the attached mesh (gis.osh.zip) results in the following error in find_matches during coarsening. Note that this was run in serial.
Here is the velocity field being passed into get_variation_metrics(...), which, IIUC, computes the Hessian of the velocity and uses that for the target_metric.
initMeshVelocityField
Lines 141 to 143 in 8a5aaf6
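If get_variation_metrics does build the metric from the Hessian, a common construction (not confirmed to be what Omega_h does) takes the absolute values of the Hessian's eigenvalues, floors them to keep the metric positive definite, and reassembles the matrix. The 2D sketch below shows that step for a single vertex; Sym2 and abs_hessian_metric are hypothetical names, and the (xx, xy, yy) layout is an assumption that may not match Omega_h's component order for the 3-component target_metric tag in the output below. Scaling by an error tolerance is omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Symmetric 2x2 matrix [[xx, xy], [xy, yy]]: three independent values.
struct Sym2 {
  double xx, xy, yy;
};

// Returns R diag(max(|l1|, eps), max(|l2|, eps)) R^T for a symmetric
// Hessian h with eigenpairs (l1, v1), (l2, v2). The eps floor keeps the
// result positive definite where the field has zero curvature.
Sym2 abs_hessian_metric(const Sym2& h, double eps) {
  assert(eps > 0.0);
  // Jacobi rotation angle that diagonalizes h; v1 = (c, s), v2 = (-s, c).
  const double theta = 0.5 * std::atan2(2.0 * h.xy, h.xx - h.yy);
  const double c = std::cos(theta);
  const double s = std::sin(theta);
  // Eigenvalues as Rayleigh quotients along v1 and v2.
  const double l1 = h.xx * c * c + 2.0 * h.xy * c * s + h.yy * s * s;
  const double l2 = h.xx * s * s - 2.0 * h.xy * c * s + h.yy * c * c;
  const double a1 = std::max(std::abs(l1), eps);
  const double a2 = std::max(std::abs(l2), eps);
  // Reassemble a1 * v1 v1^T + a2 * v2 v2^T.
  return Sym2{a1 * c * c + a2 * s * s,
              (a1 - a2) * c * s,
              a1 * s * s + a2 * c * c};
}
```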
Starting program: /space/cwsmith/landice/gisExoAdaptDemo/gis_adapt gis.osh adapted
warning: File "/opt/scorec/spack/rhel9/v0201_4/install/linux-rhel9-x86_64/gcc-7.4.0/gcc-12.3.0-iil3lnovyknyxf7pec36wljem3fntjd5/lib64/libstdc++.so.6.0.30-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load:/opt/scorec/spack/rhel9/v0201_4/install/linux-rhel9-x86_64/gcc-7.4.0/python-3.10.10-3ixwq6g4ulierat3brscaje6qvjt4b6j/bin/python3.10-gdb.py".
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff2cac640 (LWP 871949)]
[New Thread 0x7ffff24ab640 (LWP 871950)]
temperature_01temperature_02temperature_03temperature_04temperature_05temperature_06temperature_07temperature_08temperature_09temperature_10temperature_11nTri: 3690
Tag Properties by Dimension: (Name, Dim, Type, Number of Components, Min. Value, Max. Value)
global 0 I64 1 0.00 2086.00
coordinates 0 F64 2 -3240.00 464.00
class_dim 0 I8 1 0.00 2.00
class_id 0 I32 1 -1.00 1.00
basal_friction 0 F64 1 0.10 167.33
ice_thickness 0 F64 1 0.00 3.29
solution_1 0 F64 1 0.00 0.00
solution_2 0 F64 1 0.00 0.00
surface_height 0 F64 1 0.00 3.23
temperature_01 0 F64 1 244.02 273.15
temperature_02 0 F64 1 243.97 273.15
temperature_03 0 F64 1 242.55 273.15
temperature_04 0 F64 1 241.27 273.15
temperature_05 0 F64 1 240.65 273.15
temperature_06 0 F64 1 240.36 272.82
temperature_07 0 F64 1 240.24 272.25
temperature_08 0 F64 1 240.19 271.51
temperature_09 0 F64 1 240.16 268.13
temperature_10 0 F64 1 240.15 266.42
temperature_11 0 F64 1 240.15 266.49
velocity 0 F64 2 -27.13 21.53
target_metric 0 F64 3 -0.00 0.00
global 1 I64 1 0.00 5780.00
class_dim 1 I8 1 1.00 2.00
class_id 1 I32 1 0.00 1.00
global 2 I64 1 0.00 3689.00
class_dim 2 I8 1 2.00 2.00
class_id 2 I32 1 1.00 1.00
mass 2.62738e+09
approach_metric moved by factor 0.25
before adapting:
3690 triangles, quality [0.40,1.00], 1 in [0.30,0.40], 3689 >0.40
5781 edges, length [0.17,1.27], 5113 <0.71, 668 in [0.71,1.41]
addressing edge lengths
coarsening 554 vertices
gis_adapt: /space/cwsmith/landice/omegah/src/Omega_h_adj.cpp:451: Omega_h::find_matches_deg<2, int>(LOs, Read<int>, Read<int>, Adj, Write<int>*, Write<signed char>*, bool)::<lambda(Omega_h::LO)>: Assertion `!found' failed.
Thread 1 "gis_adapt" received signal SIGABRT, Aborted.
0x00007ffff5aa154c in __pthread_kill_implementation () from /lib64/libc.so.6
(ins)(gdb) where
#0 0x00007ffff5aa154c in __pthread_kill_implementation () from /lib64/libc.so.6
#1 0x00007ffff5a54d46 in raise () from /lib64/libc.so.6
#2 0x00007ffff5a287f3 in abort () from /lib64/libc.so.6
#3 0x00007ffff5a2871b in __assert_fail_base.cold () from /lib64/libc.so.6
#4 0x00007ffff5a4dce6 in __assert_fail () from /lib64/libc.so.6
#5 0x00007ffff7974509 in Omega_h::find_matches_deg<2, int>(Omega_h::Read<int>, Omega_h::Read<int>, Omega_h::Read<int>, Omega_h::Adj, Omega_h::Write<int>*, Omega_h::Write<signed char>*, bool)::{lambda(int)#1}::operator()(int) const (__closure=0x7ffffffe3e90, a=244) at /space/cwsmith/landice/omegah/src/Omega_h_adj.cpp:451
#6 0x00007ffff79819f7 in Kokkos::Impl::ParallelFor<Omega_h::find_matches_deg<2, int>(Omega_h::Read<int>, Omega_h::Read<int>, Omega_h::Read<int>, Omega_h::Adj, Omega_h::Write<int>*, Omega_h::Write<signed char>*, bool)::{lambda(int)#1}, Kokkos::RangePolicy<Kokkos::Serial, Kokkos::Schedule<Kokkos::Static>, int>, Kokkos::Serial>::exec<void>() const (this=0x7ffffffe3e90) at /space/cwsmith/landice/buildTrilinosGcc/install/include/Serial/Kokkos_Serial_Parallel_Range.hpp:37
#7 0x00007ffff797ffc2 in Kokkos::Impl::ParallelFor<Omega_h::find_matches_deg<2, int>(Omega_h::Read<int>, Omega_h::Read<int>, Omega_h::Read<int>, Omega_h::Adj, Omega_h::Write<int>*, Omega_h::Write<signed char>*, bool)::{lambda(int)#1}, Kokkos::RangePolicy<Kokkos::Serial, Kokkos::Schedule<Kokkos::Static>, int>, Kokkos::Serial>::execute() const (this=0x7ffffffe3e90) at /space/cwsmith/landice/buildTrilinosGcc/install/include/Serial/Kokkos_Serial_Parallel_Range.hpp:52
#8 0x00007ffff797ec52 in Kokkos::parallel_for<Kokkos::RangePolicy<Kokkos::Serial, Kokkos::Schedule<Kokkos::Static>, int>, Omega_h::find_matches_deg<2, int>(Omega_h::Read<int>, Omega_h::Read<int>, Omega_h::Read<int>, Omega_h::Adj, Omega_h::Write<int>*, Omega_h::Write<signed char>*, bool)::{lambda(int)#1}, void>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, Kokkos::RangePolicy<Kokkos::Serial, Kokkos::Schedule<Kokkos::Static>, int> const&, Omega_h::find_matches_deg<2, int>(Omega_h::Read<int>, Omega_h::Read<int>, Omega_h::Read<int>, Omega_h::Adj, Omega_h::Write<int>*, Omega_h::Write<signed char>*, bool)::{lambda(int)#1} const&) (str=..., policy=..., functor=...) at /space/cwsmith/landice/buildTrilinosGcc/install/include/Kokkos_Parallel.hpp:144
#9 0x00007ffff797de03 in Kokkos::parallel_for<Kokkos::RangePolicy<Kokkos::Serial, Kokkos::Schedule<Kokkos::Static>, int>, Omega_h::find_matches_deg<2, int>(Omega_h::Read<int>, Omega_h::Read<int>, Omega_h::Read<int>, Omega_h::Adj, Omega_h::Write<int>*, Omega_h::Write<signed char>*, bool)::{lambda(int)#1}>(Kokkos::RangePolicy<Kokkos::Serial, Kokkos::Schedule<Kokkos::Static>, int> const&, Omega_h::find_matches_deg<2, int>(Omega_h::Read<int>, Omega_h::Read<int>, Omega_h::Read<int>, Omega_h::Adj, Omega_h::Write<int>*, Omega_h::Write<signed char>*, bool)::{lambda(int)#1} const&, std::enable_if<Kokkos::is_execution_policy<Kokkos::RangePolicy<Kokkos::Serial, Kokkos::Schedule<Kokkos::Static>, int> >::value, void>::type*) (policy=..., functor=...)
at /space/cwsmith/landice/buildTrilinosGcc/install/include/Kokkos_Parallel.hpp:153
#10 0x00007ffff797c26a in Omega_h::for_each<Omega_h::IntIterator, Omega_h::find_matches_deg<2, int>(Omega_h::Read<int>, Omega_h::Read<int>, Omega_h::Read<int>, Omega_h::Adj, Omega_h::Write<int>*, Omega_h::Write<signed char>*, bool)::{lambda(int)#1}&>(Omega_h::IntIterator, Omega_h::IntIterator, Omega_h::find_matches_deg<2, int>(Omega_h::Read<int>, Omega_h::Read<int>, Omega_h::Read<int>, Omega_h::Adj, Omega_h::Write<int>*, Omega_h::Write<signed char>*, bool)::{lambda(int)#1}&) (first=...,
last=..., f=...) at /space/cwsmith/landice/omegah/src/Omega_h_for.hpp:70
#11 0x00007ffff79796b0 in Omega_h::parallel_for<Omega_h::find_matches_deg<2, int>(Omega_h::Read<int>, Omega_h::Read<int>, Omega_h::Read<int>, Omega_h::Adj, Omega_h::Write<int>*, Omega_h::Write<signed char>*, bool)::{lambda(int)#1}>(int, Omega_h::find_matches_deg<2, int>(Omega_h::Read<int>, Omega_h::Read<int>, Omega_h::Read<int>, Omega_h::Adj, Omega_h::Write<int>*, Omega_h::Write<signed char>*, bool)::{lambda(int)#1}&&) (n=4863, f=...) at /space/cwsmith/landice/omegah/src/Omega_h_for.hpp:90
#12 0x00007ffff7974a9b in Omega_h::find_matches_deg<2, int> (a2fv=..., av2v=..., bv2v=..., v2b=..., a2b_out=0x7ffffffe4a60, codes_out=0x7ffffffe4a40,
allow_duplicates=false) at /space/cwsmith/landice/omegah/src/Omega_h_adj.cpp:461
#13 0x00007ffff79719d5 in Omega_h::find_matches_ex<int> (deg=2, a2fv=..., av2v=..., bv2v=..., v2b=..., a2b_out=0x7ffffffe4a60, codes_out=0x7ffffffe4a40,
allow_duplicates=false) at /space/cwsmith/landice/omegah/src/Omega_h_adj.cpp:471
#14 0x00007ffff796012a in Omega_h::find_matches (family=OMEGA_H_SIMPLEX, dim=1, av2v=..., bv2v=..., v2b=..., a2b_out=0x7ffffffe4a60, codes_out=0x7ffffffe4a40)
at /space/cwsmith/landice/omegah/src/Omega_h_adj.cpp:489
#15 0x00007ffff7960561 in Omega_h::reflect_down (hv2v=..., lv2v=..., v2l=..., family=OMEGA_H_SIMPLEX, high_dim=2, low_dim=1)
at /space/cwsmith/landice/omegah/src/Omega_h_adj.cpp:506
#16 0x00007ffff7c2a217 in Omega_h::modify_conn (old_mesh=0x7ffffffe82f0, new_mesh=0x7ffffffe6cb0, ent_dim=2, prod_verts2verts=..., prods2new_ents=...,
same_ents2old_ents=..., same_ents2new_ents=..., old_lows2new_lows=...) at /space/cwsmith/landice/omegah/src/Omega_h_modify.cpp:36
#17 0x00007ffff7c301b6 in Omega_h::modify_ents (old_mesh=0x7ffffffe82f0, new_mesh=0x7ffffffe6cb0, ent_dim=2, mods2mds=..., mds_are_mods=..., mods2prods=...,
prod_verts2verts=..., old_lows2new_lows=..., keep_mods=false, mods_can_be_shared=false, p_prods2new_ents=0x7ffffffe6b90, p_same_ents2old_ents=0x7ffffffe6bb0,
p_same_ents2new_ents=0x7ffffffe6bd0, p_old_ents2new_ents=0x7ffffffe6bf0) at /space/cwsmith/landice/omegah/src/Omega_h_modify.cpp:503
#18 0x00007ffff7c2f618 in Omega_h::modify_ents_adapt (old_mesh=0x7ffffffe82f0, new_mesh=0x7ffffffe6cb0, ent_dim=2, key_dim=0, keys2kds=..., keys2prods=...,
prod_verts2verts=..., old_lows2new_lows=..., p_prods2new_ents=0x7ffffffe6b90, p_same_ents2old_ents=0x7ffffffe6bb0, p_same_ents2new_ents=0x7ffffffe6bd0,
p_old_ents2new_ents=0x7ffffffe6bf0) at /space/cwsmith/landice/omegah/src/Omega_h_modify.cpp:462
#19 0x00007ffff7a7f766 in Omega_h::coarsen_element_based2 (mesh=0x7ffffffe82f0, opts=...) at /space/cwsmith/landice/omegah/src/Omega_h_coarsen.cpp:145
#20 0x00007ffff7a7ff39 in Omega_h::coarsen (mesh=0x7ffffffe82f0, opts=..., overshoot=Omega_h::DESIRED, improve=Omega_h::DONT_IMPROVE)
at /space/cwsmith/landice/omegah/src/Omega_h_coarsen.cpp:176
#21 0x00007ffff7a80240 in Omega_h::coarsen_verts (mesh=0x7ffffffe82f0, opts=..., vert_marks=..., overshoot=Omega_h::DESIRED, improve=Omega_h::DONT_IMPROVE)
at /space/cwsmith/landice/omegah/src/Omega_h_coarsen.cpp:197
#22 0x00007ffff7a80411 in Omega_h::coarsen_ents (mesh=0x7ffffffe82f0, opts=..., ent_dim=1, marks=..., overshoot=Omega_h::DESIRED, improve=Omega_h::DONT_IMPROVE)
at /space/cwsmith/landice/omegah/src/Omega_h_coarsen.cpp:203
#23 0x00007ffff7a806b9 in Omega_h::coarsen_by_size (mesh=0x7ffffffe82f0, opts=...) at /space/cwsmith/landice/omegah/src/Omega_h_coarsen.cpp:213
#24 0x00007ffff7951b99 in Omega_h::satisfy_lengths (mesh=0x7ffffffe82f0, opts=...) at /space/cwsmith/landice/omegah/src/Omega_h_adapt.cpp:184
#25 0x00007ffff7952533 in Omega_h::adapt (mesh=0x7ffffffe82f0, opts=...) at /space/cwsmith/landice/omegah/src/Omega_h_adapt.cpp:289
#26 0x0000000000429007 in main (argc=3, argv=0x7ffffffe8ec8) at /space/cwsmith/landice/omegah/src/gis_adapt.cpp:160
Kokkos develop@63a3cef18 was built with:
cmake -S kokkos -B buildKokkos -DCMAKE_CXX_COMPILER=g++ -DBUILD_SHARED_LIBS=ON -DKokkos_ENABLE_SERIAL=ON -DKokkos_ENABLE_DEBUG=on -DKokkos_ENABLE_TESTS=off -DCMAKE_INSTALL_PREFIX=buildKokkos/install
cmake --build buildKokkos -j 24 --target install
Omega_h cws/gisAdapt_linearInterp@ccf91e8 was built with:
cmake -S omega_h -B buildOmegahKokkos -DCMAKE_INSTALL_PREFIX=buildOmegahKokkos/install -DCMAKE_BUILD_TYPE=Debug -DBUILD_TESTING=off -DOmega_h_USE_Kokkos=on -DKokkos_PREFIX=buildKokkos/install -DBUILD_SHARED_LIBS=on -DBUILD_TESTING=on
cmake --build buildOmegahKokkos --target install -j8
There is a pattern in the VTK writer and in unmap where the write_array call is surrounded by calls to change_tagToMesh and change_tagTorc.
There is a change in the set_tags function which makes it invalidate a pointer to the old tag when the array gets swapped. So, using change_tagToMesh/change_tagTorc has become silently dangerous because of the tag invalidation.
I'm not exactly sure what the difference between the formats is, but it seems as though the baseline storage of the reverse classification on the mesh should be in the "rc" format, and it should only be converted to the mesh format to write it out or to do whatever unmap does.
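To make the invalidation hazard concrete, here is a minimal, self-contained C++ sketch that is not Omega_h code: TagStore, set_tag, get_array and rc_to_mesh_copy are hypothetical names, and the "rc" to "mesh" conversion is modeled as a plain permutation, which is an assumption because the actual difference between the formats isn't stated here. It shows that a conversion returning a new array, instead of swapping the stored tag in place and back, leaves anything a caller already holds untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

using Array = std::vector<int>;

// Hypothetical, simplified tag store: each tag owns its array through a
// shared_ptr, and set_tag swaps in a new array for that name.
class TagStore {
 public:
  void set_tag(const std::string& name, Array data) {
    // Replacing the shared_ptr frees the old array once no other shared_ptr
    // refers to it, so a raw pointer taken before this call may dangle.
    tags_[name] = std::make_shared<Array>(std::move(data));
  }
  std::shared_ptr<const Array> get_array(const std::string& name) const {
    return tags_.at(name);
  }

 private:
  std::map<std::string, std::shared_ptr<const Array>> tags_;
};

// Hypothetical conversion from "rc" order to "mesh" order, modeled as a
// permutation: perm[i] is the rc index of mesh entry i. It returns a new
// array and never modifies the stored tag, so nothing is invalidated.
Array rc_to_mesh_copy(const TagStore& store, const std::string& name,
                      const std::vector<int>& perm) {
  std::shared_ptr<const Array> rc = store.get_array(name);
  assert(perm.size() == rc->size());
  Array out(rc->size());
  for (std::size_t i = 0; i < perm.size(); ++i) out[i] = (*rc)[perm[i]];
  return out;
}
```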
@joshia5 I'd like to get your opinion on the following change, because it seems that it will save some work/memory: keep _rc tags in "rc" format (this looks like the current intent), remove change_tagToMesh and change_tagTorc, and add rc_tag_to_mesh_array, which returns the rc tag array in the "mesh" format.
Using the latest master branch code at commit 466e470, I am having problems converting a *.sms file to a *.osh file, following the instructions at https://github.com/SCOREC/pumi-pic/wiki/simmetrix-to-omegah-mesh-conversion. The short error message is below:
k_gene_comp_case5_fix.eqd.smd 37kmesh.osh
tet=0, hex=0, wedge=0, pyramid=0
assertion !has_ents(ent_dim) failed at /lore/zhangc20/pumi-pic/omega_h/src/Omega_h_mesh.cpp +112
Aborted (core dumped)
The *.sms file was generated using simmetrix-simmodsuite/16.0-210606dev, and meshsim2osh was built with the same version. The conversion worked fine when everything was generated using simmetrix-simmodsuite/15.0-200714-x7ycno2.
The case to reproduce this: https://github.com/SCOREC/xgc1_data/tree/master/Cyclone_ITG/Cyclone_ITG_deltaf_37kmesh.
With the current IO routines, the vtkGhostType array is only written out if the partition type is set to OMEGA_H_GHOSTED:
Lines 39 to 41 in 00131eb
As far as I can tell from the limited information available about ghosting in the pvtu file format, this array should exist for both cells and points, and it essentially records whether an entity is owned by a specific partition.
@cwsmith @joshia5 do you know if there is a specific reason why OMEGA_H_VERT_BASED and OMEGA_H_ELEM_BASED don't include these ownership arrays? I think including them would resolve the overlaps when reading pumi-pic outputs, as well as some other oddities that we faced when debugging PCMS/redev visualizations.
For example, I think the following is valid:
<PUnstructuredGrid GhostLevel="1">
  <PPointData>
    <!-- Other data arrays -->
    <PDataArray type="UInt8" Name="vtkGhostType"/>
  </PPointData>
  <PCellData>
    <!-- Other data arrays -->
    <PDataArray type="UInt8" Name="vtkGhostType"/>
  </PCellData>
  <Piece Source="piece1.vtu"/>
  <Piece Source="piece2.vtu"/>
  <Piece Source="piece3.vtu"/>
</PUnstructuredGrid>
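A minimal C++ sketch of how such an array could be filled for any partition type, assuming each local entity's owner rank is available (ghost_type_from_owners is a hypothetical helper, not the existing writer code). It follows the VTK convention that vtkGhostType holds unsigned char flags, with 0 for entities owned by this piece and 1 (DUPLICATEPOINT for points, DUPLICATECELL for cells) for copies owned by another piece; the same function would be applied once to vertex owners and once to element owners.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// VTK ghost flags: 0 = owned by this piece, 1 = duplicate owned elsewhere
// (vtkDataSetAttributes::DUPLICATEPOINT / DUPLICATECELL).
constexpr std::uint8_t kOwned = 0;
constexpr std::uint8_t kDuplicate = 1;

// Hypothetical helper: owners[i] is the rank that owns local entity i
// (points or cells alike) and rank is this process's rank.
std::vector<std::uint8_t> ghost_type_from_owners(const std::vector<int>& owners,
                                                 int rank) {
  std::vector<std::uint8_t> ghost(owners.size());
  for (std::size_t i = 0; i < owners.size(); ++i)
    ghost[i] = (owners[i] == rank) ? kOwned : kDuplicate;
  return ghost;
}
```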
Building with the AMD compilers without optimizations results in the build errors below (undefined hidden symbols reported by lld when linking the gfx90a device code). Adding -O1 or -O2 results in a successful build.
+ /opt/cray/pe/craype/2.7.19/bin/CC -DKOKKOS_DEPENDENCE -D__HIP_ROCclr__ -I/ccs/home/cwsmith/omegahKk/omega_h/src -I/ccs/home/cwsmith/omegahKk/buildOmegahVega90a_amd_RelWithDebInfo_MPIon/src -I/ccs/home/cwsmith/omegahKk/omega_h/tpl -isystem /ccs/home/cwsmith/omegahKk/buildKokkosVega90a_RelWithDebInfo_amd/install/include -g -fno-gpu-rdc -x hip --rocm-path=/opt/rocm-5.3.0 --offload-arch=gfx90a -std=c++17 -MD -MT src/CMakeFiles/omega_h.dir/Omega_h_adj.cpp.o -MF CMakeFiles/omega_h.dir/Omega_h_adj.cpp.o.d -o CMakeFiles/omega_h.dir/Omega_h_adj.cpp.o -c /ccs/home/cwsmith/omegahKk/omega_h/src/Omega_h_adj.cpp
In file included from /ccs/home/cwsmith/omegahKk/omega_h/src/Omega_h_adj.cpp:3:
/ccs/home/cwsmith/omegahKk/omega_h/src/Omega_h_align.hpp:132:1: warning: non-void function does not return a value in all control paths [-Wreturn-type]
}
^
1 warning generated when compiling for gfx90a.
lld: error: undefined hidden symbol: Omega_h::Read<int>::operator[](int) const
>>> referenced by Omega_h_adj.cpp:29 (/ccs/home/cwsmith/omegahKk/omega_h/src/Omega_h_adj.cpp:29)
>>> /tmp/Omega_h_adj-26bbe5/Omega_h_adj-gfx90a.o:(Omega_h::unmap_adjacency(Omega_h::Read<int>, Omega_h::Adj)::'lambda'(int)::operator()(int) const)
>>> referenced by Omega_h_adj.cpp:29 (/ccs/home/cwsmith/omegahKk/omega_h/src/Omega_h_adj.cpp:29)
>>> /tmp/Omega_h_adj-26bbe5/Omega_h_adj-gfx90a.o:(Omega_h::unmap_adjacency(Omega_h::Read<int>, Omega_h::Adj)::'lambda'(int)::operator()(int) const)
>>> referenced by Omega_h_adj.cpp:31 (/ccs/home/cwsmith/omegahKk/omega_h/src/Omega_h_adj.cpp:31)
>>> /tmp/Omega_h_adj-26bbe5/Omega_h_adj-gfx90a.o:(Omega_h::unmap_adjacency(Omega_h::Read<int>, Omega_h::Adj)::'lambda'(int)::operator()(int) const)
>>> referenced 99 more times
lld: error: undefined hidden symbol: Omega_h::Write<int>::operator[](int) const
>>> referenced by Omega_h_adj.cpp:32 (/ccs/home/cwsmith/omegahKk/omega_h/src/Omega_h_adj.cpp:32)
>>> /tmp/Omega_h_adj-26bbe5/Omega_h_adj-gfx90a.o:(Omega_h::unmap_adjacency(Omega_h::Read<int>, Omega_h::Adj)::'lambda'(int)::operator()(int) const)
>>> referenced by Omega_h_adj.cpp:32 (/ccs/home/cwsmith/omegahKk/omega_h/src/Omega_h_adj.cpp:32)
>>> /tmp/Omega_h_adj-26bbe5/Omega_h_adj-gfx90a.o:(Omega_h::unmap_adjacency(Omega_h::Read<int>, Omega_h::Adj)::'lambda'(int)::operator()(int) const)
>>> referenced by Omega_h_adj.cpp:170 (/ccs/home/cwsmith/omegahKk/omega_h/src/Omega_h_adj.cpp:170)
>>> /tmp/Omega_h_adj-26bbe5/Omega_h_adj-gfx90a.o:(Omega_h::form_uses(Omega_h::Read<int>, Omega_h_Family, int, int)::'lambda'(int)::operator()(int) const)
>>> referenced 37 more times
lld: error: undefined hidden symbol: Omega_h::Read<signed char>::operator[](int) const
>>> referenced by Omega_h_adj.cpp:33 (/ccs/home/cwsmith/omegahKk/omega_h/src/Omega_h_adj.cpp:33)
>>> /tmp/Omega_h_adj-26bbe5/Omega_h_adj-gfx90a.o:(Omega_h::unmap_adjacency(Omega_h::Read<int>, Omega_h::Adj)::'lambda'(int)::operator()(int) const)
>>> referenced by Omega_h_adj.cpp:33 (/ccs/home/cwsmith/omegahKk/omega_h/src/Omega_h_adj.cpp:33)
>>> /tmp/Omega_h_adj-26bbe5/Omega_h_adj-gfx90a.o:(Omega_h::unmap_adjacency(Omega_h::Read<int>, Omega_h::Adj)::'lambda'(int)::operator()(int) const)
>>> referenced by Omega_h_adj.cpp:212 (/ccs/home/cwsmith/omegahKk/omega_h/src/Omega_h_adj.cpp:212)
>>> /tmp/Omega_h_adj-26bbe5/Omega_h_adj-gfx90a.o:(Omega_h::separate_upward_with_codes(int, Omega_h::Read<int>, int, Omega_h::Write<int>, Omega_h::Read<signed char>, Omega_h::Write<signed char>)::'lambda'(int)::operator()(int) const)
>>> referenced 29 more times
lld: error: undefined hidden symbol: Omega_h::Write<signed char>::operator[](int) const
>>> referenced by Omega_h_adj.cpp:33 (/ccs/home/cwsmith/omegahKk/omega_h/src/Omega_h_adj.cpp:33)
>>> /tmp/Omega_h_adj-26bbe5/Omega_h_adj-gfx90a.o:(Omega_h::unmap_adjacency(Omega_h::Read<int>, Omega_h::Adj)::'lambda'(int)::operator()(int) const)
>>> referenced by Omega_h_adj.cpp:33 (/ccs/home/cwsmith/omegahKk/omega_h/src/Omega_h_adj.cpp:33)
>>> /tmp/Omega_h_adj-26bbe5/Omega_h_adj-gfx90a.o:(Omega_h::unmap_adjacency(Omega_h::Read<int>, Omega_h::Adj)::'lambda'(int)::operator()(int) const)
>>> referenced by Omega_h_adj.cpp:127 (/ccs/home/cwsmith/omegahKk/omega_h/src/Omega_h_adj.cpp:127)
>>> /tmp/Omega_h_adj-26bbe5/Omega_h_adj-gfx90a.o:(Omega_h::find_canonical_jumps(int, Omega_h::Read<int>, Omega_h::Read<int>)::'lambda'(int)::operator()(int) const)
>>> referenced 37 more times
lld: error: undefined hidden symbol: Omega_h::Read<long>::operator[](int) const
>>> referenced by Omega_h_adj.cpp:78 (/ccs/home/cwsmith/omegahKk/omega_h/src/Omega_h_adj.cpp:78)
>>> /tmp/Omega_h_adj-26bbe5/Omega_h_adj-gfx90a.o:(Omega_h::Read<signed char> Omega_h::get_codes_to_canonical_deg<4, long>(Omega_h::Read<long>)::'lambda'(int)::operator()(int) const)
>>> referenced by Omega_h_adj.cpp:78 (/ccs/home/cwsmith/omegahKk/omega_h/src/Omega_h_adj.cpp:78)
>>> /tmp/Omega_h_adj-26bbe5/Omega_h_adj-gfx90a.o:(Omega_h::Read<signed char> Omega_h::get_codes_to_canonical_deg<4, long>(Omega_h::Read<long>)::'lambda'(int)::operator()(int) const)
>>> referenced by Omega_h_adj.cpp:81 (/ccs/home/cwsmith/omegahKk/omega_h/src/Omega_h_adj.cpp:81)
>>> /tmp/Omega_h_adj-26bbe5/Omega_h_adj-gfx90a.o:(Omega_h::Read<signed char> Omega_h::get_codes_to_canonical_deg<4, long>(Omega_h::Read<long>)::'lambda'(int)::operator()(int) const)
>>> referenced 35 more times
clang-15: error: amdgcn-link command failed with exit code 1 (use -v to see invocation)
Building with the amdclang compilers on Frontier results in warp_test_serial hanging. The output and stack trace are pasted below.
A build with hipcc results in warp_test_serial running without error.
(ins)cwsmith@frontier06451: ~/omegahKk/test/buildOmegahVega90a_amd $ ./src/warp_test
warp_to_limit completed in one step
before adapting:
6000 tets, quality [0.62,0.85], 6000 >0.30
7930 edges, length [0.67,1.55], 132 <0.71, 7458 in [0.71,1.41], 340 >1.41
quality histogram:
0.00-0.10: 0
0.10-0.20: 0
0.20-0.30: 0
0.30-0.40: 0
0.40-0.50: 0
0.50-0.60: 0
0.60-0.70: 1080
0.70-0.80: 4240
0.80-0.90: 680
0.90-1.00: 0
length histogram:
0.00-0.30: 0
0.30-0.60: 0
0.60-0.90: 3410
0.90-1.20: 3056
1.20-1.50: 1364
1.50-1.80: 100
1.80-2.10: 0
2.10-2.40: 0
2.40-2.70: 0
2.70-3.00: 0
average quality: 0.744939
addressing edge lengths
refining 340 edges
8040 tets, quality [0.63,0.85], 8040 >0.30
10310 edges, length [0.60,1.41], 1452 <0.71, 8858 in [0.71,1.41]
... snip ...
warp_to_limit completed in one step
before adapting:
8472 tets, quality [0.33,0.88], 8472 >0.30
10810 edges, length [0.47,1.56], 1796 <0.71, 8616 in [0.71,1.41], 398 >1.41
quality histogram:
0.00-0.10: 0
0.10-0.20: 0
0.20-0.30: 0
0.30-0.40: 24
0.40-0.50: 558
0.50-0.60: 1236
0.60-0.70: 1936
0.70-0.80: 3198
0.80-0.90: 1520
0.90-1.00: 0
length histogram:
0.00-0.30: 0
0.30-0.60: 756
0.60-0.90: 5242
0.90-1.20: 3160
1.20-1.50: 1590
1.50-1.80: 62
1.80-2.10: 0
2.10-2.40: 0
2.40-2.70: 0
2.70-3.00: 0
average quality: 0.693191
addressing edge lengths
refining 390 edges
10016 tets, quality [0.29,0.92], 8 in [0.20,0.30], 10008 >0.30
12772 edges, length [0.22,1.45], 2692 <0.71, 10072 in [0.71,1.41], 8 >1.41
coarsening 298 vertices
8880 tets, quality [0.29,0.95], 4 in [0.20,0.30], 8876 >0.30
11310 edges, length [0.40,1.45], 1958 <0.71, 9344 in [0.71,1.41], 8 >1.41
refining 8 edges
8916 tets, quality [0.28,0.95], 4 in [0.20,0.30], 8912 >0.30
11354 edges, length [0.28,1.40], 1972 <0.71, 9382 in [0.71,1.41]
coarsening 68 vertices
8644 tets, quality [0.29,0.95], 2 in [0.20,0.30], 8642 >0.30
11010 edges, length [0.40,1.41], 1812 <0.71, 9198 in [0.71,1.41]
coarsening 6 vertices
8622 tets, quality [0.29,0.96], 4 in [0.20,0.30], 8618 >0.30
10978 edges, length [0.40,1.41], 1798 <0.71, 9180 in [0.71,1.41]
coarsening 4 vertices
8608 tets, quality [0.29,0.96], 4 in [0.20,0.30], 8604 >0.30
10958 edges, length [0.40,1.41], 1794 <0.71, 9164 in [0.71,1.41]
addressing element qualities
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007fffed197e7f in ?? () from /opt/rocm-5.3.0/lib/libhsa-runtime64.so.1
Missing separate debuginfos, use: zypper install glibc-debuginfo-2.31-150300.41.1.x86_64 krb5-debuginfo-1.19.2-150400.3.3.1.x86_64 libbrotlicommon1-debuginfo-1.0.7-3.3.1.x86_64 libbrotlidec1-debuginfo-1.0.7-3.3.1.x86_64 libcom_err2-debuginfo-1.46.4-150400.3.3.1.x86_64 libcurl4-debuginfo-7.79.1-150400.5.15.1.x86_64 libdrm2-debuginfo-2.4.107-150400.1.8.x86_64 libdrm_amdgpu1-debuginfo-2.4.107-150400.1.8.x86_64 libelf1-debuginfo-0.185-150400.5.3.1.x86_64 libgcc_s1-debuginfo-12.2.1+git416-150000.1.5.1.x86_64 libidn2-0-debuginfo-2.2.0-3.6.1.x86_64 libjson-c3-debuginfo-0.13-3.3.1.x86_64 libkeyutils1-debuginfo-1.6.3-5.6.1.x86_64 libldap-2_4-2-debuginfo-2.4.46-150200.14.11.2.x86_64 libncurses6-debuginfo-6.1-150000.5.12.1.x86_64 libnghttp2-14-debuginfo-1.40.0-6.1.x86_64 libnl3-200-debuginfo-3.3.0-1.29.x86_64 libnuma1-debuginfo-2.0.14.20.g4ee5e0c-150400.1.24.x86_64 libopenssl1_1-debuginfo-1.1.1l-150400.7.22.1.x86_64 libpcre1-debuginfo-8.45-150000.20.13.1.x86_64 libpsl5-debuginfo-0.20.1-150000.3.3.1.x86_64 libselinux1-debuginfo-3.1-150400.1.69.x86_64 libssh4-debuginfo-0.9.6-150400.1.5.x86_64 libstdc++6-debuginfo-12.2.1+git416-150000.1.5.1.x86_64 libunistring2-debuginfo-0.9.10-1.1.x86_64 libyaml-0-2-debuginfo-0.1.7-1.17.x86_64 libz1-debuginfo-1.2.11-150000.3.39.1.x86_64 libzstd1-debuginfo-1.5.0-150400.1.71.x86_64
(ins)(gdb) where
#0 0x00007fffed197e7f in ?? () from /opt/rocm-5.3.0/lib/libhsa-runtime64.so.1
#1 0x00007fffed197d3a in ?? () from /opt/rocm-5.3.0/lib/libhsa-runtime64.so.1
#2 0x00007fffed18bd09 in ?? () from /opt/rocm-5.3.0/lib/libhsa-runtime64.so.1
#3 0x00007fffebbf21db in ?? () from /opt/rocm-5.3.0/lib/libamdhip64.so.5
#4 0x00007fffebbe13fa in ?? () from /opt/rocm-5.3.0/lib/libamdhip64.so.5
#5 0x00007fffeba54cb7 in ?? () from /opt/rocm-5.3.0/lib/libamdhip64.so.5
#6 0x00007fffeba5ecfc in hipFree () from /opt/rocm-5.3.0/lib/libamdhip64.so.5
#7 0x000000000134aeec in Kokkos::HIPSpace::impl_deallocate (this=<optimized out>,
arg_label=arg_label@entry=0x1b95d08 "", arg_alloc_ptr=arg_alloc_ptr@entry=0x7ff7a83f9000,
arg_alloc_size=arg_alloc_size@entry=11216, arg_logical_size=arg_logical_size@entry=10960,
arg_handle=...) at /ccs/home/cwsmith/omegahKk/test/kokkos/core/src/HIP/Kokkos_HIP_Space.cpp:228
#8 0x000000000134adc4 in Kokkos::HIPSpace::deallocate (this=<optimized out>, arg_label=0x1b95d08 "",
arg_alloc_ptr=0x7ff7a83f9000, arg_alloc_size=11216, arg_logical_size=10960)
at /ccs/home/cwsmith/omegahKk/test/kokkos/core/src/HIP/Kokkos_HIP_Space.cpp:216
#9 0x0000000001349bc7 in Kokkos::Impl::SharedAllocationRecord<Kokkos::HIPSpace, void>::~SharedAllocationRecord (this=0x1b95cd0)
at /ccs/home/cwsmith/omegahKk/test/kokkos/core/src/HIP/Kokkos_HIP_SharedAllocationRecord.cpp:41
#10 0x0000000001068421 in Kokkos::Impl::SharedAllocationRecord<Kokkos::HIPSpace, Kokkos::Impl::ViewValueFunctor<Kokkos::Device<Kokkos::HIP, Kokkos::HIPSpace>, signed char, true> >::~SharedAllocationRecord (
this=0x1b95cd0)
at /ccs/home/cwsmith/omegahKk/test/buildKokkosVega90a_amd/install/include/impl/Kokkos_SharedAlloc.hpp:281
#11 Kokkos::Impl::SharedAllocationRecord<Kokkos::HIPSpace, Kokkos::Impl::ViewValueFunctor<Kokkos::Device<Kokkos::HIP, Kokkos::HIPSpace>, signed char, true> >::~SharedAllocationRecord (this=0x1b95cd0)
at /ccs/home/cwsmith/omegahKk/test/buildKokkosVega90a_amd/install/include/impl/Kokkos_SharedAlloc.hpp:281
#12 0x0000000001343c21 in Kokkos::Impl::SharedAllocationRecord<void, void>::decrement (
arg_record=0x1b95cd0)
at /ccs/home/cwsmith/omegahKk/test/kokkos/core/src/impl/Kokkos_SharedAlloc.cpp:265
#13 0x00000000012c2631 in Kokkos::Impl::SharedAllocationTracker::~SharedAllocationTracker (
this=0x7fffffff6130)
at /ccs/home/cwsmith/omegahKk/test/buildKokkosVega90a_amd/install/include/impl/Kokkos_SharedAlloc.hpp:419
#14 Kokkos::Impl::ViewTracker<Kokkos::View<signed char*> >::~ViewTracker (this=0x7fffffff6130)
at /ccs/home/cwsmith/omegahKk/test/buildKokkosVega90a_amd/install/include/impl/Kokkos_ViewTracker.hpp:39
#15 Kokkos::View<signed char*>::~View (this=0x7fffffff6130)
at /ccs/home/cwsmith/omegahKk/test/buildKokkosVega90a_amd/install/include/Kokkos_View.hpp:1266
#16 Omega_h::Write<signed char>::~Write (this=0x7fffffff6130)
at /ccs/home/cwsmith/omegahKk/test/omega_h/src/Omega_h_array.hpp:24
#17 Omega_h::Read<signed char>::~Read (this=0x7fffffff6130)
at /ccs/home/cwsmith/omegahKk/test/omega_h/src/Omega_h_array.hpp:251
#18 Omega_h::swap3d_qualities_tmpl<1> (mesh=mesh@entry=0x7fffffff7360, opts=..., cands2edges=...,
cand_quals=cand_quals@entry=0x7fffffff64b8, cand_configs=cand_configs@entry=0x7fffffff6568)
at /ccs/home/cwsmith/omegahKk/test/omega_h/src/Omega_h_swap3d_qualities.cpp:58
#19 0x00000000012c0a99 in Omega_h::swap3d_qualities (mesh=mesh@entry=0x7fffffff7360, opts=...,
cands2edges=..., cand_quals=cand_quals@entry=0x7fffffff64b8,
cand_configs=cand_configs@entry=0x7fffffff6568)
at /ccs/home/cwsmith/omegahKk/test/omega_h/src/Omega_h_swap3d_qualities.cpp:71
#20 0x00000000012be594 in Omega_h::swap3d_ghosted (mesh=0x7fffffff7360, opts=...)
at /ccs/home/cwsmith/omegahKk/test/omega_h/src/Omega_h_swap3d.cpp:21
#21 Omega_h::swap_edges_3d (mesh=mesh@entry=0x7fffffff7360, opts=...)
at /ccs/home/cwsmith/omegahKk/test/omega_h/src/Omega_h_swap3d.cpp:80
#22 0x00000000012b7586 in Omega_h::swap_edges (mesh=mesh@entry=0x7fffffff7360, opts=...)
at /ccs/home/cwsmith/omegahKk/test/omega_h/src/Omega_h_swap.cpp:49
#23 0x00000000010520d1 in Omega_h::satisfy_quality (mesh=0x7fffffff7360, opts=...)
at /ccs/home/cwsmith/omegahKk/test/omega_h/src/Omega_h_adapt.cpp:196
#24 Omega_h::snap_and_satisfy_quality (mesh=0x7fffffff7360, opts=...)
at /ccs/home/cwsmith/omegahKk/test/omega_h/src/Omega_h_adapt.cpp:241
#25 Omega_h::adapt (mesh=mesh@entry=0x7fffffff7360, opts=...)
at /ccs/home/cwsmith/omegahKk/test/omega_h/src/Omega_h_adapt.cpp:281
#26 0x000000000104bfc1 in main (argc=<optimized out>, argv=<optimized out>)
test/omega_h/src/warp_test.cpp:115
(ins)(gdb)
I'm working through fixing some rcField tests as I merge in sandialabs, and I'm wondering if there is any reason why we shouldn't make all of the rcField functions private methods on the mesh class. If I understand correctly, these functions all exist to maintain an internal cache of the stored reverse classification values.
If this is correct, then we should make all those methods private and expose only the ask_revClass etc. functions in the namespace.
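A minimal C++ sketch of the proposed layout, using hypothetical names (MeshSketch, compute_revClass, rc_cache_) rather than the real Omega_h::Mesh members: the cache and the routine that fills it are private, and only the ask_revClass accessor is public, computing the result on first use and returning the cached copy afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical sketch, not the real Omega_h::Mesh: the reverse-classification
// cache and the routine that fills it are private; only ask_revClass is public.
class MeshSketch {
 public:
  explicit MeshSketch(std::vector<int> class_ids)
      : class_ids_(std::move(class_ids)) {}

  // Mesh entities classified on model entity model_id; computed on the
  // first request, then served from the cache.
  const std::vector<int>& ask_revClass(int model_id) {
    auto it = rc_cache_.find(model_id);
    if (it == rc_cache_.end())
      it = rc_cache_.emplace(model_id, compute_revClass(model_id)).first;
    return it->second;
  }

  // Exposed only so the caching behavior can be checked in this example.
  std::size_t num_cached() const { return rc_cache_.size(); }

 private:
  std::vector<int> compute_revClass(int model_id) const {
    std::vector<int> ents;
    for (std::size_t e = 0; e < class_ids_.size(); ++e)
      if (class_ids_[e] == model_id) ents.push_back(static_cast<int>(e));
    return ents;
  }

  std::vector<int> class_ids_;                // model entity id per mesh entity
  std::map<int, std::vector<int>> rc_cache_;  // model id -> classified entities
};
```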
To regression test the features enabled with simmodsuite = on
cc'ing @cwsmith: we've had this discussion before, but it keeps getting pushed aside by other things, so I'm making a note here.
Does omega_h explicitly define shape functions, or does it never need them because everything is assumed to be linear? If it does define them, where are they? Can you query a field's value at a given parametric coordinate?
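For reference, this is what linear (P1) shape functions on a tetrahedron amount to; the sketch does not answer whether or where Omega_h defines them, and p1_tet_shape and interpolate_p1 are hypothetical names. It assumes the reference tetrahedron with vertices (0,0,0), (1,0,0), (0,1,0), (0,0,1), where the shape function values at a parametric point are its barycentric coordinates, so evaluating a vertex-based linear field is a weighted sum of the four vertex values.

```cpp
#include <array>
#include <cassert>

// Linear (P1) shape functions on the reference tetrahedron with vertices
// (0,0,0), (1,0,0), (0,1,0), (0,0,1). N0..N3 at (xi, eta, zeta) are also
// the barycentric coordinates of that point.
std::array<double, 4> p1_tet_shape(double xi, double eta, double zeta) {
  return {1.0 - xi - eta - zeta, xi, eta, zeta};
}

// Value of a vertex-based linear field at the parametric point.
double interpolate_p1(const std::array<double, 4>& vertex_values,
                      double xi, double eta, double zeta) {
  const std::array<double, 4> n = p1_tet_shape(xi, eta, zeta);
  double value = 0.0;
  for (int i = 0; i < 4; ++i) value += n[i] * vertex_values[i];
  return value;
}
```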
Configuring and building the omega_h master branch at commit eac504c on Perlmutter produces one warning message:
MPI Libraries not found, there may be linking errors! Try enabling shared
libraries '-DBUILD_SHARED_LIBS=ON'.
Full log is below:
-- Cray Programming Environment 2.7.11 CXX
-- CMAKE_VERSION: 3.20.5
-- Omega_h_VERSION: 9.33.0
-- USE_XSDK_DEFAULTS: OFF
-- BUILD_TESTING: OFF
-- BUILD_SHARED_LIBS: OFF
-- CMAKE_INSTALL_PREFIX: /global/homes/z/zhangc20/xgcm/install/omega_h/install
-- Omega_h_CHECK_BOUNDS: OFF
-- Omega_h_THROW: OFF
-- Omega_h_DATA:
-- Omega_h_USE_EGADS: OFF
-- EGADS_PREFIX:
-- Omega_h_USE_Kokkos: ON
-- Kokkos_PREFIX: /global/homes/z/zhangc20/xgcm/install/kokkos/install/lib64/cmake
-- Omega_h_USE_CUDA_AWARE_MPI: OFF
-- Omega_h_USE_SimModSuite: OFF
-- Omega_h_VALGRIND:
-- Omega_h_EXAMPLES: OFF
-- Omega_h_USE_ZLIB: ON
-- ZLIB_PREFIX:
-- Omega_h_USE_Kokkos: ON
-- Kokkos_PREFIX: /global/homes/z/zhangc20/xgcm/install/kokkos/install/lib64/cmake
-- Enabled Kokkos devices: CUDA;SERIAL
-- kokkos_launch_compiler is enabled globally. C++ compiler commands with -DKOKKOS_DEPENDENCE will be redirected to the appropriate compiler for Kokkos
-- Kokkos_CONFIG: /global/homes/z/zhangc20/xgcm/install/kokkos/install/lib64/cmake/Kokkos/KokkosConfig.cmake
-- Kokkos_VERSION: 3.4.01
-- Omega_h_USE_libMeshb: OFF
-- Omega_h_USE_Gmsh: OFF
-- Omega_h_USE_Gmodel: OFF
-- Omega_h_USE_SEACASExodus: OFF
-- Omega_h_USE_pybind11: OFF
-- Omega_h_USE_CUDA: on
-- Omega_h_USE_MPI: ON
-- Omega_h_USE_MPI: ON
-- MPI_PREFIX:
CMake Warning at CMakeLists.txt:102 (message):
MPI Libraries not found, there may be linking errors! Try enabling shared
libraries '-DBUILD_SHARED_LIBS=ON'.
-- Omega_h_USE_OpenMP: OFF
-- Omega_h_USE_CUDA: on
-- Omega_h_USE_DOLFIN: OFF
-- Omega_h_SEMVER = 9.33.0-sha.eac504c0+110110000000000
-- Configuring done
-- Generating done
The build itself succeeded, though. The script used is below:
module load PrgEnv-gnu
module load cpe-cuda
module load cmake/3.20.5
module load cuda/11.1.1
export cuda=$CUDA_DIR
export PATH=$cuda/bin:$PATH
export LD_LIBRARY_PATH=$cuda/lib64:$LD_LIBRARY_PATH
export installroot=$PWD
export kk=$installroot/kokkos/install
export srcroot=$installroot/../
export kksrc=$srcroot/kokkos
export oh=$installroot/omega_h/install
export ohsrc=$srcroot/omega_h
export OMPI_CXX=$kksrc/bin/nvcc_wrapper
cd $installroot
mkdir -p omega_h/build
cd omega_h/build
cmake $ohsrc -DCMAKE_BUILD_TYPE=Debug \
-DCMAKE_INSTALL_PREFIX=$oh -DBUILD_SHARED_LIBS=OFF \
-DOmega_h_USE_CUDA=on -DOmega_h_CUDA_ARCH=80 \
-DOmega_h_USE_Kokkos=ON -DOmega_h_USE_MPI=ON \
-DCMAKE_CXX_COMPILER=CC \
-DCMAKE_CUDA_FLAGS="-I$MPICH_DIR/include" \
-DKokkos_PREFIX=$kk/lib64/cmake/
make -j4 install
It seems that the storage/retrieval of tags by either topology type or dimension does not work as one would expect.
That's because those tags are stored and queried according to how they were added, either by topology type or by dimension.
For example, if the tags of dimension 3 (regions) are requested, the tags that were set on tets, hexes, etc. are not returned.
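One way to get the expected behavior is to let a dimension query also collect the tags that were added by topology type. The C++ sketch below is only an illustration under that assumption: Topo, topo_dim and TagRegistry are hypothetical and much simpler than Omega_h's actual tag storage.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical topology enum; not Omega_h's own definitions.
enum class Topo { Vertex, Edge, Triangle, Quad, Tet, Hex, Wedge, Pyramid };

// Dimension of each topology type.
int topo_dim(Topo t) {
  switch (t) {
    case Topo::Vertex: return 0;
    case Topo::Edge: return 1;
    case Topo::Triangle:
    case Topo::Quad: return 2;
    default: return 3;  // Tet, Hex, Wedge, Pyramid
  }
}

// Tags may be added either per dimension or per topology type.
struct TagRegistry {
  std::map<int, std::vector<std::string>> by_dim;
  std::map<Topo, std::vector<std::string>> by_topo;

  // Tags added on dimension dim, followed by the tags added on every topology
  // of that dimension, so a dim-3 query also sees tags set on tets, hexes, etc.
  std::vector<std::string> tags_of_dim(int dim) const {
    std::vector<std::string> out;
    auto it = by_dim.find(dim);
    if (it != by_dim.end()) out = it->second;
    for (const auto& entry : by_topo)
      if (topo_dim(entry.first) == dim)
        out.insert(out.end(), entry.second.begin(), entry.second.end());
    return out;
  }
};
```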
Following these instructions:
core was generated by `src/warp_test'.
Program terminated with signal SIGBUS, Bus error.
warning: Section `.reg-xstate/76107' in core file too small.
#0 0x00007fffe817ed4f in __memmove_avx_unaligned_erms () from /lib64/libc.so.6
[Current thread is 1 (Thread 0x7fffed81a300 (LWP 76107))]
(ins)(gdb) where
#0 0x00007fffe817ed4f in __memmove_avx_unaligned_erms () from /lib64/libc.so.6
#1 0x00007fffe9729b6c in MPIR_Localcopy () from /opt/cray/pe/lib64/libmpi_amd.so.12
#2 0x00007fffeb479223 in MPIDI_CRAY_Common_lmt_unpack () from /opt/cray/pe/lib64/libmpi_amd.so.12
#3 0x00007fffeb498a08 in MPIDI_CRAY_Common_lmt_ctrl_send_rts_cb () from /opt/cray/pe/lib64/libmpi_amd.so.12
#4 0x00007fffeb4716c8 in MPIDI_SHMI_progress () from /opt/cray/pe/lib64/libmpi_amd.so.12
#5 0x00007fffe9f6b7e9 in MPIR_Waitall_impl () from /opt/cray/pe/lib64/libmpi_amd.so.12
#6 0x00007fffe9fd19b1 in MPIR_Waitall () from /opt/cray/pe/lib64/libmpi_amd.so.12
#7 0x00007fffe9fd2eae in PMPI_Waitall () from /opt/cray/pe/lib64/libmpi_amd.so.12
#8 0x000000000114cb9d in Omega_h::Comm::alltoallv<int> (this=0x1b04f80, sendbuf_dev=..., sdispls_dev=..., rdispls_dev=..., width=1) at /ccs/home/cwsmith/omegahKk/test/omega_h/src/Omega_h_comm.cpp:557
#9 0x000000000117989c in Omega_h::Dist::exch<int> (this=this@entry=0x7fffffff6538, data=..., width=width@entry=1) at /ccs/home/cwsmith/omegahKk/test/omega_h/src/Omega_h_dist.cpp:118
#10 0x0000000001174221 in Omega_h::Dist::set_dest_idxs (this=this@entry=0x7fffffff6538, fitems2rroots=..., nrroots=nrroots@entry=3000) at /ccs/home/cwsmith/omegahKk/test/omega_h/src/Omega_h_dist.cpp:78
#11 0x00000000011734bb in Omega_h::Dist::Dist (this=0x7fffffff6538, comm_in=..., fitems2rroots=..., nrroots=3000) at /ccs/home/cwsmith/omegahKk/test/omega_h/src/Omega_h_dist.cpp:23
#12 0x0000000001199098 in Omega_h::bi_partition (comm=..., marks=...) at /ccs/home/cwsmith/omegahKk/test/omega_h/src/Omega_h_bipart.cpp:32
#13 0x0000000001193657 in Omega_h::inertia::recursively_bisect (comm=..., tolerance=<error reading variable: That operation is not available on integers of more than 8 bytes.>,
p_coords=p_coords@entry=0x7fffffff6658, p_masses=p_masses@entry=0x7fffffff6640, p_owners=p_owners@entry=0x7fffffff6670, p_hints=p_hints@entry=0x7fffffff66d0)
at /ccs/home/cwsmith/omegahKk/test/omega_h/src/Omega_h_inertia.cpp:181
#14 0x00000000011e0665 in Omega_h::Mesh::balance (this=0x7fffffff70c0, predictive=<optimized out>) at /ccs/home/cwsmith/omegahKk/test/omega_h/src/Omega_h_mesh.cpp:560
#15 0x00000000010f23b8 in Omega_h::build_box (comm=..., family=family@entry=OMEGA_H_SIMPLEX, x=<error reading variable: That operation is not available on integers of more than 8 bytes.>,
y=<error reading variable: That operation is not available on integers of more than 8 bytes.>, z=<error reading variable: That operation is not available on integers of more than 8 bytes.>, nx=nx@entry=10,
ny=ny@entry=10, nz=nz@entry=10, symmetric=<optimized out>) at /ccs/home/cwsmith/omegahKk/test/omega_h/src/Omega_h_build.cpp:147
#16 0x000000000104af37 in main (argc=<optimized out>, argv=<optimized out>) at /ccs/home/cwsmith/omegahKk/test/omega_h/src/warp_test.cpp:71
(ins)(gdb)
The following fails to compile using cuda 10.2:
#include "Omega_h_library.hpp"
#include <Omega_h_for.hpp>

static void foo() {
  auto coords = Omega_h::LOs(100);
  auto f = OMEGA_H_LAMBDA(Omega_h::LO v) {
    auto z = coords[v];
  };
  Omega_h::parallel_for(100, f);
}
int main(int argc, char** argv) {
  auto lib = Omega_h::Library(&argc, &argv);
  foo();
  return 0;
}
This fails with a very long compilation error (see below) on cranium.
Removing static from foo() resolves it. No idea why.
[100%] Building CUDA object src/CMakeFiles/boundary_field_test.dir/boundary_field_test.cpp.o
cd /space/cwsmith/testOmegahBuild/build-gcc74-cuda-omegah/src && /usr/local/cuda-10.2/bin/nvcc -forward-unknown-to-host-compiler -I/space/cwsmith/testOmegahBuild/omega_h/src -I/space/cwsmith/testOmegahBuild/build-gcc74-cuda-omegah/src --generate-code=arch=compute_75,code=[compute_75,sm_75] --compiler-options -W,-Wall,-Wextra,-Werror,-Wno-noexcept-type --Werror cross-execution-space-call,deprecated-declarations --expt-extended-lambda -std=c++14 -MD -MT src/CMakeFiles/boundary_field_test.dir/boundary_field_test.cpp.o -MF CMakeFiles/boundary_field_test.dir/boundary_field_test.cpp.o.d -x cu -c /space/cwsmith/testOmegahBuild/omega_h/src/boundary_field_test.cpp -o CMakeFiles/boundary_field_test.dir/boundary_field_test.cpp.o
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h: In instantiation of 'struct thrust::cuda_cub::core::has_sm_tuning<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::sm30>':
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:139:8: required from 'struct thrust::cuda_cub::core::specialize_plan_impl_match<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::typelist<thrust::cuda_cub::core::sm30> >'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:113:8: required from 'struct thrust::cuda_cub::core::specialize_plan_impl_loop<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::sm30, thrust::cuda_cub::core::typelist<thrust::cuda_cub::core::sm30> >'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:108:8: required from 'struct thrust::cuda_cub::core::specialize_plan_impl_loop<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::sm30, thrust::cuda_cub::core::typelist<thrust::cuda_cub::core::sm60, thrust::cuda_cub::core::sm52, thrust::cuda_cub::core::sm35, thrust::cuda_cub::core::sm30> >'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:158:8: [ skipping 3 instantiation contexts, use -ftemplate-backtrace-limit=0 to disable ]
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/parallel_for.h:139:65: required from 'cudaError_t thrust::cuda_cub::__parallel_for::parallel_for(Size, F, cudaStream_t) [with F = thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >; Size = int; cudaError_t = cudaError; cudaStream_t = CUstream_st*]'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/parallel_for.h:164:50: required from 'void thrust::cuda_cub::parallel_for(thrust::cuda_cub::execution_policy<Derived>&, F, Size) [with Derived = thrust::cuda_cub::par_t; F = thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >; Size = int]'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/for_each.h:79:23: required from 'Input thrust::cuda_cub::for_each_n(thrust::cuda_cub::execution_policy<Derived>&, Input, Size, UnaryOp) [with Derived = thrust::cuda_cub::par_t; Input = Omega_h::IntIterator; Size = int; UnaryOp = __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>]'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/for_each.h:103:28: required from 'Input thrust::cuda_cub::for_each(thrust::cuda_cub::execution_policy<Derived>&, Input, Input, UnaryOp) [with Derived = thrust::cuda_cub::par_t; Input = Omega_h::IntIterator; UnaryOp = __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>]'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/detail/for_each.inl:44:16: required from 'InputIterator thrust::for_each(const thrust::detail::execution_policy_base<DerivedPolicy>&, InputIterator, InputIterator, UnaryFunction) [with DerivedPolicy = thrust::cuda_cub::par_t; InputIterator = Omega_h::IntIterator; UnaryFunction = __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>]'
/space/cwsmith/testOmegahBuild/omega_h/src/Omega_h_for.hpp:39:17: required from 'void Omega_h::for_each(InputIterator, InputIterator, UnaryFunction&&) [with InputIterator = Omega_h::IntIterator; UnaryFunction = __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>&]'
/space/cwsmith/testOmegahBuild/omega_h/src/Omega_h_for.hpp:57:18: required from 'void Omega_h::parallel_for(Omega_h::LO, UnaryFunction&&) [with UnaryFunction = __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>&; Omega_h::LO = int]'
/space/cwsmith/testOmegahBuild/omega_h/src/boundary_field_test.cpp:9:29: required from here
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:132:8: error: 'thrust::cuda_cub::core::has_sm_tuning<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::sm30>' has a base 'thrust::cuda_cub::core::has_sm_tuning_impl<thrust::cuda_cub::core::sm30, thrust::cuda_cub::__parallel_for::Tuning<thrust::cuda_cub::core::sm30, thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> > > >' whose type uses the anonymous namespace [-Werror=subobject-linkage]
struct has_sm_tuning : has_sm_tuning_impl<SM, typename P<lowest_supported_sm_arch>::tuning > {};
^~~~~~~~~~~~~
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h: In instantiation of 'struct thrust::cuda_cub::core::specialize_plan_impl_match<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::typelist<thrust::cuda_cub::core::sm30> >':
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:113:8: required from 'struct thrust::cuda_cub::core::specialize_plan_impl_loop<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::sm30, thrust::cuda_cub::core::typelist<thrust::cuda_cub::core::sm30> >'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:108:8: required from 'struct thrust::cuda_cub::core::specialize_plan_impl_loop<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::sm30, thrust::cuda_cub::core::typelist<thrust::cuda_cub::core::sm60, thrust::cuda_cub::core::sm52, thrust::cuda_cub::core::sm35, thrust::cuda_cub::core::sm30> >'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:158:8: required from 'struct thrust::cuda_cub::core::specialize_plan<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::sm30>'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/parallel_for.h:81:6: [ skipping 2 instantiation contexts, use -ftemplate-backtrace-limit=0 to disable ]
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/parallel_for.h:139:65: required from 'cudaError_t thrust::cuda_cub::__parallel_for::parallel_for(Size, F, cudaStream_t) [with F = thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >; Size = int; cudaError_t = cudaError; cudaStream_t = CUstream_st*]'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/parallel_for.h:164:50: required from 'void thrust::cuda_cub::parallel_for(thrust::cuda_cub::execution_policy<Derived>&, F, Size) [with Derived = thrust::cuda_cub::par_t; F = thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >; Size = int]'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/for_each.h:79:23: required from 'Input thrust::cuda_cub::for_each_n(thrust::cuda_cub::execution_policy<Derived>&, Input, Size, UnaryOp) [with Derived = thrust::cuda_cub::par_t; Input = Omega_h::IntIterator; Size = int; UnaryOp = __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>]'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/for_each.h:103:28: required from 'Input thrust::cuda_cub::for_each(thrust::cuda_cub::execution_policy<Derived>&, Input, Input, UnaryOp) [with Derived = thrust::cuda_cub::par_t; Input = Omega_h::IntIterator; UnaryOp = __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>]'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/detail/for_each.inl:44:16: required from 'InputIterator thrust::for_each(const thrust::detail::execution_policy_base<DerivedPolicy>&, InputIterator, InputIterator, UnaryFunction) [with DerivedPolicy = thrust::cuda_cub::par_t; InputIterator = Omega_h::IntIterator; UnaryFunction = __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>]'
/space/cwsmith/testOmegahBuild/omega_h/src/Omega_h_for.hpp:39:17: required from 'void Omega_h::for_each(InputIterator, InputIterator, UnaryFunction&&) [with InputIterator = Omega_h::IntIterator; UnaryFunction = __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>&]'
/space/cwsmith/testOmegahBuild/omega_h/src/Omega_h_for.hpp:57:18: required from 'void Omega_h::parallel_for(Omega_h::LO, UnaryFunction&&) [with UnaryFunction = __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>&; Omega_h::LO = int]'
/space/cwsmith/testOmegahBuild/omega_h/src/boundary_field_test.cpp:9:29: required from here
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:139:8: error: 'thrust::cuda_cub::core::specialize_plan_impl_match<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::typelist<thrust::cuda_cub::core::sm30> >' has a base 'thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan<thrust::cuda_cub::core::sm30>' whose type uses the anonymous namespace [-Werror=subobject-linkage]
struct specialize_plan_impl_match<P, typelist<SM, _1, _2, _3, _4, _5, _6, _7, _8, _9> >
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h: In instantiation of 'struct thrust::cuda_cub::core::has_sm_tuning<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::sm60>':
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:139:8: required from 'struct thrust::cuda_cub::core::specialize_plan_impl_match<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::typelist<thrust::cuda_cub::core::sm60, thrust::cuda_cub::core::sm52, thrust::cuda_cub::core::sm35, thrust::cuda_cub::core::sm30> >'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:113:8: required from 'struct thrust::cuda_cub::core::specialize_plan_impl_loop<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::sm60, thrust::cuda_cub::core::typelist<thrust::cuda_cub::core::sm60, thrust::cuda_cub::core::sm52, thrust::cuda_cub::core::sm35, thrust::cuda_cub::core::sm30> >'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:158:8: required from 'struct thrust::cuda_cub::core::specialize_plan<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::sm60>'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:170:125: required by substitution of 'template<class S> static thrust::cuda_cub::core::has_temp_storage<thrust::cuda_cub::core::specialize_plan<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::sm60> >::yes_type thrust::cuda_cub::core::has_temp_storage<thrust::cuda_cub::core::specialize_plan<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::sm60> >::test<S>(typename S::TempStorage*) [with S = thrust::cuda_cub::core::specialize_plan<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::sm60>]'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:170:246: required from 'const bool thrust::cuda_cub::core::has_temp_storage<thrust::cuda_cub::core::specialize_plan<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::sm60> >::value'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:170:330: [ skipping 7 instantiation contexts, use -ftemplate-backtrace-limit=0 to disable ]
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/for_each.h:79:23: required from 'Input thrust::cuda_cub::for_each_n(thrust::cuda_cub::execution_policy<Derived>&, Input, Size, UnaryOp) [with Derived = thrust::cuda_cub::par_t; Input = Omega_h::IntIterator; Size = int; UnaryOp = __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>]'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/for_each.h:103:28: required from 'Input thrust::cuda_cub::for_each(thrust::cuda_cub::execution_policy<Derived>&, Input, Input, UnaryOp) [with Derived = thrust::cuda_cub::par_t; Input = Omega_h::IntIterator; UnaryOp = __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>]'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/detail/for_each.inl:44:16: required from 'InputIterator thrust::for_each(const thrust::detail::execution_policy_base<DerivedPolicy>&, InputIterator, InputIterator, UnaryFunction) [with DerivedPolicy = thrust::cuda_cub::par_t; InputIterator = Omega_h::IntIterator; UnaryFunction = __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>]'
/space/cwsmith/testOmegahBuild/omega_h/src/Omega_h_for.hpp:39:17: required from 'void Omega_h::for_each(InputIterator, InputIterator, UnaryFunction&&) [with InputIterator = Omega_h::IntIterator; UnaryFunction = __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>&]'
/space/cwsmith/testOmegahBuild/omega_h/src/Omega_h_for.hpp:57:18: required from 'void Omega_h::parallel_for(Omega_h::LO, UnaryFunction&&) [with UnaryFunction = __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>&; Omega_h::LO = int]'
/space/cwsmith/testOmegahBuild/omega_h/src/boundary_field_test.cpp:9:29: required from here
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:132:8: error: 'thrust::cuda_cub::core::has_sm_tuning<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::sm60>' has a base 'thrust::cuda_cub::core::has_sm_tuning_impl<thrust::cuda_cub::core::sm60, thrust::cuda_cub::__parallel_for::Tuning<thrust::cuda_cub::core::sm30, thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> > > >' whose type uses the anonymous namespace [-Werror=subobject-linkage]
struct has_sm_tuning : has_sm_tuning_impl<SM, typename P<lowest_supported_sm_arch>::tuning > {};
^~~~~~~~~~~~~
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h: In instantiation of 'struct thrust::cuda_cub::core::has_sm_tuning<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::sm52>':
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:139:8: required from 'struct thrust::cuda_cub::core::specialize_plan_impl_match<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::typelist<thrust::cuda_cub::core::sm60, thrust::cuda_cub::core::sm52, thrust::cuda_cub::core::sm35, thrust::cuda_cub::core::sm30> >'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:113:8: required from 'struct thrust::cuda_cub::core::specialize_plan_impl_loop<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::sm60, thrust::cuda_cub::core::typelist<thrust::cuda_cub::core::sm60, thrust::cuda_cub::core::sm52, thrust::cuda_cub::core::sm35, thrust::cuda_cub::core::sm30> >'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:158:8: required from 'struct thrust::cuda_cub::core::specialize_plan<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::sm60>'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:170:125: required by substitution of 'template<class S> static thrust::cuda_cub::core::has_temp_storage<thrust::cuda_cub::core::specialize_plan<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::sm60> >::yes_type thrust::cuda_cub::core::has_temp_storage<thrust::cuda_cub::core::specialize_plan<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::sm60> >::test<S>(typename S::TempStorage*) [with S = thrust::cuda_cub::core::specialize_plan<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::sm60>]'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:170:246: [ skipping 7 instantiation contexts, use -ftemplate-backtrace-limit=0 to disable ]
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/parallel_for.h:164:50: required from 'void thrust::cuda_cub::parallel_for(thrust::cuda_cub::execution_policy<Derived>&, F, Size) [with Derived = thrust::cuda_cub::par_t; F = thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >; Size = int]'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/for_each.h:79:23: required from 'Input thrust::cuda_cub::for_each_n(thrust::cuda_cub::execution_policy<Derived>&, Input, Size, UnaryOp) [with Derived = thrust::cuda_cub::par_t; Input = Omega_h::IntIterator; Size = int; UnaryOp = __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>]'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/for_each.h:103:28: required from 'Input thrust::cuda_cub::for_each(thrust::cuda_cub::execution_policy<Derived>&, Input, Input, UnaryOp) [with Derived = thrust::cuda_cub::par_t; Input = Omega_h::IntIterator; UnaryOp = __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>]'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/detail/for_each.inl:44:16: required from 'InputIterator thrust::for_each(const thrust::detail::execution_policy_base<DerivedPolicy>&, InputIterator, InputIterator, UnaryFunction) [with DerivedPolicy = thrust::cuda_cub::par_t; InputIterator = Omega_h::IntIterator; UnaryFunction = __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>]'
/space/cwsmith/testOmegahBuild/omega_h/src/Omega_h_for.hpp:39:17: required from 'void Omega_h::for_each(InputIterator, InputIterator, UnaryFunction&&) [with InputIterator = Omega_h::IntIterator; UnaryFunction = __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>&]'
/space/cwsmith/testOmegahBuild/omega_h/src/Omega_h_for.hpp:57:18: required from 'void Omega_h::parallel_for(Omega_h::LO, UnaryFunction&&) [with UnaryFunction = __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>&; Omega_h::LO = int]'
/space/cwsmith/testOmegahBuild/omega_h/src/boundary_field_test.cpp:9:29: required from here
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:132:8: error: 'thrust::cuda_cub::core::has_sm_tuning<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::sm52>' has a base 'thrust::cuda_cub::core::has_sm_tuning_impl<thrust::cuda_cub::core::sm52, thrust::cuda_cub::__parallel_for::Tuning<thrust::cuda_cub::core::sm30, thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> > > >' whose type uses the anonymous namespace [-Werror=subobject-linkage]
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h: In instantiation of 'struct thrust::cuda_cub::core::has_sm_tuning<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::sm35>':
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:139:8: required from 'struct thrust::cuda_cub::core::specialize_plan_impl_match<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::typelist<thrust::cuda_cub::core::sm60, thrust::cuda_cub::core::sm52, thrust::cuda_cub::core::sm35, thrust::cuda_cub::core::sm30> >'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:113:8: required from 'struct thrust::cuda_cub::core::specialize_plan_impl_loop<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::sm60, thrust::cuda_cub::core::typelist<thrust::cuda_cub::core::sm60, thrust::cuda_cub::core::sm52, thrust::cuda_cub::core::sm35, thrust::cuda_cub::core::sm30> >'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:158:8: required from 'struct thrust::cuda_cub::core::specialize_plan<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::sm60>'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:170:125: [ skipping 7 instantiation contexts, use -ftemplate-backtrace-limit=0 to disable ]
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/parallel_for.h:139:65: required from 'cudaError_t thrust::cuda_cub::__parallel_for::parallel_for(Size, F, cudaStream_t) [with F = thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >; Size = int; cudaError_t = cudaError; cudaStream_t = CUstream_st*]'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/parallel_for.h:164:50: required from 'void thrust::cuda_cub::parallel_for(thrust::cuda_cub::execution_policy<Derived>&, F, Size) [with Derived = thrust::cuda_cub::par_t; F = thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >; Size = int]'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/for_each.h:79:23: required from 'Input thrust::cuda_cub::for_each_n(thrust::cuda_cub::execution_policy<Derived>&, Input, Size, UnaryOp) [with Derived = thrust::cuda_cub::par_t; Input = Omega_h::IntIterator; Size = int; UnaryOp = __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>]'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/for_each.h:103:28: required from 'Input thrust::cuda_cub::for_each(thrust::cuda_cub::execution_policy<Derived>&, Input, Input, UnaryOp) [with Derived = thrust::cuda_cub::par_t; Input = Omega_h::IntIterator; UnaryOp = __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>]'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/detail/for_each.inl:44:16: required from 'InputIterator thrust::for_each(const thrust::detail::execution_policy_base<DerivedPolicy>&, InputIterator, InputIterator, UnaryFunction) [with DerivedPolicy = thrust::cuda_cub::par_t; InputIterator = Omega_h::IntIterator; UnaryFunction = __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>]'
/space/cwsmith/testOmegahBuild/omega_h/src/Omega_h_for.hpp:39:17: required from 'void Omega_h::for_each(InputIterator, InputIterator, UnaryFunction&&) [with InputIterator = Omega_h::IntIterator; UnaryFunction = __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>&]'
/space/cwsmith/testOmegahBuild/omega_h/src/Omega_h_for.hpp:57:18: required from 'void Omega_h::parallel_for(Omega_h::LO, UnaryFunction&&) [with UnaryFunction = __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>&; Omega_h::LO = int]'
/space/cwsmith/testOmegahBuild/omega_h/src/boundary_field_test.cpp:9:29: required from here
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:132:8: error: 'thrust::cuda_cub::core::has_sm_tuning<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::sm35>' has a base 'thrust::cuda_cub::core::has_sm_tuning_impl<thrust::cuda_cub::core::sm35, thrust::cuda_cub::__parallel_for::Tuning<thrust::cuda_cub::core::sm30, thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> > > >' whose type uses the anonymous namespace [-Werror=subobject-linkage]
cc1plus: all warnings being treated as errors
make[3]: *** [src/CMakeFiles/boundary_field_test.dir/boundary_field_test.cpp.o] Error 1
make[3]: Leaving directory `/space/cwsmith/testOmegahBuild/build-gcc74-cuda-omegah'
make[2]: *** [src/CMakeFiles/boundary_field_test.dir/all] Error 2
make[2]: Leaving directory `/space/cwsmith/testOmegahBuild/build-gcc74-cuda-omegah'
make[1]: *** [src/CMakeFiles/boundary_field_test.dir/rule] Error 2
make[1]: Leaving directory `/space/cwsmith/testOmegahBuild/build-gcc74-cuda-omegah'
make: *** [boundary_field_test] Error 2
On ALCF Polaris using the nvhpc 23.9.0 compilers, this macro:
Lines 16 to 22 in 0dcdfee
results in #pragma GCC system_header being added to a handful of header files. The compiler/preprocessor outputs the following warning:
omega_h/src/Omega_h_mpi.h(8): warning #2320-D: #pragma GCC system_header cannot be used in the primary source file
_Pragma("GCC system_header")
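One way to avoid the warning is to stop emitting the pragma when the NVIDIA HPC compiler is detected. The sketch below shows that approach under some assumptions. The macro names OSH_SKETCH_SYSTEM_HEADER and OSH_SKETCH_SYSTEM_HEADER_ENABLED and the helper system_header_pragma_enabled() are hypothetical and are not the macro in the lines referenced above. The sketch also assumes that nvc++ defines __NVCOMPILER and, for GCC compatibility, also defines __GNUC__, which is why __NVCOMPILER is tested first.

```cpp
#include <cassert>

// Hypothetical macro names for illustration only; the real Omega_h macro
// may be structured differently.
// Assumption: nvc++ defines __NVCOMPILER and may also define __GNUC__,
// so the nvhpc check must come before the GNU check.
#if defined(__NVCOMPILER)
// Emit nothing: nvhpc reports warning #2320-D for this pragma.
#define OSH_SKETCH_SYSTEM_HEADER
#define OSH_SKETCH_SYSTEM_HEADER_ENABLED 0
#elif defined(__GNUC__)
// GCC (and Clang, which also defines __GNUC__) accept this pragma in headers.
#define OSH_SKETCH_SYSTEM_HEADER _Pragma("GCC system_header")
#define OSH_SKETCH_SYSTEM_HEADER_ENABLED 1
#else
#define OSH_SKETCH_SYSTEM_HEADER
#define OSH_SKETCH_SYSTEM_HEADER_ENABLED 0
#endif

// Reports which branch was selected at compile time.
constexpr bool system_header_pragma_enabled() {
  return OSH_SKETCH_SYSTEM_HEADER_ENABLED != 0;
}
```

A header would then place OSH_SKETCH_SYSTEM_HEADER near its top. Under GCC or Clang it expands to the pragma. Under nvhpc it expands to nothing, so warnings from that header are no longer silenced for nvhpc builds, but the #2320-D warning is avoided.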
While building on Summit with CMake 3.23.2, gcc 11.2.0, and CUDA 11.7.1, the CMake configuration step produced the following warning message:
-- CMAKE_VERSION: 3.23.2
-- Omega_h_VERSION: 10.5.0
-- USE_XSDK_DEFAULTS: OFF
-- BUILD_TESTING: OFF
-- BUILD_SHARED_LIBS: OFF
-- CMAKE_INSTALL_PREFIX: /ccs/home/zhangc20/pumi-pic/install_cuda11.7.1/omega_h/install
-- Omega_h_CHECK_BOUNDS: OFF
-- Omega_h_THROW: OFF
-- Omega_h_DATA:
-- Omega_h_USE_EGADS: OFF
-- EGADS_PREFIX:
-- Omega_h_USE_Kokkos: ON
-- Kokkos_PREFIX: /ccs/home/zhangc20/pumi-pic/install_cuda11.7.1/kokkos/install/lib64/cmake
-- Omega_h_USE_CUDA_AWARE_MPI: OFF
-- Omega_h_USE_SimModSuite: OFF
-- Omega_h_VALGRIND:
-- Omega_h_EXAMPLES: OFF
-- Omega_h_USE_MPI: on
-- Omega_h_ENABLE_DEMANGLED_STACKTRACE: OFF
-- Omega_h_DBG: OFF
-- Omega_h_USE_MPI: on
-- MPI_PREFIX:
-- Omega_h_USE_ZLIB: ON
-- ZLIB_PREFIX:
-- Omega_h_USE_Kokkos: ON
-- Kokkos_PREFIX: /ccs/home/zhangc20/pumi-pic/install_cuda11.7.1/kokkos/install/lib64/cmake
-- Enabled Kokkos devices: CUDA;SERIAL
-- kokkos_launch_compiler is enabled globally. C++ compiler commands with -DKOKKOS_DEPENDENCE will be redirected to the appropriate compiler for Kokkos
-- Kokkos_CONFIG: /ccs/home/zhangc20/pumi-pic/install_cuda11.7.1/kokkos/install/lib64/cmake/Kokkos/KokkosConfig.cmake
-- Kokkos_VERSION: 3.7.01
-- Omega_h_USE_libMeshb: OFF
-- Omega_h_USE_Gmsh: OFF
-- Omega_h_USE_Gmodel: OFF
-- Omega_h_USE_SEACASExodus: OFF
-- Omega_h_USE_pybind11: OFF
-- Omega_h_USE_CUDA: on
-- Omega_h_USE_MPI: on
-- Omega_h_USE_OpenMP: OFF
-- Omega_h_USE_CUDA: on
-- Omega_h_USE_DOLFIN: OFF
-- Omega_h_SEMVER = 10.5.0-sha.e1be29b0+100101100000000000
-- Configuring done
CMake Warning at src/CMakeLists.txt:252 (add_executable):
Cannot generate a safe runtime search path for target msh2osh because files
in some directories may conflict with libraries in implicit directories:
runtime library [libcuda.so.1] in /usr/lib64 may be hidden by files in:
/sw/summit/cuda/11.7.1/lib64
Some of these libraries may not be found correctly.
Call Stack (most recent call first):
src/CMakeLists.txt:261 (osh_add_exe)
src/CMakeLists.txt:265 (osh_add_util)
CMake Warning at src/CMakeLists.txt:252 (add_executable):
Cannot generate a safe runtime search path for target osh2vtk because files
in some directories may conflict with libraries in implicit directories:
runtime library [libcuda.so.1] in /usr/lib64 may be hidden by files in:
/sw/summit/cuda/11.7.1/lib64
Some of these libraries may not be found correctly.
Call Stack (most recent call first):
src/CMakeLists.txt:261 (osh_add_exe)
src/CMakeLists.txt:266 (osh_add_util)
CMake Warning at src/CMakeLists.txt:252 (add_executable):
Cannot generate a safe runtime search path for target oshdiff because files
in some directories may conflict with libraries in implicit directories:
runtime library [libcuda.so.1] in /usr/lib64 may be hidden by files in:
/sw/summit/cuda/11.7.1/lib64
Some of these libraries may not be found correctly.
Call Stack (most recent call first):
src/CMakeLists.txt:261 (osh_add_exe)
src/CMakeLists.txt:267 (osh_add_util)
.....
-- Generating done
-- Build files have been written to: /ccs/home/zhangc20/pumi-pic/install_cuda11.7.1/omega_h/build
Note: the warning output is truncated. The omitted warnings repeat the same message for other targets and differ only in the line number of this call-stack entry:
src/CMakeLists.txt:267 (osh_add_util)
The build itself completed successfully. I am reporting this here in case it indicates a real problem.
On Summit, adding -DCMAKE_BUILD_TYPE=Release to the build script causes an error at the installation stage:
[100%] Built target ascii_vtk2osh
Install the project...
-- Install configuration: "Release"
CMake Error at cmake_install.cmake:58 (file):
file cannot create directory: /usr/local/lib/cmake/Omega_h. Maybe need
administrative privileges.
make: *** [install] Error 1
The build script is as below:
module load gcc/7.4.0
module load cuda/10.1.243
module load cmake
export cuda=$CUDA_DIR
export PATH=$cuda/bin:$PATH
export LD_LIBRARY_PATH=$cuda/lib64:$LD_LIBRARY_PATH
export installroot=$PWD
export kk=$installroot/kokkos/install # this is where kokkos will be installed
export srcroot=$installroot/../
export kksrc=$srcroot/kokkos
export OMPI_CXX=$kksrc/bin/nvcc_wrapper
export oh=$installroot/omega_h/install # this is where omega_h will be installed
export ohsrc=$srcroot/omega_h
cd $installroot
mkdir -p omega_h/build
cd omega_h/build
cmake $ohsrc -DCMAKE_BUILD_TYPE=Release
-DCMAKE_INSTALL_PREFIX=$oh -DBUILD_SHARED_LIBS=ON \
-DOmega_h_USE_CUDA=on -DOmega_h_CUDA_ARCH=70 \
-DOmega_h_USE_Kokkos=ON -DOmega_h_USE_MPI=on \
-DCMAKE_CXX_COMPILER=mpiCC \
-DKokkos_PREFIX=$kk/lib64/cmake/
make -j4 install
After deleting -DCMAKE_BUILD_TYPE=Release, everything works fine. What is the reason?
The following compilation error is reproducible on the SCOREC rhel7 and Aimos rhel8 systems using omegah d56a86e. The build instructions are listed below.
The Thrust version is 1.10.0 (packaged with CUDA Toolkit 11.2), according to https://github.com/NVIDIA/thrust.
Dan reported it already! NVIDIA/thrust#1332
[ 0%] Building CUDA object src/CMakeFiles/omega_h.dir/Omega_h_int_scan.cpp.o
cd /space/cwsmith/omegahCuda11/buildCuda11/src && /usr/local/cuda-11.2/bin/nvcc -forward-unknown-to-host-compiler -Domega_h_EXPORTS -I/space/cwsmith/omegahCuda11/omega_h/src -I/space/cw
/usr/local/cuda-11.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/scan.h(578): error: array of reference is not allowed
detected during:
instantiation of class "thrust::cuda_cub::__scan::DoNothing<T> [with T=const Omega_h::LO &]"
(784): here
instantiation of "OutputIt thrust::cuda_cub::inclusive_scan_n(thrust::cuda_cub::execution_policy<Derived> &, InputIt, Size, OutputIt, ScanOp) [with Derived=thrust::cuda_cub:
/usr/local/cuda-11.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/transform_scan.h(72): here
instantiation of "OutputIt thrust::cuda_cub::transform_inclusive_scan(thrust::cuda_cub::execution_policy<Derived> &, InputIt, InputIt, OutputIt, TransformOp, ScanOp) [with D
/usr/local/cuda-11.2/bin/../targets/x86_64-linux/include/thrust/detail/transform_scan.inl(47): here
instantiation of "OutputIterator thrust::transform_inclusive_scan(const thrust::detail::execution_policy_base<DerivedPolicy> &, InputIterator, InputIterator, OutputIterator,
/space/cwsmith/omegahCuda11/omega_h/src/Omega_h_scan.hpp(83): here
instantiation of "OutputIterator Omega_h::transform_inclusive_scan(InputIterator, InputIterator, OutputIterator, BinaryOp, UnaryOp) [with InputIterator=Omega_h::LO *, Output
/space/cwsmith/omegahCuda11/omega_h/src/Omega_h_int_scan.cpp(32): here
The following was tested on cranium.
env
module use /opt/scorec/spack/dev/lmod/linux-rhel7-x86_64/Core
module unuse /opt/scorec/spack/lmod/linux-rhel7-x86_64/Core
module load gcc/7.4.0-c5aaloy cuda/11.2
module load mpich/3.3.1-bfezl2l
module load cmake
cmake
cmake ../omega_h \
-DCMAKE_INSTALL_PREFIX=$PWD/install \
-DBUILD_TESTING=on \
-DOmega_h_USE_CUDA=on \
-DOmega_h_CUDA_ARCH=75 \
-DOmega_h_USE_MPI=on \
-DBUILD_SHARED_LIBS=ON
The following was tested on dcs217 (a rhel8 fen).
env
module load spectrum-mpi/10.4
module load cmake/3.20.0
module load cuda/11.2
cmake
cmake ../omega_h \
-DCMAKE_INSTALL_PREFIX=$oh \
-DBUILD_SHARED_LIBS=OFF \
-DOmega_h_USE_Kokkos=ON \
-DOmega_h_USE_CUDA=on \
-DOmega_h_CUDA_ARCH=70 \
-DOmega_h_USE_MPI=on \
-DBUILD_TESTING=on \
-DCMAKE_CXX_COMPILER=g++ \
-DKokkos_PREFIX=$kk/lib64/cmake
In the describe tool, we should add the following info on each tag:
This is a tracking issue for sandialabs#389. I will commit fixes upstream.
warp_test_parallel and rc_field_testp are failing with the following error:
(GTL DEBUG: 0) cuIpcGetMemHandle: invalid argument, CUDA_ERROR_INVALID_VALUE, line no 148
MPICH ERROR [Rank 0] [job id 26603385.1] [Sat Jun 8 16:52:33 2024] [nid001264] - Abort(942257666) (rank 0 in comm 0): Fatal error in PMPI_Isend: Invalid count, error stack:
PMPI_Isend(161)......................: MPI_Isend(buf=0x3200fb5c0, count=9000, MPI_DOUBLE, dest=1, tag=42, comm=0xc4000004, request=0x3bbfdf4) failed
MPID_Isend(584)......................:
MPIDI_isend_unsafe(136)..............:
MPIDI_SHM_mpi_isend(323).............:
MPIDI_CRAY_Common_lmt_isend(84)......:
MPIDI_CRAY_Common_lmt_export_mem(103):
(unknown)(): Invalid count
aborting job:
Fatal error in PMPI_Isend: Invalid count, error stack:
PMPI_Isend(161)......................: MPI_Isend(buf=0x3200fb5c0, count=9000, MPI_DOUBLE, dest=1, tag=42, comm=0xc4000004, request=0x3bbfdf4) failed
MPID_Isend(584)......................:
MPIDI_isend_unsafe(136)..............:
MPIDI_SHM_mpi_isend(323).............:
MPIDI_CRAY_Common_lmt_isend(84)......:
MPIDI_CRAY_Common_lmt_export_mem(103):
(unknown)(): Invalid count
Both tests output valid results for a while before failing. Follow these instructions to reproduce:
environment script:
export root=$PWD
module load PrgEnv-gnu
module load cudatoolkit
module load cmake
export kk=$root/build-kokkos/install # This is where kokkos will be (or is) installed
export oh=$root/build-omega_h/install # This is where omega_h will be (or is) installed
export CMAKE_PREFIX_PATH=$kk:$kk/lib64/cmake:$oh:$CMAKE_PREFIX_PATH
export MPICH_CXX=$root/kokkos/bin/nvcc_wrapper
export SLURM_CPU_BIND="cores"
install script:
#kokkos
rm ${kk%%install} -rf
rm kokkos -rf
git clone -b 4.2.00 https://github.com/kokkos/kokkos.git
mkdir -p $kk
cmake -S kokkos -B ${kk%%install} \
-DCMAKE_INSTALL_PREFIX=$kk \
-DCMAKE_BUILD_TYPE="Release" \
-DCMAKE_CXX_COMPILER=$root/kokkos/bin/nvcc_wrapper \
-DKokkos_ARCH_AMPERE80=ON \
-DKokkos_ENABLE_SERIAL=ON \
-DKokkos_ENABLE_OPENMP=off \
-DKokkos_ENABLE_CUDA=on \
-DKokkos_ENABLE_CUDA_LAMBDA=on \
-DKokkos_ENABLE_DEBUG=off
cmake --build ${kk%%install} -j 24 --target install
#omegah
rm ${oh%%install} -rf
rm omega_h -rf
mkdir -p $oh
git clone https://github.com/SCOREC/omega_h.git
cd omega_h && git checkout $1 && cd -
cmake -S omega_h -B ${oh%%install} \
-DCMAKE_INSTALL_PREFIX=$oh \
-DCMAKE_BUILD_TYPE=Release \
-DBUILD_SHARED_LIBS=off \
-DOmega_h_USE_Kokkos=ON \
-DOmega_h_USE_CUDA=on \
-DOmega_h_CUDA_ARCH=80 \
-DOmega_h_USE_MPI=on \
-DMPIEXEC_EXECUTABLE=srun \
-DBUILD_TESTING=on \
-DCMAKE_C_COMPILER=cc \
-DCMAKE_CXX_COMPILER=CC \
-DOmega_h_USE_CUDA_AWARE_MPI=ON \
-DKokkos_PREFIX=$kk/lib64/cmake
cmake --build ${oh%%install} -j 24 --target install
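The install script derives each build directory from its install prefix with the parameter expansion ${var%%install}, which strips a trailing "install" from the value. A short illustration follows; the value of root below is hypothetical.

```shell
#!/bin/bash
# Show how "${var%%install}" turns an install prefix into its build directory
# by removing a trailing "install" from the value.
root=/tmp/example                 # hypothetical $root, for illustration only
kk=$root/build-kokkos/install
oh=$root/build-omega_h/install
echo "${kk%%install}"             # -> /tmp/example/build-kokkos/
echo "${oh%%install}"             # -> /tmp/example/build-omega_h/
```

If a prefix does not end in "install", the expansion returns it unchanged, so rm ${kk%%install} -rf would then remove the install prefix itself. Quoting the expansion would also protect against paths that contain spaces.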
allocation script:
salloc --time 00:20:00 --constraint gpu --qos=interactive --nodes=1 --ntasks-per-node=40 --cpus-per-task=1 --gpus=1 --account=XXX
$ module li
Currently Loaded Modules:
1) craype-x86-milan 3) craype-network-ofi 5) PrgEnv-gnu/8.5.0 7) cray-libsci/23.12.5 9) craype/2.7.30 11) perftools-base/23.12.0 13) cudatoolkit/12.2 15) gpu/1.0
2) libfabric/1.15.2.0 4) xpmem/2.6.2-2.5_2.38__gd067c3f.shasta 6) cray-dsmml/0.2.2 8) cray-mpich/8.1.28 10) gcc-native/12.3 12) cpe/23.12 14) craype-accel-nvidia80
$ cat doConfigPerlKk.sh
bdir=$PWD/build-kokkos
cmake -S kokkos -B $bdir \
-DCMAKE_BUILD_TYPE=Release \
-DBUILD_SHARED_LIBS=ON \
-DCRAYPE_LINK_TYPE=dynamic \
-DCMAKE_CXX_COMPILER=$PWD/kokkos/bin/nvcc_wrapper \
-DKokkos_ARCH_AMPERE80=ON \
-DKokkos_ENABLE_SERIAL=ON \
-DKokkos_ENABLE_OPENMP=off \
-DKokkos_ENABLE_CUDA=on \
-DKokkos_ENABLE_CUDA_LAMBDA=on \
-DKokkos_ENABLE_DEBUG=off \
-DCMAKE_INSTALL_PREFIX=$bdir/install
$ cat doConfigPerlOmegah.sh
#!/bin/bash -ex
usage="Usage: $0 <mpi=on|off> <cudaAware=on|off>"
[[ $# -ne 2 ]] && echo $usage && exit 1
mpi=$1
[[ $mpi != "on" && $mpi != "off" ]] && echo $usage && exit 1
cudaAware=$2
[[ $cudaAware != "on" && $cudaAware != "off" ]] && echo $usage && exit 1
bdir=$PWD/build-omegah-mpi${mpi}-cudaAware${cudaAware}
cmake -S omega_h -B $bdir \
-DCMAKE_INSTALL_PREFIX=$bdir/install \
-DCMAKE_BUILD_TYPE=Release \
-DBUILD_SHARED_LIBS=on \
-DOmega_h_USE_Kokkos=on \
-DOmega_h_CUDA_ARCH=80 \
-DOmega_h_USE_MPI=$mpi \
-DOmega_h_USE_CUDA_AWARE_MPI=$cudaAware \
-DBUILD_TESTING=on \
-DCMAKE_CXX_COMPILER=CC
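The two build trees that runP2.sh uses follow the naming scheme in doConfigPerlOmegah.sh. They therefore correspond to running it as ./doConfigPerlOmegah.sh on off and ./doConfigPerlOmegah.sh on on. The sketch below only reproduces that naming; omegah_bdir is a hypothetical helper, not part of the scripts.

```shell
#!/bin/bash
# Reproduce the build-directory name that doConfigPerlOmegah.sh creates
# for a given <mpi> and <cudaAware> pair (without the $PWD prefix).
omegah_bdir() {
  local mpi=$1 cudaAware=$2
  echo "build-omegah-mpi${mpi}-cudaAware${cudaAware}"
}

# The two builds referenced by runP2.sh:
omegah_bdir on off   # -> build-omegah-mpion-cudaAwareoff
omegah_bdir on on    # -> build-omegah-mpion-cudaAwareon
```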
Download the Omega_h delta wing meshes: https://zenodo.org/records/10672130
$ cat submitP2.sh
sbatch --nodes 1 --qos regular --time 00:10:00 --constraint gpu --gpus 4 --account=PROJECT_NAME ./runP2.sh
$ cat runP2.sh
#!/bin/bash
bin_cudaAwareOff=/pscratch/sd/c/cwsmith/omegahDeltaWingAdapt/twoGpus/build-omegah-mpion-cudaAwareoff/src
bin_cudaAwareOn=/pscratch/sd/c/cwsmith/omegahDeltaWingAdapt/twoGpus/build-omegah-mpion-cudaAwareon/src
mesh=/pscratch/sd/c/cwsmith/omegahDeltaWingAdapt/twoGpus/deltaWing_500kMetric_p2.osh
cmd="$bin_cudaAwareOff/ugawg_hsc_oshmeshload --osh-pool $mesh"
export MPICH_GPU_SUPPORT_ENABLED=0
set -x
srun -n 2 $cmd &> log2p_cudaAwareOff
set +x
cmd="$bin_cudaAwareOn/ugawg_hsc_oshmeshload --osh-pool $mesh"
export MPICH_GPU_SUPPORT_ENABLED=1
set -x
srun -n 2 $cmd &> log2p_cudaAwareOn
set +x
$ cat log2p_cudaAwareOn
(GTL DEBUG: 0) cuIpcGetMemHandle: invalid argument, CUDA_ERROR_INVALID_VALUE, line no 148
MPICH ERROR [Rank 0] [job id 22622708.1] [Wed Mar 6 07:48:56 2024] [nid002241] - Abort(606713346) (rank 0 in comm 0): Fatal error in PMPI_Isend: Invalid count, error stack:
PMPI_Isend(161)......................: MPI_Isend(buf=0x623196f88, count=2382, MPI_INT, dest=1, tag=42, comm=0xc4000000, request=0x23c3f34) failed
MPID_Isend(584)......................:
MPIDI_isend_unsafe(136)..............:
MPIDI_SHM_mpi_isend(323).............:
MPIDI_CRAY_Common_lmt_isend(84)......:
MPIDI_CRAY_Common_lmt_export_mem(103):
(unknown)(): Invalid count
aborting job:
Fatal error in PMPI_Isend: Invalid count, error stack:
PMPI_Isend(161)......................: MPI_Isend(buf=0x623196f88, count=2382, MPI_INT, dest=1, tag=42, comm=0xc4000000, request=0x23c3f34) failed
MPID_Isend(584)......................:
MPIDI_isend_unsafe(136)..............:
MPIDI_SHM_mpi_isend(323).............:
MPIDI_CRAY_Common_lmt_isend(84)......:
MPIDI_CRAY_Common_lmt_export_mem(103):
(unknown)(): Invalid count
Kokkos::Cuda ERROR: Failed to call Kokkos::Cuda::finalize()
srun: error: nid002241: task 0: Exited with exit code 255
srun: Terminating StepId=22622708.1
slurmstepd: error: *** STEP 22622708.1 ON nid002241 CANCELLED AT 2024-03-06T15:48:58 ***
srun: error: nid002241: task 1: Terminated
srun: Force Terminated StepId=22622708.1
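To compare runs quickly, you can scan the log files for the two error signatures seen above: the MPICH abort and the CUDA_ERROR_INVALID_VALUE from cuIpcGetMemHandle. Below is a minimal sketch. summarize_logs is a hypothetical helper, and "ok" only means that neither pattern matched, not that the run succeeded.

```shell
#!/bin/bash
# Classify each log file by the error signatures seen in the CUDA-aware MPI runs.
# Usage: summarize_logs <log>...
summarize_logs() {
  local log
  for log in "$@"; do
    if [[ ! -f $log ]]; then
      echo "missing $log"
    elif grep -q -e 'MPICH ERROR' -e 'CUDA_ERROR_INVALID_VALUE' "$log"; then
      echo "error $log"
    else
      echo "ok $log"   # none of the known patterns matched
    fi
  done
}
```

For example, run summarize_logs log2p_cudaAwareOff log2p_cudaAwareOn after runP2.sh completes.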