


This project forked from sandialabs/omega_h


Simplex mesh adaptivity for HPC

License: Other

CMake 2.92% Shell 0.61% Perl 1.70% C++ 91.59% GLSL 0.04% Python 0.12% C 2.15% Objective-C 0.87%
kokkos

omega_h's Introduction


Omega_h

Reliable mesh adaptation

Omega_h is a C++14 library that implements tetrahedron and triangle mesh adaptivity, with a focus on scalable HPC performance using (optionally) MPI and OpenMP or CUDA. It is intended to provide adaptive functionality to existing simulation codes. Mesh adaptivity allows one to minimize both the discretization error and the number of degrees of freedom live during the simulation, and it enables moving-object and evolving-geometry simulations. Omega_h will do this for you in a way that is fast, memory-efficient, and portable across many different architectures.

Extensions

This fork of Omega_h from https://github.com/sandialabs/omega_h v9.34.12 adds support for:

  • loading serial Simmetrix meshes
  • reverse classification, and defining tags on sets of mesh entities defined by a reverse classification relation
  • representing single-process mixed meshes
  • domains with matched/periodic entities

Installing / Getting started

For a bare minimum setup with no parallelism, you just need CMake, a C++14 compiler, and preferably ZLib installed.

git clone git@github.com:SCOREC/omega_h.git
cd omega_h
cmake . -DCMAKE_INSTALL_PREFIX=/your/choice
make install

This should install Omega_h under the given prefix in a way you can access from your own CMake files using these CMake commands:

find_package(Omega_h)
target_link_libraries(myprogram Omega_h::omega_h)

Features

Omega_h provides at least the following:

  • Adaptation of tetrahedral and triangle meshes in parallel
  • Anisotropic metric field support
  • Given good input element quality, the output element quality is also guaranteed.
  • Scalable MPI parallelism
  • On-node OpenMP or CUDA parallelism
  • Fully deterministic execution
  • Given the same mesh, global numbering, and size field, results will be independent of parallel partitioning and ordering.

Configuration

Below we document some key CMake configuration options:

Omega_h_USE_MPI

Default: OFF

Whether to enable MPI parallelism. We recommend using MPICH or another MPI 3.0 implementation, but we also support MPI version 2.1. If this is ON, set CMAKE_CXX_COMPILER to your MPI compiler wrapper.

Omega_h_USE_OpenMP

Default: OFF

Whether to enable OpenMP thread parallelism. The -fopenmp flag will automatically be added.

Omega_h_USE_CUDA

Default: OFF

Whether to enable CUDA GPU parallelism.

Omega_h_USE_SEACASExodus

Default: OFF

Whether to use the Exodus subpackage of the SEACAS toolkit. This allows reading and writing Exodus files from Omega_h. By default, it will look for this dependency in SEACASExodus_PREFIX.

Contributing

Please open a GitHub issue to ask a question, report a bug, request features, etc. If you'd like to contribute, please fork the repository and use a feature branch. Pull requests are welcome.


omega_h's Issues

Issue converting Simmetrix to Omega_h mesh

Using the latest master branch code at commit 466e470, I am having a problem converting a *.sms file to a *.osh file while following these instructions: https://github.com/SCOREC/pumi-pic/wiki/simmetrix-to-omegah-mesh-conversion. The short error message is below:

k_gene_comp_case5_fix.eqd.smd 37kmesh.osh
tet=0, hex=0, wedge=0, pyramid=0
assertion !has_ents(ent_dim) failed at /lore/zhangc20/pumi-pic/omega_h/src/Omega_h_mesh.cpp +112
Aborted (core dumped)

The *.sms file was generated using simmetrix-simmodsuite/16.0-210606dev. Similarly, meshsim2osh was generated using the same version. The conversion was working fine with everything generated using simmetrix-simmodsuite/15.0-200714-x7ycno2.

The case to reproduce this: https://github.com/SCOREC/xgc1_data/tree/master/Cyclone_ITG/Cyclone_ITG_deltaf_37kmesh.

replace vendor specific sort calls with kokkos::sort using custom comparator

Kokkos 4.2.0 kokkos/kokkos#6197 is adding 'sort: support custom comparator kokkos/kokkos#6253'. We should try to replace the Intel DPL and HIP/CUDA Thrust sorting calls with this:

#if defined(OMEGA_H_USE_KOKKOS) and defined(OMEGA_H_USE_SYCL)
auto space = Kokkos::Experimental::SYCL();
const auto q = *space.impl_internal_space_instance()->m_queue;
auto policy = ::oneapi::dpl::execution::make_device_policy(q);
oneapi::dpl::sort(policy,b,e,c);
#elif defined(OMEGA_H_USE_CUDA) || defined(OMEGA_H_USE_HIP)
auto bptr = thrust::device_ptr<T>(b);
auto eptr = thrust::device_ptr<T>(e);
thrust::stable_sort(bptr, eptr, c);
#elif defined(OMEGA_H_USE_OPENMP)
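
As a host-only sketch of the goal, the vendor branches above could collapse into a single entry point that takes a custom comparator. Here `sort_range` and `sorted_permutation` are hypothetical names, and `std::stable_sort` is only a stand-in for whichever device sort ends up behind the call; the exact Kokkos sort signature should be checked against the Kokkos release before relying on it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical single entry point: every backend would sit behind this
// signature. std::stable_sort is only a host stand-in for a device sort.
template <typename T, typename Compare>
void sort_range(T* begin, T* end, Compare comp) {
  assert(begin <= end);
  std::stable_sort(begin, end, comp);
}

// Example of a custom comparator: order indices by the keys they refer to.
std::vector<int> sorted_permutation(const std::vector<int>& keys) {
  std::vector<int> perm(keys.size());
  for (std::size_t i = 0; i < perm.size(); ++i) {
    perm[i] = static_cast<int>(i);
  }
  sort_range(perm.data(), perm.data() + perm.size(),
             [&keys](int a, int b) { return keys[a] < keys[b]; });
  return perm;
}
```

Because the stand-in is a stable sort, equal keys keep their original relative order, which matches the `thrust::stable_sort` branch above.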

Add cuda/kokkos builds to GitHub actions

Currently, GitHub Actions only tests CPU-only builds. Since there are different code paths for Kokkos and CUDA, we should add those configurations to the tests.

periodic_test fails

The periodic_test with a build of master with the Kokkos Serial backend fails with a seg fault. Below is the output of valgrind from one of the two processes; the other process had a similar trace.

Omega_h cmake args:

$ cat Omega_h_cmake_args.txt
-DBUILD_TESTING:BOOL="on" -DBUILD_SHARED_LIBS:BOOL="on" -DCMAKE_INSTALL_PREFIX:PATH="/space/cwsmith/omegahKkVersions/buildOmegahSimKokkosSerialMpion_master/install" -DOmega_h_USE_Kokkos:BOOL="on" -DKokkos_PREFIX:PATH="/space/cwsmith/omegahKkVersions/buildKokkos/install" -DOmega_h_USE_SimModSuite:BOOL="on" -DOmega_h_USE_MPI:BOOL="on" -DOmega_h_USE_MPI:BOOL="on" -DOmega_h_USE_Kokkos:BOOL="on" -DKokkos_PREFIX:PATH="/space/cwsmith/omegahKkVersions/buildKokkos/install" -DOmega_h_USE_MPI:BOOL="on" -DOmega_h_USE_OpenMP:BOOL="OFF" -DOmega_h_USE_CUDA:BOOL="OFF"

Versions

omegah - master @ c5f1dc9d
kokkos - develop @ ed08974c7 (newer than last tagged version of 4.2.00)
simmetrix simmodsuite - 2023.1-230907dev

Valgrind output:

==3612296== Memcheck, a memory error detector
==3612296== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==3612296== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==3612296== Command: ./src/periodic_test /space/cwsmith/omegahKkVersions/omega_h_master/meshes/wedge_matchZ_12elem.sms /space/cwsmith/omegahKkVersions/omega_h_master/meshes/wedge_match.smd /space/cwsmith/omegahKkVersions/omega_h_master/meshes/wedge_matchZ_12elem_sync_2.osh 2
==3612296== Parent PID: 3612294
==3612296==
==3612296== Invalid read of size 4
==3612296==    at 0x6654270: host_atomic_fetch_oper<desul::Impl::sub_operator<int, int const>, int, desul::MemoryOrderRelaxed> (Fetch_Op_ScopeCaller.hpp:44)
==3612296==    by 0x6654270: host_atomic_fetch_sub<int, desul::MemoryOrderRelaxed, desul::MemoryScopeCaller> (Fetch_Op_Generic.hpp:40)
==3612296==    by 0x6654270: atomic_fetch_sub<int, desul::MemoryOrderRelaxed, desul::MemoryScopeCaller> (Generic.hpp:60)
==3612296==    by 0x6654270: atomic_fetch_sub<int> (Kokkos_Atomics_Desul_Wrapper.hpp:83)
==3612296==    by 0x6654270: Kokkos::Impl::SharedAllocationRecord<void, void>::decrement(Kokkos::Impl::SharedAllocationRecord<void, void>*) (Kokkos_SharedAlloc.cpp:212)
==3612296==    by 0x5213382: assign_direct (Kokkos_SharedAlloc.hpp:477)
==3612296==    by 0x5213382: Kokkos::Impl::ViewTracker<Kokkos::View<int*, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> > >::operator=(Kokkos::Impl::ViewTracker<Kokkos::View<int*, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> > > const&) (Kokkos_ViewTracker.hpp:79)
==3612296==    by 0x521076E: Kokkos::View<int*, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> >::operator=(Kokkos::View<int*, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> > const&) (Kokkos_View.hpp:1288)
==3612296==    by 0x520BA08: Omega_h::Write<int>::operator=(Omega_h::Write<int> const&) (Omega_h_array.hpp:49)
==3612296==    by 0x5221F08: Omega_h::Read<int>::operator=(Omega_h::Read<int> const&) (Omega_h_array.hpp:88)
==3612296==    by 0x5451023: Omega_h::Mesh::copy_meta() const (Omega_h_mesh.cpp:1235)
==3612296==    by 0x54BE3C9: Omega_h::migrate_mesh(Omega_h::Mesh*, Omega_h::Dist, Omega_h_Parting, bool) (Omega_h_migrate.cpp:383)
==3612296==    by 0x544D863: Omega_h::Mesh::balance(bool) (Omega_h_mesh.cpp:956)
==3612296==    by 0x41CFCF: main (periodic_test.cpp:61)
==3612296==  Address 0x38 is not stack'd, malloc'd or (recently) free'd
==3612296==
==3612296==
==3612296== Process terminating with default action of signal 11 (SIGSEGV)
==3612296==  Access not within mapped region at address 0x38
==3612296==    at 0x6654270: host_atomic_fetch_oper<desul::Impl::sub_operator<int, int const>, int, desul::MemoryOrderRelaxed> (Fetch_Op_ScopeCaller.hpp:44)
==3612296==    by 0x6654270: host_atomic_fetch_sub<int, desul::MemoryOrderRelaxed, desul::MemoryScopeCaller> (Fetch_Op_Generic.hpp:40)
==3612296==    by 0x6654270: atomic_fetch_sub<int, desul::MemoryOrderRelaxed, desul::MemoryScopeCaller> (Generic.hpp:60)
==3612296==    by 0x6654270: atomic_fetch_sub<int> (Kokkos_Atomics_Desul_Wrapper.hpp:83)
==3612296==    by 0x6654270: Kokkos::Impl::SharedAllocationRecord<void, void>::decrement(Kokkos::Impl::SharedAllocationRecord<void, void>*) (Kokkos_SharedAlloc.cpp:212)
==3612296==    by 0x5213382: assign_direct (Kokkos_SharedAlloc.hpp:477)
==3612296==    by 0x5213382: Kokkos::Impl::ViewTracker<Kokkos::View<int*, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> > >::operator=(Kokkos::Impl::ViewTracker<Kokkos::View<int*, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> > > const&) (Kokkos_ViewTracker.hpp:79)
==3612296==    by 0x521076E: Kokkos::View<int*, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> >::operator=(Kokkos::View<int*, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> > const&) (Kokkos_View.hpp:1288)
==3612296==    by 0x520BA08: Omega_h::Write<int>::operator=(Omega_h::Write<int> const&) (Omega_h_array.hpp:49)
==3612296==    by 0x5221F08: Omega_h::Read<int>::operator=(Omega_h::Read<int> const&) (Omega_h_array.hpp:88)
==3612296==    by 0x5451023: Omega_h::Mesh::copy_meta() const (Omega_h_mesh.cpp:1235)
==3612296==    by 0x54BE3C9: Omega_h::migrate_mesh(Omega_h::Mesh*, Omega_h::Dist, Omega_h_Parting, bool) (Omega_h_migrate.cpp:383)
==3612296==    by 0x544D863: Omega_h::Mesh::balance(bool) (Omega_h_mesh.cpp:956)
==3612296==    by 0x41CFCF: main (periodic_test.cpp:61)
==3612296==  If you believe this happened as a result of a stack
==3612296==  overflow in your program's main thread (unlikely but
==3612296==  possible), you can try to increase the size of the
==3612296==  main thread stack using the --main-stacksize= flag.
==3612296==  The main thread stack size used in this run was 8388608.
==3612296==
==3612296== HEAP SUMMARY:
==3612296==     in use at exit: 13,116,178 bytes in 4,205 blocks
==3612296==   total heap usage: 15,374 allocs, 11,169 frees, 14,496,121 bytes allocated
==3612296==
==3612296== LEAK SUMMARY:
==3612296==    definitely lost: 0 bytes in 0 blocks
==3612296==    indirectly lost: 0 bytes in 0 blocks
==3612296==      possibly lost: 10,525 bytes in 206 blocks
==3612296==    still reachable: 13,105,653 bytes in 3,999 blocks
==3612296==         suppressed: 0 bytes in 0 blocks
==3612296== Rerun with --leak-check=full to see details of leaked memory
==3612296==
==3612296== For lists of detected and suppressed errors, rerun with: -s
==3612296== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

build errors using gcc10 on rhel7 with cuda disabled

on cranium rhel7

env:

module purge
module unuse /opt/scorec/spack/lmod/linux-rhel7-x86_64/Core
module use /opt/scorec/spack/v0154_2/lmod/linux-rhel7-x86_64/Core
module load gcc mpich simmetrix-simmodsuite/16.0-210202dev cmake
export MPICH_CXX=g++

config:

cmake $1 \
-DCMAKE_INSTALL_PREFIX=$PWD/install \
-DBUILD_SHARED_LIBS=OFF \
-DOmega_h_USE_MPI=on \
-DCMAKE_CXX_COMPILER=`which mpicxx`

there are multiple errors - the following appears to be a problem with the lambda syntax

cd /lore/cwsmith/develop/simmetrixToOmegah/build-omegah/src && /opt/scorec/spack/v0154_2/install/linux-rhel7-x86_64/gcc-10.1.0/mpich-3.3.2-gi3wrjquyo564rk27x6r2c6ilr7ndmpl/bin/mpicxx   -I/lore/cwsmith/develop/simmetrixToOmegah/omega_h/src -I/lore/cwsmith/develop/simmetrixToOmegah/build-omegah/src -I/lore/cwsmith/develop/simmetrixToOmegah/omega_h/tpl  -O3 -g -Werror -Wall -Wextra -Wdouble-promotion -Wshadow -Wformat=2 -Wduplicated-cond -Wnull-dereference -Wlogical-op -Wrestrict -Wduplicated-branches --std=c++11  -Wno-subobject-linkage    -o CMakeFiles/omega_h.dir/Omega_h_array_ops.cpp.o -c /lore/cwsmith/develop/simmetrixToOmegah/omega_h/src/Omega_h_array_ops.cpp
/lore/cwsmith/develop/simmetrixToOmegah/omega_h/src/Omega_h_array_ops.cpp: In lambda function:
/lore/cwsmith/develop/simmetrixToOmegah/omega_h/src/Omega_h_array_ops.cpp:17:40: error: expected '{' before '->' token
   17 |   auto transform = OMEGA_H_LAMBDA(LO i)->bool { return a[i] == b[i]; };
      |                                        ^~
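
For comparison, here is a minimal sketch of the same statement shape with a stand-in macro. It assumes that, with CUDA and Kokkos disabled, the lambda macro is meant to reduce to a plain by-value capture; `SKETCH_LAMBDA` and `all_equal` are hypothetical and are not the Omega_h definitions. In that form a parameter list followed by a trailing return type is valid C++11, so the error suggests the macro expanded to something else in this build. Inspecting the preprocessed source (the compiler's `-E` output) is one way to see what it actually became.

```cpp
#include <cassert>
#include <vector>

// Assumption: with CUDA and Kokkos disabled, the macro is expected to reduce
// to a plain by-value capture. This stand-in is not the Omega_h definition.
#define SKETCH_LAMBDA [=]

bool all_equal(const std::vector<int>& a_in, const std::vector<int>& b_in) {
  assert(a_in.size() == b_in.size());
  const int* a = a_in.data();
  const int* b = b_in.data();
  // Same shape as the failing line: parameter list, then trailing return type.
  auto transform = SKETCH_LAMBDA(int i)->bool { return a[i] == b[i]; };
  bool result = true;
  for (int i = 0; i < static_cast<int>(a_in.size()); ++i) {
    result = result && transform(i);
  }
  return result;
}
```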

cmake output:

 ../omega_h/doConfigNoCudaNoSim.sh  ../omega_h/
-- The CXX compiler identification is GNU 10.1.0
-- Check for working CXX compiler: /opt/scorec/spack/v0154_2/install/linux-rhel7-x86_64/gcc-10.1.0/mpich-3.3.2-gi3wrjquyo564rk27x6r2c6ilr7ndmpl/bin/mpicxx
-- Check for working CXX compiler: /opt/scorec/spack/v0154_2/install/linux-rhel7-x86_64/gcc-10.1.0/mpich-3.3.2-gi3wrjquyo564rk27x6r2c6ilr7ndmpl/bin/mpicxx - works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- CMAKE_VERSION: 3.17.3
-- Omega_h_VERSION: 9.27.0
-- USE_XSDK_DEFAULTS: OFF
-- BUILD_TESTING: OFF
-- BUILD_SHARED_LIBS: OFF
-- CMAKE_INSTALL_PREFIX: /lore/cwsmith/develop/simmetrixToOmegah/build-omegah/install
-- Omega_h_CHECK_BOUNDS: OFF
-- Omega_h_THROW: OFF
-- Omega_h_DATA: 
-- Omega_h_USE_EGADS: OFF
-- EGADS_PREFIX: 
-- Omega_h_USE_Kokkos: OFF
-- Kokkos_PREFIX: 
-- Omega_h_USE_CUDA_AWARE_MPI: OFF
-- Omega_h_USE_SimModSuite: OFF
-- Omega_h_VALGRIND: 
-- Omega_h_EXAMPLES: OFF
-- Omega_h_USE_ZLIB: ON
-- ZLIB_PREFIX: 
-- Found ZLIB: /usr/lib64/libz.so (found version "1.2.7") 
-- Omega_h_USE_Kokkos: OFF
-- Omega_h_USE_libMeshb: OFF
-- Omega_h_USE_Gmodel: OFF
-- Omega_h_USE_SEACASExodus: OFF
-- Omega_h_USE_pybind11: OFF
-- Omega_h_USE_MPI: on
-- Omega_h_USE_OpenMP: OFF
-- Omega_h_USE_CUDA: OFF
-- Omega_h_CXX_OPTIMIZE: ON
-- Omega_h_CXX_SYMBOLS: ON
-- Omega_h_ARCH: 
-- Omega_h_CXX_WARNINGS: ON
-- Omega_h_CXX_FLAGS: 
-- Omega_h_EXTRA_CXX_FLAGS: 
-- CMAKE_CXX_FLAGS:  -O3 -g -Werror -Wall -Wextra -Wdouble-promotion -Wshadow -Wformat=2 -Wduplicated-cond -Wnull-dereference -Wlogical-op -Wrestrict -Wduplicated-branches --std=c++11  -Wno-subobject-linkage 
-- Omega_h_USE_DOLFIN: OFF
-- Omega_h_SEMVER = 9.27.0-sha.25c1039+10001000000000
-- Configuring done
-- Generating done
-- Build files have been written to: /lore/cwsmith/develop/simmetrixToOmegah/build-omegah

Warning message configuring with kokkos 3.4.1 on Perlmutter

Configuring and building the omega_h master branch at commit eac504c on Perlmutter produces one warning message:

  MPI Libraries not found, there may be linking errors! Try enabling shared
  libraries '-DBUILD_SHARED_LIBS=ON'.

Full log is below:

-- Cray Programming Environment 2.7.11 CXX
-- CMAKE_VERSION: 3.20.5
-- Omega_h_VERSION: 9.33.0
-- USE_XSDK_DEFAULTS: OFF
-- BUILD_TESTING: OFF
-- BUILD_SHARED_LIBS: OFF
-- CMAKE_INSTALL_PREFIX: /global/homes/z/zhangc20/xgcm/install/omega_h/install
-- Omega_h_CHECK_BOUNDS: OFF
-- Omega_h_THROW: OFF
-- Omega_h_DATA: 
-- Omega_h_USE_EGADS: OFF
-- EGADS_PREFIX: 
-- Omega_h_USE_Kokkos: ON
-- Kokkos_PREFIX: /global/homes/z/zhangc20/xgcm/install/kokkos/install/lib64/cmake
-- Omega_h_USE_CUDA_AWARE_MPI: OFF
-- Omega_h_USE_SimModSuite: OFF
-- Omega_h_VALGRIND: 
-- Omega_h_EXAMPLES: OFF
-- Omega_h_USE_ZLIB: ON
-- ZLIB_PREFIX: 
-- Omega_h_USE_Kokkos: ON
-- Kokkos_PREFIX: /global/homes/z/zhangc20/xgcm/install/kokkos/install/lib64/cmake
-- Enabled Kokkos devices: CUDA;SERIAL
-- kokkos_launch_compiler is enabled globally. C++ compiler commands with -DKOKKOS_DEPENDENCE will be redirected to the appropriate compiler for Kokkos
-- Kokkos_CONFIG: /global/homes/z/zhangc20/xgcm/install/kokkos/install/lib64/cmake/Kokkos/KokkosConfig.cmake
-- Kokkos_VERSION: 3.4.01
-- Omega_h_USE_libMeshb: OFF
-- Omega_h_USE_Gmsh: OFF
-- Omega_h_USE_Gmodel: OFF
-- Omega_h_USE_SEACASExodus: OFF
-- Omega_h_USE_pybind11: OFF
-- Omega_h_USE_CUDA: on
-- Omega_h_USE_MPI: ON
-- Omega_h_USE_MPI: ON
-- MPI_PREFIX: 
CMake Warning at CMakeLists.txt:102 (message):
  MPI Libraries not found, there may be linking errors! Try enabling shared
  libraries '-DBUILD_SHARED_LIBS=ON'.


-- Omega_h_USE_OpenMP: OFF
-- Omega_h_USE_CUDA: on
-- Omega_h_USE_DOLFIN: OFF
-- Omega_h_SEMVER = 9.33.0-sha.eac504c0+110110000000000
-- Configuring done
-- Generating done

The build itself succeeded, though. The script used is below:

module load PrgEnv-gnu                                                          
module load cpe-cuda                                                            
module load cmake/3.20.5                                                        
module load cuda/11.1.1                                                         
                                                                                
export cuda=$CUDA_DIR                                                           
export PATH=$cuda/bin:$PATH                                                     
export LD_LIBRARY_PATH=$cuda/lib64:$LD_LIBRARY_PATH                             
export installroot=$PWD                                                         
export kk=$installroot/kokkos/install                                           
export srcroot=$installroot/../                                                 
export kksrc=$srcroot/kokkos                                                    
                                                                                
export oh=$installroot/omega_h/install                                          
export ohsrc=$srcroot/omega_h                                                   
                                                                                
export OMPI_CXX=$kksrc/bin/nvcc_wrapper                                         
                                                                                
cd $installroot                                                                 
mkdir -p omega_h/build                                                          
cd omega_h/build                                                                
cmake $ohsrc -DCMAKE_BUILD_TYPE=Debug \                                         
             -DCMAKE_INSTALL_PREFIX=$oh -DBUILD_SHARED_LIBS=OFF \               
             -DOmega_h_USE_CUDA=on -DOmega_h_CUDA_ARCH=80 \                     
             -DOmega_h_USE_Kokkos=ON -DOmega_h_USE_MPI=ON \                     
             -DCMAKE_CXX_COMPILER=CC \                                          
             -DCMAKE_CUDA_FLAGS="-I$MPICH_DIR/include" \                        
             -DKokkos_PREFIX=$kk/lib64/cmake/                                   
                                                                                
make -j4 install

classification in simmetrix mesh converter

From the discussion in the TOMMS issue here: https://github.com/SCOREC/tomms/issues/17

The simmetrix -> omegah mesh converter is calling classify_equal_order to set the geometric model entity id and dimension:

classify_equal_order(mesh, ent_dim, eqv2v, host_class_id.write());

classify_equal_order is defined here:

void classify_equal_order(
    Mesh* mesh, Int ent_dim, LOs eqv2v, Read<ClassId> eq_class_ids) {
  LOs eq2e;
  if (ent_dim == mesh->dim()) {
    /* assuming elements were constructed in the same order ! */
    eq2e = LOs(mesh->nelems(), 0, 1);
  } else if (ent_dim == VERT) {
    eq2e = eqv2v;
  } else {
    Write<LO> eq2e_w;
    Write<I8> codes;
    auto ev2v = mesh->ask_verts_of(ent_dim);
    auto v2e = mesh->ask_up(VERT, ent_dim);
    find_matches(mesh->family(), ent_dim, eqv2v, ev2v, v2e, &eq2e_w, &codes);
    eq2e = eq2e_w;
  }
  auto neq = eqv2v.size() / (ent_dim + 1);
  auto eq_class_dim = Read<I8>(neq, I8(ent_dim));
  auto class_dim =
      map_onto(eq_class_dim, eq2e, mesh->nents(ent_dim), I8(mesh->dim()), 1);
  auto class_id = map_onto(eq_class_ids, eq2e, mesh->nents(ent_dim), -1, 1);
  mesh->add_tag<I8>(ent_dim, "class_dim", 1, class_dim);
  mesh->add_tag<ClassId>(ent_dim, "class_id", 1, class_id);
}

It isn't clear if this is necessary since we are explicitly querying the classification info from simmetrix; I suspect we can just set the "class_dim" and "class_id" tags directly.
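
To make the question concrete, here is a self-contained sketch of how I read the two map_onto calls above, assuming a width of 1: every mesh entity starts at a default value (mesh->dim() for "class_dim", -1 for "class_id"), and each input entity overwrites the slot it was matched to. `scatter_onto` is a hypothetical stand-in written with std::vector, not the Omega_h function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the scatter step, assuming width 1: start every target entry
// at init_val, then copy source entry i to slot src2dst[i].
template <typename T>
std::vector<T> scatter_onto(const std::vector<T>& src,
                            const std::vector<int>& src2dst,
                            std::size_t ndst, T init_val) {
  assert(src.size() == src2dst.size());
  std::vector<T> dst(ndst, init_val);
  for (std::size_t i = 0; i < src.size(); ++i) {
    assert(src2dst[i] >= 0 && static_cast<std::size_t>(src2dst[i]) < ndst);
    dst[static_cast<std::size_t>(src2dst[i])] = src[i];
  }
  return dst;
}
```

If the converter already holds a class id and dimension for every mesh entity in Omega_h's entity order, the mapping is the identity and nothing keeps the default, which is the case where writing the two tags directly would give the same result.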

New issue building omega_h on Perlmutter with kokkos 3.4.01, gcc 11.2, and cuda 11.7

Previously, I was able to build omega_h on Perlmutter with kokkos 3.4.01, gcc 11.2, and cuda 11.7 without any issue (about 2 weeks ago after the last major maintenance of Perlmutter). But with a fresh build, I just encountered the following issue.

Searching online, I was able to resolve it by adding -DCMAKE_CXX_FLAGS='-std=c++14' in the build script. I am wondering what could be the reason for this change @cwsmith.

-- kokkos_launch_compiler is enabled globally. C++ compiler commands with -DKOKKOS_DEPENDENCE will be redirected to the appropriate compiler for Kokkos
-- Kokkos_CONFIG: /global/homes/z/zhangc20/xgcm/install_cuda11.7/kokkos/install/lib64/cmake/Kokkos/KokkosConfig.cmake
-- Kokkos_VERSION: 3.4.01
-- Omega_h_USE_libMeshb: OFF
-- Omega_h_USE_Gmsh: OFF
-- Omega_h_USE_Gmodel: OFF
-- Omega_h_USE_SEACASExodus: OFF
-- Omega_h_USE_pybind11: OFF
-- Omega_h_USE_CUDA: on
-- Omega_h_USE_OpenMP: ON
-- Omega_h_USE_MPI: ON
-- Omega_h_USE_OpenMP: ON
-- Omega_h_USE_CUDA: on
-- The CUDA compiler identification is NVIDIA 11.7.64
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /opt/nvidia/hpc_sdk/Linux_x86_64/22.5/cuda/11.7/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Omega_h_USE_DOLFIN: OFF
-- Omega_h_SEMVER = 10.5.0-sha.a63aff26+100111100000000000
-- Configuring done
-- Generating done
-- Build files have been written to: /global/homes/z/zhangc20/xgcm/install_cuda11.7/omega_h/build
[  2%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_adapt.cpp.o
[  2%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_adj.cpp.o
[  2%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_align.cpp.o
[  3%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_amr_topology.cpp.o
[  4%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_amr.cpp.o
[  5%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_amr_transfer.cpp.o
[  6%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_any.cpp.o
[  6%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_approach.cpp.o
/opt/cray/pe/gcc/11.2.0/snos/include/g++/type_traits:71:52: error: redefinition of 'constexpr const _Tp std::integral_constant<_Tp, __v>::value'
   71 |   template<typename _Tp, _Tp __v>
      |                                                    ^                          
/opt/cray/pe/gcc/11.2.0/snos/include/g++/type_traits:59:29: note: 'constexpr const _Tp value' previously declared here
   59 |       static constexpr _Tp                  value = __v;
      |                             ^~~~~
make[2]: *** [src/CMakeFiles/omega_h.dir/build.make:160: src/CMakeFiles/omega_h.dir/Omega_h_any.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
/opt/cray/pe/gcc/11.2.0/snos/include/g++/type_traits:71:52: error: redefinition of 'constexpr const _Tp std::integral_constant<_Tp, __v>::value'
   71 |   template<typename _Tp, _Tp __v>
      |                                                    ^                          
/opt/cray/pe/gcc/11.2.0/snos/include/g++/type_traits:59:29: note: 'constexpr const _Tp value' previously declared here
   59 |       static constexpr _Tp                  value = __v;
      |                             ^~~~~
/opt/cray/pe/gcc/11.2.0/snos/include/g++/type_traits:71:52: error: redefinition of 'constexpr const _Tp std::integral_constant<_Tp, __v>::value'
   71 |   template<typename _Tp, _Tp __v>
      |                                                    ^                          
/opt/cray/pe/gcc/11.2.0/snos/include/g++/type_traits:59:29: note: 'constexpr const _Tp value' previously declared here
   59 |       static constexpr _Tp                  value = __v;
      |                             ^~~~~
/global/u2/z/zhangc20/xgcm/install_cuda11.7/kokkos/install/include/impl/Kokkos_ViewMapping.hpp:122:443: error: redefinition of 'constexpr const size_t Kokkos::Impl::ViewDimension0<V, <anonymous> >::ArgN0'
  122 | KOKKOS_IMPL_VIEW_DIMENSION(0)
      |                                                                                                                                                                                                                                                                                                                                                                                                                                                           ^                    
/global/u2/z/zhangc20/xgcm/install_cuda11.7/kokkos/install/include/impl/Kokkos_ViewMapping.hpp:122:87: note: 'constexpr const size_t Kokkos::Impl::ViewDimension0<V, <anonymous> >::ArgN0' previously declared here
  122 | KOKKOS_IMPL_VIEW_DIMENSION(0)
      |                                                                                       ^    
/global/u2/z/zhangc20/xgcm/install_cuda11.7/kokkos/install/include/impl/Kokkos_ViewMapping.hpp:122:532: error: redefinition of 'constexpr const size_t Kokkos::Impl::ViewDimension0<V, <anonymous> >::N0'
  122 | KOKKOS_IMPL_VIEW_DIMENSION(0)

build with gcc-6.3.0

Creating issue and copying relevant discussion from #35

From Dhyan:

Building without CUDA / kokkos and with gcc-6.3.0 gives errors like

/fusion/usc/opt/gcc/gcc-6.3.0/include/c++/6.3.0/bits/unique_ptr.h:158:17: note: candidate expects 0 arguments, 1 provided
In file included from /home/nathd/omega_h/src/Omega_h_array.hpp:4:0,
from /home/nathd/omega_h/src/Omega_h_comm.hpp:7,
from /home/nathd/omega_h/src/Omega_h_array_ops.hpp:6,
from /home/nathd/omega_h/src/Omega_h_rcFields.cpp:4:
/home/nathd/omega_h/src/Omega_h_defines.hpp: In instantiation of ‘auto Omega_h::apply_to_omega_h_types(Omega_h_Type, const F&) [with F = Omega_h::Mesh::get_rc_mesh_tag_from_rc_tag(Omega_h::Int, const Omega_h::TagBase*)::<lambda(auto:1)>]’:
/home/nathd/omega_h/src/Omega_h_rcFields.cpp:340:4: required from here
/home/nathd/omega_h/src/Omega_h_defines.hpp:93:15: error: use of ‘Omega_h::Mesh::get_rc_mesh_tag_from_rc_tag(Omega_h::Int, const Omega_h::TagBase*)::<lambda(auto:1)> [with auto:1 = int]’ before deduction of ‘auto’
return f(I32{});
~^~~~~~~
/home/nathd/omega_h/src/Omega_h_defines.hpp:93:21: error: return-statement with a value, in function returning 'void' [-fpermissive]
return f(I32{});
^
/home/nathd/omega_h/src/Omega_h_defines.hpp:97:15: error: use of ‘Omega_h::Mesh::get_rc_mesh_tag_from_rc_tag(Omega_h::Int, const Omega_h::TagBase*)::<lambda(auto:1)> [with auto:1 = long int]’ before deduction of ‘auto’
return f(I64{});
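
The messages point at a type-dispatch helper with a deduced `auto` return type being called with a generic lambda. Below is a reduced, self-contained sketch of that pattern; `ScalarType`, `apply_to_scalar_type`, and `scalar_size` are hypothetical names, not the Omega_h code. Both the dispatcher and the lambda spell out their return types here. Whether doing the same in Omega_h is enough for gcc-6.3.0 is untested and only a guess.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class ScalarType { I8, I32, I64, F64 };

// Call f with a value-initialized object of the C++ type matching `type`.
// The explicit trailing return type avoids relying on `auto` deduction
// across the branches.
template <typename F>
auto apply_to_scalar_type(ScalarType type, const F& f)
    -> decltype(f(std::int32_t{})) {
  switch (type) {
    case ScalarType::I8:  return f(std::int8_t{});
    case ScalarType::I32: return f(std::int32_t{});
    case ScalarType::I64: return f(std::int64_t{});
    case ScalarType::F64: return f(double{});
  }
  assert(false && "unhandled ScalarType");
  return f(std::int32_t{});
}

// A generic lambda whose result type does not depend on the argument type.
inline std::size_t scalar_size(ScalarType type) {
  return apply_to_scalar_type(
      type, [](auto value) -> std::size_t { return sizeof(value); });
}
```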

=====================================================================

building with cuda 11.2 fails

The following compilation error is repeatable on SCOREC rhel7 and Aimos rhel8 using omegah d56a86e. The build instructions are listed below.

Thrust version is 1.10.0 ( packaged with CUDA Toolkit 11.2) according to https://github.com/NVIDIA/thrust.

Dan reported it already! NVIDIA/thrust#1332

error

[  0%] Building CUDA object src/CMakeFiles/omega_h.dir/Omega_h_int_scan.cpp.o                                                                                                            
cd /space/cwsmith/omegahCuda11/buildCuda11/src && /usr/local/cuda-11.2/bin/nvcc -forward-unknown-to-host-compiler -Domega_h_EXPORTS -I/space/cwsmith/omegahCuda11/omega_h/src -I/space/cw
/usr/local/cuda-11.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/scan.h(578): error: array of reference is not allowed                                                 
          detected during:                                                                                                                                                               
            instantiation of class "thrust::cuda_cub::__scan::DoNothing<T> [with T=const Omega_h::LO &]"                                                                                 
(784): here                                                                                                                                                                              
            instantiation of "OutputIt thrust::cuda_cub::inclusive_scan_n(thrust::cuda_cub::execution_policy<Derived> &, InputIt, Size, OutputIt, ScanOp) [with Derived=thrust::cuda_cub:
/usr/local/cuda-11.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/transform_scan.h(72): here                                                                            
            instantiation of "OutputIt thrust::cuda_cub::transform_inclusive_scan(thrust::cuda_cub::execution_policy<Derived> &, InputIt, InputIt, OutputIt, TransformOp, ScanOp) [with D
/usr/local/cuda-11.2/bin/../targets/x86_64-linux/include/thrust/detail/transform_scan.inl(47): here                                                                                      
            instantiation of "OutputIterator thrust::transform_inclusive_scan(const thrust::detail::execution_policy_base<DerivedPolicy> &, InputIterator, InputIterator, OutputIterator,
/space/cwsmith/omegahCuda11/omega_h/src/Omega_h_scan.hpp(83): here                                                                                                                       
            instantiation of "OutputIterator Omega_h::transform_inclusive_scan(InputIterator, InputIterator, OutputIterator, BinaryOp, UnaryOp) [with InputIterator=Omega_h::LO *, Output
/space/cwsmith/omegahCuda11/omega_h/src/Omega_h_int_scan.cpp(32): here

SCOREC Build

The following was tested on cranium.

env

module use /opt/scorec/spack/dev/lmod/linux-rhel7-x86_64/Core
module unuse /opt/scorec/spack/lmod/linux-rhel7-x86_64/Core
module load gcc/7.4.0-c5aaloy cuda/11.2
module load mpich/3.3.1-bfezl2l
module load cmake

cmake

cmake ../omega_h \
  -DCMAKE_INSTALL_PREFIX=$PWD/install \
  -DBUILD_TESTING=on  \
  -DOmega_h_USE_CUDA=on \
  -DOmega_h_CUDA_ARCH=75 \
  -DOmega_h_USE_MPI=on  \
  -DBUILD_SHARED_LIBS=ON

AiMOS Build

The following was tested on dcs217 (a rhel8 fen).

env

module load spectrum-mpi/10.4
module load cmake/3.20.0
module load cuda/11.2

cmake

cmake ../omega_h \
  -DCMAKE_INSTALL_PREFIX=$oh \
  -DBUILD_SHARED_LIBS=OFF \
  -DOmega_h_USE_Kokkos=ON \
  -DOmega_h_USE_CUDA=on \
  -DOmega_h_CUDA_ARCH=70 \
  -DOmega_h_USE_MPI=on  \
  -DBUILD_TESTING=on  \
  -DCMAKE_CXX_COMPILER=g++ \
  -DKokkos_PREFIX=$kk/lib64/cmake

ask_revClass performance

Summary:

Performance of ask_revClass is reported as being significantly better on scorec rhel7 cranium than AiMOS. The testing below runs the src/reverse_class_test to exercise this api on the mesh in question.

Note that src/reverse_class_test also runs a small 2D mesh, so the timings are not purely from the 146k tet mesh in question. Commenting out the part of the test that runs on the 2D mesh (here) changes the ask_revClass time by less than one second.

Details:

  • test mesh w/ 146k tets: /lore/nathd/INSTALLATION/omega_h-install/bin/Tomms_15ge_16me/mesh_15_16.osh
  • software versions (testing on scorec rhel7 cranium and aimos used the same versions unless otherwise noted)
    • omegah: master@466e4702
    • kokkos: 3.1.00
    • cuda: 11.1 on AiMOS, 10.2 on SCOREC
    • gcc: 8.4.1 on AiMOS, 7.4 on SCOREC
  • ~65s spent in ask_revClass on AiMOS (dcs090), built using the script buildAllKk.sh (pasted below). This script is the kokkos+omega_h steps from the GITRm Build Instructions.
    • the output from --osh-time is listed below: kk.log
  • ~60s spent in ask_revClass on AiMOS (dcs090), built using the script buildAllNoKk.sh (pasted below).
    • the output from --osh-time is listed below: noKk.log
  • ~42s spent in ask_revClass on SCOREC Rhel7 cranium, built using the script buildAllNoKk_scorec.sh. The script uses the Build with Kokkos Disabled instructions.
    • the output from --osh-time is listed below: scorec_noKk.log
    • building and running with cuda11.1 resulted in an ask_revClass time that was within 1 second of the cuda10.2 build & run
  • enabling the memory pool --osh-pool, for all builds, reduces the run time by less than a second in ask_revClass

buildAllKk.sh

module load spectrum-mpi/10.4
module load cmake/3.20.0
module load cuda/11.1

export root=$PWD
export OMPI_CXX=$root/kokkos/bin/nvcc_wrapper
export OMPI_CC=gcc

[ "$root" != "$PWD" ] && exit 1

build=build-dcsRhel8-gcc74
export kk=$root/${build}-kokkos/install
export oh=$root/${build}-omegah/install
export CMAKE_PREFIX_PATH=$kk:$oh:$CMAKE_PREFIX_PATH

#kokkos
cd $root
#git clone https://github.com/kokkos/kokkos.git
cd kokkos
git checkout 3.1.00
cd -
[ -d $kk ] && rm -rf ${kk%%install}
mkdir -p $kk
cd ${kk%%install}
cmake ../kokkos \
  -DCMAKE_CXX_COMPILER=$root/kokkos/bin/nvcc_wrapper \
  -DKokkos_ARCH_VOLTA70=ON \
  -DKokkos_ENABLE_SERIAL=ON \
  -DKokkos_ENABLE_OPENMP=off \
  -DKokkos_ENABLE_CUDA=on \
  -DKokkos_ENABLE_CUDA_LAMBDA=on \
  -DKokkos_ENABLE_DEBUG=on \
  -DKokkos_ENABLE_PROFILING=on \
  -DCMAKE_INSTALL_PREFIX=$kk
make -j 24 install

# Omega_h
cd $root
git clone https://github.com/SCOREC/omega_h.git
cd omega_h
git checkout 466e4702
cd $root
[ -d $oh ] && rm -rf ${oh%%install}
mkdir -p $oh 
cd ${oh%%install}
cmake ../omega_h \
  -DCMAKE_INSTALL_PREFIX=$oh \
  -DBUILD_SHARED_LIBS=OFF \
  -DOmega_h_USE_Kokkos=ON \
  -DOmega_h_USE_CUDA=on \
  -DOmega_h_CUDA_ARCH=70 \
  -DOmega_h_USE_MPI=on  \
  -DBUILD_TESTING=on  \
  -DCMAKE_CXX_COMPILER=g++ \
  -DKokkos_PREFIX=$kk/lib64/cmake
make reverse_class_test -j8
#allocate a node: salloc -n 1 --gres=gpu:6 -t60
#ssh to it, and run the following commands
#note, the environment (at the top of this script) needs to be setup
mpirun -np 1 --bind-to core ./src/reverse_class_test --kokkos-num-devices=1 --osh-time ../omega_h/meshes/plate_6elem.osh /gpfs/u/home/MPFS/MPFSsmth/killme/mesh_15_16.osh &> kk.log
mpirun -np 1 --bind-to core ./src/reverse_class_test --kokkos-num-devices=1 --osh-pool --osh-time ../omega_h/meshes/plate_6elem.osh /gpfs/u/home/MPFS/MPFSsmth/killme/mesh_15_16.osh &> kk_pool.log

buildAllNoKk.sh

module load spectrum-mpi/10.4
module load cmake/3.20.0
module load cuda/11.1

export root=$PWD
export OMPI_CXX=g++
export OMPI_CC=gcc

git clone https://github.com/SCOREC/omega_h.git
cd omega_h
git checkout 466e4702
cd $root

d=build-dcsRhel8-gcc74-omegahNoKk
[[ ! -d $d ]] && mkdir -p $d
cd $d

cmake $root/omega_h \
  -DCMAKE_INSTALL_PREFIX=$PWD/install \
  -DBUILD_SHARED_LIBS=OFF \
  -DOmega_h_USE_Kokkos=OFF \
  -DOmega_h_USE_CUDA=on \
  -DOmega_h_CUDA_ARCH=70 \
  -DOmega_h_USE_MPI=on \
  -DBUILD_TESTING=on \
  -DCMAKE_CXX_COMPILER=g++

make reverse_class_test -j8

#allocate a node: salloc -n 1 --gres=gpu:6 -t60
#ssh to it, and run the following commands
#note, the environment (at the top of this script) needs to be setup
mpirun -np 1 --bind-to core ./src/reverse_class_test --osh-time ../omega_h/meshes/plate_6elem.osh /gpfs/u/home/MPFS/MPFSsmth/killme/mesh_15_16.osh &> noKk.log
mpirun -np 1 --bind-to core ./src/reverse_class_test --osh-pool --osh-time ../omega_h/meshes/plate_6elem.osh /gpfs/u/home/MPFS/MPFSsmth/killme/mesh_15_16.osh &> noKk_pool.log

buildAllNoKk_scorec.sh

module use /opt/scorec/spack/dev/lmod/linux-rhel7-x86_64/Core
module unuse /opt/scorec/spack/lmod/linux-rhel7-x86_64/Core
module load gcc/7.4.0-c5aaloy cuda/10.2
module load mpich/3.3.1-bfezl2l
module load cmake

export root=$PWD
export MPICH_CXX=g++
export MPICH_CC=gcc

git clone https://github.com/SCOREC/omega_h.git
cd omega_h
git checkout 466e4702
cd $root

d=build-rhel7-gcc74-omegahNoKk
[[ ! -d $d ]] && mkdir -p $d
cd $d

cmake $root/omega_h \
  -DCMAKE_INSTALL_PREFIX=$PWD/install \
  -DBUILD_SHARED_LIBS=ON \
  -DOmega_h_USE_Kokkos=OFF \
  -DOmega_h_USE_CUDA=on \
  -DOmega_h_CUDA_ARCH=75 \
  -DOmega_h_USE_MPI=on \
  -DBUILD_TESTING=on

make reverse_class_test -j8

tommsMesh=/space/cwsmith/omegahRcTesting/mesh_15_16.osh
mpirun -np 1 --bind-to core ./src/reverse_class_test --osh-time ../omega_h/meshes/plate_6elem.osh $tommsMesh &> noKk.log
mpirun -np 1 --bind-to core ./src/reverse_class_test --osh-pool --osh-time ../omega_h/meshes/plate_6elem.osh $tommsMesh &> noKk_pool.log

kk.log

TOP-DOWN:
=========
ask_revClass 65.0243 14
|  derive_revClass 65.024 7
|  |  sort_by_high_index 65.0188 7
|  |  Write allocation 0.00182296 28
|  |  offset_scan 0.000607692 7
|  |  |  Write allocation 0.000253348 7
|  |  |  device_free 9.2566e-05 7
|  |  |  single host to device 6.5203e-05 7
|  |  |  device_malloc 4.825e-05 7
|  offset_scan 8.2885e-05 1
|  |  Write allocation 3.4471e-05 1
|  |  device_free 1.196e-05 1
|  |  single host to device 8.742e-06 1
|  |  device_malloc 6.375e-06 1
|  Write allocation 7.0067e-05 2
|  ask_revClass 6.39e-07 1
binary::read(path, comm, mesh, strict) 0.0694269 2
|  binary::read_in_comm(path, comm, mesh, version) 0.067175 2
|  |  binary::read(istream, mesh, version) 0.0669596 2
|  |  |  Write allocation 0.0028541 27
|  |  |  array host to device 0.00136568 27
|  |  |  set_ents 1.7123e-05 5
ask_adj 0.00121137 6
|  derive_adj 0.00120489 2
|  |  transit 0.00119605 2
|  |  |  Write allocation 0.000442638 2
|  |  ask_adj 2.695e-06 4
Write allocation 0.000750138 21
array host to device 0.000208103 9

BOTTOM-UP:
==========
sort_by_high_index 65.0188 7
|  derive_revClass 65.0188 7
|  |  ask_revClass 65.0188 7
binary::read(istream, mesh, version) 0.0627227 2
|  binary::read_in_comm(path, comm, mesh, version) 0.0627227 2
|  |  binary::read(path, comm, mesh, strict) 0.0627227 2
Write allocation 0.00622772 88
|  binary::read(istream, mesh, version) 0.0028541 27
|  |  binary::read_in_comm(path, comm, mesh, version) 0.0028541 27
|  |  |  binary::read(path, comm, mesh, strict) 0.0028541 27
|  derive_revClass 0.00182296 28
|  |  ask_revClass 0.00182296 28
|  transit 0.000442638 2
|  |  derive_adj 0.000442638 2
|  |  |  ask_adj 0.000442638 2
|  offset_scan 0.000287819 8
|  |  derive_revClass 0.000253348 7
|  |  |  ask_revClass 0.000253348 7
|  |  ask_revClass 3.4471e-05 1
|  ask_revClass 7.0067e-05 2
derive_revClass 0.00270772 7
|  ask_revClass 0.00270772 7
binary::read(path, comm, mesh, strict) 0.00225191 2
array host to device 0.00157378 36
|  binary::read(istream, mesh, version) 0.00136568 27
|  |  binary::read_in_comm(path, comm, mesh, version) 0.00136568 27
|  |  |  binary::read(path, comm, mesh, strict) 0.00136568 27
transit 0.000753416 2
|  derive_adj 0.000753416 2
|  |  ask_adj 0.000753416 2
binary::read_in_comm(path, comm, mesh, version) 0.000215453 2
|  binary::read(path, comm, mesh, strict) 0.000215453 2
offset_scan 0.000169662 8
|  derive_revClass 0.000148325 7
|  |  ask_revClass 0.000148325 7
|  ask_revClass 2.1337e-05 1
ask_revClass 0.000161856 15
|  ask_revClass 6.39e-07 1
device_free 0.000104526 8
|  offset_scan 0.000104526 8
|  |  derive_revClass 9.2566e-05 7
|  |  |  ask_revClass 9.2566e-05 7
|  |  ask_revClass 1.196e-05 1
single host to device 7.3945e-05 8
|  offset_scan 7.3945e-05 8
|  |  derive_revClass 6.5203e-05 7
|  |  |  ask_revClass 6.5203e-05 7
|  |  ask_revClass 8.742e-06 1
device_malloc 5.4625e-05 8
|  offset_scan 5.4625e-05 8
|  |  derive_revClass 4.825e-05 7
|  |  |  ask_revClass 4.825e-05 7
|  |  ask_revClass 6.375e-06 1
set_ents 1.7123e-05 5
|  binary::read(istream, mesh, version) 1.7123e-05 5
|  |  binary::read_in_comm(path, comm, mesh, version) 1.7123e-05 5
|  |  |  binary::read(path, comm, mesh, strict) 1.7123e-05 5
ask_adj 9.174e-06 10
|  derive_adj 2.695e-06 4
|  |  ask_adj 2.695e-06 4
derive_adj 6.143e-06 2
|  ask_adj 6.143e-06 2

noKk.log

TOP-DOWN:
=========
ask_revClass 59.7207 14
|  derive_revClass 59.7205 7
|  |  sort_by_high_index 59.7161 7
|  |  Write allocation 0.00103472 28
|  |  |  device_malloc 0.00101018 28
|  |  device_free 0.000796198 21
|  |  offset_scan 0.000410892 7
|  |  |  device_free 9.0479e-05 7
|  |  |  single host to device 7.3759e-05 7
|  |  |  Write allocation 6.1759e-05 7
|  |  |  |  device_malloc 5.6407e-05 7
|  |  |  device_malloc 4.5161e-05 7
|  offset_scan 5.4773e-05 1
|  |  device_free 1.2484e-05 1
|  |  single host to device 9.47e-06 1
|  |  Write allocation 7.138e-06 1
|  |  |  device_malloc 5.912e-06 1
|  |  device_malloc 6.299e-06 1
|  Write allocation 1.4493e-05 2
|  |  device_malloc 1.252e-05 2
|  device_free 9.044e-06 1
|  ask_revClass 4.2e-07 1
binary::read(path, comm, mesh, strict) 0.0675819 2
|  binary::read_in_comm(path, comm, mesh, version) 0.0658969 2
|  |  binary::read(istream, mesh, version) 0.0658028 2
|  |  |  Write allocation 0.00277324 27
|  |  |  |  device_malloc 0.00275041 27
|  |  |  array host to device 0.000851867 27
|  |  |  set_ents 1.3327e-05 5
device_free 0.00267911 66
ask_adj 0.00110366 6
|  derive_adj 0.00109784 2
|  |  transit 0.00108981 2
|  |  |  Write allocation 0.000376341 2
|  |  |  |  device_malloc 0.000372779 2
|  |  ask_adj 2.247e-06 4
Write allocation 0.00047037 21
|  device_malloc 0.000456152 21
array host to device 8.6368e-05 9

BOTTOM-UP:
==========
sort_by_high_index 59.7161 7
|  derive_revClass 59.7161 7
|  |  ask_revClass 59.7161 7
binary::read(istream, mesh, version) 0.0621644 2
|  binary::read_in_comm(path, comm, mesh, version) 0.0621644 2
|  |  binary::read(path, comm, mesh, strict) 0.0621644 2
device_malloc 0.00471582 96
|  Write allocation 0.00466436 88
|  |  binary::read(istream, mesh, version) 0.00275041 27
|  |  |  binary::read_in_comm(path, comm, mesh, version) 0.00275041 27
|  |  |  |  binary::read(path, comm, mesh, strict) 0.00275041 27
|  |  derive_revClass 0.00101018 28
|  |  |  ask_revClass 0.00101018 28
|  |  transit 0.000372779 2
|  |  |  derive_adj 0.000372779 2
|  |  |  |  ask_adj 0.000372779 2
|  |  offset_scan 6.2319e-05 8
|  |  |  derive_revClass 5.6407e-05 7
|  |  |  |  ask_revClass 5.6407e-05 7
|  |  |  ask_revClass 5.912e-06 1
|  |  ask_revClass 1.252e-05 2
|  offset_scan 5.146e-05 8
|  |  derive_revClass 4.5161e-05 7
|  |  |  ask_revClass 4.5161e-05 7
|  |  ask_revClass 6.299e-06 1
device_free 0.00358732 96
|  derive_revClass 0.000796198 21
|  |  ask_revClass 0.000796198 21
|  offset_scan 0.000102963 8
|  |  derive_revClass 9.0479e-05 7
|  |  |  ask_revClass 9.0479e-05 7
|  |  ask_revClass 1.2484e-05 1
|  ask_revClass 9.044e-06 1
derive_revClass 0.00212429 7
|  ask_revClass 0.00212429 7
binary::read(path, comm, mesh, strict) 0.00168493 2
array host to device 0.000938235 36
|  binary::read(istream, mesh, version) 0.000851867 27
|  |  binary::read_in_comm(path, comm, mesh, version) 0.000851867 27
|  |  |  binary::read(path, comm, mesh, strict) 0.000851867 27
transit 0.000713472 2
|  derive_adj 0.000713472 2
|  |  ask_adj 0.000713472 2
offset_scan 0.000159116 8
|  derive_revClass 0.000139734 7
|  |  ask_revClass 0.000139734 7
|  ask_revClass 1.9382e-05 1
ask_revClass 0.000151177 15
|  ask_revClass 4.2e-07 1
binary::read_in_comm(path, comm, mesh, version) 9.4113e-05 2
|  binary::read(path, comm, mesh, strict) 9.4113e-05 2
single host to device 8.3229e-05 8
|  offset_scan 8.3229e-05 8
|  |  derive_revClass 7.3759e-05 7
|  |  |  ask_revClass 7.3759e-05 7
|  |  ask_revClass 9.47e-06 1
Write allocation 7.37e-05 88
|  derive_revClass 2.4536e-05 28
|  |  ask_revClass 2.4536e-05 28
|  binary::read(istream, mesh, version) 2.2833e-05 27
|  |  binary::read_in_comm(path, comm, mesh, version) 2.2833e-05 27
|  |  |  binary::read(path, comm, mesh, strict) 2.2833e-05 27
|  offset_scan 6.578e-06 8
|  |  derive_revClass 5.352e-06 7
|  |  |  ask_revClass 5.352e-06 7
|  |  ask_revClass 1.226e-06 1
|  transit 3.562e-06 2
|  |  derive_adj 3.562e-06 2
|  |  |  ask_adj 3.562e-06 2
|  ask_revClass 1.973e-06 2
set_ents 1.3327e-05 5
|  binary::read(istream, mesh, version) 1.3327e-05 5
|  |  binary::read_in_comm(path, comm, mesh, version) 1.3327e-05 5
|  |  |  binary::read(path, comm, mesh, strict) 1.3327e-05 5
ask_adj 8.063e-06 10
|  derive_adj 2.247e-06 4
|  |  ask_adj 2.247e-06 4
derive_adj 5.779e-06 2
|  ask_adj 5.779e-06 2

scorec_noKk.log

TOP-DOWN:
=========
ask_revClass 42.1864 14
|  derive_revClass 42.1862 7
|  |  sort_by_high_index 42.1842 7
|  |  Write allocation 0.000283144 28
|  |  |  device_malloc 0.000263804 28
|  |  device_free 0.000178336 21
|  |  offset_scan 0.000176175 7
|  |  |  device_free 3.9569e-05 7
|  |  |  single host to device 2.341e-05 7
|  |  |  Write allocation 2.218e-05 7
|  |  |  |  device_malloc 1.934e-05 7
|  |  |  device_malloc 1.7779e-05 7
|  offset_scan 2.5029e-05 1
|  |  device_free 6.579e-06 1
|  |  Write allocation 2.89e-06 1
|  |  |  device_malloc 2.13e-06 1
|  |  single host to device 2.89e-06 1
|  |  device_malloc 2.42e-06 1
|  Write allocation 6.04e-06 2
|  |  device_malloc 5.06e-06 2
|  ask_revClass 3.56e-06 1
|  device_free 2.52e-06 1
binary::read(path, comm, mesh, strict) 0.0365693 2
|  binary::read_in_comm(path, comm, mesh, version) 0.0363525 2
|  |  binary::read(istream, mesh, version) 0.0363086 2
|  |  |  array host to device 0.000896288 27
|  |  |  Write allocation 0.000690983 27
|  |  |  |  device_malloc 0.000660933 27
|  |  |  set_ents 6.7368e-05 5
device_free 0.000542124 66
ask_adj 0.000218783 6
|  derive_adj 0.000211903 2
|  |  transit 0.000204544 2
|  |  |  Write allocation 8.1698e-05 2
|  |  |  |  device_malloc 7.9838e-05 2
|  |  ask_adj 1.62e-06 4
Write allocation 0.000125347 21
|  device_malloc 0.000116357 21
array host to device 2.5568e-05 9

BOTTOM-UP:
==========
sort_by_high_index 42.1842 7
|  derive_revClass 42.1842 7
|  |  ask_revClass 42.1842 7
binary::read(istream, mesh, version) 0.0346539 2
|  binary::read_in_comm(path, comm, mesh, version) 0.0346539 2
|  |  binary::read(path, comm, mesh, strict) 0.0346539 2
derive_revClass 0.00134197 7
|  ask_revClass 0.00134197 7
device_malloc 0.00116766 96
|  Write allocation 0.00114746 88
|  |  binary::read(istream, mesh, version) 0.000660933 27
|  |  |  binary::read_in_comm(path, comm, mesh, version) 0.000660933 27
|  |  |  |  binary::read(path, comm, mesh, strict) 0.000660933 27
|  |  derive_revClass 0.000263804 28
|  |  |  ask_revClass 0.000263804 28
|  |  transit 7.9838e-05 2
|  |  |  derive_adj 7.9838e-05 2
|  |  |  |  ask_adj 7.9838e-05 2
|  |  offset_scan 2.147e-05 8
|  |  |  derive_revClass 1.934e-05 7
|  |  |  |  ask_revClass 1.934e-05 7
|  |  |  ask_revClass 2.13e-06 1
|  |  ask_revClass 5.06e-06 2
|  offset_scan 2.0199e-05 8
|  |  derive_revClass 1.7779e-05 7
|  |  |  ask_revClass 1.7779e-05 7
|  |  ask_revClass 2.42e-06 1
array host to device 0.000921856 36
|  binary::read(istream, mesh, version) 0.000896288 27
|  |  binary::read_in_comm(path, comm, mesh, version) 0.000896288 27
|  |  |  binary::read(path, comm, mesh, strict) 0.000896288 27
device_free 0.000769128 96
|  derive_revClass 0.000178336 21
|  |  ask_revClass 0.000178336 21
|  offset_scan 4.6148e-05 8
|  |  derive_revClass 3.9569e-05 7
|  |  |  ask_revClass 3.9569e-05 7
|  |  ask_revClass 6.579e-06 1
|  ask_revClass 2.52e-06 1
binary::read(path, comm, mesh, strict) 0.000216783 2
ask_revClass 0.000129035 15
|  ask_revClass 3.56e-06 1
transit 0.000122846 2
|  derive_adj 0.000122846 2
|  |  ask_adj 0.000122846 2
offset_scan 8.3487e-05 8
|  derive_revClass 7.3237e-05 7
|  |  ask_revClass 7.3237e-05 7
|  ask_revClass 1.025e-05 1
set_ents 6.7368e-05 5
|  binary::read(istream, mesh, version) 6.7368e-05 5
|  |  binary::read_in_comm(path, comm, mesh, version) 6.7368e-05 5
|  |  |  binary::read(path, comm, mesh, strict) 6.7368e-05 5
Write allocation 6.482e-05 88
|  binary::read(istream, mesh, version) 3.005e-05 27
|  |  binary::read_in_comm(path, comm, mesh, version) 3.005e-05 27
|  |  |  binary::read(path, comm, mesh, strict) 3.005e-05 27
|  derive_revClass 1.934e-05 28
|  |  ask_revClass 1.934e-05 28
|  offset_scan 3.6e-06 8
|  |  derive_revClass 2.84e-06 7
|  |  |  ask_revClass 2.84e-06 7
|  |  ask_revClass 7.6e-07 1
|  transit 1.86e-06 2
|  |  derive_adj 1.86e-06 2
|  |  |  ask_adj 1.86e-06 2
|  ask_revClass 9.8e-07 2
binary::read_in_comm(path, comm, mesh, version) 4.3949e-05 2
|  binary::read(path, comm, mesh, strict) 4.3949e-05 2
single host to device 2.63e-05 8
|  offset_scan 2.63e-05 8
|  |  derive_revClass 2.341e-05 7
|  |  |  ask_revClass 2.341e-05 7
|  |  ask_revClass 2.89e-06 1
ask_adj 8.5e-06 10
|  derive_adj 1.62e-06 4
|  |  ask_adj 1.62e-06 4
derive_adj 5.739e-06 2
|  ask_adj 5.739e-06 2

copy semantics of Mesh

Looking at the Mesh class, I'm not entirely sure what the intended copy semantics are:

Omega_h::Mesh mesh_a;
Omega_h::Mesh mesh_b = mesh_a; // most data is shared via internal shared_ptrs, but not all!

mesh_b.set_dim(1); // dim is an integer, so it is copied by value; mesh_a's dim still has its default value
mesh_b.set_coords(new_coords); // coords are a tag, and tags are shared between mesh instances via shared_ptr, so both mesh_a and mesh_b now have the new coords!

Also, you can't assume that values stored internally as shared_ptrs are shared between instances: some setters, such as set_parents(), assign a new shared_ptr, so assigning new values in one instance won't affect the other.

I suspect most people pass the mesh by pointer or reference and never actually copy a mesh instance, which is why we haven't hit the nasty bugs and confusion that this will cause.

Since copying doesn't do what people expect, I suggest either deprecating or deleting the copy constructor and copy assignment until we decide on the intended semantics and fix the class to abide by them consistently.

Issue building on RHEL7 with cuda 12.1 and gcc 11.2.0

While building with cuda 12.1 and gcc 11.2.0 on a RHEL 7 computer, I encountered the following thrust error:

[  1%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_adapt.cpp.o
[  1%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_align.cpp.o
[  2%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_adj.cpp.o
[  2%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_amr.cpp.o
[  2%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_amr_topology.cpp.o
[  3%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_amr_transfer.cpp.o
[  3%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_any.cpp.o
[  4%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_approach.cpp.o
[  4%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_array.cpp.o
[  5%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_assoc.cpp.o
[  5%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_array_ops.cpp.o
[  6%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_base64.cpp.o
/lore/zhangc20/pumi-pic/omega_h/src/Omega_h_reduce.hpp(84): error: namespace "thrust" has no member "device"
    return thrust::transform_reduce(thrust::device, first, last,
                                            ^

1 error detected in the compilation of "/lore/zhangc20/pumi-pic/omega_h/src/Omega_h_array_ops.cpp".
make[2]: *** [src/CMakeFiles/omega_h.dir/Omega_h_array_ops.cpp.o] Error 2
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [src/CMakeFiles/omega_h.dir/all] Error 2
make: *** [all] Error 2

The build script:

module unuse /opt/scorec/spack/lmod/linux-rhel7-x86_64/Core
module use /opt/scorec/spack/v0181_1/lmod/linux-rhel7-x86_64/Core
module load gcc/11.2.0
module load mpich/4.0.2
module load cmake

cuda=/usr/local/cuda-12.1
export PATH=$cuda/bin:$PATH
export LD_LIBRARY_PATH=$cuda/lib64:$LD_LIBRARY_PATH
export installroot=$PWD
export srcroot=$installroot/../

# kokkos
export kk=$installroot/kokkos/install
export kksrc=$srcroot/kokkos

# omega_h
export oh=$installroot/omega_h/install
export ohsrc=$srcroot/omega_h

cd $installroot
mkdir -p omega_h/build
cd omega_h/build
cmake $ohsrc -DCMAKE_BUILD_TYPE=Release \
             -DCMAKE_INSTALL_PREFIX=$oh \
             -DBUILD_SHARED_LIBS=OFF \
             -DOmega_h_USE_Kokkos=ON \
             -DOmega_h_USE_CUDA=on \
             -DOmega_h_CUDA_ARCH=86 \
             -DOmega_h_USE_MPI=on \
             -DOmega_h_USE_OpenMP=OFF \
             -DBUILD_TESTING=ON \
             -DCMAKE_CXX_COMPILER=mpicxx \
             -DCMAKE_C_COMPILER=mpicc \
             -DKokkos_PREFIX=$kk/lib64/cmake
make -j4 install
ctest

what versions of kokkos and cuda does omegah 10.8 require?

Build and test results for Omega_h (cws/cuda112, based on 10.8.1, which allows testing with cuda 11.2) with Kokkos 3.7.#, 4.0.#, and 4.1.00 on an i5-13600KF with an NVIDIA GeForce RTX 3060, using the build scripts listed below:

gcc, cuda, result
10.4.0, 11.1.1, kokkos 4.1.0 fails to compile, kokkos 3.7.* and 4.0.* not tested
10.4.0, 11.2.1, omegah tests run_aniso_test and rc_field_test fail with cuda asserts, kokkos 3.7.* and 4.0.* not tested
10.4.0, 11.3.1, omegah tests run_aniso_test and rc_field_test fail with cuda asserts, kokkos 3.7.* and 4.0.* not tested
10.4.0, 11.4.4, all omegah tests pass
10.4.0, 11.5.2, all omegah tests pass
10.4.0, 11.6.2, all omegah tests pass
10.4.0, 11.7.1, all omegah tests pass
10.4.0, 11.8.0, all omegah tests pass
12.3.0, 12.1.1, all omegah tests pass

This page lists the compatible GCC - CUDA combinations: https://gist.github.com/ax3l/9489132#nvcc

build scripts

kokkos

suffix=$2
d=buildKokkos${suffix}
cmake -S kokkos -B $d \
  -DCMAKE_CXX_COMPILER=g++ \
  -DKokkos_ARCH_AMPERE86=ON \
  -DKokkos_ENABLE_CUDA=on \
  -DKokkos_ENABLE_CUDA_LAMBDA=on \
  -DBUILD_SHARED_LIBS=ON \
  -DKokkos_ENABLE_SERIAL=ON \
  -DKokkos_ENABLE_DEBUG=on \
  -DKokkos_ENABLE_TESTS=off \
  -DCMAKE_INSTALL_PREFIX=$d/install
cmake --build $d -j 24 --target install

omegah

suffix=$1
d=buildOmegahKokkos$suffix
cmake -S omega_h -B $d \
  -DCMAKE_INSTALL_PREFIX=$d/install \
  -DCMAKE_BUILD_TYPE=Debug \
  -DBUILD_TESTING=on  \
  -DOmega_h_USE_MPI=off  \
  -DOmega_h_USE_Kokkos=on \
  -DOmega_h_CUDA_ARCH="86" \
  -DCMAKE_CUDA_ARCHITECTURES="86" \
  -DKokkos_PREFIX=buildKokkosCUDA$suffix/install \
  -DBUILD_SHARED_LIBS=on

cmake --build $d --target install -j8

Checking IDs of adjacent entities in the mixed mesh test

In the unit test for mixed mesh, the IDs of adjacent entities returned
from query APIs are checked against known values.

omega_h/src/mixed_test.cpp

Lines 132 to 153 in 466e470

OMEGA_H_CHECK(tet2vtx.ab2b == LOs({0, 1, 8, 2}));
OMEGA_H_CHECK(hex2vtx.ab2b == LOs({4, 5, 9, 11, 7, 6, 1, 0}));
OMEGA_H_CHECK(wedge2vtx.ab2b == LOs({11, 9, 10, 0, 1, 8}));
OMEGA_H_CHECK(pyram2vtx.ab2b == LOs({8, 1, 9, 10, 3}));
OMEGA_H_CHECK(tet2edge.ab2b == LOs({0, 22, 20, 1, 3, 6}));
OMEGA_H_CHECK(hex2edge.ab2b == LOs({13, 9, 18, 12, 14, 15, 23, 19, 16, 5, 0, 2}));
OMEGA_H_CHECK(wedge2edge.ab2b == LOs({18, 17, 11, 19, 23, 21, 0, 22, 20}));
OMEGA_H_CHECK(pyram2edge.ab2b == LOs({22, 23, 17, 21, 7, 4, 8, 10}));
}
else if (num_vertex == 9) {
//pyram on hex
OMEGA_H_CHECK(hex2edge.ab2b == LOs({1, 7, 4, 0, 2, 10, 8, 5, 12, 13, 14, 15}));
OMEGA_H_CHECK(hex2vtx.ab2b == LOs({0, 3, 2, 1, 5, 8, 7, 6}));
OMEGA_H_CHECK(pyram2edge.ab2b == LOs({12, 13, 14, 15, 3, 11, 9, 6}));
OMEGA_H_CHECK(pyram2vtx.ab2b == LOs({5, 8, 7, 6, 4}));
}
else if (num_vertex == 7) {
//tet on wedge
OMEGA_H_CHECK(tet2edge.ab2b == LOs({9, 10, 11, 3, 8, 6}));
OMEGA_H_CHECK(tet2vtx.ab2b == LOs({4, 6, 5, 3}));
OMEGA_H_CHECK(wedge2edge.ab2b == LOs({1, 4, 0, 2, 7, 5, 9, 10, 11}));
OMEGA_H_CHECK(wedge2vtx.ab2b == LOs({0, 2, 1, 4, 6, 5}));

An open issue is to define a new OMEGA_H_VERIFY in Omega_h_fail.hpp which
prints the values of the adjacent entity IDs in case of a mismatch, before failing.

don't use static functions with thrust parallel_for

The following fails to compile using cuda 10.2:

#include "Omega_h_library.hpp"
#include <Omega_h_for.hpp>

static void foo() {
  auto coords = Omega_h::LOs(100);
  auto f = OMEGA_H_LAMBDA(Omega_h::LO v) {
    auto z = coords[v];
  };
  Omega_h::parallel_for(100, f);
}

int main(int argc, char** argv) {
  auto lib = Omega_h::Library(&argc, &argv);

  foo();

  return 0;
}

It produces a very long compilation error (see below) on cranium.

Removing static from foo() resolves it. No idea why.

[100%] Building CUDA object src/CMakeFiles/boundary_field_test.dir/boundary_field_test.cpp.o
cd /space/cwsmith/testOmegahBuild/build-gcc74-cuda-omegah/src && /usr/local/cuda-10.2/bin/nvcc -forward-unknown-to-host-compiler  -I/space/cwsmith/testOmegahBuild/omega_h/src -I/space/cwsmith/testOmegahBuild/build-gcc74-cuda-omegah/src --generate-code=arch=compute_75,code=[compute_75,sm_75] --compiler-options -W,-Wall,-Wextra,-Werror,-Wno-noexcept-type --Werror cross-execution-space-call,deprecated-declarations --expt-extended-lambda -std=c++14 -MD -MT src/CMakeFiles/boundary_field_test.dir/boundary_field_test.cpp.o -MF CMakeFiles/boundary_field_test.dir/boundary_field_test.cpp.o.d -x cu -c /space/cwsmith/testOmegahBuild/omega_h/src/boundary_field_test.cpp -o CMakeFiles/boundary_field_test.dir/boundary_field_test.cpp.o
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h: In instantiation of 'struct thrust::cuda_cub::core::has_sm_tuning<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::sm30>':
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:139:8:   required from 'struct thrust::cuda_cub::core::specialize_plan_impl_match<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::typelist<thrust::cuda_cub::core::sm30> >'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:113:8:   required from 'struct thrust::cuda_cub::core::specialize_plan_impl_loop<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::sm30, thrust::cuda_cub::core::typelist<thrust::cuda_cub::core::sm30> >'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:108:8:   required from 'struct thrust::cuda_cub::core::specialize_plan_impl_loop<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::sm30, thrust::cuda_cub::core::typelist<thrust::cuda_cub::core::sm60, thrust::cuda_cub::core::sm52, thrust::cuda_cub::core::sm35, thrust::cuda_cub::core::sm30> >'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:158:8:   [ skipping 3 instantiation contexts, use -ftemplate-backtrace-limit=0 to disable ]
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/parallel_for.h:139:65:   required from 'cudaError_t thrust::cuda_cub::__parallel_for::parallel_for(Size, F, cudaStream_t) [with F = thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >; Size = int; cudaError_t = cudaError; cudaStream_t = CUstream_st*]'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/parallel_for.h:164:50:   required from 'void thrust::cuda_cub::parallel_for(thrust::cuda_cub::execution_policy<Derived>&, F, Size) [with Derived = thrust::cuda_cub::par_t; F = thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >; Size = int]'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/for_each.h:79:23:   required from 'Input thrust::cuda_cub::for_each_n(thrust::cuda_cub::execution_policy<Derived>&, Input, Size, UnaryOp) [with Derived = thrust::cuda_cub::par_t; Input = Omega_h::IntIterator; Size = int; UnaryOp = __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>]'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/for_each.h:103:28:   required from 'Input thrust::cuda_cub::for_each(thrust::cuda_cub::execution_policy<Derived>&, Input, Input, UnaryOp) [with Derived = thrust::cuda_cub::par_t; Input = Omega_h::IntIterator; UnaryOp = __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>]'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/detail/for_each.inl:44:16:   required from 'InputIterator thrust::for_each(const thrust::detail::execution_policy_base<DerivedPolicy>&, InputIterator, InputIterator, UnaryFunction) [with DerivedPolicy = thrust::cuda_cub::par_t; InputIterator = Omega_h::IntIterator; UnaryFunction = __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>]'
/space/cwsmith/testOmegahBuild/omega_h/src/Omega_h_for.hpp:39:17:   required from 'void Omega_h::for_each(InputIterator, InputIterator, UnaryFunction&&) [with InputIterator = Omega_h::IntIterator; UnaryFunction = __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>&]'
/space/cwsmith/testOmegahBuild/omega_h/src/Omega_h_for.hpp:57:18:   required from 'void Omega_h::parallel_for(Omega_h::LO, UnaryFunction&&) [with UnaryFunction = __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>&; Omega_h::LO = int]'
/space/cwsmith/testOmegahBuild/omega_h/src/boundary_field_test.cpp:9:29:   required from here
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:132:8: error: 'thrust::cuda_cub::core::has_sm_tuning<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::sm30>' has a base 'thrust::cuda_cub::core::has_sm_tuning_impl<thrust::cuda_cub::core::sm30, thrust::cuda_cub::__parallel_for::Tuning<thrust::cuda_cub::core::sm30, thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> > > >' whose type uses the anonymous namespace [-Werror=subobject-linkage]
   struct has_sm_tuning : has_sm_tuning_impl<SM, typename P<lowest_supported_sm_arch>::tuning > {};
        ^~~~~~~~~~~~~
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h: In instantiation of 'struct thrust::cuda_cub::core::specialize_plan_impl_match<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::typelist<thrust::cuda_cub::core::sm30> >':
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:113:8:   required from 'struct thrust::cuda_cub::core::specialize_plan_impl_loop<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::sm30, thrust::cuda_cub::core::typelist<thrust::cuda_cub::core::sm30> >'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:108:8:   required from 'struct thrust::cuda_cub::core::specialize_plan_impl_loop<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::sm30, thrust::cuda_cub::core::typelist<thrust::cuda_cub::core::sm60, thrust::cuda_cub::core::sm52, thrust::cuda_cub::core::sm35, thrust::cuda_cub::core::sm30> >'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:158:8:   required from 'struct thrust::cuda_cub::core::specialize_plan<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::sm30>'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/parallel_for.h:81:6:   [ skipping 2 instantiation contexts, use -ftemplate-backtrace-limit=0 to disable ]
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/parallel_for.h:139:65:   required from 'cudaError_t thrust::cuda_cub::__parallel_for::parallel_for(Size, F, cudaStream_t) [with F = thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >; Size = int; cudaError_t = cudaError; cudaStream_t = CUstream_st*]'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/parallel_for.h:164:50:   required from 'void thrust::cuda_cub::parallel_for(thrust::cuda_cub::execution_policy<Derived>&, F, Size) [with Derived = thrust::cuda_cub::par_t; F = thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >; Size = int]'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/for_each.h:79:23:   required from 'Input thrust::cuda_cub::for_each_n(thrust::cuda_cub::execution_policy<Derived>&, Input, Size, UnaryOp) [with Derived = thrust::cuda_cub::par_t; Input = Omega_h::IntIterator; Size = int; UnaryOp = __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>]'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/for_each.h:103:28:   required from 'Input thrust::cuda_cub::for_each(thrust::cuda_cub::execution_policy<Derived>&, Input, Input, UnaryOp) [with Derived = thrust::cuda_cub::par_t; Input = Omega_h::IntIterator; UnaryOp = __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>]'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/detail/for_each.inl:44:16:   required from 'InputIterator thrust::for_each(const thrust::detail::execution_policy_base<DerivedPolicy>&, InputIterator, InputIterator, UnaryFunction) [with DerivedPolicy = thrust::cuda_cub::par_t; InputIterator = Omega_h::IntIterator; UnaryFunction = __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>]'
/space/cwsmith/testOmegahBuild/omega_h/src/Omega_h_for.hpp:39:17:   required from 'void Omega_h::for_each(InputIterator, InputIterator, UnaryFunction&&) [with InputIterator = Omega_h::IntIterator; UnaryFunction = __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>&]'
/space/cwsmith/testOmegahBuild/omega_h/src/Omega_h_for.hpp:57:18:   required from 'void Omega_h::parallel_for(Omega_h::LO, UnaryFunction&&) [with UnaryFunction = __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>&; Omega_h::LO = int]'
/space/cwsmith/testOmegahBuild/omega_h/src/boundary_field_test.cpp:9:29:   required from here
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:139:8: error: 'thrust::cuda_cub::core::specialize_plan_impl_match<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::typelist<thrust::cuda_cub::core::sm30> >' has a base 'thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan<thrust::cuda_cub::core::sm30>' whose type uses the anonymous namespace [-Werror=subobject-linkage]
   struct specialize_plan_impl_match<P, typelist<SM, _1, _2, _3, _4, _5, _6, _7, _8, _9> >
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h: In instantiation of 'struct thrust::cuda_cub::core::has_sm_tuning<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::sm60>':
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:139:8:   required from 'struct thrust::cuda_cub::core::specialize_plan_impl_match<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::typelist<thrust::cuda_cub::core::sm60, thrust::cuda_cub::core::sm52, thrust::cuda_cub::core::sm35, thrust::cuda_cub::core::sm30> >'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:113:8:   required from 'struct thrust::cuda_cub::core::specialize_plan_impl_loop<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::sm60, thrust::cuda_cub::core::typelist<thrust::cuda_cub::core::sm60, thrust::cuda_cub::core::sm52, thrust::cuda_cub::core::sm35, thrust::cuda_cub::core::sm30> >'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:158:8:   required from 'struct thrust::cuda_cub::core::specialize_plan<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::sm60>'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:170:125:   required by substitution of 'template<class S> static thrust::cuda_cub::core::has_temp_storage<thrust::cuda_cub::core::specialize_plan<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::sm60> >::yes_type thrust::cuda_cub::core::has_temp_storage<thrust::cuda_cub::core::specialize_plan<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::sm60> >::test<S>(typename S::TempStorage*) [with S = thrust::cuda_cub::core::specialize_plan<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::sm60>]'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:170:246:   required from 'const bool thrust::cuda_cub::core::has_temp_storage<thrust::cuda_cub::core::specialize_plan<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::sm60> >::value'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:170:330:   [ skipping 7 instantiation contexts, use -ftemplate-backtrace-limit=0 to disable ]
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/for_each.h:79:23:   required from 'Input thrust::cuda_cub::for_each_n(thrust::cuda_cub::execution_policy<Derived>&, Input, Size, UnaryOp) [with Derived = thrust::cuda_cub::par_t; Input = Omega_h::IntIterator; Size = int; UnaryOp = __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>]'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/for_each.h:103:28:   required from 'Input thrust::cuda_cub::for_each(thrust::cuda_cub::execution_policy<Derived>&, Input, Input, UnaryOp) [with Derived = thrust::cuda_cub::par_t; Input = Omega_h::IntIterator; UnaryOp = __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>]'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/detail/for_each.inl:44:16:   required from 'InputIterator thrust::for_each(const thrust::detail::execution_policy_base<DerivedPolicy>&, InputIterator, InputIterator, UnaryFunction) [with DerivedPolicy = thrust::cuda_cub::par_t; InputIterator = Omega_h::IntIterator; UnaryFunction = __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>]'
/space/cwsmith/testOmegahBuild/omega_h/src/Omega_h_for.hpp:39:17:   required from 'void Omega_h::for_each(InputIterator, InputIterator, UnaryFunction&&) [with InputIterator = Omega_h::IntIterator; UnaryFunction = __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>&]'
/space/cwsmith/testOmegahBuild/omega_h/src/Omega_h_for.hpp:57:18:   required from 'void Omega_h::parallel_for(Omega_h::LO, UnaryFunction&&) [with UnaryFunction = __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>&; Omega_h::LO = int]'
/space/cwsmith/testOmegahBuild/omega_h/src/boundary_field_test.cpp:9:29:   required from here
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:132:8: error: 'thrust::cuda_cub::core::has_sm_tuning<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::sm60>' has a base 'thrust::cuda_cub::core::has_sm_tuning_impl<thrust::cuda_cub::core::sm60, thrust::cuda_cub::__parallel_for::Tuning<thrust::cuda_cub::core::sm30, thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> > > >' whose type uses the anonymous namespace [-Werror=subobject-linkage]
   struct has_sm_tuning : has_sm_tuning_impl<SM, typename P<lowest_supported_sm_arch>::tuning > {};
        ^~~~~~~~~~~~~
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h: In instantiation of 'struct thrust::cuda_cub::core::has_sm_tuning<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::sm52>':
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:139:8:   required from 'struct thrust::cuda_cub::core::specialize_plan_impl_match<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::typelist<thrust::cuda_cub::core::sm60, thrust::cuda_cub::core::sm52, thrust::cuda_cub::core::sm35, thrust::cuda_cub::core::sm30> >'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:113:8:   required from 'struct thrust::cuda_cub::core::specialize_plan_impl_loop<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::sm60, thrust::cuda_cub::core::typelist<thrust::cuda_cub::core::sm60, thrust::cuda_cub::core::sm52, thrust::cuda_cub::core::sm35, thrust::cuda_cub::core::sm30> >'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:158:8:   required from 'struct thrust::cuda_cub::core::specialize_plan<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::sm60>'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:170:125:   required by substitution of 'template<class S> static thrust::cuda_cub::core::has_temp_storage<thrust::cuda_cub::core::specialize_plan<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::sm60> >::yes_type thrust::cuda_cub::core::has_temp_storage<thrust::cuda_cub::core::specialize_plan<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::sm60> >::test<S>(typename S::TempStorage*) [with S = thrust::cuda_cub::core::specialize_plan<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::sm60>]'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:170:246:   [ skipping 7 instantiation contexts, use -ftemplate-backtrace-limit=0 to disable ]
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/parallel_for.h:164:50:   required from 'void thrust::cuda_cub::parallel_for(thrust::cuda_cub::execution_policy<Derived>&, F, Size) [with Derived = thrust::cuda_cub::par_t; F = thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >; Size = int]'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/for_each.h:79:23:   required from 'Input thrust::cuda_cub::for_each_n(thrust::cuda_cub::execution_policy<Derived>&, Input, Size, UnaryOp) [with Derived = thrust::cuda_cub::par_t; Input = Omega_h::IntIterator; Size = int; UnaryOp = __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>]'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/for_each.h:103:28:   required from 'Input thrust::cuda_cub::for_each(thrust::cuda_cub::execution_policy<Derived>&, Input, Input, UnaryOp) [with Derived = thrust::cuda_cub::par_t; Input = Omega_h::IntIterator; UnaryOp = __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>]'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/detail/for_each.inl:44:16:   required from 'InputIterator thrust::for_each(const thrust::detail::execution_policy_base<DerivedPolicy>&, InputIterator, InputIterator, UnaryFunction) [with DerivedPolicy = thrust::cuda_cub::par_t; InputIterator = Omega_h::IntIterator; UnaryFunction = __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>]'
/space/cwsmith/testOmegahBuild/omega_h/src/Omega_h_for.hpp:39:17:   required from 'void Omega_h::for_each(InputIterator, InputIterator, UnaryFunction&&) [with InputIterator = Omega_h::IntIterator; UnaryFunction = __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>&]'
/space/cwsmith/testOmegahBuild/omega_h/src/Omega_h_for.hpp:57:18:   required from 'void Omega_h::parallel_for(Omega_h::LO, UnaryFunction&&) [with UnaryFunction = __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>&; Omega_h::LO = int]'
/space/cwsmith/testOmegahBuild/omega_h/src/boundary_field_test.cpp:9:29:   required from here
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:132:8: error: 'thrust::cuda_cub::core::has_sm_tuning<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::sm52>' has a base 'thrust::cuda_cub::core::has_sm_tuning_impl<thrust::cuda_cub::core::sm52, thrust::cuda_cub::__parallel_for::Tuning<thrust::cuda_cub::core::sm30, thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> > > >' whose type uses the anonymous namespace [-Werror=subobject-linkage]
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h: In instantiation of 'struct thrust::cuda_cub::core::has_sm_tuning<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::sm35>':
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:139:8:   required from 'struct thrust::cuda_cub::core::specialize_plan_impl_match<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::typelist<thrust::cuda_cub::core::sm60, thrust::cuda_cub::core::sm52, thrust::cuda_cub::core::sm35, thrust::cuda_cub::core::sm30> >'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:113:8:   required from 'struct thrust::cuda_cub::core::specialize_plan_impl_loop<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::sm60, thrust::cuda_cub::core::typelist<thrust::cuda_cub::core::sm60, thrust::cuda_cub::core::sm52, thrust::cuda_cub::core::sm35, thrust::cuda_cub::core::sm30> >'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:158:8:   required from 'struct thrust::cuda_cub::core::specialize_plan<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::sm60>'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:170:125:   [ skipping 7 instantiation contexts, use -ftemplate-backtrace-limit=0 to disable ]
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/parallel_for.h:139:65:   required from 'cudaError_t thrust::cuda_cub::__parallel_for::parallel_for(Size, F, cudaStream_t) [with F = thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >; Size = int; cudaError_t = cudaError; cudaStream_t = CUstream_st*]'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/parallel_for.h:164:50:   required from 'void thrust::cuda_cub::parallel_for(thrust::cuda_cub::execution_policy<Derived>&, F, Size) [with Derived = thrust::cuda_cub::par_t; F = thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >; Size = int]'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/for_each.h:79:23:   required from 'Input thrust::cuda_cub::for_each_n(thrust::cuda_cub::execution_policy<Derived>&, Input, Size, UnaryOp) [with Derived = thrust::cuda_cub::par_t; Input = Omega_h::IntIterator; Size = int; UnaryOp = __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>]'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/for_each.h:103:28:   required from 'Input thrust::cuda_cub::for_each(thrust::cuda_cub::execution_policy<Derived>&, Input, Input, UnaryOp) [with Derived = thrust::cuda_cub::par_t; Input = Omega_h::IntIterator; UnaryOp = __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>]'
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/detail/for_each.inl:44:16:   required from 'InputIterator thrust::for_each(const thrust::detail::execution_policy_base<DerivedPolicy>&, InputIterator, InputIterator, UnaryFunction) [with DerivedPolicy = thrust::cuda_cub::par_t; InputIterator = Omega_h::IntIterator; UnaryFunction = __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>]'
/space/cwsmith/testOmegahBuild/omega_h/src/Omega_h_for.hpp:39:17:   required from 'void Omega_h::for_each(InputIterator, InputIterator, UnaryFunction&&) [with InputIterator = Omega_h::IntIterator; UnaryFunction = __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>&]'
/space/cwsmith/testOmegahBuild/omega_h/src/Omega_h_for.hpp:57:18:   required from 'void Omega_h::parallel_for(Omega_h::LO, UnaryFunction&&) [with UnaryFunction = __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>&; Omega_h::LO = int]'
/space/cwsmith/testOmegahBuild/omega_h/src/boundary_field_test.cpp:9:29:   required from here
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h:132:8: error: 'thrust::cuda_cub::core::has_sm_tuning<thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> >, int>::PtxPlan, thrust::cuda_cub::core::sm35>' has a base 'thrust::cuda_cub::core::has_sm_tuning_impl<thrust::cuda_cub::core::sm35, thrust::cuda_cub::__parallel_for::Tuning<thrust::cuda_cub::core::sm30, thrust::cuda_cub::for_each_f<Omega_h::IntIterator, thrust::detail::wrapped_function<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), foo, 1>, Omega_h::LOs>, void> > > >' whose type uses the anonymous namespace [-Werror=subobject-linkage]
cc1plus: all warnings being treated as errors
make[3]: *** [src/CMakeFiles/boundary_field_test.dir/boundary_field_test.cpp.o] Error 1
make[3]: Leaving directory `/space/cwsmith/testOmegahBuild/build-gcc74-cuda-omegah'
make[2]: *** [src/CMakeFiles/boundary_field_test.dir/all] Error 2
make[2]: Leaving directory `/space/cwsmith/testOmegahBuild/build-gcc74-cuda-omegah'
make[1]: *** [src/CMakeFiles/boundary_field_test.dir/rule] Error 2
make[1]: Leaving directory `/space/cwsmith/testOmegahBuild/build-gcc74-cuda-omegah'
make: *** [boundary_field_test] Error 2
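
A possible reading of the log above, offered as a guess rather than a confirmed diagnosis: the closure type nvcc generates for the extended lambda is printed as __nv_dl_tag<void (*)(), foo, 1>, so it carries the enclosing function foo as a template argument. If foo is static it has internal linkage, and every Thrust class template instantiated on that closure type then has a base that depends on an internal-linkage entity, which is what GCC's -Wsubobject-linkage reports as "uses the anonymous namespace". The host-only sketch below imitates that shape. LambdaTag, TuningImpl, HasTuning and both foo_* functions are hypothetical stand-ins, not nvcc or Thrust names.

```cpp
#include <type_traits>

// Stand-in for nvcc's __nv_dl_tag: the enclosing function's address is a
// template argument, so the function's linkage becomes part of the type.
template <void (*Enclosing)()>
struct LambdaTag {};

static void foo_static() {}  // internal linkage, like the static foo() in the report
void foo_extern() {}         // external linkage, like foo() after removing static

// Stand-ins for the Thrust helpers named in the log: a class whose base
// depends on the tag type.
template <class T> struct TuningImpl {};
template <class T> struct HasTuning : TuningImpl<T> {};

using StaticPlan = HasTuning<LambdaTag<&foo_static>>;  // base depends on an internal-linkage function
using ExternPlan = HasTuning<LambdaTag<&foo_extern>>;  // base depends only on external-linkage entities

// Instantiates both classes and reports that they are distinct types.
bool plans_are_distinct() {
  StaticPlan s;
  ExternPlan e;
  (void)s;
  (void)e;
  return !std::is_same<StaticPlan, ExternPlan>::value;
}
```

Compiled on its own, this sketch is not expected to warn, because GCC documents that it skips -Wsubobject-linkage for types defined in the main source file. In the failing build the affected templates live in the Thrust headers, and -Werror turns the warning into the errors shown in the log.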

Fix usage of change_tagToMesh/change_tagTorc

There is a pattern in the vtk writer and unmap where the write_array function is surrounded by calls to change_tagToMesh and change_tagTorc.

A change to the set_tags function now invalidates any pointer to the old tag when the array gets swapped, so using change_tagToMesh/change_tagTorc has become silently dangerous.

I'm not exactly sure what the difference in the formats is, but it seems as though the baseline storage of the reverse classification on the mesh should be in the "rc" format and it's just converted to the mesh format to write it out or do whatever unmap does.

@joshia5 I'd like to get your opinion on the following change because it seems that it will save some work/memory.

  1. always store the _rc tags in "rc" format (looks like this is currently the intent)
  2. deprecate/remove change_tagToMesh and change_tagTorc
  3. provide a new function (please help with the naming) rc_tag_to_mesh_array which returns the rc tag array in the "mesh" format.
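
To make item 3 concrete, here is a minimal standalone sketch of what a non-mutating rc_tag_to_mesh_array could look like. It does not use the Omega_h API, and the two layouts are assumptions made only for illustration: the "rc" format is taken to hold one value per entity in the reverse-classified set (rc_ents lists those entity ids), and the "mesh" format is taken to hold one value per mesh entity, with a fill value for entities outside the set. The parameter names and the fill argument are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layouts (NOT confirmed by the Omega_h source):
//   rc format:   rc_values[i * ncomps + c] belongs to mesh entity rc_ents[i]
//   mesh format: mesh_values[e * ncomps + c] belongs to mesh entity e
// Returns a new array in the assumed "mesh" format and leaves the inputs
// untouched, so no stored tag is swapped and no tag pointer is invalidated.
template <class T>
std::vector<T> rc_tag_to_mesh_array(const std::vector<int>& rc_ents,
                                    const std::vector<T>& rc_values,
                                    int n_mesh_ents, int ncomps, T fill) {
  assert(ncomps > 0 && n_mesh_ents >= 0);
  assert(rc_values.size() == rc_ents.size() * static_cast<std::size_t>(ncomps));
  // Start with every mesh entity set to the fill value.
  std::vector<T> mesh_values(static_cast<std::size_t>(n_mesh_ents) * ncomps, fill);
  // Scatter the compact rc values to their mesh entity slots.
  for (std::size_t i = 0; i < rc_ents.size(); ++i) {
    const int ent = rc_ents[i];
    assert(ent >= 0 && ent < n_mesh_ents);
    for (int c = 0; c < ncomps; ++c) {
      mesh_values[static_cast<std::size_t>(ent) * ncomps + c] =
          rc_values[i * ncomps + c];
    }
  }
  return mesh_values;
}
```

With this shape, callers such as the vtk writer would request the expanded array only when they need it and pass it straight to write_array, instead of bracketing the call with change_tagToMesh and change_tagTorc.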

warp_test_serial hangs with amdclang compilers on frontier

Building with amdclang compilers on frontier

https://github.com/SCOREC/omega_h/wiki/Build-and-Run-on-OLCF-Frontier#build-with-mpi-enabled-using-cray-compiler-wrappers-and-amd-compilers

results in warp_test_serial hanging. The output and stack trace are pasted below.

A build with hipcc:

https://github.com/SCOREC/omega_h/wiki/Build-and-Run-on-OLCF-Frontier#build-without-mpi-enabled-using-hipcc

results in warp_test_serial running without error.


stdout/stderr

(ins)cwsmith@frontier06451: ~/omegahKk/test/buildOmegahVega90a_amd $ ./src/warp_test
warp_to_limit completed in one step
before adapting:
6000 tets, quality [0.62,0.85], 6000 >0.30
7930 edges, length [0.67,1.55], 132 <0.71, 7458 in [0.71,1.41], 340 >1.41
quality histogram:
0.00-0.10: 0
0.10-0.20: 0
0.20-0.30: 0
0.30-0.40: 0                                                                                                                                                                                                                                              
0.40-0.50: 0                                                                                                                                                                                                                                              
0.50-0.60: 0                                                                                                                                                                                                                                              
0.60-0.70: 1080                                                                                                                                                                                                                                           
0.70-0.80: 4240                                                                                                                                                                                                                                           
0.80-0.90: 680                                                                                                                                                                                                                                            
0.90-1.00: 0                                                                                                                                                                                                                                              
length histogram:                                                                                                                                                                                                                                         
0.00-0.30: 0                                                                                                                                                                                                                                              
0.30-0.60: 0                                                                                                                                                                                                                                              
0.60-0.90: 3410                                                                                                                                                                                                                                           
0.90-1.20: 3056                                                                                                                                                                                                                                           
1.20-1.50: 1364                                                                                                                                                                                                                                           
1.50-1.80: 100                                                                                                                                                                                                                                            
1.80-2.10: 0                                                                                                                                                                                                                                              
2.10-2.40: 0                                                                                                                                                                                                                                              
2.40-2.70: 0                                                                                                                                                                                                                                              
2.70-3.00: 0                                                                                                                                                                                                                                              
average quality: 0.744939                                                                                                                                                                                                                                 
addressing edge lengths                                                                                                                                                                                                                                   
refining 340 edges                                                                                                                                                                                                                                        
8040 tets, quality [0.63,0.85], 8040 >0.30                                                                                                                                                                                                                
10310 edges, length [0.60,1.41], 1452 <0.71, 8858 in [0.71,1.41]        

... snip ...


warp_to_limit completed in one step
before adapting:
8472 tets, quality [0.33,0.88], 8472 >0.30
10810 edges, length [0.47,1.56], 1796 <0.71, 8616 in [0.71,1.41], 398 >1.41
quality histogram:
0.00-0.10: 0
0.10-0.20: 0
0.20-0.30: 0
0.30-0.40: 24
0.40-0.50: 558
0.50-0.60: 1236
0.60-0.70: 1936
0.70-0.80: 3198
0.80-0.90: 1520
0.90-1.00: 0
length histogram:
0.00-0.30: 0
0.30-0.60: 756
0.60-0.90: 5242
0.90-1.20: 3160
1.20-1.50: 1590
1.50-1.80: 62
1.80-2.10: 0
2.10-2.40: 0
2.40-2.70: 0
2.70-3.00: 0
average quality: 0.693191
addressing edge lengths
refining 390 edges
10016 tets, quality [0.29,0.92], 8 in [0.20,0.30], 10008 >0.30
12772 edges, length [0.22,1.45], 2692 <0.71, 10072 in [0.71,1.41], 8 >1.41
coarsening 298 vertices
8880 tets, quality [0.29,0.95], 4 in [0.20,0.30], 8876 >0.30
11310 edges, length [0.40,1.45], 1958 <0.71, 9344 in [0.71,1.41], 8 >1.41
refining 8 edges
8916 tets, quality [0.28,0.95], 4 in [0.20,0.30], 8912 >0.30
11354 edges, length [0.28,1.40], 1972 <0.71, 9382 in [0.71,1.41]
coarsening 68 vertices
8644 tets, quality [0.29,0.95], 2 in [0.20,0.30], 8642 >0.30
11010 edges, length [0.40,1.41], 1812 <0.71, 9198 in [0.71,1.41]
coarsening 6 vertices
8622 tets, quality [0.29,0.96], 4 in [0.20,0.30], 8618 >0.30
10978 edges, length [0.40,1.41], 1798 <0.71, 9180 in [0.71,1.41]
coarsening 4 vertices
8608 tets, quality [0.29,0.96], 4 in [0.20,0.30], 8604 >0.30
10958 edges, length [0.40,1.41], 1794 <0.71, 9164 in [0.71,1.41]
addressing element qualities

gdb stack

Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007fffed197e7f in ?? () from /opt/rocm-5.3.0/lib/libhsa-runtime64.so.1
Missing separate debuginfos, use: zypper install glibc-debuginfo-2.31-150300.41.1.x86_64 krb5-debuginfo-1.19.2-150400.3.3.1.x86_64 libbrotlicommon1-debuginfo-1.0.7-3.3.1.x86_64 libbrotlidec1-debuginfo-1.0.7-3.3.1.x86_64 libcom_err2-debuginfo-1.46.4-150400.3.3.1.x86_64 libcurl4-debuginfo-7.79.1-150400.5.15.1.x86_64 libdrm2-debuginfo-2.4.107-150400.1.8.x86_64 libdrm_amdgpu1-debuginfo-2.4.107-150400.1.8.x86_64 libelf1-debuginfo-0.185-150400.5.3.1.x86_64 libgcc_s1-debuginfo-12.2.1+git416-150000.1.5.1.x86_64 libidn2-0-debuginfo-2.2.0-3.6.1.x86_64 libjson-c3-debuginfo-0.13-3.3.1.x86_64 libkeyutils1-debuginfo-1.6.3-5.6.1.x86_64 libldap-2_4-2-debuginfo-2.4.46-150200.14.11.2.x86_64 libncurses6-debuginfo-6.1-150000.5.12.1.x86_64 libnghttp2-14-debuginfo-1.40.0-6.1.x86_64 libnl3-200-debuginfo-3.3.0-1.29.x86_64 libnuma1-debuginfo-2.0.14.20.g4ee5e0c-150400.1.24.x86_64 libopenssl1_1-debuginfo-1.1.1l-150400.7.22.1.x86_64 libpcre1-debuginfo-8.45-150000.20.13.1.x86_64 libpsl5-debuginfo-0.20.1-150000.3.3.1.x86_64 libselinux1-debuginfo-3.1-150400.1.69.x86_64 libssh4-debuginfo-0.9.6-150400.1.5.x86_64 libstdc++6-debuginfo-12.2.1+git416-150000.1.5.1.x86_64 libunistring2-debuginfo-0.9.10-1.1.x86_64 libyaml-0-2-debuginfo-0.1.7-1.17.x86_64 libz1-debuginfo-1.2.11-150000.3.39.1.x86_64 libzstd1-debuginfo-1.5.0-150400.1.71.x86_64
(ins)(gdb) where
#0  0x00007fffed197e7f in ?? () from /opt/rocm-5.3.0/lib/libhsa-runtime64.so.1
#1  0x00007fffed197d3a in ?? () from /opt/rocm-5.3.0/lib/libhsa-runtime64.so.1
#2  0x00007fffed18bd09 in ?? () from /opt/rocm-5.3.0/lib/libhsa-runtime64.so.1
#3  0x00007fffebbf21db in ?? () from /opt/rocm-5.3.0/lib/libamdhip64.so.5
#4  0x00007fffebbe13fa in ?? () from /opt/rocm-5.3.0/lib/libamdhip64.so.5
#5  0x00007fffeba54cb7 in ?? () from /opt/rocm-5.3.0/lib/libamdhip64.so.5
#6  0x00007fffeba5ecfc in hipFree () from /opt/rocm-5.3.0/lib/libamdhip64.so.5
#7  0x000000000134aeec in Kokkos::HIPSpace::impl_deallocate (this=<optimized out>, 
    arg_label=arg_label@entry=0x1b95d08 "", arg_alloc_ptr=arg_alloc_ptr@entry=0x7ff7a83f9000, 
    arg_alloc_size=arg_alloc_size@entry=11216, arg_logical_size=arg_logical_size@entry=10960, 
    arg_handle=...) at /ccs/home/cwsmith/omegahKk/test/kokkos/core/src/HIP/Kokkos_HIP_Space.cpp:228
#8  0x000000000134adc4 in Kokkos::HIPSpace::deallocate (this=<optimized out>, arg_label=0x1b95d08 "", 
    arg_alloc_ptr=0x7ff7a83f9000, arg_alloc_size=11216, arg_logical_size=10960)
    at /ccs/home/cwsmith/omegahKk/test/kokkos/core/src/HIP/Kokkos_HIP_Space.cpp:216
#9  0x0000000001349bc7 in Kokkos::Impl::SharedAllocationRecord<Kokkos::HIPSpace, void>::~SharedAllocationRecord (this=0x1b95cd0)
    at /ccs/home/cwsmith/omegahKk/test/kokkos/core/src/HIP/Kokkos_HIP_SharedAllocationRecord.cpp:41
#10 0x0000000001068421 in Kokkos::Impl::SharedAllocationRecord<Kokkos::HIPSpace, Kokkos::Impl::ViewValueFunctor<Kokkos::Device<Kokkos::HIP, Kokkos::HIPSpace>, signed char, true> >::~SharedAllocationRecord (
    this=0x1b95cd0)
    at /ccs/home/cwsmith/omegahKk/test/buildKokkosVega90a_amd/install/include/impl/Kokkos_SharedAlloc.hpp:281
#11 Kokkos::Impl::SharedAllocationRecord<Kokkos::HIPSpace, Kokkos::Impl::ViewValueFunctor<Kokkos::Device<Kokkos::HIP, Kokkos::HIPSpace>, signed char, true> >::~SharedAllocationRecord (this=0x1b95cd0)
    at /ccs/home/cwsmith/omegahKk/test/buildKokkosVega90a_amd/install/include/impl/Kokkos_SharedAlloc.hpp:281
#12 0x0000000001343c21 in Kokkos::Impl::SharedAllocationRecord<void, void>::decrement (
    arg_record=0x1b95cd0)
    at /ccs/home/cwsmith/omegahKk/test/kokkos/core/src/impl/Kokkos_SharedAlloc.cpp:265
#13 0x00000000012c2631 in Kokkos::Impl::SharedAllocationTracker::~SharedAllocationTracker (
    this=0x7fffffff6130)
    at /ccs/home/cwsmith/omegahKk/test/buildKokkosVega90a_amd/install/include/impl/Kokkos_SharedAlloc.hpp:419
#14 Kokkos::Impl::ViewTracker<Kokkos::View<signed char*> >::~ViewTracker (this=0x7fffffff6130)
    at /ccs/home/cwsmith/omegahKk/test/buildKokkosVega90a_amd/install/include/impl/Kokkos_ViewTracker.hpp:39
#15 Kokkos::View<signed char*>::~View (this=0x7fffffff6130)
    at /ccs/home/cwsmith/omegahKk/test/buildKokkosVega90a_amd/install/include/Kokkos_View.hpp:1266
#16 Omega_h::Write<signed char>::~Write (this=0x7fffffff6130)
    at /ccs/home/cwsmith/omegahKk/test/omega_h/src/Omega_h_array.hpp:24
#17 Omega_h::Read<signed char>::~Read (this=0x7fffffff6130)
    at /ccs/home/cwsmith/omegahKk/test/omega_h/src/Omega_h_array.hpp:251
#18 Omega_h::swap3d_qualities_tmpl<1> (mesh=mesh@entry=0x7fffffff7360, opts=..., cands2edges=..., 
    cand_quals=cand_quals@entry=0x7fffffff64b8, cand_configs=cand_configs@entry=0x7fffffff6568)
    at /ccs/home/cwsmith/omegahKk/test/omega_h/src/Omega_h_swap3d_qualities.cpp:58
#19 0x00000000012c0a99 in Omega_h::swap3d_qualities (mesh=mesh@entry=0x7fffffff7360, opts=..., 
    cands2edges=..., cand_quals=cand_quals@entry=0x7fffffff64b8, 
    cand_configs=cand_configs@entry=0x7fffffff6568)
    at /ccs/home/cwsmith/omegahKk/test/omega_h/src/Omega_h_swap3d_qualities.cpp:71
#20 0x00000000012be594 in Omega_h::swap3d_ghosted (mesh=0x7fffffff7360, opts=...)
    at /ccs/home/cwsmith/omegahKk/test/omega_h/src/Omega_h_swap3d.cpp:21
#21 Omega_h::swap_edges_3d (mesh=mesh@entry=0x7fffffff7360, opts=...)
    at /ccs/home/cwsmith/omegahKk/test/omega_h/src/Omega_h_swap3d.cpp:80
#22 0x00000000012b7586 in Omega_h::swap_edges (mesh=mesh@entry=0x7fffffff7360, opts=...)
    at /ccs/home/cwsmith/omegahKk/test/omega_h/src/Omega_h_swap.cpp:49
#23 0x00000000010520d1 in Omega_h::satisfy_quality (mesh=0x7fffffff7360, opts=...)
    at /ccs/home/cwsmith/omegahKk/test/omega_h/src/Omega_h_adapt.cpp:196
#24 Omega_h::snap_and_satisfy_quality (mesh=0x7fffffff7360, opts=...)
    at /ccs/home/cwsmith/omegahKk/test/omega_h/src/Omega_h_adapt.cpp:241
#25 Omega_h::adapt (mesh=mesh@entry=0x7fffffff7360, opts=...)
    at /ccs/home/cwsmith/omegahKk/test/omega_h/src/Omega_h_adapt.cpp:281
#26 0x000000000104bfc1 in main (argc=<optimized out>, argv=<optimized out>)
(ins)--Type <RET> for more, q to quit, c to continue without paging--
   test/omega_h/src/warp_test.cpp:115
(ins)(gdb) 

Error building with nvidia/22.7, cuda 11.7, and kokkos 4.0.01 on Perlmutter

While building the omega_h master branch at the latest commit (e1be29b) on Perlmutter with the following libraries:

  • nvidia 22.7
  • cuda 11.7
  • kokkos 4.0.01

I encountered the following error:

-- Cray Programming Environment 2.7.20 CXX
-- CMAKE_VERSION: 3.22.0
-- Omega_h_VERSION: 10.5.0
-- USE_XSDK_DEFAULTS: OFF
-- BUILD_TESTING: OFF
-- BUILD_SHARED_LIBS: OFF
-- CMAKE_INSTALL_PREFIX: /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/install
-- Omega_h_CHECK_BOUNDS: OFF
-- Omega_h_THROW: OFF
-- Omega_h_DATA: 
-- Omega_h_USE_EGADS: OFF
-- EGADS_PREFIX: 
-- Omega_h_USE_Kokkos: ON
-- Kokkos_PREFIX: /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/kokkos/install/lib64/cmake
-- Omega_h_USE_CUDA_AWARE_MPI: OFF
-- Omega_h_USE_SimModSuite: OFF
-- Omega_h_VALGRIND: 
-- Omega_h_EXAMPLES: OFF
-- Omega_h_USE_MPI: ON
-- Omega_h_ENABLE_DEMANGLED_STACKTRACE: OFF
-- Omega_h_DBG: OFF
-- Cray Programming Environment 2.7.20 C
-- Omega_h_USE_MPI: ON
-- MPI_PREFIX: 
-- Omega_h_USE_ZLIB: ON
-- ZLIB_PREFIX: 
-- Omega_h_USE_Kokkos: ON
-- Kokkos_PREFIX: /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/kokkos/install/lib64/cmake
-- Enabled Kokkos devices: CUDA;SERIAL
-- kokkos_launch_compiler is enabled globally. C++ compiler commands with -DKOKKOS_DEPENDENCE will be redirected to the appropriate compiler for Kokkos
-- Kokkos_CONFIG: /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/kokkos/install/lib64/cmake/Kokkos/KokkosConfig.cmake
-- Kokkos_VERSION: 4.0.1
-- Omega_h_USE_libMeshb: OFF
-- Omega_h_USE_Gmsh: OFF
-- Omega_h_USE_Gmodel: OFF
-- Omega_h_USE_SEACASExodus: OFF
-- Omega_h_USE_pybind11: OFF
-- Omega_h_USE_CUDA: on
-- Omega_h_USE_MPI: ON
-- Omega_h_USE_OpenMP: OFF
-- Omega_h_USE_CUDA: on
-- Omega_h_USE_DOLFIN: OFF
-- Omega_h_SEMVER = 10.5.0-sha.e1be29b0+100101100000000000
-- Configuring done
-- Generating done
-- Build files have been written to: /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build
/global/common/software/nersc/pm-2021q4/sw/cmake-3.22.0/bin/cmake -S/global/homes/z/zhangc20/xgcm/omega_h -B/global/homes/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build --check-build-system CMakeFiles/Makefile.cmake 0
/global/common/software/nersc/pm-2021q4/sw/cmake-3.22.0/bin/cmake -E cmake_progress_start /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build/CMakeFiles /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build//CMakeFiles/progress.marks
make  -f CMakeFiles/Makefile2 all
make[1]: Entering directory '/global/u2/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build'
make  -f src/CMakeFiles/omega_h.dir/build.make src/CMakeFiles/omega_h.dir/depend
make[2]: Entering directory '/global/u2/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build'
cd /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build && /global/common/software/nersc/pm-2021q4/sw/cmake-3.22.0/bin/cmake -E cmake_depends "Unix Makefiles" /global/homes/z/zhangc20/xgcm/omega_h /global/homes/z/zhangc20/xgcm/omega_h/src /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build/src /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build/src/CMakeFiles/omega_h.dir/DependInfo.cmake --color=
make[2]: Leaving directory '/global/u2/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build'
make  -f src/CMakeFiles/omega_h.dir/build.make src/CMakeFiles/omega_h.dir/build
make[2]: Entering directory '/global/u2/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build'
[  1%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_adapt.cpp.o
cd /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build/src && /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/kokkos/install/bin/kokkos_launch_compiler /opt/cray/pe/craype/2.7.20/bin/CC /opt/cray/pe/craype/2.7.20/bin/CC /opt/cray/pe/craype/2.7.20/bin/CC -DKOKKOS_DEPENDENCE -I/global/homes/z/zhangc20/xgcm/omega_h/src -I/global/homes/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build/src -I/global/homes/z/zhangc20/xgcm/omega_h/tpl -isystem /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/kokkos/install/include -isystem /opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7/include -std=c++14 -fast -O3 -DNDEBUG --expt-extended-lambda -gpu=nordc -cuda -gpu=cc80 --c++17 -MD -MT src/CMakeFiles/omega_h.dir/Omega_h_adapt.cpp.o -MF CMakeFiles/omega_h.dir/Omega_h_adapt.cpp.o.d -o CMakeFiles/omega_h.dir/Omega_h_adapt.cpp.o -c /global/homes/z/zhangc20/xgcm/omega_h/src/Omega_h_adapt.cpp
[  2%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_adj.cpp.o
[  2%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_align.cpp.o
cd /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build/src && /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/kokkos/install/bin/kokkos_launch_compiler /opt/cray/pe/craype/2.7.20/bin/CC /opt/cray/pe/craype/2.7.20/bin/CC /opt/cray/pe/craype/2.7.20/bin/CC -DKOKKOS_DEPENDENCE -I/global/homes/z/zhangc20/xgcm/omega_h/src -I/global/homes/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build/src -I/global/homes/z/zhangc20/xgcm/omega_h/tpl -isystem /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/kokkos/install/include -isystem /opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7/include -std=c++14 -fast -O3 -DNDEBUG --expt-extended-lambda -gpu=nordc -cuda -gpu=cc80 --c++17 -MD -MT src/CMakeFiles/omega_h.dir/Omega_h_align.cpp.o -MF CMakeFiles/omega_h.dir/Omega_h_align.cpp.o.d -o CMakeFiles/omega_h.dir/Omega_h_align.cpp.o -c /global/homes/z/zhangc20/xgcm/omega_h/src/Omega_h_align.cpp
cd /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build/src && /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/kokkos/install/bin/kokkos_launch_compiler /opt/cray/pe/craype/2.7.20/bin/CC /opt/cray/pe/craype/2.7.20/bin/CC /opt/cray/pe/craype/2.7.20/bin/CC -DKOKKOS_DEPENDENCE -I/global/homes/z/zhangc20/xgcm/omega_h/src -I/global/homes/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build/src -I/global/homes/z/zhangc20/xgcm/omega_h/tpl -isystem /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/kokkos/install/include -isystem /opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7/include -std=c++14 -fast -O3 -DNDEBUG --expt-extended-lambda -gpu=nordc -cuda -gpu=cc80 --c++17 -MD -MT src/CMakeFiles/omega_h.dir/Omega_h_adj.cpp.o -MF CMakeFiles/omega_h.dir/Omega_h_adj.cpp.o.d -o CMakeFiles/omega_h.dir/Omega_h_adj.cpp.o -c /global/homes/z/zhangc20/xgcm/omega_h/src/Omega_h_adj.cpp
[  4%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_amr.cpp.o
[  4%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_any.cpp.o
cd /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build/src && /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/kokkos/install/bin/kokkos_launch_compiler /opt/cray/pe/craype/2.7.20/bin/CC /opt/cray/pe/craype/2.7.20/bin/CC /opt/cray/pe/craype/2.7.20/bin/CC -DKOKKOS_DEPENDENCE -I/global/homes/z/zhangc20/xgcm/omega_h/src -I/global/homes/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build/src -I/global/homes/z/zhangc20/xgcm/omega_h/tpl -isystem /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/kokkos/install/include -isystem /opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7/include -std=c++14 -fast -O3 -DNDEBUG --expt-extended-lambda -gpu=nordc -cuda -gpu=cc80 --c++17 -MD -MT src/CMakeFiles/omega_h.dir/Omega_h_amr.cpp.o -MF CMakeFiles/omega_h.dir/Omega_h_amr.cpp.o.d -o CMakeFiles/omega_h.dir/Omega_h_amr.cpp.o -c /global/homes/z/zhangc20/xgcm/omega_h/src/Omega_h_amr.cpp
cd /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build/src && /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/kokkos/install/bin/kokkos_launch_compiler /opt/cray/pe/craype/2.7.20/bin/CC /opt/cray/pe/craype/2.7.20/bin/CC /opt/cray/pe/craype/2.7.20/bin/CC -DKOKKOS_DEPENDENCE -I/global/homes/z/zhangc20/xgcm/omega_h/src -I/global/homes/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build/src -I/global/homes/z/zhangc20/xgcm/omega_h/tpl -isystem /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/kokkos/install/include -isystem /opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7/include -std=c++14 -fast -O3 -DNDEBUG --expt-extended-lambda -gpu=nordc -cuda -gpu=cc80 --c++17 -MD -MT src/CMakeFiles/omega_h.dir/Omega_h_any.cpp.o -MF CMakeFiles/omega_h.dir/Omega_h_any.cpp.o.d -o CMakeFiles/omega_h.dir/Omega_h_any.cpp.o -c /global/homes/z/zhangc20/xgcm/omega_h/src/Omega_h_any.cpp
[  5%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_amr_transfer.cpp.o
[  5%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_approach.cpp.o
cd /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build/src && /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/kokkos/install/bin/kokkos_launch_compiler /opt/cray/pe/craype/2.7.20/bin/CC /opt/cray/pe/craype/2.7.20/bin/CC /opt/cray/pe/craype/2.7.20/bin/CC -DKOKKOS_DEPENDENCE -I/global/homes/z/zhangc20/xgcm/omega_h/src -I/global/homes/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build/src -I/global/homes/z/zhangc20/xgcm/omega_h/tpl -isystem /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/kokkos/install/include -isystem /opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7/include -std=c++14 -fast -O3 -DNDEBUG --expt-extended-lambda -gpu=nordc -cuda -gpu=cc80 --c++17 -MD -MT src/CMakeFiles/omega_h.dir/Omega_h_approach.cpp.o -MF CMakeFiles/omega_h.dir/Omega_h_approach.cpp.o.d -o CMakeFiles/omega_h.dir/Omega_h_approach.cpp.o -c /global/homes/z/zhangc20/xgcm/omega_h/src/Omega_h_approach.cpp
cd /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build/src && /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/kokkos/install/bin/kokkos_launch_compiler /opt/cray/pe/craype/2.7.20/bin/CC /opt/cray/pe/craype/2.7.20/bin/CC /opt/cray/pe/craype/2.7.20/bin/CC -DKOKKOS_DEPENDENCE -I/global/homes/z/zhangc20/xgcm/omega_h/src -I/global/homes/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build/src -I/global/homes/z/zhangc20/xgcm/omega_h/tpl -isystem /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/kokkos/install/include -isystem /opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7/include -std=c++14 -fast -O3 -DNDEBUG --expt-extended-lambda -gpu=nordc -cuda -gpu=cc80 --c++17 -MD -MT src/CMakeFiles/omega_h.dir/Omega_h_amr_transfer.cpp.o -MF CMakeFiles/omega_h.dir/Omega_h_amr_transfer.cpp.o.d -o CMakeFiles/omega_h.dir/Omega_h_amr_transfer.cpp.o -c /global/homes/z/zhangc20/xgcm/omega_h/src/Omega_h_amr_transfer.cpp
[  6%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_amr_topology.cpp.o
cd /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build/src && /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/kokkos/install/bin/kokkos_launch_compiler /opt/cray/pe/craype/2.7.20/bin/CC /opt/cray/pe/craype/2.7.20/bin/CC /opt/cray/pe/craype/2.7.20/bin/CC -DKOKKOS_DEPENDENCE -I/global/homes/z/zhangc20/xgcm/omega_h/src -I/global/homes/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build/src -I/global/homes/z/zhangc20/xgcm/omega_h/tpl -isystem /global/homes/z/zhangc20/xgcm/install_nvhpc22.7/kokkos/install/include -isystem /opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7/include -std=c++14 -fast -O3 -DNDEBUG --expt-extended-lambda -gpu=nordc -cuda -gpu=cc80 --c++17 -MD -MT src/CMakeFiles/omega_h.dir/Omega_h_amr_topology.cpp.o -MF CMakeFiles/omega_h.dir/Omega_h_amr_topology.cpp.o.d -o CMakeFiles/omega_h.dir/Omega_h_amr_topology.cpp.o -c /global/homes/z/zhangc20/xgcm/omega_h/src/Omega_h_amr_topology.cpp
nvc++-Error-Unknown switch: --expt-extended-lambda
make[2]: *** [src/CMakeFiles/omega_h.dir/build.make:90: src/CMakeFiles/omega_h.dir/Omega_h_adj.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
nvc++-Error-Unknown switch: --expt-extended-lambda
nvc++-Error-nvc++-Error-Unknown switch: --expt-extended-lambdaUnknown switch: --expt-extended-lambda

nvc++-Error-Unknown switch: --expt-extended-lambda
nvc++-Error-Unknown switch: --expt-extended-lambda
make[2]: *** [src/CMakeFiles/omega_h.dir/build.make:174: src/CMakeFiles/omega_h.dir/Omega_h_approach.cpp.o] Error 1
make[2]: *** [src/CMakeFiles/omega_h.dir/build.make:160: src/CMakeFiles/omega_h.dir/Omega_h_any.cpp.o] Error 1
make[2]: *** [src/CMakeFiles/omega_h.dir/build.make:104: src/CMakeFiles/omega_h.dir/Omega_h_align.cpp.o] Error 1
nvc++-Error-make[2]: *** [src/CMakeFiles/omega_h.dir/build.make:76: src/CMakeFiles/omega_h.dir/Omega_h_adapt.cpp.o] Error 1
Unknown switch: --expt-extended-lambda
make[2]: *** [src/CMakeFiles/omega_h.dir/build.make:132: src/CMakeFiles/omega_h.dir/Omega_h_amr_topology.cpp.o] Error 1
make[2]: *** [src/CMakeFiles/omega_h.dir/build.make:118: src/CMakeFiles/omega_h.dir/Omega_h_amr.cpp.o] Error 1
nvc++-Error-Unknown switch: --expt-extended-lambda
make[2]: *** [src/CMakeFiles/omega_h.dir/build.make:146: src/CMakeFiles/omega_h.dir/Omega_h_amr_transfer.cpp.o] Error 1
make[2]: Leaving directory '/global/u2/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build'
make[1]: *** [CMakeFiles/Makefile2:128: src/CMakeFiles/omega_h.dir/all] Error 2
make[1]: Leaving directory '/global/u2/z/zhangc20/xgcm/install_nvhpc22.7/omega_h/build'
make: *** [Makefile:146: all] Error 2

The configuration and build script:

module load PrgEnv-nvidia
module load nvidia/22.7
module load cudatoolkit/11.7
module load craype-accel-nvidia80
module load cmake/3.22.0
module unload darshan

export cuda=$CRAY_CUDATOOLKIT_DIR
export PATH=$cuda/bin:$PATH
export LD_LIBRARY_PATH=$cuda/lib64:$LD_LIBRARY_PATH
export installroot=$PWD
export kk=$installroot/kokkos/install
export srcroot=$installroot/../
export kksrc=$srcroot/kokkos

export oh=$installroot/omega_h/install
export ohsrc=$srcroot/omega_h

cd $installroot
mkdir -p omega_h/build
cd omega_h/build
cmake $ohsrc -DCMAKE_INSTALL_PREFIX=$oh -DBUILD_SHARED_LIBS=OFF \
             -DCMAKE_BUILD_TYPE=Release \
             -DOmega_h_USE_CUDA=on -DOmega_h_CUDA_ARCH=80 \
             -DOmega_h_USE_Kokkos=ON -DOmega_h_USE_MPI=ON \
             -DCMAKE_CXX_COMPILER=CC \
             -DCMAKE_CXX_FLAGS='-std=c++14' \
             -DCMAKE_CUDA_FLAGS="-I$MPICH_DIR/include" \
             -DKokkos_PREFIX=$kk/lib64/cmake/

make VERBOSE=1 -j8 install

Previously, I was able to build omega_h using gcc 11.2.0, cuda 11.7, and kokkos 4.0.01 on Perlmutter.

make rcField functions private

I'm working through fixing some rcField tests as I merge sandialabs. Is there any reason we shouldn't make all of the rcField functions private methods on the mesh class? If I understand correctly, these functions all exist to maintain an internal cache of stored values of the reverse classification.

If this is correct, then we should make all those methods private and just have the ask_revClass etc. functions be exposed in the namespace.
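
As a rough illustration of the layout being proposed, the sketch below keeps the cache-management code private and exposes only a query at namespace scope. Everything in it (ToyMesh, cached_rev_class, ask_rev_class, times_built) is a hypothetical stand-in and not Omega_h's actual mesh class or rcField interface.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

namespace sketch {

class ToyMesh {
 public:
  // class_ids[e] is the model entity id that mesh entity e is classified on.
  explicit ToyMesh(std::vector<int> class_ids)
      : class_ids_(std::move(class_ids)) {}

  // Exposed only so the caching behavior can be observed in this sketch.
  int times_built() const { return times_built_; }

 private:
  // Cache management is an implementation detail, so it stays private.
  const std::map<int, std::vector<std::size_t>>& cached_rev_class() {
    if (!cache_valid_) {
      rev_class_.clear();
      for (std::size_t e = 0; e < class_ids_.size(); ++e) {
        rev_class_[class_ids_[e]].push_back(e);
      }
      cache_valid_ = true;
      ++times_built_;
    }
    return rev_class_;
  }

  std::vector<int> class_ids_;
  std::map<int, std::vector<std::size_t>> rev_class_;
  bool cache_valid_ = false;
  int times_built_ = 0;

  friend const std::map<int, std::vector<std::size_t>>& ask_rev_class(
      ToyMesh& mesh);
};

// Only the query is visible to users of the namespace.
const std::map<int, std::vector<std::size_t>>& ask_rev_class(ToyMesh& mesh) {
  return mesh.cached_rev_class();
}

}  // namespace sketch
```

With this shape, repeated calls to the query reuse the cached relation, and callers have no way to put the cache into an inconsistent state.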

per process output from --osh-time sorted function is mixed together on stdout

Running with more than one rank and --osh-time produces TOP FUNCTIONS output from every rank, and the per-rank output is interleaved on stdout, making the results hard to read.

It appears that building with -DOmega_h_DBG=on may resolve this as the TASK_0_cout macro gets defined:

# define TASK_0_cout if(DBG_COMM && (0 == DBG_COMM->rank())) std::cout

However, when collecting performance data, particularly from runs using GPUs, enabling that build option may not work and, if it does, may increase runtime. This needs to be tested.

for (auto i : sorted_result) {
  auto cflags( std::cout.flags() );
  double val = i.second[TOP_AVE];
  if (val*100.0/total_runtime_ave >= h.chop) {
    TASK_0_cout << std::right
                << std::setw(width) << val*scale << percent
                << std::setw(width) << i.second[TOP_MIN]*scale << percent
                << std::setw(width) << i.second[TOP_MAX]*scale << percent
                << std::setw(width) << i.second[TOP_MAX]/i.second[TOP_MIN] << " ";
    std::cout.flags(cflags);
    TASK_0_cout << i.first << std::endl;
  }
  std::cout.flags(coutflags);
}
}
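
One way to get rank-0-only output without a debug build is to route the table through a stream that discards writes on every rank except 0. The following is a minimal sketch of that idea, not Omega_h code: `rank0_stream`, `NullBuffer`, and `print_row` are hypothetical names, and the rank is passed in as a plain integer (in a real run it would presumably come from the communicator, as `DBG_COMM->rank()` does in the macro above).

```cpp
#include <cassert>
#include <iostream>
#include <ostream>
#include <sstream>
#include <streambuf>
#include <string>

// Stream buffer that silently discards everything written to it.
class NullBuffer : public std::streambuf {
 protected:
  int_type overflow(int_type c) override { return traits_type::not_eof(c); }
};

// Hypothetical helper (not part of Omega_h): returns `out` on rank 0 and a
// discarding stream on every other rank.
inline std::ostream& rank0_stream(int rank, std::ostream& out = std::cout) {
  assert(rank >= 0);
  static NullBuffer null_buffer;
  static std::ostream null_stream(&null_buffer);
  return (rank == 0) ? out : null_stream;
}

// Hypothetical example: write one row of a timing table through the helper.
inline void print_row(int rank, std::ostream& out, std::string const& name,
                      double ave) {
  rank0_stream(rank, out) << name << " " << ave << "\n";
}
```

Because the decision is made at run time, this does not depend on a build option. Whether the extra stream indirection has a measurable cost in a GPU run would still need to be checked.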

Shape Functions API

Does omega_h explicitly define shape functions, or does it never need them because everything is assumed to be linear? If it does define them, where are they? Is it possible to query a field's value at a given parametric coordinate?
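
To make the question concrete: if fields are linear on each simplex, evaluating one at a parametric coordinate is a barycentric-weighted sum of the vertex values. The sketch below shows this for a triangle. It is a generic illustration, not Omega_h API; `tri_shape_functions` and `interpolate_tri` are hypothetical names.

```cpp
#include <array>
#include <cassert>

// Linear (P1) shape functions on a triangle at parametric point (xi, eta):
// N0 = 1 - xi - eta, N1 = xi, N2 = eta.
inline std::array<double, 3> tri_shape_functions(double xi, double eta) {
  return {1.0 - xi - eta, xi, eta};
}

// Hypothetical helper: value of a linear field at (xi, eta), given the field
// values at the three vertices of the triangle.
inline double interpolate_tri(std::array<double, 3> const& vertex_values,
                              double xi, double eta) {
  auto const n = tri_shape_functions(xi, eta);
  return n[0] * vertex_values[0] + n[1] * vertex_values[1] +
         n[2] * vertex_values[2];
}
```

For example, with vertex values 1, 2, and 3, the midpoint of the edge opposite vertex 0, (xi, eta) = (0.5, 0.5), evaluates to 2.5.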

Warning message configuring on Summit with CMake 3.23.2, gcc 11.2.0, and Cuda 11.7.1

While building on Summit with CMake 3.23.2, gcc 11.2.0, and Cuda 11.7.1, CMake configuration produced the following warning message:

-- CMAKE_VERSION: 3.23.2
-- Omega_h_VERSION: 10.5.0
-- USE_XSDK_DEFAULTS: OFF
-- BUILD_TESTING: OFF
-- BUILD_SHARED_LIBS: OFF
-- CMAKE_INSTALL_PREFIX: /ccs/home/zhangc20/pumi-pic/install_cuda11.7.1/omega_h/install
-- Omega_h_CHECK_BOUNDS: OFF
-- Omega_h_THROW: OFF
-- Omega_h_DATA: 
-- Omega_h_USE_EGADS: OFF
-- EGADS_PREFIX: 
-- Omega_h_USE_Kokkos: ON
-- Kokkos_PREFIX: /ccs/home/zhangc20/pumi-pic/install_cuda11.7.1/kokkos/install/lib64/cmake
-- Omega_h_USE_CUDA_AWARE_MPI: OFF
-- Omega_h_USE_SimModSuite: OFF
-- Omega_h_VALGRIND: 
-- Omega_h_EXAMPLES: OFF
-- Omega_h_USE_MPI: on
-- Omega_h_ENABLE_DEMANGLED_STACKTRACE: OFF
-- Omega_h_DBG: OFF
-- Omega_h_USE_MPI: on
-- MPI_PREFIX: 
-- Omega_h_USE_ZLIB: ON
-- ZLIB_PREFIX: 
-- Omega_h_USE_Kokkos: ON
-- Kokkos_PREFIX: /ccs/home/zhangc20/pumi-pic/install_cuda11.7.1/kokkos/install/lib64/cmake
-- Enabled Kokkos devices: CUDA;SERIAL
-- kokkos_launch_compiler is enabled globally. C++ compiler commands with -DKOKKOS_DEPENDENCE will be redirected to the appropriate compiler for Kokkos
-- Kokkos_CONFIG: /ccs/home/zhangc20/pumi-pic/install_cuda11.7.1/kokkos/install/lib64/cmake/Kokkos/KokkosConfig.cmake
-- Kokkos_VERSION: 3.7.01
-- Omega_h_USE_libMeshb: OFF
-- Omega_h_USE_Gmsh: OFF
-- Omega_h_USE_Gmodel: OFF
-- Omega_h_USE_SEACASExodus: OFF
-- Omega_h_USE_pybind11: OFF
-- Omega_h_USE_CUDA: on
-- Omega_h_USE_MPI: on
-- Omega_h_USE_OpenMP: OFF
-- Omega_h_USE_CUDA: on
-- Omega_h_USE_DOLFIN: OFF
-- Omega_h_SEMVER = 10.5.0-sha.e1be29b0+100101100000000000
-- Configuring done
CMake Warning at src/CMakeLists.txt:252 (add_executable):
  Cannot generate a safe runtime search path for target msh2osh because files
  in some directories may conflict with libraries in implicit directories:

    runtime library [libcuda.so.1] in /usr/lib64 may be hidden by files in:
      /sw/summit/cuda/11.7.1/lib64

  Some of these libraries may not be found correctly.
Call Stack (most recent call first):
  src/CMakeLists.txt:261 (osh_add_exe)
  src/CMakeLists.txt:265 (osh_add_util)


CMake Warning at src/CMakeLists.txt:252 (add_executable):
  Cannot generate a safe runtime search path for target osh2vtk because files
  in some directories may conflict with libraries in implicit directories:

    runtime library [libcuda.so.1] in /usr/lib64 may be hidden by files in:
      /sw/summit/cuda/11.7.1/lib64

  Some of these libraries may not be found correctly.
Call Stack (most recent call first):
  src/CMakeLists.txt:261 (osh_add_exe)
  src/CMakeLists.txt:266 (osh_add_util)


CMake Warning at src/CMakeLists.txt:252 (add_executable):
  Cannot generate a safe runtime search path for target oshdiff because files
  in some directories may conflict with libraries in implicit directories:

    runtime library [libcuda.so.1] in /usr/lib64 may be hidden by files in:
      /sw/summit/cuda/11.7.1/lib64

  Some of these libraries may not be found correctly.
Call Stack (most recent call first):
  src/CMakeLists.txt:261 (osh_add_exe)
  src/CMakeLists.txt:267 (osh_add_util)

.....

-- Generating done
-- Build files have been written to: /ccs/home/zhangc20/pumi-pic/install_cuda11.7.1/omega_h/build

Note: the output above is truncated. The remaining warnings are identical except for the target name and the line number in the last call-stack entry, for example:

  src/CMakeLists.txt:267 (osh_add_util)

The build was working fine. Reporting here in case it's an issue.

cuda aware run of deltawing case fails on perlmutter

environment

$ module li

Currently Loaded Modules:
  1) craype-x86-milan     3) craype-network-ofi                      5) PrgEnv-gnu/8.5.0   7) cray-libsci/23.12.5   9) craype/2.7.30    11) perftools-base/23.12.0  13) cudatoolkit/12.2       15) gpu/1.0
  2) libfabric/1.15.2.0   4) xpmem/2.6.2-2.5_2.38__gd067c3f.shasta   6) cray-dsmml/0.2.2   8) cray-mpich/8.1.28    10) gcc-native/12.3  12) cpe/23.12               14) craype-accel-nvidia80

versions

  • Omega_h: scorec/omega_h master @ 7a39707
  • Kokkos: kokkos/kokkos master @ e0dc0128e

build

$ cat doConfigPerlKk.sh 
bdir=$PWD/build-kokkos
cmake -S kokkos -B $bdir \
  -DCMAKE_BUILD_TYPE=Release \
  -DBUILD_SHARED_LIBS=ON \
  -DCRAYPE_LINK_TYPE=dynamic \
  -DCMAKE_CXX_COMPILER=$PWD/kokkos/bin/nvcc_wrapper \
  -DKokkos_ARCH_AMPERE80=ON \
  -DKokkos_ENABLE_SERIAL=ON \
  -DKokkos_ENABLE_OPENMP=off \
  -DKokkos_ENABLE_CUDA=on \
  -DKokkos_ENABLE_CUDA_LAMBDA=on \
  -DKokkos_ENABLE_DEBUG=off \
  -DCMAKE_INSTALL_PREFIX=$bdir/install
$ cat doConfigPerlOmegah.sh 
#!/bin/bash -ex

usage="Usage: $0  <mpi=on|off> <cudaAware=on|off>"
[[ $# -ne 2 ]] && echo $usage && exit 1

mpi=$1
[[ $mpi != "on" && $mpi != "off" ]] && echo $usage && exit 1

cudaAware=$2
[[ $cudaAware != "on" && $cudaAware != "off" ]] && echo $usage && exit 1

bdir=$PWD/build-omegah-mpi${mpi}-cudaAware${cudaAware}
cmake -S omega_h -B $bdir \
  -DCMAKE_INSTALL_PREFIX=$bdir/install \
  -DCMAKE_BUILD_TYPE=Release \
  -DBUILD_SHARED_LIBS=on \
  -DOmega_h_USE_Kokkos=on \
  -DOmega_h_CUDA_ARCH=80 \
  -DOmega_h_USE_MPI=$mpi \
  -DOmega_h_USE_CUDA_AWARE_MPI=$cudaAware \
  -DBUILD_TESTING=on \
  -DCMAKE_CXX_COMPILER=CC

run

Download the Omega_h delta wing meshes: https://zenodo.org/records/10672130

$ cat submitP2.sh
sbatch --nodes 1 --qos regular --time 00:10:00 --constraint gpu --gpus 4 --account=PROJECT_NAME ./runP2.sh
$ cat runP2.sh
#!/bin/bash
bin_cudaAwareOff=/pscratch/sd/c/cwsmith/omegahDeltaWingAdapt/twoGpus/build-omegah-mpion-cudaAwareoff/src
bin_cudaAwareOn=/pscratch/sd/c/cwsmith/omegahDeltaWingAdapt/twoGpus/build-omegah-mpion-cudaAwareon/src
mesh=/pscratch/sd/c/cwsmith/omegahDeltaWingAdapt/twoGpus/deltaWing_500kMetric_p2.osh

cmd="$bin_cudaAwareOff/ugawg_hsc_oshmeshload --osh-pool $mesh"
export MPICH_GPU_SUPPORT_ENABLED=0
set -x
srun -n 2 $cmd &> log2p_cudaAwareOff
set +x

cmd="$bin_cudaAwareOn/ugawg_hsc_oshmeshload --osh-pool $mesh"
export MPICH_GPU_SUPPORT_ENABLED=1
set -x
srun -n 2 $cmd &> log2p_cudaAwareOn
set +x

error

$ cat log2p_cudaAwareOn
(GTL DEBUG: 0) cuIpcGetMemHandle: invalid argument, CUDA_ERROR_INVALID_VALUE, line no 148
MPICH ERROR [Rank 0] [job id 22622708.1] [Wed Mar  6 07:48:56 2024] [nid002241] - Abort(606713346) (rank 0 in comm 0): Fatal error in PMPI_Isend: Invalid count, error stack:
PMPI_Isend(161)......................: MPI_Isend(buf=0x623196f88, count=2382, MPI_INT, dest=1, tag=42, comm=0xc4000000, request=0x23c3f34) failed
MPID_Isend(584)......................: 
MPIDI_isend_unsafe(136)..............: 
MPIDI_SHM_mpi_isend(323).............: 
MPIDI_CRAY_Common_lmt_isend(84)......: 
MPIDI_CRAY_Common_lmt_export_mem(103): 
(unknown)(): Invalid count

aborting job:
Fatal error in PMPI_Isend: Invalid count, error stack:
PMPI_Isend(161)......................: MPI_Isend(buf=0x623196f88, count=2382, MPI_INT, dest=1, tag=42, comm=0xc4000000, request=0x23c3f34) failed
MPID_Isend(584)......................: 
MPIDI_isend_unsafe(136)..............: 
MPIDI_SHM_mpi_isend(323).............: 
MPIDI_CRAY_Common_lmt_isend(84)......: 
MPIDI_CRAY_Common_lmt_export_mem(103): 
(unknown)(): Invalid count
Kokkos::Cuda ERROR: Failed to call Kokkos::Cuda::finalize()
srun: error: nid002241: task 0: Exited with exit code 255
srun: Terminating StepId=22622708.1
slurmstepd: error: *** STEP 22622708.1 ON nid002241 CANCELLED AT 2024-03-06T15:48:58 ***
srun: error: nid002241: task 1: Terminated
srun: Force Terminated StepId=22622708.1

describe: provide more tag info

In the describe tool, we should add the following info on each tag:

  • number of entities with a given value (this would require additional user input: tag name and value)
  • min and max across all processes
  • type (int, bool, float, real, etc.)
  • change ‘size per entity’ → ‘number of components’

get/set tags with Topo_type vs dimension (integer)

Storing and retrieving tags by topology type versus by dimension does not work as one would expect.

Tags are stored and queried according to how they were added: either by topology type or by dimension.

For example, requesting a tag on dimension 3 (regions) will not return tags that were set on tets, hexes, etc.
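
The behavior one might expect instead can be sketched with a small stand-alone registry in which a query by dimension also returns tags that were added on any topology type of that dimension. Everything below is hypothetical and for illustration only: `TopoType`, `dim_of`, and `TagRegistry` are not Omega_h types, and the registry tracks tag names only.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical topology types, loosely modeled on a mixed mesh.
enum class TopoType {
  Vertex, Edge, Triangle, Quadrilateral,
  Tetrahedron, Hexahedron, Wedge, Pyramid
};

// Dimension of each topology type.
inline int dim_of(TopoType t) {
  switch (t) {
    case TopoType::Vertex: return 0;
    case TopoType::Edge: return 1;
    case TopoType::Triangle:
    case TopoType::Quadrilateral: return 2;
    case TopoType::Tetrahedron:
    case TopoType::Hexahedron:
    case TopoType::Wedge:
    case TopoType::Pyramid: return 3;
  }
  return -1;
}

// Hypothetical registry that records tag names by dimension and by type.
class TagRegistry {
 public:
  void add_by_dim(int dim, std::string name) {
    by_dim_[dim].push_back(std::move(name));
  }
  void add_by_type(TopoType t, std::string name) {
    by_type_[t].push_back(std::move(name));
  }
  // Tags added by dimension, followed by tags added on any topology type
  // that has this dimension.
  std::vector<std::string> tags_on_dim(int dim) const {
    std::vector<std::string> out;
    auto it = by_dim_.find(dim);
    if (it != by_dim_.end()) out = it->second;
    for (auto const& kv : by_type_) {
      if (dim_of(kv.first) == dim) {
        out.insert(out.end(), kv.second.begin(), kv.second.end());
      }
    }
    return out;
  }

 private:
  std::map<int, std::vector<std::string>> by_dim_;
  std::map<TopoType, std::vector<std::string>> by_type_;
};
```

With this layout, a tag added on `TopoType::Tetrahedron` shows up in `tags_on_dim(3)` alongside tags added directly on dimension 3. Whether such unification is desirable for the real mesh class is a separate design question.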

Error building with "-DCMAKE_BUILD_TYPE=Release" on Summit

On Summit, adding -DCMAKE_BUILD_TYPE=Release to the build script causes an error at the install stage:

[100%] Built target ascii_vtk2osh
Install the project...
-- Install configuration: "Release"
CMake Error at cmake_install.cmake:58 (file):
  file cannot create directory: /usr/local/lib/cmake/Omega_h.  Maybe need
  administrative privileges.


make: *** [install] Error 1

The build script is as below:

module load gcc/7.4.0
module load cuda/10.1.243
module load cmake

export cuda=$CUDA_DIR
export PATH=$cuda/bin:$PATH
export LD_LIBRARY_PATH=$cuda/lib64:$LD_LIBRARY_PATH
export installroot=$PWD
export kk=$installroot/kokkos/install # this is where kokkos will be installed
export srcroot=$installroot/../
export kksrc=$srcroot/kokkos

export OMPI_CXX=$kksrc/bin/nvcc_wrapper
export oh=$installroot/omega_h/install # this is where omega_h will be installed
export ohsrc=$srcroot/omega_h

cd $installroot
mkdir -p omega_h/build
cd omega_h/build
cmake $ohsrc -DCMAKE_BUILD_TYPE=Release
             -DCMAKE_INSTALL_PREFIX=$oh -DBUILD_SHARED_LIBS=ON \
             -DOmega_h_USE_CUDA=on -DOmega_h_CUDA_ARCH=70 \
             -DOmega_h_USE_Kokkos=ON -DOmega_h_USE_MPI=on \
             -DCMAKE_CXX_COMPILER=mpiCC \
             -DKokkos_PREFIX=$kk/lib64/cmake/

make -j4 install

After removing -DCMAKE_BUILD_TYPE=Release, everything works fine. What is the cause?

build error ('undefined hidden symbol') using amdclang compilers and no optimizations on frontier

Building with the AMD compilers and no optimization flags fails at device link time with 'undefined hidden symbol' errors (log below). The build follows these instructions:

https://github.com/SCOREC/omega_h/wiki/Build-and-Run-on-OLCF-Frontier#build-with-mpi-enabled-using-cray-compiler-wrappers-and-amd-compilers

Adding -O1 or -O2 results in a successful build.

+ /opt/cray/pe/craype/2.7.19/bin/CC -DKOKKOS_DEPENDENCE -D__HIP_ROCclr__ -I/ccs/home/cwsmith/omegahKk/omega_h/src -I/ccs/home/cwsmith/omegahKk/buildOmegahVega90a_amd_RelWithDebInfo_MPIon/src -I/ccs/home/cwsmith/omegahKk/omega_h/tpl -isystem /ccs/home/cwsmith/omegahKk/buildKokkosVega90a_RelWithDebInfo_amd/install/include -g -fno-gpu-rdc -x hip --rocm-path=/opt/rocm-5.3.0 --offload-arch=gfx90a -std=c++17 -MD -MT src/CMakeFiles/omega_h.dir/Omega_h_adj.cpp.o -MF CMakeFiles/omega_h.dir/Omega_h_adj.cpp.o.d -o CMakeFiles/omega_h.dir/Omega_h_adj.cpp.o -c /ccs/home/cwsmith/omegahKk/omega_h/src/Omega_h_adj.cpp
In file included from /ccs/home/cwsmith/omegahKk/omega_h/src/Omega_h_adj.cpp:3:
/ccs/home/cwsmith/omegahKk/omega_h/src/Omega_h_align.hpp:132:1: warning: non-void function does not return a value in all control paths [-Wreturn-type]
}
^
1 warning generated when compiling for gfx90a.
lld: error: undefined hidden symbol: Omega_h::Read<int>::operator[](int) const
>>> referenced by Omega_h_adj.cpp:29 (/ccs/home/cwsmith/omegahKk/omega_h/src/Omega_h_adj.cpp:29)
>>>               /tmp/Omega_h_adj-26bbe5/Omega_h_adj-gfx90a.o:(Omega_h::unmap_adjacency(Omega_h::Read<int>, Omega_h::Adj)::'lambda'(int)::operator()(int) const)
>>> referenced by Omega_h_adj.cpp:29 (/ccs/home/cwsmith/omegahKk/omega_h/src/Omega_h_adj.cpp:29)
>>>               /tmp/Omega_h_adj-26bbe5/Omega_h_adj-gfx90a.o:(Omega_h::unmap_adjacency(Omega_h::Read<int>, Omega_h::Adj)::'lambda'(int)::operator()(int) const)
>>> referenced by Omega_h_adj.cpp:31 (/ccs/home/cwsmith/omegahKk/omega_h/src/Omega_h_adj.cpp:31)
>>>               /tmp/Omega_h_adj-26bbe5/Omega_h_adj-gfx90a.o:(Omega_h::unmap_adjacency(Omega_h::Read<int>, Omega_h::Adj)::'lambda'(int)::operator()(int) const)
>>> referenced 99 more times

lld: error: undefined hidden symbol: Omega_h::Write<int>::operator[](int) const
>>> referenced by Omega_h_adj.cpp:32 (/ccs/home/cwsmith/omegahKk/omega_h/src/Omega_h_adj.cpp:32)
>>>               /tmp/Omega_h_adj-26bbe5/Omega_h_adj-gfx90a.o:(Omega_h::unmap_adjacency(Omega_h::Read<int>, Omega_h::Adj)::'lambda'(int)::operator()(int) const)
>>> referenced by Omega_h_adj.cpp:32 (/ccs/home/cwsmith/omegahKk/omega_h/src/Omega_h_adj.cpp:32)
>>>               /tmp/Omega_h_adj-26bbe5/Omega_h_adj-gfx90a.o:(Omega_h::unmap_adjacency(Omega_h::Read<int>, Omega_h::Adj)::'lambda'(int)::operator()(int) const)
>>> referenced by Omega_h_adj.cpp:170 (/ccs/home/cwsmith/omegahKk/omega_h/src/Omega_h_adj.cpp:170)
>>>               /tmp/Omega_h_adj-26bbe5/Omega_h_adj-gfx90a.o:(Omega_h::form_uses(Omega_h::Read<int>, Omega_h_Family, int, int)::'lambda'(int)::operator()(int) const)
>>> referenced 37 more times

lld: error: undefined hidden symbol: Omega_h::Read<signed char>::operator[](int) const
>>> referenced by Omega_h_adj.cpp:33 (/ccs/home/cwsmith/omegahKk/omega_h/src/Omega_h_adj.cpp:33)
>>>               /tmp/Omega_h_adj-26bbe5/Omega_h_adj-gfx90a.o:(Omega_h::unmap_adjacency(Omega_h::Read<int>, Omega_h::Adj)::'lambda'(int)::operator()(int) const)
>>> referenced by Omega_h_adj.cpp:33 (/ccs/home/cwsmith/omegahKk/omega_h/src/Omega_h_adj.cpp:33)
>>>               /tmp/Omega_h_adj-26bbe5/Omega_h_adj-gfx90a.o:(Omega_h::unmap_adjacency(Omega_h::Read<int>, Omega_h::Adj)::'lambda'(int)::operator()(int) const)
>>> referenced by Omega_h_adj.cpp:212 (/ccs/home/cwsmith/omegahKk/omega_h/src/Omega_h_adj.cpp:212)
>>>               /tmp/Omega_h_adj-26bbe5/Omega_h_adj-gfx90a.o:(Omega_h::separate_upward_with_codes(int, Omega_h::Read<int>, int, Omega_h::Write<int>, Omega_h::Read<signed char>, Omega_h::Write<signed char>)::'lambda'(int)::operator()(int) const)
>>> referenced 29 more times

lld: error: undefined hidden symbol: Omega_h::Write<signed char>::operator[](int) const
>>> referenced by Omega_h_adj.cpp:33 (/ccs/home/cwsmith/omegahKk/omega_h/src/Omega_h_adj.cpp:33)
>>>               /tmp/Omega_h_adj-26bbe5/Omega_h_adj-gfx90a.o:(Omega_h::unmap_adjacency(Omega_h::Read<int>, Omega_h::Adj)::'lambda'(int)::operator()(int) const)
>>> referenced by Omega_h_adj.cpp:33 (/ccs/home/cwsmith/omegahKk/omega_h/src/Omega_h_adj.cpp:33)
>>>               /tmp/Omega_h_adj-26bbe5/Omega_h_adj-gfx90a.o:(Omega_h::unmap_adjacency(Omega_h::Read<int>, Omega_h::Adj)::'lambda'(int)::operator()(int) const)
>>> referenced by Omega_h_adj.cpp:127 (/ccs/home/cwsmith/omegahKk/omega_h/src/Omega_h_adj.cpp:127)
>>>               /tmp/Omega_h_adj-26bbe5/Omega_h_adj-gfx90a.o:(Omega_h::find_canonical_jumps(int, Omega_h::Read<int>, Omega_h::Read<int>)::'lambda'(int)::operator()(int) const)
>>> referenced 37 more times

lld: error: undefined hidden symbol: Omega_h::Read<long>::operator[](int) const
>>> referenced by Omega_h_adj.cpp:78 (/ccs/home/cwsmith/omegahKk/omega_h/src/Omega_h_adj.cpp:78)
>>>               /tmp/Omega_h_adj-26bbe5/Omega_h_adj-gfx90a.o:(Omega_h::Read<signed char> Omega_h::get_codes_to_canonical_deg<4, long>(Omega_h::Read<long>)::'lambda'(int)::operator()(int) const)
>>> referenced by Omega_h_adj.cpp:78 (/ccs/home/cwsmith/omegahKk/omega_h/src/Omega_h_adj.cpp:78)
>>>               /tmp/Omega_h_adj-26bbe5/Omega_h_adj-gfx90a.o:(Omega_h::Read<signed char> Omega_h::get_codes_to_canonical_deg<4, long>(Omega_h::Read<long>)::'lambda'(int)::operator()(int) const)
>>> referenced by Omega_h_adj.cpp:81 (/ccs/home/cwsmith/omegahKk/omega_h/src/Omega_h_adj.cpp:81)
>>>               /tmp/Omega_h_adj-26bbe5/Omega_h_adj-gfx90a.o:(Omega_h::Read<signed char> Omega_h::get_codes_to_canonical_deg<4, long>(Omega_h::Read<long>)::'lambda'(int)::operator()(int) const)
>>> referenced 35 more times
clang-15: error: amdgcn-link command failed with exit code 1 (use -v to see invocation)

Number of components in a mesh tag is more than INT8_MAX

Copied from the XGCm issue tracker, since this is a better place for it.

Adding a mesh tag with more than INT8_MAX (127) components fails. This is a problem because tags that wide can occur often.

Judging from the assert (OMEGA_H_CHECK) statements below, INT8_MAX appears to be a hard-coded upper limit on the number of components. As I understand it, changing the limit may not be simple, because it could affect many other operations:

template <typename T>
void Mesh::add_tag(Int ent_dim, std::string const& name, Int ncomps,
    Read<T> array, bool internal) {
  check_dim2(ent_dim);
  auto it = tag_iter(ent_dim, name);
  auto had_tag = (it != tags_[ent_dim].end());
  Tag<T>* tag;
  if (had_tag) {
    tag = as<T>(it->get());
    OMEGA_H_CHECK(ncomps == tag->ncomps());
  } else {
    check_tag_name(name);
    OMEGA_H_CHECK(ncomps >= 0);
    OMEGA_H_CHECK(ncomps <= Int(INT8_MAX));
    OMEGA_H_CHECK(tags_[ent_dim].size() < size_t(INT8_MAX));
    tag = new Tag<T>(name, ncomps);
    TagPtr ptr(tag);
    tags_[ent_dim].push_back(std::move(ptr));
  }
  /* internal typically indicates migration/adaptation/file reading,
     when we do not want any invalidation to take place.
     the invalidation is there to prevent users changing coordinates
     etc. without updating dependent fields */
  if (!internal) react_to_set_tag(ent_dim, name);
  tag->set_array(array);
}
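
Until the limit is raised, one possible workaround on the caller's side is to split a wide field into several tags that each satisfy `ncomps <= INT8_MAX`. The sketch below only computes the split. `TagChunk` and `split_components` are hypothetical names, and how the chunks are named and stored as separate tags is left to the caller.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One contiguous range of components that fits in a single tag.
struct TagChunk {
  int first_comp;  // index of the first component in the original field
  int ncomps;      // number of components in this chunk
};

// Hypothetical helper: split `total_ncomps` components into chunks of at
// most `max_ncomps` components each (INT8_MAX, i.e. 127, by default).
inline std::vector<TagChunk> split_components(int total_ncomps,
                                              int max_ncomps = INT8_MAX) {
  assert(total_ncomps >= 0);
  assert(max_ncomps > 0);
  std::vector<TagChunk> chunks;
  for (int first = 0; first < total_ncomps; first += max_ncomps) {
    chunks.push_back({first, std::min(max_ncomps, total_ncomps - first)});
  }
  return chunks;
}
```

For a field with 300 components this yields three chunks of 127, 127, and 46 components. Note that the check above also caps the number of tags per entity dimension below INT8_MAX, so splitting many wide fields can run into that second limit.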

warp_test fails in mpi on frontier with reducedThrust branch

warp_test crashes with a bus error (SIGBUS) when built and run following these instructions:

https://github.com/SCOREC/omega_h/wiki/Build-and-Run-on-OLCF-Frontier#build-with-mpi-enabled-using-cray-compiler-wrappers-and-amd-compilers

core was generated by `src/warp_test'.
Program terminated with signal SIGBUS, Bus error.

warning: Section `.reg-xstate/76107' in core file too small.
#0  0x00007fffe817ed4f in __memmove_avx_unaligned_erms () from /lib64/libc.so.6
[Current thread is 1 (Thread 0x7fffed81a300 (LWP 76107))]
Missing separate debuginfos, use: zypper install glibc-debuginfo-2.31-150300.41.1.x86_64 krb5-debuginfo-1.19.2-150400.3.3.1.x86_64 libbrotlicommon1-debuginfo-1.0.7-3.3.1.x86_64 libbrotlidec1-debuginfo-1.0.7-3.3.1.x86_64 libcom_err2-debuginfo-1.46.4-150400.3.3.1.x86_64 libcurl4-debuginfo-7.79.1-150400.5.15.1.x86_64 libdrm2-debuginfo-2.4.107-150400.1.8.x86_64 libdrm_amdgpu1-debuginfo-2.4.107-150400.1.8.x86_64 libelf1-debuginfo-0.185-150400.5.3.1.x86_64 libgcc_s1-debuginfo-12.2.1+git416-150000.1.5.1.x86_64 libidn2-0-debuginfo-2.2.0-3.6.1.x86_64 libjson-c3-debuginfo-0.13-3.3.1.x86_64 libkeyutils1-debuginfo-1.6.3-5.6.1.x86_64 libldap-2_4-2-debuginfo-2.4.46-150200.14.11.2.x86_64 libncurses6-debuginfo-6.1-150000.5.12.1.x86_64 libnghttp2-14-debuginfo-1.40.0-6.1.x86_64 libnl3-200-debuginfo-3.3.0-1.29.x86_64 libnuma1-debuginfo-2.0.14.20.g4ee5e0c-150400.1.24.x86_64 libopenssl1_1-debuginfo-1.1.1l-150400.7.22.1.x86_64 libpcre1-debuginfo-8.45-150000.20.13.1.x86_64 libpsl5-debuginfo-0.20.1-150000.3.3.1.x86_64 libselinux1-debuginfo-3.1-150400.1.69.x86_64 libssh4-debuginfo-0.9.6-150400.1.5.x86_64 libstdc++6-debuginfo-12.2.1+git416-150000.1.5.1.x86_64 libunistring2-debuginfo-0.9.10-1.1.x86_64 libyaml-0-2-debuginfo-0.1.7-1.17.x86_64 libz1-debuginfo-1.2.11-150000.3.39.1.x86_64 libzstd1-debuginfo-1.5.0-150400.1.71.x86_64
(ins)(gdb) where
#0  0x00007fffe817ed4f in __memmove_avx_unaligned_erms () from /lib64/libc.so.6
#1  0x00007fffe9729b6c in MPIR_Localcopy () from /opt/cray/pe/lib64/libmpi_amd.so.12
#2  0x00007fffeb479223 in MPIDI_CRAY_Common_lmt_unpack () from /opt/cray/pe/lib64/libmpi_amd.so.12
#3  0x00007fffeb498a08 in MPIDI_CRAY_Common_lmt_ctrl_send_rts_cb () from /opt/cray/pe/lib64/libmpi_amd.so.12
#4  0x00007fffeb4716c8 in MPIDI_SHMI_progress () from /opt/cray/pe/lib64/libmpi_amd.so.12
#5  0x00007fffe9f6b7e9 in MPIR_Waitall_impl () from /opt/cray/pe/lib64/libmpi_amd.so.12
#6  0x00007fffe9fd19b1 in MPIR_Waitall () from /opt/cray/pe/lib64/libmpi_amd.so.12
#7  0x00007fffe9fd2eae in PMPI_Waitall () from /opt/cray/pe/lib64/libmpi_amd.so.12
#8  0x000000000114cb9d in Omega_h::Comm::alltoallv<int> (this=0x1b04f80, sendbuf_dev=..., sdispls_dev=..., rdispls_dev=..., width=1) at /ccs/home/cwsmith/omegahKk/test/omega_h/src/Omega_h_comm.cpp:557
#9  0x000000000117989c in Omega_h::Dist::exch<int> (this=this@entry=0x7fffffff6538, data=..., width=width@entry=1) at /ccs/home/cwsmith/omegahKk/test/omega_h/src/Omega_h_dist.cpp:118
#10 0x0000000001174221 in Omega_h::Dist::set_dest_idxs (this=this@entry=0x7fffffff6538, fitems2rroots=..., nrroots=nrroots@entry=3000) at /ccs/home/cwsmith/omegahKk/test/omega_h/src/Omega_h_dist.cpp:78
#11 0x00000000011734bb in Omega_h::Dist::Dist (this=0x7fffffff6538, comm_in=..., fitems2rroots=..., nrroots=3000) at /ccs/home/cwsmith/omegahKk/test/omega_h/src/Omega_h_dist.cpp:23
#12 0x0000000001199098 in Omega_h::bi_partition (comm=..., marks=...) at /ccs/home/cwsmith/omegahKk/test/omega_h/src/Omega_h_bipart.cpp:32
#13 0x0000000001193657 in Omega_h::inertia::recursively_bisect (comm=..., tolerance=<error reading variable: That operation is not available on integers of more than 8 bytes.>, 
    p_coords=p_coords@entry=0x7fffffff6658, p_masses=p_masses@entry=0x7fffffff6640, p_owners=p_owners@entry=0x7fffffff6670, p_hints=p_hints@entry=0x7fffffff66d0)
    at /ccs/home/cwsmith/omegahKk/test/omega_h/src/Omega_h_inertia.cpp:181
#14 0x00000000011e0665 in Omega_h::Mesh::balance (this=0x7fffffff70c0, predictive=<optimized out>) at /ccs/home/cwsmith/omegahKk/test/omega_h/src/Omega_h_mesh.cpp:560
#15 0x00000000010f23b8 in Omega_h::build_box (comm=..., family=family@entry=OMEGA_H_SIMPLEX, x=<error reading variable: That operation is not available on integers of more than 8 bytes.>, 
    y=<error reading variable: That operation is not available on integers of more than 8 bytes.>, z=<error reading variable: That operation is not available on integers of more than 8 bytes.>, nx=nx@entry=10, 
    ny=ny@entry=10, nz=nz@entry=10, symmetric=<optimized out>) at /ccs/home/cwsmith/omegahKk/test/omega_h/src/Omega_h_build.cpp:147
#16 0x000000000104af37 in main (argc=<optimized out>, argv=<optimized out>) at /ccs/home/cwsmith/omegahKk/test/omega_h/src/warp_test.cpp:71
(ins)(gdb) 
