Comments (12)
Update: We talked to VTK-m folks and we think this is fixed in VTK-m 2.0.
Here is the relevant MR: https://gitlab.kitware.com/vtk/vtk-m/-/merge_requests/2880
Ascent's 0.9.2 release will support VTK-m 2.0.
VTK-m 2.0 includes API-breaking changes, so you can't try it with Ascent out of the box.
But we hope to release 0.9.2 soon after 0.9.1 (which is imminent and focused on Windows support).
from ascent.
I built Ascent with:
module load PrgEnv-cray cce/14.0.2 craype-accel-amd-gfx90a cmake hdf5 cray-python rocm/5.2.0 cray-mpich/8.1.23
export MPICH_GPU_SUPPORT_ENABLED=1
export ROCM_ARCH=gfx90a
export CC=$(which cc)
export CXX=$(which CC)
export FTN=$(which ftn)
export CFLAGS="-I${ROCM_PATH}/include"
export CXXFLAGS="-I${ROCM_PATH}/include -Wno-pass-failed"
export LDFLAGS="-L${ROCM_PATH}/lib -lamdhip64"
export HIPFLAGS="-I/opt/cray/pe/mpich/default/ofi/rocm-compiler/5.0/include/"
env enable_mpi=ON enable_find_mpi=OFF ./build_ascent_hip.sh
I don't believe you did anything wrong. This is reminiscent of another error that needed "cmake_policy(SET CMP0054 NEW)" added to the beginning of this file: kokkos-3.6.01/install/lib64/cmake/Kokkos/KokkosConfigCommon.cmake
Unfortunately, I didn't see the actual error; that was a different policy, and it happened while building Ascent rather than building against it, so it's not necessarily helpful.
I can start trying to replicate this on Frontier. This issue may need to be raised with the Kokkos team.
OK, thanks. I should add that our code uses Kokkos itself, so this might be a version mismatch between the Kokkos used by our code and the Kokkos used by Ascent and its dependencies. Is the build process designed to handle such a case?
FYI: I can get it to configure and build if I explicitly point our code at Ascent's Kokkos build and link against it.
At runtime, I get a warning (for each MPI rank):
2023-04-05 15:54:06.701 ( 27.263s) [main thread ]RuntimeDeviceConfigurat:188 WARN| Attempted to Re-initialize Kokkos! The Kokkos subsystem can only be initialized once
However, it appears to run perfectly fine.
It runs fine, but crashes when it tries to finalize Kokkos twice:
terminate called after throwing an instance of 'std::runtime_error'
what(): Kokkos allocation "ErrorMessageViewInstance" is being deallocated after Kokkos::finalize was called
Loguru caught a signal: SIGABRT
Stack trace:
5 0x7fffd84cbdbb /ccs/home/wibking/ascent_frontier/install/vtk-m-v1.9.0/lib/libvtkm_cont-1.9.so.1(+0xbebdbb) [0x7fffd84cbdbb]
4 0x7fffd6287c55 /opt/cray/pe/gcc-libs/libstdc++.so.6(+0xb0c55) [0x7fffd6287c55]
3 0x7fffd6287bea /opt/cray/pe/gcc-libs/libstdc++.so.6(+0xb0bea) [0x7fffd6287bea]
2 0x7fffd627c5b9 /opt/cray/pe/gcc-libs/libstdc++.so.6(+0xa55b9) [0x7fffd627c5b9]
1 0x7fffd5e2c355 abort + 375
0 0x7fffd5e2acbb gsignal + 269
2023-04-05 15:56:48.270 ( 177.913s) [main thread ] :0 FATL| Signal: SIGABRT
terminate called after throwing an instance of 'std::runtime_error'
what(): Kokkos allocation "ErrorMessageViewInstance" is being deallocated after Kokkos::finalize was called
Ok! These are great updates!
This makes sense. Ascent's Kokkos support exists because VTK-m needs it, and VTK-m doesn't initialize or finalize Kokkos, so Ascent handles that. But this is clearly not robust when a linked code also initializes and finalizes Kokkos. I'll look at fixing this.
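The ownership rule implied here ("initialize only if nobody else has, finalize only what you initialized") can be sketched in a few lines. This is a minimal illustration, not Ascent's code: `mock_runtime` and `RuntimeGuard` are hypothetical names standing in for Kokkos, whose real entry points are `Kokkos::is_initialized()`, `Kokkos::initialize()` and `Kokkos::finalize()`.

```cpp
#include <cassert>
#include <iostream>
#include <stdexcept>

// Hypothetical stand-in for a runtime such as Kokkos. It only tracks a
// single global "initialized" flag; it is not the Kokkos API.
namespace mock_runtime {
static bool g_initialized = false;

bool is_initialized() { return g_initialized; }

void initialize() {
  if (g_initialized) {
    // Analogous in spirit to the "Attempted to Re-initialize" warning.
    std::cerr << "warning: runtime is already initialized\n";
    return;
  }
  g_initialized = true;
}

void finalize() {
  if (!g_initialized) throw std::runtime_error("finalize called while not initialized");
  g_initialized = false;
}
}  // namespace mock_runtime

// Hypothetical RAII guard a library layered on the runtime could hold:
// it initializes only if nobody else has, and finalizes only if this
// guard performed the initialization.
class RuntimeGuard {
 public:
  RuntimeGuard() : owns_(!mock_runtime::is_initialized()) {
    if (owns_) mock_runtime::initialize();
  }
  ~RuntimeGuard() {
    // Guarded so the destructor never throws.
    if (owns_ && mock_runtime::is_initialized()) mock_runtime::finalize();
  }
  RuntimeGuard(const RuntimeGuard&) = delete;
  RuntimeGuard& operator=(const RuntimeGuard&) = delete;

  bool owns() const { return owns_; }

 private:
  bool owns_;
};
```

Under this rule, when the host code initializes the runtime first, the guard neither re-initializes it nor finalizes it, so the host keeps sole responsibility for calling `finalize()`.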
It turns out we already have some safeguards (line 325 in commit 76f3354).
And I'm not finding Kokkos::finalize() anywhere in Ascent.
@pgrete Can you reproduce this on Frontier?
This looks like a bug in VTK-m to me.
In vtk-m-v1.9.0/vtkm/cont/kokkos/internal/DeviceAdapterAlgorithmKokkos.cxx:23, a static thread_local Kokkos::View ("ErrorMessageViewInstance") is created that never appears to be explicitly deallocated. This seems like it should always be an issue, since static variables are only implicitly deallocated when the program exits, which is always after Kokkos::finalize().
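To make the lifetime problem concrete, here is a minimal C++ sketch that models it without Kokkos or VTK-m. Everything in it is hypothetical: `mock_runtime`, `Allocation`, `error_buffer_slot` and `get_error_buffer` are illustrative names, not VTK-m or Kokkos APIs. It shows a lazily created, process-lifetime buffer being released after `finalize()`, and one possible remedy: releasing it from a finalize hook while the runtime is still alive. Kokkos does provide a similar facility (`Kokkos::push_finalize_hook`), but this is not necessarily how the VTK-m 2.0 change addresses the bug.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical mock of a runtime with finalize hooks. It does not claim to
// reproduce Kokkos semantics; it simply allows initialize() to be called
// again so both cases below can be shown in one run.
namespace mock_runtime {
static bool g_initialized = false;
static int g_late_deallocations = 0;  // allocations freed after finalize()
static std::vector<std::function<void()>> g_finalize_hooks;

void initialize() { g_initialized = true; }

void push_finalize_hook(std::function<void()> hook) {
  g_finalize_hooks.push_back(std::move(hook));
}

// Runs registered hooks (most recent first) while the runtime is still
// alive, then tears the runtime down.
void finalize() {
  while (!g_finalize_hooks.empty()) {
    std::function<void()> hook = std::move(g_finalize_hooks.back());
    g_finalize_hooks.pop_back();
    hook();
  }
  g_initialized = false;
}

// Stand-in for a labeled device allocation. Instead of throwing like the
// real error, it counts deallocations that happen after finalize().
class Allocation {
 public:
  explicit Allocation(std::string label) : label_(std::move(label)) {}
  ~Allocation() {
    if (!g_initialized) ++g_late_deallocations;
  }
  const std::string& label() const { return label_; }

 private:
  std::string label_;
};
}  // namespace mock_runtime

// Hypothetical process-lifetime slot, analogous to the static buffer
// described above. If nothing releases it earlier, it is destroyed during
// static destruction at exit, i.e. after the host has called finalize().
std::unique_ptr<mock_runtime::Allocation>& error_buffer_slot() {
  static std::unique_ptr<mock_runtime::Allocation> slot;
  return slot;
}

// Lazily creates the buffer. With register_cleanup=true it also registers
// a finalize hook that releases the buffer before the runtime shuts down.
mock_runtime::Allocation& get_error_buffer(bool register_cleanup) {
  std::unique_ptr<mock_runtime::Allocation>& slot = error_buffer_slot();
  if (!slot) {
    slot.reset(new mock_runtime::Allocation("ErrorMessageViewInstance"));
    if (register_cleanup) {
      mock_runtime::push_finalize_hook([] { error_buffer_slot().reset(); });
    }
  }
  return *slot;
}
```

In the first case the buffer outlives `finalize()` and is only freed when the slot is destroyed, which is the ordering reported above; in the second, the hook frees it while the runtime is still initialized, so no late deallocation occurs.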
Related Issues (20)
- When OpenMPI is chosen binaries are still linked with mpich HOT 2
- windows doesnt like giant strings HOT 1
- add example with ghost zones
- How to use Cuda? HOT 5
- How to plot blocks? HOT 1
- Block-structured multi level mesh HOT 11
- replay: empty line in --cycles input file causes crash
- zero copy device support for integer fields
- render failure with slice filter (but only on GPU) HOT 3
- query expression for rms of a field? HOT 2
- Data binning segfaults when data is in device memory on non-unified memory systems HOT 9
- are we double initing?
- Frontier failing unit tests HOT 1
- Segfault in downstream use on Frontier HOT 11
- Contour of qcriterion HOT 4
- Build errors when upgrading to VTKm 2.0.0 HOT 2
- provide vtkm merge points based option for point rendering
- databinning vorticity causes warning "Field type unsupported for conversion to blueprint" HOT 3
- build_ascent hdf5 zlib support HOT 1