

cyrush commented on June 17, 2024

@BenWibking

Update: We talked to VTK-m folks and we think this is fixed in VTK-m 2.0.

Here is the relevant MR: https://gitlab.kitware.com/vtk/vtk-m/-/merge_requests/2880

Ascent's 0.9.2 release will support VTK-m 2.0.

VTK-m 2.0 includes API-breaking changes, so you can't try it with the current Ascent release out of the box.

But we hope to release 0.9.2 soon after 0.9.1 (which is imminent and focused on Windows support).

from ascent.

BenWibking commented on June 17, 2024

I built Ascent with:

module load PrgEnv-cray cce/14.0.2 craype-accel-amd-gfx90a cmake hdf5 cray-python rocm/5.2.0 cray-mpich/8.1.23

export MPICH_GPU_SUPPORT_ENABLED=1
export ROCM_ARCH=gfx90a
export CC=$(which cc)
export CXX=$(which CC)
export FTN=$(which ftn)
export CFLAGS="-I${ROCM_PATH}/include"
export CXXFLAGS="-I${ROCM_PATH}/include -Wno-pass-failed"
export LDFLAGS="-L${ROCM_PATH}/lib -lamdhip64"
export HIPFLAGS="-I/opt/cray/pe/mpich/default/ofi/rocm-compiler/5.0/include/"
env enable_mpi=ON enable_find_mpi=OFF ./build_ascent_hip.sh

nicolemarsaglia commented on June 17, 2024

I don't believe you did anything wrong. This is reminiscent of another error that was fixed by adding "cmake_policy(SET CMP0054 NEW)" to the beginning of this file: kokkos-3.6.01/install/lib64/cmake/Kokkos/KokkosConfigCommon.cmake

Unfortunately, I didn't see the actual error. Also, this is a different policy, and that earlier error came up while building Ascent rather than while building against it, so it's not necessarily helpful.

I can start trying to replicate this on Frontier. This issue may need to be raised with the Kokkos team.

BenWibking commented on June 17, 2024

Ok, thanks. I should add that our code uses Kokkos itself, so this might be a mismatch between the version of Kokkos used by our code and the version used by Ascent and its dependencies. Is the build process designed to handle such a case?

BenWibking commented on June 17, 2024

FYI: I can get it to configure and build if I explicitly point our code to link against Ascent's Kokkos build.

BenWibking commented on June 17, 2024

At runtime, I get a warning (for each MPI rank):
2023-04-05 15:54:06.701 ( 27.263s) [main thread ]RuntimeDeviceConfigurat:188 WARN| Attempted to Re-initialize Kokkos! The Kokkos subsystem can only be initialized once

However, it appears to run perfectly fine.

BenWibking commented on June 17, 2024

It runs fine, but crashes when it tries to finalize Kokkos twice:

terminate called after throwing an instance of 'std::runtime_error'
  what():  Kokkos allocation "ErrorMessageViewInstance" is being deallocated after Kokkos::finalize was called


Loguru caught a signal: SIGABRT
Stack trace:
5       0x7fffd84cbdbb /ccs/home/wibking/ascent_frontier/install/vtk-m-v1.9.0/lib/libvtkm_cont-1.9.so.1(+0xbebdbb) [0x7fffd84cbdbb]
4       0x7fffd6287c55 /opt/cray/pe/gcc-libs/libstdc++.so.6(+0xb0c55) [0x7fffd6287c55]
3       0x7fffd6287bea /opt/cray/pe/gcc-libs/libstdc++.so.6(+0xb0bea) [0x7fffd6287bea]
2       0x7fffd627c5b9 /opt/cray/pe/gcc-libs/libstdc++.so.6(+0xa55b9) [0x7fffd627c5b9]
1       0x7fffd5e2c355 abort + 375
0       0x7fffd5e2acbb gsignal + 269
2023-04-05 15:56:48.270 ( 177.913s) [main thread     ]                       :0     FATL| Signal: SIGABRT
terminate called after throwing an instance of 'std::runtime_error'
  what():  Kokkos allocation "ErrorMessageViewInstance" is being deallocated after Kokkos::finalize was called

nicolemarsaglia commented on June 17, 2024

Ok! These are great updates!

This makes sense. Ascent uses Kokkos because VTK-m needs it, and VTK-m doesn't initialize or finalize Kokkos itself, so Ascent handles that. But this is clearly not robust when a linked code also initializes and finalizes Kokkos. I'll look at fixing this.

nicolemarsaglia commented on June 17, 2024

Turns out we already have some safeguards:

if(!Kokkos::is_initialized())

And I'm not finding Kokkos::finalize() anywhere in Ascent.
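The is_initialized() check guards initialization; the other half of the problem is deciding who calls finalize. A minimal sketch, assuming the rule "a component finalizes the runtime only if it initialized it", is to record ownership at init time. To keep it runnable without Kokkos, MockRuntime is a hypothetical stand-in for Kokkos::is_initialized(), Kokkos::initialize() and Kokkos::finalize(), and LibraryRuntimeGuard is a hypothetical class, not Ascent's actual code.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for the Kokkos runtime so this sketch builds without
// Kokkos. In real code the three static functions would correspond to
// Kokkos::is_initialized(), Kokkos::initialize(...) and Kokkos::finalize().
struct MockRuntime {
  static inline bool initialized = false;
  static inline int init_calls = 0;
  static inline int finalize_calls = 0;

  static bool is_initialized() { return initialized; }
  static void initialize() {
    if (initialized) throw std::runtime_error("runtime initialized twice");
    initialized = true;
    ++init_calls;
  }
  static void finalize() {
    if (!initialized) throw std::runtime_error("runtime finalized while not initialized");
    initialized = false;
    ++finalize_calls;
  }
};

// Hypothetical RAII guard a library linked into a host code could hold: it
// initializes the runtime only if nobody else has, and finalizes it only if
// it was the one that initialized it.
class LibraryRuntimeGuard {
 public:
  LibraryRuntimeGuard() {
    if (!MockRuntime::is_initialized()) {
      MockRuntime::initialize();
      owns_runtime_ = true;
    }
  }
  ~LibraryRuntimeGuard() {
    if (owns_runtime_ && MockRuntime::is_initialized()) MockRuntime::finalize();
  }
  LibraryRuntimeGuard(const LibraryRuntimeGuard&) = delete;
  LibraryRuntimeGuard& operator=(const LibraryRuntimeGuard&) = delete;

  bool owns_runtime() const { return owns_runtime_; }

 private:
  bool owns_runtime_ = false;
};

// Host code initializes first, so the library must leave finalization to it.
bool library_owns_when_host_initializes() {
  MockRuntime::initialize();
  bool owned = false;
  {
    LibraryRuntimeGuard lib;
    owned = lib.owns_runtime();
  }  // guard destructor does not finalize
  assert(MockRuntime::is_initialized());  // still alive for the host
  MockRuntime::finalize();                // the host finalizes exactly once
  return owned;
}

// The library is the only user, so it both initializes and finalizes.
bool library_owns_when_alone() {
  bool owned = false;
  {
    LibraryRuntimeGuard lib;
    owned = lib.owns_runtime();
  }
  assert(!MockRuntime::is_initialized());
  return owned;
}
```

The design point is that finalization follows ownership: whichever component actually called initialize is the only one that calls finalize, so a host code and a library sharing one runtime never finalize it twice.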

BenWibking commented on June 17, 2024

@pgrete Can you reproduce this on Frontier?

BenWibking commented on June 17, 2024

This looks like a bug in VTK-m to me.

In vtk-m-v1.9.0/vtkm/cont/kokkos/internal/DeviceAdapterAlgorithmKokkos.cxx:23, a static thread_local Kokkos::View ("ErrorMessageViewInstance") is created that appears never to be explicitly deallocated. This seems like it should always be an issue: a static or thread_local object is only destroyed implicitly when its thread or the program exits, which happens after the application has already called Kokkos::finalize().
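The ordering problem described above can be reproduced without Kokkos or VTK-m. In this sketch MockRuntime and MockView are hypothetical stand-ins (not Kokkos or VTK-m types), and a std::optional that stays alive across finalize() plays the role of the static thread_local View. The second function shows one possible remedy, releasing the long-lived object from a finalize callback. Real Kokkos does provide Kokkos::push_finalize_hook for registering callbacks that run during Kokkos::finalize(), but this thread does not say how VTK-m 2.0 actually fixes the issue.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for the Kokkos runtime. finalize() first runs any
// registered callbacks (last-in, first-out in this mock), loosely modelled on
// Kokkos::push_finalize_hook, and only then marks the runtime as finalized.
struct MockRuntime {
  static inline bool finalized = false;
  static inline std::vector<std::function<void()>> hooks;

  static void push_finalize_hook(std::function<void()> f) { hooks.push_back(std::move(f)); }

  static void finalize() {
    for (auto it = hooks.rbegin(); it != hooks.rend(); ++it) (*it)();
    hooks.clear();
    finalized = true;
  }
};

// Hypothetical stand-in for a Kokkos::View. Real Kokkos reports a deallocation
// after finalize with a std::runtime_error, as in the log above; this mock
// only counts such late deallocations so the behavior can be checked.
struct MockView {
  static inline int late_deallocations = 0;
  ~MockView() {
    if (MockRuntime::finalized) ++late_deallocations;
  }
};

void reset_mock_state() {
  MockRuntime::finalized = false;
  MockRuntime::hooks.clear();
  MockView::late_deallocations = 0;
}

// The problem: a View held in long-lived storage (standing in for the static
// thread_local View) is still alive when finalize() runs, so it is destroyed
// afterwards.
int late_deallocations_without_hook() {
  reset_mock_state();
  {
    std::optional<MockView> long_lived;
    long_lived.emplace();
    MockRuntime::finalize();
  }  // long_lived is destroyed here, after finalize()
  return MockView::late_deallocations;
}

// One possible remedy: release the long-lived View from a finalize callback
// so it is gone before the runtime shuts down.
int late_deallocations_with_hook() {
  reset_mock_state();
  {
    std::optional<MockView> long_lived;
    long_lived.emplace();
    MockRuntime::push_finalize_hook([&long_lived] { long_lived.reset(); });
    MockRuntime::finalize();
    assert(!long_lived.has_value());  // the callback already released it
  }
  return MockView::late_deallocations;
}
```

A callback like this only helps if it is registered before the host code calls finalize; one option, under the same assumptions, is to register it at the moment the long-lived View is first created.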
