kokkos-resilience's Introduction

Kokkos: Core Libraries

Kokkos Core implements a programming model in C++ for writing performance portable applications targeting all major HPC platforms. For that purpose it provides abstractions for both parallel execution of code and data management. Kokkos is designed to target complex node architectures with N-level memory hierarchies and multiple types of execution resources. It currently can use CUDA, HIP, SYCL, HPX, OpenMP and C++ threads as backend programming models with several other backends in development.

Kokkos Core is part of the Kokkos C++ Performance Portability Programming Ecosystem.

Kokkos is a Linux Foundation project.

Learning about Kokkos

To start learning about Kokkos:

Obtaining Kokkos

The latest release of Kokkos can be obtained from the GitHub releases page.

The current release is 4.3.00.

curl -OJ -L https://github.com/kokkos/kokkos/archive/refs/tags/4.3.00.tar.gz
# Or with wget
wget https://github.com/kokkos/kokkos/archive/refs/tags/4.3.00.tar.gz

To clone the latest development version of Kokkos from GitHub:

git clone -b develop  https://github.com/kokkos/kokkos.git

Building Kokkos

To build Kokkos, you will need to have a C++ compiler that supports C++17 or later. All requirements including minimum and primary tested compiler versions can be found here.

Building and installation instructions are described here.

You can also install Kokkos using Spack: spack install kokkos. Available configuration options can be displayed using spack info kokkos.

For the complete documentation: kokkos.org/kokkos-core-wiki/

Support

For questions find us on Slack: https://kokkosteam.slack.com or open a GitHub issue.

For non-public questions send an email to: crtrott(at)sandia.gov

Contributing

Please see this page for details on how to contribute.

Citing Kokkos

Please see the following page.

License

Under the terms of Contract DE-NA0003525 with NTESS, the U.S. Government retains certain rights in this software.

The full license statement used in all headers is available here or here.

kokkos-resilience's People

Contributors

elisabethgiem, hkaiser, jeffmiles63, keitat, matthew-whitlock, nmm0, srirajpaul

kokkos-resilience's Issues

Build error when cmake option KR_ENABLE_TRACING is ON

/home/kestenerp/install/kokkos/github/kokkos-resilience-pk/src/resilience/veloc/VelocBackend.cpp: In member function ‘void KokkosResilience::VeloCFileBackend::checkpoint(const string&, int, const std::vector<KokkosResilience::BasicViewHolder<false> >&)’:
/home/kestenerp/install/kokkos/github/kokkos-resilience-pk/src/resilience/veloc/VelocBackend.cpp:340:82: error: ‘m_context’ was not declared in this scope; did you mean ‘make_context’?
  340 |       auto write_trace = Util::begin_trace< Util::TimingTrace< std::string > >( *m_context, "write" );
      |                                                                                  ^~~~~~~~~
      |                                                                                  make_context

Apparently, class VeloCFileBackend should store a "ContextBase *m_context" member to solve this.
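The suggestion above can be illustrated with a minimal sketch. It is not the library's code: `ContextBase` here is a reduced stand-in, and `FileBackendSketch` and `trace_log` are hypothetical names introduced only to show a backend keeping a non-owning pointer to its context so that `*m_context` is valid inside member functions.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Reduced stand-in for the library's context type (assumption: the real
// ContextBase has a different and larger interface).
struct ContextBase {
  std::vector<std::string> trace_log;  // hypothetical: collects trace labels
};

// Simplified stand-in for VeloCFileBackend. The constructor stores a pointer
// to the context so member functions can refer to m_context.
class FileBackendSketch {
 public:
  explicit FileBackendSketch(ContextBase& ctx) : m_context(&ctx) {}

  void checkpoint(const std::string& label) {
    // With the member declared, *m_context is a valid expression here.
    m_context->trace_log.push_back("write:" + label);
  }

 private:
  ContextBase* m_context;  // non-owning; the context must outlive the backend
};
```

Whether the real backend should hold a raw pointer, a reference, or something else depends on how the library manages context lifetime, which the issue does not state.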

CUDA support issues on Power9+V100 systems

I'm running into issues building with CUDA support on Power9. The platform is a dual socket Power9 node with 32 cores and 2 V100 GPUs per node. Building with CUDA support has 2 issues I've seen so far. The first is a simple mistake in ResCudaSpace.hpp(273) that generates a bunch of syntax errors.
...
[ 25%] Building CXX object CMakeFiles/resilience.dir/src/resilience/cuda/ResCuda.cpp.o
nvcc_wrapper has been given GNU extension standard flag -std=gnu++14 - reverting flag to -std=c++14
/home/ntan1/KokkosResilience/kokkos-resilience/src/resilience/cuda/ResCudaSpace.hpp(273): error: enable_if is not a template

The fix is to add the std:: qualifier to both the enable_if and is_same templates on line 273.
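The kind of change described can be sketched as follows. This is not the code from ResCudaSpace.hpp: `HostTag`, `DeviceTag` and `space_id` are hypothetical names, used only to show `std::enable_if` and `std::is_same` written with explicit qualification, in a form that compiles as C++14.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical tag types standing in for memory or execution spaces.
struct HostTag {};
struct DeviceTag {};

// Unqualified enable_if / is_same only compile when a using-declaration or
// using-directive happens to be in scope; qualifying them removes that
// dependency.
template <class Space>
typename std::enable_if<std::is_same<Space, DeviceTag>::value, int>::type
space_id(const Space&) {
  return 1;  // selected only for DeviceTag
}

template <class Space>
typename std::enable_if<!std::is_same<Space, DeviceTag>::value, int>::type
space_id(const Space&) {
  return 0;  // selected for every other type
}
```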

The second error comes further along when building the tests.

[ 50%] Building CXX object tests/CMakeFiles/resilience_tests.dir/TestResilience.cpp.o
/home/ntan1/KokkosResilience/kokkos/build/install/include/impl/Kokkos_Profiling_Interface.hpp(79): error: incomplete type is not allowed
detected during:
instantiation of "uint32_t Kokkos::Profiling::Experimental::device_id(const ExecutionSpace &) [with ExecutionSpace=KokkosResilience::ResCuda]"
/home/ntan1/KokkosResilience/kokkos/build/install/include/Kokkos_Parallel.hpp(171): here
instantiation of "void Kokkos::parallel_for(const ExecPolicy &, const FunctorType &, const std::__cxx11::string &, std::enable_if<Kokkos::is_execution_policy<ExecPolicy>::value, void>::type *) [with ExecPolicy=Kokkos::RangePolicy<KokkosResilience::ResCuda>, FunctorType=lambda ->void]"
/home/ntan1/KokkosResilience/kokkos-resilience/tests/TestResilience.cpp(93): here
instantiation of "void TestResilientRange<ExecSpace, ScheduleType, DataType>::test_for() [with ExecSpace=Kokkos::Serial, ScheduleType=Kokkos::Schedule<Kokkos::Static>, DataType=int]"
/home/ntan1/KokkosResilience/kokkos-resilience/tests/TestResilience.cpp(117): here
instantiation of "void TestResilience_range_Test<gtest_TypeParam_>::TestBody() [with gtest_TypeParam_=Kokkos::Serial]"
/home/ntan1/KokkosResilience/kokkos-resilience/build/_deps/googletest-src/googletest/include/gtest/internal/gtest-internal.h(470): here
implicit generation of "TestResilience_range_Test<gtest_TypeParam_>::~TestResilience_range_Test() [with gtest_TypeParam_=Kokkos::Serial]"
/home/ntan1/KokkosResilience/kokkos-resilience/build/_deps/googletest-src/googletest/include/gtest/internal/gtest-internal.h(470): here
[ 4 instantiation contexts not shown ]
implicit generation of "testing::internal::TestFactoryImpl<TestClass>::~TestFactoryImpl() [with TestClass=TestResilience_range_Test<Kokkos::Serial>]"
/home/ntan1/KokkosResilience/kokkos-resilience/build/_deps/googletest-src/googletest/include/gtest/internal/gtest-internal.h(728): here
instantiation of class "testing::internal::TestFactoryImpl<TestClass> [with TestClass=TestResilience_range_Test<Kokkos::Serial>]"
/home/ntan1/KokkosResilience/kokkos-resilience/build/_deps/googletest-src/googletest/include/gtest/internal/gtest-internal.h(728): here
implicit generation of "testing::internal::TestFactoryImpl<TestClass>::TestFactoryImpl() [with TestClass=TestResilience_range_Test<Kokkos::Serial>]"
/home/ntan1/KokkosResilience/kokkos-resilience/build/_deps/googletest-src/googletest/include/gtest/internal/gtest-internal.h(728): here
instantiation of class "testing::internal::TestFactoryImpl<TestClass> [with TestClass=TestResilience_range_Test<Kokkos::Serial>]"
/home/ntan1/KokkosResilience/kokkos-resilience/build/_deps/googletest-src/googletest/include/gtest/internal/gtest-internal.h(728): here
instantiation of "__nv_bool testing::internal::TypeParameterizedTest<Fixture, TestSel, Types>::Register(const char *, const testing::internal::CodeLocation &, const char *, const char *, int, const std::vector<std::__cxx11::string, std::allocator<std::__cxx11::string>> &) [with Fixture=TestResilience, TestSel=testing::internal::TemplateSel<TestResilience_range_Test>, Types=gtest_type_params_TestResilience]"
/home/ntan1/KokkosResilience/kokkos-resilience/tests/TestResilience.cpp(110): here

1 error detected in the compilation of "/tmp/tmpxft_00002401_00000000-6_TestResilience.cpp1.ii".
make[2]: *** [tests/CMakeFiles/resilience_tests.dir/TestResilience.cpp.o] Error 1
make[1]: *** [tests/CMakeFiles/resilience_tests.dir/all] Error 2
make: *** [all] Error 2

I'm not sure how to fix this.

Resilient Kokkos RangePolicy parallel_for has a bug in heatdist test

The resilient parallel_for with a RangePolicy, single-dimensional resilient views, and no MPI fails in the heat distribution test on kahuna. The program either never runs the kernel or enters an infinite loop (it times out); the precise point of failure has not yet been determined.

Modules loaded:
cmake 3.19.1
gcc7-support
gcc 7.5.0

Notes:
0) Branch of Resilient Kokkos: resilient-execution-space
1) The no-MPI version of the heat distribution test works with non-resilient Kokkos
2) heatdist test code: nmm0/veloc-heat-test@ef7a94b

Dynamic Scheduling - Resilient Execution Space

Dynamic scheduling is disabled for the RangePolicy implementation of parallel_for in the resilient execution space. This can likely be resolved by creating a new data class for resilient scheduling.

Build error for Kokkos::Cuda backend

I'm trying to build the current main branch against Kokkos with the CUDA backend enabled.

I've encountered at least two errors for which I would need help:

  1. about Kokkos_TrackDuplicates.hpp
[ 48%] Building CXX object tests/CMakeFiles/resilience_tests.dir/TestResilience.cpp.o
In file included from /data/pkestene/install/trilinos/kokkos/github/kokkos-resilience_pk/tests/TestResilience.cpp:45:
/data/pkestene/install/trilinos/kokkos/github/kokkos-resilience_pk/src/resilience/cuda/ResCudaSpace.hpp:47:10: fatal error: impl/Kokkos_TrackDuplicates.hpp: No such file or directory
   47 | #include <impl/Kokkos_TrackDuplicates.hpp>
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
make[2]: *** [tests/CMakeFiles/resilience_tests.dir/build.make:90: tests/CMakeFiles/resilience_tests.dir/TestResilience.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:485: tests/CMakeFiles/resilience_tests.dir/all] Error 2
make: *** [Makefile:146: all] Error 2

I'm not able to find header Kokkos_TrackDuplicates.hpp in kokkos sources even with git grep.

  2. about header impl/Kokkos_Tags.hpp
[ 46%] Building CXX object tests/CMakeFiles/resilience_tests.dir/TestResilience.cpp.o
In file included from /data/pkestene/install/trilinos/kokkos/github/kokkos-resilience_pk/tests/TestResilience.cpp:46:
/data/pkestene/install/trilinos/kokkos/github/kokkos-resilience_pk/src/resilience/cuda/ResCuda.hpp:61:10: fatal error: impl/Kokkos_Tags.hpp: No such file or directory
   61 | #include <impl/Kokkos_Tags.hpp>
      |          ^~~~~~~~~~~~~~~~~~~~~~
compilation terminated.

The header impl/Kokkos_Tags.hpp exists in the Kokkos sources, but it is no longer installed.

These errors are triggered when I use Kokkos 4.1.00.

Which version or git hash of Kokkos do you recommend?

Update code to work with newer versions of Kokkos develop

A recent commit (4dced25118a9fa5021de7e8cf076154cf9681e26) changed the names for some of the internal OpenMP files. Kokkos resilience will need some updates to rename a few header includes.

  • Kokkos_OpenMP_Exec -> Kokkos_OpenMP_Instance
  • Kokkos::Impl::OpenMPExec -> Kokkos::Impl::OpenMPInternal

There are also some issues in some of the tests (TestOpenMPResilientExecution.cpp). I don't see any recent changes that could have caused the problem other than some magic number changes.

/home/nphtan/kokkos/build/install/include/impl/Kokkos_Profiling_Interface.hpp(126): error: class "Kokkos::Tools::Experimental::DeviceTypeTraits<KokkosResilience::ResOpenMP>" has no member "device_id"
(DeviceTypeTraits::device_id(space)
^
detected during:
instantiation of "uint32_t={__uint32_t={unsigned int}} Kokkos::Tools::Experimental::device_id(const ExecutionSpace &) [with ExecutionSpace=KokkosResilience::ResOpenMP]" at line 371 of "/home/nphtan/kokkos/build/install/include/impl/Kokkos_Tools_Generic.hpp"
instantiation of "void Kokkos::Tools::Impl::begin_parallel_for(ExecPolicy &, FunctorType &, const std::string &, uint64_t={__uint64_t={unsigned long}} &) [with ExecPolicy=range_policy, FunctorType=const lambda ->void]" at line 162 of "/home/nphtan/kokkos/build/install/include/Kokkos_Parallel.hpp"
instantiation of "void Kokkos::parallel_for(const ExecPolicy &, const FunctorType &, const std::string &, std::enable_if_t<Kokkos::is_execution_policy<ExecPolicy>::value, void> *) [with ExecPolicy=range_policy, FunctorType=lambda ->void]" at line 113 of "/home/nphtan/kokkos-resilience/tests/TestOpenMPResilientExecution.cpp"

compilation aborted for /home/nphtan/kokkos-resilience/tests/TestOpenMPResilientExecution.cpp (code 2)
make[2]: *** [tests/CMakeFiles/resilience_tests.dir/build.make:134: tests/CMakeFiles/resilience_tests.dir/TestOpenMPResilientExecution.cpp.o] Error 2
make[1]: *** [CMakeFiles/Makefile2:573: tests/CMakeFiles/resilience_tests.dir/all] Error 2
make: *** [Makefile:160: all] Error 2
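The first error in the log says that the traits class instantiated for the resilient execution space has no device_id member. The sketch below shows that pattern in isolation, using stand-in names only (`ResOpenMPSketch`, `DeviceTypeTraitsSketch`, `device_id_of`). None of these are Kokkos declarations, and the return value is a placeholder. It illustrates why generic code fails to compile when the member is missing; it is not a confirmed fix for this issue.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for a custom execution space type.
struct ResOpenMPSketch {};

// Stand-in traits template; the primary template is left undefined so every
// execution space has to provide its own specialization.
template <class ExecSpace>
struct DeviceTypeTraitsSketch;

template <>
struct DeviceTypeTraitsSketch<ResOpenMPSketch> {
  // Generic code below calls Traits::device_id(space). If this member is
  // absent, compilation fails with a "has no member" error like the one above.
  static std::uint32_t device_id(const ResOpenMPSketch&) { return 0; }
};

// Stand-in for generic tooling code that queries the traits.
template <class ExecSpace>
std::uint32_t device_id_of(const ExecSpace& space) {
  return DeviceTypeTraitsSketch<ExecSpace>::device_id(space);
}
```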

Fix cmake installation path

Currently the CMake Config file is installed in ${CMAKE_INSTALL_PREFIX}/share/resilience/cmake and the CMake Targets file in ${CMAKE_INSTALL_PREFIX}/cmake. Since the Config file looks for the Targets file in its own directory, installing the Config file in ${CMAKE_INSTALL_PREFIX}/cmake as well should solve this. For reference, both the Config and Targets files of Kokkos are installed in ${CMAKE_INSTALL_PREFIX}/lib/cmake. The issue came up while trying to use kokkos-resilience from an out-of-tree CMake file.

Unify view subscriber

We should have aliases for view subscribers that work with both automatic checkpointing and resilient execution spaces or either one separately.

  • Type alias CheckpointView -> Automatic checkpointing
  • Type alias ResilientExecView -> Resilient Execution Space
  • Type alias ResilientView -> both
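The three aliases listed above could be sketched with alias templates. The view type and subscriber tags below are stand-ins (`ViewSketch`, `CheckpointSubscriber` and `ResilientExecSubscriber` are hypothetical names); only the alias names come from the issue. The real aliases would wrap Kokkos views and the library's actual subscriber types.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical subscriber tags, one per feature.
struct CheckpointSubscriber {};
struct ResilientExecSubscriber {};

// Stand-in for a view type parameterized on a pack of subscribers.
template <class T, class... Subscribers>
struct ViewSketch {
  static constexpr std::size_t subscriber_count = sizeof...(Subscribers);
};

// Automatic checkpointing only.
template <class T>
using CheckpointView = ViewSketch<T, CheckpointSubscriber>;

// Resilient execution space only.
template <class T>
using ResilientExecView = ViewSketch<T, ResilientExecSubscriber>;

// Both features together.
template <class T>
using ResilientView =
    ViewSketch<T, CheckpointSubscriber, ResilientExecSubscriber>;
```

With aliases of this shape, user code picks one name per use case and does not have to spell out the subscriber list itself.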

deregister views

Deregister views with VeloC when a view is not included in the checkpoint

Build issues on Theta (ANL)

I'm having issues building the tests on Theta. I have tried both the GNU and Intel compilers. During the build there seems to be an issue with the resilience test. I've included the full build steps I performed along with the modules I have loaded.

(miniconda-3/latest/base) nphtan@thetalogin5:~/kokkos/build> module list
Currently Loaded Modulefiles:

  1) modules/3.2.11.4                                 15) rca/2.2.20-7.0.2.1_2.78__g8e3fb5b.ari
  2) intel/19.1.0.166                                 16) atp/3.8.1
  3) craype-network-aries                             17) perftools-base/20.06.0
  4) craype/2.6.5                                     18) PrgEnv-intel/6.0.7
  5) cray-libsci/20.06.1                              19) craype-mic-knl
  6) udreg/2.3.2-7.0.2.1_2.33__g8175d3d.ari           20) cray-mpich/7.7.14
  7) ugni/6.0.14.0-7.0.2.1_3.60__ge78e5b0.ari         21) nompirun/nompirun
  8) pmi/5.0.16                                       22) adaptive-routing-a3
  9) dmapp/7.1.1-7.0.2.1_2.78__g38cf134.ari           23) darshan/3.2.1
 10) gni-headers/5.0.12.0-7.0.2.1_2.19__g3b1768f.ari  24) xalt
 11) xpmem/2.2.20-7.0.2.1_2.60__g87eb960.ari          25) miniconda-3/latest
 12) job/2.2.4-7.0.2.1_2.72__g36b56f4.ari             26) cray-hdf5-parallel/1.10.6.1
 13) dvs/2.12_2.2.172-7.0.2.1_8.1__g7056cbb6          27) boost/intel/1.64.0
 14) alps/6.6.59-7.0.2.1_3.65__g872a8d62.ari

cat build.sh
#! /usr/bin/env bash

cmake \
  -DCMAKE_BUILD_TYPE=RelWithDebInfo \
  -DCMAKE_CXX_COMPILER=CC \
  -DCMAKE_CXX_FLAGS="-dynamic" \
  -DCMAKE_INSTALL_PREFIX=/home/nphtan/kokkos/build/install \
  -DKokkos_ENABLE_OPENMP=ON \
  -DKokkos_ENABLE_SERIAL=ON \
  -DKokkos_ARCH_KNL=ON \
  ..

(miniconda-3/latest/base) nphtan@thetalogin5:~/kokkos/build> . build.sh
-- Setting default Kokkos CXX standard to 11
-- The CXX compiler identification is Intel 19.1.0.20191121
-- Cray Programming Environment 2.6.5 CXX
-- Check for working CXX compiler: /opt/cray/pe/craype/2.6.5/bin/CC
-- Check for working CXX compiler: /opt/cray/pe/craype/2.6.5/bin/CC -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Setting policy CMP0074 to use _ROOT variables
-- The project name is: Kokkos
-- Using -std=gnu++11 for C++11 extensions as feature
-- Execution Spaces:
-- Device Parallel: NONE
-- Host Parallel: OPENMP
-- Host Serial: SERIAL

-- Architectures:
-- KNL
-- Found TPLLIBDL: /usr/lib64/libdl.so
-- Configuring done
-- Generating done
-- Build files have been written to: /home/nphtan/kokkos/build
(miniconda-3/latest/base) nphtan@thetalogin5:~/kokkos/build> make
Scanning dependencies of target kokkoscore
[ 4%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_CPUDiscovery.cpp.o
[ 8%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_Core.cpp.o
[ 13%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_Error.cpp.o
[ 17%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_ExecPolicy.cpp.o
[ 21%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_HostBarrier.cpp.o
[ 26%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_HostSpace.cpp.o
[ 30%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_HostSpace_deepcopy.cpp.o
[ 34%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_HostThreadTeam.cpp.o
[ 39%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_MemoryPool.cpp.o
[ 43%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_Profiling_Interface.cpp.o
[ 47%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_Serial.cpp.o
[ 52%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_Serial_Task.cpp.o
[ 56%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_SharedAlloc.cpp.o
[ 60%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_Spinwait.cpp.o
[ 65%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_Stacktrace.cpp.o
[ 69%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_TrackDuplicates.cpp.o
[ 73%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_ViewHooks.cpp.o
[ 78%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_hwloc.cpp.o
[ 82%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/OpenMP/Kokkos_OpenMP_Exec.cpp.o
[ 86%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/OpenMP/Kokkos_OpenMP_Task.cpp.o
[ 91%] Linking CXX static library libkokkoscore.a
[ 91%] Built target kokkoscore
Scanning dependencies of target kokkoscontainers
[ 95%] Building CXX object containers/src/CMakeFiles/kokkoscontainers.dir/impl/Kokkos_UnorderedMap_impl.cpp.o
[100%] Linking CXX static library libkokkoscontainers.a
[100%] Built target kokkoscontainers

(miniconda-3/latest/base) nphtan@thetalogin5:~/VELOC> python auto-install.py --no-boost --no-deps $HOME/VELOC/build/install
Installing VeloC in /home/nphtan/VELOC/build/install...
CMake Warning:
No source or binary directory provided. Both will be assumed to be the
same as the current working directory, but note that this warning will
become a fatal error in future CMake releases.

-- The C compiler identification is Intel 19.1.0.20191121
-- The CXX compiler identification is Intel 19.1.0.20191121
-- Cray Programming Environment 2.6.5 C
-- Check for working C compiler: /opt/cray/pe/craype/2.6.5/bin/cc
-- Check for working C compiler: /opt/cray/pe/craype/2.6.5/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Cray Programming Environment 2.6.5 CXX
-- Check for working CXX compiler: /opt/cray/pe/craype/2.6.5/bin/CC
-- Check for working CXX compiler: /opt/cray/pe/craype/2.6.5/bin/CC -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Boost version: 1.64.0
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - found
-- Found Threads: TRUE
-- Found MPI_C: /opt/cray/pe/craype/2.6.5/bin/cc (found version "3.1")
-- Found MPI_CXX: /opt/cray/pe/craype/2.6.5/bin/CC (found version "3.1")
-- Found MPI: TRUE (found version "3.1")
-- Configuring done
-- Generating done
-- Build files have been written to: /home/nphtan/VELOC
Scanning dependencies of target veloc-modules
[ 5%] Building CXX object src/modules/CMakeFiles/veloc-modules.dir/module_manager.cpp.o
[ 10%] Building CXX object src/modules/CMakeFiles/veloc-modules.dir/client_watchdog.cpp.o
[ 15%] Building CXX object src/modules/CMakeFiles/veloc-modules.dir/transfer_module.cpp.o
[ 20%] Building CXX object src/modules/CMakeFiles/veloc-modules.dir/__/common/config.cpp.o
[ 25%] Linking CXX shared library libveloc-modules.so
[ 25%] Built target veloc-modules
Scanning dependencies of target veloc-backend
[ 30%] Building CXX object src/backend/CMakeFiles/veloc-backend.dir/main.cpp.o
[ 35%] Building CXX object src/backend/CMakeFiles/veloc-backend.dir/__/common/config.cpp.o
[ 40%] Linking CXX executable veloc-backend
[ 40%] Built target veloc-backend
Scanning dependencies of target veloc-client
[ 45%] Building CXX object src/lib/CMakeFiles/veloc-client.dir/veloc.cpp.o
[ 50%] Building CXX object src/lib/CMakeFiles/veloc-client.dir/client.cpp.o
[ 55%] Building CXX object src/lib/CMakeFiles/veloc-client.dir/__/common/config.cpp.o
[ 60%] Linking CXX shared library libveloc-client.so
[ 60%] Built target veloc-client
Scanning dependencies of target heatdis_fault
[ 65%] Building CXX object test/CMakeFiles/heatdis_fault.dir/heatdis_fault.cpp.o
[ 70%] Linking CXX executable heatdis_fault
[ 70%] Built target heatdis_fault
Scanning dependencies of target heatdis_original
[ 75%] Building C object test/CMakeFiles/heatdis_original.dir/heatdis_original.c.o
[ 80%] Linking C executable heatdis_original
[ 80%] Built target heatdis_original
Scanning dependencies of target heatdis_file
[ 85%] Building C object test/CMakeFiles/heatdis_file.dir/heatdis_file.c.o
[ 90%] Linking C executable heatdis_file
[ 90%] Built target heatdis_file
Scanning dependencies of target heatdis_mem
[ 95%] Building C object test/CMakeFiles/heatdis_mem.dir/heatdis_mem.c.o
[100%] Linking C executable heatdis_mem
[100%] Built target heatdis_mem
Install the project...
-- Install configuration: "Release"
-- Installing: /home/nphtan/VELOC/build/install/lib/libveloc-modules.so
-- Installing: /home/nphtan/VELOC/build/install/bin/veloc-backend
-- Set runtime path of "/home/nphtan/VELOC/build/install/bin/veloc-backend" to ""
-- Installing: /home/nphtan/VELOC/build/install/lib/libveloc-client.so
-- Set runtime path of "/home/nphtan/VELOC/build/install/lib/libveloc-client.so" to ""
-- Up-to-date: /home/nphtan/VELOC/build/install/include/veloc.h
running install
running build
running build_py
running install_lib
running install_egg_info
Removing /home/nphtan/.local/miniconda-3/latest/lib/python3.7/site-packages/VELOC_Python-0.1-py3.7.egg-info
Writing /home/nphtan/.local/miniconda-3/latest/lib/python3.7/site-packages/VELOC_Python-0.1-py3.7.egg-info
Installation successful!

(miniconda-3/latest/base) nphtan@thetalogin5:~/kokkos-resilience/build> cat build.sh
#!/usr/bin/env bash

cmake \
  -DCMAKE_BUILD_TYPE=RelWithDebInfo \
  -DCMAKE_C_COMPILER=cc \
  -DCMAKE_C_FLAGS="-dynamic" \
  -DCMAKE_CXX_COMPILER=CC \
  -DCMAKE_CXX_FLAGS="-dynamic" \
  -DCMAKE_INSTALL_PREFIX=/home/nphtan/kokkos-resilience/build/install \
  -DVeloC_ROOT=/home/nphtan/VELOC/build/install \
  -DKokkos_ROOT=/home/nphtan/kokkos/build/install \
  -DKR_ENABLE_TRACING=ON \
  -DKR_ENABLE_STDIO=ON \
  -DKR_ENABLE_HDF5_PARALLEL=ON \
  -DVELOC_BAREBONE=ON \
  ..

(miniconda-3/latest/base) nphtan@thetalogin5:~/kokkos-resilience/build> . build.sh
-- The C compiler identification is Intel 19.1.0.20191121
-- The CXX compiler identification is Intel 19.1.0.20191121
-- Cray Programming Environment 2.6.5 C
-- Check for working C compiler: /opt/cray/pe/craype/2.6.5/bin/cc
-- Check for working C compiler: /opt/cray/pe/craype/2.6.5/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Cray Programming Environment 2.6.5 CXX
-- Check for working CXX compiler: /opt/cray/pe/craype/2.6.5/bin/CC
-- Check for working CXX compiler: /opt/cray/pe/craype/2.6.5/bin/CC -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Enabled Kokkos devices: OPENMP;SERIAL
-- Found MPI_C: /opt/cray/pe/craype/2.6.5/bin/cc (found version "3.1")
-- Found MPI_CXX: /opt/cray/pe/craype/2.6.5/bin/CC (found version "3.1")
-- Found MPI: TRUE (found version "3.1")
-- Found VeloC: /home/nphtan/VELOC/build/install
-- Found HDF5: /opt/cray/pe/hdf5-parallel/1.10.6.1/INTEL/19.1
-- Boost version: 1.64.0
-- Found the following Boost libraries:
-- filesystem
-- system
-- cxxopts version 2.2.0
-- Found PythonInterp: /soft/datascience/conda/miniconda3/latest/bin/python (found version "3.7.6")
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - found
-- Found Threads: TRUE
-- Configuring done
-- Generating done
-- Build files have been written to: /home/nphtan/kokkos-resilience/build
(miniconda-3/latest/base) nphtan@thetalogin5:~/kokkos-resilience/build> make
Scanning dependencies of target resilience
[ 1%] Building CXX object CMakeFiles/resilience.dir/src/resilience/Resilience.cpp.o
[ 3%] Building CXX object CMakeFiles/resilience.dir/src/resilience/AutomaticCheckpoint.cpp.o
[ 5%] Building CXX object CMakeFiles/resilience.dir/src/resilience/Context.cpp.o
[ 7%] Building CXX object CMakeFiles/resilience.dir/src/resilience/Config.cpp.o
[ 9%] Building CXX object CMakeFiles/resilience.dir/src/resilience/Cref.cpp.o
[ 10%] Building CXX object CMakeFiles/resilience.dir/src/resilience/ResilientRef.cpp.o
[ 12%] Building CXX object CMakeFiles/resilience.dir/src/resilience/MPIContext.cpp.o
[ 14%] Building CXX object CMakeFiles/resilience.dir/src/resilience/filesystem/ExternalIOInterface.cpp.o
[ 16%] Building CXX object CMakeFiles/resilience.dir/src/resilience/filesystem/Filesystem.cpp.o
[ 18%] Building CXX object CMakeFiles/resilience.dir/src/resilience/stdio/StdFileSpace.cpp.o
[ 20%] Building CXX object CMakeFiles/resilience.dir/src/resilience/veloc/VelocBackend.cpp.o
[ 21%] Building CXX object CMakeFiles/resilience.dir/src/resilience/StdFileContext.cpp.o
[ 23%] Building CXX object CMakeFiles/resilience.dir/src/resilience/stdfile/StdFileBackend.cpp.o
[ 25%] Building CXX object CMakeFiles/resilience.dir/src/resilience/hdf5/HDF5Space.cpp.o
[ 27%] Linking CXX static library libresilience.a
[ 27%] Built target resilience
Scanning dependencies of target example
[ 29%] Building CXX object _deps/cxxopts-build/src/CMakeFiles/example.dir/example.cpp.o
[ 30%] Linking CXX executable example
[ 30%] Built target example
Scanning dependencies of target link_test
[ 32%] Building CXX object _deps/cxxopts-build/test/CMakeFiles/link_test.dir/link_a.cpp.o
[ 34%] Building CXX object _deps/cxxopts-build/test/CMakeFiles/link_test.dir/link_b.cpp.o
[ 36%] Linking CXX executable link_test
[ 36%] Built target link_test
Scanning dependencies of target options_test
[ 38%] Building CXX object _deps/cxxopts-build/test/CMakeFiles/options_test.dir/main.cpp.o
[ 40%] Building CXX object _deps/cxxopts-build/test/CMakeFiles/options_test.dir/options.cpp.o
[ 41%] Linking CXX executable options_test
[ 41%] Built target options_test
Scanning dependencies of target gtest
[ 43%] Building CXX object _deps/googletest-build/googletest/CMakeFiles/gtest.dir/src/gtest-all.cc.o
[ 45%] Linking CXX static library ../../../lib/libgtest.a
[ 45%] Built target gtest
Scanning dependencies of target resilience_tests
[ 47%] Building CXX object tests/CMakeFiles/resilience_tests.dir/TestMain.cpp.o
[ 49%] Building CXX object tests/CMakeFiles/resilience_tests.dir/TestResilience.cpp.o
[ 50%] Building CXX object tests/CMakeFiles/resilience_tests.dir/TestLambdaCapture.cpp.o
[ 52%] Building CXX object tests/CMakeFiles/resilience_tests.dir/TestVelocMemoryBackend.cpp.o
[ 54%] Building CXX object tests/CMakeFiles/resilience_tests.dir/TestStdFileBackend.cpp.o
[ 56%] Building CXX object tests/CMakeFiles/resilience_tests.dir/TestViewCheckpoint.cpp.o
[ 58%] Building CXX object tests/CMakeFiles/resilience_tests.dir/TestHDF5Configuration.cpp.o
[ 60%] Linking CXX executable resilience_tests
CMake Error at /lus/theta-fs0/software/datascience/conda/miniconda3/latest/share/cmake-3.14/Modules/GoogleTestAddTests.cmake:40 (message):
Error running test executable.

Path: '/home/nphtan/kokkos-resilience/build/tests/resilience_tests'
Result: Illegal instruction
Output:

make[2]: *** [tests/CMakeFiles/resilience_tests.dir/build.make:185: tests/resilience_tests] Error 1
make[2]: *** Deleting file 'tests/resilience_tests'
make[1]: *** [CMakeFiles/Makefile2:475: tests/CMakeFiles/resilience_tests.dir/all] Error 2
make: *** [Makefile:141: all] Error 2

(miniconda-3/latest/base) nphtan@thetalogin5:~/kokkos-resilience/build> make VERBOSE=1
/lus/theta-fs0/software/datascience/conda/miniconda3/latest/bin/cmake -S/home/nphtan/kokkos-resilience -B/home/nphtan/kokkos-resilience/build --check-build-system CMakeFiles/Makefile.cmake 0
/lus/theta-fs0/software/datascience/conda/miniconda3/latest/bin/cmake -E cmake_progress_start /home/nphtan/kokkos-resilience/build/CMakeFiles /home/nphtan/kokkos-resilience/build/CMakeFiles/progress.marks
make -f CMakeFiles/Makefile2 all
make[1]: Entering directory '/gpfs/mira-home/nphtan/kokkos-resilience/build'
make -f CMakeFiles/resilience.dir/build.make CMakeFiles/resilience.dir/depend
make[2]: Entering directory '/gpfs/mira-home/nphtan/kokkos-resilience/build'
cd /home/nphtan/kokkos-resilience/build && /lus/theta-fs0/software/datascience/conda/miniconda3/latest/bin/cmake -E cmake_depends "Unix Makefiles" /home/nphtan/kokkos-resilience /home/nphtan/kokkos-resilience /home/nphtan/kokkos-resilience/build /home/nphtan/kokkos-resilience/build /home/nphtan/kokkos-resilience/build/CMakeFiles/resilience.dir/DependInfo.cmake --color=
make[2]: Leaving directory '/gpfs/mira-home/nphtan/kokkos-resilience/build'
make -f CMakeFiles/resilience.dir/build.make CMakeFiles/resilience.dir/build
make[2]: Entering directory '/gpfs/mira-home/nphtan/kokkos-resilience/build'
make[2]: Nothing to be done for 'CMakeFiles/resilience.dir/build'.
make[2]: Leaving directory '/gpfs/mira-home/nphtan/kokkos-resilience/build'
[ 27%] Built target resilience
make -f _deps/cxxopts-build/src/CMakeFiles/example.dir/build.make _deps/cxxopts-build/src/CMakeFiles/example.dir/depend
make[2]: Entering directory '/gpfs/mira-home/nphtan/kokkos-resilience/build'
cd /home/nphtan/kokkos-resilience/build && /lus/theta-fs0/software/datascience/conda/miniconda3/latest/bin/cmake -E cmake_depends "Unix Makefiles" /home/nphtan/kokkos-resilience /home/nphtan/kokkos-resilience/build/_deps/cxxopts-src/src /home/nphtan/kokkos-resilience/build /home/nphtan/kokkos-resilience/build/_deps/cxxopts-build/src /home/nphtan/kokkos-resilience/build/_deps/cxxopts-build/src/CMakeFiles/example.dir/DependInfo.cmake --color=
make[2]: Leaving directory '/gpfs/mira-home/nphtan/kokkos-resilience/build'
make -f _deps/cxxopts-build/src/CMakeFiles/example.dir/build.make _deps/cxxopts-build/src/CMakeFiles/example.dir/build
make[2]: Entering directory '/gpfs/mira-home/nphtan/kokkos-resilience/build'
make[2]: Nothing to be done for '_deps/cxxopts-build/src/CMakeFiles/example.dir/build'.
make[2]: Leaving directory '/gpfs/mira-home/nphtan/kokkos-resilience/build'
[ 30%] Built target example
make -f _deps/cxxopts-build/test/CMakeFiles/link_test.dir/build.make _deps/cxxopts-build/test/CMakeFiles/link_test.dir/depend
make[2]: Entering directory '/gpfs/mira-home/nphtan/kokkos-resilience/build'
cd /home/nphtan/kokkos-resilience/build && /lus/theta-fs0/software/datascience/conda/miniconda3/latest/bin/cmake -E cmake_depends "Unix Makefiles" /home/nphtan/kokkos-resilience /home/nphtan/kokkos-resilience/build/_deps/cxxopts-src/test /home/nphtan/kokkos-resilience/build /home/nphtan/kokkos-resilience/build/_deps/cxxopts-build/test /home/nphtan/kokkos-resilience/build/_deps/cxxopts-build/test/CMakeFiles/link_test.dir/DependInfo.cmake --color=
make[2]: Leaving directory '/gpfs/mira-home/nphtan/kokkos-resilience/build'
make -f _deps/cxxopts-build/test/CMakeFiles/link_test.dir/build.make _deps/cxxopts-build/test/CMakeFiles/link_test.dir/build
make[2]: Entering directory '/gpfs/mira-home/nphtan/kokkos-resilience/build'
make[2]: Nothing to be done for '_deps/cxxopts-build/test/CMakeFiles/link_test.dir/build'.
make[2]: Leaving directory '/gpfs/mira-home/nphtan/kokkos-resilience/build'
[ 36%] Built target link_test
make -f _deps/cxxopts-build/test/CMakeFiles/options_test.dir/build.make _deps/cxxopts-build/test/CMakeFiles/options_test.dir/depend
make[2]: Entering directory '/gpfs/mira-home/nphtan/kokkos-resilience/build'
cd /home/nphtan/kokkos-resilience/build && /lus/theta-fs0/software/datascience/conda/miniconda3/latest/bin/cmake -E cmake_depends "Unix Makefiles" /home/nphtan/kokkos-resilience /home/nphtan/kokkos-resilience/build/_deps/cxxopts-src/test /home/nphtan/kokkos-resilience/build /home/nphtan/kokkos-resilience/build/_deps/cxxopts-build/test /home/nphtan/kokkos-resilience/build/_deps/cxxopts-build/test/CMakeFiles/options_test.dir/DependInfo.cmake --color=
make[2]: Leaving directory '/gpfs/mira-home/nphtan/kokkos-resilience/build'
make -f _deps/cxxopts-build/test/CMakeFiles/options_test.dir/build.make _deps/cxxopts-build/test/CMakeFiles/options_test.dir/build
make[2]: Entering directory '/gpfs/mira-home/nphtan/kokkos-resilience/build'
make[2]: Nothing to be done for '_deps/cxxopts-build/test/CMakeFiles/options_test.dir/build'.
make[2]: Leaving directory '/gpfs/mira-home/nphtan/kokkos-resilience/build'
[ 41%] Built target options_test
make -f _deps/googletest-build/googletest/CMakeFiles/gtest.dir/build.make _deps/googletest-build/googletest/CMakeFiles/gtest.dir/depend
make[2]: Entering directory '/gpfs/mira-home/nphtan/kokkos-resilience/build'
cd /home/nphtan/kokkos-resilience/build && /lus/theta-fs0/software/datascience/conda/miniconda3/latest/bin/cmake -E cmake_depends "Unix Makefiles" /home/nphtan/kokkos-resilience /home/nphtan/kokkos-resilience/build/_deps/googletest-src/googletest /home/nphtan/kokkos-resilience/build /home/nphtan/kokkos-resilience/build/_deps/googletest-build/googletest /home/nphtan/kokkos-resilience/build/_deps/googletest-build/googletest/CMakeFiles/gtest.dir/DependInfo.cmake --color=
make[2]: Leaving directory '/gpfs/mira-home/nphtan/kokkos-resilience/build'
make -f _deps/googletest-build/googletest/CMakeFiles/gtest.dir/build.make _deps/googletest-build/googletest/CMakeFiles/gtest.dir/build
make[2]: Entering directory '/gpfs/mira-home/nphtan/kokkos-resilience/build'
make[2]: Nothing to be done for '_deps/googletest-build/googletest/CMakeFiles/gtest.dir/build'.
make[2]: Leaving directory '/gpfs/mira-home/nphtan/kokkos-resilience/build'
[ 45%] Built target gtest
make -f tests/CMakeFiles/resilience_tests.dir/build.make tests/CMakeFiles/resilience_tests.dir/depend
make[2]: Entering directory '/gpfs/mira-home/nphtan/kokkos-resilience/build'
cd /home/nphtan/kokkos-resilience/build && /lus/theta-fs0/software/datascience/conda/miniconda3/latest/bin/cmake -E cmake_depends "Unix Makefiles" /home/nphtan/kokkos-resilience /home/nphtan/kokkos-resilience/tests /home/nphtan/kokkos-resilience/build /home/nphtan/kokkos-resilience/build/tests /home/nphtan/kokkos-resilience/build/tests/CMakeFiles/resilience_tests.dir/DependInfo.cmake --color=
make[2]: Leaving directory '/gpfs/mira-home/nphtan/kokkos-resilience/build'
make -f tests/CMakeFiles/resilience_tests.dir/build.make tests/CMakeFiles/resilience_tests.dir/build
make[2]: Entering directory '/gpfs/mira-home/nphtan/kokkos-resilience/build'
[ 47%] Linking CXX executable resilience_tests
cd /home/nphtan/kokkos-resilience/build/tests && /lus/theta-fs0/software/datascience/conda/miniconda3/latest/bin/cmake -E cmake_link_script CMakeFiles/resilience_tests.dir/link.txt --verbose=1
/opt/cray/pe/craype/2.6.5/bin/CC -dynamic -O2 -g -DNDEBUG -fopenmp -xMIC-AVX512 CMakeFiles/resilience_tests.dir/TestMain.cpp.o CMakeFiles/resilience_tests.dir/TestResilience.cpp.o CMakeFiles/resilience_tests.dir/TestLambdaCapture.cpp.o CMakeFiles/resilience_tests.dir/TestVelocMemoryBackend.cpp.o CMakeFiles/resilience_tests.dir/TestStdFileBackend.cpp.o CMakeFiles/resilience_tests.dir/TestViewCheckpoint.cpp.o CMakeFiles/resilience_tests.dir/TestHDF5Configuration.cpp.o -o resilience_tests -Wl,-rpath,/home/nphtan/VELOC/build/install/lib ../lib/libgtest.a ../libresilience.a /home/nphtan/kokkos/build/install/lib64/libkokkoscontainers.a /home/nphtan/kokkos/build/install/lib64/libkokkoscore.a /usr/lib64/libdl.so /home/nphtan/VELOC/build/install/lib/libveloc-client.so /home/nphtan/VELOC/build/install/lib/libveloc-modules.so /opt/cray/pe/hdf5-parallel/1.10.6.1/INTEL/19.1/lib/libhdf5.so /soft/libraries/boost/1.64.0/intel/lib/libboost_filesystem-mt.so /soft/libraries/boost/1.64.0/intel/lib/libboost_system-mt.so
cd /home/nphtan/kokkos-resilience/build/tests && /lus/theta-fs0/software/datascience/conda/miniconda3/latest/bin/cmake -D TEST_TARGET=resilience_tests -D TEST_EXECUTABLE=/home/nphtan/kokkos-resilience/build/tests/resilience_tests -D TEST_EXECUTOR= -D TEST_WORKING_DIR=/home/nphtan/kokkos-resilience/build/tests -D TEST_EXTRA_ARGS= -D TEST_PROPERTIES= -D TEST_PREFIX= -D TEST_SUFFIX= -D NO_PRETTY_TYPES=FALSE -D NO_PRETTY_VALUES=FALSE -D TEST_LIST=resilience_tests_TESTS -D CTEST_FILE=/home/nphtan/kokkos-resilience/build/tests/resilience_tests[1]_tests.cmake -D TEST_DISCOVERY_TIMEOUT=5 -P /lus/theta-fs0/software/datascience/conda/miniconda3/latest/share/cmake-3.14/Modules/GoogleTestAddTests.cmake
CMake Error at /lus/theta-fs0/software/datascience/conda/miniconda3/latest/share/cmake-3.14/Modules/GoogleTestAddTests.cmake:40 (message):
Error running test executable.

Path: '/home/nphtan/kokkos-resilience/build/tests/resilience_tests'
Result: Illegal instruction
Output:

make[2]: *** [tests/CMakeFiles/resilience_tests.dir/build.make:185: tests/resilience_tests] Error 1
make[2]: *** Deleting file 'tests/resilience_tests'
make[2]: Leaving directory '/gpfs/mira-home/nphtan/kokkos-resilience/build'
make[1]: *** [CMakeFiles/Makefile2:475: tests/CMakeFiles/resilience_tests.dir/all] Error 2
make[1]: Leaving directory '/gpfs/mira-home/nphtan/kokkos-resilience/build'
make: *** [Makefile:141: all] Error 2

Workflows break because Spack is too old

We try to read from binary caches, but the caches now require a newer version of Spack. We should consider building a Docker image separately so we don't run into these problems in the future.

Performance issues with the ResOpenMP execution space in the resilient_execution_space branch

Brief Summary

ResOpenMP in the resilient execution space is extremely slow. The parent issue is #14 (comment).

How to Reproduce

Go to a cluster environment.
Set up modules.

module load gcc/10.2.0
module load cmake/3.19.1 
module load gcc10-support
module load boost/1.73.0

Then build Nic Morales' Kokkos fork (adjust Kokkos_ARCH_XXX to your platform's node architecture)

git clone git@github.com:nmm0/kokkos.git
cd kokkos
git checkout accessor-hooks
mkdir BUILD 
cd BUILD
cmake -DCMAKE_INSTALL_PREFIX=(my home dir)/kokkos_viewhooks  -DKokkos_ENABLE_OPENMP=ON -DKokkos_ARCH_HSW=ON ../.
make -j 8 install

Check out and build the resilience extension with the execution space (my version has a modified CMakeLists.txt). I am using find_package(), so put only one Kokkos installation on CMAKE_PREFIX_PATH.

git clone git@github.com:keitat/kokkos-resilience.git
cd kokkos-resilience
git checkout resilient_execution_space
mkdir BUILD
cd BUILD
cmake -DCMAKE_BUILD_TYPE=Release  -DCMAKE_INSTALL_PREFIX=(my home dir)/resilienceKT  -DCMAKE_PREFIX_PATH=(my kokkos installation) ../.
make -j 8 install

Check out the heatdis code and edit CMAKE_PREFIX_PATH to point to the Kokkos and kokkos-resilience installations (you can edit it in CMakeLists.txt)

git clone git@github.com:nmm0/veloc-heat-test.git
cd veloc-heat-test
git checkout no-mdrange
(Edit CMAKE_PREFIX_PATH in CMakeLists.txt)
mkdir BUILD
cd BUILD
cmake ../.
make

Run heatdis and heatdis_resil. The latter is extremely slow.

Diagnosis

heatdis_resil is much slower than heatdis (about 5x for small problems and 1000x for 16 MB problems). I suspect the cause is the way the functor is scheduled in the execute and exec_work methods in src/resilience/openMPI/OpenMPResParallel.hpp.
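
To put a number on the slowdown for a given problem size, something like the following shell helpers could be used. This is only a sketch: `time_cmd` and `slowdown` are hypothetical names that are not part of kokkos-resilience or veloc-heat-test, and it assumes GNU `date` (for `%N`) and `awk` are available on the cluster. Any arguments the two binaries need are left out.

```shell
# Hypothetical helpers (not part of kokkos-resilience or veloc-heat-test).
# Assumes GNU date (for %N nanoseconds) and awk are available.

# Print the wall-clock seconds a command takes; the command's output is discarded.
time_cmd() {
  local start end
  start=$(date +%s.%N)
  "$@" > /dev/null 2>&1
  end=$(date +%s.%N)
  awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f\n", e - s }'
}

# Print how many times slower the second timing is than the first.
slowdown() {
  awk -v base="$1" -v resil="$2" 'BEGIN {
    if (base <= 0) { print "n/a"; exit }
    printf "%.1f\n", resil / base
  }'
}

# Usage (from the heatdis BUILD directory; program arguments omitted):
#   base=$(time_cmd ./heatdis)
#   resil=$(time_cmd ./heatdis_resil)
#   echo "slowdown: $(slowdown "$base" "$resil")x"
```

Wall-clock timing of the whole run includes startup and initialization, so for very small problems the ratio may understate the per-kernel overhead.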

Examples outdated

Our examples haven't been updated since the subscribable hooks were merged into Kokkos core.
