kokkos / kokkos-remote-spaces

This repository contains Kokkos Remote Spaces, which implements distributed shared memory support for Kokkos.

License: Other

Languages: CMake 1.63%, C++ 96.76%, Groovy 0.23%, Shell 1.38%
Topics: cuda, distributed-computing, gpu, high-performance-computing, hpc, mpi, parallel-computing, pgas

kokkos-remote-spaces's Introduction

Kokkos: Core Libraries

Kokkos Core implements a programming model in C++ for writing performance portable applications targeting all major HPC platforms. For that purpose, it provides abstractions for both parallel execution of code and data management. Kokkos is designed to target complex node architectures with N-level memory hierarchies and multiple types of execution resources. It can currently use CUDA, HIP, SYCL, HPX, OpenMP, and C++ threads as backend programming models, with several other backends in development.
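
For illustration, a minimal Kokkos program (not part of this README) showing both abstractions; the backend used is whatever Kokkos was configured with:

#include <Kokkos_Core.hpp>

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    Kokkos::View<double*> x("x", 100);   // data management: a 1D array of doubles
    Kokkos::parallel_for(                // parallel execution: one iteration per element
        "fill", x.extent(0), KOKKOS_LAMBDA(const int i) { x(i) = 2.0 * i; });
    Kokkos::fence();                     // wait for the kernel to complete
  }
  Kokkos::finalize();
  return 0;
}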

Kokkos Core is part of the Kokkos C++ Performance Portability Programming Ecosystem.

Kokkos is a Linux Foundation project.

Learning about Kokkos

To start learning about Kokkos, see the documentation and tutorials at kokkos.org/kokkos-core-wiki/.

Obtaining Kokkos

The latest release of Kokkos can be obtained from the GitHub releases page.

The current release is 4.3.00.

curl -OJ -L https://github.com/kokkos/kokkos/archive/refs/tags/4.3.00.tar.gz
# Or with wget
wget https://github.com/kokkos/kokkos/archive/refs/tags/4.3.00.tar.gz

To clone the latest development version of Kokkos from GitHub:

git clone -b develop https://github.com/kokkos/kokkos.git

Building Kokkos

To build Kokkos, you will need to have a C++ compiler that supports C++17 or later. All requirements including minimum and primary tested compiler versions can be found here.

Building and installation instructions are described here.

You can also install Kokkos using Spack: spack install kokkos. Available configuration options can be displayed using spack info kokkos.

For the complete documentation: kokkos.org/kokkos-core-wiki/

Support

For questions find us on Slack: https://kokkosteam.slack.com or open a GitHub issue.

For non-public questions send an email to: crtrott(at)sandia.gov

Contributing

Please see this page for details on how to contribute.

Citing Kokkos

Please see the following page.

License

Under the terms of Contract DE-NA0003525 with NTESS, the U.S. Government retains certain rights in this software.

The full license statement used in all headers is available here or here.

kokkos-remote-spaces's People

Contributors

akhillanger, bkp, brian-kelley, crtrott, davidozog, dsmishler, huanghua1994, janciesko, jeffhammond, jjwilke, lucbv, rombur, vmiheer

kokkos-remote-spaces's Issues

Add support for RemoteView MemoryTraits

Support RemoteView template specialization for the following MemoryTraits:

  • Unmanaged
  • Atomic
  • RandomAccess
  • Restrict

Use-case:

Kokkos::View<int **, RemoteSpace_t, Kokkos::MemoryTraits<Kokkos::Atomic>> v("Count", num_ranks, 2); // dim0 = PE
  Kokkos::parallel_for("AtomIncr", v.extent(0) * v.extent(1), KOKKOS_LAMBDA(const int i) {
      int rank = i / v.extent(1);
      int idx  = i % v.extent(1);
      v(rank, idx) += 1; // No data race: the remote view uses the Atomic trait specialization
  });

No fence in View initialization

It seems that if a View in MPISpace or SHMEMSpace is initialized to zeros, a fence on the space is needed before it is safe to use that view (for reads or writes); otherwise, another rank might still be working on the initialization of its local part. I think this fence should probably be added by default to match Kokkos Core behavior. I know Kokkos Core has async allocations and initializations too, but if you also execute kernels on the same stream, then things like this work as expected:

View<...> view("myView", ...);
Kokkos::parallel_for(N, KOKKOS_LAMBDA(int i) {if(i % 2) view(i) = 1;});

The reproducer below runs an infinite loop to catch this, since the failure is somewhat random. If the issue is observed, it calls MPI_Abort and stops. If everything is behaving well, it stays in the infinite loop forever.

#include <Kokkos_RemoteSpaces.hpp>
#include <mpi.h>
#include <cassert>
#include <iostream>

using ExecSpace   = Kokkos::DefaultExecutionSpace;
using Ordinal     = int64_t;
using RemoteSpace = Kokkos::Experimental::DefaultRemoteMemorySpace;
using RangePol = Kokkos::RangePolicy<ExecSpace>;

int getMyRank() {
  int r;
  MPI_Comm_rank(MPI_COMM_WORLD, &r);
  return r;
}

int getNumRanks() {
  int n;
  MPI_Comm_size(MPI_COMM_WORLD, &n);
  return n;
}

void initialize(int argc, char** argv) {
  int mpi_thread_level_available;
  int mpi_thread_level_required = MPI_THREAD_MULTIPLE;
#ifdef KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_SERIAL
  mpi_thread_level_required = MPI_THREAD_SINGLE;
#endif
  MPI_Init_thread(&argc, &argv, mpi_thread_level_required,
                  &mpi_thread_level_available);
  assert(mpi_thread_level_available >= mpi_thread_level_required);
#ifdef KRS_ENABLE_SHMEMSPACE
  shmem_init_thread(mpi_thread_level_required, &mpi_thread_level_available);
  assert(mpi_thread_level_available >= mpi_thread_level_required);
#endif
#ifdef KRS_ENABLE_NVSHMEMSPACE
  MPI_Comm mpi_comm;
  nvshmemx_init_attr_t attr;
  mpi_comm      = MPI_COMM_WORLD;
  attr.mpi_comm = &mpi_comm;
  nvshmemx_init_attr(NVSHMEMX_INIT_WITH_MPI_COMM, &attr);
#endif
  Kokkos::initialize(argc, argv);
}

void finalize() {
  Kokkos::finalize();
#if defined(KRS_ENABLE_SHMEMSPACE)
  shmem_finalize();
#elif defined(KRS_ENABLE_NVSHMEMSPACE)
  nvshmem_finalize();
  MPI_Finalize();
#else
  MPI_Finalize();
#endif
}

int main(int argc, char** argv) {
  initialize(argc, argv);
  {
    int myRank = getMyRank();
    int numRanks = getNumRanks();
    for(;;)
    {
//    Replacing "myVals" with Kokkos::ViewAllocateWithoutInitializing("myVals") fixes it
      Kokkos::View<Ordinal*, RemoteSpace> myVals("myVals", numRanks + 1);
//    and so does uncommenting this fence:
//    RemoteSpace().fence();
      myVals(myRank) = myRank;
      RemoteSpace().fence();
      if(myRank == 0)
      {
        if(myVals(2) != Ordinal(2))
        {
          std::cout << "Rank 0 failed to observe write by rank 2!\n";
          MPI_Abort(MPI_COMM_WORLD, 1);
        }
      }
    }
  }
  finalize();
  return 0;
}

Build cannot find KokkosTargets.cmake file

I'm having trouble building kokkos-remote-spaces due to a missing cmake file in kokkos:

CMake Error at /home/dmozog/Repos/kokkos/build/KokkosConfig.cmake:48 (INCLUDE):
  INCLUDE could not find load file:

    /home/dmozog/Repos/kokkos/build/KokkosTargets.cmake

Call Stack (most recent call first):
  CMakeLists.txt:12 (find_package)

However, I see that the KokkosTargets.cmake file is located in a different directory (CMakeFiles/Export/lib64/cmake/Kokkos/KokkosTargets.cmake), so if I create a symlink to the file from the kokkos build directory, things get further. But something still seems wrong with my build, because the next error is this:

  The imported target "Kokkos::kokkoscore" references the file
     "/home/dmozog/lib64/libkokkoscore.a"
 but this file does not exist.

Here is the cmake command I'm running (on SUSE Linux Enterprise Server 15 SP3, cmake version 3.17.0, oneAPI C++ 2023.1.0):

export KOKKOS_BUILD_DIR=$HOME/Repos/kokkos/build
export SHMEM_ROOT=$HOME/usr/local/SOS/
cmake .. -DKRS_ENABLE_SHMEMSPACE=ON -DKokkos_DIR=${KOKKOS_BUILD_DIR} -DSHMEM_ROOT=${SHMEM_ROOT} -DKRS_ENABLE_TESTS=ON -DCMAKE_CXX_COMPILER=oshc++

And here is how I've configured Kokkos:

cmake .. -DCMAKE_CXX_COMPILER=icpx -DKokkos_ENABLE_SYCL=ON -DKokkos_ENABLE_HWLOC=ON

Should KOKKOS_BUILD_DIR point to the kokkos install directory? I have different (but similar) errors in that case.
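
For what it's worth, the exported KokkosTargets.cmake normally lands in the install tree rather than the build tree, so a configuration sketch that typically avoids the missing-include error (the install prefix and the lib64 subpath are placeholders, not taken from this report):

# Install Kokkos first, then point Kokkos_DIR at the installed package config.
cmake -S kokkos -B kokkos/build -DCMAKE_CXX_COMPILER=icpx -DKokkos_ENABLE_SYCL=ON -DCMAKE_INSTALL_PREFIX=$HOME/kokkos-install
cmake --build kokkos/build --target install
cmake .. -DKRS_ENABLE_SHMEMSPACE=ON -DSHMEM_ROOT=${SHMEM_ROOT} -DKRS_ENABLE_TESTS=ON -DCMAKE_CXX_COMPILER=oshc++ -DKokkos_DIR=$HOME/kokkos-install/lib64/cmake/Kokkos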

CG-Solve Iteration

[1,0]<stdout>:|   |-> 4.43e-01 sec 28.4% 29.0% 0.0% 200 CGSolve::Iteration [region]
[1,0]<stdout>:|   |   |-> 3.40e-01 sec 21.8% 27.5% 6.3% 200 Tpetra::CrsMatrix::apply [region]
[1,0]<stdout>:|   |   |   |-> 2.49e-01 sec 16.0% 3.0% 8.5% 200 Tpetra::CrsMatrix::apply: Import [region]
[1,0]<stdout>:|   |   |   |   |-> 2.48e-01 sec 15.9% 3.0% 8.6% 200 Tpetra::DistObject::doTransfer [region]
[1,0]<stdout>:|   |   |   |       |-> 2.46e-01 sec 15.8% 3.1% 8.6% 200 Tpetra::DistObject::doTransferNew [region]
[1,0]<stdout>:|   |   |   |           |-> 2.18e-01 sec 14.0% 1.7% 9.6% 200 Tpetra::DistObject::doTransferNew::doPostsAndWaits [region]
[1,0]<stdout>:|   |   |   |           |   |-> 7.47e-03 sec 0.5% 50.6% 1.5% 200 Tpetra::DistObject::doTransferNew::unpackAndCombine [region]
[1,0]<stdout>:|   |   |   |           |       |-> 5.72e-03 sec 0.4% 66.1% 1.9% 200 Tpetra::MultiVector::unpackAndCombine [region]
[1,0]<stdout>:|   |   |   |           |           |-> 3.78e-03 sec 0.2% 100.0% 2.8% 200 Tpetra::MultiVector unpack const stride atomic [for]
[1,0]<stdout>:|   |   |   |           |-> 1.08e-02 sec 0.7% 0.0% 11.3% 200 Tpetra::DistObject::doTransferNew::copyAndPermute [region]
[1,0]<stdout>:|   |   |   |           |   |-> 9.05e-03 sec 0.6% 0.0% 13.5% 200 Tpetra::MultiVector::copyAndPermute [region]
[1,0]<stdout>:|   |   |   |           |-> 9.27e-03 sec 0.6% 40.3% 1.4% 200 Tpetra::DistObject::doTransferNew::packAndPrepare [region]
[1,0]<stdout>:|   |   |   |           |   |-> 7.50e-03 sec 0.5% 49.9% 1.5% 200 Tpetra::MultiVector::packAndPrepare [region]
[1,0]<stdout>:|   |   |   |           |       |-> 3.74e-03 sec 0.2% 100.0% 1.9% 200 Tpetra::MultiVector pack one col [for]
[1,0]<stdout>:|   |   |   |           |       |-> 8.34e-04 sec 0.1% 0.0% 1.5% 200 Tpetra::Details::reallocDualViewIfNeeded [region]
[1,0]<stdout>:|   |   |   |           |-> 8.97e-04 sec 0.1% 0.0% 3.7% 200 Tpetra::DistObject::doTransferNew::checkSizes [region]
[1,0]<stdout>:|   |   |   |           |-> 8.54e-04 sec 0.1% 0.0% 2.2% 200 Tpetra::Details::reallocDualViewIfNeeded [region]
[1,0]<stdout>:|   |   |   |-> 8.78e-02 sec 5.6% 97.8% 1.5% 200 Tpetra::CrsMatrix::localApply [region]
[1,0]<stdout>:|   |   |       |-> 8.59e-02 sec 5.5% 100.0% 1.5% 200 KokkosSparse::spmv<NoTranspose,Dynamic> [for]
[1,0]<stdout>:|   |   |-> 6.45e-02 sec 4.1% 21.2% 245.8% 399 Tpetra::MV::dot (Teuchos::ArrayView) [region]
[1,0]<stdout>:|   |   |   |-> 6.11e-02 sec 3.9% 22.4% 259.6% 399 Tpetra::multiVectorSingleColumnDot [region]
[1,0]<stdout>:|   |   |       |-> 1.71e-02 sec 1.1% 80.0% 6.5% 399 KokkosBlas::dot[ETI] [region]
[1,0]<stdout>:|   |   |           |-> 1.37e-02 sec 0.9% 100.0% 7.8% 399 KokkosBlas::dot<1D> [reduce]
[1,0]<stdout>:|   |   |-> 3.23e-02 sec 2.1% 66.6% 0.7% 600 Tpetra::MV::update(alpha,A,beta) [region]
[1,0]<stdout>:|   |       |-> 2.69e-02 sec 1.7% 80.0% 0.5% 600 KokkosBlas::axpby[ETI] [region]
[1,0]<stdout>:|   |           |-> 2.15e-02 sec 1.4% 100.0% 0.5% 599 KokkosBlas::Axpby::S15 [for]
[1,0]<stdout>:|   |           |-> 4.60e-05 sec 0.0% 100.0% 8.9% 1 KokkosBlas::Axpby::S12 [for]

Compiler warnings

/kokkos-remote-spaces/examples/randomaccess/randomaccess.cpp:155:37: warning: absolute value function 'abs' given an argument of type 'int64_t' (aka 'long') but has parameter of type 'int' which may cause truncation of value [-Wabsolute-value]
                  ORDINAL_T index = abs(get(i, variance, g));
                                    ^
kokkos-remote-spaces/examples/randomaccess/randomaccess.cpp:155:37: note: use function 'std::abs' instead
                  ORDINAL_T index = abs(get(i, variance, g));
                                    ^~~
                                    std::abs
/kokkos-remote-spaces/examples/randomaccess/randomaccess.cpp:179:32: warning: format specifies type 'long long' but the argument has type 'int64_t' (aka 'long') [-Wformat]
           team_size, vec_len, num_elems, access_latency, time, GBs);
                               ^~~~~~~~~
/kokkos-remote-spaces/examples/cgsolve/parallel/cgsolve.cpp:308:71: warning: data argument not used by format string [-Wformat-extra-args]
      printf("%i, %i, %0.1f, %lf\n", N, num_iters, total_flops, time, GFlops,
             ~~~~~~~~~~~~~~~~~~~~~~                                   ^

Add support for local_deep_copy

Add support for local_deep_copy covering both remote-to-local and local-to-remote copies.
Use-cases:

Kokkos::Experimental::RemoteSpaces::local_deep_copy(v_R_cpy, v_R_sub); // Get (contiguous)
Kokkos::Experimental::RemoteSpaces::local_deep_copy(v_R_sub, v_R_cpy); // Put (contiguous)

Up for discussion: three possible ways to differentiate between local_deep_copy variants:

  • Kokkos::Experimental::RemoteSpaces::remote_local_deep_copy(View_t, View_t ) //by name
  • Kokkos::local_deep_copy (View_t, View_t); //by template expansion
  • Kokkos::local_deep_copy (RemoteView_t, View_t); //by argument type
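
For context, such a copy is invoked from device code elsewhere in this tracker roughly as follows (a sketch reusing v_R_cpy and v_R_sub from above; the single-team policy is an assumption):

using TeamPolicy_t = Kokkos::TeamPolicy<>;
Kokkos::parallel_for("Copy", TeamPolicy_t(1, 1),
    KOKKOS_LAMBDA(typename TeamPolicy_t::member_type team) {
      // One thread per team performs the block copy.
      Kokkos::single(Kokkos::PerTeam(team), [&]() {
        Kokkos::Experimental::RemoteSpaces::local_deep_copy(v_R_cpy, v_R_sub); // Get
      });
    });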

Correct-behaving programs trigger view bounds check errors

Note: this is a very low priority issue, but I thought I would leave it for the future as KRS works its way out of Experimental.

When building Kokkos with Kokkos_ENABLE_DEBUG_BOUNDS_CHECK=ON, correct-behaving programs trigger view bounds messages and abort (this happens with both MPISpace and SHMEMSpace). For example, cgsolve runs fine without bounds checking on 8 ranks:

Initial Residual = 35.7087
Iteration = 1   Residual = 35.7087
Iteration = 2   Residual = 0.322523
Iteration = 3   Residual = 0.154219
...

but not with it:

terminate called after throwing an instance of 'std::runtime_error'
  what():  View bounds error of view MyView ( 247273 < 128788 )

I'm pretty sure this is because it's comparing a global index against the local extent. One clue is that the error doesn't happen when running with 1 rank.

View construction: build error with MPISpace

With Kokkos 4.1, I am able to build and install kokkos-remote-spaces with MPISpace:

cmake \
  -DKRS_ENABLE_MPISPACE=ON \
  -DKokkos_DIR=.../lib64/cmake/Kokkos \
  -DCMAKE_CXX_COMPILER=mpicxx \
  -DCMAKE_INSTALL_PREFIX=... \
  -DCMAKE_VERBOSE_MAKEFILE=ON \
  .../kokkos-remote-spaces 

but in an application, I get build errors if I try to construct a remote space View by label and extent:

/usr/local/bin/mpicxx -DKOKKOS_DEPENDENCE -DKRS_ENABLE_MPI -DKRS_ENABLE_MPISPACE -isystem /ascldap/users/bmkelle/Questa/MySandbox/scratch/install_mpi/include -isystem /ascldap/users/bmkelle/Questa/MySandbox/scratch/install/include -g -march=core-avx2 -mtune=core-avx2 -std=gnu++17 -MD -MT CMakeFiles/reprod.dir/reprod.cpp.o -MF CMakeFiles/reprod.dir/reprod.cpp.o.d -o CMakeFiles/reprod.dir/reprod.cpp.o -c /ascldap/users/bmkelle/Reprod/reprod.cpp
In file included from /home/bmkelle/Questa/MySandbox/scratch/install_mpi/include/Kokkos_MPISpace.hpp:159,
                 from /home/bmkelle/Questa/MySandbox/scratch/install_mpi/include/Kokkos_RemoteSpaces.hpp:62,
                 from /ascldap/users/bmkelle/Reprod/reprod.cpp:1:
/home/bmkelle/Questa/MySandbox/scratch/install_mpi/include/Kokkos_MPISpace_DataHandle.hpp: In instantiation of ‘Kokkos::Impl::MPIDataHandle<T, Traits>::MPIDataHandle(const SrcTraits&) [with SrcTraits = double*; T = double; Traits = Kokkos::ViewTraits<double*, Kokkos::Experimental::MPISpace, Kokkos::MemoryTraits<1> >]’:
/home/bmkelle/Questa/MySandbox/scratch/install_mpi/include/Kokkos_RemoteSpaces_ViewMapping.hpp:1072:23:   required from ‘Kokkos::Impl::ViewMapping<SrcTraits, Kokkos::Experimental::RemoteSpaceSpecializeTag>::ViewMapping(const Kokkos::Impl::ViewCtorProp<Args ...>&, const typename Traits::array_layout&) [with P = {double*}; Traits = Kokkos::ViewTraits<double*, Kokkos::Experimental::MPISpace, Kokkos::MemoryTraits<1> >; typename Traits::array_layout = Kokkos::LayoutRight]’
/home/bmkelle/Questa/MySandbox/scratch/install/include/Kokkos_View.hpp:1442:35:   required from ‘Kokkos::View<DataType, Properties>::View(const Kokkos::Impl::ViewCtorProp<Args ...>&, std::enable_if_t<Kokkos::Impl::ViewCtorProp<Args ...>::has_pointer, typename Kokkos::ViewTraits<DataType, Properties ...>::array_layout>&) [with P = {double*}; DataType = double*; Properties = {Kokkos::Experimental::MPISpace, Kokkos::MemoryTraits<1>}; std::enable_if_t<Kokkos::Impl::ViewCtorProp<Args ...>::has_pointer, typename Kokkos::ViewTraits<DataType, Properties ...>::array_layout> = Kokkos::LayoutRight]’
/home/bmkelle/Questa/MySandbox/scratch/install/include/Kokkos_View.hpp:1579:75:   required from ‘Kokkos::View<DataType, Properties>::View(Kokkos::View<DataType, Properties>::pointer_type, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t) [with DataType = double*; Properties = {Kokkos::Experimental::MPISpace, Kokkos::MemoryTraits<1>}; Kokkos::View<DataType, Properties>::pointer_type = double*; size_t = long unsigned int]’
/home/bmkelle/Questa/MySandbox/scratch/install/include/impl/Kokkos_ViewMapping.hpp:3064:19:   required from ‘std::enable_if_t<(std::is_trivial<Dummy>::value && std::is_trivially_copy_assignable<Dummy>::value)> Kokkos::Impl::ViewValueFunctor<DeviceType, ValueType, true>::construct_shared_allocation() [with Dummy = double; DeviceType = Kokkos::Device<Kokkos::Serial, Kokkos::Experimental::MPISpace>; ValueType = double; std::enable_if_t<(std::is_trivial<Dummy>::value && std::is_trivially_copy_assignable<Dummy>::value)> = void]’
/home/bmkelle/Questa/MySandbox/scratch/install_mpi/include/Kokkos_RemoteSpaces_ViewMapping.hpp:1214:9:   required from ‘Kokkos::Impl::SharedAllocationRecord<void, void>* Kokkos::Impl::ViewMapping<SrcTraits, Kokkos::Experimental::RemoteSpaceSpecializeTag>::allocate_shared(const Kokkos::Impl::ViewCtorProp<Args ...>&, const typename Traits::array_layout&, bool) [with P = {std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Kokkos::Experimental::MPISpace, Kokkos::Serial}; T = Kokkos::ViewTraits<double*, Kokkos::Experimental::MPISpace>; Traits = Kokkos::ViewTraits<double*, Kokkos::Experimental::MPISpace>; typename Traits::array_layout = Kokkos::LayoutRight]’
/home/bmkelle/Questa/MySandbox/scratch/install/include/Kokkos_View.hpp:1421:45:   required from ‘Kokkos::View<DataType, Properties>::View(const Kokkos::Impl::ViewCtorProp<Args ...>&, std::enable_if_t<(! Kokkos::Impl::ViewCtorProp<Args ...>::has_pointer), typename Kokkos::ViewTraits<DataType, Properties ...>::array_layout>&) [with P = {std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >}; DataType = double*; Properties = {Kokkos::Experimental::MPISpace}; std::enable_if_t<(! Kokkos::Impl::ViewCtorProp<Args ...>::has_pointer), typename Kokkos::ViewTraits<DataType, Properties ...>::array_layout> = Kokkos::LayoutRight]’
/home/bmkelle/Questa/MySandbox/scratch/install/include/Kokkos_View.hpp:1514:75:   required from ‘Kokkos::View<DataType, Properties>::View(const Label&, std::enable_if_t<Kokkos::Impl::is_view_label<Label>::value, const long unsigned int>, size_t, size_t, size_t, size_t, size_t, size_t, size_t) [with Label = char [7]; DataType = double*; Properties = {Kokkos::Experimental::MPISpace}; std::enable_if_t<Kokkos::Impl::is_view_label<Label>::value, const long unsigned int> = const long unsigned int; size_t = long unsigned int]’
/ascldap/users/bmkelle/Reprod/reprod.cpp:31:95:   required from here
/home/bmkelle/Questa/MySandbox/scratch/install_mpi/include/Kokkos_MPISpace_DataHandle.hpp:44:17: error: request for member ‘ptr’ in ‘arg’, which is of non-class type ‘double* const’
   44 |       : ptr(arg.ptr), win(arg.win), win_offset(arg.win_offset) {}
      |             ~~~~^~~
/home/bmkelle/Questa/MySandbox/scratch/install_mpi/include/Kokkos_MPISpace_DataHandle.hpp:44:31: error: request for member ‘win’ in ‘arg’, which is of non-class type ‘double* const’
   44 |       : ptr(arg.ptr), win(arg.win), win_offset(arg.win_offset) {}
      |                           ~~~~^~~
/home/bmkelle/Questa/MySandbox/scratch/install_mpi/include/Kokkos_MPISpace_DataHandle.hpp:44:52: error: request for member ‘win_offset’ in ‘arg’, which is of non-class type ‘double* const’
   44 |       : ptr(arg.ptr), win(arg.win), win_offset(arg.win_offset) {}
      |                                                ~~~~^~~~~~~~~~
make[2]: *** [CMakeFiles/reprod.dir/reprod.cpp.o] Error 1
make[2]: Leaving directory `/home/bmkelle/reprodBuild'
make[1]: *** [CMakeFiles/reprod.dir/all] Error 2
make[1]: Leaving directory `/home/bmkelle/reprodBuild'
make: *** [all] Error 2

Here is a complete program that reproduces the issue with MPISpace enabled, but compiles and runs OK when SHMEMSpace is enabled:

#include <Kokkos_RemoteSpaces.hpp>
#include <mpi.h>
#include <cassert>
int main(int argc, char** argv)
{
  int mpi_thread_level_available;
  int mpi_thread_level_required = MPI_THREAD_MULTIPLE;
#ifdef KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_SERIAL
  mpi_thread_level_required = MPI_THREAD_SINGLE;
#endif
  MPI_Init_thread(&argc, &argv, mpi_thread_level_required,
                  &mpi_thread_level_available);
  assert(mpi_thread_level_available >= mpi_thread_level_required);
#ifdef KRS_ENABLE_SHMEMSPACE
  shmem_init_thread(mpi_thread_level_required, &mpi_thread_level_available);
  assert(mpi_thread_level_available >= mpi_thread_level_required);
#endif
  Kokkos::initialize(argc, argv);
  {
    Kokkos::View<double*, Kokkos::Experimental::DefaultRemoteMemorySpace> myView("myView", 100);
  }
  Kokkos::finalize();
#if defined(KRS_ENABLE_SHMEMSPACE)
  shmem_finalize();
#else
  MPI_Finalize();
#endif
  return 0;
}

Fix assertion `win != (static_cast<MPI_Win> (static_cast<void *> (&(ompi_mpi_win_null))))'

TEST_CATEGORY.test_subview is failing with:

#0  0x000020000060fcb0 in __GI_raise (sig=<optimized out>)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
#1  0x000020000061200c in __GI_abort () at abort.c:90
#2  0x00002000006057d4 in __assert_fail_base (
    fmt=0x20000076b6d0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
    assertion=0x1014fcd8 "win != (static_cast<MPI_Win> (static_cast<void *> (&(ompi_mpi_win_null))))",
    file=0x10150218 "/g/g92/ciesko1/Kokkos/kokkos-remote-spaces-reduction-example/src/MPISPACE/Kokkos_MPISpace_DataHandle.hpp", line=<optimized out>,
    function=<optimized out>) at assert.c:92
#3  0x00002000006058c4 in __GI___assert_fail (
    assertion=0x1014fcd8 "win != (static_cast<MPI_Win> (static_cast<void *> (&(ompi_mpi_win_null))))",
    file=0x10150218 "/g/g92/ciesko1/Kokkos/kokkos-remote-spaces-reduction-example/src/MPISPACE/Kokkos_MPISpace_DataHandle.hpp", line=<optimized out>,
    function=0x10150390 <Kokkos::Impl::MPIDataElement<int, Kokkos::ViewTraits<int*, Kokkos::PartitionedLayoutStride, Kokkos::Experimental::MPISpace, Kokkos::MemoryTraits<1u> >, void> Kokkos::Impl::MPIDataHandle<int, Kokkos::ViewTraits<int*, Kokkos::PartitionedLayoutStride, Kokkos::Experimental::MPISpace, Kokkos::MemoryTraits<1u> > >::operator()<int>(int const&, int const&) const::__PRETTY_FUNCTION__> "Kokkos::Impl::MPIDataElement<T, Traits> Kokkos::Impl::MPIDataHandle<T, Traits>::operator()(const int&, const iType&) const [with iType = int; T = int; Traits = Kokkos::ViewTraits<int*, Kokkos::Partiti"...) at assert.c:101
#4  0x00000000100933ec in Kokkos::Impl::MPIDataElement<int, Kokkos::ViewTraits<int*, Kokkos::PartitionedLayoutStride, Kokkos::Experimental::MPISpace, Kokkos::MemoryTraits<1u> >, void> Kokkos::Impl::MPIDataHandle<int, Kokkos::ViewTraits<int*, Kokkos::PartitionedLayoutStride, Kokkos::Experimental::MPISpace, Kokkos::MemoryTraits<1u> > >::operator()<int>(int const&, int const&) const ()
#5  0x000000001008fae8 in Kokkos::Impl::MPIDataElement<int, Kokkos::ViewTraits<int*, Kokkos::PartitionedLayoutStride, Kokkos::Experimental::MPISpace, Kokkos::MemoryTraits<1u> >, void> const Kokkos::Impl::ViewMapping<Kokkos::ViewTraits<int*, Kokkos::PartitionedLayoutStride, Kokkos::Experimental::MPISpace, Kokkos::MemoryTraits<1u> >, Kokkos::Experimental::RemoteSpaceSpecializeTag>::reference<int, Kokkos::ViewTraits<int*, Kokkos::PartitionedLayoutStride, Kokkos::Experimental::MPISpace, Kokkos::MemoryTraits<1u> > >(int const&, std::enable_if<((std::is_same<Kokkos::ViewTraits<int*, Kokkos::PartitionedLayoutStride, Kokkos::Experimental::MPISpace, Kokkos::MemoryTraits<1u> >::array_layout, Kokkos::PartitionedLayoutLeft>::value||std::is_same<Kokkos::ViewTraits<int*, Kokkos::PartitionedLayoutStride, Kokkos::Experimental::MPISpace, Kokkos::MemoryTraits<1u> >::array_layout, Kokkos::PartitionedLayoutRight>::value)||std::is_same<Kokkos::ViewTraits<int*, Kokkos::PartitionedLayoutStride, Kokkos::Experimental::MPISpace, Kokkos::MemoryTraits<1u> >::array_layout, Kokkos::PartitionedLayoutStride>::value)&&Kokkos::RemoteSpaces_MemoryTraits<Kokkos::ViewTraits<int*, Kokkos::PartitionedLayoutStride, Kokkos::Experimental::MPISpace, Kokkos::MemoryTraits<1u> >::memory_traits>::dim0_is_pe, void>::type*) const ()
#6  0x0000000010084988 in void test_subview1D<int>(int, int, int, int)::{lambda(int)#1}::operator()(int) const ()
#7  0x0000000010098cd0 in std::enable_if<std::is_same<void, void>::value, void>::type Kokkos::Impl::ParallelFor<void test_subview1D<int>(int, int, int, int)::{lambda(int)#1}, Kokkos::RangePolicy<Kokkos::Serial>, Kokkos::RangePolicy>::exec<void>() const ()
#8  0x000000001009391c in Kokkos::Impl::ParallelFor<void test_subview1D<int>(int, int, int, int)::{lambda(int)#1}, Kokkos::RangePolicy<Kokkos::Serial>, Kokkos::RangePolicy>::execute() const ()

We need to update how windows are deallocated when using a shared space instance.

Add subview support

Subviews allow encapsulating a subset of the data on a partition (rank).
We need to add support for both forms of subview construction:

auto v_sub = Kokkos::Experimental::RemoteSpaces::subview (v, rank, Args... dim);
auto v_sub = Kokkos::View<Data_t, RemoteSpace>(v, rank, Args... dim);

A subview is a stepping stone towards block data transfers. Up for discussion:

  • Whether the subview return type should be a View or a RemoteView.
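
For context, a sketch of the first construction form on a 2D partitioned view, as used elsewhere in this tracker (v, rank, and the extent M are placeholders; dim0 selects the PE):

auto v_sub = Kokkos::subview(v, std::make_pair(rank, rank + 1), std::make_pair(size_t(0), M));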

Tests failing with MPISpace

There are multiple tests failing when using MPISpace, including localdeepcopy and partitioned_subview. In particular, the problematic line in Test_LocalDeepCopy.cpp is here. The test passes for the Scalar and the 1D View. Only the 2D View fails.

MPISPACE/Kokkos_MPISpace_ViewMapping.hpp:203:6: error: redefinition of 'template<class T> void Kokkos::Impl::mpi_type_p(T, int, int, ompi_win_t* const&, typename std::enable_if<std::is_same<T, long long int>::value>::type*)

macOS, GCC 10 from Homebrew, Open MPI 4.0.4 from Homebrew built with Apple Clang, which shouldn't matter because MPI is pure C and the C ABI is stable.

jrhammon-mac02:build jrhammon$ git clean -dfx ; CC=gcc-10 CXX=g++-10 MPI_CXX=mpicxx cmake .. -DKokkos_DIR=$HOME/Work/DOE/KOKKOS/install -DKokkos_ENABLE_MPISPACE=ON -DCMAKE_INSTALL_PREFIX=$HOME/Work/DOE/KOKKOS/install  && make -j4 install
warning: failed to remove ./: Invalid argument
Removing ./CMakeFiles
Removing ./Makefile
Removing ./cmake_install.cmake
Removing ./KokkosRemoteConfigVersion.cmake
Removing ./examples
Removing ./CMakeCache.txt
Removing ./KokkosRemoteConfig.cmake
-- The CXX compiler identification is GNU 10.2.0
-- Checking whether CXX compiler has -isysroot
-- Checking whether CXX compiler has -isysroot - yes
-- Checking whether CXX compiler supports OSX deployment target flag
-- Checking whether CXX compiler supports OSX deployment target flag - yes
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/local/bin/g++-10 - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Enabled Kokkos devices: OPENMP
-- Found MPI_CXX: /usr/local/Cellar/open-mpi/4.0.4_1/lib/libmpi.dylib (found version "3.1") 
-- Found MPI: TRUE (found version "3.1")  
-- Configuring done
-- Generating done
-- Build files have been written to: /Users/jrhammon/Work/DOE/KOKKOS/remote-spaces/build
Scanning dependencies of target kokkosremote
[ 16%] Building CXX object CMakeFiles/kokkosremote.dir/src/MPISPACE/Kokkos_MPISpace.cpp.o
In file included from /Users/jrhammon/Work/DOE/KOKKOS/remote-spaces/src/MPISPACE/Kokkos_MPISpace.hpp:274,
                 from /Users/jrhammon/Work/DOE/KOKKOS/remote-spaces/src/MPISPACE/Kokkos_MPISpace.cpp:46:
/Users/jrhammon/Work/DOE/KOKKOS/remote-spaces/src/MPISPACE/Kokkos_MPISpace_ViewMapping.hpp:203:6: error: redefinition of 'template<class T> void Kokkos::Impl::mpi_type_p(T, int, int, ompi_win_t* const&, typename std::enable_if<std::is_same<T, long long int>::value>::type*)'
  203 | void mpi_type_p(const T val, int offset, const int pe, const MPI_Win& win,
      |      ^~~~~~~~~~
/Users/jrhammon/Work/DOE/KOKKOS/remote-spaces/src/MPISPACE/Kokkos_MPISpace_ViewMapping.hpp:143:6: note: 'template<class T> void Kokkos::Impl::mpi_type_p(T, int, int, ompi_win_t* const&, typename std::enable_if<std::is_same<T, long long int>::value>::type*)' previously declared here
  143 | void mpi_type_p(const T val, int offset, const int pe, const MPI_Win& win,
      |      ^~~~~~~~~~
/Users/jrhammon/Work/DOE/KOKKOS/remote-spaces/src/MPISPACE/Kokkos_MPISpace_ViewMapping.hpp:385:3: error: redefinition of 'template<class T> T Kokkos::Impl::mpi_type_g(T&, int, int, ompi_win_t* const&, typename std::enable_if<std::is_same<T, long long int>::value>::type*)'
  385 | T mpi_type_g(T& val, int offset, const int pe, const MPI_Win& win,
      |   ^~~~~~~~~~
/Users/jrhammon/Work/DOE/KOKKOS/remote-spaces/src/MPISPACE/Kokkos_MPISpace_ViewMapping.hpp:325:3: note: 'template<class T> T Kokkos::Impl::mpi_type_g(T&, int, int, ompi_win_t* const&, typename std::enable_if<std::is_same<T, long long int>::value>::type*)' previously declared here
  325 | T mpi_type_g(T& val, int offset, const int pe, const MPI_Win& win,
      |   ^~~~~~~~~~
make[2]: *** [CMakeFiles/kokkosremote.dir/src/MPISPACE/Kokkos_MPISpace.cpp.o] Error 1
make[1]: *** [CMakeFiles/kokkosremote.dir/all] Error 2
make: *** [all] Error 2

document how to build this project

I am trying to test my MPI changes but cannot figure out what "INCLUDE could not find load file:" means when I compile with GCC 9.3.0 and OpenMP host support.

Kokkos remote spaces build

jrhammon@jrhammon-nuc:~/KOKKOS/remote-spaces/build$ git clean -dfx ; CC=gcc CXX=g++ MPI_CXX=mpig++ cmake .. -DKokkos_DIR=$HOME/KOKKOS/kokkos/build -DKokkos_ENABLE_MPISPACE=ON
warning: failed to remove ./: Invalid argument
Removing ./KokkosRemoteConfig.cmake
Removing ./examples
Removing ./CMakeCache.txt
Removing ./KokkosRemoteConfigVersion.cmake
Removing ./CMakeFiles
-- The CXX compiler identification is GNU 9.3.0
-- Check for working CXX compiler: /usr/bin/g++
-- Check for working CXX compiler: /usr/bin/g++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMake Error at /home/jrhammon/KOKKOS/kokkos/build/KokkosConfig.cmake:42 (INCLUDE):
  INCLUDE could not find load file:

    /home/jrhammon/KOKKOS/kokkos/build/KokkosTargets.cmake
Call Stack (most recent call first):
  CMakeLists.txt:9 (find_package)


-- Enabled Kokkos devices: OPENMP
-- Found MPI_CXX: /opt/intel/oneapi/mpi/2021.1-beta08/lib/libmpicxx.so (found version "3.1") 
-- Found MPI: TRUE (found version "3.1")  
-- Configuring incomplete, errors occurred!
See also "/home/jrhammon/KOKKOS/remote-spaces/build/CMakeFiles/CMakeOutput.log".

Kokkos build

jrhammon@jrhammon-nuc:~/KOKKOS/kokkos/build$ git clean -dfx ; CC=gcc CXX=g++ cmake .. -DKokkos_ENABLE_OPENMP=ON && make -j8
warning: failed to remove ./: Invalid argument
-- Setting default Kokkos CXX standard to 11
-- The CXX compiler identification is GNU 9.3.0
-- Check for working CXX compiler: /usr/bin/g++
-- Check for working CXX compiler: /usr/bin/g++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Setting build type to 'RelWithDebInfo' as none was specified.
-- Setting policy CMP0074 to use <Package>_ROOT variables
-- The project name is: Kokkos
-- Using -std=gnu++11 for C++11 extensions as feature
-- Execution Spaces:
--     Device Parallel: NONE
--     Host Parallel: OPENMP
--       Host Serial: NONE
-- 
-- Architectures:
-- Found TPLLIBDL: /usr/lib/x86_64-linux-gnu/libdl.so  
-- Configuring done
-- Generating done
-- Build files have been written to: /home/jrhammon/KOKKOS/kokkos/build
Scanning dependencies of target kokkoscore
[  4%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_CPUDiscovery.cpp.o
[ 19%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_HostBarrier.cpp.o
[ 19%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_Error.cpp.o
[ 19%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_Core.cpp.o
[ 23%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_HostSpace.cpp.o
[ 28%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_ExecPolicy.cpp.o
[ 33%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_HostSpace_deepcopy.cpp.o
[ 38%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_HostThreadTeam.cpp.o
[ 42%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_MemoryPool.cpp.o
[ 47%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_MemorySpace.cpp.o
[ 52%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_Profiling.cpp.o
[ 57%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_Serial_Task.cpp.o
[ 61%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_SharedAlloc.cpp.o
[ 66%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_Spinwait.cpp.o
[ 71%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_Stacktrace.cpp.o
[ 76%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_hwloc.cpp.o
[ 80%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/OpenMP/Kokkos_OpenMP_Exec.cpp.o
[ 85%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/OpenMP/Kokkos_OpenMP_Task.cpp.o
[ 90%] Linking CXX static library libkokkoscore.a
[ 90%] Built target kokkoscore
Scanning dependencies of target kokkoscontainers
[ 95%] Building CXX object containers/src/CMakeFiles/kokkoscontainers.dir/impl/Kokkos_UnorderedMap_impl.cpp.o
[100%] Linking CXX static library libkokkoscontainers.a
[100%] Built target kokkoscontainers

Add support for atomics API

Add support for

atomic_exchange
atomic_compare_exchange
atomic_compare_exchange_strong
atomic_load
atomic_{add, assign, decrement, increment, sub}
atomic_fetch_{add, and, div, lshift, max, min, mod, mul, or, rshift, sub, xor}
atomic_[op]_fetch, op = {add, and, div, lshift, max, min, mod, mul, or, rshift, sub, xor}
atomic_store

Use-case

Kokkos::View<int **, RemoteSpace_t> v("Count", num_ranks, 2); // dim0 = PE
  Kokkos::parallel_for("AtomIncr", v.extent(0) * v.extent(1), KOKKOS_LAMBDA(const int i) {
      int rank = i / v.extent(1);
      int idx  = i % v.extent(1);
      Kokkos::atomic_increment(&v(rank, idx));
  });

Four miscellaneous suggestions

After working with kokkos-remote-spaces a bit, there are a few little changes I would like to propose to make interfaces and behavior more intuitive, especially for people already comfortable with Kokkos core.

It's possible that some of these already exist and I just haven't found them in the code though.

  • Make Kokkos::Experimental::getRange return a range with an exclusive upper bound (half-open interval), not inclusive. Kokkos Core consistently uses half-open intervals in RangePolicy constructors and in subview.
    For example,
auto myRange = Kokkos::Experimental::getRange(101, myRank);
std::cout << "Hello from rank " << myRank << ", range is " << myRange.first << "..." << myRange.second << std::endl;

produces this.

Hello from rank 0, range is 0...25
Hello from rank 1, range is 26...51
Hello from rank 2, range is 52...77
Hello from rank 3, range is 78...100

Instead, I'm proposing it should be

Hello from rank 0, range is 0...26
Hello from rank 1, range is 26...52
Hello from rank 2, range is 52...78
Hello from rank 3, range is 78...101
  • Have a way to get the global (0th) extent of a RemoteSpace view. If you make a distributed view like this, with a non-partitioned layout:
    using RemoteSpace = Kokkos::Experimental::DefaultRemoteMemorySpace;
    Kokkos::View<double*, RemoteSpace> myView("myView", 100);

the global extent will be 100, but it seems like there isn't a way to get that back once you've created the view. myView.extent(0) will return the maximum local extent on any one rank (for example, with 3 ranks, the above example will have myView.extent(0) == 34 on all ranks). This means that even with an all-reduce, you can't recover the global extent if it wasn't evenly divisible by the number of ranks.

  • Related to the last thing: if you have a distributed view, and add up its local extents on the ranks, that sum should be the same as the global extent you passed to the view constructor. So in this example, executed on 3 ranks:
    using RemoteSpace = Kokkos::Experimental::DefaultRemoteMemorySpace;
    Kokkos::View<double*, RemoteSpace> myView("myView", 100);
    std::cout << "On rank " << myRank << ", myView.extent(0) = " << myView.extent(0) << '\n';

right now this prints

On rank 0, myView.extent(0) = 34
On rank 1, myView.extent(0) = 34
On rank 2, myView.extent(0) = 34

I propose that this instead be

On rank 0, myView.extent(0) = 34
On rank 1, myView.extent(0) = 33
On rank 2, myView.extent(0) = 33

so that 34+33+33=100, the global extent.

  • For remote space views, it would be nice to have a method to get the local subview. For example, suppose RemoteSpace == SHMEMSpace and myView is constructed as in the above snippet; myView.getLocalSubview() would return a Kokkos::View<double*, Kokkos::HostSpace> with extent 34 or 33 (depending on which rank it's called from). Right now, it seems that an unmanaged view (pointer & extent) is needed to create a view of the local data, as sketched below.
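
For reference, a minimal sketch of that unmanaged-view workaround (assumptions: getRange keeps its current inclusive bounds, and data() returns the locally owned base pointer):

using RemoteSpace = Kokkos::Experimental::DefaultRemoteMemorySpace;
Kokkos::View<double*, RemoteSpace> myView("myView", 100);
auto myRange = Kokkos::Experimental::getRange(100, myRank);
size_t localExtent = myRange.second - myRange.first + 1; // inclusive bounds today
// Wrap the locally owned block in an unmanaged host view.
Kokkos::View<double*, Kokkos::HostSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged>>
    localView(myView.data(), localExtent);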

Tests failing with SHMEM and OPENMP backend with OMP_NUM_THREADS>1

mpirun --np 2 --map-by ppr:2:node:TAGOUTPUT ./unit_tests/KokkosRemote_TestAll
results in:

 TEST_CATEGORY.test_atomic_globalview
[weaver4:128963:0:128982] ucp_worker.inl:62   Assertion `(*ep_p)->worker == worker' failed: worker=0x39f00010 ep=0x200035a400d8 ep->worker=(nil)
[weaver4:128962:1:128983] ucp_worker.inl:62   Assertion `(*ep_p)->worker == worker' failed: worker=0x1bad0010 ep=0x200035610090 ep->worker=(nil)
[weaver4:128962:2:128981] ucp_worker.inl:62   Assertion `(*ep_p)->worker == worker' failed: worker=0x1bad0010 ep=0x2000356100d8 ep->worker=(nil)
[weaver4:128962:0:128985] ucp_worker.inl:62   Assertion `(*ep_p)->worker == worker' failed: worker=0x1bad0010 ep=0x200035610090 ep->worker=(nil)
[1656027079.226968] [weaver4:128962:0]           debug.c:1369 UCX  WARN  ucs_debug_disable_signal: signal 8 was not set in ucs
[1656027079.227022] [weaver4:128962:0]           debug.c:1369 UCX  WARN  ucs_debug_disable_signal: signal 1 was not set in ucs
[1656027079.227055] [weaver4:128962:2]           debug.c:1369 UCX  WARN  ucs_debug_disable_signal: signal 8 was not set in ucs
[1656027079.227094] [weaver4:128962:1]           debug.c:1369 UCX  WARN  ucs_debug_disable_signal: signal 11 was not set in ucs
[1656027079.227149] [weaver4:128962:2]           debug.c:1369 UCX  WARN  ucs_debug_disable_signal: signal 11 was not set in ucs
[1656027079.227185] [weaver4:128962:0]           debug.c:1369 UCX  WARN  ucs_debug_disable_signal: signal 4 was not set in ucs
[1656027079.227219] [weaver4:128962:1]           debug.c:1369 UCX  WARN  ucs_debug_disable_signal: signal 4 was not set in ucs
[1656027079.227272] [weaver4:128962:2]           debug.c:1369 UCX  WARN  ucs_debug_disable_signal: signal 7 was not set in ucs
[1656027079.227307] [weaver4:128962:1]           debug.c:1369 UCX  WARN  ucs_debug_disable_signal: signal 7 was not set in ucs
==== backtrace (tid: 128982) ====
 0  /ascldap/users/jciesko/software/ucx/install_pow9/lib/libucs.so.0(ucs_handle_error+0x154) [0x200002757574]
[1656027079.227275] [weaver4:128962:0]        spinlock.c:29   UCX  WARN  ucs_recursive_spinlock_destroy() failed: busy
 1  /ascldap/users/jciesko/software/ucx/install_pow9/lib/libucs.so.0(ucs_fatal_error_message+0x88) [0x200002752488]
 2  /ascldap/users/jciesko/software/ucx/install_pow9/lib/libucs.so.0(ucs_fatal_error_format+0xa4) [0x200002752634]
 3  /ascldap/users/jciesko/software/ucx/install_pow9/lib/libucp.so.0(ucp_atomic_req_handler+0x4a4) [0x2000025e8b44]
 4  /ascldap/users/jciesko/software/ucx/install_pow9/lib/libuct.so.0(+0x22be8) [0x2000026d2be8]
 5  /ascldap/users/jciesko/software/ucx/install_pow9/lib/libuct.so.0(uct_self_ep_am_bcopy+0x16c) [0x2000026d353c]
 6  /ascldap/users/jciesko/software/ucx/install_pow9/lib/libucp.so.0(+0x591a4) [0x2000025e91a4]
 7  /ascldap/users/jciesko/software/ucx/install_pow9/lib/libucp.so.0(ucp_atomic_op_nbx+0x404) [0x2000025e5864]
 8  /ascldap/users/jciesko/software/ompi/ompi/build/../install_pow9/lib/openmpi/mca_atomic_ucx.so(+0x22d8) [0x20003aa822d8]
 9  /ascldap/users/jciesko/software/ompi/ompi/install_pow9/lib/liboshmem.so.0(pshmem_int_atomic_fetch_add+0xcc) [0x2000003778dc]
10  ./unit_tests/KokkosRemote_TestAll() [0x100318c0]
11  ./unit_tests/KokkosRemote_TestAll() [0x100365cc]
12  ./unit_tests/KokkosRemote_TestAll() [0x10032af8]
13  ./unit_tests/KokkosRemote_TestAll() [0x10043e18]
14  ./unit_tests/KokkosRemote_TestAll() [0x100322ac]
15  /home/projects/ppc64le/gcc/7.2.0/lib64/libgomp.so.1(+0x1a2ec) [0x20000078a2ec]
16  /lib64/libpthread.so.0(+0x8b94) [0x200000418b94]
17  /lib64/libc.so.6(clone+0xe4) [0x2000009385f4]
=================================
[weaver4:128963] *** Process received signal ***
[weaver4:128963] Signal: Aborted (6)
[weaver4:128963] Signal code:  (-6)
[weaver4:128963] [ 0] [0x2000000504d8]
[weaver4:128963] [ 1] /lib64/libc.so.6(abort+0x2b4)[0x200000851f94]
[weaver4:128963] [ 2] /ascldap/users/jciesko/software/ucx/install_pow9/lib/libucs.so.0(ucs_fatal_error_message+0x90)[0x200002752490]
[weaver4:128963] [ 3] /ascldap/users/jciesko/software/ucx/install_pow9/lib/libucs.so.0(ucs_fatal_error_format+0xa4)[0x200002752634]
[weaver4:128963] [ 4] /ascldap/users/jciesko/software/ucx/install_pow9/lib/libucp.so.0(ucp_atomic_req_handler+0x4a4)[0x2000025e8b44]
[weaver4:128963] [ 5] /ascldap/users/jciesko/software/ucx/install_pow9/lib/libuct.so.0(+0x22be8)[0x2000026d2be8]
[weaver4:128963] [ 6] /ascldap/users/jciesko/software/ucx/install_pow9/lib/libuct.so.0(uct_self_ep_am_bcopy+0x16c)[0x2000026d353c]
[weaver4:128963] [ 7] /ascldap/users/jciesko/software/ucx/install_pow9/lib/libucp.so.0(+0x591a4)[0x2000025e91a4]
[weaver4:128963] [ 8] ==== backtrace (tid: 128981) ====
 0  /ascldap/users/jciesko/software/ucx/install_pow9/lib/libucs.so.0(ucs_handle_error+0x154) [0x200002757574]
 1  /ascldap/users/jciesko/software/ucx/install_pow9/lib/libucs.so.0(ucs_fatal_error_message+0x88) [0x200002752488]
 2  /ascldap/users/jciesko/software/ucx/install_pow9/lib/libucs.so.0(ucs_fatal_error_format+0xa4) [0x200002752634]
/ascldap/users/jciesko/software/ucx/install_pow9/lib/libucp.so.0(ucp_atomic_op_nbx+0x404)[0x2000025e5864]
[weaver4:128963] [ 9] /ascldap/users/jciesko/software/ompi/ompi/build/../install_pow9/lib/openmpi/mca_atomic_ucx.so(+0x22d8)[0x20003aa822d8]
[weaver4:128963] [10]  3  /ascldap/users/jciesko/software/ucx/install_pow9/lib/libucp.so.0(ucp_atomic_req_handler+0x4a4) [0x2000025e8b44]
 4  /ascldap/users/jciesko/software/ucx/install_pow9/lib/libuct.so.0(+0x22be8) [0x2000026d2be8]
 5  /ascldap/users/jciesko/software/ucx/install_pow9/lib/libuct.so.0(uct_self_ep_am_bcopy+0x16c) [0x2000026d353c]
 6  /ascldap/users/jciesko/software/ucx/install_pow9/lib/libucp.so.0(+0x591a4) [0x2000025e91a4]
 7  /ascldap/users/jciesko/software/ucx/install_pow9/lib/libucp.so.0(ucp_atomic_op_nbx+0x404) [0x2000025e5864]
 8  /ascldap/users/jciesko/software/ompi/ompi/build/../install_pow9/lib/openmpi/mca_atomic_ucx.so(+0x22d8) [0x20003aa822d8]
 9  /ascldap/users/jciesko/software/ompi/ompi/install_pow9/lib/liboshmem.so.0(pshmem_int_atomic_fetch_add+0xcc) [0x2000003778dc]
10  ./unit_tests/KokkosRemote_TestAll() [0x100318c0]
11  ./unit_tests/KokkosRemote_TestAll() [0x100365cc]
12  ./unit_tests/KokkosRemote_TestAll() [0x10032af8]
13  ./unit_tests/KokkosRemote_TestAll() [0x10043e18]
14  ./unit_tests/KokkosRemote_TestAll() [0x100322ac]
15  /home/projects/ppc64le/gcc/7.2.0/lib64/libgomp.so.1(+0x1a2ec) [0x20000078a2ec]
16  /lib64/libpthread.so.0(+0x8b94) [0x200000418b94]
17  /lib64/libc.so.6(clone+0xe4) [0x2000009385f4]
=================================
/ascldap/users/jciesko/software/ompi/ompi/install_pow9/lib/liboshmem.so.0(pshmem_int_atomic_fetch_add+0xcc)[0x2000003778dc]
[weaver4:128963] [11] ./unit_tests/KokkosRemote_TestAll[0x100318c0]
[weaver4:128963] [12] ./unit_tests/KokkosRemote_TestAll[0x100365cc]
[weaver4:128963] [13] ./unit_tests/KokkosRemote_TestAll[0x10032af8]
[weaver4:128963] [14] ./unit_tests/KokkosRemote_TestAll[0x10043e18]
[weaver4:128963] [15] ./unit_tests/KokkosRemote_TestAll[0x100322ac]
[weaver4:128963] [16] /home/projects/ppc64le/gcc/7.2.0/lib64/libgomp.so.1(+0x1a2ec)[0x20000078a2ec]
[weaver4:128963] [17] [weaver4:128962] *** Process received signal ***
[weaver4:128962] Signal: Aborted (6)
[weaver4:128962] Signal code:  (-6)
[weaver4:128962] [ 0] [0x2000000504d8]
/lib64/libpthread.so.0(+0x8b94)[0x200000418b94]
[weaver4:128963] [18] [weaver4:128962] [ 1] /lib64/libc.so.6(clone+0xe4)[0x2000009385f4]
[weaver4:128963] *** End of error message ***
==== backtrace (tid: 128983) ====
 0  /ascldap/users/jciesko/software/ucx/install_pow9/lib/libucs.so.0(ucs_handle_error+0x154) [0x200002757574]
 1  /ascldap/users/jciesko/software/ucx/install_pow9/lib/libucs.so.0(ucs_fatal_error_message+0x88) [0x200002752488]
 2  /ascldap/users/jciesko/software/ucx/install_pow9/lib/libucs.so.0(ucs_fatal_error_format+0xa4) [0x200002752634]
 3  /ascldap/users/jciesko/software/ucx/install_pow9/lib/libucp.so.0(ucp_atomic_req_handler+0x4a4) [0x2000025e8b44]
 4  /ascldap/users/jciesko/software/ucx/install_pow9/lib/libuct.so.0(+0x1a7d0) [0x2000026ca7d0]
 5  /ascldap/users/jciesko/software/ucx/install_pow9/lib/libucp.so.0(ucp_worker_progress+0x64) [0x2000025d8ea4]
 6  /ascldap/users/jciesko/software/ompi/ompi/build/../install_pow9/lib/openmpi/mca_atomic_ucx.so(+0x2384) [0x20003aa82384]
 7  /ascldap/users/jciesko/software/ompi/ompi/install_pow9/lib/liboshmem.so.0(pshmem_int_atomic_fetch_add+0xcc) [0x2000003778dc]
 8  ./unit_tests/KokkosRemote_TestAll() [0x100318c0]
 9  ./unit_tests/KokkosRemote_TestAll() [0x100365cc]
10  ./unit_tests/KokkosRemote_TestAll() [0x10032af8]
11  ./unit_tests/KokkosRemote_TestAll() [0x10043e18]
12  ./unit_tests/KokkosRemote_TestAll() [0x100322ac]
13  /home/projects/ppc64le/gcc/7.2.0/lib64/libgomp.so.1(+0x1a2ec) [0x20000078a2ec]
14  /lib64/libpthread.so.0(+0x8b94) [0x200000418b94]
15  /lib64/libc.so.6(clone+0xe4) [0x2000009385f4]
=================================
==== backtrace (tid: 128985) ====
 0  /ascldap/users/jciesko/software/ucx/install_pow9/lib/libucs.so.0(ucs_handle_error+0x154) [0x200002757574]
 1  /ascldap/users/jciesko/software/ucx/install_pow9/lib/libucs.so.0(ucs_fatal_error_message+0x88) [0x200002752488]
 2  /ascldap/users/jciesko/software/ucx/install_pow9/lib/libucs.so.0(ucs_fatal_error_format+0xa4) [0x200002752634]
/lib64/libc.so.6(abort+0x2b4)[0x200000851f94]
 3  /ascldap/users/jciesko/software/ucx/install_pow9/lib/libucp.so.0(ucp_atomic_req_handler+0x4a4) [0x2000025e8b44]
[weaver4:128962] [ 2]  4  /ascldap/users/jciesko/software/ucx/install_pow9/lib/libuct.so.0(+0x1a7d0) [0x2000026ca7d0]
 5  /ascldap/users/jciesko/software/ucx/install_pow9/lib/libucp.so.0(ucp_worker_progress+0x64) [0x2000025d8ea4]
 6  /ascldap/users/jciesko/software/ompi/ompi/build/../install_pow9/lib/openmpi/mca_atomic_ucx.so(+0x2384) [0x20003aa82384]
 7  /ascldap/users/jciesko/software/ompi/ompi/install_pow9/lib/liboshmem.so.0(pshmem_int_atomic_fetch_add+0xcc) [0x2000003778dc]
/ascldap/users/jciesko/software/ucx/install_pow9/lib/libucs.so.0(ucs_fatal_error_message+0x90)[0x200002752490]
[weaver4:128962]  8  ./unit_tests/KokkosRemote_TestAll() [0x100318c0]
[ 3]  9  ./unit_tests/KokkosRemote_TestAll() [0x100365cc]
10  ./unit_tests/KokkosRemote_TestAll() [0x10032af8]
11  ./unit_tests/KokkosRemote_TestAll() [0x10043e18]
/ascldap/users/jciesko/software/ucx/install_pow9/lib/libucs.so.0(ucs_fatal_error_format+0xa4)[0x200002752634]
12  ./unit_tests/KokkosRemote_TestAll() [0x100322ac]
[weaver4:128962] [ 4] 13  /home/projects/ppc64le/gcc/7.2.0/lib64/libgomp.so.1(+0x1a2ec) [0x20000078a2ec]

MPISpace: element operators with atomic trait aren't atomic

The operator overloads for the proxy type MPIDataElement with the atomic trait have the same implementation as the non-atomic versions, for example:

  KOKKOS_INLINE_FUNCTION
  void inc() const {
    T val = T();
    mpi_type_g(val, offset, pe, *win);  // remote get
    val++;                              // local increment
    mpi_type_p(val, offset, pe, *win);  // remote put: the read-modify-write is not atomic
  }

These operators should use the one-sided atomic functions: MPI_Accumulate, MPI_Get_accumulate, or MPI_Fetch_and_op.

MPI_Compare_and_swap in a loop could be used for non-builtin operations, as is done in Desul.
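
For illustration, a minimal sketch of an atomic increment built on MPI_Fetch_and_op (not the library's implementation; the MPI_INT mapping for T and the displacement units are assumptions):

  KOKKOS_INLINE_FUNCTION
  void inc() const {
    T one = 1, result;
    // One-sided atomic fetch-and-add replaces the non-atomic get/put pair above.
    // The displacement is interpreted in the window's disp_unit.
    MPI_Fetch_and_op(&one, &result, MPI_INT, pe, offset, MPI_SUM, *win);
    MPI_Win_flush(pe, *win);
  }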

Kokkos::deep_copy hangs

In this example, the Kokkos::deep_copy call marked below causes the program to hang when running with multiple MPI ranks. Commenting out the deep_copy and uncommenting the for-loop that copies the equivalent entries makes the program work as expected.

Kokkos::deep_copy is not a collective operation when one of the arguments is a view in a remote space, correct?

// clang-format off
#include <fstream>
#include <algorithm>
#include <numeric>
#include <cassert>
#include <iostream>
#include <unistd.h>

#include <Kokkos_RemoteSpaces.hpp>
// clang-format on

int main(int argc, char *argv[]) {
  using RemoteSpace_t = Kokkos::Experimental::DefaultRemoteMemorySpace;
  constexpr size_t M = 8;
  int mpi_thread_level_available;
  int mpi_thread_level_required = MPI_THREAD_MULTIPLE;
  MPI_Init_thread(&argc, &argv, mpi_thread_level_required,
                  &mpi_thread_level_available);
  assert(mpi_thread_level_available >= mpi_thread_level_required);
  if (!(mpi_thread_level_available >= mpi_thread_level_required)) {
    // if asserts are disabled, don't want to move forward.
    std::cout << "mpi_thread_level_available >= mpi_thread_level_required failed\n";
    exit(1);
  }

  Kokkos::initialize(argc, argv);
  {
    using namespace Kokkos;
    using PartitionedView1D =
        Kokkos::View<double **, PartitionedLayoutRight, RemoteSpace_t>;
    using Local1DView = typename PartitionedView1D::HostMirror;
    using TeamPolicy_t = Kokkos::TeamPolicy<>;

    int size, rank;
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) std::cout << "MPI_COMM_WORLD size: " << size << '\n';

    auto A = PartitionedView1D("RemoteView", size, M);
    RemoteSpace_t().fence();
    auto Alocal = Local1DView("LocalView", 1, M);
    auto lr = Experimental::get_local_range(M);
    parallel_for(
        "init", (A.extent(1)),
        KOKKOS_LAMBDA(auto i) { A(rank, i) = rank * M + i; });
    RemoteSpace_t().fence();
    for (int i = 0; i < size; i++) {
      if (rank == 0) {
        std::cout << "MPI_COMM_WORLD rank: " << i << '\n';
        auto range = std::make_pair(size_t(0), M);
        auto ar = Kokkos::subview(A, std::make_pair(i, i+1), range);
        auto al = Kokkos::subview(A, std::make_pair(rank, rank+1), range);
        Kokkos::parallel_for(
            "Team", TeamPolicy_t(1, 1),
            KOKKOS_LAMBDA(typename TeamPolicy_t::member_type team) {
              Kokkos::single(Kokkos::PerTeam(team), [&]() {
                Kokkos::Experimental::RemoteSpaces::local_deep_copy(al, ar);
              });
            });
        //for(int i = 0; i < al.extent_int(1); i++)
        //  Alocal(0, i) = al(0, i);
        Kokkos::deep_copy(Alocal, al); // <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< HERE
        for (size_t j = range.first; j < range.second; j++)
          std::cout << Alocal(0, j) << ' ';
        std::cout << '\n';
      }
      RemoteSpace_t().fence();
    }
  }
  Kokkos::finalize();
  MPI_Finalize();

  return 0;
}

Output with 4 ranks and the deep_copy uncommented:

MPI_COMM_WORLD size: 4
MPI_COMM_WORLD rank: 0
0 1 2 3 4 5 6 7 
MPI_COMM_WORLD rank: 1
8 9 10 11 12 13 14 15 
MPI_COMM_WORLD rank: 2
16 17 18 19 20 21 22 23 
<hang>

With the for-loop uncommented instead (correct behavior), the program terminates normally:

MPI_COMM_WORLD size: 4
MPI_COMM_WORLD rank: 0
0 1 2 3 4 5 6 7 
MPI_COMM_WORLD rank: 1
8 9 10 11 12 13 14 15 
MPI_COMM_WORLD rank: 2
16 17 18 19 20 21 22 23 
MPI_COMM_WORLD rank: 3
24 25 26 27 28 29 30 31 
