cabanamd's Introduction

cabanamd's People

Contributors

aprokop, athomps, crtrott, dalg24, frobnitzem, junghans, masterleinad, saakethdesai, sslattery, stanmoore1, streeve, tcgermann

cabanamd's Issues

Add long range forces

Add long-range forces for charged systems. This should be fully optional when running the code:

  • Cajita dependency
  • Example case
  • Ewald solver
  • Smooth particle mesh Ewald (SPME) with Cajita

(Add Shane)

Problems with CUDA device

I tried to run the provided LJ example on a GPU:

./cbnMD -il in.lj --device-type CUDA

The job crashed with the following error message:

Kokkos::Cuda::initialize WARNING: Cuda is allocating into UVMSpace by default
                                  without setting CUDA_LAUNCH_BLOCKING=1.
                                  The code must call Cuda().fence() after each kernel
                                  or will likely crash when accessing data on the host.
terminate called after throwing an instance of 'std::runtime_error'
  what():  cudaDeviceSynchronize() error( cudaErrorIllegalAddress): an illegal memory access was encountered /.../kokkos/core/src/Cuda/Kokkos_Cuda_Instance.cpp:144
Traceback functionality not available

What did I do wrong?

System generation

Since adding read_data functionality and accepting a nonzero box minimum, issues have arisen with building lattices and reading data:

  • Build lattice issues
  • Read data issues

Intranode GPU communication crashes in MPI called from Cabana::Gather::apply()

CabanaMD with the standard in.lj test case crashes on both LLNL Lassen (Spectrum MPI or MVAPICH2) and LANL Chicoma (Cray MPICH) when communicating between GPUs on the same node. It works when communicating between nodes, though I suspect this is because MPI checks sent data less strictly there than in the RMA routines it uses for intra-node communication. I've enabled GPU-aware communication in all cases.

The MPI_Send call invoked by Cabana::Gather::apply() (line 335 of Cabana_Halo.cpp) appears to be what is crashing. Here's the Lassen lwcore traceback from spectrum MPI:

[email protected]:101
PAMI::Protocol::Get::GetRdma<PAMI::Device::Shmem::DmaModel<PAMI::Device::ShmemDevice<PAMI::Fifo::WrapFifo<PAMI::Fifo::FifoPacket<64u,@libpami.so.3
PAMI::Protocol::Get::CompositeRGet<PAMI::Protocol::Get::RGet,@libpami.so.3
PAMI::Context::rget_impl(pami_rget_simple_t*)@libpami.so.3
[email protected]
process_rndv_msg@mca_pml_pami.so
pml_pami_recv_rndv_cb@mca_pml_pami.so
PAMI::Protocol::Send::EagerSimple<PAMI::Device::Shmem::PacketModel<PAMI::Device::ShmemDevice<PAMI::Fifo::WrapFifo<PAMI::Fifo::FifoPacket<64u,@libpami.so.3
[email protected]
mca_pml_pami_progress_wait@mca_pml_pami.so
mca_pml_pami_send@mca_pml_pami.so
PMPI_Send@libmpi_ibm.so.3
Cabana::Gather<Cabana::Halo<Kokkos::Device<Kokkos::Cuda,@()
void@()
Comm<System<Kokkos::Device<Kokkos::Cuda,@()
CbnMD<System<Kokkos::Device<Kokkos::Cuda,@()
main@()
---STACK

Use Cajita for MPI decomposition

Replace current MPI subdomain grid/mesh with Cajita. This will simplify the Comm class and enable using Cajita periodic-enabled Halo/Distributors once implemented.

First step towards #17

Unable to run NNP example

I am trying to run the NNP example in input/in.nnp but after the symmetry function setup is completed I get the following error in the SETUP: SYMMETRY FUNCTION GROUPS section:

terminate called after throwing an instance of 'std::runtime_error'
  what():  View bounds error of view AngularCounter ( 1 < 1 )
Traceback functionality not available

I am starting CabanaMD with the following command:

 ~/local/src/openmpi/4.0.4/build/bin/mpiexec -n 1  build/bin/cbnMD -il input/in.nnp --device-type SERIAL

The error occurs with all three device targets: SERIAL, OPENMP, and CUDA.

When I run with gdb and look at the backtrace I find:

#6  0x0000555557159719 in Kokkos::Impl::throw_runtime_exception(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
#7  0x0000555556d48c19 in Kokkos::Impl::view_verify_operator_bounds<Kokkos::HostSpace, Kokkos::Impl::ViewMapping<Kokkos::ViewTraits<int*, Kokkos::LayoutRight, Kokkos::HostSpace>, void>, int> (tracker=..., map=...)
    at /home/andi/local/src/kokkos/3.1.01/build/install/include/impl/Kokkos_ViewMapping.hpp:3813
#8  0x0000555556bd7b53 in Kokkos::View<int*, Kokkos::LayoutRight, Kokkos::HostSpace>::operator()<int> (i0=<optimized out>, this=<optimized out>)
    at /home/andi/local/src/kokkos/3.1.01/build/install/include/Kokkos_View.hpp:1241
#9  nnpCbn::Element::setupSymmetryFunctionGroups<Kokkos::View<double** [15], Kokkos::LayoutRight, Kokkos::HostSpace>, Kokkos::View<int***, Kokkos::LayoutRight, Kokkos::HostSpace>, Kokkos::View<int*, Kokkos::LayoutRight, Kokkos::HostSpace> > (this=0x55555a3a4cf0, SF=..., SFGmemberlist=..., attype=0, 
    h_numSFperElem=..., h_numSFGperElem=..., maxSFperElem=27)
    at /home/andi/local/src/CabanaMD/master/src/force_types/nnp_element_impl.h:375
#10 0x00005555563ec769 in nnpCbn::Mode<Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> >::setupSymmetryFunctionGroups (this=0x55555b920de0)
    at /home/andi/local/src/CabanaMD/master/src/force_types/nnp_mode_impl.h:615
#11 0x000055555633314b in ForceNNP<System<Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, AoSoA6>, System_NNP<Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, AoSoA3>, NeighborVerlet<System<Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, AoSoA6>, Cabana::FullNeighborTag, Cabana::VerletLayout2D>, Cabana::SerialOpTag, Cabana::SerialOpTag>::init_coeff (this=0x55555c21f950, 
    args=std::vector of length 1, capacity 1 = {...})
    at /home/andi/local/src/CabanaMD/master/src/force_types/force_nnp_cabana_neigh_impl.h:59
#12 0x0000555555f0c0aa in CbnMD<System<Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, AoSoA6>, NeighborVerlet<System<Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, AoSoA6>, Cabana::FullNeighborTag, Cabana::VerletLayout2D> >::init (
    this=0x55555b9bb680, commandline=...)
    at /home/andi/local/src/CabanaMD/master/src/cabanamd_impl.h:178

which brings me here:

and then descends into Kokkos... do you have any idea why this error happens and how I can resolve it?

I used the following setup to compile Kokkos, Cabana and CabanaMD:

My system:

  • Linux Mint 19.3
  • gcc version 7.5.0
  • CUDA version 11.0
  • OpenMPI version 4.0.4 (compiled with CUDA support)
  • NVIDIA GTX 1060 6GB GPU (Pascal61 architecture)

Kokkos (version 3.1.01) build flags:

In the nvcc_wrapper script I set default_arch="sm_61".

 -DCMAKE_CXX_COMPILER=${KOKKOS_SRC_DIR}/bin/nvcc_wrapper \
 -DCMAKE_INSTALL_PREFIX=${KOKKOS_SRC_DIR}/build/install \
 -DKokkos_CUDA_DIR=/usr/local/cuda-11.0/ \
 -DKokkos_ENABLE_SERIAL=On \
 -DKokkos_ENABLE_OPENMP=On \
 -DKokkos_ENABLE_CUDA=On \
 -DKokkos_ENABLE_CUDA_LAMBDA=On \
 -DKokkos_ENABLE_CUDA_UVM=On \
 -DKokkos_ARCH_PASCAL61=On \
 -DKokkos_ENABLE_HWLOC=On \
 -DKokkos_ENABLE_TESTS=On \
 -DKokkos_ENABLE_DEBUG=On \
 -DKokkos_ENABLE_DEBUG_BOUNDS_CHECK=On \

Cabana (66c94f6) build flags:

 -DCMAKE_BUILD_TYPE="Debug" \
 -DCMAKE_PREFIX_PATH="${KOKKOS_INSTALL_DIR};${HOME}/local/src/openmpi/4.0.4/build/" \
 -DCMAKE_INSTALL_PREFIX=${CABANA_INSTALL_DIR} \
 -DCMAKE_CXX_COMPILER=${KOKKOS_SRC_DIR}/bin/nvcc_wrapper \
 -DMPI_CXX_COMPILER=${HOME}/local/src/openmpi/4.0.4/build/bin/mpic++ \
 -DCabana_REQUIRE_CUDA=On \
 -DCabana_ENABLE_MPI=On \
 -DCabana_ENABLE_EXAMPLES=On \
 -DCabana_ENABLE_TESTING=On \

CabanaMD (562600e) build flags:

 -DCMAKE_BUILD_TYPE="Debug" \
 -DCMAKE_CXX_COMPILER=${KOKKOS_DIR}/bin/nvcc_wrapper \
 -DCMAKE_PREFIX_PATH="${CABANA_DIR};${HOME}/local/src/openmpi/4.0.4/build/" \
 -DCMAKE_INSTALL_PREFIX=${CABANAMD_INSTALL_DIR} \
 -DMPI_CXX_COMPILER=${HOME}/local/src/openmpi/4.0.4/build/bin/mpic++ \
 -DCabana_ENABLE_MPI=On \
 -DCabanaMD_VECTORLENGTH=32 \
 -DN2P2_DIR=${HOME}/local/src/n2p2-singraber/ \
 -DCabanaMD_ENABLE_NNP=On \
 -DCabanaMD_MAXSYMMFUNC_NNP=30 \
 -DCabanaMD_VECTORLENGTH_NNP=1 \
 -DCabanaMD_ENABLE_TESTING=ON \

There is also an additional issue with the tests of CabanaMD which may be unrelated but who knows...:

The tests of Kokkos and Cabana pass without any errors but when I run ctest -VV in the CabanaMD build directory I get the same error for both CUDA-related tests (Integrator_test_CUDA and Neighbor_test_CUDA):

[ RUN      ] cuda.reversibility_test
Kokkos::View ERROR: attempt to access inaccessible memory space
Thread 1 "Integrator_test" received signal SIGABRT, Aborted.

Running the tests manually and backtracing with gdb shows:

#3  0x000055555556caa0 in Kokkos::abort (
    message=0x55555567fe30 "Kokkos::View ERROR: attempt to access inaccessible memory space")
    at /home/andi/local/src/kokkos/3.1.01/build/install/include/impl/Kokkos_Error.hpp:175
#4  0x0000555555576ee7 in Kokkos::View<double*, Kokkos::Device<Kokkos::Cuda, Kokkos::CudaSpace> >::verify_space<Kokkos::HostSpace, false>::check ()
    at /home/andi/local/src/kokkos/3.1.01/build/install/include/Kokkos_View.hpp:882
#5  Kokkos::View<double*, Kokkos::Device<Kokkos::Cuda, Kokkos::CudaSpace> >::operator()<int> (i0=<optimized out>, this=0x7fffffffc730)
    at /home/andi/local/src/kokkos/3.1.01/build/install/include/Kokkos_View.hpp:1241
#6  Test::createParticles<System<Kokkos::Device<Kokkos::Cuda, Kokkos::CudaSpace>, AoSoA6> > (num_particle=1000, num_ghost=200, box_min=-12.295999999999999, 
    box_max=10.904)
    at /home/andi/local/src/CabanaMD/master/unit_test/tstIntegrator.hpp:38
#7  0x000055555556e6d2 in Test::testIntegratorReversibility<System<Kokkos::Device<Kokkos::Cuda, Kokkos::CudaSpace>, AoSoA6> > (steps=100)
    at /home/andi/local/src/CabanaMD/master/unit_test/tstIntegrator.hpp:91

and

#3  0x000055555557190b in Kokkos::abort (
    message=0x5555556c2ca8 "Kokkos::View ERROR: attempt to access inaccessible memory space")
    at /home/andi/local/src/kokkos/3.1.01/build/install/include/impl/Kokkos_Error.hpp:175
#4  0x000055555557bc05 in Kokkos::View<double*, Kokkos::Device<Kokkos::Cuda, Kokkos::CudaSpace> >::verify_space<Kokkos::HostSpace, false>::check ()
    at /home/andi/local/src/kokkos/3.1.01/build/install/include/Kokkos_View.hpp:882
#5  Kokkos::View<double*, Kokkos::Device<Kokkos::Cuda, Kokkos::CudaSpace> >::operator()<int> (i0=<optimized out>, this=0x7fffffffc730)
    at /home/andi/local/src/kokkos/3.1.01/build/install/include/Kokkos_View.hpp:1241
#6  Test::createAtoms<System<Kokkos::Device<Kokkos::Cuda, Kokkos::CudaSpace>, AoSoA6> > (num_atom=1000, num_ghost=200, box_min=-12.295999999999999, 
    box_max=10.904)
    at /home/andi/local/src/CabanaMD/master/unit_test/tstNeighbor.hpp:255
#7  0x0000555555573790 in Test::testNeighborListPartialRange<System<Kokkos::Device<Kokkos::Cuda, Kokkos::CudaSpace>, AoSoA6>, NeighborVerlet<System<Kokkos::Device<Kokkos::Cuda, Kokkos::CudaSpace>, AoSoA6>, Cabana::FullNeighborTag, Cabana::VerletLayout2D> > (half_neigh=false) at /home/andi/local/src/CabanaMD/master/unit_test/tstNeighbor.hpp:303

for Integrator_test_CUDA and Neighbor_test_CUDA, respectively.

Sorry for this overly long post... I am out of ideas for now, any help is greatly appreciated!

Thank you!!

Remove HIP workarounds

Once released in Kokkos, remove workarounds for:

  • print_configuration
  • Hierarchical reductions

neighbor_parallel_for

Use Cabana::neighbor_parallel_for, with command-line options to select serial or team parallelization over atom neighbors, for:

  • LJ
  • NNP
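
As a rough illustration of selecting a neighbor-loop mode at run time, the sketch below uses tag dispatch in plain C++. It does not use Cabana or Kokkos: the tag types `SerialNeighTag` and `TeamNeighTag`, the `NeighborList` alias, and the `"serial"`/`"team"` option strings are all hypothetical names chosen for this example, and the "team" variant only chunks the inner loop to mark where a thread team would share the work.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for the two neighbor-loop modes (not Cabana types).
struct SerialNeighTag {};
struct TeamNeighTag {};

// neighbors[i] holds the indices of the neighbors of atom i.
using NeighborList = std::vector<std::vector<std::size_t>>;

// Serial mode: the inner neighbor loop runs sequentially for each atom.
template <class Functor>
void neighborFor(SerialNeighTag, const NeighborList& list, Functor f)
{
    for (std::size_t i = 0; i < list.size(); ++i)
        for (std::size_t j : list[i])
            f(i, j);
}

// Team mode: the inner loop is processed in chunks of team_size. A real
// backend would hand each chunk to a team of threads; here it stays serial.
template <class Functor>
void neighborFor(TeamNeighTag, const NeighborList& list, Functor f,
                 std::size_t team_size = 4)
{
    assert(team_size > 0);
    for (std::size_t i = 0; i < list.size(); ++i)
        for (std::size_t start = 0; start < list[i].size(); start += team_size)
        {
            const std::size_t stop = std::min(start + team_size, list[i].size());
            for (std::size_t k = start; k < stop; ++k)
                f(i, list[i][k]);
        }
}

// Pick the mode from a command-line style string; anything but "team"
// falls back to the serial loop.
template <class Functor>
void runNeighborLoop(const std::string& mode, const NeighborList& list, Functor f)
{
    if (mode == "team")
        neighborFor(TeamNeighTag{}, list, f);
    else
        neighborFor(SerialNeighTag{}, list, f);
}

// Example kernel: count how many (atom, neighbor) pairs are visited.
std::size_t countPairs(const std::string& mode, const NeighborList& list)
{
    std::size_t n = 0;
    runNeighborLoop(mode, list, [&n](std::size_t, std::size_t) { ++n; });
    return n;
}
```

Both modes visit the same pairs, so a force kernel such as LJ or NNP could be passed as the functor without changing its body.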

Illegal instruction error

When I run the cbnMD binary, it fails with an "Illegal instruction" error.
I then tried the unit tests, but they fail with the same error.

Below is the log from running Neighbor_test_SERIAL under gdb:

Missing separate debuginfos, use: yum debuginfo-install glibc-2.28-127.el8.x86_64
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
warning: Loadable section ".note.gnu.property" outside of ELF segments
warning: Loadable section ".note.gnu.property" outside of ELF segments
warning: Loadable section ".note.gnu.property" outside of ELF segments
warning: Loadable section ".note.gnu.property" outside of ELF segments

Program received signal SIGILL, Illegal instruction.
std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char const*> (__end=, __beg=, this=0x7fffffffced0)
at /usr/include/c++/8/bits/basic_string.tcc:225
225 { this->_S_copy_chars(_M_data(), __beg, __end); }
Missing separate debuginfos, use: yum debuginfo-install libgcc-8.3.1-5.1.el8.x86_64 libgomp-8.3.1-5.1.el8.x86_64 libstdc++-8.3.1-5.1.el8.x86_64 nvidia-driver-cuda-libs-470.57.02-1.el8.x86_64 zlib-1.2.11-16.el8_2.x86_64

Add Neighbor class

Add back Neighbor class for increased flexibility, initially with VerletList subclass

Variable number of AoSoA

Create an atom class that enables the user to specify the number and grouping of particle properties in AoSoAs from command line options.

This closes #16 by adding flexibility rather than assuming a general, optimal layout. It also replaces #15, allowing many layout choices rather than only 1 or 6 AoSoAs.
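
To make the layout idea concrete, here is a minimal standard-library sketch of an array-of-structs-of-arrays container. It is not the Cabana `AoSoA` type: the `AoSoA` class below, the tile structs `PosVelTile` and `ScalarTile`, and the one-dimensional `x`/`v` fields are hypothetical simplifications. Grouping properties then amounts to choosing which fields share a tile.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Generic container: particles are stored in tiles of Tile::vector_length
// entries, and each tile keeps every property in its own contiguous array.
template <class Tile>
class AoSoA
{
  public:
    explicit AoSoA(std::size_t n)
        : size_(n),
          tiles_((n + Tile::vector_length - 1) / Tile::vector_length)
    {
    }

    std::size_t size() const { return size_; }
    std::size_t numTiles() const { return tiles_.size(); }

    // Tile that contains particle i.
    Tile& tileOf(std::size_t i)
    {
        assert(i < size_);
        return tiles_[i / Tile::vector_length];
    }

    // Position of particle i inside its tile.
    static std::size_t lane(std::size_t i) { return i % Tile::vector_length; }

  private:
    std::size_t size_;
    std::vector<Tile> tiles_;
};

// Grouped layout: position and velocity live in the same tile.
template <std::size_t V>
struct PosVelTile
{
    static constexpr std::size_t vector_length = V;
    std::array<double, V> x{};
    std::array<double, V> v{};
};

// Split layout: one property per container, e.g. AoSoA<ScalarTile<V>>
// for positions and another one for velocities.
template <std::size_t V>
struct ScalarTile
{
    static constexpr std::size_t vector_length = V;
    std::array<double, V> value{};
};
```

With this shape, `AoSoA<PosVelTile<4>>` keeps both properties together, while two `AoSoA<ScalarTile<4>>` objects keep them apart; which performs better depends on the kernel's access pattern.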

CabanaMD/N2P2 feature limitations

Hi.
I am trying to use the CabanaMD/n2p2 interface for my ML MD simulations. The interface works well for the NNP example, but when I tried it on my own system, it turned out that it does not support some LAMMPS features, such as boundary, and, most importantly, it only works for NVE simulations. Are there any plans to extend the capabilities of the interface? Thanks.

Best,
Mostafa

CMake 3.11+ required for build?

This is only a minor issue which can probably be fixed by simply updating the documentation:

When setting up the CabanaMD build with cmake it fails with this error:

CMake Error at CMakeLists.txt:19 (include):
  include could not find load file:

    FetchContent


CMake Error at CMakeLists.txt:20 (FetchContent_Declare):
  Unknown CMake command "FetchContent_Declare".

I think the reason is that my CMake version is too old: the FetchContent module, which provides FetchContent_Declare, was introduced in CMake 3.11, whereas I am currently using 3.10.2. The CabanaMD documentation says that 3.9+ is sufficient.

I will update my cmake and see whether this will resolve the issue...

MPI comm (26 neighbor)

Improve communication performance by switching from 6-pass/6-neighbor to 1-pass/26-neighbor approach.
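
The difference between the two neighbor sets can be sketched as follows. This is an illustration only, assuming a periodic Cartesian rank grid with row-major rank numbering; the function names and the numbering convention are hypothetical and are not taken from CabanaMD or MPI. With 6 face neighbors, edge and corner data typically has to be forwarded through successive passes, whereas with 26 neighbors each rank can address every adjacent subdomain directly.

```cpp
#include <array>
#include <cassert>
#include <vector>

using Offset = std::array<int, 3>;

// Face neighbors only: +/-1 along a single axis (6 offsets).
std::vector<Offset> faceNeighborOffsets()
{
    std::vector<Offset> out;
    for (int d = 0; d < 3; ++d)
        for (int s : {-1, 1})
        {
            Offset o{0, 0, 0};
            o[d] = s;
            out.push_back(o);
        }
    return out;
}

// Faces, edges and corners: every cell of the surrounding 3x3x3 block
// except the center (26 offsets).
std::vector<Offset> allNeighborOffsets()
{
    std::vector<Offset> out;
    for (int i = -1; i <= 1; ++i)
        for (int j = -1; j <= 1; ++j)
            for (int k = -1; k <= 1; ++k)
                if (i != 0 || j != 0 || k != 0)
                    out.push_back(Offset{i, j, k});
    return out;
}

// Rank of the neighbor at a given offset, assuming periodic wrap-around
// and row-major numbering of a dims[0] x dims[1] x dims[2] grid.
int neighborRank(const std::array<int, 3>& coords, const Offset& off,
                 const std::array<int, 3>& dims)
{
    std::array<int, 3> c{};
    for (int d = 0; d < 3; ++d)
        c[d] = (coords[d] + off[d] + dims[d]) % dims[d];
    return (c[0] * dims[1] + c[1]) * dims[2] + c[2];
}
```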

Enable CabanaMC functionality

  • Break up cmake file into library and executable (to simplify work of adding multiple executables).
  • Create a test for energy calculations.
  • Build class/template-based input method for molecular topologies.
  • Create simplified route to calculation of Delta E-values.

(add frobnitzem)

Remove NNP code and move to n2p2

Steps towards removing the n2p2 specific code:

  • Merge flexible AoSoA #30, templating over System and System_NNP (better interface to main n2p2 kernels)
  • Merge neighbor list update #43, templating over Neighbor
  • Merge update for NNP kernels #46 to use Cabana slices, rather than CabanaMD System
  • Merge template over device type #51 and clean up use of Kokkos Views for NNP
  • Merge into n2p2
  • Remove all NNP files except force_nnp* (wrapping n2p2 functions)

Unit tests

First add initial unit testing, then at least one per class:

  • Integration
  • Neighbor lists
  • Communication
  • Force compute
  • Energy compute
  • Sort
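
One way an integration test can be framed is as a reversibility check: integrate forward, flip the velocities, integrate the same number of steps, and confirm the system returns to its starting state. The sketch below assumes a velocity Verlet scheme with unit mass and uses a one-dimensional harmonic spring as a stand-in force; the `State` struct and all function names are hypothetical and do not reflect the CabanaMD test code.

```cpp
#include <cassert>
#include <cmath>

struct State
{
    double x; // position
    double v; // velocity
};

// Stand-in force: unit harmonic spring (an assumption for this sketch).
inline double force(double x) { return -x; }

// One velocity Verlet step with unit mass.
State velocityVerletStep(State s, double dt)
{
    const double a0 = force(s.x);
    s.x += s.v * dt + 0.5 * a0 * dt * dt;
    const double a1 = force(s.x);
    s.v += 0.5 * (a0 + a1) * dt;
    return s;
}

State integrate(State s, double dt, int steps)
{
    for (int n = 0; n < steps; ++n)
        s = velocityVerletStep(s, dt);
    return s;
}

// Run forward, reverse the velocity, run forward again, and report how far
// the final state is from the initial one.
double reversibilityError(State start, double dt, int steps)
{
    State s = integrate(start, dt, steps);
    s.v = -s.v;
    s = integrate(s, dt, steps);
    s.v = -s.v;
    return std::fabs(s.x - start.x) + std::fabs(s.v - start.v);
}
```

Because velocity Verlet is time-reversible, the returned error should be at the level of floating-point round-off, so a test can compare it against a small tolerance.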
