ecp-copa / examinimd

Molecular dynamics proxy application based on Kokkos

License: Other

Shell 0.15% Makefile 0.55% C++ 95.07% C 3.32% CMake 0.77% CWeb 0.14%
kokkos lammps molecular-dynamics proxy-application

Contributors

athomps, calccrypto, crtrott, junghans, masterleinad, stanmoore1, streeve, tcgermann

examinimd's Issues

Update ExaMiniMD for Kokkos v3.0 promotion

I'm a research computer scientist at the University of Utah, doing contract work for Sandia ABQ. Christian Trott has asked that I update the Kokkos miniApps repository for the Kokkos v3.0 promotion. As I'm not a direct contributor to ExaMiniMD, I cannot self-assign this issue.

I've updated ExaMiniMD and will submit a pull request.

View Bounds Error with Multiple GPUs and LJ

I'm seeing a crash with LJ on more than 1 GPU. It is failing with a bounds error when I use a debug executable:

Using: ForceLJNeighFull Neighbor2D CommMPI BinningKKSort
Atoms: 256000 128000

#Timestep Temperature PotE ETot Time Atomsteps/s
0 1.400000 -6.332812 -4.232820 0.000000 0.000000e+00
10 417750470098862761654662257421581806554862139952389909363017383936.000000 -5.135827 626623257391633461030728237666787479383714598672317509428821622784.000000 0.014616 1.751505e+08
:0: : block: [192,0,0], thread: [0,36,0] Assertion `View bounds error of view Kokkos::SortImpl::BinSortFunctor::bin_count` failed.
:0: : block: [192,0,0], thread: [0,38,0] Assertion `View bounds error of view Kokkos::SortImpl::BinSortFunctor::bin_count` failed.
:0: : block: [1173,0,0], thread: [0,37,0] Assertion `View bounds error of view Kokkos::SortImpl::BinSortFunctor::bin_count` failed.

Build error with Kokkos master

With kokkos/kokkos@120d9ce, I get:

mpicxx -I./ -I/home/junghans/kokkos/core/src -I/home/junghans/kokkos/containers/src -I/home/junghans/kokkos/algorithms/src  --std=c++11 -mavx -O3 -g -DEXAMINIMD_ENABLE_MPI  -c binning_kksort.cpp
binning_kksort.cpp: In constructor ‘BinningKKSort::BinningKKSort(System*)’:
binning_kksort.cpp:3:51: error: no matching function for call to ‘Kokkos::BinSort<Kokkos::View<const double* [3], Kokkos::LayoutRight>, Kokkos::BinOp3D<Kokkos::View<const double* [3], Kokkos::LayoutRight> >, Kokkos::Serial, int>::BinSort()’
 BinningKKSort::BinningKKSort(System* s): Binning(s) {}
                                                   ^
In file included from ./binning_kksort.h:4:0,
                 from binning_kksort.cpp:1:
/home/junghans/kokkos/algorithms/src/Kokkos_Sort.hpp:164:3: note: candidate: Kokkos::BinSort<KeyViewType, BinSortOp, ExecutionSpace, SizeType>::BinSort(Kokkos::BinSort<KeyViewType, BinSortOp, ExecutionSpace, SizeType>::const_key_view_type, BinSortOp, bool) [with KeyViewType = Kokkos::View<const double* [3], Kokkos::LayoutRight>; BinSortOp = Kokkos::BinOp3D<Kokkos::View<const double* [3], Kokkos::LayoutRight> >; ExecutionSpace = Kokkos::Serial; SizeType = int; Kokkos::BinSort<KeyViewType, BinSortOp, ExecutionSpace, SizeType>::const_key_view_type = Kokkos::View<const double* [3], Kokkos::LayoutRight, Kokkos::HostSpace>; typename KeyViewType::array_layout = Kokkos::LayoutRight; typename KeyViewType::memory_space = Kokkos::HostSpace; typename KeyViewType::const_data_type = const double* [3]]
   BinSort(const_key_view_type keys_, BinSortOp bin_op_,
   ^~~~~~~
/home/junghans/kokkos/algorithms/src/Kokkos_Sort.hpp:164:3: note:   candidate expects 3 arguments, 0 provided
/home/junghans/kokkos/algorithms/src/Kokkos_Sort.hpp:95:7: note: candidate: Kokkos::BinSort<Kokkos::View<const double* [3], Kokkos::LayoutRight>, Kokkos::BinOp3D<Kokkos::View<const double* [3], Kokkos::LayoutRight> >, Kokkos::Serial, int>::BinSort(const Kokkos::BinSort<Kokkos::View<const double* [3], Kokkos::LayoutRight>, Kokkos::BinOp3D<Kokkos::View<const double* [3], Kokkos::LayoutRight> >, Kokkos::Serial, int>&)
 class BinSort {
       ^~~~~~~
/home/junghans/kokkos/algorithms/src/Kokkos_Sort.hpp:95:7: note:   candidate expects 1 argument, 0 provided
/home/junghans/kokkos/algorithms/src/Kokkos_Sort.hpp:95:7: note: candidate: Kokkos::BinSort<Kokkos::View<const double* [3], Kokkos::LayoutRight>, Kokkos::BinOp3D<Kokkos::View<const double* [3], Kokkos::LayoutRight> >, Kokkos::Serial, int>::BinSort(Kokkos::BinSort<Kokkos::View<const double* [3], Kokkos::LayoutRight>, Kokkos::BinOp3D<Kokkos::View<const double* [3], Kokkos::LayoutRight> >, Kokkos::Serial, int>&&)
/home/junghans/kokkos/algorithms/src/Kokkos_Sort.hpp:95:7: note:   candidate expects 1 argument, 0 provided
make: *** [Makefile:56: binning_kksort.o] Error 1
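
The log shows that the `BinningKKSort` constructor needs `BinSort()` to exist, while the candidates listed at this Kokkos commit are only the three-argument constructor and the copy and move constructors. One generic way around this kind of error is to hold the object indirectly and construct it once its arguments are known. The sketch below is not the fix adopted in ExaMiniMD: `Sorter` and `BinningSketch` are hypothetical stand-ins (no Kokkos types are used), and `std::unique_ptr` is chosen only because the build above uses `--std=c++11`.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for a sorter type that, like the BinSort in the log,
// offers no default constructor.
class Sorter {
 public:
  Sorter(const std::vector<double>& keys, int nbins, bool sort_within_bins)
      : nkeys_(static_cast<int>(keys.size())),
        nbins_(nbins),
        sort_within_bins_(sort_within_bins) {}
  int nkeys() const { return nkeys_; }
  int nbins() const { return nbins_; }
  bool sort_within_bins() const { return sort_within_bins_; }

 private:
  int nkeys_;
  int nbins_;
  bool sort_within_bins_;
};

// Hypothetical owner class. A direct member `Sorter sorter_;` would require
// Sorter() in every constructor that does not initialize it explicitly.
class BinningSketch {
 public:
  BinningSketch() {}  // compiles: std::unique_ptr is default-constructible

  // Construct the sorter only when its arguments are available.
  void create_binning(const std::vector<double>& keys, int nbins) {
    assert(nbins > 0);
    sorter_.reset(new Sorter(keys, nbins, true));
  }

  bool has_sorter() const { return static_cast<bool>(sorter_); }

  const Sorter& sorter() const {
    assert(sorter_);
    return *sorter_;
  }

 private:
  std::unique_ptr<Sorter> sorter_;
};
```

The alternative is to keep the direct member and pass real arguments in the owner's member initializer list, which only works if the keys already exist when the owner is constructed.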

Add Correctness Checks

We need to add some intrinsic correctness checks, as well as statistical gold file comparisons.
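
As a rough sketch of what the two kinds of check could look like, the code below tests a run's thermo output for finite values and bounded total-energy drift (an intrinsic check) and compares it row by row against stored reference values (a gold comparison). The `ThermoRow` layout mirrors the `#Timestep Temperature PotE ETot` columns printed in the logs on this page; the function names, tolerances, and the choice of energy drift as the intrinsic criterion are assumptions, not an agreed design.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One row of thermo output (hypothetical in-memory form of a log line).
struct ThermoRow {
  int step;
  double temperature;
  double pot_e;
  double e_tot;
};

// Relative comparison with a small absolute floor for values near zero.
inline bool close(double a, double b, double rel_tol) {
  const double abs_tol = 1e-12;
  return std::fabs(a - b) <=
         abs_tol + rel_tol * std::fmax(std::fabs(a), std::fabs(b));
}

// Intrinsic check: all values finite, and total energy within
// max_rel_drift of its initial value at every reported step.
inline bool energy_is_conserved(const std::vector<ThermoRow>& run,
                                double max_rel_drift) {
  if (run.empty()) return false;
  const double e0 = run.front().e_tot;
  for (const ThermoRow& r : run) {
    if (!std::isfinite(r.temperature) || !std::isfinite(r.pot_e) ||
        !std::isfinite(r.e_tot)) {
      return false;
    }
    if (std::fabs(r.e_tot - e0) > max_rel_drift * std::fabs(e0)) return false;
  }
  return true;
}

// Gold comparison: same steps, and every column within rel_tol.
inline bool matches_gold(const std::vector<ThermoRow>& run,
                         const std::vector<ThermoRow>& gold, double rel_tol) {
  if (run.size() != gold.size()) return false;
  for (std::size_t i = 0; i < run.size(); ++i) {
    if (run[i].step != gold[i].step) return false;
    if (!close(run[i].temperature, gold[i].temperature, rel_tol)) return false;
    if (!close(run[i].pot_e, gold[i].pot_e, rel_tol)) return false;
    if (!close(run[i].e_tot, gold[i].e_tot, rel_tol)) return false;
  }
  return true;
}
```

A drift check of this kind would reject output such as the step-10 row in the multi-GPU LJ report above, where the total energy is of order 1e65.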

Problems with CUDA device

I tried to run the provided LJ example on a GPU:

./cbnMD -il in.lj --device-type CUDA

The job crashed with the following error message:

Kokkos::Cuda::initialize WARNING: Cuda is allocating into UVMSpace by default
                                  without setting CUDA_LAUNCH_BLOCKING=1.
                                  The code must call Cuda().fence() after each kernel
                                  or will likely crash when accessing data on the host.
terminate called after throwing an instance of 'std::runtime_error'
  what():  cudaDeviceSynchronize() error( cudaErrorIllegalAddress): an illegal memory access was encountered /.../kokkos/core/src/Cuda/Kokkos_Cuda_Instance.cpp:144
Traceback functionality not available

What did I do wrong?

MPI Issues

I'm seeing some MPI issues. On Cray, I get an error unless I set MPICH_NO_BUFFER_ALIAS_CHECK=1:

PMPI_Scan(695): MPI_Scan(sbuf=0x7fffffff3878, rbuf=0x7fffffff3878, count=1, MPI_INT, MPI_SUM, MPI_COMM_WORLD) failed
PMPI_Scan(672): Buffers must not be aliased. Consider using MPI_IN_PLACE or setting MPICH_NO_BUFFER_ALIAS_CHECK

On my Linux box with MPICH I get a similar error:

Fatal error in PMPI_Scan: Internal MPI error!, error stack:
PMPI_Scan(639)........: MPI_Scan(sbuf=0x7fffffffdaa0, rbuf=0x7fffffffdaa0, count=1, MPI_INT, MPI_SUM, MPI_COMM_WORLD) failed
MPIR_Scan_impl(506)...:
MPIR_Scan_generic(155):
MPIR_Localcopy(357)...: memcpy arguments alias each other, dst=0x7fffffffdaa0 src=0x7fffffffdaa0 len=4

Add LJ variant with Intensity Dial

This is about adding a variant of LJ where the computational intensity of the force calculation can be chosen, for example by doing the actual pair interaction (i.e. the code inside the cutoff check) multiple times. The goal is to be able to change the ratio of force compute to communication, and even the amount of work done per atomic add when using a half neighbor list.
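
One possible shape for such a dial, as a minimal serial sketch: the pair interaction inside the cutoff check is repeated `intensity` times and the accumulated force is divided by `intensity` afterwards, so the physics is unchanged while the arithmetic per neighbor grows. The function name, the `Vec3` type, and the rescaling step are assumptions made for illustration; none of them come from ExaMiniMD.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 {
  double x, y, z;
};

// LJ force on atom i from a full neighbor list, with the body of the cutoff
// check executed `intensity` times. Uses U = 4*eps*((s/r)^12 - (s/r)^6).
inline Vec3 lj_force_intensity(const std::vector<Vec3>& pos, std::size_t i,
                               const std::vector<std::size_t>& neighbors,
                               double epsilon, double sigma, double cutoff,
                               int intensity) {
  assert(intensity >= 1);
  const double cutsq = cutoff * cutoff;
  const double sigma2 = sigma * sigma;
  Vec3 f = {0.0, 0.0, 0.0};
  for (std::size_t j : neighbors) {
    const double dx = pos[i].x - pos[j].x;
    const double dy = pos[i].y - pos[j].y;
    const double dz = pos[i].z - pos[j].z;
    const double rsq = dx * dx + dy * dy + dz * dz;
    if (rsq < cutsq) {
      // The dial: repeat the pair interaction itself.
      for (int rep = 0; rep < intensity; ++rep) {
        const double sr2 = sigma2 / rsq;
        const double sr6 = sr2 * sr2 * sr2;
        // fpair = |F| / r = 24*eps*(2*(s/r)^12 - (s/r)^6) / r^2
        const double fpair = 24.0 * epsilon * sr6 * (2.0 * sr6 - 1.0) / rsq;
        f.x += dx * fpair;
        f.y += dy * fpair;
        f.z += dz * fpair;
      }
    }
  }
  // Undo the repetition so the result matches intensity == 1.
  const double inv = 1.0 / intensity;
  f.x *= inv;
  f.y *= inv;
  f.z *= inv;
  return f;
}
```

An optimizing compiler may hoist the loop-invariant arithmetic out of the repetition loop, so a real variant would probably need each repetition to depend on something that changes.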

Add HalfNeighbor List support

Currently only the full neighbor list approach is supported. We need to add half neighbor list support, including forward communication of forces on halo particles.
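
To make the requirement concrete, here is a minimal single-process sketch. With a half list each pair is stored once and the force is applied to both atoms, so forces also accumulate on halo atoms and must be returned to the atoms that own them. Everything here is a toy assumption: positions are 1D, `pair_force` is an arbitrary antisymmetric function rather than LJ, and `fold_ghost_forces` replaces the MPI exchange with a local index map.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy antisymmetric pair force in 1D: force on i due to j (repulsive, ~1/r^3).
inline double pair_force(double xi, double xj) {
  const double dx = xi - xj;
  assert(dx != 0.0);
  return dx / (dx * dx * dx * dx);
}

// Half list: each (i, j) pair appears once and updates both atoms.
// Index j may refer to a halo atom stored after the owned atoms.
inline void compute_half(
    const std::vector<double>& x,
    const std::vector<std::pair<std::size_t, std::size_t> >& half_pairs,
    std::vector<double>& f) {
  assert(f.size() == x.size());
  for (std::size_t p = 0; p < half_pairs.size(); ++p) {
    const std::size_t i = half_pairs[p].first;
    const std::size_t j = half_pairs[p].second;
    const double fij = pair_force(x[i], x[j]);
    f[i] += fij;
    f[j] -= fij;  // Newton's third law
  }
}

// Stand-in for the force communication step: add the force accumulated on
// each halo atom to the owned atom it is a copy of, then clear the halo slot.
inline void fold_ghost_forces(std::vector<double>& f, std::size_t nlocal,
                              const std::vector<std::size_t>& ghost_owner) {
  assert(f.size() == nlocal + ghost_owner.size());
  for (std::size_t g = 0; g < ghost_owner.size(); ++g) {
    assert(ghost_owner[g] < nlocal);
    f[ghost_owner[g]] += f[nlocal + g];
    f[nlocal + g] = 0.0;
  }
}
```

With a full list the second update and the fold step are unnecessary, at the price of evaluating every pair twice.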

Add Rectangular NeighborList

Both MiniMD and LAMMPS use rectangular neighbor lists in threaded mode instead of CSR. That means the neighbor list is a simple 2D array. This allows for a simple single-pass neighbor list construction at the cost of increased memory consumption.
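
The layout described above can be sketched as follows. Each atom gets a fixed-width row of `max_neighs` slots plus a count, so the list can be filled in one pass without the counting pass and prefix sum that a CSR layout needs. The type and function names are hypothetical, and the brute-force O(N^2) search stands in for the binned search a real code would use.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 {
  double x, y, z;
};

// Rectangular neighbor list: a natoms x max_neighs array stored row-major,
// with count[i] giving the number of valid entries in row i.
struct NeighList2D {
  std::size_t natoms;
  std::size_t max_neighs;
  std::vector<int> count;
  std::vector<int> neighs;

  NeighList2D(std::size_t n, std::size_t m)
      : natoms(n), max_neighs(m), count(n, 0), neighs(n * m, -1) {}

  int& at(std::size_t i, std::size_t k) {
    assert(i < natoms && k < max_neighs);
    return neighs[i * max_neighs + k];
  }
};

// Single-pass build of a full list. Returns false if any row overflowed;
// the caller would then enlarge max_neighs and rebuild.
inline bool build_full(const std::vector<Vec3>& pos, double cutoff,
                       NeighList2D& list) {
  assert(pos.size() == list.natoms);
  const double cutsq = cutoff * cutoff;
  bool fits = true;
  for (std::size_t i = 0; i < pos.size(); ++i) {
    std::size_t n = 0;
    for (std::size_t j = 0; j < pos.size(); ++j) {
      if (i == j) continue;
      const double dx = pos[i].x - pos[j].x;
      const double dy = pos[i].y - pos[j].y;
      const double dz = pos[i].z - pos[j].z;
      if (dx * dx + dy * dy + dz * dz < cutsq) {
        if (n < list.max_neighs) {
          list.at(i, n) = static_cast<int>(j);
          ++n;
        } else {
          fits = false;  // row is full; this neighbor is dropped
        }
      }
    }
    list.count[i] = static_cast<int>(n);
  }
  return fits;
}
```

Memory use is `natoms * max_neighs` entries regardless of how many neighbors each atom actually has, which is the cost mentioned above.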

Add Bonded Interaction

We need a representation of bond and angle forces as they are commonplace in bio simulations.
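
A minimal sketch of one possible representation, covering bonds only: an explicit list of atom pairs, each referring to a parameter set, evaluated with a harmonic potential E = k (r - r0)^2 (the convention where the factor 1/2 is absorbed into k). The struct and function names are hypothetical, and the harmonic form is an assumption, since the issue does not name a functional form. Angle terms would follow the same pattern with atom triples.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 {
  double x, y, z;
};

// One bond: indices of the two atoms and an index into the type table.
struct Bond {
  std::size_t i;
  std::size_t j;
  std::size_t type;
};

// Harmonic parameters for one bond type: E = k * (r - r0)^2.
struct BondType {
  double k;
  double r0;
};

// Adds bond forces into f and returns the total bond energy.
inline double compute_bonds(const std::vector<Vec3>& pos,
                            const std::vector<Bond>& bonds,
                            const std::vector<BondType>& types,
                            std::vector<Vec3>& f) {
  assert(f.size() == pos.size());
  double energy = 0.0;
  for (std::size_t b = 0; b < bonds.size(); ++b) {
    const Bond& bond = bonds[b];
    assert(bond.type < types.size());
    const BondType& t = types[bond.type];
    const double dx = pos[bond.i].x - pos[bond.j].x;
    const double dy = pos[bond.i].y - pos[bond.j].y;
    const double dz = pos[bond.i].z - pos[bond.j].z;
    const double r = std::sqrt(dx * dx + dy * dy + dz * dz);
    assert(r > 0.0);
    const double dr = r - t.r0;
    // Force on i is -dE/dr along (r_i - r_j)/r; fmag already includes 1/r.
    const double fmag = -2.0 * t.k * dr / r;
    f[bond.i].x += dx * fmag;
    f[bond.i].y += dy * fmag;
    f[bond.i].z += dz * fmag;
    f[bond.j].x -= dx * fmag;
    f[bond.j].y -= dy * fmag;
    f[bond.j].z -= dz * fmag;
    energy += t.k * dr * dr;
  }
  return energy;
}
```

Because each bond updates both of its atoms, a threaded version faces the same write conflicts on the force array as a half neighbor list.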

Add LJ variant for HalfNeighbor list which uses data replication

The OpenMP package in LAMMPS uses data replication to avoid write conflicts on the force array. This works very well for small numbers of threads. It may be worthwhile to explore combining data replication with atomics at higher thread counts (as on GPUs) to reduce the conflict rate while keeping the amount of replicated data limited.
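
The replication idea can be sketched as below: every worker accumulates into a private copy of the force array, and the copies are summed afterwards, so no two workers ever write to the same location. This is an illustration only, not the LAMMPS implementation. The function name and the precomputed `pair_force` values are assumptions, and the workers are run one after another in a plain loop; in a threaded build each chunk would run on its own thread.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Accumulates half-list pair forces (1D for brevity) using one private force
// array per worker, followed by a reduction into a single array.
inline std::vector<double> accumulate_replicated(
    std::size_t natoms,
    const std::vector<std::pair<std::size_t, std::size_t> >& half_pairs,
    const std::vector<double>& pair_force, std::size_t nworkers) {
  assert(nworkers >= 1);
  assert(pair_force.size() == half_pairs.size());

  // replicas[w] is written only by worker w, so no atomics are needed.
  std::vector<std::vector<double> > replicas(
      nworkers, std::vector<double>(natoms, 0.0));

  const std::size_t npairs = half_pairs.size();
  for (std::size_t w = 0; w < nworkers; ++w) {
    // Contiguous chunk of pairs assigned to worker w.
    const std::size_t begin = npairs * w / nworkers;
    const std::size_t end = npairs * (w + 1) / nworkers;
    for (std::size_t p = begin; p < end; ++p) {
      const std::size_t i = half_pairs[p].first;
      const std::size_t j = half_pairs[p].second;
      assert(i < natoms && j < natoms);
      replicas[w][i] += pair_force[p];
      replicas[w][j] -= pair_force[p];
    }
  }

  // Reduction: sum the private copies. This step can be parallelized over
  // atoms without conflicts.
  std::vector<double> f(natoms, 0.0);
  for (std::size_t w = 0; w < nworkers; ++w) {
    for (std::size_t a = 0; a < natoms; ++a) {
      f[a] += replicas[w][a];
    }
  }
  return f;
}
```

The extra memory and reduction work both scale as `natoms * nworkers`, which is why full replication becomes unattractive at GPU-scale thread counts and a hybrid with atomics might be worth exploring.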

View Bounds Error

When I enable view bounds checking, I get the abort shown below. This only happens when running on many processors. The issue was already present before the fix for the atom array size in fbe8b07. It doesn't seem to affect the numerics, and it doesn't segfault when bounds checking is off.

Using: ForceLJNeighHalf Neighbor2D CommMPI BinningKKSort
Atoms: 2048000 32000

#Timestep Temperature PotE ETot Time Atomsteps/s
0 1.400000 -6.332812 -4.232813 0.000000 0.000000e+00
terminate called after throwing an instance of 'std::runtime_error'
  what():  View bounds error of view  ( 4032 < 4032 , 0 < 3 )
Traceback functionality not available

terminate called after throwing an instance of 'std::runtime_error'
  what():  View bounds error of view  ( 4032 < 4032 , 0 < 3 )
Traceback functionality not available
