vpic-kokkos's Introduction

Welcome to the Kokkos version of the Vector Particle-in-Cell code, VPIC 2.0! VPIC is a 3D3V, fully relativistic, kinetic, performance-first PIC code for solving the coupled Maxwell-Boltzmann system of equations. Utilizing the Kokkos performance-portable framework, VPIC achieves high performance on multiple CPU and GPU architectures.

The primary documentation for VPIC has moved to Sphinx. It is hosted on GitHub Pages, and its source is located in docs/. The documentation is still a work in progress but should be sufficient to get most users started.

Attribution

Researchers who use the VPIC code for scientific research are asked to cite the papers listed below.

  1. Bird, R., Tan, N., Luedtke, S. V., Harrell, S. L., Taufer, M., & Albright, B. (2021). VPIC 2.0: Next generation particle-in-cell simulations. IEEE Transactions on Parallel and Distributed Systems, 33(4), 952-963.

  2. Bowers, K. J., B. J. Albright, B. Bergen, L. Yin, K. J. Barker and D. J. Kerbyson, "0.374 Pflop/s Trillion-Particle Kinetic Modeling of Laser Plasma Interaction on Road-runner," Proc. 2008 ACM/IEEE Conf. Supercomputing (Gordon Bell Prize Finalist Paper). http://dl.acm.org/citation.cfm?id=1413435

  3. K.J. Bowers, B.J. Albright, B. Bergen and T.J.T. Kwan, Ultrahigh performance three-dimensional electromagnetic relativistic kinetic plasma simulation, Phys. Plasmas 15, 055703 (2008); http://dx.doi.org/10.1063/1.2840133

  4. K.J. Bowers, B.J. Albright, L. Yin, W. Daughton, V. Roytershteyn, B. Bergen and T.J.T Kwan, Advances in petascale kinetic simulations with VPIC and Roadrunner, Journal of Physics: Conference Series 180, 012055, 2009

Copyright

© 2022. Triad National Security, LLC. All rights reserved. This program was produced under U.S. Government contract 89233218CNA000001 for Los Alamos National Laboratory (LANL), which is operated by Triad National Security, LLC for the U.S. Department of Energy/National Nuclear Security Administration. All rights in the program are reserved by Triad National Security, LLC, and the U.S. Department of Energy/National Nuclear Security Administration. The Government is granted for itself and others acting on its behalf a nonexclusive, paid-up, irrevocable worldwide license in this material to reproduce, prepare derivative works, distribute copies to the public, perform publicly and display publicly, and to permit others to do so.

This program is open source under the BSD-3 License. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

  3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

vpic-kokkos's Issues

Emitters and particle injectors need to move particles

In some use cases, such as open boundaries, emitters and particle injectors are used to inject a net current in addition to a particle flux. To do this properly, injected particles must be moved off the cell face and accumulated. Without accumulation, the local current is inconsistent with curl(B), which leads to badly incorrect results. Injected particles may also leave the local domain, and this should be allowed as well. Moving particles from emitters and injectors is difficult right now because of the way device/host copies are staged.

Device-side emission

To support device-side emission, the following transfer from sp->k_pc_d to sp->k_pc_h

KOKKOS_TIC();
// I need to know the number of movers that got populated so I can call the
// compress. Let's copy it back
Kokkos::deep_copy(sp->k_nm_h, sp->k_nm_d);
// TODO: which way round should this copy be?
// int nm = sp->k_nm_h(0);
// printf("nm = %d \n", nm);
// Copy particle mirror movers back so we have their data safe. Ready for
// boundary_p_kokkos
auto pc_d_subview = Kokkos::subview(sp->k_pc_d, std::make_pair(0, sp->k_nm_h(0)), Kokkos::ALL);
auto pci_d_subview = Kokkos::subview(sp->k_pc_i_d, std::make_pair(0, sp->k_nm_h(0)));
auto pc_h_subview = Kokkos::subview(sp->k_pc_h, std::make_pair(0, sp->k_nm_h(0)), Kokkos::ALL);
auto pci_h_subview = Kokkos::subview(sp->k_pc_i_h, std::make_pair(0, sp->k_nm_h(0)));
Kokkos::deep_copy(pc_h_subview, pc_d_subview);
Kokkos::deep_copy(pci_h_subview, pci_d_subview);
// Kokkos::deep_copy(sp->k_pc_h, sp->k_pc_d);
// Kokkos::deep_copy(sp->k_pc_i_h, sp->k_pc_i_d);
KOKKOS_TOC( PARTICLE_DATA_MOVEMENT, 1);

must occur after apply_emitter_list() and user_particle_injection(). Also, since emission/injection typically require random numbers, #58 would be helpful.

Host-side emission

Support for host-side emission is far more complicated. At a minimum, it requires the following:

  1. Instead of copying the whole particle array, logic should be added around apply_emitter_list() and user_particle_injection() to monitor sp->np and sp->nm for changes. New particles and movers would be copied to the device. Emission may generate movers, and these need to be on the device for the compressor to work correctly.
  2. Copy interpolator memory to host before user_particle_injection() if kokkos_particle_injection == false. Emitters should be responsible for doing this themselves in order to support both device and host-side emitters, but this requires #50 since emitters do not have access to the copy methods currently.
  3. Host-side accumulators must be cleared before apply_emitter_list() and user_particle_injection(), not after:
    if( emitter_list )
    {
      TIC apply_emitter_list( emitter_list ); TOC( emission_model, 1 );
    }
    if((particle_injection_interval>0) && ((step() % particle_injection_interval)==0)) {
      if(!kokkos_particle_injection) {
        KOKKOS_TIC();
        KOKKOS_COPY_PARTICLE_MEM_TO_HOST(species_list);
        KOKKOS_TOC(PARTICLE_DATA_MOVEMENT, 1);
      }
      TIC user_particle_injection(); TOC( user_particle_injection, 1 );
      if(!kokkos_particle_injection) {
        KOKKOS_TIC();
        KOKKOS_COPY_PARTICLE_MEM_TO_DEVICE(species_list);
        KOKKOS_TOC(PARTICLE_DATA_MOVEMENT, 1);
      }
    }
    bool accumulate_in_place = false; // This has to be outside the scoped timing block
    KOKKOS_TIC(); // Time this data movement
    // This could technically be done once per simulation, not every timestep
    if (accumulator_array->k_a_h.data() == accumulator_array->k_a_d.data() )
    {
      accumulate_in_place = true;
    }
    else {
      // Zero out the host accumulator
      Kokkos::deep_copy(accumulator_array->k_a_h, 0.0f);
    }
    KOKKOS_TOC( ACCUMULATOR_DATA_MOVEMENT, 1);
  4. One of the following:
    • Allow accumulation into accumulator_array->k_a_h from move_p() (the host-side version).
    • clear_accumulator_array() must be called every step in the advance loop (next to the deep_copy above) and combine_accumulators() must first combine a reduced accumulator_array->a with accumulator_array->k_a_h before copying to device.

I think this is everything required for host-side emission to work, but there might be things I missed.
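
Item 1 above (copying only newly created particles rather than the whole array) could look roughly like the sketch below. This is a minimal illustration, not VPIC code: Particle, SpeciesMirror, inject, and copy_new_particles_to_device are hypothetical names, and plain std::vector objects stand in for the host and device particle arrays.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins; these are not VPIC's real types.
struct Particle { float dx, dy, dz, ux, uy, uz, w; int cell; };

struct SpeciesMirror {
  std::vector<Particle> host;    // plays the role of the host particle array
  std::vector<Particle> device;  // plays the role of the device particle array
  std::size_t np = 0;            // number of live particles on the host
};

// Hypothetical injection routine: appends n particles on the host only.
void inject(SpeciesMirror& sp, std::size_t n, float weight) {
  for (std::size_t i = 0; i < n; ++i) {
    sp.host.push_back(Particle{0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, weight, 0});
    ++sp.np;
  }
}

// Copy only the particles appended since np_before, not the whole array.
// Returns the number of particles transferred.
std::size_t copy_new_particles_to_device(SpeciesMirror& sp, std::size_t np_before) {
  assert(np_before <= sp.np && sp.np <= sp.host.size());
  if (sp.device.size() < sp.np) sp.device.resize(sp.np);
  std::copy(sp.host.begin() + np_before, sp.host.begin() + sp.np,
            sp.device.begin() + np_before);
  return sp.np - np_before;
}
```

The caller records np before invoking the injection routine and passes that value afterwards. The same pattern would apply to movers by tracking nm.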

k_* and kokkos_* confusion

compute_div_e_err.cc line 334 and vacuum_compute_div_e_err.cc line 327 use different versions of the same function. Which is better?

Strided sort memory usage

I'm seeing much higher memory usage while sorting with the strided sort than standard. I get errors like this:

  what():  Kokkos failed to allocate memory for label "Kokkos::SortImpl::BinSortFunctor::sorted_values".  Allocation using MemorySpace named "Cuda" failed with the following error:  Allocation of size 22.25 M failed, likely due to insufficient memory.  (The allocation mechanism was cudaMalloc().  The Cuda allocation returned the error code ""cudaErrorMemoryAllocation".)

spikes in hydro outputs

The hydro outputs have some problems. For example, viy has spikes at the corners of the local domains.

The issue can be fixed by switching to the Kokkos version of synchronize_hydro_array in src/vpic/dump.cc, that is, by changing

hydro_array->copy_to_host();
synchronize_hydro_array( hydro_array );

to

synchronize_hydro_array_kokkos(hydro_array);
hydro_array->copy_to_host();

The test was done on Perlmutter@NERSC. It was a reconnection run with 4 GPU nodes (16 GPUs in total).
vpic-kokkos was configured using:

CRAYPE_LINK_TYPE=dynamic cmake \
  -DCMAKE_BUILD_TYPE=Release \
  -DENABLE_INTEGRATED_TESTS=ON \
  -DENABLE_UNIT_TESTS=ON \
  -DBUILD_INTERNAL_KOKKOS=ON \
  -DENABLE_KOKKOS_CUDA=ON \
  -DKokkos_ARCH_AMPERE80=ON \
  -DKokkos_ARCH_ZEN3=ON \
  -DKokkos_ENABLE_PTHREAD=OFF \
  -DKokkos_ENABLE_OPENMP=ON \
  -DCMAKE_CXX_COMPILER="CC" \
  -DCMAKE_CXX_FLAGS="-g -rdynamic" \
  $src_dir

Checkpoint code allocates too little space for new registry entries

This was discovered on the hybridVPIC branch, but should apply to the main branch as well.

The functions register_object and restore_objects each only allocate one byte for new registry entries. This leads to intermittent crashes of the checkpoint code, and to the error "cannot checkpoint a pointer to an unregistered object".

This can be fixed by changing lines 191 and 304 in src/util/checkpt/checkpt.c from:
MALLOC( node, 1 );
to:
MALLOC( node, sizeof(*node) );

As with many memory-related bugs, this one can be a little hard to reproduce. It seems to be triggered in particular with GCC compilers.
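
The sketch below illustrates the difference between the two calls. It assumes, as the proposed fix implies, that the second argument of MALLOC in checkpt.c is a size in bytes. registry_node, malloc_bytes, push_node, and free_list are hypothetical names, and the real node layout may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical registry node; the real layout in checkpt.c may differ.
struct registry_node {
  const void* object;
  std::size_t id;
  registry_node* next;
};

// Byte-count allocator in the spirit of the MALLOC( node, n ) calls above
// (assumption: the second argument is a size in bytes).
template <typename T>
void malloc_bytes(T*& ptr, std::size_t n_bytes) {
  ptr = static_cast<T*>(std::malloc(n_bytes));
}

// Allocate and link a new entry, requesting room for the whole struct.
registry_node* push_node(registry_node* head, const void* object, std::size_t id) {
  registry_node* node = nullptr;
  malloc_bytes(node, sizeof(*node));  // malloc_bytes(node, 1) would be too small
  assert(node != nullptr);
  node->object = object;
  node->id = id;
  node->next = head;
  return node;
}

// Release every node in the list.
void free_list(registry_node* head) {
  while (head != nullptr) {
    registry_node* next = head->next;
    std::free(head);
    head = next;
  }
}
```

With a one-byte request, the three member writes in push_node would land outside the allocation. Whether that crashes depends on the allocator, which is consistent with the intermittent failures described above.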

make install issues

Making binary installations with make install has some issues:

  1. Public header files are not installed
  2. If BUILD_INTERNAL_KOKKOS = ON then the installed compile script references locations in the source and build directories instead of their installed counterparts. (e.g., srcdir/kokkos/bin/nvcc_wrapper instead of installdir/bin/nvcc_wrapper)

Add field ionization

Add field ionization physics to VPIC:

  • Calculate the ionization rate (multiphoton ionization, ADK ionization, and BSI ionization) based on the interpolated E-field
  • Compare ionization probability and uniform random number to decide if a given ionization event occurs.
  • Add functionality to check for multiple ionization events in a single timestep.
  • Change the particle's charge to make it the appropriate ion.
  • Add a macro electron with the appropriate weight.
  • Account for energy conservation due to energy loss from the field.
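
The second and third bullets could be sketched as follows. This is only an illustration under stated assumptions: ionization is treated as a Poisson process, so the per-step probability is 1 - exp(-rate * dt); the rate itself comes from a caller-supplied placeholder, rate_for_charge, rather than from any specific multiphoton, ADK, or BSI formula; and the full dt is reused for each successive event, which is a simplification. All names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <random>

// Probability of at least one event in dt for a constant rate
// (assumption: ionization modeled as a Poisson process).
double ionization_probability(double rate, double dt) {
  return 1.0 - std::exp(-rate * dt);
}

// Number of ionization events for one macro-ion in one step.
// rate_for_charge(q) is a placeholder for the physical rate at charge
// state q; the result is how many electrons the caller should emit.
template <typename Rng>
int sample_ionizations(int charge, int max_charge, double dt,
                       const std::function<double(int)>& rate_for_charge,
                       Rng& rng) {
  std::uniform_real_distribution<double> uniform(0.0, 1.0);
  int events = 0;
  while (charge + events < max_charge) {
    const double p = ionization_probability(rate_for_charge(charge + events), dt);
    if (uniform(rng) >= p) break;  // no further ionization this step
    ++events;                      // simplification: full dt reused per event
  }
  return events;
}
```

The caller would then raise the ion's charge by the returned count and add macro electrons of the matching weight, as the later bullets describe.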

Kokkos version in Travis CI

The Travis CI does not test VPIC with the version of Kokkos that the vpic-kokkos repo points to, so a git clone --recursive may not work.

Separate mp_t and mp_k_t types

The mp_t struct is used differently by the traditional MPI routines and the *_k MPI routines. They should be different types for safety.
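
A minimal sketch of the idea, with heavily simplified and hypothetical struct contents and function names (begin_send, begin_send_k): once the two buffers are distinct types, passing one to the other family of routines becomes a compile-time error rather than a silent misuse.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical, simplified buffers; the real mp_t has more members.
struct mp_t   { std::vector<char> send_buffer; };  // traditional MPI routines
struct mp_k_t { std::vector<char> send_buffer; };  // *_k MPI routines

// Each family of routines accepts only its own type.
std::size_t begin_send(mp_t& mp, std::size_t bytes) {
  mp.send_buffer.resize(bytes);
  return mp.send_buffer.size();
}

std::size_t begin_send_k(mp_k_t& mp, std::size_t bytes) {
  mp.send_buffer.resize(bytes);
  return mp.send_buffer.size();
}

// The compiler now rejects mixing the two: neither converts to the other.
static_assert(!std::is_same<mp_t, mp_k_t>::value, "types must be distinct");
static_assert(!std::is_convertible<mp_k_t&, mp_t&>::value,
              "an mp_k_t must not be usable where an mp_t is expected");
```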

cmake configuration and kokkos versions

The CMake configuration needs to be updated, or tested across different versions:

cmake_minimum_required(VERSION 3.9)
cmake_policy(SET CMP0074 NEW)

Policy CMP0074 was introduced in CMake 3.12, so with versions 3.9 through 3.11 this cmake_policy call raises an error.

Using -DBUILD_INTERNAL_KOKKOS=ON with a recently updated Kokkos submodule and nvcc 10.2 raises an error that C++14 is required. I had to change to

set(CMAKE_CXX_STANDARD 14)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

to make it work.

About the input_deck

Dear developer,

After generating the vpic file in ~/vpic/kokkos/build/bin/, the next step in the README file is "% bin/vpic input_deck".
Could you please show an example of which file to use as input_deck, e.g., the shock example or the 2D dipole example?

Thank you very much!

Best regards,
Zhongwei

particle tracking

Hello,

I have implemented particle tracking in my input deck. To do this in a way consistent with current VPIC, I am using the particle weight to store the particle tag. But to make sure that these test particles do not generate currents, they need to be defined in a separate species list and managed in the input deck for the particle push (using a dummy field array) and boundary handling, which works but is cumbersome.

I wonder whether a future version of VPIC will support a better way to track particles, e.g., allowing the user to skip current deposition in the pusher for some species, or defining a separate variable for tagging particles.

Thanks!

Improve Collisions PR

We have a WIP PR for adding TA collisions, but we need the following improvements:

  • The Kokkos uint() random draw has the wrong bounds; the range needs to include 0.
  • Re-seed collisions once per step, not once per collision (or not at all, by keeping a persistent generator).
  • The RNG is a hard-coded Kokkos RNG; it should use the RNG_strategy abstraction.
  • We need to figure out what effect the different RNG pools have (i.e., Random_XorShift64 vs. Random_XorShift1024).
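
The first two bullets can be illustrated with the standard library standing in for the Kokkos RNG. This is a sketch under that assumption, not the PR's code: PersistentRng and uniform_uint are hypothetical names, and std::mt19937_64 replaces the Kokkos pools.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Stand-in for an RNG strategy: one generator, seeded once, then reused
// for every draw instead of being re-seeded per collision.
class PersistentRng {
 public:
  explicit PersistentRng(std::uint64_t seed) : engine_(seed) {}

  // Uniform integer in the closed range [0, max_inclusive]; 0 is a valid draw.
  std::uint32_t uniform_uint(std::uint32_t max_inclusive) {
    std::uniform_int_distribution<std::uint32_t> dist(0, max_inclusive);
    return dist(engine_);
  }

 private:
  std::mt19937_64 engine_;
};
```

Keeping the engine as a member makes the seeding policy explicit: the seed is consumed once in the constructor, and every later draw advances the same stream.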

Reduce accumulator data movement

We currently copy accumulator data back from the device before boundary_p (which adds to it), and then copy it back to the device.

Instead, we could copy the data only from host to device and sum it into the existing data that is already on the GPU.

This should speed things up by ~2x with respect to the ACCUMULATOR_DATA_MOVEMENT timer.
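
A rough sketch of the proposed one-way scheme, with plain vectors standing in for the device and host accumulator arrays. sum_host_into_device is a hypothetical name, and the sketch assumes the host accumulator was zeroed before the host-side work added to it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Add the host-side contributions into the data already on the device.
// Only host_delta crosses the host/device boundary, in one direction.
// Returns the number of values transferred.
std::size_t sum_host_into_device(std::vector<float>& device,
                                 const std::vector<float>& host_delta) {
  assert(device.size() == host_delta.size());
  for (std::size_t i = 0; i < device.size(); ++i) {
    device[i] += host_delta[i];
  }
  return host_delta.size();
}
```

For an array of N values, the current round trip moves 2N values while this scheme moves N, which is where the estimated factor of two comes from.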

hybridVPIC fails CI

Branch hybridVPIC is failing CI because it doesn't have Kokkos as a submodule. I don't know how it passed CI in PR #92.

compile issue on NERSC

I am testing the VPIC kokkos version on NERSC and encounter the following issue when making the integrated tests.

make[1]: *** [CMakeFiles/Makefile2:400: test/integrated/to_completion/CMakeFiles/dump.dir/all] Error 2
/global/homes/h/huangck/CFS/VPIC/vpic-kokkos/deck/wrapper.cc:7:41: fatal error: /((struct user_global_t *)user_global)/homes/h/huangck/CFS/VPIC/vpic-kokkos/test/integrated/to_completion/simple.deck: No such file or directory
6 | #include EXPAND_AND_STRINGIFY(INPUT_DECK)
| ^
compilation terminated.

The problem seems to be related to the path name of the input deck containing "global", which is expanded by the preprocessor. NERSC's file systems all start with the top-level directory "global". I worked around this by undefining global before including the input deck and redefining it in the deck again. I wonder if there is a better fix?
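
The effect can be reproduced in isolation with the sketch below. The deck path is a hypothetical one, and the definition of global is inferred from the expansion visible in the error message, so both are assumptions. The helper macro STRINGIFY and the function contains are illustrative names.

```cpp
#include <cassert>
#include <cstring>

#define STRINGIFY(s) #s
#define EXPAND_AND_STRINGIFY(s) STRINGIFY(s)

// Hypothetical deck path whose first component is the identifier "global".
#define INPUT_DECK /global/homes/user/simple.deck

// Macro shaped like the expansion visible in the error message above.
#define global ((struct user_global_t *)user_global)

// While the macro is defined, the path component is rewritten.
const char* const mangled_path = EXPAND_AND_STRINGIFY(INPUT_DECK);

// Workaround from the report: hide the macro while the path is expanded.
#undef global
const char* const intact_path = EXPAND_AND_STRINGIFY(INPUT_DECK);
// A real deck would re-#define global here before using it.

// Small helper: true if needle occurs in text.
bool contains(const char* text, const char* needle) {
  return std::strstr(text, needle) != nullptr;
}
```

Because the path is a sequence of preprocessor tokens rather than a quoted string, any component that matches a macro name is expanded before stringification. Passing the deck path as an already quoted string would avoid the expansion, at the cost of changing how INPUT_DECK is defined.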