
nalucfd / nalu


Nalu: a generalized unstructured massively parallel low-Mach flow code designed to support a variety of open applications of interest, built on the Sierra Toolkit and the Trilinos Tpetra solver stack. The open-source BSD 3-clause license model has been chosen for the code base. See LICENSE for more information.

Home Page: https://github.com/NaluCFD/Nalu

License: Other

CMake 0.52% C++ 16.80% C 73.54% Shell 0.26% Fortran 1.94% Python 0.14% Assembly 5.50% TeX 1.25% Makefile 0.03% Gnuplot 0.03%
cfd low-mach turbulence les snl-applications

nalu's Introduction


Nalu, a generalized unstructured massively parallel low Mach flow code designed to support energy applications of interest.

The Department of Energy (DOE), by memo dated 05/13/2014, has granted Sandia permission to assert its copyright in software entitled "Nalu v 1.0".

Nalu 1.0


Referencing Nalu

When disseminating technical work that includes Nalu simulations, please reference the following citation:

Domino, S. "Sierra Low Mach Module: Nalu Theory Manual 1.0", SAND2015-3107W, Sandia National Laboratories Unclassified Unlimited Release (UUR), 2015. https://github.com/NaluCFD/NaluDoc

This document can be found under the Nalu Documentation.

Nalu Documentation

The Nalu documentation website is located here.

Building Nalu

Detailed build instructions for Nalu and the accompanying required third-party libraries (TPLs), e.g., Trilinos, YAML, etc., can be found in the Nalu Documentation.

Nightly Testing Status

CDash Logo

Nightly testing results can be seen by clicking the CDash logo above. More details on testing can be found here.

Help and Questions

Projects that are using Nalu should use the formal GitHub issue tracker for any questions or help. Issues are addressed by the Nalu user/developer community subject to availability.

nalu's People

Contributors

alanw0, bartlettroscoe, camelliadpg, crtrott, dcmania, dependabot[bot], djglaze, gantech, gsjaardema, jamelvin, jhux2, jmlamb, jrood-nrel, kilojoules, marchdf, mbarone81, mchurchf, michaelasprague, mulligatawny, overfelt, rcknaus, sayerhs, skennon10, spdomin, stumpjumper, tjotaha, tonyinme, wjhorne


nalu's Issues

Files MeshMotion.h and nso/* missing?

My build of the current master version of Nalu 97f4040d fails with errors like:

/ascldap/users/jhu/nalu/Nalu/src/Realm.C(35): catastrophic error: cannot open source file "MeshMotionInfo.h"
  #include <MeshMotionInfo.h>

and

/ascldap/users/jhu/nalu/Nalu/src/SpecificDissipationRateEquationSystem.C(51): catastrophic error: cannot open source file "nso/ScalarNSOKeElemSuppAlg.h"
  #include <nso/ScalarNSOKeElemSuppAlg.h>

Are these files intended to be part of Nalu?

@spdomin @NaluCFD/core

Normal distance to the wall usage for resolved and wall function simulation

When computing y+, what is the value of Yp that should be used?

The current wall-resolved implementation is using a normal distance = 1/4*normalOpposingEdgeDistance. However, based on the usage of the projected nodal gradient, this distance should simply be normalOpposingEdgeDistance.

For the wall-function approach, more thought is required.

Matrix row size guess is set at 8; may be too small for many use cases

Initial guess at row size is set at 8 even though we can support CVFEM 3x3 momentum:

globallyOwnedGraph_ = Teuchos::rcp(new LinSys::Graph(globallyOwnedRowsMap_, ownedPlusGloballyOwnedRowsMap_, 8));

We should set that final number based on the local and shared row map to get the size of the stencil 100% correct.

ECP 5(c) Overset path forward with mesh movement

For the next quarter, let's move forward with the following distribution of work (subject to negotiation):

  • @mbarone81 is exploring hole cutting algorithms and procedures to decide if Nalu should manage this or a TPL.

  • @spdomin is exploring the interior penalty methodology to remove the constraint aspect of the overset problem that degrades accuracy and is problematic for low-Mach implicit PPE codes.

  • @skennon10 is baselining STK skin mesh and part management approaches and exploring other FEM-like approaches that speak to avoiding this part management aspect in STK.

  • @sayerhs will focus on adding mesh movement to the current constraint-based overset algorithm (starting with the simple heat conduction Rtest problem, only with mesh motion). This will involve defining another set of nodes to reconstruct the background state from the overset state. You can also start considering how the current OversetManager can be generalized to have more background surface, overset surface pairs similar to how the DG scheme is managed.

The immediate goals this year are to explore sliding and overset technologies in the context of the ECP wind application space. Given the implicit solve requirement, we already know that sliding mesh affords exploitation of matrix rows that are being modified (which lends to optimization and in-situ implementations) while overset allows for generality of meshing at the expense of in situ optimization of the linear system solves. This effort will focus on overset algorithms and performance benchmarking.

ECP 10: Memory reduction techniques for higher-order

Activities:

  1. Establish static condensation approaches with local, fast solves.
  2. Deploy preconditioner coarsening strategies which may require only the P=1 system to be stored.
  3. Evaluate matrix-free methods for both advection/diffusion and possible PPE.

Upgrade YAML to latest openly supported, production ready version

YAML 0.3 has extensive usage of deprecated smart pointer syntax, which causes a variety of build warnings on C++11 compilers/platforms.

Tasks:

  1. Upgrade YAML

  2. Verify that build warnings are removed

  3. Ensure that test suite is clean

  4. Update build instructions

  5. Commit

ECP 5(c) Generalize mesh motion

Relax the requirement to have sliding mesh simulations in the x-direction for 3D.

Also allow for a clean interface between 2D and 3D.

I will handle the code based on the formula provided by @gantech

Debug build provides many warnings; please fix

Debug builds are a bit ugly now in that we have many warnings. Common issues are unused variables, bad comparisons between int and size_t, and initialization ordering.

It would be nice to monitor this more closely in the future. Having a nightly debug build would be useful.

Also, we should review why we have this disparity between debug and opt. I think that it was an optimization to have opt mostly working everywhere even when small warnings were present.

I once had warnings-as-errors active for debug builds, which may make things better in the future. I am not sure why this was turned off.

ECP 14: in situ matrix modifications

Activities:

  1. Establish local row- and column-maps for dofs only at moving mesh interfaces.
  2. Modify matrix assembly to exploit locally changed matrix maps.
  3. Work with Tpetra team to allow for in situ matrix modifications.

ECP 4(a) ScratchView interface change

#4

A few notes on the templated SuppAlgs and ScratchView interface:

  1. Code does not build in debug (somewhat related):
from /UnitTestKokkosViews.C:5:
/Trilinos_stable_release/include/stk_mesh/baseImpl/FieldBaseImpl.hpp:152:7: warning: ‘*((void*)& f +40)’ may be used uninitialized in this function [-Wmaybe-uninitialized]
   m_field_states[i] = field_states[i];
  2. We also gather tensor quantities, e.g., duidxj, via the following interface:
      for ( int ni = 0; ni < num_nodes; ++ni ) {
        stk::mesh::Entity node = node_rels[ni];
        const double * du     =  stk::mesh::field_data(*dudx_, node);
        // row for p_dudx
        const int row_p_dudx = niNdim*nDim;
        for ( int i=0; i < nDim; ++i ) {
          // gather tensor
          const int row_dudx = i*nDim;
          for ( int j=0; j < nDim; ++j ) {
            p_dudx[row_p_dudx+row_dudx+j] = du[row_dudx+j];
          }
        }
      }
  3. Let's avoid a call signature to fill_prereq_data including meSCS, meSCV, meFEM and later meFC, etc., by calls that add the master element, e.g., add_cvfem_interior_scs(), etc.

  4. Remove excessive logic for each template argument to the SuppAlgs in, e.g., HeatCondEQs::register_interior()

  5. @alanw0, I think that coordField can also be removed since it is part of the "add" infrastructure...

  fill_pre_req_data(dataNeededBySuppAlgs_, bulk_data, topo, meSCS, meSCV, element,
                        coordField, prereqData);

  6. This call can be elevated in ElemSolver from inner bucket loops since topo is fixed for the ElemSolver:
stk::mesh::Bucket & b = *elem_buckets[team.league_rank()];
    stk::topology topo = b.topology();
    sierra::nalu::MasterElement* meSCS = realm_.get_surface_master_element(topo);
    sierra::nalu::MasterElement* meSCV = realm_.get_volume_master_element(topo);

  7. Possibly more...

SuperLU_4.3 compilation on Certainty (Stanford)

When building SuperLU_4.3 on Certainty (Stanford), in addition to copying the src files to the install directory, I had to also copy the lib files: cp lib/* $nalu_build_dir/install/SuperLU_4.3/lib

All the other builds went perfectly. Thanks for the detailed instructions.

Lluis

Demonstrate perfect restart for full physics applications

A restarted simulation does not provide exactly the same results. For example, if I run a simulation from step 1 to 100 and then restart at step 50, the results are really (really) close, however, not perfect. This suggests a subtle start-up issue with restart that should be resolved.

@NaluCFD/core, I need someone who will lead this task. Might be good for a new developer..

The recent dgNonConformalThreeBlade regression test is the target test problem. However, that test is probably too complex to start with since it has mesh motion, second order time integration (with adaptive time step), etc.

I would start with a simple heat conduction case and move up in complexity.

Does custom ghosting negatively affect matrix assembly?

If we have aura, XFER or sliding mesh/overset active, custom ghosting will be active.

In the setup phase, we extract the following selector and buckets:

  // create a localID for all active nodes in the mesh...
  const stk::mesh::Selector s_universal = metaData.universal_part()
      & !(realm_.get_inactive_selector());

  stk::mesh::BucketVector const& buckets =
      realm_.get_buckets( stk::topology::NODE_RANK, s_universal );

This means that some ghosted elements/edges, which are a relic of the XFER, sliding/overset, will be included even though they are not formal candidates.

@alanw0 and @skennon10, is this true? Can we add the parts of the mesh in the active blocks to help? Would we need a separate part of all of the "non essential" ghosted entities to reduce what is provided to this setup method?

Linear system imbalance in ABL strong-scaling simulations

@spdomin @alanw0 @aprokop @mbarone81
In looking at the ABL strong scaling simulations on 256, 512, and 1024 cores, we have observed that the linear system passed into MueLu becomes more imbalanced as the core count grows. (See data below). This will very likely affect communication in the preconditioner setup.

I have some initial questions:

  • How is the distribution of the fine grid matrix determined? Is it primarily from the decomposition of the underlying grid?
  • Are there options available in the Nalu input deck that can change/influence the matrix distribution?

256 cores

A0 size =  2113536 x 2113536, nnz = 15155200
A0 Load balancing info
A0   # active processes: 256/256
A0   # rows per proc   : avg = 8.26e+03,  dev =   3.4%,  min =   -7.0%,  max =   +8.7%
A0   #  nnz per proc   : avg = 5.92e+04,  dev =   5.1%,  min =   -9.2%,  max =  +13.5%
A0 Communication info
A0   # num export send : avg =     0.00,  dev =   0.00,  min =    0.0 ,  max =    0.0
A0   # num import send : avg = 2.51e+03,  dev =   5.6%,  min =  -11.0%,  max =  +10.6%
A0   # num msgs        : avg =     9.50,  dev =   1.39,  min =    6.0 ,  max =   13.0
A0   # min msg size    : avg = 2.56e+00,  dev = 157.3%,  min =  -61.0%,  max = +485.4%
A0   # max msg size    : avg = 5.20e+02,  dev =   2.5%,  min =   -1.5%,  max =   +7.9%

512 cores

A0 size =  2113536 x 2113536, nnz = 15155200
A0 Load balancing info
A0   # active processes: 512/512
A0   # rows per proc   : avg = 4.13e+03,  dev =   3.7%,  min =   -7.0%,  max =  +12.0%
A0   #  nnz per proc   : avg = 2.96e+04,  dev =   7.3%,  min =   -9.2%,  max =  +24.1%
A0 Communication info
A0   # num export send : avg =     0.00,  dev =   0.00,  min =    0.0 ,  max =    0.0
A0   # num import send : avg = 1.53e+03,  dev =   7.5%,  min =  -18.7%,  max =  +10.3%
A0   # num msgs        : avg =     9.75,  dev =   1.47,  min =    6.0 ,  max =   13.0
A0   # min msg size    : avg = 2.66e+00,  dev = 163.5%,  min =  -62.4%,  max = +464.7%
A0   # max msg size    : avg = 2.60e+02,  dev =   2.9%,  min =   -1.5%,  max =  +11.1%

1024 cores

A0 size =  2113536 x 2113536, nnz = 15155200
A0 Load balancing info
A0   # active processes: 1024/1024
A0   # rows per proc   : avg = 2.06e+03,  dev =   4.9%,  min =  -13.2%,  max =  +18.6%
A0   #  nnz per proc   : avg = 1.48e+04,  dev =   7.9%,  min =  -15.2%,  max =  +31.4%
A0 Communication info
A0   # num export send : avg =     0.00,  dev =   0.00,  min =    0.0 ,  max =    0.0
A0   # num import send : avg = 1.04e+03,  dev =   5.9%,  min =  -17.4%,  max =  +12.2%
A0   # num msgs        : avg =     9.75,  dev =   1.08,  min =    6.0 ,  max =   13.0
A0   # min msg size    : avg = 1.45e+00,  dev =  94.6%,  min =  -31.2%,  max = +381.7%
A0   # max msg size    : avg = 2.58e+02,  dev =   2.1%,  min =   -0.8%,  max =   +5.4%

ECP 5(b): Evaluate dropping Jacobian entries for sliding/overset

Remove LHS contributions for column entries on sliding mesh matrix entries that change over the simulation.

This will require modification of non-conformal algorithms in addition to increasing iterations over the continuity equation, e.g.,

const int numContSubIter = 10;
for ( int k = 0; k < numContSubIter; ++k ) {
    // continuity assemble, load_complete and solve
    continuityEqSys_->assemble_and_solve(continuityEqSys_->pTmp_);

    // update pressure
    timeA = NaluEnv::self().nalu_time();
    field_axpby(
      realm_.meta_data(),
      realm_.bulk_data(),
      1.0, *continuityEqSys_->pTmp_,
      1.0, *continuityEqSys_->pressure_,
      realm_.get_activate_aura());
    timeB = NaluEnv::self().nalu_time();
    continuityEqSys_->timerAssemble_ += (timeB-timeA);
}

Any takers from @NaluCFD/sliding or @NaluCFD/solver ?

I will start this next week if there are no volunteers.

FY17 ECP Goals

@NaluCFD/core, From the SNL-owned ECP milestones, here are the FY17 high level project goals. Let's rally around these for the stand-up meetings and work to fill in the other efforts not yet captured. Below, I have a brief description of the task/goal followed by a link to the ECP. Finally, a delivery quarter is provided.

  1. Implement and performance benchmark ABL; ECP 1; FY17Q1
  2. Develop a kokkos/stk algorithmic interface in Nalu core matrix assembly algorithms; ECP 4, FY17Q4; 4a; FY17Q4
  3. Extend overset methods to include mesh motion: ECP 5, FY18Q1
  4. Improve sliding mesh algorithmic performance with demonstration runs; evaluate methods to reduce matrix setup: ECP 5, FY18Q1
  5. Preliminary evaluation of parallel search and matrix modifications: ECP 5, FY18Q1
  6. Start evaluation of high aspect ratio AMG performance: ECP 6, FY18Q2
  7. Work towards threaded preconditioners/solves for adv/diff and PPE: ECP 6, FY18Q2; ECP 7 (TBD), FY17Q3
  8. Improvement of AMG setup costs: ECP 13, FY20Q1 (may include evaluation of reduced update frequency to matrix/preconditioner)
  9. Higher-order performance benchmarking and improvement: ECP 5, FY18Q1; ECP 10, FY19Q2; ECP 12, FY19Q4
  10. Evaluate alternative solver strategies (GMRES/BDDC using Tpetra/Belos/Ifpack2): ECP 6, FY18Q2
  11. Demonstrate non-moving blade simulation FY17Q3
  12. Build and test system FY17Q1

Surface force and moment post processing zeros out fields when the algorithm is not applied

In my LES channel flow, I am computing a time mean tau_wall, Yplus, etc. However, the algorithm is only executed at a user-defined output step. Meanwhile, the field is always zeroed each and every time step.

Bottom line: the instantaneous tau_wall is fine, however, the time mean is not correct unless the algorithm is applied each and every time step.

Fix: In the AlgDriver, make sure that we only zero, execute, etc., if the frequency is met.

@mbarone81 this is just a FYI as I plan on fixing this today.

Support for pyramid elements on an exposed boundary

Nalu does not currently support pyramid elements on an exposed boundary. The master element code does not define exposed face integration point node maps, for example. This causes a runtime error when a boundary condition algorithm requests these maps. @spdomin and I discussed defining the boundary ip node map for pyramids, thereby allowing a pyramid to sit on a symmetry boundary. However, the momentum symmetry bc algorithm computes dndx at the boundary, requiring a face_grad_op, which is also currently undefined for pyramids.

Deploy full Pyramid5 support

We have a new use case in which pyramids are required at DG interfaces.

We also need to implement isInElement() for a pyramid for XFERs.

@rcknaus, @mbarone81 or @spdomin, let's negotiate:) Anyone working today or tomorrow has first dibs as I would like to launch my full-up tower simulation very soon.

Here is the list:

For DG:

meSCSCurrent->sidePcoords_to_elemPcoords(currentFaceOrdinal, 1, &currentIsoParCoords[0],    &currentElementIsoParCoords[0]);

meSCSCurrent->general_face_grad_op(currentFaceOrdinal, &currentElementIsoParCoords[0], &p_c_elem_coordinates[0], &p_c_dndx[0], &ws_c_det_j[0], &scs_error);

For XFER:

      const double nearestDistance = meSCS->isInElement(&theElementCoords[0],
                                                        &(tocoords[0]),
                                                        &(isoParCoords[0]));

I would also like to take the opportunity to check our sidePcoords_to_elemPcoords() implementations. We could do this by the following:

  1. at the current gauss point, determine the boundary integration point coordinates.
  2. call isInElement on the opposing element. Compute the coordinates of this gauss point using the general interpolatePoint method on the opposing face.
  3. call isInElement on the opposing face and compare the isoparametric coordinates from the isInElement call and the sidePcoords.

Shifted Laplace with consolidated approach is segfaulting

My production V27 cases are segfaulting on Cori when using the shifted element-based operators with the consolidated algorithm approach. The segfault occurs in ContinuityAdvectionElemKernel::execute().

In fact, the same segfault occurs on, e.g., the ductWedge case when I change the simulation to element-based, with consolidated and shifted. The modifications to the ductWedge input file to activate the shifted/consolidated approach are attached. Removal of the consolidated keyword results in a clean simulation.

I also converted the elemHybridFluids case and the same behavior is noted.

I am not sure if this was introduced at the time @sayerhs converted the Cont SuppAlg, during subsequent refactors of current/coords, or if it is an issue with the original SuppAlg. At the very least, the full AssembleContinuityElemSolver is working. This issue should be addressed ASAP since milestones are affected, e.g., FY17/Q3 and FY18/Q1.

ductWedgeShiftC.txt

ECP 5: Deploy production sliding mesh capability with linear solver benchmarking

Activities:

  1. Improve baseline sliding mesh capability at curved surfaces.
  2. Evaluate ATDM-based parallel search methods.
  3. Establish matrix set-up cost timings.
  4. Evaluate possible lagging of matrix update.
  5. Evaluate reduction of matrix system by omitting moving block column entries in favor of multiple matrix assembly/solve iterations.

ECP 1: Establish time-to-solution for ABL

Activities:

  1. Documentation of the ABL model implementation in NaluDoc.
  2. Implementation of appropriate wall boundary conditions for energy and momentum transport.
  3. Prototype STK_transfer of inflow boundary condition from the ABL to a subsequent simulation in which “velocity_bc” is interpolated in space and time for use in the inflow bc.
  4. Testing, at available scales, ABL with solver and general algorithmic costs.

Unable to Build NALU-CFD with Trilinos Configured for OpenMP

Hi NALU-CFD Team, we have been trying to get NALU-CFD running on one of our new test beds with OpenMP enabled for Trilinos. NALU-CFD will not compile because there is a forced use of the Serial host space, which gives the errors below. It would be super-nice to be able to run Trilinos with OpenMP enabled and NALU-CFD in Serial if that's what is required; is that possible at all?

[  1%] Building CXX object CMakeFiles/nalu.dir/src/AssembleElemSolverAlgorithm.C.o
In file included from /home/sdhammo/nalu/trilinos/master-20170618/include/Kokkos_Core.hpp:53:0,
                 from /home/sdhammo/nalu/Nalu-master/include/KokkosInterface.h:12,
                 from /home/sdhammo/nalu/Nalu-master/include/SolverAlgorithm.h:13,
                 from /home/sdhammo/nalu/Nalu-master/include/AssembleElemSolverAlgorithm.h:12,
                 from /home/sdhammo/nalu/Nalu-master/src/AssembleElemSolverAlgorithm.C:10:
/home/sdhammo/nalu/trilinos/master-20170618/include/Kokkos_Serial.hpp: In instantiation of ‘typename std::enable_if<std::is_same<TagType, void>::value>::type Kokkos::Impl::ParallelFor<FunctorType, Kokkos::TeamPolicy<Properties ...>, Kokkos::Serial>::exec(Kokkos::Impl::HostThreadTeamData&) const [with TagType = void; FunctorType = sierra::nalu::AssembleElemSolverAlgorithm::execute()::<lambda(const TeamHandleType&)>; Properties = {Kokkos::Serial}; typename std::enable_if<std::is_same<TagType, void>::value>::type = void]’:
/home/sdhammo/nalu/trilinos/master-20170618/include/Kokkos_Serial.hpp:672:7:   required from ‘void Kokkos::Impl::ParallelFor<FunctorType, Kokkos::TeamPolicy<Properties ...>, Kokkos::Serial>::execute() const [with FunctorType = sierra::nalu::AssembleElemSolverAlgorithm::execute()::<lambda(const TeamHandleType&)>; Properties = {Kokkos::Serial}]’
/home/sdhammo/nalu/trilinos/master-20170618/include/Kokkos_Parallel.hpp:190:4:   required from ‘void Kokkos::parallel_for(const ExecPolicy&, const FunctorType&, const string&, typename Kokkos::Impl::enable_if<(! Kokkos::Impl::is_integral<ExecPolicy>::value)>::type*) [with ExecPolicy = Kokkos::TeamPolicy<Kokkos::Serial>; FunctorType = sierra::nalu::AssembleElemSolverAlgorithm::execute()::<lambda(const TeamHandleType&)>; std::__cxx11::string = std::__cxx11::basic_string<char>; typename Kokkos::Impl::enable_if<(! Kokkos::Impl::is_integral<ExecPolicy>::value)>::type = void]’
/home/sdhammo/nalu/Nalu-master/src/AssembleElemSolverAlgorithm.C:140:4:   required from here
/home/sdhammo/nalu/trilinos/master-20170618/include/Kokkos_Serial.hpp:640:9: error: no match for call to ‘(const sierra::nalu::AssembleElemSolverAlgorithm::execute()::<lambda(const TeamHandleType&)>) (Kokkos::Impl::ParallelFor<sierra::nalu::AssembleElemSolverAlgorithm::execute()::<lambda(const TeamHandleType&)>, Kokkos::TeamPolicy<Kokkos::Serial>, Kokkos::Serial>::Member)’
         m_functor( Member(data,ileague,m_league) );
         ^~~~~~~~~
/home/sdhammo/nalu/Nalu-master/src/AssembleElemSolverAlgorithm.C:99:79: note: candidate: sierra::nalu::AssembleElemSolverAlgorithm::execute()::<lambda(const TeamHandleType&)>
   Kokkos::parallel_for(team_exec, [&](const sierra::nalu::TeamHandleType& team)
                                                                               ^
/home/sdhammo/nalu/Nalu-master/src/AssembleElemSolverAlgorithm.C:99:79: note:   no known conversion for argument 1 from ‘Kokkos::Impl::ParallelFor<sierra::nalu::AssembleElemSolverAlgorithm::execute()::<lambda(const TeamHandleType&)>, Kokkos::TeamPolicy<Kokkos::Serial>, Kokkos::Serial>::Member {aka Kokkos::Impl::HostThreadTeamMember<Kokkos::Serial>}’ to ‘const TeamHandleType& {aka const Kokkos::Impl::HostThreadTeamMember<Kokkos::OpenMP>&}’

Add proper NaluUnit repository

Let's invert the pyramid in Nalu regression testing and start building a high quality unit test harness. The former NaluUnit was more like a sandbox.

Periodic + DG/Overset in one simulation (assuming non-interacting nodes)

dofStatus throws when periodic and overset/DG is active even though the nodes may not be painted with both types.

Allowing for a full pitching/plunging with DG/overset and periodic may be hard without a major refactor; however, we should take a look to see if dofStatus can be changed to support an ABL-like simulation with internal contact surfaces.

  1. Evaluate dofStatus changes under the assumption that nodes are either periodic or DG/overset, however, not both.

  2. Evaluate TpetraLinSys under many ghosting contributions. The code now may or may not support multiple ghostings. I think it does, however, let's make sure.

  3. Take notes on what it would take to support a true pitching/plunging-like use case.

Quad9SCS:Hex27 ip consistency

Extracting nearest nodes to ips is not consistent for higher-order elements.

The snippet of code that shows this issue is as follows:

    // mapping from ip to nodes for this ordinal
    const int *faceIpNodeMap = meFCCurrent->ipNodeMap();
    const int *ipNodeMap = meSCSCurrent->ipNodeMap(currentFaceOrdinal);

    // gather current face data
    stk::mesh::Entity const* current_face_node_rels = bulk_data.begin_nodes(currentFace);

    // gather current element data
    stk::mesh::Entity const* current_elem_node_rels = bulk_data.begin_nodes(currentElement);

    // extract pointers to nearest node fields
    const int nnFIP = faceIpNodeMap[currentGaussPointId];
    const int nnIP = ipNodeMap[currentGaussPointId];
    stk::mesh::Entity nNode = current_face_node_rels[nnFIP];
    stk::mesh::Entity nNodeE = current_elem_node_rels[nnIP];

    if ( nNode != nNodeE ) {
      NaluEnv::self().naluOutputP0() << "nodes not equal; ordinal and gauss point id are " 
                                     << currentFaceOrdinal << " " << currentGaussPointId << std::endl;
    }

Let's resolve this and add a new unit test that checks consistency of these types of operations. To start, focus on the higher-order elements; however, it would be nice to see all face:element pairs.

@skennon10 may be able to help although @rcknaus can take the lead.

ECP 12: Mixed-order production runs with overset or sliding mesh

Activities:

  1. Implement proper infrastructure for generalized interfaces.
  2. Implement master element methods for low- and higher-order search methods to determine proper face:element DG pairs.
  3. Test mixed element approach for Hex8:Tet4 and Hex8:Hex27 topologies.

Clarity in matrix reuse/recompute options

Recent questions regarding the reuse and recompute options for matrix management in Nalu have been raised.

In general, this option is germane to the continuity system. The low-Mach PPE solve is as follows:

D tau G delta_p = -res

tau is a time scale, and D and G are the divergence and gradient operators; res above might include a mass matrix, M(rho). In the case of a time step scaling, tau = dt. Therefore, we can factor this out of the system to obtain:

DG delta_p = -res/dt

In an approximate projection step, DG != L. However, we make this approximation, hence the reason why we have stabilization:

L delta_p = -res/dt

For a mesh that has no topological changes (these occur due to adaptivity, sliding mesh or overset), L is constant over the full simulation. In such cases, we want recompute and reuse preconditioner to be false. Therefore, L prevails over the full system. Although we reassemble the system, we want to limit the AMG setup cost to a single setup over the full simulation.

Note that for the original system, D tau G delta_p = -res, the connectivity can remain the same while the Jacobian entries change. In that case, reuse should be true. Although tau is taken to be dt, the residual might contain a needed Jacobian entry due to M(rho), specifically in acoustically compressible flows in which dp/d(rho) != 0.

For meshes that change topology in Nalu due to sliding or overset, L is changing in time, however, fixed over the nonlinear loop. In this case, the EquationSystem calls reinitialize_linear_system() at the top of the time step. This deletes everything and sets the AMG preconditioner to NULL. In this case, we want reuse and recompute set to false. This will guarantee that the system is reinitialized once at the top of the Picard loop.

Finally, for algorithms that allow for connectivity changes over the Picard loop, e.g., if we allowed for adaptivity within the iteration, we would want recompute set to true.

I think we need some clarity in defining these options.

recompute is a hard reset any time setMuelu is called.
reuse is only when the LHS is changing entries under the context of fixed topology.

For sliding mesh simulations, we need recompute = reuse = false. This feels non-intuitive and is only functional because the EqSys calls reinitialize_linear_system().

I might change recompute to "forced_reset_any_time_setMuelu_is_called" or something like that:)

Of course, in the future we may want the lagging of LHS connectivity/values with an updated RHS.
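For concreteness, a sliding-mesh continuity solver block might look like the following input-deck sketch (the option names `recompute_preconditioner` and `reuse_preconditioner` are assumed from this discussion; check the Nalu documentation for the exact spelling):

```yaml
linear_solvers:
  # Sketch of a MueLu-preconditioned continuity solve for a sliding mesh.
  # With reinitialize_linear_system() called at the top of each time step,
  # both options below are set to false per the discussion above.
  - name: solve_cont
    type: tpetra
    method: gmres
    preconditioner: muelu
    recompute_preconditioner: no
    reuse_preconditioner: no
```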

Consolidated approach for momentum when hybrid meshes are in use is wrong

I let this one slip... 'should have been more careful in the review.

Basically, the bug is that the consolidated approach logic is only hit when the first interior algorithm type is found. However, for a hybrid mesh, this does not work. The logic needs to be either consolidated or not. Then, for each part pushed in, the code needs to check the map for the topology of interest. The major complexity is added when element mass is added and in supporting both nodal source terms and the consolidated approach (which I would have preferred to hold off on).

Continuity hybrid looks fine.

get_if_present() seems off...

A nightly test is failing due to the fact that the following line:

get_if_present(node, "output_frequency", outputFreq_);

returns outputFreq_ as 0 rather than what it has been specified to be in the constructor (10).

The get_if_present() method seems to be off in that it is returning a value of outputFreq_ that is based on some default if not present...

@sayerhs has a recent code commit that is failing due to a divide by zero whereas either @skennon10 or @gantech own the code base.

Interestingly, in most of my use cases, I perform the following:

get_if_present(y_output, "paraview_script_name", paraviewScriptName_, paraviewScriptName_);
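The expected contract can be illustrated with a toy version of the helper. This is a sketch, not Nalu's actual parsing code; the Node alias and the overloads below are assumptions standing in for the real YAML-backed implementations:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a YAML node: key -> value map.
using Node = std::map<std::string, int>;

// Expected contract of the three-argument form: leave 'result' untouched
// when the key is absent, so a constructor-initialized value survives.
void get_if_present(const Node& node, const std::string& key, int& result) {
  auto it = node.find(key);
  if (it != node.end())
    result = it->second;
}

// Overload with an explicit default, as in the paraview_script_name usage:
// fall back to 'deflt' (which may be the current value) when absent.
void get_if_present(const Node& node, const std::string& key,
                    int& result, const int& deflt) {
  auto it = node.find(key);
  result = (it != node.end()) ? it->second : deflt;
}
```

The failing behavior corresponds to the three-argument form stomping the result with some internal default instead of leaving it untouched.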

Deprecate usage of Epetra solver interface

Epetra represents the 32-bit limited Trilinos solver stack.

Tpetra represents the 64-bit compliant Trilinos solver stack.

Activities:

  1. Remove instances of Epetra usage in favor of a single Trilinos interface
  2. Retain the polymorphic solver interface design to allow future solver prototyping
  3. Modify regression tests to transition to Tpetra/Belos/Ifpack2/MueLu.
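Retaining the polymorphic solver interface (activity 2) while dropping the Epetra branch might look like the sketch below. Class and method names here are illustrative, not Nalu's actual solver classes:

```cpp
#include <cassert>
#include <memory>
#include <string>

// Abstract solver interface retained for future solver prototyping.
class LinearSolver {
public:
  virtual ~LinearSolver() = default;
  virtual std::string stack() const = 0;
  virtual int solve() = 0;  // placeholder; would return solve status
};

// Single remaining concrete interface: the 64-bit compliant Tpetra stack.
class TpetraLinearSolver : public LinearSolver {
public:
  std::string stack() const override { return "Tpetra/Belos/Ifpack2/MueLu"; }
  int solve() override { return 0; }  // placeholder for a real solve
};

// Factory: callers depend only on LinearSolver, so a future prototype
// solver can be swapped in here without touching call sites.
std::unique_ptr<LinearSolver> create_solver() {
  return std::make_unique<TpetraLinearSolver>();
}
```

The design choice is that removing Epetra deletes one derived class and one factory branch, while the abstract base survives for future experimentation.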

Actuator line with FSI

Fully transition A/L to Nalu.

Activities:

  1. Generalize machine definition to allow for a turbine definition
  2. Generalized rotation
  3. FSI coupling
  4. Documentation
  5. Testing at scale to ensure that underlying parallel model is adequate, i.e., ghosting collection of elements to A/L point

Does BCData for moving mesh surfaces sync with actual mesh motion?

Make sure that wall_velocity_bc and velocity_bc sync properly with the actual time value. I think they do; however, someone should verify...

The uqSliding mesh is a good test as it uses wall functions. The following is also an interesting note in the code...

  // copy velocity_bc to velocity np1... (consider not doing this when a wall function is in use)
  CopyFieldAlgorithm *theCopyAlg
    = new CopyFieldAlgorithm(realm_, part,
                             theBcField, &velocityNp1,
                             0, nDim,
                             stk::topology::NODE_RANK);
  bcDataMapAlg_.push_back(theCopyAlg);
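The sync question can be reduced to a toy check: the bc field must be re-evaluated at the np1 time level before the copy executes, so the copied value tracks the current time. Everything below (the bc function, the Fields struct, the combined pre-work/copy step) is hypothetical scaffolding, not Nalu code:

```cpp
#include <cassert>
#include <cmath>

// Toy time-dependent wall velocity, u_bc(t) = sin(t), standing in for a
// user-specified wall_velocity_bc function.
double eval_bc(double t) { return std::sin(t); }

struct Fields {
  double velocityBc = 0.0;   // the bc data field
  double velocityNp1 = 0.0;  // velocity at the np1 state
};

// The sync requirement: evaluate the bc field at t^{n+1} BEFORE the copy
// (mirroring what CopyFieldAlgorithm does) pushes it into velocity np1.
void pre_work_and_copy(Fields& f, double t_np1) {
  f.velocityBc = eval_bc(t_np1);  // sync bc data to the current time
  f.velocityNp1 = f.velocityBc;   // copy velocity_bc -> velocity np1
}
```

If the evaluation were lagged to t^n instead, velocityNp1 would carry stale bc data, which is exactly the failure mode the issue asks someone to rule out.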

MasterElementHexSerial.hex8_* tests segfault on OSX/LLVM

When running unit_tests, the MasterElementHexSerial tests segfault under the OSX/LLVM compiler.

Note: Google Test filter = MasterElementHexSerial.hex8_scs_*
[==========] Running 2 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 2 tests from MasterElementHexSerial
[ RUN      ] MasterElementHexSerial.hex8_scs_interpolation
[pinaka:70500] *** Process received signal ***
[pinaka:70500] Signal: Segmentation fault: 11 (11)
[pinaka:70500] Signal code:  (0)
[pinaka:70500] Failing at address: 0x0
[pinaka:70500] [ 0] 0   libsystem_platform.dylib            0x00007fff814ac52a _sigtramp + 26
[pinaka:70500] [ 1] 0   ???                                 0xf386af8fb1f722c4 0x0 + 17547906029796664004
[pinaka:70500] [ 2] 0   unittestX                           0x000000010591d5b0 _ZN3stk4mesh8BulkDataD2Ev + 432
[pinaka:70500] [ 3] 0   unittestX                           0x000000010585eacf _ZN12_GLOBAL__N_150MasterElementHexSerial_hex8_scs_interpolation_TestD0Ev + 31
[pinaka:70500] [ 4] 0   unittestX                           0x0000000105fc79fe _ZN7testing8internal35HandleExceptionsInMethodIfSupportedINS_4TestEvEET0_PT_MS4_FS3_vEPKc + 78
[pinaka:70500] [ 5] 0   unittestX                           0x0000000105fc8861 _ZN7testing8TestInfo3RunEv + 385
[pinaka:70500] [ 6] 0   unittestX                           0x0000000105fc9023 _ZN7testing8TestCase3RunEv + 275
[pinaka:70500] [ 7] 0   unittestX                           0x0000000105fd0f0b _ZN7testing8internal12UnitTestImpl11RunAllTestsEv + 1083
[pinaka:70500] [ 8] 0   unittestX                           0x0000000105fd0930 _ZN7testing8internal35HandleExceptionsInMethodIfSupportedINS0_12UnitTestImplEbEET0_PT_MS4_FS3_vEPKc + 80
[pinaka:70500] [ 9] 0   unittestX                           0x0000000105fd088e _ZN7testing8UnitTest3RunEv + 174
[pinaka:70500] [10] 0   unittestX                           0x0000000105855a9e main + 94
[pinaka:70500] [11] 0   libdyld.dylib                       0x00007fff8cb825ad start + 1
[pinaka:70500] *** End of error message ***
Segmentation fault: 11
