cans-world / cans

A code for fast, massively-parallel direct numerical simulations (DNS) of canonical flows

License: MIT License

Fortran 86.61% Makefile 1.98% Python 9.66% Shell 0.21% MATLAB 0.95% Awk 0.58%
fluid-dynamics fluid-simulation computational-fluid-dynamics turbulence high-performance-computing cfd fortran gpu gpu-computing

CaNS's Introduction

Synopsis

CaNS (Canonical Navier-Stokes) is a code for massively-parallel numerical simulations of fluid flows. It aims at solving any flow of an incompressible, Newtonian fluid that can benefit from an FFT-based solver for the second-order finite-difference Poisson equation on a 3D Cartesian grid. In two directions the grid is regular, and the solver supports the following combinations of (homogeneous) boundary conditions:

  • Neumann-Neumann
  • Dirichlet-Dirichlet
  • Neumann-Dirichlet
  • Periodic

In the third domain direction, the solver is more flexible as it uses Gauss elimination. There the grid can also be non-uniform (e.g. fine at the boundary and coarser in the center).
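
For illustration only, the snippet below is a minimal, self-contained Python sketch (CaNS itself is Fortran) of this solution strategy on a small 2D example: an FFT along a periodic direction turns the second-order finite-difference Laplacian into modified wavenumbers, leaving one independent tridiagonal system per mode along the wall-normal direction, solved by Gauss elimination. Grid sizes, boundary conditions, and all names are illustrative assumptions, not CaNS code.

# Minimal sketch (not CaNS itself): FFT-based solution of the second-order
# finite-difference Poisson equation on a 2D cell-centered grid, periodic in x
# and with homogeneous Dirichlet walls in z -- FFT along x, Gauss elimination
# (Thomas algorithm) along z, mirroring the strategy described above.
import numpy as np

nx, nz = 64, 48
lx, lz = 1.0, 2.0
dx, dz = lx / nx, lz / nz

def laplacian(p):
    # discrete Laplacian: periodic in x, p = 0 at the two walls in z
    pe = np.empty((nx, nz + 2))
    pe[:, 1:-1] = p
    pe[:, 0]  = -p[:, 0]    # ghost cells enforcing p = 0 at the lower wall
    pe[:, -1] = -p[:, -1]   # and at the upper wall (cell-centered grid)
    lap  = (np.roll(p, -1, 0) - 2*p + np.roll(p, 1, 0)) / dx**2
    lap += (pe[:, 2:] - 2*pe[:, 1:-1] + pe[:, :-2]) / dz**2
    return lap

def thomas(a, b, c, d):
    # Gauss elimination for a tridiagonal system (a: sub, b: diag, c: super)
    n = len(b)
    b, d = b.copy(), d.copy()
    for j in range(1, n):
        w = a[j] / b[j-1]
        b[j] -= w * c[j-1]
        d[j] -= w * d[j-1]
    x = np.empty_like(d)
    x[-1] = d[-1] / b[-1]
    for j in range(n - 2, -1, -1):
        x[j] = (d[j] - c[j] * x[j+1]) / b[j]
    return x

def solve_poisson(f):
    # solve lap(p) = f: FFT in x, one tridiagonal solve in z per x-mode
    fh = np.fft.rfft(f, axis=0)
    k = np.arange(fh.shape[0])
    lam = (2*np.cos(2*np.pi*k/nx) - 2) / dx**2       # modified wavenumbers
    off = np.full(nz, 1/dz**2)                       # sub/super-diagonals
    ph = np.empty_like(fh)
    for i in range(fh.shape[0]):
        dia = np.full(nz, -2/dz**2) + lam[i]
        dia[0]  -= 1/dz**2                           # Dirichlet ghost rows
        dia[-1] -= 1/dz**2
        ph[i] = thomas(off, dia.astype(complex), off, fh[i])
    return np.fft.irfft(ph, n=nx, axis=0)

rng = np.random.default_rng(0)
p_ref = rng.standard_normal((nx, nz))
p_num = solve_poisson(laplacian(p_ref))
print("max error:", np.abs(p_num - p_ref).max())     # round-off-level error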

CaNS also allows for choosing an implicit temporal discretization of the momentum diffusion terms, either fully implicit or only along the last domain direction. This results in solving a 3D/1D Helmholtz equation per velocity component. In the fully implicit case, FFT-based solvers are also used, and the same options described above for pressure boundary conditions apply to the velocity.

Reference

P. Costa. A FFT-based finite-difference solver for massively-parallel direct numerical simulations of turbulent flows. Computers & Mathematics with Applications 76: 1853--1862 (2018). doi:10.1016/j.camwa.2018.07.034 [arXiv preprint]

News

[10/08/2023]: The input files dns.in and cudecomp.in have been replaced with the namelist file input.nml, which simplifies the parsing of input files and their extension in more complex solvers based on CaNS. See the updated docs/INFO_INPUT.md file for more details. Additionally, we have added a new input parameter, gtype, to explicitly select the type of grid stretching function.

[03/02/2023]: The input file dns.in has been simplified to avoid a common source of confusion. Instead of prescribing uref, lref, and rey (reference velocity and length scales, and Reynolds number) to calculate the fluid viscosity as visc = uref*lref/rey, we directly prescribe the inverse of the viscosity, visci (visc = visci**(-1)), so all inputs are dimensional (see the updated docs/INFO_INPUT.md file). Note that visci has the same value as the flow Reynolds number for all files under examples/, as uref and lref were always equal to 1. This change is backward-incompatible: former input files should be updated from v2.2.0 onward!

[24/10/2022]: Option SINGLE_PRECISION_POISSON has been removed from the main branch. While solving the Poisson equation in lower precision yields excellent results for many benchmarks, several of these cases also perform well when the whole calculation is performed in lower precision (see #42). Since this mode introduces significant complexity, it has been removed from the main branch for now in favor of a more readable code, a decision that can be reconsidered in the future. This option can still be explored in v2.0.1, and is valuable for, e.g., setups with high Reynolds numbers and/or extremely fine grids.

Major Update: CaNS 2.0 is finally out! 🎉

CaNS 2.0 has many new features, being the result of the most significant revision effort undertaken so far. It includes major improvements in performance and robustness, and a fresh hardware-adaptive many-GPU parallelization using the cuDecomp library. See docs/CaNS-2.0.md for a detailed description of all new features. CaNS 2.0 has been tested and observed to run efficiently on some major GPU-accelerated clusters such as Perlmutter, Summit, and Marconi 100.

Features

Some features are:

  • Hybrid MPI/OpenMP parallelization
  • FFTW guru interface / cuFFT used for computing multi-dimensional vectors of 1D transforms
  • The right type of transformation (Fourier, cosine, sine, etc.) is automatically determined from the input file
  • cuDecomp pencil decomposition library for hardware-adaptive distributed memory calculations on many GPUs
  • 2DECOMP&FFT library used for performing global data transpositions on CPUs and some of the data I/O
  • GPU acceleration using OpenACC directives
  • A different canonical flow can be simulated just by changing the input files

Some examples of flows that this code can solve are:

  • periodic or developing channel
  • periodic or developing square duct
  • tri-periodic domain
  • lid-driven cavity

Motivation

This project first aimed at being a modern alternative to the well-known FISHPACK routines (Paul Swarztrauber & Roland Sweet, NCAR) for solving a three-dimensional Helmholtz equation. After noticing some works simulating canonical flows with iterative solvers -- when faster direct solvers could have been used instead -- it seemed natural to create a versatile tool and make it available. This code can be used as a base code from which solvers for more complex flows can be developed (e.g., extensions with fictitious domain methods).

Method

The fluid flow is solved with a second-order finite-difference pressure-correction scheme, discretized on a MAC (staggered) grid arrangement. Time is advanced with a three-step low-storage Runge-Kutta scheme. Optionally, for increased stability at low Reynolds numbers, at the price of a higher computational demand, the diffusion term can be treated implicitly. See the reference above for details.
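
For illustration, below is a small Python sketch of the low-storage RK3 update structure; the coefficients are the classic Wray/Spalart-et-al. choice, given here as an assumption for demonstration rather than taken from the CaNS source, and the pressure-correction step that CaNS performs after each substep is only indicated by a comment.

# Schematic sketch of a three-step low-storage Runge-Kutta update of the form
# u <- u + dt*(gamma_k*rhs(u) + zeta_k*rhs_old), common in DNS codes.
# The coefficients below are the classic Wray choice -- an assumption for
# illustration, not taken from the CaNS source.
import numpy as np

gamma = (8/15, 5/12, 3/4)
zeta  = (0.0, -17/60, -5/12)

def rk3_step(u, dt, rhs):
    # advance u by one full time step dt with the low-storage RK3 scheme
    rhs_old = 0.0
    for g, z in zip(gamma, zeta):
        rhs_new = rhs(u)
        u = u + dt * (g * rhs_new + z * rhs_old)
        rhs_old = rhs_new
        # ... in an incompressible solver, a Poisson solve and velocity
        #     projection would follow here, once per substep
    return u

# quick order-of-accuracy check on dy/dt = -y, y(0) = 1
for n in (10, 20, 40, 80):
    dt, y = 1.0 / n, 1.0
    for _ in range(n):
        y = rk3_step(y, dt, lambda u: -u)
    print(n, abs(y - np.exp(-1.0)))   # error drops by ~8x per halving of dt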

Usage

Downloading CaNS

Since CaNS loads the external pencil decomposition libraries as Git Submodules, the repository should be cloned as follows:

git clone --recursive https://github.com/CaNS-World/CaNS

so the libraries are downloaded too. Alternatively, in case the repository has already been cloned without the Submodules (i.e., folders cuDecomp and 2decomp-fft under dependencies/ are empty), the following command can be used to update them:

git submodule update --init --recursive

Compilation

Prerequisites

The prerequisites for compiling CaNS are the following:

  • MPI
  • FFTW3/cuFFT library for CPU/GPU runs
  • The nvfortran compiler (for GPU runs)
  • NCCL and NVSHMEM (optional, may be exploited by the cuDecomp library)
  • OpenMP (optional)

In short

For most systems, CaNS can be compiled from the root directory with the following commands: make libs && make, which will first compile the 2DECOMP&FFT/cuDecomp libraries and then CaNS.

Detailed instructions

The Makefile in the root directory is used to compile the code, and is expected to work out-of-the-box for most systems. The build.conf file in the root directory can be used to choose the Fortran compiler (MPI wrapper), a few pre-defined profiles depending on the nature of the run (e.g., production vs debugging), and the pre-processing options; see INFO_COMPILING.md for more details. Concerning the pre-processing options, the following are available:

  • DEBUG : performs some basic checks for debugging purposes
  • TIMING : wall-clock time per time step is computed
  • IMPDIFF : diffusion terms are integrated implicitly in time (thereby improving the stability of the numerical algorithm for viscous-dominated flows)
  • IMPDIFF_1D : same as above, but with implicit diffusion only along Z; for optimal parallel performance this option should be combined with PENCIL_AXIS=3
  • PENCIL_AXIS : sets the default pencil direction, one of [1,2,3] for [X,Y,Z]-aligned pencils; X-aligned is the default and should be optimal for all cases except for Z implicit diffusion, where using Z-pencils is recommended
  • SINGLE_PRECISION : calculation will be carried out in single precision (the default precision is double)
  • GPU : enable GPU-accelerated runs
  • USE_NVTX : enable NVTX tags for profiling

Input file

The input file input.nml sets the physical and computational parameters. In the examples/ folder are examples of input files for several canonical flows. See INFO_INPUT.md for a detailed description of the input file.

Files out1d.h90, out2d.h90 and out3d.h90 in src/ set which data are written in 1-, 2- and 3-dimensional output files, respectively. The code should be recompiled after editing out?d.h90 files.

Running the code

Run the executable with mpirun with a number of tasks consistent with what has been set in the input file input.nml. Data will be written by default in a folder named data/, which must be located where the executable is run (by default in the run/ folder).

Visualizing field data

See INFO_VISU.md.

Contributing

We appreciate any contributions and feedback that can improve CaNS. If you wish to contribute to the tool, please get in touch with the maintainers or open an Issue in the repository / a thread in Discussions. Pull Requests are welcome, but please propose/discuss the changes in a linked Issue first.

Final notes

Please read the ACKNOWLEDGEMENTS and LICENSE files.

CaNS's People

Contributors

gabrieleboga, gianlupo, nscapin, p-costa


CaNS's Issues

Error while running

Hi

I have successfully compiled CaNS on Aaditya HPC at IITM Pune. I am getting the following error while running the code.
$ mpirun -n 4 ./cans
At line 100 of file main.f90 (unit = 99, file = '')
Fortran runtime error: File 'data/grid.bin' does not exist

Do we need the grid.bin file to run the program?

Reconsider flow forcing?

When we brutally force the flow by computing a volume integral to determine the pressure gradient that needs to be added to the flow to sustain a certain bulk velocity, we incur a significant error when performing calculations in single precision, at least with gfortran and ifort; nvfortran still performs very well.

Instead, we can simply use the other mode that is available, but not active by default in CaNS, of using a surface integral of the wall shear stresses to prescribe zero net acceleration. The only thing one should be cautious about is to make sure that the time integral of the wall shear stress is consistent with the time integration scheme (i.e., fully explicit, all implicit, or z-implicit).

Of course, in case we move towards simply prescribing zero net acceleration, the velf parameter should remain in dns.in, and will be used to prescribe the target bulk velocity in the initial condition.
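
As a side note, the accumulation issue can be reproduced outside the Fortran code; the hypothetical Python snippet below simply contrasts a naive single-precision running sum (analogous to a volume integral of the velocity) with a double-precision reference.

# Toy illustration (Python, not the Fortran code) of why a naive bulk-velocity
# (volume) integral accumulated in single precision loses accuracy: summing
# many O(1) values into a float32 accumulator drifts, whereas accumulating in
# double precision does not.
import numpy as np

rng = np.random.default_rng(1)
u = (1.0 + 0.1 * rng.standard_normal(1_000_000)).astype(np.float32)

acc32 = np.float32(0.0)
for v in u:                              # naive running sum in single precision
    acc32 += v
mean32 = acc32 / u.size

mean64 = u.astype(np.float64).mean()     # reference in double precision
print("float32 accumulator :", mean32)
print("float64 reference   :", mean64)
print("relative error      :", abs(mean32 - mean64) / mean64)
# the float32 result typically differs well beyond float32 round-off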

Naming inconsistencies under mom.

Nit - fix the notation under mom.f90 such that the terms actually reflect what $\nabla \cdot \mathbf{u}\otimes\mathbf{u}$ looks like when expanded.

add `num_checkpoint_max` parameter

In order to maintain a small number of checkpoint files, introduce a num_checkpoint_max parameter, n, so that only the last n checkpoints are kept and older ones are overwritten. Say n=5; the saves will proceed in time as follows:

1, 2, 3, 4, 5 ; 6->1, 7->2, 8->3, 9->4, 10->5; with -> meaning that files are overwritten.

One can still use the current symbolic link approach to have fld.bin pointing to the last saved files.

Thanks @arashalizadbanaei for the discussion!
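
A minimal sketch of the overwrite pattern described above (hypothetical Python helper, not existing CaNS code):

def checkpoint_slot(isave: int, num_checkpoint_max: int) -> int:
    # return the (1-based) slot that the isave-th checkpoint overwrites
    return (isave - 1) % num_checkpoint_max + 1

print([checkpoint_slot(i, 5) for i in range(1, 11)])
# -> [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]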

consider pencil <-> slab data redistribution to avoid two all-to-all collectives

This avoids, whenever possible, two all-to-all collectives in the Poisson solver, while still allowing for keeping a default 2D domain decomposition.

Steps:

  • Port the init_transpose_slab and transpose_slab routines from SNaC;
  • Optionally, draft an MPI point-to-point alternative to the MPI_Alltoallw implementation, with GPU-GPU communication in mind. It is quick and straightforward anyway.

Reconsider some input parameters under `dns.in`

The input file dns.in is often a source of confusion.

CaNS solves the dimensional Navier-Stokes equations, and most input parameters should have consistent dimensions (e.g., [meters] for length and [seconds] for time imply [meters/second] for velocity).

However, for convenience, we also defined three input parameters denoted uref (reference velocity), lref (reference length), and rey (Reynolds number), because in most simple use cases of CaNS one is more interested in defining a Reynolds number.

These three variables are just used to calculate the viscosity visc = lref*uref/rey, which should also have consistent dimensions [meters^2/second].
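
For concreteness, the current convention and the inverse-viscosity alternative described in the news entry above yield the same viscosity; the snippet below is just an arithmetic illustration with arbitrary example values.

# current dns.in convention vs. prescribing the inverse viscosity directly
uref, lref, rey = 1.0, 1.0, 5640.0    # arbitrary example values
visc_old = uref * lref / rey          # visc computed from uref, lref, rey
visci = 5640.0                        # inverse viscosity, prescribed directly
visc_new = 1.0 / visci
assert abs(visc_old - visc_new) < 1e-15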

There are however a few issues with this approach and current implementation:

  • it (understandably) confuses new users about the inner workings of CaNS ("is it solving dimensional or non-dimensional equations?");
  • uref and lref are also used outside the calculation of the viscosity, under initflow.f90, to set the initial conditions, which can be an issue if uref and lref are not consistent with other flow parameters.

To settle this issue, it may be better to:

  • change dns.in so that it takes the viscosity (or the inverse of the viscosity, as in many example setups it should match the corresponding Reynolds number);
  • have ubulk and lref under initflow.f90 consistently computed from the relevant input parameters (such as boundary conditions, mean pressure gradient, or forced bulk velocity);
  • consider implementing the changes in the parsing of dns.in under param.f90 such that they are backward-compatible, i.e., if an "old" input file format is prescribed, the code can still compute visc and run?;
  • document the current and possible future approach better under INFO_INPUT.md.

Revise OpenMP statements?

E.g., perhaps we could add COLLAPSE(3) clauses to the loops, or simply wait until DO CONCURRENT is ready for prime time (and then minimize OpenACC & OpenMP directives)?

Add more options to `initgrid.f90`

Not all open channel setups' initial conditions are covered under initgrid.f90; some of them should be added. Ideally, the type of grid stretching should not be bound to the initial condition option, but it is not bad to keep things as they are right now and just add the other setups.
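
As a generic illustration of grid clustering (a hypothetical Python sketch, not the actual functions in initgrid.f90 -- the stretching functions and gtype options used by CaNS are defined in the source and in docs/INFO_INPUT.md), a tanh-type stretching that refines the grid near both walls could look like:

import numpy as np

def clustered_faces(nz: int, lz: float, gr: float) -> np.ndarray:
    # cell-face coordinates in [0, lz], clustered near z = 0 and z = lz;
    # gr controls the clustering strength, gr -> 0 recovers a uniform grid
    s = np.linspace(0.0, 1.0, nz + 1)           # uniform parameter
    if gr == 0.0:
        return lz * s                           # uniform grid
    return lz * 0.5 * (1.0 + np.tanh(gr * (s - 0.5)) / np.tanh(gr * 0.5))

zf = clustered_faces(64, 1.0, gr=3.0)
dz = np.diff(zf)
print(dz.min(), dz.max())   # smallest cells at the walls, largest in the centre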

Documentation for visualization tools?

Hi,

There is some Python code which does visualization.

It would be good to show an example usage in the README.md.

There is an xdmf-generation Fortran code. The example shell script does not say very much about how it can be used.

I hope to ingest the computed data files *.out and do 3D visualization in applications like ParaView and Houdini.

Cheers

FFTW3 fortran

Hi,

Is there a specific version of FFTW3 I should use?

I obtained the latest version, but when I build it, it seems the Fortran bindings are missing when I link it against CaNS.

Cheers

io_field_hdf5

In load.f90, is the subroutine io_field_hdf5 complete? I am having an issue compiling when enabling this subroutine. My system has modules loaded for HDF5, and I added "use hdf5" as well; yet, I am encountering errors. I would really appreciate it if further reading on HDF5 could be given in the README, if possible.

Add `SKIP_IO` pre-processor macro.

We often need a feature like this when benchmarking at scale, where we do not care about large field data I/O.

Tasks:

  • skip out2d writing;
  • skip out3d writing;
  • skip checkpointing.

multi-block implementation for more complex geometries

Implementing a multi-block approach to handle more complex geometries (e.g. a T-junction) while still using the FFT-based solver. This could be achieved by using a block cyclic reduction method to solve the resulting tri-diagonal system. Collaborators to help out with this feature are very welcome!
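
To illustrate the building block (a hypothetical, scalar, serial Python sketch; a multi-block solver would operate on blocks and run in parallel), cyclic reduction for a single tridiagonal system looks like this:

# Sketch of (scalar, non-block) cyclic reduction for a tridiagonal system,
# the building block behind the block cyclic reduction idea mentioned above.
# Written for n = 2**m - 1 unknowns for simplicity.
import numpy as np

def cyclic_reduction(a, b, c, d):
    # solve a tridiagonal system (a: sub-, b: main, c: super-diagonal)
    a, b, c, d = (np.asarray(v, dtype=float).copy() for v in (a, b, c, d))
    n = len(b)
    m = int(np.log2(n + 1))
    assert n == 2**m - 1, "this simple version assumes n = 2**m - 1"
    for l in range(1, m):                      # forward elimination
        h, s = 2**(l - 1), 2**l
        for i in range(s - 1, n, s):
            al = -a[i] / b[i - h]
            be = -c[i] / b[i + h]
            b[i] += al * c[i - h] + be * a[i + h]
            d[i] += al * d[i - h] + be * d[i + h]
            a[i] = al * a[i - h]
            c[i] = be * c[i + h]
    x = np.zeros(n)
    for l in range(m - 1, -1, -1):             # back substitution
        h = 2**l
        for i in range(h - 1, n, 2 * h):
            xl = x[i - h] if i - h >= 0 else 0.0
            xr = x[i + h] if i + h < n else 0.0
            x[i] = (d[i] - a[i] * xl - c[i] * xr) / b[i]
    return x

# quick check against a dense solve
n = 31
rng = np.random.default_rng(2)
a, c = rng.random(n), rng.random(n)
b = 4.0 + rng.random(n)                        # diagonally dominant
d = rng.random(n)
A = np.diag(b) + np.diag(a[1:], -1) + np.diag(c[:-1], 1)
print(np.abs(cyclic_reduction(a, b, c, d) - np.linalg.solve(A, d)).max())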

Implicit `firstprivate` generated when compiling with OpenACC

Since 23.3 (?), nvfortran writes Generating implicit firstprivate(...). Though this is not a problem, to be consistent with our "explicit is better than implicit" philosophy, it would be good to explicitly declare these with the firstprivate attribute (even though the standard mentions that this is OpenACC's default for scalar values).

Consider adding test of kinetic energy preservation in the inviscid limit.

Example file for this test:

&dns
ng(1:3) = 32, 32, 32
l(1:3)  = 6.283185307179586, 6.283185307179586, 6.283185307179586
gtype = 1, gr = 0.
cfl = 0.95, dtmin = 1.e-3
visci = 1.
inivel = 'tgv'
is_wallturb = F
nstep = 100, time_max = 100., tw_max = 0.1
stop_type(1:3) = F, T, F
restart = F, is_overwrite_save = T, nsaves_max = 0
icheck = 10, iout0d = 10, iout1d = 100, iout2d = 500, iout3d = 1000, isave = 1000
cbcvel(0:1,1:3,1) = 'P','P',  'P','P',  'P','P'
cbcvel(0:1,1:3,2) = 'P','P',  'P','P',  'P','P'
cbcvel(0:1,1:3,3) = 'P','P',  'P','P',  'P','P'
cbcpre(0:1,1:3)   = 'P','P',  'P','P',  'P','P'
bcvel(0:1,1:3,1) =  0.,0.,   0.,0.,   0.,0.
bcvel(0:1,1:3,2) =  0.,0.,   0.,0.,   0.,0.
bcvel(0:1,1:3,3) =  0.,0.,   0.,0.,   0.,0.
bcpre(0:1,1:3)   =  0.,0.,   0.,0.,   0.,0.
bforce(1:3) = 0., 0., 0.
is_forced(1:3) = F, F, F
velf(1:3) = 0., 0., 0.
dims(1:2) = 0, 0
/

&cudecomp
cudecomp_t_comm_backend = 0, cudecomp_is_t_enable_nccl = T, cudecomp_is_t_enable_nvshmem = T
cudecomp_h_comm_backend = 0, cudecomp_is_h_enable_nccl = T, cudecomp_is_h_enable_nvshmem = T
/

reconsider mixed-precision mode?

So, while the mixed-precision mode yields excellent results for many benchmarks, it results in a more complex code that is harder to follow for the average user. Simply performing the whole calculation in lower precision seems to do a decent job for many setups, so the mixed-precision mode is not crucial for most cases.

Hence, in favor of a more readable code, we removed this feature from the main branch, a decision that can be reconsidered in the future. This option can still be explored in v2.0.1, and is valuable for very high Reynolds numbers or other setups with extremely fine grids.

Input namelist files need fixing.

Specifically, one needs to replace occurrences as indicated below (extra spaces in the indexing).

< bcpre(0:1,1:3  ) =  0.,0.,   0.,0.,   0.,0.

> bcpre(0:1,1:3) =  0.,0.,   0.,0.,   0.,0.

add cloning instructions in the `README.md` file

It should be added in the README.md file that the project should be cloned with

git clone --recursive https://github.com/CaNS-World/CaNS

for the submodules to be downloaded along with the repo, or with the following command

git submodule update --init --recursive

in case the repository has already been cloned without submodules.

Exec format error

Good evening,

I have a problem while running

./cans

I get as output an error (see the attached screenshot "erroreCaNS").
I currently use WSL2 with Ubuntu 20.04.4 LTS; both are 64-bit, and so is the cans file.
Could you please help me out?
Thank you so much,
Tommaso

I/O update

Consider:

  • writing one binary field per saved scalar field;
  • adding an HDF5 backend;
  • if that is done, updating the Python xdmf writer too, to support the new format (n.b.: time and step number metadata can be added as attributes);
  • also embedding x_g, y_g, and z_g in the HDF5 file under a grid/ group;
  • the Python xdmf/HDF5 writer could also save the binary grids as HDF5.
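
A hypothetical sketch (Python, using h5py; dataset and group names are illustrative only, not an agreed format) of what such an HDF5 backend could look like:

# Hypothetical sketch of the proposed HDF5 layout: one dataset per saved
# scalar field, the grid under a "grid/" group, and time/step metadata as
# attributes.  Names and layout are illustrative only.
import numpy as np
import h5py

def write_field_hdf5(fname, name, field, x_g, y_g, z_g, time, istep):
    with h5py.File(fname, "a") as f:
        grp = f.require_group("grid")
        for axis, coord in zip(("x_g", "y_g", "z_g"), (x_g, y_g, z_g)):
            if axis not in grp:
                grp.create_dataset(axis, data=coord)
        dset = f.create_dataset(name, data=field)
        dset.attrs["time"] = time
        dset.attrs["istep"] = istep

# toy usage
nx = ny = nz = 8
u = np.zeros((nx, ny, nz))
write_field_hdf5("fld_0000.h5", "u", u,
                 np.linspace(0, 1, nx), np.linspace(0, 1, ny),
                 np.linspace(0, 1, nz), time=0.0, istep=0)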

Errors compiling for Cray machines

ftn -cpp -O2 -DDEBUG -DTIMING -c chkdt.f90

module mod_chkdt
^
ftn-855 crayftn: ERROR MOD_CHKDT, File = chkdt.f90, Line = 1, Column = 8
The compiler has detected errors in module "MOD_CHKDT". No module information file will be created for this module.
