cans-world / cans

A code for fast, massively-parallel direct numerical simulations (DNS) of canonical flows

License: MIT License

Fortran 86.61% Makefile 1.98% Python 9.66% Shell 0.21% MATLAB 0.95% Awk 0.58%
fluid-dynamics fluid-simulation computational-fluid-dynamics turbulence high-performance-computing cfd fortran gpu gpu-computing

CaNS's Introduction

Synopsis

CaNS (Canonical Navier-Stokes) is a code for massively-parallel numerical simulations of fluid flows. It aims at solving any flow of an incompressible, Newtonian fluid that can benefit from an FFT-based solver for the second-order finite-difference Poisson equation on a 3D Cartesian grid. In two directions the grid is regular, and the solver supports the following combinations of (homogeneous) boundary conditions:

  • Neumann-Neumann
  • Dirichlet-Dirichlet
  • Neumann-Dirichlet
  • Periodic

In the third domain direction, the solver is more flexible as it uses Gauss elimination. There the grid can also be non-uniform (e.g. fine at the boundary and coarser in the center).
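
For illustration only, the snippet below is a minimal, self-contained Python sketch (CaNS itself is Fortran) of this solution strategy on a small 2D example: an FFT along a periodic direction turns the second-order finite-difference Laplacian into modified wavenumbers, leaving one independent tridiagonal system per mode along the wall-normal direction, solved by Gauss elimination. Grid sizes, boundary conditions, and all names are illustrative assumptions, not CaNS code.

# Minimal sketch (not CaNS itself): FFT-based solution of the second-order
# finite-difference Poisson equation on a 2D cell-centered grid, periodic in x
# and with homogeneous Dirichlet walls in z -- FFT along x, Gauss elimination
# (Thomas algorithm) along z, mirroring the strategy described above.
import numpy as np

nx, nz = 64, 48
lx, lz = 1.0, 2.0
dx, dz = lx / nx, lz / nz

def laplacian(p):
    # discrete Laplacian: periodic in x, p = 0 at the two walls in z
    pe = np.empty((nx, nz + 2))
    pe[:, 1:-1] = p
    pe[:, 0]  = -p[:, 0]    # ghost cells enforcing p = 0 at the lower wall
    pe[:, -1] = -p[:, -1]   # and at the upper wall (cell-centered grid)
    lap  = (np.roll(p, -1, 0) - 2*p + np.roll(p, 1, 0)) / dx**2
    lap += (pe[:, 2:] - 2*pe[:, 1:-1] + pe[:, :-2]) / dz**2
    return lap

def thomas(a, b, c, d):
    # Gauss elimination for a tridiagonal system (a: sub, b: diag, c: super)
    n = len(b)
    b, d = b.copy(), d.copy()
    for j in range(1, n):
        w = a[j] / b[j-1]
        b[j] -= w * c[j-1]
        d[j] -= w * d[j-1]
    x = np.empty_like(d)
    x[-1] = d[-1] / b[-1]
    for j in range(n - 2, -1, -1):
        x[j] = (d[j] - c[j] * x[j+1]) / b[j]
    return x

def solve_poisson(f):
    # solve lap(p) = f: FFT in x, one tridiagonal solve in z per x-mode
    fh = np.fft.rfft(f, axis=0)
    k = np.arange(fh.shape[0])
    lam = (2*np.cos(2*np.pi*k/nx) - 2) / dx**2       # modified wavenumbers
    off = np.full(nz, 1/dz**2)                       # sub/super-diagonals
    ph = np.empty_like(fh)
    for i in range(fh.shape[0]):
        dia = np.full(nz, -2/dz**2) + lam[i]
        dia[0]  -= 1/dz**2                           # Dirichlet ghost rows
        dia[-1] -= 1/dz**2
        ph[i] = thomas(off, dia.astype(complex), off, fh[i])
    return np.fft.irfft(ph, n=nx, axis=0)

rng = np.random.default_rng(0)
p_ref = rng.standard_normal((nx, nz))
p_num = solve_poisson(laplacian(p_ref))
print("max error:", np.abs(p_num - p_ref).max())     # round-off-level error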

CaNS also allows for choosing an implicit temporal discretization of the momentum diffusion terms, either fully implicit or only along the last domain direction. This results in solving a 3D/1D Helmholtz equation per velocity component. In the fully implicit case, FFT-based solvers are also used, and the same options described above for pressure boundary conditions apply to the velocity.

Reference

P. Costa. A FFT-based finite-difference solver for massively-parallel direct numerical simulations of turbulent flows. Computers & Mathematics with Applications 76: 1853--1862 (2018). doi:10.1016/j.camwa.2018.07.034 [arXiv preprint]

News

[10/08/2023]: The input files dns.in and cudecomp.in have been replaced with the namelist file input.nml, which simplifies the parsing of input files and their extension in more complex solvers based on CaNS. See the updated docs/INFO_INPUT.md file for more details. Additionally, we have added a new input parameter, gtype, to explicitly select the type of grid stretching function.

[03/02/2023]: The input file dns.in has been simplified to avoid a common source of confusion. Instead of prescribing uref, lref, and rey (reference velocity and length scales, and Reynolds number) to calculate the fluid viscosity as visc = uref*lref/rey, we directly prescribe the inverse of the viscosity, visci (visc = visci**(-1)), so all inputs are dimensional (see the updated docs/INFO_INPUT.md file). Note that visci has the same value as the flow Reynolds number for all files under examples/, as uref and lref were always equal to 1. This change is backward-incompatible: former input files should be updated from v2.2.0 onward!

[24/10/2022]: Option SINGLE_PRECISION_POISSON has been removed from the main branch. While solving the Poisson equation in lower precision yields excellent results for many benchmarks, several of these cases also perform well when the whole calculation is performed in lower precision (see #42). Since this mode introduces significant complexity, it has been removed from the main branch for now in favor of a more readable code, a decision that can be reconsidered in the future. This option can still be explored in v2.0.1, and is valuable for, e.g., setups with high Reynolds numbers and/or extremely fine grids.

Major Update: CaNS 2.0 is finally out! 🎉

CaNS 2.0 has many new features, being the result of the most significant revision effort undertaken so far. It includes major improvements in performance and robustness, and a fresh hardware-adaptive many-GPU parallelization using the cuDecomp library. See docs/CaNS-2.0.md for a detailed description of all new features. CaNS 2.0 has been tested and observed to run efficiently on some major GPU-accelerated clusters such as Perlmutter, Summit, and Marconi 100.

Features

Some features are:

  • Hybrid MPI/OpenMP parallelization
  • FFTW guru interface / cuFFT used for computing multi-dimensional vectors of 1D transforms
  • The right type of transformation (Fourier, cosine, sine, etc.) is automatically determined from the input file
  • cuDecomp pencil decomposition library for hardware-adaptive distributed memory calculations on many GPUs
  • 2DECOMP&FFT library used for performing global data transpositions on CPUs and some of the data I/O
  • GPU acceleration using OpenACC directives
  • A different canonical flow can be simulated just by changing the input files

Some examples of flows that this code can solve are:

  • periodic or developing channel
  • periodic or developing square duct
  • tri-periodic domain
  • lid-driven cavity

Motivation

This project first aimed at being a modern alternative to the well-known FISHPACK routines (Paul Swarztrauber & Roland Sweet, NCAR) for solving a three-dimensional Helmholtz equation. After noticing some works simulating canonical flows with iterative solvers -- when faster direct solvers could have been used instead -- it seemed natural to create a versatile tool and make it available. This code can be used as a base code from which solvers for more complex flows can be developed (e.g., extensions with fictitious domain methods).

Method

The fluid flow is solved with a second-order finite-difference pressure-correction scheme, discretized on a MAC (staggered) grid arrangement. Time is advanced with a three-step low-storage Runge-Kutta scheme. Optionally, for increased stability at low Reynolds numbers, at the price of a higher computational demand, the diffusion term can be treated implicitly. See the reference above for details.
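
For illustration, below is a small Python sketch of the low-storage RK3 update structure; the coefficients are the classic Wray/Spalart-et-al. choice, given here as an assumption for demonstration rather than taken from the CaNS source, and the pressure-correction step that CaNS performs after each substep is only indicated by a comment.

# Schematic sketch of a three-step low-storage Runge-Kutta update of the form
# u <- u + dt*(gamma_k*rhs(u) + zeta_k*rhs_old), common in DNS codes.
# The coefficients below are the classic Wray choice -- an assumption for
# illustration, not taken from the CaNS source.
import numpy as np

gamma = (8/15, 5/12, 3/4)
zeta  = (0.0, -17/60, -5/12)

def rk3_step(u, dt, rhs):
    # advance u by one full time step dt with the low-storage RK3 scheme
    rhs_old = 0.0
    for g, z in zip(gamma, zeta):
        rhs_new = rhs(u)
        u = u + dt * (g * rhs_new + z * rhs_old)
        rhs_old = rhs_new
        # ... in an incompressible solver, a Poisson solve and velocity
        #     projection would follow here, once per substep
    return u

# quick order-of-accuracy check on dy/dt = -y, y(0) = 1
for n in (10, 20, 40, 80):
    dt, y = 1.0 / n, 1.0
    for _ in range(n):
        y = rk3_step(y, dt, lambda u: -u)
    print(n, abs(y - np.exp(-1.0)))   # error drops by ~8x per halving of dt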

Usage

Downloading CaNS

Since CaNS loads the external pencil decomposition libraries as Git Submodules, the repository should be cloned as follows:

git clone --recursive https://github.com/CaNS-World/CaNS

so the libraries are downloaded too. Alternatively, in case the repository has already been cloned without the Submodules (i.e., folders cuDecomp and 2decomp-fft under dependencies/ are empty), the following command can be used to update them:

git submodule update --init --recursive

Compilation

Prerequisites

The prerequisites for compiling CaNS are the following:

  • MPI
  • FFTW3/cuFFT library for CPU/GPU runs
  • The nvfortran compiler (for GPU runs)
  • NCCL and NVSHMEM (optional, may be exploited by the cuDecomp library)
  • OpenMP (optional)

In short

For most systems, CaNS can be compiled from the root directory with the following commands: make libs && make, which will first compile the 2DECOMP&FFT/cuDecomp libraries and then CaNS.

Detailed instructions

The Makefile in the root directory is used to compile the code, and is expected to work out-of-the-box for most systems. The build.conf file in the root directory can be used to choose the Fortran compiler (MPI wrapper), a few pre-defined profiles depending on the nature of the run (e.g., production vs debugging), and the pre-processing options; see INFO_COMPILING.md for more details. Concerning the pre-processing options, the following are available:

  • DEBUG : performs some basic checks for debugging purposes
  • TIMING : wall-clock time per time step is computed
  • IMPDIFF : diffusion terms are integrated implicitly in time (thereby improving the stability of the numerical algorithm for viscous-dominated flows)
  • IMPDIFF_1D : same as above, but with implicit diffusion only along Z; for optimal parallel performance this option should be combined with PENCIL_AXIS=3
  • PENCIL_AXIS : sets the default pencil direction, one of [1,2,3] for [X,Y,Z]-aligned pencils; X-aligned is the default and should be optimal for all cases except for Z implicit diffusion, where using Z-pencils is recommended
  • SINGLE_PRECISION : calculation will be carried out in single precision (the default precision is double)
  • GPU : enable GPU-accelerated runs
  • USE_NVTX : enable NVTX tags for profiling

Input file

The input file input.nml sets the physical and computational parameters. In the examples/ folder are examples of input files for several canonical flows. See INFO_INPUT.md for a detailed description of the input file.

Files out1d.h90, out2d.h90 and out3d.h90 in src/ set which data are written in 1-, 2- and 3-dimensional output files, respectively. The code should be recompiled after editing out?d.h90 files.

Running the code

Run the executable with mpirun with a number of tasks consistent with what has been set in the input file input.nml. Data will be written by default in a folder named data/, which must be located where the executable is run (by default in the run/ folder).

Visualizing field data

See INFO_VISU.md.

Contributing

We appreciate any contributions and feedback that can improve CaNS. If you wish to contribute to the tool, please get in touch with the maintainers or open an Issue in the repository / a thread in Discussions. Pull Requests are welcome, but please propose/discuss the changes in a linked Issue first.

Final notes

Please read the ACKNOWLEDGEMENTS and LICENSE files.

CaNS's People

Contributors

gabrieleboga, gianlupo, nscapin, p-costa


CaNS's Issues

Error while running

Hi

I have successfully compiled CaNS on Aaditya HPC at IITM Pune. I am getting the following error while running the code.
$ mpirun -n 4 ./cans
At line 100 of file main.f90 (unit = 99, file = '')
Fortran runtime error: File 'data/grid.bin' does not exist

Do we need the grid.bin file to run the program?

Reconsider flow forcing?

When we brutally force the flow by computing a volume integral to determine the pressure gradient that needs to be added to the flow to sustain a certain bulk velocity, we incur a significant error when performing calculations in single precision, at least with gfortran and ifort; nvfortran still performs very well.

Instead, we can simply use the other mode that is available, but not active by default in CaNS, of using a surface integral of the wall shear stresses to prescribe zero net acceleration. The only thing one should be cautious about is to make sure that the time integral of the wall shear stress is consistent with the time integration scheme (i.e., fully explicit, all implicit, or z-implicit).

Of course, in case we move towards simply prescribing zero net acceleration, the velf parameter should remain in dns.in, and will be used to prescribe the target bulk velocity in the initial condition.
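
As a side note, the accumulation issue can be reproduced outside the Fortran code; the hypothetical Python snippet below simply contrasts a naive single-precision running sum (analogous to a volume integral of the velocity) with a double-precision reference.

# Toy illustration (Python, not the Fortran code) of why a naive bulk-velocity
# (volume) integral accumulated in single precision loses accuracy: summing
# many O(1) values into a float32 accumulator drifts, whereas accumulating in
# double precision does not.
import numpy as np

rng = np.random.default_rng(1)
u = (1.0 + 0.1 * rng.standard_normal(1_000_000)).astype(np.float32)

acc32 = np.float32(0.0)
for v in u:                              # naive running sum in single precision
    acc32 += v
mean32 = acc32 / u.size

mean64 = u.astype(np.float64).mean()     # reference in double precision
print("float32 accumulator :", mean32)
print("float64 reference   :", mean64)
print("relative error      :", abs(mean32 - mean64) / mean64)
# the float32 result typically differs well beyond float32 round-off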

Naming inconsistencies under mom.

Nit - fix the notation under mom.f90 such that the terms actually reflect what $\nabla \cdot \mathbf{u}\otimes\mathbf{u}$ looks like when expanded.

add `num_checkpoint_max` parameter

In order to maintain a small number of checkpoint files, introduce a num_checkpoint_max parameter, n, so that only the last n checkpoints are kept and older ones are overwritten. Say n=5; the saves will proceed in time as follows:

1, 2, 3, 4, 5 ; 6->1, 7->2, 8->3, 9->4, 10->5; with -> meaning that files are overwritten.

One can still use the current symbolic link approach to have fld.bin pointing to the last saved files.

Thanks @arashalizadbanaei for the discussion!
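
A minimal sketch of the overwrite pattern described above (hypothetical Python helper, not existing CaNS code):

def checkpoint_slot(isave: int, num_checkpoint_max: int) -> int:
    # return the (1-based) slot that the isave-th checkpoint overwrites
    return (isave - 1) % num_checkpoint_max + 1

print([checkpoint_slot(i, 5) for i in range(1, 11)])
# -> [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]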

consider pencil <-> slab data redistribution to avoid two all-to-all collectives

This avoids, whenever possible, two all-to-all collectives in the Poisson solver, while still allowing for keeping a default 2D domain decomposition.

Steps:

  • Port the init_transpose_slab and transpose_slab routines from SNaC;
  • Optionally, draft an MPI point-to-point alternative to the MPI_Alltoallw implementation, with GPU-GPU communication in mind. It is quick and straightforward anyway.

Reconsider some input parameters under `dns.in`

The input file dns.in is often a source of confusion.

CaNS solves the dimensional Navier-Stokes equations, and most input parameters should have consistent dimensions (e.g., [meters] for length and [seconds] for time imply [meters/second] for velocity).

However, for convenience, we also defined three input parameters denoted uref (reference velocity), lref (reference length), and rey (Reynolds number), because in most simple use cases of CaNS one is more interested in defining a Reynolds number.

These three variables are just used to calculate the viscosity visc = lref*uref/rey, which should also have consistent dimensions [meters^2/second].
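
For concreteness, the current convention and the inverse-viscosity alternative described in the news entry above yield the same viscosity; the snippet below is just an arithmetic illustration with arbitrary example values.

# current dns.in convention vs. prescribing the inverse viscosity directly
uref, lref, rey = 1.0, 1.0, 5640.0    # arbitrary example values
visc_old = uref * lref / rey          # visc computed from uref, lref, rey
visci = 5640.0                        # inverse viscosity, prescribed directly
visc_new = 1.0 / visci
assert abs(visc_old - visc_new) < 1e-15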

There are however a few issues with this approach and current implementation:

  • it (understandably) confuses new users about the inner workings of CaNS ("is it solving dimensional or non-dimensional equations?");
  • uref and lref are also used outside the calculation of the viscosity, under initflow.f90, to set the initial conditions, which can be an issue if uref and lref are not consistent with other flow parameters.

To settle this issue, it may be better to:

  • change dns.in so that it takes the viscosity (or the inverse of the viscosity, as in many example setups it should match the corresponding Reynolds number);
  • have ubulk and lref under initflow.f90 consistently computed from the relevant input parameters (such as boundary conditions, mean pressure gradient, or forced bulk velocity);
  • consider implementing the changes in the parsing of dns.in under param.f90 such that they are backward-compatible, i.e., if an "old" input file format is prescribed, the code can still compute visc and run?;
  • document the current and possible future approach better under INFO_INPUT.md.

Revise OpenMP statements?

E.g., perhaps we could add COLLAPSE(3) clauses to the loops, or simply wait until DO CONCURRENT is ready for prime time (and then minimize OpenACC & OpenMP directives)?

Add more options to `initgrid.f90`

Not all open channel setups' initial conditions are covered under initgrid.f90; some of them should be added. Ideally, the type of grid stretching should not be bound to the initial condition option, but it is not bad to keep things as they are right now and just add the other setups.
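
As a generic illustration of grid clustering (a hypothetical Python sketch, not the actual functions in initgrid.f90 -- the stretching functions and gtype options used by CaNS are defined in the source and in docs/INFO_INPUT.md), a tanh-type stretching that refines the grid near both walls could look like:

import numpy as np

def clustered_faces(nz: int, lz: float, gr: float) -> np.ndarray:
    # cell-face coordinates in [0, lz], clustered near z = 0 and z = lz;
    # gr controls the clustering strength, gr -> 0 recovers a uniform grid
    s = np.linspace(0.0, 1.0, nz + 1)           # uniform parameter
    if gr == 0.0:
        return lz * s                           # uniform grid
    return lz * 0.5 * (1.0 + np.tanh(gr * (s - 0.5)) / np.tanh(gr * 0.5))

zf = clustered_faces(64, 1.0, gr=3.0)
dz = np.diff(zf)
print(dz.min(), dz.max())   # smallest cells at the walls, largest in the centre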

Documentation for visualization tools?

Hi,

There is some Python code which does visualization.

It would be good to show an example usage in the README.md.

There is an xdmf-generation Fortran code. The example shell script does not say very much about how it can be used.

I hope to ingest the computed data files *.out and do 3D visualization in applications like ParaView and Houdini.

Cheers

FFTW3 fortran

Hi,

Is there a specific version of FFTW3 I should use?

I obtained the latest version, but when I build it, it seems the Fortran bindings are missing when I link it against CaNS.

Cheers

io_field_hdf5

In load.f90, is the subroutine io_field_hdf5 complete? I am having an issue compiling when enabling this subroutine. My system has modules loaded for HDF5, and I added "use hdf5" as well; yet, I am encountering errors. I would really appreciate it if further reading on HDF5 could be given in the README, if possible.

Add `SKIP_IO` pre-processor macro.

We often need a feature like this when benchmarking at scale, where we do not care about large field data I/O.

Tasks:

  • skip out2d writing;
  • skip out3d writing;
  • skip checkpointing.

multi-block implementation for more complex geometries

Implementing a multi-block approach to handle more complex geometries (e.g. a T-junction) while still using the FFT-based solver. This could be achieved by using a block cyclic reduction method to solve the resulting tri-diagonal system. Collaborators to help out with this feature are very welcome!
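
To illustrate the building block (a hypothetical, scalar, serial Python sketch; a multi-block solver would operate on blocks and run in parallel), cyclic reduction for a single tridiagonal system looks like this:

# Sketch of (scalar, non-block) cyclic reduction for a tridiagonal system,
# the building block behind the block cyclic reduction idea mentioned above.
# Written for n = 2**m - 1 unknowns for simplicity.
import numpy as np

def cyclic_reduction(a, b, c, d):
    # solve a tridiagonal system (a: sub-, b: main, c: super-diagonal)
    a, b, c, d = (np.asarray(v, dtype=float).copy() for v in (a, b, c, d))
    n = len(b)
    m = int(np.log2(n + 1))
    assert n == 2**m - 1, "this simple version assumes n = 2**m - 1"
    for l in range(1, m):                      # forward elimination
        h, s = 2**(l - 1), 2**l
        for i in range(s - 1, n, s):
            al = -a[i] / b[i - h]
            be = -c[i] / b[i + h]
            b[i] += al * c[i - h] + be * a[i + h]
            d[i] += al * d[i - h] + be * d[i + h]
            a[i] = al * a[i - h]
            c[i] = be * c[i + h]
    x = np.zeros(n)
    for l in range(m - 1, -1, -1):             # back substitution
        h = 2**l
        for i in range(h - 1, n, 2 * h):
            xl = x[i - h] if i - h >= 0 else 0.0
            xr = x[i + h] if i + h < n else 0.0
            x[i] = (d[i] - a[i] * xl - c[i] * xr) / b[i]
    return x

# quick check against a dense solve
n = 31
rng = np.random.default_rng(2)
a, c = rng.random(n), rng.random(n)
b = 4.0 + rng.random(n)                        # diagonally dominant
d = rng.random(n)
A = np.diag(b) + np.diag(a[1:], -1) + np.diag(c[:-1], 1)
print(np.abs(cyclic_reduction(a, b, c, d) - np.linalg.solve(A, d)).max())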

Implicit `firstprivate` generated when compiling with OpenACC

Since 23.3 (?), nvfortran writes Generating implicit firstprivate(...). Though this is not a problem, to be consistent with our "explicit is better than implicit" philosophy, it would be good to explicitly declare these with the firstprivate attribute (even though the standard mentions that this is OpenACC's default for scalar values).

Consider adding test of kinetic energy preservation in the inviscid limit.

Example file for this test:

&dns
ng(1:3) = 32, 32, 32
l(1:3)  = 6.283185307179586, 6.283185307179586, 6.283185307179586
gtype = 1, gr = 0.
cfl = 0.95, dtmin = 1.e-3
visci = 1.
inivel = 'tgv'
is_wallturb = F
nstep = 100, time_max = 100., tw_max = 0.1
stop_type(1:3) = F, T, F
restart = F, is_overwrite_save = T, nsaves_max = 0
icheck = 10, iout0d = 10, iout1d = 100, iout2d = 500, iout3d = 1000, isave = 1000
cbcvel(0:1,1:3,1) = 'P','P',  'P','P',  'P','P'
cbcvel(0:1,1:3,2) = 'P','P',  'P','P',  'P','P'
cbcvel(0:1,1:3,3) = 'P','P',  'P','P',  'P','P'
cbcpre(0:1,1:3)   = 'P','P',  'P','P',  'P','P'
bcvel(0:1,1:3,1) =  0.,0.,   0.,0.,   0.,0.
bcvel(0:1,1:3,2) =  0.,0.,   0.,0.,   0.,0.
bcvel(0:1,1:3,3) =  0.,0.,   0.,0.,   0.,0.
bcpre(0:1,1:3)   =  0.,0.,   0.,0.,   0.,0.
bforce(1:3) = 0., 0., 0.
is_forced(1:3) = F, F, F
velf(1:3) = 0., 0., 0.
dims(1:2) = 0, 0
/

&cudecomp
cudecomp_t_comm_backend = 0, cudecomp_is_t_enable_nccl = T, cudecomp_is_t_enable_nvshmem = T
cudecomp_h_comm_backend = 0, cudecomp_is_h_enable_nccl = T, cudecomp_is_h_enable_nvshmem = T
/

reconsider mixed-precision mode?

So, while the mixed-precision mode yields excellent results for many benchmarks, it results in a more complex code that is harder to follow for the average user. Simply performing the whole calculation in lower precision seems to do a decent job for many setups, so the mixed-precision mode is not crucial for most cases.

Hence, in favor of a more readable code, we removed this feature from the main branch, a decision that can be reconsidered in the future. This option can still be explored in v2.0.1, and is valuable for very high Reynolds numbers or other setups with extremely fine grids.

Input namelist files need fixing.

Specifically, one needs to replace occurrences as indicated below (extra spaces in the indexing).

< bcpre(0:1,1:3  ) =  0.,0.,   0.,0.,   0.,0.

> bcpre(0:1,1:3) =  0.,0.,   0.,0.,   0.,0.

add cloning instructions in the `README.md` file

It should be added in the README.md file that the project should be cloned with

git clone --recursive https://github.com/CaNS-World/CaNS

for the submodules to be downloaded along with the repo, or with the following command

git submodule update --init --recursive

in case the repository has already been cloned without submodules.

Exec format error

Good evening,

I have a problem while running

./cans

I get as output an error (see the attached screenshot "erroreCaNS").
I currently use WSL2 with Ubuntu 20.04.4 LTS; both are 64-bit, and so is the cans file.
Could you please help me out?
Thank you so much,
Tommaso

I/O update

Consider:

  • writing one binary field per saved scalar field;
  • adding an HDF5 backend;
  • if that is done, updating the Python xdmf writer too, to support the new format (n.b.: time and step number metadata can be added as attributes);
  • also embedding x_g, y_g, and z_g in the HDF5 file under a grid/ group;
  • the Python xdmf/HDF5 writer could also save the binary grids as HDF5.
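
A hypothetical sketch (Python, using h5py; dataset and group names are illustrative only, not an agreed format) of what such an HDF5 backend could look like:

# Hypothetical sketch of the proposed HDF5 layout: one dataset per saved
# scalar field, the grid under a "grid/" group, and time/step metadata as
# attributes.  Names and layout are illustrative only.
import numpy as np
import h5py

def write_field_hdf5(fname, name, field, x_g, y_g, z_g, time, istep):
    with h5py.File(fname, "a") as f:
        grp = f.require_group("grid")
        for axis, coord in zip(("x_g", "y_g", "z_g"), (x_g, y_g, z_g)):
            if axis not in grp:
                grp.create_dataset(axis, data=coord)
        dset = f.create_dataset(name, data=field)
        dset.attrs["time"] = time
        dset.attrs["istep"] = istep

# toy usage
nx = ny = nz = 8
u = np.zeros((nx, ny, nz))
write_field_hdf5("fld_0000.h5", "u", u,
                 np.linspace(0, 1, nx), np.linspace(0, 1, ny),
                 np.linspace(0, 1, nz), time=0.0, istep=0)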

Errors compiling for Cray machines

ftn -cpp -O2 -DDEBUG -DTIMING -c chkdt.f90

module mod_chkdt
^
ftn-855 crayftn: ERROR MOD_CHKDT, File = chkdt.f90, Line = 1, Column = 8
The compiler has detected errors in module "MOD_CHKDT". No module information file will be created for this module.
