stellagk / stella

stella is a flux tube gyrokinetic code for micro-stability and turbulence simulations of strongly magnetised plasma.

Home Page: https://stellagk.github.io/stella/

License: Other

Languages: Makefile 0.49%, Fortran 54.72%, Emacs Lisp 0.07%, TeX 3.60%, Perl 0.55%, Python 38.66%, C 0.06%, Shell 0.38%, CMake 1.35%, BitBake 0.02%, GLSL 0.11%
Topics: plasma-physics

stella's People

Contributors

alexvboe, antoniog-jerez, d7919, daringli, daviddickinson, densto, georgiaacton, hannethienpondt, jfparisi, jregana, landreman, mabarnes, mrhardman, sstroteich, zedthree


stella's Issues

Namelist documentation from database

I know GS2 has a very sophisticated way of autogenerating the namelist documentation on the website. I don't think we need something so complex... maybe we could store all the namelist info in a JSON or XML file which could then be translated to HTML on the documentation website somehow? Peter, do you know of any standard ways of doing this? I don't mind writing up the JSON/XML file.
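
As a rough illustration of what I have in mind, a minimal Python sketch that renders one HTML table per namelist. The schema and the single entry shown are made up for illustration; the actual variable descriptions and defaults would come from whoever writes the JSON file.

    import json

    # Hypothetical schema: {namelist: {variable: {type, default, description}}}.
    # The entry below is illustrative only, not the actual stella default.
    namelists = json.loads("""
    {
      "zgrid_parameters": {
        "adiabatic_option": {
          "type": "string",
          "default": "example-default",
          "description": "Controls the form of the adiabatic electron response."
        }
      }
    }
    """)

    for name, variables in namelists.items():
        print(f"<h2>{name}</h2>")
        print("<table><tr><th>variable</th><th>type</th><th>default</th><th>description</th></tr>")
        for var, info in variables.items():
            print(f"<tr><td>{var}</td><td>{info['type']}</td>"
                  f"<td>{info['default']}</td><td>{info['description']}</td></tr>")
        print("</table>")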

Remove USE_MPI as a makefile option

Stella should always be compiled with MPI, so this Makefile option seems superfluous. Removing it would certainly help clean up some ifdefs in the preprocessor file. Of course, this may all be mooted if we decide to move to cmake.

volume average in radially global stella

Currently, stella uses dl_over_b (an element of the flux-surface average) to calculate the volume averages of the fluxes for the diagnostics. This approach works in local stella because dl_over_b is radially homogeneous, but it will fail in the global setting, where dl_over_b does have radial profile variation. This will need to be fixed.

mpi_win_free error due to apparent bug in finish_sources subroutine

When running on 48 cores of a single node on Marconi, standard nonlinear simulations run from a noise initial condition make it to the end of all time steps but crash during an mpi_win_free call in the finish_sources subroutine. The error message is:

Fatal error in MPI_Win_free: Invalid MPI_Win, error stack:
MPI_Win_free(147): MPI_Win_free(win=0x16ab1c0) failed
MPI_Win_free(75).: Invalid MPI_Win

Numerical instability in nonlinear electrostatic "big box" ETG simulations

Performing simulations recently on Marconi, I noticed an unusual numerical instability. With commit 8eba66a (close enough to the master of stella to be of interest, I hope), I was testing the performance of stella with multiscale kx ky grids. I noticed that after several restarts the simulations grind to a halt, with ever-decreasing time steps and increasing amplitudes. The potential spectrum looks like the attached figure:
comparephikytlow-power-es0-tprim-1.75-0.pdf
the input file for this case is here with other output files (but not the netcdf files):
multiscale-case.zip

To determine whether this fault results from the multiscale nature of the simulation or from some other cause, I drastically reduced the kx ky resolution, with the following results. We still find a numerical instability:
comparephikytlow-power-es-tprim-1.75-0-full-range.pdf

The output and input files for this case are here:

singlescale-case.zip

Comments and further investigations of this issue are welcomed.

adiabatic_option deprecation notice

Recently, adiabatic_option has been moved from dist_fn_knobs to zgrid_parameters. This makes old input files obsolete, so we may want to throw a deprecation notice/error either if dist_fn_knobs is found or if adiabatic_option is not found. (This can be removed down the line.)

Interface with pyrokinetics

Pyrokinetics has settled down in terms of structure, so now would be a good time to interface Stella with it. We are writing several publications based on pyro, so it would be nice to get stella in before they are submitted.

Swap redistribute.f90 with MPI_alltoall

The Barcelona team mentioned that the redistribution routines used for the mirror term might be better performed using the MPI_alltoall library functions, rather than the routines in redistribute.f90. I don't actually have a feel for how much of an improvement we could get, but I do recall them saying that a good chunk of the redistribute time was actually latency, rather than transmission, so this change could be worth a shot.
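
For concreteness, a minimal sketch of the collective pattern being suggested, written with mpi4py rather than the Fortran MPI bindings stella actually uses; buffer sizes and dtypes are placeholders.

    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    size = comm.Get_size()

    # One contiguous block per destination rank, exchanged in a single collective
    # instead of a loop of pairwise sends/receives.
    sendbuf = np.full(size, comm.Get_rank(), dtype='i')
    recvbuf = np.empty(size, dtype='i')
    comm.Alltoall(sendbuf, recvbuf)
    # recvbuf now holds one element from every rank.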

NetCDF errors occur when fphi=0

If a simulation uses fphi=0, errors occur when writing to the .out.nc file. From the .error file we see the following errors:

ERROR in netcdf_error: NetCDF: Variable not found in varid: 0, ncid: 65536
ERROR: NetCDF: Variable not found

The reason for this is that, within stella_io, the IDs for phi-related diagnostics (e.g. phi2_id , phi_vs_t_id) are only set if fphi != 0. However, diagnose_stella will try to write diagnostics like phi2 even if fphi=0; this causes an error because the ID is unset. (NB setting fphi=0 is a rare occurrence.)

Proposed solution: either (1) Always set the stella_io IDs, even if fphi=0, or (2) only try to write phi-related diagnostics if fphi=1. (1) seems more consistent with the rest of stella (we still allocate and set phi if fphi=0 for example, and apar is allocated, set and written to plaintext diagnostics even if fapar=0.)

drifts_implicit = .true. coupled with rhostar > 0 observed to give numerical instability in a tokamak case

At some point in the last year, the behaviour of the code changed so that rhostar > 0 led to a phase shift automatically being applied within the twist-and-shift BC. This was done to explore effects of radial profile variation and low-magnetic-shear cases where periodic BCs were used. I've observed for some fairly vanilla tokamak cases that using rhostar > 0 with drifts_implicit = .true. led to numerical instability. I'm not sure if something is wrong with the implementation of the phase shift itself, or if the new code with drifts_implicit = .true. somehow does not account for it. For now, I plan to add a warning to the code output whenever these two options are selected together.

Array dimensions in dissipation.f90 too large for older versions of fortran

In dissipation.f90, there are some arrays which have more than 7 dimensions. This can cause compilation errors, since arrays of rank > 7 are only allowed in Fortran 2008 onwards. It is interesting to note that one of the dimensions is "ia", which is currently hardcoded to 1 and therefore redundant (unless it's a placeholder for some future work), e.g. at around line 2100 of dissipation.f90:

        ia = 1

        allocate (deltaj(0:lmax,0:jmax,nspec,nspec,nvpa,nmu,ia,-nzgrid:nzgrid))

Proposed fix: Either remove the "ia" dimension from these large arrays or state the Fortran >= 2008 requirement in the README.

Cmake fails to work on Marconi

After merging the 'external git_version' pull request, cmake fails on Marconi with these messages:

/marconi/home/userexternal/dstonge0/Codes/stella/externals/git_version/src/git_version_impl.F90(14): error #6337: This is not a valid lvalue or rvalue. [GIT_VERSION]
length = min(max_length, len(GIT_VERSION))
---------------------------------^

Apparently the macros aren't being resolved correctly.

Unstable zonal flows for kinetic electron runs when zed_upwind > 0

I've found that large-scale zonal flows are numerically unstable when using kinetic electrons and the implicit linear solve. (There's actually residual growth in the adiabatic case too, but it's sufficiently small that you really need to look for it.) This is what I know so far:

  • zed_upwind has to be finite. If things are centered, there is no instability. The growth rate of the instability also depends on zed_upwind
  • increasing the number of zed-grid points (decreasing delzed) reduces the growth rate
  • There is no instability in the explicit case
  • this doesn't appear to happen in gs2
  • This is a large scale instability. For small electron scale zonal flows, there is no instability. The growth rate converges to some value as kx -> 0, so I suspect some stabilization from the Bessel functions.
  • If I remove the spatial dependency for Maxwell_mu by using bmag at the outboard midplane in vpamu_grids.f90, this problem goes away.

The last point seems to suggest that it's a problem with how we're centering the Maxwellian, but I can't seem to find the source of the bug! Note that there is no problem if gradpar varies in space, which is even more confusing, since gradpar and maxwell_mu are handled the same way.

There could be a typo in response_matrix.fpp where the centering is handled, but I can't seem to find it. Perhaps I need another set of eyes.

Attached is a zip file containing a minimal input file, which contains only parallel streaming, nothing else.

test_case.zip

Project improvements

This is a bit of a brain dump of various features, tasks, enhancements, and so on, that I think would be useful to do on stella to improve the sustainability of the project. There's a fair bit here, and you probably don't want everything, at least to begin with :)

A few of these are things I could just get on with and implement myself (CMake, for example), but things like writing documentation will definitely need some input from the maintainers. It would probably be useful if @DenSto @mabarnes could either veto or prioritise some features.

Documentation

  • Expand README to include more information on what stella does, features, etc.
    • readme.so is a nice resource for this; it has lots of suggested sections
    • This is fairly easy to do, and a nice README is a good advert for the project
  • In-source documentation comments
    • Useful even if not extracted and built into website
    • Big task to do on existing project!
    • Have a policy of new code requires documentation
    • Have a habit of adding documentation when touching existing code
  • Automated documentation building with e.g. FORD
    • Needs to be hosted somewhere, could be done with Github Pages (see website below)
    • Namelist docs generation needs some custom FORD code, but could be done without GS2's config types
  • User manual
    • Probably most of the GS2 First steps page is useful
    • Could steal a fair bit from published papers?
    • Examples?
  • Developer manual
    • How to contribute to project, write tests and documentation
  • Website with e.g. Github Pages
    • Could just be landing page and direct to repo, but can be a good place to advertise papers, demonstrate nice examples, etc.
  • Releases

Build system

  • Add CMake build system: I think this is overall a useful addition, and I'd be happy to be the maintainer for it, but there would almost certainly be some growing pains during the transition.
    • Pros:
      • Can automatically check compiler capabilities, such as support for Fortran 2008
      • Can be easier to find and link to dependencies (or even download and build them)
      • Easier to make portable
      • Can be easier to maintain
      • Automatically handles compilation order dependency chain, replacing fortdep
      • Has utilities to replace the existing Makefiles.GK_SYSTEM
    • Cons:
      • Requires learning another tool
      • Errors are sometimes obtuse and/or buried

There are other build systems that could be used, such as autotools, or meson. They mostly have the same pros and cons as CMake.

Source code

  • Reorganise source code, e.g. move all the source files under src
    • Has the benefit of a tidier top-level directory
    • GS2 recently did this and it wasn't too painful
  • Use the GS2 utils submodule
    • This is mostly the same, with some additions from both projects
    • Benefit of sharing and reuse of code, sharing maintainer resources
    • Would probably need a bit of work to add stella changes
  • Automatic code formatting with e.g. fprettify
    • Consistent formatting tends to make code easier to read
    • Painful to apply to existing projects
    • Might be possible to apply automatically only to changes in PRs?

Code features

  • Add module configuration types from GS2
    • Pros:
      • More consistently handling of input parameters across modules
      • Easier to programmatically setup modules for testing
      • Already have a working solution for generating namelist docs from these types
    • Cons:
      • Adds extra steps when changing namelists -- this could be better automated
      • Will take a bit of work to port to stella

Tests

  • More integrated tests!
    • Needs some example cases e.g. CYCLONE base case, Rosenbluth-Hinton equivalents?
  • Unit tests
    • Geometry might be another good place to start adding tests, but probably needs example equilibrium files?
  • Coverage with e.g. https://coveralls.io
    • Tracks how much of source code is actually covered by tests
  • Expand Github Actions to cover more build options
    • e.g. oldest compiler you want to support, minimal dependencies

Licence

stella needs a licence, I recommend either MIT or BSD-3-Clause. Both essentially say "do whatever you want, but you must credit the authors". GS2 is MIT-licensed, so that's probably the best starting point. The MIT licence does allow changing the licence, so we're not limited to that.

LU decomposition vs. Matrix Inversion

Currently, stella uses an LU decomposition to perform the linear solve of the response matrix for the parallel streaming term. This isn't the only option, however; GS2 uses a full matrix inversion instead. There are pros and cons to both approaches: the LU decomposition itself uses about half the operations of Gauss-Jordan elimination, and so initializing the problem is twice as fast. The LU decomposition is also much more stable for ill-conditioned matrices (though I don't see why parallel streaming would be particularly ill-conditioned, so this is less of a concern).

However, the full matrix inversion comes with one big advantage: the linear solve becomes a simple matrix multiplication, rather than the forward and backward substitutions of the LU decomposition. This can actually be very important, as this part of the code scales as (N_x*N_z)^2, and so for big problems this solve can become dominant if no extra steps are taken.
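
To illustrate the trade-off, a small NumPy/SciPy sketch with a random, well-conditioned stand-in matrix (not the actual response matrix): the LU route does a forward/backward substitution per solve, while the inversion route reduces each solve to a matrix-vector product.

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    rng = np.random.default_rng(0)
    n = 512
    A = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned stand-in
    b = rng.standard_normal(n)

    # LU route: cheaper setup, but each solve is a forward/backward substitution.
    lu, piv = lu_factor(A)
    x_lu = lu_solve((lu, piv), b)

    # Inversion route: costlier setup, but each solve is just a matrix-vector
    # product, which is trivial to parallelise.
    Ainv = np.linalg.inv(A)
    x_inv = Ainv @ b

    assert np.allclose(x_lu, x_inv)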

I've made several attempts to parallelize the forward/backward substitutions of the LU decomposition (which are in the feature/parallel_back_sub branch), but the approaches suffer performance degradation either from synchronization (from MPI_barriers/MPI_win_fences) or lack of locality (cache misses). On the other hand, parallelizing a matrix multiplication is rather trivial, which I've done in the feature/matrix_inverse branch; in this case, the cost of the linear solve ceases to be an issue.

I suppose for many use cases, the LU decomposition is good enough, but as we push towards larger problem sizes we'll need to consider the inversion option. At the very least, we may want to somehow inform users that this option exists.

EDIT: I imagine David Dickinson has some advice on this front from his experience with GS2.

Refactoring of shared memory routines

Right now, the allocation of shared memory arrays occurs in a few locations, such as fields.fpp, sources.fpp and response_matrix.fpp. There's a lot of repeated code patterns in these areas, and so it should probably be brought into a new file in the utils/ directory. This should hopefully clean up those files quite a bit.

`full_xzgrid.f90` could be deleted?

The full_xzgrid module isn't used anywhere and looks not to be fully implemented. Should it be deleted? We can always revive it if it is needed later.

post_processing/stella_data.py: *.out.nc is not a valid NetCDF 3 file

Executing

$ python3 kspectra_plots.py

On my favourite stella *.out.nc file leads to the following error

Traceback (most recent call last):
  File "kspectra_plots.py", line 3, in <module>
    from stella_data import time, ntime, outdir, nakx, naky, kx, ky
TypeError: Error: *.out.nc is not a valid NetCDF 3 file

This can be fixed by replacing the scipy netcdf module with

from netCDF4 import Dataset
ncfile = Dataset(infile, 'r', format='NETCDF4')

and remembering to add '.size' to the end of a dimension enquiry.
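
Putting the pieces together, a short sketch of the replacement; the netCDF variable and dimension names ('t', 'ky') are guesses for illustration, so use whatever stella_data.py actually reads.

    from netCDF4 import Dataset

    infile = "stella.out.nc"                 # path to the output file
    ncfile = Dataset(infile, 'r')            # netCDF-4 files need netCDF4, not scipy.io.netcdf
    time = ncfile.variables['t'][:]          # variable name assumed
    ntime = time.size
    naky = ncfile.dimensions['ky'].size      # note the '.size' on dimension objects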

Current master branch (3cf015b) exhibits incorrect behaviour

The current master branch is exhibiting numerical instability for a nonlinear tokamak simulation that saturated without issue for a previous version of master (from October 2022). I am investigating where the issue was introduced, but in the meantime suggest using an older version of the code or refraining from nonlinear simulations.

Should we keep the flag set_bmag_const?

The flag set_bmag_const forces the quantity bmag to be constant when the geometry is initialised. However, the code does this after all other geometric quantities are calculated, and so bmag(z) isn't consistent with (for example) dbdzed. This is likely to cause confusion, and so I was wondering if we should:

  1. Remove the flag completely (the comments from @alexvboe indicate it was intended as a temporary flag to test a particular problem). bmag can be overwritten using the overwrite_bmag flag, which is done nearer the start of the init_geometry subroutine, and so is more consistent.
  2. Refactor the geometry initialisation so that set_bmag_const occurs nearer the start of the geometry initialisation.
  3. Add more documentation/warnings for the flag.

What are your thoughts on this @mabarnes @DenSto @alexvboe ?

Numerical instability for small ky rho_ref in branch feature/Davies-linear-EM-terms

Issue to track a numerical instability in the electromagnetic stella branch feature/Davies-linear-EM-terms (commit 47f9f53). Possibly related to issue #98 (which mostly concerns the master branch).

The test case I'm examining has the following features:

  • CBC, with zero gradients
  • Zero-incoming parallel BC
  • Kinetic ions + kinetic electrons
  • implicit treatment of streaming + mirror terms
  • ky=0.05 , dt=0.004, upwinding parameters uz=0.02, ut=0.02, uvpa=0.02
  • apar=bpar=0, i.e. an electrostatic problem. The differences between this implementation and the master branch are (1) the formulation of the GKE (uses h rather than/as well as g as the distribution function variable) and (2) a refactoring of the response matrix for streaming to avoid code duplication.

Some features of the instability:
  • Occurs at low ky
  • Occurs for all timestep sizes tried (in the range 1E-4 - 1) for ky=0.05. The instability growth rate increases as the timestep decreases (NB this is the opposite of the trend observed in the master branch, issue #98)
  • It appears this can be stabilised by increasing u_z (to ~0.3) or u_t (to ~0.3), but this may introduce too much dissipation for physical modes.
  • Instability does not occur for the same input file using master branch (commit dcc6aa4). This becomes unstable when dt is increased (0.004 to e.g. dt=0.4), (consistent with #98 )
  • Insensitive to whether the mirror term is implicit/explicit
  • Making the streaming term explicit gives rise to an instability, but this looks a bit different (stabilised at sufficiently low dt). So I think there are 2 long-wavelength instabilities, which manifest depending on how parallel streaming is implemented.

Example input file attached. I'll provide more information/diagnostics shortly. It looks to me like some problem with the streaming algorithm. (Although there isn't an instability for the unsheared slab geometry with periodic BCs).

input.in.txt

Add a strict debug build to the actions?

Is it worth adding a build to the job matrix which enables as strict checking as possible? I've managed to catch a few issues by running the simple integrated test with -fcheck=all active. Enabling such checks and expanding the cases covered by the integrated tests might be useful in catching subtle issues.

Parallel LU decomposition and shared memory issues on Archer2

When HAS_ISO_C_BINDING = yes is used alongside the parallel LU decompositions, stella will crash on Archer2 when the Cray compilers are used. This problem does not happen when the gnu compilers are used instead, or on other machines using Intel compilers. It remains to be seen whether this is a problem with the Cray compilers, or whether there is a deeper problem in the shared memory methods.

dissipation.f90 is currently too unwieldy

As it stands, dissipation.f90 is clocking in at 6182 lines of code, which means it's probably too big. I've already removed the sources/sinks from the file, and perhaps I can take out hyperdissipation too. This still won't do the job, and I imagine the separate collision operators should be split out as well.

Apart from that, the file throws lots of warnings, and in many places the line lengths are too long/indentation is too wide. It needs a clean-up.

Non blocking scatter and gather

Right now the redistribution function uses blocking send and receive commands. This makes the communication very inefficient and should be fixed by using the non-blocking commands in redistribute.f90.
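
A minimal mpi4py sketch of the non-blocking pattern; the neighbour ranks and buffer sizes are placeholders, and the point is simply posting Isend/Irecv, overlapping other work, then waiting.

    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    sendbuf = np.arange(100, dtype='d')
    recvbuf = np.empty(100, dtype='d')

    # Post the communication without blocking ...
    rreq = comm.Irecv(recvbuf, source=(rank - 1) % size)
    sreq = comm.Isend(sendbuf, dest=(rank + 1) % size)
    # ... do work that doesn't need recvbuf here ...
    MPI.Request.Waitall([rreq, sreq])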

Some invalid floating operations caught by compiler flags

I've been trying to have a quick check for invalid operations for issues when stricter debugging flags are enabled, including -fcheck=all and -ffpe-trap=invalid,overflow,zero. This has highlighted a couple of potential (minor) issues.

Firstly, tcorr_source_qn defaults to 0.0 and is later used to set exp_fac_qn (here), which involves dividing by this value. As such, if it isn't set in the input then this gives a divide-by-zero error. One could guard the division to avoid this, but there's also

if (tcorr_source_qn < 0) tcorr_source_qn = tcorr_source
which could change < to <= if that makes more sense?

Secondly, the line at

stella/fields.fpp

Line 1262 in e37153b

phi = phi / spread(gamtot, 4, ntubes)
divides by gamtot, which can be zero. There's a check after this line to set phi to zero where certain elements are zero; however, since there is strictly a divide by zero, the compiler catches this and aborts before we get to the fix. Could this be written as

where (spread(gamtot, 4, ntubes) < epsilon(0.0))
   phi = 0.0
elsewhere
   phi = phi / spread(gamtot, 4, ntubes)
end where

to avoid ever dividing by zero (and also generalising to allow gamtot to be zero for ik /= 1 and it /= 1)?

time_advance: Resetting the timestep size code_dt after implicitly advancing doesn't re-do the implicit advance on that timestep

advance_stella in the master branch has the following:

if (mod(istep,2)==1 .or. .not.flip_flop) then
   ! advance the explicit parts of the GKE
   call advance_explicit (gnew)

   ! enforce periodicity for zonal mode
   ! if (zonal_mode(1)) gnew(1,:,-nzgrid,:) = gnew(1,:,nzgrid,:)

   ! use operator splitting to separately evolve
   ! all terms treated implicitly
   if (.not.fully_explicit) call advance_implicit (istep, phi, apar, gnew)
else
   if (.not.fully_explicit) call advance_implicit (istep, phi, apar, gnew)
   call advance_explicit (gnew)
end if

Adjusting timestep size code_dt for numerical stability is handled in advance_explicit; however, if advance_explicit is called after advance_implicit (which always occurs if flip_flop = .false. , and every other timestep if flip_flop = .true.), the implicit advance is not re-performed with the smaller code_dt. This means that the timestep on which code_dt resets is inconsistent (although in subsequent timesteps advance_implicit uses the reduced code_dt).

Proposed fix: Move the reset_dt logic into the advance_stella subroutine; try taking a full timestep, and then check the flag restart_time_step. If restart_time_step == .true., throw away the changes and repeat the whole timestep with the reduced code_dt. I've implemented something like this in the branch feature/nisl_leapfrog_nonlinearity, so I will do the same in master.
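
Schematically, the proposed flow is something like the following Python-style sketch; take_full_timestep, reduce_dt and the state copy are hypothetical stand-ins for the actual Fortran routines.

    def advance_with_retry(state, code_dt):
        saved = state.copy()                         # snapshot fields and g before the step
        state, restart_time_step = take_full_timestep(state, code_dt)  # explicit + implicit
        if restart_time_step:
            code_dt = reduce_dt(code_dt)             # timestep reduced for numerical stability
            state, _ = take_full_timestep(saved, code_dt)  # redo the *whole* step with the new dt
        return state, code_dt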

Restarted simulations fail on initialisation when new input file contains more lines than original input file (netCDF error)

To reproduce:

  • Run a simulation with save_for_restart = .true. (example attached, but delete the ".txt" suffix)
  • Modify the input file to set ginit_option = "many"
  • Add an additional line to the input file e.g. add delt_option = "check restart" in the "knobs" namelist, or add a line of comment anywhere in the input file
  • Run the simulation.

When I do this, I get the error message:
ERROR: NetCDF: Start+count exceeds dimension bound in variable "input_file" in varid: 54 variable name: input_file writing variable
ERROR STOP Aborted by neasyf_error

My guess is that:

  • Running a simulation from restart, stella uses the same .out.nc file as the initial simulation, so it can just append new data onto the end
  • The old .out.nc file is opened, and save_input is called to save the contents of the new input file in the old .out.nc file
  • Somehow, stella (or neasyf) determines the dimensions of this data based on the old input file, rather than the new.

@ZedThree - possibly a neasyf issue, or an issue which manifests badly when using neasyf?

example.in.txt

use of box with periodic BCs and non-zero shat

If shat is greater than shat_zero but periodic BCs are chosen in the input file (and no separate twist_and_shift option is chosen), it seems to me that the standard twist-and-shift constraint will be used to set dkx. While it is an unlikely combination of parameters, I would think the desired behaviour is that any time a periodic BC is chosen in the input file, dkx should be set independently of the constraint that jtwist be an integer.

Any interest in omega in netcdf output?

I've been doing some linear comparisons with GS2 for testing the electromagnetic terms and to aid this I added the omega data to the netcdf output. I'm happy to make a PR for this if it is something useful for general use, but thought I'd check first in case it doesn't fit with common use.

mat_gen default to .false.?

Hanne, since the parallelization of the response matrices has been implemented, have you needed to save and read in response matrices? If not, I'll go and set the default for writing out the response matrices to false.

New adiabatic electron option

Now that stella has a self-periodic option and can seemingly handle unsheared slabs, it may be worthwhile to add a modified adiabatic electron option that subtracts the field-line average, rather than the flux-surface average, from the Boltzmann response. This is similar to what we're doing now, but performed for all ky, not just ky = 0. What needs to be done is to change arrays such as gamtot3 to include the ky dimension, and then to loop over ky for the field-line-average procedure.
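
For reference, a sketch of the intended response, assuming the standard adiabatic-electron form of the Boltzmann response (the notation here is illustrative rather than taken from the code):
$$\frac{\delta n_e}{n_e} = \frac{e\left(\phi - \langle \phi \rangle_\parallel\right)}{T_e},
\qquad
\langle \phi \rangle_\parallel = \frac{\oint \mathrm{d}l\, \phi / B}{\oint \mathrm{d}l / B},$$
with the field-line average $\langle \cdot \rangle_\parallel$ applied to every ky, rather than the flux-surface average applied only to ky = 0.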

Making the parallel streaming in radially global stella implicit.

Integrating the exact quasineutrality solve with the response matrix approach is a work in progress that will soon be complete.

This, however, doesn't include any radial variation from terms in the GKE proper. The naive way of including these (i.e. a big matrix) wouldn't work, since we really do rely on the speed of the tridiagonal solver for this term. What one could do is work in mixed x-ky space instead of pure Fourier space. This won't work for any ky with connections, since it messes up the parallel boundary condition, but for periodic modes (e.g. zonal modes) this would be totally doable. The only catch is the gyroaveraging, which destroys the one-pointedness of the real-space representation. That can be worked around by using the un-gyroaveraged phi in the implicit routine (which then renders it one-point), and then adding the phi - < phi >_R term explicitly. This should remove the 1/k_perp^2 CFL constraint mentioned in the original stella paper, at least for periodic modes.

Zonal mode and the flux-tube train

Currently when the flux-tube train is used in Stella, the k_y = 0 modes of the train are uncoupled, and so these modes are treated as periodic within a single car. I'm not quite sure this is what we want to do, since the zonal modes may grow to be different between the cars. I think it would be best to match the nperiod > 1 case without the train. This is easy enough to change explicitly (fill_zed_ghostzones), but the response matrix will have to be modified somewhat for periodic modes.

Numerical instability for very small ky rhoref in electrostatic stella

In trying to understand a numerical instability in a high-fidelity linear simulation, I have discovered a numerical instability that affects electrostatic stella. The attached .zip file contains a cyclone base case example which goes numerically unstable after a long period of almost constant amplitude in phi. B is forced to be constant with set_bmag_const = .true., with the intent of removing complications from the mirror term. In the example fprim = 0 and tprim = 0, guaranteeing that this instability is numerical rather than physical. The eigenfunction has a high, grid-scale k||. Introducing a larger zed_upwind does delay the instability, but even with zed_upwind = 0.1 the growth of the mode can still be seen.

Sharing this here for future reference. May be of interest to @DenSto @mabarnes @d7919 @rd1042 due to similarities to the famous GS2 numerical instability at low ky and dt (https://bitbucket.org/gyrokinetics/gs2/issues/108/linearly-unstable-zonal-modes), although there the issue seems related also to the electromagnetic terms.

cbc.zip

P.S. for avoidance of doubt, the version of stella used was commit 126b599.

Asynchronous mirror term

Currently, stella uses a sequential approach to operator splitting, where an operator partially advances $g_s$ and passes off to the next operator in the sequence. This isn't the only way to do operator splitting, though: it can also be done in a parallel fashion. Consider a set of operators $\Gamma_i$ and the distribution function at some timestep $g_s^n$. In the parallelized approach, every operator acts on the distribution function from the previous timestep,
$$\tilde{g}_s^{n,i} = \Gamma_i g_s^n.$$
We then calculate the increments,
$$\Delta g_s^{n,i} = \tilde{g}_s^{n,i} - g_s^n,$$
so the distribution function at the next timestep is $g_s^{n+1} = g_s^{n} + \sum_i \Delta g_s^{n,i}$. This approach is still first-order accurate, so it should perform similarly to what we're already doing.
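
In code, the parallel splitting amounts to something like this sketch, where the operators are hypothetical callables acting on a NumPy array standing in for $g_s$:

    import numpy as np

    def parallel_split_step(g, operators):
        # Every operator Gamma_i acts on the same g^n ...
        increments = [Gamma(g) - g for Gamma in operators]   # Delta g^{n,i}
        # ... and the increments are summed to give g^{n+1}.
        return g + np.sum(increments, axis=0)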

The advantage of this approach is that, since the operators don't depend on each other, things can happen asynchronously. The idea for a single stella timestep would be as follows:

  1. Asynchronously perform the all-to-all redistribution of $g_s^n$ for the mirror term
  2. Calculate the explicit terms
  3. Collect $g_s^n$ in local velocity space, which was sent in step 1, and calculate the mirror term.
  4. Asynchronously send information (all-to-all redistribution) from step 3.
  5. Calculate parallel streaming
  6. Collect information transmitted in step 4.
  7. Calculate increments and thus $g_s^{n+1}$.

If things are done correctly, then one shouldn't have to wait around for the all-to-all to complete, which is currently one of the bottlenecks of the code. There are two disadvantages, though: one now has to store the increments, which modestly increases memory requirements, and one cannot use the flip-flopping for second-order accuracy (though this isn't default behaviour anyway).

I think this approach, along with the shared-memory improvements mentioned in #23, would allow stella to scale to hundreds of thousands of cores. At the very least, it would be interesting to see whether this could mitigate the all-to-all bottleneck.

Stella MPI + MPI shared memory approach

Currently stella's scalability is limited by its parallelisation over velocity space alone, relying on a redistribution between a space-local grid and a velocity-local grid. I think the fastest and most straightforward way to achieve more scalability with what's already in stella is to use a hybrid shared memory approach. While this is typically done with MPI + OpenMP, stella already takes advantage of MPI's shared memory framework using windows and mixed communicators, so I think an MPI + MPI approach would be much faster to get to production.

The idea is this: instead of distributing the velocity grid over the number of available cores, distribute it over the number of available nodes. Then, on a node, most of the operations can be parallelized over naky (and, when the time comes, nakx for the nonlinearity). This would require a number of small modifications in a few subroutines, rather than needing to rewrite array sizes, create new layouts and redistributors, etc.
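
For reference, the node-level pattern with MPI shared memory windows looks roughly like the following mpi4py sketch; the array size and dtype are placeholders.

    from mpi4py import MPI
    import numpy as np

    world = MPI.COMM_WORLD
    # Split the world communicator into node-local communicators.
    node = world.Split_type(MPI.COMM_TYPE_SHARED)

    n = 1000
    itemsize = MPI.DOUBLE.Get_size()
    nbytes = n * itemsize if node.Get_rank() == 0 else 0
    win = MPI.Win.Allocate_shared(nbytes, itemsize, comm=node)

    # Every rank on the node gets a view of rank 0's allocation.
    buf, _ = win.Shared_query(0)
    shared = np.ndarray(buffer=buf, dtype='d', shape=(n,))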

The question that remains is whether this can also be exploited for the velocity-local operations, such as the mirror term and collisions. In principle the former should be doable, since mu acts like ky there. For collisions, this is more unclear, unless the vpa and mu operators are always decoupled (like for the Dougherty operator).

Documentation git action fails

Recently, the documentation git action has been failing with this error:

Traceback (most recent call last):
  File "/home/runner/.local/bin/ford", line 5, in <module>
    from ford import run
  File "/home/runner/.local/lib/python3.8/site-packages/ford/__init__.py", line 41, in <module>
    import ford.output
  File "/home/runner/.local/lib/python3.8/site-packages/ford/output.py", line 44, in <module>
    loader=jinja2.FileSystemLoader(loc / "templates"),
  File "/usr/lib/python3/dist-packages/jinja2/loaders.py", line 163, in __init__
    self.searchpath = list(searchpath)
TypeError: 'PosixPath' object is not iterable
Error: Process completed with exit code 1.

Peter, do you know what this might mean?

Missing broadcast of species_type_switch?

The species type input is not broadcast and may be overridden on just proc0. Perhaps this is broadcast elsewhere - might be worth a comment to note this if this is the case.

stella/species.f90

Lines 159 to 191 in 14a4c90

   if (proc0) then
      nspec = 2
      read_profile_variation = .false.
      write_profile_variation = .false.
      species_option = 'stella'
      ecoll_zeff = .false.
      in_file = input_unit_exist("species_knobs", exist)
      if (exist) read (unit=in_file, nml=species_knobs)
      ierr = error_unit()
      call get_option_value(species_option, specopts, species_option_switch, &
                            ierr, "species_option in species_knobs")
      if (runtype_option_switch == runtype_multibox .and. (job /= 1) .and. radial_variation) then
         !will need to readjust the species parameters in the left/right boxes
         species_option_switch = species_option_multibox
      end if
      if (nspec < 1) then
         ierr = error_unit()
         write (unit=ierr, &
                fmt="('Invalid nspec in species_knobs: ', i5)") nspec
         stop
      end if
   end if
   call broadcast(nspec)
   call broadcast(read_profile_variation)
   call broadcast(write_profile_variation)
   call broadcast(ecoll_zeff)
end subroutine read_species_knobs

Appetite for using GS2 utils?

Is there any appetite for using the utils library from GS2? The upcoming 8.1 release already uses CMake and has a bunch of tests.

Obviously things have diverged a little bit, but not a massive amount, and the two projects could benefit from the combined additions. For example, GS2 utils now has a function get_option_with_default that makes dealing with optionals in routines much nicer.

cmake build for master branch fails on Marconi

When trying to use cmake with the latest version of the master branch, I get the following error message:
CMake Error at externals/neasyf/CMakeLists.txt:1 (cmake_minimum_required):
CMake 3.20 or higher is required. You are running version 3.18.2

As far as I can see, the most recent version of CMake available on Marconi is 3.18.2. Is there a workaround for this?

numerically unstable zonal flows when maxwellian

This is a similar issue to #59, but seems to be much stronger and present even when zed_upwind = 0.

When doing the standard RH test case with adiabatic electrons and maxwellian_inside_zed = .true., things act as they should. However, if one instead uses kinetic electrons (nspec = 2), the ky = 0 mode becomes wildly unstable. I'm not quite sure why this is, and I think it would be good to have this case working, since I think how one should center gradpar and maxwell_mu is a little clearer in this case.

long-wavelength numerical instability in unsheared slab with explicit streaming

I've tried running a low-res simulation in an unsheared slab in stella (periodic BCs), in the absence of any gradients. I'm finding that when I treat parallel streaming implicitly, I get something stable (|phi^2| fluctuates but slowly falls over time); however, with explicit streaming the solution blows up, and this blowup happens more quickly at lower ky.

I think this is related to the electrostatic shear Alfvén wave problem (described in the appendix of the 2019 stella paper, https://doi.org/10.1016/j.jcp.2019.01.025); one expects a numerical instability if the d(phi)/dz term is treated in an upwinded way, and from a look at the code (parallel_streaming.f90), advance_parallel_streaming_explicit calculates d(phi)/dz by calling the subroutine third_order_upwind_zed.

It seems advance_parallel_streaming_explicit used to calculate d(phi)/dz in a centered way, but this has been replaced with an upwinded method. I was wondering if anyone knew the reason for this?

Example input files to reproduce instability attached.

input_slab_explicit.zip

Building and deploying documentation failing

I tried to update the documentation on the .io page, but the gh-pages branch gave this build error:

Error: fatal: No url found for submodule path 'externals/git_version' in .gitmodules Error: The process '/usr/bin/git' failed with exit code 128

Not quite sure what to make of it.

Avail_cpu_time flag not supported

The available_CPU_time flag should cause stella to exit 5 minutes before the available CPU time is reached. This functionality is not currently supported in the main stella.f90 program. Restoring it is potentially important for users carrying out large simulations that are expected to require multiple restarts to reach saturation.

I have implemented the fix here

https://github.com/stellaGK/stella/tree/bugfix/avail_cpu_time

Thanks to @DenSto for suggestions.

Implicit treatment of mirror term in radially global operation

The mirror term can be treated fully implicitly, even with radial profile variation. The idea is to inverse Fourier transform to x space and then multiply by the appropriate factor to give the correct b . grad B. This is already done for the collision operator, and shouldn't be too hard to do for the mirror term.
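
The pattern being described is just a transform, multiply, transform-back step; a NumPy sketch, where the field shape and the x-dependent factor are placeholders:

    import numpy as np

    nakx = 32
    g_kx = np.random.default_rng(0).standard_normal(nakx) + 0j   # field in kx space
    factor_x = 1.0 + 0.1 * np.linspace(-1, 1, nakx)              # radially varying b . grad B factor

    g_x = np.fft.ifft(g_kx)        # inverse transform to x space
    g_x *= factor_x                # multiply by the x-dependent factor
    g_kx_new = np.fft.fft(g_x)     # transform back to kx space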
