earth-system-radiation / rte-rrtmgp
RTE+RRTMGP is a set of codes for computing radiative fluxes in planetary atmospheres.
License: BSD 3-Clause "New" or "Revised" License
An issue received via email:
From: Thomas Jahns <[email protected]>
Sent: Monday, March 7, 2022 9:57 AM
To: [email protected]
Cc: Claudia Frauen <[email protected]>; Jörg Behrens <[email protected]>
Subject: Memory layout in rrtmgp/kernels/mo_gas_optics_kernels.F90
Hi,
I was roped in by some colleagues to look at some issues in the mo_gas_optics_kernels module and noticed two other, independent problems for compiler optimization of the module:
1. All routines are made public and bind(c), which prevents inlining for compilers adhering to strict semantics for symbols that can be interposed. Would it be possible to change the declaration of interpolate3D_byflav to private and remove the bind(c) to permit Fortran assumed-shape arguments?
2. The above would allow changing the fields from, e.g.,
integer, dimension(2, ncol,nlay,nflav), intent(out) :: jeta
real(wp), dimension(2, ncol,nlay,nflav), intent(out) :: col_mix
real(wp), dimension(2,2,2,ncol,nlay,nflav), intent(out) :: fmajor
real(wp), dimension(2,2, ncol,nlay,nflav), intent(out) :: fminor
to
integer, dimension(ncol,2, nlay,nflav), intent(out) :: jeta
real(wp), dimension(ncol,2, nlay,nflav), intent(out) :: col_mix
real(wp), dimension(ncol,2,2,2,nlay,nflav), intent(out) :: fmajor
real(wp), dimension(ncol,2,2, nlay,nflav), intent(out) :: fminor
and enable the compiler to process the data with multiple packed SIMD streams instead of emitting vector gather/scatter instructions for data that is essentially in AoS (array-of-structures) layout.
I think I could submit a corresponding code change if the idea seems possible in general.
Kind regards, Thomas
Hi,
I get errors that I am unable to fix when compiling the latest release (v1.3). I am using CMake for the build process.
OS: Ubuntu 20.04
CMakeLists.txt
error output: output.txt
Any idea what that is?
At present, using GPU code and running on GPUs produces correct answers for the RFMIP SW standalone test case on the OLCF host Summit but incorrect answers on other hosts. We are seeking a solution.
There are data files which do contain the generation date in their name. The older highres once have an ISO conforming datestring include, while the newer once with lower resolution refer to the time Augustus was leading the Roman Empire. Could that be changed to a proper date string?
Hi,
this is probably a very silly question:
What is the best way of calling RTE+RRTMGP from a GCM?
Say I have dry air density and temperature in all the grid boxes, and I want to get back the spectrally integrated flux density convergences in W/m^3.
Do I have to write out the atmospheric state as a netcdf file, then call RTE+RRTMGP and then get back a netcdf file with the radiation fluxes?
Should I compile the source code of RTE+RRTMGP into the GCM executable and then write an interface? Or should I compile RTE+RRTMGP separately and then use the files librrtmgp.a and librte.a?
I could not find information on this.
Thanks
Max
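For illustration only: there is no need for a netCDF round trip; the library is designed to be called directly through its Fortran classes, either by compiling the sources into the model or by linking librte.a and librrtmgp.a. A minimal longwave call sequence along the lines of the examples/ programs might look like the sketch below (stop_on_err and load_and_init are helpers from examples/, the k-distribution filename is illustrative, and the exact argument lists may differ between versions, so treat this as a guide rather than a verified interface):

use mo_rte_kind,           only: wp
use mo_gas_optics_rrtmgp,  only: ty_gas_optics_rrtmgp
use mo_gas_concentrations, only: ty_gas_concs
use mo_optical_props,      only: ty_optical_props_1scl
use mo_source_functions,   only: ty_source_func_lw
use mo_fluxes,             only: ty_fluxes_broadband
use mo_rte_lw,             only: rte_lw
use mo_load_coefficients,  only: load_and_init   ! helper from examples/

type(ty_gas_optics_rrtmgp)  :: k_dist
type(ty_gas_concs)          :: gas_concs
type(ty_optical_props_1scl) :: optics
type(ty_source_func_lw)     :: sources
type(ty_fluxes_broadband)   :: fluxes
real(wp), allocatable, target :: flux_up(:,:), flux_dn(:,:)
! p_lay(ncol,nlay), p_lev(ncol,nlay+1), t_lay, t_sfc, sfc_emis, vmrs, ...
! come straight from the host model's state; no files are involved.

call stop_on_err(gas_concs%init(gas_names))
call stop_on_err(gas_concs%set_vmr('h2o', h2o_vmr))     ! once per gas
call load_and_init(k_dist, 'rrtmgp-gas-lw-g256.nc', gas_concs)

call stop_on_err(optics%alloc_1scl(ncol, nlay, k_dist))
call stop_on_err(sources%alloc    (ncol, nlay, k_dist))
allocate(flux_up(ncol,nlay+1), flux_dn(ncol,nlay+1))
fluxes%flux_up => flux_up
fluxes%flux_dn => flux_dn

call stop_on_err(k_dist%gas_optics(p_lay, p_lev, t_lay, t_sfc, &
                                   gas_concs, optics, sources))
call stop_on_err(rte_lw(optics, top_at_1, sources, sfc_emis, fluxes))
! Flux convergence in W/m^3: difference of the net flux across each
! layer divided by the layer's geometric thickness.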
Paging @SamuelTrahanNOAA @dustinswales: My understanding is that the implementation of RTE+RRTMGP has been consolidated in the UFS and that we are free to delete the dtc/ccpp branch. Any comments?
In examples/mo_load_coefficients, the coefficients files are opened with mode NF90_WRITE. Since these files are not intended to be written to, they should probably be opened with mode NF90_NOWRITE.
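For reference, a read-only open with the Fortran netCDF API looks like this (a minimal sketch; the file name is illustrative):

use netcdf
integer :: ncid, status
! NF90_NOWRITE requests read-only access; NF90_WRITE allows modification.
status = nf90_open('coefficients_lw.nc', NF90_NOWRITE, ncid)
if (status /= NF90_NOERR) stop 'nf90_open failed'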
When I run the master branch:
[imn@login5:/gpfs/alpine/scratch/imn/stf006/rte-rrtmgp] 8-) git branch
* master
[imn@login5:/gpfs/alpine/scratch/imn/stf006/rte-rrtmgp] 8-) git diff
[imn@login5:/gpfs/alpine/scratch/imn/stf006/rte-rrtmgp] 8-)
with PGI on Summit using -O1, I get the following diffs for a CPU run:
Variable rld differs (max abs difference: 3.814697e-06; max frac. difference: 1.178709e-05%)
Variable rsu differs (max abs difference: 4.685364e-01; max frac. difference: 1.464250e-01%)
Variable rsd differs (max abs difference: 1.940063e+00; max frac. difference: 1.464253e-01%)
I'm thinking maybe the baseline files are stale compared to current master.
Problem:
Intel compiler issues when building ICON with RTE-RRTMGP on "Levante", the new HPC system at DKRZ, which consists of AMD EPYC Milan processors. The build using the Intel compiler succeeds only if almost all optimizations are turned off. In particular, the compiler options -m64 -march=core-avx2 do not work at all.
The error occurs in the subroutine interpolation in file rte-rrtmgp/rrtmgp/kernels/mo_gas_optics_kernels.F90, in the following nested loop:
eta = merge(col_gas(icol,ilay,igases(1)) / col_mix(itemp,icol,ilay,iflav), 0.5_wp, &
col_mix(itemp,icol,ilay,iflav) > 2._wp * tiny(col_mix))
loceta = eta * float(neta-1)
jeta(itemp,icol,ilay,iflav) = min(int(loceta)+1, neta-1)
The result of the calculation for jeta is sometimes jeta=-2147483647, which leads to a segmentation fault.
Workarounds tried so far: disable optimization of mo_gas_optics_kernels.F90 by placing !DIR$ NOOPTIMIZE at the beginning of the module. It works, but the performance degradation is unacceptable.
Next steps:
Hi,
we found a bug in line 137/138 of rrtmgp/kernels/mo_gas_optics_kernels.F90. The code is:
col_mix(itemp,icol,ilay,iflav) = col_gas(icol,ilay,igases(1)) + ratio_eta_half * col_gas(icol,ilay,igases(2))
eta = merge(col_gas(icol,ilay,igases(1)) / col_mix(itemp,icol,ilay,iflav), 0.5_wp, &
col_mix(itemp,icol,ilay,iflav) > 2._wp * tiny(col_mix))
According to the Fortran standard, merge is an ordinary intrinsic function rather than a conditional construct like if: a compiler is allowed to evaluate both candidate results in parallel with the evaluation of the logical expression. This leads to a potential divide-by-zero in the second line given above. To fix this, the two lines need to be reformulated so that this cannot happen.
We could do this, but first want to know whether you have a better solution in mind for how to fix this.
Cheerio,
Luis
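One possible reformulation, sketched with the variable names from the excerpt above: replace the merge with an explicit branch, so the division can only be evaluated when the denominator is known to be safe.

! The division now sits inside the guarded branch, so a compiler cannot
! legally evaluate it for col_mix values near zero.
if (col_mix(itemp,icol,ilay,iflav) > 2._wp * tiny(col_mix)) then
  eta = col_gas(icol,ilay,igases(1)) / col_mix(itemp,icol,ilay,iflav)
else
  eta = 0.5_wp
end if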
I was unable to run the RTE-RRTMGP GPU code on an NVIDIA A100 GPU with either the nvhpc/24.1 or the nvhpc/24.3 compiler.
The error message looks like:
./check_equivalence test_atmospheres.nc /glade/derecho/scratch/sunjian/rte-rrtmgp/rrtmgp-data/rrtmgp-gas-sw-g224.nc
gas optics is for the shortwave
pressure limits (Pa): 1.005183574463000 109663.3158428461
temperature limits (K): 160.0000000000000 355.0000000000000
Intialized atmosphere twice
Default calculation
Vertical orientation invariance
Changing TSI fails
TSI invariance
halving/doubling fails
Incrementing with 1scl fails
Incrementing with 2str fails
Incrementing with nstr fails
Incrementing
Warning: ieee_invalid is signaling
Warning: ieee_inexact is signaling
ERROR STOP 1
I am using the same compiler flags suggested by https://github.com/earth-system-radiation/rte-rrtmgp/blob/main/.github/workflows/containerized-ci.yml#L40-L42.
The combine_and_reorder_nstr routine appears to incorrectly index the input p array, which is assumed to have dimensions p(ncol,nlay,ngpt,nmom) but is accessed within combine_and_reorder_nstr as p(imom,icol,ilay,igpt).
It also appears that p(1:2,...) are written to, then p(1:nmom,...) are overwritten in a loop. In the event that nmom < 2, explicitly writing to p(2,...) will cause a segfault. This also causes some complications/dangers with the OpenACC port, I think.
Based on RRTMGP.jl's coverage, the lowest coverage comes from
It might be nice to add examples that touch more of these kernels.
I ran into this problem while trying to develop an RCE compset for E3SM. The valid temperature range for RRTMGP goes down to 180 K, but for whatever reason, when running in aqua-planet mode the upper levels routinely see temperatures down to 150 K.
For the time being I've enabled temperature clipping, but I'm not sure what the effect will be. It seems like a good idea to expand the look-up tables to support these conditions.
It looks like there is an indexing issue on this line, and maybe these lines too:
rte-rrtmgp/rte/kernels-openacc/mo_rte_solver_kernels.F90, lines 1018 to 1022 in 3b5d157
Shouldn't ngpt be changed to igpt?
Hello,
When I tested the current code on an AMD machine I found that sw_dif_and_source is not vectorizing with either the ifort 2021.6.0 or GNU Fortran 11.3 compilers.
Intel's optimization report says the following:
LOOP BEGIN at ../rte/kernels/mo_rte_solver_kernels.F90(1389,7)
remark #15344: loop was not vectorized: vector dependence prevents vectorization. First dependence is shown below. Use level 5 report for details
remark #15346: vector dependence: assumed ANTI dependence between DIR_FLUX_INC(i) (1465:11) and DIR_FLUX_TRANS(i) (1471:11)
LOOP END
(Ignore the line numbers, I've added things - but this is for the version of sw_dif_and_source in the main branch.)
Timings confirm that sw_dif_and_source is very slow because of this:
Called Recurse Wall max min %_of_clea
clear_sky_total (SW) 1 - 1.946 1.946 1.946 100.00
gas_optics (SW) 100 - 0.773 8.83e-03 7.65e-03 39.70
rte_sw 100 - 1.173 0.017 0.012 60.26
sw_2stream 100 - 1.170 0.017 0.012 60.12
sw_dif_and_source 11200 - 1.109 2.79e-03 9.20e-05 56.97
sw_adding 11200 - 0.043 8.00e-06 3.00e-06 2.19
This can be fixed by adding !$OMP SIMD before the ncol loop:
LOOP BEGIN at ../rte/kernels/mo_rte_solver_kernels.F90(1389,7)
remark #15301: OpenMP SIMD LOOP WAS VECTORIZED
remark #15442: entire loop may be executed in remainder
remark #15448: unmasked aligned unit stride loads: 1
remark #15449: unmasked aligned unit stride stores: 1
remark #15450: unmasked unaligned unit stride loads: 6
remark #15451: unmasked unaligned unit stride stores: 4
remark #15456: masked unaligned unit stride loads: 3
remark #15457: masked unaligned unit stride stores: 3
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 470
remark #15477: vector cost: 249.250
remark #15478: estimated potential speedup: 1.860
remark #15482: vectorized math library calls: 2
remark #15486: divides: 3
remark #15488: --- end vector cost summary ---
LOOP END
Called Recurse Wall max min %_of_clea
clear_sky_total (SW) 1 - 1.362 1.362 1.362 100.00
gas_optics (SW) 100 - 0.781 0.013 7.64e-03 57.34
rte_sw 100 - 0.580 6.01e-03 5.76e-03 42.60
sw_2stream 100 - 0.577 5.98e-03 5.71e-03 42.38
sw_dif_and_source 11200 - 0.470 6.30e-05 4.00e-05 34.53
sw_adding 11200 - 0.043 1.00e-05 3.00e-06 3.19
This is with Intel. With gfortran there is another serious issue: the mu0 conditional prevents vectorization. Removing it halved the total runtime of the RFMIP SW program (2 s -> 1 s, now faster than ifort!). Perhaps the code could be changed so that mo_rte_sw checks that mu0 is positive in all the input columns and otherwise calls a slower version with the mu0 conditional inside the kernels? Or the functionality could even be written into the same (but admittedly more bloated) SW kernel with an argument check_for_positive_mu0 (some host models may already have removed nighttime columns before calling RTE-SW, so they could set it to false).
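For concreteness, the directive goes immediately before the column loop; a schematic sketch of the pattern (the loop body is illustrative, named after the variables in the optimization report, not the actual kernel code):

!$omp simd
do icol = 1, ncol
  ! The directive asserts the iterations are independent, so the compiler
  ! can vectorize despite the anti-dependence it assumed in the report.
  dir_flux_trans(icol) = dir_flux_inc(icol) * t_noscat(icol)
  dir_flux_inc(icol)   = dir_flux_inc(icol) * t_dir(icol)
end do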
Hi,
!scaling = scaling * col_gas(icol,ilay,idx_minor_scaling(imnr)) * vmr_fact * dry_fact
! When using single precision, the scaling variable previously became infinity at this
! stage for some minor gases, due to (very big number) * (very big number).
! Fixed by computing the last terms first (vmr_fact is very small).
scaling = scaling * (col_gas(icol,ilay,idx_minor_scaling(imnr)) * vmr_fact * dry_fact)
find_gas = findloc(this%gas_name, gas, dim=1), and GAS_NOT_IN_LIST needs to be changed to 0.
As per the suggestion by @vectorflux in #188: export RTE_KERNELS="openacc" for the OpenMP-acc mode is unaesthetic; "accelerator" would be better.
For timing purposes it would be great to have a flexible benchmarking program. Timing estimates could build on @alexeedm's timing with system_clock in the all-sky example.
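A timing skeleton along those lines (a minimal sketch; the kernel call in the middle is a placeholder):

use iso_fortran_env, only: int64
integer(int64) :: t_start, t_finish, rate
call system_clock(t_start, rate)
! ... repeated calls to gas_optics / rte_lw / rte_sw here ...
call system_clock(t_finish)
print '(a,f10.4)', 'elapsed seconds: ', real(t_finish - t_start) / real(rate)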
We could use fixed concentrations for some greenhouse gases and analytic formulae for temperature, pressure, ozone, and water vapor, for example following the RCEMIP protocol. Then we might want to vary:
I'm in a situation, trying to create an RCE compset for E3SM, where I need to calculate the tsi_scaling variable to be sent in to rte_sw. This simply requires summing k_dist_sw%solar_src(:), but this is private data that needs a "get" function to access. The alternative would require modifying the external files, which I'd like to avoid. The simple fix would be to add a "get_solar_src" method to ty_gas_optics_rrtmgp.
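A sketch of such an accessor (the component name solar_src comes from the issue; the exact declaration inside ty_gas_optics_rrtmgp is an assumption):

! Type-bound function returning the summed solar source, so hosts can
! compute tsi_scaling without reading the coefficient files themselves.
function get_solar_src_total(this) result(tsi)
  class(ty_gas_optics_rrtmgp), intent(in) :: this
  real(wp)                                :: tsi
  tsi = sum(this%solar_src)
end function get_solar_src_total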
In branch feature-reduce-in-place, which will be merged into develop once OpenACC/MP issues are resolved, the kernels compute the sum over all g-points "in place" if the fluxes argument to the high-level Fortran driver is of type ty_fluxes_broadband. This is implemented in the kernels using what are essentially C-compatible optional arguments: logical variables and kernel arguments which may not be referenced within the kernel depending on the value of the logical. So, for example, in the call to the LW kernel the arrays broadband_up and _dn (with dimensions ncol, nlay+1) are assigned values if the logical do_broadband is .true., while the arrays flux_up and _dn (with dimensions ncol, nlay+1, ngpt) are assigned if do_broadband == .false.
For the CPU versions of the kernels this has the potential advantage that the flux_ arrays need not be allocated at their full sizes in the driver if do_broadband == .true.; we could rather pass a small decoy array to the kernel. This would reduce the considerable memory footprint of the calculation and was one motivation for the change.
On the other hand, at least one compiler (NAG, using debugging flags) attempts to fill every intent(out) array with NaN if its value is not explicitly computed, so not providing fully-sized arrays causes a runtime error. And, on the GPU side, fluxes for each g-point are still computed in parallel, so working memory needs to be allocated even if only the sum of the spectrally-resolved fluxes is used.
@alexeedm @Chiil I'm curious about your opinion - should we always completely allocate all the arrays passed to the kernels even if they won't be referenced? Are there alternatives?
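For reference, the pattern under discussion looks roughly like this (a schematic sketch using the dimensions from the description above, not the actual kernel interface):

subroutine lw_solver(ncol, nlay, ngpt, do_broadband, &
                     broadband_up, broadband_dn, flux_up, flux_dn)
  integer,  intent(in)  :: ncol, nlay, ngpt
  logical,  intent(in)  :: do_broadband
  ! Referenced only when do_broadband is .true.
  real(wp), intent(out) :: broadband_up(ncol,nlay+1), broadband_dn(ncol,nlay+1)
  ! Referenced only when do_broadband is .false.; the driver would like to
  ! pass small decoy arrays here when only broadband sums are requested.
  real(wp), intent(out) :: flux_up(ncol,nlay+1,ngpt), flux_dn(ncol,nlay+1,ngpt)
  if (do_broadband) then
    broadband_up = 0._wp; broadband_dn = 0._wp  ! accumulate g-point sums in place
  else
    flux_up = 0._wp; flux_dn = 0._wp            ! store spectrally resolved fluxes
  end if
end subroutine lw_solver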
As discussed in this PR #117 (comment), this style of OpenMP offload code may not be correct:
https://github.com/earth-system-radiation/rte-rrtmgp/blob/main/rte/kernels-openacc/mo_rte_solver_kernels.F90#L203-L213
Even in the case of do_Jacobians=False, the OpenMP offload compiler may map variables which are not available on the CPU, leading to spurious seg faults or other undefined behaviour. Note that most GPUs only have stack memory, so the variables are going to get allocated on the GPU even if the code is not executed there. (I think, but am not 100% sure.)
There are times when seg faults with the Cray OpenMP offload are encountered in the presence of optional arguments. It would be much better to examine an alternate code structure that avoids these kernels, as is proposed in the aforementioned PR.
Various parts of the CI (the tests running automatically for each commit) rely on files from RFMIP hosted on the Earth System Grid. The ESGF nodes are flaky, and the CI fails not infrequently because it can't retrieve the files.
One example is the LBLRTM results used in tests/validation-plots.py. As these tests are run in a container, we could consider adding the reference files to the container, trying the various ESGF nodes, and falling back to the files inside the container if all ESGF nodes are broken.
It looks like radn_dn should be intent(inout) in lw_solver_noscat, because it is assumed to have defined values within the routine (line 110 in mo_rte_solver_kernels.F90 in master). This doesn't throw an error with the PGI compiler, but the intent should probably be fixed for safety/clarity.
Hi,
I was running RRTMGP (tag: v1.5) in CAM on Cheyenne (a supercomputer at NCAR, CPU nodes only) and I observed different results when I used different RRTMGP code paths for compilation (with the same Intel compiler, compset, grid, etc. for the other settings).
To be specific, case 1 used the following code paths from this repo:
rrtmgp/kernels
rte/kernels
while case 2 used the following code paths from this repo:
rrtmgp/kernels-openacc
rte/kernels-openacc
I wonder whether the difference in the CPU results for these two cases is expected or not. Or is there any compiler flag that I should include in my CAM build?
Currently, certain subroutines are exposed via C bindings under very generic names, e.g. interpolation. That might trigger a name collision in a user application.
A possible solution is to come up with some common prefix and use it for all public functions of the library, e.g. rrtm_interpolation.
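A sketch of the prefixing: the Fortran name can stay the same while the exported C symbol is changed through the name= specifier of bind(c) (the prefix rrtmgp_ and the argument list here are illustrative):

! C callers link against rrtmgp_interpolation; Fortran callers still
! reference the routine as interpolation.
subroutine interpolation(ncol, nlay) bind(C, name='rrtmgp_interpolation')
  use iso_c_binding, only: c_int
  integer(c_int), intent(in) :: ncol, nlay
  ! ... kernel body unchanged ...
end subroutine interpolation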
The mo_rng_mklvsl module attempts to include the source file mkl_vsl.f90, which is not included in the rte-rrtmgp source, so any attempt to build the mo_rng_mklvsl module will fail if this file is not sourced ahead of time. Unfortunately, this causes problems with our build environment in E3SM, which scans entire directories for source code files and attempts to build ALL of the files it finds. Thus, the mere presence of mo_rng_mklvsl.F90 in this directory will cause a build to fail for us. My solution until now has been to rename mo_rng_mklvsl.F90 to mo_rng_mklvsl.F90~ to trick the build scripts, but I now want to make the rte-rrtmgp repository a submodule in our repo to allow me to more easily pull in updates to rte-rrtmgp while preserving its git history in E3SM as well. I would suggest removing this file from the rte-rrtmgp repo, since it won't build anyway, or at least moving it to a different directory (rng_mklvsl, perhaps?).
The dimension indicating string length is named string_len in the longwave gas optics coefficients file, but string32 in the shortwave file. This results in some complications in a generic reader. In earlier versions both were named string_len. It would be great if that could be restored.
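In the meantime, a generic reader can probe for both names (a small sketch with the Fortran netCDF API; error handling abbreviated):

integer :: ncid, dimid, status
! Try the longwave name first, then fall back to the shortwave one.
status = nf90_inq_dimid(ncid, 'string_len', dimid)
if (status /= NF90_NOERR) status = nf90_inq_dimid(ncid, 'string32', dimid)
if (status /= NF90_NOERR) stop 'string-length dimension not found'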
There's a GNU bug in examples/all-sky/mo_garand_atmos_io.F90 wherein you cannot pass arrays of string literals that have different lengths. The workaround is to create a temporary, character(len=32) :: strtmp(2), assign a dim name to each index, and then use the temporary.
Also, there are array temporaries being created in the following places:
At line 234 of file ../rrtmgp/kernels/mo_gas_optics_kernels.F90
Fortran runtime warning: An array temporary was created for argument 'gpt_flv' of procedure 'gas_optical_depths_minor'
At line 254 of file ../rrtmgp/kernels/mo_gas_optics_kernels.F90
Fortran runtime warning: An array temporary was created for argument 'gpt_flv' of procedure 'gas_optical_depths_minor'
At line 187 of file ../rte/kernels/mo_rte_solver_kernels.F90
Fortran runtime warning: An array temporary was created for argument 'inc_flux' of procedure 'apply_bc_gpt'
+ ./rrtmgp_clouds rrtmgp-clouds.nc /home/temp/rte-rrtmgp/rrtmgp/data/rrtmgp-data-sw-g224-2018-12-04.nc /home/temp/rte-rrtmgp/extensions/cloud_optics/rrtmgp-cloud-optics-coeffs-sw.nc 3000
At line 234 of file ../rrtmgp/kernels/mo_gas_optics_kernels.F90
Fortran runtime warning: An array temporary was created for argument 'gpt_flv' of procedure 'gas_optical_depths_minor'
At line 254 of file ../rrtmgp/kernels/mo_gas_optics_kernels.F90
Fortran runtime warning: An array temporary was created for argument 'gpt_flv' of procedure 'gas_optical_depths_minor'
These can be worked around by using an array you create yourself, copying the data into it, and passing the new array.
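A sketch of that workaround for the gpt_flv case (the shape and call site are assumptions; the point is that an explicitly contiguous copy removes the need for a compiler-generated temporary):

integer, allocatable :: gpt_flv_c(:,:)
! Copy once, outside the hot loop, into contiguous storage ...
allocate(gpt_flv_c(size(gpt_flv,1), size(gpt_flv,2)))
gpt_flv_c = gpt_flv
! ... then pass gpt_flv_c to gas_optical_depths_minor in place of the
! original actual argument, leaving all other arguments unchanged.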
Hi,
What are the units for the cloud ice and liquid water paths - is it g/kg? I would like to do all-sky computations using CAMS data. I noticed that the liquid water path I computed (from mixing ratios in kg/kg) had a maximum value of ~0.2, while the all-sky program fills the array with values of 0 or 10. The units should be documented in mo_cloud_optics.F90.
Gas optics routines will currently fail when passed any profiles that do not span both the upper and lower atmosphere, as defined by the value of press_ref_trop in the coefficients files. This appears to be due to a bug in the interpolation routine in mo_gas_optics_kernels, where, if the threshold between the lower and upper atmosphere is not crossed, the arrays itropo_lower and itropo_upper are left unset (in my case these were left with values of zero, but this is probably compiler-dependent, since they do not appear to be explicitly initialized anywhere). The code in question seems to be the following:
! itropo = 1 lower atmosphere; itropo = 2 upper atmosphere
itropo = merge(1,2,tropo(icol,ilay))
if (ilay .eq. 1) then
itropo_last(icol) = itropo
endif
if (itropo_last(icol) .ne. itropo) then
if (itropo .eq. 1) then ! layers go from TOA to surface
itropo_lower(icol,1) = ilay
itropo_lower(icol,2) = nlay
itropo_upper(icol,1) = 1
itropo_upper(icol,2) = ilay - 1
itropo_last(icol) = itropo
else ! layers go from surface to TOA
itropo_lower(icol,1) = 1
itropo_lower(icol,2) = ilay - 1
itropo_upper(icol,1) = ilay
itropo_upper(icol,2) = nlay
itropo_last(icol) = itropo
endif
endif
This causes problems in gas_optical_depths_minor, where the values of itropo are used to set the loop indices over the vertical dimension of the column gas amounts. If either itropo_lower or itropo_upper contains zeros, then this code will create a segmentation fault when trying to access the col_gas array out of bounds.
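One possible fix, sketched with the names from the excerpt (and assuming the ranges are consumed as do ilay = itropo_x(icol,1), itropo_x(icol,2)): before the detection loop, initialize both ranges from the regime of the first layer, leaving the other regime as an empty range.

! Defaults for profiles that never cross press_ref_trop: the whole column
! belongs to the regime of layer 1; the other range is empty (start > end),
! so loops over it simply never execute.
if (tropo(icol,1)) then                            ! layer 1 in lower atmosphere
  itropo_lower(icol,1) = 1; itropo_lower(icol,2) = nlay
  itropo_upper(icol,1) = 1; itropo_upper(icol,2) = 0
else                                               ! layer 1 in upper atmosphere
  itropo_upper(icol,1) = 1; itropo_upper(icol,2) = nlay
  itropo_lower(icol,1) = 1; itropo_lower(icol,2) = 0
end if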
This is more of a question than an issue: is the index for ilev in tau(:,ilev,igpt) correct here? Based on when top_at_1 is true, it looks like there might be a +1 missing, but I'm not sure.
I have been trying to use the provided cloud optics with RTE+RRTMGP, but it looks like there might be a potential issue with the table lookup function compute_all_from_table. In particular, line 650 in mo_cloud_optics.F90 computes an index into the table, but this will result in an index <= 0 if the effective radius re is less than offset. The code in question is:
index = min(floor((re(icol,ilay) - offset)/step_size)+1, nsteps-1)
Perhaps this is missing something like a max(index, 1)? I.e., it seems like this would fix the problem:
index = max(min(floor((re(icol,ilay) - offset)/step_size)+1, nsteps-1), 1)
but perhaps I am misunderstanding. I suppose it could also be assumed that it is the user's responsibility to restrict the cloud mask to cases with effective radii within the bounds of the table, in which case maybe it is appropriate that this code errors out when it encounters a value of re below the minimum specified in the table. Still, it would probably be better to throw an explicit error message rather than letting the code segfault.
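A sketch of such an explicit check, using the names from the excerpt (the error-reporting mechanism is an assumption):

! Fail loudly instead of computing an out-of-bounds table index.
if (re(icol,ilay) < offset) then
  error_msg = 'compute_all_from_table: effective radius below table minimum'
  return
end if
index = min(floor((re(icol,ilay) - offset)/step_size)+1, nsteps-1)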
On Summit, we get the following failure with stage_files.py, using the following from a fresh conda install:
module load python/3.7.0-anaconda3-5.3.0
conda create -n newenv python=3.7 anaconda
source activate newenv
conda install netcdf4 xarray numpy
python ./stage_files.py
Conda env here: https://gist.github.com/mrnorman/be751789d5357b45453d56c8a4c917db
(newenv) [imn@login2:~/rte-rrtmgp/examples/rfmip-clear-sky] 8-) python ./stage_files.py
Dowloading RFMIP input files
Dowloading scripts for generating output templates
Traceback (most recent call last):
File "./stage_files.py", line 34, in <module>
urllib.request.urlretrieve(templ_scr_url, templ_scr)
File "/ccs/home/imn/.conda/envs/newenv/lib/python3.7/urllib/request.py", line 247, in urlretrieve
with contextlib.closing(urlopen(url, data)) as fp:
File "/ccs/home/imn/.conda/envs/newenv/lib/python3.7/urllib/request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "/ccs/home/imn/.conda/envs/newenv/lib/python3.7/urllib/request.py", line 525, in open
response = self._open(req, data)
File "/ccs/home/imn/.conda/envs/newenv/lib/python3.7/urllib/request.py", line 543, in _open
'_open', req)
File "/ccs/home/imn/.conda/envs/newenv/lib/python3.7/urllib/request.py", line 503, in _call_chain
result = func(*args)
File "/ccs/home/imn/.conda/envs/newenv/lib/python3.7/urllib/request.py", line 1360, in https_open
context=self._context, check_hostname=self._check_hostname)
File "/ccs/home/imn/.conda/envs/newenv/lib/python3.7/urllib/request.py", line 1285, in do_open
h = http_class(host, timeout=req.timeout, **http_conn_args)
File "/ccs/home/imn/.conda/envs/newenv/lib/python3.7/http/client.py", line 1368, in __init__
context = ssl._create_default_https_context()
File "/ccs/home/imn/.conda/envs/newenv/lib/python3.7/ssl.py", line 565, in create_default_context
context = SSLContext(PROTOCOL_TLS)
File "/ccs/home/imn/.conda/envs/newenv/lib/python3.7/ssl.py", line 388, in __new__
self = _SSLContext.__new__(cls, protocol)
ssl.SSLError: [SSL] malloc failure (_ssl.c:2994)
(newenv) [imn@login2:~/rte-rrtmgp/examples/rfmip-clear-sky] >:O
There are folders rte-rrtmgp/rte/kernels and rte-rrtmgp/rte/kernels-openacc, and rte-rrtmgp/rrtmgp/kernels and rte-rrtmgp/rrtmgp/kernels-openacc.
The problem is that there is no file mo_rrtmgp_util_reorder_kernels.F90 in the rte-rrtmgp/rrtmgp/kernels-openacc folder, but it is used by mo_rrtmgp_util_reorder.F90 of rte-rrtmgp/rrtmgp/.
Similarly, mo_fluxes_broadband_kernels.F90 resides in rte/kernels but not in rte/kernels-openacc, yet it is used in mo_fluxes.F90, which is compiled in both cases...
PR#50
TL;DR: Is there an "official" way to run RRTMGP in single-precision?
So this is a bit of a long ramble of a question so bear with me...
In trying to get RRTMGP v1.5 working in GEOS, we found it was crashing with a segfault... and the previous version didn't seem to, the last time we tested.
Eventually, I was able to figure out that, for some reason, the set of Intel Fortran flags we switched to recently--which enable vectorization model-wide--leads to a segfault in RRTMGP, and we made that switch in between tests. Now, I don't think the actual issue is in RRTMGP, since changing the flags back to our old non-vectorized flags lets it work. I've tried to track down which of our flags cause the issue, but no luck so far.
But I got curious and thought, "I wonder if I can run RRTMGP in single precision?" for some extra speed (since part of my job is code optimization). So I changed mo_rte_kind.F90, recompiled, and boom. Big crash! But a bit of GitHub searching led me to this commit on the rte-rrtmgp-nn repo of @peterukk. So I added a k_min, made similar edits, and I can run RRTMGP in single precision! Even weirder, it works with both the non-vectorized and the vectorized flags (which really makes me think someone else is messing with RRTMGP's memory state).
And this got me thinking: is there a "correct" way to run single-precision RRTMGP? My naive attempt at just changing wp failed, but with the tweak from @peterukk it did seem to work, and the output wasn't a different planet...
The repository includes some examples/ and a few tests/ exercising various options in the code. These are cobbled together into a makeshift continuous integration that ensures changes to the code don't change the answers (unless the point of reference is also changed).
It would be terrific to add automated code coverage to the continuous integration, so we know how much of the code is being exercised and so new code is always paired with new tests. We are signed up with codecov.io, but I have not been able to figure out how to produce the .xml or .json files needed with the Intel compiler (see branch feature-code-coverage), nor have I been able to figure out how to manage coverage using the GNU toolchain when source code and executables are scattered across directories.
We'd welcome help or a PR that showed us how to generate code coverage reports during continuous integration.
To date, data for the RRTMGP schemes have been distributed alongside code in the repository, while data for verification and validation, including continuous integration, are distributed via FTP.
I propose to package the data used by the schemes, along with validation and verification data, via e.g. Zenodo, where each version of the data gets a new (but related) DOI. Something like:
rrtmgp_data/
rrtmgp_data/gas_optics/
rrtmgp_data/cloud_optics
rrtmgp_verification_data/
rrtmgp_verification_data/examples/
rrtmgp_verification_data/examples/rfmip-clear-sky
rrtmgp_verification_data/examples/all-sky
Files could be fetched for continuous integration and/or local use with the shell (e.g. wget https://zenodo.org/record/XXX/files/FILENAME.tgz) and/or with Pooch in Python.
@Chiil @vectorflux @skosukhin Any thoughts?
@naromero77 from Intel had some suggestions via email for using the Intel oneAPI compilers to check aspects of the implementation of the OpenMP offload.
I recently had some ideas that may help you continue OpenMP offload development and possibly do CI. With the Intel oneAPI compilers, you can do the following:
Test OpenMP offload with the target being the CPU instead of the GPU:
ifx -c -fiopenmp -fopenmp-targets=x86_64 <files>
The code should compile and run AS IS. However, you cannot detect issues with data transfers.
Test OpenMP offload with the target being the GPU, but with OpenMP offload DISABLED at runtime:
ifx -c -fiopenmp -fopenmp-targets=spir64 <files>
You will need to set the runtime environment variable OMP_TARGET_OFFLOAD=DISABLED. Again, you cannot detect issues with data transfers.
The first method should be supported on any Intel CPU. The second method should also be supported on any Intel CPU and would be the preferred method.
We would like to distribute the RTE and RRTMGP libraries via conda and possibly other channels (e.g. apt-get). Libraries would be updated at each release.
For conda we will need to write a recipe and get it adopted as a feedstock. I think conda uses CMake, which we don't currently support.
We can build separate libraries for the kernels and the Fortran frontend, or libraries that combine both. The Python frontend needs just the kernels - is that what we'd start with?
I think there may be a small indexing error in the subroutine create_flavor() in mo_gas_optics_rrtmgp.F90.
https://github.com/earth-system-radiation/rte-rrtmgp/blob/main/rrtmgp/mo_gas_optics_rrtmgp.F90#L1558
To be consistent with the k-distribution data format, line 1558:
do iatm=1,size(key_species,1)
should be replaced with:
do iatm=1,size(key_species,2)
From the LW k-distribution file, key_species is ordered (pair, atmos_layer, bnd), i.e. [2,2,6]. Fortunately this has no impact, since pair = atmos_layer.
Running the RFMIP test cases with gfortran 6.4.0 with -fcheck=all throws array out-of-bounds errors. The error I got is below. I can look more into this once I finish the E3SM CRM port, but someone else might be able to get to this in the meantime. The code should probably be run with this flag until it runs clean. I also plan to run it through valgrind after it runs through -fcheck=all cleanly.
[imn@login1:~/rte-rrtmgp/branches/master] >:O cat ../../scripts/master/fcheck_lw.out
At line 161 of file rrtmgp_rfmip_lw.F90
Fortran runtime error: Index '0' of dimension 1 of array 'gases_to_use' below lower bound of 1
It looks like mo_compute_bc is mistakenly using the ty_gas_optics abstract type directly. I think the type(ty_gas_optics) either needs to change to class(ty_gas_optics), or it needs to change to type(ty_gas_optics_rrtmgp). Note that this prevents the code from compiling.
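For reference, a minimal sketch: Fortran forbids declaring a non-polymorphic variable of an abstract type, so the declaration must either become polymorphic or name the concrete extension.

! type(ty_gas_optics)             :: kd   ! invalid: the type is abstract
class(ty_gas_optics), allocatable :: kd1  ! OK: polymorphic reference
type(ty_gas_optics_rrtmgp)        :: kd2  ! OK: concrete extension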
The indicated compiler flags for PGI in Compiler-flags.md are Intel-specific (i.e., identical to the Intel flags). Using them in combination with nvfortran obviously breaks compilation.