dynamico-giant's Introduction

install

set environment (do it once)

cp ~aspigaheat/.bash_profile ~/.bash_profile
source ~/.bash_profile

download structure

cd $SCRATCHDIR
git clone https://github.com/aymeric-spiga/dynamico-giant.git [optional different name]

install code

cd dynamico-giant
./install.sh
./install_ioipsl.sh

compile code

cd saturn
./compile_occigen.sh

prepare starts (brief run)

cd makestart
sbatch job_mpi
cd ..

check job

squeue

The symbolic links startfi.nc and start_icosa.nc should now point to existing files. If they do, run DYNAMICO:

sbatch job_mpi
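
The link check above can be scripted. This is a minimal sketch, assuming bash and that it is run from the directory holding the two links; the helper name check_start_links is hypothetical and not part of dynamico-giant.

```shell
# Return 0 only if every given name is a symbolic link whose target exists.
check_start_links() {
    local status=0 f
    for f in "$@"; do
        if [ ! -L "$f" ]; then
            echo "$f: not a symbolic link" >&2
            status=1
        elif [ ! -e "$f" ]; then
            # -e follows the link, so it fails when the target is missing
            echo "$f: dangling link -> $(readlink "$f")" >&2
            status=1
        fi
    done
    return "$status"
}

# Possible usage (submits the job only when both links resolve):
#   check_start_links startfi.nc start_icosa.nc && sbatch job_mpi
```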


problems

  • kappa
  • drag value
  • preff
  • correct in gfluxi
  • saturn1d psurf
  • saturn1d ichoice
  • bad Ls for ending the 1D

sandbox

check_conserv = detailed in run_icosa.def (see the log files)

tested with rev 1711 of LMD models

important

  • synchronize call to physics with XIOS outputs
  • synchronize also for physics
  • do not use a timestep with more than 3 digits after the decimal point
  • enough procs for xios_server (1 per 25)
  • set nprocs_for_run = 10 * nsplit_i * nsplit_j
  • do not call physics and radtrans too often
  • increase buffer (factor 4 --> 12)
  • do not open output file before the end
  • increasing info_level slows the run down
  • running tail -f icosa_lmdz.out | grep "0000:" slows the run down
  • output frequency has a large effect (including on the native grid)
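
Two of the rules in this list are simple enough to check before submitting a job. This is a rough sketch in bash; the function names are hypothetical, and the values of nsplit_i and nsplit_j in the usage comment are made up for illustration.

```shell
# Succeed (exit 0) if the timestep has at most 3 digits after the decimal point.
timestep_ok() {
    [[ "$1" =~ ^[0-9]+(\.[0-9]{1,3})?$ ]]
}

# Apply the rule nprocs_for_run = 10 * nsplit_i * nsplit_j.
nprocs_for_run() {
    echo $(( 10 * $1 * $2 ))
}

# Possible usage (illustrative values only):
#   timestep_ok 118.884 || echo "timestep has too many decimals" >&2
#   nprocs_for_run 4 3
```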

Remapping apparently blows up the memory used by XIOS --> use more than 1 server per 25 run cores?

100 servers are needed for 1200 run cores, otherwise it freezes. Tested with native-grid output: about the same computation time as the non-server version, or even slightly faster (except that the XIOS shutdown takes much longer, the time needed to close all the servers). Another problem: it does not write startfi.nc properly, and in fact it never terminates...
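
The two server ratios in these notes can be compared with a small calculation. This is a sketch only, assuming the figure 1200 refers to run cores; the function name xios_servers is hypothetical.

```shell
# Number of XIOS servers for a given number of run cores,
# rounding up; the default ratio is 1 server per 25 cores.
xios_servers() {
    local nprocs=$1 per=${2:-25}
    echo $(( (nprocs + per - 1) / per ))
}

xios_servers 1200      # 1 per 25 -> 48
xios_servers 1200 12   # 1 per 12 -> 100
```

Under that assumption, the 100 servers reported as necessary correspond to about one server per 12 run cores, roughly twice the "1 per 25" rule of thumb.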

note

Server-client mode is necessary for online remapping. Attached mode works only if the interpolated nlat < nproc used (a strong requirement, actually).

fix compilation problems

in the .bash_profile

export PATH=$PATH:.:~millourheat/FCM_V1.2/bin
ulimit -s unlimited
# modules
module purge
module load intel/17.0
module load intelmpi/2017.0.098
module load hdf5/1.8.17
module load netcdf/4.4.0_fortran-4.4.2
module load ncview
module load nco
module load qt
module load python
module load mkl

dynamico-giant's People

Contributors

aymeric-spiga, debbardet, jvatantollone, milcareck-gwenael

dynamico-giant's Issues

data not initialized

3D dynamico runs stop approximately 10 seconds after they begin. I get the following error (message from proc 19 is sent by everyone):

00: Remapping...
19: > Error [template void CType::checkEmpty(void) const] : In file '/scratch/cnt0027/lmd1167/aboissinot/dynamico-giant/code/XIOS/src/attribute_template_impl.hpp', line 78 -> On checking attribute with id=operation : data is not initialized
19:
19:
19: application called MPI_Abort(MPI_COMM_WORLD, -1) - process 19
00: slurmstepd: *** STEP 5202504.0 ON n1668 CANCELLED AT 2018-09-27T11:37:30 ***
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
srun: error: n1668: tasks 0-19: Killed
srun: Terminating job step 5202504.0
srun: error: n1675: tasks 20-39: Killed
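
When every process prints the same message, as here, it can help to strip the rank prefix and count the distinct messages. This is a possible sketch, assuming a log whose lines start with a numeric rank followed by a colon, as in the excerpt above; the function name collapse_ranks is hypothetical.

```shell
# Strip the leading "NN:" rank prefix, drop empty lines,
# and list each distinct message with its number of occurrences.
collapse_ranks() {
    sed -E 's/^[0-9]+: ?//' "$@" \
        | grep -v '^[[:space:]]*$' \
        | sort | uniq -c | sort -rn
}

# Possible usage:
#   collapse_ranks icosa_lmdz.out | head
```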

Problem in bilinearbig in 64 levels

Yes, with version 1984 the problem is still there:
000: forrtl: severe (408): fort: (3): Subscript #2 of the array F2D_ARR has value -858993460 which is less than the lower bound of 1
000:
000: Image PC Routine Line Source
000: icosa_lmdz.exe 0000000002B95E66 Unknown Unknown Unknown
000: icosa_lmdz.exe 00000000014F1DBB bilinearbig_ 95 bilinearbig.f90
000: icosa_lmdz.exe 00000000014C16C4 interpolateh2h2_ 125 interpolateH2H2.f90
000: icosa_lmdz.exe 00000000012B419A optcv_ 193 optcv.f90
000: icosa_lmdz.exe 0000000001097C99 callcorrk_ 812 callcorrk.f90
000: icosa_lmdz.exe 0000000000DB955E physiq_mod_mp_phy 825 physiq_mod.f90

Originally posted by @ehouarn in #4 (comment)

Model compilation with 64 vertical levels

I can't run the model with 64 vertical levels: there is a problem with fields modified in the restart files.
This is the error message in icosa_lmdz_270.out:

0954: application called MPI_Abort(MPI_COMM_WORLD, -1) - process 954
0955: > Error [CNc4DataInput::readFieldAttributes_(CField* field, bool readAttributeValues)] : In file '/scratch/cnt0027/lmd1167/dbardet/dynamico-giant/code/XIOS/src/io/nc4_data_input.cpp', line 174 -> Field 'q' has incorrect dimension
0955: Verify dimension of grid defined by 'grid_ref' or 'domain_ref'/'axis_ref' and dimension of grid in read file.

I think I have to recompile the model with 64 vertical levels, but I can't find the executable file that I have to modify. Could you help me?

timestamp problem

by @debbardet

I tested saturn with 64 levels with my modified and corrected (by commit e1adc6e) start files. I no longer get the bilinearbig error, but I have a new error message (obviously!):

0000: GETIN start_file_name = start_icosa
0000: -> info : Impossible to get the packet with timestamp = 0
0000: Available timestamp are :
0000: -> info :
0000: > Error [CConstDataPacketPtr CStoreFilter::getPacket(Time timestamp) const] : In file '/scratch/cnt0027/lmd1167/dbardet/dynamico-giant-64lvls/code/XIOS/src/filter/store_filter.cpp', line 54 -> Impossible to get the packet with timestamp = 0
0000:
0000: application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0
0000: slurmstepd: *** STEP 5318413.0 ON n1017 CANCELLED AT 2018-10-12T10:26:05 ***
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
srun: error: n1017: tasks 2-4,6-8,10-16,20,22-23: Killed
srun: Terminating job step 5318413.0

Originally posted by @debbardet in #7 (comment)

Error XIOS

The model in makestart runs smoothly in debug mode, then after about 13 min 40 s of CPU time and 84 iterations it crashes with:

00: icosa_lmdz.exe: /scratch/cnt0027/lmd1167/aspigaplaneto/dynamico-giant/code/XIOS/extern/blitz/blitz/array-impl.h:1330: bool blitz::Array<P_numtype, N_rank>::assertInRange(int) const [with P_numtype = double, N_rank = 1]: Assertion `0' failed.
00: forrtl: error (76): Abort trap signal
00: Image              PC                Routine            Line        Source             
00: icosa_lmdz.exe     0000000002CE6881  Unknown               Unknown  Unknown
00: icosa_lmdz.exe     0000000002CE49BB  Unknown               Unknown  Unknown
00: icosa_lmdz.exe     0000000002C9D644  Unknown               Unknown  Unknown
00: icosa_lmdz.exe     0000000002C9D456  Unknown               Unknown  Unknown
00: icosa_lmdz.exe     0000000002C43907  Unknown               Unknown  Unknown
00: icosa_lmdz.exe     0000000002C4A76E  Unknown               Unknown  Unknown
00: libpthread-2.17.s  00002B46245395E0  Unknown               Unknown  Unknown
00: libc-2.17.so       00002B4624AC21F7  gsignal               Unknown  Unknown
00: libc-2.17.so       00002B4624AC38E8  abort                 Unknown  Unknown
00: libc-2.17.so       00002B4624ABB266  Unknown               Unknown  Unknown
00: libc-2.17.so       00002B4624ABB312  Unknown               Unknown  Unknown
00: icosa_lmdz.exe     00000000015641D6  _ZNK5blitz5ArrayI        1328  array-impl.h
00: icosa_lmdz.exe     0000000001564A88  _ZNK5blitz5ArrayI        1697  array-impl.h
00: icosa_lmdz.exe     0000000001C62B60  _ZN4xios6CField14         181  field.cpp
00: icosa_lmdz.exe     0000000001D194B4  _ZN4xios17CFileWr          36  file_writer_filter.cpp
00: icosa_lmdz.exe     00000000023BDD61  _ZN4xios9CInputPi          37  input_pin.cpp
00: icosa_lmdz.exe     000000000276652D  _ZN4xios10COutput          46  output_pin.cpp
00: icosa_lmdz.exe     0000000002765DF6  _ZN4xios10COutput          35  output_pin.cpp
00: icosa_lmdz.exe     00000000028E1312  _ZN4xios7CFilter1          16  filter.cpp
00: icosa_lmdz.exe     00000000023BDD61  _ZN4xios9CInputPi          37  input_pin.cpp
00: icosa_lmdz.exe     000000000276652D  _ZN4xios10COutput          46  output_pin.cpp
00: icosa_lmdz.exe     0000000002765DF6  _ZN4xios10COutput          35  output_pin.cpp
00: icosa_lmdz.exe     00000000027ABAC5  _ZN4xios23CSpatia          68  spatial_transform_filter.cpp
00: icosa_lmdz.exe     00000000023BDD61  _ZN4xios9CInputPi          37  input_pin.cpp
00: icosa_lmdz.exe     000000000276652D  _ZN4xios10COutput          46  output_pin.cpp
00: icosa_lmdz.exe     0000000002765DF6  _ZN4xios10COutput          35  output_pin.cpp
00: icosa_lmdz.exe     00000000027A61CD  _ZN4xios13CSource          66  source_filter.cpp
00: icosa_lmdz.exe     0000000001CA60FA  _ZN4xios6CField7s          23  field_impl.hpp
00: icosa_lmdz.exe     00000000021234EB  cxios_write_data_         434  icdata.cpp
00: icosa_lmdz.exe     00000000015B019B  idata_mp_xios_sen         466  idata.f90
00: icosa_lmdz.exe     0000000000699130  xios_mod_mp_xios_         458  xios_mod.f90
00: icosa_lmdz.exe     000000000068FD5C  xios_mod_mp_xios_         293  xios_mod.f90
00: icosa_lmdz.exe     0000000000482821  output_field_mod_          50  output_field.f90
00: icosa_lmdz.exe     0000000000747AB4  dissip_gcm_moddis         603  dissip_gcm.f90
00: icosa_lmdz.exe     0000000000744FBC  dissip_gcm_mod_mp         564  dissip_gcm.f90
00: icosa_lmdz.exe     00000000004E65AE  timeloop_gcm_mod_         309  timeloop_gcm.f90
00: icosa_lmdz.exe     00000000004249B1  icosa_init_mod_mp          66  icosa_init.f90
00: libiomp5.so        00002B46247EB413  __kmp_invoke_micr     Unknown  Unknown
00: libiomp5.so        00002B46247BB60D  __kmp_fork_call       Unknown  Unknown
00: libiomp5.so        00002B4624793EE8  __kmpc_fork_call      Unknown  Unknown
00: icosa_lmdz.exe     000000000042487D  icosa_init_mod_mp          62  icosa_init.f90
00: icosa_lmdz.exe     000000000041C0E0  Unknown               Unknown  Unknown
00: icosa_lmdz.exe     000000000041C09E  Unknown               Unknown  Unknown

Weird warning in tpindex with 64 levels

Running the Saturn case with 64 levels, I get this weird warning from tpindex:
000: tpindex: Caution! Pressure of upper levels lower than ref pressure for k-coef:
000: k-coeff fixed for upper levels
000: PWL= -3.11805930589871
000: PREF(1)= -3.00000000000000
But the code seems to run properly.

Stopped model after 3 minutes of calculations

I have an issue (one more time...).
The model stops after ~3 minutes of calculations, without any error message.
In icosa_lmdz_270.out, I can see that it stops at the point where it checks the value of "q" to use aerosols:
0200: naerkind= back2lay 1
0200: Warning: no variance range in aeroptproperties
0200: Tracers found in aeropacity:
0200: If you would like to use aerosols, make sure any old
0200: start files are updated in newstart using the option
0200: q=0
0200: Active aerosols found in aeropacity:
0200: iaero_back2lay= 1

I checked the value of "q" in my start file (start_icosa_270.nc) and all the values of q are zero.

Problem with regular two degree

I can't run the model with 64 vertical levels because I get this error message:

0335: > Error [CField::solveGridReference(void)] : In file '/scratch/cnt0027/lmd1167/dbardet/dynamico-giant/code/XIOS/src/node/field.cpp', line 1334 -> Invalid reference to domain 'regular_two_degree'.
0335:
0335: application called MPI_Abort(MPI_COMM_WORLD, -1) - process 335

Do you know what's happening?

Try just running the 1D model (with 64 layers) in debug mode; in my case:

forrtl: severe (408): fort: (2): Subscript #1 of the array TAUCUMV has value 130 which is greater than the upper bound of 129

Image PC Routine Line Source
libifcoremt.so.5 00002B5597C9E0E9 for_emit_diagnost Unknown Unknown
rcm1d_64_phystd_s 0000000000A09AED optcv_ 375 optcv.f90
rcm1d_64_phystd_s 00000000007F4A3E callcorrk_ 807 callcorrk.f90
rcm1d_64_phystd_s 000000000056BE53 physiq_mod_mp_phy 812 physiq_mod.f90
rcm1d_64_phystd_s 000000000041D5B2 MAIN__ 2739 rcm1d.f

Originally posted by @ehouarn in https://github.com/_render_node/MDU6SXNzdWUzNjQzOTQ1NjA=/issues/unread_timeline#issuecomment-425400225

long time of calculation

The long computation time comes from the time the model spends writing to icosa_lmdz.out: I had selected info_level=100 and print_file=false. To keep a high info_level, we have to use print_file mode, because it is faster to create one file per processor than to have every processor write to the same file.

Originally posted by @debbardet in #7 (comment)
