ngeet / fates-containers
Repository for containerized version of FATES for use in future tutorials
As an alternative to removing and adding in modified config xml files, utilize a .cime config directory with the necessary files within the $HOME directory in the docker image. See here for more information: http://esmci.github.io/cime/versions/master/html/users_guide/cime-config.html#customizing-cime
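A minimal sketch of this idea, assuming the standard per-user CIME config location (the staging path and the stubbed machine entry below are placeholders, not the repo's actual config):

```shell
# Stage config files under $HOME/.cime inside the image so CIME picks them
# up automatically, instead of removing/re-adding the shipped xml files.
# /tmp/fates_home stands in for the image user's $HOME here.
set -e
CIME_CFG="/tmp/fates_home/.cime"
mkdir -p "$CIME_CFG"
# config_machines.xml is one of the per-user override files described in
# the CIME users guide; the machine definition itself is left as a stub.
cat > "$CIME_CFG/config_machines.xml" <<'EOF'
<?xml version="1.0"?>
<config_machines>
  <!-- docker machine definition would go here -->
</config_machines>
EOF
ls "$CIME_CFG"
```

In a Dockerfile this directory would simply be COPY'd into the image user's home, so the same case scripts work without patching the CIME source tree.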
While migrating the baseos files over to the ngeet repo, I noted that the libraries and packages install to /usr/local/ and that the user name isn't strictly necessary for the baseos build. As such, perhaps we should move the useradd step to the host land model-specific builds of the fates image?
I'm thinking we should try to keep a common baseos build that is usable for both the clm and elm host land models. In migrating the baseos files over from https://github.com/serbinsh/ctsm_containers, I've kept them all in the common baseos folder. If there are host land model-specific builds, I propose that we work off of the baseos as much as possible until they significantly diverge.
On a recent modeling team call we discussed the interest in the containerized FATES architecture, but noted that the current limiting factor for many users is that it still requires access to the met forcing, domain, surface, and ancillary files for the runs. These can be very large files with limited access.
Instead, we should develop or expand existing scripts for extracting single-point versions from these gridded files. That way we could provide site-level data packages that can be used with container runs. For example, I modified a script that has been floating around to extract met forcing data for a single X-Y point (see attached file). We can expand tools like this to extract the data for all ancillary files while maintaining the folder structure, making it easy for a user to download the data package, extract it, and point the container run to this location as the cesm_input_data location.
create_GSWP3.0.5d.v1_single_point_forcing_data.py.txt
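The extraction idea (not the attached script itself) can be sketched with NCO's ncks, which subsets one grid cell by coordinate while the output is written to the same relative path, preserving the folder structure. The site coordinates and file path below are hypothetical:

```shell
# Dry-run sketch: build the single-point extraction command for one file.
LAT=9.3; LON=280.1                          # assumed site point (deg N / deg E)
SRC=/tmp/cesm_input_data                     # full gridded inputs (placeholder)
DST=/tmp/slz_single_point                    # 1-pixel package staging area
REL=atm/datm7/GSWP3/clmforc.Solr.1996-01.nc  # placeholder relative path
mkdir -p "$DST/$(dirname "$REL")"            # mirror the input folder layout
# -d lat,min,max with equal float bounds selects the nearest single grid cell
CMD="ncks -O -d lat,$LAT,$LAT -d lon,$LON,$LON $SRC/$REL $DST/$REL"
echo "$CMD"   # run with eval once NCO and the source files are available
```

Looping this over every forcing/surface/domain file would produce the 1-pixel cesm_input_data tree described above.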
And then running a case script like this one with the container would allow the user to use that input data locally on their machine.
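The end-user invocation could look roughly like this (the image tag matches the example pull command used elsewhere in this repo; the case script name and mount points are assumptions): the unpacked single-point package is bind-mounted in as the cesm_input_data location.

```shell
# Dry-run sketch: assemble the docker run command for a local single-point case.
IMG="serbinsh/ctsm_containers:ctsm-fates_next_api-fates_sci.1.23.0_api.7.1.0"
DATA="/tmp/cesm_input_data"     # where the downloaded tar.gz was extracted
RUN_CMD="docker run --rm -v $DATA:/data -v /tmp/output:/output $IMG ./run_slz_case.sh"
echo "$RUN_CMD"   # execute directly once docker and the data package are in place
```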
This is the first NGT repo for FATES docker. It is labeled as a "tutorial" but may expand into a repo for building versions of FATES for the testbed or for testing. Or perhaps we keep this basic tutorial repo and eventually create another "working" repo for dev/testbed releases of Docker FATES? In which case the license may be different?
This might be most easily done by setting up an autobuild process.
The Nordic Earth System Model (NorESM) is integrating FATES and working on utilizing its model within the Galaxy project to enable web-enabled, cloud-hosted research: NordicESMhub/galaxy-tools#39. Since the Galaxy project makes use of containers to promote reproducible science and cloud computing, we are collaborating with NorESM to bring containerized hlm-fates to Galaxy. The hope is that this will help further experience with and adoption of the FATES model.
Galaxy utilizes conda's package management infrastructure to distribute and maintain containers, specifically bioconda containers. As such, it is ideal for the Dockerfile recipes to utilize conda-based libraries for the containerized application dependencies.
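As a sketch only (the package names are assumptions about what the hlm-fates build needs, not a tested dependency list), the Dockerfile's dependency layer expressed as conda-forge installs might look like:

```shell
# Build the conda install command for the model's toolchain dependencies.
# Installing from conda-forge rather than apt keeps the recipe compatible
# with Galaxy's conda/bioconda-oriented container tooling.
CHANNEL="conda-forge"
PKGS="netcdf-fortran openmpi make cmake"
echo "conda install -y -c $CHANNEL $PKGS"
```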
Is there any benefit to separating out the HLM Dockerfile build from the FATES configuration? At first I didn't think this made much sense, since the last step for integrating FATES is so minor. That said, perhaps we could do this so as to create specific HLM-only images with descriptive tags that correlate to a particular HLM version. This separation of steps might simplify the tagging structure, given that one commit of fates_next_api can typically suffice for multiple commits of FATES.
Plan: build off of my older example using local met forcing - https://github.com/serbinsh/ctsm_containers/wiki/Example-CTSM-FATES-(CLM5-FATES)-run:-PA-SLZ-using-NGEE-Tropics-driver-files - but instead provide default GSWP3 drivers extracted for the SLZ site, together with single-point (i.e. 1 pixel) surface and other forcing files, as a packaged product that can be easily downloaded for end-user experimentation.
Step 1: Using the existing example script and container file (docker pull serbinsh/ctsm_containers:ctsm-fates_next_api-fates_sci.1.23.0_api.7.1.0), generate a new build script that uses pre-extracted default single-point forcing files with full-res surf/domain/ndep etc. files to identify the full set of required inputs. Use a lower resolution grid to start, with the I2000Clm50FatesGs compset.
Step 2: Update the existing python script for extracting single-point drivers and surf/domain files to work with all other ancillary inputs. Generate a new cesm input data folder containing all required inputs, but with 1 pixel.
Step 3: Modify the example script to run at SLZ using the full set of single-point inputs. Test.
Step 4: Package draft input datasets as a tar.gz. Upload to OSF. Test pulling down and running locally on different machines using Docker and Singularity. Write up example notes.
Step 5: Beta testing of the script, container, and driver data by other users.
Step 6: Update to run with NGEE versions of HLM-FATES containers
Step 7: Add full example to wiki page.
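The packaging in Step 4 can be sketched as follows (the file names are placeholders standing in for real netCDF inputs): the key point is keeping the cesm_input_data folder structure inside the archive, so users can extract it and point the container at the top-level directory.

```shell
# Stage a miniature 1-pixel input tree and package it as a tar.gz.
set -e
PKG=/tmp/slz_single_point_inputs
mkdir -p "$PKG/atm/datm7" "$PKG/lnd/clm2/surfdata_map"
echo demo > "$PKG/atm/datm7/forcing.nc"           # stand-in for a forcing file
echo demo > "$PKG/lnd/clm2/surfdata_map/surf.nc"  # stand-in surface dataset
tar -czf /tmp/slz_single_point_inputs.tar.gz -C /tmp slz_single_point_inputs
tar -tzf /tmp/slz_single_point_inputs.tar.gz      # verify the archived layout
```

The resulting tar.gz is what would be uploaded to OSF for users to pull down.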
Build fates based on tag sci.1.36.0_api.11.2.0, which includes the new fates-hydro updates, using dockerhub autobuild.
@serbinsh, in trying to set up an automated build, cannot see the 'ngeetropics' organization even though he is on the 'owner' team.
Currently, the automated build configuration builds the same Dockerfile image twice with the separate git pushes of the latest and v.x.y.z tags, as dictated in the tagging protocol. Since autobuilds take a long time and the images should be exactly the same, there should be some sort of workaround available such that we don't have to actually trigger a build for latest.
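One possible workaround (a sketch, not a tested configuration) is a Docker Hub autobuild custom hook: build once, then retag and push the version tag from a hooks/post_push script instead of triggering a second identical build. IMAGE_NAME and DOCKER_REPO are variables Docker Hub sets in the autobuild environment; the release tag value here is a placeholder.

```shell
# Dry-run sketch of a hooks/post_push script body: retag the image that was
# just built for 'latest' with the release tag and push it as well.
IMAGE_NAME="${IMAGE_NAME:-ngeetropics/example:latest}"
DOCKER_REPO="${DOCKER_REPO:-ngeetropics/example}"
RELEASE_TAG="v1.0.0"   # would come from the tagging protocol
echo "docker tag $IMAGE_NAME $DOCKER_REPO:$RELEASE_TAG"
echo "docker push $DOCKER_REPO:$RELEASE_TAG"
```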
Build fail log: https://hub.docker.com/repository/registry-1.docker.io/ngeetropics/elmtest/builds/1a615460-71c4-41c5-8b19-13a46100fbf2
It appears that the automated build is failing because the elm_fates dockerfile was set up for local builds using the DOCKER_BUILDKIT=1 experimental feature, which allows the ssh key mount type:
Encountered error: 400 Client Error: Bad Request ("Dockerfile parse error line 54: Unknown flag: mount")
This was set up to enable build secrets (particularly for handling personal SSH keys on the build machine). Reference information is here. It is possible to use advanced options for autobuild (and autotest) to allow scripts to run during the build process that will set necessary variables. An example script using DOCKER_BUILDKIT=1, gleaned from a google search, is shown here.
That said, is this strictly necessary? What happens if we drop the usage of that option? It would be nice to hang on to it so that a single dockerfile works for both local builds and repo autobuilds, for the time being.
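For reference, the local build invocation the Dockerfile was written for would look roughly like this (the image name and Dockerfile path are assumptions): BuildKit's --ssh flag forwards the host's ssh agent so RUN --mount=type=ssh steps in the Dockerfile can fetch private repositories.

```shell
# Dry-run sketch: assemble the BuildKit-enabled local build command.
export DOCKER_BUILDKIT=1
BUILD_CMD="docker build --ssh default -t ngeetropics/elm-fates:local -f Dockerfile ."
echo "$BUILD_CMD"   # run on a machine with BuildKit and a loaded ssh agent
```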
@serbinsh ran into this attempting to run the elmtest container on modex (from slack conversation on 31 Aug 2020):
Loading input file list: 'Buildconf/datm.input_data_list'
Model datm missing file file1 = '/home/elmuser/data/atm/datm7/atm_forcing.datm7.Qian.T62.c080727/Solar6Hrly/clmforc.Qian.c2006.T62.Solr.1996-01.nc'
Trying to download file: 'atm/datm7/atm_forcing.datm7.Qian.T62.c080727/Solar6Hrly/clmforc.Qian.c2006.T62.Solr.1996-01.nc' to path '/home/elmuser/data/atm/datm7/atm_forcing.datm7.Qian.T62.c080727/Solar6Hrly/clmforc.Qian.c2006.T62.Solr.1996-01.nc' using WGET protocol.
wget failed with output: and errput --2020-08-31 13:35:47-- https://web.lcrc.anl.gov/public/e3sm/inputdata/atm/datm7/atm_forcing.datm7.Qian.T62.c080727/Solar6Hrly/clmforc.Qian.c2006.T62.Solr.1996-01.nc
Resolving web.lcrc.anl.gov (web.lcrc.anl.gov)... 140.221.70.30
Connecting to web.lcrc.anl.gov (web.lcrc.anl.gov)|140.221.70.30|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2020-08-31 13:35:47 ERROR 404: Not Found.
Model datm missing file file2 = '/home/elmuser/data/atm/datm7/atm_forcing.datm7.Qian.T62.c080727/Solar6Hrly/clmforc.Qian.c2006.T62.Solr.1996-02.nc'
Trying to download file: 'atm/datm7/atm_forcing.datm7.Qian.T62.c080727/Solar6Hrly/clmforc.Qian.c2006.T62.Solr.1996-02.nc' to path '/home/elmuser/data/atm/datm7/atm_forcing.datm7.Qian.T62.c080727/Solar6Hrly/clmforc.Qian.c2006.T62.Solr.1996-02.nc' using WGET protocol.
wget failed with output: and errput --2020-08-31 13:35:47-- https://web.lcrc.anl.gov/public/e3sm/inputdata/atm/datm7/atm_forcing.datm7.Qian.T62.c080727/Solar6Hrly/clmforc.Qian.c2006.T62.Solr.1996-02.nc
Resolving web.lcrc.anl.gov (web.lcrc.anl.gov)... 140.221.70.30
Connecting to web.lcrc.anl.gov (web.lcrc.anl.gov)|140.221.70.30|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2020-08-31 13:35:47 ERROR 404: Not Found.
A temporary workaround was to download the files directly via the CLM svn repo. Could this be due to the compset being retired in elm?
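The workaround can be sketched like this (the URL is the public CESM inputdata svn repository; its availability and the exact path are not guaranteed): fetch the missing Qian forcing file directly, since the e3sm inputdata server returned 404 for this path.

```shell
# Dry-run sketch: build the svn export command for one missing forcing file.
REL="atm/datm7/atm_forcing.datm7.Qian.T62.c080727/Solar6Hrly/clmforc.Qian.c2006.T62.Solr.1996-01.nc"
DEST="/tmp/data/$REL"
FETCH_CMD="svn export https://svn-ccsm-inputdata.cgd.ucar.edu/trunk/inputdata/$REL $DEST"
echo "$FETCH_CMD"   # run where svn and network access are available
```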
Creating this issue to provide a cross-repo link to the original discussion: NGEET/fates#577
Both @serbinsh and I are seeing this in different ctsm-fates build versions:
mpif90 -o /home/fatesuser/output/no-user-test-asroot.fates.docker.Cabcd593-F3248e63/bld/cesm.exe cime_comp_mod.o cime_driver.o component_mod.o component_type_mod.o cplcomp_exchange_mod.o map_glc2lnd_mod.o map_lnd2glc_mod.o map_lnd2rof_irrig_mod.o mrg_mod.o prep_aoflux_mod.o prep_atm_mod.o prep_glc_mod.o prep_ice_mod.o prep_lnd_mod.o prep_ocn_mod.o prep_rof_mod.o prep_wav_mod.o seq_diag_mct.o seq_domain_mct.o seq_flux_mct.o seq_frac_mct.o seq_hist_mod.o seq_io_mod.o seq_map_mod.o seq_map_type_mod.o seq_rest_mod.o t_driver_timers_mod.o -L/home/fatesuser/output/no-user-test-asroot.fates.docker.Cabcd593-F3248e63/bld/lib/ -latm -L/home/fatesuser/output/no-user-test-asroot.fates.docker.Cabcd593-F3248e63/bld/lib/ -lice -L../../gnu/openmpi/nodebug/nothreads/mct/noesmf/lib/ -lclm -L/home/fatesuser/output/no-user-test-asroot.fates.docker.Cabcd593-F3248e63/bld/lib/ -locn -L/home/fatesuser/output/no-user-test-asroot.fates.docker.Cabcd593-F3248e63/bld/lib/ -lrof -L/home/fatesuser/output/no-user-test-asroot.fates.docker.Cabcd593-F3248e63/bld/lib/ -lglc -L/home/fatesuser/output/no-user-test-asroot.fates.docker.Cabcd593-F3248e63/bld/lib/ -lwav -L/home/fatesuser/output/no-user-test-asroot.fates.docker.Cabcd593-F3248e63/bld/lib/ -lesp -L../../gnu/openmpi/nodebug/nothreads/mct/noesmf/c1a1l1i1o1r1g1w1e1/lib -lcsm_share -L../../gnu/openmpi/nodebug/nothreads/lib -lpio -lgptl -lmct -lmpeu -L/lib/ -lnetcdff -lnetcdf -lcurl -llapack -lblas
/usr/bin/ld: cannot find -lnetcdff
/usr/bin/ld: cannot find -lnetcdf
collect2: error: ld returned 1 exit status
/home/fatesuser/output/no-user-test-asroot.fates.docker.Cabcd593-F3248e63/Tools/Makefile:874: recipe for target '/home/fatesuser/output/no-user-test-asroot.fates.docker.Cabcd593-F3248e63/bld/cesm.exe' failed
make: *** [/home/fatesuser/output/no-user-test-asroot.fates.docker.Cabcd593-F3248e63/bld/cesm.exe] Error 1
My particular image is a ctsm-fates-gcc650 build with cime5.6.28. Here's the Makefile line: https://github.com/ESMCI/cime/blob/fe16302fc332a02427a9e41a8efe959f2fe8c953/scripts/Tools/Makefile#L873-L874
The gcc650 baseos build hasn't changed and, if I recall, the ctsm testrepo ran successfully in the past using that same baseos, so I'm not sure what's going on here. The LD_LIBRARY_PATH includes the paths to the combined C and Fortran netcdf libraries, so the baseos seems to be fine.
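A quick diagnostic for this kind of -lnetcdff/-lnetcdf failure is to check whether the Fortran and C netCDF shared libraries actually exist on the linker search paths (the directories below are guesses for the gcc650 baseos image, where packages install under /usr/local/):

```shell
# Report where (or whether) each netCDF library is found on the usual paths.
report=""
for lib in libnetcdf libnetcdff; do
  found=$(find /usr/lib /usr/local/lib -name "${lib}.so*" 2>/dev/null | head -n 1)
  report="${report}${lib}: ${found:-not found on the expected paths}
"
done
printf '%s' "$report"
```

If libnetcdff turns up missing while libnetcdf is present, the -L/lib/ entry on the link line is pointing at the wrong directory rather than the baseos lacking the library.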