
FATES Containers Repository

This repository contains the Dockerfiles needed to build FATES Docker images, which are then pushed to the NGEE-Tropics Docker Hub repository. The images built from this repository are intended to be used with the fates tutorial. It is not necessary to clone this repo to run FATES Docker containers or the docker tutorial containers. For information on how to use the docker containers for the fates tutorial, see the fates tutorial documentation.

Dockerfile: Analogous to a makefile in some ways. Used to direct the Docker engine in the construction of Docker images.

Docker image: Read-only template containing layers with the necessary OS, environment variables, programs and applications for running a specific task.

Docker container: A running instance of a Docker image. Containers are ephemeral and do not save run-time information locally.

Docker Hub: The official online registry of Docker images. It is one of many places Docker images may be hosted, however.

Repo Structure

cime_config_files: XML configuration files necessary for running host models in docker containers

docker: Contains the dockerfiles necessary to build docker images. Broken down by host model type (ELM, CLM).

Preparations

  1. Set up and test Docker (a quick check is sketched below)
  2. Sign up for Docker Hub
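
A minimal way to confirm the Docker setup is working before pulling the FATES images (a sketch; the hello-world image is used only as a smoke test, and docker login is only needed if you plan to push images):

# Confirm the Docker engine is installed and can run containers
docker --version
docker run --rm hello-world

# Log in with the Docker Hub account created above (only required for pushing images)
docker login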

Simple Test Run

  1. Pull the docker image from Docker Hub: docker pull ngeetropics/<dockerhub-repository-name>
  2. Run the container (a filled-in example is shown below the notes): docker run --rm -ti --hostname=docker -u $(id -u):$(id -g) -v <your-local-scratch-directory>:/output -v <your-local-inputdata-dir>:/inputdata -v <your-local-scripts-dir>:/scripts ngeetropics/<dockerhub-repository-name>:latest

Notes:

  • The docker images do not contain all the necessary input data, so access to an external data source is necessary.
  • Scripts need to be adjusted to match the internal structure of the docker container. See wiki and template script for details.
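
As an illustration, a filled-in version of the pull and run commands might look like the following; the local directories (~/fates_scratch, ~/fates_inputdata, ~/fates_scripts) and the repository name fates-ctsm are hypothetical placeholders, not actual image names:

# Hypothetical paths and repository name; substitute your own
docker pull ngeetropics/fates-ctsm:latest
docker run --rm -ti --hostname=docker -u $(id -u):$(id -g) \
  -v ~/fates_scratch:/output \
  -v ~/fates_inputdata:/inputdata \
  -v ~/fates_scripts:/scripts \
  ngeetropics/fates-ctsm:latest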

See the wiki for more detailed information on using docker to build and run host land model cases.

Issues

Develop draft general single-site script for running containers with single-site forcing files

Using the existing example script and container file (docker pull serbinsh/ctsm_containers:ctsm-fates_next_api-fates_sci.1.23.0_api.7.1.0), generate a new build script that uses pre-extracted default single-point forcing files together with full-resolution surf/domain/ndep etc. files to identify the full set of required inputs. Start with a lower-resolution grid and the I2000Clm50FatesGs compset.
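
A rough sketch of what the case-creation step of such a build script could look like inside the container; the model path (/ctsm/cime/scripts), the machine name (docker), and the coarse resolution alias are assumptions and would need to match the actual container configuration:

# Hypothetical case-creation step for the draft single-site build script
cd /ctsm/cime/scripts            # assumed model location inside the container
./create_newcase --case /output/single_site_test \
                 --compset I2000Clm50FatesGs \
                 --res f45_f45_mg37 \
                 --machine docker --compiler gnu --run-unsupported
cd /output/single_site_test
./case.setup && ./case.build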

License - How do we want to license this repo?

@rgknox @glemieux

This is the first NGT repo for FATES docker. It is labeled as a "tutorial" but may expand into a repo for building versions of FATES for the testbed or for testing. Or perhaps we keep this basic tutorial repo and eventually create another "working" repo for dev/testbed releases of Docker FATES, in which case the license may be different?

Tagging latest and v.x.y.z with the same dockerhub image

Currently, the automated build configuration builds the same Dockerfile image twice, once for each of the separate git pushes of the latest and v.x.y.z tags dictated by the tagging protocol. Since autobuilds take a long time and the images should be identical, there should be some sort of workaround so that we don't have to actually trigger a build for latest.
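
One possible workaround (a sketch, not tested against the current autobuild setup) would be to build or pull the versioned image once and then re-tag and push it as latest from the command line, rather than triggering a second autobuild:

# Re-tag the versioned image as latest and push, instead of rebuilding
docker pull ngeetropics/<dockerhub-repository-name>:v.x.y.z
docker tag  ngeetropics/<dockerhub-repository-name>:v.x.y.z ngeetropics/<dockerhub-repository-name>:latest
docker push ngeetropics/<dockerhub-repository-name>:latest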

Common baseos for all host land models?

I'm thinking we should try to keep a common baseos build that is usable for both the clm and elm host land models. In migrating the baseos files over from https://github.com/serbinsh/ctsm_containers, I've kept them all in the common baseos folder. If there are host-land-model-specific builds, I propose that we work off of the baseos as much as possible until they significantly diverge.
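
A sketch of how that could look on the build side; the image names and folder paths below are hypothetical, and each host-land-model Dockerfile would simply start FROM the shared base image:

# Build the common base image once (hypothetical tags and paths)
docker build -t ngeetropics/fates-baseos:latest docker/baseos

# Host-model images build on top of it; their Dockerfiles would begin with
# "FROM ngeetropics/fates-baseos:latest"
docker build -t ngeetropics/ctsm-fates:latest docker/clm
docker build -t ngeetropics/elm-fates:latest  docker/elm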

Enhancement: First pass at end-to-end site-scale containerized HLM-FATES workflow

Plan: build off of my older example using local met - https://github.com/serbinsh/ctsm_containers/wiki/Example-CTSM-FATES-(CLM5-FATES)-run:-PA-SLZ-using-NGEE-Tropics-driver-files - but instead provide default GSWP3 drivers extracted for the SLZ site together with single-point (i.e. 1 pixel) surface and other forcing files as a packaged product that can be easily downloaded for end-user experimentation.

Step 1: Using the existing example script and container file (docker pull serbinsh/ctsm_containers:ctsm-fates_next_api-fates_sci.1.23.0_api.7.1.0), generate a new build script that uses pre-extracted default single-point forcing files together with full-resolution surf/domain/ndep etc. files to identify the full set of required inputs. Start with a lower-resolution grid and the I2000Clm50FatesGs compset.

Step 2: Update the existing python script for extracting single-point drivers and surf/domain files so that it works with all other ancillary inputs. Generate a new cesm input data folder containing all required inputs, but with 1 pixel.

Step 3: Modify the example script to run at SLZ using the full set of single-point inputs. Test.

Step 4: Package the draft input datasets as a tar.gz and upload to OSF. Test pulling the package down and running it locally on different machines using Docker and Singularity (a sketch follows the steps below). Write up example notes.

Step 5: Other user beta test of script, container, and driver data

Step 6: Update to run with NGEE versions of HLM-FATES containers

Step 7: Add full example to wiki page.
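
A rough sketch of the packaging and Singularity portion of Step 4; the archive name, data directory, image filename, and case script name are all hypothetical placeholders:

# Package the draft single-point input data (hypothetical directory and archive names)
tar -czf slz_single_point_inputdata.tar.gz cesm_input_data/

# On a test machine: pull the image and run with Singularity instead of Docker
singularity pull fates_test.sif docker://serbinsh/ctsm_containers:ctsm-fates_next_api-fates_sci.1.23.0_api.7.1.0
singularity exec -B ~/cesm_input_data:/inputdata -B ~/output:/output -B ~/scripts:/scripts \
  fates_test.sif bash /scripts/run_single_point_case.sh   # hypothetical case script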

Conda dependency baseos for ctsm-fates in NorESM Galaxy project

The Nordic Earth System Model (NorESM) is integrating FATES and working on making the model available within the Galaxy project to enable web-based, cloud-hosted research: NordicESMhub/galaxy-tools#39. Since the Galaxy project makes use of containers to promote reproducible science and cloud computing, we are collaborating with NorESM to bring containerized hlm-fates to Galaxy. The hope is that this will help further experience with, and adoption of, the FATES model.

Galaxy utilizes conda's package management infrastructure to distribute and maintain containers, specifically bioconda containers. As such, it would be ideal for the dockerfile recipes to use conda-based libraries for the containerized application dependencies.
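
A rough sketch of what a conda-based dependency layer might install; the package names below are conda-forge packages chosen as assumptions and would need to be matched against the host model's actual requirements:

# Install a Fortran/MPI/NetCDF toolchain from conda-forge (assumed package set)
conda create -y -n fates-deps -c conda-forge \
    gfortran_linux-64 openmpi netcdf-fortran cmake make perl
conda activate fates-deps        # assumes conda has been initialized in this shell
nf-config --all                  # confirm the conda-provided NetCDF-Fortran is on the path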

Intermediate docker images for specific HLMs without FATES?

Is there any benefit to separating out the HLM Dockerfile build from the FATES configuration? At first I thought this didn't make much sense, since the last step for integrating FATES is so minor. That said, perhaps we could do this to create specific HLM-only images with descriptive tags that correlate to a particular HLM version. This separation of steps might simplify the tagging structure, given that one commit of fates_next_api can typically suffice for multiple commits of FATES.
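
A sketch of the two-step build and tagging this could enable; the image names, tags, and folder paths are hypothetical, and the FATES layer's Dockerfile would select its base via a build argument:

# Step 1: HLM-only image, tagged by host land model version (hypothetical tag)
docker build -t ngeetropics/ctsm-base:ctsm5.0.x docker/clm/base

# Step 2: thin FATES layer on top, selecting the base via a build argument
# (the Dockerfile would use "ARG HLM_BASE" followed by "FROM ${HLM_BASE}")
docker build --build-arg HLM_BASE=ngeetropics/ctsm-base:ctsm5.0.x \
             -t ngeetropics/ctsm-fates:sci.1.23.0_api.7.1.0 docker/clm/fates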

Missing driver data for e3sm

@serbinsh ran into this attempting to run the elmtest container on modex (from slack conversation on 31 Aug 2020):

Loading input file list: 'Buildconf/datm.input_data_list'
  Model datm missing file file1 = '/home/elmuser/data/atm/datm7/atm_forcing.datm7.Qian.T62.c080727/Solar6Hrly/clmforc.Qian.c2006.T62.Solr.1996-01.nc'
Trying to download file: 'atm/datm7/atm_forcing.datm7.Qian.T62.c080727/Solar6Hrly/clmforc.Qian.c2006.T62.Solr.1996-01.nc' to path '/home/elmuser/data/atm/datm7/atm_forcing.datm7.Qian.T62.c080727/Solar6Hrly/clmforc.Qian.c2006.T62.Solr.1996-01.nc' using WGET protocol.
wget failed with output:  and errput --2020-08-31 13:35:47--  https://web.lcrc.anl.gov/public/e3sm/inputdata/atm/datm7/atm_forcing.datm7.Qian.T62.c080727/Solar6Hrly/clmforc.Qian.c2006.T62.Solr.1996-01.nc
Resolving web.lcrc.anl.gov (web.lcrc.anl.gov)... 140.221.70.30
Connecting to web.lcrc.anl.gov (web.lcrc.anl.gov)|140.221.70.30|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2020-08-31 13:35:47 ERROR 404: Not Found.
  Model datm missing file file2 = '/home/elmuser/data/atm/datm7/atm_forcing.datm7.Qian.T62.c080727/Solar6Hrly/clmforc.Qian.c2006.T62.Solr.1996-02.nc'
Trying to download file: 'atm/datm7/atm_forcing.datm7.Qian.T62.c080727/Solar6Hrly/clmforc.Qian.c2006.T62.Solr.1996-02.nc' to path '/home/elmuser/data/atm/datm7/atm_forcing.datm7.Qian.T62.c080727/Solar6Hrly/clmforc.Qian.c2006.T62.Solr.1996-02.nc' using WGET protocol.
wget failed with output:  and errput --2020-08-31 13:35:47--  https://web.lcrc.anl.gov/public/e3sm/inputdata/atm/datm7/atm_forcing.datm7.Qian.T62.c080727/Solar6Hrly/clmforc.Qian.c2006.T62.Solr.1996-02.nc
Resolving web.lcrc.anl.gov (web.lcrc.anl.gov)... 140.221.70.30
Connecting to web.lcrc.anl.gov (web.lcrc.anl.gov)|140.221.70.30|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2020-08-31 13:35:47 ERROR 404: Not Found.

The temporary workaround was to download the files directly via the CLM svn repo. Is this perhaps due to the compset being retired in elm?
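
For reference, that workaround amounts to fetching the missing files by hand into the container's data path; a sketch, where the server URL is assumed to be the standard CESM SVN input data repository and should be verified before use:

# Hand-download one of the missing Qian forcing files into the expected data path
mkdir -p /home/elmuser/data/atm/datm7/atm_forcing.datm7.Qian.T62.c080727/Solar6Hrly
cd /home/elmuser/data/atm/datm7/atm_forcing.datm7.Qian.T62.c080727/Solar6Hrly
wget --no-check-certificate \
  https://svn-ccsm-inputdata.cgd.ucar.edu/trunk/inputdata/atm/datm7/atm_forcing.datm7.Qian.T62.c080727/Solar6Hrly/clmforc.Qian.c2006.T62.Solr.1996-01.nc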

dockerhub: elm_fates build fail due to invalid ssh setup

Build fail log: https://hub.docker.com/repository/registry-1.docker.io/ngeetropics/elmtest/builds/1a615460-71c4-41c5-8b19-13a46100fbf2

It appears that the automated build is failing because the elm_fates dockerfile was set up for local builds using the DOCKER_BUILDKIT=1 experimental feature, which allows the ssh key mount type:

Encountered error: 400 Client Error: Bad Request ("Dockerfile parse error line 54: Unknown flag: mount")

This was set up to enable build secrets (particularly for handling personal SSH keys on the build machine). Reference information is here. It is possible to use advanced options for autobuild (and autotest) to allow scripts to run during the build process that set the necessary variables. An example script using DOCKER_BUILDKIT=1, gleaned from a Google search, is shown here.

That said, is this strictly necessary? What happens if we drop the usage of that option? It'd be nice to hang on to it so that a single dockerfile works for local builds and repo autobuilds for the time being.
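
For context, the local build invocation that the --mount flag was written for looks roughly like this (a sketch; the build context path docker/elm and the key file are assumptions, and the Dockerfile needs the experimental syntax header for --mount=type=ssh to parse):

# Load the SSH key into an agent so BuildKit can forward it to "RUN --mount=type=ssh ..." steps
eval "$(ssh-agent)" && ssh-add ~/.ssh/id_rsa
DOCKER_BUILDKIT=1 docker build --ssh default -t ngeetropics/elmtest:latest docker/elm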

Move useradd command to fates-specific dockerfiles?

While migrating the baseos files over to the ngeet repo, I noted that the libraries and packages install to /usr/local/ and that the user name isn't strictly necessary for the baseos build. As such, perhaps we should move the useradd command to the host-land-model-specific builds of the fates image?

CTSM buildexe failure: `-lnetcdf` and `-lnetcdff` not found?

Both @serbinsh and I are seeing this in different versions of the ctsm-fates builds:

mpif90  -o /home/fatesuser/output/no-user-test-asroot.fates.docker.Cabcd593-F3248e63/bld/cesm.exe cime_comp_mod.o cime_driver.o component_mod.o component_type_mod.o cplcomp_exchange_mod.o map_glc2lnd_mod.o map_lnd2glc_mod.o map_lnd2rof_irrig_mod.o mrg_mod.o prep_aoflux_mod.o prep_atm_mod.o prep_glc_mod.o prep_ice_mod.o prep_lnd_mod.o prep_ocn_mod.o prep_rof_mod.o prep_wav_mod.o seq_diag_mct.o seq_domain_mct.o seq_flux_mct.o seq_frac_mct.o seq_hist_mod.o seq_io_mod.o seq_map_mod.o seq_map_type_mod.o seq_rest_mod.o t_driver_timers_mod.o  -L/home/fatesuser/output/no-user-test-asroot.fates.docker.Cabcd593-F3248e63/bld/lib/ -latm  -L/home/fatesuser/output/no-user-test-asroot.fates.docker.Cabcd593-F3248e63/bld/lib/ -lice  -L../../gnu/openmpi/nodebug/nothreads/mct/noesmf/lib/ -lclm  -L/home/fatesuser/output/no-user-test-asroot.fates.docker.Cabcd593-F3248e63/bld/lib/ -locn  -L/home/fatesuser/output/no-user-test-asroot.fates.docker.Cabcd593-F3248e63/bld/lib/ -lrof  -L/home/fatesuser/output/no-user-test-asroot.fates.docker.Cabcd593-F3248e63/bld/lib/ -lglc  -L/home/fatesuser/output/no-user-test-asroot.fates.docker.Cabcd593-F3248e63/bld/lib/ -lwav  -L/home/fatesuser/output/no-user-test-asroot.fates.docker.Cabcd593-F3248e63/bld/lib/ -lesp -L../../gnu/openmpi/nodebug/nothreads/mct/noesmf/c1a1l1i1o1r1g1w1e1/lib -lcsm_share -L../../gnu/openmpi/nodebug/nothreads/lib -lpio -lgptl -lmct -lmpeu   -L/lib/ -lnetcdff -lnetcdf -lcurl -llapack -lblas
/usr/bin/ld: cannot find -lnetcdff
/usr/bin/ld: cannot find -lnetcdf
collect2: error: ld returned 1 exit status
/home/fatesuser/output/no-user-test-asroot.fates.docker.Cabcd593-F3248e63/Tools/Makefile:874: recipe for target '/home/fatesuser/output/no-user-test-asroot.fates.docker.Cabcd593-F3248e63/bld/cesm.exe' failed
make: *** [/home/fatesuser/output/no-user-test-asroot.fates.docker.Cabcd593-F3248e63/bld/cesm.exe] Error 1

My particular image is a ctsm-fates-gcc650 build with cime5.6.28. Here's the Makefile line: https://github.com/ESMCI/cime/blob/fe16302fc332a02427a9e41a8efe959f2fe8c953/scripts/Tools/Makefile#L873-L874

The gcc650 baseos build hasn't changed and, if I recall correctly, the ctsm test repo ran successfully in the past using that same baseos, so I'm not sure what's going on here. The LD_LIBRARY_PATH includes the paths to the combined C and Fortran netcdf libraries, so the baseos seems to be fine.
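
A few checks inside the container that might help narrow this down (a sketch; these only report where the NetCDF libraries actually live, and the suspicion about the NETCDF path variable is an assumption, not confirmed):

# Report where the C and Fortran NetCDF libraries are installed in the image
nc-config --libdir
nf-config --flibs
find /usr -name "libnetcdf*" 2>/dev/null

# Compare against the link line above, which passes "-L/lib/"; that suggests the
# NETCDF path variable in the CIME machine/compiler config may be resolving to "/"
echo $LD_LIBRARY_PATH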

Develop site-scale FATES inputs/drivers for containerized runs

On a recent modeling team call we discussed the interest in the containerized FATES architecture, but noted that the current limiting factor for many users is that it still requires access to the met forcing, domain, surface, and ancillary files for the runs. These can be very large files with limited access.

Instead, we should develop or expand the existing scripts that extract single-point versions from these gridded files. That way we could provide site-level data packages that can be used with container runs. For example, I modified a script that has been floating around to extract met forcing data for a single X-Y location (see file below). We could expand tools like this to extract the data for all ancillary files while maintaining the folder structure, making it easy for a user to download the data package, extract it, and point the container run to this location as the cesm_input_data location.

create_GSWP3.0.5d.v1_single_point_forcing_data.py.txt

config_fates.cfg.txt

Running a case script like the one below with the container would then allow the user to use that input data locally on their machine.

create_ctsm-fates_1pt_case_custom_site.sh.txt
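
As a rough illustration of the kind of extraction the attached scripts perform, a single grid cell can be pulled out of a gridded forcing file with NCO; the file name and lat/lon indices below are hypothetical, and the attached scripts are what actually handle the full file set and folder structure:

# Extract one grid cell (hypothetical indices) from a gridded forcing file into a single-point copy
ncks -d lat,142 -d lon,205 clmforc.GSWP3.c2011.0.5x0.5.Prec.2000-01.nc \
     single_point/clmforc.GSWP3.c2011.0.5x0.5.Prec.2000-01.nc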
