
input4mips_cvs's Issues

Automate scrape from ESGF

At the moment, we manually scrape/poll the ESGF to create our esgf.json file. It would be better if this process were automated. That should be possible. The only issue right now is that the API which is hit is not public, so we'd have to work out a workaround. Two possible options: a) whitelist some known GitHub CI API (feels wrong to me though, mainly for security reasons) b) run the script on nimbus, and automatically create PRs into this repository from nimbus using the GitHub CLI (this feels like the right solution to me, and would also be pretty easy I think given we already use the GitHub CLI in our GitHub actions).
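The nimbus + GitHub CLI route (option b) could be sketched roughly as below. The scrape script name, branch naming, and PR text are all assumptions, not the real setup, and a dry-run flag keeps the sketch side-effect free:

```python
import datetime
import subprocess

# Hypothetical sketch of option (b): run the scrape on nimbus, commit the
# refreshed esgf.json, and open a PR with the GitHub CLI.
def open_scrape_pr(dry_run: bool = True) -> str:
    branch = "esgf-scrape-" + datetime.datetime.now(
        datetime.timezone.utc
    ).strftime("%Y%m%d")
    commands = [
        # ["python", "scrape_esgf.py", "--out", "esgf.json"],  # hypothetical scrape step
        ["git", "checkout", "-b", branch],
        ["git", "add", "esgf.json"],
        ["git", "commit", "-m", f"Automated ESGF scrape ({branch})"],
        ["git", "push", "--set-upstream", "origin", branch],
        ["gh", "pr", "create", "--title", "Automated ESGF scrape",
         "--body", "Refreshed esgf.json via scheduled scrape on nimbus"],
    ]
    for command in commands:
        if dry_run:
            print(" ".join(command))  # show what would run
        else:
            subprocess.run(command, check=True)
    return branch
```

Since the repo's GitHub Actions already use the GitHub CLI, the `gh pr create` step should need no extra tooling on nimbus beyond an authenticated `gh`.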

Registering CMIP7 prelim solar historical forcing

Datasets have the following filenames:
solarforcing-ref-day_input4MIPs_solar_CMIP_SOLARIS-HEPPA-4-1_gn_18500101-20231231.nc
solarforcing-ref-mon_input4MIPs_solar_CMIP_SOLARIS-HEPPA-4-1_gn_18500101-20231231.nc
solarforcing-picontrol-fx_input4MIPs_solar_CMIP_SOLARIS-HEPPA-4-1_gn_18500101-18730128.nc
There will also be an alternative SSI dataset which we shall call
solarforcing-alternative-ssi-day_input4MIPs_solar_CMIP_SOLARIS-HEPPA-4-1_gn_18500101-20231231.nc

Current global attributes (for solarforcing-ref-day_input4MIPs_solar_CMIP_SOLARIS-HEPPA-4-1_gn_18500101-20231231.nc):

:title = "CMIP7 solar forcing historic (1850-2023)";
:institution_id = "SOLARIS-HEPPA";
:institution = "APARC SOLARIS-HEPPA";
:activity_id = "input4MIPs";
:comment = "SSI data are taken from NRL v03r00_preliminary. Sub-annual variability has been added for the period before 1874; m. TSI in this file is from source data as integral over SSI between 0 and 10,000nm";
:time_coverage_start = "1850-01-01";
:time_coverage_end = "2023-12-31";
:frequency = "day";
:source = "nrlssi_v03r00_preliminary (Odele Coddington et al., pers. comm.); Ap, Kp, F10.7 from ftp.ngdc.noaa.gov until 2014, afterwards from GFZ Potsdam (https://kp.gfz-potsdam.de), P-IPR from SEP-II (Ilya Usoskin et al., pers. comm.), MEE-IPR from APEEP apeep_v2024b_cmip7 (Max van de Kamp et al., pers. comm.), GCR-IPR from CRII v2024-02 (Ilya Usoskin et al., pers. comm.)";
:source_id = "SOLARIS-HEPPA-CMIP-4-1";
:realm = "atmos";
:further_info_url = "http://solarisheppa.geomar.de/cmip7";
:metadata_url = "see http://solarisheppa.geomar.de/solarisheppa/sites/default/files/data/cmip7/CMIP7_metadata_description_4.1.pdf";
:contributor_name = "Bernd Funke, Timo Asikainen, Stefan Bender, Thierry Dudok de Wit, Illaria Ermolli, Margit Haberreiter, Doug Kinnison, Sergey Koldoboskiy, Daniel R. Marsh, Hilde Nesse, Annika Seppaelae, Miriam Sinnhuber, Ilya Usoskin, Max van de Kamp, Pekka T. Verronen";
:references = "Funke et al., Geosci. Model Dev., 17, 1217–1227, https://doi.org/10.5194/gmd-17-1217-2024, 2024";
:contact = "[email protected]";
:dataset_category = "solar";
:dataset_version_number = "4.1";
:grid_label = "gn";
:mip_era = "CMIP7";
:target_mip = "CMIP";
:variable_id = "multiple";
:license = "Solar forcing data produced by SOLARIS-HEPPA is licensed under a Creative Commons Attribution "Share Alike" 4.0 International License (http://creativecommons.org/licenses/by/4.0/). The data producers and data providers make no warranty, either expressed or implied, including but not limited to, warranties of merchantability and fitness for a particular purpose. All liabilities arising from the supply of the information (including any liability arising in negligence) are excluded to the fullest extent permitted by law.";
:Conventions = "CF-1.6";
:creation_date = "2024-05-06T14:57:54Z";

Register GHG source_id and institution_id

@durack1 moving discussion from #12 here.

Summary of @durack1's comment here:

        "PCMDI-AMIP-1-1-9":{
            "calendar":"gregorian",
            "comment":"Based on Hurrell SST/sea ice consistency criteria applied to merged HadISST (1870-01 to 1981-10) & NCEP-0I2 (1981-11 to 2022-12)",
            "contact":"PCMDI ([email protected])",
            "dataset_category":"SSTsAndSeaIce",
            "further_info_url":"https://pcmdi.llnl.gov/mips/amip",
            "grid":"1x1 degree longitude x latitude",
            "grid_label":"gn",
            "institution":"Program for Climate Model Diagnosis and Intercomparison, Lawrence Livermore National Laboratory, Livermore, CA 94550, USA",
            "institution_id":"PCMDI",
            "license":"AMIP boundary condition data produced by PCMDI is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0; https://creativecommons.org/licenses/by/4.0). Consult https://pcmdi.llnl.gov/CMIP6/TermsOfUse for terms of use governing input4MIPs output, including citation requirements and proper acknowledgment. Further information about this data, including some limitations, can be found via the further_info_url (recorded as a global attribute in this file). The data producers and data providers make no warranty, either express or implied, including, but not limited to, warranties of merchantability and fitness for a particular purpose. All liabilities arising from the supply of the information (including any liability arising in negligence) are excluded to the fullest extent permitted by law",
            "mip_era":"CMIP6Plus",
            "nominal_resolution":"1x1 degree",
            "product":"observations",
            "references":"Taylor, K.E., D. Williamson and F. Zwiers, 2000: The sea surface temperature and sea ice concentration boundary conditions for AMIP II simulations. PCMDI Report 60, Program for Climate Model Diagnosis and Intercomparison, Lawrence Livermore National Laboratory, 25 pp. Available online: https://pcmdi.llnl.gov/report/pdf/60.pdf",
            "region":[
                "global_ocean"
            ],
            "release_year":"2023",
            "source":"PCMDI-AMIP 1.1.9: Merged SST based on UK MetOffice HadISST and NCEP OI2",
            "source_description":"Sea surface temperature and sea-ice datasets produced by PCMDI (LLNL) for the AMIP (DECK) experiment of CMIP6Plus",
            "source_id":"PCMDI-AMIP-1-1-9",
            "source_type":"satellite_blended",
            "source_variables":[
                "areacello",
                "sftof",
                "siconc",
                "siconcbcs",
                "tos",
                "tosbcs"
            ],
            "source_version":"1.1.9",
            "target_mip":"CMIP",
            "title":"PCMDI-AMIP 1.1.9 dataset prepared for input4MIPs"
        },

Summary of my reply here:

  • I think I understand
  • Does it make sense to remove some of the data file specific keys (e.g. grid) to avoid having to make new source IDs for each data file? That would mean the source ID info could be applied to all data files in a data set, with some data file level metadata being captured elsewhere (probably in the file I guess)

Discussion can continue below.

update SST datasets

..PCMDIobs/obs4MIPs_input/MOHC
HadISST1-1 (high, greens function - here and other sources) - updated to 2023-04-12
..PCMDIobs/obs4MIPs_input/NOAA-PSL
COBE1 (high) - 2023-06-01
ERSST5 (low) - 2023-10-01
OISST2-1 (low) - 2023-10-01
..PCMDIobs/obs4MIPs_input/RSS
RSS-MW5-1
RSS-MW-IR5-1

Capture version deprecation information in CVs

We have had a single case of a retracted dataset: SOLARIS-HEPPA-CMIP-4-2, which was promptly replaced by SOLARIS-HEPPA-CMIP-4-3. An associated issue notes (29 Jul 2024) that the team “encountered an issue with the proton ionization data in v4.2”, but this information is not prominently available.

It would be ideal to capture a meaningful description of the problem and how it was solved, so that a modeling group can ascertain whether they need to act on the data correction or can proceed with their existing version. For this v4.2 -> v4.3 dataset update, presumably most modeling groups would be concerned about wrong data (@vnaik60, do you use the proton ionization data for simulations?). In the case of a mere metadata inconsistency, by contrast, such a problem would be unlikely to require a data switchout.

ping @znichollscr @vnaik60

Test stratospheric aerosol file (extinction) being uploaded on input4mips FTP

Tagging @durack1 @znichollscr

Strat aerosol test file uploaded for checking

I'm uploading one of my test files to the input4mips FTP for Paul to check, as instructed by Zeb and following the instructions at https://input4mips-validation.readthedocs.io/en/latest/how-to-guides/how-to-upload-to-ftp/. The dry run went well and the file is currently uploading. Let me know if there are any issues.

Outstanding questions

The two main outstanding questions for my datasets are:

  1. Do you want one file per variable? This feels a bit messy, but I'm happy to cater to your preference! I have 8 variables in the aerosol optical property dataset; for emissions it depends.

  2. Do you want the emission dataset provided as a gridded (time, lon, lat, height) flux dataset rather than as a list of eruptions with emission parameters? There is a limited number of eruptions, so I'm unsure whether this makes sense/how the few modelling groups modelling this would prefer the data (I can poll them! With UKESM we work from an eruption list). One other concern is that the core data is a mass of SO2 for each eruption. If I grid that as a flux and people regrid it to their model grid (lat/lon, height; not sure whether there would be a time concern), they should try to conserve the mass for each eruption. But that information would be a lot harder to track from a gridded flux file than from an eruption list.
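The mass-conservation concern in point 2 can be made concrete with a small sketch (function names and values are illustrative, not from the actual dataset): converting an eruption's total SO2 mass to a constant flux over a cell and time window, then re-integrating, should return exactly the original mass; any regridding should pass the same round-trip check per eruption.

```python
# Illustrative only: one grid cell, one time window, constant flux.
def mass_to_flux(mass_kg: float, cell_area_m2: float, duration_s: float) -> float:
    """Constant SO2 flux (kg m-2 s-1) that conserves the erupted mass."""
    return mass_kg / (cell_area_m2 * duration_s)

def flux_to_mass(flux: float, cell_area_m2: float, duration_s: float) -> float:
    """Re-integrate the flux; regridding should preserve this total."""
    return flux * cell_area_m2 * duration_s
```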

Registering contributing institution_id's

For CMIP6, we had the following institution_id entries registered - input4MIPs_institution_id.json:

{
    "institution_id":{
        "CCCma":"Canadian Centre for Climate Modelling and Analysis, Victoria, BC V8P 5C2, Canada",
        "CNRM-Cerfacs":"CNRM (Centre National de Recherches Meteorologiques, Toulouse 31057, France), CERFACS (Centre Europeen de Recherche et de Formation Avancee en Calcul Scientifique, Toulouse 31100, France)",
        "IACETH":"Institute for Atmosphere and Climate, ETH Zurich, Zurich 8092, Switzerland",
        "IAMC":"Integrated Assessment Modeling Consortium (see www.globalchange.umd.edu/iamc/membership for complete membership). Mailing address: International Institute for Applied Systems Analysis (IIASA), Schlossplatz 1, A-2361 Laxenburg, Austria",
        "ImperialCollege":"Imperial College London, South Kensington Campus, London SW7 2AZ, UK",
        "MOHC":"Met Office Hadley Centre, Fitzroy Road, Exeter, Devon, EX1 3PB, UK",
        "MPI-B":"Max Planck Institute for Biogeochemistry, Jena 07745, Germany",
        "MPI-M":"Max Planck Institute for Meteorology, Hamburg 20146, Germany",
        "MRI":"Meteorological Research Institute, Tsukuba, Ibaraki 305-0052, Japan",
        "NASA-GSFC":"NASA Goddard Space Flight Center, Greenbelt, MD 20771, USA",
        "NCAR":"National Center for Atmospheric Research, Boulder, CO 80307, USA",
        "NCAS":"National Centre for Atmospheric Science, University of Reading, Reading RG6 6BB, UK",
        "PCMDI":"Program for Climate Model Diagnosis and Intercomparison, Lawrence Livermore National Laboratory, Livermore, CA 94550, USA",
        "PNNL-JGCRI":"Pacific Northwest National Laboratory - Joint Global Change Research Institute, College Park, MD 20740, USA",
        "SOLARIS-HEPPA":"SOLARIS-HEPPA, GEOMAR Helmholtz Centre for Ocean Research, Kiel 24105, Germany",
        "UCI":"Department of Earth System Science, University of California Irvine, Irvine, CA 92697, USA",
        "UColorado":"University of Colorado, Boulder, CO 80309, USA",
        "UReading":"University of Reading, Reading RG6 6UA, UK",
        "UoM":"Australian-German Climate & Energy College, The University of Melbourne (UoM), Parkville, Victoria 3010, Australia",
        "UofMD":"University of Maryland (UofMD), College Park, MD 20742, USA",
        "VUA":"Vrije Universiteit Amsterdam, De Boelelaan 1105, 1081 HV Amsterdam, Netherlands"
    }
}

Of these, CNRM-Cerfacs, IACETH, IAMC, ImperialCollege, and MOHC are not registered in PCMDI/mip-cmor-tables/MIP_institutions.json. We'll also need to register a new institution, CR (@znichollscr).

@wolfiex how would you recommend we proceed, and should I open an issue in PCMDI/mip-cmor-tables?

Register biomass burning source_id and institution_id

@mjevanmarle just creating an issue as a placeholder for discussions in finalizing the registration of the biomass burning institution_id and source_id.

Note the CMIP6 contribution had institution_id: VUA (here) and a couple of versioned releases, so source_id entries: VUA-CMIP-BB4CMIP6-1-0, 1-1 and 1-2 (here). I wonder if we want to maintain any consistency with the previous version, or just start again? I see that you've used the previous template so VUA-CMIP-BB4CMIP6-1-0 becomes DRES-CMIP-BB4CMIP7-1-0.

Just a note, while we start to coordinate the collation of these prototype (v0) datasets, and gather feedback, we're aiming to catch these in the "CMIP6Plus" project, in preparation for CMIP7 in a couple of years. This will allow a clean split between the CMIP7 "endorsed" forcing collection, and those that we are working out the kinks on (caught in CMIP6Plus).

We have updated the institution registration a little moving beyond CMIP6; these now depend on the RoR registry (see here), and, as an example, Deltares is already registered - https://ror.org/01deh9c76

@wolfiex @matthew-mizielinski @taylor13 @vnaik60 @znichollscr ping

Clarifying rules around controlled vocabularies

In short, it's not clear to me what the rules for the controlled vocabularies are. For example, which fields are compulsory, which can be inferred from data, which are only required for ESGF (i.e. are things that data providers shouldn't worry about, but tooling after submission does need to handle).

To get the conversation started, I've made a google doc here: https://docs.google.com/document/d/1oLK4mWW6TX2YPrhGoLdMLcrK7vX1flU3BjiheTb2Hwk/edit?usp=sharing

Once we've got something a bit more concrete, I will pull everything back into issues that we can track across the various repositories.

Registering institutions

Can be closed as this is a duplicate of #8

There doesn't appear to be an obvious place to register institutions. In the source IDs they are a field, but there is no *institution_id*.json file in this repo.

Are they instead meant to be registered in https://github.com/PCMDI/mip-cmor-tables/blob/main/MIP_institutions.json or https://github.com/PCMDI/input4MIPs-cmor-tables/blob/master/input4MIPs_institution_id.json#L23?

@durack1 have I understood correctly or am I missing something?

Add source_id/overview view for HTML pages

We currently have two very granular views of the data, a dataset view, with 270 current entries for 3 source_id's registered, and a files view, with 1253 current entries for 4 source_id's registered. These are both dense, and having a higher level source_id view would be useful, so the information we are manually collating at https://wcrp-cmip.org/cmip7-task-teams/forcings/#forcing_datasets_availability could be viewed on a dynamic page in this repo.

This will likely require a get_source_id_view function being added to the html_generation.py file.
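A hypothetical shape for such a helper (the real html_generation.py interface may well differ; the record keys used here are assumptions): collapse the per-dataset records into one summary row per source_id, ready to render as a table.

```python
from collections import defaultdict

def get_source_id_view(dataset_records):
    """Group dataset-level records into one summary entry per source_id.

    Hypothetical sketch: each record is assumed to be a dict carrying at
    least "source_id" and "mip_era" keys.
    """
    view = defaultdict(lambda: {"n_datasets": 0, "mip_eras": set()})
    for record in dataset_records:
        summary = view[record["source_id"]]
        summary["n_datasets"] += 1
        summary["mip_eras"].add(record["mip_era"])
    return dict(view)
```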

Add links to CMOR definitions

The CMOR definitions are very opaque to people who don't work with this stuff a lot. For example, what the grid labels actually mean, which is actually spelt out pretty nicely here: https://github.com/PCMDI/mip-cmor-tables/blob/main/MIP_grid_label.json

To close this issue:

  • add a section to the README which collates these key links
  • point to that section in the README anywhere we think is sensible

The long-term solution would be to link docs on in the CMOR/MIP repositories where this is captured, but I don't think they're stable/available yet (@durack1 please correct me if I'm wrong).

x-ref #55 (comment)

Add workaround for automated comment generation when there are lots of changes

When there are lots of changes, the automated comment from the bot can fail because the comment can be too big. For example, here: https://github.com/PCMDI/input4MIPs_CVs/actions/runs/10431412061/job/28891047381?pr=101

To close:

  • Make the automated comment generation smarter. It appears that the comments have to be capped at 65536 characters. If the default comment would be longer than this, we should probably post some sort of summary instead.
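A minimal sketch of the capping logic, assuming the bot's comment is built as a single string (the function name and truncation notice are made up):

```python
GITHUB_COMMENT_LIMIT = 65536  # observed cap on GitHub comment bodies

def cap_comment(full_comment: str, limit: int = GITHUB_COMMENT_LIMIT) -> str:
    """Return the comment unchanged if it fits, else truncate with a notice."""
    if len(full_comment) <= limit:
        return full_comment
    notice = ("\n\n*Summary truncated: the full diff exceeds GitHub's "
              "comment size limit; see the CI logs for details.*")
    return full_comment[: limit - len(notice)] + notice
```

A smarter version could post a per-file count summary instead of a truncated diff, but the cap above is the minimum needed to stop the bot failing outright.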

Register volcanic forcing source_id and institution_id

@thomasaubry just creating an issue as a placeholder for discussions in finalizing the registration of the volcanic forcing institution_id and source_id.

Note the CMIP6 contribution had institution_id: IACETH (here) and a couple of versioned releases, so source_id entries: IACETH-SAGE3lambda-2-1-0 and 3-0-0 (here).

We have updated the institution registration a little moving beyond CMIP6, these now depend on the RoR registry (see here), and, as an example UExeter is already registered - https://ror.org/03yghzc09

@wolfiex @matthew-mizielinski @taylor13 @vnaik60 @znichollscr ping

target_mip "long name"

I wonder if, in the target_mip CV, "long_name" should be replaced by "full_name" or something else. In the CF conventions the long_name attribute is attached to a variable, not used as an expansion of an abbreviation. This could lead to some confusion (perhaps not too much, admittedly, but I think it's best not to mix descriptor names for different purposes).

Connecting to the data citation service

From @MartinaSt

From: Martina Stockhause
Date: Friday, May 19, 2023 at 2:52 AM
To: Durack, Paul J.
Subject: CMIP6Plus - input4MIPs
Hi Paul,
I have thought about CMIP6Plus and input4MIPs a bit more:

  1. DOI granularity/-ies: For CMIP6-input4MIPs DOIs on two granularities were registered. The easiest is to do it for CMIP6Plus in the same way and have two new DOIs for your AMIP-1-1-9 data. However, if the CMIP6Plus period is to be short, it might be better to have only a DOI on the finer granularity (with the possibility to relate it to the coarser one for CMIP6). I guess an example makes these options clearer:
  • 2 new DOIs on data collections 'input4MIPs.CMIP6Plus.CMIP.PCMDI' and 'input4MIPs.CMIP6Plus.CMIP.PCMDI.PCMDI-AMIP-1-1-9'
  • 1 new DOI on data collection 'input4MIPs.CMIP6Plus.CMIP.PCMDI.PCMDI-AMIP-1-1-9' with the possibility to relate it to 'input4MIPs.CMIP6.CMIP.PCMDI'
  2. Data Access Links: I will use links into the CoG and change them together with all the others for CMIP6 when ESGF moves to MetaGrid.

  3. CV/citation service workflow: Currently, the ESGF publication of the input4MIPs data comes first, then the insert into the citation DB and the DOI publication. In some cases, this leads to DOI registrations before the citation manager can adjust the citation metadata. I don't want to change this for a rather short period of CMIP6Plus but would rather live with the inconvenience: write an email to me in case this has happened.

  4. Errata/Versions: Yes, that is something we should improve. There will be a delay for displaying this information on the DOI landing pages.

Which option under 1. would you prefer?
Best wishes,
Martina

SOLARIS-HEPPA-4-3

Issue for tracking publication of the SOLARIS-HEPPA-4-3 dataset.

At present, the published files will have filenames according to the DRS. This means they will differ from the original filenames as published by the SOLARIS team. This isn't ideal, but in the interests of getting data out (particularly while #64 is ongoing), we will go with this and see what feedback we get from modelling teams, then improve in the next round.

cc @durack1 @st-bender @berndfunke

Deploying website for each version

If you do builds of your website with e.g. ReadTheDocs, you can make it possible for users to look at the docs at different versions of your package/deployment. For example, with input4MIPs validation, users can look at:

  • latest i.e. the latest commit in main
  • stable i.e. the latest tagged commit
  • v0.10.2, a specific tag
  • v0.11.3, a different specific tag

etc.

Unfortunately, with GitHub pages, this doesn't appear to be possible. GitHub pages just always serves the data in the gh-pages branch.

It would add more complexity (but not much, this is how we do all our docs at CR, including input4mips validation), but would this extra feature be useful? It would mean that people could easily see the state of the database/CVs at an earlier point in time, should they ever need to look at how things were at a specific commit rather than always just seeing the latest commit in main on the deployed docs site.

@durack1 interested in your thoughts here

CR-CMIP-0-3-0

@durack1 the files for CR-CMIP-0-3-0 are in /incoming/cr-cmip-0-3-0 on the FTP server and can be queued for publication 🚀

Solar data tiny tweak - targeting SOLARIS-HEPPA-CMIP-4-4

@st-bender @berndfunke 1000 apologies for this, I should have caught it earlier. For your next version (whenever that is), there is one other tiny tweak to your solar data. Can you please remove the datetime_end and datetime_start global attributes from all your files? They're not written correctly at the moment (the format is meant to depend on the file's frequency, a bit of a headache), and we can infer them from the time axis in your data anyway, so dropping them spares you the details of the frequency-dependent formatting.

Apologies again, I know we asked you to put that there in the first place. For the 4.3 data, it doesn't matter, just leave it as is.

One more thing for the list: we should double check the creation_date attribute. This should be picked up by the new version of input4mips-validation, but just in case, here is a reminder. If the solar team want to write this themselves, it must be the timestamp when the file was created (in the UTC timezone), in the format "YYYY-mm-ddTHH:MM:SSZ" (the ISO8601 standard).
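For anyone writing creation_date themselves, a minimal sketch of producing the required format:

```python
from datetime import datetime, timezone

def make_creation_date() -> str:
    """Current UTC time in the format required for the creation_date
    global attribute: "YYYY-mm-ddTHH:MM:SSZ" (ISO 8601, UTC)."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```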

Thanks and sorry again!

cc @durack1

Tweaks to source_id and data registration formats

Working with @znichollscr, we plan to implement a number of updates to streamline the capture of information at the dataset level (defined by a source_id registration) and the file level (contained as global attributes/metadata in a single file). I will capture recommended changes below. Lose the existing |dataProviderFile entry altogether.

Changes by entry (lose / convert or move / add):
  • core: lose _status and nominal_resolution; convert to lists: frequency, grid_label, realm, target_mip
  • dataProviderExtra: lose
  • esgfIndex: lose _status

At the file level, the required global attributes are: activity_id, tracking_id, nominal_resolution (remove institution, table_id)

For reference this is how a registration looks currently

"CR-CMIP-0-2-0":{
"_status":"Registered",
"contact":"[email protected]; [email protected]",
"dataset_category":"GHGConcentrations",
"datetime_start":"0001-01-15",
"datetime_stop":"2022-12-15",
"frequency":"mon",
"further_info_url":"https://www.climate-resource.com/",
"grid_label":"gm",
"institution_id":"CR",
"license":"CC BY 4.0",
"mip_era":"CMIP6Plus",
"nominal_resolution":"10000 km",
"realm":"atmos",
"region":[
"global"
],
"source":"Global greenhouse gas concentrations 0001 through 2022 based on NOAA/AGAGE/GAGE data",
"source_id":"CR-CMIP-0-2-0",
"source_version":"0.2.0",
"target_mip":"CMIP",
"title":"Climate Resource CMIP 0.2.0 dataset prepared for input4MIPs",
"|dataProviderExtra":{
"source_variables":""
},
"|dataProviderFile":{
"Conventions":"",
"comment":"[TBC which grant] Data produced by Climate Resource supported by funding from the CMIP IPO (Coupled Model Intercomparison Project International Project Office). This is an interim dataset, not for production use",
"creation_date":"",
"tracking_id":""
},
"|esgfIndex":{
"_timestamp":"",
"data_node":"",
"latest":"",
"replica":"",
"version":"",
"xlink":""
}
},

We will also need to add input4MIPs_regions.json to the repo CV list, following the CF standard region list (here) as converted to JSON in the obs4MIPs-cmor-tables repo (here). As work continues to be finalized upstream in mip-cmor-tables, the reproduction of these CVs in this repo could be removed, deferring instead to the upstream sources as the definitive record.

Fix MRI-JRA55-do-1-6-0 license omission - CC BY 4.0

The CMIP6Plus MRI-JRA55-do-1-6-0 dataset has a CC BY 4.0 license assigned, but this is not in our database - we need to correct that

(xcd061nctax) bash-4.2$ ncdump -h ../input4MIPs/CMIP6Plus/OMIP/MRI/MRI-JRA55-do-1-6-0/atmos/3hr/
prra/gr/v20240531/prra_input4MIPs_atmosphericState_OMIP_MRI-JRA55-do-1-6-0_gr_195801010130-
195812312230.nc | grep license
                :license = "OMIP boundary condition data produced by MRI is licensed under a Creative Commons
Attribution 4.0 International License (CC BY 4.0; https://creativecommons.org/licenses/by/4.0). Consult
https://pcmdi.llnl.gov/CMIP6/TermsOfUse for terms of use governing input4MIPs output, including citation
requirements and proper acknowledgment. Further information about this data, including some limitations,
can be found via the further_info_url (recorded as a global attribute in this file). The data producers and
data providers make no warranty, either express or implied, including, but not limited to, warranties of
merchantability and fitness for a particular purpose. All liabilities arising from the supply of the
information (including any liability arising in negligence) are excluded to the fullest extent permitted
by law" ;

HTML page links need update in the README

The previous path

https://pcmdi.github.io/input4MIPs_CVs/docs/input4MIPs_files_CMIP6Plus.html

has been augmented with an additional subdir

https://pcmdi.github.io/input4MIPs_CVs/docs/database-views/input4MIPs_files_CMIP6Plus.html

Create DRS creation script

Contributed data most often does not follow the DRS/directory structure required to publish on ESGF.

Generate a file that checks and generates an appropriate directory structure and copies data into this format, see below

"DRS":{
"directory_path_example":"input4MIPs/CMIP6Plus/CMIP/PCMDI/PCMDI-AMIP-1-1-9/ocean/mon/tos/gn/v20230512/",
"directory_path_template":"<activity_id>/<mip_era>/<target_mip>/<institution_id>/<source_id>/<realm>/<frequency>/<variable_id>/<grid_label>/<version>",
"filename_example":"tos_input4MIPs_SSTsAndSeaIce_CMIP_PCMDI-AMIP-1-1-9_gn_187001-202212.nc",
"filename_template":"<variable_id>_<activity_id>_<dataset_category>_<target_mip>_<source_id>_<grid_label>[_<time_range>].nc"
}
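Such a script's core templating could be sketched as below, fed by the templates above. Treating the bracketed `[_<time_range>]` segment as optional (dropped when the key is absent) is an assumption about the intended semantics:

```python
import re

def build_drs(template: str, meta: dict) -> str:
    """Fill <key> placeholders from metadata; drop optional [...] segments
    whose keys are missing. Sketch only, not the eventual script."""
    def fill_optional(match):
        inner = match.group(1)
        keys = re.findall(r"<(\w+)>", inner)
        return inner if all(k in meta for k in keys) else ""

    template = re.sub(r"\[([^\]]*)\]", fill_optional, template)
    return re.sub(r"<(\w+)>", lambda m: meta[m.group(1)], template)

# Metadata taken from the PCMDI-AMIP-1-1-9 example above.
meta = {
    "activity_id": "input4MIPs", "mip_era": "CMIP6Plus", "target_mip": "CMIP",
    "institution_id": "PCMDI", "source_id": "PCMDI-AMIP-1-1-9",
    "realm": "ocean", "frequency": "mon", "variable_id": "tos",
    "grid_label": "gn", "version": "v20230512",
    "dataset_category": "SSTsAndSeaIce", "time_range": "187001-202212",
}
dir_template = ("<activity_id>/<mip_era>/<target_mip>/<institution_id>/"
                "<source_id>/<realm>/<frequency>/<variable_id>/"
                "<grid_label>/<version>")
file_template = ("<variable_id>_<activity_id>_<dataset_category>_<target_mip>_"
                 "<source_id>_<grid_label>[_<time_range>].nc")
```

Checking the output against the examples in the registration entry gives a built-in test of the templates.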

Register prototype AMIP datasets

@mzelinka just creating an issue as a placeholder for discussions in finalizing the registration of these prototype AMIP SST datasets. These will all be published against the input4MIPs mip_era CMIP6Plus target_mip CMIP - as per PCMDI-AMIP-1-1-9

source_id             | dataset               | time start    | time end      | time avail.   | source inst
PCMDI-AMIP-OI2p1-1-0  | OISST2.1              | 1981-09-01    | 2023-10-01    | 2024-05-01    | NOAA-PSL
PCMDI-AMIP-Had1p1-1-0 | HadISST v1.1          | 1870-01-16 12 | 2023-09-16 12 | 2024-03-16 12 | MOHC
PCMDI-AMIP-ERSST5-1-0 | ERSST v5              | 1854-01-01    | 2023-10-01    | 2024-05-01    | NOAA-PSL
PCMDI-AMIP-Had2p4-1-0 | HadISST v2.4          | 1850-01-16    | 2023-12-16    |               | MOHC
PCMDI-AMIP-1-1-9      | HadISST v1, OISST 2.0 | 1870-01-01    | 2022-12-01    |               | NOAA-NCEP

@taylor13 ping as we're discussing identifiers

Notes to self:
Status of E3SM sims here
The path to Frankenstein ~/scripts/examine_SST_datasets.ipynb

Adding more notes - RE update SST datasets (#5 dupe)
..PCMDIobs/obs4MIPs_input/MOHC
HadISST1-1 (high, greens function - here and other sources) - updated to 2023-04-12
..PCMDIobs/obs4MIPs_input/NOAA-PSL
COBE1 (high) - 2023-06-01
ERSST5 (low) - 2023-10-01
OISST2-1 (low) - 2023-10-01
..PCMDIobs/obs4MIPs_input/RSS
RSS-MW5-1
RSS-MW-IR5-1

Also adding

CMEMS AVISO - https://data.marine.copernicus.eu/product/SEALEVEL_GLO_PHY_L4_MY_008_047/services (Copernicus)
PO-DACC Aquarius/SMAP https://doi.org/10.5067/SMP20-4U7CS

Create CMOR input file

To write files with CMOR, you need a specific input file. It's basically a CMOR table as far as I understand.

We used to package that in this repo, but it has been removed in the reshuffle.

To close this issue:

  • define the required file
  • work out what is required to make that table
  • add back in generation of a CMOR-compatible file (ideally auto-generated with each PR)
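As a starting point for the last bullet, a heavily hedged sketch of generating a minimal CMOR user-input JSON. The key set shown mirrors CMIP6-era user input files; the exact set required for input4MIPs is precisely what this issue needs to pin down, and the filenames here are assumptions:

```python
import json

# Hypothetical minimal CMOR "user input" JSON, to be auto-generated from the
# CVs in this repo on each PR. Keys and filenames are assumptions.
user_input = {
    "_control_vocabulary_file": "input4MIPs_CV.json",  # assumed CV filename
    "outpath": "output",
    "activity_id": "input4MIPs",
    "source_id": "PCMDI-AMIP-1-1-9",
    "institution_id": "PCMDI",
    "grid_label": "gn",
}

with open("cmor_user_input.json", "w") as handle:
    json.dump(user_input, handle, indent=4)
```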

x-ref: #56 (comment)

Register emissions source ID and institution ID

For the CEDS produced data, we need a source ID and institution ID entry.

Institution ID will be a bit tricky I expect. From what I can tell, the institute is either CEDS or JGCRI. They both appear to be a consortium of PNNL and University of Maryland. Once we have clarity on this, I can help push this upstream to https://github.com/PCMDI/mip-cmor-tables.

In terms of source ID, suggestion below (same idea as #42)

    "CEDS-YYYY-mm-DD":{
        "contact":"[email protected];[email protected];[email protected]",
        "further_info_url":"www.tbd.invalid",
        "institution_id":"CEDS",
        "license_id":"CC BY 4.0",
        "mip_era":"CMIP6Plus",
        "source_version":"YYYY.mm.DD"
    }

Consistent repo versioning

We want to label the version of the repo consistently and clearly, so the following will need to be updated each time a release is minted

  • README.md
  • CITATION.cff
  • All docs/*.html files
  • Repo itself

Note to self, there is some code in 07653ab that was lost but could be recovered which solves the README.md and CITATION.cff updates. The HTML files are currently written using a script, which takes the version as an argument, so that's also close to being implemented
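The README.md/CITATION.cff part could be as simple as a regex substitution run at release time; the pattern below is an assumption about how those files record the version:

```python
import re

def update_version(text: str, new_version: str) -> str:
    """Rewrite semver-style version strings that follow the word "version".

    Sketch only: assumes the files record the version as e.g. "version: 1.2.3".
    """
    return re.sub(
        r"(version[:=]?\s*)v?\d+\.\d+\.\d+",
        lambda m: m.group(1) + new_version,
        text,
        flags=re.IGNORECASE,
    )
```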

Register land use change source_id

We need to register a new source_id for the expected land use change dataset. This will be another contribution from the existing institution_id UofMD.

The recommendation was for UofMD-landState-3-0, following the previous CMIP6 contributions of UofMD-landState-2-1-h, UofMD-landState-high-2-1-h, and UofMD-landState-low-2-1-h.

Will need to circle back with Louise et al. to make sure this and additional registered information can be captured.

Tweaks to sync across contributions

  • pollESGF.py needs to point output from ./ to ../DatasetsDatabase/input-data/
  • catch extra / in HTML links off README.md
  • correct sublist indenting in README.md

Ensuring no information loss

In #39, we shuffled things around a bit. It wasn't clear what the source of truth prior to #39 was meant to be, was it a) input4MIPs_source_id.json or b) src/240701_2137_comp.json.7z? @durack1 I think this is a question for you.

To ensure that there was no loss of new information (see also comment here: #39 (comment)), we created a legacy folder. This has all the information as it was in main prior to merging #39.

The question now is, what is the source of truth? Once we know that, we can make sure that we are capturing it correctly and then delete the legacy folder.
