
input4mips_cvs's Issues

Automate scrape from ESGF

At the moment, we manually scrape/poll the ESGF to create our esgf.json file. It would be better if this process were automated. That should be possible. The only issue right now is that the API which is hit is not public, so we'd have to work out a workaround. Two possible options: a) whitelist some known GitHub CI API (feels wrong to me though, mainly for security reasons) b) run the script on nimbus, and automatically create PRs into this repository from nimbus using the GitHub CLI (this feels like the right solution to me, and would also be pretty easy I think given we already use the GitHub CLI in our GitHub actions).
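The nimbus + GitHub CLI route (option b) could be sketched roughly as below. The scrape script name, branch naming, and PR text are all assumptions, not the real setup, and a dry-run flag keeps the sketch side-effect free:

```python
import datetime
import subprocess

# Hypothetical sketch of option (b): run the scrape on nimbus, commit the
# refreshed esgf.json, and open a PR with the GitHub CLI.
def open_scrape_pr(dry_run: bool = True) -> str:
    branch = "esgf-scrape-" + datetime.datetime.now(
        datetime.timezone.utc
    ).strftime("%Y%m%d")
    commands = [
        # ["python", "scrape_esgf.py", "--out", "esgf.json"],  # hypothetical scrape step
        ["git", "checkout", "-b", branch],
        ["git", "add", "esgf.json"],
        ["git", "commit", "-m", f"Automated ESGF scrape ({branch})"],
        ["git", "push", "--set-upstream", "origin", branch],
        ["gh", "pr", "create", "--title", "Automated ESGF scrape",
         "--body", "Refreshed esgf.json via scheduled scrape on nimbus"],
    ]
    for command in commands:
        if dry_run:
            print(" ".join(command))  # show what would run
        else:
            subprocess.run(command, check=True)
    return branch
```

Since the repo's GitHub Actions already use the GitHub CLI, the `gh pr create` step should need no extra tooling on nimbus beyond an authenticated `gh`.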

Registering CMIP7 prelim solar historical forcing

Datasets have the following filenames:
solarforcing-ref-day_input4MIPs_solar_CMIP_SOLARIS-HEPPA-4-1_gn_18500101-20231231.nc
solarforcing-ref-mon_input4MIPs_solar_CMIP_SOLARIS-HEPPA-4-1_gn_18500101-20231231.nc
solarforcing-picontrol-fx_input4MIPs_solar_CMIP_SOLARIS-HEPPA-4-1_gn_18500101-18730128.nc
There will also be an alternative SSI dataset which we shall call
solarforcing-alternative-ssi-day_input4MIPs_solar_CMIP_SOLARIS-HEPPA-4-1_gn_18500101-20231231.nc

Current global attributes (for solarforcing-ref-day_input4MIPs_solar_CMIP_SOLARIS-HEPPA-4-1_gn_18500101-20231231.nc):

:title = "CMIP7 solar forcing historic (1850-2023)";
:institution_id = "SOLARIS-HEPPA";
:institution = "APARC SOLARIS-HEPPA";
:activity_id = "input4MIPs";
:comment = "SSI data are taken from NRL v03r00_preliminary. Sub-annual variability has been added for the period before 1874; m. TSI in this file is from source data as integral over SSI between 0 and 10,000nm";
:time_coverage_start = "1850-01-01";
:time_coverage_end = "2023-12-31";
:frequency = "day";
:source = "nrlssi_v03r00_preliminary (Odele Coddington et al., pers. comm.); Ap, Kp, F10.7 from ftp.ngdc.noaa.gov until 2014, afterwards from GFZ Potsdam (https://kp.gfz-potsdam.de), P-IPR from SEP-II (Ilya Usoskin et al., pers. comm.), MEE-IPR from APEEP apeep_v2024b_cmip7 (Max van de Kamp et al., pers. comm.), GCR-IPR from CRII v2024-02 (Ilya Usoskin et al., pers. comm.)";
:source_id = "SOLARIS-HEPPA-CMIP-4-1";
:realm = "atmos";
:further_info_url = "http://solarisheppa.geomar.de/cmip7";
:metadata_url = "see http://solarisheppa.geomar.de/solarisheppa/sites/default/files/data/cmip7/CMIP7_metadata_description_4.1.pdf";
:contributor_name = "Bernd Funke, Timo Asikainen, Stefan Bender, Thierry Dudok de Wit, Illaria Ermolli, Margit Haberreiter, Doug Kinnison, Sergey Koldoboskiy, Daniel R. Marsh, Hilde Nesse, Annika Seppaelae, Miriam Sinnhuber, Ilya Usoskin, Max van de Kamp, Pekka T. Verronen";
:references = "Funke et al., Geosci. Model Dev., 17, 1217–1227, https://doi.org/10.5194/gmd-17-1217-2024, 2024";
:contact = "[email protected]";
:dataset_category = "solar";
:dataset_version_number = "4.1";
:grid_label = "gn";
:mip_era = "CMIP7";
:target_mip = "CMIP";
:variable_id = "multiple";
:license = "Solar forcing data produced by SOLARIS-HEPPA is licensed under a Creative Commons Attribution "Share Alike" 4.0 International License (http://creativecommons.org/licenses/by/4.0/). The data producers and data providers make no warranty, either expressed or implied, including but not limited to, warranties of merchantability and fitness for a particular purpose. All liabilities arising from the supply of the information (including any liability arising in negligence) are excluded to the fullest extent permitted by law.";
:Conventions = "CF-1.6";
:creation_date = "2024-05-06T14:57:54Z";

Register GHG source_id and institution_id

@durack1 moving discussion from #12 here.

Summary of @durack1's comment here:

        "PCMDI-AMIP-1-1-9":{
            "calendar":"gregorian",
            "comment":"Based on Hurrell SST/sea ice consistency criteria applied to merged HadISST (1870-01 to 1981-10) & NCEP-0I2 (1981-11 to 2022-12)",
            "contact":"PCMDI ([email protected])",
            "dataset_category":"SSTsAndSeaIce",
            "further_info_url":"https://pcmdi.llnl.gov/mips/amip",
            "grid":"1x1 degree longitude x latitude",
            "grid_label":"gn",
            "institution":"Program for Climate Model Diagnosis and Intercomparison, Lawrence Livermore National Laboratory, Livermore, CA 94550, USA",
            "institution_id":"PCMDI",
            "license":"AMIP boundary condition data produced by PCMDI is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0; https://creativecommons.org/licenses/by/4.0). Consult https://pcmdi.llnl.gov/CMIP6/TermsOfUse for terms of use governing input4MIPs output, including citation requirements and proper acknowledgment. Further information about this data, including some limitations, can be found via the further_info_url (recorded as a global attribute in this file). The data producers and data providers make no warranty, either express or implied, including, but not limited to, warranties of merchantability and fitness for a particular purpose. All liabilities arising from the supply of the information (including any liability arising in negligence) are excluded to the fullest extent permitted by law",
            "mip_era":"CMIP6Plus",
            "nominal_resolution":"1x1 degree",
            "product":"observations",
            "references":"Taylor, K.E., D. Williamson and F. Zwiers, 2000: The sea surface temperature and sea ice concentration boundary conditions for AMIP II simulations. PCMDI Report 60, Program for Climate Model Diagnosis and Intercomparison, Lawrence Livermore National Laboratory, 25 pp. Available online: https://pcmdi.llnl.gov/report/pdf/60.pdf",
            "region":[
                "global_ocean"
            ],
            "release_year":"2023",
            "source":"PCMDI-AMIP 1.1.9: Merged SST based on UK MetOffice HadISST and NCEP OI2",
            "source_description":"Sea surface temperature and sea-ice datasets produced by PCMDI (LLNL) for the AMIP (DECK) experiment of CMIP6Plus",
            "source_id":"PCMDI-AMIP-1-1-9",
            "source_type":"satellite_blended",
            "source_variables":[
                "areacello",
                "sftof",
                "siconc",
                "siconcbcs",
                "tos",
                "tosbcs"
            ],
            "source_version":"1.1.9",
            "target_mip":"CMIP",
            "title":"PCMDI-AMIP 1.1.9 dataset prepared for input4MIPs"
        },

Summary of my reply here:

  • I think I understand
  • Does it make sense to remove some of the data file specific keys (e.g. grid) to avoid having to make new source IDs for each data file? That would mean the source ID info could be applied to all data files in a data set, with some data file level metadata being captured elsewhere (probably in the file I guess)

Discussion can continue below.

update SST datasets

..PCMDIobs/obs4MIPs_input/MOHC
HadISST1-1 (high, greens function - here and other sources) - updated to 2023-04-12
..PCMDIobs/obs4MIPs_input/NOAA-PSL
COBE1 (high) - 2023-06-01
ERSST5 (low) - 2023-10-01
OISST2-1 (low) - 2023-10-01
..PCMDIobs/obs4MIPs_input/RSS
RSS-MW5-1
RSS-MW-IR5-1

Capture version deprecation information in CVs

We have had a single case of a retracted dataset: SOLARIS-HEPPA-CMIP-4-2, which was promptly replaced by SOLARIS-HEPPA-CMIP-4-3. An associated issue notes (29 Jul 2024) that the team “encountered an issue with the proton ionization data in v4.2”, but this information is not prominently available.

It would be ideal to capture a meaningful description of the problem and how it was solved, so that a modeling group can ascertain whether they need to act on the data correction or can proceed with their existing version. For this v4.2 -> v4.3 dataset update, presumably most modeling groups would be concerned about wrong data (@vnaik60, do you use the proton ionization data for simulations?). In the case of a mere metadata inconsistency, by contrast, such a problem would be unlikely to require a data switchout.

ping @znichollscr @vnaik60

Test stratospheric aerosol file (extinction) being uploaded on input4mips FTP

Tagging @durack1 @znichollscr

Strat aerosol test file uploaded for checking

I'm uploading one of my test files to the input4mips FTP for Paul to check, as instructed by Zeb and following the instructions at https://input4mips-validation.readthedocs.io/en/latest/how-to-guides/how-to-upload-to-ftp/. The dry run went well and the file is currently uploading. Let me know if there are any issues.

Outstanding questions

The two main outstanding questions for my datasets are:

  1. Do you want one file per variable? This feels a bit messy, but I'm happy to cater to your preference! I have 8 variables in the aerosol optical property dataset; for emissions it depends.

  2. Do you want the emission dataset provided as a gridded (time, lon, lat, height) flux dataset rather than as a list of eruptions with emission parameters? There is a limited number of eruptions, so I'm unsure whether this makes sense/how the few modelling groups modelling this would prefer the data (I can poll them! With UKESM we work from an eruption list). One other concern is that the core data is a mass of SO2 for each eruption. If I grid that as a flux and people regrid it to their model grid (lat/lon, height; not sure whether there would be a time concern), they should try to conserve the mass for each eruption. But that information would be a lot harder to track from a gridded flux file than from an eruption list.
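The mass-conservation concern in point 2 can be made concrete with a small sketch (function names and values are illustrative, not from the actual dataset): converting an eruption's total SO2 mass to a constant flux over a cell and time window, then re-integrating, should return exactly the original mass; any regridding should pass the same round-trip check per eruption.

```python
# Illustrative only: one grid cell, one time window, constant flux.
def mass_to_flux(mass_kg: float, cell_area_m2: float, duration_s: float) -> float:
    """Constant SO2 flux (kg m-2 s-1) that conserves the erupted mass."""
    return mass_kg / (cell_area_m2 * duration_s)

def flux_to_mass(flux: float, cell_area_m2: float, duration_s: float) -> float:
    """Re-integrate the flux; regridding should preserve this total."""
    return flux * cell_area_m2 * duration_s
```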

Registering contributing institution_id's

For CMIP6, we had the following institution_id entries registered - input4MIPs_institution_id.json:

{
    "institution_id":{
        "CCCma":"Canadian Centre for Climate Modelling and Analysis, Victoria, BC V8P 5C2, Canada",
        "CNRM-Cerfacs":"CNRM (Centre National de Recherches Meteorologiques, Toulouse 31057, France), CERFACS (Centre Europeen de Recherche et de Formation Avancee en Calcul Scientifique, Toulouse 31100, France)",
        "IACETH":"Institute for Atmosphere and Climate, ETH Zurich, Zurich 8092, Switzerland",
        "IAMC":"Integrated Assessment Modeling Consortium (see www.globalchange.umd.edu/iamc/membership for complete membership). Mailing address: International Institute for Applied Systems Analysis (IIASA), Schlossplatz 1, A-2361 Laxenburg, Austria",
        "ImperialCollege":"Imperial College London, South Kensington Campus, London SW7 2AZ, UK",
        "MOHC":"Met Office Hadley Centre, Fitzroy Road, Exeter, Devon, EX1 3PB, UK",
        "MPI-B":"Max Planck Institute for Biogeochemistry, Jena 07745, Germany",
        "MPI-M":"Max Planck Institute for Meteorology, Hamburg 20146, Germany",
        "MRI":"Meteorological Research Institute, Tsukuba, Ibaraki 305-0052, Japan",
        "NASA-GSFC":"NASA Goddard Space Flight Center, Greenbelt, MD 20771, USA",
        "NCAR":"National Center for Atmospheric Research, Boulder, CO 80307, USA",
        "NCAS":"National Centre for Atmospheric Science, University of Reading, Reading RG6 6BB, UK",
        "PCMDI":"Program for Climate Model Diagnosis and Intercomparison, Lawrence Livermore National Laboratory, Livermore, CA 94550, USA",
        "PNNL-JGCRI":"Pacific Northwest National Laboratory - Joint Global Change Research Institute, College Park, MD 20740, USA",
        "SOLARIS-HEPPA":"SOLARIS-HEPPA, GEOMAR Helmholtz Centre for Ocean Research, Kiel 24105, Germany",
        "UCI":"Department of Earth System Science, University of California Irvine, Irvine, CA 92697, USA",
        "UColorado":"University of Colorado, Boulder, CO 80309, USA",
        "UReading":"University of Reading, Reading RG6 6UA, UK",
        "UoM":"Australian-German Climate & Energy College, The University of Melbourne (UoM), Parkville, Victoria 3010, Australia",
        "UofMD":"University of Maryland (UofMD), College Park, MD 20742, USA",
        "VUA":"Vrije Universiteit Amsterdam, De Boelelaan 1105, 1081 HV Amsterdam, Netherlands"
    }
}

Of these, CNRM-Cerfacs, IACETH, IAMC, ImperialCollege, and MOHC are not registered in PCMDI/mip-cmor-tables/MIP_institutions.json. We'll also need to register a new institution, CR (@znichollscr).

@wolfiex how would you recommend we proceed, and should I open an issue in PCMDI/mip-cmor-tables?

Register biomass burning source_id and institution_id

@mjevanmarle just creating an issue as a placeholder for discussions in finalizing the registration of the biomass burning institution_id and source_id.

Note the CMIP6 contribution had institution_id: VUA (here) and a couple of versioned releases, so source_id entries: VUA-CMIP-BB4CMIP6-1-0, 1-1 and 1-2 (here). I wonder if we want to maintain any consistency with the previous version, or just start again? I see that you've used the previous template so VUA-CMIP-BB4CMIP6-1-0 becomes DRES-CMIP-BB4CMIP7-1-0.

Just a note, while we start to coordinate the collation of these prototype (v0) datasets, and gather feedback, we're aiming to catch these in the "CMIP6Plus" project, in preparation for CMIP7 in a couple of years. This will allow a clean split between the CMIP7 "endorsed" forcing collection, and those that we are working out the kinks on (caught in CMIP6Plus).

We have updated the institution registration a little moving beyond CMIP6; these now depend on the RoR registry (see here), and, as an example, Deltares is already registered - https://ror.org/01deh9c76

@wolfiex @matthew-mizielinski @taylor13 @vnaik60 @znichollscr ping

Clarifying rules around controlled vocabularies

In short, it's not clear to me what the rules for the controlled vocabularies are. For example, which fields are compulsory, which can be inferred from data, which are only required for ESGF (i.e. are things that data providers shouldn't worry about, but tooling after submission does need to handle).

To get the conversation started, I've made a google doc here: https://docs.google.com/document/d/1oLK4mWW6TX2YPrhGoLdMLcrK7vX1flU3BjiheTb2Hwk/edit?usp=sharing

Once we've got something a bit more concrete, I will pull everything back into issues that we can track across the various repositories.

Registering institutions

Can be closed as this is a duplicate of #8

There doesn't appear to be an obvious place to register institutions. In the source IDs they are a field, but there is no *institution_id*.json file in this repo.

Are they instead meant to be registered in https://github.com/PCMDI/mip-cmor-tables/blob/main/MIP_institutions.json or https://github.com/PCMDI/input4MIPs-cmor-tables/blob/master/input4MIPs_institution_id.json#L23?

@durack1 have I understood correctly or am I missing something?

Add source_id/overview view for HTML pages

We currently have two very granular views of the data, a dataset view, with 270 current entries for 3 source_id's registered, and a files view, with 1253 current entries for 4 source_id's registered. These are both dense, and having a higher level source_id view would be useful, so the information we are manually collating at https://wcrp-cmip.org/cmip7-task-teams/forcings/#forcing_datasets_availability could be viewed on a dynamic page in this repo.

This will likely require a get_source_id_view function being added to the html_generation.py file.
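A hypothetical shape for such a helper (the real html_generation.py interface may well differ; the record keys used here are assumptions): collapse the per-dataset records into one summary row per source_id, ready to render as a table.

```python
from collections import defaultdict

def get_source_id_view(dataset_records):
    """Group dataset-level records into one summary entry per source_id.

    Hypothetical sketch: each record is assumed to be a dict carrying at
    least "source_id" and "mip_era" keys.
    """
    view = defaultdict(lambda: {"n_datasets": 0, "mip_eras": set()})
    for record in dataset_records:
        summary = view[record["source_id"]]
        summary["n_datasets"] += 1
        summary["mip_eras"].add(record["mip_era"])
    return dict(view)
```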

Add links to CMOR definitions

The CMOR definitions are very opaque to people who don't work with this stuff a lot. For example, what the grid labels actually mean, which is actually spelt out pretty nicely here: https://github.com/PCMDI/mip-cmor-tables/blob/main/MIP_grid_label.json

To close this issue:

  • add a section to the README which collates these key links
  • point to that section in the README anywhere we think is sensible

The long-term solution would be to link docs on in the CMOR/MIP repositories where this is captured, but I don't think they're stable/available yet (@durack1 please correct me if I'm wrong).

x-ref #55 (comment)

Add workaround for automated comment generation when there are lots of changes

When there are lots of changes, the automated comment from the bot can fail because the comment can be too big. For example, here: https://github.com/PCMDI/input4MIPs_CVs/actions/runs/10431412061/job/28891047381?pr=101

To close:

  • Make the automated comment generation smarter. It appears that the comments have to be capped at 65536 characters. If the default comment would be longer than this, we should probably post some sort of summary instead.
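A minimal sketch of the capping logic, assuming the bot's comment is built as a single string (the function name and truncation notice are made up):

```python
GITHUB_COMMENT_LIMIT = 65536  # observed cap on GitHub comment bodies

def cap_comment(full_comment: str, limit: int = GITHUB_COMMENT_LIMIT) -> str:
    """Return the comment unchanged if it fits, else truncate with a notice."""
    if len(full_comment) <= limit:
        return full_comment
    notice = ("\n\n*Summary truncated: the full diff exceeds GitHub's "
              "comment size limit; see the CI logs for details.*")
    return full_comment[: limit - len(notice)] + notice
```

A smarter version could post a per-file count summary instead of a truncated diff, but the cap above is the minimum needed to stop the bot failing outright.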

Register volcanic forcing source_id and institution_id

@thomasaubry just creating an issue as a placeholder for discussions in finalizing the registration of the volcanic forcing institution_id and source_id.

Note the CMIP6 contribution had institution_id: IACETH (here) and a couple of versioned releases, so source_id entries: IACETH-SAGE3lambda-2-1-0 and 3-0-0 (here).

We have updated the institution registration a little moving beyond CMIP6, these now depend on the RoR registry (see here), and, as an example UExeter is already registered - https://ror.org/03yghzc09

@wolfiex @matthew-mizielinski @taylor13 @vnaik60 @znichollscr ping

target_mip "long name"

I wonder if, in the target_mip CV, "long_name" should be replaced by "full_name" or something else. In the CF conventions the long_name attribute is attached to a variable, not used as an expansion of an abbreviation. This could lead to some confusion (perhaps not too much, admittedly, but I think it's best not to mix descriptor names for different purposes).

Connecting to the data citation service

From @MartinaSt

From: Martina Stockhause
Date: Friday, May 19, 2023 at 2:52 AM
To: Durack, Paul J.
Subject: CMIP6Plus - input4MIPs
Hi Paul,
I have thought about CMIP6Plus and input4MIPs a bit more:

  1. DOI granularity/-ies: For CMIP6-input4MIPs DOIs on two granularities were registered. The easiest is to do it for CMIP6Plus in the same way and have two new DOIs for your AMIP-1-1-9 data. However, if the CMIP6Plus period is to be short, it might be better to have only a DOI on the finer granularity (with the possibility to relate it to the coarser one for CMIP6). I guess an example makes these options clearer:
  • 2 new DOIs on data collections 'input4MIPs.CMIP6Plus.CMIP.PCMDI' and 'input4MIPs.CMIP6Plus.CMIP.PCMDI.PCMDI-AMIP-1-1-9'
  • 1 new DOI on data collection 'input4MIPs.CMIP6Plus.CMIP.PCMDI.PCMDI-AMIP-1-1-9' with the possibility to relate it to 'input4MIPs.CMIP6.CMIP.PCMDI'
  2. Data Access Links: I will use links into the CoG and change them together with all the others for CMIP6 when ESGF moves to MetaGrid.

  3. CV/citation service workflow: Currently, the ESGF publication of the input4MIPs data comes first, then the insert into the citation DB and the DOI publication. In some cases, this leads to DOI registrations before the citation manager can adjust the citation metadata. I don't want to change this for a rather short period of CMIP6Plus but would rather live with the inconvenience: write an email to me in case this has happened.

  4. Errata/Versions: Yes, that is something we should improve. There will be a delay for displaying this information on the DOI landing pages.

Which option under 1. would you prefer?
Best wishes,
Martina

SOLARIS-HEPPA-4-3

Issue for tracking publication of the SOLARIS-HEPPA-4-3 dataset.

At present, the published files will have filenames according to the DRS. This means they will differ from the original filenames as published by the SOLARIS team. This isn't ideal, but in the interests of getting data out (particularly while #64 is ongoing), we will go with this and see what feedback we get from modelling teams, then improve in the next round.

cc @durack1 @st-bender @berndfunke

Deploying website for each version

If you do builds of your website with e.g. ReadTheDocs, you can make it possible for users to look at the docs at different versions of your package/deployment. For example, with input4MIPs validation, users can look at:

  • latest i.e. the latest commit in main
  • stable i.e. the latest tagged commit
  • v0.10.2, a specific tag
  • v0.11.3, a different specific tag

etc.

Unfortunately, with GitHub pages, this doesn't appear to be possible. GitHub pages just always serves the data in the gh-pages branch.

It would add more complexity (but not much, this is how we do all our docs at CR, including input4mips validation), but would this extra feature be useful? It would mean that people could easily see the state of the database/CVs at an earlier point in time, should they ever need to look at how things were at a specific commit rather than always just seeing the latest commit in main on the deployed docs site.

@durack1 interested in your thoughts here

CR-CMIP-0-3-0

@durack1 the files for CR-CMIP-0-3-0 are in /incoming/cr-cmip-0-3-0 on the FTP server and can be queued for publication 🚀

Solar data tiny tweak - targeting SOLARIS-HEPPA-CMIP-4-4

@st-bender @berndfunke 1000 apologies for this, I should have caught it earlier. For your next version (whenever that is), there is one other tiny tweak to your solar data. Can you please remove the datetime_end and datetime_start global attributes from all your files? They're not written correctly at the moment (the format is meant to depend on the file's frequency, a bit of a headache), and we can infer them from the time axis in your data anyway, so dropping them spares you the details of the frequency-dependent formatting.

Apologies again, I know we asked you to put that there in the first place. For the 4.3 data, it doesn't matter, just leave it as is.

One more thing for the list: we should double check the creation_date attribute. This should be picked up by the new version of input4mips-validation, but just in case, here is a reminder. If the solar team want to write this themselves, it must be the timestamp when the file was created (in the UTC timezone), in the format "YYYY-mm-ddTHH:MM:SSZ" (the ISO8601 standard).
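For anyone writing creation_date themselves, a minimal sketch of producing the required format:

```python
from datetime import datetime, timezone

def make_creation_date() -> str:
    """Current UTC time in the format required for the creation_date
    global attribute: "YYYY-mm-ddTHH:MM:SSZ" (ISO 8601, UTC)."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```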

Thanks and sorry again!

cc @durack1

Tweaks to source_id and data registration formats

Working with @znichollscr, we plan to implement a number of updates to streamline the capture of information at the dataset level (defined by a source_id registration) and the file level (contained as global attributes/metadata in a single file). I will capture recommended changes below. Lose the existing |dataProviderFile entry altogether.

Changes by entry (lose / convert or move / add):
  • core: lose _status and nominal_resolution; convert to lists: frequency, grid_label, realm, target_mip
  • dataProviderExtra: lose
  • esgfIndex: lose _status

At the file level, the required global attributes are: activity_id, tracking_id, nominal_resolution (remove institution, table_id)

For reference this is how a registration looks currently

"CR-CMIP-0-2-0":{
"_status":"Registered",
"contact":"[email protected]; [email protected]",
"dataset_category":"GHGConcentrations",
"datetime_start":"0001-01-15",
"datetime_stop":"2022-12-15",
"frequency":"mon",
"further_info_url":"https://www.climate-resource.com/",
"grid_label":"gm",
"institution_id":"CR",
"license":"CC BY 4.0",
"mip_era":"CMIP6Plus",
"nominal_resolution":"10000 km",
"realm":"atmos",
"region":[
"global"
],
"source":"Global greenhouse gas concentrations 0001 through 2022 based on NOAA/AGAGE/GAGE data",
"source_id":"CR-CMIP-0-2-0",
"source_version":"0.2.0",
"target_mip":"CMIP",
"title":"Climate Resource CMIP 0.2.0 dataset prepared for input4MIPs",
"|dataProviderExtra":{
"source_variables":""
},
"|dataProviderFile":{
"Conventions":"",
"comment":"[TBC which grant] Data produced by Climate Resource supported by funding from the CMIP IPO (Coupled Model Intercomparison Project International Project Office). This is an interim dataset, not for production use",
"creation_date":"",
"tracking_id":""
},
"|esgfIndex":{
"_timestamp":"",
"data_node":"",
"latest":"",
"replica":"",
"version":"",
"xlink":""
}
},

We will also need to add input4MIPs_regions.json to the repo CV list, following the CF standard region list (here) as converted to JSON in the obs4MIPs-cmor-tables repo (here). As work continues to be finalized upstream in mip-cmor-tables, the reproduction of these CVs in this repo could be removed, deferring instead to the upstream sources as the definitive record.

Fix MRI-JRA55-do-1-6-0 license omission - CC BY 4.0

The CMIP6Plus MRI-JRA55-do-1-6-0 dataset has a CC BY 4.0 license assigned, but this is not in our database - we need to correct that

(xcd061nctax) bash-4.2$ ncdump -h ../input4MIPs/CMIP6Plus/OMIP/MRI/MRI-JRA55-do-1-6-0/atmos/3hr/
prra/gr/v20240531/prra_input4MIPs_atmosphericState_OMIP_MRI-JRA55-do-1-6-0_gr_195801010130-
195812312230.nc | grep license
                :license = "OMIP boundary condition data produced by MRI is licensed under a Creative Commons
Attribution 4.0 International License (CC BY 4.0; https://creativecommons.org/licenses/by/4.0). Consult
https://pcmdi.llnl.gov/CMIP6/TermsOfUse for terms of use governing input4MIPs output, including citation
requirements and proper acknowledgment. Further information about this data, including some limitations,
can be found via the further_info_url (recorded as a global attribute in this file). The data producers and
data providers make no warranty, either express or implied, including, but not limited to, warranties of
merchantability and fitness for a particular purpose. All liabilities arising from the supply of the
information (including any liability arising in negligence) are excluded to the fullest extent permitted
by law" ;

HTML page links need update in the README

The previous path

https://pcmdi.github.io/input4MIPs_CVs/docs/input4MIPs_files_CMIP6Plus.html

has been augmented with an additional subdir

https://pcmdi.github.io/input4MIPs_CVs/docs/database-views/input4MIPs_files_CMIP6Plus.html

Create DRS creation script

Contributed data most often does not follow the DRS/directory structure required to publish on ESGF.

Generate a file that checks and generates an appropriate directory structure and copies data into this format, see below

"DRS":{
"directory_path_example":"input4MIPs/CMIP6Plus/CMIP/PCMDI/PCMDI-AMIP-1-1-9/ocean/mon/tos/gn/v20230512/",
"directory_path_template":"<activity_id>/<mip_era>/<target_mip>/<institution_id>/<source_id>/<realm>/<frequency>/<variable_id>/<grid_label>/<version>",
"filename_example":"tos_input4MIPs_SSTsAndSeaIce_CMIP_PCMDI-AMIP-1-1-9_gn_187001-202212.nc",
"filename_template":"<variable_id>_<activity_id>_<dataset_category>_<target_mip>_<source_id>_<grid_label>[_<time_range>].nc"
}
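Such a script's core templating could be sketched as below, fed by the templates above. Treating the bracketed `[_<time_range>]` segment as optional (dropped when the key is absent) is an assumption about the intended semantics:

```python
import re

def build_drs(template: str, meta: dict) -> str:
    """Fill <key> placeholders from metadata; drop optional [...] segments
    whose keys are missing. Sketch only, not the eventual script."""
    def fill_optional(match):
        inner = match.group(1)
        keys = re.findall(r"<(\w+)>", inner)
        return inner if all(k in meta for k in keys) else ""

    template = re.sub(r"\[([^\]]*)\]", fill_optional, template)
    return re.sub(r"<(\w+)>", lambda m: meta[m.group(1)], template)

# Metadata taken from the PCMDI-AMIP-1-1-9 example above.
meta = {
    "activity_id": "input4MIPs", "mip_era": "CMIP6Plus", "target_mip": "CMIP",
    "institution_id": "PCMDI", "source_id": "PCMDI-AMIP-1-1-9",
    "realm": "ocean", "frequency": "mon", "variable_id": "tos",
    "grid_label": "gn", "version": "v20230512",
    "dataset_category": "SSTsAndSeaIce", "time_range": "187001-202212",
}
dir_template = ("<activity_id>/<mip_era>/<target_mip>/<institution_id>/"
                "<source_id>/<realm>/<frequency>/<variable_id>/"
                "<grid_label>/<version>")
file_template = ("<variable_id>_<activity_id>_<dataset_category>_<target_mip>_"
                 "<source_id>_<grid_label>[_<time_range>].nc")
```

Checking the output against the examples in the registration entry gives a built-in test of the templates.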

Register prototype AMIP datasets

@mzelinka just creating an issue as a placeholder for discussions in finalizing the registration of these prototype AMIP SST datasets. These will all be published against the input4MIPs mip_era CMIP6Plus target_mip CMIP - as per PCMDI-AMIP-1-1-9

source_id             | dataset               | time start    | time end      | time avail.   | source inst
PCMDI-AMIP-OI2p1-1-0  | OISST2.1              | 1981-09-01    | 2023-10-01    | 2024-05-01    | NOAA-PSL
PCMDI-AMIP-Had1p1-1-0 | HadISST v1.1          | 1870-01-16 12 | 2023-09-16 12 | 2024-03-16 12 | MOHC
PCMDI-AMIP-ERSST5-1-0 | ERSST v5              | 1854-01-01    | 2023-10-01    | 2024-05-01    | NOAA-PSL
PCMDI-AMIP-Had2p4-1-0 | HadISST v2.4          | 1850-01-16    | 2023-12-16    |               | MOHC
PCMDI-AMIP-1-1-9      | HadISST v1, OISST 2.0 | 1870-01-01    | 2022-12-01    |               | NOAA-NCEP

@taylor13 ping as we're discussing identifiers

Notes to self:
Status of E3SM sims here
The path to Frankenstein ~/scripts/examine_SST_datasets.ipynb

Adding more notes - RE update SST datasets (#5 dupe)
..PCMDIobs/obs4MIPs_input/MOHC
HadISST1-1 (high, greens function - here and other sources) - updated to 2023-04-12
..PCMDIobs/obs4MIPs_input/NOAA-PSL
COBE1 (high) - 2023-06-01
ERSST5 (low) - 2023-10-01
OISST2-1 (low) - 2023-10-01
..PCMDIobs/obs4MIPs_input/RSS
RSS-MW5-1
RSS-MW-IR5-1

Also adding

CMEMS AVISO - https://data.marine.copernicus.eu/product/SEALEVEL_GLO_PHY_L4_MY_008_047/services (Copernicus)
PO-DACC Aquarius/SMAP https://doi.org/10.5067/SMP20-4U7CS

Create CMOR input file

To write files with CMOR, you need a specific input file. It's basically a CMOR table as far as I understand.

We used to package that in this repo, but it has been removed in the reshuffle.

To close this issue:

  • define the required file
  • work out what is required to make that table
  • add back in generation of a CMOR-compatible file (ideally auto-generated with each PR)
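As a starting point for the last bullet, a heavily hedged sketch of generating a minimal CMOR user-input JSON. The key set shown mirrors CMIP6-era user input files; the exact set required for input4MIPs is precisely what this issue needs to pin down, and the filenames here are assumptions:

```python
import json

# Hypothetical minimal CMOR "user input" JSON, to be auto-generated from the
# CVs in this repo on each PR. Keys and filenames are assumptions.
user_input = {
    "_control_vocabulary_file": "input4MIPs_CV.json",  # assumed CV filename
    "outpath": "output",
    "activity_id": "input4MIPs",
    "source_id": "PCMDI-AMIP-1-1-9",
    "institution_id": "PCMDI",
    "grid_label": "gn",
}

with open("cmor_user_input.json", "w") as handle:
    json.dump(user_input, handle, indent=4)
```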

x-ref: #56 (comment)

Register emissions source ID and institution ID

For the CEDS produced data, we need a source ID and institution ID entry.

Institution ID will be a bit tricky I expect. From what I can tell, the institute is either CEDS or JGCRI. They both appear to be a consortium of PNNL and University of Maryland. Once we have clarity on this, I can help push this upstream to https://github.com/PCMDI/mip-cmor-tables.

In terms of source ID, suggestion below (same idea as #42)

    "CEDS-YYYY-mm-DD":{
        "contact":"[email protected];[email protected];[email protected]",
        "further_info_url":"www.tbd.invalid",
        "institution_id":"CEDS",
        "license_id":"CC BY 4.0",
        "mip_era":"CMIP6Plus",
        "source_version":"YYYY.mm.DD"
    }

Consistent repo versioning

We want to label the version of the repo consistently and clearly, so the following will need to be updated each time a release is minted

  • README.md
  • CITATION.cff
  • All docs/*.html files
  • Repo itself

Note to self, there is some code in 07653ab that was lost but could be recovered which solves the README.md and CITATION.cff updates. The HTML files are currently written using a script, which takes the version as an argument, so that's also close to being implemented
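The README.md/CITATION.cff part could be as simple as a regex substitution run at release time; the pattern below is an assumption about how those files record the version:

```python
import re

def update_version(text: str, new_version: str) -> str:
    """Rewrite semver-style version strings that follow the word "version".

    Sketch only: assumes the files record the version as e.g. "version: 1.2.3".
    """
    return re.sub(
        r"(version[:=]?\s*)v?\d+\.\d+\.\d+",
        lambda m: m.group(1) + new_version,
        text,
        flags=re.IGNORECASE,
    )
```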

Register land use change source_id

We need to register a new source_id for the expected land use change dataset. This will be another contribution from the existing institution_id UofMD.

The recommendation was for UofMD-landState-3-0, following the previous CMIP6 contributions of UofMD-landState-2-1-h, UofMD-landState-high-2-1-h, and UofMD-landState-low-2-1-h.

Will need to circle back with Louise et al. to make sure this and additional registered information can be captured.

Tweaks to sync across contributions

  • pollESGF.py needs to point output from ./ to ../DatasetsDatabase/input-data/
  • catch extra / in HTML links off README.md
  • correct sublist indenting in README.md

Ensuring no information loss

In #39, we shuffled things around a bit. It wasn't clear what the source of truth prior to #39 was meant to be, was it a) input4MIPs_source_id.json or b) src/240701_2137_comp.json.7z? @durack1 I think this is a question for you.

To ensure that there was no loss of new information (see also comment here: #39 (comment)), we created a legacy folder. This has all the information as it was in main prior to merging #39.

The question now is, what is the source of truth? Once we know that, we can make sure that we are capturing it correctly and then delete the legacy folder.
