Coder Social home page Coder Social logo

Comments (22)

mauzey1 avatar mauzey1 commented on July 3, 2024 1

I am little bit tempted to implement some backend CV check module purely in python that could be utilized by PrePARE. Json handling and NetCDF attribute handling has really become easy and flexible within python.

@larsbuntemeyer That is actually an idea I have been contemplating. I was thinking of making one based on the xarray. We could discuss that more on the discussion page.

As for the current issue with PrePARE, I wonder how we could tell if we are using a CV other than CMIP6 CV. I guess we could grab the prefix of the CV file assuming that the name of the CV file will always have the format <MIP name>_CF.json. Knowing if the CV is not for CMIP6, we could skip the parent and sub experiment checks. However, I wonder if other MIPs also use parent and sub experiments.

@durack1 @matthew-mizielinski Do you know of any MIPs that use parent and sub experiments other than CMIP6?

from cmor.

taylor13 avatar taylor13 commented on July 3, 2024 1

Here are some offline comments, I'm including in this thread. Hope it's the right thread.

The global attributes specifications for CMIP6 included:
“ further_info_url has the form https://furtherinfo.es-doc.org/<mip_era>.<institution_id>.<source_id>.<experiment_id>.<sub_experiment_id>.<variant_label> (e.g., https://furtherinfo.es-doc.org/CMIP6.CAS_FGOALS-g3.historical.none.r3i1p1f1). The further_info_url page will be maintained by the es-docs project and will simply be a rendering by the Viewer tool of information provided by modeling groups and recorded in so-called CIM documents.”

The PrePARE default value is consistent with this BUT DOESN’T LOOK LIKE IT INCLUDES THE DOT (PERIOD) SEPARATORS:
#define CMOR_DEFAULT_FURTHERURL_TEMPLATE https://furtherinfo.es-doc.org/<mip_era><institution_id><source_id><experiment_id><sub_experiment_id><variant_label>

I agree with Paul that we want to check source_id independent of further_info_url.

Reviewing some of the coding, it seems to me it is going to difficult to generalize it cleanly to handle other projects (obs4MIPs, input4MIPs, etc.). I have spent some time trying to write down what we would like to check with PrePARE and how we might code this in a general way to ingest simply formatted guidance from an input “configuration” file which would specify which attributes to check and how to check them. My ideas are contained in this document: https://docs.google.com/document/d/1JZcVRo2GucGUoZPeUNrmHmtB8LAs0QBAWiDuf0n3lSM/edit. You might also be interested in an earlier document I put together when PrePARE was originally being developed: https://docs.google.com/document/d/1d4_wdaY52xhLZBTeiTu2ZlpmpTO2Luqe47m2IX1WkWg .

I think it is worth considering how difficult it would be to implement the above from "scratch" (as opposed to modifying the current PrePARE code). I think it might be easier to start over and might take less time than trying to clean up and generalize the existing coding.

Perhaps we can all get together and you can propose what you think would be the best way to proceed.

from cmor.

mauzey1 avatar mauzey1 commented on July 3, 2024

@larsbuntemeyer Could you provide some CORDEX dataset files that I could try to debug PrePARE with? Thank you.

from cmor.

mauzey1 avatar mauzey1 commented on July 3, 2024

CMOR doesn't seem to use the values in the DRS section in CMIP6_CV.json. I'm not sure if it should. @durack1 @taylor13 Any thoughts on this?

Regarding the parts where cmor_CV_checkParentExpID and cmor_CV_checkSubExpID are causing PrePARE to crash, I think this is due to the attributes parent_experiment_id and sub_experiment_id not being found for experiment entries in CORDEX_CV.json.

I think CMOR/PrePARE should be checking required_global_attributes of the CV to see if those attributes need to be present rather than assuming they are required by the MIP. However, I don't see the parent_experiment_id in CMIP6's required global attributes so maybe checking inside the experiment entry for the parent and sub would work better.

@durack1 obs4MIPS and input4MIPs don't have parent and sub experiments. How do they use CMOR/PrePARE?

I agree with using command line arguments with PrePARE to get the right files for the CV.

from cmor.

larsbuntemeyer avatar larsbuntemeyer commented on July 3, 2024

Hi @mauzey1 ! Thanks for picking this up again, sorry, for being unresponsive! I'll provide some CORDEX dataset soon!

CMOR doesn't seem to use the values in the DRS section in CMIP6_CV.json

Yes, that's confusing. It uses file and path templates from the dataset input table, not from the CV. On the other hand, PrePARE has no option to override CMOR_DEFAULT_FILE_TEMPLATE and FILE_NAME_TEMPLATE...

I think CMOR/PrePARE should be checking required_global_attributes of the CV to see if those attributes need to be present rather than assuming they are required by the MIP

I think that is achieved by the _cmip6_option, see also my confusion in this comment: #679 (comment). Again, Prepare does not allow to mirror this behaviour during the checks...

from cmor.

larsbuntemeyer avatar larsbuntemeyer commented on July 3, 2024

I am little bit tempted to implement some backend CV check module purely in python that could be utilized by PrePARE. Json handling and NetCDF attribute handling has really become easy and flexible within python.

from cmor.

durack1 avatar durack1 commented on July 3, 2024

You folks are starting to get aligned with what we've been discussing. We are thinking of following a structure similar to
220714_mip-cmor-tables - note this has evolved a little, but is fine for discussion.

If we follow this structure, then the <MIPName>_CV.json could be assumed, therefore allowing an update to PrePARE and CMOR to work across multiple projects without having to implement per-project hacks.

Note, we are trying to glean as much information about CMOR usage in the draft CMOR Users Survey, so if we have more specific info we want to collect from the community as part of planning, then feel free to suggest augmentations/edits to the questions we have drafted

from cmor.

larsbuntemeyer avatar larsbuntemeyer commented on July 3, 2024

@larsbuntemeyer Could you provide some CORDEX dataset files that I could try to debug PrePARE with? Thank you.

@mauzey1 You can now download current CORDEX example datasets from artifacts, e.g., here: https://github.com/WCRP-CORDEX/cordex-cmip6-cmor-tables/actions/runs/4043896243

from cmor.

matthew-mizielinski avatar matthew-mizielinski commented on July 3, 2024

@durack1 @matthew-mizielinski Do you know of any MIPs that use parent and sub experiments other than CMIP6?

Parent information: yes, particularly where they are extending CMIP6, e.g. run ssp245 with sulphate injections to keep global
average temperature below a target.

sub-experiment id: yes, where there are initialised experiments such as those for the decadal/seasonal forecasting area.

from cmor.

taylor13 avatar taylor13 commented on July 3, 2024

@matthew-mizielinski : Aren't those considered CMIP6 runs, even if they are not the DECK/historical runs?

from cmor.

matthew-mizielinski avatar matthew-mizielinski commented on July 3, 2024

@matthew-mizielinski : Aren't those considered CMIP6 runs, even if they are not the DECK/historical runs?

Sorry, I should have put examples. Both of the following are outside of CMIP6;

  • for the parent information I have a fork of the CMIP6 MIP tables that I used for the Met Office data being provided to the ARISE SAI project (variables the same, but changed to allow a different MIP era, activity id and experiment id)
  • for the sub-experiment id then SNAPSI have a small set of experiments with sub-experiment ids used for the start/initialisation dates -- in this case I think there is a short period (a few months) of data being required at 6 hourly frequency.

from cmor.

mauzey1 avatar mauzey1 commented on July 3, 2024

I'm currently making changes to PrePARE to make its checks more like those done when cmorizing datasets. Below is the relevant section of cmor.c that does those checks.

cmor/Src/cmor.c

Lines 3038 to 3062 in 304db22

if (cmor_has_cur_dataset_attribute(GLOBAL_ATT_INSTITUTION_ID) == 0) {
ierr += cmor_CV_setInstitution(cmor_tables[nVarRefTblID].CV);
}
if (cmor_has_cur_dataset_attribute(GLOBAL_IS_CMIP6) == 0) {
ierr += cmor_CV_checkSourceID(cmor_tables[nVarRefTblID].CV);
ierr += cmor_CV_checkExperiment(cmor_tables[nVarRefTblID].CV);
ierr += cmor_CV_checkFurtherInfoURL(nVarRefTblID);
ierr += cmor_CV_checkGrids(cmor_tables[nVarRefTblID].CV);
ierr += cmor_CV_checkParentExpID(cmor_tables[nVarRefTblID].CV);
ierr += cmor_CV_checkSubExpID(cmor_tables[nVarRefTblID].CV);
}
//
// Set user defined attributes and explicit {} sets.
//
ierr += cmor_CV_checkGblAttributes(cmor_tables[nVarRefTblID].CV);
//
// Copy block to ensure all attributes are set for obs4MIPs
// especially (source_label)
//
if ( cmor_current_dataset.furtherinfourl[0] != '\0') {
ierr += cmor_CV_checkSourceID(cmor_tables[nVarRefTblID].CV);
ierr += cmor_CV_checkFurtherInfoURL(nVarRefTblID);
}

The check for source ID is done if the GLOBAL_IS_CMIP6 variable is true, or if current dataset's further info URL is not an empty string. Should the source ID check be done by default? Isn't that attribute essential for all MIPs?

from cmor.

larsbuntemeyer avatar larsbuntemeyer commented on July 3, 2024

The check for source ID is done if the GLOBAL_IS_CMIP6 variable is true, or if current dataset's further info URL is not an empty string. Should the source ID check be done by default? Isn't that attribute essential for all MIPs?

That would also impact the cmorization itself, not only PrePARE, right? That would work for CORDEX with CMIP6. But i know that people are also using the old CVs with cmor3, in those, there is no source_id but it uses the old model_id vocabulary, so that might be a problem @sol1105 ?

from cmor.

mauzey1 avatar mauzey1 commented on July 3, 2024

@durack1 Why is CMOR checking if the further_info_url is not an empty string before checking the source ID? The comment above this section of code suggests that it is related to obs4MIPs even though obs4MIPs_CV.json doesn't contain anything for further_info_url. Should checking further_info_url be a CMIP6-only check?

cmor/Src/cmor.c

Lines 3055 to 3062 in 304db22

//
// Copy block to ensure all attributes are set for obs4MIPs
// especially (source_label)
//
if ( cmor_current_dataset.furtherinfourl[0] != '\0') {
ierr += cmor_CV_checkSourceID(cmor_tables[nVarRefTblID].CV);
ierr += cmor_CV_checkFurtherInfoURL(nVarRefTblID);
}

from cmor.

durack1 avatar durack1 commented on July 3, 2024

@mauzey1 good question, the further_info_url (supported by ES-Docs, but I am unable to find a working example at this moment) is indeed supported for obs4MIPs (see https://github.com/PCMDI/obs4MIPs-cmor-tables/blob/master/inputs/pcmdi/RSS/RSS_prw_v07r01.json#L6), but all this hard coding is not ideal. We don't have further_info_url support for input4MIPs, so what would be the case for that project?

from cmor.

mauzey1 avatar mauzey1 commented on July 3, 2024

Getting back into working on this issue, I've dug deeper into how the further_info_url attribute is used by PrePARE. When PrePARE calls the function check_furtherinfourl, it will retrieve the further info URL template from the variable cmor_current_dataset.furtherinfourl.

cmor/Src/cmor_CV.c

Lines 373 to 377 in 8d45339

/* -------------------------------------------------------------------- */
/* Retrieve default Further URL info */
/* -------------------------------------------------------------------- */
strncpy(szFurtherInfoURLTemplate, cmor_current_dataset.furtherinfourl,
CMOR_MAX_STRING);

cmor_current_dataset.furtherinfourl was initialized with the default value for the URL template.
#define CMOR_DEFAULT_FURTHERURL_TEMPLATE "https://furtherinfo.es-doc.org/<mip_era><institution_id><source_id><experiment_id><sub_experiment_id><variant_label>"

cmor_current_dataset.furtherinfourl would either use the value of further_info_url stored in the CV file or in the user input if CMOR was being used to write data. However, PrePARE doesn't do this so it will just use the default template.

Another issue that I have noticed is the difference between the default value of the further info URL's template, and the value stored in the further_info_url attribute of CV files. CMIP6_CV.json has the following for this attribute.

        "further_info_url":[
            "https://furtherinfo.es-doc.org/.*"
        ]

Although this appears to be a regex for matching any URL that begins with https://furtherinfo.es-doc.org/, the following lines from cmor_CV_checkFurtherInfoURL will make the function just ignore checking the URL due to the lack of "<>" tokens.

cmor/Src/cmor_CV.c

Lines 379 to 385 in 8d45339

/* -------------------------------------------------------------------- */
/* If this is a string with no token we have nothing to do. */
/* -------------------------------------------------------------------- */
szToken = strtok(szFurtherInfoURLTemplate, "<>");
if (strcmp(szToken, cmor_current_dataset.furtherinfourl) == 0) {
return (0);
}

So even if PrePARE got the value for further_info_url from the CV, it would just be ignoring the URL if the value was not a template that had "<>" tokens like the default in cmor.h.

The check for the further info URL is pretty much hardwired for CMIP6, or at least any project that used the same template for further info URLs.

Should we place the further info URL check within the group of CMIP6 checks, which can be disabled by the PrePARE user?

In CMOR, the check_sourceID fucntion would be called if cmor_current_dataset.furtherinfourl is not an empty string.

cmor/Src/cmor.c

Lines 3055 to 3062 in 8d45339

//
// Copy block to ensure all attributes are set for obs4MIPs
// especially (source_label)
//
if ( cmor_current_dataset.furtherinfourl[0] != '\0') {
ierr += cmor_CV_checkSourceID(cmor_tables[nVarRefTblID].CV);
ierr += cmor_CV_checkFurtherInfoURL(nVarRefTblID);
}

This would mean that the checks for further info URLs and source_ids would be performed unless the user set further_info_url to be an emtpy string. I can understand doing this for the URL but not the source_id.

Shouldn't check_sourceID be done regardless of the further info URL, or whether or not the project is CMIP6?

from cmor.

durack1 avatar durack1 commented on July 3, 2024

@mauzey1 thanks for continuing to chip away at these issues. further_info_url, which uses the es-doc URL, is only applicable to the CMIP6 and obs4MIPs project, within input4MIPs we define this as a URL pointing elsewhere, and for some of the earlier generated source_id entries these don't exist. I also believe that for the latest datasets, even obs4MIPs has deprecated the ES-Doc usage (@gleckler1 can confirm)

from cmor.

durack1 avatar durack1 commented on July 3, 2024

Shouldn't check_sourceID be done regardless of the further info URL, or whether or not the project is CMIP6?

And this is exactly correct. The way that we have things set up, if we don't have a source_id defined (in CVs) that matches the dataset being written, then the program should throw a warning at minimum.

@taylor13 @matthew-mizielinski can confirm their agreement

from cmor.

durack1 avatar durack1 commented on July 3, 2024

@mauzey1 it would be useful to update PrePARE alongside the last planned release of CMOR3 to enable consistency checking as much as possible - let's revisit this issue as the final changes for 3.9.0 are identified to see what can be achieved within available time/resources

from cmor.

taylor13 avatar taylor13 commented on July 3, 2024

There is a lot of stuff considered above. Not sure we should spend resources generalizing PrePARE to handle the generality of the 3.9.0 release. Might want to wait on this for integration into the CMOR 4.0 development. @mauzey1, you might be in the best position to estimate how big the job. What's your advice on this?

from cmor.

durack1 avatar durack1 commented on July 3, 2024

@taylor13 I agree with this, if we have a "final" CV template that we can build PrePARE4 (along CMOR4) to work with, then I suggest that's the best path forward, rather than attempting to wrangle two codebases (C - CMOR, and Python - PrePARE) together.

@mauzey1 if you agree with this, we can tag this against the 4.0 milestone

from cmor.

mauzey1 avatar mauzey1 commented on July 3, 2024

@durack1 Yes, I agree with your suggestion of making this a CMOR4 milestone.

from cmor.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.