Comments (13)
In the presentation (noted above), E3SM atmospheric data contributed to CMIP6 achieved a compression ratio of 1.8 with lossless netCDF deflate, i.e. a little over half the size of vanilla (uncompressed) writes.
To gauge the impact of using lossy compression on data usability, a couple of example cases would be great to test:
- the impact of lossy compression (BitRound, BitGroom, GranularBR) when multiple layered operations occur, e.g. ocean data is regridded, then multiple separate fields (thetao, so, ...) are used to calculate a derived quantity such as ocean heat content; a very preliminary discussion is contained in Griffies et al., 2016, Appendix A
- the impact of lossy compression on radiation calculations, e.g. whether correct radiation properties can be recovered from a degraded cloud area fraction (cc @martinjuckes)
- the impact of unit changes. In CMIP5, ocean potential temperature (thetao) was stored/requested in units of K, whereas in CMIP6 we mapped to degC. I recall that this had an impact on deflation (one was far better than the other), but do not recall the specifics or which was better; maybe degC gave us better compression, so an improvement in CMIP6?
- also consider cases where 70% of the grid is masked/missing (land variables), or 30% (ocean variables)
- impacts on read/write performance and compute cost vs the storage benefits/reduction
- impacts on the calculation of long-term (centennial) drift in ocean properties, which needs to be accounted for in piControl simulations to separate forced from unforced responses
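To make the layered-operations concern concrete, here is a minimal, hypothetical sketch in plain Python (not CMOR or netCDF code) of a BitRound-style mantissa truncation applied to two fields before a derived quantity is computed. The field values, the keepbits choice, and the folded rho*cp constant are all illustrative, and the rounding is a simplified round-half-up stand-in for netCDF's actual BitRound mode:

```python
import struct

def bitround(x: float, keepbits: int) -> float:
    """Round the float32 form of x to `keepbits` mantissa bits.

    Simplified round-half-up stand-in for netCDF's BitRound quantize
    mode; positive finite values only (sign/inf/NaN handling omitted).
    """
    ui = struct.unpack("<I", struct.pack("<f", x))[0]
    drop = 23 - keepbits  # float32 carries 23 explicit mantissa bits
    if drop <= 0:
        return x
    ui = (ui + (1 << (drop - 1))) & ~((1 << drop) - 1) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", ui))[0]

# Toy "ocean heat content" built from two independently quantized fields:
# layer temperatures (degC) and layer thicknesses (m); rho*cp is folded
# into one nominal constant. All numbers are made up for illustration.
RHO_CP = 4.1e6                       # J m-3 K-1, nominal seawater value
thetao = [10.37, 8.92, 5.11, 2.04]   # hypothetical layer temperatures
dz = [10.0, 25.0, 50.0, 100.0]       # hypothetical layer thicknesses

exact = sum(RHO_CP * t * d for t, d in zip(thetao, dz))
lossy = sum(RHO_CP * bitround(t, 9) * bitround(d, 9)
            for t, d in zip(thetao, dz))
rel_err = abs(lossy - exact) / exact
print(f"relative error in derived quantity at 9 keepbits: {rel_err:.2e}")
```

With 9 kept bits each input is accurate to roughly 2^-10 in relative terms, and the derived sum stays at a similar level in this toy case; the worry raised in the bullet above is that regridding plus multi-field arithmetic can amplify such errors, which is exactly what a real test would quantify.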
ping @juliettelavoie @geo-rao - note the suggestions (1-n) above are unrelated to CMOR development, but I figured it would be useful to co-locate this information so that discussions outside this repo can start to familiarize themselves with some of the dev discussions
from cmor.
I think several criteria will need to be considered before we specify an appropriate truncation of the mantissa of numbers. Here are a few:
- The precision of any observations of the field.
- The number of digits needed to recover at least 99% (or whatever limit we agree on) of the information content (see Klöwer et al., https://www.nature.com/articles/s43588-021-00156-2). We need to be careful, I think, in relying on the Klöwer approach, which was developed to serve the needs of weather prediction. After quickly skimming that article, it appears to me that they consider only the spatial variations of a field when determining its information content. There may be climate applications where the time evolution of a field is just as important, and simply resolving spatial variations adequately might not be sufficient to resolve temporal variations.
- The kinds of differences that will be performed on a field. As Martin pointed out, in diagnosing models, the spatial or temporal differences in a variable may be meaningful even if we don't believe the absolute values to anywhere near that precision. It may be difficult to determine ahead of time all of the differences (and similar types of diagnostic calculations) which will be performed and therefore what precision is needed.
from cmor.
@taylor13 agreed, it will be useful to catch these comments and redirect them to another place so that testing of the impacts on data usability and access can be undertaken.
This particular issue can be closed when we've ascertained how to expose these new netCDF functions through CMOR, and whether this will land in the CMOR3 or CMOR4 (future) release. If it is possible, and relatively little work, having these available in a soon-to-be-released version would be my preference.
from cmor.
It should be easy to expose nc_def_var_quantize and nc_def_var_zstandard in the same way we do with nc_def_var_deflate and cmor_set_deflate.
We might need to add a check for the version of netCDF4 being used to determine whether these functions are supported, similar to the following code (lines 25 to 42 in 047fd2c).
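As a sketch of the gating logic only: the real check lives in CMOR's C code, and this Python helper and its name are invented for illustration. The grounding fact is that nc_def_var_quantize and nc_def_var_zstandard first appeared in netCDF-C 4.9.0, so any wrapper would gate on that version, mirroring the existing deflate capability check:

```python
def netcdf_supports_quantize(version: str) -> bool:
    """Hypothetical helper: quantize and zstandard APIs
    (nc_def_var_quantize, nc_def_var_zstandard) first appeared in
    netCDF-C 4.9.0, so exposure through CMOR would be gated on that
    version, much like the existing deflate check."""
    major, minor = (int(p) for p in version.split(".")[:2])
    return (major, minor) >= (4, 9)

print(netcdf_supports_quantize("4.9.2"))  # True
print(netcdf_supports_quantize("4.8.1"))  # False
```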
from cmor.
Another piece of context here: Baker et al. added some data with lossy compression to an ensemble of CESM simulations. Several of the figures here highlight both the benefits of, and the issues created by, introducing it.
from cmor.
> Another piece of context here; Baker et al added some data with lossy compression to an ensemble of CESM simulations. Several of the figures here highlight the benefits of and issues created by introducing it
Nice catch! It looks like it does matter, depending on what you're calculating (Figures 5, 6, 9, 13). Their Section 6, "Lessons learned", is certainly worth reading; it notes that relationships between variables matter (their surface energy balance anomaly is derived from 4 separate variables), pointing out that commonly derived variables need to be a primary consideration, as do high-frequency/precipitation extremes and other very data-sensitive analyses. They also point out that the treatment of missing/fill values needs consideration. It could be useful to loop around with Allison and Gary (@strandwg) to see if follow-on analyses are available
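On the missing/fill-value point, a hedged sketch of why quantization must leave fill values untouched. Pure Python; the fill constant, helper names, and the simplified BitRound-style rounding are illustrative, not CMOR or netCDF internals (it is worth verifying how the netCDF quantize implementation handles the declared _FillValue):

```python
import struct

FILL = 1.0e20  # a common CMIP _FillValue for float fields (illustrative)

def bitround(x: float, keepbits: int) -> float:
    # Simplified round-half-up BitRound-style mantissa truncation
    # (float32, positive finite values only).
    ui = struct.unpack("<I", struct.pack("<f", x))[0]
    drop = 23 - keepbits
    if drop <= 0:
        return x
    ui = (ui + (1 << (drop - 1))) & ~((1 << drop) - 1) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", ui))[0]

def quantize_field(values, keepbits):
    # Leave fill values bit-for-bit intact so land/ocean masks survive;
    # quantizing them would silently unmask cells in downstream analyses.
    return [v if v == FILL else bitround(v, keepbits) for v in values]

field = [15.2, FILL, 3.7, FILL]   # e.g. SST with two land cells masked
out = quantize_field(field, 9)
print(out[1] == FILL and out[3] == FILL)  # True: the mask is preserved
```

If the fill value were quantized along with the data, equality tests against _FillValue would fail and masked cells would appear as (huge) valid values, which is one of the failure modes the Baker et al. lessons warn about.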
from cmor.
Thanks for the link to Baker. Looks like useful information.
from cmor.
Allison is definitely the person to talk to.
from cmor.
Hey, one consideration for sticking with lossy compression and skipping the lossless compression is that the latter is done in hardware on the storage layer, which is much more efficient and saves CPU cycles. Just another consideration; I didn't have a chance to raise it with Charlie yesterday.
from cmor.
> the latter is done in hardware on the storage layer and much more efficient that way, you are saving CPU cycles that way.
Is this on the GPFS hardware that you're talking about? So this is infrastructure/hardware dependent, right?
from cmor.
Typically the block-storage controller. All the US datacenters have it for sure; it's a fairly standard feature that has been around for a while, though admittedly not a guarantee that every center will enable it (but I would be surprised if they don't by now).
from cmor.
OTOH, the end user will end up with a larger file size if it is downloaded locally, so there would be a trade-off (but the downloading tool could then rewrite the file with L3 lossless compression to save space locally; I'd hope that by CMIP7 wget is a relic and most use a more robust client that would offer such a feature).
from cmor.
That's news to me. What type of lossless compression do these blockstorage controllers implement? Zstandard? DEFLATE? Or...?
from cmor.