climate-explorer-data-prep's Issues
Precompute GDD, HDD, FFD, Snowfall from GCM outputs
Precompute data files of the following variables, for each model output file available (raw GCM and downscaled). Descriptions below are taken from Plan2Adapt:
- Growing Degree-Days (GDDs) is a derived variable that indicates the amount of heat energy available for plant growth, useful for determining the growth potential of crops in a given area. It is calculated by multiplying the number of days that the mean daily temperature exceeded 5°C by the number of degrees above that threshold. For example, if a given day saw an average temperature of 8°C (3°C above the 5°C threshold), that day contributed 3 GDDs to the total. If a month had 15 such days, and the rest of the days had mean temperatures below the 5°C threshold, that month would result in 45 GDDs.
- Heating Degree-Days (HDDs) is a derived variable that can be useful for indicating energy demand (i.e. the need to heat homes, etc.). It is calculated by multiplying the number of days that the average (mean) daily temperature is below 18°C by the number of degrees below that threshold. For example, if a given day saw an average (mean) temperature of 14°C (4°C below the 18°C threshold), that day contributed 4 HDDs to the total. If a month had 15 such days, and the rest of the days had average (mean) temperatures above the 18°C threshold, that month would result in 60 HDDs.
- Frost-free days is a derived variable referring to the number of days that the minimum daily temperature stayed above 0°C, useful for determining the suitability of growing certain crops in a given area. The method used to compute this on a monthly basis is from (Wang et al, 2006).
- 'Precipitation as Snow' is a derived variable, calculated from GCM projected total precipitation (rain and snow) as well as temperature as per (Wang et al, 2006).
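As a rough illustration of the degree-day definitions above, here is a minimal sketch (not the production computation) that reproduces the worked examples from the descriptions:

```python
def degree_days(mean_temps, threshold, above=True):
    """Sum of daily exceedances of a threshold temperature (degC).

    above=True counts degrees above the threshold (GDD-style);
    above=False counts degrees below it (HDD-style).
    """
    if above:
        return sum(t - threshold for t in mean_temps if t > threshold)
    return sum(threshold - t for t in mean_temps if t < threshold)

# Worked examples from the text:
# 15 days at 8 degC (3 degC above the 5 degC GDD threshold) -> 45 GDDs
gdd_month = [8.0] * 15 + [4.0] * 15
assert degree_days(gdd_month, 5.0, above=True) == 45.0
# 15 days at 14 degC (4 degC below the 18 degC HDD threshold) -> 60 HDDs
hdd_month = [14.0] * 15 + [20.0] * 15
assert degree_days(hdd_month, 18.0, above=False) == 60.0
```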
References
Wang, T.L., Hamann, A., Spittlehouse, D.L. and Aitken, S.N., 2006. "Development of scale-free climate data for Western Canada for use in resource management", International Journal of Climatology, 26: 383-397. Details the ClimateBC empirical downscaling tool.
prsn data should be in cm, not mm
According to Trevor, precipitation as snow is normally given in centimeters, so having our prsn data in centimeters as well (instead of mm or kg/m2/day) will better communicate that we are dealing with snow.
Generate backwards-compatible frequency values
We've updated the frequency attribute of multi-year climatologies to include the operation used to aggregate data for the climatology, so files that might previously have had the frequency value `sClim` will now be generated with `sClimMean` or `sClimSD`.
I can see this posing a problem when we need to re-create an already-existing climatology. For example, if a mistake is discovered in the datafile `txxETCCDI_sClim_BCCAQ_MRI-CGCM3_historical-rcp85_r1i1p1_20700101-20991231` and it needs to be recreated, the recreation will have the unique_id `txxETCCDI_sClimMean_BCCAQ_MRI-CGCM3_historical-rcp85_r1i1p1_20700101-20991231`, and the indexer won't realize this file is an update of the previous one, possibly resulting in weird bugs when they are both present in the database.
Possible options:
- Modify the indexer to understand that a frequency value `whateverMean` matches a unique_id with the frequency value `whatever`
- Add a `generate-backwards-compatible-frequency-values` flag to the `generate_climos` script
- Do nothing, because this problem probably won't come up very often, and either of those solutions is more headache than it's worth
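For the first option, the matching could look something like this (a sketch; `base_frequency` is a hypothetical helper, not existing indexer code):

```python
import re

def base_frequency(frequency):
    """Strip a trailing aggregation-operation suffix, so that e.g.
    'sClimMean' and 'sClimSD' both reduce to the legacy value 'sClim'."""
    return re.sub(r'(Mean|SD)$', '', frequency)

def frequencies_match(new, old):
    """True if two frequency values refer to the same climatology interval."""
    return base_frequency(new) == base_frequency(old)
```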
Update nchelpers version in requirements
In order to use `generate_climos` for data files with `project_id: other`, nchelpers has to be at version 5.5.2 or newer.
generate_prsn not producing correct fill values
The files output by the snowfall generation script (`generate_prsn`) have a strange issue with their fill values. The values themselves are the same as those in the parent `pr` netCDF, `-32767`. However, they do not mask appropriately, displaying the number rather than an `_`. Furthermore, the metadata states that the fill value should be `32768`.
I assume the issue is occurring somewhere in `create_prsn_netcdf_from_source(...)`, but cannot say for sure. This needs to be explored further.
Non-monotonic longitudes in netCDF file
NetCDF files are sometimes generated with longitudes that go from 0 to 180 and then from -180 to 0, like this:
```
ncdump -v lon tasmax_aClim_CanESM2_historical_r3i1p1_19610101-19901231.nc
netcdf tasmax_aClim_CanESM2_historical_r3i1p1_19610101-19901231 {
dimensions:
...
// global attributes:
...
data:
 lon = 0, 2.8125, 5.625, 8.4375, 11.25, 14.0625, 16.875, 19.6875, 22.5,
    25.3125, 28.125, 30.9375, 33.75, 36.5625, 39.375, 42.1875, 45, 47.8125,
    50.625, 53.4375, 56.25, 59.0625, 61.875, 64.6875, 67.5, 70.3125, 73.125,
    75.9375, 78.75, 81.5625, 84.375, 87.1875, 90, 92.8125, 95.625, 98.4375,
    101.25, 104.0625, 106.875, 109.6875, 112.5, 115.3125, 118.125, 120.9375,
    123.75, 126.5625, 129.375, 132.1875, 135, 137.8125, 140.625, 143.4375,
    146.25, 149.0625, 151.875, 154.6875, 157.5, 160.3125, 163.125, 165.9375,
    168.75, 171.5625, 174.375, 177.1875, -180, -177.1875, -174.375,
    -171.5625, -168.75, -165.9375, -163.125, -160.3125, -157.5, -154.6875,
    -151.875, -149.0625, -146.25, -143.4375, -140.625, -137.8125, -135,
    -132.1875, -129.375, -126.5625, -123.75, -120.9375, -118.125, -115.3125,
    -112.5, -109.6875, -106.875, -104.0625, -101.25, -98.4375, -95.625,
    -92.8125, -90, -87.1875, -84.375, -81.5625, -78.75, -75.9375, -73.125,
    -70.3125, -67.5, -64.6875, -61.875, -59.0625, -56.25, -53.4375, -50.625,
    -47.8125, -45, -42.1875, -39.375, -36.5625, -33.75, -30.9375, -28.125,
    -25.3125, -22.5, -19.6875, -16.875, -14.0625, -11.25, -8.4375, -5.625,
    -2.8125 ;
}
```
There is no geographic discontinuity in this file, but there is a numerical discontinuity. Some software tools, such as ncWMS and CDO, have trouble working with the bounding boxes of polygons which span across the -180/180 longitude line, and have positive longitude minimums and negative longitude maximums.
ncWMS's response to requesting a map for an area that crosses the numerical discontinuity is:
<ServiceExceptionReport version="1.3.0" xsi:schemaLocation="http://www.opengis.net/ogc http://schemas.opengis.net/wms/1.3.0/exceptions_1_3_0.xsd"><ServiceException>
Invalid bounding box format
</ServiceException></ServiceExceptionReport>
Currently we anticipate no concrete repercussions from this issue, since all our use cases involve displaying maps of Canada, which does not go anywhere near the antemeridian. So it's low priority.
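If it ever does become a problem, one sketch of a fix is to roll the longitude axis so values increase monotonically from -180 (this assumes a global, evenly spaced grid, and that the data variable's last axis is longitude; variable names here are illustrative):

```python
import numpy as np

def make_lons_monotonic(lons, data):
    """Roll a 0..180 / -180..0 longitude axis into monotonic -180..180 order.

    `data` is assumed to have longitude as its last axis, so it is rolled
    by the same amount to stay aligned with the coordinate variable.
    """
    split = int(np.argmin(lons))  # index where -180 starts, e.g. in the dump above
    order = np.concatenate([np.arange(split, len(lons)), np.arange(0, split)])
    return lons[order], data[..., order]
```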
Excluded tests are broken
The test suite passes on Travis but will fail on local machines. This is because only 3 out of 5 test files are run by Travis (`test_units_helpers.py`, `test_update_metadata.py`, `test_decompose_flow_vectors.py`). Running `pytest` with the excluded files (`test_split_merged_climos.py`, `test_create_climo_files.py`) breaks with a collection of similar errors:
```
tests/test_split_merged_climos.py:69:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
dp/split_merged_climos.py:51: in split_merged_climos
    output_filepath = os.path.join(outdir, cf.cmor_filename)
/tmp/venv/lib/python3.6/site-packages/nchelpers/decorators.py:26: in wrapper
    res = func(*args, **kwargs)
/tmp/venv/lib/python3.6/site-packages/nchelpers/__init__.py:351: in __getattribute__
    value = super(CFDataset, self).__getattribute__(name)
/tmp/venv/lib/python3.6/site-packages/nchelpers/__init__.py:1568: in cmor_filename
    extension='.nc', **self._cmor_type_filename_components()
/tmp/venv/lib/python3.6/site-packages/nchelpers/__init__.py:1494: in _cmor_type_filename_components
    components.update(ensemble_member=self.ensemble_member)
/tmp/venv/lib/python3.6/site-packages/nchelpers/decorators.py:26: in wrapper
    res = func(*args, **kwargs)
/tmp/venv/lib/python3.6/site-packages/nchelpers/__init__.py:351: in __getattribute__
    value = super(CFDataset, self).__getattribute__(name)
/tmp/venv/lib/python3.6/site-packages/nchelpers/__init__.py:562: in ensemble_member
    components[component] = getattr(self.gcm, attr)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <nchelpers.CFDataset.AutoGcmPrefixedAttribute object at 0x7fb510b52748>, attr = 'realization'

    def __getattr__(self, attr):
        prefixed_attr = self._prefixed(attr)
        try:
            return getattr(self.dataset, prefixed_attr)
        except AttributeError:
            raise CFAttributeError(
                "Expected file to contain attribute '{}' but no such "
>               "attribute exists".format(self._prefixed(attr)))
E       nchelpers.exceptions.CFAttributeError: Expected file to contain attribute 'GCM__realization' but no such attribute exists
```
Generally, there is an attribute being accessed that does not exist, named `[some prefix]__realization`.
One of three things could be done:
- Fix the broken tests
- Remove the broken tests
- Comment/Explain why the tests are broken/excluded
Index gridded observations data files
Add tests and necessary changes to index gridded observations data files.
Update variable information file with new 'alt' variables
Background and details: pacificclimate/modelmeta#46
update_metadata: parse_ensemble_code should return np.int32
Correct computation of climatologies of min/max climate index variables
We have been computing climatologies for climate index variables involving minimum and maximum values incorrectly. Specifically, this is known to be a problem for the variables `rx1day` and `rx5day`. There may be others.
The climatology script presently computes multi-decadal monthly, seasonal, and annual means by applying the CDO operators `ymonmean`, `yseasmean`, and `timmean` (respectively) to the data file, regardless of the variable. All three of these operators take averages of the variable values both within the intra-year interval (month, season, year) and then across the multi-decadal period for each such interval.
For climate index variables such as `rx1day` and `rx5day`, it is incorrect to take averages within the intra-year interval. Instead, an operator appropriate to the type of value must be applied, namely the maximum of the values within the interval. For other variables, the intra-year interval operator may have to be different; e.g., for a variable that involves minimum values, the operator would likely have to take minimums within the intra-year interval.
(Note: As our base datasets for `rx1day` and `rx5day` have monthly resolution, the problem only arises for the seasonal and annual climatologies; mean and max are the same for a 1-item (1-month) interval. But we should fix it generically, because there is no guarantee that this won't be applied to some dataset with sub-monthly temporal resolution.)
The following is an excerpt from an email chain discussing the problem.
[[REG]] Then we take climatological means of both these variables, meaning we take 30-year means of monthly, seasonal, and annual averages of the variables. It's not clear to me which of these values is meaningful and/or useful to our clients.
[[TQM]] Almost. We don’t take 30-year means of monthly, seasonal, and annual AVERAGES rather we take 30-year means of monthly, seasonal, or annual MAXIMA
[[REG]] Maybe that's what we SHOULD be doing, but our data preparation script currently forms 30-year means of monthly, seasonal, and annual AVERAGES. Specifically, for a time series with delta-t = 1 month, the seasonal and annual means are 30-year means of the MEANS of that time series the indicated period (seasonal, annual) within each year. I think that you are saying (and it makes sense to me), that this should be 30-year means of the MAXIMA over the indicated period. If so, we need to change our climatological-values script to process these variables correctly. And we need to establish exactly which ones get this treatment, which get the "means of means" treatment, and which, if any get some other treatment.
[[TQM]] It’s fine to take a climatological mean in the sense of averaging over 30 years – as long as you’re doing that last. But annual RX1day is the maximum of the 12 monthly maxima, summer RX1day is the maximum of the 3 monthly maxima. If you are instead averaging where I’m saying that you should take a maximum that variable isn’t a thing – there’s nothing else we can call it. It should never be calculated that way and certainly never be displayed – it’s quite misleading since it’s similar but different to something we do produce on a regular basis. The reason that this thing that is now being computed doesn’t have a name is because from a user perspective it’s meaningless. RX1day June is the wettest day in June, RX1day July is wettest day in July, RX1day is wettest day in August. RX1day summer is wettest day in summer – that HAS to be the maximum of the three monthly maximums. The average of those 3 values just doesn’t measure anything since individual months in the same season can be quite different from each other. The average of all months’ RX1day values doesn’t tell us anything.
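The distinction discussed above can be illustrated with a toy example (illustrative numbers only, not PCIC code): two years of monthly `rx1day`-like values for one 3-month season.

```python
import numpy as np

# Rows are years, columns are the 3 months of one season (mm)
monthly = np.array([[10.0, 40.0, 10.0],   # year 1: wettest day was 40 mm
                    [20.0, 20.0, 50.0]])  # year 2: wettest day was 50 mm

# Current (wrong) treatment: multi-year mean of seasonal MEANS.
# Not a named quantity; it doesn't measure anything.
mean_of_means = monthly.mean(axis=1).mean()    # 25.0

# Correct treatment: multi-year mean of seasonal MAXIMA,
# i.e. the climatological seasonal rx1day.
mean_of_maxima = monthly.max(axis=1).mean()    # 45.0
```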
update_metadata: Add custom function returning long_name for variable name
Return the appropriate `long_name` value for CLIMDEX index names.
This could be extended more generically to other variable names (e.g., `tasmax`), but there's no apparent need for that case. But we do need it for CLIMDEX indices.
update_metadata: Need to correct attributes of CLIMDEX variables
This is an epic (overarching issue or user story).
Problem:
All of the CLIMDEX files formed from BCCAQ (ver 1) downscaled GCM data have the following two problems. It seems likely that other CLIMDEX files generated at PCIC may also.
- attribute `cell_methods` is absent or else = `"time: maximum"` (this is incorrect for many indices)
- attribute `long_name` = CLIMDEX index abbreviation, not the long name
In other BCCAQ CLIMDEX files there are likely similar problems.
Proposed solution:
Add features to `update_metadata` that make it possible to write updates along the following lines:

```
<dependent variable name>:
    cell_methods: = cell_method_for(<dependent variable name>)
    standard_name: = standard_name_for(<dependent variable name>)
```
This innocent bit of specification requires the following features:
- Access to dependent variable names: so that we can obtain the name of the particular CLIMDEX variable in this file
- Ability to specify an expression value as a key: so that we can specify the name of the particular CLIMDEX variable in this file (which varies from file to file) as the target (container) for the attribute updates
- Add custom functions (or possibly just dicts) mapping variable name to cell_methods and long_name
Degree day annual data should be the sum of degree day seasonal data
I generated degree-day climatologies, but I think the usual climatology approach, where an annual value is the mean of the seasonal values, is meaningless for an accumulative value like degree days. I think we want annual values to be the sum of seasonal values. I need to update `generate_climos` and redo that data.
update_metadata: Improve custom fn normalize_experiment_id
Normalize list separators: space after comma.
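A minimal sketch of the normalization (the function name is illustrative, not necessarily what exists in `update_metadata`):

```python
import re

def normalize_list_separators(value):
    """Ensure exactly one space after each comma in a list-valued attribute,
    e.g. 'historical,rcp26' -> 'historical, rcp26'."""
    return re.sub(r',\s*', ', ', value)
```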
Form climo means of streamflows
Currently we can form climatological means from files containing variables defined over spatiotemporal grids, such as the outputs of GCMs, but not from streamflow output files.
Streamflow, however, is not defined on a grid. A streamflow for a given spatial location is a time series at that location, called an outlet. The collection of outlets does not form a uniform grid -- instead they are distributed essentially at random. Outlets are addressed by an outlet index, with several dependent variables defining the spatial location, name, and streamflow at each outlet.
We need to handle this case too.
Precompute multi-model ensemble statistics
Precompute files of statistics across ensembles for all available variables.
Ensembles:
- All runs
- All available models
Statistics:
- minimum
- maximum
- ? average
- percentiles:
- 10th
- 25th
- 50th (median)
- 75th
- 90th
Variables:
- tasmin
- tasmax
- pr
- CLIMDEX indices (all?)
Wrong units in datafiles
According to Trevor, the following variables have the wrong units:
Variable | Current Units | Correct Units |
---|---|---|
rp20pr | mm/day | mm |
rp50pr | mm/day | mm |
rp5pr | mm/day | mm |
sdiiETCCDI | mm/day | mm |
Units are extracted directly from the datafiles, so the solution should just be updating the affected datafiles and re-indexing them, perhaps also letting whoever generated the datafiles know that they ended up with the wrong units.
This is distinct from the issue of variables with scientifically correct, but non-user-friendly units.
Add a climatological periods argument to generate_climos
By default, the `generate_climos` script creates climatologies for all of the periods available in the input file. This is great when starting from scratch. However, there are use cases, when we're infilling climatologies (e.g., after a failure mid-script), where we want to generate some of the periods but not all.
We should add a command line flag (like the one that is already sketched out here) that allows the user to select only the periods they want. All by default.
update_metadata: Add for ... in ... iteration syntax
YAML syntax:

```
"for <variables> in <expression>":
    <update key>:
        <etc>: ...
```

Semantics: Execute the specified updates in a context that includes the value of the `<variables>` for each result of `<expression>`.
OK, it's Friday night, this is way over the top, but it is certainly doable.
The most general and elegant approach (for certain values of 'elegant') is to use `exec` to build a generator from the for expression, and then to iterate that generator and evaluate the subsidiary update directives in a context with the `<variables>` set according to the yielded values. Something like so:
```python
def make_for_generator(variables, expression):
    # `yield` is only legal inside a function body, so wrap the loop
    # in a generator definition, exec it, and call the result.
    namespace = {}
    exec(
        'def _gen():\n'
        '    for {v} in {e}:\n'
        '        yield ({v})\n'.format(v=variables, e=expression),
        {}, namespace,
    )
    return namespace['_gen']()

# ... parse YAML "for" key into variables, expression ...
for_generator = make_for_generator(variables, expression)
for vars in for_generator:
    execute_updates(updates, vars)
```
Isn't Python just fucking awesome?
update_metadata: Add expression evaluation for keys
Allow the key of an update specifier to be the value of an expression. The resulting key value is processed as if it were a literal.
YAML syntax: `"=<expression>": ...`
Add a time resolution argument to generate_climos
Like #74, when infilling climos, we may want to generate climatologies for one resolution of data (e.g. yearly), but not others (e.g. monthly). Let's make this configurable by the user, all by default.
generate_climos doesn't recognize all hydrologic model output variables
It needs to cover the following variables:
BASEFLOW
EVAP
GLAC_AREA_BAND
GLAC_MBAL_BAND
GLAC_OUTFLOW
PET_NATVEG
PREC
RAINF
RUNOFF
SNOW_MELT
SOIL_MOIST_TOT
SWE
SWE_BAND
TRANSP_VEG
Generate Precipitation as Snow
Create a script that will generate precipitation as snow data using precipitation, `tasmin` and `tasmax`. Ensure that the result has all the necessary metadata such that it can then be run through `generate_climos`.
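A minimal sketch of the partitioning idea, assuming a simple mean-temperature threshold (the actual script should follow the Wang et al. (2006) method referenced above, which is more sophisticated than this):

```python
import numpy as np

def naive_prsn(pr, tasmin, tasmax, threshold=0.0):
    """Crude precipitation-as-snow estimate: all precipitation falling on
    days whose mean temperature is below the threshold counts as snow.
    """
    tas = (tasmin + tasmax) / 2.0
    return np.where(tas < threshold, pr, 0.0)
```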
update_metadata delete operation fails
Bug in code, tries to delete value, not key. Oh the humanity.
update_metadata: Don't delete on rename from absent attribute
When reprocessing a file with the same updates, a rename will cause an already-renamed attribute to be deleted (because the renamed attribute is now missing and its value returns None, which in turn causes NetCDF4 to remove the target attribute). Don't do this. It's inconvenient and it doesn't have any useful application.
update_metadata: Add custom function to normalize experiment id values
Some files have non-standard values in the `experiment_id` attribute. We want to normalize those to the DRS. Provide a custom function for that, for use in expressions.
Multiple fill values in one file
An example file:
/storage/data/climate/downscale/CMIP5/BCCAQ/climdex/CNRM-CM5_historical+rcp26_r1i1p1/rx5dayETCCDI_mon_BCCAQ_CNRM-CM5_historical-rcp26_r1i1p1_19500116-21001216.nc
The fill value for rx5dayETCCDI is listed as `1e+20`:
```
$ ncdump -h rx5dayETCCDI_mon_BCCAQ+ANUSPLIN300+CNRM-CM5_historical+rcp26_r1i1p1_195001-210012.nc
float rx5dayETCCDI(time, lat, lon) ;
    rx5dayETCCDI:units = "mm" ;
    rx5dayETCCDI:_FillValue = 1.e+20f ;
    rx5dayETCCDI:long_name = "Monthly Maximum Consecutive 5-day Precipitation" ;
    rx5dayETCCDI:cell_methods = "time: maximum" ;
    rx5dayETCCDI:history = "Created by climdex.pcic 1.1.1 on Wed Jun 4 10:09:38 2014" ;
```
But large chunks of the array are filled with `2.945782e+34` instead. The backend yields the following not-very-graphable timeseries from this file:
```
{
  "units": "mm",
  "id": "rx5dayETCCDI_mon_BCCAQ_CNRM-CM5_historical-rcp26_r1i1p1_19500116-21001216",
  "data": {
    "1950-01-16T00:00:00Z": Infinity,
    "1950-02-14T12:00:00Z": Infinity,
    "1950-03-16T00:00:00Z": Infinity,
    "1950-04-15T12:00:00Z": Infinity,
    "1950-05-16T00:00:00Z": Infinity,
    "1950-06-15T12:00:00Z": Infinity,
    "1950-07-16T00:00:00Z": Infinity,
  }
}
```
Not entirely sure if this is a data prep issue or a backend issue.
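On the data-prep side, a quick way to quantify a stray secondary fill value like this (a numpy sketch; `mask_secondary_fill` is a hypothetical helper for investigation, not existing code):

```python
import numpy as np

def mask_secondary_fill(values, declared_fill, suspect_fill):
    """Mask both the declared fill value and a stray secondary one,
    and report what fraction of the array the stray value covers."""
    arr = np.ma.masked_values(values, declared_fill)
    stray = np.isclose(values, suspect_fill)
    arr = np.ma.masked_where(stray, arr)
    return arr, stray.mean()
```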
Generate Climatologies with Snowfall Data
Snowfall data needs to be added as a variable that can be accepted by `generate_climos`. Furthermore, `convert_pr_var_units(...)` needs to be extended to handle `prsn` data.
Distro does not install variable information data file
Wrong CDO operations for monthly counted datasets
When calculating climatologies from monthly counted datasets, the following error occurs:
`AttributeError: Unknown method 'seasum'!`
The method should be `seassum`.
Create (or modify) script to rename variables
Motivation: pacificclimate/modelmeta#46
Task: Script similar to (or extension of) `update_metadata` that can rename a variable in a NetCDF file.
Definitely tending towards extending `update_metadata`, which already contains 90% of the machinery necessary for a nice implementation of this. Should rename it to something like `update_netcdf`.
Add copy and function value features to update_metadata
This is in support of making existing CLIMDEX files indexable, by standardizing their metadata.
Add the following features to update_metadata:
- Copy assignment:
  - Copy the value of one attribute to another
  - Syntax: `<name1>: =<name2>`
  - Semantics: assign the value of the attribute named `<name2>` to the attribute named `<name1>`
- Function value assignment:
  - Assign an attribute a value computed by an arbitrary function of the value of another attribute
  - So far we can only see a need for passing the value of 1 other attribute, but if it is easy, extend to multiple arguments.
  - Don't handle constants as arguments.
  - Syntax: `<name1>: =<func>(<name2>, <name3>, ...)`
  - Semantics: assign to the attribute named `<name1>` the value of function `<func>` applied to the values of the attributes named `<name2>, <name3>, ...`. `<func>` is defined, by name, in the update_metadata code, and if we need to add functions, that's another PR.
- Functions:
  - `realization(ensemble_member)`: extract realization `m` from `r<m>i<n>p<l>` ensemble member code
  - `initialization_method(ensemble_member)`: extract initialization method `n` from `r<m>i<n>p<l>` ensemble member code
  - `physics_version(ensemble_member)`: extract physics version `l` from `r<m>i<n>p<l>` ensemble member code
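The three extraction functions could share a single regex over the `r<m>i<n>p<l>` code (a sketch; function names follow the list above, the helper is illustrative):

```python
import re

_RIP = re.compile(r'^r(\d+)i(\d+)p(\d+)$')

def _rip_component(ensemble_member, index):
    """Extract one numeric component of an r<m>i<n>p<l> ensemble member code."""
    match = _RIP.match(ensemble_member)
    if not match:
        raise ValueError('Invalid ensemble member code: {}'.format(ensemble_member))
    return int(match.group(index))

def realization(ensemble_member):
    return _rip_component(ensemble_member, 1)

def initialization_method(ensemble_member):
    return _rip_component(ensemble_member, 2)

def physics_version(ensemble_member):
    return _rip_component(ensemble_member, 3)
```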
Ensembles of Degree Day Data
It looks like the PCIC12 is only available for T & P right now.
We just had a question from a user about comparing Cooling Degree Days on the explorer to other tools and getting different values (partly because the other tools do a bad job of showing change from baseline, which we do a better job of, but CDD only shows up for individual models).
Is the PCIC12 for climdex indices in process, and do we have an ETA for it, or is there a stumbling block for computing it?
update_metadata: Add custom function returning cell_methods for variable name
Return the appropriate `cell_methods` value for CLIMDEX index names.
This could be extended more generically to other variable names (e.g., `tasmax`), but there's no apparent need for that case. But we do need it for CLIMDEX indices.
Directory name typo in process-climo-means.sh
process-climo-means.sh places things in the `climate_exporer_data_prep` directory instead of the `climate_explorer_data_prep` directory.
Frost Days - missing?
I don't seem to be able to find the frost days variable. There's freezing degree days but that isn't the same thing. FD counts the # of days below freezing.
dtrETCCDI has inconsistent units
The dtrETCCDI data uses both `degC` and `degrees_C` as units.
```
2019-10-13 20:19:56 [2085] [INFO] 172.18.0.1 - - [13/Oct/2019:20:19:56 +0000] "GET /api/data?ensemble_name=ce_files&model=CanESM2&variable=dtrETCCDI&emission=historical,+rcp85&timescale=yearly&time=0&area= HTTP/1.1" 500 291 "https://services.pacificclimate.org/pcex/app/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36"
2019-10-13 20:19:57 [2073] [ERROR] Exception on /api/data [GET]
Traceback (most recent call last):
  File "/app/ce/api/data.py", line 142, in data
    run_result = result[data_file_variable.file.run.name]
KeyError: 'r1i1p1'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 2292, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1815, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.6/dist-packages/flask_cors/extension.py", line 161, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1718, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.6/dist-packages/flask/_compat.py", line 35, in reraise
    raise value
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1813, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1799, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/app/ce/views.py", line 11, in api_request
    return ce.api.call(db.session, *args, **kwargs)
  File "/app/ce/api/__init__.py", line 75, in call
    rv = func(session, **args)
  File "/app/ce/api/data.py", line 147, in data
    data_file_variable.file.run, variable),
  File "/app/ce/api/util.py", line 31, in get_units_from_run_object
    raise Exception("File list {} does not have consistent units {}".format(run.files, units))
Exception: File list [<modelmeta.v2.DataFile object at 0x7f59c0f395f8>, <modelmeta.v2.DataFile object at 0x7f59c0f39668>, <modelmeta.v2.DataFile object at 0x7f59c0f39780>, <modelmeta.v2.DataFile object at 0x7f59c0f39898>, <modelmeta.v2.DataFile object at 0x7f59c0f399b0>, <modelmeta.v2.DataFile object at 0x7f59c0f39ac8>, <modelmeta.v2.DataFile object at 0x7f59c0f39be0>, <modelmeta.v2.DataFile object at 0x7f59c0f39cf8>, <modelmeta.v2.DataFile object at 0x7f59c0f39e10>, <modelmeta.v2.DataFile object at 0x7f59c0f39f28>, <modelmeta.v2.DataFile object at 0x7f59c0f42080>, <modelmeta.v2.DataFile object at 0x7f59c0f42198>, <modelmeta.v2.DataFile object at 0x7f59c0f422b0>, <modelmeta.v2.DataFile object at 0x7f59c0f424e0>, <modelmeta.v2.DataFile object at 0x7f59c0f42668>, <modelmeta.v2.DataFile object at 0x7f59c0f427f0>, <modelmeta.v2.DataFile object at 0x7f59c0f42978>, <modelmeta.v2.DataFile object at 0x7f59c0f42b00>, <modelmeta.v2.DataFile object at 0x7f59c0f42c88>......... File object at 0x7f59c1245748>, <modelmeta.v2.DataFile object at 0x7f59c1245eb8>, <modelmeta.v2.DataFile object at 0x7f59c1245358>, <modelmeta.v2.DataFile object at 0x7f59c1245860>]
does not have consistent units {'degC', 'degrees_C'}
```
Fix the data to be consistent.
- generate new climatologies
- upload new data to compute canada
- add new data to ncWMS
- replace old data with new data in database
generate multi year means for annual-only climdex indices
There are 480 climdex datasets that are annual-only non-climatology datasets in active use by Climate Explorer.
Historically, we did not support annual-only climatologies, but we do now, and our analysis tools are much nicer for climatologies than non-climatologies, so it makes sense to generate climatologies and replace the non-MYM datasets with them in the database.
```sql
SELECT DISTINCT
    data_files.unique_id
FROM
    ce_meta.ensemble_data_file_variables,
    ce_meta.ensembles,
    ce_meta.data_files,
    ce_meta.time_sets,
    ce_meta.data_file_variables
WHERE
    ensemble_data_file_variables.ensemble_id = ensembles.ensemble_id AND
    ensemble_data_file_variables.data_file_variable_id = data_file_variables.data_file_variable_id AND
    data_files.time_set_id = time_sets.time_set_id AND
    data_file_variables.data_file_id = data_files.data_file_id AND
    ensembles.ensemble_id > 13 AND
    time_sets.multi_year_mean = FALSE;
```
generate_climos should update the history metadata attribute
I have recently been trying to figure out which file a suspicious climatology was made from; it would be nice if generate_climos updated the `history` attribute with the command used to run it, similar to the way `cdo` does.
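A sketch of what that could look like (the helper name is illustrative; the timestamp format imitates CDO-style history lines):

```python
import datetime
import shlex

def updated_history(existing, argv):
    """Prepend a timestamped command line to a history attribute value,
    in the style of the history entries cdo writes."""
    stamp = datetime.datetime.now().strftime('%a %b %d %H:%M:%S %Y')
    line = '{}: {}'.format(stamp, ' '.join(shlex.quote(arg) for arg in argv))
    return line if not existing else '{}\n{}'.format(line, existing)

# Applying it to a file might look like (hypothetical usage, not existing code):
#   with netCDF4.Dataset(path, 'r+') as ds:
#       ds.history = updated_history(getattr(ds, 'history', ''), sys.argv)
```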
update_metadata: Add access to attributes of variables
Extend the local variables context with a dict of variables:
- variable name: `variables`
- value: dict of variables
  - key: variable name
  - value: dict of attributes

Example expression in updates file: `= variables['tasmax']['units']` => value of attribute `units` of variable `tasmax`
Standardize experiment strings
Climdex variables have experiment strings of the format "historical, rcp26", but downscaled GCM outputs use the format "historical,rcp26". This is a low-priority issue, since there's a workaround in the CE frontend.
update_metadata: Add access to dependent variable names
Extend the global variables context with function `dependent_var_names(arg)`.
Value: output of `CFDataset.dependent_var_names(arg)`.
Recalculate return period climatologies
The return period climatologies we are using in Climate Explorer do not follow Climate Explorer's conventions on time formatting. Every other annual climatology in Climate Explorer "assigns" the value for the entire climatology to a date in the central year of the climatology. The return period datasets, at least some of them, assign the value to the last day of the climatology.
The nicest way to solve this problem would be to see if Stephen has nominal versions of this data and generate our own climatologies from it.
I did write a script to supply missing time values for this data collection; a lot of the files had timestamps of 0, and units of "days since 01-01-01". The script may be defective, or the error may be in files the script wasn't run on.
Review cell_methods
Recent work has uncovered information that suggests that we may not have been setting `cell_methods` correctly in our files.
In particular,
- `cell_methods` in input data files frequently don't record the spacing (interval) of the original data. This may or may not be a real issue, but it does seem to have some relevance when we form climatological statistics, as they form the basis for the climo statistics' `cell_methods`.
- `cell_methods` are probably not correct for climatological outputs. The CF Metadata Conventions are very clear about what `cell_methods` values are considered permissible and correct for climatological statistics. We are not, I believe, following these.
Therefore: review the content of `cell_methods`, both what we receive in input files and what we generate for output files, and determine:
1. what they should be
   - possibly extending the CF Metadata Conventions if they do not seem to fit our case(s) properly -- but be skeptical of this impulse too
   - documenting this in detail for our cases, probably in PCIC Metadata Standards
2. what they currently are
3. how to handle the differences between (1) and (2), which may include:
   - rewriting file contents
   - updating modelmeta database contents
   - asking scientists to modify their data-generation code
   - ripping our collective hair out
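For orientation on item (1): the CF Conventions (section 7.4, Climatological Statistics) prescribe compound values of the form `time: method1 within years time: method2 over years` (or `within days` / `over days` for diurnal statistics). A small helper that composes such strings, purely as a sketch of the target format (the function name and defaults are mine, not part of any existing tool):

```python
def climo_cell_methods(within_method, over_method="mean", period="years"):
    """Compose a CF-style climatological cell_methods value.

    Follows the compound form of CF Conventions sec. 7.4, e.g.
    'time: mean within years time: mean over years' for a multi-year
    mean of annual means.
    """
    return (
        f"time: {within_method} within {period} "
        f"time: {over_method} over {period}"
    )


# A multi-year mean of annual means:
annual = climo_cell_methods("mean")
# A multi-day mean of daily minima (diurnal statistic):
diurnal = climo_cell_methods("minimum", period="days")
```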
ACCESS1-0 model outputs have not been indexed
While there are derived outputs from ACCESS1-0 in climate explorer (climdex indices and degree days), the ACCESS1-0 model output climatologies do not seem to be in the climate explorer database.
update_metadata cannot handle invalid blank attributes
I'm not sure if fixing this issue is actually possible.
I attempted to use the update_metadata script to remove an invalid global attribute from some netCDF files. Specifically, some files had a blank string for `global: history`, which results in all sorts of weird errors.
I wasn't able to remove the invalid attribute with update metadata, and got this traceback:
2018-12-20 16:17:39 INFO: Processing file: /storage/data/climate/downscale/BCCAQ2/bccaqv2_with_metadata/tasmin_day_BCCAQv2+ANUSPLIN300_BNU-ESM_historical+rcp45_r1i1p1_19500101-21001231.nc
2018-12-20 16:17:39 INFO: Global attributes:
Traceback (most recent call last):
File "climate-explorer-data-prep/scripts/update_metadata", line 31, in <module>
main(args)
File "/local_temp/lzeman/climate-explorer-data-prep/venv/lib64/python3.4/site-packages/dp/update_metadata.py", line 247, in main
process_updates(dataset, updates)
File "/local_temp/lzeman/climate-explorer-data-prep/venv/lib64/python3.4/site-packages/dp/update_metadata.py", line 227, in process_updates
apply_attribute_updates(dataset, target, update)
File "/local_temp/lzeman/climate-explorer-data-prep/venv/lib64/python3.4/site-packages/dp/update_metadata.py", line 202, in apply_attribute_updates
apply_attribute_updates(dataset, target, element)
File "/local_temp/lzeman/climate-explorer-data-prep/venv/lib64/python3.4/site-packages/dp/update_metadata.py", line 196, in apply_attribute_updates
modify_attribute(dataset, target, *attr_updates)
File "/local_temp/lzeman/climate-explorer-data-prep/venv/lib64/python3.4/site-packages/dp/update_metadata.py", line 181, in modify_attribute
return delete_attribute(target, name)
File "/local_temp/lzeman/climate-explorer-data-prep/venv/lib64/python3.4/site-packages/dp/update_metadata.py", line 145, in delete_attribute
if hasattr(target, name):
File "/local_temp/lzeman/climate-explorer-data-prep/venv/lib64/python3.4/site-packages/nchelpers/decorators.py", line 26, in wrapper
res = func(*args, **kwargs)
File "/local_temp/lzeman/climate-explorer-data-prep/venv/lib64/python3.4/site-packages/nchelpers/__init__.py", line 353, in __getattribute__
is_indirected, indirected_property = _indirection_info(value)
File "/local_temp/lzeman/climate-explorer-data-prep/venv/lib64/python3.4/site-packages/nchelpers/__init__.py", line 148, in _indirection_info
if isinstance(value, six.string_types) and value[0] == '@':
IndexError: string index out of range
As a workaround, I first set the attribute to a valid string with update_metadata. Then I ran update_metadata a second time to delete the attribute.
Not a very high priority, since there is a workaround.
update_metadata: Log more useful information
- Log only actions that are actually executed. E.g., don't log a rename that is excluded because the old metadata doesn't exist.
- Log result of expression evaluation.
Climatological time bounds should be closed intervals, not half-open
Currently, climatological time bounds are calculated as half-open intervals: the end date is computed as the day after the last day averaged (more specifically, hour 00:00 of that following day). Because of calendar variations, this is much simpler to compute, but the end date should in fact be the last day averaged.
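The difference between the two conventions is one day at the end bound. A minimal sketch, assuming a climatology ending 31 December of its last year and a standard (Gregorian) calendar; real model calendars (360-day, noleap) would need a calendar-aware date library instead of `datetime`:

```python
from datetime import date, timedelta


def climo_end_bounds(last_year):
    """Return (half_open_end, closed_end) for a climatology ending
    31 Dec of last_year.

    half_open_end: 00:00 of the day after the last averaged day
                   (the current, easier-to-compute convention).
    closed_end:    the last averaged day itself (the proposed fix).
    """
    last_day = date(last_year, 12, 31)  # last day included in the average
    return last_day + timedelta(days=1), last_day
```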
data corrupted by update_metadata script
I ran the update_metadata script on the giant BCCAQ2 files to rename some metadata attributes. This resulted in an error in the file data. Affected files have normal data for the first few thousand timesteps, but subsequently have a weird data offset, visible on maps as a displaced field.
My best guess at the mechanism is that the offset is caused by a failure to correctly move the data further down the file when the metadata header grows (longer attribute names?). Perhaps this is because these are netCDF Classic files of size 56 GB, and netCDF Classic is designed for files smaller than 2 GB.
You are, I think, allowed to have netCDF Classic files larger than 2 GB if all but one of the variables fits completely within the first 2 GB, which would be the case here. But that may be a grey area that some libraries don't handle well. Maybe only the first 2 GB of the data was "scooted down"?
Diagnose issue, and have update_metadata warn the user / refuse to run if it seems like it applies.
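A possible shape for that guard: refuse (or warn) before rewriting attributes of very large classic-format files. The format names below are the `data_model` strings reported by netCDF4-python; the 2 GB threshold and the policy itself are assumptions for this sketch, pending a real diagnosis.

```python
# Classic-family formats where growing the header forces data to be
# relocated within the file (suspected failure mode in this issue).
CLASSIC_FORMATS = {"NETCDF3_CLASSIC", "NETCDF3_64BIT_OFFSET"}
TWO_GB = 2 * 1024 ** 3  # assumed danger threshold


def should_refuse_header_rewrite(data_model, file_size):
    """Pre-flight check for update_metadata (sketch, names assumed).

    Returns True when the file is classic-format and large enough that
    a header-growing rewrite risks the data-offset corruption described
    above; the caller would then warn the user or refuse to run.
    """
    return data_model in CLASSIC_FORMATS and file_size > TWO_GB
```

In update_metadata this would be checked with the open dataset's `data_model` and `os.path.getsize()` before any attribute is added or renamed.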