
A collection of bias correction techniques written in Python - for climate sciences.

Home Page: https://python-cmethods.readthedocs.io/en/stable

License: GNU General Public License v3.0

bias-adjustment bias-correction climate-data climate-science delta-method linear-scaling variance-scaling quantile-delta-mapping quantile-mapping delta-change-method


python-cmethods


Welcome to python-cmethods, a powerful Python package designed for bias correction and adjustment of climate data. Built with a focus on ease of use and efficiency, python-cmethods offers a comprehensive suite of functions for applying bias correction methods to climate model simulations and observational datasets via a command-line interface and a Python API.

Please cite this project as described in https://zenodo.org/doi/10.5281/zenodo.7652755.

Table of Contents

  1. About
  2. Available Methods
  3. Installation
  4. CLI Usage
  5. Programming Interface Usage and Examples
  6. Notes
  7. Contributions
  8. References

1. About

Bias correction in climate research involves the adjustment of systematic errors or biases present in climate model simulations or observational datasets to improve their accuracy and reliability, ensuring that the data better represents actual climate conditions. This process typically involves statistical methods or empirical relationships to correct for biases caused by factors such as instrument calibration, spatial resolution, or model deficiencies.

Schematic representation of a bias adjustment procedure

Figure 1: Schematic representation of a bias adjustment procedure

python-cmethods empowers scientists to effectively address those biases in climate data, ensuring greater accuracy and reliability in research and decision-making processes. By leveraging cutting-edge techniques and seamless integration with popular libraries like xarray and Dask, this package simplifies the process of bias adjustment, even when dealing with large-scale climate simulations and extensive spatial domains.

In this way, modeled data that is, for example, too cold on average can be bias-corrected with any of the adjustment procedures included in this package.

For instance, modeled data may report values that are considerably colder than those of a reanalysis time series. To address this, an adjustment procedure can be employed. The figure below illustrates the observed, modeled, and adjusted values, revealing that the delta-adjusted time series ($T^{*DM}_{sim,p}$) is significantly more similar to the observational data ($T_{obs,p}$) than the raw model output ($T_{sim,p}$).

Temperature per day of year in modeled, observed and bias-adjusted climate data

Figure 2: Temperature per day of year in observed, modeled, and bias-adjusted climate data

The mathematical foundations supporting each bias correction technique implemented in python-cmethods are integral to the package, ensuring transparency and reproducibility in the correction process. Each method is accompanied by references to trusted publications, reinforcing the reliability and rigor of the corrections applied.

2. Available Methods

python-cmethods provides the following bias correction techniques:

  • Linear Scaling
  • Variance Scaling
  • Delta Method
  • Quantile Mapping
  • Detrended Quantile Mapping
  • Quantile Delta Mapping

Please refer to the official documentation for more information about these methods as well as sample scripts: https://python-cmethods.readthedocs.io/en/stable/

  • Except for variance scaling, all methods can be applied to both stochastic and non-stochastic climate variables. Variance scaling can only be applied to non-stochastic climate variables.

    • Non-stochastic climate variables are those that can be predicted with relative certainty based on factors such as location, elevation, and season. Examples of non-stochastic climate variables include air temperature, air pressure, and solar radiation.

    • Stochastic climate variables, on the other hand, are those that exhibit a high degree of variability and unpredictability, making them difficult to forecast accurately. Precipitation is an example of a stochastic climate variable because it can vary greatly in timing, intensity, and location due to complex atmospheric and meteorological processes.

  • Except for the detrended quantile mapping (DQM) technique, all methods can be applied to 1- and 3-dimensional data sets. The implementation of DQM to 3-dimensional data is still in progress.

  • Except for DQM, all methods can be applied using cmethods.adjust. Chunked data, e.g. for computation on a Dask cluster, is supported as well.

  • For any questions, please open an issue at https://github.com/btschwertfeger/python-cmethods/issues
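To illustrate the difference between the additive (`+`) and multiplicative (`*`) application mentioned above, here is a minimal NumPy sketch of linear scaling. This is a simplified illustration of the general idea, not the implementation used by python-cmethods:

```python
import numpy as np

def linear_scaling(obs, simh, simp, kind="+"):
    """Illustrative linear scaling: shift (additive) or scale (multiplicative)
    the scenario series by the mean bias of the control period.

    Simplified sketch only -- not the python-cmethods implementation.
    """
    obs, simh, simp = map(np.asarray, (obs, simh, simp))
    if kind == "+":  # non-stochastic variables such as temperature
        return simp + (obs.mean() - simh.mean())
    if kind == "*":  # stochastic variables such as precipitation
        return simp * (obs.mean() / simh.mean())
    raise ValueError(f"unknown kind: {kind!r}")

# toy example: the model runs 2 K too cold on average
obs = np.array([10.0, 12.0, 14.0])
simh = np.array([8.0, 10.0, 12.0])
simp = np.array([9.0, 11.0, 13.0])
print(linear_scaling(obs, simh, simp, kind="+"))  # [11. 13. 15.]
```

The additive form shifts the scenario series by the mean bias, while the multiplicative form rescales it, which avoids producing negative values for variables such as precipitation.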

3. Installation

For optimal compatibility on macOS, ensure that 'hdf5' and 'netcdf' are pre-installed using Homebrew (brew install hdf5 netcdf).

python3 -m pip install python-cmethods

4. CLI Usage

The python-cmethods package provides a command-line interface for applying various bias correction methods out of the box.

Keep in mind that, due to the wide variety of input data and pre-processing options, the CLI only provides a basic application of the implemented techniques. For special parameters, adjustments, and data preparation, please use the programming interface.

The parameters and their requirements can be listed by passing the --help option:

cmethods --help

The following shows the cmethods tool applied to the provided example data using the linear scaling approach:

cmethods \
  --obs examples/input_data/observations.nc \
  --simh examples/input_data/control.nc \
  --simp examples/input_data/scenario.nc \
  --method linear_scaling \
  --kind add \
  --variable tas \
  --group time.month \
  --output linear_scaling.nc

2024/04/08 18:11:12     INFO | Loading data sets ...
2024/04/08 18:11:12     INFO | Data sets loaded ...
2024/04/08 18:11:12     INFO | Applying linear_scaling ...
2024/04/08 18:11:15     INFO | Saving result to linear_scaling.nc ...

For applying a distribution-based bias correction technique, the following example may help:

cmethods \
  --obs examples/input_data/observations.nc \
  --simh examples/input_data/control.nc \
  --simp examples/input_data/scenario.nc \
  --method quantile_delta_mapping \
  --kind add \
  --variable tas \
  --quantiles 1000 \
  --output quantile_delta_mapping.nc

2024/04/08 18:16:34     INFO | Loading data sets ...
2024/04/08 18:16:35     INFO | Data sets loaded ...
2024/04/08 18:16:35     INFO | Applying quantile_delta_mapping ...
2024/04/08 18:16:35     INFO | Saving result to quantile_delta_mapping.nc ...

5. Programming Interface Usage and Examples

import xarray as xr
from cmethods import adjust

obsh = xr.open_dataset("input_data/observations.nc")
simh = xr.open_dataset("input_data/control.nc")
simp = xr.open_dataset("input_data/scenario.nc")

# adjust only one grid cell
ls_result = adjust(
    method="linear_scaling",
    obs=obsh["tas"][:, 0, 0],
    simh=simh["tas"][:, 0, 0],
    simp=simp["tas"][:, 0, 0],
    kind="+",
    group="time.month",
)

# adjust all grid cells
qdm_result = adjust(
    method="quantile_delta_mapping",
    obs=obsh["tas"],
    simh=simh["tas"],
    simp=simp["tas"],
    n_quantiles=1000,
    kind="+",
)

# to calculate the relative rather than the absolute change,
# '*' can be used instead of '+' (this is preferred when adjusting
# stochastic variables like precipitation)

It is also possible to adjust chunked data sets. Feel free to have a look into tests/test_zarr_dask_compatibility.py to get a starting point.

Notes:

  • For the multiplicative techniques a maximum scaling factor of 10 is defined. This can be changed by passing the optional parameter max_scaling_factor.
  • Except for detrended quantile mapping, all implemented techniques can be applied to single and multi-dimensional data sets by executing the cmethods.adjust function.
  • A Jupyter notebook applying all those methods is provided here: /examples/examples.ipynb
  • The example data is located at: /examples/input_data/*.nc

6. Notes

  • Computation in Python takes some time, so this is only for demonstration. When adjusting large datasets, you should either use chunked data (for example with a Dask cluster) or apply the command-line tool BiasAdjustCXX.
  • Formulas and references can be found in the implementations of the corresponding functions, on the bottom of the README.md and in the documentation.

Space for improvements

  • Since the scaling methods implemented so far scale by default over the mean values of the respective months, unrealistic long-term mean values may occur at the month transitions. This can be prevented by selecting group='time.dayofyear'. Alternatively, instead of scaling by long-term monthly means, a 31-day interval could be used, taking the 31 surrounding values across all years as the basis for calculating the mean values. This is not yet implemented here because the computation is too slow to be worthwhile in Python, but it is available in BiasAdjustCXX.
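The 31-day interval idea can be sketched as follows. `windowed_doy_means` is a hypothetical helper name; this is not part of python-cmethods (the technique is implemented in BiasAdjustCXX):

```python
import numpy as np

def windowed_doy_means(values, doys, window=31):
    """For each day of year, compute the mean over the `window` surrounding
    days pooled across all years (illustrative sketch only; assumes 365-day
    years without leap days)."""
    half = window // 2
    means = {}
    for doy in np.unique(doys):
        # circular distance so the window wraps around the year boundary
        dist = np.minimum(np.abs(doys - doy), 365 - np.abs(doys - doy))
        means[doy] = values[dist <= half].mean()
    return means

# two synthetic years where the value equals the day of year
doys = np.tile(np.arange(1, 366), 2)
values = doys.astype(float)
# away from the year boundary the windowed mean equals the day itself
print(windowed_doy_means(values, doys)[100])  # 100.0
```

Because each window overlaps its neighbours, the resulting reference means vary smoothly across day-of-year boundaries instead of jumping at month transitions.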

7. Contributions

… are welcome, but:

  • First check whether an existing issue or PR already addresses your problem or solution. If not, create an issue first, before creating a PR.
  • PRs that only fix typos, project configuration, CI, documentation, or style/formatting will be rejected. Please create an issue for that instead.
  • PRs must provide a reasonable, easy-to-understand, and maintainable solution for an existing problem. You may want to propose a solution when creating the issue, to discuss the approach before creating a PR.

8. References

  • Schwertfeger, Benjamin Thomas and Lohmann, Gerrit and Lipskoch, Henrik (2023) "Introduction of the BiasAdjustCXX command-line tool for the application of fast and efficient bias corrections in climatic research", SoftwareX, Volume 22, 101379, ISSN 2352-7110, (https://doi.org/10.1016/j.softx.2023.101379)
  • Schwertfeger, Benjamin Thomas (2022) "The influence of bias corrections on variability, distribution, and correlation of temperatures in comparison to observed and modeled climate data in Europe" (https://epic.awi.de/id/eprint/56689/)
  • Linear Scaling and Variance Scaling based on: Teutschbein, Claudia and Seibert, Jan (2012) "Bias correction of regional climate model simulations for hydrological climate-change impact studies: Review and evaluation of different methods" (https://doi.org/10.1016/j.jhydrol.2012.05.052)
  • Delta Method based on: Beyer, R. and Krapp, M. and Manica, A.: "An empirical evaluation of bias correction methods for palaeoclimate simulations" (https://doi.org/10.5194/cp-16-1493-2020)
  • Quantile and Detrended Quantile Mapping based on: Alex J. Cannon and Stephen R. Sobie and Trevor Q. Murdock "Bias Correction of GCM Precipitation by Quantile Mapping: How Well Do Methods Preserve Changes in Quantiles and Extremes?" (https://doi.org/10.1175/JCLI-D-14-00754.1)
  • Quantile Delta Mapping based on: Tong, Y., Gao, X., Han, Z. et al. "Bias correction of temperature and precipitation over China for RCM simulations using the QM and QDM methods". Clim Dyn 57, 1425–1443 (2021). (https://doi.org/10.1007/s00382-020-05447-4)
  • I'd like to express my gratitude to @riley-brady for initiating and contributing to the discussion on #47. I appreciate all the valuable suggestions provided throughout the implementation of the subsequent changes.

python-cmethods's People

Contributors: btschwertfeger, dependabot[bot]

python-cmethods's Issues

Split Quantile Mapping into Quantile Mapping and Detrended Quantile Mapping

Is your feature request related to a problem? Please describe.
The current implementation of the quantile mapping includes the detrended quantile mapping by setting the parameter "detrended" to "True".

Describe the solution you'd like
Since quantile mapping and detrended quantile mapping are quite different, and the current implementation leads to the detrended quantile mapping approach being overlooked, it would be better to split them into separate methods.

correct simulated daily precipitation using monthly observed precipitation

Thank you very much for the great tool. I have simulated the daily precipitation dataset and observed precipitation in monthly time scale. Is it possible to correct the daily data using monthly observation? If possible, can you please give an example to show how to do it using cmethods or BiasAdjustCXX ?
Many thanks
best,
zhongwang

Move from setup.py to pyproject.toml

Is your feature request related to a problem? Please describe.
Building packages using setup.py is deprecated, so the move to pyproject.toml should be a task for the future.

Optimization for `adjust_3d`

Hello! Thank you for a fantastic offering with this package. This has been very helpful to get some quantile mapping up and running quickly. As a thank you, I wanted to offer some optimization I did on gridded bias correction that you might find useful. I specialize in vectorizing/optimizing gridded data, but do not have the capacity right now to open a full PR.

I used cm.adjust_3d on a ~111x111 lat lon grid with 40,000 time steps. The progress bar estimated around 2.5 hours for this but I didn't run it in full. With the below implementation, it ran in 1 minute.

You need a dask cluster running and a dask dataset of course to reap those benefits, but the implementation will speed up in-memory datasets too.


import numpy as np
import xarray as xr
from cmethods import CMethods as cm


def quantile_map_3d(
    obs: xr.DataArray,
    simh: xr.DataArray,
    simp: xr.DataArray,
    n_quantiles: int,
    kind: str,
):
    """Quantile mapping vectorized for 3D operations."""

    def qmap(
        obs: xr.DataArray,
        simh: xr.DataArray,
        simp: xr.DataArray,
        n_quantiles: int,
        kind: str,
    ) -> np.ndarray:
        """Helper for apply ufunc to vectorize/parallelize the bias correction step."""
        return cm.quantile_mapping(
            obs=obs, simh=simh, simp=simp, n_quantiles=n_quantiles, kind=kind
        )

    result = xr.apply_ufunc(
        qmap,
        obs,
        simh,
        # Need to spoof a fake time axis since 'time' coord on full dataset is different
        # than 'time' coord on training dataset.
        simp.rename({"time": "t2"}),
        dask="parallelized",
        vectorize=True,
        # This will vectorize over the time dimension, so will submit each grid cell
        # independently
        input_core_dims=[["time"], ["time"], ["t2"]],
        # Need to denote that the final output dataset will be labeled with the
        # spoofed time coordinate
        output_core_dims=[["t2"]],
        kwargs={"n_quantiles": n_quantiles, "kind": kind},
    )

    # Rename to proper coordinate name.
    result = result.rename({"t2": "time"})

    # ufunc will put the core dimension to the end (time), so want to preserve original
    # order where time is commonly first.
    result = result.transpose(*obs.dims)
    return result

The nice thing about this is that it can handle 1D datasets without any issue. The limitation is they always have to be xarray objects. But it works with dask or in-memory datasets and any arbitrary dimensions as long as a labeled time dimension exists.

The other great thing is you could just implement the apply_ufunc wrapper to every single bias correction code without the need for a separate adjust_3d function. A user can pass in 1D or 2D+ data without any change in code.

Example:

obs = xr.DataArray(
    [[1, 2, 3, 4], [2, 3, 4, 5]],
    dims=["x", "time"],
    coords={"x": [0, 1], "time": pd.date_range("2023-10-25", freq="D", periods=4)},
).transpose("time", "x")
simh = xr.DataArray(
    [[2, 1, 5, 4], [3, 9, 1, 4]],
    dims=["x", "time"],
    coords={"x": [0, 1], "time": pd.date_range("2023-10-25", freq="D", periods=4)},
).transpose("time", "x")
simp = xr.DataArray(
    [[7, 9, 10, 14], [12, 13, 14, 15]],
    dims=["x", "time"],
    coords={"x": [0, 1], "time": pd.date_range("2040-10-25", freq="D", periods=4)},
).transpose("time", "x")

# 2D dataset
>>> quantile_map_3d(obs, simh, simp, 250, "*")
<xarray.DataArray (time: 4, x: 2)>
array([[5., 9.],
       [5., 9.],
       [5., 9.],
       [5., 9.]])
Coordinates:
  * x        (x) int64 0 1
  * time     (time) datetime64[ns] 2040-10-25 2040-10-26 2040-10-27 2040-10-28

# 1D dataset
>>> quantile_map_3d(obs.isel(x=0), simh.isel(x=0), simp.isel(x=0), 250, "*")
<xarray.DataArray (time: 4)>
array([5., 5., 5., 5.])
Coordinates:
    x        int64 0
  * time     (time) datetime64[ns] 2040-10-25 2040-10-26 2040-10-27 2040-10-28

Create a changelog

Is your feature request related to a problem? Please describe.
There should be a changelog file to present an overview of the historical changes.

Add a command-line interface

Providing a command-line interface could replace the examples/biasadjust.py script, so that one can for example install the python-cmethods package and call "cmethods adjust input.nc -o output.nc" (simplified) to apply the tool directly on the data without needing to build any scripts to apply the tool.

Add pre-commit

Is your feature request related to a problem? Please describe.
Just add pre-commit to standardize and check the code a bit.

Describe the solution you'd like
Use https://pre-commit.com/ in the ci

The tool fails when input time series include np.nan for distribution-based methods

Describe the bug
The tool is not able to process time series with only nan values. (quantile_mapping, quantile_delta_mapping, detrended_quantile_mapping)

To Reproduce
Import CMethods and run an adjustment using datasets containing only nan values.

The error is like:

Traceback (most recent call last):
  File "biasadjust.py", line 152, in <module>
    main()
  File "/Users/bts/opt/miniconda3/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/bts/opt/miniconda3/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/bts/opt/miniconda3/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/bts/opt/miniconda3/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "biasadjust.py", line 130, in main
    result = cm.adjust_3d(
  File "/Users/udo/bts/miniconda3/lib/python3.8/site-packages/cmethods/__init__.py", line 247, in adjust_3d
    result[lat, lon] = func(
  File "/Users/bts/opt/miniconda3/lib/python3.8/site-packages/cmethods/__init__.py", line 1167, in quantile_delta_mapping
    xbins = np.arange(global_min, global_max + wide, wide)
ValueError: arange: cannot compute length

Expected behavior
If one of the control-period time series (obs, simh) contains only NaN values, no adjustment can be made, and the functions should return the raw simp/scenario time series without further adjustment.
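The expected behavior could be implemented as a simple guard, sketched here with a hypothetical wrapper (not actual python-cmethods code):

```python
import numpy as np

def adjust_or_passthrough(obs, simh, simp, adjust_fn):
    """Return the raw scenario series when no adjustment is possible because
    a control-period series contains only NaN values (illustrative sketch)."""
    if np.all(np.isnan(obs)) or np.all(np.isnan(simh)):
        return simp
    return adjust_fn(obs, simh, simp)

# all-NaN control data: the scenario series is passed through unchanged
obs = np.full(4, np.nan)
simh = np.full(4, np.nan)
simp = np.array([1.0, 2.0, 3.0, 4.0])
result = adjust_or_passthrough(obs, simh, simp, lambda o, h, p: p * 0)
print(result)  # [1. 2. 3. 4.]
```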

adjust_3d forces group to be "time.month" if group is set to the default (None)

Describe the bug

The adjust_3d method always calls the bias correction methods with the group forced to time.month if no group was defined or the group is set to the default None.

This is not the desired behaviour, because it makes it impossible to apply bias corrections to 3-dimensional data sets without a grouping.

Solution
Just remove the forced grouping in the adjust_3d method.

Create a release workflow for dev and production releases

Is your feature request related to a problem? Please describe.
There should be a workflow (or more) that get triggered, when a new push via PR goes into the master that triggers the upload of the package to test.pypi.org. Also a workflow that gets triggered on a new release that uploads the package to the production PyPI should be added to improve the development processes.

Create a documentation

Is your feature request related to a problem? Please describe.
There should be a simple documentation, for example built using sphinx. This would be way better than just having a readme and large doc strings in the module.

KeyError when Quantile* Mapping and group != None

Describe the bug
When using the adjust_3d method with distribution-based bias correction techniques like quantile mapping or quantile delta mapping together with a group like time.month, an error occurs, since the grouping creates lists instead of xarray.Dataset objects.

Add python-cmethods to the conda registry

Is your feature request related to a problem? Please describe.
To reach a larger audience it makes sense to add this Python package to the conda registry. For this, the appropriate precautions should be taken and a corresponding upload workflow should be set up.

Multiplicative Quantile Delta Mapping is not applying scaling where the delta is infinite

Describe the bug
The multiplicative quantile delta mapping procedure produces nan values in some cases if the basis of delta is negative.

The delta can be described as:

$$\Delta(i) = \frac{X_{sim,p}(i)}{F^{-1}_{sim,h}\left\{F_{sim,p}\left[X_{sim,p}(i)\right]\right\}}$$

where
$F^{-1}_{sim,h}$ represents the inverse cumulative distribution function of the modeled data of the control period,
$F_{sim,p}$ represents the cumulative distribution function of the modeled data that is to be adjusted, and
$X_{sim,p}(i)$ is the value of a climate variable $X$ at time step $i$.

see: v1.0.0/cmethods/__init__.py#L1055-L1059

So if the basis is zero, the mentioned part returns a zero as the scaling factor instead of the maximum scaling factor (10 by default).

EDIT:

  • This also occurs in every place where the division is applied.
  • And if the numerator is zero, the value should remain zero, and not become NaN in some cases.

To Reproduce
A real world example would be: $X_{sim,p}(i)$ is the value of precipitation at time step $i$ and is 0.0001 mm. The value inserted into the cumulative distribution function may then return a precipitation value that equals zero, for example when the precipitation in the control data was much higher than in the data that is to be adjusted.

Expected behaviour
The change should not be zero (which would mean no change); instead, the maximum scaling factor should be applied.
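The expected clamping behaviour can be sketched like this (`clamped_mult_delta` is a hypothetical helper, not the python-cmethods implementation):

```python
import numpy as np

def clamped_mult_delta(numerator, denominator, max_scaling_factor=10.0):
    """Multiplicative delta with the safeguards discussed above
    (illustrative sketch):
    - zero numerator -> delta stays 0 (no change, not NaN)
    - zero denominator with nonzero numerator -> capped at the max factor
    - otherwise the plain ratio, clipped to +/- max_scaling_factor
    """
    num = np.asarray(numerator, dtype=float)
    den = np.asarray(denominator, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        delta = np.where(num == 0, 0.0, num / den)
    return np.clip(delta, -max_scaling_factor, max_scaling_factor)
```

With this guard, a zero denominator yields the maximum scaling factor rather than an infinite or NaN delta, and a zero numerator keeps the value unchanged.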

Additional context
python-cmethods v1.0.0
