ncar / ldcpy

Statistical and visual tools for gathering metrics and comparing Earth System Model data files. A common use case is comparing data that has been lossily compressed with the original data.

Home Page: https://ldcpy.readthedocs.io/

License: Apache License 2.0

Languages: Python 99.23%, Makefile 0.51%, Jupyter Notebook 0.26%
Topics: xarray, lossy-data-compression, zfp

ldcpy's Introduction

[Badges: GitHub Workflow CI status, code style, documentation status, PyPI, conda-forge]

Large Data Comparison for Python

ldcpy is a utility for gathering and plotting metrics from NetCDF or Zarr files using the Pangeo stack. It provides a number of statistical and visual tools for comparing Earth System Model data files, such as lossily compressed output against the original data.

AUTHORS

Alex Pinard, Allison Baker, Anderson Banihirwe, Dorit Hammerling

COPYRIGHT

2020 University Corporation for Atmospheric Research

LICENSE

Apache 2.0

Documentation and usage examples are available at https://ldcpy.readthedocs.io/.

Reference to ldcpy paper

  1. A. Pinard, D. M. Hammerling, and A. H. Baker. Assessing differences in large spatiotemporal climate datasets with a new Python package. In the 2020 IEEE International Workshop on Big Data Reduction, 2020. doi: 10.1109/BigData50022.2020.9378100.

Link to paper: https://doi.org/10.1109/BigData50022.2020.9378100

Installation

Ensure conda is up to date and create a clean Python (3.6+) environment:

conda update conda
conda create --name ldcpy python=3.8
conda activate ldcpy

Now install ldcpy:

conda install -c conda-forge ldcpy

Alternative Installation

Ensure pip is up to date, and your version of python is at least 3.6:

pip install --upgrade pip
python --version

Install cartopy using the instructions provided at https://scitools.org.uk/cartopy/docs/latest/installing.html.

Then install ldcpy:

pip install ldcpy
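
As a quick check that the install worked, the sketch below mirrors the plotting call used in this repository's tests (shown later on this page). Here ds is assumed to be an xarray dataset holding both the original and reconstructed data under the labels 'orig' and 'recon'; the keyword names may differ in newer ldcpy releases:

import ldcpy

# ds holds both data collections; see the tutorial notebook for how it is built
ldcpy.plot(
    ds,
    'PRECT',
    c0='orig',
    c1='recon',
    metric='mean',
    metric_type='ratio',
    plot_type='time_series',
)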

Accessing the tutorial

If you want access to the tutorial notebook, clone the repository (this will create a local repository in the current directory):

git clone https://github.com/NCAR/ldcpy.git

Start by enabling Hinterland for code completion and code hinting in Jupyter Notebook and then opening the tutorial notebook:

jupyter nbextension enable hinterland/hinterland
jupyter notebook

The tutorial notebook can be found in docs/source/notebooks/TutorialNotebook.ipynb. Feel free to gather your own metrics or create your own plots in this notebook!

Other example notebooks that use the sample data in this repository include PopData.ipynb and MetricsNotebook.ipynb.

The AWSDataNotebook grabs data from AWS, so it can be run on a laptop, with the caveat that the files are large.

The following notebooks assume that you are using NCAR's JupyterHub (https://jupyterhub.hpc.ucar.edu): LargeDataGladenotebook.ipynb, CAMNotebook.ipynb, and error_bias.ipynb.

Re-create notebooks with Pangeo Binder

Try the notebooks hosted in this repo on Pangeo Binder. Note that the session is ephemeral. Your home directory will not persist, so remember to download your notebooks if you make changes that you need to use at a later time!

Note: All example notebooks are in docs/source/notebooks (the easiest ones to start with in Binder are TutorialNotebook.ipynb and PopData.ipynb).


ldcpy's People

Contributors

allibco, andersy005, dependabot[bot], mnlevy1981, pinarda, pre-commit-ci[bot]


Forkers

kmpaul, mnlevy1981

ldcpy's Issues

Reduce package size

The limit on PyPI is 100 MB; the data files currently use 443 MB and the rest of the package is 3.1 MB. We probably want to remove data files like the CAM-SE data and reduce the number of time slices in the rest of the data. Also, rename the data directory to sample-data.

get package on PyPI

We want three ways to install: PyPI with pip, conda-forge with conda, and a dev install with conda and the environment-dev.yml file. We may run into an issue with cartopy.

OrderedDict in error metrics?

It would be nice if print_stats() printed statistics in the same order every time; I think using an OrderedDict underneath the JSON will accomplish that.
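
A minimal sketch of the idea (the key names here are illustrative, not the actual print_stats output): build the stats in a fixed insertion order and serialize them; json.dumps preserves an OrderedDict's key order.

from collections import OrderedDict
import json

stats = OrderedDict()
stats['mean'] = 1.23           # insert keys in the order they should print
stats['variance'] = 0.45
stats['max_abs_error'] = 0.067
print(json.dumps(stats, indent=2))  # keys always appear in the same order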

get ldcpy package on conda-forge

We want three ways to install: PyPI with pip, conda-forge with conda, and a dev install with conda and the environment-dev.yml file.

Plots with slider bars

One group in the hackathon had interactive graphics that allowed users to hover over a map and get lat/lon-specific data.

Add support for se datasets

This will require reworking some of the metrics functionality, and adding some new plot functionality as well.

Interactive Mapping

Add ability to select point on spatial map and get time-series plot for that point. (See Plots with Slider Bars issue)

normalize error for plots and metrics

Rather than (or in addition to?) absolute RMSE, it would be nice to normalize it based on the field being analyzed. This could also carry over to plot.mean_error().
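
One possible normalization, as a sketch (a hypothetical helper, not part of the ldcpy API); dividing by the field mean instead of the dynamic range would be another reasonable choice:

import numpy as np

def normalized_rmse(orig, recon):
    # RMSE scaled by the dynamic range of the original field
    orig = np.asarray(orig, dtype=float)
    recon = np.asarray(recon, dtype=float)
    rmse = np.sqrt(np.mean((orig - recon) ** 2))
    return rmse / (orig.max() - orig.min())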

New Metrics

From the technote, we want the ability to plot:

  • Pooled variance ratio (fig 16)
  • Error lag-1 correlations (fig 19; see the sketch after this list)
  • Amplitude of the annual error harmonic (fig 18)
  • Min/max MAE (currently waiting on xarray version 0.15.2 for the idxmax() function)
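
For the lag-1 item, a sketch of the computation (again a hypothetical helper, not ldcpy API): the lag-1 autocorrelation of the pointwise error series between the original and reconstructed data.

import numpy as np

def error_lag1_correlation(orig, recon):
    # error series, centered, then the lag-1 autocorrelation estimate
    e = np.asarray(orig, dtype=float) - np.asarray(recon, dtype=float)
    e = e - e.mean()
    return np.sum(e[:-1] * e[1:]) / np.sum(e * e)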

Update ReadMe

Update README.md with a complete list of steps for a development install.

CircleCI builds failing occasionally

This test works fine on a local machine and in GitHub workflows, so it is probably not an issue with the code.

Output:

test_subset_lat_lon_ratio_time_series - tests.test_plot.TestPlot
tests/test_plot.py
self = <tests.test_plot.TestPlot testMethod=test_subset_lat_lon_ratio_time_series>

def test_subset_lat_lon_ratio_time_series(self):
    ldcpy.plot(
        ds2,
        'PRECT',
        c0='orig',
        metric='mean',
        c1='recon',
        metric_type='ratio',
        group_by=None,
        subset='first50',
        lat=44.76,
        lon=-93.75,
        plot_type='time_series',
    )

tests/test_plot.py:140:


ldcpy/plot.py:622: in plot
mp.time_series_plot(plot_data_c0, title_c0)
ldcpy/plot.py:355: in time_series_plot
self._label_offset(ax)
ldcpy/plot.py:174: in _label_offset
ax.figure.canvas.draw()
/opt/conda/lib/python3.7/site-packages/matplotlib/backends/backend_agg.py:393: in draw
self.figure.draw(self.renderer)
/opt/conda/lib/python3.7/site-packages/matplotlib/artist.py:38: in draw_wrapper
return draw(artist, renderer, *args, **kwargs)
/opt/conda/lib/python3.7/site-packages/matplotlib/figure.py:1736: in draw
renderer, self, artists, self.suppressComposite)
/opt/conda/lib/python3.7/site-packages/matplotlib/image.py:137: in _draw_list_compositing_images
a.draw(renderer)
/opt/conda/lib/python3.7/site-packages/matplotlib/artist.py:38: in draw_wrapper
return draw(artist, renderer, *args, **kwargs)
/opt/conda/lib/python3.7/site-packages/cartopy/mpl/geoaxes.py:479: in draw
return matplotlib.axes.Axes.draw(self, renderer=renderer, **kwargs)
/opt/conda/lib/python3.7/site-packages/matplotlib/artist.py:38: in draw_wrapper
return draw(artist, renderer, *args, **kwargs)
/opt/conda/lib/python3.7/site-packages/matplotlib/axes/_base.py:2630: in draw
mimage._draw_list_compositing_images(renderer, self, artists)
/opt/conda/lib/python3.7/site-packages/matplotlib/image.py:137: in _draw_list_compositing_images
a.draw(renderer)
/opt/conda/lib/python3.7/site-packages/matplotlib/artist.py:38: in draw_wrapper
return draw(artist, renderer, *args, **kwargs)
/opt/conda/lib/python3.7/site-packages/cartopy/mpl/feature_artist.py:155: in draw
geoms = self._feature.intersecting_geometries(extent)
/opt/conda/lib/python3.7/site-packages/cartopy/feature/__init__.py:302: in intersecting_geometries
return super(NaturalEarthFeature, self).intersecting_geometries(extent)
/opt/conda/lib/python3.7/site-packages/cartopy/feature/__init__.py:110: in intersecting_geometries
return (geom for geom in self.geometries() if
/opt/conda/lib/python3.7/site-packages/cartopy/feature/__init__.py:287: in geometries
geometries = tuple(shapereader.Reader(path).geometries())
/opt/conda/lib/python3.7/site-packages/cartopy/io/shapereader.py:166: in geometries
shape = self._reader.shape(i)
/opt/conda/lib/python3.7/site-packages/shapefile.py:854: in shape
return self.__shape()


self = <shapefile.Reader object at 0x7f994232e690>

def __shape(self):
    """Returns the header info and geometry for a single shape."""
    f = self.__getFileObj(self.shp)
    record = Shape()
    nParts = nPoints = zmin = zmax = mmin = mmax = None
    (recNum, recLength) = unpack(">2i", f.read(8))
    # Determine the start of the next record
    next = f.tell() + (2 * recLength)
    shapeType = unpack("<i", f.read(4))[0]
    record.shapeType = shapeType
    # For Null shapes create an empty points list for consistency
    if shapeType == 0:
        record.points = []
    # All shape types capable of having a bounding box
    elif shapeType in (3,5,8,13,15,18,23,25,28,31):
        record.bbox = _Array('d', unpack("<4d", f.read(32)))
    # Shape types with parts
    if shapeType in (3,5,13,15,23,25,31):
        nParts = unpack("<i", f.read(4))[0]
    # Shape types with points
    if shapeType in (3,5,8,13,15,18,23,25,28,31):
        nPoints = unpack("<i", f.read(4))[0]
    # Read parts
    if nParts:
        record.parts = _Array('i', unpack("<%si" % nParts, f.read(nParts * 4)))
    # Read part types for Multipatch - 31
    if shapeType == 31:
        record.partTypes = _Array('i', unpack("<%si" % nParts, f.read(nParts * 4)))
    # Read points - produces a list of [x,y] values
    if nPoints:
      flat = unpack("<%sd" % (2 * nPoints), f.read(16*nPoints))

E struct.error: unpack requires a buffer of 432 bytes

/opt/conda/lib/python3.7/site-packages/shapefile.py:777: error

Turn SampleNotebook into TutorialNotebook

This will require an overview section, links to the documentation, explanations of the plotting options (especially plot_type and metric_type), a list of the required plot arguments, and explanations of what the metadata commands (print_stats, ds) do.

Can we use open_mfdataset?

utils.open_datasets() was built on older code that called xr.open_dataset() several times, but those calls can probably be replaced with a single xr.open_mfdataset() call... we just need to make sure everything is concatenated correctly (see the sketch below).
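
A sketch of the replacement (the file paths and dimension name are illustrative): open all files at once and stack them along a new collection dimension instead of calling xr.open_dataset() per file.

import xarray as xr

paths = ['orig.nc', 'recon.nc']  # placeholder file names
# concatenate the files along a new 'collection' dimension
ds = xr.open_mfdataset(paths, combine='nested', concat_dim='collection')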

Most of the functions should probably be put into a class to avoid passing so many parameters around all the time. Also, we probably want to add some sort of checking of the input parameters to plot() to make sure the combination of parameters is valid. Alternatively, write time_series_plot, spatial_plot, etc. functions that fix some parameters and then call plot() (one version of that wrapper idea is sketched below).

Originally posted by @pinarda in #40 (comment)
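
A sketch of the wrapper idea, assuming the keyword names from the test output above (they may differ across ldcpy versions):

import ldcpy

def time_series_plot(ds, varname, **kwargs):
    # fix the plot type and forward everything else to plot()
    kwargs['plot_type'] = 'time_series'
    return ldcpy.plot(ds, varname, **kwargs)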

Add units to colorbar

  1. There is a units property in the dataset, but the metrics array returned by a call to get_metrics does not have a units property, so we need to add that property before we return the array.

  2. Then, add the units to the color bar title.
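
A minimal sketch of the two steps with synthetic data (get_metrics itself is not called here; metrics is a stand-in for its return value):

import numpy as np
import xarray as xr
import matplotlib.pyplot as plt

da = xr.DataArray(np.random.rand(4, 4), attrs={'units': 'm/s'})
metrics = da - da.mean()                    # stand-in for a get_metrics result
metrics.attrs['units'] = da.attrs['units']  # step 1: carry the units over

mesh = plt.pcolormesh(metrics.values)
plt.colorbar(mesh, label=metrics.attrs['units'])  # step 2: label the colorbar
plt.show()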
