
xcdat / xcdat-validation


The xCDAT validation repository for exploring, prototyping, and testing feature ideas.

Home Page: https://xcdat.readthedocs.io/en/latest/

Jupyter Notebook 90.88% Python 0.77% HTML 6.36% Shell 1.98%
jupyter-notebook python climate-data xarray climate-science climate-data-analysis cdat xcdat

xcdat-validation's Introduction


Xarray Climate Data Analysis Tools


xCDAT is an extension of xarray for climate data analysis on structured grids. It serves as a modern successor to the Community Data Analysis Tools (CDAT) library.

Useful links: Documentation | Code Repository | Issues | Discussions | Releases | Mailing List

Project Motivation

The goal of xCDAT is to provide generalizable features and utilities for simple and robust analysis of climate data. xCDAT's design philosophy is focused on reducing the overhead required to accomplish certain tasks in xarray. xCDAT aims to be compatible with structured grids that are CF-compliant (e.g., CMIP6). Some key xCDAT features are inspired by or ported from the core CDAT library, while others leverage powerful libraries in the xarray ecosystem (e.g., xESMF, xgcm, cf_xarray) to deliver robust APIs.

The xCDAT core team's mission is to provide a maintainable and extensible package that serves the needs of the climate community in the long term. We are excited to be working on this project and hope to have you on board!

Getting Started

The best resource for getting started is the xCDAT documentation website. Our documentation provides general guidance for setting up xCDAT in an Anaconda environment on your local computer or in an HPC/Jupyter environment. We also include an API Overview and Gallery to highlight xCDAT functionality.

Community

xCDAT is a community-driven open source project. We encourage discussion on topics such as version releases, feature suggestions, and architecture design on the GitHub Discussions page.

Subscribe to our mailing list for news and announcements related to xCDAT, such as software version releases or future roadmap plans.

Please note that xCDAT has a Code of Conduct. By participating in the xCDAT community, you agree to abide by its rules.

Contributing

We welcome and appreciate contributions to xCDAT. Users and contributors can view and open issues on our GitHub Issue Tracker.

For more instructions on how to contribute, please check out our Contributing Guide.

Features

  • Extension of xarray's open_dataset() and open_mfdataset() with post-processing options
    • Generate bounds for axes supported by xcdat if they don't exist in the Dataset
    • Optional selection of single data variable to keep in the Dataset (bounds are also kept if they exist)
    • Optional decoding of time coordinates
      • In addition to CF time units, also decodes common non-CF time units ("months since ...", "years since ...")
    • Optional centering of time coordinates using time bounds
    • Optional conversion of longitudinal axis orientation between [0, 360) and [-180, 180)
  • Temporal averaging
    • Time series averages (single snapshot and grouped), climatologies, and departures
    • Weighted or unweighted
    • Optional seasonal configuration (e.g., DJF vs. JFD, custom seasons)
  • Geospatial weighted averaging
    • Supports rectilinear grid
    • Optional specification of regional domain
  • Horizontal structured regridding
    • Supports rectilinear and curvilinear grids
    • Extends the xESMF horizontal regridding API
    • Python implementation of regrid2 for handling Cartesian latitude-longitude grids
  • Vertical structured regridding
    • Supports rectilinear and curvilinear grids
    • Extends the xgcm vertical regridding API
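
The non-CF time units mentioned in the list above ("months since ...") can be illustrated with a minimal standard-library sketch. This is not xCDAT's implementation: the function name `decode_months_since` is hypothetical, and the sketch assumes integer offsets and a reference date whose day of month exists in every month (for example, the 1st).

```python
from datetime import datetime


def decode_months_since(offsets, reference):
    """Convert integer month offsets from a reference date into datetimes.

    Hypothetical helper for illustration only; assumes whole-month offsets.
    """
    decoded = []
    for offset in offsets:
        # Count months from year 0, then split back into year and month index.
        total = reference.year * 12 + (reference.month - 1) + offset
        year, month_index = divmod(total, 12)
        decoded.append(reference.replace(year=year, month=month_index + 1))
    return decoded


if __name__ == "__main__":
    times = decode_months_since([0, 1, 12, 13], datetime(2000, 1, 1))
    print([t.strftime("%Y-%m") for t in times])
```

Calendar months have unequal lengths, which is why such offsets cannot be treated as a fixed number of days. The sketch steps through the calendar instead.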

Things We Are Striving For

  • xCDAT supports CF-compliant datasets, but will also strive to support datasets with common non-CF-compliant metadata (e.g., time units in "months since ..." or "years since ...")
    • xCDAT leverages cf_xarray to interpret CF attributes on xarray objects
    • Refer to CF Convention for more information on CF attributes
  • Robust handling of dimensions and their coordinates and coordinate bounds
    • Coordinate variables are retrieved with cf_xarray using CF axis names or coordinate names found in xarray object attributes. Refer to Metadata Interpretation for more information.
    • Bounds are retrieved with cf_xarray using the "bounds" attr
    • Ability to operate on both longitudinal axis orientations, [0, 360) and [-180, 180)
  • Support for parallelism using dask where it is both possible and makes sense
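
As a rough illustration of the two longitudinal axis orientations named above, the following standard-library sketch converts longitudes between [0, 360) and [-180, 180) and re-sorts the associated values. The function names are hypothetical and this is not how xCDAT implements the conversion; it only shows the underlying arithmetic.

```python
def to_180(lon):
    """Map a longitude in degrees onto [-180, 180)."""
    return (lon + 180.0) % 360.0 - 180.0


def to_360(lon):
    """Map a longitude in degrees onto [0, 360)."""
    return lon % 360.0


def swap_orientation(lons, values, target="180"):
    """Convert longitudes to the target orientation and sort ascending.

    Values are reordered together with their longitudes so that each
    value stays attached to its original grid point.
    """
    convert = to_180 if target == "180" else to_360
    pairs = sorted(zip((convert(lon) for lon in lons), values))
    return [lon for lon, _ in pairs], [value for _, value in pairs]


if __name__ == "__main__":
    lons, values = swap_orientation([0.0, 90.0, 180.0, 270.0], ["a", "b", "c", "d"])
    print(lons, values)
```

Sorting after the conversion keeps the axis monotonic, which most downstream operations on a longitude coordinate expect.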

Releases

xCDAT (released as xcdat) follows a feedback-driven release cycle using continuous integration/continuous deployment. Software releases are performed based on the bandwidth of the development team, the needs of the community, and the priority of bug fixes or feature updates.

After a release is published on GitHub Releases, the corresponding xcdat package version is usually available to download from conda-forge within a day.

Subscribe to our mailing list to stay notified of new releases.

Useful Resources

We highly encourage you to check out the resources below to learn more about Xarray and Xarray usage in climate science!

Projects Using xCDAT

xCDAT is actively being integrated as a core component of the Program for Climate Model Diagnosis and Intercomparison (PCMDI) Metrics Package and the Energy Exascale Earth System Model Diagnostics (E3SM Diags) Package. xCDAT is also included in the E3SM Unified Anaconda Environment that is deployed on various U.S. Department of Energy supercomputers to run E3SM software tools.

Acknowledgement

xCDAT is jointly developed by scientists and developers from the Energy Exascale Earth System Model (E3SM) Project and the Program for Climate Model Diagnosis and Intercomparison (PCMDI). The work is performed for the E3SM project, which is sponsored by the Earth System Model Development (ESMD) program, and the Simplifying ESM Analysis Through Standards (SEATS) project, which is sponsored by the Regional and Global Model Analysis (RGMA) program. ESMD and RGMA are programs of the Earth and Environmental Systems Sciences Division (EESSD) in the Office of Biological and Environmental Research (BER) within the Department of Energy's Office of Science.

Contributors

Thank you to all of our contributors!


License

xCDAT is licensed under the terms of the Apache License (Version 2.0 with LLVM exception).

All new contributions must be made under the Apache-2.0 with LLVM exception license.

See LICENSE and NOTICE for details.

SPDX-License-Identifier: Apache-2.0

LLNL-CODE-846944

xcdat-validation's People

Contributors

chengzhuzhang, jasonb5, lee1043, pochedls, tomvothecoder


xcdat-validation's Issues

Investigate landsea mask in xarray

Note:

  • How can a land-sea mask be used in xarray?
  • CDAT can generate an estimated land-sea mask. Should we consider adding the same capability for xarray?

Compare time operations in xarray and CDAT

We should compare basic time operations in CDAT and xarray. These include:

  • Conversion between different time units or types (e.g., between calendars or datetime/nctime/component time)
  • Functionality to create seasonal and annual averages and anomalies
  • Functionality to subset data based on time
  • Handling of weird data (e.g., missing data or time periods with overlapping data)
  • How time is handled in plotting routines (e.g., do we need to modify the calendar to plot with Matplotlib?)
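
To make the "seasonal averages and anomalies" item concrete, here is a minimal standard-library sketch of the idea. It is an unweighted illustration only (it ignores differing month lengths), uses hypothetical function names, and does not represent either the CDAT or the xarray API.

```python
from collections import defaultdict

# Map each calendar month (1-12) to a standard meteorological season.
SEASONS = {
    12: "DJF", 1: "DJF", 2: "DJF",
    3: "MAM", 4: "MAM", 5: "MAM",
    6: "JJA", 7: "JJA", 8: "JJA",
    9: "SON", 10: "SON", 11: "SON",
}


def seasonal_climatology(months, values):
    """Return the unweighted mean of the values in each season."""
    groups = defaultdict(list)
    for month, value in zip(months, values):
        groups[SEASONS[month]].append(value)
    return {season: sum(vals) / len(vals) for season, vals in groups.items()}


def seasonal_anomalies(months, values):
    """Return each value minus the mean of its season."""
    climatology = seasonal_climatology(months, values)
    return [value - climatology[SEASONS[month]] for month, value in zip(months, values)]


if __name__ == "__main__":
    months = [12, 1, 2, 3]
    values = [1.0, 2.0, 3.0, 10.0]
    print(seasonal_climatology(months, values))
    print(seasonal_anomalies(months, values))
```

A comparison notebook would additionally need to decide whether months are weighted by their number of days, since that choice changes the resulting averages.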

Investigate and Compare Area Averaging in xarray and CDAT

Specific details to investigate include:

  • Generation of weights (particularly on non-rectangular grids)
  • Ability to define regions (e.g., tropics or Niño 3.4 region)
  • Handling of masked data
  • Handling of 3D data (e.g., if you average a variable x[time, height, lat, lon], does it return xa[time, height]?)
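
The points above (weights, masked data, and reducing x[time, height, lat, lon] to xa[time, height]) can be sketched with nested lists and the standard library. This is an assumption-laden illustration, not the CDAT or xarray behavior: it uses the cosine of the cell-center latitude as an approximate weight on a rectilinear grid, marks masked cells with `None`, and uses hypothetical function names.

```python
import math


def area_average(field, lats):
    """Cosine-latitude weighted mean of field[lat][lon].

    Cells equal to None are treated as masked and excluded from both the
    weighted sum and the total weight. Returns None if every cell is masked.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for row, lat in zip(field, lats):
        weight = math.cos(math.radians(lat))
        for value in row:
            if value is None:
                continue
            weighted_sum += weight * value
            weight_total += weight
    if weight_total == 0.0:
        return None
    return weighted_sum / weight_total


def area_average_4d(data, lats):
    """Reduce data[time][height][lat][lon] to a [time][height] nested list."""
    return [[area_average(level, lats) for level in step] for step in data]


if __name__ == "__main__":
    field = [[2.0, 2.0], [4.0, None]]
    print(area_average(field, [0.0, 60.0]))
    print(area_average_4d([[field, field]], [0.0, 60.0]))
```

Excluding masked cells from the total weight is one possible convention; a validation notebook would need to confirm which convention each library applies.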

Add notebook to test general utility functions

Validation Checklist

  1. API usage (and plots where helpful)
  2. Performance metrics (use timeit)
  3. Bugs, improvements
  4. Questions

Compare the APIs below against their CDAT equivalents.

  • xcdat.open_dataset()
  • xcdat.open_mfdataset()
  • xcdat.decode_time_units()
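
For the "Performance metrics (use timeit)" item in the checklist, a small harness along the following lines could be reused across notebooks. The helper name `benchmark` and the two candidate functions are placeholders for illustration; in a real comparison they would wrap the xCDAT and CDAT calls being compared.

```python
import timeit


def benchmark(func, repeat=5, number=100):
    """Return the best per-call time in seconds across `repeat` runs.

    Taking the minimum reduces the influence of unrelated system noise.
    """
    timings = timeit.repeat(func, repeat=repeat, number=number)
    return min(timings) / number


def candidate_a():
    """Placeholder workload: built-in sum."""
    return sum(range(1000))


def candidate_b():
    """Placeholder workload: explicit loop producing the same result."""
    total = 0
    for i in range(1000):
        total += i
    return total


if __name__ == "__main__":
    for func in (candidate_a, candidate_b):
        print(f"{func.__name__}: {benchmark(func):.3e} s per call")
```

Checking that both candidates return the same result before timing them helps ensure the comparison measures equivalent work.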

Compare I/O and Metadata in xarray and CDAT

Compare read / write in xarray and CDAT, including:

  • Ability to read file header and metadata (without loading full contents)
  • What is the default (load data into memory? only load when an operation is performed?)
  • Types of files each reads (do they both handle older file types, e.g., GRIB?)
  • What is the process of labelling an array with axis information to save a netCDF file?

Add notebook to validate bounds functions

Validation Checklist

  1. API usage (and plots where helpful)
  2. Performance metrics (use timeit)
  3. Bugs, improvements
  4. Questions

Compare the APIs below against their CDAT equivalents.

  • ds.xcdat.fill_missing_bounds() or ds.bounds.fill_missing()
  • ds.xcdat.get_bounds() or ds.bounds.get_bounds()
  • ds.xcdat.add_bounds() or ds.bounds.add_bounds()
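
As background for validating the bounds functions, the sketch below shows one common way to estimate missing bounds for a 1-D coordinate: place cell edges at the midpoints between neighboring points and extrapolate the two outer edges. This is a hypothetical standard-library illustration, not necessarily the algorithm xCDAT or CDAT uses, and it does not clip latitude bounds to [-90, 90].

```python
def midpoint_bounds(coords):
    """Estimate [lower, upper] cell bounds for a 1-D coordinate.

    Interior edges are midpoints between neighboring points. The outer
    edges mirror the nearest interior edge about the first and last point.
    """
    if len(coords) < 2:
        raise ValueError("at least two coordinate points are required")
    mids = [(a + b) / 2.0 for a, b in zip(coords, coords[1:])]
    first_edge = coords[0] - (mids[0] - coords[0])
    last_edge = coords[-1] + (coords[-1] - mids[-1])
    edges = [first_edge] + mids + [last_edge]
    return [[lower, upper] for lower, upper in zip(edges, edges[1:])]


if __name__ == "__main__":
    print(midpoint_bounds([0.0, 1.0, 2.0]))
```

A validation notebook could compare bounds produced this way against those generated by each library for the same coordinate.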

List of issues to create

A checked box indicates the item has been added to the issues list.

I/O

  • global attributes of each variable (#11)
  • XML like behavior, query CMIP archive (#6)
  • Function to list paths (#6)
  • Index to point to netcdf files -- constantly move in the database
  • SQLite database -- instead of list of .xml files, query for paths (#6)
  • Types of files xarray reads? GRIB (#11)

Time operations

  • cdtime (#7)
  • Calendar, time, etc. (#7)

Regridding

  • xESMF, tempest regrid (Python interface to C) (#9)
  • Time series - seasonal, averaging, annual cycle departures (#10)
  • Area averaging -- xarray might not create automatically (#10)
  • Vertical interpolation - slow in CDAT
  • Data subsetting (#14)
  • Parallelization
  • Statistical calculations - genutil, reproduce by numpy/scipy, weights
  • plotting - matplotlib, cartopy (#15)

Add notebook to validate spatial averaging

Validation Checklist

  1. API usage (and plots where helpful)
  2. Performance metrics (use timeit)
  3. Bugs, improvements
  4. Questions

Compare the APIs below against their CDAT equivalents.

  • ds.xcdat.spatial_average() or ds.spatial.avg()
