xarray-contrib / xarray-tutorial

Xarray Tutorials

Home Page: https://tutorial.xarray.dev/

License: Apache License 2.0

Jupyter Notebook 99.77% TeX 0.13% CSS 0.02% Dockerfile 0.01% Python 0.08%
hacktoberfest

xarray-tutorial's Introduction

Xarray Tutorial

This is the repository for a Jupyter Book website with tutorial material for Xarray, an open source project and Python package that makes working with labelled multi-dimensional arrays simple, efficient, and fun!

The website is hosted at https://tutorial.xarray.dev

Tutorials are written as interactive Jupyter Notebooks with executable code examples that you can easily run and modify:

On the Cloud

All notebooks can be run via the Mybinder.org 'Launch Binder' badge at the top of this page. This will load a pre-configured JupyterLab interface with all tutorial notebooks for you to run. Binder sessions have minimal computing resources, and any changes you make will not be saved.

GitHub Codespaces

This tutorial is available to run within GitHub Codespaces - "a development environment that's hosted in the cloud" - with the conda environment specification in the conda-lock.yml file.

Open in GitHub Codespaces

☝️ Click the button above to go to the options window and launch a GitHub codespace.

A codespace is a development environment that's hosted in the cloud. GitHub currently gives every user 120 vCPU hours per month for free; beyond that you must pay. Be sure to explicitly stop or shut down your codespace when you are done by going to this page (https://github.com/codespaces).

Once your codespace is launched, the following happens:

  • A Visual Studio Code interface opens in your browser.
  • A built-in terminal opens and runs jupyter lab automatically.
  • Once a URL appears in the terminal, cmd + click it.
  • This opens another browser tab with the JupyterLab interface.

Locally

You can also run these notebooks on your own computer! We recommend using micromamba or conda-lock to ensure a fully reproducible Python environment:

git clone https://github.com/xarray-contrib/xarray-tutorial.git
cd xarray-tutorial

conda-lock install conda/conda-lock.yml --name xarray-tutorial
# Or `micromamba create -n xarray-tutorial -f conda/conda-lock.yml`
# Or latest package versions: `mamba env create -f conda/environment-unpinned.yml`

conda activate xarray-tutorial
jupyter lab

Contributing

Contributions are welcome and greatly appreciated! See our CONTRIBUTING.md document.

Thanks to our contributors so far!

Acknowledgements

This website is the result of many contributions from the Xarray community! We're very grateful for everyone's volunteered effort as well as sponsored development. Development of the SciPy 2022 and SciPy 2023 tutorial material specifically was supported by NASA's Open Source Tools, Frameworks, and Libraries Program (award 80NSSC22K0345).

xarray-tutorial's People

Contributors

andersy005, bijalbpatel, dcherian, dependabot[bot], e-marshall, felixcremer, harisankarh, howol76, jessicas11, jsta, keewis, loganthomas, lsetiawan, maxrjones, mwtarnowski, negin513, pavithraes, pre-commit-ci[bot], qheuristics, rabernat, richardscottoz, scottyhq, tomnicholas, tylere, weiji14, willirath, yutik-nn, zmoon

xarray-tutorial's Issues

fix tutorial.

Should be straightforward:

ValueError: passing 'axis' to Dataset reduce methods is ambiguous. Please use 'dim' instead.

in the SciPy notebook.
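
For reference, a minimal sketch of the change this error asks for. The dataset here is synthetic and the variable and dimension names ("air", "time", "lat") are assumptions for illustration, not the ones used in the notebook:

```python
import numpy as np
import xarray as xr

# Small synthetic dataset standing in for the notebook's data (hypothetical names).
ds = xr.Dataset({"air": (("time", "lat"), np.arange(6.0).reshape(2, 3))})

# Passing a positional axis is ambiguous for a Dataset, because different
# variables may order their dimensions differently.
message = None
try:
    ds.mean(axis=0)
except ValueError as err:
    message = str(err)

# Naming the dimension is unambiguous and works for every variable.
time_mean = ds.mean(dim="time")
```

The fix in the notebook should amount to replacing each axis=... argument on a Dataset reduction with the matching dim="..." name.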

Scipy2022 workshop and repository organization

This was discussed recently with @kmpaul @dcherian @JessicaS11. As we ramp up for a SciPy 2022 tutorial, we're thinking of various possibilities for synthesizing existing content in this repository. Hopefully I accurately capture everything from that discussion below. It will be a bit lengthy!

The main question is whether this repository should be 1. a collection of stand-alone tutorials that are snapshots in time, or 2. a single general xarray tutorial that gets updated over time. I also want to link to some past discussion of xarray tutorials with @rabernat and @TomNicholas in pydata/xarray#3564.

  • option 1: we add a scipy2022 subfolder with all content for a 4 hour workshop, and render it on the website
    (+) minimal changes to current setup
    (+) each event's content is self-contained
    (-) duplicated and outdated content accumulates over time
    (-) a user finding this repository or the rendered website has difficulty navigating the content

  • option 2: we synthesize the generic content under "fundamentals" and add a "workshops" section on the left sidebar for event-specific content (JupyterBook, or stick with current sphinx theme on read-the-docs)
    (+) no proliferation of many tutorial subfolders or repositories with repetitive content
    (-) content from past events would be modified, removed, or reorganized (but tags could be used to make it possible to return to past states)

Some other general design goals that came up:

  1. compatible with mybinder.org (preferred cloud infrastructure for workshops)
  2. someone arriving at https://xarray-contrib.github.io/xarray-tutorial finds it easy to navigate the content (whether or not they are participating in a workshop)
  3. what is here does not repeat, and is complementary to, what's in the current xarray documentation https://docs.xarray.dev/en/stable/tutorials-and-videos.html
  4. not focused on any particular scientific domain
    1. organize domain-specific (geoscience, astronomy, finance) content as subchapters or link out to other domain-specific xarray content (https://foundations.projectpythia.org/landing-page.html)
  5. follow best-practices in tutorial design (https://diataxis.fr/tutorials/, https://guidebook.hackweek.io/resources/tutorial-resources.html)

Any additional thoughts, questions, concerns? Please add comments!

Add rioxarray example to IO notebook

After #98, it would be nice to add a rioxarray example illustrating engine="rasterio" (the backend engine that rioxarray registers) in the IO notebook. This would demonstrate the ability to plug in new backends.

Customize binder startup to show notebook index

If someone uses a standard Binder link for this repository (https://mybinder.org/v2/gh/xarray-contrib/xarray-tutorial/HEAD, or the badge in this readme or at https://tutorial.xarray.dev/overview/get-started.html), they end up seeing a rather blank JupyterLab with all the files in this repository.

It would be nice to open an index of notebooks, such as an index.md with one-sentence descriptions, like on the Planetary Computer (see https://github.com/microsoft/planetary-computer-hub/blob/0d01fd87c016ad142a4b25067455a07c32e053bd/helm/chart/files/etc/singleuser/k8s-lifecycle-hook-post-start.sh).

Even more workspace customization is possible (e.g. automatically open dask labextension or other widgets, multiple notebooks, etc: https://github.com/dask/dask-examples/blob/main/binder/start)

setup instructions

I was bcc-ed on this email:

Just a gentle reminder that SciPy 2020 Tutorial set-up instructions are past due. It is really helpful for us to be able to share instructions with attendees early so they have plenty of time to troubleshoot if they are having difficulty with the set-up.

So we need to write up setup instructions.

timing and coordination for tutorial

I received the calendar invite for the scipy tutorial. Do we have any other logistics to discuss? Do we want to have a pre-tutorial meeting to rehearse and make sure everything flows together? Or am I overthinking it...

Fix up 'working with labeled data'

https://tutorial.xarray.dev/fundamentals/02.1_working_with_labeled_data.html

  • learning goals no longer deal with interpolation (split into another notebook)
  • fix different selection techniques to actually get the same value!
  • improve exercise formatting
  • describe basic plotting

@dcherian Looking at this notebook and others, I'm tempted to change the example data so it does not use np.random.randn(3, 4) and instead uses something that makes both the differences in values and the dimensions easier to see, for example np.arange(10).reshape(2,5). Thoughts?
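
A rough sketch of what that could look like, and of the "same value from every selection technique" check from the list above. The dimension names and coordinate labels are made up for illustration:

```python
import numpy as np
import xarray as xr

# Sequential values make it obvious which element was selected.
da = xr.DataArray(
    np.arange(10).reshape(2, 5),
    dims=("x", "y"),
    coords={"x": [10, 20], "y": list("abcde")},  # hypothetical labels
)

# Three selection techniques that should all return the same element.
by_label = da.sel(x=20, y="c")      # label-based, by dimension name
by_position = da.isel(x=1, y=2)     # position-based, by dimension name
by_index = da[1, 2]                 # NumPy-style positional indexing

print(int(by_label), int(by_position), int(by_index))  # 7 7 7
```

With random data the reader has to trust that the three results match; with np.arange the selected value (7) can be read straight off the printed array.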

intersphinx/sphinx-hoverref

For easier and more robust links to the xarray documentation.

Jupyter Book uses it in their own docs, so it is definitely possible. I guess the downside is that the notebook would look more messy in Jupyter. But maybe sphinx-codeautolink would work, at least for the code blocks.
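
For context, a minimal sketch of the intersphinx setup, written as a plain Sphinx conf.py. In a Jupyter Book the same two settings would normally go under the sphinx section of _config.yml (extra_extensions and config), so treat the file layout here as an assumption:

```python
# conf.py (sketch): enable cross-project links to the xarray documentation.
extensions = ["sphinx.ext.intersphinx"]

intersphinx_mapping = {
    # name: (base URL of the target docs, inventory file; None means the
    # default objects.inv published alongside those docs)
    "xarray": ("https://docs.xarray.dev/en/stable/", None),
}
```

With a mapping like this, a cross-reference role pointing at an object such as xarray.DataArray.sel resolves against the xarray inventory, so links keep working when the xarray docs are reorganized.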

reviewnb

Given that this repo is going to be all notebooks, shall we use ReviewNB?

Warning: saving variable air with floating point data as an integer dtype without any _FillValue to use for NaNs

@dcherian Doing a bit of cleanup of https://tutorial.xarray.dev/overview/xarray-in-45-min.html#reading-and-writing-files I'm surprised by the serialization warning: Why is it saving floating point data as integer dtype?

import xarray as xr
ds = xr.tutorial.load_dataset("air_temperature")
ds.to_netcdf("my-example-dataset.nc")
/var/folders/gt/fg1fy12n5wg2b027zy5qmcr40000gn/T/ipykernel_90730/1274509432.py:3: SerializationWarning: saving variable air with floating point data as an integer dtype without any _FillValue to use for NaNs
  ds.to_netcdf("my-example-dataset.nc")
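
One likely explanation, stated here as an assumption rather than something confirmed in this issue: the source file stores air as packed integers, and xarray keeps that on-disk dtype in the variable's .encoding, so writing tries to pack the floats again. The sketch below uses a synthetic dataset with a hand-set encoding (the specific values are made up) to show how to inspect it and two ways to handle it before writing:

```python
import numpy as np
import xarray as xr


def make_example():
    """Synthetic stand-in for the tutorial dataset, with an assumed packed encoding."""
    ds = xr.Dataset({"air": (("time",), np.array([271.3, np.nan, 280.1]))})
    # Assumed encoding, mimicking a file that stores scaled int16 values.
    ds["air"].encoding = {"dtype": "int16", "scale_factor": 0.01}
    return ds


# Inspect what xarray would use when writing the variable back to disk.
print(make_example()["air"].encoding)

# Option A: keep the integer packing, but declare a fill value for NaNs.
packed = make_example()
packed["air"].encoding["_FillValue"] = -32768

# Option B: drop the inherited encoding so the data is written as floats.
unpacked = make_example()
unpacked["air"].encoding = {}
```

Checking ds.air.encoding on the real tutorial dataset would confirm or rule out this explanation.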

Fix up plotting notebooks

  • move data creation to xarray.tutorial.load_dataset("temperature_gradients")?
  • Fix exercise formatting
  • More text in hvplot notebook

Running regular tutorials independent of events

The Dask team has been running regular 90-minute community tutorials since June 2020, and these tutorials appear to have worked well (see dask/community#57). @mrocklin and @jacobtomlinson put together a nice guide on how to run such tutorials: https://blog.dask.org/2020/08/21/running-tutorials.

I am interested in running similar, regular xarray online tutorials. @martindurant and @dcherian generously offered to help out. We are planning on having the first session on Friday October 2nd. As we start getting the content ready, I am wondering if folks who delivered the most recent tutorial at SciPy 2020 would be interested in sharing the feedback they received? We could use the feedback to re-structure the existing tutorial materials so as to cater to different audiences for different tutorial sessions.

Avoid loading files from urls and s3/gcsfs

Instead add datasets to xr.tutorial.open_dataset (https://github.com/pydata/xarray-data/). That way we don't have to worry about broken links.

import numpy as np
import xarray as xr

ds = xr.tutorial.open_dataset("air_temperature.nc").rename({"air": "Tair"})

# we will add a gradient field with appropriate attributes
ds["dTdx"] = ds.Tair.differentiate("lon") / 110e3 / np.cos(ds.lat * np.pi / 180)
ds["dTdy"] = ds.Tair.differentiate("lat") / 105e3
ds.dTdx.attrs = {"long_name": "$∂T/∂x$", "units": "°C/m"}
ds.dTdy.attrs = {"long_name": "$∂T/∂y$", "units": "°C/m"}

modularize existing fundamentals content

In line with #53 and #65, break the longer fundamentals notebooks into more basic units. This will make it easier to link specific skills and sections for an event and make the tutorials shorter and more digestible.

Add intermediate remote data access tutorial

Illustrate

  • opendap / thredds
  • datasets on S3
  • datasets using gcsfs
  • fsspec

Example code:

import gcsfs
import xarray as xr

fs = gcsfs.GCSFileSystem(token="anon")
ds = xr.open_zarr(
    fs.get_mapper("gs://pangeo-noaa-ncei/noaa.ersst.v5.zarr"), consolidated=True
).load()
ds

xarray and dask 2

  • Parallel/streaming/lazy computation using dask.array with Xarray
  • Reading and writing data with Dask and Xarray
  • Automatic parallelization with apply_ufunc and map_blocks
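
As a starting point for the apply_ufunc item, a small sketch of the pattern. The helper value_range and the array contents are hypothetical. The data is kept in memory so the example runs without dask; with dask installed, calling da.chunk({"x": 1}) first should make the same call lazy and evaluate it block by block:

```python
import numpy as np
import xarray as xr


def value_range(block):
    """Max minus min along the last axis, which is where apply_ufunc moves core dims."""
    return block.max(axis=-1) - block.min(axis=-1)


da = xr.DataArray(
    np.arange(12.0).reshape(4, 3),
    dims=("time", "x"),
    coords={"x": [10, 20, 30]},
)

result = xr.apply_ufunc(
    value_range,
    da,
    input_core_dims=[["time"]],   # "time" is consumed by the function
    dask="parallelized",          # only takes effect for dask-backed inputs
    output_dtypes=[da.dtype],
)

print(result.values)  # [9. 9. 9.]
```

The function itself only sees plain NumPy arrays, which is what makes it easy to wrap existing code.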

ds = xr.tutorial.load_dataset("air_temperature") with 0.18 needs engine argument

Many notebooks out there start with the line ds = xr.tutorial.load_dataset("air_temperature"). That now gives an error traceback with xarray>=0.18:

Traceback (most recent call last):
  File "/Users/scott/GitHub/zarrdata/./create_zarr.py", line 6, in <module>
    ds = xr.tutorial.load_dataset("air_temperature")
  File "/Users/scott/miniconda3/envs/zarrdata/lib/python3.9/site-packages/xarray/tutorial.py", line 179, in load_dataset
    with open_dataset(*args, **kwargs) as ds:
  File "/Users/scott/miniconda3/envs/zarrdata/lib/python3.9/site-packages/xarray/tutorial.py", line 100, in open_dataset
    ds = _open_dataset(filepath, **kws)
  File "/Users/scott/miniconda3/envs/zarrdata/lib/python3.9/site-packages/xarray/backends/api.py", line 485, in open_dataset
    engine = plugins.guess_engine(filename_or_obj)
  File "/Users/scott/miniconda3/envs/zarrdata/lib/python3.9/site-packages/xarray/backends/plugins.py", line 112, in guess_engine
    raise ValueError("cannot guess the engine, try passing one explicitly")
ValueError: cannot guess the engine, try passing one explicitly

It's an easy fix: just use ds = xr.tutorial.load_dataset("air_temperature", engine="netcdf4"). New users might be thrown by that, though. Also note that unless the netcdf4 library is explicitly included in the software environment, even adding engine="netcdf4" can result in an error: "ValueError: unrecognized engine netcdf4 must be one of: ['store', 'zarr']". So I think a minimal environment definition would be:

name: xarray-tutorial
channels:
  - conda-forge
dependencies:
  - xarray=0.18
  - pooch=1.3
  - netcdf4=1.5
  - zarr=2.8

Dask errors in CI logs

Notebooks that create dask clusters (LocalCluster), when executed by jupyter-book build ., produce big tracebacks and leave the dask-worker-space directory behind. I'm not sure what causes this, because the notebook execution actually succeeds, so it seems to be some sort of problem closing the dask cluster:

updating environment: executing outdated notebooks... Executing: /Users/scott/GitHub/xarray-contrib/xarray-tutorial/advanced/xarray_and_dask.ipynb
2022-06-16 11:12:08,966 - distributed.nanny - ERROR - Worker process died unexpectedly
Exception in thread Nanny stop queue watch:
Traceback (most recent call last):
  File "/Users/scott/miniconda3/envs/xarray-tutorial/lib/python3.10/threading.py", line 1009, in _bootstrap_inner
2022-06-16 11:12:08,966 - distributed.nanny - ERROR - Worker process died unexpectedly
Exception in thread Nanny stop queue watch:
Traceback (most recent call last):
  File "/Users/scott/miniconda3/envs/xarray-tutorial/lib/python3.10/threading.py", line 1009, in _bootstrap_inner
    self.run()
  File "/Users/scott/miniconda3/envs/xarray-tutorial/lib/python3.10/threading.py", line 946, in run
2022-06-16 11:12:08,966 - distributed.nanny - ERROR - Worker process died unexpectedly
Exception in thread Nanny stop queue watch:
Traceback (most recent call last):
  File "/Users/scott/miniconda3/envs/xarray-tutorial/lib/python3.10/threading.py", line 1009, in _bootstrap_inner
    self.run()
  File "/Users/scott/miniconda3/envs/xarray-tutorial/lib/python3.10/threading.py", line 946, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/scott/miniconda3/envs/xarray-tutorial/lib/python3.10/site-packages/distributed/nanny.py", line 846, in watch_stop_q
2022-06-16 11:12:08,967 - distributed.nanny - ERROR - Worker process died unexpectedly
Exception in thread Nanny stop queue watch:
    self._target(*self._args, **self._kwargs)
  File "/Users/scott/miniconda3/envs/xarray-tutorial/lib/python3.10/site-packages/distributed/nanny.py", line 846, in watch_stop_q
Traceback (most recent call last):
  File "/Users/scott/miniconda3/envs/xarray-tutorial/lib/python3.10/threading.py", line 1009, in _bootstrap_inner
    self.run()
  File "/Users/scott/miniconda3/envs/xarray-tutorial/lib/python3.10/threading.py", line 946, in run
    child_stop_q.close()
  File "/Users/scott/miniconda3/envs/xarray-tutorial/lib/python3.10/multiprocessing/queues.py", line 143, in close
    child_stop_q.close()
  File "/Users/scott/miniconda3/envs/xarray-tutorial/lib/python3.10/multiprocessing/queues.py", line 143, in close
    self._reader.close()
  File "/Users/scott/miniconda3/envs/xarray-tutorial/lib/python3.10/multiprocessing/connection.py", line 182, in close
    self._target(*self._args, **self._kwargs)
  File "/Users/scott/miniconda3/envs/xarray-tutorial/lib/python3.10/site-packages/distributed/nanny.py", line 846, in watch_stop_q
    self.run()
  File "/Users/scott/miniconda3/envs/xarray-tutorial/lib/python3.10/threading.py", line 946, in run
    self._reader.close()
  File "/Users/scott/miniconda3/envs/xarray-tutorial/lib/python3.10/multiprocessing/connection.py", line 182, in close
    self._close()
  File "/Users/scott/miniconda3/envs/xarray-tutorial/lib/python3.10/multiprocessing/connection.py", line 366, in _close
    self._close()
  File "/Users/scott/miniconda3/envs/xarray-tutorial/lib/python3.10/multiprocessing/connection.py", line 366, in _close
    child_stop_q.close()
  File "/Users/scott/miniconda3/envs/xarray-tutorial/lib/python3.10/multiprocessing/queues.py", line 143, in close
    self._target(*self._args, **self._kwargs)
    _close(self._handle)
  File "/Users/scott/miniconda3/envs/xarray-tutorial/lib/python3.10/site-packages/distributed/nanny.py", line 846, in watch_stop_q
OSError: [Errno 9] Bad file descriptor
    _close(self._handle)
OSError: [Errno 9] Bad file descriptor
    self._reader.close()
  File "/Users/scott/miniconda3/envs/xarray-tutorial/lib/python3.10/multiprocessing/connection.py", line 182, in close
    self._close()
  File "/Users/scott/miniconda3/envs/xarray-tutorial/lib/python3.10/multiprocessing/connection.py", line 366, in _close
    child_stop_q.close()
  File "/Users/scott/miniconda3/envs/xarray-tutorial/lib/python3.10/multiprocessing/queues.py", line 143, in close
    _close(self._handle)
    self._reader.close()
OSError: [Errno 9] Bad file descriptor
  File "/Users/scott/miniconda3/envs/xarray-tutorial/lib/python3.10/multiprocessing/connection.py", line 182, in close
    self._close()
  File "/Users/scott/miniconda3/envs/xarray-tutorial/lib/python3.10/multiprocessing/connection.py", line 366, in _close
    _close(self._handle)
OSError: [Errno 9] Bad file descriptor
2022-06-16 11:12:11,148 - distributed.diskutils - INFO - Found stale lock file and directory '/Users/scott/GitHub/xarray-contrib/xarray-tutorial/advanced/dask-worker-space/worker-iqoqflqq', purging
2022-06-16 11:12:11,149 - distributed.diskutils - INFO - Found stale lock file and directory '/Users/scott/GitHub/xarray-contrib/xarray-tutorial/advanced/dask-worker-space/worker-bs9vej_y', purging
2022-06-16 11:12:11,149 - distributed.diskutils - INFO - Found stale lock file and directory '/Users/scott/GitHub/xarray-contrib/xarray-tutorial/advanced/dask-worker-space/worker-asihamol', purging
Execution Succeeded: /Users/scott/GitHub/xarray-contrib/xarray-tutorial/advanced/xarray_and_dask.ipynb
done
