concept idea: tidy Xarray datasets (xarray-tutorial, CLOSED)

dcherian commented on July 18, 2024

concept idea: tidy Xarray datasets

from xarray-tutorial.

Comments (12)

e-marshall commented on July 18, 2024

Resurrecting this thread to hopefully work on a 'data tidying' page in the Xarray tutorial (following discussions at SciPy 2023). Do folks still think this would be a helpful addition? I started working on a rough draft to get some initial ideas down. I would love any input on the format, as well as additions/improvements! Feel free to edit the HackMD doc directly and/or discuss here.


TomNicholas commented on July 18, 2024

That works


dcherian commented on July 18, 2024

https://tidyr.tidyverse.org/articles/tidy-data.html
https://vita.had.co.nz/papers/tidy-data.pdf


e-marshall commented on July 18, 2024

pinging some folks who may be interested @dcherian @scottyhq @TomNicholas @JessicaS11


TomNicholas commented on July 18, 2024

> Do folks still think this would be a helpful addition?

100% yes!

I tried to access the HackMD doc but didn't have permission.


e-marshall commented on July 18, 2024

Shoot, sorry about that; I thought I'd updated the permissions.
Does this link work? https://hackmd.io/@sApCZJjaT9eyK2YL4ckT7w/ryJAe5zSp/edit
I'll edit the original post if so.


scottyhq commented on July 18, 2024

Thanks @e-marshall for the great draft. I think this would be a fantastic addition to the 'fundamentals' section, in its own subsection so that it's prominent and easy to find without having to navigate down through the layers. I left some comments in your document and lightly edited some text.


dcherian commented on July 18, 2024

Copying my Slack comment over. I think:

The Jupyter Book should become a "part" in this tutorial website. We can advertise that we're open for contributions. Potentially it could also become a Pythia cookbook.
The HackMD doc should become a page in the main xarray reference docs that links here. That way we get visibility without cluttering the main doc site with domain-specific examples like "Aquarius" and whatnot.

Thoughts? @TomNicholas ?


TomNicholas commented on July 18, 2024

> Thoughts?

I like this suggestion. I like the idea of having a smaller version of this conceptual framework presented somewhere in the main xarray docs. It's important that we point totally new users to this content as early as possible. Also, if we have a good diagram that demonstrates these concepts, I think it might be worthwhile to include it in xarray's main docs (RGB arrays being worth 10^3 words and all that).

I also like the idea of having another place where we can continue to collect more and more examples of how not to do things, but without cluttering the main xarray docs.

Specific comments on content:

DataTree should probably be mentioned, but with low weight, and possibly only as one termination of a flowchart. Users really don't need it unless their data is fundamentally not alignable, and I would be wary that pointing new users to it too much would lead to some creative new anti-patterns 😅
As written, the text conflates geoscience tooling with non-geoscience cases, but tidy-xarray concepts are domain-agnostic. We should be clear that cf-xarray is only recommended if your data could sensibly conform to the Climate and Forecast (is that what it stands for?) conventions.
Links to xarray's terminology page would be great, as would any clarifications to it.


e-marshall commented on July 18, 2024

Thanks everyone for the feedback and discussion! To summarize, it seems like next steps are:

Move the Jupyter Book to a Data Tidying subsection of the Fundamentals section of the Xarray tutorial.
Move the HackMD doc to the main xarray docs.

To do items for each of the above:

1. Jupyter Book: the version I'm working on is here

A few points in the HackMD doc should be transferred over to the intro page of the Jupyter Book. The Jupyter Book will hold all domain-specific examples/language.
add link/ref to terminology page
add high-level mention of DataTree

2. HackMD doc

add mention of DataTree
add Terminology page
remove domain-specific examples/language
clean up/combine figures 2 and 3
remake fig 1 to be text-based (might lose the visual elements like box/underline) and add links to suggested functions/methods

I'll work on the above-mentioned edits. If anyone is interested in tackling specific parts, that's totally fine too! I'll plan to open PRs in the tutorial and main repos once the edits have been made, if that sounds good. I will be presenting a poster on this at AGU next week. Currently I have QR codes linking to the Jupyter Book hosted on my GitHub Pages and to the tidy repo for contributing. If the pages are merged before then, I'll update the QR codes, but timing might be tight.


dcherian commented on July 18, 2024

> Move the jupyter book to a Data Tidying subsection of the Fundamentals section of the Xarray tutorial.

We already have one : https://tutorial.xarray.dev/data_cleaning/data_cleaning.html


e-marshall commented on July 18, 2024

Oh sorry, I meant just the content pages, not the full book. I'd imagined the data_cleaning root page having something like the outline below, where the subheadings (Examples, Contributing) could each have their own page rather than having their content on the root page. Is that what you had in mind?

Introduction

Examples

1. Aquarius

This is an example of tidying a dataset composed of locally downloaded files. Aquarius is a sea surface salinity dataset produced by NASA and accessed as Network Common Data Form (NetCDF) files. You can find this example here. This example focuses on data access steps and organizing data into a workable data cube.

2. ASE Ice Velocity

This example uses an ice velocity dataset derived from synthetic aperture radar imagery. You can find it here. This example focuses on data access steps and organizing data into a workable data cube.

3. Harmonized Landsat-Sentinel

This example features cloud-optimized data that does not need to be downloaded locally. Here, packages such as odc-stac are used to accomplish much of the initial tidying (assembling an x, y, time cube). However, this example shows that additional formatting is frequently required to make a dataset analysis-ready.

Contributing

This project is an evolving community effort. We want to hear from you! Many workflows involve some version of the examples discussed here. The solutions you've developed in your work could help future users and help the community move toward more established norms around tidy data. Please consider submitting any examples you may have; you can create an issue here. If you have any questions or topics you'd like to discuss, please don't hesitate to create an issue on GitHub.

Note: the issue template currently has some errors that need fixing.
