Comments (12)

e-marshall commented on July 18, 2024

Resurrecting this thread to hopefully work on a 'data tidying' page in the Xarray tutorial (following discussions at SciPy 2023). Do folks still think this would be a helpful addition? I started working on a rough draft to get some initial ideas down. Would love any input on the format as well as additions/improvements! Feel free to edit the HackMD directly and/or discuss here.

from xarray-tutorial.

TomNicholas commented on July 18, 2024

That works

dcherian commented on July 18, 2024

https://tidyr.tidyverse.org/articles/tidy-data.html
https://vita.had.co.nz/papers/tidy-data.pdf

e-marshall commented on July 18, 2024

Pinging some folks who may be interested: @dcherian @scottyhq @TomNicholas @JessicaS11

TomNicholas commented on July 18, 2024

> Do folks still think this would be a helpful addition?

100% yes!

I tried to access the HackMD but didn't have permission.

e-marshall commented on July 18, 2024

Shoot, sorry about that, I thought I'd updated the permissions.
Does this link work? https://hackmd.io/@sApCZJjaT9eyK2YL4ckT7w/ryJAe5zSp/edit
I'll edit the original post if so.

scottyhq commented on July 18, 2024

Thanks @e-marshall for the great draft. I think this would be a fantastic addition to the 'fundamentals' section in its own subsection so that it's prominent and easy to see without having to navigate down into the layers. I left some comments in your document and lightly edited some text.

dcherian commented on July 18, 2024

Copying my Slack comment over. I think:

  1. The Jupyter Book should become a "part" in this tutorial website. We can advertise that we're open for contributions. Potentially it could also become a Pythia cookbook.
  2. The HackMD doc should become a page in the main xarray reference docs that links here. That way we get visibility without cluttering the main doc site with domain-specific examples like "Aquarius" and whatnot.

Thoughts? @TomNicholas ?

TomNicholas commented on July 18, 2024

> Thoughts?

I like this suggestion. I like the idea of having a smaller version of this conceptual framework presented somewhere in the main xarray docs. It's important that we point totally new users to this content as early as possible. Also, if we have a good diagram that demonstrates these concepts, I think it would be worthwhile to include in xarray's main docs (RGB arrays being worth 10^3 words and all that).

I also like the idea of having another place where we can continue to collect more and more examples of how not to do things, but without cluttering the main xarray docs.


Specific comments on content:

  • DataTree should probably be mentioned, but with low weight, and possibly only as one termination of a flowchart. Users really don't need it unless their data is fundamentally not alignable, and I would be wary that pointing new users to it too much would lead to some creative new anti-patterns 😅
  • As written, the text conflates geoscience tooling with non-geoscience cases, but tidy-xarray concepts are domain-agnostic. We should be clear to only recommend cf-xarray if your data could sensibly conform to the Climate and Forecast (is that what it stands for?) conventions.
  • Links to xarray's terminology page would be great, as would any clarifications to it too.

e-marshall commented on July 18, 2024

Thanks everyone for the feedback and discussion! To summarize, it seems like next steps are:

  1. Move the Jupyter Book to a Data Tidying subsection of the Fundamentals section of the Xarray tutorial.
  2. Move the HackMD doc to the main xarray docs.

To do items for each of the above:

1. Jupyter book -- version I'm working on here

  • a few points in the HackMD doc should be transferred over to the intro page of the Jupyter Book; the Jupyter Book will have all domain-specific examples/language
  • add link / ref to terminology page
  • add high-level mention of DataTree

2. HackMD doc

  • add mention of DataTree
  • add link to Terminology page
  • remove domain-specific examples/language
  • clean up/combine figures 2,3
  • remake fig 1 to be text-based (might lose the visual elements like box/underline) and add links to suggested functions/methods

I'll work on the edits above. If anyone is interested in tackling specific parts, that's totally fine too! I plan to open PRs in the tutorial and main repos once the edits are made, if that sounds good. I will be presenting a poster on this at AGU next week. Currently I have QR codes linking to the Jupyter Book hosted on my GitHub Pages and to the tidy repo for contributing. If the pages are merged before then I'll update the QR codes, but timing might be tight.

dcherian commented on July 18, 2024

> Move the jupyter book to a Data Tidying subsection of the Fundamentals section of the Xarray tutorial.

We already have one: https://tutorial.xarray.dev/data_cleaning/data_cleaning.html

e-marshall commented on July 18, 2024

Oh sorry, I meant just the content pages, not the full book. I'd imagined the data_cleaning root page having something like the below, where the subheadings (Examples, Contributing) could each have their own page instead of putting their content on the root page. Is that what you had in mind?

Introduction

Examples

1. Aquarius

This is an example of tidying a dataset composed of locally downloaded files. Aquarius is a sea surface salinity dataset produced by NASA and accessed as Network Common Data Form (NetCDF) files. You can find this example here. This example focuses on data access steps and organizing data into a workable data cube.

2. ASE Ice Velocity

This example uses an ice velocity dataset derived from synthetic aperture radar imagery. You can find it here. This example focuses on data access steps and organizing data into a workable data cube.

3. Harmonized Landsat-Sentinel

This example features cloud-optimized data that does not need to be downloaded locally. Here, packages such as odc-stac are used to accomplish much of the initial tidying (assembling an x, y, time cube). However, this example shows that additional formatting is frequently required to make a dataset analysis-ready.

Contributing

This project is an evolving community effort. We want to hear from you! Many workflows involve some version of the examples discussed here. The solutions you’ve developed in your work could help future users and help the community move toward more established norms around tidy data. Please consider submitting any examples you may have. You can create an issue here. If you have any questions or topics you’d like to discuss, please don’t hesitate to create an issue on GitHub.

Note: the issue template currently has some errors that need to be fixed.
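
For readers skimming the outline above, here is a minimal sketch of the kind of tidying the examples describe: turning several single-scene datasets into one (time, y, x) cube. It is not taken from the draft pages. The `make_scene` helper, the `sss` variable name, and the `acquisition_date` attribute are all hypothetical stand-ins for whatever a real set of downloaded files would contain.

```python
import numpy as np
import xarray as xr


def make_scene(date, seed):
    """Hypothetical stand-in for one downloaded file: a 2D (y, x) field
    whose acquisition date lives only in the attributes."""
    rng = np.random.default_rng(seed)
    return xr.Dataset(
        {"sss": (("y", "x"), rng.random((3, 4)))},
        coords={"y": [0.0, 1.0, 2.0], "x": [10.0, 11.0, 12.0, 13.0]},
        attrs={"acquisition_date": date},
    )


def add_time_dim(ds):
    """Promote the date stored in the attributes to a length-1 time dimension."""
    time = np.array([ds.attrs["acquisition_date"]], dtype="datetime64[ns]")
    out = ds.expand_dims(time=time)
    # Drop the per-scene attribute so it is not mistaken for cube-wide metadata.
    out.attrs = {k: v for k, v in ds.attrs.items() if k != "acquisition_date"}
    return out


# Scenes deliberately listed out of chronological order, as files often are.
scenes = [
    make_scene("2012-03-01", seed=2),
    make_scene("2012-01-01", seed=0),
    make_scene("2012-02-01", seed=1),
]

# Concatenate along the new dimension, then sort so time is monotonic.
cube = xr.concat([add_time_dim(ds) for ds in scenes], dim="time").sortby("time")
print(cube)
```

With real files, a function like `add_time_dim` could probably be passed as the `preprocess` argument of `xr.open_mfdataset`, though where the time information actually lives (attributes, filenames, a separate variable) will differ between datasets.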
