Comments (12)
Resurrecting this thread to hopefully work on a 'data tidying' page in the Xarray tutorial (following discussions at SciPy 2023). Do folks still think this would be a helpful addition? I started working on a rough draft to get some initial ideas down. Would love any input on the format as well as additions/improvements! Feel free to edit the hackmd directly and/or discuss here.
from xarray-tutorial.
That works
https://tidyr.tidyverse.org/articles/tidy-data.html
https://vita.had.co.nz/papers/tidy-data.pdf
pinging some folks who may be interested @dcherian @scottyhq @TomNicholas @JessicaS11
> Do folks still think this would be a helpful addition?
100% yes!
I tried to access the hackmd but didn't have permission
Shoot, sorry about that, I thought I'd updated the permissions
Does this link work? https://hackmd.io/@sApCZJjaT9eyK2YL4ckT7w/ryJAe5zSp/edit
I'll edit the original post if so
Thanks @e-marshall for the great draft. I think this would be a fantastic addition to the 'fundamentals' section in its own subsection so that it's prominent and easy to see without having to navigate down into the layers. I left some comments in your document and lightly edited some text.
Copying my Slack comment over. I think:
- The jupyterbook should become a "part" in this tutorial website. We can advertise that we're open for contributions. Potentially it could also become a Pythia cookbook.
- The hackmd doc should become a page in the main xarray reference docs that links here. That way we get visibility without cluttering the main doc site with domain-specific examples like "Aquarius" and whatnot.
Thoughts? @TomNicholas ?
> Thoughts?
I like this suggestion. I like the idea of having a smaller version of this conceptual framework presented somewhere in the main xarray docs. It's important that we point totally new users to this content as early as possible. Also, if we have a good diagram that demonstrates these concepts, I think it would be worthwhile to include it in xarray's main docs (RGB arrays being worth 10^3 words and all that).
I also like the idea of having another place where we can continue to collect more and more examples of how not to do things, but without cluttering the main xarray docs.
Specific comments on content:
- DataTree should probably be mentioned, but with low weight, and possibly only as one termination of a flowchart. Users really don't need it unless their data is fundamentally not alignable, and I would be wary that pointing new users to it too much would lead to some creative new anti-patterns 😅
- As written, the text conflates geoscience tooling with non-geoscience cases, but tidy-xarray concepts are domain-agnostic. We should be clear to only recommend cf-xarray if your data could sensibly conform to the Climate and Forecast (is that what it stands for?) conventions.
- Links to xarray's terminology page would be great, as would any clarifications to it too.
Thanks everyone for the feedback and discussion! To summarize, it seems like next steps are:
- Move the jupyter book to a Data Tidying subsection of the Fundamentals section of the Xarray tutorial.
- Move hackmd doc to xarray main docs
To-do items for each of the above:
1. Jupyter book -- version I'm working on here
   - transfer a few points from the hackmd doc to the intro page of the jupyter book; the jupyter book will hold all domain-specific examples/language
   - add link / ref to terminology page
   - add high-level mention of DataTree
2. Hackmd doc
   - add mention of DataTree
   - add Terminology page
   - remove domain-specific examples/language
   - clean up/combine figures 2 and 3
   - remake fig 1 to be text-based (might lose visual elements like the box/underline) and add links to suggested functions/methods
I'll work on the above-mentioned edits. If there are specific parts that anyone is interested in tackling, that's totally fine too! I'll plan to open PRs in the tutorial and main repo once edits have been made, if that sounds good. I will be presenting a poster on this at AGU next week. Currently I have QR codes linking to the jupyter book hosted on my GitHub Pages and the tidy repo for contributing. If the pages are merged before then I'll update the QR codes, but timing might be tight.
> Move the jupyter book to a Data Tidying subsection of the Fundamentals section of the Xarray tutorial.
We already have one: https://tutorial.xarray.dev/data_cleaning/data_cleaning.html
Oh sorry, I meant just the content pages, not the full book. I'd imagined the data_cleaning root page having something like the below, where the subheadings (Examples, Contributing) could have their own pages instead of having their content on the root page. Is that what you had in mind?
Introduction
Examples
1. Aquarius
This is an example of tidying a dataset composed of locally downloaded files. Aquarius is a sea surface salinity dataset produced by NASA and accessed as network Common Data Form (NetCDF) files. You can find this example here. This example focuses on data access steps and organizing data into a workable data cube.
2. ASE Ice Velocity
This example uses an ice velocity dataset derived from synthetic aperture radar imagery. You can find it here. This example focuses on data access steps and organizing data into a workable data cube.
3. Harmonized Landsat-Sentinel
This example features cloud-optimized data that does not need to be downloaded locally. Here, packages such as odc-stac are used to accomplish much of the initial tidying (assembling an x,y,time cube). However, this example shows that additional formatting is frequently required to make a dataset analysis-ready.
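To give a sense of what "organizing data into a workable data cube" can look like, here is a minimal sketch (not taken from any of the three examples). It assumes each file holds a single 2D field and that the acquisition date is only known from something like the file name. The dates, grid sizes, and the variable name `ice_velocity` are all made up for illustration.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-ins for three per-date files. In a real workflow each
# array would come from opening a file; here they are synthetic.
dates = ["2023-01-01", "2023-01-09", "2023-01-17"]
y = np.arange(3)
x = np.arange(4)

slices = []
for i, date in enumerate(dates):
    field = xr.DataArray(
        np.full((y.size, x.size), float(i)),
        dims=("y", "x"),
        coords={"y": y, "x": x},
        name="ice_velocity",  # hypothetical variable name
    )
    # Promote the date to a length-1 "time" dimension so it is stored as a
    # coordinate rather than living only in a file name or attribute.
    slices.append(field.expand_dims(time=[np.datetime64(date, "ns")]))

# Stack the per-date slices into a single (time, y, x) cube.
cube = xr.concat(slices, dim="time")
print(cube.dims, cube.shape)
```

Once time is a real dimension, label-based selection (for example `cube.sel(time=...)`) and reductions along `time` work without any extra bookkeeping.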
Contributing
This project is an evolving community effort. We want to hear from you! Many workflows involve some version of the examples discussed here. The solutions you’ve developed in your work could help future users and help the community move toward more established norms around tidy data. Please consider submitting any examples you may have. You can create an issue here. If you have any questions or topics you’d like to discuss, please don’t hesitate to create an issue on GitHub.
note: issue template has some errors currently, need to fix