pangeo-data / foss4g-2022
Pangeo tutorial at FOSS4G 2022
Home Page: https://pangeo-data.github.io/foss4g-2022
License: Other
Please add here any additional resources you consider relevant to highlight in the Beyond the workshop section.
I guess it would be nice to have a Docker image we could just pull to start a Dask cluster on EOSC infrastructure, on any other K8S JupyterHub, or even on a laptop, to reproduce the tutorial prepared here.
I'm not sure whether this was already done by @tinaok or any of you in some other repo. Should it go here or in the Pangeo-eosc repo? (I would say here, because the environment is tied to the FOSS4G application.)
I've done something similar in https://github.com/guillaumeeb/pangeo-docker by extending the Pangeo Docker images. I can work on this if you find it interesting and haven't done it yet.
Mention other Pangeo-related presentations at FOSS4G in BEYOND THE WORKSHOP > Resources?
Friday 26th
The goal of this issue is to discuss what we should put in the chunking introduction notebook.
This comes from comments and discussions in #45.
I believe we should do the following, but this needs discussion (especially with @tinaok):
open_dataset and the chunks attribute, introduce Dask Arrays, talk about laziness and sequential processing. Also talk about native chunks in files here, if possible without kerchunk (with h5py maybe?).
Here are some propositions as discussed in #45.
Please indicate whether it's OK for you (especially @tinaok):
At the very end of episode 1 (Xarray), we add info on how to add compression when saving a file.
Then we remove compression from "Chunking and compression".
Summary
Issue to add contributors information with @allcontributors.
What needs to be done?
Write a comment with the contribution information with the following nomenclature:
@all-contributors please add @<username> for <contributions>
See the available Emoji Key here.
Feel free to list those contribution types you find more relevant to this repo.
Further info of the all-contributors bot here.
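For anyone adding several people at once, the comment format above can be generated with a small helper. This is only a sketch: the function name, the username octocat and the chosen contribution types are illustrative assumptions, not something defined in this repo.

```python
def contributor_comment(username, contributions):
    """Build a comment in the '@all-contributors please add ...' format."""
    kinds = ", ".join(contributions)
    return f"@all-contributors please add @{username} for {kinds}"


# Hypothetical username and contribution types, for illustration only.
comment = contributor_comment("octocat", ["code", "doc"])
print(comment)
```

The resulting string is meant to be pasted as a comment on the issue; the bot itself is not called from this code.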
Who can help?
Anyone
The current version opens hyperlinks in the same tab.
I've lost track of the infrastructure a bit but, to reproduce the error @j34ni is facing, I started to run the notebook.
Once I try to create the cluster I get ClientResponseError: 401, message='Unauthorized', url=URL('http://api-daskhub-dask-gateway.daskhub:8000/api/v1/clusters/')
Does anyone know the reason?
Testing the code without it doesn't reproduce the error he is facing.
Currently, there are two main folders: NOTEBOOKS (not fond of the upper case style), and tutorial. In tutorial: notebooks.
I'm under the impression that all notebooks that workshop participants want to see should be in tutorial/notebooks.
If so, we should move important content from NOTEBOOKS folder to tutorial/notebooks, and either remove the NOTEBOOKS folder, or rename it in a way it is clear that its content is not important for learning about Pangeo.
Anyway, I think we should clarify things here.
There should be a "how to run things" section somewhere. I think this is foreseen and intended to go there: https://pangeo-data.github.io/foss4g-2022/before/setup.html, am I right?
I propose to put the following content:
Thoughts?
Hi all,
It seems you are all working on some part of this repo (building use cases for @pl-marasco and @acocac), working on docs/Jupyter book for @annefou and @acocac, integrating things with Dask distributed and EOSC for @tinaok.
I'd like to contribute some simple things here to help, but I'm not sure which part to dive into, in addition to reviewing content/notebooks if asked.
A few things I notice for now when opening the repo:
Should you/we open issues for the various tasks to do, and see who wants to work on them?
Or do you have any ideas for simple things I could do (I cannot commit to lengthy tasks, unfortunately)?
Best,
We already have a main setup page and an environment.yaml file.
Is it really important that each of our notebooks lists the main libraries it needs?
How do we choose the libraries we put there: those which are imported in cells? Those from which we use some objects one way or another?
I understand that we may want each notebook to contain all the information needed to run it, but this section is really hard to maintain, and maybe we should just link to the setup page and the environment.yaml?
I'd propose to have a separate episode for data discovery after Parallel computing with Dask. We can keep a short episode introducing remote access with s3fs before Data chunking.
It's a good idea to cross-reference previous Pangeo-related training material at FOSS4G. For instance, I did something similar to cite the Pangeo 101 Galaxy training.
In the data discovery episode, I'll point to the STAC notebooks covered at FOSS4G 2021. I'll also highlight @rabernat's talk on Pangeo-Forge.
Feel free to add further ideas in this issue.
Add citation to all packages where they exist
Add timeline + names (who is teaching what).