Comments (15)

TomAugspurger commented on May 23, 2024

Another option is nbsphinx (https://nbsphinx.readthedocs.io/en/0.8.7/). That's what dask-examples (https://github.com/dask/dask-examples) uses, and the notebooks are rendered at https://examples.dask.org/.

from torchgeo.

Geethen commented on May 23, 2024

adamjstewart commented on May 23, 2024

Here is another example of how to do this: https://github.com/PyTorchLightning/lightning-tutorials

adamjstewart commented on May 23, 2024

I started looking into this. nbsphinx can render a notebook directly; pandoc is required if the notebook contains any Markdown cells. This is what Lightning does for its tutorials.
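As a rough illustration of the nbsphinx approach, a Sphinx configuration might look like the following. The file location and the choice of `nbsphinx_execute = "never"` are assumptions made for this sketch, not settings taken from TorchGeo or Lightning.

```python
# docs/conf.py (hypothetical location; adjust to the project's docs layout)

# Enable nbsphinx so that .ipynb files in the docs tree are rendered as pages.
extensions = [
    "nbsphinx",
]

# Skip build output and the checkpoint copies Jupyter writes next to notebooks.
exclude_patterns = ["_build", "**.ipynb_checkpoints"]

# Assumption: render the outputs already stored in each notebook instead of
# re-executing it during the docs build. Accepted values are "always",
# "never", and "auto".
nbsphinx_execute = "never"
```

With `"never"`, the docs build does not need the notebook's runtime dependencies or data, at the cost of trusting whatever outputs were last committed.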

However, PyTorch does something completely different. They instead store each tutorial as a .py file and encode the rst in comments. I'm guessing this makes it easier to test? I'm not sure how this gets automatically converted to a notebook when you open it in Google Colab/MS Learn.
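For reference, PyTorch's tutorials are built with Sphinx-Gallery, which generates a downloadable notebook from each script when the docs are built. The sketch below is not Sphinx-Gallery's implementation. It is a simplified, hypothetical `script_to_notebook` helper that shows the idea, assuming cells are separated by `# %%` markers and that the comment lines directly after a marker form a text cell. The text is passed through unchanged rather than converted from rst to Markdown.

```python
import json


def script_to_notebook(source: str) -> dict:
    """Split a script on '# %%' markers into notebook cells (simplified)."""
    cells = []
    for block in source.split("# %%"):
        lines = block.strip("\n").splitlines()
        # Comment lines directly after a marker become a text cell.
        text = []
        while lines and lines[0].startswith("#"):
            text.append(lines.pop(0).lstrip("#").strip())
        if text:
            cells.append(
                {"cell_type": "markdown", "metadata": {}, "source": "\n".join(text)}
            )
        # Whatever remains in the block is treated as code.
        code = "\n".join(lines).strip()
        if code:
            cells.append(
                {
                    "cell_type": "code",
                    "metadata": {},
                    "source": code,
                    "execution_count": None,
                    "outputs": [],
                }
            )
    # Minimal notebook structure in the version 4 JSON format.
    return {"nbformat": 4, "nbformat_minor": 4, "metadata": {}, "cells": cells}


example = """# %%
# Load data
# Some explanation.
x = [1, 2, 3]

# %%
# Sum it
total = sum(x)
"""

nb = script_to_notebook(example)
notebook_json = json.dumps(nb, indent=1)  # text that could be saved as .ipynb
```

Keeping the tutorial as a plain script means it can be linted, diffed, and imported like any other Python file, which may be part of why it is easier to test.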

adamjstewart commented on May 23, 2024

On the thread of running and testing notebooks to make sure they remain up-to-date, nbmake seems like a good way to integrate things with pytest: https://semaphoreci.com/blog/test-jupyter-notebooks-with-pytest-and-nbmake

However, that requires a specific conda environment to be active, which I don't want. Also, we'll need all dependencies installed and the data available. Some of these training loops could be very time-intensive to run.
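A minimal sketch of how nbmake could be invoked, assuming pytest and nbmake are installed in the current interpreter's environment. The helper names, the `docs/tutorials` directory, and the timeout value are hypothetical. `--nbmake` and `--nbmake-timeout` are nbmake's own pytest flags.

```python
import subprocess
import sys


def nbmake_command(notebook_dir: str, timeout_s: int = 600) -> list[str]:
    """Build a pytest command that runs every notebook under notebook_dir."""
    return [
        sys.executable, "-m", "pytest",
        "--nbmake",                       # collect and execute .ipynb files
        f"--nbmake-timeout={timeout_s}",  # time limit per cell, in seconds
        notebook_dir,
    ]


def run_notebooks(notebook_dir: str) -> int:
    """Run the notebooks and return pytest's exit code."""
    return subprocess.run(nbmake_command(notebook_dir)).returncode


# Example: inspect the command without running it.
command = nbmake_command("docs/tutorials", timeout_s=60)
```

Launching pytest through `sys.executable` runs it with whichever interpreter started the script, so nothing here depends on conda specifically.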

adamjstewart commented on May 23, 2024

Okay, here's what I've decided. We'll use nbsphinx to render the tutorial notebooks and nbmake to test them. Tests will be split into:

  • unit tests (fast, run on every push/pull_request to any branch)
  • integration/functional tests (slow, run on every push/pull_request to a release branch)

This will allow us to iterate quickly on PRs without inundating CI but still make sure that the entire stack, including data download and model training, works as expected before each release. We'll move testing of setup.py and train.py to the integration/functional tests, which will greatly speed up the unit tests as well.
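The split above can be sketched as a small selection rule. This is only an illustration: the `releases/*` naming pattern and the `suites_for` helper are assumptions for the example, and in practice the rule would live in the CI configuration.

```python
from fnmatch import fnmatch

# Hypothetical naming scheme for release branches.
RELEASE_PATTERN = "releases/*"


def suites_for(branch: str) -> list[str]:
    """Return the test suites to run for a push/pull_request to `branch`."""
    suites = ["unit"]  # fast tests run on every branch
    if fnmatch(branch, RELEASE_PATTERN):
        # Slow tests: data download, model training, tutorial notebooks.
        suites.append("integration")
    return suites


main_suites = suites_for("main")
release_suites = suites_for("releases/v0.1")
```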

adamjstewart commented on May 23, 2024

Another possibility, instead of downloading the data ourselves, is to use existing datasets in the cloud. I don't think Google Colab has access to any satellite imagery, and the Planetary Computer is not yet available to the general public. Are there any other cloud services that could work?

Geethen commented on May 23, 2024

Google Earth Engine has a few datasets available for ML as of now:

  • BigEarthNet
  • LandCoverNet

I would be keen on assisting with this at some point.

This also gave me an idea to contribute more datasets to the community catalog.

adamjstewart commented on May 23, 2024

Does GEE support running Jupyter notebooks? I've only ever used JavaScript in their code editor. It's hard to make any assumptions about data availability since the notebook needs to run on Colab, PC, and CI.

Geethen commented on May 23, 2024

Yes, it does via the GEE Python API.

Some drawbacks:

  1. The interactive leafmap/folium map will not stay alive, so you would have to opt for static images (which will need to be downloaded).
  2. Perhaps more seriously, the user would need to download the data to their drive (or Google Drive or GCS). geedim has made this easier for image data (it is the workflow I currently use), but I do not know of a case where the download can be avoided. I wonder whether streaming the data in batches from GEE would be fast enough, or how much of a delay that would introduce.

adamjstewart commented on May 23, 2024

Okay, so this would be no different than our current approach of downloading data from Planetary Computer. Just another source of data.

Geethen commented on May 23, 2024

My apologies. I think that is going to be the case on all platforms for the foreseeable future (until GEE directly supports NNs, which is likely not any time soon).

Side note: in the geedim package, the author used an approach based on rasterio to write image patches in chunks. This is perhaps useful for inference.
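To illustrate the general idea of processing a raster in chunks (this is not geedim's code), here is a minimal standard-library sketch that tiles an image into windows. The `iter_windows` helper and the tuple layout are assumptions for the example.

```python
from typing import Iterator, Tuple

# (row_off, col_off, height, width) of one chunk, in pixels.
Window = Tuple[int, int, int, int]


def iter_windows(height: int, width: int, chunk: int) -> Iterator[Window]:
    """Yield non-overlapping windows that tile a height x width raster."""
    for row in range(0, height, chunk):
        for col in range(0, width, chunk):
            # Edge windows are clipped so they never extend past the raster.
            yield row, col, min(chunk, height - row), min(chunk, width - col)


# Example: a 5 x 4 raster split into chunks of at most 2 x 2 pixels.
windows = list(iter_windows(5, 4, 2))
```

With rasterio, each tuple could be turned into `rasterio.windows.Window(col_off, row_off, width, height)` and passed as the `window` argument when reading or writing, so only one chunk is held in memory at a time.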

adamjstewart commented on May 23, 2024

Even if GEE directly supported NNs, it wouldn't support TorchGeo, so it isn't really relevant to us other than as a possible data source. It would be much more fruitful to be able to directly support data in Colab or Planetary Computer. There's some work in progress on the PC side, but I'm not sure what's available in Colab.

Geethen commented on May 23, 2024

adamjstewart commented on May 23, 2024

Yep, GEE used to have a lot more data, although I think PC might have already caught up on that front. GEE is still far more user-friendly and easier to scale, so it's winning for non-CS people. But GEE is also very limited because it doesn't support NNs. In that sense, GEE is ~10 years behind TorchGeo 😄

(The entire geospatial community is ~10 years behind the computer vision community; computer vision folks haven't used anything other than CNNs for over a decade.)

We're hoping to provide something as easy as possible for geospatial researchers who want to explore deep learning methods. Of course, TorchGeo isn't restricted to Colab or PC: you can use it on your laptop, on a supercomputer, or in the cloud (AWS, Azure, GCP, etc.). As long as you can get your hands on some data and can afford the compute time, you can use TorchGeo.
