Comments (18)

Haimantika commented on September 20, 2024

Hi @adamjstewart, I am interested in contributing to this, but I am fairly new and will need some guidance. What is a good place for me to start?

Haimantika commented on September 20, 2024

This is very helpful, thanks a lot. I will start working on it and get back to you with any questions.

Haimantika commented on September 20, 2024

My bad. That PR only mentioned this issue.

adamjstewart commented on September 20, 2024

Yes, this is a 9th dataset that will benefit from your contribution.

P.S. I reached out to the folks at Source Cooperative. One thing to note is that azure-storage-blob will copy raw files/directories, not zip/tar files. So there won't be an easy way to checksum these. For now, let's just focus on downloading and ignore checksumming.

Haimantika commented on September 20, 2024

Thanks Adam. I will take a look at the code and the dataset tonight and let you know which one I will take up.

Haimantika commented on September 20, 2024

Hey @adamjstewart, I will be taking up the NASA Marine Debris dataset. I will start working on it this weekend.

adamjstewart commented on September 20, 2024

Hi @Haimantika, thanks for volunteering!

Let's pick a single dataset, maybe torchgeo/datasets/benin_cashews.py, and try to convert it to the new syntax. Once that's working, we can repeat for the other 7 datasets and remove all mention of radiant-mlhub.

This is the new dataset website: https://beta.source.coop/technoserve/cashews-benin/

If you create an account, log in, and click generate credentials, you'll see that the Azure URI is https://radiantearth.blob.core.windows.net/mlhub/technoserve-cashew-benin
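To use the Azure SDK, that URI has to be split into an account URL, a container name, and a blob prefix. Below is a minimal standard-library sketch; `parse_blob_uri` and `BlobLocation` are hypothetical names, and it assumes the `<account>.blob.core.windows.net/<container>/<prefix>` layout of the URI above.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class BlobLocation(NamedTuple):
    """Hypothetical container for the pieces of an Azure Blob Storage URI."""

    account_url: str
    container: str
    prefix: str


def parse_blob_uri(uri: str) -> BlobLocation:
    """Split https://<account>.blob.core.windows.net/<container>/<prefix>."""
    parsed = urlparse(uri)
    # Assumption: only the public Azure endpoint is supported, not custom domains
    if not parsed.netloc.endswith(".blob.core.windows.net"):
        raise ValueError(f"not an Azure Blob Storage URI: {uri}")
    # The first path segment is the container; everything after it is the prefix
    container, _, prefix = parsed.path.lstrip("/").partition("/")
    if not container:
        raise ValueError(f"URI has no container: {uri}")
    return BlobLocation(f"{parsed.scheme}://{parsed.netloc}", container, prefix.rstrip("/"))
```

For the Benin cashews URI, `parse_blob_uri("https://radiantearth.blob.core.windows.net/mlhub/technoserve-cashew-benin")` returns `BlobLocation(account_url='https://radiantearth.blob.core.windows.net', container='mlhub', prefix='technoserve-cashew-benin')`.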

We'll add a new dependency on azure-storage-blob in pyproject.toml, requirements/datasets.txt, and requirements/min-reqs.old. I can help determine the minimum supported version.

We'll probably add something similar to download_radiant_mlhub_dataset but for Azure blobs in torchgeo/datasets/utils.py. This can then be imported in torchgeo/datasets/benin_cashews.py and used in _download.
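A minimal sketch of what such a helper might look like, assuming the `ContainerClient` class from azure-storage-blob and an optional credential (for example a SAS token obtained via "generate credentials"). `download_azure_blobs` and `blob_local_path` are hypothetical names, not existing TorchGeo functions.

```python
import os
from typing import List, Optional


def blob_local_path(blob_name: str, root: str) -> str:
    """Map a blob name such as 'prefix/dir/file.tif' to a path under root (hypothetical)."""
    parts = [part for part in blob_name.split("/") if part]
    # Refuse empty names and names that could escape root
    if not parts or any(part in (".", "..") for part in parts):
        raise ValueError(f"refusing to write unsafe blob name: {blob_name!r}")
    return os.path.join(root, *parts)


def download_azure_blobs(
    account_url: str,
    container: str,
    prefix: str,
    root: str,
    credential: Optional[str] = None,
) -> List[str]:
    """Download every blob under prefix into root, keeping the directory layout."""
    # Imported lazily so azure-storage-blob is only needed when downloading
    from azure.storage.blob import ContainerClient

    # A trailing slash keeps 'foo' from also matching 'foo-v2/...'
    name_prefix = prefix.rstrip("/") + "/" if prefix else None
    paths = []
    with ContainerClient(account_url, container, credential=credential) as client:
        for blob in client.list_blobs(name_starts_with=name_prefix):
            if blob.name.endswith("/"):
                continue  # folder marker, nothing to download
            path = blob_local_path(blob.name, root)
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "wb") as f:
                # Stream the blob straight into the local file
                client.download_blob(blob.name).readinto(f)
            paths.append(path)
    return paths
```

Called as `download_azure_blobs("https://radiantearth.blob.core.windows.net", "mlhub", "technoserve-cashew-benin", root)`, it would mirror the dataset's files under `root/technoserve-cashew-benin/`. Importing the SDK inside the function means users who never download the dataset do not need azure-storage-blob installed.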

Let me know if anything is unclear. The first dataset is going to be a bit of work, but once we have one working, the rest should be easy.

Haimantika commented on September 20, 2024

Hi @adamjstewart, I finally got some time to work on it. I see a PR has been raised; is the issue already solved?

adamjstewart commented on September 20, 2024

I have not seen any PRs that implement download support for Source Cooperative. Which PR are you referring to?

Haimantika commented on September 20, 2024

Hi, I did a bit of research, and the latest version of Source Cooperative I could find was beta.source.coop.

Is that it, or am I missing something? I have made the changes and can open a PR for you to take a look.

adamjstewart commented on September 20, 2024

Yes, that's the new website.

Haimantika commented on September 20, 2024

@adamjstewart I have raised a PR. It may not be the solution you are looking for, but I would like to give it one more try after your review; if it still does not work, I will unassign myself to respect your time. :)

darkblue-b commented on September 20, 2024

Reviewed Microsoft's azure-sdk-for-python, which includes examples like this, and took a second look at the azcopy tool. Python is preferred for TorchGeo, and it is not clear how portable dependency management would work for azcopy: Spack and conda have hooks for this kind of binary tool dependency, but pip does not. Simply recommending azcopy and failing gracefully when it is not present was discussed briefly; this is not yet resolved.
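A minimal sketch of the "fail gracefully" option, assuming azcopy's `copy` subcommand with the `--recursive` flag; `azcopy_download` and `MissingToolError` are hypothetical names. Checking with `shutil.which` first gives users without azcopy a clear message instead of a bare `FileNotFoundError` from `subprocess`.

```python
import shutil
import subprocess


class MissingToolError(RuntimeError):
    """Hypothetical error raised when an external command-line tool is not installed."""


def azcopy_download(source_url: str, destination: str, executable: str = "azcopy") -> None:
    """Copy a blob directory with azcopy, failing clearly if azcopy is absent."""
    # shutil.which returns None when the executable is not on PATH
    path = shutil.which(executable)
    if path is None:
        raise MissingToolError(
            f"'{executable}' was not found on PATH. It is a standalone binary that "
            "pip cannot install; install it separately to download this dataset."
        )
    # check=True raises CalledProcessError if azcopy exits with a non-zero status
    subprocess.run([path, "copy", source_url, destination, "--recursive"], check=True)
```

This keeps the Python package installable with pip alone, at the cost of an extra manual step for anyone who wants to download the data.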

adamjstewart commented on September 20, 2024

We definitely don't need all of azure, azure-storage-blob would suffice.

darkblue-b commented on September 20, 2024

This file appears to implement basic Azure Blob Storage functionality: https://github.com/kartAI/kartAI/blob/master/azure/blobstorage.py

adamjstewart commented on September 20, 2024

@Haimantika @darkblue-b all preliminary work is now complete. If you want to claim 1 or more datasets from the above list and start working on them, #2068 will show you what is required to convert them. Note that most of the file changes in that PR are auto-generated by data.py. You really only need to change tests/data/foo/data.py to the new data structure and run it, and change torchgeo/datasets/foo.py to read and download the new data structure.

adamjstewart commented on September 20, 2024

Pinging the original dataset contributors:

If any of you have time, would you be interested in revamping these datasets to download from Source Cooperative?

adamjstewart commented on September 20, 2024

@ashnair1 it looks like SpaceNet is no longer hosted by Source Cooperative and is only on AWS, is that correct? We can use the same CLI function I wrote to update that dataset to its new download location. There's also a newly released SpaceNet8 dataset if you want to add it.
