Comments (11)

kbgg commented on June 2, 2024

That's correct. Each dataset is slightly different (e.g. vector vs. raster labels, different imagery bands), so there is no "one size fits all" notebook. We don't have an example notebook for the SpaceNet dataset, so your best bet would be to explore how the SpaceNet datasets are structured in our API and repurpose an existing notebook.
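One way to explore how a collection is structured is to tally which combinations of asset keys its items carry. The sketch below assumes a page of items has already been fetched and parsed into a dict with a "features" list (the STAC ItemCollection shape that appears later in this thread); `summarize_asset_keys` and the item IDs in the example are hypothetical, while the asset keys are the ones seen in this thread.

```python
from collections import Counter


def summarize_asset_keys(page):
    """Count how many items share each combination of asset keys.

    `page` is assumed to be a parsed STAC ItemCollection: a dict with a
    "features" list of item dicts (hypothetical helper, not part of the API).
    """
    counts = Counter()
    for item in page.get("features", []):
        # Sort the keys so the same set of assets always maps to the same tuple.
        keys = tuple(sorted((item.get("assets") or {}).keys()))
        counts[keys] += 1
    return counts


# Illustrative page: two imagery items and one label item (IDs are made up).
page = {
    "features": [
        {"id": "img64", "assets": {"MS": {}, "PAN": {}, "PS-MS": {}, "PS-RGB": {}}},
        {"id": "img65", "assets": {"MS": {}, "PAN": {}, "PS-MS": {}, "PS-RGB": {}}},
        {"id": "label64", "assets": {"labels": {}}},
    ]
}
for keys, n in summarize_asset_keys(page).most_common():
    print(n, keys)
```

Running this over a real page makes it easy to see whether imagery and label items are mixed in one collection.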

from mlhub-tutorials.

kbgg commented on June 2, 2024

Hi Ashwin,

You shouldn't be receiving that error, I'm looking into it and will get back to you shortly.

Best,
Kevin

kbgg commented on June 2, 2024

@ashnair1 This should now be resolved. The links to the source imagery were pointing to an invalid item ID, and the 'label' asset should have used the 'labels' key instead. Both issues have been fixed.

ashnair1 commented on June 2, 2024

The image issue seems to be resolved, but I can't seem to access the labels.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-1830123587e0> in <module>
----> 1 get_items(f'https://api.radiant.earth/mlhub/v1/collections/{collectionId}/items?limit={limit}', max_items_downloaded=1)

<ipython-input-4-7e48d49c1897> in get_items(uri, classes, cloud_and_shadow, seasonal_snow, max_items_downloaded, items_downloaded)
     88 
     89         # Download the label and source imagery for the item
---> 90         download_source_and_labels(feature)
     91 
     92         # Stop downloaded items if we reached the maximum we specify

<ipython-input-4-7e48d49c1897> in download_source_and_labels(item)
     38     import pdb
     39     pdb.set_trace()
---> 40     labels = item.get('assets').get('labels')
     41     links = item.get('links')
     42 

TypeError: 'NoneType' object is not subscriptable

It seems the item doesn't have a 'labels' key in its assets:

ipdb> item.get('assets')
{'MS': {'href': 'https://api.radiant.earth/mlhub/v1/download/cd9b80a8ba7a6e2c05f8420a11caa583dac46ff7ca02030e23a4ccafc4527b4f', 'title': 'MS-geotiff', 'type': 'image/tiff; application=geotiff'}, 'PAN': {'href': 'https://api.radiant.earth/mlhub/v1/download/6991b88e54cf4f1d95779d9ca07d0cd5696a64cd2f01c06c6ebeccfecc6f71a9', 'title': 'PAN-geotiff', 'type': 'image/tiff; application=geotiff'}, 'PS-MS': {'href': 'https://api.radiant.earth/mlhub/v1/download/0e2d142874a26ee67e36ec048c6ef75ada84fe86f0c23dc8cb99bf631611a9dd', 'title': 'PS-MS-geotiff', 'type': 'image/tiff; application=geotiff'}, 'PS-RGB': {'href': 'https://api.radiant.earth/mlhub/v1/download/75563ba9260f85f3496c7391719c34c0ab2e371e05b047f46f00a5bc8179fc1d', 'title': 'PS-RGB-geotiff', 'type': 'image/tiff; application=geotiff'}}

kbgg commented on June 2, 2024

It looks like you're using the BigEarthNet notebook. In all of the datasets except SpaceNet, we've separated source imagery items and label items into different collections. In SpaceNet, both kinds of item live in the same collection, and source imagery items do not have a labels asset. Since the BigEarthNet notebook expects the source imagery and labels to be in separate collections, it errors out when it reaches a source imagery item instead of a label item. You can add a check for a labels asset and skip the item if there isn't one.
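The check described above can be written as a small filter placed in front of the notebook's `download_source_and_labels(feature)` call. This is a sketch: `label_items` is a hypothetical helper, and it treats any record without a "labels" asset (including records with no "assets" at all) as something to skip. Using `item.get("assets") or {}` also avoids the `TypeError` from subscripting `None` seen in the traceback earlier.

```python
def label_items(features):
    """Yield only the items that carry a "labels" asset.

    In the SpaceNet collections, label and source-imagery items share one
    collection, so imagery items (MS, PAN, PS-MS, PS-RGB assets) are skipped.
    """
    for item in features:
        assets = item.get("assets") or {}  # tolerate a missing or null "assets"
        if "labels" not in assets:
            continue  # source imagery, or a record with no assets: skip it
        yield item


# Illustrative records (IDs are made up).
features = [
    {"id": "img64", "assets": {"MS": {}, "PAN": {}}},
    {"id": "collection-record"},  # no "assets" key at all
    {"id": "label64", "assets": {"labels": {}}},
]
print([item["id"] for item in label_items(features)])
```

In the notebook loop this would look like `for feature in label_items(page["features"]): download_source_and_labels(feature)`, where `page` is the parsed response for the current request.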

ashnair1 commented on June 2, 2024

After adding the check for labels, I was able to download the MS, PAN, PS-MS and PS-RGB versions of img64. But then a record with no assets field appeared:

{'description': 'SpaceNet 2 Khartoum Chipped Training Dataset', 'extent': {'spatial': {'bbox': [[32.4858384, 15.5138111999, 32.5665684, 15.7402062]]}, 'temporal': {'interval': [['2015-04-13T00:00:00Z', None]]}}, 'id': 'sn2_AOI_5_Khartoum', 'license': 'CC-BY-SA-4.0', 'links': [{'href': 'https://api.radiant.earth/mlhub/v1/collections/sn2_AOI_5_Khartoum', 'rel': 'self'}, {'href': 'https://api.radiant.earth/mlhub/v1/', 'rel': 'parent'}, {'href': 'https://api.radiant.earth/mlhub/v1/', 'rel': 'root'}, {'href': 'https://api.radiant.earth/mlhub/v1/collections/sn2_AOI_5_Khartoum/items', 'rel': 'items'}], 'properties': {'license': 'CC-BY-SA-4.0', 'providers': [{'name': 'SpaceNet LLC', 'roles': ['processor', 'host', 'licensor', 'producer'], 'url': 'https://api.radiant.earth/mlhub/v1/download/017ab8ab69ffa44271d32452fe85eab079ac53ef3370b2eed74e2e87769eae57'}]}, 'providers': [{'name': 'SpaceNet LLC', 'roles': ['processor', 'host', 'licensor', 'producer'], 'url': 'https://api.radiant.earth/mlhub/v1/download/078e2ee114866281d8d728c610d8cce8b3780edb1f7e010c49a5e20776c636ee'}], 'stac_extensions': ['label'], 'version': 1}

Could you elaborate on what this record is for?

kbgg commented on June 2, 2024

That would be a STAC Collection record, which doesn't have assets. I'm not sure how your script navigated to that page, but links with the rel type "parent" or "collection" in an item point to that item's collection.
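A downloader can avoid landing on Collection records by following only pagination links and by checking a record's type before reading its assets. The sketch below relies on the STAC rule that Items are GeoJSON Features (`"type": "Feature"`), and assumes the API advertises the next page with a rel="next" link, as the STAC API convention specifies; whether this particular API version does so is an assumption. Both helper names are hypothetical, and the example records are illustrative.

```python
def is_stac_item(record):
    """Return True for a STAC Item, False for other records such as Collections.

    STAC Items are GeoJSON Features, so they carry "type": "Feature".
    The sn2_AOI_5_Khartoum Collection record above has no such field.
    """
    return record.get("type") == "Feature"


def next_page_href(page):
    """Return the href of the page's rel="next" link, or None if there is none.

    Links such as "self", "parent", "root" and "collection" lead back up the
    catalog (to Collection records), so a downloader should not follow them.
    """
    for link in page.get("links", []):
        if link.get("rel") == "next":
            return link.get("href")
    return None


# Illustrative records.
collection_record = {
    "id": "sn2_AOI_5_Khartoum",
    "links": [{"rel": "parent", "href": "https://api.radiant.earth/mlhub/v1/"}],
}
item = {"type": "Feature", "id": "img64", "assets": {"PS-RGB": {}}}
print(is_stac_item(collection_record), is_stac_item(item))
print(next_page_href(collection_record))
```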

ashnair1 commented on June 2, 2024

Right. Just to clarify, I'm trying to repurpose the download code from the BigEarthNet notebook to download the SpaceNet datasets. However, it seems it might not be as simple as replacing collectionId in the notebook (from bigearthnet_v1_labels to sn2_AOI_3_Paris), as I originally thought.

As a side note, do you have any examples of using the API to download the SpaceNet datasets? That would be really helpful, since SpaceNet differs from the other datasets.

ashnair1 commented on June 2, 2024

Just a follow-up question: how can I check the structure of the SpaceNet dataset in the API? I can't seem to find the labels.

Edit: I've observed a couple of things and wanted to know if they were intentional.

(Pdb) rc = requests.get('https://stac-api.radiant.earth/collections/sn2_AOI_3_Paris/items?limit=1000', headers=headers)
(Pdb) rc1 = requests.get('https://api.radiant.earth/mlhub/v1/collections/sn2_AOI_3_Paris/items?limit=1000', headers=headers)
(Pdb) rc1.json().keys()
*** json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
(Pdb) rc.json().keys()
dict_keys(['type', 'stac_extensions', 'context', 'numberMatched', 'numberReturned', 'features', 'links'])

  1. The link https://api.radiant.earth/mlhub/v1/collections (which is provided in the notebook) doesn't seem to support large limits, but https://stac-api.radiant.earth/collections does.
  2. Setting limit to a low value such as 10 won't return any labels, because the initial records are images and label items only appear later on. Setting limit to 10 gives 10 images instead of 10 image-label pairs. This seems like a problem if you only want a subset rather than the entire dataset. I suppose you could skip the source images, download the labels, and then download the imagery via the labels' links to it, but that would still involve iterating over the entire dataset.
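The label-first approach in point 2 can at least stop paging as soon as enough labels have been found, and it downloads no imagery along the way. It does not avoid paging past imagery items that come first, as noted above. This is a sketch under stated assumptions: pages are linked by rel="next" (the STAC API convention), label items are those with a "labels" asset, and `first_label_items` and `fetch_json` are hypothetical helpers. The fetch function is a parameter so that the `headers` used in the session above (or a fake, for testing) can be supplied.

```python
import json
import urllib.request


def fetch_json(url, headers=None):
    """GET a URL and parse the response body as JSON (minimal stdlib fetcher)."""
    request = urllib.request.Request(url, headers=headers or {})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def first_label_items(start_url, max_labels, fetch=fetch_json):
    """Collect up to `max_labels` label items, fetching pages one at a time.

    Pages are followed through rel="next" links, and paging stops as soon as
    `max_labels` labels have been found. Imagery items are skipped; their
    hrefs can be recovered later from each label item's links.
    """
    labels = []
    url = start_url
    while url and len(labels) < max_labels:
        page = fetch(url)
        for item in page.get("features", []):
            if "labels" in (item.get("assets") or {}):
                labels.append(item)
                if len(labels) == max_labels:
                    return labels
        # Advance to the next page, if the response advertises one.
        url = next(
            (link.get("href") for link in page.get("links", []) if link.get("rel") == "next"),
            None,
        )
    return labels


# Hypothetical usage; `headers` is whatever the notebook already sends.
# labels = first_label_items(
#     "https://api.radiant.earth/mlhub/v1/collections/sn2_AOI_3_Paris/items?limit=100",
#     max_labels=10,
#     fetch=lambda url: fetch_json(url, headers),
# )
```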

ashnair1 commented on June 2, 2024

I'm wondering why only the SpaceNet datasets are structured like this. The workflow in your tutorial makes a lot of sense, but it only works if the source imagery and labels are in separate collections. Grouping the imagery and labels into one collection without image-label pairing makes it harder to get a subset and makes downloading the entire dataset tedious. Of course, I might be missing something obvious, such as a way to query only the labels and then download the images via their links. If that's the case, please let me know.

kbgg commented on June 2, 2024

Hi Ashwin,

I pushed some fixes to the API this morning that should resolve the issue with large limits. The stac-api domain is not currently supported for public access; it's only used for internal testing. The SpaceNet team created their first catalog for the SN2 challenge, and it included both labels and imagery in the same collection. When we created the catalogs for the remaining challenges, we kept the same format for consistency, since the SpaceNet team would be using the catalogs we generated as well. For the SpaceNet datasets, the best approach really is to iterate through the collection and determine which items are labels and which are imagery.

Best,
Kevin
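Following the approach suggested above, a single pass over the collection's items can separate labels from imagery and record which imagery each label refers to. This sketch makes two assumptions: label items are the ones with a "labels" asset, and they reference their imagery through links with rel="source", which is the STAC label extension's convention but is not confirmed for these catalogs in this thread. `pair_labels_with_sources` is a hypothetical name, and the example records and URLs are illustrative.

```python
def pair_labels_with_sources(items):
    """Split items into labels and imagery, pairing each label with its source hrefs.

    Returns ({label_id: [source_href, ...]}, [imagery_id, ...]).
    Records with no assets (e.g. Collection records) are ignored.
    """
    pairs, imagery_ids = {}, []
    for item in items:
        assets = item.get("assets") or {}
        if not assets:
            continue  # not an item with downloadable assets
        if "labels" in assets:
            # Assumed convention: label items link to their imagery with rel="source".
            pairs[item["id"]] = [
                link["href"] for link in item.get("links", []) if link.get("rel") == "source"
            ]
        else:
            imagery_ids.append(item["id"])
    return pairs, imagery_ids


# Illustrative records.
items = [
    {"id": "img64", "assets": {"PS-RGB": {}}, "links": []},
    {
        "id": "label64",
        "assets": {"labels": {}},
        "links": [
            {"rel": "source", "href": "https://example.com/items/img64"},
            {"rel": "collection", "href": "https://example.com/collections/sn2_AOI_3_Paris"},
        ],
    },
    {"id": "sn2_AOI_3_Paris", "links": []},
]
pairs, imagery_ids = pair_labels_with_sources(items)
print(pairs)
print(imagery_ids)
```

Combined with page-by-page fetching, this lets the label items drive which imagery gets downloaded, instead of downloading every imagery item in the collection.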
