Using the tutorial notebook provided for downloading the bigearth dataset, I tried dow

Spacenet dataset download problem about mlhub-tutorials HOT 11 CLOSED

radiantearth commented on June 2, 2024

Spacenet dataset download problem

from mlhub-tutorials.

Comments (11)

kbgg commented on June 2, 2024

That's correct. Each dataset is slightly different (e.g. vector vs. raster labels, imagery bands), so there is no "one size fits all" notebook. We don't have an example notebook for the SpaceNet dataset, so your best bet would be to explore how the SpaceNet datasets are structured in our API and repurpose an existing notebook.
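
One way to explore how a collection is structured is to count which asset keys appear across a sample of its items. The following is only a minimal sketch: `summarize_asset_keys` and `sample_items` are hypothetical names, and the sample records are hand-written to resemble the ones quoted elsewhere in this thread rather than fetched from the API.

```python
from collections import Counter


def summarize_asset_keys(items):
    """Count how often each asset key appears across a list of item dicts."""
    counts = Counter()
    for item in items:
        # Records without assets (for example a Collection) contribute nothing.
        assets = item.get("assets") or {}
        counts.update(assets.keys())
    return counts


# Hypothetical sample shaped like the records quoted in this thread.
sample_items = [
    {"id": "img-1", "assets": {"MS": {}, "PAN": {}, "PS-MS": {}, "PS-RGB": {}}},
    {"id": "label-1", "assets": {"labels": {}}},
    {"id": "no-assets", "links": []},
]

for key, count in sorted(summarize_asset_keys(sample_items).items()):
    print(key, count)
```

Running this over the `features` list of a real response would show at a glance whether a page contains label items, imagery items, or both.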

kbgg commented on June 2, 2024

Hi Ashwin,

You shouldn't be receiving that error. I'm looking into it and will get back to you shortly.

Best,
Kevin

kbgg commented on June 2, 2024

@ashnair1 This should now be resolved. The links to the source imagery were pointing to an invalid item ID and the 'label' asset really should have had the 'labels' key instead. Both of these issues have been fixed.

ashnair1 commented on June 2, 2024

The image issue seems to be resolved. But I can't seem to access labels.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-1830123587e0> in <module>
----> 1 get_items(f'https://api.radiant.earth/mlhub/v1/collections/{collectionId}/items?limit={limit}', max_items_downloaded=1)

<ipython-input-4-7e48d49c1897> in get_items(uri, classes, cloud_and_shadow, seasonal_snow, max_items_downloaded, items_downloaded)
     88 
     89         # Download the label and source imagery for the item
---> 90         download_source_and_labels(feature)
     91 
     92         # Stop downloaded items if we reached the maximum we specify

<ipython-input-4-7e48d49c1897> in download_source_and_labels(item)
     38     import pdb
     39     pdb.set_trace()
---> 40     labels = item.get('assets').get('labels')
     41     links = item.get('links')
     42 

TypeError: 'NoneType' object is not subscriptable

It seems item doesn't have a labels field:

ipdb> item.get('assets')
{'MS': {'href': 'https://api.radiant.earth/mlhub/v1/download/cd9b80a8ba7a6e2c05f8420a11caa583dac46ff7ca02030e23a4ccafc4527b4f', 'title': 'MS-geotiff', 'type': 'image/tiff; application=geotiff'}, 'PAN': {'href': 'https://api.radiant.earth/mlhub/v1/download/6991b88e54cf4f1d95779d9ca07d0cd5696a64cd2f01c06c6ebeccfecc6f71a9', 'title': 'PAN-geotiff', 'type': 'image/tiff; application=geotiff'}, 'PS-MS': {'href': 'https://api.radiant.earth/mlhub/v1/download/0e2d142874a26ee67e36ec048c6ef75ada84fe86f0c23dc8cb99bf631611a9dd', 'title': 'PS-MS-geotiff', 'type': 'image/tiff; application=geotiff'}, 'PS-RGB': {'href': 'https://api.radiant.earth/mlhub/v1/download/75563ba9260f85f3496c7391719c34c0ab2e371e05b047f46f00a5bc8179fc1d', 'title': 'PS-RGB-geotiff', 'type': 'image/tiff; application=geotiff'}}

kbgg commented on June 2, 2024

It looks like you're using the BigEarthNet notebook. In all of the datasets except SpaceNet, we've separated source imagery items and label items into different collections. Source imagery items will not have a labels asset. Since the BigEarthNet notebook expects the source imagery and labels to be in separate collections, it errors out when it reaches a source imagery item instead of a label item. You can add a check for a labels asset and skip the item if there isn't one.
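
A check along those lines might look like the sketch below. It is not the notebook's code: `has_labels_asset` and `download_labels_only` are hypothetical helpers, and the `download` argument stands in for the notebook's `download_source_and_labels` function.

```python
def has_labels_asset(item):
    """Return True if the item dict carries a 'labels' asset."""
    # Tolerate a missing 'assets' key or an explicit None value.
    assets = item.get("assets") or {}
    return "labels" in assets


def download_labels_only(items, download):
    """Call `download` for label items and skip everything else.

    `download` is a hypothetical callable standing in for the notebook's
    download_source_and_labels(). Returns the ids of the skipped items.
    """
    skipped = []
    for item in items:
        if not has_labels_asset(item):
            skipped.append(item.get("id"))
            continue
        download(item)
    return skipped
```

Returning the skipped ids makes it easy to confirm that only source imagery items were passed over.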

ashnair1 commented on June 2, 2024

After adding the check for labels, I was able to download the MS, PAN, PS-MS and PS-RGB versions of img64. But then a record with no assets field appeared:

{'description': 'SpaceNet 2 Khartoum Chipped Training Dataset', 'extent': {'spatial': {'bbox': [[32.4858384, 15.5138111999, 32.5665684, 15.7402062]]}, 'temporal': {'interval': [['2015-04-13T00:00:00Z', None]]}}, 'id': 'sn2_AOI_5_Khartoum', 'license': 'CC-BY-SA-4.0', 'links': [{'href': 'https://api.radiant.earth/mlhub/v1/collections/sn2_AOI_5_Khartoum', 'rel': 'self'}, {'href': 'https://api.radiant.earth/mlhub/v1/', 'rel': 'parent'}, {'href': 'https://api.radiant.earth/mlhub/v1/', 'rel': 'root'}, {'href': 'https://api.radiant.earth/mlhub/v1/collections/sn2_AOI_5_Khartoum/items', 'rel': 'items'}], 'properties': {'license': 'CC-BY-SA-4.0', 'providers': [{'name': 'SpaceNet LLC', 'roles': ['processor', 'host', 'licensor', 'producer'], 'url': 'https://api.radiant.earth/mlhub/v1/download/017ab8ab69ffa44271d32452fe85eab079ac53ef3370b2eed74e2e87769eae57'}]}, 'providers': [{'name': 'SpaceNet LLC', 'roles': ['processor', 'host', 'licensor', 'producer'], 'url': 'https://api.radiant.earth/mlhub/v1/download/078e2ee114866281d8d728c610d8cce8b3780edb1f7e010c49a5e20776c636ee'}], 'stac_extensions': ['label'], 'version': 1}

Could you elaborate on what this record is for?

kbgg commented on June 2, 2024

That would be a STAC Collection record, which doesn't have assets. I'm not sure how your script navigated to that page, but links with the rel type "parent" or "collection" in an item will link to that item's collection.
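
To keep a script from treating such a record as an item, it can test for assets before downloading and avoid following collection-level links. The sketch below is a heuristic, not an official API check: `is_item_record` and `hrefs_for_rel` are hypothetical helpers, and `collection_record` is an abbreviated copy of the record quoted above.

```python
def is_item_record(record):
    """Heuristic: treat a record as an item only if it has a non-empty 'assets' dict."""
    assets = record.get("assets")
    return isinstance(assets, dict) and bool(assets)


def hrefs_for_rel(record, rels):
    """Return the hrefs of all links whose 'rel' value is in `rels`."""
    return [
        link["href"]
        for link in record.get("links") or []
        if link.get("rel") in rels and "href" in link
    ]


# Abbreviated copy of the Collection record quoted in this thread.
collection_record = {
    "id": "sn2_AOI_5_Khartoum",
    "links": [
        {"href": "https://api.radiant.earth/mlhub/v1/collections/sn2_AOI_5_Khartoum", "rel": "self"},
        {"href": "https://api.radiant.earth/mlhub/v1/", "rel": "parent"},
        {"href": "https://api.radiant.earth/mlhub/v1/", "rel": "root"},
        {"href": "https://api.radiant.earth/mlhub/v1/collections/sn2_AOI_5_Khartoum/items", "rel": "items"},
    ],
}

print(is_item_record(collection_record))            # no 'assets', so not an item
print(hrefs_for_rel(collection_record, {"items"}))  # where the collection's items live
```

The same `hrefs_for_rel` helper can be used on an item to see which of its links lead back to a collection (`"parent"` or `"collection"`), so a crawler can leave those out.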

ashnair1 commented on June 2, 2024

Right. Just to clarify, I'm trying to repurpose the download code from the BigEarthNet notebook to download the Spacenet datasets. However, it seems to me that it might not be as simple as just replacing collectionId in the notebook (from bigearthnet_v1_labels to sn2_AOI_3_Paris), as I originally thought.

As a side note, do you have any examples of using the API to download the SpaceNet datasets? That would be really helpful, since it differs from the other datasets.

ashnair1 commented on June 2, 2024

Just a follow up question. How can I check the structure of the Spacenet dataset in the API? I can't seem to find the labels.

Edit: I've observed a couple of things and wanted to know if it was intentional.

(Pdb) rc = requests.get('https://stac-api.radiant.earth/collections/sn2_AOI_3_Paris/items?limit=1000', headers=headers)
(Pdb) rc1 = requests.get('https://api.radiant.earth/mlhub/v1/collections/sn2_AOI_3_Paris/items?limit=1000', headers=headers)
(Pdb) rc1.json().keys()
*** json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
(Pdb) rc.json().keys()
dict_keys(['type', 'stac_extensions', 'context', 'numberMatched', 'numberReturned', 'features', 'links'])

First, https://api.radiant.earth/mlhub/v1/collections (the link provided in the notebook) doesn't seem to support large limits, but https://stac-api.radiant.earth/collections does.
Second, setting limit to a low value such as 10 won't return any labels, because the initial records are images and label records only appear later on: a limit of 10 gives 10 images rather than 10 image-label pairs. This seems like a problem if you only want a subset rather than the entire dataset. I suppose you could filter out the source images, download the labels, and then download the imagery, since the labels link to the imagery, but that would still involve iterating over the entire dataset.
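
For the subset case, one option is to walk the collection page by page and stop as soon as enough label items have been seen. The sketch below assumes each page is a dict with `features` and `links` keys (as in the response shown above) and that the next page is advertised through a link with rel `"next"`; that last point is an assumption about the API, not something confirmed in this thread. `collect_label_items` and `fetch_page` are hypothetical names.

```python
def collect_label_items(start_url, fetch_page, max_labels):
    """Follow 'next' links from start_url until max_labels label items are found.

    fetch_page is a caller-supplied function mapping a URL to the decoded
    JSON page (a dict with 'features' and 'links').
    """
    labels = []
    seen_urls = set()
    url = start_url
    while url and url not in seen_urls and len(labels) < max_labels:
        seen_urls.add(url)  # guard against a 'next' link that loops back
        page = fetch_page(url)
        for feature in page.get("features", []):
            if "labels" in (feature.get("assets") or {}):
                labels.append(feature)
                if len(labels) == max_labels:
                    break
        # Assumption: pagination is exposed as a link with rel == "next".
        url = next(
            (link.get("href") for link in page.get("links", [])
             if link.get("rel") == "next"),
            None,
        )
    return labels
```

With `requests`, `fetch_page` could be something like `lambda url: requests.get(url, headers=headers).json()`. This still scans from the start of the collection, so it only saves requests when label items turn up before the final page.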

ashnair1 commented on June 2, 2024

I'm wondering why the Spacenet datasets alone are structured like this. The workflow specified in your tutorial makes a lot of sense, but it only works if the source imagery and labels are separate. Grouping the imagery and labels into one collection without image-label pairing makes it harder to get a subset and makes downloading the entire dataset tedious. Of course, I might be missing something obvious, such as a way to query just the labels from the dataset and download the images via the link field. If that's the case, please do let me know.

kbgg commented on June 2, 2024

Hi Ashwin,

I pushed some fixes this morning to the API which should fix the issue with large limits on the API. Accessing the stac-api domain is not currently supported and it's only used for internal testing. The SpaceNet team created their first catalog for the SN2 challenge which included both labels and imagery in the same collection. When we created the catalogs for the rest of the challenges we kept the same format to keep things consistent as they would be using the catalogs we generated as well. For the SpaceNet dataset the best path really is to iterate through the dataset and determine which ones are labels and which are imagery.
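
The iterate-and-classify approach described above could be sketched as follows. This is an illustration under assumptions rather than the catalog's documented layout: it treats any item with a `labels` asset as a label item, and it assumes label items point at their source imagery through links with rel `"source"`, which is the STAC label extension's convention but is not confirmed for these collections in this thread. Both function names are hypothetical.

```python
def split_labels_and_imagery(items):
    """Partition item dicts into (label_items, imagery_items) by their assets."""
    labels, imagery = [], []
    for item in items:
        assets = item.get("assets") or {}
        if "labels" in assets:
            labels.append(item)
        elif assets:
            # Any other item with assets is treated as source imagery.
            imagery.append(item)
    return labels, imagery


def source_imagery_hrefs(label_item):
    """Return the hrefs of a label item's source imagery links.

    Assumption: source imagery is linked with rel == "source".
    """
    return [
        link["href"]
        for link in label_item.get("links") or []
        if link.get("rel") == "source" and "href" in link
    ]
```

Records without any assets, such as a Collection, end up in neither list.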

Best,
Kevin
