
mlhub-tutorials's People

Contributors

danielnwaeze, deanjgoldman, duckontheweb, hamedalemo, kbgg, marccoru, powerchell, tamara-glazer, yonahbg


mlhub-tutorials's Issues

Downloaded labels are missing correct spatial properties

If one follows the cv4a-crop-challenge-download-data.ipynb notebook and downloads the labels (that is, 2_label.tif, 1_label.tif, 3_label.tif and 0_label.tif), for some reason these do not carry any spatial information, i.e. the CRS is missing and the extent is defined as:

0.0000000000000000,-3035.0000000000000000 : 2016.0000000000000000,0.0000000000000000

See the QGIS layer properties (screenshot: qgis_label).

Is there a way to download this data so that the correct spatial information is preserved? This would be awesome, so that the data can be used in conjunction with other sources.
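One possible workaround (a sketch, not part of the notebook; the file names are hypothetical and it assumes a source GeoTIFF from the same tile still carries correct georeferencing) is to copy the CRS and transform onto the downloaded label with rasterio:

import rasterio

# Hypothetical paths: a downloaded label that lost its CRS and a source image
# from the same tile that still has correct georeferencing.
label_path = '0_label.tif'
source_path = '0_source.tif'

# Read the spatial metadata from the source image.
with rasterio.open(source_path) as src:
    crs = src.crs
    transform = src.transform

# Open the label in update mode and write the spatial metadata back.
with rasterio.open(label_path, 'r+') as dst:
    dst.crs = crs
    dst.transform = transform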

Thanks

Spacenet dataset download problem

Using the tutorial notebook provided for downloading the BigEarthNet dataset, I tried downloading the SpaceNet 3 dataset. It failed in the function that downloads the labels and images.
Downloading the label was solved by changing the line labels = item.get('assets').get('labels') to labels = item.get('assets').get('label') in def download_source_and_labels().

However, the script still fails while downloading the image. Looking at the request output, it seems the image cannot be found.

link = {'href': 'https://api.radiant.earth/mlhub/v1/collections/sn3_AOI_3_Paris/items/SN3_roads_train_AOI_3_Paris_PS-RGB_img413', 'rel': 'source', 'type': 'application/json'}

r = requests.get(link['href'], headers=headers)

print(r.json())
# Output = {'code': 404, 'message': 'Item not found'}

So is the data not available? Also, regarding the switch from labels to label, does each dataset use different asset fields?

Edit: I just tried it on other SpaceNet datasets and the problem appears there as well.
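A defensive pattern for this (a sketch, not from the tutorial) is to look up whichever asset key an item actually has and to check the response status before assuming a linked source item exists:

import requests

def get_label_asset(item):
    # Asset keys differ between collections ('labels' vs. 'label'),
    # so return whichever one the item actually carries.
    assets = item.get('assets', {})
    return assets.get('labels') or assets.get('label')

def fetch_source_item(link, headers):
    # Some source links return 404; skip those instead of letting the
    # whole download loop fail.
    r = requests.get(link['href'], headers=headers)
    if r.status_code == 404:
        print('Source item not found, skipping:', link['href'])
        return None
    r.raise_for_status()
    return r.json()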

Failed to download Assets

Hi,
I have encountered the following error while executing download_labels_and_source(item, assets=['labels', 'B02', 'B03', 'B04']) in your radiant-mlhub-landcovernet notebook.


AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_30024/970194576.py in <module>
      4 )
      5 for item in items:
----> 6     download_labels_and_source(item, assets=['labels', 'B02', 'B03', 'B04'])

/tmp/ipykernel_30024/1593371380.py in download_labels_and_source(item, assets, output_dir)
    126 
    127     with ThreadPoolExecutor(max_workers=16) as executor:
--> 128         for argument_batch in executor.map(_get_download_args, source_links):
    129             download_args += argument_batch
    130 

/opt/conda/lib/python3.8/concurrent/futures/_base.py in result_iterator()
    617                     # Careful not to keep a reference to the popped future
    618                     if timeout is None:
--> 619                         yield fs.pop().result()
    620                     else:
    621                         yield fs.pop().result(end_time - time.monotonic())

/opt/conda/lib/python3.8/concurrent/futures/_base.py in result(self, timeout)
    435                     raise CancelledError()
    436                 elif self._state == FINISHED:
--> 437                     return self.__get_result()
    438 
    439                 self._condition.wait(timeout)

/opt/conda/lib/python3.8/concurrent/futures/_base.py in __get_result(self)
    387         if self._exception:
    388             try:
--> 389                 raise self._exception
    390             finally:
    391                 # Break a reference cycle with the exception in self._exception

/opt/conda/lib/python3.8/concurrent/futures/thread.py in run(self)
     55 
     56         try:
---> 57             result = self.fn(*self.args, **self.kwargs)
     58         except BaseException as exc:
     59             self.future.set_exception(exc)

/tmp/ipykernel_30024/1593371380.py in _get_download_args(link)
     88         # Get the item ID (last part of the link path)
     89         source_item_path = urllib.parse.urlsplit(link['href']).path
---> 90         source_item_collection, source_item_id = items_pattern.fullmatch(source_item_path).groups()
     91         source_item = client.get_collection_item(source_item_collection, source_item_id)
     92 

AttributeError: 'NoneType' object has no attribute 'groups'
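The traceback shows that items_pattern.fullmatch() returned None for at least one source link, so calling .groups() fails. A defensive version of that step (a sketch; the regex is hypothetical and stands in for the items_pattern defined earlier in the notebook) checks the match before using it:

import re
import urllib.parse
import radiant_mlhub.client as client

# Hypothetical pattern; the notebook defines its own items_pattern.
items_pattern = re.compile(r'^/mlhub/v1/collections/([^/]+)/items/([^/]+)$')

def _get_download_args(link):
    # Get the item ID (last part of the link path).
    source_item_path = urllib.parse.urlsplit(link['href']).path
    match = items_pattern.fullmatch(source_item_path)
    if match is None:
        # The path did not match the expected pattern; skip the link instead
        # of raising AttributeError when .groups() is called on None.
        print('Skipping unrecognized source link:', link['href'])
        return []
    source_item_collection, source_item_id = match.groups()
    source_item = client.get_collection_item(source_item_collection, source_item_id)
    # ...continue building the download arguments as in the notebook.
    return [source_item]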

limit parameter value to download all data

Hi, I have noticed that setting a value too large for the "limit" parameter leads to a Response [503] error.

    r = requests.get(f'{API_BASE}/collections/{collectionId}/items?key={API_KEY}&limit=%d' % i)

Setting a small value obviously leads to getting only part of the data. What would be a way to download all of the data? Also, what would be a way of finding the number of samples in a given dataset?
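One way to avoid oversized requests (a sketch, assuming the same API_BASE, collectionId and API_KEY variables as in the snippet above) is to keep the limit modest and follow the rel='next' links the API returns; counting the features along the way also gives the number of items in the collection:

import requests

def iterate_items(api_base, collection_id, api_key, page_size=100):
    # Request items in small pages and follow the 'next' links instead of
    # asking for everything in a single oversized request.
    uri = f'{api_base}/collections/{collection_id}/items'
    total = 0
    while uri:
        r = requests.get(uri, params={'key': api_key, 'limit': page_size})
        r.raise_for_status()
        page = r.json()
        features = page.get('features', [])
        total += len(features)
        yield from features
        # The API advertises the next page via a rel='next' link; stop when absent.
        uri = next((link['href'] for link in page.get('links', [])
                    if link.get('rel') == 'next'), None)
    print('Total items:', total)

items = list(iterate_items(API_BASE, collectionId, API_KEY))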

Many thanks,
Michael

Error while downloading the LandCoverNet dataset

Hello, when I use the code to download the LandCoverNet dataset, whether I download part of the dataset or the whole thing, the following error occurs:

JSONDecodeError                           Traceback (most recent call last)

<ipython-input-12-e8676fe95e6b> in <module>
      3                         classes=['Woody Vegetation'],
      4                         max_items_downloaded=10,
----> 5                         downloads=[])
      6 for d in tqdm(to_download):
      7     p.map(download, d)

<ipython-input-7-4b65ded3d0ee> in get_items(uri, classes, max_items_downloaded, items_downloaded, downloads)
     82     print('Loading', uri, '...')
     83     r = requests.get(uri, params={'key': API_KEY})
---> 84     collection = r.json()
     85     for feature in collection.get('features', []):
     86         # Check if the item has one of the label classes we're interested in

D:\Users\Zhwl\Anaconda3\envs\Pytorch36\lib\site-packages\requests\models.py in json(self, **kwargs)
    898                     # used.
    899                     pass
--> 900         return complexjson.loads(self.text, **kwargs)
    901 
    902     @property

D:\Users\Zhwl\Anaconda3\envs\Pytorch36\lib\json\__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    352             parse_int is None and parse_float is None and
    353             parse_constant is None and object_pairs_hook is None and not kw):
--> 354         return _default_decoder.decode(s)
    355     if cls is None:
    356         cls = JSONDecoder

D:\Users\Zhwl\Anaconda3\envs\Pytorch36\lib\json\decoder.py in decode(self, s, _w)
    337 
    338         """
--> 339         obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    340         end = _w(s, end).end()
    341         if end != len(s):

D:\Users\Zhwl\Anaconda3\envs\Pytorch36\lib\json\decoder.py in raw_decode(self, s, idx)
    353         """
    354         try:
--> 355             obj, end = self.scan_once(s, idx)
    356         except StopIteration as err:
    357             raise JSONDecodeError("Expecting value", s, err.value) from None

JSONDecodeError: Expecting ',' delimiter: line 22581 column 15 (char 1000568)
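The error suggests the response body was truncated or was not valid JSON (for example an HTML error page from the server). One defensive pattern (a sketch, not part of the tutorial) is to check the status code and retry the request a few times before calling .json():

import time
import requests

def get_json_with_retries(uri, params=None, retries=3, backoff=5):
    # Retry the request a few times; an HTTP error or a truncated body both
    # surface as failures here instead of crashing deep inside json().
    for attempt in range(1, retries + 1):
        r = requests.get(uri, params=params)
        try:
            r.raise_for_status()
            return r.json()
        except (requests.HTTPError, ValueError) as exc:
            # JSONDecodeError is a subclass of ValueError.
            print(f'Attempt {attempt} failed for {uri}: {exc}')
            time.sleep(backoff)
    raise RuntimeError(f'Could not fetch valid JSON from {uri}')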

LandCoverNet dataset is not getting downloaded

Hi, I'm trying to download the LandCoverNet dataset with the code provided in the tutorials, but whenever I run the get_items() function in the notebook with the argument max_items_downloaded greater than or equal to 500, I get the following error: JSONDecodeError: Expecting value: line 1 column 1 (char 0). I checked the API documentation, and even if I run the following code:

r = requests.get(f'{API_BASE}/collections/{COLLECTION_ID}/items', headers=headers, 
                 params={"limit":1000})
collection = r.json()
source_items = []
for feature in collection.get('features', []):
    # Check if the item has one of the label classes we're interested in
    labels = feature.get('assets').get('labels')
    links = feature.get('links')
    for link in links:
        if link['rel'] != 'source':
            continue
        source_items.append(link['href'])

I get the same error. This is the entire error log:

JSONDecodeError                           Traceback (most recent call last)
<ipython-input-37-5efc7441cdc0> in <module>
      1 r = requests.get(f'{API_BASE}/collections/{COLLECTION_ID}/items', headers=headers, 
      2                  params={"limit":1000})
----> 3 collection = r.json()
      4 source_items = []
      5 for feature in collection.get('features', []):

~/anaconda3/envs/my_env/lib/python3.7/site-packages/requests/models.py in json(self, **kwargs)
    896                     # used.
    897                     pass
--> 898         return complexjson.loads(self.text, **kwargs)
    899 
    900     @property

~/anaconda3/envs/my_env/lib/python3.7/json/__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    346             parse_int is None and parse_float is None and
    347             parse_constant is None and object_pairs_hook is None and not kw):
--> 348         return _default_decoder.decode(s)
    349     if cls is None:
    350         cls = JSONDecoder

~/anaconda3/envs/my_env/lib/python3.7/json/decoder.py in decode(self, s, _w)
    335 
    336         """
--> 337         obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    338         end = _w(s, end).end()
    339         if end != len(s):

~/anaconda3/envs/my_env/lib/python3.7/json/decoder.py in raw_decode(self, s, idx)
    353             obj, end = self.scan_once(s, idx)
    354         except StopIteration as err:
--> 355             raise JSONDecodeError("Expecting value", s, err.value) from None
    356         return obj, end

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Can someone please look into it and check what's going on? The code runs just fine if I run the notebook with limit less than or equal to 400.

Ran last chunk of radiant-mlhub-api-know-how.ipynb but get error 403: Forbidden

I am going through the how-to Jupyter notebook. I created a general secret in AWS and tried to run the last chunk of code. I have removed my access key and secret key below. However, I get a 403: Forbidden error when trying to access the Sentinel image data. Can only Radiant Earth access these images?

import boto3
from urllib.parse import urlparse

AWS_ACCESS_KEY_ID = ''
AWS_SECRET_KEY = ''

def download_s3_file(url, access_key, secret_key):
    parsed_url = urlparse(url)

    bucket = parsed_url.hostname.split('.')[0]
    path = parsed_url.path[1:]
    filename = path.split('/')[-1]

    s3 = boto3.client(
        's3',
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key
    )

    s3.download_file(bucket, path, filename, ExtraArgs={'RequestPayer': 'requester'})
    print(f'Downloaded s3://{bucket}/{path}')

Error below:
---------------------------------------------------------------------------
ClientError Traceback (most recent call last)
in ()
1 true_color_asset_url = get_download_url(selected_item, '2019_07_31_tci', headers)
----> 2 download_s3_file(true_color_asset_url, AWS_ACCESS_KEY_ID, AWS_SECRET_KEY)

in download_s3_file(url, access_key, secret_key)
16 )
17
---> 18 s3.download_file(bucket, path, filename, ExtraArgs={'RequestPayer': 'requester'})
19 print(f'Downloaded s3://{bucket}/{path}')

~\Anaconda2\envs\py36\lib\site-packages\boto3\s3\inject.py in download_file(self, Bucket, Key, Filename, ExtraArgs, Callback, Config)
170 return transfer.download_file(
171 bucket=Bucket, key=Key, filename=Filename,
--> 172 extra_args=ExtraArgs, callback=Callback)
173
174

~\Anaconda2\envs\py36\lib\site-packages\boto3\s3\transfer.py in download_file(self, bucket, key, filename, extra_args, callback)
305 bucket, key, filename, extra_args, subscribers)
306 try:
--> 307 future.result()
308 # This is for backwards compatibility where when retries are
309 # exceeded we need to throw the same error from boto3 instead of

~\Anaconda2\envs\py36\lib\site-packages\s3transfer\futures.py in result(self)
104 # however if a KeyboardInterrupt is raised we want want to exit
105 # out of this and propogate the exception.
--> 106 return self._coordinator.result()
107 except KeyboardInterrupt as e:
108 self.cancel()

~\Anaconda2\envs\py36\lib\site-packages\s3transfer\futures.py in result(self)
263 # final result.
264 if self._exception:
--> 265 raise self._exception
266 return self._result
267

~\Anaconda2\envs\py36\lib\site-packages\s3transfer\tasks.py in _main(self, transfer_future, **kwargs)
253 # Call the submit method to start submitting tasks to execute the
254 # transfer.
--> 255 self._submit(transfer_future=transfer_future, **kwargs)
256 except BaseException as e:
257 # If there was an exception raised during the submission of task

~\Anaconda2\envs\py36\lib\site-packages\s3transfer\download.py in _submit(self, client, config, osutil, request_executor, io_executor, transfer_future, bandwidth_limiter)
341 Bucket=transfer_future.meta.call_args.bucket,
342 Key=transfer_future.meta.call_args.key,
--> 343 **transfer_future.meta.call_args.extra_args
344 )
345 transfer_future.meta.provide_transfer_size(

~\Anaconda2\envs\py36\lib\site-packages\botocore\client.py in _api_call(self, *args, **kwargs)
274 "%s() only accepts keyword arguments." % py_operation_name)
275 # The "self" in this scope is referring to the BaseClient.
--> 276 return self._make_api_call(operation_name, kwargs)
277
278 _api_call.name = str(py_operation_name)

~\Anaconda2\envs\py36\lib\site-packages\botocore\client.py in _make_api_call(self, operation_name, api_params)
584 error_code = parsed_response.get("Error", {}).get("Code")
585 error_class = self.exceptions.from_code(error_code)
--> 586 raise error_class(parsed_response, operation_name)
587 else:
588 return parsed_response

ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden

Error while downloading the LandCoverNet dataset

Hello, I know that an issue was already raised with this error but it has not been resolved, I have been trying during several days and the result is the same. I copied and run the code that is indicated in the tutorial to download the full dataset:

to_download = get_items(f'{API_BASE}/collections/{COLLECTION_ID}/items?limit=100', downloads=[])
print('Downloading Assets')
for d in tqdm(to_download):
  p.map(download, d)

Error:

<ipython-input-13-51a983007398> in <module>()
      1 p = ThreadPool(20)
----> 2 to_download = get_items(f'{API_BASE}/collections/{COLLECTION_ID}/items?limit=100', downloads=[])
      3 print('Downloading Assets')
      4 for d in tqdm(to_download):
      5   p.map(download, d)

18 frames
<ipython-input-9-0c7f41f99dcf> in get_items(uri, classes, max_items_downloaded, items_downloaded, downloads)
    109     for link in collection.get('links', []):
    110         if link['rel'] == 'next' and link['href'] is not None:
--> 111             get_items(link['href'], classes=classes, max_items_downloaded=max_items_downloaded, items_downloaded=items_downloaded, downloads=downloads)
    112 
    113     return downloads

<ipython-input-9-0c7f41f99dcf> in get_items(uri, classes, max_items_downloaded, items_downloaded, downloads)
    109     for link in collection.get('links', []):
    110         if link['rel'] == 'next' and link['href'] is not None:
--> 111             get_items(link['href'], classes=classes, max_items_downloaded=max_items_downloaded, items_downloaded=items_downloaded, downloads=downloads)
    112 
    113     return downloads

<ipython-input-9-0c7f41f99dcf> in get_items(uri, classes, max_items_downloaded, items_downloaded, downloads)
    109     for link in collection.get('links', []):
    110         if link['rel'] == 'next' and link['href'] is not None:
--> 111             get_items(link['href'], classes=classes, max_items_downloaded=max_items_downloaded, items_downloaded=items_downloaded, downloads=downloads)
    112 
    113     return downloads

<ipython-input-9-0c7f41f99dcf> in get_items(uri, classes, max_items_downloaded, items_downloaded, downloads)
    109     for link in collection.get('links', []):
    110         if link['rel'] == 'next' and link['href'] is not None:
--> 111             get_items(link['href'], classes=classes, max_items_downloaded=max_items_downloaded, items_downloaded=items_downloaded, downloads=downloads)
    112 
    113     return downloads

<ipython-input-9-0c7f41f99dcf> in get_items(uri, classes, max_items_downloaded, items_downloaded, downloads)
    109     for link in collection.get('links', []):
    110         if link['rel'] == 'next' and link['href'] is not None:
--> 111             get_items(link['href'], classes=classes, max_items_downloaded=max_items_downloaded, items_downloaded=items_downloaded, downloads=downloads)
    112 
    113     return downloads

<ipython-input-9-0c7f41f99dcf> in get_items(uri, classes, max_items_downloaded, items_downloaded, downloads)
    109     for link in collection.get('links', []):
    110         if link['rel'] == 'next' and link['href'] is not None:
--> 111             get_items(link['href'], classes=classes, max_items_downloaded=max_items_downloaded, items_downloaded=items_downloaded, downloads=downloads)
    112 
    113     return downloads

<ipython-input-9-0c7f41f99dcf> in get_items(uri, classes, max_items_downloaded, items_downloaded, downloads)
    109     for link in collection.get('links', []):
    110         if link['rel'] == 'next' and link['href'] is not None:
--> 111             get_items(link['href'], classes=classes, max_items_downloaded=max_items_downloaded, items_downloaded=items_downloaded, downloads=downloads)
    112 
    113     return downloads

<ipython-input-9-0c7f41f99dcf> in get_items(uri, classes, max_items_downloaded, items_downloaded, downloads)
    109     for link in collection.get('links', []):
    110         if link['rel'] == 'next' and link['href'] is not None:
--> 111             get_items(link['href'], classes=classes, max_items_downloaded=max_items_downloaded, items_downloaded=items_downloaded, downloads=downloads)
    112 
    113     return downloads

<ipython-input-9-0c7f41f99dcf> in get_items(uri, classes, max_items_downloaded, items_downloaded, downloads)
     99         print('Getting Source Imagery Assets for', feature['id'])
    100         # Download the label and source imagery for the item
--> 101         downloads.extend(download_source_and_labels(feature))
    102 
    103         # Stop downloaded items if we reached the maximum we specify

<ipython-input-9-0c7f41f99dcf> in download_source_and_labels(item)
     74         source_items.append((path, link['href']))
     75 
---> 76     results = p.map(get_source_item_assets, source_items)
     77     results.append([(labels['href'], path)])
     78 

/usr/lib/python3.6/multiprocessing/pool.py in map(self, func, iterable, chunksize)
    264         in a list that is returned.
    265         '''
--> 266         return self._map_async(func, iterable, mapstar, chunksize).get()
    267 
    268     def starmap(self, func, iterable, chunksize=None):

/usr/lib/python3.6/multiprocessing/pool.py in get(self, timeout)
    642             return self._value
    643         else:
--> 644             raise self._value
    645 
    646     def _set(self, i, obj):

/usr/lib/python3.6/multiprocessing/pool.py in worker(inqueue, outqueue, initializer, initargs, maxtasks, wrap_exception)
    117         job, i, func, args, kwds = task
    118         try:
--> 119             result = (True, func(*args, **kwds))
    120         except Exception as e:
    121             if wrap_exception and func is not _helper_reraises_exception:

/usr/lib/python3.6/multiprocessing/pool.py in mapstar(args)
     42 
     43 def mapstar(args):
---> 44     return list(map(*args))
     45 
     46 def starmapstar(args):

<ipython-input-9-0c7f41f99dcf> in get_source_item_assets(args)
     47         print('ERROR: Could Not Load', href)
     48         return []
---> 49     dt = arrow.get(r.json()['properties']['datetime']).format('YYYY_MM_DD')
     50     asset_path = os.path.join(path, dt)
     51     if not os.path.exists(asset_path):

/usr/local/lib/python3.6/dist-packages/requests/models.py in json(self, **kwargs)
    896                     # used.
    897                     pass
--> 898         return complexjson.loads(self.text, **kwargs)
    899 
    900     @property

/usr/lib/python3.6/json/__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    352             parse_int is None and parse_float is None and
    353             parse_constant is None and object_pairs_hook is None and not kw):
--> 354         return _default_decoder.decode(s)
    355     if cls is None:
    356         cls = JSONDecoder

/usr/lib/python3.6/json/decoder.py in decode(self, s, _w)
    337 
    338         """
--> 339         obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    340         end = _w(s, end).end()
    341         if end != len(s):

/usr/lib/python3.6/json/decoder.py in raw_decode(self, s, idx)
    355             obj, end = self.scan_once(s, idx)
    356         except StopIteration as err:
--> 357             raise JSONDecodeError("Expecting value", s, err.value) from None
    358         return obj, end

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Metadata error with BigEarthNet

Hi!

I was just following along with the tutorial to download images and metadata from the BigEarthNet dataset. Everything went smoothly until I checked the metadata; for all the tiles I downloaded it looks like this:

{"id": "", "type": "Feature", 
"properties": {"labels": ["Coniferous forest", "Mixed forest", "Transitional woodland/shrub"],
"datetime": "2017-06-13T10:10:32+0000", "seasonal_snow": false, "cloud_and_shadow": false},
"geometry": {"type": "Polygon", 
"coordinates": [[[23.05164842666464, 63.204647552556814], [23.075490783348744, 63.2043013847583], 
[71.4485051643846, 3.470929676796522], [23.050886335283472, 63.19388441898293], 
[23.05164842666464, 63.204647552556814]]]}}

The docs for BigEarthNet say that the coordinates are given for the upper-left and lower-right corners of the tile, but here I have a polygon with 5 coordinate pairs, and moreover, the lower-right corner is displaced by a huge amount, as you can see in the attached picture, compared to one of the raster bands. For reference, the tile is around Finland, and the stray lower-right corner is in the middle of the ocean, near India.
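A quick way to see the stray vertex without opening QGIS (a sketch using only the metadata above; the file name is hypothetical) is to load the GeoJSON and print the polygon's coordinate ranges, which makes the outlier longitude near 71° obvious against the rest of the tile near 23°:

import json

# Hypothetical file containing the Feature shown above.
with open('tile_labels_metadata.json') as f:
    feature = json.load(f)

coords = feature['geometry']['coordinates'][0]
lons = [pt[0] for pt in coords]
lats = [pt[1] for pt in coords]
print('Longitude range:', min(lons), '-', max(lons))
print('Latitude range:', min(lats), '-', max(lats))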

Any idea of why this is happening?

Thanks,

K.

NoCredentialsError: Unable to locate credentials

Following your excellent API notebook. The last cell gives me the following error

NoCredentialsError: Unable to locate credentials

Same error with different (valid) arguments

Is there something obvious I am missing?

Python 3.7.6 on Ubuntu 20.04
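NoCredentialsError means boto3 could not find AWS credentials in any of its standard locations (environment variables, ~/.aws/credentials, etc.). A minimal check (a sketch, not specific to the notebook) is to set the standard environment variables, which boto3 picks up automatically, and confirm that a session resolves them:

import os
import boto3

# boto3 reads these standard variables when no credentials are passed
# explicitly; replace the placeholders, or run `aws configure` instead.
os.environ['AWS_ACCESS_KEY_ID'] = '<your access key>'
os.environ['AWS_SECRET_ACCESS_KEY'] = '<your secret key>'

# Should print True once credentials are found.
print(boto3.Session().get_credentials() is not None)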

Thanks

LandCoverNet NA - ValidationError

Hello, I am having some issues when I try to download the LandCoverNet North America dataset. I have been following along with the LandCoverNet tutorial but keep hitting the same issue.

Environment:
Ubuntu 18.04.6
Python 3.8.0
mlhub 0.5.2

My Code:

import os
from radiant_mlhub import Dataset

os.environ['MLHUB_API_KEY'] = 'apikey'
dataset = Dataset.fetch('ref_landcovernet_na_v1')

print(f'Title: {dataset.title}')
print(f'DOI: {dataset.doi}')
print(f'Citation: {dataset.citation}')
print('\nCollection IDs and License:')
for collection in dataset.collections:
    print(f'    {collection.id} - {collection.license}')
dataset.download()

Output:

ref_landcovernet_na_v1: fetch stac catalog: 89932KB [00:13, 6556.94KB/s]
unarchive ref_landcovernet_na_v1.tar.gz: 100%|β–ˆ| 562974/562974 [00:50<00:00, 112

---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
Cell In [3], line 1
----> 1 dataset.download()

File ~/.local/lib/python3.8/site-packages/radiant_mlhub/models/dataset.py:361, in Dataset.download(self, output_dir, catalog_only, if_exists, api_key, profile, bbox, intersects, datetime, collection_filter)
    347 config = CatalogDownloaderConfig(
    348     catalog_only=catalog_only,
    349     api_key=api_key,
   (...)
    358     temporal_query=datetime,
    359 )
    360 dl = CatalogDownloader(config=config)
--> 361 dl()

File ~/.local/lib/python3.8/site-packages/radiant_mlhub/client/catalog_downloader.py:740, in CatalogDownloader.__call__(self)
    738 # call each step
    739 for step in steps:
--> 740     step()
    742 # inspect the error report
    743 self.err_report.flush()

File ~/.local/lib/python3.8/site-packages/radiant_mlhub/client/catalog_downloader.py:282, in CatalogDownloader._create_asset_list_step(self)
    280             _handle_collection(stac_item)
    281         else:
--> 282             _handle_item(stac_item)
    283 log.info(f'{self._fetch_unfiltered_count()} unique assets in stac catalog.')

File ~/.local/lib/python3.8/site-packages/radiant_mlhub/client/catalog_downloader.py:233, in CatalogDownloader._create_asset_list_step.<locals>._handle_item(stac_item)
    231 n = 0
    232 for k, v in assets.items():
--> 233     rec = AssetRecord(
    234         collection_id=stac_item['collection'],
    235         item_id=item_id,
    236         asset_key=k,
    237         common_asset=k in COMMON_ASSET_NAMES,
    238         asset_url=v['href'],
    239         bbox_json=json.dumps(bbox) if bbox else None,
    240         geometry_json=json.dumps(geometry) if geometry else None,
    241         single_datetime=props.get('datetime', None),
    242         start_datetime=common_meta.get('start_datetime', None),
    243         end_datetime=common_meta.get('end_datetime', None),
    244     )
    245     asset_save_path = _asset_save_path(rec).relative_to(self.work_dir)
    246     rec.asset_save_path = str(asset_save_path)

File ~/.local/lib/python3.8/site-packages/pydantic/main.py:341, in pydantic.main.BaseModel.__init__()

ValidationError: 1 validation error for AssetRecord
single_datetime
  invalid type; expected datetime, string, bytes, int or float (type=type_error)
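The error means at least one STAC item in the catalog has a datetime property that is not a string (STAC allows a null datetime when start_datetime/end_datetime are given, which the downloader's AssetRecord model appears to reject here). One way to locate the offending items (a sketch; the directory name is assumed to match the unarchived ref_landcovernet_na_v1 catalog shown above) is to scan the item JSON files:

import json
from pathlib import Path

# Assumed path of the STAC catalog unarchived by dataset.download() above.
catalog_dir = Path('ref_landcovernet_na_v1')

# List items whose 'datetime' property is missing or null, which is what the
# pydantic ValidationError is complaining about.
for item_file in catalog_dir.rglob('*.json'):
    try:
        doc = json.loads(item_file.read_text())
    except ValueError:
        continue
    props = doc.get('properties')
    if isinstance(props, dict) and props.get('datetime') is None:
        print('Item without a datetime:', item_file)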

Any help would be appreciated, thank you.

Missing Labels for South Africa Crop Type Competition

Hi,

I downloaded the South Africa Crop Type Competition dataset for research purposes, but I can't find the corresponding ground truth information in the TEST set.

Specifically, there are two TIF files in the TRAIN set for each parcel: field_ids.tif and labels.tif, but the corresponding labels.tif is missing for each parcel in the TEST set. I know the dataset was uploaded to MLHub before the end date of the two related competitions, spot-the-crop-xl-challenge and spot-the-crop-challenge, when the ground truth for the TEST set should not have been revealed.

Now both related competitions have been over for almost a year. I wonder if the corresponding ground truth could be made available on MLHub.

A reply would be appreciated, thanks.

LandCoverNet - Downloading Europe

HelloπŸ‘‹

I'm trying to download data from this dataset - https://mlhub.earth/data/ref_landcovernet_eu_v1

I followed the tutorial Accessing LandCoverNet through the Radiant MLHub API; however, I ran into some problems with access.

Here are my code and errors:

import os
from radiant_mlhub import Dataset

os.environ['MLHUB_API_KEY'] = '<my own api key>'

dataset = Dataset.fetch('ref_landcovernet_eu_v1')

print(f'Title: {dataset.title}')
print(f'DOI: {dataset.doi}')
print(f'Citation: {dataset.citation}')
print('\nCollection IDs and License:')
for collection in dataset.collections:
    print(f'    {collection.id} - {collection.license}')


os.makedirs('./dataset_eu')

dataset.download(output_dir='./dataset_eu')

Error message

---------------------------------------------------------------------------

EntityDoesNotExist                        Traceback (most recent call last)

<ipython-input-22-0209bdb52a27> in <module>()
----> 1 dataset.download(output_dir='./dataset_eu')

3 frames

/usr/local/lib/python3.7/dist-packages/radiant_mlhub/client/datasets.py in download_archive(archive_id, output_dir, if_exists, api_key, profile)
    369         if e.response.status_code == 404:
    370             raise EntityDoesNotExist(
--> 371                 f'Archive "{archive_id}" does not exist and may still be generating. Please try again later.') from None
    372         raise MLHubException(f'An unknown error occurred: {e.response.status_code} ({e.response.reason})')

EntityDoesNotExist: Archive "ref_landcovernet_eu_v1_source_sentinel_2" does not exist and may still be generating. Please try again later.

P.S. With the African dataset everything is OK; maybe some server-side problems?
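Since the error message itself says the archive may still be generating, one stopgap (a sketch, reusing the dataset object from the code above) is to retry the download at intervals and give up after a few attempts:

import time

# Retry dataset.download() a few times, because the API reports that the
# archive "does not exist and may still be generating".
for attempt in range(6):
    try:
        dataset.download(output_dir='./dataset_eu')
        break
    except Exception as exc:  # EntityDoesNotExist is raised by radiant_mlhub
        print(f'Attempt {attempt + 1} failed: {exc}; retrying in 10 minutes')
        time.sleep(600)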

Thank you! πŸ’œ

Mikhail Gasanov

MlModel not working as expected

Hey there! Hoping someone can help me troubleshoot. I've got an API key and can successfully replicate the steps to view datasets, but when I try to see the ML models I keep getting an error:

from radiant_mlhub import MLModel
models = MLModel.list()
first_model = models[0]

TypeError: __init__() got an unexpected keyword argument 'assets'

However I can successfully do this:

models = list_models()
first_model = models[0]

and this:

from radiant_mlhub.client import get_model_by_id
model = get_model_by_id('model_ramp_baseline_v1')
model.keys()

But not this:
from radiant_mlhub import MLModel
model = MLModel.fetch('model_ramp_baseline_v1')

TypeError Traceback (most recent call last)
/home/lauren/Projects/notebooks/download_weights.ipynb Cell 9 in <cell line: 2>()
1 from radiant_mlhub import MLModel
----> 2 model = MLModel.fetch(ramp_model)

File ~/miniconda3/envs/deep-learning/lib/python3.9/site-packages/radiant_mlhub/models/ml_model.py:74, in MLModel.fetch(cls, model_id, api_key, profile)
57 """Fetches a :class:MLModel instance by id.
58
59 Parameters
(...)
71 model : MLModel
72 """
73 d = client.get_model_by_id(model_id, api_key=api_key, profile=profile)
---> 74 return cls.from_dict(d, api_key=api_key, profile=profile)

File ~/miniconda3/envs/deep-learning/lib/python3.9/site-packages/radiant_mlhub/models/ml_model.py:113, in MLModel.from_dict(cls, d, href, root, migrate, preserve_dict, api_key, profile)
99 @classmethod
100 def from_dict(
101 cls,
(...)
109 profile: Optional[str] = None
110 ) -> MLModel:
111 """Patches the :meth:pystac.Item.from_dict method so that it returns the calling
112 class instead of always returning a :class:pystac.Item instance."""
...
477 )
479 has_self_link = False
480 for link in links:

TypeError: __init__() got an unexpected keyword argument 'assets'
