apecs-org / polar-eo-database

Polar Earth Observation Database of satellite sensors

License: GNU General Public License v3.0

Python 100.00%

remote-sensing earth-observation satellite data-catalog antarctica arctic high-mountain-asia open-science

polar-eo-database's People

adrienwehrle wdharcourt1 azamattf lgtm-migrator

polar-eo-database's Issues

Consistency in New data entry template

Issue

The template has a pre-defined list of options for multiple-choice questions. When someone wants to give an answer that is not in the list, they select Other and write in the desired answer. However, as "Other" answers accumulate, they become inconsistent: one person writes CryoSAT, another writes Cryo-SAT, and so on.

Solution

To keep the answers consistent, can we automate this process? The automation would take each new Other-type answer and add it to the pre-defined list of options, so that the next person submitting the form sees the updated list.
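One way the merging step could work is sketched below. It compares answers on a reduced key (lowercase, letters and digits only), so that Cryo-SAT is recognised as the existing CryoSAT instead of being added again. The function names are hypothetical and the matching rule is an assumption, not something the project has decided on.

```python
import re


def canonical_key(name: str) -> str:
    """Reduce a free-text answer to a comparison key: lowercase letters and digits only."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def merge_other_answer(options: list[str], answer: str) -> tuple[list[str], str]:
    """Return the (possibly extended) option list and the spelling to store."""
    key = canonical_key(answer)
    for option in options:
        if canonical_key(option) == key:
            # Same sensor, different spelling: reuse the existing option.
            return options, option
    cleaned = answer.strip()
    # Genuinely new answer: append it so the next contributor sees it.
    return options + [cleaned], cleaned


options = ["Sentinel-1", "CryoSAT"]
options, stored = merge_other_answer(options, "Cryo-SAT")
print(stored, options)
```

Names that differ by more than punctuation, spacing or case (for example a misspelling) would still slip through, so a maintainer review of new options would remain useful.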

Document how to contribute new database entry via YAML issue template

Contributing a new satellite-sensor database entry just got easier with #33. Now let's document it in CONTRIBUTING.md!

Steps:

Go to https://github.com/APECS-Earth-Observation/Polar-EO-Database/issues/new/choose and select 'New Database Entry'
Provide instructions on filling out the form:

Mention what comes afterwards

edits can be made to the submitted form/issue
the feedback/review process from repository maintainers
etc

TODO after full automation is completed in #35. This PR extends #9.

All contributors should be included, not only code contributors!

Everyone in the APECS remote sensing group, and even beyond, is contributing to this project. Let's include them in the contributors list, following the all-contributors specification!

Fields need to be suggested

We currently test for the validity of fields in the JSON files and warn the contributor when standards are not met (which is great and needed). However, for the moment we don't propose those options to the contributor at the PR stage. Let's work on that and make it clearer!

One solution is to have comments in a template JSON file with the different options for each key so it could be used as either a reference for the update of existing files or a starting point for a new object!
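As a complement to a commented template, the validity check itself could propose the options. The sketch below is only an illustration: the `ALLOWED` table and its keys are hypothetical placeholders, and the real lists would come from whatever the project's tests already use. It relies on `difflib.get_close_matches` from the standard library to suggest the nearest accepted value.

```python
import difflib

# Hypothetical allowed values, for illustration only.
ALLOWED = {
    "sensor_type": ["Optical", "SAR", "Altimeter"],
    "open_access": ["Yes", "No"],
}


def check_entry(entry: dict) -> list[str]:
    """Return one message per invalid value, listing the accepted options."""
    messages = []
    for key, value in entry.items():
        if key not in ALLOWED or value in ALLOWED[key]:
            continue  # unknown keys are out of scope here; valid values pass
        close = difflib.get_close_matches(value, ALLOWED[key], n=1, cutoff=0.6)
        hint = f" Did you mean {close[0]!r}?" if close else ""
        messages.append(f"{key}: {value!r} is not one of {ALLOWED[key]}.{hint}")
    return messages


for message in check_entry({"sensor_type": "Optcal", "open_access": "Yes"}):
    print(message)
```

Printing the full list of options in the message means the contributor does not have to look up the template to fix a failing PR.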

Use YAML instead of JSON for database entries

JSON is nice, simple and low level, and for that latter reason it is data-only: comments of the form //… or /*…*/ are not allowed. That is a problem for us because we want the community to understand the entries and potentially contribute!
YAML is also nice and simple, but even more human-readable than JSON. Comments are allowed, and there are no nested brackets that can be scary for potential contributors who are not used to them. It is used a lot for metadata, an example here.

For those reasons, I think we should move to YAML. This sounds like a big thing, but is simply about saving data in another format (I can implement the change if we take that decision).
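To give an idea of how small the change is, here is a rough sketch that rewrites a JSON entry as YAML text with a comment above selected keys. It assumes simple, unquoted-safe key names and writes values in JSON notation, which is also accepted by YAML (double-quoted strings and bracketed lists). The field names and the `to_yaml` helper are made up for illustration; a real migration would more likely use a YAML library.

```python
import json


def to_yaml(entry: dict, comments: dict) -> str:
    """Write each top-level key on its own line, preceded by an optional comment."""
    lines = []
    for key, value in entry.items():
        if key in comments:
            lines.append(f"# {comments[key]}")
        # JSON-style scalars and lists are valid YAML flow notation.
        lines.append(f"{key}: {json.dumps(value)}")
    return "\n".join(lines) + "\n"


# Hypothetical entry; the real files may use different keys.
entry = json.loads('{"sensor_name": "Sentinel-1", "regions": ["Arctic", "Antarctic"]}')
print(to_yaml(entry, {"regions": "Regions covered by the dataset"}))
```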

Also, I stepped back from #30 and closed it, as we first need to take that decision, then we can work again on the template in the format we'll choose! 😃

[New]: Test template

Sensor name

Sentinel-10

Sensor type

Optical

Dataset level

L1 (raw)

Open Access

Yes

Data access platform

https:/thisisanexample.com

Regions covered

No response

Processing software

No response

Scientific application

No response

Parameters sought

No response

Other information

No response

Code of Conduct

I agree to follow this project's Code of Conduct

Researcher-dependent VS researcher-independent information

Necessity

I think we should divide information we receive from the community into two:

User-dependent information: Information about how researchers use the data

Scientific field
Physical variables derived
Software used
Region/object of study
etc

User-independent information: All data related to tech specs of the data set/sensor.

Sensor name and type
Satellite active period
Temporal and regional coverage
Temporal and regional resolution
Data access platform
Data accessibility (open access/commercial)
etc.

Why is it important?

As we collect info about tech specs of the sensor and validate them, they are not subject to change, unless somebody spots a typo or a mistake. On the other hand, scientific application is user- or researcher-specific and one data set can have various applications.
Therefore, we can work with 2 types of files:

[data_set_name]_techspecs - for one sensor we would have only one file (like we have currently)
[data_set_name]_application_001, [data_set_name]_application_002 - for one sensor we would have multiple files for applications.

This separation would also help us:

aggregate the data on applications, such as CryoSAT: used in Glaciology (24 researchers), used to derive Ice Velocity (20 users), used in the study of the Arctic (62 users) and Antarctic (50 users).
collect information from the community more efficiently using two templates, one about tech specs and one for scientific applications.

In the end, the code would compile a third type of file for each dataset/sensor with all the information: one file per sensor, probably called [data_set_name]_index. All these index files can then be sent to Google Sheets.
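The compilation step could look roughly like the sketch below, which merges one tech-specs record with any number of application records and counts how often each field value occurs. The key names (`scientific_field`, `derived_variable`, `region`) are assumptions chosen to mirror the lists above, not the project's actual schema.

```python
from collections import Counter

# Hypothetical application fields to aggregate.
APPLICATION_FIELDS = ("scientific_field", "derived_variable", "region")


def build_index(techspecs: dict, applications: list[dict]) -> dict:
    """Combine one tech-specs record with per-researcher application records."""
    index = dict(techspecs)  # copy so the input is not modified
    for field in APPLICATION_FIELDS:
        counts = Counter(app[field] for app in applications if field in app)
        index[f"{field}_counts"] = dict(counts)
    index["n_applications"] = len(applications)
    return index


techspecs = {"sensor_name": "CryoSAT"}
applications = [
    {"scientific_field": "Glaciology", "derived_variable": "Ice Velocity", "region": "Arctic"},
    {"scientific_field": "Glaciology", "region": "Antarctic"},
]
print(build_index(techspecs, applications))
```

Because the tech-specs file is only read, it stays the single validated source, while application files can keep accumulating without touching it.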

What do you guys think?

[New]: Sentinel-42

Sensor name

Sentinel-42

Sensor type

Gravity-meter

Dataset level

L1 (raw)

Open Access

No response

Data access platform

example.com

Regions covered

No response

Processing software

No response

Scientific application

No response

Parameters sought

No response

Other information

No response

Code of Conduct

I agree to follow this project's Code of Conduct

Add all contributors

https://allcontributors.org/docs/en/emoji-key

Align fields with STAC specification?

Just got some great ideas after attending this Cloud-Native Geospatial Event (https://schedule.cloudnativegeo.org), where I learned a lot about Spatial Temporal Asset Catalogs (STAC) and OGC API standards! The people working on those standards have literally spent years thinking about metadata, and these are some relevant extensions that popped up from scanning through https://github.com/stac-extensions/stac-extensions.github.io:

Satellite - https://github.com/stac-extensions/sat
Synthetic Aperture Radar (SAR) - https://github.com/stac-extensions/sar
Hyperspectral imagery - https://github.com/stac-extensions/hsi

It seems like there are some required fields/attributes which we could definitely ensure are present in our database. Using Sentinel-1 (a SAR satellite) as an example, frequency_band is required, and there's a list of Common Frequency Band Names to select from. See their example sentinel-1.json vs our current Sentinel-1.json
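A check for that one field could be as small as the sketch below. The band list reflects my understanding of the common names in the STAC SAR extension and should be verified against the extension itself; the default key `sar:frequency_band` is the prefixed form used in STAC items, and our database may well choose a different key, which is why it is a parameter.

```python
# Band names as I understand the STAC SAR extension to list them (assumption: verify upstream).
COMMON_FREQUENCY_BANDS = {"P", "L", "S", "C", "X", "Ku", "K", "Ka"}


def check_frequency_band(entry: dict, key: str = "sar:frequency_band") -> list[str]:
    """Return a list of problems with the frequency band field of a SAR entry."""
    if key not in entry:
        return [f"{key} is required for SAR sensors"]
    if entry[key] not in COMMON_FREQUENCY_BANDS:
        allowed = ", ".join(sorted(COMMON_FREQUENCY_BANDS))
        return [f"{key}: {entry[key]!r} is not a common band name ({allowed})"]
    return []


# Sentinel-1 carries a C-band SAR instrument.
print(check_frequency_band({"sensor_name": "Sentinel-1", "sar:frequency_band": "C"}))
print(check_frequency_band({"sensor_name": "Sentinel-1"}))
```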

Bot parsing entries from issue template

Now that @weiji14 implemented the issue template for a new database entry, let's get a bot to create the associated YAML file in the database!

Useful resources

https://stackoverflow.com/questions/58597010/how-to-access-a-github-issue-comment-body-using-github-actions
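For the parsing part, a submitted issue form arrives as a text body in which each field label is a `### ` heading followed by the answer, with `_No response_` for fields left empty. A minimal sketch of turning that body into a dictionary is shown below; the function name is hypothetical, and how the body is obtained inside a workflow is left out.

```python
import re


def parse_issue_form(body: str) -> dict:
    """Map each '### Label' heading in an issue-form body to its answer text."""
    fields = {}
    # Split on level-3 headings; the chunk before the first heading is dropped.
    for chunk in re.split(r"^### ", body, flags=re.MULTILINE)[1:]:
        label, _, value = chunk.partition("\n")
        value = value.strip()
        # Empty form fields are rendered as "_No response_".
        fields[label.strip()] = None if value == "_No response_" else value
    return fields


body = (
    "### Sensor name\n\nSentinel-10\n\n"
    "### Sensor type\n\nOptical\n\n"
    "### Regions covered\n\n_No response_\n"
)
print(parse_issue_form(body))
```

The resulting dictionary could then be written out as the database file by a later step of the bot.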

Document how to make a contribution to this repository

For anyone wanting to make a change to the database, we should have step-by-step instructions on how to make additions/deletions/changes to the JSON files.

Make a screen recording of submitting a pull request (@AdrienWehrle)
Have a CONTRIBUTING.md text file documenting the step by step instructions (@weiji14) #14

TODO by next APECS Remote Sensing PG Meeting by April 11.

Check JSON fields in tests

At the moment, only a very basic file-opening check is implemented. We should define a set of possible fields and make sure that all the fields in the current JSON files, and in any proposed modifications, match this collection, so that JSON files with valid syntax but with strings that are not accepted for the database are caught.
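A possible shape for such a test helper is sketched below. The two key sets are placeholders invented for the example; the actual collection of fields still has to be agreed on.

```python
import json

# Hypothetical field collections, for illustration only.
REQUIRED_KEYS = {"sensor_name", "sensor_type"}
OPTIONAL_KEYS = {"regions", "open_access"}


def check_keys(text: str) -> list[str]:
    """Parse a JSON entry and report missing required keys and unknown keys."""
    entry = json.loads(text)  # still fails loudly on invalid JSON syntax
    keys = set(entry)
    problems = [f"missing required field: {k}" for k in sorted(REQUIRED_KEYS - keys)]
    problems += [
        f"unknown field: {k}" for k in sorted(keys - REQUIRED_KEYS - OPTIONAL_KEYS)
    ]
    return problems


# Valid JSON syntax, but one key is misspelled.
print(check_keys('{"sensor_name": "Sentinel-1", "sensor_typ": "SAR"}'))
```

In a test suite, the helper would be run over every file in the database and the test would fail whenever the returned list is not empty.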
