
apecs-org / polar-eo-database


Polar Earth Observation Database of satellite sensors

License: GNU General Public License v3.0

Python 100.00%
remote-sensing earth-observation satellite data-catalog antarctica arctic high-mountain-asia open-science

Contributors

adrienwehrle, allcontributors[bot], azamattf, weiji14


polar-eo-database's Issues

Consistency in New data entry template

Issue

The template has a pre-defined list of options for multiple-choice questions. When someone needs an answer that is not on the list, they select Other and type the desired answer. However, as these free-text "Other" answers accumulate, they become inconsistent: one person writes CryoSAT, another writes Cryo-SAT, and so on.

Solution

To keep the answers consistent, can we automate this process? The automation would take each new Other-type answer and add it to the pre-defined list of options, so that the next person submitting the form sees the updated list.
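One way such automation could work is sketched below. It compares a free-text answer with the existing options after stripping case and punctuation, reuses the existing spelling when they match, and only appends answers that are really new. The function names `canonical_key` and `merge_other_answer` are hypothetical, and writing the updated list back into the issue template is left out.

```python
import re


def canonical_key(name: str) -> str:
    """Collapse case, whitespace and punctuation so spelling variants compare equal."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def merge_other_answer(options: list[str], answer: str) -> tuple[list[str], str]:
    """Return the (possibly extended) option list and the spelling to store."""
    known = {canonical_key(opt): opt for opt in options}
    key = canonical_key(answer)
    if key in known:
        # A variant of an existing option: keep the list, reuse its spelling.
        return options, known[key]
    cleaned = answer.strip()
    return options + [cleaned], cleaned


# Illustrative option list only; not the project's actual list.
options = ["Sentinel-1", "CryoSAT"]
options, stored = merge_other_answer(options, "Cryo-SAT")
print(stored)
options, stored = merge_other_answer(options, "ICESat-2")
print(options)
```

Matching on a stripped-down key is deliberately crude: it would also merge names that differ only by punctuation but mean different things, so a maintainer review step is probably still needed.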

Document how to contribute new database entry via YAML issue template

Contributing a new satellite-sensor database entry just got easier with #33. Now let's document it in CONTRIBUTING.md!

Steps:

  1. Go to https://github.com/APECS-Earth-Observation/Polar-EO-Database/issues/new/choose and select 'New Database Entry'
  2. Provide instructions on filling in the form:

image

  3. Mention what comes afterwards
  • edits can be made to the submitted form/issue
  • the feedback/review process from repository maintainers
  • etc

TODO after full automation is completed in #35. This PR extends #9.

Fields need to be suggested

We currently test for the validity of fields in the JSON files and warn the contributor when standards are not met (which is great and needed). However, for the moment we don't propose those options to the contributor at the PR stage. Let's work on that and make it clearer!

  • One solution is to have comments in a template JSON file with the different options for each key so it could be used as either a reference for the update of existing files or a starting point for a new object!

Use YAML instead of JSON for database entries

  • JSON is nice, simple and low level, and for that latter reason it is data-only, so comments of the form //… or /*…*/ are not allowed. This is a problem for us because we want the community to understand the entries and potentially contribute!

  • YAML is also nice and simple, but even more human readable than JSON. Comments are allowed, and there are no embedded brackets that can be scary for potential contributors who are not used to them. It is used a lot for metadata, an example here.

For those reasons, I think we should move to YAML. This sounds like a big thing, but is simply about saving data in another format (I can implement the change if we take that decision).
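To give a feel for how small the change could be, here is a minimal sketch that rewrites one flat JSON entry as commented YAML using only the standard library. It relies on the fact that a JSON-encoded scalar or list is also a valid YAML value. The field names and the hints in `FIELD_HINTS` are made up for illustration, and a real migration would more likely use a dedicated YAML library.

```python
import json

# Hypothetical per-field hints; the real field names and options may differ.
FIELD_HINTS = {
    "sensor_name": "official name of the sensor",
    "sensor_type": "e.g. Optical",
}


def entry_to_yaml(entry: dict) -> str:
    """Render a flat entry as YAML, one key per line, with optional comments."""
    lines = []
    for key, value in entry.items():
        hint = FIELD_HINTS.get(key)
        if hint:
            lines.append(f"# {hint}")
        # json.dumps output (quoted string, number, true/false, [..]) is valid YAML.
        lines.append(f"{key}: {json.dumps(value)}")
    return "\n".join(lines) + "\n"


entry = json.loads('{"sensor_name": "Sentinel-1", "sensor_type": "SAR", "open_access": true}')
print(entry_to_yaml(entry))
```

The comments live in the output file, so contributors editing an entry see the hints right next to the field they are changing.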

Also, I stepped back from #30 and closed it, as we first need to take that decision, then we can work again on the template in the format we'll choose! 😃

[New]: Test template

Sensor name

Sentinel-10

Sensor type

Optical

Dataset level

L1 (raw)

Open Access

Yes

Data access platform

https:/thisisanexample.com

Regions covered

No response

Processing software

No response

Scientific application

No response

Parameters sought

No response

Other information

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct

Researcher-dependent VS researcher-independent information

Necessity

I think we should divide information we receive from the community into two:

  1. User-dependent information: Information about how researchers use the data
  • Scientific field
  • Physical variables derived
  • Software used
  • Region/object of study
  • etc
  2. User-independent information: All data related to tech specs of the data set/sensor.
  • Sensor name and type
  • Satellite active period
  • Temporal and regional coverage
  • Temporal and regional resolution
  • Data access platform
  • Data accessibility (open access/commercial)
  • etc.

Why is it important?

Once the tech specs of a sensor have been collected and validated, they are not subject to change unless somebody spots a typo or a mistake. Scientific application, on the other hand, is user- or researcher-specific, and one data set can have various applications.
Therefore, we can work with 2 types of files:

  1. [data_set_name]_techspecs - for one sensor we would have only one file (like we have currently)
  2. [data_set_name]_application_001, [data_set_name]_application_002 - for one sensor we would have multiple files for applications.

This separation would also help us:

  • aggregate the data on applications, such as CryoSAT: used in Glaciology (24 researchers), used to derive Ice Velocity (20 users), used in the study of the Arctic (62 users) and Antarctic (50 users).
  • collect information from the community more efficiently using two templates, one about tech specs and one for scientific applications.

In the end, the code would compile a third type of file for each dataset/sensor with all information: one file per sensor, probably called [data_set_name]_index. All these index files can then be sent to a Google Sheet.
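A rough sketch of that compile step, assuming the tech specs and application files have already been loaded into dictionaries. The function name `build_index`, the keys `scientific_field`, `parameter` and `region`, and the sample records are all hypothetical.

```python
from collections import Counter


def build_index(techspecs: dict, applications: list[dict]) -> dict:
    """Merge one tech-specs record with usage counts from its application records."""
    index = dict(techspecs)
    for key in ("scientific_field", "parameter", "region"):
        # Count how many application records mention each value of this key.
        counts = Counter(app[key] for app in applications if app.get(key))
        index[f"{key}_counts"] = dict(counts)
    return index


# Made-up records, one dict per [data_set_name]_application_NNN file.
techspecs = {"sensor_name": "CryoSAT"}
applications = [
    {"scientific_field": "Glaciology", "parameter": "Ice Velocity", "region": "Arctic"},
    {"scientific_field": "Glaciology", "parameter": "Ice Velocity", "region": "Antarctic"},
    {"scientific_field": "Oceanography", "region": "Arctic"},
]
print(build_index(techspecs, applications))
```

Keeping the counts in separate `*_counts` keys leaves the validated tech specs untouched, so the index can be regenerated whenever a new application file is added.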

What do you guys think?

[New]: Sentinel-42

Sensor name

Sentinel-42

Sensor type

Gravity-meter

Dataset level

L1 (raw)

Open Access

No response

Data access platform

example.com

Regions covered

No response

Processing software

No response

Scientific application

No response

Parameters sought

No response

Other information

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct

Align fields with STAC specification?

Just got some great ideas after attending this Cloud-Native Geospatial Event (https://schedule.cloudnativegeo.org) and learned a lot about Spatial Temporal Asset Catalogs (STAC) and OGC API standards! So the people working on those standards have literally spent years thinking about metadata, and these are some relevant ones that popped up from scanning through https://github.com/stac-extensions/stac-extensions.github.io:

It seems like there are some required fields/attributes which we could definitely ensure are present in our database. Using Sentinel-1 (a SAR satellite) as an example, frequency_band is required, and there's a list of Common Frequency Band Names to select from. See their example sentinel-1.json vs our current Sentinel-1.json
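As a sketch of what such a check might look like for SAR entries, the snippet below flags a missing or unrecognized `frequency_band`. The band names are the ones I recall from the STAC SAR extension and should be verified against the current specification. The unprefixed key `frequency_band` and the function name `check_frequency_band` are assumptions about how our entries would store it (STAC itself prefixes the field with `sar:`).

```python
# Band names as recalled from the STAC SAR extension; verify against the spec.
COMMON_FREQUENCY_BANDS = {"P", "L", "S", "C", "X", "Ku", "K", "Ka"}


def check_frequency_band(entry: dict) -> list[str]:
    """Return a list of problems with the entry's frequency_band (empty if fine)."""
    problems = []
    band = entry.get("frequency_band")
    if band is None:
        problems.append("missing required field: frequency_band")
    elif band not in COMMON_FREQUENCY_BANDS:
        problems.append(f"unknown frequency_band: {band!r}")
    return problems


print(check_frequency_band({"sensor_name": "Sentinel-1", "frequency_band": "C"}))
print(check_frequency_band({"sensor_name": "Sentinel-1"}))
```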

Document how to make a contribution to this repository

For anyone wanting to make a change to the database, we should have step by step instructions on how to make additions/deletions/changes to the JSON files

  • Make a screen recording of submitting a pull request (@AdrienWehrle)
  • Have a CONTRIBUTING.md text file documenting the step by step instructions (@weiji14) #14

TODO by next APECS Remote Sensing PG Meeting by April 11.

Check JSON fields in tests

At the moment, only a very basic file opening check is implemented. We should define the set of allowed fields and make sure that all fields in the current JSON files, and in any proposed modifications, match this set. That way, entries with valid JSON syntax but strings that the database does not recognize would be caught.
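A minimal sketch of such a check is below. The `ALLOWED_FIELDS` table and the function `validate_entry` are hypothetical; the real field names and allowed values would need to be agreed on first.

```python
import json

# Hypothetical schema: allowed keys and, where relevant, allowed values.
ALLOWED_FIELDS = {
    "sensor_name": None,  # None means free text
    "sensor_type": {"Optical", "SAR"},
    "open_access": {"Yes", "No"},
}


def validate_entry(entry: dict) -> list[str]:
    """Return human-readable errors for unknown keys or disallowed values."""
    errors = []
    for key, value in entry.items():
        if key not in ALLOWED_FIELDS:
            errors.append(f"unexpected field: {key}")
            continue
        allowed = ALLOWED_FIELDS[key]
        if allowed is not None and (not isinstance(value, str) or value not in allowed):
            errors.append(f"{key}: {value!r} not in {sorted(allowed)}")
    return errors


entry = json.loads('{"sensor_name": "Sentinel-1", "sensor_type": "Radar", "colour": "red"}')
for error in validate_entry(entry):
    print(error)
```

Because each error message lists the allowed values, the same output could be shown to the contributor at the PR stage rather than only failing the test.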
