magda-io / magda-csw-connector
A Magda connector for CSW data sources
License: Other
For some CSW servers, we are unable to generate the org name with the source metadata id.
We then have to create an id like:
This might cause issues in the registry, as such ids look the same when compared case-insensitively.
Some relevant errors:
Failed to PUT data registry record with ID "org-aodn-Australian Ocean Data Centre Joint Facility". 7 retries left. Status code: 500, body:
[ERROR] [06/02/2023 04:34:52.609] [registry-api-akka.actor.default-dispatcher-3] [RecordsService(akka://registry-api)] Encountered an exception when putting a record
scalikejdbc.TooManyRowsException
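A minimal sketch of how such clashes could be spotted before records are sent to the registry. The helper name `findCaseInsensitiveCollisions` is hypothetical and not part of the connector; it only groups ids that are identical once lowercased.

```typescript
// Hypothetical helper, not part of magda-csw-connector.
// Returns groups of distinct ids that become identical when lowercased.
function findCaseInsensitiveCollisions(ids: string[]): string[][] {
    // Object.create(null) avoids clashes with inherited keys such as "constructor".
    const groups: Record<string, string[]> = Object.create(null);
    for (const id of ids) {
        const key = id.toLowerCase();
        const group = groups[key] || (groups[key] = []);
        // Exact duplicates are not collisions, so each spelling is kept once.
        if (group.indexOf(id) === -1) {
            group.push(id);
        }
    }
    return Object.keys(groups)
        .map((key) => groups[key])
        .filter((group) => group.length > 1);
}
```

Running it over the ids generated in one crawl would list every set of org ids that differ only by letter case.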
Is your feature request related to a problem? Please describe.
Some GeoNetwork servers support a newer output schema: http://standards.iso.org/iso/19115/-3/mdb/2.0
e.g. aodn:
Our default output schema used in the config, http://www.isotc211.org/2005/gmd, works. However, some key fields (e.g. mdb:metadataLinkage) won't be available.
We need to:
http://standards.iso.org/iso/19115/-3/mdb/2.0
Describe the bug
At the moment I'm noticing that all the following connectors are going very slowly because they're creating datasets that the registry doesn't think are valid:
connector-actmapi-1584194400-9qzjk 1/1 Running 0 6h35m
connector-aims-1584194400-8wr9b 1/1 Running 0 6h35m
connector-aodn-1584194400-4gknb 1/1 Running 0 6h35m
connector-dap-1584799200-84cht 1/1 Running 0 6h35m
connector-ga-1584194400-nlpdq 1/1 Running 0 6h35m
connector-logan-1584194400-99z7b 1/1 Running 1 16h
connector-marlin-1584194400-57xz9 1/1 Running 0 6h35m
connector-sdinsw-1585404000-44hd6 1/1 Running 0 6h35m
Describe the bug
magda-csw-connector generates a date string that doesn't match the JSON schema for the dcat-dataset-strings aspect:
Failed to PUT data registry record with ID "dist-aims-ef67ebe0-61ac-4a0a-a5a5-52a12b4d727c-1". 1 retries left. Status code: 400, body:
{
"message": "#/issued: [2019-09-12] is not a valid date-time. Expected [yyyy-MM-dd'T'HH:mm:ssZ, yyyy-MM-dd'T'HH:mm:ss.[0-9]{1,9}Z, yyyy-MM-dd'T'HH:mm:ss[+-]HH:mm, yyyy-MM-dd'T'HH:mm:ss.[0-9]{1,9}[+-]HH:mm]"
}
Make sure the validateJsonSchema option is set to true to turn JSON validation on.
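One possible way to avoid this error is to normalise date-only values before the record is sent. The sketch below is an assumption, not the connector's actual fix: `toDateTimeString` is a hypothetical helper, and treating a missing time or time zone as midnight UTC is a choice made only for illustration.

```typescript
// Hypothetical helper, not part of magda-csw-connector.
const DATE_ONLY = /^\d{4}-\d{2}-\d{2}$/;
const DATE_TIME_WITHOUT_ZONE = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d{1,9})?$/;

// Returns a string in one of the date-time shapes listed in the error message,
// or undefined when the input cannot be parsed as a date at all.
function toDateTimeString(value: string): string | undefined {
    const trimmed = value.trim();
    if (DATE_ONLY.test(trimmed)) {
        // Assumption: a bare date means midnight UTC.
        return `${trimmed}T00:00:00Z`;
    }
    if (DATE_TIME_WITHOUT_ZONE.test(trimmed)) {
        // Assumption: a date-time without a zone is already in UTC.
        return `${trimmed}Z`;
    }
    const parsed = new Date(trimmed);
    return isNaN(parsed.getTime()) ? undefined : parsed.toISOString();
}
```

With this, the `issued` value from the error above, `2019-09-12`, would be sent as `2019-09-12T00:00:00Z`.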
To Reproduce
Deploy magda-csw-connector with config:
id: aims
name: Australian Institute of Marine Science
sourceUrl: https://geo.aims.gov.au/geonetwork/srv/eng/csw
pageSize: 100
Describe the bug
Some datasets harvested by this connector have an empty publisher property. E.g. sources:
To Reproduce
E.g. the following request will have publisher = "".
https://data.gov.au/api/v0/registry/records/ds-aurin-aurin:datasource-AU_Govt_ABS-UoM_AURIN_DB_3_abs_ihad_lga_2016?optionalAspect=source&optionalAspect=dcat-dataset-strings&optionalAspect=dcat-distribution-strings&dereference=true
{
    "aspects": {
        "dcat-dataset-strings": {
            "contactPoint": "GeoServer",
            "description": "...",
            "keywords": [
                "socio-economic"
            ],
            "languages": [],
            "publisher": "",
            "spatial": "POLYGON((96.81 -43.75000004080834, 159.11000000000004 -43.75000004080834, 159.11000000000004 -9.140000000990348, 96.81 -9.140000000990348, 96.81 -43.75000004080834))",
            "themes": [],
            "title": "ABS - Index of Household Advantage and Disadvantage (IHAD) (LGA) 2016"
        },
        "source": {
            "id": "aurin",
            "name": "Australian Urban Research Infrastructure Network",
            "type": "csw-dataset",
            "url": "https://openapi.aurin.org.au/public/csw?service=CSW&version=2.0.2&request=GetRecordById&elementsetname=full&outputschema=http%3A%2F%2Fwww.isotc211.org%2F2005%2Fgmd&typeNames=gmd%3AMD_Metadata&id=aurin%3Adatasource-AU_Govt_ABS-UoM_AURIN_DB_3_abs_ihad_lga_2016"
        }
    },
    "id": "ds-aurin-aurin:datasource-AU_Govt_ABS-UoM_AURIN_DB_3_abs_ihad_lga_2016",
    "name": "ABS - Index of Household Advantage and Disadvantage (IHAD) (LGA) 2016",
    "sourceTag": "60eda22a-11ff-4ae9-9def-0f12bef8f179",
    "tenantId": 0
}
Besides, if optionalAspect=dataset-distributions is added to the above query, every accessURL value has a double slash after the hostname:
accessURL: "https://openapi.aurin.org.au//public/wfs?request=getFeature&version=1.0.0...
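A double slash like this typically appears when a base URL ending in "/" is concatenated with a path starting with "/"; whether that is the cause here is an assumption. The sketch below shows one way to join the two parts safely. `joinUrl` is a hypothetical helper, not a function from the connector.

```typescript
// Hypothetical helper, not part of magda-csw-connector.
// Joins a base URL and a path with exactly one slash between them.
function joinUrl(base: string, path: string): string {
    const trimmedBase = base.replace(/\/+$/, ""); // drop trailing slashes
    const trimmedPath = path.replace(/^\/+/, ""); // drop leading slashes
    return `${trimmedBase}/${trimmedPath}`;
}
```

For example, `joinUrl("https://openapi.aurin.org.au/", "/public/wfs")` gives `https://openapi.aurin.org.au/public/wfs`.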
Expected behavior
accessURL should be correct (without the extra slash).
The CSW connector can't correctly capture license info for the following datasets:
Tasmania - National Intertidal-Subtidal Benthic NISB Habitat Map (PLUS) | Datasets | data.gov.au - beta
New South Wales - National Intertidal-Subtidal Benthic NISB Habitat Map | Datasets | data.gov.au - beta
Dampier Marine Park Habitat Validation | Datasets | data.gov.au - beta
Western Australia - National Intertidal-Subtidal Benthic NISB Habitat Map (PLUS) | Datasets | data.gov.au - beta
Queensland - National Intertidal-Subtidal Benthic NISB Habitat Map (PLUS) | Datasets | data.gov.au - beta
Victoria - National Intertidal-Subtidal Benthic NISB Habitat Map (PLUS) | Datasets | data.gov.au - beta
South Australia - National Intertidal-Subtidal Benthic NISB Habitat Map (PLUS) | Datasets | data.gov.au - beta
Northern Territory - National Intertidal-Subtidal Benthic NISB Habitat Map (PLUS) | Datasets | data.gov.au - beta
Tessa Shoals Dampier Marine Park Habitat Validation | Datasets | data.gov.au - beta
Some CSW endpoints might require basic auth, and we might also want to (optionally) send the getRecords request via HTTP POST when the server doesn't support HTTP GET well.
Sample POST request body:
<?xml version="1.0"?>
<csw:GetRecords xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
xmlns:gmd="http://www.isotc211.org/2005/gmd"
service="CSW" version="2.0.2"
outputSchema="http://standards.iso.org/iso/19115/-3/mdb/2.0"
resultType="results"
startPosition="10"
maxRecords="20">
<csw:Query typeNames="gmd:MD_Metadata">
<csw:Constraint version="1.1.0">
<Filter xmlns="http://www.opengis.net/ogc"/>
</csw:Constraint>
</csw:Query>
</csw:GetRecords>
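A rough sketch of how a body like the sample above and an optional basic auth header could be assembled. All function and type names here are hypothetical, and how a secret is resolved into a username and password is not covered; the XML simply mirrors the sample.

```typescript
// Hypothetical types and helpers, not part of magda-csw-connector.
interface BasicAuth {
    username: string;
    password: string;
}

interface GetRecordsParams {
    outputSchema: string;
    typeNames: string;
    startPosition: number;
    maxRecords: number;
}

// Escape characters that are unsafe inside an XML attribute value.
function escapeXmlAttr(value: string): string {
    return value
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;");
}

// Build a GetRecords request body shaped like the sample above.
function buildGetRecordsBody(params: GetRecordsParams): string {
    return [
        '<?xml version="1.0"?>',
        '<csw:GetRecords xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"',
        '    xmlns:gmd="http://www.isotc211.org/2005/gmd"',
        '    service="CSW" version="2.0.2"',
        `    outputSchema="${escapeXmlAttr(params.outputSchema)}"`,
        '    resultType="results"',
        `    startPosition="${params.startPosition}"`,
        `    maxRecords="${params.maxRecords}">`,
        `    <csw:Query typeNames="${escapeXmlAttr(params.typeNames)}">`,
        '        <csw:Constraint version="1.1.0">',
        '            <Filter xmlns="http://www.opengis.net/ogc"/>',
        '        </csw:Constraint>',
        '    </csw:Query>',
        '</csw:GetRecords>'
    ].join("\n");
}

// Build request headers; the Authorization header is added only when
// credentials are supplied, so no auth header is sent otherwise.
function buildHeaders(auth?: BasicAuth): Record<string, string> {
    const headers: Record<string, string> = { "Content-Type": "application/xml" };
    if (auth) {
        // btoa only accepts Latin-1 input; other characters would need encoding first.
        headers["Authorization"] = "Basic " + btoa(`${auth.username}:${auth.password}`);
    }
    return headers;
}
```

The resulting body and headers can be passed to any HTTP client with the method set to POST.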
Those parameters are currently sent via the query string here:
magda-csw-connector/src/CswUrlBuilder.ts
Lines 17 to 27 in a9bfe38
getRecordsWithPostRequest: by default, requests are sent via GET. But users should be able to set it to true to have requests sent via POST.
basicAuthSecretName: allows users to specify the secret name of the basic auth secret. If basicAuthSecretName is empty, no auth headers will be sent.
The CSW connector currently can't correctly capture license info from the AURIN CSW registry.
Notice that the response doesn't tag license info with a codeListValue attribute equal to "license"; that's probably why our code missed it.
Having said that, it still seems possible to extract the license info with hardcoded logic.