nasa-pds / registry-api-service
(deprecated) Merged with other projects into https://github.com/NASA-PDS/registry-api
License: Other
...so that I can access the registry using a consistent interface.
Given
When I perform
Then I expect
We need the deployment of the Registry API Service either documented as part of the Registry App or linked explicitly. These components are tightly coupled so we need to consider how to document or link to them wherever possible.
Currently it returns a not implemented error, see example:
curl --location --request GET 'http://localhost:8080/products?only-summary=True' -v
The proposed default format should be application/json
That should be handled as a constant in the MVC object (https://github.com/NASA-PDS/registry-api-service/blob/main/src/main/java/gov/nasa/pds/api/engineering/configuration/WebMVCConfig.java)
And used for example in:
I believe we should have different categories of MIME types for longProduct and shortProduct descriptions, since some formats apply to the full product (pds4+json) and others to the short product (json)
API server crashes with OutOfMemoryError if invalid query is used
Steps to reproduce the behavior:
Use title=Kaguya in the 'q' parameter.
The server fails with a java.lang.OutOfMemoryError.
The = sign is not supported in queries.
0.1.0
**🦄 Applicable requirements**
See parent Epic for details.
The current main of registry-api-service fails to build due to a bad dependency:
$ cd /tmp
$ git clone git@github.com:NASA-PDS/registry-api-service.git
…
$ cd registry-api-service
$ mvn clean install
…
[ERROR] Failed to execute goal on project registry-api-service: Could not resolve dependencies for project gov.nasa.pds:registry-api-service:jar:0.5.0-SNAPSHOT: Failure to find gov.nasa.pds:api:jar:pds.api.110 in https://oss.sonatype.org/content/repositories/snapshots was cached in the local repository, resolution will not be reattempted until the update interval of oss.sonatype.org-snapshot has elapsed or updates are forced -> [Help 1]
…
$ echo \U+1F616
😖
It looks like in 7a9456a a typo was introduced in the dependency for gov.nasa.pds:api, changing its version from 0.5.0-SNAPSHOT to an unusual version tag pds.api.110. Is this correct?
Hat tip to @ramesh-maddegoda for discovering this!
$ mvn clean install
…
BUILD SUCCESS
…
Various API endpoints like /collections/{lidvid}/products let you paginate through your results in a fairly direct manner:
Set start to zero and your limit to 20.
Set start to 20 and try again.
Set start to 22 and try again.
For example, try this:
curl --header 'Accept: application/json' 'https://pds-gamma.jpl.nasa.gov/api/collections/urn%3Anasa%3Apds%3Ainsight_documents%3Adocument_hp3rad%3A%3A8.0/products?start=0&limit=20&fields=product_class&only-summary=false' | json_pp
and note that the data field has 2 entries in it. Then set start to 2 and try
curl --header 'Accept: application/json' 'https://pds-gamma.jpl.nasa.gov/api/collections/urn%3Anasa%3Apds%3Ainsight_documents%3Adocument_hp3rad%3A%3A8.0/products?start=2&limit=20&fields=product_class&only-summary=false' | json_pp
and you get this nice little result:
{
"summary" : {
"limit" : 20,
"properties" : [],
"sort" : [],
"start" : 2
}
}
Perfect!
This strategy worked with the various other endpoints too, like /bundles/{lidvid}/collections. Or it used to. Now (possibly related to the fix for NASA-PDS/pds-api#73) this no longer works for /bundles/{lidvid}/collections. Instead of returning zero results, it gives a 500 Internal Server Error.
For example, try this:
curl --header 'Accept: application/json' 'https://pds-gamma.jpl.nasa.gov/api/bundles/urn%3Anasa%3Apds%3Ainsight_documents%3A%3A2.0/collections?start=0&limit=20&fields=product_class&only-summary=false' | json_pp
Note the data field has 5 entries. Then set start to 5 and do
curl --header 'Accept: application/json' 'https://pds-gamma.jpl.nasa.gov/api/bundles/urn%3Anasa%3Apds%3Ainsight_documents%3A%3A2.0/collections?start=5&limit=20&fields=product_class&only-summary=false' | json_pp
Now you get this:
{
"error" : "Internal Server Error",
"message" : "",
"path" : "/bundles/urn:nasa:pds:insight_documents::2.0/collections",
"status" : 500,
"timestamp" : 1631137572150
}
What happened? This used to work.
On a possibly related note, asking for zero items (limit=0) also used to work; you'd get back a data entry that was an empty array, which is what you'd expect. Now you get 500 Internal Server Error, and on multiple endpoints. I think you should be able to ask for nothing and get nothing, which is not an error condition.
Commit: f91eb30
Broken build: https://github.com/NASA-PDS/registry-api-service/actions/runs/1210454651, https://github.com/NASA-PDS/registry-api-service/actions/runs/1210454650
Besides broken builds above, also can't build on my laptop:
[INFO] -------------------------------------------------------------
[ERROR] COMPILATION ERROR :
[INFO] -------------------------------------------------------------
[ERROR] /Users/jpadams/Documents/proj/pds/pdsen/workspace/registry-api-service/src/main/java/gov/nasa/pds/api/engineering/controllers/MyCollectionsApiController.java:[38,8] gov.nasa.pds.api.engineering.controllers.MyCollectionsApiController is not abstract and does not override abstract method bundlesContainingCollection(java.lang.String,java.lang.Integer,java.lang.Integer,java.util.List<java.lang.String>,java.util.List<java.lang.String>,java.lang.Boolean) in gov.nasa.pds.api.base.CollectionsApi
[ERROR] /Users/jpadams/Documents/proj/pds/pdsen/workspace/registry-api-service/src/main/java/gov/nasa/pds/api/engineering/controllers/MyCollectionsApiController.java:[201,37] bundlesContainingCollection(java.lang.String,@javax.validation.Valid java.lang.Integer,@javax.validation.Valid java.lang.Integer,@javax.validation.Valid java.util.List<java.lang.String>,@javax.validation.Valid java.util.List<java.lang.String>,@javax.validation.Valid java.lang.Boolean) in gov.nasa.pds.api.engineering.controllers.MyCollectionsApiController cannot implement bundlesContainingCollection(java.lang.String,java.lang.Integer,java.lang.Integer,java.util.List<java.lang.String>,java.util.List<java.lang.String>,java.lang.Boolean) in gov.nasa.pds.api.base.CollectionsApi
return type org.springframework.http.ResponseEntity<gov.nasa.pds.model.Products> is not compatible with org.springframework.http.ResponseEntity<java.lang.Object>
[ERROR] /Users/jpadams/Documents/proj/pds/pdsen/workspace/registry-api-service/src/main/java/gov/nasa/pds/api/engineering/controllers/MyCollectionsApiController.java:[200,5] method does not override or implement a method from a supertype
[INFO] 3 errors
API server doesn't work (returns status 500) with default output format with JDK11
Steps to reproduce the behavior:
If XML output format is supported, it should work with JDK11.
API: 0.3.0.
2021-07-22 14:44:38.549 ERROR 17384 --- [nio-8080-exec-8] o.a.c.c.C.[.[.[/].[dispatcherServlet] : Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is org.springframework.http.converter.HttpMessageConversionException: Could not create JAXBContext for class [class gov.nasa.pds.model.Products]: Implementation of JAXB-API has not been found on module path or classpath.; nested exception is javax.xml.bind.JAXBException: Implementation of JAXB-API has not been found on module path or classpath.
- with linked exception:
[java.lang.ClassNotFoundException: com.sun.xml.bind.v2.ContextFactory]] with root cause
java.lang.ClassNotFoundException: com.sun.xml.bind.v2.ContextFactory
at org.springframework.boot.web.embedded.tomcat.TomcatEmbeddedWebappClassLoader.loadClass(TomcatEmbeddedWebappClassLoader.java:72) ~[spring-boot-2.3.1.RELEASE.jar!/:2.3.1.RELEASE]
at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1188) ~[tomcat-embed-core-9.0.36.jar!/:9.0.36]
at javax.xml.bind.ServiceLoaderUtil.nullSafeLoadClass(ServiceLoaderUtil.java:92) ~[jakarta.xml.bind-api-2.3.3.jar!/:2.3.3]
at javax.xml.bind.ServiceLoaderUtil.safeLoadClass(ServiceLoaderUtil.java:125) ~[jakarta.xml.bind-api-2.3.3.jar!/:2.3.3]
at javax.xml.bind.ContextFinder.newInstance(ContextFinder.java:230) ~[jakarta.xml.bind-api-2.3.3.jar!/:2.3.3]
at javax.xml.bind.ContextFinder.find(ContextFinder.java:375) ~[jakarta.xml.bind-api-2.3.3.jar!/:2.3.3]
at javax.xml.bind.JAXBContext.newInstance(JAXBContext.java:691) ~[jakarta.xml.bind-api-2.3.3.jar!/:2.3.3]
at javax.xml.bind.JAXBContext.newInstance(JAXBContext.java:632) ~[jakarta.xml.bind-api-2.3.3.jar!/:2.3.3]
at org.springframework.http.converter.xml.AbstractJaxb2HttpMessageConverter.lambda$getJaxbContext$0(AbstractJaxb2HttpMessageConverter.java:110) ~[spring-web-5.2.7.RELEASE.jar!/:5.2.7.RELEASE]
at java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1705) ~[na:na]
at org.springframework.http.converter.xml.AbstractJaxb2HttpMessageConverter.getJaxbContext(AbstractJaxb2HttpMessageConverter.java:108) ~[spring-web-5.2.7.RELEASE.jar!/:5.2.7.RELEASE]
at org.springframework.http.converter.xml.AbstractJaxb2HttpMessageConverter.createMarshaller(AbstractJaxb2HttpMessageConverter.java:51) ~[spring-web-5.2.7.RELEASE.jar!/:5.2.7.RELEASE]
at org.springframework.http.converter.xml.Jaxb2RootElementHttpMessageConverter.writeToResult(Jaxb2RootElementHttpMessageConverter.java:181) ~[spring-web-5.2.7.RELEASE.jar!/:5.2.7.RELEASE]
at gov.nasa.pds.api.engineering.serializer.XmlProductSerializer.writeToResult(XmlProductSerializer.java:69) ~[classes!/:0.3.0]
at org.springframework.http.converter.xml.AbstractXmlHttpMessageConverter.writeInternal(AbstractXmlHttpMessageConverter.java:85) ~[spring-web-5.2.7.RELEASE.jar!/:5.2.7.RELEASE]
at org.springframework.http.converter.AbstractHttpMessageConverter.write(AbstractHttpMessageConverter.java:227) ~[spring-web-5.2.7.RELEASE.jar!/:5.2.7.RELEASE]
at org.springframework.web.servlet.mvc.method.annotation.AbstractMessageConverterMethodProcessor.writeWithMessageConverters(AbstractMessageConverterMethodProcessor.java:290) ~[spring-webmvc-5.2.7.RELEASE.jar!/:5.2.7.RELEASE]
at org.springframework.web.servlet.mvc.method.annotation.HttpEntityMethodProcessor.handleReturnValue(HttpEntityMethodProcessor.java:219) ~[spring-webmvc-5.2.7.RELEASE.jar!/:5.2.7.RELEASE]
at org.springframework.web.method.support.HandlerMethodReturnValueHandlerComposite.handleReturnValue(HandlerMethodReturnValueHandlerComposite.java:82) ~[spring-web-5.2.7.RELEASE.jar!/:5.2.7.RELEASE]
Whitelabel Error Page
This application has no explicit mapping for /error, so you are seeing this as a fallback.
Thu Jul 22 14:48:51 PDT 2021
There was an unexpected error (type=Internal Server Error, status=500).
**🦄 Applicable requirements**
The recently added target groups (used to route between the load balancer and the ECS services) do not have targets (i.e. the respective service).
Steps to reproduce the behavior:
There should be at least one IP address if the target service is identified
If we intend to use a single load balancer across all nodes' registries (which we need to do because of the DNS limitations), the destination node needs to be part of the request URL so the LB can accurately route the request to the appropriate registry deployment. While largely meaningless to the registry service once routed, the request will still contain the node specification (since it cannot be 'filtered' by the LB). The service must therefore disregard but tolerate this part of the path.
The current path w/o the node must remain supported.
All endpoints need the pagination implementation except the URN resolvers.
This needs to be implemented efficiently so that we don't go through the same items again when a sequence of requests is:
collections?start=0&limit=10
collections?start=10&limit=10
...
One way of doing that is to use the Java Stream.skip() method.
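A minimal sketch of the skip()/limit() idea, assuming a zero-based start as in the requests elsewhere on this page. The class and method names (PageSketch, page) are hypothetical and not taken from the service. Note that skip() on an in-memory stream still walks past the skipped elements, so keeping deep pages as fast as early ones probably also requires pushing the offset down to the data store.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class PageSketch {
    // Hypothetical helper: returns at most `limit` items beginning at zero-based index `start`.
    static <T> List<T> page(List<T> items, int start, int limit) {
        return items.stream()
                .skip(start)   // drop the items that belong to earlier pages
                .limit(limit)  // keep at most one page worth of items
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // 25 stand-in items; a real implementation would page over registry records.
        List<Integer> items = IntStream.range(0, 25).boxed().collect(Collectors.toList());
        System.out.println(page(items, 0, 10));
        System.out.println(page(items, 10, 10));
        System.out.println(page(items, 20, 10));
    }
}
```

With this shape, a start past the end of the data or a limit of zero yields an empty list rather than an error.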
Acceptance criteria:
For requests with numerous results, the processing time is the same for any page (e.g. it does not become longer when the start number is bigger). This can be tested on the demo deployment https://pds-gamma.jpl.nasa.gov/api/swagger-ui.html of the reference implementation (https://github.com/NASA-PDS/registry-api-service).
Request examples:
curl --location --request GET 'https://pds-gamma.jpl.nasa.gov/api/products?start=1&limit=500'
curl --location --request GET 'https://pds-gamma.jpl.nasa.gov/api/collections/urn:nasa:pds:orex.ovirs:data_calibrated::10.0/products?start=100&limit=100'
Replace start and limit with your values.
The service requires a couple of sensitive configuration parameters: the AWS ES login and the keystore password. Rather than being stored in the application.config or elsewhere in the code/docker-image, these should be obtained at run-time from the AWS Secrets Manager. For now we will use role-based access, requiring the Fargate task to override the default task-execution-role w/ one that has been enabled to read these secrets. The associated secret names should be configurable such that multiple co-located registry deployments do not clobber each other's secrets - for Docker, these can be passed in as arguments to the build and captured in the application.properties file.
Given an application (in this case, the registry-api-service) requires access and use of a sensitive key/value pair
When I perform a lookup to the AWS secret manager
Then I expect that key/value pair will be returned and not have to be part of the application configuration.
In order to avoid having per-deployment application properties, we will inject es login information as well as other deployment specific parameters as 'value-from' environment variables in the container definition. This will avoid having to make secret manager and parameter store look-ups directly in the service.
Not only are the selected fields returned; some other fields are also included in the response.
See request
curl --location --request GET 'http://localhost:8080/products?fields=cart:Bounding_Coordinates.cart:east_bounding_coordinate' \
--header 'Accept: application/kvp+json'
Only the selected fields should be available in the response. Note that this works well for the text/csv format.
When using the bundle's collections endpoint (/bundles/{lidvid}/collections) on urn:nasa:pds:insight_documents::2.0 with start=0 and limit=49 with the following fields:
ops:Data_File_Info.ops:file_ref
ops:Data_File_Info.ops:md5_checksum
ops:Label_File_Info.ops:file_ref
ops:Label_File_Info.ops:md5_checksum
the properties for ops:Data_File_Info.ops:file_ref, ops:Data_File_Info.ops:md5_checksum, ops:Label_File_Info.ops:file_ref, and ops:Label_File_Info.ops:md5_checksum do not contain URLs or MD5 digests, but instead contain the literal string:
null
Also: it'd be great if, instead of {"values": […]} appearing in properties, we just had the array […] without the single key-value dict wrapper.
Rather than null for file references, there should be URLs to files. Similarly, rather than null for MD5s, there should be strings of hexadecimal digits. This used to work back in April 2021 (but back then, there wasn't this weird extra values dict everywhere either).
And: we'd just have arrays […] and not {"values": […]}.
Try running:
curl \
--silent \
--request GET \
--header 'Accept: application/json' \
'https://pds-gamma.jpl.nasa.gov/api/bundles/urn%3Anasa%3Apds%3Ainsight_documents%3A%3A2.0/collections?start=0&limit=49&fields=ops%3AData_File_Info.ops%3Afile_ref&fields=ops%3AData_File_Info.ops%3Amd5_checksum&fields=ops%3ALabel_File_Info.ops%3Afile_ref&fields=ops%3ALabel_File_Info.ops%3Amd5_checksum&only-summary=false' \
| json_pp
and look at all the nulls and values. Or try the Swagger interactive UI:
...so that the code is easier to maintain
Remove class MyCapabilitiesApiController
Given
When I perform
Then I expect
The following line hard-codes ::P1 as part of the requested identifier of registry-refs, whereas there could be multiple identifiers ::P2 ...
Request the endpoint /bundles/{lidvid}/products when at least one collection has more than 500 products.
Some products will be missing.
No product should be missing.
I think it would be beneficial to reuse the class https://github.com/NASA-PDS/registry-api-service/blob/master/src/main/java/gov/nasa/pds/api/engineering/elasticsearch/business/CollectionProductIterator.java and to add a BundleProductRelationships class that mimics the behavior of https://github.com/NASA-PDS/registry-api-service/blob/master/src/main/java/gov/nasa/pds/api/engineering/elasticsearch/business/CollectionProductRelationships.java, in order to re-implement the method that gets the products of a bundle.
**🦄 Applicable requirements**
I want the best request performances elasticSearch can give me so that I can request millions of records (e.g. products of a collection) as quickly as possible.
This includes:
Given the duration of the equivalent optimized requests to elasticSearch
When I perform the same request through the API
Then I expect the duration (excluding network time between client and api) is not longer than '20%' the elasticSearch requests.
Given an API request
When I perform the underlying elasticSearch request (shown in logs)
Then I expect no extra fields to be pulled from elasticSearch
Given an API request to the end-point /collections/{lidvid}/products
When I perform the request
Then I expect only 2 calls to elasticSearch to be done (per page)
From @msbentley :
And a question for Thomas following on - I ingested a few (~60) test
products into the latest registry, and tried to make a simple product
request, but the API service gives me:
2021-08-20 14:20:57.335 ERROR 88586 --- [io-8080-exec-10]
o.a.c.c.C.[.[.[/].[dispatcherServlet] : Servlet.service() for servlet
[dispatcherServlet] in context with path [] threw exception [Request
processing failed; nested exception is
java.lang.IllegalArgumentException: Cannot deserialize instance of
java.lang.String
out of START_ARRAY token
at [Source: UNKNOWN; line: -1, column: -1] (through reference chain:
gov.nasa.pds.api.engineering.elasticsearch.entities.EntityProduct["pds:File/pds:creation_date_time"])]
with root cause
I suspect it's related to one particular product, since if I mess with
the start/limit parameters I can work around it, but I haven't figured
out which yet...
From @tdddblog :
I think this is a bug in API. It could not handle products with multiple data files. For example,
<Product_XML_Schema ...
...........
<File_Area_XML_Schema>
<File>
<file_name>PDS4_SPECLIB_1B00_1000.xsd</file_name>
<creation_date_time>2019-11-22T13:12:14</creation_date_time>
<file_size unit="byte">43384</file_size>
<records>925</records>
</File>
<XML_Schema>
<name>PDS4_SPECLIB_1B00_1000.xsd</name>
<offset unit="byte">0</offset>
<parsing_standard_id>XML Schema Version 1.1</parsing_standard_id>
<description>This is a PDS4 XML Schema file for the declared namespace.</description>
</XML_Schema>
</File_Area_XML_Schema>
<File_Area_XML_Schema>
<File>
<file_name>PDS4_SPECLIB_1B00_1000.sch</file_name>
<creation_date_time>2019-11-22T13:12:14</creation_date_time>
<file_size unit="byte">16227</file_size>
<records>234</records>
</File>
<XML_Schema>
<name>PDS4_SPECLIB_1B00_1000.sch</name>
<offset unit="byte">0</offset>
<parsing_standard_id>Schematron ISO/IEC 19757-3:2006</parsing_standard_id>
<description>This is the PDS4 Schematron file for the declared namespace. Schematron provides rule-based validation for XML Schema.</description>
</XML_Schema>
</File_Area_XML_Schema>
</Product_XML_Schema>
API returns as expected
v0.3.2
This requires updating the products/{lidvid}, collections/{lidvid}/... controllers to use the method gov.nasa.pds.api.engineering.controllers.MyProductsApiBareController.getLatestLidVidFromLid(String) when the argument is a lid and not a lidvid.
A lidvid is lid::vid (identifier::version), so a lid is an identifier that does not contain '::'.
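The lid versus lidvid distinction described above can be sketched as follows. This is only an illustration: the class and method names (LidVidSketch, isLidvid, lidOf, vidOf) are hypothetical and are not the service's own code, and malformed inputs such as a trailing '::' are not rejected here.

```java
public class LidVidSketch {
    // Separator between the identifier (lid) and the version (vid).
    static final String SEPARATOR = "::";

    // True when the identifier carries a version part, i.e. it is a lidvid.
    static boolean isLidvid(String identifier) {
        return identifier.contains(SEPARATOR);
    }

    // Returns the lid part; a bare lid is returned unchanged.
    static String lidOf(String identifier) {
        int i = identifier.indexOf(SEPARATOR);
        return i < 0 ? identifier : identifier.substring(0, i);
    }

    // Returns the vid part, or an empty string when there is none.
    static String vidOf(String identifier) {
        int i = identifier.indexOf(SEPARATOR);
        return i < 0 ? "" : identifier.substring(i + SEPARATOR.length());
    }

    public static void main(String[] args) {
        String lidvid = "urn:nasa:pds:izenberg_pdart14_meap:document::1.0";
        System.out.println(isLidvid(lidvid) + " " + lidOf(lidvid) + " " + vidOf(lidvid));
        System.out.println(isLidvid(lidOf(lidvid)));
    }
}
```

A controller could use such a check to decide whether to resolve the latest version first or to look up the given lidvid directly.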
Acceptance criteria:
Currently the following url works:
http://localhost:8080/products/urn:nasa:pds:izenberg_pdart14_meap:document::1.0
We would like the following url to also work and return the latest lidvid (with the highest version number) available in the registry:
http://localhost:8080/products/urn:nasa:pds:izenberg_pdart14_meap:document
Same for collection and bundle end points:
http://localhost:8080/collections/urn:nasa:pds:izenberg_pdart14_meap:data_imagecube::1.0
http://localhost:8080/collections/urn:nasa:pds:izenberg_pdart14_meap:data_imagecube
http://localhost:8080/bundles/urn:nasa:pds:izenberg_pdart14_meap::1.0
http://localhost:8080/bundles/urn:nasa:pds:izenberg_pdart14_meap
And their sub-url:
http://localhost:8080/collections/urn:nasa:pds:izenberg_pdart14_meap:data_imagecube::1.0/products
http://localhost:8080/collections/urn:nasa:pds:izenberg_pdart14_meap:data_imagecube/products
http://localhost:8080/bundles/urn:nasa:pds:izenberg_pdart14_meap::1.0/collections
http://localhost:8080/bundles/urn:nasa:pds:izenberg_pdart14_meap/collections
Error cases:
http://localhost:8080/collections/urn:nasa:pds:izenberg_pdart14_meap:data_imagecube::1. --> 404 (not found), lidvid resolution case
http://localhost:8080/collections/urn:nasa:pds:izenberg_pdart14_meap:data_imagecube:: --> 404 (not found), lidvid resolution case
http://localhost:8080/collections/urn:nasa:pds:izenberg_pdart14_meap:data_imagecube: --> 404 (not found), lid resolution
http://localhost:8080/collections/urn:nasa:pds:izenberg_pdart14_meap:data_imagecub --> 404 (not found, since the full lid has an additional e), lid resolution
http://localhost:8080/collections/urn:nasa:pds:izenberg_pdart14_meap --> 404 (not found) if no lidvid matching urn:nasa:pds:izenberg_pdart14_meap::* exists, otherwise return the lidvid record (200), lid resolution
Can be tested on the demo deployment https://pds-gamma.jpl.nasa.gov/api/swagger-ui.html (replace localhost:8080 accordingly).
...so that I can update my query (q param) to make it work
Given deployed API server
When I perform request q=ops:Data_File_Info.ops:file_size gte 138172
Then I expect an explicit error message like "Unknown operator gte", status 400
To be completed
I believe this can be easily added by using messages in the ParseCancellationException and throwing the exception all the way through the API controllers. Actually, this requires a bit of research to understand how a Spring Boot controller method can return multiple types (products or error).
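A rough sketch of the idea of carrying an explicit message in an exception, in plain Java with no Spring or parser dependency. Everything here is an assumption made for illustration: the class names, the list of supported operators, and the simplified "field operator value" query shape do not come from the service, whose real query grammar is richer.

```java
import java.util.Set;

public class QueryOperatorCheck {
    // Hypothetical set of supported comparison operators; the real grammar may differ.
    static final Set<String> SUPPORTED = Set.of("eq", "ne", "gt", "ge", "lt", "le");

    // Hypothetical exception type; a controller could translate it into an HTTP 400 response.
    static class BadQueryException extends RuntimeException {
        BadQueryException(String message) {
            super(message);
        }
    }

    // Checks a simple "<field> <operator> <value>" clause and names the offending operator.
    static void validate(String q) {
        String[] tokens = q.trim().split("\\s+");
        if (tokens.length != 3) {
            throw new BadQueryException("Expected '<field> <operator> <value>' but got: " + q);
        }
        if (!SUPPORTED.contains(tokens[1])) {
            throw new BadQueryException("Unknown operator " + tokens[1]);
        }
    }

    public static void main(String[] args) {
        try {
            validate("ops:Data_File_Info.ops:file_size gte 138172");
        } catch (BadQueryException e) {
            // Stand-in for building the 400 response body.
            System.out.println("400 " + e.getMessage());
        }
    }
}
```

The point of the sketch is only that the message is produced where the problem is detected and travels unchanged up to the layer that builds the response.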
When one goes to the swagger-ui, the list of proposed formats does not match what is expected, see:
Go to the swagger-ui
The expected list should contain application/kvp+json, text/csv and application/pds4+xml
Version 7.15.1 of the ES High Level Java API will not work with AWS's ES 7.10 due to the latter's incompatible build flavor ('oss' vs 7.15.1's expected 'default'). We need to revert back to 7.13.3 of the API in order to be able to deploy the registry on AWS.
The impact of this appears to be changing the package of the TimeValue class in the imports in two locations, it does not seem any 7.15.x specific features are in use.
The value of the ES login passed in as an environment variable is in JSON format, which the service is not accommodating. Hence, logins are not successful (Unauthorized). A method in AWSSecretsAccess will perform this parsing and conversion into a key/value tuple - this will need to be changed to a static, then called from the ES configuration class.
...so that I can have a better user experience when trying to run the tool.
This helps the Registry API Service fall in line with all our other tools, which use a command-line script to execute the java -jar command by finding the Java command and executing the software. For example:
Given a pre-built registry-api-service application
When I perform an execution of a command-line script registry-api-service
Then I expect the registry-api-service JAR to be executed with the appropriate arguments given to the script via the command-line (as applicable)
Given a pre-built registry-api-service application
When I perform an execution of a Windows command-line batch file registry-api-service.bat
Then I expect the registry-api-service JAR to be executed with the appropriate arguments given to the script via the command-line (as applicable)
Several items need to be worked out w/ the SA's as we approach deployment of the registry at a production level:
Other items to keep in mind:
Currently the end-point /products refers to any type of product, including collections and bundles.
So the end-points /products/{lidvid}/bundles or /products/{lidvid}/collections created for ticket #29 are confusing.
We need to find a better solution for that.
Given
When I perform
Then I expect
Steps to reproduce the behavior:
Run request:
curl --location --request GET 'https://pds-gamma.jpl.nasa.gov/api/collections/urn:nasa:pds:insight_documents:document_mission::1.1/products?start=0&limit=10&fields=ops:Data_File_Info.ops:md5_checksum&only-summary=false'
The request returns an error 500
The request should return the list of products belonging to the collection urn:nasa:pds:insight_documents:document_mission::1.1
version 0.1.0 of the API
Log message is:
Run request:
2021-04-15 14:56:03.689 DEBUG 3583235 --- [nio-8081-exec-1] lasticSearchRegistrySearchRequestBuilder : Request product reference documents from 0 for size 1
2021-04-15 14:56:03.768 DEBUG 3583235 --- [nio-8081-exec-1] lasticSearchRegistrySearchRequestBuilder : search product ref request :SearchRequest{searchType=QUERY_THEN_FETCH, indices=[registry-refs], indicesOptions=IndicesOptions[ignore_unavailable=false, allow_no_indices=true, expand_wildcards_open=true, expand_wildcards_closed=false, expand_wildcards_hidden=false, allow_aliases_to_multiple_indices=true, forbid_closed_indices=true, ignore_aliases=false, ignore_throttled=true], types=[], routing='null', preference='null', requestCache=null, scroll=null, maxConcurrentShardRequests=0, batchedReduceSize=512, preFilterShardSize=null, allowPartialSearchResults=null, localClusterAlias=null, getOrCreateAbsoluteStartMillis=-1, ccsMinimizeRoundtrips=true, source={"from":0,"size":1,"query":{"match":{"collection_lidvid":{"query":"urn:nasa:pds:insight_documents:document_mission::1.1","operator":"OR","prefix_length":0,"max_expansions":50,"fuzzy_transpositions":true,"lenient":false,"zero_terms_query":"NONE","auto_generate_synonyms_phrase_query":true,"boost":1.0}}}}}
2021-04-15 14:56:04.205 ERROR 3583235 --- [nio-8081-exec-1] o.a.c.c.C.[.[.[/].[dispatcherServlet] : Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is java.lang.ClassCastException: java.lang.String cannot be cast to java.util.ArrayList] with root cause
java.lang.ClassCastException: java.lang.String cannot be cast to java.util.ArrayList
at gov.nasa.pds.api.engineering.elasticsearch.business.CollectionProductIterator.initProductIterator(CollectionProductIterator.java:143) ~[classes/:na]
at gov.nasa.pds.api.engineering.elasticsearch.business.CollectionProductIterator.<init>(CollectionProductIterator.java:43) ~[classes/:na]
at gov.nasa.pds.api.engineering.elasticsearch.business.CollectionProductRelationships.iterator(CollectionProductRelationships.java:81) ~[classes/:na]
at gov.nasa.pds.api.engineering.controllers.MyCollectionsApiController.getProductChildren(MyCollectionsApiController.java:146) ~[classes/:na]
at gov.nasa.pds.api.engineering.controllers.MyCollectionsApiController.productsOfACollection(MyCollectionsApiController.java:93) ~[classes/:na]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_271]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_271]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_271]
at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_271]
at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:190) ~[spring-web-5.2.7.RELEASE.jar:5.2.7.RELEASE]
at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:138) ~[spring-web-5.2.7.RELEASE.jar:5.2.7.RELEASE]
at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:105) ~[spring-webmvc-5.2.7.RELEASE.jar:5.2.7.RELEASE]
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:879) ~[spring-webmvc-5.2.7.RELEASE.jar:5.2.7.RELEASE]
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:793) ~[spring-webmvc-5.2.7.RELEASE.jar:5.2.7.RELEASE]
at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:87) ~[spring-webmvc-5.2.7.RELEASE.jar:5.2.7.RELEASE]
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1040) ~[spring-webmvc-5.2.7.RELEASE.jar:5.2.7.RELEASE]
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:943) ~[spring-webmvc-5.2.7.RELEASE.jar:5.2.7.RELEASE]
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1006) ~[spring-webmvc-5.2.7.RELEASE.jar:5.2.7.RELEASE]
at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:898) ~[spring-webmvc-5.2.7.RELEASE.jar:5.2.7.RELEASE]
at javax.servlet.http.HttpServlet.service(HttpServlet.java:634) ~[tomcat-embed-core-9.0.36.jar:9.0.36]
at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:883) ~[spring-webmvc-5.2.7.RELEASE.jar:5.2.7.RELEASE]
at javax.servlet.http.HttpServlet.service(HttpServlet.java:741) ~[tomcat-embed-core-9.0.36.jar:9.0.36]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231) ~[tomcat-embed-core-9.0.36.jar:9.0.36]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) ~[tomcat-embed-core-9.0.36.jar:9.0.36]
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:53) ~[tomcat-embed-websocket-9.0.36.jar:9.0.36]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) ~[tomcat-embed-core-9.0.36.jar:9.0.36]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) ~[tomcat-embed-core-9.0.36.jar:9.0.36]
at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:201) ~[spring-web-5.2.7.RELEASE.jar:5.2.7.RELEASE]
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:119) ~[spring-web-5.2.7.RELEASE.jar:5.2.7.RELEASE]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) ~[tomcat-embed-core-9.0.36.jar:9.0.36]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) ~[tomcat-embed-core-9.0.36.jar:9.0.36]
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:202) ~[tomcat-embed-core-9.0.36.jar:9.0.36]
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96) [tomcat-embed-core-9.0.36.jar:9.0.36]
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:541) [tomcat-embed-core-9.0.36.jar:9.0.36]
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:139) [tomcat-embed-core-9.0.36.jar:9.0.36]
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:92) [tomcat-embed-core-9.0.36.jar:9.0.36]
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:74) [tomcat-embed-core-9.0.36.jar:9.0.36]
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:343) [tomcat-embed-core-9.0.36.jar:9.0.36]
at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:373) [tomcat-embed-core-9.0.36.jar:9.0.36]
at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:65) [tomcat-embed-core-9.0.36.jar:9.0.36]
at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:868) [tomcat-embed-core-9.0.36.jar:9.0.36]
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1590) [tomcat-embed-core-9.0.36.jar:9.0.36]
at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49) [tomcat-embed-core-9.0.36.jar:9.0.36]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_271]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_271]
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) [tomcat-embed-core-9.0.36.jar:9.0.36]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_271]
**Applicable requirements**
...so that I am not confused, since currently it always returns empty results:
Given a bundle lidvid
When I perform {{baseUrl}}/products/{lidvid}/collections
Then I expect to get the collections of this bundle as /bundles/{lidvid}/collections does
Given a collection lidvid
When I perform {{baseUrl}}/products/{lidvid}/bundles
Then I expect to get the bundle of this collection as /collections/{lidvid}/bundles does
This question is the reason why I initially created ticket #32.
If Elasticsearch is down when the API server starts, the server starts but doesn't accept connections.
Steps to reproduce the behavior:
Server should either
API: 0.3.0; OS: Windows 10; JDK: 11
**Applicable requirements**
...so that the API's underlying data model is more readable and intuitive.
The URLs:
/products/:lidvid/collections should be /products/:lidvid/collection
/products/:lidvid/bundles should be /products/:lidvid/bundle
/collections/:lidvid/bundles should be /collections/:lidvid/bundle
Whether the response should be an array or a single element can be discussed.
If the response remains an array with a single element, we could keep the same URL endpoints with the plural.
If a product or a collection can belong to multiple collections and bundles, we should keep the URLs as-is and cancel this ticket.
Given
When I perform
Then I expect
...so that I know how to go on with the API.
For example https://pds-gamma.jpl.nasa.gov/api/ should redirect to https://pds-gamma.jpl.nasa.gov/api/swagger-ui.html
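The redirect described above could be sketched as follows. This is a minimal illustration in Python rather than the service's own Spring code; the function name `redirect_target` and the trailing-slash handling are assumptions made for the example.

```python
from typing import Optional

SWAGGER_UI_PAGE = "swagger-ui.html"  # page named in the request above


def redirect_target(base_url: str, path: str) -> Optional[str]:
    """Return the URL to redirect to, or None when no redirect applies.

    Only the API root ("" or "/") is redirected; every other path is
    left alone so existing endpoints keep working. (Hypothetical helper.)
    """
    if path in ("", "/"):
        return base_url.rstrip("/") + "/" + SWAGGER_UI_PAGE
    return None
```

In a real deployment the same rule would more likely live in the web framework or the reverse proxy configuration.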
Given
When I perform
Then I expect
We don't want to be surprised by huge costs, and we want to stay aware of dollar burn rates, so put in budget alerts at incremental 10% steps based on a $1,000/month expenditure.
Also discuss rate/egress limits with the SAs to see if they can be put in place as well.
DSIO ticket is DSIO-257
related to NASA-PDS/planetary-data-cloud#1
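To make the alert levels concrete, here is a small sketch of the arithmetic only. It does not configure any cloud budget service, and the function names are hypothetical.

```python
from typing import List, Tuple


def budget_alert_thresholds(monthly_budget: float, step_percent: int = 10) -> List[Tuple[int, float]]:
    """Return (percent, dollar amount) pairs for each alert step up to 100%."""
    if not 0 < step_percent <= 100:
        raise ValueError("step_percent must be in (0, 100]")
    return [
        (pct, monthly_budget * pct / 100)
        for pct in range(step_percent, 101, step_percent)
    ]


def burn_rate_per_day(spent_so_far: float, days_elapsed: int) -> float:
    """Average daily spend so far in the current month."""
    if days_elapsed <= 0:
        raise ValueError("days_elapsed must be positive")
    return spent_so_far / days_elapsed
```

For a $1,000/month budget this yields ten alert levels, from $100 (10%) up to $1,000 (100%).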
Use elasticsearch freetext search as configured in the registry.
Find a way to implement that in the API specification and code it.
Applicable Requirements
https://github.com/NASA-PDS/pds-api/issues/99
For example, the URL https://pds-gamma.jpl.nasa.gov/api/products?limit=100&only-summary=false
generates an error 500 and the following error in the log:
Controller : accept value is text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
2021-10-13 12:12:02.249 DEBUG 90983 --- [io-8080-exec-10] lasticSearchRegistrySearchRequestBuilder : Elasticsearch request :SearchRequest{searchType=QUERY_THEN_FETCH, indices=[registry], indicesOptions=IndicesOptions[ignore_unavailable=false, allow_no_indices=true, expand_wildcards_open=true, expand_wildcards_closed=false, expand_wildcards_hidden=false, allow_aliases_to_multiple_indices=true, forbid_closed_indices=true, ignore_aliases=false, ignore_throttled=true], types=[], routing='null', preference='null', requestCache=null, scroll=null, maxConcurrentShardRequests=0, batchedReduceSize=512, preFilterShardSize=null, allowPartialSearchResults=null, localClusterAlias=null, getOrCreateAbsoluteStartMillis=-1, ccsMinimizeRoundtrips=true, source={"from":0,"size":100,"timeout":"60s","query":{"bool":{"adjust_pure_negative":true,"boost":1.0}},"_source":{"includes":[],"excludes":["ops:Label_File_Info/ops:blob"]}}}
2021-10-13 12:12:02.255 WARN 90983 --- [io-8080-exec-10] .w.s.m.s.DefaultHandlerExceptionResolver : Resolved [org.springframework.http.converter.HttpMessageNotWritableException: No converter for [class gov.nasa.pds.model.Products] with preset Content-Type 'null']
Note that the same request works with curl:
curl 'https://pds-gamma.jpl.nasa.gov/api/products?limit=100&only-summary=false'
Use the URL in the Chrome browser.
The detailed request sent, with headers, is:
GET /products?limit=100&only-summary=false HTTP/1.1
Host: localhost:8080
Connection: keep-alive
Cache-Control: max-age=0
sec-ch-ua: "Google Chrome";v="93", " Not;A Brand";v="99", "Chromium";v="93"
sec-ch-ua-mobile: ?0
sec-ch-ua-platform: "macOS"
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Sec-Fetch-Site: none
Sec-Fetch-Mode: navigate
Sec-Fetch-User: ?1
Sec-Fetch-Dest: document
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
Cookie: _xsrf=2|bbb2fc6e|811c32a0ecb3478e4388c5c76eae2cf2|1632265199; username-localhost-8889="2|1:0|10:1632336201|23:username-localhost-8889|44:Y2RhMzc0Nzk5MzY0NDEzMGE4NjMzMTZmNTRiODRkYTc=|eca484ef060d20fb3a301672353d61ff48c12d9a17f67f27223aa576a58ec7cd"; username-localhost-8888="2|1:0|10:1632429831|23:username-localhost-8888|44:MWQ2YjkzNWNhOGRiNGI4Y2I3M2RhYmQzNTIxNWEwNmU=|729e90a370f6ca8059b5deda1de8aec888838f6bb3d7457050e88fedd668a62a"
The request should return a valid JSON response.
latest on main branch
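One way to read this report is that the browser's Accept header lists no JSON type explicitly, and only its `*/*;q=0.8` entry can match what the service produces. The sketch below shows content negotiation that falls back to application/json in that case. It is written in Python for brevity, not taken from the service; `SUPPORTED_TYPES` and all function names are assumptions, and it ignores the specificity rules of full HTTP negotiation.

```python
from typing import List, Optional, Tuple

DEFAULT_TYPE = "application/json"
# Hypothetical list of formats the service can produce, default first.
SUPPORTED_TYPES = ["application/json", "application/pds4+json"]


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept header into (media range, q) pairs, highest q first."""
    ranges = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media = pieces[0].lower()
        if not media:
            continue
        q = 1.0
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((media, q))
    # sorted() is stable, so ranges with equal q keep their header order.
    return sorted(ranges, key=lambda r: r[1], reverse=True)


def matches(media_range: str, candidate: str) -> bool:
    """True when a range such as "*/*" or "application/*" covers candidate."""
    if media_range == "*/*":
        return True
    r_type, _, r_sub = media_range.partition("/")
    c_type, _, c_sub = candidate.partition("/")
    return r_type == c_type and r_sub in ("*", c_sub)


def choose_content_type(accept: Optional[str]) -> Optional[str]:
    """Pick the response type; None means nothing acceptable (HTTP 406)."""
    if not accept or not accept.strip():
        return DEFAULT_TYPE
    for media_range, q in parse_accept(accept):
        if q <= 0:
            continue
        for supported in SUPPORTED_TYPES:
            if matches(media_range, supported):
                return supported
    return None
```

With the Chrome header from the request above, none of the explicit types match, so the `*/*` entry selects the first supported type, application/json.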
If I submit a products request and do not specify which fields to return, the blob is included in the results. If I specify one or more fields, the blob is not returned and behavior is as expected.
Steps to reproduce the behavior:
This is easiest to reproduce using the Swagger UI:
Log output shows:
2021-07-07 15:53:15.170 DEBUG 21821 --- [/O dispatcher 3] org.apache.http.wire : http-outgoing-2 >> "{"from":0,"size":1,"timeout":"60s","query":{"bool":{"adjust_pure_negative":true,"boost":1.0}},"_source":{"includes":[],"excludes":["ops:Label_File_Info/ops:blob"]}}"
Log output in this case shows:
2021-07-07 15:30:58.994 DEBUG 21821 --- [/O dispatcher 2] org.apache.http.wire : http-outgoing-1 >> "{"from":0,"size":1,"timeout":"60s","query":{"bool":{"must":[{"bool":{"should":[{"exists":{"field":"summary","boost":1.0}},{"exists":{"field":"lidvid","boost":1.0}},{"exists":{"field":"lid","boost":1.0}}],"adjust_pure_negative":true,"minimum_should_match":"1","boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}},"_source":{"includes":["summary","pds:File/pds:creation_date_time","ref_lid_instrument_host","pds:Time_Coordinates/pds:start_date_time","lid","ref_lid_investigation","lidvid","title","pds:Modification_Detail/pds:modification_date","ref_lid_instrument","pds:Time_Coordinates/pds:stop_date_time","product_class","vid","ref_lid_target","ops:Label_File_Info/ops:file_ref"],"excludes":["ops:Label_File_Info/ops:blob"]}}"
Note that this is running the service locally against the AWS ES instance (search-pds-dev-esext-kcq7xxa4lsrakjw33lywpjdyfy.us-west-2.es.amazonaws.com)
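The source does not establish why the blob still comes back when `includes` is empty, so the following is only a defensive sketch: it drops the blob field from a hit's `_source` after the search returns. The helper name is hypothetical, and it assumes the field is stored under the key that appears in the logged `excludes` list.

```python
from typing import Any, Dict

# Field name taken from the "excludes" list in the logged request.
BLOB_FIELD = "ops:Label_File_Info/ops:blob"


def strip_blob(source: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of an Elasticsearch hit's _source without the blob field."""
    return {key: value for key, value in source.items() if key != BLOB_FIELD}
```

A post-filter like this would hide the symptom on the API side; it would not explain or fix the differing Elasticsearch behavior between the two requests.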
The relationships that need to be managed are:
bundle --> collections : for a given bundle, all collections can be retrieved
collection --> products: for a given collection, all observational products can be retrieved.
Acceptance criteria:
Check that the relationship management is:
...so that when I request the collections of a bundle, for example (/bundles/{lidvid}/collections), I know whether the bundle does not exist (404) or has no collections (200, empty result).
This would apply to all crawling requests.
Given a lidvid of a product not in the registry (e.g. foobar)
When I perform a query for that bundle's collections, e.g. /bundles/foobar/collections
Then I expect a 404 error
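The distinction between "bundle not found" and "bundle has no collections" can be sketched as below. The in-memory `REGISTRY` dictionary and its lidvids are hypothetical stand-ins for the real registry lookup.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for the registry: bundle lidvid -> collection lidvids.
REGISTRY: Dict[str, List[str]] = {
    "urn:nasa:pds:example_bundle::1.0": ["urn:nasa:pds:example_bundle:data::1.0"],
    "urn:nasa:pds:empty_bundle::1.0": [],
}


def get_bundle_collections(
    lidvid: str, registry: Dict[str, List[str]] = REGISTRY
) -> Tuple[int, List[str]]:
    """Return (HTTP status, collections) for /bundles/{lidvid}/collections."""
    if lidvid not in registry:
        # The bundle itself is unknown: report 404 rather than an empty list.
        return 404, []
    # The bundle exists; an empty list is a valid 200 response.
    return 200, list(registry[lidvid])
```

The key point is that the existence check on the parent product happens before the child lookup, so an empty result is never ambiguous.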
...so that I can ensure the user does not lose attention or think the software is broken.
Given a deployed API and registry with data ingested
When I perform a query against an endpoint where the response time is >10 seconds and the max_response_time flag is not indicated.
Then I expect an error response indicating that the response time for this query exceeded the default maximum of 10 seconds. The user should narrow the query or use the max_response_time flag to increase the time.
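The acceptance criteria above could be sketched like this, using a worker thread and a timeout. It is an illustration only: the function names and the shape of the error payload are assumptions, and a real service would also need to cancel the backend query, which this sketch does not do.

```python
import concurrent.futures
import time
from typing import Any, Callable, Dict

DEFAULT_MAX_RESPONSE_TIME = 10.0  # seconds, the default named above


def run_with_limit(
    query: Callable[[], Any],
    max_response_time: float = DEFAULT_MAX_RESPONSE_TIME,
) -> Dict[str, Any]:
    """Run query() and return its result, or an error if it takes too long."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(query)
    try:
        return {"body": future.result(timeout=max_response_time)}
    except concurrent.futures.TimeoutError:
        return {
            "error": (
                f"Query exceeded max_response_time of {max_response_time} seconds; "
                "narrow the query or increase max_response_time."
            )
        }
    finally:
        # Do not block on the worker; the abandoned query keeps running.
        executor.shutdown(wait=False)


def slow_query() -> str:
    """Stand-in for a long-running search (hypothetical)."""
    time.sleep(0.3)
    return "done"
```

Calling `run_with_limit(slow_query, max_response_time=0.05)` returns the error payload, while the default limit lets the same query finish.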
max_response_time flag to the API with default 10 (seconds)
When executing the /bundles/{lidvid} API, the properties key in the JSON result contains two important values, ops:Label_File_Info.ops:file_ref and ops:Data_File_Info.ops:md5_checksum.
For example, for /bundles/urn:nasa:pds:insight_documents::2.0, once upon a time we got these properties:
ops:Label_File_Info.ops:file_ref = "https://pds-gamma.jpl.nasa.gov/data/pds4/test-data/registry/urn-nasa-pds-insight_documents/bundle_insight_documents.xml"
ops:Label_File_Info.ops:md5_checksum = "a366a14158f5a7f0dc7a1b4c06c003ae"
However, now on pds-gamma (as of 2021-06-29) these two keys have changed from strings to lists:
ops:Label_File_Info.ops:file_ref = ["https://pds-gamma.jpl.nasa.gov/data/pds4/test-data/registry/urn-nasa-pds-insight_documents/bundle_insight_documents.xml"]
ops:Label_File_Info.ops:md5_checksum = ["a366a14158f5a7f0dc7a1b4c06c003ae"]
Multiple values for the file_ref and md5_checksum don't make sense when describing a single label, but if this is the new expected correct behavior, please go ahead and triage-close this ticket immediately (i.e., wontfix resolution).
However, if this is a regression, please leave the ticket open for assignment, estimation, milestoning, etc.
To reproduce:
curl --request GET --header 'Accept: application/json' 'https://pds-gamma.jpl.nasa.gov/api/bundles/urn%3Anasa%3Apds%3Ainsight_documents%3A%3A2.0' | json_pp
Look for ops:Label_File_Info.ops:md5_checksum and ops:Label_File_Info.ops:file_ref in the output. ["…"] now appear where "…" used to be.
This behavior is passed through the PDS API Client and affects the way the PDS Deep Archive works.
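Until the expected behavior is settled, a client could tolerate both forms. The sketch below is a hypothetical client-side workaround, not part of the PDS API Client: it unwraps one-element lists for the two keys named in this ticket and rejects anything else.

```python
from typing import Any, Dict

# Keys from this ticket that describe a single label file.
SINGLE_VALUED_KEYS = (
    "ops:Label_File_Info.ops:file_ref",
    "ops:Label_File_Info.ops:md5_checksum",
)


def unwrap_single_values(properties: Dict[str, Any]) -> Dict[str, Any]:
    """Accept both the old string form and the new one-element list form."""
    normalized = dict(properties)
    for key in SINGLE_VALUED_KEYS:
        value = normalized.get(key)
        if isinstance(value, list):
            if len(value) != 1:
                raise ValueError(f"{key}: expected exactly one value, got {len(value)}")
            normalized[key] = value[0]
    return normalized
```

Raising on lists with more than one entry keeps the workaround from silently hiding a genuine data problem.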
To enable per-deployment (node) cost tracking, we are employing alpha-foxtrot billing tags. Need to add these tags to applicable resources in the terraform script. The format will include the node abbreviation.
...so that I can ensure usability of the API through rapid responses to queries
1 second is somewhat arbitrary but loosely taken from https://www.nngroup.com/articles/response-times-3-important-limits/
Other details for the requirement:
Given a deployed API and registry with 1mil+ products ingested
When I perform a request or query against any endpoint with a query of q=*
Then I expect an average response time of 1 second, regardless of the response format (e.g. pds4+json, json, etc.)
Note: per the performance note, this should be tested against all endpoints and all response formats.
Once #13 is implemented, this may just be a simple regression test we add to the repo to check this. Or we can talk to folks on the team to figure out if we know of any long-running queries that may push this. Right now, I can't think of any.
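A regression check along those lines might look like the sketch below. The `request` argument is a placeholder for whatever actually calls an endpoint. The helper names and the run count are assumptions, and the 1-second limit comes from the requirement above.

```python
import time
from statistics import mean
from typing import Callable, List

MAX_AVERAGE_SECONDS = 1.0  # target from the requirement above


def measure(request: Callable[[], object], runs: int = 5) -> List[float]:
    """Time `runs` calls of `request` and return the durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        request()
        durations.append(time.perf_counter() - start)
    return durations


def meets_target(durations: List[float], limit: float = MAX_AVERAGE_SECONDS) -> bool:
    """True when the average duration is within the limit."""
    return mean(durations) <= limit
```

In a test suite, `measure` would be called once per endpoint and response format, with `meets_target` as the assertion.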
...so that I can access various versions of the API if available.
I should also be able to access the latest API version.
A home page should propose all available versions, as well as all the submodules of the API specification (e.g. registry, doi).
Given a maintained version X.Y.Z of the API specification
When I perform a request to URL http://server/api/registry/X.Y.Z/ (TO BE REVIEWED)
Then I expect to get the SwaggerHub UI for this version of the API
Given
When I perform a request to URL http://server/api/
Then I expect to get the list of available API modules and versions
See request:
curl --location --request GET 'http://localhost:8080/products?fields=cart:Bounding_Coordinates.cart:east_bounding_coordinate' \
--header 'Accept: application/kvp+json'
The property values should be within "{content}"; the extra " characters are not necessary.
The application/xml and application/pds4+xml response formats are invalid in terms of what we want to actually return in the end.
To prevent users from testing or developing against that, let's disable these for the time being.
...so that I can feel included/welcomed in part of the development contributions.
Given
When I perform
Then I expect
Routing rules tell the AWS load balancer (listener) to which target group to send incoming requests. This also associates the target group w/ the load balancer (which is needed in order for the service to be created successfully). Add the creation of this rule to the ecs.tf terraform script.
If the API is deployed behind a proxy (e.g. Apache), the href values need to point to the proxy address and be resolvable.
I would avoid as much as possible specific configuration values which are a burden when we deploy. I think we can manage that as described here https://medium.com/@codebyamir/using-apache-as-a-reverse-proxy-for-spring-boot-embedded-tomcat-f704da73e7c8
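One configuration-free approach is to derive the public address from the forwarding headers a reverse proxy can add to each request. The sketch below illustrates the idea in Python. It assumes the proxy is set up to send `X-Forwarded-Proto`, `X-Forwarded-Host`, and (optionally) `X-Forwarded-Prefix`, and the function names and defaults are hypothetical.

```python
from typing import Mapping


def public_base_url(
    headers: Mapping[str, str],
    default_scheme: str = "http",
    default_host: str = "localhost:8080",
) -> str:
    """Build the externally visible base URL from reverse-proxy headers."""
    lowered = {name.lower(): value for name, value in headers.items()}
    # A chain of proxies may append values; the first entry is the outermost one.
    scheme = lowered.get("x-forwarded-proto", default_scheme).split(",")[0].strip()
    host = lowered.get("x-forwarded-host", lowered.get("host", default_host))
    host = host.split(",")[0].strip()
    prefix = lowered.get("x-forwarded-prefix", "").rstrip("/")
    return f"{scheme}://{host}{prefix}"


def href(headers: Mapping[str, str], path: str) -> str:
    """Absolute link to `path` as seen from outside the proxy."""
    return public_base_url(headers) + "/" + path.lstrip("/")
```

Without any forwarding headers the link falls back to the Host header, so a direct local request still produces a resolvable href.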