Comments (6)
Hi @christophfriedrich,
This is due to an issue with how our STAC API is currently implemented; we are aware of the issue and looking to fix it.
In short, all datasets that match the time/collection/bbox/etc. parameters are retrieved first, and only then does the query extension filter the result. That is why the filter doesn't affect numberMatched; however, the datasets actually returned will all correctly have eo:cloud_cover <= 10.
The best workaround for the time being would probably be to set limit=1000 (to avoid/reduce pagination) and check the length of features rather than the value of numberMatched.
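As a rough illustration of that workaround, the sketch below counts the items actually present in a search response instead of trusting the reported total. The response dict is invented for the example and only mirrors the field names mentioned in this thread (features, numberMatched); count_returned is a hypothetical helper, not part of datacube-explorer.

```python
def count_returned(feature_collection: dict) -> int:
    """Count the items actually present in a STAC search response."""
    return len(feature_collection.get("features", []))


# Hypothetical response body: the reported total ignores the query filter,
# while the features list only contains items that passed it.
response = {
    "type": "FeatureCollection",
    "numberMatched": 20,
    "features": [
        {"id": "a", "properties": {"eo:cloud_cover": 4}},
        {"id": "b", "properties": {"eo:cloud_cover": 9}},
    ],
}

print(count_returned(response), "returned vs", response["numberMatched"], "reported")
```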
from datacube-explorer.
Hi Ariana, thanks for the quick reply and explanation! Indeed, I only looked at the numberReturned/numberMatched/context section and didn't pay attention to the actual results, which do in fact fulfill the query. However, what wasn't clear to me at first from your comment is that the filtering currently takes place after pagination. So for my test case, for example, the 20 results matched initially happened to have a cloud cover between 76 and 99%, so filtering for <10% gives an empty result (and filtering for <90% gives only 3 results).
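The effect can be reproduced with a small toy model. This is only a sketch of the ordering problem, not the actual Explorer code: the dataset list is made up to resemble the test case above, and both function names are hypothetical.

```python
def paginate_then_filter(datasets, limit, max_cloud):
    """Behaviour described in this thread: cut the page first, filter second."""
    page = datasets[:limit]
    return [d for d in page if d["eo:cloud_cover"] < max_cloud]


def filter_then_paginate(datasets, limit, max_cloud):
    """Desired behaviour: filter the full match set, then cut the page."""
    matching = [d for d in datasets if d["eo:cloud_cover"] < max_cloud]
    return matching[:limit]


# Invented data: the first 20 datasets are cloudy (three below 90%),
# and five clear datasets sit beyond the first page.
covers = [76, 80, 88] + [90 + i % 10 for i in range(17)] + [0, 1, 2, 3, 4]
datasets = [{"id": i, "eo:cloud_cover": c} for i, c in enumerate(covers)]

print(len(paginate_then_filter(datasets, 20, 10)))  # empty first page
print(len(paginate_then_filter(datasets, 20, 90)))  # only the three below 90%
print(len(filter_then_paginate(datasets, 20, 10)))  # the five clear datasets
```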
That means I can't be sure that all possible results are included unless I increase the limit to the total number of datasets. I tested your suggestion of limit=1000, but that request already took 30 seconds to complete and returned some 10 MB of data, so it is absolutely not feasible for the interactive web application where I want to use this. The whole collection I'm dealing with has some 50,000 datasets, and even smaller bboxes still include 3,000 to 10,000 datasets...
It's nice to hear you're already aware of the issue and keen to fix it -- is there any timeframe for when this will be corrected? I wouldn't even care that much about the numberMatched value, but moving the filtering before the pagination is crucial, as the current implementation makes the query extension quite unusable for me and even produces answers that look wrong to the uninformed (e.g. an apparently empty result when querying for <10% even though there are datasets with <10% in the database).
Is sort applied earlier, so that I could make the least cloud-covered images come to the top? 🤔 Alternatively, the only workaround I see right now is submitting requests with e.g. limit=100 and following the next links until 10 actual results have been collected, but that is rather cumbersome...
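A minimal sketch of that pagination workaround, assuming each response carries a link with rel "next" as STAC API search responses usually do. The fetch_page callable, collect_features, and the in-memory fake_pages are all hypothetical; in a real client fetch_page would perform the HTTP request and return the parsed JSON.

```python
from typing import Callable, Optional


def next_href(page: dict) -> Optional[str]:
    """Return the href of the rel="next" link, or None on the last page."""
    for link in page.get("links", []):
        if link.get("rel") == "next":
            return link.get("href")
    return None


def collect_features(
    fetch_page: Callable[[str], dict],
    first_url: str,
    wanted: int,
    max_pages: int = 50,
) -> list:
    """Follow next links until `wanted` features are gathered or pages run out."""
    features = []
    url = first_url
    pages = 0
    while url is not None and len(features) < wanted and pages < max_pages:
        page = fetch_page(url)
        features.extend(page.get("features", []))
        url = next_href(page)
        pages += 1
    return features[:wanted]


# Fake pages standing in for server responses; the first page is empty
# because every item on it was filtered out after pagination.
fake_pages = {
    "page1": {"features": [], "links": [{"rel": "next", "href": "page2"}]},
    "page2": {"features": [{"id": "x"}], "links": [{"rel": "next", "href": "page3"}]},
    "page3": {"features": [{"id": "y"}, {"id": "z"}], "links": []},
}

found = collect_features(fake_pages.__getitem__, "page1", wanted=2)
print([f["id"] for f in found])
```

The max_pages cap is a design choice for interactive use: it bounds the number of requests when very few datasets pass the filter.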
from datacube-explorer.
Sorry, I should have been clearer in my explanation - but yes, that's the essence of the problem. If you paginate through the results you will eventually manage to retrieve all matching datasets but obviously that's not ideal. And unfortunately, all the extensions are essentially broken in the same way for the exact same reason.
Fixing this is my top priority but unfortunately I can't give you an exact timeframe as I anticipate it might require some larger changes to the underlying design/logic that I'll have to look into - but hopefully it won't take more than a week or two.
Apologies for the inconvenience and thank you for raising!
from datacube-explorer.
No worries, after I realised it I kind of read it between the lines too. And regarding the timeframe, you gave quite a good answer already: it's nice to hear that it's the next thing being worked on, and I absolutely understand that changing something like this might entail bigger changes to the architecture that are not done in a day. Better to do it thoroughly than to rush it. For me it's not super urgent, I just wanted to know whether it's likely to stay like this for months to come or be fixed soon™️. Looking forward to it! :)
from datacube-explorer.
Hi, thanks for working on this @Ariana-B -- is there a roadmap on when a release can be expected that incorporates your fixes? :)
from datacube-explorer.
Unfortunately I'm not certain - that's dependent on when @omad is able to provide a review, as well as other fixes needing to be incorporated into the next release.
from datacube-explorer.