Comments (13)
@jainankitk Yes opened this PR in Lucene
from opensearch.
Thanks for filing this issue @ChapterSevenSeeds
I am able to reproduce the issue with the steps you mentioned. Will take a stab at it and update as I get time.
from opensearch.
Thanks @expani . Removing untriaged label.
from opensearch.
The match rule
"match": {
"query": "a"
}
gets converted into an empty query ( No Match ) due to the custom analyser removing stop words.
The final IntervalsSource created for the IntervalQuery here is a DisjunctionIntervalsSource and looks like below
or(
MAXGAPS/30(ORDERED(b,sister)),
MAXGAPS/30(ORDERED(no_match,sister))
)
OpenSearch represents the IntervalsIterator for no_match
using NO_INTERVALS which represents un-positioned docId using NO_MORE_DOCS -> 2147483647
here
The intervals iterator for the rule
"match": {
"query": "sister"
}
is a based on PostingsEnum which represents an un-positioned docId using -1
When both these iterators are matched, they differ which leads to the error reported being thrown here
Lucene also represents an IntervalsIterator for no_match
called NO_INTERVALS
but it represents un-positioned docIds using -1
Reference This was fixed a few years back in Lucene by this PR
I made the same changes to the NO_INTERVALS
iterator used by OpenSearch and the query mentioned above worked as expected.
The diff is
diff --git a/server/src/main/java/org/opensearch/index/query/IntervalBuilder.java b/server/src/main/java/org/opensearch/index/query/IntervalBui
lder.java
index 0e42e79f67d..fa21e6fbd82 100644
--- a/server/src/main/java/org/opensearch/index/query/IntervalBuilder.java
+++ b/server/src/main/java/org/opensearch/index/query/IntervalBuilder.java
@@ -266,6 +266,8 @@ public class IntervalBuilder {
@Override
public IntervalIterator intervals(String field, LeafReaderContext ctx) {
return new IntervalIterator() {
+
+ boolean exhausted = false;
@Override
public int start() {
return NO_MORE_INTERVALS;
@@ -293,16 +295,18 @@ public class IntervalBuilder {
@Override
public int docID() {
- return NO_MORE_DOCS;
+ return exhausted ? NO_MORE_DOCS : -1;
}
@Override
public int nextDoc() {
+ exhausted = true;
return NO_MORE_DOCS;
}
@Override
public int advance(int target) {
+ exhausted = true;
return NO_MORE_DOCS;
We should make the NO_INTERVALS used by Lucene public and use the same in OpenSearch instead of creating a copy to avoid any such issues in future.
from opensearch.
@expani - Thank you root causing this issue. Completely agree on reusing the NO_INTERVALS
iterator in Lucene instead of duplicating the code in OpenSearch. Have you created issue/PR with lucene for addressing this?
from opensearch.
@jainankitk Yes opened this apache/lucene#13385 in Lucene
Thank you for the lucene PR!
from opensearch.
This problem is often seen with patent search. I opened apache/lucene#13388
There are more issues, but making that constant publicly available is wrong. There should be a method to create an empty interval with a reason.
from opensearch.
P.S.: I tend to revert above PR. It does not fit the Lucene API patterns.
from opensearch.
P.S.: The workaround that can be used already in OpenSearch code is the following factory method: Intervals.atLeast(1)
This requires at least one match for an empty list of intervals, so it rewrites to a NoMatchIntervalsSource
which is a duplicate of above constant. The given issue will also merge those implementations.
from opensearch.
Agree on putting this behind an API.
Returning a null
iterator might not be properly checked at all places of usage so having NO_INTERVALS
iterator returning NO_MORE_DOCS
seems more safe of a change.
from opensearch.
I made a comment because of that in the PR. We need more tests, I agree.
Up to now this atLeast(1) has always worked for me.
Null is always used in lucene for iterators that fo not match anything.
from opensearch.
See here for DoxIdSetIterator and ScorerSupplier: https://github.com/apache/lucene/blob/22d50be2eab84c9a75ea65f55fcc4356724faba1/lucene/core/src/java/org/apache/lucene/search/MatchNoDocsQuery.java#L47
from opensearch.
Yes returning null worked for this case in OpenSearch as well, but NO_INTERVAL
is being used extensively across Lucene and not all the usage seem to be covered by UTs.
I added a comment to support both style of iterators null
and NO_INTERVAL
in the PR to make sure the changes are safe.
from opensearch.
Related Issues (20)
- [Feature Request] Add ability to configure maxExpansions parameter for Intervals query HOT 3
- [BUG] "." as field name yields array_index_out_of_bounds_exception HOT 1
- [Feature Request] Create Higher-Level APIs for Plugins to switch contexts HOT 7
- [Feature Request] Build Grok Search Response Processor
- [BUG] New AllocationStatus during Allocator timeout HOT 1
- [BUG] Validate lower limit for Allocator Timeout HOT 1
- Optimise the cluster manager writeTo/readFrom transport call HOT 1
- [BUG] Optimise unassigned shards iteration after allocator timeout HOT 1
- [BUG] Schedule reroute after allocator timeout HOT 1
- [BUG] Expensive setting parsing in ShardsLimitAllocationDecider HOT 2
- [BUG] the aggregation result in float field type is inaccurate HOT 4
- [Feature Request] document `max_input_length` for `completion` type fields
- [BUG] MX bean missing: G1 Concurrent GC error message HOT 2
- [Feature Request] Monitor the pending task count In netty worker thread pool HOT 1
- Add handling for master switch, dangling indices, master reload in the Tiering Service
- Introduce retrying mechanism for failed process in tiering
- [BUG in multiple plugins] Using CreateIndexRequest.mapping() without _doc wrapper fails with v1 index templates HOT 12
- [Feature Request] extend `constant_keyword` field type usage scope HOT 1
- [Proposal] Tiering Status API Model and Design HOT 7
- [Feature Request] Make remote snapshot local file_cache block size configurable HOT 4
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from opensearch.