jurismarches / luqum
A lucene query parser generating ElasticSearch queries and more!
License: Other
It would be great if words like now, now-5h, now-1h could be used in dates without being treated as simple strings. Parsing a date range like [now-1h TO now] would add more meaning to luqum. These human-readable queries are supported in Elasticsearch; see the "Common options" documentation.
P.S. When I checked, this feature was not implemented. Kindly reply if it is already there.
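As a sketch of what recognizing such values could look like (this is not luqum's API; DATE_MATH_RE and is_date_math are hypothetical helpers), range bounds could be classified before query generation so that date-math strings are passed through to Elasticsearch untouched:

```python
import re

# Hypothetical recognizer for Elasticsearch date-math expressions such as
# "now", "now-5h" or "now-1h/d". It only classifies a value; it does not
# evaluate it (Elasticsearch does that server-side).
DATE_MATH_RE = re.compile(
    r"^now"                     # anchor: the current time
    r"(?:[+-]\d+[yMwdhHms])*"   # optional offsets, e.g. -1h, +2d
    r"(?:/[yMwdhHms])?$"        # optional rounding, e.g. /d
)

def is_date_math(value: str) -> bool:
    """Return True when value looks like an ES date-math expression."""
    return bool(DATE_MATH_RE.match(value))
```

A query builder could use such a check to emit the bound verbatim instead of quoting it as a plain string.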
Hi, is there a way to go in the opposite direction, i.e.:
Elasticsearch -> Lucene Query DSL
print should not be used in production code, as it is below (lines 181 to 184 in 55c9cdc); an exception should be raised instead.
There isn't even a good reason to silently skip certain characters, e.g. the forward slash. Why can't it be processed? It should be a valid character.
Hello,
I see strange behavior with full-text search. The parser converts a search for test to something like:
{"query": {"match": {"*": {"query": "test", "zero_terms_query": "none"}}}}
which does not return any results.
With a trailing '*', the query is instead converted to:
{"query": {"query_string": {"query": "test*", "default_field": "*", "analyze_wildcard": true, "allow_leading_wildcard": true}}}
which correctly returns the proper results.
Is there a way to get the latter behavior for a generic search without a trailing *? For example, converting to query_string instead of match when the field (or default_field) is *.
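One possible stopgap, pending support in the builder itself, is to post-process the generated dict (star_match_to_query_string is a hypothetical helper, not a luqum function):

```python
# Rewrite a generated {"match": {"*": ...}} clause into a query_string query,
# which is what Elasticsearch expects when searching across all fields.
# This is a sketch of a post-processing step, not part of luqum.
def star_match_to_query_string(query: dict) -> dict:
    match = query.get("query", {}).get("match", {})
    if "*" in match:
        return {
            "query": {
                "query_string": {
                    "query": match["*"]["query"],
                    "default_field": "*",
                }
            }
        }
    return query

q = {"query": {"match": {"*": {"query": "test", "zero_terms_query": "none"}}}}
```

Queries on a concrete field pass through unchanged, so the helper can be applied unconditionally.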
It would be great to make a sample database, load it into Elastic, and then exercise the different features against it.
This could help verify which versions of ES we are compatible with.
Hey guys!
I'd like to know: is there any reason spaces were removed from the string representation?
Line 398 in 48869be
Here's the code that I'm running and facing the issue:
from luqum.tree import SearchField, AndOperation, Word
a = AndOperation(SearchField('po_cancelled', Word('false')), SearchField('deleted', Word('false')))
Previously I had
str(a) == 'po_cancelled:false AND deleted:false'
After the 0.10.0 release I have
str(a) == 'po_cancelled:falseANDdeleted:false'
which breaks further code execution, since the syntax is wrong.
Could you please suggest another approach to build a query so my code runs as before, or add the spaces back?
Thank you for the wonderful library.
I have queries that have some field expressions with double quotes. That is, something like the following:
field_name:""expression text""
When these are parsed, they confuse the parser: the initial pair of double quotes gets treated as an empty Phrase, and the rest of the expression ends up in an unknown operation.
Here is a sample query and the parsing operation in python:
from luqum.parser import parser
query = 'field_name:""Field Text"" OR field_name:text AND field_name:"more text"'
parser.parse(query)
Here is the current output:
UnknownOperation(SearchField('field_name', Phrase('""')), Word('Field'), OrOperation(Word('Text""'), AndOperation(SearchField('field_name', Word('text')), SearchField('field_name', Phrase('"more text"')))))
This is what the expected output of the parsing operation would look like:
OrOperation(SearchField('field_name', Phrase('""Field Text""')), AndOperation(SearchField('field_name', Word('text')), SearchField('field_name', Phrase('"more text"'))))
Any thoughts on this? The doubled quotes (as opposed to single ones) do have a distinct meaning in our case, hence why I am asking.
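As a workaround until the parser handles this, the doubled quotes could be normalized before parsing (collapse_double_quotes is a hypothetical helper; it assumes the doubled quotes always wrap one phrase with no embedded quotes):

```python
import re

# Collapse ""expression text"" into "expression text" before handing the
# string to the parser, since doubled quotes are not standard Lucene syntax.
# Sketch only; not a luqum feature.
def collapse_double_quotes(query: str) -> str:
    return re.sub(r'""([^"]*)""', r'"\1"', query)
```

Ordinary single-quoted-pair phrases like "more text" are left untouched, so the preprocessing is safe to run on mixed queries of this shape.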
How is foo NOT bar different from foo -bar? Should luqum process them identically? If not, why not?
There's a two-symbol query which breaks the parser's error handling, so we can't get relevant information about the syntax error.
How to reproduce:
from luqum.parser import parser
parser.parse('~]')
Expected result: ParseSyntaxError
Actual result: TypeError: __str__ returned non-string (type NoneType)
Possible fix: update the TokenValue.__str__ method to always return a string, i.e.:
class TokenValue:
    # ...
    def __str__(self):
        return str(self.value)
We have a project which uses the Apache 2 license. We would like to leverage this library, but the license is incompatible (https://apache.org/legal/resolved.html). Would it be possible to dual-license the project?
Short version: I would like to parse a b OR c d as a AND (b OR c) AND d.
Long version: I'm attempting to simplify the syntax a little before I integrate this into my code, as it's supposed to be an easy to use search with optional advanced features. I gave up coding it myself and found this library which seems nice.
I'd like it to work like Google, where terms are "AND" by default, but providing "OR" compares the two closest values. I've set UnknownOperationResolver to AND and tried changing the order of parser.precedence, but had no luck.
Here's an example of a query I'd like to use:
Working syntax: (either page 1 or page 2, has either "a" or both "b" and "c", title is not sometitle)
(page:page1 OR page:page2) AND (a OR b AND c) AND -title:sometitle
Wanted syntax:
page:page1 OR page:page2 a OR (b c) -title:sometitle
For the record, we're still stuck on Python 2.7 for another year or two, so as I've already had to fix the yield from
lines, I'm not against tweaking other bits of the code if needed.
When parsing an invalid query like a^, the parser fails with TypeError: conversion from NoneType to Decimal is not supported
Traceback (most recent call last):
  File "scratches/scratch1.py", line 12, in <module>
    qb = parser.parse('a^')
  File "lib/python3.8/site-packages/ply/yacc.py", line 333, in parse
    return self.parseopt_notrack(input, lexer, debug, tracking, tokenfunc)
  File "lib/python3.8/site-packages/ply/yacc.py", line 1120, in parseopt_notrack
    p.callable(pslice)
  File "lib/python3.8/site-packages/luqum/parser.py", line 316, in p_boosting
    p[0] = Boost(p[1], p[2].value)
  File "lib/python3.8/site-packages/luqum/tree.py", line 374, in __init__
    self.force = Decimal(force).normalize()
TypeError: conversion from NoneType to Decimal is not supported
Expected result: luqum.exceptions.ParseSyntaxError: Syntax error in input ...
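A possible defensive fix, sketched outside luqum's real code (ParseSyntaxError here is a stand-in exception, and parse_boost_force only mimics what the Boost constructor receives), is to validate the boost value before the Decimal conversion:

```python
from decimal import Decimal, InvalidOperation

# Stand-in for luqum.exceptions.ParseSyntaxError, to keep the sketch
# self-contained.
class ParseSyntaxError(ValueError):
    pass

def parse_boost_force(force) -> Decimal:
    """Convert a boost value, raising a syntax error instead of TypeError
    when the value is missing (e.g. a dangling '^')."""
    if force is None:
        raise ParseSyntaxError("missing boost value after '^'")
    try:
        return Decimal(force).normalize()
    except InvalidOperation:
        raise ParseSyntaxError("invalid boost value %r" % (force,))
```

The same guard placed in the Boost constructor (or in p_boosting) would turn the TypeError into the expected ParseSyntaxError.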
The latest ES6 supports IPv6. When I tried the following query, luqum was unable to parse it properly:
srcIp: 1::1
Any suggestions?
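A possible workaround, assuming the value is known before parsing, is to escape the colons inside the IPv6 literal using Lucene's backslash escape syntax (escape_value is a hypothetical helper, not part of luqum):

```python
# Build a field query with the colons of the value escaped, so the Lucene
# syntax stays unambiguous: srcIp:1\:\:1 instead of srcIp:1::1.
def escape_value(field: str, value: str) -> str:
    return "%s:%s" % (field, value.replace(":", r"\:"))
```

Whether the parser then keeps the escaped value as a single Word would still need to be verified against the luqum version in use.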
There is a regression since 0.7.x where zero_terms_query: none is appended to the generated query for match_phrase.
For the query some_id: hello-world, 0.6.1 generated:
{'match_phrase': {'some_id': {'query': 'hello-world'}}}
while 0.7.1 generates:
{'match_phrase': {'participant_id': {'query': 'LB-S00133', 'zero_terms_query': 'none'}}}
Note that the 0.7.1 query fails on Elasticsearch 5.6.8 with: TransportError(400, 'parsing_exception', '[match_phrase] query does not support [zero_terms_query]')
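As a stopgap while staying on 0.7.x with an older Elasticsearch, the generated dict could be post-processed to drop the option (strip_zero_terms_phrase is a sketch, not a luqum feature):

```python
# Recursively remove the zero_terms_query option from every match_phrase
# clause so older Elasticsearch versions that reject it keep working.
# Plain dict/list walking; no luqum API involved.
def strip_zero_terms_phrase(node):
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "match_phrase" and isinstance(value, dict):
                for params in value.values():
                    if isinstance(params, dict):
                        params.pop("zero_terms_query", None)
            else:
                strip_zero_terms_phrase(value)
    elif isinstance(node, list):
        for item in node:
            strip_zero_terms_phrase(item)
    return node
```

It mutates the query in place and also handles match_phrase clauses nested inside bool/must/should lists.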
https://lucene.apache.org/solr/guide/6_6/other-parsers.html#OtherParsers-SurroundQueryParser
Is the Surround Query Parser supported? Something like:
3w(foo, bar)
or
(spot prices) 3w (gulf coast)
The install instructions refer to cookiecutter, and come from here: d0bf7d7
from luqum.elasticsearch.visitor import ElasticsearchQueryBuilder
from luqum.parser import parser
ElasticsearchQueryBuilder().visit(parser.parse('""'))
Hi Alex,
I just want to let you know that the .tar.gz file at PyPi has incorrect structure (the wheel is ok, though).
It looks like:
root@cis-hub:d327478a510f3# tar tzf luqum-0.10.0.linux-x86_64.tar.gz
./
./home/
./home/alex/
./home/alex/projets/
./home/alex/projets/luqum/
./home/alex/projets/luqum/venv/
./home/alex/projets/luqum/venv/lib/
./home/alex/projets/luqum/venv/lib/python3.8/
./home/alex/projets/luqum/venv/lib/python3.8/site-packages/
./home/alex/projets/luqum/venv/lib/python3.8/site-packages/luqum/
./home/alex/projets/luqum/venv/lib/python3.8/site-packages/luqum/__init__.py
./home/alex/projets/luqum/venv/lib/python3.8/site-packages/luqum/__pycache__/
./home/alex/projets/luqum/venv/lib/python3.8/site-packages/luqum/__pycache__/__init__.cpython-38.pyc
./home/alex/projets/luqum/venv/lib/python3.8/site-packages/luqum/__pycache__/auto_head_tail.cpython-38.pyc
./home/alex/projets/luqum/venv/lib/python3.8/site-packages/luqum/__pycache__/check.cpython-38.pyc
./home/alex/projets/luqum/venv/lib/python3.8/site-packages/luqum/__pycache__/deprecated_utils.cpython-38.pyc
./home/alex/projets/luqum/venv/lib/python3.8/site-packages/luqum/__pycache__/exceptions.cpython-38.pyc
./home/alex/projets/luqum/venv/lib/python3.8/site-packages/luqum/__pycache__/head_tail.cpython-38.pyc
./home/alex/projets/luqum/venv/lib/python3.8/site-packages/luqum/__pycache__/naming.cpython-38.pyc
./home/alex/projets/luqum/venv/lib/python3.8/site-packages/luqum/__pycache__/parser.cpython-38.pyc
./home/alex/projets/luqum/venv/lib/python3.8/site-packages/luqum/__pycache__/parsetab.cpython-38.pyc
./home/alex/projets/luqum/venv/lib/python3.8/site-packages/luqum/__pycache__/pretty.cpython-38.pyc
./home/alex/projets/luqum/venv/lib/python3.8/site-packages/luqum/__pycache__/tests.cpython-38.pyc
./home/alex/projets/luqum/venv/lib/python3.8/site-packages/luqum/__pycache__/tree.cpython-38.pyc
./home/alex/projets/luqum/venv/lib/python3.8/site-packages/luqum/__pycache__/utils.cpython-38.pyc
./home/alex/projets/luqum/venv/lib/python3.8/site-packages/luqum/__pycache__/visitor.cpython-38.pyc
./home/alex/projets/luqum/venv/lib/python3.8/site-packages/luqum/auto_head_tail.py
./home/alex/projets/luqum/venv/lib/python3.8/site-packages/luqum/check.py
./home/alex/projets/luqum/venv/lib/python3.8/site-packages/luqum/deprecated_utils.py
./home/alex/projets/luqum/venv/lib/python3.8/site-packages/luqum/elasticsearch/
./home/alex/projets/luqum/venv/lib/python3.8/site-packages/luqum/elasticsearch/__init__.py
... etc
Thanks for your work,
Mirek
I need to customize the behavior of some elements inside ElasticsearchQueryBuilder. Sadly, it is not currently possible to just swap out these elements without overriding every method that uses them.
Could you consider making these elements attributes of the class so that we can override them more easily, i.e. something like:
class ElasticsearchQueryBuilder(LuceneTreeVisitorV2):
    E_MUST = EMust
    E_SHOULD = EShould
    [...]
I can make a PR if you agree with the idea.
Any plans to add support for the regexp search operator ("/")?
Example from ElasticSearch documentation:
name: /joh?n(ath[oa]n)/
When calling parser.parse on "field_42:42", the parser will not break it into a search field and a term; it parses the entire string as a single Word.
The root cause of this is the TERM regex (a character class excluding whitespace and the special characters : ^ , " ' + ~ - ( ) [ ] { } *), which can't break the string into groups.
Hi,
Does anyone know of a module for applying Lucene-DSL queries to a Python dictionary? Something like an in-memory Elastic implementation in Python.
Thanks.
Luqum is mostly working perfectly for me, but I've just hit a bit of a snag. I have a double SearchField to perform a more advanced query, but I've just realised it won't work with spaces.
>>> field:value1:"value 2"
SearchField('field', SearchField('value1', Phrase('"value 2"')))
>>> field:"value 1":"value 2"
luqum.parser.ParseError: Syntax error in input at LexToken(COLUMN,':',1,24)!
Is this a limitation of yacc, or is there a way I could get this working? In the meantime, I've used the following to convert "value 1" to value 1; it's awfully messy though.
# This will convert 'field:"value 1":"value 2"' to 'field:value 1:"value 2"'
# It will need to be decoded again before being used
offset = 0
while True:
    try:
        index = value[offset:].index(':"') + 1
    except ValueError:
        break
    offset += index
    end = False
    for i, c in enumerate(value[offset:]):
        if not end:
            if i and c == '"':
                end = True
            elif c == ' ':
                break
        elif c == ':':
            word = value[offset:offset+i]
            new_word = word[1:-1].replace(' ', ' ')
            value = value[:offset] + new_word + value[offset+i:]
            offset += len(new_word) - len(word)
            break
Thank you very much, you have created a really amazing library. 👍🏻
I have come across a special case. I have keyword fields that contain wildcard characters (* or ?). In Elasticsearch this is no problem at all. But it seems luqum has some difficulties with this use case.
Here is an example of indexing a document with a keyword field containing wildcard characters using ES.
from elasticsearch import Elasticsearch
es = Elasticsearch(hosts="http://localhost:9200")
mappings = {"properties":{"vendor":{"type":"keyword"}}}
es.indices.create(index="test", mappings=mappings)
es.index(index="test", body={"vendor": "f**k"}, id="example")
Now I want to search for the field. The following works, but is not what I want, because it does a wildcard search and not an exact term search.
es.search(body={
"query": {
"query_string": {
"query": "vendor:f**k"
}
}
}, index="test")
{'took': 2,
'timed_out': False,
'_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
'hits': {'total': {'value': 1, 'relation': 'eq'},
'max_score': 1.0,
'hits': [{'_index': 'test',
'_id': 'example',
'_score': 1.0,
'_source': {'vendor': 'f**k'}}]}}
(1) To search exactly, you have to escape the wildcard characters. This works in ES.
es.search(body={
"query": {
"query_string": {
"query": "vendor:f\*\*k"
}
}
}, index="test")
{'took': 1,
'timed_out': False,
'_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
'hits': {'total': {'value': 1, 'relation': 'eq'},
'max_score': 0.2876821,
'hits': [{'_index': 'test',
'_id': 'example',
'_score': 0.2876821,
'_source': {'vendor': 'f**k'}}]}}
(2) Alternatively you can also use a phrase query. This works in ES.
es.search(body={
"query": {
"query_string": {
"query": 'vendor:"f\*\*k"'
}
}
}, index="test")
{'took': 1,
'timed_out': False,
'_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
'hits': {'total': {'value': 1, 'relation': 'eq'},
'max_score': 0.2876821,
'hits': [{'_index': 'test',
'_id': 'example',
'_score': 0.2876821,
'_source': {'vendor': 'f**k'}}]}}
Now when I try both (1) and (2) with luqum, it doesn't seem to work.
from luqum.elasticsearch import SchemaAnalyzer, ElasticsearchQueryBuilder
schema_analyzer = SchemaAnalyzer({"mappings": mappings})
es_builder = ElasticsearchQueryBuilder(**schema_analyzer.query_builder_options())
(1) Luqum creates a wildcard query when the "*" characters are escaped. This behaviour is different from ES and not what I expected. Apparently the escape characters are not removed either.
from luqum.parser import parser
es_builder(parser.parse("vendor:f\*\*k"))
{'wildcard': {'vendor': {'value': 'f\\*\\*k'}}}
(2) Luqum creates a wildcard query when the search term is entered as a phrase. This behaviour is also different from ES and not what I expected.
from luqum.parser import parser
es_builder(parser.parse('vendor:"f**k"'))
{'wildcard': {'vendor': {'value': 'f**k'}}}
Somehow I don't see any possibilities to formulate a query string in such a way that a term with "*" can be searched for exactly.
Regards, André
I have a double-nested schema given here:
https://gist.github.com/seandavi/528e98e943b24a7ef365fbf1e937f5ba
It seems that the top-level path is dropped by some methods of the SchemaAnalyzer
when I use this schema.
import json
m2 = json.load(open('MAPPING_FILE.json'))
m3 = luqum.elasticsearch.SchemaAnalyzer({"mappings" : m2['sra_experiment_joined2']['mappings']['doc']['properties']})
And the output of methods:
m3.nested_fields()
Out[900]:
{'attributes': {'tag': {}, 'value': {}},
'identifiers': {'id': {}, 'namespace': {}, 'uuid': {}},
'reads': {'base_coord': {},
'read_class': {},
'read_index': {},
'read_type': {}},
'xrefs': {'db': {}, 'id': {}}}
The problem in this schema, then, is that the nested field names are repeated without their parent prefix: the resulting list is too short and the paths are incomplete.
And sub_fields:
list(m3.sub_fields())
Out[908]:
['tag.keyword',
'value.keyword',
'id.keyword',
'namespace.keyword',
'uuid.keyword',
'Status.keyword',
'accession.keyword',
'alias.keyword',
'attributes.tag.keyword',
'attributes.value.keyword',
'broker_name.keyword',
'center_name.keyword',
'experiment_accession.keyword',
'identifiers.id.keyword',
'identifiers.namespace.keyword',
'identifiers.uuid.keyword',
'reads.read_class.keyword',
'reads.read_type.keyword',
'run_accession.keyword',
'run_center.keyword',
'BioSample.keyword',
'GEO.keyword',
'Status.keyword',
'accession.keyword',
'alias.keyword',
'attributes.tag.keyword',
'attributes.value.keyword',
'broker_name.keyword',
'center_name.keyword',
'description.keyword',
'identifiers.id.keyword',
'identifiers.namespace.keyword',
'numeric_properties.property_id.keyword',
'numeric_properties.unit_id.keyword',
'ontology_terms.keyword',
'organism.keyword',
'sample_type.keyword',
'title.keyword',
'xrefs.db.keyword',
'xrefs.id.keyword',
'BioProject.keyword',
'GEO.keyword',
'Status.keyword',
'abstract.keyword',
'accession.keyword',
'alias.keyword',
'attributes.tag.keyword',
'attributes.value.keyword',
'broker_name.keyword',
'center_name.keyword',
'description.keyword',
'identifiers.id.keyword',
'identifiers.namespace.keyword',
'study_accession.keyword',
'study_type.keyword',
'title.keyword',
'xrefs.db.keyword',
'xrefs.id.keyword',
'db.keyword',
'id.keyword']
Again, note that the field names are missing the parent in the name.
It is quite possible that I am misusing the SchemaAnalyzer
, so any thoughts you have are appreciated.
I don't know much about yacc
, but I've been putting up with this warning each time I use luqum
(I presume you already know it):
WARNING: Token 'SEPARATOR' defined, but not used
WARNING: There is 1 unused token
According to this Stack Overflow question, someone with a similar issue simply removed the token from the list and it was fine.
Doing that fixed the warning for me, so I'm wondering if there's a particular reason you are keeping the SEPARATOR token? If it's genuinely not used, shouldn't it be removed from the source code?
The following should be a valid lucene syntax:
name:bob city:nyc
it should be equivalent to:
name:bob OR city:nyc
But I get UnknownOperation instead of OrOperation when parsing the string.
Is it possible to force having a minimum_should_match
value on any bool
values when an OR
operation is present?
Currently I have something like:
search_content = "(a OR b) AND (b OR c)"
tree = parser.parse(search_content)
query = ES_BUILDER(tree)
And the query yields:
{
"bool": {
"must": [
{
"bool": {
"should": [
{
"match": {
"content": {
"query": "a",
"zero_terms_query": "none"
}
}
},
{
"match": {
"content": {
"query": "b",
"zero_terms_query": "none"
}
}
}
]
}
},
{
"bool": {
"should": [
{
"match": {
"content": {
"query": "b",
"zero_terms_query": "none"
}
}
},
{
"match": {
"content": {
"query": "c",
"zero_terms_query": "none"
}
}
}
]
}
}
]
}
}
Whereas I'd like it to be:
{
"bool": {
"must": [
{
"bool": {
"minimum_should_match": 1,
"should": [
{
"match": {
"content": {
"query": "a",
"zero_terms_query": "none"
}
}
},
{
"match": {
"content": {
"query": "b",
"zero_terms_query": "none"
}
}
}
]
}
},
{
"bool": {
"minimum_should_match": 1,
"should": [
{
"match": {
"content": {
"query": "b",
"zero_terms_query": "none"
}
}
},
{
"match": {
"content": {
"query": "c",
"zero_terms_query": "none"
}
}
}
]
}
}
]
}
}
But perhaps I'm doing something wrong?
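In the meantime, one workaround sketch (not a builder option as far as I know) is to walk the generated query and add minimum_should_match to every bool that carries a should clause:

```python
# Post-process a generated Elasticsearch query dict: every bool clause that
# has a "should" list gets minimum_should_match set (in place).
def force_minimum_should_match(node, minimum=1):
    if isinstance(node, dict):
        bool_clause = node.get("bool")
        if isinstance(bool_clause, dict) and "should" in bool_clause:
            bool_clause.setdefault("minimum_should_match", minimum)
        for value in node.values():
            force_minimum_should_match(value, minimum)
    elif isinstance(node, list):
        for item in node:
            force_minimum_should_match(item, minimum)
    return node
```

Applied to the query above, both inner should-bools would gain "minimum_should_match": 1 while everything else stays untouched.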
I see on the documentation that we can manipulate the parsed tree in order to change the value of a field or expression. But is it possible to manipulate the tree in order to also append an extra element?
Example, manipulate:
dog: "Max" AND color: "brown"
into:
(name:"Max" AND animal:"dog") AND color: "brown"
I can already convert "dog" into "name" by using the LuceneTreeTransformer example in the documentation, but what about adding new nodes? Is it possible? If so, can anyone share a simple example?
Thanks
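To illustrate the idea of returning a replacement subtree, here is a toy sketch with stand-in classes (these are NOT luqum's real luqum.tree nodes; with luqum, the equivalent would be a LuceneTreeTransformer whose visit method returns the new node in place of the visited one):

```python
from dataclasses import dataclass

# Toy stand-ins for tree nodes, just to show replacing one node with a
# subtree that still contains its information.
@dataclass
class Field:
    name: str
    value: str

@dataclass
class And:
    children: tuple

def expand_dog(node):
    """Rewrite dog:"Max" into (name:"Max" AND animal:"dog"),
    leaving every other node untouched."""
    if isinstance(node, Field) and node.name == "dog":
        return And((Field("name", node.value), Field("animal", '"dog"')))
    if isinstance(node, And):
        return And(tuple(expand_dog(child) for child in node.children))
    return node

tree = And((Field("dog", '"Max"'), Field("color", '"brown"')))
new_tree = expand_dog(tree)
```

The key point is that the transformer builds and returns a new subtree wrapping the original data, rather than mutating the node in place; luqum's transformer supports the same pattern.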
Hi everyone,
In the latest version, 0.9.0, I think there is a problem with the required dependency elasticsearch-dsl. When installing this package with pip, an import error occurs due to the missing dependency.
I think it needs to be added to requirements.txt and setup.py.
Hello!
First of all I'd like to thank the authors of this amazing library for all the effort - this library really helps when it comes to querying nested fields with ES' Query String Query.
I've found some issue concerning the parser when it comes to parsing multi-level nested fields (nested fields within nested fields).
Here's my definition of ElasticsearchQueryBuilder
:
es_builder = ElasticsearchQueryBuilder(nested_fields={
    "study_units": {
        "country": {"id": {}, "name": {}}
    }
})
The query looks like this: study_units.country.name:italy
The produced output looks like this:
{
'nested': {
'query': {
'nested': {
'query': {
'match': {
'study_units.country.name': {
'type': 'phrase',
'query': 'italy',
'zero_terms_query': 'none'
}
}
},
'path': 'study_units.country'
}
},
'path': 'name'
}
}
I think that the outermost path key should have the value study_units instead of name.
My question - is this a problem with my definition of nested fields, or something is off here?
The parser doesn't handle escaped characters.
See: https://lucene.apache.org/core/3_6_0/queryparsersyntax.html#Escaping%20Special%20Characters
Hello!
As mentioned in #15, I'd like to propose a solution for handling elasticsearch-dsl's Object
fields (or ES objects in general). During the flattening of nested objects in Elasticsearch, fields in form of:
country = Object(properties={
"id": Integer(),
"name": Text()
})
are transformed into a flat structure, as can be seen here: link
Due to the '.' in node.name check present here:
luqum/luqum/elasticsearch/visitor.py
Lines 211 to 213 in a8002c2
parsed queries of the form country.name:Italy will be transformed into a nested query, which will cause hiccups in Elasticsearch.
Now, country.name seems to be parsed correctly by YACC, as the dot isn't a special char. It's the check in the mentioned file that causes the problem for me.
What have I done - I've subclassed ElasticsearchQueryBuilder as follows:
class ElasticsearchDotAwareQueryBuilder(ElasticsearchQueryBuilder):
    def _is_nested(self, node):
        for child in node.children:
            if isinstance(child, SearchField):
                return True
            elif self._is_nested(child):
                return True
        return False
And now I have both the : syntax available for nested fields and the . syntax for flat objects.
I didn't have enough time to dive deeply, but I think it shouldn't break other functionality. My question is: why was the check for dot presence there in the first place? Is it just an alternative to :, introduced not at the parsing level but later in the query-building process, or am I missing something?
We have a Lucene query with the syntax like below:
(state: "Completed" OR "Cancelled") AND (segment: "total" OR "cancelled") AND NOT (comment:"This is a sample")
The above Lucene query expects the events having values for:
However, the DSL query formed using the luqum module for the above lucene query is as follows:
{'query': {'bool': {'must': [{'bool': {'should': [{'match_phrase': {'state': {'query': 'Completed'}}}, {'match_phrase': {'text': {'query': 'Cancelled'}}}]}}, {'bool': {'should': [{'match_phrase': {'segment': {'query': 'total'}}}, {'match_phrase': {'text': {'query': 'cancelled'}}}]}}, {'bool': {'must_not': [{'match_phrase': {'comment': {'query': 'This is a sample'}}}]}}]}}}
It can be seen in the above DSL that the 'state' field now only expects 'Completed'; the 'Cancelled' value is matched against the 'text' field, which does not exist in our environment. Similar behavior is seen in the parsing of the 'segment' field.
Can you help us, as a priority, to resolve this issue so that we can continue leveraging the luqum module in our application?
import _thread
from luqum.parser import parser

def run():
    qs1 = '(title:"foo bar" AND body:"quick fox") OR title:fox AND (title:"foo bar" AND body:"quick fox") OR ' \
          'title:fox AND (title:"foo bar" AND body:"quick fox") OR title:fox AND (title:"foo bar" AND body:"quick ' \
          'fox") OR title:fox AND (title:"foo bar" AND body:"quick fox") OR title:fox'
    qs2 = '(title:"foo bar" AND body:"quick fox") OR title:fox AND (title:"foo bar" AND body:"quick fox") OR ' \
          'title:fox AND (title:"foo bar" AND body:"quick fox") OR title:fox AND (title:"foo bar" AND body:"quick ' \
          'fox") OR title:fox AND (title:"foo bar" AND body:"quick fox") OR title:fox'
    parser.parse(qs1)
    parser.parse(qs2)

# The larger the range, the more likely the error is to occur
for i in range(100):
    _thread.start_new_thread(run, ())

# A single thread works properly:
# for i in range(1000):
#     run()
This raises:
luqum.exceptions.ParseSyntaxError: Syntax error in input : unexpected end of expression (maybe due to unmatched parenthesis) at the end!
Is there any example of a tree visitor implementation subclassing the TreeVisitor base class? I can't find any in the documentation.
My first use case is a visitor returning a list of the search fields present in a tree.
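Until an official example lands, here is the traversal idea on a minimal stand-in AST (these are NOT luqum's classes; with luqum you would implement the equivalent as a visit_search_field method on a TreeVisitor subclass and let the generic visit walk the children):

```python
# Minimal stand-in tree, only to keep the sketch self-contained.
class Node:
    def __init__(self, *children):
        self.children = children

class SearchField(Node):
    def __init__(self, name, *children):
        super().__init__(*children)
        self.name = name

def collect_search_fields(node, found=None):
    """Depth-first traversal returning every SearchField name in the tree."""
    if found is None:
        found = []
    if isinstance(node, SearchField):
        found.append(node.name)
    for child in node.children:
        collect_search_fields(child, found)
    return found

tree = Node(SearchField("title", Node()), Node(SearchField("body")))
```

The luqum version would look the same structurally: record the field name when the visited node is a SearchField, then keep visiting its children.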
Luqum is working great for me and my test users, but one thing the test users miss is query_string's behavior of doing a full-text search across all fields when no field is specified (e.g., "London"). I see the ability to specify a default field, but this results in a simple match query. I guess I am looking to convert these to multi_match with all available text fields? Any suggestions?
I have a pretty naive user community that likes simple plain-text search, but I also want to support power users with nested, field-based queries. When I have a query like:
cancer AND study.title:colon
The query translation after ElasticsearchQueryBuilder results in:
{"bool": {"must": [{"match": {"text": {"query": "cancer", "zero_terms_query": "all"}}}, {"match": {"study.title": {"query": "colon", "zero_terms_query": "all"}}}]}}
I'd like to simulate the behavior of query_string, with cancer matching against all available fields (not just a single field, as above). Is there a recommended configuration or approach I can use to get the "best of both worlds": nested and object query support while maintaining free-text search for bare terms? Sorry if I missed this in the docs.
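One hypothetical rewrite (not a built-in builder option as far as I know) is to post-process the generated clause, turning a match on the default text field into a multi_match across a chosen field list; TEXT_FIELDS and match_to_multi_match are assumed names for illustration:

```python
# Assumed list of searchable text fields for this schema (illustrative).
TEXT_FIELDS = ["text", "study.title"]

def match_to_multi_match(clause: dict, default_field: str = "text") -> dict:
    """Rewrite {"match": {<default_field>: ...}} into a multi_match query
    across TEXT_FIELDS; leave field-specific clauses untouched."""
    match = clause.get("match", {})
    if default_field in match:
        return {
            "multi_match": {
                "query": match[default_field]["query"],
                "fields": TEXT_FIELDS,
            }
        }
    return clause
```

Walking the generated bool tree and applying this to each leaf would give bare terms query_string-like breadth while keeping explicit field queries as plain match clauses.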
Hello, I am having a problem executing a simple query.
I am running Elasticsearch 6.2.2 locally. From a clean installed base, I do
POST /accounts/person/
{
"name" : "John",
"lastname" : "Doe",
"job_description" : "Systems administrator and Linux specialist"
}
I can then run:
GET /accounts/_search?q=name:john
which returns a proper result. I cannot reproduce this result with luqum
however. I am trying:
from elasticsearch import Elasticsearch
client = Elasticsearch(host='localhost', port=9200)
client.info()
{'name': 'jQqh6TD',
'cluster_name': 'elasticsearch',
'cluster_uuid': 'vMnMGP4XRYC6CAJN7lOzyw',
'version': {'number': '6.2.2',
'build_hash': '10b1edd',
'build_date': '2018-02-16T19:01:30.685723Z',
'build_snapshot': False,
'lucene_version': '7.2.1',
'minimum_wire_compatibility_version': '5.6.0',
'minimum_index_compatibility_version': '5.0.0'},
'tagline': 'You Know, for Search'}
schema = client.indices.get_mapping(index='accounts')
schema
{'accounts': {'mappings': {'person': {'properties': {'address': {'properties': {'city': {'type': 'text',
'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}},
'street': {'type': 'text',
'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}}}},
'height': {'type': 'long'},
'job_description': {'type': 'text',
'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}},
'lastname': {'type': 'text',
'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}},
'name': {'type': 'text',
'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}}}}}}}
from luqum.parser import parser
tree = parser.parse('name:john')
query = es_builder(tree)
query
{'match': {'name': {'query': 'john', 'zero_terms_query': 'none'}}}
response = client.search(
index='accounts',
body=query
)
This search returns the following stack trace:
GET http://localhost:9200/accounts/_search [status:400 request:0.002s]
---------------------------------------------------------------------------
RequestError Traceback (most recent call last)
<ipython-input-187-e42ebfa4fba0> in <module>()
1 response = client.search(
2 index='accounts',
----> 3 body=query
4 )
/anaconda3/lib/python3.6/site-packages/elasticsearch/client/utils.py in _wrapped(*args, **kwargs)
74 if p in kwargs:
75 params[p] = kwargs.pop(p)
---> 76 return func(*args, params=params, **kwargs)
77 return _wrapped
78 return _wrapper
/anaconda3/lib/python3.6/site-packages/elasticsearch/client/__init__.py in search(self, index, doc_type, body, params)
653 index = '_all'
654 return self.transport.perform_request('GET', _make_path(index,
--> 655 doc_type, '_search'), params=params, body=body)
656
657 @query_params('_source', '_source_exclude', '_source_include',
/anaconda3/lib/python3.6/site-packages/elasticsearch/transport.py in perform_request(self, method, url, headers, params, body)
316 delay = 2**attempt - 1
317 time.sleep(delay)
--> 318 status, headers_response, data = connection.perform_request(method, url, params, body, headers=headers, ignore=ignore, timeout=timeout)
319
320 except TransportError as e:
/anaconda3/lib/python3.6/site-packages/elasticsearch/connection/http_urllib3.py in perform_request(self, method, url, params, body, timeout, ignore, headers)
183 if not (200 <= response.status < 300) and response.status not in ignore:
184 self.log_request_fail(method, full_url, url, body, duration, response.status, raw_data)
--> 185 self._raise_error(response.status, raw_data)
186
187 self.log_request_success(method, full_url, url, body, response.status,
/anaconda3/lib/python3.6/site-packages/elasticsearch/connection/base.py in _raise_error(self, status_code, raw_data)
123 logger.warning('Undecodable raw error response from server: %s', err)
124
--> 125 raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
126
127
RequestError: TransportError(400, 'parsing_exception', 'Unknown key for a START_OBJECT in [match].')
Please advise on this. Perhaps I am missing something simple! Thank you very much.
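For what it's worth, the error 'Unknown key for a START_OBJECT in [match]' usually means the search body is missing the top-level "query" key: the builder returns a bare query clause, while the Search API expects it wrapped. A minimal sketch of the likely fix:

```python
# The builder output is a bare query clause; the Search API body expects it
# under a top-level "query" key. Wrapping it should avoid the
# parsing_exception above.
query = {'match': {'name': {'query': 'john', 'zero_terms_query': 'none'}}}
body = {"query": query}

# response = client.search(index='accounts', body=body)
```

This matches how the raw GET /accounts/_search endpoint interprets a JSON body.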
Just a question: is the keyword datatype supported?
Hi, we have a problem: when looking up the license for this project, we found "License: GNU Lesser General Public License v3 or later (LGPLv3+)" on https://pypi.org/project/luqum/, but in this repository we see GNU General Public License v3.0.
Which is correct? We would recommend LGPL, as GPL is not suitable for most companies.
The forward slash character is currently not fully supported; it is dropped. I consider this character relevant in Search Field values. For motivation, consider GitHub's code search by file location, which uses it.
The query foo bar:/baz currently gets parsed as foo bar:baz, which I think is incorrect.
https://readthedocs.org/projects/luqum/ — the last version is 7 months old!
Unfortunately, we are stuck on Py2, and luqum is only compatible with Python 3. The only thing making it incompatible is the yield from usage, which is very easy to convert into supported syntax. Would such a PR be welcome?
We get these warning messages when we start the Django shell with ply version 3.10:
WARNING: yacc table file version is out of date
WARNING: Couldn't open 'parser.out'. [Errno 13] Permission denied: '/usr/local/lib/python3.4/dist-packages/luqum/parser.out'
WARNING: Token 'SEPARATOR' defined, but not used
WARNING: There is 1 unused token
Generating LALR tables
WARNING: 11 shift/reduce conflicts
WARNING: Couldn't create 'luqum.parsetab'. [Errno 13] Permission denied: '/usr/local/lib/python3.4/dist-packages/luqum/parsetab.py'
In the case where the returned node is the initial one plus something (like going from "foo" to "foo OR oof"), the class does not stop visiting the new nodes, and since the first one is not removed it keeps triggering the transformation.
A solution could be to analyze only the initial query and not the new nodes that have been added.
I was under the impression luqum would be able to catch syntax issues, but is that not the case?
test_query = '''http://crazy.c'"om OR a"teste"'''
tree = parser.parse('content: ({})'.format(test_query))
print str(tree)
es_builder = ElasticsearchQueryBuilder(not_analyzed_fields=["published", "tag"])
query = es_builder(tree)
print query
just prints:
content:(http\:\/\/crazy.c'"om OR a"teste")
{'bool': {'should': [{'match': {'content.http\\': {'query': '\\/\\/crazy.c\'"om', 'zero_terms_query': 'none'}}}, {'match': {'content': {'query': 'a"teste"', 'zero_terms_query': 'none'}}}]}}
which is not accepted syntax for ES.
The luqum parser yields UnknownOperation when parsing an inequality with a FieldGroup.
Equality with FieldGroup:
>>> a = "a:(1 OR 2)"
>>> tree = parser.parse(a)
>>> print(repr(tree))
SearchField('a', FieldGroup(OrOperation(Word('1'), Word('2'))))
Inequality with FieldGroup:
>>> a = "a:>(1 OR 2)"
>>> tree = parser.parse(a)
>>> print(repr(tree))
UnknownOperation(SearchField('a', Word('>')), Group(OrOperation(Word('1'), Word('2'))))
Expected result:
SearchField('a', FieldGroup(OrOperation(Word('>1'), Word('>2'))))
Take note that missing is deprecated in 2.2.0
It seems like you forgot to include CHANGELOG.rst
in the newest release, which means https://github.com/jurismarches/luqum/blob/master/setup.py#L9 will fail and the package can't be installed.
$ sudo -H pip3 install luqum
Collecting luqum
Using cached luqum-0.6.0.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-build-vlx183/luqum/setup.py", line 9, in <module>
with open('CHANGELOG.rst', 'r') as f:
IOError: [Errno 2] No such file or directory: 'CHANGELOG.rst'
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-vlx183/luqum/