elasticsearch-dsl-py's Introduction

Elasticsearch DSL

Elasticsearch DSL is a high-level library whose aim is to help with writing and running queries against Elasticsearch. It is built on top of the official low-level client (elasticsearch-py).

It provides a more convenient and idiomatic way to write and manipulate queries. It stays close to the Elasticsearch JSON DSL, mirroring its terminology and structure. It exposes the whole range of the DSL from Python, either directly using the defined classes or via queryset-like expressions.

It also provides an optional wrapper for working with documents as Python objects: defining mappings, retrieving and saving documents, wrapping the document data in user-defined classes.

To use the other Elasticsearch APIs (e.g. cluster health), just use the underlying client.
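
For example, a minimal sketch of reaching through to the low-level client (assuming a cluster at https://localhost:9200, as in the examples below):

from elasticsearch import Elasticsearch

client = Elasticsearch("https://localhost:9200")

# cluster and index APIs live on the low-level client
print(client.cluster.health())
print(client.indices.exists(index="my-index"))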

Installation

pip install elasticsearch-dsl

Feedback 🗣️

The engineering team here at Elastic is looking for developers to participate in research and feedback sessions to learn more about how you use our Python client and what improvements we can make to its design and your workflow. If you're interested in sharing your insights into developer experience and language client design, please fill out this short form. Depending on the number of responses we get, we may either contact you for a 1:1 conversation or a focus group with other developers who use the same client. Thank you in advance - your feedback is crucial to improving the user experience for all Elasticsearch developers!

Examples

Please see the examples directory for some complex examples using elasticsearch-dsl.

Compatibility

The library is compatible with all Elasticsearch versions since 2.x, but you have to use a matching major version:

For Elasticsearch 8.0 and later, use the major version 8 (8.x.y) of the library.

For Elasticsearch 7.0 and later, use the major version 7 (7.x.y) of the library.

For Elasticsearch 6.0 and later, use the major version 6 (6.x.y) of the library.

For Elasticsearch 5.0 and later, use the major version 5 (5.x.y) of the library.

For Elasticsearch 2.0 and later, use the major version 2 (2.x.y) of the library.

The recommended way to set your requirements in your setup.py or requirements.txt is:

# Elasticsearch 8.x
elasticsearch-dsl>=8.0.0,<9.0.0

# Elasticsearch 7.x
elasticsearch-dsl>=7.0.0,<8.0.0

# Elasticsearch 6.x
elasticsearch-dsl>=6.0.0,<7.0.0

# Elasticsearch 5.x
elasticsearch-dsl>=5.0.0,<6.0.0

# Elasticsearch 2.x
elasticsearch-dsl>=2.0.0,<3.0.0

Development happens on the main branch; older branches only get bugfix releases.

Search Example

Let's have a typical search request written directly as a dict:

from elasticsearch import Elasticsearch
client = Elasticsearch("https://localhost:9200")

response = client.search(
    index="my-index",
    body={
      "query": {
        "bool": {
          "must": [{"match": {"title": "python"}}],
          "must_not": [{"match": {"description": "beta"}}],
          "filter": [{"term": {"category": "search"}}]
        }
      },
      "aggs" : {
        "per_tag": {
          "terms": {"field": "tags"},
          "aggs": {
            "max_lines": {"max": {"field": "lines"}}
          }
        }
      }
    }
)

for hit in response['hits']['hits']:
    print(hit['_score'], hit['_source']['title'])

for tag in response['aggregations']['per_tag']['buckets']:
    print(tag['key'], tag['max_lines']['value'])

The problem with this approach is that it is very verbose, prone to syntax mistakes like incorrect nesting, hard to modify (e.g. adding another filter) and definitely not fun to write.

Let's rewrite the example using the Python DSL:

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search

client = Elasticsearch("https://localhost:9200")

s = Search(using=client, index="my-index") \
    .filter("term", category="search") \
    .query("match", title="python")   \
    .exclude("match", description="beta")

s.aggs.bucket('per_tag', 'terms', field='tags') \
    .metric('max_lines', 'max', field='lines')

response = s.execute()

for hit in response:
    print(hit.meta.score, hit.title)

for tag in response.aggregations.per_tag.buckets:
    print(tag.key, tag.max_lines.value)

As you can see, the library took care of:

  • creating appropriate Query objects by name (e.g. "match")
  • composing queries into a compound bool query
  • putting the term query in a filter context of the bool query
  • providing convenient access to response data
  • no curly or square brackets everywhere
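
If you want to double-check what the library will send, Search.to_dict() serializes the query back to the raw dict. A quick sketch using the s object from above (the output shown is approximate):

print(s.to_dict())
# {'query': {'bool': {'must': [{'match': {'title': 'python'}}],
#                     'must_not': [{'match': {'description': 'beta'}}],
#                     'filter': [{'term': {'category': 'search'}}]}},
#  'aggs': {'per_tag': {'terms': {'field': 'tags'},
#                       'aggs': {'max_lines': {'max': {'field': 'lines'}}}}}}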

Persistence Example

Let's have a simple Python class representing an article in a blogging system:

from datetime import datetime
from elasticsearch_dsl import Document, Date, Integer, Keyword, Text, connections

# Define a default Elasticsearch client
connections.create_connection(hosts="https://localhost:9200")

class Article(Document):
    title = Text(analyzer='snowball', fields={'raw': Keyword()})
    body = Text(analyzer='snowball')
    tags = Keyword()
    published_from = Date()
    lines = Integer()

    class Index:
        name = 'blog'
        settings = {
          "number_of_shards": 2,
        }

    def save(self, **kwargs):
        self.lines = len(self.body.split())
        return super(Article, self).save(**kwargs)

    def is_published(self):
        return datetime.now() > self.published_from

# create the mappings in elasticsearch
Article.init()

# create and save an article
article = Article(meta={'id': 42}, title='Hello world!', tags=['test'])
article.body = ''' looong text '''
article.published_from = datetime.now()
article.save()

article = Article.get(id=42)
print(article.is_published())

# Display cluster health
print(connections.get_connection().cluster.health())

In this example you can see:

  • providing a default connection
  • defining fields with mapping configuration
  • setting index name
  • defining custom methods
  • overriding the built-in .save() method to hook into the persistence life cycle
  • retrieving and saving the object into Elasticsearch
  • accessing the underlying client for other APIs

You can see more in the persistence chapter of the documentation.

Migration from elasticsearch-py

You don't have to port your entire application to get the benefits of the Python DSL; you can start gradually by creating a Search object from your existing dict, modifying it using the API and serializing it back to a dict:

from elasticsearch_dsl import Search

body = {...} # insert complicated query here

# Convert to Search object
s = Search.from_dict(body)

# Add some filters, aggregations, queries, ...
s = s.filter("term", tags="python")

# Convert back to dict to plug back into existing code
body = s.to_dict()

Development

Create and activate a virtual environment (virtualenv):

$ virtualenv venv
$ source venv/bin/activate

To install all of the dependencies necessary for development, run:

$ pip install -e '.[develop]'

To run all of the tests for elasticsearch-dsl-py, run:

$ python setup.py test

Alternatively, it is possible to use the run_tests.py script in test_elasticsearch_dsl, which wraps pytest, to run subsets of the test suite. Some examples can be seen below:

# Run all of the tests in `test_elasticsearch_dsl/test_analysis.py`
$ ./run_tests.py test_analysis.py

# Run only the `test_analyzer_serializes_as_name` test.
$ ./run_tests.py test_analysis.py::test_analyzer_serializes_as_name

pytest will skip tests from test_elasticsearch_dsl/test_integration unless there is an instance of Elasticsearch to which a connection can be made. By default, the test connection is attempted at localhost:9200, based on the defaults specified in the elasticsearch-py Connection <https://github.com/elastic/elasticsearch-py/blob/master/elasticsearch/connection/base.py#L29> class. Because running the integration tests will cause destructive changes to the Elasticsearch cluster, only run them when the associated cluster is empty. If the Elasticsearch instance at localhost:9200 does not meet these requirements, you can specify a different test Elasticsearch server through the TEST_ES_SERVER environment variable.

$ TEST_ES_SERVER=my-test-server:9201 ./run_tests.py

Documentation

Documentation is available at https://elasticsearch-dsl.readthedocs.io.

Contribution Guide

Want to hack on Elasticsearch DSL? Awesome! We have a Contribution Guide.

License

Copyright 2013 Elasticsearch

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

elasticsearch-dsl-py's Issues

Can't import anything from elasticsearch_dsl

Hello,

I can't seem to run the Persistence example from the documentation: I can't import anything from elasticsearch_dsl.

Here is a REPL example session:

Python 3.4.0 (v3.4.0:04f714765c13, Mar 15 2014, 23:02:41)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from datetime import datetime
>>> from elasticsearch import Elasticsearch
>>> from elasticsearch_dsl import DocType, String, Date, Integer, connections
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'DocType'
>>> import elasticsearch_dsl
>>> dir(elasticsearch_dsl)
['A', 'F', 'Q', 'SF', 'Search', 'VERSION', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '__version__', '__versionstr__', 'aggs', 'exceptions', 'filter', 'function', 'query', 'result', 'search', 'utils']
>>> elasticsearch_dsl.__version__
(0, 0, 2)
>>>

Python 3.4.0 on Mac OS X (Yosemite).
I installed the module with:
pip3 install elasticsearch_dsl

I have the same problem when using python 2 (Mac OS's stock interpreter)

AND'ing filters produces invalid query

In my code I'm dynamically building up a list of filters based on user input. I want to AND them together but when doing so the query gets nested in a bad way.

E.g.:

>>> import operator
>>> from elasticsearch_dsl import F
>>> filters = [F('term', category='games'), F('terms', tags=['rpg', 'action']), F('term', rating='M')]
>>> f = reduce(operator.and_, filters)
>>> f.to_dict()
{'bool': {'must': [{'bool': {'must': [{'term': {'category': 'games'}},
                                      {'terms': {'tags': ['rpg',
                                                          'action']}}]}},
                   {'term': {'rating': 'M'}}]}}

Alternatively someone might maintain a single variable and build up the filters using &=, which results in the same invalid query:

>>> f = filters[0]
>>> for filter in filters[1:]:
...   f &= filter
...
>>> f.to_dict()
{'bool': {'must': [{'bool': {'must': [{'term': {'category': 'games'}},
                                      {'terms': {'tags': ['rpg',
                                                          'action']}}]}},
                   {'term': {'rating': 'M'}}]}}

Is it possible to add non-keyword constructors for NOT, OR, AND filters?

I want to compose a filter like this:

from elasticsearch_dsl import filter as f

# my preferred, but not working way
filter_ = f.And([f.Not(f.Term(field1='value1')),
                 f.Or([f.Term(field2='value2'),
                       f.Range(field3={"gt": 34})])])

# working way, but why do I have to specify the 'filters' keyword in the And and Or filters
# and the 'filter' keyword in the Not filter?
filter_ = f.And(filters=[f.Not(filter=f.Term(field1='value1')),
                         f.Or(filters=[f.Term(field2='value2'),
                                       f.Range(field3={'gt': 34})])])

It would also be great to enumerate these Filter child classes in the filter module, because they are not showing up in PyCharm suggestions now.

How to create range filter

I want this:

"range": {
          "@timestamp": {
            "gte": "2014-12-3T16:00:00",
            "lte": "now"
          }
  }

but this code does different thing:

s = Search(using=client, index="my_index")
s = s.filter('range', gte='now-1h', lte='now', field='@timestamp')

which serializes to:

"range": {
    "field": "@timestamp",
    "gte": "now-1h",
    "lte": "now"
}

@timestamp is put in the wrong place; how do I correct it?
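
One way to get the field name where it belongs is to pass the range options as a dict keyed by the field name and unpack it, since '@timestamp' cannot be written as a literal keyword argument. A sketch (client being an Elasticsearch instance, as above):

s = Search(using=client, index="my_index")
s = s.filter('range', **{'@timestamp': {'gte': 'now-1h', 'lte': 'now'}})
# serializes to: {"range": {"@timestamp": {"gte": "now-1h", "lte": "now"}}}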

Populate Result list from both _source and fields

Hi,

When I don't specify any fields parameter in a query, the results objects are populated from the _source attribute. If the fields attributes are present in Elasticsearch response, then the results objects are populated from it.

Here is the relevant portion of code in results.py:

class Result(AttrDict):
    def __init__(self, document):
        if 'fields' in document:
            super(Result, self).__init__(document['fields'])
        else:
            super(Result, self).__init__(document['_source'])

Now, my problem is that I need to populate my Result objects with both _source and fields. It seems I cannot do that right now.

The reason I need to do that is because I define script_fields in my query, but I still need to get back the entire initial document.

I'm a bit of an ES noob, so do you think I'm doing something wrong, or is this a limitation that should be fixed?

Inner exceptions on @property DocType

Using a property on classes makes debugging a nightmare, for example:

from elasticsearch_dsl import DocType, Search
class Foo(DocType):
    @property
    def bars(self):
        return Search().execute()  # with a filter in reality

foo = Foo()
foo.bars

This will raise AttributeError: 'Foo' object has no attribute 'bars'

The actual exception is KeyError: "There is no connection with alias 'default'."

  File "/elasticsearch-dsl/elasticsearch_dsl/search.py", line 366, in execute
    es = connections.get_connection(self._using)
  File "/elasticsearch-dsl/elasticsearch_dsl/connections.py", line 53, in get_connection
    raise KeyError('There is no connection with alias %r.' % alias)

I think the except AttributeError clauses in AttrDict and ObjectBase should be removed, I don't see how they add anything.

Querying for a range of dates.

Hello Honza, I'm hoping you might be able to help me out with one last issue I'm having. I need to query for all articles that have 404 errors as a response field, and I also need to find all of them from within the last 15 minutes. The issues show up inside of Kibana but they are not showing in my queries.

I've been having a really hard time getting this to work and there aren't any clear examples of querying in a specific range of time in the docs.
Here's my script.

s = Search(using=es, index="_all") \
        .query("match", response="404") \
        .filter('range', timestamp={'from': datetime.datetime.now() - datetime.timedelta(minutes=15), 'to' : datetime.datetime.now() }) #, 'lte': datetime(2010, 10, 9)})
        #.filter("range", '@timestamp': {"gte": "now-15m", "lte": "now"}) #, "lt" : "2014-12-31 1:00:00"}) 
        #filter is for a range of possible times. 
        #.filter("range", timestamp={ "gt":"now -5m","lt":"now" })
response = s.execute()

Firing range queries

I want to do a date range query with gte and lte. How do I go about doing this using the dsl?
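
A minimal sketch, assuming a date field named published_from (any date field works the same way):

from elasticsearch_dsl import Search

s = Search(index="my-index")
s = s.filter('range', published_from={'gte': '2014-12-03T16:00:00', 'lte': 'now'})
# {"range": {"published_from": {"gte": "2014-12-03T16:00:00", "lte": "now"}}}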

Support kwargs passed to `.query` similar to `.filter`

Using filter we can pass in a dict and it properly handles things by combining them into a bool filter:

>>> Search().filter('term', **{'tags': ['python', 'eggs'], 'category': 'Python'}).to_dict()
{'query': {'filtered': {'filter': {'bool': {'must': [{'term': {'category': 'Python',
                                                               'tags': ['python',
                                                                        'eggs']}}]}},
                        'query': {'match_all': {}}}}}

But trying to follow the same pattern with query doesn't work:

>>> Search().query('match', **{'name': 'python', 'body': 'python'}).to_dict()
{'query': {'match': {'body': 'python', 'name': 'python'}}}

and results in the following error:

[match] query parsed in simplified form, with direct field name, but included more options than just the field name, possibly use its 'options' form, with 'query' element?

Just based on what I've learned playing with the DSL so far I'd expect this to result in a bool must match query?
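
One pattern that does work is to build the individual queries as Q objects and combine them with &, which composes them into a bool query. A sketch:

from elasticsearch_dsl import Q, Search

s = Search().query(Q('match', name='python') & Q('match', body='python'))
# roughly: {'query': {'bool': {'must': [{'match': {'name': 'python'}},
#                                       {'match': {'body': 'python'}}]}}}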

Question: How to access subfields?

Hi, I want to know how to filter with a subfield.
I have tried several times but could not get the correct result.

for example:

F('term', field='value').to_dict()
Output: {'term': {'field': 'value'}}

How do I write code that produces an equivalent result like this: {'term': {'field.subfield': 'value'}}?

Thank you very much.
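
Since a dot cannot appear in a literal keyword argument, unpacking a plain dict works. A sketch:

from elasticsearch_dsl import F

F('term', **{'field.subfield': 'value'}).to_dict()
# {'term': {'field.subfield': 'value'}}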

`parent` attribute in `Meta` class of DocType is not supported

The documentation says that you can have an attribute named parent in the Meta class inside a DocType subclass, but the code doesn't support that.

Example:
If I want to construct the following mappings

    "mappings" : {
      "blog" : {
        "properties" : {
          "description" : {
            "type" : "string",
            "analyzer" : "caseinsensitive_analyzer"
          },
      },
      "post" : {
        "_parent" : {
          "type" : "blog"
        },
        "_routing" : {
          "required" : true
        },
        "properties" : { }
      }

From the documentation it seems like you should be able to do:

class Blog(DocType):
  description = String()

  class Meta:
    index = 'my_index'

class Post(DocType):
  class Meta:
    index = 'my_index'
    parent = 'blog'

but this will result in the following mappings

    "mappings" : {
      "blog" : {
        "properties" : {
          "description" : {
            "type" : "string",
            "analyzer" : "caseinsensitive_analyzer"
          },
      },
      "post" : {
        "properties" : { }
      }

Instead you have to find a workaround and do the following to create the desired mapping.

class Blog(DocType):
  description = String()

  class Meta:
    index = 'my_index'

class Post(DocType):
  class Meta:
    index = 'my_index'
    mapping = Mapping('post')
    mapping.meta('_parent', type='blog')

Preferred way to use Elasticsearch instance?

What is the recommended approach for using an Elasticsearch instance?
1- Create a new instance each time while building a Search object.
2- Initialize one Elasticsearch instance in a settings file. That means all the Search objects use the same Elasticsearch instance. It must be thread-safe in this case.

DocType doesn't work as expected with Search.

The persistence doc suggests that the following should work:

from elasticsearch_dsl import DocType, Search

class MyDoc(DocType):
    pass

s = Search(index='my_index')
s = s.doc_type(MyDoc)
s.execute()

But this results in a TypeError (using elasticsearch-py 1.2.0 and elasticsearch-dsl-py latest commit 60a1123):

elasticsearch/client/utils.pyc in _escape(value)
     16     # make sequences into comma-separated stings
     17     if isinstance(value, (list, tuple)):
---> 18         value = ','.join(value)
     19
     20     # dates and datetimes into isoformat

TypeError: sequence item 0: expected string or Unicode, DocTypeMeta found

Additionally, DocType.search doesn't take "index" or "using" parameters, so if I have an explicit index, I can't simply do:

s = MyDoc.search(index='my_index', using='custom_alias')

Release version 0.0.3?

Hello,

I need to use the match_phrase query and it is not included in the current version on PyPI, 0.0.2. Would it be possible to cut a release so that the newest features are made available? Or is there another way to access that feature without waiting for a release?

Cheers,
Adrian

The Persistence example does not work...

Hello,

I've just started looking at using DocType for my project but ran into some errors.
I then copy/pasted the persistence example from the readme file into a file and obtained the same results when I ran it...

The output I got was:

Nicolass-MacBook-Pro: Nicolas$ python testing_doc_type.py 
Traceback (most recent call last):
  File "testing_doc_type.py", line 6, in <module>
    connections.add_connection(Elasticsearch())
AttributeError: 'module' object has no attribute 'add_connection'

I tried modifying the connections line so as to read:

connections.connections.add_connection(Elasticsearch())

But that resulted in the following error:

Nicolass-MacBook-Pro: Nicolas$ python testing_doc_type.py 
Traceback (most recent call last):
  File "testing_doc_type.py", line 6, in <module>
    connections.connections.add_connection(Elasticsearch())
TypeError: add_connection() missing 1 required positional argument: 'conn'

Looking at the Connections class, I am not sure what I should put in as "alias" (since Elasticsearch() should be "conn", right?).

I'm probably missing something basic here... Any help (and update to the readme file) would be greatly appreciated!

Nicolas

Using Python 3.3, elasticsearch-py 1.3.0, elasticsearch-dsl-py 0.0.3

Add support for querying the search suggestions

This seems like a separate method since it can't be chained, but it'd be nice to have a helper for this. Something like Suggest('n', field='suggest') where kwargs end up in the completion dict.
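
For reference, later releases of the library grew a Search.suggest(name, text, **kwargs) method along these lines. A sketch of its shape (serialization details vary by version):

from elasticsearch_dsl import Search

s = Search()
s = s.suggest('name_suggestions', 'n', completion={'field': 'suggest'})
# roughly: {'suggest': {'name_suggestions': {'text': 'n', 'completion': {'field': 'suggest'}}}}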

AttrDict doesn't work using `.get(...)`

My result has a field created. These access methods work:

>>> obj.created
u'2014-02-11T10:20:39'
>>> obj['created']
u'2014-02-11T10:20:39'

But using .get fails:

>>> obj.get('created')
*** AttributeError:

The reason is that it thinks I'm looking for obj['get'], because .get is an attribute lookup. I think this will be a common thing and should be fixed.

I fixed it locally by adding something like this on AttrDict:

    def get(self, attr_name, default=None):
        # Don't confuse `obj.get('...')` as `obj['get']`.
        try:
            return self.__getattr__(attr_name)
        except AttributeError:
            return default

I can send a pull request, if preferred.

"found" field added after MyDocType.get(): expected behaviour?

I've been playing around with the persistence example.
I just noticed that after I use the get method, there is a "found" field added to my document.
Saving then adds that field to the document, and obviously to the mapping.

Is this the expected behaviour?

Distribute with pypi?

Is this package available on PyPI? I tried pip install elasticsearch-dsl-py, but no luck.

Strange import error under Django's management shell

While testing some code locally I ran into this. I boiled it down to as minimal a query as I could that causes the issue. I'm guessing something within elasticsearch_dsl is causing this?

$ ./manage.py shell --plain
Using settings_local.py
Raven is not configured (logging is disabled). Please see the documentation for more information.
Python 2.7.8 (default, Oct 27 2014, 15:34:41)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.54)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> from elasticsearch_dsl import filter as es_filter
>>> from elasticsearch_dsl import Search
>>> Search().filter(es_filter.Bool(should=[es_filter.Bool(must=[es_filter.Term(id=6)])])).execute()
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/Users/rob/.virtualenvs/zamboni/lib/python2.7/site-packages/django/core/management/commands/shell.py", line 79, in handle_noargs
    raise ImportError
ImportError

syntax error when passing special fields as keyword arguments

Fields such as '@timestamp' and '@version' (ones essential to Logstash) cannot be passed as keyword arguments to methods such as query and filter, because the @ sign causes Python to throw a syntax error. These are needed in creating time range filters and queries. Perhaps making a special keyword arg called '__parent', which resolves to a key under which the other kwargs are nested, would do the trick. So, for instance, I invoke:

s = Search(using=es) \
.filter('range', __parent='@timestamp', gte='now-5m', lte='now')
s.to_dict()

and get

...
'filter': {
    'range': {
        '@timestamp': {
            'gte': 'now-5m',
            'lte': 'now'
...
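
A workaround that needs no library change is to unpack a plain dict, since the @ restriction only applies to literal keyword syntax. A sketch (es being an Elasticsearch client):

s = Search(using=es) \
    .filter('range', **{'@timestamp': {'gte': 'now-5m', 'lte': 'now'}})
# the filter serializes to: {'range': {'@timestamp': {'gte': 'now-5m', 'lte': 'now'}}}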

How to create filter query?

I want to ask if there is a better way to create a filter query.

search = Search()

# I have a query_string query which I want to use inside a filter phrase
my_query = query.Q('query_string', query='complex query...')

# this command fails
my_filter_search = search.filter('query', my_query)

# this works in Python, but Elasticsearch returns SearchPhaseExecutionException[Failed to execute phase [query]...]
my_filter_search = search.filter('query', query=my_query)
# output is
# "filter": {
#     "query": {
#         "query": {
#             "query_string": {
#                 "query": "complex query..."
#             }
#         }
#     }
# }

# this works ok, but is it the right way?
my_filter_search = search.filter('query', **my_query.to_dict()) 
# output is
# "filter": {
#     "query": {
#         "query_string": {
#             "query": "complex query..."
#         }
#     }
# }

Querying a nested object

What is the proper way to query a nested object? Example:
s = Search(using=es).index("my_index").query("nested", path="features", query=Q("term", features__name="foo"))

The code expects keyword parameters, but specifying a nested field like 'features.name' as keyword parameter is not possible and 'features__name' is not converted into 'features.name'.

If it's not supported, I could add some code that will properly replace double underscores with periods.
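
In the meantime, unpacking a dict with the dotted name works (and newer releases did add the double-underscore-to-dot conversion in keyword arguments). A sketch:

from elasticsearch_dsl import Q, Search

q = Q('nested', path='features', query=Q('term', **{'features.name': 'foo'}))
s = Search(index='my_index').query(q)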

Getting the TimeZone specific aggregations in Date Histogram

Hey guys,
I am stuck at modifying the current timezone of a datetime field, which is by default UTC, while implementing a date_histogram aggregation.
I am passing the param time_zone='+05:30' to set the timezone of that field to Asia/Kolkata but still don't get the desired results.
Here's the query and the output

Query:

s = Search(using=es).index("my-index")
s.aggs.bucket('top_tags', 'date_histogram', field='my_date_field', interval='year', time_zone='+05:30')
r = s.execute()
r.aggregations.top_tags.buckets

Result:

[{u'doc_count': 392099,
  u'key': 1356998400000,
  u'key_as_string': u'2013-01-01T00:00:00.000Z'},
 {u'doc_count': 360,
  u'key': 1388534400000,
  u'key_as_string': u'2014-01-01T00:00:00.000Z'}]

Why isn't the modified timezone reflected here? Am I missing something?

maximum recursion depth exceeded when or-ing with a negative filter

This small bit causes a maximum recursion depth exceeded error:

from elasticsearch_dsl import F
F('term', field='value') | ~F('term', field='value')

  File ".../elasticsearch_dsl/utils.py", line 324, in __or__
    return super(self.__class__, self).__or__(other)
  File ".../elasticsearch_dsl/utils.py", line 312, in __or__
    if not (self.must or self.must_not):
  File ".../elasticsearch_dsl/utils.py", line 181, in __getattr__
    if isinstance(value, dict):
RuntimeError: maximum recursion depth exceeded while calling a Python object

Printing out all fields of an entry without knowing what they are

Hello again, friend! So I'm using the dsl library and I want to know if I can print out an entire entry from the response given. I'm using the following script to search a test instance.

import elasticsearch
from elasticsearch_dsl import Search, Q
from datetime import datetime

es = elasticsearch.Elasticsearch([{u'host': u'himanshu.addteq.com', u'port': b'9200'}])

es.index(index="david", doc_type="test-type", id=42, body={"any": "data", "timestamp": datetime.now()})

s = Search(using=es, index= "_all")
response = s.execute()
for hit in response:
        print hit

and I get this output.

<Result(logstash-2014.12.30/apache/F60rG4hiS42FKL-cx6rnTA): {u'tags': [u'_grokparsefailure'], u'@version': u'1', u'@time...}>
<Result(logstash-2011.08.30/apache/L_CO1AOSTnuvjiBKMl8T_g): {u'ident': u'-', u'referrer': u'"-"', u'@version': u'1', u'@...}>
<Result(logstash-2011.08.30/apache/YEXRqjFjSuOKE6cwLLg0Yw): {u'ident': u'-', u'referrer': u'"-"', u'@version': u'1', u'@...}>
<Result(logstash-2011.08.30/apache/l6QsZNmKQo-s9ZBOftlW4w): {u'ident': u'-', u'referrer': u'"-"', u'@version': u'1', u'@...}>
<Result(logstash-2011.08.30/apache/ONhdAJVhSJC5rfnKAvB78g): {u'ident': u'-', u'referrer': u'"-"', u'@version': u'1', u'@...}> 
...
<Result(logstash-2011.08.30/apache/MeGfj_LTSQWc6D8XugalAg): {u'ident': u'-', u'referrer': u'"http://www.semicomplete.com...}>

I can print the individual fields, like hit.host for example, but printing hit itself does not give me the entire set; I am only given these truncated representations. Thanks again!
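
For what it's worth, each hit carries its metadata under hit.meta, and to_dict() returns the full _source as a plain dict. A sketch under those assumptions:

for hit in response:
    print(hit.meta.score)   # also hit.meta.index, hit.meta.id, ...
    print(hit.to_dict())    # the complete _source as a plain dict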

term filter not working with string field

I'm trying to run a match query and a term filter with elasticsearch-dsl.
When I run Search(es, index='haystack').query("match", _all=q) the results return as expected. But when I add a filter Search(es, index='haystack').query("match", _all=q).filter('term', tipo_anuncio='Aluguel'), the response is empty, when I expected it to be the first item of the list below.

When I apply the same filter but with a numeric variable, for example Search(es, index='haystack').query("match", _all=q).filter('term', num_banheiros=4) the filter works. What am I doing wrong?

This is a part of my script in Django:

q = request.GET.get('q', '')
es = Elasticsearch()
s = Search(es, index='haystack').query("match", _all=q).filter('term', tipo_anuncio='Aluguel')
results = s.execute()

This is my result when I run the search without the filter:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.97323877,
    "hits": [
      {
        "_index": "haystack",
        "_type": "modelresult",
        "_id": "anunciar.anunciarimovel.2",
        "_score": 0.97323877,
        "_source": {
          "area_util": "600",
          "tipo_anuncio": "Aluguel",
          "num_banheiros": "4"
        }
      },
      {
        "_index": "haystack",
        "_type": "modelresult",
        "_id": "anunciar.anunciarimovel.7",
        "_score": 0.97323877,
        "_source": {
          "area_util": "380",
          "tipo_anuncio": "Venda",
          "num_banheiros": "3"
        }
      }
    ]
  }
}

Thanks!

update_from_dict fails with more than one aggregations at the same level

Howdy,

The update_from_dict func seems to fail when you have more than one aggregation under the same bucket.

Example:

import pytest
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search, Q, F, A

# under the "foo_over_time" agg we have two aggs at the same level: "foo_range" and "score"
query_body = {
            "query": {
                "filtered": {
                    "query": {
                        "match_all": {}
                    },
                    "filter": {
                        "bool": {
                            "must": [],
                            "must_not": [],
                            "should": []
                        }
                    }
                }
            },
            "aggs": {
                "foo_over_time": {
                    "date_histogram": {
                        "field": "somefield",
                        "interval": "day",
                        "format": "yyyy-MM-dd"
                    },
                    "aggs": {
                        "foo_range": {
                            "range": {
                                "field": "otherfield",
                                "keyed": True,
                                "ranges": [
                                    {
                                        "key": "Negative",
                                        "from": -100,
                                        "to": 0
                                    },
                                    {
                                        "key": "Neutral",
                                        "from": 0,
                                        "to": 1
                                    },
                                    {
                                        "key": "Positive",
                                        "from": 1,
                                        "to": 101
                                    }
                                ]
                            }
                        },
                        "score": {
                            "avg": {
                                "field": "interaction.cx_score"
                            }
                        }
                    }
                }
            }
        }

def test_build_search_from_dict_explodes_with_nested_aggs():
    s = Search().from_dict(query_body)

"Failure" seems to be here:

# line 19 or so
        if aggs:
            self.aggs._params = {
                'aggs': dict(
                    (name, A(value)) for (name, value) in iteritems(aggs))
            }

Is this the normal behaviour or am I messing things up?

Thanks.

How to return search results as JSON response of Restful service?

Is there any way to JSON serialize the search result (the Response object) and return it as the response of my RESTful Django viewset endpoint? The Response and Result objects' representations (from the __repr__ method) are not valid JSON because some extra strings are added, so serialization fails with "Response object is not JSON serializable".
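
In versions where the response objects are AttrDict-backed, to_dict() returns the underlying plain dict, which the json module can handle. A sketch, assuming response is a Response object:

import json

payload = json.dumps(response.to_dict())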

meta data not added to object when instantiated...

Is this the intended behavior?

>>> a=Article(_id=3, title="Test")
>>> a.to_dict()
{'title': 'Test'}
>>> a.meta.to_dict()
{'id': 3}

The index metadata will only be added when the object is saved (or more probably when the object is updated after saving).

However, this prevents us from accessing this data directly before the object is saved.

As it is now, we have to call e.g.:

>>> a._doc_type.index
'blog'

Personally, I wouldn't mind having it accessible through a._index
before it is saved.
(Also, it would be nice that the doc_type metadata also be added to the meta field, but that's just my wishful thinking!)
What are your thoughts on this?

Using elasticsearch-dsl-py version 0.0.4 beta

Accessing and setting document metadata directly vs through `_meta`

If you define a "score" field on a DocType, the _meta.score attribute is set to the document value rather than the _score attribute of the ES hit. I think this is due to ObjectBase.__init__ calling setattr, and the __setattr__ of DocType making a special case for META_FIELDS (which include "score") to set attributes on _meta. I'd imagine this is the case for any document field with the same name as a meta field.

It seems like DocType shouldn't try to expose/set _meta attributes directly. Or at least document which field names are not allowed on a DocType and raise an error if you try to define one that conflicts with _meta.

The Persistence example does not give the expected result

Running the persistence example from the readme file (after small modifications, see #66), I checked the mappings of the blog index: they do not correspond to what was encoded in the example.

As a reminder, here is the Article class:

class Article(DocType):
    title = String(analyzer='snowball', fields={'raw': String(index='not_analyzed')})
    body = String(analyzer='snowball')
    tags = String(index='not_analyzed')
    published_from = Date()
    lines = Integer()

And here is the resultant mapping in my instance of elasticsearch:

{
  "blog" : {
    "mappings" : {
      "article" : {
        "properties" : {
          "body" : {
            "type" : "string"
          },
          "lines" : {
            "type" : "long"
          },
          "published_from" : {
            "type" : "date",
            "format" : "dateOptionalTime"
          },
          "tags" : {
            "type" : "string"
          },
          "title" : {
            "type" : "string"
          }
        }
      }
    }
  }
}

As you can see, all fields with type: string are analyzed, whereas the title should be a multi-field and the tags field should not be analyzed.

Is this normal behavior?

Using: Python 3.3, elasticsearch-py 1.3.0, elasticsearch-dsl-py 0.0.3

Query String query.

Am I going mad or is the Search().query('query_string', query=q) not supported?

I have looked at the source and it looks like it should be trivial to add.

I'll give it a go and send you a pull request.

Thanks,

Andy.

No clean way of creating a request with an empty fields list

I want to generate an elasticsearch query that doesn't return any fields, just the _id of the document. The recommended way to do this, as described in the ES fields documentation, is to pass an empty list as the 'fields' parameter like this:

{ 'query': { 'match_all': {} }, 'fields': [] }

AFAICT, this isn't currently supported by elasticsearch-dsl-py. When loading an existing query from a dictionary with 'fields' = [], it even converts that to a request that requests the whole document for each hit.

We can support this functionality pretty easily by making search.fields(None) generate a request for no fields.
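
For reference, later releases address this through Search.source(), which can disable _source retrieval entirely. A sketch (API availability depends on the version):

from elasticsearch_dsl import Search

s = Search().query('match_all')
s = s.source(False)
# s.to_dict() → {'query': {'match_all': {}}, '_source': False}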

Nested filter error

I am using the query DSL to create a nested filter

query = query.filter("nested", f.Bool(must=filter_variants), path="object.info.variants")

object.info.variants is an array of nested documents in my case

And I get this error at execution:
nested: QueryParsingException[[myindexname] [nested] filter does not support [filters]]

the nested part of the query generated by the DSL looks like this

{
  "nested": {
    "path": "object.info.variants",
    "filters": {
      "bool": {
        "must": [
          {
            "term": {
              "object.info.variant.purchase_status.notanalyzed": "IN_STOCK"
            }
          },
          {
            "terms": {
              "object.info.variants.search_sizes": [
                "8",
                "9"
              ]
            }
          }
        ]
      }
    }
  }
}

line 10 of filter.py

    if filters is not None:
        params['filters'] = filters

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-nested-filter.html#query-dsl-nested-filter
I changed "filters" to "filter" it is working for me.

I don't know if some filters are expecting "filters" instead of "filter"; maybe the new filters aggregation in ES 1.4: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-filters-aggregation.html

Aggregated Buckets return unexpected results when post_zone and min_doc_count =0 are passed

Dear Honza/ ES Team,

I am getting weird results when I use the date_histogram aggregation along with various params such as post_zone and min_doc_count=0: it breaks on the Feb month and, instead of returning the result from the 1st of the month, it returns the 28th of March instead of the 31st.

s = Search(using=es).index(settings.ES_INDEX).doc_type("my_doc_type")
s = s.filter('range', date_field={"from": date(2014, 11, 1)})
s.aggs.bucket('ranges', 'date_histogram', field='date_field', interval='month', post_zone="-5:30", min_doc_count=0)

{u'ranges': {u'buckets': [{u'doc_count': 143,
u'key': 1414780200000,
u'key_as_string': u'2014-10-31T18:30:00.000Z'},
{u'doc_count': 654,
u'key': 1417372200000,
u'key_as_string': u'2014-11-30T18:30:00.000Z'},
{u'doc_count': 0,
u'key': 1419964200000,
u'key_as_string': u'2014-12-30T18:30:00.000Z'},
{u'doc_count': 1494,
u'key': 1420050600000,
u'key_as_string': u'2014-12-31T18:30:00.000Z'},
{u'doc_count': 968,
u'key': 1422729000000,
u'key_as_string': u'2015-01-31T18:30:00.000Z'},
{u'doc_count': 0,
u'key': 1425148200000,
u'key_as_string': u'2015-02-28T18:30:00.000Z'},

Please look at the December month buckets.

Please let me know if i am doing something wrong here.

How to do a nested bool?

{
    "query": {
        "bool": {
            "must": [
                {
                     "term": {
                          "name1": "elasticsearch"
                     }
                },
                {
                    "term": {
                          "name2": "adskfjadj"
                     }
                },
                {
                     "bool": {
                          "should": [
                                 {
                                      "term": {
                                            "name": "kdsjfal"
                                       }
                                 },
                                 {
                                      "term": {
                                          "name": "asdlkfja"
                                       }
                                 }
                          ]
                     }
                }
            ]
       }
   }
}

Like above.. omg.. my hands are numb after typing that out..
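
With the DSL, the same query can be built without hand-writing the braces by composing Q objects explicitly. A sketch that mirrors the JSON above:

from elasticsearch_dsl import Q

q = Q('bool', must=[
    Q('term', name1='elasticsearch'),
    Q('term', name2='adskfjadj'),
    Q('bool', should=[
        Q('term', name='kdsjfal'),
        Q('term', name='asdlkfja'),
    ]),
])
# q.to_dict() reproduces the nested bool query above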
