
django-elasticsearch-dsl's People

Contributors

akspi, alexander3771, alexgmin, andriilahuta, arielpontes, barseghyanartur, chosak, cool-rr, dannyaziz, dannylagrouw, dependabot[bot], dsanders11, grendel7, hirokbiswas, juliensabre, markotibold, noamkush, odidev, oehrlein, paulogiacomelli, paulsmith, pysilver, rzschech, saadmk11, sabricot, safwanrahman, serkanozer, shauryashahi, sir-sigurd, tomfa


django-elasticsearch-dsl's Issues

Cannot serialize ImageFieldFile

Sorry to raise another issue. I tried looking through the source but could not find an easy way to alter how it treats Django's ImageFieldFile. I keep getting the following error:

elasticsearch.exceptions.SerializationError: ({'image': <ImageFieldFile: None>}, TypeError("Unable to serialize <ImageFieldFile: None> (type: <class 'django.db.models.fields.files.ImageFieldFile'>)",))

I tried setting the attr property to image.url, like below, but if a file does not exist Django raises an error.

image = fields.StringField(attr='image.url')
ValueError: The 'image' attribute has no file associated with it.

Do you know of any solutions to this?
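One possible workaround (a sketch, using the library's documented prepare_* hooks rather than attr='image.url'): guard against the missing file before touching .url. The helper below isolates that check; inside a document you would call it from a hypothetical prepare_image(self, instance) method as return image_url_or_none(instance.image).

```python
def image_url_or_none(image):
    """Return the file's URL when one is attached, else None.

    An ImageFieldFile with no file raises ValueError on .url, so we
    check .name (empty when no file is associated) first.
    """
    if image is None or not getattr(image, 'name', ''):
        return None
    return image.url
```

Elasticsearch serializes None without complaint, so the document stays indexable for rows with no image.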

Support for I18N

It would be really nice if I18N was supported for multilingual sites.

Deleting a nested object deletes the root object in index

Hi,

I have an index with nested objects (relationships), something like this:

@myindex.doc_type
class MyIndex(DocType):
    data = fields.NestedField(properties={
        'prop1': fields.StringField(),
        'prop2': fields.DateField(),
        [...]
    })

    class Meta:
        model = MyObject
        fields = [...]
        related_models = [Data]

    def get_instances_from_related(self, related_instance):
        return related_instance.myobject

Whenever a "data" object is inserted or updated in the database, the "MyObject" index is updated. But when I delete a "data" object, the entire "MyObject" document is deleted from the index instead of being updated with the "data" object removed.

Any ideas?

Thanks.

Suggestion: improvement on obtaining models from documents

I'd suggest to have some public API that allows to obtain model information from the search hit document. While it's totally possible to have all necessary data in ElasticSearch alone, sometimes it's necessary (or just preferable) to always fetch the fresh data from the main database.

A few ideas to consider:

  • Model PK is currently accessible as hit.meta["id"]. Looks reasonable, but an id or pk property on the document instance would be nicer.
  • The only ways I found to get the associated model class were hit.get_queryset().model and hit._doc_type.model. The former may do extra unnecessary work (we don't need a queryset, just the model, and get_queryset can do extra work). The latter is a private API, so it's a bad idea to rely on it.
    I believe it would be nice to have something like a get_model_class method on DocType subclass instances that returns whatever was declared in the document's Meta.model.
  • It would also be nice to have convenience methods that fetch the associated database model instances from a search result. Something along the lines of hit.get_model_instance() to fetch a single element, or search_results.get_models() to return an iterable or a queryset of all matching models (using pk__in lookups).

Currently, I'm using this mixin with all my document classes:

# Can't make this subclass of DocType because of metaclass
# (Do you mind moving things to some DocTypeBase that document mixins can subclass?)
class ModelDocMixin(object):
    """
    This mixin provides some utility function for documents,
    simplifying access to related model information.
    """
    @property
    def pk(self):
        return self.meta["id"]

    def get_model_class(self):
        # return self.get_queryset().model
        return self._doc_type.model

    def get_model_instance(self):
        return self.get_queryset().get(pk=self.pk)

And a pair of helper functions to transform Response (or any iterable) to an iterable of model instances - for cases when it's necessary to fetch data from DB.

However, I believe, such things would be good to have built-in. What do you think?

Bulk save

First off let me say this is an awesome python package. I'm trying to import 12 million records from a Heroku Postgres instance into Elastic Cloud, but it stops immediately due to an out of memory error. I think it's due to the way the normal manage.py search_index --rebuild works. Is there a way to easily implement a bulk save and batch the items or limit the memory use?
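One way to cap memory during a full rebuild (a sketch, not an existing option of the package) is to stream the table in bounded primary-key ranges instead of materializing the whole queryset. The helper below is pure Python; in Django you would feed it Model.objects.values_list('pk', flat=True) and fetch each range with filter(pk__range=(start, end)), bulk-indexing one chunk at a time.

```python
def pk_chunks(pks, size):
    """Yield (start_pk, end_pk) ranges covering the pk list in sorted
    order, so each range can be fetched with a bounded query and only
    `size` rows are ever held in memory at once."""
    pks = sorted(pks)
    for i in range(0, len(pks), size):
        chunk = pks[i:i + size]
        yield chunk[0], chunk[-1]
```

The pk list itself (one integer per row) is far cheaper to hold than 12 million full model instances.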

search_index populate ignores prefetch_related

In most of my DocType classes, I extended get_queryset to add select_related and prefetch_related clauses to speed up indexing. The select_related clauses work fine, but since populate takes the iterator of the queryset, the prefetch_related clauses are ignored.

To help speed up indexing, it would help a lot if an alternative way to process the models could be used (like using pagination?).

Source: https://docs.djangoproject.com/en/1.8/ref/models/querysets/#django.db.models.query.QuerySet.iterator

Automatic indexing when object created

Hi,

Great work on this project.

I am having trouble with automatically indexing newly created objects with groupby.

Every time a new model is created, it should index it.

However, it only indexes the "title" value. It does not index the "tag".

So, I created a manual function manual_index. When I run that, the "tag" value does get added to the index. However this process has to be manually triggered.

How can I get the "tag" value to save every time a new object is created?

Search.py

class TaskIndex(DocType):
    title = String()
    class Meta:
        index = 'task-index'

def manual_index():
    TaskIndex.init()
    es = Elasticsearch()
    bulk(client=es, actions=(b.indexing() for b in models.Task.objects.all().iterator()))

Models.py

from itertools import groupby

class Tag(models.Model):
    name = models.CharField("Name", max_length=5000, blank=True)
    taglevel = models.IntegerField("Tag level", null=True, blank=True)

class Item(models.Model):
    title = models.CharField("Title", max_length=10000, blank=True)
    tag = models.ManyToManyField('Tag', blank=True)

    def get_grouped_tags(self):
        tag = self.tag.order_by('taglevel')
        grouped_tags = {
            tag_level: [
                { 'name': tag_of_level.name, 'taglevel': tag_of_level.taglevel, }
                for tag_of_level in tags_of_level
            ] for tag_level, tags_of_level
            in groupby(tag, lambda tag: tag.taglevel)
        }
        return grouped_tags

    def indexing(self):
        obj = TaskIndex(
            meta={'id': self.id},
            title=self.title,
            tag=self.get_grouped_tags(),
        )
        obj.save()
        return obj.to_dict(include_meta=True)

TypeError on new ArrayField with document ListField

I Just added a new postgres array field to a table and specified the document field as a list
contacts = ArrayField(models.CharField(max_length=13, blank=True), size=3, null=True)

and
'contacts': fields.ListField(fields.StringField()),

while rebuilding the index I get TypeError: 'NoneType' object is not iterable error
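A likely cause is the null=True on the ArrayField: rows created before the migration hold NULL, and the ListField then tries to iterate None. One hedged fix is a prepare_contacts hook (prepare_* methods are supported by the library) that normalizes NULL to an empty list:

```python
def contacts_or_empty(value):
    """Normalize a NULL ArrayField value to [] so a ListField can
    iterate it during indexing."""
    return list(value) if value is not None else []
```

Inside the document this would be, roughly, def prepare_contacts(self, instance): return contacts_or_empty(instance.contacts).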

Add option whether always to refresh

Currently, Django Elasticsearch DSL forces an index refresh on every update. While this ensures data is available for search immediately, it also generates high server load in a write-intensive environment, and just wastes server power if being able to search that fast is not a priority.

While the DocType.update method has a refresh argument, most refreshes are triggered by signals, which do not support passing that option.

A global option and/or an option per doc type (meta class?) would help prevent unnecessary load on the Elasticsearch servers.

How do you map a ManyToMany field to a DocType class?

My model has a few foreign keys/manytomany fields but I'm not entirely sure how to handle it with elasticsearch-dsl.

class HouseIndex(DocType):
    house_type = String()
    #people
    sold = Boolean()
    built_datetime = Date()
    #alerts
    # associated_locations
    hash = String()


class House(models.Model):
    house_type = models.ForeignKey(HouseType, db_index=True,
                                   on_delete=models.CASCADE)
    people = models.ManyToManyField(to='Person', db_index=True,
                                      through='PersonToHouseMap')
    sold = models.BooleanField(default=False)
    built_datetime = models.DateTimeField()
    alerts = models.ManyToManyField(Alert)
    associated_locations = models.ManyToManyField(to='Location')
    hash = models.CharField(max_length=64, unique=True, null=True)
    objects = HouseManager()

But I'm not sure what to do when it's a ManyToMany field, such as with people, alerts, and associated_locations. Any guidance would be appreciated.
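One common approach (a sketch; the Person attributes id and name are assumptions) is to index each many-to-many relation as an ObjectField or NestedField whose value is a list of dicts, produced by a prepare_* method. The shaping itself is plain Python:

```python
def people_as_docs(people):
    """Shape related objects into the list-of-dicts form that an
    ObjectField/NestedField stores for a many-to-many relation."""
    return [{'id': p.id, 'name': p.name} for p in people]
```

In the document this would pair with something like people = fields.ObjectField(properties={'id': fields.IntegerField(), 'name': fields.StringField()}) and def prepare_people(self, instance): return people_as_docs(instance.people.all()).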

Reverse relation does not update document

In the #handle-relationship-with-nestedfieldobjectfield section of the README, we can see a good example with three models: Car, Manufacturer and Ad. related_models makes sure that the Car document will be re-saved when its Manufacturer is updated.

Now I am struggling to make the Car re-saved when Ad is updated. It would be great if this is possible.

In other words, to trigger Car updates not only by models that Car has foreign key to, but by models that have Car as a foreign key.

How to filter results according to user token

Great work on this repo :)

I am trying to make my elastic search secure. I want it to only show results if the owner token matches the Task object.

I made an attempt, modeled on how it works in Django REST framework, but had no success.

What is the correct way to implement owner/token filtering? Thanks

I am accessing the results via:

http://localhost:9200/_search

Models.py

class Task(models.Model):
    title = models.CharField("Title", max_length=10000, blank=True)
    owner = models.ForeignKey('auth.User', blank=True, null=True)

Search.py

from rest_framework import filters

# Create a connection to ElasticSearch
connections.create_connection()

class OwnerFilterBackend(filters.BaseFilterBackend):
    def filter_queryset(self, request, queryset, view):
        return queryset.filter(owner=request.user)

class TaskIndex(DocType):
    title = String()
    filter_backends = (OwnerFilterBackend,)
    class Meta:
        index = 'task-index'
 
 
def bulk_indexing():
    TaskIndex.init()
    es = Elasticsearch()
    bulk(client=es, actions=(b.indexing() for b in models.Task.objects.all().iterator()))
 
 
def _search(title):
    s = Search().filter('term', title=title.text)
    response = s.execute()
    return response

Without success, I have also tried:

def _search(title):
    s = Search().query('bool', must=[ 
        Q('term', title=title.text),
        Q('match', owner=user.pk),
    ])
    return s.execute()
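DRF filter backends operate on Django querysets, not on the Search object, which is why the attempt above has no effect. One approach (a sketch; it assumes the owner's id is indexed on the document as an owner_id field, which the snippets above do not yet do) is to put the restriction into the Elasticsearch query itself so it is enforced server-side:

```python
def owned_search_body(field, text, owner_id):
    """Request body combining the user's search term with a mandatory
    server-side filter on the indexed owner id."""
    return {
        'query': {
            'bool': {
                'must': [{'match': {field: text}}],
                'filter': [{'term': {'owner_id': owner_id}}],
            }
        }
    }
```

With elasticsearch-dsl this corresponds roughly to Search().query('match', title=text).filter('term', owner_id=request.user.pk). Note that exposing port 9200 (http://localhost:9200/_search) directly to clients bypasses any such filtering; searches should go through the Django view.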

Analyzer is not being applied to fields?

contracts = Index('contracts')
my_analyzer = analyzer('simple')

contracts.analyzer(my_analyzer)


@contracts.doc_type
class ContractDocument(DocType):
    client = fields.StringField(attr='client_name')

    class Meta:
        model = Contract

        fields = [
            'id',
            'name'
        ]

I am trying to apply the simple analyzer to my fields. But when I call termvectors after running search_index, I see that the standard analyzer was applied.

How can I apply the simple analyzer to all fields?

I have tons of fields and don't want to declare them by hand. Is the only solution to create an ES field for each model field? How can I declare an analyzer for the fields in Meta.fields, or how can I modify my class to do this?

Can't get the queryset with to_queryset

  • django-elasticsearch-dsl==0.4.4
  • Django==1.10.7
  • elasticsearch==2.3.0
  • elasticsearch-dsl==2.0.0
s = Document.search().filter('match', name='test')[:30]
s.to_queryset()

Raise this error

AttributeError: 'Search' object has no attribute '_source'

s.__dict__ returns:

{'_doc_type': ['buyer_alias_document'],
 '_doc_type_map': {'buyer_alias_document': <bound method DocTypeMeta.from_es of <class 'jurismarches.buyers.documents.BuyerAliasDocument'>>},
 '_extra': {},
 '_fields': None,
 '_highlight': {},
 '_highlight_opts': {},
 '_index': ['buyer_alias'],
 '_model': <class 'jurismarches.buyers.models.BuyerAlias'>,
 '_params': {},
 '_partial_fields': {},
 '_post_filter_proxy': <elasticsearch_dsl.search.QueryProxy object at 0x7fb285a9c588>,
 '_query_proxy': <elasticsearch_dsl.search.QueryProxy object at 0x7fb285a9c550>,
 '_response_class': <class 'elasticsearch_dsl.result.Response'>,
 '_script_fields': {},
 '_sort': [],
 '_suggest': {},
 '_using': 'default',
 'aggs': AggsProxy()}

Auto-indexing for nested fields

Hi,

When I update a nested field value, the change is not being indexed in elastic search.

I have followed your example, but had no success with making the "Tag" become indexed when its value changes.

Could you please give an example of how this is done?

class Car(models.Model):
    name = models.CharField()
    manufacturer = models.ManyToManyField('Tag')

class Tag(models.Model):
    name = models.CharField()
    def tags(self):
        return self.tag_set.all()
car = Index('cars')


@car.doc_type
class CarDocument(DocType):

    tag = NestedField(properties={
        'name': StringField(),
    })

    class Meta:
        model = Car
        fields = [
            'name',
        ] 

How to insert or update data?

I want to insert data like this:

curl -XPOST http://localhost:9200/test/question_document/1533 -d'
{"title":"my title"}
'

Can I insert data using a Document?

[Showerthought] Use virtual indexes for zero-downtime rebuilds?

Right now, when you rebuild an index, the index is nuked first, then rebuilt from scratch. During this reindexing process, any searches to the index might fail.

Instead, you could use "virtual indexes" to perform a rebuild without downtime. By that, I mean that you create a real index with a different name, e.g. index_name.<timestamp>. You can then create an alias named index_name and point it at the real index.

When rebuilding the index, you could create a new index in the background, populate it, then switch the aliases over. That way, the application can still use the old index while the new index is being created.

Most Elasticsearch applications I know use something like this and I'm willing to contribute something similar to this project. However, before I do that, I would like to know whether this is a desirable feature to have or whether it's unnecessary complexity for a generic library.
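The alias flip at the heart of this proposal maps to a single atomic _aliases call. A sketch of the action payload (index names are illustrative):

```python
def alias_swap_actions(alias, new_index, old_index=None):
    """Build the body for POST /_aliases that atomically repoints
    `alias` from the old physical index to the freshly populated one."""
    actions = []
    if old_index:
        actions.append({'remove': {'index': old_index, 'alias': alias}})
    actions.append({'add': {'index': new_index, 'alias': alias}})
    return {'actions': actions}
```

With the official client this would be sent via client.indices.update_aliases(body=...); because both actions land in one request, searches against the alias never observe a missing index.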

NestedField seems to be broken

Boiled down to its essential part, my problem is that I can't get NestedFields to work.
At first I thought the problem was in my code, but after copying and pasting the example from the documentation I still get the same error:

django_elasticsearch_dsl.exceptions.VariableLookupError: Failed lookup for key [postsecondhand] in <Post: Post object>

(these are my models but the error is the same. Just change Post with Car and postsecondhand with ads)

Any idea?

prepare_field does not seem to work

I have in my documents.py:

@vs_entry.doc_type
class VSEntryDocument(DocType):
    class Meta:
        model = VeggieSailorEntry # The model associate with this DocType
        location = fields.GeoPointField(attr="get_location")
        #location = fields.GeoPointField(lat_long=True)

        def prepare_location(self, instance):
            return instance.get_location()
        # The fields of the model you want to be indexed in Elasticsearch
        fields = [
            'short_description',
            'level',
            'description',
            'name',
            'rating' 
           ]

and in my models.py:

    location = models.TextField(default="{'lon':0, 'lat':0}",null=False)
    def get_location(self):
        """Get the location - for the Haystack.
        """
        print (self.long, self.lat)
        if not self.long or not self.lat:
            return {'lon':0, 'lat':0}  
        return  {'lon':self.long, 'lat':self.lat}    

and later I am executing:

~/dev/repo/vegbasket/vegbasketapp on  devel! ⌚ 2:07:08
$ ./manage.py search_index --rebuild

Are you sure you want to delete the 'entries' indexes? [n/Y]: y
Deleting index 'entries'
Creating index 'entries'
Indexing 10184 'VeggieSailorEntry' objects
(vs3)
~/dev/repo/vegbasket/vegbasketapp on  devel! ⌚ 2:35:46
$ 

I am checking in Kibana available fields and location does not exist.

Is this feature working?
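The likely cause: in the snippet above, the location field and the prepare_location method are indented inside class Meta, so the library never registers them; they belong directly on the document class. The coordinate fallback itself is plain Python and can be kept as a helper (a sketch mirroring get_location from the issue):

```python
def location_dict(lon, lat):
    """GeoPoint payload for Elasticsearch; falls back to (0, 0) when
    either coordinate is missing, as in the issue's get_location."""
    if not lon or not lat:
        return {'lon': 0, 'lat': 0}
    return {'lon': lon, 'lat': lat}
```

On the document class (not inside Meta), this would back location = fields.GeoPointField(attr='get_location') or a prepare_location(self, instance) method.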

[Doubt] - List of Priority

Hi @sabricot

How are you?

I'd like to know if it is possible to provide a priority list (for example, user IDs) to influence search ranking.
For example: I type the keyword "Ana". First, the users matching "Ana" that are also in the priority list (the IDs) should be returned; after that, the remaining results matching only the keyword "Ana".

Thanks,
have a nice week,

Support for JSONField

Hi

I am trying to index Django's JSONField, but because its structure can change I am not sure of its properties. Is there any way I can avoid setting them when using an ObjectField? Or can you think of any other solutions?

Without setting properties like below the following happens

attributes = fields.ObjectField()
"attributes": [
  {
  },
  {
  }
],

Your package has saved me so much time, thanks.

Performance issue when populate millions documents

Problem

When we need to put a lot of documents in an index, we need the queryset_pagination meta option to paginate. Django pagination needs a queryset sorted with order_by (cf. docs), otherwise the same pk can appear more than once while others go missing (like #71).

Putting order_by on the queryset makes the Django paginator call it for each page. Calling order_by on a huge queryset (like 10 million rows) leads to a huge performance issue.

Temporary solution:

We can override the _get_actions method (from django_elasticsearch_dsl.documents.DocType) to not use the Django paginator when a queryset is passed. Moreover, because of the way a database index works, we should first fetch only pks, then issue sub-requests based on them.

from django.db.models.query import QuerySet


def _get_actions(self, object_list, action):
    if self._doc_type.queryset_pagination and isinstance(object_list, QuerySet):
        pks = object_list.order_by('pk').values_list('pk', flat=True)
        len_pks = len(pks)
        for start_pk_index in range(0, len_pks, self._doc_type.queryset_pagination + 1):
            end_pk_index = start_pk_index + self._doc_type.queryset_pagination
            if end_pk_index >= len_pks:
                end_pk_index = len_pks - 1
            ranged_qs = object_list.filter(pk__range=[
                pks[start_pk_index],
                pks[end_pk_index]
            ])
            for object_instance in ranged_qs:
                yield self._prepare_action(object_instance, action)
    else:
        yield from super()._get_actions(object_list, action)

Available to make the PR if needed.

ES 6.x support

Hi,

Just curious: what's the status on support for Elasticsearch 6.x?

Populating index

I'm populating an index with around 12 million records. For some reason, the process stopped at around 9 million records. I'm curious, what happens when I run python manage.py search_index --populate --models employee.Employee. Will it skip records that already exist in the index? Will it be a faster process to get to those remaining 3 million records? Or will it take the same amount of time since it needs to iterate through every record?

How to search for a part of a word?

I can't understand how to do this with this library. How do I make a query like the following?
input:

 curl -XGET 'localhost:9200/product-index/product_index/_search?q=skihkt*'                 

output

{"took":29,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":1.0,"hits":[{"_index":"product-index","_type":"product_index","_id":"10003","_score":1.0,"_source":{"product_name": "skihkthyfmnbntrptmvooimf"}}]}}
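The ?q=skihkt* URI search above is a wildcard query; the same request can be expressed as a request body (a sketch of the raw JSON shape):

```python
def wildcard_body(field, pattern):
    """Request body equivalent of a ?q=<pattern> URI search against one
    field: a wildcard query, where '*' matches any character sequence."""
    return {'query': {'wildcard': {field: {'value': pattern}}}}
```

With elasticsearch-dsl this should roughly correspond to Search().query('wildcard', product_name='skihkt*'). Be aware that leading-wildcard patterns are expensive; for prefix or search-as-you-type matching, an edge_ngram analyzer at index time is usually the better tool.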

Consider optionally using mocked Elasticsearch in tests

It would be nice to have an option to use mocked Elasticsearch if explicitly specified in settings (settings.ELASTICSEARCH_DSL_TEST == True).

So, if devs would like to skip elasticsearch tests, they would just patch the settings.

DocType is a bit hard to extend.

This is because of how DocTypeMeta creates an instance of DocType in its __new__ method. Creating an instance of a class inside its metaclass's __new__ should really not be happening.

If I create my own class that extends DocType like below

class OwnClass(DocType):
    def __init__(self, *args, **kwargs):
        super(OwnClass, self).__init__(*args, **kwargs)

This raises an error as soon as you try to import the class: that OwnClass does not exist. This is because we refer to OwnClass in super(OwnClass, self) before it has been created. It also means that, because an instance of the class is created before the class has been fully formed, referring in your __init__ to any of the attributes created after that line in __new__ will raise an error.

A way to programmatically create connection

My app runs on ec2 and I'm running ES on AWS Elasticsearch. So basically I want to create an http_auth signature using boto3 to pass into the Elasticsearch instance. Is there a class I can extend (similar to Elasticsearch2SearchBackend) to programmatically create the connection?

Dynamically generate ES index

I have a database model that needs to go to different ElasticSearch indexes depending on the payload provided for the model instance.

Looking through the documentation, it would appear that a single DocType class can only be linked to a single index (through the decorator). Is there currently any way to dynamically create the destination index for a model instance?

NestedField ads does not work in documents.py for the car model

Under CarDocument
ads = fields.NestedField(properties={
    'description': fields.StringField(analyzer=html_strip),
    'title': fields.StringField(),
    'pk': fields.IntegerField(),
})

But when the ad is saved, no record is inserted in the car type. Let me know what I am missing, or whether something is missing in the document itself.

If the default value of a field is a translation string, elasticsearch throws a serialization error.

Models

from django.utils.translation import ugettext_lazy as _
from django.db import models

class Account(models.Model):
    name = models.CharField(
        default=_('Account'),
    )

class Transaction(models.Model):
    dest_account = models.ForeignKey(Account, related_name='in_transactions')
    origin_account = models.ForeignKey(Account, related_name='out_transactions', null=True)

Document

class TransactionDocument(DocType):
    dest_account = NullObjectField(properties={
        'name': fields.TextField(),
    })

Code

Transaction.objects.create(
    dest_account=Account.objects.create()
)

Now, when an account is created using the default value, this happens

TypeError at /public/purchases
Unable to serialize u'Initial Account' (type: <class 'django.utils.functional.__proxy__'>)
Traceback:
...
File "/data/projects/project/lib/python2.7/site-packages/elasticsearch_dsl/serializer.py" in default
  11.         return super(AttrJSONSerializer, self).default(data)

File "/data/projects/project/lib/python2.7/site-packages/elasticsearch/serializer.py" in default
  34.         raise TypeError("Unable to serialize %r (type: %s)" % (data, type(data)))

Since the string translation hasn't been determined, the type is django.utils.functional.__proxy__
https://docs.djangoproject.com/en/dev/ref/unicode/#translated-strings

Adding a condition like this one fixes the problem, but I don't know where it should go.

from django.utils.functional import Promise
if isinstance(data, Promise):
    return str(data)

Querying documents from ES and populating django model to use in REST endpoints

Hi there, I have a Django REST application and need to query documents from Elasticsearch so that I can expose them. I was looking at this project to help me achieve what I am after; however, besides just querying the data through my documents module, I would have to deserialize the documents into my model and then hook that up to my REST views. Would you know how I could achieve this?
Thanks.

Using ObjectField/NestedField- Empty array

Hi,

I am trying to return data from a function, but the result is empty:

For example, the result is:

"name": "XYZ",
"tag" : [ { }, { } ],

But should be:

"name": "XYZ",
"tag": {1: [{'taglevel': 1, 'name': 'Foo'}], 2: [{'taglevel': 2, 'name': 'Bar'}]}

I have tried both the NestedField and ObjectField options. Both gave the same traceback (as per below).

I used pdb to debug, and I can see the correct result at return grouped_tags but it never arrives in elasticsearch.

If this isn't possible, please tell me, as I have spent weeks on this and have raised issues here and on StackOverflow.

Otherwise, how can I get the results from my function?

Thanks

documents.py

vehicle = Index('vehicle')

vehicle.settings(
    number_of_shards=1,
    number_of_replicas=0
)

@vehicle.doc_type
class VehicalDocument(DocType):
    tag = fields.ObjectField(attr="get_grouped_tags")
    class Meta:
        model = Vehicle
        fields = [
            'name'
        ] 

models.py

class Tag(models.Model):
    name = models.CharField("Name", max_length=5000, blank=True)
    taglevel = models.IntegerField("Tag level", null=True, blank=True)

class Vehicle(models.Model):
    title = models.CharField("Title", max_length=10000, blank=True)
    tag = models.ManyToManyField('Tag', blank=True)

    def get_grouped_tags(self):
        tag = self.tag.order_by('taglevel')
        grouped_tags = {
            tag_level: [
                { 'name': tag_of_level.name, 'taglevel': tag_of_level.taglevel, }
                for tag_of_level in tags_of_level
            ] for tag_level, tags_of_level
            in groupby(tag, lambda tag: tag.taglevel)
        }
        return grouped_tags

autofield mapping set to string, should be text

Just tested this wrapper after testing with the DSL. The default mapping of string from fields seems to be marked deprecated by Elasticsearch 5.6.5.

python manage.py search_index --rebuild

elasticsearch | [2017-12-19T14:12:28,061][INFO ][o.e.c.m.MetaDataDeleteIndexService] [0CginTA] [stories/0aaM3MqDRgGfiiDkCkz9LA] deleting index
elasticsearch | [2017-12-19T14:12:28,134][WARN ][o.e.d.i.m.StringFieldMapper$TypeParser] The [string] field is deprecated, please use [text] or [keyword] instead on [story_title]
elasticsearch | [2017-12-19T14:12:28,135][WARN ][o.e.d.i.m.StringFieldMapper$TypeParser] The [string] field is deprecated, please use [text] or [keyword] instead on [auto_uid]
elasticsearch | [2017-12-19T14:12:28,135][WARN ][o.e.d.i.m.StringFieldMapper$TypeParser] The [string] field is deprecated, please use [text] or [keyword] instead on [story_description]
elasticsearch | [2017-12-19T14:12:28,150][INFO ][o.e.c.m.MetaDataCreateIndexService] [0CginTA] [stories] creating index, cause [api], templates [], shards [1]/[0], mappings [story_document]

Handling Bulk Inserts

It'd be cool if it could keep the database and elasticsearch index in sync for bulk inserts. I noticed it wasn't a feature yet.

to_queryset method failed

Django==2.0
django-elasticsearch-dsl==0.4.3
elasticsearch==5.5.1
elasticsearch-dsl==5.4.0

elasticsearch engine 5.0.0./5.1.1

to_queryset failed with:

RequestError Traceback (most recent call last)
in ()
----> 1 s.to_queryset()

~/venvs/seostatistic/lib/python3.6/site-packages/django_elasticsearch_dsl/search.py in to_queryset(self, keep_order)
26 s = self.source(exclude=['*'])
27
---> 28 pks = [result._id for result in s]
29
30 qs = self._model.objects.filter(pk__in=pks)

~/venvs/seostatistic/lib/python3.6/site-packages/elasticsearch_dsl/search.py in __iter__(self)
265 Iterate over the hits.
266 """
--> 267 return iter(self.execute())
268
269 def __getitem__(self, n):

~/venvs/seostatistic/lib/python3.6/site-packages/elasticsearch_dsl/search.py in execute(self, ignore_cache)
637 doc_type=self._doc_type,
638 body=self.to_dict(),
--> 639 **self._params
640 )
641 )

~/venvs/seostatistic/lib/python3.6/site-packages/elasticsearch/client/utils.py in _wrapped(*args, **kwargs)
71 if p in kwargs:
72 params[p] = kwargs.pop(p)
---> 73 return func(*args, params=params, **kwargs)
74 return _wrapped
75 return _wrapper

~/venvs/seostatistic/lib/python3.6/site-packages/elasticsearch/client/__init__.py in search(self, index, doc_type, body, params)
630 index = '_all'
631 return self.transport.perform_request('GET', _make_path(index,
--> 632 doc_type, '_search'), params=params, body=body)
633
634 @query_params('_source', '_source_exclude', '_source_include',

~/venvs/seostatistic/lib/python3.6/site-packages/elasticsearch/transport.py in perform_request(self, method, url, params, body)
310
311 try:
--> 312 status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
313
314 except TransportError as e:

~/venvs/seostatistic/lib/python3.6/site-packages/elasticsearch/connection/http_urllib3.py in perform_request(self, method, url, params, body, timeout, ignore)
126 if not (200 <= response.status < 300) and response.status not in ignore:
127 self.log_request_fail(method, full_url, url, body, duration, response.status, raw_data)
--> 128 self._raise_error(response.status, raw_data)
129
130 self.log_request_success(method, full_url, url, body, response.status,

~/venvs/seostatistic/lib/python3.6/site-packages/elasticsearch/connection/base.py in _raise_error(self, status_code, raw_data)
123 logger.warning('Undecodable raw error response from server: %s', err)
124
--> 125 raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
126
127

RequestError: TransportError(400, 'illegal_argument_exception', 'Deprecated field [exclude] used, expected [excludes] instead')

Documentation doesn't explain how to do full-text search.

Absolutely all examples of searches in the documentation specify a field on which to perform the search. However, one of the popular features of Elasticsearch is full-text search. At least one example should be added to the documentation (if this package actually supports it, because it's not clear to me from the docs).
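For reference, full-text search across several fields without naming one is usually a multi_match query (a sketch of the request body; the field names are illustrative):

```python
def fulltext_body(text, fields=('title', 'body')):
    """multi_match request body: analyze `text` and score it against
    several fields at once instead of a single named field."""
    return {'query': {'multi_match': {'query': text, 'fields': list(fields)}}}
```

With elasticsearch-dsl this corresponds roughly to Search().query('multi_match', query='quick fox', fields=['title', 'body']).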

Rebuild index quietly from doctype

I want to test that some data is correctly created/updated in the index.
Before each test I need to rebuild the index by calling the command search_index --rebuild --models app.django_models -f

It would be great if DocType had a classmethod to rebuild its index like this:

@classmethod
def rebuild(cls):
    call_command('search_index', '--rebuild', '--models', 'app.django_model', '-f')

The management command should accept another argument, such as --quiet, to suppress output during tests.

The final function could be:

@classmethod
def rebuild(cls):
    call_command('search_index', '--rebuild', '--models', 'app.django_model', '-f', '--quiet')

What do you think? (I can do the PR if you want.)
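As a workaround that needs no new flag: Django's `call_command` accepts a `stdout` keyword, so output can be swallowed from the test side (assuming the command writes through `self.stdout`). The redirect idea in pure Python:

```python
import contextlib
import io

def run_quietly(func, *args, **kwargs):
    """Run func while swallowing anything it prints to stdout."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        result = func(*args, **kwargs)
    return result, buf.getvalue()

# Example: the printed text is captured instead of cluttering test output.
result, captured = run_quietly(print, "rebuilding index...")
```

In a Django test this becomes roughly `call_command('search_index', '--rebuild', '-f', stdout=StringIO())`.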

How to use this pkg without model.?

I have a use case in which my user details are stored in PostgreSQL. Now I want to expose data already present in Elasticsearch using DRF. No models needed. But I like the idea of using something like an ORM. Can you suggest something?
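Worth noting: django-elasticsearch-dsl is a thin wrapper around elasticsearch-dsl, and elasticsearch-dsl itself works fine without Django models — you can query an existing index ORM-style with `Search(index="users").query("match", name="alice")`. A hand-written mapping for such an index might look like this (the `users` index and its field names are assumptions about your data, not anything prescribed):

```python
# Mapping for a pre-existing "users" index, managed outside Django models.
users_mapping = {
    "mappings": {
        "properties": {
            "name": {"type": "text"},      # analyzed, full-text searchable
            "email": {"type": "keyword"},  # exact-match only
            "joined": {"type": "date"},
        }
    }
}
```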

Handle connection issues

Hey, I love your package

What if we set ELASTICSEARCH_DSL_AUTOSYNC to True and then the ES server goes down for whatever reason? Then all requests that touch the indexed documents will fail.

Any idea how to implement it differently? My current workaround is to disable ELASTICSEARCH_DSL_AUTOSYNC and run indexing from a cron job.
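One pattern that keeps autosync on is to guard the indexing call in a custom signal handler, so a down cluster degrades to a skipped (and later replayed) update instead of a failed request. A pure-Python sketch of the guard — the names are illustrative, not part of this package's API:

```python
def guarded_index(action, on_failure=None,
                  exceptions=(ConnectionError, TimeoutError)):
    """Run an indexing callable; swallow connectivity errors.

    Failed updates can be recorded via on_failure and replayed later,
    e.g. by a cron job or a Celery task.
    """
    try:
        return action()
    except exceptions:
        if on_failure is not None:
            on_failure()
        return None

# Example: the indexing call fails, the fallback records it for replay.
failed = []
def flaky_update():
    raise ConnectionError("cluster down")

guarded_index(flaky_update, on_failure=lambda: failed.append("retry-later"))
```

In practice `exceptions` would include `elasticsearch.exceptions.ConnectionError` when wrapping real client calls.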

Inherit model functions

My Django model (Employee) has functions such as get_absolute_url() that I would like to be available on my document model (EmployeeDocument). I was able to make this work by copying the function and placing it under the EmployeeDocument class, but that seems redundant. Can you think of a clean way to inherit all the functions from the primary model into the document model?
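Two approaches come to mind. If only the value is needed, it can be indexed directly, e.g. with something like `url = fields.TextField(attr='get_absolute_url')` so the URL is stored in the document at index time. Otherwise, a small delegation wrapper forwards unknown attribute lookups to the model instance — sketched here in plain Python with a stand-in `Employee` class:

```python
class Employee:
    """Stand-in for the Django model (hypothetical)."""
    def __init__(self, pk):
        self.pk = pk

    def get_absolute_url(self):
        return f"/employees/{self.pk}/"

class ModelDelegateMixin:
    """Forward unknown attribute lookups to a wrapped model instance."""
    def __init__(self, instance):
        self._instance = instance

    def __getattr__(self, name):
        # __getattr__ only fires when normal lookup fails, so the
        # wrapper's own attributes still win over the model's.
        return getattr(self._instance, name)

doc = ModelDelegateMixin(Employee(pk=7))
```

Mixing such a `__getattr__` into the document class gives every model method "for free", at the cost of needing the model instance at hand.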

Accessing nested fields via a function (M2M)

Great work on this!

I'm having trouble with my nested data.

Due to my front end, my data has to stay in this layout/format. The data (what get_grouped_tags() returns) will look like this:

    "tag" : {
      "1" : [ {
        "taglevel" : 1,
        "name" : "Foo"
      } ],
      "2" : [ {
        "taglevel" : 2,
        "name" : "Bazz"
      } ]
    },

I tried this, which raises an "illegal_argument_exception":

documents.py

from django_elasticsearch_dsl import DocType, Index, fields
from .models import Item

item = Index('items')

item.settings(
    number_of_shards=1,
    number_of_replicas=0
)

@item.doc_type
class ItemDocument(DocType):
    tag = fields.StringField(attr="get_grouped_tags")

    class Meta:
        model = Item
        fields = [
            'typetask',
            'title',
        ] 

models.py

from itertools import groupby

from django.db import models

class Tag(models.Model):
    name = models.CharField("Name", max_length=5000, blank=True)
    taglevel = models.IntegerField("Tag level", null=True, blank=True)

    def to_search(self):
        queryset = Item.objects.none()
        if self.id:
            queryset = Item.objects.filter(tag=self.id)
            for obj in queryset:
                obj.save()  # re-save to trigger re-indexing
        return queryset

class Item(models.Model):
    title = models.CharField("Title", max_length=10000, blank=True)
    tag = models.ManyToManyField('Tag', blank=True)

    def get_grouped_tags(self):
        tag = self.tag.order_by('taglevel')
        grouped_tags = {
            tag_level: [
                { 'name': tag_of_level.name, 'taglevel': tag_of_level.taglevel, }
                for tag_of_level in tags_of_level
            ] for tag_level, tags_of_level
            in groupby(tag, lambda tag: tag.taglevel)
        }
        return grouped_tags
  • I tried tag = fields.NestedField(attr="get_grouped_tags"), but the tags field just comes up empty in the index.

  • I tried:

    tag = fields.NestedField(properties={
        'name': fields.StringField(),
    })

But that raises a KeyError for manager (traceback: http://dpaste.com/35D46HP). Also, even if this did work, the data would still need to be presented as it looks in my function.

How do I properly access my nested data from an M2M field?

Thanks

PS:

It would be good to tell newbies how to install it: pip install git+https://github.com/sabricot/django-elasticsearch-dsl.git
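On the nested-data question above: django-elasticsearch-dsl lets a document define `prepare_<field>` methods, and since the grouped structure is a dict keyed by tag level (not a homogeneous list), an `ObjectField` is a better fit than `NestedField` — roughly `tag = fields.ObjectField()` plus `def prepare_tag(self, instance): return instance.get_grouped_tags()` on the document class. The grouping itself, extracted into a standalone function on plain dicts so it runs without Django:

```python
from itertools import groupby

def group_tags(tags):
    """Group tag dicts by taglevel — mirrors Item.get_grouped_tags()."""
    tags = sorted(tags, key=lambda t: t["taglevel"])  # groupby needs sorted input
    return {
        level: list(group)
        for level, group in groupby(tags, key=lambda t: t["taglevel"])
    }

grouped = group_tags([
    {"name": "Bazz", "taglevel": 2},
    {"name": "Foo", "taglevel": 1},
])
```

This is a sketch under the assumptions above; whether Elasticsearch's dynamic mapping accepts the numeric-string keys depends on your index settings.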
