django-es / django-elasticsearch-dsl
This is a package that allows indexing of Django models in Elasticsearch with elasticsearch-dsl-py.
License: Other
Sorry to raise another issue. I looked through the source but could not find anything I could easily alter to change how it treats Django's ImageFieldFile. I keep getting the following error:
elasticsearch.exceptions.SerializationError: ({'image': <ImageFieldFile: None>}, TypeError("Unable to serialize <ImageFieldFile: None> (type: <class 'django.db.models.fields.files.ImageFieldFile'>)",))
I tried setting the attr property to image.url like below, but when no file is associated Django raises an error.
image = fields.StringField(attr='image.url')
ValueError: The 'image' attribute has no file associated with it.
Do you know of any solutions to this?
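One workaround (a sketch, not a documented fix) is a prepare_image hook on the Document class: accessing .url on a FieldFile with no file raises ValueError, so catch it and index None instead. Shown here as a plain function with stub classes standing in for the Django objects so it is self-contained:

```python
class FakeFieldFile:
    """Stand-in for django.db.models.fields.files.ImageFieldFile."""
    def __init__(self, name=None):
        self.name = name

    @property
    def url(self):
        # Mirrors Django's behaviour: no file -> ValueError.
        if not self.name:
            raise ValueError("The attribute has no file associated with it.")
        return "/media/" + self.name


def prepare_image(instance):
    # In a real Document this would be a method: def prepare_image(self, instance).
    try:
        return instance.image.url
    except ValueError:
        return None


class FakeModel:
    def __init__(self, image):
        self.image = image


print(prepare_image(FakeModel(FakeFieldFile("pic.jpg"))))  # /media/pic.jpg
print(prepare_image(FakeModel(FakeFieldFile())))           # None
```

With a field declared as `image = fields.StringField()`, django-elasticsearch-dsl should pick up a `prepare_image` method on the document automatically.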
It would be really nice if I18N was supported for multilingual sites.
Hi,
I have an index with nested objects (relationships), something like this:
@myindex.doc_type
class MyIndex(DocType):
    data = fields.NestedField(properties={
        'prop1': fields.StringField(),
        'prop2': fields.DateField(),
        [...]
    })

    class Meta:
        model = MyObject
        fields = [...]
        related_models = [Data]

    def get_instances_from_related(self, related_instance):
        return related_instance.myobject
Whenever a "data" object is inserted or updated in the database, the "MyObject" index is updated. But when I delete a "data" object, the entire "MyObject" document is deleted from the index instead of being updated with the "data" object removed.
Any idea?
Thanks.
I'd suggest having some public API that allows obtaining model information from the search hit document. While it's totally possible to have all necessary data in Elasticsearch alone, sometimes it's necessary (or just preferable) to always fetch fresh data from the main database.
A few ideas to consider:
- hit.meta["id"]. Looks reasonable, but an id or pk property on the document instance would be nicer.
- hit.get_queryset().model and hit._doc_type.model. The former may do some extra unnecessary work (we don't need a queryset, just a model, and get_queryset can do extra work). The latter is a private API, so it's a bad idea to rely on it.
- A get_model_class method on DocType subclass instances, that would return whatever was declared in the document's Meta.model.
- hit.get_model_instance() to fetch a single element, or search_results.get_models() to return an iterable or a queryset of all matching models (using pk__in lookups).

Currently, I'm using this mixin with all my document classes:
# Can't make this a subclass of DocType because of its metaclass
# (Do you mind moving things to some DocTypeBase that document mixins can subclass?)
class ModelDocMixin(object):
    """
    This mixin provides some utility functions for documents,
    simplifying access to related model information.
    """
    @property
    def pk(self):
        return self.meta["id"]

    def get_model_class(self):
        # return self.get_queryset().model
        return self._doc_type.model

    def get_model_instance(self):
        return self.get_queryset().get(pk=self.pk)
And I have a pair of helper functions to transform a Response (or any iterable) into an iterable of model instances, for cases when it's necessary to fetch data from the DB.
However, I believe such things would be good to have built in. What do you think?
Hey,
In some cases it can be useful to disable the automatic indexing, for example when loading lots of fixtures. Or in our case when testing the success of fixture-loading in a CI environment with no ES backend.
Would you appreciate a pull request that implements such a flag?
First off let me say this is an awesome python package. I'm trying to import 12 million records from a Heroku Postgres instance into Elastic Cloud, but it stops immediately due to an out of memory error. I think it's due to the way the normal manage.py search_index --rebuild
works. Is there a way to easily implement a bulk save and batch the items or limit the memory use?
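One lever worth trying (mentioned in another issue on this tracker) is the queryset_pagination option on the document's Meta, which makes indexing walk the queryset in batches rather than materializing everything at once. A sketch, assuming an EmployeeDocument for the employee.Employee model; the batch size is a tuning knob, not a prescription:

```python
class EmployeeDocument(DocType):
    class Meta:
        model = Employee
        # Index in batches of 5000 rows instead of one giant pass,
        # which should keep memory bounded during --rebuild.
        queryset_pagination = 5000
```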
In most of my DocType classes, I extended get_queryset
to add select_related and prefetch_related clauses to speed up indexing. The select_related clauses work fine, but since populate takes the iterator of the queryset, the prefetch_related clauses are ignored.
To help speed up indexing, it would help a lot if an alternative way to process the models could be used (pagination, for example?).
Source: https://docs.djangoproject.com/en/1.8/ref/models/querysets/#django.db.models.query.QuerySet.iterator
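Until something like that lands, one workaround (a sketch, not the package's API) is to walk the queryset in pk batches, so that each batch is a fresh queryset on which prefetch_related still fires. The batching helper itself is plain Python:

```python
def chunks(seq, size):
    """Yield successive slices of `seq` of at most `size` items."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


# Hypothetical usage inside a Document subclass (names are illustrative):
#
#   pks = list(self.get_queryset().values_list('pk', flat=True))
#   for batch in chunks(pks, 1000):
#       # A fresh queryset per batch, so prefetch_related applies here:
#       for obj in self.get_queryset().filter(pk__in=batch):
#           ...

print(list(chunks([1, 2, 3, 4, 5], 2)))  # [[1, 2], [3, 4], [5]]
```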
Hi,
Great work on this project.
I am having trouble with automatically indexing newly created objects with groupby.
Every time a new model is created, it should index it.
However, it only indexes the "title" value. It does not index the "tag".
So, I created a manual function manual_index
. When I run that, the "tag" value does get added to the index. However this process has to be manually triggered.
How can I get the "tag" value to save every time a new object is created?
Search.py
class TaskIndex(DocType):
    title = String()

    class Meta:
        index = 'task-index'

def manual_index():
    TaskIndex.init()
    es = Elasticsearch()
    bulk(client=es, actions=(b.indexing() for b in models.Task.objects.all().iterator()))
Models.py
from itertools import groupby

class Tag(models.Model):
    name = models.CharField("Name", max_length=5000, blank=True)
    taglevel = models.IntegerField("Tag level", null=True, blank=True)

class Item(models.Model):
    title = models.CharField("Title", max_length=10000, blank=True)
    tag = models.ManyToManyField('Tag', blank=True)

    def get_grouped_tags(self):
        tag = self.tag.order_by('taglevel')
        grouped_tags = {
            tag_level: [
                {'name': tag_of_level.name, 'taglevel': tag_of_level.taglevel}
                for tag_of_level in tags_of_level
            ]
            for tag_level, tags_of_level in groupby(tag, lambda tag: tag.taglevel)
        }
        return grouped_tags

    def indexing(self):
        obj = TaskIndex(
            meta={'id': self.id},
            title=self.title,
            tag=self.get_grouped_tags()
        )
        obj.save()
        return obj.to_dict(include_meta=True)
I just added a new Postgres array field to a table and specified the document field as a list:
contacts = ArrayField(models.CharField(max_length=13, blank=True), size=3, null=True)
and
'contacts': fields.ListField(fields.StringField()),
While rebuilding the index I get a TypeError: 'NoneType' object is not iterable error.
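One way around this (an assumption about the cause, not a confirmed fix) is a prepare_contacts hook that normalizes None to an empty list before serialization, since a null=True ArrayField comes back as None. Shown as a plain function with a stub instance so it runs standalone:

```python
def prepare_contacts(instance):
    # In a real Document this would be: def prepare_contacts(self, instance).
    # A null ArrayField is None, which the serializer cannot iterate,
    # so fall back to an empty list.
    return instance.contacts or []


class FakeRow:
    """Stand-in for a model instance with the array field."""
    def __init__(self, contacts):
        self.contacts = contacts


print(prepare_contacts(FakeRow(None)))              # []
print(prepare_contacts(FakeRow(["+123", "+456"])))  # ['+123', '+456']
```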
When indexing many entries (entire table reindex for example), support for batch data updates is crucial.
Django-haystack has support for this http://django-haystack.readthedocs.io/en/master/management_commands.html#rebuild-index
Here's how they do it:
https://github.com/django-haystack/django-haystack/blob/d69d4a152f7acf0bf69ab00a3e7bd11c8421e8f0/haystack/management/commands/update_index.py#L269
Currently, Django Elasticsearch DSL forces an index refresh on every update. While this ensures data is available for search immediately, it also generates a high server load in a write intensive environment, and is just wasting good server power if being able to search that fast is not a priority.
While the DocType.update method has a refresh argument, most refreshes are triggered by signals, which do not support passing that option.
A global option and/or a per-doc-type option (meta class?) would help prevent unnecessary load on the Elasticsearch servers.
My model has a few foreign keys/manytomany fields but I'm not entirely sure how to handle it with elasticsearch-dsl.
class HouseIndex(DocType):
    house_type = String()
    # people
    sold = Boolean()
    built_datetime = Date()
    # alerts
    # associated_locations
    hash = String()

class House(models.Model):
    house_type = models.ForeignKey(HouseType, db_index=True,
                                   on_delete=models.CASCADE)
    people = models.ManyToManyField(to='Person', db_index=True,
                                    through='PersonToHouseMap')
    sold = models.BooleanField(default=False)
    built_datetime = models.DateTimeField()
    alerts = models.ManyToManyField(Alert)
    associated_locations = models.ManyToManyField(to='Location')
    hash = models.CharField(max_length=64, unique=True, null=True)
    objects = HouseManager()
But I'm not sure what to do when it's a ManyToManyField, such as with people, alerts, and associated_locations. Any guidance would be appreciated.
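A common pattern (a sketch, assuming a flat list of dicts is an acceptable shape) is to pair a NestedField or ObjectField with a prepare_<field> hook that serializes the related manager. The core of the hook is plain Python, shown here with stubs standing in for the Django objects:

```python
def prepare_people(instance):
    # In a real Document: def prepare_people(self, instance).
    # Serialize each related Person into the dict shape that the
    # NestedField's `properties` declares.
    return [{"id": p.pk, "name": p.name} for p in instance.people.all()]


# Stubs standing in for Django's model instance and related manager:
class FakePerson:
    def __init__(self, pk, name):
        self.pk, self.name = pk, name


class FakeManager:
    def __init__(self, items):
        self._items = items

    def all(self):
        return self._items


class FakeHouse:
    def __init__(self, people):
        self.people = FakeManager(people)


house = FakeHouse([FakePerson(1, "Ada"), FakePerson(2, "Bob")])
print(prepare_people(house))
```

The same shape works for alerts and associated_locations; on the document side the matching declaration would be something like `people = fields.NestedField(properties={'id': fields.IntegerField(), 'name': fields.StringField()})`.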
In the #handle-relationship-with-nestedfieldobjectfield section of the README, we can see good example with three models: Car, Manufacturer and Ad. related_models makes sure that the Car will be re-saved when Manufacturer is updated.
Now I am struggling to get the Car re-saved when an Ad is updated. It would be great if this were possible.
In other words, I want to trigger Car updates not only from models that Car has a foreign key to, but also from models that have a foreign key to Car.
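In principle the same related_models machinery can be pointed the other way: list Ad in related_models and have get_instances_from_related return the Car an Ad belongs to. A minimal sketch of the dispatch, with stub classes standing in for the README's models so it runs standalone:

```python
class Manufacturer:
    """Stand-in for the README's Manufacturer model (maps to many Cars)."""
    def __init__(self, cars):
        self.cars = cars


class Ad:
    """Stand-in for the README's Ad model (foreign key to one Car)."""
    def __init__(self, car):
        self.car = car


def get_instances_from_related(related_instance):
    # In CarDocument this is a method, with related_models = [Manufacturer, Ad]
    # in Meta (names assumed from the README example). It must handle every
    # sender listed there: a Manufacturer maps to many Cars, while an Ad
    # maps back to its single Car.
    if isinstance(related_instance, Ad):
        return related_instance.car
    if isinstance(related_instance, Manufacturer):
        return related_instance.cars
```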
Great work on this repo :)
I am trying to make my elastic search secure. I want it to only show results if the owner token matches the Task
object.
I made an attempt modelled on how it works in Django REST framework, but had no success.
What is the correct way to implement owner/token filtering? Thanks
I am accessing the results via:
http://localhost:9200/_search
class Task(models.Model):
    title = models.CharField("Title", max_length=10000, blank=True)
    owner = models.ForeignKey('auth.User', blank=True, null=True)

from rest_framework import filters

# Create a connection to Elasticsearch
connections.create_connection()

class OwnerFilterBackend(filters.BaseFilterBackend):
    def filter_queryset(self, request, queryset, view):
        return queryset.filter(owner=request.user)

class TaskIndex(DocType):
    title = String()
    filter_backends = (OwnerFilterBackend,)

    class Meta:
        index = 'task-index'

def bulk_indexing():
    TaskIndex.init()
    es = Elasticsearch()
    bulk(client=es, actions=(b.indexing() for b in models.Task.objects.all().iterator()))

def _search(title):
    s = Search().filter('term', title=title.text)
    response = s.execute()
    return response
Without success, I have also tried:
def _search(title):
    s = Search().query('bool', must=[
        Q('term', title=title.text),
        Q('match', owner=user.pk),
    ])
    return s.execute()
contracts = Index('contracts')
my_analyzer = analyzer('simple')
contracts.analyzer(my_analyzer)

@contracts.doc_type
class ContractDocument(DocType):
    client = fields.StringField(attr='client_name')

    class Meta:
        model = Contract
        fields = [
            'id',
            'name'
        ]
I am trying to apply the simple analyzer to my fields. But when I call termvectors after running search_index, I see that the standard analyzer is applied to the fields.
How can I apply the simple analyzer to all fields? I have tons of fields and don't want to declare them by hand. Is the only solution to create an ES field for each model field? How can I declare an analyzer for the fields in Meta.fields, or how can I modify my class to do this?
s = Document.search().filter('match', name='test')[:30]
s.to_queryset()
raises this error:
AttributeError: 'Search' object has no attribute '_source'
s.__dict__ returns:
{'_doc_type': ['buyer_alias_document'],
'_doc_type_map': {'buyer_alias_document': <bound method DocTypeMeta.from_es of <class 'jurismarches.buyers.documents.BuyerAliasDocument'>>},
'_extra': {},
'_fields': None,
'_highlight': {},
'_highlight_opts': {},
'_index': ['buyer_alias'],
'_model': <class 'jurismarches.buyers.models.BuyerAlias'>,
'_params': {},
'_partial_fields': {},
'_post_filter_proxy': <elasticsearch_dsl.search.QueryProxy object at 0x7fb285a9c588>,
'_query_proxy': <elasticsearch_dsl.search.QueryProxy object at 0x7fb285a9c550>,
'_response_class': <class 'elasticsearch_dsl.result.Response'>,
'_script_fields': {},
'_sort': [],
'_suggest': {},
'_using': 'default',
'aggs': AggsProxy()}
Hi,
When I update a nested field value, the change is not indexed in Elasticsearch.
I have followed your example, but had no success making the "Tag" get indexed when its value changes.
Could you please give an example of how this is done?
class Car(models.Model):
    name = models.CharField()
    manufacturer = models.ManyToManyField('Tag')

class Tag(models.Model):
    name = models.CharField()

    def tags(self):
        return self.tag_set.all()

car = Index('cars')

@car.doc_type
class CarDocument(DocType):
    tag = NestedField(properties={
        'name': StringField(),
    })

    class Meta:
        model = Car
        fields = [
            'name',
        ]
I want to insert data like this:
curl -XPOST http://localhost:9200/test/question_document/1533 -d'
{"title":"my title"}
'
Can I insert the data using a Document?
Right now, when you rebuild an index, the index is nuked first, then rebuilt from scratch. During this reindexing process, any searches against the index might fail.
Instead, you could use "virtual indexes" to perform a rebuild without downtime. By that, I mean that you create a real index with a different name, e.g. index_name.<timestamp>
. You can then point an alias for index_name
and point it to the real index.
When rebuilding the index, you could create a new index in the background, populate it, then switch the aliases over. That way, the application can still use the old index while the new index is being created.
Most Elasticsearch applications I know use something like this and I'm willing to contribute something similar to this project. However, before I do that, I would like to know whether this is a desirable feature to have or whether it's unnecessary complexity for a generic library.
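The moving parts are small: a timestamped physical index name plus an atomic _aliases swap. A sketch of the two pure-data helpers (the function names are mine, not the package's); the payload is what you would POST to /_aliases, or pass to elasticsearch-py's indices.update_aliases:

```python
import time


def new_index_name(alias, now=None):
    """Physical index name behind the alias, e.g. 'cars.1514764800'."""
    return "%s.%d" % (alias, int(now if now is not None else time.time()))


def alias_swap_actions(alias, old_index, new_index):
    """Body for POST /_aliases: atomically repoint `alias` to `new_index`.

    Both actions happen in one request, so searchers never see a window
    where the alias resolves to nothing.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


print(new_index_name("cars", now=1514764800))
print(alias_swap_actions("cars", "cars.1514764800", "cars.1514851200"))
```

The rebuild flow would then be: create the new physical index, populate it in the background, swap the alias, and finally delete the old index.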
Boiled down to its essential part, my problem is that I can't get NestedFields to work.
At first I thought the problem was in my code, but after copying and pasting the example from the documentation I still get the same error:
django_elasticsearch_dsl.exceptions.VariableLookupError: Failed lookup for key [postsecondhand] in <Post: Post object>
(These are my models, but the error is the same; just replace Post with Car and postsecondhand with ads.)
Any idea?
I have in my documents.py:
@vs_entry.doc_type
class VSEntryDocument(DocType):
    location = fields.GeoPointField(attr="get_location")
    # location = fields.GeoPointField(lat_long=True)

    def prepare_location(self, instance):
        return instance.get_location()

    class Meta:
        model = VeggieSailorEntry  # The model associated with this DocType

        # The fields of the model you want to be indexed in Elasticsearch
        fields = [
            'short_description',
            'level',
            'description',
            'name',
            'rating'
        ]
and in my models.py:
location = models.TextField(default="{'lon':0, 'lat':0}", null=False)

def get_location(self):
    """Get the location - for Haystack."""
    print(self.long, self.lat)
    if not self.long or not self.lat:
        return {'lon': 0, 'lat': 0}
    return {'lon': self.long, 'lat': self.lat}
and later I am executing:
$ ./manage.py search_index --rebuild
Are you sure you want to delete the 'entries' indexes? [n/Y]: y
Deleting index 'entries'
Creating index 'entries'
Indexing 10184 'VeggieSailorEntry' objects
I am checking in Kibana available fields and location does not exist.
Is this feature working?
Hi @sabricot
How are you?
I'd like to know if it is possible to provide a priority list (for example, user IDs) to speed up and rank the search.
For example: I type the keyword "Ana". First, users matching "Ana" that also appear in the priority list (user IDs) would be returned. After that, the remaining results matching only the keyword "Ana" would be returned.
Thanks,
have a nice week,
Hi
I am trying to use the Django models JSONField, but because its structure can change I am not sure of its properties. Is there any way I can avoid setting them when using an ObjectField? Or can you think of any other solution?
Without setting properties, as below, the following happens:
attributes = fields.ObjectField()
"attributes": [
    {},
    {}
],
Your package has saved me so much time, thanks.
When we need to put a lot of documents in an index, we need to use the queryset_pagination meta option to paginate. Django pagination needs a queryset sorted with order_by (cf. the docs), otherwise the same pk can be present more than once while others are missing (like #71).
Putting order_by on the queryset makes the Django paginator call order_by for each page. Calling order_by on a huge queryset (like 10 million rows) leads to a huge performance issue.
We can override the _get_actions method (from django_elasticsearch_dsl.documents.DocType) to not use the Django paginator when a queryset is passed. Moreover, because of the way a database index works, we should first fetch only the pks, and then issue sub-requests based on them.
from django.db.models.query import QuerySet

def _get_actions(self, object_list, action):
    if self._doc_type.queryset_pagination and isinstance(object_list, QuerySet):
        pks = object_list.order_by('pk').values_list('pk', flat=True)
        len_pks = len(pks)
        for start_pk_index in range(0, len_pks, self._doc_type.queryset_pagination + 1):
            end_pk_index = start_pk_index + self._doc_type.queryset_pagination
            if end_pk_index >= len_pks:
                end_pk_index = len_pks - 1
            ranged_qs = object_list.filter(pk__range=[
                pks[start_pk_index],
                pks[end_pk_index]
            ])
            for object_instance in ranged_qs:
                yield self._prepare_action(object_instance, action)
    else:
        yield from super()._get_actions(object_list, action)
I'm available to make the PR if needed.
I am interested in this (taken from README#TODOS).
Seems already doable without too much difficulty via something like:
fields.ObjectField(properties=ExampleDocument()._doc_type._fields())
Hi,
Just curious: what's the status on support for Elasticsearch 6.x?
I'm populating an index with around 12 million records. For some reason, the process stopped at around 9 million records. I'm curious what happens when I run python manage.py search_index --populate --models employee.Employee again. Will it skip records that already exist in the index? Will it be faster to get to those remaining 3 million records, or will it take the same amount of time since it needs to iterate through every record?
I can't work out how to make a query like this with this library:
input:
curl -XGET 'localhost:9200/product-index/product_index/_search?q=skihkt*'
output
{"took":29,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":1.0,"hits":[{"_index":"product-index","_type":"product_index","_id":"10003","_score":1.0,"_source":{"product_name": "skihkthyfmnbntrptmvooimf"}}]}}
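The URI search `?q=skihkt*` expands to a query_string query; with elasticsearch-dsl that would be something like `Search().query('query_string', query='skihkt*')` (index/document names adjusted to yours). The request body it needs to produce is just:

```python
def wildcard_query(pattern):
    # Same body the URI search `?q=skihkt*` expands to. With
    # elasticsearch-dsl this is what Search().query('query_string',
    # query=pattern).to_dict() should build (sketch, not verified
    # against your index settings).
    return {"query": {"query_string": {"query": pattern}}}


print(wildcard_query("skihkt*"))
```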
It would be nice to have an option to use a mocked Elasticsearch if explicitly specified in settings (settings.ELASTICSEARCH_DSL_TEST == True).
That way, if devs would like to skip the Elasticsearch tests, they could just patch the settings.
This is because of how DocTypeMeta creates an instance of DocType in its __new__ method. Creating an instance of the class inside its metaclass's __new__ should really not be happening.
If I create my own class that extends DocType like below
class OwnClass(DocType):
    def __init__(self, *args, **kwargs):
        super(OwnClass, self).__init__(*args, **kwargs)
This raises an error as soon as you try to import the class: OwnClass does not exist. That is because we refer to OwnClass in super(OwnClass, self) before it has been created. It also means that, because an instance of the class is created before the class has been fully formed, referring in your __init__ to any attribute created after that line in __new__ will raise an error.
Now that Signal Processors are merged into master, it would be handy to have a built-in Celery signal processor.
I'm gonna work on it, if you don't mind, @sabricot.
My app runs on EC2 and I'm running ES on AWS Elasticsearch. So basically I want to create an http_auth signature using boto3 to pass into the Elasticsearch instance. Is there a class I can extend (similar to Elasticsearch2SearchBackend) to programmatically create the connection?
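There isn't a backend class to subclass here; instead, the ELASTICSEARCH_DSL setting passes its keys straight to the elasticsearch-py client, so the usual AWS4Auth recipe should slot in. A sketch, assuming requests-aws4auth and boto3 are installed; the region and host are placeholders:

```python
# settings.py -- sketch, not a verified drop-in
import boto3
from elasticsearch import RequestsHttpConnection
from requests_aws4auth import AWS4Auth

credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key,
                   'us-east-1', 'es', session_token=credentials.token)

ELASTICSEARCH_DSL = {
    'default': {
        'hosts': 'search-mydomain.us-east-1.es.amazonaws.com',
        'http_auth': awsauth,
        'use_ssl': True,
        'connection_class': RequestsHttpConnection,
    },
}
```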
I have a database model that needs to go to different ElasticSearch indexes depending on the payload provided for the model instance.
Looking through the documentation, it would appear that a single DocType class can only be linked to a single index (through the decorator). Is there currently any way to dynamically choose the destination index for a model instance?
Under CarDocument:

ads = fields.NestedField(properties={
    'description': fields.StringField(analyzer=html_strip),
    'title': fields.StringField(),
    'pk': fields.IntegerField(),
})

But when an ad is saved, no record is inserted in the car type. Let me know what I am missing, or whether something is missing in the document itself.
Models
from django.utils.translation import ugettext_lazy as _
from django.db import models

class Account(models.Model):
    name = models.CharField(
        default=_('Account'),
    )

class Transaction(models.Model):
    dest_account = models.ForeignKey(Account, related_name='in_transactions')
    origin_account = models.ForeignKey(Account, related_name='out_transactions', null=True)
Document
class TransactionDocument(DocType):
    dest_account = NullObjectField(properties={
        'name': fields.TextField(),
    })
Code
Transaction.objects.create(
    dest_account=Account.objects.create()
)
Now, when an account is created using the default value, this happens
TypeError at /public/purchases
Unable to serialize u'Initial Account' (type: <class 'django.utils.functional.__proxy__'>)
Traceback:
...
File "/data/projects/project/lib/python2.7/site-packages/elasticsearch_dsl/serializer.py" in default
11. return super(AttrJSONSerializer, self).default(data)
File "/data/projects/project/lib/python2.7/site-packages/elasticsearch/serializer.py" in default
34. raise TypeError("Unable to serialize %r (type: %s)" % (data, type(data)))
Since the string translation hasn't been determined, the type is django.utils.functional.__proxy__
https://docs.djangoproject.com/en/dev/ref/unicode/#translated-strings
Adding a condition like this one fixes the problem, but I don't know where it should go.
from django.utils.functional import Promise

if isinstance(data, Promise):
    return str(data)
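One natural home for it is a serializer subclass wired into the connection: the traceback above shows the data passing through elasticsearch_dsl's AttrJSONSerializer, whose default() you could override and pass as the serializer argument when creating the connection (a sketch of the approach, not the package's documented hook). Since that class isn't importable here, this runnable demo shows the same override on stdlib json.JSONEncoder with a stand-in Promise:

```python
import json


class Promise:
    """Stand-in for django.utils.functional.Promise (a lazy translation)."""
    def __init__(self, text):
        self._text = text

    def __str__(self):
        return self._text


class LazySafeEncoder(json.JSONEncoder):
    def default(self, data):
        # Force lazy translation proxies into plain strings before
        # falling back to the base class (which raises TypeError).
        if isinstance(data, Promise):
            return str(data)
        return super().default(data)


print(json.dumps({"name": Promise("Initial Account")}, cls=LazySafeEncoder))
```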
Hi there, I have a Django REST application and need to query documents from Elasticsearch so that I can expose them. I was looking at this project to help me achieve what I am after; however, besides just querying the data through my documents module, I would have to deserialize the documents into my model and then hook that up to my REST views. Would you know how I could achieve this?
Thanks.
Hi,
I am trying to return data from a function, but the result is empty:
For example, the result is:
"name": "XYZ",
"tag" : [ { }, { } ],
But should be:
"name": "XYZ",
"tag": {1: [{'taglevel': 1, 'name': 'Foo'}], 2: [{'taglevel': 2, 'name': 'Bar'}]}
I have tried both the NestedField and ObjectField options. Both gave the same traceback (as per below).
I used pdb to debug, and I can see the correct result at return grouped_tags, but it never arrives in Elasticsearch.
If this isn't possible, PLEASE tell me, as I have spent weeks on this and have raised issues here and on StackOverflow.
Otherwise, how can I get the results from my function?
Thanks
vehicle = Index('vehicle')
vehicle.settings(
    number_of_shards=1,
    number_of_replicas=0
)

@vehicle.doc_type
class VehicalDocument(DocType):
    tag = fields.ObjectField(attr="get_grouped_tags")

    class Meta:
        model = Vehicle
        fields = [
            'name'
        ]

class Tag(models.Model):
    name = models.CharField("Name", max_length=5000, blank=True)
    taglevel = models.IntegerField("Tag level", null=True, blank=True)

class Vehicle(models.Model):
    title = models.CharField("Title", max_length=10000, blank=True)
    tag = models.ManyToManyField('Tag', blank=True)

    def get_grouped_tags(self):
        tag = self.tag.order_by('taglevel')
        grouped_tags = {
            tag_level: [
                {'name': tag_of_level.name, 'taglevel': tag_of_level.taglevel}
                for tag_of_level in tags_of_level
            ]
            for tag_level, tags_of_level in groupby(tag, lambda tag: tag.taglevel)
        }
        return grouped_tags
return grouped_tags
Just tested this wrapper after testing with the DSL. The default string mapping from fields seems to be marked deprecated by Elasticsearch 5.6.5:
python manage.py search_index --rebuild
elasticsearch | [2017-12-19T14:12:28,061][INFO ][o.e.c.m.MetaDataDeleteIndexService] [0CginTA] [stories/0aaM3MqDRgGfiiDkCkz9LA] deleting index
elasticsearch | [2017-12-19T14:12:28,134][WARN ][o.e.d.i.m.StringFieldMapper$TypeParser] The [string] field is deprecated, please use [text] or [keyword] instead on [story_title]
elasticsearch | [2017-12-19T14:12:28,135][WARN ][o.e.d.i.m.StringFieldMapper$TypeParser] The [string] field is deprecated, please use [text] or [keyword] instead on [auto_uid]
elasticsearch | [2017-12-19T14:12:28,135][WARN ][o.e.d.i.m.StringFieldMapper$TypeParser] The [string] field is deprecated, please use [text] or [keyword] instead on [story_description]
elasticsearch | [2017-12-19T14:12:28,150][INFO ][o.e.c.m.MetaDataCreateIndexService] [0CginTA] [stories] creating index, cause [api], templates [], shards [1]/[0], mappings [story_document]
It'd be cool if it could keep the database and the Elasticsearch index in sync for bulk inserts. I noticed this isn't a feature yet.
Django==2.0
django-elasticsearch-dsl==0.4.3
elasticsearch==5.5.1
elasticsearch-dsl==5.4.0
elasticsearch engine 5.0.0./5.1.1
to_queryset failed with:
RequestError Traceback (most recent call last)
in <module>()
----> 1 s.to_queryset()
~/venvs/seostatistic/lib/python3.6/site-packages/django_elasticsearch_dsl/search.py in to_queryset(self, keep_order)
26 s = self.source(exclude=['*'])
27
---> 28 pks = [result._id for result in s]
29
30 qs = self._model.objects.filter(pk__in=pks)
~/venvs/seostatistic/lib/python3.6/site-packages/elasticsearch_dsl/search.py in __iter__(self)
265 Iterate over the hits.
266 """
--> 267 return iter(self.execute())
268
269 def __getitem__(self, n):
~/venvs/seostatistic/lib/python3.6/site-packages/elasticsearch_dsl/search.py in execute(self, ignore_cache)
637 doc_type=self._doc_type,
638 body=self.to_dict(),
--> 639 **self._params
640 )
641 )
~/venvs/seostatistic/lib/python3.6/site-packages/elasticsearch/client/utils.py in _wrapped(*args, **kwargs)
71 if p in kwargs:
72 params[p] = kwargs.pop(p)
---> 73 return func(*args, params=params, **kwargs)
74 return _wrapped
75 return _wrapper
~/venvs/seostatistic/lib/python3.6/site-packages/elasticsearch/client/__init__.py in search(self, index, doc_type, body, params)
630 index = '_all'
631 return self.transport.perform_request('GET', _make_path(index,
--> 632 doc_type, '_search'), params=params, body=body)
633
634 @query_params('_source', '_source_exclude', '_source_include',
~/venvs/seostatistic/lib/python3.6/site-packages/elasticsearch/transport.py in perform_request(self, method, url, params, body)
310
311 try:
--> 312 status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
313
314 except TransportError as e:
~/venvs/seostatistic/lib/python3.6/site-packages/elasticsearch/connection/http_urllib3.py in perform_request(self, method, url, params, body, timeout, ignore)
126 if not (200 <= response.status < 300) and response.status not in ignore:
127 self.log_request_fail(method, full_url, url, body, duration, response.status, raw_data)
--> 128 self._raise_error(response.status, raw_data)
129
130 self.log_request_success(method, full_url, url, body, response.status,
~/venvs/seostatistic/lib/python3.6/site-packages/elasticsearch/connection/base.py in _raise_error(self, status_code, raw_data)
123 logger.warning('Undecodable raw error response from server: %s', err)
124
--> 125 raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
126
127
RequestError: TransportError(400, 'illegal_argument_exception', 'Deprecated field [exclude] used, expected [excludes] instead')
Please add a mapping for Django's DecimalField.
Absolutely all examples of searches in the documentation specify a field on which to perform the search. However, one of the popular features of Elasticsearch is full-text search. At least one example should be added to the documentation (if this package actually supports it, because it's not clear to me from the docs).
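The package defers querying to elasticsearch-dsl, where cross-field full-text search is typically a multi_match query, e.g. `CarDocument.search().query('multi_match', query='red sedan', fields=['name', 'description'])` (document and field names here are illustrative, not from this package's docs). The request body that builds is just:

```python
def full_text_query(text, fields):
    # Body produced by Search().query('multi_match', query=text,
    # fields=fields) in elasticsearch-dsl; the field names are examples.
    return {"query": {"multi_match": {"query": text, "fields": list(fields)}}}


print(full_text_query("red sedan", ["name", "description"]))
```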
I want to test that some data are correctly created/updated in the index.
Before each test I need to rebuild the index by calling the command search_index --rebuild --models app.django_models -f
It would be great if DocType had a classmethod to rebuild its index, like this:

@classmethod
def rebuild(cls):
    call_command('search_index', '--rebuild', '--models', 'app.django_model', '-f')

The management command should accept another argument like --quiet to not print anything during tests.
The final function could be:

@classmethod
def rebuild(cls):
    call_command('search_index', '--rebuild', '--models', 'app.django_model', '-f', '--quiet')

What do you think? (I can do the PR if you want.)
I have a use case in which my user details are stored in PostgreSQL. Now I want to expose data already present in Elasticsearch using DRF. No models needed. But I like the idea of using something like an ORM. Can you suggest something?
Hey, I love your package
What happens if we set ELASTICSEARCH_DSL_AUTOSYNC to True and then the ES server goes down for whatever reason? Then all requests that touch the indexed documents will fail.
Any idea how to implement it differently? My current workaround is to disable ELASTICSEARCH_DSL_AUTOSYNC and run indexing in a cron job.
My Django model (Employee) has functions such as get_absolute_url() that I would like to be available in my document model (EmployeeDocument). I was able to make this work by copying the function and placing it under the EmployeeDocument class, but this seems redundant. Can you think of a clean way to inherit all the functions from the primary model into the document model?
Currently, highlighting information is lost once to_queryset is called. Ideally it should be preserved.
Please add this to Python Package Index.
Great work on this!
I'm having trouble with my nested data.
Due to my front end, my data has to stay in this layout/format. The data (what get_grouped_tags returns) will look like this:
"tag" : {
"1" : [ {
"taglevel" : 1,
"name" : "Foo"
} ],
"2" : [ {
"taglevel" : 2,
"name" : "Bazz"
} ]
},
I tried this, which gives me an "illegal argument exception":
from django_elasticsearch_dsl import DocType, Index, fields
from .models import Item

item = Index('items')
item.settings(
    number_of_shards=1,
    number_of_replicas=0
)

@item.doc_type
class ItemDocument(DocType):
    tag = fields.StringField(attr="get_grouped_tags")

    class Meta:
        model = Item
        fields = [
            'typetask',
            'title',
        ]

class Tag(models.Model):
    name = models.CharField("Name", max_length=5000, blank=True)
    taglevel = models.IntegerField("Tag level", null=True, blank=True)

    def to_search(self):
        tags = self.id
        if tags:
            queryset = Item.objects.filter(tag=tags)
            for object in queryset:
                object.save()
            return queryset

class Item(models.Model):
    title = models.CharField("Title", max_length=10000, blank=True)
    tag = models.ManyToManyField('Tag', blank=True)

    def get_grouped_tags(self):
        tag = self.tag.order_by('taglevel')
        grouped_tags = {
            tag_level: [
                {'name': tag_of_level.name, 'taglevel': tag_of_level.taglevel}
                for tag_of_level in tags_of_level
            ]
            for tag_level, tags_of_level in groupby(tag, lambda tag: tag.taglevel)
        }
        return grouped_tags
I tried tag = fields.NestedField(attr="get_grouped_tags"), but the tags field just comes up empty in the index.
I tried:
I tried:
tag = fields.NestedField(properties={
    'name': fields.StringField(),
})
But that gave me a KeyError for 'manager' (traceback: http://dpaste.com/35D46HP). Also, even if this did work, the data would still need to be presented as it looks in my function.
How do I properly access my nested data from an M2M field?
Thanks
PS:
It could be good to tell newbies how to install it: pip install git+https://github.com/sabricot/django-elasticsearch-dsl.git