christopherrabotin / bungiesearch
UNMAINTAINED CODE -- Elasticsearch-dsl-py django wrapper with mapping generator
License: BSD 3-Clause "New" or "Revised" License
Would it not be awesome if something like Article.objects.search.text_search(keywords) would automatically translate to a user-provided query, e.g. Article.objects.search.filter("term", field="my_term").query("match", another_field=keywords)?
These definitions may be able to fit in the ModelIndex definition and rely solely on reflection within the manager (which is always fun).
Cannot do key.step = None in __getitem__ because it's a read-only attribute.
Code should be this:
key = slice(key.start, key.stop)
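The read-only behaviour is standard Python and can be reproduced without bungiesearch. The following is a minimal sketch; strip_step is a hypothetical helper name used only for illustration, not something taken from the bungiesearch code base.

```python
def strip_step(key):
    """Return a copy of `key` with its step dropped.

    Attributes of slice objects cannot be reassigned, so a new slice
    has to be built from the start and stop values.
    """
    return slice(key.start, key.stop)


key = slice(5, 10, True)

try:
    key.step = None  # slice attributes are read-only
except AttributeError as exc:
    print("cannot assign step:", exc)

clean = strip_step(key)
```

Building a new slice leaves the caller's original key untouched, which matters if the step value is still needed afterwards.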
It's from ModelIndex.
If multiple search indexes are defined for the same model, it arbitrarily picks one.
The present workaround requires creating an alias which explicitly sets self.search_instance._index before returning self.search_instance.
Speedbar has haystack metrics, let's make it work with bungiesearch as well! :).
Calling only should return a clone of bungiesearch with the only information set.
In [5]: Article.objects.search.query('match_all')[:]
Out [5]: Article object
Expected a full list.
The following doesn't say which model and index failed. Check all exception messages and make them as explicit as possible.
ValueError: Cannot filter by date if the updated_field is not set in the index's Meta class.
I am trying to filter a query by date range and by related field values. I assume that the syntax for Bungiesearch filtered queries is derived from elasticsearch-dsl-py. However, from reading the elasticsearch-dsl-py docs I am still not able to figure out how to construct filtered queries. Any advice or nudge toward the right documentation is welcome. Thanks in advance.
self.model is None here. It seems like this should not be the case, and may have slipped through the cracks. I'm going to try to make amends to this - Django seems to set self.model to None by default.
/Users/betterworks/bungiesearch/tests/core/models.py(5)<module>()
3
4
----> 5 class Article(models.Model):
6 title = models.TextField(db_index=True)
7 authors = models.TextField(blank=True)
/Users/betterworks/bungiesearch/tests/core/models.py(22)Article()
20 popularity_index = models.IntegerField(default=0)
21
---> 22 objects = BungiesearchManager()
23
24 class Meta:
> /Users/betterworks/bungiesearch/bungiesearch/managers.py(35)__init__()
33 import ipdb; ipdb.set_trace()
34 settings = Bungiesearch.BUNGIE
---> 35 if 'SIGNALS' in settings:
36 self.signal_processor = get_signal_processor()
37 self.signal_processor.setup(self.model)
ipdb> self.model
ipdb> type(self.model)
<type 'NoneType'>
A couple of weeks ago, elasticsearch-dsl added a persistence model, which can be used to define indices and persist data. Bungiesearch also has this functionality, and so much more, including search aliases and django integration.
The idea of this issue is to grab ES-dsl's persistence, to stay true to ES-dsl but add django integration.
Expected behavior: indexing an item whose id matches that of an item already indexed should update the existing entry instead of creating a new item.
Must allow connecting to ElasticSearch via HTTP Basic Auth.
In any search alias, self.model should return the model it will search (i.e. the doc type) in order for any alias to be aware of that information.
It may be of interest to allow for multiple signals to be connected. Django does support, by itself, attaching multiple hooks to a given signal.
This will enable devs to rely on the bulk indexing yet still create a celery task for updating latest documents in case there was a server instance failure before buffer was full and indexed.
This will allow very broad search aliases to be created, such as the following:
import logging

class Range(SearchAlias):
    def alias_for(self, field, gte=None, lte=None, boost=None, as_query=False):
        # Build the body of the range clause for the requested field.
        body = {field: {}}
        if gte:
            body[field]['gte'] = gte
        if lte:
            body[field]['lte'] = lte
        if boost:
            if not as_query:
                logging.warning('Boost is not applicable to search alias Range when not used as a query.')
            else:
                body[field]['boost'] = boost
        # Apply the range either as a scoring query or as a filter.
        if as_query:
            return self.search_instance.query({'range': body})
        return self.search_instance.filter({'range': body})
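The dictionary that the alias above hands to query or filter can be built and inspected on its own, without a running Elasticsearch. This is only a sketch: range_body is a hypothetical helper, and unlike the alias it checks bounds with "is not None" so that a bound of 0 is kept, which is an assumption about the desired behaviour rather than something the original code does.

```python
def range_body(field, gte=None, lte=None, boost=None):
    """Build a {'range': {...}} clause for a single field."""
    bounds = {}
    if gte is not None:
        bounds['gte'] = gte
    if lte is not None:
        bounds['lte'] = lte
    if boost is not None:
        # Boost only influences scoring, so it is only meaningful in a query.
        bounds['boost'] = boost
    return {'range': {field: bounds}}


since = range_body('created', gte='2014-09-23')
window = range_body('popularity_index', gte=0, lte=10)
```

Keeping the body construction separate from the search instance makes the clause easy to unit test before it is passed to the alias.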
I am a relatively new Django developer, so forgive the question if it should be easily answered from docs or source.
Is there a management command to create ES indexes for models that already exist?
Date-based indexing allows one to index items whose updated date is greater or lower than a provided date, but it does not support times. This issue is raised to add support for times as well.
Instead of having a unique search alias, set aliases as a dictionary in order to be able to separate these aliases.
Would it be valuable to return Elasticsearch's meta information in the automatically mapped result? This could be achieved using a wrapping class which would extend from whichever model is to be mapped and add additional attributes. In that case, we must also take into consideration that devs may wish to get an item from a search and auto-mapping, change it, and save it. Attributes alien to the model may prevent this saving, so it may be necessary to override the save function and remove any additional attributes.
This would require some nice polymorphic code, and especially extensive testing.
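One way the wrapping idea could look is sketched below with a plain Python class standing in for a Django model. Everything here is hypothetical (FakeArticle, wrap_with_meta, the search_meta attribute), and re-typing an instance this way has not been checked against real Django models, so treat it as an illustration of the override-save approach only.

```python
class FakeArticle:
    """Stand-in for a Django model; only save() matters for this sketch."""

    def __init__(self, title):
        self.title = title
        self.saved_fields = None

    def save(self):
        # Record which attributes were present at save time.
        self.saved_fields = sorted(k for k in vars(self) if k != "saved_fields")


def wrap_with_meta(instance, meta):
    """Re-type `instance` as a subclass of its own class that carries search metadata."""
    base = type(instance)

    class WithSearchMeta(base):
        def save(self, *args, **kwargs):
            # Drop the non-model attribute before delegating to the model's save().
            vars(self).pop("search_meta", None)
            return super().save(*args, **kwargs)

    WithSearchMeta.__name__ = base.__name__
    instance.__class__ = WithSearchMeta
    instance.search_meta = meta
    return instance


article = wrap_with_meta(FakeArticle("Title one"), {"index": "bungiesearch_demo", "score": 1.5})
score_before_save = article.search_meta["score"]
article.save()
```

Because the wrapper subclasses the instance's own class, isinstance checks against the original model keep working.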
This will make Bungiesearch compatible with other index organizations, including haystack's.
Version 0.0.4 of elasticsearch-dsl-py fails. It seems like the Result object has changed and no longer contains the _meta attribute.
Temporary solution: use elasticsearch-dsl-py version 0.0.3dev.
======================================================================
ERROR: test_concat_queries (tests.core.test_bungiesearch.ModelIndexTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/[redacted]/bungiesearch/tests/core/test_bungiesearch.py", line 234, in test_concat_queries
items = Article.objects.bsearch_title_search('title')[::False] + NoUpdatedField.objects.search.query('match', title='My title')[::False]
File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 341, in __getitem__
results = super(Bungiesearch, self).__getitem__(key).execute()
File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 286, in execute
self.map_results()
File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 295, in map_results
self.results = Bungiesearch.map_raw_results(self.raw_results, self)
File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 171, in map_raw_results
model_name = result._meta.doc_type
File "/home/chris/.virtualenvs/bungiesearch/src/elasticsearch-dsl-master/elasticsearch_dsl/utils.py", line 109, in __getattr__
'%r object has no attribute %r' % (self.__class__.__name__, attr_name))
AttributeError: 'Result' object has no attribute '_meta'
======================================================================
ERROR: test_fetch_item (tests.core.test_bungiesearch.ModelIndexTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/[redacted]/bungiesearch/tests/core/test_bungiesearch.py", line 76, in test_fetch_item
self.assertEqual(Article.objects.search.query('match', _all='Description')[0], Article.objects.get(title='Title one'), 'Searching for "Description" did not return just the first Article.')
File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 341, in __getitem__
results = super(Bungiesearch, self).__getitem__(key).execute()
File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 286, in execute
self.map_results()
File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 295, in map_results
self.results = Bungiesearch.map_raw_results(self.raw_results, self)
File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 171, in map_raw_results
model_name = result._meta.doc_type
File "/home/chris/.virtualenvs/bungiesearch/src/elasticsearch-dsl-master/elasticsearch_dsl/utils.py", line 109, in __getattr__
'%r object has no attribute %r' % (self.__class__.__name__, attr_name))
AttributeError: 'Result' object has no attribute '_meta'
======================================================================
ERROR: test_iteration (tests.core.test_bungiesearch.ModelIndexTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/[redacted]/bungiesearch/tests/core/test_bungiesearch.py", line 92, in test_iteration
self.assertTrue(all([result in db_items for result in lazy_search]), 'Searching for title "title" did not return all articles.')
File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 312, in __iter__
self.execute()
File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 286, in execute
self.map_results()
File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 295, in map_results
self.results = Bungiesearch.map_raw_results(self.raw_results, self)
File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 171, in map_raw_results
model_name = result._meta.doc_type
File "/home/chris/.virtualenvs/bungiesearch/src/elasticsearch-dsl-master/elasticsearch_dsl/utils.py", line 109, in __getattr__
'%r object has no attribute %r' % (self.__class__.__name__, attr_name))
AttributeError: 'Result' object has no attribute '_meta'
======================================================================
ERROR: test_optimal_queries (tests.core.test_bungiesearch.ModelIndexTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/[redacted]/bungiesearch/tests/core/test_bungiesearch.py", line 228, in test_optimal_queries
src_item = NoUpdatedField.objects.search.query('match', title='My title')[0]
File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 341, in __getitem__
results = super(Bungiesearch, self).__getitem__(key).execute()
File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 286, in execute
self.map_results()
File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 295, in map_results
self.results = Bungiesearch.map_raw_results(self.raw_results, self)
File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 171, in map_raw_results
model_name = result._meta.doc_type
File "/home/chris/.virtualenvs/bungiesearch/src/elasticsearch-dsl-master/elasticsearch_dsl/utils.py", line 109, in __getattr__
'%r object has no attribute %r' % (self.__class__.__name__, attr_name))
AttributeError: 'Result' object has no attribute '_meta'
======================================================================
ERROR: test_post_save (tests.core.test_bungiesearch.ModelIndexTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/[redacted]/bungiesearch/tests/core/test_bungiesearch.py", line 178, in test_post_save
self.assertNotEqual(find_three[0:1:True]._meta.index, find_three[1:2:True]._meta.index, 'Searching for "three" did not return items from different indices.')
File "/home/chris/.virtualenvs/bungiesearch/src/elasticsearch-dsl-master/elasticsearch_dsl/utils.py", line 109, in __getattr__
'%r object has no attribute %r' % (self.__class__.__name__, attr_name))
AttributeError: 'Result' object has no attribute '_meta'
======================================================================
ERROR: test_search_aliases (tests.core.test_bungiesearch.ModelIndexTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/[redacted]/bungiesearch/tests/core/test_bungiesearch.py", line 144, in test_search_aliases
self.assertTrue(all([result in db_items for result in title_alias]), 'Alias searching for title "title" did not return all articles.')
File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 312, in __iter__
self.execute()
File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 286, in execute
self.map_results()
File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 295, in map_results
self.results = Bungiesearch.map_raw_results(self.raw_results, self)
File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 171, in map_raw_results
model_name = result._meta.doc_type
File "/home/chris/.virtualenvs/bungiesearch/src/elasticsearch-dsl-master/elasticsearch_dsl/utils.py", line 109, in __getattr__
'%r object has no attribute %r' % (self.__class__.__name__, attr_name))
AttributeError: 'Result' object has no attribute '_meta'
======================================================================
FAIL: test_raw_fetch (tests.core.test_bungiesearch.ModelIndexTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/[redacted]/bungiesearch/tests/core/test_bungiesearch.py", line 84, in test_raw_fetch
self.assertTrue(hasattr(item, '_meta'), 'Fetching first raw results did not return an object with a _meta attribute.')
AssertionError: Fetching first raw results did not return an object with a _meta attribute.
======================================================================
FAIL: test_specify_index (tests.core.test_bungiesearch.ModelIndexTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/[redacted]/bungiesearch/tests/core/test_bungiesearch.py", line 266, in test_specify_index
self.assertEqual(Article.objects.count(), Article.objects.search_index('bungiesearch_demo').count(), 'Indexed items on bungiesearch_demo for Article does not match number in database.')
AssertionError: Indexed items on bungiesearch_demo for Article does not match number in database.
----------------------------------------------------------------------
Ran 22 tests in 10.372s
FAILED (failures=2, errors=6)
One might want to do multiple searches and concatenate the Elasticsearch results, or process them in some way (e.g. sort them or filter them by score) and only then map them to database items.
A possible solution would be extracting the mapping from execute and moving it to a class method. It would accept a list (or tuple) of raw results, and then map them.
Hence, one could do the following (a better example is multiple must/should queries on the same data set):
items = []
items += Article.objects.search.query("match", title="Some title")[:20:True]
items += Article.objects.search.query("match", description="a description")[:20:True]
items = [item for item in items if item.score > 0.75]
Bungiesearch.map_raw_results(items)
Another solution would be subclassing list in Bungiesearch to use it as such:
items = Results()
items += Article.objects.search.query("match", title="Some title")[:20:True]
items += Article.objects.search.query("match", description="a description")[:20:True]
items = [item for item in items if item.score > 0.75]
items[:10] # Executes the mapping
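The processing step that would sit between the raw searches and the mapping can be sketched independently of Elasticsearch. In the sketch below RawHit is a hypothetical stand-in for a raw result and merge_raw_results is an illustrative helper; neither is part of bungiesearch, and the de-duplication by document id is an added assumption.

```python
from collections import namedtuple

# Hypothetical stand-in for a raw Elasticsearch result.
RawHit = namedtuple("RawHit", ["doc_id", "score"])


def merge_raw_results(*result_lists, min_score=0.0):
    """Concatenate raw results, keep the best hit per document, sort by score."""
    best = {}
    for results in result_lists:
        for hit in results:
            if hit.score <= min_score:
                continue  # same strict comparison as `item.score > 0.75`
            current = best.get(hit.doc_id)
            if current is None or hit.score > current.score:
                best[hit.doc_id] = hit
    return sorted(best.values(), key=lambda hit: hit.score, reverse=True)


by_title = [RawHit(1, 0.9), RawHit(2, 0.5)]
by_description = [RawHit(1, 0.8), RawHit(3, 0.95)]
merged = merge_raw_results(by_title, by_description, min_score=0.75)
```

Only the merged list would then need to be mapped to database items, so each document is fetched at most once.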
Elasticsearch-dsl-py supports Python 3. However, testing bungiesearch on Python 3 fails with a Runtime Error.
The code in release 1.1.0 will have some initial steps towards Python 3 support, but as the build shows, the support isn't here yet.
max_docs = num_docs + 1 if num_docs > bulk_size else bulk_size + 1
on line 47 of utils is invalid.
File "/home/rof/.virtualenv/local/lib/python2.7/site-packages/django/db/models/base.py", line 664, in save_base
update_fields=update_fields, raw=raw, using=using)
File "/home/rof/.virtualenv/local/lib/python2.7/site-packages/django/dispatch/dispatcher.py", line 170, in send
response = receiver(signal=self, sender=sender, **named)
File "/home/rof/.virtualenv/src/bungiesearch/bungiesearch/signals.py", line 16, in post_save_connector
update_index(__items_to_be_indexed__[sender], sender.__name__, buffer_size)
File "/home/rof/.virtualenv/src/bungiesearch/bungiesearch/utils.py", line 19, in update_index
index_name = src.get_index(model_name)
File "/home/rof/.virtualenv/src/bungiesearch/bungiesearch/__init__.py", line 101, in get_index
raise KeyError('Could not find any index defined for {}. Is the model in one of the model index modules of BUNGIESEARCH["INDICES"]?'.format(model))
KeyError: 'Could not find any index defined for Session. Is the model in one of the model index modules of BUNGIESEARCH["INDICES"]?'
When fields is called on a Bungiesearch instance and the only parameter is _id, map_raw_results fails to fetch the correct information from the database. This is problematic when requesting Elasticsearch to return only the IDs, or when the id field (without an underscore) is not provided when calling fields.
In [3]: some_content = RawArticle.objects.bungie_content()
In [4]: for item in some_content[5:10:True]:
print item
...:
<Result(sparrho/RawArticle/2477511): {}>
<Result(sparrho/RawArticle/2477523): {}>
<Result(sparrho/RawArticle/2477528): {}>
<Result(sparrho/RawArticle/2477491): {}>
<Result(sparrho/RawArticle/2477530): {}>
In [5]: some_content = RawArticle.objects.search.fields(['_id'])
In [6]: for item in some_content[5:10:True]:
print item
...:
<Result(sparrho/RawArticle/2477484): {}>
<Result(sparrho/RawArticle/2477504): {}>
<Result(sparrho/RawArticle/2477509): {}>
<Result(sparrho/RawArticle/2477511): {}>
<Result(sparrho/RawArticle/2477523): {}>
In [7]: for item in some_content[5:10:False]:
print item
...:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-7-7541076dcf2b> in <module>()
----> 1 for item in some_content[5:10:False]:
2 print item
3
/home/chris/.virtualenvs/sparrho-dj17/src/bungiesearch/bungiesearch/__init__.pyc in __getitem__(self, key)
339 else:
340 single_item = True
--> 341 results = super(Bungiesearch, self).__getitem__(key).execute()
342 if single_item:
343 try:
/home/chris/.virtualenvs/sparrho-dj17/src/bungiesearch/bungiesearch/__init__.pyc in execute(self, return_results)
284 self.results = self.raw_results
285 else:
--> 286 self.map_results()
287
288 if return_results:
/home/chris/.virtualenvs/sparrho-dj17/src/bungiesearch/bungiesearch/__init__.pyc in map_results(self)
293 Maps raw results and store them.
294 '''
--> 295 self.results = Bungiesearch.map_raw_results(self.raw_results, self)
296
297 def only(self, *fields):
/home/chris/.virtualenvs/sparrho-dj17/src/bungiesearch/bungiesearch/__init__.pyc in map_raw_results(cls, raw_results, instance)
174 results[pos] = result
175 else:
--> 176 model_results['{}.{}'.format(result._meta.index, model_name)].append(result.id)
177 found_results['{1._meta.index}.{0}.{1.id}'.format(model_name, result)] = (pos, result._meta)
178
/home/chris/.virtualenvs/sparrho-dj17/lib/python2.7/site-packages/elasticsearch_dsl/utils.pyc in __getattr__(self, attr_name)
104 except KeyError:
105 raise AttributeError(
--> 106 '%r object has no attribute %r' % (self.__class__.__name__, attr_name))
107
108 def __getitem__(self, key):
AttributeError: 'Result' object has no attribute 'id'
In [8]: dir(item)
Out[8]: ['_meta']
In [9]: dir(item._meta)
Out[9]: ['doc_type', u'id', u'index', u'score']
In [10]:
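The session above shows that a result returned with only _id still exposes the id through _meta. A possible fallback is sketched below; result_id is a hypothetical helper and SimpleNamespace merely imitates the shape of the Result object seen in the session, so this is not the fix applied in bungiesearch itself.

```python
from types import SimpleNamespace


def result_id(result):
    """Return the document id, preferring the `id` field and falling back to `_meta.id`."""
    try:
        return result.id
    except AttributeError:
        return result._meta.id


# Imitates a result fetched with fields(['_id']): no `id` field, only `_meta`.
meta_only = SimpleNamespace(
    _meta=SimpleNamespace(doc_type="RawArticle", id="2477511", index="sparrho", score=1.0)
)
# Imitates a result where the `id` field was returned as well.
with_field = SimpleNamespace(
    id=2477523,
    _meta=SimpleNamespace(doc_type="RawArticle", id="2477523", index="sparrho", score=1.0),
)
```

Note that in this sketch the two sources may differ in type (an integer field versus a string in _meta), which a real mapping step would need to normalise before querying the database.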
It should be possible to get a score function through bungiesearch.
Should then use the doc type to check if it's applicable.
At least in some instances, it seems that indexing a None value will index it as None: the missing filter does not match such documents.
In [69]: Article.objects.bungie_allcontent().index('sparrho').filter('missing', field='created').sort('created').count()
Out[69]: 0
In [70]: article = Article.objects.bungie_allcontent().index('sparrho').sort('created')[:1:True]
In [71]: article.created is None
Out[71]: True
File "/home/chris/.virtualenvs/sparrho/src/bungiesearch/bungiesearch/management/commands/search_index.py", line 149, in handle
es.indices.put_mapping(model_name, model_idx.get_mapping(), index=index)
AttributeError: 'list' object has no attribute 'get_mapping'
Connect models to the post_save signal. If possible, define a bulk size so that buffered items are indexed all at once.
Warning: if bulk updating, see if we can connect to a pre-shutdown signal (if such a signal exists) in order to index buffered items.
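The buffering idea can be sketched in plain Python. IndexBuffer and index_func are hypothetical names, and using atexit as the "pre-shutdown" hook is an assumption: it only runs on a normal interpreter exit, not when the process is killed, so it does not fully cover a server instance failure.

```python
import atexit


class IndexBuffer:
    """Collect items and hand them to `index_func` in batches of `bulk_size`."""

    def __init__(self, bulk_size, index_func):
        self.bulk_size = bulk_size
        self.index_func = index_func
        self._items = []
        # Best-effort flush of whatever is still buffered at interpreter exit.
        atexit.register(self.flush)

    def add(self, item):
        self._items.append(item)
        if len(self._items) >= self.bulk_size:
            self.flush()

    def flush(self):
        if self._items:
            batch, self._items = self._items, []
            self.index_func(batch)


indexed_batches = []
buffer = IndexBuffer(bulk_size=3, index_func=indexed_batches.append)
for item_id in (1, 2, 3, 4):
    buffer.add(item_id)
batches_before_flush = list(indexed_batches)
buffer.flush()
```

In a Django setting, add would be called from the post_save receiver, and flush could also be triggered by a periodic task to limit how long items stay unindexed.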
Currently hook_alias is a class method. However, this limits search aliases to models managed by bungiesearch. It should be relatively trivial to allow aliases to work for bungiesearch instances. That would be useful when one wants to query several doc types at once. However, that could prevent automatic object mapping.
The major advantage is being able to combine queries and aliases (supposing managers still work), as such:
Article.objects.search.query('match', field='value').regex('date', gte='2014-09-23')
Will require the following:
Change hook_alias to an instance method
self.model = None on instantiation may help).