bungiesearch's Introduction

bungiesearch's People

Contributors

christopherrabotin, diwu1989, folcon, joshstegmaier, meninoebom, mgeist, moemen, nullsoldier, terite, vannitotaro

bungiesearch's Issues

Query aliases via manager

Would it not be awesome if something like Article.objects.search.text_search(keywords) automatically translated to a user-provided query, e.g. Article.objects.search.filter("term", field="my_term").query("match", another_field=keywords)?

These definitions could live in the ModelIndex definition and rely solely on reflection within the manager (which is always fun).
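A rough sketch of how the reflection could work, not tied to the actual bungiesearch code. SearchStub, AliasManager and _AliasedSearch are hypothetical names; the stub only records the chained calls so the example runs without Django or Elasticsearch.

```python
class SearchStub:
    """Stand-in for a search object; it records the calls chained on it."""

    def __init__(self, calls=None):
        self.calls = list(calls or [])

    def filter(self, *args, **kwargs):
        return SearchStub(self.calls + [("filter", args, kwargs)])

    def query(self, *args, **kwargs):
        return SearchStub(self.calls + [("query", args, kwargs)])


class _AliasedSearch:
    """Hypothetical wrapper that resolves unknown attributes as aliases."""

    def __init__(self, search, aliases):
        self._search = search
        self._aliases = aliases

    def __getattr__(self, name):
        # __getattr__ only runs when normal lookup fails, i.e. for alias names.
        try:
            alias = self._aliases[name]
        except KeyError:
            raise AttributeError("No search alias named %r" % name)
        return lambda *args, **kwargs: alias(self._search, *args, **kwargs)


class AliasManager:
    """Hypothetical manager exposing user-provided aliases on .search."""

    def __init__(self, aliases):
        self.aliases = aliases  # alias name -> callable(search, *args, **kwargs)

    @property
    def search(self):
        return _AliasedSearch(SearchStub(), self.aliases)


def text_search(search, keywords):
    # The user-provided query from the issue text.
    return search.filter("term", field="my_term").query("match", another_field=keywords)


manager = AliasManager({"text_search": text_search})
result = manager.search.text_search("django")
```

Here manager.search.text_search("django") expands to the filter and query calls defined in text_search, and an unknown alias name raises AttributeError as a missing attribute normally would.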

More explicit error messages

The following message doesn't say which model and index failed. Check all exception messages and make them as explicit as possible.

ValueError: Cannot filter by date if the updated_field is not set in the index's Meta class.
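As an illustration only, a more explicit version of that check might look like the sketch below. The function name and its parameters are hypothetical and are not taken from the bungiesearch source.

```python
def check_can_filter_by_date(model_name, index_name, updated_field):
    """Raise an error that names the model and the index at fault."""
    if updated_field is None:
        raise ValueError(
            "Cannot filter by date on model {!r} (index {!r}): "
            "updated_field is not set in the index's Meta class.".format(
                model_name, index_name
            )
        )
```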

How to write filtered queries

I am trying to filter a query by date range and by related field values. I assume the syntax for Bungiesearch filtered queries is derived from elasticsearch-dsl-py. However, after reading the elasticsearch-dsl-py docs I still cannot figure out how to construct filtered queries. Any advice or a nudge toward the right documentation is welcome. Thanks in advance.
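Not an authoritative answer, but a sketch of the clauses such a query is usually built from. The range and term clauses are plain Elasticsearch query DSL. Wrapping them in a bool filter is an assumption about the Elasticsearch version in use, since older releases used a different wrapper. The function name and the field names are made up for the example.

```python
def build_filters(date_field, start, end, term_field, term_value):
    """Combine a date-range clause and a term clause into one filter body."""
    range_clause = {"range": {date_field: {"gte": start, "lte": end}}}
    term_clause = {"term": {term_field: term_value}}
    # Assumption: the cluster accepts a bool query with a "filter" list.
    return {"bool": {"filter": [range_clause, term_clause]}}


body = build_filters("created", "2014-09-01", "2014-09-30", "author.name", "folcon")
```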

Bug in signal processing - self.model is None

self.model is None here. It seems like this should not be the case, and may have slipped through the cracks. I'm going to try to fix this. Django seems to set self.model to None by default.

/Users/betterworks/bungiesearch/tests/core/models.py(5)<module>()
      3 
      4 
----> 5 class Article(models.Model):
      6     title = models.TextField(db_index=True)
      7     authors = models.TextField(blank=True)

/Users/betterworks/bungiesearch/tests/core/models.py(22)Article()
     20     popularity_index = models.IntegerField(default=0)
     21 
---> 22     objects = BungiesearchManager()
     23 
     24     class Meta:

> /Users/betterworks/bungiesearch/bungiesearch/managers.py(35)__init__()
     33         import ipdb; ipdb.set_trace()
     34         settings = Bungiesearch.BUNGIE
---> 35         if 'SIGNALS' in settings:
     36             self.signal_processor = get_signal_processor()
     37             self.signal_processor.setup(self.model)

ipdb> self.model
ipdb> type(self.model)
<type 'NoneType'>

Merge in elasticsearch-dsl persistence model

A couple of weeks ago, elasticsearch-dsl added a persistence model, which can be used to define indices and persist data. Bungiesearch also has this functionality, and much more, including search aliases and Django integration.

The idea of this issue is to adopt ES-dsl's persistence, staying true to ES-dsl while adding Django integration.

Update index with a start and ending date

This will enable devs to rely on bulk indexing yet still create a Celery task that re-indexes the latest documents, in case a server instance failed before the buffer was full and indexed.
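A minimal sketch of the selection step, assuming each item carries an updated timestamp. The function name and the dictionary-based items are placeholders for model instances.

```python
from datetime import datetime


def items_updated_between(items, start=None, end=None, updated_field="updated"):
    """Return the items whose update timestamp falls within [start, end]."""
    selected = []
    for item in items:
        stamp = item[updated_field]
        if start is not None and stamp < start:
            continue
        if end is not None and stamp > end:
            continue
        selected.append(item)
    return selected


items = [
    {"id": 1, "updated": datetime(2014, 9, 1)},
    {"id": 2, "updated": datetime(2014, 9, 15)},
    {"id": 3, "updated": datetime(2014, 10, 1)},
]
window = items_updated_between(items, start=datetime(2014, 9, 10), end=datetime(2014, 9, 30))
```

A periodic task could call this with the time of the last successful run as start, so anything left in a lost buffer gets indexed again.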

Allow search aliases to be applicable to all managed models

This will allow very broad search aliases to be created, such as the following:

import logging

from bungiesearch.aliases import SearchAlias


class Range(SearchAlias):
    def alias_for(self, field, gte=None, lte=None, boost=None, as_query=False):
        # Build the body of an Elasticsearch range clause for `field`.
        body = {field: {}}
        # Compare against None so that falsy bounds such as 0 are kept.
        if gte is not None:
            body[field]['gte'] = gte
        if lte is not None:
            body[field]['lte'] = lte
        if boost is not None:
            if not as_query:
                logging.warning('Boost is not applicable to search alias Range when not used as a query.')
            else:
                body[field]['boost'] = boost
        if as_query:
            return self.search_instance.query({'range': body})
        return self.search_instance.filter({'range': body})

How do you create indices for existing models

I am a relatively new Django developer, so forgive the question if it should be easily answered from docs or source.

Is there a management command to create ES indexes for models that already exist?

Time specific indexing

Date-based indexing lets one index items whose updated date is later or earlier than a provided date, but it does not support times. This issue is raised to support time as well.
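One possible way to accept both forms, sketched with the standard library only. The function name and the accepted formats are assumptions, not what the management command currently does.

```python
from datetime import datetime

# Hypothetical list of accepted formats, from most to least specific.
BOUND_FORMATS = ("%Y-%m-%dT%H:%M:%S", "%Y-%m-%dT%H:%M", "%Y-%m-%d")


def parse_bound(value):
    """Parse a date, or a date with a time, into a datetime."""
    for fmt in BOUND_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError("Unrecognised date or datetime: {!r}".format(value))
```

A date without a time is read as midnight at the start of that day, so existing date-only usage keeps working.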

Search alias name spaces

Instead of having a single set of search aliases, define aliases as a dictionary so that they can be separated into namespaces.
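A small sketch of what that could look like, assuming each namespace becomes a prefix on the attribute name. The helper and the alias functions are hypothetical.

```python
def alias_attribute_names(namespaces):
    """Flatten {prefix: {name: alias}} into {"prefix_name": alias}."""
    names = {}
    for prefix, aliases in namespaces.items():
        for name, alias in aliases.items():
            names["{}_{}".format(prefix, name)] = alias
    return names


def title_search(title):
    return ("match", {"title": title})


def admin_title_search(title):
    return ("term", {"title": title})


namespaces = {
    "bsearch": {"title_search": title_search},
    "admin": {"title_search": admin_title_search},
}
names = alias_attribute_names(namespaces)
```

Two aliases with the same name can then coexist as long as they sit in different namespaces.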

Automatic mapping to return elasticsearch meta information as well

Would it be valuable to return elasticsearch's meta information in the automatically mapped result? This could be achieved using a wrapping class which would extend whichever model is to be mapped and add additional attributes. In that case, we must also take into consideration that devs may wish to get an item from a search and auto-mapping, change it, and save it. Attributes alien to the model may prevent this saving, so it may be necessary to override the save function and remove any additional attributes.

This would require some nice polymorphic code, and especially extensive testing.

Any thoughts @Folcon, @gtebbutt ?
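To make the idea concrete, here is a sketch with a plain Python class standing in for a Django model. with_search_meta, Record and the search_meta attribute are all hypothetical names. With real Django models, subclassing a concrete model creates multi-table inheritance unless the subclass is declared as a proxy, so that part would need care.

```python
def with_search_meta(model_cls):
    """Return a subclass whose save() hides the extra search_meta attribute."""

    class Mapped(model_cls):
        def save(self, *args, **kwargs):
            # Remove the alien attribute before saving, restore it afterwards.
            meta = self.__dict__.pop("search_meta", None)
            try:
                return super().save(*args, **kwargs)
            finally:
                if meta is not None:
                    self.search_meta = meta

    Mapped.__name__ = model_cls.__name__
    return Mapped


class Record:
    """Stand-in for a model; save() records which attributes it saw."""

    def __init__(self, title):
        self.title = title
        self.saved = None

    def save(self):
        self.saved = sorted(key for key in vars(self) if key != "saved")


item = with_search_meta(Record)("Title one")
item.search_meta = {"score": 1.5, "index": "bungiesearch_demo"}
item.save()
```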

Support elasticsearch-dsl-py 0.0.4

Version 0.0.4 of elasticsearch-dsl-py fails. It seems like the Result object has changed and no longer contains the _meta attribute.

Temporary solution: use elasticsearch-dsl-py version 0.0.3dev.

======================================================================
ERROR: test_concat_queries (tests.core.test_bungiesearch.ModelIndexTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/[redacted]/bungiesearch/tests/core/test_bungiesearch.py", line 234, in test_concat_queries
    items = Article.objects.bsearch_title_search('title')[::False] + NoUpdatedField.objects.search.query('match', title='My title')[::False]
  File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 341, in __getitem__
    results = super(Bungiesearch, self).__getitem__(key).execute()
  File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 286, in execute
    self.map_results()
  File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 295, in map_results
    self.results = Bungiesearch.map_raw_results(self.raw_results, self)
  File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 171, in map_raw_results
    model_name = result._meta.doc_type
  File "/home/chris/.virtualenvs/bungiesearch/src/elasticsearch-dsl-master/elasticsearch_dsl/utils.py", line 109, in __getattr__
    '%r object has no attribute %r' % (self.__class__.__name__, attr_name))
AttributeError: 'Result' object has no attribute '_meta'

======================================================================
ERROR: test_fetch_item (tests.core.test_bungiesearch.ModelIndexTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/[redacted]/bungiesearch/tests/core/test_bungiesearch.py", line 76, in test_fetch_item
    self.assertEqual(Article.objects.search.query('match', _all='Description')[0], Article.objects.get(title='Title one'), 'Searching for "Description" did not return just the first Article.')
  File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 341, in __getitem__
    results = super(Bungiesearch, self).__getitem__(key).execute()
  File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 286, in execute
    self.map_results()
  File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 295, in map_results
    self.results = Bungiesearch.map_raw_results(self.raw_results, self)
  File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 171, in map_raw_results
    model_name = result._meta.doc_type
  File "/home/chris/.virtualenvs/bungiesearch/src/elasticsearch-dsl-master/elasticsearch_dsl/utils.py", line 109, in __getattr__
    '%r object has no attribute %r' % (self.__class__.__name__, attr_name))
AttributeError: 'Result' object has no attribute '_meta'

======================================================================
ERROR: test_iteration (tests.core.test_bungiesearch.ModelIndexTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/[redacted]/bungiesearch/tests/core/test_bungiesearch.py", line 92, in test_iteration
    self.assertTrue(all([result in db_items for result in lazy_search]), 'Searching for title "title" did not return all articles.')
  File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 312, in __iter__
    self.execute()
  File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 286, in execute
    self.map_results()
  File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 295, in map_results
    self.results = Bungiesearch.map_raw_results(self.raw_results, self)
  File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 171, in map_raw_results
    model_name = result._meta.doc_type
  File "/home/chris/.virtualenvs/bungiesearch/src/elasticsearch-dsl-master/elasticsearch_dsl/utils.py", line 109, in __getattr__
    '%r object has no attribute %r' % (self.__class__.__name__, attr_name))
AttributeError: 'Result' object has no attribute '_meta'

======================================================================
ERROR: test_optimal_queries (tests.core.test_bungiesearch.ModelIndexTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/[redacted]/bungiesearch/tests/core/test_bungiesearch.py", line 228, in test_optimal_queries
    src_item = NoUpdatedField.objects.search.query('match', title='My title')[0]
  File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 341, in __getitem__
    results = super(Bungiesearch, self).__getitem__(key).execute()
  File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 286, in execute
    self.map_results()
  File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 295, in map_results
    self.results = Bungiesearch.map_raw_results(self.raw_results, self)
  File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 171, in map_raw_results
    model_name = result._meta.doc_type
  File "/home/chris/.virtualenvs/bungiesearch/src/elasticsearch-dsl-master/elasticsearch_dsl/utils.py", line 109, in __getattr__
    '%r object has no attribute %r' % (self.__class__.__name__, attr_name))
AttributeError: 'Result' object has no attribute '_meta'

======================================================================
ERROR: test_post_save (tests.core.test_bungiesearch.ModelIndexTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/[redacted]/bungiesearch/tests/core/test_bungiesearch.py", line 178, in test_post_save
    self.assertNotEqual(find_three[0:1:True]._meta.index, find_three[1:2:True]._meta.index, 'Searching for "three" did not return items from different indices.')
  File "/home/chris/.virtualenvs/bungiesearch/src/elasticsearch-dsl-master/elasticsearch_dsl/utils.py", line 109, in __getattr__
    '%r object has no attribute %r' % (self.__class__.__name__, attr_name))
AttributeError: 'Result' object has no attribute '_meta'

======================================================================
ERROR: test_search_aliases (tests.core.test_bungiesearch.ModelIndexTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/[redacted]/bungiesearch/tests/core/test_bungiesearch.py", line 144, in test_search_aliases
    self.assertTrue(all([result in db_items for result in title_alias]), 'Alias searching for title "title" did not return all articles.')
  File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 312, in __iter__
    self.execute()
  File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 286, in execute
    self.map_results()
  File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 295, in map_results
    self.results = Bungiesearch.map_raw_results(self.raw_results, self)
  File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 171, in map_raw_results
    model_name = result._meta.doc_type
  File "/home/chris/.virtualenvs/bungiesearch/src/elasticsearch-dsl-master/elasticsearch_dsl/utils.py", line 109, in __getattr__
    '%r object has no attribute %r' % (self.__class__.__name__, attr_name))
AttributeError: 'Result' object has no attribute '_meta'

======================================================================
FAIL: test_raw_fetch (tests.core.test_bungiesearch.ModelIndexTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/[redacted]/bungiesearch/tests/core/test_bungiesearch.py", line 84, in test_raw_fetch
    self.assertTrue(hasattr(item, '_meta'), 'Fetching first raw results did not return an object with a _meta attribute.')
AssertionError: Fetching first raw results did not return an object with a _meta attribute.

======================================================================
FAIL: test_specify_index (tests.core.test_bungiesearch.ModelIndexTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/[redacted]/bungiesearch/tests/core/test_bungiesearch.py", line 266, in test_specify_index
    self.assertEqual(Article.objects.count(), Article.objects.search_index('bungiesearch_demo').count(), 'Indexed items on bungiesearch_demo for Article does not match number in database.')
AssertionError: Indexed items on bungiesearch_demo for Article does not match number in database.

----------------------------------------------------------------------
Ran 22 tests in 10.372s

FAILED (failures=2, errors=6)
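One way to stay compatible with both versions would be a small accessor like the sketch below. That the newer releases expose the same information under a meta attribute is an assumption to verify against the elasticsearch-dsl-py changelog; result_meta is a hypothetical helper.

```python
from types import SimpleNamespace


def result_meta(result):
    """Return the meta information of a result, whichever name it uses."""
    for name in ("_meta", "meta"):  # "meta" is an assumed newer name
        meta = getattr(result, name, None)
        if meta is not None:
            return meta
    raise AttributeError("Result has neither a _meta nor a meta attribute")


# Stand-ins for results returned by the old and the new library versions.
old_style = SimpleNamespace(_meta=SimpleNamespace(doc_type="Article"))
new_style = SimpleNamespace(meta=SimpleNamespace(doc_type="Article"))
```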

Delay database fetching on demand

One might want to do multiple searches and concatenate the elastic search results, or process them in some way (e.g. sort them or filter them by score) and only then map them to database items.

A possible solution would be extracting the mapping from execute and moving it to a class method. It would accept a list (or tuple) of raw results, and then map them.

Hence, one could do the following (a better example is multiple must/should queries on the same data set):

items = []
items += Article.objects.search.query("match", title="Some title")[:20:True]
items += Article.objects.search.query("match", description="a description")[:20:True]
items = [item for item in items if item.score > 0.75]
Bungiesearch.map_raw_results(items)

Another solution would be subclassing list in Bungiesearch to use it as such:

items = Results()
items += Article.objects.search.query("match", title="Some title")[:20:True]
items += Article.objects.search.query("match", description="a description")[:20:True]
items = [item for item in items if item.score > 0.75]
items[:10] # Executes the mapping
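The flow in both proposals can be sketched without Elasticsearch or a database. RawHit, map_hits and fetch_by_ids are hypothetical stand-ins. The point is that the hits are combined and filtered first, and the database is hit once at the end.

```python
from collections import namedtuple

RawHit = namedtuple("RawHit", ["id", "score"])


def map_hits(raw_hits, fetch_by_ids):
    """Map raw hits to database items with one lookup, keeping hit order."""
    ids = [hit.id for hit in raw_hits]
    items_by_id = fetch_by_ids(ids)
    return [items_by_id[i] for i in ids if i in items_by_id]


# Fake results of two separate searches, and a fake database.
title_hits = [RawHit(1, 0.9), RawHit(2, 0.4)]
description_hits = [RawHit(3, 0.8)]
database = {1: "Article one", 2: "Article two", 3: "Article three"}
lookups = []


def fetch_by_ids(ids):
    lookups.append(list(ids))  # record each database round trip
    return {i: database[i] for i in ids if i in database}


hits = [hit for hit in title_hits + description_hits if hit.score > 0.75]
articles = map_hits(hits, fetch_by_ids)
```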

Python 3 support

Elasticsearch-dsl-py supports Python 3. However, testing bungiesearch on Python 3 fails with a Runtime Error.

The code in release 1.1.0 will have some initial steps towards Python 3 support, but as the build shows, the support isn't here yet.

Signals raise exceptions on unmanaged models

  File "/home/rof/.virtualenv/local/lib/python2.7/site-packages/django/db/models/base.py", line 664, in save_base
    update_fields=update_fields, raw=raw, using=using)
  File "/home/rof/.virtualenv/local/lib/python2.7/site-packages/django/dispatch/dispatcher.py", line 170, in send
    response = receiver(signal=self, sender=sender, **named)
  File "/home/rof/.virtualenv/src/bungiesearch/bungiesearch/signals.py", line 16, in post_save_connector
    update_index(__items_to_be_indexed__[sender], sender.__name__, buffer_size)
  File "/home/rof/.virtualenv/src/bungiesearch/bungiesearch/utils.py", line 19, in update_index
    index_name = src.get_index(model_name)
  File "/home/rof/.virtualenv/src/bungiesearch/bungiesearch/__init__.py", line 101, in get_index
    raise KeyError('Could not find any index defined for {}. Is the model in one of the model index modules of BUNGIESEARCH["INDICES"]?'.format(model))
KeyError: 'Could not find any index defined for Session. Is the model in one of the model index modules of BUNGIESEARCH["INDICES"]?'
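A guard along these lines would avoid the exception. This is a simplified sketch with a hypothetical signature, not the actual signals.py code: the connector returns early when the sender has no index.

```python
def post_save_connector(sender_name, indices, update_index):
    """Index the sender only if it is managed; ignore other models."""
    index_name = indices.get(sender_name)
    if index_name is None:
        # Unmanaged model, e.g. Session: nothing to index, so do not raise.
        return False
    update_index(sender_name, index_name)
    return True


indexed = []
indices = {"Article": "bungiesearch_demo"}


def record_update(model_name, index_name):
    indexed.append((model_name, index_name))
```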

Cannot request to fetch only the _id of the document

Description

When fields is called on a Bungiesearch instance and the only parameter is _id, map_raw_results fails to fetch the correct information from the database. This is problematic when asking elasticsearch to return only the IDs, or when the id field (without an underscore) is not provided when calling fields.

Example

In [3]: some_content = RawArticle.objects.bungie_content()
In [4]: for item in some_content[5:10:True]:              
    print item
   ...:     
<Result(sparrho/RawArticle/2477511): {}>
<Result(sparrho/RawArticle/2477523): {}>
<Result(sparrho/RawArticle/2477528): {}>
<Result(sparrho/RawArticle/2477491): {}>
<Result(sparrho/RawArticle/2477530): {}>
In [5]: some_content = RawArticle.objects.search.fields(['_id'])
In [6]: for item in some_content[5:10:True]:                    
    print item
   ...:     
<Result(sparrho/RawArticle/2477484): {}>
<Result(sparrho/RawArticle/2477504): {}>
<Result(sparrho/RawArticle/2477509): {}>
<Result(sparrho/RawArticle/2477511): {}>
<Result(sparrho/RawArticle/2477523): {}>
In [7]: for item in some_content[5:10:False]:
    print item
   ...:     
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-7-7541076dcf2b> in <module>()
----> 1 for item in some_content[5:10:False]:
      2     print item
      3 
/home/chris/.virtualenvs/sparrho-dj17/src/bungiesearch/bungiesearch/__init__.pyc in __getitem__(self, key)
    339         else:
    340             single_item = True
--> 341         results = super(Bungiesearch, self).__getitem__(key).execute()
    342         if single_item:
    343             try:
/home/chris/.virtualenvs/sparrho-dj17/src/bungiesearch/bungiesearch/__init__.pyc in execute(self, return_results)
    284             self.results = self.raw_results
    285         else:
--> 286             self.map_results()
    287 
    288         if return_results:
/home/chris/.virtualenvs/sparrho-dj17/src/bungiesearch/bungiesearch/__init__.pyc in map_results(self)
    293         Maps raw results and store them.
    294         '''
--> 295         self.results = Bungiesearch.map_raw_results(self.raw_results, self)
    296 
    297     def only(self, *fields):
/home/chris/.virtualenvs/sparrho-dj17/src/bungiesearch/bungiesearch/__init__.pyc in map_raw_results(cls, raw_results, instance)
    174                 results[pos] = result
    175             else:
--> 176                 model_results['{}.{}'.format(result._meta.index, model_name)].append(result.id)
    177                 found_results['{1._meta.index}.{0}.{1.id}'.format(model_name, result)] = (pos, result._meta)
    178 
/home/chris/.virtualenvs/sparrho-dj17/lib/python2.7/site-packages/elasticsearch_dsl/utils.pyc in __getattr__(self, attr_name)
    104         except KeyError:
    105             raise AttributeError(
--> 106                 '%r object has no attribute %r' % (self.__class__.__name__, attr_name))
    107 
    108     def __getitem__(self, key):
AttributeError: 'Result' object has no attribute 'id'
In [8]: dir(item)
Out[8]: ['_meta']
In [9]: dir(item._meta)
Out[9]: ['doc_type', u'id', u'index', u'score']
In [10]: 
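Since the session above shows that the id is still available on _meta, a fallback like the following sketch might be enough. result_id is a hypothetical helper and SimpleNamespace stands in for the Result class.

```python
from types import SimpleNamespace


def result_id(result):
    """Return the document id, falling back to the meta information."""
    try:
        return result.id
    except AttributeError:
        return result._meta.id


# A result fetched with all fields, and one fetched with only _id.
full_result = SimpleNamespace(id=2477511, _meta=SimpleNamespace(id=u"2477511"))
id_only_result = SimpleNamespace(_meta=SimpleNamespace(id=u"2477484"))
```

The id found on _meta may be a string where the model field is an integer, so a conversion may be needed before querying the database.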

Indexing a None value may index it as a string

At least in some instances, it seems that indexing a None value will index it as a string rather than as a missing value: the missing filter does not match such documents.

In [69]: Article.objects.bungie_allcontent().index('sparrho').filter('missing', field='created').sort('created').count()
Out[69]: 0
In [70]: article = Article.objects.bungie_allcontent().index('sparrho').sort('created')[:1:True]
In [71]: article.created is None
Out[71]: True

Signal connection

Connect models to the post_save signal. If possible, define a bulk size so that several items are indexed at once.
Warning: if bulk updating, see whether we can connect to a pre-shutdown signal (if one exists) in order to index buffered items.
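A sketch of the buffering idea, using the standard library's atexit as the shutdown hook. IndexBuffer is a hypothetical class. atexit handlers do not run if the process is killed by an unhandled signal, so this only covers clean shutdowns.

```python
import atexit


class IndexBuffer:
    """Collect items and index them in batches of `size`."""

    def __init__(self, index_items, size):
        self.index_items = index_items  # callable taking a list of items
        self.size = size
        self.pending = []

    def add(self, item):
        self.pending.append(item)
        if len(self.pending) >= self.size:
            self.flush()

    def flush(self):
        if self.pending:
            batch, self.pending = self.pending, []
            self.index_items(batch)


indexed_batches = []
buffer = IndexBuffer(indexed_batches.append, size=3)
# Index whatever is still buffered when the interpreter exits normally.
atexit.register(buffer.flush)
```

A post_save receiver would call buffer.add(instance), and a periodic task could call buffer.flush() as an extra safety net.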

Aliases should be able to work with Bungiesearch instances

Currently hook_alias is a class method. However, this limits search aliases to models managed by bungiesearch. It should be relatively trivial to allow aliases to work for bungiesearch instances. That would be useful when one wants to query several doc types at once. However, that could prevent automatic object mapping.

The major advantage is being able to combine queries and aliases (supposing managers still work), as such:
Article.objects.search.query('match', field='value').regex('date', gte='2014-09-23')

Will require the following:

  • Convert hook_alias to an instance method
  • Remove model verification if used out of a model (self.model = None on instantiation may help).
