
django-haystack's Introduction

Haystack

author

Daniel Lindsley

date

2013/07/28

Haystack provides modular search for Django. It features a unified, familiar API that allows you to plug in different search backends (such as Solr, Elasticsearch, Whoosh, Xapian, etc.) without having to modify your code.

Haystack is BSD licensed, plays nicely with third-party apps without needing to modify their source, and supports advanced features like faceting, More Like This, highlighting, spatial search and spelling suggestions.

You can find more information at http://haystacksearch.org/.

Getting Help

There is a mailing list (http://groups.google.com/group/django-haystack/) available for general discussion and an IRC channel (#haystack on irc.freenode.net).

Documentation

See the changelog

Requirements

Haystack has a relatively easily-met set of requirements.

Additionally, each backend has its own requirements. You should refer to https://django-haystack.readthedocs.io/en/latest/installing_search_engines.html for more details.

Experimental support for Django v5.0

The current release on PyPI does not yet support Django v5.0.

To run on Django v5.0, please install by using: pip install git+https://github.com/django-haystack/django-haystack.git

django-haystack's People

Contributors

acdha, asedeno, benspaulding, bigjust, cabalist, cclauss, claudep, dcwatson, dulmandakh, elsaico, fabiopiovam, honzakral, jezdez, joaojunior, madanthangavelu, mattdeboard, mcroydon, mrkioz, pre-commit-ci[bot], rabidcicada, robhudson, sk1p, stevebyerly, subsume-zz, surgo, tmc, toastdriven, tomkins, troygrosfield, tymofij

django-haystack's Issues

Xapian Backend

The Xapian search engine is frequently requested. Due to licensing, it cannot be bundled with Haystack, but a separate download ought to be provided. Easy installation of that (and other GPL backends) would be nice.

./manage.py dumpdata is broken if haystack is installed

I tried to dumpdata from my current project to create a fixture.

Here is the output for the dumpdata command:

phxx@xenya$ ./manage.py dumpdata block --indent=2 | head
Loaded URLconf to initialize SearchSite...
Main site registered 2 index(es).
[
{
"pk": 1,
"model": "block.block",
"fields": {
"title": "Neueste Datenbl\u00e4tter",
"sort_value": 0.0,
"created": "2009-05-11 11:35:31",

As you can see, haystack prints two lines before the command's own output. dumpdata itself is not really broken, but if you redirect that output into a file (as explained in the Django documentation) and then try to load the fixture with "./manage.py loaddata", it raises an error. The two extra lines appear only if haystack is installed (listed in the INSTALLED_APPS setting).
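
One common way to avoid this kind of problem is to send status messages to stderr so that stdout carries only the serialized data. The sketch below is not Haystack's code: dump_records and both stream parameters are hypothetical names used for illustration, and in-memory streams stand in for the shell redirect.

```python
import io
import json
import sys


def dump_records(records, out=sys.stdout, log=sys.stderr):
    """Write JSON to `out` and status messages to `log`."""
    # Diagnostics go to the log stream, so redirecting stdout to a file
    # captures only the serialized data.
    print("Main site registered 2 index(es).", file=log)
    json.dump(records, out, indent=2)


# Simulate `./manage.py dumpdata > fixture.json` with in-memory streams.
fixture, log = io.StringIO(), io.StringIO()
dump_records([{"pk": 1, "model": "block.block"}], out=fixture, log=log)

# The captured "file" holds valid JSON because no status line leaked into it.
parsed = json.loads(fixture.getvalue())
```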

KeyError when reindexing

I get a key error when reindexing. This is probably a whoosh issue, but I am not completely sure how to extract all haystack specific context.

(richard garibaldi):~/Projects/balkonetka/balkonetka% ./manage.py reindex
Loaded URLconf to initialize SearchSite...
Main site registered 2 index(es).
Indexing 74 bras.
Indexing 54 posts.
Traceback (most recent call last):
  File "./manage.py", line 11, in <module>
    execute_manager(settings)
  File "/Users/richard/Projects/balkonetka/balkonetka/django/core/management/__init__.py", line 340, in execute_manager
    utility.execute()
  File "/Users/richard/Projects/balkonetka/balkonetka/django/core/management/__init__.py", line 295, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/Users/richard/Projects/balkonetka/balkonetka/django/core/management/base.py", line 192, in run_from_argv
    self.execute(*args, **options.__dict__)
  File "/Users/richard/Projects/balkonetka/balkonetka/django/core/management/base.py", line 219, in execute
    output = self.handle(*args, **options)
  File "/Users/richard/Projects/balkonetka/balkonetka/haystack/management/commands/reindex.py", line 34, in handle
    self.handle_app(None, **options)
  File "/Users/richard/Projects/balkonetka/balkonetka/haystack/management/commands/reindex.py", line 81, in handle_app
    index.backend.update(index, small_cache_qs[start:end])
  File "/Users/richard/Projects/balkonetka/balkonetka/haystack/backends/whoosh_backend.py", line 120, in update
    writer.commit()
  File "/Users/richard/Projects/balkonetka/balkonetka/whoosh/writing.py", line 218, in commit
    self._merge_segments(mergetype)
  File "/Users/richard/Projects/balkonetka/balkonetka/whoosh/writing.py", line 230, in _merge_segments
    new_segments = mergetype(self.index, sw, self.segments)
  File "/Users/richard/Projects/balkonetka/balkonetka/whoosh/writing.py", line 65, in MERGE_SMALL
    writer.add_segment(ix, seg)
  File "/Users/richard/Projects/balkonetka/balkonetka/whoosh/writing.py", line 406, in add_segment
    newdoc = doc_map[docnum]
KeyError: 109

Support for related models

Actually more like support for related SearchIndexes. An example case is searching for all blog posts that mention a certain word either in their body or in one of the comments. Searching both indexes is certainly possible, but it requires manually following foreign keys and merging the results to avoid duplicate entries.

It would be nice to be able to use related SearchIndexes just like you use any other SearchIndex fields.

Does Whoosh Support Date Fields?

Sorry for a less pointed ticket here, but as you've probably guessed from the other issues I've filed, I'm trying to implement a date filter using the Whoosh backend. I'm running into this error from Whoosh:

In [36]: q.query_filters
Out[36]:
[<QueryFilter: AND content__exact=nunc>,
<QueryFilter: AND pub_date__gte=2009-05-12 12:08:21.131583>]

In [37]: q.build_query()
Out[37]: u'nunc AND NOT pub_date:*.."2009-05-12T12:08:21.000Z"'

In [38]: q.run()

ParseException: Expected end of text (at char 23), (line:1, col:24)

I'm totally green about how whoosh (or any search index, really) works, but I can't find any docs or code referring to handling dates. I think it may be objecting to the "*.." operators being used for a string field? Maybe this isn't even a whoosh specific problem, but for today I don't have time to see what Solr does with this data.

manage.py reindex raises Exception for Whoosh backend

The first time you run reindex everything is fine. The second and subsequent times raise the following exception:

(richard garibaldi):~/Projects/balkonetka/balkonetka% ./manage.py reindex
Loaded URLconf to initialize SearchSite...
Main site registered 1 index(es).
Indexing 73 bras.
Traceback (most recent call last):
  File "./manage.py", line 11, in <module>
    execute_manager(settings)
  File "/Users/richard/Projects/balkonetka/balkonetka/django/core/management/__init__.py", line 340, in execute_manager
    utility.execute()
  File "/Users/richard/Projects/balkonetka/balkonetka/django/core/management/__init__.py", line 295, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/Users/richard/Projects/balkonetka/balkonetka/django/core/management/base.py", line 192, in run_from_argv
    self.execute(*args, **options.__dict__)
  File "/Users/richard/Projects/balkonetka/balkonetka/django/core/management/base.py", line 219, in execute
    output = self.handle(*args, **options)
  File "/Users/richard/Projects/balkonetka/balkonetka/haystack/management/commands/reindex.py", line 34, in handle
    self.handle_app(None, **options)
  File "/Users/richard/Projects/balkonetka/balkonetka/haystack/management/commands/reindex.py", line 81, in handle_app
    index.backend.update(index, small_cache_qs[start:end])
  File "/Users/richard/Projects/balkonetka/balkonetka/haystack/backends/whoosh_backend.py", line 117, in update
    writer.update_document(**doc)
  File "/Users/richard/Projects/balkonetka/balkonetka/whoosh/writing.py", line 204, in update_document
    self.delete_by_term(name, fields[name], searcher = searcher)
  File "/Users/richard/Projects/balkonetka/balkonetka/whoosh/index.py", line 184, in delete_by_term
    return self.delete_by_query(q, searcher = searcher)
  File "/Users/richard/Projects/balkonetka/balkonetka/whoosh/index.py", line 205, in delete_by_query
    for docnum in q.docs(s):
  File "/Users/richard/Projects/balkonetka/balkonetka/whoosh/query.py", line 267, in docs
    if (fieldnum, text) in searcher:
  File "/Users/richard/Projects/balkonetka/balkonetka/whoosh/searching.py", line 62, in __contains__
    return term in self.term_reader
  File "/Users/richard/Projects/balkonetka/balkonetka/whoosh/util.py", line 177, in wrapper
    raise Exception("This object has been closed")
Exception: This object has been closed

What is more, Whoosh leaves a lock file behind, which you have to remove manually.

SearchQuerySet unnecessarily loads in results not within the slice asked for

Given the following code:

from haystack.query import SearchQuerySet
hello_results = SearchQuerySet().filter(content='hello')[80:100]

SearchQuerySet will load all results from 0 to 100 when all we need is to load 80 to 100. This becomes painful on even larger sets. Currently SearchQuerySet always fills the _result_cache up to the index being requested. It should do some checking: perhaps if the requested index is greater than the _result_cache length + ITERATOR_LOAD_PER_QUERY, it could skip populating the _result_cache below that index.
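
As a rough illustration of the idea, the sketch below plans backend queries that cover only the requested slice. It is not Haystack's implementation: plan_queries is a hypothetical helper, and the per-query size of 10 is an assumed stand-in for ITERATOR_LOAD_PER_QUERY.

```python
def plan_queries(start, stop, per_query=10):
    """Return (offset, limit) pairs that cover only results[start:stop]."""
    queries = []
    offset = start
    while offset < stop:
        # Never request past the end of the slice.
        limit = min(per_query, stop - offset)
        queries.append((offset, limit))
        offset += limit
    return queries


# A [80:100] slice needs two queries of 10 results, not ten queries from 0.
plan = plan_queries(80, 100)
```

A real cache would still need to record which positions are filled, since results below the slice start would be missing.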

Whoosh Error

Whoosh sometimes throws an exception when searching, usually to the effect of:

"KeyError at /search/: ([u'city'], {})"
@whoosh/fields.py in name_to_number, line 339

Create stored fields iterator or method for SearchResult

In some cases, especially when converting a SearchResult into JSON format, it would be nice to feed only the stored fields to the JSON converter. Currently you declare stored fields in the index and then name those same fields again when converting to JSON; an iterator or method that returns them would bring this process more in line with the DRY principle.

QueryFilter.__repr__ assumes values are strings

In attempting to implement date filtering along the lines of the documentation (e.g.
SearchQuerySet().filter(content='foo', pub_date__lte=datetime.date(2008, 1, 1))
)

I get an error when trying to inspect the filters.

/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/haystack/backends/__init__.pyc in __repr__(self)
    121 join = 'OR'
    122
--> 123 return '<QueryFilter: %s %s=%s>' % (join, FILTER_SEPARATOR.join((self.field, self.filter_type)), self.value.encode('utf8'))
    124
    125 def split_expression(self, expression):

AttributeError: 'datetime.date' object has no attribute 'encode'

changing line 123 to:
return '<QueryFilter: %s %s=%s>' % (join, FILTER_SEPARATOR.join((self.field, self.filter_type)), str(self.value).encode('utf8'))

seems to do the trick.
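
For reference, here is a minimal Python 3 sketch of a __repr__ that tolerates non-string values. The class is a stripped-down stand-in rather than Haystack's QueryFilter, the is_or flag is hypothetical, and the separator value is an assumption made for the example.

```python
import datetime

FILTER_SEPARATOR = "__"  # assumed value, for illustration only


class QueryFilter:
    """Stripped-down stand-in for the class discussed above."""

    def __init__(self, field, filter_type, value, is_or=False):
        self.field = field
        self.filter_type = filter_type
        self.value = value
        self.is_or = is_or

    def __repr__(self):
        join = "OR" if self.is_or else "AND"
        key = FILTER_SEPARATOR.join((self.field, self.filter_type))
        # str() accepts any value type, so dates and numbers no longer fail.
        return "<QueryFilter: %s %s=%s>" % (join, key, str(self.value))


text = repr(QueryFilter("pub_date", "lte", datetime.date(2008, 1, 1)))
```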

patch to solr backend to permit passing datetimes and dates to query filters

Right now, solr_backend's build_query() will explode if you pass in a datetime to a query filter, a la

 qs.filter(pub_date__lte=datetime.datetime.now())

which I think is desirable to support. The patch is very straightforward:

diff --git a/haystack/backends/solr_backend.py b/haystack/backends/solr_backend.py
index 122a636..a982421 100644
--- a/haystack/backends/solr_backend.py
+++ b/haystack/backends/solr_backend.py
@@ -1,3 +1,4 @@
+import datetime
 import sys
 from django.conf import settings
 from django.core.exceptions import ImproperlyConfigured
@@ -202,6 +203,8 @@ class SearchQuery(BaseSearchQuery):

                 if isinstance(value, (int, long, float, complex)):
                     value = str(value)
+                elif isinstance(value, (datetime.datetime, datetime.date)):
+                    value=self.backend.conn._from_python(value)

                 # Check to see if it's a phrase for an exact match.
                 if ' ' in value:

search for "a b" causes an AttributeError exception

get AttributeError at /search/
'NoneType' object has no attribute 'doc_scores'
Exception Location: /home/mick/src/iae/whoosh/searching.py in search, line 170

The following addition to the test cases causes this error

~/src/django-haystack$ git diff
diff --git a/tests/whoosh_tests/tests/whoosh_backend.py b/tests/whoosh_tests/tests/whoosh_backend.py
index 0004f76..3e5f613 100644
--- a/tests/whoosh_tests/tests/whoosh_backend.py
+++ b/tests/whoosh_tests/tests/whoosh_backend.py
@@ -112,7 +112,8 @@ class WhooshSearchBackendTestCase(TestCase):

     # A one letter query string gets nabbed by a stopwords filter. Should
     # always yield zero results.
-    self.assertEqual(self.sb.search('a'), {'hits': 0, 'results': []})
+    self.assertEqual(self.sb.search('a'), {'hits': 0, 'results': []})
+    self.assertEqual(self.sb.search('a b'), {'hits': 0, 'results': []})

     self.assertEqual(self.sb.search('*')['hits'], 3)

Test failures

I get the following test failures:

FAIL: test_build_query_with_models (solr_tests.tests.solr_query.SolrSearchQueryTestCase)

Traceback (most recent call last):
File "/home/joseph/utils/src/django-haystack.git/tests/solr_tests/tests/solr_query.py", line 94, in test_build_query_with_models
self.assertEqual(self.sq.build_query(), '(hello) AND (django_ct_s:core.mockmodel OR django_ct_s:core.anothermockmodel)')
AssertionError: u'(hello) AND (django_ct_s:core.anothermockmodel OR django_ct_s:core.mockmodel)' != '(hello) AND (django_ct_s:core.mockmodel OR django_ct_s:core.anothermockmodel)'

FAIL: test_build_query_with_models (whoosh_tests.tests.whoosh_query.WhooshSearchQueryTestCase)

Traceback (most recent call last):
File "/home/joseph/utils/src/django-haystack.git/tests/whoosh_tests/tests/whoosh_query.py", line 95, in test_build_query_with_models
self.assertEqual(self.sq.build_query(), '(hello) AND (django_ct_s:"core.mockmodel" OR django_ct_s:"core.anothermockmodel")')
AssertionError: u'(hello) AND (django_ct_s:"core.anothermockmodel" OR django_ct_s:"core.mockmodel")' != '(hello) AND (django_ct_s:"core.mockmodel" OR django_ct_s:"core.anothermockmodel")'


Ran 121 tests in 2.754s

FAILED (failures=2)
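
Both failures differ only in the order of the two model clauses. Assuming (the report does not say so) that the model names come from an unordered collection such as a set, sorting them before building the clause makes the output deterministic. build_model_clause below is a hypothetical helper, not Haystack code.

```python
def build_model_clause(model_names, field="django_ct_s"):
    """Join model names into an OR clause in a stable, sorted order."""
    # sorted() fixes the order regardless of how the collection iterates.
    return " OR ".join("%s:%s" % (field, name) for name in sorted(model_names))


clause = build_model_clause({"core.mockmodel", "core.anothermockmodel"})
```

The alternative is to leave the query builder alone and have the tests compare the clauses without regard to order.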

whoosh_backend.build_query() mishandles date values

When attempting to implement code along the lines of this from the docs:
Note.objects.filter(pub_date__lte=datetime.datetime.now())
the whoosh_backend yields an error:

/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/haystack/backends/whoosh_backend.pyc in build_query(self)
316
317 # Check to see if it's a phrase for an exact match.

--> 318 if ' ' in value:
319 value = '"%s"' % value
320

TypeError: argument of type 'datetime.datetime' is not iterable

The pysolr_backend has a bit right before the comparable code which converts to "what pysolr" wants, but I don't know what that would be for whoosh.
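
A generic way to avoid the TypeError is to convert every filter value to text before the phrase check runs. The sketch below is an assumption-laden illustration: prepare_value is a hypothetical helper, and the date formats are placeholders, since the format Whoosh expects is exactly the open question here.

```python
import datetime


def prepare_value(value):
    """Turn a filter value into text before any string checks run."""
    # datetime must be tested first because it is a subclass of date.
    if isinstance(value, datetime.datetime):
        # Assumed format; the format a given backend expects may differ.
        value = value.strftime("%Y-%m-%dT%H:%M:%S")
    elif isinstance(value, datetime.date):
        value = value.strftime("%Y-%m-%d")
    elif not isinstance(value, str):
        value = str(value)
    # The phrase check is now safe because value is always a string.
    if " " in value:
        value = '"%s"' % value
    return value


stamp = prepare_value(datetime.datetime(2009, 5, 12, 12, 8, 21))
```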

Have update_object available at SearchSite level

Currently you need to do site.get_index(klass).update_object(instance)

It would be nice if haystack provided a shorthand for the common case where the object is of the same class as the index. I propose having something like this at SearchSite level:

def update_object(self, instance, index = None):
    if not index:
        index = type(instance)
    self.get_index(index).update_object(instance)

And possibly a similar wrapper for remove_object.

following tutorial, schema generated by build_solr_schema is "empty"

I tried following the tutorial at http://haystacksearch.org/docs/tutorial.html#haystack-tutorial, but I ended up with the following problem:
even if I have
import haystack
haystack.autodiscover()
in urls.py, build_solr_schema generates the same output as if no SearchIndex was defined. Using this schema, solr understandably gives

SEVERE: org.apache.solr.common.SolrException: no field name specified in query and no defaultSearchField defined in schema.xml

(I'll try to attach or link to a minimal example that shows this problem.)

Suggest using ?q= instead of ?query= for search string

?q= is what Google uses, is significantly shorter and is nicer to type if you're hand-entering a search URL (which I find myself doing quite often). The term "query" is also a bit unfriendly to expose to non technical users in a URL.

Whoosh ParseException when searching for colons

Howdy Hays,

This is the exception I get when I search my site for "http:"
(btw django 1.0.2)...

ParseException at /search/http:

Expected end of text (at char 4), (line:1, col:5)

Request Method: GET
Request URL: http://localhost:8000/search/http:
Exception Type: ParseException
Exception Value:

Expected end of text (at char 4), (line:1, col:5)

Exception Location: F:\Documents and Settings...\apps\whoosh\qparser.py in parse, line 184

Lucene Backend

This is the final official backend blocking the release of Haystack. Implement.

auto_query does not properly escape special characters when using whoosh backend

Using today's (patched) trunk version of whoosh and the latest haystack, the auto_query SearchQuerySet method fails if a reserved character [1] is used.

In [6]: SearchQuerySet().filter(content=u"hello world :.'")
Out[6]: [<SearchResult: weblog.entry (pk=u'1477')>]

In [8]: SearchQuerySet().auto_query(u"hello world :.'")
Out[8]: ERROR: An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line statement', (364, 0))
...
ParseException: Expected end of text (at char 20), (line:1, col:21)

[1] http://github.com/toastdriven/django-haystack/blob/38725f0891d15f638c67ee4fa714245ae2014314/haystack/backends/whoosh_backend.py#L28
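
One possible cleaning step, sketched here under assumptions, is to wrap any word that contains a reserved character in quotes before the query reaches the parser. The RESERVED tuple is an illustrative subset rather than the backend's list at [1], quote_reserved_words is a hypothetical helper, and whether a given Whoosh version accepts the quoted form would need to be checked.

```python
# Illustrative subset only; the backend's real list is at the link in [1].
RESERVED = (":", ".", "'", "(", ")", "*", "?")


def quote_reserved_words(query_string):
    """Wrap any word containing a reserved character in double quotes."""
    cleaned = []
    for word in query_string.split():
        if any(ch in word for ch in RESERVED):
            # Drop embedded double quotes so the wrapping stays balanced.
            word = '"%s"' % word.replace('"', "")
        cleaned.append(word)
    return " ".join(cleaned)


cleaned = quote_reserved_words("hello world :.'")
```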

patch to the search index metaclass to support inherited field declarations

One may want to use a number of identical fields across a number of (in other respects distinct) SearchIndex classes, but the current DeclarativeMetaclass used doesn't support inheritance, so they need to be declared all over again in every index. The patch below adds simple inheritance support. (This doesn't seem to be the ideal way to submit a patch; if there is some cooler git way to proceed of which I am ignorant, feel free to bang me on the head.)

diff --git a/haystack/indexes.py b/haystack/indexes.py
index 009dad5..bc7be57 100644
--- a/haystack/indexes.py
+++ b/haystack/indexes.py
@@ -5,6 +5,15 @@ from haystack.fields import *
 class DeclarativeMetaclass(type):
     def __new__(cls, name, bases, attrs):
         attrs['fields'] = {}
+        try:
+            parents=[b for b in bases if issubclass(b, SearchIndex)]
+        except NameError:
+            pass
+        else:
+            for p in parents:
+                fields=getattr(p, 'fields', None)
+                if fields:
+                    attrs['fields'].update(fields)

         for field_name, obj in attrs.items():
             if isinstance(obj, SearchField):
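
The same idea can be shown as a self-contained Python 3 sketch. The classes below are stand-ins that only mirror the names in the patch; this is not the code that ships in haystack/indexes.py, and unlike the patch it reads a parent's fields with getattr instead of checking issubclass.

```python
class SearchField:
    """Stand-in for a real field class."""


class DeclarativeMetaclass(type):
    def __new__(mcs, name, bases, attrs):
        fields = {}
        # Start from the fields already collected on each parent class.
        # Reversed so that the first-listed base wins on a name clash.
        for base in reversed(bases):
            fields.update(getattr(base, "fields", {}))
        # Then add (or override with) fields declared on this class.
        for attr_name, obj in list(attrs.items()):
            if isinstance(obj, SearchField):
                fields[attr_name] = obj
        attrs["fields"] = fields
        return super().__new__(mcs, name, bases, attrs)


class SearchIndex(metaclass=DeclarativeMetaclass):
    pass


class BaseIndex(SearchIndex):
    text = SearchField()


class NoteIndex(BaseIndex):
    author = SearchField()


inherited = sorted(NoteIndex.fields)
```

Each class gets its own fields dict, so declaring a field on a subclass does not leak back into the parent.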

Narrowing search queries to specific models is not working properly

Narrowing search queries to specific models is not working properly due to
a bug in the ModelSearchForm model handling: in forms.py a set of models is
passed to the search query set (sqs.models(self.get_models())), whereas the
models method itself is defined as def models(self, *models). As a result,
"for model in models" yields the whole set as a single item instead of one
model at a time.
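
The difference is plain Python argument handling and can be reproduced without Haystack. In this sketch the function only mirrors the *models signature quoted above, and the model names are placeholder strings.

```python
def models(*models):
    """Collect each positional argument as one model."""
    return [model for model in models]


selected = {"Note", "Entry"}

wrong = models(selected)   # one argument: the whole set
right = models(*selected)  # unpacked: one argument per model
```

So either the caller unpacks the set with a leading asterisk, or the method accepts a single iterable.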

Support for stemming

It appears as though Whoosh has support for stemming analysis, but it is unclear whether this is available through Haystack. Presumably it is, but documentation on how to set it up to work with Haystack does not exist.

Haystack within the Django admin for global search across all models

One of the talks at EuroDjangoCon mentioned that a global search field in the admin that searched all editable content items would be useful. This seems like just the kind of thing that Haystack could be used for. It might be tricky figuring out exactly how to integrate it into the admin. Personally, I'd expect it to involve a brand new /admin/search/ view which is styled to look like the admin but has haystack do all of the work.

Need some way to access a model's verbose_name_plural in templates.

I'm trying to organize my search results by model, with headers for each model, like this:

MODEL 1:

  • item
  • item

MODEL 2:

  • item

And so forth. Problem is, the headers should really be the verbose_name_plural for each model, but that's inaccessible in templates (templates can't use _meta, because attributes starting with an underscore are disallowed in templates). I just need some way to get to that verbose_name_plural so I can make pretty headers.

Thanks!
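
One workaround is a small helper that reads _meta in Python, where the underscore restriction does not apply. The sketch below uses stand-in classes instead of real Django models, and verbose_name_plural as a function name is a hypothetical choice; in a Django project the same function could be registered as a custom template filter through django.template.Library.

```python
def verbose_name_plural(obj):
    """Return the plural display name of obj's model, or '' if unavailable."""
    meta = getattr(obj, "_meta", None)
    return str(getattr(meta, "verbose_name_plural", ""))


# Stand-in objects that mimic the `_meta` attribute of a Django model.
class _Meta:
    verbose_name_plural = "blog posts"


class Post:
    _meta = _Meta()


label = verbose_name_plural(Post())
```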

Revamp Schema Building

The current schema setup is less than optimal, especially if one defines new fields. Need to add a way to allow for better extension so SearchBackends can take advantage of what's available.

Error on 1 letter searches

I was having errors when performing 1 letter searches. Not that anyone would usually do this, but if it happens, the user would get a server error. It seems like the code:

    # A one-character query (non-wildcard) gets nabbed by a stopwords
    # filter and should yield zero results.
    if len(query_string) <= 1 and query_string != '*':
        return []

returns an empty list, on which a get() method is called in line 497:
self._results = results.get('results', [])

This throws an error, because you can't use get() on a list.

Changing the return statement to:
return {}

Fixes things for me.
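
A variation on that fix is to return the same keys a normal response has, so callers never need to special-case the early return. This is only a sketch: the search function and its placeholder result are hypothetical, and the 'hits' and 'results' keys are taken from the test case quoted in an earlier issue.

```python
def search(query_string):
    """Return a result dict with the same keys for every code path."""
    if len(query_string) <= 1 and query_string != "*":
        # Same keys as a normal response, so callers can use .get().
        return {"hits": 0, "results": []}
    # Placeholder for the real backend call.
    return {"hits": 1, "results": [query_string]}


results = search("a")
```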

indexes.DateTimeField should return a datetime-object

I've implemented a SearchIndex as the tutorial explained and the return of the pub_date field is a unicode string:

{% for result in page.object_list %}
    {{ result.pub_date|timesince }} # won't work
...

Would be better if DateTimeField and DateField return a datetime object.
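
Until the field does this itself, the string can be converted back by hand. The sketch below assumes the stored value looks like "2009-05-12T12:08:21"; that format and the to_datetime helper are assumptions for illustration, not something the backend guarantees.

```python
import datetime


def to_datetime(value, fmt="%Y-%m-%dT%H:%M:%S"):
    """Parse a stored date string; pass datetime objects through unchanged."""
    if isinstance(value, datetime.datetime):
        return value
    # Raises ValueError if the stored string does not match fmt.
    return datetime.datetime.strptime(value, fmt)


pub_date = to_datetime("2009-05-12T12:08:21")
```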

When no indexes are registered, Whoosh throws an exception.

This is a bit of an edge case, but when no indexes are registered (either manually or via autodiscover), the Whoosh backend fails miserably. The result is a different (but similar-looking) "KeyError at /search/", caused by no "document=True" fields being available.

Make the highlighted formatter replaceable

I'm using the whoosh backend and fetching highlighted results. Currently the highlighted words are uppercase words. I want some red blinky shiny stars around them. ;) The highlight function is not replaceable (not without abstracting the whole whoosh backend) so it would be cool if this highlight function is replaceable.

Bug indexing inherited models

When attempting to index an inherited model, I received the following explosion: http://dpaste.com/hold/38525/

The problem, it seems, is on line 66 of reindex.py:

  qs = index.get_query_set().filter(**extra_lookup_kwargs).order_by(model._meta.pk.attname)

The "attname" of the pk of an inherited table is something like "parent_ptr_id", whereas the "name" will be "parent_ptr". If you change "attname" to "name" in the above, reindex works in this case. Whether this is a general solution or not, I'm not sure.
