xapit's Issues

Previous or next record

Given a search collection and a record from that search, one should be able to fetch the previous/next record in that search. For example:

recipes = Recipe.search(:conditions => { :category_id => 3 }, :order => :name)
recipe = recipes[17] # some record
recipes.previous(recipe) # returns recipe 16
recipes.next(recipe) # returns recipe 18

This is ideal for adding previous/next links after clicking on a searched record in a web app.

This needs to work outside of the paginated result set, so we can't just fetch the records and do the processing in Ruby.

One solution is to take the sort key and add that as a condition. For example, if the recipe's name is "Chicken", we can find the previous record by adding a condition where name < Chicken which might find "Cake".

However, this solution has two serious limitations. One is that an order key must be specified (the default Xapian relevancy ordering won't work). Another is that the order key must be unique: if multiple recipes share the same name, the other recipes with that name will be skipped.
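
One way around the uniqueness limitation is to break ties with a second key such as the record id. The following is a minimal in-memory sketch of that idea, not Xapit code: the `Recipe` struct and the `previous_record`/`next_record` helpers are hypothetical, and a real implementation would have to express the same comparison as search conditions instead of filtering in Ruby.

```ruby
# Hypothetical stand-in for an indexed model.
Recipe = Struct.new(:id, :name)

# Composite sort key: the order attribute plus the id as a tie-breaker,
# so two recipes with the same name still have distinct positions.
def sort_key(record)
  [record.name, record.id]
end

# Closest record that sorts strictly before the current one.
def previous_record(records, current)
  records.select { |r| (sort_key(r) <=> sort_key(current)) < 0 }
         .max_by { |r| sort_key(r) }
end

# Closest record that sorts strictly after the current one.
def next_record(records, current)
  records.select { |r| (sort_key(r) <=> sort_key(current)) > 0 }
         .min_by { |r| sort_key(r) }
end
```

With two recipes named "Chicken", the second one's previous record is the first "Chicken" rather than "Cake", so nothing is skipped.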

Search conditions with ranges

One should be able to pass a range into a condition to fetch any record with a value in that range.

search(:conditions => { :priority => 3..5 })

On a similar note, it would be nice if one could search on a set of values too. This would find all records which match either value.

search(:conditions => { :priority => [3, 5] })

This should work for integers, floats, and dates/times.
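
To illustrate the intended semantics, here is a small sketch of how a single condition value could be checked against a record's attribute. The `condition_matches?` helper is hypothetical and works on plain Ruby values; it says nothing about how Xapit would translate the condition into a Xapian query.

```ruby
require 'date'

# Returns true when the actual attribute value satisfies the condition.
# A Range matches any value inside it, an Array matches any listed value,
# and anything else must be equal.
def condition_matches?(expected, actual)
  case expected
  when Range then expected.cover?(actual)
  when Array then expected.include?(actual)
  else expected == actual
  end
end
```

Because `Range#cover?` only relies on comparison, the same helper handles integers, floats, and dates.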

Problem when settings time_zone

Everything was fine before I added this:

config.time_zone = 'Paris'
config.active_record.default_timezone = 'Paris'

Error trace:

Indexing Product
rake aborted!
undefined method `to_i' for Sat, 01 Jan 2005 00:00:00 +0000:DateTime
/opt/ree/lib/ruby/gems/1.8/gems/activesupport-2.3.2/lib/active_support/time_with_zone.rb:261:in `to_i'
/some_path/20090814101621/vendor/plugins/xapit/lib/xapit/indexers/abstract_indexer.rb:50:in `field_terms'

Conditions in text query

Search conditions should be definable in the textual query. If there is a Person model with index.field :age then one should be able to specify age:17 in the text query and get all people matching that age.
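
A rough sketch of the parsing step, assuming conditions are written as `name:value` tokens without spaces. The `split_query` helper is hypothetical; it only separates the tokens and leaves the values as strings, so converting "17" to an integer (and checking that `age` is really an indexed field) would still be up to the caller.

```ruby
# Splits a textual query into its free-text part and a conditions hash.
# Tokens of the form "field:value" become conditions, everything else
# stays in the text query.
def split_query(query)
  conditions = {}
  words = []
  query.split.each do |token|
    if token =~ /\A(\w+):(\S+)\z/
      conditions[$1.to_sym] = $2
    else
      words << token
    end
  end
  [words.join(" "), conditions]
end
```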

Support "or" for separate conditions

While it is possible to perform an "or" condition on one attribute:

search(:conditions => { :priority => [3, 5] })

It is currently not possible to perform an "or" on separate attributes.

search(:conditions => { :priority => 3, :category => "foo" }) # does an AND search

One possible solution is to support an "or_search" method which can be chained. For example.

search(:conditions => { :priority => 3 }).or_search(:conditions => { :category => "foo" })

Another solution is to use an array in the conditions call.

search(:conditions => [{ :priority => 3 }, { :category => "foo" }])

I think I prefer this latter approach.
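
The semantics of the array form can be sketched in plain Ruby as follows. This is only an illustration with a hypothetical `matches_conditions?` helper operating on hashes; it assumes that each hash in the array is an AND group and that the groups are combined with OR.

```ruby
# A single hash is treated as one AND group; an array of hashes means
# "match if any group matches". Within a group, an array value keeps its
# existing meaning of "any of these values".
def matches_conditions?(record, conditions)
  groups = conditions.is_a?(Array) ? conditions : [conditions]
  groups.any? do |group|
    group.all? { |attribute, expected| Array(expected).include?(record[attribute]) }
  end
end
```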

Non-ascii characters

If the term "über" is indexed and one searches for "über" it should find the matching document.

Spell checker

If spell checking is enabled, when someone queries "pengiun" it should suggest "penguin" if that is an indexed term. This can be exposed through a Collection#similar_terms method which returns an array of indexed terms that are only a few characters away from the words in the query.
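
One way to define "a few characters away" is Levenshtein edit distance. The sketch below is a plain Ruby illustration with hypothetical `edit_distance` and `similar_terms` helpers; it compares against an in-memory list of terms and is not how Xapit or Xapian necessarily implement suggestions.

```ruby
# Levenshtein distance: the number of single-character insertions,
# deletions, or substitutions needed to turn a into b.
def edit_distance(a, b)
  previous = (0..b.length).to_a
  a.each_char.with_index(1) do |char_a, i|
    current = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      current << [previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost].min
    end
    previous = current
  end
  previous.last
end

# Indexed terms that are close to the queried word, excluding the word itself.
def similar_terms(word, indexed_terms, max_distance = 2)
  indexed_terms.select { |term| (1..max_distance).cover?(edit_distance(word, term)) }
end
```

Note that swapping two adjacent letters, as in "pengiun", counts as two substitutions here, which is why the default threshold is 2.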

Value in posting too large

Hi,

I'm fighting with my tests at the moment.

My model has facets, and I use the code provided in the README to display the facets on the page.

It works fine in the browser, but when I run my test I get an error:

ActionView::TemplateError: RangeError: Value in posting list too large.

See the trace below:

test: GET :index should respond with success. (ProductsControllerTest):
ActionView::TemplateError: RangeError: Value in posting list too large.
On line #3 of app/views/products/index.haml

    1: - content_for :sidebar do
    2:   %h2 Facets
    3:   - @products.facets do |facet|
    4:     = facet.name
    5:     - for option in facet.options
    6:       = link_to option.name, :overwrite_params => {:facets => option}

    xapit (0.2.7) lib/xapit/query.rb:36:in `mset'
    xapit (0.2.7) lib/xapit/query.rb:36:in `matchset'
    xapit (0.2.7) lib/xapit/query.rb:40:in `matches'
    xapit (0.2.7) lib/xapit/facet.rb:64:in `matches'
    xapit (0.2.7) lib/xapit/facet.rb:44:in `matching_identifiers'
    xapit (0.2.7) lib/xapit/facet.rb:32:in `unfiltered_options'
    xapit (0.2.7) lib/xapit/facet.rb:26:in `options'
    xapit (0.2.7) lib/xapit/collection.rb:118:in `facets'
    xapit (0.2.7) lib/xapit/collection.rb:117:in `select'
    xapit (0.2.7) lib/xapit/collection.rb:117:in `facets'
    app/views/products/index.haml:3
    haml (3.0.2) lib/haml/helpers.rb:343:in `call'
    haml (3.0.2) lib/haml/helpers.rb:343:in `capture_haml'
    haml (3.0.2) lib/haml/helpers.rb:566:in `with_haml_buffer'
    haml (3.0.2) lib/haml/helpers.rb:339:in `capture_haml'
    haml (3.0.2) rails//lib/haml/helpers/action_view_mods.rb:88:in `capture'
    app/views/products/index.haml:1:in `_run_haml_app47views47products47index46haml'
    haml (3.0.2) rails//lib/haml/helpers/action_view_mods.rb:13:in `render'
    haml (3.0.2) rails//lib/haml/helpers/action_view_mods.rb:13:in `render'
    test/functional/products_controller_test.rb:11:in `__bind_1273850390_850922'
    shoulda (2.10.3) lib/shoulda/context.rb:380:in `call'
    shoulda (2.10.3) lib/shoulda/context.rb:380:in `run_current_setup_blocks'
    shoulda (2.10.3) lib/shoulda/context.rb:379:in `each'
    shoulda (2.10.3) lib/shoulda/context.rb:379:in `run_current_setup_blocks'
    shoulda (2.10.3) lib/shoulda/context.rb:361:in `test: GET :index should respond with success. '
    /home/nicolas/.rvm/rubies/ree-1.8.7-2010.01/lib/ruby/1.8/test/unit/testsuite.rb:34:in `run'
    /home/nicolas/.rvm/rubies/ree-1.8.7-2010.01/lib/ruby/1.8/test/unit/testsuite.rb:33:in `each'
    /home/nicolas/.rvm/rubies/ree-1.8.7-2010.01/lib/ruby/1.8/test/unit/testsuite.rb:33:in `run'
    /home/nicolas/.rvm/rubies/ree-1.8.7-2010.01/lib/ruby/1.8/test/unit/testsuite.rb:34:in `run'
    /home/nicolas/.rvm/rubies/ree-1.8.7-2010.01/lib/ruby/1.8/test/unit/testsuite.rb:33:in `each'
    /home/nicolas/.rvm/rubies/ree-1.8.7-2010.01/lib/ruby/1.8/test/unit/testsuite.rb:33:in `run'
    /home/nicolas/.rvm/rubies/ree-1.8.7-2010.01/lib/ruby/1.8/test/unit/ui/testrunnermediator.rb:46:in `run_suite'
    /home/nicolas/.rvm/rubies/ree-1.8.7-2010.01/lib/ruby/1.8/test/unit/ui/console/testrunner.rb:67:in `start_mediator'
    /home/nicolas/.rvm/rubies/ree-1.8.7-2010.01/lib/ruby/1.8/test/unit/ui/console/testrunner.rb:41:in `start'
    /home/nicolas/.rvm/rubies/ree-1.8.7-2010.01/lib/ruby/1.8/test/unit/ui/testrunnerutilities.rb:29:in `run'
    /home/nicolas/.rvm/rubies/ree-1.8.7-2010.01/lib/ruby/1.8/test/unit/autorunner.rb:216:in `run'
    /home/nicolas/.rvm/rubies/ree-1.8.7-2010.01/lib/ruby/1.8/test/unit/autorunner.rb:12:in `run'
    /home/nicolas/.rvm/rubies/ree-1.8.7-2010.01/lib/ruby/1.8/test/unit.rb:279
    test/functional/products_controller_test.rb:5

Does anyone have a clue about this?

P.S. I'm doing my reindexing before calling get :index.

Fix sorting and conditional ranges for global Xapit.search

Currently sorting, conditional ranges, and a few other things do not work when doing a global Xapit.search. This is because it relies on Xapian's values which are different for each model.

One possible solution is to re-assign the value's position to a global index, so the "name" attribute will always be at the same position for each model, but other unique attributes will be at a different position, etc.

This does come with many difficulties because if a new attribute is added then this changes the position for everything. Usually everything is re-indexed at that point so it may not be a problem.
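
As a sketch of the idea, value positions could be derived from the union of attribute names across all models. The `global_value_slots` helper and the example attribute names are hypothetical, and sorting the names alphabetically is just one possible way to get a deterministic order.

```ruby
# Assigns one value position per attribute name, shared by every model.
# attributes_by_model maps a model name to its sortable/field attributes.
def global_value_slots(attributes_by_model)
  attributes_by_model.values.flatten.uniq.sort.each_with_index.to_h
end
```

With this scheme a shared attribute such as `:name` gets the same position in every model, but adding a new attribute that sorts earlier shifts the positions after it, which is why a full re-index is needed at that point.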

Index and search with accented Characters..

Hi,
I want my search module to be accent-insensitive.
I added sanitization to the query parser, using a method from Rails:

 #normalize(form=ActiveSupport::Multibyte.default_normalization_form)
 def cleanup_text(text) # depends on rails to sanitize the accents in string
     text.mb_chars.normalize(:kd).gsub(/[^\x00-\x7F]/n,'').to_s.gsub(/\b([a-z])\*/i) { $1 } 
 end

With this, "àáâãäå" becomes "aaaaaa". Nice!

But what about indexing?
Where can I sanitize the terms before adding them to the database?
I couldn't find the right place in the indexer. Any ideas?
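
Wherever the hook ends up living in the indexer, the important part is that the same normalization is applied to indexed terms and to query terms. Below is a minimal sketch of such a shared function, assuming a Ruby version that provides `String#unicode_normalize` (2.2 or later) rather than ActiveSupport's `mb_chars`. The `strip_accents` name is hypothetical.

```ruby
# Decomposes accented characters into base letter + combining mark,
# then drops the combining marks, leaving the unaccented letters.
def strip_accents(text)
  text.unicode_normalize(:nfkd).gsub(/\p{Mn}/, "")
end
```

If both the indexer and the query parser call this on every term, a search for "uber" and a search for "über" would look up the same term.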

Similar records

One should be able to call search_similar on an indexed object to find records which are similar. This should use all "text" and "field" terms to find similarity.

This method should accept search options to further narrow down the similar results, and the returned object should be a Xapit::Collection.

@article.search_similar(:conditions => ...)

Speed up test suite

The tests take a long time to run because the database is regenerated each time. There must be some way to clear all records in a database instead of removing the file.

Specify classes in Xapit.search

When searching multiple classes with Xapit.search there should be a :classes or :members option which takes an array of classes to perform searches on.

Xapit.search("puzzle", :classes => [Article, Comment])

Support large floats/integer values when sorting

From a GitHub Message:

Sorry I wasn't too clear in my last email regarding my problems with Xapit.serialize_value. I said that it used Xapian.sortable_serialise with integer values (for Date and Time objects) when Xapian.sortable_serialise accepted floats. Here's a console demo to illustrate the problem I'm encountering:

   Loading development environment (Rails 2.3.2)
   >> d = Date.new 2098, 3, 20  # happened to have a very future date in my database
   => Thu, 20 Mar 2098
   >> d.to_time.to_i
   => 4046083200
   >> Xapian.sortable_serialise(d.to_time.to_i)
   TypeError: Expected argument 0 of type double, but got Fixnum 4046083200
           in SWIG method 'Xapian::sortable_serialise'
           from (irb):3:in `sortable_serialise'
           from (irb):3
   >> d.to_time.to_f
   => 4046083200.0
   >> Xapian.sortable_serialise(d.to_time.to_f)
   => "�c�R\244"
   >> Xapian.sortable_serialise(2**31-1)
   => "�_�\377\377\360"
   =>

So it appears that Xapian.sortable_serialise has a problem with very large integer/Fixnum values.
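
Based on the console session above, where the `to_f` call succeeds, one possible workaround is to convert every sortable value to a Float before serialising it. The sketch below only shows that conversion step; `sortable_number` is a hypothetical helper, and the call to `Xapian.sortable_serialise` is left out so the snippet runs without the Xapian bindings.

```ruby
require 'date'

# Converts a sortable value to a Float, which is the argument type the
# console session shows Xapian.sortable_serialise accepting.
def sortable_number(value)
  case value
  when Time then value.to_f
  when Date then value.to_time.to_f # also covers DateTime
  when Numeric then value.to_f
  else raise ArgumentError, "cannot sort by #{value.class}"
  end
end

# Assumed usage inside the serialisation code:
#   Xapian.sortable_serialise(sortable_number(value))
```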

Updating existing index

Instead of clearing the database and recreating the index each time, there should be support for only re-indexing the changed/created records. This can somewhat be determined using the updated_at timestamp in Rails applications and looking at the time the database was created. Possible gotchas include:

  • time zone differences between updated_at timestamp and local system time.
  • attributes changed through associated records (possible to use the "touch" method in newer versions of Rails to get around this)
  • leftover facet option records (shouldn't hurt anything)
  • the updating will likely happen in a separate process, how do we communicate this to the main process?
  • if a record is deleted, how do we mark it as needing to be removed from the index? One way is to have a separate table but that can be messy.
  • changes to xapit block won't be handled properly (no way to get around this, just communicate it)
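
The basic selection step could look like the sketch below. It is an in-memory illustration with a hypothetical `ChangedRecord` struct and `records_to_reindex` helper; in a Rails application the same filter would be a query on `updated_at`. Comparing `Time` objects compares absolute instants, so the time zone gotcha only bites when a timestamp is stored or parsed without its offset.

```ruby
# Hypothetical stand-in for a model with an updated_at timestamp.
ChangedRecord = Struct.new(:id, :updated_at)

# Records modified after the index was last built need re-indexing.
def records_to_reindex(records, indexed_at)
  records.select { |record| record.updated_at > indexed_at }
end
```

This does not cover deleted records or changes made through associations, which are the harder cases listed above.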

An alternative solution which solves some of these issues is to keep the Xapian database loaded under a different process and use a REST api or something similar to communicate with it. This way it can have the writable database always loaded and update it on the fly as records change. Of course the downside is it would require a separate process...

Facet options and OR conditions

My User model has a facet called country. In my results, xapit generates these options: france (4719ac1), us (0cbaf87) and spain (4567bc2). I would like to get the users associated with the countries us OR spain. It seems that an AND condition is generated if I try this:

User.search('', :facets => %w(0cbaf87 4567bc2))

so no results are returned. Any ideas?

Regards,

Julien

ORM Agnostic

Create an abstract adapter layer for interacting with ORMs besides ActiveRecord (such as DataMapper, Sequel or CouchDB).

Dates and times in conditions hash

One should be able to mark a date attribute as a field.

xapit do |index|
  index.field :released_on
end

And then perform a search on it through the conditions hash.

Xapit.search(:conditions => { :released_on => Date.today })

Similarly, it would be nice if times were supported, and that searching by a date would include all times in that date.
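
Searching by a date could be treated as a range condition over that whole day. The sketch below shows the conversion with a hypothetical `day_range` helper; it assumes times are compared in UTC, which may not match how an application stores them.

```ruby
require 'date'

# Expands a date into the half-open range of times that fall on it
# (midnight inclusive up to, but not including, the next midnight), in UTC.
def day_range(date)
  start = Time.utc(date.year, date.month, date.day)
  start...(start + 24 * 60 * 60)
end
```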

Custom indexer not being used.

I noticed that specifying a custom indexer in Xapit.setup isn't working. The problem is that in Blueprint#initialize, @indexer is assigned an instance of SimpleIndexer no matter what is specified in the setup call. Here is the fix I'm using in my application:

# Patch for xapit to make it respect custom indexer setting
class Xapit::IndexBlueprint
  alias_method :original_initialize, :initialize
  def initialize(member_class, *args)
    original_initialize(member_class, *args)
    @indexer = Xapit::Config.indexer.new(self)
  end
end

No SQL queries when listing facets

Currently Xapit will hit the database to fetch one record for each type of facet when listing them. This is so it can retrieve the facet attribute values in order to list the facet options.

In theory this query could be done on the xapit database instead of hitting the SQL database. This would also make the facet behavior more consistent if the Xapit database gets out of sync with the SQL database.

To do this it would need to loop through the match.document.terms to fetch the terms which are facet options (beginning with a capital "F"). This will likely result in a cleaner implementation too.
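
The filtering step might look like the sketch below. It assumes the document's terms are available as plain strings and that whatever follows the "F" prefix identifies the facet option; `facet_option_terms` is a hypothetical helper, and with the real bindings the strings would first have to be extracted from the Xapian term objects.

```ruby
# Picks out the terms that represent facet options (prefixed with "F")
# and strips the prefix, leaving the facet option identifiers.
def facet_option_terms(terms)
  terms.select { |term| term.start_with?("F") }
       .map { |term| term[1..-1] }
end
```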

Specify collapse_key in search options

Hi,

I'm looking for a way to specify the collapse_key which is used in Query#matchset:

    enquire.collapse_key = options[:collapse_key] if options[:collapse_key]

As I haven't found it, I added a method in AbstractQueryParser:

    def collapse_key
      @options[:collapse_key] ? @options[:collapse_key].to_i : nil
    end

And I updated query_options:

    def query_options
      {
        :offset => offset,
        :limit => per_page,
        :sort_by_values => sort_by_values,
        :sort_descending => @options[:descending],
        :collapse_key => collapse_key
      }
    end

So that I can do:

    Article.search("phone", :collapse_key => 0)

Is there a way to specify the collapse_key properly?

Regards

Synonyms

"Xapian provides support for storing a synonym dictionary, or thesaurus. This can be used by the Xapian::QueryParser class to expand terms in user query strings, either automatically, or when requested by the user with an explicit synonym operator (~)."

http://xapian.org/docs/synonyms.html

Customize attribute weight

One should be able to specify the weight of an attribute in the index. The default weight is 1...

class Article < ActiveRecord::Base
  xapit do |index|
    index.text :title, :weight => 10
    index.text :tags, :weight => 5
    index.text :content
  end
end

Internally this will increment the term count by the weight value for each word that is indexed for that attribute.

document.add_term(word, weight)

That should cause the heavier weighted terms to take precedence in the relevance ranking.
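
To make the effect concrete, the sketch below accumulates term counts for one document from several weighted attributes. `weighted_term_counts` is a hypothetical helper that works on plain strings and a hash; it mirrors the `document.add_term(word, weight)` idea without needing Xapian.

```ruby
# fields is a list of [text, weight] pairs, one per indexed attribute.
# Each word adds its attribute's weight to that term's count.
def weighted_term_counts(fields)
  counts = Hash.new(0)
  fields.each do |text, weight|
    text.downcase.scan(/\w+/).each { |word| counts[word] += weight }
  end
  counts
end
```

A word that appears in the title (weight 10) and once in the content (weight 1) ends up with a count of 11, while a word that only appears in the content has a count of 1.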

Memoize wildcard query

The wildcard query can be somewhat slow and is often performed multiple times per search. To improve performance, its result should be memoized. Something like:

class Xapit::AbstractQueryParser
  extend ActiveSupport::Memoizable
  memoize :wildcard_query
end

This has a very strong reliance on ActiveSupport which I may not want.
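
Memoization can also be done with a plain hash, which avoids the ActiveSupport dependency. The sketch below uses a hypothetical `PrefixExpander` class with an in-memory term list to stand in for the expensive wildcard lookup; the `scans` counter exists only to show that the work happens once per prefix.

```ruby
class PrefixExpander
  attr_reader :scans

  def initialize(terms)
    @terms = terms
    @cache = {}
    @scans = 0
  end

  # Returns all terms starting with the prefix, computing the list only
  # the first time a given prefix is requested.
  def wildcard_terms(prefix)
    @cache.fetch(prefix) do
      @scans += 1
      @cache[prefix] = @terms.select { |term| term.start_with?(prefix) }
    end
  end
end
```

Using `fetch` with a block rather than `||=` means an empty result is cached too, so a prefix with no matches is not recomputed.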

Check Xapian database before removing

The database removal task wipes out the entire directory. This is dangerous if one accidentally specifies the wrong path. It should check to see if the contents of the directory look like a Xapian database.
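
A possible check is sketched below. It assumes that a Xapian database directory contains a backend marker file such as `iamflint`, `iamchert`, or `iamglass`; that list is an assumption and should be verified against the Xapian version in use. The `xapian_database?` and `remove_database` helpers are hypothetical.

```ruby
require 'fileutils'

# Assumed marker files written by Xapian's on-disk backends.
XAPIAN_MARKERS = %w[iamflint iamchert iamglass iambrass].freeze

# True only for a directory that contains one of the marker files.
def xapian_database?(path)
  File.directory?(path) &&
    XAPIAN_MARKERS.any? { |marker| File.exist?(File.join(path, marker)) }
end

# Refuses to delete anything that does not look like a Xapian database.
def remove_database(path)
  unless xapian_database?(path)
    raise ArgumentError, "#{path} does not look like a Xapian database"
  end
  FileUtils.rm_rf(path)
end
```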

Xapit::Collection#offset

The addition of Xapit::Collection#offset would allow the use of will_paginate's page_entries_info helper method.

def offset
  (current_page - 1) * per_page
end

Question about conditions and xapit

I'm moving over from acts_as_xapian and I need to do the following:

"Search all records where photo_file_name IS NOT NULL"

Essentially, all records that have an attached file with paperclip. How does one do this in xapit?

Support DataMapper

Fill in the rest of the DataMapperAdapter and include Xapit::Membership into DataMapper::Resource so it is usable automatically by all DataMapper models.

The find_each method is the only one that I'm not sure how to implement. Anyone want to take a stab at adding this for DataMapper?

Language stemming

When the word "running" is indexed in a document, and someone types in "runs" in a search, it should find the document when stemming is enabled. You should be able to specify the stemming language in the Config class.

Single find query

Currently Xapit does a separate find(id) query for each record. This should be done in a single query to improve performance.

This problem is somewhat more complex than one would initially expect. The records will not be returned in the proper order from SQL so they need to be sorted again. One also needs to take into account that searches can span multiple models with Xapit.search and therefore need to have a separate database query per model.
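
The re-sorting step can be sketched as follows, assuming the records have already been fetched with one query per model and grouped by class name. `FoundRecord` and `records_in_match_order` are hypothetical names, and the matches are represented as simple `[class_name, id]` pairs in the order Xapian returned them.

```ruby
# Hypothetical stand-in for a fetched model instance.
FoundRecord = Struct.new(:id, :title)

# Restores the search order: looks each match up in the per-class batches
# and drops matches whose record no longer exists in the SQL database.
def records_in_match_order(matches, records_by_class)
  lookup = {}
  records_by_class.each do |class_name, records|
    records.each { |record| lookup[[class_name, record.id]] = record }
  end
  matches.map { |class_name, id| lookup[[class_name, id]] }.compact
end
```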

Turn into Ruby Gem

Turn Xapit into a Ruby Gem. Add a script/generate xapit to generate rake task and config file for using in a Rails app through the gem.

Issue with Models with guid ids

If a model uses GUID-formatted ids (like 'e6cac453-a527-18dc-d731-7b4cf73598a1'), fetch_results in collection.rb fails with multiple errors.

The lines (172, 180) currently are:
class_name, id = match.document.data.split('-')
should be
class_name, id = match.document.data.split('-', 2)

Line 181 is:
member = records_by_class[class_name].detect { |m| m.id == id.to_i }
should be:
member = records_by_class[class_name].detect { |m| m.id == id }
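
The effect of the limit argument can be checked in isolation. The sketch below wraps the split in a hypothetical `parse_document_data` helper and adds a `same_id?` comparison that works for both integer and GUID ids by comparing string forms; the second helper is a suggestion, not part of the fix quoted above.

```ruby
# Splits "ClassName-id" on the first hyphen only, so hyphens inside the
# id (as in a GUID) are preserved.
def parse_document_data(data)
  class_name, id = data.split("-", 2)
  [class_name, id]
end

# Compares a record id with the id stored in the index as strings, so
# integer ids keep working alongside GUIDs.
def same_id?(record_id, stored_id)
  record_id.to_s == stored_id
end
```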

undefined method `xapit'

Hi,
Working with Rails 2.3.8 and ActiveRecord, I had some trouble setting up Xapit.
I included the gem, ran the generator, and set up my class:
Rails::Initializer.run do |config|
...
config.gem 'xapit'

Xapit.setup(:database_path => "#{Rails.root}/db/xapian")

class Depense < ActiveRecord::Base
...
 xapit do |x|
   x.text :motif, :categorie
   x.field :imputable_type, :imputable_id
   x.sortable :date, :motif
 end

I got an undefined method error while running:

rake :xapit:index
rake aborted!
undefined method `xapit' for #<Class:0xb6f2a650>
........
It seems the Membership module wasn't included in ActiveRecord::Base as documented.
My workaround is to explicitly add
include Xapit::Membership
to ActiveRecord::Base.

Clearly
if defined? ActiveRecord
  ActiveRecord::Base.class_eval do
    include Xapit::Membership
  end
end
doesn't work on my system...

Strange number of record return with wildcard

When a wildcard is used in a query for a term that doesn't match any record, all records are returned:

>> Product.search("fly*").size
=> 24
>> Product.search("flyd*").size
=> 4019

4019 is the total number of products in my database.

Hide facets which don't narrow down results

Currently, if only one facet option was found in a result that entire facet is hidden. However it is possible that if some records are nil then the single facet would narrow down results and is therefore valuable.

Instead it is better to hide a facet only if every facet option's match count is equal to the total search count. In that case no facet option will narrow down the result set, so the facet should be hidden.
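
The proposed rule is small enough to sketch directly. `facet_narrows_results?` is a hypothetical helper that takes the match count of each option and the total number of search results.

```ruby
# A facet is worth showing if at least one of its options matches fewer
# records than the whole search, i.e. choosing it would narrow the results.
def facet_narrows_results?(option_counts, total_count)
  option_counts.any? { |count| count < total_count }
end
```

Under this rule a facet with a single option is still shown when some records have no value for it, because that option's count is then lower than the total.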

Automatic updates of the search index

Is there some way of checking the age of the index, or of updating it at intervals?

I've tried Googling for some sort of best practice for this, but have so far come up with nothing.

rails_cron would have seemed to have been perfect for the job, but that seems to have gone the way of the dodo... Any recommendations?

facet-params include page from pagination

Hi Ryan,
great work! Thanks a lot for sharing this. I think I found one issue:

If you use pagination and open, say, page 4, then the facet links will include page=4 in their parameters:
facets=f9aed79&page=4&suchwort=xyz
This will lead to 0 results.

So I changed the facet li from your example to
<%= link_to option.name, :overwrite_params => { :facets => option, :page => nil } %>

Greetings
Sven

Breadcrumb Applied Facets

One should be able to change the mode of the applied facets so it works like a nested breadcrumb style:

Search > Clothing > $25-$50 > Mens

Clicking on a given section would drop off the applied facets after it. This should be settable in the config:

Xapit::Config.setup(:breadcrumb_facets => true)
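
The drop-off behaviour can be sketched as a simple list operation. `facets_after_click` is a hypothetical helper that treats the applied facets as an ordered list of labels; in Xapit they would presumably be facet option identifiers.

```ruby
# Keeps the applied facets up to and including the clicked one and drops
# everything after it. Unknown facets leave the list unchanged.
def facets_after_click(applied_facets, clicked)
  position = applied_facets.index(clicked)
  position ? applied_facets.take(position + 1) : applied_facets
end
```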

Conditions

Is there a way to express this kind of condition?

A AND (B OR C)

without using not_conditions or translating it into (A AND B) OR (A AND C)...
Regards

Integration with Cancan

Hi guys !
Did anybody succeed in integrating xapit search with Cancan access restrictions?
Since .search() is not a named_scope and returns a xapit object which doesn't know the Cancan methods, I cannot chain the queries (like with pagination or other stuff).
Anyway, I tried to write a new method for Xapit:

 # Xapit & Cancan fancing Tango ! (yet ?)
module XapitExt
  class Collection
    def xapit_accessible_by(ability, action = :read)            # verify access in the search results. why ? why not...
      cancan_query = ability.query(action, self)                 # self is a xapit object not the searched object ! grrrrr !
      search(:conditions => cancan_query.conditions)             #, :joins => query.joins # joins needed ?! didn't see anything in the parser.
    end
  end
  module Membership                                       # will be included in activerecord...
    module AdditionalMethods
      module ClassMethods
        def xapit_accessible_by(ability, action = :read)        # check abilities as conditions for the search, the right way !
          cancan_query = ability.query(action, self)
          search(:conditions => cancan_query.conditions)
        end
      end
    end
  end
end

It didn't work for me since my conditions return a SQL fragment, which is not very useful for Xapit.
Maybe the .search() method could rely on a scope or .where() so that it works with other solutions.
In the meantime, if someone has an idea to sort this out, I'd be very pleased.
Bye

Wildcard Matching

It would be nice if an asterisk could be used to match any characters in a search. Xapian does offer some of this functionality, but it is very limited: you can only do a wildcard match at the end of a term.

It would be nice if this was supported for both normal search string queries and conditions:

Xapit.search("foo*", :conditions => { :name => "bar*" })

One use for this is on an immediate-feedback AJAX search where one gets results as they type the query.

sortable attributes must be strings?

I've just started experimenting with xapit. It looks like sorting is purely string-based and numeric-valued attributes are not supported. Is this a Xapit issue or a Xapian one?

Auto-reload Database

Upon fetching the Xapit database, it should check whether the database's modification date has changed since the last fetch. If it has, it should reload the database.

This has two key advantages. One is that it is easier to sync the database with a cron task of some kind (without using xapit-sync). Another is that this is a nice reload alternative for those using xapit-sync with Passenger.
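
A minimal sketch of the check, using a hypothetical `ReloadingDatabase` wrapper. It assumes the given path is something whose modification time changes whenever the index is rewritten (for a directory-based database that might have to be a specific file inside it), and the block passed to the constructor stands in for whatever actually opens the Xapian database.

```ruby
class ReloadingDatabase
  # path:   file whose mtime signals that the index changed (assumption)
  # loader: block that opens the database for the given path
  def initialize(path, &loader)
    @path = path
    @loader = loader
    @loaded_mtime = nil
  end

  # Returns the cached database, reloading it first when the file on disk
  # has been modified since the last load.
  def database
    mtime = File.mtime(@path)
    if mtime != @loaded_mtime
      @database = @loader.call(@path)
      @loaded_mtime = mtime
    end
    @database
  end
end
```

The cost is one `File.mtime` call per fetch, which should be cheap compared to running a search.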

Problem indexing array of values

I'm using the example shown in the readme, to index text from a has_many relation by returning an array. Indexing happens without any errors. When I try to search using only text that would be found on the related table (e.g. xapit), I get no results. I've tried various combinations, including a minimal configuration and it still doesn't work.

class Product
  has_many :items

  xapit do |index|
    index.text :item_descriptions
  end

  def item_descriptions
    items.map(&:description) # ['xapit t-shirt', 'xapit ball cap']
  end
end

Product.search 'ball cap'
=> []

Nested facets

It would be nice if some facets could be nested. For example, let's say this is a store application with a Departments facet:

  • Groceries
  • Clothing
  • Toys & Games
  • Electronics

Choosing the Clothing department could show subdepartments:

  • Men
  • Women
  • Children

There are a couple ways this could be implemented, but I'll expand upon those ideas later.
