ryanb / xapit
High level Ruby library for interacting with Xapian, a full text search engine.
License: MIT License
Given a search collection and a record from that search, one should be able to fetch the previous/next record in that search. For example:
recipes = Recipe.search(:conditions => { :category_id => 3 }, :order => :name)
recipe = recipes[17] # some record
recipes.previous(recipe) # returns recipe 16
recipes.next(recipe) # returns recipe 18
This is ideal for adding previous/next links after clicking on a searched record in a web app.
This needs to work outside of the paginated result set, so we can't just fetch the records and do the processing in Ruby.
One solution is to take the sort key and add that as a condition. For example, if the recipe's name is "Chicken", we can find the previous record by adding a condition where name < "Chicken", which might find "Cake".
However, this solution has two serious limitations. One is that an order key must be specified (the default xapian relevancy won't work). Another is that the order key must be unique. If we have multiple recipes with the same name then it will skip them.
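A minimal plain-Ruby sketch of the uniqueness limitation, using an in-memory list instead of Xapian. The Recipe struct and both helper methods are hypothetical and not part of Xapit. The second helper shows one possible workaround (an assumption, not something Xapit does): compare on the pair [name, id] so that records sharing a name still have a defined order.

```ruby
# Hypothetical stand-in for a searchable model; not an ActiveRecord class.
Recipe = Struct.new(:id, :name)

# Naive approach from the description: the previous record is the greatest
# one whose name is strictly less than the current name.
def naive_previous(records, current)
  records.select { |r| r.name < current.name }.max_by(&:name)
end

# Assumed workaround: break ties on id so duplicate names are not skipped.
def tie_broken_previous(records, current)
  key = [current.name, current.id]
  candidates = records.select { |r| ([r.name, r.id] <=> key) < 0 }
  candidates.max_by { |r| [r.name, r.id] }
end
```

With two recipes named "Chicken", the naive version jumps from the second "Chicken" straight back to "Cake", while the tie-broken version returns the first "Chicken".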
One should be able to pass a range into a condition to fetch any record with a value in that range.
search(:conditions => { :priority => 3..5 })
On a similar note, it would be nice if one could search on a set of values too. This would find all records which match either value.
search(:conditions => { :priority => [3, 5] })
This should work for integers, floats, and dates/times.
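The intended matching rules can be sketched in plain Ruby as follows. The condition_matches? helper is hypothetical; it only illustrates the semantics (range means "within", array means "any of") and says nothing about how Xapit would translate these into Xapian queries.

```ruby
require "date"

# Hypothetical helper: does a stored value satisfy one condition value?
def condition_matches?(condition, value)
  case condition
  when Range then condition.cover?(value)   # e.g. 3..5 or a Date range
  when Array then condition.include?(value) # e.g. [3, 5] means 3 OR 5
  else condition == value                   # plain equality
  end
end
```

Range#cover? compares against the endpoints only, so the same rule works for integers, floats, dates, and times.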
The Xapit::Membership module is not automatically included into ActiveRecord when using a gem (instead of a plugin).
Everything was fine before I added this:
config.time_zone = 'Paris'
config.active_record.default_timezone = 'Paris'
Error trace:
Indexing Product
rake aborted!
undefined method `to_i' for Sat, 01 Jan 2005 00:00:00 +0000:DateTime
/opt/ree/lib/ruby/gems/1.8/gems/activesupport-2.3.2/lib/active_support/time_with_zone.rb:261:in `to_i'
/some_path/20090814101621/vendor/plugins/xapit/lib/xapit/indexers/abstract_indexer.rb:50:in `field_terms'
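The trace shows to_i being called on a value that wraps a DateTime, which does not define to_i. One possible way around it, sketched below, is to convert date-like values to Time first. The to_epoch_seconds helper is hypothetical, and whether this belongs in field_terms is an assumption; the sketch only shows the conversion itself on a current Ruby.

```ruby
require "date"

# Hypothetical helper: turn Date, DateTime, or Time into integer epoch seconds.
def to_epoch_seconds(value)
  value = value.to_time if value.is_a?(Date) # Date also covers DateTime
  value.to_i
end
```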
Search conditions should be definable in the textual query. If there is a Person model with index.field :age then one should be able to specify age:17 in the text query and get all people matching that age.
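A rough sketch of how such a query could be split into free text and field conditions. The extract_field_conditions helper and its return shape are hypothetical; the real query parser may tokenize differently.

```ruby
# Hypothetical helper: pull "field:value" tokens for known fields out of a
# text query. Returns [remaining_text, conditions_hash].
def extract_field_conditions(query, fields)
  conditions = {}
  remaining = query.split.reject do |token|
    name, value = token.split(":", 2)
    if value && !value.empty? && fields.include?(name.to_sym)
      conditions[name.to_sym] = value
      true # drop this token from the free-text part
    end
  end
  [remaining.join(" "), conditions]
end
```

Tokens naming an unknown field (for example foo:bar when only :age is indexed) are left in the free text untouched.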
While it is possible to perform an "or" condition on one attribute:
search(:conditions => { :priority => [3, 5] })
It is currently not possible to perform an "or" on separate attributes.
search(:conditions => { :priority => 3, :category => "foo" }) # does an AND search
One possible solution is to support an "or_search" method which can be chained. For example:
search(:conditions => { :priority => 3 }).or_search(:conditions => { :category => "foo" })
Another solution is to use an array in the conditions call.
search(:conditions => [{ :priority => 3 }, { :category => "foo" }])
I think I prefer this latter approach.
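To make the proposed semantics concrete, here is a plain-Ruby sketch in which an array of hashes means "OR between the hashes, AND within each hash". The matches_conditions? helper is hypothetical and works on simple hashes rather than Xapian documents.

```ruby
# Hypothetical helper: a single hash is an AND group; an array of hashes
# is an OR over those groups.
def matches_conditions?(record, conditions)
  groups = conditions.is_a?(Array) ? conditions : [conditions]
  groups.any? do |group|
    group.all? { |attribute, expected| record[attribute] == expected }
  end
end
```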
If the term "über" is indexed and one searches for "über" it should find the matching document.
If spell checking is enabled, when someone queries "pengiun" it should suggest "penguin" if that is an indexed term. This can be exposed through a Collection#similar_terms method which returns an array of matching terms that are only a few characters away from the words mentioned in the query.
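"Only a few characters away" can be read as a small edit distance. The sketch below uses the classic Levenshtein distance in plain Ruby; both method names and the default threshold of 2 are assumptions, and Xapian's own spelling support may be a better fit in practice.

```ruby
# Levenshtein distance between two strings, computed row by row.
def edit_distance(a, b)
  previous = (0..b.length).to_a
  a.each_char.with_index(1) do |char_a, i|
    current = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      current << [previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost].min
    end
    previous = current
  end
  previous.last
end

# Hypothetical similar_terms: indexed terms within max_distance edits.
def similar_terms(word, indexed_terms, max_distance = 2)
  indexed_terms.select do |term|
    term != word && edit_distance(word, term) <= max_distance
  end
end
```

Note that the swapped letters in "pengiun" count as two edits under plain Levenshtein, which is why the threshold here is 2 rather than 1.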
Hi,
I'm fighting with my tests at the moment.
My model has facets, and I use the code provided in the README to display the facets on the page.
It works fine in the browser, but when running my test I get an error:
ActionView::TemplateError: RangeError: Value in posting list too large.
See the trace below.
test: GET :index should respond with success. (ProductsControllerTest):
ActionView::TemplateError: RangeError: Value in posting list too large.
On line #3 of app/views/products/index.haml
1: - content_for :sidebar do
2: %h2 Facets
3: - @products.facets do |facet|
4: = facet.name
5: - for option in facet.options
6: = link_to option.name, :overwrite_params => {:facets => option}
xapit (0.2.7) lib/xapit/query.rb:36:in `mset'
xapit (0.2.7) lib/xapit/query.rb:36:in `matchset'
xapit (0.2.7) lib/xapit/query.rb:40:in `matches'
xapit (0.2.7) lib/xapit/facet.rb:64:in `matches'
xapit (0.2.7) lib/xapit/facet.rb:44:in `matching_identifiers'
xapit (0.2.7) lib/xapit/facet.rb:32:in `unfiltered_options'
xapit (0.2.7) lib/xapit/facet.rb:26:in `options'
xapit (0.2.7) lib/xapit/collection.rb:118:in `facets'
xapit (0.2.7) lib/xapit/collection.rb:117:in `select'
xapit (0.2.7) lib/xapit/collection.rb:117:in `facets'
app/views/products/index.haml:3
haml (3.0.2) lib/haml/helpers.rb:343:in `call'
haml (3.0.2) lib/haml/helpers.rb:343:in `capture_haml'
haml (3.0.2) lib/haml/helpers.rb:566:in `with_haml_buffer'
haml (3.0.2) lib/haml/helpers.rb:339:in `capture_haml'
haml (3.0.2) rails//lib/haml/helpers/action_view_mods.rb:88:in `capture'
app/views/products/index.haml:1:in `_run_haml_app47views47products47index46haml'
haml (3.0.2) rails//lib/haml/helpers/action_view_mods.rb:13:in `render'
haml (3.0.2) rails//lib/haml/helpers/action_view_mods.rb:13:in `render'
test/functional/products_controller_test.rb:11:in `__bind_1273850390_850922'
shoulda (2.10.3) lib/shoulda/context.rb:380:in `call'
shoulda (2.10.3) lib/shoulda/context.rb:380:in `run_current_setup_blocks'
shoulda (2.10.3) lib/shoulda/context.rb:379:in `each'
shoulda (2.10.3) lib/shoulda/context.rb:379:in `run_current_setup_blocks'
shoulda (2.10.3) lib/shoulda/context.rb:361:in `test: GET :index should respond with success. '
/home/nicolas/.rvm/rubies/ree-1.8.7-2010.01/lib/ruby/1.8/test/unit/testsuite.rb:34:in `run'
/home/nicolas/.rvm/rubies/ree-1.8.7-2010.01/lib/ruby/1.8/test/unit/testsuite.rb:33:in `each'
/home/nicolas/.rvm/rubies/ree-1.8.7-2010.01/lib/ruby/1.8/test/unit/testsuite.rb:33:in `run'
/home/nicolas/.rvm/rubies/ree-1.8.7-2010.01/lib/ruby/1.8/test/unit/testsuite.rb:34:in `run'
/home/nicolas/.rvm/rubies/ree-1.8.7-2010.01/lib/ruby/1.8/test/unit/testsuite.rb:33:in `each'
/home/nicolas/.rvm/rubies/ree-1.8.7-2010.01/lib/ruby/1.8/test/unit/testsuite.rb:33:in `run'
/home/nicolas/.rvm/rubies/ree-1.8.7-2010.01/lib/ruby/1.8/test/unit/ui/testrunnermediator.rb:46:in `run_suite'
/home/nicolas/.rvm/rubies/ree-1.8.7-2010.01/lib/ruby/1.8/test/unit/ui/console/testrunner.rb:67:in `start_mediator'
/home/nicolas/.rvm/rubies/ree-1.8.7-2010.01/lib/ruby/1.8/test/unit/ui/console/testrunner.rb:41:in `start'
/home/nicolas/.rvm/rubies/ree-1.8.7-2010.01/lib/ruby/1.8/test/unit/ui/testrunnerutilities.rb:29:in `run'
/home/nicolas/.rvm/rubies/ree-1.8.7-2010.01/lib/ruby/1.8/test/unit/autorunner.rb:216:in `run'
/home/nicolas/.rvm/rubies/ree-1.8.7-2010.01/lib/ruby/1.8/test/unit/autorunner.rb:12:in `run'
/home/nicolas/.rvm/rubies/ree-1.8.7-2010.01/lib/ruby/1.8/test/unit.rb:279
test/functional/products_controller_test.rb:5
Does anyone have a clue about this?
PS: I'm reindexing before calling get :index.
See this fork: http://github.com/jfahrenkrug/xapit/tree/master
See this repository for adapter. http://github.com/speedmax/xapit/tree/master
Scenario: Multi-word spelling suggestion with correct and incorrect spelling
Given indexed records named "Zebra, Apple"
When I query for "zebra aple"
Then I should have "zebra apple" as a spelling suggestion
Currently sorting, conditional ranges, and a few other things do not work when doing a global Xapit.search. This is because it relies on Xapian's values which are different for each model.
One possible solution is to re-assign the value's position to a global index, so the "name" attribute will always be at the same position for each model, but other unique attributes will be at a different position, etc.
This does come with many difficulties because if a new attribute is added then this changes the position for everything. Usually everything is re-indexed at that point so it may not be a problem.
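A small sketch of the global position idea, assuming each model simply lists its sortable attribute names. The global_value_slots helper is hypothetical. Because positions come from the sorted union of names, adding an attribute can shift later positions, which is exactly the re-indexing caveat mentioned above.

```ruby
# Hypothetical helper: map every attribute name used by any model to one
# shared value position.
def global_value_slots(attributes_by_model)
  attributes_by_model.values.flatten.uniq.sort.each_with_index.to_h
end
```

For example, with Article using :name and :published_at and Product using :name and :price, :name lands on the same position for both models.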
Hi,
I want my search module to be accent-insensitive.
I added sanitization in the query parser, using a method from Rails:
# normalize(form=ActiveSupport::Multibyte.default_normalization_form)
def cleanup_text(text)
  # depends on Rails to strip the accents from the string
  text.mb_chars.normalize(:kd).gsub(/[^\x00-\x7F]/n,'').to_s.gsub(/\b([a-z])\*/i) { $1 }
end
and I get "àáâãäå" => "aaaaaa".
Nice!
But what about indexing? Where can I sanitize the terms before adding them to the database?
I didn't find the right place in the indexer. Any ideas?
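Whatever hook ends up being used, the same normalization has to run on both the indexed terms and the query terms, otherwise they will not match. A Rails-free sketch of such a shared helper is below; strip_accents is a hypothetical name, it relies on String#unicode_normalize from Ruby's standard library (Ruby 2.2 or later), and where to call it inside the indexer is still the open question.

```ruby
# Hypothetical shared helper: decompose accented characters, then drop the
# combining marks, leaving the base letters.
def strip_accents(text)
  text.unicode_normalize(:nfkd).gsub(/\p{Mn}/, "")
end

# Apply the same transformation at index time and at query time.
def normalize_terms(terms)
  terms.map { |term| strip_accents(term).downcase }
end
```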
One should be able to call search_similar on an indexed object to find records which are similar. This should use all "text" and "field" terms to find similarity.
This method should accept search options to further narrow down the similar results, and the returned object should be a Xapit::Collection.
@article.search_similar(:conditions => ...)
The tests take a long time to run because the database is regenerated each time. There must be some way instead to clear all records in a database instead of removing the file.
When searching multiple classes with Xapit.search there should be a :classes or :members option which takes an array of classes to perform searches on.
Xapit.search("puzzle", :classes => [Article, Comment])
From a GitHub Message:
Sorry I wasn't too clear in my last email regarding my problems with Xapit.serialize_value. I said that it used Xapian.sortable_serialise with integer values (for Date and Time objects) when Xapian.sortable_serialise accepted floats. Here's a console demo to illustrate the problem I'm encountering:
Loading development environment (Rails 2.3.2)
>> d = Date.new 2098, 3, 20 # happened to have a very future date in my database
=> Thu, 20 Mar 2098
>> d.to_time.to_i
=> 4046083200
>> Xapian.sortable_serialise(d.to_time.to_i)
TypeError: Expected argument 0 of type double, but got Fixnum 4046083200
in SWIG method 'Xapian::sortable_serialise'
from (irb):3:in `sortable_serialise'
from (irb):3
>> d.to_time.to_f
=> 4046083200.0
>> Xapian.sortable_serialise(d.to_time.to_f)
=> "�c�R\244"
>> Xapian.sortable_serialise(2**31-1)
=> "�_�\377\377\360"
=>
So it appears that Xapian.sortable_serialise has a problem with very large integer/Fixnum values.
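Going by the console session above, converting the value to a Float before serialising avoids the TypeError. A small sketch of that conversion step follows; sortable_float is a hypothetical helper, and the Xapian call itself is left out so the snippet runs without the bindings. The result would be passed to Xapian.sortable_serialise.

```ruby
require "date"

# Hypothetical helper: coerce numbers, dates, and times to a Float suitable
# for a double-typed API.
def sortable_float(value)
  case value
  when Date then value.to_time.to_f # Date and DateTime (Date uses local time)
  when Time then value.to_f
  else Float(value)                 # Integer, Float, numeric String
  end
end
```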
Instead of clearing the database and recreating the index each time, there should be support for only re-indexing the changed/created records. This can somewhat be determined using the updated_at timestamp in Rails applications and looking at the time the database was created. Possible gotchas include:
An alternative solution which solves some of these issues is to keep the Xapian database loaded under a different process and use a REST api or something similar to communicate with it. This way it can have the writable database always loaded and update it on the fly as records change. Of course the downside is it would require a separate process...
My user model has a facet called country. In my results, Xapit generates these options: france (4719ac1), us (0cbaf87) and spain (4567bc2). I would like my results to include users associated with the countries us OR spain. It seems that an AND condition is generated if I try this:
User.search('', :facets => %w(0cbaf87 4567bc2))
so no results are returned. Any ideas?
Regards,
Julien
Create an abstract adapter layer for interacting with ORMs besides ActiveRecord (such as DataMapper, Sequel or CouchDB).
One should be able to mark a date as a field.
xapit do |index|
  index.field :released_on
end
And then perform a search on it through the conditions hash.
Xapit.search(:conditions => { :released_on => Date.today })
Similarly, it would be nice if times were supported, and that searching by a date would include all times in that date.
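"Searching by a date would include all times in that date" can be expressed as a half-open time range covering the whole day. The day_range helper below is hypothetical and assumes UTC; a real implementation would have to respect the application's time zone.

```ruby
require "date"

# Hypothetical helper: the span of times belonging to one calendar day (UTC).
# The end is exclusive, so midnight of the next day is not included.
def day_range(date)
  start = Time.utc(date.year, date.month, date.day)
  start...(start + 86_400)
end
```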
I noticed that specifying a custom indexer on Xapit.setup isn't working. The problem is that in Blueprint#initialize, @indexer is assigned an instance of SimpleIndexer no matter what is specified in the setup call. Here is the fix I'm using in my application:
# Patch for xapit to make it respect custom indexer setting
class Xapit::IndexBlueprint
  alias_method :original_initialize, :initialize
  def initialize(member_class, *args)
    original_initialize(member_class, *args)
    @indexer = Xapit::Config.indexer.new(self)
  end
end
Currently Xapit will hit the database to fetch one record for each type of facet when listing them. This is so it can retrieve the facet attribute values in order to list the facet options.
In theory this query could be done on the xapit database instead of hitting the SQL database. This would also make the facet behavior more consistent if the Xapit database gets out of sync with the SQL database.
To do this it would need to loop through the match.document.terms to fetch the terms which are facet options (beginning with a capital "F"). This will likely result in a cleaner implementation too.
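The filtering step could look roughly like this, given the list of term strings for a matched document. The facet_identifiers helper is hypothetical, and it assumes the facet identifier is simply whatever follows the leading "F"; the exact term layout is not confirmed here.

```ruby
# Hypothetical helper: keep only facet terms (prefixed with a capital "F")
# and strip the prefix to get the facet option identifiers.
def facet_identifiers(terms)
  terms.select { |term| term.start_with?("F") }.map { |term| term[1..-1] }
end
```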
Hi,
I'm looking for a way to specify the collapse_key which is used in Query#matchset:
enquire.collapse_key = options[:collapse_key] if options[:collapse_key]
As I haven't found it, I added a method in AbstractQueryParser:
def collapse_key
  @options[:collapse_key] ? @options[:collapse_key].to_i : nil
end
And I updated query_options:
def query_options
  {
    :offset => offset,
    :limit => per_page,
    :sort_by_values => sort_by_values,
    :sort_descending => @options[:descending],
    :collapse_key => collapse_key
  }
end
So that I can do:
Article.search("phone", :collapse_key => 0)
Is there a way to specify the collapse_key properly?
Regards
"Xapian provides support for storing a synonym dictionary, or thesaurus. This can be used by the Xapian::QueryParser class to expand terms in user query strings, either automatically, or when requested by the user with an explicit synonym operator (~)."
One should be able to specify the weight of an attribute in the index. The default weight is 1...
class Article < ActiveRecord::Base
  xapit do |index|
    index.text :title, :weight => 10
    index.text :tags, :weight => 5
    index.text :content
  end
end
Internally this will increment the term count by the weight value for each word that is indexed for that attribute.
document.add_term(word, weight)
That should cause the heavier weighted terms to take precedence in the relevance ranking.
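The effect of incrementing by the weight can be seen with a plain hash standing in for the document's term counts. The weighted_term_counts helper is hypothetical, and the tokenization (lowercase, split on word characters) is an assumption.

```ruby
# Hypothetical helper: fields is an array of [text, weight] pairs. Each word
# adds its field's weight to the running count for that term.
def weighted_term_counts(fields)
  counts = Hash.new(0)
  fields.each do |text, weight|
    text.downcase.scan(/\w+/).each { |word| counts[word] += weight }
  end
  counts
end
```

A word appearing once in a title weighted 10 and once in unweighted content ends up with a count of 11, versus 1 for a word that appears only in the content.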
The wildcard query can be somewhat slow and is often performed multiple times per search. To improve performance it is more efficient if this is memoized. Something like:
class Xapit::AbstractQueryParser
  extend ActiveSupport::Memoizable
  memoize :wildcard_query
end
This has a very strong reliance on ActiveSupport which I may not want.
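One way to avoid the ActiveSupport dependency is the plain `||=` idiom with an instance variable. The class below is a stand-alone sketch, not the real AbstractQueryParser; build_wildcard_query is a hypothetical placeholder for the expensive work.

```ruby
# Stand-alone illustration of memoizing one method without ActiveSupport.
class QueryParserSketch
  attr_reader :build_count

  def initialize(text)
    @text = text
    @build_count = 0
  end

  # Computed on first call, then reused. Caveat: ||= recomputes if the
  # cached value is nil or false.
  def wildcard_query
    @wildcard_query ||= build_wildcard_query
  end

  private

  # Placeholder for the slow part; here it just collects wildcard words.
  def build_wildcard_query
    @build_count += 1
    @text.split.select { |word| word.end_with?("*") }
  end
end
```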
The database removal task wipes out the entire directory. This is dangerous if one accidentally specifies the wrong path. It should check to see if the contents of the directory look like a Xapian database.
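A possible shape for that check is sketched below. It assumes that a Xapian database directory contains a backend marker file such as iamflint, iamchert, or iamglass; that list is an assumption to verify against the Xapian version in use, and both method names are hypothetical.

```ruby
require "fileutils"

# Assumed marker files written by different Xapian backends.
XAPIAN_MARKER_FILES = %w[iamflint iamchert iamglass].freeze

# Hypothetical check: a directory "looks like" a Xapian database if it
# contains one of the marker files.
def looks_like_xapian_database?(path)
  File.directory?(path) &&
    XAPIAN_MARKER_FILES.any? { |name| File.exist?(File.join(path, name)) }
end

# Refuse to delete anything that does not pass the check.
def remove_database(path)
  unless looks_like_xapian_database?(path)
    raise ArgumentError, "#{path} does not look like a Xapian database"
  end
  FileUtils.rm_rf(path)
end
```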
The addition of Xapit::Collection#offset would allow the use of will_paginate's page_entries_info helper method.
def offset
  (current_page - 1) * per_page
end
I'm moving over from acts_as_xapian and I need to do the following:
"Search all records where photo_file_name IS NOT NULL"
Essentially, all records that have an attached file with paperclip. How does one do this in xapit?
Fill in the rest of the DataMapperAdapter and include Xapit::Membership into DataMapper::Resource so it is usable automatically by all DataMapper models.
The find_each method is the only one that I'm not sure how to implement. Anyone want to take a stab at adding this for DataMapper?
When the word "running" is indexed in a document, and someone types in "runs" in a search, it should find the document when stemming is enabled. You should be able to specify the stemming language in the Config class.
Currently Xapit does a separate find(id) query for each record. This should be done in a single query to improve performance.
This problem is somewhat more complex than one would initially expect. The records will not be returned in the proper order from SQL so they need to be sorted again. One also needs to take into account that searches can span multiple models with Xapit.search and therefore need to have a separate database query per model.
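The grouping and re-sorting can be sketched independently of ActiveRecord. In the sketch below, matches is the ordered list of [class_name, id] pairs from the search, and fetchers is a hypothetical hash of callables that each load many ids at once (in a Rails app this would be one find per model). None of these names come from Xapit.

```ruby
# Hypothetical helper: one batched fetch per class, then restore the
# original match order. Records that no longer exist are dropped.
def fetch_in_order(matches, fetchers)
  records = {}
  matches.group_by(&:first).each do |class_name, pairs|
    ids = pairs.map(&:last)
    fetchers.fetch(class_name).call(ids).each do |record|
      records[[class_name, record.id]] = record
    end
  end
  matches.map { |key| records[key] }.compact
end
```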
Turn Xapit into a Ruby Gem. Add a script/generate xapit command to generate a rake task and config file for use in a Rails app through the gem.
If a model uses GUID-formatted ids (like 'e6cac453-a527-18dc-d731-7b4cf73598a1'), fetch_results in collections.rb fails with multiple errors.
Lines 172 and 180 currently are:
class_name, id = match.document.data.split('-')
should be
class_name, id = match.document.data.split('-', 2)
Line 181 is:
member = records_by_class[class_name].detect { |m| m.id == id.to_i }
should be:
member = records_by_class[class_name].detect { |m| m.id == id }
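A stand-alone sketch of the two changes follows. One caveat: after the split, id is always a String, so comparing it directly with an integer primary key would never match. Comparing on m.id.to_s (an assumption, not part of the proposed patch) keeps both integer and GUID ids working. The helper names are hypothetical.

```ruby
# Split only on the first dash so a GUID id stays in one piece.
def parse_document_data(data)
  data.split("-", 2)
end

# Compare as strings so both integer and GUID primary keys match.
def find_member(members, id)
  members.detect { |member| member.id.to_s == id }
end
```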
Hi,
Working with Rails 2.3.8 and ActiveRecord, I had some trouble setting up Xapit.
After including the gem, running the generator, and setting up my class:
Rails::Initializer.run do |config|
...
config.gem 'xapit'
Xapit.setup(:database_path => "#{Rails.root}/db/xapian")
class Depense < ActiveRecord::Base
...
xapit do |x|
x.text :motif, :categorie
x.field :imputable_type, :imputable_id
x.sortable :date, :motif
end
I got an undefined method error while running:
rake :xapit:index
rake aborted!
undefined method `xapit' for #<Class:0xb6f2a650>
........
It seems the Membership module wasn't included in ActiveRecord::Base as documented.
My workaround, of course, is to explicitly add
include Xapit::Membership
to ActiveRecord::Base.
Clearly
if defined? ActiveRecord
  ActiveRecord::Base.class_eval do
    include Xapit::Membership
  end
end
doesn't work on my system...
When a wildcard is used in a query for a term that doesn't match any record, all records are returned:
>> Product.search("fly*").size
=> 24
>> Product.search("flyd*").size
=> 4019
4019 is the total number of products in my database.
Currently, if only one facet option is found in a result, that entire facet is hidden. However, if some records have nil for that attribute, the single facet option would still narrow down the results and is therefore valuable.
Instead, it is better to hide a facet only if every facet option's match count equals the total search count. In that case no facet option will narrow down the result set, so the facet should be hidden.
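The proposed rule is small enough to state as a single predicate. The hide_facet? helper is hypothetical; option_counts is the list of match counts for each option of one facet.

```ruby
# Hypothetical predicate: hide the facet only when no option would narrow
# the results, i.e. every option matches the full result set.
def hide_facet?(option_counts, total_count)
  option_counts.all? { |count| count == total_count }
end
```

A facet with a single option matching 10 of 24 results stays visible under this rule, which is the nil-attribute case described above.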
Is there some way of checking the age of the index, or of updating it at intervals?
I've tried Googling for some sort of best practice for this, but have so far come up with nothing.
rails_cron would have seemed perfect for the job, but it seems to have gone the way of the dodo... Any recommendations?
Hi Ryan,
great work! Thanks a lot for sharing this. I think I found one issue:
If you use pagination and open, e.g., page 4, then the facets will include page=4 in their links:
facets=f9aed79&page=4&suchwort=xyz
This will lead to 0 results.
So I changed the facet link from your example to
<%= link_to option.name, :overwrite_params => { :facets => option, :page => nil } %>
Greetings
Sven
One should be able to change the mode of the applied facets so it works like a nested breadcrumb style:
Search > Clothing > $25-$50 > Mens
Clicking on a given section would drop off the applied facets after it. This should be settable in the config:
Xapit::Config.setup(:breadcrumb_facets => true)
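The "drop off the applied facets after it" behaviour amounts to truncating the ordered list of applied facets at the clicked one. The breadcrumb_facets helper below is a hypothetical sketch working on plain identifiers.

```ruby
# Hypothetical helper: keep the applied facets up to and including the
# clicked one; anything after it is dropped. Unknown facets change nothing.
def breadcrumb_facets(applied, clicked)
  index = applied.index(clicked)
  index ? applied[0..index] : applied
end
```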
Is there a way to express this kind of condition?
A AND (B OR C)
without using not_conditions or translating it into (A AND B) OR (A AND C)...
Regards
Hi guys !
Has anybody succeeded in integrating Xapit search with CanCan access restrictions?
.search() is not a named_scope, and it returns a Xapit object which doesn't know the CanCan methods, so I cannot chain the queries (as with pagination or other stuff).
Anyway, I tried to write a new method for Xapit:
# Xapit & Cancan fancing Tango ! (yet ?)
module XapitExt
  class Collection
    def xapit_accessible_by(ability, action = :read) # verify access in the search results. why ? why not...
      cancan_query = ability.query(action, self) # self is a xapit object not the searched object ! grrrrr !
      search(:conditions => cancan_query.conditions) #, :joins => query.joins # joins needed ?! didn't see anything in the parser.
    end
  end
  module Membership # will be included in activerecord...
    module AdditionalMethods
      module ClassMethods
        def xapit_accessible_by(ability, action = :read) # check abilities as conditions for the search, the right way !
          cancan_query = ability.query(action, self)
          search(:conditions => cancan_query.conditions)
        end
      end
    end
  end
end
It didn't work for me since my conditions return a SQL fragment, which is not very useful for Xapit.
Maybe the .search() method could rely on a scope or .where() so that it works with other solutions.
In the meantime, if someone has an idea for sorting this out, I'd be very pleased.
Bye
It would be nice if an asterisk could be used to match any characters in a search. Xapian does offer some of this functionality, but it is very limited: you can only do a wildcard match at the end of a term.
It would be nice if this was supported for both normal search string queries and conditions:
Xapit.search("foo*", :conditions => { :name => "bar*" })
One use for this is on an immediate-feedback AJAX search where one gets results as they type the query.
I've just started experimenting with Xapit. It looks like sorting is purely string-based and numeric-valued attributes are not supported. Is this a Xapit issue or a Xapian one?
Upon fetching the Xapit database, it should check whether the database's modification date has changed since the last fetch. If it has, it should reload the database.
This has two key advantages. One is that it is easier to sync the database with a cron task of some kind (without using xapit-sync). Another is that this is a nice reload alternative for those using xapit-sync with Passenger.
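A rough sketch of the check, using the modification time of the database path. ReloadingHandle is a hypothetical class; the block passed to it stands in for whatever opens the Xapian database, and using the directory's mtime as the change signal is an assumption that may not catch every kind of update.

```ruby
# Hypothetical wrapper: reopen the database whenever the path's
# modification time differs from the one seen at the last load.
class ReloadingHandle
  attr_reader :load_count

  def initialize(path, &loader)
    @path = path
    @loader = loader
    @load_count = 0
  end

  def database
    modified_at = File.mtime(@path)
    if @database.nil? || modified_at != @loaded_mtime
      @database = @loader.call(@path) # placeholder for opening the database
      @loaded_mtime = modified_at
      @load_count += 1
    end
    @database
  end
end
```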
I'm using the example shown in the readme, to index text from a has_many relation by returning an array. Indexing happens without any errors. When I try to search using only text that would be found on the related table (e.g. xapit), I get no results. I've tried various combinations, including a minimal configuration and it still doesn't work.
class Product
  has_many :items
  xapit do |index|
    index.text :item_descriptions
  end
  def item_descriptions
    items.map(&:description) # ['xapit t-shirt', 'xapit ball cap']
  end
end
Product.search 'ball cap'
=> []
It would be nice if some facets could be nested. For example, let's say this is a store application with a Departments facet:
Choosing the Clothing department could show subdepartments:
There are a couple ways this could be implemented, but I'll expand upon those ideas later.
If one specifies a custom primary key in their Active Record model then it should use that throughout Xapit.