elastic / elasticsearch-ruby

Ruby integrations for Elasticsearch

Home Page: https://www.elastic.co/guide/en/elasticsearch/client/ruby-api/current/index.html

License: Apache License 2.0

Ruby 98.90% HTML 0.48% Dockerfile 0.04% Shell 0.59%
ruby elasticsearch elastic search rubynlp client

elasticsearch-ruby's People

Contributors

akfernun, andrewvc, betamatt, chtitux, conky5, cosmo0920, ctrochalakis, defgenx, deme0607, elasticmachine, estolfo, gmile, h6ah4i, johnnyshields, jrodewig, jsvd, kares, karmi, karmiq, kkirsche, mindreframer, mkdynamic, philkra, picandocodigo, pragtob, robsteranium, stephenprater, szabosteve, tmaier, trevorcreech


elasticsearch-ruby's Issues

exists_alias method returns incorrect results for ES 0.90.0

It always returns false.

client = Elasticsearch::Client.new log: true, trace: true
client.indices.create index: "test"
client.indices.update_aliases body: {actions: [{add: {index: "test", alias: "test_alias"}}]}
client.indices.exists_alias(name: "test_alias")

"GET /_template{/temp*}" missing

Hi,

It seems that elasticsearch-api is missing an action for index templates: I can get all templates by omitting the name parameter, or a specific template by setting name, but I can't get templates matching a pattern (like template-*).
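Until such an action exists, one possible workaround (a sketch; the helper name is hypothetical and `client` is assumed to be an `Elasticsearch::Client`) is to hit the endpoint directly through the client's low-level `perform_request`:

```ruby
# Hypothetical helper: fetch all templates whose names match a wildcard
# pattern via the raw REST endpoint, bypassing the missing API action.
def templates_matching(client, pattern)
  client.perform_request('GET', "_template/#{pattern}").body
end

# templates_matching(client, 'temp*')  # issues GET /_template/temp*
```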

delete_by_query action should use type parameter (as described)

The documentation of Elasticsearch::API::Actions.delete_by_query says that you can use the :type key in the arguments options hash, but it is ignored.

I am not sure, but would it be enough to add something like

Utils.__escape(arguments[:type])

to the path construction in delete_by_query.rb?

Ruby 2.1 support

It would be really cool to add 2.1 to the .travis.yml. And rbx as well, but for rbx something needs to be done about ruby-prof.

Differences using GET or POST in search method (elasticsearch 0.90.10)

I don't know if this is an issue or a misunderstanding on my part.

Here is a sample query:
client.search({ :index => 'test', :type => 'user', :body => { :query => { :match => { :full_name => "Roger Sand" } } } })

The result of the search method using "GET" (as elasticsearch-ruby does) differs from the result using the "POST" method:

With GET:
=> {"timed_out"=>false, "_shards"=>{"failed"=>0, "successful"=>5, "total"=>5}, "took"=>1, "hits"=>{"hits"=>[{"_score"=>1.0, "_index"=>"test", "_type"=>"user", "_source"=>{"full_name"=>"Cindy Gilbert"}, "_id"=>"84049"}, {"_score"=>1.0, "_index"=>"test", "_type"=>"user", "_source"=>{"full_name"=>"Sand Roger"}, "_id"=>"84051"}, {"_score"=>1.0, "_index"=>"test", "_type"=>"user", "_source"=>{"full_name"=>"Cindy Sand"}, "_id"=>"84048"}, {"_score"=>1.0, "_index"=>"test", "_type"=>"user", "_source"=>{"full_name"=>"Jean-Roger Sands"}, "_id"=>"84050"}, {"_score"=>1.0, "_index"=>"test", "_type"=>"user", "_source"=>{"full_name"=>"Roger Gilbert"}, "_id"=>"84047"}, {"_score"=>1.0, "_index"=>"test", "_type"=>"user", "_source"=>{"full_name"=>"Roger Sand"}, "_id"=>"84046"}], "max_score"=>1.0, "total"=>6}}

With POST:
{"timed_out"=>false, "took"=>2, "_shards"=>{"total"=>5, "failed"=>0, "successful"=>5}, "hits"=>{"total"=>5, "hits"=>[{"_score"=>0.8838835, "_id"=>"84051", "_type"=>"user", "_index"=>"test", "_source"=>{"full_name"=>"Sand Roger"}}, {"_score"=>0.2712221, "_id"=>"84046", "_type"=>"user", "_index"=>"test", "_source"=>{"full_name"=>"Roger Sand"}}, {"_score"=>0.22097087, "_id"=>"84048", "_type"=>"user", "_index"=>"test", "_source"=>{"full_name"=>"Cindy Sand"}}, {"_score"=>0.17677669, "_id"=>"84050", "_type"=>"user", "_index"=>"test", "_source"=>{"full_name"=>"Jean-Roger Sands"}}, {"_score"=>0.028130025, "_id"=>"84047", "_type"=>"user", "_index"=>"test", "_source"=>{"full_name"=>"Roger Gilbert"}}], "max_score"=>0.8838835}}

The GET request does not seem to take the body into account: it returns all of my "user" documents with a score of 1.

Is this a misunderstanding on my part, an elasticsearch-ruby issue, or an elasticsearch issue?

Regards

How to do a scan query?

I have a Person model with elasticsearch-rails integration. As it does not implement scan queries yet, I tried using the API methods, without success so far.

Querying works like a charm:

Person.search({:query=>{:match_all=>{}}}, :index => 'people').to_a
=> [#<Elasticsearch::Model::Response::Result:0x007fa3b7a96520
      #<Elasticsearch::Model::Response::Result:0x007fa3b7a95878
      ...
     ]

I didn't find any documentation on the scan query, so I'm just guessing. Coming from Tire, I expected to be able to iterate over the results with an each block, but the search call just returns an empty array. Can you please provide a working example?

Person.search({:query=>{:match_all=>{}}}, :index => 'people', :search_type => :scan, :size => 100, :scroll => 10).to_a
=> []
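For reference, a scan search has to be driven through the low-level client's scroll API: the initial request only opens a scroll context and carries no hits itself. A minimal sketch, where `client` is assumed to behave like an `Elasticsearch::Client` and the index name and scroll window are illustrative:

```ruby
# Iterate over every document in an index with scan/scroll. `client` is
# assumed to respond to #search and #scroll like Elasticsearch::Client.
def each_document(client, index, scroll: '5m', size: 100)
  result = client.search(index: index, search_type: 'scan',
                         scroll: scroll, size: size,
                         body: { query: { match_all: {} } })
  # The scan response itself carries no hits, only a _scroll_id to page with.
  loop do
    result = client.scroll(scroll_id: result['_scroll_id'], scroll: scroll)
    hits = result['hits']['hits']
    break if hits.empty?
    hits.each { |hit| yield hit }
  end
end

# Usage: each_document(Elasticsearch::Client.new, 'people') { |h| puts h['_id'] }
```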

bulk index times out with small number of small records

max_buffer_size = 10

# inside a loop over input lines `l`:
time_stamp, signal_strength, node_mac, device_mac = l.split('|')

@buffer << {
  index: {
    _index: @index_name,
    _type:  'probe',
    _id:    SimpleUUID::UUID.new.to_i,
    data:   {
      time_stamp:      DateTime.parse(time_stamp),
      node_mac:        node_mac,
      device_mac:      device_mac,
      signal_strength: signal_strength
    }
  }
}

if @buffer.size >= max_buffer_size
  @client.bulk body: @buffer
  @buffer = []
end

I tried the same thing with the stretcher gem (slightly different syntax) and it works fine. I also tried the patron and typhoeus adapters, with the same result.

Here is how I create the index.

fields = {
  time_stamp:      {type: 'date'},
  node_mac:        {type: 'string', analyzer: 'keyword'},
  device_mac_mac:  {type: 'string', analyzer: 'keyword'},
  signal_strength: {type: 'float'} # integer?
}
  @client.indices.delete index: @index_name rescue nil
  @client.indices.create index: @index_name,
                         body:  {
                             settings: {
                                 index: {
                                     number_of_shards:                 5,
                                     number_of_replicas:               0,
                                     'routing.allocation.include.name' => 'node-1'
                                 }
                             },
                             mappings: {
                                 probe: {
                                     properties: fields
                                 }
                             }
                         }

How can I use a keep-alive connection with elasticsearch-ruby without problems? (related: typhoeus)

elasticsearch-ruby has a dependency on faraday, and recommends typhoeus for keep-alive connections.

I've installed typhoeus 0.6.7 (the most recent) and it failed: there are problems between typhoeus 0.6.7 and faraday 0.9.0 (I've read some issues in both projects and found no recent progress). Because of faraday's autoload feature, I gave up on typhoeus and uninstalled it.

So I need a suggestion: how can I use a keep-alive connection with elasticsearch-ruby without problems?
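As a hedged alternative sketch (not from the thread): the client constructor accepts a block that is handed to the Faraday connection, so another keep-alive-capable adapter such as net_http_persistent can be configured without typhoeus. The gem names and host are assumptions about your setup:

```ruby
# Requires the elasticsearch and net-http-persistent gems. Defining the
# helper does not open a connection; calling it builds the client.
def persistent_client(host = 'localhost:9200')
  Elasticsearch::Client.new(host: host) do |faraday|
    faraday.adapter :net_http_persistent # keep-alive capable adapter
  end
end
```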

Collection#get_connection() can return dead connection objects

Hello,

We are evaluating the elasticsearch gem so we can replace our patched rubberband client.

It appears that Collection#get_connection() can return dead connection objects (see the example script below). This happens because the selector classes (Random, RoundRobin) do not take this into consideration. So marking connections as dead, resurrecting dead connections, and the related mechanisms are effectively not working at the moment.

$ cat test_es.rb
require 'elasticsearch'

selectors = Hash[
  :random, Elasticsearch::Transport::Transport::Connections::Selector::Random,
  :robin,  Elasticsearch::Transport::Transport::Connections::Selector::RoundRobin,
]

conns = (0..2).map { |i|
  Elasticsearch::Transport::Transport::Connections::Connection.new(
    host: "host-#{i}",
    connection: Object.new,
  )
}

col = Elasticsearch::Transport::Transport::Connections::Collection.new(
  connections: conns,
  selector_class: selectors[ARGV.first.to_sym]
)

p [:selector, col.selector.class.name]

conn = conns.sample
conn.dead!
p [:dead, conn]

10.times {
  p [:gen_connection, col.get_connection()]
}

$ bundle exec ruby test_es.rb robin
[:selector, "Elasticsearch::Transport::Transport::Connections::Selector::RoundRobin"]
[:dead, <Elasticsearch::Transport::Transport::Connections::Connection host: host-0 (dead since 2014-01-21 10:50:39 +0200)>]
[:gen_connection, <Elasticsearch::Transport::Transport::Connections::Connection host: host-0 (dead since 2014-01-21 10:50:39 +0200)>]
[:gen_connection, <Elasticsearch::Transport::Transport::Connections::Connection host: host-1 (alive)>]
[:gen_connection, <Elasticsearch::Transport::Transport::Connections::Connection host: host-2 (alive)>]
[:gen_connection, <Elasticsearch::Transport::Transport::Connections::Connection host: host-0 (dead since 2014-01-21 10:50:39 +0200)>]
[:gen_connection, <Elasticsearch::Transport::Transport::Connections::Connection host: host-1 (alive)>]
[:gen_connection, <Elasticsearch::Transport::Transport::Connections::Connection host: host-2 (alive)>]
[:gen_connection, <Elasticsearch::Transport::Transport::Connections::Connection host: host-0 (dead since 2014-01-21 10:50:39 +0200)>]
[:gen_connection, <Elasticsearch::Transport::Transport::Connections::Connection host: host-1 (alive)>]
[:gen_connection, <Elasticsearch::Transport::Transport::Connections::Connection host: host-2 (alive)>]
[:gen_connection, <Elasticsearch::Transport::Transport::Connections::Connection host: host-0 (dead since 2014-01-21 10:50:39 +0200)>]

$ bundle exec ruby test_es.rb random
[:selector, "Elasticsearch::Transport::Transport::Connections::Selector::Random"]
[:dead, <Elasticsearch::Transport::Transport::Connections::Connection host: host-0 (dead since 2014-01-21 10:50:46 +0200)>]
[:gen_connection, <Elasticsearch::Transport::Transport::Connections::Connection host: host-2 (alive)>]
[:gen_connection, <Elasticsearch::Transport::Transport::Connections::Connection host: host-2 (alive)>]
[:gen_connection, <Elasticsearch::Transport::Transport::Connections::Connection host: host-1 (alive)>]
[:gen_connection, <Elasticsearch::Transport::Transport::Connections::Connection host: host-1 (alive)>]
[:gen_connection, <Elasticsearch::Transport::Transport::Connections::Connection host: host-2 (alive)>]
[:gen_connection, <Elasticsearch::Transport::Transport::Connections::Connection host: host-0 (dead since 2014-01-21 10:50:46 +0200)>]
[:gen_connection, <Elasticsearch::Transport::Transport::Connections::Connection host: host-2 (alive)>]
[:gen_connection, <Elasticsearch::Transport::Transport::Connections::Connection host: host-2 (alive)>]
[:gen_connection, <Elasticsearch::Transport::Transport::Connections::Connection host: host-2 (alive)>]
[:gen_connection, <Elasticsearch::Transport::Transport::Connections::Connection host: host-2 (alive)>]

GET vs POST for search

I noticed the trace returns curl-compatible output, as in:

curl -X GET 'http://localhost:9200/blah/_search?pretty' -d ' ...

For curl, it looks like using the -d argument switches to a POST request by default -- the `-X GET` is necessary to keep it a GET request, but the HTTP spec sort of implies that the body of a GET request is meaningless.

Here's a thread where someone else is wondering whether a GET having a body makes any sense - see http://stackoverflow.com/questions/978061/http-get-with-request-body

Is there a de-facto standard which treats parameters in the body as URL parameters except that they don't need to be fully url escaped?

elasticsearch supports a POST request for searches -- shouldn't that be the default, or at least configurable, since it conforms to the standards?

Logging/Tracing bad queries

After the application of bff153a, I can no longer see my 4xx-class errors. Most importantly, this makes the tracer useless for debugging new queries when I get a 400 back from ES.

TestCluster non-root start

Hello, I'm using TestCluster to spec our project's internal search implementation, and on Ubuntu I can't start the test cluster as a non-root user. I've found out it is because of write restrictions on the elasticsearch data dir. Could you add something like -D es.default.path.data=#{path_to_app_tmp_elasticsearch_dir} to the test cluster start script to work around this problem, or is there a better solution?

Thanks.

How to initialize rails app not using elasticsearch-model

Hi Karmi,

Sorry if this is a dumb question but I haven't been able to figure it out. Hoping to get some help with initializing a rails app that's not using elasticsearch-model (I posted on SO but got no replies. http://stackoverflow.com/questions/19541439/elasticsearch-configuration-with-new-official-clients)

If using the es-model gem, I could have an initializer file that declares

Elasticsearch::Model.client = Elasticsearch::Client.new host: 'search.myserver.com'

and then in the app, all the models could call Elasticsearch::Model.client or Model.elasticsearch.client.

Am I understanding that correctly? What should it be if I'm not using the es-model gem?

Currently, I am just creating a new connection every time I need to query something:
client = Elasticsearch::Client.new host: 'search.myserver.com'

This doesn't seem like it could scale well. I'm trying to figure out how to declare the client in an initializer to leverage connection pooling, etc. Please help!

Thanks in advance
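A common pattern for this (a sketch; the module name and host are assumptions) is to build the client once in an initializer and memoize it behind a module method, so the same instance is reused across requests:

```ruby
# config/initializers/elasticsearch.rb -- the module name Search is
# hypothetical; any app-wide namespace works the same way.
module Search
  def self.client
    @client ||= Elasticsearch::Client.new(host: 'search.myserver.com')
  end
end

# Anywhere in the app:
#   Search.client.search index: 'articles', body: { query: { match_all: {} } }
```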

mget with fields parameter returns Array instead of int

For some reason, regular integer fields get wrapped in Arrays.

query = {:index => 'crawler', :type => 'domain', :_source => false, :fields => ['inlinks', 'last_crawl'], :body => { :ids => ['example.org', 'somethingelse.com'] }}

client.mget(query)

==>

{......., "fields"=>{"_id"=>"example.org", "last_crawl"=>[1395804365], "inlinks"=>[946074]}}

I am not sure if this is intended behavior.
The documentation for get mentions: "Field values fetched from the document itself are always returned as an array." I'm not sure if that is the behavior I'm seeing; if it is, it's kind of weird :)

JRuby support ?

Hi,

Are you planning to support JRuby in this new library?

Thanks

Library-level exception class(es)

Some way of rescuing any (and only) exceptions that emanate from making a call to #search or #index would be hugely helpful.

Otherwise, I'm left either rescuing Exception (not great), or rescuing whatever exceptions the current implementation of elasticsearch-ruby may happen to generate.
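For what it's worth, the transport layer does define Elasticsearch::Transport::Transport::Error (with Errors::* subclasses per HTTP status) that a rescue can be scoped to. A hedged sketch, where `client` and the wrapper name are assumptions:

```ruby
# Rescue only errors raised by the elasticsearch transport, not unrelated
# exceptions. `client` is assumed to be an Elasticsearch::Client.
def safe_search(client, index, body)
  client.search(index: index, body: body)
rescue Elasticsearch::Transport::Transport::Error => e
  warn "search failed: #{e.message}"
  nil
end
```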

Faraday options

Hi @karmi

I was wondering why passing Faraday options through to Http::Faraday isn't allowed.

The connection is created with just url, &block, where you could do url, options, &block, allowing further and easier customization of the Faraday connections:

:connection => ::Faraday::Connection.new(url, options[:transport_options], &@block )

How to use the exists filter?

This is a question, not an issue.

How can I use the exists filter to find documents where a specific field has a value?

I'm trying to get all documents with a special_id; any suggestions?

client.search(
  :index => "my-index",
  :type  => "my-type",
  :body  => { :filtered => { :filter => { :exists => { :field => "special_id" } } } }
)

These are my existing documents in elasticsearch:

client.search(
  :index => "my-index",
  :type  => "my-type",
  :body  => { :query => { :match_all => {} } },
)

{"took"=>1,
 "timed_out"=>false,
 "_shards"=>{"total"=>5, "successful"=>5, "failed"=>0},
 "hits"=>
  {"total"=>2,
   "max_score"=>1.0,
   "hits"=>
    [{"_index"=>"my-index",
      "_type"=>"my-type",
      "_id"=>"88756c5a61094810b1ab2dcdfdfed475",
      "_score"=>1.0,
      "_source"=>
       {"id"=>"88756c5a61094810b1ab2dcdfdfed475",
        "special_id"=>"272033007",
        "name"=>"name1"}},
     {"_index"=>"my-index",
      "_type"=>"my-type",
      "_id"=>"4bdbef4e63c0d6b91e2a2a1d5536a695",
      "_score"=>1.0,
      "_source"=>
       {"id"=>"4bdbef4e63c0d6b91e2a2a1d5536a695",
        "special_id"=>nil,
        "name"=>"name2"}}]}}

Indexes with / (forward slash %2F) in _type are not usable

Tire used forward slashes for namespaced models, so I tried to do the same when developing my Mongoid integration for this gem, and hit the following problem:

If the slash is not escaped, it gives a 400 error (no handler):

> Elasticsearch::API::Utils.__pathify('test_items', 'test/item')
=> "test_items/test/item" 

If the slash is escaped, it gets escaped a second time:

> Elasticsearch::API::Utils.__pathify('test_items', 'test%2Fitem')
=> "test_items/test%252Fitem" 

At the moment I'm monkey-patching it to fix the double escaping, which is surely not the best option.
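The double escaping comes from percent-encoding an already-encoded string; a quick demonstration with Ruby's own CGI:

```ruby
require 'cgi'

# Escaping once turns '/' into '%2F'; escaping the result again escapes the
# '%' itself, producing '%252F' -- which is what ends up in the path.
once  = CGI.escape('test/item')  # => "test%2Fitem"
twice = CGI.escape(once)         # => "test%252Fitem"
```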

parallel search queries

Support firing multiple search queries at the same time.

If query 1 takes 200ms to complete and query 2 takes 300ms, then searching sequentially takes 500ms total.

If we had a way to leverage an underlying HTTP transport such as Curb, which already supports making parallel HTTP requests, then even in the worst case it should take only 300ms for both queries to come back.
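A hedged server-side alternative (no parallel HTTP transport required): the Multi Search API ships both queries in one request and Elasticsearch executes them concurrently. A sketch, with `client`, the helper name, and the index name as assumptions:

```ruby
# Send two queries in a single round trip via the Multi Search API and
# return the array of per-query responses.
def parallel_search(client, first_query, second_query, index: 'my-index')
  client.msearch(body: [
    { index: index, search: { query: first_query } },
    { index: index, search: { query: second_query } }
  ])['responses']
end

# r1, r2 = parallel_search(client, { match_all: {} }, { term: { status: 1 } })
```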

MultiJson::LoadError after getting '_source' of a nonexistent document

Hi Karmi,
I found something; I don't know if it's a bug or normal behaviour.
When I try to use get_source on a nonexistent document, I get a MultiJson::LoadError because the string read is "".

Your file

  elasticsearch-api/lib/elasticsearch/api/actions/get_source.rb

catches only these exceptions:

  if Array(arguments[:ignore]).include?(404) && e.class.to_s =~ /NotFound/; false

You can test it with curl; it returns an empty body:

$ curl -XGET 'localhost:9200/test/directory_person/User-123456/_source'
$

I'm on elasticsearch-ruby 1.0.0.rc2 and ES 1.0.1.

Are clients thread-safe?

I'm wondering if I can share Elasticsearch::Client instances across multiple threads, or should I instantiate a new one per request/thread?

indices.put_mapping method problem/question.

Hi,
I noticed in the elasticsearch documentation that the '_mapping' directive is at the end of the URL:

http://localhost:9200/twitter/tweet/_mapping

but when I use

client.indices.put_mapping index: 'SomeIndex', type: 'SomeType', body: {example_mapping} 

the URL constructed by

path   = Utils.__pathify Utils.__listify(arguments[:index]), '_mapping', Utils.__escape(arguments[:type])

puts _mapping in the middle of the path and fails loudly with:

InvalidTypeNameException[mapping type name [_mapping] can't start with '_'

Is that correct? Maybe I missed something along the way, or misused put_mapping?

elasticsearch:
Version: 0.90.11, Build: 11da1ba/2014-02-03T15:27:39Z, JVM: 1.7.0_45

Thanks

& escaped to \u0026 instead of %26 in searches.

I see in this issue that it should be using CGI escaping, which would yield %26 instead of the unicode escape sequence. I am escaping Lucene reserved characters. Even so, when I use query_string with an ampersand, the generated curl looks like this:

'http://localhost:9200/test/customer/_search?pretty&from=0&size=5' -d '{"query":{"bool":{"must":[{"query_string":{"query":"H\u0026S*","fields":["name"]}}]}}}'

This is causing me not to find customers with ampersands in their names. Is there a configuration I'm missing, a different way to query this, or an actual issue?
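Note that \u0026 inside a JSON body is ordinary JSON string escaping, not URL encoding; the server decodes it back to '&'. A quick check:

```ruby
require 'json'

# '\u0026' in a JSON document is just an escaped ampersand, so the traced
# body is equivalent to one containing a literal '&'.
parsed = JSON.parse('{"query":"H\u0026S*"}')
parsed['query'] # => "H&S*"
```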

indices_boost is a request body option, not a query param

The docs here seem to indicate that you can pass the indices_boost option as a query param:

However, it needs to be part of the request body in order to work - as shown in the docs.

E.g., this doesn't work:

client.search(
  index: 'articles,comments',
  indices_boost: {'articles' => 5, 'comments' => 1},
  body: {
    query: { query_string: {query: 'foo'} }
  }
)

But this does work:

client.search(
  index: 'articles,comments',
  body: {
    query: { query_string: {query: 'foo'} },
    indices_boost: {'articles' => 5, 'comments' => 1}
  }
)

0.4.9 client creation fails

This fails in 0.4.9:

Elasticsearch::Client.new

The new :transport_options option seems to be required.

stack trace:

NoMethodError: undefined method `[]' for nil:NilClass
    from /Users/michael/.rbenv/versions/1.9.3-p448/lib/ruby/gems/1.9.1/gems/faraday-0.8.9/lib/faraday/connection.rb:39:in `initialize'
    from /Users/michael/.rbenv/versions/1.9.3-p448/lib/ruby/gems/1.9.1/gems/elasticsearch-transport-0.4.9/lib/elasticsearch/transport/transport/http/faraday.rb:42:in `new'
    from /Users/michael/.rbenv/versions/1.9.3-p448/lib/ruby/gems/1.9.1/gems/elasticsearch-transport-0.4.9/lib/elasticsearch/transport/transport/http/faraday.rb:42:in `block in __build_connections'
    from /Users/michael/.rbenv/versions/1.9.3-p448/lib/ruby/gems/1.9.1/gems/elasticsearch-transport-0.4.9/lib/elasticsearch/transport/transport/http/faraday.rb:35:in `map'
    from /Users/michael/.rbenv/versions/1.9.3-p448/lib/ruby/gems/1.9.1/gems/elasticsearch-transport-0.4.9/lib/elasticsearch/transport/transport/http/faraday.rb:35:in `__build_connections'
    from /Users/michael/.rbenv/versions/1.9.3-p448/lib/ruby/gems/1.9.1/gems/elasticsearch-transport-0.4.9/lib/elasticsearch/transport/transport/base.rb:32:in `initialize'
    from /Users/michael/.rbenv/versions/1.9.3-p448/lib/ruby/gems/1.9.1/gems/elasticsearch-transport-0.4.9/lib/elasticsearch/transport/client.rb:85:in `new'
    from /Users/michael/.rbenv/versions/1.9.3-p448/lib/ruby/gems/1.9.1/gems/elasticsearch-transport-0.4.9/lib/elasticsearch/transport/client.rb:85:in `initialize'
    from /Users/michael/.rbenv/versions/1.9.3-p448/lib/ruby/gems/1.9.1/gems/elasticsearch-transport-0.4.9/lib/elasticsearch/transport.rb:25:in `new'
    from /Users/michael/.rbenv/versions/1.9.3-p448/lib/ruby/gems/1.9.1/gems/elasticsearch-transport-0.4.9/lib/elasticsearch/transport.rb:25:in `new'

Redoing search: Stick with (Re)Tire or Start over with elasticsearch-ruby?

A general question as to whether elasticsearch-ruby can replace (Re)Tire as a production ES wrapper at this time.

I currently use Tire in my app, but I'm going to redo my implementation pretty much completely. Would it be worthwhile to start over with elasticsearch-ruby at this time (Oct 2013)? What are some things to look out for?

RoutingMissingException[routing is required ...]

I'm migrating my application from the Tire gem to the Elasticsearch gem. I'm getting a RoutingMissingException when I try to create documents using the Elasticsearch gem (0.4.11) & api/model (0.1.1), but with Tire it works fine.

I can see that there is a difference in the method they are using, Elasticsearch using PUT while Tire is using POST.

Here is a gist of the relevant mappings and errors: https://gist.github.com/AaronRustad/590817b7c85ee7b9bb4d

Please let me know if anyone needs more information.
Thanks!

Timeout ignored on optimize?

When I run an optimize with :master_timeout set to 1800, I get a timeout error after 60 seconds.
Not sure if this is a bug or a setting I somehow set on my ES cluster inadvertently.

Console output

time ./esoptimize.rb
{"count"=>88, "memory_in_bytes"=>46047892}
/usr/local/rvm/gems/ruby-1.9.3-p448@global/gems/elasticsearch-transport-1.0.1/lib/elasticsearch/transport/transport/base.rb:132:in `__raise_transport_error': [504]  (Elasticsearch::Transport::Transport::Errors::GatewayTimeout)
        from /usr/local/rvm/gems/ruby-1.9.3-p448@global/gems/elasticsearch-transport-1.0.1/lib/elasticsearch/transport/transport/base.rb:227:in `perform_request'
        from /usr/local/rvm/gems/ruby-1.9.3-p448@global/gems/elasticsearch-transport-1.0.1/lib/elasticsearch/transport/transport/http/faraday.rb:20:in `perform_request'
        from /usr/local/rvm/gems/ruby-1.9.3-p448@global/gems/elasticsearch-transport-1.0.1/lib/elasticsearch/transport/client.rb:102:in `perform_request'
        from /usr/local/rvm/gems/ruby-1.9.3-p448@global/gems/elasticsearch-api-1.0.1/lib/elasticsearch/api/namespace/common.rb:21:in `perform_request'
        from /usr/local/rvm/gems/ruby-1.9.3-p448@global/gems/elasticsearch-api-1.0.1/lib/elasticsearch/api/actions/indices/optimize.rb:64:in `optimize'
        from ./esbackup.rb:53:in `<main>'

real    1m0.535s
user    0m0.146s
sys     0m0.025s

Code I'm using:

#!/usr/bin/env ruby
require 'rubygems'
require 'pp'

gem 'elasticsearch', '=1.0.1'
require 'elasticsearch'
eshost = 'localhost:9200'

@client = Elasticsearch::Client.new log: false, timeout: 1800, host: eshost

idx = "accesslogging-2014-01-11"
pp @client.indices.stats(index: idx)["indices"][idx]['primaries']['segments']
pp @client.indices.optimize index: idx, max_num_segments: 1, master_timeout: 1800
pp @client.indices.stats(index: idx)["indices"][idx]['primaries']['segments']

integrating in rails

Is it possible to integrate it with Rails, like Tire, using ActiveRecord or MongoDB?

Output to pretty?

I'm extremely green at coding, so I know my question may end up with an RTFM answer; I've been looking through the docs at http://rubydoc.info/gems/elasticsearch-api. I'm trying to append '&pretty' to searches so they display formatted in my browser. I've found that the output from Rails results in 'q=asa%26pretty%3Dtrue' and it's not working (CGI.escape, I guess). I've also tried using pretty: true, like so:

@esearch.search q:(params[:query]), fields:'@message', size:'100', pretty: true do

It does not work, but it does not break anything either. Is there a search argument for pretty? Is this what is meant by 'Using JSON Builders' at http://rubydoc.info/gems/elasticsearch-api? That seems to be more about building the search parameters. Any help is greatly appreciated, and thanks for the great gem!

Serialize Time to supported format

The default format coming out of Time#to_s can't be parsed by elasticsearch ("MapperParsingException[failed to parse date field [2014-03-18 13:36:03 +0000], tried both date format [dateOptionalTime], and timestamp number with locale []]").

Could elasticsearch-ruby serialize the times and dates using .iso8601 to avoid this issue?

It's pretty easy to do yourself, but it's a bit annoying, and it seems like something that shouldn't happen the first time you try indexing anything with a Time object in it.
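Until then, a minimal client-side workaround sketch (the helper name is hypothetical):

```ruby
require 'time' # adds Time#iso8601

# Convert any Time values in a document to ISO 8601 strings, which the
# default dateOptionalTime date format accepts.
def serialize_times(document)
  document.each_with_object({}) do |(key, value), out|
    out[key] = value.is_a?(Time) ? value.utc.iso8601 : value
  end
end

serialize_times(title: 'hi', created_at: Time.utc(2014, 3, 18, 13, 36, 3))
# => {:title=>"hi", :created_at=>"2014-03-18T13:36:03Z"}
```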

travis-ci

Why not add Travis CI and a Travis badge?

More like this query consistency

ruby API

# this doesn't return any product
client.mlt index:'gz-production', type: 'product', id: 647012, mlt_fields: 'name'
# --> GET gz-production/product/647012/_mlt {:mlt_fields=>"name"}
# error 
#"{\"error\":\"MapperParsingException[failed to parse [affiliate_program_id]]; nested: ElasticSearchIllegalStateException[Field should have either a string, numeric or binary value]; \",\"status\":400}"

shell curl

# this does return some products
# if I do not specify the mlt_fields parameter, I get the same error as in the first attempt
curl -XGET http://localhost:9200/gz-production/product/647012/_mlt?mlt_fields=name

I'm a bit confused about why the call from elasticsearch-ruby fails, since from what I can see the requests are the same.

Rails and persistent connections

I am already using the gem in a Rails 3.2.17 app and am now experimenting with persistent connections. Would it make sense if my Rails app didn't have to make a new connection on every request (for instance, in ApplicationController), but instead made one persistent connection at startup and reused it for the lifetime of the app instance?

Connecting with elasticsearch-transport and specifying :net_http_persistent as the adapter does give me a persistent connection, but of course only for the lifetime of the request.

Where would the connection code have to go in order to just make one reusable connection which is accessible from my controllers?

A CHANGELOG would help to keep up

Hi @karmi

It would be great if you could add a CHANGELOG to the project.

Looking at the commit history isn't that great when you just want to keep up with the project and have the big picture.

Thanks a lot for your work on Elasticsearch.

http auth host: option

I just found out that HTTP authentication with username and password is not working with

client  = Elasticsearch::Client.new host: 'localhost', port: '9200', user: 'my_username', password: 'my_password', scheme: 'http'
client.perform_request 'GET', 'foo'
# => Elasticsearch::Transport::Transport::Errors::Unauthorized: [401] Authentication Required

while these work fine:

client2  = Elasticsearch::Client.new url: 'http://my_username:my_password@localhost:9200/'
client2.perform_request 'GET', 'foo'
# => foo stuff
clientWorking  = Elasticsearch::Client.new hosts: [ {host: 'localhost', port: '9200', user: 'my_username', password: 'my_password', scheme: 'http'}]
clientWorking.perform_request 'GET', 'foo'
# => foo stuff

I know only the second way is documented, but to me it is a bit counterintuitive to use the hosts option when I have only one host, or to pass in a URL instead of options.

It would be nice if this behavior were better documented or fixed. Thanks.

Nested type mappings are not parsed to Hashes

When getting a document via get from an index that contains nested mappings, the nested object is returned as a string instead of being parsed into a Hash.

For example, an index with this mapping:

mappings: {
  document_type: {
    _source: {enabled: true},
    properties: {
      id: {type: :integer},
      name: {type: :string},
      list_of_nested_objects: {
        properties: {
          id: {type: :integer},
          name: {type: :string}
        }
      }
    }
  }
}

a get request returns this Hash:

{'id' => 1, 'name' => 'cat', 'list_of_nested_objects' => ["string of json", "string of json"]}

I would have expected the nested field to be deserialized into a Hash as well.

Is this support planned?
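As a hedged client-side workaround in the meantime (the document shape below is illustrative), the string values can be parsed back into Hashes:

```ruby
require 'json'

# Parse the JSON strings in the nested field back into Hashes in place.
doc = { 'id' => 1, 'name' => 'cat',
        'list_of_nested_objects' => ['{"id":2,"name":"dog"}'] }
doc['list_of_nested_objects'].map! { |s| JSON.parse(s) }
# doc['list_of_nested_objects'] # => [{"id"=>2, "name"=>"dog"}]
```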

Filter by associated records attribute value

Hi, this is a question, not an issue.

I have an article model with this mapping:

{
  "article" : {
    "properties" : {
      "status" : {
        "type" : "long"
      },
      "authors" : {
        "properties" : {
          "name" : {
            "type" : "string"
          }
        }
      },
      "body" : {
        "type" : "string"
      },
      ...
      ...
}

An article can have multiple authors, and I want to search all articles for a given author.

In my controller, I can do something like :

@articles = Article.search("authors.name:#{params[:author]}")

And this works fine.

Now I also want to filter the results by status, where status equals 1, so I have to write a DSL query myself. First I tried to write the author-name filter:

search = {
  filter: {
    nested: {
      path: 'authors',
      query: {
        filtered: {
          query:  { match_all: {} },
          filter: {
            term: { authors.name: 'Angelina' } # here is the problem: I can't use authors.name as a symbol. How do I write this correctly?
          }
        }
      }
    }
  }
}

Thanks for your help, this is driving me crazy.
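For reference, one way to write the dotted field: symbols can't contain dots, but hash keys can be strings with the => syntax, and the client serializes both the same way:

```ruby
# The same filter with the dotted field name as a string key.
search = {
  filter: {
    nested: {
      path: 'authors',
      query: {
        filtered: {
          query:  { match_all: {} },
          filter: { term: { 'authors.name' => 'Angelina' } }
        }
      }
    }
  }
}
```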

Nodes info always returns all information

The arguments are seemingly ignored.

To reproduce:

[2] pry(main)> client = Elasticsearch::Client.new
...
[3] pry(main)> client.nodes.info jvm: true
# Everything is returned, not just jvm information

Elasticsearch version:

Version: 1.1.0, Build: 2181e11/2014-03-25T15:59:51Z, JVM: 1.7.0_45

Gem version:

[5] pry(main)> Gem.loaded_specs["elasticsearch"].version
=> Gem::Version.new("1.0.1")
