
elasticsearch-river-mongodb's Introduction

Issue Tracker

This project is maintained in the contributors' spare time, so the time that can be dedicated to it is extremely limited.

Please file issues only for reproducible problems. Given our limited time, we are not able to provide help using the river, and issues that simply say "things aren't working" will be closed. However, if you're able to diagnose an issue yourself, we may be able to help with a fix, and we are happy to review pull requests.

To debug issues, try changing the log level to trace as described in the wiki. You may wish to build a custom version of the river with extra logging.
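For example, on ElasticSearch 1.x the river's log level can usually be raised at runtime through the cluster settings API. A minimal sketch; the exact logger name is an assumption here (river.mongodb matches the log excerpts quoted later on this page):

  curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
    "transient": { "logger.river.mongodb": "TRACE" }
  }'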

MongoDB River Plugin for ElasticSearch

This plugin uses MongoDB, or the TokuMX fork of MongoDB, as a datasource to store data in ElasticSearch. Filtering and transformation are also possible. See the wiki for more details.

In order to install the plugin, simply run:

  bin/plugin --install com.github.richardwilly98.elasticsearch/elasticsearch-river-mongodb/2.0.9

Note that you must be using MongoDB replica sets since this river tails the oplog.
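If you are running a standalone mongod for development, a minimal single-member replica set is enough to produce an oplog. A sketch (rs0 is an arbitrary replica set name):

  mongod --replSet rs0 --dbpath /data/db
  # then, from a second shell, initialize the replica set once:
  mongo --eval 'rs.initiate()'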

MongoDB River Plugin | ElasticSearch  | MongoDB        | TokuMX
-------------------- | -------------- | -------------- | ------
master               | 1.7.3          | 3.0.0          | 1.5.1
2.0.9                | 1.4.2          | 3.0.0          | 1.5.1
2.0.5                | 1.4.2          | 2.6.6          | 1.5.1
2.0.2                | 1.3.5          | 2.6.5          | 1.5.1
2.0.1                | 1.2.2          | 2.4.9 -> 2.6.3 | 1.5.0
2.0.0                | 1.0.0 -> 1.1.1 | 2.4.9          |
1.7.4                | 0.90.10        | 2.4.8          |
1.7.3                | 0.90.7         | 2.4.8          |
1.7.2                | 0.90.5         | 2.4.8          |
1.7.1                | 0.90.5         | 2.4.6          |
1.7.0                | 0.90.3         | 2.4.5          |
1.6.11               | 0.90.2         | 2.4.5          |
1.6.9                | 0.90.1         | 2.4.4          |
1.6.8                | 0.90.0         | 2.4.3          |
1.6.7                | 0.90.0         | 2.4.3          |
1.6.6                | 0.90.0         | 2.4.3          |


Initial implementation by aparo.

Modified to match the structure of the other ElasticSearch rivers (such as CouchDB).

The latest version monitors the oplog capped collection and supports attachments (GridFS).

Configure the river using the definition described in the wiki:

  curl -XPUT 'http://localhost:9200/_river/mongodb/_meta' -d '{
    "type": "mongodb", 
    "mongodb": { 
      "db": "DATABASE_NAME", 
      "collection": "COLLECTION", 
      "gridfs": true
    }, 
    "index": { 
      "name": "ES_INDEX_NAME", 
      "type": "ES_TYPE_NAME" 
    }
  }'

Example:

  curl -XPUT 'http://localhost:9200/_river/mongodb/_meta' -d '{ 
    "type": "mongodb", 
    "mongodb": { 
      "db": "testmongo", 
      "collection": "person"
    }, 
    "index": {
      "name": "mongoindex", 
      "type": "person" 
    }
  }'

Import data from mongo console:

  use testmongo
  var p = {firstName: "John", lastName: "Doe"}
  db.person.save(p)

Query index:

  curl -XGET 'http://localhost:9200/mongoindex/_search?q=firstName:John'

To index GridFS files, create a river on the GridFS collection with gridfs enabled:

  curl -XPUT 'http://localhost:9200/_river/mongodb/_meta' -d '{ 
    "type": "mongodb", 
    "mongodb": { 
      "db": "testmongo", 
      "collection": "fs", 
      "gridfs": true 
    }, 
    "index": {
      "name": "mongoindex", 
      "type": "files" 
    }
  }'

Import binary content in mongo:

  %MONGO_HOME%\bin>mongofiles.exe --host localhost:27017 --db testmongo --collection fs put test-document-2.pdf
  connected to: localhost:27017
  added file: { _id: ObjectId('4f230588a7da6e94984d88a1'), filename: "test-document-2.pdf", chunkSize: 262144, uploadDate: new Date(1327695240206), md5: "c2f251205576566826f86cd969158f24", length: 173293 }
  done!

Query index:

  curl -XGET 'http://localhost:9200/mongoindex/files/4f230588a7da6e94984d88a1?pretty=true'

Admin URL: http://localhost:9200/_plugin/river-mongodb/
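The river's current state can also be queried directly (the same _status endpoint that appears in the issue reports below):

  curl -XGET 'http://localhost:9200/_river/mongodb/_status'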

For more details, check the wiki.

License

This software is licensed under the Apache 2 license, quoted below.

Copyright 2009-2012 Shay Banon and ElasticSearch <http://www.elasticsearch.org>

Licensed under the Apache License, Version 2.0 (the "License"); you may not
use this file except in compliance with the License. You may obtain a copy of
the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations under
the License.

Changelog

2.0.9

  • Update versions ES 1.4.2, MongoDB 3.0.0, MongoDB driver 2.13.0

2.0.5

  • Update versions ES 1.4.0
  • Bug fix for initial import of sharded collections
  • Bug fix for competing with other rivers for resources
  • Bug fix and test for skipping sync of elements when restarting river

2.0.2

  • Update versions ES 1.3.5, MongoDB 2.6.5, MongoDB driver 2.12.4, Java 7
  • Improved scaling for running multiple rivers and numerous bug fixes by @ankon of Collaborne B.V.
  • Improved TokuMX support by @kdkeck of Connectifier, Inc.
  • Support for collection names with period by @qraynaud

2.0.1

  • Update versions ES 1.2.2, MongoDB 2.6.3, MongoDB driver 2.12.3
  • Support for TokuMX by @kdkeck of Connectifier, Inc.

2.0.0

  • Update versions ES 1.0.0, MongoDB 2.4.9, MongoDB driver 2.11.4
  • Detection of stale river

1.7.4

  • Make include_fields work for nested fields
  • Fix ensuring river status indicates failures

1.7.3

  • Update versions ES 0.90.7
  • Optimization of oplog.rs query. The current query was too complex and not efficient in MongoDB. oplog.rs query now uses only $ts filter.
  • New options/import_all_collections parameter can be used to import all collection of a database (see issue #177)
  • Version, commit data and commit id are displayed in the log when the river starts.
  • New options/store_statistics parameter can be used to store statistic each time bulk processor is flushed. Data are store in _river/{river.name}
  • Default value for index/bulk/concurrent_bulk_requests has been changed to the number of cores available.
  • Capture failures from bulk processor and set status to IMPORT_FAILED when found.
  • Fix issues in document indexed counter in administration.
  • Refactoring to use 1 bulk processor per index/type.

1.7.2

  • Update versions MongoDB 2.4.8
  • Optimization of oplog.rs filter (see issue #123)
  • options/drop_collection will also track dropDatabase (see issue #133)
  • Add filter in the initial import by @bernd (see issue #157)
  • MongoDB default port is 27017 (if not specific in configuration) (see issue #159)
  • Allow alias in place of index (see issue #163)
  • Initial import supports existing index (see issue #167)
  • New parameter options/skip_initial_import to skip initial import using collection data. Default value is false.
  • Stop properly the river when deleted (see issue #169)
  • Administration updates: stopped river can be deleted from the administration, auto-refresh.
  • Switch to BulkProcessor API to provide more flexible configuration during bulk indexing. (see commit)

1.7.1

  • Update versions ES 0.90.5, MongoDB driver 2.11.3
  • Initial import using the collection by @benmccann. (see issue #47)
  • Add unit test to validate Chinese support (see issue #95)
  • Ensure MongoDB cursor is closed (see issue #comment-24427369)
  • River administration has been improved. (see issue #109)
  • Allow fields ts or op to be used in user collection. (see pr #136)
  • Use of OPLOG_REPLAY to query oplog.rs

1.7.0

  • Update versions ES 0.90.3, MongoDB 2.4.6
  • Ability to index documents from a given datetime (see issue #102)
  • Fix for options/exclude_fields by @ozanozen (see issue #103)
  • Fix for options/drop_collection (see issue #105)
  • New advanced transformation feature. (see issue #106)
  • Add site to the river. Initial implementation (only start / stop river or display river settings). (see issue #109)
  • Implement include fields (see issue #119)
  • Refactoring of the river definition (new class MongoDBRiverDefinition).

1.6.11

  • Add SSL support by @alistair (see #94)
  • Add support for $set operation (see issue #91)
  • Add Groovy unit test (for feature #87)
  • Update versions ES 0.90.2, MongoDB 2.4.5 and MongoDB driver 2.11.2
  • Fix for options/drop_collection option (issue #79)
  • New options/include_collection parameter to include the collection name in the document indexed. (see #101)

1.6.9

  • Allow the script filters to modify the document id (see #83)
  • Support for Elasticsearch 0.90.1 and MongoDB 2.4.4
  • Improve exclude fields (support multi-level - see #76)
  • Fix to support ObjectId (see issue #85)
  • Add logger object to script filters
  • Provide example for Groovy script (see issue#87)

1.6.8

  • Implement exclude fields (see issue #76)
  • Improve reconnection to MongoDB when connection is lost (see issue #77)
  • Implement drop collection feature (see issue #79). The river will drop all documents from the index type.

1.6.7

  • Issue with sharded collection (see issue #46)

1.6.6

  • Support for Elasticsearch 0.90.0 and MongoDB 2.4.3
  • MongoDB driver 2.11.1 (use of MongoClient)

Building from master

Install Maven (e.g. brew install maven). Be sure the mvn command is available.

Clone the repository:

git clone https://github.com/richardwilly98/elasticsearch-river-mongodb.git

Execute the installation script (from the directory where you cloned the project):

./elasticsearch-river-mongodb/install-local.sh

If your ElasticSearch is not installed in /usr/share/elasticsearch, you can set ES_HOME, e.g.:

ES_HOME=/usr/local/Cellar/elasticsearch/1.1.1/ ./elasticsearch-river-mongodb/install-local.sh

Developing in Eclipse

Install the m2eclipse plugin:

  • Work with: --All Available Sites--
  • Under Collaboration choose the m2e plugins

Install the TestNG Eclipse plugin

Run mvn eclipse:eclipse -DdownloadSources=true

Import the project with File > Import > Maven > Existing Maven Projects


elasticsearch-river-mongodb's People

Contributors

alistair, ankon, benmccann, bitdeli-chef, clkao, eliotstocker, ewgra, gusnips, ianjw11, kdkeck, kryptt, ozanozen, qraynaud, rafael-munoz, richardwilly98, smecsia, xma


elasticsearch-river-mongodb's Issues

failed bulk item index

I am failing on the indexing of the following object. I suspect, however, that this may be an issue with the JSON schema: I have sub-objects that are arrays where multiple exist (such as location), but plain objects where there is only one.

[Masque] [clinicaltrialindex][4] failed to execute bulk item (index) index {[clinicaltrialindex][clinicaltrial][4ffdebc4bc313a65577ec5bf], source[{"_id":"4ffdebc4bc313a65577ec5bf","brief_summary":{"textblock":"The purpose of this study is to see if it is safe and effective to give an experimental anti-HIV drug, adefovir dipivoxil (ADV), in combination with other anti-HIV drugs (HAART) to patients who have a viral load (level of HIV in the blood) between 50 and 400 copies/ml."},"brief_title":"A Study on the Safety and Effectiveness of Adefovir Dipivoxil in Combination With Anti-HIV Therapy (HAART) in HIV-Positive Patients","condition":"HIV Infections","condition_browse":{"mesh_term":["HIV Infections","Acquired Immunodeficiency Syndrome"]},"detailed_description":{"textblock":"Patients are randomized to 1 of 2 arms in a 2:1 ratio. Approximately 260 patients receive ADV and approximately 130 patients receive placebo. Patients receive ADV or placebo in addition to L-carnitine and their current stable HAART regimen. Each patient receives blinded study medication for 48 weeks and is evaluated at Weeks 16, 24, and 48. Patients who reach the primary endpoint of virologic failure prior to Week 48 may continue blinded study medication or receive open-label ADV at the investigator's discretion. In both cases, patients continue their study visits as per the original visit schedule. Virologic failure is defined as 2 consecutive HIV-1 RNA measurements, after baseline, above 400 copies/ml (measured by the Roche Amplicor HIV-1 Monitor UltraSensitive assay) drawn at least 14 days apart. All patients who complete study visits without treatment-limiting ADV toxicity may continue open-label ADV in the Maintenance Phase at the discretion of the principal investigator."},"eligibility":{"criteria":{"textblock":"Inclusion Criteria You may be eligible for this study if you: - Are HIV-positive. - Have been on a stable HAART regimen consisting of at least 3 antiretroviral drugs for at least 16 weeks prior to study entry. - Have a CD4 count of 50 cells/mm3 or more. - Have a viral load greater than 50 and less than or equal to 400 copies/ml within 14 days prior to study entry. 
- Have had at least 1 additional viral load in the past that was less than or equal to 400 copies/ml while on your current stable HAART regimen."},"gender":"Both","minimum_age":"N/A","maximum_age":"N/A","healthy_volunteers":"No"},"enrollment":"390","firstreceived_date":"November 2, 1999","has_expanded_access":"No","id":"NCT00002426","id_info":{"org_study_id":"232K","secondary_id":"GS-97-415","nct_id":"NCT00002426"},"intervention":{"intervention_type":"Drug","intervention_name":"Adefovir dipivoxil"},"intervention_browse":{"mesh_term":["Adefovir","Adefovir dipivoxil","Reverse Transcriptase Inhibitors"]},"keyword":["HIV-1","RNA, Viral","VX 478","Reverse Transcriptase Inhibitors","Anti-HIV Agents","Viral Load"],"lastchanged_date":"June 23, 2005","location":[{"facility":{"name":"Pacific Oaks Research","address":{"city":"Beverly Hills","state":"California","zip":"90211","country":"United States"}}},{"facility":{"name":"ViRx Inc","address":{"city":"Palm Springs","state":"California","zip":"92262","country":"United States"}}},{"facility":{"name":"Ctr for AIDS Research / Education and Service (CARES)","address":{"city":"Sacramento","state":"California","zip":"95814","country":"United States"}}},{"facility":{"name":"San Francisco VA Med Ctr","address":{"city":"San Francisco","state":"California","zip":"94121","country":"United States"}}},{"facility":{"name":"Kaiser Foundation Hospital","address":{"city":"San Francisco","state":"California","zip":"94118","country":"United States"}}},{"facility":{"name":"San Francisco Gen Hosp / UCSF AIDS Program","address":{"city":"San Francisco","state":"California","zip":"94110","country":"United States"}}},{"facility":{"name":"Blick Med Associates","address":{"city":"Stamford","state":"Connecticut","zip":"06901","country":"United States"}}},{"facility":{"name":"George Washington Univ Med Ctr","address":{"city":"Washington","state":"District of Columbia","zip":"20037","country":"United States"}}},{"facility":{"name":"Georgetown Univ Med Ctr","address":{"city":"Washington","state":"District of Columbia","zip":"20007","country":"United States"}}},{"facility":{"name":"Dupont Circle Physicians Group","address":{"city":"Washington","state":"District of Columbia","zip":"200091104","country":"United States"}}},{"facility":{"name":"IDC Research Initiative","address":{"city":"Altamonte Springs","state":"Florida","zip":"32701","country":"United States"}}},{"facility":{"name":"Community AIDS Resource Inc","address":{"city":"Coral Gables","state":"Florida","zip":"33146","country":"United States"}}},{"facility":{"name":"TheraFirst Med Ctrs Inc","address":{"city":"Fort Lauderdale","state":"Florida","zip":"33308","country":"United States"}}},{"facility":{"name":"Duval County Health Department","address":{"city":"Jacksonville","state":"Florida","zip":"32206","country":"United States"}}},{"facility":{"name":"Health Positive","address":{"city":"Safety Harbor","state":"Florida","zip":"34695","country":"United States"}}},{"facility":{"name":"Center for Quality Care","address":{"city":"Tampa","state":"Florida","zip":"33609","country":"United States"}}},{"facility":{"name":"Georgia Research Associates","address":{"city":"Atlanta","state":"Georgia","zip":"30342","country":"United States"}}},{"facility":{"name":"Rush Presbyterian - Saint Luke's Med Ctr","address":{"city":"Chicago","state":"Illinois","zip":"60612","country":"United States"}}},{"facility":{"name":"Indiana Univ Infectious Disease Research 
Clinic","address":{"city":"Indianapolis","state":"Indiana","zip":"46202","country":"United States"}}},{"facility":{"name":"Johns Hopkins Univ School of Medicine","address":{"city":"Baltimore","state":"Maryland","zip":"21205","country":"United States"}}},{"facility":{"name":"Albany Med College","address":{"city":"Albany","state":"New York","zip":"12208","country":"United States"}}},{"facility":{"name":"Mount Sinai Med Ctr","address":{"city":"New York","state":"New York","zip":"10029","country":"United States"}}},{"facility":{"name":"St Luke Roosevelt Hosp","address":{"city":"New York","state":"New York","zip":"10011","country":"United States"}}},{"facility":{"name":"Bentley-Salick Med Practice","address":{"city":"New York","state":"New York","zip":"10011","country":"United States"}}},{"facility":{"name":"James Jones MD","address":{"city":"New York","state":"New York","zip":"10019","country":"United States"}}},{"facility":{"name":"Wake Forest Univ School of Medicine","address":{"city":"Winston Salem","state":"North Carolina","zip":"27157","country":"United States"}}},{"facility":{"name":"Associates of Med and Mental Health","address":{"city":"Tulsa","state":"Oklahoma","zip":"74114","country":"United States"}}},{"facility":{"name":"The Research and Education Group","address":{"city":"Portland","state":"Oregon","zip":"97210","country":"United States"}}},{"facility":{"name":"Roger Williams Med Ctr","address":{"city":"Providence","state":"Rhode Island","zip":"02908","country":"United States"}}},{"facility":{"name":"Miriam Hosp","address":{"city":"Providence","state":"Rhode Island","zip":"02906","country":"United States"}}},{"facility":{"name":"Vanderbilt Univ School of Medicine","address":{"city":"Nashville","state":"Tennessee","zip":"37212","country":"United States"}}},{"facility":{"name":"Univ of Texas Southwestern Med Ctr of Dallas","address":{"city":"Dallas","state":"Texas","zip":"75235","country":"United States"}}},{"facility":{"name":"Univ of Texas Med Branch","address":{"city":"Galveston","state":"Texas","zip":"77555","country":"United States"}}},{"facility":{"name":"Thomas Street Clinic","address":{"city":"Houston","state":"Texas","zip":"77009","country":"United States"}}},{"facility":{"name":"Univ of Utah Med School / Clinical Trials Ctr","address":{"city":"Salt Lake City","state":"Utah","zip":"84108","country":"United States"}}},{"facility":{"name":"Infectious Disease Physicians Inc","address":{"city":"Annandale","state":"Virginia","zip":"22203","country":"United States"}}},{"facility":{"name":"N Touch Research Corp","address":{"city":"Seattle","state":"Washington","zip":"98122","country":"United States"}}},{"facility":{"name":"St Paul's Hosp","address":{"city":"Vancouver","state":"British Columbia","country":"Canada"}}},{"facility":{"name":"Sunnybrook Health Science Centre","address":{"city":"Toronto","state":"Ontario","country":"Canada"}}},{"facility":{"name":"Centre hospitalier de l'Universite de Montreal (CHUM)","address":{"city":"Montreal","state":"Quebec","country":"Canada"}}},{"facility":{"name":"Hopital Edouard Herriot","address":{"city":"Lyon Cedex 03","country":"France"}}},{"facility":{"name":"Hopital Sainte-Marguerite","address":{"city":"Marseille","country":"France"}}},{"facility":{"name":"Klinikum Der Johann Wolfgang Goethe Universitat","address":{"city":"Frankfurt","country":"Germany"}}},{"facility":{"name":"Universitatskrankenhaus Eppendorf","address":{"city":"Hamburg","country":"Germany"}}},{"facility":{"name":"Klinikum der 
Ludwig-Maximilians-Universitaet","address":{"city":"Muenchen","country":"Germany"}}},{"facility":{"name":"Royal Free Hosp","address":{"city":"London","country":"United Kingdom"}}},{"facility":{"name":"King's College Hospital","address":{"city":"London","country":"United Kingdom"}}},{"facility":{"name":"Chelsea and Westminster Hosp","address":{"city":"London","country":"United Kingdom"}}},{"facility":{"name":"Senior Lecturer in GU Medicine","address":{"city":"London","country":"United Kingdom"}}}],"location_countries":{"country":["United States","Canada","France","Germany","United Kingdom"]},"official_title":"A Randomized, Double-Blind, Placebo-Controlled, Multicenter Study of the Safety and Efficacy of Adefovir Dipivoxil as Intensification Therapy in Combination With Highly Active Antiretroviral Therapy (HAART) in HIV Infected Patients With HIV-1 RNA > 50 and <= 400 Copies/Ml","overall_status":"Completed","oversight_info":{"authority":"United States: Food and Drug Administration"},"phase":"N/A","required_header":{"download_date":"Information obtained from ClinicalTrials.gov on July 10, 2012","link_text":"Link to the current ClinicalTrials.gov record.","url":"http://clinicaltrials.gov/show/NCT00002426"},"source":"NIH AIDS Clinical Trials Information Service","sponsors":{"lead_sponsor":{"agency":"Gilead Sciences","agency_class":"Industry"}},"study_design":"Endpoint Classification: Safety Study, Masking: Double-Blind, Primary Purpose: Treatment","study_type":"Interventional","verification_date":"December 1999"}]}
org.elasticsearch.index.mapper.MapperParsingException: object mapping for [clinicaltrial] tried to parse as object, but got EOF, has a concrete value been provided to it?
at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:447)
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:493)
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:437)
at org.elasticsearch.index.shard.service.InternalIndexShard.prepareIndex(InternalIndexShard.java:311)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:157)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:532)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:430)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:680)

Unable to create indexes NoShardAvailableActionException

Hi,

While mongo and elasticsearch seem to be running fine, I have trouble creating new indexes.
After some trial and error I noted that the river's MongoDB status seems to be erroneous...
The URL: http://localhost:9200/_river/mongodb/_status

Returns:
{"error":"NoShardAvailableActionException[[_river][0] No shard available for [[_river][mongodb][_status]: routing [null]]]","status":500}

Does anyone have a clue what could be causing this?

ElasticSearch 0.20.2
River plugin 1.61
Mongodb 2.2.2

org.elasticsearch.common.collect.computationexception

Hey guys,

I am trying to develop a native ElasticSearch service for my project. I am using the Java API and followed all the guides, but I got the exception below:

org.elasticsearch.common.collect.ComputationException: java.lang.NoClassDefFoundError: org/apache/lucene/analysis/ga/IrishAnalyzer

I searched through all the Google results I could find, and many of them say this issue is related to a Maven dependency, but that is not meaningful to me. Here is my usage:

Node node = NodeBuilder.nodeBuilder().node();
Client _nodeClient = node.client();
SearchResponse resSearch = null;
try {
    resSearch = _nodeClient.prepareSearch("videos")
            .setSearchType(SearchType.DEFAULT)
            .setQuery(QueryBuilders.queryString("q:anytext"))
            .setFrom(0).setSize(60).setExplain(true)
            .execute()
            .actionGet();
} catch (Exception e) {
    e.printStackTrace();
}

Could you give me a hand please? I cannot keep moving on :(

Pattern matching in a collection name

I've found an interesting line in MongoDBRiver.getIndexFilter() with the following code:

            filter.put(OPLOG_NAMESPACE, Pattern.compile(mongoOplogNamespace));

It seems like a bug or a half-feature... :)

mongoOplogNamespace is the concatenation of a DB name and a collection name via DOT!

    mongoOplogNamespace = mongoDb + "." + mongoCollection;

But in patterns, a dot matches any character.

So potentially, it is possible to get all data from the repository DB using a configuration like:

...
"mongodb": { 
    "db": "repo", 
    "collection": "i"
}
...

because the resulting repo.i regular expression matches the string repository. It also matches any collection that starts with 'i' in the repo DB, or a repository collection in any DB.

I ran into this problem when I duplicated collections in MongoDB and the river caught those collections.

Is river 1.5.0 working with mongodb 2.2.2/es 0.19.12?

I'm having the same problem as #37, except I'm running mongo 2.2.2/es 0.19.12. Everything is set up, replica set and all. No documents seem to make it down the river from mongo to es.

Could you please confirm 1.5.0 should be working with 2.2.2/0.19.12?

...
[2012-12-07 22:36:27,973][INFO ][river.mongodb            ] [Astronomer] [mongodb][mongogridfs] Using mongodb server(s): host [localhost], port [27017]
[2012-12-07 22:36:27,973][INFO ][river.mongodb            ] [Astronomer] [mongodb][mongogridfs] starting mongodb stream. options: secondaryreadpreference [false], throttlesize [500], gridfs [true], filter [], db [testmongo], indexing to [testmongo]/[files]
[2012-12-07 22:36:27,974][INFO ][river.mongodb            ] [Astronomer] [mongodb][mongogridfs] Mapping: {"files":{"properties":{"content":{"type":"attachment"},"filename":{"type":"string"},"contentType":{"type":"string"},"md5":{"type":"string"},"length":{"type":"long"},"chunkSize":{"type":"long"}}}}
[2012-12-07 22:36:28,142][INFO ][river.mongodb            ] [Astronomer] [mongodb][mongogridfs] No known previous slurping time for this collection

That's the last log entry. The files collection is being filled with PDFs using mongofiles, but sadly no indexing takes place. I've been banging my head for the last couple of hours or so. Please shed some light on this. Thanks a bunch.

Regards,
Roland.

request body does not work?

Hi,
I'm trying your MongoDB river and I find that it works with a query string, but not with a request body. Any clue?

  1. with query string

curl -XGET http://localhost:9200/mongoidx/jobs/_search?pretty=true&q=title:SA

It'll return search results as expected.

  2. with request body

    curl -XPOST http://localhost:9200/mongoidx/jobs/_search?pretty=true -d '{
      "query" : { "term" : { "title" : "SA" }}
    }'
    {
      "took" : 1,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
      },
      "hits" : {
        "total" : 0,
        "max_score" : null,
        "hits" : [ ]
      }
    }

    Thanks

Suggestion: Initial sync

Hey Richard,
I would like to suggest some sort of initial sync functionality (optional).

Something like when you create the river via the PUT api, some additional options regarding on how the user would like to perform the initial sync.

This would be a "one time" operation. I don't even know if it is possible...

The main issue is that not everything is on the oplog, especially for really large and stale collections...
So it would be nice to implement a set of options that would allow the user to tell the river to pull all data from mongo (much like a GetAll operation).

Of course we could discuss different strategies for pulling the data, such as:

  1. GetAll (easy, but cumbersome for large collections)
  2. via MongoDump, MongoExport or BsonDump
  3. Others..?

It would be nice to support different import strategies, much like as plugins for this river.

Keep up the good work :)

what does throttlesize do?

The readme would benefit from a brief description of what the throttlesize param does.

Obviously it throttles something, but what? In what situations would you need to change it? Is giving the Java vm more memory an alternative? Are there any consequences of raising/lowering it? Does it mean that there's a delay in indexing under certain circumstances?

I've read #30 and #23 but still don't have a great understanding of it.
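For reference, a hedged sketch of where the parameter would sit in a river definition; the nesting under mongodb/options is an assumption based on the options/... notation used in the changelog above, and the default of 500 matches the throttlesize [500] startup logs quoted elsewhere on this page:

  curl -XPUT 'http://localhost:9200/_river/mongodb/_meta' -d '{
    "type": "mongodb",
    "mongodb": {
      "db": "testmongo",
      "collection": "person",
      "options": { "throttle_size": 2000 }
    },
    "index": { "name": "mongoindex", "type": "person" }
  }'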

Converting a Standalone to a Replica Set

Hi Richard,
The article "Convert a Standalone to a Replica Set" describes how to set up mongodb for a replica set using the console.
1) How do I set it up for the Windows service? Right now my service is started by the Windows service manager like this:
"C:\mongodb\bin\mongod.exe" --config "c:\mongodb\mongod.cfg" --service

  1. Should I change it to:
    "C:\mongodb\bin\mongod.exe" --port 27017 --replSet rs0 --config "c:\mongodb\mongod.cfg" --service
    ... or is this a one-time initialization?

  2. Do I run:
    rs.initiate()
    ... just once, or every time I import any data into mongodb?

Is this the right forum to ask questions like this, or is there another one?
Regards,
Janusz

Indexing of document with property "attachment" fails

I have a collection of activities. The activity document contains a property "attachment" which at the moment just holds a string. The river doesn't seem to like the property name "attachment". I had to change the name to make it work.
Apart from that weird behavior the river works fine for all my collections.

NoSuchElementException and no search result.

Hi, did I forget something or do something wrong? There is no search result.
I googled "elasticsearch IndexMissingException" but could not solve it.
These (1-5) are the steps I took:
1. bin/mongod --directoryperdb --dbpath=/var/data/db --logpath=/var/data/log/mongodb.log --fork
2. bin/elasticsearch
3. curl -XPUT 'http://localhost:9200/_river/mongodb/_meta' -d '{
     "type": "mongodb",
     "mongodb": {
       "db": "testmongo",
       "collection": "person"
     },
     "index": {
       "name": "mongoindex",
       "type": "person"
     }
   }'
4. bin/mongo
   use testmongo
   db.person.save({firstName: "John", lastName: "Doe"})
5. curl -XGET "localhost:9200/testmongo/person/_search?q=firstName:John&pretty=true"

And I got:
{
"error" : "IndexMissingException[[testmongo] missing]",
"status" : 404
}
When I tail the elasticsearch.log, there are some exceptions:
java.util.NoSuchElementException
at java.util.LinkedList$ListItr.next(LinkedList.java:715)
at com.mongodb.DBCursor._next(DBCursor.java:453)
at com.mongodb.DBCursor.next(DBCursor.java:533)
at org.elasticsearch.river.mongodb.MongoDBRiver$Slurper.processFullCollection(MongoDBRiver.java:378)
at org.elasticsearch.river.mongodb.MongoDBRiver$Slurper.run(MongoDBRiver.java:353)
at java.lang.Thread.run(Thread.java:636)
[2012-04-06 18:19:30,636][INFO ][river.mongodb ] [Rama-Tut] [mongodb][mongodb] No known previous slurping time for this collection

My environment:
debian 6 64bit
openjdk 6 64bit
mongodb 2.0.4 Linux 64-bit
elasticsearch 0.19.2
and
plugin -install elasticsearch/elasticsearch-mapper-attachments/1.2.0
plugin -install richardwilly98/elasticsearch-river-mongodb/1.1.0

So, what did I do wrong, please?

How to properly use this plugin?

I've tried the method from the docs:

curl -X PUT "localhost:9200/_river/mongodb/_meta" -d '{
  "type": "mongodb",
  "mongodb": {
    "db": "dbtest",
    "collection": "users",
    "index": {
      "name": "mongoindex",
      "type": "users"
    }
}}'

after which I get a normal result:
{"ok":true,"_index":"_river","_type":"mongodb","_id":"_meta","_version":1}

I can access the result at the following URL:
http://localhost:9200/dbtest/_search
but as far as I understand it should be something like:
http://localhost:9200/dbtest/users/_search

Can someone explain to me how to properly configure this to work with several collections?

And is it possible to index not every field from a mongo document, but only a few?

Can river mongo select specific attributes for ElasticSearch to index

Hi, we are looking for the functionality to select which attributes ElasticSearch indexes.
For example, I may have millions of records like this in my mongo collection:

{ "_id" : ObjectId("509e5cb863cade071b013552"),

"id" : "235601010750659014_6335261",

"tags" : [ "beach", "losangeles", "california" ],

"user" : { "username" : "xxxx", "website" : "", "bio" : "xxxxx", "profile_picture" : "http://some_website.com/xxx.jpg", "full_name" : "XXXXX", "id" : "1234" },

"comments" : { "count" : 12 },

"images" : { "low_resolution" : { "url" : "http://some_website.com/xxx.jpg", "width" : 306, "height" : 306 }, "thumbnail" : { "url" : "http://some_website.com/xxx.jpg", "width" : 150, "height" : 150 }, "standard_resolution" : { "url" : "http://some_website.com/xxx.jpg", "width" : 612, "height" : 612 } },
}

And suppose I just want to index document.tags, document.user.username, document.user.full_name and document.user.bio.
Can we use the river to tell ElasticSearch to index just those attributes (even attributes within attributes)?
So this is different from filtering records by attribute using the new "script feature".
Thank you very much.
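For what it's worth, later river versions list an include_fields option (see the 1.7.x entries in the changelog above, including nested-field support in 1.7.4). A hedged sketch of how it might apply to this case; the nesting under mongodb/options and the db/collection names are assumptions:

  curl -XPUT 'http://localhost:9200/_river/mongodb/_meta' -d '{
    "type": "mongodb",
    "mongodb": {
      "db": "testdb",
      "collection": "media",
      "options": {
        "include_fields": ["tags", "user.username", "user.full_name", "user.bio"]
      }
    },
    "index": { "name": "mongoindex", "type": "media" }
  }'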

Exception: java.lang.NoSuchMethodError: com.mongodb.Mongo.fsyncAndLock()

Following the wiki example, I get this exception:

[2012-03-13 12:38:10,971][INFO ][cluster.metadata ] [Metalhead] [mongoindex] creating index, cause [api], shards [5]/[1], mappings [] [2012-03-13 12:38:11,596][INFO ][river.mongodb ] [Metalhead] [mongodb][mongodb] No known previous slurping time for this collection Exception in thread "elasticsearch[Metalhead]mongodb_river_slurper-pool-26-thread-1" java.lang.NoSuchMethodError: com.mongodb.Mongo.fsyncAndLock()Lcom/mongodb/CommandResult; at org.elasticsearch.river.mongodb.MongoDBRiver$Slurper.processFullCollection(MongoDBRiver.java:375) at org.elasticsearch.river.mongodb.MongoDBRiver$Slurper.run(MongoDBRiver.java:353) at java.lang.Thread.run(Thread.java:636) [2012-03-13 12:38:11,845][INFO ][cluster.metadata ] [Metalhead] [_river] update_mapping [mongodb] (dynamic)

issues with mongo 2.2.1

Just wondering if this is known to work at all with the latest version of mongo, 2.2.1? I've got a replica set and elasticsearch seems to start up fine, but when I add data nothing happens. It also doesn't seem to do any initial import of data. Here's my ES log:

[2012-10-30 23:32:14,131][INFO ][discovery ] [Umar] elasticsearch/435P26SvQLGKaxfCf1G_kg
[2012-10-30 23:32:14,589][INFO ][http ] [Umar] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/192.168.0.19:9200]}
[2012-10-30 23:32:14,659][INFO ][node ] [Umar] {0.19.11}[2009]: started
[2012-10-30 23:32:33,094][INFO ][gateway ] [Umar] recovered [3] indices into cluster_state
[2012-10-30 23:32:45,167][INFO ][river.mongodb ] [Umar] [mongodb][mongodb] Using mongodb server(s): host [localhost], port [27017]
[2012-10-30 23:32:45,235][INFO ][river.mongodb ] [Umar] [mongodb][mongodb] starting mongodb stream: options: secondaryreadpreference [false], gridfs [false], filter [testmongo], db [mongoindex], indexing to [person]/[{}]
[2012-10-30 23:32:46,104][INFO ][river.mongodb ] [Umar] [mongodb][mongogridfs] Using mongodb server(s): host [localhost], port [27017]
[2012-10-30 23:32:46,132][INFO ][river.mongodb ] [Umar] [mongodb][mongogridfs] starting mongodb stream: options: secondaryreadpreference [false], gridfs [true], filter [testmongo], db [testmongo], indexing to [files]/[{}]
[2012-10-30 23:32:46,171][INFO ][river.mongodb ] [Umar] [mongodb][mongogridfs] Mapping: {"files":{"properties":{"content":{"type":"attachment"},"filename":{"type":"string"},"contentType":{"type":"string"},"md5":{"type":"string"},"length":{"type":"long"},"chunkSize":{"type":"long"}}}}
[2012-10-30 23:32:49,754][INFO ][river.mongodb ] [Umar] [mongodb][mongogridfs] No known previous slurping time for this collection

It stops there and even when I add new data nothing gets added to this log. ES is still running since I can run queries. This is the result of running this: curl -XGET 'http://localhost:9200/testmongo/_count'

{"count":0,"_shards":{"total":5,"successful":5,"failed":0}}

Anyway, I'm not even sure if 2.2.1 is supported. If not, what is the highest version of mongo that is supported?

No known previous slurping time

I tried to follow these instructions exactly:
https://gist.github.com/2029361

But when I run:
curl -XGET "http://localhost:9200/testmongo/_search?q=firstName:John"
I get:
{"error":"IndexMissingException[[testmongo] missing]","status":404}

The elasticsearch log just keeps repeating this:

java.util.NoSuchElementException
at java.util.LinkedList$ListItr.next(LinkedList.java:698)
at com.mongodb.DBCursor._next(DBCursor.java:453)
at com.mongodb.DBCursor.next(DBCursor.java:533)
at org.elasticsearch.river.mongodb.MongoDBRiver$Slurper.processFullCollection(MongoDBRiver.java:378)
at org.elasticsearch.river.mongodb.MongoDBRiver$Slurper.run(MongoDBRiver.java:353)
at java.lang.Thread.run(Thread.java:680)
[2012-05-29 02:29:43,249][INFO ][river.mongodb ] [Node1] [mongodb][mongodb] No known previous slurping time for this collection
[2012-05-29 02:29:43,252][INFO ][node ] [Node1] {0.19.3}[5532]: stopping ...
[2012-05-29 02:29:43,261][INFO ][river.mongodb ] [Node1] [mongodb][mongodb] closing mongodb stream river
[2012-05-29 02:29:43,270][WARN ][river.mongodb ] [Node1] [mongodb][mongodb] A mongoDB cursor bug ?

And the mongodb log just keeps repeating this:
Tue May 29 02:29:43 [conn3] CMD fsync: sync:1 lock:1
Tue May 29 02:29:43 [conn3] removeJournalFiles
Tue May 29 02:29:43 [fsyncjob] db is now locked for snapshotting, no writes allowed. db.fsyncUnlock() to unlock
Tue May 29 02:29:43 [fsyncjob] For more info see http://www.mongodb.org/display/DOCS/fsync+Command
Tue May 29 02:29:43 [conn3] command: unlock requested

Any ideas on what I am doing wrong?

Filter logic for river

Hi. I am just starting to investigate using the river concept for my mongodb/elasticsearch setup. I was wondering whether it would be possible to implement a filter on the river such that it only detects changes that meet certain criteria. In my case, I don't want elasticsearch to grab a record from my mongodb until a certain field has been set. Is there a way to accomplish this with the current way the river is implemented? Please advise, and thanks in advance.
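For context, the river definition accepts a filter under the mongodb section (it shows up as filter [...] in the startup logs quoted elsewhere on this page, and the changelog above notes filtering during the initial import as of 1.7.2). A hedged sketch only; whether the filter is passed as a JSON object or an escaped string, and the field name used here, are assumptions:

  curl -XPUT 'http://localhost:9200/_river/mongodb/_meta' -d '{
    "type": "mongodb",
    "mongodb": {
      "db": "testmongo",
      "collection": "person",
      "filter": { "approved": { "$exists": true } }
    },
    "index": { "name": "mongoindex", "type": "person" }
  }'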

How about transactions?

Hi,
I started to wonder how elasticsearch handles transactions with the shards. Is it at all possible?
Is elasticsearch a transaction-less app, leaving transactions to the main CRUD database to handle?
Regards,
Janusz

no longer works after a period of inactivity

Mongodb 2.0.4 (10gen / debian squeeze)
Elasticsearch 0.19.0 & 0.19.2 debian build
elasticsearch-river-mongodb 1.1.0 with mongodb driver 2.7.2

It works well in the evening, and the next day it no longer works...?

No errors in the logs :/

Do not use fsync/lock

It seems like your use of fsync/lock is not needed. Please remove it.

Slurping large collections

We have large collections (15 million+ documents, 30 GB) in MongoDB.
Our servers have 16 GB RAM and 8 cores, fast local storage and 10 Gb ethernet.

We are trying to use the river to auto-synchronise elasticsearch and mongodb.

When I start a river running, elasticsearch's memory use appears to climb without limit, eventually getting stuck in a garbage-collection loop and failing.

Examination of the code suggests the stream between the slurper and indexer threads grows without bound because the indexer cannot keep up with the slurper (the slurp sustains about 100 Mbit, 5,000-10,000 documents per second).

Perhaps a slurp rate throttle or maximum stream queue size would allow the slurper to back off and let the indexer catch up.

Gradle dependencies error

Hi, I'm stuck at the start :/

Project with path ':elasticsearch' could not be found in root project 'elasticsearch-river-mongodb'.

error on IndexMissingException

Hi there, I've tried to follow the other issues regarding NoSuchElementException, but I still get:
{"error":"IndexMissingException[[mongoindex] missing]","status":404}

Below is the ES log. I've tried setting the log level to debug and reinstalling the plugins. From the ES log, the river doesn't even seem to report whether it found the mongodb replica set or not.

[2012-06-11 16:15:14,038][INFO ][node ] [Joe Fixit] {0.19.4}[26020]: initializing ...
[2012-06-11 16:15:14,055][INFO ][plugins ] [Joe Fixit] loaded [river-mongodb, mapper-attachments], sites []
[2012-06-11 16:15:16,260][INFO ][node ] [Joe Fixit] {0.19.4}[26020]: initialized
[2012-06-11 16:15:16,261][INFO ][node ] [Joe Fixit] {0.19.4}[26020]: starting ...
[2012-06-11 16:15:16,362][INFO ][transport ] [Joe Fixit] bound_address {inet[/0.0.0.0:9300]}, publish_address {inet[/192.168.1.11:9300]}
[2012-06-11 16:15:19,571][INFO ][cluster.service ] [Joe Fixit] detected_master [Reyes, Cecelia][ecfXwyIWSSOo5T3m756Vvg][inet[/192.168.1.11:9301]], added {[Lighting Rod][mJb8jdIPQxWDpzrVK9B4ZA][inet[/192.168.1.11:9302]],[Living Eraser][qEpGWyf5S2SR4gP-jhcsUw][inet[/192.168.1.11:9303]],[Reyes, Cecelia][ecfXwyIWSSOo5T3m756Vvg][inet[/192.168.1.11:9301]],}, reason: zen-disco-receive(from master [[Reyes, Cecelia][ecfXwyIWSSOo5T3m756Vvg][inet[/192.168.1.11:9301]]])
[2012-06-11 16:15:19,638][INFO ][discovery ] [Joe Fixit] elasticsearch/RPqNAlTZRG6kGaPqRgUIdw
[2012-06-11 16:15:19,641][INFO ][http ] [Joe Fixit] bound_address {inet[/0.0.0.0:9200]}, publish_address {inet[/192.168.1.11:9200]}
[2012-06-11 16:15:19,641][INFO ][node ] [Joe Fixit] {0.19.4}[26020]: started

I also issued the command from another window using mongod --replSet foo --port 27017 --dbpath /data/r0 --oplogSize 700, but without luck. Can you please provide any insights? In addition, will an oplog file get generated that I can spot?

Thanks.

Sharded collections error

I'm having a problem when applying the river against a MongoDB sharded environment (it should be supported as of river version 1.6.0).

ES version: 0.20.1
River version: 1.6.0
MongoDB Server version: 2.2.0

Short version:

I'm getting this error in the ES log (it loops forever until I forcibly stop ES):

[2012-12-14 17:51:58,837][INFO ][org.elasticsearch.river.mongodb] [Elijah] [mongodb][aa-catalogs] mongoServersSettings: [{port=27017, host=mongo-flexicloud}]
[2012-12-14 17:51:58,853][INFO ][org.elasticsearch.river.mongodb] [Elijah] [mongodb][aa-catalogs] Server: mongo-flexicloud - 27017
[2012-12-14 17:51:58,853][INFO ][org.elasticsearch.river.mongodb] [Elijah] [mongodb][aa-catalogs] Using mongodb server(s): host [mongo-flexicloud], port [27017]
[2012-12-14 17:51:58,853][INFO ][org.elasticsearch.river.mongodb] [Elijah] [mongodb][aa-catalogs] starting mongodb stream. options: secondaryreadpreference [true], throttlesize [500], gridfs [false], filter [], db [AA], script [null], indexing to [aa]/[catalogs]
[2012-12-14 17:51:59,274][ERROR][org.elasticsearch.river.mongodb] [Elijah] [mongodb][aa-catalogs] Mongo gave an exception
com.mongodb.MongoException: can't use 'local' database through mongos
at com.mongodb.MongoException.parse(MongoException.java:82)
at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:314)
at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:295)
at com.mongodb.DB.getCollectionNames(DB.java:412)
at org.elasticsearch.river.mongodb.MongoDBRiver$Slurper.assignCollections(MongoDBRiver.java:715)
at org.elasticsearch.river.mongodb.MongoDBRiver$Slurper.run(MongoDBRiver.java:756)
at java.lang.Thread.run(Unknown Source)

Long Version:

I will try to provide as much information as possible.

The rig (everything is running on Windows):

The MongoDB cluster:

Mongo-1 (shard 1 - master)
Mongo-2 (shard 1 - secondary)
Mongo-3 (shard 2 - master)
Mongo-4 (shard 2 - secondary)
Mongo-5 (mongos & arbiters) (dns alias: mongo-flexicloud)

The target database is named "AA", has sharding enabled, and contains 2 collections:

Accounts collection (40000 documents) - Sharded
Catalogs collection (57 documents) - Not sharded

The ElasticSearch cluster:
Cluster name: xpto

ES-1 (master: true, data: true)
ES-2 (master: true, data: true)
ES-3 (master: true, data: true)
ES-4 (master: true, data: true)
ES-5 (Coordinator, master: true, data: false) (dns alias: flexilastic)

I set up the river successfully using the plugin install method.
I also installed the river plugin on the other nodes as well, but that should have no impact whatsoever because I am using the ES-5 node to perform the API operations.
Following is the ElasticSearch startup log from the node ES-5 startup:

[2012-12-14 17:49:55,436][INFO ][org.elasticsearch.node ] [Elijah] {0.20.1}[4468]: initializing ...
[2012-12-14 17:49:55,560][INFO ][org.elasticsearch.plugins] [Elijah] loaded [river-mongodb, mapper-attachments], sites [bigdesk, head]
[2012-12-14 17:49:59,289][INFO ][org.elasticsearch.node ] [Elijah] {0.20.1}[4468]: initialized
[2012-12-14 17:49:59,289][INFO ][org.elasticsearch.service] starting...
[2012-12-14 17:49:59,289][INFO ][org.elasticsearch.node ] [Elijah] {0.20.1}[4468]: starting ...
[2012-12-14 17:49:59,398][INFO ][org.elasticsearch.transport] [Elijah] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/10.100.100.109:9300]}
[2012-12-14 17:50:02,565][INFO ][org.elasticsearch.cluster.service] [Elijah] detected_master [Cosby][hvYoobDTSRWSP47m5Tq4jg][inet[/10.100.100.107:9300]]{master=true}, added {[Jackman][54c5D21oTSyzCtw4svOGCw][inet[/10.100.100.123:9300]]{master=true},[Cosby][hvYoobDTSRWSP47m5Tq4jg][inet[/10.100.100.107:9300]]{master=true},[Lucy][jACaQpOrReishEwtDBAKww][inet[/10.100.100.103:9300]]{master=true},[Belamy][QkdX_dsDRlK08a7KLt2iug][inet[/10.100.100.124:9300]]{master=true},}, reason: zen-disco-receive(from master [[Cosby][hvYoobDTSRWSP47m5Tq4jg][inet[/10.100.100.107:9300]]{master=true}])
[2012-12-14 17:50:02,612][INFO ][org.elasticsearch.discovery] [Elijah] flexilastic/wpPROrKTT7i7omDqFXH8NQ
[2012-12-14 17:50:02,612][INFO ][org.elasticsearch.http ] [Elijah] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/10.100.100.109:9200]}
[2012-12-14 17:50:02,612][INFO ][org.elasticsearch.node ] [Elijah] {0.20.1}[4468]: started
[2012-12-14 17:50:02,612][INFO ][org.elasticsearch.service] running...

This is the request I used to set up the river for the Catalogs collection:

PUT: http://flexilastic:9200/_river/aa-catalogs/_meta
Request BODY:

{
  "type" : "mongodb",
  "mongodb" : {
    "servers" : [{
      "host" : "mongo-flexicloud",
      "port" : "27017"
    }],
    "db" : "AA",
    "collection" : "Catalogs",
    "gridfs" : false
  },
  "index" : {
    "name" : "aa",
    "type" : "catalogs"
  }
}

Then I got the following error (it loops forever until I force ES to shut down):

[2012-12-14 17:51:58,837][INFO ][org.elasticsearch.river.mongodb] [Elijah] [mongodb][aa-catalogs] mongoServersSettings: [{port=27017, host=mongo-flexicloud}]
[2012-12-14 17:51:58,853][INFO ][org.elasticsearch.river.mongodb] [Elijah] [mongodb][aa-catalogs] Server: mongo-flexicloud - 27017
[2012-12-14 17:51:58,853][INFO ][org.elasticsearch.river.mongodb] [Elijah] [mongodb][aa-catalogs] Using mongodb server(s): host [mongo-flexicloud], port [27017]
[2012-12-14 17:51:58,853][INFO ][org.elasticsearch.river.mongodb] [Elijah] [mongodb][aa-catalogs] starting mongodb stream. options: secondaryreadpreference [true], throttlesize [500], gridfs [false], filter [], db [AA], script [null], indexing to [aa]/[catalogs]
[2012-12-14 17:51:59,274][ERROR][org.elasticsearch.river.mongodb] [Elijah] [mongodb][aa-catalogs] Mongo gave an exception
com.mongodb.MongoException: can't use 'local' database through mongos
at com.mongodb.MongoException.parse(MongoException.java:82)
at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:314)
at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:295)
at com.mongodb.DB.getCollectionNames(DB.java:412)
at org.elasticsearch.river.mongodb.MongoDBRiver$Slurper.assignCollections(MongoDBRiver.java:715)
at org.elasticsearch.river.mongodb.MongoDBRiver$Slurper.run(MongoDBRiver.java:756)
at java.lang.Thread.run(Unknown Source)

Using Mapping along with mongo db

I want to make sure that the details attribute in the type "test" of index "mongoindex" is not indexed but only stored. I tried the two commands below, but I can see ElasticSearch still analyzing it.

curl -XPUT localhost:9200/mongoindex -d '{"settings": {"number_of_shards": 5,"number_of_replicas": 1},"mappings": {"test": {"properties": {"details": {"type": "string","index": "no","store":"yes"}}}}}'

curl -XPUT localhost:9200/_river/mongodb/_meta -d '{ "type": "mongodb", "mongodb": {"host":"localhost", "port":27017, "db":"testdb", "collection": "test"}, "index": {"name": "mongoindex", "type": "test"}}'

Regards
Saud Ur Rehman

Cannot find oplog.rs collection

Is it possible to start synchronizing data without a sharded mongo installation?
I just want to use it for the development process on my local machine. There is only one mongo instance, started without any replication.

mongodb to elasticsearch removal strategy

Question from Martin
Hi Richard,

I am successfully using your mongo 2 elastic river plugin to power the backend of my latest web project. Thank you for taking the time to develop a great bit of code.

I wonder if I could just pick your brains for a second?

The documents I'm pushing into elastic do need to be removed once a certain flag is set.
I want to keep the records in mongo.
I did look at your 'filter' param in the config, but you said there was a limitation where it would not delete records once they were already in elastic.

Do you have any ideas on how I could accomplish this?

Many Thanks,
Martin

Initial import does not work

Hi,

It looks like the river's initial import does not work. My setup is the following:

ES 0.19.11
MongoDB 2.2.1
River 1.5.0

After a quick look at the code, a wild guess would be that this appeared when #31 was fixed.
In fact, the method Slurper#getIndexFilter does not return null anymore when there is no input timestamp. This means that the first slurper loop won't execute processFullCollection().

Let me know, if you need more information.
Cheers,
Emmanuel

data being indexed on elasticsearch does not get pushed to mongodb

I initialized the river as documented in the wiki. Replica set is set up, but I only have a single replication server.

When I insert data in mongodb it gets pushed to elasticsearch:

mongo
PRIMARY> use DBNAME
PRIMARY> entry = {    "user" : "phil",
...     "post_date" : "2009-11-15T14:12:12",
...     "message" : "trying out Elastic Search"}
{
    "user" : "phil",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elastic Search"
}

db.contacts.insert(entry)
$ curl -XGET 'http://localhost:9200/contacts/_search?pretty=true&size=5000' -d '
> { 
>     "query" : { 
>         "matchAll" : {} 
>     } 
> }'
{
  "took" : 62,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "contacts",
      "_type" : "contact",
      "_id" : "1",
      "_score" : 1.0, "_source" : {
    "user" : "phil",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elastic Search"
}
    } ]
  }
}

However, when I index data to elasticsearch issuing a PUT request, the data does not show up in mongodb (data from above has been cleared before executing the following sample)

$ curl -XPUT 'http://localhost:9200/contacts/contact/1' -d '{
>     "user" : "kimchy",
>     "post_date" : "2009-11-15T14:12:12",
>     "message" : "trying out Elastic Search"
> }'
{"ok":true,"_index":"contacts","_type":"contact","_id":"1","_version":1}forste@machine:~/opt$ 
$ url -XGET 'http://localhost:9200/contacts/_search?pretty=true&size=5000' -d '
{                     
    "query" : {                         
        "matchAll" : {} 
    } 
}'
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "contacts",
      "_type" : "contact",
      "_id" : "1",
      "_score" : 1.0, "_source" : {
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elastic Search"
}
    } ]
  }
}
$ mongo
PRIMARY> use DBNAME
PRIMARY> db.contacts.find()
PRIMARY> 

Does the river work both ways?

please, can someone help?

[2012-11-12 14:44:47,060][WARN ][bootstrap ] jvm uses the client vm, make sure to run java with the server vm for best performance by adding -server to the command line
[2012-11-12 14:44:47,087][INFO ][node ] [Stone] {0.19.9}[4480]: initializing ...
[2012-11-12 14:44:47,182][INFO ][plugins ] [Stone] loaded [river-mongodb, mapper-attachments], sites []
[2012-11-12 14:44:52,254][INFO ][node ] [Stone] {0.19.9}[4480]: initialized
[2012-11-12 14:44:52,278][INFO ][node ] [Stone] {0.19.9}[4480]: starting ...
[2012-11-12 14:44:52,514][INFO ][transport ] [Stone] bound_address {inet[/0.0.0.0:9300]}, publish_address {inet[/192.168.10.18:9300]}
[2012-11-12 14:44:55,707][INFO ][cluster.service ] [Stone] new_master [Stone][faSz-0aNQf2EBZoZ2q-yQQ][inet[/192.168.10.18:9300]], reason: zen-disco-join (elected_as_master)
[2012-11-12 14:44:55,745][INFO ][discovery ] [Stone] elasticsearch/faSz-0aNQf2EBZoZ2q-yQQ
[2012-11-12 14:44:55,808][INFO ][http ] [Stone] bound_address {inet[/0.0.0.0:9200]}, publish_address {inet[/192.168.10.18:9200]}
[2012-11-12 14:44:55,808][INFO ][node ] [Stone] {0.19.9}[4480]: started
[2012-11-12 14:44:55,840][INFO ][gateway ] [Stone] recovered [0] indices into cluster_state
[2012-11-12 14:45:49,647][WARN ][transport.netty ] [Stone] Exception caught on netty layer [[id: 0x016c14c0, /127.0.0.1:53945 => /127.0.0.1:9300]]
org.elasticsearch.common.netty.handler.codec.frame.TooLongFrameException: transport content length received [1.1gb] exceeded [918.7mb]
at org.elasticsearch.transport.netty.SizeHeaderFrameDecoder.decode(SizeHeaderFrameDecoder.java:31)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:422)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:75)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:565)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:793)
at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:565)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:94)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.processSelectedKeys(AbstractNioWorker.java:390)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:261)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:35)
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:102)
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
[2012-11-12 14:46:38,616][WARN ][transport.netty ] [Stone] Exception caught on netty layer [[id: 0x016c14c0, /127.0.0.1:53945 :> /127.0.0.1:9300]]
org.elasticsearch.common.netty.handler.codec.frame.TooLongFrameException: transport content length received [1.1gb] exceeded [918.7mb]
at org.elasticsearch.transport.netty.SizeHeaderFrameDecoder.decode(SizeHeaderFrameDecoder.java:31)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:422)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.cleanup(FrameDecoder.java:478)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.channelDisconnected(FrameDecoder.java:366)
at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:107)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:565)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:793)
at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:565)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
at org.elasticsearch.common.netty.channel.Channels.fireChannelDisconnected(Channels.java:399)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:634)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:99)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.processSelectedKeys(AbstractNioWorker.java:390)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:261)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:35)
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:102)
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
[2012-11-12 14:48:23,804][WARN ][transport.netty ] [Stone] Exception caught on netty layer [[id: 0x002f75e5, /127.0.0.1:53959 => /127.0.0.1:9300]]
org.elasticsearch.common.netty.handler.codec.frame.TooLongFrameException: transport content length received [1.1gb] exceeded [918.7mb]
at org.elasticsearch.transport.netty.SizeHeaderFrameDecoder.decode(SizeHeaderFrameDecoder.java:31)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:422)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:75)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:565)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:793)
at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:565)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:94)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.processSelectedKeys(AbstractNioWorker.java:390)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:261)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:35)
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:102)
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
[2012-11-12 14:59:08,316][INFO ][node ] [Stone] {0.19.9}[4480]: stopping ...
[2012-11-12 14:59:08,338][WARN ][transport.netty ] [Stone] Exception caught on netty layer [[id: 0x002f75e5, /127.0.0.1:53959 :> /127.0.0.1:9300]]
org.elasticsearch.common.netty.handler.codec.frame.TooLongFrameException: transport content length received [1.1gb] exceeded [918.7mb]
at org.elasticsearch.transport.netty.SizeHeaderFrameDecoder.decode(SizeHeaderFrameDecoder.java:31)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:422)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.cleanup(FrameDecoder.java:478)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.channelDisconnected(FrameDecoder.java:366)
at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:107)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:565)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:793)
at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:565)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
at org.elasticsearch.common.netty.channel.Channels.fireChannelDisconnected(Channels.java:399)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:634)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:99)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.processSelectedKeys(AbstractNioWorker.java:390)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:261)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:35)
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:102)
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
[2012-11-12 14:59:08,368][INFO ][node ] [Stone] {0.19.9}[4480]: stopped
[2012-11-12 14:59:08,371][INFO ][node ] [Stone] {0.19.9}[4480]: closing ...
[2012-11-12 14:59:08,415][INFO ][node ] [Stone] {0.19.9}[4480]: closed
[2012-11-12 16:42:00,664][WARN ][bootstrap ] jvm uses the client vm, make sure to run java with the server vm for best performance by adding -server to the command line
[2012-11-12 16:42:00,680][INFO ][node ] [Order] {0.19.9}[4684]: initializing ...
[2012-11-12 16:42:00,767][INFO ][plugins ] [Order] loaded [river-mongodb, mapper-attachments], sites []
[2012-11-12 16:42:04,099][INFO ][node ] [Order] {0.19.9}[4684]: initialized
[2012-11-12 16:42:04,100][INFO ][node ] [Order] {0.19.9}[4684]: starting ...
[2012-11-12 16:42:04,482][INFO ][transport ] [Order] bound_address {inet[/0.0.0.0:9300]}, publish_address {inet[/192.168.10.18:9300]}
[2012-11-12 16:42:07,709][INFO ][cluster.service ] [Order] new_master [Order][wixWS6fNTNWrF5hcT35TiA][inet[/192.168.10.18:9300]], reason: zen-disco-join (elected_as_master)
[2012-11-12 16:42:07,811][INFO ][discovery ] [Order] elasticsearch/wixWS6fNTNWrF5hcT35TiA
[2012-11-12 16:42:07,876][INFO ][http ] [Order] bound_address {inet[/0.0.0.0:9200]}, publish_address {inet[/192.168.10.18:9200]}
[2012-11-12 16:42:07,877][INFO ][node ] [Order] {0.19.9}[4684]: started
[2012-11-12 16:42:07,943][INFO ][gateway ] [Order] recovered [0] indices into cluster_state

Front Page Examples Don't Work

I'm not sure whether I'm doing something obviously wrong, but I cannot for the life of me get this to work. I have installed the mapper plugin (1.2.0, but I also tried 1.1.0 and 1.3.0) and version 1.1.0 of this plugin, restarted Elasticsearch, and then followed the example on the front page and in other sections of the wiki. It doesn't seem to index anything.

Is there something I missed? I am running Elasticsearch 0.19.

Thanks,
James

Nothing happens when I put the configuration.

I put

{
    "type": "mongodb",
    "mongodb": {
        "servers": [
            { "host": "localhost", "port": "27017" }
        ],
        "credentials": [
            {
                "db": "local",
                "user": "admin",
                "password": "blabla"
            }
        ],
        "db": "app_database",
        "collection": "apps",
        "gridfs": false
    },
    "index": {
        "name": "apps"
    }
}

into http://localhost:9200/_river/mongodb/_meta and nothing happens except that the document is created (which would happen even without the plugin). Yes, I installed the plugin with <ES_HOME>/bin/plugin -install richardwilly98/elasticsearch-river-mongodb/1.4.0 and its dependency too. When I visit http://localhost:9200/apps/_search it says:

{
    "error": "IndexMissingException[[apps] missing]",
    "status": 404
}

Update > http://localhost:9200/_river/mongodb/_status says:

{
    "_index": "_river",
    "_type": "mongodb",
    "_id": "_status",
    "exists": false
}

Does that mean the plugin is not installed properly?
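
One way to narrow this down (a sketch using only standard endpoints, nothing river-specific): confirm the _meta document was actually stored, and check the node's startup log for a "loaded [river-mongodb, ...]" line like the ones earlier on this page.

  curl -XGET 'http://localhost:9200/_river/mongodb/_meta?pretty=true'

If _meta exists but no index appears, the river usually logs the reason shortly after startup, so the Elasticsearch log is the next place to look.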

com.mongodb.MongoException: not talking to master and retries used up

My search is not working now. I guess it's because my river was not configured for a replica set:

curl -XPUT 'http://localhost:9200/_river/mongodb/_meta' -d '{
    "type": "mongodb",
    "mongodb": {
        "db": "mongo",
        "host": "local",
        "port": "40000",
        "collection": "users"
    },
    "index": {
        "name": "api",
        "type": "users"
    }
}'

Is there any way to properly declare a replica set so that Elasticsearch can find the master, the way PHP does?

  $m = new Mongo("mongodb://localhost:40000,localhost:41000", array("replicaSet" => true));
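
For reference, the river also accepts a "servers" array instead of a single host/port pair, as in the configuration shown further up this page. A sketch along those lines, reusing the same two local ports (verify the option name against the wiki for your river version):

  curl -XPUT 'http://localhost:9200/_river/mongodb/_meta' -d '{
    "type": "mongodb",
    "mongodb": {
      "servers": [
        { "host": "localhost", "port": 40000 },
        { "host": "localhost", "port": 41000 }
      ],
      "db": "mongo",
      "collection": "users"
    },
    "index": {
      "name": "api",
      "type": "users"
    }
  }'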

geo_point mapping

Hi,
I have a class called Asset with the member annotated like this:

@Embedded
@Indexed(IndexDirection.GEO2D)
private GpsLocation location; // Morphia annotations

The "GpsLocation" object is an instance of a simple class containing two Double latitude/longitude variables.
When I run my tool to create all Asset objects in MongoDB from an XML file, the Asset collection is created and indexed into Elasticsearch. When I query using MongoDB and Morphia code:

Query<AssetDB> dbAsset = productsDAO.getDatastore().find(AssetDB.class)
        .field("location").near(latitude, longitude, 5);
return fromDbList(dbAsset.asList(), resolve);

… I am getting the correct results.
However, the MongoDB river creates an Asset mapping where the location field looks like this:

"location": {
    "dynamic": "true",
    "properties": {
        "latitude": {
            "type": "double"
        },
        "longitude": {
            "type": "double"
        }
    }
}

… so location is not of type geo_point.
Is it possible to fix this in the MongoDB river, or is the information that this is a geo location already lost by the time the document reaches the river plugin?
Regards,
Janusz
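
A common Elasticsearch-side workaround (a sketch, not river-specific; the index and type names are placeholders and the mapping must exist before the river first writes) is to declare the field as geo_point explicitly, since dynamic mapping will otherwise infer two plain doubles:

  curl -XPUT 'http://localhost:9200/myindex'
  curl -XPUT 'http://localhost:9200/myindex/asset/_mapping' -d '{
    "asset": {
      "properties": {
        "location": {
          "type": "geo_point"
        }
      }
    }
  }'

Note that geo_point expects lat/lon sub-fields, so documents carrying latitude/longitude may additionally need a rename or transform on their way in.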

Failed to load class with value [mongodb]

I successfully installed Elasticsearch with the river on Windows in the past, but I cannot do it on Debian. I installed the plugin with bin/plugin as shown in the wiki, and the plugins directory contains exactly what it should:

`-- plugins
    |-- mapper-attachments
    |   |-- elasticsearch-mapper-attachments-1.4.0.jar
    |   `-- tika-app-1.1.jar
    `-- river-mongodb
        |-- elasticsearch-river-mongodb-1.4.0-SNAPSHOT.jar
        `-- mongo-java-driver-2.8.0.jar

When I run Elasticsearch it says it's loading the plugin, but http://localhost:9200/_river/mongodb/_status returns this error:

NoClassSettingsException[Failed to load class with value [mongodb]]; nested: ClassNotFoundException[mongodb];
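
One hedged thing to try (the install coordinates below are the ones used elsewhere on this page; adjust the versions to match yours): remove and reinstall both plugins with the plugin script, and make sure the node you restart is the same installation that owns this plugins directory. The -SNAPSHOT jar in the listing above also suggests a manually copied build rather than a released artifact, which is worth double-checking.

  bin/plugin -remove river-mongodb
  bin/plugin -remove mapper-attachments
  bin/plugin -install elasticsearch/elasticsearch-mapper-attachments/1.4.0
  bin/plugin -install richardwilly98/elasticsearch-river-mongodb/1.4.0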

Config successful, but MongoDB gave an exception

Hi, all. Here is my river configuration:

curl -XPUT 'http://192.168.1.206:9200/_river/mongodb/_meta' -d '{
    "type": "mongodb",
    "mongodb": {
        "host": "192.168.1.206",
        "port": 27017,
        "db": "testes",
        "collection": "userlog"
    },
    "index": {
        "name": "userlog",
        "type": "userlog",
        "bulk_size": 1000,
        "bulk_timeout": 30
    }
}'

It ran successfully, but MongoDB threw an exception in Elasticsearch:

[2012-04-11 15:33:20,140][ERROR][river.mongodb] [Bloodhawk] [mongodb][mongodb] Mongo gave an exception
com.mongodb.MongoException: Could not lock the database for FullCollection sync
at org.elasticsearch.river.mongodb.MongoDBRiver$Slurper.processFullCollection(MongoDBRiver.java:388)
at org.elasticsearch.river.mongodb.MongoDBRiver$Slurper.run(MongoDBRiver.java:353)
at java.lang.Thread.run(Unknown Source)
[2012-04-11 15:33:20,156][INFO][river.mongodb] [Bloodhawk] [mongodb][mongodb] No known previous slurping time for this collection

Please tell me what happened, thank you!
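
The stack trace points at the river's processFullCollection step, which, judging by the message, needs to lock the database for an initial full-collection sync. Since the river normally tails the oplog instead, a hedged suggestion (standard mongo shell commands, not taken from this issue) is to run mongod as a replica set, even a single-member one:

  # start mongod with --replSet <name>, then in the mongo shell:
  rs.initiate()
  rs.status()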

com.mongodb.MongoException: can't find a master

When Elasticsearch and MongoDB run on the same machine, everything works; when they run on different machines, I get an error.

Help me!

Java code:

public void riverMongo3() {
    Client client = EsticSearchClientFactory.getClient();
    try {
        client.prepareIndex("_river", "mongodb", "_meta")
              .setSource(
                  jsonBuilder().startObject()
                      .field("type", "mongodb")
                      .startObject("mongodb")
                          .field("host", "192.168.1.133")
                          .field("port", 10000)
                          .field("db", "jua")
                          .field("collection", "blog")
                      .endObject()
                      .startObject("index")
                          .field("name", "test")
                          .field("type", "test")
                          .field("bulk_size", "1000")
                          .field("bulk_timeout", "30")
                      .endObject()
                  .endObject()
              ).execute().actionGet();
    } catch (ElasticSearchException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
}

192.168.1.133:10000 is reachable.

error:

[2012-10-30 14:06:27,907][INFO ][cluster.metadata ] [Stygyro] [_river] update_mapping mongodb
[2012-10-30 14:06:28,034][INFO ][river.mongodb ] [Stygyro] [mongodb][mongodb] Using mongodb server(s): host [192.168.1.133], port [10000]
[2012-10-30 14:06:28,035][INFO ][river.mongodb ] [Stygyro] [mongodb][mongodb] starting mongodb stream: options: secondaryreadpreference [false], gridfs [false], filter [jua], db [test], indexing to [test]/[{}]
[2012-10-30 14:06:28,187][INFO ][index.analysis ] [Stygyro] [test] /home/www/es/elasticsearch/config/mmseg
[2012-10-30 14:06:28,192][INFO ][index.analysis ] [Stygyro] [test] /home/www/es/elasticsearch/config/mmseg
[2012-10-30 14:06:28,193][INFO ][index.analysis ] [Stygyro] [test] /home/www/es/elasticsearch/config/mmseg
[2012-10-30 14:06:28,200][INFO ][paoding-analyzer ] postPropertiesLoaded init
[2012-10-30 14:06:28,200][INFO ][paoding-analyzer ] postPropertiesLoaded return
[2012-10-30 14:06:28,202][INFO ][index.analysis ] [Stygyro] [test] /home/www/es/elasticsearch/config/mmseg
[2012-10-30 14:06:28,216][INFO ][paoding-analyzer ] postPropertiesLoaded init
[2012-10-30 14:06:28,216][INFO ][paoding-analyzer ] postPropertiesLoaded return
[2012-10-30 14:06:28,247][INFO ][cluster.metadata ] [Stygyro] [test] creating index, cause [api], shards [5]/[1], mappings []
[2012-10-30 14:06:29,185][ERROR][river.mongodb ] [Stygyro] [mongodb][mongodb] Mongo gave an exception
com.mongodb.MongoException: can't find a master
at com.mongodb.DBTCPConnector.checkMaster(DBTCPConnector.java:437)
at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:208)
at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:313)
at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:298)
at com.mongodb.DB.getCollectionNames(DB.java:298)
at org.elasticsearch.river.mongodb.MongoDBRiver$Slurper.assignCollections(MongoDBRiver.java:509)
at org.elasticsearch.river.mongodb.MongoDBRiver$Slurper.run(MongoDBRiver.java:546)
at java.lang.Thread.run(Thread.java:662)
[2012-10-30 14:06:29,196][ERROR][river.mongodb ] [Stygyro] [mongodb][mongodb] Mongo gave an exception
com.mongodb.MongoException: can't find a master
at com.mongodb.DBTCPConnector.checkMaster(DBTCPConnector.java:437)
at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:208)
at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:313)
at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:298)
at com.mongodb.DB.getCollectionNames(DB.java:298)
at org.elasticsearch.river.mongodb.MongoDBRiver$Slurper.assignCollections(MongoDBRiver.java:509)
at org.elasticsearch.river.mongodb.MongoDBRiver$Slurper.run(MongoDBRiver.java:546)
at java.lang.Thread.run(Thread.java:662)
[2012-10-30 14:06:29,207][ERROR][river.mongodb ] [Stygyro] [mongodb][mongodb] Mongo gave an exception
com.mongodb.MongoException: can't find a master
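
"can't find a master" usually means the driver reached the server but could not see a PRIMARY. One way to narrow it down (a sketch with standard mongo shell commands, not from this issue) is to connect from the Elasticsearch machine and inspect the replica set state directly:

  mongo 192.168.1.133:10000
  > rs.status()

If rs.status() shows no PRIMARY, or the member names resolve to addresses the Elasticsearch host cannot reach, the driver will fail exactly this way.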

Issue with river and gridFS configuration

Note: a non-GridFS mongo stream (the person collection from the example) works fine.

This is on Mac OS X Lion

[2012-06-08 12:07:37,352][INFO ][river.mongodb ] [Nameless One] [mongodb][mongodb] starting mongodb stream: host [localhost], port [27017], gridfs [true], filter [testmongo], db [mongoindex], indexing to [files]/[{}]
[2012-06-08 12:07:37,355][INFO ][river.mongodb ] [Nameless One] [mongodb][mongodb] Mapping: {"files":{"properties":{"content":{"type":"attachment"},"filename":{"type":"string"},"contentType":{"type":"string"},"md5":{"type":"string"},"length":{"type":"long"},"chunkSize":{"type":"long"}}}}
[2012-06-08 12:07:37,408][INFO ][river.mongodb ] [Nameless One] [mongodb][mongodb] No known previous slurping time for this collection
[2012-06-08 12:07:37,914][INFO ][river.mongodb ] [Nameless One] [mongodb][mongodb] No known previous slurping time for this collection
[2012-06-08 12:07:38,417][INFO ][river.mongodb ] [Nameless One] [mongodb][mongodb] No known previous slurping time for this collection
[2012-06-08 12:07:38,920][INFO ][river.mongodb ] [Nameless One] [mongodb][mongodb] No known previous slurping time for this collection
[2012-06-08 12:07:39,422][INFO ][river.mongodb ] [Nameless One] [mongodb][mongodb] No known previous slurping time for this collection

plugins/river-mongodb contains:

elasticsearch-river-mongodb-1.3.0-SNAPSHOT.jar
mongo-java-driver-2.7.2.jar

Elasticsearch 0.19.4
MongoDB 2.0.6

PRIMARY> rs.status();
{
    "set" : "foo",
    "date" : ISODate("2012-06-08T19:14:21Z"),
    "myState" : 1,
    "members" : [
        {
            "_id" : 0,
            "name" : "localhost:27017",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "optime" : {
                "t" : 1339182415000,
                "i" : 1
            },
            "optimeDate" : ISODate("2012-06-08T19:06:55Z"),
            "self" : true
        },
        {
            "_id" : 1,
            "name" : "localhost:27018",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 2658,
            "optime" : {
                "t" : 1339182415000,
                "i" : 1
            },
            "optimeDate" : ISODate("2012-06-08T19:06:55Z"),
            "lastHeartbeat" : ISODate("2012-06-08T19:14:21Z"),
            "pingMs" : 0
        }
    ],
    "ok" : 1
}

Mongodb with username, password

My MongoDB has username/password protection. How can I use it in the river configuration?
I tried:
curl -XPUT "localhost:9200/_river/mongodb/_meta" -d '{
    "type": "mongodb",
    "mongodb": {
        "db": "mydb",
        "host": "localhost",
        "collection": "mycol",
        "user": "myuser",
        "password": "mypassword"
    },
    "index": {
        "name": "myindex",
        "type": "mytype"
    }
}'

but I get an empty index, and in the log I see: [mongodb][mongodb] Invalid credential
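
For what it's worth, the configuration shown further up this page passes authentication through a "credentials" array rather than top-level user/password fields. A sketch along those lines (whether a second entry for the "local" database is needed for oplog access depends on your setup, so check the wiki):

  curl -XPUT 'http://localhost:9200/_river/mongodb/_meta' -d '{
    "type": "mongodb",
    "mongodb": {
      "credentials": [
        { "db": "mydb", "user": "myuser", "password": "mypassword" }
      ],
      "db": "mydb",
      "collection": "mycol"
    },
    "index": {
      "name": "myindex",
      "type": "mytype"
    }
  }'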

Partial update support

@richard

It seems that partial updates, like increments, are not supported right now?
Logs for this:

Cannot get object id. Skip the current item: [{$set={ "userLikesCount" : 1}, _id=null}]

The oplog entry does not contain _id inside "o", but there is an "o2" field which contains the _id of the document.

Do you think you could fix this?
Thanks
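
For context, a minimal sketch of the handling being requested (the "o"/"o2" field names come from the oplog format quoted above; the class and method here are hypothetical, not actual river code):

  import com.mongodb.DBObject;

  public final class OplogIds {
      /**
       * Resolve the _id of an oplog entry. For partial updates the "o"
       * field only holds the {$set: ...} body, so fall back to "o2",
       * which carries the _id of the updated document.
       */
      public static Object resolveId(DBObject oplogEntry) {
          DBObject o = (DBObject) oplogEntry.get("o");
          Object id = (o != null) ? o.get("_id") : null;
          if (id == null && oplogEntry.get("o2") instanceof DBObject) {
              id = ((DBObject) oplogEntry.get("o2")).get("_id");
          }
          return id;
      }
  }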

Links in home page talking about memory issues

Hello,

On the github homepage, there is:
"For the initial implementation see tutorial"
http://www.matt-reid.co.uk/blog_post.php?id=68#&slider1=4

On this link, at the end, in the comments, we can find:
"Matthew Reid · Norwich, Norfolk
Note to future readers. I have since come across memory problems with the mongodb river so have reverted back to manually re-indexing documents!"

I find it quite confusing to have that on the home page of the plugin.
If there are memory issues to be aware of with the plugin, is it possible to explain them?
If there are none and it's a case of plugin misuse, can someone explain that misuse and remove the link, please?

Identify index field name freely while using mongo river

Hi Richard,

While using the mongo river, I can't choose the mapping property names freely; I have to use the collection's field names. I suggest the mongo river support this feature, so that users can name properties freely, or even choose which fields to index and which to exclude.

e.g. I have a collection 'users' with two properties, 'user_name' and 'user_age', in MongoDB. To index 'users', I have to configure my mapping as below to make the river job run successfully:

{
    "user" : {
        "properties" : {
            "user_name" : {
                "type" : "string",
                "index_analyzer" : "ansj",
                "search_analyzer" : "ansj",
                "null_value" : "NA"
            },
            "user_age" : {
                "type" : "string",
                "null_value" : "NA"
            }
        }
    }
}

I hope the mongo river can support naming properties freely; in my case I would like to map them to names like 'username' and 'age'. Or, in some cases, I only want to index 'user_name', with 'user_age' excluded.

Thanks,
Spancer
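
As an aside, later versions of the river document include_fields/exclude_fields options in the wiki, which cover the second half of this request. A sketch (the option names and their placement under "options" should be verified against the wiki for your river version):

  curl -XPUT 'http://localhost:9200/_river/mongodb/_meta' -d '{
    "type": "mongodb",
    "mongodb": {
      "db": "mydb",
      "collection": "users",
      "options": {
        "exclude_fields": ["user_age"]
      }
    },
    "index": {
      "name": "users",
      "type": "user"
    }
  }'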

failed to create river [mongodb][mongodb] in log file

I am unable to properly use the mongo-river plugin with Elasticsearch.

I followed the instructions on the front page after creating a replica set in mongo called myset and running rs.initiate() in the mongo shell.

I changed the XGET call to use mongoindex instead of testmongo.

I kept getting: -> { "error":"IndexMissingException[[mongoindex] missing]","status":404}

I checked the myset.log file, and it contains the following:

[2012-08-02 01:52:05,434][INFO ][node ] [Decay] {0.19.8}[25663]: initializing ...
[2012-08-02 01:52:05,440][INFO ][plugins ] [Decay] loaded [], sites [river-mongodb]
[2012-08-02 01:52:06,534][INFO ][node ] [Decay] {0.19.8}[25663]: initialized
[2012-08-02 01:52:06,534][INFO ][node ] [Decay] {0.19.8}[25663]: starting ...
[2012-08-02 01:52:06,592][INFO ][transport ] [Decay] bound_address {inet[/127.0.0.1:9300]}, publish_address {inet[/127.0.0.1:9300]}
[2012-08-02 01:52:09,649][INFO ][cluster.service ] [Decay] new_master [Decay][M7x4p7G1Sr2U58D362eoyw][inet[/127.0.0.1:9300]], reason: zen-disco-join (elected_as_master)
[2012-08-02 01:52:09,700][INFO ][discovery ] [Decay] myset/M7x4p7G1Sr2U58D362eoyw
[2012-08-02 01:52:09,711][INFO ][http ] [Decay] bound_address {inet[/127.0.0.1:9200]}, publish_address {inet[/127.0.0.1:9200]}
[2012-08-02 01:52:09,712][INFO ][node ] [Decay] {0.19.8}[25663]: started
[2012-08-02 01:52:10,161][INFO ][gateway ] [Decay] recovered [2] indices into cluster_state
[2012-08-02 01:52:10,249][WARN ][river ] [Decay] failed to create river [mongodb][mongodb]
org.elasticsearch.common.settings.NoClassSettingsException: Failed to load class with value [mongodb]
at org.elasticsearch.river.RiverModule.loadTypeModule(RiverModule.java:86)
at org.elasticsearch.river.RiverModule.spawnModules(RiverModule.java:57)
at org.elasticsearch.common.inject.ModulesBuilder.add(ModulesBuilder.java:44)
at org.elasticsearch.river.RiversService.createRiver(RiversService.java:135)
at org.elasticsearch.river.RiversService$ApplyRivers$2.onResponse(RiversService.java:270)
at org.elasticsearch.river.RiversService$ApplyRivers$2.onResponse(RiversService.java:264)
at org.elasticsearch.action.support.TransportAction$ThreadedActionListener$1.run(TransportAction.java:86)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:680)
Caused by: java.lang.ClassNotFoundException: mongodb
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at org.elasticsearch.river.RiverModule.loadTypeModule(RiverModule.java:72)
... 9 more

It seems to load the river-mongodb plugin, but it warns that it cannot find the mongodb river type for some reason. How do I get it to find it? I do have this working for other projects.
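
One hedged observation: the startup line above reads "loaded [], sites [river-mongodb]", meaning the plugin was detected only as a site plugin and no jars made it onto the classpath, which matches the ClassNotFoundException. A quick check (expected jar names taken from the directory listings elsewhere on this page; versions vary):

  ls plugins/river-mongodb
  # should contain something like:
  #   elasticsearch-river-mongodb-<version>.jar
  #   mongo-java-driver-<version>.jar

If only a _site directory (or nothing) is there, reinstalling with bin/plugin should restore the jars.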
