
solandra's Introduction

Solandra

Solandra is a real-time distributed search engine built on Apache Solr and Apache Cassandra.

At its core, Solandra is a tight integration of Solr and Cassandra: both run within a single JVM, and documents are stored and distributed using Cassandra's data model.

Solandra makes managing and dynamically growing Solr simple(r).

For more information, please see the wiki.

#### Project Status ####

Solandra is relatively stable; the most commonly used functionality has been working reliably for users. I am personally no longer developing Solandra much beyond version changes and applying pull requests.

If you are looking for a supported Solr + Cassandra integration beyond what Solandra offers, look at DataStax Enterprise Search
(full disclosure: I am a developer on that team).

I've written up how Solandra and DataStax Enterprise Search differ here.

#### Requirements ####

  • Java >= 1.6
  • Cassandra >= 1.1
  • Solr >= 3.1

#### Features ####

  • Supports most out-of-the-box Solr functionality (search, faceting, highlights)
  • Replication, sharding, caching, and compaction managed by Cassandra
  • Multi-master (read/write to any node)
  • Writes become available as soon as the write succeeds
  • Easily add new SolrCores w/o restart across the cluster

#### Getting started ####

The following will guide you through setting up a single node instance of Solandra.

From the Solandra base directory:

    mkdir /tmp/cassandra-data
    ant
    cd solandra-app; bin/solandra

Now that Solandra is running you can run the demo:

    cd ../../reuters-demo
    ./1-download_data.sh
    ./2-import_data.sh

While data is loading, open the file ./website/index.html in your favorite browser.

#### Embedding in an existing Cassandra distribution ####

To use an existing Cassandra distribution perform the following steps.

  1. Download your Cassandra distribution

  2. Unzip it into the directory of your choice

  3. Run the following Solandra ant task to deploy the necessary files into the unzipped directory:

    ant -Dcassandra={unzipped dir} cassandra-dist

  4. You can now start Solr within Cassandra using the $CASSANDRA_HOME/bin/solandra command. Cassandra now accepts two optional properties, -Dsolandra.context and -Dsolandra.port, for the context path and the Jetty port.
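For example, to serve Solandra under the /solandra context on Jetty port 8983 (the values are illustrative, and this assumes the solandra script passes -D flags through to the JVM):

    $CASSANDRA_HOME/bin/solandra -Dsolandra.context=/solandra -Dsolandra.port=8983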

#### Limitations ####

Solandra uses Solr's built-in distributed searching mechanism. Most of its limitations are covered here:

http://wiki.apache.org/solr/DistributedSearch#Distributed_Searching_Limitations

solandra's People

Contributors

ash211, benmccann, ceocoder, cnauroth, davidstrauss, levmatta-umanni, milagre, tjake


solandra's Issues

NullPointerException in facet component

This error occurs for queries like this:

q=test&facet.date.start=NOW/DAY-6DAYS&facet.date.gap=%2B1DAY&facet.date.end=NOW/DAY%2B1DAY&fq=quality_i:[27+TO+*]&fq=dups_i:[*+TO+0]&fq=crt_b:"false"&f.dest_title_1_s.facet.mincount=1&facet=true&f.dest_title_1_s.facet.limit=12&facet.limit=10&facet.date={!ex%3Ddt}dt&f.tag.facet.mincount=2&wt=javabin&rows=15&version=1&f.tag.facet.limit=20&facet.sort=count&facet.query=dt:[NOW/HOURS-8HOURS+TO+*]&facet.query=dt:[*+TO+NOW/DAY-6DAYS]&facet.query=retw_i:[5+TO+*]&facet.query=retw_i:[20+TO+*]&facet.query=retw_i:[50+TO+*]&facet.query=dups_i:[*+TO+0]&facet.query=dups_i:[1+TO+*]&facet.query=quality_i:[*+TO+26]&facet.query=quality_i:[27+TO+*]&facet.query=url_i:[1+TO+*]&facet.query=url_i:0&start=0&facet.field=lang&facet.field=crt_b&facet.field=dest_title_1_s

This occurs only sometimes and only for the first query. I'll investigate whether this is related to the date facet, a normal facet, or a facet query.

 ERROR SolrCore:139 - java.lang.NullPointerException
    at org.apache.solr.handler.component.FacetComponent.countFacets(FacetComponent.java:260)
    at org.apache.solr.handler.component.FacetComponent.handleResponses(FacetComponent.java:230)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:290)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at solandra.SolandraDispatchFilter.execute(SolandraDispatchFilter.java:169)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    at solandra.SolandraDispatchFilter.doFilter(SolandraDispatchFilter.java:133)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1084)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:360)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)   
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:722)   
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:404)      
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:206)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
    at org.mortbay.jetty.Server.handle(Server.java:324)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:505)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:828)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:514)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:211)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:380)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
    at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:451)

Numeric fields are not properly stored or indexed

Hi Jake,
Take a look at my fork; I've added tests from Uwe's numeric tests in the Lucene core. Only a handful of the tests appear to be working. I'll be correcting this in my fork and will let you know when I'm done.

JVM crashed hard while feeding

Although the data on disk is under 200 MB, I got the following error, indicating insufficient RAM or swap space, when the JVM crashed hard:

A fatal error has been detected by the Java Runtime Environment:

java.lang.OutOfMemoryError: requested 536870920 bytes for Chunk::new. Out of swap space?

Internal Error (allocation.cpp:215), pid=3080, tid=1023273840
Error: Chunk::new

JRE version: 6.0_22-b04
Java VM: Java HotSpot(TM) Server VM (17.1-b03 mixed mode linux-x86 )
If you would like to submit a bug report, please visit:
http://java.sun.com/webapps/bugreport/crash.jsp

--------------- T H R E A D ---------------

Current thread (0x096d5400): JavaThread "CompilerThread1" daemon [_thread_in_native, id=3091, stack(0x3cf5e000,0x3cfdf000)]

Stack: [0x3cf5e000,0x3cfdf000], sp=0x3cfdc150, free space=1f83cfdf000k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x6a92d2]
V [libjvm.so+0x2b27cf]
V [libjvm.so+0x12e08c]
V [libjvm.so+0x12e586]
V [libjvm.so+0x5aa97b]
V [libjvm.so+0x5aa6ff]
V [libjvm.so+0x5acd3f]
V [libjvm.so+0x5ac780]
V [libjvm.so+0x27b71b]
V [libjvm.so+0x278093]
V [libjvm.so+0x2097b7]
V [libjvm.so+0x280fdc]
V [libjvm.so+0x280889]
V [libjvm.so+0x66ff16]
V [libjvm.so+0x6695fe]
V [libjvm.so+0x57a8fe]
C [libpthread.so.0+0x580e]

Current CompileTask:
C2:2085 org.apache.cassandra.db.ColumnIndexer.serializeInternal(Lorg/apache/cassandra/io/util/IIterableColumns;Ljava/io/DataOutput;)V (395 bytes)

--------------- P R O C E S S ---------------

Java Threads: ( => current thread )
0x09c68800 JavaThread "pool-6-thread-107" [_thread_blocked, id=3725, stack(0x3a9be000,0x3a9df000)]
0x09c68000 JavaThread "pool-6-thread-106" [_thread_blocked, id=3724, stack(0x3ca3a000,0x3ca5b000)]
0x09915800 JavaThread "btpool0-10" [_thread_blocked, id=3489, stack(0x3a93a000,0x3a95b000)]
0x09c79800 JavaThread "MultiThreadedHttpConnectionManager cleanup" daemon [_thread_blocked, id=3208, stack(0x3a9df000,0x3aa00000)]
0x0998e000 JavaThread "pool-5-thread-1" [_thread_blocked, id=3205, stack(0x3ad4d000,0x3ad6e000)]
0x098ba800 JavaThread "pool-4-thread-1" [_thread_blocked, id=3204, stack(0x3ad6e000,0x3ad8f000)]
0x3b8b3800 JavaThread "DestroyJavaVM" [_thread_blocked, id=3081, stack(0xb6b5b000,0xb6b7c000)]
0x3b8a2400 JavaThread "WRITE-/127.0.0.1" [_thread_blocked, id=3203, stack(0x3ad8f000,0x3adb0000)]
0x3b8a1400 JavaThread "WRITE-/127.0.0.1" [_thread_blocked, id=3202, stack(0x3adb0000,0x3add1000)]
0x3b8a0800 JavaThread "ACCEPT-peters-laptop/127.0.1.1" [_thread_in_native, id=3201, stack(0x3add1000,0x3adf2000)]
0x0986dc00 JavaThread "CompactionExecutor:1" [_thread_blocked, id=3200, stack(0x3adf2000,0x3ae13000)]
0x3b89c800 JavaThread "MiscStage:1" [_thread_blocked, id=3199, stack(0x3ae13000,0x3ae34000)]
0x3b89b000 JavaThread "MigrationStage:1" [_thread_blocked, id=3198, stack(0x3ae34000,0x3ae55000)]
0x3b899800 JavaThread "AntiEntropyStage:1" [_thread_blocked, id=3197, stack(0x3ae55000,0x3ae76000)]
0x3b898000 JavaThread "GossipStage:1" [_thread_blocked, id=3196, stack(0x3ae76000,0x3ae97000)]
0x3b896800 JavaThread "StreamStage:1" [_thread_blocked, id=3195, stack(0x3ae97000,0x3aeb8000)]
0x3b895000 JavaThread "InternalResponseStage:2" [_thread_blocked, id=3194, stack(0x3aeb8000,0x3aed9000)]
0x3b893c00 JavaThread "InternalResponseStage:1" [_thread_blocked, id=3193, stack(0x3aed9000,0x3aefa000)]
0x3b892400 JavaThread "RequestResponseStage:2" [_thread_blocked, id=3192, stack(0x3aefa000,0x3af1b000)]
0x3b890c00 JavaThread "RequestResponseStage:1" [_thread_blocked, id=3191, stack(0x3af1b000,0x3af3c000)]
0x3b88f400 JavaThread "ReadStage:32" [_thread_blocked, id=3190, stack(0x3af3c000,0x3af5d000)]
0x3b88dc00 JavaThread "ReadStage:31" [_thread_blocked, id=3189, stack(0x3af5d000,0x3af7e000)]
...

0x097bac00 JavaThread "ReadRepair:1" [_thread_blocked, id=3114, stack(0x3bb51000,0x3bb72000)]
0x3b837000 JavaThread "Thread-16" [_thread_in_native, id=3113, stack(0x3bb86000,0x3bba7000)]
0x3b839000 JavaThread "pool-2-thread-1" [_thread_blocked, id=3112, stack(0x3bba7000,0x3bbc8000)]
0x3b843800 JavaThread "pool-1-thread-1" [_thread_blocked, id=3111, stack(0x3b9cc000,0x3b9ed000)]
0x0995fc00 JavaThread "Timer-2" daemon [_thread_blocked, id=3110, stack(0x3c64e000,0x3c66f000)]
0x3cb48400 JavaThread "btpool0-9" [_thread_blocked, id=3108, stack(0x3c8df000,0x3c900000)]
0x3cd0d400 JavaThread "btpool0-8" [_thread_in_native, id=3107, stack(0x3ca19000,0x3ca3a000)]
0x3cd88400 JavaThread "btpool0-6" [_thread_in_native, id=3105, stack(0x3ca5b000,0x3ca7c000)]
0x3cc59800 JavaThread "btpool0-5" [_thread_in_native, id=3104, stack(0x3ca7c000,0x3ca9d000)]
0x3cdfa400 JavaThread "btpool0-4" [_thread_blocked, id=3103, stack(0x3ca9d000,0x3cabe000)]
0x3cc30000 JavaThread "btpool0-3" [_thread_blocked, id=3102, stack(0x3cabe000,0x3cadf000)]
0x3ccadc00 JavaThread "btpool0-2 - Acceptor0 [email protected]:8983" [_thread_in_native, id=3101, stack(0x3cadf000,0x3cb00000)]
0x0979f000 JavaThread "btpool0-1" [_thread_blocked, id=3100, stack(0x3ce00000,0x3ce21000)]
0x09877000 JavaThread "btpool0-0" [_thread_blocked, id=3099, stack(0x3ce21000,0x3ce42000)]
0x0978a000 JavaThread "Timer-1" daemon [_thread_blocked, id=3098, stack(0x3ce42000,0x3ce63000)]
0x09780c00 JavaThread "Timer-0" daemon [_thread_blocked, id=3097, stack(0x3ce63000,0x3ce84000)]
0x3d0fb000 JavaThread "RMI TCP Accept-0" daemon [_thread_in_native, id=3095, stack(0x3cf1c000,0x3cf3d000)]
0x3d0f7000 JavaThread "RMI TCP Accept-8084" daemon [_thread_in_native, id=3094, stack(0x3cf3d000,0x3cf5e000)]
0x3d08fc00 JavaThread "RMI TCP Accept-0" daemon [_thread_in_native, id=3093, stack(0x3d11e000,0x3d13f000)]
0x3d001000 JavaThread "Low Memory Detector" daemon [_thread_blocked, id=3092, stack(0x3d153000,0x3d174000)]
=>0x096d5400 JavaThread "CompilerThread1" daemon [_thread_in_native, id=3091, stack(0x3cf5e000,0x3cfdf000)]
0x096d3c00 JavaThread "CompilerThread0" daemon [_thread_blocked, id=3090, stack(0x3d174000,0x3d1f5000)]
0x096d2000 JavaThread "Signal Dispatcher" daemon [_thread_blocked, id=3089, stack(0x3cfdf000,0x3d000000)]
0x096d0800 JavaThread "Surrogate Locker Thread (CMS)" daemon [_thread_blocked, id=3088, stack(0x3d1f5000,0x3d216000)]
0x096bd000 JavaThread "Finalizer" daemon [_thread_blocked, id=3087, stack(0x3d25c000,0x3d27d000)]
0x096bb800 JavaThread "Reference Handler" daemon [_thread_blocked, id=3086, stack(0x3d27d000,0x3d29e000)]

Other Threads:
0x096b9000 VMThread [stack: 0x3d29e000,0x3d31f000] [id=3085]
0x3d0fd400 WatcherThread [stack: 0x3ce9b000,0x3cf1c000] [id=3096]

VM state:not at safepoint (normal execution)

VM Mutex/Monitor currently owned by a thread: None

Heap
par new generation total 29504K, used 23569K [0x41910000, 0x43910000, 0x43910000)
eden space 26240K, 84% used [0x41910000, 0x42e9a248, 0x432b0000)
from space 3264K, 46% used [0x432b0000, 0x4342a1c8, 0x435e0000)
to space 3264K, 0% used [0x435e0000, 0x435e0000, 0x43910000)
concurrent mark-sweep generation total 1769472K, used 1041301K [0x43910000, 0xaf910000, 0xaf910000)
concurrent-mark-sweep perm gen total 34468K, used 20701K [0xaf910000, 0xb1ab9000, 0xb3910000)

Dynamic libraries:
08048000-08052000 r-xp 00000000 08:07 394700 /usr/lib/jvm/java-6-sun-1.6.0.22/jre/bin/java
08052000-08053000 rwxp 00009000 08:07 394700 /usr/lib/jvm/java-6-sun-1.6.0.22/jre/bin/java
095ee000-0a40c000 rwxp 00000000 00:00 0 [heap]
18ff9000-38c00000 rwxp 00000000 00:00 0

...

VM Arguments:
jvm_args: -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms1759M -Xmx1759M -XX:+HeapDumpOnOutOfMemoryError -Xss128k -XX:+UseParNewGC -XX:+UseConcMarkSweep ... DEFAULT ... -Dshards.at.once=2
java_command: start.jar etc/jetty-logging.xml etc/jetty.xml
Launcher Type: SUN_STANDARD
--------------- S Y S T E M ---------------

OS:squeeze/sid

uname:Linux 2.6.31-22-generic #69-Ubuntu SMP Wed Nov 24 08:51:08 UTC 2010 i686
libc:glibc 2.10.1 NPTL 2.10.1
rlimit: STACK 8192k, CORE 0k, NPROC infinity, NOFILE 1024, AS infinity
load average:3.57 2.73 2.29

CPU:total 2 (2 cores per cpu, 1 threads per core) family 6 model 23 stepping 10, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1

Memory: 4k page, physical 3602792k(104512k free), swap 7815580k(7647756k free)

vm_info: Java HotSpot(TM) Server VM (17.1-b03) for linux-x86 JRE (1.6.0_22-b04), built on Sep 15 2010 01:02:09 by "java_re" with gcc 3.2.1-7a (J2SE release)

time: Thu Jan 6 13:52:18 2011
elapsed time: 1896 seconds

Lucandra Index corruption

Hi Jake,
I am using Lucandra in my application.
We create rules in our application, and I index them on the fly in Lucandra.
As rules change, I need to update the corresponding index as well.
But after a few updates, the index gets corrupted.
Scenario:
I update the Lucandra document using the updateDocument(updateTerm, doc, analyzer) method.
It works the first and second time, but the third time it corrupts the indexed field's value.
I have verified that the value I am inserting (the third time) is correct.
I am using a web-based UI tool for Cassandra (CassUI) to see/verify the values in the Lucandra document.
So although I can see the value sitting in the Lucandra document, when I query for it I do not get any result.
The IndexSearcher is not able to find that document (with my query naming the value inserted the third time).
While debugging I verified that the proper indexed value is being inserted into the document, but somehow something else is getting corrupted, because the IndexSearcher cannot find that term in the document.

Is there a way to diagnose a corrupt index?
Please help, I am stuck on this problem!

Only 100 fields returned in query

I am implementing some military data formats as Solr schemas. The GMTI format has over 200 fields, many of them multivalued. I created the schema file and put data in the database with no problem. I can query on any of the fields, and I get a hit if the query condition matches the data.

The document that is returned is truncated to 100 fields. I have tried calling setFacetLimit and setRows with a number much higher than 100, but that doesn't help.

Is there some parameter I have to set to retrieve more than 100 fields? I know the fields are in the document that was saved, because I can search on them.
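For reference, here is a minimal SolrJ sketch of the kind of check involved (the URL, index name, and field handling are hypothetical; note that setRows limits the number of documents returned and setFacetLimit limits facet values, not the number of fields per document, which is what makes this truncation surprising):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;

    public class FieldCountCheck {
        public static void main(String[] args) throws Exception {
            // URL and index name are hypothetical.
            CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solandra/gmti");

            SolrQuery q = new SolrQuery("*:*");
            q.setFields("*"); // explicitly request all stored fields
            q.setRows(1);     // rows limits documents, not fields per document

            QueryResponse rsp = server.query(q);
            for (SolrDocument doc : rsp.getResults()) {
                // Count how many fields actually came back for each document.
                System.out.println("fields returned: " + doc.getFieldNames().size());
            }
        }
    }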

Help!

tjake, please help me

Hello tjake,
I am using Lucandra, but I find that when I search data through a BooleanQuery I cannot get all the data that satisfies it; one result is always missing. Please tell me why.

README CF definitions not up to date

The following CF definition appears to be required, but is not mentioned in the README:

  <ColumnFamily CompareWith="BytesType" ColumnType="Super" Name="TermInfo"/>

TermQuery Search leads to exception "termFrequency is missing from supercolumn"

I'm trying to insert a url and then search for it again using a TermQuery search. The code below, which I run right inside the BookmarksDemo, results in a "termFrequency is missing from supercolumn" exception in the TopDocs line.

    Document doc = new Document();
    doc.add(new Field("url", "http://www.google.com",
                      Field.Store.YES, Field.Index.NOT_ANALYZED));
    indexWriter.addDocument(doc, analyzer);
    TermQuery tq = new TermQuery(new Term("url", "http://www.google.com"));
    TopDocs topDocs = indexSearcher.search(tq, 10);

Note that if you change the url you are searching for to, say, "http://www.amazon.com", the code finds 0 results without an exception (this is correct since amazon isn't indexed).

error on ./run-demo search

Hi,

I set up Lucandra and Cassandra 0.6 and was able to index successfully, but searching for linux throws this error:

z3r0c001@legolas:~/workspace/Libraries/Lucandra$ ./run_demo.sh -search title:linux
InvalidRequestException(why:start key's md5 sorts after end key's md5. this is not allowed; you probably should not specify end key at all, under RandomPartitioner)

but searching for amazon returns normal results:

z3r0c001@legolas:~/workspace/Libraries/Lucandra$ ./run_demo.sh -search title:amazon
15:53:16,707 DEBUG LucandraTermEnum:199 - Found 2 keys in range:bookmarksÿÿtitleÿÿamazon to bookmarksÿÿtitlf in 39ms
15:53:16,710 DEBUG LucandraTermEnum:213 - titleÿÿamazon has 2
15:53:16,711 DEBUG LucandraTermEnum:213 - tagsÿÿachieving has 1
15:53:16,712 DEBUG LucandraTermEnum:248 - loadTerms: bookmarksÿÿtitleÿÿamazon(3) took 45ms
15:53:16,713 INFO IndexReader:135 - docFreq() took: 47ms
15:53:16,716 DEBUG LucandraTermEnum:154 - Found bookmarksÿÿtitleÿÿamazon in cache
Search matched: 2 item(s)
15:53:16,744 DEBUG IndexReader:193 - Document read took: 26ms

  1. Setting up an Amazon EC2 server with Fedora Core 5 at Notes from a messy desk
    http://woss.name/2006/09/19/setting-up-an-amazon-ec2-server-with-fedora-core-5/
    15:53:16,749 DEBUG IndexReader:193 - Document read took: 5ms
  2. Amazon Web Services Developer Connection : HOWTO Building a self-bundling Debian ...
    http://developer.amazonwebservices.com/connect/thread.jspa?threadID=17046

pls let me know if I can help.
thanks
z3r0

ArrayIndexOutOfBoundsException when using MoreLikeThis functionality in lucene.

I have some test cases that reproduce the problem; it appears an empty field will screw up the Lucandra caching. Note that if I call clearCache() the test code runs correctly.

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 2
at org.apache.lucene.search.TermScorer.score(TermScorer.java:130)
at org.apache.lucene.search.BooleanScorer$BooleanScorerCollector.collect(BooleanScorer.java:80)
at org.apache.lucene.search.BooleanScorer.nextDoc(BooleanScorer.java:313)
at org.apache.lucene.search.BooleanScorer.score(BooleanScorer.java:330)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:207)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:168)
at org.apache.lucene.search.Searcher.search(Searcher.java:98)
at org.apache.lucene.search.Searcher.search(Searcher.java:108)
at ReproduceBug.findSimilarItems(ReproduceBug.java:133)
at ReproduceBug.main(ReproduceBug.java:99)

Sort results fails

Hello, it's impossible to sort results with a huge database (many millions of entries):
Exception in thread "main" java.lang.IllegalStateException: numDocs reached.
And I don't really understand the use of "numDocs" in the IndexReader...
Thanks for your help.

Solandra: Issue with shards

We've been able to set up our index in the Solandra edition (Lucandra and Cassandra in the same JVM), but once the index shards, our queries no longer work. In SolandraComponent I see the following line: String shard = addr.getHostAddress() + ":8983/solr/" + indexName + "~" + i; which references a Solr instance in Jetty; however, that no longer appears to be part of the build.

Is this a mistake, or is it expected that a Solr instance is installed alongside the Solandra webapp?

Thanks in advance.

Quote search terms can cause search failure

Searching for a multi-word phrase in double quotes yields a null. Below I've modified the bookmarks demo to show how to reproduce the error. On my machine (Mac, Snow Leopard) with the newest Lucandra (from today), I see A and B print, but not C. I think an error is being swallowed, as at the very end I get a "null" printed. Note that if I change the phrase to "hello world" or something else that doesn't appear in a document, everything goes fine: "C" is printed, and I see "Search matched: 0 item(s)" at the end.

    QueryParser qp = new QueryParser(Version.LUCENE_CURRENT, "title", analyzer);
System.out.println("A");
    Query q = qp.parse("\"set up\"");
    System.out.println("B");
    TopDocs docs = indexSearcher.search(q, 10);
    System.out.println("C");
    ...

ArrayIndexOutofBoundsException

Hello Jake,

This use case doesn't work if you add it to the LucandraTest class. Could you have a look, please?

public void testReproduceScoringError() throws Exception {

    IndexReader indexReader = new IndexReader(indexName, client);

    // Explicit call to clearCache because of the ThreadLocal shared
    // instance of cache...
    indexReader.clearCache();

    for (int i = 0; i < 10; i++) {
        Document doc1 = new Document();
        doc1.add(new Field("newkey", "newkey" + i, Field.Store.YES, Field.Index.ANALYZED, TermVector.WITH_POSITIONS_OFFSETS));
        doc1.add(new Field("category", "category1", Field.Store.YES, Field.Index.ANALYZED, TermVector.WITH_POSITIONS_OFFSETS));
        indexWriter.addDocument(doc1, analyzer);
    }

    IndexSearcher searcher = new IndexSearcher(indexReader);

    QueryParser qp = new QueryParser(Version.LUCENE_CURRENT, "newkey", analyzer);
    Query q = qp.parse("newkey:newkey1 OR category:category1");
    TopDocs docs = searcher.search(q, 100);

    assertEquals(10, docs.totalHits);
}

Lucandra slows down when the value of a term is often the same

Hello tjake,

When the value of a term occurs often in a database, Lucandra slows down. I've got a database with two categories (it's a Term) and many millions of other Terms. If I want to filter on one of the two categories, Lucandra fetches half the database because of the Cassandra "sliceRange".
Any idea how to resolve this issue?
Thanks.

solr not working on integers?

I posted the default XMLs from the Solr examples.
I tried to sort on the int field and it's not working ("sort=popularity desc");
range queries on the int field are not working either (popularity:[0 TO 10]).

bug in CassandraUtils.java, line 191

--- intArray[idx] = (b[i] << 24) + ((b[i + 1] & 0xFF) << 16) + ((b[i + 2] & 0xFF) << 8) + (b[i + 3] & 0xFF);
+++ intArray[idx++] = (b[i] << 24) + ((b[i + 1] & 0xFF) << 16) + ((b[i + 2] & 0xFF) << 8) + (b[i + 3] & 0xFF);
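For context, here is a hypothetical reconstruction of the surrounding loop (a sketch, not the actual Lucandra source): it converts a big-endian byte array into an int array, and without the post-increment every 4-byte word lands in slot 0:

    // Hypothetical sketch of the conversion loop around line 191.
    static int[] byteArrayToIntArray(byte[] b) {
        int[] intArray = new int[b.length / 4];
        int idx = 0;
        for (int i = 0; i < b.length; i += 4) {
            // With plain `idx` instead of `idx++`, every word would
            // overwrite intArray[0] and the rest would stay zero.
            intArray[idx++] = (b[i] << 24) + ((b[i + 1] & 0xFF) << 16)
                            + ((b[i + 2] & 0xFF) << 8) + (b[i + 3] & 0xFF);
        }
        return intArray;
    }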

zoie support

Hi,
"Zoie is a real-time search and indexing system built on Apache Lucene." http://sna-projects.com/zoie/

Zoie is much more efficient at indexing, so it's better suited to this kind of operation. LinkedIn developed it and uses it. It would be perfect if you supported Zoie or replaced Lucene with it.

Best Regards,

Very large index size with Lucandra

Hi,

I recently shifted from Lucene 2.4.1 to Lucandra. What I noticed is that earlier, for the same data, the index size was much smaller than the total data size shown by nodetool for the Lucandra host. I did a flush before checking the size of the commitlog and data folders on the Lucandra host. Earlier the index size used to be 100 MB, whereas for the same operation the data size on Lucandra is in the GBs.

Is this normal? What are the typical index sizes that others have seen on Lucandra?

Thanks!

Log4j bridge missing from ivy

I might have missed something - apologies if so - but it looks like you need the log4j bridge in ivy. Here is a patch

diff --git a/ivy.xml b/ivy.xml
index 605c8c1..153a705 100644
--- a/ivy.xml
+++ b/ivy.xml
@@ -23,6 +23,7 @@
 <dependency org="junit" name="junit" rev="4.6" conf="* -> *,!sources,!javadoc" />
 <dependency org="org.slf4j" name="slf4j-api" rev="1.5.8" conf="* -> *,!sources,!javadoc"/>
 <dependency org="org.slf4j" name="slf4j-simple" rev="1.5.8" conf="* -> *,!sources,!javadoc"/>
+    <dependency org="org.slf4j" name="log4j-over-slf4j" rev="1.5.8" conf="* -> *,!sources,!javadoc
 <dependency org="commons-lang" name="commons-lang" rev="2.4" conf="* -> *,!sources,!javadoc"/>
 <dependency org="org.apache.solr" name="solr-core" rev="1.4.0" conf="* -> *,!sources,!javadoc"
 </dependencies>

Sort doesn't work correctly in multi-thread.

Hi, tjake

I've hit an issue where sort does not work correctly when I use Lucandra in multiple threads.

The main reason is that IndexReader uses ThreadLocal.

Lucene caches a field when sorting. It uses the IndexReader instance to look up caches, like the following (copied from FieldCacheImpl):

    // inherit javadocs
    public int[] getInts(IndexReader reader, String field, IntParser parser)
            throws IOException {
        return (int[]) caches.get(Integer.TYPE).get(reader, new Entry(field, parser));
    }

Since I share the IndexReader instance across several threads, this code returns the same field cache. However, the data of the IndexReader differs between threads, so Lucene may use an unintended cache created in another thread. This causes incorrect sorting.

The supportive care for this bug is not to share an IndexReader across several threads. However, Lucene says IndexReader is thread-safe, so I think the Lucandra IndexReader should be fixed.
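Until that is fixed, a workaround in line with the analysis above is to give each thread its own reader. A minimal sketch, assuming the lucandra.IndexReader(indexName, client) constructor seen elsewhere on this page and a Thrift Cassandra.Client connection (both are assumptions, and the connection setup is left as a placeholder):

    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.lucene.search.IndexSearcher;

    public class PerThreadReader {
        static final String indexName = "bookmarks"; // illustrative

        // Placeholder: open a fresh Thrift connection for this thread.
        static Cassandra.Client newClient() {
            throw new UnsupportedOperationException("open a Thrift connection here");
        }

        // One reader per thread, so Lucene's FieldCache (keyed by the
        // IndexReader instance) never mixes data across threads.
        static final ThreadLocal<lucandra.IndexReader> READER =
            new ThreadLocal<lucandra.IndexReader>() {
                @Override
                protected lucandra.IndexReader initialValue() {
                    return new lucandra.IndexReader(indexName, newClient());
                }
            };

        static IndexSearcher searcher() {
            return new IndexSearcher(READER.get());
        }
    }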

log4j missing

When trying the steps to get the bookmark demo running, I needed to put "log4j-1.2.16.jar" in the lib/ directory for it to compile successfully.
This might be worth adding as a dependency.

updateDocument call without deleting document

Currently updateDocument works like this:

    deleteDocument(doc);
    addDocument(doc);

I was trying to see if update can be made to work like this:

    updateDocument(doc, analyzer) {
        foreach (field in doc)
            if (field exists) {
                delete its columns from TermVector CF
                add newly tokenized terms under same docId
            } else {
                add newly tokenized terms under same docId
            }
    }

Changing update to this will save a lot on performance:

  • no more parsing the entire document object to update one field
  • changes to Cassandra will be a lot smaller

thanks,

NullPointerExceptions when using dismax

When using the demo:

  • change dismax to this (so that all fields can be found):


    <str name="defType">dismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">text^0.5 title^2</str>
    <str name="mm">2&lt;-1 5&lt;-2 6&lt;90%</str>
    <int name="ps">100</int>
    <str name="q.alt">*:*</str>

  • feed the reuters demo schema.xml (not the data)

  • http://localhost:8983/solandra/reuters/select?q=&qt=dismax
    =>

    java.lang.NullPointerException
    at lucandra.LucandraAllTermDocs.fillDocBuffer(LucandraAllTermDocs.java:150)
    at lucandra.LucandraAllTermDocs.<init>(LucandraAllTermDocs.java:63)
    at lucandra.IndexReader.termDocs(IndexReader.java:404)
    at org.apache.solr.search.SolrIndexReader.termDocs(SolrIndexReader.java:320)
    at org.apache.lucene.search.MatchAllDocsQuery$MatchAllScorer.<init>(MatchAllDocsQuery.java:55)
    at org.apache.lucene.search.MatchAllDocsQuery$MatchAllDocsWeight.scorer(MatchAllDocsQuery.java:128)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:250)
    at org.apache.lucene.search.Searcher.search(Searcher.java:171)
    at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:988)
    at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:884)
    at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:341)
    at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:182)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at solandra.SolandraDispatchFilter.execute(SolandraDispatchFilter.java:169)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    at solandra.SolandraDispatchFilter.doFilter(SolandraDispatchFilter.java:133)

Sometimes the exception does not occur after data feeding, but the initial query always failed in my tests.

While querying during feeding, other exceptions sometimes occurred (randomly?).
E.g.
http://localhost:8983/solandra/reuters/select?q=text&qt=dismax
=>

    java.lang.NullPointerException
at lucandra.LucandraTermEnum.skipTo(LucandraTermEnum.java:87)
at lucandra.LucandraTermDocs.seek(LucandraTermDocs.java:115)
at org.apache.lucene.index.IndexReader.termDocs(IndexReader.java:1103)
at lucandra.IndexReader.termDocs(IndexReader.java:406)
at org.apache.solr.search.SolrIndexReader.termDocs(SolrIndexReader.java:320)
at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:67)
at org.apache.lucene.search.DisjunctionMaxQuery$DisjunctionMaxWeight.scorer(DisjunctionMaxQuery.java:144)
at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:297)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:250)
at org.apache.lucene.search.Searcher.search(Searcher.java:171)
at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:988)
at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:884)
at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:341)
at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:182)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at solandra.SolandraDispatchFilter.execute(SolandraDispatchFilter.java:169)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at solandra.SolandraDispatchFilter.doFilter(SolandraDispatchFilter.java:133)

and after a refresh this one happened:

java.lang.NullPointerException
at lucandra.LucandraTermEnum.docFreq(LucandraTermEnum.java:99)
at lucandra.IndexReader.docFreq(IndexReader.java:177)
at org.apache.solr.search.SolrIndexReader.docFreq(SolrIndexReader.java:308)
at org.apache.lucene.search.IndexSearcher.docFreq(IndexSearcher.java:147)
at org.apache.lucene.search.Similarity.idfExplain(Similarity.java:765)
at org.apache.lucene.search.TermQuery$TermWeight.<init>(TermQuery.java:46)
at org.apache.lucene.search.TermQuery.createWeight(TermQuery.java:146)
at org.apache.lucene.search.DisjunctionMaxQuery$DisjunctionMaxWeight.<init>(DisjunctionMaxQuery.java:106)
at org.apache.lucene.search.DisjunctionMaxQuery.createWeight(DisjunctionMaxQuery.java:177)
at org.apache.lucene.search.BooleanQuery$BooleanWeight.<init>(BooleanQuery.java:184)
at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:415)
at org.apache.lucene.search.Query.weight(Query.java:99)
at org.apache.lucene.search.Searcher.createWeight(Searcher.java:230)
at org.apache.lucene.search.Searcher.search(Searcher.java:171)
at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:988)
at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:884)
at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:341)
at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:182)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at solandra.SolandraDispatchFilter.execute(SolandraDispatchFilter.java:169)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at solandra.SolandraDispatchFilter.doFilter(SolandraDispatchFilter.java:133)

[RuntimeException: termFrequency is missing from supercolumn] on wildcard search in Lucandra

Hi Jake,

I am using Lucandra for indexing. I am adding a field to a document as follows:

document.add(new Field("fieldName", "val!IN_20002001!IN_20002002!OUT_20002003!OUT_20002004", Store.NO, Index.NOT_ANALYZED));

Along with the field above, I have added a bunch of other fields to the document and then indexed it using the indexWriter as follows:

indexWriter.addDocument(document, analyzer);

So when I do a query such as the following one:
"fieldName:val_"
I get the exception mentioned in the title above,
and if I do a query like
"fieldName:val_IN_2000*"
I don't get an exception, but I get 0 matching results.
Am I doing something wrong?

FYI... I am using OrderPreservingPartitioner in Cassandra.

Query on Lucandra usage

Hi,

I have just started using Lucandra for my project. I need to index data for users. The index update and search mix is 60-40%. I am expecting around 10-20M update and 10-12M search operations per day. I would like to understand the right way to use Lucandra.

  1. I am using the UserId as the index name parameter to IndexWriter and IndexReader. There may be approx 1M users. Is it okay to use the UserId as the index name?
  2. Should we create a new IndexWriter and IndexReader for every update and search? Note that since we are using the UserId as the index name, we cannot keep all IndexWriter and IndexReader instances open all the time.
  3. Should we always invoke reopen on the IndexReader before every search?
  4. What would be an optimal Cassandra config to start with for Lucandra? I understand some parameters will need to be configured based on usage.
  5. Any other recommendations?

Thanks,
Naren

only one value gets stored on multivalued fields

Please forgive me if I'm totally wrong, but I'm not much of a Solr or Java guy. I ran through your getting started guide, and it seems as though multivalued fields like category only store one value even though the document had many. It still seems to match on all of the values, but only stores one. Is this desired? I would prefer to have all the values there so I didn't have to go hit another source...

Thanks for your hard work!

Problem with accents

I am developing for the pt_BR locale (lots of ã, õ, ç, á, ...).
Searching for words with these diacritic marks is not working!

With plain lucene I was using the following analyzer without problems:

    StandardAnalyzer defaultAnalyzer = new StandardAnalyzer(Version.LUCENE_30) {
        @Override
        public TokenStream tokenStream(String fieldName, Reader reader) {
            return new ASCIIFoldingFilter(super.tokenStream(fieldName, reader));
        }
    };

    PerFieldAnalyzerWrapper analyser = new PerFieldAnalyzerWrapper(defaultAnalyzer);
    analyser.addAnalyzer(TAGS.getFieldName(), new CommaAnalyzer());
    analyser.addAnalyzer(StandardFieldEnum.PARENTS.getFieldName(), new CommaAnalyzer());

    return analyser;

Please ignore the PerFieldAnalyzerWrapper; it is there just to be completely honest :).

Any Ideas?
Thanks.

duplicate term key additions

IndexWriter
public void addDocument(Document doc, Analyzer analyzer) {
...
// Indexed field
line 97:
if (field.isIndexed()) {
...
}
//Untokenized fields go in without a termPosition
line 220:
if (field.isIndexed() && !field.isTokenized()) {

If the field is indexed but not analyzed, then the term key will be added twice. It looks to me like line 97 should be changed to:
if (field.isIndexed() && field.isTokenized()) {

Please verify.

Thanks,
Chad
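For illustration, here is a condensed paraphrase of the interaction described above (the helper names are hypothetical; this is not the actual Lucandra source):

    import org.apache.lucene.document.Field;

    class DuplicateTermKeySketch {
        void addDocumentSketch(Field field) {
            // "line 97" path: runs for every indexed field, tokenized or not.
            if (field.isIndexed()) {
                addTermKey(field, true);   // with term positions
            }
            // "line 220" path: runs again for indexed, untokenized fields,
            // so their term key is written a second time.
            if (field.isIndexed() && !field.isTokenized()) {
                addTermKey(field, false);  // without a termPosition
            }
            // Suggested fix: guard the first branch with field.isTokenized().
        }

        void addTermKey(Field field, boolean withPosition) {
            // placeholder for the Cassandra write (hypothetical)
        }
    }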

delete doesn't always delete

If I change the testDelete unit test to:

public void testDelete() throws Exception {

    IndexReader indexReader = new IndexReader(indexName, client);
    IndexSearcher searcher = new IndexSearcher(indexReader);

    QueryParser qp = new QueryParser(Version.LUCENE_30, "key", analyzer);
    Query q = qp.parse("+key:\u5639\u563b");

    TopDocs docs = searcher.search(q, 10);

    assertEquals(1, docs.totalHits);


    indexWriter.deleteDocuments(new Term("key",new String("\u5639\u563b")));

    QueryParser qp2 = new QueryParser(Version.LUCENE_30, "key", analyzer);
    Query q2 = qp2.parse("+key:\u5639\u563b");

    TopDocs docs2 = searcher.search(q2, 10);

    ScoreDoc scoreDoc = docs.scoreDocs[0];   
    Document doc = searcher.doc(scoreDoc.doc);

    assertEquals(0, doc.getFields().size());

    assertEquals(0, docs2.totalHits);
}

the unit test fails. I have not honestly looked at the details (hopefully I will have time in a few days); however, I can guess that this is due to Cassandra not deleting the KEY underneath (but I could be completely wrong). Note that the Document that is returned has no Fields in it. I would think that these should be removed from the list of items returned in TopDocs.

It's also possible I don't understand something basic in Lucene itself.

Can't delete all

org.apache.solr.common.SolrException: can't delete all: delete:query=*:*,fromPending=true,fromCommitted=true at solandra.SolandraIndexWriter.deleteByQuery(SolandraIndexWriter.java:220)

I am trying to use Lucandra for an existing Solr index and getting this error when running the import command.

Looking at the Lucandra code, it is coded to throw this error. Is there any way to work around this?

SolrReopenComponent Exception

null

java.io.IOException
at lucandra.IndexReader.document(IndexReader.java:281)
at org.apache.solr.search.SolrIndexReader.document(SolrIndexReader.java:259)
at solandra.SolandraReopenComponent.process(SolandraReopenComponent.java:98)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

How to reproduce:

  • Launch 2 instances of cassandra forming a 2 node cluster ring
  • Start solandra on both nodes
  • ./post.sh something from node A
  • query solr for that something from node B (works)
  • kill node A's cassandra instance
  • repeat the same query from node B (throws an exception) - the data can be retrieved from node B through Cassandra, but not through Solandra

after ingesting for about 20 minutes solandra throws protocol violation error

On Client Side

xml error - The operation has timed out
xml error - The server committed a protocol violation. Section=ResponseStatusLine

On server side,

172.16.5.11

10:27:14,426 INFO GCInspector:184 - Lucandra.TermInfo 0,0 0/1000 0/10000000
10:27:14,432 INFO CassandraIndexManager:148 - ShardInfo for htmlsni has expired
10:27:14,436 INFO SSTableTracker:65 - saving system-LocationInfo-KeyCache for LocationInfo of system
10:27:46,829 INFO GCInspector:133 - GC for ConcurrentMarkSweep: 14490 ms, 84168 reclaimed leaving 6395477472 used; max is 6552551424
10:27:46,829 INFO GCInspector:146 - Pool Name Active Pending
10:27:46,830 INFO GCInspector:161 - ReadStage 0 0
10:27:46,830 INFO GCInspector:161 - RequestResponseStage 0 0
10:27:46,830 INFO GCInspector:161 - ReadRepair 0 0
10:27:46,831 INFO GCInspector:161 - MutationStage 18 20
10:27:46,831 INFO GCInspector:161 - GossipStage 0 0
10:27:46,831 INFO GCInspector:161 - AntiEntropyStage 0 0
10:29:17,128 INFO UpdateRequestProcessor:171 - {} 0 812628
10:29:17,131 INFO GCInspector:161 - MigrationStage 0 0
10:29:17,131 INFO GCInspector:161 - StreamStage 0 0
10:29:17,132 INFO GCInspector:161 - MemtablePostFlusher 1 1
10:29:17,132 INFO GCInspector:161 - FlushWriter 1 1
10:29:17,132 INFO GCInspector:161 - MiscStage 0 0
10:29:17,132 INFO GCInspector:161 - FlushSorter 0 0
10:29:17,132 INFO GCInspector:161 - InternalResponseStage 0 0
10:29:17,133 INFO GCInspector:161 - HintedHandoff 0 0
10:29:17,133 INFO GCInspector:165 - CompactionManager n/a 76
10:29:17,133 INFO GCInspector:177 - MessagingService n/a 141,202
10:29:17,133 INFO GCInspector:181 - ColumnFamily Memtable ops,data Row cache size/cap Key cache size/cap
10:29:17,133 INFO GCInspector:184 - system.LocationInfo 0,0 0/0 1/1
10:29:17,134 INFO GCInspector:184 - system.HintsColumnFamily 0,0 0/0 1/1
10:29:17,134 INFO GCInspector:184 - system.Migrations 0,0 0/0 1/1
10:29:17,134 INFO GCInspector:184 - system.Schema 0,0 0/0 1/1
10:29:17,134 INFO GCInspector:184 - system.IndexInfo 0,0 0/0 0/1
10:29:17,134 INFO GCInspector:184 - Skunk.MIDMeta 0,0 0/1000 0/10000000
10:29:17,135 INFO GCInspector:184 - Skunk.TimeMID 0,0 0/1000 0/10000000
10:29:17,135 INFO GCInspector:184 - Skunk.Emails 0,0 0/1000 0/100000
10:29:17,135 INFO GCInspector:184 - Skunk.MetaTags 0,0 0/1000 0/1000000
10:29:17,146 INFO GCInspector:184 - L.TI 364185,7283808 0/1000 0/10000000
10:29:17,146 INFO GCInspector:184 - L.Docs 2160,206878593 0/1000 0/10000000
10:29:17,147 INFO GCInspector:184 - L.SI 138623,6786771 1000/1000 20/100000
10:29:17,147 INFO GCInspector:184 - L.TL 1233280,37118783 0/1000 0/1000000
10:29:17,147 INFO GCInspector:184 - Lucandra.Documents 0,0 0/1000 0/10000000
10:29:17,147 INFO GCInspector:184 - Lucandra.TermInfo 0,0 0/1000 0/10000000
10:29:17,148 WARN MessagingService:545 - Dropped 424 MUTATION messages in the last 5000ms
10:29:17,156 WARN MessagingService:545 - Dropped 4 REQUEST_RESPONSE messages in the last 5000ms
10:29:17,160 INFO GCInspector:146 - Pool Name Active Pending
10:29:17,161 INFO GCInspector:161 - ReadStage 0 0
10:29:17,163 INFO GCInspector:161 - RequestResponseStage 0 0
10:29:17,164 INFO GCInspector:161 - ReadRepair 0 0
10:29:17,164 INFO GCInspector:161 - MutationStage 32 38
10:29:17,164 INFO GCInspector:161 - GossipStage 1 81
10:29:17,165 INFO GCInspector:161 - AntiEntropyStage 0 0
10:29:17,165 INFO GCInspector:161 - MigrationStage 0 0
10:29:17,165 INFO GCInspector:161 - StreamStage 0 0
10:29:17,165 INFO GCInspector:161 - MemtablePostFlusher 1 1
10:29:17,166 INFO GCInspector:161 - FlushWriter 1 1
10:29:17,166 INFO GCInspector:161 - MiscStage 0 0
10:29:17,166 INFO GCInspector:161 - FlushSorter 0 0
10:29:17,166 INFO GCInspector:161 - InternalResponseStage 0 0
10:29:17,167 INFO GCInspector:161 - HintedHandoff 0 0
10:29:17,167 INFO GCInspector:165 - CompactionManager n/a 76
10:29:17,167 INFO GCInspector:177 - MessagingService n/a 0,0
10:29:17,167 INFO GCInspector:181 - ColumnFamily Memtable ops,data Row cache size/cap Key cache size/cap

ON 172.16.5.12

10:27:53,776 INFO SolrCore:1324 - [htmlsni] webapp=/solandra path=/update params={commit=true} status=500 QTime=290298
10:27:53,777 ERROR SolrDispatchFilter:139 - java.lang.RuntimeException: Read command failed after 100 attempts
at lucandra.CassandraUtils.robustRead(CassandraUtils.java:390)
at lucandra.cluster.CassandraIndexManager.getShardInfo(CassandraIndexManager.java:188)
at lucandra.cluster.CassandraIndexManager.getNextId(CassandraIndexManager.java:328)
at solandra.SolandraIndexWriter.addDoc(SolandraIndexWriter.java:194)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at solandra.SolandraDispatchFilter.execute(SolandraDispatchFilter.java:170)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at solandra.SolandraDispatchFilter.doFilter(SolandraDispatchFilter.java:134)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:536)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:930)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:747)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:405)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:451)

.... til this time

10:29:17,257 INFO HintedHandOffManager:192 - Started hinted handoff for endpoint /172.16.5.11
10:29:17,258 INFO Gossiper:569 - InetAddress /172.16.5.11 is now UP
10:29:17,260 INFO HintedHandOffManager:248 - Finished hinted handoff of 0 rows to endpoint /172.16.5.11
10:29:17,363 INFO CassandraIndexManager:148 - ShardInfo for htmlsni has expired
10:29:17,403 INFO CassandraIndexManager:226 - htmlsni has 29 shards

OOM error searching in a big data set having a really small value set.

I've added 350k documents through Lucandra to the same index;
every document had a type field with 3 possible values:
type:internal (approx 130k documents)
type:external (approx 200k documents)
type:advertorial (approx 20k documents)

Since the type field does not need to be tokenized, I'm indexing it with Store.YES, Index.NOT_ANALYZED.

So, when I try to do a search like "type:internal", it works for a while, but after a few minutes I get an exception like this:

10/04/28 14:40:34 INFO lucandra.IndexReader: docFreq() took: 16932ms
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.nio.ByteBuffer.wrap(ByteBuffer.java:350)
at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:137)
at java.lang.StringCoding.decode(StringCoding.java:173)
at java.lang.String.<init>(String.java:443)
at java.lang.String.<init>(String.java:515)
at lucandra.IndexReader.addDocument(IndexReader.java:375)
at lucandra.LucandraTermEnum.getTermDocFreq(LucandraTermEnum.java:287)
at lucandra.LucandraTermDocs.seek(LucandraTermDocs.java:106)
at org.apache.lucene.index.IndexReader.termDocs(IndexReader.java:814)
at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:74)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:205)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:168)
at org.apache.lucene.search.Searcher.search(Searcher.java:98)
at org.apache.lucene.search.Searcher.search(Searcher.java:108)

at LucandraTest.main(LucandraTest.java:74)

And for instance, if I do a search limiting on some other fields and get, say, 1000 hits as a result with a query like "foo:bar* AND blah:bla_", then if I add the type to the query ("foo:bar_ AND blah:bla* AND type:internal") it takes a really long time and finally I get the same exception.

I'm basically using the same code as in the bookmarks demo for indexing and searching the documents, so I didn't feel the need to post sample code.

sort working so slow?

I have a data set that includes about 1M documents.

A search query with 3k results, even with filtering, works fine; however, if I try to add a sort field using constructs like
...
Sort sort = new Sort(new SortField("date", SortField.STRING));
...
TopDocs docs = searcher.search(q, filter, 10, sort);

it either hangs for a while and does nothing, or I get a timeout exception.

Isn't the sort working properly, or am I missing something in the usage of sort?

Problems deleting documents using the Lucandra IndexWriter

========= SHORT =========

Problems with the Lucandra IndexWriter:

  1. Calling the deleteDocuments(Query) method will at most delete 1000 documents
  2. Passing a MatchAllDocsQuery to the deleteDocuments(Query) method will not remove any documents at all (since the Lucandra IndexSearcher won't return any hits for such a query).
  3. There is no deleteAll() method (as in Lucene 2.9.0 and later)

Request for improvements:

  1. Introduce a deleteDocuments(Query query, int maxNumberOfDocumentsToDelete) method for IndexWriter
  2. Ensure MatchAllDocsQuery will be accepted when using a Lucandra IndexSearcher
  3. If possible - introduce a deleteAll() method for IndexWriter

========= LONG =========

We have an increasing number of indexes that contain lots of small documents. Most of the fields contain arbitrary/unknown values; some contain a known set of values. The documents are based on data stored in Cassandra.

Sometimes an index must be "synced" with the data currently stored in Cassandra. Just updating/re-indexing the index using the data currently stored in Cassandra will not do: lots of documents will be better "up to date", but the index will still contain obsolete/dirty data (data that no longer exists in Cassandra).

The preferred solution in most of our cases is to completely clear the index of all documents and then re-index it using the data currently stored in Cassandra. Lucene provides at least two ways to delete all documents from an index using a Lucene IndexWriter:

  • Call indexWriter.deleteAll() [introduced in Lucene 2.9.0 (sep 2009)]
  • Call IndexWriter.deleteDocuments(Query) and pass a MatchAllDocsQuery instance

None of them are supported when using the Lucandra IndexWriter.

To delete all documents using Lucandra, we first presumed it could be done like this:

  1. Build a massive Boolean query with all known values for a specific field
  2. Call IndexWriter.deleteDocuments(Query) passing the massive Boolean

But - we found out that this will only do if the index contains at most 1000 documents. The deleteDocuments(Query) method executes a search to find all documents to be removed (IndexWriter#271), and the search result will contain at most 1000 hits.

To delete all documents "for real" using Lucandra we have to:

  1. Build a massive Boolean query with all known values for a specific field
  2. Find out how many deletions are necessary by querying the index with a Lucandra Searcher and an "enough" number of "max hits to return".
  3. Call IndexWriter.deleteDocuments(Query), passing the massive Boolean query, several times (result of step 2 / 1000 + 1) - see the sketch below.
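A sketch of that workaround (the 1000-hit page size and the deleteDocuments(Query) behavior are from the description above; the maxHits parameter and the wrapper class are ours):

    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;

    class DeleteAllWorkaround {
        // Delete everything matched by `massive`, even though each
        // deleteDocuments(Query) call removes at most 1000 documents.
        static void deleteAllMatching(lucandra.IndexWriter writer,
                                      IndexSearcher searcher,
                                      Query massive, int maxHits) throws Exception {
            // Step 2: find out how many deletions are needed, using an
            // "enough" number of max hits to return.
            int total = searcher.search(massive, maxHits).totalHits;
            // Step 3: repeat the bounded delete total/1000 + 1 times.
            int passes = total / 1000 + 1;
            for (int i = 0; i < passes; i++) {
                writer.deleteDocuments(massive);
            }
        }
    }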

In our opinion - the Lucandra IndexWriter (and Lucandra IndexSearcher) have some important issues that need to be handled. See SHORT above for suggested improvements.

Externalize Cassandra Keyspace configuration

We're running several processes that connect to different keyspaces in a single virtual machine. With the keyspace currently defined in CassandraUtils as a static string, it is impossible for Readers and Writers to connect to different keyspaces, even though different Thrift API connections can be passed at instantiation.

indexing on the local, and querying from the remote does not work

When I index on the localhost where the Cassandra instance is running and then try to search from a remote host, I get an exception like this:

Exception in thread "main" java.lang.RuntimeException: invalid term format: -1 vYJoT????type????0
at lucandra.CassandraUtils.parseTerm(CassandraUtils.java:159)
at lucandra.LucandraTermEnum.loadTerms(LucandraTermEnum.java:245)
at lucandra.LucandraTermEnum.skipTo(LucandraTermEnum.java:89)
at lucandra.IndexReader.terms(IndexReader.java:390)
at org.apache.lucene.search.PrefixTermEnum.<init>(PrefixTermEnum.java:41)
at org.apache.lucene.search.PrefixQuery.getEnum(PrefixQuery.java:45)
at org.apache.lucene.search.MultiTermQuery$ConstantScoreAutoRewrite.rewrite(MultiTermQuery.java:227)
at org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:382)
at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:438)
at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:307)
at org.apache.lucene.search.Query.weight(Query.java:98)
at org.apache.lucene.search.Searcher.createWeight(Searcher.java:230)
at org.apache.lucene.search.Searcher.search(Searcher.java:181)
at LucandraTest.main(LucandraTest.java:78)

NullPointerException when operating with a nonexistent Cassandra instance

If the Cassandra instance that holds the indexes used by Lucandra is down (or doesn't exist), you get a NullPointerException when performing an operation (index, search, delete), instead of an exception that lets you know what the problem is.

e.g:
java.lang.NullPointerException
at lucandra.LucandraTermEnum.loadTerms(LucandraTermEnum.java:236)
at lucandra.LucandraTermEnum.skipTo(LucandraTermEnum.java:88)
at lucandra.IndexReader.docFreq(IndexReader.java:149)
at org.apache.lucene.search.IndexSearcher.docFreq(IndexSearcher.java:147)
at org.apache.lucene.search.Similarity.idfExplain(Similarity.java:765)
at org.apache.lucene.search.TermQuery$TermWeight.<init>(TermQuery.java:46)
at org.apache.lucene.search.TermQuery.createWeight(TermQuery.java:146)
at org.apache.lucene.search.Query.weight(Query.java:99)
at org.apache.lucene.search.Searcher.createWeight(Searcher.java:230)
at org.apache.lucene.search.Searcher.search(Searcher.java:181)
at org.apache.lucene.search.Searcher.search(Searcher.java:191)
at lucandra.IndexWriter.deleteDocuments(IndexWriter.java:268)

Before that, you may see "WARN CassandraProxyClient Connection failed to Cassandra node: host:port" in the logs, but I think the NPE should be replaced with something more informative.
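A minimal sketch of the kind of guard that could replace the NPE, assuming the failure boils down to a null Thrift client reaching loadTerms() (the helper class and the exact failure point are assumptions, not Lucandra code):

    import java.io.IOException;

    public final class ConnectionGuard {
        private ConnectionGuard() {}

        // Fail fast with a descriptive message instead of letting a null
        // client surface as a NullPointerException deep inside a search.
        public static <T> T requireConnection(T client, String host, int port)
                throws IOException {
            if (client == null) {
                throw new IOException("No connection to Cassandra node " + host
                        + ":" + port + "; is the instance running?");
            }
            return client;
        }
    }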

Solr admin page features broken in Lucandra / Solr

We are using Solr within Lucandra. We downloaded tjake-Lucandra-3bd25a6.zip and did a build. We are running Windows 7. We have apache-cassandra-0.6.1.

We go to the admin page, try to use the Schema Browser, and get an error 500. We get the same error 500 when clicking Statistics.

We have used both of these features in apache-solr-1.4.0 without any errors.

Here is the error message we get when clicking Schema Browser:

Apr 28, 2010 11:40:03 AM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/admin/luke params={wt=json&show=schema} status=500 QTime=4
Apr 28, 2010 11:40:03 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.NullPointerException
at lucandra.LucandraTermEnum.next(LucandraTermEnum.java:108)
at org.apache.solr.handler.admin.LukeRequestHandler.getIndexInfo(LukeRequestHandler.java:453)
at org.apache.solr.handler.admin.LukeRequestHandler.handleRequestBody(LukeRequestHandler.java:99)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)

    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
    at org.mortbay.jetty.Server.handle(Server.java:285)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
    at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

Delimiter used for multi-valued fields: please make it configurable

Hello, I see that you fixed Issue 7 by changing how multi-valued fields work, so that they are now stored by concatenating the values with a space. Can you please make this delimiter configurable? Our data has spaces in it, so it is impossible to tell where one record ends and the next one begins.

The other option would be to return the XML as separate elements instead of a single value that just concatenates everything together. I don't know if that is possible or not.
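A minimal sketch of what a configurable delimiter could look like, assuming it is read from a system property (the property name and class are illustrative, not an existing Lucandra setting):

    import java.util.List;

    public final class MultiValueJoiner {
        // Defaults to the current behaviour (a single space) but lets a
        // deployment pick a delimiter that cannot occur in its data.
        private static final String DELIMITER =
                System.getProperty("lucandra.multivalue.delimiter", " ");

        public static String join(List<String> values) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < values.size(); i++) {
                if (i > 0) sb.append(DELIMITER);
                sb.append(values.get(i));
            }
            return sb.toString();
        }

        private MultiValueJoiner() {}
    }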

NumberFormatException in IndexWriter.addDocument

I am trying to deploy Solandra and it starts fine (I can reach the admin page), but I get an error when I try to insert a doc.
I am posting the exception in case someone already knows what's going on; I will narrow down tomorrow which field is causing the error and post an update. I don't think the problem is with the doc itself, as the same code works fine against a regular Solr.

SEVERE: java.lang.NumberFormatException: For input string: "<U+0080>Zະ"
at java.lang.NumberFormatException.forInputString(Unknown Source)
at java.lang.Integer.parseInt(Unknown Source)
at java.lang.Integer.parseInt(Unknown Source)
at org.apache.solr.util.NumberUtils.int2sortableStr(NumberUtils.java:36)
at org.apache.solr.schema.SortableIntField.toInternal(SortableIntField.java:52)
at org.apache.solr.schema.FieldType$DefaultAnalyzer$1.incrementToken(FieldType.java:320)
at lucandra.IndexWriter.addDocument(IndexWriter.java:135)
at lucandra.IndexWriter.updateDocument(IndexWriter.java:356)
at solandra.SolandraIndexWriter.addDoc(SolandraIndexWriter.java:133)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
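One plausible way to reproduce this failure mode, assuming the value arriving at the SortableIntField was already sortable-encoded rather than a plain decimal string (this is a guess at the cause, not a confirmed diagnosis):

    import org.apache.solr.util.NumberUtils;

    public class SortableIntRepro {
        public static void main(String[] args) {
            // Encode once: produces a sortable string of non-digit characters.
            String encoded = NumberUtils.int2sortableStr(42);

            // Encoding the already-encoded value parses it with
            // Integer.parseInt and throws NumberFormatException, matching
            // the stack trace above.
            NumberUtils.int2sortableStr(encoded);
        }
    }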
