CLucene README
==============

------------------------------------------------------
CLucene is a C++ port of Lucene.
It is a high-performance, full-featured text search
engine written in C++. In the limited benchmarks below,
CLucene was faster than Java Lucene.
------------------------------------------------------

CLucene has contributions from many people; see AUTHORS.

CLucene is distributed under the GNU Lesser General Public License (LGPL) 
	*or*
the Apache License, Version 2.0
See the LGPL.license and APACHE.license for the respective license information.
Read COPYING for more about the license.

Installation
------------
* For Linux, Mac OS X, Cygwin and MinGW build information, read INSTALL.
* Boost.Jam files are provided in the root directory and subdirectories.
* Microsoft Visual Studio (6 and 7) project files are provided in the win32 folder.

Mailing List
------------
Questions and discussion should be directed to the CLucene mailing list
  at [email protected]  
Find subscription instructions at 
  http://lists.sourceforge.net/lists/listinfo/clucene-developers
Suggestions and bug reports can be made on our bug tracking database
  (http://sourceforge.net/tracker/?group_id=80013&atid=558446)

The latest version
------------------
Details of the latest version can be found on the CLucene SourceForge project
web site: http://www.sourceforge.net/projects/clucene

Documentation
-------------
Documentation is provided at http://clucene.sourceforge.net/doc/doxygen/html/
You can also build your own documentation by running doxygen from the root directory
of the CLucene source tree.
CLucene is a very close port of Java Lucene, so you can also try looking at the
Javadocs at http://lucene.apache.org/java/


Performance
-----------
Very little benchmarking has been done on CLucene. Andi Vajda posted some
limited statistics on the CLucene list a while ago, with the following results.

There are 250 HTML files under $JAVA_HOME/docs/api/java/util, about
6108 KB of HTML text.
org.apache.lucene.demo.IndexFiles with Java and gcj, compared with the CLucene demo,
on Mac OS X 10.3.1 (Panther), PowerBook G4, 1 GHz, 1 GB RAM:
    . running with java 1.4.1_01-99 : 20379 ms
    . running with gcj 3.3.2 -O2    : 17842 ms
    . running clucene 0.8.9's demo  :  9930 ms

I recently ran some more tests and came up with these rough results:
663 MB (797 files) of Gutenberg texts
on a Pentium 4 running Windows XP with 1 GB of RAM. Indexing, max 100,000 fields:
• Java Lucene: 646453 ms, peak memory usage ~72 MB, average ~14 MB RAM
• CLucene: 232141 ms, peak memory usage ~60 MB, average ~4 MB RAM

Searching the index using 10,000 single-word queries:
• Java Lucene: ~60078 ms, used ~13 MB RAM
• CLucene: ~48359 ms, used ~4.2 MB RAM
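
The timings above are easier to compare as ratios. The following is a minimal
sketch that divides each Java time by the matching CLucene time. The helper
names `speedup` and `print_speedups` are illustrative only; the figures are
copied from the text above and say nothing about other hardware or versions.

```cpp
#include <cassert>
#include <cstdio>

// Speedup of a candidate over a baseline: baseline time / candidate time.
// A value above 1.0 means the candidate finished sooner.
double speedup(double baseline_ms, double candidate_ms) {
    assert(candidate_ms > 0.0);  // avoid dividing by zero
    return baseline_ms / candidate_ms;
}

// Print the ratios for the three Java vs. CLucene timings quoted above.
void print_speedups() {
    std::printf("demo indexing (java 1.4.1 vs clucene 0.8.9): %.2fx\n",
                speedup(20379.0, 9930.0));
    std::printf("Gutenberg indexing:                          %.2fx\n",
                speedup(646453.0, 232141.0));
    std::printf("10,000 single-word queries:                  %.2fx\n",
                speedup(60078.0, 48359.0));
}
```

Read this way, the indexing runs show a larger gap than the search run, which
matches the raw numbers: the search times differ by about 12 seconds out of 60.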

Platform notes
--------------

'Too many open files'
Some platforms don't provide enough file handles by default to run CLucene properly.
To solve this, increase the open file limit.

On Solaris, either raise the limit for the current shell:
ulimit -n 1024
or set it system-wide in /etc/system:
set rlim_fd_cur=1024
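
A program can also raise its own limit at startup instead of relying on the
shell. The following is a minimal sketch using the POSIX getrlimit and
setrlimit calls; the helper name `ensure_open_file_limit` is hypothetical and
not part of CLucene, and the sketch assumes a POSIX platform.

```cpp
#include <sys/resource.h>  // getrlimit, setrlimit, RLIMIT_NOFILE (POSIX)
#include <cassert>
#include <cstdio>

// Hypothetical helper, not part of CLucene: make sure the soft limit on open
// file descriptors is at least `wanted`, without exceeding the hard limit.
// Returns the soft limit in effect afterwards (0 if it cannot be read).
rlim_t ensure_open_file_limit(rlim_t wanted) {
    assert(wanted > 0);

    struct rlimit lim;
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0) {
        std::perror("getrlimit");
        return 0;
    }

    if (lim.rlim_cur < wanted) {
        // An unprivileged process may raise its soft limit only up to the
        // hard limit, so clamp the request.
        struct rlimit raised = lim;
        raised.rlim_cur = (wanted < lim.rlim_max) ? wanted : lim.rlim_max;
        if (setrlimit(RLIMIT_NOFILE, &raised) != 0) {
            std::perror("setrlimit");  // keep running with the old limit
        }
    }

    // Re-read so the caller sees what the system actually granted.
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0) {
        std::perror("getrlimit");
        return 0;
    }
    return lim.rlim_cur;
}
```

Calling `ensure_open_file_limit(1024)` before opening an index has roughly the
same effect as `ulimit -n 1024`, but only for the calling process. If the hard
limit is lower than the requested value, the returned limit will be lower too,
and the hard limit then has to be raised by an administrator.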

Acknowledgments
----------------

The Apache Lucene project is the basis for this software, so the biggest
acknowledgment goes to that project.

We wish to acknowledge the following copyrighted works that
make up portions of the CLucene software:

CLucene relies heavily on the use of autoconf and libtool to provide
a build environment.
