This project forked from mikemccand/luceneutil

Various utility scripts for running Lucene performance tests

Luceneutil: Lucene benchmarking utilities

Setting up luceneutil

First, pick a root directory that will hold the luceneutil checkout, the datasets, the built indices, the Lucene source checkouts, etc. We'll refer to this directory as $LUCENE_BENCH_HOME here.

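# 1. Check out luceneutil:
# Choose a suitable directory, e.g. ~/Projects/lucene/benchmarks.
# -p also creates missing parent directories and does not fail if the directory already exists.
mkdir -p "$LUCENE_BENCH_HOME" && cd "$LUCENE_BENCH_HOME"
git clone https://github.com/mikemccand/luceneutil.git util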

# 2. Run the setup script
cd util
python src/python/setup.py -download

In the second step, the setup procedure creates all necessary directories in the clone's parent directory and downloads a 6 GB compressed Wikipedia line doc file from an Apache mirror. If you don't want to download the large data file, just remove the -download flag from the command line.

After the download has completed, extract the lzma file in $LUCENE_BENCH_HOME/data.
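
If you would rather do the extraction from Python than with a command-line tool, the standard-library lzma module can stream the archive to disk. The following is a minimal sketch, not part of luceneutil; the function name is made up for illustration, and you have to supply the actual archive name found in your data directory.

```python
import lzma
import shutil
from pathlib import Path


def extract_lzma(archive: Path, target: Path, chunk_size: int = 1 << 20) -> Path:
    """Stream-decompress an .lzma or .xz archive into target.

    The data is copied in chunks, so the multi-gigabyte file is never
    held in memory as a whole.
    """
    # lzma.open auto-detects the container format when reading.
    with lzma.open(archive, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)
    return target
```

Call it with the path of the downloaded archive under $LUCENE_BENCH_HOME/data and the path the extracted file should get. Make sure the disk has room for the uncompressed file, which is considerably larger than the 6 GB download.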

Preparing the benchmark candidates

The benchmark compares a baseline version of Lucene to a patched one. Therefore we need two checkouts of Lucene, for example:

  • $LUCENE_BENCH_HOME/lucene_baseline: contains a complete svn checkout of Lucene; this is the baseline for comparison.
  • $LUCENE_BENCH_HOME/lucene_candidate: contains a complete svn checkout of Lucene with the change applied that you want to benchmark against the baseline.

A trunk version of Lucene can be checked out with:

cd $LUCENE_BENCH_HOME
svn checkout https://svn.apache.org/repos/asf/lucene/dev/trunk lucene_baseline

Adjust the command accordingly for lucene_candidate.
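
Before moving on, it can help to confirm that both checkouts are where the benchmark expects them. The helper below is a small sketch of such a check, not something shipped with luceneutil; the function name is invented, and the two directory names are the ones used in the example layout above.

```python
from pathlib import Path
from typing import List

# Directory names from the example layout above; change them if yours differ.
CHECKOUTS = ("lucene_baseline", "lucene_candidate")


def missing_checkouts(bench_home: Path) -> List[str]:
    """Return the expected checkout names that are not directories under bench_home."""
    return [name for name in CHECKOUTS if not (bench_home / name).is_dir()]
```

Passing the path behind $LUCENE_BENCH_HOME returns an empty list when both checkouts exist; otherwise it returns the names that are still missing.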

Running a first benchmark

setup.py has created two files, localconstants.py and localrun.py, in $LUCENE_BENCH_HOME/util/src/python/.

The file localconstants.py should be used to override any existing constants in constants.py, for example if you want to change the Java command line used to run benchmarks. To run an initial benchmark you don't need to modify this file.
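
The override idea can be pictured as a set of defaults with local values layered on top. The sketch below illustrates that general pattern only. It is not how luceneutil loads its constants, and every constant name and value in it is invented for the example, so look at constants.py for the real ones.

```python
# Hypothetical defaults, standing in for what constants.py would define.
DEFAULTS = {
    "JAVA_COMMAND": "java -Xms2g -Xmx2g",  # invented value
    "SEARCH_NUM_THREADS": 2,                # invented name and value
}

# Hypothetical local settings, standing in for the contents of localconstants.py.
LOCAL_OVERRIDES = {
    "JAVA_COMMAND": "java -Xms4g -Xmx4g",
}


def effective_constants(defaults: dict, overrides: dict) -> dict:
    """Return a new mapping where each override replaces the default of the same name."""
    unknown = set(overrides) - set(defaults)
    if unknown:
        # An override that matches no existing constant is most likely a typo.
        raise KeyError(f"unknown constant(s): {sorted(unknown)}")
    return {**defaults, **overrides}
```

Rejecting unknown names is a design choice of this sketch: since the file is meant to override existing constants, a misspelled name would otherwise be ignored silently.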

Now you can edit localrun.py to define your comparison; the relevant code is at the bottom of the file, near its __main__ block.

This file is a copy of example.py. You don't have to build two separate indexes: if you are only benchmarking a code difference and not a file format change, you can build one index and pass it to both competitors.

To run the benchmark, first test it like this:

cd $LUCENE_BENCH_HOME/util
python src/python/localrun.py -source wikimedium10k
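
The test run above can also be launched from another Python script, for instance when you want to repeat it for several sources. This is a minimal sketch under a few assumptions: the function names are made up, it uses the interpreter that runs the wrapper (sys.executable) instead of whatever python resolves to on your PATH, and it relies only on the -source flag shown in the command above.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def localrun_command(source: str = "wikimedium10k") -> List[str]:
    """Build the argument list for the test run shown above."""
    return [sys.executable, "src/python/localrun.py", "-source", source]


def run_benchmark(bench_home: Path, source: str = "wikimedium10k") -> int:
    """Run localrun.py from the util directory and return its exit code."""
    completed = subprocess.run(localrun_command(source), cwd=bench_home / "util")
    return completed.returncode
```

Passing the arguments as a list avoids shell quoting problems, and setting cwd takes the place of the cd step.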

Running the geo benchmark

This benchmark is different and self-contained. Read the command-line examples at the top of src/main/perf/IndexAndSearchOpenStreetMaps.java.
