Comments (9)
For a baseline... according to Anandtech, they get 30 QPS from Elasticsearch when they put Wikipedia on a Broadwell Xeon D-1540. They say that Wikipedia is "+/- 40GB", which implies they might be using English-language Wikipedia, articles only. That's currently 49GB, but maybe it was closer to 40GB when they benchmarked it? We should ask them exactly what configuration they ran when we do our benchmarks, if we end up having access to similar hardware.
from bitfunnel.
This ASPLOS '15 paper on something unrelated happens to use Lucene as one of its targets. They run 10k queries on "Wikipedia", and their latency (fig 2b) implies a single-digit QPS number per core. On "a server with two 8-core Intel 64-bit Xeon processors (2.30 GHz)... 64GB RAM", they show that it's possible to get to 40-ish QPS before tail latency spikes above 1 second (fig 4). See section 6.1 for details on methodology and hardware.
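To spell out the implied-throughput arithmetic: if each query occupies a core for its full latency, per-core QPS is just the reciprocal. The latency value below is hypothetical, merely in the range fig 2b suggests:

```python
# If a single query occupies one core for ~0.15 s (hypothetical value in
# the range fig 2b suggests), the implied per-core throughput is 1/latency.
latency_s = 0.15
qps_per_core = 1.0 / latency_s  # ~6.7 QPS: single digits, as claimed
```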
I tried asking this question publicly, and basically everyone who replied thought it should be possible to do better. However, there were very few concrete suggestions; the ones offered were:
- "possible by reducing term dictionary size. I bet the term dict for Wikipedia is large"
- "you need to reserve a good bit of the systems ram for FS caching (i.e. Don't give it all to the JVM)"
We can try those, although it would be a bit surprising if every public benchmark we can find has a poor setup. If that's the case, it indicates that there are non-obvious default settings that need changing, and that we should make sure our own defaults don't lead people into the same trap.
Someone who works at Elasticsearch claims that it's because the benchmarks are benchmarking worst-case queries with no stopwords, and that the results are per-thread. I can't see any evidence for either claim. I don't know how that guy could have determined that these are worst-case queries, since neither benchmark I linked to talks about query distribution.
The ASPLOS paper specifically notes that tail latency is worse with 4 threads than with 1 above 42 QPS and that, in general, tail latency degrades more with more threads. The y-axis doesn't go high enough to tell exactly, but the curve exits the top of the graph (1.5s) with a slope that indicates tail latency could easily hit 10s+ in the high-40 QPS range. Note that the units here are seconds, not milliseconds. In the results section, the ASPLOS paper also notes that it mixes "long" and "short" queries, where long and short refer to the time the query takes to complete, so it almost can't be the case that it's all worst-case queries (and we don't know that the long queries are worst-case queries). In section 6, the ASPLOS paper notes that they pull queries from the nightly tests and use "the term requests". Additionally, running with no stop words is what we do, and AFAIK what every major search engine has done for years.
That guy basically concludes with "The source code is open...", which seems to match what you got in your other interaction with the Elasticsearch folks.
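For reference, "tail latency" above means the high percentiles of the per-query latency distribution. A minimal sketch of how such percentiles are computed, with made-up sample latencies (not data from either benchmark):

```python
def percentile(samples, pct):
    """Nearest-rank percentile: the value at the pct-th percentile."""
    ordered = sorted(samples)
    rank = max(1, round(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# Made-up per-query latencies in seconds; note the long tail.
latencies_s = [0.02, 0.03, 0.03, 0.05, 0.08, 0.12, 0.4, 0.9, 1.6, 5.0]
p50 = percentile(latencies_s, 50)  # typical query: 0.08 s
p99 = percentile(latencies_s, 99)  # tail: 5.0 s, ~60x the median
```

The point of the graphs in the paper is that p99-style numbers blow up long before the mean does.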
On the nightly regression tests mentioned above, they appear to get... 30-40 QPS on Wikipedia: http://home.apache.org/~mikemccand/lucenebench/Term.html. They note that they take best-case results ("Each of the 5 instances are run 50 times per JVM instance; we keep the best (fastest) time per task/query instance").
This code appears to be the code that's used to run their benchmarks. It looks like runNightly.cmd launches nightlyBench.py. There's almost no documentation in the actual code, but if I've skimmed it correctly, that launches r.runSimpleSearchBench in benchUtil.py. In benchUtil.py, we see
```python
# Skip this pctg of the slowest runs:
SLOW_SKIP_PCT = 10
```
If you believe the text from the main page, this means they have two separate filters that cut off the tail: they drop anything above the 90th percentile and then, after doing that, only look at the best of 50 runs?
But wait, this code also has
```python
# SELECT = 'min'
# SELECT = 'mean'
SELECT = 'median'
```
That seems to indicate that the text on the main page is outdated, and that they take the median instead of the min?
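If I'm reading the two filters right, the aggregation would look roughly like this (a sketch of my reading, not the actual benchUtil.py code):

```python
SLOW_SKIP_PCT = 10   # drop this percentage of the slowest runs first
SELECT = 'median'    # then reduce the survivors with min/mean/median

def aggregate(times, select=SELECT):
    """First filter: discard the slowest SLOW_SKIP_PCT of runs.
    Second filter: collapse what's left according to SELECT."""
    kept = sorted(times)[:len(times) - int(len(times) * SLOW_SKIP_PCT / 100)]
    if select == 'min':
        return kept[0]
    if select == 'mean':
        return sum(kept) / len(kept)
    return kept[(len(kept) - 1) // 2]  # median (lower middle for even counts)

runs = list(range(1, 11))       # 10 runs with times 1..10
aggregate(runs)                 # median of 1..9 after the 10 is dropped
aggregate(runs, select='min')   # the "best of N" the main page describes
```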
Also, the README for that repo says "In the second step, the setup procedure creates all necessary directories in the clones parent directory and downloads a 6 GB compressed Wikipedia line doc file from an Apache mirror.", which seems to indicate they're not running against all of Wikipedia?
For an ingestion baseline, this post talks a bit about Lucene's benchmark setup and mentions that, with an OCZ Vertex 3 SSD in a 2-socket Xeon X5680 machine overclocked to 4GHz, Lucene ingests "~102GB plain text per hour".
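As a sanity check on what that rate would imply for the ~49GB English Wikipedia dump mentioned earlier (just arithmetic, not a measurement):

```python
# At the quoted ~102 GB/hour, a ~49 GB dump would ingest in under half an hour.
ingest_rate_gb_per_hr = 102.0
wikipedia_gb = 49.0
minutes = wikipedia_gb / ingest_rate_gb_per_hr * 60  # ~28.8 minutes
```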
The benchmark takes a while to run. Even extracting Wikipedia takes over 40 minutes on the Mac we've been using:
```
expand-enwiki:
    [bunzip2] Expanding enwiki-20070527-pages-articles.xml.bz2 to /Users/visualstudio/dev/lucene-solr/lucene/benchmark/temp/enwiki-20070527-pages-articles.xml

BUILD SUCCESSFUL
Total time: 42 minutes 34 seconds
```
Another open source project we could use for a baseline is OpenAcoon / DeuSu. They claim:
> The above website runs on an Intel E3-1225 with 32gb RAM and two 500gb SSDs. The search-index on that site currently holds about 1.08 billion WWW-pages. On average a query takes about 0.2 seconds.
>
> ...
>
> The software was originally written in Delphi (=Pascal).
>
> ...
>
> Sorry for the quality of most of the code. Big parts of it were written 15 years ago when I was still young and stupid. :)
Here's an old blog post where someone at Elastic ran Term queries against all of Wikipedia, which I believe are basically what we support:
It looks like they get 8k QPS for single term queries, with increasing speed as they AND in more terms and decreasing speed when they OR in more terms, as expected.
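That scaling is what you'd expect from posting-list evaluation: ANDing in a term can only shrink the candidate set (and lets evaluation be driven off the shortest list), while ORing in a term adds another whole list that must be visited. A toy sketch with made-up postings (not Lucene's implementation):

```python
# Made-up sorted posting lists: term -> IDs of docs containing the term.
postings = {
    'cat': [1, 4, 7, 9],
    'dog': [2, 4, 9, 12],
    'fish': [4, 9, 15],
}

def and_query(terms):
    """Intersection: each extra term can only shrink the result."""
    result = set(postings[terms[0]])
    for term in terms[1:]:
        result &= set(postings[term])
    return sorted(result)

def or_query(terms):
    """Union: each extra term adds a whole posting list to visit."""
    result = set()
    for term in terms:
        result |= set(postings[term])
    return sorted(result)

and_query(['cat', 'dog', 'fish'])  # only docs 4 and 9 match all terms
or_query(['cat', 'dog', 'fish'])   # every doc any term touches
```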
We now have a setup that lets us compare against Lucene, here. Our Lucene results are in the same ballpark as the results from the Elastic post cited above, which puts us multiple orders of magnitude faster than most public benchmarks that have been cited, like Anandtech and the ASPLOS '15 paper linked to above.
I'm sure the setup could use a lot of work, but it appears to have results that are at least as fast as the fastest public results we can find on a machine that's no faster than any of the machines cited above.