
elasticsearch-stress-test's Introduction

THIS PROJECT IS NO LONGER MAINTAINED

Elasticsearch Stress Test

Overview

This script generates a bunch of documents and indexes as many of them as it can into Elasticsearch. While doing so, it prints metrics to the screen so you can follow how your cluster is doing.

How to use

  • Save this script
  • Make sure you have Python 2.7+
  • pip install elasticsearch

How does it work

The script creates document templates based on your input, say 5 different documents. The documents are created without fields, for the purpose of having the same mapping when indexing to ES. After that, the script takes 10 random documents out of the template pool (with replacement) and populates them with random data.

After we have the pool of different documents, we select an index out of the index pool, select documents (bulk size at a time) out of the document pool, and index them.

Documents are generated before the run starts, so generating them does not add extra load during the benchmark itself.
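
A minimal sketch of that flow, with hypothetical helper names (the real script differs in detail):

import random
import string

def random_string(max_len):
    # Random payload for a single field, up to max_len characters.
    length = random.randint(1, max_len)
    return "".join(random.choice(string.ascii_lowercase) for _ in range(length))

def make_template(max_fields):
    # A template is just a list of field names; documents built from the same
    # pool share field names, so they produce the same mapping in Elasticsearch.
    return ["field_{0}".format(i) for i in range(random.randint(1, max_fields))]

def populate(template, max_size_per_field):
    # Fill a randomly drawn template with random data.
    return {field: random_string(max_size_per_field) for field in template}

# Build the pools up front, before the timed run starts.
templates = [make_template(max_fields=100) for _ in range(5)]           # --documents 5
documents = [populate(random.choice(templates), 1000) for _ in range(10)]

# During the run, each client repeatedly picks an index and a bulk of documents
# from these pools and sends them via the bulk API (omitted here).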

Mandatory Parameters

Parameter      Description
--es_address   Address of the Elasticsearch cluster (no protocol or port). You can supply multiple clusters here, but only one node per cluster (preferably a client node)
--indices      Number of indices to write to
--documents    Number of template documents that hold the same mapping
--clients      Number of threads that send bulks to ES
--seconds      How long the test should run. Note: it may take a bit longer, since all bulks whose creation has been initiated are allowed to finish

Optional Parameters

Parameter                   Description                                                                Default
--number-of-shards          Number of shards per index                                                 3
--number-of-replicas        Number of replicas per index                                               1
--bulk-size                 Number of documents in each bulk request                                   1000
--max-fields-per-document   Maximum number of fields each document template can hold                  100
--max-size-per-field        Maximum length of the data each field gets when the templates are populated   1000
--no-cleanup                Boolean flag. Don't delete the indices after completion                   False
--stats-frequency           How often (in seconds) to print statistics                                30
--not-green                 Don't wait for the cluster to be green                                    False
--no-verify                 Don't verify SSL certificates                                              False
--ca-file                   Path to a CA certificate file
--username                  HTTP authentication username
--password                  HTTP authentication password

Examples

Run the test against 2 Elasticsearch clusters, with 4 indices on each, 5 random documents, without waiting for the cluster to be green, with 5 writer threads, for 120 seconds

python elasticsearch-stress-test.py  --es_address 1.2.3.4 1.2.3.5 --indices 4 --documents 5 --seconds 120 --not-green --clients 5

Run the test on ES cluster 1.2.3.4 with 10 indices and 10 random documents of up to 10 fields each, where each field on each document can hold up to 50 characters; each index gets 1 shard and no replicas; the test runs from 1 client (thread) for 300 seconds, prints statistics every 15 seconds, indexes in bulks of 5000 documents, and leaves everything in Elasticsearch after the test

 python elasticsearch-stress-test.py --es_address 1.2.3.4 --indices 10 --documents 10 --clients 1 --seconds 300 --number-of-shards 1 --number-of-replicas 0 --bulk-size 5000 --max-fields-per-document 10 --max-size-per-field 50 --no-cleanup --stats-frequency 15

Run the test with SSL

 python elasticsearch-stress-test.py --es_address https://1.2.3.4 --indices 5 --documents 5 --clients 1 --ca-file /path/ca.pem

Run the test with SSL without verifying the certificate

 python elasticsearch-stress-test.py --es_address https://1.2.3.4 --indices 5 --documents 5 --clients 1 --no-verify

Run the test with HTTP authentication

 python elasticsearch-stress-test.py --es_address 1.2.3.4 --indices 5 --documents 5 --clients 1 --username elastic --password changeme

Contribution

You are more than welcome! Please open a PR or an issue here.

elasticsearch-stress-test's People

Contributors

barakm, danielmitterdorfer, dymil, jayme-github, magicmicah, mathewmeconry, mend-bolt-for-github[bot], noniperi, orzilca, roiravhon, sadok-f, talhibner

elasticsearch-stress-test's Issues

Documents not creating...

python elasticsearch-stress-test.py --es_address 139.0.0.1 128.0.0.2 --indices 4 --documents 5 --seconds 120 --not-green --clients 2

Test is done! Final results:
Elapsed time: 132 seconds
Successful bulks: 0 (0 documents)
Failed bulks: 26 (26000 documents)
Indexed approximately 0 MB which is 0.00 MB/s

Cleaning up created indices.. Done!

Test always fails to bulk insert

python elasticsearch-stress-test.py --es_address https://es-dev.us-east-1.es.amazonaws.com:443 --indices 4 --documents 10 --clients 5 --seconds 10 --not-green --stats-frequency 5 --no-verify

Starting initialization of https://es-dev.us-east-1.es.amazonaws.com:443
/usr/local/lib/python2.7/dist-packages/elasticsearch/connection/http_urllib3.py:135: UserWarning: Connecting to es-dev.us-east-1.es.amazonaws.com using SSL with verify_certs=False is insecure.
  'Connecting to %s using SSL with verify_certs=False is insecure.' % host)
Done!
Creating indices..
Generating documents and workers..
Done!
Starting the test. Will print stats every 5 seconds.
The test would run for 10 seconds, but it might take a bit more because we are waiting for current bulk operation to complete.

Elapsed time: 6 seconds
Successful bulks: 0 (0 documents)
Failed bulks: 5 (5000 documents)
Indexed approximately 0 MB which is 0.00 MB/s


Test is done! Final results:
Elapsed time: 11 seconds
Successful bulks: 0 (0 documents)
Failed bulks: 13 (13000 documents)
Indexed approximately 0 MB which is 0.00 MB/s

Cleaning up created indices..  Done!

I haven't been able to get this to produce any successful bulks regardless of what I've tried.

Indices are being created, but they never contain any documents.

Default timeout settings cause ConnectionErrors

Problem

When a bulk request takes longer than the default timeout of the Python Elasticsearch client, the script records an error and moves on to the next request. The problem is that the server is likely still processing the request, so the test script actually throws even more load at an already overloaded node.

Steps to reproduce

  1. Start an Elasticsearch node - say 5.5.2 - with out-of-the-box settings on localhost
  2. Run python elasticsearch-stress-test.py --es_address localhost --documents 10 --clients 10 --seconds 120 --indices 5 --no-cleanup --not-green

The script will produce failures due to read timeouts. If you insert a print statement in the try-except block, you'll see errors like:

ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host=u'localhost', port=9200): Read timed out. (read timeout=10))

Proposed solution

Ten seconds is not an unreasonably long time for a bulk request to take when you are hitting a node with default settings with large bulk requests. So I suggest increasing the timeout to e.g. 60 seconds by creating the Elasticsearch client with:

es = Elasticsearch(esaddress, timeout=60)

instead of

es = Elasticsearch(esaddress)
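
For context, a minimal sketch of that change; the build_client helper name is mine, not the script's:

from elasticsearch import Elasticsearch

def build_client(esaddress, timeout=60):
    # A larger client-side timeout stops the script from counting still-running
    # bulks as failures and immediately piling more load onto a busy node.
    return Elasticsearch(esaddress, timeout=timeout)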

Add index name prefix config option

Sometimes the test is killed by the OS if it consumes too much memory. In such cases the database is not cleaned up and the test indices have to be deleted manually. It would be nice to add an index-name-prefix config option for situations like this; the prefix would make it possible to delete leftover indices using wildcards.
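
A rough sketch of what such an option could look like (the --index-prefix flag and its default are assumptions, not existing behaviour):

import argparse
import random
import string

parser = argparse.ArgumentParser()
parser.add_argument("--index-prefix", default="stress-test-",
                    help="Prefix for generated index names")
args, _ = parser.parse_known_args()

def generate_index_name(length=8):
    # e.g. "stress-test-k3q9zt2a"; leftovers from a killed run could then be
    # removed in one go with DELETE /stress-test-*
    suffix = "".join(random.choice(string.ascii_lowercase + string.digits)
                     for _ in range(length))
    return args.index_prefix + suffix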

Index size is not as expected

Hi,
I'm trying to benchmark an Elasticsearch installation, but I don't understand how the parameters max_fields_per_doc and max_size_per_field work.
Are they used to determine the size of a single document?

Thank you for the support
Cristian

Why does the test show different performance between Windows and Linux?

I used your stress test code to check the performance of our systems, such as Microsoft Windows 10, 10 Server, and Linux (CentOS 7).
On Linux (CentOS 7) it shows 27-30 MB/s on the test using example 1.
(python elasticsearch-stress-test.py --es_address 1.2.3.4 1.2.3.5 --indices 4 --documents 5 --seconds 120 --not-green --clients 5)
But Windows 10 and 10 Server show 2.7~4 MB/s.
Why the difference in performance between them?
All of them have the same Elasticsearch configuration, such as heap size (2 GB), and the same hardware (CPU, RAM, SSD).

Doesn't work after 8.x

I tried to build and run the project recently and found out that it doesn't work. After some debugging, it turns out this is because of the pip elasticsearch version: with elasticsearch-py 8.0.0 and later it doesn't work. The last working version is 7.17.7. I'd suggest pinning the elasticsearch-py version in requirements.txt and updating the README.
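
For example, a pin along these lines (a suggestion, not something already in the repository):

# requirements.txt
# elasticsearch-py 8.x removed elasticsearch.connection, which the script imports
elasticsearch>=7.0.0,<8.0.0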

How to reproduce:

python -m venv .venv
source .venv/bin/activate
pip install elasticsearch
python

>>> from elasticsearch.connection import create_ssl_context
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'elasticsearch.connection'
>>> exit()

Works:

pip uninstall elasticsearch
pip install elasticsearch==7.17.7
python

>>> from elasticsearch.connection import create_ssl_context

Could not create index

Could not create index. Is your cluster ok?
TransportError(400, u'index_already_exists_exception', u'already exists')

Indices=30
documents=25
Client=3
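
A possible workaround, assuming the 7.x client (the address and index name below are placeholders):

from elasticsearch import Elasticsearch

es = Elasticsearch("1.2.3.4")  # placeholder address

# In elasticsearch-py 7.x, ignore=400 suppresses the "index already exists"
# error, so re-running against leftover indices does not abort the test.
es.indices.create(index="stress-test-index", ignore=400)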

Script stopping.

Hi,

first of all great work!

I am having some inconsistency issues with the script.

Sometimes it runs well with bulk=1000, documents=1 and clients=100, and sometimes it fails with lower settings. Now I try to run with this command and it gets killed; maybe it's because I am sending it through an NGINX load balancer?

root@XXXXX:~# ./elasticsearch-stress-test.py  --es_address XXXX:8080  --indices 1 --documents 1 --seconds 60 --not-green --clients 10 --stats-frequency 5 --number-of-shards 6 --number-of-replicas 2 --bulk-size 500

Starting initialization of XXX:8080
Done!
Creating indices..
Generating documents and workers..
Done!
Starting the test. Will print stats every 5 seconds.
The test would run for 60 seconds, but it might take a bit more because we are waiting for current bulk operation to complete.

Killed

Thanks,
Tal

It should have a parameter to run the test until my container is deleted or the volume runs out of space

I was running this stress tool inside a container by deploying a job on Kubernetes. The tool works fine and creates a fair load on my volume, but at some point it stops because the --seconds parameter limits how long the test runs. I want to keep my stress test running until:

  1. The volume is filled entirely.
  2. I delete the job or container that is running the stress tool.

The elasticsearch-stress script should have a parameter so that users can run it inside a container until one of the above conditions is met.
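
One way this could be sketched (the "--seconds 0 means run forever" convention is hypothetical, not an existing option):

import time

def should_keep_running(started_at, seconds):
    # Treat --seconds 0 as "run until the job/container is deleted or the
    # volume fills up"; otherwise stop after the requested duration.
    if seconds == 0:
        return True
    return time.time() - started_at < seconds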

Not generating load

Sample output from a script run:

Starting initialization of 127.0.0.1
Done!
Creating indices..
Cluster timeout....
Cleaning up created indices.. Starting the test. Will print stats every 15 seconds.
The test would run for 300 seconds, but it might take a bit more because we are waiting for current bulk operation to complete.

Test is done! Final results:
Elapsed time: 10 seconds
Successful bulks: 0 (0 documents)
Failed bulks: 0 (0 documents)
Indexed approximately 0 MB which is 0.00 MB/s

Add some examples

Can you please add some examples of how to use your script?
Like:

./elasticsearch-stress-test.py es_address "10.10.10.10" indices 3 documents 10 clients 3 seconds 60

Not able to generate high load

Hi, great tool!

Having said that, there are some issues to be solved :)

  • It seems like it's impossible to generate heavy load. No matter which settings I try (clients, duration, doc size), the maximum load I can reach is 2000 docs/second.
    I have tried setting the client count to big numbers such as 5000, but I don't see the indexing rate change in Marvel (it never passes 2000 docs/s).
    Maybe I am missing something?
  • Missing testing of non-bulk indexing (es.index(...)); see the sketch below.
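
On the second point, a minimal sketch of what a non-bulk path could look like (address and index name are placeholders):

from elasticsearch import Elasticsearch

es = Elasticsearch("1.2.3.4")  # placeholder address

# Index a single document without the _bulk API, to compare per-document
# overhead against the bulk path the script currently exercises.
es.index(index="stress-test-index", body={"field_0": "some random payload"})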

Need to create all indices using only one unassigned shard

Hi,
I'm back again.

This time I'm facing another problem.
The Python script creates one unassigned shard per index.
Is it possible to keep the number of unassigned shards at 1? If so, how can I do this?

Or how can I reduce the total number of unassigned shards?

Thanks.
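
A likely cause (not confirmed in this thread) is the default of one replica per index: on a single-node cluster the replica shards can never be assigned. Passing --number-of-replicas 0 avoids creating them, for example:

 python elasticsearch-stress-test.py --es_address 1.2.3.4 --indices 5 --documents 5 --clients 1 --number-of-replicas 0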

Update README with the version support

Hi there, I am trying to run elasticsearch-stress-test in k8s against elasticsearch-7.2.0, but I am unable to run tests there. Please check. I am also using the flags: --not-green --no-verify --not-green
I am attaching the log here:

Starting initialization of http://test:9200
Done!
Creating indices.. 
Could not create index. Is your cluster ok?
ConnectionError(('Connection aborted.', BadStatusLine("''",))) caused by: ProtocolError(('Connection aborted.', BadStatusLine("''",)))

If this does not support specific version(s), I think that should be noted in the README. Or if I am doing something wrong, please let me know.
