Coder Social home page Coder Social logo

dmitry-saprykin / cassandra_range_repair Goto Github PK

View Code? Open in Web Editor NEW

This project forked from briangallew/cassandra_range_repair

0.0 1.0 0.0 311 KB

python script to repair the primary range of a node in N discrete steps

License: MIT License

Python 87.24% Shell 12.76%

cassandra_range_repair's Introduction

range_repair.py

A Python script to repair the primary range of a Cassandra node in N discrete steps using best practices.

Background

When Cassandra begins the repair process it constructs a merkle tree, which is a tree of hashes over segments of data that the node is responsible for. The node compares it's tree to that of the replicas, if there is a difference in the hash values for any of the nodes then the segment for that hash is requested from the replica and is re-inserted.

By default the Merkle tree for Cassandra represents 15 discreet segments, which means that the data on the node is broken into 15 pieces. If the data for one of these 15 pieces is different from that of the replicas then it will result in 1/15th of the data being streamed and re-inserted. This can cause problems for two use cases: dense nodes, and DSE Solr nodes running at or near capacity.

Solr Nodes

For DSE Solr nodes when the data transferred as the result of a repair is re-inserted it is also re-indexed because each host maintains it's own independent set of indexes. If the node is already at or near capacity then the additional strain caused by the repair/re-index can push it over the edge. As a last result Cassandra will shed load by dropping mutations. If a mutation is dropped the data will at some point need to be brought over from the replicas through the repair process which unfortunately begins a never ending cycle of re-index/repair.

Dense Nodes

For clusters that have a large amount of data per node the repair process could require an unacceptably large amount of data to be streamed and re-inserted.

How the script works

The script works by figuring out the primary range for the node that it's being executed on, and instead of running repair on the entire range, run the repair on only a smaller sub-range. When a repair is initiated on a sub-range Cassandra constructs a merkle tree only for the range specified, which in turn divides the much smaller range into 15 segments. If there is disagreement in any of the hash values then a much smaller portion of data needs to be transferred which lessens load on the system.

Multiple Datacenters

If you have multiple datacenters in your ring, then you MUST specify the name of the datacenter containing the node you are repairing as part of the command-line options (--datacenter=DCNAME). Failure to do so will result in only a subset of your data being repaired (approximately data/number-of-datacenters). This is because nodetool has no way to determine the relevant DC on its own, which in turn means it will use the tokens from every ring member in every datacenter.

Options

Usage: range_repair.py [options]

Options:
  -h, --help            show this help message and exit
  -k KEYSPACE, --keyspace=KEYSPACE
                        Keyspace to repair (REQUIRED)
  -c COLUMNFAMILY, --columnfamily=COLUMNFAMILY
                        ColumnFamily to repair, can appear multiple times
  -H HOST, --host=HOST  Hostname to repair [default: $HOSTNAME]
  -s STEPS, --steps=STEPS
                        Number of discrete ranges [default: 100]
  -o OFFSET, --offset=OFFSET
                        Number of tokens to skip [default: 0]   
  -n NODETOOL, --nodetool=NODETOOL
                        Path to nodetool [default: nodetool]
  -w WORKERS, --workers=WORKERS
                        Number of workers to use for parallelism [default: 1]
  -D DATACENTER, --datacenter=DATACENTER
  -l, --local           Restrict repair to the local DC
  -p, --par             carry out a parallel repair (post-2.x only)
  -S, --snapshot        Use snapshots (pre-2.x only)
  -v, --verbose         Verbose output
  -d, --debug           Debugging output
  --dry-run             Do not execute repairs.
  --syslog=FACILITY     Send log messages to the syslog
  --logfile=FILENAME    Send log messages to a file

Sample

$ ./range_repair.py -k demo_keyspace
INFO       2014-05-09 17:31:33,503    get_ring_tokens                 66  : running nodetool ring, this will take a little bit of time
DEBUG      2014-05-09 17:31:39,057    get_ring_tokens                 72  : ring tokens found, creating ring token list...
DEBUG      2014-05-09 17:31:40,207    get_host_tokens                 86  : host tokens found, creating host token list...
DEBUG      2014-05-09 17:31:40,208    repair_keyspace                 170 : repair over range (-2974082934175371230, -2971948823734978979] with 100 steps for keyspace demo_keyspace
DEBUG      2014-05-09 17:31:40,208    repair_keyspace                 176 : step 0100 repairing range (-2974082934175371230, -2974061593070967308] for keyspace demo_keyspace ...
DEBUG      2014-05-09 17:32:47,508    repair_keyspace                 182 : SUCCESS
DEBUG      2014-05-09 17:32:47,509    repair_keyspace                 176 : step 0099 repairing range (-2974061593070967308, -2974040251966563386] for keyspace demo_keyspace ...
DEBUG      2014-05-09 17:33:54,904    repair_keyspace                 182 : SUCCESS
...

Dependencies

  • Python 2.6
  • Cassandra nodetool must exist in the PATH

History

cassandra_range_repair's People

Contributors

briangallew avatar elubow avatar mstump avatar pauloricardomg avatar pilate avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.