
palantir / cassandra


Palantir's fork of Apache Cassandra

License: Apache License 2.0

Shell 0.60% Batchfile 0.28% PowerShell 0.28% Python 3.45% Thrift 0.30% Java 94.63% GAP 0.45% AMPL 0.01% Dockerfile 0.01%

cassandra's Introduction

Executive summary

Cassandra is a partitioned row store. Rows are organized into tables with a required primary key.

Partitioning means that Cassandra can distribute your data across multiple machines in an application-transparent manner. Cassandra will automatically repartition as machines are added to and removed from the cluster.

Row store means that like relational databases, Cassandra organizes data by rows and columns. The Cassandra Query Language (CQL) is a close relative of SQL.

For more information, see the Apache Cassandra web site.

Requirements

  1. Java >= 1.7 (OpenJDK and Oracle JVMs have been tested)

  2. Python 2.7 (for cqlsh)

Getting started

This short guide will walk you through getting a basic one node cluster up and running, and demonstrate some simple reads and writes.

First, we’ll unpack our archive:

$ tar -zxvf apache-cassandra-$VERSION.tar.gz
$ cd apache-cassandra-$VERSION

After that we start the server. Running the startup script with the -f argument will cause Cassandra to remain in the foreground and log to standard out; it can be stopped with ctrl-C.

$ bin/cassandra -f

Note for Windows users: to install Cassandra as a service, download Procrun, set the PRUNSRV environment variable to the full path of prunsrv (e.g., C:\procrun\prunsrv.exe), and run "bin\cassandra.bat install". Similarly, "uninstall" will remove the service.

Now let’s try to read and write some data using the Cassandra Query Language:

$ bin/cqlsh

The command line client is interactive so if everything worked you should be sitting in front of a prompt:

Connected to Test Cluster at localhost:9160.
[cqlsh 2.2.0 | Cassandra 1.2.0 | CQL spec 3.0.0 | Thrift protocol 19.35.0]
Use HELP for help.
cqlsh>

As the banner says, you can use 'help;' or '?' to see what CQL has to offer, and 'quit;' or 'exit;' when you’ve had enough fun. But let’s try something slightly more interesting:

cqlsh> CREATE KEYSPACE schema1
       WITH replication = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
cqlsh> USE schema1;
cqlsh:schema1> CREATE TABLE users (
                 user_id varchar PRIMARY KEY,
                 first varchar,
                 last varchar,
                 age int
               );
cqlsh:schema1> INSERT INTO users (user_id, first, last, age)
               VALUES ('jsmith', 'John', 'Smith', 42);
cqlsh:schema1> SELECT * FROM users;
 user_id | age | first | last
---------+-----+-------+-------
  jsmith |  42 |  John | Smith
cqlsh:schema1>

If your session looks similar to what’s above, congrats, your single node cluster is operational!

For more on what commands are supported by CQL, see the CQL reference. A reasonable way to think of it is as, "SQL minus joins and subqueries, plus collections."

Wondering where to go from here?

cassandra's People

Contributors

aweisberg, belliottsmith, beobal, blambov, blerer, carlyeks, driftx, gdusbabek, iamaleksey, ifesdjeen, jasobrown, jbellis, jmckenzie-dev, krummas, leonz, mebigfatguy, michaelsembwever, mishail, mshuler, pauloricardomg, pcmanus, snazy, svc-excavator-bot, thobbs, tjake, tpetracca, vijay2win, xedin, yukim, zpear

cassandra's Issues

Idea: AtlasDBQueryFilter (probably in Cassandra 3)

Most relevant classes: CollationController, SSTableNamesIterator.

At present, AtlasDB uses the SliceQueryFilter for its queries. This has three downsides.

  1. In multicolumn range scans, we must read all historic versions.
  2. Cassandra has an optimization for the names query filter: it sorts the SSTables by max timestamp descending and, provided that they do not overlap, skips the remaining SSTables once the relevant column is found in the latest one. Slice queries do not benefit from this.
  3. We can only select one column per RPC - if an internal use case has 5 columns, we must issue 5 RPCs to Cassandra in order to load these.

It would be relatively straightforward for us to define an AtlasDBQueryFilter which specializes Cassandra's behaviour for AtlasDB tables specifically - locating the latest write for the cell only instead of having to look at all SSTables and merge.
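The latest-write lookup described above could be sketched roughly as follows. This is only an illustration, not the fork's code: SketchSSTable and LatestWriteLookup are hypothetical names, each table is reduced to a max timestamp plus a map of cell timestamps, and tombstones and timestamp ties are ignored.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for an SSTable; not a Cassandra class.
final class SketchSSTable {
    final long maxTimestamp;                 // newest write timestamp in this table
    final Map<String, Long> cellTimestamps;  // cell name -> write timestamp
    int reads = 0;                           // how often this table was consulted

    SketchSSTable(long maxTimestamp, Map<String, Long> cellTimestamps) {
        this.maxTimestamp = maxTimestamp;
        this.cellTimestamps = cellTimestamps;
    }

    Optional<Long> read(String cell) {
        reads++;
        return Optional.ofNullable(cellTimestamps.get(cell));
    }
}

final class LatestWriteLookup {
    // Returns the newest write timestamp for the cell, consulting as few tables as possible.
    static Optional<Long> latestWrite(List<SketchSSTable> sstables, String cell) {
        List<SketchSSTable> ordered = new ArrayList<>(sstables);
        ordered.sort(Comparator.comparingLong((SketchSSTable s) -> s.maxTimestamp).reversed());

        Long best = null;
        for (SketchSSTable sstable : ordered) {
            // No remaining table can hold anything newer than what we already have.
            if (best != null && sstable.maxTimestamp <= best) {
                break;
            }
            Optional<Long> found = sstable.read(cell);
            if (found.isPresent() && (best == null || found.get() > best)) {
                best = found.get();
            }
        }
        return Optional.ofNullable(best);
    }
}
```

With two tables whose timestamp ranges do not overlap, the older one is never read once the newer one yields the cell, which is the saving the issue is after.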

Compaction Byte Rate Metrics are Confusing

If you do a 10GB compaction over 10 minutes, you don't see 1/60th of a GB per second over 10 minutes like you'd expect. You see the 10GB as a single instantaneous event.
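To make the difference concrete, here is a small sketch of the two reporting styles, with one bucket per second. CompactionRateSketch is a hypothetical name, and a GB is assumed to be 10^9 bytes for the arithmetic.

```java
final class CompactionRateSketch {
    // Bytes attributed to each one-second bucket when progress is reported as it happens.
    static long[] incremental(long totalBytes, int seconds) {
        long[] buckets = new long[seconds];
        long base = totalBytes / seconds;
        long remainder = totalBytes % seconds;
        for (int i = 0; i < seconds; i++) {
            // Spread the remainder over the first buckets so the total is preserved.
            buckets[i] = base + (i < remainder ? 1 : 0);
        }
        return buckets;
    }

    // Bytes attributed to each bucket when everything is reported once, at completion.
    static long[] atCompletion(long totalBytes, int seconds) {
        long[] buckets = new long[seconds];
        buckets[seconds - 1] = totalBytes;
        return buckets;
    }
}
```

For 10 GB over 600 seconds, incremental gives roughly 16.7 MB in every bucket, while atCompletion gives zeros followed by a single 10 GB spike, which matches what the issue describes.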

Ctrl-C stops nodetool repair

Currently, if you press ctrl-C after starting nodetool repair -hosts <ips>, Cassandra will complete the repair of the current keyspace but will not continue with the rest of them. We should either keep the whole process running after ctrl-C or make it painfully obvious that not all keyspaces will continue being repaired.

Bytes Flushed rate metric

Right now we have a Bytes Flushed count metric that accumulates and resets every time the node is bounced. This makes it confusing to see where flushes actually occur or increase. We should turn it into a rate metric, similar to how compaction metrics work.
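One way to derive a rate from two samples of such an accumulating counter is sketched below. FlushRateSketch is a hypothetical helper, and treating a smaller current value as a restart is an assumption of this sketch, not something the fork is known to do.

```java
final class FlushRateSketch {
    // Bytes per second between two samples of an accumulating counter.
    // If the counter went backwards, assume the node was bounced and the
    // counter restarted from zero, so only the new value counts.
    static double bytesPerSecond(long previousCount, long currentCount, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("elapsedMillis must be positive");
        }
        long delta = currentCount >= previousCount
                ? currentCount - previousCount
                : currentCount;
        return delta * 1000.0 / elapsedMillis;
    }
}
```

A caller would sample the counter on a fixed interval and feed consecutive samples into bytesPerSecond, so a bounce shows up as a short dip rather than a large negative jump.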

New Cluster Bootstrap Borked

Bug introduced in #97

If a brand new cluster has never been turned on before and multiple nodes are turned on at once, they all remove themselves from the seed list, but none of them can communicate with any of the others, so bootstrap fails.

Example of brand new 2-node cluster where seeds list is both nodes.

INFO  [2019-11-19T21:52:04.042Z] org.apache.cassandra.net.OutboundTcpConnection: Handshaking version with other-node-hostname/other-node-ip (0: other-node-hostname/other-node-ip)
ERROR [2019-11-19T21:52:34.608Z] org.apache.cassandra.service.CassandraDaemon: Exception encountered during startup (throwable0_message: Unable to gossip with any seeds)
java.lang.RuntimeException: Unable to gossip with any seeds
        at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1360)
        at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:523)
        at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:759)
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:678)
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:564)
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:322)
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:560)
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:670)

WARN  [2019-11-19T21:52:34.609Z] org.apache.cassandra.gms.Gossiper: No local state, state is in silent shutdown, or node hasn't joined, not announcing shutdown
INFO  [2019-11-19T21:52:34.609Z] org.apache.cassandra.net.MessagingService: Waiting for messaging service to quiesce
INFO  [2019-11-19T21:52:34.609Z] org.apache.cassandra.net.MessagingService: MessagingService has terminated the accept() thread

Compactions running in parallel can fill up disk due to race condition

We've implemented a configurable safety precaution where a compaction won't run if it will exceed a certain threshold for max size on disk, which is at a default of 95% (#195). However, there's a race condition here if two compactions run in parallel.

@Sam-Kramer's example: Only 100 GB of 1000 GB is free. Compaction A requires 40 GB, Compaction B requires 40 GB, and Compaction C requires 40 GB. They all start at the same time, and each one individually satisfies the requirement, but together they fill up the disk.
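One possible way to close the race is to make the check and the reservation a single atomic step, so that space promised to a running compaction counts against later ones. The sketch below is an assumption about how that could look, not the fork's implementation; DiskReservationSketch and its methods are hypothetical, and the units are whatever the caller uses consistently.

```java
final class DiskReservationSketch {
    private final long limit;   // most space that may be used or reserved
    private final long used;    // space already occupied on disk
    private long reserved = 0;  // space promised to compactions in flight

    DiskReservationSketch(long capacity, long used, int maxUsedPercent) {
        this.limit = capacity / 100 * maxUsedPercent;
        this.used = used;
    }

    // Checks the threshold and records the reservation under one lock,
    // so two compactions cannot both pass the check on the same free space.
    synchronized boolean tryReserve(long needed) {
        if (used + reserved + needed > limit) {
            return false;
        }
        reserved += needed;
        return true;
    }

    // Called when a compaction finishes or aborts.
    synchronized void release(long amount) {
        reserved -= amount;
    }
}
```

With the numbers from the example (capacity 1000, used 900, threshold 95), the first 40 GB reservation succeeds and the next two are refused until the first one is released.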

Difficult to track down corrupted sstables during compactions

I hit a cluster with a corrupted sstable. It wasn't being caught by CorruptSSTableException, but instead a ByteBuffer error during the compaction:

ERROR [2019-12-09T00:00:08.251Z] org.apache.cassandra.service.CassandraDaemon: Exception in thread Thread[CompactionExecutor:6,1,main] (throwable0_message: Not enough bytes. Offset: 2. Length: 16898. Buffer size: 5889)
java.lang.IllegalArgumentException: Not enough bytes. Offset: 2. Length: 16898. Buffer size: 5889
        at org.apache.cassandra.db.composites.AbstractCType.checkRemaining(AbstractCType.java:362)

I couldn't tell which table this was. I looked at ./nodetool compactionstats and there were no pending compactions, so I turned on CompactionTask DEBUG logging to see which compaction was being started right before the error:

DEBUG [2020-01-09T17:23:48.442Z] org.apache.cassandra.db.compaction.CompactionTask: Compacting (cc08dc80-3304-11ea-b4dc-5f43b96547a5) [/path/to/lb-31-big-Data.db:level=0, ] (0: cc08dc80-3304-11ea-b4dc-5f43b96547a5, 1: [/path/to/lb-31-big-Data.db:level=0, ])

I assumed the compaction logged right before the error was the failing one, ran corruption remediation on that table, and then scrubbed it. I was still getting compaction exceptions. I lowered concurrent compactors on the affected node from 4 to 1 to ensure that the log line right before the exception belonged to the failing compaction.

The scrub also failed, so I had to delete all sstables for that table. (The failure didn't tell me which table it was.)

WARN  [2020-01-08T18:36:02.481Z] org.apache.cassandra.utils.OutputHandler$LogOutput: Error reading index file (throwable0_message: Not enough bytes. Offset: 2. Length: 16898. Buffer size: 5889)
java.lang.IllegalArgumentException: Not enough bytes. Offset: 2. Length: 16898. Buffer size: 5889
        at org.apache.cassandra.db.composites.AbstractCType.checkRemaining(AbstractCType.java:362)
        at org.apache.cassandra.db.composites.AbstractCompoundCellNameType.fromByteBuffer(AbstractCompoundCellNameType.java:98)
        at org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:382)

WARN  [2020-01-08T18:36:02.481Z] org.apache.cassandra.utils.OutputHandler$LogOutput: Error reading index file (throwable0_message: Illegal Capacity: -1374630724)
java.lang.IllegalArgumentException: Illegal Capacity: -1374630724
        at java.util.ArrayList.<init>(ArrayList.java:157)
        at org.apache.cassandra.db.RowIndexEntry$Serializer.deserialize(RowIndexEntry.java:140)
        at org.apache.cassandra.db.compaction.Scrubber.updateIndexKey(Scrubber.java:384)
        at org.apache.cassandra.db.compaction.Scrubber.scrub(Scrubber.java:181)

Summary of entire remediation:

  1. Turn on CompactionTask DEBUG logging and incorrectly identify keyspace1.table1 as corrupted.
  2. Run corruption remediation on other nodes on keyspace1.table1.
  3. Run nodetool scrub on keyspace1.table1.
  4. Check logs and see that exceptions are still occurring.
  5. Reduce concurrent compactors to 1 and correctly identify the corrupted keyspace2.table2.
  6. Run corruption remediation on other nodes on keyspace2.table2.
  7. Run nodetool scrub on keyspace2.table2. It fails.
  8. Turn off the affected node and delete all sstables in keyspace2.table2.
  9. Run corruption remediation on other nodes again.
  10. Turn the affected node on and repair keyspace2.table2.
  11. Confirm in logs that there are no more compaction exceptions.
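When reading the CompactionTask DEBUG lines during a remediation like this, a small helper can pull out the data file paths so the containing directory can be inspected. This is a sketch that assumes the line format shown in the log excerpt above; CompactionLogSketch is a hypothetical name and the pattern has only been checked against that one line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CompactionLogSketch {
    // Matches "Compacting (<task id>) [<file>:level=N, ...]" as in the DEBUG line above.
    private static final Pattern COMPACTING =
            Pattern.compile("Compacting \\(([0-9a-f-]+)\\) \\[(.*?)\\]");

    // Returns the data file paths named in a "Compacting" line, or an empty list.
    static List<String> dataFiles(String logLine) {
        List<String> files = new ArrayList<>();
        Matcher matcher = COMPACTING.matcher(logLine);
        if (!matcher.find()) {
            return files;
        }
        for (String entry : matcher.group(2).split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // the list in the log ends with a trailing ", "
            }
            int levelSuffix = trimmed.lastIndexOf(":level=");
            files.add(levelSuffix >= 0 ? trimmed.substring(0, levelSuffix) : trimmed);
        }
        return files;
    }
}
```

As the summary shows, this only points at the right table when a single compactor is running; with several compactors, the line logged just before the exception may belong to a different compaction.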

Throw an error when repair on Time Window Compaction Strategy is called via jmx

Repair should never be used for TWCS tables. For the endpoints that we use via JMX, we should disable repair if the table happens to be TWCS.

Not sure if we should also disable via nodetool - it feels like we should keep that one for a "what if" (though I don't know what that "what if" is - like, we need repair to recover but are willing to take the storage hit).
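A guard along these lines could sit in front of the JMX repair entry points. Everything here is hypothetical: RepairGuardSketch is not a class in the fork, and matching on the strategy's class name is just one possible way to detect a TWCS table.

```java
final class RepairGuardSketch {
    private static final String TWCS = "TimeWindowCompactionStrategy";

    // True if the configured compaction strategy class is TWCS,
    // given either the simple or the fully qualified class name.
    static boolean isTimeWindowCompaction(String strategyClass) {
        return strategyClass != null
                && (strategyClass.equals(TWCS) || strategyClass.endsWith("." + TWCS));
    }

    // Throws instead of starting a repair on a TWCS table.
    static void checkRepairAllowed(String keyspace, String table, String strategyClass) {
        if (isTimeWindowCompaction(strategyClass)) {
            throw new IllegalStateException(
                    "Repair is disabled for TWCS table " + keyspace + "." + table);
        }
    }
}
```

Keeping the check in a separate helper would let the JMX path call it while the nodetool path stays untouched, which leaves the question raised above open.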

Cassandra fails to start if authz role has null values

Cassandra should be tolerant of nulls.

Should also check authn role.

        at java.util.Objects.requireNonNull(Objects.java:228)
        at com.palantir.cassandra.auth.model.config.ImmutableAuthorizationRole$Builder.addAllKeyspaces(ImmutableAuthorizationRole.java:325)
        at com.palantir.cassandra.auth.model.config.ImmutableAuthorizationRole$Builder.keyspaces(ImmutableAuthorizationRole.java:314)
        at com.palantir.cassandra.auth.model.config.AuthorizationRole.jsonCreator(AuthorizationRole.java:58)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
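The trace fails in a requireNonNull inside the generated builder's addAllKeyspaces. A tolerant approach could normalize the input before it reaches the builder, as in the sketch below. NullTolerantRoleSketch is a hypothetical helper, and the assumption that keyspaces is a collection of strings is not confirmed by the source.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

final class NullTolerantRoleSketch {
    // Treats a missing collection as empty and drops null entries,
    // so the result is always safe to hand to a null-hostile builder.
    static Set<String> keyspacesOrEmpty(Collection<String> keyspaces) {
        if (keyspaces == null) {
            return Collections.emptySet();
        }
        Set<String> cleaned = new LinkedHashSet<>();
        for (String keyspace : keyspaces) {
            if (keyspace != null) {
                cleaned.add(keyspace);
            }
        }
        return Collections.unmodifiableSet(cleaned);
    }
}
```

The same normalization would apply to the authn role if it turns out to have the same problem.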

LCS Metrics

Add observability to size of each level for leveled compaction.

Metrics that would calculate the size of each level. The sizes should be linear, but there are edge cases that can cause unexpected table promotion.

Also, possibly the largest sstable in a given level.
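Both numbers could be computed in one pass over the sstables of a table, roughly as sketched here. LeveledSSTable and LevelSizeSketch are hypothetical names; how the level and on-disk size are obtained from Cassandra is left out.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical view of an sstable: its LCS level and its size on disk.
final class LeveledSSTable {
    final int level;
    final long bytes;

    LeveledSSTable(int level, long bytes) {
        this.level = level;
        this.bytes = bytes;
    }
}

final class LevelSizeSketch {
    // Total bytes per level, ordered by level.
    static Map<Integer, Long> bytesPerLevel(List<LeveledSSTable> sstables) {
        Map<Integer, Long> totals = new TreeMap<>();
        for (LeveledSSTable sstable : sstables) {
            totals.merge(sstable.level, sstable.bytes, Long::sum);
        }
        return totals;
    }

    // Size of the largest sstable in each level, ordered by level.
    static Map<Integer, Long> largestPerLevel(List<LeveledSSTable> sstables) {
        Map<Integer, Long> largest = new TreeMap<>();
        for (LeveledSSTable sstable : sstables) {
            largest.merge(sstable.level, sstable.bytes, Math::max);
        }
        return largest;
    }
}
```

Each map entry could then be exposed as a gauge per level.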
