palantir / cassandra

Palantir's fork of Apache Cassandra

License: Apache License 2.0

Shell 0.60% Batchfile 0.28% PowerShell 0.28% Python 3.45% Thrift 0.30% Java 94.63% GAP 0.45% AMPL 0.01% Dockerfile 0.01%

octo-correct-managed

cassandra's Introduction

Executive summary

Cassandra is a partitioned row store. Rows are organized into tables with a required primary key.

Partitioning means that Cassandra can distribute your data across multiple machines in an application-transparent manner. Cassandra will automatically repartition as machines are added to and removed from the cluster.

Row store means that like relational databases, Cassandra organizes data by rows and columns. The Cassandra Query Language (CQL) is a close relative of SQL.

For more information, see the Apache Cassandra web site.

Requirements

Java >= 1.7 (OpenJDK and Oracle JVMs have been tested)
Python 2.7 (for cqlsh)

Getting started

This short guide will walk you through getting a basic one node cluster up and running, and demonstrate some simple reads and writes.

First, we’ll unpack our archive:

$ tar -zxvf apache-cassandra-$VERSION.tar.gz
$ cd apache-cassandra-$VERSION

After that we start the server. Running the startup script with the -f argument will cause Cassandra to remain in the foreground and log to standard out; it can be stopped with ctrl-C.

$ bin/cassandra -f

Note for Windows users: to install Cassandra as a service, download Procrun, set the PRUNSRV environment variable to the full path of prunsrv (e.g., C:\procrun\prunsrv.exe), and run "bin\cassandra.bat install". Similarly, "uninstall" will remove the service.

Now let’s try to read and write some data using the Cassandra Query Language:

$ bin/cqlsh

The command line client is interactive so if everything worked you should be sitting in front of a prompt:

Connected to Test Cluster at localhost:9160.
[cqlsh 2.2.0 | Cassandra 1.2.0 | CQL spec 3.0.0 | Thrift protocol 19.35.0]
Use HELP for help.
cqlsh>

As the banner says, you can use 'help;' or '?' to see what CQL has to offer, and 'quit;' or 'exit;' when you’ve had enough fun. But let’s try something slightly more interesting:

cqlsh> CREATE KEYSPACE schema1
       WITH replication = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
cqlsh> USE schema1;
cqlsh:schema1> CREATE TABLE users (
                 user_id varchar PRIMARY KEY,
                 first varchar,
                 last varchar,
                 age int
               );
cqlsh:schema1> INSERT INTO users (user_id, first, last, age)
               VALUES ('jsmith', 'John', 'Smith', 42);
cqlsh:schema1> SELECT * FROM users;
 user_id | age | first | last
---------+-----+-------+-------
  jsmith |  42 |  John | Smith
cqlsh:schema1>

If your session looks similar to what’s above, congrats, your single node cluster is operational!

For more on what commands are supported by CQL, see the CQL reference. A reasonable way to think of it is as, "SQL minus joins and subqueries, plus collections."

Wondering where to go from here?

Getting started: http://wiki.apache.org/cassandra/GettingStarted
Join us in #cassandra on irc.freenode.net and ask questions
Subscribe to the Users mailing list by sending a mail to [email protected]
Planet Cassandra aggregates Cassandra articles and news: http://planetcassandra.org/


cassandra's Issues

Idea: AtlasDBQueryFilter (probably in Cassandra 3)

Most relevant classes: CollationController, SSTableNamesIterator.

At present, AtlasDB uses the SliceQueryFilter for its queries. This has three downsides.

In multicolumn range scans, we must read all historic versions.
Cassandra has an optimization for the names query filter: it sorts the SSTables by max timestamp descending and, provided that they do not overlap, skips the remaining SSTables once it has seen the relevant column in the latest one. Slice queries do not benefit from this.
We can only select one column per RPC - if an internal use case has 5 columns, we must issue 5 RPCs to Cassandra in order to load these.

It would be relatively straightforward for us to define an AtlasDBQueryFilter which specializes Cassandra's behaviour for AtlasDB tables specifically - locating the latest write for the cell only instead of having to look at all SSTables and merge.
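The early-exit idea can be illustrated with a toy model. This is a sketch, not Cassandra code: `SSTable`, `max_timestamp`, and `latest_write` are hypothetical names, each SSTable is modeled as a plain dict from cell name to `(timestamp, value)`, and tombstones are ignored. It uses a slightly more general stopping rule than the one described above: it stops as soon as every remaining SSTable is older than the newest write already found.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class SSTable:
    """Toy stand-in for an SSTable: cell name -> (write timestamp, value)."""
    cells: Dict[str, Tuple[int, str]] = field(default_factory=dict)

    @property
    def max_timestamp(self) -> int:
        # Newest write timestamp held by this table (-1 if it is empty).
        return max((ts for ts, _ in self.cells.values()), default=-1)


def latest_write(sstables: List[SSTable], cell: str) -> Optional[Tuple[int, str]]:
    """Return the newest (timestamp, value) for `cell`, or None if it was never written."""
    best: Optional[Tuple[int, str]] = None
    # Visit tables newest-first so older tables can be skipped.
    for table in sorted(sstables, key=lambda t: t.max_timestamp, reverse=True):
        if best is not None and table.max_timestamp < best[0]:
            break  # nothing in this or any later table can be newer than `best`
        hit = table.cells.get(cell)
        if hit is not None and (best is None or hit[0] > best[0]):
            best = hit
    return best
```

With three tables whose max timestamps are 10, 7, and 3, a lookup that finds a write at timestamp 7 in the second table never opens the third.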

how to build this project

Expose ephemeral snapshot method

https://github.com/palantir/cassandra/blob/palantir-cassandra-2.2.14/src/java/org/apache/cassandra/db/ColumnFamilyStore.java#L2569

We should expose the ephemeral snapshot method as a separate method in StorageServiceMBean.

Disabling client interfaces should be persisted on disk

Compaction Byte Rate Metrics are Confusing

If you do a 10GB compaction over 10 minutes, you don't see 1/60th of a GB per second over 10 minutes like you'd expect. You see the 10GB as a single instantaneous event.
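The expected behaviour can be sketched as attributing a compaction's bytes evenly across its duration instead of reporting them all at completion. The function name `spread_bytes_per_second` and the 60-second bucket width are assumptions made for illustration; this is not how the existing metric is implemented.

```python
from typing import Dict


def spread_bytes_per_second(
    total_bytes: int,
    start_second: int,
    end_second: int,
    bucket_seconds: int = 60,
) -> Dict[int, float]:
    """Return average bytes/second per time bucket, spreading the work evenly."""
    duration = end_second - start_second
    if duration <= 0:
        raise ValueError("end_second must be after start_second")
    rate = total_bytes / duration  # bytes per second while the compaction runs
    buckets: Dict[int, float] = {}
    second = start_second
    while second < end_second:
        bucket = second // bucket_seconds
        bucket_end = min((bucket + 1) * bucket_seconds, end_second)
        # Weight the rate by the share of the bucket the compaction covered.
        covered = bucket_end - second
        buckets[bucket] = buckets.get(bucket, 0.0) + rate * covered / bucket_seconds
        second = bucket_end
    return buckets
```

For a 10GB compaction running from second 0 to second 600, this yields ten one-minute buckets, each at roughly 1/60th of a GB per second.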

Remove all cross vpc ip swapping mentions

we deprecated the config + removed its functionality in #455

Ctrl-C stops nodetool repair

Currently, if you press ctrl-C after nodetool repair -hosts <ips>, Cassandra will complete repair of the current keyspace but not continue with the rest of them. We should either keep the whole process running after ctrl-C or make it painfully obvious that not all keyspaces will continue being repaired.

Metric for sstables upgraded to cassandra version 3

This metric would be helpful for tracking migration progress on clusters upgrading from Cassandra 2 to Cassandra 3.

Bytes Flushed rate metric

Right now we have a Bytes Flushed count metric that accumulates and resets every time the node is bounced. This makes it confusing to see where flushes actually occur or increase. We should turn it into a rate metric, similar to how compaction metrics work.
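As a rough illustration of the difference, a rate can be derived from samples of an accumulating counter, treating any drop in the counter as a restart. The function name `counter_to_rates` and the `(timestamp, cumulative_bytes)` sample shape are assumptions for this sketch, not part of the existing metrics code.

```python
from typing import List, Tuple


def counter_to_rates(samples: List[Tuple[float, int]]) -> List[Tuple[float, float]]:
    """Convert (timestamp_seconds, cumulative_bytes) samples to (timestamp, bytes_per_second)."""
    rates: List[Tuple[float, float]] = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            continue  # skip duplicate or out-of-order samples
        # A drop means the node was bounced and the counter restarted from zero,
        # so only the bytes accumulated since the reset are counted.
        delta = c1 - c0 if c1 >= c0 else c1
        rates.append((t1, delta / elapsed))
    return rates
```

Samples of 0, 600, 1200, and then 300 bytes taken a minute apart give rates of 10, 10, and 5 bytes per second, so the restart no longer shows up as a cliff in the graph.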

cassandra.live_disk_space_used.count metric inaccurate with multiple data directories

For a cluster with 4 separate data directories, I noticed this metric is quite inaccurate. The most recent metric for one node is 214GB total (summed all CFs), but cassandra.load.count is 423GB. 423GB matches what I see on the host.

Backport TimeWindowCompactionStrategy

Have some internal use cases with TTL'ed time-based data that it would be excellent for.

New Cluster Bootstrap Borked

Bug intro'ed in #97

If we have a brand new cluster that has never been turned on before, and then you turn multiple nodes on at once, they all remove themselves from the seed list, but none of them can communicate with any other ones so you fail bootstrap.

Example of brand new 2-node cluster where seeds list is both nodes.

INFO  [2019-11-19T21:52:04.042Z] org.apache.cassandra.net.OutboundTcpConnection: Handshaking version with other-node-hostname/other-node-ip (0: other-node-hostname/other-node-ip)
ERROR [2019-11-19T21:52:34.608Z] org.apache.cassandra.service.CassandraDaemon: Exception encountered during startup (throwable0_message: Unable to gossip with any seeds)
java.lang.RuntimeException: Unable to gossip with any seeds
        at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1360)
        at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:523)
        at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:759)
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:678)
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:564)
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:322)
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:560)
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:670)

WARN  [2019-11-19T21:52:34.609Z] org.apache.cassandra.gms.Gossiper: No local state, state is in silent shutdown, or node hasn't joined, not announcing shutdown
INFO  [2019-11-19T21:52:34.609Z] org.apache.cassandra.net.MessagingService: Waiting for messaging service to quiesce
INFO  [2019-11-19T21:52:34.609Z] org.apache.cassandra.net.MessagingService: MessagingService has terminated the accept() thread

Compactions running in parallel can fill up disk due to race condition

We've implemented a configurable safety precaution where a compaction won't run if it will exceed a certain threshold for max size on disk, which is at a default of 95% (#195). However, there's a race condition here if two compactions run in parallel.

@Sam-Kramer's example: There's only 100GB of 1000GB free. Compaction A requires 40GB, Compaction B requires 40GB, and Compaction C requires 40GB. They all start at the same time, each individually satisfies the requirement, and together they fill up the disk.
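One possible way to close this kind of check-then-act race is to reserve the space under a lock at the moment of the check, so that concurrent compactions see each other's pending writes. The sketch below is a toy model under that assumption: `DiskSpaceReservations`, `try_reserve`, and `release` are hypothetical names, not the code from #195.

```python
import threading


class DiskSpaceReservations:
    """Tracks space promised to in-flight compactions (toy model, sizes in GB)."""

    def __init__(self, total: int, used: int, max_used_fraction: float = 0.95):
        self._total = total
        self._used = used
        self._max_used_fraction = max_used_fraction
        self._reserved = 0
        self._lock = threading.Lock()

    def try_reserve(self, needed: int) -> bool:
        """Atomically check the threshold and, if it holds, reserve the space."""
        with self._lock:
            projected = self._used + self._reserved + needed
            if projected > self._total * self._max_used_fraction:
                return False
            self._reserved += needed
            return True

    def release(self, reserved: int, written: int = 0) -> None:
        """Drop a reservation once the compaction finishes or aborts."""
        with self._lock:
            self._reserved -= reserved
            self._used += written
```

In the example above (1000GB disk, 900GB used, 95% threshold), an unsynchronized check lets all three 40GB compactions through because each sees 940GB projected. With reservations, the first succeeds and the other two are refused until it releases its space.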

Difficult to track down corrupted sstables during compactions

I hit a cluster with a corrupted sstable. It wasn't being caught by CorruptSSTableException, but instead a ByteBuffer error during the compaction:

ERROR [2019-12-09T00:00:08.251Z] org.apache.cassandra.service.CassandraDaemon: Exception in thread Thread[CompactionExecutor:6,1,main] (throwable0_message: Not enough bytes. Offset: 2. Length: 16898. Buffer size: 5889)
java.lang.IllegalArgumentException: Not enough bytes. Offset: 2. Length: 16898. Buffer size: 5889
        at org.apache.cassandra.db.composites.AbstractCType.checkRemaining(AbstractCType.java:362)

I couldn't tell which table this was. I looked at ./nodetool compactionstats and there were no pending compactions. So I turned on CompactionTask DEBUG logging to see what compaction was being started right before:

DEBUG [2020-01-09T17:23:48.442Z] org.apache.cassandra.db.compaction.CompactionTask: Compacting (cc08dc80-3304-11ea-b4dc-5f43b96547a5) [/path/to/lb-31-big-Data.db:level=0, ] (0: cc08dc80-3304-11ea-b4dc-5f43b96547a5, 1: [/path/to/lb-31-big-Data.db:level=0, ])

I assumed the failing compaction was the one logged right before the error, ran corruption remediation on that table, then scrubbed it. I was still getting compaction exceptions, so I lowered concurrentcompactors on the affected node from 4 to 1 to ensure that the log line right before the exception was the failing compaction.

The scrub also failed, so I had to delete all sstables. (The failure didn't tell me which table)

WARN  [2020-01-08T18:36:02.481Z] org.apache.cassandra.utils.OutputHandler$LogOutput: Error reading index file (throwable0_message: Not enough bytes. Offset: 2. Length: 16898. Buffer size: 5889)
java.lang.IllegalArgumentException: Not enough bytes. Offset: 2. Length: 16898. Buffer size: 5889
        at org.apache.cassandra.db.composites.AbstractCType.checkRemaining(AbstractCType.java:362)
        at org.apache.cassandra.db.composites.AbstractCompoundCellNameType.fromByteBuffer(AbstractCompoundCellNameType.java:98)
        at org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:382)

…

WARN  [2020-01-08T18:36:02.481Z] org.apache.cassandra.utils.OutputHandler$LogOutput: Error reading index file (throwable0_message: Illegal Capacity: -1374630724)
java.lang.IllegalArgumentException: Illegal Capacity: -1374630724
        at java.util.ArrayList.<init>(ArrayList.java:157)
        at org.apache.cassandra.db.RowIndexEntry$Serializer.deserialize(RowIndexEntry.java:140)
        at org.apache.cassandra.db.compaction.Scrubber.updateIndexKey(Scrubber.java:384)
        at org.apache.cassandra.db.compaction.Scrubber.scrub(Scrubber.java:181)

Summary of entire remediation:

Turn on CompactionTask DEBUG logging and incorrectly identify keyspace1.table1 as corrupted.
Run corruption remediation on other nodes on that keyspace1.table1.
Run nodetool scrub on keyspace1.table1.
Check logs and see exceptions still occurring.
Reduce concurrent compactors to 1 and correctly identify corrupted keyspace2.table2.
Run corruption remediation on other nodes on keyspace2.table2.
Run nodetool scrub on keyspace2.table2. It fails.
Turn off affected node and delete all sstables in keyspace2.table2
Run corruption remediation on other nodes again
Turn affected node on and repair keyspace2.table2
Confirm in logs that there is no more compaction exception
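The log-correlation step in this remediation can be sketched as a small script that remembers the most recent "Compacting" DEBUG line and reports it when a CompactionExecutor exception follows. The regular expressions are written against the two log lines quoted in this issue and are assumptions about the general format. As noted above, the result is only trustworthy with a single concurrent compactor.

```python
import re
from typing import Iterable, List, Optional

# Matches e.g. "CompactionTask: Compacting (<uuid>) [/path/to/file:level=0, ]"
COMPACTING = re.compile(
    r"CompactionTask: Compacting \((?P<id>[0-9a-f-]+)\) \[(?P<files>[^\]]*)\]"
)
# Matches an uncaught exception reported from a compaction thread.
FAILURE = re.compile(r"Exception in thread Thread\[CompactionExecutor")


def suspect_sstables(lines: Iterable[str]) -> List[Optional[str]]:
    """For each compaction failure, return the file list of the last compaction started before it."""
    last_started: Optional[str] = None
    suspects: List[Optional[str]] = []
    for line in lines:
        started = COMPACTING.search(line)
        if started:
            last_started = started.group("files").strip(" ,")
        elif FAILURE.search(line):
            # None means no compaction start was logged before this failure.
            suspects.append(last_started)
    return suspects
```

Feeding it the DEBUG line and the ERROR line quoted earlier returns the `lb-31-big-Data.db` entry, which points at the table to remediate.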

        at java.util.Objects.requireNonNull(Objects.java:228)
        at com.palantir.cassandra.auth.model.config.ImmutableAuthorizationRole$Builder.addAllKeyspaces(ImmutableAuthorizationRole.java:325)
        at com.palantir.cassandra.auth.model.config.ImmutableAuthorizationRole$Builder.keyspaces(ImmutableAuthorizationRole.java:314)
        at com.palantir.cassandra.auth.model.config.AuthorizationRole.jsonCreator(AuthorizationRole.java:58)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

Also, possibly the largest sstable in a given level.