scylla-rust-driver's Introduction

ScyllaDB Rust Driver

This is a client-side driver for ScyllaDB written in pure Rust with a fully async API using Tokio. Although optimized for ScyllaDB, the driver is also compatible with Apache Cassandra®.

Note: this driver is officially supported but currently available in beta. Bug reports and pull requests are welcome!

Getting Started

The documentation book is a good place to get started. Another useful resource is the Rust and Scylla lesson on Scylla University.

Examples

let uri = "127.0.0.1:9042";

let session: Session = SessionBuilder::new().known_node(uri).build().await?;

let result = session.query("SELECT a, b, c FROM ks.t", &[]).await?;
let mut iter = result.rows_typed::<(i32, i32, String)>()?;
while let Some((a, b, c)) = iter.next().transpose()? {
    println!("a, b, c: {}, {}, {}", a, b, c);
}

Please see the full example program for more information. You can also run the example as follows if you have a Scylla server running:

SCYLLA_URI="127.0.0.1:9042" cargo run --example basic

All examples are available in the examples directory.

Features and Roadmap

The driver supports the following:

  • Asynchronous API
  • Token-aware routing
  • Shard-aware routing (specific to ScyllaDB)
  • Prepared statements
  • Query paging
  • Compression (LZ4 and Snappy algorithms)
  • CQL binary protocol version 4
  • Batch statements
  • Configurable load balancing policies
  • Driver-side metrics
  • TLS support (requires OpenSSL to be installed, see https://docs.rs/openssl/0.10.32/openssl/#automatic)
  • Configurable retry policies
  • Authentication support
  • CQL tracing

Ongoing efforts:

  • CQL Events
  • More tests
  • More benchmarks

Getting Help

Please join the #rust-driver channel on ScyllaDB Slack to discuss any issues or questions you might have.

Supported Rust Versions

Our driver's minimum supported Rust version (MSRV) is 1.66.0. Any changes will be explicitly published and will only happen during major releases.

License

This project is licensed under either of

at your option.

scylla-rust-driver's People

Contributors

altanozlu, annastuchlik, asledz, avelanarius, colin-grapl, cvybhu, dgarcia360, dtzxporter, gor027, havaker, jasperav, kbr-scylla, kejmer, lorak-mmk, macher259, muzarski, nemosupremo, oeb25, penberg, piodul, ponewor, psarna, quentinperez, rukai, sylwiaszunejko, ten0, ultrabug, wmitros, wprzytula, wyfo

scylla-rust-driver's Issues

CQL query support

Add support for sending QUERY requests to the server (4.1.4.).

Please note that there are separate issues for query result set parsing (#11) and paging (#10).

Cluster topology discovery

Probe the cluster for its topology to enable things like token-aware routing (you need to know the address of the node that owns a specific token).

std::bad_alloc (sic) when trying to use LZ4 compression

HEAD: a42aef8

I get... an std::bad_alloc error when trying to run the example.rs:

[piodul@localhost scylla-rust-driver]$ cargo run --example example

# ... skipped warnings ...

warning: 5 warnings emitted

    Finished dev [unoptimized + debuginfo] target(s) in 0.03s
     Running `target/debug/examples/example`
Connecting to localhost:9042 ...
Error: Error (code 0): std::bad_alloc

The exception occurs when trying to send the first query with LZ4 compression turned on.

Add retrieving replication strategy information from the cluster

In order to implement a token-aware policy, we need to know the replication strategy of a given keyspace so that we know which nodes are responsible for the data we're trying to read/write.

The first part is to get the result metadata from the response to a PREPARE request. That's already mostly done - look for TableSpec and deser_table_spec for details.

Once we have the keyspace name (which is just a string), we can retrieve its replication strategy by selecting it from a system table:

SELECT replication FROM system_schema.keyspaces WHERE keyspace_name = 'our_name';

After that we need to parse the replication strategy information and it can later be used to determine which nodes to use for load balancing.
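The parsing step could look roughly like the sketch below. It assumes the replication column has already been read into a string map; the Strategy enum and parse_strategy function are hypothetical names used for illustration and are not part of the driver. Only SimpleStrategy and NetworkTopologyStrategy get dedicated handling.

```rust
use std::collections::HashMap;

/// Replication strategy of a keyspace (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
pub enum Strategy {
    Simple { replication_factor: usize },
    NetworkTopology { dc_factors: HashMap<String, usize> },
    Other(String),
}

/// Parses the `replication` map read from system_schema.keyspaces.
pub fn parse_strategy(map: &HashMap<String, String>) -> Result<Strategy, String> {
    let class = map.get("class").ok_or("missing 'class' key")?;
    // The class name may be fully qualified, so keep only the last segment.
    let short = class.rsplit('.').next().unwrap_or(class.as_str());
    match short {
        "SimpleStrategy" => {
            let rf = map
                .get("replication_factor")
                .ok_or("missing 'replication_factor' key")?
                .parse::<usize>()
                .map_err(|e| e.to_string())?;
            Ok(Strategy::Simple { replication_factor: rf })
        }
        "NetworkTopologyStrategy" => {
            // Every key other than "class" is treated as a data-center name.
            let mut dc_factors = HashMap::new();
            for (dc, rf) in map.iter().filter(|(k, _)| k.as_str() != "class") {
                let rf = rf.parse::<usize>().map_err(|e| e.to_string())?;
                dc_factors.insert(dc.clone(), rf);
            }
            Ok(Strategy::NetworkTopology { dc_factors })
        }
        other => Ok(Strategy::Other(other.to_string())),
    }
}
```

Unknown strategy classes are kept as Other rather than rejected, so a load balancer could fall back to non-token-aware routing for such keyspaces.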

I believe that that's the related code from gocql:
https://github.com/gocql/gocql/blob/964d7011f63d85c0c135ca47e2d06032c6be391b/topology.go#L71-L91
https://github.com/gocql/gocql/blob/5913df4d474e0b2492a129d17bbb3c04537a15cd/metadata.go#L544-L598

LZ4 compression is incompatible with Scylla

LZ4 compression works for small messages with Scylla:

[penberg@nero rust-driver]$ SCYLLA_URI="localhost:9042" cargo run --example cqlsh-rs
    Finished dev [unoptimized + debuginfo] target(s) in 0.04s
     Running `target/debug/examples/cqlsh-rs`
Connecting to localhost:9042 ...
>> USE ks
>>

but not for larger ones:

Connecting to localhost:9042 ...
>> CREATE KEYSPACE IF NOT EXISTS ks WITH REPLICATION = {'class' : 'SimpleStrategy', 'replication_factor' : 1}
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error (code 0): CQL frame LZ4 uncompression failure', examples/cqlsh-rs.rs:22:48
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

We need to find a crate that is compatible with Scylla's LZ4.

Allow passing multiple node addresses to Session::connect

We currently only allow passing a single address to Session::connect. It's fine for token-aware routing, since we then pull the topology info and establish connections to each node anyway, but we should also allow non-token-aware users to have sensible load balancing (e.g. round robin over multiple addresses). Token awareness is kind of hardcoded for now, but it's not going to always be the case, so users should have other options.

In order to do that, we should allow passing multiple addresses to connect, just like other drivers tend to do, e.g. Session::connect(&[addr1, addr2, addr3]). These addresses should then be inserted into a pool, which can be used for load balancing.
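A minimal sketch of such a pool, assuming plain round robin over the addresses supplied by the user. RoundRobinPool and its methods are hypothetical names, not the driver's API; the counter is atomic so that picking an address needs only a shared reference.

```rust
use std::net::SocketAddr;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical pool of contact points, cycled in round-robin order.
pub struct RoundRobinPool {
    addrs: Vec<SocketAddr>,
    next: AtomicUsize,
}

impl RoundRobinPool {
    /// Returns None for an empty address list, since there is nothing to pick from.
    pub fn new(addrs: &[SocketAddr]) -> Option<Self> {
        if addrs.is_empty() {
            return None;
        }
        Some(RoundRobinPool {
            addrs: addrs.to_vec(),
            next: AtomicUsize::new(0),
        })
    }

    /// Picks the next address; safe to call from several threads at once.
    pub fn pick(&self) -> SocketAddr {
        let i = self.next.fetch_add(1, Ordering::Relaxed);
        self.addrs[i % self.addrs.len()]
    }
}
```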

Run PREPARE on all nodes

Currently, prepare() only sends the request via a single connection. What's better to do instead is to use the topology information (if available) to send the preparation request to each node.

Statement preparation

Add support for the PREPARE (4.1.5.) statement, which registers a CQL statement on the cluster for later execution.

Code formatting check

Our CI should check if our code conforms to a style enforced by cargo fmt.

Some of us (at least I) use cargo fmt to format the code that we write locally. We should agree on one version and enforce its formatting so that we won't have to deal with unwanted formatting changes when working on files formatted by another person who used a different version of cargo fmt (or did not use cargo fmt at all).

Calculate replica sets when performing token-aware routing

Currently, when executing a prepared statement, we calculate the token of the partition key and pick the connection to the node which owns this token (i.e. which owns the token range/vnode that this token lies in).

The owner of a token should be one of the replicas for the given partition, but I'm not sure if it will always be -- maybe in NTS it could happen that this owner lies in a data-center that the keyspace does not replicate to, in which case it won't be a replica.

In any case, there can and usually will be other replicas based on the replication strategy of the keyspace that we're performing the statement on. The connection picking code should calculate this set of replicas and in case executing the statement fails on one of them, try another one. It could also try balancing the load between the replicas somehow by e.g. picking the connections in a round-robin fashion.
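As a rough illustration of the replica calculation, the sketch below walks a token ring clockwise and collects distinct nodes. It assumes SimpleStrategy semantics only; NetworkTopologyStrategy would additionally need data-center and rack information. Ring and simple_replicas are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Simplified token ring: vnode token -> node name (hypothetical model).
pub type Ring = BTreeMap<i64, String>;

/// Returns up to `rf` distinct nodes, starting at the first vnode whose token
/// is >= `token` and wrapping around the ring when the end is reached.
pub fn simple_replicas(ring: &Ring, token: i64, rf: usize) -> Vec<String> {
    let mut replicas: Vec<String> = Vec::new();
    // Tokens from `token` upwards first, then the wrapped-around part.
    let walk = ring.range(token..).chain(ring.range(..token));
    for (_, node) in walk {
        if replicas.len() == rf {
            break;
        }
        // A node may own several vnodes; count it only once.
        if !replicas.contains(node) {
            replicas.push(node.clone());
        }
    }
    replicas
}
```

The first element is the token owner; the remaining elements are the candidates to try next if a request to the owner fails.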

ORM

I made an orm for https://github.com/AlexPikalov/cdrs a while ago: https://github.com/Jasperav/cdrs_orm. In short:

I see that Scylla has an ORM for Go: https://github.com/scylladb/gocqlx.

I think it's a good idea to make an ORM for this crate when it is production ready, maybe we can use cdrs_orm (my crate) as a starting point. I can help with the migration.

LWT support

We need to support and test LWT (lightweight transaction) queries - the ones that have conditions within them, e.g. IF EXISTS.
We also have a Scylla-specific optimization, which returns a flag when preparing a statement that this statement is a lightweight transaction. We can then reduce the number of paxos conflicts by trying to always send the requests of the same key to the same node, which is assumed to be the paxos coordinator.

We should also have tests for LWT queries.

An API for querying rows with known type

Problem

Currently, our query API returns rows in untyped form. Because of its untypedness, it can be quite inconvenient to use - there is a lot of unwrapping involved in order to get a single value:

if let Some(rs) = session.query("SELECT a, b, c FROM ks.t", &[]).await? {
    for r in rs {
        let a = r.columns[0].as_ref().unwrap().as_int().unwrap();
        let b = r.columns[1].as_ref().unwrap().as_int().unwrap();
        let c = r.columns[2].as_ref().unwrap().as_text().unwrap();
        println!("a, b, c: {}, {}, {}", a, b, c);
    }
}

While I think this interface may have some use (e.g. when doing SELECT * FROM ... we may not know the column names and types and we want to discover them along with the response), it's unnecessarily inconvenient when the user knows the schema and knows which types to expect.

We should add another interface which returns rows as tuples of user-specified types. This interface would first check that the types declared by the user match the metadata in the response, and then proceed with deserialization.

Examples

This is how I imagine the example above would be rewritten using such an API:

let result: Option<QueryResult<(i32, i32, String)>> = session.query("SELECT a, b, c FROM ks.t", &[]).await?;
if let Some(rs) = result {
    for (a, b, c) in rs {
        println!("a, b, c: {}, {}, {}", a, b, c);
    }
}

I think there should be some scenarios in which the type inference eliminates the need to specify the types at all:

fn print_row(a: i32, b: i32, c: String) {
    println!("a, b, c: {}, {}, {}", a, b, c);
}

if let Some(rs) = session.query("SELECT a, b, c FROM ks.t", &[]).await? {
    for (a, b, c) in rs {
        print_row(a, b, c);
    }
}

Maybe the Query and PreparedStatement types should encode the returned row type? The old API would be put to UntypedQuery and UntypedPreparedStatement.

Establish connections to all available nodes on startup

Currently, connections to specific nodes/shards are established lazily - only when a token-aware statement is about to be sent to a specific node and it doesn't have a working connection yet. That creates a subtle problem with statements which are propagated from the driver side (e.g. #115), because those statements will only get propagated to existing connections, which may not be enough. In particular, right after calling Session::connect, the number of connections is currently always 1.

The solution is to simply establish a connection the moment we discover (via topology) that there exists a node which wasn't contacted yet.

Provide an option to bind values to each separate statement in a batch

With the current batch API, it is easy to run into a mismatch between the number of statements and the number of value lists. To avoid it, we need to provide an option to bind values to each separate statement in a batch.

The new batch API should support:

  • Binding values to each separate statement in a batch
  • Modifying statement's values between batch executions
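One possible shape for such an API is sketched below. Batch, BoundStatement and check_arity are hypothetical names, values are simplified to strings, and counting '?' characters is a naive stand-in for real bind-marker detection (it would miscount a '?' inside a string literal).

```rust
/// A statement together with its own values (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct BoundStatement {
    pub cql: String,
    pub values: Vec<String>,
}

#[derive(Debug, Default)]
pub struct Batch {
    statements: Vec<BoundStatement>,
}

impl Batch {
    pub fn new() -> Self {
        Batch { statements: Vec::new() }
    }

    /// Adds a statement with its values and returns its index in the batch.
    pub fn append(&mut self, cql: &str, values: Vec<String>) -> Result<usize, String> {
        check_arity(cql, &values)?;
        self.statements.push(BoundStatement { cql: cql.to_string(), values });
        Ok(self.statements.len() - 1)
    }

    /// Replaces the values of one statement between batch executions.
    pub fn set_values(&mut self, index: usize, values: Vec<String>) -> Result<(), String> {
        let stmt = self
            .statements
            .get_mut(index)
            .ok_or_else(|| format!("no statement at index {}", index))?;
        check_arity(&stmt.cql, &values)?;
        stmt.values = values;
        Ok(())
    }

    pub fn statements(&self) -> &[BoundStatement] {
        &self.statements
    }
}

/// Naive check: the number of '?' markers must equal the number of values.
fn check_arity(cql: &str, values: &[String]) -> Result<(), String> {
    let markers = cql.matches('?').count();
    if markers == values.len() {
        Ok(())
    } else {
        Err(format!("expected {} values, got {}", markers, values.len()))
    }
}
```

Because values live next to their statement, a count mismatch is reported when the statement is added rather than when the batch is executed.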

Prepared statement execution

After statement preparation support (#8), add support for the EXECUTE request (4.1.6.) to execute prepared statements. Please note that a prepared statement might be evicted from the prepared statement cache on the server, which requires the driver to re-prepare the statement.

Allow setting the consistency level

As of now, it's not possible to set the consistency level - it's hardcoded everywhere as ONE. We should allow the user to set the consistency level.

More informative errors

Currently, we are using the anyhow crate for error handling because it is very easy to use - it exposes one error type anyhow::Error, which is a catch-all for all types implementing std::error::Error.

This approach has a big drawback - while anyhow::Error can be associated with a descriptive error message so that it is easy to understand by a human, it's not really possible to differentiate different kinds of failure in the code - e.g. distinguish between query timeout and connection close. This kind of information will not only be useful for client programs, but for driver internals, too - for example, see #59 (comment)

We should gradually move away from the anyhow::Error type and write meaningful error types ourselves. I'd like to suggest the thiserror crate - it greatly simplifies writing custom error types.
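To make the idea concrete, here is a hand-written sketch of such an error type using only the standard library. The QueryError variants and the is_transport_failure helper are hypothetical; the thiserror crate would generate the Display and Error implementations from attributes instead.

```rust
use std::fmt;

/// Hypothetical driver error type with distinguishable failure kinds.
#[derive(Debug)]
pub enum QueryError {
    Timeout { after_ms: u64 },
    ConnectionClosed,
    Db { code: i32, message: String },
}

impl fmt::Display for QueryError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            QueryError::Timeout { after_ms } => {
                write!(f, "query timed out after {} ms", after_ms)
            }
            QueryError::ConnectionClosed => write!(f, "connection closed"),
            QueryError::Db { code, message } => {
                write!(f, "database error {}: {}", code, message)
            }
        }
    }
}

impl std::error::Error for QueryError {}

/// Callers (and driver internals) can now branch on the failure kind.
pub fn is_transport_failure(e: &QueryError) -> bool {
    matches!(e, QueryError::Timeout { .. } | QueryError::ConnectionClosed)
}
```

Since QueryError implements std::error::Error, it still converts into anyhow::Error or Box<dyn Error>, which would allow a gradual migration.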

TLS support

We should support encrypted connections over TLS.

Retry policy support

Add support for retrying of some operations under right circumstances (query is idempotent, retry policy is configured).

Support USE statement

The USE statement isn't particularly complicated, but it should be treated in a special way - it should be propagated to all underlying connections for a given CQL session, since it changes their state.

Refactor Value API

Currently all values passed to queries are represented like this:

pub enum Value {
    Val(Bytes),
    Null,
    NotSet,
}

This works well for simple types, but I ran into some problems when implementing User Defined Types.
The first problem is that converting a UDT into Val(Bytes) can fail if a field is serialized to [bytes] bigger than 2GiB. This makes the query API ugly because instead of using the simple values! we have to use try_values! and handle the conversion error.
E.g. with values!:

session
        .query(
            "INSERT INTO ks.t (a, b, c) VALUES (?, ?, ?)",
            &scylla::values!(3, 4, "def"),
        )
        .await?;

And with try_values!:

session
        .query(
            "INSERT INTO ks.t (a, b, c) VALUES (?, ?, ?)",
            &(scylla::try_values!(3, 4, "def") ?),
        )
        .await?;

Additionally the same error can happen when awaiting the query because we serialize Val(Bytes) for the final time when sending the query.
Maybe we could avoid this ugliness if Value was changed to something like:

pub enum Value {
    Serialized(Bytes),
    TooBigTooSend,
}

Then values! would convert Rust types into either a successfully serialized Serialized(Bytes) or, if the value turns out bigger than 2GiB, TooBigTooSend. Later the query API would check whether the values are properly serialized and return an error in case of TooBigTooSend.
This would make the user's API nicer at the cost of unwrapping Values in query code.
Maybe splitting Value into variants would also work well with named values in the future (?)

Another issue is that when serializing a UDT with the current API, each field is recursively converted into a Value and then written onto the final BytesMut, which means an allocation for each conversion into Value. That is not ideal. A better solution would be to introduce a trait that allows serializing a type as a value by writing into a &mut impl BufMut instance, similarly to how requests are serialized. Then we could convert all Rust types into Value::Serialized using this trait. Currently the bytes inside Value::Val(Bytes) aren't a fully serialized value, so it wouldn't work.

There would be some problems because, for example, token routing hashes the value bytes only if the value is not null, but this could be solved by checking if the number of serialized bytes is > 4 and then taking bytes[4..] as the data to hash.
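A sketch of what such a trait might look like. To stay dependency-free it writes into a Vec<u8> rather than the bytes crate's BufMut, and the names SerializeCql and ValueTooBig are hypothetical. It follows the CQL [bytes] layout: a 4-byte big-endian length followed by the body, with a length of -1 denoting null.

```rust
use std::convert::TryFrom;

/// Returned when a value does not fit the 4-byte signed length prefix.
#[derive(Debug, PartialEq)]
pub struct ValueTooBig;

/// Hypothetical trait: append `self` to `buf` as a fully serialized value.
pub trait SerializeCql {
    fn serialize(&self, buf: &mut Vec<u8>) -> Result<(), ValueTooBig>;
}

impl SerializeCql for i32 {
    fn serialize(&self, buf: &mut Vec<u8>) -> Result<(), ValueTooBig> {
        buf.extend_from_slice(&4i32.to_be_bytes()); // length of an int
        buf.extend_from_slice(&self.to_be_bytes());
        Ok(())
    }
}

impl<'a> SerializeCql for &'a str {
    fn serialize(&self, buf: &mut Vec<u8>) -> Result<(), ValueTooBig> {
        // Fails for strings longer than i32::MAX bytes.
        let len = i32::try_from(self.len()).map_err(|_| ValueTooBig)?;
        buf.extend_from_slice(&len.to_be_bytes());
        buf.extend_from_slice(self.as_bytes());
        Ok(())
    }
}

impl<T: SerializeCql> SerializeCql for Option<T> {
    fn serialize(&self, buf: &mut Vec<u8>) -> Result<(), ValueTooBig> {
        match self {
            Some(v) => v.serialize(buf),
            None => {
                buf.extend_from_slice(&(-1i32).to_be_bytes()); // null marker
                Ok(())
            }
        }
    }
}
```

Every value is appended to the same buffer, so a UDT implementation could serialize its fields one after another without an intermediate allocation per field.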

I'm not sure what api for Value would be the best but current one is non ideal
I'm gonna try some options and see what could work

Allow configuring load balancing algorithms

Session objects should accept some kind of config information (e.g. a configuration struct), which includes the picked load balancing strategy:

  • RoundRobin
  • DCAwareRoundRobin
  • TokenAware
  • ShardAware

... where TokenAware and ShardAware also take an underlying policy for internal load balancing (so that a user can configure the load balancing to be TokenAware(RoundRobin) or TokenAware(DCAwareRoundRobin)).
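The nesting could be expressed with a recursive enum, as in the sketch below. LoadBalancing, SessionConfig and the local_dc field are hypothetical names chosen for illustration.

```rust
/// Hypothetical load balancing configuration; wrappers hold a child policy.
#[derive(Debug, Clone, PartialEq)]
pub enum LoadBalancing {
    RoundRobin,
    DcAwareRoundRobin { local_dc: String },
    TokenAware(Box<LoadBalancing>),
    ShardAware(Box<LoadBalancing>),
}

/// Hypothetical configuration struct accepted by a session.
#[derive(Debug, Clone, PartialEq)]
pub struct SessionConfig {
    pub load_balancing: LoadBalancing,
}

impl LoadBalancing {
    /// Renders the policy chain, e.g. "TokenAware(RoundRobin)".
    pub fn describe(&self) -> String {
        match self {
            LoadBalancing::RoundRobin => "RoundRobin".to_string(),
            LoadBalancing::DcAwareRoundRobin { local_dc } => {
                format!("DCAwareRoundRobin({})", local_dc)
            }
            LoadBalancing::TokenAware(child) => format!("TokenAware({})", child.describe()),
            LoadBalancing::ShardAware(child) => format!("ShardAware({})", child.describe()),
        }
    }
}
```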

Authentication support

If authentication is enabled on the server, it will send an AUTHENTICATE (4.2.3.) message, to which the client responds with AUTH_RESPONSE. There are also AUTH_CHALLENGE and AUTH_SUCCESS messages sent by the server.

Add multiplexing to the connection

Our connection class needs to be able to handle multiple communication streams, identified by the stream: u16 field in the CQL specification. Also, CQL allows the server to push messages to the driver via EVENT. In order to handle that correctly, we should prepare the connection class to multiplex multiple communication channels and have separate routines for reading and writing.
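The bookkeeping behind such multiplexing can be sketched as a map from stream id to a response channel. This is a synchronous, standard-library-only illustration with hypothetical names (StreamRouter, register, dispatch); a real connection would use async channels and would also need to set aside the ids the protocol reserves for server-pushed events.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical router matching response frames to in-flight requests.
pub struct StreamRouter {
    next_id: u16,
    pending: HashMap<u16, Sender<Vec<u8>>>,
}

impl StreamRouter {
    pub fn new() -> Self {
        StreamRouter { next_id: 0, pending: HashMap::new() }
    }

    /// Used by the writing routine: allocates a free stream id and returns
    /// the receiver on which the response body will arrive.
    pub fn register(&mut self) -> Option<(u16, Receiver<Vec<u8>>)> {
        if self.pending.len() > u16::MAX as usize {
            return None; // every stream id is in use
        }
        while self.pending.contains_key(&self.next_id) {
            self.next_id = self.next_id.wrapping_add(1);
        }
        let id = self.next_id;
        self.next_id = self.next_id.wrapping_add(1);
        let (tx, rx) = channel();
        self.pending.insert(id, tx);
        Some((id, rx))
    }

    /// Used by the reading routine: forwards a response body to its waiter.
    /// Returns false if nobody is waiting on that stream id.
    pub fn dispatch(&mut self, stream: u16, body: Vec<u8>) -> bool {
        match self.pending.remove(&stream) {
            Some(tx) => tx.send(body).is_ok(),
            None => false,
        }
    }
}
```

Responses can arrive in any order; each waiter only sees the frame carrying its own stream id.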

Support all CQL types

Support for CQL types is now restricted to pretty much ints and strings. We would like the rest of the types, along with collections and UDTs, to be supported by our driver as well.

Implement shard aware load balancing

As with #124, the backend is already there, but we want this option to be configurable.

Also, shard-aware load balancer should be the default choice in case a user hasn't provided any custom configuration. Shard-aware routing should fall back to token awareness if shard info is not available (e.g. because we're talking to Cassandra instead of Scylla).

Add a way to wait for schema agreement

It's quite an important feature, since otherwise users don't know when it's safe to start sending requests again after they issue some schema modification statements.
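A rough sketch of the waiting logic, assuming the caller supplies a closure that returns the schema versions currently reported by the nodes (for instance gathered from the schema_version columns of system.local and system.peers). The function names are hypothetical and the loop is blocking for simplicity; the driver itself would do this asynchronously.

```rust
use std::collections::HashSet;
use std::thread::sleep;
use std::time::{Duration, Instant};

/// True when all nodes report the same schema version.
pub fn in_agreement(versions: &[String]) -> bool {
    let distinct: HashSet<&String> = versions.iter().collect();
    distinct.len() == 1
}

/// Polls `fetch` until the versions agree or `timeout` elapses.
pub fn wait_for_agreement<F>(mut fetch: F, interval: Duration, timeout: Duration) -> bool
where
    F: FnMut() -> Vec<String>,
{
    let deadline = Instant::now() + timeout;
    loop {
        if in_agreement(&fetch()) {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        sleep(interval);
    }
}
```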

Connection breakage if compression algorithm is not supported

If we request a compression algorithm that is not supported by the server, we should either fall back to non-compression or fail the connection with a human-readable error. Right now, our code will just assume that the compression algorithm is supported and send compressed messages, which the server will fail to parse.
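Both behaviours described above can be sketched in one function, assuming the list of algorithm names advertised by the server (the COMPRESSION entry of its SUPPORTED response) is already available. Compression and negotiate are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Compression {
    Lz4,
    Snappy,
}

impl Compression {
    pub fn as_str(self) -> &'static str {
        match self {
            Compression::Lz4 => "lz4",
            Compression::Snappy => "snappy",
        }
    }
}

/// Decides which compression to use for a connection.
/// Ok(None) means "send uncompressed frames".
pub fn negotiate(
    requested: Option<Compression>,
    supported: &[&str],
    allow_fallback: bool,
) -> Result<Option<Compression>, String> {
    let wanted = match requested {
        None => return Ok(None),
        Some(c) => c,
    };
    if supported.iter().any(|s| s.eq_ignore_ascii_case(wanted.as_str())) {
        Ok(Some(wanted))
    } else if allow_fallback {
        Ok(None) // fall back to no compression
    } else {
        Err(format!(
            "server does not support {} compression (supported: {:?})",
            wanted.as_str(),
            supported
        ))
    }
}
```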

Compression support

The CQL binary protocol supports compression. The client must negotiate which compression algorithm to use during connection establishment.

Shard-aware routing

Scylla has a CQL protocol extension to enable shard-aware routing -- i.e. routing a request directly to a port on a specific shard in a node.

Query result parsing

The RESULT response (4.2.5.) from the server has a complex Rows type, which represents a result set returned for a CQL query.

Driver-side metrics

We should gather and expose some metrics about driver operations, e.g. how many queries were sent, how many of them resulted in an error, what was their latency, etc.

Implementation detail: @piodul since session is immutable, what's the rusty way of keeping mutable metrics inside it? RefCell?
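One option, sketched here with hypothetical names, is to use atomic counters: they can be updated through a shared reference, and unlike RefCell they are Sync, so the session can still be shared between threads or tasks.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical metrics holder that can be updated through `&self`.
#[derive(Default)]
pub struct Metrics {
    queries: AtomicU64,
    errors: AtomicU64,
    total_latency_us: AtomicU64,
}

impl Metrics {
    /// Records one finished query with its latency in microseconds.
    pub fn record(&self, latency_us: u64, ok: bool) {
        self.queries.fetch_add(1, Ordering::Relaxed);
        self.total_latency_us.fetch_add(latency_us, Ordering::Relaxed);
        if !ok {
            self.errors.fetch_add(1, Ordering::Relaxed);
        }
    }

    pub fn queries(&self) -> u64 {
        self.queries.load(Ordering::Relaxed)
    }

    pub fn errors(&self) -> u64 {
        self.errors.load(Ordering::Relaxed)
    }

    /// Mean latency over all recorded queries, or None if there were none.
    pub fn mean_latency_us(&self) -> Option<u64> {
        let n = self.queries();
        if n == 0 {
            None
        } else {
            Some(self.total_latency_us.load(Ordering::Relaxed) / n)
        }
    }
}
```

The counters are updated independently, so a reader may briefly observe a query count that is one ahead of the latency total; for monitoring purposes that is usually acceptable.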
