vaticle / typedb-protocol
TypeDB (Core and Cluster) RPC Communication Protocol
License: Mozilla Public License 2.0
We currently use oneof to denote an optional field. As of Protobuf 3.13+ this is no longer the standard approach, and it is ugly.
Replace 'oneof' with 'optional' once all dependants are on Protobuf 3.13+.
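A minimal sketch of the change, assuming an illustrative field named label (the message and field names here are examples, not the actual protocol):

```protobuf
syntax = "proto3";

// Current pattern: a single-field oneof whose only purpose is
// to give the field a "not set" state.
message GetLabelReq {
  oneof label {
    string value = 1;
  }
}

// With proto3 'optional', the wrapper disappears and explicit
// presence tracking (a has_label accessor) is generated directly.
message GetLabelReqNew {
  optional string label = 1;
}
```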
We have left-over compilation tooling for Rust using the grpc crate. Once we've verified the tonic-based compiler introduced in #160 is working fully, we should delete the old tooling.
Implement user role management protocol supporting the following functionalities:
ConceptMethod now has a oneof identifier - either an IID or a Label - which is weird.
Split ConceptMethod into ThingMethod and TypeMethod.
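A rough sketch of the proposed split (the message shapes and field names below are assumptions for illustration, not the actual protocol definitions):

```protobuf
syntax = "proto3";

// Today: one message whose identifier is a oneof of two unrelated kinds.
message ConceptMethodReq {
  oneof identifier {
    bytes iid = 1;     // identifies a Thing
    string label = 2;  // identifies a Type
  }
}

// Proposed: two messages, each carrying only the identifier that fits it.
message ThingMethodReq {
  bytes iid = 1;
}

message TypeMethodReq {
  string label = 1;
}
```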
[This issue has been generalised based on Alex's message below]
The protocol spec includes definitions specific to Core and Cluster in one place. However, this exposes Cluster APIs on the Core server, where they could accidentally be implemented, and it represents a domain leak. The Cluster protocol should be separated out, extending the Core protocol in a separately released package.
We currently have a very large number of Concept API methods - and a lot of them feel like they should just be overloads rather than being separate methods. For example:
ThingType.GetOwns.Req thing_type_get_owns_req = 303;
ThingType.GetOwnsExplicit.Req thing_type_get_owns_explicit_req = 310;
ThingType.GetOwnsOverridden.Req thing_type_get_owns_overridden_req = 311;
In the example above, explicit and overridden should really be parameters - not methods in their own right. We should add them to ThingType.GetOwns.Req:
message GetOwns {
  message Req {
    oneof filter {
      AttributeType.ValueType value_type = 1;
    }
    bool keys_only = 2;
    bool explicit_only = 3;
    bool overridden_only = 4;
  }
  message ResPart {
    repeated Type attribute_types = 1;
  }
}
We can apply this same simplification to a great number of Concept API methods to bring down the total count.
Problem to solve
The nodejs genrule is cumbersome, and it also requires exposing the raw .proto files with a filegroup instead of accessing them via the provided proto_library rules.
Proposed solution
Rewrite the genrule as a fully-fledged Bazel rule, accessing the source files via proto_library.src_file, and delete the associated filegroup from the /proto/BUILD file.
We're renaming this class in grakn.core.graql.Answer (server), so we need to align it in the Session.proto definition. However, we also need to update the client drivers, so let's only do this after we complete vaticle/typedb#4890.
Right now, some dependencies and their respective version numbers are declared in package.json. This is not ideal; they should be declared in Bazel instead.
UUID representation as a string is inefficient.
Store Session IDs and Request IDs (and any other UUIDs) as bytes, not strings.
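A minimal sketch of the change, assuming a field named session_id (the message and field names are illustrative):

```protobuf
syntax = "proto3";

// Before: the UUID travels as its 36-character textual form,
// e.g. "550e8400-e29b-41d4-a716-446655440000" - 36 bytes plus tag.
message SessionOpenReq {
  string session_id = 1;
}

// After: the same UUID as its raw 16 bytes, less than half the size
// and with no parsing/formatting on either end.
message SessionOpenReqNew {
  bytes session_id = 1;  // 16-byte UUID
}
```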
We use a oneof with a custom Null message to represent deliberately missing values in protobuf. This is non-standard, since Google provides standard null messages (e.g. google.protobuf.NullValue); moreover, a oneof in protobuf already has a "not set" state, making the Null message redundant.
We should replace all of our usages with either single-field oneofs or ordinary message fields (which are always optionally not set).
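A sketch of the replacement, using an illustrative GetSupertype method (the message names are assumptions):

```protobuf
syntax = "proto3";

message Type {
  string label = 1;
}

// Current pattern: an explicit Null arm inside the oneof.
message Null {}

message GetSupertypeRes {
  oneof res {
    Type type = 1;
    Null null = 2;  // redundant: the oneof can simply be left unset
  }
}

// Proposed: "no supertype" is just the unset state. An ordinary
// message field works too, since message fields track presence.
message GetSupertypeResNew {
  Type type = 1;  // unset field means no supertype
}
```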
In Rust, we have the ability to split up our gRPC bindings, producing a server package and a client package. This would currently make the client package imported by typedb_client roughly 25% smaller, and the server package imported by typedb roughly 15% smaller.
One naive way would be to compile two distinct crates (typedb_protocol_client and typedb_protocol_server). But that's not great: we should be importing from typedb_protocol, not from typedb_protocol_client.
A better strategy exists: compile the server and client bindings individually, then pack them both into the same crate, but gate them behind crate features. If the user wants to depend on the client, they specify the following in their Cargo.toml:
[dependencies]
typedb_protocol = { version = "1.0.0", features = ["client"] }
And if they want the server, they specify features = ["server"].
At the time of writing, producing a crate with optional features is most likely not supported in bazel-distribution.
The current structure of having separate packages for Session, Keyspace and KGMS doesn't accurately reflect the programming story of interacting with the gRPC protocol, and could be better organised.
UUIDv4 generation is quite costly, and generating a new one for each query in rapid succession (e.g. during data ingestion) can make it a bottleneck.
Generate a UUID once per transaction and append a sequence number to each request in the transaction, or move away from UUIDs altogether.
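One possible shape for this, sketched in protobuf (the field name and the exact encoding are assumptions, not a decided design):

```protobuf
syntax = "proto3";

message TransactionReq {
  // Instead of a fresh UUIDv4 per request: the transaction's UUID
  // (16 bytes, generated once at transaction open) followed by a
  // 4-byte big-endian per-transaction sequence counter. Still unique,
  // but only one random UUID is generated per transaction.
  bytes req_id = 1;
}
```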
When performing a query that returns many results, the query performance is always relatively bad, scaling with the connection latency to the database, because the protocol explicitly iterates one result at a time despite the server being able to generate all of the results quickly.
Currently there is no known workaround to this issue.
This performance issue could be mitigated by allowing the server to continue sending results in batches.
The iteration request could specify a batch size (up to ALL) in Iter.Req, and the grakn server would return the requested number of results as multiple Iter.Res messages, ending with an Iter.Res.done = true if the results complete before the batch size is met (or at the end of the results for ALL).
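This proposal could be sketched in protobuf as follows (the message shapes, field names and the QueryResult type are assumptions for illustration):

```protobuf
syntax = "proto3";

message QueryResult {
  bytes payload = 1;  // placeholder for an actual result message
}

message Iter {
  message Req {
    oneof batch {
      int32 size = 1;  // return up to this many results in one batch
      bool all = 2;    // stream every remaining result
    }
  }
  message Res {
    oneof res {
      QueryResult result = 1;  // one result within the batch
      bool done = 2;           // true: iteration finished
    }
  }
}
```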
Client                    Server
======                    ======
Req(ALL)     -->
                          Res
Receive      <--          Res
Receive      <--          Res
Receive      <--          Res
Receive      <--          Done
Done
This works well for execute requests, since ALL can always be requested (as the execute is always planning to receive all).
In streaming requests, a large batch size could be used instead. The client can begin iterating the results as fast as it receives them, but the batch size also prevents the server from overwhelming it with results (a very simplified variation of back-pressure). In future, this back-pressure would be better handled with custom flow control for the Transaction RPC.
The only issue with using large batches for a streaming request is that if the client wants to use the Concept API during the transaction, it must wait until it has received the whole of the last batch.
This change also does not impact existing client implementations, since not providing a batch size would default to a single iteration (as before).
An alternative to using large batches in streaming would be a double-buffered batching stream, which ensures the next batch is being fetched while the previous batch is being consumed; however, this would be better built on top of the current work. The best variation of this would be an "adaptive" double-buffered stream, which adapts the batch size based on the iteration latency.