mystenlabs / narwhal Goto Github PK

View Code? Open in Web Editor NEW

153.0 153.0 67.0 51.5 MB

Narwhal & Tusk are a high throughput mempool & consensus, used in the Sui smart contract platform

License: Apache License 2.0

Python 9.18% Rust 89.93% Shell 0.63% Dockerfile 0.26%

consensus rust

narwhal's People

Contributors

Stargazers

Watchers

Forkers

huitseeker bmwill bbenzikry mwufi bonedaddy patrickkuo blockchain-for punwai sblackshear mystenmark mwtian rootwarp touhonoob code-eren omercinar300 erkanborsa70 kadrkn movingco foorgoo judex56 amor2019 alienburg brewedboost puyen2015 erwanor jsteenb2 aakoshh pinkdiamond1 msrg andnasnd ispeterxu velvia crabust suidevnet ssyuan fomalhauthmj emrgnt-cmplxty dmitri-perelman yr86 zclim allan-bailey suimerq 0xvon crypto-forks richmix piersy dattgoswami dewrin patstiles tatinep lky-stephen dadacf 117503445 mirror-quote moonlinkvcapital neurocompute-ai steelvortex9 herzarika imotai sajwad xinforma zhouhu949 johnchenyan leaveshy

narwhal's Issues

[read_causual] introduce endpoint to retrieve collection digests

Epic: #44

In order to allow our customers to interface with our mempool we need to introduce a series of endpoints. One of those is the read_causal which will takes a collection digest and returns the collection digests encountered in a BFS of the graph, up to a commit boundary. On high level this endpoint will something like:

read_causal( collection_boundary: Digest ) -> Vec<Digest>

The behaviour of this endpoint on high level will work like this:

Start by the last "checkpoint" that has been read previously read
Start fetching collection digests by performing the BFS traversal (see MystenLabs/sui#5187 ) (probably up to a limit??)
Return the results

[get_collections] block_waiter to call block_synchronizer for fetching missing blocks

Story

Parent story: #67

Description

Currently the block_waiter returns an error when a certificate is not found. As part of this task the block_waiter should be modified in order to leverage the block_synchronizer to fetch the missing blocks. The process will be best effort since we might still fail to fetch the missing blocks. To ensure that we avoid unnecessary latency, the block_waiter should continue the normal processing on fetching the found the blocks and continue fetching the retrieved blocks from the block_synchronizer once they become available.

[remove_collections] introduce retry logic for failed worker requests

Story

#70

Description

In order to improve the block_remover robustness, we want to retry a few times to fetch a batch from a worker in case the request has been failed or not response has been received in time.

As part of this task this logic should be implemented on the block_remover. The number of retries should be configurable.

[get_collections] re-asses the storage model for worker batch store

Story: #67

Following the comments on the PR #66 , we should consider whether we should stick with the current storage model , meaning storing the WorkerMessage::Batch(Batch) serialised in database, or just store the Batch directly to allow us directly retrieve the required quantity.

Points that should be taken into account:

potential performance penalties: it was deliberately chosen to store the serialised WorkerMessage::Batch(Batch) as is to avoid the serialisation cost

[remove_collections] wire in components and integrate end to end

Story

Parent story: #67

Description

Following the work on #80 as part of this task we should wire in the block_remover component on the primary node and also make sure that the responses from the worker nodes for the batch removals are pumped into it.

[Narwhal as a mempool] Expose block / collection information to the primary

This is a sub-part of #44 .
We should expose the ability for the primary to read the contents of a collection (a Narwhal block) locally.
The transactions for a block:

arrive at workers of a primary,
are grouped there as batches,
are grouped at the primary - and proposed there - as a block / collection.

Reading the collection should therefore involve mapping all of this in reverse:

which workers have related batches (from the block's header),
connecting to those and re-constituting the data locally.

An extension of this issue will include a cache at the primary, see #44.

Upstream issue:
facebookresearch/narwhal#13
Related issue
MystenLabs/sui#5184
Note:
Worker fault tolerance will require data-tracking at the primary, and C&C for replication of worker's data at other workers. It would be nice if it could extend the rails built here.

[store] use the reopen! macro to reopen multiple column families

A new macro has been introduced as part of the typed-store to allow us reopen multiple column families at once and avoid code duplication.

As part of this issue, the macro reopen! should be used instead of handling the reopen of column families one by one.

For example, a place where this could be used:

narwhal/node/src/main.rs

Lines 123 to 141 in 6b08995

    
           let header_store = Store::new( 
        
               rocks::DBMap::<Digest, Header<Ed25519PublicKey>>::reopen(&rocksdb, Some("headers")) 
        
                   .expect("Failed keying headers database"), 
        
           ); 
        
           let certificate_store = Store::new( 
        
               rocks::DBMap::<Digest, Certificate<Ed25519PublicKey>>::reopen( 
        
                   &rocksdb, 
        
                   Some("certificates"), 
        
               ) 
        
               .expect("Failed keying certificates database"), 
        
           ); 
        
           let payload_store = Store::new( 
        
               rocks::DBMap::<(Digest, WorkerId), PayloadToken>::reopen(&rocksdb, Some("payload")) 
        
                   .expect("Failed keying payload database"), 
        
           ); 
        
           let batch_store = Store::new( 
        
               rocks::DBMap::<Digest, Vec<u8>>::reopen(&rocksdb, Some("batches")) 
        
                   .expect("Failed keying batch message database"), 
        
           );

Merge async & sync DB interface

#11 introduces a typed DB interface that supports keying a key-value map by ColumnFamily

This issue is about refactoring the general storage interface:
https://github.com/MystenLabs/narwhal/blob/main/store/src/lib.rs

So that it acts as an asynchronous front-end to the typed DB interface.

[get_collections] : primaries handle must-block request

Story

Parent story: ##67

Description

As part of this task it should be implemented the logic on the primary nodes to handle the requests for fetch the missing certificates. It is reminded that the first step of the protocol is for a primary node to ask their peers for a list of certificate ids (block ids) . Then the peers should respond back with the certificate them selfs (or empty for the ones that are not found).

[network] be able to send error messages across network

Story: #67

At the moment there is no way to pass error messages across the network communication between nodes (either primary <-> primary, worker <-> worker, primary <-> worker) . So far there wasn't any need to propagate any errors between nodes. However, as we progress with features that demand some request-response communication, this becomes necessary.

As part of this issue it should be provided a common way to wrap and send errors across the network for any node communication. As reference is provided the way that Sui handles errors and the corresponding SuiError type.

Differentiate Digest type based on hashed type

We store many digests as hashes deduced from their contents, where the deduction is driven by an impl<T> Hash for T.

This applies for votes, certificates, headers ... It would be good to be able to differentiate between a Certificate Digest and a Vote Digest. Using phantom types, we should be able to do this at no runtime cost.

[tusk] allow Narwhal run without Tusk

Currently on the repository Tusk is running by default without having the option to disable it. All the information is kept on its internal DAG. However, not all of our customers will want to run Narwhal with Tusk - in fact at the moment no-one will immediately want to. For that reason we should refactor the codebase to:

Give the ability to run a node with or without Tusk
When running without Tusk, the DAG should still be available to use in order for our customers to be able to perform read_causal requests #69 and remove_collections #70 . Those 2 issues are basically blocked until this work is finished first

On a high level we should aim to:

allow the node to be run with a command line option to disable Tusk
extract the common DAG parts of Tusk and create a component of "lightweight" Tusk to allow us form the DAG and advance rounds
be able to boot the node with the above component

Cryptographic agility

The Celo implementation supports BLS12-377 for validator signatures only : https://docs.celo.org/validator-guide/summary

Sommelier uses Tendermint Core, which supports either of Ed25519 signatures or Secp256k1 ECDSA signatures: https://docs.tendermint.com/v0.35/spec/core/encoding.html#public-key-cryptography

Our implementation currently considers only one variant : Ed25519 signatures. As a consequence the code base will need enough crypto agility to support signing with multiple types of keypairs.

This umbrella issue aims at referring further issues on the topic.

a first baseline is a generic API for signing and verification,
we may also want to support batch verification for those signature schemes that do have it,
we may also want to support aggregated / accountable signature verification for those signature schemes that do have it (hi Celo),

Prior APIs that allow this:
https://github.com/diem/diem/blob/main/crypto/crypto/src/traits.rs (though YMMV)

Better error handling for the subscriber handler

Story

This task is part of the light client story #95

Description

The subscriber handler deals with errors with a lot of ad-hoc prints and breaks.

[crypto] Check canonicity of our base64 encodings

We encode and decode key material in base 64:

narwhal/crypto/src/traits.rs

Lines 43 to 58 in 822acf2

    
           pub trait EncodeDecodeBase64: Sized { 
        
               fn encode_base64(&self) -> String; 
        
               fn decode_base64(value: &str) -> Result<Self, eyre::Report>; 
        
           } 
        
           // The Base64ct is not strictly necessary for (PubKey|Signature), but this simplifies things a lot 
        
           impl<T: ToFromBytes> EncodeDecodeBase64 for T { 
        
               fn encode_base64(&self) -> String { 
        
                   base64ct::Base64::encode_string(self.as_bytes()) 
        
               } 
        
               fn decode_base64(value: &str) -> Result<Self, eyre::Report> { 
        
                   let bytes = base64ct::Base64::decode_vec(value).map_err(|e| eyre!("{}", e.to_string()))?; 
        
                   <T as ToFromBytes>::from_bytes(&bytes).map_err(|e| e.into()) 
        
               } 
        
           }

Using base64ct:
https://docs.rs/base64ct/latest/base64ct/f

We should check base64ct is using (or can be configured to use) canonical encodings as well to avoid the information leakage through padding.

[virtual DAG] Leaf removal

#85 introduces a DAG but special cases leaves as not removable / compressible.
This hinders actual removals of nodes from the dag at the end of a parent-pointer chain (suffix removals).

[virtual dag] Node addition

On the coat tails of #85, Implement:

keeping pointers to the heads of the DAG,
adding Nodes and maintaining the head pointers,
maintaining an association table from Collection Digests to nodes.

[store] Feedback from use in fastnft

We have now used the DBKey and DBBatch to implement persistence in fastnft. Some feedback:

For a couple of operations we need atomic test-and-write, ie. to process orders we need to check lock values (none or the same) and then set them; for certificates we test objects / locks exist then do the write. The current abstractions do not provide a native (efficient) way of doing this, and therefore we need a mutex across this logic.
Ser and Deser is really nice, and functional. BUT right now this is performed synchronously in the main thread that needs to have a mut lock on the database. The abstraction promotes doing ser / deser in the critical region (incidentally). It would probably be better to allow threads to have concurrent access to the DBMap, and internally protect operations using a mutex of RWlock.
Its probably worth supporting a variant of typed multi_get, it is used often enough.

Implement Consensus fault-tolerance through memory-mapped files

Along with the certificate DB:

narwhal/primary/src/core.rs

Lines 281 to 284 in 46f2457

    
           // Store the certificate. 
        
           let bytes = bincode::serialize(&certificate).expect("Failed to serialize certificate"); 
        
           self.store.write(certificate.digest().to_vec(), bytes).await;

The only state that the consensus needs to recover is the last_voted_round table:
https://github.com/MystenLabs/narwhal/blob/main/consensus/src/lib.rs#L32

This fixed-sized element can be recovered from a memory-mapped file.

This PR should implement:

the synchronous consensus state writes,
the recovery flow

[remove_collections] introduce endpoint to remove collections

Epic: #44

In order to ensure that:

Our customers will only receive collections that haven't already consumed and proposed (or discarded)
We reclaim space in Narwhal nodes

we are looking to introduce an endpoint that will allow a validator to remove the specified collections both from the internal DAG and the data store. At the moment more details can be found on the doc here.

The endpoint on high level will look like:

remove_collections(collection_refs: Vec<CollectionDigest>) -> Result((), Error)

where the collection_refs is a vector with the collection digests that should be removed.

[refactor] The majority of the spawned tasks do not return a JoinHandle

Given that the majority of the spawned tasks on the codebase do not return a JoinHandle , I believe it would make sense to open a ticket and refactor them, what do you think?

Originally posted by @akichidis in #77 (comment)

[consensus] Explicitly order batches coming from the block waiter

Description

The current block waiter fetches transactions' data in batches in an implicit (but deterministic) order. We should make this order explicit and choose a way to order them. We should also think of the best place to order those batches.

[DB API] Make a cross CF BatchWrite

The async typed storage API is very nice and we can use it in other projects too. It would be nice to provide a WriteBatch functionality that allows for multiple writes / deletes potentially across many Column Families to be executed atomically. This is low priority since for the moment we do not need this feature in Narwhal / Tusk.

[get_collections] fetch missing collections - implement must-block protocol for certificates

Story

Parent story: #67

Description

When the get_collections request is received, it is possible for some of the requested collections to be missing from the validator node that servers the request. On the first iteration we have handled this by emitting back an error message that says the collection is missing. However, this is not acceptable as it can block the consensus protocol. Legitimate and valid certificates that are missing from a validator node should be fetched instead and populated internally before responding back to the get_collections request. If there is no way to retrieve them, then an error will be returned.

On this design doc 4 different options have been presented. We are going to implement option 1 for now:
https://docs.google.com/document/d/1DpwOPgI-YSIlt2SrQaT3ioF6G-rlNJoYtHls3eZogkw/edit#heading=h.rgoyj9yf0kb8

As part of this task we should focus on implementing the want-block protocolling for fetching the missing certificates. Syncing the missing batches is part of another issue. The blocker_waiter should be extended to:

detect the missing blocks
start the must-block protocol to fetch the missing certificates
validate the fetched certificates. As duplicates are expected to be received, we care to evaluate and process the first valid one, discard the rest.
keep and store the valid (unique) certificates (store in certificate_store)
put a placeholder to start the process of syncing the batches

It has to be noted that we should allow fetching the batches for the certificates that have already be found to ensure that we can work in parallel while we are fetching the missing ones.

Narwhal + Tusk : block proposals under a size limit

Most blockchains have a mempool size limit which they use as backpressure.

This is necessary because as a point of decentralization most blockchains' block are bound by a size limit and a block going over the limit is considered invalid.

Tusk commits, through

the graph walk they consist of
the lack of a limit on individual block sizes,

can consist of a large amount of blocks, and, transitively, transactions. Even if we assumed we had a way to price those transactions in gas ahead of time (which often requires an execution attempt "in real life"), we do not have a way to present the contents of the mempool under a certain size limit. Would it be possible to design one?

/cc @gdanezis

[tusk] disable tusk via command line argument

Story: #74

As part of this task we should be able to disable Tusk and just run a primary node without any consensus algorithm. We should also try to run a test how the server will respond. A command line argument should be introduced that will enable/disable the Tusk (consensus). When disabled, the produced certificated should still be consumed but basically not processed.

[safety] Reject high rounds in block headers

The sanitize_header function processes all headers received by a primary:

narwhal/primary/src/core.rs

Lines 328 to 340 in 03e91f8

    
           fn sanitize_header(&mut self, header: &Header<PublicKey>) -> DagResult<()> { 
        
               ensure!( 
        
                   self.gc_round < header.round, 
        
                   DagError::TooOld(header.id.clone(), header.round) 
        
               ); 
        
               // Verify the header's signature. 
        
               header.verify(&self.committee)?; 
        
               // TODO [issue #3]: Prevent bad nodes from sending junk headers with high round numbers. 
        
               Ok(()) 
        
           }

It should reject blocks with a claimed round much higher than the current block round.

This block round is managed using an AtomicU64 in the core's fields:

narwhal/primary/src/core.rs

Line 44 in 03e91f8

consensus_round: Arc<AtomicU64>,

A simple but large integer interval between a received header's claimed round and the current round should be enough to avoid spam (which is the main goal of this issue). Determining what "large enough" looks like will require analyzing how fast the block round evolves over time. The idea is that a small gap in round numbers is acceptable (though the header will not be voted on, it is not invalid, and post verification should contribute to an increase of the node's own round number).

The error produced here should clearly indicate the cause of the rejection, and produce a new variant in DagError:

narwhal/primary/src/error.rs

Line 27 in 03e91f8

pub enum DagError {

Indeed at worst, this could indicate the local authority is an out-of-sync node that hasn't caught up with the network.

It should also emit logs, ideally with context allowed since the switch to tracing (#26).

[get_collections]: improve the endpoint to receive a list of digests

Story: #67

On the first iteration we were aiming to retrieve only one collection for simplicity. As part of this task we want to improve the functionality and make the endpoint receive a list of digests (collection ids) and return back the corresponding collections. That will ensure we save roundtrips and improve the overall performance.

Draft API for execution layer

Expose an API for Narwhal to communicate with an external execution layer. This API should be general enough to support a wide variety of execution layers, but two known clients are:

Cosmos, which uses ABCI (application-blockchain consensus interface). ABCI is probably a good starting point for Narwhal to mimic.
Celo, whose relevant interfaces are here

One important factor is the interaction with XO vs. OX models of executions (see: https://arxiv.org/abs/1906.11229 for toponymy)
See Google Doc

Block remover test failures on DB creation

Description

The block remover tests may execute concurrently and create databases conflicting for the same path

One solution would be to make the tests operate in different crates using https://crates.io/crates/tempfile or whatever we use in the code base already.

Steps to reproduce

Run the block remover tests


running 2 tests
test src/block_remover.rs - block_remover::BlockRemover (line 94) ... FAILED
test src/block_waiter.rs - block_waiter::BlockWaiter (line 122) ... ok

failures:

---- src/block_remover.rs - block_remover::BlockRemover (line 94) stdout ----
Test executable failed (exit code 101).

stderr:
thread 'main' panicked at 'Failed creating database: RocksDBError("IO error: While lock file: /tmp/LOCK: Resource temporarily unavailable")', src/block_remover.rs:25:10
stack backtrace:
   0: rust_begin_unwind
             at /rustc/ee915c34e2f33a07856a9e39be7e35e648bfbd5d/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/ee915c34e2f33a07856a9e39be7e35e648bfbd5d/library/core/src/panicking.rs:143:14
   2: core::result::unwrap_failed
             at /rustc/ee915c34e2f33a07856a9e39be7e35e648bfbd5d/library/core/src/result.rs:1785:5
   3: core::result::Result<T,E>::expect
   4: rust_out::main::{{closure}}
   5: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
   6: <core::pin::Pin<P> as core::future::future::Future>::poll
   7: tokio::runtime::basic_scheduler::CoreGuard::block_on::{{closure}}::{{closure}}::{{closure}}
   8: tokio::coop::with_budget::{{closure}}
   9: std::thread::local::LocalKey<T>::try_with
  10: std::thread::local::LocalKey<T>::with
  11: tokio::runtime::basic_scheduler::CoreGuard::block_on::{{closure}}::{{closure}}
  12: tokio::runtime::basic_scheduler::Context::enter
  13: tokio::runtime::basic_scheduler::CoreGuard::block_on::{{closure}}
  14: tokio::runtime::basic_scheduler::CoreGuard::enter::{{closure}}
  15: tokio::macros::scoped_tls::ScopedKey<T>::set
  16: tokio::runtime::basic_scheduler::CoreGuard::enter
  17: tokio::runtime::basic_scheduler::CoreGuard::block_on
  18: tokio::runtime::basic_scheduler::BasicScheduler::block_on
  19: tokio::runtime::Runtime::block_on
  20: rust_out::main
  21: core::ops::function::FnOnce::call_once
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.



failures:
    src/block_remover.rs - block_remover::BlockRemover (line 94)

test result: FAILED. 1 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 17.37s

error: test failed, to rerun pass '--doc'

[virtual dag] Preserve the utility of the `read_causal` after a graph cut

Epic

We would like to make sure the customer can implement a priority-based block proposal.

This is described in detail in this talk: https://drive.google.com/drive/u/1/folders/1xf8wJdQTsN_nvlMMOOGztIDdg81EAatJ

Description

we would like read_causal to make complete graph walks in between two remove_collections calls (the later signals committed data that should move off the mempool),
the remove_collections call therefore needs to remove suffixes of the graph, to avoid graph cuts,
this forces block proposals to be FIFO in the first place.

Solution

we should have a remove_collections call that remove the storage of transactions but keeps the headers,
this may make header storage in the DAG grow arbitrarily,
we would like to walk the DAG more efficiently, with a "path compression" feature: any path that goes "through" a removed collection should "jump" it and go straight to the next node.

[non-synthetic client transactions]

At the moment, Narwhal only verifies its own transactions and votes.

This is problematic because in production the payload of those messages is itself a set of transactions (whether received through a mempool or directly by workers), and those should at minima have their signatures verified before hitting the proposer.

We should support the following as user signatures:

for Sommelier, secp256k1, ed25519 and multi-signature (ref)
for Celo, it's meant to support Eth equivalence, i.e. secp256k1, to check:¹
https://github.com/celo-org/celo-blockchain/blob/e0c433849e3e6bfe32a421fd8dc05372286ba6d3/crypto/crypto_test.go

But that also includes subtleties like EIP-155

(see also CIP-35)

reminder: go has a single type, ECDSA, to represent public keys on both secp256k1 and secp256r1, and dynamically injects the correct verification impl at runtime. ↩

[get_collections] fetch missing collections - implement batch sync

Story

Parent story: #67

Description

Following the implementation of fetching the missing certificates from peer nodes #97 , as part of this task we should focus on syncing the missing batches. The approach that should be followed is the one described in Option 1 on the design doc.

We should reuse the existing mechanism to sync batches but with the difference that we'll extend the existing solution to dictate the peer from which we want to fetch the batches from, as on the current solution the batch is fetched via the original author of the header. Also, as part of this ticket we should also extend the block_waiter to wait for the batches to be synced before going ahead and requesting them to eventually fulfil the request.

Missing crypto "lego blocks"

We already have identified two missing bits of our cryptographic inventory story:

we need to abstract signature and verification in a scheme-agnostic manner (#7)¹
we need to have a leader election mechanism for Tusk, which looks like a random beacon, as far as the paper is concerned (#10)

This umbrella issue is to map out the inventory of the cryptographic building blocks we need as a minimalistic MVP of Narwhal:

a slack discussion hinted that a VRF might be sufficient for leader election rather than a full-on random beacon,
what should we do w.r.t aggregation and / or batch verification, to make sure we have "pipes" (code paths) that exploit those primitives where available, (low prio, may be seen as an extension of #7 )
how does that interact with quorum changes?

The lighter our "footprint" the better: an anti-example of this design task would be to require a random beacon fed by a Joint-Feldman DKG, which:

is a synchronous algorithm that's implemented on an asynchronous network, with large delays "simulating" the synchronicity,
this comes at a huge operations cost in dealing with a public bulletin board, emulating time delays with the chain, etc.,
if the customer doesn't run this already, they may not be happy being tied in hours of baby-sitting the DKG in Discord channels w/ their node runners
JF only secure to feed re-keyable signature schemes which may not mesh well with any customer that cannot implement a pairings-based scheme.

this in fact maps the "forever" constraint that in so far as we integrate Narwhal/Tusk in other chains / code bases, we'll need to operate within their validators' crypto tooling environment. ↩

[chore] Update all copyright headers to include year + new owner

They should all be updated to include (first line only where a pre-existing stanza referring to FB exists):

// Copyright(C) 2021, Facebook, Inc. and its associates
// Copyright(C) 2022, Mysten Labs 
// SPDX-License-Identifier: Apache-2.0

Assume batches can be stored on workers with ids other than the author's

On the current implementation when a worker forms a new batch it broadcasts it to the worker nodes of the same id of other validators. That assumption is propagated across the codebase - meaning that irrespective of which primary node we query, we assume that a specific batch will be found to the worker of same id (which is basically dictated by the propagated header)

The above assumption is making scaling quite restrictive and eventually we want it removed. On our network our primary node should keep the mapping it self to which local worker a batch has been stored and basically decouple the storying/query semantics from the ones that the author produces (worker id). It has to be noted that a batch is identified by two parameters: the batch digest and the worker_id (author). We still want to be able to identify batches based on both those two properties, but for our primary node we should be able to translate that to which local worker the batch has been stored.

[consensus] Limit the number of subscribers at the network level

Story

This task is part of the light client story #95

Description

Right now only the subscriber core limits the number of subscribers. Our network is however a nice DoS target.

Component restarts for our code bases

This is a sister issue of MystenLabs/sui#382 (which is cross-repos)

[consensus] Light client support

Epic

We would need some light client support to integrate Narwhal into other products. That is, allow other systems to use Narwhal & Tusk as a black-box sequencer.

Description

A (crude) example of what would be nice to have is the following: https://github.com/MystenLabs/sui/blob/5b112c50eb326c29b2e9db8bf483cc4c638fdf07/test_utils/src/sequencer.rs#L247

[consensus] Error dialogue with light clients

Story

This issue is part of the light client story #95

Description

The subscriber core can handle a fixed (maximum) number of light clients. Right now, when this maximum is reached, additional light clients' connections are simply refused. It would be nice to instead send them back an error message.

[construct identifiers] form an identifier by using arbitrary data

Description

On various places in our codebase we have the need to construct request/look up keys that are based on some other arbitrary (but deterministically defined) data. One of the examples is on the block_remover component the construct_request_key method which forms a key based on provided block (certificate) ids. Those ids are used to form a unique request id which is further used on map look ups. This is now done in a bit naive way by just concatenating the id bytes, which is not ideal since:

it leads to variable length keys
it can lead to quite big keys

As part of this task we want to define a way of creating more concise keys.

Note: Ideally we would like to implement a proper request/response pattern so we don't have to handle this logic on the core application level.

[get_collections] introduce endpoint to retrieve the collections

Epic: #44

In order to allow our customers to integrate with Narwhal, one of the endpoints that we would like to introduce is the get_collections. As part of this story we are looking to expose an endpoint that will allow our clients to retrieve the transactions for a list of given headers / certificates. We'll call this a collection.

Be able to deploy Narwhal in a "staging" environment

Description

Currently we do have a way to deploy Narwhal to Amazon and run a benchmark, which is definitely great and helpful. However, we are lacking the ability to have a constantly running network of Narwhal nodes in a way that "staging" (or development) environments do in a more traditional enterprise setup. Ideally, we would like to deploy in a (or a set of) AWS Amazon nodes a few Narwhal nodes in a primary-worker setup and let them run uninterrupted. Ideally we would like to have a nice CI/CD way to deploy after merging to a staging branch - or for simplicity we could just do it after the main branch.

Why is this important?

Currently we have no way to tell whether Narwhal is working correctly when left running for long periods (e.x weeks)
Reveal issues/problems when up and running
Have a deployment for developers to test the nodes in a real deployment (not just via our integration tests)

Approach
On its simplest form we would like to be able to trigger a (re)deployment after a CI/CD run on a main or staging branch. I am aware that things can get quite complex and certainly we are not looking for SRE level of work here, but something simple enough and robust that will allow us to deploy Narwhal in AWS Amazon. I would suggest looking into the following direction (some thoughts):

Use Terraform to ensure that whatever we build is reproducible and track changes. Highly recommended
Build Docker images for our Narwhal node that will allow us to run either as primary or worker
Integrate the image building process as part of our CI/CD to push images on AWS ECR
Choose something like ECS to deploy the nodes. Alternatively, if we want to really abstract the deployment and node management process , we can use something like AWS Fargate
Start with a simple setup of primary --> workers node. We can expand to multiple primaries network as follow up
Be able to see logs in Cloudwatch would be an extra bonus here

Issue list:

#211
#212

[remove_collections] create module to orchestrate and remove collections on primary

Story

Parent story: #70

Description

To remove collections from a Narwhal node, for each collection the following should happen on primary (ignore order for now):

delete the certificate from the certificates_store
delete the header from the headers_store
delete the batch references from the payload_store
trigger requests to downstream worker nodes to delete the batches from their internal batch_store

to achieve the above a mechanism / module should be build to orchestrate the deletion of all those resources. As part of this task we should focus mainly on the part of orchestrating the deletion of batches from the worker nodes. On high level the primary should:

retrieve the collections one by one from its internal store
identify the batches included into the header
send messages to delete the batches to the corresponding worker nodes
gather the replies and identify a successful deletion per collection

then the results can be used on further processes to perform deletions on the primary node storage it self.

Dealing with the actual deletion of the batches from the worker nodes should be addressed on different issue.

[remove_collections] receive and handle messages to delete batches on workers

Story

Parent story: #70

Description

Following the work on #80 , as part of this task the mechanism should be built in Worker nodes to actually delete from their internal store batch_store, the batches dictated by the received message. The worker node should reply back to the primary with the successful response or error of the operation.

It has to be noted that the operations on the storage should be atomic to guarantee that either all or none will succeed (see multi_remove )

Unify consensus testing facilities

Description

The consensus crate is growing and there are now a number of testing facilities spread across the unit test files (e.g., consensus_tests.rs, subscriber_tests.rs, etc). These testing facilities are used multiple times in various test files (not only in the one where they are defined). It would great to group all facilities into a single file common.rs.

Smarter subscriber

Story

This task is part of the light client story #95

Description

The way the subscriber works increases the latency of commit since we wait for the consensus to output in order to start downloading batches at the executor. Can't we use the DAG to start earlier than consensus and ideally be ready to execute the moment the consensus outputs? From @LefKok

[remove_collections]: introduce endpoint to remove the provided collections from Narwhal

Epic: #44

In order to ensure that:

Our customers will only receive collections that haven't already consumed and proposed (or discarded)
We reclaim space in Narwhal nodes
we are looking to introduce an endpoint that will allow a validator to remove the specified collections both from the internal DAG and the data store. At the moment more details can be found on the doc here.

The endpoint on high level will look like:

remove_collections(collection_refs: Vec<CollectionDigest>) -> Result((), Error)

where the collection_refs is a vector with the collection digests that should be removed.

[Narwhal mempool integration] Draft API for consensus

Our customers want to integrate with Narwhal as a mempool first, which requires their consensus modules, co-located with a primary, to be able to query the mempool and update it with the outcome of consensus. Note: in this context, we call Narwhal blocks "collections" (because the consensus has a lock on the word block).

We have identified this will require at least the following API, exposed by Narwhal to consensus:

read_causal( bd: Digest ) -> Vec<Digest> takes a collection digest and returns the collection digests encountered in a BFS of the graph, up to a commit boundary (see also MystenLabs/sui#5187 for the current implementation of this walk).
get_collection(Digest) -> Vec<Transaction> takes a collection digest and returns the transactions contained in the collection.
drop_causal_read( bd: Digest ) takes a collection digest and registers it to update the "commit boundary" of the aforementioned BFS, thereby making sure the collections in question are no longer considered valid for proposal. This should be called by the co-located validator's consensus module when voting for such a proposal.
drop_collections( bd: Vec<Digest> ) takes a collection digest and drops the contents of the collection from storage. This should be called by the co-located consensus module when witnessing a commit, and thereby enshrining the content of a block (proposed beforehand through collections) into some ledger storage.

The present umbrella epic is about building the API in question. It involves three broad areas of work:

exposing the functionality through an IPC front-end with RPC,
accessing and managing collection data, which transactions are batched and stored at workers, from the primary. (see also MystenLabs/sui#5184)
building a cache functionality on the primary to make the cost of external requests manageable.

More details here:
https://docs.google.com/document/d/1BuD7JaZxes8UtWOJZd8D2ZJkcj7tqYSj4601ffuceTg/edit?usp=sharing

[collection] worker nodes process RequestBatch messages and respond

Following the work on the PR , we should introduce the necessary functionality to the worker nodes in order to be able to process the received RequestBatch messages and respond back to the primary with the results.

The RequestBatch message (received over the TCP communication) is basically dictating to a worker node to retrieve from its internal store a batch by the provided id. Then the worker should respond back to the primary node with a RequestedBatch piggy backed with the transactions fetched from the retrieved batch.

If the requested batch has not been found (although not really expected this to happen), then the worker should respond with an error of batch not found.

	let header_store = Store::new(
	rocks::DBMap::<Digest, Header<Ed25519PublicKey>>::reopen(&rocksdb, Some("headers"))
	.expect("Failed keying headers database"),
	);
	let certificate_store = Store::new(
	rocks::DBMap::<Digest, Certificate<Ed25519PublicKey>>::reopen(
	&rocksdb,
	Some("certificates"),
	)
	.expect("Failed keying certificates database"),
	);
	let payload_store = Store::new(
	rocks::DBMap::<(Digest, WorkerId), PayloadToken>::reopen(&rocksdb, Some("payload"))
	.expect("Failed keying payload database"),
	);
	let batch_store = Store::new(
	rocks::DBMap::<Digest, Vec<u8>>::reopen(&rocksdb, Some("batches"))
	.expect("Failed keying batch message database"),
	);

	pub trait EncodeDecodeBase64: Sized {
	fn encode_base64(&self) -> String;
	fn decode_base64(value: &str) -> Result<Self, eyre::Report>;
	}

	// The Base64ct is not strictly necessary for (PubKey\|Signature), but this simplifies things a lot
	impl<T: ToFromBytes> EncodeDecodeBase64 for T {
	fn encode_base64(&self) -> String {
	base64ct::Base64::encode_string(self.as_bytes())
	}

	fn decode_base64(value: &str) -> Result<Self, eyre::Report> {
	let bytes = base64ct::Base64::decode_vec(value).map_err(\|e\| eyre!("{}", e.to_string()))?;
	<T as ToFromBytes>::from_bytes(&bytes).map_err(\|e\| e.into())
	}
	}


	// Store the certificate.
	let bytes = bincode::serialize(&certificate).expect("Failed to serialize certificate");
	self.store.write(certificate.digest().to_vec(), bytes).await;

	fn sanitize_header(&mut self, header: &Header<PublicKey>) -> DagResult<()> {
	ensure!(
	self.gc_round < header.round,
	DagError::TooOld(header.id.clone(), header.round)
	);

	// Verify the header's signature.
	header.verify(&self.committee)?;

	// TODO [issue #3]: Prevent bad nodes from sending junk headers with high round numbers.

	Ok(())
	}

mystenlabs / narwhal Goto Github PK

narwhal's People

Contributors

Stargazers

Watchers

Forkers

narwhal's Issues

Story

Description

Story

Description

Story

Description

Story

Description

Story

Description

Description

Story

Description

Description

Steps to reproduce

Epic

Description

Solution

Footnotes

Story

Description

Footnotes

Story

Description

Epic

Description

Story

Description

Description

Description

Issue list:

Story

Description

Story

Description

Description

Story

Description

Recommend Projects

Recommend Topics

Recommend Org