dusk-network / dusk-blockchain
Reference implementation of the DUSK Network node, written in Golang
License: MIT License
We should add a file with contribution guidelines, to make it easier for other developers to contribute code to our repository.
With neither the provisioner component nor the generation component depending on a specific hash, and with both now able to update their values when a tx expires, we can make the node even easier to use by automating the syncing, staking, bidding, and starting of consensus components without any need for user input.
Some handy added features would be a --wallet flag to start the node without consensus running, and some commands to adjust default locktimes and values for automated transactions.
This enhancement should remove the need for node operators to constantly check whether their transactions have expired and to send new ones in time themselves, which is much less of a drag. It also replaces #13, since a user can no longer accidentally start consensus components on the wrong round this way.
On testnet nodes, an incredibly long delay of anywhere between 2-5 minutes was observed when re-initializing a node after shutting it down, once the blockchain was longer than 250k blocks. This is because bidlists and committees are kept separately between components, meaning that a good handful of committee stores and bidlists re-populate on startup, and because none of this information is persisted on disk, so the node needs to walk up the blockchain many times when it restarts.
The current repo is missing Continuous Integration functionality.
We are going to integrate Travis CI into our toolchain.
The Dusk Testnet Deployer tool is using hardcoded repos/branches on the old GitLab repositories.
We need to switch to GitHub, using their API for deployment operations.
As discussed in #7, if a user calls startprovisioner while syncing, their provisioner components could start on an obsolete round and get stuck, with no proper feedback for the user. The node should disallow the use of startprovisioner until we are sure the node is at the tip of the chain.
The current version of deterministic sortition does not decrease a node's available stake once one or more of its DUSK have already been extracted for the committee. The deterministic sortition procedure has to be upgraded to subtract 1 DUSK from the available stake, during an instance of the sortition procedure, each time a DUSK under the node's control is extracted for the committee.
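A minimal sketch of the upgraded rule (all names here are hypothetical, not taken from the codebase): each time a DUSK under a member's control is extracted for the committee, 1 DUSK is subtracted from that member's available stake for the remainder of the sortition instance.

```go
package main

import "fmt"

// member is a stand-in for a provisioner with a stake in whole DUSK.
type member struct {
	id    string
	stake uint64
}

// extract picks the member owning the score-th remaining DUSK, then
// decrements that member's available stake by 1 DUSK so the same DUSK
// cannot be counted again within this sortition instance.
func extract(members []member, score uint64) string {
	for i := range members {
		if score < members[i].stake {
			members[i].stake-- // subtract the extracted DUSK
			return members[i].id
		}
		score -= members[i].stake
	}
	return ""
}

func main() {
	members := []member{{"A", 2}, {"B", 3}}
	fmt.Println(extract(members, 1)) // A owns DUSK 0 and 1
	fmt.Println(members[0].stake)    // A's available stake is now 1
}
```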
Currently, all wallet interaction goes through CLI commands. A client application is not capable of creating a transaction, getting the balance, and so on.
An example: to enable the test-harness, the rpc package calls cli.SendBidTx directly as a temporary patch.
Proposal:
Wallet commands could be moved to a separate package so that both the CLI and RPC packages can use them via the RPCBus.
Another option would be to deprecate the CLI entirely and build an RPC API as the only interface to the wallet.
Let me know your thoughts on that.
@julesdesmit @autholykos
Due to some unexplained issue, some nodes can fall behind during deployment and never sync up with the rest of their peers. My hunch is that some nodes do not accept every block after finishing consensus, which leaves a gap in their blockchain; then, when someone asks to sync with them, they only supply about half the blocks and leave the requesting node hanging. Need to do more testing to confirm this.
To quote @toghrulmaharramov
we have to work on formalizing a bootstrapping model for the nodes that have fallen behind for less than n blocks. Unfortunately, due to the complexity of the Reduction phase, which muddles up the bootstrapping process, the catch-up process has to be made more complex than on rival protocols. I have a few ideas on how the problem could be elegantly curtailed
When starting a local network from scratch (for testing purposes) it seems no nodes are running the consensus.
According to the current implementation, msg.InitializationTopic is published only on a topics.AcceptedBlock event, which is totally fine for updating the consensus module with the chain tip.
However, when starting the network from the genesis block, wouldn't the network get stuck, as no node is able to generate the block at height 1?
@julesdesmit The patch below works around this, but I need your expertise here in case I'm missing something.
+++ b/pkg/core/consensus/initiator.go
@@ -35,6 +35,13 @@ func GetStartingRound(eventBroker wire.EventBroker, db database.DB, keys user.Ke
// Start listening for accepted blocks, regardless of if we found stakes or not
acceptedBlockChan, listener := InitAcceptedBlockUpdate(eventBroker)
+ if currentHeight == 0 {
+ roundBytes := make([]byte, 8)
+ binary.LittleEndian.PutUint64(roundBytes, 1)
+ eventBroker.Publish(msg.InitializationTopic, bytes.NewBuffer(roundBytes))
+ return nil
+ }
+
When calling user.NewRandKeys or user.NewKeysFromBytes, if the rand.Reader, or any other seed, runs out of bytes before being able to generate a BLS keypair, it returns an EOF error. This can be easily mitigated by adding logic to retry with a new seed in this case. Additionally, this should take care of an intermittently failing test.
When a bid expires, the bidder should be able to modify the bid amount D and the secret K that serve as inputs to the proof of knowledge.
The way the accumulator and eventfilter currently prevent race conditions is by making heavy use of state management in the accumulator. It is possible to refactor the packages defensively to keep the same resilience against race conditions, but without excessive locking.
After testing the node near height 13903 as described in issue #3, it is not possible to sync anymore because of a "block not found" error.
Jules on Telegram:
will be pushing a fix for this shortly - we did have to roll back a couple blocks on the blockchain in order to ensure smooth syncs for the updated nodes. there was a pretty nasty bug found with certificate discrepancies between nodes, identified last night and patched preliminarily on the other nodes. once the fix is pushed on the github, i'd advise to rebuild and re-sync from scratch
Our CI has highlighted a few race conditions in the monitoring test file.
As these components do not share anything, it would make more structural sense to encapsulate each into its own package. So, wire.NewEventBus() and wire.NewRPCBus() become eventbus.New() and rpcbus.New().
Currently there is room for improvement in passing by reference vs passing by value, especially in hotter parts of the code using this package (such as encoding and decoding events, transactions and big blocks).
Currently a node can only be extracted from the committee once per step.
Yes. Maybe it would make sense to use a custom type for hash with a predefined String() method that hex-encodes the []byte?
Originally posted by @autholykos in https://github.com/_render_node/MDIzOlB1bGxSZXF1ZXN0UmV2aWV3VGhyZWFkMTkyMTk1OTc4OnYy/pull_request_review_threads/discussion
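A minimal sketch of such a type; the name Hash and its use here are illustrative, not taken from the codebase.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// Hash is a hypothetical named type for block/tx hashes. Implementing
// fmt.Stringer means log statements print a readable hex string instead
// of a raw byte slice.
type Hash []byte

func (h Hash) String() string {
	return hex.EncodeToString(h)
}

func main() {
	h := Hash{0xde, 0xad, 0xbe, 0xef}
	fmt.Println(h) // prints "deadbeef"
}
```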
The RPCBus had its channels implemented as global variables for simplicity (according to @ggoranov). However, this approach can be quite unsafe, as there is no guarantee that the correct component is reading a request. Even if the intent is for a channel to be used strictly between two components, this is not documented anywhere (as far as I can tell; if I missed it, please let me know). In that case it would still be better to return the channel from a function call or something similar, so that it is not available to the rest of the code, since it shouldn't be shared outside of those components in the first place. Anybody not properly familiar with this code could easily make a mistake that way.
It is not so intuitive that startprovisioner can be executed without staking. According to the documentation, staking is necessary.
"stake": Stake a given amount of DUSK, to allow participation as a provisioner in consensus. Make sure to use the sync command first.
What is actually happening when you run startprovisioner without staking?
Is it a deliberate choice to let users sync manually instead of syncing automatically? What is the risk of staking without syncing?
This is a temporary patch pending a more decisive solution to dependency updates on release branches. To be able to exploit the performance improvements of dusk-network/dusk-crypto#7, go.mod must be updated.
Dusk-blockchain uses the crypto package embedded directly in the codebase. It should instead import dusk-network/dusk-crypto and treat it as an external package.
After testing the blockgenerator, 1 tDusk is missing from my wallet balance.
$ ./testnet
initializing node...
initialization complete. opening console...
> createwallet [mypassword]
Wallet created successfully!
Public Address: [myaddress]
> sync
Syncing wallet... (13791/13791)
Found 0 spends and 1 receives
> balance
Balance: 10.00000000
> stake 8 100 [mypassword]
hash: [somehash]
> sync
Syncing wallet... (13804/13804)
Found 0 spends and 1 receives
> startprovisioner
provisioner module started
to more accurately follow the progression of consensus, use the showlogs command
> sync
Syncing wallet... (13807/13807)
Found 0 spends and 0 receives
> sync
Syncing wallet... (13808/13808)
Found 0 spends and 0 receives
> bid 2 100 [mypassword]
hash: [txid]
> startblockgenerator [txid]
balance
stop
After waiting a few hours I exited the console with CTRL + C because the console would not return control (see #2). Coincidentally the test monitor was stuck at height 13903.
I restarted my node to check whether I could reproduce the error. Unfortunately I did not manage to sync, but I also noticed 1 tDUSK is missing.
$ ./testnet
initializing node...
initialization complete. opening console...
> balance
please load a wallet before trying to check balance
> loadwallet [mypassword]
error attempting to load wallet: cipher: message authentication failed
> sync
please load a wallet before trying to sync
> loadwallet [mypassword]
Wallet loaded successfully!
Public Address: [myaddress]
> balance
Balance: 9.00000000
> balance
Balance: 9.00000000
> sync
Syncing wallet... (13903/13903)
Found 0 spends and 1 receives
> balance
Balance: 9.00000000
This is currently done sequentially, however as the ordering currently does not matter, we should be able to parallelize this, and cut quite some time off of encoding and decoding blocks.
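A sketch of what the parallelization could look like, using a stand-in encoder rather than the real transaction/event codec: each goroutine writes its result into its own indexed slot, so no locking is needed and the output order matches the input.

```go
package main

import (
	"fmt"
	"sync"
)

// encodeAll encodes each item concurrently. Since results land in
// pre-allocated, indexed slots, goroutines never contend on shared
// state; encode is a stand-in for the real encoder.
func encodeAll(items []int, encode func(int) []byte) [][]byte {
	out := make([][]byte, len(items))
	var wg sync.WaitGroup
	wg.Add(len(items))
	for i, it := range items {
		go func(i, it int) {
			defer wg.Done()
			out[i] = encode(it) // each goroutine owns exactly one slot
		}(i, it)
	}
	wg.Wait()
	return out
}

func main() {
	enc := func(v int) []byte { return []byte{byte(v)} }
	res := encodeAll([]int{1, 2, 3}, enc)
	fmt.Println(len(res), res[0][0], res[2][0]) // 3 1 3
}
```

In practice one would likely bound the number of goroutines with a worker pool for very large blocks, but the slot-per-item scheme above shows why ordering is preserved for free.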
Currently, in the IncreaseTimeout function for the consensus.Timer, t.timeOut is evaluated against t.baseTimeOut*2^10. Golang reads the ^ symbol as a bitwise XOR, not a power. This should be changed, as the timer needs to keep increasing until it has doubled 10 times.
This is somewhat related to #21, albeit more focused on a general simplification of the Reducer structure.
The Reducer component is supposed to perform operations that are quite sequential in nature. A (grossly oversimplified) layout of its duty is the following loop:
Such a layout might be better modeled with 2 separate sync.WaitGroups instead of overusing channels.
As pointed out, needing to sync manually any time a wallet operation is performed is quite cumbersome. This should be integrated into the wallet functions themselves, instead.
In README.md, locktime is confusingly described as an amount of blocks that must exceed the block height. That would mean the locktime must be at least as long as the age of the chain, which seems excessive and impractical.
[locktime] stands for the amount of blocks for which the bid is locked (block_height < locktime < 250000)
Additionally, the maximum locktime is 250,000 blocks, which at the current average blocktime of 7 seconds is about 20 days. After 20 days the block height will pass 250000, at which point there is no value for locktime that satisfies block_height < locktime < 250000.
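Coding the documented constraint literally makes the problem visible: a bid locked for 100 blocks fails the check at height 13808, and once the height reaches 250000 no value can satisfy it at all. This is only a transcription of the README's stated rule, not of the actual node behaviour.

```go
package main

import "fmt"

const maxLocktime = 250000

// validLocktime mirrors the README's stated constraint:
// block_height < locktime < 250000. Under this reading, any locktime
// smaller than the current height is rejected, and past height 249999
// the valid window is empty.
func validLocktime(blockHeight, locktime uint64) bool {
	return blockHeight < locktime && locktime < maxLocktime
}

func main() {
	fmt.Println(validLocktime(13808, 100))     // false: 100 blocks fails at height 13808
	fmt.Println(validLocktime(13808, 20000))   // true
	fmt.Println(validLocktime(250000, 250001)) // false: the window has closed
}
```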
This would save us from duplicate code, and from conversion issues when the wallet package is externalized.
Very good
The key package is currently located in the crypto folder. It is duplicated in dusk-wallet, so we should import it from there to have only one copy of this package.
The crypto package is still left in the repo, carrying only an unfinished README document. It should be deleted, as we are now importing it from an external repo.
Benchmarks conducted on dusk-network/dusk-crypto#14 show a substantial performance improvement when parallelizing the aggregation operations. Given the criticality of such aggregation when creating Agreement messages, it would be good to apply the concurrent strategy to minimize event production time and avoid falling behind in terms of steps (or even rounds) just because Agreement creation is slow.
The following function should serve as inspiration for exploiting concurrency when aggregating Signatures:
func aggregateSignatures(sigs []*Signature) *Signature {
	const W = 4
	C := len(sigs)
	M := C / W
	R := C % W
	if C < W {
		// not enough signatures to split across workers
		return aggregateSignaturesSequential(sigs)
	}
	var wg sync.WaitGroup
	var last *Signature
	// goroutines will fill this slice concurrently
	sgs := make([]*Signature, W)
	wg.Add(W)
	// W workers, each sequentially aggregating a chunk of M signatures
	for w := 0; w < W; w++ {
		lo, hi := w*M, (w+1)*M
		go func(i int, set []*Signature) {
			sgs[i] = aggregateSignaturesSequential(set)
			wg.Done()
		}(w, sigs[lo:hi])
	}
	// aggregate the remaining R signatures not fitting in the worker chunks
	if R > 0 {
		last = aggregateSignaturesSequential(sigs[C-R:])
	}
	wg.Wait() // wait for all the goroutines to be done
	if last != nil {
		sgs = append(sgs, last)
	}
	return aggregateSignaturesSequential(sgs)
}

func aggregateSignaturesSequential(sigs []*Signature) *Signature {
	s, sigmas := sigs[0], sigs[1:]
	for _, sig := range sigmas {
		s = s.Aggregate(sig)
	}
	return s
}
Anything not covered by the monitoring system or current logging entries, should be added to get an as-complete-as-possible view of the node when problems arise.
Currently, to avoid sharing the same pointer, we copy the buffer being sent over the event bus for each handler in any given map of handlers. We can instead pass these buffers by value. This would remove the need for the copyBuffer function, and most likely make the dissemination of event buffers much more efficient.
While syncing, some occasional connection reset errors and timeouts are observed.
As MLSAG should be part of dusk-crypto, this package should be removed from the repo, and our module import of dusk-crypto should be updated to a version which includes MLSAG.
As observed on the testnet, a possibility exists for the node to unintentionally offset its agreement component step counter, due to the agreement.broker.Listen goroutine being entirely separate from the invocation of agreement.broker.sendAgreement (which happens as a callback from the reduction component). As aggregating the reduction events into an agreement event takes some amount of time (I believe it's a few hundred ms), there's an opening for the round update to happen in the midst of this. After the aggregation is complete and the state is updated, the deferred call to broker.state.IncrementStep is executed, and the node then starts off that round with an agreement step counter that is ahead by one.
As a result, any events that are created next will be invalid. This happens because the committee bitset that gets generated depends on broker.state.Step, meaning that for reduction events created on steps 1 and 2, the agreement event contains a bitset corresponding to these events in the committees of steps 3 and 4. This causes an inconsistency in the network and, if enough nodes are affected on the same round, it can halt consensus entirely, since too many nodes will not be able to generate proper agreement events, regardless of how many times they try.
This bug was reproduced in a test, which will be included in the PR to fix this. I believe a simple fix would be to invoke the sendAgreement function through a channel, instead of a callback, which would synchronize it with the occurrence of round updates, thus keeping the step counter in its correct place at all times.
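A sketch of the channel-based fix, with hypothetical names rather than the actual broker API: serializing agreement requests and round updates through a single goroutine means a round update can never interleave with an in-flight aggregation, so the step counter stays consistent.

```go
package main

import "fmt"

// broker serializes all state mutations through listen(); nothing else
// touches round or step, so a round update cannot race an aggregation.
type broker struct {
	round, step   uint64
	agreementCh   chan []byte    // aggregated reduction events
	roundUpdateCh chan uint64    // new round notifications
	stateCh       chan [2]uint64 // read-only state queries
	done          chan struct{}
}

func (b *broker) listen() {
	for {
		select {
		case votes := <-b.agreementCh:
			_ = votes // aggregate and publish the Agreement event here
			b.step++  // step bookkeeping happens on this goroutine only...
		case r := <-b.roundUpdateCh:
			b.round, b.step = r, 1 // ...as does the round/step reset
		case b.stateCh <- [2]uint64{b.round, b.step}:
		case <-b.done:
			return
		}
	}
}

func main() {
	b := &broker{round: 1, step: 1,
		agreementCh:   make(chan []byte),
		roundUpdateCh: make(chan uint64),
		stateCh:       make(chan [2]uint64),
		done:          make(chan struct{}),
	}
	go b.listen()
	b.agreementCh <- []byte{0x01} // one agreement processed in round 1
	b.roundUpdateCh <- 2          // then a round update resets the step
	st := <-b.stateCh
	close(b.done)
	fmt.Println(st[0], st[1]) // 2 1
}
```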
A test harness should allow us to automate E2E tests on top of fully configured and functioning testnet to avoid regressions, especially on race conditions. An overview of E2E test setup could include:
Perform assertions to ensure correct state and timing are achieved, checking:
Ideally, these tests could be run locally on demand, but also on a regular basis by the CI implemented in #17.
The following pseudo-code should demonstrate the concept:
func TestSendTransaction(t *testing.T) {
	duskNodeConfig := readConfig("basicRun-dusk-node.toml")
	// ImagePull ...
	// ContainerCreate ...
	logPipe, dbPath, rpcAddr := cli.ContainerStart(duskNodeConfig)
	db := db.Open(dbPath)
	var expectedTx transactions.Tx
	rpcConn.SendRPC(rpcAddr, "SendRawTx", expectedTx)
	// Assert point: ensure the transaction was accepted and stored
	attempts := 0
	for {
		attempts++
		time.Sleep(10 * time.Second)
		err := db.FetchBlockTxByHash(expectedTx.Hash)
		if err == nil {
			break // transaction found in a block
		}
		if attempts > 6 {
			t.Fatal("expecting the transaction to be accepted within 1 minute")
		}
	}
}
Example with bid of 2 and locktime of 100 blocks.
> loadwallet [password]
Wallet loaded successfully!
> sync
Syncing wallet... (13808/13808)
> bid 2 100 [password]
hash: [txid]
> startblockgenerator [txid]
Immediately after the last command, the CLI does not respond to input.
After the locktime of 100 blocks has passed, the CLI is expected to return control, but remains non-interactive.
Currently, if a bid at index 0 is removed, the entire bid list is cleared.
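For reference, the usual Go idiom removes a single index without touching the rest of the slice; a sketch with a stand-in Bid type (the real bid list structure may differ):

```go
package main

import "fmt"

// Bid is a stand-in for an entry in the bid list.
type Bid struct{ X uint64 }

// removeBid drops only the element at index i. A buggy variant that
// slices with the wrong bounds (e.g. list[:i] alone when i == 0)
// wipes the entire list instead.
func removeBid(list []Bid, i int) []Bid {
	if i < 0 || i >= len(list) {
		return list
	}
	return append(list[:i], list[i+1:]...)
}

func main() {
	list := []Bid{{1}, {2}, {3}}
	list = removeBid(list, 0)
	fmt.Println(len(list), list[0].X, list[1].X) // 2 2 3
}
```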
So no running on Windows for now / ever?
And where's the compiled program as mentioned in the install guide?
Thanks...
Since a committee.Store is initialized on the chain tip height, and a consensus round is always one ahead of the tip, a discrepancy can arise between the chain and the consensus committees.
For example, if the chain tip is at block 2000, then the round is 2001. Upon startup, consensus components run RemoveExpiredProvisioners with round 2001, so their committee is one step ahead of the chain, which only calls this function when blocks are accepted.
As a result, this can cause the node to get stuck, since a forged block will have an invalid certificate according to the chain, even though it is correct from a consensus point of view.
Since the tx structures were unified, the StandardTx method was built to emulate the one from the old model, where it returns the Standard struct as a value. This will cause problems when creating transactions, as inputs are not saved on the correct Standard struct, leaving the actual transaction bare.
Since dusk-wallet can be used as a library by different DUSK applications, it should be extracted out of dusk-blockchain and imported as a module.
When a bid expires, we currently create a new proof generator, as it is stored on the broker as an interface value with no method to update the D value. However, this can cause quite some slowdown on the node, as it would have to reconstruct the bidlist, which can take (as observed on testnet) up to a few minutes. A much more efficient way would be to add a method to the Generator interface to update the D value directly, and make use of a boolean flag, as mentioned by @ggoranov.
This is a regression introduced by #63, since the collector will stop short any attempt to run sendAgreement if there is no vote set included in the message. The problem is that it should run anyway, to stay in sync with the reduction events it might receive, and thus to create a proper committee for these events.
A fix will be pushed shortly.
With the addition of the new ring buffer, the Handler interface now contains behaviour which is only needed for one specific implementation. This can probably be refactored to be a bit more accommodating.