dusk-network / dusk-blockchain
Reference implementation of the DUSK Network node, written in Golang
License: MIT License
We should add a file with contribution guidelines, to make it easier for other developers to contribute code to our repository.
With neither the provisioner component nor the generation component depending on a specific hash, and with both now able to update their values when a tx expires, we can make the node even easier to use by automating the syncing, staking, bidding, and starting of consensus components without any need for user input.
Some handy added features would be a --wallet flag to start the node without consensus running, and some commands to adjust default locktimes and values for automated transactions.
This enhancement should remove the need for node operators to constantly check whether their transactions have expired and to send new ones in time themselves, which is much less of a drag. It also replaces #13, since a user can no longer accidentally start consensus components on the wrong round this way.
On testnet nodes, an incredibly long delay of anywhere between 2-5 minutes was observed when re-initializing a node after shutting it down, once the blockchain was longer than 250k blocks. This is because bidlists and committees are kept separately between components, meaning that a good handful of committee stores and bidlists re-populate on startup, and because none of this information is persisted on disk, so the node needs to walk up the blockchain many times when it restarts.
The current repo is missing Continuous Integration functionality.
We are going to integrate Travis CI into our toolchain.
The Dusk Testnet Deployer tool is using hardcoded repos/branches on the old GitLab repositories.
We need to switch to GitHub, using their API for deployment operations.
As discussed in #7, if a user calls startprovisioner while syncing, their provisioner components could start on an obsolete round and get stuck, with no proper feedback for the user. The node should disallow the use of startprovisioner until we are sure the node is at the tip of the chain.
The current version of deterministic sortition does not decrease a node's available stake once one or more of its DUSK have already been extracted for the committee. The deterministic sortition procedure has to be upgraded to subtract 1 DUSK from the available stake, during an instance of the sortition procedure, each time a DUSK under the node's control is extracted for the committee.
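A minimal sketch of the upgraded rule (all names here are hypothetical, not taken from the codebase): each time a DUSK under a member's control is extracted for the committee, 1 DUSK is subtracted from that member's available stake for the remainder of the sortition instance.

```go
package main

import "fmt"

// member is a stand-in for a provisioner with a stake in whole DUSK.
type member struct {
	id    string
	stake uint64
}

// extract picks the member owning the score-th remaining DUSK, then
// decrements that member's available stake by 1 DUSK so the same DUSK
// cannot be counted again within this sortition instance.
func extract(members []member, score uint64) string {
	for i := range members {
		if score < members[i].stake {
			members[i].stake-- // subtract the extracted DUSK
			return members[i].id
		}
		score -= members[i].stake
	}
	return ""
}

func main() {
	members := []member{{"A", 2}, {"B", 3}}
	fmt.Println(extract(members, 1)) // A owns DUSK 0 and 1
	fmt.Println(members[0].stake)    // A's available stake is now 1
}
```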
Currently, all wallet interaction goes through CLI commands. A client application is not capable of creating a transaction, getting the balance, and so on.
An example: to enable the test-harness, the rpc package calls cli.SendBidTx directly as a temporary patch.
Proposal:
Wallet commands could be moved to a separate package so that both the CLI and RPC packages can use them via the RPCBus.
Another option would be to deprecate the CLI entirely and build an RPC API as the only interface to the wallet.
Let me know your thoughts on that.
@julesdesmit @autholykos
Due to some unexplained issue, some nodes can fall behind during deployment and never sync up with the rest of their peers. My hunch is that some nodes do not accept every block after finishing consensus, which leaves a gap in their blockchain; then, when someone asks to sync with them, they only supply about half the blocks and leave the requesting node hanging. Need to do more testing to confirm this.
To quote @toghrulmaharramov
we have to work on formalizing a bootstrapping model for the nodes that have fallen behind for less than n blocks. Unfortunately, due to the complexity of the Reduction phase, which muddles up the bootstrapping process, the catch-up process has to be made more complex than on rival protocols. I have a few ideas on how the problem could be elegantly curtailed
When starting a local network from scratch (for testing purposes) it seems no nodes are running the consensus.
According to the current implementation, msg.InitializationTopic is published only on a topics.AcceptedBlock event, which is totally fine for updating the consensus module with the chain tip.
However, when starting the network from the genesis block, wouldn't the network get stuck, as no node is able to generate the block at height 1?
@julesdesmit The patch below works around this, but I need your expertise here in case I'm missing something.
+++ b/pkg/core/consensus/initiator.go
@@ -35,6 +35,13 @@ func GetStartingRound(eventBroker wire.EventBroker, db database.DB, keys user.Ke
// Start listening for accepted blocks, regardless of if we found stakes or not
acceptedBlockChan, listener := InitAcceptedBlockUpdate(eventBroker)
+ if currentHeight == 0 {
+ roundBytes := make([]byte, 8)
+ binary.LittleEndian.PutUint64(roundBytes, 1)
+ eventBroker.Publish(msg.InitializationTopic, bytes.NewBuffer(roundBytes))
+ return nil
+ }
+
When calling user.NewRandKeys or user.NewKeysFromBytes, if the rand.Reader, or any other seed, runs out of bytes before being able to generate a BLS keypair, it returns an EOF error. This can be easily mitigated by adding logic to retry with a new seed in this case. Additionally, this should take care of an intermittently failing test.
When a bid expires, the bidder should be able to modify the bid amount D and the secret K that serve as inputs to the proof of knowledge.
The way the accumulator and eventfilter currently prevent race conditions is by making heavy use of state management in the accumulator. It is possible to refactor the packages defensively to keep the same resilience against race conditions, but without excessive locking.
After testing the node near height 13903 as described in issue #3, it is not possible to sync anymore because of a "block not found" error.
Jules on Telegram:
will be pushing a fix for this shortly - we did have to roll back a couple blocks on the blockchain in order to ensure smooth syncs for the updated nodes. there was a pretty nasty bug found with certificate discrepancies between nodes, identified last night and patched preliminarily on the other nodes. once the fix is pushed on the github, i'd advise to rebuild and re-sync from scratch
Our CI has highlighted a few race conditions in the monitoring test file.
As these components do not share anything, it would make more structural sense to encapsulate each into its own package. So, wire.NewEventBus() and wire.NewRPCBus() become eventbus.New() and rpcbus.New().
Currently there is room for improvement in passing by reference vs passing by value, especially in hotter parts of the code using this package (such as encoding and decoding events, transactions and big blocks).
Currently a node can only be extracted from the committee once per step.
Yes. Maybe it would make sense to use a custom type for hash with a predefined String() method that hex-encodes the []byte?
Originally posted by @autholykos in https://github.com/_render_node/MDIzOlB1bGxSZXF1ZXN0UmV2aWV3VGhyZWFkMTkyMTk1OTc4OnYy/pull_request_review_threads/discussion
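A minimal sketch of such a type; the name Hash and its use here are illustrative, not taken from the codebase.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// Hash is a hypothetical named type for block/tx hashes. Implementing
// fmt.Stringer means log statements print a readable hex string instead
// of a raw byte slice.
type Hash []byte

func (h Hash) String() string {
	return hex.EncodeToString(h)
}

func main() {
	h := Hash{0xde, 0xad, 0xbe, 0xef}
	fmt.Println(h) // prints "deadbeef"
}
```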
The RPCBus had its channels implemented as global variables for simplicity (according to @ggoranov). However, this approach can be quite unsafe, as there is no guarantee that the correct component is reading a request. Even if the intent is for a channel to be used strictly between two components, this is not documented anywhere (as far as I can tell; if I missed it, please let me know). In that case it would still be better to return the channel from a function call or something similar, so that it is not available to the rest of the code, since it shouldn't be shared outside of those components in the first place. Anybody not properly familiar with this code could easily make a mistake that way.
It is not so intuitive that startprovisioner can be executed without staking. According to the documentation, staking is necessary.
"stake": Stake a given amount of DUSK, to allow participation as a provisioner in consensus. Make sure to use the sync command first.
What is actually happening when you run startprovisioner without staking?
Is it a deliberate choice to let users sync manually instead of syncing automatically? What is the risk of staking without syncing?
This is a temporary patch pending a more decisive solution to dependency updates on release branches. To be able to exploit the performance improvements of dusk-network/dusk-crypto#7, go.mod must be updated.
Dusk-blockchain uses the crypto package embedded directly in the codebase. It should instead import dusk-network/dusk-crypto and treat it as an external package.
After testing the blockgenerator, 1 tDusk is missing from my wallet balance.
$ ./testnet
initializing node...
initialization complete. opening console...
> createwallet [mypassword]
Wallet created successfully!
Public Address: [myaddress]
> sync
Syncing wallet... (13791/13791)
Found 0 spends and 1 receives
> balance
Balance: 10.00000000
> stake 8 100 [mypassword]
hash: [somehash]
> sync
Syncing wallet... (13804/13804)
Found 0 spends and 1 receives
> startprovisioner
provisioner module started
to more accurately follow the progression of consensus, use the showlogs command
> sync
Syncing wallet... (13807/13807)
Found 0 spends and 0 receives
> sync
Syncing wallet... (13808/13808)
Found 0 spends and 0 receives
> bid 2 100 [mypassword]
hash: [txid]
> startblockgenerator [txid]
balance
stop
After waiting a few hours I exited the console with CTRL + C because the console would not return control (see #2). Coincidentally the test monitor was stuck at height 13903.
I restarted my node to check whether I could reproduce the error. Unfortunately I did not manage to sync, but I also noticed 1 tDUSK is missing.
$ ./testnet
initializing node...
initialization complete. opening console...
> balance
please load a wallet before trying to check balance
> loadwallet [mypassword]
error attempting to load wallet: cipher: message authentication failed
> sync
please load a wallet before trying to sync
> loadwallet [mypassword]
Wallet loaded successfully!
Public Address: [myaddress]
> balance
Balance: 9.00000000
> balance
Balance: 9.00000000
> sync
Syncing wallet... (13903/13903)
Found 0 spends and 1 receives
> balance
Balance: 9.00000000
This is currently done sequentially, however as the ordering currently does not matter, we should be able to parallelize this, and cut quite some time off of encoding and decoding blocks.
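A sketch of what the parallelization could look like, using a stand-in encoder rather than the real transaction/event codec: each goroutine writes its result into its own indexed slot, so no locking is needed and the output order matches the input.

```go
package main

import (
	"fmt"
	"sync"
)

// encodeAll encodes each item concurrently. Since results land in
// pre-allocated, indexed slots, goroutines never contend on shared
// state; encode is a stand-in for the real encoder.
func encodeAll(items []int, encode func(int) []byte) [][]byte {
	out := make([][]byte, len(items))
	var wg sync.WaitGroup
	wg.Add(len(items))
	for i, it := range items {
		go func(i, it int) {
			defer wg.Done()
			out[i] = encode(it) // each goroutine owns exactly one slot
		}(i, it)
	}
	wg.Wait()
	return out
}

func main() {
	enc := func(v int) []byte { return []byte{byte(v)} }
	res := encodeAll([]int{1, 2, 3}, enc)
	fmt.Println(len(res), res[0][0], res[2][0]) // 3 1 3
}
```

In practice one would likely bound the number of goroutines with a worker pool for very large blocks, but the slot-per-item scheme above shows why ordering is preserved for free.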
Currently, in the IncreaseTimeout function for the consensus.Timer, t.timeOut is evaluated against t.baseTimeOut*2^10. Golang reads the ^ symbol as a bitwise XOR, not a power. This should be changed, as the timer needs to keep increasing until it has doubled 10 times.
This is somewhat related to #21, albeit more focused on a general simplification of the Reducer structure.
The Reducer component is supposed to perform operations that are quite sequential in nature. A (grossly oversimplified) layout of its duty is the following loop:
Such a layout might be better modeled with 2 separate sync.WaitGroups instead of overusing channels.
As pointed out, needing to sync manually any time a wallet operation is performed is quite cumbersome. This should be integrated into the wallet functions themselves, instead.
In README.md, locktime is confusingly described as an amount of blocks that must exceed the block height. That would mean the locktime must be at least as long as the age of the chain, which seems excessive and impractical.
[locktime] stands for the amount of blocks for which the bid is locked (block_height < locktime < 250000)
Additionally, the maximum locktime is 250,000 blocks, which at the current average blocktime of 7 seconds is about 20 days. After 20 days the block height will pass 250000, at which point there is no value for locktime that satisfies block_height < locktime < 250000.
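Coding the documented constraint literally makes the problem visible: a bid locked for 100 blocks fails the check at height 13808, and once the height reaches 250000 no value can satisfy it at all. This is only a transcription of the README's stated rule, not of the actual node behaviour.

```go
package main

import "fmt"

const maxLocktime = 250000

// validLocktime mirrors the README's stated constraint:
// block_height < locktime < 250000. Under this reading, any locktime
// smaller than the current height is rejected, and past height 249999
// the valid window is empty.
func validLocktime(blockHeight, locktime uint64) bool {
	return blockHeight < locktime && locktime < maxLocktime
}

func main() {
	fmt.Println(validLocktime(13808, 100))     // false: 100 blocks fails at height 13808
	fmt.Println(validLocktime(13808, 20000))   // true
	fmt.Println(validLocktime(250000, 250001)) // false: the window has closed
}
```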
This would save us from duplicate code, and from conversion issues when the wallet package is externalized.
Very good
The key package is currently located in the crypto folder. It is duplicated in dusk-wallet, so we should import it from there to have only one copy of this package.
The crypto package is still left in the repo, carrying only an unfinished README document. It should be deleted, as we are now importing it from an external repo.
Benchmarks conducted on dusk-network/dusk-crypto#14 show a substantial performance improvement when parallelizing the aggregation operations. Given the criticality of such aggregation when creating Agreement messages, it would be good to apply the concurrent strategy to minimize event production time and avoid falling behind in terms of steps (or even rounds) just because Agreement creation is slow.
The following function should serve as inspiration for exploiting concurrency when aggregating Signatures:
func aggregateSignatures(sigs []*Signature) *Signature {
	const W = 4
	C := len(sigs)
	M := C / W
	R := C % W
	if C < W {
		// not enough signatures to split across workers
		return aggregateSignaturesSequential(sigs)
	}
	var wg sync.WaitGroup
	var last *Signature
	// goroutines will fill this slice concurrently
	sgs := make([]*Signature, W)
	wg.Add(W)
	// W workers, each sequentially aggregating a chunk of M signatures
	for w := 0; w < W; w++ {
		lo, hi := w*M, (w+1)*M
		go func(i int, set []*Signature) {
			sgs[i] = aggregateSignaturesSequential(set)
			wg.Done()
		}(w, sigs[lo:hi])
	}
	// aggregate the remaining R signatures not fitting in the worker chunks
	if R > 0 {
		last = aggregateSignaturesSequential(sigs[C-R:])
	}
	wg.Wait() // wait for all the goroutines to be done
	if last != nil {
		sgs = append(sgs, last)
	}
	return aggregateSignaturesSequential(sgs)
}

func aggregateSignaturesSequential(sigs []*Signature) *Signature {
	s, sigmas := sigs[0], sigs[1:]
	for _, sig := range sigmas {
		s = s.Aggregate(sig)
	}
	return s
}
Anything not covered by the monitoring system or current logging entries, should be added to get an as-complete-as-possible view of the node when problems arise.
Currently, to avoid sharing the same pointer, we copy the buffer being sent over the event bus for each handler in any given map of handlers. We can instead pass these buffers by value. This would remove the need for the copyBuffer function, and most likely make the dissemination of event buffers much more efficient.
While syncing, some occasional connection reset errors and timeouts are observed.
As MLSAG should be part of dusk-crypto, this package should be removed from the repo, and our module import of dusk-crypto should be updated to a version which includes MLSAG.
As observed on the testnet, a possibility exists for the node to unintentionally offset its agreement component step counter, due to the agreement.broker.Listen goroutine being entirely separate from the invocation of agreement.broker.sendAgreement (which happens as a callback from the reduction component). As aggregating the reduction events into an agreement event takes some amount of time (I believe it's a few hundred ms), there's an opening for the round update to happen in the midst of this. After the aggregation is complete and the state is updated, the deferred call to broker.state.IncrementStep is executed, and the node then starts off that round with an agreement step counter that is ahead by one.
As a result, any events that are created next will be invalid. This happens because the committee bitset that gets generated depends on broker.state.Step, meaning that for reduction events created on steps 1 and 2, the agreement event contains a bitset corresponding to these events in the committees of steps 3 and 4. This causes an inconsistency in the network and, if enough nodes are affected on the same round, it can halt consensus entirely, since too many nodes will not be able to generate proper agreement events, regardless of how many times they try.
This bug was reproduced in a test, which will be included in the PR to fix this. I believe a simple fix would be to invoke the sendAgreement function through a channel, instead of a callback, which would synchronize it with the occurrence of round updates, thus keeping the step counter in its correct place at all times.
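A sketch of the channel-based fix, with hypothetical names rather than the actual broker API: serializing agreement requests and round updates through a single goroutine means a round update can never interleave with an in-flight aggregation, so the step counter stays consistent.

```go
package main

import "fmt"

// broker serializes all state mutations through listen(); nothing else
// touches round or step, so a round update cannot race an aggregation.
type broker struct {
	round, step   uint64
	agreementCh   chan []byte    // aggregated reduction events
	roundUpdateCh chan uint64    // new round notifications
	stateCh       chan [2]uint64 // read-only state queries
	done          chan struct{}
}

func (b *broker) listen() {
	for {
		select {
		case votes := <-b.agreementCh:
			_ = votes // aggregate and publish the Agreement event here
			b.step++  // step bookkeeping happens on this goroutine only...
		case r := <-b.roundUpdateCh:
			b.round, b.step = r, 1 // ...as does the round/step reset
		case b.stateCh <- [2]uint64{b.round, b.step}:
		case <-b.done:
			return
		}
	}
}

func main() {
	b := &broker{round: 1, step: 1,
		agreementCh:   make(chan []byte),
		roundUpdateCh: make(chan uint64),
		stateCh:       make(chan [2]uint64),
		done:          make(chan struct{}),
	}
	go b.listen()
	b.agreementCh <- []byte{0x01} // one agreement processed in round 1
	b.roundUpdateCh <- 2          // then a round update resets the step
	st := <-b.stateCh
	close(b.done)
	fmt.Println(st[0], st[1]) // 2 1
}
```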
A test harness should allow us to automate E2E tests on top of fully configured and functioning testnet to avoid regressions, especially on race conditions. An overview of E2E test setup could include:
Perform assertions to ensure correct state and timing are achieved, checking:
Ideally, these tests could be run locally on demand, but also on a regular basis by the CI implemented in #17.
The following pseudo-code should demonstrate the concept:
func TestSendTransaction(t *testing.T) {
	duskNodeConfig := readConfig("basicRun-dusk-node.toml")
	// ImagePull ...
	// ContainerCreate ...
	logPipe, dbPath, rpcAddr := cli.ContainerStart(duskNodeConfig)
	db := db.Open(dbPath)
	var expectedTx transactions.Tx
	rpcConn.SendRPC(rpcAddr, "SendRawTx", expectedTx)
	// Assert point: ensure the transaction was accepted and stored
	attempts := 0
	for {
		attempts++
		time.Sleep(10 * time.Second)
		err := db.FetchBlockTxByHash(expectedTx.Hash)
		if err == nil {
			break // transaction found in a block
		}
		if attempts > 6 {
			t.Fatal("expecting the transaction to be accepted within 1 minute")
		}
	}
}
Example with bid of 2 and locktime of 100 blocks.
> loadwallet [password]
Wallet loaded successfully!
> sync
Syncing wallet... (13808/13808)
> bid 2 100 [password]
hash: [txid]
> startblockgenerator [txid]
Immediately after the last command, the CLI does not respond to input.
After the locktime of 100 blocks has passed, the CLI is expected to return control, but remains non-interactive.
Currently, if a bid at index 0 is removed, the entire bid list is cleared.
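For reference, the usual Go idiom removes a single index without touching the rest of the slice; a sketch with a stand-in Bid type (the real bid list structure may differ):

```go
package main

import "fmt"

// Bid is a stand-in for an entry in the bid list.
type Bid struct{ X uint64 }

// removeBid drops only the element at index i. A buggy variant that
// slices with the wrong bounds (e.g. list[:i] alone when i == 0)
// wipes the entire list instead.
func removeBid(list []Bid, i int) []Bid {
	if i < 0 || i >= len(list) {
		return list
	}
	return append(list[:i], list[i+1:]...)
}

func main() {
	list := []Bid{{1}, {2}, {3}}
	list = removeBid(list, 0)
	fmt.Println(len(list), list[0].X, list[1].X) // 2 2 3
}
```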
So no running on Windows for now / ever?
And where's the compiled program as mentioned in the install guide?
Thanks...
Since a committee.Store is initialized on the chain tip height, and a consensus round is always one ahead of the tip, a discrepancy can arise between the chain and the consensus committees.
For example, if the chain tip is at block 2000, then the round is 2001. Upon startup, consensus components run RemoveExpiredProvisioners with round 2001, so their committee is one step ahead of the chain, which only calls this function when blocks are accepted.
As a result, this can cause the node to get stuck, since a forged block will have an invalid certificate according to the chain, even though it is correct from a consensus point of view.
Since the tx structures were unified, the StandardTx method was built to emulate the one from the old model, where it returns the Standard struct as a value. This will cause problems when creating transactions, as inputs are not saved on the correct Standard struct, leaving the actual transaction bare.
Since dusk-wallet can be used as a library by different DUSK applications, it should be extracted out of dusk-blockchain and imported as a module.
When a bid expires, we currently create a new proof generator, as it is stored on the broker as an interface value with no method to update the D value. However, this can cause quite some slowdown on the node, as it would have to reconstruct the bidlist, which can take (as observed on testnet) up to a few minutes. A much more efficient way would be to add a method to the Generator interface to update the D value directly, and make use of a boolean flag, as mentioned by @ggoranov.
This is a regression introduced by #63, since the collector will stop short any attempt to run sendAgreement if there is no vote set included in the message. The problem is that it should run anyway, to stay in sync with the reduction events it might receive, and thus to create a proper committee for these events.
A fix will be pushed shortly.
With the addition of the new ring buffer, the Handler interface now contains behaviour which is only needed for one specific implementation. This can probably be refactored to be a bit more accommodating.