concord-bft's Issues

simpleTest does not seem to be working on current master

Describe the bug
simpleTest (see https://github.com/vmware/concord-bft/tree/master/bftengine/tests/simpleTest) does not seem to be running on the current master branch for Concord-BFT.

To Reproduce

  1. Set up a (physical or virtual) machine with all the dependencies needed to build and run Concord-BFT, or use an existing one (I am omitting the exact steps I used to set up such a machine for brevity, but I can provide them if they are of interest).
  2. If you do not have Concord-BFT cloned, clone it from this repository; otherwise, navigate to your Concord-BFT directory, verify that this repository is a remote for your local copy, then check out master and pull the latest master from this repository.
  3. Starting in the concord-bft directory, run the following:
rm -rf build
mkdir build
cd build
cmake ..
make -j4
  4. Change directories to concord-bft/bftengine/tests/simpleTest/scripts.
  5. Run:
./simpleTest.py
  6. Look at the output of this run to see whether it passed. For me, it looks like it failed. The last few lines of output I see are:
INFO 2019-08-23 23:27:40.512 (concord) Client 4 - request 6570808644235952128 has committed (isRO=0, request size=16,  retransmissionMilli=50) 
Iterations count: 100
Total iterations count: 2800
INFO 2019-08-23 23:27:40.512 (concord)Client 4 - sends request 6570808644256923648 (isRO=1 , request size=8, retransmissionMilli=50 ) 
INFO 2019-08-23 23:27:40.512 (concord) Client 4 - sends request 6570808644256923648 (isRO=1, request size=25,  retransmissionMilli=50, numberOfTransmissions=1, resetReplies=0, sendToAll=1)
INFO 2019-08-23 23:27:40.512 (concord) Client 4 received ClientReplyMsg with seqNum=6570808644256923648 sender=1  size=24  primaryId=0 hash=657769120
INFO 2019-08-23 23:27:40.512 (concord) Client 4 received ClientReplyMsg with seqNum=6570808644256923648 sender=2  size=24  primaryId=0 hash=657769120
INFO 2019-08-23 23:27:40.512 (concord) Client 4 received ClientReplyMsg with seqNum=6570808644235952128 sender=3  size=24  primaryId=0 hash=2399
INFO 2019-08-23 23:27:40.513 (concord) Client 4 received ClientReplyMsg with seqNum=6570808644256923648 sender=3  size=24  primaryId=0 hash=657769120
INFO 2019-08-23 23:27:40.513 (concord) Client 4 - request 6570808644256923648 has committed (isRO=1, request size=8,  retransmissionMilli=50) 
ERROR 2019-08-23 23:27:40.513 (plain-udp)receiver is NULL
2019-08-23 23:27:40,804 INFO   End config "n=4_r=4_f=1_c=0_cl=1"
2019-08-23 23:27:40,804 INFO   CONFIGURATION RUN FAIL
INFO 2019-08-23 23:27:40.513 (simpletest.client)test done, iterations: 2800
TESTS FAIL

Expected behavior
I expected that, when I run this test after building Concord-BFT off master, the test should run and report to me it succeeded, not that it failed.

Screenshots
Screenshots are not applicable; please see the test output lines above.

Metrics server error message uses network order

Describe the bug
The error message for the metrics server prints servaddr.sin_port rather than listenport_. Because sin_port is stored in network byte order, the reported port number is wrong, which can be confusing when debugging.

For example, with the metrics server configured for port 6161, the following error is shown if the port is already in use:

1: FATAL 2019-07-27 21:55:25.574 (metrics-server)Error binding UDP socket: IP=0, Port=4376, errno=Address already in use

The reported 4376 (0x1118) is simply 6161 (0x1811) in network byte order.

To Reproduce
Steps to reproduce the behavior:

Run the metrics server with the port already in use (for example, run the server twice).

Expected behavior
The correct port should be reported.
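As a minimal sketch of the expected fix, assuming the message is built from the `sockaddr_in` passed to `bind()` (the helper name `describeBindTarget` is hypothetical, not the project's code), the port only needs to be converted back to host order with `ntohs()` before it is printed:

```cpp
#include <arpa/inet.h>   // htonl, htons, ntohl, ntohs
#include <netinet/in.h>  // sockaddr_in, INADDR_ANY
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: formats the "IP=..., Port=..." part of a bind error.
// sin_addr and sin_port are stored in network byte order, so both are
// converted to host order before being turned into text.
std::string describeBindTarget(const sockaddr_in& servaddr) {
  const std::uint32_t ip = ntohl(servaddr.sin_addr.s_addr);
  const std::uint16_t port = ntohs(servaddr.sin_port);  // network -> host order
  return "IP=" + std::to_string(ip) + ", Port=" + std::to_string(port);
}
```

Printing a stored member such as listenport_ (host order) directly would work just as well. The key point is never to print sin_port without converting it first.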

Introduce conan.io as the package management for the project

Summary

We would like to start using conan.io for package management

Motivation

  • To get rid of manual installs of the dependencies
  • Enable repeatable builds
  • Allow users to install binary packages

Detailed Design

  • Add the package manager to CMake
  • Convert submodules to conan packages
  • Set up managed repository

Drawbacks

  • Complexity of having an extra tool

Alternatives

Naked cmake or another package manager

Open Questions

NA

UDP vs. interesting network topologies

We were recently debugging some strange behavior in an app running on concord-bft. When we started the application, we would see some initial read-only requests be handled just fine. But, when we started sending write requests, they would sometimes time out, and we would see errors in the log like:

 22:41:39.878 WARNING: Node 1 received invalid message from Node 0 (type==105)

We saw many different "type" values, and each replica index would appear as the receiver. Curiously, it was always Node 0 that was logged as the sender of the invalid message.

We tracked this down to a problem with our network, which confused the UDP communication module.

The problem with our network was that each replica was running in a docker container on a different host and was attached to the bridge network. The UDP communication module uses the socket details to determine the source of a message, and these socket details gave the IP and port of the bridge network (172.17.x.x) instead of the host network (10.x.x.x). Unfortunately, if the IP:port pair is unknown to the UDP module, it returns "0" instead of an error. For a variety of reasons (signature matching, expectations about the primary for a view, etc.), a message from some other node is considered invalid if it is thought to come from Node 0.

The fix for our deployment was to switch our docker containers to using the host network, so that the socket source matches the configured source.

To help others find and fix this problem in their deployments, it would be nice to change the UDP communication module so that it logs when a message appears to come from an unknown source (including that source's address).
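Here is a minimal sketch of such a change. It assumes the module keeps a table that maps configured `ip:port` pairs to node ids. `EndpointDirectory` and `NodeNum` are hypothetical stand-ins, not the actual UDP module types. For an unknown sender, the lookup returns `std::nullopt` instead of `0` and logs the source address.

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>
#include <map>
#include <optional>
#include <sstream>
#include <string>

using NodeNum = std::uint16_t;  // hypothetical stand-in for the node-id type

// Hypothetical table mapping configured "ip:port" endpoints to node ids.
class EndpointDirectory {
 public:
  void add(const std::string& ip, std::uint16_t port, NodeNum node) {
    nodes_[key(ip, port)] = node;
  }

  // Returns std::nullopt for an unconfigured sender instead of falling back
  // to node 0, and logs the sender's address so that address-translation
  // problems (e.g. a docker bridge network) show up in the log.
  std::optional<NodeNum> lookup(const std::string& ip, std::uint16_t port,
                                std::ostream& log = std::cerr) const {
    const auto it = nodes_.find(key(ip, port));
    if (it == nodes_.end()) {
      log << "WARNING: message from unknown source " << ip << ':' << port << '\n';
      return std::nullopt;
    }
    return it->second;
  }

 private:
  static std::string key(const std::string& ip, std::uint16_t port) {
    return ip + ':' + std::to_string(port);
  }

  std::map<std::string, NodeNum> nodes_;
};
```

Callers would then drop the message (or count it in a metric) when the lookup is empty. This avoids attributing the message to Node 0 and failing signature checks further down.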

compile error--clang: error: unsupported option '--no-system-header-prefix'

Ubuntu 14.04
clang: 3.4-1ubuntu3 (tags/RELEASE_34/final) (based on LLVM 3.4)
gcc: 4.8.4

When I run make, I get an error:

Scanning dependencies of target bls_relic
[ 1%] Building CXX object threshsign/src/bls/relic/CMakeFiles/bls_relic.dir/BlsAccumulatorBase.cpp.o
clang: error: unsupported option '--no-system-header-prefix'
clang: error: no such file or directory: 'relic'
make[2]: *** [threshsign/src/bls/relic/CMakeFiles/bls_relic.dir/BlsAccumulatorBase.cpp.o] Error 1
make[1]: *** [threshsign/src/bls/relic/CMakeFiles/bls_relic.dir/all] Error 2
make: *** [all] Error 2

I have installed clang, relic and cryptopp.
I need some help. Thanks.

Some warnings cause the build to fail after -Werror

Describe the bug
It seems that after -Werror was enabled in #129, several warnings that occur when building with clang now cause the build to fail. One is in the cryptopp dependency, as documented in issue #131. In addition, clang generates several format-type warnings that gcc does not catch, and these also cause the build to fail:

/concord-bft/bftengine/src/bftengine/DebugStatistics.cpp:89:53: warning: format specifies type 'long' but the argument has type 'int64_t'
      (aka 'long long') [-Wformat]
                        fprintf(stdout, "lastExecutedSeqNumber = %ld\t", d.lastExecutedSequenceNumber);
                                                                 ~~~     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/concord-bft/bftengine/src/bftengine/ReplicaImp.cpp:615:19: warning: format specifies type 'long' but the argument has type 'bftEngine::impl::Time'
      (aka 'unsigned long long') [-Wformat]
                                           i, currTime, timeOfPartProof);
                                              ^~~~~~~~

To Reproduce
Steps to reproduce the behavior:
Build the project using clang (for example, on macOS).

Expected behavior
The project should build successfully under clang.
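One portable way to silence these warnings is to use the `<cinttypes>` format macros instead of a hard-coded `%ld`. The sketch below assumes that `Time` is a typedef for `uint64_t`, and the helper functions are hypothetical. `%ld` is only correct where `int64_t` is `long` (e.g. 64-bit Linux); on macOS it is `long long`, which is exactly what clang reports.

```cpp
#include <cassert>
#include <cinttypes>  // PRId64, PRIu64
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Signed 64-bit value: PRId64 expands to the right conversion ("ld" or "lld").
int formatLastExecuted(char* buf, std::size_t len, std::int64_t lastExecuted) {
  return std::snprintf(buf, len, "lastExecutedSeqNumber = %" PRId64 "\t", lastExecuted);
}

// Unsigned 64-bit time values (assumed uint64_t): use PRIu64.
int formatTimes(char* buf, std::size_t len, std::uint64_t currTime,
                std::uint64_t timeOfPartProof) {
  return std::snprintf(buf, len, "currTime=%" PRIu64 " timeOfPartProof=%" PRIu64,
                       currTime, timeOfPartProof);
}
```

An alternative is to cast the argument explicitly (e.g. to `long long` with `%lld`). The macros avoid the cast and stay correct if the typedef changes.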

‘ConcordAssert’ was not declared

Describe the bug

When I tried to compile with -DBUILD_COMM_TCP_PLAIN=ON, I encountered the following error:

/root/abc/concord-bft/communication/src/PlainTcpCommunication.cpp:674:7: error: ‘ConcordAssert’ was not declared in this scope
       ConcordAssert(results != tcp::resolver::iterator());

Thanks.

Feature Request: API for use from other languages

Is your feature request related to a problem? Please describe.
Please make it possible to use this library from other languages, for example through a REST/HTTP or WebSocket messaging/RPC API, so that it can be consumed from languages such as JavaScript.

Describe the solution you'd like
The state should be definable in, and manageable from, any language. For example, in in-browser JavaScript, we would define the state object, which uses the underlying concord-bft consensus engine to sync the state with other browser-based instances, native-app node instances, etc.

  • The app talks to underlying consensus engine over RPC (over http / websockets, for example: BRPC or DHT-RPC).
  • Multiple app instances can join and leave the consensus network on the fly (e.g. a new browser tab starts another app instance with the replicated state). DHT-RPC makes decentralized P2P RPC possible.
  • Apps store their state data in databases of their choice, either in local files or in distributed databases such as MongoDB or ElasticSearch clusters on AWS.
  • concord-bft does not fix the schema of the state object; instead, a serialization format such as CBOR is used to send and receive the state data during RPC.

Describe alternatives you've considered
Tendermint ABCI is a good reference for inspiration. It has a clear separation between application state (managed by the app in any language, called ABCI proxy) and blockchain-state (managed by the Tendermint Node for consensus). Ref. Lotion.JS

Additional context
A consensus engine that can replicate multiple state-machines across multiple applications, created in multiple languages.

Use DebugPersistentStorage instead of FileStorage for SimpleKVBC TesterReplica

Describe the bug
In order to fix the skvbc tests, I added file persistence for bft metadata to the SimpleKVBC TesterReplica. However, the application state itself is not persisted. This is confusing and can potentially lead to weird errors. The file persistence should be removed, and the in-memory debug persistence should be used instead.

Thanks to @guyg8 for catching this silly mistake on my part.

To Reproduce
N/A

Expected behavior
We should use the debug persistence flag.

Memory leak at PartialProofsSet::tryToCreateFullProof

Describe the bug
At PartialProofsSet::tryToCreateFullProof()
the line - IThresholdAccumulator* acc = thresholdAccumulator->clone()
creates a new object by calling -
virtual IThresholdAccumulator* clone() { return new BlsThresholdAccumulator(*this); } (file BlsThresholdAccumulator.h)

The pointer is being passed to AsynchProofCreationJob for async execution, but I don't see any code that deletes it.

Expected behavior
The created object should be deleted, for example by managing it with std::unique_ptr.
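Here is a sketch of the suggested fix, using simplified hypothetical versions of the interfaces (the real `IThresholdAccumulator` and `AsynchProofCreationJob` have more members). `clone()` returns a `std::unique_ptr`, and the job takes ownership, so the copy is freed when the job is destroyed. The instance counter exists only so that the usage check can observe the deletion.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Simplified, hypothetical interface: clone() now states ownership in its type.
class IThresholdAccumulator {
 public:
  virtual ~IThresholdAccumulator() = default;
  virtual std::unique_ptr<IThresholdAccumulator> clone() const = 0;
};

class BlsThresholdAccumulator : public IThresholdAccumulator {
 public:
  explicit BlsThresholdAccumulator(int* liveCount) : liveCount_(liveCount) { ++*liveCount_; }
  BlsThresholdAccumulator(const BlsThresholdAccumulator& other) : liveCount_(other.liveCount_) {
    ++*liveCount_;
  }
  ~BlsThresholdAccumulator() override { --*liveCount_; }

  std::unique_ptr<IThresholdAccumulator> clone() const override {
    return std::make_unique<BlsThresholdAccumulator>(*this);
  }

 private:
  int* liveCount_;  // counts live instances (illustration only)
};

// The job owns its accumulator; destroying the job destroys the clone.
class AsynchProofCreationJob {
 public:
  explicit AsynchProofCreationJob(std::unique_ptr<IThresholdAccumulator> acc)
      : acc_(std::move(acc)) {}
  void execute() { /* ... combine partial proofs using *acc_ ... */ }

 private:
  std::unique_ptr<IThresholdAccumulator> acc_;
};
```

If the job is handed to a thread pool as a raw pointer, the pool (or the job's own completion path) becomes responsible for deleting the job. The accumulator is then released along with it.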

[PROJECT] Add Milestones, remove "Projects" and "Wiki"

Though there are not many open issues, a Milestone (minimum 1.0, possibly more) could be added to get a sense of a roadmap. This has an additional benefit of making the project look more "alive".

The "wiki" seems unused (doesn't open for me), so possibly just remove.

The same applies to the "Projects" tab.

Add a README.md to kvbc directory

Is your feature request related to a problem? Please describe.
All top level directories should have a brief readme to explain what they do.

Describe the solution you'd like
Write a README.md that talks about the code in the directory, what it is used for and how it is structured. It doesn't need to be more than a few lines + bullets.

Describe alternatives you've considered
The alternative is to not document our top level directories.

Additional context
N/A

Implement logging wrapper as header only

This change is related to the implementation in #1.
Currently the wrapper contains a .cpp file because the global variable is defined there.

The proposal is to move the global variable into the bft engine (currently into the old Logger implementation) and to change its semantics to "bft engine only global logger".
Any new usage of the wrapper, especially outside the bft engine, should create its own logger object.

Clean up Thread Local Storage across the code

Is your feature request related to a problem? Please describe.
TLS is no longer used by the library.

Describe the solution you'd like
Remove all TLS dependencies

Can't build the latest master branch

Describe the bug
The build fails with the error below.

To Reproduce
Follow the instructions in the README.

Expected behavior
Choosing the native build procedure should produce a successful build.

Screenshots

[ 69%] Built target ClientsManager_test
[ 69%] Building CXX object bftengine/tests/testSeqNumForClientRequest/CMakeFiles/seqNumForClientRequest_test.dir/seqNumForClientRequest_test.cpp.o
[ 69%] Building CXX object bftengine/tests/messages/CMakeFiles/FullCommitProofMsg_test.dir/FullCommitProofMsg_test.cpp.o
[ 69%] Building CXX object bftengine/tests/messages/CMakeFiles/ViewChangeMsg_test.dir/ViewChangeMsg_test.cpp.o
/home/wizard/src/concord-bft/kvbc/src/categorization/block_merkle_category.cpp: In constructor ‘concord::kvbc::categorization::detail::BlockMerkleCategory::BlockMerkleCategory(const std::shared_ptr&)’:
/home/wizard/src/concord-bft/kvbc/src/categorization/block_merkle_category.cpp:314:109: error: no matching function for call to ‘logging::Logger::Logger()’
 BlockMerkleCategory::BlockMerkleCategory(const std::shared_ptr& db) : db_{db} {
                                                                                                             ^
In file included from /home/wizard/src/concord-bft/logging/include/Logger.hpp:23:0,
                 from /home/wizard/src/concord-bft/kvbc/include/categorization/block_merkle_category.h:18,
                 from /home/wizard/src/concord-bft/kvbc/src/categorization/block_merkle_category.cpp:14:
/home/wizard/src/concord-bft/logging/include/Logging.hpp:101:3: note: candidate: logging::Logger::Logger(logging::LoggerImpl&)
   Logger(LoggerImpl& logger) : logger_{&logger} {}
   ^~~~~~
/home/wizard/src/concord-bft/logging/include/Logging.hpp:101:3: note:   candidate expects 1 argument, 0 provided
/home/wizard/src/concord-bft/logging/include/Logging.hpp:99:7: note: candidate: constexpr logging::Logger::Logger(const logging::Logger&)
 class Logger {
       ^~~~~~
/home/wizard/src/concord-bft/logging/include/Logging.hpp:99:7: note:   candidate expects 1 argument, 0 provided
/home/wizard/src/concord-bft/logging/include/Logging.hpp:99:7: note: candidate: constexpr logging::Logger::Logger(logging::Logger&&)
/home/wizard/src/concord-bft/logging/include/Logging.hpp:99:7: note:   candidate expects 1 argument, 0 provided
At global scope:
cc1plus: error: unrecognized command line option ‘-Wno-undefined-var-template’ [-Werror]
cc1plus: error: unrecognized command line option ‘-Wno-extra-semi’ [-Werror]
cc1plus: all warnings being treated as errors
kvbc/CMakeFiles/kvbc.dir/build.make:453: recipe for target 'kvbc/CMakeFiles/kvbc.dir/src/categorization/block_merkle_category.cpp.o' failed
make[2]: *** [kvbc/CMakeFiles/kvbc.dir/src/categorization/block_merkle_category.cpp.o] Error 1
CMakeFiles/Makefile2:3624: recipe for target 'kvbc/CMakeFiles/kvbc.dir/all' failed
make[1]: *** [kvbc/CMakeFiles/kvbc.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
[ 69%] Building CXX object bftengine/tests/messages/CMakeFiles/ViewChangeMsg_test.dir/helper.cpp.o
[ 69%] Building CXX object bftengine/tests/messages/CMakeFiles/FullCommitProofMsg_test.dir/helper.cpp.o
[ 70%] Linking CXX executable seqNumForClientRequest_test
[ 70%] Linking CXX executable ViewChangeMsg_test
[ 70%] Linking CXX executable FullCommitProofMsg_test
[ 70%] Built target seqNumForClientRequest_test
[ 70%] Built target ViewChangeMsg_test
[ 70%] Built target FullCommitProofMsg_test
Makefile:145: recipe for target 'all' failed
make: *** [all] Error 2
wizard@concord-dev:~/src/concord-bft/build$ 

[TASK] Clarify some Things (re #426)

(Draft; I will most likely carry this out within the next few days.)

re #426

Task

Document the findings from issue #426 (especially #426 (comment)), so that visitors have the right expectations and can find information and documentation faster.

Steps

  • #439 Consider creating a 1.0 Milestone
  • Add a clarification re "Some docs we have written are not quite ready for open source and need to be cleaned and stripped of proprietary information before being shared."
  • Mention the availability of "in-code" documentation with the readme
  • Ensure that in-code documentation (which is a good thing), is mentioned from an in-unit (folder) README, so a visitor browsing the code finds it.
  • clarify that there is a business unit behind the code that prioritizes business customers (this is not a bad thing, of course, but it should be mentioned to avoid false expectations)
  • clarify "All work done for paying customers relating to the BFT code becomes open source and is made freely available. "
  • clarify "status" (e.g.: "is used in production by multiple yxz")
  • Consider providing a link for interested business customers (e.g. "custom development" on the VMware website)

Nodes joining and leaving dynamically

Could not find any other place to raise a query, hence posting it here.

It is not clear how arbitrary nodes can join and leave the BFT consensus network on the fly.

The SimpleTest application demonstrates a few nodes with a client, but it does not seem to consider the scenario where multiple nodes write a value at the same time.

Is that possible? Can multiple nodes:

  1. update the value of same key at the same time?
  2. write multiple values that belong to different keys at the same time?

If only one client is ever allowed to read from and write to the nodes (as assumed in the test code), then it is a very restrictive scenario that does not test the distributed nature of the nodes. A test case that demonstrates multiple nodes writing multiple values (with everyone getting synced to one single agreed value or block of values) would be the best showcase example. Any pointers or information along those lines would be of great help.

Add batching behavior to stats

Is your feature request related to a problem? Please describe.

When diagnosing the performance of my cluster, one metric I'd like to have access to is how many requests the primary replica puts into each consensus batch. That information is currently in the "Sending PrePrepareMsg" log ("requests="). It would be nice to have it somewhere more accessible, especially since that log is scheduled to be lowered from INFO to DEBUG in #179.

Describe the solution you'd like

There is now a stats accumulator. It would be great if a batching metric were added there.
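Here is a minimal sketch of what such a metric could track. It assumes the primary can report the number of requests each time it sends a PrePrepareMsg. `BatchSizeStats` is a hypothetical class, not the existing stats accumulator's API. Keeping a count, a sum and a maximum is enough to derive the average batch size without storing every sample.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical accumulator for the per-PrePrepare "requests=" value.
class BatchSizeStats {
 public:
  // Called by the primary each time it sends a PrePrepareMsg.
  void onPrePrepareSent(std::uint32_t requestsInBatch) {
    std::lock_guard<std::mutex> lock(mu_);
    ++batches_;
    totalRequests_ += requestsInBatch;
    maxBatch_ = std::max(maxBatch_, requestsInBatch);
  }

  struct Snapshot {
    std::uint64_t batches;
    std::uint64_t totalRequests;
    std::uint32_t maxBatch;
    double averageBatch() const {
      return batches == 0 ? 0.0 : static_cast<double>(totalRequests) / batches;
    }
  };

  // Consistent copy of the counters, e.g. for periodic export.
  Snapshot snapshot() const {
    std::lock_guard<std::mutex> lock(mu_);
    return {batches_, totalRequests_, maxBatch_};
  }

 private:
  mutable std::mutex mu_;
  std::uint64_t batches_ = 0;
  std::uint64_t totalRequests_ = 0;
  std::uint32_t maxBatch_ = 0;
};
```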

GCC build fails with unused variable issue

Describe the bug
Related to #131 and #132, and probably caused by #129 - building under the GCC distributed with Ubuntu 18.04 results in a compile error due to unused variable warnings:

/home/travis/build/no2chem/concord-bft/bftengine/src/bftengine/ReplicaLoader.cpp: In function ‘bftEngine::impl::ReplicaLoader::ErrorCode bftEngine::impl::{anonymous}::loadViewInfo(bftEngine::impl::PersistentStorage*, bftEngine::impl::LoadedReplicaData&)’:
/home/travis/build/no2chem/concord-bft/bftengine/src/bftengine/ReplicaLoader.cpp:159:11: error: variable ‘initialViewNum’ set but not used [-Werror=unused-but-set-variable]
   ViewNum initialViewNum = 0;
           ^~~~~~~~~~~~~~
/home/travis/build/no2chem/concord-bft/bftengine/src/bftengine/ReplicaLoader.cpp:160:8: error: variable ‘isInView’ set but not used [-Werror=unused-but-set-variable]
   bool isInView = false;
        ^~~~~~~~
cc1plus: all warnings being treated as errors

To Reproduce
See this job:
https://travis-ci.com/no2chem/concord-bft/jobs/219173980

Expected behavior
The build should succeed

Reduce log level in core BFT engine

Change the INFO level to DEBUG in most BFT internal log messages. The current INFO level produces huge logs that are not suitable for production.

Improve SimpleTest run

Currently the test is run using GNU parallel, and this approach has a few limitations:

  1. there is no indication of why a job failed without looking in the logs
  2. sometimes terminating the jobs doesn't work, and the user must kill processes manually
  3. it is less suitable for running from external tools that may expect clear exit code(s) from the batch or from specific jobs

The proposed solution is to implement a Python script that runs the test executables in parallel and produces clean, detailed output when needed. Termination should also be deterministic, and the script should be easy to run from any external software, returning a clean exit code and the paths to the log files created during the test.

Logging utility refactor

Currently logging in the library is inconsistent. The desired solution is to implement a wrapper that will abstract dependencies from the library code.

  1. Implement a logging interface similar to standard interfaces (Log4CPlus is the preferred one, with support for the << operator).
  2. Implement simple console logger to be a part of this library.
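Here is a minimal sketch of item 2: a header-only console logger with stream-style (`<<`) messages. The `simplelog` namespace and the `SIMPLE_LOG` macro are hypothetical names, not the project's logging API. The macro builds the message only when the level is enabled, which is the usual reason Log4CPlus-style loggers expose macros rather than plain functions.

```cpp
#include <cassert>
#include <iostream>
#include <mutex>
#include <sstream>
#include <string>
#include <utility>

namespace simplelog {

enum class Level { Debug, Info, Warn, Error };

inline const char* toString(Level level) {
  switch (level) {
    case Level::Debug: return "DEBUG";
    case Level::Info: return "INFO";
    case Level::Warn: return "WARN";
    case Level::Error: return "ERROR";
  }
  return "UNKNOWN";
}

// Each component creates its own Logger object; output goes to any std::ostream.
class Logger {
 public:
  Logger(std::string name, Level threshold, std::ostream& out = std::clog)
      : name_(std::move(name)), threshold_(threshold), out_(&out) {}

  bool enabled(Level level) const { return level >= threshold_; }

  void write(Level level, const std::string& message) const {
    static std::mutex mu;  // keeps lines from different threads from interleaving
    std::lock_guard<std::mutex> lock(mu);
    *out_ << toString(level) << " (" << name_ << ") " << message << '\n';
  }

 private:
  std::string name_;
  Level threshold_;
  std::ostream* out_;
};

}  // namespace simplelog

// Stream-style logging: the message is only formatted if the level is enabled.
#define SIMPLE_LOG(logger, level, expr)             \
  do {                                              \
    if ((logger).enabled(level)) {                  \
      std::ostringstream simplelog_oss;             \
      simplelog_oss << expr;                        \
      (logger).write(level, simplelog_oss.str());   \
    }                                               \
  } while (0)
```

Usage: `SIMPLE_LOG(logger, simplelog::Level::Info, "request " << seqNum << " committed");`. A Log4CPlus backend could later replace `Logger::write` without changing call sites.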

Add concurrent transaction tests

Our multiIO storage tests only test the semantics of a single transaction. We should run a few concurrent transactions (in separate threads) and ensure that all commit. In-memory transactions should always commit as they hold exclusive locks, unless we purposefully raise an exception during the transaction. However, the RocksDB semantics may not be the same. We probably want to read/write to non-overlapping keys to ensure commitment of all transactions. However, we probably also want a few tests that check the rest of the RocksDB semantics. We can just conditionalize these with #ifdef USE_ROCKSDB.
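Here is a sketch of the in-memory case. It assumes a store whose transactions buffer their writes and apply them under an exclusive lock at commit. `InMemoryStore` is a hypothetical stand-in for the multiIO storage interface. Several threads each commit a transaction over non-overlapping keys, and the check confirms that every commit succeeded and every key landed. A RocksDB variant could wrap the same driver in `#ifdef USE_ROCKSDB`.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical in-memory store: a transaction buffers its writes and applies
// them while holding the store's exclusive lock, so it always commits.
class InMemoryStore {
 public:
  class Transaction {
   public:
    explicit Transaction(InMemoryStore& store) : store_(store) {}
    void put(const std::string& key, const std::string& value) { pending_[key] = value; }
    bool commit() {
      std::lock_guard<std::mutex> lock(store_.mu_);
      for (const auto& [key, value] : pending_) store_.data_[key] = value;
      pending_.clear();
      return true;
    }

   private:
    InMemoryStore& store_;
    std::map<std::string, std::string> pending_;
  };

  std::size_t size() const {
    std::lock_guard<std::mutex> lock(mu_);
    return data_.size();
  }

 private:
  mutable std::mutex mu_;
  std::map<std::string, std::string> data_;
};

// Runs numThreads concurrent transactions, each writing keysPerTxn keys that
// no other transaction touches, and returns how many of them committed.
int runConcurrentTransactions(InMemoryStore& store, int numThreads, int keysPerTxn) {
  std::vector<int> committed(numThreads, 0);  // one slot per thread: no data race
  std::vector<std::thread> threads;
  for (int t = 0; t < numThreads; ++t) {
    threads.emplace_back([&store, &committed, t, keysPerTxn] {
      InMemoryStore::Transaction txn(store);
      for (int i = 0; i < keysPerTxn; ++i) {
        txn.put("t" + std::to_string(t) + "_k" + std::to_string(i), "value");
      }
      committed[t] = txn.commit() ? 1 : 0;
    });
  }
  for (auto& th : threads) th.join();
  int total = 0;
  for (int c : committed) total += c;
  return total;
}
```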

SkvbcPersistenceTest fails intermittently (in CI & locally)

Describe the bug
As stated in the title, there are intermittent failures of persistence-related system tests.
Most of the failures seem to be related to the fact that we restart replicas and some of the metrics aren't immediately available. However other failures have been observed too, including failing assertions in the concord-bft code (see attached stack traces).

To Reproduce
Steps to reproduce the behavior:

  • in CI it happens approx. 1 out of 10 times - cannot be reproduced deterministically
  • when running the system tests locally (Ubuntu 18.04), failures can be observed by running the following shell command:
    for i in `seq 20`; do python3 -m unittest test_skvbc_persistence.SkvbcPersistenceTest.test_st_while_primary_crashes 1>/dev/null; done

Expected behavior
SkvbcPersistenceTest should succeed consistently.

Screenshots
N/A

metadataStorageTest broken

Describe the bug
f4549f4 appears to have broken the metadataStorageTest.

To Reproduce
cd build && cmake -DBUILD_ROCKSDB_STORAGE=TRUE .. && make && ctest -R metadata --verbose

Expected behavior
The test should pass.

Screenshots

andrewstone@ubuntu:~/concord-bft/build$ ctest -R metadata --verbose
UpdateCTestConfiguration  from :/home/andrewstone/concord-bft/build/DartConfiguration.tcl
Parse Config file:/home/andrewstone/concord-bft/build/DartConfiguration.tcl
UpdateCTestConfiguration  from :/home/andrewstone/concord-bft/build/DartConfiguration.tcl
Parse Config file:/home/andrewstone/concord-bft/build/DartConfiguration.tcl
Test project /home/andrewstone/concord-bft/build
Constructing a list of tests
Done constructing a list of tests
Updating test list for fixtures
Added 0 tests to meet fixture requirements
Checking test dependency graph...
Checking test dependency graph end
test 16
    Start 16: metadataStorage_test

16: Test command: /home/andrewstone/concord-bft/build/storage/test/metadataStorage_test
16: Test timeout computed to be: 1500
16: [==========] Running 2 tests from 1 test suite.
16: [----------] Global test environment set-up.
16: [----------] 2 tests from metadataStorage_test
16: [ RUN      ] metadataStorage_test.single_read
16: unknown file: Failure
16: C++ exception with description "Metadata object objectId 1 size is too big: given 80, allowed 4
16: " thrown in the test body.
16: [  FAILED  ] metadataStorage_test.single_read (0 ms)
16: [ RUN      ] metadataStorage_test.multi_write
16: unknown file: Failure
16: C++ exception with description "Metadata object objectId 1 size is too big: given 80, allowed 4
16: " thrown in the test body.
16: [  FAILED  ] metadataStorage_test.multi_write (0 ms)
16: [----------] 2 tests from metadataStorage_test (0 ms total)
16:
16: [----------] Global test environment tear-down
16: [==========] 2 tests from 1 test suite ran. (0 ms total)
16: [  PASSED  ] 0 tests.
16: [  FAILED  ] 2 tests, listed below:
16: [  FAILED  ] metadataStorage_test.single_read
16: [  FAILED  ] metadataStorage_test.multi_write
16:
16:  2 FAILED TESTS
1/1 Test #16: metadataStorage_test .............***Failed    0.04 sec

0% tests passed, 1 tests failed out of 1

Total Test time (real) =   0.05 sec

The following tests FAILED:
	 16 - metadataStorage_test (Failed)
Errors while running CTest

Implement TLS support for python test framework

Currently, the Python test client (bft_client.py) supports only UDP communication with the Concord replicas.
We need to implement TLS support, either with Trio or natively using Python's ssl library.
Important points:

  • we use certificate pinning and we trust only our certificates. The solution should support custom verification callback
  • currently, the python client should connect to all replicas (in other words, it should be the "Client" in the TCP world)

pyclient_test failing periodically

Describe the bug
pyclient_tests is failing intermittently; the master build is currently failing because of it.
To Reproduce
Run the test.
Expected behavior
Test passes.

GenerateConcordKeys: not found

Following the Run examples directions in README.md (commit c007f72), I got

$ LD_LIBRARY_PATH=/usr/local/lib ./testReplicasAndClient.sh
Generating new keys...
./testReplicasAndClient.sh: 10: ./testReplicasAndClient.sh: ../../../../tools/GenerateConcordKeys: not found

because GenerateConcordKeys is not in the tools directory of the git tree. Instead, it is in the $C_BFT_BUILD_DIR directory, which, by the way, was set to /home/<user>/builds/concord-bft/release by scripts/linux/set-env.sh:26 (builddir_base=~/builds/concord-bft). A second bug? I did not expect the build directory to be placed outside the git tree.

The following two one-line changes seem to work around the issue:

diff --git a/bftengine/tests/simpleTest/scripts/testReplicasAndClient.sh b/bftengine/tests/simpleTest/scripts/testReplicasAndClient.sh
index ff442be..15607a0 100755
--- a/bftengine/tests/simpleTest/scripts/testReplicasAndClient.sh
+++ b/bftengine/tests/simpleTest/scripts/testReplicasAndClient.sh
@@ -7,7 +7,7 @@ echo "Generating new keys..."
 
 rm -f private_replica_*
 
-../../../../tools/GenerateConcordKeys -n 4 -f 1 -o private_replica_
+$C_BFT_BUILD_DIR/tools/GenerateConcordKeys -n 4 -f 1 -o private_replica_
 
 parallel --halt now,fail=1 -j0 ::: \
     "$scriptdir/../server 0" \
diff --git a/bftengine/tests/simpleTest/scripts/simpleTest.py b/bftengine/tests/simpleTest/scripts/simpleTest.py
index ff4285f..8a4362c 100755
--- a/bftengine/tests/simpleTest/scripts/simpleTest.py
+++ b/bftengine/tests/simpleTest/scripts/simpleTest.py
@@ -212,7 +212,7 @@ class TestConfig:
                 self.log_dir, self.config_name.replace(",","_")),"w") as w:
             w.write("cli args: " + cli_args)
 
-        cmd = ["../../../../tools/GenerateConcordKeys",
+        cmd = [os.environ['C_BFT_BUILD_DIR'] + "/tools/GenerateConcordKeys",
                "-n", str(self.num_of_replicas),
                "-f", str(self.num_of_faulty),
                "-o", os.path.join(self.log_dir, "private_replica_")]

Unrecognized symbols 'CMP_EQ' and 'BN_POS'

Describe the bug
The build fails on Ubuntu:Xenial with the error message shown in the title (screenshot attached).

To Reproduce
I have attached my Dockerfile used to build a concord-bft image.
Ignore everything past line 59.
The bug occurs at line 58.

systems_concord-bft_Dockerfile.txt

Expected behavior
Compile concord-bft successfully

Screenshots

concord_bug

Encountered TooSlowError in pyclient_tests_udp as part of CI Checks

The pyclient_tests_udp test failed with TooSlowError in the Build and Test CI check for the PR.
Link to the failed run: https://github.com/vmware/concord-bft/pull/1194/checks?check_run_id=1865803092

Steps to reproduce the behavior: This is most probably caused by slow startup. Nothing in the changes introduced by the PR should make startup slower, so it is likely a transient error.

Proposed Solution: Increase the timeout so that new PRs do not hit this problem.

State transfer does not complete when crashing primary replica

Describe the bug
When crashing the primary in the presence of a stale replica, state transfer (reserved pages) does not complete.

To Reproduce
Steps to reproduce the behavior:

  1. Start 3 nodes out of a 4 node cluster. Write a specific key, then enough data to the cluster to trigger several checkpoints.
  2. Stop the primary (do NOT trigger view change)
  3. Start the stale node, which should start fetching/state transfer
  4. [EXPECTED] State transfer completes
  5. [OBSERVED] State transfer completes for transaction blocks, but loops infinitely for reserved pages

I suspect the following logic to be related:
https://github.com/vmware/concord-bft/blob/master/bftengine/src/bcstatetransfer/BCStateTran.cpp#L1901

PS: I have attached logs which demonstrate the issue. The following snippet seems to repeat a lot:
INFO |state-transfer|void bftEngine::SimpleBlockchainStateTransfer::impl::BCStateTran::processData()|state is GettingMissingResPages|
INFO |state-transfer|void bftEngine::SimpleBlockchainStateTransfer::impl::BCStateTran::processData()|nextRequiredBlock_=18446744073709551615

Expected behavior
State transfer should be able to complete with only 3 up-to-date nodes, regardless of whether view change has completed.

Screenshots
N/A - attaching logs
persistence_with_st_and_crashing_primary.zip

-Werror results in compilation errors on macOS

Describe the bug
When building on macOS (10.14.15), compilation of corebft fails due to assertions in cryptopp. I haven't tested whether the behavior occurs on other platforms, but this appears to be related to clang and #129.

update: this seems to be directly related to #129 and clang, BUILD_TESTING isn't required

In file included from /concord-bft/bftengine/src/bftengine/PrePrepareMsg.cpp:11:
In file included from /concord-bft/bftengine/src/bftengine/Crypto.hpp:13:
In file included from /usr/local/include/cryptopp/dll.h:17:
In file included from /usr/local/include/cryptopp/aes.h:9:
In file included from /usr/local/include/cryptopp/rijndael.h:11:
/usr/local/include/cryptopp/seckey.h:180:32: error: extra ';' inside a class [-Werror,-Wextra-semi]
        CRYPTOPP_COMPILE_ASSERT(Q > 0);
                                      ^
/usr/local/include/cryptopp/seckey.h:181:37: error: extra ';' inside a class [-Werror,-Wextra-semi]
        CRYPTOPP_COMPILE_ASSERT(N % Q == 0);
                                           ^
/usr/local/include/cryptopp/seckey.h:182:37: error: extra ';' inside a class [-Werror,-Wextra-semi]
        CRYPTOPP_COMPILE_ASSERT(M % Q == 0);
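One way to keep -Werror for the project's own code while tolerating these third-party headers is to silence -Wextra-semi only around them with clang's diagnostic pragmas (in practice, around the cryptopp includes in Crypto.hpp). The sketch below is self-contained, so it uses a hypothetical macro and struct in place of cryptopp; it assumes, as the error message suggests, that the macro's expansion already ends in a ';'. An alternative, not shown, is to add the cryptopp include directory as a system include (for example with CMake's target_include_directories(... SYSTEM ...)), since compilers suppress warnings from system headers.

```cpp
// Hypothetical stand-in for a third-party assertion macro whose expansion
// already ends in ';' (CRYPTOPP_COMPILE_ASSERT itself is not reproduced here).
#define THIRD_PARTY_COMPILE_ASSERT(expr) static_assert((expr), #expr);

// Suppress -Wextra-semi only for the offending code; -Werror stays in force elsewhere.
#if defined(__clang__)
#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Wextra-semi"
#endif

// Hypothetical class mirroring the N/M/Q checks in the error log.
template <unsigned N, unsigned M, unsigned Q>
struct KeyLengthPolicy {
  THIRD_PARTY_COMPILE_ASSERT(Q > 0);       // expands to "static_assert(...);;"
  THIRD_PARTY_COMPILE_ASSERT(N % Q == 0);
  THIRD_PARTY_COMPILE_ASSERT(M % Q == 0);
  static constexpr unsigned kMinLength = N;
  static constexpr unsigned kMaxLength = M;
};

#if defined(__clang__)
#pragma clang diagnostic pop
#endif
```

Compiling this with clang++ -Wextra-semi -Werror should succeed; removing the push/ignored/pop lines should reproduce the "extra ';' inside a class" error once the template is instantiated.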

To Reproduce
Configure the project on macOS and build.

cmake ..
make

Expected behavior
The project should build.

Stable branch

I am looking for a stable branch that compiles inside the Docker image. I wanted to test some cryptographic optimizations. I tried the master branch and the v0.10.x branches.

Steps to Reproduce
General build:

  1. cd into the cloned repo: cd concord-bft
  2. build the container: docker build -t sbft .
  3. Exec a shell in the container (I use visual studio code for this)
  4. Create the build directory and cd into it: mkdir build; cd build
  5. Try building: cmake .. && make -j7

Expected behavior
Everything should compile without errors.

Logs

Incorrect UDP Stop implementation

Describe the bug
When PlainUdpCommunication::Stop is called, sometimes the call doesn't return. The reason is that isRunning is not used correctly and when the socket is closed, the receive thread continues to wait for messages.
Moreover, there is no need to call shutdown() on DGRAM socket. close() is enough.

To Reproduce
Run the attached test a few times; eventually one of the calls does not return.

Expected behavior
Stop() should shut down UDP communication correctly and return immediately.
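A minimal sketch of a Stop() that always returns, assuming POSIX sockets. This is a hypothetical UdpReceiver class, not PlainUdpCommunication itself: the running flag is a std::atomic<bool>, the socket gets a receive timeout (SO_RCVTIMEO) so recvfrom() returns periodically and the loop can see the flag change, and Stop() joins the thread before calling close(). No shutdown() is used, matching the point above that close() is enough for a datagram socket.

```cpp
#include <arpa/inet.h>
#include <atomic>
#include <cstdint>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <thread>
#include <unistd.h>

// Hypothetical minimal receiver illustrating the stop protocol.
class UdpReceiver {
 public:
  bool Start(std::uint16_t port) {
    fd_ = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd_ < 0) return false;
    // A receive timeout makes recvfrom() return periodically, so the loop
    // can observe running_ == false instead of blocking forever.
    timeval tv{};
    tv.tv_usec = 100 * 1000;  // 100 ms
    setsockopt(fd_, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    if (bind(fd_, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0) {
      close(fd_);
      fd_ = -1;
      return false;
    }
    running_ = true;
    receiver_ = std::thread([this] { ReceiveLoop(); });
    return true;
  }

  // Returns within roughly one receive-timeout period.
  void Stop() {
    if (!running_.exchange(false)) return;  // never started, or already stopped
    receiver_.join();                       // loop exits after its next timeout
    close(fd_);                             // close() is enough for a datagram socket
    fd_ = -1;
  }

  ~UdpReceiver() { Stop(); }

 private:
  void ReceiveLoop() {
    char buf[2048];
    while (running_) {
      const ssize_t n = recvfrom(fd_, buf, sizeof(buf), 0, nullptr, nullptr);
      if (n < 0) continue;  // timeout (EAGAIN/EWOULDBLOCK), EINTR or other error: re-check running_
      // A real implementation would hand buf[0..n) to its message handler here.
    }
  }

  int fd_ = -1;
  std::atomic<bool> running_{false};
  std::thread receiver_;
};
```

The design trades a small, bounded stop latency (one timeout period) for not relying on close() to wake a thread blocked in recvfrom(), which POSIX does not guarantee.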

Looks like you have some compile time errors

Describe the bug
Making in '/home/tcullen/builds/concord-bft/release' ...

[ 21%] Built target bls_relic
[ 31%] Built target common
[ 32%] Built target threshsign
[ 92%] Built target corebft
[ 93%] Linking CXX executable server
../../libcorebft.a(Crypto.cpp.o): In function `bftEngine::impl::SecretSharingOperations::recoverBinaryString(unsigned short, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)':
Crypto.cpp:(.text+0x3815): undefined reference to `CryptoPP::StringStore::TransferTo2(CryptoPP::BufferedTransformation&, unsigned long&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)'
Crypto.cpp:(.text+0x39de): undefined reference to `CryptoPP::StringStore::TransferTo2(CryptoPP::BufferedTransformation&, unsigned long&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)'
Crypto.cpp:(.text+0x3a2d): undefined reference to `CryptoPP::StringStore::TransferTo2(CryptoPP::BufferedTransformation&, unsigned long&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)'
../../libcorebft.a(Crypto.cpp.o): In function `CryptoPP::SourceTemplate<CryptoPP::StringStore>::Pump2(unsigned long&, bool)':
Crypto.cpp:(.text._ZN8CryptoPP14SourceTemplateINS_11StringStoreEE5Pump2ERmb[_ZN8CryptoPP14SourceTemplateINS_11StringStoreEE5Pump2ERmb]+0x30): undefined reference to `CryptoPP::StringStore::TransferTo2(CryptoPP::BufferedTransformation&, unsigned long&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)'
../../libcorebft.a(Crypto.cpp.o):(.data.rel.ro._ZTVN8CryptoPP11OutputProxyE[_ZTVN8CryptoPP11OutputProxyE]+0xd0): undefined reference to `CryptoPP::BufferedTransformation::Skip(unsigned long)'
../../libcorebft.a(Crypto.cpp.o):(.data.rel.ro._ZTVN8CryptoPP17SimpleProxyFilterE[_ZTVN8CryptoPP17SimpleProxyFilterE]+0xd0): undefined reference to `CryptoPP::BufferedTransformation::Skip(unsigned long)'
../../libcorebft.a(Crypto.cpp.o):(.data.rel.ro._ZTVN8CryptoPP17SimpleProxyFilterE[_ZTVN8CryptoPP17SimpleProxyFilterE]+0x120): undefined reference to `CryptoPP::Filter::TransferTo2(CryptoPP::BufferedTransformation&, unsigned long&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)'
../../libcorebft.a(Crypto.cpp.o):(.data.rel.ro._ZTVN8CryptoPP17SimpleProxyFilterE[_ZTVN8CryptoPP17SimpleProxyFilterE]+0x128): undefined reference to `CryptoPP::Filter::CopyRangeTo2(CryptoPP::BufferedTransformation&, unsigned long&, unsigned long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool) const'
../../libcorebft.a(Crypto.cpp.o):(.data.rel.ro._ZTVN8CryptoPP14SourceTemplateINS_11StringStoreEEE[_ZTVN8CryptoPP14SourceTemplateINS_11StringStoreEEE]+0xd0): undefined reference to `CryptoPP::BufferedTransformation::Skip(unsigned long)'
../../libcorebft.a(Crypto.cpp.o):(.data.rel.ro._ZTVN8CryptoPP14SourceTemplateINS_11StringStoreEEE[_ZTVN8CryptoPP14SourceTemplateINS_11StringStoreEEE]+0x120): undefined reference to `CryptoPP::Filter::TransferTo2(CryptoPP::BufferedTransformation&, unsigned long&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)'
../../libcorebft.a(Crypto.cpp.o):(.data.rel.ro._ZTVN8CryptoPP14SourceTemplateINS_11StringStoreEEE[_ZTVN8CryptoPP14SourceTemplateINS_11StringStoreEEE]+0x128): undefined reference to `CryptoPP::Filter::CopyRangeTo2(CryptoPP::BufferedTransformation&, unsigned long&, unsigned long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool) const'
../../libcorebft.a(Crypto.cpp.o):(.data.rel.ro._ZTVN8CryptoPP12StringSourceE[_ZTVN8CryptoPP12StringSourceE]+0xd0): undefined reference to `CryptoPP::BufferedTransformation::Skip(unsigned long)'
../../libcorebft.a(Crypto.cpp.o):(.data.rel.ro._ZTVN8CryptoPP12StringSourceE[_ZTVN8CryptoPP12StringSourceE]+0x120): undefined reference to `CryptoPP::Filter::TransferTo2(CryptoPP::BufferedTransformation&, unsigned long&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)'
../../libcorebft.a(Crypto.cpp.o):(.data.rel.ro._ZTVN8CryptoPP12StringSourceE[_ZTVN8CryptoPP12StringSourceE]+0x128): undefined reference to `CryptoPP::Filter::CopyRangeTo2(CryptoPP::BufferedTransformation&, unsigned long&, unsigned long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool) const'
collect2: error: ld returned 1 exit status
bftengine/tests/simpleTest/CMakeFiles/simpleTest_server.dir/build.make:122: recipe for target 'bftengine/tests/simpleTest/server' failed
make[2]: *** [bftengine/tests/simpleTest/server] Error 1
CMakeFiles/Makefile2:371: recipe for target 'bftengine/tests/simpleTest/CMakeFiles/simpleTest_server.dir/all' failed
make[1]: *** [bftengine/tests/simpleTest/CMakeFiles/simpleTest_server.dir/all] Error 2
Makefile:129: recipe for target 'all' failed
make: *** [all] Error 2
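Every unresolved symbol above is a CryptoPP function whose signature mentions std::__cxx11::basic_string. The report does not state a cause. One common explanation, offered here only as an assumption, is a libstdc++ dual-ABI mismatch: libcorebft is compiled with the C++11 std::string ABI while the installed libcryptopp was built with the pre-C++11 ABI (or the reverse), so the __cxx11 symbols the linker looks for do not exist. The hypothetical helpers below report which ABI a given compiler and flag combination uses; compile them with the project's compiler and flags, then compare against the demangled symbols in the installed library (for example, whether nm -C output for TransferTo2 contains __cxx11).

```cpp
#include <string>
#include <typeinfo>

// Value of libstdc++'s _GLIBCXX_USE_CXX11_ABI (1 = C++11 string ABI, 0 = old ABI),
// or -1 when the macro is absent (e.g. libc++ or very old libstdc++).
int StdStringAbi() {
#if defined(_GLIBCXX_USE_CXX11_ABI)
  return _GLIBCXX_USE_CXX11_ABI;
#else
  return -1;
#endif
}

// True when std::string lives in the __cxx11 inline namespace, i.e. when this
// translation unit would reference symbols shaped like the ones in the log.
bool StringIsCxx11Abi() {
  return std::string(typeid(std::string).name()).find("__cxx11") != std::string::npos;
}
```

If the two sides disagree, rebuilding Crypto++ with the same toolchain, or building both with the same -D_GLIBCXX_USE_CXX11_ABI value, would be the usual way to line them up.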

To Reproduce
make.sh

Expected behavior
Compilation should complete without errors or warnings.

[SKVBC TesterReplica] Failure to parse a correct configuration file

Describe the bug
A TesterReplica configuration file that complies with the format is rejected.

To Reproduce
Steps to reproduce the behavior:

  1. Use the same configuration file as the sample one
  2. Run skvbc_replica with the -n option pointing to that configuration file

Expected behavior
The replica parses the configuration and runs successfully.

What happens
The replica exits logging the following:

INFO 2020-05-14 08:20:13.146 (concord)  Command line options:
FATAL 2020-05-14 08:20:13.184 (concord)  failed to parse command line arguments: basic_string::substr: __pos (which is 32) > this->size() (which is 18)
FATAL 2020-05-14 08:20:13.184 (concord)  exception: basic_string::substr: __pos (which is 32) > this->size() (which is 18)

Root cause analysis
Commit 03c42304c81f17040d80151b3261ed6963030367 introduced at least two regressions:

  1. At line 58 of tests/config/config_file_parser.cpp, value = tmp.substr(tmp[1]); causes the above error: instead of taking the substring from index 1 as intended, it takes it from index 32, the ASCII value of the space character that follows the dash.
  2. Multi-value entries are not accepted, because the key is overwritten with the IP address from the last line read.
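Both regressions can be illustrated in a few lines. This is a sketch, not the actual config_file_parser.cpp: it assumes a YAML-like layout where a "key:" line opens a list and each following "- value" line belongs to that key, and all function names and the sample keys in the test are hypothetical. The key is changed only by a new "key:" line, never by a value line.

```cpp
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// The buggy form from the report: tmp[1] is ' ' (ASCII 32), so this is
// substr(32) and throws std::out_of_range for lines shorter than 32 chars.
std::string ValueFromListLineBuggy(const std::string& tmp) { return tmp.substr(tmp[1]); }

// Intended behaviour: skip the leading '-' and any whitespace after it.
std::string ValueFromListLine(const std::string& tmp) {
  const std::size_t start = tmp.find_first_not_of(" \t", 1);
  return start == std::string::npos ? std::string() : tmp.substr(start);
}

// Hypothetical multi-value parser: value lines are appended to the current key.
std::map<std::string, std::vector<std::string>> ParseMultiValues(std::istream& in) {
  std::map<std::string, std::vector<std::string>> result;
  std::string line, currentKey;
  while (std::getline(in, line)) {
    const std::size_t first = line.find_first_not_of(" \t");
    if (first == std::string::npos) continue;  // blank line
    const std::string tmp = line.substr(first);
    if (tmp[0] == '-') {  // list item: belongs to the current key
      if (!currentKey.empty()) result[currentKey].push_back(ValueFromListLine(tmp));
      continue;
    }
    const std::size_t colon = tmp.find(':');
    if (colon == std::string::npos) continue;  // not a "key:" line; ignored in this sketch
    currentKey = tmp.substr(0, colon);
    const std::size_t v = tmp.find_first_not_of(" \t", colon + 1);
    if (v != std::string::npos) result[currentKey].push_back(tmp.substr(v));  // inline single value
  }
  return result;
}
```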

Fix README for bftengine/tests directory

Describe the bug
Since the code was moved out of this directory into the top-level kvbc directory, the README no longer matches the directory's contents.

Expected behavior
Documentation should match the directory structure

DebugStatistics.cpp is missing inttypes.h include

Describe the bug
After #133 was merged, DebugStatistics.cpp fails to compile (at least on my Mac) because it is missing the inttypes.h include.

[ 42%] Building CXX object bftengine/CMakeFiles/corebft.dir/src/bftengine/DebugStatistics.cpp.o
/concord-bft/bftengine/src/bftengine/DebugStatistics.cpp:89:48: error: expected ')'
                        fprintf(stdout, "lastExecutedSeqNumber = %" PRId64 "\t", d.lastExecutedSequenceNumber);
                                                                    ^
/concord-bft/bftengine/src/bftengine/DebugStatistics.cpp:89:11: note: to match this '('
                        fprintf(stdout, "lastExecutedSeqNumber = %" PRId64 "\t", d.lastExecutedSequenceNumber);
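The fix the report points to is including the header that defines the PRId64 format macro (<cinttypes> in C++, or <inttypes.h>). A minimal sketch, assuming lastExecutedSequenceNumber is a signed 64-bit integer, which is what PRId64 expects; FormatLastExecuted is a hypothetical helper, not a function in DebugStatistics.cpp:

```cpp
#include <cinttypes>  // defines PRId64
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Same format string as the failing fprintf call, written into a buffer.
int FormatLastExecuted(char* buf, std::size_t len, std::int64_t lastExecutedSequenceNumber) {
  return std::snprintf(buf, len, "lastExecutedSeqNumber = %" PRId64 "\t",
                       lastExecutedSequenceNumber);
}
```

Older glibc versions only defined the PRI* macros in C++ when __STDC_FORMAT_MACROS was defined before the include, so very old toolchains may need that as well.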

To Reproduce
Steps to reproduce the behavior:
Build concord-bft on master

Expected behavior
The build should succeed.

Random fails on startup

Random startup failures with message:
concord: /concord/submodules/concord-bft/bftengine/src/bftengine/ReplicaImp.cpp:2670: virtual void bftEngine::impl::ReplicaImp::changeStateTransferTimerPeriod(uint32_t): Assertion `false' failed.

It happens because of a race condition between the ReplicaImp::processMessages() and ReplicaImp::start() methods.
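One generic way to close this kind of race is to make the message-processing thread wait until start() has finished initializing the state it touches, instead of asserting. The sketch below is hypothetical: the Replica class and its timer field are stand-ins, not ReplicaImp's actual members, and only the method name changeStateTransferTimerPeriod is taken from the assertion message.

```cpp
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the replica: the timer must exist before its period changes.
class Replica {
 public:
  void start() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      timerPeriodMs_ = 1000;  // stands in for creating the state-transfer timer
      started_ = true;
    }
    cv_.notify_all();  // release any thread waiting in changeStateTransferTimerPeriod()
  }

  // Called from the message-processing thread, possibly before start() completes.
  void changeStateTransferTimerPeriod(std::uint32_t periodMs) {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return started_; });  // block instead of assert(false)
    timerPeriodMs_ = periodMs;
  }

  std::uint32_t timerPeriodMs() {
    std::lock_guard<std::mutex> lock(mu_);
    return timerPeriodMs_;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  bool started_ = false;
  std::uint32_t timerPeriodMs_ = 0;
};
```

A simpler alternative, where the code structure allows it, is to finish all initialization in start() before the message-processing thread is launched, so no waiting is needed at all.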

Build fails due to missing zconf.h

Describe the bug
Build fails with the following error:

[ 42%] Building CXX object bftengine/CMakeFiles/corebft.dir/src/bftengine/BFTEngine.cpp.o
/home/andrewstone/concord-bft/bftengine/src/bftengine/BFTEngine.cpp:20:10: fatal error: 'zconf.h' file not found
#include <zconf.h>
         ^~~~~~~~~

To Reproduce
Steps to reproduce the behavior:
cmake .. && make

Expected behavior
I expect the build to succeed.

This appears to have been introduced in this commit:
84711da

Is there any chance that this project becomes more active?

Is your feature request related to a problem? Please describe.
I'm always frustrated when I see some nice project in a "kind of abandoned" state.

Describe the solution you'd like
More activity, at least bringing the concord project to a real 1.0 version, e.g. with nearly everything cleaned up and documented.

Please note that outside contributors cannot realistically do this work as long as the maintainers are not very active (e.g. delays with PRs). It is something that VMware needs to want.
