I can try to investigate it, @jpraynaud you can assign me on it
> Steps to reproduce
> To reproduce this issue it is enough just to run the devnet and watch logs:

You are right, running the devnet is not enough. To reproduce the problem, you can run the end-to-end test:
cargo build --release -p mithril-aggregator -p mithril-signer -p mithril-client-cli && cargo run -p mithril-end-to-end -- -vvvv --bin-directory target/release/ --work-directory=./artifacts --devnet-scripts-directory=./mithril-test-lab/mithril-devnet
or this one to keep the network alive:
cargo build --release -p mithril-aggregator -p mithril-signer -p mithril-client-cli && cargo run -p mithril-end-to-end -- -vvvv --bin-directory target/release/ --work-directory=./artifacts --devnet-scripts-directory=./mithril-test-lab/mithril-devnet --run-only
The logs will be accessible in the artifacts/devnet/node_pool{N}.log files.
The problem is happening on the signer node, so you should probably investigate what's happening between the Mithril signer and the Cardano node. The signer asks the node for the current epoch at regular intervals (usually every 2 min in production); this is done by the ChainObserver, which takes care of handling the n2c communications.
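A minimal sketch of the polling pattern described above, assuming a hypothetical `EpochObserver` trait as a stand-in for the signer's ChainObserver (the real Mithril trait, its method names and its error type may differ). It shows a signer-like loop that asks for the epoch on every tick, only reacts when the epoch changes, and keeps running when a query fails:

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for the signer's ChainObserver: anything able to
/// report the current epoch of the Cardano node (not the real Mithril trait).
trait EpochObserver {
    fn get_current_epoch(&mut self) -> Result<u64, String>;
}

/// Fake observer returning a scripted sequence of answers, for illustration.
struct ScriptedObserver {
    answers: Vec<Result<u64, String>>,
}

impl EpochObserver for ScriptedObserver {
    fn get_current_epoch(&mut self) -> Result<u64, String> {
        if self.answers.is_empty() {
            Err("no more scripted answers".to_string())
        } else {
            self.answers.remove(0)
        }
    }
}

/// Polls the observer `ticks` times, sleeping `interval` between polls, and
/// returns the epoch changes it detected. Query errors are logged and the
/// loop keeps going, as a long-running signer would.
fn poll_epochs(observer: &mut dyn EpochObserver, interval: Duration, ticks: usize) -> Vec<u64> {
    let mut last_epoch: Option<u64> = None;
    let mut changes = Vec::new();
    for tick in 0..ticks {
        match observer.get_current_epoch() {
            Ok(epoch) if last_epoch != Some(epoch) => {
                changes.push(epoch);
                last_epoch = Some(epoch);
            }
            Ok(_) => {} // same epoch as the previous tick, nothing to do
            Err(e) => eprintln!("tick {tick}: could not read epoch: {e}"),
        }
        if tick + 1 < ticks {
            thread::sleep(interval);
        }
    }
    changes
}

fn main() {
    let mut observer = ScriptedObserver {
        answers: vec![Ok(3), Ok(3), Err("socket closed".to_string()), Ok(4)],
    };
    // Short interval for the demo; the thread mentions 1 to 2 minutes in practice.
    let changes = poll_epochs(&mut observer, Duration::from_millis(10), 4);
    println!("{changes:?}"); // prints [3, 4]
}
```

Because each tick is a separate exchange with the node, any shutdown problem in that exchange would repeat at the polling pace.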
@falcucci is this something that could be related to the way pallas is communicating with the Cardano node?
(The Mithril signer retrieves the epoch from the node at a pace of once every minute.)
@scarmuega FYI, this is the issue we talked about during Office Hours
from mithril.
Could be caused by a non-graceful shutdown of the local state query. We should double-check whether the mithril-signer is sending a MsgRelease and a MsgDone after the query.
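To make this hypothesis concrete, here is a simplified model of the client side of the Ouroboros local state query mini-protocol. Payloads (the point to acquire, the query itself, the result) are omitted, and this is not the pallas API. A session ends gracefully only when the client sends MsgRelease to leave the acquired state and then MsgDone from the idle state; a client that drops the socket right after receiving its result never reaches the done state, which is the kind of abrupt close suspected here:

```rust
/// Client-side states of a simplified local state query session.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State { Idle, Acquiring, Acquired, Querying, Done }

/// Protocol messages, without their payloads (illustrative model only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Msg { Acquire, Acquired, Failure, Query, Result, ReAcquire, Release, Done }

/// Applies one message to the current state, rejecting illegal transitions.
fn step(state: State, msg: Msg) -> Result<State, String> {
    match (state, msg) {
        (State::Idle, Msg::Acquire) => Ok(State::Acquiring),
        (State::Acquiring, Msg::Acquired) => Ok(State::Acquired),
        (State::Acquiring, Msg::Failure) => Ok(State::Idle),
        (State::Acquired, Msg::Query) => Ok(State::Querying),
        (State::Querying, Msg::Result) => Ok(State::Acquired),
        (State::Acquired, Msg::ReAcquire) => Ok(State::Acquiring),
        (State::Acquired, Msg::Release) => Ok(State::Idle),
        (State::Idle, Msg::Done) => Ok(State::Done),
        (s, m) => Err(format!("{m:?} is not allowed in state {s:?}")),
    }
}

/// Replays a session and reports whether it ended gracefully, i.e. the
/// client released the acquired point and sent MsgDone before closing.
fn ends_gracefully(session: &[Msg]) -> Result<bool, String> {
    let mut state = State::Idle;
    for &msg in session {
        state = step(state, msg)?;
    }
    Ok(state == State::Done)
}

fn main() {
    let graceful = [Msg::Acquire, Msg::Acquired, Msg::Query, Msg::Result, Msg::Release, Msg::Done];
    // Same exchange, but the client closes the socket right after the result.
    let abrupt = [Msg::Acquire, Msg::Acquired, Msg::Query, Msg::Result];
    println!("graceful: {:?}", ends_gracefully(&graceful)); // graceful: Ok(true)
    println!("abrupt: {:?}", ends_gracefully(&abrupt)); // abrupt: Ok(false)
}
```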
Also seen on the devnet:
[jp:cardano.node.LocalErrorPolicy:Error:64] [2024-04-16 13:22:14.42 UTC] IP LocalAddress "node-pool2/ipc/node.sock@11042" ErrorPolicyUnhandledApplicationException (MuxError MuxBearerClosed "<socket: 34> closed when reading data, waiting on next header True")
[jp:cardano.node.LocalErrorPolicy:Error:64] [2024-04-16 13:22:14.42 UTC] IP LocalAddress "node-pool2/ipc/node.sock@11043" ErrorPolicyUnhandledApplicationException (MuxError MuxBearerClosed "<socket: 35> closed when reading data, waiting on next header True")
[jp:cardano.node.LocalErrorPolicy:Error:64] [2024-04-16 13:22:14.46 UTC] IP LocalAddress "node-pool2/ipc/node.sock@11044" ErrorPolicyUnhandledApplicationException (MuxError MuxBearerClosed "<socket: 36> closed when reading data, waiting on next header True")
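When watching the node logs, a small filter can help spot these entries and the local addresses involved. This sketch assumes the exact line layout shown above (tracer tag, bracketed timestamp, `LocalAddress "..."`, then the MuxError); other cardano-node versions or tracing configurations may format lines differently:

```rust
/// One MuxBearerClosed occurrence extracted from a cardano-node log.
#[derive(Debug, PartialEq)]
struct BearerClosed<'a> {
    timestamp: &'a str,
    local_address: &'a str,
}

/// Returns the text between `start` and the next `end` after it, if any.
fn between<'a>(line: &'a str, start: &str, end: &str) -> Option<&'a str> {
    let from = line.find(start)? + start.len();
    let len = line[from..].find(end)?;
    Some(&line[from..from + len])
}

/// Keeps only LocalErrorPolicy lines reporting MuxBearerClosed.
fn bearer_closed_events(log: &str) -> Vec<BearerClosed<'_>> {
    log.lines()
        .filter(|line| line.contains("LocalErrorPolicy") && line.contains("MuxBearerClosed"))
        .filter_map(|line| {
            Some(BearerClosed {
                // The timestamp is the second bracketed group on the line.
                timestamp: between(line, "] [", "]")?,
                local_address: between(line, "LocalAddress \"", "\"")?,
            })
        })
        .collect()
}

fn main() {
    // First line copied from the devnet logs above; the second is a made-up
    // unrelated line that the filter must skip.
    let log = r#"[jp:cardano.node.LocalErrorPolicy:Error:64] [2024-04-16 13:22:14.42 UTC] IP LocalAddress "node-pool2/ipc/node.sock@11042" ErrorPolicyUnhandledApplicationException (MuxError MuxBearerClosed "<socket: 34> closed when reading data, waiting on next header True")
unrelated line without the error"#;
    for event in bearer_closed_events(log) {
        // prints: 2024-04-16 13:22:14.42 UTC node-pool2/ipc/node.sock@11042
        println!("{} {}", event.timestamp, event.local_address);
    }
}
```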
Steps to reproduce
To reproduce this issue, it is enough to run the devnet and watch the logs:
# Terminal 1
cd mithril-test-lab/mithril-devnet
SKIP_CARDANO_BIN_DOWNLOAD=true \
FORCE_DELETE_ARTIFACTS_DIR=true \
NUM_BFT_NODES=1 NUM_POOL_NODES=2 ./devnet-run.sh
# Terminal 2
cd mithril-test-lab/mithril-devnet/artifacts && tail -n 62 -f ./node-pool1/node.log
# Terminal 3
docker logs -f artifacts-mithril-signer-node-pool1-1
After some time, you will see an error in Terminal 2:
[andrew-d:cardano.node.LocalErrorPolicy:Error:62] [2024-04-18 13:51:30.14 UTC] IP LocalAddress "node-pool1/ipc/node.sock@3" ErrorPolicyUnhandledApplicationException (MuxError MuxBearerClosed "<socket: 28> closed when reading data, waiting on next header True")
and in Terminal 3: could not retrieve epoch settings at epoch Epoch(1) with nested error: Epoch service was not initialized, the function inform_epoch must be called first.
Investigating the causes
The signer node called the /epoch-settings HTTP endpoint of the aggregator, and the aggregator responded with the error "Epoch service was not initialized, the function inform_epoch must be called first", which is defined as:
/// Raised when service has not collected data at least once.
#[error("Epoch service was not initialized, the function `inform_epoch` must be called first")]
NotYetInitialized,
So, evidently, the epoch_service wasn't initialized by a prior call to inform_epoch.
There is a self.runner.inform_new_epoch(new_time_point.epoch).await? call in mithril-aggregator/src/runtime/state_machine.rs, and this seems to be the only place where inform_epoch is called.
The aggregator's logs contain two error messages, very close to each other in time, in this sequence:
- 2024-04-18T15:00:44.925938221Z Epoch service could not obtain current protocol parameters for epoch 4
- 2024-04-18T15:00:45.073509748Z Epoch service was not initialized, the function inform_epoch must be called first.
The stack backtrace of the first one is:
2: mithril_aggregator::services::epoch_service::MithrilEpochService::get_protocol_parameters::{{closure}}
/mithril/mithril-aggregator/src/services/epoch_service.rs:136:26
3: <mithril_aggregator::services::epoch_service::MithrilEpochService as mithril_aggregator::services::epoch_service::EpochService>::inform_epoch::{{closure}}
/mithril/mithril-aggregator/src/services/epoch_service.rs:193:14
4: <core::pin::Pin<P> as core::future::future::Future>::poll
at /rustc/aedd173a2c086e558c2b66d3743b344f977621a7/library/core/src/future/future.rs:124:9
5: <mithril_aggregator::runtime::runner::AggregatorRunner as mithril_aggregator::runtime::runner::AggregatorRunnerTrait>::inform_new_epoch::{{closure}}
/mithril/mithril-aggregator/src/runtime/runner.rs:460:14
6: <core::pin::Pin<P> as core::future::future::Future>::poll
at /rustc/aedd173a2c086e558c2b66d3743b344f977621a7/library/core/src/future/future.rs:124:9
7: mithril_aggregator::runtime::state_machine::AggregatorRuntime::try_transition_from_idle_to_ready::{{closure}}
/mithril/mithril-aggregator/src/runtime/state_machine.rs:287:64
8: mithril_aggregator::runtime::state_machine::AggregatorRuntime::cycle::{{closure}}
/mithril/mithril-aggregator/src/runtime/state_machine.rs:177:18
9: mithril_aggregator::runtime::state_machine::AggregatorRuntime::run::{{closure}}
/mithril/mithril-aggregator/src/runtime/state_machine.rs:110:42
This means that when the aggregator runs try_transition_from_idle_to_ready, it calls inform_epoch with epoch 4, but get_protocol_parameters fails.
I'm continuing to explore...
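The failure sequence traced in this comment can be sketched with a deliberately simplified model. `EpochService`, `EpochServiceError::ProtocolParameters`, `parameters_store` and `RuntimeState` are hypothetical names, the runner layer is folded into the transition function, and the real MithrilEpochService may store and compute its data differently. The point illustrated is that, with the `?` operator, a failing get_protocol_parameters aborts inform_epoch before any state is recorded, so the runtime stays idle and a later /epoch-settings read hits NotYetInitialized, matching the two log entries in that order:

```rust
use std::collections::HashMap;
use std::fmt;

/// Errors of a hypothetical, simplified epoch service.
#[derive(Debug, PartialEq)]
enum EpochServiceError {
    NotYetInitialized,
    ProtocolParameters(u64),
}

impl fmt::Display for EpochServiceError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::NotYetInitialized => write!(
                f,
                "Epoch service was not initialized, the function `inform_epoch` must be called first"
            ),
            Self::ProtocolParameters(epoch) => write!(
                f,
                "Epoch service could not obtain current protocol parameters for epoch {epoch}"
            ),
        }
    }
}

struct EpochService {
    parameters_store: HashMap<u64, u64>, // epoch -> hypothetical "k" parameter
    current: Option<(u64, u64)>,         // (epoch, k), set only on success
}

impl EpochService {
    fn get_protocol_parameters(&self, epoch: u64) -> Result<u64, EpochServiceError> {
        self.parameters_store
            .get(&epoch)
            .copied()
            .ok_or(EpochServiceError::ProtocolParameters(epoch))
    }

    /// Fetches first, then commits: on error, `current` is left untouched.
    fn inform_epoch(&mut self, epoch: u64) -> Result<(), EpochServiceError> {
        let k = self.get_protocol_parameters(epoch)?;
        self.current = Some((epoch, k));
        Ok(())
    }

    /// What the /epoch-settings route would read in this model.
    fn epoch_settings(&self) -> Result<(u64, u64), EpochServiceError> {
        self.current.ok_or(EpochServiceError::NotYetInitialized)
    }
}

#[derive(Debug, PartialEq)]
enum RuntimeState {
    Idle,
    Ready(u64),
}

/// Idle -> Ready: the `?` propagates the error, so the runtime stays Idle.
fn try_transition_from_idle_to_ready(
    service: &mut EpochService,
    epoch: u64,
) -> Result<RuntimeState, EpochServiceError> {
    service.inform_epoch(epoch)?;
    Ok(RuntimeState::Ready(epoch))
}

fn main() {
    // The simulated store has no parameters for epoch 4.
    let mut service = EpochService { parameters_store: HashMap::from([(3, 5)]), current: None };
    let mut state = RuntimeState::Idle;
    match try_transition_from_idle_to_ready(&mut service, 4) {
        Ok(next) => state = next,
        Err(e) => eprintln!("{e}"), // first error of the sequence
    }
    // Meanwhile, a signer calls /epoch-settings.
    if let Err(e) = service.epoch_settings() {
        eprintln!("{e}"); // second error of the sequence
    }
    println!("{state:?}"); // prints Idle
}
```

Running it prints the two errors in the same order as the aggregator logs, followed by `Idle`.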
@jpraynaud, thank you for pointing me to the e2e tests and in the right direction. I left a comment in #1644. I'm not 100% sure that this is the correct solution, but it seems to work and the error message no longer appears.