espressosystems / espresso-polygon-zkevm-demo Goto Github PK

License: GNU Affero General Public License v3.0

Makefile 0.01% Rust 99.40% Nix 0.15% Just 0.07% Dockerfile 0.04% Shell 0.34%

espresso-polygon-zkevm-demo's Introduction

Espresso Sequencer - Polygon zkEVM - Integration Demo

This repo contains a demo where two rollups based on the Polygon zkEVM stack use the Espresso Sequencer and Data Availability (DA) instead of the Polygon zkEVM Sequencer and Ethereum L1 as DA.

The repo consists mainly of rust code, docker services and end-to-end tests to tie together the following code bases:

The Espresso Sequencer.
A fork of Polygon's zkevm-node used as submodule at ./zkevm-node.
A fork of Polygon's zkevm-contracts used as submodule at ./zkevm-contracts.

The diagram below shows the architecture. Note that only one of the zkEVM nodes is depicted for simplicity. The diagram is intended to give a simple conceptual overview, there may be slight discrepancies in how individual components are depicted and implemented.

Usage

To get the latest images: just pull
To start the demo: just demo.
To stop the demo: just down

Metamask

If not yet set up, install Metamask and set up a new wallet.
In metamask click on the three dots in the top right corner, then "Expand view".
On the newly opened page click on the three dots in the top right corner, then "Networks" -> "Add a network" -> "Add a network manually".

Use the following parameters:

Network name: espresso-polygon-zkevm-1
New RPC URL: http://localhost:18126
Chain ID: 1001

For interacting with the second rollup add a network with these parameters instead:

Network name: espresso-polygon-zkevm-2
New RPC URL: http://localhost:28126
Chain ID: 1002

For "Currency symbol" anything can be set and "Block explorer URL" should be left blank.

Faucet

To request funds from the local faucet run

curl -X POST http://localhost:18111/faucet/request/0x0000000000000000000000000000000000000000

replacing the zero address with the desired receiver address. Use http://localhost:28111 to talk to the faucet of the second node instead.

To copy your Metamask address click on the address at the top of the Metamask panel.

Preconfirmations

If you tried the demo as instructed above, using the RPC nodes at ports 18126 and 28126, you may notice some latency (perhaps around 30s) between confirming a transaction in Metamask and having the transaction complete. In large part, this latency is because of the path the transaction has to follow between being submitted and being finalized. Refer to the architecture diagram and you'll see that, after a transaction has been sequenced, the HotShot Commitment Service must pick it up and send it to the L1 blockchain, it must be included in the L1, which can be slow, the L2 node must pick it up from the HotShot Contract, and only then can the L2 node fetch your transaction from the sequencer query service, execute it, and report the results to Metamask.

This long round trip through the L1 slows down the transaction a lot, especially on a real L1 like Ethereum, with a block time of 12 seconds and a finality time of 12 minutes. (For the demo, we use a block time of 1 second, and the latency is still noticeable!) One of the advantages of using a decentralized consensus protocol as the sequencer is that it can provide preconfirmations: ahead-of-time assurances that your transaction will be included in the rollup, before the L1 has even seen that transaction. Since HotShot is a Byzantine fault tolerant consensus protocol, these preconfirmations come with a strong guarantee: any transaction which is preconfirmed by HotShot will eventually make it to the L1 (in the same order relative to other transactions), unless at least 1/3 of the total value staked in consensus is corrupt and subjected to slashing. In the right conditions, this guarantee can be as strong as the safety guarantees of the L1 itself, especially when restaking is used to let the L1 validators themselves operate the consensus protocol.

Of course, having your transaction included in the sequence quickly isn't all that helpful if it still takes a long time to get executed. Our demo includes a second L2 node for each L2 network, which is configured to fetch new blocks directly from the sequencer's preconfirmations and execute them immediately, bypassing the slow path through the L1 completely. You can experience the results for yourself: go back into your Metmask settings, to "Networks" -> "espresso-polygon-zkevm-1", and set "New RPC URL" to http://localhost:18127. Similarly, set the RPC URL for "espresso-polygon-zkevm-2" to http://localhost:28127. This tells Metamask to use the L2 nodes that use preconfirmations, instead of the L2 nodes that use the L1, when submitting and tracking transactions.

Once you've updated your settings, you can go back to your Metamask account and try making another transfer. It should complete noticeably faster, in around 5 seconds.

Changing the L1 Block Time

For convenience, this demo uses a local L1 blockchain with a block time of 1 second. This is good for experimentation, but it tends to understate the benefits of preconfirmations. In reality, the L1 will be a blockchain like Ethereum with a substantially longer block time. Ethereum produces a new block only once every 12 seconds, and blocks do not become final for 12 minutes (until that point they can be re-orged out of the chain).

If you really want to feel the UX difference between having preconfirmations and not having them, you can rebuild the local L1 Docker image to use a different block time, and then rerun the above experiment, using Metamask with ports 18126 and 18127 to try sending transactions without and with preconfirmations, respectively.

To rebuild the L1 Docker image and change the block time, use the following command:

ESPRESSO_ZKEVM_L1_BLOCK_PERIOD=12 just build-docker-l1-geth

You can replace 12 with whatever block time you want (in seconds).

When you're done and just want things to go fast again, use just build-docker-l1-geth to revert to the default (1 second), or just pull to sync all your Docker images with the official, default versions.

Hardware Requirements

The demo requires an Intel or AMD CPU. It's currently not possible to run this demo on ARM architecture CPUs, including Macs with M1 or M2 CPUs. See this issue for more information. You can however run the example rollup demo of the Espresso Sequencer.

Because the demo runs 4 zkEVM nodes (and thus 4 zk provers) it has fairly substantial resource requirements. We recommend at least the following:

6 GB RAM
6 CPUs
50 GB storage

On Mac, where Docker containers run in a virtualized Linux environment, you may have to take manual steps to allow Docker to access these resources, even if your hardware already has them. Open Docker Desktop and in the left sidebar click on Resources. There you can configure the amount of CPUs, RAM, and storage allotted to the Linux VM.

Lightweight Modular Demo

Since the full demo has such intense resource requirements, we have designed the demo to be modular, so you can get a much lighter version of it by starting only some of the services. To run a lightweight version of the demo, use just demo-profiles <modules>, replacing <modules> with the list of modules you want to start. The available modules are:

zkevm1: start a regular node and prover for the first L2
zkevm1-preconfirmations: start a node for the first L2 that uses fast preconfirmations
zkevm2: start a regular node and prover for the second L2
zkevm2-preconfirmations: start a node for the second L2 that uses fast preconfirmations

For example, to experiment with and without preconfirmations without the overhead of a second L2, you could run just demo-profiles zkevm1 zkevm1-preconfirmations. If you want to try out multiple simultaneous L2s but don't want the overhead of the secondary preconfirmations nodes, you could use just demo-profiles zkevm1 zkevm2.

Development

Obtain code: git clone --recursive [email protected]:EspressoSystems/espresso-polygon-zkevm-demo.
Make sure nix is installed.
Activate the environment with nix-shell, or nix develop, or direnv allow if using direnv.
Install NPM dependencies with pnpm i
Run just to see the available just recipes.

To know more about the environment check out the following files

.env: Environment variables
docker-compose.yaml: Polygon zkEVM services
standalone-docker-compose.yaml: Espresso Sequencer services

Another good place to start is the end-to-end test in polygon-zkevm-adaptor/tests/end_to_end.rs.

Test

To run the tests, run

just pull # to pull docker images
cargo test --all-features

Some of the tests use Docker. In particular, reorg tests automate Docker via the Docker Unix socket. If you are running Docker Desktop for Mac, you need to configure it to create a symlink for this socket, which you can enable with Settings -> Advanced -> Allow the default Docker socket to be used.

Figures

To build the figures, run

make doc

Contracts

Ensure submodules are checkout out: git submodule update --init --recursive
Install dependencies just npm i
Compile the contracts just hardhat compile
Update the rust bindings: just update-contract-bindings
Update the zkevm-node contract bindings to match zkevm-contracts: just update-zkevm-node-contract-bindings

Misc

Building docker images locally

Build the docker images locally: just build-docker.
Revert to the CI docker images: just pull.

Authenticate with GitHub container registry

This is only required to download "private" docker images from the GitHub container registry.

Go to your github profile
Developer Settings > Personal access tokens > Personal access tokens (classic)
Generate a new token
- for the scope options of the token, tick the repo box.
Run docker login ghcr.io --username <you_github_id> --password <your_personal_access_token>

Handling git submodules

The project requires to use git submodules. In order to avoid corrupting the state of one of those submodules you can:

run git submodule update before making changes,
or configure git to automatically update submodules for the repository with git config submodule.recurse true inside the repository.

Disclaimer

DISCLAIMER: This software is provided "as is" and its security has not been externally audited. Use at your own risk.

espresso-polygon-zkevm-demo's People

Contributors

Stargazers

Watchers

Forkers

iprotoni ggkkdev bawa74090 jorya1 mountainpeak1 hmd7ai sartimo dshivaay-23 bersezk skyrith kongtoubaofu dabao8888 hknnr proof0s tudorpintea999

espresso-polygon-zkevm-demo's Issues

Preconfirmations node does not handle catchup correctly

If the preconfirmations node is reset or started late, it starts in a de facto catchup mode, where the L2 batches being synced via preconfirmations are well behind HotShot, and more importantly, well behind the batches verified on L1 by the regular node. When this is happening, we get many errors due to batches referenced by verified batches not being synced yet: https://app.datadoghq.com/logs?query=host%3A%22%2Ftestnet%2Fsepolia%2F%22%20service%3Acloudwatch%20%40aws.awslogs.logGroup%3A%22%2Ftestnet%2Fsepolia%2F%22%20%40aws.awslogs.logStream%3A%22zap%2Fpreconf_zkevm%2F173a1cca6d2f431c870910a29006abe0%22%20%22violates%22%20&cols=status%2C%40logger.name%2C%40error.message&context_event=AYr7TL2QAABRlCMVdimp5ABE&event=AgAAAYr-p_EEQ-LqxAAAAAAAAAAYAAAAAEFZci1wX2JEQUFETWVxT0lVekVUU0FBagAAACQAAAAAMDE4YWZlYTgtMjIxMC00ZmJjLWE5NjMtMzE1MzExYThiYzM1&index=%2A&messageDisplay=inline&refresh_mode=sliding&saved-view-id=2130045&storage=hot&stream_sort=time%2Cdesc&viz=&from_ts=1696403216408&to_ts=1696489616408&live=true.

These errors seem to cause problems with the catchup process itself. I believe a fix would be to not start syncing L1 blocks until we have synced L2 batches up to the HotShot head.

RUSTSEC-2022-0093

This has been patched. We should look into upgrading ed25519-dalek, or wait until our dependencies have done so, if necessary.

Regular node does not handle two NewBlocks events in the same L1 block

The commitment task included two updates in the same L1 block:
https://sepolia.etherscan.io/tx/0x1f7c64a31f523d46ad6a61a6b6ae849b27cff2de66ab3e38629fd4d420371ea9
https://sepolia.etherscan.io/tx/0x982914455cdbea0b5e89f5292275fae0d34759ab140247aad61ca5dc4e82a5df

The zkevm node did not like this, it seems to have skipped the second update, which made it think that the first update from the following L1 block contained L2 transactions from the future:
https://app.datadoghq.com/logs?query=%40host%3A%22%2Ftestnet%2Fsepolia%2F%22%20%40aws.awslogs.logStream%3A%22zap%2Fzkevm%2F011f0b577cf64142858924f600a04a35%22%20&cols=status%2C%40logger.name%2C%40error.message&event=AgAAAYsklQ5hsHUM0QAAAAAAAAAYAAAAAEFZc2tsUnRVQUFCa3BRWmRZZzM2WXdBRQAAACQAAAAAMDE4YjI0OWMtMmMxYy00NGM1LWFiZjUtMjY4ZTZkZjc4N2U4&index=%2A&messageDisplay=inline&refresh_mode=paused&saved-view-id=2130045&stream_sort=desc&viz=stream&from_ts=1697125769385&to_ts=1697125835090&live=false

FEATURE: More options for disabling faucet discord integration

The discord integration is disabled if empty-string (or the env value is not set) but it would be nice to have a non-empty-string value that also disables the integration

polygon DAC for Data- availavility

Is data availability layer is configurable as we are using tiramisu as data availability for L1 ,can we use polgon DAC or any other DA layer such as celestia ?

Replace references to hermez with polygon-zkevm

Fix host key verification failed

This github action can't fetch a private repo because the host key verification failed: https://github.com/EspressoSystems/espresso-polygon-zkevm-demo/actions/runs/4881609881/jobs/8710719713#step:8:106

RUSTSEC-2023-0052 and RUSTSEC-2023-0053

Regular node does not handle reorgs correctly

After a seemingly endless stream of docker errors, I finally got the slow tests working correctly, and encountered a real bug.

The L1 chain diverges at block 46
2023-12-15T22:25:26.7817200Z [22:25:26] Forking reorgme_geth_child_0_0 -> Found block 0 - 46:0x7ab on reorgme_geth_child_0_0
The L2 node gets a NewBlocks event with HotShot block 1, from L1 block 47
46043 2023-12-15T22:25:52.6723460Z zkevm-1-permissionless-node-1 | 2023-12-15T22:25:52.667Z DEBUG etherman/etherman.go:542 NewBlocks event detected &{FirstBlockNumber:+1 NumBlocks:+19 Raw:{Address:0x5FbDB2315678afecb367f032d93F642f64 46043 180aa3 Topics:[0x8203a21e4f95f72e5081d5e0929b1a8c52141e123f9a14e1e74b0260fa5f52f1] Data:[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 19] BlockNumber:47 TxHash:0x303 46043 340020645eef50334fe699ceb5ac3e2b0bb1a95b0a71cc553462da2f04772 TxIndex:0 BlockHash:0x498a8dd8e7f4858fcedea97de26f2275f56979ce05ef43bcd631393add1ef8d5 Index:0 Removed:false}} {"pid": 1, "version": ""}

The L2 node creates a corresponding batch with L1 block number 44, using the L1 head from the HotShot block, rather than the block containing the event

044 2023-12-15T22:25:52.6776479Z zkevm-1-permissionless-node-1        | 2023-12-15T22:25:52.677Z    INFO    etherman/etherman.go:678        Creating batch number 1 {"pid": 1, "version": ""}
46045 2023-12-15T22:25:52.6779111Z zkevm-1-permissionless-node-1        | 2023-12-15T22:25:52.677Z    INFO    etherman/etherman.go:721        Fetched L1 block 44, hotshot block: 1, timestamp 1702679109, transactions 0x    {"pid": 1, "version": ""}

The L2 node goes on to process 143 batches through 59 L1 blocks
2023-12-15T22:27:58.8469685Z zkevm-1-permissionless-node-1 | 2023-12-15T22:27:58.844Z INFO etherman/etherman.go:721 Fetched L1 block 59, hotshot block: 143, timestamp 1702679270, transactions 0x {"pid": 1, "version": ""}
The L1 node rejoins the network, and reorgs from block 69 to 46
2023-12-15T22:28:11.5711755Z [22:28:11] Joining reorgme_geth_child_0_0 -> Compare with other nodes 0 - 69:0x4a8 vs 69:0x4a8

The L2 node detects and handles the rerog

65683 2023-12-15T22:28:12.5938078Z zkevm-1-permissionless-node-1        | 2023-12-15T22:28:12.576Z    DEBUG   synchronizer/synchronizer.go:481        Reverting synchronization to block: 45  {"pid": 1, "version": ""}
65684 2023-12-15T22:28:12.5940739Z zkevm-1-permissionless-node-1        | 2023-12-15T22:28:12.581Z    INFO    ethtxmanager/ethtxmanager.go:215        processing reorg from block: 46 {"pid": 1, "version": ""}

The L2 node gets a NewBlocks event starting from HotShot block 1, since all later HotShot blocks had been reorged out of the L1 state

66011 2023-12-15T22:28:15.7771755Z zkevm-1-permissionless-node-1        | 2023-12-15T22:28:15.776Z    DEBUG   etherman/etherman.go:542        NewBlocks event detected &{FirstBlockNumber:+1 NumBlocks:+19 Raw:{Address:0x5FbDB2315678afecb367f032d93F642f64  66011 180aa3 Topics:[0x8203a21e4f95f72e5081d5e0929b1a8c52141e123f9a14e1e74b0260fa5f52f1] Data:[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 19] BlockNumber:80 TxHash:0x303  66011 340020645eef50334fe699ceb5ac3e2b0bb1a95b0a71cc553462da2f04772 TxIndex:1 BlockHash:0xc21531d17eadbf97cb4e52e42ebf36d3d74677d4b02de3dd5da87d189507986f Index:2 Removed:false}}    {"pid": 1, "version": ""}

Despite the L2 node's state having been reset, it thinks that the latest batch processed is 17, and refuses to process the new batches
2023-12-15T22:28:15.7823865Z zkevm-1-permissionless-node-1 | 2023-12-15T22:28:15.781Z ERROR etherman/etherman.go:691 received old batch 1, prev batch is &{17 45 1702679126} {"pid": 1, "version": ""}

This happens because we store the L2 batches with an L1 block number older than the block where we received them, since we use the L1 origin from HotShot. Therefore, when we deleted all blocks older than 46 to handle the reorg, we did not delete batches with L1 origin older than this, even though some of those batches were affected by the reorg -- like batch 1, which has L1 origin 44, but was received in L1 block 47.

Using the L1 origin from HotShot is correct, since we need to be consistent with the preconfirmations node, which doesn't even know about the block where we received the event from L1. However, for the non-preconfirmations node, the block where we received the event does matter -- not for computing the state, but for processing reorgs.

To fix this, we can make batches in the database reference the block they were received in, so that deleting that block will also delete the corresponding batch. But this reference can be a new column, like ReceivedAt, rather than the existing BlockNumber column, so that it does not effect execution -- it's only use is to cause a deletion of that block to cascade into a deletion of the batch.

Add a second L2 RPC server with fast preconfirmations

The L2 node that we currently deploy in the demo listens to events from the sequencer contract to determine when a new block has been sequenced. This is necessary for the prover, since it cannot submit a proof before the L1 has seen that block. But it means that confirmations from the RPC server come at L1 latency, which is not very good for UX.

We should add a second L2 node that only runs an RPC server, not a prover, and listens to new blocks directly from a sequencer node. This server will be able to confirm transactions at consensus speeds, much faster than L1 speeds. This will make for a smoother UX, improve the rate that the faucet can service requests, and make for a compelling illustration of the benefits of preconfirmations, by directly comparing the UX of the two different RPCs.

Modify L2 node to take a configuration parameter to switch between the contract event stream and the sequencer event stream
Modify Etherman to read new blocks directly from the sequencer when configured to do so
Modify permissionless demo to deploy two instances of the L2 node, one fast and one slow
Modify faucet configuration to point at fast L2 node
Modify README to point users at the fast RPC by default, and also tell them about the slow RPC and explain the difference

Modify adaptor query service to get L1 block info from sequencer blocks

EspressoSystems/espresso-sequencer#536 will have the block proposer add the L1 block number and timestamp to each sequencer block. The adaptor query service should use this instead of its own matching algorithm. This will make the adaptor stateless again, and hence not a single point of failure.

This should not be very much work and mostly consist of deleting code, but it's not strictly necessary for Cortado, hence stretch.

Preconfirmations node does not handle L1 reorgs correctly

On 2023-10-04, there was a reorg on Sepolia. The regular zkevm node continued functioning, but the preconfirmations node got into a bad state, where there was a batch missing from the database, causing it to fail when it synced trusted state, and retry syncing from the same block repeatedly, ultimately not making progress.

It appears that pgstatestorage.Reset deletes all L1 blocks after the reorgs as well as all L2 batches that reference those L1 blocks, implicitly, via the foreign key constraint. For the regular node, this is fine, since syncing of L2 batches is synchronized with syncing of L1 blocks: it will start reprocessing L1 blocks from the point of the reorg and rediscover the L2 batches that were deleted. But the preconfirmations node syncs L2 batches in a completely separate goroutine. While this routine does start from the last synced batch each time around the loop (which should get reset after a reorg), it syncs multiple batches at once. So we could have this sequence:

Preconfirmations task starts syncing from batch 10
Discovers the current HotShot block height is 15, enters a loop where it will sync batches 10-15 *without re-checking the latest sync batch)
Syncs batch 10
A reorg happens that deletes batch 10
Syncs batch 11-15
Repeats the outer loop, finds the latest synced batch is 15 -- which is in the database! But what we don't realize is that batch 10 is missing

So we never go back and re-sync batch 10. Probably the simplest way to fix this is just to have the preconfirmations task check the latest synced batch each batch it processes, to see if there has been a reorg that deleted the last batch. This shouldn't be too expensive since processing a batch accesses the DB anyways. Maybe another approach would be to have the main synchronizer task send the preconfirmations task a message over a channel when there is a reorg, and have the preconfirmations task do some kind of select loop.