ansible-collection-general's Introduction

Ansible Collection - ethpandaops.general

A collection of reusable ansible components used by the EthPandaOps team.

Roles

Ethereum tooling

Ethereum client pair

Ethereum execution clients

Ethereum consensus clients

Ethereum L2 clients

General purpose tooling

Prometheus exporters

Hetzner

Usage

Currently we're not publishing the collection to Ansible Galaxy. We'll do that once it grows bigger.

To install the collection directly from our git repository you can do the following:

ansible-galaxy collection install git+https://github.com/ethpandaops/ansible-collection-general.git,master

Or using a requirements.yml file that looks like:

collections:
  - name: ethpandaops.general
    source: https://github.com/ethpandaops/ansible-collection-general.git,master
    type: git

Then run the following command:

ansible-galaxy install -r requirements.yml

Local testing and development

Clone the repository. Make sure you keep the ansible_collections/ethpandaops/general directory structure, otherwise ansible-test won't work:

git clone git@github.com:ethpandaops/ansible-collection-general.git ansible_collections/ethpandaops/general

If you want to test and develop on this ansible collection you'll need some tools. We're using asdf to pin specific versions of those tools. Some additional Python-specific tools are defined in requirements.txt.

Make sure you have asdf installed and then you can run the ./setup.sh script which will install all required tools.

For linting and sanity checks you can run the following commands:

ansible-lint
ansible-test sanity

Some roles have molecule tests. You can tell by checking whether the role contains a molecule directory. To run molecule on a given role you can do the following:

cd roles/blockscout
molecule test

If you want to test the ethereum_node role with molecule, you can pass it the specific execution and consensus clients via ENV vars:

cd roles/ethereum_node
EXECUTION_CLIENT=geth CONSENSUS_CLIENT=lighthouse molecule test
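
To cover several client pairs in one go, the command above can be generated in a loop. The following is a minimal sketch, not part of the collection: the function name molecule_matrix and the plural variables EXECUTION_CLIENTS and CONSENSUS_CLIENTS are hypothetical, and the defaults only list the two clients named in this README. It relies on POSIX sh/bash word splitting and only prints the commands.

```shell
#!/bin/sh
# Hypothetical helper: print one molecule invocation per client pair.
# EXECUTION_CLIENTS / CONSENSUS_CLIENTS are assumed names, not variables
# read by the collection itself.
EXECUTION_CLIENTS="${EXECUTION_CLIENTS:-geth}"
CONSENSUS_CLIENTS="${CONSENSUS_CLIENTS:-lighthouse}"

molecule_matrix() {
  # $1: space-separated execution clients
  # $2: space-separated consensus clients
  for el in $1; do
    for cl in $2; do
      echo "EXECUTION_CLIENT=$el CONSENSUS_CLIENT=$cl molecule test"
    done
  done
}

molecule_matrix "$EXECUTION_CLIENTS" "$CONSENSUS_CLIENTS"
```

Run from roles/ethereum_node, the printed lines can be reviewed first and then piped to sh to execute them one after another.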

License

MIT License

ansible-collection-general's Issues

Auto add mev-boost related commands if the respective variable is set to true

For example, if ethereum_node_mev_boost_enabled: true, then lighthouse_container_command_mev_args can be added to the command automatically. This needs to be done with the ternary filter in each of the respective client roles.

e.g:

      {{
        ethereum_node_mev_boost_enabled | ternary(
          lighthouse_container_command_args + lighthouse_container_command_extra_args + lighthouse_container_command_mev_args,
          lighthouse_container_command_args + lighthouse_container_command_extra_args
        )
      }}

Fix fail2ban issue

On initial setup, fail2ban blocks the deployer's IP. We need to find a way around it.

Add mock-relay/builder to collection

We usually want to test mev-boost codepaths in testnets, which requires a relay/builder infrastructure setup. This is mostly handled by external parties, but they join relatively late in the testing pipeline. To catch bugs earlier, we need to spin up a mock-relay that relies on a local node to act as a builder. The mock-relay can also modify the bid to make sure the relay-provided block is used, and it can additionally submit invalid blocks to cause some chaos on the network.

The mock-relay is a piece of software used in hive that we are reusing for our testing. The codebase can be found here: https://github.com/ethereum/hive/tree/master/simulators/eth2/common/builder/mock

The Dockerfile for the mock-relay can be found here: https://github.com/marioevz/hive/blob/builder-as-external-command/simulators/eth2/common/DockerFile.mock_builder

A few dependencies need to be resolved before this issue is addressed:

  • Add the ability to build the mock_builder as a standalone tool; it currently builds as a package on hive main, and we used a fork in the past
  • Add dockerfile for mock_builder into hive itself
  • Consider adding some automation for building the tool and pushing to ethpandaops
  • Decide sane defaults for the mock_builder

Add web3 signer role

The role should run something like this:

docker run -d -it --name web3signer \
-v /data/ethereum-network-config:/network-config:ro \
-v /data/teku-validator:/validator-data:rw \
consensys/web3signer:develop \
--http-listen-host "0.0.0.0" \
--http-listen-port 9000 \
--http-host-allowlist "*" \
--metrics-enabled=true \
--metrics-host "0.0.0.0" \
--metrics-port 9001 \
--metrics-host-allowlist "*" \
eth2 \
--network /network-config/config.yaml \
--Xtrusted-setup /network-config/trusted_setup.txt \
--keystores-path /validator-data/keys \
--keystores-passwords-path /validator-data/secrets \
--slashing-protection-enabled=false

forky genesis time is incorrect

Because this is how we get the genesis time:

ethereum_genesis_timestamp: "{{ lookup('ansible.builtin.pipe', '{{ ethereum_genesis_timestamp_relative_cmd[ansible_system] }}') }}"
ethereum_genesis_timedelay: 60

Forky's genesis time calculation will always be different.

genesis_time: {{ ethereum_genesis_timestamp | int + ethereum_genesis_timedelay | int }}

https://github.com/ethpandaops/ansible-collection-general/blob/master/roles/generate_kubernetes_config/templates/forky.yaml.j2#L43

This needs to be changed.
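
One reading of this issue is that the pipe lookup is re-run every time the variable is templated, so each consumer ends up with a slightly different timestamp. The sketch below illustrates the "compute once, reuse everywhere" idea in plain shell. It assumes the relative timestamp command behaves like date +%s, which the source does not confirm, and the names genesis_time, GENESIS_TIMESTAMP and GENESIS_DELAY are hypothetical.

```shell
#!/bin/sh
# genesis_time TIMESTAMP DELAY -> prints TIMESTAMP + DELAY (integer seconds)
genesis_time() {
  echo $(( $1 + $2 ))
}

# Capture the timestamp a single time. `date +%s` stands in for the real
# ethereum_genesis_timestamp_relative_cmd, which is an assumption here.
GENESIS_TIMESTAMP="$(date +%s)"
GENESIS_DELAY=60

# Every consumer that reuses the captured value gets the same genesis time.
genesis_time "$GENESIS_TIMESTAMP" "$GENESIS_DELAY"
```

In Ansible terms the equivalent would be to persist the value once and have both the genesis generation and the forky template read that stored value, rather than templating the lookup again.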

First time running playbook.yaml results error on ethereum_genesis

On macOS:

TASK [ethpandaops.general.ethereum_genesis : Inform of failure] ***************************************************************************************************************************************************************************************************************************************************
fatal: [localhost -> 127.0.0.1]: FAILED! => 
  msg: |-
    The conditional check '(ethereum_genesis_generator_cmd.rc != 0) or (ethereum_genesis_generator_cmd.stdout == "")' failed. The error was: error while evaluating conditional ((ethereum_genesis_generator_cmd.rc != 0) or (ethereum_genesis_generator_cmd.stdout == "")): 'dict object' has no attribute 'rc'. 'dict object' has no attribute 'rc'
  
    The error appears to be in '/Users/bbusa/Documents/Ethereum.nosync/testnets/template-devnets/ansible/vendor/collections/ansible_collections/ethpandaops/general/roles/ethereum_genesis/tasks/generate_genesis.yaml': line 90, column 7, but may
    be elsewhere in the file depending on the exact syntax problem.
  
    The offending line appears to be:
  
      always:
        - name: Inform of failure
          ^ here

bug: ethereum_node_cl_validator_enabled=false gets ignored first time

When I run the ethereum_node playbook for the first time it seems to ignore all extra variables that are passed.

When I run the playbook a second time, the value is taken into consideration.
Example:

ethereum_node_cl_validator_enabled=false is set in the inventory file.

First time around (incorrect behaviour):

TASK [validator_keys : Create dest dir] *******************************************************************************************************************************************************************************************************************************************************************
changed: [lodestar-geth-1] => (item={'src': '/Users/bbusa/Documents/Ethereum.nosync/testnets/4844-testnet/ansible/inventories/devnet-5/files/validator_keys/lodestar-geth-1/keys/', 'dest': '/data/lodestar-validator/keys/'})
changed: [lodestar-geth-1] => (item={'src': '/Users/bbusa/Documents/Ethereum.nosync/testnets/4844-testnet/ansible/inventories/devnet-5/files/validator_keys/lodestar-geth-1/secrets/', 'dest': '/data/lodestar-validator/secrets/'})

TASK [validator_keys : Copy keys] *************************************************************************************************************************************************************************************************************************************************************************
failed:... Tries to copy the keys.

Second time around (correct behaviour):

TASK [validator_keys : Create dest dir] *******************************************************************************************************************************************************************************************************************************************************************
skipping: [lodestar-geth-1] => (item={'src': '/Users/bbusa/Documents/Ethereum.nosync/testnets/4844-testnet/ansible/inventories/devnet-5/files/validator_keys/lodestar-geth-1/keys/', 'dest': '/data/lodestar-validator/keys/'}) 
skipping: [lodestar-geth-1] => (item={'src': '/Users/bbusa/Documents/Ethereum.nosync/testnets/4844-testnet/ansible/inventories/devnet-5/files/validator_keys/lodestar-geth-1/secrets/', 'dest': '/data/lodestar-validator/secrets/'}) 
skipping: [lodestar-geth-1]

TASK [validator_keys : Copy keys] *************************************************************************************************************************************************************************************************************************************************************************
skipping: [lodestar-geth-1] => (item={'src': '/Users/bbusa/Documents/Ethereum.nosync/testnets/4844-testnet/ansible/inventories/devnet-5/files/validator_keys/lodestar-geth-1/keys/', 'dest': '/data/lodestar-validator/keys/'}) 
skipping: [lodestar-geth-1] => (item={'src': '/Users/bbusa/Documents/Ethereum.nosync/testnets/4844-testnet/ansible/inventories/devnet-5/files/validator_keys/lodestar-geth-1/secrets/', 'dest': '/data/lodestar-validator/secrets/'}) 
skipping: [lodestar-geth-1]

Make every RPC overridable in the zsh script

Use an override variable such as $PROVIDED_RPC and, if it isn't set, fall back to the basic RPC. This should allow us to use the scripts more widely. The syntax is ${parameter:-default}, afaik.
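
A minimal sketch of that fallback, written so it works in sh, bash and zsh. DEFAULT_RPC and the function name rpc_url are hypothetical, and the URL is a placeholder rather than an endpoint used by the scripts.

```shell
#!/bin/sh
# Placeholder default; the real scripts would keep their existing RPC here.
DEFAULT_RPC="http://localhost:8545"

rpc_url() {
  # ${parameter:-default} expands to the default when the parameter is
  # unset OR empty, so an empty PROVIDED_RPC also falls back.
  echo "${PROVIDED_RPC:-$DEFAULT_RPC}"
}
```

Callers would then use "$(rpc_url)" wherever the RPC was previously hard-coded, and a user can override it per invocation by setting PROVIDED_RPC in the environment.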

Fix docker log rotation bug

We need to use the common log options instead of the logrotate role because, from Docker 23.x.x onwards, the Docker CLI expects to be the sole owner of the Docker log files. If other processes touch a log file, then options that truncate the output, such as --tail=, fail.

Tracked here: docker/cli#4008
older moby issue: moby/moby#36853

We need to decide a proper path forward. Options:

  • Use common log options and just rotate logs on docker containers, i.e:
      common_log_options:
        max-file: "10"
        max-size: 500m
        mode: non-blocking
        max-buffer-size: 4m
      common_log_driver: json-file
  • Pin older version of docker

We additionally need to see if it makes sense to keep the logrotate role in light of this. At the very least, we need to add a disclaimer so people don't inadvertently run into this issue.
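
For reference, the common_log_options listed in this issue correspond to the standard --log-driver and --log-opt flags of docker run. The sketch below only builds that flag string; the function name docker_log_flags is hypothetical and the image name in the comment is a placeholder.

```shell
#!/bin/sh
# Print the docker run logging flags matching the common_log_options and
# common_log_driver values quoted in this issue.
docker_log_flags() {
  echo "--log-driver json-file" \
       "--log-opt max-file=10" \
       "--log-opt max-size=500m" \
       "--log-opt mode=non-blocking" \
       "--log-opt max-buffer-size=4m"
}

# Example usage (placeholder image):
#   docker run -d $(docker_log_flags) some/image:tag
docker_log_flags
```

With these options Docker rotates the json-file logs itself, so no external process has to touch the log files.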

`ethereum_genesis` linux permission issue

The ethereum_genesis role does not work when running under linux.

There is a permission issue with docker volumes.
It looks like there is some kind of owner mapping, when running the docker commands under linux.

For example, when running this command:

docker run --rm -t -u $UID
        -v {{ ethereum_genesis_generator_tmp_output_dir_register.path }}:/data
        -v {{ ethereum_genesis_generator_tmp_config_dir_register.path }}:/config
        {{ ethereum_genesis_generator_container_image }} all

When looking at the folder permissions from outside the container (host system), the /data & /config folders are owned by my user.
But when looking into the container, the /data & /config folders are owned by root.
Therefore running the container with -u $UID fails with permission errors.

The role properly runs when removing all the -u $UID flags from docker run commands.
There are 3 docker commands that need this fix (1x generate_genesis.yaml, 2x generate_validator_keys.yaml)

ERROR! A worker was found in a dead state

TASK [ethpandaops.general.docker_nginx_proxy : Add nginx config template] *************************************************************************************************************************************************************************************
ERROR! A worker was found in a dead state

ethereumjs docker stop timeout issue

Sometimes, due to the inefficiencies of ethereumjs, it cannot stop within 15 seconds (the current HTTP timeout).
Error message:

UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=15)

A workaround is described in this issue: docker/compose#3927

TLDR we should set:

export DOCKER_CLIENT_TIMEOUT=120
export COMPOSE_HTTP_TIMEOUT=120

k3s ansible role creates unwanted deployments in kube-system

The k3s ansible role creates local-path-provisioner and metrics-server deployments in kube-system namespace during installation.

These tools are deployed by argocd, so they are sort of redundant.
The metrics-server keeps crashing due to invalid permissions:

W0713 07:57:56.436072       1 reflector.go:324] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: failed to list *v1.ConfigMap: configmaps "extension-apiserver-authentication" is forbidden: User "system:serviceaccount:kube-system:metrics-server" cannot list resource "configmaps" in API group "" in the namespace "kube-system"
E0713 07:57:56.436112       1 reflector.go:138] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps "extension-apiserver-authentication" is forbidden: User "system:serviceaccount:kube-system:metrics-server" cannot list resource "configmaps" in API group "" in the namespace "kube-system"

The local-path provisioner in the cloud namespace that gets created by argocd fails, probably due to the other local-path-provisioner created by the k3s role.

Failed sync attempt to a6e54246622863096006e150e06edfa9435db3d9: one or more objects failed to apply, reason: StorageClass.storage.k8s.io "local-path" is invalid: provisioner: Forbidden: updates to provisioner are forbidden. (retried 5 times).

https://argocd.core.ethpandaops.io/applications/k3s-berlin-local-path-provisioner?view=tree&resource=&conditions=false&node=storage.k8s.io%2FStorageClass%2F%2Flocal-path%2F0

It would be good to have a flag that controls which tools are deployed into kube-system during installation.
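
k3s itself can skip its packaged components through the server's --disable flag, where local-storage is the packaged local-path-provisioner and metrics-server is the metrics server. The sketch below builds those arguments from a list. The function name k3s_disable_args is hypothetical, and how the k3s role would pass the result to the installer is not covered by the source.

```shell
#!/bin/sh
# k3s_disable_args COMPONENT... -> prints one "--disable NAME" per component
k3s_disable_args() {
  args=""
  for component in "$@"; do
    # Add a separating space only when args already has content.
    args="${args:+$args }--disable $component"
  done
  echo "$args"
}

k3s_disable_args local-storage metrics-server
```

A role variable holding the list of components to disable could feed a helper like this, leaving argocd as the only source of these deployments.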

Shared ssh key to create machines

When a shared SSH key is used to create new infrastructure, the ansible script only works up to the point where root login is disabled.

A workaround would be to create a new user, e.g. ethpandaops, add the shared SSH key as its public key, and add the ethpandaops GitHub user as a member whose keys will always be present on all our machines.
