
poanetwork / deployment-playbooks


Ansible playbooks for deploying POA Network nodes on EC2 or any Linux (Ubuntu 16.04) hosting. Includes master of ceremony, validator, bootnode, explorer, and netstat roles.

License: MIT License

Shell 86.61% Makefile 13.39%
ansible aws

deployment-playbooks's People

Contributors

arseniipetrovich, igorbarinov, natlg, phahulin, varasev, vitalyznachenok, vladimirnovgorodov, ykisialiou


deployment-playbooks's Issues

Upgrade SOKOL to parity 1.9.2+

Things to prepare:

  • playbook:
    • stop parity
    • backup current binary
    • backup current parity_data
    • backup current node.toml
    • download new binary
    • if bootnode - replace threads with processing_threads in node.toml
    • parity db kill ? resync ?
    • restart parity
    • restart netstats if it exists

Things to do:

Single system user for all the node types

Currently each node type uses its own user to run processes: the bootnode runs under the bootnode user and the MoC under the moc user. Each user has its own keyfile. There is also the superuser root.

Logging in as root is considered bad security practice. The deployment playbooks should run as an ordinary user with sudo privileges; no root account configuration should be necessary.

I see no value in having different users on different hosts. The proposal is to have the same user run all the processes.

The playbooks put the operator's public key on the remote host so the operator can SSH in as the poa user with their private key.
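
A minimal sketch of the corresponding tasks, assuming a single poa user and an operator_pubkeys list variable (both names are hypothetical):

- name: Create the single poa system user
  user:
    name: poa
    shell: /bin/bash
    groups: sudo
    append: yes

- name: Authorize operator SSH keys for the poa user
  authorized_key:
    user: poa
    key: "{{ item }}"
  with_items: "{{ operator_pubkeys }}"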

Refactor AWS-related code

To support multiple cloud providers, the deployment playbooks should clearly separate configuration management code from cloud resource provisioning.

To achieve this I propose to:

  1. Move AWS-related variables from the group_vars/all to a separate file
  2. Move AWS firewall configuration from -access roles to the -aws-access roles
  3. Move AWS provisioning code to an /aws subfolder

(Feature) Playbook for archiving

The role should allow having archiving for

  • validator
  • bootnode

At the moment, an archiving node on kovan/mainnet runs with the parameters

RUST_BACKTRACE=1 parity --pruning=archive --fat-db on --cache-size-db 12000 --min-peers 5 --max-peers 10 --snapshot-peers 500 --pruning-history 1200 --no-ui --no-dapps --auto-update=all --no-discovery --allow-ips=public --no-periodic-snapshot

we should use parameters

Potential Issue: sokol branch group_vars/all.example - *_BRANCH tokens point to master

Is this OK?

jhl@johnny-lenovo:~/poa-network/deployment-playbooks/group_vars$ git branch
* sokol
jhl@johnny-lenovo:~/poa-network/deployment-playbooks/group_vars$ grep -n BRANCH * | grep -v sokol
all.example:34:TEMPLATES_BRANCH: "dev-mainnet"
all.example:35:GENESIS_BRANCH: "master"
all.example:42:SCRIPTS_MOC_BRANCH: "master"
all.example:43:SCRIPTS_VALIDATOR_BRANCH: "master"
jhl@johnny-lenovo:~/poa-network/deployment-playbooks/group_vars$


Single point of configuration

We now have two files: all.example and all.network. The point of this issue is to refactor the all.example file so that it explicitly specifies all the variables necessary for deployment. It would become a kind of instruction file for those who want to create their own network.

cosmetic issue in variable naming

Some templates declare variables without spaces, e.g. {{ansible_distribution_release}}. We should find all such occurrences and change them to the {{ some_var_name }} style. This doesn't actually affect functionality, but the code will look better.

Create defaults for each role

For the sake of simplicity it would be a good idea to explicitly define which variables each role uses. To do that I propose creating a defaults/main.yml file in each role, containing all the variables that role needs.
When a role is launched as a dependency from another role, these default variables are overridden by the variables in the group_vars/ folder, so it is safe to define them.
This also lets us launch dependent roles separately.
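
A minimal sketch of such a file, assuming a poa-parity role and reusing variable names that already appear in group_vars (the exact per-role variable set still needs to be worked out):

# roles/poa-parity/defaults/main.yml
---
# Lowest-precedence defaults; values from group_vars/ override these
# when the role runs as a dependency of a node-level role.
PROXY_PORT: "8545"
username: "poa"
home: "/home/{{ username }}"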

Problem: README doesn't provide all the steps

Solution:
The README.md file should list the steps needed to run the ec2.yml playbook. For example:
It should tell the user to set up AWS keys first in the group_vars folder:
https://github.com/oraclesorg/deployment-playbooks/blob/master/group_vars/all#L15-L16

Question#1:
What is https://github.com/oraclesorg/deployment-playbooks/blob/master/group_vars/all#L24?
Where do I find it? Does it show up when I create an EC2 instance?

Question#2:
Does the script assume that
https://github.com/oraclesorg/deployment-playbooks/blob/master/group_vars/all#L18
(the ssh keypairname) is already set in the AWS keys? If not, is there an instruction on how to do so?

Question#3:
Is there any difference between files.pub and ssh_bootnode.pub? Where is each one used?
https://github.com/oraclesorg/deployment-playbooks/tree/master/files

Question#4:
When you run ec2.yml, how do you verify that everything has completed as expected? Please provide verification steps.

Question#5:
When you run sites.yml, how do you verify that everything has completed as expected? Please provide verification steps.

Unnecessary Environment=MYVAR=myval declaration in templates

Some templates have an unnecessary Environment=MYVAR=myval declaration, which should be cleaned up. The list of templates is below:

  • roles/netstat/templates/poa-dashboard.j2
  • roles/poa-parity/templates/poa-dashboard.j2
  • roles/poa-parity/templates/poa-pm2.j2
  • roles/poa-pm2/templates/poa-pm2.j2
  • roles/explorer/templates/poa-chain-explorer.j2
  • roles/poa-parity/templates/poa-chain-explorer.j2
  • roles/poa-netstats/templates/poa-netstats.j2
  • roles/poa-parity/templates/poa-netstats.j2

What is the purpose of keeping separate branches for sokol and core?

As far as I can see, the difference is in the group variables files and a single template file.

One could create two example files for the different network types, then parameterize the template so it is the same for both networks and move the configuration into the variables file. That way only one branch is needed, which is easier to maintain and to integrate with Terraform provisioning.

Is there any reason for maintaining two additional branches that I am not aware of?

Default variables on the role level

Some variables may be moved to the role level and set up with reasonable defaults.

Variables that control access to the node:

allow_validator_ssh: true
allow_validator_p2p: true

Play variables:

nginx_headers: "on"
PROXY_PORT: "8545"
username: "bootnode"
users:
  - name: "bootnode"
home: "/home/bootnode"

Proposed structure is:

roles
|-bootnode
  |-defaults
    |-main.yml

(Feature) Add new settings to validator role

Problem:

  • some validators don't need ssh access, we need to disable ssh/22 for them
  • some validators don't need p2p port tcp/30303 open to the internet

Solution:
add a variable allow_ssh: true/false
if it's true, ssh/22 is enabled in the security group; the default value is true

add a variable allow_p2p: true/false
if it's true, tcp/30303 is enabled in the security group; the default value is true
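
A minimal sketch of how the system-firewall side could honour these variables, assuming ufw is used as in the existing -access roles and that both variables default to true; the EC2 security-group rules would need an equivalent condition:

- name: Allow SSH when enabled for this validator
  ufw:
    rule: allow
    port: "22"
    proto: tcp
  when: allow_ssh | bool

- name: Allow p2p when enabled for this validator
  ufw:
    rule: allow
    port: "30303"
    proto: tcp
  when: allow_p2p | bool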

Clone specific version of the codebase from the GitHub

Currently the git task downloads the latest version of the code committed to the GitHub repo.

git:
  repo: "https://github.com/{{ MAIN_REPO_FETCH }}/chain-explorer"
  dest: "{{ home }}/chain-explorer"

The proposal is to pin the version to a specific commit like this:

git:
  repo: "https://github.com/{{ MAIN_REPO_FETCH }}/chain-explorer"
  dest: "{{ home }}/chain-explorer"
  version: "acee07c"

The main reason is reproducible deployments. When I deploy from the same configuration I expect to get the same result. If I rely on the latest version of the code in the repository, then my deployment in a month will be different from my deployment today.

This scenario is hard to troubleshoot. Pinning the version also allows a controlled dependency upgrade that is visible in the commit history.
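
If hard-coding the commit in the task feels too rigid, the pin could be exposed as a variable with a default, so operators can override it; a sketch (the EXPLORER_VERSION name is hypothetical):

git:
  repo: "https://github.com/{{ MAIN_REPO_FETCH }}/chain-explorer"
  dest: "{{ home }}/chain-explorer"
  version: "{{ EXPLORER_VERSION | default('acee07c') }}"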

(Fix) Rename mining and owner roles

Problem:

  • we don't have mining, we have validation. No miners, but validators
  • we don't have owner, but we have master_of_ceremony

Solution:

  • rename mining to validator
  • rename owner to master_of_ceremony

Problem: ec2.yml failed

Issue:

TASK [Install python] ************************************************************************************************
The authenticity of host '13.56.182.154 (13.56.182.154)' can't be established.
ECDSA key fingerprint is SHA256:8f91Tzugvp0uA+5mRrqxpVkIWMhH753u5kiU3K9Hvfc.
Are you sure you want to continue connecting (yes/no)? fatal: [13.56.182.154]: FAILED! => {"failed": true, "msg": "Timeout (12s) waiting for privilege escalation prompt: "}

The task waits for an answer to the SSH prompt, and Ansible never responds yes to it.
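
One way to avoid the interactive prompt for freshly created instances is to disable SSH host key checking in ansible.cfg (alternatively, known_hosts can be pre-populated with ssh-keyscan):

[defaults]
host_key_checking = False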

Support for CentOS targets

The deployment playbooks currently support a single base OS image: Ubuntu 16.04. This issue requests support for CentOS 7.x as a base image for POA nodes.

To add CentOS 7.x support:

  • Add CentOS base image to the local development environment
  • Add CentOS base image to the Terraform scripts
  • Change the package deployment method from apt to yum based on the ansible_os_family variable
  • Fix issues with different package names on CentOS and Ubuntu
  • Fix issues with different configuration for the services on CentOS and Ubuntu
  • Verify the changes by deploying nodes to CentOS and Ubuntu VMs

An example of maintaining different sets of tasks for different platforms:

# file: tasks.yml

- import_tasks: tasks_ubuntu.yml
  when: ansible_os_family == "Debian"

- import_tasks: tasks_centos.yml
  when: ansible_os_family == "RedHat"
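
For the package-name differences, a common pattern is to load an OS-specific variables file and keep the task list generic; a sketch (the vars file names and the preconf_packages variable are assumptions):

# roles/preconf/tasks/main.yml
- name: Load OS-specific package names
  include_vars: "{{ ansible_os_family }}.yml"   # roles/preconf/vars/Debian.yml or RedHat.yml

- name: Install base packages
  package:
    name: "{{ item }}"
    state: present
  with_items: "{{ preconf_packages }}"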

(Feature) add json-threads parameter to node.toml for bootnode

Problem:
there is only one thread for bootnode which is not enough
Solution:
increase to 4 using parameter json-threads=4

--jsonrpc-threads THREADS        Turn on additional processing threads in all RPC servers.
                                   Setting this to non-zero value allows parallel cpu-heavy queries
                                   execution. (default: 0)
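
In node.toml this presumably maps to the processing_threads key mentioned in the Sokol upgrade checklist above; a sketch (section and key name should be verified against the Parity version in use):

[rpc]
processing_threads = 4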

Refactor package list in preconf role

At the moment the list of packages that this role installs looks like this:

    - bc
    - haveged
    - rsync
    - iotop
    - dstat
    - sysstat
    - htop
    - lbzip2
    - pigz
    - unzip
    - zip
    - mtr
    - tcpdump
    - openssh-client
    - sudo
    - mc
    - net-tools
    - screen
    - git
    - cloud-utils
    - build-essential
    - nload

They fall into different categories, but it is impossible to tell the exact purpose of each package.

Sysadmin tools. mtr and tcpdump are typical examples. An operator uses these packages to troubleshoot issues on the node. If these packages are dropped, the node is still operational.

POA component dependencies. If these packages are dropped, some POA components will stop working. When a POA component stops depending on such a package, the package should be removed from the system.

Ansible module dependencies. Ansible modules depend on these packages to do their tasks. If the module is not needed, then the package should not be installed.

The proposal:

  1. Move Ansible module dependencies to the role level where they are used.
  2. Move POA component dependencies to the roles that install those components.
  3. Refactor the sysadmin tools into a separate role that the user may choose to install (see the sketch after this list).
  4. Install no packages in the preconf role.
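
A minimal sketch of the optional sysadmin-tools role from item 3, using the troubleshooting packages from the list above (the role name is illustrative):

# roles/sysadmin-tools/tasks/main.yml
- name: Install optional troubleshooting tools
  package:
    name: "{{ item }}"
    state: present
  with_items:
    - mtr
    - tcpdump
    - htop
    - iotop
    - dstat
    - sysstat
    - nload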

Dependent roles

Current playbooks look like this

roles:
    - usermanager
    - nodejs
    - poa-logrotate
    - poa-parity
    - poa-pm2
    - poa-netstats
    - validator
    - validator-access

What they should look like is this

roles:
  - validator

Only a single role that defines the node type.

If one wants to co-locate two roles on a single node, it is possible and clear:

roles:
  - validator
  - explorer

To achieve this, use Ansible role dependencies.
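
A minimal sketch of how the validator role could declare its dependencies in meta/main.yml, reusing the role list from the current playbook above:

# roles/validator/meta/main.yml
---
dependencies:
  - { role: usermanager }
  - { role: nodejs }
  - { role: poa-logrotate }
  - { role: poa-parity }
  - { role: poa-pm2 }
  - { role: poa-netstats }
  - { role: validator-access }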

Google as sole DNS provider

Noticed that Google, and only Google (8.8.8.8 and 8.8.4.4), is being used for DNS in /group_vars/all. Although I do this myself, perhaps add another public DNS provider, just in case Google is unreachable? (Granted, if Google is unreachable, we should probably be ducking and covering.)


Problem: invalid AMIID

Steps:
When I run ansible-playbook with image: "ami-9d04e4e5" I get:

TASK [Launch instance] ***********************************************************************************************
fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "msg": "Instance creation failed => InvalidAMIID.NotFound: The image id '[ami-9d04e4e5]' does not exist"}

Solution:
Please explain to the user how to create or look up a valid AMI.
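
AMI IDs are region-specific, which is a common cause of this error. A sketch of looking up a current Ubuntu 16.04 AMI dynamically instead of hard-coding the ID (assumes the existing region variable and Ansible 2.5+ for ec2_ami_facts; "099720109477" is Canonical's owner ID):

- name: Find Ubuntu 16.04 AMIs in the target region
  ec2_ami_facts:
    owners: "099720109477"
    region: "{{ region }}"
    filters:
      name: "ubuntu/images/hvm-ssd/ubuntu-xenial-16.04-amd64-server-*"
  register: ubuntu_amis

- name: Use the most recent image
  set_fact:
    image: "{{ (ubuntu_amis.images | sort(attribute='creation_date') | last).image_id }}"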

Specify Vagrant compatibility mode

Vagrant complains about implicit compatibility mode selection.

Vagrant has automatically selected the compatibility mode '2.0'
according to the Ansible version installed (2.4.3.0).

Alternatively, the compatibility mode can be specified in your Vagrantfile:
https://www.vagrantup.com/docs/provisioning/ansible_common.html#compatibility_mode

To make it explicit add the following line to the Vagrantfile:

node.vm.provision :ansible do |ansible|
  ...
  ansible.compatibility_mode = "2.0"
  ...

Not all prerequisites completed when installing OpenSSL

While installing the pyOpenSSL module there is a chance that not all the necessary packages are already installed.
To avoid this I propose the following code changes for the nginx role:

Current:

- name: Ensure python OpenSSL dependencies are installed.
  pip:
    name: pyOpenSSL
    state: present

Fix:

- name: Ensure OpenSSL dependencies are installed (Ubuntu)
  package:
    name: "{{ item }}"    # must be quoted, otherwise YAML parses it as a mapping
    state: present
  with_items:
    - build-essential
    - libssl-dev
    - libffi-dev
    - python-dev
  when: ansible_os_family == "Debian"

- name: Ensure OpenSSL dependencies are installed (CentOS)
  package:
    name: "{{ item }}"
    state: present
  with_items:
    - gcc
    - libffi-devel
    - openssl-devel
    - python-devel
  when: ansible_os_family == "RedHat"

- name: Install OpenSSL module
  pip:
    name: pyOpenSSL
    state: present

Update CORE to 1.9.2

Update instructions: https://github.com/poanetwork/poa-devops/blob/master/docs/Update-parity-version.md


  • Migration playbook (https://github.com/poanetwork/poa-devops/blob/master/roles/upd-parity-version/tasks/main.yml):

    • add cors section for bootnodes
  • Update nodes (I round - less critical nodes):

    • Bootnode-Archive
    • Bootnode-Traces
    • Explorer
  • Update defaults in deployment playbooks repository (#86):

  • Update nodes (II round - public rpc nodes and MoC):

    • Bootnode A
    • Bootnode B
    • Bootnode C
    • Bootnode D
    • Bootnode MoC OVH
    • Master of Ceremony
  • Update nodes (III round - part of validators)

    • Jeff Flowers
    • Jim O'Regan
    • John H LeGassic
    • Roman Storm
  • Notify BINANCE?

  • Update nodes (IV round - other bootnodes and validators):

    • Bootnode Atlanta A
    • Bootnode Azure Brazil A
    • Bootnode Chicago A
    • Bootnode Chicago B
    • Bootnode Chicago C
    • Bootnode Dallas A
    • Bootnode Frankfurt
    • Bootnode Santa Clara
    • Bootnode_Paris
    • Bootnode_YTI
    • Alex Emelyanov
    • John D. Storey
    • Lillian Chan
    • Melanie Marsollier
    • Michael Milirud
    • Rocco Federico Mancini
    • Sherina Hsuan Yang
    • Stephen Arsenault
    • Sviataslau Vishneuski
  • Update nodes (V round - backup)

    • Bootnode-HB

Split playbook based on node types

Currently all the node-type plays are defined in site.yml. This issue suggests splitting the site.yml file into a set of files like validator.yml. These files are then imported into the main file with import_playbook statements. This structure is modular and can be easily integrated with Terraform.

For example, to provision only the validator node, one runs:

ansible-playbook validator.yml

The current validator.yml moves to the aws subfolder, as it contains AWS-related code.
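
The top-level file then becomes just a list of imports; a sketch (the per-type file names are illustrative):

# site.yml
- import_playbook: bootnode.yml
- import_playbook: validator.yml
- import_playbook: moc.yml
- import_playbook: explorer.yml
- import_playbook: netstat.yml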

Installation path prefix

Prefix every path a role writes to with the installation_path variable and make sure the role does not install anything outside of this path.

This step simplifies collocating different roles on the same VM and the playbooks' dependency on the single home variable.

Refactor `-access` roles

Currently the -access roles contain code for configuring both the EC2 firewall and the system firewall. Moving to new cloud providers like Azure requires splitting node configuration from provider configuration.

  1. Move the ufw.yml file into the role itself, e.g. from moc-access to moc
  2. Import this task from the main.yml file of that role
  3. Rename the -access role to -aws-access
  4. Apply the -aws-access role in the EC2-related playbooks instead of as a dependency of the moc role

These suggestions apply to all five node types.

Please consider supporting interactive auth for playbook deployments

If the SSH keys are password protected, ansible-playbook fails with

TASK [hf-spec-change : Shutdown poa-netstats service] *****************************************************************************
fatal: [52.191.165.235]: FAILED! => {"changed": false, "msg": "Unable to stop service poa-netstats: Failed to stop poa-netstats.service: Interactive authentication required.\nSee system logs and 'systemctl status poa-netstats.service' for details.\n"}
to retry, use: --limit @/home/mm/poa-devops/site.retry

The workaround is to use SSH keys that are not password protected, but that is a security vulnerability if the control system is compromised. I suggest looking into allowing interactive auth during deployment.

Best, MM

(Feature) Move node.toml files to playbook

In the current configuration, node.toml files are downloaded from the https://github.com/poanetwork/deployment-azure repository. Additional parameters are then added via text replacement or appending.

It would be better to store them directly in the playbook as templates: everything would be kept in one place, and it is much cleaner - with text replacement it is not clear what the resulting file looks like.
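
A minimal sketch of the template-based approach, assuming a node.toml.j2 template stored in the role and the existing home and username variables:

- name: Render node.toml from the template kept in the playbook
  template:
    src: node.toml.j2
    dest: "{{ home }}/node.toml"
    owner: "{{ username }}"
    group: "{{ username }}"
    mode: 0644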

Update SOKOL to parity 1.10

Update instructions: https://github.com/poanetwork/poa-devops/blob/master/docs/Update-parity-version.md


Things to prepare

Things to do

  • Update nodes (round I):

    • Bootnode
    • Bootnode-Traces
    • Bootnode-CentOS
    • Bootnode-Test-NSG
    • Bootnode-Test
    • Bootnode-Orchestrator-1
    • Bootnode-Orchestrator-2
    • Bootnode-Orchestrator-3
    • Master of Ceremony
    • Master of Ceremony (CentOS) (skips update)
  • Ping Andrew Cravenho to check poa explorer

  • Update nodes (round II):

    • Jeff Flowers
    • Jeff Flowers (Val)
    • Jim O'Regan SOKOL Validator Chicago B
    • John H. LeGassic
    • Roman Storm
  • Update nodes (round III):

    • Adam Kagy Sokol Validator
    • Alex Emelyanov | Sokol
    • Bootnode Oxana K
    • Bootnode Sokol Las Vegas A
    • Henry Vishnevsky (skips update)
    • Ilmira Nugumanova
    • Jim O'Regan SOKOL Validator Chicago A (skips update)
    • John D. Storey
    • Lillian Chan
    • Marat Pekker
    • Melanie Marsollier
    • Michael Milirud
    • MM Azure EastUS Bootnode (skips update)
    • Rocco Federico Mancini
    • Sherina Hsuan Yang
    • Sokol Bootnode Toronto A
    • Sokol Walter Karshat
    • Stephen Arsenault
  • Check consensus

    • Add and remove a test validator
  • Update archiving nodes (round IV):

    • Bootnode-Archive
    • Bootnode-HB

Local tests do not stop on error

If a machine fails to start during make test, the tests show the error and then try to start the next machine.

The desired behavior is to fail on the first error.

A possible solution is to append || exit 1 to the loop in the Makefile:

- vagrant up $$i; \
+ vagrant up $$i || exit 1; \

Use Ansible-lint

I am working on a PR that will update the playbooks to pass ansible-lint. The linter checks the code against a set of rules and reports errors when they are violated.

I would suggest running the linter as a pre-commit hook so that each commit is validated.

Also run ansible-playbook --syntax-check to verify the syntax and surface deprecation warnings.

Local development environment

Running a virtual machine in the cloud just to test a change in a playbook is time-consuming.

A better way is to test changes locally before applying them to the master branch. There are two possible ways of doing this:

  1. vagrant with VirtualBox & Ansible provisioners
  2. vagrant with Docker & Ansible provisioners

Both options use Ubuntu images similar to those used in the cloud.

The second option is faster and requires fewer resources.

Does this project use some kind of local testing already?

The workflow will look like:

  1. Make a change to a playbook
  2. vagrant up starts a Docker container and applies the playbook to it
  3. Check the result (manually for now; run integration tests in the future)
  4. vagrant destroy tears down the container
  5. Commit the change (pre-commit hook runs linter and syntax checks)

groups based on environment

As a possible solution to #74, one may create two host groups in the inventory, sokol and core, then add sokol-related variables to the group_vars/sokol file and core-related variables to the group_vars/core file.

Variables will live in a single branch and can be applied based on the deployment group a host belongs to. That makes it possible to maintain both sokol and core deployments from a single playbook.

group_vars/all should be renamed to group_vars/poa, which is applied to the poa group; the poa group includes sokol and core as children.
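
A sketch of the corresponding inventory layout (host names are placeholders):

# hosts
[sokol]
sokol-validator-1

[core]
core-validator-1

[poa:children]
sokol
core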

Refactor variable naming

This issue discusses variable naming conventions. There is already some structure in the naming, such as bootnode_orchestrator, but it is used only for a subset of variables and is not fully consistent.

All variables must have a prefix defining their scope. Possible scopes for node types are validator, moc, explorer, netstat, and bootnode. Possible scopes for network-level variables are test and prod. It is also possible to use the role name as the scope for low-level roles, e.g. a parity_ prefix.

The suffix describes the variable type, since Ansible does not have a strict type system. It is optional and should be used when the type of the variable is not clear from the context. Example: parity_binary_url.

All variables that are not intended to be modified by the user should be moved to the role-level vars folder. Examples: NODE_PWD: "node.pwd", NODE_SOURCE_DEB.

A user should be able to configure a role just by looking at its defaults folder.

All variables should be lower-case.

AWS-related variables must be in a separate file.

Usermanager role can't create users when running as a non-root user

When running the playbook I get an error:

$ ansible-playbook -i hosts site.yml
...

TASK [usermanager : Create users] ******************************************************
failed: [bootnode/0] (item={u'name': u'bootnode'}) => {"changed": false, "item": {"name": "bootnode"}, "msg": "useradd: Permission denied.\nuseradd: cannot lock /etc/passwd; try again later.\n", "name": "bootnode", "rc": 1}

I expect this task to use sudo, or the whole play to run with become: True.

Ansible connects to the remote machine as user poa. This user has passwordless sudo access:

poa@netstat:~$ sudo whoami
root

As a fix I added become: True to the whole play:

- hosts: bootnode
  become: True
  vars:
...

Default variables

Currently all the variables are in the group_vars/all file.

  1. some variables are not intended to be modified by the user
  2. some variables may assume reasonable defaults

The first type of variables should be moved to the role-level vars folder and the second to the defaults folder.

Variables at the playbook level will then be specific to the environment (sokol or core) and contain only the required ones. That way it will be easier for the user to configure the deployment.

Circular dependencies in the roles

Let's consider the explorer role.

The explorer role depends on poa-parity:

---
dependencies:
  ..
  - { role: poa-parity }

But poa-parity itself has a dependency on explorer (and netstats), because it uses the template roles/poa-parity/templates/poa-chain-explorer.j2.

The scope of this issue is to resolve such circular dependencies. Node-level roles may depend on lower-level roles, but those roles should never depend back on the node roles - only on roles further down the hierarchy. Instead, the second-level roles should receive configuration and render their templates accordingly.
