grid_deployment's Introduction

Threefold Grid backend

This repo provides all tools required to set up a Threefold Guardian stack. Such a stack is completely standalone, is made up of several services, and provides all available grid functionality.

Grid backend services with docker compose

One can deploy a full grid backend stack with docker compose for each of the Threefold Grid networks (Devnet, QAnet, Testnet & Mainnet).
Have a look at the documentation to get started.

Grid Hub

The hub is used to distribute flist files for ZOS to boot a user's workload.

Grub bootstrap

The bootstrap service has the task of providing the files to boot from over the internet.

TFchain Validator

The grid runs on TFchain. Here you can find an easy installer to set up a validator.

Grid snapshots

Daily snapshots can be found here: https://bknd.snapshot.grid.tf/
Have a look at the docs to set up your own snapshot creation.

One can also use rsync to download the snapshots.

grid_deployment's People

Contributors

coesensbert, delandtj, despiegk, hossnys, mik-tf, peternashaat, robvanmieghem

grid_deployment's Issues

Improve 'service not ready' errors after a deploy

When starting a deploy, even if all the data is synced, it takes time for a service to start. If another service depends on it, there are errors like:

Gridproxy

2023-03-15T17:06:36Z error failed to connect to endpoint, retrying error="node 'ws://tfchain-public-node:9944' is behind acceptable delay with timestamp '2023-02-21 02:56:18 +0000 UTC'"
2023/03/15 17:06:36 Connecting to ws://tfchain-public-node:9944...
2023-03-15T17:06:36Z error failed to connect to endpoint, retrying error="node 'ws://tfchain-public-node:9944' is behind acceptable delay with timestamp '2023-02-21 03:01:36 +0000 UTC'"
2023/03/15 17:06:37 Connecting to ws://tfchain-public-node:9944...
2023-03-15T17:06:37Z fatal failed to create server: failed to connect to substrate: node 'ws://tfchain-public-node:9944' is behind acceptable delay with timestamp '2023-02-21 03:06:55 +0000 UTC

Activation service

2023-03-15 17:05:35 API-WS: disconnected from ws://tfchain-public-node:9944: 1006:: connection failed

..

Figure out whether it's feasible to add docker-compose health checks for the relevant services to eliminate these errors, so that a service (that would otherwise generate errors) only starts once the service it depends on passes its health check.

https://docs.docker.com/compose/compose-file/#healthcheck
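
Until compose health checks are in place, the same "wait until the dependency is ready" idea can be sketched in a start script. This is a minimal sketch, not the repo's implementation: the helper name wait_for is hypothetical, and the readiness command in the usage comment is an assumption. A plain port check does not prove that the node has caught up with the chain, so a real check would have to query the node's sync state.

```shell
#!/bin/sh
# wait_for ATTEMPTS DELAY CMD...  (hypothetical helper)
# Runs CMD until it succeeds, at most ATTEMPTS times, sleeping DELAY
# seconds between tries. Returns 0 on success and 1 when attempts run out.
wait_for() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        # Sleep only when another attempt will follow.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Usage (assumed check and assumed service name):
#   wait_for 60 5 nc -z tfchain-public-node 9944 && docker compose up -d gridproxy
```

The helper only decides when to start the dependent service. What counts as "ready" stays in the command passed to it, so it can be swapped for a stricter check later.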

Add Caddyfile to .gitignore

We regularly have to adjust the Caddyfile in deployments, mostly to add 02 or 03 at the beginning of a URL.

  • move Caddyfile to Caddyfile-example
  • add to .gitignore
  • add cp to install_grid_bknd.sh: cp Caddyfile-example Caddyfile

for dev, test and mainnet
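
A possible shape for the copy step in install_grid_bknd.sh, as a sketch only. The function name ensure_caddyfile is hypothetical. The guard is an addition to the plain cp listed above, so that re-running the installer does not overwrite a Caddyfile that was already adjusted locally.

```shell
#!/bin/sh
# ensure_caddyfile [DIR]  (hypothetical helper)
# Creates DIR/Caddyfile from DIR/Caddyfile-example only when no Caddyfile
# exists yet, so local edits survive a second run of the installer.
ensure_caddyfile() {
    dir=${1:-.}
    if [ ! -f "$dir/Caddyfile" ]; then
        cp "$dir/Caddyfile-example" "$dir/Caddyfile"
    fi
}

# Usage from the network directory (dev, test or mainnet):
#   ensure_caddyfile .
```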

package the hub

should be able to run it separately, with optional tooling to sync from another backend (can be as easy as rsync or so)

.env: separate unique deployment data from service versions

The .env currently holds both deployment-specific data and the service versions. This file is in .gitignore, so it's not possible to just git pull newer versions without manual adjustments.

Test separate environment files to have the versions in one that is not in .gitignore
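
One way to test this split is to load two files in the install script: a tracked file with versions and an ignored file with deployment data. The sketch below assumes plain KEY=value lines without spaces or shell metacharacters in the values. The file name .versions.env and the helper name load_env are hypothetical.

```shell
#!/bin/sh
# load_env FILE...  (hypothetical helper)
# Exports every KEY=value assignment found in the given files.
# Later files override earlier ones. Fails if a file is missing.
load_env() {
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "missing env file: $f" >&2
            return 1
        fi
        # The dot command searches PATH for names without a slash,
        # so make bare file names explicitly relative.
        case "$f" in
            */*) ;;
            *) f="./$f" ;;
        esac
        set -a      # export everything assigned while sourcing
        . "$f"
        set +a
    done
}

# Usage (file names are assumptions):
#   load_env .versions.env .secrets.env && docker compose up -d
```

With this layout a git pull updates the tracked versions file, while the ignored file with domain and mnemonics is left alone.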

Trouble trying to deploy grid on node 921

Issue

I tried to deploy following the readme.md of docker-compose and can't access the website.

# deploy VM (node 921) on tfgrid with ipv4, ipv6, 32GB RAM, 750GB SSD, 32 vcores
# set A record for IPV4 and AAAA record for IPV6 for threefold.pro (Host: @)
# verify DNS records point to threefold.pro 

# set the VM

git clone https://github.com/threefoldtech/grid_deployment
cd grid_deployment/docker-compose/mainnet
echo .subkey_mainnet >> .gitignore
../subkey generate-node-key > .subkey_mainnet

cp .secrets.env-example .secrets.env
cp Caddyfile-example Caddyfile

# adjust .secrets.env file with domain, node key (in .subkey_mainnet) and tfchain mnemonics
nano .secrets.env

# launch the script
sh install_grid_bknd.sh

Screenshots: deploy1 through deploy8 (images not included)

Add new dashboard variables

Full TFGrid Validator Stack Deployment

Overview

  • End goal
    • develop procedures to deploy a full tfgrid validator stack
      • create the compose file and procedures

This issue presents all the components of the grid stack. Note that bridges will be added later.

Validator Stack Components

  • TFChain Node (our blockchain node)
  • TFHub (lets people go from docker to tfgrid ZOS flists)
  • TFBootstrap (how to install new node)
  • Explorer (has all stats)
  • Validator Code (keeps the grid clean & healthy)
  • Monitoring Software

Notes on the Name

We call it validator stack for now but we might need to change it to something more precise.

Additional Notes for Later Phase

We provide context as to why we need to provide those compose files. For this precise project, all we care about is building the procedures and docs for the compose files.

Context to the Project

We need to create those compose files and procedures to ultimately be able to have 6 validators hosting independent grid stacks.

  • we will have 6 official validators running the full stack
  • they will all be grid.tf with their location + service names
    • e.g. we have dubai, ghent, etc.
    • dashboard.ghent.grid.tf
    • dashboard.dubai.grid.tf
    • etc.
      - we set up load balancers for each of the 6 locations
      - stacks are completely independent with all data needed
    • some services will need to be replicated, e.g. monitoring

UX/Architecture

  • Users can access any of the 6 validator stacks (e.g. dashboard.ghent.grid.tf) and also the official one (dashboard.grid.tf)
  • Anyone can deploy the full stack if they want to

Add new weblets variables

find a safe way to expose the cockroach console

This console can't be exposed publicly, but it will give valuable insight into how the db is behaving. There are some disk IOPS performance issues with the mainnet db size or on slow disks, for which we will make a ticket soon. For that, good metrics will help a lot.

indexer_db:8080

env prerequisite script

For now, only debian-based compatibility:

  • update/upgrade distro
  • install some troubleshooting tools
  • install docker
  • install docker-compose

Docker-compose to build a full Grid backend stack

Why

It should be easy for anyone to set up all the services required to host a complete Grid V3 backend. Anyone using these services should have all the tools to fully utilize the Grid.

Goal

A well-prepared docker-compose package that sets up all required services with minimal user input, largely automated. Simple to set up, use and monitor.

Components - service management

Components - services

  • TFchain public node
  • Graphql stack (Indexer, Processor, redis and Postgresql)
  • Gridproxy
  • Yggdrasil (Gridproxy)
  • Activation service
  • Dashboard
  • Weblets (playground)

Requirements

  • Simple installation
  • Minimal operational overhead
  • Have each service running as fast as possible (data snapshots)
  • User documentation
  • A way to check the service status

Caddy - DNS

Several services need to be exposed via a DNS record, so one can interact with the hosted websites (Dashboard & Playground). These websites in turn need to be able to access (from the user's browser) the TFchain public node, Graphql and Gridproxy hosted by this package. Every record needs a valid TLS certificate via Let's Encrypt. This will be a story on its own.

Metrics

Exposing Prometheus metrics on the services will provide state monitoring. We can later build upon this to include a monitoring service and alerting. Or an integration with Vector and https://metrics.grid.tf

Remarks

Sync time of the public node, indexer and processor is too long. The chain is currently ~60GB so this would take weeks to sync, index and process. Providing snapshots could be a good first approach to shorten the time to be operational. We could test and provide weekly snapshots of the chain data plus indexer / processor postgres database. This will be another story.

Docker-compose should allow for upgrading deployed containers. Automatic upgrades could also be built in, though in its first phase a user will have to initiate an upgrade manually.

TODO

  • #2
  • Make docker container images available for all components/services:
  • Assemble the services in a docker compose file

tfchain & graphql data snapshot

Why

Speed up the process of setting up a Grid backend. The chain is currently ~60GB so this would take weeks to sync, index and process. We can provide recent snapshots so a new backend can be in sync fast.

Goal

Create an automated process of generating easy to use snapshots and make them available over rsync. If someone is deploying a Grid backend, docker-compose will then use rsync (script or container) to pull this snapshot and start syncing with the network.

Components

  • TFchain public node
  • Graphql - Indexer
  • Graphql - Processor
  • Public rsync server
  • Script to create snapshot

Archive

A Substrate public node stores its chain data in many different files, which take longer to transfer than one big file. We will do some tests to figure out what is fastest:

  • Rsync unarchived files
  • Rsync archive file and extract (zip? tar.gz?)

Script

A script should be created that does the following:

  • stop a service
  • make archive
  • put it in rsync public dir
  • update the 'latest' symlink to point to the new archive
  • start service again
  • remove previous snapshot
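
The steps above can be sketched as a single shell function. This is a minimal sketch under assumptions: the function name make_snapshot, the archive naming scheme and the tar.gz format are all hypothetical (the archive format is still an open question in this issue), and the service is stopped and started with docker compose. The archive is written under a temporary name first, so a failed run never replaces 'latest'.

```shell
#!/bin/sh
# make_snapshot DATA_DIR PUBLIC_DIR [SERVICE]  (hypothetical helper)
# Archives DATA_DIR into PUBLIC_DIR, repoints PUBLIC_DIR/latest and
# removes the archive that 'latest' pointed to before.
make_snapshot() {
    data_dir=$1
    public_dir=$2
    service=${3:-}
    name="snapshot-$(date -u +%Y%m%d-%H%M%S).tar.gz"
    previous=$(readlink "$public_dir/latest" 2>/dev/null || true)

    # 1. stop the service so the data on disk is consistent
    if [ -n "$service" ]; then
        docker compose stop "$service" || return 1
    fi

    # 2. make the archive under a temporary name in the public dir
    status=0
    tar -czf "$public_dir/.$name.partial" -C "$data_dir" . || status=$?

    if [ "$status" -eq 0 ]; then
        # 3. publish it and 4. update the 'latest' symlink
        mv "$public_dir/.$name.partial" "$public_dir/$name"
        ln -sfn "$name" "$public_dir/latest"
    else
        rm -f "$public_dir/.$name.partial"
    fi

    # 5. start the service again, also when archiving failed
    if [ -n "$service" ]; then
        docker compose start "$service"
    fi

    # 6. remove the previous snapshot
    if [ "$status" -eq 0 ] && [ -n "$previous" ] && [ "$previous" != "$name" ]; then
        rm -f "$public_dir/$previous"
    fi
    return "$status"
}

# Usage (paths and service name are assumptions):
#   make_snapshot /srv/tfchain /srv/rsync/public tfchain-public-node
```

Removing the previous archive only after 'latest' has been repointed means the rsync directory holds a complete snapshot at every point, apart from the short moment a client may still be downloading the old file.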

Rsync

Expose the snapshots over a public rsync server. This should also be included in the documentation, so someone can download a snapshot outside of docker-compose.
