
activestandbycrossaz's Introduction

Active/Standby Across Availability Zones

Before the R80.40 release, which introduced the ability to set up an Active/Standby cluster with members in separate AZs, Check Point had no way to fail over across AZs. The two scripts in this repo aim to allow exactly that: cross-AZ high availability, with a (stateless) failover between two gateways or clusters deployed in different AZs.

Specifically, the scripts here are intended to be used in the context of a multi-VPC architecture centered around a Transit Gateway (TGW), where inter-VPC, VPC<->On Prem and/or VPC<->internet traffic needs to be inspected. This solution has customers dedicate a VPC to a security setup consisting of two gateways (or clusters) in different AZs, and use the Lambda function code in these scripts to ensure that, at any given time, these traffic flows are routed through one healthy gateway (or cluster). If the gateway or cluster in one AZ becomes unavailable, the Lambda function detects it and changes the routes to point traffic to the gateway (or cluster) in the other AZ.

There are two scripts in this repo: twogatwaysacrossAZs is intended for use when there are two gateways in different AZs; twoclustersaccrossAZs is intended for use when there are either two gateways or two clusters in different AZs.

general mode of operation

The functions generally work in two stages. In the first stage, the function sends a TCP probe to a host/port that can only be reached if the gateway is functioning; that is, for the probe to succeed, it has to match a rule on the gateway. The function is triggered periodically by a CloudWatch scheduler and lives on a subnet whose routing points to the active gateway. If the probe fails, the function immediately probes a second host/port. If both probes fail, the second stage depends on the script. With the two-gateway script, the route table controlling the subnet is changed to point to the ENI of the other gateway. With the two-cluster script, the function first waits for a while (because an intra-AZ failover might be under way), repeatedly checking whether connectivity has been restored, and changes the route table only once this wait times out.
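The probing stage described above can be sketched roughly as follows. This is a minimal illustration, not the code from the scripts themselves: the names `tcp_probe` and `path_is_healthy` and the 2-second timeout are assumptions made for the example.

```python
import socket


def tcp_probe(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`.

    Hypothetical helper; the timeout value is an assumption.
    """
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False


def path_is_healthy(targets, probe=tcp_probe):
    """Probe each (host, port) pair in order and stop at the first success.

    The second target is only tried if the first probe fails.
    """
    return any(probe(host, port) for host, port in targets)
```

Because `any` short-circuits, a healthy first target means the second host/port is never contacted, which matches the "probe, then immediately try a second host/port" behavior described above.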

[Figure: Routeswitch diagram]

setup for twogatwaysacrossAZs

  1. The code assumes a VPC is already in place with 2 Check Point CloudGuard gateways deployed in 2 AZs. The VPC is attached to a TGW. The subnets for the Check Point ENIs are different from the subnets used for the TGW attachment.
  2. The code needs to be deployed as a Python 3.7 Lambda function.
  3. The function needs to allow an execution time of up to 10 seconds.
  4. The function needs to be deployed in the same VPC as the gateways, on the same subnets where the ENIs of the TGW are deployed.
  5. A VPC endpoint for the EC2 API needs to be enabled on the VPC (so that the function can make API calls to EC2 when the gateways are down).
  6. Permissions: the Lambda function needs permissions to describe route tables and to replace routes.
  7. Trigger: the function needs to be triggered by a CloudWatch scheduler. Every 30 seconds seems reasonable.
  8. Environment variables need to be set up as follows:
| Variable name | Variable value | Notes |
|---|---|---|
| host1 | an FQDN or IP address | The address for the first probe the function will use |
| port1 | a port number | The port for the first probe the function will use |
| host2 | an FQDN or IP address | The address for the second probe the function will use |
| port2 | a port number | The port for the second probe the function will use |
| eni1 | ENI ID | The ENI of eth1 of the first gateway. The function will use this ENI as the target of the route if this gateway is to become active |
| eni2 | ENI ID | The ENI of eth1 of the second gateway. The function will use this ENI as the target of the route if this gateway is to become active |
| routetable | Route Table ID | The route table associated with the subnets to which the Lambda function and TGW ENIs are attached |
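As an illustration of how the variables listed above might be consumed, here is a hedged sketch of reading the configuration and switching the route target from one gateway ENI to the other. The function names `load_config` and `fail_over` are hypothetical, and the repo's actual script may be structured differently. The `ec2` argument is assumed to behave like a boto3 EC2 client (in Lambda, `boto3.client("ec2")`), whose `describe_route_tables` and `replace_route` calls are used here.

```python
import os


def load_config(env=os.environ):
    """Read the two-gateway settings from environment variables."""
    return {
        "targets": [(env["host1"], int(env["port1"])),
                    (env["host2"], int(env["port2"]))],
        "enis": (env["eni1"], env["eni2"]),
        "route_table": env["routetable"],
    }


def fail_over(ec2, route_table_id, enis):
    """Repoint every route that targets one gateway ENI at the other ENI.

    Assumption: only IPv4 routes (DestinationCidrBlock) are handled.
    Returns the list of destination CIDR blocks that were switched.
    """
    other = {enis[0]: enis[1], enis[1]: enis[0]}
    tables = ec2.describe_route_tables(RouteTableIds=[route_table_id])
    switched = []
    for route in tables["RouteTables"][0]["Routes"]:
        current = route.get("NetworkInterfaceId")
        cidr = route.get("DestinationCidrBlock")
        # Skip local/TGW routes and anything not pointing at a gateway ENI.
        if current in other and cidr:
            ec2.replace_route(
                RouteTableId=route_table_id,
                DestinationCidrBlock=cidr,
                NetworkInterfaceId=other[current],
            )
            switched.append(cidr)
    return switched
```

Passing the client in as a parameter keeps the sketch easy to exercise with a stub. The two EC2 calls it makes correspond to the "describe route tables" and "replace routes" permissions listed in step 6.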

setup for twoclustersaccrossAZs

  1. The code assumes a VPC is already in place with 2 Check Point CloudGuard gateways (or clusters) deployed in 2 AZs. The VPC is attached to a TGW. The subnets for the Check Point ENIs are different from the subnets used for the TGW attachment.
  2. The code needs to be deployed as a Python 3.7 Lambda function.
  3. The function needs to allow an execution time of up to 10 seconds more than the "waittime" value (see below).
  4. The function needs to be deployed in the same VPC as the gateways, on the same subnets where the ENIs of the TGW are deployed.
  5. A VPC endpoint for the EC2 API needs to be enabled on the VPC (so that the function can make API calls to EC2 when the gateways are down).
  6. Permissions: the Lambda function needs permissions to describe instances and route tables, and to replace routes.
  7. Trigger: the function needs to be triggered by a CloudWatch scheduler. Every 40 seconds seems reasonable.
  8. Environment variables need to be set up as follows:
| Variable name | Variable value | Notes |
|---|---|---|
| host1 | an FQDN or IP address | The address for the first probe the function will use |
| port1 | a port number | The port for the first probe the function will use |
| host2 | an FQDN or IP address | The address for the second probe the function will use |
| port2 | a port number | The port for the second probe the function will use |
| routetable | Route Table ID | The route table associated with the subnets to which the Lambda function and TGW ENIs are attached |
| tagkey | a string | A tag key that, together with the tag value (see below), uniquely identifies all and only the instances of the clusters (or gateways) |
| tagvalue | a string | A tag value that, together with the tag key (see above), uniquely identifies all and only the instances of the clusters (or gateways) |
| waittime | a non-negative number | The number of seconds the function should wait when there is no connectivity before changing the route. This is necessary when clusters are used, because an intra-AZ cluster failover that would resolve the connectivity issue might be under way. A value of at least 30 is recommended |
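The role of waittime can be illustrated with a small retry loop. This is a sketch under assumptions rather than the script's actual logic: the name `connectivity_restored`, the 2-second retry interval, and the injectable `clock` and `sleep` parameters are all invented for the example.

```python
import time


def connectivity_restored(probe, waittime, interval=2.0,
                          clock=time.monotonic, sleep=time.sleep):
    """Re-run `probe` until it succeeds or `waittime` seconds have elapsed.

    Returns True if connectivity came back (no route change needed),
    False if the wait timed out and the route should be switched.
    `probe` is any zero-argument callable returning True on success.
    """
    deadline = clock() + float(waittime)
    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        if probe():
            return True
        # Never sleep past the deadline, so the total wait stays bounded.
        sleep(min(interval, remaining))
```

With waittime set to 0 the loop returns False immediately, so the function would go straight to changing the route. Since the loop itself can take up to waittime seconds, the Lambda timeout needs the extra headroom described in step 3.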

activestandbycrossaz's People

Contributors

jlebowitsch
