Auto remediation & automation bots for AWS.
This solution is meant to be used in conjunction with Dome9's Continuous Compliance Engine or AWS GuardDuty to remediate issues that are uncovered.
For a GuardDuty quickstart doc, please see README_GUARDDUTY.md
For a condensed quickstart doc, please see README_QUICKSTART.md
For more technical information, please see README_ADVANCED.md
For more information on creating your own bots, please see README_DEVELOPER_GUIDE.md
CloudBots is an automatic remediation solution for AWS built on top of Dome9's Continuous Compliance capabilities.
The Dome9 Compliance Engine continuously scans the relevant cloud accounts (AWS, Azure, GCP) for policy violations, then alerts and reports on them.
For some organizations that is enough. However, at a certain scale and level of cloud maturity, organizations prefer to move toward an automatic-remediation approach, in which the system runs specific remediation bots in response to specific violations.
This approach can reduce the load on security operators and drastically reduce the time it takes to resolve security issues.
- Dome9 will scan the accounts on an ongoing basis and send failing rules to SNS
- In the rules, if we want to add remediation, we can add a "remediation flag" to the Compliance Section so that the SNS event is tagged with what we want to do.
- Each remediation bot that is tagged correlates to a file in the bots folder of the remediation function.
- Lambda reads the message tags and looks for a tag that matches AUTO:
- If any of those AUTO tags match a remediation that we have built out, it'll call that bot.
- All of the methods send their output to an array called text_output. Once the function finishes, this array is turned into a string and posted to SNS.
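The flow above (find `AUTO:` tags, call the matching bot, collect messages in text_output, post one string) can be sketched as follows. This is a minimal illustration, not the function's actual code: `parse_auto_tags` and `run_bots` are hypothetical names, and the real function loads bot modules from the bots folder, whereas here a plain dict of callables stands in for that folder.

```python
import re

# Matches "AUTO:" followed by a bot name and any parameters after it.
AUTO_TAG = re.compile(r"^\s*AUTO:\s*(\S+)\s*(.*)$")


def parse_auto_tags(compliance_tags):
    """Return (bot_name, params) for every AUTO: entry in the Compliance Section."""
    actions = []
    for tag in compliance_tags:
        match = AUTO_TAG.match(tag)
        if match:
            bot_name, raw_params = match.groups()
            actions.append((bot_name, raw_params.split()))
    return actions


def run_bots(compliance_tags, available_bots):
    """Call each tagged bot that exists and return the text to post to SNS."""
    text_output = []
    for bot_name, params in parse_auto_tags(compliance_tags):
        bot = available_bots.get(bot_name)
        if bot is None:
            # Tags that don't match a known bot are reported and skipped.
            text_output.append(f"Bot {bot_name} not found. Skipping.")
            continue
        text_output.append(bot(params))
    # The collected messages become a single string for the SNS message.
    return "\n".join(text_output)
```

Non-`AUTO:` entries in the Compliance Section (such as a CIS section number) are ignored, which is what lets several bots and ordinary compliance labels share one field.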
The function can run in any region in your account, and only one is needed per account.
In multi-account mode, only one function is required in total. In single account mode, one function is required per account.
In either mode, there's no need for one function per region or anything like that. The maximum is one function per account.
Every cloud-bot lives in the same function. There aren't multiple functions for the different bots.
CloudBots does not use the Dome9-connect role. That role is only for Dome9 to collect data from your AWS accounts. The CloudBots function needs its own execution role to run the remediation actions, and that role is completely separate from the Dome9 role.
AUTO: is used to signal to CloudBots that a remediation action needs to be triggered. The bot name correlates to a file name in the bots/ folder.
By putting the bot syntax in the "Compliance Section" field, multiple actions can be triggered from one rule since the Compliance Section is passed through the event as an array.
Any new bot needs to go into the bots folder in the function. From there, you call it with the AUTO: syntax.
For example, a delete user bot would be named delete_user.py and put in the bots folder.
It would be triggered with "AUTO: delete_user"
Currently, only Python is supported.
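A minimal sketch of what bots/delete_user.py might look like. It assumes the entry point is `run_action(boto_session, rule, entity, params)` and that the finding's entity dict carries the user name under `"name"`; both are assumptions here, so check README_DEVELOPER_GUIDE.md for the actual bot interface. `delete_user` is a real IAM API call, but AWS rejects it while the user still has access keys, a login profile, attached policies, or group memberships, so a production bot would remove those first.

```python
# bots/delete_user.py (hypothetical example bot)


def run_action(boto_session, rule, entity, params):
    """Delete the IAM user from the finding and return a status message for text_output."""
    iam_client = boto_session.client("iam")
    user_name = entity["name"]  # assumed key in the Dome9 entity payload
    try:
        iam_client.delete_user(UserName=user_name)
        return f"Deleted IAM user {user_name}"
    except Exception as error:  # boto3 raises botocore ClientError on API failures
        return f"Unable to delete IAM user {user_name}: {error}"
```

With this file in the bots folder, the rule would carry "AUTO: delete_user" in its Compliance Section.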
Dome9's cross account role is completely separate from the CloudBots permissions and cross account roles. Dome9 permissions are in yellow, while the CloudBots permissions are in bold. This is done so that the most sensitive permissions stay within the customer environments and are never given to a third party.
You can deploy this stack via the link below. Pick the region that you would like it deployed in.
Region | Launch |
---|---|
us-east-1 | ![]() |
us-east-2 | ![]() |
us-west-1 | ![]() |
us-west-2 | ![]() |
ca-central-1 | ![]() |
eu-central-1 | ![]() |
eu-west-1 | ![]() |
eu-west-2 | ![]() |
eu-west-3 | ![]() |
ap-northeast-1 | ![]() |
ap-northeast-2 | ![]() |
ap-southeast-1 | ![]() |
ap-southeast-2 | ![]() |
ap-south-1 | ![]() |
sa-east-1 | ![]() |
If you want to deploy this via CLI, please see README_ADVANCED.md
- Click the link and click Next.
- In the parameters section, set the Deployment Mode to single or multi, depending on whether this will be run across multiple accounts (you can change this later if needed).
- In the email address field, enter one of the subscriber emails we saved in step 1.
- Click Next > Next.
- On the 4th page, check the 2 boxes that allow this template to create IAM resources with custom names (this is for the role that is created for Lambda to run the bots).
- Next, click the 'Create Change Set' button at the bottom of the page. Then click 'Execute'.
- From here, the stack will deploy. If there are no errors, go to the 'Outputs' tab and copy the two ARNs that were output. You'll need them later.
In single account mode, the Lambda function will only remediate issues found within the account it's running in. If the event is from another account, it'll be skipped.
This is the default mode. Nothing needs to be changed.
In multi account mode, the function runs in the local account but will also try to assume a role in another account if the event came from a different account than the one the function is running in. Each account that will have remediation bots needs a cross-account role that the master account can assume.
In the Dome9CloudBots lambda function:
- Update the ACCOUNT_MODE environment variable from 'single' to 'multi'
- By default, the cross account roles will all need to be named "Dome9CloudBots". If you want a different name, add a new variable called "CROSS_ACCOUNT_ROLE_NAME" and set the value to the new name for the role.
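The decision described above (run locally, skip, or assume a role) can be sketched as a small pure function. `resolve_execution` is a hypothetical name, and the sketch assumes the role lives at the standard `arn:aws:iam::<account>:role/<name>` path; the returned ARN would then be handed to STS (for example boto3's `sts.assume_role(RoleArn=..., RoleSessionName=...)`) to get credentials for the other account.

```python
import os

DEFAULT_ROLE_NAME = "Dome9CloudBots"


def resolve_execution(event_account_id, local_account_id, env=None):
    """Return ("local", None), ("skip", None) or ("assume", role_arn) for a finding."""
    env = os.environ if env is None else env
    if event_account_id == local_account_id:
        # Findings from the function's own account always use local credentials.
        return "local", None
    if env.get("ACCOUNT_MODE", "single") != "multi":
        # Single account mode skips events from other accounts.
        return "skip", None
    role_name = env.get("CROSS_ACCOUNT_ROLE_NAME", DEFAULT_ROLE_NAME)
    return "assume", f"arn:aws:iam::{event_account_id}:role/{role_name}"
```

Because CROSS_ACCOUNT_ROLE_NAME only changes the role name, every target account must use the same name for its cross-account role.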
cd cross_account_role_configs
Role creation needs to be done via something other than CloudFormation because CFTs don't output consistent role names.
There is a small bash script in this directory that you can run (create_role.sh) to create these roles.
./create_role.sh <aws profile>
aws iam create-role \
--role-name Dome9CloudBots \
--assume-role-policy-document file://trust_policy.json \
--profile <aws_account_profile>
aws iam create-policy \
--policy-name Dome9CloudBots \
--policy-document file://remediation_policy.json \
--query 'Policy.Arn' \
--profile <aws_account_profile>
Take the policy ARN returned by create-policy and use it in the next command.
aws iam attach-role-policy \
--role-name Dome9CloudBots \
--policy-arn <ARN FROM LAST COMMAND> \
--profile <aws_account_profile>
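The create-role command reads trust_policy.json from the cross_account_role_configs directory. That file's exact contents are not reproduced here; as an assumption, the sketch below generates the standard cross-account trust shape, which lets principals in the master (CloudBots) account call `sts:AssumeRole` on the role. `build_trust_policy` is a hypothetical helper and 123456789012 is a placeholder account ID.

```python
import json


def build_trust_policy(master_account_id):
    """Return a trust policy JSON string allowing the master account to assume this role."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Trusting the account root delegates the decision to the master
                # account's own IAM policies (e.g. the Lambda execution role).
                "Principal": {"AWS": f"arn:aws:iam::{master_account_id}:root"},
                "Action": "sts:AssumeRole",
            }
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    with open("trust_policy.json", "w") as handle:
        handle.write(build_trust_policy("123456789012"))
```

Naming the CloudBots Lambda execution role ARN as the principal instead of the account root is a stricter alternative.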
See this section for sample screenshots of the setup
It's recommended but not required to break remediation bots into their own bundles. There is a sample bundle (sample_bundle.json) that can be used as a starting point. The rule in the sample bundle will remove rules from the default security group if the SG is empty.
For all rules that you want to add remediation to, add the remediation tag to the "Compliance Section" of the rule.
All available remediation bots are in the bots folder.
Ex: AUTO: ec2_stop_instance
Make sure you're getting the results you want and expect.
If you're in single account mode, there needs to be one Continuous Compliance bundle per account. In multi account mode, select all the accounts that you set up cross-account roles in. Set the output topic to the InputTopicARN from the stack outputs, and set the format to JSON - Full Entity.
Currently, Continuous Compliance sends a 'diff' in its SNS notifications. Because of this, if you have run the bundle before, only new issues will be sent to SNS. If you want the first auto-remediation run to include all pre-existing issues, use the "send all events" button to force a re-send.
For the compliance policy you have set up, look for a button on the right hand side with an arrow pointing up.
In this page, select SNS as the delivery method and your notification policy as the place to send the events.
This can also be useful for rolling out new bots and/or testing since you can re-send the same event more than once.
- For any other rules that you want to create and add remediation to, add the remediation tag to the "Compliance Section" of the rule.
- Set the Dome9 compliance bundle to run via continuous compliance.
Updating your current CloudBots stack is very straightforward. In the UI, navigate to CloudFormation for the region that CloudBots is set up in.
- Select the dome9CloudBots stack and then select Actions > Update Stack
- Select "Specify an Amazon S3 template URL" and copy the link from the table below that corresponds to the region you're in.
- Click next/update all the way through and it'll deploy the new version of the template.
What it does: Creates a new S3 bucket and turns on a multi-region trail that logs to it.
Pre-set Settings:
Default bucket name: acct<account_id>cloudtraillogs
IsMultiRegionTrail: True (CIS for AWS V 1.1.0 Section 2.1)
IncludeGlobalServiceEvents: True
EnableLogFileValidation: True (CIS for AWS V 1.1.0 Section 2.2)
Usage: AUTO: cloudtrail_enable trail_name=<trail_name> bucket_name=<bucket_name>
Note: trail_name and bucket_name are optional and don't need to be set.
Limitations: none
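The pre-set settings above map onto keyword arguments of boto3's CloudTrail `create_trail` call. This sketch only builds that request from the bot's key=value params; `cloudtrail_enable_settings` is a hypothetical helper, and the fallback trail name `cloudbots_trail` is a hypothetical placeholder because the default trail name isn't documented here. A real run would also need a bucket policy that lets CloudTrail write to the bucket, plus a `start_logging(Name=...)` call.

```python
def cloudtrail_enable_settings(account_id, params):
    """Build create_trail keyword arguments from params like 'trail_name=x bucket_name=y'."""
    options = dict(param.split("=", 1) for param in params)
    return {
        "Name": options.get("trail_name", "cloudbots_trail"),  # hypothetical default
        "S3BucketName": options.get("bucket_name", f"acct{account_id}cloudtraillogs"),
        "IsMultiRegionTrail": True,          # CIS for AWS V 1.1.0 Section 2.1
        "IncludeGlobalServiceEvents": True,
        "EnableLogFileValidation": True,     # CIS for AWS V 1.1.0 Section 2.2
    }
```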
What it does: Makes CloudTrail send its logs to CloudWatch Logs. If the log group doesn't already exist, it'll create a new one.
Usage: AUTO: cloudtrail_send_to_cloudwatch <log_group_name>
Limitations: none
Defaults:
If no log group name is set, it'll default to CloudTrail/DefaultLogGroup
Role name: CloudTrail_CloudWatchLogs_Role
Log delivery policy name: CloudWatchLogsAllowDelivery
What it does: Creates CloudWatch metric filters that match the CIS Benchmark. A metric alarm and SNS subscription are created as well.
Usage: AUTO: cloudwatch_create_metric_filter <email_address> ....
Limitations: CloudTrail needs to be set up to send its logs to a CloudWatch Logs group first.
Default: SNS topic name is CloudTrailMetricFilterAlerts
Available filters are: UnauthorizedApiCalls, NoMfaConsoleLogins, RootAccountLogins, IamPolicyChanges, CloudTrailConfigurationChanges, FailedConsoleLogins, DisabledOrDeletedCmks, S3BucketPolicyChanges, AwsConfigChanges, SecurityGroupChanges, NetworkAccessControlListChanges, NetworkGatewayChanges, RouteTableChanges, VpcChanges
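Each available filter is a CloudWatch Logs metric filter on the CloudTrail log group. The sketch below builds request arguments for boto3's `logs.put_metric_filter` for two of the filters, using the filter patterns published in the CIS AWS Foundations Benchmark for unauthorized API calls and root account usage. Whether the bot uses these exact patterns is an assumption, `metric_filter_request` is a hypothetical helper, and the `CloudTrailMetrics` namespace is a hypothetical placeholder.

```python
# CIS-style filter patterns for two of the available filters (others omitted).
FILTER_PATTERNS = {
    "UnauthorizedApiCalls": '{ ($.errorCode = "*UnauthorizedOperation") || ($.errorCode = "AccessDenied*") }',
    "RootAccountLogins": '{ $.userIdentity.type = "Root" && $.userIdentity.invokedBy NOT EXISTS && $.eventType != "AwsServiceEvent" }',
}


def metric_filter_request(log_group_name, filter_name, namespace="CloudTrailMetrics"):
    """Build put_metric_filter keyword arguments that emit 1 for each matching event."""
    return {
        "logGroupName": log_group_name,
        "filterName": filter_name,
        "filterPattern": FILTER_PATTERNS[filter_name],
        "metricTransformations": [
            {"metricName": filter_name, "metricNamespace": namespace, "metricValue": "1"}
        ],
    }
```

An alarm on the resulting metric (threshold of 1) is what drives the SNS notification to the email address.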
What it does: Enables AWS Config. This DOES NOT create config rules. It only turns on the configuration recorders.
Usage: AUTO: config_enable bucket_name=mybucketlogs bucket_region=us-west-1 include_global_resource_types_region=us-west-1
Limitations: none
Variables (and their defaults):
bucket_name = accountNumber + "awsconfiglogs"
bucket_region = us-west-1
allSupported = True
includeGlobalResourceTypes = True (if you want to change this, use the variable include_global_resource_types_region=<desired_region>)
Defaults (not currently changeable via variables): deliveryFrequency (file delivery to S3) is set to One_Hour, and config_name = default.
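The variables and defaults above correspond to the two structures AWS Config needs: a configuration recorder and a delivery channel. This sketch builds them in the shape expected by boto3's `put_configuration_recorder` and `put_delivery_channel`. `config_enable_requests` is a hypothetical helper, and `role_arn` is a hypothetical parameter because the role the bot gives to Config isn't documented here; bucket_region only affects where the bucket is created, so it isn't modeled.

```python
def config_enable_requests(account_id, role_arn, params):
    """Return (recorder, delivery_channel) dicts for enabling AWS Config."""
    options = dict(param.split("=", 1) for param in params)
    bucket_name = options.get("bucket_name", account_id + "awsconfiglogs")
    recorder = {
        "name": "default",
        "roleARN": role_arn,
        "recordingGroup": {"allSupported": True, "includeGlobalResourceTypes": True},
    }
    delivery_channel = {
        "name": "default",
        "s3BucketName": bucket_name,
        "configSnapshotDeliveryProperties": {"deliveryFrequency": "One_Hour"},
    }
    return recorder, delivery_channel
```

After both are created, `start_configuration_recorder(ConfigurationRecorderName="default")` turns recording on; no Config rules are involved.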
What it does: Attaches an instance role to an EC2 instance. This role needs to be passed in through the params.
Usage: AUTO: ec2_attach_instance_role role_arn=<role_arn>
If you have a role that is the same across accounts and don't want to pass in an account-specific ARN, add "$ACCOUNT_ID" to the role ARN and the function will automatically pull in the current account ID of the finding.
Example: AUTO: ec2_attach_instance_role role_arn=arn:aws:iam::$ACCOUNT_ID:instance-profile/ec2SSM
Sample GSL: Instance should have roles
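The `$ACCOUNT_ID` placeholder is a plain text substitution with the finding's account ID. A minimal sketch, assuming the substituted ARN is then passed to EC2's `associate_iam_instance_profile` call (a real boto3 API; whether the bot uses exactly this call is an assumption). `instance_profile_request` is a hypothetical helper.

```python
def instance_profile_request(instance_id, role_arn, account_id):
    """Build associate_iam_instance_profile arguments, filling in $ACCOUNT_ID."""
    return {
        "IamInstanceProfile": {"Arn": role_arn.replace("$ACCOUNT_ID", account_id)},
        "InstanceId": instance_id,
    }
```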
What it does: Disassociates and releases all EIPs on an instance
Usage: AUTO: ec2_release_eips
Limitations: none
What it does: Attaches a SG with no rules to the instance so it can't communicate with the outside world
Usage: AUTO: ec2_quarantine_instance
Limitations: None
What it does: Stops an ec2 instance
Usage: AUTO: ec2_stop_instance
Limitations: none
What it does: If an instance is missing a specific tag, try to pull it from the VPC.
Usage: AUTO: ec2_tag_instance_from_vpc
Limitations: none
What it does: Terminates an ec2 instance
Usage: AUTO: ec2_terminate_instance
Limitations: none
What it does: Updates an EXISTING EC2 instance role by attaching another policy to the role. This policy needs to be passed in through the params.
Usage: AUTO: ec2_update_instance_role policy_arn=<policy_arn>
Example: AUTO: ec2_update_instance_role policy_arn=arn:aws:iam::aws:policy/AlexaForBusinessDeviceSetup
Sample GSL: Instance where roles should have roles with [ managedPolicies contain [ name='AmazonEC2RoleforSSM' ] ]
What it does: Adds an explicit deny all policy to IAM and directly attaches it to a role
Usage: AUTO: iam_quarantine_role
Limitations: none
What it does: Adds an explicit deny all policy to IAM and directly attaches it to a user
Usage: AUTO: iam_quarantine_user
Limitations: none
What it does: Sets all settings in an account password policy
Usage: AUTO: iam_turn_on_password_policy MinimumPasswordLength:<int> RequireSymbols:<True/False> RequireNumbers:<True/False> RequireUppercaseCharacters:<True/False> RequireLowercaseCharacters:<True/False> AllowUsersToChangePassword:<True/False> MaxPasswordAge:<int> PasswordReusePrevention:<int> HardExpiry:<True/False>
Limitations: ALL variables need to be set at the same time
Sample tag: AUTO: iam_turn_on_password_policy MinimumPasswordLength:15 RequireSymbols:True RequireNumbers:True RequireUppercaseCharacters:True RequireLowercaseCharacters:True AllowUsersToChangePassword:True MaxPasswordAge:5 PasswordReusePrevention:5 HardExpiry:True
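The Key:Value params map directly onto the keyword arguments of IAM's `update_account_password_policy` call in boto3, which uses the same names. This sketch converts them (True/False to booleans, everything else to integers) and enforces the "all variables at once" limitation. `parse_password_policy` is a hypothetical helper, not the bot's own code.

```python
REQUIRED_KEYS = (
    "MinimumPasswordLength", "RequireSymbols", "RequireNumbers",
    "RequireUppercaseCharacters", "RequireLowercaseCharacters",
    "AllowUsersToChangePassword", "MaxPasswordAge",
    "PasswordReusePrevention", "HardExpiry",
)


def parse_password_policy(params):
    """Turn ['Key:Value', ...] into update_account_password_policy kwargs."""
    policy = {}
    for param in params:
        key, value = param.split(":", 1)
        if value in ("True", "False"):
            policy[key] = value == "True"
        else:
            policy[key] = int(value)  # lengths, ages and reuse counts are integers
    missing = [key for key in REQUIRED_KEYS if key not in policy]
    if missing:
        raise ValueError(f"Missing password policy settings: {', '.join(missing)}")
    return policy
```

With the sample tag, the result could be passed as `iam_client.update_account_password_policy(**policy)`.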
What it does: Updates the setting for an IAM user so that they need to change their console password the next time they log in.
Usage: AUTO: iam_user_force_password_change
Limitations: none
What it does: Turns off ec2 instances with public IPs, detaches an IGW from a VPC, and then deletes it.
Limitations: VPCs have lots of interconnected services. This is currently just focused on EC2 but future enhancements will need to be made to turn off RDS, Redshift, etc.
What it does: Tags an ec2 resource with "marked_for_stop" and the time at which the follow-up remediation should be triggered.
Usage: AUTO: mark_for_stop_ec2_resource <int><unit(m,h,d)>
Example: AUTO: mark_for_stop_ec2_resource 3h
Note: This is meant to be used in conjunction with a more aggressive action like stopping or terminating an instance. The first step is to tag an instance with the time at which we want to trigger the remediation bot.
From there, a rule like "Instance should not have tags with [ key='marked_for_stop' and value before(1, 'minutes') ]" can be run to check how long an instance has had the 'marked_for_stop' tag.
Limitations: none
THIS WORKS ACROSS ALL EC2 RELATED SERVICES:
- Image
- Instance
- InternetGateway
- NetworkAcl
- NetworkInterface
- PlacementGroup
- RouteTable
- SecurityGroup
- Snapshot
- Subnet
- Volume
- Vpc
- VpcPeeringConnection
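Turning a duration like `3h` into the time stored in the tag is simple date arithmetic. This sketch assumes the tag value is an ISO 8601 UTC timestamp; the format the bot actually writes isn't documented here, and `stop_time` and `marked_for_stop_tag` are hypothetical helpers. The resulting dict has the `Key`/`Value` shape used by EC2's `create_tags`.

```python
from datetime import datetime, timedelta, timezone

UNITS = {"m": "minutes", "h": "hours", "d": "days"}


def stop_time(duration, now=None):
    """Convert '3h'-style durations (m, h or d) into an absolute UTC time."""
    if now is None:
        now = datetime.now(timezone.utc)
    amount, unit = int(duration[:-1]), duration[-1]
    return now + timedelta(**{UNITS[unit]: amount})


def marked_for_stop_tag(duration, now=None):
    """Build the tag that a later rule compares against the current time."""
    return {"Key": "marked_for_stop", "Value": stop_time(duration, now).isoformat()}
```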
What it does: Attaches a SG with no rules to the RDS instance so it can't communicate with the outside world
Usage: AUTO: rds_quarantine_instance
Limitations: The instance needs to be "Available" in order to update. If it's in the "backing up" state, this will fail.
(Might not work with Aurora since it's in a cluster)
What it does: Deletes an S3 bucket
Usage: AUTO: s3_delete_bucket
Limitations: none
What it does: Deletes all ACLs and bucket policies from a bucket
Usage: AUTO: s3_delete_permissions
Limitations: none
What it does: Turns on AES-256 encryption on the target bucket
Usage: AUTO: s3_enable_encryption
Limitations: none
What it does: Turns on server access logging. This bot will also create a bucket to log to. The target (logging) bucket needs to be in the same region as the bucket being remediated, or S3 will throw a CrossLocationLoggingProhibited error.
Usage: AUTO: s3_enable_logging
Limitations: none
What it does: Turns on versioning for an S3 bucket
Usage: AUTO: s3_enable_versioning
Limitations: none
What it does: Deletes a security group
Usage: AUTO: sg_delete
Limitations: This will fail if there is something still attached to the SG.
What it does: Deletes all ingress and egress rules from a SG
Usage: AUTO: sg_rules_delete
Limitations: none
What it does: Deletes a single rule on a security group
Usage: AUTO: sg_single_rule_delete split=<true|false> protocol=<TCP|UDP> scope=<a.b.c.d/e> direction=<inbound|outbound> port=<port>
Example: AUTO: sg_single_rule_delete split=false protocol=TCP scope=0.0.0.0/0 direction=inbound port=22
Sample GSL: SecurityGroup should not have inboundRules with [scope = '0.0.0.0/0' and port<=22 and portTo>=22]
Conditions and caveats: Deleting a single rule on a security group can be difficult because the problematic port can be nested within a wider range of ports. If SSH is open because a SG has all of TCP open, do you want to delete the whole rule or would you break up the SG into the same scope but port 0-21 and a second rule for 23-end of TCP port range? Currently the way this is being addressed is using the 'split' parameter. If it's set as false, CloudBots will only look for the specific port in question. If it's nested within a larger port scope, it'll be skipped. If you set split to true, then the whole rule that the problematic port is nested in will be removed and 2 split rules will be added in its place (ex: if port 1-30 is open and you want to remove SSH, the new rules will be for port 1-21 and port 23-30).
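The split behaviour described above can be sketched as a pure function over one rule's port range. `plan_single_rule_delete` is a hypothetical helper: it returns whether to leave the rule alone, delete it, or replace it with the remaining ranges. Replacing a rule means revoking the old one and authorizing new ones, which is why the ec2:AuthorizeSecurityGroupIngress/Egress permissions are needed.

```python
def plan_single_rule_delete(from_port, to_port, port, split):
    """Return ("skip" | "delete" | "replace", remaining_ranges) for one SG rule."""
    if not from_port <= port <= to_port:
        return "skip", []  # the problematic port isn't in this rule
    if from_port == to_port == port:
        return "delete", []  # exact match: just remove the rule
    if not split:
        return "skip", []  # nested in a wider range and split=false
    remaining = []
    if from_port < port:
        remaining.append((from_port, port - 1))
    if port < to_port:
        remaining.append((port + 1, to_port))
    return "replace", remaining
```

For the example in the text, a rule for ports 1-30 with port 22 and split=true gives `("replace", [(1, 21), (23, 30)])`.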
What it does: Tags an ec2 resource
Usage: AUTO: tag_ec2_resource "key" "value"
Note: Tags with spaces can be added if they are surrounded by quotes, e.g. AUTO: tag_ec2_resource "this is my key" "this is a value"
Limitations: none
THIS WORKS ACROSS ALL EC2 RELATED SERVICES:
- Image
- Instance
- InternetGateway
- NetworkAcl
- NetworkInterface
- PlacementGroup
- RouteTable
- SecurityGroup
- Snapshot
- Subnet
- Volume
- Vpc
- VpcPeeringConnection
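Quoted keys and values only survive if the params are split shell-style; a plain whitespace split would break "this is my key" into four words. A minimal sketch using Python's standard `shlex` module, with `parse_tag_params` as a hypothetical helper; the returned dict matches the `Key`/`Value` shape of EC2's `create_tags` Tags list.

```python
import shlex


def parse_tag_params(raw_params):
    """Split '"key" "value"' params, keeping quoted strings with spaces together."""
    key, value = shlex.split(raw_params)
    return {"Key": key, "Value": value}
```

Usage: `ec2_client.create_tags(Resources=[resource_id], Tags=[parse_tag_params(raw)])`.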
What it does: Turns on flow logs for a VPC
Settings: Log Group Name: vpcFlowLogs
If traffic type to be logged isn't specified, it defaults to all.
Usage: AUTO: vpc_turn_on_flow_logs traffic_type=<all|accept|reject> destination=<logs|s3> s3_arn=arn:aws:s3:::my-bucket/my-logs/
Example: AUTO: vpc_turn_on_flow_logs traffic_type=all destination=logs
Example: AUTO: vpc_turn_on_flow_logs traffic_type=all destination=s3 s3_arn=arn:aws:s3:::my-bucket/my-logs/
Limitations: none
Sample GSL: VPC should have hasFlowLogs=true
To specify a subfolder in the bucket, use the following ARN format: bucket_ARN/subfolder_name/. For example, to specify a subfolder named my-logs in a bucket named my-bucket, use the following ARN: arn:aws:s3:::my-bucket/my-logs/
Log delivery policy name is set as: vpcFlowLogDelivery
Log delivery role is set as: vpcFlowLogDelivery
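The two destinations translate into different arguments for EC2's `create_flow_logs` call in boto3: CloudWatch Logs needs a log group and a delivery role, while S3 needs only the bucket ARN. `flow_logs_request` is a hypothetical helper; `delivery_role_arn` stands in for the ARN of the vpcFlowLogDelivery role, and defaulting to CloudWatch Logs when no destination is given is an assumption.

```python
def flow_logs_request(vpc_id, params, delivery_role_arn):
    """Build create_flow_logs kwargs from params like 'traffic_type=all destination=s3'."""
    options = dict(param.split("=", 1) for param in params)
    request = {
        "ResourceIds": [vpc_id],
        "ResourceType": "VPC",
        # AWS expects ALL, ACCEPT or REJECT; traffic type defaults to all.
        "TrafficType": options.get("traffic_type", "all").upper(),
    }
    if options.get("destination", "logs") == "s3":  # assumed default: logs
        request["LogDestinationType"] = "s3"
        request["LogDestination"] = options["s3_arn"]
    else:
        request["LogDestinationType"] = "cloud-watch-logs"
        request["LogGroupName"] = "vpcFlowLogs"
        request["DeliverLogsPermissionArn"] = delivery_role_arn
    return request
```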
Updated sg_single_rule_delete to support deleting just a single port from a wider scope of rules (ex: deleting just port 22 from ports 10-30).
2 new permissions are required to support this bot:
ec2:AuthorizeSecurityGroupEgress
ec2:AuthorizeSecurityGroupIngress
Updated vpc_turn_on_flow_logs to support sending logs to S3 instead of CloudWatch logs
Created a new folder called optional_bots. This will not be packaged with the standard Lambda function and will need to be added in manually as required.
Bots that are extremely impactful (s3_delete_bucket, ec2_terminate_instance, etc.) will live here as well as edge case bots that were made for specific customers (ec2_tag_instance_from_vpc).
Contact: Alex Corstorphine