aws-samples / amazon-cloudwatch-auto-alarms

Automatically create and configure Amazon CloudWatch alarms for EC2 instances, RDS, and AWS Lambda using tags for standard and custom CloudWatch Metrics.

License: MIT No Attribution

Python 87.23% Shell 12.77%
aws cloudwatch cloudwatch-alarms cloudwatch-metrics monitoring ec2 aws-lambda aws-ec2 aws-rds

amazon-cloudwatch-auto-alarms's Introduction

CloudWatchAutoAlarms - Automatically create a set of CloudWatch alarms with tagging

CloudWatchAutoAlarms Architecture Diagram

The CloudWatchAutoAlarms AWS Lambda function enables you to quickly and automatically create a standard set of CloudWatch alarms for your Amazon EC2 instances or AWS Lambda functions using tags. It prevents errors that can occur when alarms are created manually, reduces the time required to deploy alarms, and lowers the skill level needed to create and manage them. It can be especially useful during a large migration to AWS, where many resources may be migrated into your AWS account at once.

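The default configuration creates alarms for the following Amazon EC2 metrics for Windows, Amazon Linux, Red Hat, Ubuntu, or SUSE EC2 instances:

  • CPU Utilization
  • CPU Credit Balance
  • Memory utilization (CloudWatch agent metric)
  • Disk space (CloudWatch agent metric)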

The default configuration creates alarms for the following AWS RDS metrics:

  • CPU Utilization

Alarms are created for RDS clusters as well as RDS database instances.

The default configuration also creates alarms for the following AWS Lambda metrics:

  • Errors
  • Throttles

You can change or add alarms by updating the default_alarms dictionary in cw_auto_alarms.py.

The created alarms can be configured to notify an Amazon SNS topic that you specify using the DEFAULT_ALARM_SNS_TOPIC_ARN environment variable. See the Setup section for details.

The Amazon CloudWatch alarms are created when an EC2 instance with the tag key Create_Auto_Alarms enters the running state and they are deleted when the instance is terminated. Alarms can be created when an instance is first launched or afterwards by stopping and starting the instance.

The alarms are created and configured based on EC2 tags which include the metric name, comparison, period, statistic, and threshold.

The tag name syntax for AWS provided metrics is:

AutoAlarm-<Namespace>-<MetricName>-<ComparisonOperator>-<Period>-<EvaluationPeriods>-<Statistic>-<Description>

Where:

  • Namespace is the CloudWatch Alarms namespace for the metric. For AWS provided EC2 metrics, this is AWS/EC2. For CloudWatch agent provided metrics, this is CWAgent by default. You can also specify a different name as described in the Configuration section.
  • MetricName is the name of the metric. For example, CPUUtilization for EC2 total CPU utilization.
  • ComparisonOperator is the comparison to use. It corresponds to the ComparisonOperator parameter in the PutMetricAlarm Amazon CloudWatch API action.
  • Period is the length of time used to evaluate the metric. You can specify an integer value followed by s for seconds, m for minutes, h for hours, d for days, and w for weeks. Your evaluation period should observe CloudWatch evaluation period limits.
  • EvaluationPeriods is the number of periods on which to evaluate the alarm. This property is optional and if it is left out, defaults to 1.
  • Statistic is the statistic for the MetricName specified, other than percentile.
  • Description is the description for the CloudWatch Alarm. This property is optional, and if it is left out then a default description is used.

The tag value is used to specify the threshold. You can also create alarms for custom Amazon CloudWatch metrics.
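
The tag name syntax above can be assembled programmatically. The following is a minimal sketch, not the solution's own code: the helper names build_alarm_tag_key and period_to_seconds are hypothetical, and the conversion of the Period suffix to seconds is an assumption based on the units listed above.

```python
import re

ALARM_SEPARATOR = "-"  # default separator used in the tag name syntax

# Unit suffixes accepted by the Period field, mapped to seconds.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def period_to_seconds(period):
    """Convert a period such as '5m' or '1h' to a number of seconds."""
    match = re.fullmatch(r"(\d+)([smhdw])", period)
    if match is None:
        raise ValueError(f"invalid period: {period!r}")
    value, unit = match.groups()
    return int(value) * _UNIT_SECONDS[unit]


def build_alarm_tag_key(namespace, metric_name, comparison_operator, period,
                        evaluation_periods="1", statistic="Average",
                        description="Created_by_CloudWatchAutoAlarms",
                        identifier="AutoAlarm"):
    """Join the fields in the order given by the tag name syntax."""
    period_to_seconds(period)  # raises ValueError on a malformed period
    return ALARM_SEPARATOR.join([
        identifier, namespace, metric_name, comparison_operator,
        period, str(evaluation_periods), statistic, description,
    ])


if __name__ == "__main__":
    print(build_alarm_tag_key("AWS/EC2", "CPUUtilization",
                              "GreaterThanThreshold", "5m"))
```

The threshold is not part of the key; it is supplied as the tag value.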

For example, one of the preconfigured, default alarms that are included in the default_alarms dictionary is AutoAlarm-AWS/EC2-CPUUtilization-GreaterThanThreshold-5m-1-Average-Created_by_CloudWatchAutoAlarms. When an instance with the tag key Create_Auto_Alarms enters the running state, an alarm for the AWS provided CPUUtilization CloudWatch EC2 metric will be created. Additional alarms will also be created for the EC2 instance based on the platform and alarms defined in the default_alarms python dictionary defined in cw_auto_alarms.py.

Alarms can be updated by changing the tag key or value and stopping and starting the instance.

Requirements

  1. The AWS CLI is required to deploy the Lambda function using the deployment instructions.
  2. The AWS CLI should be configured with valid credentials to create the CloudFormation stack, lambda function, and related resources. You must also have rights to upload new objects to the S3 bucket you specify in the deployment steps.
  3. EC2 instances must have the CloudWatch agent installed and configured with the basic, standard, or advanced predefined metric sets in order for the default alarms for custom CloudWatch metrics to work. Scripts named userdata_linux_basic.sh, userdata_linux_standard.sh, and userdata_linux_advanced.sh are provided to install and configure the CloudWatch agent on Linux based EC2 instances with their respective predefined metric sets.

Setup

There are a number of settings that can be customized by updating the CloudWatchAutoAlarms Lambda function environment variables defined in the CloudWatchAutoAlarms.yaml CloudFormation template. The settings will only affect new alarms that you create so you should customize these values to meet your requirements before you deploy the Lambda function. The following list provides a description of the setting along with the environment variable name and default value:

  • ALARM_TAG: Create_Auto_Alarms
    • The CloudWatchAutoAlarms Lambda function will only create alarms for instances that are tagged with this tag key. The default tag key is Create_Auto_Alarms. If you want to use a different key, change the value of the ALARM_TAG environment variable.
  • CREATE_DEFAULT_ALARMS: true
    • When true, this will result in the default alarm set being created when the Create_Auto_Alarms tag is present. If set to false, then alarms will be created only for the alarm tags defined on the instance.
  • CLOUDWATCH_NAMESPACE: CWAgent
    • You can change the namespace where the Lambda function should look for your CloudWatch metrics. The default CloudWatch agent metrics namespace is CWAgent. If your CloudWatch agent configuration is using a different namespace, then update the CLOUDWATCH_NAMESPACE environment variable.
  • CLOUDWATCH_APPEND_DIMENSIONS: InstanceId, ImageId, InstanceType, AutoScalingGroupName
    • You can add EC2 metric dimensions to all metrics collected by the CloudWatch agent. This environment variable aligns to your CloudWatch configuration setting for append_dimensions. The default setting includes all the supported dimensions: InstanceId, ImageId, InstanceType, AutoScalingGroupName
  • DEFAULT_ALARM_SNS_TOPIC_ARN: arn:${AWS::Partition}:sns:${AWS::Region}:${AWS::AccountId}:CloudWatchAutoAlarmsSNSTopic
    • You can define an Amazon Simple Notification Service (Amazon SNS) topic that the Lambda function will specify as the notification target for created alarms. The deployment instructions include an SNS topic that you can deploy and use with the solution. You provide the Amazon SNS Topic Amazon Resource Name (ARN) with the AlarmNotificationARN parameter when you deploy the CloudWatchAutoAlarms.yaml CloudFormation template.  If you leave the AlarmNotificationARN parameter value blank, then this environment variable is not set and created alarms won't use notifications. The solution also enables you to specify a unique SNS topic per AWS resource by including a tag with key notify with the value set to the SNS topic ARN that should be targeted for alarms for that specific resource.
  • ALARM_IDENTIFIER_PREFIX: AutoAlarm
    • The prefix that is added to the beginning of the name of each CloudWatch alarm created by the solution (e.g. with the prefix "AutoAlarm": AutoAlarm-i-00e4f327736cb077f-CPUUtilization-GreaterThanThreshold-80-5m). You should update this variable via the AlarmIdentifierPrefix parameter in the CloudWatchAutoAlarms.yaml CloudFormation template so that the IAM policy is updated to align with your custom name.

You can update the thresholds for the default alarms by updating the following environment variables:

For Anomaly Detection Alarms:

  • ALARM_DEFAULT_ANOMALY_THRESHOLD: 2

For Amazon EC2:

  • ALARM_CPU_HIGH_THRESHOLD: 75
  • ALARM_CPU_CREDIT_BALANCE_LOW_THRESHOLD: 100
  • ALARM_MEMORY_HIGH_THRESHOLD: 75
  • ALARM_DISK_PERCENT_LOW_THRESHOLD: 20

For AWS RDS:

  • ALARM_RDS_CPU_HIGH_THRESHOLD: 75

For AWS Lambda:

  • ALARM_LAMBDA_ERROR_THRESHOLD: 0
  • ALARM_LAMBDA_THROTTLE_THRESHOLD: 0
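
As an illustration of how these settings could be consumed, here is a minimal sketch that reads the threshold environment variables with the defaults listed above. The function name load_thresholds is hypothetical, and keeping the values as strings is an assumption (tag values are strings); it is not necessarily how cw_auto_alarms.py reads them.

```python
import os

# Environment variable names and default values as listed above.
DEFAULT_THRESHOLDS = {
    "ALARM_DEFAULT_ANOMALY_THRESHOLD": "2",
    "ALARM_CPU_HIGH_THRESHOLD": "75",
    "ALARM_CPU_CREDIT_BALANCE_LOW_THRESHOLD": "100",
    "ALARM_MEMORY_HIGH_THRESHOLD": "75",
    "ALARM_DISK_PERCENT_LOW_THRESHOLD": "20",
    "ALARM_RDS_CPU_HIGH_THRESHOLD": "75",
    "ALARM_LAMBDA_ERROR_THRESHOLD": "0",
    "ALARM_LAMBDA_THROTTLE_THRESHOLD": "0",
}


def load_thresholds(environ=None):
    """Return each threshold, falling back to its default when unset."""
    environ = os.environ if environ is None else environ
    return {name: environ.get(name, default)
            for name, default in DEFAULT_THRESHOLDS.items()}


if __name__ == "__main__":
    for name, value in load_thresholds().items():
        print(f"{name}={value}")
```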

Deploy

  1. Clone the amazon-cloudwatch-auto-alarms github repository to your computer using the following command:

    git clone https://github.com/aws-samples/amazon-cloudwatch-auto-alarms
    
  2. Configure the AWS CLI with credentials for your AWS account. This walkthrough uses temporary credentials provided by AWS Single Sign On using the Command line or programmatic access option. This sets the AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN AWS environment variables with the appropriate credentials for use with the AWS CLI.

  3. Create an Amazon SNS topic that CloudWatchAutoAlarms will use for notifications. You can use this sample Amazon SNS CloudFormation template to create an SNS topic. Leave the OrganizationID parameter blank; it is only used for multi-account deployments.

    aws cloudformation create-stack --stack-name amazon-cloudwatch-auto-alarms-sns-topic \
    --template-body file://CloudWatchAutoAlarms-SNS.yaml \
    --parameters ParameterKey=OrganizationID,ParameterValue="" \
    --region <enter your aws region id, e.g. "us-east-1">
    
  4. Create an S3 bucket that will be used to store and access the CloudWatchAutoAlarms lambda function deployment package if you don't have one. You can use this sample S3 CloudFormation template. You can leave the AWS Organizations ID parameter blank if this lambda function will only be deployed in your current account:

    aws cloudformation create-stack --stack-name amazon-cloudwatch-auto-alarms-s3-bucket \
     --template-body file://CloudWatchAutoAlarms-S3.yaml \
     --parameters ParameterKey=OrganizationID,ParameterValue="" \
     --region <enter your aws region id, e.g. "us-east-1">
    
  5. Update the environment variables in the CloudWatchAutoAlarms CloudFormation template to configure default settings such as alarm thresholds.

  6. Create a zip file containing the CloudWatchAutoAlarms AWS Lambda function code located in the src directory. This is the deployment package that you will use to deploy the AWS Lambda function. On a Mac, you can use the zip command:

    zip -j amazon-cloudwatch-auto-alarms.zip src/*
    
  7. Copy the amazon-cloudwatch-auto-alarms.zip file to your S3 bucket.

    aws s3 cp amazon-cloudwatch-auto-alarms.zip s3://<bucket name>
    

    If you created an S3 bucket using this sample S3 CloudFormation template in step 4, then you can get the bucket name from the AWS Management console or run the following AWS CLI command:

    aws cloudformation describe-stacks --stack-name amazon-cloudwatch-auto-alarms-s3-bucket \
    --query "Stacks[0].Outputs[?ExportName=='amazon-cloudwatch-auto-alarms-bucket-name'].OutputValue" \
    --output text \
    --region <enter your aws region id, e.g. "us-east-1">
    
  8. Deploy the AWS lambda function using the deployment package you uploaded to your S3 bucket:

    aws cloudformation create-stack --stack-name amazon-cloudwatch-auto-alarms \
    --template-body file://CloudWatchAutoAlarms.yaml \
    --capabilities CAPABILITY_IAM \
    --parameters ParameterKey=S3DeploymentKey,ParameterValue=amazon-cloudwatch-auto-alarms.zip \
    ParameterKey=S3DeploymentBucket,ParameterValue=<S3 bucket with your deployment package> \
    ParameterKey=AlarmNotificationARN,ParameterValue=<SNS Topic ARN for Alarm Notifications> \
    --region <enter your aws region id, e.g. "us-east-1">
    

    If you don't want to enable SNS notifications, you can set the ParameterValue to "" for AlarmNotificationARN.

    You can retrieve the SNS Topic ARN from step #3 for the AlarmNotificationARN parameter value by running the following command:

    aws cloudformation describe-stacks --stack-name amazon-cloudwatch-auto-alarms-sns-topic \
    --query "Stacks[0].Outputs[?ExportName=='amazon-cloudwatch-auto-alarms-sns-topic-arn'].OutputValue" \
    --output text --region <enter your aws region id, e.g. "us-east-1">
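
Step 6 above uses the zip command available on a Mac. On systems without it, an equivalent deployment package could be built with Python's standard library. This is a sketch under stated assumptions: the function name build_deployment_package is hypothetical, and it mirrors `zip -j src/*` by placing each regular, non-hidden file from src at the root of the archive.

```python
import zipfile
from pathlib import Path


def build_deployment_package(src_dir="src",
                             zip_path="amazon-cloudwatch-auto-alarms.zip"):
    """Zip the files in src_dir at the archive root, similar to `zip -j`."""
    files = sorted(
        p for p in Path(src_dir).iterdir()
        if p.is_file() and not p.name.startswith(".")  # the shell glob skips dotfiles
    )
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in files:
            # arcname drops the directory prefix so the handler sits at the root
            archive.write(path, arcname=path.name)
    return [p.name for p in files]


if __name__ == "__main__":
    if Path("src").is_dir():
        print(build_deployment_package())
```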
    

Activate

Amazon EC2

In order to create the default alarm set for an Amazon EC2 instance or AWS Lambda function, you simply need to tag the Amazon EC2 instance or AWS Lambda function with the activation tag key defined by the ALARM_TAG environment variable. The default tag activation key is Create_Auto_Alarms.

For Amazon EC2 instances, you must add this tag during instance launch or you can add this tag at any time to an instance and then stop and start the instance in order to create the default alarm set as well as any custom, instance specific alarms.

You can also manually invoke the CloudWatchAutoAlarms lambda function with the following event payload to create / update EC2 alarms without having to stop and start your EC2 instances:

{
  "action": "scan"
}

You can do this with a test execution of the CloudWatchAutoAlarms AWS Lambda function. Open the AWS Lambda Management Console and perform a test invocation from the Test tab with the payload provided here.
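
The same invocation could also be scripted. The sketch below assumes boto3 is installed and that the deployed function is named CloudWatchAutoAlarms (an assumption; check the name in your stack). The helper invoke_scan is hypothetical and takes the Lambda client as a parameter so it can be exercised without AWS access.

```python
import json


def invoke_scan(lambda_client, function_name="CloudWatchAutoAlarms"):
    """Send the scan payload to the function asynchronously."""
    return lambda_client.invoke(
        FunctionName=function_name,  # assumed name; adjust to your deployment
        InvocationType="Event",      # asynchronous; "RequestResponse" waits for the result
        Payload=json.dumps({"action": "scan"}).encode("utf-8"),
    )


# Usage (requires boto3 and valid AWS credentials):
#   import boto3
#   invoke_scan(boto3.client("lambda", region_name="us-east-1"))
```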

The CloudWatchAutoAlarms.yaml template includes two CloudWatch event rules. One invokes the Lambda function on running and terminated instance states. The other invokes the Lambda function on a daily schedule. The daily scheduled event will update any existing alarms and also create any alarms with wildcard tags.
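
To make the two triggers concrete, here is a hedged sketch of how a handler could tell them apart. It is not the solution's handler: classify_event is a hypothetical name, the returned action labels are illustrative, and treating the scheduled rule as sending the scan payload is an assumption. The EC2 event fields follow the standard EC2 Instance State-change Notification format.

```python
def classify_event(event):
    """Return (action, instance_id) for the event shapes described above."""
    # Manual or scheduled scan request.
    if event.get("action") == "scan":
        return ("scan", None)

    # EC2 instance state change delivered by the event rule.
    if (event.get("source") == "aws.ec2"
            and event.get("detail-type") == "EC2 Instance State-change Notification"):
        detail = event.get("detail", {})
        state = detail.get("state")
        if state == "running":
            return ("create", detail.get("instance-id"))
        if state == "terminated":
            return ("delete", detail.get("instance-id"))

    # Anything else is not handled in this sketch.
    return ("ignore", None)


if __name__ == "__main__":
    sample = {
        "source": "aws.ec2",
        "detail-type": "EC2 Instance State-change Notification",
        "detail": {"instance-id": "i-0123456789abcdef0", "state": "running"},
    }
    print(classify_event(sample))
```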

Amazon RDS

For Amazon RDS, you can add this tag to an RDS database cluster or database instance at any time in order to create the default alarm set as well as any custom alarms that have been specified as tags on the cluster or instance.

AWS Lambda

For AWS Lambda, you can add this tag to an AWS Lambda function at any time in order to create the default alarm set as well as any custom, function specific alarms.

Notification Support

You can define an Amazon Simple Notification Service (Amazon SNS) topic that the Lambda function will specify as the notification target for created alarms. The deployment instructions include an SNS topic that you can deploy and use with the solution. You provide the Amazon SNS Topic Amazon Resource Name (ARN) with the AlarmNotificationARN parameter when you deploy the CloudWatchAutoAlarms.yaml CloudFormation template. This parameter sets the DEFAULT_ALARM_SNS_TOPIC_ARN environment variable to the ARN you specified. If you leave the AlarmNotificationARN parameter value blank, then this environment variable is not set and created alarms won't use notifications.

The solution also enables you to specify a unique SNS topic per AWS resource by setting a tag with key notify and the value set to the SNS topic ARN that should be targeted for alarms for that specific resource. For any resources that don't have the notify tag set, the default SNS topic ARN will be used.

You can apply a tagging strategy that includes the notify tag for groups of resources to notify on specific groups of resources. For example, consider a tag with key Team and value Windows. You could align tagging of this specific key / value with the SNS topic for Windows support(e.g. notify: arn:aws:sns:us-east-1:123456789012:WindowsSupport)
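
The precedence described above (a per-resource notify tag first, then the default topic) can be sketched as follows. The function name resolve_sns_topic is hypothetical, and the tag list is assumed to use the Key/Value shape returned by the EC2 API.

```python
def resolve_sns_topic(resource_tags, default_topic_arn=None, notify_key="notify"):
    """Pick the SNS topic ARN for a resource's alarms.

    resource_tags is a list of {"Key": ..., "Value": ...} dictionaries.
    Returns None when neither the tag nor a default topic is set, in which
    case alarms would be created without a notification target.
    """
    for tag in resource_tags:
        if tag.get("Key") == notify_key and tag.get("Value"):
            return tag["Value"]
    return default_topic_arn or None


if __name__ == "__main__":
    tags = [
        {"Key": "Team", "Value": "Windows"},
        {"Key": "notify", "Value": "arn:aws:sns:us-east-1:123456789012:WindowsSupport"},
    ]
    print(resolve_sns_topic(tags, "arn:aws:sns:us-east-1:123456789012:CloudWatchAutoAlarmsSNSTopic"))
```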

Changing the default alarm set

You can add, remove, and customize alarms in the default alarm set. The default alarms are defined in the default_alarms python dictionary in cw_auto_alarms.py.

In order to create an alarm, you must uniquely identify the metric that you want to alarm on. Standard Amazon EC2 metrics include the InstanceId dimension to uniquely identify each standard metric associated with an EC2 instance. If you want to add an alarm based upon a standard EC2 instance metric, then you can use the tag name syntax:

AutoAlarm-AWS/EC2-<MetricName>-<ComparisonOperator>-<Period>-<EvaluationPeriods>-<Statistic>-<Description>

This syntax doesn't include any dimension names because the InstanceId dimension is used for metrics in the AWS/EC2 namespace. These AWS provided EC2 metrics are common across all platforms for EC2.

Similarly, AWS Lambda metrics include the FunctionName dimension to uniquely identify each standard metric associated with an AWS Lambda function. If you want to add an alarm based upon a standard AWS Lambda metric, then you can use the tag name syntax:

AutoAlarm-AWS/Lambda-<MetricName>-<ComparisonOperator>-<Period>-<EvaluationPeriods>-<Statistic>-<Description>

You can add any standard Amazon CloudWatch metric for Amazon EC2 or AWS Lambda into the default_alarms dictionary under the AWS/EC2 or AWS/Lambda dictionary key using this tag syntax.

Wildcard support for dimension values on EC2 instance alarms

The solution allows you to specify a wildcard for a dimension value in order to create CloudWatch alarms for all dimension values. This is particularly useful for creating alarms for all partitions and drives on a system or where the value of a dimension is not known or can vary across EC2 instances.

For example, the CloudWatch agent publishes the disk_used_percent metric for disks attached to a Linux EC2 instance. The dimensions for this metric for Amazon Linux are device name, fstype, and path.

The alarm tag for this metric is hardcoded in the default_alarms python dictionary in cw_auto_alarms.py to create an alarm for the root volume whose default dimensions and values are:

  • device: nvme0n1p1
  • fstype: xfs
  • path: /

This is equivalent to the following default tag in the solution:

AutoAlarm-CWAgent-disk_used_percent-device-nvme0n1p1-fstype-xfs-path-/-GreaterThanThreshold-5m-1-Average-Created_by_CloudWatchAutoAlarms

If you want to alarm on all disks attached to an EC2 instance then you must specify the device name, file system type, and path dimension values for each disk, which will vary. Each EC2 instance may also have a different number of disks and different dimension values.

The solution addresses this requirement by allowing you to specify a wildcard for the dimension value. For example, the alarm tag for disk_used_percent for Amazon Linux specified in the default_alarms dictionary would change to:

                {
                    'Key': alarm_separator.join(
                        [alarm_identifier, cw_namespace, 'disk_used_percent', 'device', '*', 'fstype', 'xfs', 'path',
                         '*', 'GreaterThanThreshold', default_period, default_evaluation_periods, default_statistic,
                         'Created_by_CloudWatchAutoAlarms']),
                    'Value': alarm_disk_used_percent_threshold
                },

This yields the equivalent alarm tag:

AutoAlarm-CWAgent-disk_used_percent-device-*-fstype-xfs-path-*-GreaterThanThreshold-5m-1-Average-Created_by_CloudWatchAutoAlarms

In this example, we have specified a wildcard for the device and path dimensions. With this tag, the solution will query CloudWatch metrics and create an alarm for each unique combination of device and path dimension values on each Amazon Linux instance.

If your EC2 instance had two disks with the following dimensions:

Disk 1

  • device: nvme0n1p1
  • fstype: xfs
  • path: /

Disk 2

  • device: nvme1n1p1
  • fstype: xfs
  • path: /disk2

Then two alarms would be created using a * wildcard for the device and path dimensions:

  • AutoAlarm-<InstanceId>-CWAgent-disk_used_percent-device-nvme0n1p1-fstype-xfs-path-/-GreaterThanThreshold-80-5m-1p-Average-Created_by_CloudWatchAutoAlarms
  • AutoAlarm-<InstanceId>-CWAgent-disk_used_percent-device-nvme1n1p1-fstype-xfs-path-/disk2-GreaterThanThreshold-80-5m-1p-Average-Created_by_CloudWatchAutoAlarms

In order to identify the dimension values, the solution queries CloudWatch metrics to identify all metrics that match the fixed dimension values for the metric name specified. It then iterates through the dimensions whose values are specified as a wildcard to identify the specific dimension values required for the alarm.
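
The matching step described above can be sketched in a few lines. This is an illustration only, not the solution's code: expand_wildcard_dimensions is a hypothetical name, and the metrics argument is assumed to be a list of dictionaries shaped like the results of the CloudWatch ListMetrics API (for example, collected with boto3 list_metrics filtered by InstanceId).

```python
def expand_wildcard_dimensions(metrics, metric_name, dimensions):
    """Resolve '*' dimension values against metrics that already exist.

    dimensions is an ordered list of (name, value) pairs where '*' marks a
    wildcard. Returns one list of (name, value) pairs per unique match.
    """
    fixed = {name: value for name, value in dimensions if value != "*"}
    names = [name for name, _ in dimensions]
    seen, expanded = set(), []
    for metric in metrics:
        if metric.get("MetricName") != metric_name:
            continue
        actual = {d["Name"]: d["Value"] for d in metric.get("Dimensions", [])}
        # Fixed dimension values must match exactly.
        if any(actual.get(name) != value for name, value in fixed.items()):
            continue
        # Every requested dimension, wildcard or not, must be present.
        if any(name not in actual for name in names):
            continue
        combo = tuple((name, actual[name]) for name in names)
        if combo not in seen:
            seen.add(combo)
            expanded.append(list(combo))
    return expanded


if __name__ == "__main__":
    published = [
        {"MetricName": "disk_used_percent", "Dimensions": [
            {"Name": "device", "Value": "nvme0n1p1"},
            {"Name": "fstype", "Value": "xfs"},
            {"Name": "path", "Value": "/"}]},
        {"MetricName": "disk_used_percent", "Dimensions": [
            {"Name": "device", "Value": "nvme1n1p1"},
            {"Name": "fstype", "Value": "xfs"},
            {"Name": "path", "Value": "/disk2"}]},
    ]
    wanted = [("device", "*"), ("fstype", "xfs"), ("path", "*")]
    for combo in expand_wildcard_dimensions(published, "disk_used_percent", wanted):
        print(combo)
```

Each returned combination corresponds to one alarm, which is why the two-disk example above yields two alarms.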

Because the solution relies on the available metrics in CloudWatch, it will only work after the CloudWatch agent has published and sent metrics to the CloudWatch service. Since the solution is designed to run on instance launch, these metrics will not be available on first start since the CloudWatch service will not have received them yet.

In order to resolve this, you should schedule the solution to run on a schedule using the scan payload:

{
"action": "scan"
}

This will provide sufficient time for the CloudWatch agent to publish metrics for new instances. You can schedule the frequency of execution based on the acceptable timeframe for which wildcard based alarms for new instances are not yet created.

Creating CloudWatch Anomaly Detection Alarms

CloudWatch Anomaly Detection Alarms are supported using the comparison operators LessThanLowerOrGreaterThanUpperThreshold, LessThanLowerThreshold, or GreaterThanUpperThreshold.

When you specify one of these comparison operators, the solution creates an anomaly detection alarm and uses the tag value as the anomaly detection threshold. Refer to the CloudWatch documentation for more details on the threshold and anomaly detection.

CloudWatch Anomaly detection uses machine learning models based on the metric, dimensions, and statistic chosen. If you create an alarm without a current model, CloudWatch Alarms creates a new model using these parameters from your alarm configuration.
For new models, it can take up to 3 hours for the actual anomaly detection band to appear in your graph. It can take up to two weeks for the new model to train, so the anomaly detection band shows more accurate expected values. Refer to the documentation for more details.
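
For orientation, an anomaly detection alarm differs from a static threshold alarm in that it references a metric math expression instead of a fixed Threshold value. The sketch below builds parameters in the shape accepted by the CloudWatch PutMetricAlarm API (boto3 put_metric_alarm). The function name anomaly_alarm_params and the ids m1 and ad1 are hypothetical, and this is not necessarily how the solution constructs its alarms.

```python
ANOMALY_OPERATORS = {
    "LessThanLowerOrGreaterThanUpperThreshold",
    "LessThanLowerThreshold",
    "GreaterThanUpperThreshold",
}


def anomaly_alarm_params(alarm_name, namespace, metric_name, dimensions,
                         comparison_operator, band_width,
                         period_seconds=300, evaluation_periods=1,
                         statistic="Average"):
    """Build keyword arguments for an anomaly detection alarm.

    dimensions is a list of {"Name": ..., "Value": ...} dictionaries and
    band_width is the anomaly detection threshold taken from the tag value.
    """
    if comparison_operator not in ANOMALY_OPERATORS:
        raise ValueError(f"not an anomaly detection operator: {comparison_operator}")
    return {
        "AlarmName": alarm_name,
        "ComparisonOperator": comparison_operator,
        "EvaluationPeriods": evaluation_periods,
        "ThresholdMetricId": "ad1",  # points at the band expression below
        "Metrics": [
            {
                "Id": "m1",
                "ReturnData": True,
                "MetricStat": {
                    "Metric": {
                        "Namespace": namespace,
                        "MetricName": metric_name,
                        "Dimensions": dimensions,
                    },
                    "Period": period_seconds,
                    "Stat": statistic,
                },
            },
            {
                "Id": "ad1",
                "ReturnData": True,
                "Expression": f"ANOMALY_DETECTION_BAND(m1, {band_width})",
                "Label": f"{metric_name} (expected)",
            },
        ],
    }


# Usage (requires boto3 and valid AWS credentials):
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**anomaly_alarm_params(...))
```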

The solution includes commented out code for creating a CloudWatch Anomaly Detection Alarm for CPU Utilization in the default_alarms dictionary:

        # This is an example alarm using anomaly detection
        # {
        #     'Key': alarm_separator.join(
        #         [alarm_identifier, 'AWS/EC2', 'CPUUtilization', 'GreaterThanUpperThreshold', default_period,
        #          default_evaluation_periods, default_statistic, 'Created_by_CloudWatchAutoAlarms']),
        #     'Value': alarm_cpu_high_anomaly_detection_default_threshold
        # }

You can uncomment and update this code to test out anomaly detection support.

The solution implements the environment variable ALARM_DEFAULT_ANOMALY_THRESHOLD as an example threshold you can use for your anomaly detection alarms.

Alarming on custom Amazon EC2 metrics

Metrics captured by the Amazon CloudWatch agent are considered custom metrics. These metrics are created in the CWAgent namespace by default. Custom metrics may have any number of dimensions in order to uniquely identify a metric. Additionally, the metric dimensions may be named differently based upon the underlying platform for the EC2 instance.

For example, the metric used to measure disk space utilization is named disk_used_percent in Linux and LogicalDisk % Free Space in Windows. The dimensions are also different. In Linux, you must include the device, fstype, and path dimensions in order to uniquely identify a disk. In Windows, you must include the objectname and instance dimensions.

Consequently, it is more difficult to automatically create alarms across different platforms for custom CloudWatch EC2 instance metrics.

The disk_used_percent metric for Linux has the additional dimensions: 'device', 'fstype', 'path'. For metrics with custom dimensions, you can include the dimension name and value in the tag key syntax:

AutoAlarm-<Namespace>-<MetricName>-<DimensionName-DimensionValue...>-<ComparisonOperator>-<Period>-<EvaluationPeriods>-<Statistic>-<Description>

For example, the tag name used to create an alarm for the average disk_used_percent over a 5 minute period for the root partition on an Amazon Linux instance in the CWAgent namespace is:

AutoAlarm-CWAgent-disk_used_percent-device-xvda1-fstype-xfs-path-/-GreaterThanThreshold-5m-1-Average-exampleDescription

Here the device dimension has a value of xvda1, the fstype dimension has a value of xfs, and the path dimension has a value of /.

This syntax and approach allow you to support metrics with different numbers and names of dimensions. Using this syntax, you can add alarms for metrics with custom dimensions to the appropriate platform in the default_alarms dictionary in cw_auto_alarms.py.
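
One way to read a tag key that carries dimensions is to locate the comparison operator and treat everything between the metric name and the operator as name/value pairs. The sketch below illustrates that idea. It is an assumption about a workable parsing strategy, not the solution's implementation, and parse_alarm_tag_key is a hypothetical name.

```python
COMPARISON_OPERATORS = {
    "GreaterThanOrEqualToThreshold", "GreaterThanThreshold",
    "LessThanThreshold", "LessThanOrEqualToThreshold",
    "LessThanLowerOrGreaterThanUpperThreshold", "LessThanLowerThreshold",
    "GreaterThanUpperThreshold",
}


def parse_alarm_tag_key(tag_key, separator="-"):
    """Split an alarm tag key into its fields, including any dimensions."""
    parts = tag_key.split(separator)
    if len(parts) < 5:
        raise ValueError("tag key has too few fields")
    identifier, namespace, metric_name = parts[:3]
    rest = parts[3:]
    op_index = next(
        (i for i, token in enumerate(rest) if token in COMPARISON_OPERATORS), None)
    if op_index is None:
        raise ValueError("no comparison operator found")
    dim_tokens = rest[:op_index]
    if len(dim_tokens) % 2:
        raise ValueError("dimension names and values must come in pairs")
    tail = rest[op_index + 1:]
    if not tail:
        raise ValueError("missing period")
    return {
        "identifier": identifier,
        "namespace": namespace,
        "metric_name": metric_name,
        "dimensions": list(zip(dim_tokens[0::2], dim_tokens[1::2])),
        "comparison_operator": rest[op_index],
        "period": tail[0],
        # Evaluation periods, statistic, and description; some are optional,
        # so they are returned unparsed in this sketch.
        "remaining": tail[1:],
    }


if __name__ == "__main__":
    key = ("AutoAlarm-CWAgent-disk_used_percent-device-xvda1-fstype-xfs-path-/"
           "-GreaterThanThreshold-5m-1-Average-exampleDescription")
    print(parse_alarm_tag_key(key))
```

Checking that dimension tokens come in pairs up front gives a clear error for a malformed tag instead of a failure later while indexing into the dimension list.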

You should also make sure that the CLOUDWATCH_APPEND_DIMENSIONS environment variable is set correctly in order to ensure that created alarms include these dimensions. The lambda function will dynamically lookup the values for these dimensions at runtime.

If a dimension name or value contains the default separator character '-', then you can update the alarm_separator variable in cw_auto_alarms.py with an alternative separator character such as '~'.

Create a specific alarm for a specific EC2 instance using tags

You can create alarms that are specific to an individual EC2 instance by adding a tag to the instance using the tag key syntax described in changing the default alarm set. Simply add a tag to the instance on launch or restart the instance after you have added the tag. You can also update the thresholds for created alarms by updating the tag values, causing the alarm to be updated when the instance is stopped and started.

For example, to add an alarm for the Amazon EC2 StatusCheckFailed CloudWatch metric for an existing EC2 instance:

  1. On the Tags tab, choose Manage tags, and then choose Add tag. For Key, enter AutoAlarm-AWS/EC2-StatusCheckFailed-GreaterThanThreshold-5m-1-Average-exampleDescription. For Value, enter 1. Choose Save.
  2. Stop and start the Amazon EC2 instance.
  3. After the instance is stopped and restarted, go to the Alarms page in the CloudWatch console to confirm that the alarm was created. You should find a new alarm named AutoAlarm-<InstanceId>-StatusCheckFailed-GreaterThanThreshold-1-5m-1p-exampleDescription.
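
The tagging in step 1 could also be done programmatically. This is a minimal sketch assuming boto3. The helper add_alarm_tag is hypothetical and takes the EC2 client as a parameter so it can be tested without AWS access. The length check reflects the EC2 limit of 128 characters for a tag key, which long alarm tag keys with several dimensions can approach.

```python
MAX_TAG_KEY_LENGTH = 128  # EC2 tag key limit


def add_alarm_tag(ec2_client, instance_id, tag_key, threshold):
    """Attach an alarm tag to an instance; the tag value is the threshold."""
    if len(tag_key) > MAX_TAG_KEY_LENGTH:
        raise ValueError("tag key exceeds the EC2 tag key length limit")
    return ec2_client.create_tags(
        Resources=[instance_id],
        Tags=[{"Key": tag_key, "Value": str(threshold)}],
    )


# Usage (requires boto3 and valid AWS credentials):
#   import boto3
#   add_alarm_tag(
#       boto3.client("ec2", region_name="us-east-1"),
#       "i-0123456789abcdef0",  # hypothetical instance ID
#       "AutoAlarm-AWS/EC2-StatusCheckFailed-GreaterThanThreshold-5m-1-Average-exampleDescription",
#       1,
#   )
```

As with the console steps, the alarm is only created after the instance is stopped and started or after a scan invocation.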

Creating a specific alarm for a specific AWS Lambda function using tags

You can create alarms that are specific to an individual AWS Lambda function by adding a tag to the function using the tag key syntax described in changing the default alarm set.
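
As a sketch of what this could look like with boto3, the Lambda TagResource API takes the function ARN and a dictionary of tags. The helper add_lambda_alarm_tag and the ARN in the usage comment are hypothetical; the tag key follows the AutoAlarm-AWS/Lambda syntax described earlier.

```python
def add_lambda_alarm_tag(lambda_client, function_arn, tag_key, threshold):
    """Attach an alarm tag to a Lambda function; the value is the threshold."""
    return lambda_client.tag_resource(
        Resource=function_arn,
        Tags={tag_key: str(threshold)},
    )


# Usage (requires boto3 and valid AWS credentials):
#   import boto3
#   add_lambda_alarm_tag(
#       boto3.client("lambda", region_name="us-east-1"),
#       "arn:aws:lambda:us-east-1:123456789012:function:my-function",  # hypothetical ARN
#       "AutoAlarm-AWS/Lambda-Errors-GreaterThanThreshold-5m-1-Average-exampleDescription",
#       0,
#   )
```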

Deploying in a multi-region, multi-account environment

You can deploy the CloudWatchAutoAlarms lambda function into a multi-account, multi-region environment by using CloudFormation StackSets.

Follow steps 1 through 7 in the normal deployment process. For step #3 and step #4, enter your AWS Organizations ID for the OrganizationID parameter in the sample S3 CloudFormation template and sample SNS CloudFormation template. This will update the resource policy to allow access to all accounts in your AWS organization.

Continue with the following steps to deploy a service managed AWS StackSet for the CloudWatchAutoAlarms lambda function. This will deploy the CloudWatchAutoAlarms Lambda function into the organization units that you specify. The lambda function will also be automatically deployed to new accounts in the AWS organization.

  1. Use the CloudWatchAutoAlarms CloudFormation template to deploy the Lambda function across multiple regions and accounts in your AWS Organization. This walkthrough deploys a service managed CloudFormation StackSet in the AWS Organizations master account. You must also specify the account ID where the S3 deployment bucket was created so the same S3 bucket is used across account deployments in your organization. Use the following command to deploy the service managed CloudFormation StackSet:

    aws cloudformation create-stack-set --stack-set-name amazon-cloudwatch-auto-alarms \
    --template-body file://CloudWatchAutoAlarms.yaml \
    --capabilities CAPABILITY_NAMED_IAM \
    --auto-deployment Enabled=true,RetainStacksOnAccountRemoval=false \
    --permission-model SERVICE_MANAGED \
    --parameters ParameterKey=S3DeploymentKey,ParameterValue=amazon-cloudwatch-auto-alarms.zip \
    ParameterKey=S3DeploymentBucket,ParameterValue=<S3 bucket with your deployment package> \
    --region <enter your aws region id, e.g. "us-east-1">
    
  2. After the StackSet is created, you can specify the AWS accounts and regions that the StackSet should be deployed to. For service managed StackSets, you specify your AWS Organization ID or AWS Organizational Unit IDs to deploy the lambda function to all current and future accounts within them. Use the following AWS CLI command to deploy the StackSet to your organization / organizational units:

    aws cloudformation create-stack-instances --stack-set-name amazon-cloudwatch-auto-alarms \
    --operation-id amazon-cloudwatch-auto-alarms-deployment-$(date | md5) \
    --deployment-targets OrganizationalUnitIds=<Enter the target OUs where the lambda function should be deployed> \
    --regions <enter the target regions where the lambda function should be deployed e.g. "us-east-1"> \
    --region <enter your aws region id, e.g. "us-east-1">
    

    You can monitor the progress and status of the StackSet operation in the AWS CloudFormation service console.

    Once the deployment is complete, the status will change from RUNNING to SUCCEEDED.

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.

amazon-cloudwatch-auto-alarms's People

Contributors

amazon-auto, czantoine, glennchia, gpang-godaddy, knizami, nizamik, sdhoka

amazon-cloudwatch-auto-alarms's Issues

2 changes: flushed userdata scripts. changed constants to env vars

#39

Description of changes:
src/cw_auto_alarms.py
Took the following constants:
default_period = '5m'
default_evaluation_periods = '1'
default_statistic = 'Average'
Added unique env vars for each: ec2, lambda and rds
For instance, in my shop we do not care about RDS CPU unless it is at 100% for over 90 minutes.

Fleshed out the userdata scripts and used /etc/os-release to make the calls OS dependent, instead of having to comment or uncomment lines.
Even if these are meant as examples to put in your own userdata, this makes life easier when checking out this tool.
userdata_linux_advanced.sh
userdata_linux_basic.sh
userdata_linux_standard.sh

disk_used_percent does not work with device name xvda1 in Amazon Linux AMI

In the CWAgent namespace, the collected metrics do not contain the xvda1 device name but the device it maps to:

[root@ip-10-0-196-70 bin]# ls -la /dev/xvda1
lrwxrwxrwx 1 root root 9 Apr 21 15:25 /dev/xvda1 -> nvme0n1p1
[root@ip-10-0-196-70 bin]# cat /etc/os-release
NAME="Amazon Linux"
VERSION="2"
ID="amzn"
ID_LIKE="centos rhel fedora"
VERSION_ID="2"
PRETTY_NAME="Amazon Linux 2"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
HOME_URL="https://amazonlinux.com/"

Check Platform from instance_info if image_info is unavailable

Problem

Alarm creation depends on determining the Platform of the EC2 instance. Within the code, a call is made to describe_images to extract the Platform. This depends on the AMI still being available in the account. The following cases (not exhaustive) can render an AMI unavailable:

  1. EC2 instance restored from an AWS Backup recovery point that has since exceeded its retention period
  2. EC2 instance that was launched from an AMI that has since been deregistered

This results in the following shown in the console:

Screenshot 2022-08-17 at 1 02 40 AM

The code then sets Platform as None which means no alarms are created.

Proposal

When image_info is unavailable, we can still attempt to extract the platform from instance_info, since the EC2 describe_instances API returns information about it.

Reference PR: #22

Limitation

This only works for Windows, Red Hat, and SUSE platforms. The API does not return enough information to tell whether the EC2 instance runs Ubuntu or Amazon Linux, since both return Linux/UNIX. In that case, users are encouraged to ensure the AMI description or name contains ubuntu. If that is not possible, they will have to customize the code further, for example to extract the Platform from a tag that they define on the EC2 instance.
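The proposal can be sketched as a small fallback that reads the `Platform` and `PlatformDetails` fields of one instance entry from a describe_instances response. This is an illustration, not the code from the referenced PR; the function name and the returned labels are assumptions:

```python
from typing import Optional


def platform_from_instance_info(instance_info: dict) -> Optional[str]:
    """Best-effort platform lookup when the AMI can no longer be described.

    Returns None for generic "Linux/UNIX" instances, because Ubuntu and
    Amazon Linux cannot be told apart from this field alone.
    """
    platform = (instance_info.get("Platform") or "").lower()
    details = (instance_info.get("PlatformDetails") or "").lower()
    if platform == "windows" or "windows" in details:
        return "Windows"
    if "red hat" in details:
        return "Red Hat"
    if "suse" in details:
        return "SUSE"
    return None
```

A caller would try describe_images first and use this function only when no image is returned.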

Lambda ERR >> Failure creating alarm: list index out of range

I have this issue with my Windows platform and the Basic CWAgent namespace:

[ERROR] IndexError: list index out of range
Traceback (most recent call last):
  File "/var/task/cw_auto_alarms.py", line 145, in lambda_handler
    process_alarm_tags(instance_id, instance_info, default_alarms, metric_dimensions_map, sns_topic_arn,
  File "/var/task/actions.py", line 205, in process_alarm_tags
    create_alarm_from_tag(instance_id, alarm_tag, instance_info, metric_dimensions_map, sns_topic_arn, alarm_separator)
  File "/var/task/actions.py", line 165, in create_alarm_from_tag
    val = additional_dimensions[num * 2 + 1]

????
Thanks !
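The traceback ends at `val = additional_dimensions[num * 2 + 1]`, which suggests the tag key supplied a dimension name without a matching value. A hedged sketch of pairing names and values defensively; `pair_dimensions` is a hypothetical helper, not the project's implementation:

```python
from typing import Dict, List


def pair_dimensions(parts: List[str]) -> List[Dict[str, str]]:
    """Turn [name1, value1, name2, value2, ...] into CloudWatch dimension dicts.

    Raises ValueError with a readable message instead of an IndexError when
    a dimension name has no value.
    """
    if len(parts) % 2 != 0:
        raise ValueError("dimension name without a value in {!r}".format(parts))
    return [
        {"Name": parts[i], "Value": parts[i + 1]}
        for i in range(0, len(parts), 2)
    ]
```

Checking the tag key for an odd number of dimension fields is a quick way to confirm whether this is the cause.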

Permissions error when ending instance

Upon terminating an instance the permissions error log is generated for the Lambda function:

[ERROR] 2021-01-21T16:50:16.960Z
Error deleting alarms for instance i-xxxxxxxxxxxxxxx!: An error occurred (AccessDenied) when calling the DescribeAlarms operation: User: arn:aws:sts::xxxxxxxxxxxx:assumed-role/cloudwatch-auto-alarms-CloudWatchAutoAlarmLambdaRo-xxxxxxxxxxx/CloudWatchAutoAlarms is not authorized to perform: cloudwatch:DescribeAlarms on resource: arn:aws:cloudwatch:us-east-1:xxxxxxxxxxxxxxxx:alarm:*
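The error indicates the Lambda role lacks cloudwatch:DescribeAlarms. A sketch of a policy statement that would grant it, built as a Python dict so it can be printed as JSON; the action names are real CloudWatch IAM actions, while the wildcard resource scope is an assumption that should be narrowed to your account and Region where possible:

```python
import json

# Hypothetical statement to attach to the Lambda execution role.
alarm_permissions_statement = {
    "Effect": "Allow",
    "Action": [
        "cloudwatch:DescribeAlarms",
        "cloudwatch:DeleteAlarms",
        "cloudwatch:PutMetricAlarm",
    ],
    # Assumption: all alarms in all Regions and accounts; tighten as needed.
    "Resource": "arn:aws:cloudwatch:*:*:alarm:*",
}

print(json.dumps(alarm_permissions_statement, indent=4))
```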

Anomaly detection alarms

I'm new to monitoring in AWS and have been tasked with setting up anomaly detection alerts. I don't quite understand how to use this project for anomaly detection alarms. How can it be used with a standard deviation calculation, so that it only alerts us when something is "out of band"? I know how to do this in the AWS console but am wondering how I would do it with this repo. Thanks.

Great repo btw.
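For reference, an anomaly detection alarm can be created directly through the CloudWatch PutMetricAlarm API, independently of this project's tag syntax. The sketch below only builds the request parameters; the function name and defaults are assumptions, and `band_width` is the number of standard deviations used for the band:

```python
from typing import Any, Dict, List


def anomaly_alarm_params(alarm_name: str, namespace: str, metric_name: str,
                         dimensions: List[Dict[str, str]], band_width: int = 2,
                         period: int = 300, stat: str = "Average",
                         evaluation_periods: int = 1) -> Dict[str, Any]:
    """Build put_metric_alarm keyword arguments for an anomaly detection band."""
    return {
        "AlarmName": alarm_name,
        # Alarm when the metric leaves the band in either direction.
        "ComparisonOperator": "LessThanLowerOrGreaterThanUpperThreshold",
        "EvaluationPeriods": evaluation_periods,
        "ThresholdMetricId": "ad1",
        "Metrics": [
            {
                "Id": "m1",
                "ReturnData": True,
                "MetricStat": {
                    "Metric": {
                        "Namespace": namespace,
                        "MetricName": metric_name,
                        "Dimensions": dimensions,
                    },
                    "Period": period,
                    "Stat": stat,
                },
            },
            {
                "Id": "ad1",
                "ReturnData": True,
                "Expression": "ANOMALY_DETECTION_BAND(m1, {})".format(band_width),
                "Label": "{} (expected)".format(metric_name),
            },
        ],
    }


# With boto3 installed and credentials configured, the alarm could be created with:
# boto3.client("cloudwatch").put_metric_alarm(**anomaly_alarm_params(...))
```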

Feature Request: Add tags to alarms

It would be great if there were some kind of auto tagging for the auto-generated alarms, e.g.

  • copy the tags of the corresponding EC2 instance to the alarm or
  • stick tags to the alarms using a lambda hook which can be populated by the user
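The first option could be sketched as a helper that converts an instance's tags into the list passed as the `Tags` parameter of put_metric_alarm. The helper name and the default prefix filter are assumptions; tag keys starting with `aws:` are reserved by AWS and are skipped here:

```python
from typing import Dict, List, Tuple


def alarm_tags_from_instance(instance_tags: List[Dict[str, str]],
                             skip_prefixes: Tuple[str, ...] = ("aws:",)
                             ) -> List[Dict[str, str]]:
    """Copy EC2 instance tags, dropping keys that start with a skipped prefix."""
    return [
        {"Key": tag["Key"], "Value": tag["Value"]}
        for tag in instance_tags
        if not tag["Key"].startswith(skip_prefixes)
    ]
```

Passing `skip_prefixes=("aws:", "AutoAlarm")` would also keep the alarm-definition tags off the alarms themselves.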

[ERROR] CloudWatchAutoAlarms Lambda

Hi! I get this error when I test the Lambda. The tag has been added to the instance, with its value set, but no CloudWatch alarms are created.

{
  "errorMessage": "list index out of range",
  "errorType": "IndexError",
  "stackTrace": [
    " File \"/var/task/cw_auto_alarms.py\", line 165, in lambda_handler\n scan_and_process_alarm_tags(create_alarm_tag, default_alarms, metric_dimensions_map, sns_topic_arn,\n",
    " File \"/var/task/actions.py\", line 379, in scan_and_process_alarm_tags\n process_alarm_tags(instance[\"InstanceId\"], instance, default_alarms, metric_dimensions_map,\n",
    " File \"/var/task/actions.py\", line 222, in process_alarm_tags\n create_alarm_from_tag(instance_id, alarm_tag, instance_info, metric_dimensions_map, sns_topic_arn, alarm_separator, alarm_identifier)\n",
    " File \"/var/task/actions.py\", line 167, in create_alarm_from_tag\n val = additional_dimensions[num * 2 + 1]\n"
  ]
}

Log output :

START RequestId: 0813ee3d-98c2-48b6-97e5-838d1806a908 Version: $LATEST
[INFO] 2023-01-24T16:42:36.104Z 0813ee3d-98c2-48b6-97e5-838d1806a908 event received: {'action': 'scan'}
[INFO] 2023-01-24T16:42:37.100Z 0813ee3d-98c2-48b6-97e5-838d1806a908 ImageId is: ami-0cc814d99c59f3789
[INFO] 2023-01-24T16:42:37.483Z 0813ee3d-98c2-48b6-97e5-838d1806a908 Platform is: Amazon Linux
[ERROR] 2023-01-24T16:42:37.484Z 0813ee3d-98c2-48b6-97e5-838d1806a908 Getting dimensions: list index out of range
[ERROR] 2023-01-24T16:42:37.484Z 0813ee3d-98c2-48b6-97e5-838d1806a908 Failure describing reservations : list index out of range
[ERROR] 2023-01-24T16:42:37.484Z 0813ee3d-98c2-48b6-97e5-838d1806a908 Failure creating alarm: list index out of range
[ERROR] IndexError: list index out of range
Traceback (most recent call last):
  File "/var/task/cw_auto_alarms.py", line 165, in lambda_handler
    scan_and_process_alarm_tags(create_alarm_tag, default_alarms, metric_dimensions_map, sns_topic_arn,
  File "/var/task/actions.py", line 379, in scan_and_process_alarm_tags
    process_alarm_tags(instance["InstanceId"], instance, default_alarms, metric_dimensions_map,
  File "/var/task/actions.py", line 222, in process_alarm_tags
    create_alarm_from_tag(instance_id, alarm_tag, instance_info, metric_dimensions_map, sns_topic_arn, alarm_separator, alarm_identifier)
  File "/var/task/actions.py", line 167, in create_alarm_from_tag
    val = additional_dimensions[num * 2 + 1]
END RequestId: 0813ee3d-98c2-48b6-97e5-838d1806a908
REPORT RequestId: 0813ee3d-98c2-48b6-97e5-838d1806a908 Duration: 1420.32 ms Billed Duration: 1421 ms Memory Size: 128 MB Max Memory Used: 80 MB

Thanks for your help!

object is not subscriptable with instance_info['Tags']

Hello, no alarm was generated in my case because the Python scripts ran into errors, specifically with instance_info:
instance_info = check_alarm_tag(instance_id, create_alarm_tag)

instance_info was returned from check_alarm_tag as either a bool or NoneType, and neither is subscriptable in process_alarm_tags(instance_id, instance_info, default_alarms, metric_dimensions_map, sns_topic_arn,
cw_namespace, create_default_alarms_flag, alarm_separator)

Below are the errors in the Lambda log events (the line numbers may not match your scripts, as I added some print lines for debugging purposes):

[ERROR] TypeError: 'bool' object is not subscriptable
Traceback (most recent call last):
  File "/var/task/cw_auto_alarms.py", line 149, in lambda_handler
    process_alarm_tags(instance_id, instance_info, default_alarms, metric_dimensions_map, sns_topic_arn,
  File "/var/task/actions.py", line 190, in process_alarm_tags
    tags = instance_info['Tags']

'NoneType' object is not subscriptable: TypeError
Traceback (most recent call last):
File "/var/task/cw_auto_alarms.py", line 158, in lambda_handler
cw_namespace, create_default_alarms_flag, alarm_separator)
File "/var/task/actions.py", line 190, in process_alarm_tags
tags = instance_info['Tags']
TypeError: 'NoneType' object is not subscriptable

I tried both Python 3.6 and 3.8. Why am I getting the above errors, and how can I resolve them? Thanks!
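Both tracebacks fail at `tags = instance_info['Tags']`, so a guard before that lookup would turn the crash into a skipped instance. A minimal sketch, assuming instance_info is either a dict from describe_instances or a falsy value; `extract_tags` is a hypothetical helper, not part of the project:

```python
from typing import Any, Dict, List


def extract_tags(instance_info: Any) -> List[Dict[str, str]]:
    """Return the instance's tags, or an empty list when no instance dict was found."""
    if not isinstance(instance_info, dict):
        # check_alarm_tag returned False or None: nothing to process.
        return []
    return instance_info.get("Tags", [])
```

An empty result usually means the instance does not carry the tag that enables alarm creation, which is worth checking first.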

Body of email

This isn't an issue but more a question, hope it's ok if I put it here, my apologies and just delete if not.

Is there a way to alter the body of the email that gets sent when an alarm is in alert?

Either way, thank you for the great project!

Can't create alarms with dimensions as maximum length of a tag key is 128 chars

This is a really useful project, but I'm unable to create a disk alarm on Windows servers in an autoscaling group because the tag key can be a max of 128 chars (ref: https://docs.aws.amazon.com/config/latest/APIReference/API_Tag.html)

The tag I'm trying to create is AutoAlarm-CWAgent-LogicalDisk % Free Space-AutoScalingGroupName--objectname-LogicalDisk-instance-C:-LessThanThreshold-10m-1-Minimum-C: drive is less than 30% free for 10 minutes with value 30 but this doesn't seem possible. Even before I get to the description this is 133 chars.

Does this mean this project can only be used for basic alarms? It appears to support dimensions, but it's very easy to exhaust 128 chars.
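A quick way to see how far a proposed tag key is over the 128-character limit before trying to apply it; the helper name is hypothetical:

```python
MAX_TAG_KEY_LENGTH = 128  # limit cited in the AWS Tag API reference above


def tag_key_overflow(key: str, limit: int = MAX_TAG_KEY_LENGTH) -> int:
    """Return how many characters the key exceeds the limit by (0 if it fits)."""
    return max(0, len(key) - limit)
```

Shortening the optional description at the end of the key is the first place to recover characters.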

Alarm is not created. Asking for ways to debug it.

Hello,

I have followed the steps provided, but after creating the tag and restarting the instance, the alarm is still not created.
Also, the tag value does not have a time value set.

Can you please provide a way to debug it?

Thanks,
Yogesh

after the Merge pull request #32 from aws-samples/add_evaluation_periods - the activate function is broken

After merging pull request #32 from aws-samples/add_evaluation_periods, the activate function is broken.
See the error:
Response
{
  "errorMessage": "scan_and_process_alarm_tags() missing 1 required positional argument: 'evaluation_periods'",
  "errorType": "TypeError",
  "stackTrace": [
    " File \"/var/task/cw_auto_alarms.py\", line 181, in lambda_handler\n scan_and_process_alarm_tags(create_alarm_tag, default_alarms, metric_dimensions_map, sns_topic_arn,\n"
  ]
}

Function Logs
START RequestId: 6bb18acb-abee-440d-9738-6cee3f656c87 Version: $LATEST
[INFO] 2023-03-26T10:06:15.966Z 6bb18acb-abee-440d-9738-6cee3f656c87 event received: {'action': 'scan'}
[ERROR] 2023-03-26T10:06:15.966Z 6bb18acb-abee-440d-9738-6cee3f656c87 Failure creating alarm: scan_and_process_alarm_tags() missing 1 required positional argument: 'evaluation_periods'
[ERROR] TypeError: scan_and_process_alarm_tags() missing 1 required positional argument: 'evaluation_periods'
Traceback (most recent call last):
  File "/var/task/cw_auto_alarms.py", line 181, in lambda_handler
    scan_and_process_alarm_tags(create_alarm_tag, default_alarms, metric_dimensions_map, sns_topic_arn,
END RequestId: 6bb18acb-abee-440d-9738-6cee3f656c87
REPORT RequestId: 6bb18acb-abee-440d-9738-6cee3f656c87 Duration: 16.05 ms Billed Duration: 17 ms Memory Size: 128 MB Max Memory Used: 52 MB Init Duration: 271.69 ms

Request ID
6bb18acb-abee-440d-9738-6cee3f656c87

Add EC2 Instance Name Tag to CW Alarm

Hi, would it be possible to pull the EC2 Instance name tag and apply that to the CW Alarm name? I feel this would make identifying the alarm much simpler at a glance, to see what system is alerting.
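Looking up the Name tag from the instance's tag list is straightforward; how the value would be woven into the alarm name is left open here. A sketch with a hypothetical helper name:

```python
from typing import Any, Dict


def instance_name(instance_info: Dict[str, Any], default: str = "") -> str:
    """Return the value of the instance's Name tag, or the default if it has none."""
    for tag in instance_info.get("Tags", []):
        if tag.get("Key") == "Name":
            return tag.get("Value", default)
    return default
```

Falling back to the instance ID as the default keeps alarm names unique for untagged instances.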

Monitor MSK?

Hi,

Is there a simple way to automate the creation of similar alarms for Managed Kafka?

For example, by adding something like this:

    ],
    'AWS/Kafka': [
        {
            'Key': alarm_separator.join(
                [alarm_identifier, 'AWS/Kafka', 'KafkaDataLogsDiskUsed', 'Cluster Name', 'test-cluster', 'Broker ID', '1', 'GreaterThanThreshold', '5m', 'Average']),
            'Value': alarm_kafka_data_logs_disk_used
        },

Thanks in advance

How to add dimensions with - in the value?

I'm trying to work with the following dimensions pulled via AWS CLI on a working alarm that I created manually for this purpose:

user@PC:~# aws cloudwatch describe-alarms --alarm-names "tester"
...
"Dimensions": [
                {
                    "Name": "InstanceId",
                    "Value": "i-111111111111111"
                },
                {
                    "Name": "NodeName",
                    "Value": "ip-123-45-123-45.us-east-1.compute.internal"
                },
                {
                    "Name": "ClusterName",
                    "Value": "some-cluster-name"
                }
            ],
...

I've been working with this in the cw_auto_alarms.py file to rule out any issues with tagging (before I got to dashes likely being the issue) and have it hard coded in there currently like this:

       {
            'Key': 'AutoAlarm-ContainerInsights-node_filesystem_utilization-NodeName-ip-123-45-123-45.us-east-1.compute.internal-ClusterName-some-cluster-name-GreaterThanThreshold-1m-Average',
            'Value': alarm_cpu_high_default_threshold
        }

I have also tried wrapping the values with " and also \" to escape the quotes, but no luck.

The alarm isn't being created, and I have been able to create other alarms without extra dimensions, so I'm assuming this has something to do with the dashes in the values. This is for custom metrics using ContainerInsights, if that helps. Any help is appreciated.
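The snippet below illustrates why dashes inside dimension values are a problem when the key is split on `-`, and how a separator that never occurs in the values avoids it. The code elsewhere on this page passes an `alarm_separator` around, which suggests the separator is configurable; how it is set in your deployment, and the choice of `~`, are assumptions to verify:

```python
key_fields = [
    "AutoAlarm", "ContainerInsights", "node_filesystem_utilization",
    "NodeName", "ip-123-45-123-45.us-east-1.compute.internal",
    "ClusterName", "some-cluster-name",
    "GreaterThanThreshold", "1m", "Average",
]

# With "-" as the separator, the dashes inside the values create extra fields.
dash_parts = "-".join(key_fields).split("-")

# With a separator that does not occur in any value, the fields round-trip intact.
tilde_parts = "~".join(key_fields).split("~")

print(len(key_fields), len(dash_parts), len(tilde_parts))
```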

Adding custom metrics for alarm tracking like nvidia_smi_utilization_gpu

Hello!
I'm trying to add a custom metric that I'm collecting with CWAgent to track GPU utilization, but I'm having a hard time understanding exactly which files I'm supposed to edit, and in what way. Any sort of help would be much appreciated, as I'm really new to DevOps and I'm struggling a lot...
Thank you!
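Going by the tag syntax and the dictionary entries quoted in other issues on this page, a custom metric entry might look like the sketch below. The threshold is hypothetical, and any extra dimensions the agent publishes for GPU metrics are not included; they would have to be added to the key for the alarm to match the metric:

```python
alarm_separator = "-"
alarm_identifier = "AutoAlarm"
cw_namespace = "CWAgent"
alarm_gpu_utilization_threshold = 90  # hypothetical threshold, in percent

# Candidate entry for the default_alarms dictionary in cw_auto_alarms.py.
gpu_alarm = {
    "Key": alarm_separator.join(
        [alarm_identifier, cw_namespace, "nvidia_smi_utilization_gpu",
         "GreaterThanThreshold", "5m", "Average"]),
    "Value": alarm_gpu_utilization_threshold,
}

print(gpu_alarm["Key"])
```

The same key can instead be applied as a tag on a single instance, with the threshold as the tag value.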

Errors when creating alarms

Hi, can you please help me? I have followed the documentation closely, yet I run into two errors. One is this:

[ERROR] 2022-07-17T13:41:04.941Z 5e79d7d4-a54c-4fc0-8176-8acb4d227ad6 Error deleting alarms for i-07b880f05f4b10d42!: An error occurred (AccessDenied) when calling the DeleteAlarms operation: User: arn:aws:sts::502937263541:assumed-role/amazon-cloudwatch-auto-al-CloudWatchAutoAlarmLambd-HOTJT0RIB2HM/CloudWatchAutoAlarms is not authorized to perform: cloudwatch:DeleteAlarms on resource: arn:aws:cloudwatch:ca-central-1:502937263541:alarm: because no identity-based policy allows the cloudwatch:DeleteAlarms action
But that is the more minor issue, because Alarms don't get created to begin with anyway. The main problem is this is happening:

[ERROR] KeyError: 'Description'
Traceback (most recent call last):
  File "/var/task/cw_auto_alarms.py", line 145, in lambda_handler
    process_alarm_tags(instance_id, instance_info, default_alarms, metric_dimensions_map, sns_topic_arn,
  File "/var/task/actions.py", line 191, in process_alarm_tags
    platform = determine_platform(ImageId)
  File "/var/task/actions.py", line 232, in determine_platform
    if 'ubuntu' in image_info['Images'][0]['Description'].lower() or 'ubuntu' in image_info['Images'][0][

It seems there are errors when trying to create the alarms. This happens each time the Lambda function tries to create one.

Also seeing entries such as these in the logs:

[ERROR] 2022-07-17T14:58:51.906Z 2acc20db-165e-4cc9-a729-9f47454bb6aa Failure describing image ami-0ab0f3079b6bb9ec1: 'Description'

Any ideas what is going on here?

Thanks.
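The KeyError arises because not every AMI has a Description field in the describe_images response. A hedged sketch of a lookup that tolerates missing fields; the function names are hypothetical, and checking the image Name as well is an assumption based on the "AMI description or name" wording in another issue:

```python
from typing import Any, Dict


def image_text(image_info: Dict[str, Any]) -> str:
    """Return the lowercased description and name of the first image, if any."""
    images = image_info.get("Images") or []
    if not images:
        return ""
    image = images[0]
    # .get() avoids the KeyError when Description (or Name) is absent.
    return " ".join([image.get("Description", ""), image.get("Name", "")]).lower()


def looks_like_ubuntu(image_info: Dict[str, Any]) -> bool:
    return "ubuntu" in image_text(image_info)
```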

Alarms not creating

Hello,

After performing the steps in part 1, I am able to see metrics under the CWAgent namespace, but no alarms are created.
When I checked the logs of the Lambda function, I found the error below.
cloudwatch_error

Can anyone help me with this?

Thanks
Anmol

[ERROR] Adding auto alarm tags to lambda functions

Hello,

When I add a tag to a Lambda function so that alarms will be created for it, the alarms are never created. On further investigation, this error shows up in the CloudWatch logs.

[ERROR] TypeError: create_alarm() takes 9 positional arguments but 10 were given
Traceback (most recent call last):
  File "/var/task/cw_auto_alarms.py", line 155, in lambda_handler
    process_lambda_alarms(function, tags, create_alarm_tag, default_alarms, sns_topic_arn, alarm_separator, alarm_identifier)
  File "/var/task/actions.py", line 112, in process_lambda_alarms
    create_alarm(AlarmName, MetricName, ComparisonOperator, Period, tag['Value'], Statistic, Namespace,

The solution works perfectly fine with EC2 instances; however, adding the tag to Lambda functions does not seem to work. Has anyone been able to use this solution to add alarms to Lambda functions as well?

Only the last default RDS alarm is created

It appears there's a small indentation error that mistakenly left some items outside the for loop, causing only the last alarm configured in the RDS default alarms to be created.

for tag in default_alarms['AWS/RDS']:
alarm_properties = tag['Key'].split(alarm_separator)
Namespace = alarm_properties[1]
MetricName = alarm_properties[2]
ComparisonOperator = alarm_properties[3]
Period = alarm_properties[4]
EvaluationPeriods = alarm_properties[5]
Statistic = alarm_properties[6]
AlarmName = alarm_separator.join(
[alarm_identifier, db_id, Namespace, MetricName, ComparisonOperator, str(tag['Value']),
Period, "{}p".format(EvaluationPeriods), Statistic])
# capture optional alarm description
try:
AlarmDescription = alarm_properties[7]
AlarmName += alarm_separator + AlarmDescription
except:
logger.info('Description not supplied')
AlarmDescription = None
create_alarm(AlarmName, AlarmDescription, MetricName, ComparisonOperator, Period, tag['Value'], Statistic,
Namespace, dimensions, EvaluationPeriods, sns_topic_arn)

If you compare to the earlier Lambda implementation:

for tag in default_alarms['AWS/Lambda']:
    alarm_properties = tag['Key'].split(alarm_separator)
    Namespace = alarm_properties[1]
    MetricName = alarm_properties[2]
    ComparisonOperator = alarm_properties[3]
    Period = alarm_properties[4]
    # Provide support for previous formatting of custom alarm tags where the evaluation period wasn't specified.
    # If an evaluation period isn't specified in the tag then it defaults to 1, similar to past behavior.
    if alarm_properties[5] in valid_statistics:
        EvaluationPeriods = 1
        eval_period_offset = 0
    else:
        EvaluationPeriods = alarm_properties[5]
        eval_period_offset = 1
    Statistic = alarm_properties[(5 + eval_period_offset)]
    AlarmName = alarm_separator.join(
        [alarm_identifier, function_name, Namespace, MetricName, ComparisonOperator, str(tag['Value']),
         Period, "{}p".format(EvaluationPeriods), Statistic])
    # capture optional alarm description
    try:
        AlarmDescription = alarm_properties[(6 + eval_period_offset)]
        AlarmName += alarm_separator + AlarmDescription
    except:
        logger.info('Description not supplied')
        AlarmDescription = None
    create_alarm(AlarmName, AlarmDescription, MetricName, ComparisonOperator, Period, tag['Value'], Statistic,
                 Namespace,
                 dimensions, EvaluationPeriods, sns_topic_arn)

Greatly appreciate this sample, thanks for your work here!
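A runnable sketch of the RDS loop with every statement, including the `create_alarm` call, indented inside the `for` loop so that one alarm is created per entry. `create_alarm` is stubbed to record its arguments, and the sample entries, `db_id`, and thresholds are hypothetical:

```python
created = []


def create_alarm(*args):
    """Stub standing in for the real create_alarm; it only records the call."""
    created.append(args)


alarm_separator = "-"
alarm_identifier = "AutoAlarm"
db_id = "example-db"
dimensions = [{"Name": "DBInstanceIdentifier", "Value": db_id}]
sns_topic_arn = None
default_alarms = {
    "AWS/RDS": [
        {"Key": "AutoAlarm-AWS/RDS-CPUUtilization-GreaterThanThreshold-5m-1-Average",
         "Value": 75},
        {"Key": "AutoAlarm-AWS/RDS-FreeableMemory-LessThanThreshold-5m-1-Average-low memory",
         "Value": 1000000},
    ]
}

for tag in default_alarms["AWS/RDS"]:
    alarm_properties = tag["Key"].split(alarm_separator)
    Namespace = alarm_properties[1]
    MetricName = alarm_properties[2]
    ComparisonOperator = alarm_properties[3]
    Period = alarm_properties[4]
    EvaluationPeriods = alarm_properties[5]
    Statistic = alarm_properties[6]
    AlarmName = alarm_separator.join(
        [alarm_identifier, db_id, Namespace, MetricName, ComparisonOperator,
         str(tag["Value"]), Period, "{}p".format(EvaluationPeriods), Statistic])
    # Capture the optional alarm description (eighth field of the key).
    try:
        AlarmDescription = alarm_properties[7]
        AlarmName += alarm_separator + AlarmDescription
    except IndexError:
        AlarmDescription = None
    # Inside the loop: called once per configured alarm, not once after the loop.
    create_alarm(AlarmName, AlarmDescription, MetricName, ComparisonOperator, Period,
                 tag["Value"], Statistic, Namespace, dimensions, EvaluationPeriods,
                 sns_topic_arn)

print(len(created))
```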

Add support for LBs

This is a great tool, and highly useful for meeting SOC2 compliance by adding alarms to AWS resources. Would it be possible to add support for load balancers?

Not Sure if an Issue or My Misunderstanding - EC2 Tags

Hi,
I'm using this with some EC2 Windows instances and it works well for monitoring Drive C.
However, some instances have other drives (eg D).
I've tested adding:
{
    'Key': 'AutoAlarm-{}-LogicalDisk % Free Space-objectname-LogicalDisk-instance-D:-LessThanThreshold-5m-Average'.format(
        cw_namespace),
    'Value': alarm_disk_space_percent_free_threshold
},

into cw_auto_alarms.py, which also works, but that would then apply to ALL instances.
I had hoped I could add an EC2 Tag:
AutoAlarm-CWAgent-LogicalDisk % Free Space-objectname-LogicalDisk-instance-D:-LessThanThreshold-5m-Average
With a value of 1.
This doesn't appear to work, so I'm guessing I've misunderstood the README:

The tag name syntax for AWS provided metrics is:

AutoAlarm-<Namespace>-<MetricName>-<ComparisonOperator>-<Period>-<Statistic>

Where:

  • Namespace is the CloudWatch Alarms namespace for the metric. For AWS provided EC2 metrics, this is AWS/EC2. For CloudWatch agent provided metrics, this is CWAgent by default. You can also specify a different name as described in the Configuration section.
  • MetricName is the name of the metric. For example, CPUUtilization for EC2 total CPU utilization.
  • ComparisonOperator is the comparison that should be used, aligning to the ComparisonOperator parameter in the PutMetricAlarm Amazon CloudWatch API action.
  • Period is the length of time used to evaluate the metric. You can specify an integer value followed by s for seconds, m for minutes, h for hours, d for days, and w for weeks. Your evaluation period should observe CloudWatch evaluation period limits.
  • Statistic is the statistic for the MetricName specified, other than percentile.

Can anyone help?

Thanks,
Richie

Create Alarms on All Devices on the Instance

The solution currently offers customers the ability to create CloudWatch alarms on the root device. While this is helpful, I believe many customers would benefit from being able to create CloudWatch alarms on all devices associated with the EC2 instance.

For example, if EBS volumes are attached to an EC2 instance, the root device as well as the devices associated with the volumes would have CloudWatch alarms associated with them.

AccessDenied on creating alarm

On one of three accounts I've deployed this to so far, there is a permissions issue. On all alarm creations there is an error in the lambda logs:

     Error creating alarm AutoAlarm-###-AWS/EC2-EBSReadOps-GreaterThanThreshold-30000-1m-1p-Average!: An error occurred (AccessDenied) when calling the PutMetricAlarm operation: User: arn:aws:sts::###:assumed-role/amazon-cloudwatch-auto-al-CloudWatchAutoAlarmLambd-TCKGG4NPM4VD/CloudWatchAutoAlarms is not authorized to perform: iam:CreateServiceLinkedRole on resource: arn:aws:iam::###:role/aws-service-role/events.amazonaws.com/AWSServiceRoleForCloudWatchEvents because no identity-based policy allows the iam:CreateServiceLinkedRole action

I added this inline policy, retried and the alarms were created:

    {
        "Effect": "Allow",
        "Action": "iam:CreateServiceLinkedRole",
        "Resource": "arn:aws:iam::*:role/aws-service-role/events.amazonaws.com/AWSServiceRoleForCloudWatchEvents"
    }

But when I then removed the policy, it still worked. Then I dropped and recreated the stack, and it still works. It's a head scratcher.

Any ideas?

Update and refresh alarms when an alarm tag is updated

If an alarm tag or the defaults are changed, new alarms are created, but the existing alarms that they are meant to replace are left in place.

There is a scan function; possibly add a refresh that deletes the alarms and re-adds them.

Possibly allow arguments so the function can run with parameters: when run as a Lambda, lambda_handler is called; when run from main, another function is called. I've had code that was used as a Lambda and also called from the CLI.
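The refresh idea boils down to comparing the alarm names that currently exist for a resource with the names the current tags would produce, and deleting the difference. A minimal sketch of that comparison; the helper name is hypothetical:

```python
from typing import Iterable, List


def stale_alarm_names(existing: Iterable[str], desired: Iterable[str]) -> List[str]:
    """Return alarms that exist but are no longer described by the current tags."""
    return sorted(set(existing) - set(desired))


# The result could then be passed to CloudWatch, for example:
# boto3.client("cloudwatch").delete_alarms(AlarmNames=stale_alarm_names(existing, desired))
```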

Disk Alarms not getting created

I have deployed the solution and it works well for the CPU alarms; however, the disk and memory alarms are not working.

When I check the alarms, they show insufficient data; however, in the All metrics section of CloudWatch, disk_free and disk_used do show data.

I went through the documentation again. Do I need to add another tag like the one below and reboot the EC2 instance to generate the alarm?
Secondly, does xvda2 need to be changed to the actual disk name, like nvme0n1p2?
AutoAlarm-CW_Metrics-device-xvda2-fstype-xfs-path-/-GreaterThanThreshold-10m-Average-default1
