mlevit / aws-auto-cleanup

495 stars · 20 watchers · 54 forks · 2.72 MB

Programmatically delete AWS resources based on an allowlist and time to live (TTL) settings

License: MIT License

Python 84.95% CSS 0.33% HTML 9.80% JavaScript 4.92%
aws amazon-web-services cleaner cleanup lambda serverless-framework serverless python3 cloud aws-lambda


aws-auto-cleanup's Issues

Error with serverless deploy

I'm running the latest version of serverless, but seem to get errors on my sls deploy command. I've configured my profile, company, and region in my serverless.yml file. Any ideas what could be causing this error on deploy?

Serverless: Running ...

  Error --------------------------------------------------

  Exception:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/pip/_internal/cli/base_command.py", line 179, in main
    status = self.run(options, args)
  File "/usr/local/lib/python3.7/site-packages/pip/_internal/commands/install.py", line 393, in run
    use_user_site=options.use_user_site,
  File "/usr/local/lib/python3.7/site-packages/pip/_internal/req/__init__.py", line 57, in install_given_reqs
    **kwargs
  File "/usr/local/lib/python3.7/site-packages/pip/_internal/req/req_install.py", line 913, in install
    use_user_site=use_user_site, pycompile=pycompile,
  File "/usr/local/lib/python3.7/site-packages/pip/_internal/req/req_install.py", line 445, in move_wheel_files
    warn_script_location=warn_script_location,
  File "/usr/local/lib/python3.7/site-packages/pip/_internal/wheel.py", line 320, in move_wheel_files
    prefix=prefix,
  File "/usr/local/lib/python3.7/site-packages/pip/_internal/locations.py", line 180, in distutils_scheme
    i.finalize_options()
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/distutils/command/install.py", line 248, in finalize_options
    "must supply either home or prefix/exec-prefix -- not both")
distutils.errors.DistutilsOptionError: must supply either home or prefix/exec-prefix -- not both


     For debugging logs, run again after setting the "SLS_DEBUG=*" environment variable.

  Get Support --------------------------------------------
     Docs:          docs.serverless.com
     Bugs:          github.com/serverless/serverless/issues
     Issues:        forum.serverless.com

  Your Environment Information ---------------------------
     OS:                     darwin
     Node Version:           11.14.0
     Serverless Version:     1.41.0
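
This distutils error is a known pitfall when pip runs under a Homebrew-installed Python on macOS, whose distutils ships preconfigured with a prefix (the traceback's /usr/local/Cellar path points that way). Assuming that is the cause here, a commonly cited workaround is to add an empty install prefix via a setup.cfg in the service directory:

[install]
prefix=

If the serverless-python-requirements plugin is in use, enabling its dockerizePip option is another way around it, since packaging then happens inside a Linux container instead of the host Python.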

CloudFormation resources do not get whitelisted

Describe the bug
The debug logs say that resources within a CloudFormation stack are whitelisted, but when you look at the whitelist itself those resources are not there.

To Reproduce
Steps to reproduce the behavior:
Run the app with CloudFormation deletion enabled.

Expected behavior
Resources within CloudFormation stacks should be whitelisted, but they are not.

Screenshots

Screenshot from 2021-10-12 13-48-11

Screenshot from 2021-10-12 13-48-28

Versions (please complete the following information):

  • Serverless Framework: 2.62.0
  • boto3: 1.17.75
  • botocore: 1.20.75

AWS (please complete the following information):

  • Region: us-west-2

Additional notes

When I invoke the function again, it shows the same "adding to whitelist" message, which confirms that the initial run logs the addition but never actually persists it.
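
For reference, persisting a whitelist entry to the DynamoDB table would look roughly like the sketch below. This is illustrative only, not the project's actual code; the table name and resource_id format follow the conventions visible elsewhere in these issues:

import time
import boto3

def whitelist_resource(resource_id, table="auto-cleanup-whitelist", ttl_days=7):
    """Illustrative: write a temporary whitelist entry that DynamoDB TTL
    will expire after ttl_days."""
    boto3.client("dynamodb").put_item(
        TableName=table,
        Item={
            "resource_id": {"S": resource_id},  # e.g. "ec2:security_group:sg-..."
            "expire_at": {"N": str(int(time.time()) + ttl_days * 86400)},
        },
    )

If a write like this never happens (or targets the wrong table), the "adding to whitelist" log line and the missing entries would be consistent with each other.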

Question: some resources are skipped because of the whitelist even though they are not in the whitelist

Hello,

I have a couple of questions I am trying to figure out:

  1. My whitelist currently looks like this as of now:
    image

In the run logs I see that some resources are skipped because they are whitelisted, but they are not part of any of the stacks in the whitelist. For example:
image

The screenshot alone does not show it, but I can confirm this resource is not in any whitelisted stack.

  2. On the other hand, some resources that ARE in CFN stacks in the whitelist are marked as skipped due to TTL (though I expected them to be marked as skipped due to the whitelist).

Could you please clarify how this works? Maybe I don't understand the logic of how the tool works.

Thank you!
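
One possible explanation, offered as a hypothesis only since the tool's real matching logic may differ: if whitelist entries are compared against resource IDs with a prefix or substring test, a broad entry can shadow resources that merely share a prefix, while stack-level entries may not expand to cover member resources. A toy illustration of that failure mode:

def is_whitelisted(resource_id, whitelist):
    # Hypothetical prefix match: an entry like "ec2:security_group"
    # would shadow every security group, not just one resource.
    return any(resource_id.startswith(entry) for entry in whitelist)

print(is_whitelisted("ec2:security_group:sg-0aaa", {"ec2:security_group"}))  # True

Comparing your whitelist entries character by character against the skipped resource IDs would confirm or rule this out.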

Security group attached to load balancer shows delete action in execution log when it is in use

Describe the bug
The execution log is showing that a security group will be deleted even though it is attached to a load balancer.

To Reproduce
Steps to reproduce the behavior:

  1. Create a new security group and attach it to a load balancer only
  2. Run the cleanup in dry_run mode
  3. Check the execution log entry for the security group
  4. The action says DELETE when it should be skipped because the group is in use

Expected behavior
The action for the security group should say SKIP - IN USE since the security group is in use by a load balancer.

Versions (please complete the following information):

  • Serverless Framework: [e.g. 1.42.3]
  • boto3: [1.9.156]
  • botocore: [1.12.156]
  • moto: [1.3.8]
  • pytest: [4.4.1]

AWS (please complete the following information):

  • Region: us-east-1

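A hedged sketch of one way the in-use check could be broadened (illustrative, not the project's code): every attachment of a security group, including to a load balancer, surfaces as an elastic network interface, so querying ENIs catches uses that an EC2-instance-only check misses:

import boto3

def security_group_in_use(group_id, region="us-east-1"):
    """Illustrative: a group attached to anything (EC2, ELB, RDS, Lambda,
    etc.) appears on at least one elastic network interface."""
    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.describe_network_interfaces(
        Filters=[{"Name": "group-id", "Values": [group_id]}]
    )
    return len(response["NetworkInterfaces"]) > 0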

Terraform state parser to determine whitelist

It might be useful to have an automated way to build the whitelist map, e.g. the ability to pass an argument specifying an S3 bucket containing .tfstate files and iterate over them 🤔
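
A rough sketch of what that could look like, assuming the Terraform 0.12+ state format with a top-level "resources" list (the helper is illustrative; mapping Terraform types onto the tool's service:resource:id whitelist convention would still be needed):

import json
import boto3

def tfstate_resource_ids(bucket):
    """Illustrative: yield (type, physical id) for every resource recorded
    in every .tfstate object in the given bucket."""
    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            if not obj["Key"].endswith(".tfstate"):
                continue
            body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read()
            for resource in json.loads(body).get("resources", []):
                for instance in resource.get("instances", []):
                    resource_id = instance.get("attributes", {}).get("id")
                    if resource_id:
                        yield resource["type"], resource_id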

Deploying the api component throws an error: "Rest API id could not be resolved."

Hello,

While trying to set up a pipeline to deploy and run aws-auto-cleanup, I noticed that when deploying the api component with

npm run deploy -- --region ap-southeast-2

the stack appears to deploy fine, including all of its resources.

But in the very end npm throws this error:
Rest API id could not be resolved.

and exits with status 1 (error)

Attaching full log:
apideploylogs.txt

When I deployed initially from my local machine it didn't matter much, as I verified afterwards that everything deployed fine and was working. But now it is failing the pipeline.

What exactly is it trying to resolve and why?
Thanks!

Modify the account number header in the top-right corner to show the account alias as well

Is your feature request related to a problem? Please describe.
No.

Describe the solution you'd like
I would like for the header in the top right that shows the account number to show the account alias as well.

Describe alternatives you've considered
An output for the account alias could be added and then be referred to in the index.js file.

Additional context
Attached screenshot of header location.
account-number-header
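
For reference, the alias is available from IAM (the API is global, and an account has at most one alias). A minimal sketch of fetching it, assuming the backing Lambda uses boto3:

import boto3

def get_account_alias():
    """Illustrative: return the account alias, or None if none is set."""
    aliases = boto3.client("iam").list_account_aliases()["AccountAliases"]
    return aliases[0] if aliases else None

The value could then be exposed as an additional output for index.js to render next to the account number, as suggested above.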

For ec2:security_group, the status changes depending on run mode

Describe the bug
When I run the tool in dry_run mode, a security group attached to an EC2 instance (itself skipped due to TTL) is marked DELETE.
When I run the tool in destroy mode, the same security group is marked SKIP - IN USE.

To Reproduce
Create an EC2 instance (with volume and SG attached by default)
Run in dry run mode, run in destroy mode, compare logs

Expected behavior
In both cases the security group should be marked SKIP - IN USE.

Screenshots
image

[ERROR] Could not generate resource tree. (lambda_handler.py, build_tree(), line 352)

Everything runs okay, but at the end the resource tree cannot be created and copied to the S3 bucket.
Below are the last lines of the Lambda function output.

[INFO] Auto Cleanup completed. (lambda_handler.py, run_cleanup(), line 168)
00:40:31
[ERROR] Could not generate resource tree. (lambda_handler.py, build_tree(), line 352)
00:40:31
[ERROR] invalid syntax (_base.py, line 414) (lambda_handler.py, build_tree(), line 353)
00:40:31
END RequestId: 4d42e55b-b159-4524-bf4e-148447aa73ac

Restrict API access by IP

At the moment, when you deploy the API there is no security mechanism in place, so anyone who can reach the API can change the whitelist and view the resources.

A quick solution to block the access would be via IP restriction, so that the frontend web app wouldn't require any changes.

To restrict access by IP, it would be possible to use the resource policy for the API gateway and set it up via serverless with something like this

  resourcePolicy:
    - Effect: Allow
      Principal: '*'
      Action: execute-api:Invoke
      Resource:
        - execute-api:/*/*/*
      Condition:
        IpAddress:
          aws:SourceIp:
            - ${self:custom.ipRange}

The changes to serverless.yml in the api directory would then look like this:

custom:
  log_level: INFO # DEBUG for dev | INFO for prod
  region: ${opt:region, "ap-southeast-2"} # AWS deployment region
  manifest:
    output: ../web/src/serverless.manifest.json
    silent: true
  pythonRequirements:
    layer:
      name: ${self:service}-${self:provider.stage}-requirements
      compatibleRuntimes:
        - python3.8
    noDeploy:
      - boto
      - boto3
      - botocore
    slim: true
  ipRange: ${opt:ip, "0.0.0.0/0"} # overwrite via CLI "--ip 50.1.0.0/24"

provider:
  name: aws
  runtime: python3.8
  stage: ${opt:stage, "prod"} # overwrite via CLI "--stage dev"
  region: ${self:custom.region}
  profile: ${opt:profile, ""} # overwrite via CLI "--aws-profile saml"
  resourcePolicy:
    - Effect: Allow
      Principal: '*'
      Action: execute-api:Invoke
      Resource:
        - execute-api:/*/*/*
      Condition:
        IpAddress:
          aws:SourceIp:
            - ${self:custom.ipRange}
  apiGateway:
    minimumCompressionSize: 1024

How to hide whitelist entries/items that are in the permanent whitelist?

Is your feature request related to a problem? Please describe.
No. Sorry to keep bombarding you with questions; I am trying to hide the items that are part of the permanent whitelist. I've been experimenting with changes to the index files and the CSS file, but have not gotten this working.

Describe the solution you'd like
Hide the items in the permanent whitelist.

Describe alternatives you've considered
A drop-down could be added for the permanent whitelist section so that the permanent entries are only shown when the drop-down is clicked.

Web app to expose DynamoDB settings and whitelist tables

Auto Cleanup interacts with two tables that live within DynamoDB (auto-cleanup-settings and auto-cleanup-whitelist). In order to enable users to interact with both of these tables in a more user friendly way, a simple to use web app (preferably static in nature) would need to be built.

Settings

The settings page should simply expose the table, allowing addition, modification and deletion of rows.

Whitelist

The whitelist page should allow addition, modification and deletion of rows but should also ensure the expire_at column is automatically populated with an EPOCH timestamp of today plus 7 days (this ensures no one except admins can whitelist a resource permanently).

Against each whitelist record, users should have the ability to extend their whitelisting from today to today plus 7 days. This ensures that, without interaction from the users, their whitelist records will be removed by DynamoDB (via TTL) and the resources consequently removed by Auto Cleanup.
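
A minimal sketch of the expire_at handling described above (illustrative; it assumes the table's DynamoDB TTL attribute is expire_at and uses update_item to implement the extend action):

import time
import boto3

SEVEN_DAYS = 7 * 24 * 60 * 60

def extend_whitelist_entry(resource_id, table="auto-cleanup-whitelist"):
    """Illustrative: reset expire_at to now + 7 days, so DynamoDB TTL
    removes the entry unless the user keeps extending it."""
    boto3.client("dynamodb").update_item(
        TableName=table,
        Key={"resource_id": {"S": resource_id}},
        UpdateExpression="SET expire_at = :e",
        ExpressionAttributeValues={":e": {"N": str(int(time.time()) + SEVEN_DAYS)}},
    )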

Question: when the TTL for EC2 instances is removed, they are still skipped due to TTL

I wanted to simulate the situation when a newly created ec2 instance will be destroyed as a result of the run.

I created the instance and removed the TTL setting for EC2 instances in dynamodb altogether so that even newly created instances could be destroyed.

...
"instance": {
    "clean": true,
    "id": "Instance ID"
},
...

When running in destroy mode afterwards, I see that the EC2 instance is still being skipped because of TTL.
image

Does it use some default TTL if none is specified?
How can I simulate a situation when TTL is not applied to a certain resource type?

Thanks!
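
One possible explanation, offered as a hypothesis only (check the actual settings-handling code): if the cleanup reads the ttl key with a dictionary-style fallback, deleting the key silently reinstates a default rather than disabling the check:

# Hypothetical illustration of a default-fallback lookup. Removing the
# "ttl" key from the settings would not disable the check; it falls back.
DEFAULT_TTL_DAYS = 7  # illustrative default, not necessarily the project's

settings = {"clean": True, "id": "Instance ID"}  # "ttl" key removed
ttl = settings.get("ttl", DEFAULT_TTL_DAYS)
print(ttl)  # 7 -- the resource is still evaluated against a TTL

If that is the behaviour, setting ttl to 0 (rather than deleting the key) may be the way to make brand-new instances eligible for deletion.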

Update DynamoDB whitelist Table

The whitelist table seems to accept additions but not removals.
For example, adding a new resource to the whitelist works fine, but if we update the resource through JSON, the change does not seem to be applied.

Security groups added to the whitelist show a blank action in the execution log when it should say "SKIP - WHITELIST"

Describe the bug
When a security group is added to the whitelist, the action in the execution log is blank.

To Reproduce

  1. Add a security group to the whitelist
  2. Run the cleanup with dry-run on/off
  3. Check the execution log action for the security group added to the whitelist
  4. The action appears blank

Expected behavior
The execution log action should say "SKIP - WHITELIST"

AWS (please complete the following information):

  • Region: us-east-1

What needs to be changed to change the name of the execution log CSV files?

Is your feature request related to a problem? Please describe.
No, I am just trying to change the name of the execution log CSV files.

Describe the solution you'd like
I am trying to change the name of the execution log CSV files, and I changed the value in the app/src/main.py file. Below is what I changed:

Line 527
key = f"""{now.strftime("%Y")}/{now.strftime("%m")}/execution_log_{now.strftime("%Y_%m_%d_%H_%M_%S")}.csv"""

After I changed the line above, the API stopped working and no files come up; a 400 Bad Request status code is returned. I think something else needs to be changed in the API source code or the index files. I searched the repo for other places that might need updating but could not find any. I'm probably missing something simple, but I'd appreciate any help on this.
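
One plausible cause, stated as a hypothesis only (verify against the API code): if any consumer of the bucket filters objects by the original filename prefix, renaming the files breaks the lookup. A toy illustration of such a filter going stale:

# Hypothetical: a key filter like this (in the API or front end) would
# stop matching once "execution_log_" is renamed in main.py.
keys = [
    "2024/01/execution_log_2024_01_05_10_00_00.csv",
    "2024/01/cleanup_report_2024_01_06_10_00_00.csv",  # renamed scheme
]
print([k for k in keys if "execution_log_" in k])  # only the old scheme matches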

EC2 instances are not being detected as part of the cleanup.

I am focusing on cleaning up EC2 resources, including elastic IPs, images, instances, security groups, snapshots, and volumes. When I run the cleanup, all of the EC2 resources appear in the execution log except EC2 instances. I verified that the AWS account has EC2 instances older than the TTL I set, that the role running the app Lambda has the required EC2 access, and that no errors show up in the Lambda logs. Any ideas on what may be causing this?

Error when invoking "TypeError: argument of type 'NoneType' is not iterable"

Describe the bug

Invoking a cleanup with
npm run logs -- --region ap-southeast-2 --aws-profile rsadmin

Getting this error in the process:
[INFO] Switching to 'ap-southeast-2' region. (main.py, run_cleanup(), line 66)
Exception in thread Thread-106:
Traceback (most recent call last):
  File "/var/lang/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/var/lang/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/var/task/src/cloudformation_cleanup.py", line 255, in delete_stack
    if "/" in resource_child_id:
TypeError: argument of type 'NoneType' is not iterable

Full log with debug level attached:
output.txt

I suspect it is failing on a certain resource; how do I figure out which one exactly?
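
The traceback shows resource_child_id arriving as None at cloudformation_cleanup.py line 255. A defensive sketch of the kind of guard that would both avoid the crash and log the offending resource (the function and its logging are illustrative, with names borrowed from the traceback):

import logging

def split_resource_child_id(resource_child_id, stack_name="<stack>"):
    """Sketch: tolerate stack resources whose physical ID is None
    (e.g. resources that never finished creating) instead of raising."""
    if resource_child_id is None:
        logging.warning("Resource in stack %s has no physical ID; skipping.", stack_name)
        return []
    if "/" in resource_child_id:
        return resource_child_id.split("/")
    return [resource_child_id]

Logging the stack name and resource just before the "/" test would answer the "which resource?" question directly.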

Deleting resources with resource based policies fails

I noticed issues with the deletion of KMS keys (and potentially any other resource that has a resource-based access policy).

image

Although the Lambda role is granted permissions to * through CloudFormation, the KMS key policy allows only certain roles to delete or access the key.

What is the suggested approach with such cases?

[ERROR] [Errno 2] No such file or directory: 'auto_cleanup/data/auto-cleanup-settings.json' (lambda_handler.py, setup_dynamodb(), line 283)

Describe the bug
After deployment, the Lambda first executed in 'DRY' mode, as is the default. I then changed the execution rate to 'every 5 minutes' and the mode to 'DESTROY'. It executed successfully the first time and cleaned up the resources. After that, it started logging this error message on every subsequent execution.

START RequestId: 4438f2ea-325f-4e23-b75c-03dc8eaf7c23 Version: $LATEST
[ERROR] [Errno 2] No such file or directory: 'auto_cleanup/data/auto-cleanup-settings.json' (lambda_handler.py, setup_dynamodb(), line 283)
[INFO] Auto Cleanup started in DESTROY mode. (lambda_handler.py, run_cleanup(), line 45)
[INFO] Switching region to 'ap-south-1'. (lambda_handler.py, run_cleanup(), line 49)
[INFO] Switching region to 'eu-north-1'. (lambda_handler.py, run_cleanup(), line 49)
[INFO] Switching region to 'eu-west-3'. (lambda_handler.py, run_cleanup(), line 49)
[INFO] Switching region to 'eu-west-2'. (lambda_handler.py, run_cleanup(), line 49)
[INFO] Switching region to 'eu-west-1'. (lambda_handler.py, run_cleanup(), line 49)
[INFO] Skipping region 'ap-northeast-3'. (lambda_handler.py, run_cleanup(), line 152)
[INFO] Switching region to 'ap-northeast-2'. (lambda_handler.py, run_cleanup(), line 49)
[INFO] Switching region to 'ap-northeast-1'. (lambda_handler.py, run_cleanup(), line 49)
[INFO] Switching region to 'ca-central-1'. (lambda_handler.py, run_cleanup(), line 49)
[INFO] Switching region to 'sa-east-1'. (lambda_handler.py, run_cleanup(), line 49)
[INFO] Skipping region 'cn-north-1'. (lambda_handler.py, run_cleanup(), line 152)
[INFO] Switching region to 'ap-southeast-1'. (lambda_handler.py, run_cleanup(), line 49)
[INFO] Switching region to 'ap-southeast-2'. (lambda_handler.py, run_cleanup(), line 49)
[INFO] Switching region to 'eu-central-1'. (lambda_handler.py, run_cleanup(), line 49)
[INFO] Switching region to 'us-east-1'. (lambda_handler.py, run_cleanup(), line 49)
[INFO] Switching region to 'us-east-2'. (lambda_handler.py, run_cleanup(), line 49)
[INFO] Skipping region 'cn-northwest-1'. (lambda_handler.py, run_cleanup(), line 152)
[INFO] Switching region to 'us-west-1'. (lambda_handler.py, run_cleanup(), line 49)
[INFO] Switching region to 'us-west-2'. (lambda_handler.py, run_cleanup(), line 49)
[INFO] Switching region to 'global'. (lambda_handler.py, run_cleanup(), line 155)
[INFO] Skipping cleanup of S3 Buckets. (s3_cleanup.py, buckets(), line 186)
[INFO] Skipping cleanup of IAM Roles. (iam_cleanup.py, roles(), line 286)
[INFO] Auto Cleanup completed. (lambda_handler.py, run_cleanup(), line 171)
[INFO] Resource tree has been built and uploaded to S3 's3://auto-cleanup-prod-resourcetreebucket-kiliev/resource_tree_2020_05_20_22_31_50.txt. (lambda_handler.py, build_tree(), line 349)
END RequestId: 4438f2ea-325f-4e23-b75c-03dc8eaf7c23
REPORT RequestId: 4438f2ea-325f-4e23-b75c-03dc8eaf7c23 Duration: 81231.30 ms Billed Duration: 81300 ms Memory Size: 128 MB Max Memory Used: 126 MB
XRAY TraceId: 1-5ec5af85-16cceece5b61aa7ad3ac5956 SegmentId: 3e54e9ca12b29144 Sampled: true

Generate a report on what resources are deleted and what are whitelisted with some level of operational detail

Is your feature request related to a problem? Please describe.
We are frustrated when we need to generate a report on actions, e.g. what resources were deleted or whitelisted and when. CloudWatch Logs appears to have everything logged, including when a resource is cleaned up or deleted, but it is all plain text and not straightforward to extract those actions from the logs in an organized way.

Describe the solution you'd like
If the actions can be recorded on a DynamoDB table with a few attributes, e.g. resource type, resource id, account id, region, type_of_action (deleted or whitelisted) and timestamp, that would be great for governance and reporting.

Describe alternatives you've considered
At a minimum, if the relevant logs could be redirected and saved as files in a specified S3 bucket, that would be a great start.

Additional context
I hope this FR makes sense. My apologies if the request has been implemented already or it has been discussed before and was decided not to move forward. Please advise. Thank you.

Send email notification with exec log report attached 3 days before the actual cleanup takes place

Is your feature request related to a problem? Please describe.

Yes, the problem is that when dry-run is off, you do not know what actions the cleanup will take until it actually runs. Currently, dry-run mode can be turned on to view the execution log, but I would like to keep dry-run off, and also generate a report prior to the cleanup taking place. The goal is to allow for some time to review the execution log and determine what needs to be whitelisted before the cleanup destroys the resources without needing to manually change the dry-run mode.

Describe the solution you'd like

When the cleanup is running in destroy mode, it first runs in dry-run mode to generate the execution log, and then X number of days later, it will run in destroy mode.

When the execution log is generated by the first run, it is placed in the “dry-run” folder, and an S3 event notification is triggered based on the object creation that sends the report in an email to the address specified.

You now have X days to review the execution log and see what resources need to be added to the whitelist, before the cleanup runs in destroy mode and the execution log is placed in the “destroy” folder and another email notification is sent with the post-cleanup execution log.

I may be overthinking this, and there may be a simpler approach, but I’m open to any ideas to address this without having to manually switch the dry-run toggle and send an email notification with the pre-cleanup and post-cleanup execution log.

Describe alternatives you've considered

Not sure of other alternatives, but the goal is to add a buffer between when the execution log is generated and when the cleanup actually runs so that there is some time to review the execution log prior to the cleanup destroying any resources while keeping dry-run off.

Additional context

Example logic below:

Dry-run is off

  1. Cleanup runs in dry-run mode to generate execution log.
  2. The execution log is added to the S3 bucket in the “dry-run” folder and an event is triggered that emails the report (see the sketch after these lists).
  3. Cleanup runs again in 3 days (configurable) in destroy mode, places the execution log in the “destroy” folder, and emails the report.

Dry-run on

  1. Cleanup runs in dry-run mode to generate execution log.
  2. The execution log is added to the S3 bucket in the “dry-run” folder and an event is triggered that emails the report.
  3. Cleanup runs again in 3 days (configurable) in dry-run mode, places the execution log in the “dry-run” folder and emails the report.
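
A minimal sketch of the notification step described above (illustrative; it assumes an S3 event notification on the execution-log bucket invokes a Lambda, and that SES has a verified sender; the addresses are placeholders):

import boto3

def lambda_handler(event, context):
    """Illustrative: email a pointer to each newly created execution log."""
    ses = boto3.client("ses")
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        ses.send_email(
            Source="auto-cleanup@example.com",  # placeholder sender
            Destination={"ToAddresses": ["team@example.com"]},  # placeholder
            Message={
                "Subject": {"Data": f"Auto Cleanup execution log: {key}"},
                "Body": {"Text": {"Data": f"New execution log: s3://{bucket}/{key}"}},
            },
        )

The "X days later" destroy run could then be a second scheduled rule offset from the dry-run schedule.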

Question: DELETE - NOT CONFIRMED

I have a question about this status: what exactly does it mean?

I ran in destroy mode, and a resource marked with this status was not deleted.

What determines that?

Question/Feature Request

Just a quick question. I am looking into implementing this in my environment, but I was wondering whether there is a way to adapt it to use tags instead of TTL.

The reason behind this is that we often have EC2 instances brought up with incorrect tags, or in some cases large EC2 instances that have simply been forgotten. This is becoming quite costly. With the ability to drive this cleanup from tags, I could eliminate the problem altogether by having people stick to a tagging standard or have their EC2 instance removed should they not follow it (see the sketch below).
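
For reference, selecting instances by tag is straightforward with boto3. A hedged sketch of what a tag-based rule could look like (the required tag keys are illustrative, this is not an existing feature of the tool, and pagination is omitted for brevity):

import boto3

REQUIRED_TAGS = {"Owner", "CostCenter"}  # illustrative tagging standard

def untagged_instance_ids(region="us-east-1"):
    """Illustrative: return IDs of instances missing any required tag."""
    ec2 = boto3.client("ec2", region_name=region)
    offenders = []
    for reservation in ec2.describe_instances()["Reservations"]:
        for instance in reservation["Instances"]:
            tags = {t["Key"] for t in instance.get("Tags", [])}
            if not REQUIRED_TAGS.issubset(tags):
                offenders.append(instance["InstanceId"])
    return offenders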

Cognito support for Web feature

Is your feature request related to a problem? Please describe.
Add a secondary module that secures the Web deployment with AWS Cognito.

Describe the solution you'd like
For the API: limit access to the web portion only, using an AWS IAM role and associated policy to add items to the DDB table/whitelist.
For Web - AWS Cognito secured through a user pool

Describe alternatives you've considered
A JWT authorizer, but I figured it would be easier to limit API access through an AWS IAM role with an IAM policy granting access.

Additional context
Current access for the web app appears to be wide open; this feature would restrict it.

[ERROR] Could not read DynamoDB table 'auto-cleanup-whitelist-prod'.

Describe the bug
The get_whitelist() method of lambda_handler.py is unable to read data from the auto-cleanup-whitelist-prod table, even though the table exists.

Expected behavior
The get_whitelist() method of lambda_handler.py should be able to read data from the auto-cleanup-whitelist-prod table.

Screenshots
image

Versions (please complete the following information):

  • Serverless Framework: [e.g. 1.58.0]

AWS (please complete the following information):

  • Region: [e.g. us-east-2]

serverless remove fails when trying to delete a non empty S3 bucket

mba:aws-auto-cleanup nico$ serverless remove
Serverless: Getting all objects in S3 bucket...
Serverless: Removing objects in S3 bucket...
Serverless: Removing Stack...
Serverless: Checking Stack removal progress...
.......
Serverless: Operation failed!
Serverless: View the full error output: https://us-east-2.console.aws.amazon.com/cloudformation/home?region=us-east-2#/stack/detail?stackId=auto-cleanup-production

  Serverless Error ---------------------------------------

  An error occurred: ResourceTreeBuckett - The bucket you tried to delete is not empty (Service: Amazon S3; Status Code: 409; Error Code: BucketNotEmpty; Request ID: 67617758934444FC; S3 Extended Request ID: HDY+1eCAEY8Ei2tmf2ufRlRFtnd0pmvVYa/4wiAGCMH+mOmH7Eq/rGa1zRVXt4pOLElsxmJEzYk=).

  Get Support --------------------------------------------
     Docs:          docs.serverless.com
     Bugs:          github.com/serverless/serverless/issues
     Issues:        forum.serverless.com

  Your Environment Information ---------------------------
     OS:                     darwin
     Node Version:           11.12.0
     Serverless Version:     1.41.1
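
CloudFormation refuses to delete a non-empty bucket, and the Serverless Framework only empties its own deployment bucket, not the resource-tree bucket. A hedged workaround sketch: empty that bucket yourself before running serverless remove (the bucket name below is a placeholder; versioned buckets also need their object versions deleted):

import boto3

def empty_bucket(name):
    """Illustrative: delete all objects (and versions, if any) so
    CloudFormation can delete the bucket."""
    bucket = boto3.resource("s3").Bucket(name)
    bucket.objects.all().delete()
    bucket.object_versions.all().delete()  # no-op on unversioned buckets

empty_bucket("auto-cleanup-production-resourcetreebucket-...")  # placeholder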

Execution log is showing resource is in whitelist, but resource is not in the temporary or permanent whitelist

Describe the bug
The execution log is showing that an EC2 instance is part of the whitelist (the action appears as SKIP - WHITELIST), but the instance is not part of the temporary or permanent whitelist, and does not show up in the whitelist DynamoDB table.

To Reproduce
I am not sure what caused this. This came up because a user was trying to add the instance to the whitelist from the execution log, but they were not able to since according to the exec log it was already part of the whitelist, when it was actually not.

I checked CloudTrail to see if there was any activity that updated the whitelist DynamoDB table to see if someone added the resource to the whitelist and then removed it, but no results came up. I looked for all non-read-only actions and also did another search only looking for actions specific to modifying the table items including BatchGetItem, ConditionCheckItem, GetItem, BatchWriteItem, DeleteItem, PutItem, UpdateItem.

I also checked the CloudWatch logs for the Lambda function in dry-run, and the instance did not appear in the logs at all. The Lambda function also timed out after exceeding 15 minutes; maybe that had something to do with it, but by that point in the logs the cleanup had already finished targeting instances and had moved on to security groups.

It could be that this is no big deal. We have been running the cleanup for over 4 months now, and this is the first time we have heard of this.

Expected behavior
The instance should have been marked as delete since it is older than the TTL that we set (25 days).

AWS (please complete the following information):

  • Region: us-east-1

Dry Run disable in app

Dear Team,

I'd like to disable dry_run and delete the resources.

From the code, it looks like I need to change both version and dry_run in the data file to achieve this. Is that correct?

Thanks in advance.

SG has been whitelisted but still deleted.

Describe the bug
SG has been whitelisted but still deleted.

To Reproduce
from CloudWatch logs
[INFO] EC2 Security Group 'sg-09d066f32959feb7d' is not associated with an EC2 instance and has been deleted. (ec2_cleanup.py, security_groups(), line 293)

Entry in DynamoDB as below
resource_id String: ec2:security_group:sg-0127c8a90d0070711

Expected behavior
I was expecting the process to skip this SG as it was whitelisted.


Stacktrace
[INFO] EC2 Security Group 'sg-09d066f32959feb7d' is not associated with an EC2 instance and has been deleted. (ec2_cleanup.py, security_groups(), line 293)

Show the resource age in the execution log

Is your feature request related to a problem? Please describe.

Yes. The execution log shows the "SKIP - TTL" action for resources that are under the TTL setting, but it would be useful to know each resource's exact age, to see how long it will be until the resource is destroyed. This would also help in deciding what the TTL setting should be. I think it would help to include the resource age in the whitelist (both temporary and permanent) as well.

Describe the solution you'd like

Add a column in the execution log titled "Age" that shows the age in days of each resource scanned.

Describe alternatives you've considered

Not really sure of an alternative for this.

Additional context
Example screenshot below
resource-age-in-exec-log
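
For reference, a minimal sketch of computing an age-in-days column from a resource's creation timestamp (illustrative; for EC2 instances, describe_instances exposes LaunchTime as a timezone-aware datetime):

from datetime import datetime, timezone

def age_in_days(created_at):
    """Illustrative: whole days since the resource was created."""
    return (datetime.now(timezone.utc) - created_at).days

# e.g. age_in_days(instance["LaunchTime"]) for an EC2 instance
print(age_in_days(datetime(2024, 1, 1, tzinfo=timezone.utc)))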

Question/Feature Request

The documentation suggests that the type for ttl is an integer for the number of days.

The question is: can it be a float, to represent a fraction of a day (i.e. hours)?

If not, the feature request would be to either switch ttl to be the number of hours or for it to accept floats for fractions of days.

The reason behind this question/feature request is that we'd like to configure this for our CI infrastructure, where we want resources to exist for much less than a day.
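
Mechanically, nothing about the comparison itself requires an integer: if both the resource age and the TTL are expressed in fractional days, floats work directly. A hypothetical illustration (not the tool's actual code):

from datetime import datetime, timezone

def past_ttl(created_at, ttl_days):
    """Hypothetical: True once the resource age exceeds ttl_days.
    ttl_days may be fractional, e.g. 0.125 for a three-hour TTL."""
    age_days = (datetime.now(timezone.utc) - created_at).total_seconds() / 86400
    return age_days > ttl_days

Whether the tool's settings validation accepts a float is a separate question, so the feature request may amount to relaxing that validation.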
