channl / dynamodb-lambda-autoscale Goto Github PK

View Code? Open in Web Editor NEW

313.0 313.0 88.0 209 KB

Autoscale DynamoDB provisioned capacity using Lambda

License: MIT License

JavaScript 100.00%

dynamodb-lambda-autoscale's People

Contributors

Stargazers

Watchers

Forkers

ubergeek42 mamoit icsfl mcinteer educationperfect gump59 keen99 northarrowresearch thekono theburningmonk fingerfoodstudios hefloricardo rifki192 sethdford quintonm tbensonwest tpeltola mahiguch mikepugh arleyandrada tylfin pretty00butt gnardari balmbees root360 alessh sturgiswebservices barianet laurihosio shadowhealth cast filippomortari appspected awinder miesvanderrobot kununu hwdgroup shewitt zerdush owi1972 pathos0925 bigbirds83 nkishorepics inreachventures collectiveds jedsumulong dkleissa oytuntez fumiki xaf cwhyland jressey sammeowww livingtocode louiedenolo jjang16 swinz jhuapl-boss ronfreiberg integratedresearch voteetvous andaag ausiv upday annawang csilivestru scrumworks vineetknehwal kencorbettjr paragjagdale asvarshn harsh0707 abhimanyu1289 unificdotcom nikolabravo hansbonini minhphamdhi bhanu735 devopsmatt trucnguyenlam raineydavid

dynamodb-lambda-autoscale's Issues

Add circuit breaker logic

[Feature Request] Different config for different tables :open_mouth:

Hi there I've started using this and it works fantastically well. I thought I'd offer some insights as to how I'm using it and a potential feature.

I'm scaling a subset of my tables with this function. The way I'm doing that is by maintaining a tables-to-scale dynamo table that holds the names of the tables I want to scale. Our config then selects that list in the getTableNamesAsync function. This works well in that I don't need to update the code in order to add/remove tables we want to scale.

What I have found though is that I want to scale the tables with different configuration. For example, I have 1 table that has enough load to warrant >1000 read throughput and another that barely reaches 5 read throughput. Ideally I would like to define a different scaling policy for each.

If I were to implement this, then I think I'd try and keep the config next to the tablenames in the dynamo table I mentioned earlier. I wanted to check in with you before doing that though. Is this a usecase that you have thought about implementing and if so do you have any idea as to how you would do it?

Thanks again,
Ryan

GetMetricStatistics returning 'unexpected' data

Hi again.

Since I pulled down the latest changes to get the config per table I noticed that my tables have not been scaling up as I had expected so I did some digging and found the following on my DynamoDb metrics page.

If I'm reading the CapacityCalculatorBase correctly then it's getting the Average count which is always returning as 0.5 for me even when my consumed capacity is much higher.

After this I went into Cloudwatch to try and explain what was going on and found that it was reporting consistent stats to that which the lambda was getting:

I then went through the process of trying to work with the cloudwatch metrics and the formula talked about on the DynamoDb metrics page to turn the ConsumedReadCapacityUnits into something similar to what was being reported by the DynamoDb metrics. I found that if I changed what I queried in cloudwatch to Sum then it started making a bit more sense:

1,752/60 = 29.05

And that marries up quite nicely with this number from my DynamoDb metrics:

One thing I don't understand is that I'm sure that the lambda was scaling my tables prior to pulling in the latest changes but from what I can see in your most recent commits, nothing around the cloudwatch metrics gathering changed.

So I think what I'm asking is, is this 🐛 or am I doing something silly?

I'm happy to put together a fix and open a PR if this is in fact a bug, let me know.

Thanks,
Ryan

Access Keys Required for Lambda?

This is more of a question, but potentially an issue. Does/should the config.env file be required when you deploy to a lambda? Isn't that running as a role, and therefore could inherit its permissions based on that?

Quick question about only scaling some tables

Hi there, I have a question around how you envisage people scaling only a set of tables. I have a number of dynamo tables and I just want a subset of them to use autoscaling. I've had a look through the code and I think the right place to filter the list of tables to scale is the DynamoDB object.

Is this correct, or is there some other built in way that I could filter tables?

Thanks in advance,
Ryan

Allow more decreases per day as per docs

https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Limits.html#default-limits-capacity-units-provisioned-throughput

"A dial down is allowed up to four times any time per day. A day is defined according to the GMT time zone. Additionally, if there was no decrease in the past four hours, an additional dial down is allowed, effectively bringing maximum number of decreases in a day to nine times (4 decreases in the first 4 hours, and 1 decrease for each of the subsequent 4 hour windows in a day)."

Right now only 4 decreases work, not the full potential of 9.

Would be nice to have a provisioner that selects tables based on tags

Thinking then I could just tag tables that should be included in scale decisions and not have to redeploy the lambda

Table IOPS are currently being updated

I use dynamodb-lambda-autoscale but when table and global secondary indexes allow INCREMENT. But I see only global secondary indexes INCREMENT and capacity of table NOT INCREMENT. The occur I get from log

Table IOPS are currently being updated

May I can INCREMENT table and global secondary indexes in this case.

[email protected] not compatible with linux

npm WARN notsup Not compatible with your operating system or architecture: [email protected]

running Linux 4.5.0-2-amd64 #1 SMP Debian 4.5.5-1 (2016-05-29) x86_64 GNU/Linux

Trigger run via CloudWatch Event

Hi Guys,

We have occasional spikes that are only picked up after a minute. During this time we see failures due to underprovisioned capacity. Would there be any issues in using CloudWatch events to trigger the lambda script additionally to a time interval?

Enrico

Support windows OS when running locally

Raised in response to #2

Looks like dotenv using a .env file is not supported on a windows OS as per motdotla/dotenv#83

npm run start fails due to case sensitivity

I guess that you run this on mac, since by default HFS+ is case insensitive.
Since I'm running this on debian, ./scripts/start.js and ./scripts/Start.js are not the same thing.

AWS Officially releases native DynamoDB autoscaling

And just a couple days after I forked your project and deployed it...

https://aws.amazon.com/blogs/aws/new-auto-scaling-for-amazon-dynamodb/

Cannot read property 'histogram' of undefined

{
  "errorMessage": "Cannot read property 'histogram' of undefined",
  "errorType": "TypeError",
  "stackTrace": [
    "Object._callee2$ (/var/task/index.js:184:3414)",
    "tryCatch (/var/task/index.js:8381:41)",
    "GeneratorFunctionPrototype.invoke [as _invoke] (/var/task/index.js:8655:23)",
    "GeneratorFunctionPrototype.prototype.(anonymous function) [as next] (/var/task/index.js:8414:22)",
    "tryCatch (/var/task/index.js:8381:41)",
    "invoke (/var/task/index.js:8457:21)",
    "/var/task/index.js:8465:14",
    "run (/var/task/index.js:4904:23)",
    "/var/task/index.js:4917:29",
    "flush (/var/task/index.js:5287:10)"
  ]
}

Add an SNS notification channel for increase / decrease throughput?

If dynamodb-lambda-autoscale had the ability to fire off an SNS notification for every increase or decrease of capacity, it could help with building some tooling around this. For instance, i'd love to create a simple Slackbot that just notifies when capacity increases/decreases, so that I can keep an eye on what is happening and when. Thoughts?

Lambda fails with error

Getting below message

"errorMessage": "RequestId: c544d7ed-704f-11e7-8aae-a5fadaa86aee Process exited before completing request"

@tmitchel2 Any pointers?

How to scale down working

This is my configure

{
  "ReadCapacity": {
    "Min": 2,
    "Max": 20,
    "Increment": {
      "When": {
        "UtilisationIsAbovePercent": 90
      },
      "By": {
        "Units": 2
      },
      "To": {
        "ConsumedPercent": 110
      }
    },
    "Decrement": {
      "When": {
        "UtilisationIsBelowPercent": 30,
        "AfterLastIncrementMinutes": 60,
        "AfterLastDecrementMinutes": 60,
        "UnitAdjustmentGreaterThan": 2
      },
      "To": {
        "ConsumedPercent": 50
      }
    }
  }
}

When I set table in Dynamodb is 25. It will scale down 20. But from 20 it doesn't scale down to 2. I want to autoscale to min value if the table in Dynamo doesn't read any units. How to configure?

Incorporate throttled events into provisioning calculations

Is there a setting to limit the number of downscales per day?

I see a constant in RateLimitedDecrement.js that I could change from 4 to the desired value I suspect but wasn't sure if this could be done via config.

Since we have to fork to deploy anyway I can just change the constant.

We have a challenge where we scale reads way up on a table, dump to S3, and then scale them back down. We need to reserve at least one decrement window in the 24 hour window for this process. Reducing the 4 to a 3 seems like it will solve this need.

Thanks!

No reaction on non-used system

I think there is a logical problem with the behavior of default config. I have tried to deploy this code on system with no traffic at this point. The reaction of default config.js was to do nothing since all metrics returned 0 reads and 0 writes.

Maybe I am doing something wrong, but it seems like I cannot make this code to react on such system. I configured one table with read and writes values equals 10 and created 2 additional ones with default value (5 for both). Still, nothing...

Odd logging message & failure to increase throughput capacity

I'm getting the following types of cloudwatch logs and inability to increase Read/Write Capacity past 100:

{{tablename}} is consuming 96.725 of 100 (96.725%) read capacity units and is above maximum threshold of 75% so an increment is not required

Maybe I'm just reading this incorrectly but why does this condition not require an increment? I'm thinking that possibly the config I'm using has a 100 unit ceiling, but that's not quite reflected in this message.

Deploy via Cloudformation

Create a command npm run deploy which:

Build a CloudFormation template
Deploys template (create or update) with a predefined stack name into predefined region

Testing for the project

Hi there - We've started using this app in production but the amount of tests over it is starting to worry me. I think I'm going to spend some time writing some tests for this @tmitchel2. Just checking that you have not already begun writing tests.

Assuming you haven't, do you have any preference for tools or frameworks to use? I was thinking of writing some unit tests using Mocha/Chai and see where I end up.

Usage without forking / cloning

I'm considering using this, but put off by the need to fork / clone. What do people think about publishing a reusable versioned artifact to npm?

Provision drops to 1

Thank you for this excellent project!

I have only one big issue when i use this package:

Do you have any idea how I can get around this issue?

Thanks!

Minimum capacity ignored

Setting the minimum write and read capacity has no effect.
Both get set to 1 if too little activity is detected no matter the setting.

{
"ReadCapacity": {
"Min": 5,
"Max": 500,
"Increment": {
"When": {
"UtilisationIsAbovePercent": 80
},
"By": {
"Units": 10
},
"To": {
"ConsumedPercent": 120
}
},
"Decrement": {
"When": {
"UtilisationIsBelowPercent": 30,
"AfterLastIncrementMinutes": 60,
"AfterLastDecrementMinutes": 60,
"UnitAdjustmentGreaterThan": 1
},
"To": {
"ConsumedPercent": 100
}
}
},
"WriteCapacity": {
"Min": 5,
"Max": 250,
"Increment": {
"When": {
"UtilisationIsAbovePercent": 80
},
"By": {
"Units": 10
},
"To": {
"ConsumedPercent": 120
}
},
"Decrement": {
"When": {
"UtilisationIsBelowPercent": 30,
"AfterLastIncrementMinutes": 60,
"AfterLastDecrementMinutes": 60,
"UnitAdjustmentGreaterThan": 1
},
"To": {
"ConsumedPercent": 100
}
}
}
}

0 read and write consumed capacity units

I'm running in eu-west-1, and after debugging the getTableUpdate I wondered why I got 0 consumed write capacity units while I was writing to it at quite a high rate.
I'm not observing any automatic increase and I have two tables.

"tableConsumedCapacityDescription": {
  "Table": {
    "TableName": "test",
    "ConsumedThroughput": {
      "ReadCapacityUnits": 0,
      "WriteCapacityUnits": 0
    },
    "GlobalSecondaryIndexes": []
  }
}

Install steps produce failing lambda

I followed the install instructions and that attached dist.zip was produced. This however fails on run with the following:

"errorMessage": "Cannot find module 'winston'",

dist.zip

before/after TotalMonthlyEstimatedCost ?

would it be possible to log both a before and after TotalMonthlyEstimatedCost? From what I can tell, the TotalMonthlyEstimatedCost that's logged currently appears to be "current state"

could we also assemble a projected state (since we can't query it post update since updates dont apply immediately) and estimate a cost off of that?

Seems like it would be useful to see the projected cost before/after any particular update.

Scale up not working correctly

On the screenshot, the increase was imposed manually, all the parameters are default except for the region (eu-west-1), and the max value (100).
I know the script is working to some extent because it scales down to 1, but it does not scale up properly.

Combine getmetricsstatistics into as few requests as possible

Configuration specific to env or tables

Hi @tmitchel2 First of all I would like to thank you for the great work. It's awesome!

I just configured this in my systems and I have few questions:

I have multiple envs(dev/qa/stg/uat) in one account and I want to run this on specific env rather than on all the tables in this account. What's the place to make those changes?
It's setting the value to 1, where can I modify those values?

GSI issue with CapacityCalculatorBase.js

getDimensions(tableName: string, globalSecondaryIndexName: ?string): Dimension[] {
if (globalSecondaryIndexName) {
return [
{ Name: 'TableName', Value: tableName},
{ Name: 'GlobalSecondaryIndex', Value: globalSecondaryIndexName}
];
}

return [ { Name: 'TableName', Value: tableName} ];

}
}

Name for the GSI Dimension is slightly off, should be GlobalSecondaryIndexName instead of GlobalSecondaryIndex. I haven't merged in your latest, but this bit is unchanged, and it was fixed when I changed it in my fork.

Decrement.By not working

Hello,

I'd like to scale a low number of dynamo-DB tables in a completely relative manner only setting boundaries by Min/Max Thresholds for read and write capacity.
The relative increment is working fine, but the relative decrement wouldn't simply decrement.
In the following I'd like to show u this just by changing definition of read-strategy.

Can you verify the issue or give me a hint if I am doing something wrong?

In (1) you can find the provisioner.json I intend to use. A run of it leads to the following result (recognize change wanted and allowed, but doing nothing):

node ./scripts/start.js
*** LAMBDA INIT ***
*** LAMBDA START ***
Getting table names
Getting table details
Getting table description testtable
Getting table consumed capacity description testtable
Getting table update request testtable
testtable is consuming 0 of 410 (0%) read capacity units
testtable is consuming 0 of 410 (0%) read capacity units so a decrement is WANTED and is ALLOWED
testtable is consuming 0 of 200 (0%) write capacity units
testtable is consuming 0 of 200 (0%) write capacity units
Getting required table update requests
No table updates required
{
"Index.handler": {
"mean": 242.57998418807983
},
"DynamoDB.listTablesAsync": {
"mean": 105.22508716583252
},
"DynamoDB.describeTableAsync": {
"mean": 73.80222463607788
},
"DynamoDB.describeTableConsumedCapacityAsync": {
"mean": 46.67149496078491
},
"CloudWatch.getMetricStatisticsAsync": {
"mean": 40.9902560710907
},
"TableUpdates": {
"count": 0
},
"TotalProvisionedThroughput": {
"ReadCapacityUnits": 410,
"WriteCapacityUnits": 200
}
"TotalMonthlyEstimatedCost": 131.976
}
*** LAMBDA FINISH ***

If I extend the provisioner.json from (1) with an Decrement.To to:

... "Decrement": {
"When": {
"UtilisationIsBelowPercent": 50,
"AfterLastIncrementMinutes": 60,
"AfterLastDecrementMinutes": 60
},
"By": {
"ConsumedPercent": 50
},
"To": {
"Units": 200
}

I get an successful update on the table as you can see in the following.
The Decrement done is a decrement to the value from the Decrement.To(200) and not a relative (should be 205).
To be complete, if i remove the Decrement.By only having Decrement.To the run is also successful.

node ./scripts/start.js
*** LAMBDA INIT ***
*** LAMBDA START ***
Getting table names
Getting table details
Getting table description testtable
Getting table consumed capacity description testtable
Getting table update request testtable
testtable is consuming 0 of 410 (0%) read capacity units
testtable is consuming 0 of 410 (0%) read capacity units so a decrement is WANTED and is ALLOWED
testtable is consuming 0 of 200 (0%) write capacity units
testtable is consuming 0 of 200 (0%) write capacity units
Getting required table update requests
Updating tables
Updating table testtable
Updated table testtable
Updated tables
{
"Index.handler": {
"mean": 336.56548500061035
},
"DynamoDB.listTablesAsync": {
"mean": 114.78861999511719
},
"DynamoDB.describeTableAsync": {
"mean": 66.52075910568237
},
"DynamoDB.describeTableConsumedCapacityAsync": {
"mean": 56.809733867645264
},
"CloudWatch.getMetricStatisticsAsync": {
"mean": 48.26799559593201
},
"TableUpdates": {
"count": 1
},
"TotalProvisionedThroughput": {
"ReadCapacityUnits": 410,
"WriteCapacityUnits": 200
},
"TotalMonthlyEstimatedCost": 131.976
}

(1)

{
"ReadCapacity": {
"Min": 200,
"Max": 1000,
"Increment": {
"When": {
"UtilisationIsAbovePercent": 70
},
"By": {
"ConsumedPercent": 20
}
},
"Decrement": {
"When": {
"UtilisationIsBelowPercent": 50,
"AfterLastIncrementMinutes": 60,
"AfterLastDecrementMinutes": 60
},
"By": {
"ConsumedPercent": 50
}
}
},
"WriteCapacity": {
"Min": 200,
"Max": 500,
"Increment": {
"When": {
"UtilisationIsAbovePercent": 70
},
"By": {
"ConsumedPercent": 20
}
},
"Decrement": {
"When": {
"UtilisationIsBelowPercent": 30,
"AfterLastIncrementMinutes": 60,
"AfterLastDecrementMinutes": 60
},
"By": {
"ConsumedPercent": 50
}
}
}

Configuration per table group

I have multiple groups of tables each of which need a different set of configuration settings.

These tables all have similar stems. Is the best way to do this by modifying Provisioner.js to look for those specific tables and load configs for each?

Socket timeouts

I'm getting a socket timeout when running this with a large amount of dynamo tables. I suspect its because of the large amount of connections needed at once.

Maybe a thread limit could help? Along with #46 possibly.

Fix CapacityCalculatorBase so it calls getStatisticSettings

Update packages

I just ran npm install and I got these warnings.

npm WARN deprecated [email protected]: graceful-fs v3.0.0 and before will fail on node releases >= v7.0. Please update to graceful-fs@^4.0.0 as soon as possible. Use 'npm ls graceful-fs' to find it in the tree.
npm WARN deprecated [email protected]: lodash@<3.0.0 is no longer maintained. Upgrade to lodash@^4.0.0.
npm WARN deprecated [email protected]: graceful-fs v3.0.0 and before will fail on node releases >= v7.0. Please update to graceful-fs@^4.0.0 as soon as possible. Use 'npm ls graceful-fs' to find it in the tree.

Update readme to be inline with restructured codebase

Would be nice to have a --dryRun feature

So that I could test what my current configuration would do against the current table metrics without actually making changes...

Add throttled events logging

Decreasing write capacity was disallowed without any reasons, decreasing using Management Console worked

Hi,

at first thank's for your great project. I have already tried it a lot using different scaling policies.
Today the write capacity was not decreased although 0 of 100 provisioned write capacity units have been used. This is the CloudWatch log:
...is consuming 0 of 100 (0%) write capacity units and is below minimum threshold of 20% so a decrement is WANTED but is DISALLOWED
At exactly this time the read capacity has been decreased. Thus there have been enough decreases left for this calendar day. This is really confusing, so I started to debug.
I ended in the file src/Provisioner.js containing this line:

let isAdjustmentAllowed = isAfterLastDecreaseGracePeriod && isAfterLastIncreaseGracePeriod && isReadDecrementAllowed;

Why is only ReadDecrement considered? Why is there no "isWriteDecrementAllowed" using the function calculateDecrementedWriteCapacityValue?
Maybe it's bug and this explains why decreasing read capacity has been allowed whereas decreasing write capacity was denied.

You find my config DefaultProvisioner.js as an attachment.

Thank's for your help.

Best regards,
Chris
DefaultProvisioner.txt

[Feature Request] Tie adjustmentPercent to consumed capacity instead of provisioned capacity

In cases of rapid changes in utilization, it can be slow to respond as it is limited to a multiplier of the provisioned capacity. In cases where the consumed capacity significantly exceeds the provisioned capacity, it can take several steps to meet demand.

[Discussion] Strategy with fixed values.

In order to accommodate multiple strategies for different tables/indexes I am proposing the following strategy schema update for fixed values:

ReadCapacity.Fixed: Number
WriteCapacity.Fixed: Number

The Provisioner.js file would have to be adapted, since there is quite some conditional logic on Read/WriteCapacity.Increment and Read/WriteCapacity.Increment.When.

Currently the following strategy would have to be set when using fixed values for specific tables/indexes without causing any errors in Provisioner.js:

{
  "ReadCapacity": {
    "Min": 1,
    "Max": 1,
    "Increment":{
      "To": {
        "Units": 1
      },
      "When":{
      }
    },
    "Decrement":{
      "To": {
        "Units": 1
      },
      "When":{
      }
    }
  },
  "WriteCapacity": {
    "Min": 1,
    "Max": 1,
    "Increment":{
      "To": {
        "Units": 1
      },
      "When":{
      }
    },
    "Decrement":{
      "To": {
        "Units": 1
      },
      "When":{
      }
    }
  }
}

I'd be happy to add a pull request should this functionality be approved by more people/maintainer.

Missing script build in package.json

Following the instructions in the readme, when carrying out
npm run build
I'm getting this error
npm ERR! missing script: build
It looks like the package.json requires a build entry in the script section

the Right Way to extend getTableNamesAsync ?

as a non-developer - what's the right approach to extend getTableNamesAsync to treat it as config instead of code?

It seems like updating it in src/Provisioner.js gets the job done - but will make merging in upstream changes painful through the lifetime of support.

If there's a config oriented mechanism for extending this i'd love to hear about it and see some examples of it. exploring forks of the project mostly seems to point to just changing the code in Provisioner.js is the common solution (for branches based off something like the current version, anyway)

LimitExceededException: Subscriber limit exceeded: Only 10 tables can be created, updated, or deleted simultaneously

Issue with 10 or more tables to be updated. Tried to use something like mapLimit for

let capacityTasks = tableNames
      .map(async tableName => {

but i'm not familiar with the babeljs stuff.