Comments (13)
Hello,
Thanks for the feedback. We have a new version of the streams-to-firehose forwarder that will support deaggregation of Kinesis Data written by Kinesis Producer Library, and I will ensure your points are addressed in that.
Regards,
Ian
from lambda-streams-to-firehose.
For the last error - it will be quite unwieldy to validate the Delivery Stream Name rather than letting the PutRecordBatch request do this, so I expect this will stay the same.
Thx,
Ian
from lambda-streams-to-firehose.
Thanks for your very quick response.
To be clear, I was only suggesting some changes to the documentation in the README file, not any code changes. The code works great as it stands and I have no problems there. The issue for me was getting the code successfully running required a few hints that might help others in a similar situation get up and running more smoothly.
Do you want to close this issue or leave it open?
from lambda-streams-to-firehose.
I will close this, because I believe it is being properly addressed. I have found that the nodejs code performance is too slow for our use and have migrated to using some python code based on the python kinesis example. Performance is nearly 10x better. Thanks for the useful repository example, however, in getting the project started!
from lambda-streams-to-firehose.
I'd love to hear about the performance metrics you observed, and if your Python code is functionally the same or if you have a model more tailored to your specific data?
from lambda-streams-to-firehose.
Our code is a drop-in replacement for yours. The only "customisation" is that our records are text so we batch separate lines into single records with newline delimiters for Firehose. It would not work for binary data at present. We also did not implement any features around formatting the records. It is about 30 lines of python based on the python kinesis lambda example in the AWS console. We rewrote it in Python because that is a native language for us, and I was unable to figure out where the potential bottleneck was in the nodejs code.
With your nodejs lambda function, we started with a Kinesis input batch size of 5000 but quickly ran into the 4MB firehose output batch size limit. We dropped the Kinesis batch size to 2000 and had several hundred invocation errors per hour (60 second timeouts!). Execution times were averaging 10-20 seconds, and memory utilisation was between 150MB-200MB. More concerning was that the Kinesis iterator age hovered in the 70K-85K second (one day) range. We were never able to catch up with the incoming 20K-40K records per second flow into Kinesis.
Using our python lambda function and a batch size of 2000, we were able to consume a whole day of Kinesis messages in several hours, as well as keep current with incoming flows. The iterator age hovers around 20-30 seconds, which is very acceptable. Execution times are between 1 and 2 seconds using 45MB-50MB memory. Occasionally we get timeouts at 10s, perhaps a few invocation errors per hour at peak. We have seen between 1K-2K invocations per second with 900-2000 records per invocation.
It will take us several weeks to finalise and document and get approvals for public distribution and open source. I will let you know when we do.
from lambda-streams-to-firehose.
Thanks for your note. I've done some testing and found a small issue with the batch sizing calculation which has now been improved, but other than that the performance looks to be ~1 second Firehose writes for batches of 500 records @ 10k. If you want to revisit testing this, consider increasing the amount of RAM, and feel free to drop me a note on [email protected] to further investigate.
from lambda-streams-to-firehose.
Once again, I appreciate your rapid responses. I am currently experiencing the following error with the kinesis test event:
{ "Records": [ { "eventID": "shardId-000000000000:49545115243490985018280067714973144582180062593244200961", "eventVersion": "1.0", "kinesis": { "partitionKey": "partitionKey-3", "data": "SGVsbG8sIHRoaXMgaXMgYSB0ZXN0IDEyMy4=", "kinesisSchemaVersion": "1.0", "sequenceNumber": "49545115243490985018280067714973144582180062593244200961" }, "invokeIdentityArn": "arn:aws:iam::EXAMPLE", "eventName": "aws:kinesis:record", "eventSourceARN": "arn:aws:kinesis:us-east-1:11111111:stream/my-stream", "eventSource": "aws:kinesis", "awsRegion": "us-east-1" } ] }
Returns:
START RequestId: 6f0ae07d-c5e3-11e5-8be9-13f0adb67535 Version: $LATEST 2016-01-28T17:20:32.023Z 6f0ae07d-c5e3-11e5-8be9-13f0adb67535 TypeError: Argument must be a string at Object.exports.getBatchRanges (/var/task/index.js:149:23) at Object.exports.processTransformedRecords (/var/task/index.js:231:25) at /var/task/index.js:369:15 at /var/task/node_modules/async/lib/async.js:52:16 at /var/task/node_modules/async/lib/async.js:363:13 at /var/task/node_modules/async/lib/async.js:52:16 at done (/var/task/node_modules/async/lib/async.js:248:21) at /var/task/node_modules/async/lib/async.js:44:16 at /var/task/node_modules/async/lib/async.js:360:17 at /var/task/index.js:356:10 END RequestId: 6f0ae07d-c5e3-11e5-8be9-13f0adb67535 REPORT RequestId: 6f0ae07d-c5e3-11e5-8be9-13f0adb67535 Duration: 427.46 ms Billed Duration: 500 ms Memory Size: 512 MB Max Memory Used: 36 MB Process exited before completing request
And, just in terms of testing, DynamoDB test events do the same. I am investigating if I'm doing anything wrong.
from lambda-streams-to-firehose.
Sorry for the formatting problems above. I am using version 1.2.0 with the zip file located in dist in this repository.
from lambda-streams-to-firehose.
Sorry about that - please resync - there was a bug in the Buffer deserialisation after transformation that my tests missed.
from lambda-streams-to-firehose.
Amazing: your latest code is now comparable to the python rewrite. Plus it seems you have added deaggregation (http://docs.aws.amazon.com/kinesis/latest/dev/kinesis-kpl-consumer-deaggregation.html) which we were going to implement. I will test that and respond back when I have some more findings.
For now, we are having a lot of success with unaggregated records (we disabled aggregation). Memory utilization for batches of 2000 records is about 54MB. Execution times are between 200 and 600 ms which is very good. I think we have a winner!
Many, many thanks for your help. Writing the python lambda function was educational but now we do not have to support that codebase.
from lambda-streams-to-firehose.
That's great - really glad to hear it.
from lambda-streams-to-firehose.
Further testing with deaggregation shows that code works (it was not working in 1.1.0). We are now using this lambda function in the POC. We are taking 1K aggregated records per second from kinesis and pushing 3K new-line-joined records per second into firehose. Total bytes to S3 is about 3.5MB/s. Many thanks.
from lambda-streams-to-firehose.
Related Issues (20)
- Problem with a Cloudwatch Logs Destination to Kinesis stream? HOT 4
- exports.onCompletion is not a function HOT 2
- Small amount of event not streamed without errors HOT 5
- QUESTION: What is the value of using Firehose with DynamoDb streams vs just writing directly to S3? HOT 2
- Question about lambda - error handling HOT 4
- confusing data being written to S3 from the firehose HOT 3
- Firehose delivery missing? HOT 11
- README and Downloads have incorrect version reference HOT 2
- Exceeding 4MB limit HOT 7
- myregex example HOT 5
- Question around boolean `online` variable HOT 5
- init does not call callback on subsequent runs HOT 2
- Compression support? HOT 2
- Question on how to configure function for DDB Stream delivery to S3 HOT 5
- Filtering by event type
- This operation is not permitted on KinesisStreamAsSource delivery stream type. HOT 3
- Add to awesome-functions HOT 1
- AWS Lambda not writing cross AWS account HOT 2
- Support for filtering by event types HOT 5
- Is there any python code for this functionality HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from lambda-streams-to-firehose.