Coder Social home page Coder Social logo

Comments (13)

IanMeyers avatar IanMeyers commented on July 30, 2024

Hello,

Thanks for the feedback. We have a new version of the streams-to-firehose forwarder that will support deaggregation of Kinesis Data written by Kinesis Producer Library, and I will ensure your points are addressed in that.

Regards,

Ian

from lambda-streams-to-firehose.

IanMeyers avatar IanMeyers commented on July 30, 2024

For the last error - it will be quite unwieldy to validate the Delivery Stream Name rather than letting the PutRecordBatch request do this, so I expect this will stay the same.

Thx,

Ian

from lambda-streams-to-firehose.

t-pascal avatar t-pascal commented on July 30, 2024

Thanks for your very quick response.

To be clear, I was only suggesting some changes to the documentation in the README file, not any code changes. The code works great as it stands and I have no problems there. The issue for me was getting the code successfully running required a few hints that might help others in a similar situation get up and running more smoothly.

Do you want to close this issue or leave it open?

from lambda-streams-to-firehose.

t-pascal avatar t-pascal commented on July 30, 2024

I will close this, because I believe it is being properly addressed. I have found that the nodejs code performance is too slow for our use and have migrated to using some python code based on the python kinesis example. Performance is nearly 10x better. Thanks for the useful repository example, however, in getting the project started!

from lambda-streams-to-firehose.

IanMeyers avatar IanMeyers commented on July 30, 2024

I'd love to hear about the performance metrics you observed, and if your Python code is functionally the same or if you have a model more tailored to your specific data?

from lambda-streams-to-firehose.

t-pascal avatar t-pascal commented on July 30, 2024

Our code is a drop-in replacement for yours. The only "customisation" is that our records are text so we batch separate lines into single records with newline delimiters for Firehose. It would not work for binary data at present. We also did not implement any features around formatting the records. It is about 30 lines of python based on the python kinesis lambda example in the AWS console. We rewrote it in Python because that is a native language for us, and I was unable to figure out where the potential bottleneck was in the nodejs code.

With your nodejs lambda function, we started with a Kinesis input batch size of 5000 but quickly ran into the 4MB firehose output batch size limit. We dropped the Kinesis batch size to 2000 and had several hundred invocation errors per hour (60 second timeouts!). Execution times were averaging 10-20 seconds, and memory utilisation was between 150MB-200MB. More concerning was that the Kinesis iterator age hovered in the 70K-85K second (one day) range. We were never able to catch up with the incoming 20K-40K records per second flow into Kinesis.

Using our python lambda function and a batch size of 2000, we were able to consume a whole day of Kinesis messages in several hours, as well as keep current with incoming flows. The iterator age hovers around 20-30 seconds, which is very acceptable. Execution times are between 1 and 2 seconds using 45MB-50MB memory. Occasionally we get timeouts at 10s, perhaps a few invocation errors per hour at peak. We have seen between 1K-2K invocations per second with 900-2000 records per invocation.

It will take us several weeks to finalise and document and get approvals for public distribution and open source. I will let you know when we do.

from lambda-streams-to-firehose.

IanMeyers avatar IanMeyers commented on July 30, 2024

Thanks for your note. I've done some testing and found a small issue with the batch sizing calculation which has now been improved, but other than that the performance looks to be ~1 second Firehose writes for batches of 500 records @ 10k. If you want to revisit testing this, consider increasing the amount of RAM, and feel free to drop me a note on [email protected] to further investigate.

from lambda-streams-to-firehose.

t-pascal avatar t-pascal commented on July 30, 2024

Once again, I appreciate your rapid responses. I am currently experiencing the following error with the kinesis test event:

{ "Records": [ { "eventID": "shardId-000000000000:49545115243490985018280067714973144582180062593244200961", "eventVersion": "1.0", "kinesis": { "partitionKey": "partitionKey-3", "data": "SGVsbG8sIHRoaXMgaXMgYSB0ZXN0IDEyMy4=", "kinesisSchemaVersion": "1.0", "sequenceNumber": "49545115243490985018280067714973144582180062593244200961" }, "invokeIdentityArn": "arn:aws:iam::EXAMPLE", "eventName": "aws:kinesis:record", "eventSourceARN": "arn:aws:kinesis:us-east-1:11111111:stream/my-stream", "eventSource": "aws:kinesis", "awsRegion": "us-east-1" } ] }

Returns:

START RequestId: 6f0ae07d-c5e3-11e5-8be9-13f0adb67535 Version: $LATEST 2016-01-28T17:20:32.023Z 6f0ae07d-c5e3-11e5-8be9-13f0adb67535 TypeError: Argument must be a string at Object.exports.getBatchRanges (/var/task/index.js:149:23) at Object.exports.processTransformedRecords (/var/task/index.js:231:25) at /var/task/index.js:369:15 at /var/task/node_modules/async/lib/async.js:52:16 at /var/task/node_modules/async/lib/async.js:363:13 at /var/task/node_modules/async/lib/async.js:52:16 at done (/var/task/node_modules/async/lib/async.js:248:21) at /var/task/node_modules/async/lib/async.js:44:16 at /var/task/node_modules/async/lib/async.js:360:17 at /var/task/index.js:356:10 END RequestId: 6f0ae07d-c5e3-11e5-8be9-13f0adb67535 REPORT RequestId: 6f0ae07d-c5e3-11e5-8be9-13f0adb67535 Duration: 427.46 ms Billed Duration: 500 ms Memory Size: 512 MB Max Memory Used: 36 MB Process exited before completing request

And, just in terms of testing, DynamoDB test events do the same. I am investigating if I'm doing anything wrong.

from lambda-streams-to-firehose.

t-pascal avatar t-pascal commented on July 30, 2024

Sorry for the formatting problems above. I am using version 1.2.0 with the zip file located in dist in this repository.

from lambda-streams-to-firehose.

IanMeyers avatar IanMeyers commented on July 30, 2024

Sorry about that - please resync - there was a bug in the Buffer deserialisation after transformation that my tests missed.

from lambda-streams-to-firehose.

t-pascal avatar t-pascal commented on July 30, 2024

Amazing: your latest code is now comparable to the python rewrite. Plus it seems you have added deaggregation (http://docs.aws.amazon.com/kinesis/latest/dev/kinesis-kpl-consumer-deaggregation.html) which we were going to implement. I will test that and respond back when I have some more findings.

For now, we are having a lot of success with unaggregated records (we disabled aggregation). Memory utilization for batches of 2000 records is about 54MB. Execution times are between 200 and 600 ms which is very good. I think we have a winner!

Many, many thanks for your help. Writing the python lambda function was educational but now we do not have to support that codebase.

from lambda-streams-to-firehose.

IanMeyers avatar IanMeyers commented on July 30, 2024

That's great - really glad to hear it.

from lambda-streams-to-firehose.

t-pascal avatar t-pascal commented on July 30, 2024

Further testing with deaggregation shows that code works (it was not working in 1.1.0). We are now using this lambda function in the POC. We are taking 1K aggregated records per second from kinesis and pushing 3K new-line-joined records per second into firehose. Total bytes to S3 is about 3.5MB/s. Many thanks.

from lambda-streams-to-firehose.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.