jgilbert01 / aws-lambda-stream
Create stream processors with AWS Lambda functions
License: MIT License
Hi,
I have two subsystems and want to share events between them both ways (sys1 shares and consumes events from sys2, and sys2 shares and consumes events from sys1).
To save costs, I'd like to avoid creating separate ingress streams for the subsystems.
But this raises some questions. Let's look at sys1.
Both custom and external events will be routed to a single stream. Because of this, outgoing external events, intended to be shared with sys2, will also end up on sys1's own stream. This can easily create a circular situation where the Egress ESG publishes/consumes its own events in an infinite loop.
So I'm thinking of a way to clearly distinguish external events inside the Egress ESG listener, such as adding an *-external suffix to the event type (then removing it in the Ingress ESG of the target subsystem), or adding a field like 'source': 'external' to the event and filtering on it.
Can you share your thoughts on this? Is it a valid question at all or am I entirely missing something?
Thanks.
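For illustration, the second idea could be sketched roughly like this (the source field name comes from the question itself; the helper names are hypothetical, not library APIs):

```javascript
// Tag outgoing external events so they can be recognized later.
const toExternal = (event) => ({ ...event, source: 'external' });

// Egress listener filter: skip events already marked external,
// breaking the publish/consume loop on the shared stream.
const notAlreadyExternal = (uow) => uow.event.source !== 'external';
```

The ingress side of the target subsystem could strip the marker (or the *-external suffix) before putting the event on its own stream.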
Right now, using the Limit query parameter will result in multiple queries (each with the stated Limit) due to the cursor implementation.
This makes it impossible to use access patterns such as 'Fetch the most recent item':
{
  ...,
  ScanIndexForward: false,
  Limit: 1,
}
If we're on the same page: I've submitted a PR with my solution; feel free to use it in any way you like.
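For reference, the blocked access pattern is a single request like the following sketch (the table and key attribute names are assumptions for illustration):

```javascript
// Fetch only the most recent item for a partition: newest first,
// stop after one item, in exactly one query.
const mostRecentRequest = (pk) => ({
  TableName: process.env.ENTITY_TABLE_NAME,
  KeyConditionExpression: '#pk = :pk',
  ExpressionAttributeNames: { '#pk': 'pk' },
  ExpressionAttributeValues: { ':pk': pk },
  ScanIndexForward: false, // descending sort key order
  Limit: 1,                // one item, one query
});
```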
This will give stream processors more control over versions, inclusion, and function size.
Add trace support across the event flow.
Food for thought at this point.
When installing, I got a deprecation notice about uuid:
Please upgrade to version 7 or higher. Older versions may use Math.random() in certain circumstances, which is known to be problematic. See https://v8.dev/blog/math-random for details.
Looks like there are performance optimizations in more recent versions than the 3.x line as well.
Hi,
Here, the query request to DataIndex contains ConsistentRead: true, which throws ValidationException: Consistent reads are not supported on global secondary indexes. This matches the documentation: consistent reads are not supported on global secondary indexes.
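One possible guard, as a sketch: only set ConsistentRead when the request is not against an index. This is an illustration of the constraint, not the library's actual fix:

```javascript
// DynamoDB rejects ConsistentRead on a GSI, so only request strong
// consistency when querying the table itself (no IndexName present).
const withConsistency = (request) => ({
  ...request,
  ...(request.IndexName ? {} : { ConsistentRead: true }),
});
```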
Fault events help us handle poison events and resubmit the failing events later. They also contain the uow, which holds valuable information that can help diagnose the problem. However, this information may contain PII, it can be very large, and it may have circular references.
This improvement will redact identified PII, trim buffers, and handle circular references.
It will leave the uow.record element intact to support resubmit. If the uow.record element has PII, then it should be encrypted at the source.
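A rough sketch of the trimming idea (the field names in REDACT and the placeholder strings are assumptions; a fuller version would also leave uow.record untouched to support resubmit):

```javascript
// Make a uow safe to embed in a fault event by redacting named fields,
// trimming buffers, and breaking circular references.
// Note: JSON.stringify calls Buffer#toJSON before the replacer runs,
// so buffers appear as { type: 'Buffer', data: [...] } here.
const REDACT = ['email', 'ssn']; // assumed PII field names

const trimForFault = (uow) => {
  const seen = new WeakSet();
  return JSON.parse(JSON.stringify(uow, (key, value) => {
    if (REDACT.includes(key)) return '[REDACTED]';
    if (value && value.type === 'Buffer' && Array.isArray(value.data)) {
      return `[Buffer ${value.data.length} bytes]`; // trim large buffers
    }
    if (typeof value === 'object' && value !== null) {
      if (seen.has(value)) return '[Circular]'; // repeated/circular refs
      seen.add(value);
    }
    return value;
  }));
};
```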
As the title says, the hyperlink in the readme that points to the highland.js website is outdated and now redirects to some other website. I think it should be updated to the current website, https://caolan.github.io/highland/.
Describe the bug
I am sure this is an edge case, but I found this behavior when testing out the code.
The eventType provided in the RULES matches the wrong event.type when two names are similar but differ only by the last letter.
For example, when creating a rule with an eventType:
eventType: ['something-updated', 'something-deleted']
and passing an event into the handler with this type:
id: 'fcc12355-f339-4f71-bbed-eee646535bbb',
type: 'something-update',
It will match that rule even though the event types are not the same.
To Reproduce
Steps to reproduce the behavior:
{
  id: 'fcc12355-f339-4f71-bbed-eee646535bbb',
  type: 'something-update',
}
{
  id: 'p4',
  flavor: materialize,
  eventType: ['something-updated', 'something-deleted'],
  toUpdateRequest,
}
export const handler = async (event) =>
  initialize(PIPELINES, OPTIONS)
    .assemble(fromKinesis(event))
    .through(toPromise);
Expected behavior
The event.type in the event body should only match with the eventType in the RULES when the match is exact.
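A minimal sketch reproducing the report, assuming the current filter does a prefix-style comparison while the expected behavior is an exact match:

```javascript
// Assumed current behavior: a rule type that merely starts with the
// event type is treated as a match.
const prefixMatch = (rule, uow) =>
  rule.eventType.some((t) => t.startsWith(uow.event.type));

// Expected behavior: exact match only.
const exactMatch = (rule, uow) =>
  rule.eventType.includes(uow.event.type);

const rule = { eventType: ['something-updated', 'something-deleted'] };
const uow = { event: { type: 'something-update' } };

prefixMatch(rule, uow); // true  - the reported bug
exactMatch(rule, uow);  // false - the expected behavior
```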
Can we do it? Similar to the evaluate flavor. I can create a PR.
upd: or the same thing in materialize (multiple updateRequests) would also do the trick for me. Or both :)
If you ask me, I'd do it for both flavors, to have this option just in case, rather than re-creating a whole flavor in the service just for this one thing.
Query caching right now is very coarse, at the entire query level. This could be enhanced to support item level caching for key based queries, etc.
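For illustration, item-level caching for key-based queries could look like this sketch (getItemCached and fetchItem are hypothetical names, not the library's API):

```javascript
// Cache each item by its key instead of caching whole query results,
// so different queries touching the same item share one cache entry.
const itemCache = new Map();

const getItemCached = async (key, fetchItem) => {
  const cacheKey = JSON.stringify(key);
  if (itemCache.has(cacheKey)) return itemCache.get(cacheKey);
  const item = await fetchItem(key);
  itemCache.set(cacheKey, item);
  return item;
};
```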
https://hackernoon.com/lambda-optimization-tip-enable-http-keep-alive-6dc503f6f114
add utils/agent.js:

const aws = require('aws-sdk');
const https = require('https');

const sslAgent = new https.Agent({
  keepAlive: true, // this is new
  maxSockets: Number(process.env.MAX_SOCKETS) || 50, // from the aws-sdk source code
  rejectUnauthorized: true, // from the aws-sdk source code
});
sslAgent.setMaxListeners(0); // from the aws-sdk source code

aws.config.update({
  httpOptions: {
    agent: sslAgent,
  },
});
Hi!
What are your thoughts on changing DataIndex {pk: data, sk: timestamp} to a more generic GSI1 {pk: GSI1pk, sk: GSI1sk}?
This would enable using this GSI for any imaginable use case (where data and timestamp are not what we want to index). Also, this naming is used in Rick Houlihan's talks and Alex DeBrie's "The DynamoDB Book", so it is more familiar to people.
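As an illustration of the proposal (the GSI1pk/GSI1sk names are the suggestion from this issue, and toEntityItem is a hypothetical mapper, not part of the library):

```javascript
// Any access pattern can be projected into the generic index
// attributes, rather than being limited to data/timestamp.
const toEntityItem = (thing) => ({
  pk: thing.id,
  sk: 'thing',
  GSI1pk: thing.category,  // e.g. group by category
  GSI1sk: thing.timestamp, // e.g. sort by time within the group
  ...thing,
});
```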
Hi, thanks for providing this library; it has been very helpful.
I'm just getting started with EventBridge, but I'm not sure I'm following the assumptions around fromEventBridge in this library. In fromEventBridge, the detail is assumed to be a stringified JSON object. However: 1) in the partner events I receive, detail is a JSON object (part of the parent object), which throws a parse exception; and 2) the line above it seems to assume it is already parsed (or is an object) in order to access the detail.id attribute, which breaks in any use case I have come across.
See here:
aws-lambda-stream/src/from/eventbridge.js
Lines 14 to 17 in 1868379
and here in the test:
aws-lambda-stream/src/from/eventbridge.js
Lines 29 to 30 in 0281626
Why is this assumed to be a string needing to be parsed in this case? Just want to be sure I am not missing anything. If the reasoning is based on internal library defaults, maybe it could support either way (object or string) to be more flexible. Happy to open a PR if that is the case.
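One way to support either shape could look like this sketch (parseDetail is a hypothetical helper for illustration, not the library's code):

```javascript
// Accept detail either as a plain object (e.g. partner events) or as a
// stringified JSON object, and normalize to an object.
const parseDetail = (detail) =>
  (typeof detail === 'string' ? JSON.parse(detail) : detail);
```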
Thanks again
Hey John,
Can you enable Github Discussions for this repo so that we can have a place to discuss concepts discussed in your books that aren't exactly the Issues with this lib? What do you think?
Thanks
export * from './dynamodb';
export * from './eventbridge';
export * from './kinesis';
The s3.js, sns.js, and sqs.js exports are missing!
I thought that when providing an array of strings, it should filter on an exact match.
But:
const rule = { eventType: ['specific-thing-changed'] }
const uow = { event: { type: 'thing-changed' } }
expect(filterOnEventType(rule, uow)).toBe(false) <--- will fail
This is due to this check
aws-lambda-stream/src/filters/event.js
Line 10 in c678b09
I think it should be:
...
return rule.eventType.includes(uow.event.type);
...
The CDC flavor is intended to work with the ENTITY table. But the query util that is used to make the request in the flavor has EVENT_TABLE_NAME set as the default tableName:
aws-lambda-stream/src/utils/dynamodb.js
Line 118 in 3f2c5c4
A tableName can be passed to query via the rule, but maybe set it to ENTITY_TABLE_NAME in the flavor by default?
aws-lambda-stream/src/flavors/cdc.js
Line 18 in 3f2c5c4
.through(query({tableName: process.env.ENTITY_TABLE_NAME, ...rule}))
Optional feature:
When an event is too big to publish to EventBridge, Kinesis, etc:
- put it to a bucket in the event hub
- publish with an s3 link, following the format expected in the from functions
- see PR add-claim-check-support
- short ttl on the claim check bucket
- es and s3 event lakes should pull in the payload from the s3 claim check bucket
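A rough sketch of this claim-check flow (the size limit, bucket variable, and key layout are assumptions, and putObject is injected rather than tied to a specific SDK version):

```javascript
// If the serialized event fits, publish it as-is; otherwise store the
// payload in the claim check bucket and publish a small s3 envelope.
const MAX_EVENT_SIZE = 256 * 1024; // assumed publish size limit

const toClaimCheck = async (event, putObject) => {
  const payload = JSON.stringify(event);
  if (Buffer.byteLength(payload) <= MAX_EVENT_SIZE) return event;
  const key = `claim-check/${event.id}`; // assumed key layout
  await putObject({ Bucket: process.env.BUCKET_NAME, Key: key, Body: payload });
  return { id: event.id, type: event.type, s3: { key } };
};
```

Consumers (and the es/s3 event lakes) would detect the s3 element and pull the full payload back from the bucket.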
Hi John,
Could you please clarify this for me.
I have a situation where I need to do some work reacting to an entity data change in the same service.
I see two main options:
The first one uses familiar patterns and feels less 'custom', but there's some doubt about publishing/listening to your own events all across the event hub.
The second does everything locally but brings some complexity, because multiple trigger pipelines react to the same event, making tests a little less straightforward.
Both seem viable to me, but I like the first a little bit more.
Can you please drop a couple of words on which one you think is best and why.
Thanks!
add connector and utils
check the aws-sdk err.retryable flag in the fault handling logic
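The note above could be sketched as follows; this is a hypothetical shape, not the library's actual fault-handling API (the retry and rejectWithFault callbacks are assumptions):

```javascript
// Treat aws-sdk errors flagged retryable as transient, so they are
// retried rather than raised as fault events.
const handleError = (err, uow, rejectWithFault, retry) => {
  if (err.retryable) {
    return retry(uow); // transient: let the stream retry
  }
  return rejectWithFault(err, uow); // permanent: raise a fault event
};
```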
First, I'd just like to say this library looks amazing! Very cool. I have so many use cases it could serve.
I gave it a test on a DynamoDB stream but got an exception from fromDynamodb(event).
The table's primary partition key is called id, and it does not have an sk or discriminator.
Unfortunately, migrating the data is not an option right now.
These two lines throw exceptions:
aws-lambda-stream/src/from/dynamodb.js
Line 49 in db17e6d
aws-lambda-stream/src/from/dynamodb.js
Line 26 in db17e6d
I'd happily make a PR but am not exactly sure what a reasonable approach is.
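One possible defensive approach, sketched under the assumption that the parsed stream image may simply lack sk and discriminator (the 'entity' default and the ensureKeyFields name are arbitrary, for illustration only):

```javascript
// Backfill missing key fields so downstream pipelines that expect
// sk/discriminator do not throw on single-key tables.
const ensureKeyFields = (image) => ({
  ...image,
  sk: image.sk || 'entity',
  discriminator: image.discriminator || image.sk || 'entity',
});
```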