jgilbert01 / aws-lambda-stream
Create stream processors with AWS Lambda functions
License: MIT License
Hi,
I have two subsystems and want to share events between them both ways (sys1 shares and consumes events from sys2, and sys2 shares and consumes events from sys1).
To save costs, I'd like to avoid creating separate ingress streams for the subsystems.
But this raises some questions. Let's look at sys1.
Both custom and external events will be routed to a single stream. Because of this, outgoing external events, intended to be shared with sys2, will also end up on sys1's own stream. This can easily create a circular situation where the Egress ESG publishes/consumes its own events in an infinite loop.
So I'm thinking of a way to clearly distinguish external events inside the Egress ESG listener, such as adding an *-external suffix to the event type (then removing it in the Ingress ESG of the target subsystem), or adding a field like 'source': 'external' to the event and filtering on it.
Can you share your thoughts on this? Is it a valid question at all or am I entirely missing something?
Thanks.
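For illustration, the second idea could be sketched roughly like this (the source field name comes from the question itself; the helper names are hypothetical, not library APIs):

```javascript
// Tag outgoing external events so they can be recognized later.
const toExternal = (event) => ({ ...event, source: 'external' });

// Egress listener filter: skip events already marked external,
// breaking the publish/consume loop on the shared stream.
const notAlreadyExternal = (uow) => uow.event.source !== 'external';
```

The ingress side of the target subsystem could strip the marker (or the *-external suffix) before putting the event on its own stream.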
Right now, using the Limit query parameter will result in multiple queries (each with the stated Limit) due to the cursor implementation.
This makes it impossible to use access patterns such as 'Fetch the most recent item':
{
  ...,
  ScanIndexForward: false,
  Limit: 1,
}
If we're on the same page: I've submitted a PR with my solution; feel free to use it in any way you like.
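For reference, the blocked access pattern is a single request like the following sketch (the table and key attribute names are assumptions for illustration):

```javascript
// Fetch only the most recent item for a partition: newest first,
// stop after one item, in exactly one query.
const mostRecentRequest = (pk) => ({
  TableName: process.env.ENTITY_TABLE_NAME,
  KeyConditionExpression: '#pk = :pk',
  ExpressionAttributeNames: { '#pk': 'pk' },
  ExpressionAttributeValues: { ':pk': pk },
  ScanIndexForward: false, // descending sort key order
  Limit: 1,                // one item, one query
});
```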
This will give stream processors more control over versions, inclusion, and function size.
Add trace support across the event flow.
Food for thought at this point.
When installing, I got a deprecation notice about uuid:
Please upgrade to version 7 or higher. Older versions may use Math.random() in certain circumstances, which is known to be problematic. See https://v8.dev/blog/math-random for details.
Looks like there are performance optimizations in more recent versions than the 3.x line as well.
Hi,
Here, the query request to DataIndex contains ConsistentRead: true, which throws ValidationException: Consistent reads are not supported on global secondary indexes. This matches the documentation: consistent reads are not supported on global secondary indexes.
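One possible guard, as a sketch: only set ConsistentRead when the request is not against an index. This is an illustration of the constraint, not the library's actual fix:

```javascript
// DynamoDB rejects ConsistentRead on a GSI, so only request strong
// consistency when querying the table itself (no IndexName present).
const withConsistency = (request) => ({
  ...request,
  ...(request.IndexName ? {} : { ConsistentRead: true }),
});
```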
Fault events help us handle poison events and resubmit the failing events later. They also contain the uow, which holds valuable information that can help diagnose the problem. However, this information may contain PII, it can be very large, and it may have circular references.
This improvement will redact identified PII, trim buffers, and handle circular references.
It will leave the uow.record element intact to support resubmit. If the uow.record element has PII, then it should be encrypted at the source.
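A rough sketch of the trimming idea (the field names in REDACT and the placeholder strings are assumptions; a fuller version would also leave uow.record untouched to support resubmit):

```javascript
// Make a uow safe to embed in a fault event by redacting named fields,
// trimming buffers, and breaking circular references.
// Note: JSON.stringify calls Buffer#toJSON before the replacer runs,
// so buffers appear as { type: 'Buffer', data: [...] } here.
const REDACT = ['email', 'ssn']; // assumed PII field names

const trimForFault = (uow) => {
  const seen = new WeakSet();
  return JSON.parse(JSON.stringify(uow, (key, value) => {
    if (REDACT.includes(key)) return '[REDACTED]';
    if (value && value.type === 'Buffer' && Array.isArray(value.data)) {
      return `[Buffer ${value.data.length} bytes]`; // trim large buffers
    }
    if (typeof value === 'object' && value !== null) {
      if (seen.has(value)) return '[Circular]'; // repeated/circular refs
      seen.add(value);
    }
    return value;
  }));
};
```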
As the title says, the hyperlink in the readme that points to the highland.js website is outdated and now redirects to some other website. I think it should be updated to the current website, https://caolan.github.io/highland/.
Describe the bug
I am sure this is an edge case, but I found this behavior when testing out the code.
The eventType provided in the RULES matches the wrong event.type when two names are similar but differ only by the last letter.
For example, when creating a rule with an eventType:
eventType: ['something-updated', 'something-deleted']
and passing an event into the handler with this type:
id: 'fcc12355-f339-4f71-bbed-eee646535bbb',
type: 'something-update',
It will match that rule even though the event types are not the same.
To Reproduce
Steps to reproduce the behavior:
{
  id: 'fcc12355-f339-4f71-bbed-eee646535bbb',
  type: 'something-update',
}
{
  id: 'p4',
  flavor: materialize,
  eventType: ['something-updated', 'something-deleted'],
  toUpdateRequest,
}
export const handler = async (event) =>
  initialize(PIPELINES, OPTIONS)
    .assemble(fromKinesis(event))
    .through(toPromise);
Expected behavior
The event.type in the event body should only match with the eventType in the RULES when the match is exact.
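A minimal sketch reproducing the report, assuming the current filter does a prefix-style comparison while the expected behavior is an exact match:

```javascript
// Assumed current behavior: a rule type that merely starts with the
// event type is treated as a match.
const prefixMatch = (rule, uow) =>
  rule.eventType.some((t) => t.startsWith(uow.event.type));

// Expected behavior: exact match only.
const exactMatch = (rule, uow) =>
  rule.eventType.includes(uow.event.type);

const rule = { eventType: ['something-updated', 'something-deleted'] };
const uow = { event: { type: 'something-update' } };

prefixMatch(rule, uow); // true  - the reported bug
exactMatch(rule, uow);  // false - the expected behavior
```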
Can we do it? Similar to the evaluate flavor. I can create a PR.
upd: or the same thing in materialize (multiple updateRequests) would also do the trick for me. Or both :)
If you ask me, I'd do it for both flavors, to have this option just in case, rather than re-creating a whole flavor in the service just for this one thing.
Query caching right now is very coarse, at the entire query level. This could be enhanced to support item level caching for key based queries, etc.
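For illustration, item-level caching for key-based queries could look like this sketch (getItemCached and fetchItem are hypothetical names, not the library's API):

```javascript
// Cache each item by its key instead of caching whole query results,
// so different queries touching the same item share one cache entry.
const itemCache = new Map();

const getItemCached = async (key, fetchItem) => {
  const cacheKey = JSON.stringify(key);
  if (itemCache.has(cacheKey)) return itemCache.get(cacheKey);
  const item = await fetchItem(key);
  itemCache.set(cacheKey, item);
  return item;
};
```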
https://hackernoon.com/lambda-optimization-tip-enable-http-keep-alive-6dc503f6f114
add utils/agent.js:

const aws = require('aws-sdk');
const https = require('https');

const sslAgent = new https.Agent({
  keepAlive: true, // this is new
  maxSockets: Number(process.env.MAX_SOCKETS) || 50, // from the aws-sdk source code
  rejectUnauthorized: true, // from the aws-sdk source code
});
sslAgent.setMaxListeners(0); // from the aws-sdk source code

aws.config.update({
  httpOptions: {
    agent: sslAgent,
  },
});
Hi!
What are your thoughts on changing DataIndex {pk: data, sk: timestamp} to a more generic GSI1 {pk: GSI1pk, sk: GSI1sk}?
This would enable using this GSI for any imaginable use case (where data and timestamp are not what we want to index). Also, this naming is used in Rick Houlihan's talks and Alex DeBrie's "The DynamoDB Book", so it is more familiar to people.
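As an illustration of the proposal (the GSI1pk/GSI1sk names are the suggestion from this issue, and toEntityItem is a hypothetical mapper, not part of the library):

```javascript
// Any access pattern can be projected into the generic index
// attributes, rather than being limited to data/timestamp.
const toEntityItem = (thing) => ({
  pk: thing.id,
  sk: 'thing',
  GSI1pk: thing.category,  // e.g. group by category
  GSI1sk: thing.timestamp, // e.g. sort by time within the group
  ...thing,
});
```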
Hi, thanks for providing this library; it has been very helpful.
I'm just getting started with EventBridge, but I'm not sure I'm following the assumptions around fromEventBridge in this library. In fromEventBridge, the detail is assumed to be a stringified JSON object. However: 1) in the partner events I receive, detail is a JSON object (part of the parent object), which throws a parse exception; and 2) the line above it seems to assume it is already parsed (or is an object) in order to access the detail.id attribute, which breaks in any use case I have come across.
See here:
aws-lambda-stream/src/from/eventbridge.js
Lines 14 to 17 in 1868379
and here in the test:
aws-lambda-stream/src/from/eventbridge.js
Lines 29 to 30 in 0281626
Why is this assumed to be a string needing to be parsed in this case? Just want to be sure I am not missing anything. If the reasoning is based on internal library defaults, maybe it could support either way (object or string) to be more flexible. Happy to open a PR if that is the case.
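One way to support either shape could look like this sketch (parseDetail is a hypothetical helper for illustration, not the library's code):

```javascript
// Accept detail either as a plain object (e.g. partner events) or as a
// stringified JSON object, and normalize to an object.
const parseDetail = (detail) =>
  (typeof detail === 'string' ? JSON.parse(detail) : detail);
```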
Thanks again
Hey John,
Can you enable Github Discussions for this repo so that we can have a place to discuss concepts discussed in your books that aren't exactly the Issues with this lib? What do you think?
Thanks
export * from './dynamodb';
export * from './eventbridge';
export * from './kinesis';
The s3.js, sns.js, and sqs.js exports are missing!
I thought that when providing an array of strings, it should filter on an exact match.
But:
const rule = { eventType: ['specific-thing-changed'] }
const uow = { event: { type: 'thing-changed' } }
expect(filterOnEventType(rule, uow)).toBe(false) <--- will fail
This is due to this check
aws-lambda-stream/src/filters/event.js
Line 10 in c678b09
I think it should be:
...
return rule.eventType.includes(uow.event.type);
...
The CDC flavor is intended to work with the ENTITY table. But the query util that is used to make the request in the flavor has EVENT_TABLE_NAME set as the default tableName:
aws-lambda-stream/src/utils/dynamodb.js
Line 118 in 3f2c5c4
A tableName can be passed to query via the rule, but maybe set it to ENTITY_TABLE_NAME in the flavor by default?
aws-lambda-stream/src/flavors/cdc.js
Line 18 in 3f2c5c4
.through(query({tableName: process.env.ENTITY_TABLE_NAME, ...rule}))
Optional feature:
When an event is too big to publish to EventBridge, Kinesis, etc:
- put it to a bucket in the event hub
- publish with an s3 link, following the format expected in the from functions
- see PR add-claim-check-support
- short ttl on the claim check bucket
- es and s3 event lakes should pull in the payload from the s3 claim check bucket
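A rough sketch of this claim-check flow (the size limit, bucket variable, and key layout are assumptions, and putObject is injected rather than tied to a specific SDK version):

```javascript
// If the serialized event fits, publish it as-is; otherwise store the
// payload in the claim check bucket and publish a small s3 envelope.
const MAX_EVENT_SIZE = 256 * 1024; // assumed publish size limit

const toClaimCheck = async (event, putObject) => {
  const payload = JSON.stringify(event);
  if (Buffer.byteLength(payload) <= MAX_EVENT_SIZE) return event;
  const key = `claim-check/${event.id}`; // assumed key layout
  await putObject({ Bucket: process.env.BUCKET_NAME, Key: key, Body: payload });
  return { id: event.id, type: event.type, s3: { key } };
};
```

Consumers (and the es/s3 event lakes) would detect the s3 element and pull the full payload back from the bucket.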
Hi John,
Could you please clarify this for me.
I have a situation where I need to do some work reacting to an entity data change in the same service.
I see two main options:
The first one uses familiar patterns and feels less 'custom', but there's some doubt about publishing/listening to your own events all across the event hub.
The second does everything locally but brings some complexity, because multiple trigger pipelines react to the same event, making tests a little less straightforward.
Both seem viable to me, but I like the first a little bit more.
Can you please drop a couple of words on which one you think is best and why.
Thanks!
add connector and utils
check the aws-sdk err.retryable flag in the fault handling logic
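The note above could be sketched as follows; this is a hypothetical shape, not the library's actual fault-handling API (the retry and rejectWithFault callbacks are assumptions):

```javascript
// Treat aws-sdk errors flagged retryable as transient, so they are
// retried rather than raised as fault events.
const handleError = (err, uow, rejectWithFault, retry) => {
  if (err.retryable) {
    return retry(uow); // transient: let the stream retry
  }
  return rejectWithFault(err, uow); // permanent: raise a fault event
};
```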
First, I'd just like to say this library looks amazing! Very cool. I have so many use cases it could serve.
I gave it a test on a DynamoDB stream but got an exception from fromDynamodb(event).
The table's primary partition key is called id, and it does not have an sk or discriminator.
Unfortunately, migrating the data is not an option right now.
These two lines throw exceptions:
aws-lambda-stream/src/from/dynamodb.js
Line 49 in db17e6d
aws-lambda-stream/src/from/dynamodb.js
Line 26 in db17e6d
I'd happily make a PR but am not exactly sure what a reasonable approach is.
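One possible defensive approach, sketched under the assumption that the parsed stream image may simply lack sk and discriminator (the 'entity' default and the ensureKeyFields name are arbitrary, for illustration only):

```javascript
// Backfill missing key fields so downstream pipelines that expect
// sk/discriminator do not throw on single-key tables.
const ensureKeyFields = (image) => ({
  ...image,
  sk: image.sk || 'entity',
  discriminator: image.discriminator || image.sk || 'entity',
});
```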