ptdavies17 / cloudwatchfh2hec
Cloudwatch Logs Transform for Firehose: formats into Splunk HEC Event
License: Other
I'm getting an error when Lambda runs the code after I invoke the function. Can someone help?
Not a gzipped file: IOError
Traceback (most recent call last):
File "/var/task/lambda_function.py", line 231, in handler
records = list(processRecords(event['records'],streamARN))
File "/var/task/lambda_function.py", line 115, in processRecords
data = json.loads(f.read())
File "/usr/lib64/python2.7/gzip.py", line 260, in read
self._read(readsize)
File "/usr/lib64/python2.7/gzip.py", line 302, in _read
self._read_gzip_header()
File "/usr/lib64/python2.7/gzip.py", line 196, in _read_gzip_header
raise IOError, 'Not a gzipped file'
IOError: Not a gzipped file
That is the error I'm seeing in CloudWatch. Do I need to change something in the script?
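The traceback shows the gzip reader rejecting the record payload, which suggests that at least one record reaching the function was not gzip-compressed (for example, data put on the stream by something other than a CloudWatch Logs subscription; that cause is an assumption, not confirmed in this thread). A minimal Python 3 sketch of a guard that checks the gzip magic bytes before decompressing is below. The function name `decode_record_data` is hypothetical and is not part of the repository's script.

```python
import base64
import gzip
import json

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def decode_record_data(b64_data):
    """Decode one Firehose record's base64 'data' field.

    Returns the parsed JSON payload when the bytes are gzip-compressed,
    or None when they are not, so the caller can decide how to handle
    the record instead of crashing with 'Not a gzipped file'.
    """
    raw = base64.b64decode(b64_data)
    if raw[:2] != GZIP_MAGIC:
        return None
    return json.loads(gzip.decompress(raw))
```

A caller could mark records for which this returns None as failed or pass them through unchanged, depending on what the stream is expected to carry.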
Hi, can I ask what the thinking is behind this function using batch re-ingestion to return the modified records to the Firehose stream?
According to this blog post, it seems that transformed records can simply be returned from the Lambda:
https://aws.amazon.com/blogs/compute/amazon-kinesis-firehose-data-transformation-with-aws-lambda/
I've tested this and it seems to work.
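For reference, the direct-return approach the question describes can be sketched as follows. Each output record carries the original `recordId`, a `result` status, and base64-encoded `data`. The upper-casing step is only a placeholder transform for illustration, and this sketch does not cover the case where the transformed output grows too large to return in one response, which may be what the re-ingestion logic is for (an assumption, not confirmed in this thread).

```python
import base64


def handler(event, context):
    """Return transformed records directly to Firehose, with no re-ingestion."""
    output = []
    for record in event["records"]:
        payload = base64.b64decode(record["data"])
        # Placeholder transform, for illustration only.
        transformed = payload.upper()
        output.append({
            "recordId": record["recordId"],  # must match the incoming record
            "result": "Ok",                  # or "Dropped" / "ProcessingFailed"
            "data": base64.b64encode(transformed).decode("utf-8"),
        })
    return {"records": output}
```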
RE: How to Ingest Any Log from AWS Cloudwatch Logs via Firehose
I was wondering how best to use CloudwatchFH2HEC.py to ship sourcetypes other than VPC and CloudTrail logs (the only two sourcetypes defined in the example script). Which of the approaches below would you recommend, if any? Ideally I could use the same transform function for all Firehose to HEC log shipping.
SPLUNK_SOURCETYPE=aws:firehose:json
Alternatively, I could create separate Lambda functions for each sourcetype and pass different values for SPLUNK_SOURCETYPE in the environment variable configuration, but that feels like an anti-pattern.
List of example sourcetypes/use-cases from cloudwatch logs
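One possible way to keep a single transform function is to choose the sourcetype from the CloudWatch log group name, falling back to the `SPLUNK_SOURCETYPE` environment variable. This is a sketch under stated assumptions: the function name `sourcetype_for`, the prefix table, and the `/aws/vpc/` prefix are hypothetical and would need to match your own log group naming; none of this is taken from the script itself.

```python
import os

# Hypothetical mapping from log group prefix to Splunk sourcetype.
# Adjust the prefixes to the log groups actually subscribed to the stream.
SOURCETYPE_BY_PREFIX = [
    ("/aws/lambda/", "aws:cloudwatchlogs:lambda"),
    ("/aws/vpc/", "aws:cloudwatchlogs:vpcflow"),
]

# Fallback used when no prefix matches.
DEFAULT_SOURCETYPE = os.environ.get("SPLUNK_SOURCETYPE", "aws:firehose:json")


def sourcetype_for(log_group, default=DEFAULT_SOURCETYPE):
    """Return the sourcetype for a log group, or the default if none matches."""
    for prefix, sourcetype in SOURCETYPE_BY_PREFIX:
        if log_group.startswith(prefix):
            return sourcetype
    return default
```

The table is checked in order, so more specific prefixes should be listed before more general ones.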
I might have missed some settings during configuration, and I'm ending up with two values each for host/source/sourcetype:
host = http-inputs-company.splunkcloud.com host = arn:aws:firehose:us-east-1:123456789:deliverystream/splunk
source = http:HEC source = Destination:/aws/lambda/mylanbda
sourcetype = httpevent sourcetype = aws:cloudwatchlogs:lambda
This prevents using host/source/sourcetype as search terms (perhaps because a term doesn't match a field with multiple values?).
Is this the expected behaviour?
There are a couple of issues with the new Python 3 version of Cloudwatch2FH2HEC.py. import os was removed even though os is still used. Also, the return value of transformLogEvent was changed such that the return_message variable is now unused.
Hi there,
Thanks for your work on putting this script together; it's helping us hugely!
We are seeing an error when processing 3,000-4,000 records of various sizes. We are presumably trying to send too much data to the Firehose (metrics seem to show it throttling when the bytes-per-second limit is hit). The function retries 20 times against that limit and then errors out.
We see lots of "Some records failed while calling PutRecordBatch to Firehose stream, retrying. Individual error codes: ServiceUnavailableException," in the CloudWatch logs for the function.
Sometimes it seems to get through eventually, but sometimes it hits the 20 retries limit and errors with:
"[ERROR] RuntimeError: Could not put records after 20 attempts. Individual error codes: ServiceUnavailableException"
Looking online, it seems the general fix for such issues is to implement a back-off and retry process as per: https://docs.aws.amazon.com/firehose/latest/APIReference/API_PutRecordBatch.html and https://docs.aws.amazon.com/general/latest/gr/api-retries.html.
I was planning on implementing this into your code, but before doing so wondered if there was an easier fix you knew of?
Thanks in advance
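A back-off and retry loop of the kind described could look roughly like the sketch below. It is not the script's existing retry code: the function name `put_records_with_backoff` and its parameters are hypothetical, and `client` is assumed to be a boto3 Firehose client (or anything exposing the same `put_record_batch` call). It resends only the records that failed and waits a random, exponentially growing delay between attempts.

```python
import random
import time


def put_records_with_backoff(client, stream_name, records, max_attempts=8,
                             base_delay=0.2, max_delay=10.0, sleep=time.sleep):
    """Put records on a Firehose stream, retrying failures with backoff and jitter.

    The caller is expected to keep each batch within the PutRecordBatch limits.
    """
    pending = list(records)
    for attempt in range(max_attempts):
        response = client.put_record_batch(
            DeliveryStreamName=stream_name, Records=pending)
        if response.get("FailedPutCount", 0) == 0:
            return
        # Responses are in request order; failed entries carry an ErrorCode.
        pending = [rec for rec, res in zip(pending, response["RequestResponses"])
                   if "ErrorCode" in res]
        if attempt < max_attempts - 1:
            # Full jitter: sleep a random time up to the capped exponential delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
    raise RuntimeError(
        "Could not put %d records after %d attempts" % (len(pending), max_attempts))
```

The `sleep` parameter exists only so the delay can be stubbed out in tests. Because time spent sleeping counts against the Lambda timeout, `max_attempts` and `max_delay` would need to fit within it.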
Is it required to use a classic load balancer? We were unable to get an ALB to work (sticky session error) however the classic load balancer did work.
Thanks