ptdavies17 / cloudwatchfh2hec

Cloudwatch Logs Transform for Firehose: formats into Splunk HEC Event

License: Other

Python 100.00%

cloudwatchfh2hec's Issues

gzip file error

I'm getting an error when Lambda runs the code after it is invoked. Can someone help?

Not a gzipped file: IOError
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 231, in handler
    records = list(processRecords(event['records'],streamARN))
  File "/var/task/lambda_function.py", line 115, in processRecords
    data = json.loads(f.read())
  File "/usr/lib64/python2.7/gzip.py", line 260, in read
    self._read(readsize)
  File "/usr/lib64/python2.7/gzip.py", line 302, in _read
    self._read_gzip_header()
  File "/usr/lib64/python2.7/gzip.py", line 196, in _read_gzip_header
    raise IOError, 'Not a gzipped file'
IOError: Not a gzipped file

That is the error I'm seeing in CloudWatch. Do I need to change something in the script?
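The thread does not confirm the cause of this error. One possibility is that the record was not produced by a CloudWatch Logs subscription (for example, a console test event), since CloudWatch Logs sends gzip-compressed payloads to Firehose. A minimal sketch of a guard for that case follows; `decode_record` is a hypothetical helper, not a function from this repository.

```python
import base64
import gzip
import json

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of every gzip stream


def decode_record(data_b64):
    """Decode one Firehose record payload (hypothetical helper).

    Returns the parsed JSON document, or None when the payload is not
    gzip-compressed and so should not be passed to the gzip reader.
    """
    raw = base64.b64decode(data_b64)
    if not raw.startswith(GZIP_MAGIC):
        return None
    return json.loads(gzip.decompress(raw))
```

A caller could mark records for which this returns None as failed or dropped instead of letting the whole invocation raise.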

Why use re-ingestion ?

Hi, may I ask what the thinking is behind this function using batch re-ingestion to return the modified records to the Firehose stream? According to this blog post, it seems that transformed records can simply be returned from the Lambda:
https://aws.amazon.com/blogs/compute/amazon-kinesis-firehose-data-transformation-with-aws-lambda/

I've tested that and it seems to work.
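For reference, the direct-return approach described in the blog post can be sketched as below. The `recordId`, `result`, and `data` fields are the Firehose transformation response format; `transform` is a hypothetical placeholder for the real formatting logic. One plausible reason for re-ingestion, not confirmed in this thread, is that a synchronous Lambda response is limited to 6 MB, and decompressed CloudWatch Logs data can exceed that.

```python
import base64


def transform(payload):
    """Hypothetical transformation: ensure the payload ends with a newline."""
    return payload if payload.endswith(b"\n") else payload + b"\n"


def handler(event, context):
    """Return every transformed record directly to Firehose."""
    output = []
    for record in event["records"]:
        payload = base64.b64decode(record["data"])
        output.append({
            "recordId": record["recordId"],  # must echo the incoming ID
            "result": "Ok",                  # or "Dropped" / "ProcessingFailed"
            "data": base64.b64encode(transform(payload)).decode("utf-8"),
        })
    return {"records": output}
```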

Proper way to support non VPC and cloudtrail sourcetypes

RE: How to Ingest Any Log from AWS Cloudwatch Logs via Firehose

I was wondering how best to use CloudwatchFH2HEC.py to ship log sourcetypes other than VPC and CloudTrail logs (the only two sourcetypes defined in the example script). Which of the approaches below would you recommend, if any? Ideally I could use the same transform function for all Firehose-to-HEC log shipping.

Add a case statement to match additional CloudWatch log group names to their destination sourcetypes
Don't set the sourcetype at all and let Splunk handle it somehow
Set SPLUNK_SOURCETYPE=aws:firehose:json

Alternatively, I could create a separate Lambda function for each sourcetype and pass a different value for SPLUNK_SOURCETYPE in the environment variable configuration, but that feels like an anti-pattern.
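The first option above could be sketched as a lookup table, so that one transform function serves every log group. The table entries and the `sourcetype_for` helper are illustrative assumptions, not code from the repository; the fallback reads the SPLUNK_SOURCETYPE environment variable mentioned above.

```python
import os

# Illustrative mapping from a substring of the log group name to a sourcetype.
# Order matters: the first matching entry wins.
SOURCETYPE_BY_LOG_GROUP = [
    ("CloudTrail", "aws:cloudtrail"),
    ("VPC", "aws:cloudwatchlogs:vpcflow"),
    ("/aws/lambda/", "aws:cloudwatchlogs:lambda"),
]


def sourcetype_for(log_group, default=None):
    """Pick a sourcetype for a log group name (hypothetical helper)."""
    for needle, sourcetype in SOURCETYPE_BY_LOG_GROUP:
        if needle in log_group:
            return sourcetype
    if default is not None:
        return default
    # Fall back to the function's environment configuration.
    return os.environ.get("SPLUNK_SOURCETYPE", "aws:firehose:json")
```

Adding a new log source then means adding one row to the table, with no new Lambda function.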

List of example sourcetypes/use-cases from cloudwatch logs

multiple host/source/sourcetype values being set

I might have missed some settings during configuration, and I'm ending up with two values each for host, source, and sourcetype:

    field        value 1                                value 2
    host         http-inputs-company.splunkcloud.com    arn:aws:firehose:us-east-1:123456789:deliverystream/splunk
    source       http:HEC                               Destination:/aws/lambda/mylanbda
    sourcetype   httpevent                              aws:cloudwatchlogs:lambda

The above prevents searching on host/source/sourcetype (perhaps because a search term doesn't match a field with multiple values?).

Is this the expected behaviour?

New py3 version is missing import, returns wrong string

There are a couple of issues with the new Python 3 version of Cloudwatch2FH2HEC.py. First, the "import os" statement was removed even though os is still used. Second, the return value of transformLogEvent was changed such that the return_message variable is now unused.

Some records failed while calling PutRecordBatch to Firehose stream, retrying. Individual error codes: ServiceUnavailableException

Hi there,
Thanks for your work on putting this script together; it's helping us hugely!
We are seeing an error when processing 3,000 to 4,000 records of various sizes. Presumably we are trying to send too much data to the Firehose (metrics seem to show that it throttles when the bytes-per-second limit is hit), and because that happens on all 20 attempts, the function errors out.

We see lots of "Some records failed while calling PutRecordBatch to Firehose stream, retrying. Individual error codes: ServiceUnavailableException," in the CloudWatch logs for the function.

Sometimes it seems to get through eventually, but sometimes it hits the 20 retries limit and errors with:

"[ERROR] RuntimeError: Could not put records after 20 attempts. Individual error codes: ServiceUnavailableException"

Looking online, it seems the general fix for such issues is to implement a back-off and retry process, as per https://docs.aws.amazon.com/firehose/latest/APIReference/API_PutRecordBatch.html and https://docs.aws.amazon.com/general/latest/gr/api-retries.html.

I was planning on implementing this in your code, but before doing so I wondered whether there is an easier fix you know of?

Thanks in advance
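The back-off and retry process mentioned above could be sketched as follows. This is not the repository's code: `put_with_backoff`, its parameters, and the delay values are assumptions. `put_batch` is any callable that sends a list of records and returns a PutRecordBatch-style response, whose `RequestResponses` entries are in request order and carry an `ErrorCode` for each failed record.

```python
import random
import time


def put_with_backoff(put_batch, records, max_attempts=8,
                     base_delay=0.1, max_delay=5.0, sleep=time.sleep):
    """Send records, retrying only the failed ones with exponential back-off."""
    pending = list(records)
    for attempt in range(max_attempts):
        response = put_batch(pending)
        if response.get("FailedPutCount", 0) == 0:
            return
        # Keep only the records whose response entry reports an error.
        pending = [rec for rec, res in zip(pending, response["RequestResponses"])
                   if "ErrorCode" in res]
        if attempt < max_attempts - 1:
            # Full jitter: wait a random time up to the capped exponential delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
    raise RuntimeError("Could not put %d records after %d attempts"
                       % (len(pending), max_attempts))
```

With boto3, `put_batch` could be `lambda recs: client.put_record_batch(DeliveryStreamName=name, Records=recs)`, where `client` and `name` are supplied by the caller. The total wait time should stay below the Lambda function's timeout.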

Application Load Balancer

Is a Classic Load Balancer required? We were unable to get an ALB to work (sticky session error); however, the Classic Load Balancer did work.

Thanks