threatresponse / python-lambda-inspector
A profiler for the lambda sandbox.
License: MIT License
After the data gather, post the final output to the API for analysis.
arn:aws:iam::576309420438:role/lambda_write_s3 is the appropriate policy.
This is a low priority; I just want to capture it.
Since we have a custom policy for execution, we should give users a way to provision it themselves.
This code:
sanitize_envvars = {
    "AWS_SESSION_TOKEN": {
        "func": truncate, "args": [], "kwargs": {'end': 12}
    },
    "AWS_SECURITY_TOKEN": {
        "func": truncate, "args": [], "kwargs": {'end': 12}
    },
    "AWS_ACCESS_KEY_ID": {
        "func": truncate, "args": [], "kwargs": {'end': 12}
    },
    "AWS_SECRET_ACCESS_KEY": {
        "func": truncate, "args": [], "kwargs": {'end': 12}
    }
}
This seems to be removing values from the actual environment rather than just from the dict. When this code is called, the function can no longer upload to S3, and Lambda outputs the error:
The provided token is malformed or otherwise invalid.
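A likely fix is to sanitize a copy of the environment rather than the live mapping. This is a sketch, assuming a truncate helper shaped like the one in the config above; the function names are illustrative, not the repo's actual code:

```python
import os

def truncate(value, end=12):
    """Keep only the first `end` characters of a secret value."""
    return value[:end]

SENSITIVE_KEYS = (
    "AWS_SESSION_TOKEN",
    "AWS_SECURITY_TOKEN",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
)

def get_env():
    # Copy os.environ first; truncating values on os.environ itself
    # mutates the live process environment, which is what breaks the
    # later S3 upload described above.
    env = dict(os.environ)
    for key in SENSITIVE_KEYS:
        if key in env:
            env[key] = truncate(env[key])
    return env
```

The copy means boto still sees the full credentials while the profiler output only ever carries the truncated prefixes.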
If the filesystem is read-only, handle it and raise an exception indicating that is_warm cannot be tested.
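One way to detect this is to attempt a throwaway write before the warm-file logic runs. This helper is hypothetical (not in the repo), shown as a sketch of the handling described above:

```python
import errno
import os
import tempfile

def assert_writable(path):
    """Raise RuntimeError when `path` is read-only, so callers know
    the is_warm check cannot be performed there."""
    try:
        fd, tmp = tempfile.mkstemp(dir=path)
        os.close(fd)
        os.remove(tmp)
    except OSError as e:
        if e.errno in (errno.EROFS, errno.EACCES):
            raise RuntimeError(
                "%s is read-only; is_warm cannot be tested" % path)
        raise
```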
The request code needs a bit of exception handling so it can fall back to S3 storage when it can't reach the API:
{
    "stackTrace": [
        ["/var/task/main.py", 163, "lambda_handler", "api_call = store_results(res)"],
        ["/var/task/main.py", 150, "store_results", "response = urllib2.urlopen(req)"],
        ["/usr/lib64/python2.7/urllib2.py", 154, "urlopen", "return opener.open(url, data, timeout)"],
        ["/usr/lib64/python2.7/urllib2.py", 435, "open", "response = meth(req, response)"],
        ["/usr/lib64/python2.7/urllib2.py", 548, "http_response", "'http', request, response, code, msg, hdrs)"],
        ["/usr/lib64/python2.7/urllib2.py", 473, "error", "return self._call_chain(*args)"],
        ["/usr/lib64/python2.7/urllib2.py", 407, "_call_chain", "result = func(*args)"],
        ["/usr/lib64/python2.7/urllib2.py", 556, "http_error_default", "raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)"]
    ],
    "errorType": "HTTPError",
    "errorMessage": "HTTP Error 504: G
It'd be great to get "find /var/task" output included in this profiler output if you're releasing it as a general-purpose tool.
@Miserlou : just capturing this in an issue
The ps output is empty in the output dict.
TypeError: datetime.datetime(2017, 3, 6, 19, 13, 16) is not JSON serializable occurs in Python for warm_since and warm_for.
Let's move them all to the same format.
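One consistent format is epoch seconds, which also matches the per-run epoch timestamp proposed elsewhere here. A sketch using json.dumps's default hook (the hook name is illustrative):

```python
import datetime
import json
import time

def dt_default(obj):
    """json.dumps default= hook: emit datetimes as epoch seconds."""
    if isinstance(obj, datetime.datetime):
        return int(time.mktime(obj.timetuple()))
    raise TypeError("%r is not JSON serializable" % obj)

# The exact value from the error above now serializes cleanly.
payload = json.dumps(
    {"warm_since": datetime.datetime(2017, 3, 6, 19, 13, 16)},
    default=dt_default)
```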
Azure instances appear to have a writable tmp location. Basically the approach would be:
1. Detect that the OS is Windows (we already do this in main.py).
2. Write your warm file to D:\local\Temp.
Bazinga!
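The two steps above could be sketched like this, assuming D:\local\Temp is writable on the Azure Windows sandbox as noted; the helper name is hypothetical:

```python
import os
import sys
import tempfile

def warm_file_path():
    """Pick a writable warm-file location per platform."""
    if sys.platform.startswith("win"):
        # Azure's Windows sandbox exposes a writable temp dir here.
        return os.path.join(r"D:\local\Temp", "warm")
    # Lambda and other Linux sandboxes: fall back to the system temp dir.
    return os.path.join(tempfile.gettempdir(), "warm")
```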
Here's a snippet that works against my local dev.
import json
import urllib2


def store_results_api(res):
    """Store results via the API component.

    Store results via urllib2, or directly in S3 when running
    in Lambda. The HTTP request is a POST instead of a GET
    because the data parameter is provided.
    """
    data = json.dumps(res)
    api_key = '4KGRb4PMlx1bBLZQ'
    headers = {
        "Authorization": "Basic %s" % api_key,
        'Content-Type': 'application/json'
    }
    req = urllib2.Request(
        'http://localhost:5000/api/profile',
        data=data,
        headers=headers
    )
    response = urllib2.urlopen(req)
    return response.read()
Each run should also log a datetime in epoch format as part of the JSON.
Add which runtime we're evaluating as a static field in the output. Example:
{
    "runtime": "python | c-sharp | javascript"
}
The function should add the sandbox type to the dictionary for improved Elastic analysis.
For Windows and Linux friendliness we're going to have to standardize on dicts. Something like:
[
    {
        "filesystem": {
            "mount_point": "/run/shm",
            "name": "none",
            "size": "1020452",
            "used": "0",
            "writeable": "true"
        }
    },
    {
        "filesystem": {
            "mount_point": "/run/shm",
            "name": "none",
            "size": "1020452",
            "used": "0",
            "writeable": "true"
        }
    }
]
#"warm_since": is_warm.warm_since, # Issues with dynamo types
#"warm_for": is_warm.warm_for, # Issues with dynamo types
#"dmesg": get_dmesg, # Issues with dynamo types
Error is:
TypeError: Float types are not supported. Use Decimal types instead.
ClientError: An error occurred (ValidationException) when calling the PutItem operation: One or more parameter values were invalid: An AttributeValue may not contain an empty string
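Both errors above can be worked around by normalizing the result dict before PutItem: DynamoDB rejects Python floats (it wants Decimal) and rejects empty-string attribute values. A sketch, not the repo's code:

```python
import decimal

def dynamo_safe(obj):
    """Recursively make a results dict safe for a DynamoDB PutItem:
    floats become Decimal, and empty-string values are dropped."""
    if isinstance(obj, float):
        # Go through str() to avoid binary-float artifacts in the Decimal.
        return decimal.Decimal(str(obj))
    if isinstance(obj, dict):
        return {k: dynamo_safe(v) for k, v in obj.items() if v != ""}
    if isinstance(obj, list):
        return [dynamo_safe(v) for v in obj]
    return obj
```

This would let warm_since, warm_for, and dmesg come back out of the commented-out block above.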
Redact the following fields gathered from get_env() by truncating them to the first 12 characters or so:
AWS_SESSION_TOKEN
AWS_SECURITY_TOKEN
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
profilers/utils.py contains an os.popen call, which was deprecated in Python 2.6. We should consider replacing the os.popen call with the subprocess module down the line to support profiling Python 3 runtimes.
This is not a requirement for the MVP but should be tracked for once the MVP is complete.
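A drop-in replacement could be as small as this; the helper name is illustrative:

```python
import subprocess

def run(cmd):
    """Replacement for os.popen(cmd).read(): universal_newlines=True
    makes check_output return text on both Python 2.7 and Python 3."""
    return subprocess.check_output(cmd, shell=True, universal_newlines=True)
```

Unlike os.popen, check_output also raises CalledProcessError on a non-zero exit instead of silently returning partial output.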
I thought it would be easy to do this in fluentd... turns out it's not that easy without writing a fluentd plugin.
The bottom line is that if we take the output of CPUInfo and tokenize it in Python as a dict, it will parse to fields automatically in Elastic.
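The tokenizing step could look like this, assuming /proc/cpuinfo-style "key : value" lines; the underscore field naming is a guess at what maps cleanly in Elastic:

```python
def parse_cpuinfo(text):
    """Tokenize 'key : value' lines (e.g. /proc/cpuinfo) into a dict
    so each field indexes automatically in Elastic."""
    info = {}
    for line in text.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            if key.strip():
                # Spaces in keys become underscores for field names.
                info[key.strip().replace(" ", "_")] = value.strip()
    return info
```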
Turns out Lambda functions don't get any access to the internet without the presence of a cost-prohibitive NAT gateway. This means that Lambda functions running inside of the ThreatResponse AWS account will need to POST their results in a different way than runtimes out in the wild.
Potential options are:
1. Profiler writes a {uuid.hex()}.json.gz file directly to S3 the same way the API does.
2. Profiler writes to Dynamo and we pick that up somewhere else (seems unnecessary).
3. Deploy a NAT gateway (not cost-effective).
4. Deploy the Lambda in the same VPC as the API box and point it directly at the API instead.
So I'm sure you've gathered that option 1 is preferred. It's just a matter of writing a little logic that only does the S3 upload if you're running from within a Lambda function. We'll still need the urllib2.Request method in the Python profiler that @jeffbryner wrote. Oddly, the blocked request doesn't actually cause the function to fail; simply nothing ever happens...
Option 4 isn't a bad choice either, but it has implications if and when we want to go multi-region, and it puts heavier requirements on the CI/CD pipeline to attach things.
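The "only upload from Lambda" logic for option 1 might be sketched like this; the Lambda detection relies on the AWS_LAMBDA_FUNCTION_NAME runtime env var, and the actual boto3 put_object call is left to the caller rather than guessed at here:

```python
import gzip
import io
import json
import os
import uuid

def in_lambda():
    """True when running inside AWS Lambda (env var set by the runtime)."""
    return "AWS_LAMBDA_FUNCTION_NAME" in os.environ

def pack_results(res):
    """Build the {uuid.hex}.json.gz key and gzipped body for option 1.
    The caller decides, via in_lambda(), whether to put this to S3
    or POST to the API instead."""
    key = "%s.json.gz" % uuid.uuid4().hex
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
        gz.write(json.dumps(res).encode("utf-8"))
    return key, buf.getvalue()
```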