- Graham Jones
- Andrew Krug
From the project root, run: zip lambda-inspector.zip *.py
Sample output is contained in the sample directory. (Slightly redacted.)
A profiler for the lambda sandbox.
License: MIT License
For Windows and Linux friendliness we're going to have to make dicts a standard. Something like:

[
    {
        "filesystem": {
            "mount_point": "/run/shm",
            "name": "none",
            "size": "1020452",
            "used": "0",
            "writeable": "true"
        }
    },
    {
        "filesystem": {
            "mount_point": "/run/shm",
            "name": "none",
            "size": "1020452",
            "used": "0",
            "writeable": "true"
        }
    }
]
profilers/utils.py contains an os.popen call, which was deprecated in Python 2.6. We should replace the os.popen call with the subprocess module down the line to support profiling Python 3 runtimes. This is not a requirement for the MVP, but it should be tracked for once the MVP is complete.
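A possible replacement is sketched below. The helper name run_cmd is hypothetical, not the current utils.py API; it just shows the subprocess-based equivalent of os.popen(cmd).read().

```python
import subprocess


def run_cmd(cmd):
    """Run a shell command and return its stdout as text.

    A drop-in-style replacement for os.popen(cmd).read() that
    works on both Python 2.7 and Python 3 runtimes.
    """
    return subprocess.check_output(cmd, shell=True).decode("utf-8")
```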
Redact the following fields gathered from get_env() by truncating each to the first 12 characters or so:
"AWS_SESSION_TOKEN":
"AWS_SECURITY_TOKEN":
"AWS_ACCESS_KEY_ID":
"AWS_SECRET_ACCESS_KEY":
Function should add sandbox type to the dictionary for improved elastic analysis.
The request code needs a bit of exception handling to fall back to S3 storage if it can't reach the API:
{
"stackTrace": [
[
"/var/task/main.py",
163,
"lambda_handler",
"api_call = store_results(res)"
],
[
"/var/task/main.py",
150,
"store_results",
"response = urllib2.urlopen(req)"
],
[
"/usr/lib64/python2.7/urllib2.py",
154,
"urlopen",
"return opener.open(url, data, timeout)"
],
[
"/usr/lib64/python2.7/urllib2.py",
435,
"open",
"response = meth(req, response)"
],
[
"/usr/lib64/python2.7/urllib2.py",
548,
"http_response",
"'http', request, response, code, msg, hdrs)"
],
[
"/usr/lib64/python2.7/urllib2.py",
473,
"error",
"return self._call_chain(*args)"
],
[
"/usr/lib64/python2.7/urllib2.py",
407,
"_call_chain",
"result = func(*args)"
],
[
"/usr/lib64/python2.7/urllib2.py",
556,
"http_error_default",
"raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)"
]
],
"errorType": "HTTPError",
"errorMessage": "HTTP Error 504: G
This code:
sanitize_envvars = {
    "AWS_SESSION_TOKEN": {
        "func": truncate, "args": [], "kwargs": {'end': 12}
    },
    "AWS_SECURITY_TOKEN": {
        "func": truncate, "args": [], "kwargs": {'end': 12}
    },
    "AWS_ACCESS_KEY_ID": {
        "func": truncate, "args": [], "kwargs": {'end': 12}
    },
    "AWS_SECRET_ACCESS_KEY": {
        "func": truncate, "args": [], "kwargs": {'end': 12}
    }
}
seems to actually be removing values from the real environment instead of just from the dict. When this code is called, the function can no longer upload to S3, and Lambda outputs the error:
The provided token is malformed or otherwise invalid.
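A likely cause is that get_env() returns a direct reference to os.environ, so truncating mutates the live credentials boto needs. A sketch of a fix, using hypothetical get_env/sanitize helpers mirroring the code above:

```python
import os


def get_env():
    """Return a *copy* of the environment so later sanitization cannot
    clobber the live credentials the runtime needs for S3 uploads."""
    return dict(os.environ)


def sanitize(env, keys, end=12):
    """Truncate secret values in the copied dict only."""
    for key in keys:
        if key in env:
            env[key] = env[key][:end]
    return env
```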
ps output empty in output dict.
Azure instances appear to have a writable tmp location. Basically the approach would be:
1. Detect that the OS is Windows (we already do this in main.py).
2. Write the warm file to D:\local\Temp.
3. Bazinga!
If the filesystem is read-only, handle it and raise an exception indicating that is_warm cannot be tested.
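A sketch of that flow. The marker paths and function name are assumptions, not existing code:

```python
import errno
import os
import time

# Assumed warm-file locations per platform (D:\local\Temp per the Azure note).
WARM_PATHS = {"nt": r"D:\local\Temp\warm", "posix": "/tmp/warm"}


def touch_warm_file(path=None):
    """Write the warm marker, raising if the filesystem is read-only
    so callers know is_warm cannot be tested in this sandbox."""
    path = path or WARM_PATHS["nt" if os.name == "nt" else "posix"]
    try:
        with open(path, "w") as f:
            f.write(str(time.time()))
        return path
    except (IOError, OSError) as e:
        if e.errno == errno.EROFS:
            raise RuntimeError("filesystem is read-only; is_warm cannot be tested")
        raise
```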
Turns out lambda functions don't get any access to the internet without a cost-prohibitive NAT gateway. This means that lambda functions running inside of the ThreatResponse AWS account will need to POST their results in a different way than runtimes out in the wild.
Potential options are:
1. Profiler writes a {uuid.hex()}.json.gz file directly to S3, the same way the API does.
2. Profiler writes to DynamoDB and we pick that up somewhere else (seems unnecessary).
3. Deploy a NAT gateway (not cost effective).
4. Deploy the Lambda in the same VPC as the API box and point it directly at the API instead.
As I'm sure you've gathered, option 1 is preferred. It's just a matter of writing a little logic that only does the S3 upload if you're running from within a lambda function. We'll still need the urllib2.Request method in the python profiler that @jeffbryner wrote. Oddly, the blocked request doesn't actually cause the function to fail; simply nothing ever happens.
Option 4 isn't a bad choice either, but it has implications if/when we want to go multi-region and puts heavier requirements on the CI/CD pipeline to attach things.
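Option 1 could look roughly like this. The bucket name is a placeholder, the in-Lambda check is a heuristic, and boto3 availability in the sandbox is assumed:

```python
import gzip
import io
import json
import os
import uuid


def in_lambda():
    """Best-effort check for running inside the Lambda sandbox."""
    return "AWS_LAMBDA_FUNCTION_NAME" in os.environ


def results_blob(res):
    """Gzip the JSON payload and pick a {uuid}.json.gz key,
    mirroring what the API writes to S3."""
    key = "%s.json.gz" % uuid.uuid4().hex
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
        gz.write(json.dumps(res).encode("utf-8"))
    return key, buf.getvalue()


def store_results_s3(res, bucket="your-results-bucket"):  # bucket name assumed
    import boto3  # available in the Lambda runtime
    key, body = results_blob(res)
    boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=body)
    return key
```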
#"warm_since": is_warm.warm_since, # Issues with dynamo types
#"warm_for": is_warm.warm_for, # Issues with dynamo types
#"dmesg": get_dmesg, # Issues with dynamo types
The errors are:
TypeError: Float types are not supported. Use Decimal types instead.
ClientError: An error occurred (ValidationException) when calling the PutItem operation: One or more parameter values were invalid: An AttributeValue may not contain an empty string
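Both errors could be handled by scrubbing the dict before PutItem. A sketch (the helper name is made up) that converts floats to Decimal and drops empty-string values:

```python
from decimal import Decimal


def dynamo_safe(value):
    """Recursively make a value safe for a DynamoDB PutItem call:
    floats become Decimal, and empty-string dict values are dropped."""
    if isinstance(value, float):
        return Decimal(str(value))
    if isinstance(value, dict):
        return {k: dynamo_safe(v) for k, v in value.items() if v != ""}
    if isinstance(value, list):
        return [dynamo_safe(v) for v in value]
    return value
```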
Here's a snippet that works against my local dev.

import json
import urllib2

def store_results_api(res):
    """Store results via the API component.

    Stores results via urllib2, or directly in S3 if running in Lambda.
    The HTTP request will be a POST instead of a GET because the data
    parameter is provided.
    """
    data = json.dumps(res)
    api_key = '4KGRb4PMlx1bBLZQ'
    headers = {
        "Authorization": "Basic %s" % api_key,
        'Content-Type': 'application/json'
    }
    req = urllib2.Request(
        'http://localhost:5000/api/profile',
        data=data,
        headers=headers
    )
    try:
        response = urllib2.urlopen(req)
        return response.read()
    except Exception as e:
        raise e
Each run should also log a datetime, in epoch format, as part of the JSON output.
I thought it would be easy to do this in fluentd... turns out it's not that easy without writing a fluentd plugin. Bottom line: if we take the output of CPUInfo and tokenize it in python as a dict, it will parse to fields automatically in Elastic.
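That tokenization could be a minimal sketch like the following (the function name is made up), splitting /proc/cpuinfo-style "key : value" lines into a dict so Elasticsearch maps each entry to its own field:

```python
def parse_cpuinfo(text):
    """Tokenize /proc/cpuinfo-style 'key : value' lines into a dict."""
    info = {}
    for line in text.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            # Underscore keys so they make clean Elasticsearch field names.
            info[key.strip().replace(" ", "_")] = value.strip()
    return info
```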
TypeError: datetime.datetime(2017, 3, 6, 19, 13, 16) is not JSON serializable occurs in python for warm_since and warm_for.
Let's move them all to the same format.
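A sketch of moving everything to integer epoch seconds (which json, Elastic, and DynamoDB all accept); the helper names are assumptions:

```python
import calendar
import datetime
import time


def to_epoch(dt):
    """Convert a naive UTC datetime (e.g. warm_since) to integer epoch seconds."""
    return calendar.timegm(dt.timetuple())


def stamp(res):
    """Attach the current run's epoch timestamp to the output dict."""
    res["epoch"] = int(time.time())
    return res
```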
Add which runtime we're evaluating as a static value present in the output.
Example:
{
    "runtime": "python | c-sharp | javascript"
}
It'd be great to get the output of find /var/task in this profiler output if you're releasing it as a general purpose tool.
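One way to capture this without shelling out to find, so the result stays a JSON-friendly list (a sketch; the function name is made up):

```python
import os


def list_tree(root="/var/task"):
    """Recursively list files under root, like `find /var/task -type f`."""
    paths = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            paths.append(os.path.join(dirpath, name))
    return sorted(paths)
```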
@Miserlou : just capturing this in an issue
This is a low priority. I just want to capture it.
Since we have a custom policy for execution we should give users a way to provision that themselves.
After data gathering, POST the final output to the API for analysis.
arn:aws:iam::576309420438:role/lambda_write_s3 is the appropriate policy
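The actual policy document isn't captured here; a user-provisionable version might look something like the sketch below (the bucket name is a placeholder, and the exact actions granted by lambda_write_s3 are an assumption):

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:PutObject"],
            "Resource": "arn:aws:s3:::YOUR-RESULTS-BUCKET/*"
        }
    ]
}
```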