threatresponse / python-lambda-inspector
A profiler for the lambda sandbox.
License: MIT License
After the data gather, post the final output to the API for analysis.
arn:aws:iam::576309420438:role/lambda_write_s3 is the appropriate policy.
This is a low priority; I just want to capture it.
Since we have a custom policy for execution, we should give users a way to provision it themselves.
This code:
sanitize_envvars = {
    "AWS_SESSION_TOKEN": {
        "func": truncate, "args": [], "kwargs": {'end': 12}
    },
    "AWS_SECURITY_TOKEN": {
        "func": truncate, "args": [], "kwargs": {'end': 12}
    },
    "AWS_ACCESS_KEY_ID": {
        "func": truncate, "args": [], "kwargs": {'end': 12}
    },
    "AWS_SECRET_ACCESS_KEY": {
        "func": truncate, "args": [], "kwargs": {'end': 12}
    }
}
This seems to be removing values from the actual environment rather than just from the dict. When this code is called, the function can no longer upload to S3, and Lambda outputs the error:
The provided token is malformed or otherwise invalid.
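A likely fix is to sanitize a copy of the environment rather than the live mapping. This is a sketch, assuming a truncate helper shaped like the one in the config above; the function names are illustrative, not the repo's actual code:

```python
import os

def truncate(value, end=12):
    """Keep only the first `end` characters of a secret value."""
    return value[:end]

SENSITIVE_KEYS = (
    "AWS_SESSION_TOKEN",
    "AWS_SECURITY_TOKEN",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
)

def get_env():
    # Copy os.environ first; truncating values on os.environ itself
    # mutates the live process environment, which is what breaks the
    # later S3 upload described above.
    env = dict(os.environ)
    for key in SENSITIVE_KEYS:
        if key in env:
            env[key] = truncate(env[key])
    return env
```

The copy means boto still sees the full credentials while the profiler output only ever carries the truncated prefixes.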
If the filesystem is read-only, handle it and raise an exception indicating that is_warm cannot be tested.
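One way to detect this is to attempt a throwaway write before the warm-file logic runs. This helper is hypothetical (not in the repo), shown as a sketch of the handling described above:

```python
import errno
import os
import tempfile

def assert_writable(path):
    """Raise RuntimeError when `path` is read-only, so callers know
    the is_warm check cannot be performed there."""
    try:
        fd, tmp = tempfile.mkstemp(dir=path)
        os.close(fd)
        os.remove(tmp)
    except OSError as e:
        if e.errno in (errno.EROFS, errno.EACCES):
            raise RuntimeError(
                "%s is read-only; is_warm cannot be tested" % path)
        raise
```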
The request code needs a bit of exception handling so it can fall back to S3 storage when it can't reach the API:
{
    "stackTrace": [
        ["/var/task/main.py", 163, "lambda_handler", "api_call = store_results(res)"],
        ["/var/task/main.py", 150, "store_results", "response = urllib2.urlopen(req)"],
        ["/usr/lib64/python2.7/urllib2.py", 154, "urlopen", "return opener.open(url, data, timeout)"],
        ["/usr/lib64/python2.7/urllib2.py", 435, "open", "response = meth(req, response)"],
        ["/usr/lib64/python2.7/urllib2.py", 548, "http_response", "'http', request, response, code, msg, hdrs)"],
        ["/usr/lib64/python2.7/urllib2.py", 473, "error", "return self._call_chain(*args)"],
        ["/usr/lib64/python2.7/urllib2.py", 407, "_call_chain", "result = func(*args)"],
        ["/usr/lib64/python2.7/urllib2.py", 556, "http_error_default", "raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)"]
    ],
    "errorType": "HTTPError",
    "errorMessage": "HTTP Error 504: G
It'd be great to get "find /var/task" output included in this profiler output if you're releasing it as a general-purpose tool.
@Miserlou : just capturing this in an issue
The ps output is empty in the output dict.
TypeError: datetime.datetime(2017, 3, 6, 19, 13, 16) is not JSON serializable occurs in Python for warm_since and warm_for.
Let's move them all to the same format.
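One consistent format is epoch seconds, which also matches the per-run epoch timestamp proposed elsewhere here. A sketch using json.dumps's default hook (the hook name is illustrative):

```python
import datetime
import json
import time

def dt_default(obj):
    """json.dumps default= hook: emit datetimes as epoch seconds."""
    if isinstance(obj, datetime.datetime):
        return int(time.mktime(obj.timetuple()))
    raise TypeError("%r is not JSON serializable" % obj)

# The exact value from the error above now serializes cleanly.
payload = json.dumps(
    {"warm_since": datetime.datetime(2017, 3, 6, 19, 13, 16)},
    default=dt_default)
```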
Azure instances appear to have a writable tmp location. Basically the approach would be:
1. Detect that the OS is Windows (we already do this in main.py).
2. Write your warm file to D:\local\Temp.
Bazinga!
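The two steps above could be sketched like this, assuming D:\local\Temp is writable on the Azure Windows sandbox as noted; the helper name is hypothetical:

```python
import os
import sys
import tempfile

def warm_file_path():
    """Pick a writable warm-file location per platform."""
    if sys.platform.startswith("win"):
        # Azure's Windows sandbox exposes a writable temp dir here.
        return os.path.join(r"D:\local\Temp", "warm")
    # Lambda and other Linux sandboxes: fall back to the system temp dir.
    return os.path.join(tempfile.gettempdir(), "warm")
```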
Here's a snippet that works against my local dev.
import json
import urllib2


def store_results_api(res):
    """Store results via the API component.

    Store results via urllib2, or directly in S3 when running
    in Lambda. The HTTP request is a POST instead of a GET
    because the data parameter is provided.
    """
    data = json.dumps(res)
    api_key = '4KGRb4PMlx1bBLZQ'
    headers = {
        "Authorization": "Basic %s" % api_key,
        'Content-Type': 'application/json'
    }
    req = urllib2.Request(
        'http://localhost:5000/api/profile',
        data=data,
        headers=headers
    )
    response = urllib2.urlopen(req)
    return response.read()
Each run should also log a datetime in epoch format as part of the JSON.
Add which runtime we're evaluating as a static field in the output. Example:
{
    "runtime": "python | c-sharp | javascript"
}
The function should add the sandbox type to the dictionary for improved Elastic analysis.
For Windows and Linux friendliness we're going to have to standardize on dicts. Something like:
[
    {
        "filesystem": {
            "mount_point": "/run/shm",
            "name": "none",
            "size": "1020452",
            "used": "0",
            "writeable": "true"
        }
    },
    {
        "filesystem": {
            "mount_point": "/run/shm",
            "name": "none",
            "size": "1020452",
            "used": "0",
            "writeable": "true"
        }
    }
]
#"warm_since": is_warm.warm_since, # Issues with dynamo types
#"warm_for": is_warm.warm_for, # Issues with dynamo types
#"dmesg": get_dmesg, # Issues with dynamo types
Error is:
TypeError: Float types are not supported. Use Decimal types instead.
ClientError: An error occurred (ValidationException) when calling the PutItem operation: One or more parameter values were invalid: An AttributeValue may not contain an empty string
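Both errors above can be worked around by normalizing the result dict before PutItem: DynamoDB rejects Python floats (it wants Decimal) and rejects empty-string attribute values. A sketch, not the repo's code:

```python
import decimal

def dynamo_safe(obj):
    """Recursively make a results dict safe for a DynamoDB PutItem:
    floats become Decimal, and empty-string values are dropped."""
    if isinstance(obj, float):
        # Go through str() to avoid binary-float artifacts in the Decimal.
        return decimal.Decimal(str(obj))
    if isinstance(obj, dict):
        return {k: dynamo_safe(v) for k, v in obj.items() if v != ""}
    if isinstance(obj, list):
        return [dynamo_safe(v) for v in obj]
    return obj
```

This would let warm_since, warm_for, and dmesg come back out of the commented-out block above.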
Redact the following fields gathered from get_env() by truncating them to the first 12 characters or so:
AWS_SESSION_TOKEN
AWS_SECURITY_TOKEN
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
profilers/utils.py contains an os.popen call, which was deprecated in Python 2.6. We should consider replacing the os.popen call with the subprocess module down the line to support profiling Python 3 runtimes.
This is not a requirement for the MVP but should be tracked for once the MVP is complete.
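A drop-in replacement could be as small as this; the helper name is illustrative:

```python
import subprocess

def run(cmd):
    """Replacement for os.popen(cmd).read(): universal_newlines=True
    makes check_output return text on both Python 2.7 and Python 3."""
    return subprocess.check_output(cmd, shell=True, universal_newlines=True)
```

Unlike os.popen, check_output also raises CalledProcessError on a non-zero exit instead of silently returning partial output.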
I thought it would be easy to do this in fluentd... turns out it's not that easy without writing a fluentd plugin.
The bottom line is that if we take the output of CPUInfo and tokenize it in Python as a dict, it will parse to fields automatically in Elastic.
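The tokenizing step could look like this, assuming /proc/cpuinfo-style "key : value" lines; the underscore field naming is a guess at what maps cleanly in Elastic:

```python
def parse_cpuinfo(text):
    """Tokenize 'key : value' lines (e.g. /proc/cpuinfo) into a dict
    so each field indexes automatically in Elastic."""
    info = {}
    for line in text.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            if key.strip():
                # Spaces in keys become underscores for field names.
                info[key.strip().replace(" ", "_")] = value.strip()
    return info
```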
Turns out Lambda functions don't get any access to the internet without the presence of a cost-prohibitive NAT gateway. This means that Lambda functions running inside of the ThreatResponse AWS account will need to POST their results in a different way than runtimes out in the wild.
Potential options are:
1. Profiler writes a {uuid.hex()}.json.gz file directly to S3 the same way the API does.
2. Profiler writes to Dynamo and we pick that up somewhere else (seems unnecessary).
3. Deploy a NAT gateway (not cost-effective).
4. Deploy the Lambda in the same VPC as the API box and point it directly at the API instead.
So I'm sure you've gathered that option 1 is preferred. It's just a matter of writing a little logic that only does the S3 upload if you're running from within a Lambda function. We'll still need the urllib2.Request method in the Python profiler that @jeffbryner wrote. Oddly, the blocked request doesn't actually cause the function to fail; simply nothing ever happens...
Option 4 isn't a bad choice either, but it has implications if and when we want to go multi-region, and it puts heavier requirements on the CI/CD pipeline to attach things.
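The "only upload from Lambda" logic for option 1 might be sketched like this; the Lambda detection relies on the AWS_LAMBDA_FUNCTION_NAME runtime env var, and the actual boto3 put_object call is left to the caller rather than guessed at here:

```python
import gzip
import io
import json
import os
import uuid

def in_lambda():
    """True when running inside AWS Lambda (env var set by the runtime)."""
    return "AWS_LAMBDA_FUNCTION_NAME" in os.environ

def pack_results(res):
    """Build the {uuid.hex}.json.gz key and gzipped body for option 1.
    The caller decides, via in_lambda(), whether to put this to S3
    or POST to the API instead."""
    key = "%s.json.gz" % uuid.uuid4().hex
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
        gz.write(json.dumps(res).encode("utf-8"))
    return key, buf.getvalue()
```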