
cloudpassage-lib's People

Contributors

derwolfe, ehashman, fboxwala, fhocutt, fxfitz, irinarenteria, lvh, reaperhulk, sirsean


cloudpassage-lib's Issues

Increase cache TTL for auth tokens

I was wondering why we need to fetch a new auth token every 4-5 requests in clark-kent, and realized this is because the cache TTL is set to 8000ms (i.e. 8 seconds): https://github.com/RackSec/cloudpassage-lib/blob/48eb2c6ee7840665a8a63cbf6719527a3fcb4ab4/src/cloudpassage_lib/core.clj#L83

The Halo docs suggest the tokens usually live for 15 minutes, although we can check the exact token lifetime in the response body. Is it possible to cache these for 8 minutes (which I believe we did in redis) instead?
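A minimal sketch of deriving the TTL from the response instead of hard-coding 8000ms. This assumes the auth response carries a token lifetime in seconds under an `:expires_in` key (the field name is an assumption; check the actual response body), falling back to 8 minutes when absent:

```clojure
(def safety-margin-ms
  "Refresh a little before the token actually expires."
  (* 60 1000))

(defn token-cache-ttl-ms
  "TTL in ms for caching an auth token, given the parsed response body.
  Falls back to 8 minutes when no :expires_in field is present."
  [{:keys [expires_in]}]
  (if expires_in
    (max 0 (- (* expires_in 1000) safety-margin-ms))
    (* 8 60 1000)))
```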

Refactor the library such that it doesn't need to look at environment variables

Here's an example of an error message that doesn't say very much:

=> (scans/fim-report! "$ID" "$KEY")
16-03-11 20:25:17 MJV0HLDKQ4 INFO [cloudpassage-lib.core] - fetching new auth token for $ID
Mar 11, 2016 2:25:17 PM clojure.tools.logging$eval420$fn__424 invoke
SEVERE: error in stream handler
java.lang.NullPointerException
    at java.util.Arrays.copyOfRange(Arrays.java:3521)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93)
    at clojure.lang.Reflector.invokeStaticMethod(Reflector.java:207)
    at fernet.core$split_key.invokeStatic(core.clj:29)
    at fernet.core$split_key.invoke(core.clj:28)
    ...

Yay NPE!

The actual problem here was that the fernet key/redis environment variables weren't set properly in profiles.clj. It would be great to validate these settings up front and provide a more user-friendly error message.
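A sketch of what that validation could look like, as a pure function over a config map. The setting names mirror the profiles.clj example elsewhere in this tracker but are otherwise assumptions:

```clojure
(require '[clojure.string :as string])

(def required-settings
  ;; Illustrative; match whatever the library actually reads.
  [:fernet-key :redis-url :accounts])

(defn check-config!
  "Throws an ex-info naming every missing setting; returns config when
  everything required is present."
  [config]
  (let [missing (remove #(some? (get config %)) required-settings)]
    (if (seq missing)
      (throw (ex-info (str "Missing required settings: "
                           (string/join ", " (map name missing)))
                      {:missing missing}))
      config)))
```

Calling this once at startup turns the NPE above into a message that names the absent keys.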

Sometimes, we fail to fetch an auth token

This looks something like the following, from the clark-kent logs:

16-07-13 14:44:19 cloudpassage-reporter INFO [cloudpassage-lib.core:56] - fetching new auth token for <id redacted>
16-07-13 14:44:19 cloudpassage-reporter INFO [cloudpassage-lib.core:74] - fetching https://api.cloudpassage.com/v1/servers/
16-07-13 14:44:21 cloudpassage-reporter INFO [cloudpassage-lib.scans:94] - no more urls to fetch
Jul 13, 2016 2:44:21 PM clojure.tools.logging$eval36$fn__40 invoke
SEVERE: error in stream handler
clojure.lang.ExceptionInfo: Invalid token. {}
        at clojure.core$ex_info.invokeStatic(core.clj:4617)
        at clojure.core$ex_info.invoke(core.clj:4617)
        at fernet.core$invalid_token.invokeStatic(core.clj:16)
        at fernet.core$invalid_token.invoke(core.clj:13)
        at fernet.core$decrypt_token.invokeStatic(core.clj:89)
        at fernet.core$decrypt_token.doInvoke(core.clj:78)
        at clojure.lang.RestFn.invoke(RestFn.java:425)
        at clojure.lang.AFn.applyToHelper(AFn.java:156)
        at clojure.lang.RestFn.applyTo(RestFn.java:132)
        at clojure.core$apply.invokeStatic(core.clj:650)
        at clojure.core$apply.invoke(core.clj:641)
        at fernet.core$decrypt_to_string.invokeStatic(core.clj:110)
        at fernet.core$decrypt_to_string.doInvoke(core.clj:101)
        at clojure.lang.RestFn.invoke(RestFn.java:425)
        at cloudpassage_lib.core$fetch_token_BANG_.invokeStatic(core.clj:104)
        at cloudpassage_lib.core$fetch_token_BANG_.invoke(core.clj:93)
        at cloudpassage_lib.scans$get_page_BANG_.invokeStatic(scans.clj:73)
        at cloudpassage_lib.scans$get_page_BANG_.invoke(scans.clj:70)
        at cloudpassage_lib.scans$scan_each_server_BANG_$scan_server_BANG___16574.invoke(scans.clj:141)
        at cloudpassage_lib.scans$scan_each_server_BANG_$fn__16577.invoke(scans.clj:145)

<giant manifold traceback>

We should add token-checking logic and a retry to ensure we don't attempt to proceed with an empty auth token.

(I thought I had filed this bug ages ago, but turns out I did not.)
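One possible shape for the token-checking-plus-retry logic, sketched around a generic fetch function (`fetch-fn` stands in for `cloudpassage-lib.core/fetch-token!`; the attempt count is illustrative):

```clojure
(require '[clojure.string :as string])

(defn valid-token? [token]
  (and (string? token) (not (string/blank? token))))

(defn fetch-token-with-retry!
  "Calls fetch-fn up to max-attempts times until it yields a usable
  token; throws rather than proceeding with an empty one."
  [fetch-fn max-attempts]
  (loop [attempt 1]
    (let [token (fetch-fn)]
      (cond
        (valid-token? token)     token
        (< attempt max-attempts) (recur (inc attempt))
        :else (throw (ex-info "Could not fetch a valid auth token"
                              {:attempts attempt}))))))
```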

Fetch scans for more precise time ranges

Right now, when I call fim-report! with the hard-coded time range of the last three hours, I may actually receive multiple scan reports for the same host in the results (as FIM scans are completed on an hourly basis). This almost certainly isn't the desired behaviour; rather, I want the last/most recent FIM scan for each individual host.

We need to figure out if this is just a matter of tweaking the time range, or if we'll have to approach this differently (i.e. by requesting scans per host).
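Even if we keep the coarse time range, deduplicating down to the most recent scan per host is straightforward, since the ISO-8601 `:completed_at` timestamps sort correctly as strings. A sketch:

```clojure
(defn latest-scan-per-host
  "Given scan report maps (as returned by fim-report!), keeps only the
  most recent scan for each :server_id."
  [scans]
  (->> scans
       (group-by :server_id)
       vals
       (map #(last (sort-by :completed_at %)))))
```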

Rework Pagination / use a url package, walk links

Instead of creating pagination links in advance, we should walk the links returned by the API. This is more robust in the case that our clock is out of sync with CloudPassage's.

This will likely require

  1. parsing the URL returned by the next link and turning it into a request
  2. pushing the resulting request map onto the input-stream
  3. reworking the tests

Also, this looks like a good place to clean up how URLs and request data are sent to the actual page fetcher. Instead of building a string representing the complete URI in advance, we could use Aleph's support for query parameters; fetch-events! currently builds this string manually.

(original issue here https://github.com/RackSec/cloudpassage-poller/issues/45#issuecomment-195562946)
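Step 1 could be sketched with nothing more than java.net.URI; the request-map keys below follow the usual Ring/Aleph shape but are illustrative:

```clojure
(defn next-link->request
  "Parses the \"next\" link from an API response into a request map,
  so pagination walks whatever URL the server hands back."
  [url]
  (let [uri (java.net.URI. url)]
    {:scheme       (keyword (.getScheme uri))
     :server-name  (.getHost uri)
     :uri          (.getPath uri)
     :query-string (.getQuery uri)}))
```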

Off by one error when calling scans!

Calling the SCA API likely returns one too few pages of results.

The cause is here: https://github.com/RackSec/cloudpassage-lib/blob/4c67340c1224c90789deb8b7a7af1854e7bcdf82/src/cloudpassage_lib/scans.clj#L81. The call to ms/put-all might be made on a stream that has already been closed. The fix would be to move the call to ms/put-all above the conditional branch that closes the streams.

Add retry logic on HTTP errors

I got my first 502 gateway error from the CloudPassage SCA API today! If we end up losing a single report like that, for our purposes the combined report is incomplete and hence incorrect, so we need to start over from scratch.

But it's wasteful to have to throw all the fetched data away in order to try again. Ideally, we should add some retry logic on encountering an error, and only fail after trying some number of times.
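A minimal per-request retry sketch, independent of any HTTP client (`fetch-fn` and the backoff numbers are stand-ins). Retrying at this level means one 502 costs one extra request instead of a full re-fetch:

```clojure
(defn with-retries
  "Calls fetch-fn; on exception, sleeps backoff-ms and retries up to
  max-retries more times, rethrowing the last failure."
  [fetch-fn max-retries backoff-ms]
  (loop [attempt 0]
    (let [result (try {:ok (fetch-fn)}
                      (catch Exception e {:error e}))]
      (cond
        (contains? result :ok)  (:ok result)
        (< attempt max-retries) (do (Thread/sleep backoff-ms)
                                    (recur (inc attempt)))
        :else                   (throw (:error result))))))
```

A real version would probably only retry on 5xx responses and use exponential backoff.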

Add docs for the clojars release process

  • How to sign up for clojars and deal with credentials
  • Adding someone to the group so they can upload releases
  • How to cut a release and push changes back to master using lein release

Fix the docs for compatibility with the lein-env plugin

The lein-env plugin will blow away the .lein-env file if there isn't a profiles.clj in the root of the repo. We should update the docs to reflect this.

Sample profiles.clj for dev testing in the repl:

{:repl {:env {:accounts "lvh:hunter2"
              :redis-url "redis://localhost:6379"
              :redis-timeout 4000
              :fernet-key "very_secret_fernet_key"}}}

fim-report! should fetch more data

Currently the output of fim-report! looks like this:

({:server_id "09d36abea5cc11e591527d9f85f6c9bc",
  :server_url
  "https://api.cloudpassage.com/v1/servers/09d36abea5cc11e591527d9f85f6c9bc",
  :completed_at "2016-03-07T18:45:59.840Z",
  :server_hostname "111111-cf01",
  :module "fim",
  :non_critical_findings_count 0,
  :status "completed_clean",
  :id "d4d0caace49411e58d021b460156fb0c",
  :ok_findings_count 3801,
  :url
  "https://api.cloudpassage.com/v1/scans/d4d0caace49411e58d021b460156fb0c",
  :critical_findings_count 4,
  :created_at "2016-03-07T18:45:58.006Z"}
...)

This probably isn't detailed enough for compliance purposes. It looks like the rest of the scan data lives at the :url; we should probably improve fim-report! to fetch and return the data located there as well.
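One way to structure that enrichment, with the fetching abstracted out (`fetch-json!` is a stand-in for whatever authenticated GET cloudpassage-lib uses internally, and `:details` is an illustrative key):

```clojure
(defn with-scan-details
  "For each scan summary, follows its :url and attaches the full scan
  body under :details."
  [fetch-json! summaries]
  (map (fn [summary]
         (assoc summary :details (fetch-json! (:url summary))))
       summaries))
```

Note this adds one extra request per host, which interacts badly with the API performance issues described below.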

Performance issues with CloudPassage API calls

As the number of hosts in the reports increases, the CloudPassage API's performance issues become more readily apparent. Based on a trial run, I calculated:

  • SCA calls take ~7s but can take up to 10-15s and return ~190kB of data for Windows, ~250kB for Linux (RHEL)
  • SVM calls take ~3.5s and return ~20kB per server
  • FIM calls take ~0.5s and return ~40kB per server

Part of the discrepancy between the performance of the FIM vs. SCA calls can be explained by a large difference in data size returned: FIM scans at the "details" level have many fewer details than the SCA and SVM scans. However, this doesn't explain everything; SVM scans are smaller than FIM scans and still take longer to fetch.

I gathered this data by working with scans for a client with 58 hosts. It took a total of 10 minutes to fetch data the first time around, but could take longer on occasion. This concerns me because if those numbers are accurate, we can expect to spend nearly 3 hours fetching data for a client with 1000 hosts.

We haven't tried parallelizing these requests, because of worries about rate-limiting (see #41). It may also be helpful if we could batch the requests, but I don't see any API docs on the topic.
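As a sanity check on the 1000-host estimate above, linear extrapolation from the observed run:

```clojure
(defn estimated-minutes
  "Linearly scales an observed fetch time to a different host count."
  [observed-minutes observed-hosts target-hosts]
  (/ (* observed-minutes target-hosts) observed-hosts))

;; 10 minutes for 58 hosts scaled to 1000 hosts comes out around
;; 172 minutes, i.e. just under 3 hours.
```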

Clean up some of the server URL logic in scans.clj

There is at least one unused function among the URL helpers at the top of scans.clj. Someone should go through them, remove any unused code, and standardize how we handle URLs for scans vs. servers.

Fix error-handling anywhere `get-page!` is used

scans/get-page! throws an Exception, but if an exception is thrown inside a manifold deferred, manifold catches it and prints a stack trace instead. I suspect what we actually want is to md/catch the Exception before manifold does and throw our own, so that it propagates to the report consumer.

Consolidate mocks in tests to use behaviors

The tests inside scans_test use a mixture of fake-get-page! and anonymous functions defined inside the tests. This is confusing to maintainers and should be refactored to use a single behavior-driven mock that can exercise both the happy and failure paths.
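A sketch of what a single parameterized fake could look like; the behavior keywords and response shapes are illustrative, not the library's actual ones:

```clojure
(defn make-fake-get-page!
  "Returns a fake get-page! exhibiting the named behavior, so each test
  declares the path it exercises instead of defining an inline fn."
  [behavior]
  (case behavior
    :ok    (fn [_url] {:status 200 :body {:scans []}})
    :error (fn [_url] (throw (ex-info "fetch failed" {:status 502})))))
```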
