

nleach999 commented on September 28, 2024

Hi @istvanfedak ,

The issue you'll run into is that SAST will need an appropriately scaled database and enough concurrent managers to handle the load bursts that an architecture scaled out this way could generate.

I recently had to reduce the number of concurrent threads used by the async initial crawl because it hammered the non-report APIs; the sheer number of concurrent requests degraded SAST system performance. If you're executing those requests on demand as part of a realtime decision-making step, you'll need to plan for your SAST system components to be scaled appropriately to avoid system instability.
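The general remedy for that kind of API hammering is a bounded worker pool. The sketch below is a minimal illustration, assuming each SAST API call is wrapped in a caller-supplied `fetch` function (hypothetical, not part of CxAnalytix); it is not how CxAnalytix implements its crawl threads.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def call_with_bounded_concurrency(
    fetch: Callable[[T], R], items: Iterable[T], max_concurrent: int = 4
) -> List[R]:
    """Run fetch(item) for every item, never more than max_concurrent at once.

    The pool's worker count is the only thing limiting how many requests
    reach the SAST API simultaneously; results come back in input order.
    """
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return list(pool.map(fetch, items))
```

Lowering `max_concurrent` trades crawl time for a gentler load on the SAST server, which is the same trade-off described above.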

The data you're trying to retrieve is currently only available in the report AFAIK. The comments come packed in a single field that you'd have to parse even if CxAnalytix were extracting it. While CxAnalytix was not designed to provide realtime data feeds, there are a few things you could do:

  • CxAnalytix persists to MongoDB, and you build Lambdas that do the extraction/parsing on demand at the scale you need. DocumentDB is an option some people use since it is mostly compatible with the MongoDB API.
  • CxAnalytix sends filtered data records to an AMQP endpoint that invokes a Lambda to parse/transform/persist the data; you then provide other Lambdas that query the extracted data wherever you've persisted it.
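Whichever route you take, the packed comment field still has to be split into individual entries. The real layout of that field isn't shown in this thread, so the sketch below assumes a hypothetical one-entry-per-line format `<user>, [<timestamp>]: <text>`; `ENTRY_RE` and `parse_packed_comments` are illustrative names, and the pattern would need adjusting to whatever the report actually contains.

```python
import re
from typing import Dict, List

# Hypothetical entry layout: "<user>, [<timestamp>]: <comment text>", one entry
# per line. Lines that do not match are treated as continuations of the
# previous entry's comment.
ENTRY_RE = re.compile(r"^(?P<user>[^,\[\]]+),\s*\[(?P<when>[^\]]+)\]:\s*(?P<text>.*)$")


def parse_packed_comments(packed: str) -> List[Dict[str, str]]:
    entries: List[Dict[str, str]] = []
    for raw in packed.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = ENTRY_RE.match(line)
        if match:
            entries.append({
                "user": match.group("user").strip(),
                "datetime": match.group("when").strip(),
                "comment": match.group("text").strip(),
            })
        elif entries:
            # Continuation line of a multi-line comment.
            entries[-1]["comment"] += "\n" + line
        # Any text before the first recognizable entry is dropped.
    return entries
```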

Both of these would shift the data retrieval load away from your SAST system. There would be a time lag between crawls, and you wouldn't see the most recent state of a project until its next scan is crawled, but either approach would let you record and retrieve the comments and triage states ad hoc.


nleach999 commented on September 28, 2024

Hi @istvanfedak ,

Can you please expand on your use-case a bit more? CxAnalytix crawls typically take more time than is feasible to put in a Lambda. There is also no external input; it chats with the SAST API, downloads the scans in scope, and outputs records.

There is a Docker container available that can run without a full machine instance.

If you're looking for something that invokes Lambda functions with the transformed data messages, the AMQP output mechanism can invoke Lambda functions when events are received at an AMQP endpoint. (This might need a bit of work to send boundary marker messages so async workflows can orchestrate events properly, but that might depend on your use-case.)
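To illustrate what boundary markers would buy an async consumer, here is a minimal sketch that groups records per scan and releases a batch only when that scan's end marker arrives. The message shape (`{"type": "begin" | "record" | "end", "scanId": ...}`) is entirely hypothetical; per the comment above, CxAnalytix would need changes to emit anything like it.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def batches_by_boundary(messages: Iterable[dict]) -> Iterator[Tuple[str, List[dict]]]:
    """Yield (scan_id, records) once a scan's "end" marker has been seen.

    Assumes hypothetical messages shaped like
    {"type": "begin" | "record" | "end", "scanId": "...", ...}.
    Records for several scans may be interleaved.
    """
    open_batches: Dict[str, List[dict]] = {}
    for msg in messages:
        kind, scan_id = msg["type"], msg["scanId"]
        if kind == "begin":
            open_batches[scan_id] = []
        elif kind == "record":
            # A record without a preceding "begin" starts its batch implicitly.
            open_batches.setdefault(scan_id, []).append(msg)
        elif kind == "end":
            yield scan_id, open_batches.pop(scan_id, [])
        else:
            raise ValueError(f"unknown message type: {kind!r}")
```

Keying open batches by scan ID lets a downstream Lambda process one complete scan at a time even when the records for different scans arrive interleaved.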


istvanfedak commented on September 28, 2024

Hi @nleach999,

I hope all is well. I'll look into the Docker container.

I was wondering if it would be feasible to make CxAnalytix available as an AWS Lambda layer that could provide all the scan information for a single given scan (scanId). This would give the Lambda a pre-packaged CxAnalytix executable that it can invoke.

From there you could split the workload between multiple Lambdas using a fan-out approach. For example, one Lambda would get a list of all the projects in Checkmarx (this can be done using the SAST API as well) and invoke a project handler Lambda per project. The project handler Lambda would then invoke CxAnalytix to obtain all the latest scan information for that specific project and save it in either a database or an S3 bucket (the customer would decide what to do with the data). We need to pull the historical label data for a scan issue, and we were hoping to leverage CxAnalytix for that.
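The fan-out step described above can be sketched as follows, assuming the project IDs have already been fetched and that the project handler is a separate Lambda function. The function name `project-handler` and the `{"projectId": ...}` payload are hypothetical; the boto3 `invoke` call with `InvocationType="Event"` queues each call asynchronously rather than waiting for it.

```python
import json
from typing import Callable, Iterable, List


def fan_out(
    project_ids: Iterable[int],
    invoke: Callable[[str, bytes], None],
    handler_name: str = "project-handler",  # hypothetical function name
) -> List[int]:
    """Send one asynchronous event per project to the project handler."""
    dispatched = []
    for project_id in project_ids:
        payload = json.dumps({"projectId": project_id}).encode("utf-8")
        invoke(handler_name, payload)
        dispatched.append(project_id)
    return dispatched


def make_lambda_invoker() -> Callable[[str, bytes], None]:
    """Build an invoker backed by boto3 (imported lazily so the sketch runs without it)."""
    import boto3

    client = boto3.client("lambda")

    def invoke(function_name: str, payload: bytes) -> None:
        # "Event" = asynchronous invocation; Lambda queues the event internally.
        client.invoke(FunctionName=function_name, InvocationType="Event", Payload=payload)

    return invoke
```

Passing the invoker in as a parameter keeps the dispatch logic testable without AWS credentials; in a deployed function you would call `fan_out(ids, make_lambda_invoker())`.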

We built a similar workflow using the Checkmarx API. We have a Lambda that gets a list of projects and then invokes a project handler Lambda per project. The project handler Lambda gets all the latest scan information, generates a CSV scan report and parses the data (not to mention the back-end batch job that deletes the reports from the Checkmarx server). The only thing we can't obtain is the historical label data. There is an endpoint in the Checkmarx API that lets us pull the label for a scan issue (GET /sast/scans/{scanId}/results/{pathId}/labels), but it doesn't provide the historical data.
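For reference, a call to the labels endpoint quoted above might look like the sketch below. The endpoint path comes from the thread; the base URL, the bearer-token authentication, and the helper names are assumptions, and the server may expect a different `Accept` header. As noted above, this returns the current label only, not its history.

```python
import json
import urllib.request
from typing import Any


def build_labels_request(base_url: str, token: str, scan_id: int, path_id: int) -> urllib.request.Request:
    # Path as quoted in the thread; base_url (the server's REST root) is an assumption.
    url = f"{base_url.rstrip('/')}/sast/scans/{scan_id}/results/{path_id}/labels"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
        method="GET",
    )


def fetch_labels(base_url: str, token: str, scan_id: int, path_id: int) -> Any:
    """Perform the GET and decode the JSON body (requires a reachable server)."""
    req = build_labels_request(base_url, token, scan_id, path_id)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```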

The historical label data in the scan report is all packed together, and it's quite hard to parse out. Here is a sample of the scan issue historical data we're trying to obtain:

[
  {
    "state": 0,
    "severity": 0,
    "userAssignment": "admin",
    "comment": "string",
    "datetime": "string"
  },
  {
    "state": 0,
    "severity": 0,
    "userAssignment": "admin",
    "comment": "string",
    "datetime": "string"
  }
]
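Once the history is in the shape shown above, loading it into typed records for the daily ETL is straightforward. This sketch follows the field names in the sample; `LabelHistoryEntry` is a hypothetical name, `datetime` is kept as text because the sample doesn't show its format, and the numeric `state`/`severity` codes are left undecoded since their meanings aren't given here.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LabelHistoryEntry:
    state: int
    severity: int
    user_assignment: str
    comment: str
    datetime: str  # kept as text; its format is not shown in the sample


def parse_label_history(raw: str) -> List[LabelHistoryEntry]:
    """Convert a JSON array shaped like the sample into typed entries."""
    entries = []
    for item in json.loads(raw):
        entries.append(LabelHistoryEntry(
            state=int(item["state"]),
            severity=int(item["severity"]),
            user_assignment=item["userAssignment"],
            comment=item["comment"],
            datetime=item["datetime"],
        ))
    return entries
```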

Thanks for the help!


istvanfedak commented on September 28, 2024

@nleach999 yes, we ran into the same scaling issues, so we limited Lambda concurrency to 5 concurrent executions. AWS Lambda has a built-in event queue, so the Lambda that got the projects would add events to the project handler Lambda's queue.
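One way to impose a cap like that is reserved concurrency on the handler function; the thread doesn't say which mechanism was actually used. The sketch assumes `lambda_client` is `boto3.client("lambda")` and that the function is named `project-handler` (hypothetical). With the cap in place, throttled asynchronous invocations are returned to Lambda's event queue and retried rather than dropped immediately.

```python
from typing import Any


def cap_concurrency(lambda_client: Any, function_name: str, limit: int = 5) -> None:
    """Reserve (and thereby cap) concurrent executions for one Lambda function.

    lambda_client is expected to be boto3.client("lambda"); any object with a
    compatible put_function_concurrency method works, which keeps this testable.
    """
    if limit < 0:
        raise ValueError("limit must be non-negative")
    lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=limit,
    )
```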

We're not looking to pull the data live and we want to use it more for reporting or analytics. This ETL job runs once per day.

Since comments come packed in a single field that you'd have to parse even if CxAnalytix were extracting it, we won't be able to leverage CxAnalytix.

Thank you for the explanation; it was very helpful!
