Comments (4)
Hi @istvanfedak,
The issue you'll run into is that SAST will need an appropriately scaled database and enough concurrent managers to handle the load bursts that an architecture scaling out like that can produce.
I recently had to change the number of concurrent threads used by the async initial crawl because it was hammering the non-report APIs, and the sheer number of concurrent requests degraded SAST system performance. If you're executing those requests on demand as part of a real-time decision-making step, you'll need to plan for your SAST system components to be scaled appropriately to avoid system instability.
The data you're trying to retrieve is currently only available in the report AFAIK. The comments come packed in a single field that you'd have to parse even if CxAnalytix were extracting it. While CxAnalytix was not designed to provide realtime data feeds, there are a few things you could do:
- CxAnalytix persists to MongoDB, and you build Lambdas that do the extraction/parsing on demand at the scale you need. DocumentDB is an option some people use since it is mostly compatible with the MongoDB API.
- CxAnalytix sends filtered data records to an AMQP endpoint, which invokes a Lambda that parses, transforms, and persists the data; you then provide other Lambdas that query the store where you've persisted the extracted data.
Both of these would shift the data retrieval load away from your SAST system. There would be a time lag between crawls, and you wouldn't see the most recent state of a project until its next scan is crawled, but this would let you record and retrieve the comments and triage states ad hoc.
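The AMQP option could look roughly like the following Lambda handler. It assumes the broker is Amazon MQ for RabbitMQ configured as a Lambda event source, which delivers message bodies base64-encoded under `rmqMessagesByQueue`, and that each CxAnalytix message body is a JSON object. The persistence step is left as a hypothetical placeholder.

```python
import base64
import json
from typing import Any, Dict, List


def handler(event: Dict[str, Any], context: Any = None) -> List[Dict[str, Any]]:
    """Decode RabbitMQ messages delivered by an Amazon MQ event source."""
    decoded = []
    # Messages are grouped per "queue::vhost" key; each body is base64 text.
    for queue, messages in event.get("rmqMessagesByQueue", {}).items():
        for message in messages:
            body = base64.b64decode(message["data"]).decode("utf-8")
            decoded.append({"queue": queue, "record": json.loads(body)})
    # Hypothetical persistence step: write `decoded` to whatever store
    # the query Lambdas read from (DynamoDB, S3, DocumentDB, etc.).
    return decoded
```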
from cxanalytix.
Hi @istvanfedak,
Can you please expand on your use case a bit more? CxAnalytix crawls typically take longer than is feasible to run in a Lambda, whose invocations are capped at 15 minutes. There is also no external input: it talks to the SAST API, downloads the scans in scope, and outputs records.
There is a Docker container available that can run without a full machine instance.
If you're looking for something that invokes Lambda functions with the transformed data messages, the AMQP output mechanism can invoke Lambda functions when events are received at an AMQP endpoint. (It might need a bit of work to send boundary marker messages so async workflows can orchestrate events properly, but whether that's needed depends on your use case.)
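If boundary markers were added, a consumer could hold records until their batch closes. The marker format below is purely hypothetical (each message carries a `type` of `begin`, `record`, or `end` plus a shared `batch` id); the sketch only illustrates the grouping idea, not an existing CxAnalytix message format.

```python
from typing import Dict, Iterable, List


def group_batches(messages: Iterable[dict]) -> List[List[dict]]:
    """Return the record lists of every batch that saw both begin and end markers."""
    open_batches: Dict[str, List[dict]] = {}
    complete: List[List[dict]] = []
    for msg in messages:
        kind, batch = msg["type"], msg["batch"]
        if kind == "begin":
            open_batches[batch] = []
        elif kind == "record":
            # Records for a batch that never began are ignored.
            if batch in open_batches:
                open_batches[batch].append(msg)
        elif kind == "end":
            records = open_batches.pop(batch, None)
            if records is not None:
                complete.append(records)
    # Batches still open here are incomplete and are not handed downstream.
    return complete
```

Dropping incomplete batches means a downstream step never acts on a partial crawl, at the cost of needing retries if an `end` marker is lost.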
Hi @nleach999,
I hope all is well. I'll look into the Docker container.
I was wondering whether it would be feasible to make CxAnalytix available as an AWS Lambda layer that could provide all the scan information for a single given scan (scanId). This would give the Lambda a pre-packaged CxAnalytix executable that it can invoke.
From there you could split the workload between multiple Lambdas using a fan-out approach. For example, one Lambda would get a list of all the projects in Checkmarx (this can be done using the SAST API as well) and invoke a project-handler Lambda per project. The project-handler Lambda would then invoke CxAnalytix to obtain all the latest scan information for that specific project and save it in either a database or an S3 bucket (the customer would decide what to do with the data). We need to pull the historical label data for a scan issue, and we were hoping to leverage CxAnalytix.
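The fan-out step described above might be sketched like this. `fan_out`, the `project-handler` function name, and the `{"projectId": ...}` payload are illustrative assumptions. The default path uses boto3's Lambda `invoke` with `InvocationType="Event"`, which queues an asynchronous invocation instead of waiting for the handler to finish; an `invoke` callable can be injected for testing.

```python
import json
from typing import Callable, Iterable, Optional


def fan_out(project_ids: Iterable[int], function_name: str,
            invoke: Optional[Callable[..., object]] = None) -> int:
    """Queue one asynchronous project-handler invocation per project id."""
    if invoke is None:
        import boto3  # only required when actually calling AWS
        invoke = boto3.client("lambda").invoke
    count = 0
    for project_id in project_ids:
        # "Event" = asynchronous: Lambda queues the event and returns at once.
        invoke(
            FunctionName=function_name,
            InvocationType="Event",
            Payload=json.dumps({"projectId": project_id}).encode("utf-8"),
        )
        count += 1
    return count
```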
We built a similar workflow using the Checkmarx API. We have a Lambda that gets a list of projects and then invokes a project-handler Lambda per project. The project-handler Lambda gets all the latest scan information, generates a CSV scan report, and parses the data (plus a back-end batch job that deletes the reports on the Checkmarx server). The only thing we can't obtain is the historical label data. There is an endpoint in the Checkmarx API that lets us pull the label for a scan issue (GET /sast/scans/{scanId}/results/{pathId}/labels), but it doesn't provide the historical data.
In the scan report, an issue's label history is all packed together, and it's quite hard to parse out. Here is a sample of the per-issue history we're trying to obtain:
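For reference, a request to that labels endpoint could be built with the standard library as below. The `base_url` (assumed to already include the REST API root) and bearer-token authentication are assumptions about the deployment; the function only constructs the request and does not send it.

```python
import urllib.request


def labels_request(base_url: str, token: str, scan_id: int, path_id: int) -> urllib.request.Request:
    """Build (but do not send) GET /sast/scans/{scanId}/results/{pathId}/labels."""
    url = f"{base_url.rstrip('/')}/sast/scans/{scan_id}/results/{path_id}/labels"
    return urllib.request.Request(
        url,
        method="GET",
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )
```

Sending it with `urllib.request.urlopen(...)` returns only the current label, which is exactly the limitation described above.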
```json
[
  {
    "state": 0,
    "severity": 0,
    "userAssignment": "admin",
    "comment": "string",
    "datetime": "string"
  },
  {
    "state": 0,
    "severity": 0,
    "userAssignment": "admin",
    "comment": "string",
    "datetime": "string"
  }
]
```
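Once the history is available as JSON in the shape above, turning it into typed records for analytics is straightforward. This sketch rests on two assumptions: the array is in chronological order, and `datetime` is kept as an opaque string because its format isn't specified. `parse_history` and `state_changes` are hypothetical helpers.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LabelHistoryEntry:
    state: int
    severity: int
    user_assignment: str
    comment: str
    datetime: str  # raw text; the timestamp format isn't specified


def parse_history(raw: str) -> List[LabelHistoryEntry]:
    """Map the camelCase JSON array onto typed entries."""
    return [
        LabelHistoryEntry(
            state=e["state"],
            severity=e["severity"],
            user_assignment=e["userAssignment"],
            comment=e["comment"],
            datetime=e["datetime"],
        )
        for e in json.loads(raw)
    ]


def state_changes(entries: List[LabelHistoryEntry]) -> List[LabelHistoryEntry]:
    """Keep the first entry plus every entry whose state differs from the previous one."""
    changes = []
    previous: Optional[int] = None
    for entry in entries:
        if entry.state != previous:
            changes.append(entry)
        previous = entry.state
    return changes
```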
Thanks for the help!
@nleach999 Yes, we ran into the same scaling issues, so we limited the project handler to 5 concurrent Lambda executions. AWS Lambda has a built-in event queue, so the Lambda that fetched the projects would add events to the project-handler Lambda's queue.
We're not looking to pull the data live and we want to use it more for reporting or analytics. This ETL job runs once per day.
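The throttling and daily schedule described here could be configured with boto3 roughly as follows. The function and rule names are hypothetical, and the clients are passed in so the sketch can be exercised without AWS credentials. A complete setup would also need `events.put_targets` and a Lambda resource permission so the rule can invoke the entry-point function.

```python
from typing import Any


def configure_etl(lambda_client: Any, events_client: Any, project_handler: str,
                  schedule_rule: str, max_concurrent: int = 5) -> None:
    """Cap project-handler concurrency and create a once-a-day schedule rule."""
    # Reserved concurrency caps simultaneous project-handler executions;
    # throttled asynchronous events are retried from Lambda's internal queue.
    lambda_client.put_function_concurrency(
        FunctionName=project_handler,
        ReservedConcurrentExecutions=max_concurrent,
    )
    # EventBridge rule that fires once per day.
    events_client.put_rule(
        Name=schedule_rule,
        ScheduleExpression="rate(1 day)",
        State="ENABLED",
    )
```

Against real AWS this would be called as `configure_etl(boto3.client("lambda"), boto3.client("events"), "project-handler", "daily-cx-etl")`.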
Since the comments come packed in a single field that you'd have to parse even if CxAnalytix were extracting it, we won't be able to leverage CxAnalytix.
Thank you for the explanation; it was very helpful!