Comments (8)
I looked a little further and realized that you are caching results in _compose_pattern() through the use of the @functools.lru_cache decorator. However, your cache size of 2048 is far too small: in the checks for IAM alone, that function processed 7,100 unique inputs. I'm sure that the caching helps a little, but it looks like many entries were expiring from the cache before being requested again.
So, I removed the size limit by changing this line (principalmapper/querying/local_policy_simulation.py:891):
@functools.lru_cache(maxsize=2048, typed=True)
to this:
@functools.cache
With that change, processing for CloudFormation dropped from 354 seconds to 41 seconds. Lambda went from >4 hours to 27 minutes.
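To illustrate why a 2048-entry cache helps so little here, the sketch below simulates 7,100 distinct inputs, each requested twice in the same order. compose_pattern() is a hypothetical stand-in for _compose_pattern() (its wildcard-to-regex logic is an assumption, not PMapper's code), and the strictly cyclic access order is also an assumption; real access patterns will be less regular. Under that assumption, an LRU cache smaller than the working set evicts every entry before it is reused, while an unbounded cache hits on every repeat. Note that functools.cache was added in Python 3.9; on older interpreters, @functools.lru_cache(maxsize=None) is the equivalent.

```python
import functools
import re


# Hypothetical stand-in for PMapper's _compose_pattern(): converts an
# IAM-style wildcard string ("*" and "?") into a compiled regex.
def compose_pattern(wildcard: str) -> re.Pattern:
    escaped = re.escape(wildcard).replace(r"\*", ".*").replace(r"\?", ".")
    return re.compile(f"^{escaped}$")


# Two independently cached wrappers around the same function.
bounded = functools.lru_cache(maxsize=2048, typed=True)(compose_pattern)
unbounded = functools.cache(compose_pattern)  # Python 3.9+

# Assumed access pattern: 7,100 distinct inputs, each requested twice,
# always in the same order.
inputs = [f"arn:aws:iam::*:role/role-{i}" for i in range(7100)]
for _ in range(2):
    for s in inputs:
        bounded(s)
        unbounded(s)

print(bounded.cache_info())
# CacheInfo(hits=0, misses=14200, maxsize=2048, currsize=2048)
print(unbounded.cache_info())
# CacheInfo(hits=7100, misses=7100, maxsize=None, currsize=7100)
```

The trade-off of an unbounded cache is memory: it keeps one entry per distinct input for the life of the process, which is acceptable when the set of inputs is bounded by the size of the account being scanned.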
Those times still seem excessive to me. I suspect that there are other regex-related bottlenecks in PMapper. The call to re.match at local_policy_simulation.py:428 looks like another good candidate for optimization. If I find some time, I'll re-run my modified build with the profiler.
I re-ran the modified version with cProfile; it turns out that local_policy_simulation.py:428 is a non-issue. However, _matches_after_expansion(), the function that calls _compose_pattern(), remains a bottleneck. In a 5,225-second run (which I killed before completion), that function was called ~81 million times for a total of 1,769 seconds, 1,146 of which were spent in the function itself (most of the remainder was in pattern.match()). policy_has_matching_statement(), one of the main callers of _matches_after_expansion(), was also a big time sink, consuming 660 seconds over ~2.4 million calls. Something must be calling those functions far too often.
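The split above between a function's total time (1,769 seconds) and the time spent in the function itself (1,146 seconds) corresponds to cProfile's cumtime and tottime columns. The sketch below shows one way to collect and sort those numbers programmatically with the standard cProfile and pstats modules; matches(), workload(), and the regex are hypothetical toy examples, not PMapper code.

```python
import cProfile
import io
import pstats
import re

PATTERN = re.compile(r"^s3:Get.*$")  # hypothetical pattern


def matches(action: str) -> bool:
    # Hypothetical stand-in for a hot matcher such as _matches_after_expansion().
    return PATTERN.match(action) is not None


def workload() -> int:
    actions = [f"s3:GetObject{i}" for i in range(1000)] + ["iam:PassRole"] * 1000
    return sum(matches(a) for a in actions)


profiler = cProfile.Profile()
profiler.enable()
result = workload()
profiler.disable()

out = io.StringIO()
# SortKey.TIME sorts by tottime (time in the function itself, excluding callees);
# SortKey.CUMULATIVE would sort by cumtime (including callees like pattern.match()).
stats = pstats.Stats(profiler, stream=out).sort_stats(pstats.SortKey.TIME)
stats.print_stats(5)  # top five rows only
print(out.getvalue())
```

The ncalls column of that report is usually the quickest way to spot a function being called far more often than expected.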
There are also some basic data operations that may be called too often. str.lower() consumed 282 seconds (42 million calls), which is more than 5% of the program's execution time. Adding items to and removing items from CaseInsensitiveDict consumed more than 10%. These are probably related, since CaseInsensitiveDict calls str.lower on every operation. Maybe convert data to lower case on ingestion so that you don't need to worry about it later? Or maybe you can reduce the number of CaseInsensitiveDict operations from ~20 million to something more reasonable, at which point the time spent converting to lower case will no longer matter.
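A minimal sketch of the first suggestion, under stated assumptions: CountingCaseInsensitiveDict is a hypothetical class that lower-cases the key on every operation (mirroring the behavior described for CaseInsensitiveDict, not its actual code) and counts those calls, while the hypothetical ingest() lower-cases keys once so that later lookups with already-normalized keys can use a plain dict. The key names and lookup counts are made up for illustration.

```python
class CountingCaseInsensitiveDict:
    """Hypothetical minimal case-insensitive mapping that lower-cases the key
    on every operation and counts how often it does so."""

    def __init__(self) -> None:
        self._data = {}
        self.lower_calls = 0

    def _norm(self, key: str) -> str:
        self.lower_calls += 1
        return key.lower()

    def __setitem__(self, key: str, value: object) -> None:
        self._data[self._norm(key)] = value

    def __getitem__(self, key: str) -> object:
        return self._data[self._norm(key)]

    def __contains__(self, key: str) -> bool:
        return self._norm(key) in self._data


def ingest(raw: dict) -> dict:
    """Lower-case every key once, at ingestion time."""
    return {key.lower(): value for key, value in raw.items()}


raw = {f"Action{i}": i for i in range(100)}
# Assumption: internal lookups use keys that are already lower-case.
lookups = [f"action{i}" for i in range(100)] * 50

ci = CountingCaseInsensitiveDict()
for key, value in raw.items():
    ci[key] = value
ci_total = sum(ci[key] for key in lookups)
print(ci.lower_calls)  # 5100: 100 inserts + 5,000 lookups

plain = ingest(raw)  # 100 lower() calls, paid once
plain_total = sum(plain[key] for key in lookups)  # no lower() per lookup
```

The catch is that every key arriving from outside (user queries, newly gathered data) must go through the same normalization step, so the lower-casing moves to the boundary rather than disappearing.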
I scanned an even bigger account; PMapper took five hours to run (with the fix above but without cProfile). The graph stats are:
# of Nodes: 1487 (63 admins)
# of Edges: 14406
# of Groups: 19
# of (tracked) Policies: 2030
While that is a big account, it shouldn't take that long.
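For a sense of scale: if an edge checker evaluates every ordered pair of distinct nodes (an assumption about PMapper's edge identification, not something stated in this thread), the node count above alone gives about 2.2 million candidate pairs per checker. At a hypothetical 1 ms of policy simulation per pair, that is already over half an hour per checker, so the per-pair cost has to be very small for a run to finish quickly:

$$
N(N-1) = 1487 \times 1486 = 2{,}209{,}682 \text{ pairs}, \qquad 2{,}209{,}682 \times 1\,\text{ms} \approx 2{,}210\,\text{s} \approx 37\,\text{min}
$$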
This is some awesome commentary. I have also been struggling with PMapper's performance. Typically the CPU usage is high (~99%), whilst RAM usage is less than 1%.
I will try to look into the recommendations you made, and I also want to look into multi-threading the edge identification code. I killed PMapper last night after it spent 6+ hours just trying to perform edge identification on Lambda.
I am also looking at moving across to awspx. I was mainly using PMapper over awspx because it covers more attack paths and has a lower barrier to entry for contributors thanks to its more straightforward code base. But I am friends with the creator of awspx and would like to pick up improving it (the creator has moved on to a new role, and maintaining awspx is no longer a priority for him).
Maybe related to #55. 16 days to complete is an interesting figure.
@rdegraaf I made some slight optimizations to local_policy_simulation; there's still probably more that can be done, but they improved performance in a small account by ~60%. I have also implemented multiprocessing.
Can you give this branch of my fork a try? https://github.com/Fennerr/PMapper/tree/multiprocessing
@Fennerr Thank you for taking this on! Unfortunately, I no longer have access to the accounts described above or to anything similar; they are not my accounts, and I only had temporary access to perform a security review. I'll see if I can find a co-worker who can help, but no promises.
Thank you @rdegraaf. I have also committed these changes to my main branch.