google / gcp_scanner

A comprehensive scanner for Google Cloud

License: Apache License 2.0

automation gcp google-cloud-platform scanning-tool security

gcp_scanner's Introduction


Disclaimer

This project is not an official Google project. It is not supported by Google and Google specifically disclaims all warranties as to its quality, merchantability, or fitness for a particular purpose.

GCP Scanner


This is a GCP resource scanner that can help determine what level of access certain credentials possess on GCP. The scanner is designed to help security engineers evaluate the impact of a VM/container compromise, a GCP service account key leak, or an OAuth2 token leak.

Currently, the scanner supports the following GCP resources:

  • GCE
  • GCS
  • GKE
  • App Engine
  • Cloud SQL
  • BigQuery
  • Spanner
  • Pub/Sub
  • Cloud Functions
  • BigTable
  • CloudStore
  • KMS
  • Cloud Services

In addition, the scanner supports service account (SA) impersonation.

The scanner supports extracting and using the following types of credentials:

  • GCP VM instance metadata;
  • User credentials stored in gcloud profiles;
  • OAuth2 Refresh Token with cloud-platform scope granted;
  • GCP service account key in JSON format.

The scanner does not rely on any third-party tool (e.g., gcloud). Thus, it can be compiled as a standalone tool and used on a machine with no GCP SDK installed (e.g. a Kubernetes pod). However, please keep in mind that the only OS that is currently supported is Linux.

Please note that GCP offers Policy Analyzer to find out which principals (users, service accounts, groups, and domains) have what access to which Google Cloud resources. However, it requires specific permissions on the GCP project, and the Cloud Asset API needs to be enabled. If you just have a GCP SA key, access to a previously compromised VM, or an OAuth2 refresh token, gcp_scanner is the best option to use.

Installation

To install the package, use pip (you must also have git installed):

pip install gcp_scanner
python3 -m gcp_scanner --help

Alternatively:

git clone https://github.com/google/gcp_scanner
cd gcp_scanner
pip install .
gcp-scanner --help

There is a docker build file if you want to run the scanner from a container: docker build -f Dockerfile -t sa_scanner .

Command-line options

usage: python3 scanner.py -o folder_to_save_results -g -

GCP Scanner

options:
  -h, --help            show this help message and exit
  -ls, --light-scan     Return only the most important GCP resource fields in the output.
  -k KEY_PATH, --sa-key-path KEY_PATH
                        Path to directory with SA keys in json format
  -g GCLOUD_PROFILE_PATH, --gcloud-profile-path GCLOUD_PROFILE_PATH
                        Path to directory with gcloud profile. Specify - to search for credentials in default gcloud config path
  -m, --use-metadata    Extract credentials from GCE instance metadata
  -at ACCESS_TOKEN_FILES, --access-token-files ACCESS_TOKEN_FILES
                        A list of comma separated files with access token and OAuth scopes. TTL limited. A token and scopes should be stored in JSON format.
  -rt REFRESH_TOKEN_FILES, --refresh-token-files REFRESH_TOKEN_FILES
                        A list of comma separated files with refresh_token, client_id, token_uri and client_secret stored in JSON format.
  -s KEY_NAME, --service-account KEY_NAME
                        Name of individual SA to scan
  -p TARGET_PROJECT, --project TARGET_PROJECT
                        Name of individual project to scan
  -f FORCE_PROJECTS, --force-projects FORCE_PROJECTS
                        Comma separated list of project names to include in the scan
  -c CONFIG_PATH, --config CONFIG_PATH
                        A path to config file with a set of specific resources to scan.
  -l {DEBUG,INFO,WARNING,ERROR,CRITICAL}, --logging {DEBUG,INFO,WARNING,ERROR,CRITICAL}
                        Set logging level (INFO, WARNING, ERROR)
  -lf LOG_FILE, --log-file LOG_FILE
                        Save logs to the path specified rather than displaying in console
  -pwc PROJECT_WORKER_COUNT, --project-worker-count PROJECT_WORKER_COUNT
                        Set limit for project crawlers run in parallel.
  -rwc RESOURCE_WORKER_COUNT, --resource-worker-count RESOURCE_WORKER_COUNT
                        Set limit for resource crawlers run in parallel.

Required parameters:
  -o OUTPUT, --output-dir OUTPUT
                        Path to output directory

Option -f requires an additional explanation. In some cases, the service account does not have permissions to explicitly list project names. However, it still might have access to underlying resources if we provide the correct project name. This option is specifically designed to handle such cases.
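For example, assuming the SA keys live in /path/to/sa_keys and you suspect (but cannot list) two projects, a hypothetical invocation could look like this:

gcp-scanner -o output -k /path/to/sa_keys -f suspected-project-1,suspected-project-2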

Building a standalone binary with PyInstaller

Please replace google-api-python-client==2.80.0 with google-api-python-client==1.8.0 in pyproject.toml. After that, navigate to the scanner source code directory and use pyinstaller to compile a standalone binary:

pyinstaller -F --add-data 'roots.pem:grpc/_cython/_credentials/' scanner.py

Working with results

The GCP Scanner produces a standard JSON file that can be handled by any JSON Viewer or DB. We are providing a web-based tool that can help you visualize the results. To run the tool, please use the following command:

usage: gcp-scanner-visualizer -p 8080

GCP Scanner Visualizer

options:
  -h, --help            show this help message and exit
  -p PORT, --port PORT  Port to listen on (default 8080)

To learn more about how to use the tool, please visit the GCP Scanner Visualizer Usage Guide page.

If you just need a convenient way to grep JSON results, we can recommend gron.
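For instance, a rough gron invocation could look like the following (the output file name and the grep pattern are only placeholders):

gron output/example-project.json | grep bucket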

Contributing

See CONTRIBUTING.md for details.

License

Apache 2.0; see LICENSE for details.

gcp_scanner's People

Contributors

0xdeva, abhi99555, adeptvin1, am0stafa, aryanagrawal22, bhardwaj-himanshu, csemanish12, dependabot[bot], dhruvpal05, ggold7046, github-actions[bot], grumpyp, hayk96, mshudrak, natea123, peb-peb, phyxius, rahuldubey391, ro4i7, rudrakshkarpe, shravankshenoy, sudiptob2, sumit-158, tatsuiman, tausiq2003, worldworm, yahia3200, zetatwo, zhenglin-li


gcp_scanner's Issues

BUG: Output is appended for each scan on a single project

Current behavior -> The output is appended to the same JSON file every time we scan the same SA.

Maybe we could try the following:

  • generate output files for each scan based on the time of the scan <-- (As discussed in the mail, this should be the way to approach this issue; see the sketch below)
  • add a scan number/time of scan in the JSON itself.

For reference, I am running the following command multiple times-
gcp-scanner -l INFO -o output -k /home/peb/Downloads
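
A minimal sketch of the timestamped-filename idea (output_dir and project_id are hypothetical variables, not the scanner's actual internals):

import datetime
import os

output_dir = 'output'           # placeholder
project_id = 'example-project'  # placeholder

# Build a unique output path per scan by appending the scan start time.
scan_time = datetime.datetime.now().strftime('%Y%m%d_%H%M%S')
out_path = os.path.join(output_dir, f'{project_id}_{scan_time}.json')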

crawl other GCP services - cloud run

Hi, are there any other GCP services you want to crawl, like Cloud Run for example? Are there any mandatory checks I need to do before including it?

I couldn't find any roadmap or similar, so I would just start with that.

Implement a script to mirror test GCP project

Currently, there is no simple way for developers to mirror the GCP project used in testing. However, this might be required for developers in order to locally test their version of GCP Scanner or even debug the tests themselves.

It would be great to have a script that can deploy a mirror/copy of our test GCP project for new developers. We also need to provide documentation to ensure easy setup.

Include commit message and branch naming conventions in the `contributing.md`

Currently, our contributing.md document lacks a branch naming convention and example commit messages.

This document plays a crucial role in maintaining a cohesive structure throughout the project and will greatly aid new contributors in familiarizing themselves with the project. Therefore, we need to incorporate standardized commit message formats and branch naming conventions in accordance with established community best practices.

Make scanner parallel

We need to crawl GCP resources in parallel to improve performance of the scanner.
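
A minimal sketch of the idea using concurrent.futures, assuming a hypothetical crawl_project function that scans a single project:

from concurrent.futures import ThreadPoolExecutor

def crawl_project(project_id):
  # Placeholder: launch all resource crawlers for one project here.
  return {'project': project_id}

project_ids = ['project-a', 'project-b', 'project-c']  # placeholder list
with ThreadPoolExecutor(max_workers=5) as executor:
  # Each project is crawled in its own worker thread.
  results = list(executor.map(crawl_project, project_ids))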

Refactor the "crawl.py" file to improve maintainability and organisation

Description:

The crawl.py file in our project contains 31 functions and over 500 lines of code. While this file may have started out as a convenient place to put all of our crawling functions, it will become unwieldy and difficult to maintain as the project grows, so it should be fixed as early as possible.

I propose refactoring crawl.py by splitting the functions into separate files based on their relation and functionality. Here are some benefits of doing so:

  • Improved organization: Breaking up the code into smaller, more focused files will make it easier to find and modify specific functions. It will also make the codebase easier to navigate for new team members.

  • Easier testing: Smaller files with focused functions make it easier to test individual pieces of functionality. This will improve our testing coverage and make it easier to catch bugs before they reach production.

  • Reduced complexity: The size and complexity of crawl.py make it difficult to reason about the code and understand how it all fits together. By breaking up the functions into separate files, we can reduce the cognitive load required to work with the codebase, basically making the code more beginner-friendly.

I propose that each file should contain a subset of related functions that perform a specific task. This will allow us to more easily reason about each file's purpose and the functions contained within it.

Related Issues

#2

Use logging.info and logging.error consistently

There is a lack of consistency in how we use our printing functionality. In some functions, we rely on print calls to display errors and results, while in others we use logging.info. We should use the logging module everywhere.
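
For illustration, a small sketch of what the change would look like (the messages and the resource_count variable are made up):

import logging

logging.basicConfig(level=logging.INFO)
resource_count = 0  # placeholder value for the example

# Instead of: print('Failed to retrieve GCE instances')
logging.error('Failed to retrieve GCE instances')

# Instead of: print(resource_count)
logging.info('Scan finished, %d resources found', resource_count)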

Enable versioning for gcp_scanner

There is no version history for GCP Scanner. We need to evaluate what options GitHub offers and/or develop our own versioning scheme for the scanner.

Style: code format problem

I am using PyCharm to develop our project. I am confused about how to let PyCharm format code in our convention. It seems that our community uses 2-space indentation, but PyCharm always formats it with 4-space indentation.

How can I solve this? Is there a style file I can import into my PyCharm?


Integration test for get_scopes_from_refresh_token.

An integration test for get_scopes_from_refresh_token with the actual live server is needed.

DoD

  • fetch refresh token from the live server using Cloud KMS.
  • Use the token to extract its scope
  • assert if the scopes are the same as expected scopes
  • In the CI pipeline we will be using our test_project

Challenges

  • Requires a credit card to create a GCP account, hence to implement this you must have a credit card.
  • Enabling Cloud KMS costs as low as 3.00 USD.
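
A rough sketch of what such a test could assert. The token endpoint shown is Google's standard OAuth2 endpoint, but the file path, the expected scopes, and the helper function here are placeholders rather than the project's actual implementation:

import json
import requests

EXPECTED_SCOPES = {'https://www.googleapis.com/auth/cloud-platform'}  # placeholder

def scopes_from_refresh_token(creds):
  # Exchange the refresh token for an access token; the response contains
  # a space-separated "scope" field describing what the token can do.
  resp = requests.post('https://oauth2.googleapis.com/token', data={
      'client_id': creds['client_id'],
      'client_secret': creds['client_secret'],
      'refresh_token': creds['refresh_token'],
      'grant_type': 'refresh_token',
  })
  return set(resp.json().get('scope', '').split())

with open('refresh_token.json', encoding='utf-8') as f:  # placeholder path
  creds = json.load(f)
assert scopes_from_refresh_token(creds) == EXPECTED_SCOPES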

Update the repo description

Is your feature request related to a problem? Please describe.
Currently, no description is added to the repository. A short description with tags will make the project more visible in search and increase its reach.

Describe the solution you'd like
Add a description in the project details.


Move PR template to `.github` directory as it's not part of the source code

The pull_request_template.md helps the user by providing a checklist while creating a PR. However, it's not a part of the main source code; it's an extension for GitHub.
Hence it should be moved into the .github directory.

Also, in the future we might need to support multiple PR templates for multiple issue types (such as bug-fix, feature, ci, docs, etc.). Therefore, it's better to put it under .github/PULL_REQUEST_TEMPLATE

Ref: GitHub official doc

Implement "short" report generation option

Is your feature request related to a problem? Please describe.
Right now, our scan result contains mostly all the data returned by GCP. It is hard to navigate and requires grepping through the data.

Describe the solution you'd like
We can implement a flag that would tell the scanner to return only the most important data from GCP.

We need to define what the most important fields to return would be. We also need to think about a flexible approach to defining what to omit from or include in the report.
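
One possible shape for such a flag, sketched with made-up field names; the real list of "important" fields would still need to be agreed on:

# Hypothetical whitelist of fields to keep per resource type in "short" mode.
IMPORTANT_FIELDS = {
    'compute_instances': ['name', 'machineType', 'status'],
    'storage_buckets': ['name', 'location'],
}

def shorten(resource_type, resource):
  keep = IMPORTANT_FIELDS.get(resource_type)
  if keep is None:
    return resource  # no trimming rules defined, return the full data
  return {k: v for k, v in resource.items() if k in keep}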

Out-of-memory error when scanning large GCS buckets

We identified that when gcp_scanner is used against GCS buckets with a large number of files, there is a possibility of running out of memory. It happens because our current implementation stores the names of all files in a Python list in memory and saves it at the end of the scan, which is very inefficient.
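
A minimal sketch of a streaming approach, assuming a hypothetical iterator over blob metadata; instead of accumulating every object name in a list, each name is written to disk as soon as it is seen:

def dump_bucket_objects(blob_iterator, output_path):
  # Stream object names straight to the output file to keep memory usage flat.
  with open(output_path, 'w', encoding='utf-8') as out:
    for blob in blob_iterator:
      out.write(blob['name'] + '\n')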

Relax unit and functional tests checks

Currently, we check for many individual fields and responses from the GCP API. However, many GCP APIs are volatile and change over time. We need to check for key components of the response rather than comparing the full output line by line.

Code refactoring

Currently, we have one giant scanner loop where we launch crawlers from crawl.py. We need to split each crawler into an individual module, which will improve code readability and allow us to parallelize the scan. Besides that, we can leverage Python classes for execution state control, config parsing, and enabling/disabling certain functions.

Publish gcp_scanner to PyPI using GitHub Actions.

Is your feature request related to a problem? Please describe.
Currently, GCP Scanner is not published on PyPI. Publishing it on PyPI will give the project more visibility as well as a more flexible way to install the application. Also, it will provide better SEO (search engine optimization).

Describe the solution you'd like

  • Prepare pyproject.toml
  • Prepare project in PyPI
  • Implement release versioning using GitHub tags.
  • A GitHub Action:
    • On the creation of a release in GitHub, it should trigger an action to publish to PyPI as well.

Please suggest if you have any other improvements.

bug: small typo

what the problem is:
pyinstaller -F --add-data 'roots.pem:grpc/_cython/_credentials/" scanner.py should be pyinstaller -F --add-data 'roots.pem:grpc/_cython/_credentials/' scanner.py

what we should do:

- pyinstaller -F --add-data 'roots.pem:grpc/_cython/_credentials/" scanner.py
+ pyinstaller -F --add-data 'roots.pem:grpc/_cython/_credentials/' scanner.py

where the problem is:

`pyinstaller -F --add-data 'roots.pem:grpc/_cython/_credentials/" scanner.py`

BUG: `Trying {candidate_service_account}` gets logged directly instead of candidate_service_account

Affected Component

  • Crawl
  • CredsDB
  • Scanner
  • Test
  • Docs
  • Python
  • GCP APIs

Describe the bug

While scanning resources with the latest codebase, I got the following output at the end:

2023-03-15 11:49:28 - INFO - Retrieving credentials from /home/peb/Downloads/sa_keys/qwiklabs-gcp-02-f08fbae4db62-d34106e8dd27.json
...
...
2023-03-15 11:52:14 - INFO - Retrieving cloud source repositories qwiklabs-gcp-02-fa95c3a79f7c
2023-03-15 11:52:17 - INFO - Trying {candidate_service_account}
2023-03-15 11:52:17 - INFO - Trying {candidate_service_account}
2023-03-15 11:52:17 - INFO - Trying {candidate_service_account}
2023-03-15 11:52:17 - INFO - Trying {candidate_service_account}
2023-03-15 11:52:17 - INFO - Trying {candidate_service_account}
2023-03-15 11:52:17 - INFO - Trying {candidate_service_account}
2023-03-15 11:52:17 - INFO - Trying {candidate_service_account}
2023-03-15 11:52:17 - INFO - Trying {candidate_service_account}
2023-03-15 11:52:17 - INFO - Trying {candidate_service_account}
2023-03-15 11:52:17 - INFO - Saving results for qwiklabs-gcp-02-fa95c3a79f7c into the file

To Reproduce

Steps to reproduce the behavior:
1. Run the command gcp-scanner -o output -k /home/peb/Downloads/sa_keys -l INFO

Expected behavior

2023-03-15 12:03:27 - INFO - Trying qwiklabs-gcp-02-fa95c3a79f7c@qwiklabs-gcp-02-fa95c3a79f7c.iam.gserviceaccount.com should be logged

Current behavior

Trying {candidate_service_account} gets logged
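
This looks like a string literal that was meant to be an f-string (an assumption; the exact line lives somewhere in the scanning code). A minimal illustration:

import logging

logging.basicConfig(level=logging.INFO)
candidate_service_account = 'sa-name@project.iam.gserviceaccount.com'  # example value

# Buggy: braces are not interpolated in a plain string literal.
logging.info('Trying {candidate_service_account}')

# Fixed: use an f-string (or logging's own %s formatting).
logging.info(f'Trying {candidate_service_account}')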


Insufficient error message for bad keys

Describe the bug

If one of the keys contains an error, for example the JSON is invalid or the key itself is not a valid key, GCP Scanner crashes with a very unhelpful error message.

To Reproduce

  1. Put a few keys in a directory, with at least one of them in an invalid format
  2. Run gcp scanner pointing to that directory

Expected behavior

Either

  1. A warning log telling me exactly which key failed to parse and then continue
  2. An error log telling me exactly which key failed to parse and then stop
  3. An error log telling me exactly which key failed to parse, then continue parsing (for more potential errors) and then stop without scanning (so that you can handle all errors in bulk instead of fixing one and re-running).

Current behavior

GCP Scanner crashes with an error telling me that it failed to parse some key
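
A sketch of the warn-and-continue option (behavior 1 above); the helper name and structure are illustrative, not the scanner's actual code:

import json
import logging

def load_sa_keys(key_paths):
  keys = []
  for path in key_paths:
    try:
      with open(path, encoding='utf-8') as f:
        keys.append(json.load(f))
    except (OSError, json.JSONDecodeError) as err:
      # Tell the user exactly which key is bad, then keep going.
      logging.warning('Failed to parse SA key %s: %s', path, err)
  return keys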

Formatting error in the `contributing.md`

There is a formatting error in contributing.md. The following line is not rendered correctly.

(Example: feat//gcp-compute-engine-support)

DOD

  • mark the line as code to fix the error. for example, change Example: feat//gcp-compute-engine-support -> Example: feat/<issue-id>/gcp-compute-engine-support

Do not stop if one unit test fails

Currently, our unit test process stops as soon as one test fails, making it hard to evaluate what else is breaking without actually fixing the error and relaunching the tests again. We need to first run all tests and then print a summary of failed tests.

docker build fails

Affected Component

  • Docker build

Describe the bug

This is a known issue with Python3.7
grpc/grpc#24556

=> [internal] load build context                                                                                                                                                                      0.3s
 => => transferring context: 93.91kB                                                                                                                                                                   0.0s
 => [2/8] RUN mkdir /home/sa_scanner                                                                                                                                                                   0.4s
 => [3/8] COPY src/ /home/sa_scanner/                                                                                                                                                                  0.0s
 => [4/8] COPY pyproject.toml /home/sa_scanner/                                                                                                                                                        0.0s
 => [5/8] COPY README.md /home/sa_scanner                                                                                                                                                              0.0s
 => [6/8] WORKDIR /home/sa_scanner                                                                                                                                                                     0.0s
 => ERROR [7/8] RUN pip install .                                                                                                                                                                     18.9s
------                                                                                                                                                                                                      
 > [7/8] RUN pip install .:                                                                                                                                                                                 
#12 1.139 Processing /home/sa_scanner                                                                                                                                                                       
#12 1.141   Installing build dependencies: started                                                                                                                                                          
#12 5.902   Installing build dependencies: finished with status 'done'                                                                                                                                      
#12 5.905   Getting requirements to build wheel: started                                                                                                                                                    
#12 5.967   Getting requirements to build wheel: finished with status 'done'
#12 5.972   Preparing metadata (pyproject.toml): started
#12 6.100   Preparing metadata (pyproject.toml): finished with status 'done'
#12 6.500 Collecting google-api-python-client==2.80.0
#12 6.662   Downloading google_api_python_client-2.80.0-py2.py3-none-any.whl (11.0 MB)
#12 9.259      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.0/11.0 MB 4.3 MB/s eta 0:00:00
#12 9.344 Collecting google-cloud-container==2.17.4
#12 9.391   Downloading google_cloud_container-2.17.4-py2.py3-none-any.whl (217 kB)
#12 9.441      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 217.7/217.7 kB 4.4 MB/s eta 0:00:00
#12 9.580 Collecting google-cloud-iam==2.11.2
#12 9.630   Downloading google_cloud_iam-2.11.2-py2.py3-none-any.whl (115 kB)
#12 9.652      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 115.0/115.0 kB 5.3 MB/s eta 0:00:00
#12 9.718 Collecting httplib2==0.21.0
#12 9.760   Downloading httplib2-0.21.0-py3-none-any.whl (96 kB)
#12 9.778      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 96.8/96.8 kB 5.5 MB/s eta 0:00:00
#12 9.825 Collecting pyu2f==0.1.5
#12 9.874   Downloading pyu2f-0.1.5.tar.gz (27 kB)
#12 9.885   Preparing metadata (setup.py): started
#12 10.28   Preparing metadata (setup.py): finished with status 'done'
#12 10.38 Collecting requests==2.28.2
#12 10.48   Downloading requests-2.28.2-py3-none-any.whl (62 kB)
#12 10.49      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 62.8/62.8 kB 4.8 MB/s eta 0:00:00
#12 10.65 Collecting google-auth<3.0.0dev,>=1.19.0
#12 10.70   Downloading google_auth-2.16.2-py2.py3-none-any.whl (177 kB)
#12 10.75      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 177.2/177.2 kB 4.1 MB/s eta 0:00:00
#12 10.80 Collecting google-auth-httplib2>=0.1.0
#12 10.84   Downloading google_auth_httplib2-0.1.0-py2.py3-none-any.whl (9.3 kB)
#12 10.93 Collecting google-api-core!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.0,<3.0.0dev,>=1.31.5
#12 10.98   Downloading google_api_core-2.11.0-py3-none-any.whl (120 kB)
#12 11.01      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 120.3/120.3 kB 4.2 MB/s eta 0:00:00
#12 11.11 Collecting uritemplate<5,>=3.0.1
#12 11.19   Downloading uritemplate-4.1.1-py2.py3-none-any.whl (10 kB)
#12 11.31 Collecting proto-plus<2.0.0dev,>=1.22.0
#12 11.40   Downloading proto_plus-1.22.2-py3-none-any.whl (47 kB)
#12 11.49      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 47.9/47.9 kB 6.2 MB/s eta 0:00:00
#12 11.79 Collecting protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.19.5
#12 11.83   Downloading protobuf-4.22.1-cp37-abi3-manylinux2014_aarch64.whl (301 kB)
#12 11.90      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 301.2/301.2 kB 4.7 MB/s eta 0:00:00
#12 11.98 Collecting pyparsing!=3.0.0,!=3.0.1,!=3.0.2,!=3.0.3,<4,>=2.4.2
#12 12.03   Downloading pyparsing-3.0.9-py3-none-any.whl (98 kB)
#12 12.04      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 98.3/98.3 kB 8.0 MB/s eta 0:00:00
#12 12.10 Collecting six
#12 12.14   Downloading six-1.16.0-py2.py3-none-any.whl (11 kB)
#12 12.25 Collecting charset-normalizer<4,>=2
#12 12.29   Downloading charset_normalizer-3.1.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (193 kB)
#12 12.32      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 194.0/194.0 kB 6.9 MB/s eta 0:00:00
#12 12.38 Collecting idna<4,>=2.5
#12 12.42   Downloading idna-3.4-py3-none-any.whl (61 kB)
#12 12.43      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 61.5/61.5 kB 6.6 MB/s eta 0:00:00
#12 12.50 Collecting urllib3<1.27,>=1.21.1
#12 12.55   Downloading urllib3-1.26.15-py2.py3-none-any.whl (140 kB)
#12 12.57      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 140.9/140.9 kB 7.0 MB/s eta 0:00:00
#12 12.63 Collecting certifi>=2017.4.17
#12 12.68   Downloading certifi-2022.12.7-py3-none-any.whl (155 kB)
#12 12.70      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 155.3/155.3 kB 7.5 MB/s eta 0:00:00
#12 12.76 Collecting googleapis-common-protos<2.0dev,>=1.56.2
#12 12.80   Downloading googleapis_common_protos-1.59.0-py2.py3-none-any.whl (223 kB)
#12 12.83      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 223.6/223.6 kB 8.1 MB/s eta 0:00:00
#12 13.46 Collecting grpcio<2.0dev,>=1.33.2
#12 13.51   Downloading grpcio-1.51.3.tar.gz (22.1 MB)
#12 16.95      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 22.1/22.1 MB 6.9 MB/s eta 0:00:00
#12 18.13   Preparing metadata (setup.py): started
#12 18.38   Preparing metadata (setup.py): finished with status 'error'
#12 18.38   error: subprocess-exited-with-error
#12 18.38   
#12 18.38   × python setup.py egg_info did not run successfully.
#12 18.38   │ exit code: 1
#12 18.38   ╰─> [14 lines of output]
#12 18.38       Traceback (most recent call last):
#12 18.38         File "<string>", line 2, in <module>
#12 18.38         File "<pip-setuptools-caller>", line 34, in <module>
#12 18.38         File "/tmp/pip-install-7egp0l84/grpcio_2bc3ade628c0443a83519a9e2c691087/setup.py", line 262, in <module>
#12 18.38           if check_linker_need_libatomic():
#12 18.38              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#12 18.38         File "/tmp/pip-install-7egp0l84/grpcio_2bc3ade628c0443a83519a9e2c691087/setup.py", line 209, in check_linker_need_libatomic
#12 18.38           cpp_test = subprocess.Popen(cxx + ['-x', 'c++', '-std=c++14', '-'],
#12 18.38                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#12 18.38         File "/usr/local/lib/python3.11/subprocess.py", line 1024, in __init__
#12 18.38           self._execute_child(args, executable, preexec_fn, close_fds,
#12 18.38         File "/usr/local/lib/python3.11/subprocess.py", line 1901, in _execute_child
#12 18.38           raise child_exception_type(errno_num, err_msg, err_filename)
#12 18.38       FileNotFoundError: [Errno 2] No such file or directory: 'c++'
#12 18.38       [end of output]
#12 18.38   
#12 18.38   note: This error originates from a subprocess, and is likely not a problem with pip.
#12 18.38 error: metadata-generation-failed
#12 18.38 
#12 18.38 × Encountered error while generating package metadata.
#12 18.38 ╰─> See above for output.
#12 18.38 
#12 18.38 note: This is an issue with the package mentioned above, not pip.
#12 18.38 hint: See above for details.
#12 18.40 
#12 18.40 [notice] A new release of pip available: 22.3.1 -> 23.0.1
#12 18.40 [notice] To update, run: pip install --upgrade pip
------
executor failed running [/bin/sh -c pip install .]: exit code: 1

To Reproduce

docker build -t gcp_scan .

I am willing to fix it and open a PR :)

Fix unit_test failure

Our most up-to-date test fails with the following error


The following line was not identified in the output:
                "compute.googleapis.com/quota/local_ssd_total_storage_per_vm_family/exceeded"

The following line was not identified in the output:
                "compute.googleapis.com/quota/local_ssd_total_storage_per_vm_family/usage"

The following line was not identified in the output:
                "file.googleapis.com/nfs/server/connections"

The following line was not identified in the output:
                "logging.googleapis.com/billing/ingested_bytes"

It seems like the Compute Engine, File, and Logging APIs have changed. We need to mark the corresponding lines as volatile and check that the test passes.

Improving Indentation and adding Comments in codebase

Problem

There are a few indentation issues, and the code lacks comments that would make it easier to read.

Description

Adding suitable comments in the codebase and maintaining proper indentation throughout.

Additional context

Comments serve as a reminder to change something in the future. Code commenting also becomes invaluable when you want to ‘comment out’ code. This is when you turn a block of code into a comment because you don’t want that code to be run, but you still want to keep it on hand in case you want to re-add it.

Implementation

  • I would be interested in implementing this feature.

Refactor the "scanner.py" file to improve maintainability and organization

Is your feature request related to a problem? Please describe.
The scanner.py file has a function called crawl_loop which creates a dict that could get a bit messy and not be nice to work with in the future.

I propose introducing dataclasses for each project. We could work with base classes from here on and add the other functionality to each class independently. The crawlers should also be more generic, as proposed in #108.

The concept would also integrate with the proposed #108.

From there on, it would be much easier and clearer, and future implementations would be straightforward.

Describe the solution you'd like
All projects will be dataclasses.

# keep in mind this is only an example and not optimized yet, I want to showcase a cleaner implementation generally
from typing import Dict, List

import gcp_scanner

class ScannerProcess:
  def __init__(self):
    self.projects = []

  @staticmethod
  def load(config: Dict) -> List['Crawler']:
    # Instantiate one crawler class per entry in the config.
    crawlers = []
    for crawl in config.keys():
      crawlers.append(getattr(gcp_scanner, crawl)())
    return crawlers

scanner = ScannerProcess.load(config)
for project in projects:
  p = Project(name=project['projectId'])
  for scan in scanner:
    p.scans.append(scan.load_data(p.project_number, credentials))

The if is_set(scan_config, 'service_accounts'): check would be replaced by something like this:

# config.json
{
  "compute_instances": {
    "fetch": true,
    "comment": "Fetch metadata about GCP VMs"
  },
  "compute_disks": {
    "fetch": true,
    "comment": "Fetch metadata about GCE disks"
  }
}
/gcp_scanner
  - __init__.py
# __init__.py includes the imports of all the crawler classes, e.g.:
from abc import ABC
from gcp_scanner.crawler import GkeCluster

class Basecrawl(ABC):
  def load_data(self):
    pass

class GkeCluster(Basecrawl):
  def load_data(self):
    # do the specific thing for the GkeCluster crawler
    pass

I hope my thoughts are not too abstract now, but my concept would basically create objects and classes for the crawlers.

I've added a brief AI-generated summary of the concept:

The problem being addressed is that the crawl_loop function in scanner.py creates a messy dictionary that is not easy to work with. The proposed solution is to introduce data classes for each project, which would make the code more organized and easier to work with in the future.

The proposed solution would involve creating a base class for all the crawlers, which would have a method called load_data(). Then, for each specific crawler (e.g., GkeCluster), a subclass would be created that would inherit from the base class and implement the load_data() method with the specific functionality for that crawler.

The crawl_loop function would be replaced by a function that loads the configuration from a JSON file, creates instances of the appropriate crawler classes based on the configuration, and then calls the load_data() method on each crawler instance to fetch the data for the project. The result of this process would be a list of projects, each with its own data.

The proposed solution would make the code more modular and easier to maintain in the future. By using data classes and inheritance, the code would be more organized and easier to extend with new functionality. Additionally, by using a configuration file to specify which crawlers to use, the code would be more flexible and adaptable to different use cases.

Scanning with multiple keys from the same project fails

Describe the bug

If you have two different keys for the same project in a scan, only the first scan will be saved since the log files are named [project]_[timestamp].json.

To Reproduce

  1. Put two different keys for the same project in a directory
  2. Run GCP Scanner with these keys

Expected behavior

Perform the scan and create two different log files, one for each key, probably by adding the key id to the filename as well.

Current behavior

The first key works fine, but the second will fail with an error since the names of the log files collide.
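
One way to avoid the collision (a sketch; the variable names are assumptions) is to include the key's private_key_id, a field present in every SA key JSON, in the output file name:

import json

key_path = 'sa_key.json'        # placeholder path
project_id = 'example-project'  # placeholder
scan_time = '20230315_120000'   # placeholder timestamp

with open(key_path, encoding='utf-8') as f:
  sa_key = json.load(f)

key_id = sa_key.get('private_key_id', 'unknown-key')[:8]
output_name = f'{project_id}_{key_id}_{scan_time}.json'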

Allow gcp_scanner to work with more scopes

As of now, gcp_scanner only supports retrieving and using OAuth2 tokens with the cloud-platform scope granted. All other scopes will raise an error. However, some scopes would allow retrieving partial information (e.g. https://www.googleapis.com/auth/devstorage.read_only to read GCS buckets and files).

Re-evaluation of the python version in Dockerfile related to #114

As mentioned in #114, we want to re-evaluate whether the build would work with Python 3.11.

All related information can be found in #114 and #115

EDIT:

The issue is that there is an error during a Docker build related to the current Python version in the Dockerfile (FROM python:3-slim-buster) and the grpcio package. There have been suggestions to resolve the issue by installing build-essential python-dev, which results in a long build time (> 20 min).

As a solution, it has been suggested to lock to Python 3.10 for now and to open a ticket to re-evaluate the issue after some time has passed, to see if it has been resolved upstream.

This will ensure that the issue eventually gets resolved while also avoiding the risk that locking to a specific version causes issues in the future.

I'd suggest checking again in 2 months.

bug: .github directory does not support PR_TEMPLATE aggregation

Issue #66 led to a bug: the .github folder does not actually support PR_TEMPLATE aggregation. Hence, due to PR #67, we no longer see the PR template while creating a PR.

DOD

  • move pull_request_template.md to the .github/pull_request_template.md
  • remove the PR_TEMPLATE folder

🐛 MacOS Support for `gcp_scanner` development

As discussed in PR #123 here.

What should be the recommended development environment for macOS?
Currently, the tool supports Linux, so WSL makes it possible to develop it on Windows too.

If any solutions are found, they should be added to the appropriate docs.

Implement Cloud DNS policies crawler.

The Cloud DNS API allows listing the available DNS policies in a project.
A crawler to list the DNS policies would undoubtedly increase the capability of GCP Scanner.

We already have a scanner for DNS managed zones (code). By using a similar strategy, we can scan Cloud DNS policies.
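
A rough sketch of such a crawler using the Google API discovery client; the function name and error handling are assumptions, not the project's actual code:

import logging
from googleapiclient import discovery

def get_dns_policies(project_id, credentials):
  """Lists Cloud DNS policies in a project (sketch)."""
  policies = []
  try:
    service = discovery.build('dns', 'v1', credentials=credentials, cache_discovery=False)
    request = service.policies().list(project=project_id)
    while request is not None:
      response = request.execute()
      policies.extend(response.get('policies', []))
      request = service.policies().list_next(previous_request=request, previous_response=response)
  except Exception:
    logging.info('Failed to retrieve DNS policies for %s', project_id, exc_info=1)
  return policies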
