
ram's People

Contributors

brunoreboul, dependabot[bot], sdenef-adeo


ram's Issues

Missing assets in BQ asset table leading to missing ancestryPaths in result dashboards

Usually this corner case is not that visible, as for many asset types the information is already provided by the RESOURCE feed.
Nevertheless, the issue is visible when:

  • RAM is set up to focus on IAM policies only (no RESOURCE feed)
  • An asset type has IAM policies and no configured RESOURCE feed

Fix: add the instance stream2bq_iam_assets to the microservice stream2bq, configured to be triggered on the Pubsub topic cai-iam-policies and to write to the BQ table assets.

ram -config configures the stream2bq instances using the method deployment.configureStream2bqAssetTypes() in the package ramcli.

AssetType is empty for Cloud Identity GroupsSettings and GroupMembers assets in `last_compliancestatus` view

AssetName pattern
//directories//groups//groupSettings
//directories//groups//members/

The number of groups, groupsettings and groupmembers may be 10x the number of GCP assets.
There is no valuable ancestryPath for these assets.
As a consequence, these assets are not streamed into the BQ asset table: the cost/value ratio is too low.

In the `last_compliancestatus` view, the assetType comes from the asset table, which explains why assetType is empty for these assets.

Workaround: update the `last_compliancestatus` view SQL code to deduce the assetType from the asset name for these assets, per the mapping below (an illustrative sketch follows):

www.googleapis.com/admin/directory/groups
www.googleapis.com/admin/directory/members
groupssettings.googleapis.com/groupSettings
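
For illustration, the equivalent mapping logic in Go could look like the sketch below; the actual fix lives in the view's SQL, and the function name is hypothetical.

package main

import (
    "fmt"
    "strings"
)

// deduceAssetType infers the assetType from an asset name for the
// Cloud Identity assets that are not streamed to the BQ asset table.
// Illustrative only: the real fix is implemented in the view's SQL.
func deduceAssetType(assetName string) string {
    switch {
    case strings.HasSuffix(assetName, "/groupSettings"):
        return "groupssettings.googleapis.com/groupSettings"
    case strings.Contains(assetName, "/members/"):
        return "www.googleapis.com/admin/directory/members"
    case strings.Contains(assetName, "/groups/"):
        return "www.googleapis.com/admin/directory/groups"
    default:
        return "" // unknown pattern: leave assetType empty
    }
}

func main() {
    fmt.Println(deduceAssetType("//directories/d1/groups/g1/groupSettings"))
}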

"missing go.sum entry" error: RAM deployment failed in Cloud Build

Error:

Step #1 - "build a fresh ram cli": Already have image: golang
Step #1 - "build a fresh ram cli": go: github.com/BrunoReboul/[email protected]: missing go.sum entry; to add it:
Step #1 - "build a fresh ram cli":  go mod download github.com/BrunoReboul/ram

Occurs on any build, including existing successful builds when hitting RETRY.

Root Cause:

The Cloud Functions Go runtime is bound to Go v1.13.

RAM developers' Go version is aligned to the same version, to avoid using new Go features that won't be available on Cloud Functions:

go version go1.13.8 linux/amd64

The Cloud Build environment uses the official golang container image, which is continuously updated and currently uses Go v1.16:

Step #0 - "display go language version": Status: Downloaded newer image for golang:latest
Step #0 - "display go language version": docker.io/library/golang:latest
Step #0 - "display go language version": go version go1.16 linux/amd64

With Go 1.16, the default value of the -mod flag changed:

In Go 1.15 and lower, the -mod=mod flag was enabled by default, so updates were performed automatically. Since Go 1.16, the go command acts as if -mod=readonly were set instead: if any changes to go.mod are needed, the go command reports an error and suggests a fix.

Proposed fix

Update the Cloud Build triggers' build steps definition:

  • add a go version step to ease troubleshooting by always displaying which Go version the golang Docker container uses
  • add the -mod=mod flag to the go build command to allow automatic go.mod updates

steps:
 - name: golang
   args:
     - go
     - version
   id: display go language version
 - name: golang
   args:
     - go
     - build
     - '-mod=mod'
     - ram.go
   id: build a fresh ram cli

As RAM user, I want the Owner and Resolver fields to be always populated so that the % of assigned non-compliance increases and the % of resolution improves too

Always populated means:

  • Have YAML content providing the owner and resolver identity for a given folder ID or org ID (to address the missing GCP labels on folders and orgs)
  • When a resource does not have an owner or resolver label, use the parent project's owner and resolver labels
  • When a project does not have an owner or resolver label, use the parent folder's owner and resolver provided in the YAML content
  • When a folder does not have an owner or resolver label, use the parent folder's owner and resolver provided in the YAML content

Good practice: at least have the owner and resolver described for each top-level folder.

To deal with emails in label values (a sketch follows this list):

  • replace _dot_ with .
  • replace _at_ with @
    example: marie-pierre_dot_dupondt_at_mycompany_dot_com translates to marie-pierre.dupondt@mycompany.com
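
A minimal sketch of this translation, assuming simple string replacement (the function name is illustrative):

package main

import (
    "fmt"
    "strings"
)

// labelValueToEmail converts a GCP label value back into an email
// address: label values cannot contain "." or "@", so they are
// encoded as "_dot_" and "_at_". The function name is illustrative.
func labelValueToEmail(labelValue string) string {
    email := strings.ReplaceAll(labelValue, "_at_", "@")
    return strings.ReplaceAll(email, "_dot_", ".")
}

func main() {
    fmt.Println(labelValueToEmail("marie-pierre_dot_dupondt_at_mycompany_dot_com"))
    // Output: marie-pierre.dupondt@mycompany.com
}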

Update the DataStudio report template accordingly to leverage owner and resolver

As RAM manager I want a `compliant` field (true/false) in `monitor` service finish log entries to ease filtering

Expected behaviour

When searching the monitor service logs in Cloud Logging, I would like to filter log entries depending on the compliance result status, so please add a field for that (a sketch follows the options below).
It could be one of:

  • compliant: true|false
  • status: compliant|not_compliant|whatever
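
Either way, a minimal sketch of what such a structured finish entry could carry, assuming JSON-structured log entries (the type and field names are illustrative, not RAM's actual logging types):

package monitor

// finishEntry sketches the proposed structured finish log entry; the
// type and field names are illustrative, not RAM's actual types.
type finishEntry struct {
    Message   string `json:"message"`   // e.g. "finish compliant ..."
    Compliant bool   `json:"compliant"` // proposed field to filter on
}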

Actual behaviour

In the current version, I have to filter on the message field, which starts with finish compliant|not_compliant.

RAM version

v0.3.2

Dumpinventory should optimize retry when "quota exceeded" / exhausted

Context:

  • All dumpinventory jobs start on the same Cloud Scheduler schedule
  • The number of dumpinventory jobs = (number of asset types in solution.yaml + 1 for IAM) × number of orgs
    • Example: (34 + 1) × 2 = 70
  • Meanwhile, the Cloud Asset Inventory quota is limited to 60 requests/minute
  • The cloud function exits on error / REDO_ON_TRANSIENT, which gets worse as the default backoff does around 100 retries in the first minute, keeping the quota exhausted

Proposed workaround (sketched below):

  • If the error from the requested API is "quota exceeded", then wait for a configurable timer before exiting
  • Pro: fixes the issue
  • Con: more CPU time to pay for while waiting, balanced by it being at most a couple of tens of cloud functions once a week; anyway, useless retries are also paid for when nothing fixes the issue
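
A minimal sketch of the workaround, assuming the quota error is detected from the error text (the helper name and the backoff value are illustrative):

package dumpinventory

import (
    "log"
    "strings"
    "time"
)

// quotaBackoff is the configurable wait before exiting when the API
// reports quota exhaustion (the value here is illustrative).
var quotaBackoff = 90 * time.Second

// waitOnQuotaError sketches the proposed workaround: when the Cloud
// Asset Inventory call fails on quota, sleep before returning the
// error, so the automatic retry does not hit the exhausted quota again.
func waitOnQuotaError(err error) error {
    if err != nil && strings.Contains(strings.ToLower(err.Error()), "quota") {
        log.Printf("quota exceeded, waiting %v before exiting", quotaBackoff)
        time.Sleep(quotaBackoff)
    }
    return err // a non-nil error still triggers the retry
}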

Bug: stream2bq insertID too long

The BigQuery streaming insertID field length is limited to 128 characters.

The violation table insertID may exceed 1,000 characters.

This issue did not pop up before using cloud.google.com/go/bigquery 1.16.0.

Proposed solution: hash the string to reduce its size (sketched below).
The underlying one-way function being deterministic, it preserves BigQuery's best-effort deduplication.
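
A minimal sketch of the fix, assuming SHA-256 (the function name is illustrative):

package stream2bq

import (
    "crypto/sha256"
    "encoding/hex"
)

// hashInsertID sketches the proposed fix: reduce an arbitrarily long
// insertID to a fixed 64-character hex digest. The hash being
// deterministic, identical rows keep identical insertIDs, so
// BigQuery's best-effort streaming deduplication keeps working.
func hashInsertID(insertID string) string {
    sum := sha256.Sum256([]byte(insertID))
    return hex.EncodeToString(sum[:]) // 64 chars, well under the 128 limit
}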

As Compliance Manager, I want single files (YAML, CSV) with all configured constraints so that a text diff tool can be used to compare 1) sets of rules 2) over time

File names: constraints.yaml, constraints.csv
Example of yaml structure:

services:
  - name: bq
    rules:
      - name: dataset_location
        constraints:
          - apiVersion: constraints.gatekeeper.sh/v1alpha1
            kind: GCPBigQueryDatasetLocationConstraintV1
            metadata:
              name: myorg_sanboxes_europe_bq

File location: in the monitor folder.
Add the files to .gitignore to avoid conflicts.
Generate them on any ramcli execution.

Bug: very first deployment of RAM stuck due to a race on the Resource Manager quota

Error

  • Deploying all microservice instances at the same time by using a ram-vx.y.z-env tag does not guarantee in which order the instances are deployed (each deployment is designed to be idempotent)
  • setfeed deployments usually complete in 1min30s, while deployments based on cloud functions complete in 4min30s
  • This leads to activating the real-time triggers while the publish2fs cache has not yet been deployed
  • The monitor instances are triggered by the real-time flows; as the cache is empty, each execution falls back on querying Resource Manager to resolve org / folder / project IDs into displayNames
  • The Resource Manager quota is far smaller than the rate of real-time changes on many existing orgs, leading to continuously exhausting the Resource Manager quota
  • The remaining deployments then all fail, as each deployment needs a couple of queries to Resource Manager to check IAM bindings
  • The result is a deadlock

Workaround

  • Delete the CAI feeds
  • Relaunch all deployments except setfeeds
  • Manually trigger dumpinventory for orgs, folders, and projects; wait for the Firestore cache to be populated
  • Deploy setfeed as the last microservice

Fix

  • Remove the fallback mechanism that queries Resource Manager when the data is not found in the cache, so the deadlock can no longer occur
  • Simplify the install doc accordingly

Refactor initialRetryCheck: when to exit with error, what to log

  1. Get rid of the function InitialRetryCheck, as it does not save any code redundancy
  2. How to exit:
  • exit with error means a log entry with REDO_ON_TRANSIENT + a specific message, e.g.:
    • not being able to retrieve the cloud function metadata means exit with error, to retry
  • exit nil:
    • case ERROR means having logged NORETRY_ERROR + a specific message, e.g.:
      • pubsub message too old
    • case INFO means having logged NORETRY_INFO
  3. Always include pubsub_id %s (unless not retrievable) to enable tracing of retries

For each core.go:

  • Global type: add PubSubID string, keep the structure ordered
  • Initialize: replace "ERROR - " with "" as all issues are returned in an err structure
  • EntryPoint func:
    • Replace InitialRetryCheck with direct code (see the sketch below)
    • Update all log messages
    • Remove the retry / no-retry comments

Decommission initialRetryCheck.
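
A minimal sketch of the proposed exit rules in an EntryPoint, assuming illustrative helpers and message wording (none of these names are RAM's actual API):

package convertfeed2logs

import (
    "context"
    "fmt"
    "log"
)

// PubSubMessage and the helpers below are illustrative stand-ins for
// RAM's actual types; the sketch only shows the proposed exit rules.
type PubSubMessage struct {
    Data []byte `json:"data"`
}

// EntryPoint sketches the proposed convention:
//   - transient issue -> log REDO_ON_TRANSIENT and return an error (retried)
//   - permanent issue -> log NORETRY_ERROR and return nil (acknowledged)
//   - nothing to do   -> log NORETRY_INFO and return nil
func EntryPoint(ctx context.Context, m PubSubMessage) error {
    pubsubID := "hypothetical-id" // in RAM it would come from the event metadata

    if _, err := fetchMetadata(ctx); err != nil { // hypothetical helper
        log.Printf("REDO_ON_TRANSIENT pubsub_id %s cannot retrieve cloud function metadata: %v", pubsubID, err)
        return fmt.Errorf("fetchMetadata: %v", err) // non-nil => retried
    }
    if tooOld(m) { // hypothetical helper
        log.Printf("NORETRY_ERROR pubsub_id %s pubsub message too old", pubsubID)
        return nil // nil => acknowledged, never retried
    }
    log.Printf("NORETRY_INFO pubsub_id %s finish", pubsubID)
    return nil
}

func fetchMetadata(ctx context.Context) (string, error) { return "metadata", nil }
func tooOld(m PubSubMessage) bool                       { return false }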

ram -pipe without -service nor -instance leads to weird regex trigger tags on the splitdump and publish2fs microservices

ram -pipe without -service nor -instance means: target all microservices, all instances.
Weird means:

  • (^publish2fs_cloudresourcemanager_Folder-v\d*.\d*.\d*-prd)|(^publish2fs-v\d*.\d*.\d*-prd)|(^ram-v\d*.\d*.\d*-prd)|(^container_Cluster-v\d*.\d*.\d*-prd)
  • (^splitdump_single_instance-v\d*.\d*.\d*-prd)|(^splitdump-v\d*.\d*.\d*-prd)|(^ram-v\d*.\d*.\d*-prd)|(^bigquery_Dataset-v\d*.\d*.\d*-prd)

Upgrade dependencies v0.3.1

Last update was on 2020-07-08
Update dependencies:

cloud.google.com/go v0.60.0 => v0.72.0
cloud.google.com/go/bigquery v1.9.0 => v1.13.0
cloud.google.com/go/firestore v1.2.0 => v1.3.0
cloud.google.com/go/logging v1.0.0 => v1.1.2
cloud.google.com/go/pubsub v1.4.0 => v1.8.3
cloud.google.com/go/storage v1.10.0 => v1.12.0
github.com/google/uuid v1.1.1 => v1.1.2
github.com/open-policy-agent/opa v0.21.0 => v0.24.0
golang.org/x/oauth2 v0.0.0-20200107190931-bf48bf16ab8d => v0.0.0-20201109201403-9fd604954f58
google.golang.org/api v0.28.0 => v0.35.0
google.golang.org/genproto v0.0.0-20200626011028-ee7919e894b5 => v0.0.0-20201119123407-9b1e624d6bc4
gopkg.in/yaml.v2 v2.3.0 => v2.4.0

Name length greater than 63 leads to failed cloud function deployment

Error

Deployment of the cloud function instance dumpinventory_org_cloudresourcemanager_Organization failed.

Cause

Cloud function name length is limited to 63 characters.

OrganizationIDs have variable length: the fixed part of the name is 51 characters, leaving 12 for the organizationID, while it can be longer (14 observed).

Fix

Truncate the cloud function name to a maximum of 63 characters, as sketched below.
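
A minimal sketch of the fix (the helper name is illustrative):

package ramcli

// truncateName sketches the fix: cap a generated cloud function name
// at the 63-character limit. The helper name is illustrative.
func truncateName(name string) string {
    const maxLen = 63
    if len(name) > maxLen {
        return name[:maxLen]
    }
    return name
}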

Bug: failure during cold start must exit with error to avoid an invalid cloud function instance receiving more traffic

When errors occur during the initialize function, they are logged on purpose as basic log entries, so that the cloud function terminates without error and retries are avoided.

As retry is avoided by design, the pubsub message is acknowledged, and so it is not persisted to BigQuery, leading to data losses.

This behavior addresses non-transient errors well, like a 403 missing permission, where 1) retrying will not solve the 403 and 2) it avoids having to pay for compute time during 1 hour of retries.

On the other hand, if the error is transient and occurs for a proportion of executions, like 443: write: connection reset by peer, then it leads to losing the same proportion of pubsub messages, while they would be worth retrying.

As RAM manager, I want a tool to check whether configured instances have a Cloud Build trigger, so that it saves the time of doing this control manually

A group of instances means:

  • the instances of a microservice, e.g., upload2gcs
  • the instances of all microservices aka RAM
  • the instances related to an asset type (TO BE CHALLENGED)

Checking for Cloud Build triggers assumes an environment is provided, to find the associated project hosting the triggers.

In addition to succeeding or failing (issuing an error), the result should list what is found vs what is missing, as sketched below.
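
A minimal sketch of the control's core, assuming the configured instance names and the existing trigger names have already been listed (all names are illustrative):

package ramcli

// missingTriggers sketches the proposed control: compare configured
// instance names against existing Cloud Build trigger names and
// report what is found vs what is missing. All names are illustrative.
func missingTriggers(configured, existing []string) (found, missing []string) {
    have := make(map[string]bool, len(existing))
    for _, name := range existing {
        have[name] = true
    }
    for _, name := range configured {
        if have[name] {
            found = append(found, name)
        } else {
            missing = append(missing, name)
        }
    }
    return found, missing
}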

Bug: ram cli -check -deploy rotates keys while it should not

Scope: convertfeed2logs, listgroups, listgroupmembers, getgroupsettings
Impact: as new keys are created while not needed, and recorded to Firestore, the running useful keys are no longer protected (their names are overwritten in Firestore), which will lead to the deletion of running keys in a multi-org scenario the next time the cloud function is instantiated (i.e., executes initialize).

Fix: during deploy, only create and record keys when the mode is not -check; to be implemented in the 4 impacted services, as sketched below.
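
A minimal sketch of the fix, assuming a boolean check-mode flag (none of these names are RAM's actual API):

package ramcli

// deploySAKey sketches the fix: only create and record a service
// account key when not in -check mode. All names are illustrative.
func deploySAKey(checkMode bool) error {
    if checkMode {
        // -check must only report: never rotate keys nor touch Firestore
        return nil
    }
    // Create a new key and record its name in Firestore, so the key in
    // use stays protected from cleanup in a multi-org scenario.
    return createAndRecordKey() // hypothetical helper
}

func createAndRecordKey() error { return nil }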
