My personal Palladius gem
palladius / clouddeploy-platinum-path
Cloud Build + Cloud Deploy from the ground up
License: Apache License 2.0
gcloud compute instances create sol0-pvt-connect --zone=$REGION-b \
--machine-type=e2-small --network-interface=subnet=dmarzi-proxy,no-address \
--maintenance-policy=MIGRATE --provisioning-model=STANDARD --service-account=$PROJECT_NUMBER-compute@developer.gserviceaccount.com \
--scopes=https://www.googleapis.com/auth/devstorage.read_only,https://www.googleapis.com/auth/logging.write,https://www.googleapis.com/auth/monitoring.write,https://www.googleapis.com/auth/servicecontrol,https://www.googleapis.com/auth/service.management.readonly,https://www.googleapis.com/auth/trace.append \
--tags=http-server,https-server \
--create-disk=auto-delete=yes,boot=yes,device-name=sol0-pvt-connect,image=projects/ubuntu-os-cloud/global/images/ubuntu-minimal-2204-jammy-v20220712,mode=rw,size=100,type=projects/cicd-platinum-test002/zones/europe-west1-b/diskTypes/pd-balanced \
--no-shielded-secure-boot --shielded-vtpm --shielded-integrity-monitoring --reservation-affinity=any
# ERROR: (gcloud.compute.instances.create) Could not fetch resource:
# - Subnetwork must have purpose=PRIVATE.
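A plausible fix, assuming dmarzi-proxy is a proxy-only subnet (purpose=REGIONAL_MANAGED_PROXY, which VMs cannot attach to): point the instance at a regular subnet instead, since newly created subnets default to purpose=PRIVATE. The subnet name, network name, and CIDR range below are made up for illustration:

```shell
# Hypothetical: create a plain subnet (defaults to purpose=PRIVATE) on the
# same network, then attach the VM to it instead of the proxy-only subnet.
gcloud compute networks subnets create dmarzi-private \
  --network=dmarzi --region="$REGION" --range=10.10.0.0/24

gcloud compute instances create sol0-pvt-connect --zone="$REGION-b" \
  --machine-type=e2-small \
  --network-interface=subnet=dmarzi-private,no-address
  # (remaining flags as in the original command above)
```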
My app was upgraded this morning from 2.19 to 2.20.
However, the dev service still points to v2.14. I've killed both the Service and the pod, and I've restored the service via Cloud Deploy, but nothing: still 2.14.
Let me show you.
2.20
[DEBUG] DEBUG has been enabled! Please set DEBUG=FALSE in your .env.sh to disable this. Some important fields:
[DEBUG] PROJECT_ID: 'cicd-platinum-test002'
[DEBUG] ACCOUNT: '[email protected]'
[DEBUG] GITHUB_REPO_OWNER: 'palladius'
[DEBUG] GCLOUD_REGION: 'europe-west1'
[DEBUG] GKE_REGION: 'europe-west1'
Getting the status of all my 8 apps, wow:
== app01: python web app ==
HTTP_RESPONSE: [DEV] 200 for http://34.78.131.190:8080/ ''
HTTP_RESPONSE: [STAG] 200 for http://34.76.157.232:8080/ ''
HTTP_RESPONSE: [CANA] 200 for http://34.140.176.245:8080/ ''
HTTP_RESPONSE: [PROD] 200 for http://35.187.20.43:8080/ ''
The freshest one is dev; look: http://34.78.131.190:8080/
=> App01 v2.14 Hello world from Skaffold in python! [...] app01 v2.14 TARGET=None
(Yes I've tried reloading and curling - same result)
The script currently assumes the default network exists, and fails if it doesn't. It might be worth adding gcloud compute networks create default
to the script.
Also, the command takes a while to run, so I would probably use --async and then run a "watch" command.
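A minimal sketch of both suggestions (the create-if-missing guard plus the --async and watch pattern; the instance-creation flags are trimmed for brevity):

```shell
# Create the default network only if it's missing (idempotent guard).
gcloud compute networks describe default >/dev/null 2>&1 || \
  gcloud compute networks create default --subnet-mode=auto

# Fire off the slow instance creation without blocking the script...
gcloud compute instances create sol0-pvt-connect --zone="$REGION-b" --async

# ...then poll its status until it reaches RUNNING.
watch -n 10 "gcloud compute instances describe sol0-pvt-connect \
  --zone=$REGION-b --format='value(status)'"
```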
This solution is currently not feasible.
There is an alpha solution which I started working on, but I want to wait until it no longer requires project allowlisting (at the moment I can try it, but other users can't).
A colleague just noticed that this is blocking the GitHub connect.
The issue is tracked here: https://issuetracker.google.com/issues/251424997
It manifests as an "Unable to list repositories" error on the "(3) Select repository" step, as in the bug above.
I know engineers are working on it. Please follow progress in the public bug.
Talking to Daniel, I realized that sol1 has always been Single Cluster, not MC!
No biggie; it seems the best thing to do is to refactor the code to support Dev/Staging, which are already in a single cluster (DEV).
Also, they are 1 pod each, so demonstrating proper traffic splitting seems less hypocritical (canary/prod, by contrast, have an 80/20 split as of now).
ricc@cloudshell:~/clouddeploy-platinum-path (cloud-ops-sandbox-1808198420)$ ./00-init.sh
Created [cicd-platinum-02].
Activated [cicd-platinum-02].
ERROR: (gcloud.config.configurations.activate) Cannot activate configuration [cicd-platinum-02], it does not exist.
ERROR: (gcloud.config.configurations.create) Cannot create configuration [cicd-platinum-02], it already exists.
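The output above shows the script racing itself: activate fails because the configuration doesn't exist yet on a fresh run, while create fails because a previous run already made it. An idempotent guard would be something like this sketch (gcloud config configurations create also activates the new configuration by default):

```shell
CFG=cicd-platinum-02
# Activate the configuration if it already exists;
# otherwise create it (creation activates it as a side effect).
gcloud config configurations activate "$CFG" 2>/dev/null || \
  gcloud config configurations create "$CFG"
```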
Rob Edward wrote: To round out the narrative we use for progressive rollout and SLI/SLO (well, error budget) monitoring, we should add visibility in Ops Suite to support and monitor the rollout. Although it sounds like Cloud Deploy will expose something like this in a few months.
My idea:
templates/
cat template | sed s/MYPERCENTAGE/10/g | kubectl apply -f -
sleep 5
cat template | sed s/MYPERCENTAGE/50/g | kubectl apply -f -
sleep 5
cat template | sed s/MYPERCENTAGE/90/g | kubectl apply -f -
sleep 5
cat template | sed s/MYPERCENTAGE/100/g | kubectl apply -f -
just an idea..
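The repeated render-and-apply steps could also be collapsed into a loop. The sketch below only exercises the sed substitution (with a stand-in one-line template), since kubectl apply needs a live cluster:

```shell
# Stand-in template for illustration; the real one would live under templates/.
printf 'weight: MYPERCENTAGE\n' > /tmp/template

for pct in 10 50 90 100; do
  # In the real script, pipe this into: kubectl apply -f -
  sed "s/MYPERCENTAGE/$pct/g" /tmp/template
  # sleep 5   # pace the rollout between weight bumps
done
```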
I'm sad.
Everything seems correctly set up:
But still, when I curl the IP, I get a 404.
Seems like solution1 is now complete. I create everything correctly; the only issue is that my GKE LoadBalancers are unable to pull a public IP.
When I did this with Daniel for the first time, it took a while to get up and running (a few hours), so since it's 18:00 I'll wait until tomorrow morning. But I'm still pasting the output:
bin/kubectl-canary get gateways | egrep "sol1-app"
[CANA] sol1-app01-eu-w1-ext-gw sol1-app01-eu-w1-gke-l7-gxlb 27m
[CANA] sol1-app02-eu-w1-ext-gw sol1-app02-eu-w1-gke-l7-gxlb 15m
and again:
bin/kubectl-canary describe gateways.gateway.networking.k8s.io sol1-app01-eu-w1-ext-gw
[DEBUG] DEBUG has been enabled! Please set DEBUG=FALSE in your .env.sh to disable this. Some important fields:
[DEBUG] PROJECT_ID: 'cicd-platinum-test002'
[DEBUG] ACCOUNT: '[email protected]'
[DEBUG] GITHUB_REPO_OWNER: 'palladius'
[DEBUG] GCLOUD_REGION: 'europe-west1'
[DEBUG] GKE_REGION: 'europe-west1'
[CANA] W0718 18:17:39.258998 1315977 gcp.go:120] WARNING: the gcp auth plugin is deprecated in v1.22+, unavailable in v1.25+; use gcloud instead.
[CANA] To learn more, consult https://cloud.google.com/blog/products/containers-kubernetes/kubectl-auth-changes-in-gke
[CANA] Name: sol1-app01-eu-w1-ext-gw
[CANA] Namespace: default
[CANA] Labels: <none>
[CANA] Annotations: <none>
[CANA] API Version: gateway.networking.k8s.io/v1alpha2
[CANA] Kind: Gateway
[CANA] Metadata:
[CANA] Creation Timestamp: 2022-07-18T15:48:55Z
[CANA] Generation: 1
[CANA] Managed Fields:
[CANA] API Version: gateway.networking.k8s.io/v1alpha2
[CANA] Fields Type: FieldsV1
[CANA] fieldsV1:
[CANA] f:metadata:
[CANA] f:annotations:
[CANA] .:
[CANA] f:kubectl.kubernetes.io/last-applied-configuration:
[CANA] f:spec:
[CANA] .:
[CANA] f:gatewayClassName:
[CANA] f:listeners:
[CANA] .:
[CANA] k:{"name":"http"}:
[CANA] .:
[CANA] f:allowedRoutes:
[CANA] .:
[CANA] f:kinds:
[CANA] f:namespaces:
[CANA] .:
[CANA] f:from:
[CANA] f:name:
[CANA] f:port:
[CANA] f:protocol:
[CANA] Manager: kubectl-client-side-apply
[CANA] Operation: Update
[CANA] Time: 2022-07-18T15:48:55Z
[CANA] Resource Version: 8797104
[CANA] UID: abca00d1-c099-48ee-af3c-8227eec271f0
[CANA] Spec:
[CANA] Gateway Class Name: sol1-app01-eu-w1-gke-l7-gxlb
[CANA] Listeners:
[CANA] Allowed Routes:
[CANA] Kinds:
[CANA] Group: gateway.networking.k8s.io
[CANA] Kind: HTTPRoute
[CANA] Namespaces:
[CANA] From: Same
[CANA] Name: http
[CANA] Port: 80
[CANA] Protocol: HTTP
[CANA] Status:
[CANA] Conditions:
[CANA] Last Transition Time: 1970-01-01T00:00:00Z
[CANA] Message: Waiting for controller
[CANA] Reason: NotReconciled
[CANA] Status: Unknown
[CANA] Type: Scheduled
[CANA] Events: <none>
Following the docs linearly, I get the following errors when running ./08-cloud-deploy-setup.sh
2022-07-11T16:26:18.246728928Z Copying file:///workspace/rendered/skaffold.yaml [Content-Type=application/octet-stream]...
2022-07-11T16:26:18.427942055Z AccessDeniedException: 403 [email protected] does not have storage.objects.create access to the Google Cloud Storage object.
2022-07-11T16:26:18.453183206Z CommandException: 1 file/object could not be transferred.
2022-07-11T16:26:18.981071868Z ERROR
2022-07-11T16:26:18.981105981Z ERROR: could not upload /workspace/rendered/skaffold.yaml to gs://us-central1.deploy-artifacts.plat-path-00.appspot.com/app01-20220711-1625-v2-3-fa8aa86c7c3244e3b050e9c030241a20/canary/; err = gsutil exited with non-zero status: 1
2022-07-11T16:26:19.142109184Z / [0/1 files][ 0.0 B/ 543.0 B] 0% Done
Giving [email protected] the Storage Object Creator role fixes the issue.
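For reference, a grant like this is a one-liner (the service-account address is redacted above, so <CLOUD_BUILD_SA> below is a placeholder; the project ID comes from the log):

```shell
# Grant the Storage Object Creator role to the build service account.
# <CLOUD_BUILD_SA> is a placeholder; substitute the real address.
gcloud projects add-iam-policy-binding plat-path-00 \
  --member="serviceAccount:<CLOUD_BUILD_SA>" \
  --role="roles/storage.objectCreator"
```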
2022-07-11T16:42:47.972602063Z starting build "d763b2ea-b6b5-486e-987c-738fb5864c41"
2022-07-11T16:42:47.972660475Z FETCHSOURCE
2022-07-11T16:42:47.972979712Z BUILD
2022-07-11T16:42:48.607019999Z Pulling image: gcr.io/k8s-skaffold/skaffold:v1.37.2-lts
2022-07-11T16:42:49.557503177Z v1.37.2-lts: Pulling from k8s-skaffold/skaffold
2022-07-11T16:42:49.562615333Z Digest: sha256:0bde2b09928ce891f4e1bfb8d957648bbece9987ec6ef3678c6542196e64e71a
2022-07-11T16:42:49.565943896Z Status: Downloaded newer image for gcr.io/k8s-skaffold/skaffold:v1.37.2-lts
2022-07-11T16:42:49.573509460Z gcr.io/k8s-skaffold/skaffold:v1.37.2-lts
2022-07-11T16:42:54.820596796Z Copying gs://plat-path-00_clouddeploy_us-central1/source/1657557764.190844-f107cc873b384a01835d89bb1df1c4ca.tgz...
2022-07-11T16:42:54.883669804Z / [0 files][ 0.0 B/216.2 KiB] / [1 files][216.2 KiB/216.2 KiB]
2022-07-11T16:42:54.883684643Z Operation completed over 1 objects/216.2 KiB.
2022-07-11T16:42:56.022407681Z profile selection ["production"] did not match those defined in any configurations. Check that values specified in the "--profile" or "-p" flags are valid profile names.
2022-07-11T16:42:56.486140146Z ERROR
2022-07-11T16:42:56.486173988Z ERROR: build step 0 "gcr.io/k8s-skaffold/skaffold:v1.37.2-lts" failed: step exited with non-zero status: 1
Fix: change clouddeploy.template.yaml to reference "prod" instead of "production".
2022-07-11T18:30:38.047032735Z Fetching cluster endpoint and auth data.
2022-07-11T18:30:38.131125194Z ERROR: (gcloud.container.clusters.get-credentials) ResponseError: code=403, message=Required "container.clusters.get" permission(s) for "projects/plat-path-00/locations/us-central1/clusters/cicd-dev".
2022-07-11T18:30:38.747207944Z ERROR
2022-07-11T18:30:38.747248722Z ERROR: build step 0 "gcr.io/k8s-skaffold/skaffold:v1.37.2-lts" failed: step exited with non-zero status: 1
Giving [email protected] the Kubernetes Engine Developer role fixes the issue.
2022-07-11T18:50:11.982941521Z Fetching cluster endpoint and auth data.
2022-07-11T18:50:12.228428165Z kubeconfig entry generated for cicd-dev.
2022-07-11T18:50:13.210498953Z Starting deploy...
2022-07-11T18:50:18.973523079Z - service/app02-kuruby created
2022-07-11T18:50:20.485026404Z - Warning: Autopilot set default resource requests for Deployment default/app02-kuruby, as resource requests were not specified. See http://g.co/gke/autopilot-defaults.
2022-07-11T18:50:20.485711420Z - deployment.apps/app02-kuruby created
2022-07-11T18:50:20.487990601Z Waiting for deployments to stabilize...
2022-07-11T18:50:27.532443368Z - deployment/app02-kuruby: no nodes available to schedule pods
2022-07-11T18:50:27.532455203Z - pod/app02-kuruby-68f787f7-mcpwj: no nodes available to schedule pods
2022-07-11T18:53:04.685079362Z - deployment/app02-kuruby: container app02-kuruby is waiting to start: ricc-app01-ruby-example can't be pulled
2022-07-11T18:53:04.685090778Z - pod/app02-kuruby-68f787f7-mcpwj: container app02-kuruby is waiting to start: ricc-app01-ruby-example can't be pulled
2022-07-11T18:53:04.685092281Z - deployment/app02-kuruby failed. Error: container app02-kuruby is waiting to start: ricc-app01-ruby-example can't be pulled.
2022-07-11T18:53:04.686693150Z 1/1 deployment(s) failed
2022-07-11T18:53:04.956703356Z ERROR
2022-07-11T18:53:04.956730985Z ERROR: build step 0 "gcr.io/k8s-skaffold/skaffold:v1.37.2-lts" failed: step exited with non-zero status: 1
TBD
Is there a reason why the namespace part was put in the apps (https://github.com/palladius/clouddeploy-platinum-path/blob/main/apps/app01/k8s/overlays/dev/namespace.yaml)
and not consolidated in the base components/ part (https://github.com/palladius/clouddeploy-platinum-path/tree/main/components/common/dev)?
I don't see a reason to do this within apps instead of at the component level, and I would strongly recommend lifting this into a component, as you are proposing.
Talked to Alex who pointed out this cannot work in MC mode.
Talked to Daniel who confirmed sol1 does NOT support MC.
I see three solutions:
3 problems with kustomize:
I'd love a kustomize expert to help me on this.
ricc@ricc:~/git/clouddeploy-platinum-path$ find . -name components
./components
./apps/components
It took me a week to fix, but when I finally got it to work last night, I noticed that the script 15-solution2-xlb-GFE3-traffic-split.sh
is not 100% multitenant.
While it's now able to load balance with traffic splitting a generic app01 // app02, some artifacts (like FWD_RULE) have a fixed name, which doesn't allow me to parametrize them by APP_ID. This means that once you call it with app01, it creates GCP infrastructure to route to app01, and then when you call it with app02, it changes that same infrastructure to route to app02.
There's currently NO WAY to route both differently and separately.
Things I need to change in 15-solution2-xlb-GFE3-traffic-split.sh:
To keep things easy, I will enforce this best practice: every new name shall be:
new_var := $APPXX-$OLDNAME
And to make this even more obvious ;) I will rename the variables from
OLD_VAR_NAME
=> OLD_VAR_NAME_MTSUFFIX
for ease of grepping.
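A tiny illustration of the renaming rule (the forwarding-rule value is hypothetical):

```shell
APP_ID=app01
FWD_RULE="sol2-xlb-fwd-rule"                 # old: fixed name, shared by all apps
FWD_RULE_MTSUFFIX="${APP_ID}-${FWD_RULE}"    # new: prefixed per app, greppable by MTSUFFIX
echo "$FWD_RULE_MTSUFFIX"                    # app01-sol2-xlb-fwd-rule
```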
[from bielski:]
The image tagging in cloud-build/02-dev-to-staging-auto-promo script fails.
Why does it fail?
In order to recreate an existing tag (which the used gcloud command seems to do), the service account would need artifactregistry.tags.delete permission which is not included in the artifactregistry.writer role.
There are multiple tags that are potentially being overwritten:
latest
latest-cb2
$DOCKER_IMAGE_VERSION is the same as vSUPERDUPER_MAGIC_VERSION which also creates a conflict.
Solutions:
a) give an additional role to the Cloud Build service account (roles/artifactregistry.repoAdmin)
b) prevent duplicate tagging (which would prevent us from using the latest tag)
$ gcloud projects get-iam-policy ricc-cicd \
--flatten="bindings[].members" \
--format='table(bindings.role)' \
--filter="bindings.members:[email protected]"
ROLE
roles/artifactregistry.reader
roles/container.developer <= was missing
roles/container.nodeServiceAgent
roles/storage.objectAdmin <= was missing
Am I right, Alex?
My code is broken. I was adding images which pointed to non-hydrated images for my solutions. This was silly.
I just need to point Services at my existing 2 apps (8 apps, actually).
Why am I trying to substitute Cloud Deploy with bash?!? :P
This bug tracks this execution since it's about:
So quite complex :)