
aks-landing-zone-accelerator's Introduction

AKS Landing Zone Accelerator

Azure Landing Zone Accelerators are architectural guidance, reference architectures, reference implementations, and automation packaged to deploy workload platforms on Azure at scale, aligned with industry-proven practices.

AKS Landing Zone Accelerator represents the strategic design path and target technical state for an Azure Kubernetes Service (AKS) deployment. This solution provides an architectural approach and reference implementation to prepare subscriptions for a scalable Azure Kubernetes Service (AKS) cluster. For the architectural guidance, check out AKS landing zone accelerator in Microsoft Learn.

Below is a picture of what a golden state looks like. It also shows how open source software such as Flux and Traefik integrates within the AKS ecosystem.

Golden state platform foundation with AKS landingzone highlighted in red

The AKS Landing Zone Accelerator is only concerned with what gets deployed in the landing zone subscription highlighted by the red box in the picture above. It is assumed that an appropriate platform foundation is already set up, which may or may not be the official ESLZ platform foundation. This means that policies and governance should already be in place, or should be set up after this implementation, and are not part of the scope of this reference implementation. The policies applied to management groups in the hierarchy above the subscription will trickle down to the AKS Landing Zone Accelerator landing zone subscription. Having a platform foundation is not mandatory; it just enhances the deployment. The modularized approach used in this program allows the user to pick and choose whatever portion is useful to them. You don't have to use all the resources provided by this program.


Choosing a Deployment Model

The reference implementations are spread across three repos that all build on top of the AKS baseline reference architecture and Azure Landing Zones.

  1. This one
  2. The AKS Construction Helper
  3. The Baseline Automation Module

This repo

In this repo, you get access to step-by-step guides covering various customer scenarios that can help accelerate the development and deployment of AKS clusters that conform with AKS Landing Zone Accelerator best practices and guidelines. This is a good starting point if you are new to AKS or IaC. Each scenario aims to represent common customer experiences, with the goal of accelerating the process of developing and deploying conforming AKS clusters using Infrastructure-as-Code (IaC). The scenarios also provide a step-by-step learning experience for deploying well-architected AKS environments. Most scenarios will eventually have a Terraform and a Bicep version.

Use this repo if you would like step-by-step guidance on how to deploy secure and well-architected AKS clusters using our scenario-based model and/or you are new to AKS or IaC. This model promotes separation of duties and modularized IaC, so you can pick and choose the components you want to build with your cluster, and it has implementations in ARM, Terraform and Bicep. It is the best starting point for people new to Azure or AKS.

AKS Construction Helper

A flexible templating approach using Bicep that enables multiple scenarios using a web-based tool. It provides tangible artifacts to quickly enable AKS deployments through the CLI or in your CI/CD pipeline.

A wizard drives the configuration experience and guides your decision making. It provides presets for the main Azure Landing Zone deployment modes (Sandbox, Corp and Online). The output of this wizard is the set of parameters and CLI commands needed to deploy your customized AKS environment in one step using our maintained Bicep template.

Use this repo if you would like to use a guided experience to rapidly create your environment with a maintained Bicep template based on the architecture of the AKS Secure Baseline.

Next Steps to implement AKS Landing Zone Accelerator

Pick one of the two options below.

Follow a scenario driven tutorial within this repo

Pick one of the scenarios below to get started on a reference implementation. For the AKS secure baseline with non-private cluster, use the AKS baseline reference implementation.

▶️ AKS Secure Baseline in a Private Cluster

▶️ Running Azure ML workloads on a private AKS cluster

▶️ Azure Policy initiative for AKS Landing Zone Accelerator (Brownfield scenario)

▶️ Backup Restore using Open source tool Velero

▶️ BlueGreen Deployment for AKS

▶️ AKS on prem & Hybrid

Or leverage one of the Landing Zone Accelerator implementations from our other repos

▶️ AKS Construction Helper

aks-landing-zone-accelerator's People

Contributors

ahems, arielram99, bahramr, ckittel, cloudydemos, dcasati, desreela, fgauna12, houssemdellai, iamaliyousefi, ibersanoms, jordanbean-msft, khowling, lenisha, mattleach25, mattleach2512, microsoftopensource, mosabami, nithinrad, olaseniadeniji, oliverlabs, pratiksharma-dev, robertopc1, scottsimock, shubhammicrosoft1, techbunny, vimorra, welasco, yegu-ms, yelghali


aks-landing-zone-accelerator's Issues

Update the scenario to store other sensitive data in keyvault as opposed to config maps

The build intelligent apps scenario currently stores sensitive information in config maps instead of Key Vault. This needs to change. Update the manifest files and deployment instructions to pull the following environment variables from Key Vault instead of config maps to make the deployment more secure. Currently only the OPENAI_API_KEY is deployed this way; you can use it as a template for the remaining sensitive environment variables. Below is a list of them:
AzureWebJobsStorage
BLOB_ACCOUNT_KEY
FORM_RECOGNIZER_KEY
OPENAI_API_BASE
TRANSLATE_KEY

You can find the config map file here for your reference: https://github.com/Azure/AKS-Landing-Zone-Accelerator/blob/openai-embeddings/Scenarios/AKS-OpenAI-CogServe-Redis-Embeddings/kubernetes/env-configmap.yaml

This is the command currently used to pass the OPENAI_API_KEY to the keyvault here: DEP=$(az deployment group create --name aksenvironmentdeployment -g $RGNAME --parameters signedinuser=$SIGNEDINUSER api_key= -f aks.bicep -o json).

That deployment command will need to be updated to pass these other sensitive variables. The Bicep files will also need to be updated accordingly: https://github.com/Azure/AKS-Landing-Zone-Accelerator/blob/openai-embeddings/Scenarios/AKS-OpenAI-CogServe-Redis-Embeddings/infrastructure/kvRbac.bicep

Acceptance criteria:

Bicep file updated to require parameters for the additional sensitive variables
Readme command updated to pass the parameters
Deployment manifest files updated to pull secrets using the secret provider class just like it pulls the openaiapikey secret
PR submitted to the openai-embeddings branch
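
One way to approach the deployment command change is to collect the extra secrets from environment variables and turn them into name=value pairs for --parameters. The sketch below is only an illustration: it assumes the Bicep parameters would be named exactly like the environment variables listed in this issue, which the source does not confirm, and the helper name build_secret_params is hypothetical.

```shell
# Names taken from the list in this issue. Assumption: the Bicep template
# exposes parameters with these same names (not confirmed by the source).
SECRET_VARS="AzureWebJobsStorage BLOB_ACCOUNT_KEY FORM_RECOGNIZER_KEY OPENAI_API_BASE TRANSLATE_KEY"

# Hypothetical helper: print one name=value pair per line, or fail if any
# of the variables is unset or empty.
build_secret_params() {
  for name in $SECRET_VARS; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing value for $name" >&2
      return 1
    fi
    printf '%s=%s\n' "$name" "$value"
  done
}
```

The output of build_secret_params could then be appended to the existing --parameters list of the az deployment group create command. Values that contain whitespace would need more careful quoting than this sketch provides.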

Create the self-signed certificate using Lets Encrypt [BUG]

Describe the bug
The Scenario for AKS-Secure-Baseline-PrivateCluster/Bicep/07-workload.md page talks about deploying the Ingress with HTTPS support. When you run the following command:

az aks command invoke --resource-group $ClusterRGName --name $ClusterName --command "kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.8.0/cert-manager.yaml"

you get a message back stating:

command started at 2022-07-01 20:25:59+00:00, finished at 2022-07-01 20:26:03+00:00 with exitcode=1
Unable to connect to the server: EOF

The cert-manager is not installed into the cluster.

To Reproduce
Steps to reproduce the behavior:

  1. Connect to a private AKS cluster.
  2. Issue the above command (az aks command invoke --resource-group $ClusterRGName --name $ClusterName --command "kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.8.0/cert-manager.yaml")
  3. Wait until the command comes back to the jumpbox (Bastion)
  4. See error.

Expected behavior
cert-manager is installed into the cluster.

Desktop (please complete the following information):

  • OS: Ubuntu 18.04
  • Terminal with Azure CLI

[BUG] ARM Template container registry of AKS-Secure-Private-Cluster fails to deploy

Describe the bug
ARM Template container registry of AKS-Secure-Private-Cluster fails to deploy
https://github.com/Azure/AKS-Landing-Zone-Accelerator/blob/main/Scenarios/AKS-Secure-Baseline-PrivateCluster/ARM/Infrastructure-Deployment/Supporting-components/Templates/aks-eslz-containerregistry.template.json

The error is as follows:
{"code": "InvalidContentLink", "message": "Unable to download deployment content from 'https://containerregistry.hosting.portal.azure.net/containerregistry/Content/1.0.20210721.6/DeploymentTemplates/PrivateEndpointForRegistry.json'. The tracking Id is 'adcb7379-10ad-48fb-8c7a-8dafc5c7339b'. Please see https://aka.ms/arm-deploy-resources for usage details."}

Add build app image and push to ACR to the deployment steps

The Readme file for the Build intelligent apps scenario provides instructions on how to deploy the scenario by using an image already built and hosted on Docker Hub. We know customers would want to build their images themselves using Dockerfiles. Update the instructions to add a NEW OPTION that has the user build their images using the az acr build command and use their built images in their deployment instead. Decide whether you want them to complete the instructions using the Docker Hub image first and then update it to use their ACR, or make it an option they can choose from (build your image or use ours). The Dockerfiles can be found in the submodule (App folder).

Create a new branch out of the openai-embeddings branch: https://github.com/Azure/AKS-Landing-Zone-Accelerator/tree/openai-embeddings/Scenarios/AKS-OpenAI-CogServe-Redis-Embeddings

Acceptance criteria:

  1. Update the manifest file, providing a placeholder as the image name so users can update it with their ACR name
  2. Add instructions to the readme ensuring users understand that when they git clone the repository they must use the command git clone --recurse-submodules https://github.com/Azure/AKS-Landing-Zone-Accelerator so that the application code can be pulled from the repo that hosts it
  3. Add instructions on how to build the image using az acr build
  4. Add instructions on how to update the placeholder in the deployment files
  5. The changes should apply to the two application images, and the Redis image should be pulled from Docker Hub to the user's ACR
  6. This change must be optional
  7. PR submitted to the openai-embeddings branch
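
The criteria above could be sketched roughly as follows. This is a hedged illustration, not the scenario's actual instructions: the placeholder string <ACR_NAME>, the function names, and every image name, tag, Dockerfile path and build context passed to them are assumptions. The az acr build and az acr import flags shown are standard Azure CLI options.

```shell
# Replace a placeholder registry name in a manifest and print the result.
# "<ACR_NAME>" is a hypothetical placeholder; use whatever the manifests adopt.
render_manifest() {
  manifest="$1"
  acr_name="$2"
  sed "s|<ACR_NAME>|${acr_name}|g" "$manifest"
}

# Build one application image inside ACR from a Dockerfile.
# Arguments: registry name, image:tag, Dockerfile path, build context folder.
build_app_image() {
  az acr build --registry "$1" --image "$2" --file "$3" "$4"
}

# Copy a public image (for example the Redis image on Docker Hub) into ACR.
# Arguments: registry name, source image reference, target image:tag.
import_public_image() {
  az acr import --name "$1" --source "$2" --image "$3"
}
```

A user would call build_app_image once per application image, call import_public_image for Redis, and then apply the output of render_manifest with kubectl. Keeping the substitution in a separate function leaves the original manifests untouched, which fits the requirement that the change be optional.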

Policy question ESLZ platform foundation vs XXX Landing zone Accelerator

Quick question related to below quote.

The AKS Landing Zone Accelerator is only concerned with what gets deployed in the landing zone subscription highlighted by the red box in the picture above. It is assumed that an appropriate platform foundation is already set up, which may or may not be the official ESLZ platform foundation. This means that policies and governance should already be in place, or should be set up after this implementation, and are not part of the scope of this reference implementation. The policies applied to management groups in the hierarchy above the subscription will trickle down to the AKS Landing Zone Accelerator landing zone subscription. Having a platform foundation is not mandatory; it just enhances the deployment. The modularized approach used in this program allows the user to pick and choose whatever portion is useful to them. You don't have to use all the resources provided by this program.

I see that the AKS Landing Zone Accelerator has a number of Kubernetes policies that are not part of the ESLZ platform foundation as deployed by Azure Landing Zones.

In the examples they are assigned to the resource group in which the AKS cluster lives.

Should the list of AKS policies in ESLZ platform be the same as the policies mentioned here? https://github.com/Azure/AKS-Landing-Zone-Accelerator/tree/main/Scenarios/Azure-Policy-ES-for-AKS

What I'm trying to understand in general is if all policies related to resources for which an application accelerator is available should be part of the platform foundation or if the application accelerator can be seen as providing an x number of policies on top of what platform foundation already deploys?

[FEATURE] Enable Azure Dev CLI to deploy AKS-LZA

Is your feature request related to a problem? Please describe.
A well-architected AKS environment can be complicated to build. Enter the AKS deployment helper! However, how do we make various sample apps available to customers who want to try out AKS and want the quickest option possible?

Describe the solution you'd like
Azure Dev CLI is developer focused and makes it quick and easy to get started on Azure. Leveraging it to enable customers to create well-architected AKS environments will get more customers using AKS quickly.

Describe alternatives you've considered
Using the AKS deploy helper to create the environment. However, the Dev CLI option will be even simpler, albeit not as flexible. The Dev CLI can also be used to create sample apps in various languages. We will begin with our JavaScript app.

Additional context

⚠️ Start by reading this doc to understand how to make a repo azd compatible

We will be using the AKS construction project for the IaC and have the repo pulled in as a submodule.
Here are the required tasks:

  • Create a repository in the Azure-Samples org in which the deployment and application scripts will be created
  • Use this repo as a guide to create the repo: https://github.com/azure-samples/todo-nodejs-mongo-aks. It includes all the folders required to make a repo Dev CLI enabled. According to this, the new repo should have the following folders: .devcontainers, .github/workflows, assets (where the pictures in the readme will go), src
  • Use the content of this folder to create the code that goes into the infra folder. The AKS-Construction repo should be referenced as a submodule in this project. We need a main.bicep and a main.parameters.json file in the infra folder; how the rest of the folder is structured is up to you. Make your own main.bicep file in the infra folder that uses AKS-Construction's main.bicep file by referencing it as I did in this project. The only difference is that, instead of listing the parameters as a JSON object as I did in that file, you put those parameters in a fresh main.parameters.json file. Use the same parameters I used in my JSON object. Here is an example of what the parameters file will look like
  • The src folder should have two folders in it. The mslearn-aks-workshop-ratings-api folder should be a submodule pointing to the ratings app API repo. The mslearn-aks-workshop-ratings-web folder should do the same but point to this repo
  • The .devcontainers folder should use the same content as this folder. Make changes to the upstream project's .devcontainer folder if need be and copy it into the .devcontainer for this project
  • The .github/workflows folder should include this pipeline to deploy the infra and this pipeline to deploy the app. This second pipeline will need access to the yaml files in this folder, so this repo will need to be cloned as a submodule. If the Dev CLI only allows you to use one pipeline, use the first one and we can figure out how to add the second later
  • The project will also require an azure.yaml file as seen in the sample repo provided

After all these steps are completed we will then proceed to complete step 2 and 3 highlighted in this doc

Enable the use of an nginx ingress controller using an internal LB to access the cluster resources

The build intelligent apps scenario's architecture diagram has Nginx used as an ingress controller to access the application. However our deployment steps currently expose the web application directly using a LB service. Make changes required so that an nginx ingress controller is used instead and have all the other services changed to ClusterIP. Here is a link to the scenario: https://github.com/Azure/AKS-Landing-Zone-Accelerator/tree/openai-embeddings/Scenarios/AKS-OpenAI-CogServe-Redis-Embeddings

You can make the Nginx LB use an internal LB by using the command below:

helm upgrade --install ingress-nginx ingress-nginx \
  --namespace ingress-basic \
  -f internal-ingress.yaml \
  --repo https://kubernetes.github.io/ingress-nginx \
  --set controller.replicaCount=2

The internal-ingress.yaml file can be found in the kubernetes folder. We don't need to use HTTPS at this time.

Acceptance criteria:

Ingress controller manifest file added using http
Ingress controller should have an internal IP address
Update the readme to deploy the ingress controller and also mention, as one of the features of this scenario, that the app can only be accessed via the app gateway
Include an az cli command to add the ingress controller's ip address (10.240.3.101) to the back end pool of the app gateway to the README or use bicep to do that
All other services are using ClusterIP service type
PR submitted to the openai-embeddings branch
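
Two of the criteria above lend themselves to a small script: checking that no manifest still declares a LoadBalancer Service, and adding the ingress controller's IP address to the app gateway back end pool. The following is a sketch under assumptions: the function names are hypothetical, the manifests are assumed to be .yaml files in a single folder, and the resource group, gateway and pool names are supplied by the caller. The az network application-gateway address-pool update command and its --servers option are part of the Azure CLI.

```shell
# Print the manifest files in a folder that still declare a LoadBalancer
# Service. Prints nothing (and returns non-zero) when none are found.
find_loadbalancer_services() {
  grep -l -E '^[[:space:]]*type:[[:space:]]*LoadBalancer' "$1"/*.yaml 2>/dev/null
}

# Add the internal ingress IP to an app gateway back end pool.
# Arguments: resource group, gateway name, pool name, IP address.
add_ingress_to_backend_pool() {
  az network application-gateway address-pool update \
    --resource-group "$1" --gateway-name "$2" --name "$3" \
    --servers "$4"
}
```

For this scenario the last argument would be 10.240.3.101, the address given in the criteria above. Note that --servers sets the pool's server list, so any addresses that must stay in the pool need to be passed along with the new one.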

[FEATURE] Port back the best practices from the Windows baseline to the terraform baseline implementation

Is your feature request related to a problem? Please describe.
No it is not a problem, just continuous improvement

Describe the solution you'd like
We made a lot of improvements to the Windows baseline reference implementation. We would like those improvements to be ported back to the private cluster baseline.

Additional context
The comments in this PR provide further details on the work that needs to be done:
#68

[FEATURE] - Blue Green Deployment Reference Implementation

Add a reference implementation for the blue green pattern, usually adopted to control the deployment of new cluster versions.

I would like to have a reference implementation integrated in the AKS Secure Baseline. The implementation is based on Terraform and is backward compatible with the previous one, so it also allows deployment of a single cluster.

[BUG] Scenarios/AKS-OpenAI-CogServe-Redis-Embeddings : OpenAIName

Describe the bug
The Bicep deployment fails with an error because the parameters.json file has an invalid value:

"OpenAIName": {
    "value": "<name for your OpenAI service>"
},

We need to add instructions to the README telling the user to set their own unique value for this parameter. NOTE: a globally unique value is required.

#73
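
A pre-deployment check could catch this kind of unreplaced placeholder before the Bicep deployment runs. The sketch below is one possible approach, not something the scenario ships: the function name find_placeholders is hypothetical, and it assumes placeholders always take the angle-bracket form shown in this issue.

```shell
# Print (with line numbers) every parameter whose value is still an
# angle-bracket placeholder such as "<name for your OpenAI service>".
# Exit status 0 means at least one placeholder remains.
find_placeholders() {
  grep -n -E '"value"[[:space:]]*:[[:space:]]*"<[^"]*>"' "$1"
}
```

Running find_placeholders parameters.json before az deployment group create would list the lines that still need a real value.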

Ensure that following deployment steps for build intelligent apps works without any error

The Readme file for the Build intelligent apps scenario provides instructions on how to deploy the scenario at a high level. Some steps might be missing. Follow the instructions and make updates to the deployment instructions as you go. Also provide more details about which environment variables need to be updated in the config map, and provide screenshots of the app working. Make changes to the Bicep and k8s files as required. Create a new branch out of the openai-embeddings branch: https://github.com/Azure/AKS-Landing-Zone-Accelerator/tree/openai-embeddings/Scenarios/AKS-OpenAI-CogServe-Redis-Embeddings

Acceptance criteria:

  1. Instructions and other files within the "AKS-OpenAI-CogServe-Redis-Embeddings" folder updated so that a user can successfully deploy the scenario
  2. PR submitted to the openai-embeddings branch

GitHub Actions Runner not deploying Sample Application

Describe the bug
For the AKS Secure Baseline Cluster, I'm trying to deploy the sample applications. GitHub Actions just sits there waiting for a runner.

To Reproduce
Steps to reproduce the behavior:

  1. Try to run the workflow Deploy the Sample App using GH Runner

Expected behavior
It should run to completion. I've gotten it running before.

Desktop (please complete the following information):

  • Github in Browser

[BUG] Scenarios/AKS-OpenAI-CogServe-Redis-Embeddings : text-davinci-002

Describe the bug
The Bicep deployment fails with:

{"code": "DeploymentModelNotSupported", "message": "Creating account deployment is not supported by the model 'text-davinci-002'. This is usually because there are better models available for the similar functionality."}

Appears to be this issue ("likely related to the OpenAI deprecation announcements. I'm testing to see if gpt3-turbo can be used instead"):
Azure-Samples/azure-search-openai-demo/issues#388

#73

Update one of the Microservices in the AKS openai demo to use Java (spring boot) as a programming language

The AKS product team recently released the openai demo you can find here: https://github.com/Azure-Samples/aks-store-demo/tree/main. It uses JavaScript, Go, Rust, Python and Vue, but not Java, one of the most important programming languages our customers use. Pick one of the microservices written in Rust (because multiple microservices are written in Rust) and convert it to Java. Preferably, convert this microservice: https://github.com/Azure-Samples/aks-store-demo/tree/main/src/product-service

Acceptance criteria:
A Rust microservice is converted to Java Spring
PR submitted to the main branch of the application

[FEATURE] Include parameter to enable node auto provisioning add on

Is your feature request related to a problem? Please describe.
Please can you include a parameter to activate the node auto provisioning feature

Describe the solution you'd like
https://learn.microsoft.com/en-us/azure/aks/node-autoprovision?tabs=azure-cli

When you deploy workloads onto AKS, you need to make a decision about the node pool configuration regarding the VM size needed. As your workloads become more complex, and require different CPU, memory, and capabilities to run, the overhead of having to design your VM configuration for numerous resource requests becomes difficult.

Node autoprovisioning (NAP) (preview) decides based on pending pod resource requirements the optimal VM configuration to run those workloads in the most efficient and cost effective manner.

Please can you include a parameter to activate the node auto provisioning feature

Describe alternatives you've considered
Installing Karpenter using helm as a post deploy script

[FEATURE] Simplify the instructions for the Workload deployment

The steps for the workload deployment are too complex and do not provide much instructional value.
Consider implementing the following enhancements:

  1. Remove the steps to set up firewall rules for the jumpbox
  2. Connecting the ACR and AKV to the hub should have been done previously through the IaC bicep or terraform code
  3. We should talk about the "AKS run command" as an alternative to running the kubectl commands from the jumpbox. All the commands should be changed to be run locally from the Linux jumpbox as that offers a better experience for the end user (also making use of the jumpbox and Bastionhost deployed in the environment).
  4. Remove all the manual customizations that need to be done to the app yaml files. Instead use helm and pass the environment specific variables as parameters
  5. Remove the steps to deploy ingress without HTTPS

kubelogin not installed in codespaces

Describe the bug
I'm using Codespaces to deploy this LZA. At Step 7.2 the command kubectl get nodes throws an error (see screenshot).

To Reproduce
Steps to reproduce the behavior:
I ran all commands in VS Code Web attached to the Codespace; so far no other command has failed.
I'm in the /kubernetes directory and run: kubectl get nodes (see the attached screenshot for the error output)

Expected behavior
The command should come back with the running nodes in Kubernetes.

Screenshots
See attached.

Desktop (please complete the following information):

  • OS: [e.g. Windows]
  • Browser Edge
  • Terminal .devcontainer

Smartphone (please complete the following information):
N/A

Additional context
Screenshot of the error below.
image
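
Going by the issue title, the likely cause is that kubelogin is missing from the dev container. A quick way to confirm that before running kubectl might look like the sketch below; the function names are hypothetical and the check only looks at PATH, it does not validate versions or cluster credentials.

```shell
# Return 0 if the named tool is on PATH; otherwise print a hint and return 1.
require_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "$1 not found on PATH" >&2
  return 1
}

# Check the tools needed to talk to the cluster from the dev container.
check_aks_tools() {
  require_tool kubectl && require_tool kubelogin
}
```

If check_aks_tools fails, az aks install-cli is one way to install both kubectl and kubelogin.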

Use -input=false in terraform github workflows / missing `location` input

The deployInfrastructure.yml seems to be missing this flag, and Scenarios/Secure-Baseline/Terraform/04-Network-Hub seems to require a location input that is not given, so Terraform prompts for input in the GitHub Action.

https://learn.hashicorp.com/tutorials/terraform/automate-terraform#automated-terraform-cli-workflow

I was following this file for reference https://github.com/Azure/AKS-Landing-Zone-Accelerator/blob/main/Scenarios/Secure-Baseline/Terraform/workflows/README.md#create-storage-account-for-terraform-state-files
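
A workflow step could guard against the prompt along these lines. This is a sketch, assuming the missing input is a Terraform variable named location as described above; the function names and the tfplan output file are illustrative. The -input=false flag and the TF_VAR_<name> environment variable convention are standard Terraform behavior.

```shell
# Fail fast if a required Terraform input variable is not supplied through
# the TF_VAR_<name> environment variable convention.
require_tf_var() {
  eval "value=\${TF_VAR_$1:-}"
  if [ -z "$value" ]; then
    echo "TF_VAR_$1 is not set; terraform would prompt for it" >&2
    return 1
  fi
}

# Non-interactive plan: with -input=false terraform reports an error for a
# missing variable instead of waiting for input.
plan_non_interactive() {
  require_tf_var location || return 1
  terraform plan -input=false -out=tfplan
}
```

In a GitHub Actions job the variable would be exported in the step's env block (for example TF_VAR_location) so that Terraform itself can read it.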

[BUG] Deployment fails when reaching [stage 06] - ControlPlaneAddOnsNotReady

Describe the bug
Deployment error when reaching stage 06

code:
ControlPlaneAddOnsNotReady

message:
Pods not in Running status: coredns-547dd8b568-lrswn,coredns-autoscaler-6fb889cdfc-twmtf,metrics-server-7d59848cc6-jkd59,metrics-server-7d59848cc6-sl2tq,tunnelfront-69c958dc9-gndmc,tunnelfront-7f8f87f77c-wjqrn

Also kubectl get nodes from jumpbox returns 'No resources found'

To Reproduce
Follow steps in AKS-Secure-Baseline-PrivateCluster/Bicep

Expected behavior
AKS cluster should deploy correctly
kubectl get nodes should return 3 nodes

[BUG] Deploying AKS-Secure-Baseline-PrivateCluster to Azure Government doesn't work

Describe the bug
Deploying the AKS-Secure-Baseline-PrivateCluster to an Azure Government region doesn't work due to hard-coded paths & assumptions on names & URLs that are different in Azure Government

To Reproduce
Steps to reproduce the behavior:
Follow instructions for deploying AKS-Secure-Baseline-PrivateCluster, but to an Azure Government region

Expected behavior
I expect the Bicep templates & instructions to work for Azure Government similar to how they work for Azure commercial.
