
ezdemo's Issues

configure AD users on picasso

It would be good to configure ad_admin1 and ad_user1 on picasso.
As an ansible newbie, I'm not sure of the best way to structure the scripts.

Here's my attempt ...

root@1e2f223abb02:/app/server# cat ansible/routines/configure_picasso.yml
### Configure Picasso
- hosts: "{{ (groups['controllers'] | first) | default([])}}"
  tasks:
    - name: read cluster id
      shell: "hpecp k8scluster list -o text | cut -d' ' -f1"
      register: cluster_id

    - name: get cluster
      shell: "hpecp k8scluster get {{ cluster_id.stdout }} -o json"
      register: cluster_json
      ignore_errors: True

    - set_fact:
        cluster: "{{ cluster_json.stdout | from_json }}"
    - set_fact:
        firstmaster_id: "{{ (cluster | json_query(jmesquery)) | first }}"
      vars:
        jmesquery: "k8shosts_config[?role=='master'].node"

    - shell: "hpecp k8sworker get {{ firstmaster_id }} -o json"
      register: firstmaster_json
    - set_fact:
        firstmasterip: "{{ (firstmaster_json.stdout | from_json) | json_query('ipaddr') }}"

    - name: prepare tenants
      shell: |-
        function retry {
          local n=1
          local max=20
          local delay=30
          while true; do
            "$@" && break || {
              if [[ $n -lt $max ]]; then
                ((n++))
                echo "Command failed. Attempt $n/$max:"
                sleep $delay;
              else
                echo "The command has failed after $n attempts." >&2
                exit 1
              fi
            }
          done
        }
        export SCRIPTPATH="/opt/bluedata/bundles/hpe-cp*"
        export MASTER_NODE_IP={{ firstmasterip }}
        export LOG_FILE_PATH=/tmp/register_k8s_prepare.log
        retry ${SCRIPTPATH}/startscript.sh --action prepare_dftenants
        export LOG_FILE_PATH=/tmp/register_k8s_configure.log
        [[ $(tail -1 ${LOG_FILE_PATH} 2> /dev/null ) == "The action configure_dftenants completed successfully." ]] || echo yes | ${SCRIPTPATH}/startscript.sh --action configure_dftenants
        export LOG_FILE_PATH=/tmp/register_k8s_register.log
        [[ $(tail -1 ${LOG_FILE_PATH} 2> /dev/null ) == "The action register_dftenants completed successfully." ]] || expect <<EOF
          set timeout 1800
          spawn $(realpath ${SCRIPTPATH})/startscript.sh --action register_dftenants
          expect ".*Enter Site Admin username: " { send "admin\r" }
          expect "admin\r\nEnter Site Admin password: " { send "{{ admin_password }}\r" }
          expect eof
        EOF
      register: result
      # retries: 15
      # delay: 60
      # until: result is not failed

- name: configure picasso DF users
  hosts: localhost
  tasks:

    - name: mapr password
      shell: "kubectl --kubeconfig {{ ansible_env.HOME }}/.kube/config -n dfdemo get secret system -o yaml | grep MAPR_PASSWORD | head -1 | awk '{print $2}' | base64 --decode"
      register: mapr_password

    - name: maprlogin
      shell: "kubectl --kubeconfig {{ ansible_env.HOME }}/.kube/config -n dfdemo exec admincli-0 -- bash -c 'echo {{ mapr_password.stdout }} | maprlogin password'"

    - name: add ad_admin1
      shell: "kubectl --kubeconfig {{ ansible_env.HOME }}/.kube/config -n dfdemo exec admincli-0 -- maprcli acl edit -type cluster -user ad_admin1:fc"

    - name: add ad_user1
      shell: "kubectl --kubeconfig {{ ansible_env.HOME }}/.kube/config -n dfdemo exec admincli-0 -- maprcli acl edit -type cluster -user ad_user1:login"
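The two ACL tasks above could also be collapsed into a single loop (a sketch; the `kubectl` var just factors out the repeated prefix):

```yaml
- name: grant cluster ACLs to AD users
  shell: "{{ kubectl }} exec admincli-0 -- maprcli acl edit -type cluster -user {{ item.user }}:{{ item.perm }}"
  vars:
    kubectl: "kubectl --kubeconfig {{ ansible_env.HOME }}/.kube/config -n dfdemo"
  loop:
    - { user: ad_admin1, perm: fc }
    - { user: ad_user1, perm: login }
```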

Provide usage instructions if no/wrong parameter provided

It would be good to print usage information when the user provides a wrong (or no) parameter, e.g. in 00-run_all.sh:

#!/usr/bin/env bash

set -euo pipefail

if [[ $# -ne 1 ]] || ! echo "aws azure kvm vmware" | grep -w -q "${1}"; then
   echo "Usage: ${0} aws|azure|kvm|vmware"
   exit 1
fi

...
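An alternative shape that keeps the check reusable across the other entry scripts (a sketch, not taken from the repo):

```shell
#!/usr/bin/env bash
set -euo pipefail

usage() {
  echo "Usage: ${0} aws|azure|kvm|vmware" >&2
  return 1
}

# Validate the first CLI argument; ${1:-} avoids an unbound-variable
# error under `set -u` when no argument is given at all.
validate_target() {
  case "${1:-}" in
    aws|azure|kvm|vmware) echo "${1}" ;;
    *) usage ;;
  esac
}
```

00-run_all.sh would then start with `target="$(validate_target "${1:-}")"`, and `set -e` aborts the script when validation fails.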

UI progress reporting and continuity

Instead of showing just the output of the run(s), we should have a nicer/simpler/friendlier component to report the progress.

This could be a progress bar configured with,

  • expected completion time vs processing time, or
  • total steps vs current step (steps are not good indicators on how long the run will take)
    And should be able to pick up where it left (ie, page restart).

Parallel execution for tasks

Some tasks can run in parallel, e.g. "add workers" and "configure DF node", or "create k8scluster" and "create DF".
Currently, Samba (running in a docker container within ad_server) is started as an async task and is never checked for completion/errors. It would be nice to have a method to submit tasks in the background, and to check/wait just before the job that depends on them (create tenant should check create cluster, install mapr should check AD integration, etc.).
Ansible has an async feature, and I guess it is the best option to implement this (we need to ensure we submit and check these jobs on the same hosts, etc.).
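With Ansible's async/poll mechanism, this could look roughly like the sketch below (task and register names are made up, and the submit and status-check tasks must target the same host):

```yaml
- name: create k8scluster (submitted in background)
  shell: ./create_k8scluster.sh   # hypothetical long-running command
  async: 3600   # give it up to an hour
  poll: 0       # fire and forget
  register: k8s_job

# ... independent tasks (e.g. create DF) run in the meantime ...

- name: wait for k8scluster before creating the tenant
  async_status:
    jid: "{{ k8s_job.ansible_job_id }}"
  register: job_result
  until: job_result.finished
  retries: 120
  delay: 30
```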

ovirt Support

Should I open a new branch for ovirt and vmware, or just check the changes into the existing repo?
We can still adapt the UI for it later; what do you think?

provide password option for admin user

Could we provide an option for users to set the admin password? E.g.

{
  "aws_access_key": "",
  "aws_secret_key": "",
  "is_mlops": false,
  "is_df": false,
  "user": "",
  "project_id": "",
  "admin_password": "ChangeMe!!"
}
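If the key lands in config.json alongside the existing ones, the install scripts could pick it up like this (a sketch; python3 is used for the JSON parsing since it is already on the path for ansible, and the file name and fallback behavior are assumptions):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print admin_password from the given config file, or an empty string
# when the key is absent (the caller can then fall back to a default).
read_admin_password() {
  python3 - "${1}" <<'PY'
import json, sys

with open(sys.argv[1]) as f:
    cfg = json.load(f)
print(cfg.get("admin_password", ""))
PY
}
```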

SSL is incorrectly setup

Feedback from eng ...

Chris Snow, You did not install the environment correctly if you want to have the root CA guarantee TLS transactions....

See the install options:
           
--ssl-cert : Absolute path to the SSL certificate.       
--ssl-priv-key : Absolute path to the SSL certificate's private key.        
--ssl-ca-data : Absolute path to the SSL CA certificate data file path (optional).

We should be using ...

--ssl-cert=/etc/pki/tls/certs/cert.pem 
--ssl-priv-key=/etc/pki/tls/private/key.pem 
--ssl-ca-data=/etc/pki/tls/certs/minica.pem

# If you install ECP in this fashion, you will not see the insecure TLS connection.

See EZESC-1160 (internal Jira)

Create MLOPS SCC entry with ansible

To update the MLOPS SCC configuration:

POST /api/v2/k8scluster/{cluster_id}/kubectl

With payload data:

data = {
    "op": {kubectl_op}, // "create", "apply", "delete"
    "data": { 
      "apiVersion": "", 
      "kind": "", 
      "metadata": { 
        "namespace": "", 
        "name": "", 
        "labels":{ 
           "kubedirector.hpe.com/cmType": "source-control", 
           "createdByUser": "", 
           "createdByRole": "", 
           "parentConfiguration": "" 
         }
       }, 
      "data":{
         "sourceControlName": "",
         "type": "github | bitbucket", 
         "repoURL": "", 
         "authType": "token | password", 
         "branch": "", 
         "workingDirectory": "", 
         "proxyProtocol": "", 
         "proxyHostname": "", 
         "proxyPort": "", 
         "username": "", 
         "email": "", 
         "token": "", 
         "description": "" 
      }
   }
}

more information to follow on:

data.apiVersion
data.kind
data.metadata.labels.parentConfiguration (how do we retrieve this)?

E.g. Parent

{
  "method": "post",
  "apiurl": "https://127.0.0.1:8080",
  "timeout": 239,
  "data": {
    "kubectl_op": "create",
    "cluster_href": "/api/v2/k8scluster/1",
    "payload": {
      "apiVersion": "v1",
      "kind": "ConfigMap",
      "metadata": {
        "namespace": "k8s-tenant-1",
        "name": "abc",
        "labels": {
          "kubedirector.hpe.com/cmType": "source-control",
          "createdByUser": "6",
          "createdByRole": "Admin"
        }
      },
      "data": {
        "type": "github",
        "repoURL": "git@github.com:hpe-container-platform-community/example_active_directory_server.git",
        "authType": "token",
        "branch": "main",
        "workingDirectory": "",
        "proxyProtocol": "",
        "proxyHostname": "",
        "proxyPort": "",
        "description": ""
      }
    }
  },
  "op": "source_control_action"
}

Example child

{
  "method": "post",
  "apiurl": "https://127.0.0.1:8080",
  "timeout": 239,
  "data": {
    "kubectl_op": "create",
    "cluster_href": "/api/v2/k8scluster/1",
    "payload": {
      "apiVersion": "v1",
      "kind": "ConfigMap",
      "metadata": {
        "namespace": "k8s-tenant-1",
        "name": "mysccchild",
        "labels": {
          "kubedirector.hpe.com/cmType": "source-control",
          "createdByUser": "22",
          "createdByRole": "Member",
          "parentConfiguration": "myscc"
        }
      },
      "data": {
        "type": "github",
        "repoURL": "git@github.com:hpe-container-platform-community/example_active_directory_server.git",
        "authType": "token",
        "branch": "main",
        "workingDirectory": "",
        "proxyProtocol": "",
        "proxyHostname": "",
        "proxyPort": "",
        "username": "mygitusername",
        "email": "[email protected]",
        "token": "mygittoken",
        "description": ""
      }
    }
  },
  "op": "source_control_action"
}
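Scripting the call could look like the sketch below. Only the payload assembly executes; the request line is left as a comment because the session-header convention and login flow are assumptions about the platform's REST API, not something stated in this issue.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build the body for POST /api/v2/k8scluster/{cluster_id}/kubectl,
# mirroring the "parent" example above (values hard-coded for brevity).
build_scc_payload() {
  cat <<'EOF'
{
  "op": "create",
  "data": {
    "apiVersion": "v1",
    "kind": "ConfigMap",
    "metadata": {
      "namespace": "k8s-tenant-1",
      "name": "myscc",
      "labels": {
        "kubedirector.hpe.com/cmType": "source-control",
        "createdByUser": "6",
        "createdByRole": "Admin"
      }
    },
    "data": {
      "type": "github",
      "authType": "token",
      "branch": "main"
    }
  }
}
EOF
}

# Assumed request shape (the X-BDS-SESSION header name is an assumption):
#   curl -k -X POST "https://127.0.0.1:8080/api/v2/k8scluster/1/kubectl" \
#        -H "X-BDS-SESSION: ${SESSION}" -H "Content-Type: application/json" \
#        -d "$(build_scc_payload)"
```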

lost internet connection during "drain gpu nodes" unable to restart job

ansible output ...

TASK [drain gpu nodes] *********************************************************
failed: [localhost] (item={'changed': True, 'stdout': 'ip-10-1-0-176.eu-west-1.compute.internal', 'stderr': '', 'rc': 0, 'cmd': 'kubectl get nodes -o json | jq -r \'.items[] | select( .status.addresses[].address == "10.1.0.176") | .metadata.name\'', 'start': '2022-02-09 10:03:24.206134', 'end': '2022-02-09 10:03:25.481530', 'delta': '0:00:01.275396', 'msg': '', 'invocation': {'module_args': {'_raw_params': 'kubectl get nodes -o json | jq -r \'.items[] | select( .status.addresses[].address == "10.1.0.176") | .metadata.name\'', '_uses_shell': True, 'warn': False, 'stdin_add_newline': True, 'strip_empty_ends': True, 'argv': None, 'chdir': None, 'executable': None, 'creates': None, 'removes': None, 'stdin': None}}, 'stdout_lines': ['ip-10-1-0-176.eu-west-1.compute.internal'], 'stderr_lines': [], 'failed': False, 'item': '10.1.0.176', 'ansible_loop_var': 'item'}) => {"ansible_loop_var": "item", "changed": true, "cmd": "kubectl drain --ignore-daemonsets \"ip-10-1-0-176.eu-west-1.compute.internal\"", "delta": "0:00:01.718773", "end": "2022-02-09 10:03:27.463665", "item": {"ansible_loop_var": "item", "changed": true, "cmd": "kubectl get nodes -o json | jq -r '.items[] | select( .status.addresses[].address == \"10.1.0.176\") | .metadata.name'", "delta": "0:00:01.275396", "end": "2022-02-09 10:03:25.481530", "failed": false, "invocation": {"module_args": {"_raw_params": "kubectl get nodes -o json | jq -r '.items[] | select( .status.addresses[].address == \"10.1.0.176\") | .metadata.name'", "_uses_shell": true, "argv": null, "chdir": null, "creates": null, "executable": null, "removes": null, "stdin": null, "stdin_add_newline": true, "strip_empty_ends": true, "warn": false}}, "item": "10.1.0.176", "msg": "", "rc": 0, "start": "2022-02-09 10:03:24.206134", "stderr": "", "stderr_lines": [], "stdout": "ip-10-1-0-176.eu-west-1.compute.internal", "stdout_lines": ["ip-10-1-0-176.eu-west-1.compute.internal"]}, "msg": "non-zero return code", "rc": 1, "start": "2022-02-09 
10:03:25.744892", "stderr": "error: unable to drain node \"ip-10-1-0-176.eu-west-1.compute.internal\", aborting command...\n\nThere are pending nodes to be drained:\n ip-10-1-0-176.eu-west-1.compute.internal\nerror: cannot delete Pods with local storage (use --delete-emptydir-data to override): istio-system/grafana-784c89f4cf-rk6g4", "stderr_lines": ["error: unable to drain node \"ip-10-1-0-176.eu-west-1.compute.internal\", aborting command...", "", "There are pending nodes to be drained:", " ip-10-1-0-176.eu-west-1.compute.internal", "error: cannot delete Pods with local storage (use --delete-emptydir-data to override): istio-system/grafana-784c89f4cf-rk6g4"], "stdout": "node/ip-10-1-0-176.eu-west-1.compute.internal already cordoned", "stdout_lines": ["node/ip-10-1-0-176.eu-west-1.compute.internal already cordoned"]}

I'm wondering if it is possible to handle this issue?
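The drain itself fails because grafana uses emptyDir local storage, and kubectl's own error message names the missing flag. Adding it, plus a retry loop, would also make the task safe to re-run after a dropped connection. A sketch (the loop/register names are assumptions about the surrounding playbook):

```yaml
- name: drain gpu nodes
  shell: "kubectl drain --ignore-daemonsets --delete-emptydir-data {{ item.stdout }}"
  loop: "{{ gpu_node_names.results }}"   # hypothetical register from the lookup task
  register: drain_result
  until: drain_result is not failed
  retries: 5
  delay: 30
```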

Ability to share state between local machine and docker image

I'm wondering whether a simple fix for this requirement could be to mount two volumes and use two rsync processes.

One process would copy config files allowing users to provide only the files they need to change, e.g.

If, on the host (rsync source), you had only ./app/server/aws/config.json, rsync's default behavior would be to copy just the files present on the source without deleting the other files in the destination.

rsync could also be used to create a backup of the entire docker /app folder to the local machine?

Thoughts?

Some changes needed to 03-install.sh?

I'm not sure if these issues are because I corrupted erdincka/ezdemo:latest with the github action...

I had to create the group_vars folder:

[[ -d ./ansible/group_vars/ ]] || mkdir ./ansible/group_vars
echo "ansible_ssh_common_args: ${SSH_OPTS}" > ./ansible/group_vars/all.yml

gateway_pub_dns was throwing an error, and relative path for key was failing unless in the right directory

### TODO: Move to ansible task
SSH_CONFIG="
Host *
  StrictHostKeyChecking no
Host hpecp_gateway
  # Hostname ${gateway_pub_dns}
  Hostname ${GATW_PUB_DNS[0]}
  # IdentityFile generated/controller.prv_key
  IdentityFile /app/server/generated/controller.prv_key
...

I had to create the ~/.ssh folder:

mkdir -p ~/.ssh && chmod 700 ~/.ssh
echo "${SSH_CONFIG}" > ~/.ssh/ssh_config ## TODO: move to ansible, delete on destroy

For some reason, the ssh client wasn't using ~/.ssh/ssh_config by default; the per-user file ssh actually reads is ~/.ssh/config (ssh_config is the system-wide file under /etc/ssh), so I had to use ~/.ssh/config instead:

mkdir -p ~/.ssh && chmod 700 ~/.ssh
echo "${SSH_CONFIG}" > ~/.ssh/config ## TODO: move to ansible, delete on destroy

Do these changes make sense? Shall I add them?

Enable HA within Ansible runs

Selecting High Availability option creates 2 gateways & 3 controllers, but installation only configures 2 gateways. Need to update the process within Ansible to configure & add these 2 additional controllers into the cluster (as EPIC workers) and then enable HA.
No Cluster IP or floating VIP required in this setup.

setup gitea ... /root/.kube/config: no such file or directory

TASK [setup gitea] *************************************************************
fatal: [localhost]: FAILED! => {"changed": true, "cmd": "./setup_gitea.sh 'kubectl --kubeconfig /root/.kube/config -n k8s-tenant-1'", "delta": "0:00:00.183659", "end": "2022-01-23 18:24:44.196309", "failed_when_result": true, "msg": "non-zero return code", "rc": 1, "start": "2022-01-23 18:24:44.012650", "stderr": "error: stat /root/.kube/config: no such file or directory", "stderr_lines": ["error: stat /root/.kube/config: no such file or directory"], "stdout": "", "stdout_lines": []}

I think the /root/.kube directory needs to be created, e.g.

ansible/refresh.yml ...

...
  - name: update kubeadmin config
    shell: |-
      [[ -d ~/.kube ]] || mkdir ~/.kube
      while : ; do
        hpecp k8scluster admin_kube_config {{ item }} > ~/.kube/config
        [ $(wc -l ~/.kube/config | cut -d' ' -f1) -lt 5 ] || break
        sleep 10
      done
    with_items: "{{ cluster_ids }}"

Lookup AWS instance types

With terraform it is possible to lookup instance types. This will allow deployment to continue (possibly at a higher cost) if the preferred instance type is not supported in the selected region and availability zone. E.g.

data "aws_ec2_instance_type_offering" "example" {
  filter {
    name   = "instance-type"
    values = ["t2.micro", "t3.micro"]
  }

  preferred_instance_types = ["t3.micro", "t2.micro"]
}
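The lookup result can then feed the instance resource directly; the data source exports the first available preferred type via its `instance_type` attribute (a sketch; resource and variable names are illustrative):

```hcl
resource "aws_instance" "controller" {
  # resolves to "t3.micro" where offered, otherwise "t2.micro"
  instance_type = data.aws_ec2_instance_type_offering.example.instance_type
  ami           = var.ami_id # illustrative variable
}
```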

Source:

Also, aws_ec2_instance_types. E.g.

data "aws_ec2_instance_types" "test" {
  filter {
    name   = "auto-recovery-supported"
    values = ["true"]
  }

  filter {
    name   = "network-info.encryption-in-transit-supported"
    values = ["true"]
  }

  filter {
    name   = "instance-storage-supported"
    values = ["true"]
  }

  filter {
    name   = "instance-type"
    values = ["g5.2xlarge", "g5.4xlarge"]
  }
}

https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/ec2_instance_types
