
cloudera-deploy's Introduction

cloudera-deploy - Automation Quickstarts and Examples for the Cloudera Data Platform (CDP)

cloudera-deploy is a rich set of examples and quickstart projects for deploying and managing the Cloudera Data Platform (CDP). Its scope includes CDP Public Cloud, Private Cloud, and Data Services, covering the software lifecycle of these platforms and of the applications that run on and with them.

You can use the definitions and projects in cloudera-deploy as your entrypoint for getting started with CDP. These resources use straightforward configurations and playbooks to drive the automation, yet each is extensible and highly configurable.

cloudera-deploy is designed not only to get you up and running quickly with CDP, but also to showcase the underlying toolsets and libraries. These projects demonstrate what you can build and lay a great foundation for your own entrypoints, CI/CD pipelines, integrations, and general platform and application operations.

Quickstart

The definitions and projects in cloudera-deploy are designed to run with ansible-navigator and other Execution Environment-based tools.

Follow these steps to get started:

  1. Install ansible-navigator
  2. Check your requirements
  3. Select and configure your project
  4. Set your credentials
  5. Run your project

If you need help, check out the Frequently Asked Questions, the FAQ for cldr-runner, and drop by the Discussions > Help board.

Catalog

The catalog of projects, examples, and definitions currently covers CDP Public Cloud for AWS. CDP Private Cloud and individual Data Services, Public and Private, as well as Public Cloud deployments to Azure and Google Cloud, are coming soon.

| Project | Platform | CSP | Description |
| --- | --- | --- | --- |
| datalake | public cloud | AWS | Constructs a CDP Public Cloud Environment and Datalake. Generates via Ansible the AWS infrastructure and CDP artifacts, including SSH key, cross-account credentials, S3 buckets, etc. |
| datalake-tf | public cloud | AWS | Constructs a CDP Public Cloud Environment and Datalake. Uses the terraform-cdp-modules, called via Ansible, to generate the prerequisite AWS infrastructure resources and the CDP artifacts. |
| cde | public cloud | AWS | Constructs a set of Cloudera Data Engineering (CDE) workspaces within their own CDP Public Cloud Environment and Datalake. Generates via Ansible the AWS infrastructure and CDP artifacts, including SSH key, cross-account credentials, S3 buckets, etc. |
| cdf | public cloud | AWS | Constructs a set of Cloudera Data Flow (CDF) workspaces and data hubs within their own CDP Public Cloud Environment and Datalake. Generates via Ansible the AWS infrastructure and CDP artifacts, including SSH key, cross-account credentials, S3 buckets, etc. |
| cml | public cloud | AWS | Constructs a set of Cloudera Machine Learning (CML) workspaces within their own CDP Public Cloud Environment and Datalake. Generates via Ansible the AWS infrastructure and CDP artifacts, including SSH key, cross-account credentials, S3 buckets, etc. |
| base | private cloud | AWS IaaS | Constructs a CDP Private Cloud Base cluster running on AWS IaaS. Uses Terraform to generate the AWS infrastructure and deploys to an SSH-proxied private cluster. |

Roadmap

If you want to see what we are working on or have pending, check out:

Are we missing something? Let us know by creating a new issue or posting a new idea!

Contributions

For more information on how to get involved with the cloudera-deploy project, head over to CONTRIBUTING.md.

Requirements

cloudera-deploy itself is not an application; rather, its projects and examples expect to run within an execution environment called cldr-runner. This execution environment is typically a container that encapsulates the runtimes, libraries, Python and system dependencies, and general configurations needed to run an Ansible- and Terraform-enabled project.

Note

It is worth pointing out that you don't have to use a container, but setting up a local execution environment is out of scope for cloudera-deploy; the projects in cloudera-deploy will run in any execution environment, for example AWX or Red Hat Ansible Automation Platform (AAP). If you want to learn more about setting up a local execution environment, head over to cloudera-labs/cldr-runner.

The cloudera-deploy projects and their playbooks are built with the automation resources provided by cldr-runner, notably, but not exclusively:

Besides these resources within cldr-runner, cloudera-deploy projects generally will need one or more of the following credentials:

CDP Public Cloud

For CDP Public Cloud, you will need an Access Key and Secret set in your user profile. The underlying automation libraries use your default profile unless you instruct them otherwise. See Configuring CDP client with the API access key for further details.
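
The underlying CDP CLI and libraries read this key pair from the shared credentials file at ~/.cdp/credentials (the same path referenced by the teardown error in the Issues section below). A minimal sketch of that file, with placeholder values:

[default]
cdp_access_key_id = <your-access-key-id>
cdp_private_key = <your-private-key>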

Cloud Providers

For Azure and AWS infrastructure, the process is similar to CDP Public Cloud, and these parameters may likewise be overridden.

For Google Cloud, we suggest you issue a credentials file, store it securely in your profile, and then reference that file as needed by a project's configuration, as this works best with both CLI and Ansible Gcloud interactions.
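
As a sketch, you could issue and store such a key with the gcloud CLI; the service account and file path here are hypothetical placeholders:

gcloud iam service-accounts keys create ~/.config/gcloud/provisioner-key.json \
  --iam-account=provisioner@my-project.iam.gserviceaccount.com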

CDP Private Cloud

For CDP Private Cloud you will need a valid Cloudera license file in order to download the software from the Cloudera repositories. We suggest you store this file in your user profile in ~/.cdp/ and reference that file as needed by a project's configuration.
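
For example (the downloaded license filename here is hypothetical):

mkdir -p ~/.cdp
cp ~/Downloads/cloudera_license.txt ~/.cdp/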

If you are also using Public Cloud infrastructure to host your CDP Private Cloud clusters, then you will need those credentials as well.

Installation and Usage

To use the projects in cloudera-deploy, you need to first set up ansible-navigator.

Important

Please note each OS has slightly different requirements for installing ansible-navigator. 🥴 Read more about installing ansible-navigator.

  1. Create and activate a new Python virtualenv.

    You can name your virtual environment anything you want; by convention, we like to call it cdp-navigator.

    # Note! You will need Python 3.9 or higher!
    python3.9 -m venv ~/cdp-navigator; source ~/cdp-navigator/bin/activate;

    This step is optional, but highly recommended.

  2. Install the latest ansible-core and ansible-navigator.

    These tools can be the latest versions, as the actual execution versions are encapsulated in the execution environment container.

     pip install ansible-core ansible-navigator
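
    You can then confirm that both tools are available before continuing:

     ansible-navigator --version
     ansible --version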

Note

Further details can be found in the NAVIGATOR document in cloudera-labs/cldr-runner.

Warning

On OSX, avoid using the stock Python executable with ansible-navigator; users report that the curses library in the stock installation throws a segfault. You might want to install another version of Python, for example via brew.

Then, clone this project.

git clone https://github.com/cloudera-labs/cloudera-deploy.git; cd cloudera-deploy;

Execution Engine

ansible-navigator can use either docker or podman. Either way, you will need a container runtime on your host.

Confirm your Docker service

Check that docker is available by running the following command to list any active Docker containers.

docker ps -a

If it is not running, please check the Docker prerequisites to install, start, and test the service.
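
If you are using podman instead, the equivalent check is:

podman ps -a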

Credentials

To check that your various credentials are available and valid, that is, that they match the expected accounts, you can run ansible-navigator within your project and compare the user and account IDs it reports with those shown in the browser UI of the associated service.

Important

All of the instructions below assume that your project is using the correct CSP-flavored image of cldr-runner. If in doubt, you can use the full image which has all supported CSP resources.

Warning

Be sure you are within a project directory that has an ansible-navigator.yml configuration file that uses the cldr-runner image!
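
A minimal sketch of such a configuration, assuming the full cldr-runner image (each project ships its own ansible-navigator.yml, so treat this as illustrative):

ansible-navigator:
  execution-environment:
    image: ghcr.io/cloudera-labs/cldr-runner:full-latest
    pull:
      policy: missing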

CDP Public Cloud

ansible-navigator exec -- cdp iam get-user

Note

If you do not yet have a CDP Public Cloud credential, follow these instructions on the Cloudera website.

See CDP CLI for further details.

AWS

ansible-navigator exec -- aws iam get-user

See AWS account requirements for further details.

Azure

ansible-navigator exec -- az account list

Note

If you cannot list your Azure accounts, consider using az login to refresh your local, i.e. host, credential.

See Azure subscription requirements for further details.

GCP

ansible-navigator exec -- gcloud auth list

Note

You need a provisioning Service Account for GCP setup (typically referenced by the gcloud_credential_file entry). If you do not yet have a Provisioning Service Account you can learn more on the Cloudera website.

See GCP requirements for further details.

Execution

All of the definitions and projects in cloudera-deploy are designed to work with ansible-navigator. Each project has discrete instructions on what and how to run, but in general, you will end up executing some form of the ansible-navigator run subcommand, like:

ansible-navigator run main.yml -e @config.yml -t plat

Occasionally, the instructions may ask you to run an individual module, such as ansible-navigator exec -- ansible some_group -m ping. You can learn more about the available subcommands on the ansible-navigator website.

Note

If you want to check out what's in the container, or use the container directly, run ansible-navigator exec -- /bin/bash!

Logs

The projects are configured to log their activities. In each, you will find a runs/ directory that houses all of the runtime artifacts of ansible-navigator and ansible-runner (the Ansible application and interface that does the actual Ansible command dispatching).

The log files are structured (JSON) and are indexed by playbook and timestamp. To review, or rather replay, a run, load its artifact into ansible-navigator:

ansible-navigator replay <playbook execution run file>.json

Upgrades

The cldr-runner image updates fairly often to include the latest libraries, new features and fixes. Depending on how ansible-navigator is configured (see the ansible-navigator.yml file), the application will check for an updated container image only if it is missing.

You can easily change this behavior by updating the ansible-navigator.yml configuration in your project:

ansible-navigator:
  execution-environment:
    pull:
      policy: always

Or use the CLI flags --pp or --pull-policy and set the value to always.
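
For example:

ansible-navigator run main.yml -e @config.yml --pull-policy always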

You can read more about updating this configuration on the ansible-navigator website.

Troubleshooting

If you need help, here are some resources:

Be sure to stop by the Discussions > Help board!

License and Copyright

Copyright 2023, Cloudera, Inc.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.


cloudera-deploy's Issues

Cannot change configuration parameter on deployed cluster

After deployment of the CDP private cluster, I changed definition.yml, for example HDFS failed volumes tolerated, like:

configs:
      HDFS:
        DATANODE:
          dfs_datanode_failed_volumes_tolerated: 3

When running the playbook again, no parameters are changed.

Trying to install CDP private cloud, but still need AWS credentials

Tried to install CDP private cloud. I updated the inventory file to include all my hosts, but during the deployment it still tries to connect to AWS and complains it does not have the appropriate credentials:

TASK [cloudera_deploy : Get AWS Account Info] ****************************************************************************************************************
Thursday 20 May 2021 22:45:31 +0000 (0:00:00.036) 0:00:38.058 **********
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: botocore.exceptions.NoCredentialsError: Unable to locate credentials
fatal: [localhost]: FAILED! => {"boto3_version": "1.17.66", "botocore_version": "1.20.66", "changed": false, "msg": "Failed to retrieve caller identity: Unable to locate credentials"}

How do I tell it this is a private cloud installation that has nothing to do with AWS?

Can't detect the required Python library cryptography (>= 1.2.3)

While testing the deployment in the internal (cloudcat) and external environment - ran into the following issue:

TASK [cloudera.cluster.ca_server : Generate root private key] ****************************************************************************************
Friday 28 January 2022  00:32:45 +0000 (0:00:00.854)       0:08:16.774 ********
fatal: [cla-tt-2a-mas1.clatest.telstraglobal.net]: FAILED! => {"changed": false, "msg": "Can't detect the required Python library cryptography (>= 1.2.3)"}

Check inside of the Docker:

cldr full-v1.5.3 #> pip show cryptography
Name: cryptography
Version: 3.3.2
Summary: cryptography is a package which provides cryptographic recipes and primitives to Python developers.
Home-page: https://github.com/pyca/cryptography
Author: The cryptography developers
Author-email: [email protected]
License: BSD or Apache License, Version 2.0
Location: /usr/local/lib64/python3.8/site-packages
Requires: cffi, six
Required-by: adal, ansible-base, azure-cli-core, azure-identity, azure-keyvault, azure-storage, msal, openstacksdk, paramiko, pyOpenSSL, pypsrp, pyspnego, requests-credssp, requests-ntlm

Facts distribution does not work with inline vaulted variables

Error message:
The full traceback is:

Traceback (most recent call last):
  File "/home/centos/.local/lib/python3.6/site-packages/ansible/executor/task_executor.py", line 585, in _execute
    self._task.post_validate(templar=templar)
  File "/home/centos/.local/lib/python3.6/site-packages/ansible/playbook/task.py", line 307, in post_validate
    super(Task, self).post_validate(templar)
  File "/home/centos/.local/lib/python3.6/site-packages/ansible/playbook/base.py", line 431, in post_validate
    value = templar.template(getattr(self, name))
  File "/home/centos/.local/lib/python3.6/site-packages/ansible/template/__init__.py", line 840, in template
    disable_lookups=disable_lookups,
  File "/home/centos/.local/lib/python3.6/site-packages/ansible/template/__init__.py", line 795, in template
    disable_lookups=disable_lookups,
  File "/home/centos/.local/lib/python3.6/site-packages/ansible/template/__init__.py", line 1057, in do_template
    res = j2_concat(rf)
  File "<template>", line 14, in root
  File "/home/centos/.local/lib/python3.6/site-packages/ansible/template/__init__.py", line 255, in wrapper
    ret = func(*args, **kwargs)
  File "/home/centos/.local/lib/python3.6/site-packages/ansible/plugins/filter/core.py", line 209, in from_yaml
    return yaml.safe_load(data)
  File "/home/centos/.local/lib/python3.6/site-packages/yaml/__init__.py", line 162, in safe_load
    return load(stream, SafeLoader)
  File "/home/centos/.local/lib/python3.6/site-packages/yaml/__init__.py", line 114, in load
    return loader.get_single_data()
  File "/home/centos/.local/lib/python3.6/site-packages/yaml/constructor.py", line 51, in get_single_data
    return self.construct_document(node)
  File "/home/centos/.local/lib/python3.6/site-packages/yaml/constructor.py", line 60, in construct_document
    for dummy in generator:
  File "/home/centos/.local/lib/python3.6/site-packages/yaml/constructor.py", line 413, in construct_yaml_map
    value = self.construct_mapping(node)
  File "/home/centos/.local/lib/python3.6/site-packages/yaml/constructor.py", line 218, in construct_mapping
    return super().construct_mapping(node, deep=deep)
  File "/home/centos/.local/lib/python3.6/site-packages/yaml/constructor.py", line 143, in construct_mapping
    value = self.construct_object(value_node, deep=deep)
  File "/home/centos/.local/lib/python3.6/site-packages/yaml/constructor.py", line 100, in construct_object
    data = constructor(self, node)
  File "/home/centos/.local/lib/python3.6/site-packages/yaml/constructor.py", line 429, in construct_undefined
    node.start_mark)
yaml.constructor.ConstructorError: could not determine a constructor for the tag '!vault' in "<unicode string>", line 395, column 26: krb5_kdc_admin_password: !vault |

[Feature Req] Tool/Script to generate "cluster.yml" configs from existing exported CDP cluster template json files

Is there a simpler strategy that I'm missing for preparing a new "cluster.yml" file, required by this playbook (to be configured in the definition_path), from an exportable CDP (7.1.x / private-cloud) cluster?
Would be great to get input from the Cloudera folks :)
Alternatively, or in addition, it would be highly useful if a much more advanced "cluster.yml" example were added to the repo (for example, to deploy a 3-master-node HA cluster, as was nicely done in the former HDP repo: https://github.com/hortonworks/ansible-hortonworks/blob/master/playbooks/group_vars/example-hdp-ha-3-masters-with-ranger-atlas)

If I learn from the community that there's indeed no such existing script, I'm happy to write and contribute something myself.

Minimal features:

  • create the mapping of the contained services to the host-groups
  • create the mapping of all the found "configs" elements to the key/value pairs in the "cluster.yml" service's "dict" element
    • (later) nice to have: an option to skip any config values which are/were just defaults from the beginning
  • many other things that are required to make it usable and work
  • handling the "refName" values in the source template json

Script Input / Output examples

  • Input file, just small extract (from exported cluster template)
    • Can give more info later on how to do that, for people not familiar.
{
  "cdhVersion" : "7.1.4",
  "displayName" : "Basic Cluster",
  "cmVersion" : "7.1.4",
  "repositories" : [ ... ],
  "products" : [ {
    "version" : "7.1.4-1.cdh7.1.4.p0.6300266",
    "product" : "CDH"
  } ],
  "services" : [ {
    "refName" : "zookeeper",
    "serviceType" : "ZOOKEEPER",
    "serviceConfigs" : [ {
      "name" : "zookeeper_datadir_autocreate",
      "value" : "true"
    } ],
    "roleConfigGroups" : [ {
      "refName" : "zookeeper-SERVER-BASE",
      "roleType" : "SERVER",
      "configs" : [ {
        "name" : "zk_server_log_dir",
        "value" : "/var/log/zookeeper"
      }, {
        "name" : "dataDir",
        "variable" : "zookeeper-SERVER-BASE-dataDir"
      }, {
        "name" : "dataLogDir",
        "variable" : "zookeeper-SERVER-BASE-dataLogDir"
      } ],
      "base" : true
    } ]
  }, 
...

  } ],
  "hostTemplates" : [ {
    "refName" : "HostTemplate-0-from-eval-cdp-public[1-3].internal.cloudapp.net",
    "cardinality" : 3,
    "roleConfigGroupsRefNames" : [ "hdfs-DATANODE-BASE", "spark_on_yarn-GATEWAY-BASE", "yarn-NODEMANAGER-BASE" ]
  }, {
    "refName" : "HostTemplate-1-from-eval-cdp-public0.internal.cloudapp.net",
    "cardinality" : 1,
    "roleConfigGroupsRefNames" : [ "hdfs-NAMENODE-BASE", "hdfs-SECONDARYNAMENODE-BASE", "spark_on_yarn-GATEWAY-BASE", "spark_on_yarn-SPARK_YARN_HISTORY_SERVER-BASE", "yarn-JOBHISTORY-BASE", "yarn-RESOURCEMANAGER-BASE", "zookeeper-SERVER-BASE" ]
  } ],
...

Output file, following the format of cluster.yml, for example roles/cloudera_deploy/defaults/basic_cluster.yml:

clusters:
  - name: Basic Cluster
    services: [HDFS, YARN, ZOOKEEPER]
    repositories:
      - https://archive.cloudera.com/cdh7/7.1.4.0/parcels/
    configs:
      ZOOKEEPER:
        SERVICEWIDE:
          zookeeper_datadir_autocreate: true
          zk_server_log_dir: "/var/log/zookeeper"
     
      HDFS:
        DATANODE:
          dfs_data_dir_list: /dfs/dn
        NAMENODE:
          dfs_name_dir_list: /dfs/nn
...
    host_templates:
      Master1:
        HDFS: [NAMENODE, SECONDARYNAMENODE]
        YARN: [RESOURCEMANAGER, JOBHISTORY]
        ZOOKEEPER: [SERVER]
      Workers:
        HDFS: [DATANODE]
        YARN: [NODEMANAGER]

keytool error: java.lang.Exception: Public keys in reply and keystore don't match

Trying to deploy CDP private cluster with kerberos, ranger and autotls.

playbook execution command:

ansible-playbook /runner/project/cloudera-deploy/main.yml -e "definition_path=/runner/project/cloudera-deploy/examples/sandbox" -e "profile=/home/runner/.config/cloudera-deploy/profiles/default" -t default_cluster,kerberos,tls  -i "/runner/project/cloudera-deploy/examples/sandbox/inventory_static.ini" --flush-cache

After execution, playbook fails on the task:

TASK [cloudera.cluster.tls_install_certs : Install signed certificate reply into keystore] ***
task path: /opt/cldr-runner/collections/ansible_collections/cloudera/cluster/roles/security/tls_install_certs/tasks/main.yml:126

with error below (on each node)

fatal: [node1.domain.com]: FAILED! => {"changed": false, "cmd": "/usr/bin/keytool -importcert -alias \"node1.domain.com\" -file \"/opt/cloudera/security/pki/node1.domain.com.pem\" -keystore \"/opt/cloudera/security/pki/node1.domain.com.jks\" -storepass \"changeme\" -trustcacerts -noprompt\n", "delta": "0:00:00.247693", "end": "2023-01-09 13:27:30.366003", "failed_when_result": true, "msg": "non-zero return code", "rc": 1, "start": "2023-01-09 13:27:30.118310", "stderr": "", "stderr_lines": [], "stdout": "keytool error: java.lang.Exception: Public keys in reply and keystore don't match", "stdout_lines": ["keytool error: java.lang.Exception: Public keys in reply and keystore don't match"]}

Any idea why this is happening?

I have tried to import certs manually via

/usr/bin/keytool -importcert -alias node1.domain.com -file /opt/cloudera/security/pki/node1.domain.com.pem -keystore /opt/cloudera/security/pki/node1.domain.com.jks -trustcacerts -noprompt

And the cert was added successfully...

Deployment fails with dynamic inventory and absolute definition paths

There appears to be a bug where, if you are using dynamic inventory and an absolute path to your definition, it fails to use the default inventory path.
e.g. ansible-playbook /opt/cloudera-deploy/main.yml -e "definition_path=/opt/cloudera-deploy/examples/c5secure" -t infra,full_cluster

Workaround is to pass in the correct inventory path with -i /runner/inventory
e.g.
ansible-playbook /opt/cloudera-deploy/main.yml -e "definition_path=/opt/cloudera-deploy/examples/c5secure" -i /runner/inventory -t infra,full_cluster

"unknown flag: --mount" in quickstart.sh

As the pre-req to run cloudera-deploy, I have installed the default docker version on CentOS 7.9 as follows:
yum install docker

That's the version that was installed:

# docker --version
Docker version 1.13.1, build 7d71120/1.13.1

Now, while trying to run quickstart.sh, I got the following error:

./quickstart.sh
Checking if Docker is running...
Docker OK
Trying to pull repository ghcr.io/cloudera-labs/cldr-runner ...
full-latest: Pulling from ghcr.io/cloudera-labs/cldr-runner
Digest: sha256:01504a335c7fe1c29ba695ca996b464be28925a545f7f2b5bb1c1624e145e208
Status: Image is up to date for ghcr.io/cloudera-labs/cldr-runner:full-latest
Ensuring default credential paths are available in calling using profile for mounting to execution environment
Ensure Default profile is present
Custom Cloudera Collection path not found
Mounting /home/ansible to container as Project Directory /runner/project
Creating Container cloudera-deploy from image ghcr.io/cloudera-labs/cldr-runner:full-latest
Checking OS
SSH authentication for container taken from /tmp/ssh-bAkDNAJcUgfV/agent.9389
Creating new execution container named 'cloudera-deploy'
unknown flag: --mount
See 'docker run --help'.

The above version does not recognise the "--mount" flag.
It looks like "--mount" was introduced only in Docker 17.05
https://docs.docker.com/engine/release-notes/17.05/#client

Broken link

The link to CDP CLI in the readme page is broken. This is the text:
"Visit the CDP CLI User Guide for further details regarding credential management." The link to the user guide is broken.

Cannot run ansible-navigator using Python3.8 on osx

Tested on Mac OSX...

ansible-navigator using a Python 3.8 env is unable to parse ansible-navigator.yml files such as those in our public-cloud/aws/ examples.

When used with a Python 3.11 env, it was able to use the ansible-navigator.yml settings file.

I did not test any other versions of Python.

SSH_AUTH_SOCK not set on Windows

When running the quickstart.sh script on an Ubuntu WSL session on Windows, the SSH_AUTH_SOCK variable isn't set as the ssh-agent isn't started by default.

$ ./quickstart.sh
Checking if Docker is running...
Docker OK
full-latest: Pulling from cloudera-labs/cldr-runner
Digest: sha256:15442500076f42918fd82f5f94cf0aaf4564aa235bd66b47edb2ec052e099e59
Status: Image is up to date for ghcr.io/cloudera-labs/cldr-runner:full-latest
ghcr.io/cloudera-labs/cldr-runner:full-latest
Ensuring default credential paths are available in calling using profile for mounting to execution environment
Ensure Default profile is present
Custom Cloudera Collection path not found
Mounting /mnt/c/Users/jeff/tmp to container as Project Directory /runner/project
Creating Container cloudera-deploy from image ghcr.io/cloudera-labs/cldr-runner:full-latest
Checking OS
SSH_AUTH_SOCK is empty or not set, unable to proceed. Exiting

One possible fix would be to add a line to the script to check and start the ssh-agent. But running ssh-agent directly doesn't set the SSH_AUTH_SOCK variable as it needs to be wrapped in an eval.

Would it be possible to add something like the following to the quickstart.sh script?

if pgrep -x "ssh-agent" >/dev/null
then
    echo "ssh-agent is running"
else
    echo "ssh-agent stopped"
    eval `ssh-agent -s` 
fi

I tried adding it to the start of the quickstart.sh script and it worked fine.

How to set proxy in the definition.yml ?

Hello, I am deploying the CDP private Basic Cluster via the definition.yml file.
I have managed to add kerberos by adding the following code as a parameter in the basic cluster:

 security:
        kerberos: true

I am looking for a way to set proxy parameters in definition.yml.
I think it should look like this:

configs:
    parcel_proxy_port: 1234
    parcel_proxy_server: my_beautiful_proxy.com

But in the configs section of the cluster it expects a service name; I need to specify somehow that it's CM, I suppose.

I have tried to add it into mgmt as well, but it does not recognize those options.

SSH_AUTH_SOCK across OSes

Make the SSH_AUTH_SOCK implementation OS-agnostic.

The current implementation of SSH_AUTH_SOCK is OS-specific, specifically OSX-specific.
The line below hard-codes a target path that works only on OSX.

--mount type=bind,src=$SSH_AUTH_SOCK,target=/run/host-services/ssh-auth.sock \

Change this to make it work for all OSes, specifically Linux.

Centos7-init fails in various ways in some circumstances

pip3 install ansible

  • Fails when Ansible is already installed on the system as a package
  • Should also be pinned to >=2.10.0,<=2.11

tee -a ansible.cfg << EOF

  • if the script is run multiple times, this will be continuously concatenated to the file

inventory=inventory

Unqualified properties causing execution to fail

It seems that commit 0526f52 introduced some unqualified variables that are causing the execution to fail with the following error:

TASK [cloudera_deploy : Check Supplied terraform_base_dir variable] ************
task path: /runner/project/cloudera-deploy/roles/cloudera_deploy/tasks/init.yml:232

fatal: [localhost]: FAILED! => {
    "msg": "The conditional check 'infra_deployment_engine == 'terraform'' failed. The error was: error while evaluating conditional (infra_deployment_engine == 'terraform'): 'infra_deployment_engine' is undefined\n\nThe error appears to be in '/runner/project/cloudera-deploy/roles/cloudera_deploy/tasks/init.yml': line 232, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Check Supplied terraform_base_dir variable\n  ^ here\n"
}

Password Requirements: CDP

CDP password requirements are not checked or enforced upfront.

Quickstart depends on the cdp cli for creating a CDP environment, which requires a specific password standard. This is not enforced or checked upfront, so the run fails later if an incorrect password is set.

The CDP password requirements must be checked upfront before cluster creation.

ec2 instance types check fails if aws cli not set to json

The call which checks which EC2 instance types are available relies on the AWS CLI, as there isn't an Ansible collection call for it. The task which parses the output assumes it will be JSON, and it fails if the user has set a non-JSON output format in their AWS profile.
The failing task is cloudera.exe.infrastructure: Check required AWS EC2 Instance Types.

Running playbook in DEBUG fails on localhost ssh connection

I tried to run the playbook in DEBUG mode and found that a local ssh connection failed:
[email protected]: Permission denied (publickey,password)

Turned out that in cloudera-deploy ~/.ssh is mounted under /home/runner/.ssh and HOME is set to /home/runner… but in debug mode some bits still depend on /root/.ssh/.
Copying content from /home/runner/.ssh/ to /root/.ssh/ solved these issues.
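
A one-line form of that workaround, run inside the container (illustrative):

cp -r /home/runner/.ssh/. /root/.ssh/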

TASK [cloudera.cluster.repometa : Download parcel manifest information url={{ repository | regex_replace('/?$','') + '/manifest.json' }}, status_code=200, body_format=json, return_content=True, url_username={{ parcel_repo_username | default(omit) }}, url_password={{ parcel_repo_password | default(omit) }}] ***
task path: /opt/cldr-runner/collections/ansible_collections/cloudera/cluster/roles/deployment/repometa/tasks/parcels.yml:17
Monday 31 May 2021  07:32:09 +0000 (0:00:00.111)       0:00:24.509 ************
 11741 1622446329.37654: sending task start callback
 11741 1622446329.37659: entering _queue_task() for localhost/uri
...
<127.0.0.1> SSH: EXEC ssh -vvv -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 -o ControlPath=/home/runner/.ansible/cp/21f0e6a9ae 127.0.0.1 '/bin/sh -c '"'"'echo ~root && sleep 0'"'"''
 12412 1622446329.49261: stderr chunk (state=2):
>>>OpenSSH_8.0p1, OpenSSL 1.1.1g FIPS  21 Apr 2020
...
 12412 1622446345.11344: stderr chunk (state=3):
>>>debug3: authmethod_lookup publickey
debug3: remaining preferred: ,gssapi-keyex,hostbased,publickey
debug3: authmethod_is_enabled publickey
debug1: Next authentication method: publickey
debug1: Trying private key: /root/.ssh/id_rsa
debug3: no such identity: /root/.ssh/id_rsa: No such file or directory
...
[email protected]: Permission denied (publickey,password).

Unable to create CDP environment

Hi,
I am getting the following error while executing /opt/cldr-runner/collections/ansible_collections/cloudera/cloud/plugins/modules/env.py
from /opt/cldr-runner/collections/ansible_collections/cloudera/exe/roles/platform/tasks/setup_aws_env.yml

Below is the error message:
Monday 25 October 2021 17:08:40 +0000 (0:00:02.728) 0:01:33.504 ********
ok: [localhost] => {
"msg": {
"changed": false,
"exception": "Traceback (most recent call last):\n File "/root/.ansible/tmp/ansible-tmp-1635181717.7778952-26659-167799228659899/AnsiballZ_env.py", line 102, in \n _ansiballz_main()\n File "/root/.ansible/tmp/ansible-tmp-1635181717.7778952-26659-167799228659899/AnsiballZ_env.py", line 94, in _ansiballz_main\n invoke_module(zipped_mod, temp_path, ANSIBALLZ_PARAMS)\n File "/root/.ansible/tmp/ansible-tmp-1635181717.7778952-26659-167799228659899/AnsiballZ_env.py", line 40, in invoke_module\n runpy.run_module(mod_name='ansible_collections.cloudera.cloud.plugins.modules.env', init_globals=None, run_name='main', alter_sys=True)\n File "/usr/lib64/python3.8/runpy.py", line 207, in run_module\n return _run_module_code(code, init_globals, run_name, mod_spec)\n File "/usr/lib64/python3.8/runpy.py", line 97, in _run_module_code\n _run_code(code, mod_globals, init_globals,\n File "/usr/lib64/python3.8/runpy.py", line 87, in _run_code\n exec(code, run_globals)\n File "/tmp/ansible_cloudera.cloud.env_payload_69lmx119/ansible_cloudera.cloud.env_payload.zip/ansible_collections/cloudera/cloud/plugins/modules/env.py", line 1055, in \n File "/tmp/ansible_cloudera.cloud.env_payload_69lmx119/ansible_cloudera.cloud.env_payload.zip/ansible_collections/cloudera/cloud/plugins/modules/env.py", line 1045, in main\n File "/tmp/ansible_cloudera.cloud.env_payload_69lmx119/ansible_cloudera.cloud.env_payload.zip/ansible_collections/cloudera/cloud/plugins/modules/env.py", line 662, in init\n File "/tmp/ansible_cloudera.cloud.env_payload_69lmx119/ansible_cloudera.cloud.env_payload.zip/ansible_collections/cloudera/cloud/plugins/module_utils/cdp_common.py", line 42, in _impl\n File "/tmp/ansible_cloudera.cloud.env_payload_69lmx119/ansible_cloudera.cloud.env_payload.zip/ansible_collections/cloudera/cloud/plugins/modules/env.py", line 687, in process\n File "/tmp/ansible_cloudera.cloud.env_payload_69lmx119/ansible_cloudera.cloud.env_payload.zip/ansible_collections/cloudera/cloud/plugins/modules/env.py", line 926, in _reconcile_existing_state\nKeyError: 'logStorage'\n",
"failed": true,
"module_stderr": "Traceback (most recent call last):\n File "/root/.ansible/tmp/ansible-tmp-1635181717.7778952-26659-167799228659899/AnsiballZ_env.py", line 102, in \n _ansiballz_main()\n File "/root/.ansible/tmp/ansible-tmp-1635181717.7778952-26659-167799228659899/AnsiballZ_env.py", line 94, in _ansiballz_main\n invoke_module(zipped_mod, temp_path, ANSIBALLZ_PARAMS)\n File "/root/.ansible/tmp/ansible-tmp-1635181717.7778952-26659-167799228659899/AnsiballZ_env.py", line 40, in invoke_module\n runpy.run_module(mod_name='ansible_collections.cloudera.cloud.plugins.modules.env', init_globals=None, run_name='main', alter_sys=True)\n File "/usr/lib64/python3.8/runpy.py", line 207, in run_module\n return _run_module_code(code, init_globals, run_name, mod_spec)\n File "/usr/lib64/python3.8/runpy.py", line 97, in _run_module_code\n _run_code(code, mod_globals, init_globals,\n File "/usr/lib64/python3.8/runpy.py", line 87, in _run_code\n exec(code, run_globals)\n File "/tmp/ansible_cloudera.cloud.env_payload_69lmx119/ansible_cloudera.cloud.env_payload.zip/ansible_collections/cloudera/cloud/plugins/modules/env.py", line 1055, in \n File "/tmp/ansible_cloudera.cloud.env_payload_69lmx119/ansible_cloudera.cloud.env_payload.zip/ansible_collections/cloudera/cloud/plugins/modules/env.py", line 1045, in main\n File "/tmp/ansible_cloudera.cloud.env_payload_69lmx119/ansible_cloudera.cloud.env_payload.zip/ansible_collections/cloudera/cloud/plugins/modules/env.py", line 662, in init\n File "/tmp/ansible_cloudera.cloud.env_payload_69lmx119/ansible_cloudera.cloud.env_payload.zip/ansible_collections/cloudera/cloud/plugins/module_utils/cdp_common.py", line 42, in _impl\n File "/tmp/ansible_cloudera.cloud.env_payload_69lmx119/ansible_cloudera.cloud.env_payload.zip/ansible_collections/cloudera/cloud/plugins/modules/env.py", line 687, in process\n File "/tmp/ansible_cloudera.cloud.env_payload_69lmx119/ansible_cloudera.cloud.env_payload.zip/ansible_collections/cloudera/cloud/plugins/modules/env.py", line 926, in _reconcile_existing_state\nKeyError: 'logStorage'\n",
"module_stdout": "{'environmentName': 'arcp-aw-env', 'crn': 'crn:cdp:environments:us-west-1:a0ec84c6-fee6-4e9c-acdc-68e1f49a5184:environment:714b0df3-e459-40ba-b722-837018456722', 'status': 'CREATE_FAILED', 'region': 'us-east-1', 'cloudPlatform': 'AWS', 'credentialName': 'arcp-aw-xaccount-cred', 'created': datetime.datetime(2021, 10, 14, 6, 53, 1, 412000, tzinfo=tzlocal())}\nsdf\nexisting\n{'environmentName': 'arcp-aw-env', 'crn': 'crn:cdp:environments:', 'status': 'CREATE_FAILED', 'region': 'us-east-1', 'cloudPlatform': 'AWS', 'credentialName': 'arcp-aw-xaccount-cred', 'created': datetime.datetime(2021, 10, 14, 6, 53, 1, 412000, tzinfo=tzlocal())}\narn:aws:iam:::instance-profile/arcp-logs-role\n",
"msg": "MODULE FAILURE\nSee stdout/stderr for the exact error",
"rc": 1
}
}

Broken Link

The document link in the cloudera deploy instruction page, in the location:

If you do not have CDP user click here , produces a broken link : This is the link used

EZMode Documentation

We require better, more prescriptive steps that can be cut and pasted without having to read through everything.

CDP private teardown asks for credentials

I was able to deploy CDP private without any credentials; only the CDP license was used.
I am trying to tear down our deployed cluster via tags -t teardown,all. However, it fails with the missing credentials error below.

TASK [cloudera.exe.runtime : Refresh Environment Info with Descendants] ****************************************************************************************************
task path: /opt/cldr-runner/collections/ansible_collections/cloudera/exe/roles/runtime/tasks/initialize_teardown.yml:17
Friday 11 November 2022  13:39:06 +0000 (0:00:00.069)       0:00:08.557 *******
fatal: [localhost]: FAILED! => {"changed": false, "error": "{'base_error': NoCredentialsError('Unable to locate CDP credentials: No credentials found anywhere in chain. The shared credentials file should be stored at /home/runner/.cdp/credentials.'), 'ext_traceback': ['  File \"/root/.ansible/tmp/ansible-tmp-1668173946.776787-24441-170028905131803/AnsiballZ_env_info.py\", line 102, in <module>\\n    _ansiballz_main()\\n', '  File \"/root/.ansible/tmp/ansible-tmp-1668173946.776787-24441-170028905131803/AnsiballZ_env_info.py\", line 94, in _ansiballz_main\\n    invoke_module(zipped_mod, temp_path, ANSIBALLZ_PARAMS)\\n', '  File \"/root/.ansible/tmp/ansible-tmp-1668173946.776787-24441-170028905131803/AnsiballZ_env_info.py\", line 40, in invoke_module\\n    runpy.run_module(mod_name=\\'ansible_collections.cloudera.cloud.plugins.modules.env_info\\', init_globals=None, run_name=\\'__main__\\', alter_sys=True)\\n', '  File \"/usr/lib64/python3.8/runpy.py\", line 207, in run_module\\n    return _run_module_code(code, init_globals, run_name, mod_spec)\\n', '  File \"/usr/lib64/python3.8/runpy.py\", line 97, in _run_module_code\\n    _run_code(code, mod_globals, init_globals,\\n', '  File \"/usr/lib64/python3.8/runpy.py\", line 87, in _run_code\\n    exec(code, run_globals)\\n', '  File \"/tmp/ansible_cloudera.cloud.env_info_payload_51viniow/ansible_cloudera.cloud.env_info_payload.zip/ansible_collections/cloudera/cloud/plugins/modules/env_info.py\", line 471, in <module>\\n', '  File \"/tmp/ansible_cloudera.cloud.env_info_payload_51viniow/ansible_cloudera.cloud.env_info_payload.zip/ansible_collections/cloudera/cloud/plugins/modules/env_info.py\", line 461, in main\\n', '  File \"/tmp/ansible_cloudera.cloud.env_info_payload_51viniow/ansible_cloudera.cloud.env_info_payload.zip/ansible_collections/cloudera/cloud/plugins/modules/env_info.py\", line 424, in __init__\\n', '  File \"/tmp/ansible_cloudera.cloud.env_info_payload_51viniow/ansible_cloudera.cloud.env_info_payload.zip/ansible_collections/cloudera/cloud/plugins/module_utils/cdp_common.py\", line 42, in _impl\\n    result = f(self, *args, **kwargs)\\n', '  File \"/tmp/ansible_cloudera.cloud.env_info_payload_51viniow/ansible_cloudera.cloud.env_info_payload.zip/ansible_collections/cloudera/cloud/plugins/modules/env_info.py\", line 429, in process\\n', '  File \"/usr/local/lib/python3.8/site-packages/cdpy/environments.py\", line 55, in describe_environment\\n    resp = self.sdk.call(\\n', '  File \"/usr/local/lib/python3.8/site-packages/cdpy/common.py\", line 594, in call\\n    parsed_err = CdpError(err)\\n'], 'error_code': None, 'violations': None, 'message': None, 'status_code': None, 'rc': None, 'service': None, 'operation': None, 'request_id': None}", "msg": "None", "violations": null}

--skip-tags "database" doesn't function

@asdaraujo
We tried to execute the deployer with --skip-tags "database" and it failed with the error below. Although we explicitly specified that the Postgres DB doesn't need to be installed, the deployer tries to install a Postgres library.

ansible-playbook -i /runner/project/inventory_static.ini /runner/project/cloudera-deploy/main.yml -e "definition_path=/runner/project/" -e "abs_profile=/runner/project/profile.yml" -t full_cluster  --skip-tags "database" -vvv

And this is the error message. The full traceback is:
WARNING: The below traceback may not be related to the actual failure.
  File "/tmp/ansible_postgresql_user_payload_qTn8l4/ansible_postgresql_user_payload.zip/ansible_collections/community/postgresql/plugins/modules/postgresql_user.py", line 277, in
[WARNING]: Module remote_tmp /var/lib/pgsql/.ansible/tmp did not exist and was created with a mode of 0700, this may cause issues when running as another user. To avoid this, create
the remote_tmp dir with the correct permissions manually
fatal: [semicjs02-bi-1.int.semicjs02.nice.com -> semicjs02-bi-1.int.semicjs02.nice.com]: FAILED! => {
    "changed": false,
    "invocation": {
        "module_args": {
            "ca_cert": null,
            "comment": null,
            "conn_limit": null,
            "db": "",
            "encrypted": true,
            "expires": null,
            "fail_on_user": true,
            "groups": null,
            "login_host": "",
            "login_password": "",
            "login_unix_socket": "",
            "login_user": "postgres",
            "name": "scm",
            "no_password_changes": false,
            "password": "VALUE_SPECIFIED_IN_NO_LOG_PARAMETER",
            "port": 5432,
            "priv": null,
            "role_attr_flags": "",
            "session_role": null,
            "ssl_mode": "prefer",
            "state": "present",
            "trust_input": true,
            "user": "scm"
        }
    },
    "msg": "Failed to import the required Python library (psycopg2) on semicjs02-bi-1's Python /usr/bin/python. Please read the module documentation and install it in the appropriate location. If the required library is installed, but Ansible is using the wrong Python interpreter, please consult the documentation on ansible_python_interpreter"

quickstart.sh: unintentional command execution?

quickstart.sh includes the following checks:

[...]

if [ -n "${CLDR_PYTHON_PATH}" ]; then
  echo "Path to custom Python sourcecode supplied as ${CLDR_PYTHON_PATH}, setting as System PYTHONPATH"
  PYTHONPATH="${CLDR_PYTHON_PATH}"
else
  echo "'CLDR_PYTHON_PATH' is not set, skipping setup of PYTHONPATH in execution container"
fi

echo "Checking if ssh-agent is running..."
if pgrep -x "ssh-agent" >/dev/null
then
    echo "ssh-agent OK"
else
    echo "ssh-agent is stopped, please start it by running: eval `ssh-agent -s` "
    #eval `ssh-agent -s`
fi

echo "Checking OS"
if [ ! -f "/run/host-services/ssh-auth.sock" ];
then
   if [ -n "${SSH_AUTH_SOCK}" ];
   then
        SSH_AUTH_SOCK=${SSH_AUTH_SOCK}
   else
    echo "ERROR: SSH_AUTH_SOCK is empty or not set, unable to proceed. Exiting"
    exit 1
   fi
else
    SSH_AUTH_SOCK=${SSH_AUTH_SOCK}
fi

[...]

The first check looks for an env var being set, and explicitly skips some setup if that env var is not found, i.e. that setting is explicitly optional. Fine.

The third check looks for an env var being set, and explicitly exits with an error if it's not present. Also fine.

The second check is to see if ssh-agent is running. If it is not running, the code looks like it intends to print a helpful error message, empowering the user to manually do something before coming back to try this script again. There are a couple of issues here:

  1. If ssh-agent is required, shouldn't the else block here explicitly exit 1, like the next check for $SSH_AUTH_SOCK does? (Otherwise, the intention would be clearer if it stated explicitly that it's carrying on regardless, like the first check does, but I don't think that's the idea here.)
  2. (main issue) Because the backticks aren't escaped, printing the error message actually runs the ssh-agent -s command. That is pretty clearly not what's intended, since in that case the actual message to the user doesn't make sense. Also, judging by the fact that the next line is a commented-out version of the same command, automatic startup seems to have been considered but rejected. Again, I think an explicit exit 1 belongs here instead (a corrected sketch follows below).

If my read is right, happy to submit a PR?
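
For reference, a minimal sketch of the fix described above, with the backticks escaped so the command is printed rather than executed, and an explicit exit (illustrative, untested):

if pgrep -x "ssh-agent" >/dev/null
then
    echo "ssh-agent OK"
else
    # Print the suggested command literally; escaped backticks prevent execution
    echo "ssh-agent is stopped, please start it by running: eval \`ssh-agent -s\`"
    exit 1
fi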

Improve definition file loading

Context:

I began to have a huge, unreadable definition file, so I wanted to use Ansible variables.
I discovered that the definition file was loaded as a simple file and parsed as-is as YAML. That means variables won't be interpolated (i.e. host_templates: "{{ cdp_host_templates }}" will be interpreted as the literal string "{{ cdp_host_templates }}").

My definition file looks like this:

clusters:

- name: "{{ cdp_cluster_name }}"
  type: base
  services: "{{ cdp_services }}"
  databases: "{{ cdp_databases }}"
  configs: "{{ fresh_install_configs }}"
  host_templates: "{{ cdp_host_templates }}"
...

Issue:

Using variables in the definition file raises an error due to a check done on the host_templates here.

The error message is : Unable to host template {{ host_template }} in the cluster definition

This check tries to find a host template named 'xxx' in host_templates: "{{ host_templates }}" and fails because the value is interpreted as a string...
The task responsible for this is here.

Solution:

Use include_vars instead of lookup(file...)

Log Levels

Feature Request

It would be nice to have different log levels, with more verbose logging by default versus what is output when the Ansible scripts actually run.
This would make it easier to check whether a specific step is executing or has stopped, which is especially important during creation of VMs that take longer to finish.

Reduce User Interaction :

The Ansible script creates and connects to newly formed EC2 instances, and while connecting to each VM it asks the SSH host-key trust question.
We should have a flag to automate this. A standard -y to continue uninterrupted would allow the creation and deployment without constant user interaction.
