One Script Deploy

One script to rule them all: CDP / CDP Private Cloud / CDH / HDP !

Given a set of machines, this script sets up all prerequisites, then installs and configures a fully secured cluster and loads data into it.

Requirements

Run the script requirements.sh to set up all requirements before launching the main script.

Installation

Command line tool

To install a cluster (the default is a 10-node CDP 7 cluster with Kerberos and TLS enabled):

export PAYWALL_USER=  # Your paywall user from Cloudera, used to access archive.cloudera.com
export PAYWALL_PASSWORD=  # Your paywall password from Cloudera, used to access archive.cloudera.com
export LICENSE_FILE=   # Your license file from Cloudera
export CLUSTER_NAME=   # A name of your choice (e.g. cloudera-test)
export NODES=   # *Space*-separated list of nodes (e.g. "node1 node2 node3"). You must provide as many nodes as the chosen installation type requires; the default requires 10.
./setup-cluster.sh \
    --cluster-name=${CLUSTER_NAME} \
    --license-file=${LICENSE_FILE} \
    --paywall-username=${PAYWALL_USER} \
    --paywall-password=${PAYWALL_PASSWORD} \
    --nodes="${NODES}"

N.B.: This assumes that passwordless SSH access exists from this machine to all your cluster nodes. Otherwise, provide a password with --node-password or a private key file with --node-key.
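
Before launching, it can help to check that the node list is long enough and that every node accepts a non-interactive SSH login. The helpers below are a minimal sketch and are not part of setup-cluster.sh: the function names and the SSH_BIN override are hypothetical, introduced only for illustration.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers; not part of setup-cluster.sh.

# Count the space-separated entries in a node list.
count_nodes() {
    local nodes="$1"
    # Unquoted on purpose so the list is split on whitespace.
    set -- $nodes
    echo $#
}

# Succeed only if the list holds at least the expected number of nodes.
has_enough_nodes() {
    local nodes="$1" expected="$2"
    [ "$(count_nodes "$nodes")" -ge "$expected" ]
}

# Print every node that does not accept a non-interactive SSH login.
# SSH_BIN is a hypothetical override, useful to dry-run the check.
unreachable_nodes() {
    local node
    for node in $1; do
        "${SSH_BIN:-ssh}" -o BatchMode=yes -o ConnectTimeout=5 "$node" true \
            </dev/null >/dev/null 2>&1 || echo "$node"
    done
}
```

For example, `has_enough_nodes "${NODES}" 10 || echo "not enough nodes"` guards the default installation, and `unreachable_nodes "${NODES}"` lists the nodes for which --node-password or --node-key would be needed.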

Configuration

Many more configuration options are available; list them all with:

./setup-cluster.sh --help

Examples

!!! Special No license or Paywall Cluster : CDP 7 - Basic 6 nodes !!!

./setup-cluster.sh \
    --cluster-name=${CLUSTER_NAME} \
    --cluster-type=basic \
    --nodes-base="${NODES}"

CDP 7 - Full 10 nodes with almost all services (Kerberos / TLS)

./setup-cluster.sh \
    --cluster-name=${CLUSTER_NAME} \
    --license-file=${LICENSE_FILE} \
    --paywall-username=${PAYWALL_USER} \
    --paywall-password=${PAYWALL_PASSWORD} \
    --nodes-base="${NODES}"

CDP 7 - Basic 6 nodes (Kerberos / TLS)

./setup-cluster.sh \
    --cluster-name=${CLUSTER_NAME} \
    --license-file=${LICENSE_FILE} \
    --paywall-username=${PAYWALL_USER} \
    --paywall-password=${PAYWALL_PASSWORD} \
    --cluster-type=basic \
    --nodes-base="${NODES}"

CDP 7 - Basic encrypted 6 nodes (Kerberos / TLS) (You can specify 1 or 2 nodes for KTS)

./setup-cluster.sh \
    --cluster-name=${CLUSTER_NAME} \
    --license-file=${LICENSE_FILE} \
    --paywall-username=${PAYWALL_USER} \
    --paywall-password=${PAYWALL_PASSWORD} \
    --cluster-type=basic-enc \
    --nodes-kts=<Dedicated Node(s) for KTS> \
    --nodes-base="${NODES}"

CDP 7 - Basic 6 nodes with Free IPA on a dedicated node (all CDP clusters can have Free IPA just by adding --free-ipa=true and providing a node with --node-ipa=) (Kerberos / TLS)

./setup-cluster.sh \
    --cluster-name=${CLUSTER_NAME} \
    --license-file=${LICENSE_FILE} \
    --paywall-username=${PAYWALL_USER} \
    --paywall-password=${PAYWALL_PASSWORD} \
    --cluster-type=basic \
    --free-ipa=true \
    --node-ipa=<One node dedicated to IPA> \
    --nodes-base="${NODES}"

CDP 7 - 9 nodes with 3 dedicated for PvC with ECS (Kerberos / TLS / FreeIPA)

./setup-cluster.sh \
    --cluster-name=${CLUSTER_NAME} \
    --license-file=${LICENSE_FILE} \
    --paywall-username=${PAYWALL_USER} \
    --paywall-password=${PAYWALL_PASSWORD} \
    --cluster-type=pvc \
    --nodes-ecs=<Space separated list of 3 nodes> \
    --node-ipa=<One node dedicated to IPA> \
    --nodes-base="${NODES}"

CDP 7 - 6 nodes basic for PvC with OpenShift (Experiences installed on a provided OCP cluster) (Kerberos / TLS / FreeIPA)

./setup-cluster.sh \
    --cluster-name=${CLUSTER_NAME} \
    --license-file=${LICENSE_FILE} \
    --paywall-username=${PAYWALL_USER} \
    --paywall-password=${PAYWALL_PASSWORD} \
    --cluster-type=pvc-oc \
    --kubeconfig-path=<Path to your kubeconfig file> \
    --oc-tar-file-path=<Path to your oc.tar file downloaded from RedHat> \
    --node-ipa=<One node dedicated to IPA> \
    --nodes-base="${NODES}"

Streaming cluster (--cluster-type=streaming)

./setup-cluster.sh \
    --cluster-name=${CLUSTER_NAME} \
    --license-file=${LICENSE_FILE} \
    --paywall-username=${PAYWALL_USER} \
    --paywall-password=${PAYWALL_PASSWORD} \
    --cluster-type=streaming \
    --nodes-base="${NODES}"

All services with PvC (--cluster-type=all-services-pvc)

./setup-cluster.sh \
    --cluster-name=${CLUSTER_NAME} \
    --license-file=${LICENSE_FILE} \
    --paywall-username=${PAYWALL_USER} \
    --paywall-password=${PAYWALL_PASSWORD} \
    --cluster-type=all-services-pvc \
    --nodes-kts=<Dedicated Node for KTS> \
    --node-ipa=<Dedicated Node for IPA> \
    --kubeconfig-path=<Path to your kubeconfig file> \
    --oc-tar-file-path=<Path to your oc.tar file downloaded from RedHat> \
    --nodes-base="${NODES}"

Full encrypted with PvC (--cluster-type=full-enc-pvc)

./setup-cluster.sh \
    --cluster-name=${CLUSTER_NAME} \
    --license-file=${LICENSE_FILE} \
    --paywall-username=${PAYWALL_USER} \
    --paywall-password=${PAYWALL_PASSWORD} \
    --cluster-type=full-enc-pvc \
    --nodes-kts=<Dedicated Node(s) for KTS> \
    --node-ipa=<Dedicated Node for IPA> \
    --kubeconfig-path=<Path to your kubeconfig file> \
    --oc-tar-file-path=<Path to your oc.tar file downloaded from RedHat> \
    --nodes-base="${NODES}"

CDP 7 - Workload XM cluster (1 WXM cluster of 5 nodes associated with a base cluster (provided in command line) ) (Kerberos / TLS)

./setup-cluster.sh \
    --cluster-name=${CLUSTER_NAME} \
    --license-file=${LICENSE_FILE} \
    --paywall-username=${PAYWALL_USER} \
    --paywall-password=${PAYWALL_PASSWORD} \
    --cluster-type=wxm \
    --altus-key-id=<ALTUS key ID provided by Cloudera> \
    --altus-private-key=<path to ALTUS private key provided by Cloudera> \
    --cm-base-url=<http://<CM host to connect to WXM>:<Port>> \
    --tp-host=<Host in base cluster that will have Telemetry Publisher installed> \
    --nodes-base="${NODES}"

CDP 7.1.8 - Full 10 nodes with almost all services (Kerberos / TLS)

./setup-cluster.sh \
    --cluster-name=${CLUSTER_NAME} \
    --license-file=${LICENSE_FILE} \
    --paywall-username=${PAYWALL_USER} \
    --paywall-password=${PAYWALL_PASSWORD} \
    --cdh-version='7.1.8.1' \
    --cm-version='7.7.3-33365545' \
    --nodes-base="${NODES}"

CDP 7 - Unsecure

./setup-cluster.sh \
    --cluster-name=${CLUSTER_NAME} \
    --license-file=${LICENSE_FILE} \
    --paywall-username=${PAYWALL_USER} \
    --paywall-password=${PAYWALL_PASSWORD} \
    --kerberos=false \
    --tls=false \
    --nodes-base="${NODES}"

CDH 6 (Kerberos)

./setup-cluster.sh \
    --cluster-name=${CLUSTER_NAME} \
    --license-file=${LICENSE_FILE} \
    --paywall-username=${PAYWALL_USER} \
    --paywall-password=${PAYWALL_PASSWORD} \
    --cluster-type=cdh6 \
    --nodes-base="${NODES}"

CDH 5 (Kerberos)

./setup-cluster.sh \
    --cluster-name=${CLUSTER_NAME} \
    --license-file=${LICENSE_FILE} \
    --paywall-username=${PAYWALL_USER} \
    --paywall-password=${PAYWALL_PASSWORD} \
    --cluster-type=cdh5 \
    --nodes-base="${NODES}"

HDP 3 (Kerberos)

./setup-cluster.sh \
    --cluster-name=${CLUSTER_NAME} \
    --license-file=${LICENSE_FILE} \
    --paywall-username=${PAYWALL_USER} \
    --paywall-password=${PAYWALL_PASSWORD} \
    --cluster-type=hdp3 \
    --nodes-base="${NODES}"

HDP 2 (Kerberos)

./setup-cluster.sh \
    --cluster-name=${CLUSTER_NAME} \
    --license-file=${LICENSE_FILE} \
    --paywall-username=${PAYWALL_USER} \
    --paywall-password=${PAYWALL_PASSWORD} \
    --cluster-type=hdp2 \
    --nodes-base="${NODES}"

Output

CM & Ambari

At the end, CM or Ambari (depending on your installation) should be available on the first node, over http or https and on the matching port. This depends on the TLS parameter, which is false by default for HDP and true by default for CDP.
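
As an illustration of how the URL follows from the manager and the TLS setting, here is a small sketch. It assumes the usual default ports (7180/7183 for CM, 8080/8443 for Ambari); these are not taken from this project and your installation may use others. The manager_url function is hypothetical.

```shell
#!/bin/sh
# Print the expected web UI URL for the first node.
# Ports are the common defaults, assumed here, not read from the scripts.
# Usage: manager_url <cm|ambari> <first-node> <tls: true|false>
manager_url() {
    local manager="$1" host="$2" tls="$3" scheme port
    case "$manager:$tls" in
        cm:true)      scheme=https port=7183 ;;
        cm:false)     scheme=http  port=7180 ;;
        ambari:true)  scheme=https port=8443 ;;
        ambari:false) scheme=http  port=8080 ;;
        *) echo "usage: manager_url <cm|ambari> <host> <true|false>" >&2
           return 1 ;;
    esac
    echo "$scheme://$host:$port"
}
```

For a default CDP installation this would be called as `manager_url cm node1 true`, where node1 stands for the first entry of your node list.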

During the installation, you can also follow its progress by connecting to CM or Ambari.

N.B.: It is recommended not to interfere with the cluster until the ansible installation is done.

Users and Data

At the end of the installation, if it completed successfully, users are created on the machines along with their keytabs. The keytabs are retrieved to your local computer under /tmp/, and krb5.conf is retrieved as well.

Moreover, it is also possible to launch some random data generation into various systems.

All default passwords are Cloudera1234

Details on Installation

This section describes in detail the steps performed during the installation, in order. Each step can be skipped, and hence can also be launched separately.

Architecture

Once you have gathered all the requirements above, you can launch the script. It mainly consists of 5 steps:

  • Prepare your machines

  • Launch the installation from the first node of your cluster using appropriate ansible playbook and files

  • Do post-install configuration (mainly for CDP)

  • Create users on your cluster

  • Load some data into your cluster

Each step can be skipped (see the command-line help).
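
The five steps map to the flags --setup, --install, --post-install, --user-creation and --data-load described later in this document. As a sketch of how a single step could be re-run on its own, the hypothetical helper below prints the flags that disable every other step. Whether the script accepts all of these flags together in this way is an assumption; check ./setup-cluster.sh --help first.

```shell
#!/bin/sh
# Print the flags that disable every step except the one named in $1.
# Step names match the flags quoted in this document:
# setup, install, post-install, user-creation, data-load.
# Returns non-zero if the step name is unknown.
only_step_flags() {
    local wanted="$1" step found=no
    for step in setup install post-install user-creation data-load; do
        if [ "$step" = "$wanted" ]; then
            found=yes
        else
            printf '%s\n' "--$step=false"
        fi
    done
    [ "$found" = yes ]
}
```

A possible use is `./setup-cluster.sh --cluster-name=${CLUSTER_NAME} --nodes-base="${NODES}" $(only_step_flags data-load)`, which leaves only the data loading step enabled.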

Scripts

This group of scripts, coordinated by the main script setup-cluster.sh, configures the provided machines and launches a CDP (or HDP, CDH) installation with ansible. Finally, some extra configuration steps can be applied and random data can be generated into different services.

All of this is driven only from your machine.

This script relies on ansible scripts that must be accessible from your machine (if they are not, please set up an internal web server and provide its URL on the command line).

The ansible scripts also rely on the Cloudera repository to access CDP, CM, HDP, Ambari, etc. (if it is not accessible, please set up an internal web server and provide its URL on the command line).

This script also relies on a GitHub repository to load data (if it is not accessible, please set up an internal web server and provide its URL on the command line).

Setup Machines

This step uses Playbook hosts_setup.

Unless you set the parameter --setup to false, this step prepares all machines by configuring passwordless SSH and pushing the required files to them.

N.B.: This step only needs to be done once and can then be bypassed if you reuse the same machines.

Ansible Installation

This step uses Playbook ansible_install_preparation and then runs commands directly on the host to launch the ansible installation there.

The first playbook can be skipped by setting the parameter --install to false (it is true by default).

It cleans up the first node, creates the directory ~/deployment/ansible-repo/, downloads the ansible repository into it as a zip, and adds the files for your installation.

Then, the ansible command corresponding to the installation is launched directly on the first node.

Post Installation

This step uses Playbook post_install.

If you install a CDP cluster and leave the parameter --post-install set to true, this step performs some extra actions, such as setting no unlogin on CM and fixing various potential bugs.

User Creation

This step uses Playbook user_creation.

Unless you explicitly set the parameter --user-creation to false, and provided the installation completed successfully, the users defined in the extra_vars of user_creation are created.

They are present on all nodes, with their /home directory containing their keytabs.

Their keytabs are also fetched into your /tmp directory along with krb5.conf, allowing you to kinit directly from your computer.
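
A minimal sketch of what such a local kinit could look like is shown below. It relies on the standard KRB5_CONFIG environment variable and on kinit -kt from MIT Kerberos. The keytab file name pattern (<user>.keytab) is a guess and not taken from this project; adjust it to the files you actually find under /tmp. The kinit_command function is hypothetical and only prints the command.

```shell
#!/bin/sh
# Build the kinit command for a user whose keytab was fetched locally.
# The "<user>.keytab" file name is an assumption; check your /tmp directory.
# Usage: kinit_command <user> [directory, default /tmp]
kinit_command() {
    local user="$1" dir="${2:-/tmp}"
    printf 'KRB5_CONFIG=%s/krb5.conf kinit -kt %s/%s.keytab %s\n' \
        "$dir" "$dir" "$user" "$user"
}
```

Running `kinit_command <user>` prints a command you can review and then execute, followed by klist to confirm that a ticket was obtained.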

Data Loading

This step uses Playbook data_load.

If you leave the parameter --data-load set to true, a data loading step starts (currently only on CDP, HDP 2 and CDH 5) to generate data into the existing services of the platform: HDFS, HBase, Hive, etc.

It is based on the random-datagen project.

Note that this step is fully extensible: you can add new files specifying how data should be generated in the folder playbooks/data_load/generate_data/models

N.B.: This step will also create Ranger required policies, and these are also extensible by adding policies in playbooks/data_load/ranger_policies/push_policies/policies

Extension

Once you are familiar with these scripts, you can easily tune them using command-line parameters to provide your own cluster files and repositories.

Cluster Definition

To provide a quick new definition of a cluster:

  1. Copy the directory ansible-cdp and give the copy a new name, for example: ansible-cdp-configured

  2. Make all your modifications in the files of the copied directory

  3. Launch the script with the argument: --cluster-type=ansible-cdp-configured (it automatically picks up the files under the ansible-cdp-configured/ directory)
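
The copy step above can be sketched as a small helper. The function name is hypothetical, and it assumes it is given the path to the ansible-cdp directory of your checkout; it only copies the directory and prints the matching argument, leaving the modifications to you.

```shell
#!/bin/sh
# Copy a cluster definition directory under a new name, next to the
# original, and print the matching --cluster-type argument.
# Refuses to overwrite an existing copy.
# Usage: clone_cluster_definition <path/to/ansible-cdp> <new-name>
clone_cluster_definition() {
    local source_dir="$1" new_name="$2" target_dir
    [ -d "$source_dir" ] || { echo "missing directory: $source_dir" >&2; return 1; }
    target_dir="$(dirname "$source_dir")/$new_name"
    [ ! -e "$target_dir" ] || { echo "already exists: $target_dir" >&2; return 1; }
    cp -R "$source_dir" "$target_dir" && echo "--cluster-type=$new_name"
}
```

After `clone_cluster_definition ansible-cdp ansible-cdp-configured`, edit the files in the new directory and pass the printed argument to setup-cluster.sh.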

User Creation & Data Loading

These steps can be launched independently, and you can configure them to create more users or to load more and different data.

Look at extra_vars.yml inside the playbooks folder to learn more about the possibilities.

Private Cloud

Private Cloud setup (on ECS or OC) can also be launched independently on a running cluster.

Configuration of a private cloud cluster can also be launched independently (use --install-pvc=false together with --pvc=true to configure your PvC without re-installing it).

In extra_vars.yml you can list the CDWs, CDEs and CMLs that will be provisioned for you, as well as the rights you expect for your users.

Limitations & Known Bugs

  • TLS is not set for HDP & CDH clusters

  • Data loading is not made for HDP 3 & CDH 6 clusters

  • Free IPA is only available for CDP clusters

Please feel free to contribute and help solve and implement TODOs listed in TODOs.adoc
