
oci-hortonworks

Terraform module to deploy Hortonworks on Oracle Cloud Infrastructure (OCI)

License: Apache License 2.0

These are Terraform modules that deploy Hortonworks on Oracle Cloud Infrastructure (OCI). They are developed jointly by Oracle and Cloudera.

                        Worker Nodes      Bastion Instance   Utility and Master Instances
Recommended (Default)   BM.DenseIO2.52    VM.Standard2.4     VM.Standard2.16
Minimum                 VM.Standard2.8    VM.Standard2.1     VM.Standard2.8

Host types can be customized in this template. The template also includes an easy way to customize block volume quantity and size as they relate to HDFS capacity. See the in-line comments in "variables.tf" for more information.

Scaling

Modify the variable "worker_node_count" to scale the number of cluster workers. The default (minimum) is 3 nodes.

variable "worker_node_count" { default = "3" }

HDFS Storage Capacity - NVME & Block Volumes

By default this template uses DenseIO Bare Metal shapes with local NVME, plus Block Volumes using Data Tiering for HDFS. When mixed storage is provisioned, local NVME defaults to [DISK] and Block Volumes are set to [ARCHIVE] using the default Data Tiering storage policies. If you change the worker type to a non-DenseIO shape, you will need to scale HDFS capacity for the cluster by increasing the "block_volumes_per_worker" variable. By default these volumes are 700GB each, which maximizes IOPS and throughput at the per-volume level.

To use DenseIO local storage only, set "block_volumes_per_worker" to "0"; this removes Block Volumes entirely and HDFS uses local NVME alone.

If higher density is required, the block volume size can be scaled up in tandem with the block volume count using the variable "data_blocksize_in_gbs". Best practice is to scale wider on the number of block volumes per worker rather than using a small number of high capacity volumes.
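
As a sketch, a non-DenseIO worker with roughly 8.4TB of raw HDFS capacity per node might be configured as follows (the values are illustrative; the variable names come from "variables.tf"):

# 12 block volumes per worker at the default 700GB each = ~8.4TB raw per node
terraform apply \
  -var="worker_shape=VM.Standard2.16" \
  -var="block_volumes_per_worker=12" \
  -var="data_blocksize_in_gbs=700"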

Block Volumes for Logs & Cache

This deployment also sets up a 4-volume RAID0 array of 700GB Block Volumes, mounted on each Worker as /hadoop. This provides a roughly 2.8TB caching layer for log data and other data in transit as part of a workload. The /tmp filesystem is also bind mounted to this location, both to provide a fast caching layer and to prevent the OS filesystem from being consumed entirely.
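
Once a Worker is up, the array and mounts can be verified with standard Linux tooling (the md device name may vary):

# Inspect the software RAID state and confirm the /hadoop and /tmp mounts
cat /proc/mdstat
df -h /hadoop
mount | grep hadoop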

Deployment Customization

Version 3.x versus 2.x deployment is controlled using the following variables:

hdp_version
ambari_version

Refer to the Hortonworks Support Matrix for version dependencies between platform components.

The variable "deployment_type" controls whether Secure Cluster is setup as part of deployment. By default this template is set to use "simple" deployment, which is a fast deployment. To enable Kerberos for cluster security, change the "deployment_type" to "secure". This setup takes a little longer, see the Cluster Security section below for more detail.
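
For CLI-driven runs, the version pins and the security mode can be set in one command. A sketch with illustrative values only; check the Support Matrix for compatible combinations:

# Pin platform versions and enable the secure (Kerberos) deployment path
terraform apply \
  -var="hdp_version=3.1" \
  -var="ambari_version=2.7" \
  -var="deployment_type=secure"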

Additional Deployment customization is done by modifying a few files:

  • scripts/hdp_deploy.sh: cluster deployment. Customize the Ambari admin credentials, HDP version, cluster name, configuration, and cluster topology here. Configuration and topology customization requires knowledge of Ambari Blueprints; YAML can be modified or inserted into the appropriate section of this script to allow for a custom deployment. The key variables are (see the sketch after this list):

    UTILS_version
    CLUSTER_NAME
    ambari_login
    ambari_password
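
A minimal sketch of editing those values in scripts/hdp_deploy.sh (the values shown are placeholders, not the repository defaults):

# In scripts/hdp_deploy.sh - placeholder values, substitute your own
CLUSTER_NAME="my-hdp-cluster"
ambari_login="admin"
ambari_password="ChangeMe123!"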

Deployment

Deployment uses OCI Resource Manager; see the OCI Resource Manager documentation for step-by-step instructions.

  • Create a Stack in Resource Manager
  • Upload a zipball of this branch
  • Copy the contents of a public/private SSH keypair to the variables section
  • Optionally, modify any deployment variables (such as worker_shape)
  • Run a Plan action in Resource Manager
  • Run an Apply action in Resource Manager

Log output will show the Ambari IP address once the Apply action finishes. Note that the Ambari URL only becomes available several minutes after the action completes, because installation and deployment are performed by cloud-init.
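
The same flow can be scripted with the OCI CLI if you prefer not to use the console. A sketch, assuming the CLI is configured and stack.zip is a zipball of this branch (the OCIDs are placeholders):

# Create the stack from a local zip of this branch
oci resource-manager stack create \
  --compartment-id ocid1.compartment.oc1..example \
  --config-source stack.zip

# Plan, then apply, using the stack OCID returned above
oci resource-manager job create-plan-job --stack-id ocid1.ormstack.oc1..example
oci resource-manager job create-apply-job --stack-id ocid1.ormstack.oc1..example \
  --execution-plan-strategy AUTO_APPROVED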

Cluster Security

This template also includes Kerberos Secure Cluster installation as an option; setting "deployment_type" to "secure" enables Kerberos. This uses a local KDC on the Utility host. Kerberos principals can be administered on that host by running "kadmin.local" as the root user. Principals take the format "user/<host_fqdn>@HADOOP.COM". An admin principal for use with Ambari, "ambari/admin@HADOOP.COM", is also set up as part of deployment.

When using a secure cluster you should not overwrite any of the default principals set up by Ambari. For cluster administration it is suggested you create a principal for the HDFS user on the Utility host:

kadmin.local
add_principal hdfs/hw-utility-1.public3.hwvcn.oraclevcn.com

You will be prompted for a password for the principal. Once complete, you can change to the HDFS user and authenticate using kinit:

sudo su - hdfs
kinit hdfs/`hostname -f`@HADOOP.COM

At this point HDFS administration can be done using normal hdfs dfs commands.
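
For example, a quick check that the ticket is in place and HDFS responds (the directory name is arbitrary):

# Confirm the Kerberos ticket, then exercise HDFS as the authenticated user
klist
hdfs dfs -ls /
hdfs dfs -mkdir /tmp/kerberos-test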

Kerberos can be configured with a custom realm if desired. This requires modifying scripts/boot.sh and scripts/boot_ambari.sh for the KDC server and krb5 host configs.

It is also highly suggested that you modify scripts/boot_ambari.sh and change the following default parameters prior to cluster deployment:

KERBBEROS_PASSWORD="SOMEPASSWORD"
AMBARI_USER_PASSWORD="somepassword"

MySQL for Ambari and Metadata

MySQL Server is installed on the Ambari host for use with Ambari and Cluster Metadata. The default database passwords are controlled in scripts/boot_ambari.sh. It is recommended to change these default values prior to deployment:

mysql_db_password="somepassword"
ambari_db_password="somepassword"

The "ambari_user" variable is the Database user for Ambari.

Destroy Deployment

  • Run a Destroy action from Resource Manager

Resource Manager will prompt to confirm that you wish to destroy everything, then remove all elements of the deployment, destroying all data in the process. This is not recoverable.
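
The CLI equivalent, if you created the stack from the command line (the OCID is a placeholder):

# Queue a destroy job for the stack; this tears down every resource it created
oci resource-manager job create-destroy-job \
  --stack-id ocid1.ormstack.oc1..example \
  --execution-plan-strategy AUTO_APPROVED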

