ansible-hadoop

These Ansible playbooks will build a Hadoop cluster (Hortonworks Data Platform).

You can pre-build a Rackspace cloud environment or run the playbooks against an existing environment.

Configuration files

To customize, change the variables under the playbooks/group_vars folder:

  1. playbooks/group_vars/all: contains global cluster and cloud settings
  2. playbooks/group_vars/master-nodes: master-nodes configuration
  3. playbooks/group_vars/slave-nodes: slave-nodes configuration
  4. playbooks/group_vars/edge-nodes: edge-nodes configuration

For a one-node cluster, set cloud_nodes_count in master-nodes to 1 and cloud_nodes_count in slave-nodes to 0.
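Before provisioning, it can help to confirm what the two files currently say. The helper below is only a sketch: the function name show_node_counts is hypothetical and not part of the repository, and it assumes nothing about the file format beyond cloud_nodes_count appearing on a line of its own in each file.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): print every line that
# mentions cloud_nodes_count in the master-nodes and slave-nodes files.
show_node_counts() {
    local vars_dir="${1:-playbooks/group_vars}"
    local group
    for group in master-nodes slave-nodes; do
        # -H prefixes each match with the file name so the two are easy to tell apart.
        grep -H 'cloud_nodes_count' "${vars_dir}/${group}" \
            || echo "${vars_dir}/${group}: cloud_nodes_count not found" >&2
    done
}
```

Run `show_node_counts` from the repository root, or pass a different group_vars directory as the first argument.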

Requirements

  • Requires Ansible 1.8 or newer

  • Expects CentOS/RHEL 6.x hosts

  • Building the cloud environment requires the pyrax Python module: https://github.com/rackspace/pyrax

    Running pip install oslo.config netifaces is also recommended.

  • The cloud environment requires the standard pyrax credentials file that looks like this:

    [rackspace_cloud]
    username = my_username
    api_key = 01234567890abcdef
    

    This file is referenced in playbooks/group_vars/all through the rax_credentials_file variable.

    By default, the file is expected at ~/.raxpub.
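One way to create that file from the shell is sketched below. The RAX_CREDS_FILE variable is a hypothetical convenience introduced for this example; the values written are the same placeholders shown above and must be replaced with real credentials. The snippet refuses to overwrite an existing file and restricts permissions because the file holds an API key.

```shell
#!/usr/bin/env bash
# The default path mirrors the one mentioned in this section; override
# RAX_CREDS_FILE (a hypothetical variable) if rax_credentials_file points elsewhere.
RAX_CREDS_FILE="${RAX_CREDS_FILE:-$HOME/.raxpub}"

if [ -e "$RAX_CREDS_FILE" ]; then
    echo "$RAX_CREDS_FILE already exists; leaving it untouched"
else
    # Create the file readable by the current user only, since it holds an API key.
    (
        umask 077
        cat > "$RAX_CREDS_FILE" <<'EOF'
[rackspace_cloud]
username = my_username
api_key = 01234567890abcdef
EOF
    )
    echo "Wrote placeholder credentials to $RAX_CREDS_FILE; edit before use"
fi
```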

Scripts

provision_rax.sh

To provision a cloud environment, run the provision_rax.sh script after you've customized the variables under playbooks/group_vars:

bash provision_rax.sh

bootstrap* and hortonworks*

Similarly, run the bootstrap and hortonworks scripts (in this order), depending on what type of environment you have.

Example for a cloud environment:

bash bootstrap_rax.sh
bash hortonworks_rax.sh
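Because the order matters, a small wrapper that stops at the first failing step can save a wasted run. This is a sketch only: run_in_order is a hypothetical helper, not something shipped with the repository, and it assumes each script exits non-zero on failure.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the given scripts in order and stop at the
# first one that exits with a non-zero status.
run_in_order() {
    local script
    for script in "$@"; do
        echo "==> Running ${script}"
        bash "${script}" || {
            echo "!! ${script} failed; later steps were skipped" >&2
            return 1
        }
    done
}
```

For the cloud example this would be called as `run_in_order bootstrap_rax.sh hortonworks_rax.sh`, optionally with provision_rax.sh as the first argument.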

For dedicated / prebuilt environments, you'll need to manually add the nodes to the inventory/static file.

Accessing Ambari

At this point you can follow progress through the Ambari interface (the ambari-node is the last host that ran a play).

The provided Ansible playbook will only open the firewall if you've added your workstation IP to the allowed_external_ips variable in the playbooks/group_vars/all file.

Alternatively, you can access Ambari either by opening the firewall manually or by opening a SOCKS proxy with the following command:

ssh -D 12345 root@ambari-node

You will need to modify your browser settings to use a SOCKS proxy at localhost, port 12345.

You'll then be able to navigate to http://ambari-node:8080 in your configured browser and access all subsidiary links.
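To verify the tunnel from the command line before changing browser settings, something like the following may be enough. The function names and the AMBARI_HOST and SOCKS_PORT variables are hypothetical; the host name, port 12345, and port 8080 are taken from the text above. The sketch uses only standard ssh and curl options.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the defaults mirror the host name and port used above.
AMBARI_HOST="${AMBARI_HOST:-ambari-node}"
SOCKS_PORT="${SOCKS_PORT:-12345}"

# Open the SOCKS proxy in the background (-f) without running a remote command (-N).
open_ambari_proxy() {
    ssh -f -N -D "${SOCKS_PORT}" "root@${AMBARI_HOST}"
}

# Print the HTTP status code Ambari returns when reached through the proxy.
# --socks5-hostname asks the proxy side to resolve the host name, which is
# useful if the name resolves differently from inside the cluster.
check_ambari() {
    curl --silent --output /dev/null --write-out '%{http_code}\n' \
        --socks5-hostname "localhost:${SOCKS_PORT}" \
        "http://${AMBARI_HOST}:8080/"
}
```

Call `open_ambari_proxy` once, then `check_ambari`; any HTTP status code in the output suggests the tunnel is working and Ambari is answering.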

provision_cbd.sh

Provision a Rackspace Cloud Big Data cluster (http://www.rackspace.com/cloud/big-data) by running this script.

Customize it via the playbooks/group_vars/cbd file.

bash provision_cbd.sh

Ansible-Hadoop History

As with many projects, this code is the end result of a lot of effort from individuals not properly represented by a simple commit history.

Rackspace started deploying Hadoop on dedicated gear for customers more than a year ago, in a very manual process. That process landed with me and these Rockstars:

Joe Engel (Racker Emeritus)

Mark Lessel

Alexandru Anghel

All of them wrote a lot of the automation for deploying Hadoop on customer gear at Rackspace.

Today, with a pile of customers under our belt and more arriving all the time, we want to share our efforts with the world by publishing this code, which you can also use to deploy Hadoop in various ways at Rackspace.

This of course is only the beginning!

I hope this project evolves and inspires even more Rockstars to find ways to contribute.

