
helm-hadoop-3's Introduction

Hadoop-3.2.1 Chart

This chart is modified from stable/hadoop.

Hadoop is a framework for running large-scale distributed applications.

This chart is primarily intended for YARN and MapReduce job execution, where HDFS is used only to transport small artifacts within the framework and not as a distributed filesystem. Data should be read from cloud-based datastores such as Google Cloud Storage, S3, or Swift.

Chart Details

Installing the Chart

To install the chart with the release name hadoop, using 50% of the available node resources:

$ helm install --name hadoop $(stable/hadoop/tools/calc_resources.sh 50) stable/hadoop

Note that you need at least 2GB of free memory per NodeManager pod. If your cluster isn't large enough, not all pods will be scheduled.

The optional calc_resources.sh script is a convenience helper that sets yarn.numNodes and yarn.nodeManager.resources so that the chart uses all nodes in the Kubernetes cluster and a given percentage of their resources. For example, with a 3-node n1-standard-4 GKE cluster and an argument of 50, this would create 3 NodeManager pods, each claiming 2 cores and 7.5Gi of memory.
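The arithmetic behind the n1-standard-4 example can be sketched as follows. This is an illustration only, not the chart's calc_resources.sh: the variable names are hypothetical, and the node size (4 vCPUs, 15 GiB) is hard-coded here, whereas a real helper would have to query the cluster.

```shell
# Illustrative only: this is NOT the chart's calc_resources.sh.
# Hypothetical inputs mirroring a 3-node n1-standard-4 cluster (4 vCPUs, 15 GiB per node).
nodes=3
cores_per_node=4
mem_mi_per_node=15360   # 15 GiB expressed in MiB
percent=50

# Integer arithmetic: scale each node's capacity by the requested percentage.
nm_cpu_milli=$(( cores_per_node * 1000 * percent / 100 ))
nm_mem_mi=$(( mem_mi_per_node * percent / 100 ))

echo "NodeManager pods: $nodes"
echo "cpu per pod: ${nm_cpu_milli}m"
echo "memory per pod: ${nm_mem_mi}Mi"
```

With these inputs each pod gets 2000m of CPU (2 cores) and 7680Mi of memory (7.5Gi), which matches the figures quoted above.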

Persistence

To install the chart with persistent volumes:

$ helm install --name hadoop $(stable/hadoop/tools/calc_resources.sh 50) \
  --set persistence.nameNode.enabled=true \
  --set persistence.nameNode.storageClass=standard \
  --set persistence.dataNode.enabled=true \
  --set persistence.dataNode.storageClass=standard \
  stable/hadoop

Change the value of storageClass to match your volume driver. The standard class works for Google Container Engine (now Google Kubernetes Engine) clusters.
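If you prefer not to repeat the --set flags, the same persistence settings can be kept in a values file. The sketch below assumes a file name of hadoop-persistence-values.yaml, which is arbitrary, and reuses the standard storage class from the example above; substitute the class your cluster provides.

```shell
# Write the persistence overrides to a values file (the file name is an arbitrary choice).
cat > hadoop-persistence-values.yaml <<'EOF'
persistence:
  nameNode:
    enabled: true
    storageClass: standard
  dataNode:
    enabled: true
    storageClass: standard
EOF

# Then pass the file to Helm with -f (Helm 2 syntax, as used elsewhere in this README):
# helm install --name hadoop -f hadoop-persistence-values.yaml stable/hadoop
```

The install command is left commented out so the file can be reviewed before anything is deployed.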

Configuration

The following table lists the configurable parameters of the Hadoop chart and their default values.

| Parameter | Description | Default |
| --- | --- | --- |
| image.repository | Hadoop image (source) | danisla/hadoop |
| image.tag | Hadoop image tag | 2.9.0 |
| image.pullPolicy | Pull policy for the images | IfNotPresent |
| hadoopVersion | Version of Hadoop libraries being used | 2.9.0 |
| antiAffinity | Pod anti-affinity, hard or soft | hard |
| hdfs.nameNode.pdbMinAvailable | PDB for the HDFS NameNode | 1 |
| hdfs.nameNode.resources | Resources for the HDFS NameNode | requests:memory=256Mi,cpu=10m,limits:memory=2048Mi,cpu=1000m |
| hdfs.dataNode.replicas | Number of HDFS DataNode replicas | 1 |
| hdfs.dataNode.pdbMinAvailable | PDB for the HDFS DataNode | 1 |
| hdfs.dataNode.resources | Resources for the HDFS DataNode | requests:memory=256Mi,cpu=10m,limits:memory=2048Mi,cpu=1000m |
| hdfs.webhdfs.enabled | Enable WebHDFS REST API | false |
| yarn.resourceManager.pdbMinAvailable | PDB for the YARN ResourceManager | 1 |
| yarn.resourceManager.resources | Resources for the YARN ResourceManager | requests:memory=256Mi,cpu=10m,limits:memory=2048Mi,cpu=1000m |
| yarn.nodeManager.pdbMinAvailable | PDB for the YARN NodeManager | 1 |
| yarn.nodeManager.replicas | Number of YARN NodeManager replicas | 2 |
| yarn.nodeManager.parallelCreate | Create all NodeManager StatefulSet pods in parallel (K8S 1.7+) | false |
| yarn.nodeManager.resources | Resource limits and requests for YARN NodeManager pods | requests:memory=2048Mi,cpu=1000m,limits:memory=2048Mi,cpu=1000m |
| persistence.nameNode.enabled | Enable/disable persistent volume | false |
| persistence.nameNode.storageClass | Name of the StorageClass to use per your volume provider | - |
| persistence.nameNode.accessMode | Access mode for the volume | ReadWriteOnce |
| persistence.nameNode.size | Size of the volume | 50Gi |
| persistence.dataNode.enabled | Enable/disable persistent volume | false |
| persistence.dataNode.storageClass | Name of the StorageClass to use per your volume provider | - |
| persistence.dataNode.accessMode | Access mode for the volume | ReadWriteOnce |
| persistence.dataNode.size | Size of the volume | 200Gi |
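Any parameter in the table can be overridden at install time with --set. As a minimal sketch, the snippet below assembles such a command from a list of key=value pairs and prints it for review rather than running it. The parameter names come from the table; the chosen values (3 DataNodes, WebHDFS on, 4 NodeManagers) are hypothetical.

```shell
# Build one --set flag per override. The values here are hypothetical examples.
set_flags=""
for kv in hdfs.dataNode.replicas=3 hdfs.webhdfs.enabled=true yarn.nodeManager.replicas=4; do
  set_flags="$set_flags --set $kv"
done

# Print the command instead of executing it, so it can be checked first.
helm_cmd="helm install --name hadoop$set_flags stable/hadoop"
echo "$helm_cmd"
```

Copy the printed command into your shell once the overrides look right.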

Related charts

The Zeppelin Notebook chart can reuse the Hadoop configuration for the Hadoop cluster and use the YARN executor:

$ helm install --set hadoop.useConfigMap=true stable/zeppelin
