
hadoop-cluster-docker's Introduction

Run Hadoop Cluster within Docker Containers

[architecture diagram]

A. 3-Node Hadoop Cluster

1. Pull the Docker image
sudo docker pull kiwenlau/hadoop:1.0
2. Clone the GitHub repository
git clone https://github.com/kiwenlau/hadoop-cluster-docker
3. Create the hadoop network
sudo docker network create --driver=bridge hadoop
4. Start the containers
cd hadoop-cluster-docker
sudo ./start-container.sh

output:

start hadoop-master container...
start hadoop-slave1 container...
start hadoop-slave2 container...
root@hadoop-master:~# 
  • starts 3 containers: 1 master and 2 slaves
  • you are dropped into the /root directory of the hadoop-master container (see the sketch below for what the script roughly does)
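
For reference, start-container.sh essentially runs Docker commands like the following (a simplified sketch based on the script excerpt quoted in the issues further down; the real script also loops over the slaves and attaches to the master at the end):

# remove any old master container, then start the master with the web UI ports published
sudo docker rm -f hadoop-master &> /dev/null
sudo docker run -itd --net=hadoop -p 50070:50070 -p 8088:8088 \
    --name hadoop-master --hostname hadoop-master kiwenlau/hadoop:1.0 &> /dev/null

# start the slaves on the same network (shown for slave1; slave2 is analogous)
sudo docker rm -f hadoop-slave1 &> /dev/null
sudo docker run -itd --net=hadoop \
    --name hadoop-slave1 --hostname hadoop-slave1 kiwenlau/hadoop:1.0 &> /dev/null

# drop into the master container
sudo docker exec -it hadoop-master bash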
5. Start Hadoop
./start-hadoop.sh
6. Run the wordcount example
./run-wordcount.sh

output:

input file1.txt:
Hello Hadoop

input file2.txt:
Hello Docker

wordcount output:
Docker    1
Hadoop    1
Hello    2
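
For reference, run-wordcount.sh is roughly equivalent to the following commands run inside hadoop-master (a sketch; the exact examples jar name depends on the Hadoop version baked into the image):

# create the input files and copy them into HDFS
echo "Hello Hadoop" > file1.txt
echo "Hello Docker" > file2.txt
hdfs dfs -mkdir -p input
hdfs dfs -put file*.txt input

# run the bundled wordcount example and print the result
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar wordcount input output
hdfs dfs -cat output/part-r-00000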

B. Arbitrary-Size Hadoop Cluster

1. Pull the Docker image and clone the GitHub repository

Follow steps 1~3 of section A.

2. Rebuild the Docker image
sudo ./resize-cluster.sh 5
  • the parameter specifies the cluster size N (N > 1), e.g. 2, 3, ..., 5
  • this script simply rebuilds the hadoop image with a different slaves file, which lists the hostnames of all slave nodes (see the sketch at the end of this section)
3. Start the containers
sudo ./start-container.sh 5
  • use the same parameter as in step 2
4. Run the Hadoop cluster

Follow steps 5~6 of section A.
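
Conceptually, resize-cluster.sh regenerates the slaves file for the requested size and rebuilds the image, roughly like the sketch below (file locations are assumptions, not the exact script):

N=5
# list the N-1 slave hostnames in the slaves file that gets baked into the image
> slaves
for i in $(seq 1 $((N - 1))); do
  echo hadoop-slave$i >> slaves
done
# rebuild the image so Hadoop's configuration knows about all the slaves
sudo docker build -t kiwenlau/hadoop:1.0 .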

hadoop-cluster-docker's People

Contributors

arindamchoudhury, kiwenlau, lancergr, snowch, xiaoleigua


hadoop-cluster-docker's Issues

org.apache.hadoop.hbase.PleaseHoldException: Master is initializing

hbase-env.sh

# Set environment variables here.

# This script sets variables multiple times over the course of starting an hbase process,
# so try to keep things idempotent unless you want to take an even deeper look
# into the startup scripts (bin/hbase, etc.)

# The java implementation to use.  Java 1.7+ required.
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64

# Extra Java CLASSPATH elements.  Optional.
 export HBASE_CLASSPATH=/usr/local/hadoop/etc/hadoop/

# The maximum amount of heap to use. Default is left to JVM default.
# export HBASE_HEAPSIZE=1G

# Uncomment below if you intend to use off heap cache. For example, to allocate 8G of 
# offheap, set the value to "8G".
# export HBASE_OFFHEAPSIZE=1G

# Extra Java runtime options.
# Below are what we set by default.  May only work with SUN JVM.
# For more on why as well as other possible settings,
# see http://wiki.apache.org/hadoop/PerformanceTuning
export HBASE_OPTS="-XX:+UseConcMarkSweepGC"

# Configure PermSize. Only needed in JDK7. You can safely remove it for JDK8+
export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m"
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m"

# Uncomment one of the below three options to enable java garbage collection logging for the server-side processes.

# This enables basic gc logging to the .out file.
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"

# This enables basic gc logging to its own file.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH>"

# This enables basic GC logging to its own file with automatic log rolling. Only applies to jdk 1.6.0_34+ and 1.7.0_2+.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M"

# Uncomment one of the below three options to enable java garbage collection logging for the client processes.

# This enables basic gc logging to the .out file.
# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"

# This enables basic gc logging to its own file.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH>"

# This enables basic GC logging to its own file with automatic log rolling. Only applies to jdk 1.6.0_34+ and 1.7.0_2+.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M"

# See the package documentation for org.apache.hadoop.hbase.io.hfile for other configurations
# needed setting up off-heap block caching. 

# Uncomment and adjust to enable JMX exporting
# See jmxremote.password and jmxremote.access in $JRE_HOME/lib/management to configure remote password access.
# More details at: http://java.sun.com/javase/6/docs/technotes/guides/management/agent.html
# NOTE: HBase provides an alternative JMX implementation to fix the random ports issue, please see JMX
# section in HBase Reference Guide for instructions.

# export HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false"
# export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10101"
# export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10102"
# export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10103"
# export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10104"
# export HBASE_REST_OPTS="$HBASE_REST_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10105"

# File naming hosts on which HRegionServers will run.  $HBASE_HOME/conf/regionservers by default.
# export HBASE_REGIONSERVERS=${HBASE_HOME}/conf/regionservers

# Uncomment and adjust to keep all the Region Server pages mapped to be memory resident
#HBASE_REGIONSERVER_MLOCK=true
#HBASE_REGIONSERVER_UID="hbase"

# File naming hosts on which backup HMaster will run.  $HBASE_HOME/conf/backup-masters by default.
# export HBASE_BACKUP_MASTERS=${HBASE_HOME}/conf/backup-masters

# Extra ssh options.  Empty by default.
# export HBASE_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HBASE_CONF_DIR"

# Where log files are stored.  $HBASE_HOME/logs by default.
# export HBASE_LOG_DIR=${HBASE_HOME}/logs

# Enable remote JDWP debugging of major HBase processes. Meant for Core Developers 
# export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8070"
# export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8071"
# export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8072"
# export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8073"

# A string representing this instance of hbase. $USER by default.
# export HBASE_IDENT_STRING=$USER

# The scheduling priority for daemon processes.  See 'man nice'.
# export HBASE_NICENESS=10

# The directory where pid files are stored. /tmp by default.
# export HBASE_PID_DIR=/var/hadoop/pids

# Seconds to sleep between slave commands.  Unset by default.  This
# can be useful in large clusters, where, e.g., slave rsyncs can
# otherwise arrive faster than the master can service them.
# export HBASE_SLAVE_SLEEP=0.1

# Tell HBase whether it should manage it's own instance of Zookeeper or not.
export HBASE_MANAGES_ZK=true

hbase-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://hadoop-master:9000/hbase</value>
  </property>
  <property>
     <name>hbase.cluster.distributed</name>
     <value>true</value>
  </property>
  <property>
      <name>hbase.master</name>
      <value>hadoop-master:60000</value>
  </property>
   <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/usr/local/zookeeper</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>hadoop-master,hadoop-slave1,hadoop-slave2,hadoop-slave3,hadoop-slave4</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
</configuration>
hbase(main):001:0> status

ERROR: org.apache.hadoop.hbase.PleaseHoldException: Master is initializing
	at org.apache.hadoop.hbase.master.HMaster.checkInitialized(HMaster.java:2293)
	at org.apache.hadoop.hbase.master.MasterRpcServices.getClusterStatus(MasterRpcServices.java:777)
	at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:55652)
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2180)
	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
	at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
	at java.lang.Thread.run(Thread.java:745)

Here is some help for this command:
Show cluster status. Can be 'summary', 'simple', 'detailed', or 'replication'. The
default is 'summary'. Examples:

  hbase> status
  hbase> status 'simple'
  hbase> status 'summary'
  hbase> status 'detailed'
  hbase> status 'replication'
  hbase> status 'replication', 'source'
  hbase> status 'replication', 'sink'
root@hadoop-master:~# jps
2116 Jps
1320 HQuorumPeer
222 NameNode
1430 HMaster
696 ResourceManager
1737 Main
467 SecondaryNameNode
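
A hedged first debugging step for "Master is initializing": the HMaster log usually names the real blocker (HDFS not up, ZooKeeper quorum unreachable, or hostname resolution), so inspect it and check the dependencies from inside hadoop-master (the log file name below is an assumption; adjust to your HBASE_LOG_DIR):

# look for the root cause in the HMaster log
tail -n 200 $HBASE_HOME/logs/hbase-*-master-*.log

# confirm HDFS is healthy and the DataNodes have registered
hdfs dfsadmin -report

# confirm ZooKeeper answers on the quorum port (requires netcat in the container)
echo ruok | nc hadoop-master 2181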

Running my MapReduce job gives the following error

When I run

hadoop jar autoComplete.jar src.autoComplete.Driver input output 5

It takes a long time before the log output changes, and then it gets stuck in this retry loop:

17/11/28 12:07:02 INFO ipc.Client: Retrying connect to server: hadoop-master/172.18.0.2:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
17/11/28 12:07:03 INFO ipc.Client: Retrying connect to server: hadoop-master/172.18.0.2:8032. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
17/11/28 12:07:04 INFO ipc.Client: Retrying connect to server: hadoop-master/172.18.0.2:8032. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
17/11/28 12:07:05 INFO ipc.Client: Retrying connect to server: hadoop-master/172.18.0.2:8032. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

What's wrong here? Thanks.
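
The retries against hadoop-master:8032 mean the YARN ResourceManager is not reachable. A quick hedged check from inside hadoop-master:

# is the ResourceManager actually running?
jps | grep ResourceManager

# if not, (re)start YARN (these scripts ship with Hadoop 2.x) and rerun the job
$HADOOP_HOME/sbin/stop-yarn.sh
$HADOOP_HOME/sbin/start-yarn.sh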

Problem with the start-container.sh script

The code in the script:

# start hadoop master container
sudo docker rm -f hadoop-master &> /dev/null
echo "start hadoop-master container..."
sudo docker run -itd \
        --net=hadoop \
        -p 50070:50070 \
        -p 8088:8088 \
        --name hadoop-master \
        --hostname hadoop-master \
        kiwenlau/hadoop:1.0 &> /dev/null

But what gets pulled is kiwenlau/hadoop-master:0.1.0, so which one should I actually use?
In the end I have 5 images:
[terminal screenshot]

Error executing start-container.sh?

: not foundiner.sh: 2: start-container.sh:
: not foundiner.sh: 5: start-container.sh:
: not foundiner.sh: 6: start-container.sh:
: Permission denied 8: start-container.sh: cannot create /dev/null
start hadoop-master container...
Error response from daemon: No such container: hadoop-master
docker: invalid reference format.
See 'docker run --help'.
start-container.sh: 11: start-container.sh: --net=hadoop: not found
start-container.sh: 12: start-container.sh: -p: not found
start-container.sh: 13: start-container.sh: -p: not found
start-container.sh: 14: start-container.sh: --name: not found
start-container.sh: 15: start-container.sh: --hostname: not found
: Permission denied 16: start-container.sh: cannot create /dev/null
: not foundiner.sh: 17: start-container.sh:
: not foundiner.sh: 18: start-container.sh:
start-container.sh: 31: start-container.sh: Syntax error: "done" unexpected (expecting "do")
start-container.sh: 16: start-container.sh: kiwenlau/hadoop:1.0: not found
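
The ": not found" and Syntax error: "done" unexpected messages are typical of a script saved with Windows (CRLF) line endings or executed with sh instead of bash. A hedged fix (assuming an Ubuntu/Debian host):

sudo apt-get install -y dos2unix
dos2unix start-container.sh
chmod +x start-container.sh
sudo bash ./start-container.sh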

slaves cannot be found

I typed the following:
root@master:~# serf members
master.kiwenlau.com 172.17.0.65:7946 alive

5 minutes later, still:

master.kiwenlau.com 172.17.0.65:7946 alive

The slaves don't seem to come up.

failed on connection exception: java.net.ConnectException: Connection refused

I am trying to run sequenceiq/hadoop-docker:2.4.1 in Docker as:

docker run -i -t sequenceiq/hadoop-docker:2.4.1 /etc/bootstrap.sh -bash

and I am getting the following error:

17/01/09 00:17:06 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
java.net.ConnectException: Call From be491b66d596/172.17.0.2 to be491b66d596:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
	at org.apache.hadoop.ipc.Client.call(Client.java:1414)
	at org.apache.hadoop.ipc.Client.call(Client.java:1363)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
	at com.sun.proxy.$Proxy14.delete(Unknown Source)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
at com.sun.proxy.$Proxy14.delete(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.delete(ClientNamenodeProtocolTranslatorPB.java:482)
	at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:1703)
	at org.apache.hadoop.hdfs.DistributedFileSystem$11.doCall(DistributedFileSystem.java:595)
	at org.apache.hadoop.hdfs.DistributedFileSystem$11.doCall(DistributedFileSystem.java:591)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:591)
	at org.apache.hadoop.examples.Grep.run(Grep.java:95)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.hadoop.examples.Grep.main(Grep.java:101)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
	at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145)
	at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
	at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:604)
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:699)
	at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:367)
	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1462)
	at org.apache.hadoop.ipc.Client.call(Client.java:1381)
	... 31 more

Any idea why I am getting this error?

I am currently working on CentOS 7 with kernel version:
3.10.0-514.2.2.el7.x86_64

How to delete the HDFS data in Docker containers

I run 'hadoop-master' with a local folder mounted via '-v'. Then I enter hadoop-master, 'cd' into the mounted folder, and run 'hdfs dfs -put ./data/* input/'. That works.

But my problem is that I cannot delete the data I copied into HDFS. I delete the containers with 'docker rm', but the data still exists. Currently I can only delete the data by resetting Docker.

Is there any other solution?
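
A hedged alternative: delete the data through HDFS itself while the cluster is running, and note that a directory bind-mounted with -v lives on the host, so removing the containers never deletes it:

# remove the uploaded data from HDFS (the default home directory for root is assumed)
hdfs dfs -rm -r /user/root/input

# removing containers does not touch a -v bind mount; delete the host directory explicitly
sudo docker rm -f hadoop-master hadoop-slave1 hadoop-slave2
sudo rm -rf /path/to/your/mounted/folder    # hypothetical host path from your -v option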

This is my docker info

➜  hadoop docker info
Containers: 5
 Running: 5
 Paused: 0
 Stopped: 0
Images: 1
Server Version: 1.12.3
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 22
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: null bridge host overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: seccomp
Kernel Version: 4.4.27-moby
Operating System: Alpine Linux v3.4
OSType: linux
Architecture: x86_64
CPUs: 5
Total Memory: 11.71 GiB
Name: moby
ID: NPR6:2ZTU:CREI:BHWE:4TQI:KFAC:TZ4P:S5GM:5XUZ:OKBH:NR5C:NI4T
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 56
 Goroutines: 81
 System Time: 2016-11-22T08:10:37.120826598Z
 EventsListeners: 2
Username: chaaaa
Registry: https://index.docker.io/v1/
WARNING: No kernel memory limit support
Insecure Registries:
 127.0.0.0/8

Hadoop streaming

How can I use Hadoop streaming with this cluster? I can't find the jar for it
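
In Hadoop 2.x the streaming jar ships inside the distribution under share/hadoop/tools/lib, so inside hadoop-master something like the following should work (a sketch; the mapper/reducer below are placeholders):

ls $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
    -input input -output streaming-output \
    -mapper /bin/cat -reducer "/usr/bin/wc -l"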

Can't upload via the Java API

I opened ports 9000 and 8032 and used the Java API to upload a file. It reports: could only be replicated to 0 nodes instead of minReplication (=1). There are 2 datanode(s) running and 2 node(s) are excluded in this operation.

I don't understand whether this is a Hadoop configuration problem or something else. The same thing happens with sequenceiq/hadoop-docker, and with a Hadoop 2.9.1 cluster I set up myself.

Thank you.

slaves cannot be found

On OS X, the master cannot find its default 2 slaves, slave1.kiwenlau.com and slave2.kiwenlau.com. The two hostnames must be added to /etc/hosts:
172.17.0.2 slave1.kiwenlau.com slave1

serf members

root@master:~# serf members
Error connecting to Serf agent: dial tcp 127.0.0.1:7373: connection refused

./start-ssh-serf.sh: line 9: /etc/serf/start-serf-agent.sh: Permission denied

Multi-host cluster, Apache Spark Cluster and Production Ready

I tried this out on my local machine with a 5-node cluster and it was fantastic. However, I also want to set up a Spark cluster on the Docker images, as well as a multi-host cluster for high availability.

E.g. have a 5-node cluster per physical host across 3 physical hosts and have them communicate with each other.

I would also like to know whether this image is production ready.

Thank you

java.net.UnknownHostException: hadoop-slave2.hadoop

Accessing the HBase cluster via the Java API.

Caused by: java.net.UnknownHostException: hadoop-slave2.hadoop
	at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.<init>(AbstractRpcClient.java:315)
	at org.apache.hadoop.hbase.ipc.AbstractRpcClient.createBlockingRpcChannel(AbstractRpcClient.java:267)
	at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getClient(ConnectionManager.java:1639)
	at org.apache.hadoop.hbase.client.ScannerCallable.prepare(ScannerCallable.java:163)
	at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.prepare(ScannerCallableWithReplicas.java:376)
	at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:134)
	... 4 more
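
The HBase client looks up region servers by the hostnames registered in ZooKeeper, so the machine running the Java client must be able to resolve the container names itself. A hedged workaround is to add them to the client machine's /etc/hosts (the IPs below are the defaults of the hadoop bridge network and may differ on your setup):

172.18.0.2  hadoop-master   hadoop-master.hadoop
172.18.0.3  hadoop-slave1   hadoop-slave1.hadoop
172.18.0.4  hadoop-slave2   hadoop-slave2.hadoop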

Question about automating Hadoop cluster deployment with Docker

Hi,
I forked your project for further development, but I ran into a few small issues that I'd like to discuss with you:

  1. I want to deploy Hadoop nodes quickly with a Docker cluster, but how can the nodes communicate with each other password-free when every container has a different ssh key?
  2. I want to adjust the number of nodes in the Hadoop cluster automatically and scale out machines automatically. How should I go about that?

My preliminary plan for this feels like it could solve the problem, but it is not very elegant, and I don't know what issues I might run into.

  1. After some searching I found a framework that handles this kind of configuration well, named ambari-server, but I'm not very familiar with it. If you have studied it more, I would appreciate some guidance.

Thanks!

Running wordcount gives the following error

17/04/03 18:03:31 INFO ipc.Client: Retrying connect to server: hadoop-master/172.18.0.2:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
17/04/03 18:03:32 INFO ipc.Client: Retrying connect to server: hadoop-master/172.18.0.2:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
17/04/03 18:03:33 INFO ipc.Client: Retrying connect to server: hadoop-master/172.18.0.2:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
17/04/03 18:03:34 INFO ipc.Client: Retrying connect to server: hadoop-master/172.18.0.2:8032. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
17/04/03 18:03:35 INFO ipc.Client: Retrying connect to server: hadoop-master/172.18.0.2:8032. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
17/04/03 18:03:36 INFO ipc.Client: Retrying connect to server: hadoop-master/172.18.0.2:8032. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

Could not find or load main class com.sun.tools.javac.Main

When running my own MapReduce job, I hit this problem. I have already changed JAVA_HOME in hadoop-env.sh from JDK 7 to JDK 8:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

Below is the error message:
root@hadoop-master:~/src# hadoop com.sun.tools.javac.Main *.java
Error: Could not find or load main class com.sun.tools.javac.Main
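
com.sun.tools.javac.Main lives in the JDK's tools.jar, which is not on the Hadoop classpath by default. A hedged fix, assuming the full JDK 8 (not just the JRE) is installed at the path set in hadoop-env.sh:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_CLASSPATH=$JAVA_HOME/lib/tools.jar
hadoop com.sun.tools.javac.Main *.java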

cannot run wordcount

I followed the instructions and successfully started Hadoop using ./start-hadoop.sh. But when I run the wordcount example, I get these messages and it just stops there:

root@hadoop-master:~# ./run-wordcount.sh
mkdir: cannot create directory 'input': File exists
16/12/05 15:03:01 INFO client.RMProxy: Connecting to ResourceManager at hadoop-master/172.18.0.2:8032
16/12/05 15:03:02 INFO input.FileInputFormat: Total input paths to process : 2
16/12/05 15:03:02 INFO mapreduce.JobSubmitter: number of splits:2
16/12/05 15:03:02 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1480950171648_0001
16/12/05 15:03:02 INFO impl.YarnClientImpl: Submitted application application_1480950171648_0001
16/12/05 15:03:02 INFO mapreduce.Job: The url to track the job: http://hadoop-master:8088/proxy/application_1480950171648_0001/
16/12/05 15:03:02 INFO mapreduce.Job: Running job: job_1480950171648_0001

multi-host hadoop cluster

This project only works on a single host. How can it be used across multiple hosts? I have tried different implementations found on the web, but nothing is straightforward.

About compiling Hadoop

I saw your Hadoop install command in the hadoop-base Dockerfile; it is just an ln command instead of an actual install command:
RUN ln -s /usr/local/hadoop-2.3.0 /usr/local/hadoop

You must have compiled Hadoop yourself, and I found the compile steps in your blog: http://www.cnblogs.com/kiwenlau/p/4227204.html

Maybe the Dockerfile needs an ADD command:
ADD xxx /usr/local/hadoop-2.3.0

Accessing HDFS outside docker

Hi,
I have your setup up and running; run-wordcount.sh ran smoothly without any issues. However, the problem I have is that the NameNode and DataNodes are "hidden" behind their network. Ports 50070 & 8088 are open, but is that enough to access this HDFS from my local machine (outside the Docker network that was created)? Probably not.

In other words, I'm looking for ability to

  1. instantiate Hadoop with HDFS on Docker
  2. access HDFS from my machine (outside of docker)

Any hints?
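
Indeed, ports 50070 and 8088 only expose the web UIs. A hedged sketch of what is additionally needed: publish the NameNode RPC port when starting the master, and make the client able to reach and resolve the DataNodes, because the HDFS client talks to them directly:

# start the master with the NameNode RPC port published as well (extra -p 9000:9000)
sudo docker run -itd --net=hadoop \
    -p 50070:50070 -p 8088:8088 -p 9000:9000 \
    --name hadoop-master --hostname hadoop-master kiwenlau/hadoop:1.0

# on the client side, point at the published port, e.g. fs.defaultFS=hdfs://<docker-host>:9000,
# and either add the slave hostnames to /etc/hosts or set dfs.client.use.datanode.hostname=true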

Cannot find hadoop-streaming*.jar

Running
$HADOOP_HOME/bin/hadoop jar
$HADOOP_HOME/mapred/contrib/streaming/hadoop-streaming*.jar
fails with: No such file or directory.
What is going on? I'm a beginner, please advise.

Pulling repository docker.io/kiwenlau/hadoop failed

When pulling the repository, the following errors appear:

docker@ubuntu:~$ sudo docker pull kiwenlau/hadoop:1.0

[sudo] password for docker:

Pulling repository docker.io/kiwenlau/hadoop

Tag 1.0 not found in repository docker.io/kiwenlau/hadoop

How can Java access HDFS inside Docker?

I want to operate on HDFS from Java:

FileSystem fs = FileSystem.get(new URI("hdfs://172.18.0.2:9000/"), configuration, "root");
System.out.println("begin copy");
fs.copyFromLocalFile(new Path("/Users/xxx/apps/test/test.log"), new Path("/"));
System.out.println("done!");

Using the hadoop-master container's IP, I cannot create files on HDFS.
Following the start script, I added a 0.0.0.0:9000 -> 9000/tcp mapping from the host to port 9000 of hadoop-master; with hdfs://localhost:9000/ I can create files, but their size is 0.
Please advise, thanks!

docker build image error

Building the Docker hadoop image:

Sending build context to Docker daemon 305.2 kB
Step 1/15 : FROM ubuntu:14.04
14.04: Pulling from library/ubuntu
c60055a51d74: Pull complete
755da0cdb7d2: Pull complete
969d017f67e6: Pull complete
37c9a9113595: Pull complete
a3d9f8479786: Pull complete
Digest: sha256:8f5f12335124c1b78e4cf2f8860d395f75ba279bae70a3c18dd470e910e38ec5
Status: Downloaded newer image for ubuntu:14.04
---> b969ab9f929b
Step 2/15 : MAINTAINER KiwenLau [email protected]
mkdir /var/lib/docker/overlay/0c7a9394e523f14a0f16f8f940b14702c7454ecd04b0aeb1dfe85a4373a7e111-init/merged/dev/shm: invalid argument

docker info:
Containers: 2
Running: 0
Paused: 0
Stopped: 2
Images: 3
Server Version: 1.13.1
Storage Driver: overlay
Backing Filesystem: extfs
Supports d_type: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: aa8187dbd3b7ad67d8e5e3a15115d3eef43a7ed1
runc version: 9df8b306d01f59d3a8029be411de015b7304dd8f
init version: 949e6fa
Security Options:
seccomp
Profile: default
Kernel Version: 3.10.0-327.22.2.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.51 GiB
Name: iZ2zegz864acs9ifmprfaqZ
ID: CC4X:3HWP:UKBS:4XQK:2FFT:YJ32:5WOS:JTN5:W6KX:7FXX:SAAE:YKC4
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Experimental: false
Insecure Registries:
127.0.0.0/8
Registry Mirrors:
https://zd8ozs9s.mirror.aliyuncs.com
Live Restore Enabled: false

Linux version: CentOS 7

Problems connecting to the master when running "./run-wordcount.sh" on the master

All steps were successful, but when I try to run ./run-wordcount.sh on the master I receive the following messages:

15/09/18 15:34:14 INFO client.RMProxy: Connecting to ResourceManager at master.kiwenlau.com/172.17.0.8:8040
15/09/18 15:34:16 INFO ipc.Client: Retrying connect to server: master.kiwenlau.com/172.17.0.8:8040. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
...
15/09/18 15:39:53 INFO ipc.Client: Retrying connect to server: master.kiwenlau.com/172.17.0.8:8040. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

Then I send a SIGINT signal and receive the following messages:

input file1.txt:
Hello Hadoop

input file2.txt:
Hello Docker

wordcount output:
cat: `output/part-r-00000': No such file or directory

Logs

Are there other kinds of logs that can help me see what's going on? I could not use YARN.

Error when starting the containers

The error is as follows:

[root@computer002 hadoop-cluster-docker]# ./start-container.sh 
start master container...
start slave1 container...
start slave2 container...
FATA[0000] Error response from daemon: Container master is not running 

docker info is as follows:

[root@computer002 hadoop-cluster-docker]# docker info
Containers: 5
Images: 71
Storage Driver: devicemapper
 Pool Name: docker-253:0-2494751-pool
 Pool Blocksize: 65.54 kB
 Backing Filesystem: extfs
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 4.494 GB
 Data Space Total: 107.4 GB
 Metadata Space Used: 5.648 MB
 Metadata Space Total: 2.147 GB
 Udev Sync Supported: true
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.89-RHEL6 (2014-09-01)
Execution Driver: native-0.2
Kernel Version: 2.6.32-504.23.4.el6.x86_64
Operating System: <unknown>
CPUs: 12
Total Memory: 23.39 GiB
Name: computer002
ID: 7N3H:JKBG:43WH:SOJM:PSJM:MJEP:V37S:ZHMP:2YPK:QL74:PT4C:FUB3

Error: image kiwenlau/serf-dnsmasq not found

sudo docker pull index.alauda.cn/kiwenlau/serf-dnsmasq

but,

Pulling repository index.alauda.cn/kiwenlau/serf-dnsmasq
2015/06/09 05:29:32 Error: image kiwenlau/serf-dnsmasq not found

-sh: ./start-container.sh: not found

When I try to run start-container.sh I get this error. I am new to Docker and I don't know whether this error belongs here. It looks like docker-machine doesn't have bash.

Image needs to be updated

When I pull your image, I am getting the following error:

Status: Downloaded newer image for kiwenlau/hadoop-master:0.1.0 docker.io/kiwenlau/hadoop-master: this image was pulled from a legacy registry. Important: This registry version will not be supported in future versions of docker.

I think this will fix it: ansible/ansible-modules-core#2351

Hadoop-cluster-docker inquiries

Hi

I'm confused about which files are inside the Docker image and how I could take a look at them before cloning. Can you guide me through how you built your Dockerfile images? I'm very confused. Looking forward to hearing from you soon!

Local machine Master and docker slaves

If I want to bring up only the slaves in Docker and have my local machine be the master, is that possible? How?
Example:
My notebook - master
Docker image - 3 slaves

hadoop-streaming issue

Hi,
This repo really helps me a lot, thanks for your kindness.
However, when I run a Python script with the hadoop-streaming jar, it stops at map 100% reduce 0%,
and when I go to the dashboard I find the reducer is stuck at STARTING, while the wordcount example works perfectly.

hadoop jar hadoop-streaming.jar \
-files mapper.py,reducer.py \
-mapper "python mapper.py" \
-reducer "python reducer.py" \
-input streaming-data \
-output streaming-output

When I run it with a local command, it also works:

cat data/* | python mapper.py | sort | python reducer.py

http://www.sunlab.org/teaching/cse6250/fall2017/lab/hadoop-streaming/

Where can I find an official Python example, to clarify where the problem is?

Comments and a doubt about Docker volume and HDFS

I've read many articles about Hadoop running on Docker and, like Mathew said, "this is a unique one"; indeed, the strategy adopted is very good for building a big data setup with low cost and low complexity.

Thank you very much for sharing this!

But I'm left with a doubt... Is there any problem with using the Docker volume feature to "share" the HDFS folder of hadoop-master with the host, in order to insert new files and make backups?

This doubt came up after reading the article Understanding Volumes in Docker by Adrian Mouat.

Error when running './start-container.sh'

start hadoop-master container...
start hadoop-slave1 container...
start hadoop-slave2 container...
Error response from daemon: Container 2bfaaa643f18a33c9bb018140b91af35b68c777b3f8c1c5d4081af84e0b8af5a is not running

start-hadoop doesn't start the NodeManager and DataNode, only the ResourceManager

start-hadoop.sh doesn't start the NodeManager and DataNode; it only starts the ResourceManager:

root@hadoop-master:# $HADOOP_HOME/sbin/start-dfs.sh
Starting namenodes on [hadoop-master]
hadoop-master: Warning: Permanently added 'hadoop-master,172.18.0.2' (ECDSA) to the list of known hosts.
hadoop-master: Welcome to Ubuntu 14.04 LTS (GNU/Linux 3.13.0-86-generic x86_64)
hadoop-master:
hadoop-master: * Documentation: https://help.ubuntu.com/
hadoop-master:
hadoop-master: The programs included with the Ubuntu system are free software;
hadoop-master: the exact distribution terms for each program are described in the
hadoop-master: individual files in /usr/share/doc//copyright.
hadoop-master:
hadoop-master: Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
hadoop-master: applicable law.
hadoop-master:
hadoop-slave2: Warning: Permanently added 'hadoop-slave2,172.18.0.4' (ECDSA) to the list of known hosts.
hadoop-slave1: Warning: Permanently added 'hadoop-slave1,172.18.0.3' (ECDSA) to the list of known hosts.
hadoop-slave1: Welcome to Ubuntu 14.04 LTS (GNU/Linux 3.13.0-86-generic x86_64)
hadoop-slave1:
hadoop-slave1: * Documentation: https://help.ubuntu.com/
hadoop-slave1:
hadoop-slave1: The programs included with the Ubuntu system are free software;
hadoop-slave1: the exact distribution terms for each program are described in the
hadoop-slave1: individual files in /usr/share/doc/*/copyright.
hadoop-slave1:
hadoop-slave1: Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
hadoop-slave1: applicable law.
hadoop-slave1:
hadoop-slave2: Welcome to Ubuntu 14.04 LTS (GNU/Linux 3.13.0-86-generic x86_64)
hadoop-slave2:
hadoop-slave2: * Documentation: https://help.ubuntu.com/
hadoop-slave2:
hadoop-slave2: The programs included with the Ubuntu system are free software;
hadoop-slave2: the exact distribution terms for each program are described in the
hadoop-slave2: individual files in /usr/share/doc/*/copyright.
hadoop-slave2:
hadoop-slave2: Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
hadoop-slave2: applicable law.
hadoop-slave2:
Starting secondary namenodes [0.0.0.0]
0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
0.0.0.0: Welcome to Ubuntu 14.04.4 LTS (GNU/Linux 4.1.12-61.1.28.el6uek.x86_64 x86_64)
0.0.0.0:
0.0.0.0: * Documentation: https://help.ubuntu.com/
root@hadoop-master:# ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 21:11 ? 00:00:00 sh -c service ssh start; bash
root 33 1 0 21:11 ? 00:00:00 /usr/sbin/sshd
root 36 1 0 21:11 ? 00:00:00 bash
root 46 0 0 21:11 ? 00:00:00 bash
root 400 46 0 21:12 ? 00:00:00 ps -ef

fail to run start-container.sh

error message :

zeegin@zeegin-Virtual-Machine:~/hadoop-cluster-docker$ ./start-container.sh
start hadoop-master container...
start hadoop-slave1 container...
start hadoop-slave2 container...
Error response from daemon: Container ebcfe89abd2ffa3c038b6960d279e95e3b3e2426cd002406422facfd8c09b04b is not running

docker info:

zeegin@zeegin-Virtual-Machine:~/hadoop-cluster-docker$ docker info
Containers: 3
 Running: 0
 Paused: 0
 Stopped: 3
Images: 1
Server Version: 1.12.6
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 18
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: host overlay bridge null
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: apparmor seccomp
Kernel Version: 4.8.0-46-generic
Operating System: Ubuntu 16.04.2 LTS
OSType: linux
Architecture: i686
CPUs: 1
Total Memory: 998.4 MiB
Name: zeegin-Virtual-Machine
ID: 4TFL:RFFY:DZIF:5ATO:OJRP:5RAD:HLXT:X6ZH:VIJN:426D:7WXK:V6JG
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Insecure Registries:
 127.0.0.0/8

This issue looks like this one: #2

But it still fails after a reboot.

Running the ./run-wordcount.sh script

root@hadoop-master:~# ./run-wordcount.sh
18/12/04 03:48:57 INFO client.RMProxy: Connecting to ResourceManager at hadoop-master/172.18.0.2:8032
18/12/04 03:48:58 INFO input.FileInputFormat: Total input paths to process : 2
18/12/04 03:48:58 INFO mapreduce.JobSubmitter: number of splits:2
18/12/04 03:48:58 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1543895269322_0001
18/12/04 03:48:59 INFO impl.YarnClientImpl: Submitted application application_1543895269322_0001
18/12/04 03:48:59 INFO mapreduce.Job: The url to track the job: http://hadoop-master:8088/proxy/application_1543895269322_0001/
18/12/04 03:48:59 INFO mapreduce.Job: Running job: job_1543895269322_0001
18/12/04 03:49:10 INFO mapreduce.Job: Job job_1543895269322_0001 running in uber mode : false
18/12/04 03:49:10 INFO mapreduce.Job: map 0% reduce 0%
18/12/04 03:49:20 INFO mapreduce.Job: Task Id : attempt_1543895269322_0001_m_000001_0, Status : FAILED
Exception from container-launch.
Container id: container_1543895269322_0001_01_000003
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

I looked through the issues, but that did not solve this problem.
