big-data-europe / docker-hadoop
Apache Hadoop docker image
Question:
I can't put a file to hdfs://localhost:9000/work/test.txt.
The error log is as follows:
namenode_1 | java.io.IOException: File /work/test.txt could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
namenode_1 | at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1628)
namenode_1 | at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3121)
namenode_1 | at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3045)
namenode_1 | at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:725)
namenode_1 | at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:493)
namenode_1 | at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
namenode_1 | at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
namenode_1 | at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
namenode_1 | at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2217)
namenode_1 | at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2213)
namenode_1 | at java.security.AccessController.doPrivileged(Native Method)
namenode_1 | at javax.security.auth.Subject.doAs(Subject.java:422)
namenode_1 | at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
namenode_1 | at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2213)
Has anyone run into the same problem?
Thanks
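A common workaround for this class of error, assuming the client sits outside the compose network and can reach the namenode but not the datanode's transfer address: make both sides use hostnames instead of container IPs. The HDFS_CONF_ naming follows this repo's env-file convention; the datanode transfer port to publish depends on the Hadoop version (50010 for 2.x, 9866 for 3.x), so treat the details as a sketch, not a confirmed fix.

```
# hadoop.env: have the client and datanode address each other by hostname
HDFS_CONF_dfs_client_use_datanode_hostname=true
HDFS_CONF_dfs_datanode_use_datanode_hostname=true
# Also publish the datanode transfer port (9866 or 50010) in docker-compose
# and make the datanode hostname resolvable from the client machine.
```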
When I execute docker-compose up, it throws an exception. How can I solve it?
org.apache.hadoop.hdfs.server.common.IncorrectVersionException: Unexpected version of storage directory /hadoop/dfs/name. Reported: -64. Expecting = -63.
at org.apache.hadoop.hdfs.server.common.StorageInfo.setLayoutVersion(StorageInfo.java:179)
at org.apache.hadoop.hdfs.server.common.StorageInfo.setFieldsFromProperties(StorageInfo.java:131)
at org.apache.hadoop.hdfs.server.namenode.NNStorage.setFieldsFromProperties(NNStorage.java:626)
at org.apache.hadoop.hdfs.server.namenode.NNStorage.readProperties(NNStorage.java:655)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:386)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:225)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:978)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:685)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:585)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:645)
at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:819)
at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:803)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1500)
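The layout-version mismatch usually means the named volume still holds namenode metadata written by a different Hadoop version. A minimal recovery sketch, assuming the default compose project name docker-hadoop and that losing the stored HDFS data is acceptable; the volume names are assumptions, so check yours with `docker volume ls` first:

```shell
# Wipe the stale metadata volumes so the new image can reformat them.
# WARNING: this destroys all data in HDFS.
stale_volumes="docker-hadoop_namenode docker-hadoop_datanode"
if command -v docker-compose >/dev/null 2>&1; then
  docker-compose down
  docker volume rm $stale_volumes
  docker-compose up -d   # the namenode reformats /hadoop/dfs/name on first start
fi
```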
You changed the Hadoop namenode port in the 2.2.0 release but forgot to change it inside the environment file.
Thanks for sharing this wonderful git repo!
The installation fails with error:
---> Running in 0dd3cbd1a7c0
I fixed it by replacing the version with 3.2.0, but I am not sure whether it will ultimately succeed.
Is there an easy way to achieve this installation without hardcoding a specific software version?
thanks
Sriram
Hi,
I want to scale the Hadoop datanodes to 3, so I ran docker-compose up --scale datanode=3, but the resulting cluster still has only one datanode.
Could you please give me some suggestions on how to run the cluster with multiple datanodes?
Thanks,
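For `--scale datanode=3` to work, the datanode service definition must not pin anything instance-specific, otherwise compose cannot create more than one replica. A hedged sketch of what to avoid (image tag and port are assumptions; adapt to the compose file you actually use):

```
datanode:
  image: bde2020/hadoop-datanode:2.0.0-hadoop3.2.1-java8
  env_file:
    - ./hadoop.env
  # For scaling, do NOT set:
  #   container_name: ...     (replicas would collide on the container name)
  #   ports: - 9864:9864      (replicas would collide on the host port)
  #   one shared named volume for /hadoop/dfs/data across all replicas
```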
Hello,
I'm trying to put together a package using your framework.
First, let me say thank you: this is a great starting point and I appreciate the modularity.
I'm trying to access localhost:8088 but I'm unable to;
however, I can access the namenode on http://localhost:50070.
This seems similar to #4, although there wasn't much detail for that issue.
I'm on:
macOS 10.12.5
Docker CE
Version 17.06.0-ce-mac17 (18432)
Channel: edge
4bb7a7dfa0
I also have some other general questions, is there a chat (slack or something?) for this project?
Successful connection to 50070:
bash-3.2$ curl localhost:50070
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="REFRESH" content="0;url=dfshealth.html" />
<title>Hadoop Administration</title>
</head>
</html>
Connection refused:
bash-3.2$ curl localhost:8088
curl: (7) Failed to connect to localhost port 8088: Connection refused
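Port 8088 is the YARN ResourceManager web UI, so it only answers if a resourcemanager service is running and its port is published to the host. A hedged docker-compose sketch (the image tag is an assumption; match it to the other services in your file):

```
resourcemanager:
  image: bde2020/hadoop-resourcemanager:1.1.0-hadoop2.7.1-java8
  env_file:
    - ./hadoop.env
  ports:
    - 8088:8088   # YARN ResourceManager web UI
```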
Hi
I am getting the following error when I try to build the Dockerfile base in branch 2.0.0-hadoop3.1.1-java8.
Any suggestions would be appreciated.
Thanks!
gpg: no ultimately trusted keys found
gpg: Total number processed: 30
gpg: imported: 30 (RSA: 23)
Removing intermediate container a591c9a7764c
---> 5a39f77c4fe6
Step 7/21 : RUN gpg --keyserver pool.sks-keyservers.net --recv-key C36C5F0F
---> Running in 22cf96979a57
gpg: requesting key C36C5F0F from hkp server pool.sks-keyservers.net
gpgkeys: key C36C5F0F can't be retrieved
gpg: no valid OpenPGP data found.
gpg: Total number processed: 0
The command '/bin/sh -c gpg --keyserver pool.sks-keyservers.net --recv-key C36C5F0F' returned a non-zero code: 2
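The SKS pool behind pool.sks-keyservers.net became unreliable and was eventually shut down, so this failure is on the keyserver side rather than in the Dockerfile logic. A hypothetical patch for the failing step, pointing gpg at a keyserver that still answers (keyserver choice is an assumption):

```
# Replacement for Step 7/21
RUN gpg --keyserver hkps://keyserver.ubuntu.com --recv-key C36C5F0F
```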
I have set up a Docker swarm cluster and use the following configuration file to deploy an HDFS cluster on an overlay network named test in my swarm cluster.
version: '3'
services:
  namenode:
    image: bde2020/hadoop-namenode:2.0.0-hadoop3.1.2-java8
    volumes:
      - namenode:/hadoop/dfs/name
    environment:
      - CLUSTER_NAME=test
    env_file:
      - ./hadoop.env
    networks:
      - hbase
    ports:
      - 9870:9870
    deploy:
      mode: replicated
      replicas: 1
      restart_policy:
        condition: on-failure
      placement:
        constraints:
          - node.hostname == 10-20-28-111
      labels:
        traefik.docker.network: hbase
        traefik.port: 9870
  datanode:
    image: bde2020/hadoop-datanode:2.0.0-hadoop3.1.2-java8
    volumes:
      - datanode:/hadoop/dfs/data
    env_file:
      - ./hadoop.env
    networks:
      - hbase
    environment:
      SERVICE_PRECONDITION: "namenode:9000"
    deploy:
      restart_policy:
        condition: on-failure
      labels:
        traefik.docker.network: hbase
        traefik.port: 9866
      placement:
        constraints:
          - node.hostname == 10-20-28-111
volumes:
  datanode:
  namenode:
networks:
  hbase:
    external:
      name: test
After a successful deployment I can see the registered datanode in the namenode web UI,
but the datanode hostname and IP don't match. I can't access datanode port 9866
through the datanode IP, but I can access it via the datanode hostname.
Can you tell me why? Thank you!
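One plausible explanation: in swarm mode a service name resolves to a virtual IP, while the datanode registers its task (container) IP, so IP-based access crosses the routing mesh and fails. A hedged sketch of two settings often combined for this situation (endpoint_mode per the Compose v3 format, the HDFS property via this repo's env-file convention; treat both as assumptions to verify in your setup):

```
# docker-compose (swarm): resolve the service name to task IPs directly
datanode:
  deploy:
    endpoint_mode: dnsrr

# hadoop.env: make clients address datanodes by hostname, not registered IP
HDFS_CONF_dfs_client_use_datanode_hostname=true
```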
make wordcount failed in my environment with an UnknownHostException error.
How can I fix it?
Any help would be appreciated.
Thanks.
java.lang.IllegalArgumentException: java.net.UnknownHostException: historyserver
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:445)
at org.apache.hadoop.yarn.util.timeline.TimelineUtils.buildTimelineTokenService(TimelineUtils.java:163)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:186)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.hadoop.mapred.ResourceMgrDelegate.serviceInit(ResourceMgrDelegate.java:111)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.hadoop.mapred.ResourceMgrDelegate.<init>(ResourceMgrDelegate.java:105)
at org.apache.hadoop.mapred.YARNRunner.<init>(YARNRunner.java:152)
at org.apache.hadoop.mapred.YarnClientProtocolProvider.create(YarnClientProtocolProvider.java:34)
at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:130)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:109)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:102)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1540)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1536)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
at org.apache.hadoop.mapreduce.Job.connect(Job.java:1536)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1564)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1588)
at WordCount.main(WordCount.java:59)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:318)
at org.apache.hadoop.util.RunJar.main(RunJar.java:232)
Caused by: java.net.UnknownHostException: historyserver
... 27 more
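The YARN client resolves the timeline-service host from yarn.timeline-service.hostname, so the name historyserver must be resolvable from wherever the job is submitted. A sketch of the relevant configuration, using this repo's env-file convention (the rendering is an illustration, not confirmed entrypoint output):

```
# hadoop.env: the hostname the YARN client will try to resolve
YARN_CONF_yarn_timeline___service_hostname=historyserver
# Submit the job from a container attached to the same Docker network as
# the historyserver service; otherwise "historyserver" cannot resolve.
```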
Hi, I'm using Spark inside a Docker stack named "hadoop2" which has these Docker services:
I'm trying to use Spark with YARN as the cluster manager in the same overlay network "hadoop_network". But when I enter the pyspark console in the yarn-spark-driver container, or a Python console tries to use pyspark and create a Spark session, it does not proceed any further, as below:
Dockerfile of the yarn-spark-driver image
FROM hadoop-base:2
ENV SPARK_VERSION=2.4.4
ENV HADOOP_VERSION=2.7
RUN apt-get update && apt-get install -y procps wget make build-essential libssl-dev zlib1g-dev libbz2-dev libsqlite3-dev \
# && ln -s /lib64/ld-linux-x86-64.so.2 /lib/ld-linux-x86-64.so.2 \
&& wget https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz \
&& tar -xvzf spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz \
&& mv spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION} spark \
&& rm spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz \
#&& cd /css \
#&& jar uf /spark/jars/spark-core_2.11-${SPARK_VERSION}.jar org/apache/spark/ui/static/timeline-view.css \
&& cd /
ENV PYTHONHASHSEED 1
ENV SPARK_HOME /spark
COPY spark-env.sh $SPARK_HOME/conf/spark-env.sh
COPY spark-defaults.conf $SPARK_HOME/conf
ENV PYSPARK_PYTHON /usr/local/bin/python
ENV PYSPARK_DRIVER_PYTHON /usr/local/bin/python
COPY mysql-connector-java-5.1.48.jar /spark/jars/mysql-connector-java-5.1.48.jar
COPY mysql-connector-java-5.1.48-bin.jar /spark/jars/mysql-connector-java-5.1.48-bin.jar
COPY postgresql-42.2.10.jar /spark/jars/postgresql-42.2.10.jar
ENV PYTHON_VERSION 3.5.9
RUN apt-get update && apt-get install -y libssl-dev openssl build-essential manpages-dev pandoc && wget https://www.python.org/ftp/python/${PYTHON_VERSION}/Python-${PYTHON_VERSION}.tgz && tar zxf Python-${PYTHON_VERSION}.tgz && cd Python-${PYTHON_VERSION}/ && ./configure --prefix=/usr/local && make && make install && cd ../ && rm -rf Python-${PYTHON_VERSION} && rm Python-${PYTHON_VERSION}.tgz && rm -rf /var/lib/apt/lists/* && ln -s /usr/local/bin/python3 /usr/local/bin/python && ln -s /usr/local/bin/pip3 /usr/local/bin/pip
RUN pip install --upgrade setuptools && pip install wheel pypandoc && pip install --upgrade pip
ENV GRPC_PYTHON_VERSION 1.25.0
RUN pip install grpcio==${GRPC_PYTHON_VERSION} grpcio-tools==${GRPC_PYTHON_VERSION} numpy==1.18.1 pandas==0.25.3 sqlalchemy==1.3.13 pyspark==2.4.5
ENV HADOOP_CONF_DIR /opt/hadoop-2.7.7/etc/hadoop
ENV YARN_CONF_DIR /opt/hadoop-2.7.7/etc/hadoop
COPY spark-defaults.conf /spark/conf
ADD jersey-bundle-1.17.1.jar /spark/jars/jersey-bundle-1.17.1.jar
ADD run.sh /run.sh
RUN chmod a+x /run.sh
EXPOSE 18080 8080 50051 8088
CMD ["/run.sh"]
run.sh file
#!/bin/bash
export LD_LIBRARY_PATH=/opt/hadoop-2.7.7/lib/native:$LD_LIBRARY_PATH
$HADOOP_PREFIX/bin/hdfs dfs -mkdir /spark-logs
$SPARK_HOME/sbin/start-history-server.sh
# GRPC start-server should be written here
sleep 1000000
Spark Yarn config file
spark.master yarn
spark.eventLog.enabled true
spark.eventLog.dir hdfs://namenode:9000/spark-logs
spark.history.provider org.apache.spark.deploy.history.FsHistoryProvider
spark.history.fs.logDirectory hdfs://namenode:9000/spark-logs
spark.history.fs.update.interval 10s
spark.history.ui.port 18080
spark.sql.warehouse.dir hdfs://namenode:9000/user/hive/warehouse
spark.yarn.jars /spark/jars
spark-env.sh file
export PYSPARK_PYTHON=/usr/local/bin/python
export PYSPARK_DRIVER_PYTHON=/usr/local/bin/python
export HADOOP_CONF_DIR=/opt/hadoop-2.7.7/etc/hadoop
export YARN_CONF_DIR=/opt/hadoop-2.7.7/etc/hadoop
Using the tag 2.0.0-hadoop3.1.3-java8, the actual installed version is 3.2.1, not 3.1.3.
~/docker-hive$ docker run -it bde2020/hadoop-namenode:2.0.0-hadoop3.1.3-java8 bash
Configuring core
- Setting fs.defaultFS=hdfs://bc96e1de2c94:8020
Configuring hdfs
- Setting dfs.namenode.name.dir=file:///hadoop/dfs/name
Configuring yarn
Configuring httpfs
Configuring kms
Configuring mapred
Configuring for multihomed network
root@bc96e1de2c94:/# echo $PATH
/opt/hadoop-3.2.1/bin/:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
root@784684f7199c:/bin# which hdfs
/opt/hadoop-3.2.1/bin//hdfs
Also, run.sh fails to run because hdfs cannot be found, presumably related to this different version being installed.
$ docker logs 765
Configuring core
- Setting hadoop.proxyuser.hue.hosts=*
- Setting fs.defaultFS=hdfs://namenode:8020
- Setting hadoop.http.staticuser.user=root
- Setting hadoop.proxyuser.hue.groups=*
Configuring hdfs
- Setting dfs.namenode.datanode.registration.ip-hostname-check=false
- Setting dfs.webhdfs.enabled=true
- Setting dfs.permissions.enabled=false
- Setting dfs.namenode.name.dir=file:///hadoop/dfs/name
Configuring yarn
- Setting yarn.timeline-service.enabled=true
- Setting yarn.resourcemanager.system-metrics-publisher.enabled=true
- Setting yarn.resourcemanager.store.class=org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore
- Setting yarn.log.server.url=http://historyserver:8188/applicationhistory/logs/
- Setting yarn.resourcemanager.fs.state-store.uri=/rmstate
- Setting yarn.timeline-service.generic-application-history.enabled=true
- Setting yarn.log-aggregation-enable=true
- Setting yarn.resourcemanager.hostname=resourcemanager
- Setting yarn.resourcemanager.resource_tracker.address=resourcemanager:8031
- Setting yarn.timeline-service.hostname=historyserver
- Setting yarn.resourcemanager.scheduler.address=resourcemanager:8030
- Setting yarn.resourcemanager.address=resourcemanager:8032
- Setting yarn.nodemanager.remote-app-log-dir=/app-logs
- Setting yarn.resourcemanager.recovery.enabled=true
Configuring httpfs
Configuring kms
Configuring mapred
Configuring for multihomed network
Formatting namenode name directory: /hadoop/dfs/name
/run.sh: line 16: /bin/hdfs: No such file or directory
/run.sh: line 19: /bin/hdfs: No such file or directory
A colleague and I have used and adapted some of the Dockerfiles and config provided here.
To keep in line with regulations ("Generally speaking, the absence of a license means that the default copyright laws apply."), I'd appreciate it if you could add a license to this repo.
While I have my own java container, do I need to have java in this container? Can I use this container with another java/tomcat container?
Sorry if this question does not make much sense. Please let me know if you need clarification.
I have had no luck getting the stack version of this repo running.
I have created one docker-machine (DM) and initialised it as a swarm manager, advertising its address.
As it uses an external network, I created the hbase network with scope swarm.
Then I pulled the repo onto the DM and ran docker stack deploy as detailed.
None of the containers reaches 1/1. Are there some other detailed instructions? Or am I missing something here?
Any input would be greatly appreciated.
In the docker-hadoop project, I cannot build the Dockerfile with debian:8, but I can use debian:9 with the 3.1.3 Dockerfile to build the Hadoop 2.x version. Could you upgrade the Hadoop 2.x Dockerfile?
PS: in the Dockerfile you could add a line as follows:
RUN ln -s /opt/hadoop-$HADOOP_VERSION /opt/hadoop
Then I can see the version in the Docker container.
The resourcemanager cannot start if it cannot write to HDFS, so it must wait until the namenode and datanode(s) are ready.
At present the only solution is to start in two steps:
> docker-compose up -d namenode datanode1 datanode2 datanode3
[...]
> docker-compose up -d resourcemanager historyserver nodemanager1
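The two-step startup above can probably be replaced by the images' SERVICE_PRECONDITION wait hook, which appears elsewhere in this thread. A sketch, with host:port pairs as assumptions for a Hadoop 2.x layout (50075 is the 2.x datanode HTTP port; 3.x uses 9864):

```
resourcemanager:
  environment:
    SERVICE_PRECONDITION: "namenode:9000 datanode1:50075"
  # the entrypoint polls these host:port pairs and only starts the
  # resourcemanager once all of them accept connections
```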
I'm completely new to Hadoop, and I found this repo because I had the thought, "Wow, installing Hadoop is hard, and all I want is HDFS. Surely there's got to be an easier way to do this. Maybe someone made a Docker container!"
Indeed, this repo does an amazing job of getting all the complicated details out of the way. But there are a number of questions left unanswered after getting this running. I thought it'd be useful to list them here:
- Why are only some of the UI ports exposed in docker-compose.yml? I opened up the other ports (8088, 8042, 9864, and 8188) in the other services and can access all the UIs now.
- How do I run more datanodes than a single docker-compose up gives me? It'd be amazing to be able to do something like docker-compose up (and maybe another command) on another host and have them connect.
- How do I browse the filesystem? I went to http://localhost:9870 -> Utilities -> Browse the file system, and it failed.
These questions will probably be answered just by working with Hadoop more, but I thought they could help you if you're looking to address the new crowd. Lots of university students, especially those doing data science/engineering, are starting to feel the need to get familiar with tools like this.
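On the extra UI ports (8088, 8042, 9864, and 8188), a hedged docker-compose sketch of how they are usually published; service names and the Hadoop 3.x port assignments are assumptions, so match them to your own compose file:

```
resourcemanager:
  ports:
    - 8088:8088   # YARN ResourceManager UI
nodemanager1:
  ports:
    - 8042:8042   # NodeManager UI
datanode:
  ports:
    - 9864:9864   # DataNode UI (Hadoop 3.x)
historyserver:
  ports:
    - 8188:8188   # Timeline / history UI
```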
First of all, thank you very much for this repository. I used your docker-compose file and had a problem with the HDFS web UI: if you use the HDFS browser on the web, you can't download files. To fix that, you can use a dedicated network for Hadoop with fixed IP addresses in docker-compose, and then set 'hostname: ASSIGNED_IP' for each container. I am using this for my master's thesis.
Thank you for sharing.
I am trying to write to HDFS from an ElasticSearch cluster via port 9000.
With port 9866 mapped on the datanode and ElasticSearch on the same server, it works.
But when ElasticSearch is hosted on a separate server I get this error:
File /user/elasticsearch/repositories/backup/tests-drDBcjJtS5ak1AAQEJuRoA/pending-master.dat-PSdMuA6WRl6oJhPK2Ejy9w could only be written to 0 of the 1 minReplication nodes. There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2219)
at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2789)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:892)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:574)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:999)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:927)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2915)
There is no firewall between the 2 servers.
Do you have an idea?
Thanks a lot for your help!
Hi, I'm using the docker-compose.yml file for version 2.7.4. I have a Spark job that was trying to write data into HDFS, and it produced this error:
java.io.IOException: File /temp/test_write/_temporary/0/_temporary/attempt_20190522040044_0000_m_000000_0/part-00000-38a5ed5a-fac2-42f7-8be6-d9b11406d642-c000.snappy.parquet could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
So I want to check the namenode log to find the reason, but I cannot figure out where the log is inside the namenode container.
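In these images the namenode runs in the foreground, so its log goes to the container's stdout/stderr rather than to a file under the Hadoop log directory. A sketch of where to look, assuming the container is named namenode as in the compose file:

```shell
# Inspect the namenode's log stream for the replication error.
pattern="could only be replicated"
if command -v docker >/dev/null 2>&1; then
  docker logs namenode 2>&1 | grep -n "$pattern"
fi
```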
Hello,
typically there is a license file or statement found in a repo, what license is your project under?
Thank you,
Nathanael
Thanks for sharing this Docker setup. It is really helpful for Hadoop development. Currently, when I pull the latest source and run the containers with docker-compose, the cluster runs well with only 1 datanode.
I'm wondering how to start at least 3 datanodes for development purposes.
Thanks.
Can someone help me troubleshoot the issue I'm having with make wordcount?
Running that command yields this error:
docker run --network docker-hadoop_default --env-file hadoop.env bde2020/hadoop-base: hdfs dfs -mkdir -p /input/
docker: invalid reference format.
See 'docker run --help'.
make: *** [wordcount] Error 125
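"docker: invalid reference format" points at the image name in the first line: bde2020/hadoop-base: ends with a bare colon, which suggests the Makefile expanded an empty version variable (for example when git cannot report a branch or tag). A small sketch of the failure mode; the variable and tag names are assumptions, not the Makefile's actual ones:

```shell
# An unset version variable yields an image reference with an empty tag,
# which the docker CLI rejects before even contacting the daemon.
version=""
image="bde2020/hadoop-base:${version}"
echo "$image"   # prints "bde2020/hadoop-base:" (trailing colon, invalid)

# Supplying the tag explicitly (or exporting the variable) fixes it:
version="2.0.0-hadoop3.2.1-java8"   # hypothetical tag
image="bde2020/hadoop-base:${version}"
echo "$image"   # prints "bde2020/hadoop-base:2.0.0-hadoop3.2.1-java8"
```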
Hello,
Do you see any issue with using different hadoop versions among containers with different roles?
For instance, if I'm running docker-hive containers with Hadoop 2.8 and want to add a service to the cluster built from hadoop-base 2.7 due to OS version constraints/availability, etc.
Would that cause any issues that you are aware of?
I have not experienced a problem yet, but as things usually go, you often don't hit a problem until much further down the road.
Thanks
See Journal Nodes setup and this github repo
I tried to deploy a Hadoop cluster in an overlay network and the datanodes could not connect to the namenode. The namenode reports that reverse IP resolution for the specific node fails. The same exception is reported at the datanode.
To fix this, set HDFS_CONF_dfs_namenode_datanode_registration_ip___hostname___check=false
Below is the reported error:
datanode1.1.rdz8cymlzo24@slave1 | 18/05/22 12:50:22 ERROR datanode.DataNode: Initialization failed for Block pool BP-579303948-10.0.0.11-1526993259750 (Datanode Uuid null) service to namenode2/10.0.0.10:8020 Datanode denied communication with namenode because hostname cannot be resolved (ip=10.0.0.13, hostname=10.0.0.13): DatanodeRegistration(0.0.0.0:50010, datanodeUuid=c990d411-7fa1-43b4-a7fa-f45fed48147b, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-a856ef45-e3f7-4896-9973-543c07777c95;nsid=90446212;c=0)
datanode1.1.rdz8cymlzo24@slave1 | at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:863)
datanode1.1.rdz8cymlzo24@slave1 | at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:4529)
datanode1.1.rdz8cymlzo24@slave1 | at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:1279)
datanode1.1.rdz8cymlzo24@slave1 | at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:95)
datanode1.1.rdz8cymlzo24@slave1 | at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28539)
datanode1.1.rdz8cymlzo24@slave1 | at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
datanode1.1.rdz8cymlzo24@slave1 | at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
datanode1.1.rdz8cymlzo24@slave1 | at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
datanode1.1.rdz8cymlzo24@slave1 | at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
datanode1.1.rdz8cymlzo24@slave1 | at java.security.AccessController.doPrivileged(Native Method)
datanode1.1.rdz8cymlzo24@slave1 | at javax.security.auth.Subject.doAs(Subject.java:422)
datanode1.1.rdz8cymlzo24@slave1 | at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
datanode1.1.rdz8cymlzo24@slave1 | at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
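For reference, the env-file convention these images use maps _ to . and ___ to -, so the workaround above corresponds roughly to the following hdfs-site.xml property (the rendered XML is shown as an illustration, not verbatim entrypoint output):

```
# hadoop.env
HDFS_CONF_dfs_namenode_datanode_registration_ip___hostname___check=false

# which the entrypoint turns into, roughly:
#   <property>
#     <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
#     <value>false</value>
#   </property>
```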
When switching all image versions in docker-compose.yml from 1.1.0-hadoop2.7.1-java8 to 1.2.0-hadoop2.7.4-java8, my nodemanager1 exits with code 255. The logs (docker-compose logs nodemanager1) contain this error:
nodemanager1 | 17/10/04 12:56:21 ERROR nodemanager.NodeStatusUpdaterImpl: Unexpected error starting NodeStatusUpdater
nodemanager1 | org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Recieved SHUTDOWN signal from Resourcemanager ,Registration of NodeManager failed, Message from ResourceManager: NodeManager from ba78946515dc doesn't satisfy minimum allocations, Sending SHUTDOWN signal to the NodeManager.
nodemanager1 | at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:278)
nodemanager1 | at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:197)
nodemanager1 | at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
nodemanager1 | at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
nodemanager1 | at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:272)
nodemanager1 | at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
nodemanager1 | at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:496)
nodemanager1 | at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:543)
TL;DR : Message from ResourceManager: NodeManager from ba78946515dc doesn't satisfy minimum allocations, Sending SHUTDOWN signal to the NodeManager
It works fine with 1.1.0-hadoop2.7.1-java8. I can't figure out what change introduced by version 1.2.0-hadoop2.7.4-java8 leads to this error. Am I the only one getting it?
FYI, every docker-compose.yml in every git branch still refers to 1.1.0-hadoop2.7.1-java8, including branches master and 1.2.0-hadoop2.7.4-java8.
Hi. I'm using docker-compose.yml from master, and getting this error:
namenode | - Setting yarn.resourcemanager.scheduler.address=resourcemanager:8030
namenode | - Setting yarn.resourcemanager.address=resourcemanager:8032
namenode | - Setting yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage=98.5
namenode | - Setting yarn.nodemanager.resource.memory-mb=16384
namenode | - Setting yarn.nodemanager.resource.cpu-vcores=8
namenode | Configuring httpfs
namenode | Configuring kms
namenode | Configuring mapred
namenode | - Setting mapreduce.map.java.opts=-Xmx3072m
namenode | - Setting mapreduce.map.env=HADOOP_MAPRED_HOME=/opt/hadoop-3.1.1/
namenode | - Setting mapreduce.reduce.env=HADOOP_MAPRED_HOME=/opt/hadoop-3.1.1/
namenode | - Setting mapred.child.java.opts=-Xmx4096m
namenode | - Setting yarn.app.mapreduce.am.env=HADOOP_MAPRED_HOME=/opt/hadoop-3.1.1/
namenode | - Setting mapreduce.framework.name=yarn
namenode | - Setting mapreduce.reduce.java.opts=-Xmx6144m
namenode | - Setting mapreduce.reduce.memory.mb=8192
namenode | - Setting mapreduce.map.memory.mb=4096
namenode | Configuring for multihomed network
namenode | WARNING: HADOOP_PREFIX has been replaced by HADOOP_HOME. Using value of HADOOP_PREFIX.
namenode | namenode is running as process 359. Stop it first.
namenode exited with code 1
Could you please help me?
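"namenode is running as process 359. Stop it first." usually means the daemon script found a stale pid, for example after a container restart that reused old state, so it refuses to start a second instance. A recovery sketch, assuming a compose-managed cluster where recreating the container is acceptable (service name is an assumption):

```shell
# Recreate the namenode container from scratch instead of restarting it,
# so no stale pid survives into the new process.
if command -v docker-compose >/dev/null 2>&1; then
  docker-compose rm -f -s namenode   # stop and remove the container
  docker-compose up -d namenode      # start a fresh one
fi
```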
Here is the output I got from docker network. Which IP address should I use to access the UIs?
docker network list
NETWORK ID NAME DRIVER SCOPE
532de5548fae docker-hadoop_default bridge local
docker network inspect docker-hadoop_default
[
{
"Name": "docker-hadoop_default",
"Id": "532de5548fae05ec73401584d8d745a242d48880aba7431654d29bcae43c787c",
"Created": "2018-08-04T00:31:40.0297858Z",
"Scope": "local",
"Driver": "bridge",
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": null,
"Config": [
{
"Subnet": "172.29.0.0/16",
"Gateway": "172.29.0.1"
}
]
},
"Internal": false,
"Attachable": false,
"Ingress": false,
"ConfigFrom": {
"Network": ""
},
"ConfigOnly": false,
"Containers": {
"01a2683d5861d879bfb760237da459430a23c4380342b982af392ecb15be40b8": {
"Name": "datanode3",
"EndpointID": "6395431456b043799cd5ec35cf28de2eedc3d440d6da3035cd8bf84f7c127cdd",
"MacAddress": "02:42:ac:1d:00:03",
"IPv4Address": "172.29.0.3/16",
"IPv6Address": ""
},
"0c1ac1c3f75bb3372a0c5fc7ebc0e76d879c3e5c10f58d91c92d73a393dafd31": {
"Name": "resourcemanager",
"EndpointID": "f230411a390ae9f544ce1548861300d33c7506a6ab695501b0c7cdb363532632",
"MacAddress": "02:42:ac:1d:00:06",
"IPv4Address": "172.29.0.6/16",
"IPv6Address": ""
},
"8affe1a3b2a89b8e384ce25e1161b39ba4f4376f51071954fa423f7e85fbc05d": {
"Name": "historyserver",
"EndpointID": "c63c9d103aad8328e54503c9015442a3f9bf0f8fd97ef7a8107242f3b2b82073",
"MacAddress": "02:42:ac:1d:00:07",
"IPv4Address": "172.29.0.7/16",
"IPv6Address": ""
},
"9a307d47e485c8da103a26e9f8ff42262a1e7389cc08e0497f73144367906bed": {
"Name": "datanode2",
"EndpointID": "d2d674aefbe8000e80d73be8d3402884f8d357a5d9b5a957024aec650f8f2823",
"MacAddress": "02:42:ac:1d:00:04",
"IPv4Address": "172.29.0.4/16",
"IPv6Address": ""
},
"b92efbfaa1275937ca6193d300175107f13bfc655b5492b1947a28248c4c540e": {
"Name": "nodemanager1",
"EndpointID": "7d0e7d747e326cc88bbea45e5ea34bb1d2cf082569683420ce8285b0a368dae9",
"MacAddress": "02:42:ac:1d:00:08",
"IPv4Address": "172.29.0.8/16",
"IPv6Address": ""
},
"c510a8be2178487214fbe7b14e5bd06285b746afff6787b3f6394f6d8d259b22": {
"Name": "datanode1",
"EndpointID": "e0639cc31ad421711584f5c01fdc848f521473c95e0ef54eb5865a38e907770d",
"MacAddress": "02:42:ac:1d:00:05",
"IPv4Address": "172.29.0.5/16",
"IPv6Address": ""
},
"f90c3696b7246724ec6438c2bbb67430612ee7fded231afabd09bdc86ed5c488": {
"Name": "namenode",
"EndpointID": "238d110ad84fe88855d6c26451358d735c487c88ad6b04c80b4411b08f9b797e",
"MacAddress": "02:42:ac:1d:00:02",
"IPv4Address": "172.29.0.2/16",
"IPv6Address": ""
}
},
"Options": {},
"Labels": {}
}
]
Hello!
First of all, thanks for your work.
Debian 8 is outdated now, so its repositories don't work as expected, which means the Docker image can't be built.
There are two solutions, as described in this post: https://unix.stackexchange.com/questions/508724/failed-to-fetch-jessie-backports-repository
Of course, the first one is the most desirable. Is there any plan to solve this?
Thanks,
Alberto.
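Until the base image is upgraded, a hypothetical Dockerfile workaround is to point apt at the Debian archive, whose old Release files fail the Valid-Until check:

```
# debian:8 (jessie) left the regular mirrors; use archive.debian.org and
# disable the Valid-Until check that its stale metadata trips.
RUN echo "deb http://archive.debian.org/debian jessie main" > /etc/apt/sources.list \
 && echo 'Acquire::Check-Valid-Until "false";' > /etc/apt/apt.conf.d/99archive \
 && apt-get update
```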
I am running this using Docker on Ubuntu 16.04 LTS. I am unable to access any of the interfaces once I start up this docker-hadoop cluster. I dropped all my firewall rules to be sure. I see the IP addresses for each container using docker network inspect hadoop, but none of them work when I try to connect.
delete
Provide a docker-compose v2 definition compatible with Docker swarm, and instructions on how to run it inside a swarm.
I just ran it:
docker stack deploy -c docker-compose-v3.yml hadoop
and got this error:
Error grabbing logs: rpc error: code = Unknown desc = warning: incomplete log stream. some logs could not be retrieved for the following reasons: task k7yssbdrd1kzsrckahj8oapjm has not been scheduled
Hi!
Yet another Hadoop version is no longer available. Let's update to https://www.apache.org/dist/hadoop/common/hadoop-3.1.3
Hello,
Just curious if anyone is actively developing a docker impala container to plugin to this ecosystem?
I'm just getting started trying to assemble a Docker Hadoop stack with the following:
HDFS
Hive
Spark
Kafka
Impala
Just looking to keep from duplicating someone else's work.
Thanks
Hi
I am using macOS and the latest Docker. I am installing and following the instructions from top to bottom. First I run:
$ docker-compose up
Then:
$ docker network inspect docker-hadoop_default
[
{
"Name": "docker-hadoop_default",
"Id": "ac0edd9732673890e6e1d9ddd5b0efaaf84de22384b29535e7950efbc659c56d",
"Created": "2018-10-13T15:30:10.6899672Z",
"Scope": "local",
"Driver": "bridge",
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": null,
"Config": [
{
"Subnet": "172.19.0.0/16",
"Gateway": "172.19.0.1"
}
]
},
"Internal": false,
"Attachable": false,
"Ingress": false,
"ConfigFrom": {
"Network": ""
},
"ConfigOnly": false,
"Containers": {
"013bb40c313c7be8557e19bf2e40892d0a6a6374da7601f21b2ed4e734d33d4b": {
"Name": "datanode3",
"EndpointID": "de8f76d89f34209bb7186e70c4c4fdb563da16414c3814a8ecbd60de65cd99dc",
"MacAddress": "02:42:ac:13:00:04",
"IPv4Address": "172.19.0.4/16",
"IPv6Address": ""
},
"23df674bc81d609600da7d06452d2e28f5160191e8ee2f9450e412d8321692f6": {
"Name": "datanode1",
"EndpointID": "cb12e0292dff539ffe015153a8562de0cd2f9252611ad9feee9b7eb8a467158a",
"MacAddress": "02:42:ac:13:00:03",
"IPv4Address": "172.19.0.3/16",
"IPv6Address": ""
},
"2b9d0c7c613403a75a76f228f2827f801d29e59d2ab8a740a1d4e1b6d343632a": {
"Name": "namenode",
"EndpointID": "a0fb15337946fdca7ee60ed82a42e73a6d47616269ebf4d538b11025281a0a6e",
"MacAddress": "02:42:ac:13:00:02",
"IPv4Address": "172.19.0.2/16",
"IPv6Address": ""
},
"aed21bba8203c70b77d5cd571d8b0c0004760f0f57804b536596bb44171e6713": {
"Name": "datanode2",
"EndpointID": "f3a74788a3a246d1e5916e65c83553f42b832e39b6b568a972a5eb36dfac17e6",
"MacAddress": "02:42:ac:13:00:05",
"IPv4Address": "172.19.0.5/16",
"IPv6Address": ""
}
},
"Options": {},
"Labels": {}
}
]
No errors are shown in the logs. I can't open the namenode user interface. I didn't change anything and I am using http://172.19.0.2:50070/dfshealth.html#tab-overview
which should be the default address indicated by network inspect.
Nothing seems wrong, yet it leads to ERR_CONNECTION_TIMED_OUT.
Is it mandatory to follow the "Configure Environment Variables" step ("The configuration parameters can be specified in the hadoop.env file or as environmental variables for specific services (e.g. namenode, datanode etc.)") before trying to open the UI?
Please help
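A likely cause on Docker for Mac: container IPs on the bridge network (172.19.0.x) are not routable from the macOS host, so the UI has to be reached through a port published to localhost rather than the container address. A sketch of the relevant compose fragment, assuming the service is named namenode and the Hadoop 2.x web UI port 50070 (Hadoop 3.x uses 9870 instead):

```
# Sketch (assumptions: service name "namenode", Hadoop 2.x UI on 50070;
# substitute 9870 for Hadoop 3.x). After `docker-compose up`, browse
# http://localhost:50070/dfshealth.html instead of the container IP.
services:
  namenode:
    ports:
      - "50070:50070"
```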
I tried to run the compose v3 file from the repo in swarm mode with 3 nodes. namenode and resourcemanager run perfectly, but the remaining services just kept restarting and threw the error "task: non-zero exit (137)".
Does anyone have any idea?
In our lab, we are using some of the big-data-europe containers (for research purposes only). I wonder which license I should include when using them?
I am trying to deploy Hadoop on Kubernetes using the images built by you. The problem I am facing is that the values from "hadoop.env" that I pass via a ConfigMap are not applied properly: the variables are reflected in the hdfs and core site files, but the value parts are missing. Can you please help me sort this out? I have pasted screen dumps below.
Screendumps:
dfs.webhdfs.enabled
dfs.permissions.enabled
dfs.namenode.datanode.registration.ip-hostname-check
hadoop.proxyuser.sdc.hosts
hadoop.proxyuser.sdc.groups
Thanks in advance.
Add supported versions of Hadoop
Thanks!
$ docker-compose up
ERROR: manifest for bde2020/hadoop-namenode:2.0.0-hadoop3.1.3-java8 not found: manifest unknown: manifest unknown
Can be fixed by changing 2.0.0-hadoop3.1.3-java8 to latest in docker-compose.yml. Looks like there's no tag on Docker Hub for 2.0.0-hadoop3.1.3-java8.
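The tag swap above can be applied in one line; a sketch against an illustrative stand-in file (run the sed against your real docker-compose.yml):

```shell
# Illustrative stand-in for docker-compose.yml
cat > docker-compose.yml <<'EOF'
services:
  namenode:
    image: bde2020/hadoop-namenode:2.0.0-hadoop3.1.3-java8
EOF
# Swap the missing tag for one that exists on Docker Hub
sed -i 's/2.0.0-hadoop3.1.3-java8/latest/' docker-compose.yml
grep 'image:' docker-compose.yml
```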
I want to take some measures to ensure that my cluster is not illegally used by others, so I want to use tools like Kerberos. Can someone help me?
How do I add a datanode on another machine?
Hi, I used your repo for a class project and succeeded at submitting a Spark job in Spark Standalone client mode. How would one go about submitting it to a YARN cluster? I ran "yarn resourcemanager" and "yarn nodemanager" in the Spark master and then tried to submit the app with yarn as master. The application was submitted but got stuck at this:
18/01/06 21:37:56 INFO yarn.Client: Application report for application_1515274638576_0001 (state: ACCEPTED)
18/01/06 21:37:56 INFO yarn.Client:
client token: N/A
diagnostics: [Sat Jan 06 21:37:55 +0000 2018] Application is added to the scheduler and is not yet activated. Skipping AM assignment as cluster resource is empty. Details : AM Partition = <DEFAULT_PARTITION>; AM Resource Request = <memory:1024, vCores:1>; Queue Resource Limit for AM = <memory:0, vCores:0>; User AM Resource Limit of the queue = <memory:0, vCores:0>; Queue AM Resource Usage = <memory:0, vCores:0>;
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1515274674664
final status: UNDEFINED
tracking URL: http://9ffd3c47538e:8088/proxy/application_1515274638576_0001/
user: root
18/01/06 21:37:57 INFO yarn.Client: Application report for application_1515274638576_0001 (state: ACCEPTED)
18/01/06 21:37:58 INFO yarn.Client: Application report for application_1515274638576_0001 (state: ACCEPTED)
18/01/06 21:37:59 INFO yarn.Client: Application report for application_1515274638576_0001 (state: ACCEPTED)
I tried adding additional env vars from this other project of yours: https://github.com/big-data-europe/docker-hbase/blob/master/distributed/hadoop.env with no luck.
Could you give me any pointers on how to proceed?
Thanks in advance
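The "Queue Resource Limit for AM = <memory:0, vCores:0>" line in the diagnostics suggests the NodeManagers are advertising zero resources to the scheduler. A sketch of hadoop.env entries that give them explicit capacity, assuming the bde2020 images' env-var naming convention ('.' becomes '_', '-' becomes '___'); the property names and sizes here are assumptions to verify against the repo's hadoop.env:

```shell
# Assumption: bde2020 env naming ('.' -> '_', '-' -> '___'); sizes are examples.
# Gives each NodeManager explicit memory/vcores so the AM can be scheduled.
YARN_CONF_yarn_nodemanager_resource_memory___mb=8192
YARN_CONF_yarn_nodemanager_resource_cpu___vcores=4
YARN_CONF_yarn_scheduler_maximum___allocation___mb=8192
```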
The Hadoop official documentation recommends setting dfs.encrypt.data.transfer.cipher.suites to AES:
"AES offers the greatest cryptographic strength and the best performance."
Setting it to AES/CTR/NoPadding activates AES encryption. By default, this is unspecified, so AES is not used.
I wonder if the default needs to be changed for security and performance. Thanks.
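For reference, a sketch of what enabling this could look like in hadoop.env, assuming the same env-var convention these images use ('.' becomes '_'); both entries are assumptions to verify against the generated hdfs-site.xml:

```shell
# Assumption: bde2020 env naming; the cipher suite only takes effect when
# data transfer encryption itself is enabled.
HDFS_CONF_dfs_encrypt_data_transfer=true
HDFS_CONF_dfs_encrypt_data_transfer_cipher_suites=AES/CTR/NoPadding
```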
subj