
Crash when reading from HDFS · x-deeplearning (CLOSED)

codescv commented on August 24, 2024
Crash when reading from HDFS


Comments (11)

codescv commented on August 24, 2024

> I recently ran into a similar problem; after feeding the entire hadoop classpath into CLASSPATH it stopped core dumping.
> You can try this and see whether it resolves the issue:
> export CLASSPATH=$(for p in `hadoop classpath --glob | sed 's/:/ /g'`; do find $p -name '*.jar' 2>/dev/null; done | tr '\n' ':')

I just tried it; still the same core dump as before.

Setting these three variables is enough: CLASSPATH, HADOOP_HDFS_HOME and LD_LIBRARY_PATH.
LD_LIBRARY_PATH must include $JAVA/jre/lib/amd64/server
CLASSPATH=$(hadoop classpath --glob)
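
Putting these together, a minimal sketch of the environment setup might look like this; the JAVA_HOME and Hadoop paths below are purely illustrative and must be replaced with your own install locations, while the CLASSPATH and LD_LIBRARY_PATH lines follow the suggestion above:

# Illustrative paths only -- substitute your own JDK and Hadoop locations.
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_HDFS_HOME=$HADOOP_HOME                 # must contain lib/native/libhdfs.so

# Put every Hadoop jar on the Java classpath so libhdfs can load the HDFS client classes.
export CLASSPATH=$(hadoop classpath --glob)

# libhdfs starts a JVM, so libjvm.so must be resolvable at runtime.
export LD_LIBRARY_PATH=$JAVA_HOME/jre/lib/amd64/server:$LD_LIBRARY_PATH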


songyue1104 commented on August 24, 2024

Did you set the HADOOP_HDFS_HOME environment variable?


songyue1104 commented on August 24, 2024

XDL reads and writes HDFS by dlopen()ing libhdfs.so at runtime. It currently looks for libhdfs.so at the following path: $HADOOP_HDFS_HOME/lib/native/libhdfs.so, so please make sure HADOOP_HDFS_HOME is set correctly.
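
A quick sanity check for this, assuming HADOOP_HDFS_HOME is already exported (the lookup path is exactly the one named above):

# Confirm the library XDL will dlopen actually exists at the expected path.
echo "HADOOP_HDFS_HOME=$HADOOP_HDFS_HOME"
ls -l "$HADOOP_HDFS_HOME/lib/native/libhdfs.so"

# libhdfs.so itself depends on libjvm.so; a "not found" here means LD_LIBRARY_PATH
# is still missing the JVM's server directory.
ldd "$HADOOP_HDFS_HOME/lib/native/libhdfs.so" | grep -i jvm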


sydpz commented on August 24, 2024

I recently ran into a similar problem; after feeding the entire hadoop classpath into CLASSPATH it stopped core dumping.
You can try this and see whether it resolves the issue:
export CLASSPATH=$(for p in `hadoop classpath --glob | sed 's/:/ /g'`; do find $p -name '*.jar' 2>/dev/null; done | tr '\n' ':')


githcx commented on August 24, 2024

yiran

I think it's time to post my error. In the TDM training part, running the following command hits the same problem:
python train.py --run_mode=local --config=config.train.json

If this problem is caused by the hadoop configuration, consider adding a pre-configured hadoop to the image.

#0 0x00007fffcaba98ce in std::function<hdfs_internal* (char const*, unsigned short)>::operator()(char const*, unsigned short) const (__args#1=0,
__args#0=0x23df6f8 "http://localhost:9000", this=0x7fffcb5b49b8) at /usr/include/c++/5/functional:2267
#1 xdl::io::FileSystemHdfs::FileSystemHdfs (this=0x9b8a20, namenode=0x1cb28a8 "http://localhost:9000") at /home/yue.song/x-deeplearning/xdl/xdl/data_io/fs/file_system_hdfs.cc:222
#2 0x00007fffcaba9af8 in xdl::io::FileSystemHdfs::Get (namenode=0x1cb28a8 "http://localhost:9000") at /home/yue.song/x-deeplearning/xdl/xdl/data_io/fs/file_system_hdfs.cc:212
#3 0x00007fffcaba7a9b in xdl::io::GetFileSystem (type=type@entry=xdl::io::kHdfs, ext=) at /home/yue.song/x-deeplearning/xdl/xdl/data_io/fs/file_system.cc:58
#4 0x00007fffcab8d05b in xdl::io::DataIO::DataIO (this=0x9b8ce0, ds_name="tdm", parser_type=, fs_type=xdl::io::kHdfs, namenode="http://localhost:9000")
at /home/yue.song/x-deeplearning/xdl/xdl/data_io/data_io.cc:31

Looking back, if I had read the following passage of libhdfs.cc in the source early on, I would not have gotten stuck here.
This should be added to the user documentation, since it is critical hadoop environment configuration. Expecting users to debug into the code is a fairly high bar for first-time users.
[screenshot: the libhdfs.so lookup code in libhdfs.cc]
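
Per songyue1104's comment above, that passage boils down to resolving and dlopen()ing $HADOOP_HDFS_HOME/lib/native/libhdfs.so at runtime. A rough, purely illustrative way to try the same dlopen outside of XDL (requires a Python interpreter):

# Emulate XDL's runtime dlopen of libhdfs.so. An OSError here means the file is
# missing, or libjvm.so cannot be resolved via LD_LIBRARY_PATH, and XDL's own
# dlopen would fail for the same reason.
python -c "import ctypes, os; ctypes.CDLL(os.path.join(os.environ['HADOOP_HDFS_HOME'], 'lib/native/libhdfs.so')); print('dlopen ok')"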


codescv commented on August 24, 2024

HADOOP_HOME=/usr/hdp/2.6.1.0-129/hadoop
HADOOP_HDFS_HOME=/usr/hdp/2.6.1.0-129/hadoop-hdfs

$HADOOP_HDFS_HOME/bin/hdfs dfs -ls can list the sample files.
What else is missing?

How should the sample path be set?
I have tried:
hdfs:/path/to/file
hdfs://server/path/to/file
hdfs:///path/to/file
hdfs://default/path/to/file

All of them core dump.


codescv commented on August 24, 2024

export HADOOP_HDFS_HOME=/usr/hdp/2.6.1.0-129/hadoop
ls $HADOOP_HDFS_HOME/lib/native/libhdfs.so
/usr/hdp/2.6.1.0-129/hadoop/lib/native/libhdfs.so

This file does exist, but I still get the same error as before.

Is there any requirement on the hadoop version?


guoxinyang commented on August 24, 2024

Try specifying the namenode as xxx-host:9000 instead of yiran-data-ns.



githcx commented on August 24, 2024

> I recently ran into a similar problem; after feeding the entire hadoop classpath into CLASSPATH it stopped core dumping.
> You can try this and see whether it resolves the issue:
> export CLASSPATH=$(for p in `hadoop classpath --glob | sed 's/:/ /g'`; do find $p -name '*.jar' 2>/dev/null; done | tr '\n' ':')

Just tried it; still the same core dump.


ustcdane commented on August 24, 2024

I ran into the following problem:

Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:413)
at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:561)
at org.apache.hadoop.ipc.Client$Connection.access$1800(Client.java:376)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:731)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:727)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1865)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:726)
... 22 more
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:147)
at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:122)
at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:187)
at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:224)
at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212)
at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:192)
... 31 more
2019-08-20 08:04:43.609134: F /home/yue.song/XDL-OpenSource-master/xdl/xdl/data_io/fs/file_system_hdfs.cc:129] Check failed: info != nullptr can't open dir hdfs://10.140.52.62:8020/user/iminers/wangdan/tdm_t/tree_data/data
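
This is a different failure from the libhdfs lookup discussed above: the cluster enforces Kerberos, and the process has no valid ticket (TGT) when it connects. A typical check before launching, assuming you have a principal and keytab for this cluster (the principal and keytab path below are placeholders):

# Placeholder principal/keytab -- substitute your own.
kinit -kt /path/to/user.keytab user@EXAMPLE.COM
klist                                      # confirm a valid TGT is now cached
hdfs dfs -ls hdfs://10.140.52.62:8020/     # should list without the GSS error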

