tony-core runtime error (tony issue, OPEN, 14 comments)

tonywang-sh commented on September 2, 2024
tony-core runtime error

from tony.

Comments (14)

zuston commented on September 2, 2024

If you submit a TonY app to a secured cluster, the submitting machine must be authenticated, which means a keytab or principal must be provided.

I think you could use this machine to submit a Spark app as a test. If that works, the TonY app can also be submitted to the cluster.
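
As a quick companion to the Spark test, the sketch below checks whether the submitting machine currently holds a usable Kerberos ticket. It assumes the MIT Kerberos `klist` tool is on the PATH and that its `-s` flag reports validity through the exit status; the function name `has_valid_kerberos_ticket` is made up for this illustration and is not part of TonY.

```python
import shutil
import subprocess
from typing import Optional


def has_valid_kerberos_ticket() -> Optional[bool]:
    """Return True/False for ticket validity, or None if klist is not installed.

    Assumption: MIT Kerberos klist, where `klist -s` is silent and exits
    with status 0 only when the credentials cache is readable and not expired.
    """
    klist = shutil.which("klist")
    if klist is None:
        return None
    result = subprocess.run(
        [klist, "-s"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return result.returncode == 0


if __name__ == "__main__":
    state = has_valid_kerberos_ticket()
    if state is None:
        print("klist not found; cannot check the Kerberos ticket")
    elif state:
        print("a valid Kerberos ticket is present")
    else:
        print("no valid Kerberos ticket; run kinit with your keytab/principal")
```

If this reports no valid ticket, obtaining one with `kinit` before submitting is the usual next step.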


tonywang-sh commented on September 2, 2024

Thanks for your reply. The cluster runs Hadoop 3.2.2 with Kerberos, and I ran the Spark example successfully. I then tried the mnist-tensorflow example following the guide at https://github.com/tony-framework/TonY/tree/master/tony-examples/mnist-tensorflow, but it failed. Do I need any other settings or configuration for this task?


zuston commented on September 2, 2024

Please attach the detailed error log, the CLI submit command with its arguments, tony.xml, and so on.


tonywang-sh commented on September 2, 2024

CLI command:
#!/usr/bin/env bash
# --executes is a path relative to src_dir; --python_binary_path is a path relative to the venv zip.
# The paths in --task_params can be HDFS paths.
java -cp "$(hadoop classpath):/data/tony-dist/tony-cli-0.5.3-uber.jar" com.linkedin.tony.cli.ClusterSubmitter \
  --python_venv=/data/venv/myvenv.zip \
  --src_dir=/data/tony-dist/mnist-tensorflow \
  --executes=mnist_distributed.py \
  --task_params="--steps 1000 --data_dir /user/test/tony/data --working_dir /user/test/tony/model" \
  --conf_file=/data/tony-dist/tony.xml \
  --python_binary_path=venv/bin/python

tony.xml: (attached as a screenshot, not preserved)

Error log:
AM Container for appattempt_1657011602166_1367_000002 exited with exitCode: 1
Failing this attempt.Diagnostics: [2022-08-03 13:41:09.319]Exception from container-launch.
Container id: container_e94_1657011602166_1367_02_000001
Exit code: 1
Exception message: Launch container failed
Shell output: main : command provided 1
main : run as user is test
main : requested yarn user is test
Getting exit code file...
Creating script paths...
Writing pid file...
Writing to tmp file /data1/yarn/nm/nmPrivate/application_1657011602166_1367/container_e94_1657011602166_1367_02_000001/container_e94_1657011602166_1367_02_000001.pid.tmp
Writing to cgroup task files...
Creating local dirs...
Launching container...
[2022-08-03 13:41:09.321]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of amstderr.log :
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataOutputStream
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
at java.lang.Class.getMethod0(Class.java:3018)
at java.lang.Class.getMethod(Class.java:1784)
at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataOutputStream
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 7 more


zuston commented on September 2, 2024

Is this the same problem as #672?

It looks like the NodeManager machine doesn't have a complete Hadoop environment.
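
One way to test that guess on a NodeManager host is sketched below. `org.apache.hadoop.fs.FSDataOutputStream` ships in the hadoop-common jar, so the script looks for a jar with that name prefix on a given classpath. The function name `find_hadoop_common_jars` and the file-name matching rule are assumptions for illustration; distributions may name or lay out their jars differently.

```python
import glob
import os
import sys
from typing import List


def find_hadoop_common_jars(classpath: str) -> List[str]:
    """Return hadoop-common jars reachable from a classpath string.

    Entries ending in '*' are expanded to the jars in that directory,
    mimicking how the JVM treats classpath wildcards.
    """
    found = set()
    for entry in classpath.split(os.pathsep):
        entry = entry.strip()
        if not entry:
            continue
        if entry.endswith("*"):
            candidates = glob.glob(entry + ".jar")
        elif os.path.isfile(entry):
            candidates = [entry]
        else:
            # Plain directories (e.g. config dirs) contribute no jars.
            candidates = []
        for path in candidates:
            name = os.path.basename(path)
            if name.startswith("hadoop-common") and name.endswith(".jar"):
                found.add(path)
    return sorted(found)


if __name__ == "__main__":
    # Pass the classpath as the first argument, or fall back to $CLASSPATH.
    cp = sys.argv[1] if len(sys.argv) > 1 else os.environ.get("CLASSPATH", "")
    jars = find_hadoop_common_jars(cp)
    print("\n".join(jars) if jars else "no hadoop-common jar found on this classpath")
```

Saved under a name of your choice (say `check_cp.py`, hypothetical), it could be run on the NodeManager as `python check_cp.py "$(hadoop classpath)"`. An empty result would be consistent with the `NoClassDefFoundError` above.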


tonywang-sh commented on September 2, 2024

Got it. I have updated the Hadoop environment, and now it reports a Python error (screenshot not preserved):

ModuleNotFoundError: No module named 'contextlib'


zuston commented on September 2, 2024

You should package your pyenv zip on a Linux machine running the same system as the NodeManager (NM) hosts. @tonywang-sh
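
Before repackaging, it can help to look inside the zip itself. One possible cause of `No module named 'contextlib'` (an assumption, not confirmed in this thread) is an archive that contains an interpreter but not the standard library, since venv-style environments normally rely on the standard library of the base installation. The sketch below lists the interpreter entries and the files for a given module inside the archive; `inspect_env_zip` is a made-up name.

```python
import zipfile
from typing import Dict, List


def inspect_env_zip(zip_path: str, module: str = "contextlib") -> Dict[str, List[str]]:
    """List interpreter entries and files for `module` found inside the zip."""
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
    # Entries that look like a Python interpreter inside a bin/ directory.
    binaries = [n for n in names if n.endswith(("bin/python", "bin/python3"))]
    # A module can be a single file or a package directory.
    module_files = [
        n
        for n in names
        if n.endswith("/" + module + ".py") or n.endswith("/" + module + "/__init__.py")
    ]
    return {"python_binaries": binaries, "module_files": module_files}


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        report = inspect_env_zip(sys.argv[1])
        print("interpreters:", report["python_binaries"])
        print("contextlib files:", report["module_files"])
    else:
        print("usage: pass the path to the environment zip")
```

If the interpreter is present but `module_files` is empty, the archive depends on a Python installation that exists on the packaging machine and may be missing on the NM hosts.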


tonywang-sh commented on September 2, 2024

My pyenv was packaged on an Ubuntu 18.04 system with Anaconda, following the guide at https://github.com/tony-framework/TonY/tree/master/tony-examples/mnist-tensorflow. Do you have another guide for setting up an environment that matches the NM machines, so that I can package this pyenv zip? Thanks.


zuston commented on September 2, 2024

Conda is also OK. If you want to check whether the env works, you could launch it on a local machine.


tonywang-sh commented on September 2, 2024

I used Anaconda to build a virtualenv Python environment and zipped it, but this pyenv zip does not work on the worker nodes. Is this the right method?


zuston commented on September 2, 2024

Can this pyenv be used on your local machine? You'd better pre-check it.
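
A pre-check along these lines could look like the sketch below: it runs the environment's interpreter by path and tries to import the modules the job needs, with the variables that an activated shell would normally set removed. The function name `can_import` and the list of stripped variables are assumptions for illustration; a YARN container's real environment may differ in other ways.

```python
import os
import subprocess
import sys
from typing import Sequence, Tuple


def can_import(python_binary: str, modules: Sequence[str]) -> Tuple[bool, str]:
    """Try to import `modules` with `python_binary`; return (ok, stderr)."""
    env = dict(os.environ)
    # Drop variables that could hide a broken environment on this machine.
    for var in ("PYTHONPATH", "PYTHONHOME", "VIRTUAL_ENV", "CONDA_PREFIX"):
        env.pop(var, None)
    proc = subprocess.run(
        [python_binary, "-c", "import " + ", ".join(modules)],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        check=False,
    )
    return proc.returncode == 0, proc.stderr.strip()


if __name__ == "__main__":
    # Usage: pass the interpreter path, e.g. the one unpacked from the zip.
    binary = sys.argv[1] if len(sys.argv) > 1 else sys.executable
    ok, err = can_import(binary, ["contextlib"])
    print("OK" if ok else "FAILED:\n" + err)
```

Running the same check on a machine that matches the NM hosts, after unzipping the archive into an empty directory, is closer to what the worker sees than a check on the packaging machine.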


tonywang-sh commented on September 2, 2024

It worked on the local machine when run with the "venv/bin/python" command line, but failed on the remote worker node when the task was submitted with the TonY script.


zuston commented on September 2, 2024

I guess this is because your local machine's environment is not consistent with the NodeManager's.


tonywang-sh commented on September 2, 2024

If the pyenv is packaged with virtualenv or Anaconda, does the environment need to be activated on the worker node, for example with a command such as 'venv/bin/activate', before the task starts on the worker? I didn't find any such "activate" step in the TonY project.
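
On the activation question in general: for environments created with Python's `venv` module, activation mainly adjusts shell variables such as PATH, and invoking the environment's interpreter by its path is normally enough. How TonY launches the interpreter is not stated in this thread. The sketch below, with the made-up name `interpreter_summary`, prints which interpreter is running and whether it sees itself as a virtual environment; running it with `venv/bin/python` on a worker would show what that interpreter resolves to there.

```python
import sys
from typing import Dict


def interpreter_summary() -> Dict[str, object]:
    """Describe the running interpreter and whether it is a venv-style env."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "base_prefix": sys.base_prefix,
        # For venv-created environments, prefix differs from base_prefix.
        "in_virtual_env": sys.prefix != sys.base_prefix,
    }


if __name__ == "__main__":
    for key, value in interpreter_summary().items():
        print(key + ":", value)
```

If `base_prefix` points to a directory that does not exist on the worker node, the interpreter cannot find its standard library there, whether or not the environment is activated.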

