Comments (4)
I am trying to pass PySpark options before instantiating the SparkContext:
import os
import platform
from pyspark import SparkContext

# Point PySpark at the cluster's JDK and Spark install, then pass the
# submit options via PYSPARK_SUBMIT_ARGS (which must end with "pyspark-shell").
os.environ["JAVA_HOME"] = "/usr/java/jdk1.8.0_181-amd64"
os.environ["SPARK_HOME"] = "/usr/hdp/current/spark2-client"
os.environ["PYSPARK_SUBMIT_ARGS"] = "--master yarn --deploy-mode client --num-executors 24 --executor-memory 10g --executor-cores 5 pyspark-shell"

sc = SparkContext.getOrCreate()
This leaves me with the following error:
/opt/anaconda3/lib/python3.6/site-packages/pyspark/java_gateway.py in launch_gateway(conf)
91
92 if not os.path.isfile(conn_info_file):
---> 93 raise Exception("Java gateway process exited before sending its port number")
94
95 with open(conn_info_file, "rb") as info:
Exception: Java gateway process exited before sending its port number
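For reference, the same options can also be set programmatically through SparkConf instead of PYSPARK_SUBMIT_ARGS; a minimal sketch (assuming the gateway itself can launch), where the SparkConf keys mirror the CLI flags above:

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setMaster("yarn")
        .set("spark.submit.deployMode", "client")
        .set("spark.executor.instances", "24")  # --num-executors
        .set("spark.executor.memory", "10g")    # --executor-memory
        .set("spark.executor.cores", "5"))      # --executor-cores
sc = SparkContext.getOrCreate(conf)

That said, the gateway here dies before any configuration is applied, so the failure is upstream of these options.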
A report of the same issue on the Jupyter repos helped me print more details on the stack:
- From the Jupyter Notebook side, I got the following:
['/usr/hdp/current/spark2-client/./bin/spark-submit', '--master', 'yarn', '--deploy-mode', 'client', '--num-executors', '24', '--executor-memory', '10g', '--executor-cores', '5', 'pyspark-shell'] {'PATH': '/opt/anaconda3/bin:/usr/lib64/qt-3.3/bin:/usr/java/jdk1.8.0_181-amd64/bin:/usr/java/jdk1.8.0_181-amd64/jre/bin:/opt/anaconda3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/elyra/.local/bin:/home/elyra/bin', 'KERNEL_USERNAME': 'elyra', 'KERNEL_GATEWAY': '1', 'KERNEL_ID': '080e0238-8106-400a-a7a5-65b9231e939b', 'EG_IMPERSONATION_ENABLED': 'False', 'JPY_PARENT_PID': '23227', 'TERM': 'xterm-color', 'CLICOLOR': '1', 'PAGER': 'cat', 'GIT_PAGER': 'cat', 'MPLBACKEND': 'module://ipykernel.pylab.backend_inline', 'JAVA_HOME': '/usr/java/jdk1.8.0_181-amd64', 'SPARK_HOME': '/usr/hdp/current/spark2-client', 'PYSPARK_SUBMIT_ARGS': '--master yarn --deploy-mode client --num-executors 24 --executor-memory 10g --executor-cores 5 pyspark-shell', '_PYSPARK_DRIVER_CONN_INFO_PATH': '/tmp/tmpbnwc7g4l/tmp70u_tphi'}
---------------------------------------------------------------------------
Exception Traceback (most recent call last)
<ipython-input-4-997392139fba> in <module>()
----> 1 sc = SparkContext.getOrCreate()
/opt/anaconda3/lib/python3.6/site-packages/pyspark/context.py in getOrCreate(cls, conf)
341 with SparkContext._lock:
342 if SparkContext._active_spark_context is None:
--> 343 SparkContext(conf=conf or SparkConf())
344 return SparkContext._active_spark_context
345
/opt/anaconda3/lib/python3.6/site-packages/pyspark/context.py in __init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls)
113 """
114 self._callsite = first_spark_call() or CallSite(None, None, None)
--> 115 SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
116 try:
117 self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,
/opt/anaconda3/lib/python3.6/site-packages/pyspark/context.py in _ensure_initialized(cls, instance, gateway, conf)
290 with SparkContext._lock:
291 if not SparkContext._gateway:
--> 292 SparkContext._gateway = gateway or launch_gateway(conf)
293 SparkContext._jvm = SparkContext._gateway.jvm
294
/opt/anaconda3/lib/python3.6/site-packages/pyspark/java_gateway.py in launch_gateway(conf)
93
94 if not os.path.isfile(conn_info_file):
---> 95 raise Exception("Java gateway process exited before sending its port number")
96
97 with open(conn_info_file, "rb") as info:
Exception: Java gateway process exited before sending its port number
- From the Jupyter Enterprise Gateway side, it shows the following:
[D 2018-09-09 12:23:50.418 EnterpriseGatewayApp] kernel_id=080e0238-8106-400a-a7a5-65b9231e939b, kernel_name=python3, last_activity=2018-09-09 12:23:50.402744+00:00
File "/bin/hdp-select", line 242
print "ERROR: Invalid package - " + name
^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print("ERROR: Invalid package - " + name)?
ls: cannot access /usr/hdp//hadoop/lib: No such file or directory
[D 2018-09-09 12:23:50.617 EnterpriseGatewayApp] activity on 080e0238-8106-400a-a7a5-65b9231e939b: stream
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
18/09/09 12:23:52 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[main,5,main]
java.util.NoSuchElementException: key not found: _PYSPARK_DRIVER_CALLBACK_HOST
at scala.collection.MapLike$class.default(MapLike.scala:228)
at scala.collection.AbstractMap.default(Map.scala:59)
at scala.collection.MapLike$class.apply(MapLike.scala:141)
at scala.collection.AbstractMap.apply(Map.scala:59)
at org.apache.spark.api.python.PythonGatewayServer$$anonfun$main$1.apply$mcV$sp(PythonGatewayServer.scala:50)
at org.apache.spark.util.Utils$.tryOrExit(Utils.scala:1302)
at org.apache.spark.api.python.PythonGatewayServer$.main(PythonGatewayServer.scala:37)
at org.apache.spark.api.python.PythonGatewayServer.main(PythonGatewayServer.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:906)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
It seems that I am somehow missing a specific configuration?
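To isolate the failure from Jupyter, the exact spark-submit command PySpark builds (printed above) can be re-run by hand; a hedged sketch, reusing the environment set earlier, which should surface the same JVM-side error directly on stderr:

import os
import subprocess

spark_submit = os.path.join(os.environ["SPARK_HOME"], "bin", "spark-submit")
cmd = [spark_submit, "--master", "yarn", "--deploy-mode", "client",
       "--num-executors", "24", "--executor-memory", "10g",
       "--executor-cores", "5", "pyspark-shell"]
subprocess.run(cmd)  # on this cluster, fails with the NoSuchElementException shown above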
It seems that this error is due to the automated deployment with Ansible: the installed version of Spark ships the latest py4j-0.10.6-src.zip, whereas the kernelspecs we push during deployment still bundle py4j-0.10.4-src.zip (a quick way to check which py4j the cluster's Spark actually ships is sketched below). Adding to this, if we set python=3 there is no local Python 2 under ${HOME}/.local/lib/python2.7/site-packages, which seems to lead to the issue with the hdp-select script, since it is written for Python 2 (hence the SyntaxError on its print statement above).
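A minimal check of the py4j mismatch (a sketch, assuming SPARK_HOME as set above): list the py4j archive the cluster's Spark ships, so the kernelspec's PYTHONPATH can be pointed at the same version:

import glob
import os

spark_home = os.environ.get("SPARK_HOME", "/usr/hdp/current/spark2-client")
for zip_path in glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")):
    print(zip_path)  # expected here: .../py4j-0.10.6-src.zip, not 0.10.4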
So the issue was the py4j version mismatch when pushing kernels: it raises java.util.NoSuchElementException: key not found: _PYSPARK_DRIVER_CALLBACK_HOST (the newer driver sets _PYSPARK_DRIVER_CONN_INFO_PATH, as seen in the environment dump above, while the cluster's older Spark still looks up _PYSPARK_DRIVER_CALLBACK_HOST). With matching versions, I was successfully able to connect the kernel to the YARN cluster, but then hit another, related issue:
[E 180910 11:05:29 web:1621] Uncaught exception POST /api/kernels (127.0.0.1)
HTTPServerRequest(protocol='http', host='127.0.0.1:8888', method='POST', uri='/api/kernels', version='HTTP/1.1', remote_ip='127.0.0.1')
Traceback (most recent call last):
File "/opt/anaconda3/lib/python3.6/site-packages/tornado/web.py", line 1543, in _execute
result = yield result
File "/opt/anaconda3/lib/python3.6/site-packages/tornado/gen.py", line 1099, in run
value = future.result()
File "/opt/anaconda3/lib/python3.6/site-packages/tornado/gen.py", line 1107, in run
yielded = self.gen.throw(*exc_info)
File "/opt/anaconda3/lib/python3.6/site-packages/kernel_gateway/services/kernels/handlers.py", line 71, in post
yield super(MainKernelHandler, self).post()
File "/opt/anaconda3/lib/python3.6/site-packages/tornado/gen.py", line 1099, in run
value = future.result()
File "/opt/anaconda3/lib/python3.6/site-packages/tornado/gen.py", line 1107, in run
yielded = self.gen.throw(*exc_info)
File "/opt/anaconda3/lib/python3.6/site-packages/notebook/services/kernels/handlers.py", line 47, in post
kernel_id = yield gen.maybe_future(km.start_kernel(kernel_name=model['name']))
File "/opt/anaconda3/lib/python3.6/site-packages/tornado/gen.py", line 1099, in run
value = future.result()
File "/opt/anaconda3/lib/python3.6/site-packages/tornado/gen.py", line 1107, in run
yielded = self.gen.throw(*exc_info)
File "/opt/anaconda3/lib/python3.6/site-packages/enterprise_gateway/services/kernels/remotemanager.py", line 28, in start_kernel
kernel_id = yield gen.maybe_future(super(RemoteMappingKernelManager, self).start_kernel(*args, **kwargs))
File "/opt/anaconda3/lib/python3.6/site-packages/tornado/gen.py", line 1099, in run
value = future.result()
File "/opt/anaconda3/lib/python3.6/site-packages/tornado/gen.py", line 1107, in run
yielded = self.gen.throw(*exc_info)
File "/opt/anaconda3/lib/python3.6/site-packages/kernel_gateway/services/kernels/manager.py", line 81, in start_kernel
kernel_id = yield gen.maybe_future(super(SeedingMappingKernelManager, self).start_kernel(*args, **kwargs))
File "/opt/anaconda3/lib/python3.6/site-packages/tornado/gen.py", line 1099, in run
value = future.result()
File "/opt/anaconda3/lib/python3.6/site-packages/tornado/gen.py", line 315, in wrapper
yielded = next(result)
File "/opt/anaconda3/lib/python3.6/site-packages/notebook/services/kernels/kernelmanager.py", line 148, in start_kernel
super(MappingKernelManager, self).start_kernel(**kwargs)
File "/opt/anaconda3/lib/python3.6/site-packages/jupyter_client/multikernelmanager.py", line 110, in start_kernel
km.start_kernel(**kwargs)
File "/opt/anaconda3/lib/python3.6/site-packages/enterprise_gateway/services/kernels/remotemanager.py", line 102, in start_kernel
return super(RemoteKernelManager, self).start_kernel(**kw)
File "/opt/anaconda3/lib/python3.6/site-packages/jupyter_client/manager.py", line 259, in start_kernel
**kw)
File "/opt/anaconda3/lib/python3.6/site-packages/enterprise_gateway/services/kernels/remotemanager.py", line 131, in _launch_kernel
return self.process_proxy.launch_process(kernel_cmd, **kw)
File "/opt/anaconda3/lib/python3.6/site-packages/enterprise_gateway/services/processproxies/yarn.py", line 53, in launch_process
self.confirm_remote_startup(kernel_cmd, **kw)
File "/opt/anaconda3/lib/python3.6/site-packages/enterprise_gateway/services/processproxies/yarn.py", line 166, in confirm_remote_startup
self.detect_launch_failure()
File "/opt/anaconda3/lib/python3.6/site-packages/enterprise_gateway/services/processproxies/processproxy.py", line 587, in detect_launch_failure
if self.local_proc and self.local_proc.poll() > 0:
TypeError: '>' not supported between instances of 'NoneType' and 'int'
The TypeError: '>' not supported between instances of 'NoneType' and 'int'
exception is fixed by this EG commit, and we are going to provide an updated release with the fix later today.
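For context, subprocess.Popen.poll() returns None while the child process is still alive, and Python 3 refuses to order None against an int, hence the TypeError. A None-safe version of that check might look like the following sketch (not the actual EG patch):

import subprocess

def launch_failed(proc):
    # poll() returns None until the child exits; only compare the
    # exit code once it actually exists.
    if proc is None:
        return False
    exit_code = proc.poll()
    return exit_code is not None and exit_code > 0

proc = subprocess.Popen(["sleep", "1"])
print(launch_failed(proc))  # False while the child is still running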