swan-cern / sparkmonitor

An extension for Jupyter Lab & Jupyter Notebook to monitor Apache Spark (pyspark) from notebooks

Home Page: https://pypi.org/project/sparkmonitor/

License: Apache License 2.0

JavaScript 1.15% Python 34.21% CSS 9.29% Scala 20.92% TypeScript 34.42%

jupyter jupyterlab-extension spark jupyter-notebook-extension jupyterlab

sparkmonitor's Introduction

SparkMonitor

About

SparkMonitor is an extension for Jupyter Notebook & Lab that enables live monitoring of Apache Spark jobs spawned from a notebook. The extension provides several features to monitor and debug a Spark job from within the notebook interface.

Requirements

Jupyter Lab 4 OR Jupyter Notebook 4.4.0 or higher
pyspark 2 or 3

Features

Automatically displays a live monitoring tool below cells that run Spark jobs in a Jupyter notebook
A table of jobs and stages with progress bars
A timeline which shows jobs, stages, and tasks
A graph showing the number of active tasks and executor cores over time

Quick Start

Setting up the extension

pip install sparkmonitor # install the extension

# set up an ipython profile and add our kernel extension to it
ipython profile create # if it does not exist
echo "c.InteractiveShellApp.extensions.append('sparkmonitor.kernelextension')" >> "$(ipython profile locate default)/ipython_kernel_config.py"

# For use with jupyter notebook install and enable the nbextension
jupyter nbextension install sparkmonitor --py
jupyter nbextension enable  sparkmonitor --py

# The jupyterlab extension is automatically enabled
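The echo command above appends the line again every time it runs, and it writes into the profile of whichever user runs it. A minimal Python sketch that adds the line only if it is missing, assuming IPython's usual `<ipython_dir>/profile_default/ipython_kernel_config.py` layout; `ensure_kernel_extension` and `default_kernel_config` are hypothetical helpers, not part of sparkmonitor:

```python
from pathlib import Path

# The same line the README's echo command appends to the kernel config.
EXTENSION_LINE = "c.InteractiveShellApp.extensions.append('sparkmonitor.kernelextension')"


def default_kernel_config():
    """Path of the default profile's kernel config (assumes IPython's usual layout)."""
    from IPython.paths import get_ipython_dir  # requires IPython to be installed

    return Path(get_ipython_dir()) / "profile_default" / "ipython_kernel_config.py"


def ensure_kernel_extension(config_path):
    """Append EXTENSION_LINE unless it is already present; return True if it was appended."""
    config_path = Path(config_path)
    # Create the profile directory if it does not exist yet.
    config_path.parent.mkdir(parents=True, exist_ok=True)
    existing = config_path.read_text() if config_path.exists() else ""
    if EXTENSION_LINE in existing.splitlines():
        return False
    with config_path.open("a") as f:
        if existing and not existing.endswith("\n"):
            f.write("\n")  # keep the appended line on its own line
        f.write(EXTENSION_LINE + "\n")
    return True
```

Running `ensure_kernel_extension(default_kernel_config())` as the notebook user targets that user's own profile, so it is safe to rerun.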

With the extension installed, a SparkConf object called conf will be usable from your notebooks. You can use it as follows:

from pyspark import SparkContext
# Start the Spark context using the SparkConf object named `conf` that the extension created in your kernel.
sc = SparkContext.getOrCreate(conf=conf)

If you already have your own Spark configuration, you will need to set spark.extraListeners to sparkmonitor.listener.JupyterSparkMonitorListener and spark.driver.extraClassPath to the listener JAR shipped inside the sparkmonitor Python package, path/to/package/sparkmonitor/listener_<scala_version>.jar, where <scala_version> matches the Scala version of your Spark build:

from pyspark.sql import SparkSession
spark = SparkSession.builder\
        .config('spark.extraListeners', 'sparkmonitor.listener.JupyterSparkMonitorListener')\
        .config('spark.driver.extraClassPath', 'venv/lib/python3.<X>/site-packages/sparkmonitor/listener_<scala_version>.jar')\
        .getOrCreate()
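Rather than hardcoding the site-packages path, the JAR can be located from the installed package, and the two settings can be appended to values you already have (spark.extraListeners is a comma-separated list of classes; the driver class path is joined with the platform path separator, ":" on Linux). A minimal sketch under those assumptions; `listener_jar` and `with_sparkmonitor` are hypothetical helpers, not part of sparkmonitor:

```python
import importlib.util
import os
from pathlib import Path

LISTENER = "sparkmonitor.listener.JupyterSparkMonitorListener"


def listener_jar(scala_version="2.12", package_dir=None):
    """Return the path of listener_<scala_version>.jar inside the sparkmonitor package.

    If package_dir is None, the installed sparkmonitor package is located via importlib.
    """
    if package_dir is None:
        spec = importlib.util.find_spec("sparkmonitor")
        if spec is None or spec.origin is None:
            raise ModuleNotFoundError("sparkmonitor is not installed in this environment")
        package_dir = Path(spec.origin).parent  # origin points at sparkmonitor/__init__.py
    jar = Path(package_dir) / f"listener_{scala_version}.jar"
    if not jar.is_file():
        raise FileNotFoundError(f"no listener JAR at {jar}")
    return str(jar)


def _append(existing, value, sep):
    """Append value to a separator-delimited setting unless it is already listed."""
    items = [item for item in (existing or "").split(sep) if item]
    if value not in items:
        items.append(value)
    return sep.join(items)


def with_sparkmonitor(settings, jar_path):
    """Return a copy of a Spark settings dict with the sparkmonitor listener added."""
    merged = dict(settings)
    merged["spark.extraListeners"] = _append(
        settings.get("spark.extraListeners"), LISTENER, ","
    )
    merged["spark.driver.extraClassPath"] = _append(
        settings.get("spark.driver.extraClassPath"), jar_path, os.pathsep
    )
    return merged
```

Each resulting key/value pair can then be passed to `SparkSession.builder.config(key, value)` before calling `getOrCreate()`.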

Development

If you'd like to develop the extension:

# See package.json scripts for building the frontend
yarn run build:<action>

# Install the package in editable mode
pip install -e .

# Symlink jupyterlab extension
jupyter labextension develop --overwrite .

# Watch for frontend changes
yarn run watch

# Build the spark JAR files
sbt +package

History

This project was originally written by krishnan-r as a Google Summer of Code project for Jupyter Notebook with the SWAN Notebook Service team at CERN.
Further fixes and improvements were made by the team at CERN and by members of the community in swan-cern/jupyter-extensions/tree/master/SparkMonitor.
Jafer Haider created the fork jupyterlab-sparkmonitor to update the extension to be compatible with JupyterLab as part of his internship at Yelp.
This repository merges all the work done above and provides support for Lab & Notebook from a single package.

Changelog

This repository is published to pypi as sparkmonitor

2.x: see the GitHub releases page of this repository
1.x and below: published from swan-cern/jupyter-extensions, with some initial versions from krishnan-r/sparkmonitor

sparkmonitor's People

Contributors

krishnan-r darshanparab rahul26goyal richardfontaine utkarshgupta137 kaspian-inc akhilputhiry etejedor

sparkmonitor's Issues

Monitoring progress bar not showing when using spark on kubernetes kernel

I am trying to use sparkmonitor to monitor a Spark job that I run from JupyterLab with a Spark on Kubernetes kernel. It works fine when I use the python3 kernel.

In the logs I can see that the extension loads fine, along with some other info:

INFO:SparkMonitorKernel:Starting Kernel Extension
INFO:SparkMonitorKernel:Socket Listening on port 33039
INFO:SparkMonitorKernel:Starting socket thread, going to accept
INFO:SparkMonitorKernel:33039

My JupyterLab version is 3.5.0 and Spark on Kubernetes is 3.0.2.

Can someone please help me with this?

log4j default logging level for sparkmonitor

Hi there,

Currently, when configuring logging in Spark 3.3.2 with JupyterLab 3.6.5, we use the log4j2.properties file to set the logging level for pyspark notebooks (IPython).

When we use sparkmonitor, we receive the following INFO messages, which appear as red, warning-level messages:

INFO:SparkMonitorKernel:Client Connected ('127.0.0.1', 47646)

Looking into the source code for the listener, I could see the class name should be something like JupyterSparkMonitorListener, which means that in the pyspark log4j2 properties file we would expect something like this:

logger.sparkmonitor.name = sparkmonitor.listener.JupyterSparkMonitorListener
logger.sparkmonitor.level = error

However, this didn't seem to work. Looking into the JupyterSparkMonitorListener class, I could see it was using log4j, not log4j2. I then tried including a log4j.properties file in the Spark configuration directory using the old configuration format:

log4j.logger.sparkmonitor.listener.JupyterSparkMonitorListener=WARN

However, this didn't work either. Do you have any advice on how to adjust the default log4j logging level? We want to hide INFO-level messages from users in JupyterLab running Spark, but we are not sure where or how to configure this.
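One observation that may help: the prefix `INFO:SparkMonitorKernel:` matches Python's default `logging` format (level, logger name, message), which suggests these lines come from the Python kernel extension rather than the JVM listener, so log4j settings would not affect them. Assuming the Python logger is named `SparkMonitorKernel`, a minimal sketch that raises its threshold, to be run in the kernel after the extension has loaded (for example in the first notebook cell); `quiet_sparkmonitor` is a hypothetical helper:

```python
import logging

# Assumption: the kernel extension logs through a Python logger with this name,
# inferred from the "INFO:SparkMonitorKernel:..." prefix in the output.
SPARKMONITOR_LOGGER = "SparkMonitorKernel"


def quiet_sparkmonitor(level=logging.WARNING):
    """Drop records below `level` from the assumed sparkmonitor kernel logger."""
    logger = logging.getLogger(SPARKMONITOR_LOGGER)
    logger.setLevel(level)
    return logger


quiet_sparkmonitor()
```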

'ScalaMonitor' object has no attribute 'comm'

When I click Restart the Kernel and then re-run the whole notebook, it throws this error:

Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.6/dist-packages/sparkmonitor/kernelextension.py", line 126, in run
    self.onrecv(msg)
  File "/usr/local/lib/python3.6/dist-packages/sparkmonitor/kernelextension.py", line 145, in onrecv
    'msg': msg
  File "/usr/local/lib/python3.6/dist-packages/sparkmonitor/kernelextension.py", line 223, in sendToFrontEnd
    monitor.send(msg)
  File "/usr/local/lib/python3.6/dist-packages/sparkmonitor/kernelextension.py", line 57, in send
    self.comm.send(msg)
AttributeError: 'ScalaMonitor' object has no attribute 'comm'

spark.driver.extraClassPath=/usr/local/lib/python3.6/dist-packages/sparkmonitor/listener_2.12.jar
spark.extraListeners=sparkmonitor.listener.JupyterSparkMonitorListener
SparkContext

Spark UI: http://localhost:4041/

Version: v3.1.2
Master: local[*]
AppName: pyspark-demo

But if I manually click Restart the Kernel and re-run only the cell that initializes the Spark context, it works.
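The traceback shows `ScalaMonitor.send` calling `self.comm.send(msg)` before any `comm` attribute exists, which suggests a Spark event arrived over the socket before the frontend had opened the comm. One defensive pattern is to buffer messages until the comm is attached and then flush them in order. A minimal sketch; `BufferedMonitor` and its methods are hypothetical stand-ins, not sparkmonitor's actual classes, and the comm is assumed to be any object with a `send(msg)` method:

```python
import threading
from collections import deque


class BufferedMonitor:
    """Hypothetical kernel-side monitor that queues messages until a comm is attached."""

    def __init__(self, max_pending=1000):
        self.comm = None  # set once the frontend opens the comm
        # Messages received before the comm exists; the oldest are dropped past max_pending.
        self._pending = deque(maxlen=max_pending)
        # The socket thread calls send() while the comm handler calls register_comm().
        self._lock = threading.Lock()

    def register_comm(self, comm):
        """Attach the comm and flush buffered messages in arrival order."""
        with self._lock:
            self.comm = comm
            while self._pending:
                comm.send(self._pending.popleft())

    def send(self, msg):
        """Forward msg if a comm is attached; otherwise buffer it instead of raising."""
        with self._lock:
            if self.comm is None:
                self._pending.append(msg)
            else:
                self.comm.send(msg)
```

Holding the lock while flushing keeps messages that arrive during registration from overtaking the buffered ones.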

Text changing colour in JupyterLab dark theme, background remains white (JupyterLab 4.1.2)

I've been working on a new Pyspark JupyterLab build and was testing extension support one at a time using the Docker container quay.io/jupyter/pyspark-notebook:x86_64-spark-3.5.0. I installed the sparkmonitor extension as normal and it works fine in the default JupyterLab light theme. However, because the table background CSS attributes for sparkmonitor are static, switching to the JupyterLab dark theme changes the table text to lighter colours, making it illegible.

Either the plugin should adjust its appearance based on the theme (inheriting the theme-aware CSS attributes), or the extension CSS should force the font colour to always use the dark colours of the light-theme appearance in JupyterLab.

It should be noted that the task and event timeline views of sparkmonitor render with black text when the dark theme is selected. These appear to be drawn by rendering libraries, though, and it might be difficult to make them inherit the JupyterLab theme CSS. Perhaps the easiest fix is to force the font colour in the extension CSS and, in a future body of work, look at adding theme colour support.

In the meantime, I will try earlier versions of JupyterLab until I find something working as expected. We are currently using JupyterLab 3.6.2, and the text is visible even in dark theme.

spark-sql query doesn't show the sparkmonitor progressbar

It works fine with a simple Spark session demo.

But when I test a Hive query in JupyterLab with the sparkmonitor extension enabled, using the following code:

from pyspark.sql import SparkSession
app_name="pyspark-hive-demo"

spark = SparkSession.builder.appName(app_name).enableHiveSupport()

spark = spark.config('spark.extraListeners', 'sparkmonitor.listener.JupyterSparkMonitorListener')
spark = spark.config('spark.driver.extraClassPath', '/usr/local/lib/python3.6/dist-packages/sparkmonitor/listener_2.12.jar')
spark = spark.getOrCreate()

df=spark.sql("select title,source_id,source_url  from db.tb where pt_d='2022-10-18' limit 10")
df.show(10)

Why does it just print the result, without the sparkmonitor progress bar?
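One possible cause worth ruling out: spark.extraListeners is read when the SparkContext starts, so if a session already existed in the kernel, `getOrCreate()` returns it and the listener setting is not applied (Spark typically logs a warning that some configurations may not take effect). A minimal sketch to check whether the running context actually has the listener registered; `listener_registered` is a hypothetical helper:

```python
LISTENER = "sparkmonitor.listener.JupyterSparkMonitorListener"


def listener_registered(conf, listener=LISTENER):
    """True if `conf` lists the sparkmonitor listener in spark.extraListeners.

    `conf` can be a pyspark SparkConf (e.g. spark.sparkContext.getConf()) or any
    object with a get(key, default) method, such as a plain dict.
    """
    listeners = conf.get("spark.extraListeners", "") or ""
    return listener in (name.strip() for name in listeners.split(","))
```

In the notebook, `listener_registered(spark.sparkContext.getConf())` returning `False` would mean the monitor never received events from this session.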

Scrolling for a Spark lengthy job progress bar

Requested by Michał (UCA-386):

I am working with a large dataset (1000+ signals) queried for several periods of time with NXCALS. The query results in tens of load and collect jobs. As a result, the progress bar under the cell is quite long, which makes the notebook difficult to use (see attached). For a longer query, my browser couldn't handle the hundreds of lines appearing.

Thus I'd like to suggest adding an internal scroll bar and/or grouping similar tasks into a single subtree.

AttributeError: 'ScalaMonitor' object has no attribute 'comm'

Summary of Issue

Note - Related container log error: ERROR JupyterSparkMonitorListener: Exception sending socket message:

Hello! We are running a Jupyterhub service that spawns Kubernetes pods for single user servers. We also run our jupyterhub proxy as a separate service. We've been experiencing a very persistent, but flaky, issue with the Sparkmonitor extension. When creating a spark session, I am seeing the following error AttributeError: 'ScalaMonitor' object has no attribute 'comm'. However, this does not happen consistently. I oftentimes am able to create a session where the monitor will work but then if I restart the kernel, the monitor is broken with the above error (or vice versa).

For example, on my first attempt at creating a spark session, the extension works fine. Container log:

INFO:SparkMonitorKernel:Starting Kernel Extension
INFO:SparkMonitorKernel:Socket Listening on port 59845
INFO:SparkMonitorKernel:Starting socket thread, going to accept
INFO:SparkMonitorKernel:59845
INFO:SparkMonitorKernel:Adding jar from /work/venv_core_ml_kernel/lib/python3.7/site-packages/sparkmonitor/listener_2.12.jar 
[I 2021-12-21 23:45:52.892 SingleUserNotebookApp log:189] 101 GET /user/username/api/kernels/e5800417-4831-4bb4-9eec-85d7fb16b388/channels?session_id=aca66ed3-e383-4dfd-bd21-eae11e96ccb2 ([email protected]) 1877.03ms
[I 2021-12-21 23:45:53.002 SingleUserNotebookApp log:189] 101 GET /user/username/api/kernels/e5800417-4831-4bb4-9eec-85d7fb16b388/channels?session_id=26868351-57cd-43c6-b14b-c85bd1842316 ([email protected]) 2.18ms
[I 2021-12-21 23:45:53.147 SingleUserNotebookApp log:189] 101 GET /user/username/api/kernels/e5800417-4831-4bb4-9eec-85d7fb16b388/channels?session_id=0ccb7d08-e61b-44e6-a08c-57a36d2797ed ([email protected]) 2.12ms
[I 2021-12-21 23:45:55.711 SingleUserNotebookApp log:189] 200 GET /user/username/api/terminals?1640130355597 ([email protected]) 1.13ms
[I 2021-12-21 23:45:56.253 SingleUserNotebookApp log:189] 200 GET /user/username/api/sessions?1640130356141 ([email protected]) 1.09ms
[I 2021-12-21 23:45:56.254 SingleUserNotebookApp log:189] 200 GET /user/username/api/kernels?1640130356142 ([email protected]) 1.01ms
INFO:SparkMonitorKernel:SparkMonitor comm opened from frontend.
INFO:SparkMonitorKernel:SparkMonitor comm opened from frontend.
[I 2021-12-21 23:46:03.074 SingleUserNotebookApp log:189] 200 GET /user/username/api/contents?content=1&1640130360080 ([email protected]) 2874.67ms
[I 2021-12-21 23:46:05.820 SingleUserNotebookApp log:189] 200 GET /user/username/api/terminals?1640130365706 ([email protected]) 1.36ms
[I 2021-12-21 23:46:05.981 SingleUserNotebookApp log:189] 200 GET /user/username/api/kernelspecs?1640130365868 ([email protected]) 2.78ms
[I 2021-12-21 23:46:07.068 SingleUserNotebookApp log:189] 200 GET /user/username/api/sessions?1640130366954 ([email protected]) 1.18ms
[I 2021-12-21 23:46:07.070 SingleUserNotebookApp log:189] 200 GET /user/username/api/kernels?1640130366956 ([email protected]) 1.06ms
[I 2021-12-21 23:46:16.011 SingleUserNotebookApp log:189] 200 GET /user/username/api/contents?content=1&1640130373104 ([email protected]) 2795.89ms
[I 2021-12-21 23:46:16.014 SingleUserNotebookApp log:189] 200 GET /user/username/api/terminals?1640130375820 ([email protected]) 0.98ms
[I 2021-12-21 23:46:17.181 SingleUserNotebookApp log:189] 200 GET /user/username/api/sessions?1640130377069 ([email protected]) 1.22ms
[I 2021-12-21 23:46:17.182 SingleUserNotebookApp log:189] 200 GET /user/username/api/kernels?1640130377071 ([email protected]) 0.90ms
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
I1221 23:46:20.384598   242 sched.cpp:232] Version: 1.7.2
I1221 23:46:20.385978   225 sched.cpp:336] New master detected at [email protected]:5050
I1221 23:46:20.386255   225 sched.cpp:401] Authenticating with master [email protected]:5050
I1221 23:46:20.386270   225 sched.cpp:408] Using default CRAM-MD5 authenticatee
I1221 23:46:20.386425   234 authenticatee.cpp:97] Initializing client SASL
I1221 23:46:20.387233   234 authenticatee.cpp:121] Creating new client SASL connection
W1221 23:46:20.397209   241 process.cpp:838] Failed to recv on socket 437 to peer '36.72.23.53:34312': Decoder error
I1221 23:46:20.398273   229 authenticatee.cpp:213] Received SASL authentication mechanisms: CRAM-MD5
I1221 23:46:20.398306   229 authenticatee.cpp:239] Attempting to authenticate with mechanism 'CRAM-MD5'
I1221 23:46:20.398959   238 authenticatee.cpp:259] Received SASL authentication step
I1221 23:46:20.399524   239 authenticatee.cpp:299] Authentication success
I1221 23:46:20.399653   236 sched.cpp:513] Successfully authenticated with master [email protected]:5050
I1221 23:46:20.401082   230 sched.cpp:744] Framework registered with 195f7369-64e5-4bc5-a8ed-8b576994450a-11035
21/12/21 23:46:20 WARN MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
INFO:SparkMonitorKernel:Client Connected ('127.0.0.1', 35478)
[I 2021-12-21 23:46:26.141 SingleUserNotebookApp log:189] 200 GET /user/username/api/terminals?1640130386026 ([email protected]) 1.00ms
[I 2021-12-21 23:46:28.410 SingleUserNotebookApp log:189] 200 GET /user/username/api/contents?content=1&1640130386068 ([email protected]) 2230.40ms
[I 2021-12-21 23:46:28.416 SingleUserNotebookApp log:189] 200 GET /user/username/api/sessions?1640130387180 ([email protected]) 3.12ms
[I 2021-12-21 23:46:28.417 SingleUserNotebookApp log:189] 200 GET /user/username/api/kernels?1640130387182 ([email protected]) 3.31ms
[I 2021-12-21 23:46:39.097 SingleUserNotebookApp log:189] 200 GET /user/username/api/contents?content=1&1640130396071 ([email protected]) 2914.93ms
[I 2021-12-21 23:46:39.100 SingleUserNotebookApp log:189] 200 GET /user/username/api/terminals?1640130396136 ([email protected]) 1.93ms
[I 2021-12-21 23:46:39.101 SingleUserNotebookApp log:189] 200 GET /user/username/api/sessions?1640130398473 ([email protected]) 2.26ms
[I 2021-12-21 23:46:39.101 SingleUserNotebookApp log:189] 200 GET /user/username/api/kernels?1640130398474 ([email protected]) 2.46ms
/source/virtualenv_run_jupyter/lib/python3.7/site-packages/jupyter_client/manager.py:358: FutureWarning: Method cleanup(connection_file=True) is deprecated, use cleanup_resources(restart=False).
  FutureWarning)
[I 2021-12-21 23:46:39.217 SingleUserNotebookApp multikernelmanager:534] Kernel restarted: e5800417-4831-4bb4-9eec-85d7fb16b388

But if I restart the kernel and try again, the extension fails with AttributeError: 'ScalaMonitor' object has no attribute 'comm' (Note: it does not consistently follow this pattern of first time it works, second time it fails. Often the first session will fail and then it works after a kernel restart. Or it will fail three times in a row then work.)

INFO:SparkMonitorKernel:Client Connected ('127.0.0.1', 35472)
Exception in thread Thread-8:
Traceback (most recent call last):
  File "/usr/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/work/venv_core_ml_kernel/lib/python3.7/site-packages/sparkmonitor/kernelextension.py", line 126, in run
    self.onrecv(msg)
  File "/work/venv_core_ml_kernel/lib/python3.7/site-packages/sparkmonitor/kernelextension.py", line 145, in onrecv
    'msg': msg
  File "/work/venv_core_ml_kernel/lib/python3.7/site-packages/sparkmonitor/kernelextension.py", line 223, in sendToFrontEnd
    monitor.send(msg)
  File "/work/venv_core_ml_kernel/lib/python3.7/site-packages/sparkmonitor/kernelextension.py", line 57, in send
    self.comm.send(msg)
AttributeError: 'ScalaMonitor' object has no attribute 'comm'

Container log on failure:

[I 2021-12-21 23:46:39.219 SingleUserNotebookApp log:189] 200 POST /user/username/api/kernels/e5800417-4831-4bb4-9eec-85d7fb16b388/restart?1640130391290 ([email protected]) 7817.75ms
[I 2021-12-21 23:46:39.388 SingleUserNotebookApp log:189] 101 GET /user/username/api/kernels/e5800417-4831-4bb4-9eec-85d7fb16b388/channels?session_id=26868351-57cd-43c6-b14b-c85bd1842316 ([email protected]) 2.70ms
INFO:SparkMonitorKernel:Starting Kernel Extension
INFO:SparkMonitorKernel:Socket Listening on port 53939
INFO:SparkMonitorKernel:Starting socket thread, going to accept
INFO:SparkMonitorKernel:53939
INFO:SparkMonitorKernel:Adding jar from /work/venv_core_ml_kernel/lib/python3.7/site-packages/sparkmonitor/listener_2.12.jar 
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
I1221 23:46:45.923640   389 sched.cpp:232] Version: 1.7.2
I1221 23:46:45.925024   378 sched.cpp:336] New master detected at [email protected]:5050
I1221 23:46:45.925308   378 sched.cpp:401] Authenticating with master [email protected]:5050
I1221 23:46:45.925323   378 sched.cpp:408] Using default CRAM-MD5 authenticatee
I1221 23:46:45.925534   382 authenticatee.cpp:97] Initializing client SASL
I1221 23:46:45.926290   382 authenticatee.cpp:121] Creating new client SASL connection
W1221 23:46:45.928249   388 process.cpp:838] Failed to recv on socket 437 to peer '36.72.23.53:37450': Decoder error
I1221 23:46:45.929303   375 authenticatee.cpp:213] Received SASL authentication mechanisms: CRAM-MD5
I1221 23:46:45.929327   375 authenticatee.cpp:239] Attempting to authenticate with mechanism 'CRAM-MD5'
I1221 23:46:45.929982   383 authenticatee.cpp:259] Received SASL authentication step
I1221 23:46:45.930640   386 authenticatee.cpp:299] Authentication success
I1221 23:46:45.930742   385 sched.cpp:513] Successfully authenticated with master [email protected]:5050
I1221 23:46:45.933359   377 sched.cpp:744] Framework registered with 195f7369-64e5-4bc5-a8ed-8b576994450a-11038
21/12/21 23:46:46 WARN MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
INFO:SparkMonitorKernel:Client Connected ('127.0.0.1', 35472)
INFO:SparkMonitorKernel:SparkMonitor comm opened from frontend.
[I 2021-12-21 23:46:49.219 SingleUserNotebookApp log:189] 200 GET /user/username/api/terminals?1640130409107 ([email protected]) 1.17ms
[I 2021-12-21 23:46:49.220 SingleUserNotebookApp log:189] 200 GET /user/username/api/sessions?1640130409109 ([email protected]) 1.04ms
[I 2021-12-21 23:46:49.222 SingleUserNotebookApp log:189] 200 GET /user/username/api/kernels?1640130409111 ([email protected]) 0.99ms
21/12/21 23:46:51 ERROR JupyterSparkMonitorListener: Exception sending socket message: 
java.net.SocketException: Broken pipe (Write failed)
        at java.net.SocketOutputStream.socketWrite0(Native Method)
        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
        at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
        at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291)
        at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:295)
        at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141)
        at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
        at sparkmonitor.listener.JupyterSparkMonitorListener.send(CustomListener.scala:54)
        at sparkmonitor.listener.JupyterSparkMonitorListener.onExecutorAdded(CustomListener.scala:652)
        at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:63)
        at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
        at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
        at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
        at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:115)
        at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:99)
        at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
        at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
        at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
        at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
        at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
        at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1319)
        at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)

I'm experiencing this issue even with a minimal configuration such as this:

from pyspark.sql import SparkSession
spark = SparkSession.builder\
        .config('spark.extraListeners', 'sparkmonitor.listener.JupyterSparkMonitorListener')\
        .config('spark.driver.extraClassPath', '/source/virtualenv_run_jupyter/lib/python3.7/site-packages/sparkmonitor/listener_2.12.jar:/opt/spark/extra_jars/guava-27.0.1-jre.jar')\
        .getOrCreate()

Additional Details

We are using the following Jupyter related packages/dependencies in our containers:

ipykernel==5.3.4
ipython==7.20.0
jupyter-client==6.1.5
jupyter-core==4.7.1
jupyter-packaging==0.10.2
jupyter-server==1.7.0
jupyter-server-proxy==1.6.0
jupyter-telemetry==0.1.0
jupyterhub==1.4.2
jupyterlab==3.0.16
jupyterlab-pygments==0.1.2
jupyterlab-server==2.5.2
jupyterlab-widgets==1.0.0
nbconvert==5.5.0
nbformat==5.1.3
traitlets==4.3.3

Any assistance with troubleshooting this issue would be greatly appreciated. A lot of our users really like the Sparkmonitor! Please let me know if I can provide any other details about our setup that would be helpful. Thank you!

Explore an alternative approach to Spark UI Proxy

Explore using https://github.com/jupyterhub/jupyter-server-proxy or another generic approach to provide the Spark UI through a proxy.

The current approach is brittle, as it works only on localhost and is hardcoded. (It is currently removed in refactor #1 and will be added back.)

In our current deployment, we rely on https://github.com/swan-cern/jupyter-extensions/tree/master/SparkConnector as an external link (this requires being on the same network).
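As one possible direction: jupyter-server-proxy serves local ports under `<base_url>proxy/<port>/`, and pyspark exposes the Spark UI address as `SparkContext.uiWebUrl`. A minimal sketch mapping one to the other; `proxied_spark_ui` is hypothetical, `base_url` would be the server's base URL (under JupyterHub, for example, the `JUPYTERHUB_SERVICE_PREFIX` environment variable), and the Spark UI may need extra Spark-side configuration for its internal links to work behind a path prefix, which is not shown:

```python
from urllib.parse import urlparse


def proxied_spark_ui(ui_web_url, base_url="/"):
    """Map a Spark UI URL such as sc.uiWebUrl to jupyter-server-proxy's /proxy/<port>/ route."""
    port = urlparse(ui_web_url).port
    if port is None:
        raise ValueError(f"no port in Spark UI URL: {ui_web_url!r}")
    if not base_url.endswith("/"):
        base_url += "/"
    return f"{base_url}proxy/{port}/"
```

Because the link is built from the running context's own URL, it would no longer depend on a hardcoded localhost port.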

Task error and output not showing anymore in Spark monitor

In the task timeline tab of the Spark monitor, when a task fails and the user clicks on the task bar to see information about it, it is no longer possible to see the error and output of that task.

This is important for the debugging experience of the user.

Apparently, the problem has to do with retrieving that information from the Spark UI.
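For reference, Spark's monitoring REST API exposes per-task records, including an `errorMessage` field for failed tasks, at `/api/v1/applications/<app-id>/stages/<stage-id>/<attempt-id>/taskList` on the driver's UI. A hedged sketch of reading those messages, using `sc.uiWebUrl` and `sc.applicationId` for the first two arguments; the helper names are hypothetical, and the endpoint pages its results (see its `offset` and `length` query parameters in the Spark documentation):

```python
import json
from urllib.request import urlopen


def task_list_url(ui_url, app_id, stage_id, attempt_id=0):
    """REST endpoint listing the tasks of one stage attempt."""
    return (
        f"{ui_url.rstrip('/')}/api/v1/applications/{app_id}"
        f"/stages/{stage_id}/{attempt_id}/taskList"
    )


def failed_task_errors(tasks):
    """Map taskId -> errorMessage for task records that carry an error message."""
    return {t["taskId"]: t["errorMessage"] for t in tasks if t.get("errorMessage")}


def fetch_failed_task_errors(ui_url, app_id, stage_id, attempt_id=0):
    """Fetch one page of task records from the driver and keep only the failures."""
    with urlopen(task_list_url(ui_url, app_id, stage_id, attempt_id)) as resp:
        return failed_task_errors(json.load(resp))
```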

ERROR JupyterSparkMonitorListener: Exception creating socket: java.lang.NumberFormatException: For input string: "ERRORNOTFOUND"

Hi there!

I installed the sparkmonitor extension into a Docker image that is based on JupyterHub:

FROM jupyterhub/k8s-singleuser-sample:1.2.0 

RUN pip install pyspark==3.2.0
RUN pip install delta-spark==1.1.0

USER root
RUN apt update
RUN apt install default-jdk -y
RUN apt install nodejs -y
#USER ${NB_UID}
RUN java --version

RUN pip install sparkmonitor # install the extension

# set up an ipython profile and add our kernel extension to it
#ipython profile create # if it does not exist
RUN echo "c.InteractiveShellApp.extensions.append('sparkmonitor.kernelextension')" >>  $(ipython profile locate default)/ipython_kernel_config.py

# For use with jupyter notebook install and enable the nbextension
RUN jupyter nbextension install sparkmonitor --py
RUN jupyter nbextension enable  sparkmonitor --py

# The jupyterlab extension is automatically enabled

USER ${NB_UID}

When I opened the SparkSessions with:

from pyspark.sql import SparkSession
spark = SparkSession.builder\
        .config("spark.extraListeners", "sparkmonitor.listener.JupyterSparkMonitorListener") \
        .config("spark.driver.extraClassPath", "/opt/conda/lib/python3.9/site-packages/sparkmonitor/listener_2.12.jar") \
        .getOrCreate()

I get the following error:

WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/conda/lib/python3.9/site-packages/pyspark/jars/spark-unsafe_2.12-3.2.0.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/04/01 11:14:22 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/04/01 11:14:25 ERROR JupyterSparkMonitorListener: Exception creating socket: 
java.lang.NumberFormatException: For input string: "ERRORNOTFOUND"
	at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
	at java.base/java.lang.Integer.parseInt(Integer.java:652)
	at java.base/java.lang.Integer.parseInt(Integer.java:770)
	at scala.collection.immutable.StringLike.toInt(StringLike.scala:304)
	at scala.collection.immutable.StringLike.toInt$(StringLike.scala:304)
	at scala.collection.immutable.StringOps.toInt(StringOps.scala:33)
	at sparkmonitor.listener.JupyterSparkMonitorListener.startConnection(CustomListener.scala:63)
	at sparkmonitor.listener.JupyterSparkMonitorListener.<init>(CustomListener.scala:48)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
	at org.apache.spark.util.Utils$.$anonfun$loadExtensions$1(Utils.scala:2876)
	at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:293)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at scala.collection.TraversableLike.flatMap(TraversableLike.scala:293)
	at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:290)
	at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
	at org.apache.spark.util.Utils$.loadExtensions(Utils.scala:2868)
	at org.apache.spark.SparkContext.$anonfun$setupAndStartListenerBus$1(SparkContext.scala:2538)
	at org.apache.spark.SparkContext.$anonfun$setupAndStartListenerBus$1$adapted(SparkContext.scala:2537)
	at scala.Option.foreach(Option.scala:407)
	at org.apache.spark.SparkContext.setupAndStartListenerBus(SparkContext.scala:2537)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:641)
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:238)
	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
	at java.base/java.lang.Thread.run(Thread.java:829)
22/04/01 11:14:25 ERROR JupyterSparkMonitorListener: Exception sending socket message: 
java.lang.NullPointerException
	at sparkmonitor.listener.JupyterSparkMonitorListener.send(CustomListener.scala:53)
	at sparkmonitor.listener.JupyterSparkMonitorListener.onExecutorAdded(CustomListener.scala:652)
	at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:63)
	at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
	at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
	at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
	at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)
	at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)
	at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
	at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
	at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
	at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
	at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
	at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1404)
	at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)
22/04/01 11:14:26 ERROR JupyterSparkMonitorListener: Exception sending socket message: 
java.lang.NullPointerException
	at sparkmonitor.listener.JupyterSparkMonitorListener.send(CustomListener.scala:53)
	at sparkmonitor.listener.JupyterSparkMonitorListener.onApplicationStart(CustomListener.scala:147)
	at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:55)
	at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
	at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
	at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
	at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)
	at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)
	at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
	at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
	at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
	at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
	at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
	at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1404)
	at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)

Does anybody have any hints on how I can fix this?

Sparkmonitor failure on kernel restart - java.net.SocketException: Broken pipe (Write failed)

Hi,

I noticed the following issue when restarting the kernel from a classic Jupyter
Notebook that uses JEG (Jupyter Enterprise Gateway) to launch remote Spark kernels
in Kubernetes. The SparkMonitor display doesn't show up, and the driver logs show a
java.net.SocketException: Broken pipe (Write failed) being thrown, together with the following line:

[IPKernelApp] WARNING | No such comm: b5b03d3c1393459f9b736fb5f5dd5461

The full stack trace is included at the end.

Observations so far:

In the successful case, the events occur in this order:

Comm opened
Client Connected

INFO:SparkMonitorKernel:Comm opened
[I 2021-12-28 06:36:19,883.883 SparkMonitorKernel] Comm opened
...
INFO:SparkMonitorKernel:Client Connected ('127.0.0.1', 35792)
[I 2021-12-28 06:36:19,914.914 SparkMonitorKernel] Client Connected ('127.0.0.1', 35792)

In the failure case, the order is reversed:

Client Connected
Comm opened

INFO:SparkMonitorKernel:Client Connected ('127.0.0.1', 33320)
[I 2021-12-28 05:44:11,603.603 SparkMonitorKernel] Client Connected ('127.0.0.1', 33320)
...
INFO:SparkMonitorKernel:Comm opened
[I 2021-12-28 05:44:11,760.760 SparkMonitorKernel] Comm opened
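The two orderings above can be told apart mechanically when collecting logs from many kernel restarts. The following is a minimal Python sketch, assuming the kernel log is available as text lines in the bracketed `[I ... SparkMonitorKernel] ...` format shown above; `startup_order` is a hypothetical helper for diagnosis, not part of sparkmonitor.

```python
import re
from typing import Iterable, List, Optional

# Matches only the bracketed log format, e.g.
# "[I 2021-12-28 06:36:19,883.883 SparkMonitorKernel] Comm opened".
# The "INFO:SparkMonitorKernel:..." duplicates do not match, so each
# event is counted once.
_EVENT = re.compile(r"SparkMonitorKernel\] (Comm opened|Client Connected)")


def startup_order(lines: Iterable[str]) -> Optional[str]:
    """Return 'success' if the comm opened before the client connected,
    'failure' if the client connected first, or None if an event is missing."""
    first_seen: List[str] = []
    for line in lines:
        match = _EVENT.search(line)
        if match and match.group(1) not in first_seen:
            first_seen.append(match.group(1))
    if len(first_seen) < 2:
        return None
    return "success" if first_seen[0] == "Comm opened" else "failure"
```

Feeding it the failure-case lines above returns `"failure"`; the successful-case lines return `"success"`.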

As a temporary fix, to reproduce the successful ordering, a 20-second delay was
added in CustomListener.scala before the socket connection is established. This
is intended to ensure that the comm is opened before the client connects.

Thanks to @akhileshram for pointing out the fix.

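def startConnection(): Unit = {
  try {
    Thread.sleep(20000) // added: wait 20 s so the comm is opened before the socket connects
    socket = new Socket("localhost", port.toInt)
    out = new OutputStreamWriter(socket.getOutputStream())

    // .... (rest of the method elided in the original report)
  }
}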
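A fixed 20-second sleep only makes the race less likely: if opening the comm ever takes longer than the delay, the failure ordering can still occur, and every monitor connection is delayed by 20 seconds. One alternative is to let the kernel side tolerate either ordering by buffering listener messages until the comm is open. The following is a minimal Python sketch of that idea, assuming the kernel extension reads messages from the listener socket and forwards them over a comm; `CommGate`, `forward` and `comm_opened` are hypothetical names, not sparkmonitor's API, and the sketch does not by itself explain the Broken pipe error.

```python
import threading
from typing import Callable, List, Optional


class CommGate:
    """Hypothetical helper: holds messages from the listener socket until
    the comm is open, then forwards them in arrival order."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._pending: List[str] = []
        self._send: Optional[Callable[[str], None]] = None

    def forward(self, message: str) -> None:
        # Called for every message read from the listener socket.
        # Sending under the lock keeps messages in arrival order.
        with self._lock:
            if self._send is None:
                self._pending.append(message)  # comm not open yet: buffer it
            else:
                self._send(message)

    def comm_opened(self, send: Callable[[str], None]) -> None:
        # Called once the frontend has opened the comm; flushes the backlog.
        with self._lock:
            self._send = send
            pending, self._pending = self._pending, []
            for message in pending:
                send(message)


delivered: List[str] = []
gate = CommGate()
gate.forward("applicationStart")  # client connected first (failure ordering)
gate.forward("jobStart 0")
gate.comm_opened(delivered.append)
gate.forward("jobEnd 0")
print(delivered)  # ['applicationStart', 'jobStart 0', 'jobEnd 0']
```

The buffer in this sketch is unbounded; a real implementation would probably cap it or drop old progress events if the comm never opens.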

Any hints or help with this issue would be much appreciated.

Error

2021-12-28 05:44:48,802 INFO  [spark-listener-group-shared] listener.JupyterSparkMonitorListener (CustomListener.scala:onJobStart(267)) - Job Start: 0
2021-12-28 05:44:48,804 ERROR [spark-listener-group-shared] listener.JupyterSparkMonitorListener (CustomListener.scala:send(86)) - Exception sending socket message:
java.net.SocketException: Broken pipe (Write failed)
	at java.base/java.net.SocketOutputStream.socketWrite0(Native Method)
	at java.base/java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:110)
	at java.base/java.net.SocketOutputStream.write(SocketOutputStream.java:150)
	at java.base/sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:233)
	at java.base/sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:312)
	at java.base/sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:316)
	at java.base/sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:153)
	at java.base/java.io.OutputStreamWriter.flush(OutputStreamWriter.java:251)
	at sparkmonitor.listener.JupyterSparkMonitorListener.send(CustomListener.scala:83)
	at sparkmonitor.listener.JupyterSparkMonitorListener.onJobStart(CustomListener.scala:269)
	at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:37)
	at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
	at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
	at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
	at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)
	at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)
	at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
	at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
	at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
	at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
	at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
	at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1381)
	at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)

Difference between the old SparkMonitor and the new one

The old monitor's features:

The new one:

So which features from the old version are missing in the new one?
