vesoft-inc / nebula-spark-connector
License: Apache License 2.0
Ref to GraphAr and its planned feature.
Hit this problem while testing nebula-connector locally; hoping for an answer.
Support Spark 3.3
Why does a connector need such a large artifact? This makes no sense.
The approach mentioned on the forum, scanning all vertices and edges with the python client, is not suited to distributed computing; please support connecting from PySpark.
Problem description
The VertexID type of the Nebula vertex set is String; calling loadVerticesToGraphx fails with a Long type-conversion error.
It runs correctly after the following change: type VertexID = String
package object connector {
type Address = (String, Int)
type NebulaType = Int
type Prop = List[Any]
type PropertyNames = List[String]
type PropertyValues = List[Any]
// Could the VertexID type be String instead? Long always converts to String,
// but a String may not convert to Long.
//type VertexID = Long
type VertexID = String
type VertexIDSlice = String
type NebulaGraphxVertex = (VertexID, PropertyValues)
type NebulaGraphxEdge = org.apache.spark.graphx.Edge[(EdgeRank, Prop)]
type EdgeRank = Long
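The concern in the comment above can be illustrated outside Scala as well: converting a Long vertex ID to String always succeeds, while the reverse can fail. A minimal Python sketch (the function name is hypothetical):

```python
def parse_long_vid(vid: str):
    """Try to read a string vertex ID as a signed 64-bit integer.

    Long -> String always succeeds, but String -> Long fails for
    non-numeric IDs, which is the crash seen in loadVerticesToGraphx
    when the space uses string vertex IDs.
    """
    try:
        value = int(vid)
    except ValueError:
        return None  # non-numeric ID, e.g. "player100"
    if -(2**63) <= value < 2**63:
        return value
    return None  # numeric, but outside the INT64 range

print(parse_long_vid("12345"))      # 12345
print(parse_long_vid("player100"))  # None
```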
Reason
The current logic reads all partition data first and then composes INSERT statements to write it out to the new Nebula cluster.
With a large amount of data this can run out of memory; Spark does spill to disk, but I still hit OOM job interruptions. It also wastes the opportunity for multitasking: the work can be split by partition, one task per partition, each reading its partition and writing it out immediately, which would be much more efficient.
// sync each tag/edge partition's data in its own task (multi-threaded)
for (partitionId <- 1 to partitions) {
  val task = new Runnable {
    def run(): Unit = {
      syncTagPartitionData(spark,
        ........
        partitionId
      )
    }
  }
  threadPool.execute(task)
}
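The per-partition scheduling described above can be sketched with a standard thread pool; `sync_partition` below is a hypothetical stand-in for `syncTagPartitionData`, which would read one Nebula partition and write it out immediately so memory use stays bounded:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def sync_partition(partition_id: int) -> int:
    # Stand-in for syncTagPartitionData: read the given Nebula
    # partition and write it out before moving on, instead of
    # materializing all partitions first.
    return partition_id

partitions = 10
with ThreadPoolExecutor(max_workers=4) as pool:
    # one task per partition; the pool bounds concurrency
    futures = [pool.submit(sync_partition, p) for p in range(1, partitions + 1)]
    synced = sorted(f.result() for f in as_completed(futures))

print(synced)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```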
// set a specific scan partitionId
val nebulaReadVertexConfig: ReadNebulaConfig = ReadNebulaConfig
  .builder()
  .with.....
  ......
  .withPartitionId(partitionId)
  .build()
val vertex = spark.read.nebula(sourceConfig, nebulaReadVertexConfig).loadVerticesToDF()
// create a task for a specific partition id
class SimpleScan(nebulaOptions: NebulaOptions, nebulaTotalPart: Int, schema: StructType)
    extends Scan
    with Batch {
  override def planInputPartitions(): Array[InputPartition] = {
    // return only the configured partition id for this task
    if (nebulaOptions.readPartitionId != null && nebulaOptions.readPartitionId > 0) {
      LOG.info(s"planInputPartitions partitions: ${nebulaOptions.readPartitionId}")
      return Array(NebulaPartitionBatch(Array(nebulaOptions.readPartitionId)))
    }
    .....
  }
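The `planInputPartitions` logic above reduces to a small planning decision; a Python sketch of the same branching (names are illustrative, not connector API):

```python
def plan_input_partitions(total_parts: int, read_partition_id=None):
    """Plan one input batch per Nebula partition, unless a specific
    partition id was configured, in which case plan only that one."""
    if read_partition_id is not None and read_partition_id > 0:
        return [[read_partition_id]]
    return [[p] for p in range(1, total_parts + 1)]

print(plan_input_partitions(3))       # [[1], [2], [3]]
print(plan_input_partitions(100, 7))  # [[7]]
```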
We may need withStorageAddress; otherwise, because the StorageD addresses are fetched only from MetaD, Spark running outside the K8s cluster cannot reach those headless-service addresses.
Spark Version: 3.2.1
We are using com.vesoft:nebula-spark-connector:3.0.0 in our PySpark application and are getting a ModuleNotFoundError.
Thanks
Code snippet
conf = SparkConf()
conf.set("spark.jars.packages", "org.apache.hadoop:hadoop-azure:3.3.1,com.vesoft:nebula-spark-connector:3.0.0,com.vesoft:nebula-flink:3.0.0")
conf.set("spark.executor.extraClassPath", "com.vesoft:nebula-spark-connector:3.0.0,com.vesoft:nebula-flink:3.0.0")
# removed rest of the code for simplicity
spark = SparkSession.builder.config(conf=conf).getOrCreate()
# Prepare rdd
# removed rest of the code for simplicity
rdd.foreach(lambda f: process_blob(spark, f))

def process_blob(spark: SparkSession, parquet_file: str):
    from com.vesoft.nebula.connector import NebulaOptions
    # removed rest of the code for simplicity
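For context, `com.vesoft.nebula.connector` is a JVM (Scala) package inside the connector jar, not a Python module, so a Python `import` of it can never succeed on the executors; the failure reproduces without Spark at all:

```python
import importlib

try:
    importlib.import_module("com.vesoft.nebula.connector")
except ModuleNotFoundError as exc:
    # The JVM package is invisible to the Python interpreter; JVM
    # classes are only reachable through Spark's Java gateway or the
    # DataFrame reader/writer API.
    print(type(exc).__name__)  # ModuleNotFoundError
```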
Error
Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 18 in stage 0.0 failed 4 times, most recent failure: Lost task 18.3 in stage 0.0 (TID 73) (10.201.37.17 executor 1): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/opt/bitnami/spark/python/lib/pyspark.zip/pyspark/worker.py", line 619, in main
process()
File "/opt/bitnami/spark/python/lib/pyspark.zip/pyspark/worker.py", line 609, in process
out_iter = func(split_index, iterator)
File "/usr/local/spark/python/pyspark/rdd.py", line 2918, in pipeline_func
File "/usr/local/spark/python/pyspark/rdd.py", line 2918, in pipeline_func
File "/usr/local/spark/python/pyspark/rdd.py", line 2918, in pipeline_func
File "/usr/local/spark/python/pyspark/rdd.py", line 417, in func
File "/usr/local/spark/python/pyspark/rdd.py", line 917, in processPartition
File "/usr/local/spark/python/pyspark/util.py", line 74, in wrapper
File "<ipython-input-8-fa91e09b3d08>", line 16, in <lambda>
File "<ipython-input-12-4edd6eeef9a7>", line 3, in process_blob
ModuleNotFoundError: No module named 'com.vesoft.nebula.connector'
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:555)
at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:713)
at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:695)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:508)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
at org.apache.spark.InterruptibleIterator.to(InterruptibleIterator.scala:28)
at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)
at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)
at org.apache.spark.InterruptibleIterator.toBuffer(InterruptibleIterator.scala:28)
at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)
at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339)
at org.apache.spark.InterruptibleIterator.toArray(InterruptibleIterator.scala:28)
at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1030)
at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2254)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Description:
When attempting to pull or push data from another cluster to the Nebula cluster, we often encounter errors related to DNS unreachability.
Motivation:
This enhancement is necessary because these DNS-related errors can hinder data transfer operations between clusters. By modifying the Nebula Spark Connector build to dynamically replace the DNS addresses during connection attempts, we can overcome these issues and facilitate smooth data transfer between the Nebula cluster and external clusters.
Proposed Solution:
We propose modifying the Nebula Spark Connector build to dynamically replace the DNS addresses during both read and write operations. In read operations, the connector will be enhanced to include configurable storage addresses that can be replaced with the appropriate local DNS addresses at the time of connection. Similarly, in write operations, we will implement dynamic address allocation to ensure that data is pushed to the Nebula cluster through a mechanism that seamlessly handles local DNS resolution.
For read operations, dynamic addressing will make retrieving data from the Nebula cluster markedly more reliable and flexible, especially with remote clusters or external systems whose DNS configurations vary.
For write operations, dynamic address allocation will let the connector adapt to changing DNS configurations during data transfers, keeping the transfer process consistent and error-free.
Additional Information:
This is especially beneficial when dealing with remote clusters or external systems that need dynamic address allocation to work around DNS issues.
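The proposed replacement amounts to consulting a user-supplied override map whenever MetaD returns a storage address; a minimal sketch of that resolution step (the function and mapping are hypothetical, not an existing connector API):

```python
def resolve_storage_address(addr, overrides):
    """Map a (host, port) pair returned by MetaD to a reachable one.

    `overrides` maps in-cluster hostnames (e.g. K8s headless-service
    names) to addresses resolvable from where Spark actually runs.
    Unknown hosts pass through unchanged.
    """
    host, port = addr
    return (overrides.get(host, host), port)

overrides = {"storaged-0.nebula.svc.cluster.local": "10.0.0.5"}
print(resolve_storage_address(("storaged-0.nebula.svc.cluster.local", 9779), overrides))
# ('10.0.0.5', 9779)
print(resolve_storage_address(("storaged-1.example.com", 9779), overrides))
# ('storaged-1.example.com', 9779)
```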
Support Apache Spark 3.x
As the title says, or any other name that won't cause confusion.
When using nebula-spark-connector, data does not reach Nebula storage with the right encoding.
The issue only reproduces when Spark runs in a Mesos cluster.
Trying the following:
val df = Seq((1,s"测试")).toDF("id","value")
df.write.nebula(config, nebulaWriteVertexConfig).writeVertices()
The value arrives in Nebula as ??.
More env data:
spark version: 3.0.2
java version: 1.8.0_392
mesos 1.11
I have tried multiple env combinations (Spark 3.2.1, Mesos 1.10); the results did not change.
Data should arrive in Nebula as 测试, not as ??.
(Note that running with Spark in local mode, or even on Kubernetes, works fine.)
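A ?? per character is the classic signature of text being pushed through a non-UTF-8 charset with unmappable characters replaced, which points at the JVM default charset (file.encoding) on the Mesos agents rather than at the data itself. A minimal illustration of the effect, assuming that is the cause:

```python
s = "测试"

# Encoding through a charset that cannot represent CJK characters,
# with replacement, turns each character into '?':
print(s.encode("ascii", errors="replace").decode("ascii"))  # ??

# Round-tripping through UTF-8 preserves the text:
print(s.encode("utf-8").decode("utf-8"))  # 测试
```

If this is indeed the cause, one thing worth trying is forcing UTF-8 on the executor JVMs (e.g. -Dfile.encoding=UTF-8 via the executor's extra Java options).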
Most of my Spark work is now written in Java; please provide a Java version of the API.
Only part of the data gets inserted: of roughly 30 tags, 9 can be inserted, after which it just keeps timing out.
22/02/24 17:16:50 ERROR MetaClient: Get Space Error: java.net.ConnectException: Operation timed out (Connection timed out)
(the line above repeats 12 times in total)
22/02/24 17:16:50 ERROR Executor: Exception in task 0.0 in stage 34.0 (TID 539)
com.facebook.thrift.transport.TTransportException: java.net.ConnectException: Operation timed out (Connection timed out)
at com.facebook.thrift.transport.TSocket.open(TSocket.java:206)
at com.vesoft.nebula.client.meta.MetaClient.getClient(MetaClient.java:148)
at com.vesoft.nebula.client.meta.MetaClient.freshClient(MetaClient.java:168)
at com.vesoft.nebula.client.meta.MetaClient.getSpace(MetaClient.java:230)
at com.vesoft.nebula.connector.nebula.MetaProvider.getVidType(MetaProvider.scala:66)
at com.vesoft.nebula.connector.writer.NebulaWriter.<init>(NebulaWriter.scala:40)
at com.vesoft.nebula.connector.writer.NebulaVertexWriter.<init>(NebulaVertexWriter.scala:18)
at com.vesoft.nebula.connector.writer.NebulaVertexWriterFactory.createDataWriter(NebulaSourceWriter.scala:28)
at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:113)
at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:67)
at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:66)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Operation timed out (Connection timed out)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:607)
at com.facebook.thrift.transport.TSocket.open(TSocket.java:201)
... 18 more
22/02/24 17:16:50 ERROR Executor: Exception in task 3.0 in stage 34.0 (TID 542)
com.facebook.thrift.transport.TTransportException: java.net.ConnectException: Operation timed out (Connection timed out)
at com.facebook.thrift.transport.TSocket.open(TSocket.java:206)
at com.vesoft.nebula.client.meta.MetaClient.getClient(MetaClient.java:148)
at com.vesoft.nebula.client.meta.MetaClient.freshClient(MetaClient.java:168)
at com.vesoft.nebula.client.meta.MetaClient.getSpace(MetaClient.java:230)
at com.vesoft.nebula.connector.nebula.MetaProvider.getVidType(MetaProvider.scala:66)
at com.vesoft.nebula.connector.writer.NebulaWriter.<init>(NebulaWriter.scala:40)
at com.vesoft.nebula.connector.writer.NebulaVertexWriter.<init>(NebulaVertexWriter.scala:18)
at com.vesoft.nebula.connector.writer.NebulaVertexWriterFactory.createDataWriter(NebulaSourceWriter.scala:28)
at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:113)
at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:67)
at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:66)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Operation timed out (Connection timed out)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:607)
at com.facebook.thrift.transport.TSocket.open(TSocket.java:201)
... 18 more
22/02/24 17:16:50 ERROR Executor: Exception in task 11.0 in stage 34.0 (TID 550)
com.facebook.thrift.transport.TTransportException: java.net.ConnectException: Operation timed out (Connection timed out)
at com.facebook.thrift.transport.TSocket.open(TSocket.java:206)
at com.vesoft.nebula.client.meta.MetaClient.getClient(MetaClient.java:148)
at com.vesoft.nebula.client.meta.MetaClient.freshClient(MetaClient.java:168)
at com.vesoft.nebula.client.meta.MetaClient.getSpace(MetaClient.java:230)
at com.vesoft.nebula.connector.nebula.MetaProvider.getVidType(MetaProvider.scala:66)
at com.vesoft.nebula.connector.writer.NebulaWriter.<init>(NebulaWriter.scala:40)
at com.vesoft.nebula.connector.writer.NebulaVertexWriter.<init>(NebulaVertexWriter.scala:18)
at com.vesoft.nebula.connector.writer.NebulaVertexWriterFactory.createDataWriter(NebulaSourceWriter.scala:28)
at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:113)
at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:67)
at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:66)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Operation timed out (Connection timed out)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:607)
at com.facebook.thrift.transport.TSocket.open(TSocket.java:201)
... 18 more
22/02/24 17:16:50 ERROR Executor: Exception in task 6.0 in stage 34.0 (TID 545)
com.facebook.thrift.transport.TTransportException: java.net.ConnectException: Operation timed out (Connection timed out)
at com.facebook.thrift.transport.TSocket.open(TSocket.java:206)
at com.vesoft.nebula.client.meta.MetaClient.getClient(MetaClient.java:148)
at com.vesoft.nebula.client.meta.MetaClient.freshClient(MetaClient.java:168)
at com.vesoft.nebula.client.meta.MetaClient.getSpace(MetaClient.java:230)
at com.vesoft.nebula.connector.nebula.MetaProvider.getVidType(MetaProvider.scala:66)
at com.vesoft.nebula.connector.writer.NebulaWriter.<init>(NebulaWriter.scala:40)
at com.vesoft.nebula.connector.writer.NebulaVertexWriter.<init>(NebulaVertexWriter.scala:18)
at com.vesoft.nebula.connector.writer.NebulaVertexWriterFactory.createDataWriter(NebulaSourceWriter.scala:28)
at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:113)
at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:67)
at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:66)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Operation timed out (Connection timed out)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:607)
at com.facebook.thrift.transport.TSocket.open(TSocket.java:201)
... 18 more
22/02/24 17:16:50 ERROR Executor: Exception in task 7.0 in stage 34.0 (TID 546)
com.facebook.thrift.transport.TTransportException: java.net.ConnectException: Operation timed out (Connection timed out)
at com.facebook.thrift.transport.TSocket.open(TSocket.java:206)
at com.vesoft.nebula.client.meta.MetaClient.getClient(MetaClient.java:148)
at com.vesoft.nebula.client.meta.MetaClient.freshClient(MetaClient.java:168)
at com.vesoft.nebula.client.meta.MetaClient.getSpace(MetaClient.java:230)
at com.vesoft.nebula.connector.nebula.MetaProvider.getVidType(MetaProvider.scala:66)
at com.vesoft.nebula.connector.writer.NebulaWriter.<init>(NebulaWriter.scala:40)
at com.vesoft.nebula.connector.writer.NebulaVertexWriter.<init>(NebulaVertexWriter.scala:18)
at com.vesoft.nebula.connector.writer.NebulaVertexWriterFactory.createDataWriter(NebulaSourceWriter.scala:28)
at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:113)
at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:67)
at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:66)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Operation timed out (Connection timed out)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:607)
at com.facebook.thrift.transport.TSocket.open(TSocket.java:201)
... 18 more
22/02/24 17:16:50 ERROR Executor: Exception in task 10.0 in stage 34.0 (TID 549)
com.facebook.thrift.transport.TTransportException: java.net.ConnectException: Operation timed out (Connection timed out)
at com.facebook.thrift.transport.TSocket.open(TSocket.java:206)
at com.vesoft.nebula.client.meta.MetaClient.getClient(MetaClient.java:148)
at com.vesoft.nebula.client.meta.MetaClient.freshClient(MetaClient.java:168)
at com.vesoft.nebula.client.meta.MetaClient.getSpace(MetaClient.java:230)
at com.vesoft.nebula.connector.nebula.MetaProvider.getVidType(MetaProvider.scala:66)
at com.vesoft.nebula.connector.writer.NebulaWriter.<init>(NebulaWriter.scala:40)
at com.vesoft.nebula.connector.writer.NebulaVertexWriter.<init>(NebulaVertexWriter.scala:18)
at com.vesoft.nebula.connector.writer.NebulaVertexWriterFactory.createDataWriter(NebulaSourceWriter.scala:28)
at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:113)
at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:67)
at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:66)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Operation timed out (Connection timed out)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:607)
at com.facebook.thrift.transport.TSocket.open(TSocket.java:201)
... 18 more
22/02/24 17:16:50 ERROR Executor: Exception in task 8.0 in stage 34.0 (TID 547)
com.facebook.thrift.transport.TTransportException: java.net.ConnectException: Operation timed out (Connection timed out)
at com.facebook.thrift.transport.TSocket.open(TSocket.java:206)
at com.vesoft.nebula.client.meta.MetaClient.getClient(MetaClient.java:148)
at com.vesoft.nebula.client.meta.MetaClient.freshClient(MetaClient.java:168)
at com.vesoft.nebula.client.meta.MetaClient.getSpace(MetaClient.java:230)
at com.vesoft.nebula.connector.nebula.MetaProvider.getVidType(MetaProvider.scala:66)
at com.vesoft.nebula.connector.writer.NebulaWriter.<init>(NebulaWriter.scala:40)
at com.vesoft.nebula.connector.writer.NebulaVertexWriter.<init>(NebulaVertexWriter.scala:18)
at com.vesoft.nebula.connector.writer.NebulaVertexWriterFactory.createDataWriter(NebulaSourceWriter.scala:28)
at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:113)
at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:67)
at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:66)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Operation timed out (Connection timed out)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:607)
at com.facebook.thrift.transport.TSocket.open(TSocket.java:201)
... 18 more
22/02/24 17:16:50 ERROR Executor: Exception in task 4.0 in stage 34.0 (TID 543)
com.facebook.thrift.transport.TTransportException: java.net.ConnectException: Operation timed out (Connection timed out)
at com.facebook.thrift.transport.TSocket.open(TSocket.java:206)
at com.vesoft.nebula.client.meta.MetaClient.getClient(MetaClient.java:148)
at com.vesoft.nebula.client.meta.MetaClient.freshClient(MetaClient.java:168)
at com.vesoft.nebula.client.meta.MetaClient.getSpace(MetaClient.java:230)
at com.vesoft.nebula.connector.nebula.MetaProvider.getVidType(MetaProvider.scala:66)
at com.vesoft.nebula.connector.writer.NebulaWriter.<init>(NebulaWriter.scala:40)
at com.vesoft.nebula.connector.writer.NebulaVertexWriter.<init>(NebulaVertexWriter.scala:18)
at com.vesoft.nebula.connector.writer.NebulaVertexWriterFactory.createDataWriter(NebulaSourceWriter.scala:28)
at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:113)
at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:67)
at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:66)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Operation timed out (Connection timed out)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:607)
at com.facebook.thrift.transport.TSocket.open(TSocket.java:201)
... 18 more
22/02/24 17:16:50 ERROR Executor: Exception in task 2.0 in stage 34.0 (TID 541)
com.facebook.thrift.transport.TTransportException: java.net.ConnectException: Operation timed out (Connection timed out)
at com.facebook.thrift.transport.TSocket.open(TSocket.java:206)
at com.vesoft.nebula.client.meta.MetaClient.getClient(MetaClient.java:148)
at com.vesoft.nebula.client.meta.MetaClient.freshClient(MetaClient.java:168)
at com.vesoft.nebula.client.meta.MetaClient.getSpace(MetaClient.java:230)
at com.vesoft.nebula.connector.nebula.MetaProvider.getVidType(MetaProvider.scala:66)
at com.vesoft.nebula.connector.writer.NebulaWriter.<init>(NebulaWriter.scala:40)
at com.vesoft.nebula.connector.writer.NebulaVertexWriter.<init>(NebulaVertexWriter.scala:18)
at com.vesoft.nebula.connector.writer.NebulaVertexWriterFactory.createDataWriter(NebulaSourceWriter.scala:28)
at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:113)
at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:67)
at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:66)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Operation timed out (Connection timed out)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:607)
at com.facebook.thrift.transport.TSocket.open(TSocket.java:201)
... 18 more
[identical TTransportException stack traces for tasks 1.0, 5.0, and 9.0 omitted]
22/02/24 17:16:50 ERROR TaskSetManager: Task 4 in stage 34.0 failed 1 times; aborting job
22/02/24 17:16:50 ERROR WriteToDataSourceV2Exec: Data source writer com.vesoft.nebula.connector.writer.NebulaDataSourceVertexWriter@5cf70fbf is aborting.
22/02/24 17:16:50 ERROR NebulaDataSourceVertexWriter: NebulaDataSourceVertexWriter abort
22/02/24 17:16:50 ERROR WriteToDataSourceV2Exec: Data source writer com.vesoft.nebula.connector.writer.NebulaDataSourceVertexWriter@5cf70fbf aborted.
22/02/24 17:16:50 ERROR Utils: Aborting task
org.apache.spark.TaskKilledException
at org.apache.spark.TaskContextImpl.killTaskIfInterrupted(TaskContextImpl.scala:149)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:36)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7.sort_addToSorter_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636)
at org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3.apply(SortAggregateExec.scala:80)
at org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3.apply(SortAggregateExec.scala:77)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndexInternal$1$$anonfun$13.apply(RDD.scala:845)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndexInternal$1$$anonfun$13.apply(RDD.scala:845)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.CoalescedRDD$$anonfun$compute$1.apply(CoalescedRDD.scala:100)
at org.apache.spark.rdd.CoalescedRDD$$anonfun$compute$1.apply(CoalescedRDD.scala:99)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
at org.apache.spark.sql.execution.columnar.CachedRDDBuilder$$anonfun$1$$anon$1.hasNext(InMemoryRelation.scala:125)
at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:221)
at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:299)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1165)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:357)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:308)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.CoalescedRDD$$anonfun$compute$1.apply(CoalescedRDD.scala:100)
at org.apache.spark.rdd.CoalescedRDD$$anonfun$compute$1.apply(CoalescedRDD.scala:99)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636)
at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$$anonfun$run$3.apply(WriteToDataSourceV2Exec.scala:117)
at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$$anonfun$run$3.apply(WriteToDataSourceV2Exec.scala:116)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1394)
at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:146)
at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:67)
at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:66)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
22/02/24 17:16:50 ERROR DataWritingSparkTask: Aborting commit for partition 16 (task 555, attempt 0, stage 34.0)
22/02/24 17:16:50 ERROR NebulaVertexWriter: insert vertex task abort.
22/02/24 17:16:50 ERROR DataWritingSparkTask: Aborted commit for partition 16 (task 555, attempt 0, stage 34.0)
22/02/24 17:18:05 ERROR MetaClient: Get Space Error: java.net.ConnectException: Operation timed out (Connection timed out)
[the same "Get Space Error" line repeated 5 more times]
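The `Operation timed out` above is raised while the write path fetches the space's VID type from metad inside each task (`MetaProvider.getVidType`), so the meta address must be reachable from every Spark executor, not only from the driver. A minimal, self-contained reachability check (hypothetical helper, not part of the connector) can confirm this from a worker node:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Plain TCP connect with a timeout -- the same kind of connection
// TSocket.open() performs before the Thrift handshake.
public class ReachCheck {
    public static boolean canReach(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // connection refused or timed out
            return false;
        }
    }
}
```

Running this inside something like `rdd.foreachPartition` against the metad host:port shows whether the endpoint is visible from the worker nodes, which the driver-side connection test does not prove.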
Currently only `MATCH ()-[e:xx]->() RETURN e LIMIT 10000` is supported. Ideally the connector could parse all edges out of arbitrary query results, so that complex MATCH/FIND PATH/GET SUBGRAPH queries can be leveraged as well; one way would be to add a whitelist of parsed edge schemas.
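The whitelist idea can be sketched independently of the connector: given the columns an arbitrary query returns and their edge types, keep only the edge types the parser has a registered schema for (all names here are hypothetical):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the proposed whitelist: from an arbitrary result
// set, keep only edge columns whose edge type has a registered schema.
public class EdgeWhitelist {
    public static Map<String, String> keepParsableEdges(
            Map<String, String> columnToEdgeType, Set<String> parsedEdgeSchemas) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : columnToEdgeType.entrySet()) {
            if (parsedEdgeSchemas.contains(e.getValue())) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }
}
```

Edge types outside the whitelist are silently dropped rather than failing the whole result-set parse, which is the behavior the feature request asks for.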
The vertex tag has 24 properties in total, but when reading through com.vesoft.nebula.client.storage.StorageClient,
`Schema schema = metaManager.getTag(spaceName, tagName).getSchema();` returns only 15 properties, so the row-assembly step throws an ArrayIndexOutOfBoundsException.
The error log follows:
22/05/14 19:50:43 INFO connector.NebulaDataSource: create reader
22/05/14 19:50:43 INFO connector.NebulaDataSource: options {spacename=AssoNet, nocolumn=false, metaaddress=xxxxxxx label=cust, type=vertex, connectionretry=2, timeout=6000, executionretry=1, paths=[], limit=10, returncols=, partitionnumber=14}
root
|-- _vertexId: string (nullable = false)
|-- cust_no: string (nullable = true)
|-- cust_name: string (nullable = true)
(the tag xxxxxx has 24 properties in total)
22/05/14 19:50:44 INFO spark.ContextCleaner: Cleaned accumulator 1
22/05/14 19:50:44 INFO codegen.CodeGenerator: Code generated in 158.8538 ms
22/05/14 19:50:44 INFO codegen.CodeGenerator: Code generated in 7.8005 ms
22/05/14 19:50:44 INFO codegen.CodeGenerator: Code generated in 22.0784 ms
22/05/14 19:50:44 INFO spark.SparkContext: Starting job: count at Nebula2HiveVertexTest.scala:54
22/05/14 19:50:44 INFO scheduler.DAGScheduler: Registering RDD 7 (count at Nebula2HiveVertexTest.scala:54)
22/05/14 19:50:44 INFO scheduler.DAGScheduler: Got job 0 (count at Nebula2HiveVertexTest.scala:54) with 1 output partitions
22/05/14 19:50:44 INFO scheduler.DAGScheduler: Final stage: ResultStage 1 (count at Nebula2HiveVertexTest.scala:54)
22/05/14 19:50:44 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
22/05/14 19:50:44 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 0)
22/05/14 19:50:44 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[7] at count at Nebula2HiveVertexTest.scala:54), which has no missing parents
22/05/14 19:50:44 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 36.9 KB, free 1983.3 MB)
22/05/14 19:50:44 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 13.9 KB, free 1983.3 MB)
22/05/14 19:50:44 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on SMY-U1-0040.smyoa.com:62328 (size: 13.9 KB, free: 1983.3 MB)
22/05/14 19:50:44 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1161
22/05/14 19:50:44 INFO scheduler.DAGScheduler: Submitting 14 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[7] at count at Nebula2HiveVertexTest.scala:54) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13))
22/05/14 19:50:44 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 14 tasks
22/05/14 19:50:44 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 10783 bytes)
22/05/14 19:50:44 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, executor driver, partition 1, PROCESS_LOCAL, 10783 bytes)
22/05/14 19:50:44 INFO executor.Executor: Running task 1.0 in stage 0.0 (TID 1)
22/05/14 19:50:44 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
22/05/14 19:50:45 ERROR meta.MetaManager: space basketballplayer part 1 has not allocation host.
22/05/14 19:50:45 ERROR meta.MetaManager: space basketballplayer part 3 has not allocation host.
22/05/14 19:50:45 ERROR meta.MetaManager: space basketballplayer part 5 has not allocation host.
22/05/14 19:50:45 ERROR meta.MetaManager: space basketballplayer part 7 has not allocation host.
22/05/14 19:50:45 ERROR meta.MetaManager: space basketballplayer part 9 has not allocation host.
22/05/14 19:50:45 ERROR meta.MetaManager: space basketballplayer part 11 has not allocation host.
22/05/14 19:50:45 ERROR meta.MetaManager: space basketballplayer part 13 has not allocation host.
22/05/14 19:50:45 ERROR meta.MetaManager: space basketballplayer part 15 has not allocation host.
22/05/14 19:50:45 INFO reader.NebulaVertexPartitionReader: partition index: 2, scanParts: List(2, 16, 30, 44, 58)
22/05/14 19:50:45 INFO reader.NebulaVertexPartitionReader: partition index: 1, scanParts: List(1, 15, 29, 43, 57)
22/05/14 19:50:45 WARN storage.BlockManager: Putting block rdd_2_1 failed due to exception java.lang.ArrayIndexOutOfBoundsException: 16.
22/05/14 19:50:45 WARN storage.BlockManager: Block rdd_2_1 could not be removed as it was not found on disk or in memory
22/05/14 19:50:45 ERROR executor.Executor: Exception in task 1.0 in stage 0.0 (TID 1)
java.lang.ArrayIndexOutOfBoundsException: 16
at com.smy.connector.reader.NebulaPartitionReader$$anonfun$get$1.apply$mcVI$sp(NebulaPartitionReader.scala:75)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
at com.smy.connector.reader.NebulaPartitionReader.get(NebulaPartitionReader.scala:74)
at com.smy.connector.reader.NebulaPartitionReader.get(NebulaPartitionReader.scala:29)
at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.next(DataSourceRDD.scala:59)
at org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:40)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636)
at org.apache.spark.sql.execution.columnar.CachedRDDBuilder$$anonfun$1$$anon$1.hasNext(InMemoryRelation.scala:125)
at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:221)
at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:299)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1165)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
22/05/14 19:50:45 INFO scheduler.TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, localhost, executor driver, partition 2, PROCESS_LOCAL, 10783 bytes)
22/05/14 19:50:45 INFO executor.Executor: Running task 2.0 in stage 0.0 (TID 2)
22/05/14 19:50:45 WARN scheduler.TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, localhost, executor driver): java.lang.ArrayIndexOutOfBoundsException: 16
[stack trace identical to the one above omitted]
22/05/14 19:50:45 ERROR scheduler.TaskSetManager: Task 1 in stage 0.0 failed 1 times; aborting job
22/05/14 19:50:45 INFO reader.NebulaVertexPartitionReader: partition index: 3, scanParts: List(3, 17, 31, 45, 59)
22/05/14 19:50:45 INFO scheduler.TaskSchedulerImpl: Cancelling stage 0
22/05/14 19:50:45 INFO scheduler.TaskSchedulerImpl: Killing all running tasks in stage 0: Stage cancelled
22/05/14 19:50:45 INFO executor.Executor: Executor is trying to kill task 0.0 in stage 0.0 (TID 0), reason: Stage cancelled
22/05/14 19:50:45 INFO executor.Executor: Executor is trying to kill task 2.0 in stage 0.0 (TID 2), reason: Stage cancelled
22/05/14 19:50:45 INFO scheduler.TaskSchedulerImpl: Stage 0 was cancelled
22/05/14 19:50:45 INFO scheduler.DAGScheduler: ShuffleMapStage 0 (count at Nebula2HiveVertexTest.scala:54) failed in 0.499 s due to Job aborted due to stage failure: Task 1 in stage 0.0 failed 1 times, most recent failure: Lost task 1.0 in stage 0.0 (TID 1, localhost, executor driver): java.lang.ArrayIndexOutOfBoundsException: 16
[stack trace identical to the one above omitted]
Driver stacktrace:
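The ArrayIndexOutOfBoundsException above happens because the reader indexes the scanned row by the property count it expects rather than by the count the meta schema actually returned. A defensive sketch (a standalone, hypothetical helper, not the connector's actual code) pairs names with values instead of indexing:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Pair property names with values without indexing past either list: a
// 24-value row against a 15-field schema yields 15 pairs and a warning
// instead of an ArrayIndexOutOfBoundsException.
public class SafeRow {
    public static Map<String, Object> zipProps(List<String> names, List<Object> values) {
        if (names.size() != values.size()) {
            System.err.println("schema has " + names.size()
                    + " fields but the scanned row has " + values.size());
        }
        int n = Math.min(names.size(), values.size());
        Map<String, Object> row = new LinkedHashMap<>();
        for (int i = 0; i < n; i++) {
            row.put(names.get(i), values.get(i));
        }
        return row;
    }
}
```

Truncating is only a mitigation; the underlying fix is making the meta client return the full, current tag schema so the counts agree.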
Screenshots of the run:
(https://user-images.githubusercontent.com/31941244/168424814-2a51fa2f-b8ea-4e3e-8653-81f8009934df.png)
(https://user-images.githubusercontent.com/31941244/168424821-7d8db61f-a6ec-49c9-816f-e0c1a194cf55.png)
(https://user-images.githubusercontent.com/31941244/168424834-7d890aee-1c87-4a11-b001-3cf1a28f332b.png)
Hi!
Are all releases of Nebula spark connector only compatible with spark 2.2/2.4?
It would be best to support PySpark: algorithm engineers are familiar with Python, which would make the connector easy for them to use.
as titled.