nebula-spark-connector's People

Contributors

amber1990zhang, artemkorsakov, codelone, cooper-lzy, darionyaphet, dutor, guojun85, harrischu, jievince, jude-zhu, koustreak, laura-ding, liuxiaocs7, nicole00, oldlady344, randomjoe211, reid00, riverzzz, shinji-ikarig, sophie-xie, wey-gu, whitewum, yixinglu, zhongqishang

nebula-spark-connector's Issues

#feature req: support PySpark

As mentioned on the forum, scanning all vertices and edges with the Python client is not suited to distributed computing. Please support connecting from PySpark.

loadVerticesToGraphx fails with a type-conversion error when the Nebula VertexID type is String

Problem description
The vertex set in Nebula uses String-typed vertex IDs. Calling loadVerticesToGraphx fails with a Long type-conversion error.

After the following change it runs correctly: type VertexID = String

package object connector {

  type Address        = (String, Int)
  type NebulaType     = Int
  type Prop           = List[Any]
  type PropertyNames  = List[String]
  type PropertyValues = List[Any]

  // Can VertexID be changed to String? Long-to-String always succeeds,
  // but a String cannot always be converted to a Long.
  //type VertexID = Long
  type VertexID           = String
  type VertexIDSlice      = String
  type EdgeRank           = Long
  type NebulaGraphxVertex = (VertexID, PropertyValues)
  type NebulaGraphxEdge   = org.apache.spark.graphx.Edge[(EdgeRank, Prop)]
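An alternative to changing the type alias, when GraphX must keep Long vertex IDs, is to derive a stable 64-bit id from each string id. A minimal, language-neutral sketch of that mapping (shown in Python; the function name is hypothetical, and note that hashing can collide, unlike the type-alias change above):

```python
import hashlib

def vid_to_long(vid: str) -> int:
    """Derive a stable signed 64-bit id from a string vertex id.

    Deterministic across runs (unlike Python's built-in hash()),
    but collisions are theoretically possible.
    """
    digest = hashlib.md5(vid.encode("utf-8")).digest()
    # Take the first 8 bytes as a signed big-endian 64-bit integer.
    return int.from_bytes(digest[:8], "big", signed=True)
```

The same idea can be expressed in Scala (e.g. with MurmurHash) inside a map over the vertex DataFrame before handing it to GraphX.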

Can Nebula2Nebula support multi-threaded sync?

Reason
The current logic reads all partition data and then composes INSERT statements to write it out to the target Nebula cluster.
With a large amount of data this can run out of memory; Spark does spill to disk, but I still ran into OOM job interruptions. It also does not exploit multitasking: the work could be split by partition, with one task per partition that reads and writes that partition, which would be much more efficient.

// multi-thread sync of one partition of a tag or edge
for (partitionId <- 1 to partitions) {
      val task = new Runnable {
        def run(): Unit = {
          syncTagPartitionData(spark,
            ........
            partitionId
          )
        }
      }

      threadPool.execute(task)
    }

// set a specific partitionId to scan
   val nebulaReadVertexConfig: ReadNebulaConfig = ReadNebulaConfig
      .builder()
      .with.....
      ......
      .withPartitionId(partitionId)
      .build()
    val vertex = spark.read.nebula(sourceConfig, nebulaReadVertexConfig).loadVerticesToDF()

  // create the scan task for the specific partition id
   class SimpleScan(nebulaOptions: NebulaOptions, nebulaTotalPart: Int, schema: StructType)
  extends Scan
    with Batch {
  override def planInputPartitions(): Array[InputPartition] = {
    // return only the requested partition id
    if (nebulaOptions.readPartitionId != null && nebulaOptions.readPartitionId > 0) {
      LOG.info(s"planInputPartitions partitions: ${nebulaOptions.readPartitionId}")
      return Array(NebulaPartitionBatch(Array(nebulaOptions.readPartitionId)))
    }
    .....
  }
}
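The fan-out the issue proposes (one bounded task per partition instead of one task holding everything) can be sketched generically; here in Python with a thread pool, where `sync_partition` is a placeholder for the per-partition read-then-write step:

```python
from concurrent.futures import ThreadPoolExecutor

def sync_partition(partition_id: int) -> str:
    # Placeholder for "read one partition, write it out"
    # (the role of the hypothetical syncTagPartitionData above).
    return f"partition {partition_id} synced"

def sync_all(partitions: int, workers: int = 4) -> list:
    # One task per partition: no single task holds the whole
    # tag/edge in memory, and partitions sync concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(sync_partition, range(1, partitions + 1)))
```

`pool.map` preserves partition order in the results while letting the reads and writes overlap.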
  

withStorageAddress introduced

We may need withStorageAddress: the StorageD addresses are fetched from the MetaD only, so Spark running outside the K8s cluster cannot reach those headless-service addresses.

Why does the code convert all of these Nebula types to Long?

nebula-spark-common/src/main/scala/com/vesoft/nebula/connector/NebulaUtils.scala
Why does the code map all of these Nebula types to Long?
case PropertyType.VID | PropertyType.INT8 | PropertyType.INT16 | PropertyType.INT32 | PropertyType.INT64 => LongType

[screenshot]
Spark itself supports these types, so why the widening?

ModuleNotFoundError: No module named 'com.vesoft.nebula.connector'

Spark Version: 3.2.1

We are using com.vesoft:nebula-spark-connector:3.0.0 in our PySpark application and are getting a ModuleNotFoundError.
Thanks

Code snippet

conf = SparkConf()
conf.set("spark.jars.packages", "org.apache.hadoop:hadoop-azure:3.3.1,com.vesoft:nebula-spark-connector:3.0.0,com.vesoft:nebula-flink:3.0.0")
conf.set("spark.executor.extraClassPath", "com.vesoft:nebula-spark-connector:3.0.0,com.vesoft:nebula-flink:3.0.0")

# rest of the code removed for simplicity

spark = SparkSession.builder.config(conf=conf).getOrCreate()

# Prepare rdd
# rest of the code removed for simplicity

rdd.foreach(lambda f: process_blob(spark , f))

def process_blob(spark:SparkSession, parquet_file:str):
    from com.vesoft.nebula.connector import NebulaOptions 
    # rest of the code removed for simplicity
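The error is expected: the connector is a JVM data source, not a Python package, so `from com.vesoft.nebula.connector import NebulaOptions` can never resolve inside a Python worker. From PySpark it has to be used through the DataFrame reader instead. A minimal sketch (the data-source class name is the connector's; the option names follow the connector's documented read options, and the address/space/label values are hypothetical):

```python
# The JVM data-source class shipped in the connector jar.
NEBULA_SOURCE = "com.vesoft.nebula.connector.NebulaDataSource"

# Hypothetical read options; see the connector README for the full list.
read_options = {
    "metaAddress": "metad0:9559",
    "spaceName": "test",
    "label": "person",
    "type": "vertex",
    "returnCols": "name,age",
    "partitionNumber": "1",
}

# With an active SparkSession, the read would look like:
# df = spark.read.format(NEBULA_SOURCE).options(**read_options).load()
```

Note also that this approach runs on the driver/executors via the JVM; there is no Python module to import on the workers, so the `rdd.foreach(lambda ...)` pattern above is not applicable.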

Error

Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 18 in stage 0.0 failed 4 times, most recent failure: Lost task 18.3 in stage 0.0 (TID 73) (10.201.37.17 executor 1): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/opt/bitnami/spark/python/lib/pyspark.zip/pyspark/worker.py", line 619, in main
    process()
  File "/opt/bitnami/spark/python/lib/pyspark.zip/pyspark/worker.py", line 609, in process
    out_iter = func(split_index, iterator)
  File "/usr/local/spark/python/pyspark/rdd.py", line 2918, in pipeline_func
  File "/usr/local/spark/python/pyspark/rdd.py", line 2918, in pipeline_func
  File "/usr/local/spark/python/pyspark/rdd.py", line 2918, in pipeline_func
  File "/usr/local/spark/python/pyspark/rdd.py", line 417, in func
  File "/usr/local/spark/python/pyspark/rdd.py", line 917, in processPartition
  File "/usr/local/spark/python/pyspark/util.py", line 74, in wrapper
  File "<ipython-input-8-fa91e09b3d08>", line 16, in <lambda>
  File "<ipython-input-12-4edd6eeef9a7>", line 3, in process_blob
ModuleNotFoundError: No module named 'com.vesoft.nebula.connector'

	at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:555)
	at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:713)
	at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:695)
	at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:508)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
	at scala.collection.Iterator.foreach(Iterator.scala:943)
	at scala.collection.Iterator.foreach$(Iterator.scala:943)
	at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
	at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
	at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
	at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
	at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
	at org.apache.spark.InterruptibleIterator.to(InterruptibleIterator.scala:28)
	at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)
	at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)
	at org.apache.spark.InterruptibleIterator.toBuffer(InterruptibleIterator.scala:28)
	at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)
	at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339)
	at org.apache.spark.InterruptibleIterator.toArray(InterruptibleIterator.scala:28)
	at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1030)
	at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2254)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:131)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)

Dynamic Address Allocation for Local DNS in Nebula Spark Connector

Description:
When attempting to pull or push data from another cluster to the Nebula cluster, we often encounter errors related to DNS unreachability.

Motivation:

This enhancement is necessary because these DNS-related errors can hinder data transfer operations between clusters. By modifying the Nebula Spark Connector build to dynamically replace the DNS addresses during connection attempts, we can overcome these issues and facilitate smooth data transfer between the Nebula cluster and external clusters.

Proposed Solution:
We propose modifying the Nebula Spark Connector build to dynamically replace the DNS addresses during both read and write operations. In read operations, the connector will be enhanced to include configurable storage addresses that can be replaced with the appropriate local DNS addresses at the time of connection. Similarly, in write operations, we will implement dynamic address allocation to ensure that data is pushed to the Nebula cluster through a mechanism that seamlessly handles local DNS resolution.

For read operations, dynamic addressing will improve the reliability and flexibility of retrieving data from the Nebula cluster, especially with remote clusters or external systems whose DNS configurations vary.

For write operations, dynamic address allocation will let the connector adapt to changing DNS configurations during data transfers, ensuring a consistent, error-free transfer process.

Encoding issue when using spark with mesos

When using nebula-spark-connector, data does not reach Nebula storage with the right encoding.
The issue reproduces only when Spark is running in a Mesos cluster.

By trying to do the following

val df = Seq((1,s"测试")).toDF("id","value")
df.write.nebula(config, nebulaWriteVertexConfig).writeVertices()

The value arrives in nebula as ??

More env data:
spark version: 3.0.2
java version: 1.8.0_392
mesos 1.11

I have tried multiple environment combinations (Spark 3.2.1, Mesos 1.10); the results did not change.

Data should arrive in Nebula as 测试, not as ??.

(please note that running with spark local or even on kubernetes works fine.)
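This symptom often points at the executor JVMs not defaulting to UTF-8 (Mesos agents frequently inherit a C/POSIX locale, while local and Kubernetes runs get UTF-8). A possible mitigation, assuming that default-charset theory is right, is to force UTF-8 on both driver and executors via the standard Spark properties; sketched here as a helper that renders them as spark-submit flags:

```python
# Spark properties forcing a UTF-8 default charset on the JVMs.
# Pass them via --conf flags or SparkConf.set(); the property names
# are standard Spark configuration keys.
utf8_conf = {
    "spark.driver.extraJavaOptions": "-Dfile.encoding=UTF-8",
    "spark.executor.extraJavaOptions": "-Dfile.encoding=UTF-8",
}

def to_submit_flags(conf: dict) -> str:
    # Render the properties as spark-submit --conf flags.
    return " ".join(f'--conf "{k}={v}"' for k, v in conf.items())
```

If the job already sets extraJavaOptions for other reasons, append `-Dfile.encoding=UTF-8` rather than overwriting the existing value.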

Get Space Error: Operation timed out

Part of the data can be inserted: of roughly 30 tags, 9 insert successfully, after which it times out continuously.

Forum thread

22/02/24 17:16:50 ERROR MetaClient: Get Space Error: java.net.ConnectException: Operation timed out (Connection timed out)
(the line above repeats 12 times)
22/02/24 17:16:50 ERROR Executor: Exception in task 0.0 in stage 34.0 (TID 539)
com.facebook.thrift.transport.TTransportException: java.net.ConnectException: Operation timed out (Connection timed out)
	at com.facebook.thrift.transport.TSocket.open(TSocket.java:206)
	at com.vesoft.nebula.client.meta.MetaClient.getClient(MetaClient.java:148)
	at com.vesoft.nebula.client.meta.MetaClient.freshClient(MetaClient.java:168)
	at com.vesoft.nebula.client.meta.MetaClient.getSpace(MetaClient.java:230)
	at com.vesoft.nebula.connector.nebula.MetaProvider.getVidType(MetaProvider.scala:66)
	at com.vesoft.nebula.connector.writer.NebulaWriter.<init>(NebulaWriter.scala:40)
	at com.vesoft.nebula.connector.writer.NebulaVertexWriter.<init>(NebulaVertexWriter.scala:18)
	at com.vesoft.nebula.connector.writer.NebulaVertexWriterFactory.createDataWriter(NebulaSourceWriter.scala:28)
	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:113)
	at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:67)
	at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:66)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:123)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Operation timed out (Connection timed out)
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:607)
	at com.facebook.thrift.transport.TSocket.open(TSocket.java:201)
	... 18 more
22/02/24 17:16:50 ERROR Executor: Exception in task 3.0 in stage 34.0 (TID 542)
com.facebook.thrift.transport.TTransportException: java.net.ConnectException: Operation timed out (Connection timed out)
	at com.facebook.thrift.transport.TSocket.open(TSocket.java:206)
	at com.vesoft.nebula.client.meta.MetaClient.getClient(MetaClient.java:148)
	at com.vesoft.nebula.client.meta.MetaClient.freshClient(MetaClient.java:168)
	at com.vesoft.nebula.client.meta.MetaClient.getSpace(MetaClient.java:230)
	at com.vesoft.nebula.connector.nebula.MetaProvider.getVidType(MetaProvider.scala:66)
	at com.vesoft.nebula.connector.writer.NebulaWriter.<init>(NebulaWriter.scala:40)
	at com.vesoft.nebula.connector.writer.NebulaVertexWriter.<init>(NebulaVertexWriter.scala:18)
	at com.vesoft.nebula.connector.writer.NebulaVertexWriterFactory.createDataWriter(NebulaSourceWriter.scala:28)
	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:113)
	at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:67)
	at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:66)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:123)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Operation timed out (Connection timed out)
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:607)
	at com.facebook.thrift.transport.TSocket.open(TSocket.java:201)
	... 18 more
22/02/24 17:16:50 ERROR Executor: Exception in task 11.0 in stage 34.0 (TID 550)
com.facebook.thrift.transport.TTransportException: java.net.ConnectException: Operation timed out (Connection timed out)
	at com.facebook.thrift.transport.TSocket.open(TSocket.java:206)
	at com.vesoft.nebula.client.meta.MetaClient.getClient(MetaClient.java:148)
	at com.vesoft.nebula.client.meta.MetaClient.freshClient(MetaClient.java:168)
	at com.vesoft.nebula.client.meta.MetaClient.getSpace(MetaClient.java:230)
	at com.vesoft.nebula.connector.nebula.MetaProvider.getVidType(MetaProvider.scala:66)
	at com.vesoft.nebula.connector.writer.NebulaWriter.<init>(NebulaWriter.scala:40)
	at com.vesoft.nebula.connector.writer.NebulaVertexWriter.<init>(NebulaVertexWriter.scala:18)
	at com.vesoft.nebula.connector.writer.NebulaVertexWriterFactory.createDataWriter(NebulaSourceWriter.scala:28)
	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:113)
	at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:67)
	at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:66)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:123)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Operation timed out (Connection timed out)
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:607)
	at com.facebook.thrift.transport.TSocket.open(TSocket.java:201)
	... 18 more
22/02/24 17:16:50 ERROR Executor: Exception in task 6.0 in stage 34.0 (TID 545)
com.facebook.thrift.transport.TTransportException: java.net.ConnectException: Operation timed out (Connection timed out)
	at com.facebook.thrift.transport.TSocket.open(TSocket.java:206)
	at com.vesoft.nebula.client.meta.MetaClient.getClient(MetaClient.java:148)
	at com.vesoft.nebula.client.meta.MetaClient.freshClient(MetaClient.java:168)
	at com.vesoft.nebula.client.meta.MetaClient.getSpace(MetaClient.java:230)
	at com.vesoft.nebula.connector.nebula.MetaProvider.getVidType(MetaProvider.scala:66)
	at com.vesoft.nebula.connector.writer.NebulaWriter.<init>(NebulaWriter.scala:40)
	at com.vesoft.nebula.connector.writer.NebulaVertexWriter.<init>(NebulaVertexWriter.scala:18)
	at com.vesoft.nebula.connector.writer.NebulaVertexWriterFactory.createDataWriter(NebulaSourceWriter.scala:28)
	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:113)
	at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:67)
	at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:66)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:123)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Operation timed out (Connection timed out)
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:607)
	at com.facebook.thrift.transport.TSocket.open(TSocket.java:201)
	... 18 more
22/02/24 17:16:50 ERROR Executor: Exception in task 7.0 in stage 34.0 (TID 546)
com.facebook.thrift.transport.TTransportException: java.net.ConnectException: Operation timed out (Connection timed out)
	at com.facebook.thrift.transport.TSocket.open(TSocket.java:206)
	at com.vesoft.nebula.client.meta.MetaClient.getClient(MetaClient.java:148)
	at com.vesoft.nebula.client.meta.MetaClient.freshClient(MetaClient.java:168)
	at com.vesoft.nebula.client.meta.MetaClient.getSpace(MetaClient.java:230)
	at com.vesoft.nebula.connector.nebula.MetaProvider.getVidType(MetaProvider.scala:66)
	at com.vesoft.nebula.connector.writer.NebulaWriter.<init>(NebulaWriter.scala:40)
	at com.vesoft.nebula.connector.writer.NebulaVertexWriter.<init>(NebulaVertexWriter.scala:18)
	at com.vesoft.nebula.connector.writer.NebulaVertexWriterFactory.createDataWriter(NebulaSourceWriter.scala:28)
	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:113)
	at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:67)
	at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:66)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:123)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Operation timed out (Connection timed out)
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:607)
	at com.facebook.thrift.transport.TSocket.open(TSocket.java:201)
	... 18 more
22/02/24 17:16:50 ERROR Executor: Exception in task 10.0 in stage 34.0 (TID 549)
com.facebook.thrift.transport.TTransportException: java.net.ConnectException: Operation timed out (Connection timed out)
	at com.facebook.thrift.transport.TSocket.open(TSocket.java:206)
	at com.vesoft.nebula.client.meta.MetaClient.getClient(MetaClient.java:148)
	at com.vesoft.nebula.client.meta.MetaClient.freshClient(MetaClient.java:168)
	at com.vesoft.nebula.client.meta.MetaClient.getSpace(MetaClient.java:230)
	at com.vesoft.nebula.connector.nebula.MetaProvider.getVidType(MetaProvider.scala:66)
	at com.vesoft.nebula.connector.writer.NebulaWriter.<init>(NebulaWriter.scala:40)
	at com.vesoft.nebula.connector.writer.NebulaVertexWriter.<init>(NebulaVertexWriter.scala:18)
	at com.vesoft.nebula.connector.writer.NebulaVertexWriterFactory.createDataWriter(NebulaSourceWriter.scala:28)
	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:113)
	at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:67)
	at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:66)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:123)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Operation timed out (Connection timed out)
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:607)
	at com.facebook.thrift.transport.TSocket.open(TSocket.java:201)
	... 18 more
22/02/24 17:16:50 ERROR Executor: Exception in task 8.0 in stage 34.0 (TID 547)
com.facebook.thrift.transport.TTransportException: java.net.ConnectException: Operation timed out (Connection timed out)
	at com.facebook.thrift.transport.TSocket.open(TSocket.java:206)
	at com.vesoft.nebula.client.meta.MetaClient.getClient(MetaClient.java:148)
	at com.vesoft.nebula.client.meta.MetaClient.freshClient(MetaClient.java:168)
	at com.vesoft.nebula.client.meta.MetaClient.getSpace(MetaClient.java:230)
	at com.vesoft.nebula.connector.nebula.MetaProvider.getVidType(MetaProvider.scala:66)
	at com.vesoft.nebula.connector.writer.NebulaWriter.<init>(NebulaWriter.scala:40)
	at com.vesoft.nebula.connector.writer.NebulaVertexWriter.<init>(NebulaVertexWriter.scala:18)
	at com.vesoft.nebula.connector.writer.NebulaVertexWriterFactory.createDataWriter(NebulaSourceWriter.scala:28)
	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:113)
	at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:67)
	at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:66)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:123)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Operation timed out (Connection timed out)
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:607)
	at com.facebook.thrift.transport.TSocket.open(TSocket.java:201)
	... 18 more
22/02/24 17:16:50 ERROR Executor: Exception in task 4.0 in stage 34.0 (TID 543)
com.facebook.thrift.transport.TTransportException: java.net.ConnectException: Operation timed out (Connection timed out)
	at com.facebook.thrift.transport.TSocket.open(TSocket.java:206)
	at com.vesoft.nebula.client.meta.MetaClient.getClient(MetaClient.java:148)
	at com.vesoft.nebula.client.meta.MetaClient.freshClient(MetaClient.java:168)
	at com.vesoft.nebula.client.meta.MetaClient.getSpace(MetaClient.java:230)
	at com.vesoft.nebula.connector.nebula.MetaProvider.getVidType(MetaProvider.scala:66)
	at com.vesoft.nebula.connector.writer.NebulaWriter.<init>(NebulaWriter.scala:40)
	at com.vesoft.nebula.connector.writer.NebulaVertexWriter.<init>(NebulaVertexWriter.scala:18)
	at com.vesoft.nebula.connector.writer.NebulaVertexWriterFactory.createDataWriter(NebulaSourceWriter.scala:28)
	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:113)
	at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:67)
	at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:66)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:123)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Operation timed out (Connection timed out)
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:607)
	at com.facebook.thrift.transport.TSocket.open(TSocket.java:201)
	... 18 more
22/02/24 17:16:50 ERROR Executor: Exception in task 2.0 in stage 34.0 (TID 541)
22/02/24 17:16:50 ERROR Executor: Exception in task 1.0 in stage 34.0 (TID 540)
22/02/24 17:16:50 ERROR Executor: Exception in task 5.0 in stage 34.0 (TID 544)
22/02/24 17:16:50 ERROR Executor: Exception in task 9.0 in stage 34.0 (TID 548)
[each task failed with the same TTransportException / java.net.ConnectException stack trace as above]
22/02/24 17:16:50 ERROR TaskSetManager: Task 4 in stage 34.0 failed 1 times; aborting job
22/02/24 17:16:50 ERROR WriteToDataSourceV2Exec: Data source writer com.vesoft.nebula.connector.writer.NebulaDataSourceVertexWriter@5cf70fbf is aborting.
22/02/24 17:16:50 ERROR NebulaDataSourceVertexWriter: NebulaDataSourceVertexWriter abort
22/02/24 17:16:50 ERROR WriteToDataSourceV2Exec: Data source writer com.vesoft.nebula.connector.writer.NebulaDataSourceVertexWriter@5cf70fbf aborted.
22/02/24 17:16:50 ERROR Utils: Aborting task
org.apache.spark.TaskKilledException
	at org.apache.spark.TaskContextImpl.killTaskIfInterrupted(TaskContextImpl.scala:149)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:36)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7.sort_addToSorter_0$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636)
	at org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3.apply(SortAggregateExec.scala:80)
	at org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3.apply(SortAggregateExec.scala:77)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndexInternal$1$$anonfun$13.apply(RDD.scala:845)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndexInternal$1$$anonfun$13.apply(RDD.scala:845)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
	at org.apache.spark.rdd.CoalescedRDD$$anonfun$compute$1.apply(CoalescedRDD.scala:100)
	at org.apache.spark.rdd.CoalescedRDD$$anonfun$compute$1.apply(CoalescedRDD.scala:99)
	at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
	at org.apache.spark.sql.execution.columnar.CachedRDDBuilder$$anonfun$1$$anon$1.hasNext(InMemoryRelation.scala:125)
	at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:221)
	at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:299)
	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1165)
	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
	at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:357)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:308)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
	at org.apache.spark.rdd.CoalescedRDD$$anonfun$compute$1.apply(CoalescedRDD.scala:100)
	at org.apache.spark.rdd.CoalescedRDD$$anonfun$compute$1.apply(CoalescedRDD.scala:99)
	at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636)
	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$$anonfun$run$3.apply(WriteToDataSourceV2Exec.scala:117)
	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$$anonfun$run$3.apply(WriteToDataSourceV2Exec.scala:116)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1394)
	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:146)
	at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:67)
	at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:66)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:123)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
22/02/24 17:16:50 ERROR DataWritingSparkTask: Aborting commit for partition 16 (task 555, attempt 0, stage 34.0)
22/02/24 17:16:50 ERROR NebulaVertexWriter: insert vertex task abort.
22/02/24 17:16:50 ERROR DataWritingSparkTask: Aborted commit for partition 16 (task 555, attempt 0, stage 34.0)
22/02/24 17:18:05 ERROR MetaClient: Get Space Error: java.net.ConnectException: Operation timed out (Connection timed out)

Spark-Connector: compatible with Spark 2.2

  1. Develop nebula-spark-connector for Spark 2.2, which means using Spark SQL's DataSource V1 API to extend the Nebula data source.
  2. The Spark 2.2 version of nebula-spark-connector should be a dependent module in this repo.
  3. Name the resulting artifact nebula-spark-connector_2.2.

spark-connector 2.0 occasionally throws ArrayIndexOutOfBoundsException when querying vertices

The tag has 24 properties in total, but the schema fetched through com.vesoft.nebula.client.storage.StorageClient with
Schema schema = metaManager.getTag(spaceName, tagName).getSchema();
returns only 15 properties, so the array index goes out of bounds while the row is being assembled.
The error log follows:
22/05/14 19:50:43 INFO connector.NebulaDataSource: create reader
22/05/14 19:50:43 INFO connector.NebulaDataSource: options {spacename=AssoNet, nocolumn=false, metaaddress=xxxxxxx label=cust, type=vertex, connectionretry=2, timeout=6000, executionretry=1, paths=[], limit=10, returncols=, partitionnumber=14}
root
|-- _vertexId: string (nullable = false)
|-- cust_no: string (nullable = true)
|-- cust_name: string (nullable = true)
xxxxxx (24 properties in total)

22/05/14 19:50:44 INFO spark.ContextCleaner: Cleaned accumulator 1
22/05/14 19:50:44 INFO codegen.CodeGenerator: Code generated in 158.8538 ms
22/05/14 19:50:44 INFO codegen.CodeGenerator: Code generated in 7.8005 ms
22/05/14 19:50:44 INFO codegen.CodeGenerator: Code generated in 22.0784 ms
22/05/14 19:50:44 INFO spark.SparkContext: Starting job: count at Nebula2HiveVertexTest.scala:54
22/05/14 19:50:44 INFO scheduler.DAGScheduler: Registering RDD 7 (count at Nebula2HiveVertexTest.scala:54)
22/05/14 19:50:44 INFO scheduler.DAGScheduler: Got job 0 (count at Nebula2HiveVertexTest.scala:54) with 1 output partitions
22/05/14 19:50:44 INFO scheduler.DAGScheduler: Final stage: ResultStage 1 (count at Nebula2HiveVertexTest.scala:54)
22/05/14 19:50:44 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
22/05/14 19:50:44 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 0)
22/05/14 19:50:44 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[7] at count at Nebula2HiveVertexTest.scala:54), which has no missing parents
22/05/14 19:50:44 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 36.9 KB, free 1983.3 MB)
22/05/14 19:50:44 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 13.9 KB, free 1983.3 MB)
22/05/14 19:50:44 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on SMY-U1-0040.smyoa.com:62328 (size: 13.9 KB, free: 1983.3 MB)
22/05/14 19:50:44 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1161
22/05/14 19:50:44 INFO scheduler.DAGScheduler: Submitting 14 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[7] at count at Nebula2HiveVertexTest.scala:54) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13))
22/05/14 19:50:44 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 14 tasks
22/05/14 19:50:44 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 10783 bytes)
22/05/14 19:50:44 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, executor driver, partition 1, PROCESS_LOCAL, 10783 bytes)
22/05/14 19:50:44 INFO executor.Executor: Running task 1.0 in stage 0.0 (TID 1)
22/05/14 19:50:44 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
22/05/14 19:50:45 ERROR meta.MetaManager: space basketballplayer part 1 has not allocation host.
22/05/14 19:50:45 ERROR meta.MetaManager: space basketballplayer part 3 has not allocation host.
22/05/14 19:50:45 ERROR meta.MetaManager: space basketballplayer part 5 has not allocation host.
22/05/14 19:50:45 ERROR meta.MetaManager: space basketballplayer part 7 has not allocation host.
22/05/14 19:50:45 ERROR meta.MetaManager: space basketballplayer part 9 has not allocation host.
22/05/14 19:50:45 ERROR meta.MetaManager: space basketballplayer part 11 has not allocation host.
22/05/14 19:50:45 ERROR meta.MetaManager: space basketballplayer part 13 has not allocation host.
22/05/14 19:50:45 ERROR meta.MetaManager: space basketballplayer part 15 has not allocation host.
22/05/14 19:50:45 INFO reader.NebulaVertexPartitionReader: partition index: 2, scanParts: List(2, 16, 30, 44, 58)
22/05/14 19:50:45 INFO reader.NebulaVertexPartitionReader: partition index: 1, scanParts: List(1, 15, 29, 43, 57)
22/05/14 19:50:45 WARN storage.BlockManager: Putting block rdd_2_1 failed due to exception java.lang.ArrayIndexOutOfBoundsException: 16.
22/05/14 19:50:45 WARN storage.BlockManager: Block rdd_2_1 could not be removed as it was not found on disk or in memory
22/05/14 19:50:45 ERROR executor.Executor: Exception in task 1.0 in stage 0.0 (TID 1)
java.lang.ArrayIndexOutOfBoundsException: 16
at com.smy.connector.reader.NebulaPartitionReader$$anonfun$get$1.apply$mcVI$sp(NebulaPartitionReader.scala:75)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
at com.smy.connector.reader.NebulaPartitionReader.get(NebulaPartitionReader.scala:74)
at com.smy.connector.reader.NebulaPartitionReader.get(NebulaPartitionReader.scala:29)
at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.next(DataSourceRDD.scala:59)
at org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:40)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636)
at org.apache.spark.sql.execution.columnar.CachedRDDBuilder$$anonfun$1$$anon$1.hasNext(InMemoryRelation.scala:125)
at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:221)
at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:299)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1165)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
22/05/14 19:50:45 INFO scheduler.TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, localhost, executor driver, partition 2, PROCESS_LOCAL, 10783 bytes)
22/05/14 19:50:45 INFO executor.Executor: Running task 2.0 in stage 0.0 (TID 2)
22/05/14 19:50:45 WARN scheduler.TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, localhost, executor driver): java.lang.ArrayIndexOutOfBoundsException: 16
[same java.lang.ArrayIndexOutOfBoundsException stack trace as above]

22/05/14 19:50:45 ERROR scheduler.TaskSetManager: Task 1 in stage 0.0 failed 1 times; aborting job
22/05/14 19:50:45 INFO reader.NebulaVertexPartitionReader: partition index: 3, scanParts: List(3, 17, 31, 45, 59)
22/05/14 19:50:45 INFO scheduler.TaskSchedulerImpl: Cancelling stage 0
22/05/14 19:50:45 INFO scheduler.TaskSchedulerImpl: Killing all running tasks in stage 0: Stage cancelled
22/05/14 19:50:45 INFO executor.Executor: Executor is trying to kill task 0.0 in stage 0.0 (TID 0), reason: Stage cancelled
22/05/14 19:50:45 INFO executor.Executor: Executor is trying to kill task 2.0 in stage 0.0 (TID 2), reason: Stage cancelled
22/05/14 19:50:45 INFO scheduler.TaskSchedulerImpl: Stage 0 was cancelled
22/05/14 19:50:45 INFO scheduler.DAGScheduler: ShuffleMapStage 0 (count at Nebula2HiveVertexTest.scala:54) failed in 0.499 s due to Job aborted due to stage failure: Task 1 in stage 0.0 failed 1 times, most recent failure: Lost task 1.0 in stage 0.0 (TID 1, localhost, executor driver): java.lang.ArrayIndexOutOfBoundsException: 16
[same java.lang.ArrayIndexOutOfBoundsException stack trace as above]

Driver stacktrace:
Screenshots of the job run:
(https://user-images.githubusercontent.com/31941244/168424814-2a51fa2f-b8ea-4e3e-8653-81f8009934df.png)
(https://user-images.githubusercontent.com/31941244/168424821-7d8db61f-a6ec-49c9-816f-e0c1a194cf55.png)
(https://user-images.githubusercontent.com/31941244/168424834-7d890aee-1c87-4a11-b001-3cf1a28f332b.png)

Support PySpark

Please add PySpark support. Algorithm engineers are most familiar with Python, so PySpark would make the connector much easier for them to use.
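Until native PySpark bindings exist, the connector can already be driven from PySpark through Spark's generic DataSource API, because the JVM data source is resolved by class name. Below is a minimal sketch, not a supported API: it assumes the connector jar is on the driver/executor classpath, a metad service at metad0:9559, and a hypothetical space/tag (basketballplayer/player); the option keys mirror the Scala reader's options and should be checked against the connector source.

```python
# Read a Nebula tag from PySpark via the generic DataSource API.
# Launch with the connector jar, e.g.:
#   pyspark --driver-class-path nebula-spark-connector.jar --jars nebula-spark-connector.jar
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("nebula-pyspark-read").getOrCreate()

df = (
    spark.read.format("com.vesoft.nebula.connector.NebulaDataSource")
    .option("type", "vertex")                 # "vertex" or "edge"
    .option("spaceName", "basketballplayer")  # hypothetical space name
    .option("label", "player")                # hypothetical tag name
    .option("returnCols", "name,age")         # properties to read; empty reads none
    .option("metaAddress", "metad0:9559")     # metad host:port (assumption)
    .option("partitionNumber", 1)
    .load()
)
df.show()
```

Writing should work the same way through spark.write.format(...) with the writer-side options; again, verify the exact option keys against NebulaOptions in this repo before relying on them.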
