
Comments (10)

GMYL commented on August 16, 2024

Solved
Placing the SPARK_RAPIDS_PLUGIN_JAR and SPARK_RAPIDS_UDF_EXAMPLES_JAR in the same (peer) directory solved my problem.
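To illustrate, here is a rough sketch of that layout; the directory path, jar versions, and the plugin config line are placeholders rather than the exact values from this setup:

```shell
# Hypothetical layout: both jars sit side by side in one "peer" directory.
JAR_DIR=/opt/sparkRapidsPlugin
SPARK_RAPIDS_PLUGIN_JAR=${JAR_DIR}/rapids-4-spark_2.12-22.06.0.jar
SPARK_RAPIDS_UDF_EXAMPLES_JAR=${JAR_DIR}/rapids-4-spark-udf-examples_2.12-22.06.0-SNAPSHOT.jar

# Sanity check: the two jars share the same parent directory.
if [ "$(dirname "$SPARK_RAPIDS_PLUGIN_JAR")" = "$(dirname "$SPARK_RAPIDS_UDF_EXAMPLES_JAR")" ]; then
  echo "jars are peers"
fi

# Pass both to Spark as a comma-separated --jars list.
JARS="${SPARK_RAPIDS_PLUGIN_JAR},${SPARK_RAPIDS_UDF_EXAMPLES_JAR}"
echo "spark-shell --jars ${JARS} --conf spark.plugins=com.nvidia.spark.SQLPlugin"
```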

from spark-rapids-examples.

GaryShen2008 commented on August 16, 2024

I tried once on my local machine; the build failed, but it didn't reproduce the same failure.

@GMYL
It seems you used CUDA 11.0 ([INFO] [exec] -- The CUDA compiler identification is NVIDIA 11.0.194).
I think cudf 22.06 requires CUDA 11.5, according to https://github.com/rapidsai/cudf/blob/branch-22.06/CONTRIBUTING.md#general-requirements.

Can you use the docker container from the dockerfile?


GMYL commented on August 16, 2024

I tried once on my local machine; the build failed, but it didn't reproduce the same failure.

@GMYL It seems you used CUDA 11.0 ([INFO] [exec] -- The CUDA compiler identification is NVIDIA 11.0.194). I think cudf 22.06 requires CUDA 11.5, according to https://github.com/rapidsai/cudf/blob/branch-22.06/CONTRIBUTING.md#general-requirements.

Can you use the docker container from the dockerfile?

Yes, the CUDA version in my server environment really is 11.0+.
Currently my server runs spark-rapids 22.06 with CUDA 11.0+, spark-sql, and some simple RAPIDS UDF examples.
Now I want to use RAPIDS-accelerated-UDFs. At first I just ran mvn package for the RAPIDS-accelerated-UDFs project; when I ran the StringWordCount.java example from the resulting jar, the executor failed because libudfexamplesjni.so was missing.
Then running mvn clean package -Pudf-native-examples resulted in the above error. Note that my server's network is restricted and cannot access the Internet.
If the CUDA version is the cause, I will upgrade CUDA and try compiling again.
I have not used the Dockerfile yet.


GMYL commented on August 16, 2024

@GaryShen2008
Following the reference link https://github.com/rapidsai/cudf/blob/branch-22.06/CONTRIBUTING.md#general-requirements, I re-checked and upgraded the software on the server.
The software versions on the server are now as follows:
Compilers:
gcc version 10.1.0
nvcc version 11.5.119
cmake version 3.24.2

CUDA/GPU:
CUDA 11.5
NVIDIA driver 495.29.05
GPU Tesla T4

Executing mvn clean package -Pudf-native-examples still produces
"Unknown CMake command "CPMFindPackage"." Is there any other way I can run the string_word_count example in the project?

Latest error message:

[WARNING] Some problems were encountered while building the effective settings
[WARNING] expected START_TAG or END_TAG not TEXT (position: TEXT seen ...\n\t\n \ua0 \ua0 <i... @110:9) @ /home/ssd3/software/apache-maven-3.6.3/conf/settings.xml, line 110, column 9
[WARNING]
[INFO] Scanning for projects...
[INFO]
[INFO] ------------< com.nvidia:rapids-4-spark-udf-examples_2.12 >-------------
[INFO] Building RAPIDS Accelerator for Apache Spark UDF Examples 22.06.0-SNAPSHOT
[INFO] --------------------------------[ jar ]---------------------------------
[WARNING] The POM for org.slf4j:slf4j-api:jar:1.6.1 is invalid, transitive dependencies (if any) will not be available, enable debug logging for more details
[WARNING] The POM for commons-lang:commons-lang:jar:2.6 is invalid, transitive dependencies (if any) will not be available, enable debug logging for more details
[INFO]
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ rapids-4-spark-udf-examples_2.12 ---
[INFO]
[INFO] --- maven-antrun-plugin:3.0.0:run (cmake) @ rapids-4-spark-udf-examples_2.12 ---
[INFO] Executing tasks
[INFO] [mkdir] Created dir: /home/ssd3/target/gmy/2206/spark-rapids-examples/examples/UDF-Examples/RAPIDS-accelerated-UDFs/target/cpp-build
[INFO] [exec] -- The C compiler identification is GNU 10.1.0
[INFO] [exec] -- The CXX compiler identification is GNU 10.1.0
[INFO] [exec] -- The CUDA compiler identification is NVIDIA 11.5.119
[INFO] [exec] -- Detecting C compiler ABI info
[INFO] [exec] -- Detecting C compiler ABI info - done
[INFO] [exec] -- Check for working C compiler: /usr/bin/cc - skipped
[INFO] [exec] -- Detecting C compile features
[INFO] [exec] -- Detecting C compile features - done
[INFO] [exec] -- Detecting CXX compiler ABI info
[INFO] [exec] -- Detecting CXX compiler ABI info - done
[INFO] [exec] -- Check for working CXX compiler: /usr/bin/c++ - skipped
[INFO] [exec] -- Detecting CXX compile features
[INFO] [exec] -- Detecting CXX compile features - done
[INFO] [exec] -- Detecting CUDA compiler ABI info
[INFO] [exec] -- Detecting CUDA compiler ABI info - done
[INFO] [exec] -- Check for working CUDA compiler: /usr/local/cuda-11.5/bin/nvcc - skipped
[INFO] [exec] -- Detecting CUDA compile features
[INFO] [exec] -- Detecting CUDA compile features - done
[INFO] [exec] -- CUDA_VERSION_MAJOR: 11
[INFO] [exec] -- CUDA_VERSION_MINOR: 5
[INFO] [exec] -- CUDA_VERSION: 11.5
[INFO] [exec] -- Configuring incomplete, errors occurred!
[INFO] [exec] See also "/home/ssd3/target/gmy/2206/spark-rapids-examples/examples/UDF-Examples/RAPIDS-accelerated-UDFs/target/cpp-build/CMakeFiles/CMakeOutput.log".
[INFO] [exec] CMake Error at /home/ssd3/target/gmy/2206/spark-rapids-examples/examples/UDF-Examples/RAPIDS-accelerated-UDFs/target/cpp-build/_deps/rapids-cmake-src/rapids-cmake/cpm/find.cmake:152 (CPMFindPackage):
[INFO] [exec] Unknown CMake command "CPMFindPackage".
[INFO] [exec] Call Stack (most recent call first):
[INFO] [exec] CMakeLists.txt:87 (rapids_cpm_find)
[INFO] [exec]
[INFO] [exec]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 05:46 min
[INFO] Finished at: 2022-10-14T16:45:40+08:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:3.0.0:run (cmake) on project rapids-4-spark-udf-examples_2.12: An Ant BuildException has occured: exec returned: 1
[ERROR] around Ant part ...... @ 5:167 in /home/ssd3/target/gmy/2206/spark-rapids-examples/examples/UDF-Examples/RAPIDS-accelerated-UDFs/target/antrun/build-main.xml
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException


GaryShen2008 commented on August 16, 2024

@GMYL Can you try building with the Dockerfile?
I succeeded in building it using Docker.
Just follow the instructions here.
If you build branch-22.06, you need to upgrade the cmake version in the Dockerfile to at least 3.23.1.
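A quick way to verify that minimum before building; this sort -V comparison is a generic sketch, and the 3.24.2 value is simply the cmake version reported earlier in this thread:

```shell
# Require CMake >= 3.23.1 (the minimum mentioned above for branch-22.06).
REQUIRED=3.23.1
CURRENT=3.24.2   # substitute the version printed by: cmake --version

# sort -V orders version strings numerically; if REQUIRED sorts first
# (or equal), the current version satisfies the minimum.
lowest=$(printf '%s\n%s\n' "$REQUIRED" "$CURRENT" | sort -V | head -n1)
if [ "$lowest" = "$REQUIRED" ]; then
  echo "cmake ${CURRENT} is new enough"
else
  echo "cmake ${CURRENT} is too old; need >= ${REQUIRED}" >&2
fi
```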


GMYL commented on August 16, 2024

@GaryShen2008
Could you show your target directory structure after a successful build?


GaryShen2008 commented on August 16, 2024

@GaryShen2008 Could you show your target directory structure after a successful build?

root@babba52ba9fc:/spark-rapids-examples/examples/UDF-Examples/RAPIDS-accelerated-UDFs/target# ll
total 640
drwxr-xr-x 10 root root 4096 Oct 14 14:33 ./
drwxr-xr-x 4 root root 4096 Oct 14 12:54 ../
drwxr-xr-x 2 root root 4096 Oct 14 12:54 antrun/
drwxr-xr-x 4 root root 4096 Oct 14 14:33 classes/
drwxr-xr-x 11 root root 4096 Oct 14 14:28 cpp-build/
drwxr-xr-x 2 root root 4096 Oct 14 14:33 dependency/
drwxr-xr-x 3 root root 4096 Oct 14 14:33 generated-sources/
drwxr-xr-x 2 root root 4096 Oct 14 14:33 maven-archiver/
drwxr-xr-x 3 root root 4096 Oct 14 14:33 maven-status/
drwxr-xr-x 3 root root 4096 Oct 14 14:33 native-deps/
-rw-r--r-- 1 root root 613835 Oct 14 14:33 rapids-4-spark-udf-examples_2.12-22.06.0-SNAPSHOT.jar


nvliyuan commented on August 16, 2024

The error "Unknown CMake command "CPMFindPackage"" suggests that CMakeLists.txt failed to include the related rapids-cmake modules, such as:

include(rapids-cmake)
include(rapids-cpm)
include(rapids-cuda)
include(rapids-export)
include(rapids-find)

Since you changed https://raw.githubusercontent.com/rapidsai/rapids-cmake/branch-22.10/RAPIDS.cmake
to https://raw.githubusercontent.com/rapidsai/rapids-cmake/branch-22.06/RAPIDS.cmake in CMakeLists.txt, and the 22.06 file is out of date, could you try the latest branch-22.10/RAPIDS.cmake file?
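Given the restricted network mentioned earlier, one possible workaround (purely a sketch; the copy-next-to-CMakeLists step is an assumption about how you would wire it up offline) is to fetch the pinned RAPIDS.cmake once from a connected machine instead of letting CMake download it at configure time:

```shell
# Build the raw URL for the rapids-cmake bootstrap file on a given branch.
BRANCH=branch-22.10
URL="https://raw.githubusercontent.com/rapidsai/rapids-cmake/${BRANCH}/RAPIDS.cmake"

# On a machine with Internet access, fetch the file:
echo "curl -fsSL -o RAPIDS.cmake ${URL}"
# Then copy RAPIDS.cmake to the offline build host and point the include()
# in CMakeLists.txt at the local copy rather than the download step.
```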


nvliyuan commented on August 16, 2024

The correct output log should look like this:

.........
[INFO]      [exec] -- CUDA_VERSION_MINOR: 5
[INFO]      [exec] -- CUDA_VERSION: 11.5
[INFO]      [exec] -- CPM: adding package [email protected] (branch-22.10)
[INFO]      [exec] -- Found CUDAToolkit: /usr/local/cuda/include (found version "11.5.119")
[INFO]      [exec] -- Looking for pthread.h
[INFO]      [exec] -- Looking for pthread.h - found
[INFO]      [exec] -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
[INFO]      [exec] -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
[INFO]      [exec] -- Check if compiler accepts -pthread
[INFO]      [exec] -- Check if compiler accepts -pthread - yes
[INFO]      [exec] -- Found Threads: TRUE
[INFO]      [exec] -- Found ZLIB: /usr/lib/x86_64-linux-gnu/libz.so (found version "1.2.11")
[INFO]      [exec] -- CPM: cudf: adding package [email protected] (jitify2)
[INFO]      [exec] -- CPM: cudf: using local package [email protected]
[INFO]      [exec] -- CPM: cudf: adding package [email protected] (1.17.2)
[INFO]      [exec] -- Found Thrust: /spark-rapids-examples/examples/UDF-Examples/RAPIDS-accelerated-UDFs/target/cpp-build/_deps/thrust-src/thrust/cmake/thrust-config.cmake (found version "1.17.2.0")
[INFO]      [exec] -- Found CUB: /spark-rapids-examples/examples/UDF-Examples/RAPIDS-accelerated-UDFs/target/cpp-build/_deps/thrust-src/dependencies/cub/cub/cmake/cub-config.cmake (found suitable version "1.17.2.0", minimum required is "1.17.2.0")
[INFO]      [exec] -- CPM: cudf: adding package [email protected] (branch-22.10)
[INFO]      [exec] -- RMM: RMM_LOGGING_LEVEL = 'INFO'
[INFO]      [exec] -- CPM: cudf: rmm: adding package [email protected] (v1.8.5)
[INFO]      [exec] -- Build spdlog: 1.8.5
[INFO]      [exec] -- Build type: Release
[INFO]      [exec] -- Generating install
[INFO]      [exec] -- CPM: cudf: adding package [email protected] (apache-arrow-9.0.0)


GMYL commented on August 16, 2024

@GaryShen2008 @nvliyuan
Thanks for your help. It now compiles successfully. The main problem was my server's network: some dependencies could not be downloaded during compilation.

But a new problem arose when running the jar:
Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: Could not locate native dependency amd64/Linux/libudfexamplesjni.so

I use spark-sql to test StringWordCount with the following SQL statements:
CREATE TEMPORARY FUNCTION wordcount AS 'com.nvidia.spark.rapids.udf.hive.StringWordCount';
select wordcount(rowkey) from perceive.hdfs_bayonet_vehiclepass where prtday between 20210309 and 20210309 group by rowkey limit 10;

There is no problem on the driver side; the log displays:
*Expression HiveSimpleUDF#com.nvidia.spark.rapids.udf.hive.StringWordCount(rowkey#0) AS wordcount(rowkey)#158 will run on GPU

An error occurred during execution on the executor.

Jar package structure:
(screenshot of the jar contents was attached in the original comment)

The error log is as follows:
22/10/19 15:06:01 ERROR TaskSetManager: Task 0 in stage 2.0 failed 4 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 15) (xx.xx.xx.xx executor 0): org.apache.spark.SparkException: Failed to execute user defined function (StringWordCount: (string) => int)
at com.nvidia.spark.rapids.GpuUserDefinedFunction.$anonfun$columnarEval$4(GpuUserDefinedFunction.scala:69)
at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
at org.apache.spark.sql.hive.rapids.GpuHiveSimpleUDF.withResource(hiveUDFs.scala:44)
at com.nvidia.spark.rapids.GpuUserDefinedFunction.$anonfun$columnarEval$2(GpuUserDefinedFunction.scala:57)
at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:46)
at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:44)
at org.apache.spark.sql.hive.rapids.GpuHiveSimpleUDF.withResource(hiveUDFs.scala:44)
at com.nvidia.spark.rapids.GpuUserDefinedFunction.columnarEval(GpuUserDefinedFunction.scala:55)
at com.nvidia.spark.rapids.GpuUserDefinedFunction.columnarEval$(GpuUserDefinedFunction.scala:53)
at org.apache.spark.sql.hive.rapids.GpuHiveSimpleUDF.columnarEval(hiveUDFs.scala:44)
at com.nvidia.spark.rapids.RapidsPluginImplicits$ReallyAGpuExpression.columnarEval(implicits.scala:34)
at com.nvidia.spark.rapids.GpuAlias.columnarEval(namedExpressions.scala:109)
at com.nvidia.spark.rapids.RapidsPluginImplicits$ReallyAGpuExpression.columnarEval(implicits.scala:34)
at com.nvidia.spark.rapids.GpuExpressionsUtils$.columnarEvalToColumn(GpuExpressions.scala:93)
at com.nvidia.spark.rapids.GpuHashAggregateIterator.$anonfun$finalProjectBatch$5(aggregate.scala:538)
at com.nvidia.spark.rapids.RapidsPluginImplicits$MapsSafely.$anonfun$safeMap$1(implicits.scala:216)
at com.nvidia.spark.rapids.RapidsPluginImplicits$MapsSafely.$anonfun$safeMap$1$adapted(implicits.scala:213)
at scala.collection.immutable.List.foreach(List.scala:431)
at com.nvidia.spark.rapids.RapidsPluginImplicits$MapsSafely.safeMap(implicits.scala:213)
at com.nvidia.spark.rapids.RapidsPluginImplicits$AutoCloseableProducingSeq.safeMap(implicits.scala:248)
at com.nvidia.spark.rapids.GpuHashAggregateIterator.$anonfun$finalProjectBatch$4(aggregate.scala:535)
at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
at com.nvidia.spark.rapids.GpuHashAggregateIterator.withResource(aggregate.scala:181)
at com.nvidia.spark.rapids.GpuHashAggregateIterator.$anonfun$finalProjectBatch$1(aggregate.scala:534)
at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
at com.nvidia.spark.rapids.GpuHashAggregateIterator.withResource(aggregate.scala:181)
at com.nvidia.spark.rapids.GpuHashAggregateIterator.finalProjectBatch(aggregate.scala:510)
at com.nvidia.spark.rapids.GpuHashAggregateIterator.next(aggregate.scala:262)
at com.nvidia.spark.rapids.GpuHashAggregateIterator.next(aggregate.scala:181)
at com.nvidia.spark.rapids.ColumnarToRowIterator.$anonfun$fetchNextBatch$2(GpuColumnarToRowExec.scala:241)
at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
at com.nvidia.spark.rapids.ColumnarToRowIterator.withResource(GpuColumnarToRowExec.scala:187)
at com.nvidia.spark.rapids.ColumnarToRowIterator.fetchNextBatch(GpuColumnarToRowExec.scala:238)
at com.nvidia.spark.rapids.ColumnarToRowIterator.loadNextBatch(GpuColumnarToRowExec.scala:215)
at com.nvidia.spark.rapids.ColumnarToRowIterator.hasNext(GpuColumnarToRowExec.scala:255)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:349)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: Could not locate native dependency amd64/Linux/libudfexamplesjni.so
at com.nvidia.spark.rapids.udf.java.NativeUDFExamplesLoader.ensureLoaded(NativeUDFExamplesLoader.java:34)
at com.nvidia.spark.rapids.udf.hive.StringWordCount.evaluateColumnar(StringWordCount.java:77)
at com.nvidia.spark.rapids.GpuUserDefinedFunction.$anonfun$columnarEval$4(GpuUserDefinedFunction.scala:59)
... 53 more
Caused by: java.io.FileNotFoundException: Could not locate native dependency amd64/Linux/libudfexamplesjni.so
at ai.rapids.cudf.NativeDepsLoader.createFile(NativeDepsLoader.java:210)
at ai.rapids.cudf.NativeDepsLoader.loadDep(NativeDepsLoader.java:181)
at ai.rapids.cudf.NativeDepsLoader.loadNativeDeps(NativeDepsLoader.java:129)
at com.nvidia.spark.rapids.udf.java.NativeUDFExamplesLoader.ensureLoaded(NativeUDFExamplesLoader.java:31)
... 55 more
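A note for anyone debugging the same FileNotFoundException: the stack trace above shows the loader resolving the library as <os.arch>/<os.name>/libudfexamplesjni.so inside the jar. A small sketch to confirm that entry is present (the jar filename and the unzip check are illustrative, not taken from this exact environment):

```shell
# The entry path the loader looks for, per the stack trace above.
ARCH=amd64
OS=Linux
ENTRY="${ARCH}/${OS}/libudfexamplesjni.so"
echo "Expected jar entry: ${ENTRY}"

# On a real build, verify the entry exists in the examples jar, e.g.:
#   unzip -l rapids-4-spark-udf-examples_2.12-22.06.0-SNAPSHOT.jar | grep "${ENTRY}"
# If it is missing, the native profile was likely skipped; rebuild with:
#   mvn clean package -Pudf-native-examples
```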

