nvidia/spark-rapids-jni: RAPIDS Accelerator JNI For Apache Spark
License: Apache License 2.0
This repository contains native support code for the RAPIDS Accelerator for Apache Spark.
Is your feature request related to a problem? Please describe.
To better support non-interactive automated use, I wish the Fault Injection Tool allowed injecting faults into cuInit.
As a part of our build we download boost source code. Ideally we should have a checksum with it too so we can verify that nothing changed.
This is not quite a bug; rather, something needs to be improved, as we have compiler warnings:
cast_string.cu(121): warning #186-D: pointless comparison of unsigned integer with zero
[INFO] [exec] detected during:
[INFO] [exec] instantiation of "void spark_rapids_jni::detail::string_to_integer_kernel(T *, cudf::bitmask_type *, const char *, const cudf::offset_type *, const cudf::bitmask_type *, cudf::size_type, __nv_bool) [with T=uint8_t]"
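One common way to silence this class of warning is to guard the signed-only comparison with `if constexpr`, so the comparison is compiled only for signed types. This is a sketch of the pattern, not the actual cast_string.cu fix:

```cpp
#include <type_traits>

// For unsigned T the comparison `v < 0` is always false, which nvcc
// reports as "pointless comparison of unsigned integer with zero".
// `if constexpr` removes the comparison entirely for unsigned types.
template <typename T>
constexpr bool is_negative(T v) {
  if constexpr (std::is_signed_v<T>) {
    return v < 0;
  } else {
    (void)v;
    return false;
  }
}
```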
We're going to first release a spark-rapids jar for Arm in 22.12, but we don't have time to adjust the modules in spark-rapids-jni.
For 22.12, we won't release an Arm-based jar of spark-rapids-jni; we will only release the arm64 jar of rapids-4-spark, which packages everything together.
So we should provide a doc explaining how to build on Arm, in case a customer wants to build an Arm-based jar.
We plan to adjust the modules in spark-rapids-jni in 23.02.
After rapidsai/cudf#10884 CudaFatalTest is generating a fatal CUDA error which causes subsequent tests using CUDA APIs to fail. We need to port the pom changes from the cudf PR into the spark-rapids-jni pom.
Is your feature request related to a problem? Please describe.
Work with the Blossom team to get pre-merge Blossom CI working.
Is your feature request related to a problem? Please describe.
The executables built with the build-in-docker script probably have a different environment than the host, which results in them being unable to find the libraries required to run.
$ target/cmake-build/gtests/ROW_CONVERSION
target/cmake-build/gtests/ROW_CONVERSION: error while loading shared libraries: libcudf.so: cannot open shared object file: No such file or directory
$ ldd !$
ldd target/cmake-build/gtests/ROW_CONVERSION
linux-vdso.so.1 (0x00007ffe2abf3000)
libcudf.so => not found
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fe277626000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fe27760a000)
libnvcomp.so => not found
libnvcomp_gdeflate.so => not found
libnvcomp_bitcomp.so => not found
libcudart.so.11.0 => not found
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fe277602000)
libcuda.so.1 => /lib/x86_64-linux-gnu/libcuda.so.1 (0x00007fe275e38000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fe275e15000)
libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fe275c33000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fe275ae2000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fe275ac7000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fe2758d5000)
/lib64/ld-linux-x86-64.so.2 (0x00007fe277646000)
Describe the solution you'd like
If the Docker image used the same path structure as the host, the runpath information embedded in the ELF binaries would match, and the host would be able to find the needed libraries.
Describe alternatives you've considered
It is also possible to force running everything inside the Docker environment via scripts.
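A hedged sketch of the path-matching idea (the image name is a placeholder): mount the repository at the same absolute path inside the container that it has on the host, so the RUNPATH entries embedded in the built binaries resolve identically inside and outside the container.

```shell
# Placeholder image name; mounting at "$PWD" keeps host and container
# paths identical so RUNPATH entries baked into the binaries stay valid.
docker run --rm -v "$PWD:$PWD" -w "$PWD" build-image mvn package
```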
We have a lot of noise in the build logs we upload and link to GitHub workflow runs:
https://github.com/NVIDIA/spark-rapids-jni/runs/8214034914?check_suite_focus=true#step:2:20347
...
239/283 KB 60/60 KB 350/350 KB 454/1473 KB 127/127 KB
239/283 KB 60/60 KB 350/350 KB 458/1473 KB 127/127 KB
243/283 KB 60/60 KB 350/350 KB 458/1473 KB 127/127 KB
243/283 KB 60/60 KB 350/350 KB 462/1473 KB 127/127 KB
243/283 KB 60/60 KB 350/350 KB 466/1473 KB 127/127 KB
...
Use std::thread instead of pthreads, and make globalControl a singleton (h/t @mythrocks for the suggestions in #399).
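Under the suggestion above, a minimal sketch might look like this (class and member names are assumptions, not the actual spark-rapids-jni code):

```cpp
#include <thread>

// Hypothetical sketch: replace a pthread-based global with a Meyers
// singleton, whose initialization is thread-safe since C++11, and run
// the worker on a std::thread.
class GlobalControl {
 public:
  static GlobalControl& instance() {
    static GlobalControl ctrl;  // constructed once, on first use
    return ctrl;
  }
  void start() {
    worker_ = std::thread([] { /* background control loop */ });
  }
  void stop() {
    if (worker_.joinable()) worker_.join();
  }

 private:
  GlobalControl() = default;
  std::thread worker_;
};
```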
Describe the bug
The nightly build failed a unit test with commit 9e593b3:
10:50:06 [ERROR] testCudaAsyncMemoryResourceSize Time elapsed: 0.008 s <<< ERROR!
10:50:06 ai.rapids.cudf.CudfException: CUDA error at: /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-pre_release-4-cuda11/thirdparty/cudf/cpp/build/_deps/rmm-src/include/rmm/detail/dynamic_load_runtime.hpp:139: cudaErrorInvalidValue invalid argument
10:50:06 at ai.rapids.cudf.Rmm.initializeInternal(Native Method)
10:50:06 at ai.rapids.cudf.Rmm.initialize(Rmm.java:119)
10:50:06 at ai.rapids.cudf.RmmTest.testCudaAsyncMemoryResourceSize(RmmTest.java:392)
10:50:06 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
10:50:06 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
10:50:06 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
10:50:06 at java.lang.reflect.Method.invoke(Method.java:498)
10:50:06 at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:725)
10:50:06 at org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
10:50:06 at org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
10:50:06 at org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
10:50:06 at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
10:50:06 at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84)
10:50:06 at org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
10:50:06 at org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
10:50:06 at org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
10:50:06 at org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
10:50:06 at org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
10:50:06 at org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
10:50:06 at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104)
10:50:06 at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98)
10:50:06 at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:214)
10:50:06 at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
10:50:06 at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:210)
10:50:06 at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:135)
10:50:06 at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:66)
10:50:06 at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:151)
10:50:06 at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
10:50:06 at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
10:50:06 at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
10:50:06 at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139)
10:50:06 at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
10:50:06 at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138)
10:50:06 at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:95)
10:50:06 at java.util.ArrayList.forEach(ArrayList.java:1259)
10:50:06 at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.invokeAll(SameThreadHierarchicalTestExecutorService.java:41)
10:50:06 at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:155)
10:50:06 at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
10:50:06 at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
10:50:06 at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
10:50:06 at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139)
10:50:06 at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
10:50:06 at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138)
10:50:06 at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:95)
10:50:06 at java.util.ArrayList.forEach(ArrayList.java:1259)
10:50:06 at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.invokeAll(SameThreadHierarchicalTestExecutorService.java:41)
10:50:06 at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:155)
10:50:06 at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
10:50:06 at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
10:50:06 at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
10:50:06 at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139)
10:50:06 at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
10:50:06 at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138)
10:50:06 at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:95)
10:50:06 at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.submit(SameThreadHierarchicalTestExecutorService.java:35)
10:50:06 at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor.execute(HierarchicalTestExecutor.java:57)
10:50:06 at org.junit.platform.engine.support.hierarchical.HierarchicalTestEngine.execute(HierarchicalTestEngine.java:54)
10:50:06 at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:220)
10:50:06 at org.junit.platform.launcher.core.DefaultLauncher.lambda$execute$6(DefaultLauncher.java:188)
10:50:06 at org.junit.platform.launcher.core.DefaultLauncher.withInterceptedStreams(DefaultLauncher.java:202)
10:50:06 at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:181)
10:50:06 at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:128)
10:50:06 at org.junit.platform.surefire.provider.JUnitPlatformProvider.invokeAllTests(JUnitPlatformProvider.java:155)
10:50:06 at org.junit.platform.surefire.provider.JUnitPlatformProvider.invoke(JUnitPlatformProvider.java:134)
10:50:06 at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:383)
10:50:06 at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:344)
10:50:06 at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125)
10:50:06 at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:417)
10:50:06
Is your feature request related to a problem? Please describe.
After the cudf CI of 22.12 is available, we should create a PR to update the cudf submodule:
git submodule update --remote --merge
nvcc: Eliminating unused kernels https://developer.nvidia.com/blog/reducing-application-build-times-using-cuda-c-compilation-aids/
gcc: https://gcc.gnu.org/onlinedocs/gnat_ugn/Compilation-options.html
-ffunction-sections and -fdata-sections
combined with the linker flag -Wl,--gc-sections
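A sketch of how these options could be applied in a CMake build (the target name is a placeholder; the flags themselves are standard GCC/ld options):

```cmake
# Place each function and data item in its own section so the linker
# can discard the unreferenced ones with --gc-sections.
target_compile_options(mylib PRIVATE
  $<$<COMPILE_LANGUAGE:CXX>:-ffunction-sections -fdata-sections>)
target_link_options(mylib PRIVATE -Wl,--gc-sections)
```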
Describe the bug
https://github.com/NVIDIA/spark-rapids-jni/runs/5877737893?check_suite_focus=true
******** JOB LOGS with sensitive data redacted **************
In order for the RAPIDS Accelerator to start depending on the spark-rapids-jni artifact instead of cudf, we need to publish it so it can be downloaded during the RAPIDS Accelerator builds. Nightly builds should be setup to publish the spark-rapids-jni snapshot jar as we have done for cudf.
Describe the bug
Spark CPU handling of string to float conversions has some odd behavior around the max values of floats and doubles. The GPU kernel needs to match these behaviors.
Steps/Code to reproduce bug
val df = Seq("1.7976931348623158E308", "1.79769313486231581E308", "1.7976931348623157E308", "1.7976931348623159E308", "-1.7976931348623158E308", "-1.79769313486231581E308", "-1.7976931348623157E308", "-1.7976931348623159E308", "1.7976931348623158E-308", "1.79769313486231581E-308", "1.7976931348623157E-308", "1.7976931348623159E-308").toDF
df.coalesce(1).selectExpr("*", "CAST(value as double)").show(truncate = false)
+------------------------+-----------------------+
|value |value |
+------------------------+-----------------------+
|1.7976931348623158E308 |1.7976931348623157E308 |
|1.79769313486231581E308 |Infinity |
|1.7976931348623157E308 |1.7976931348623157E308 |
|1.7976931348623159E308 |Infinity |
|-1.7976931348623158E308 |-1.7976931348623157E308|
|-1.79769313486231581E308|-Infinity |
|-1.7976931348623157E308 |-1.7976931348623157E308|
|-1.7976931348623159E308 |-Infinity |
|1.7976931348623158E-308 |1.797693134862316E-308 |
|1.79769313486231581E-308|1.797693134862316E-308 |
|1.7976931348623157E-308 |1.7976931348623155E-308|
|1.7976931348623159E-308 |1.797693134862316E-308 |
+------------------------+-----------------------+
spark.conf.set("spark.rapids.sql.enabled", "true")
df.coalesce(1).selectExpr("*", "CAST(value as double)").show(truncate = false)
+------------------------+----------------+
|value |value |
+------------------------+----------------+
|1.7976931348623158E308 |Infinity |
|1.79769313486231581E308 |1.797693135E308 |
|1.7976931348623157E308 |1.797693135E308 |
|1.7976931348623159E308 |Infinity |
|-1.7976931348623158E308 |-Infinity |
|-1.79769313486231581E308|-1.797693135E308|
|-1.7976931348623157E308 |-1.797693135E308|
|-1.7976931348623159E308 |-Infinity |
|1.7976931348623158E-308 |1.797693135e-308|
|1.79769313486231581E-308|1.797693135e-308|
|1.7976931348623157E-308 |1.797693135e-308|
|1.7976931348623159E-308 |1.797693135e-308|
+------------------------+----------------+
Of note, 1.79769313486231580E308 results in 1.7976931348623157E308, but 1.79769313486231581E308 results in Infinity, which is interesting. This implies that there are some special cases in the code around the edges.
Expected behavior
CPU and GPU conversions should match
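For reference, correctly-rounded IEEE-754 parsing (which Python's float() also implements) reproduces the CPU column above: strings whose value rounds above the largest finite double become Infinity.

```python
import math

# DBL_MAX is 1.7976931348623157e308. Strings just below the overflow
# midpoint round down to DBL_MAX; strings above it round to Infinity.
dbl_max = 1.7976931348623157e308
assert float("1.7976931348623158E308") == dbl_max
assert math.isinf(float("1.7976931348623159E308"))
assert float("-1.7976931348623158E308") == -dbl_max
assert math.isinf(float("-1.7976931348623159E308"))
```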
A new cudf PR (rapidsai/cudf#12002) has a new option, CUDF_JNI_ENABLE_PROFILING. That option has also been added to the cudf Java build config: https://github.com/rapidsai/cudf/pull/12002/files#diff-d8225ebcfc11b480a6e4f54e183b67c3ead51635a167c106d928c2abf1f9ef66R459. We should update it here as well.
Is your feature request related to a problem? Please describe.
Create a ticket to discuss the CI/CD requirements for the fault injection tooling.
There are still some open questions for the tooling:
A. The artifact is a .so file; where should we deploy it? An internal-only or external artifactory store? Or should we ask developers to build it whenever they want the tool?
B. What is the plan for this tooling? Do we plan to release it? Do we have a roadmap for it, e.g., what we are trying to achieve in the next release?
C. We have several scenarios in the design doc, but there are still no specific test specs (SW & HW) and expectations to make sure we have deterministic, regular nightly runs. It would be nice to have some tables clarifying the details that define the scenarios, instead of simply giving a command.
e.g.
spark test w/ some specific configs
some faultinj specific configs
driver 450.xx
ubuntu 18.04
GPU w/ 12Gi mem
should return error count X. Then, if using driver 465.yy / centos7 / a GPU with 24Gi of memory, it should return error count Y/Z/A.
Or explicitly state that CUDA/OS/GPU types do not matter here, or that we do not care about the error count, or that if the test errors out then the setup meets our expectations. Then we could have a regular run for it.
Thanks
When row conversion for strings was added, the goal was implementation speed over operational speed. Now that there is a working version, some investigation into the performance of the kernel is warranted. Investigations:
When building spark-rapids-jni jar on an arm64 instance, the following error is thrown:
[INFO] [exec] CMake Error at faultinj/CMakeLists.txt:39 (target_link_libraries):
[INFO] [exec] Target "cufaultinj" links to:
[INFO] [exec]
[INFO] [exec] CUDA::cupti_static
[INFO] [exec]
[INFO] [exec] but the target was not found. Possible reasons include:
[INFO] [exec]
[INFO] [exec] * There is a typo in the target name.
[INFO] [exec] * A find_package call is missing for an IMPORTED target.
[INFO] [exec] * An ALIAS target is missing.
This is due to the lack of a static libcupti in the nvidia/cuda arm64 Docker images (e.g. nvidia/cuda:11.5.2-devel-ubuntu18.04), while only static linking is provided in the CMake file.
We can add a conditional link for arm64 architectures.
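A possible sketch of such a conditional link in faultinj/CMakeLists.txt (the imported target names come from CMake's FindCUDAToolkit module; treat this as an illustration, not the final patch):

```cmake
# arm64 CUDA images ship only the shared cupti, so fall back to it there.
if(CMAKE_SYSTEM_PROCESSOR STREQUAL "aarch64")
  target_link_libraries(cufaultinj PRIVATE CUDA::cupti)
else()
  target_link_libraries(cufaultinj PRIVATE CUDA::cupti_static)
endif()
```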
Describe the bug
Running CuFileTest results in
ai.rapids.cudf.CudfException: cuDF failure at: /rapids/spark-rapids-jni/thirdparty/cudf/java/src/main/native/src/CuFileJni.cpp:162: Failed to register cuFile handle: internal error
Steps/Code to reproduce bug
PARALLEL_LEVEL=6 ./build/build-in-docker clean install -DGPU_ARCHS=native -Dtest=ai.rapids.cudf.CuFileTest#testCopyToFile
Expected behavior
The test should pass.
Environment details (please complete the following information)
Additional context
Probably need a cudf issue to have exceptions include the path on the filesystem.
The parquet footer parser feature is experimental and we need to harden it.
This means we need to
That last one could be rather complicated as it will require some interface changes to the parser where we need to pass down the schema of the data, not just the names of the columns.
Now that C++ tests are being added to spark-rapids-jni, it would be nice to have a single script that would build the library and tests, and then run the tests. This could be leveraged in CI.
Is your feature request related to a problem? Please describe.
I wish the Fault Injection Tool allowed comma-separated values for substitute Return Code such that I don't need a verbose list of separate rules for each one individually
Describe the solution you'd like
"cuLaunchKernel_ptsz": {
  "percent": 1,
  "injectionType": 2,
  "substituteReturnCode": "2,3,999",
  "interceptionCount": 1000
}
with the semantics that, if the rule is matched, one of the values is picked at random.
Describe alternatives you've considered
replicate the rule per return code
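A minimal sketch of the proposed semantics (the helper name is hypothetical, not the tool's actual config parser):

```python
import random

def pick_return_code(substitute_return_code: str) -> int:
    """When a rule with a comma-separated substituteReturnCode matches,
    pick one of the listed codes at random."""
    codes = [int(c) for c in substitute_return_code.split(",")]
    return random.choice(codes)
```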
Is your feature request related to a problem? Please describe.
Follow-up of #2: leverage ninja-build to speed up the cudf build.
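A minimal sketch (source and build paths are placeholders, not this repo's actual build invocation): Ninja is selected via the CMake generator flag.

```shell
# -GNinja swaps the Makefile generator for Ninja, which schedules
# large C++/CUDA builds more efficiently.
cmake -S cpp -B build -GNinja
cmake --build build
```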
Since this project builds libcudf and libcudfjni, ideally the Dockerfile used for this project should derive from the Dockerfile used for the nightly cudf Java jar builds. Doing so would require publishing the cudf Java Docker image so it can be referenced in this repository's Dockerfile.
Is your feature request related to a problem? Please describe.
I wish RAPIDS Accelerator JNI for Apache Spark would be built with the ccache support introduced in rapidsai/cudf#10790.
Describe the solution you'd like
Provide an option or environment variable to easily turn on/off ccache via build-in-docker
Describe alternatives you've considered
A one-off personal script
Additional context
Sometimes one needs to remove the build directory to fix build issues. Being able to do a clean build at any time without taking a productivity hit prevents issues from occurring to begin with.
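One plausible way to wire this up, using standard CMake launcher variables (an illustration, not necessarily how build-in-docker would expose it):

```shell
# Route compiler invocations through ccache so clean rebuilds hit the cache.
cmake -DCMAKE_C_COMPILER_LAUNCHER=ccache \
      -DCMAKE_CXX_COMPILER_LAUNCHER=ccache \
      -DCMAKE_CUDA_COMPILER_LAUNCHER=ccache ..
```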
Describe the bug
The string-to-float code fails when parsing the string "E15", resulting in a value of 0 instead of a null value.
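A minimal sketch of the expected validation (a hypothetical helper, not the kernel code): a float string with no mantissa digits, like "E15", should be rejected as null rather than parsed as 0.

```python
import re

# Hypothetical validator: require at least one digit in the mantissa
# before accepting an optional exponent.
FLOAT_RE = re.compile(r"^[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?$")

def parse_float_or_null(s: str):
    return float(s) if FLOAT_RE.match(s) else None
```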
Is your feature request related to a problem? Please describe.
Intermediate branches for auto-merge should be deleted automatically by the action itself.
But if the PR was closed by other means, such as a manual conflict fix, we should have a way to clean them up automatically.
It would be nice to have formatting tools and pre-commit checks to enforce a C++ style, as was done in the RAPIDS cudf repository with clang-format.
Describe the bug
The standard implementation in the antrun plugin does not propagate error output to Maven; therefore, error output is logged as info.
The actual ERROR log from Maven only includes the return code 1, without explanation.
Steps/Code to reproduce bug
Run build when cudf submodule is stale
[INFO] ------------------------------------------------------------------------
[INFO] Building RAPIDS Accelerator JNI for Apache Spark 22.08.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- maven-antrun-plugin:3.0.0:run (submodule check) @ spark-rapids-jni ---
[INFO] Executing tasks
[INFO] [exec] ERROR: submodules out of date: +dba4eea4a5db9e1b3ceb8ceb8f2762cf86b91170 thirdparty/cudf (v0.12.0-16695-gdba4eea4a5). To fix: git submodule update --init --recursive
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 0.741s
[INFO] Finished at: Mon Jul 18 19:06:16 UTC 2022
[INFO] Final Memory: 15M/477M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:3.0.0:run (submodule check) on project spark-rapids-jni: An Ant BuildException has occured: exec returned: 1
[ERROR] around Ant part ...<exec failonerror="true" dir="/home/gshegalov/gits/NVIDIA/spark-rapids-jni" executable="/home/gshegalov/gits/NVIDIA/spark-rapids-jni/build/submodule-check"></exec>... @ 4:161 in /home/gshegalov/gits/NVIDIA/spark-rapids-jni/target/antrun/build-main.xml
[ERROR] -> [Help 1]
Expected behavior
ERROR output should include diagnostics beyond the result code.
This can be achieved by turning off failonerror, capturing the result code and error output in properties of exec, and adding an additional check:
<fail message="Exit code: ${exitCode}, Error message: ${errorMsg}">
  <condition>
    <not>
      <equals arg1="${exitCode}" arg2="0"/>
    </not>
  </condition>
</fail>
Environment details (please complete the following information)
Additional context
N/A
Is your feature request related to a problem? Please describe.
I wish the CUDA Fault Injector allowed a configurable non-console logger sink, such as a file.
Describe the solution you'd like
Add some config key "logSink" with values such as
stderr can be the default
Describe alternatives you've considered
N/A. Logging to the console is sometimes convenient for a demo, but never for production.
Additional context
N/A
The CICD should include:
Is your feature request related to a problem? Please describe.
When testing local builds of spark-rapids-jni with Spark, we currently have to carefully verify that the spark-rapids build consumed the local build output of spark-rapids-jni instead of a dependency downloaded from a Maven Central repo.
Describe the solution you'd like
This proposes using the mechanism described in Maven CI Friendly Versions, which should be straightforward in spark-rapids-jni since it's a single-module build.
If the version is defined along the lines of the example from the doc
<version>${revision}${sha1}${changelist}</version>
...
<properties>
  <revision>22.08.0</revision>
  <changelist>-SNAPSHOT</changelist>
  <sha1/>
</properties>
the user can produce a sufficiently unique local artifact, in order not to worry about collisions with the SNAPSHOTs from Central, by adding -Dsha1="-$(git rev-parse --short HEAD)".
Then, when building the spark-rapids repo, the user will specify -Dspark-rapids-jni.version=22.08.0-6453047ef-SNAPSHOT.
Describe alternatives you've considered
Continue watching build info output at run time carefully
Additional context
Numerous confusions due to version inconsistencies
rapidsai/cudf#10877 is changing PER_THREAD_DEFAULT_STREAM to CUDF_USE_PER_THREAD_DEFAULT_STREAM. We need to update the spark-rapids-jni build accordingly and should switch to the new flag name.
Describe the bug
CPU code is able to convert the string "-2.21363921575273728E17" without issue, but the kernel produces a null value.
Is your feature request related to a problem? Please describe.
The main issue is NVIDIA/spark-rapids#5639 and it lays out kernels we would like to create to improve performance of casting in spark-rapids. This issue is for the integer kernel.
Describe the solution you'd like
A kernel should be created to convert strings to integers for reading CSV and JSON.
Is your feature request related to a problem? Please describe.
Spark supports hexadecimal notation for strings being cast to floats, e.g. strings like 0x1p0. The value is HEX_VALUE * 2^EXP, as defined in this PR comment. Note that the parsing code needs to support intermixed decimal and hexadecimal strings.
Describe the solution you'd like
The float parsing kernel should be augmented to support these string values.
Additional context
https://www.exploringbinary.com/hexadecimal-floating-point-constants/
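As an illustration of the semantics (not the kernel implementation), Python's float.fromhex parses the same notation:

```python
# Hex float literals have the form 0x<mantissa>p<exponent>, valued as
# mantissa * 2**exponent; float.fromhex implements this grammar.
assert float.fromhex("0x1p0") == 1.0     # 1 * 2**0
assert float.fromhex("0x1.8p1") == 3.0   # 1.5 * 2**1
assert float.fromhex("0x10p-4") == 1.0   # 16 * 2**-4
```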
Is your feature request related to a problem? Please describe.
The main issue is NVIDIA/spark-rapids#5639 and it lays out kernels we would like to create to improve performance of casting in spark-rapids. This issue is for the float/double kernel.
Describe the solution you'd like
A kernel should be created to convert strings to floats
Describe the bug
spark-rapids-jni_nightly-dev build ID 184 failed at:
16:28:05 [INFO] [exec] -- Found JNI: /usr/lib/jvm/java/jre/lib/amd64/libjawt.so
16:28:05 [INFO] [exec] -- JDK with JNI in /usr/lib/jvm/java/include;/usr/lib/jvm/java/include/linux;/usr/lib/jvm/java/include
16:28:05 [INFO] [exec] -- Found nvcomp: /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib/cmake/nvcomp/nvcomp-config.cmake (found version "2.3.3")
16:28:05 [INFO] [exec] -- Found Thrust: /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/include/libcudf/Thrust/thrust/cmake/thrust-config.cmake (found version "1.15.0.0")
16:28:05 [INFO] [exec] CMake Error at /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/thrust/thrust-config-version.cmake:10 (file):
16:28:05 [INFO] [exec] file failed to open for reading (No such file or directory):
16:28:05 [INFO] [exec]
16:28:05 [INFO] [exec] /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/src/main/cpp/_THRUST_VERSION_INCLUDE_DIR-NOTFOUND/thrust/version.h
16:28:05 [INFO] [exec] Call Stack (most recent call first):
16:28:05 [INFO] [exec] /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/rmm/rmm-dependencies.cmake:24 (find_package)
16:28:05 [INFO] [exec] /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/rmm/rmm-config.cmake:75 (include)
16:28:05 [INFO] [exec] /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/cudf/cudf-dependencies.cmake:30 (find_package)
16:28:05 [INFO] [exec] /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/cudf/cudf-config.cmake:91 (include)
16:28:05 [INFO] [exec] /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/cmake-build/_deps/rapids-cmake-src/rapids-cmake/find/package.cmake:114 (find_package)
16:28:05 [INFO] [exec] CMakeLists.txt:104 (rapids_find_package)
16:28:05 [INFO] [exec]
16:28:05 [INFO] [exec]
16:28:05 [INFO] [exec] CMake Error at /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/thrust/thrust-config-version.cmake:17 (math):
16:28:05 [INFO] [exec] math cannot parse the expression: " / 100000": syntax error, unexpected
16:28:05 [INFO] [exec] exp_DIVIDE (2).
16:28:05 [INFO] [exec] Call Stack (most recent call first):
16:28:05 [INFO] [exec] /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/rmm/rmm-dependencies.cmake:24 (find_package)
16:28:05 [INFO] [exec] /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/rmm/rmm-config.cmake:75 (include)
16:28:05 [INFO] [exec] /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/cudf/cudf-dependencies.cmake:30 (find_package)
16:28:05 [INFO] [exec] /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/cudf/cudf-config.cmake:91 (include)
16:28:05 [INFO] [exec] /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/cmake-build/_deps/rapids-cmake-src/rapids-cmake/find/package.cmake:114 (find_package)
16:28:05 [INFO] [exec] CMakeLists.txt:104 (rapids_find_package)
16:28:05 [INFO] [exec]
16:28:05 [INFO] [exec]
16:28:05 [INFO] [exec] CMake Error at /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/thrust/thrust-config-version.cmake:18 (math):
16:28:05 [INFO] [exec] math cannot parse the expression: "( / 100) % 1000": syntax error,
16:28:05 [INFO] [exec] unexpected exp_DIVIDE (3).
16:28:05 [INFO] [exec] Call Stack (most recent call first):
16:28:05 [INFO] [exec] /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/rmm/rmm-dependencies.cmake:24 (find_package)
16:28:05 [INFO] [exec] /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/rmm/rmm-config.cmake:75 (include)
16:28:05 [INFO] [exec] /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/cudf/cudf-dependencies.cmake:30 (find_package)
16:28:05 [INFO] [exec] /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/cudf/cudf-config.cmake:91 (include)
16:28:05 [INFO] [exec] /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/cmake-build/_deps/rapids-cmake-src/rapids-cmake/find/package.cmake:114 (find_package)
16:28:05 [INFO] [exec] CMakeLists.txt:104 (rapids_find_package)
16:28:05 [INFO] [exec]
16:28:05 [INFO] [exec]
16:28:05 [INFO] [exec] CMake Error at /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/thrust/thrust-config-version.cmake:19 (math):
16:28:05 [INFO] [exec] math cannot parse the expression: " % 100": syntax error, unexpected
16:28:05 [INFO] [exec] exp_MOD (2).
16:28:05 [INFO] [exec] Call Stack (most recent call first):
16:28:05 [INFO] [exec] /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/rmm/rmm-dependencies.cmake:24 (find_package)
16:28:05 [INFO] [exec] /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/rmm/rmm-config.cmake:75 (include)
16:28:05 [INFO] [exec] /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/cudf/cudf-dependencies.cmake:30 (find_package)
16:28:05 [INFO] [exec] /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/cudf/cudf-config.cmake:91 (include)
16:28:05 [INFO] [exec] /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/cmake-build/_deps/rapids-cmake-src/rapids-cmake/find/package.cmake:114 (find_package)
16:28:05 [INFO] [exec] CMakeLists.txt:104 (rapids_find_package)
16:28:05 [INFO] [exec]
16:28:05 [INFO] [exec]
16:28:05 [INFO] [exec] CMake Error at /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/thrust/thrust-config-version.cmake:10 (file):
16:28:05 [INFO] [exec] file failed to open for reading (No such file or directory):
16:28:05 [INFO] [exec]
16:28:05 [INFO] [exec] /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/src/main/cpp/_THRUST_VERSION_INCLUDE_DIR-NOTFOUND/thrust/version.h
16:28:05 [INFO] [exec] Call Stack (most recent call first):
16:28:05 [INFO] [exec] /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/rmm/rmm-dependencies.cmake:24 (find_package)
16:28:05 [INFO] [exec] /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/rmm/rmm-config.cmake:75 (include)
16:28:05 [INFO] [exec] /usr/local/cmake-3.22.3-linux-x86_64/share/cmake-3.22/Modules/CMakeFindDependencyMacro.cmake:47 (find_package)
16:28:05 [INFO] [exec] /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/cudf/cudf-dependencies.cmake:31 (find_dependency)
16:28:05 [INFO] [exec] -- Found rmm: /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/rmm/rmm-config.cmake (found version "22.10.0")
16:28:05 [INFO] [exec] /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/cudf/cudf-config.cmake:91 (include)
16:28:05 [INFO] [exec] /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/cmake-build/_deps/rapids-cmake-src/rapids-cmake/find/package.cmake:114 (find_package)
16:28:05 [INFO] [exec] CMakeLists.txt:104 (rapids_find_package)
16:28:05 [INFO] [exec]
16:28:05 [INFO] [exec]
16:28:05 [INFO] [exec]
16:28:05 [INFO] [exec] CMake Error at /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/thrust/thrust-config-version.cmake:17 (math):
16:28:05 [INFO] [exec] math cannot parse the expression: " / 100000": syntax error, unexpected
16:28:05 [INFO] [exec] exp_DIVIDE (2).
16:28:05 [INFO] [exec] Call Stack (most recent call first):
16:28:05 [INFO] [exec] /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/rmm/rmm-dependencies.cmake:24 (find_package)
16:28:05 [INFO] [exec] /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/rmm/rmm-config.cmake:75 (include)
16:28:05 [INFO] [exec] /usr/local/cmake-3.22.3-linux-x86_64/share/cmake-3.22/Modules/CMakeFindDependencyMacro.cmake:47 (find_package)
16:28:05 [INFO] [exec] /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/cudf/cudf-dependencies.cmake:31 (find_dependency)
16:28:05 [INFO] [exec] /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/cudf/cudf-config.cmake:91 (include)
16:28:05 [INFO] [exec] /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/cmake-build/_deps/rapids-cmake-src/rapids-cmake/find/package.cmake:114 (find_package)
16:28:05 [INFO] [exec] CMakeLists.txt:104 (rapids_find_package)
16:28:05 [INFO] [exec]
16:28:05 [INFO] [exec]
16:28:05 [INFO] [exec] CMake Error at /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/thrust/thrust-config-version.cmake:18 (math):
16:28:05 [INFO] [exec] math cannot parse the expression: "( / 100) % 1000": syntax error,
16:28:05 [INFO] [exec] unexpected exp_DIVIDE (3).
16:28:05 [INFO] [exec] Call Stack (most recent call first):
16:28:05 [INFO] [exec] /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/rmm/rmm-dependencies.cmake:24 (find_package)
16:28:05 [INFO] [exec] /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/rmm/rmm-config.cmake:75 (include)
16:28:05 [INFO] [exec] /usr/local/cmake-3.22.3-linux-x86_64/share/cmake-3.22/Modules/CMakeFindDependencyMacro.cmake:47 (find_package)
16:28:05 [INFO] [exec] /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/cudf/cudf-dependencies.cmake:31 (find_dependency)
16:28:05 [INFO] [exec] /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/cudf/cudf-config.cmake:91 (include)
16:28:05 [INFO] [exec] /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/cmake-build/_deps/rapids-cmake-src/rapids-cmake/find/package.cmake:114 (find_package)
16:28:05 [INFO] [exec] CMakeLists.txt:104 (rapids_find_package)
16:28:05 [INFO] [exec]
16:28:05 [INFO] [exec]
16:28:05 [INFO] [exec] CMake Error at /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/thrust/thrust-config-version.cmake:19 (math):
16:28:05 [INFO] [exec] math cannot parse the expression: " % 100": syntax error, unexpected
16:28:05 [INFO] [exec] exp_MOD (2).
16:28:05 [INFO] [exec] Call Stack (most recent call first):
16:28:05 [INFO] [exec] /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/rmm/rmm-dependencies.cmake:24 (find_package)
16:28:05 [INFO] [exec] /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/rmm/rmm-config.cmake:75 (include)
16:28:05 [INFO] [exec] /usr/local/cmake-3.22.3-linux-x86_64/share/cmake-3.22/Modules/CMakeFindDependencyMacro.cmake:47 (find_package)
16:28:05 [INFO] [exec] /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/cudf/cudf-dependencies.cmake:31 (find_dependency)
16:28:05 [INFO] [exec] /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/cudf/cudf-config.cmake:91 (include)
16:28:05 [INFO] [exec] /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/cmake-build/_deps/rapids-cmake-src/rapids-cmake/find/package.cmake:114 (find_package)
16:28:05 [INFO] [exec] CMakeLists.txt:104 (rapids_find_package)
16:28:05 [INFO] [exec]
16:28:05 [INFO] [exec]
16:28:06 [INFO] [exec] -- Check if compiler accepts -pthread
16:28:06 [INFO] [exec] -- Check if compiler accepts -pthread - yes
16:28:06 [INFO] [exec] -- Found libcudacxx: /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/libcudacxx/libcudacxx-config.cmake (found version "1.7.0")
16:28:06 [INFO] [exec] -- Found cuco: /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/cuco/cuco-config.cmake (found version "0.0.1")
16:28:06 [INFO] [exec] -- Found cuFile: /usr/local/cuda/lib64/libcufile.so
16:28:06 [INFO] [exec] -- Found KvikIO: /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/kvikio/kvikio-config.cmake (found version "22.10.0")
16:28:06 [INFO] [exec] -- Found ZLIB: /usr/lib64/libz.so (found version "1.2.7")
16:28:06 [INFO] [exec] -- Found cudf: /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/cudf/cudf-config.cmake (found version "22.10.0")
16:28:06 [INFO] [exec] -- Found GTest: /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/GTest/GTestConfig.cmake (found version "1.10.0")
16:28:06 [INFO] [exec] -- Found Boost: /usr/local/lib/cmake/Boost-1.79.0/BoostConfig.cmake (found version "1.79.0")
16:28:06 [INFO] [exec] -- Configuring incomplete, errors occurred!
16:28:06 [INFO] [exec] See also "/home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/cmake-build/CMakeFiles/CMakeOutput.log".
16:28:06 [INFO] [exec] See also "/home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/cmake-build/CMakeFiles/CMakeError.log".
16:28:06 [INFO] ------------------------------------------------------------------------
16:28:06 [INFO] BUILD FAILURE
16:28:06 [INFO] ------------------------------------------------------------------------
16:28:06 [INFO] Total time: 1:26:13.167s
16:28:06 [INFO] Finished at: Sat Aug 13 08:28:06 UTC 2022
16:28:07 [INFO] Final Memory: 19M/654M
16:28:07 [INFO] ------------------------------------------------------------------------
16:28:07 [ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:3.0.0:run (build-sparkrapidsjni) on project spark-rapids-jni: An Ant BuildException has occured: exec returned: 1
16:28:07 [ERROR] around Ant part ...<exec failonerror="true" dir="/home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/cmake-build" executable="cmake">... @ 5:152 in /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/antrun/build-main.xml
16:28:07 [ERROR] -> [Help 1]
16:28:07 [ERROR]
16:28:07 [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
16:28:07 [ERROR] Re-run Maven using the -X switch to enable full debug logging.
16:28:07 [ERROR]
16:28:07 [ERROR] For more information about the errors and possible solutions, please read the following articles:
16:28:07 [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
One question that came up during review: how does someone who is used to working in cudf apply and test custom cudf changes against the RAPIDS Accelerator, now that it no longer uses cudf directly but instead uses the spark-rapids-jni artifact? There should be documentation (and ideally scripts, if the steps are complicated or hard to remember) on how to port changes against the cudf repo, whether changes in a local cudf repository or a pending cudf PR, into this repo.
Describe the bug
In the native Parquet footer code we try to convert UTF-8 characters to lower case. The only way I found to do this was to convert the data to wide characters, do the lower-case conversion, and then convert back to multi-byte (UTF-8).
Recently when looking at the code and docs again I found the following...
https://en.cppreference.com/w/cpp/string/multibyte/mbstowcs
In most implementations, this function updates a global static object of type std::mbstate_t as it processes through the string, and cannot be called simultaneously by two threads, std::mbsrtowcs should be used in such cases.
We are using the non-reentrant versions to do the conversions. This is only for a small amount of metadata, but we should switch to the r versions of these functions (std::mbsrtowcs / std::wcsrtombs) to be safe.
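As a sketch of the fix, the same round-trip conversion can be done with the re-entrant functions by giving each call its own std::mbstate_t, so no global conversion state is shared between threads. The helper name is hypothetical, and real footer code would also need the UTF-8 locale handling it already performs:

```cpp
#include <cstddef>
#include <cwchar>
#include <cwctype>
#include <string>
#include <vector>

// Hypothetical helper: lower-case a multi-byte string using the re-entrant
// conversion functions. Each call owns its own std::mbstate_t, so no global
// conversion state is shared between threads.
std::string to_lower_mb(std::string const& input) {
  std::mbstate_t to_wide{};  // per-call conversion state, not a global
  char const* src = input.c_str();
  std::vector<wchar_t> wide(input.size() + 1);
  std::size_t n = std::mbsrtowcs(wide.data(), &src, wide.size(), &to_wide);
  if (n == static_cast<std::size_t>(-1)) { return input; }  // invalid sequence: give up
  for (std::size_t i = 0; i < n; ++i) { wide[i] = std::towlower(wide[i]); }
  wide[n] = L'\0';
  std::mbstate_t to_narrow{};  // separate state for the reverse direction
  wchar_t const* wsrc = wide.data();
  std::vector<char> narrow(4 * n + 4);
  std::size_t m = std::wcsrtombs(narrow.data(), &wsrc, narrow.size(), &to_narrow);
  if (m == static_cast<std::size_t>(-1)) { return input; }
  return std::string(narrow.data(), m);
}
```

Because the state object is local, two threads parsing footers concurrently cannot corrupt each other's conversions, which is exactly the hazard cppreference warns about for the non-r variants.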
Another big area of footer processing is predicate push-down. It would be great if we could push down the predicates and filter out row groups that do not match before sending the data back to Java. We could also drop all of the column chunk statistics after this, because they are not going to be needed, and it would save both the time and memory of serializing and de-serializing them again.
To be clear, this does not include work for bloom filters or the dictionary predicate checks. The dictionary checks are something that we will keep in Java. Bloom filters are something that we need to investigate more.
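The core idea can be sketched on the CPU with integer column statistics (types and names here are illustrative; the real implementation would work on the Thrift-encoded footer metadata):

```cpp
#include <vector>

// Per-column-chunk statistics for one row group (illustrative: real code
// would read these from the Thrift-encoded Parquet footer)
struct RowGroupStats {
  int min;
  int max;
};

// Keep a row group only if an equality predicate's value can fall within the
// row group's [min, max] statistics; every other row group can be skipped
// before anything is sent back to Java.
std::vector<int> matching_row_groups(std::vector<RowGroupStats> const& stats, int pred) {
  std::vector<int> keep;
  for (int i = 0; i < static_cast<int>(stats.size()); ++i) {
    if (pred >= stats[i].min && pred <= stats[i].max) { keep.push_back(i); }
  }
  return keep;
}
```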
Set up an automated release workflow for the JNI build.
Is your feature request related to a problem? Please describe.
In several discussions with the cuDF team we have come to the conclusion that the CSV parser is not likely to get much attention or many fixes any time soon unless we do those fixes ourselves. We have goals to support the Hive text format in the next release (23.02), but given the complexity of the cuDF parser, I think it is going to be simpler for us to write a custom parser ourselves in the short term and target it directly at the Hive text file format, specifically the default settings for the HiveTextFile format. We can discuss other settings that might be common with the HiveTextFile format.
Describe the solution you'd like
I would like an API that takes a String column as input (we have already split each of the rows) and a list of columns to keep. It would then return a table of string columns that we would parse further into smaller parts. The main goal would be to split on the record delimiter and to handle quotes and escapes correctly.
Describe alternatives you've considered
We fix all of the bugs and new features in CUDF that are needed to do this.
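As an illustration of the parsing rules involved, here is a minimal CPU sketch of splitting one record into fields on Hive's default field delimiter ('\001', i.e. Ctrl-A) with backslash escaping. The function name and defaults are assumptions for illustration, not the proposed API, and the real kernel would of course run on the GPU over a full string column:

```cpp
#include <string>
#include <vector>

// Split one Hive text record into fields on the default field delimiter
// ('\001', i.e. Ctrl-A), honoring backslash escapes so an escaped delimiter
// stays inside its field. Names and defaults are illustrative.
std::vector<std::string> split_fields(std::string const& record,
                                      char delim = '\001',
                                      char esc   = '\\') {
  std::vector<std::string> fields;
  std::string current;
  bool escaped = false;
  for (char c : record) {
    if (escaped) {            // previous char was the escape: take c literally
      current.push_back(c);
      escaped = false;
    } else if (c == esc) {
      escaped = true;
    } else if (c == delim) {  // unescaped delimiter ends the current field
      fields.push_back(current);
      current.clear();
    } else {
      current.push_back(c);
    }
  }
  fields.push_back(current);  // final field (possibly empty)
  return fields;
}
```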
Currently, the Java bindings and JNI code in the cudf repository have expanded significantly. Over years of development, the Java source files and JNI cpp files have grown to thousands of lines of code each. Almost all of the functionality for column/table/etc. is put into the same file. With the increasing LOC, things have become more and more unorganized. Nowadays, it is very difficult to check the coverage of the JNI/Java bindings for a given category (such as string functions), because doing so requires scanning through several thousand LOC in several files.
With a fresh repository, I believe that we can do much better by implementing things from scratch. I suggest that we organize the new Java binding functions and JNI cpp functions by category, similar to what cudf is doing. For example: string functions would be put in one source file, struct functions in another, list functions in yet another, and so on. The way we organize functions can closely follow cudf, so we can easily trace back the binding coverage for cudf.
The solution for such an organization is very simple: we keep only the core handle-management functionality (such as getNativeHandle) in the old classes. We then create additional Java classes like CudfStrings, CudfStructs, etc. which reflect the corresponding cudf namespaces. The cudf functions (like strings::split) will be bound as static member functions of these new classes (so we will have CudfStrings::split), which will operate on free input columns passed as function arguments.
The solution above introduces breaking changes, but it is very simple to implement, and I believe it can be a significant improvement to the codebase organization. For long-term development, we should make such breaking changes ASAP so that the cost of building things from scratch is minimal.
Is your feature request related to a problem? Please describe.
Building benchmarks in spark-rapids-jni requires building the cudf benchmarks in order to get the libcudf_datagen library. The cudf team isn't interested in making this library public or decoupling it from the cudf benchmarks, so a copy of this code should be brought over to spark-rapids-jni. The data generation code doesn't change often, and copying it seems preferable to building all of the cudf benchmarks, which currently takes a non-trivial amount of time that will surely get longer over time.
This was discussed in #331, and an attempt was made at that time to bring these files into spark-rapids-jni, but compilation issues arose that put it outside the scope of that work.
Is your feature request related to a problem? Please describe.
I wish the Fault Injector allowed specifying the seed for the random number generator.
It should also log the default time(0) value. If an incorrect failure-handling scenario is discovered during an automated run, the developer will then be able to reproduce the sequence of interceptions, provided the CUDA app is otherwise deterministic.
Describe the solution you'd like
A top-level config entry "seed".
Describe alternatives you've considered
Hope that the failure is not so rare that it depends on the exact sequence of faults.
Additional context
N/A
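Assuming the tool keeps its existing JSON config format, the proposed knob could look like the following minimal sketch (the key name and value are hypothetical; when the key is omitted, the tool would fall back to time(0) and log the value it used):

```json
{
  "seed": 1234567890
}
```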
It was brought up in the reviews for 10871 that some of the kernels running for the row-to-column conversion could be run concurrently. This could provide a performance boost and should be investigated. The row-to-column string kernel uses information computed by the fixed-width copy kernel, but the others have no dependencies.
This requests something similar to what was implemented in rapidsai/cudf#9873 for cudf. Ideally spark-rapids-jni should provide a single shared library that statically links the CUDA runtime.
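A hedged sketch of the CMake settings that typically accomplish this (the target name is illustrative; assumes CMake >= 3.17, where CMAKE_CUDA_RUNTIME_LIBRARY and the CUDAToolkit package are available):

```cmake
# Ask CMake to link CUDA translation units against the static CUDA runtime
set(CMAKE_CUDA_RUNTIME_LIBRARY Static)

find_package(CUDAToolkit REQUIRED)
# Statically link cudart into the single shared library so consumers do not
# need a matching libcudart.so at run time (target name is illustrative)
target_link_libraries(spark_rapids_jni PRIVATE CUDA::cudart_static)
```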
Is your feature request related to a problem? Please describe.
There should be a way to determine if a key is not present in a map column.
Describe the solution you'd like
Create a method similar to contains(scalar) that takes in a column of keys. This is necessary when we want to handle the case where a key isn't present and want to show the user an error.
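A CPU sketch of the requested semantics (the names are illustrative, not the actual cudf/JNI API): for each lookup key, report whether it appears among the map column's keys, so callers can raise an error for the missing ones:

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// CPU sketch of the requested semantics (names are illustrative, not the
// actual cudf/JNI API): for each lookup key, report whether it appears among
// the map column's keys, so callers can raise an error for missing keys.
std::vector<bool> keys_present(std::vector<std::string> const& map_keys,
                               std::vector<std::string> const& lookup_keys) {
  std::unordered_set<std::string> present(map_keys.begin(), map_keys.end());
  std::vector<bool> out;
  out.reserve(lookup_keys.size());
  for (auto const& key : lookup_keys) { out.push_back(present.count(key) > 0); }
  return out;
}
```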