rapids-shell

A utility to start a RAPIDS-enabled Spark shell with access to unit test resources from https://github.com/NVIDIA/spark-rapids. Before running the examples, make sure to at least execute mvn package in your local spark-rapids repo if you are not using binaries.

Command line options

See rapids.sh --help for up-to-date information.

Usage: rapids.sh [OPTION]
Options:
  --debug
    enable bash tracing
  -h, --help
    prints this message
  -l4j=LOG4J_CONF_FILE, --log4j-file=LOG4J_CONF_FILE
    LOG4J_CONF_FILE location of a custom log4j config for local mode
  -nsys, --nsys-profile
    run with Nsights profile
  -m=MASTER, --master=MASTER
    specify MASTER for spark command, default is local[-cluster], see --num-local-execs
  -n, --dry-run
    generates and prints the spark submit command without executing
  -nle=N, --num-local-execs=N
    specify the number of local executors to use, default is 2. If > 1 use pseudo-distributed
    local-cluster, otherwise local[*]
  -uecp, --use-extra-classpath
    use extraClassPath instead of --jars to add RAPIDS jars to spark-submit (default)
  -uj, --use-jars
    use --jars instead of extraClassPath to add RAPIDS jars to spark-submit
  --ucx-shim=spark<3xy>
    Spark buildver to populate shim-dependent package name of RapidsShuffleManager.
    Will be replaced by a Boolean option
  -cmd=CMD, --spark-command=CMD
    specify one of spark-submit (default), spark-shell, pyspark, jupyter, jupyter-lab
  -dopts=EOPTS, --driver-opts=EOPTS
    pass EOPTS as --driver-java-options
  -eopts=EOPTS, --executor-opts=EOPTS
    pass EOPTS as spark.executor.extraJavaOptions
  --gpu-fraction=GPU_FRACTION
    GPU share per executor JVM unless local or local-cluster mode, see spark.rapids.memory.gpu.allocFraction
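The rule described under --num-local-execs can be sketched as a small shell function. This is only an illustration of the documented behavior, not the code used by rapids.sh: the function name master_for_execs is hypothetical, and the cores-per-executor and memory values in the local-cluster URL are assumed defaults chosen for the example.

```shell
# Hypothetical helper mirroring the documented --num-local-execs rule:
# more than one executor -> pseudo-distributed local-cluster, otherwise local[*].
# $1 = number of local executors (the documented default is 2)
# $2 = cores per executor (assumed value, not taken from rapids.sh)
# $3 = memory per executor in MB (assumed value, not taken from rapids.sh)
master_for_execs() {
    n="${1:-2}"
    cores="${2:-1}"
    mem_mb="${3:-1024}"
    if [ "$n" -gt 1 ]; then
        printf 'local-cluster[%s,%s,%s]\n' "$n" "$cores" "$mem_mb"
    else
        printf 'local[*]\n'
    fi
}

master_for_execs 1
master_for_execs 2
```

To see what rapids.sh itself would run for a given executor count, combine --dry-run with --num-local-execs=N and inspect the printed command.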

Environment variables

SPARK_RAPIDS_HOME - the path either to the local repo or to the location used for downloading the binaries
SPARK_HOME - the path either to the local Spark repo or to the root of the binary distro
SPARK_CMD - one of spark-shell, spark-submit (default), pyspark, jupyter, jupyter-lab
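A wrapper script might want to validate SPARK_CMD before calling rapids.sh. The sketch below checks a value against the commands listed above and falls back to spark-submit when the value is empty. The function name valid_spark_cmd is hypothetical, and rapids.sh may perform its own validation differently.

```shell
# Hypothetical pre-flight check, not part of rapids.sh.
# Succeeds if $1 (or the default spark-submit when $1 is empty) is one of
# the commands listed for SPARK_CMD.
valid_spark_cmd() {
    case "${1:-spark-submit}" in
        spark-shell|spark-submit|pyspark|jupyter|jupyter-lab) return 0 ;;
        *) return 1 ;;
    esac
}

if valid_spark_cmd "${SPARK_CMD:-}"; then
    echo "SPARK_CMD is supported: ${SPARK_CMD:-spark-submit}"
else
    echo "unsupported SPARK_CMD: ${SPARK_CMD}" >&2
fi
```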

Examples

Use Spark RAPIDS in Jupyter notebook

SPARK_HOME=~/spark-3.1.1-bin-hadoop3.2 SPARK_CMD=jupyter[-lab] rapids.sh

Run in pseudo-distributed local-cluster mode

NUM_LOCAL_EXECS=2 SPARK_HOME=~/spark-3.1.1-bin-hadoop3.2 rapids.sh

Allow attaching a Java debugger to the driver JVM

JDBSTR=-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005 SPARK_HOME=~/spark-3.1.1-bin-hadoop3.2 rapids.sh
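If the debug port or the suspend behavior needs to vary, the JDWP agent string can be assembled by a small helper instead of being typed out each time. This is a sketch only: jdwp_opts is a hypothetical name, and the defaults simply reproduce the values from the example above.

```shell
# Hypothetical helper that builds a JDWP agent string.
# $1 = port to listen on (defaults to 5005, as in the example above)
# $2 = y to make the JVM wait for a debugger before starting, n otherwise
jdwp_opts() {
    port="${1:-5005}"
    suspend="${2:-n}"
    printf '%s\n' "-agentlib:jdwp=transport=dt_socket,server=y,suspend=${suspend},address=${port}"
}

# Print the string that would be assigned to JDBSTR.
jdwp_opts 5005
```

Once the driver is running, a debugger can attach to the chosen port, for example with jdb -attach localhost:5005.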

Running Spark RAPIDS ScalaTests in `spark-shell` once started

Single test suite

scala> run(new com.nvidia.spark.rapids.InsertPartition311Suite)
InsertPartition311Suite:
...

Single test case

scala> run(new com.nvidia.spark.rapids.HashAggregatesSuite, "sum(floats) group by more_floats 2 partitions")
HashAggregatesSuite:
...

Using integration test datagens

In PySpark-based drivers, one can use data generators from spark-rapids/integration-tests or run whole pytests.

Add rapids.py as an IPython startup file, e.g. on *NIX:

cp src/python/rapids.py ~/.ipython/profile_default/startup/

Datagen

key_data_gen = StructGen([
        ('a', IntegerGen(min_val=0, max_val=4)),
        ('b', IntegerGen(min_val=5, max_val=9)),
    ], nullable=False)
val_data_gen = IntegerGen()
df = two_col_df(spark, key_data_gen, val_data_gen)

...

Pytest

runpytest('test_struct_count_distinct')

Repository: wjxiz1992/rapids-shell