Coder Social home page Coder Social logo

rapids-shell's Introduction

rapids-shell

A utility to start RAPIDS-enabled Spark Shell with access to unit tests resources from https://github.com/NVIDIA/spark-rapids Before running the examples make sure to at least execute mvn package in your local spark-rapids repo if you are not using binaries.

Comand line options

See rapids.sh --help for up to date information

Usage: rapids.sh [OPTION]
Options:
  --debug
    enable bash tracing
  -h, --help
    prints this message
  -l4j=LOG4J_CONF_FILE, --log4j-file=LOG4J_CONF_FILE
    LOG4J_CONF_FILE location of a custom log4j config for local mode
  -nsys, --nsys-profile
    run with Nsights profile
  -m=MASTER, --master=MASTER
    specify MASTER for spark command, default is local[-cluster], see --num-local-execs
  -n, --dry-run
    generates and prints the spark submit command without executing
  -nle=N, --num-local-execs=N
    specify the number of local executors to use, default is 2. If > 1 use pseudo-distributed
    local-cluster, otherwise local[*]
  -uecp, --use-extra-classpath
    use extraClassPath instead of --jars to add RAPIDS jars to spark-submit (default)
  -uj, --use-jars
    use --jars instead of extraClassPath to add RAPIDS jars to spark-submit
  --ucx-shim=spark<3xy>
    Spark buildver to populate shim-dependent package name of RapidsShuffleManager.
    Will be replaced by a Boolean option
  -cmd=CMD, --spark-command=CMD
    specify one of spark-submit (default), spark-shell, pyspark, jupyter, jupyter-lab
  -dopts=EOPTS, --driver-opts=EOPTS
    pass EOPTS as --driver-java-options
  -eopts=EOPTS, --executor-opts=EOPTS
    pass EOPTS as spark.executor.extraJavaOptions
  --gpu-fraction=GPU_FRACTION
    GPU share per executor JVM unless local or local-cluster mode, see spark.rapids.memory.gpu.allocFraction

Environment variables

  • SPARK_RAPIDS_HOME - the path either to the local repo or to the location used for downloading the binaries

  • SPARK_HOME - the path either to the local Spark repo or to the root fo binary distro

  • SPARK_CMD - one of spark-shell, spark-submit (default), pyspark, jupyter, jupyter-lab

Examples

Use Spark RAPIDS in Jupyter notebook

SPARK_HOME=~/spark-3.1.1-bin-hadoop3.2 SPARK_CMD=jupyter[-lab] rapids.sh

Run in pseudo-distirbuted local-cluster mode

NUM_LOCAL_EXECS=2 SPARK_HOME=~/spark-3.1.1-bin-hadoop3.2 rapids.sh

Allow attaching a java debugger to the driver JVM

JDBSTR=-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005 SPARK_HOME=~/spark-3.1.1-bin-hadoop3.2 rapids.sh

Running Spark RAPIDS ScalaTests in spark-shell once started

Single test suite

scala> run(new com.nvidia.spark.rapids.InsertPartition311Suite)
InsertPartition311Suite:
...

Single test case

scala> run(new com.nvidia.spark.rapids.HashAggregatesSuite, "sum(floats) group by more_floats 2 partitions")
HashAggregatesSuite:
...

Using integration test datagens

In pyspark based drivers one can use data generators from spark-rapids/integration-tests or run whole pytests.

Add rapids.py as an ipython startup file, e.g. on *NIX

cp src/python/rapids.py ~/.ipython/profile_default/startup/

Datagen

key_data_gen = StructGen([
        ('a', IntegerGen(min_val=0, max_val=4)),
        ('b', IntegerGen(min_val=5, max_val=9)),
    ], nullable=False)
val_data_gen = IntegerGen()
df = two_col_df(spark, key_data_gen, val_data_gen)

...

Pytest

runpytest('test_struct_count_distinct')

rapids-shell's People

Contributors

gerashegalov avatar mythrocks avatar

Stargazers

Allen Xu avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.