lightweight-spark-distrib

lightweight-spark-distrib is a small application that makes Spark distributions more lightweight. Given an existing Spark distribution, it looks for the JARs the distribution contains and tries to find them on Maven Central. It then copies every file except the JARs it found on Maven Central to a new directory, and writes alongside them a script that relies on coursier to fetch the missing JARs.
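The copy step described above can be illustrated with a rough shell sketch. This is not the project's actual implementation (the tool itself is a Scala application), and the function name `make_lightweight` and the "found list" file are hypothetical. The sketch assumes the list of JARs available on Maven Central has already been computed and stored as one file name per line; how the real tool performs that lookup is not shown here.

```shell
# Sketch only: copy a distribution to a new directory, leaving out the JARs
# that are listed as available on Maven Central.
#
# make_lightweight SRC DEST FOUND_LIST
#   SRC         existing Spark distribution directory
#   DEST        directory to create for the lightweight copy
#   FOUND_LIST  hypothetical text file, one JAR file name per line
make_lightweight() {
  local src="$1" dest="$2" found="$3"
  mkdir -p "$dest"
  # Walk every regular file, using paths relative to SRC.
  ( cd "$src" && find . -type f -print ) | while IFS= read -r rel; do
    name="$(basename "$rel")"
    case "$name" in
      *.jar)
        # Skip JARs whose exact file name appears in the found list.
        if grep -qxF "$name" "$found"; then
          continue
        fi
        ;;
    esac
    # Everything else (scripts, configs, JARs not found) is copied as-is.
    mkdir -p "$dest/$(dirname "$rel")"
    cp "$src/$rel" "$dest/$rel"
  done
}
```

For example, `make_lightweight spark-3.0.3-bin-hadoop2.7 spark-lightweight found-jars.txt` would produce a copy that keeps only the files a fetch script could not restore later.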

The resulting Spark distributions are much more lightweight (~25 MB uncompressed / ~16 MB compressed) than their original counterparts (which usually weigh more than 200 MB). As a consequence, they are easier to distribute and benefit more readily from mechanisms such as CI caches.

Generating a lightweight archive

$ scala-cli run \
    --workspace . \
    src \
    -- \
      --dest spark-3.0.3-bin-hadoop2.7-lightweight.tgz \
      https://archive.apache.org/dist/spark/spark-3.0.3/spark-3.0.3-bin-hadoop2.7.tgz \
      --spark 3.0.3 \
      --scala 2.12.10 \
      --archive

Using a lightweight archive

Run the fetch-jars.sh script right before use. This script downloads missing JARs using coursier. It downloads coursier on its own if needed.

$ curl -fLo spark-distrib.tar.gz https://github.com/scala-cli/lightweight-spark-distrib/releases/download/v0.0.4/spark-2.4.2-bin-hadoop2.7-scala2.12.tgz
$ tar -zxf spark-distrib.tar.gz
$ cd spark-2.4.2-bin-hadoop2.7
$ ./fetch-jars.sh
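
In a CI job or a launcher script, it can be convenient to avoid re-running the fetch step when it has already succeeded. The following is a minimal sketch of one way to do that. The `fetch_if_needed` function and the `.jars-fetched` marker file are hypothetical and not part of the project, and the sketch assumes that `fetch-jars.sh` exits with a non-zero status on failure. The script may well handle repeated runs efficiently on its own, in which case this wrapper is unnecessary.

```shell
# Sketch only: run fetch-jars.sh once per distribution directory.
#
# fetch_if_needed DIST_DIR
#   DIST_DIR  extracted lightweight distribution containing fetch-jars.sh
fetch_if_needed() {
  local dist="$1"
  # Hypothetical marker file recording a previous successful fetch.
  local marker="$dist/.jars-fetched"
  if [ -e "$marker" ]; then
    echo "JARs already fetched in $dist"
    return 0
  fi
  # Run the script from inside the distribution, as in the commands above,
  # and only create the marker if it succeeded.
  ( cd "$dist" && ./fetch-jars.sh ) && touch "$marker"
}
```

Calling `fetch_if_needed spark-2.4.2-bin-hadoop2.7` twice would then invoke `fetch-jars.sh` only the first time.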
