Coder Social home page Coder Social logo

awesome-spark's Introduction

Awesome Spark Awesome

A curated list of awesome Apache Spark packages and resources.

Apache Spark is an open-source cluster-computing framework. Originally developed at the University of California, Berkeley's AMPLab, the Spark codebase was later donated to the Apache Software Foundation, which has maintained it since. Spark provides an interface for programming entire clusters with implicit data parallelism and fault-tolerance (Wikipedia 2017).

Users of Apache Spark may choose between different the Python, R, Scala and Java programming languages to interface with the Apache Spark APIs.

Contents

Packages

Language Bindings

Notebooks and IDEs

  • Apache Zeppelin - Web-based notebook that enables interactive data analytics with plugable backends, integrated plotting, and extensive Spark support out-of-the-box.
  • Spark Notebook - Scalable and stable Scala and Spark focused notebook bridging the gap between JVM and Data Scientists (incl. extendable, typesafe and reactive charts).
  • sparkmagic - Jupyter magics and kernels for working with remote Spark clusters, for interactively working with remote Spark clusters through Livy, in Jupyter notebooks.

General Purpose Libraries

  • Succinct - Support for efficient queries on compressed data.

SQL Data Sources

Bioinformatics

  • ADAM - Set of tools designed to analyse genomics data.
  • Hail - Genetic analysis framework.

GIS

  • Magellan - Geospatial analytics using Spark.
  • GeoSpark - Cluster computing system for processing large-scale spatial data.

Time Series Analytics

  • Spark-Timeseries - Scala / Java / Python library for interacting with time series data on Apache Spark.
  • flint - A time series library for Apache Spark.

Graph Processing

  • Mazerunner - Graph analytics platform on top of Neo4j and GraphX.
  • GraphFrames - Data frame based graph API.
  • neo4j-spark-connector - Bolt protocol based, Neo4j Connector with RDD, DataFrame and GraphX / GraphFrames support.
  • SparklingGraph - Library extending GraphX features with multiple functionalities useful in graph analytics (measures, generators, link prediction etc.).

Machine Learning Extension

Middleware

  • Livy - REST server with extensive language support (Python, R, Scala), ability to maintain interactive sessions and object sharing.
  • spark-jobserver - Simple Spark as a Service which supports objects sharing using so called named objects. JVM only.
  • Mist - Service for exposing Spark analytical jobs and machine learning models as realtime, batch or reactive web services.
  • Apache Toree - IPython protocol based middleware for interactive applications.

Utilities

  • silex - Collection of tools varying from ML extensions to additional RDD methods.
  • sparkly - Helpers & syntactic sugar for PySpark.
  • pyspark-stubs - Static type annotations for PySpark.

Natural Language Processing

Streaming

  • Apache Bahir - Collection of the streaming connectors excluded from Spark 2.0 (Akka, MQTT, Twitter. ZeroMQ).

Interfaces

  • Apache Beam - Unified data processing engine supporting both batch and streaming applications. Apache Spark is one of the supported execution environments.
  • Blaze - Interface for querying larger than memory datasets using Pandas-like syntax. It supports both Spark DataFrames and RDDs.

Testing

Workflow Management

Resources

Books

Papers

MOOCS

Workshops

Projects Using Spark

  • Oryx 2 - Lambda architecture platform built on Apache Spark and Apache Kafka with specialization for real-time large scale machine learning.
  • Photon ML - A machine learning library supporting classical Generalized Mixed Model and Generalized Additive Mixed Effect Model.
  • PredictionIO - Machine Learning server for developers and data scientists to build and deploy predictive applications in a fraction of the time.
  • Crossdata - Data integration platform with extended DataSource API and multi-user environment.

Blogs

  • Spark Technology Center - Great source of highly diverse posts related to Spark ecosystem. From practical advices to Spark commiter profiles.

Docker Images

Miscellaneous

References

Wikipedia. 2017. “Apache Spark — Wikipedia, the Free Encyclopedia.” https://en.wikipedia.org/w/index.php?title=Apache_Spark&oldid=781182753.

License

Public Domain Mark
This work (Awesome Spark, by https://github.com/awesome-spark/awesome-spark), identified by Maciej Szymkiewicz, is free of known copyright restrictions.

Apache Spark, Spark, Apache, and the Spark logo are trademarks of The Apache Software Foundation. This compilation is not endorsed by The Apache Software Foundation.

Inspired by sindresorhus/awesome.

awesome-spark's People

Contributors

andypetrella avatar cbaenziger avatar cycorey avatar eliasah avatar jimmyho avatar mbonaci avatar mrpowers avatar mrsrinivas avatar oluies avatar riomus avatar spushkarev avatar veelenga avatar zero323 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.