zhouhang-chn / spark-the-definitive-guide

This project forked from utopiazh/spark-the-definitive-guide

Spark: The Definitive Guide's Code Repository

Home Page: http://shop.oreilly.com/product/0636920034957.do

License: Other

Python 33.36% Java 4.89% Scala 54.80% R 2.20% TSQL 4.76%

Spark: The Definitive Guide

This is the central repository for all materials related to Spark: The Definitive Guide by Bill Chambers and Matei Zaharia.

This repository is currently a work in progress and new material will be added over time.

Code from the book

You can find the code from the book in the code subfolder, organized by language and chapter.

How to run the code

Run on your local machine

To run the examples on your local machine, either copy all of the data in the data subfolder to /data on your computer or change each path to point to wherever that dataset lives on your machine.
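One way to avoid editing every path by hand is to build dataset paths from a single root directory. The Python sketch below is a hypothetical helper, not part of the book's code: the environment variable DEFINITIVE_GUIDE_DATA and the function data_path are assumptions introduced for illustration, and the root defaults to /data as described above.

```python
import os

# Hypothetical setting (not from the book): the directory holding the
# repository's data folder. Defaults to /data, where the examples expect it.
DATA_ROOT = os.environ.get("DEFINITIVE_GUIDE_DATA", "/data")


def data_path(relative_path: str, root: str = DATA_ROOT) -> str:
    """Join a dataset path, relative to the data folder, onto the chosen root."""
    # Strip any leading slash so os.path.join does not discard the root.
    return os.path.join(root, relative_path.lstrip("/"))


if __name__ == "__main__":
    # With an active SparkSession named `spark`, the resolved path could be
    # passed to spark.read.csv(..., header=True); here it is only printed.
    print(data_path("flight-data/csv/2015-summary.csv"))
```

Setting the root to /databricks-datasets/definitive-guide/data instead would resolve the same relative paths against the Databricks copy of the data.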

Run on Databricks

To run these modules on Databricks, you need to do two things:

  1. Sign up for an account. You can do that here.
  2. Import the individual notebooks you want to run on the platform.

Databricks is a zero-management cloud platform that provides:

  • Fully managed Spark clusters
  • An interactive workspace for exploration and visualization
  • A production pipeline scheduler
  • A platform for powering your favorite Spark-based applications

Instructions for importing

  1. Navigate to the notebook you would like to import

For instance, you might go to this page. From there, open the raw version of the file by clicking the Raw button and save it to your desktop. Alternatively, clone the entire repository to your computer and locate the file there.

  2. Upload the file to Databricks

Read the instructions here. Open the Databricks workspace, choose Import in the directory where you want the notebook, and select the file on your computer. Due to a recent security upgrade, notebooks cannot be imported from external URLs, so you must upload the file from your computer.

  3. You're almost ready to go!

Now you just need to run the notebooks. All the examples run on Databricks Runtime 3.1 and above, so be sure to create a cluster with Runtime 3.1 or later. Once you've created the cluster, attach the notebook to it.

  4. Replace the data path in each notebook

Rather than uploading all of the data yourself, change the path in each chapter from /data to /databricks-datasets/definitive-guide/data. Once you've done that, all examples should run without issue. Find and replace makes this change quick.
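If you would rather script the find and replace, a sketch along these lines could work. It assumes each notebook is a source file in which every dataset path is a quoted string beginning with /data/; the function names rewrite_paths and rewrite_file are hypothetical. It rewrites /data only where it starts a quoted string and is followed by a slash, so paths that were already rewritten, or names such as /database, are left alone.

```python
import re
import sys
from pathlib import Path

DATABRICKS_ROOT = "/databricks-datasets/definitive-guide/data"

# Match "/data" only right after an opening quote and only when a "/" follows.
# This keeps the rewrite idempotent and skips names like "/database".
_PATTERN = re.compile(r"""(?<=["'])/data(?=/)""")


def rewrite_paths(source: str) -> str:
    """Return the source text with local /data paths pointed at Databricks."""
    return _PATTERN.sub(DATABRICKS_ROOT, source)


def rewrite_file(path: Path) -> bool:
    """Rewrite one notebook source file in place; return True if it changed."""
    text = path.read_text(encoding="utf-8")
    new_text = rewrite_paths(text)
    if new_text != text:
        path.write_text(new_text, encoding="utf-8")
        return True
    return False


if __name__ == "__main__":
    # Usage (hypothetical): python rewrite_paths.py notebook1.py notebook2.scala
    for name in sys.argv[1:]:
        changed = rewrite_file(Path(name))
        print(f"{name}: {'updated' if changed else 'unchanged'}")
```

Matching on the opening quote is a deliberate trade-off: it avoids touching prose or comments that mention /data, but a path built by string concatenation would need to be updated by hand.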

Contributors

bllchmbrs, neeleshkumar-mannur, hajimurtaza, abouklila, adigiosaffatte, evohnave