
Setting Jupyter Notebook for pyspark programming

23 June 2018

We often install local copies of Spark for experimentation and exploration, particularly when a new version of Spark is released. This short note describes how to set up a Jupyter notebook for programming in pyspark. I assume that you have already installed and configured Jupyter notebook; if not, have a quick look at Installing Jupyter.

Installing Spark

Download the version of Spark that you wish to work with from the Download Apache Spark page. I am downloading Spark version 2.3.1.

Spark 2.3.1 download page

We end up with a .tgz file; extract it into a suitable directory. I have saved the file into a directory called /path/to/installation/dir/.

tar -xzf spark-2.3.1-bin-hadoop2.7.tgz

I am a macOS/Linux user and normally add the following lines to my .bash_profile so that I can invoke spark-shell and pyspark from my terminal.

alias spark-shell='/path/to/installation/dir/spark-2.3.1-bin-hadoop2.7/bin/spark-shell'
alias pyspark='/path/to/installation/dir/spark-2.3.1-bin-hadoop2.7/bin/pyspark'

Using Environments

To avoid collisions between packages, we will set up a conda environment. If you are not familiar with conda environments, check out Managing Environment, a short tutorial on using Conda and managing conda environments.

Follow these steps:

Create an environment and give it a name of your choice. Here we call it spark_2.3.1_test.

conda create -n spark_2.3.1_test

Check to see if the environment is listed and then activate it.

conda env list
conda activate spark_2.3.1_test

Then install ipykernel.

conda install ipykernel
python -m ipykernel install --name spark_2.3.1_test

Replace spark_2.3.1_test with the name of your environment.
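
Under the hood, python -m ipykernel install registers a kernel spec with Jupyter: it writes a small kernel.json file telling Jupyter how to launch Python from the environment. The sketch below is illustrative of what such a spec contains, not the exact file ipykernel writes (real specs carry the absolute path of the environment's Python interpreter):

```python
import json

# Illustrative sketch of a Jupyter kernel spec like the one
# `python -m ipykernel install --name spark_2.3.1_test` registers.
# On a real machine, argv[0] is the environment's Python executable.
spec = {
    "argv": ["python", "-m", "ipykernel_launcher", "-f", "{connection_file}"],
    "display_name": "spark_2.3.1_test",
    "language": "python",
}
print(json.dumps(spec, indent=2))
```

You can run jupyter kernelspec list afterwards to confirm that spark_2.3.1_test appears among the registered kernels.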

We also need to install findspark.

conda install -c conda-forge findspark

I normally use Apache Arrow. If you also use Arrow, install it.

conda install -c conda-forge pyarrow
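
Arrow speeds up conversions between Spark DataFrames and pandas (for example, toPandas()). In Spark 2.3 this is gated behind a configuration flag; if I recall the key correctly for this version (it was renamed in later Spark releases), it can be enabled in conf/spark-defaults.conf or per session via spark.conf.set:

spark.sql.execution.arrow.enabled  true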

Now your environment is ready for use. Move to the folder in which you wish to code and type jupyter notebook. Create a new notebook using the created environment as follows.

Creating a new notebook using spark_2.3.1_test

After choosing the environment in your Jupyter notebook, import findspark and call init() to access the local Spark installation. The path is the address of the local version.

import findspark
findspark.init("/path/to/installation/dir/spark-2.3.1-bin-hadoop2.7")
import pyspark
sc = pyspark.SparkContext()

Note that if you add SPARK_HOME to your .bash_profile, you can simply write findspark.init() without passing the path. Add the following lines to your .bash_profile.

export SPARK_HOME=/path/to/installation/dir/spark-2.3.1-bin-hadoop2.7
export PATH=$SPARK_HOME/bin:$PATH
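
The reason findspark.init() can then be called without an argument is that it falls back to the SPARK_HOME environment variable. A minimal stdlib-only sketch of that lookup (the path below is the placeholder used throughout this note, not a real installation):

```python
import os

# Sketch of the SPARK_HOME fallback that findspark.init() relies on.
# The path is the placeholder directory used throughout this note.
os.environ.setdefault("SPARK_HOME",
                      "/path/to/installation/dir/spark-2.3.1-bin-hadoop2.7")

spark_home = os.environ["SPARK_HOME"]
# pyspark's Python sources live under $SPARK_HOME/python
pyspark_dir = os.path.join(spark_home, "python")
print(pyspark_dir)
```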

Running Spark

Have fun!
