Install Spark 1.4.1 with Hadoop 2.4.0, pandas and Jupyter
Requirements
- The user running the scripts must be in the sudoers list.
- Set up the hostname and FQDN (see the sketch after this list).
- Review and update the config files before running the installer.
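The installer expects the machine to resolve its own FQDN. A minimal sketch of how that could be set up, where the hostname spark-master and domain example.com are only illustrative:
$ sudo hostname spark-master
$ echo "127.0.1.1 spark-master.example.com spark-master" | sudo tee -a /etc/hosts
$ hostname --fqdn    # should print spark-master.example.com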
Usage
Clone this repo:
$ git clone git://github.com/ezhaar/spark-installer
Run the install script:
$ cd spark-installer; ./install
Go grab a coffee.
What Happened?
- Bootstrapped environment variables
- Installed JDK 1.7 and set the Java path
- Downloaded, installed and configured hadoop-2.4.0 in /usr/local/hadoop and updated PATH (see the sketch after this list)
- Downloaded, installed and configured Scala-2.10.3
- Downloaded, installed and configured Spark-1.4.1 in standalone mode
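The exact variables depend on the install scripts, but a minimal sketch of the bootstrapped shell environment, assuming the default locations above (the JDK path is illustrative):
$ cat >> ~/.bashrc << 'EOF'
# illustrative environment bootstrap -- adjust paths to your system
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SPARK_HOME/bin
EOF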
Post Install
Make sure to update the slaves file in /usr/local/hadoop/etc/hadoop/
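For a single-node setup the slaves file only needs localhost; for a cluster, list one worker hostname per line (the hostnames below are illustrative):
$ cat /usr/local/hadoop/etc/hadoop/slaves
worker-01
worker-02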
Switch to the newly created hduser, cd to its home directory, and run:
$ cd; source ~/.bashrc
$ fab create_hdfs_dirs
$ fab init_local    # if running on a single node
$ /usr/local/spark/sbin/start-all.sh
$ ./start_notebook.sh
Now you should be able to access the Jupyter notebook at localhost:9999.
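If the installation runs on a remote machine rather than your workstation, the notebook port can be forwarded over SSH; the user and host below are illustrative:
$ ssh -N -L 9999:localhost:9999 hduser@spark-master.example.com
Then open http://localhost:9999 in a local browser.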