lresende / ansible-spark-cluster
Ansible roles to install a Spark Standalone cluster (HDFS/Spark/Jupyter Notebook) or an Ambari-based Spark cluster
License: Apache License 2.0
Thanks for this solution, we love the idea of connecting Jupyter notebooks to a Spark cluster. I went through the Ansible playbook and was able to get the setup up and running on gcloud Compute Engine. Right now I am facing an issue while trying to connect my Jupyter notebook to the cluster:
[W 21:49:37.661 NotebookApp] Error loading kernelspec 'python3'
I cannot figure out what I am missing. I followed these instructions to connect my notebook:
export KG_URL=http://spark-master:8888
export KG_HTTP_USER=elyra
export KG_HTTP_PASS=
export KG_REQUEST_TIMEOUT=30
export KERNEL_USERNAME=${KG_HTTP_USER}
jupyter notebook \
  --NotebookApp.session_manager_class=nb2kg.managers.SessionManager \
  --NotebookApp.kernel_manager_class=nb2kg.managers.RemoteKernelManager \
  --NotebookApp.kernel_spec_manager_class=nb2kg.managers.RemoteKernelSpecManager
Any help on this?
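One way to narrow this down is to ask the gateway directly which kernelspecs it advertises, since the notebook can only load specs the remote side actually serves. The sketch below is a rough check, not part of nb2kg: it assumes the gateway exposes the standard Jupyter REST endpoint /api/kernelspecs at KG_URL and that KG_HTTP_USER / KG_HTTP_PASS map to HTTP basic auth. The function names are made up for illustration.

```python
import base64
import json
import os
import urllib.request


def build_kernelspecs_request(kg_url, user="", password=""):
    """Build a GET request for the gateway's /api/kernelspecs endpoint."""
    url = kg_url.rstrip("/") + "/api/kernelspecs"
    request = urllib.request.Request(url)
    if user:
        # Assumption: the gateway credentials are sent as HTTP basic auth.
        token = base64.b64encode(f"{user}:{password}".encode()).decode()
        request.add_header("Authorization", "Basic " + token)
    return request


def list_kernelspecs(kg_url, user="", password="", timeout=30):
    """Return the kernelspec names the gateway reports (needs network access)."""
    request = build_kernelspecs_request(kg_url, user, password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return sorted(payload.get("kernelspecs", {}))


if __name__ == "__main__":
    # Reuses the same environment variables as the exports above.
    print(list_kernelspecs(
        os.environ.get("KG_URL", "http://spark-master:8888"),
        os.environ.get("KG_HTTP_USER", ""),
        os.environ.get("KG_HTTP_PASS", ""),
    ))
```

If 'python3' is missing from the printed list, the warning is probably about a spec that is simply not installed on the gateway side rather than a problem in the notebook configuration.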
The following failure occurs when attempting to run the setup-ambari-cluster playbook:
fatal: [elyra-wtf2]: FAILED! => {"changed": false, "dest": "/tmp/ansible-install/jdk-8u144-linux-x64.rpm", "failed": true, "msg": "Request failed", "response": "HTTP Error 404: Not Found", "state": "absent", "status_code": 404, "url": "http://download.oracle.com/otn-pub/java/jdk/8u144-b01/090f390dda5b47b9b721c7dfaa008135/jdk-8u144-linux-x64.rpm"}
After thinking this might be because 8u144 is no longer available, I attempted to update the files to use 8u152 after determining the appropriate build identifier and md5 hash, but got the same kind of issue:
fatal: [elyra-wtf2]: FAILED! => {"changed": false, "dest": "/tmp/ansible-install/jdk-8u152-linux-x64.rpm", "failed": true, "msg": "Request failed", "response": "HTTP Error 404: Not Found", "state": "absent", "status_code": 404, "url": "http://download.oracle.com/otn-pub/java/jdk/8u152-b16/b6979be30bdc4077dc93cd99134ad84d/jdk-8u152-linux-x64.rpm"}
Since there doesn't seem to be a way to "get the latest", it would be nice to figure out a better way to determine the download URL and what is causing the 404 exception.
The workaround is to download the appropriate rpm file to /tmp/ansible-install on each node, then update roles/common/defaults/main.yml with any file name changes, roles/common/tasks/main.yml to not delete the install_temp_dir directory, and roles/common/tasks/java.yml to not delete the rpm from the install_temp_dir and not perform the download.
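When using the workaround, it may help to confirm on each node that the manually copied rpm is present and matches the md5 hash you looked up before running the playbook. This is a minimal sketch under that assumption; the helper names are hypothetical, and the expected hash has to come from your own lookup, it is not taken from the roles.

```python
import hashlib
import os


def md5_of(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, read in chunks to keep memory low."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(path, expected_md5):
    """True when the file is missing or its md5 differs from the expected one."""
    if not os.path.isfile(path):
        return True
    return md5_of(path) != expected_md5.lower()
```

For example, needs_download("/tmp/ansible-install/jdk-8u152-linux-x64.rpm", expected) returning False would mean the pre-staged file is in place and intact.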
Use the standard Ansible scripts to set up the environment on a brand new stack/cluster with 4 nodes based on Red Hat 7.3. Here are the steps from @sxguo to set up the environment on the master node:
1. As root, add (read: uncomment) the following configuration in /etc/ansible/ansible.cfg:
[defaults]
host_key_checking = False
hash_behaviour = merge
2. Run git clone https://github.com/lresende/spark-cluster-install on your local machine.
3. Archive the spark-cluster-install folder on the local machine and upload the archive to the master node.
4. Extract the spark-cluster-install folder on the master node.
5. Run cd spark-cluster-install on the master node.
6. Update hosts-fyre-spark with the node names/IPs for your cluster on the master node.
7. Run ansible-playbook --verbose setup-ambari-cluster.yml -i hosts-fyre-spark on the master node.
Once this is done, start Enterprise Gateway on the master node as shown below:
$ cd /opt/elyra/bin
$ start_elyra.sh
This will result in the following exception:
[E 2017-10-13 15:10:28.215 EnterpriseGatewayApp] Exception 'AuthenticationException'
occurred when creating a SSHClient connecting to '172.16.193.76' with user 'elyra',
message='Authentication failed.'.
Note that EG_REMOTE_USER is set to elyra in /opt/elyra/bin/start_elyra.sh. Change the value of EG_REMOTE_USER to root in /opt/elyra/bin/start_elyra.sh and run it again. This time, it will launch Enterprise Gateway successfully.
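Before changing EG_REMOTE_USER, it can be useful to check which user can actually log in to the worker nodes without a prompt. The sketch below only covers key-based login through the system ssh client; Enterprise Gateway itself connects through its own SSH client and may also use a password, so treat this as a rough probe. The function names are illustrative, not from Enterprise Gateway.

```python
import subprocess


def ssh_check_command(user, host, timeout=10):
    """Build an ssh command that fails instead of prompting for credentials."""
    return [
        "ssh",
        "-o", "BatchMode=yes",              # never ask for a password or passphrase
        "-o", f"ConnectTimeout={timeout}",  # give up quickly on unreachable hosts
        f"{user}@{host}",
        "true",                             # no-op remote command
    ]


def can_authenticate(user, host, timeout=10):
    """Return True if non-interactive ssh login as user@host succeeds."""
    result = subprocess.run(
        ssh_check_command(user, host, timeout),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0
```

Comparing can_authenticate("elyra", "172.16.193.76") with can_authenticate("root", "172.16.193.76") should show whether the elyra user is missing key-based access on that node. Note that BatchMode also fails when the host key is not yet known.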
My hosts:
[master]
holycow-node-1 ansible_host=holycow-node-1.fyre.ibm.com ansible_host_id=1
[nodes]
holycow-node-2 ansible_host=holycow-node-2.fyre.ibm.com ansible_host_id=2
holycow-node-3 ansible_host=holycow-node-3.fyre.ibm.com ansible_host_id=3
holycow-node-4 ansible_host=holycow-node-4.fyre.ibm.com ansible_host_id=4
Command:
ansible-playbook --verbose setup-ambari.yml -i hosts-fyre -c paramiko
Log:
TASK [ambari : restart ambari-server on master node] **************************************************************************************************************************************
skipping: [holycow-node-2] => {"changed": false, "skip_reason": "Conditional result was False", "skipped": true}
skipping: [holycow-node-3] => {"changed": false, "skip_reason": "Conditional result was False", "skipped": true}
skipping: [holycow-node-4] => {"changed": false, "skip_reason": "Conditional result was False", "skipped": true}
changed: [holycow-node-1] => {"changed": true, "cmd": "/usr/sbin/ambari-server restart", "delta": "0:00:20.229347", "end": "2017-11-09 17:05:04.540365", "failed": false, "rc": 0, "start": "2017-11-09 17:04:44.311018", "stderr": "", "stderr_lines": [], "stdout": "Using python /usr/bin/python\nRestarting ambari-server\nAmbari Server is not running\nAmbari Server running with administrator privileges.\nOrganizing resource files at /var/lib/ambari-server/resources...\nAmbari database consistency check started...\nServer PID at: /var/run/ambari-server/ambari-server.pid\nServer out at: /var/log/ambari-server/ambari-server.out\nServer log at: /var/log/ambari-server/ambari-server.log\nWaiting for server start......................\nServer started listening on 8081\n\nDB configs consistency check: no errors and warnings were found.", "stdout_lines": ["Using python /usr/bin/python", "Restarting ambari-server", "Ambari Server is not running", "Ambari Server running with administrator privileges.", "Organizing resource files at /var/lib/ambari-server/resources...", "Ambari database consistency check started...", "Server PID at: /var/run/ambari-server/ambari-server.pid", "Server out at: /var/log/ambari-server/ambari-server.out", "Server log at: /var/log/ambari-server/ambari-server.log", "Waiting for server start......................", "Server started listening on 8081", "", "DB configs consistency check: no errors and warnings were found."]}
TASK [ambari : verify connection to ambari-server port 8081] ******************************************************************************************************************************
Enter passphrase for key '/Users/ckadner/.ssh/id_rsa': Enter passphrase for key '/Users/ckadner/.ssh/id_rsa': Enter passphrase for key '/Users/ckadner/.ssh/id_rsa': ok: [holycow-node-1] => {"attempts": 1, "cache_control": "no-store", "changed": false, "connection": "close", "content_type": "text/plain", "cookies": {"AMBARISESSIONID": "5v***ew2a63t"}, "expires": "Thu, 01 Jan 1970 00:00:00 GMT", "failed": false, "msg": "OK (unknown bytes)", "pragma": "no-cache", "redirected": false, "set_cookie": "AMBARISESSIONID=5v***ew2a63t;Path=/;HttpOnly", "status": 200,
"url": "http://holycow-node-1:8081/api/v1/hosts", "user": "VALUE_SPECIFIED_IN_NO_LOG_PARAMETER", "vary": "Accept-Encoding, User-Agent", "x_content_type_options": "nosniff", "x_frame_options": "DENY", "x_xss_protection": "1; mode=block"}
Notice the URL: "url": "http://holycow-node-1:8081/api/v1/hosts"
It should be: "url": "http://holycow-node-1.fyre.ibm.com:8081/api/v1/hosts"
The problem: the url used by the task is missing the ansible_domain qualifier. It should look like this:
url: "http://{{ groups['master'][0] }}.{{ ansible_domain }}:8081/api/v1/hosts"
I will create a PR shortly.
@marcindulak -- FYI
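To make the difference between the two URLs concrete, here is a small sketch of the same construction outside of Ansible: the inventory name of the master is joined with the domain before the port and path are appended. The function name and the handling of an empty or already-present domain are my own assumptions, not what the playbook does.

```python
def ambari_hosts_url(master, domain="", port=8081):
    """Build the Ambari hosts API URL, qualifying the host with its domain."""
    # Only append the domain when one is given and the name is not qualified yet.
    if domain and not master.endswith("." + domain):
        host = f"{master}.{domain}"
    else:
        host = master
    return f"http://{host}:{port}/api/v1/hosts"


if __name__ == "__main__":
    print(ambari_hosts_url("holycow-node-1", "fyre.ibm.com"))
```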
When running the playbook setup-enterprise-gateway.yml on my Mac to set up Enterprise Gateway on a remote Ambari cluster, it fails at TASK [notebook : download and install elyra] with this error:
TASK [notebook : download and install elyra] *******************************************
fatal: [notagain-node-1]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect
to the host via ssh: ssh: connect to host localhost port 22: Connection refused\r\n",
"unreachable": true}
A bit more context (no errors prior):
...
TASK [notebook : debug] *****************************************************************************************************************************************************
ok: [notagain-node-1] => {
"msg": "Downloading Elyra: http://9.30.252.137/dist/elyra/jupyter_enterprise_gateway-0.7.0.dev0-py2.py3-none-any.whl"
}
ok: [notagain-node-2] => {
"msg": "Downloading Elyra: http://9.30.252.137/dist/elyra/jupyter_enterprise_gateway-0.7.0.dev0-py2.py3-none-any.whl"
}
ok: [notagain-node-3] => {
"msg": "Downloading Elyra: http://9.30.252.137/dist/elyra/jupyter_enterprise_gateway-0.7.0.dev0-py2.py3-none-any.whl"
}
ok: [notagain-node-4] => {
"msg": "Downloading Elyra: http://9.30.252.137/dist/elyra/jupyter_enterprise_gateway-0.7.0.dev0-py2.py3-none-any.whl"
}
TASK [notebook : download and install elyra] ********************************************************************************************************************************
fatal: [notagain-node-1]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host localhost port 22: Connection refused\r\n", "unreachable": true}
NO MORE HOSTS LEFT **********************************************************************************************************************************************************
to retry, use: --limit @/Users/ckadner/PycharmProjects/spark-cluster-install/setup-enterprise-gateway.retry
PLAY RECAP ******************************************************************************************************************************************************************
notagain-node-1 : ok=24 changed=11 unreachable=1 failed=0
notagain-node-2 : ok=22 changed=9 unreachable=0 failed=0
notagain-node-3 : ok=22 changed=9 unreachable=0 failed=0
notagain-node-4 : ok=22 changed=9 unreachable=0 failed=0
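The error message shows ssh trying to reach localhost on port 22, so the task apparently attempts an SSH connection back to the machine running the playbook. A quick way to confirm whether anything is listening there is a plain TCP probe. This is a generic sketch, not something from the playbook; on a Mac the SSH server (Remote Login) is typically off unless it was enabled, which may be why the connection is refused.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    print("localhost:22 reachable:", port_open("localhost", 22))
```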
I am trying to run some jobs on the Spark cluster. The jobs finish, but I am not able to see the submitted jobs in the Spark history; nothing shows up there.
Just to add to this, I installed pyspark on Anaconda by running conda install -c conda-forge pyspark to be able to load the pyspark module.
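One thing worth checking here: the Spark history server only lists applications that wrote event logs, which requires spark.eventLog.enabled=true and a spark.eventLog.dir that matches the directory the history server reads (spark.history.fs.logDirectory). A pyspark installed separately through conda may not pick up the cluster's spark-defaults.conf, so these settings might be missing. Below is a minimal sketch that sets them explicitly; the log directory in the example is a placeholder, not the value this cluster uses, and the helper names are hypothetical.

```python
def event_log_conf(log_dir):
    """Spark settings that make an application write event logs to log_dir."""
    return {
        "spark.eventLog.enabled": "true",
        "spark.eventLog.dir": log_dir,
    }


def build_session(app_name, log_dir):
    """Create a SparkSession with event logging enabled (requires pyspark)."""
    # Imported lazily so the helper above stays usable without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in event_log_conf(log_dir).items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


if __name__ == "__main__":
    # Placeholder path: replace with the directory your history server reads.
    spark = build_session("history-check", "hdfs:///spark-events")
    spark.range(10).count()
    spark.stop()
```

After the session stops, the application should appear in the history UI if the two directories match.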