lresende / ansible-spark-cluster

Ansible roles to install a Spark Standalone cluster (HDFS/Spark/Jupyter Notebook) or an Ambari-based Spark cluster

License: Apache License 2.0

Languages: Shell 100.00%
Topics: ansible ansible-roles anaconda apache-spark apache-ambari jupyter-notebook jupyter-enterprise-gateway

ansible-spark-cluster's People

Contributors

aazhou1, akchinstc, ckadner, kevin-bates, leucir, lresende, marcindulak, sanjay-saxena, sumkincpp

ansible-spark-cluster's Issues

Unable to load kernels using enterprise gateway?

Thanks for this solution; we love the idea of connecting Jupyter notebooks to a Spark cluster. I went through the ansible playbooks and was able to get the setup up and running on GCloud Compute Engine. Now I am facing an issue when trying to connect my Jupyter notebook to the cluster: [W 21:49:37.661 NotebookApp] Error loading kernelspec 'python3', and I am not able to figure out what I am missing. I followed these instructions to connect my notebook:

export KG_URL=http://spark-master:8888
export KG_HTTP_USER=elyra
export KG_HTTP_PASS=
export KG_REQUEST_TIMEOUT=30
export KERNEL_USERNAME=${KG_HTTP_USER}
jupyter notebook \
  --NotebookApp.session_manager_class=nb2kg.managers.SessionManager \
  --NotebookApp.kernel_manager_class=nb2kg.managers.RemoteKernelManager \
  --NotebookApp.kernel_spec_manager_class=nb2kg.managers.RemoteKernelSpecManager
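A quick diagnostic (a sketch, not from the original report) is to query the gateway's kernelspecs endpoint directly; if 'python3' is absent from the response, the kernelspec is missing on the gateway side rather than in the notebook configuration:

$ # list the kernelspecs the gateway advertises (standard Jupyter REST endpoint)
$ curl http://spark-master:8888/api/kernelspecs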

Any help on this?

jdk download fails when running common/tasks/java.yml

The following failure occurs when attempting to run the setup-ambari-cluster playbook:

fatal: [elyra-wtf2]: FAILED! => {"changed": false, "dest": "/tmp/ansible-install/jdk-8u144-linux-x64.rpm", "failed": true, "msg": "Request failed", "response": "HTTP Error 404: Not Found", "state": "absent", "status_code": 404, "url": "http://download.oracle.com/otn-pub/java/jdk/8u144-b01/090f390dda5b47b9b721c7dfaa008135/jdk-8u144-linux-x64.rpm"}

After thinking this might be because 8u144 is no longer available, I attempted to update the files to use 8u152 after determining the appropriate build identifier and md5 hash, but got the same kind of issue:

fatal: [elyra-wtf2]: FAILED! => {"changed": false, "dest": "/tmp/ansible-install/jdk-8u152-linux-x64.rpm", "failed": true, "msg": "Request failed", "response": "HTTP Error 404: Not Found", "state": "absent", "status_code": 404, "url": "http://download.oracle.com/otn-pub/java/jdk/8u152-b16/b6979be30bdc4077dc93cd99134ad84d/jdk-8u152-linux-x64.rpm"}

Since there doesn't seem to be a way to "get the latest", it would be nice to figure out a better way to determine the download URL and what is causing the 404. (Oracle is known to move older JDK builds off otn-pub and onto its archive pages, which require a login and license acceptance; that would explain why previously valid URLs start returning 404s.)

The workaround:

  • Download the appropriate rpm file to /tmp/ansible-install on each node.
  • Update roles/common/defaults/main.yml with any file name changes.
  • Update roles/common/tasks/main.yml so it does not delete the install_temp_dir directory.
  • Update roles/common/tasks/java.yml so it does not delete the rpm from install_temp_dir and does not perform the download.
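To pre-stage the rpm on every node in one shot instead of copying it manually, an ad-hoc Ansible copy works (a sketch, assuming the inventory file and rpm name used above):

$ # push a locally downloaded rpm to the install dir on all hosts
$ ansible all -i hosts-fyre-spark -m copy -a "src=jdk-8u144-linux-x64.rpm dest=/tmp/ansible-install/"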

Detailed steps for setting up a working development environment

Use the standard ansible scripts to set up the environment on a brand-new stack/cluster with 4 nodes based on Red Hat 7.3. Here are the steps from @sxguo to set up the environment on the master node:

  • SSH into the master node as root.
  • Install Ansible on RHEL on the master node (see the sketch after this list).
  • Updating Ansible configuration on the master node
    • Add (or uncomment) the following configuration in /etc/ansible/ansible.cfg:

      [defaults]
      host_key_checking = False
      hash_behaviour = merge

  • git clone https://github.com/lresende/spark-cluster-install on your local machine
  • Zip up the spark-cluster-install folder on your local machine and upload the archive to the master node
  • Unzip the archive to create spark-cluster-install folder on the master node
  • cd spark-cluster-install on the master node
  • Edit hosts-fyre-spark with node names/ips for your cluster on the master node
  • Execute ansible-playbook --verbose setup-ambari-cluster.yml -i hosts-fyre-spark on the master node
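For the "Install Ansible on RHEL" step above, one common route (an assumption, since the original sub-steps were not captured here) is to enable EPEL and install from there:

$ sudo yum install -y epel-release   # on RHEL proper, install the EPEL release rpm from Fedora instead
$ sudo yum install -y ansible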

Once this is done, start Enterprise Gateway on the master node as shown below:

$ cd /opt/elyra/bin
$ start_elyra.sh

This will result in the following exception:

[E 2017-10-13 15:10:28.215 EnterpriseGatewayApp] Exception 'AuthenticationException' 
occurred when creating a SSHClient connecting to '172.16.193.76' with user 'elyra', 
message='Authentication failed.'.

Note that EG_REMOTE_USER is set to elyra in /opt/elyra/bin/start_elyra.sh. Change the value of EG_REMOTE_USER to root in /opt/elyra/bin/start_elyra.sh and run it again. This time, it will launch Enterprise Gateway successfully.
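The AuthenticationException suggests that passwordless SSH to the nodes was never configured for the elyra user (an inference; the original does not confirm the root cause). If scripting the workaround, something like the following would flip the user, assuming the variable is assigned on a single line in start_elyra.sh:

$ # hypothetical: adjust the pattern to the actual assignment in start_elyra.sh
$ sed -i 's/EG_REMOTE_USER=elyra/EG_REMOTE_USER=root/' /opt/elyra/bin/start_elyra.sh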

Stuck on task "verify connection to ambari-server port 8081"

My hosts:

[master]
holycow-node-1   ansible_host=holycow-node-1.fyre.ibm.com   ansible_host_id=1

[nodes]
holycow-node-2   ansible_host=holycow-node-2.fyre.ibm.com   ansible_host_id=2
holycow-node-3   ansible_host=holycow-node-3.fyre.ibm.com   ansible_host_id=3
holycow-node-4   ansible_host=holycow-node-4.fyre.ibm.com   ansible_host_id=4

Command:

ansible-playbook --verbose setup-ambari.yml -i hosts-fyre -c paramiko

Log:

TASK [ambari : restart ambari-server on master node] **************************************************************************************************************************************
skipping: [holycow-node-2] => {"changed": false, "skip_reason": "Conditional result was False", "skipped": true}
skipping: [holycow-node-3] => {"changed": false, "skip_reason": "Conditional result was False", "skipped": true}
skipping: [holycow-node-4] => {"changed": false, "skip_reason": "Conditional result was False", "skipped": true}
changed: [holycow-node-1] => {"changed": true, "cmd": "/usr/sbin/ambari-server restart", "delta": "0:00:20.229347", "end": "2017-11-09 17:05:04.540365", "failed": false, "rc": 0, "start": "2017-11-09 17:04:44.311018", "stderr": "", "stderr_lines": [], "stdout": "Using python  /usr/bin/python\nRestarting ambari-server\nAmbari Server is not running\nAmbari Server running with administrator privileges.\nOrganizing resource files at /var/lib/ambari-server/resources...\nAmbari database consistency check started...\nServer PID at: /var/run/ambari-server/ambari-server.pid\nServer out at: /var/log/ambari-server/ambari-server.out\nServer log at: /var/log/ambari-server/ambari-server.log\nWaiting for server start......................\nServer started listening on 8081\n\nDB configs consistency check: no errors and warnings were found.", "stdout_lines": ["Using python  /usr/bin/python", "Restarting ambari-server", "Ambari Server is not running", "Ambari Server running with administrator privileges.", "Organizing resource files at /var/lib/ambari-server/resources...", "Ambari database consistency check started...", "Server PID at: /var/run/ambari-server/ambari-server.pid", "Server out at: /var/log/ambari-server/ambari-server.out", "Server log at: /var/log/ambari-server/ambari-server.log", "Waiting for server start......................", "Server started listening on 8081", "", "DB configs consistency check: no errors and warnings were found."]}

TASK [ambari : verify connection to ambari-server port 8081] ******************************************************************************************************************************
Enter passphrase for key '/Users/ckadner/.ssh/id_rsa': Enter passphrase for key '/Users/ckadner/.ssh/id_rsa': Enter passphrase for key '/Users/ckadner/.ssh/id_rsa': ok: [holycow-node-1] => {"attempts": 1, "cache_control": "no-store", "changed": false, "connection": "close", "content_type": "text/plain", "cookies": {"AMBARISESSIONID": "5v***ew2a63t"}, "expires": "Thu, 01 Jan 1970 00:00:00 GMT", "failed": false, "msg": "OK (unknown bytes)", "pragma": "no-cache", "redirected": false, "set_cookie": "AMBARISESSIONID=5v***ew2a63t;Path=/;HttpOnly", "status": 200, 
"url": "http://holycow-node-1:8081/api/v1/hosts", "user": "VALUE_SPECIFIED_IN_NO_LOG_PARAMETER", "vary": "Accept-Encoding, User-Agent", "x_content_type_options": "nosniff", "x_frame_options": "DENY", "x_xss_protection": "1; mode=block"}

Notice the URL: "url": "http://holycow-node-1:8081/api/v1/hosts"

It should be: "url": "http://holycow-node-1.fyre.ibm.com:8081/api/v1/hosts"

The problem:

https://github.com/lresende/spark-cluster-install/blob/9fea4448ef5ff927194a83dd3f33d9bd992db1a6/roles/ambari/tasks/install.yml#L62

The uri task there is missing the ansible_domain qualifier; it should be:

url: "http://{{ groups['master'][0] }}.{{ ansible_domain }}:8081/api/v1/hosts"

I will create a PR shortly.

@marcindulak -- FYI

Can't run `setup-enterprise-gateway.yml` with remote hosts (Mac -> Ambari cluster)

When running the playbook setup-enterprise-gateway.yml on my Mac to set up Enterprise Gateway on a remote Ambari cluster, it fails at TASK [notebook : download and install elyra] with this error:

TASK [notebook : download and install elyra] *******************************************
fatal: [notagain-node-1]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect 
to the host via ssh: ssh: connect to host localhost port 22: Connection refused\r\n", 
"unreachable": true}

A bit more context (no errors prior):

...

TASK [notebook : debug] *****************************************************************************************************************************************************
ok: [notagain-node-1] => {
    "msg": "Downloading Elyra: http://9.30.252.137/dist/elyra/jupyter_enterprise_gateway-0.7.0.dev0-py2.py3-none-any.whl"
}
ok: [notagain-node-2] => {
    "msg": "Downloading Elyra: http://9.30.252.137/dist/elyra/jupyter_enterprise_gateway-0.7.0.dev0-py2.py3-none-any.whl"
}
ok: [notagain-node-3] => {
    "msg": "Downloading Elyra: http://9.30.252.137/dist/elyra/jupyter_enterprise_gateway-0.7.0.dev0-py2.py3-none-any.whl"
}
ok: [notagain-node-4] => {
    "msg": "Downloading Elyra: http://9.30.252.137/dist/elyra/jupyter_enterprise_gateway-0.7.0.dev0-py2.py3-none-any.whl"
}

TASK [notebook : download and install elyra] ********************************************************************************************************************************
fatal: [notagain-node-1]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host localhost port 22: Connection refused\r\n", "unreachable": true}

NO MORE HOSTS LEFT **********************************************************************************************************************************************************
	to retry, use: --limit @/Users/ckadner/PycharmProjects/spark-cluster-install/setup-enterprise-gateway.retry

PLAY RECAP ******************************************************************************************************************************************************************
notagain-node-1            : ok=24   changed=11   unreachable=1    failed=0   
notagain-node-2            : ok=22   changed=9    unreachable=0    failed=0   
notagain-node-3            : ok=22   changed=9    unreachable=0    failed=0   
notagain-node-4            : ok=22   changed=9    unreachable=0    failed=0   
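The "connect to host localhost port 22" message usually means Ansible is treating localhost as a regular SSH target, typically because a task delegates to localhost without using a local connection. One way to rule this out (an assumption; the playbook internals are not shown here) is to declare localhost explicitly in the inventory so no SSH attempt is made:

    [local]
    localhost   ansible_connection=local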

Unable to see submitted jobs in the Spark History Server?

I am trying to run some jobs on the Spark cluster; the jobs finish, but I am not able to see them in the Spark History Server:

[screenshot: completed applications, 2018-09-09 12:59 PM]

But nothing shows up in the History Server:
[screenshot: empty Spark History Server, 2018-09-09 1:23 PM]

Just to add to this: I installed pyspark in Anaconda by running conda install -c conda-forge pyspark to be able to load the pyspark module.
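A common cause (an assumption here, not confirmed for this cluster) is that event logging is disabled, so applications never write the logs the History Server reads. A minimal spark-defaults.conf sketch, assuming an HDFS log directory at /spark-history:

    # applications write event logs here...
    spark.eventLog.enabled           true
    spark.eventLog.dir               hdfs:///spark-history
    # ...and the History Server reads them from the same location
    spark.history.fs.logDirectory    hdfs:///spark-history

Note that a separately installed conda pyspark will only show up if it is launched with the same event-log settings (for example via --conf spark.eventLog.enabled=true).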
