
cdsw_install's People

Contributors

griffinridgeback, tobyhferguson


cdsw_install's Issues

Ensure ulimit is set to 1048576

Saw this in the cdsw init output:

WARNING: Cloudera Data Science Workbench recommends that all users have a max-open-files limit set to 1048576.
It is currently set to [32768] as per 'ulimit -n'

I think I need to append lines like these (appending rather than overwriting, to preserve any existing limits):

cat >>/etc/security/limits.conf <<EOF
* soft nofile 1048576
* hard nofile 1048576
EOF

And also to set both hard and soft limits in the currently running system:

ulimit -n 1048576
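
To verify in the current shell (bash's ulimit sets both limits when neither -S nor -H is given):

ulimit -Sn   # soft limit, should print 1048576
ulimit -Hn   # hard limit, should print 1048576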

Update the documentation to make it easier for people to understand this project

Few people have just cloned this project and got it to work. I believe that's because the README is too dense - they just gloss over it and never get to grips with what they have to do.

I propose dramatically shortening the README and replacing it with a wiki page that describes the steps etc. in a more easily consumed fashion.

Update build structure to ensure stability and manageability

I think I should be much more prescriptive in the build to make it easier for people to use.

In particular I should pin the exact versions of CDH and other parcels, and then tag those releases. That way one can check out a specific tag and know that it worked with that specific combination of CDH, CDSW, Anaconda, Spark, etc.
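
A sketch of the tagging workflow (the tag name and version numbers are illustrative):

# after pinning parcel versions in the conf files, tag the known-good combination
git tag -a v1.0.1-cdh5.12.0 -m "Works with CDH 5.12.0, CDSW 1.0.1, Anaconda 4.3.1"
git push origin v1.0.1-cdh5.12.0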

Remove reverse lookup from AWS bind server

There might be no need to provide reverse-lookup capabilities in the local bind server in AWS - simply handling the cdsw.cdh-cluster.internal domain should be sufficient, provided that name is a CNAME to the internal machines.
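
A sketch of the forward-only record (the zone file path and target host are assumptions):

# append a CNAME for cdsw to the forward zone; no reverse zone required
cat >> /var/named/cdh-cluster.internal.zone <<'EOF'
cdsw    IN  CNAME   master-1.cdh-cluster.internal.
EOF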

Update to cdsw 1.0.1

Modify repo to use baseurl of 1.0.1

[cloudera-cdsw]
# Packages for Cloudera's Distribution for data science workbench, Version 1, on RedHat or CentOS 7 x86_64
name=Cloudera's Distribution for cdsw, Version 1
# old baseurl
# baseurl=https://archive.cloudera.com/cdsw/1/redhat/7/x86_64/cdsw/1/
baseurl=https://archive.cloudera.com/cdsw/1/redhat/7/x86_64/cdsw/1.0.1/
gpgkey=https://archive.cloudera.com/cdsw/1/redhat/7/x86_64/cdsw/RPM-GPG-KEY-cloudera
gpgcheck=1

probably something like:

sed -i 's|/1/$|/1.0.1/|' ....
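
Concretely, assuming the repo file lives in the usual place:

# rewrite the trailing /1/ to /1.0.1/ in the baseurl, then double-check it
sed -i 's|/1/$|/1.0.1/|' /etc/yum.repos.d/cloudera-cdsw.repo
grep baseurl /etc/yum.repos.d/cloudera-cdsw.repo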

creating multiple CDSW clusters for different environments (DEV / STG / PRD)

Dear Toby,

I would like to investigate with you the possibility of adding an ENVIRONMENT variable for deployment.

Indeed, it is quite common to deploy DEV / STG / PRD environments, which differ in only three parameters: instance prefix (common.conf & aws/instance.conf), instance type (aws/instance.conf), and instance count (common.conf).

My proposal, in multiple steps:

1) Would it be possible to create a new instance type for CM in aws/instance.conf? Then I would be able to centralise the instance prefix and type in the same file.

2) Would it be possible to add an environment variable in provider.properties? For example ENVIRONMENT=DEV, updating the prefix to ${ENVIRONMENT}-cdsw-${name}, so that we only modify the instance types under aws/instance.conf, not the prefix.

3) Is there a way to parameterise the number of instances per environment? Any suggestions? (See the sketch below.)
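
A minimal sketch of one possible approach, assuming HOCON's ${?VAR} environment-variable substitution (the variable names are illustrative):

export ENVIRONMENT=DEV      # DEV / STG / PRD
export WORKER_COUNT=3       # instance count for this environment

# the conf files could then pick these up, e.g.:
#   instanceNamePrefix: ${?ENVIRONMENT}-cdsw-${name}
#   count: ${?WORKER_COUNT}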

BR

OWNER is duplicated in 2 different .properties files

Hi Toby,

I updated the .properties files in the aws directory and noticed that the OWNER global variable is duplicated between owner_tag.properties and provider.properties.

grep OWNER aws/*
aws/instances.conf: owner: ${OWNER}
aws/owner_tag.properties:OWNER=aheib
aws/provider.properties:OWNER=aheib

BR

Consolidate the multiple provider-specific files into a smaller number

At least one set of users found it confusing to have multiple .properties and .conf files for each cloud provider.

Perhaps it would be easier to focus on providing all of the user-settable information in a single .properties file.

Of course it would be nice to have a set of sensible defaults, so the user only needs to worry about the specific values they must change. Maybe just divide the properties file into two sections, mandatory and default.

The goal is to make this as low-touch as possible, enabling a user to create a simple cluster. Minimizing configuration to the essentials is therefore important.

Another thing that I want to do is to make it clear which bits of the system are not going to be put into git. A GITIGNORED directory might be the way to go, and then put the SECRETS and the ssh private keys in there.
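
One possible sketch (file names are illustrative):

mkdir -p GITIGNORED                 # everything in here stays out of version control
echo 'GITIGNORED/' >> .gitignore
mv secrets.properties GITIGNORED/   # assumed name for the SECRETS file
mv *.pem GITIGNORED/                # ssh private keys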

cdsw can fail to mount nfs

Sometimes cdsw fails to mount nfs properly.

The solution is to ensure that the rpcbind service is started:

systemctl start rpcbind
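
To make the fix persist across reboots, enable the service as well:

systemctl enable rpcbind   # start on every boot
systemctl start rpcbind    # start it now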

GCP - make the workers be stoppable

With the current release the workers have Local SSD scratch disks, and instances with those cannot be stopped. This means we have to delete and then recreate a cluster, which is painful.

gcp/instances.conf contains:

    worker : ${common-instance-properties} {
        type: n1-highmem-4
        instanceNamePrefix: worker-${name}
        dataDiskCount: 1
        dataDiskType: Standard
    }

I think the disk type needs to be something like pd-standard ...
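
If that's the fix, the change would be something like (assuming the plugin accepts GCE's pd-standard naming):

sed -i 's/dataDiskType: Standard/dataDiskType: pd-standard/' gcp/instances.conf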

New Anaconda version

Hi Toby,

We just ran into an issue on deployment: there was a version incompatibility during our latest deployment (with Andre Molinaar).

Found errors in cluster configuration:

  • ErrorInfo{code=NO_PARCEL_FOUND_WITH_VERSION, properties={availableProductVersions=4.3.1, clusterProduct=Anaconda, clusterProductVersion=4.2}, causes=[]}

The workaround is to modify common.conf, replacing Anaconda: 4.2 with Anaconda: 4.3.1.
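
Assuming the version is pinned exactly as above, the edit could be scripted:

# bump the pinned Anaconda parcel version in common.conf
sed -i 's/Anaconda: 4.2/Anaconda: 4.3.1/' common.conf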

Not sure if it is the best approach.

BR

CDH Packages not visible on AWS with 5.12

#fc45bce
When I use an AMI that doesn't have CDH prepackaged, the system fails and the following error appears in the Director server log:

Waiting for product versions to be visible in CM: {CDH=5.12.0-1.cdh5.12.0.p0.29}

I've tried this with the following AMIs in the us-west-2 region:

  • ami-a3fa16c3 - RHEL 7.2 AMI taken from the faster-bootstrap system
  • ami-5dd3743d - community RHEL 7.2
  • ami-e2167182 - community RHEL 7.3

I have no idea why that specific CDH product version is being searched for - it certainly isn't in the parcel URLs that I gave.

install of cdsw failed because data2 wasn't found

Filippo had an issue where his disks were mounted as data1 and data2, not data0 and data1 - he's using Director 2.5. Maybe that's the problem?

To fix the issue we had to remove data2 from /etc/fstab and edit /etc/cdsw/config/cdsw.conf to set up DBD.
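
Roughly (the mount point is an assumption, and the cdsw.conf edit was done by hand):

sed -i '\|/data2|d' /etc/fstab   # drop the stray fstab entry
umount /data2                    # unmount it if currently mounted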

Clock offset problem

Test of the host clock's offset from its NTP server.

Bad: The host's NTP service could not be located or did not respond to a request for the clock offset.
Actions

Change Host Clock Offset Thresholds for all hosts
Change Host Clock Offset Thresholds for this host
Advice

This is a host health test that checks if the host's system clock appears to be out-of-sync with its NTP server(s). The test uses the 'ntpdc -np' (if ntpd is running) or 'chronyc sources' (if chronyd is running) command to check that the host is synchronized to an NTP peer and that the absolute value of the host's clock offset from that peer is not too large. If the command fails, NTP is not synchronized to a server, or the host's NTP daemon is not running or cannot be contacted, the test returns "Bad" health.

The 'ntpdc -np' or 'chronyc sources' output contains a row for each of the host's NTP servers. The row starting with a '*' (if ntpdc) or '^' (if chronyc) contains the peer to which the host is currently synchronized. No row starting with a '*' or '^' indicates that the host is not currently synchronized. Communication errors, and an offset between the peer and the host time that is too large, are examples of conditions that can lead to a host being unsynchronized.

Make sure that UDP port 123 is open on any firewall that is in use. Check the system log for ntpd or chronyd messages related to configuration errors. If running ntpd, use 'ntpdc -c iostat' to verify that packets are sent and received between the different peers. More information about the conditions of each peer can be found by running the command 'ntpq -c as'. The output of this command includes the association ID that can be used in combination with 'ntpq -c "rv <association ID>"' to get more information about the status of each peer. Use the command 'ntpq -c pe' to return a summary of all peers and the reason they are not in use. If running chronyd, use 'chronyc activity' to check how many NTP sources are online/offline. More information about the conditions of each peer can be found by running the command 'chronyc sourcestats'. To check chrony tracking, issue the command 'chronyc tracking'.

If NTP is not in use on the host, disable this check for the host, using the configuration options shown below. Cloudera recommends using NTP for time synchronization of Hadoop clusters.

A failure of this health test can indicate a problem with the host's NTP service or configuration.

This test can be configured using the Host Clock Offset Thresholds host configuration setting.
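
A quick triage sequence based on the advice above:

systemctl status ntpd chronyd   # which time daemon is running?
ntpq -c pe                      # ntpd: peer summary
chronyc sources                 # chronyd: per-source state
chronyc tracking                # chronyd: current offset and sync status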

ensure selinux is off before running cdsw init

It seems that simply doing a setenforce 0 might be insufficient: that only puts SELinux into Permissive mode, and it might be necessary to set SELINUX=disabled in /etc/selinux/config and reboot for it to be fully disabled.
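
A minimal sketch of the permanent disable (requires a reboot):

sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
reboot
# afterwards, 'getenforce' should report Disabled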

separate the scripting from the conf file

In Director 2.4 it's possible to simply refer to local script files rather than putting all the script text into the conf file. This would greatly simplify the conf files and allow for easier maintenance.
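
Something like the following, assuming Director 2.4 supports script paths in the instance template (the exact key name should be checked against the Director docs):

mkdir -p scripts                      # keep bootstrap scripts alongside the conf
mv bootstrap-snippet.sh scripts/      # illustrative file name
# then reference the file from the conf instead of inlining the text, e.g.:
#   bootstrapScriptsPaths: ["scripts/bootstrap-snippet.sh"]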

Use Relative Path includes

HOCON allows for a relative-path include, as described in https://github.com/typesafehub/config/blob/master/HOCON.md#include-semantics-file-formats-and-extensions:

if the included file is a relative path, then it should be located relative to the directory containing the including file. The current working directory of the process parsing a file must NOT be used when interpreting included paths.
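
So a provider conf could include the shared settings relative to its own directory (file names are illustrative):

cat > aws/cluster.conf <<'EOF'
include "../common.conf"   # resolved relative to aws/, not the working directory
EOF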

Allow for different domain names

The domain name is fixed at cdh-cluster.internal - we might want to consider changing this and allowing the user to specify the name.

Kerberos add principal command error in README.md

Tried running sudo kadmin.local addprinc cdsw -pw Cloudera1

Getting an error:

[centos@ip-10-0-0-33 ~]$ sudo kadmin.local addprinc cdsw -pw Cloudera1
usage: add_principal [options] principal
	options are:
		[-randkey|-nokey] [-x db_princ_args]* [-expire expdate] [-pwexpire pwexpdate] [-maxlife maxtixlife]
		[-kvno kvno] [-policy policy] [-clearpolicy]
		[-pw password] [-maxrenewlife maxrenewlife]
		[-e keysaltlist]
		[{+|-}attribute]
	attributes are:
		allow_postdated allow_forwardable allow_tgs_req allow_renewable
		allow_proxiable allow_dup_skey allow_tix requires_preauth
		requires_hwauth needchange allow_svr password_changing_service
		ok_as_delegate ok_to_auth_as_delegate no_auth_data_required

where,
	[-x db_princ_args]* - any number of database specific arguments.
			Look at each database documentation for supported arguments

I ran sudo kadmin.local addprinc -pw Cloudera1 cdsw (options before the principal, as the usage message requires) and that worked.
