cloudera / cloudera-playbook
Cloudera deployment automation with Ansible
License: Apache License 2.0
Hi,
I have created a new Ansible script for multi-node deployment and configuration of CDH. I want to publish it in this or a related repository. What are the steps to do this?
Thanks,
Vicky
@roczei could you please take a look when you have time? Otherwise I will try to fix when I get a chance.
Error received in installing 5.14.0.
Command works fine locally.
TASK [scm : Install CM Python API Client]
I am running the playbook from the CM host and getting the error below. I have tried a lot of things but could not figure out what is going wrong: whether it is a problem with the variables in group_vars or whether I am missing something.
TASK [scm : file] **********************************************************************************************************************************************************************************************************************************************************************************************************************************************************
changed: [kkulkani-cdhkerberos-1]
TASK [scm : Import KDC admin credentials] **********************************************************************************************************************************************************************************************************************************************************************************************************************************
ok: [kkulkani-cdhkerberos-1]
TASK [scm : Wait for agent heartbeats] *************************************************************************************************************************************************************************************************************************************************************************************************************************************
Pausing for 30 seconds
(ctrl+C then 'C' = continue early, ctrl+C then 'A' = abort)
Press 'C' to continue the play or 'A' to abort
ok: [kkulkani-cdhkerberos-1]
TASK [scm : Prepare CMS template] ******************************************************************************************************************************************************************************************************************************************************************************************************************************************
fatal: [kkulkani-cdhkerberos-1]: FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: 'dict object' has no attribute u'kkulkani-cdhkerberos-1'"}
I have been slowly chugging through getting this repo to work for me, and I think I have finally hit a problem I don't know how to solve myself. I am setting up an SCM server, a DB server, an edge server, 2 name nodes, and 4 data nodes (different from the readme in that there is no 3rd name node and no KRB5). This is the error I am getting:
TASK [cdh : Wait for import cluster template command to complete] ****************************************************************************************************************************************** FAILED - RETRYING: Wait for import cluster template command to complete (10 retries left). FAILED - RETRYING: Wait for import cluster template command to complete (9 retries left). FAILED - RETRYING: Wait for import cluster template command to complete (8 retries left). FAILED - RETRYING: Wait for import cluster template command to complete (7 retries left). FAILED - RETRYING: Wait for import cluster template command to complete (6 retries left). FAILED - RETRYING: Wait for import cluster template command to complete (5 retries left). FAILED - RETRYING: Wait for import cluster template command to complete (4 retries left). fatal: [scm-server.eastus.companyname.com]: FAILED! => {"attempts": 8, "changed": false, "connection": "close", "content": "{\n \"id\" : 34,\n \"name\" : \"ClusterTemplateImport\",\n \"startTime\" : \"2020-01-17T00:20:36.163Z\",\n \"endTime\" : \"2020-01-17T00:27:04.770Z\",\n \"active\" : false,\n \"success\" : false,\n \"resultMessage\" : \"Failed to import cluster template.\",\n \"children\" : {\n \"items\" : [ {\n \"id\" : 46,\n \"name\" : \"First Run\",\n \"startTime\" : \"2020-01-17T00:27:03.941Z\",\n \"endTime\" : \"2020-01-17T00:27:04.765Z\",\n \"active\" : false,\n \"success\" : false,\n \"resultMessage\" : \"Failed to perform First Run of services.\"\n }, {\n \"id\" : 36,\n \"name\" : \"DeployParcels\",\n \"startTime\" : \"2020-01-17T00:20:36.446Z\",\n \"endTime\" : \"2020-01-17T00:27:00.061Z\",\n \"active\" : false,\n \"success\" : true,\n \"resultMessage\" : \"The Following parcels successfully activated : CDH-6.3.2-1.cdh6.3.2.p0.1605554.\",\n \"clusterRef\" : {\n \"clusterName\" : \"cluster_1\",\n \"displayName\" : \"cluster_1\"\n }\n } ]\n },\n \"canRetry\" : true\n}", "content_type": "application/json;charset=utf-8", "cookies": 
{"CLOUDERA_MANAGER_SESSIONID": "node0lu3fbcycum4o1wjvg7w5o2rjl16974.node0"}, "cookies_string": "CLOUDERA_MANAGER_SESSIONID=node0lu3fbcycum4o1wjvg7w5o2rjl16974.node0", "date": "Fri, 17 Jan 2020 00:27:43 GMT", "elapsed": 0, "expires": "Thu, 01 Jan 1970 00:00:00 GMT", "failed_when_result": true, "json": {"active": false, "canRetry": true, "children": {"items": [{"active": false, "endTime": "2020-01-17T00:27:04.765Z", "id": 46, "name": "First Run", "resultMessage": "Failed to perform First Run of services.", "startTime": "2020-01-17T00:27:03.941Z", "success": false}, {"active": false, "clusterRef": {"clusterName": "cluster_1", "displayName": "cluster_1"}, "endTime": "2020-01-17T00:27:00.061Z", "id": 36, "name": "DeployParcels", "resultMessage": "The Following parcels successfully activated : CDH-6.3.2-1.cdh6.3.2.p0.1605554.", "startTime": "2020-01-17T00:20:36.446Z", "success": true}]}, "endTime": "2020-01-17T00:27:04.770Z", "id": 34, "name": "ClusterTemplateImport", "resultMessage": "Failed to import cluster template.", "startTime": "2020-01-17T00:20:36.163Z", "success": false}, "msg": "OK (unknown bytes)", "redirected": false, "set_cookie": "CLOUDERA_MANAGER_SESSIONID=node0lu3fbcycum4o1wjvg7w5o2rjl16974.node0;Path=/;HttpOnly", "status": 200, "url": "http://scm-server.eastus.companyname.com:7180/api/v33/commands/34", "x_content_type_options": "nosniff", "x_frame_options": "DENY", "x_xss_protection": "1; mode=block"}
Yes, that is what it gives, \n's and all. Reading through it, I cannot figure out what is going wrong. The Cloudera server is running, but I do see some errors when looking through the log file (these occur hundreds of lines apart and are condensed here for reading):
2020-01-17 00:27:02,724 ERROR scm-web-475:com.cloudera.cmf.service.AbstractRoleHandler: Unable to generate configuration for GATEWAY base group
2020-01-17 00:27:02,725 WARN scm-web-475:com.cloudera.server.cmf.descriptor.components.DescriptorFactory: Could not generate client configs for service: YARN (MR2 Included) java.lang.RuntimeException: com.cloudera.cmf.service.config.ConfigGenException: Could not find JOBHISTORY dependent role
2020-01-17 00:27:02,848 WARN scm-web-475:com.cloudera.server.cmf.descriptor.components.DescriptorFactory: Could not generate client configs for service: Hive java.lang.RuntimeException: java.lang.RuntimeException: com.cloudera.cmf.service.config.ConfigGenException: Could not find JOBHISTORY dependent role
2020-01-17 00:31:43,798 INFO scm-web-494:com.cloudera.api.ApiExceptionMapper: Exception caught in API invocation. Msg:Role does not have a process. java.util.NoSuchElementException: Role does not have a process.
2020-01-17 00:31:43,885 WARN scm-web-494:com.cloudera.server.cmf.components.OperationsManagerImpl: Exception while building client config: java.lang.RuntimeException: com.cloudera.cmf.service.config.ConfigGenException: Could not find JOBHISTORY dependent role
2020-01-17 00:31:43,888 WARN scm-web-494:com.cloudera.api.ApiExceptionMapper: Unexpected exception. Msg:java.lang.IllegalStateException: Failed to create client configuration for service yarn java.lang.RuntimeException: java.lang.IllegalStateException: Failed to create client configuration for service yarn
2020-01-17 00:31:43,973 WARN scm-web-476:com.cloudera.server.cmf.components.OperationsManagerImpl: Exception while building client config: java.lang.RuntimeException: java.lang.RuntimeException: com.cloudera.cmf.service.config.ConfigGenException: Could not find JOBHISTORY dependent role
The list of these goes on for about another 30 errors, all mentioning JOBHISTORY. What am I doing wrong for this to occur? Do I need to be running the third name node?
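Not an answer to the JOBHISTORY question, but the command result quoted above is easier to read once it is parsed. The following is a minimal Python sketch that lists the command and the child steps that did not succeed. The sample data is a trimmed copy of the JSON in the error above (only the fields used here), and the helper name `failed_steps` is made up for illustration.

```python
import json

# Trimmed copy of the command result quoted above (only the fields used here).
COMMAND_RESULT = json.loads("""
{
  "id": 34,
  "name": "ClusterTemplateImport",
  "success": false,
  "resultMessage": "Failed to import cluster template.",
  "children": {"items": [
    {"id": 46, "name": "First Run", "success": false,
     "resultMessage": "Failed to perform First Run of services."},
    {"id": 36, "name": "DeployParcels", "success": true,
     "resultMessage": "The Following parcels successfully activated : CDH-6.3.2-1.cdh6.3.2.p0.1605554."}
  ]}
}
""")


def failed_steps(command):
    """Return (id, name, resultMessage) for the command and each child that did not succeed."""
    steps = [command] + command.get("children", {}).get("items", [])
    return [(s["id"], s["name"], s["resultMessage"]) for s in steps if not s.get("success")]


if __name__ == "__main__":
    for step_id, name, message in failed_steps(COMMAND_RESULT):
        print(f"{step_id}: {name}: {message}")
```

For this result it points at the "First Run" step (id 46) rather than at parcel deployment, which matches the log excerpts: the parcels activated, and the failure happened when the services were first started.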
... and therefore does not support Python 3, which is EOL soon.
What's worse is that because the plugin runs on the Ansible control node, it also means we don't support running Ansible itself with Python 3. We need to fix this code or replicate what it does without using the Python CM API client.
Error message:
TASK [scm : Extract the host identifiers and names into facts]
****************************************************
task path: /root/cloudera-playbook/roles/scm/tasks/main.yml:81
fatal: [...]: FAILED! => {
"msg": "An unhandled exception occurred while running the lookup plugin 'template'. Error was a <type 'exceptions.AttributeError'>, original message: 'VariableManager' object has no attribute '_loader'"
}
Version details:
ansible --version
ansible 2.9.0
config file = /root/.ansible.cfg
configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python2.7/site-packages/ansible
executable location = /usr/bin/ansible
python version = 2.7.5 (default, Apr 11 2018, 07:36:10) [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)]
Could be related to ansible/ansible#57437 ?
I'm using this playbook as a base for an offline, secured environment. From the way it works, it looks like it was designed only to deploy clusters; it does nothing to maintain them afterwards. Is there any chance it will be developed in that direction in the future? I'm thinking about features like:
- maintaining service configuration
- creating and maintaining host templates
- detecting changes in the code/templates and applying them
I'm currently working in this direction, but I'm not very efficient because I'm fairly new to Ansible and, to some extent, to the Cloudera API. Thanks.
Hi,
Does anyone have experience running the script on the latest version of Cloudera?
Thanks
At the moment there are NO tests at all, which makes it more difficult to refactor the code, to test changes, and thus to contribute.
My proposal is to change that, using the Molecule framework.
As a start, I have an initial version running in a Docker container (with systemd), single node first (multiple nodes are possible too).
You can check my branch: https://github.com/scigility/cloudera-playbook/tree/molecule_tests
Please note: this is not ready yet, but I wanted to inform the community about the initial work.
Error is -
fatal: [cld1.cisco.local]: FAILED! => {
"msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute u'cld5.cisco.local'\n\nThe error appears to be in '/root/cloudera-playbook/roles/cdh/tasks/main.yml': line 40, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n# https://www.cloudera.com/documentation/enterprise/latest/topics/install_cluster_template.html\n- set_fact:\n ^ here\n"
}
where cld1.cisco.local is the SCM host.
Hosts file used for this deployment -
[scm_server]
cld1.cisco.local
[db_server]
cld1.cisco.local
[krb5_server]
cld1.cisco.local
[utility_servers:children]
scm_server
db_server
krb5_server
[gateway_servers]
cld1.cisco.local host_template=HostTemplate-Gateway
#[edge_servers]
# host_template=HostTemplate-Edge role_ref_names=HDFS-HTTPFS-1
[master_servers]
cld5.cisco.local host_template=HostTemplate-Master1
cld6.cisco.local host_template=HostTemplate-Master2
cld7.cisco.local host_template=HostTemplate-Master3
[worker_servers]
cld2.cisco.local
cld3.cisco.local
cld4.cisco.local
[worker_servers:vars]
host_template=HostTemplate-Workers
[cdh_servers:children]
utility_servers
gateway_servers
master_servers
worker_servers
#[all:vars]
#ansible_user=ec2-user
Please let us know whether this issue has been seen before and whether any fixes or workarounds are available.
Thanks,
TASK [scm : set_fact] *************************************************************************************************************************************************************************
task path: /home/deploy/cloudera-playbook/roles/scm/tasks/cms.yml:9
fatal: [cdh1.dev]: FAILED! => {
"msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute u'cdh1.dev'\n\nThe error appears to be in '/home/deploy/cloudera-playbook/roles/scm/tasks/cms.yml': line 9, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- set_fact:\n ^ here\n"
}
I cannot seem to get around this:
TASK [cdh : Import cluster template] **************************************************************************************************************************************************************
fatal: [xxxxxxxxxxxxxxxxx]: FAILED! => {"changed": false, "connection": "close", "content": "{\n "message" : "Role configuration group reference in host template SPARK2_ON_YARN-1-SPARK2_YARN_HISTORY_SERVER-BASE is not valid.\nRole configuration group reference in host template SPARK2_ON_YARN-1-GATEWAY-BASE is not valid."\n}", "content_type": "application/json", "date": "Tue, 28 Apr 2020 18:30:08 GMT", "elapsed": 0, "expires": "Thu, 01-Jan-1970 00:00:00 GMT", "json": {"message": "Role configuration group reference in host template SPARK2_ON_YARN-1-SPARK2_YARN_HISTORY_SERVER-BASE is not valid.\nRole configuration group reference in host template SPARK2_ON_YARN-1-GATEWAY-BASE is not valid."}, "msg": "Status code was 400 and not [200]: HTTP Error 400: Bad Request", "redirected": false, "server": "Jetty(6.1.26.cloudera.4)", "set_cookie": "CLOUDERA_MANAGER_SESSIONID=qz8i0dbv9an36cbnv88y8xdk;Path=/;HttpOnly", "status": 400, "url": "http://xxxxxxxxxxxxxx:7180/api/v19/cm/importClusterTemplate?addRepositories=true"}
Hi,
I'm also facing the same issue: I get an error on the scm template task. I have changed the inventory file to use FQDNs and tried everything I could think of, but I couldn't figure out what the issue is. I would appreciate it if anyone could suggest a quick solution.
TASK [scm : Prepare CMS template] ****************************************************************************
task path: /root/cloudera-playbook/roles/scm/tasks/cms.yml:9
fatal: [server1.example.com]: FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: 'dict object' has no attribute u'server1.example.com'"}
==========================================================
Hosts:
[scm_server]
server1.example.com license_file=/path/to/cloudera_license.txt
[db_server]
server1.example.com
[utility_servers:children]
scm_server
db_server
[gateway_servers]
servergw.example.com host_template=HostTemplate-Gateway role_ref_names=HDFS-HTTPFS-1
[master_servers]
server1.example.com host_template=HostTemplate-Master1
[worker_servers]
server2.example.com
[worker_servers:vars]
host_template=HostTemplate-Workers
[cdh_servers:children]
utility_servers
gateway_servers
master_servers
worker_servers
[all:vars]
ansible_user=root
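The error says that a dict was indexed with the inventory hostname and had no such key. One plausible cause, which is not confirmed anywhere in this thread, is that the name used in the inventory differs from the name under which the host registered with Cloudera Manager. Below is a minimal Python sketch of that comparison. It assumes the hosts listing has the shape returned by the `/api/vNN/hosts` endpoint (an `items` list whose entries carry a `hostname` field); the function name and the sample values are hypothetical.

```python
def unregistered_hosts(inventory_hosts, hosts_response):
    """Return inventory hostnames that do not appear in the CM hosts listing."""
    registered = {item.get("hostname") for item in hosts_response.get("items", [])}
    return sorted(h for h in inventory_hosts if h not in registered)


if __name__ == "__main__":
    # Hypothetical sample: one agent registered under a short name,
    # while the inventory uses the FQDN; the gateway never registered.
    sample_response = {
        "items": [
            {"hostId": "id-1", "hostname": "server1"},
            {"hostId": "id-2", "hostname": "server2.example.com"},
        ]
    }
    inventory = ["server1.example.com", "server2.example.com", "servergw.example.com"]
    print(unregistered_hosts(inventory, sample_response))
```

Any name the function returns is one that a hostname-keyed lookup would fail on, so it is a quick way to check whether a mismatch is involved before digging into the role itself.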
@roczei - I'm executing the playbook from a local VM (a single-node VM with CM installed on it) and getting the error AnsibleUndefinedVariable: 'dict object' has no attribute u'xxx.xxx.xxx.xxx'
TASK [scm : Wait for agent heartbeats] *************************************************************************************************************************************************************************************************************************************************************************************************************************************
Pausing for 30 seconds
(ctrl+C then 'C' = continue early, ctrl+C then 'A' = abort)
Press 'C' to continue the play or 'A' to abort
ok: [xxx.xxx.xxx.xxx]
TASK [scm : Prepare CMS template] ****************************************************************************************************************************************
fatal: [xxx.xxx.xxx.xxx]: FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: 'dict object' has no attribute u'xxx.xxx.xxx.xxx'"}
Please note I'm using the IP address 127.0.0.1. Please guide me on how to use an FQDN instead of the IP address in this case.
The /etc/hosts file looks as below -
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost6.localdomain6 localhost6
The hosts (inventory) file looks as below -
[local]
127.0.0.1
[scm_server]
127.0.0.1
[db_server]
127.0.0.1
[utility_servers:children]
scm_server
db_server
#[krb5_server]
#127.0.0.1
As of Cloudera Manager 6.3 there is a simplified way to enable Kerberos via cluster templates, using configuration similar to this:
"instantiator": {
"clusterName": "test",
"enableKerberos": {
"datanodeTransceiverPort" : <optional/default 1004>,
"datanodeWebPort" : <optional/default 1006>
},
The playbook could expose this functionality.
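As a rough illustration of what exposing this might look like, here is a minimal Python sketch that builds the `instantiator` fragment shown above. Only the keys quoted in this issue are used; the function name and its parameters are made up for illustration, and how the playbook would actually render its template is not covered.

```python
import json


def instantiator_with_kerberos(cluster_name, transceiver_port=None, web_port=None):
    """Build the "instantiator" fragment with an "enableKerberos" section.

    Both ports are optional in the snippet above, so they are left out
    of the fragment when they are None.
    """
    kerberos = {}
    if transceiver_port is not None:
        kerberos["datanodeTransceiverPort"] = transceiver_port
    if web_port is not None:
        kerberos["datanodeWebPort"] = web_port
    return {"instantiator": {"clusterName": cluster_name, "enableKerberos": kerberos}}


if __name__ == "__main__":
    print(json.dumps(instantiator_with_kerberos("test", 1004, 1006), indent=2))
```

Leaving the ports out yields an empty `enableKerberos` object, which relies on the defaults mentioned in the snippet (1004 and 1006).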
After commenting out the Sentry services, the existing Ansible play fails on template import for CDH 7.1.4. The Ansible log has the following error -
"json": {
"active": false,
"canRetry": true,
"children": {
"items": [
{
"active": false,
"endTime": "2020-12-09T19:32:15.333Z",
"id": 52,
"name": "First Run",
"resultMessage": "Failed to perform First Run of services.",
"startTime": "2020-12-09T19:32:09.202Z",
"success": false
},
{
"active": false,
"clusterRef": {
"clusterName": "cdpsol-auto-cluster",
"displayName": "cdpsol-auto-cluster"
},
"endTime": "2020-12-09T19:31:59.777Z",
"id": 41,
"name": "DeployParcels",
"resultMessage": "The Following parcels successfully activated : CDH-7.1.4-1.cdh7.1.4.p0.6300266.",
"startTime": "2020-12-09T19:24:52.679Z",
"success": true
}
]
},
"endTime": "2020-12-09T19:32:15.335Z",
"id": 39,
"name": "ClusterTemplateImport",
"resultMessage": "Failed to import cluster template.",
"startTime": "2020-12-09T19:24:52.535Z",
"success": false
},
"msg": "OK (unknown bytes)",
"pragma": "no-cache",
"redirected": false,
"set_cookie": "SESSION=f5f6ea92-46d8-48a0-bd97-f9b6c5aeabf9;Path=/;HttpOnly",
"status": 200,
"url": "http://cdp-scmnode.cisco.local:7180/api/v42/commands/39",
"x_content_type_options": "nosniff",
"x_frame_options": "DENY",
The command status in the UI shows the error as: Command failed to run because service HUE has invalid configuration. First error : Expected dependency of type HIVE_ON_TEZ/HIVE_LLAP but is HIVE.
Screenshot attached.
If we accept the dialog that offers to fix this issue in the Cloudera Manager UI, the cluster deployment completes successfully.
Any pointers on how to fix this in the Ansible playbook code would be very helpful.
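To see which service the HUE entry in the template points at, a small check over the template JSON can help. The sketch below is Python and rests on an assumption about the layout: a top-level `services` list whose entries have `refName`, `serviceType` and `serviceConfigs`, with dependencies expressed as a `ref` to another service's `refName`. The sample template and the config name `hive_service` are hypothetical and not taken from this playbook.

```python
def dependency_types(template, service_type):
    """List (config name, referenced refName, referenced serviceType) for one service type."""
    type_by_ref = {s["refName"]: s["serviceType"] for s in template.get("services", [])}
    found = []
    for service in template.get("services", []):
        if service["serviceType"] != service_type:
            continue
        for config in service.get("serviceConfigs", []):
            ref = config.get("ref")
            if ref is not None:
                found.append((config["name"], ref, type_by_ref.get(ref)))
    return found


if __name__ == "__main__":
    # Hypothetical template fragment for illustration only.
    sample_template = {
        "services": [
            {"refName": "HIVE-1", "serviceType": "HIVE"},
            {
                "refName": "HUE-1",
                "serviceType": "HUE",
                "serviceConfigs": [{"name": "hive_service", "ref": "HIVE-1"}],
            },
        ]
    }
    print(dependency_types(sample_template, "HUE"))
```

If the output shows HUE referencing a service of type HIVE, that would match the UI message, and the template would need a HIVE_ON_TEZ (or HIVE_LLAP) service for HUE to reference instead.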
I am running the playbook from the CM host and getting the error below. I have tried a lot of things but could not figure out what is going wrong.
PLAY [Install CDH] ************************************************************************************************************************************
TASK [Gathering Facts] ********************************************************************************************************************************
ok: [leon-test-1.test.com]
TASK [cdh : include_vars] *****************************************************************************************************************************
ok: [leon-test-1.test.com]
TASK [cdh : include_vars] *****************************************************************************************************************************
ok: [leon-test-1.test.com]
TASK [cdh : include_vars] *****************************************************************************************************************************
ok: [leon-test-1.test.com]
TASK [cdh : include_vars] *****************************************************************************************************************************
ok: [leon-test-1.test.com]
TASK [cdh : Check whether cluster exists] *************************************************************************************************************
ok: [leon-test-1.test.com]
TASK [cdh : set_fact] *********************************************************************************************************************************
ok: [leon-test-1.test.com]
TASK [cdh : debug] ************************************************************************************************************************************
ok: [leon-test-1.test.com] => {
"msg": "Cluster 'cluster_1' exists - False"
}
TASK [cdh : Prepare cluster template] *****************************************************************************************************************
fatal: [leon-test-1.test.com -> localhost]: FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: 'ansible.parsing.yaml.objects.AnsibleUnicode object' has no attribute u'leon-test-1.test.com'"}
to retry, use: --limit @/root/shell/cluster-script/test/cloudera-playbook-master/cdh.retry
PLAY RECAP ********************************************************************************************************************************************
leon-test-1.test.com : ok=8 changed=0 unreachable=0 failed=1
hosts
[scm_server]
leon-test-1.test.com license_file=/path/to/cloudera_license.txt
[db_server]
leon-test-1.test.com
[krb5_server]
leon-test-1.test.com default_realm=
[utility_servers:children]
scm_server
db_server
krb5_server
[gateway_servers]
leon-test-1.test.com host_template=HostTemplate-Gateway role_ref_names=HDFS-HTTPFS-1
[master_servers]
leon-test-1.test.com host_template=HostTemplate-Master1
leon-test-2.test.com host_template=HostTemplate-Master2
leon-test-3.test.com host_template=HostTemplate-Master3
[worker_servers]
leon-test-3.test.com
leon-test-4.test.com
leon-test-5.test.com
[worker_servers:vars]
host_template=HostTemplate-Workers
[cdh_servers:children]
utility_servers
gateway_servers
master_servers
worker_servers
Hi everyone,
I'm currently working on a playbook in which I need to upload a script to my remote server.
`- name: set currentminio_port
  set_fact:
    curr_port: "{{ item.value.minio_port }}"

- name:
  debug:
    msg: "{{ install_dir }}/minio/bin/minio-{{ curr_port }}.sh"

- name: minio | Install | copy policy_file
  become: yes
  #no_log: true
  template:
    src: "{{ policy_file }}"
    dest: "{{ install_dir }}/minio/policies/minio_policy_{{ vm_role }}_{{ bucket_list }}-{{ curr_port }}.json"
    owner: "{{ user_app }}"
    group: "{{ user_app }}"
  when: policy_file is defined

- name:
  debug:
    msg: "item: {{ item }} \nitem.key: {{ item.key }}\nitem.value: {{ item.value }}\nitem.value.minio_port: {{ item.value.minio_port }}"

- name: minio | Install | upload minio script
  become: yes
  #no_log: true
  template:
    src: "minio.sh"
    dest: "{{ install_dir }}/minio/bin/minio-{{ item.value.minio_port }}.sh"
    owner: "{{ user_app }}"
    group: "{{ user_app }}"
    mode: "0750"`
It is the last template task that fails each time I launch the playbook.
I'm looping over a dictionary that looks like this:
ports:
  dumps:
    minio_port: 8100
    minio_console_port: 8101
  executables:
    minio_port: 8102
    minio_console_port: 8103
Finally, here is the log, so you can check that my variables are reachable.
`TASK [minio : debug] **************************************************************************************************************************************************************************************
ok: [Z36-DV-I1-PSQ01] =>
msg: /opt/minio/bin/minio-8100.sh
TASK [minio : minio | Install | copy policy_file] *********************************************************************************************************************************************************
ok: [Z36-DV-I1-PSQ01]
TASK [minio : debug] **************************************************************************************************************************************************************************************
ok: [Z36-DV-I1-PSQ01] =>
msg: |-
item: {'key': u'dumps', 'value': {u'minio_port': 8100, u'minio_console_port': 8101}}
item.key: dumps
item.value: {u'minio_port': 8100, u'minio_console_port': 8101}
item.value.minio_port: 8100
TASK [minio : minio | Install | upload minio script] ******************************************************************************************************************************************************
fatal: [Z36-DV-I1-PSQ01]: FAILED! => changed=false
msg: 'AnsibleUndefinedVariable: ''dict object'' has no attribute ''minio_port'''
PLAY RECAP ************************************************************************************************************************************************************************************************
Z36-DV-I1-PSQ01 : ok=28 changed=1 unreachable=0 failed=1 skipped=28 rescued=0 ignored=0 `
I hope someone has already experienced this issue and can help me.
Thanks in advance
Hi, I am setting up a cluster without a license. I managed to run the script, though I found a few issues in it, and I will share the details. To run without a license, I removed some lines of code from the script, because it included modules that require one, such as advanced reporting.
Can anybody suggest what changes need to be made to run the script without a license, i.e. in free mode?
As already mentioned in #40 (comment)
I found a number of earlier PRs/fixes that are no longer in master (and have been missing for too long).
I identified 5 LOST PRs, by looking at:
https://github.com/cloudera/cloudera-playbook/pulls?q=is%3Apr+is%3Aclosed+sort%3Aupdated-desc
(and analyzing the code)
#25, merged 20190827, 11h48
#28, merged 20190827, 11h58
#37, merged 20190828, 10h49
#38, merged 20190829, 19h14
#39, merged 20190830, 11h56
Today I'll focus on bringing back #28, which contains critical fixes required for an install happening today at a customer site where the install runs from AWX/Tower.
I can also work on bringing back the useful changes from PRs 37-39 (all by @dbeech), if @dbeech has no time.
Please note I am trying to install a single node cluster with all basic services.
I have deployed CDP 7.0.3 and CDH 7.0.3 with some tweaks to this code and the playbook, but it failed at the last step, importing the cluster template (the error is "Could not find Oozie Server for service oozie"). My issues are:
I can see that the services are deployed, but all of them are stopped and in an error state.
Four major services (Hue, Impala, Oozie and Spark) are not deployed with their server roles, e.g. the Oozie server.
I tried to install the services manually and add the Oozie server, but it does not let me do so.
However, I can see that the config for the Oozie database is already present.
I am attaching the cluster template i.e. http://x.x.x.x:7180/api/v40/cm/deployment
cdp_cluster_template.txt
I added the HDFS role later (it was not included in the template). Now it is asking me to deploy a minimum of 3 JournalNodes; I will figure that out.
Please see if we need to modify the cluster and service templates in the cdh role to fix these.
I just hope nobody used it and excluded it in the site.yml playbook (or by using tags, but not 'java' tag) ..
since JDK 8 has been the recommended version for some years now
As the idea here is not to use external Ansible roles (there is of course a large choice of those for Java/JDK installs), there are 2 good solutions:
JCE unlimited policy
tasks!)
Hi, I see the following error while running the playbook.
"msg": "No hosts defined in SCM"
It fails in the,
I tried looking at scm_hosts.py under action_plugin, but no luck. Can someone help with this, please?
The JCE installation code tries to check/edit the file $JAVA_HOME/jre/lib/security/java.security, but this file does not exist after installation of the java-11-openjdk-devel package. We need to ignore errors or skip that task.
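For context, JDK 9 and later dropped the `jre/` subdirectory and keep `java.security` under `conf/security`, which would explain why the path is missing with Java 11. As far as I know the unlimited crypto policy is also the default from JDK 9, so skipping the task there should be safe, but please verify that for your JDK build. A minimal Python sketch of locating the file across layouts follows; the function name is made up for illustration.

```python
import os


def find_java_security(java_home):
    """Return the path of java.security under java_home, or None if it is not found."""
    candidates = [
        os.path.join(java_home, "jre", "lib", "security", "java.security"),  # JDK 8 layout
        os.path.join(java_home, "lib", "security", "java.security"),  # JRE 8 layout
        os.path.join(java_home, "conf", "security", "java.security"),  # JDK 9+ layout
    ]
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


if __name__ == "__main__":
    # JAVA_HOME is read from the environment; an empty value simply yields None.
    print(find_java_security(os.environ.get("JAVA_HOME", "")))
```

In the role itself, the equivalent would be a `stat` check on the file before the edit task, with the edit skipped when the file is absent.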
Hi, this repo has been very stale, and I see that other contributors have forked the project and made some valuable changes. Would you be able to aggregate those changes and update this repository, so that there is a more central and trustworthy repository to refer to?
Thanks in advance! ;)
I have been working with this repo for the past couple days and have had a recurring issue that I cannot seem to solve. I am using Microsoft Azure as a server host, and I am trying to run this playbook without KRB5. I have followed the instructions given, and am receiving this error:
fatal: [business-scm-server.eastus.cloudapp.azure.com]: FAILED! => {"changed": false, "elapsed": 301, "msg": "Timeout when waiting for business-scm-server.eastus.cloudapp.azure.com:7180"}
I have checked the scm server to determine that cloudera is running, however I am unable to connect to it through browser. I am unable to determine the cause of this. Did I have to change something not in the documentation due to Azure?
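Since the service runs on the host but neither the browser nor Ansible can reach it, it may be worth checking whether port 7180 is reachable from the machine running the playbook at all. A firewall on the VM or a network rule in front of it would produce exactly this timeout, although that is only a guess here. A minimal Python sketch of such a check, with a made-up function name:

```python
import socket


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # Hostname taken from the error message above.
    print(port_is_open("business-scm-server.eastus.cloudapp.azure.com", 7180))
```

If this returns True on the server itself (against localhost) but False from the control node, the problem is in the network path rather than in the playbook.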
Hi, I am trying to install CDP 7.0.3 using this playbook, and I have managed to install it. However, the second-to-last step of the play, "Import cluster template", did not run successfully and failed with the error below. Could someone help me here, please?
java.lang.IllegalArgumentException: No service type 'SENTRY' available for cluster with version 'CDH 7.0.3'.
fatal: [52.61.239.57]: FAILED! => {"changed": false, "connection": "close", "content": "{\n \"message\" : \"Invalid role mapping for HDFS-HTTPFS-1. No role group configuration of type HTTPFS is present in the host template HostTemplate-Gateway.\"\n}", "content_type": "application/json;charset=utf-8", "date": "Mon, 30 Sep 2019 21:06:45 GMT", "elapsed": 0, "expires": "Thu, 01 Jan 1970 00:00:00 GMT", "json": {"message": "Invalid role mapping for HDFS-HTTPFS-1. No role group configuration of type HTTPFS is present in the host template HostTemplate-Gateway."}, "msg": "Status code was 400 and not [200]: HTTP Error 400: Bad Request", "redirected": false, "set_cookie": "CLOUDERA_MANAGER_SESSIONID=node07hsc8tsx0kuffp3j6ziceh437.node0;Path=/;HttpOnly", "status": 400, "url": "http://52.61.239.57:7180/api/v33/cm/importClusterTemplate?addRepositories=true", "x_content_type_options": "nosniff", "x_frame_options": "DENY", "x_xss_protection": "1; mode=block"}
I get an error on a temp file. When I toggle the value of pipelining in ansible.cfg, the error goes away and then comes back; I am not sure why.
fatal: [cloudmanager.ee-hadoop.com.au]: FAILED! => {"msg": "Failed to get information on remote file (/home/hadoop/tmp/scm.json): Sorry, try again.\n[sudo via ansible, key=bkfkxloiefebvjjruekluqqrpzaeymwo] password: \nsudo: 1 incorrect password attempt\n"}
From testing today we observed that sometimes the download from https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.48.zip worked and sometimes timed out. When it does work, it actually redirects to a CDN (https://cdn.mysql.com/Downloads/Connector-J/mysql-connector-java-5.1.48.zip).
I guess there could be some rate-limiting in place?
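Whatever the cause turns out to be, an intermittent timeout like this is usually handled by retrying with a growing delay. The following is a minimal Python sketch of that pattern. The helper name and its parameters are made up for illustration, and `fetch` stands for any callable that performs the download.

```python
import time


def fetch_with_retries(fetch, attempts=5, delay=2.0, backoff=2.0, sleep=time.sleep):
    """Call fetch() until it succeeds, waiting longer after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except OSError as error:  # urllib's URLError and socket timeouts derive from OSError
            last_error = error
            if attempt < attempts:
                sleep(delay)
                delay *= backoff
    raise last_error


# Example use (not executed here, since it needs network access):
#   import urllib.request
#   url = "https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.48.zip"
#   data = fetch_with_retries(lambda: urllib.request.urlopen(url, timeout=30).read())
```

In the playbook the same effect would normally come from `retries`, `delay` and `until` on the download task rather than from custom code.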
Although on first installation it starts OK, after restarting the server it assumes that mysql_pid_file dir will be at /var/run/mariadb instead of the one declared in the template.
On a CentOS 7.4 system, /roles/mariadb/tasks/main.yml fails when trying to start MariaDB after successfully creating the configuration file, the log file and the PID directory.
A manual yum install of mariadb-server works fine.
Logs attached:
logs.txt
job_35.txt
I set
krb5_kdc_type: none
in all
but I got:
TASK [scm : Update Cloudera Manager settings] ************************************************************
fatal: [ip-10-0-1-170.ap-southeast-2.compute.internal]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'krb5_server'\n\nThe error appears to be in '/home/cdh_terraform_aws/cloudera-playbook/roles/scm/tasks/scm.yml': line 4, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n# https://cloudera.github.io/cm_api/apidocs/v13/path__cm_config.html\n- name: Update Cloudera Manager settings\n ^ here\n"}
Everything else was straight from the master branch.
Any ideas?
Is there any intent or desire to support Ubuntu as a platform?
Services are not being assigned after playbook execution.
I suggest we use a well-established role from Ansible Galaxy to do this, instead of re-inventing the wheel. https://github.com/geerlingguy/ansible-role-postgresql
I have an issue related to set_fact on cms tasks:
TASK [scm : set_fact] **************************************************************************************************************************************************************************************************************
fatal: [admin-cdh-dev.intra.local]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute u'admin-cdh-dev.intra.local'\n\nThe error appears to be in '/home/ansible/playbooks/oat/roles/scm/tasks/cms.yml': line 9, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- set_fact:\n ^ here\n"}
PLAY RECAP *************************************************************************************************************************************************************************************************************************
admin-cdh-dev.intra.local : ok=67 changed=5 unreachable=0 failed=1 skipped=24 rescued=0 ignored=0
edge01-cdh-dev.intra.local : ok=42 changed=3 unreachable=0 failed=0 skipped=14 rescued=0 ignored=0
master01-cdh-dev.intra.local : ok=42 changed=3 unreachable=0 failed=0 skipped=14 rescued=0 ignored=0
master02-cdh-dev.intra.local : ok=42 changed=3
All nodes are registered with the Cloudera Manager server:
Login: Friday 22 January 2021 at 22:23:47 CET from master01-cdh-dev.intra.local on pts/7
[ansible@master01-cdh-dev ~]$ curl -s -H "Accept: application/json" -H "Content-Type: application/json" --user "admin:admin" http://admin-cdh-dev.intra.local:7180/api/v33/hosts
{
"items" : [ {
"maintenanceOwners" : [ ],
"hostId" : "1b7229a2-5ec3-4b65-990f-8011bbd079cf",
"ipAddress" : "10.181.24.153",
"hostname" : "admin-cdh-dev.intra.local",
"rackId" : "/default",
"hostUrl" : "http://admin-cdh-dev.intra.local:7180/cmf/hostRedirect/1b7229a2-5ec3-4b65-990f-8011bbd079cf",
"maintenanceMode" : false,
"commissionState" : "COMMISSIONED",
"numCores" : 2,
"numPhysicalCores" : 2,
"totalPhysMemBytes" : 8201801728
}, {
"maintenanceOwners" : [ ],
"hostId" : "b76b87ef-dfe2-488c-8305-3bc7d17658d1",
"ipAddress" : "10.181.24.151",
"hostname" : "edge01-cdh-dev.intra.local",
"rackId" : "/default",
"hostUrl" : "http://admin-cdh-dev.intra.local:7180/cmf/hostRedirect/b76b87ef-dfe2-488c-8305-3bc7d17658d1",
"maintenanceMode" : false,
"commissionState" : "COMMISSIONED",
"numCores" : 8,
"numPhysicalCores" : 8,
"totalPhysMemBytes" : 33566892032
}, {
"maintenanceOwners" : [ ],
"hostId" : "1a573cd3-e32a-4339-b588-956927be2d50",
"ipAddress" : "10.181.24.145",
"hostname" : "master01-cdh-dev.intra.local",
"rackId" : "/default",
"hostUrl" : "http://admin-cdh-dev.intra.local:7180/cmf/hostRedirect/1a573cd3-e32a-4339-b588-956927be2d50",
"maintenanceMode" : false,
"commissionState" : "COMMISSIONED",
"numCores" : 4,
"numPhysicalCores" : 4,
"totalPhysMemBytes" : 16657203200
}, {
"maintenanceOwners" : [ ],
"hostId" : "3fb6cb06-a560-4401-a854-0a82ab349cf8",
"ipAddress" : "10.181.24.146",
"hostname" : "master02-cdh-dev.intra.local",
"rackId" : "/default",
"hostUrl" : "http://admin-cdh-dev.intra.local:7180/cmf/hostRedirect/3fb6cb06-a560-4401-a854-0a82ab349cf8",
"maintenanceMode" : false,
"commissionState" : "COMMISSIONED",
"numCores" : 4,
"numPhysicalCores" : 4,
"totalPhysMemBytes" : 16657203200
}, {
"maintenanceOwners" : [ ],
"hostId" : "2805e250-f367-4a83-a003-201d850f0780",
"ipAddress" : "10.181.24.147",
"hostname" : "master03-cdh-dev.intra.local",
"rackId" : "/default",
"hostUrl" : "http://admin-cdh-dev.intra.local:7180/cmf/hostRedirect/2805e250-f367-4a83-a003-201d850f0780",
"maintenanceMode" : false,
"commissionState" : "COMMISSIONED",
"numCores" : 4,
"numPhysicalCores" : 4,
"totalPhysMemBytes" : 16657203200
}, {
"maintenanceOwners" : [ ],
"hostId" : "a397854a-9024-4cc2-9c8d-4574301622cc",
"ipAddress" : "10.181.24.152",
"hostname" : "utility-cdh-dev.intra.local",
"rackId" : "/default",
"hostUrl" : "http://admin-cdh-dev.intra.local:7180/cmf/hostRedirect/a397854a-9024-4cc2-9c8d-4574301622cc",
"maintenanceMode" : false,
"commissionState" : "COMMISSIONED",
"numCores" : 8,
"numPhysicalCores" : 8,
"totalPhysMemBytes" : 33566892032
}, {
"maintenanceOwners" : [ ],
"hostId" : "8009ebf7-5482-4ed3-8ab8-bd941d0033aa",
"ipAddress" : "10.181.24.148",
"hostname" : "worker01-cdh-dev.intra.local",
"rackId" : "/default",
"hostUrl" : "http://admin-cdh-dev.intra.local:7180/cmf/hostRedirect/8009ebf7-5482-4ed3-8ab8-bd941d0033aa",
"maintenanceMode" : false,
"commissionState" : "COMMISSIONED",
"numCores" : 8,
"numPhysicalCores" : 8,
"totalPhysMemBytes" : 33566892032
}, {
"maintenanceOwners" : [ ],
"hostId" : "01f4a1c0-e4ca-47c1-aafc-0825d69cfe94",
"ipAddress" : "10.181.24.149",
"hostname" : "worker02-cdh-dev.intra.local",
"rackId" : "/default",
"hostUrl" : "http://admin-cdh-dev.intra.local:7180/cmf/hostRedirect/01f4a1c0-e4ca-47c1-aafc-0825d69cfe94",
"maintenanceMode" : false,
"commissionState" : "COMMISSIONED",
"numCores" : 8,
"numPhysicalCores" : 8,
"totalPhysMemBytes" : 33566892032
}, {
"maintenanceOwners" : [ ],
"hostId" : "eb80d0f6-6a52-4eaa-a52c-31b71c34d84f",
"ipAddress" : "10.181.24.150",
"hostname" : "worker03-cdh-dev.intra.local",
"rackId" : "/default",
"hostUrl" : "http://admin-cdh-dev.intra.local:7180/cmf/hostRedirect/eb80d0f6-6a52-4eaa-a52c-31b71c34d84f",
"maintenanceMode" : false,
"commissionState" : "COMMISSIONED",
"numCores" : 8,
"numPhysicalCores" : 8,
"totalPhysMemBytes" : 33566892032
} ]
I am trying to find a logical explanation for this, but I am struggling to understand why it is blocking on the cms template creation.
By the way, my current setup sets krb5_kdc_type to none and removes krb5_server from the hosts. However, that led me to issue #70, and to solve it I ended up adding krb5_server back to the hosts while keeping krb5_kdc_type: 'none'.
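The set_fact failure ('dict object' has no attribute u'admin-cdh-dev.intra.local') means that some dictionary keyed by hostname did not contain the inventory hostname at the time the task ran. One way to narrow this down is to compare the inventory names against the hostnames returned by the /api/v33/hosts call. The sketch below assumes that the lookup is keyed by the `hostname` field of that response, which is a guess about the playbook's internals; the two mismatching inventory names are hypothetical examples of a short name and a case difference.

```python
import json


def map_hostnames_to_ids(hosts_response):
    """Build a hostname -> hostId dict from a parsed /hosts response."""
    return {
        item["hostname"]: item["hostId"]
        for item in hosts_response.get("items", [])
    }


def missing_inventory_hosts(inventory_hostnames, hosts_response):
    """Return inventory names that have no exact match in the API response."""
    known = map_hostnames_to_ids(hosts_response)
    return [name for name in inventory_hostnames if name not in known]


# Two entries trimmed from the API output shown above.
response = json.loads("""
{"items": [
  {"hostId": "1b7229a2-5ec3-4b65-990f-8011bbd079cf",
   "hostname": "admin-cdh-dev.intra.local"},
  {"hostId": "b76b87ef-dfe2-488c-8305-3bc7d17658d1",
   "hostname": "edge01-cdh-dev.intra.local"}
]}
""")

# Hypothetical inventory: one exact match, one short name, one case mismatch.
inventory = [
    "admin-cdh-dev.intra.local",
    "edge01-cdh-dev",
    "Admin-CDH-Dev.intra.local",
]

print(missing_inventory_hosts(inventory, response))
```

In the output above, admin-cdh-dev.intra.local does appear in the response, so an empty result from a check like this would mainly rule out a naming mismatch and point toward timing, for example the lookup being built before all agents had sent a heartbeat.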