acalhounrh / automated_ceph_test
A combination of Jenkins, Ceph benchmarking tools, and scripts that enables a user to automate the testing of Ceph.
License: GNU General Public License v3.0
In the log of prepare_agent.sh:
patching file ceph-linode/launch.sh
Reversed (or previously applied) patch detected! Assume -R? [n]
Apply anyway? [n]
Skipping patch.
2 out of 2 hunks ignored -- saving rejects to file ceph-linode/launch.sh.rej
We probably want to handle agent names either purely case-insensitively or purely case-sensitively. Otherwise you wind up with a master ~jenkins/agent_list like this one:
bash-4.4$ cat agent_list
bene_linode_jenkins_agent=li136-116.members.linode.com
Alex_Jenkins_Linode_Agent=li1032-230.members.linode.com
Bene_Linode_Jenkins_Agent=li1929-135.members.linode.com
The other scripts apparently pick up only the first match: the lookup failed to find the right address for my agent, whose entry was Bene_Linode_Jenkins_Agent=li1929-135.members.linode.com.
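A minimal sketch of the case-sensitive option, assuming agent_list keeps the NAME=HOST layout shown above. The function names lookup_agent and list_case_collisions are made up for illustration and are not part of the repo.

```shell
# Hypothetical helper: print the host for an agent whose name matches exactly
# (case-sensitive, whole name). Returns non-zero if the name is not listed.
lookup_agent() {
    agent_name=$1
    agent_file=$2
    awk -F= -v name="$agent_name" \
        '$1 == name { print $2; found = 1; exit } END { exit !found }' \
        "$agent_file"
}

# Hypothetical helper: report agent names that differ only by letter case,
# so that ambiguous entries can be cleaned up before they cause a mismatch.
list_case_collisions() {
    awk -F= '{
        key = tolower($1)
        if (key in first && first[key] != $1) { print first[key] " vs " $1 }
        else { first[key] = $1 }
    }' "$1"
}
```

With the three-line agent_list above, lookup_agent Bene_Linode_Jenkins_Agent agent_list would select only the third entry, and list_case_collisions agent_list would flag the two "bene" names.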
For reasons I can't explain, in 1-Deploy_Ceph_Cluster.sh a wildcard such as RHCEPH-*-x86_64-dvd.iso does not work for the ISO image name, but a non-wildcard name does :-( I would really like the wildcard to work because it saves time: you then only have to specify the URL of the directory containing the ISO image (plus version_adjust_repo and ansible_version).
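The actual cause is not established here. One guess, which is an assumption and not something confirmed in this report, is that the ISO is fetched over HTTP, where the client does not expand shell-style wildcards (wget documents globbing for FTP URLs only). Under that assumption, a possible workaround is to fetch the directory listing first and resolve the wildcard locally. The sketch below assumes an index page with one href="..." per line; pick_iso_from_index is a made-up name.

```shell
# Hypothetical helper: read an HTML directory index on stdin and print the
# first linked file name that matches the given shell wildcard.
pick_iso_from_index() {
    pattern=$1
    # Extract the href value from each line of the listing.
    sed -n 's/.*href="\([^"]*\)".*/\1/p' |
    while IFS= read -r name; do
        case $name in
            # Unquoted on purpose so the value is treated as a wildcard.
            $pattern) printf '%s\n' "$name"; break ;;
        esac
    done
}

# Possible use (the URL variable is illustrative):
#   iso_name=$(curl -s "$iso_dir_url/" | pick_iso_from_index 'RHCEPH-*-x86_64-dvd.iso')
#   wget "$iso_dir_url/$iso_name"
```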
The EPEL repo interferes with selection of the right ansible version, so I had to disable it before installing ansible. Also, in cases where multiple versions of ansible are supplied, we may have to name the ansible version explicitly, so I added a parameter for that (it does nothing if left blank). I am trying this change on my own agent.
[bengland@bene-laptop jobs]$ diff -u 1-Deploy_Linode_Ceph_Cluster.sh /tmp/ | more
--- 1-Deploy_Linode_Ceph_Cluster.sh 2018-11-09 12:49:30.983926816 -0500
+++ /tmp/1-Deploy_Linode_Ceph_Cluster.sh 2018-11-09 12:48:32.840526477 -0500
@@ -22,8 +22,19 @@
script_dir=$HOME/automated_ceph_test
inventory_file=$HOME/ceph-linode/ansible_inventory
-sudo yum remove ceph-ansible -y
+rm -fv $inventory_file $HOME/ceph-linode/LINODE_GROUP
+
+sudo yum remove -y ceph-ansible ansible
rm -rf /usr/share/ceph-ansible
+yum-config-manager --disable epel
+
+#specify version of ansible if necessary
+if [ -n "$ansible_version" ] ; then
+ yum install ansible-$ansible_version
+else
+ yum install ansible
+fi
+
cd $script_dir/staging_area/rhcs_latest/
new_ceph_iso_file="$(ls)"
@@ -60,9 +71,6 @@
sudo cp /usr/share/ceph-ansible/site.yml.sample /usr/share/ceph-ansible/site.yml
-#upgrade ansible if necessary
-yum upgrade -y ansible
-
#Start Ceph-linode deployment
cd $HOME/ceph-linode
echo "$Linode_Cluster_Configuration" > cluster.json
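The version selection in the diff above could also be written as a single expression. This is only a sketch: ansible_pkg_spec is a made-up helper name, and the -y and sudo in the usage comment are my assumption that the job runs non-interactively as a non-root user.

```shell
# Hypothetical helper: build the yum package spec from an optional version.
# An empty version yields plain "ansible", matching the "does nothing if left
# blank" behaviour described above.
ansible_pkg_spec() {
    version=$1
    printf 'ansible%s\n' "${version:+-$version}"
}

# Possible use in the deploy script:
#   sudo yum install -y "$(ansible_pkg_spec "$ansible_version")"
```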
During a run of 1-Deploy_Linode_Ceph_Cluster.sh I got this error, and I have no idea how the scripting could have caused it. It's an intermittent failure. Has anyone else seen it?
error: unpacking of archive failed on file /usr/lib/python2.7/site-packages/urllib3/packages/ssl_match_hostname: cpio: rename
Installing : python-chardet-2.2.1-1.el7_1.noarch 13/19
error: python-urllib3-1.10.2-5.el7.noarch: install failed
After recent updates, the wget command that retrieves the target ISO was removed from Deploy_Ceph_Cluster; this prevents the installation of ceph-ansible and Ceph.
Linode has updated its API, so the scripts need to be updated to support the latest version.
The following files and directories should not be in the git tree for automated_ceph_test on the master or remote agent. For example, you want to be able to tell if anything was changed in automated_ceph_test and it's harder if the software places files into the source tree. We could put them in /var/tmp or anywhere else.
$ git status
# On branch master
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# ansible_inventory
# staging_area/rhcs_latest/
# staging_area/rhcs_old/
# staging_area/tmp/
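One way to keep these out of the checkout, sketched under the assumption that /var/tmp is acceptable as suggested above. ACT_WORK_DIR and make_work_dirs are invented names, not something the project defines.

```shell
# Hypothetical layout: generated files live outside the git tree.
# ACT_WORK_DIR is an invented override variable for this sketch.
work_dir=${ACT_WORK_DIR:-/var/tmp/automated_ceph_test}

# Create the staging directories under the given base directory.
make_work_dirs() {
    base=$1
    mkdir -p "$base/staging_area/rhcs_latest" \
             "$base/staging_area/rhcs_old" \
             "$base/staging_area/tmp"
}

# Possible use:
#   make_work_dirs "$work_dir"
#   inventory_file=$work_dir/ansible_inventory
```

With this in place, git status in automated_ceph_test would only show real source changes.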
It updates agent_list, but the 1A-Retreive_RHCS_ISO_for_Linode job does not read that; it reads linode_agent_list. Why have two agent lists? Just have one; you can then tell that an agent is a Linode agent because its FQDN (fully qualified domain name) contains linode.com.
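The FQDN check could be as small as the sketch below. is_linode_agent is a made-up name, and the test here is "ends in .linode.com", which is slightly stricter than "contains linode.com".

```shell
# Hypothetical helper: succeed if the host name belongs to linode.com.
is_linode_agent() {
    case $1 in
        *.linode.com) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible use while reading the single agent_list (NAME=HOST per line):
#   while IFS='=' read -r name host; do
#       is_linode_agent "$host" && echo "$name is a Linode agent"
#   done < agent_list
```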
Both ansible-2.4.2.0-2.el7.noarch and ansible-2.7.1-1.el7.noarch complain about the ceph-linode/ansible_inventory file and won't run. However, if I remove the line "rgws" from the inventory file, it works. Adding a group "[rgws]" with no members works too. Here are the errors I get:
+ ansible -m shell -a 'yum install -y wget yum-utils' all
...
[WARNING]: * Failed to parse /root/ceph-linode/ansible_inventory with ini
plugin: /root/ceph-linode/ansible_inventory:16: Section [servers:children]
includes undefined group: rgws
[WARNING]: Unable to parse /root/ceph-linode/ansible_inventory as an inventory
source
...
and here's the inventory file I got from ceph-linode:
[mdss]
mds-000 ansible_ssh_host=192.168.197.73 ansible_ssh_port=22 ansible_ssh_user='root' ansible_ssh_private_key_file='/root/.ssh/id_rsa'
[clients]
client-000 ansible_ssh_host=192.168.134.43 ansible_ssh_port=22 ansible_ssh_user='root' ansible_ssh_private_key_file='/root/.ssh/id_rsa'
[mgrs]
mgr-000 ansible_ssh_host=192.168.213.41 ansible_ssh_port=22 ansible_ssh_user='root' ansible_ssh_private_key_file='/root/.ssh/id_rsa'
[osds]
osd-000 ansible_ssh_host=192.168.197.178 ansible_ssh_port=22 ansible_ssh_user='root' ansible_ssh_private_key_file='/root/.ssh/id_rsa'
[mons]
mon-000 ansible_ssh_host=192.168.138.154 ansible_ssh_port=22 ansible_ssh_user='root' ansible_ssh_private_key_file='/root/.ssh/id_rsa' monitor_address=192.168.138.154
[servers:children]
osds
mons
mgrs
rgws
clients
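The second workaround (an empty "[rgws]" group) can be applied automatically before ansible runs. This is a sketch only; ensure_inventory_group is a made-up name, and it simply appends the section header when no such header exists yet.

```shell
# Hypothetical helper: make sure an inventory file defines the given group,
# appending an empty "[group]" section if it is missing. Safe to call twice.
ensure_inventory_group() {
    inventory=$1
    group=$2
    # -x matches whole lines, -F treats the brackets literally.
    if ! grep -qxF "[$group]" "$inventory"; then
        printf '\n[%s]\n' "$group" >> "$inventory"
    fi
}

# Possible use:
#   ensure_inventory_group "$HOME/ceph-linode/ansible_inventory" rgws
```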
This script tries to get the client.admin keyring but it doesn't work for RHCS 3.1. ceph-ansible succeeded.
ceph_client_key=/ceph-ansible-keys/`ls /ceph-ansible-keys/ | grep -v conf`/etc/ceph/ceph.client.admin.keyring
cp $ceph_client_key /etc/ceph/ceph.client.admin.keyring
This doesn't work with RHCS 3.1. My suggestion is to copy your status check script over to $monname and run it there; then you don't need to install Ceph on the Jenkins agent at all, just ceph-ansible, right?
cp: cannot stat ‘/ceph-ansible-keys//etc/ceph/ceph.client.admin.keyring’: No such file or directory
Traceback (most recent call last):
File "/root/automated_ceph_test/scripts/utils/check_cluster_status.py", line 74, in <module>
main()
File "/root/automated_ceph_test/scripts/utils/check_cluster_status.py", line 11, in main
new_client = ceph_client()
File "/root/automated_ceph_test/scripts/utils/check_cluster_status.py", line 58, in __init__
logger.exception("Connection error: %s" % e.strerror )
AttributeError: 'PermissionError' object has no attribute 'strerror'
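If the keyring does have to be copied to the agent, searching for it is less fragile than assuming a fixed directory layout. The sketch below assumes only that the file is named ceph.client.admin.keyring somewhere under the fetch directory; find_admin_keyring is a made-up name.

```shell
# Hypothetical helper: locate the client.admin keyring anywhere under the
# ceph-ansible fetch directory and fail loudly if it is not there.
find_admin_keyring() {
    fetch_dir=$1
    keyring=$(find "$fetch_dir" -type f -name 'ceph.client.admin.keyring' 2>/dev/null | head -n 1)
    if [ -z "$keyring" ]; then
        echo "no client.admin keyring found under $fetch_dir" >&2
        return 1
    fi
    printf '%s\n' "$keyring"
}

# Possible use:
#   ceph_client_key=$(find_admin_keyring /ceph-ansible-keys) || exit 1
#   cp "$ceph_client_key" /etc/ceph/ceph.client.admin.keyring
```

Stopping when the keyring is missing would also avoid the later Python traceback, which only reports the missing permission indirectly.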
The script tries to clone a Red Hat-internal git repo and can't.
The reason it got this far is that the bash script doesn't exit if a yum command fails; it just keeps going.
In this case, that might have been the right outcome, because we probably don't need perf-dept repo unless you are on RH-internal servers, right? And what do we use these for? Is it running my ansible playbook to reset the HDDs? That wouldn't work on linode anyway.
So, two action items (AIs):
Conditionalize use of the perf-dept playbook so it runs only on RH-internal servers (i.e. hostname contains redhat.com).
Long-term: harden all our scripts to return a failure status if an error occurs.
Cloning into 'perf-dept'...
fatal: unable to access 'http://git.app.eng.bos.redhat.com/git/perf-dept.git/': Could not resolve host: git.app.eng.bos.redhat.com; Unknown error
/root/automated_ceph_test/setup.sh: line 71: cd: perf-dept/sysadmin/Ansible: No such file or directory
/root/automated_ceph_test/setup.sh: line 74: ansible-playbook: command not found
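Both action items can be sketched in a few lines. The helper names is_rh_internal and must are invented for illustration; putting set -e at the top of setup.sh would be a blunter alternative to wrapping individual commands.

```shell
# Hypothetical helper: succeed if the host name contains redhat.com.
is_rh_internal() {
    case $1 in
        *redhat.com*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: run a command and stop the script with that command's
# exit status if it fails, instead of silently carrying on.
must() {
    "$@" || {
        status=$?
        echo "FAILED ($status): $*" >&2
        exit "$status"
    }
}

# Possible use in setup.sh:
#   if is_rh_internal "$(hostname -f)"; then
#       must git clone http://git.app.eng.bos.redhat.com/git/perf-dept.git/
#       must cd perf-dept/sysadmin/Ansible
#   fi
```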