projectatomic / atomic-host-tests

A collection of single-host tests for Atomic Host

License: GNU General Public License v3.0

Python 24.12% Shell 23.03% Makefile 1.00% Dockerfile 36.59% Jinja 15.26%

atomic-host-tests's People

Contributors

ashcrow, cevich, cgwalters, chuanchang, dustymabe, guillaumevincent, jlebon, kbidarkar, miabbott, mike-nguyen


atomic-host-tests's Issues

refactor the 'improved-sanity-test' to run on only HEAD

Our testing strategy with the improved-sanity-test has always had a known drawback: if a problem exists in the HEAD-1 commit, the whole test fails and we miss out on testing the bits in the HEAD commit.

We've been able to live with this and work around it via manual testing, but this is not going to scale.

It's time to re-think the test suite to test only the HEAD commit. This will require the test suite to synthesize a commit to upgrade to, and it may not be a 'real-world' upgrade test, but I believe the trade-off is worth it for the coverage we gain.
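For illustration, a rough sketch of what synthesizing an upgrade target could look like (the repo path, the $REF variable, and the metadata key are all assumptions, not a settled design):

# create a synthetic child commit of the booted ref, then upgrade to it
# assumes $REF holds the booted refspec and the host repo lives at /ostree/repo
ostree commit --repo=/ostree/repo -b "$REF" --tree=ref="$REF" \
    --add-metadata-string=version=synthetic
rpm-ostree upgrade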

configure PAPR to run tests in PRs, not just 'improved-sanity-test'

Our PAPR implementation is pretty rudimentary: it runs the improved-sanity-test on multiple platforms, which is a start. But new PRs often introduce new roles/tests that aren't covered by the improved-sanity-test.

I'm thinking we need a helper script to determine whether a test has been changed or added, and then run just that test.

This might be a little tricky if the PR just modifies a role (although, I guess you could grep to see which tests use said role), but we can always fall back to just running the improved-sanity-test; see the sketch below.
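Something like this hypothetical helper could be a starting point (the $INVENTORY variable and the tests/<name>/main.yml layout are assumptions):

#!/bin/sh
# list the test directories touched by the PR; fall back to the sanity test
changed=$(git diff --name-only origin/master...HEAD | grep -o '^tests/[^/]*' | sort -u)
[ -z "$changed" ] && changed="tests/improved-sanity-test"
for t in $changed; do
    ansible-playbook -i "$INVENTORY" "$t/main.yml"
done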

Tagging @jlebon because I feel like he may have helped solve this problem elsewhere.

add tests for sssd container install/run

In the most recent RHELAH release, we got bit by a bug (RHBZ #1454292) where the sssd container would not start when using the atomic run command.

We should add a check to the improved-sanity-test that verifies atomic install and atomic run work with the sssd container. I think we can do this for only the HEAD version.

This would enable regular testing of the container on our continuous streams where atomic is built from git master.
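A minimal sketch of what the check might look like (the image name is an assumption; substitute whatever the sssd container is actually published as):

- name: Install sssd container
  command: atomic install registry.access.redhat.com/rhel7/sssd  # assumed image name
  register: sssd_install

- name: Run sssd container
  command: atomic run registry.access.redhat.com/rhel7/sssd  # assumed image name
  register: sssd_run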

update rpm-ostree status and verify roles to use json_query

The current rpm-ostree status and verify roles assume only two deployments when setting the booted deployment fact. With livefs, there can be three deployments, so it's probably a good idea to update these roles to use json_query.

It will probably look something like:


- name: Get rpm-ostree status output
  command: rpm-ostree status --json
  register: ros_status

- name: Import JSON
  set_fact:
    # parse the command's stdout, not the whole result object
    ros_json: "{{ ros_status.stdout | from_json }}"

- name: Get matching list entry
  set_fact:
    booted_deployment: "{{ item }}"
  # the deployments live under the top-level 'deployments' key;
  # 'booted' is a JSON boolean, so filter on truthiness rather than a string
  with_items: "{{ ros_json['deployments'] | json_query(query) }}"
  vars:
    query: "[?booted]"

Then the individual properties can be grabbed using booted_deployment['checksum'], booted_deployment['origin'], etc. This needs to remain compatible with the tests that already use the role, unless all of those tests are updated as well.

Branching strategy for supporting multiple versions of Ansible

As seen in #100, we will eventually be in a future where Ansible 2.1 is old and Ansible 2.2 is the default installed on all new hosts.

I wonder if a potential solution is to maintain multiple branches of the code, one for each version of Ansible we decide to support. Initially, we would have two branches: ansible_2_1 and ansible_2_2.

Would like to hear what other folks think about this idea.

callback plugin needs to support Ansible 2.2

It seems like fd39cc9 generates output that isn't very helpful when trying to figure out which test failed. Example snippet below:

changed: [vm1] => (item=/usr/bin/ostree)
changed: [vm1] => (item=/usr/bin/rpm-ostree)
changed: [vm1] => (item=/usr/libexec/rpm-ostreed)
fatal: [vm1]: FAILED! => {
    "changed": true, 
    "cmd": "docker rm -f $(docker ps -aq)", 
    "delta": "0:00:00.087934", 
    "end": "2017-03-06 03:09:19.331074", 
    "failed": true, 
    "rc": 1, 
    "start": "2017-03-06 03:09:19.243140", 
    "warnings": []
}

STDERR:
---
"docker rm" requires at least 1 argument(s).
See 'docker rm --help'.

Usage:  docker rm [OPTIONS] CONTAINER [CONTAINER...]

Remove one or more containers
---
...ignoring
ok: [vm1] => (item=docker)

openshift-ansible-test fails on Fedora 25 Atomic Host with OpenShift Origin 3.6

Aim: Trying to test the OpenShift Origin 3.6 release using the master branch of atomic-host-tests.

Environment: Fedora 25 Atomic Host on libvirt
image_src:
https://ci.centos.org/artifacts/fedora-atomic/f25/images/fedora-atomic-25.84-fe4aabcd9a1e012.qcow2

Command used:

ansible-playbook -vvvv -i /tmp/linchpintest/inventories/libvirt.inventory tests/openshift-ansible-test/main.yml

Inventory used:

[example]
192.168.124.44 hostname=192.168.124.44 ansible_ssh_user=admin ansible_ssh_private_key_file=/root/.ssh/ex ansible_become=true

[all]
192.168.124.44 hostname=192.168.124.44 ansible_ssh_user=admin ansible_ssh_private_key_file=/root/.ssh/ex ansible_become=true

Note: I have made minor changes to the cluster inventory, such as the version, image tag, and private key.

Output:

FAILED - RETRYING: HANDLER: openshift_master : Verify API Server (2 retries left).
FAILED - RETRYING: HANDLER: openshift_master : Verify API Server (1 retries left).
fatal: [192.168.124.44]: FAILED! => {
    "attempts": 120, 
    "changed": false, 
    "cmd": [
        "curl", 
        "--silent", 
        "--tlsv1.2", 
        "--cacert", 
        "/etc/origin/master/ca-bundle.crt", 
        "https://192.168.124.44:8443/healthz/ready"
    ], 
    "delta": "0:00:01.392229", 
    "end": "2017-07-19 20:17:21.601212", 
    "failed": true, 
    "rc": 0, 
    "start": "2017-07-19 20:17:20.208983", 
    "warnings": []
}

STDOUT:

{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "User \"system:anonymous\" cannot \"get\" on \"/healthz/ready\"",
  "reason": "Forbidden",
  "details": {},
  "code": 403
}

PLAY RECAP *********************************************************************
192.168.124.44             : ok=401  changed=70   unreachable=0    failed=1   
localhost                  : ok=11   changed=0    unreachable=0    failed=0   


Failure summary:

  1. Host:     192.168.124.44
     Play:     Configure masters
     Task:     openshift_master : Verify API Server
     Message:  ???
---

STDERR:
---
[DEPRECATION WARNING]: always_run is deprecated. Use check_mode = no instead..

This feature will be removed in version 2.4. Deprecation warnings can be 
disabled by setting deprecation_warnings=False in ansible.cfg.
[the warning above is repeated seven times in the log]
---
        to retry, use: --limit @/home/srallaba/workspace/venvs/libvirt/ansos/atomic-host-tests/tests/openshift-ansible-test/main.retry

Any help on why it's failing is highly appreciated.
Thanks

add AVC denial check to 'improved-sanity-test'

We should put some checks in the improved-sanity-test to look for AVC denials in the journal.

Offhand, I'd say do a check before and after every boot/reboot. This should let us catch any denials that happen during boot or any that were silently ignored before the system reboots.
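A minimal sketch of such a check (the grep pattern and failure handling are assumptions):

- name: Check journal for AVC denials
  shell: journalctl -b --no-pager | grep 'avc:  denied' || true
  register: avc_check

- name: Fail if any AVC denials were found
  fail:
    msg: "AVC denials found in the journal"
  when: avc_check.stdout != ""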

cc: @dustymabe

add a kubernetes test to 'sanity' suite

Our current set of tests in the 'sanity' suite does not include any kube tests.

We have separate test suites for Kube + OpenShift Origin, but they currently don't get run often enough to catch any regressions.

make updates regarding Vagrant boxes

We have a Vagrantfile in the repo and minimal documentation in the README about how to use it, but neither has been updated in a long while.

Both need to be updated to address the current streams we are supporting and the change to move away from the HEAD-1 requirement.

cleanup and consolidate roles

I was stepping (via --step) through the improved-sanity-test and examining the test itself by hand, and I realized that we have more than a few instances of duplication in our roles directory.

We should attempt to identify roles that can be combined and made more flexible via passed-in parameters; see the sketch after this list.

Things that jump right out:

  • docker_build_httpd & docker_build_tag_push
  • docker_pull_base_image & docker_pull_run_remove
  • docker_rm_httpd_container could be generalized
  • maybe some of the verify missing/present roles
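For example, several of the docker roles could collapse into one hypothetical parameterized role (the role and variable names are invented for illustration):

- role: docker_image  # hypothetical consolidated role
  vars:
    image: busybox
    run_container: true
    remove_after_run: true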

versioning scheme for tests

We have encountered situations where users have run the tests from git master against older releases and the tests do not run successfully. This is usually because the tests assume certain packages will be installed or certain files will be present.

This seems to push us in the direction of needing tags/releases for our tests.

The big question is how to set up a versioning scheme that works for the multiple streams we claim to support.

Do we use time-based releases? A monthly release would let us claim support for the AH releases that landed during that month.

Or do we tag according to well-defined AH releases? Tagging a release is relatively inexpensive, so we could create tags for each RHELAH release or each Fedora AH release.

Looking for ideas and input, please feel free to comment.

'_ansible_no_log' failure in the default callback plugin

I've been seeing this error while working on additional roles:

[WARNING]: Failure using method (v2_runner_on_unreachable) in callback plugin (<ansible.plugins.callback.default.CallbackModule object at 0x7f45e46a0b10>): '_ansible_no_log'

...and it appears to be hiding an underlying error in the Ansible role/task.

I edited callback_plugins/default.py and removed the check for _ansible_no_log, and the real error in my code was revealed:

The conditional check '{{ remote_name|length }} == 0' failed. The error was: error while evaluating conditional ({{ remote_name|length }} == 0): 'remote_name' is undefined

The error appears to have been in '/home/miabbott/workspaces/projectatomic/atomic-host-tests/roles/rpm_ostree_rebase/tasks/main.yml': line 47, column 5, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

- block:
  - name: Rebase to commit on refspec
    ^ here

In this case, the remote_name variable was not defined.

I'm not sure if this is something we can work around in the plugin.

i-s-t still relies on booted commit having a parent

coreos/rpm-ostree#899

TASK [Get checksum of HEAD-1] **************************************************
fatal: [testnode]: FAILED! => {"changed": true, "cmd": ["ostree", "rev-parse", "7a0b4ce81d82a736b4ad923451336e96b2cad5b300bcd18295bfd0860b14b449^"], "delta": "0:00:00.030643", "end": "2017-07-27 20:54:21.074234", "failed": true, "rc": 1, "start": "2017-07-27 20:54:21.043591", "stderr": "error: Commit 7a0b4ce81d82a736b4ad923451336e96b2cad5b300bcd18295bfd0860b14b449 has no parent", "stderr_lines": ["error: Commit 7a0b4ce81d82a736b4ad923451336e96b2cad5b300bcd18295bfd0860b14b449 has no parent"], "stdout": "", "stdout_lines": []}
	to retry, use: --limit @/var/tmp/checkout/atomic-host-tests/common/ans_ah_head-1_deploy.retry

(Full ephemeral logs available at: https://s3.amazonaws.com/aos-ci/ghprb/projectatomic/rpm-ostree/a35e47ae7c4764a54cd9851774bb4661dc74322f.3.1501188509707110020/output.log).

Choose version of Ansible to target

Currently, tests have been developed using Ansible 1.9.x because that was what we had been using with the original tests.

Ansible 2.x brings a number of new features and support for new modules; perhaps the most interesting is better support for Docker.

Support JSON output from 'rpm-ostree status'

When coreos/rpm-ostree#315 lands, we'll have the ability to consume the output from rpm-ostree status in JSON format.

There are a number of places in the existing tests that use awk to pull information out of the rpm-ostree status output, so we'll need to modify them to check for the --json option and act accordingly; a rough sketch follows the grep results below.

A quick git grep reveals:

$ git grep -l -e 'rpm-ostree status' --and -e 'awk'
common/collect_data.yaml
common/compare_version.yaml
common/multiple_rollback.yaml
common/multiple_rollback_reboot.yaml
tests/new-image-smoketest/main.yaml
tests/new-tree-smoketest/main.yaml
tests/rollback-interrupt/main.yaml
tests/upgrade-interrupt/main.yaml
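A rough sketch of the pattern (the awk fallback is illustrative, not the exact existing commands):

# use --json when the installed rpm-ostree supports it
if rpm-ostree status --json >/dev/null 2>&1; then
    status=$(rpm-ostree status --json)
    # parse fields from the JSON (e.g. via from_json/json_query in Ansible)
else
    status=$(rpm-ostree status)
    # keep the existing awk-based parsing as the fallback
fi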

Need some livefs tests

This probably could be an entire test suite by itself, but we should try to get some simple coverage into the sanity tests.
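A first smoke test might be as simple as the following (assuming the experimental 'rpm-ostree ex livefs' command; the package choice is arbitrary):

# layer a package and apply it live, without rebooting
rpm-ostree install strace
rpm-ostree ex livefs
rpm -q strace    # the package should now be visible on the live system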

Hook up to merge bot

We should hook up homu to this repo. The issue with "Squash and merge" is that you lose the semantics of separate commits and end up with a nonsensical commit message that joins all the messages. Messages from fixup! commits also get in, which is yucky.

Let's install a basic test suite that homu can gate on (I'm ignoring the existing PR tests we have now since, as I mentioned before, they're conceptually not really appropriate for gating PRs). And from there, hook up homu.

consider adding retry logic to operations that require external networks

We've seen failures in the tests due to connection timeouts or abrupt connection resets when the tests are doing things like atomic pull or docker pull.

We might be able to work around some of these transient failures by just using the retries parameter in the roles.

Example:

- name: Pull busybox
  command: atomic pull busybox
  register: pull
  retries: 6
  delay: 30
  until: pull.rc == 0

consolidate the 'set_fact' statements into a single role

We have a number of set_fact statements littered in the roles directory:

[~/workspaces/projectatomic/atomic-host-tests (master)*]$ git grep set_fact roles/ | wc -l
31

I'd like to get these consolidated into a single role, in order to eliminate hunting around for where a fact is set or should be set.

fix 'docker' test to handle additional 'docker-latest' requirements

As seen on CAHC, the current version of docker-latest (docker-latest-1.13-28.git6cd0bbe.el7.x86_64) has additional lines in /etc/sysconfig/docker that need to be uncommented for everything to work smoothly.

See below:

# docker-latest daemon can be used by starting the docker-latest unitfile.
# To use docker-latest client, uncomment below lines
#DOCKERBINARY=/usr/bin/docker-latest
#DOCKERDBINARY=/usr/bin/dockerd-latest
#DOCKER_CONTAINERD_BINARY=/usr/bin/docker-containerd-latest
#DOCKER_CONTAINERD_SHIM_BINARY=/usr/bin/docker-containerd-shim-latest
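A hypothetical task to uncomment those lines (the regexp is an assumption about the file's exact format):

- name: Enable docker-latest binaries in /etc/sysconfig/docker
  replace:
    dest: /etc/sysconfig/docker
    # strip the leading '#' from the DOCKER*BINARY=...-latest lines shown above
    regexp: '^#(DOCKER[A-Z_]*BINARY=.*-latest)$'
    replace: '\1'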

Need to use IP(s) rather than hostnames for reboot-based tests

Any test that requires a reboot (for example, pkg-layering) needs IP addresses rather than short hostnames in the inventory; otherwise it fails after the reboot.

TASK [reboot : restart hosts] **************************************************
ok: [host1]
ok: [host2]

TASK [reboot : wait for hosts to come back up (inventory_hostname)] ************
fatal: [host2 -> localhost]: FAILED! => {"changed": false, "elapsed": 120, "failed": true, "msg": "Timeout when waiting for host2:22"}
fatal: [host1 -> localhost]: FAILED! => {"changed": false, "elapsed": 120, "failed": true, "msg": "Timeout when waiting for host1:22"}
	to retry, use: --limit @/home/kdas/tmp/gotun/atomic-host-tests/tests/admin-unlock/main.retry

PLAY RECAP *********************************************************************
host1                      : ok=25   changed=15   unreachable=0    failed=1   
host2                      : ok=25   changed=15   unreachable=0    failed=1   

 [WARNING]: Consider using get_url or uri module rather than running curl
 [WARNING]: Consider using yum, dnf or zypper module rather than running rpm

If we use the IP directly instead of host1 or host2, then the tests pass successfully.

k8-cluster test is failing on Fedora 25 AH

I believe it is failing due to how IP addresses are dealt out to containers. We assume the DB is going to be at a specific address, and I don't think that is working any more.

fatal: [10.8.174.97]: FAILED! => {
    "changed": true, 
    "cmd": "curl http://localhost:80/cgi-bin/action | grep \"RedHat rocks\"", 
    "delta": "0:00:00.017701", 
    "end": "2017-03-21 16:43:31.794328", 
    "failed": true, 
    "rc": 1, 
    "start": "2017-03-21 16:43:31.776627", 
    "warnings": [
        "Consider using get_url or uri module rather than running curl"
    ]
}

STDERR:
---
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   527  100   527    0     0  49801      0 --:--:-- --:--:-- --:--:-- 52700
---

When I get on the system and try to curl that URL:

# curl http://localhost:80/cgi-bin/action
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>500 Internal Server Error</title>
</head><body>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error or
misconfiguration and was unable to complete
your request.</p>
<p>Please contact the server administrator at 
 root@localhost to inform them of the time this error occurred,
 and the actions you performed just before this error.</p>
<p>More information about this error may be available
in the server error log.</p>
</body></html>

Special case Fedora 26 selinux context for /usr/bin/docker-storage-setup

container-storage-setup is used in F26, which makes /usr/bin/docker-storage-setup a symlink with the SELinux context system_u:object_r:bin_t:s0. Dan Walsh said, "docker-storage-setup is a symbolic link to container-storage-setup, so bin_t is fine."

The selinux_verify role currently checks for container_runtime_exec_t, so there needs to be a special case for Fedora 26.
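A rough sketch of the special case (the fact name is invented):

- name: Set expected SELinux type for docker-storage-setup
  set_fact:
    # bin_t on Fedora 26+, container_runtime_exec_t everywhere else
    dss_selinux_type: "{{ 'bin_t' if ansible_distribution == 'Fedora' and ansible_distribution_major_version|int >= 26 else 'container_runtime_exec_t' }}"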

Explore logging all playbook output

I could imagine a scenario where, after a failure, one would like to see the output of all the roles/tasks leading up to it. In the current implementation/configuration of the playbooks, this is not possible because the STDOUT from all the tasks is kept hidden.

Alternatively, having the output logged to a file would let us export it to other locations for archival purposes, or feed it into something like Kibana for analysis.

It could be as simple as using the log_path variable in ansible.cfg to start; for example:
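A minimal example (the path is arbitrary):

# ansible.cfg
[defaults]
log_path = /var/log/ansible.log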

A more complete approach might be to use a callback plugin to log the output in a more human-readable form, as discussed here:

http://blog.cliffano.com/2014/04/06/human-readable-ansible-playbook-log-output-using-callback-plugin/

https://gist.github.com/dmsimard/cd706de198c85a8255f6

https://github.com/n0ts/ansible-human_log

https://github.com/jinesh-choksi/ansible-human_log

Make the k8-cluster test more generic

While the reviews continue on #30, I wanted to capture what I found about using upstream containers for Fedora/CentOS.

Google has a registry that has the necessary containers, but they don't tag their containers with latest like the rest of the world does (AFAIK).

Based on the comment at kubernetes/kubernetes#11751 (comment), you can curl a special URL that returns the latest stable release of the containers. Then, using that tag, you can pull the containers:

$ curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt
v1.2.4
$ docker pull gcr.io/google_containers/kube-apiserver:v1.2.4
Trying to pull repository gcr.io/google_containers/kube-apiserver ... v1.2.4: Pulling from google_containers/kube-apiserver
52becac66096: Pull complete 
Digest: sha256:32901915205c4c3be4960638f8f6440b0c924d8d925ad18694db8e4f00036ed0
Status: Downloaded newer image for gcr.io/google_containers/kube-apiserver:v1.2.4

cc: @mnguyen-gh @jlebon

use 'roles_path' to centralize roles rather than symlinks

We currently symlink to the root roles directory from the test directories that use roles. While this works, it has always seemed hacky.

The roles_path variable could be used to centralize the location of the roles. We would just need to make some adjustments to ansible.cfg on systems running the playbooks; for example:
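For example (the checkout path is a placeholder):

# ansible.cfg
[defaults]
roles_path = /path/to/atomic-host-tests/roles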

rerun integration test suites from some components

There's a natural tension between a project like this and the per-project test suites we have in atomic, rpm-ostree, etc., particularly because we aren't running this on upstream PRs.

Anyway, I think a relatively simple thing we could do to avoid duplication would be to run some of the upstream integration tests here as well. We'll need to be sure that the upstream suites don't actually rebuild the code; we just want to use them as a source of tests.
