clapper's People

Contributors

aathomas, anshulbehl, gaelrehault, johbro, jpeeler, jtaleric, knowncitizen, larsks, mandre, tomassedovic

clapper's Issues

--distribute isn't needed.

Just a small thing from the README:

`virtualenv --distribute .venv`

The distribute project has been merged into setuptools, so that option doesn't do anything anymore, and can be dropped.

Remove compute_node_connectivity.yaml

The validator calls "ip r" and parses the output for default routes. If you have multiple NICs, it fails because it tries to ping several IPs on a single command line.

This validator should be removed, as the test is already implemented in the TripleO Heat Templates anyway.
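To illustrate the failure mode described above, here is a minimal Python sketch, not the validator's actual code. It assumes "ip r" prints default routes in the usual "default via <gateway> dev <nic>" form. The function names and sample addresses are hypothetical. Each gateway gets its own ping command instead of all of them sharing one command line.

```python
def default_gateways(ip_route_output):
    """Return the gateway of every 'default via <gw> ...' line, without duplicates."""
    gateways = []
    for line in ip_route_output.splitlines():
        fields = line.split()
        # A default route line looks like: default via 192.0.2.1 dev eth0
        if len(fields) >= 3 and fields[0] == "default" and fields[1] == "via":
            if fields[2] not in gateways:
                gateways.append(fields[2])
    return gateways


def ping_commands(gateways, count=1):
    """Build one ping command (as an argument list) per gateway."""
    return [["ping", "-c", str(count), gw] for gw in gateways]


# Hypothetical output from a node with two NICs and two default routes.
sample = """default via 192.0.2.1 dev eth0
default via 198.51.100.1 dev eth1 metric 100
192.0.2.0/24 dev eth0 proto kernel scope link src 192.0.2.10
"""

for command in ping_commands(default_gateways(sample)):
    print(" ".join(command))
```

With two default routes this yields two separate ping invocations, which is the behaviour a multi-NIC node would need.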

Which is the correct port?

This isn't so much an issue as a question:

The doc in ansible-ports says that the API runs on port 5000, but the source of validation-api.py says it is port 5001. Both ports are a bit of trouble: 5000 is frequently used by Keystone, and 5001 is used as a workaround in some of the setup files for TripleO. Would it be too much trouble to change the default port to something much higher, like 45000?

Update the validation stages

Currently, we have three stages: pre-deployment, network validation and post-deployment.

However, some of the pre-deployment validations can (and should) be run before the undercloud is installed -- e.g. the hardware requirements.

So based on what we know now, I propose these stages (we need to come up with names for them):

  1. Before we even install the undercloud
    • validate the hardware
    • can be run from outside of the undercloud node
  2. Validate the undercloud installation
    • ping the gateway from undercloud.conf, check for rogue DHCP servers, etc.
  3. Before hardware discovery
    • validate instackenv.json, check ipmi connectivity
  4. After hardware discovery
    • check the discovered hardware for firmware differences etc.
  5. Before the overcloud deployment
    • check the network environment setup, verify the heat templates & parameters for typos, check connectivity of the compute nodes, etc.
  6. After the overcloud deployment
    • neutron-sanity-check, open file limits, rabbitmq connections, NTP, etc.

The UIs would then use Mistral to run these groups in the right places, but we would have a good story for the manual verification as well. E.g.:

"Before installing the undercloud, clone validations locally and run ansible-playbook -i hosts stages/undercloud-hardware.yaml. Then install the undercloud, run the hardware validations again as well as stages/undercloud-installation.yaml. Next, write the instackenv.json and run the validations/instackenv.yaml, etc...."

What do you think?
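As a rough illustration of the manual workflow quoted above, the following Python sketch builds the ansible-playbook command lines for each stage in order. Only the three playbook paths and the "hosts" inventory come from the proposal. The stage keys and function names are hypothetical, since the stage names are still to be decided.

```python
# Playbook paths quoted in the proposal; the stage keys are placeholder names.
STAGE_PLAYBOOKS = [
    ("undercloud-hardware", "stages/undercloud-hardware.yaml"),
    ("undercloud-installation", "stages/undercloud-installation.yaml"),
    ("instackenv", "validations/instackenv.yaml"),
]


def playbook_command(playbook, inventory="hosts"):
    """Return the ansible-playbook invocation for one playbook as an argument list."""
    return ["ansible-playbook", "-i", inventory, playbook]


def commands_up_to(stage):
    """Return the commands for every stage up to and including `stage`."""
    names = [name for name, _ in STAGE_PLAYBOOKS]
    if stage not in names:
        raise ValueError("unknown stage: %s" % stage)
    last = names.index(stage)
    return [playbook_command(path) for _, path in STAGE_PLAYBOOKS[: last + 1]]


# After installing the undercloud: rerun the hardware checks, then the installation checks.
for command in commands_up_to("undercloud-installation"):
    print(" ".join(command))
```

Keeping the stages in an ordered list makes it easy to rerun earlier stages along with the current one, as the quoted workflow suggests.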

Additional validations

Based on the set of tests I have created for internal testing, here's a list of infrastructure components I'd like to see validated:

  1. If the UC/OC were deployed with SSL, verify all endpoints are actually listening on SSL enabled ports
  2. check SELinux for errors on all nodes (I usually grep for AVC denials)
  3. check HAProxy (I usually curl the stat page and parse the output for errors)
  4. check Galera on all the nodes (mysql -e "SHOW STATUS LIKE 'wsrep%'"; and parse the output for problematic values: wsrep_local_state_comment must be in sync, wsrep_cluster_size must equal the number of controllers, wsrep_cluster_status, etc.)
  5. check pacemaker on all the relevant nodes (I look for failure messages in pcs status)
  6. check RabbitMQ (rabbitmqctl status and look for lines that start with "Error")
  7. check MongoDB
  8. Check Redis
  9. services status on all nodes. Will be even more relevant with composable roles, since the service list will match the service to node mapping in the deployment yaml
  10. check ceph (ceph health and ceph status are the commands I use)
  11. check keepalived (for the versions where it's relevant) - this one can be tricky since there is no status command; I had to parse the config file and then verify the IPs and services were actually there using nc/telnet/curl

I can share the code for my tests internally if that will help.
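As one example of what such a check could look like, here is a minimal Python sketch of the Galera item. It is not the internal test code mentioned above. It assumes the mysql client is run non-interactively, so the output is tab-separated with a "Variable_name / Value" header line. It also treats "Synced" and "Primary" as the healthy values of wsrep_local_state_comment and wsrep_cluster_status. The function names and sample output are hypothetical.

```python
def parse_wsrep_status(mysql_output):
    """Parse tab-separated `SHOW STATUS LIKE 'wsrep%'` output into a dict."""
    status = {}
    for line in mysql_output.splitlines():
        parts = line.split("\t")
        # Skip the header row and anything that is not a wsrep variable.
        if len(parts) == 2 and parts[0].startswith("wsrep_"):
            status[parts[0]] = parts[1].strip()
    return status


def galera_problems(status, expected_cluster_size):
    """Return a list of human-readable problems; an empty list means healthy."""
    problems = []
    if status.get("wsrep_local_state_comment") != "Synced":
        problems.append("node is not synced")
    if status.get("wsrep_cluster_status") != "Primary":
        problems.append("node is not part of the primary component")
    if status.get("wsrep_cluster_size") != str(expected_cluster_size):
        problems.append("cluster size does not match the number of controllers")
    return problems


# Hypothetical output from one controller in a three-controller deployment.
sample = (
    "Variable_name\tValue\n"
    "wsrep_local_state_comment\tSynced\n"
    "wsrep_cluster_status\tPrimary\n"
    "wsrep_cluster_size\t3\n"
)

print(galera_problems(parse_wsrep_status(sample), expected_cluster_size=3))
```

Returning a list of problems rather than a boolean lets a validation report every failed condition at once.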

tripleo-ansible-inventory.py - overcloud IPs from Nova rather than Heat

'tripleo-ansible-inventory.py' returns hosts with IP addresses on the tenant network. While this may be an issue of the particular hardware setup under test, the tenant network is not always reachable from the Undercloud.
If Nova is used to get the Overcloud nodes as in:

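    import os

    from novaclient import client

    # Authenticate against the undercloud using the usual OS_* variables
    nova = client.Client(2,
                         os.environ.get("OS_USERNAME"),
                         os.environ.get("OS_PASSWORD"),
                         os.environ.get("OS_TENANT_NAME"),
                         os.environ.get("OS_AUTH_URL"))
    # Map each server name to its first address on the ctlplane network
    print({server.name: server.networks['ctlplane'][0] for server in nova.servers.list()})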

and the IP addresses of the Overcloud nodes are pulled from the ctlplane (provisioning) network, these IPs will be reachable from the Undercloud by default.
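A minimal sketch of how such a name-to-networks mapping could be turned into Ansible's dynamic inventory JSON follows. It is not the actual tripleo-ansible-inventory.py code. The function name, the "overcloud" group name and the sample addresses are hypothetical, and the input dict stands in for the data Nova would return.

```python
import json


def build_inventory(servers, network="ctlplane", group="overcloud"):
    """Build an Ansible dynamic inventory dict from {server name: {network: [ips]}}."""
    hosts = []
    hostvars = {}
    for name in sorted(servers):
        addresses = servers[name].get(network) or []
        if not addresses:
            # Skip servers that have no address on the chosen network.
            continue
        hosts.append(name)
        # Point Ansible at the first address on the provisioning network.
        hostvars[name] = {"ansible_host": addresses[0]}
    return {group: {"hosts": hosts}, "_meta": {"hostvars": hostvars}}


# Hypothetical data in the same shape as server.networks from Nova.
servers = {
    "overcloud-controller-0": {"ctlplane": ["192.0.2.8"]},
    "overcloud-novacompute-0": {"ctlplane": ["192.0.2.9"]},
}

print(json.dumps(build_inventory(servers), indent=2))
```

Selecting the address by network name keeps the choice between the tenant network and ctlplane in one place.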
