
projman's People

Contributors

jesusaurus, missaugustina, omgjlk

projman's Issues

Add Nodepool monitoring to Datadog

From @missaugustina on November 18, 2016 23:43

Currently we need to know whether nodepool is running. Nodepool provides monitoring data via statsd. We don't yet know what additional monitoring data we want.

Additionally: are nodepool images building?
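
Because nodepool reports over statsd, a first "is it running?" signal can be as crude as "have any nodepool.* metrics arrived recently?". The sketch below parses the plain-text statsd line format (`name:value|type[|@rate]`). The `nodepool.` prefix check and the metric names in the example are illustrative assumptions, not a list of what nodepool actually emits, and in practice the Datadog agent's own statsd listener would do this parsing for us.

```python
from typing import List, NamedTuple, Optional


class Metric(NamedTuple):
    name: str
    value: float
    kind: str                    # "c" counter, "g" gauge, "ms" timer, ...
    sample_rate: Optional[float]


def parse_packet(packet: str) -> List[Metric]:
    """Parse one statsd UDP payload: newline-separated name:value|type[|@rate]."""
    metrics = []
    for line in packet.strip().splitlines():
        if not line:
            continue
        name, _, rest = line.partition(":")
        fields = rest.split("|")
        value, kind = float(fields[0]), fields[1]
        rate = None
        for extra in fields[2:]:
            if extra.startswith("@"):
                rate = float(extra[1:])
        metrics.append(Metric(name, value, kind, rate))
    return metrics


def nodepool_alive(metrics: List[Metric], prefix: str = "nodepool.") -> bool:
    """Crude liveness signal: did this batch contain any nodepool metric?"""
    return any(m.name.startswith(prefix) for m in metrics)
```

A real check would apply `nodepool_alive` over a time window and alert when it stays false, rather than reacting to a single packet.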

Copied from original issue: BonnyCI/hoist#62

deploy user ssh keys need to live somewhere

From @gandelman-a on November 15, 2016 22:52

this could be part of BonnyCI/hoist#18

When provisioning a bastion from scratch, the deploy user (currently 'cideploy') needs ssh keys generated or installed so that it can reach the infrastructure it's managing (the zuul and nodepool nodes).

We can add this to secrets.yml (or whatever that turns into), but the cideploy user is managed as a normal bastion_user in roles/bastion/, not as a special-cased deploy user with ssh key management.

Copied from original issue: BonnyCI/hoist#33

Figure out production monitoring

From @missaugustina on November 23, 2016 21:41

What metrics and monitoring would this need in production?

So far we've come up with:

  • zuul: merger count
  • nodepool:
    • total instance count
    • ratio of (building + deleting) to (ready + in-use) instances
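
The two nodepool numbers above can be computed from a list of node states. This is a sketch; the state strings (`building`, `deleting`, `ready`, `in-use`) are taken from the wording of this issue and may not match the exact names nodepool reports.

```python
from collections import Counter
from typing import Iterable, List


def churn_ratio(states: Iterable[str]) -> float:
    """(building + deleting) / (ready + in-use) for a list of node states.

    Returns 0.0 for an empty pool and inf when nodes are churning but none
    are usable, so an alert threshold still fires in that case.
    """
    counts = Counter(states)
    churning = counts["building"] + counts["deleting"]
    usable = counts["ready"] + counts["in-use"]
    if usable == 0:
        return float("inf") if churning else 0.0
    return churning / usable


def total_instances(states: List[str]) -> int:
    """Total instance count, regardless of state."""
    return len(states)
```

A persistently high ratio would suggest nodes are being built and torn down faster than they are put to use.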

Copied from original issue: BonnyCI/hoist#102

Improve pip role

From @j2sol on November 14, 2016 20:55

There are a couple of issues with the pip role that should be addressed:

  • the pip install script is downloaded and executed without any validation
  • the pip install script is run on every playbook execution

We should discuss how to address these in the near future.
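
One possible fix for the first point, sketched in Python rather than as an Ansible task: pin a SHA-256 digest for the installer and refuse to run anything that doesn't match it. The function names and the idea of keeping the pinned digest in a role variable are assumptions, not existing hoist code. For the second point, Ansible's `command` and `shell` modules accept a `creates:` argument that skips the task when a given path already exists, which is one common way to stop the installer from running every time.

```python
import hashlib
import urllib.request


def verify_download(data: bytes, expected_sha256: str) -> bytes:
    """Return `data` unchanged if its SHA-256 matches the pinned digest, else raise."""
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256.strip().lower():
        raise ValueError(f"installer checksum mismatch: got {actual}")
    return data


def fetch_verified(url: str, expected_sha256: str) -> bytes:
    """Download the installer into memory and verify it before anything executes it."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return verify_download(resp.read(), expected_sha256)
```

Only after `fetch_verified` returns would the bytes be written to disk and run; a mismatch aborts the play instead of executing unknown code.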

Copied from original issue: BonnyCI/hoist#20

Figure out system log publishing

From @missaugustina on November 23, 2016 21:38

where are the logs for our stuff that's running?

  • system logs (ansible crons)
  • daemon logs (nodepool, zuul)
  • drop debug logs (too noisy)
  • Spike on aggregating logs on a single host
  • Consider an ELK stack in addition
  • Tune log parsing for our service logs
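
For the "drop debug logs" item, a sketch that assumes the daemons log through Python's standard `logging` module (nodepool and zuul are Python services): attach the publishing handler at INFO so DEBUG records never reach the aggregated logs. The logger name `nodepool` and the format string are illustrative choices, not the daemons' actual configuration.

```python
import logging


def configure_service_logging(stream) -> logging.Logger:
    """Send INFO and above from the 'nodepool' logger tree to `stream`.

    DEBUG records are filtered out by both the logger and the handler level.
    Calling this twice would attach a second handler, so call it once at startup.
    """
    handler = logging.StreamHandler(stream)
    handler.setLevel(logging.INFO)
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    logger = logging.getLogger("nodepool")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False   # keep these records out of the root logger
    return logger
```

Child loggers such as `nodepool.builder` inherit the INFO level, so their debug chatter is dropped as well.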

Copied from original issue: BonnyCI/hoist#99

need to validate deployment against a gerrit

From @gandelman-a on November 21, 2016 21:39

before messing with github integration, we need to point the CI deployment at an existing gerrit to validate basic functionality of v2.5: event processing, tests running, results posting, nodepool nodepooling.

nibz suggests we can use review.portbleu.com as a sandbox. We need a bot user created there and, preferably, a project to run against.

Copied from original issue: BonnyCI/hoist#75

zuul worker nodes fall off the radar

From @j2sol on November 30, 2016 21:24

Something we're doing in our fast and furious debugging is causing existing nodepool nodes that serve as zuul workers for given jobs (like echo-true) to fall off the radar. Nodepool knows about them, but zuul does not, and jobs sit indefinitely until we manually boot the nodepool node. We're not doing something right here.

Copied from original issue: BonnyCI/hoist#147

Get a Blue Box Cloud

We need a cloud to run our services on (public facing) as well as capacity for nodepool. Our options so far have not panned out; a blue-dollar Blue Box Cloud may be just what we need.

We have to figure out sizing of it first.

Improve security group settings

From @j2sol on November 22, 2016 21:55

Currently every CI node we create in create-hosts.yml is getting the default security group. Additionally the default group has been set to allow incoming port 22 and 80 across the board, which is inappropriate.

If we must have port 22 open, I propose we create a new "default" security group for our systems that opens 22, and create additional, node-specific security groups for any node running a service that must accept incoming connections from outside the private network.
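
A small audit helper could catch regressions once the groups are split up: flag any ingress rule that is open to the whole Internet on a port other than 22. The rule dictionaries are a simplification; `direction`, `remote_ip_prefix` and `remote_group_id` follow OpenStack Neutron's field names, but the single `port` key is a hypothetical stand-in for Neutron's `port_range_min`/`port_range_max`.

```python
import ipaddress
from typing import Dict, FrozenSet, Iterable, List


def overly_open_rules(
    rules: Iterable[Dict],
    allowed_public_ports: FrozenSet[int] = frozenset({22}),
) -> List[Dict]:
    """Return ingress rules reachable from anywhere on a non-allowed port."""
    flagged = []
    for rule in rules:
        if rule.get("direction") != "ingress":
            continue
        if rule.get("remote_group_id"):
            continue  # restricted to members of another group, not the Internet
        # A rule with no remote prefix and no remote group admits all sources.
        prefix = rule.get("remote_ip_prefix") or "0.0.0.0/0"
        is_world_open = ipaddress.ip_network(prefix).prefixlen == 0  # 0.0.0.0/0 or ::/0
        if is_world_open and rule.get("port") not in allowed_public_ports:
            flagged.append(rule)
    return flagged
```

Running this against the current default group would flag the port 80 rule described above.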

Copied from original issue: BonnyCI/hoist#91

Design sign up workflow

What will the user see and how will they sign up to use the service?

The original spec is that sign-up will be manual, just like Zuul.

Create a comprehensive onboarding document for contributors

Per issue #18, a README was created, but it is far from complete. To consider this issue resolved, the onboarding document must:

  • contain step-by-step instructions for creating a development environment

  • docs for setting up environment with a vm

  • docs for setting up environment with docker

  • explain the code submission and review process

  • provide guidance for code submissions

  • how to add a repository to BonnyCI

  • provide community details including where to talk about issues, community meetings, or other areas where project communication happens

  • be tested out by someone other than the committer

Nodes will eventually run out of IPv4 addresses

Options: an ssh proxy, a v6 tunnel (or non-v6 tunnel), or push for having enough public v4 addresses

  • upstream doesn't support running without public IPs everywhere, so IMO as long as it's super low cost we're fine to do it (see if config files or something super cheap works), but we shouldn't rely on anything requiring much effort, since that is going to make us diverge
  • please let's not rely on proxies if we can help it
  • someone should just figure out how easy it is to ssh proxy with paramiko, though, because that could solve this real quick
  • expense OVH nodes
  • currently using private keys

Test the CI

  • Syntax check via TravisCI
  • Zuul layout check
  • Run CI twice and check that the changed count is what we expect (might not be 0)
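
The "run twice" check needs the per-host `changed` counts from the second run. A sketch that reads them from the PLAY RECAP section that `ansible-playbook` prints at the end, then compares against an expected count per host (0 unless stated). Host names in the example are hypothetical; the 14 mirrors the system-ansible count reported in the next issue.

```python
import re
from typing import Dict

RECAP_LINE = re.compile(r"^(?P<host>\S+)\s*:\s*(?P<stats>(?:\w+=\d+\s*)+)$")


def parse_recap(output: str) -> Dict[str, Dict[str, int]]:
    """Extract per-host counters (ok, changed, failed, ...) from PLAY RECAP."""
    recap: Dict[str, Dict[str, int]] = {}
    in_recap = False
    for line in output.splitlines():
        if line.startswith("PLAY RECAP"):
            in_recap = True
            continue
        if not in_recap:
            continue
        m = RECAP_LINE.match(line.strip())
        if m:
            pairs = re.findall(r"(\w+)=(\d+)", m.group("stats"))
            recap[m.group("host")] = {k: int(v) for k, v in pairs}
    return recap


def unexpected_changes(recap: Dict[str, Dict[str, int]],
                       expected: Dict[str, int]) -> Dict[str, int]:
    """Hosts whose 'changed' count differs from the expected value (default 0)."""
    return {
        host: stats.get("changed", 0)
        for host, stats in recap.items()
        if stats.get("changed", 0) != expected.get(host, 0)
    }
```

An empty result from `unexpected_changes` on the second run is the pass condition; known, accepted changes go in `expected`.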

Random non-idempotent ansible tasks

As we are right now, every run of system-ansible has 14 changes at stable state, and cideploy has 5 (3 on one host, 2 on another). It'd be nice to get that down to zero.

I briefly enabled even more verbose logging; the logs are linked below and should help pin down the before/after state of each task that reports a change.

System Ansible: http://contrasjc-bastion.portbleu.com/cron-logs/system-ansible/ansible_bastion.yml_20161203031908.log

cideploy: http://contrasjc-bastion.portbleu.com/cron-logs/cideploy/ansible_install-ci.yml_20161203033129.log

Improve secrets management

From @j2sol on November 14, 2016 20:46

Currently we have a plain text secrets file that we hand copy from bastion to bastion. This is a bit scary as it sits in plain text on whatever filesystem it happens to be copied to.

I propose we instead encrypt this file with ansible vault. This allows the file to remain encrypted at rest, and we simply have to write a passphrase to the filesystem of the bastions so that cron executed jobs can read the file. Any backup we make of the file will be encrypted.
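
Since the point of the vault is that the secrets file is never plain text at rest, any backup or copy script could refuse to handle a file that lacks the ansible-vault header (vault-encrypted files begin with `$ANSIBLE_VAULT;`, followed by the format version and cipher). The helper names below are hypothetical; cron jobs would still read the file through `ansible-playbook --vault-password-file` pointing at the passphrase written on the bastion.

```python
VAULT_HEADER = b"$ANSIBLE_VAULT;"


def is_vault_encrypted(data: bytes) -> bool:
    """True if the content starts with an ansible-vault header line."""
    return data.lstrip().startswith(VAULT_HEADER)


def assert_encrypted_at_rest(path: str) -> None:
    """Raise before copying or backing up a secrets file that is still plain text."""
    with open(path, "rb") as f:
        head = f.read(64)
    if not is_vault_encrypted(head):
        raise RuntimeError(f"{path} is not ansible-vault encrypted; refusing to copy it")
```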

We'll set aside where to store/back up this file for a later discussion.

The new bastion flow would be:

  1. Use hoist to boot new instance with an ssh key you can use
  2. Log into new instance and put secrets file in place and write passphrase to expected path
  3. execute remainder of hoist on the bastion, enabling it to self-update

As new people are onboarded, we can communicate the vault passphrase to them. As people leave, we can rotate the secrets inside the vault and rekey the vault itself, communicating the new passphrase to those who remain.

Part of this is we will need to track all the secrets and use Ansible to write them out to the filesystem, rather than hand copying a bunch of files.

  • deploy ssh keys

  • clouds.yaml

Copied from original issue: BonnyCI/hoist#18

REMAINING WORK:

  • Back it up
  • Track changes (at least indicate when it was edited and what was changed, even if old value is not tracked)

Document Planning Meeting Outcome

We spent Nov 7-10 together talking about our team and what we want to build. Many notes were taken, and even though most folks are pretty clear on what we're doing, there still seems to be some confusion. To address that, we should put together an official BonnyCI MVP document that highlights the key points of these discussions and use it as a basis for future discussions.

Zuul v3: Fix Tests

Zuul v3 has tests but they don't work. The first step in understanding Zuul v3 and fixing the tests is to group the current tests by feature. This will allow us to more obviously identify coverage gaps, audit redundancies, and enable a more focused deep dive into the Zuul code base. Once we've grouped the tests, individuals can assign themselves test groups to dive in on.

  • Organize tests into groups
  • Create storyboard tickets for the test groups

Hoist -> GHE

Move hoist to ghe

  • add API key to Hoist secrets so we can clone
  • need to figure out how to link issues and metadata (auggy)
  • delete from github

First-time setup of inventory hosts from bastion cron causes failure (known_hosts)

From @rattboi on November 16, 2016 2:13

When ansible on a new bastion connects for the first time to a host in the inventory, the known_hosts file has no entry for that host, so the connection stops to ask interactively whether you accept the host key.

Since this is from cron, the interactive aspect fails, and so the hosts are never properly registered.

Note that this is default behavior in ansible.

More info here: http://docs.ansible.com/ansible/intro_getting_started.html#host-key-checking
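
The linked page describes turning host key checking off with the `ANSIBLE_HOST_KEY_CHECKING` environment variable (or `host_key_checking` in ansible.cfg). A sketch of a cron wrapper that sets it only for the cron-driven run; the inventory path is hypothetical, and `install-ci.yml` is taken from the cideploy log name. Note the trade-off: this disables protection against host impersonation, so pre-seeding known_hosts during provisioning (for example with `ssh-keyscan`) is the stricter alternative.

```python
import os
import subprocess
from typing import Dict, List, Tuple


def cron_ansible_invocation(playbook: str, inventory: str) -> Tuple[List[str], Dict[str, str]]:
    """Build the command and environment for a non-interactive ansible run."""
    env = dict(os.environ)
    # Without this, ansible waits for a host key confirmation that cron can't give.
    env["ANSIBLE_HOST_KEY_CHECKING"] = "False"
    cmd = ["ansible-playbook", "-i", inventory, playbook]
    return cmd, env


def run_from_cron(playbook: str, inventory: str) -> None:
    cmd, env = cron_ansible_invocation(playbook, inventory)
    subprocess.run(cmd, env=env, check=True)
```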

Copied from original issue: BonnyCI/hoist#38

Validate BonnyCI Deployment

  • forked Zuul with Github integration patches applied

  • runs automatically when PR submit/reopen

  • Gerrit sandbox no-op

  • Github sandbox no-op

  • Github echo true

  • Redeploy from scratch

  • Hoist on BonnyCI

  • Github repo: Basic Python repo with tests

  • Github repo: Python Crypto

  • Github repo: Basic repo, non-python

Document system design and architecture

Document the system workflow, where things are, and where to look for issues. (It's almost time to design the user experience; we should write down what we want, not what we do.)

solve ability to define jobs in-repo

Right now any project that gets gated needs to have its gate tests defined in hoist, but we want each project to define its own tests, similar to the .travis.yml model.

Someone who knows Zuul needs to review our Gearman implementation

From @missaugustina on November 23, 2016 21:49

The gearman-job-server role runs gearman as a standalone daemon, outside of zuul, so that we can monitor it, instead of having zuul start it. The justification is that monitoring mergers requires checking how many are listening; if zuul controls the gearman server, that monitoring could be flaky.
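
"How many mergers are listening" can be read from the gearman server's text admin protocol: the `status` command returns one tab-separated line per registered function (name, total queued, running, available workers), terminated by a line containing a single `.`. A sketch, assuming the default gearman port 4730 and assuming zuul mergers register a function named `merger:merge`; both should be checked against our deployment.

```python
import socket
from typing import Dict, Tuple


def parse_status(text: str) -> Dict[str, Tuple[int, int, int]]:
    """Map function name -> (total queued, running, available workers)."""
    result: Dict[str, Tuple[int, int, int]] = {}
    for line in text.splitlines():
        if line == ".":
            break
        parts = line.split("\t")
        if len(parts) != 4:
            continue
        name, total, running, workers = parts
        result[name] = (int(total), int(running), int(workers))
    return result


def fetch_status(host: str = "localhost", port: int = 4730, timeout: float = 5.0) -> str:
    """Send the admin 'status' command and read until the terminating '.' line."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"status\n")
        buf = b""
        while not (buf == b".\n" or buf.endswith(b"\n.\n")):
            chunk = sock.recv(4096)
            if not chunk:
                break
            buf += chunk
    return buf.decode()


def merger_count(status: Dict[str, Tuple[int, int, int]], function: str = "merger:merge") -> int:
    """Number of workers currently registered for the merger function."""
    return status.get(function, (0, 0, 0))[2]
```

Because this talks to the standalone daemon directly, it keeps working even if zuul itself is restarting, which is the monitoring argument made above.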

There is a possible issue with gearman-job-server vs. geard. Geard doesn't work the way zuul expects, so we need to use the python one.

Question: can you tell zuul to use a separate gear? It's been working so far...

Copied from original issue: BonnyCI/hoist#104
