choregraphie

choregraphie is French for choreography. By providing primitives to allow you to easily coordinate the convergence of Chef resources, choregraphie enables you to orchestrate the execution of actions that could cause downtime in clustered applications, among other things. For example, say you want to upgrade your Mesos cluster to the latest version but don't want to take the whole cluster offline. You could use an external orchestrator, but choregraphie means you can reduce the number of moving parts and keep all your logic and code in Chef.

By protecting your important resources with choregraphie, you can isolate risk to a single place, enabling much more controlled application of potentially dangerous changes.

Concepts

A protected resource is a resource whose convergence can induce downtime on the service. For instance, service[mydatabase] is usually a resource to protect.

A choregraphie describes actions that run on certain chef events. For instance, it can run an action before and after the convergence of a resource (currently, "after" means at the end of a successful run).

A primitive is a helper for common idioms in choregraphies. Examples: grabbing a lock, silencing the monitoring, executing a shell command.

Example

choregraphie 'my elasticsearch' do
  # protect against service and network restart
  on 'service[mydatabase]'
  on 'service[network]'

  # protect against all reboot resources
  on /^reboot\[/

  on :weighted_resources # compatibility with resource-weight cookbook

  # built-in primitive
  consul_lock(path: 'choregraphie/locks/myes', concurrency: 2)

  before do
    # roll your own code
    downtime_in_monitoring
  end
end
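
The `on` lines above take either an exact resource name or a regular expression. As a rough illustration of what such matching amounts to, here is a minimal sketch that assumes resources are identified by their `type[name]` string. The helper `protected?` is hypothetical and not part of the cookbook.

```ruby
# Hypothetical helper: does a resource identifier match any protection pattern?
# Patterns are exact strings or regular expressions, as in the example above.
def protected?(resource_id, patterns)
  patterns.any? do |pattern|
    case pattern
    when Regexp then pattern.match?(resource_id)
    when String then pattern == resource_id
    else false
    end
  end
end

patterns = ['service[mydatabase]', 'service[network]', /^reboot\[/]

puts protected?('service[network]', patterns) # exact string match
puts protected?('reboot[now]', patterns)      # regular expression match
puts protected?('package[vim]', patterns)     # matches nothing
```

Anchoring the regular expression with `^` keeps it from matching resources that merely contain `reboot[` somewhere in their name.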

Support

Only chef >= 12.6 is supported (due to a dependency on :before notifications).

Using the compat_resource cookbook is highly discouraged: it modifies chef behavior and has in the past silently broken :before notifications, which are the foundation of choregraphie. The 'criteo' branch in the criteo-forks organization is a safely patched version of this cookbook that avoids any chef monkeypatching.

Choregraphies can be applied only to resources that support whyrun (currently chef default resources and resource/provider-style resources). Custom resources (where the whole resource is defined in the resources/ directory) are not supported at the moment (see chef/chef#4537 for a discussion).

With chef > 14, it is no longer possible to hook on log resources.

Available Primitives

See the code for up-to-date information.

Four very basic primitives:

  • Before: before { ... } will execute code before protected resources are converged. The block will receive the converged resource as argument.
  • Cleanup: cleanup { ... } will execute code at the end of a successful chef-client run. The cleanup block will be executed at each chef-client run. This code should thus be efficient and safe to run at the end of all chef-client runs (for instance cleaning a file only if it exists).
  • After: after { ... } will execute code at the end of choregraphie. The after block will be executed at the end of choregraphie but might be executed after some chef-client run. It makes a best effort to avoid running when not necessary, but code run in after must still cope with unnecessary runs.
  • Finish: finish { ... } will execute code after cleanup stage. There can be only one finish block.
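
To make the ordering of these stages concrete, here is a toy model in plain Ruby. It is a sketch and not the cookbook's implementation: the class `ToyChoregraphie` and its `run` method are hypothetical, the convergence step is simulated with a log entry, and the `after` stage is left out because its exact trigger is not described here.

```ruby
# Toy model of the stages described above; not the cookbook's implementation.
class ToyChoregraphie
  attr_reader :log

  def initialize
    @before = []
    @cleanup = []
    @finish = nil
    @log = []
  end

  def before(&block)
    @before << block
  end

  def cleanup(&block)
    @cleanup << block
  end

  # Only one finish block is allowed.
  def finish(&block)
    raise ArgumentError, 'finish block already defined' if @finish

    @finish = block
  end

  # Simulate one run: before blocks, resource convergence, cleanup, then finish.
  def run(resource)
    @before.each { |blk| blk.call(resource) } # before blocks receive the resource
    @log << "converge #{resource}"
    @cleanup.each(&:call)
    @finish&.call
    @log
  end
end

toy = ToyChoregraphie.new
toy.before { |resource| toy.log << "before #{resource}" }
toy.cleanup { toy.log << 'cleanup' }
toy.finish { toy.log << 'finish' }
puts toy.run('service[mydatabase]').inspect
# prints ["before service[mydatabase]", "converge service[mydatabase]", "cleanup", "finish"]
```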

Slightly more advanced primitives:

  • CheckFile: check_file '/tmp/do_it' will wait until the given file exists on the filesystem. The file is removed afterwards.
  • WaitUntil: wait_until "ping -c 1 google.com" will wait until the command exits with a 0 status. This primitive supports strings, mixlib/shellout instances and blocks. The wait_until can be run in the "before" or "cleanup" stage using the options (see code for details).
  • ConsulLock: consul_lock(path: '/lock/my_app', id: 'my_node', concurrency: 5) will grab a lock from consul and release it afterwards. This primitive is based on optimistic concurrency rather than consul sessions. It uses the finish block to release the lock, ensuring that the lock release happens after all cleanup blocks. It is also possible to specify the :datacenter option to take the lock in another datacenter.
  • ConsulRackLock: consul_rack_lock(path: '/lock/my_app', id: 'my_node', rack: 'my_rack_id', concurrency: 2) will grab a lock from consul and release it afterwards. This has the same properties as ConsulLock but will also allow a node to enter if another node from the same rack already holds the lock. The concurrency level applies to the number of concurrent racks (not to concurrent nodes per rack).
  • ConsulMaintenance: consul_maintenance reason: 'My reason', token: 'foo' will enable maintenance mode on the consul agent before the choregraphie starts. consul_maintenance service_id: 'consul service_id', reason: 'My reason', token: 'foo' will enable maintenance mode on the consul service before the choregraphie starts.
  • ConsulHealthCheck: consul_health_check(checkids: %w(service:consul-http-agent service:myhealthcheck)) will block until the consul health checks are passing. By default it will wait for 150s before failing the chef run. The ids for checkids are composed of the check type and the id of the check (for example, for the service check myhealthcheck, the id is service:myhealthcheck). The option services can be passed instead of checkids to list all checks for the configured services.
  • EnsureChoregraphie: ensure_choregraphie will make sure that another choregraphie is already protecting the resources, or wait for a file (an optional file path can be provided). This primitive is useful for cookbook providers to make sure users will protect some critical resources.
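
ConsulLock is described above as relying on optimistic concurrency rather than sessions. The general idea is a read, modify, check-and-set loop: a write only succeeds if the key has not changed since it was read. The sketch below illustrates that idea against an in-memory stand-in. `FakeKV`, `acquire` and `release` are hypothetical names, and the layout of the stored value (in particular what each holder entry contains) is an assumption, not the cookbook's actual code.

```ruby
require 'json'

# In-memory stand-in for a key/value store with check-and-set semantics.
# Hypothetical: real code would talk to Consul, for example through diplomat.
class FakeKV
  def initialize
    @data = {}
  end

  # Returns [value, index]; index 0 means the key does not exist yet.
  def get(key)
    @data.fetch(key, [nil, 0])
  end

  # Writes only if the caller saw the latest index; returns true on success.
  def put_cas(key, value, index)
    _, current = get(key)
    return false unless current == index

    @data[key] = [value, current + 1]
    true
  end
end

# Try to add `id` to the holders of the lock at `path`, retrying on conflicts.
def acquire(kv, path, id, concurrency, retries: 5)
  retries.times do
    raw, index = kv.get(path)
    lock = raw ? JSON.parse(raw) : { 'version' => 1, 'concurrency' => concurrency, 'holders' => {} }
    return true if lock['holders'].key?(id)              # already held by this node
    return false if lock['holders'].size >= concurrency  # lock is full

    lock['holders'][id] = true # assumption: the stored holder value is arbitrary
    return true if kv.put_cas(path, JSON.generate(lock), index)
    # Another writer won the race: re-read and try again.
  end
  false
end

# Remove `id` from the holders, with the same check-and-set retry loop.
def release(kv, path, id, retries: 5)
  retries.times do
    raw, index = kv.get(path)
    return true unless raw

    lock = JSON.parse(raw)
    lock['holders'].delete(id)
    return true if kv.put_cas(path, JSON.generate(lock), index)
  end
  false
end

kv = FakeKV.new
puts acquire(kv, 'lock/my_app', 'node1', 2) # true
puts acquire(kv, 'lock/my_app', 'node2', 2) # true
puts acquire(kv, 'lock/my_app', 'node3', 2) # false, two holders already
release(kv, 'lock/my_app', 'node1')
puts acquire(kv, 'lock/my_app', 'node3', 2) # true, a slot was freed
```

Compared with session-based locks, this approach has no notion of a holder timing out: a holder that never releases keeps its slot until the entry is removed some other way.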

Note: all primitives interacting with consul require the diplomat gem. You can install it easily with the consul cookbook.

Missing Primitives

Write your own, it is easy.

How to write a primitive

You should have a look at the example primitives such as check_file.

Primitives can implement two callbacks: before and cleanup. See the Primitives section above for more details.
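
As an illustration of the two callbacks, here is a self-contained sketch of a primitive that registers one before block and one cleanup block. Everything in it is hypothetical: `StubChoregraphie` stands in for the object a primitive registers against, and `MarkerFile` with its `register` method is an invented example. Check the existing primitives such as check_file for the actual base class and method names.

```ruby
require 'fileutils'
require 'tmpdir'

# Minimal stand-in for the object a primitive registers against (hypothetical).
class StubChoregraphie
  attr_reader :before_blocks, :cleanup_blocks

  def initialize
    @before_blocks = []
    @cleanup_blocks = []
  end

  def before(&block)
    @before_blocks << block
  end

  def cleanup(&block)
    @cleanup_blocks << block
  end
end

# Hypothetical primitive: keeps a marker file while a protected resource changes.
class MarkerFile
  def initialize(path)
    @path = path
  end

  def register(choregraphie)
    path = @path
    # before: runs ahead of the protected resource and receives it as argument.
    choregraphie.before { |_resource| FileUtils.touch(path) }
    # cleanup: runs at the end of every run, so it must be safe when the file is absent.
    choregraphie.cleanup { FileUtils.rm_f(path) }
  end
end

Dir.mktmpdir do |dir|
  marker = File.join(dir, 'maintenance')
  chore = StubChoregraphie.new
  MarkerFile.new(marker).register(chore)

  chore.before_blocks.each { |blk| blk.call('service[mydatabase]') }
  puts File.exist?(marker) # true: the marker exists while the resource converges

  chore.cleanup_blocks.each(&:call)
  puts File.exist?(marker) # false: cleanup removed it
end
```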

choregraphie's People

Contributors

achamo, annih, antonofthewoods, brugidou, dlukman, jeremy-clerc, jmauro, kamaradclimber, komuta, mat-co, pierrecdn, pierresouchay, thomas-maurice, tionebsalocin, wdauchy

choregraphie's Issues

Chef v17 compatibility

Can we get this fixed?

  ArgumentError
  -------------
  tried to create Proc object without a block

  Relevant File Content:
  ----------------------
  /var/cinc/cache/cookbooks/choregraphie/libraries/dsl.rb:

    1:  module Choregraphie
    2:    module DSL
    3:      # DSL helper
    4:      def choregraphie(name)
    5>>       Choregraphie.add(Choregraphie.new(name, &Proc.new))
    6:      end
    7:    end
    8:  end
    9:

  System Info:
  ------------
  chef_version=17.9.26
  platform=centos
  platform_version=7.8.2003
  ruby=ruby 3.0.3p157 (2021-11-24 revision 3fb7d2cadc) [x86_64-linux]
  program_name=/opt/cinc/bin/cinc-client
  executable=/opt/cinc/bin/cinc-client
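
The error is consistent with a change in Ruby 3.0: calling `Proc.new` without a block, to capture the block passed to the enclosing method, is no longer supported and raises `ArgumentError`. One possible fix is to capture the block explicitly with `&block`. The sketch below shows the idea with a stand-in `Choregraphie::Choregraphie` class that contains only what the example needs. It is not the cookbook's real class, and whether the maintainers fixed it this way is not stated here.

```ruby
module Choregraphie
  # Stand-in for the cookbook's class; only what this sketch needs.
  class Choregraphie
    attr_reader :name, :block

    def self.all
      @all ||= []
    end

    def self.add(choregraphie)
      all << choregraphie
    end

    def initialize(name, &block)
      @name = name
      @block = block
    end
  end

  module DSL
    # DSL helper: capture the caller's block explicitly instead of using Proc.new.
    def choregraphie(name, &block)
      Choregraphie.add(Choregraphie.new(name, &block))
    end
  end
end

# Stand-in for a recipe context that has the DSL mixed in.
recipe = Object.new.extend(Choregraphie::DSL)
recipe.choregraphie('my elasticsearch') { :configured }

puts Choregraphie::Choregraphie.all.first.name # prints "my elasticsearch"
```

The explicit `&block` form also works on older Ruby versions, so the change does not require dropping support for them.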

Implement a rack consul lock.

Here is a proposed implementation:

  • Based on the current consul lock primitive mechanism
  • If choregraphie is of type "rack" add a field rack to the value
    {"version":1,"concurrency":30,"holders":{},"rack":""}
  • Take the lock if the rack field is empty or matches the node's rack, and the number of holders < concurrency
  • During lock release:
    1. Remove the host entry from the holders hash
    2. If holders is empty, set the rack field to an empty string

This ensures that protected resources are only converged on nodes from one rack at a time, but it does not ensure that all nodes of a rack are up to date before another rack is updated. It also makes it possible to cap the number of nodes in the same rack being updated in parallel.
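
The admission and release rules of this proposal can be written as two pure functions over the parsed lock value. This is only a sketch of the proposal as worded above: `try_enter` and `leave` are hypothetical names, the value stored for each holder is an assumption, and a real implementation would still need to write the result back with a check-and-set.

```ruby
require 'json'

# Returns the updated lock value if the node may enter, nil otherwise.
def try_enter(lock, node, node_rack)
  rack_ok = lock['rack'].empty? || lock['rack'] == node_rack
  return nil unless rack_ok && lock['holders'].size < lock['concurrency']

  # Assumption: the value stored per holder does not matter here.
  holders = lock['holders'].merge(node => true)
  lock.merge('rack' => node_rack, 'holders' => holders)
end

# Removes the node from the holders and clears the rack once nobody holds the lock.
def leave(lock, node)
  holders = lock['holders'].reject { |holder, _| holder == node }
  lock.merge('holders' => holders, 'rack' => holders.empty? ? '' : lock['rack'])
end

lock = JSON.parse('{"version":1,"concurrency":30,"holders":{},"rack":""}')

lock = try_enter(lock, 'node1', 'rack-a')
puts lock['rack']                               # rack-a
puts try_enter(lock, 'node2', 'rack-b').inspect # nil: another rack holds the lock

lock = try_enter(lock, 'node3', 'rack-a')       # same rack, allowed
lock = leave(lock, 'node1')
lock = leave(lock, 'node3')
puts lock['rack'].inspect                       # "": the lock is free for any rack
```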

Please let me know your thoughts.
