
A PaaS built on top of Amazon EC2 Container Service (ECS)

License: BSD 2-Clause "Simplified" License


Empire

Empire is a control layer on top of Amazon EC2 Container Service (ECS) that provides a Heroku-like workflow. It conforms to a subset of the Heroku Platform API, which means you can use the same tools and processes that you use with Heroku, but with all the power of EC2 and Docker.

Empire is targeted at small- to medium-sized startups that run a large number of microservices and need more flexibility than Heroku provides. You can read the original blog post about why we built Empire on the Remind engineering blog.

Quickstart

Install

To use Empire, you'll need to have an ECS cluster running. See the quickstart guide for more information.

Architecture

Empire aims to make it trivially easy to deploy a container-based microservices architecture, without all of the complexity of managing systems like Mesos or Kubernetes. ECS takes care of much of that work, but Empire enhances the interface to ECS for deploying and maintaining applications, letting you deploy Docker images as easily as:

$ emp deploy remind101/acme-inc:master

Heroku API compatibility

Empire supports a subset of the Heroku Platform API, which means any tool that uses the Heroku API can probably be used with Empire, if the endpoint is supported.

As an example, you can use the hk CLI with Empire like this:

$ HEROKU_API_URL=<empire_url> hk ...

However, you'll get the best experience by using the emp command, a fork of hk with Empire-specific features.

Routing

Empire's routing layer is backed by internal ELBs. Any application that specifies a web process will get an internal ELB attached to its associated ECS Service. When a new version of the app is deployed, ECS manages spinning up the new versions of the process, waiting for old connections to drain, then killing the old release.

When a new internal ELB is created, an associated CNAME record will be created in Route53 under the internal TLD, which means you can use DNS for service discovery. If we deploy an app named feed then it will be available at http://feed within the ECS cluster.

Apps are exposed only internally by default, unless you add a custom domain to them. Adding a custom domain creates a new external ELB for the ECS service.

Deploying

Any tagged Docker image can be deployed to Empire as an app. Empire doesn't enforce how you tag your Docker images, but we recommend tagging the image with the git sha that it was built from (or any other immutable identifier), and deploying that.

When you deploy a Docker image to Empire, it will extract a Procfile from the WORKDIR. Like Heroku, you can specify different process types that compose your service (e.g. web and worker), and scale them individually. Each process type in the Procfile maps directly to an ECS Service.
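As an illustration, a Procfile for a service with both process types might look like this (the commands are hypothetical):

```
web: ./bin/web -port=$PORT
worker: ./bin/worker
```

Each entry would map to its own ECS Service, scaled independently with emp scale.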

Contributing

Pull requests are more than welcome! For help with setting up a development environment, see CONTRIBUTING.md.

Community

We have a google group, empire-dev, where you can ask questions and engage with the Empire community.

You can also join our Slack team for discussions and support.

Auth Flow

The current authentication model used by emp login relies on a deprecated GitHub endpoint that is scheduled to be deactivated in November 2020. Therefore, both the client and the server need to be updated to support the web authentication flow.

The web flow works like this:

  1. The user runs a command like emp web-login
  2. The client starts up an HTTP listener on a free local port
  3. The client opens a browser window on the local machine to $EMPIRE_API_URL/oauth/start?port=?????
    • The port parameter specifies where the client is listening
  4. The browser executes a GET against the URL
  5. The Empire server sees the request and constructs an OAuth request URL that will hit the GitHub OAuth endpoint and returns it as a redirect
  6. The browser makes the request to the GitHub auth endpoint, which prompts the user to authorize the application
    • If they've previously authorized it, the request is granted immediately
  7. GitHub redirects the browser back to the redirect URL specified in the configuration, meaning back to the Empire server
  8. The Empire server receives the browser request and can now perform the code exchange to turn the provided code into an actual authentication token
    • This token is just like the one it would have received from the old endpoint. However, it's not usable yet, because it's in the possession of the browser, not the client
  9. The Empire server now redirects the browser back to localhost on the original port provided by the client
  10. The client receives the token, but can't use it directly. The Empire server expects it to be wrapped in a JSON Web Token that only the server can create.
  11. The client can now make a request directly to the Empire server (its first in this sequence) providing the token and requesting a JSON Web Token in response
  12. The client stores the received token just as it would have with the response to an emp login command
  13. The client is authenticated

In theory the Empire server could construct the JWT directly after the code exchange and push that directly to the client, but the abstraction doesn't seem to easily support that flow.


empire's Issues

Add better integration test suite

It would be great if we:

  1. Had some happy-path cases using the heroku-go client. Maybe we can do something with JSON schema to verify that responses are Heroku API compatible.
  2. Maybe even some integration tests that shell out to the hk command to verify things work (probably only a handful of these).
  3. Possibly some tests that boot up a Vagrant cluster, deploy acme-inc, and test that it's running. That might be difficult to run in CI though, so I'm not sure it's worth it.

Logging

This is not empire logging, but centralized logging for apps running in the minion cluster. We'll want a way to pipe these logs to other systems like sumologic and librato.

bablefish starts in failed state

-- Logs begin at Fri 2015-03-06 02:02:25 UTC. --
Mar 06 02:30:04 c1 systemd[1]: r101-bablefish.1.web.1.service: control process exited, code=exited status=1
Mar 06 02:30:04 c1 systemd[1]: Unit r101-bablefish.1.web.1.service entered failed state.
Mar 06 02:30:04 c1 systemd[1]: r101-bablefish.1.web.1.service failed.
Mar 06 02:30:04 c1 systemd[1]: r101-bablefish.1.web.1.service holdoff time over, scheduling restart.
Mar 06 02:30:04 c1 systemd[1]: Stopping r101-bablefish.1.web.1...
Mar 06 02:30:04 c1 systemd[1]: Starting r101-bablefish.1.web.1...
Mar 06 02:30:04 c1 systemd[1]: start request repeated too quickly for r101-bablefish.1.web.1.service
Mar 06 02:30:04 c1 systemd[1]: Failed to start r101-bablefish.1.web.1.
Mar 06 02:30:04 c1 systemd[1]: Unit r101-bablefish.1.web.1.service entered failed state.
Mar 06 02:30:04 c1 systemd[1]: r101-bablefish.1.web.1.service failed.

Haven't seen this one before:

start request repeated too quickly for r101-bablefish.1.web.1.service

Move API into its own package.

The Heroku compatible API should just be a consumer of the empire package, with its own App, Release, Dyno, etc. representations of things.

Consider a GitHub Deployments integration

Right now, we handle this with Shipr, but I think there would be a lot of value to having built in handling for github deployment events.

An integration might look something like this:

  1. Creating an app also adds a webhook to the github repo for deployment events, pointed at https://empire.remind.com/deploys/github.
  2. The /deploys/github endpoint would basically look the same as this where we:
    1. Resolve the git sha to an image id using the docker registry api.
    2. Trigger a Deploy using the DeploysService.

The primary advantage that Shipr provides right now is an abstraction around deployment, log storage from the build, and slack integration for deployment_status events. The deployment_status handling could be split out of Shipr (and probably should be) into its own project, and log storage is not an issue since there's no build output when deploying to empire.

Re-organize into sub directories

We should probably re-organize the root directory into a structure like this:

├── cluster
├── empire
│   ├── cmd
│   │   └── empire
│   ├── Dockerfile
│   └── README.md
├── etcd_peers
│   ├── Dockerfile
│   └── README.md
├── tests
├── README.md
└── Vagrantfile

At the very least, it'll make coming into the project a little less daunting.

hk dynos timezone bug

Clearly a UTC to PST issue:

ben@Bens-MacBook-Air:empire (master)
$ hk scale web=3 -a acme-inc
Scaled acme-inc to web=3:1X.
ben@Bens-MacBook-Air:empire (master)
$ hk dynos -a acme-inc
acme-inc.2.web.1    active    8h  "./bin/web"
acme-inc.2.web.2    unknown   8h  "./bin/web"
acme-inc.2.web.3    unknown   8h  "./bin/web"
ben@Bens-MacBook-Air:empire (master)
$ hk dynos -a acme-inc
acme-inc.2.web.1    active   8h  "./bin/web"
acme-inc.2.web.2    active   8h  "./bin/web"
acme-inc.2.web.3    active   8h  "./bin/web"

Old release not unscheduled?

Not sure how this happened, but I still have an old release running after 1 minute:

empire $ empire dynos -a acme-inc
acme-inc.2.web.1    active   5m  "./bin/acme-inc -port=$PORT"
acme-inc.3.web.1    active   4m  "./bin/acme-inc -port=$PORT"

Docker container GC

Just something to think about. We'll probably eventually need something to GC old unused containers.

empire_controller AWS image needs to build off base

Because we do some stuff in base (install consul, docker, etc) we need to make sure that the empire_controller AWS image builds off of it. This is a little more difficult because there's no artifact to push up into S3 for the build to work with.

Another option is to have the empire_controller boxes run the base setup script as well, ensuring that things are installed.

Better metadata support

See GH-120

Right now we hardcode 'role=empire_minion' on all jobs in Empire, but it'd be good if:

a. we could change that (with the flag, like @ejholmes mentions)
b. we could pass along more metadata, for better control over where things get scheduled.

Convert router docker container to use remind101/base as its base.

Ubuntu has some 'known issues' with their docker images (maybe these have been resolved, I haven't dug into it) so most folks use phusion's baseimage. That's the base of our remind101/base image. It'd be good if the router docker image was based off of this as well.

See https://github.com/phusion/baseimage-docker for info on how to launch a daemon, etc.

You can look at any of the other of our images in github.com/remind101/docker_images for examples of using it.

Extract `package scheduler` into a `package container`

package container would be something that could potentially be used by other systems that want to schedule containers onto a cluster. The basic abstraction might look like:

type Image struct {
    Repo string
    ID   string
}

type Limits struct {
    // If provided, represents the maximum amount of bytes to allow the
    // container to use.
    Memory *int
}

type Container struct {
    // The name of the container
    Name string

    // Environment variables to set in the container
    Environment map[string]string

    // The command to run.
    Command string

    // The image to create the container from.
    Image Image

    // Any limits that this container should have.
    Limits Limits

    // Constraints represents constraints about what machine this container
    // is scheduled onto. The semantics of the keys and values depends on
    // the scheduler implementation.
    Constraints map[string]string
}

// ContainerState represents the state of a container in a cluster.
type ContainerState struct {
    *Container

    // The state of the container. "running", "failed", etc.
    State string

    // The machine that the container is scheduled on.
    Machine string
}

type Scheduler interface {
    // Schedule schedules containers onto the cluster.
    Schedule(...*Container) error

    // Unschedule unschedules containers from the cluster.
    Unschedule(...string) error

    // SetState sets the desired state of a container.
    SetState(string) error

    // ContainerStates returns the state of the containers in the cluster.
    ContainerStates() ([]*ContainerState, error)

    // ContainerState returns a ContainerState for the given container.
    ContainerState(string) (*ContainerState, error)

    // Restart restarts a container.
    Restart(string) error
}

And the goal would be to support fleet and swarm, and hopefully be generic enough to support both docker and rocket.

Proposal: Use app id instead of repo for deployment

I'm thinking it will be more convenient to reference the app ID (assuming the app ID will be a name and not a uuid; rename apps.ID to apps.Name?) when we deploy.

// Current
POST /apps { "repo":"remind101/r101-api" }
POST /deploys { "image":{ "id":"0123456789abcdef0123456789abcdef", "repo":"remind101/r101-api" } }

// Proposed
POST /apps { "id":"api", "repo":"remind101/r101-api" }
POST /deploys/api { "image":{ "id":"0123456789abcdef0123456789abcdef" } }
