vulcand / vulcan

[DEPRECATING] Development has moved to https://github.com/vulcand/oxy

License: Apache License 2.0

Makefile 0.21% Go 99.79%

vulcan's Introduction

Vulcand

Vulcand is a programmatic, extensible proxy for microservices and API management. It is inspired by Hystrix and powers Mailgun's microservices infrastructure.

Focus and priorities

Vulcand is focused on microservices and API use-cases.

Features

Uses etcd as a configuration backend.
API and command line tool.
Pluggable middlewares.
Support for canary deployments, real-time metrics and resilience.

Project info

documentation	https://vulcand.github.io/
status	Used in production at Mailgun on moderate workloads. Under active development.
discussions	https://groups.google.com/d/forum/vulcan-proxy
roadmap	roadmap.md

OpenTracing Support

Vulcand supports OpenTracing via the Jaeger client libraries. To use tracing, pass the --enableJaegerTracing flag and either run a Jaeger agent listening on localhost:6831/udp or set the environment variables JAEGER_AGENT_HOST and JAEGER_AGENT_PORT. (See the Jaeger client library documentation for all available configuration environment variables.)
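The agent-address fallback described above can be sketched in Go as follows. This sketch assumes the defaults are localhost and 6831 whenever the variables are unset. agentAddr is a hypothetical helper written for illustration; it is not part of vulcand or the Jaeger client API.

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// agentAddr returns the host:port that spans would be sent to over UDP.
// Hypothetical helper: it falls back to localhost:6831 when either value is empty.
func agentAddr(host, port string) string {
	if host == "" {
		host = "localhost"
	}
	if port == "" {
		port = "6831"
	}
	// JoinHostPort brackets IPv6 hosts, e.g. [::1]:6831.
	return net.JoinHostPort(host, port)
}

func main() {
	addr := agentAddr(os.Getenv("JAEGER_AGENT_HOST"), os.Getenv("JAEGER_AGENT_PORT"))
	fmt.Println("Jaeger agent address:", addr)
}
```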

When tracing is enabled, vulcand creates two spans. The first, called vulcand, covers the entire downstream request. The second, called middleware, covers only the middleware processing that happens before the request is routed downstream.

Aliased Expressions

When vulcand runs in a Kubernetes DaemonSet, it needs to know that requests from the local node can match Host("localhost") rules. The --aliases flag lets the author of a vulcand DaemonSet tell vulcand the name of the node it is currently running on, so that vulcand routes requests for Host("localhost") correctly. Multiple aliases can be passed, separated by commas.

Example

$ vulcand --aliases 'Host("localhost")=Host("192.168.1.1")'
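A flag value like the one in the example could be parsed roughly as follows. This is a minimal sketch, not vulcand's implementation. It assumes each alias has the form expr=alias and that individual expressions contain no commas. parseAliases is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// parseAliases splits a value such as
//   Host("localhost")=Host("192.168.1.1"),Host("a")=Host("b")
// into a map from the original expression to its alias.
// Assumption: expressions themselves contain no commas.
func parseAliases(flagValue string) (map[string]string, error) {
	aliases := make(map[string]string)
	if strings.TrimSpace(flagValue) == "" {
		return aliases, nil
	}
	for _, pair := range strings.Split(flagValue, ",") {
		parts := strings.SplitN(pair, "=", 2)
		if len(parts) != 2 {
			return nil, fmt.Errorf("invalid alias %q: expected expr=alias", pair)
		}
		aliases[strings.TrimSpace(parts[0])] = strings.TrimSpace(parts[1])
	}
	return aliases, nil
}

func main() {
	aliases, err := parseAliases(`Host("localhost")=Host("192.168.1.1")`)
	if err != nil {
		panic(err)
	}
	fmt.Println(aliases[`Host("localhost")`]) // Host("192.168.1.1")
}
```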

vulcan's Issues

request.RequestURI is empty

We clear request.RequestURI in favor of the URL's Opaque field, but this makes URL.Query() useless because it returns empty results.

Add way to build 3rd Party Vulcan Modules

Background

A common feature of web servers and reverse proxies is to allow 3rd parties to extend their capabilities with custom code.
This is generally difficult in Go because there is no dynamic linking / dlopen.

Proposal

Users would create a main.go, and run go build to produce a single binary containing Vulcan and their custom modules.

Because of how Go works, Vulcan would be a library that users import. For example, suppose I made two Vulcan modules (aka "vmods"): one to implement PageSpeed compression and another to store rate limiting data in Postgres. To enable them in my own build, I would write a very simple main.go that imports each module and passes them to service.RunWithModules:

package main

import (
    "github.com/mailgun/vulcan/service"
    pagespeed "github.com/pquerna/vmod-pagespeed"
    ratelimit "github.com/pquerna/vmod-postgres-ratelimit"
)

func main() {
    // Each vmod package is assumed to export a value implementing service.Vmod.
    modules := []service.Vmod{
        pagespeed.Module,
        ratelimit.Module,
    }
    service.RunWithModules(modules)
}

Each of these modules would expose middlewares to the JavaScript API documented in #25. They could also register themselves to handle command-line arguments, or to run at hooks during request processing.

The modules would implement something like the following interface:

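// Vmod is implemented by every third-party module.
type Vmod interface {
    Register(s *vulcan.Service) error
}

// RegisterFunc lets a plain function act as a Vmod, in the style of http.HandlerFunc.
type RegisterFunc func(s *vulcan.Service) error

func (f RegisterFunc) Register(s *vulcan.Service) error { return f(s) }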

vulcan.Service would be extended with many more methods, such as AddMiddleware, AddCommand, AddRequestTransformer and AddLogger, each of which would take a callback function.

Improved Logging

Logging Improvement Ideas

Use a more structured format
Use case: Trace transactions from client through vulcan to upstream.
Outputs: Basic-JSON, systemd-journal, Logstash/Graylog.

Proxy Authentication Header Missing

When a request requires proxy authentication, Vulcan responds with 407 but does not include the "Proxy-Authenticate" header required by RFC 2617 (section 3.6):

Upon receiving a request which requires authentication, the proxy/server
must issue the "407 Proxy Authentication Required" response with a
"Proxy-Authenticate" header.  The digest-challenge used in the
Proxy-Authenticate header is the same as that for the WWW-
Authenticate header as defined above in section 3.2.1.

Example:

GET /my/endpoint/goes/here HTTP/1.1
User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3
Host: myfancyproxy.com
Accept: */*

HTTP/1.1 407 Proxy Authentication Required
Server: nginx/1.4.1
Date: Wed, 30 Oct 2013 18:45:30 GMT
Content-Type: application/json
Content-Length: 41
Connection: keep-alive

* Connection #0 to host myfancyproxy.com left intact
* Closing connection #0

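The header that should be returned is a Basic challenge. RFC 2617 also requires a realm parameter; the value below is a placeholder:

Proxy-Authenticate: Basic realm="vulcan"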

Documentation and some questions

Hi, I'm trying to use vulcan.

I read the README.md and I have some questions:

Which routes does vulcan call on the control server?

In my tests, vulcan calls /auth and /:

/auth - with params as described here https://github.com/mailgun/vulcan#authorization
/ - I believe here I need to parse the URL params and create the JSON with tokens and upstreams

but in the example https://github.com/mailgun/vulcan#control-server-example only the /auth route is created.

Am I missing some information in the README?

What do I need to do to deny access?

My /auth looks like this (I am using Sinatra for the control server):

get '/auth' do
  content_type :json
  if user_logged?
    status 200
  else
    status 403
  end
end

Even when user_logged? is false and the status is 403, vulcan keeps passing requests to my server.

Regards,
Duke

PS: I tried sending a message to the Google group, but it didn't work.

Add Graceful Restart support

I should be able to restart Vulcan without dropping clients.

Consider using: https://github.com/ParsePlatform/go.grace

Support for raw-TCP balancing

Support for non-HTTP listeners.
API changes to allow connection balancing rather than just request balancing.
Support for streaming TCP connections to upstreams

MiddlewareChain usage and interface

Here, the comment says that middleware chain implements the middleware interface:

https://github.com/mailgun/vulcan/blob/master/middleware/chain.go#L10-L14

However, I don't see a ProcessRequest or ProcessResponse method on MiddlewareChain.

By contrast, ObserverChain correctly implements the Observer interface.

In general, how does one add a MiddlewareChain to a LoadBalancer? A quick example would be very useful!

I'm just getting into it - thanks for this fantastic package.

Return 401 instead of 407 when auth credentials are missing

From the customer (https://mailgun.zendesk.com/agent/#/tickets/38252):

Basic auth requires that you return 401 for an unauthenticated request; the
client then knows to retry with the username and password.

Instead you're returning 407, so the client does not know to retry. curl
works because it always sends the credentials up front instead of waiting for a 401.

You should return 401, per spec:
http://en.wikipedia.org/wiki/Basic_access_authentication

Configurable Transport MaxIdleConnsPerHost

This question is specific to httploc.HttpLocation.

Maybe I did something wrong, but while benchmarking with multiple connections I noticed that only 2 TCP connections were reused between the proxy server and the final endpoint. That matches net/http's default, http.DefaultMaxIdleConnsPerHost = 2. Could an options parameter be added to set the Transport's MaxIdleConnsPerHost?

Middleware can't alter a response when there isn't one.

If the request.Attempt failed (i.e. upstream is unavailable), there is no response, and no way for the middleware to communicate back to the client.

It would be nice to be able to return an error page other than {"error":"Bad Gateway"}, but we can't without access to a Response or something that can write to the http.ResponseWriter.

Hardening of Vulcan

Right now Vulcan is probably not safe to expose directly to the public internet.

We need to add limits on, and harden, what is accepted from clients, and better understand how to handle abusive upstreams.

For example:

Request bodies are not limited in any way, and may consume all memory.
Response bodies are not streamed; they are buffered completely in memory.
Add or improve timeouts of clients, upstreams, etc.
Improve limits on number of headers, size of headers, etc.

Limits on per-upstream connections

Per upstream, you should be able to say that this upstream can handle only 5 requests per second, or only N concurrent connections at a time.

Add Forwarding Headers

Add the standard forwarding headers to upstream requests; some are documented in
draft-ietf-appsawg-http-forwarded-10 (since published as RFC 7239):

X-Forwarded-For (client IP)
X-Forwarded-Proto (original protocol, eg http vs https)
X-Forwarded-Server (hostname of vulcan server)
Forwarded (new draft standard)

httploc.HttpLocation should ensure that the endpoint host is respected

I'm pretty sure I have this right, but let me know if I'm wrong. When my own Middleware implements ProcessRequest, it is too easy to change which host the RoundTripper will dial, overriding the host I expect the Endpoint to specify.

request.Request.GetHttpRequest().URL is a pointer, so any edit to URL.Host carries all the way through to RoundTrip. Perhaps a quick override just above this line could guarantee that the Location dials the correct upstream host?

Upstream healthcheck API

You should be able to hit a specific url to test if an upstream is healthy.

Add FastCGI backend support

You should be able to proxy to a FastCGI destination.
This destination could be over TCP, or a local socket.

Middleware not able to modify http.Request as expected

Again, this is specific to httploc.HttpLocation, and I'm a little uncertain of what gets copied in a struct copy.

Based on trial and error, the headers I add in my Middleware.ProcessRequest are lost and not in the outbound http.Request that my upstream host sees. This line shows req is passed into Middleware, but only httpReq is sent to RoundTrip.

I'm happy to supply patches, but I'm not actually sure what a good fix is. It mostly depends on the expectation of projects linking to vulcan.

I could also supply my own Location implementation, but I thought I'd bring this up since Middlewares are supposed to be able to change http.Request and not just the http.Request.URL according to the docs.

Load Balancer example from quickstart keeps getting 404 not found

I'm trying the load balancer example with https://www.google.com as the single upstream node, running the proxy on localhost, so that when I do

curl localhost:8000

It should return the page from Google. The problem is that I keep getting a 404 not found. Any idea please?

$ curl "http://localhost:8000/"
<!DOCTYPE html>
<html lang=en>
  <meta charset=utf-8>
  <meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width">
  <title>Error 404 (Not Found)!!1</title>
  <style>
    *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/errors/logo_sm_2.png) no-repeat}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/errors/logo_sm_2_hr.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/errors/logo_sm_2_hr.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/errors/logo_sm_2_hr.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:55px;width:150px}
  </style>
  <a href=//www.google.com/><span id=logo aria-label=Google></span></a>
  <p><b>404.</b> <ins>That’s an error.</ins>
  <p>The requested URL <code>/</code> was not found on this server.  <ins>That’s all we know.</ins>

Here is my code.

package main

import (
    "log"
    "net/http"
    "os"
    "time"

    "github.com/mailgun/vulcan"
    "github.com/mailgun/vulcan/endpoint"
    "github.com/mailgun/vulcan/loadbalance"
    "github.com/mailgun/vulcan/loadbalance/roundrobin"
    "github.com/mailgun/vulcan/location/httploc"
    "github.com/mailgun/vulcan/route"
)

var PORT string = os.Getenv("PORT")
var smtpServers = []string{"https://www.google.com"}

func NewBalancer(servers []string) loadbalance.LoadBalancer {
    // Create a round robin load balancer with some endpoints
    rr, err := roundrobin.NewRoundRobin()
    if err != nil {
        log.Fatalf("Error: %s", err)
    }

    for _, s := range servers {
        rr.AddEndpoint(endpoint.MustParseUrl(s))
    }

    return rr
}

func main() {
    rr := NewBalancer(smtpServers)

    // Create a http location with the load balancer we've just added
    loc, err := httploc.NewLocation("loc1", rr)
    if err != nil {
        log.Fatalf("Error: %s", err)
    }

    // Create a proxy server that routes all requests to "loc1"
    proxy, err := vulcan.NewProxy(&route.ConstRouter{Location: loc})
    if err != nil {
        log.Fatalf("Error: %s", err)
    }

    // Proxy acts as http handler:
    server := &http.Server{
        Addr:           ":" + PORT,
        Handler:        proxy,
        ReadTimeout:    10 * time.Second,
        WriteTimeout:   10 * time.Second,
        MaxHeaderBytes: 1 << 20,
    }
    log.Println("Listening on port " + PORT)
    if err := server.ListenAndServe(); err != nil {
        log.Fatal(err)
    }
}

Global middleware

Currently it seems like there is no way to create a middleware that acts on a request before it's routed.
I think that it would make sense in some use cases (e.g.: blacklist of certain clients, site-wide authentication...).
What do you think?

Add SPDY / HTTP 2.0 Support

Vulcan should support as many versions of SPDY and HTTP/2 as is reasonable.
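Go's net/http negotiates HTTP/2 over TLS automatically via ALPN, so a Go proxy gets much of this for free. SPDY itself has since been superseded by HTTP/2 and is not supported by net/http. The sketch below checks negotiation end to end with an httptest TLS server, assuming Go 1.14 or later for the EnableHTTP2 field. protoSeen is a hypothetical name.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// protoSeen starts a TLS test server with HTTP/2 enabled and returns the
// protocol the test client negotiated with it.
func protoSeen() (string, error) {
	ts := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, r.Proto)
	}))
	ts.EnableHTTP2 = true // must be set before StartTLS
	ts.StartTLS()
	defer ts.Close()

	resp, err := ts.Client().Get(ts.URL)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	return resp.Proto, nil
}

func main() {
	proto, err := protoSeen()
	if err != nil {
		panic(err)
	}
	fmt.Println(proto)
}
```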

Add PROXY protocol line support

HAProxy and stud are commonly used for edge load balancing or TLS termination.
When you are behind an ELB that terminates TLS or performs TCP load balancing, the PROXY protocol is used to preserve the client IP address.

For these reasons Vulcan should support the PROXY protocol line.

Library for doing this is here:
https://github.com/racker/go-proxy-protocol

cc @songgao

Proxy Requests Transparently

When Vulcan proxies to a URL that contains an encoded URL, the contained URL is decoded by Vulcan. For example:

http://vulcan.site.com/log/http%3A%2F%2Fwww.site.com%2Fsomething

is rewritten to:

http://vulcan.site.com/log/http://www.site.com/something

See this issue on mailgun/vulcan for more information.

Extract and extend the rate limiting and load balancing routines

I am keen to use or build both these "modules" for a proxy I am building for MQTT.

I'd be keen to know what you think of this.

Aim would be to decouple them from your backends, while still supporting the interface they provide and adding more metrics about the operation of these "modules" for want of a better word.

Cheers

Incorrect usage of log.err

log.err takes two arguments, _stuff and _why; both are poorly named and under-documented. The first argument (_stuff) is expected to be None, an Exception, or a Failure object.

The second argument (_why) is a string describing the first argument.

Two incorrect usages exist in the code base:

log.err with a string first argument.
log.err with no arguments immediately following a log.err with a string first argument.

In both cases the better construction would be a single log.err of the form:

log.err(None, "String error message")

Or if you prefer to be explicit about the creation of the Failure object

log.err(Failure(), "String error message")

In some cases it appears that you are using an existing failure instance (often called reason) and passing a string so you can format the error message and the traceback into the string. In those cases it would still be better to use either of the above constructions and either fix twisted#6424 or do your own message formatting in a log observer.

Javascript API improvements

I was thinking a more modular, hook-based JS API might be nice. Additionally it would be best to use a CommonJS module style.

There's synchronous IO here despite concerns (we invented a vulcan.http module).

Conceptual request phases (which can be applied in arbitrary order)

Request transformer
Rate limits
Identity
Authorization
Handler

To enable short and clear code, we have helper functions which take a function and return an opaque middleware object (or function) which can be passed back to Vulcan in vulcan.order().

helper functions

vulcan.transform: the user implements a function that returns a request object
vulcan.rate_limit: the user implements a function that returns a structured object mapping keys to the limits
vulcan.identity: the user implements a function that returns a boolean
vulcan.reverse_proxy: the user implements a function that returns an array of URLs

In vulcan.order the user implements a function that returns an array of helper function return values.

basic complex example

var v = require('vulcan');

var transformer = v.transform(function(request) {
  request.tenant_id = request.uri.match(/[^\/]*\/([0-9]+)\//)[1];
  return request;
});

var rate_limiter = v.rate_limit(function(request) {
  if (request.ip === '127.0.0.1') {
    // no limits for localhost
    return {};
  }
  var limits = {};
  limits[request.ip] = {'KB/sec': 100, 'requests per sec': 1000};
  return limits;
});

var identity = v.identity(function(request) {
  var auth_response = v.http.post(discover('/auth_endpoints'), {
    'Auth-Token': request.headers['X-Auth-Token'],
    'Tenant-ID': request.tenant_id
  });
  if (auth_response.code == 200) {
    return true;
  }
  return false;
});

var user_rate_limiter = v.rate_limit(function(request) {
  var limits = {};
  limits[request.tenant_id] = {'requests per day': 50000};
  return limits;
});

var handler = v.reverse_proxy(function(request) {
  return ["http://localhost:5000", "http://localhost:5001"];
});

v.order(function(request) {
  return [transformer, rate_limiter, identity, user_rate_limiter, handler];
});

simple Hello World example

var v = require('vulcan');

v.order(function(request) {
  return [v.reverse_proxy(function(request) {
    return ["http://localhost:5000", "http://localhost:5001"];
  })];
});

This hello world is a little cumbersome; we can iterate on it, but the indirection seems important right now.

Autoscaling groups for discover backend

Using the Rackspace or Amazon API, find all members of an auto-scaling group and add all active instances as responses to a discover() command.
This would let more people use discover without having to buy into etcd.

epoll is the default reactor on Linux; do not explicitly install it.

https://twistedmatrix.com/trac/ticket/5478

Fundamentally, non-GUI reactor selection is a deployment choice, not something the code should enforce.