Add middleware proof of concept (raxx) · 10 comments · closed

nietaki commented on June 11, 2024
Add middleware proof of concept

from raxx.

Comments (10)

nietaki commented on June 11, 2024

I'll work on an implementation 😎


nietaki commented on June 11, 2024

I played around with some prototypes today and, as always, there are a number of different approaches with different trade-offs. I'll try to show them by example and discuss how I see their properties.

All examples show a snippet of the implementation of a custom middleware, which modifies the
request headers and passes the request further down to subsequent middlewares in the "pipeline".

1. "Hidden State"

This is the approach I was originally aiming for, until I realised it's a bit inconvenient to achieve...

defmodule MyMiddleware1 do
  alias Raxx.Middleware

  defstruct []

  @behaviour Middleware

  @impl Middleware
  def handle_head(request, _state, head_continuation) do
    my_state = %__MODULE__{}
    request = Raxx.set_header(request, "content-type", "application/json")

    next = head_continuation.(request)
    # next :: [Raxx.part()] | Raxx.Response.t()

    {next, my_state}
  end

  # (other callbacks omitted)
end

Advantages:

  • simple callbacks, similar to Raxx.Server
  • the state can't leak between middlewares/controller

Disadvantages:

  • the state of subsequent middlewares is hidden by head_continuation, but still needs to be maintained. It would somehow need to be passed from the continuation lambda to the coordinating code that maintains the state of the middlewares and the controller. This would probably need to be done by sending the states as tagged messages to self(), or, worse yet, by using the process dictionary.
  • The implementation would be complicated, for example crafting the continuation lambdas and orchestrating the sending and receiving of middleware state.
  • One could argue it's not the most functional approach
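
To make the first disadvantage concrete, here is a self-contained toy (all names are hypothetical; none of this is proposed API) showing the side channel approach 1 would force: the continuation can only return `next`, so the state of the inner middlewares has to escape some other way, e.g. through the process mailbox.

```elixir
# Stand-in for the rest of the pipeline: returns a response and the
# updated state of the inner middlewares (both are fakes here).
run_rest_of_pipeline = fn request ->
  {"response to " <> request, %{count: 1}}
end

head_continuation = fn request ->
  {next, inner_state} = run_rest_of_pipeline.(request)
  # the lambda can only return `next`, so the hidden state is smuggled
  # out via the process mailbox
  send(self(), {:middleware_states, inner_state})
  next
end

next = head_continuation.("GET /")

# the coordinating code then has to collect the state back out of the mailbox
inner_state =
  receive do
    {:middleware_states, state} -> state
  after
    0 -> :no_state
  end

IO.inspect({next, inner_state})
```

The `receive` after the fact is exactly the orchestration overhead the bullet points above describe.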

2. "Pipeline list"

Here Raxx.Middleware would define the pipeline type:

  # (inside Raxx.Middleware)
  # the last `module` is a controller module implementing Raxx.Server; all
  # the other ones implement Raxx.Middleware
  @type pipeline :: [{module, state :: any()}]

defmodule MyMiddleware2 do
  alias Raxx.Middleware

  defstruct []

  @behaviour Middleware

  @impl Middleware

  def handle_head(request, _state, remaining_pipeline) do
    my_state = %__MODULE__{}
    request = Raxx.set_header(request, "content-type", "application/json")

    {next, remaining_pipeline} = Middleware.handle_head(request, remaining_pipeline)
    # next :: [Raxx.part()] | Raxx.Response.t()

    {next, my_state, remaining_pipeline}
    # OR, alternatively
    {next, [{__MODULE__, my_state} | remaining_pipeline]}
  end

  # (other callbacks omitted)
end

Advantages:

  • Simple model and implementation
  • performant implementation

Disadvantages:

  • the middleware code has to pass the state of the rest of the pipeline (remaining_pipeline) around. It could modify it (or mess it up) and break other code.
    • on the other hand, the "work" the middleware code has to do is dead simple and can easily be validated outside the middleware code
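
For contrast, here is a rough, self-contained sketch of how the coordinating code in approach 2 could pop the head of the pipeline and dispatch to it. Plain strings stand in for Raxx requests/responses, and all module and function names are mine, not the proposed API.

```elixir
defmodule CoordinatorSketch do
  # Walk a [{module, state}] pipeline. The final entry is the "controller",
  # which produces a response and takes no pipeline argument.
  def handle_head(request, [{module, state}]) do
    {module.respond(request, state), []}
  end

  def handle_head(request, [{module, state} | rest]) do
    # dispatch to the next middleware, then reassemble the updated pipeline
    {response, new_state, rest} = module.handle_head(request, state, rest)
    {response, [{module, new_state} | rest]}
  end
end

defmodule UpcaseMiddlewareSketch do
  # A middleware that upcases the "request" and passes it down the pipeline.
  def handle_head(request, state, remaining_pipeline) do
    {response, remaining_pipeline} =
      CoordinatorSketch.handle_head(String.upcase(request), remaining_pipeline)

    {response, state, remaining_pipeline}
  end
end

defmodule EchoControllerSketch do
  def respond(request, _state), do: "echo: " <> request
end

pipeline = [{UpcaseMiddlewareSketch, nil}, {EchoControllerSketch, nil}]
{response, _pipeline} = CoordinatorSketch.handle_head("hi", pipeline)
IO.puts(response)
```

The middleware only threads `remaining_pipeline` through; the coordinator reassembles the updated pipeline, which is the property the bullets above trade off against.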

3. "Split callbacks"

That's the simplest approach to come up with, but I'd argue it's not particularly elegant.

defmodule MyMiddleware3 do
  alias Raxx.Middleware

  defstruct []

  @behaviour Middleware

  @impl Middleware

  def before_handle_head(request, _state) do
    my_state = %__MODULE__{}
    request = Raxx.set_header(request, "content-type", "application/json")

    {:continue, request, my_state}
    # or {:halt, request, my_state} to not call through to remaining middlewares/controller
  end

  # `next` is what came back from subsequent middlewares/controller
  # will be nil if before_* returned a {:halt, _, _} tuple
  def after_handle_head(request, state, next) do
    {next, state}
  end

  # (other callbacks omitted)
end

Advantages

  • Performant implementation

Disadvantages

  • Twice as many callbacks to implement compared to the other approaches
  • before_ and after_ callbacks can't share variables, everything needs to be passed through the middleware state.
  • Arguably unintuitive interface
    • what does "before" and "after" mean?
    • what happens if one of the subsequent middlewares changes the request?
    • what does :halt and :continue mean?

My take

After looking at it from different sides I like approach 2 (pipeline list) the most. I think it's going to yield a clean implementation and a simple mental model while still being relatively difficult to use wrong.

I'll focus on implementing that one right now.


nietaki commented on June 11, 2024

Also, I remember some discussions about how performant it is to call functions on modules stored in variables, which might happen quite a lot when implementing this. I did a quick and dirty benchmark to make sure it wasn't terrible:

  test "calling modules via functions" do
    list = Enum.to_list(1..10)

    enum = Enum
    lambda = fn l -> Enum.reverse(l) end

    Benchee.run(%{
      "directly" => fn -> Enum.reverse(list) end,
      "module in a variable" => fn -> enum.reverse(list) end,
      "apply on a variable" => fn -> apply(enum, :reverse, [list]) end,
      "apply on a Module" => fn -> apply(Enum, :reverse, [list]) end,
      "lambda" => fn -> lambda.(list) end
    })
  end

Results:

Operating System: Linux
CPU Information: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
Number of Available Cores: 8
Available memory: 30.96 GB
Elixir 1.7.3
Erlang 21.0.9

(some warnings and information cut)

Name                           ips        average  deviation         median         99th %
apply on a Module          22.64 M      0.0442 μs    ±50.94%      0.0410 μs      0.0870 μs
module in a variable       22.40 M      0.0446 μs    ±25.09%      0.0420 μs      0.0880 μs
directly                   22.04 M      0.0454 μs    ±31.05%      0.0420 μs      0.0920 μs
apply on a variable        20.98 M      0.0477 μs   ±734.85%      0.0400 μs       0.120 μs
lambda                     17.95 M      0.0557 μs    ±23.71%      0.0530 μs       0.106 μs

Comparison: 
apply on a Module          22.64 M
module in a variable       22.40 M - 1.01x slower
directly                   22.04 M - 1.03x slower
apply on a variable        20.98 M - 1.08x slower
lambda                     17.95 M - 1.26x slower

(All of those come with the "The function you are trying to benchmark is super fast, making measurements more unreliable! This holds especially true for memory measurements. See: https://github.com/PragTob/benchee/wiki/Benchee-Warnings#fast-execution-warning" warning, but I think it's a good starting point for an intuition anyway.)


CrowdHailer commented on June 11, 2024

After reading your discussion about the problems with version 1, my immediate thought was something along the lines of version 2.

Nice metrics; it looks to me like everything except lambda is basically a draw.


nietaki commented on June 11, 2024

Cool! I have a fully working and partially tested implementation of 2 going, but I want to iron out some API kinks and integrate it with the Router before I share it.


CrowdHailer commented on June 11, 2024

I had a thought for benchmarks. Code in this PR #140

Operating System: Linux
CPU Information: Intel(R) Core(TM) i7-4510U CPU @ 2.00GHz
Number of Available Cores: 4
Available memory: 7.70 GB
Elixir 1.7.1
Erlang 21.0

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 μs
parallel: 1
inputs: none specified
Estimated total run time: 14 s


Benchmarking macro...
Benchmarking stack...

Name            ips        average  deviation         median         99th %
macro      123.47 K        8.10 μs    ±68.84%           8 μs          11 μs
stack       20.52 K       48.72 μs    ±10.47%          48 μs          71 μs

Comparison: 
macro      123.47 K
stack       20.52 K - 6.02x slower

Possibly still not very informative; to get rid of the fast-code warning I stacked 1000(!!) middlewares.


nietaki commented on June 11, 2024

#139 and #141 are both experiments in how it could look; let's unify the discussion on the approach here.


nietaki commented on June 11, 2024

I needed a bit more time to digest #141 and formulate what I do and don't like about it. Let me try to split it into topics.

Should Ace know if it's serving a stack of middlewares or just a server?

Good point, it probably shouldn't: a stack of middlewares with a vanilla server attached to it should have the same Raxx.Server interface as just a server. Whether that interface should be exposed by the "topmost" middleware or by some middleware-stack mechanism is a different question.

Is MyMiddleware.wrap({MyServer, :my_server_state}) a sufficient interface?

I'd argue it isn't. It only allows you to construct a "pipeline" of middlewares (plus a baked-in server) once; you can't re-use it. I can imagine situations where we'd want to "concatenate" pipelines, in a similar way to Phoenix routers: I have a "pipeline" of middlewares that, based on the request, I want to put in front of a server. The wrap interface doesn't allow for that, or anything else, really.

Is it important or even viable that a single Middleware is a valid Server or the other way around?

I don't think so. A middleware should always have the option to hand the request further down the "stack", but a server never would.

Side note: application-specific responses

After some things that I've been trying to do with Phoenix/Plug recently, I think it would be cool/useful if the Server could return something other than a Raxx.part() to its middlewares, for an application-specific action. In order for that to work, the middleware infrastructure would either need to not modify the response parts at all or pass through some specifically tagged ones (like {:custom, %MyAppError{}}).

A proposed approach coming in the next comment ;)


nietaki commented on June 11, 2024

Ok, here we go:

We're going to have 3 main concepts here. It might seem like it complicates things, but I'd argue it allows us to have a clean division, a more straightforward API and a simpler programming experience.

Middleware is just a single middleware that can modify requests/responses. It provides a behaviour for the individual middlewares to implement. This is slightly modified code from my PR:

defmodule Raxx.Middleware do
  alias Raxx.Server

  @type t :: {module, state}

  @type state :: any()

  @type next :: {[Raxx.part()], state, Server.t()}

  @callback process_head(Raxx.Request.t(), state(), inner_server :: Server.t()) ::
              next()

  @callback process_data(binary(), state(), inner_server :: Server.t()) :: next()

  @callback process_tail([{binary(), binary()}], state(), inner_server :: Server.t()) ::
              next()

  @callback process_info(any(), state(), inner_server :: Server.t()) :: next()
end

As you can see, it has the capability to wrap any Server and modify requests going to it and its responses.

Then we have the Pipeline. Pipeline is a collection of Middlewares that you can re-use and compose however you want, because it's just a list of them.

defmodule Raxx.Middleware.Pipeline do
  @type t :: [Raxx.Middleware.t()]
end
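
Since a Pipeline is literally a list of `{module, state}` tuples, composing and re-using pipelines needs no dedicated API; ordinary list operations suffice. The module names below are placeholders for illustration only:

```elixir
# Illustrative only: the middleware modules named here are placeholders.
common = [{LoggerMiddleware, nil}, {HeadMiddleware, nil}]
api_only = [{AuthMiddleware, %{realm: "api"}}]

# Concatenate pipelines the same way you'd compose plug pipelines in a router:
api_pipeline = common ++ api_only
```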

In the end we have a Stack, which is a Pipeline connected to a Server. The Stack implements the Raxx.Server interface, so it can be used anywhere a Server can be used. You can also easily swap out the pipeline or the server it's attached to:

defmodule Raxx.Stack do
  defstruct [:pipeline, :server]

  @behaviour Raxx.Server

  @type t :: %__MODULE__{
          pipeline: Raxx.Middleware.Pipeline.t(),
          server: Raxx.Server.t()
        }

  # all Raxx.Server callbacks implemented here to route through the
  # pipeline and the server and maintain their states correctly
end

The way you implement a Middleware is very simple and doesn't require you to learn any special interfaces. It uses just the Server helper functions from your PR: https://github.com/CrowdHailer/raxx/pull/141/files#diff-2095b50d041367f92320505f03fe049cR217

defmodule Raxx.Middleware.Head do
  alias Raxx.Server

  @behaviour Raxx.Middleware

  @impl Raxx.Middleware
  def process_head(request = %{method: :HEAD}, state, inner_server) do
    request = %{request | method: :GET}
    {parts, inner_server} = Server.handle_head(inner_server, request)
    parts = strip_body(parts)
    {parts, state, inner_server}
  end

  # (clauses for other methods, the remaining callbacks and the
  # strip_body/1 helper omitted)
end

In practice most of the time the inner_server would be the Stack with all the Middlewares up to the current one stripped away, but it wouldn't make a difference to the current Middleware.

The exact order of the arguments is up for debate.

Biased summary

So in the end we seem to be arriving at a solution that has it all:

  • a simple to use Middleware interface/behaviour
  • Ace (or any other server) only needs to worry about the Raxx.Server behaviour - the Stack exposes the interface for a pipeline + server
  • everything can still be done at runtime - nothing stops you from constructing the Pipelines and the Stack in the Application as you did in your example.
  • middleware pipelines are trivial to use - they're just lists which you can concatenate however you want; building arbitrarily complicated routers out of them should be straightforward and wouldn't require you to learn any new API.
  • All of it is still asymptotically very fast - the Stack implementation will only push and pop the front elements from lists. Also swapping out the servers in a Stack is fast and simple.

I skipped all of the implementation, but I think it would be very simple - let me know if you have any questions.


nietaki commented on June 11, 2024

An implementation of the above now in #139

