
concurrency-limits's Introduction

Overview

Java Library that implements and integrates concepts from TCP congestion control to auto-detect concurrency limits for services in order to achieve optimal throughput with optimal latency.

Background

When thinking about service availability, operators traditionally think in terms of RPS (requests per second). Stress tests are normally performed to determine the RPS at which the service tips over. RPS limits are then set somewhere below this tipping point (say 75% of this value) and enforced via a token bucket. However, in large distributed systems that auto-scale, this value quickly goes out of date, and the service falls over anyway by becoming non-responsive, as it is unable to gracefully shed excess load. Instead of thinking in terms of RPS, we should be thinking in terms of concurrent requests: we apply queuing theory to determine the number of concurrent requests a service can handle before a queue starts to build up, latencies increase, and the service eventually exhausts a hard limit such as CPU, memory, disk or network. This relationship is covered very nicely by Little's Law, where Limit = Average RPS * Average Latency.
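As a hypothetical worked example (the numbers are illustrative, not from the original text): a service that sustains an average of 1,000 RPS with an average latency of 50 ms supports roughly Limit = 1000 * 0.050 = 50 concurrent requests before a queue starts to build.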

Concurrency limits are very easy to enforce but difficult to determine as they would require operators to fully understand the hardware services run on and coordinate how they scale. Instead we'd prefer to measure or estimate the concurrency limits at each point in the network. As systems scale and hit limits each node will adjust and enforce its local view of the limit. To estimate the limit we borrow from common TCP congestion control algorithms by equating a system's concurrency limit to a TCP congestion window.

Before applying the algorithm we need to set some ground rules.

  • We accept that every system has an inherent concurrency limit that is determined by hard resources, such as the number of CPU cores.
  • We accept that this limit can change as a system auto-scales.
  • For large and complex distributed systems it's impossible to know all the hard resources.
  • We can use latency measurements to determine when queuing happens.
  • We can use timeouts and rejected requests to aggressively back off.

Limit algorithms

Vegas

Delay based algorithm where the bottleneck queue is estimated as

L * (1 - minRTT/sampleRtt)

At the end of each sampling window the limit is increased by 1 if the queue estimate is less than alpha (typically a value between 2 and 3) or decreased by 1 if it is greater than beta (typically between 4 and 6 requests).
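As a rough illustration only, here is a minimal sketch of that per-window update rule. It is not the library's actual VegasLimit code, and names such as alpha, beta and didDrop are assumptions made for the example.

// Minimal sketch of the delay-based Vegas update described above (illustrative only).
static int updateLimit(int limit, long minRtt, long sampleRtt, boolean didDrop, int alpha, int beta) {
    // Estimated number of requests sitting in the bottleneck queue.
    int queueSize = (int) Math.ceil(limit * (1.0 - (double) minRtt / sampleRtt));
    if (didDrop || queueSize > beta) {
        return Math.max(1, limit - 1);   // queue building up or request dropped: shrink
    } else if (queueSize < alpha) {
        return limit + 1;                // little or no queuing observed: grow
    }
    return limit;                        // between alpha and beta: hold steady
}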

Gradient2

This algorithm attempts to address bias and drift when using minimum latency measurements. To do this the algorithm tracks the divergence between two exponential averages over a long and a short time window. Using averages, the algorithm can smooth out the impact of outliers for bursty traffic. The duration of the divergence is used as a proxy to identify a queueing trend, at which point the algorithm aggressively reduces the limit.
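For intuition, a simplified sketch of that idea follows; the names shortRtt, longRtt, queueSize and smoothing are illustrative, and the real Gradient2Limit implementation is more involved.

// Simplified sketch of a gradient update driven by two exponential moving
// averages of latency (illustrative only, not the actual Gradient2Limit code).
static double updateLimit(double limit, double shortRtt, double longRtt,
                          double queueSize, double smoothing) {
    // A gradient below 1 means short-term latency has diverged above the
    // long-term trend, i.e. a queue is likely forming.
    double gradient = Math.max(0.5, Math.min(1.0, longRtt / shortRtt));
    double newLimit = limit * gradient + queueSize;
    // Smooth the adjustment so a single noisy sample doesn't swing the limit.
    return limit * (1.0 - smoothing) + newLimit * smoothing;
}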

Enforcement Strategies

Simple

In the simplest use case we don't want to differentiate between requests and so enforce a single gauge of the number of inflight requests. Requests are rejected immediately once the gauge value equals the limit.
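For example, a request path could use SimpleLimiter directly, along the lines of the sketch below (based on the Limiter API used elsewhere in this README; exact generics may differ by version, and rejectRequest()/handleRequest() are placeholder methods).

Limiter<Void> limiter = SimpleLimiter.newBuilder().build();

Optional<Limiter.Listener> listener = limiter.acquire(null);
if (!listener.isPresent()) {
    rejectRequest();                  // inflight gauge has reached the limit: reject immediately
} else {
    try {
        handleRequest();
        listener.get().onSuccess();   // record the sample so the limit algorithm can adapt
    } catch (Exception e) {
        listener.get().onIgnore();    // don't let the failed request skew latency samples
    }
}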

Percentage

For more complex systems it's desirable to provide certain quality of service guarantees while still making efficient use of resources. Here we guarantee specific types of requests get a certain percentage of the concurrency limit. For example, a system that takes both live and batch traffic may want to give live traffic 100% of the limit during heavy load and is OK with starving batch traffic. Or, a system may want to guarantee that 50% of the limit is given to write traffic so writes are never starved.

Integrations

GRPC

A concurrency limiter may be installed either on the server or client. The choice of limiter depends on your use case. For the most part it is recommended to use a dynamic delay based limiter such as the VegasLimit on the server and either a pure loss based (AIMDLimit) or combined loss and delay based limiter on the client.

Server limiter

The purpose of the server limiter is to protect the server from either increased client traffic (batch apps or retry storms) or latency spikes from a dependent service. With the limiter installed the server can ensure that latencies remain low by rejecting excess traffic with Status.UNAVAILABLE errors.

In this example a GRPC server is configured with a single adaptive limiter that is shared among batch and live traffic with live traffic guaranteed 90% of throughput and 10% guaranteed to batch. For simplicity we just expect the client to send a "group" header identifying it as 'live' or 'batch'. Ideally this should be done using TLS certificates and a server side lookup of identity to grouping. Any requests not identified as either live or batch may only use excess capacity.

// Create and configure a server builder
ServerBuilder builder = ...;

builder.addService(ServerInterceptors.intercept(service,
    ConcurrencyLimitServerInterceptor.newBuilder(
        new GrpcServerLimiterBuilder()
            .partitionByHeader(GROUP_HEADER)
            .partition("live", 0.9)
            .partition("batch", 0.1)
            .limit(WindowedLimit.newBuilder()
                    .build(Gradient2Limit.newBuilder()
                            .build()))
            .build())
        .build()
    ));

Client limiter

There are two main use cases for client side limiters. A client side limiter can protect the client service from its dependent services by failing fast and serving a degraded experience to its client instead of having its latency go up and its resources eventually exhausted. For batch applications that call other services a client side limiter acts as a backpressure mechanism ensuring that the batch application does not put unnecessary load on dependent services.

In this example a GRPC client will use a blocking version of the VegasLimit to block the caller when the limit has been reached.

// Create and configure a channel builder
ChannelBuilder builder = ...;

// Add the concurrency limit interceptor
builder.intercept(
    new ConcurrencyLimitClientInterceptor(
        new GrpcClientLimiterBuilder()
            .blockOnLimit(true)
            .build()
        )
    );

Servlet Filter

The purpose of the servlet filter limiter is to protect the servlet from either increased client traffic (batch apps or retry storms) or latency spikes from a dependent service. With the limiter installed the server can ensure that latencies remain low by rejecting excess traffic with HTTP 429 Too Many Requests errors.

In this example a servlet is configured with a single adaptive limiter that is shared among batch and live traffic with live traffic guaranteed 90% of throughput and 10% guaranteed to batch. The limiter is given a lookup function that translates the request's Principal to one of the two groups (live vs batch).

Map<String, String> principalToGroup = ...;
Filter filter = new ConcurrencyLimitServletFilter(new ServletLimiterBuilder()
    .partitionByUserPrincipal(principal -> principalToGroup.get(principal.getName()))
    .partition("live", 0.9)
    .partition("batch", 0.1)
    .build());

Executor

The BlockingAdaptiveExecutor adapts the size of an internal thread pool to match the concurrency limit based on measured latencies of Runnable commands and will block when the limit has been reached.

public void drainQueue(BlockingQueue<Runnable> tasks) throws InterruptedException {
    Executor executor = new BlockingAdaptiveExecutor(
        SimpleLimiter.newBuilder()
            .build());
    
    while (true) {
        executor.execute(tasks.take());
    }
}


concurrency-limits's Issues

BlockingAdaptiveExecutor.Builder differs critically from BlockingAdaptiveExecutor ctor

The ctor automatically wraps its Limiter in BlockingLimiter whereas the Builder does not, leading to unexpected RejectedExecutionExceptions under common upgrade/update conditions.

I'm going to put together a pull request to at least simplify the application of BlockingLimiter from within the Builder pattern, although it's probably too late to make it the default.

BlockingLimiter: java.lang.ArithmeticException: long overflow after updating from 0.1.1 to 0.1.2

After updating from 0.1.1 to 0.1.2 I get:

java.lang.ArithmeticException: long overflow
at java.lang.Math.multiplyExact(Math.java:892)
at java.time.Duration.toMillis(Duration.java:1171)
at com.netflix.concurrency.limits.limiter.BlockingLimiter.tryAcquire(BlockingLimiter.java:70)
at com.netflix.concurrency.limits.limiter.BlockingLimiter.acquire(BlockingLimiter.java:87)

The same exception occurs when I use version 0.1.4.

Race condition in SimpleLimiter.acquire

SimpleLimiter.acquire checks if the current in-flight executions are above the limit, but doesn't do so in a thread-safe manner, allowing the possibility of exceeding the limit.

[Question] How can we control the queue size with long-running requests?

Consider an asynchronous system with an external queue from which requests are sent. The response to a request merely means an ack of the request, not its handling. The server will send a reply related to the request after an arbitrary time which depends on the load on the system. For long response times, say on the order of minutes, keeping the corresponding requests in memory to measure the latency defeats the purpose of an async system.

Is it possible to find the concurrency limits in such a system? I assume that we have the request rate at which we are sending the requests and the response rate at which the system is finishing the requests (these can be measured cheaply and need no request/response association). In a steady state, the request rate and the response rate match. We can increase the request rate until the response rate stagnates.

Is there anything we can apply from TCP congestion control (or some other technique) to fine tune the request rate to slowly increase to max capacity and keep tweaking the request rate based on the response rates?

The basic idea is, in a given window:

if request_rate <= response_rate:
    # (less is possible since responses are for requests from a prev window)
    request_rate = request_rate * (1 + x)
else:
    request_rate = request_rate * (1 - y)

Questions:

  1. Does this make sense, or is there another way to model or think about limiting?
  2. How are x and y typically measured?

ConcurrencyLimitServerInterceptor ignores Limiter for one-way streams

The ConcurrencyLimitServerInterceptor simply passes thru, bypassing its Limiter, any calls that don’t use bi-directional streaming:

@Override
public <ReqT, RespT> Listener<ReqT> interceptCall(final ServerCall<ReqT, RespT> call,
                                                  final Metadata headers,
                                                  final ServerCallHandler<ReqT, RespT> next) {
    if (!call.getMethodDescriptor().getType().serverSendsOneMessage()
            || !call.getMethodDescriptor().getType().clientSendsOneMessage()) {
        return next.startCall(call, headers);
    }

So, in systems that make use of only unary or one-direction streaming, which one would think are quite common, the Limiter, and essentially the entire subsystem, do nothing.

Am I correct in presuming that the assumption here is that any call that doesn’t do bidirectional streaming will be “fast” and has no need for limiting? Any plans to change or config this behavior?

Thanks.

Probe minimum RTT

The limiter currently keeps track of an absolute minimum latency since startup. For traffic with a very wide range of latency distributions this can prevent the limit from growing. I can think of two possible solutions,

  1. Track a low percentile (10%) of latency instead of minimum to help filter out outliers.
  2. Periodically double the minimum RTT to effectively probe for a higher RTT. This is similar to probing techniques used in Google's BBR. We tried this out early on in our internal implementation and there was concern that during prolonged periods of increased latency this can result in the limit drifting upwards, which is the opposite of the desired effect.

Q: VegasLimit performs poorly when rate limited, if the rate limit is similar to or less than the window size?

We currently have servers which have rate limits. We'd prefer to keep these, since at a certain level they're useful for enforcing good development practices. Right now, our backoff policy is randomized exponential backoff with a maximum number of backoffs per individual request.

This does not work for providing backpressure, since if the 'target rate' is greater than the rate limit, eventually a request will (randomly or not) hit its maximum number of retries by being unlucky.

I've been investigating using this library in order to cause backpressure on the client side. The gist is that the server has a rate limit, and if the client is exceeding this rate limit, they will immediately receive an HTTP 429. The client will then back off for a certain amount of time, after which they mark their request as 'dropped'.

With AIMD, this works moderately well - eventually the client goes above the rate limit and a request is dropped. The next window, the concurrency limit is halved, which generally lowers the request rate correspondingly, so the rate limit stops being hit.

Now, the main issue here is that if the window size is larger than the minimum amount of time spent backing off, there is still some small probability that the request will fail when it hits this point, e.g. if I have 21 threads spinning on making requests, a rate limit of 20 requests per second, with a successful request taking 1 second, and a rate limited request finishing immediately, I will almost surely fail with AIMD - since if e.g. the window is 100, it'll take 5 seconds to close the window, and we don't back off for 5 seconds.

The Vegas limit is even worse, since it's even more aggressive at ramping up and takes potentially many windows to ramp down, which means that failure is roughly guaranteed when the rate limit is low compared to the window size.

It also looks like I should prefer to use the Vegas limit in general, since it is the one that actually responds to being queued.

I'm wondering if there's any changes that could be made to the library here to help with this. At the moment, it seems that only successful requests fill the window, so if I'm being rate limited, I could potentially see many many failures before the backoff occurs. Is this intentional?

I'm wondering whether it would be viable/desirable to have a 'dropped' request

  1. immediately cause backoff to occur (would require changes to the gradient limit, but they're minor)
  2. debounce this so that it can only happen once per full window seen?

Another question would be: Would I end up with something drastically broken if the 'decrease function' for the Vegas limit were to be different for didDrop vs queueSize > beta?

rtt_noload selection makes Vegas hard to use operationally

I've been toying with this implementation to see if it would fit in our system. I've set up a harness to send in work to the algorithm to see how it responds to different scenarios.

I've noticed it is incredibly sensitive to the rtt_noload value. This is currently selected as the fastest seen rtt. In our system we can have a fairly wide deviation on rtts so we sometimes get requests that are significantly faster than at other times. For example, we might have an endpoint that has a mean response time of 300ms but, very occasionally, might return in 150ms.

That 150ms will get stored as the rtt_noload. This causes our estimatedlimit to drop hard and we will frequently go over the limit.

I've tried modifying the algorithm to store the lowest weighted moving average as the rtt_noload and this can help somewhat but is still susceptible to "lucky" periods that happen to be faster than the mean.

Are there any thoughts on a better way to select the rtt_noload? This selection seems fundamental to the algorithm.

Provide an ExecutorService implementation?

To make it easier to use concurrency-limits with libraries using ExecutorService (Apache Camel in my case), it would be great if concurrency-limits would provide an ExecutorService implementation.

Sorry if this is out of scope.

By the way: Thank you very much for providing concurrency-limits!

Trying to understand part of windowed algorithm

I'm trying to figure out why (and whether) this condition check in the windowed limit is important: https://github.com/Netflix/concurrency-limits/blob/master/concurrency-limits-core/src/main/java/com/netflix/concurrency/limits/limit/WindowedLimit.java#L155

It looks like under low load it wouldn't let the delegate limit run (while it probably should raise the limit, especially if the limit was pessimized earlier during an overload).
Under high load it works fine with averages, but with percentiles you're either skipping too many samples (because they exceed the window size) or preventing the actual logic from executing, because you end up setting the window size to a big number. So picking a window size becomes really tricky to get right.

VegasLimit uses the min RTT as noloadRtt

When I use a ThreadPoolExecutor, all Runnables may wait in the queue for some time.

I think it would be better to have a way to supply the real no-load time to the concurrency limiter.

VegasLimit manually setting rtt_noload

Recently we tried this library for concurrency limiting, and we found that the rtt_noload may grow too large to limit client concurrency, so we may need a way to set a baseline for rtt_noload.

Multiple Unlocks of a Critical Resource

lock.lock();
if (getInflight() >= getLimit() && partition.isLimitExceeded()) {
    lock.unlock();

VegasLimit for batch processing

One limitation of the current VegasLimit is that it doesn't work well with batch processing systems.

Different from a real-time processing system, a batch processing system has a buffer: requests are first put into the buffer and then processed in batches. The rtt here is the time elapsed between a request being put into the buffer and the request being processed.

The problem with VegasLimit is that rtt > rtt_noload is taken to indicate a queue is building up; however, in a batch processing system that's not true. For example, consider a virtual cargo shipping system: at the terminal, shipping containers are shipped at a fixed interval (for convenience, a shipping container has unlimited capacity). Packages arriving at the terminal are shipped by the next pending shipping container.

In this model, a package that arrives at the terminal just before the next shipping event has the smallest rtt, i.e. rtt_noload; a package that arrives right after a container ships has the largest rtt (rtt_noload + interval). The current VegasLimit could incorrectly treat this system as saturated because, due to the interval, rtt can be significantly larger than rtt_noload, even though the virtual shipping system is not saturated at all.

Kafka message producing is a concrete example of batch processing. Our goal is to use VegasLimit to detect saturation of message producing.

GC pauses make maxInflightRequest highly volatile when response times are lower than the pauses themselves

Hi.
I've just asked this question on Stack Overflow:

https://stackoverflow.com/questions/59311752/how-to-limit-concurrency-in-a-webapp-when-gc-pauses-last-more-than-the-average-r#

I've forked this project and used it in an application, and I realized that GC pauses last longer than my average response time. When tracking the value of maxInflightRequest, I can see that when a GC (minor or major) is performed, the value of maxInflightRequest goes up and reaches the threshold I've configured, stressing the application. So I'm getting rejections for requests that should have been processed.

All the details are in the stackoverflow question

[RFI] How does this differentiate between a dependent service getting slower and ordinary excess concurrency impacting latency?

I can absolutely see that tuning connection limits up and down while watching average latency is a great signal. What happens when you're at steady state and one of your dependencies (say a database) becomes unresponsive and your latencies spike? It's hard to prescribe the perfect one-size-fits-all response to this situation, but in some scenarios I'd want it to add more concurrency to counteract the blocked threads waiting on a DB connection timeout.

Ideally it would incorporate a cpu or blocked threads metric into the equation to help determine if it's a dependency or the service itself that is impacting latency.

Rest Client concurrency limiter - Clarification

Hi - I'm invoking a REST-based service from my application using WebClient, and I would like to use ConcurrencyLimitServletFilter in my client to apply backpressure and limit the number of requests sent to the server, so that the server does not get overloaded and the client does not wait a long time only to get a connection failure.

Can ConcurrencyLimitServletFilter be used for both client and server concurrency limiting?

If someone can provide a sample of how to register ConcurrencyLimitServletFilter with WebClient, that would be helpful.

Is this project dormant?

Hi, I just noticed a drop-off in releases and activity. Zipkin currently depends on this, but we've hidden it due to the <1.0 status. Can you let us know if this project will continue?

Thrift integration

Hi @elandau, what's your input on integrating this library with Thrift? I would like to experiment on that. As far as I know, Thrift doesn't have flexible interceptors to tap into but I think I could figure out a workaround using Object Proxy Design pattern.

Why does Log10RootFunction compute Math.sqrt?

Currently Log10RootFunction has the following code

    @Override
    public Integer apply(Integer t) {
        return t < 1000 ? lookup[t] : (int)Math.sqrt(t);
    }

I assume this is a mistake, as we are pre-computing log10 for the first 1000 numbers like this:

    static {
        IntStream.range(0, 1000).forEach(i -> lookup[i] = Math.max(1, (int)Math.log10(i)));
    }

Can you fix this?

Question about simulating problems

I'm starting to investigate creating a .NET library that implements some of the concepts in this library, but the part that I'm struggling with a bit is figuring out how I can simulate some of the issues that the library is meant to protect against (latency spikes, resource exhaustion, etc). Basically I want to make sure that what I'm implementing is actually working as expected and providing a benefit.

I was just wondering if you had any pointers on how to go about simulating or benchmarking to save me reinventing the wheel, or had any links to information you used when designing and implementing this library.

Thanks in advance!

Getting service unavailable


I am getting a server off response for Concurrency exchange data.

Is there any other back up server to get data from?

Impact of Cost on Concurrency limit of a Partition

Hi, I would like to discuss a feature I need. I am not entirely sure if it fits well with concurrency limiting logic.

Problem: The cost (latency or resource usage) of jobs should have a high impact on the concurrency limit of the partition that issued them. A partition that issues costly jobs should receive a lower concurrency limit compared to other partitions whose load pattern hasn't changed. The overall concurrency limit could be affected initially, but as soon as the partition with costly jobs is penalized, the other partitions should be restored to their normal limits.

To achieve this, do we just maintain separate SimpleLimiter for each of our partition type or is there a better way?

Add PartitionRegistry

Hi, do you think it is a good idea to have a PartitionRegistry that is capable of adding or removing Partitions during runtime? I can submit a PR based on our conclusion.

Based on the current code, the limit for a partition is just totalLimit * Partition#percentage. Now, driven by controlled external config changes, I want to add or remove partitions from the limiter at runtime. When I add new partitions with adjusted percentages across the board, I want the current limit to be updated for existing partitions and appropriate limits to be assigned to new partitions. A similar case applies when partitions are removed. This way I can evenly divide the quota and pool it back up when there are fewer actors.

Discussion about Gradient2Limit in a stable timeout situation

Hi, I want to discuss Gradient2Limit in a stable timeout situation.

I use Gradient2Limit on the consumer side (to control the consumer's rate so it doesn't overwhelm the provider side).

First, what is a stable timeout situation?

In my setup, every consumer has a client timeout set, like 5000ms. The stable timeout situation is:

1. The provider is overwhelmed by another consumer which doesn't have any rate limit.
2. The consumer on which I use Gradient2Limit always observes the same timeout RTT of `5000ms`; I call this a `stable timeout` situation.

Once my consumer using Gradient2Limit enters the stable timeout situation, Gradient2Limit no longer takes effect.

This is because Gradient2Limit only cares about the volatility of the RTT. If the RTT has zero volatility, as in the stable timeout situation, Gradient2Limit will grow its limit output until Max_Limit.

So, how could I deal with this situation? Can we make Gradient2Limit support it?

For now I am considering using the BBR algorithm to implement a new Limit to resolve it.

Has anyone met the same situation? How do you deal with it?

Queuesize problem in Gradient2Limit

Why does queuesize in Gradient2Limit always return 4 instead of using the square root as written in the "Performance Under Load" article? What's the purpose of doing this?


Spring WebFlux Implementation

I'm wondering if it makes sense/is possible to have an implementation for Spring WebFlux.

With reactive service listeners not being tied to a thread pool, there would seem to be even greater need for some mechanism limiting incoming requests to not overwhelm downstream systems.

VegasLimit without slow start is unusable

Thank you for bringing this thing to the public.

VegasLimit is almost unusable in its current implementation for any highly concurrent workload (CPU bound with a high core count, or IO bound) due to very slow ramp-up.

I see slow start was removed in #16 for being too aggressive and overshooting; my experiments with smoothing and exponential growth show good results. Another thing to note that was not implemented is that Vegas seems to have another parameter, gamma (default 1). When queuing is higher than gamma it stops slow start (sets a new ssthreshold) and enters congestion avoidance. Some papers say to decrease the limit by 1/8; Linux does something interesting too: https://github.com/torvalds/linux/blob/master/net/ipv4/tcp_vegas.c#L219

ymmv

No PRs from me for now, since I'm experimenting with this idea in Go at the moment.

Some questions about the concurrency-limits-servlet module

  1. The sub-module concurrency-limits-servlet test source tree doesn't have a java folder, like this:
─test
    └─com
        └─netflix
            └─concurrency
                └─limits

standard folder should be like this:

─test
    └─java
        └─com
            └─netflix
                └─concurrency
                    └─limits
  2. In the classes com.netflix.concurrency.limits.GroupServletLimiterTest and
    ConcurrencyLimitServletFilterTest, the way the Limiter is built may be wrong:
Limiter<HttpServletRequest> limiter = new ServletLimiterBuilder()
                .limit(VegasLimit.newDefault())
                .partitionByUserPrincipal(p -> principalToGroup.get(p.getName()), builder -> builder
                 .assign("live", 0.8)
                 .assign("batch", 0.2))
                .build();

The method partitionByUserPrincipal expects a single parameter of type Function<Principal, String> (principalToGroup), but here two parameters are passed.

I think using the approach introduced in the documentation would resolve this:

Limiter<HttpServletRequest> limiter = new ServletLimiterBuilder()
                .limit(VegasLimit.newDefault())
                .partitionByUserPrincipal(p -> principalToGroup.get(p.getName()))
                .partition("live", 0.8)
                .partition("batch", 0.2)
                .build();

Of course, maybe this is just a new API under development.

Logic behind dropping AIMD current limit by half when reaching maximum

if (currentLimit >= maxLimit) {
    currentLimit = currentLimit / 2;
}

Can anyone describe why that is needed? It looks like if congestion doesn't occur and the current limit reaches its maximum, it's because maxLimit was pessimistically low. So instead of continuing to work successfully at full steam, the limit gets halved, and so does the incoming traffic (and traffic redistributed from the current instance will go to other instances, potentially hit the maximum there and be halved again), traffic which otherwise could have been processed correctly.

Add ability for AIMDLimit to grow by a configurable amount

When AIMD works in conjunction with a windowed limit, it might be way too cautious in growing back after a bursty spike (and, therefore, a backoff), because it has to process a full window's worth of requests before increasing the limit by 1.

My proposal is to add new method to AIMDLimit Builder, e.g.

public AIMDLimit.Builder growBy(int amount)

defaulting to 1 (current behavior)

add ability to ignore CORS requests in ConcurrencyLimitServletFilter

prehistory

I have a backend application which handles requests from a frontend application directly (from the browser), so the browser sends CORS preflight requests to make sure it's a permitted client.

Those requests are sent against the URL which runs the actual logic. OPTIONS requests cannot be cached, so in the worst-case scenario we hit a 1:1 ratio between extremely fast non-business requests and actual business-logic requests. According to the Java filter specification, filters can only be applied to URLs, without any extra conditions, so the only place we can address this is the filter itself.

actual state of the library

WindowedLimit has a way to work around this via minRttThreshold; however, this is not the best solution, because it doesn't target the CORS problem exclusively and can therefore skew the statistics. I am not very familiar with what's going on in the Gradient2/Vegas limits, but I assume they might be affected as well.

solution

So I would like to suggest 3 different approaches:

  • create CorsAwareConcurrencyLimitServletFilter/CorsIgnoringConcurrencyLimitServletFilter filter
  • add a parameter ignoreOptionsRequests/doNotCountOptionsRequests/ignoreCorsRequests or something similar to ConcurrencyLimitServletFilter
  • support passing a Predicate<HttpServletRequest> instead and let everyone implement whatever they want

I like the first solution because it highlights the hypothetical problem the most. The third approach is obviously much more flexible, but it might be overkill, and it doesn't prompt users to think about the CORS request problem.

Do you think this is something worth implementing in the library?

Also, as soon as we decide on an approach, I'd like to provide the implementation.

Starving less important partitions

Hi @elandau, thanks for providing this library. I have a question around partition based limiters.

From the code, the partition limits are calculated as newTotalLimit * Partition#percent. But the README talks about starving one partition to favor another. How would I achieve starving something like the batch traffic from the example?

For example, a system that takes both live and batch traffic may want to give live traffic 100% of the limit during heavy load and is OK with starving batch traffic.

Any help from you or other users is highly appreciated!

Deprecate GradientLimit?

GradientLimit is very similar to Gradient2Limit except for a few minor things, and it looks like GradientLimit needs some fixes, such as:

-            estimatedLimit = Math.max(minLimit, queueSize);
+            estimatedLimit = Math.max(minLimit, Math.min(maxLimit, queueSize));
...
-        newLimit = Math.max(queueSize, Math.min(maxLimit, newLimit));
+        newLimit = Math.max(minLimit, Math.min(maxLimit, newLimit));

And those from this PR: #134.

With all these fixes, GradientLimit will be exactly equal to Gradient2Limit except for the probeInterval.

WindowedLimit with configurable percentile in addition to average

I've been thinking about what a perfect installation of concurrency-limits might look like for my needs. I walked along the path Vegas -> Gradient2 -> AIMD -> AIMD wrapped with Windowed. It gave me the idea that calculating average response times is usually less representative for real-time services. Even if the average were a perfect p50, I'd like to have my limits reduced earlier than the point when half of my users are outside acceptable timings. What people usually monitor is percentiles like p90, p95, p99, p999, and that's the metric I would prefer to feed into my AIMDLimit, because it's a better representation of user experience. So I think concurrency-limits would be a better tool if it had percentile-window support.

I've done some research on moving percentile algorithms, and it seems the only O(1) algorithm is the one described here; it requires some knowledge about the time distribution, it might be less accurate in some cases (check out the plot at the bottom of the article), and it's less straightforward, therefore more error-prone and harder to trust.

I'd like to help with that, if the team doesn't mind, and I think that an implementation with O(N) space, O(log N) insert time, and O(1) percentile calculation might be an appropriate solution here. My idea for now looks like this:

  1. Allow WindowedLimit to work with abstractions other than ImmutableSampleWindow.
  2. Write some sort of PercentileSampleWindow class which is mutable (I think copying N observations on each sample is too much) and backed by a PriorityBlockingQueue.
  3. When the window is ready, fetch the queue-backing array with reflection (to avoid extra copies), get the size()*percentile value, call onSample() and update the WindowedLimit's field with a brand new instance of PercentileSampleWindow.

What do you think so far?

[Question] Is this git repo active?

Dear experts,

Adaptive concurrency limits are really awesome! It seems the last release of this repo was in 2019; is this git repo still active?
BTW, does it support Netflix Spring Cloud?

Really appreciate your great help in advance!

Thanks,
Roy

Share experience with the limiter

We'd love to hear about how people are using the limiter. We'd especially like to hear about where it works and under which scenarios it does not perform as expected. Feel free to comment on this issue with your observations.
