netflix / concurrency-limits
License: Apache License 2.0
Recently, we tried this library for concurrency limiting, and we found that rtt_noload may grow too large to limit client concurrency. So we may need a way to set a baseline for rtt_noload.
Could this be changed to RESOURCE_EXHAUSTED, so that it doesn't make it look like the server is down?
SimpleLimiter.acquire checks if the current in-flight executions are above the limit, but doesn't do so in a thread-safe manner, allowing the possibility of exceeding the limit.
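The usual fix for a check-then-increment race is to make the two steps one atomic operation. A minimal sketch of that pattern (this is not the library's code, just the standard CAS loop the report implies):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: admit a request only if a compare-and-set on the in-flight
// counter succeeds, so the limit cannot be exceeded by a race between
// the check and the increment.
public class CasAdmission {
    private final AtomicInteger inFlight = new AtomicInteger();
    private volatile int limit;

    public CasAdmission(int limit) { this.limit = limit; }

    /** Returns true if the caller may proceed; false if at the limit. */
    public boolean tryAcquire() {
        for (;;) {
            int current = inFlight.get();
            if (current >= limit) {
                return false;            // reject without mutating state
            }
            if (inFlight.compareAndSet(current, current + 1)) {
                return true;             // check and increment were atomic
            }
            // CAS lost a race; reload and retry
        }
    }

    public void release() { inFlight.decrementAndGet(); }
}
```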
https://github.com/Netflix/concurrency-limits/blob/master/README.md#vegas doesn't mention exactly what L, minRTT and sampleRTT are.
I've been thinking about how a perfect installation of concurrency-limits might look for my needs. I walked along the path Vegas -> Gradient2 -> AIMD -> AIMD wrapped with Windowed. And it gave me an idea: calculating average response times is usually less representative for real-time services. Even if the average were a perfect p50, I'd like to have my limits reduced earlier than the point when half of my users are outside acceptable timings. What people usually monitor is percentiles like p90, p95, p99, p999, and that's the metric I would prefer for my AIMDLimit, because it's a better representation of user experience. So I think concurrency-limits would be a better tool if it had percentile-window support.
I've done some research on moving-percentile algorithms, and it seems the only O(1) algorithm is described here; it requires some knowledge of the time distribution and might be less accurate in some cases (check out the plot at the bottom of the article). It's also less straightforward, therefore more error-prone and harder to trust.
I'd like to help with that, if the team doesn't mind, and I think an implementation with O(N) space, O(log N) insert time, and O(1) percentile calculation might be an appropriate solution here. My idea for now looks like this:
1. Allow WindowedLimit to work with abstractions other than ImmutableSampleWindow.
2. Add a PercentileSampleWindow class which is mutable (I think copying N observations on each sample is too much) and backed by a PriorityBlockingQueue.
3. On window end, read the size()*percentile value, call onSample(), and update the WindowedLimit's field with a brand new instance of PercentileSampleWindow.
What do you think so far?
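A simple realization of the window part of this proposal could look like the following sketch (class and method names are hypothetical, not library API; it sorts once per window instead of maintaining a heap, which is the easiest correct baseline to compare against):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch: accumulate RTT samples for one window, then report the chosen
// percentile instead of the average when the window closes.
public class PercentileWindow {
    private final List<Long> samples = new ArrayList<>();
    private final double percentile;   // e.g. 0.95 for p95

    public PercentileWindow(double percentile) { this.percentile = percentile; }

    public synchronized void addSample(long rttNanos) { samples.add(rttNanos); }

    /** Returns the percentile RTT for the window and resets it. */
    public synchronized long closeWindow() {
        Collections.sort(samples);     // O(N log N) once per window
        int index = (int) Math.ceil(percentile * samples.size()) - 1;
        long result = samples.get(Math.max(index, 0));
        samples.clear();
        return result;
    }
}
```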
I'm wondering if it makes sense/is possible to have an implementation for Spring WebFlux.
With reactive service listeners not being tied to a thread pool, there would seem to be even greater need for some mechanism limiting incoming requests to not overwhelm downstream systems.
The ConcurrencyLimitServerInterceptor simply passes through, bypassing its Limiter, any calls that don't use bi-directional streaming:
```java
@Override
public <ReqT, RespT> Listener interceptCall(final ServerCall<ReqT, RespT> call,
                                            final Metadata headers,
                                            final ServerCallHandler<ReqT, RespT> next) {
    if (!call.getMethodDescriptor().getType().serverSendsOneMessage()
            || !call.getMethodDescriptor().getType().clientSendsOneMessage()) {
        return next.startCall(call, headers);
    }
```
So, in systems that make use of only unary or one-direction streaming, which one would think are quite common, the Limiter, and essentially the entire subsystem, do nothing.
Am I correct in presuming that the assumption here is that any call that doesn't do bidirectional streaming will be "fast" and has no need for limiting? Any plans to change this behavior or make it configurable?
Thanks.
We'd love to hear about how people are using the limiter. We'd especially like to hear about where it works and under which scenarios it does not perform as expected. Feel free to comment on this issue with your observations.
Hi, I would like to discuss a feature I need. I am not entirely sure if it fits well with concurrency limiting logic.
Problem: The cost (latency or resource usage) of the jobs should have high impact on the concurrency limit of the partition that issued them. The partition that is issuing costly jobs should receive lower concurrency limit when compared to other partitions with no change in their load pattern. The concurrency limit could get affected overall initially but as soon as the partition with costly jobs is penalized, the other partitions should be restored to normal limits.
To achieve this, do we just maintain a separate SimpleLimiter for each of our partition types, or is there a better way?
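The separate-limiter-per-partition approach can be sketched generically like this (the type parameter stands in for the library's limiter; the class name is made up for illustration). Each partition then adapts independently, so a costly partition only degrades its own limit:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

// Sketch: keep one independent limiter instance per partition key so a
// partition issuing costly jobs cannot drag down the others' limits.
public class PerPartitionLimiters<L> {
    private final ConcurrentMap<String, L> limiters = new ConcurrentHashMap<>();
    private final Supplier<L> factory;

    public PerPartitionLimiters(Supplier<L> factory) { this.factory = factory; }

    /** Lazily creates an isolated limiter for each partition key. */
    public L forPartition(String partition) {
        return limiters.computeIfAbsent(partition, k -> factory.get());
    }

    public int size() { return limiters.size(); }
}
```

The trade-off versus one shared partitioned limiter is that the per-partition limiters never see each other's load, so there is no global cap unless you add one on top.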
To make it easier to use concurrency-limits with libraries using ExecutorService (Apache Camel in my case), it would be great if concurrency-limits provided an ExecutorService implementation.
Sorry if this is out of scope.
By the way: thank you very much for providing concurrency-limits!
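The requested wrapper might look roughly like this sketch. It decorates a plain Executor; a real integration would acquire/release through a Limiter and record latency, but here a simple counter stands in (the class name and the counter are assumptions, not library API):

```java
import java.util.concurrent.Executor;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: an Executor decorator that rejects submissions beyond a
// concurrency limit and releases the permit when the task completes.
public class LimitedExecutor {
    private final Executor delegate;
    private final AtomicInteger inFlight = new AtomicInteger();
    private final int limit;

    public LimitedExecutor(Executor delegate, int limit) {
        this.delegate = delegate;
        this.limit = limit;
    }

    public void execute(Runnable task) {
        if (inFlight.incrementAndGet() > limit) {
            inFlight.decrementAndGet();
            throw new RejectedExecutionException("concurrency limit reached");
        }
        delegate.execute(() -> {
            try {
                task.run();                 // a real impl would sample latency here
            } finally {
                inFlight.decrementAndGet(); // release the permit
            }
        });
    }
}
```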
GradientLimit is very similar to Gradient2Limit except for a few minor things, and it looks like GradientLimit needs some fixes, such as:

```diff
- estimatedLimit = Math.max(minLimit, queueSize);
+ estimatedLimit = Math.max(minLimit, Math.min(maxLimit, queueSize));
...
- newLimit = Math.max(queueSize, Math.min(maxLimit, newLimit));
+ newLimit = Math.max(minLimit, Math.min(maxLimit, newLimit));
```
And those from this PR: #134.
With all these fixes, GradientLimit will be exactly equal to Gradient2Limit except for the probeInterval.
After updating from 0.1.1 to 0.1.2 I get:

```
java.lang.ArithmeticException: long overflow
	at java.lang.Math.multiplyExact(Math.java:892)
	at java.time.Duration.toMillis(Duration.java:1171)
	at com.netflix.concurrency.limits.limiter.BlockingLimiter.tryAcquire(BlockingLimiter.java:70)
	at com.netflix.concurrency.limits.limiter.BlockingLimiter.acquire(BlockingLimiter.java:87)
```

The same exception occurs when I use version 0.1.4.
If we drop the request, the delay will be almost 0. In that case onSample will not be able to increase the limit very well; it will keep dropping, and then let even more requests be dropped.
Consider an asynchronous system with external queue from where requests are sent. The response to this request merely means an ack to the request and not the handling. The server will send a reply related to the request after an arbitrary time which depends on the load on the system. For long response times, say in the order of minutes, keeping the corresponding requests in memory to measure the latency defeats the purpose of async system.
Is it possible to find the concurrency limits in such a system? I assume that we have the request rate at which we are sending requests and the response rate at which the system is finishing them (these can be measured cheaply, with no request/response association needed). In a steady state, the request rate and response rate match. We can increase the request rate until the response rate stagnates.
Is there anything we can apply from TCP congestion control (or some other technique) to fine tune the request rate to slowly increase to max capacity and keep tweaking the request rate based on the response rates?
The basic idea is, in a given window:

```
if request_rate <= response_rate:
    // (less is possible since responses are for requests from a prev window)
    request_rate = request_rate * (1 + x)
else:
    request_rate = request_rate * (1 - y)
```
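A direct Java rendering of the pseudocode above, for experimentation (x, y, and the class name are assumptions, not library code):

```java
// Multiplicative adjustment of the request rate once per window,
// based on the observed response rate.
public class RateAimd {
    private final double increase; // x, e.g. 0.05
    private final double decrease; // y, e.g. 0.2
    private double requestRate;

    public RateAimd(double initialRate, double increase, double decrease) {
        this.requestRate = initialRate;
        this.increase = increase;
        this.decrease = decrease;
    }

    /** Called once per window with the measured response rate. */
    public double onWindow(double responseRate) {
        if (requestRate <= responseRate) {
            // less is possible since responses are for requests from a prev window
            requestRate *= (1 + increase);
        } else {
            requestRate *= (1 - decrease);
        }
        return requestRate;
    }
}
```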
Questions:
When I use a ThreadPoolExecutor, every Runnable may wait in the queue for some time.
I think we need a way to supply the real no-load time to the concurrency limit.
I've been toying with this implementation to see if it would fit in our system. I've set up a harness to send in work to the algorithm to see how it responds to different scenarios.
I've noticed it is incredibly sensitive to the rtt_noload value. This is currently selected as the fastest seen rtt. In our system we can have a fairly wide deviation on rtts so we sometimes get requests that are significantly faster than at other times. For example, we might have an endpoint that has a mean response time of 300ms but, very occasionally, might return in 150ms.
That 150ms will get stored as the rtt_noload. This causes our estimatedlimit to drop hard and we will frequently go over the limit.
I've tried modifying the algorithm to store the lowest weighted moving average as the rtt_noload and this can help somewhat but is still susceptible to "lucky" periods that happen to be faster than the mean.
Are there any thoughts on a better way to select the rtt_noload? This selection seems fundamental to the algorithm.
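One hedged idea in this direction (not library code, names invented here) is to age the stored minimum so a single lucky fast sample cannot anchor rtt_noload forever: each window the recorded minimum is inflated slightly, so a genuinely fast baseline keeps re-establishing itself while an outlier decays away.

```java
// Sketch: an exponentially aging minimum-RTT tracker.
public class AgingMinRtt {
    private long minRtt = Long.MAX_VALUE;
    private final double agingFactor; // e.g. 1.05 per window

    public AgingMinRtt(double agingFactor) { this.agingFactor = agingFactor; }

    public void onSample(long rtt) { minRtt = Math.min(minRtt, rtt); }

    /** Call once per window; returns the current no-load estimate. */
    public long onWindowEnd() {
        long current = minRtt;
        if (minRtt != Long.MAX_VALUE) {
            minRtt = (long) (minRtt * agingFactor); // let outliers age out
        }
        return current;
    }
}
```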
Travis is now recommending removing the sudo tag:
"If you currently specify sudo: false in your .travis.yml, we recommend removing that configuration"
Hi @elandau, what's your input on integrating this library with Thrift? I would like to experiment on that. As far as I know, Thrift doesn't have flexible interceptors to tap into but I think I could figure out a workaround using Object Proxy Design pattern.
I'm starting to investigate creating a .NET library that implements some of the concepts in this library, but the part that I'm struggling with a bit is figuring out how I can simulate some of the issues that the library is meant to protect against (latency spikes, resource exhaustion, etc). Basically I want to make sure that what I'm implementing is actually working as expected and providing a benefit.
I was just wondering if you had any pointers on how to go about simulating or benchmarking to save me reinventing the wheel, or had any links to information you used when designing and implementing this library.
Thanks in advance!
Hi - I'm invoking a REST-based service from my application using WebClient, and I would like to use ConcurrencyLimitServletFilter in my client to apply backpressure and limit the number of requests sent to the server, so the server doesn't get overloaded and the client doesn't wait a long time only to get a connection failure.
Can ConcurrencyLimitServletFilter be used for both client-side and server-side concurrency limiting?
A sample of how to register ConcurrencyLimitServletFilter with WebClient would be helpful.
I'm trying to figure out why this condition check for the windowed limit is important (and whether it is): https://github.com/Netflix/concurrency-limits/blob/master/concurrency-limits-core/src/main/java/com/netflix/concurrency/limits/limit/WindowedLimit.java#L155
It looks like under low load it wouldn't let the delegate limit run (while it probably should raise the limit, especially if it was pessimized earlier during an overload).
Under high load it will work fine with averages, but with percentiles you're either skipping too many samples (because they exceed the window size) or preventing the actual logic from executing, because you will be setting the window size to a big number. So picking a window size becomes really tricky and hard to get right.
The code in the Executor section of README.md refers to the class DefaultLimiter, which does not seem to exist anymore.
Hi @elandau, thanks for providing this library. I have a question around partition based limiters.
From the code, the partition limits are calculated as newTotalLimit * Partition#percent. But the README talks about starving one partition to favor another. How would I achieve starving something like the batch traffic from the example?
For example, a system that takes both live and batch traffic may want to give live traffic 100% of the limit during heavy load and is OK with starving batch traffic.
Any help from you or other users is highly appreciated!
We currently have servers which have rate limits. We'd prefer to keep these, since at a certain level they're useful for enforcing good development practices. Right now, our backoff policy is randomized exponential backoff with a maximum number of backoffs per individual request.
This does not work for providing backpressure, since if the 'target rate' is greater than the rate limit, eventually a request will (randomly or not) hit its maximum number of retries by being unlucky.
I've been investigating using this library in order to cause backpressure on the client side. The gist is that the server has a rate limit, and if the client is exceeding this rate limit, they will immediately receive an HTTP 429. The client will then back off for a certain amount of time, after which they mark their request as 'dropped'.
With AIMD, this works moderately well - eventually the client goes above the rate limit and a request is dropped. The next window, the concurrency is dropped by half, which generally will correspondingly drop the concurrency too, and cause the rate limit to be missed.
Now, the main issue here is that if the window size is larger than the minimum amount of time spent backing off, there is still some small probability that the request will fail when it hits this point, e.g. if I have 21 threads spinning on making requests, a rate limit of 20 requests per second, with a successful request taking 1 second, and a rate limited request finishing immediately, I will almost surely fail with AIMD - since if e.g. the window is 100, it'll take 5 seconds to close the window, and we don't back off for 5 seconds.
The Vegas limit is even worse, since it's even more aggressive at ramping up and takes potentially many windows to ramp down, which means that failure is roughly guaranteed when the rate limit is low compared to the window size.
It also looks like I should prefer to use the Vegas limit in general, since it is the one that actually responds to being queued.
I'm wondering if there's any changes that could be made to the library here to help with this. At the moment, it seems that only successful requests fill the window, so if I'm being rate limited, I could potentially see many many failures before the backoff occurs. Is this intentional?
I'm wondering whether it would be viable/desirable to have a 'dropped' request
Another question would be: Would I end up with something drastically broken if the 'decrease function' for the Vegas limit were to be different for didDrop vs queueSize > beta?
When AIMD works in conjunction with the windowed limit, it might be way too cautious growing back after a bursty spike (and, therefore, a backoff), because it has to process window-size requests before increasing the limit by 1.
My proposal is to add a new method to the AIMDLimit builder, e.g.
public AIMDLimit.Builder growBy(int amount)
defaulting to 1 (the current behavior).
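The proposed knob boils down to making the additive step configurable. A minimal sketch of the resulting update rule (names and the backoff ratio are illustrative, not the library's implementation):

```java
// Sketch: AIMD where the increase step per successful window is
// configurable instead of the fixed +1.
public class GrowableAimd {
    private int limit;
    private final int growBy;           // proposed parameter, default 1
    private final double backoffRatio;  // e.g. 0.5 on drop

    public GrowableAimd(int initial, int growBy, double backoffRatio) {
        this.limit = initial;
        this.growBy = growBy;
        this.backoffRatio = backoffRatio;
    }

    public int onSuccessWindow() { return limit += growBy; }

    public int onDrop() {
        limit = Math.max(1, (int) (limit * backoffRatio));
        return limit;
    }
}
```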
The ctor automatically wraps its Limiter in BlockingLimiter whereas the Builder does not, leading to unexpected RejectedExecutionExceptions under common upgrade/update conditions.
I'm going to put together a pull request to at least simplify the application of BlockingLimiter from within the Builder pattern, although it's probably too late to make it the default.
I can absolutely see as this thing is tuning connection limits up and down watching average latency being a great signal. What happens when you're at steady state and one of your dependencies (say a database) becomes unresponsive and your latencies spike? It's hard to prescribe the perfect one-size meets-all response to this situation but in some scenarios I'd want it to add more concurrency to counteract the blocked threads waiting on a DB connection timeout.
Ideally it would incorporate a cpu or blocked threads metric into the equation to help determine if it's a dependency or the service itself that is impacting latency.
Hi, I just noticed a drop-off in releases and activity. Zipkin currently depends on this, but we've hidden it due to the <1.0 status. Can you let us know if this project will continue?
The provided link https://github.com/spring-cloud/spring-cloud-netflix/tree/master/spring-cloud-netflix-concurrency-limits is not resolvable.
The concurrency-limits-servlet test partition doesn't have a java folder; it looks like this:

```
test
└─com
  └─netflix
    └─concurrency
      └─limits
```

The standard layout should be:

```
test
└─java
  └─com
    └─netflix
      └─concurrency
        └─limits
```
In com.netflix.concurrency.limits.GroupServletLimiterTest and ConcurrencyLimitServletFilterTest, the Limiter may be built wrongly:

```java
Limiter<HttpServletRequest> limiter = new ServletLimiterBuilder()
    .limit(VegasLimit.newDefault())
    .partitionByUserPrincipal(p -> principalToGroup.get(p.getName()), builder -> builder
        .assign("live", 0.8)
        .assign("batch", 0.2))
    .build();
```

The method partitionByUserPrincipal expects its parameter to be a Function<Principal, String> principalToGroup, but here two parameters are passed. I think using the form the documentation introduces would resolve this:

```java
Limiter<HttpServletRequest> limiter = new ServletLimiterBuilder()
    .limit(VegasLimit.newDefault())
    .partitionByUserPrincipal(p -> principalToGroup.get(p.getName()), builder -> builder
        .partition("live", 0.8)
        .partition("batch", 0.2))
    .build();
```

Of course, maybe this is a new API in development.
Currently Log10RootFunction has the following code:

```java
@Override
public Integer apply(Integer t) {
    return t < 1000 ? lookup[t] : (int) Math.sqrt(t);
}
```

I assume this is a mistake, as we are precomputing log10 for the first 1000 numbers like this:

```java
static {
    IntStream.range(0, 1000).forEach(i -> lookup[i] = Math.max(1, (int) Math.log10(i)));
}
```

Can you fix this?
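The consistent version the report appears to be asking for would use log10 in both the table and the fallback (clamped to at least 1, matching the static initializer). A sketch, not the library's actual fix:

```java
import java.util.stream.IntStream;

// Sketch: lookup table and fallback both compute max(1, floor(log10(t))).
public class Log10RootFunctionFixed {
    private static final int[] lookup = new int[1000];
    static {
        IntStream.range(0, 1000).forEach(i -> lookup[i] = Math.max(1, (int) Math.log10(i)));
    }

    public static int apply(int t) {
        return t < 1000 ? lookup[t] : Math.max(1, (int) Math.log10(t));
    }
}
```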
Hi.
I've just asked this question on Stack Overflow.
I forked this project and used it in an application, and I realized that GC pauses last longer than my average response time. Tracking the value of maxInflightRequest, I can see that when a GC (minor or major) is performed, maxInflightRequest goes up and reaches the threshold I've configured, stressing the application. So I'm getting rejections for requests that should have been processed.
All the details are in the Stack Overflow question.
Could someone describe why that is needed? It looks like when congestion doesn't occur and the current limit has reached its maximum, it's because maxLimit was pessimistically low. So instead of keeping on working successfully at full steam, the limit will be halved, and so will the incoming traffic (and redistributing traffic from the current instance will push it to others, potentially hitting the maximum there and being halved again), traffic which otherwise could have been processed correctly.
The limiter currently keeps track of an absolute minimum latency since startup. For traffic with a very wide range of latency distributions this can prevent the limit from growing. I can think of two possible solutions,
Thank you for bringing this thing to the public.
VegasLimit is almost unusable in its current implementation for any highly concurrent workload (CPU-bound with a high core count, or IO-bound) due to very slow ramp-up.
I see it was removed in #16 for being too aggressive and overshooting; my experiments with smoothing and exponential growth show good results. Another thing to note that was not implemented: Vegas seems to have another parameter, gamma (default 1). When queuing is higher than gamma, it stops slow-start (sets a new ssthreshold) and enters congestion avoidance. Some papers say to decrease the limit by 1/8; Linux does something interesting too: https://github.com/torvalds/linux/blob/master/net/ipv4/tcp_vegas.c#L219
YMMV.
No PRs for now, since I'm experimenting with this idea in Go at the moment.
Hi, do you think it is a good idea to have a PartitionRegistry that is capable of adding or removing Partitions during runtime? I can submit a PR based on our conclusion.
Based on the current code, the limit for a partition is just totalLimit * Partition#percentage. Now, as per controlled external config changes, I want to add or remove partitions from the limiter. When I add new partitions with adjusted percentages across the board, I want the current limit to be updated for existing partitions and get appropriate limits for new partitions. Similar case applies for when partitions are removed. This way I can evenly divide the quota and then pool it up when there are fewer actors.
I only saw that it takes latency/processing time into account. What about exceptions?
I have a backend application which handles requests from a frontend application directly (from the browser), so the browser sends CORS preflight requests to make sure it's a permitted client.
Those requests are sent against the URL which runs the actual logic. OPTIONS requests cannot be cached, so in the worst case we will hit a 1:1 ratio between extremely fast non-business requests and actual business-logic requests. According to the Java filter specification, filters can only be applied to URLs, without any extra conditions, so the only place we can handle this is the filter itself.
WindowedLimit has a way to address this via minRttThreshold, but this isn't the best solution, because it doesn't target the CORS problem exclusively and can therefore mess with the statistics. I'm not very good at understanding what's going on in the Gradient2/Vegas limits, but I assume they might be affected as well.
So I would like to suggest 3 different approaches:
1. A CorsAwareConcurrencyLimitServletFilter / CorsIgnoringConcurrencyLimitServletFilter filter
2. An ignoreOptionsRequests / doNotCountOptionsRequests / ignoreCorsRequests flag, or somewhat similar, on the ConcurrencyLimitServletFilter
3. A Predicate<HttpServletRequest> instead, letting everyone implement whatever they might want
I like the first solution, because it highlights the hypothetical problem the most. The third approach is obviously much more flexible, but it might be overkill, and it doesn't notify users that they should think about the CORS-request problem.
Do you think this is something worth implementing in the library?
Also, as soon as we decide on the approach, I'd like to provide the implementation.
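The predicate-based approach can be sketched independently of the servlet API (a generic request type stands in for HttpServletRequest, and the inner predicate stands in for the limiter, so this stays dependency-free; all names here are invented for illustration):

```java
import java.util.function.Predicate;

// Sketch: requests matching the bypass predicate (e.g. CORS preflight
// OPTIONS calls) skip the limiter entirely and are never sampled.
public class BypassingLimiter<R> {
    private final Predicate<R> bypass;
    private final Predicate<R> limiter; // stand-in: true = admitted

    public BypassingLimiter(Predicate<R> bypass, Predicate<R> limiter) {
        this.bypass = bypass;
        this.limiter = limiter;
    }

    public boolean admit(R request) {
        if (bypass.test(request)) {
            return true;              // preflight: neither limited nor sampled
        }
        return limiter.test(request); // normal path through the limiter
    }
}
```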
Hi, I want to discuss Gradient2Limit in a stable timeout situation.
I use Gradient2Limit on the consumer side (to control the consumer's rate and avoid bursting the provider side).
First, what is a stable timeout situation?
In my case, every consumer has set a client timeout, like 5000ms. The stable timeout situation is:
1. The provider is bursted by another consumer which doesn't have any rate limit.
2. The consumer on which I use Gradient2Limit always receives the same timeout RTT, `5000ms`; I call this a `stable timeout` situation.
Once my consumer using Gradient2Limit steps into the stable timeout situation, Gradient2Limit no longer takes effect.
That is because Gradient2Limit only cares about the volatility of the RTT. If the RTT has zero volatility, as in the stable timeout situation, Gradient2Limit will grow its limit output until Max_Limit.
So, how can I deal with this situation? Can we make Gradient2Limit support it?
For now I'm considering using the BBR algorithm to implement a new Limit to resolve it.
Has anyone met the same situation? How do you deal with it?
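One possible workaround, sketched here with invented names and a crude halving backoff, is to treat any sample at or above the client timeout as a drop rather than as a latency observation, so a flat wall of 5000ms timeouts triggers backoff instead of reading as zero-volatility "stable" latency:

```java
// Sketch: an RTT sampler that interprets timeout-valued samples as
// congestion signals and halves the limit, instead of feeding them
// to a gradient that sees no volatility.
public class TimeoutAwareSampler {
    private final long timeoutMillis;
    private int limit;

    public TimeoutAwareSampler(long timeoutMillis, int initialLimit) {
        this.timeoutMillis = timeoutMillis;
        this.limit = initialLimit;
    }

    /** Returns the limit after accounting for one sample. */
    public int onSample(long rttMillis) {
        if (rttMillis >= timeoutMillis) {
            limit = Math.max(1, limit / 2);  // timeout counts as congestion
        }
        return limit;
    }
}
```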
One limitation of the current VegasLimit is that it doesn't work well with batch processing systems.
Unlike a real-time processing system, a batch processing system has a buffer: requests are first put into the buffer and then processed in batches. Here the definition of RTT is the time elapsed between a request being put into the buffer and the request being processed.
The problem with VegasLimit is that rtt > rtt_noload indicates a queue is building up; in a batch processing system, however, that's not true. For example, consider a virtual cargo shipping system: at a terminal, shipping containers depart at a fixed interval (for convenience, each container has unlimited capacity). Packages arriving at the terminal are shipped by the next pending container.
In this model, a package arriving at the terminal just before the next shipping event has the smallest RTT, i.e. rtt_noload; right after a container ships, the next arriving package has the largest RTT (rtt_noload + interval). The current VegasLimit could incorrectly treat this system as saturated: because of the interval, the RTT can be significantly larger than rtt_noload, yet the virtual shipping system is not saturated at all.
Kafka message producing is a concrete example of batch processing; our goal is to use VegasLimit to detect saturation of message producing.
Dear experts,
Adaptive concurrency limits are really awesome! It seems the last release of this repo was in 2019; is this git repo still active?
BTW, does it support Netflix Spring Cloud?
Really appreciate your help in advance!
Thanks,
Roy
Adding a License section and related info to the project's README.md.