netflix / concurrency-limits
License: Apache License 2.0
Recently, we tried this library for concurrency limiting, and we found that rtt_noload may grow too large to limit client concurrency. So we may need a way to set a baseline for rtt_noload.
Could this be changed to RESOURCE_EXHAUSTED, so that it doesn't make it look like the server is down?
SimpleLimiter.acquire checks if the current in-flight executions are above the limit, but doesn't do so in a thread-safe manner, allowing the possibility of exceeding the limit.
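The usual fix for a check-then-increment race is to make the two steps one atomic operation. A minimal sketch of that pattern (this is not the library's code, just the standard CAS loop the report implies):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: admit a request only if a compare-and-set on the in-flight
// counter succeeds, so the limit cannot be exceeded by a race between
// the check and the increment.
public class CasAdmission {
    private final AtomicInteger inFlight = new AtomicInteger();
    private volatile int limit;

    public CasAdmission(int limit) { this.limit = limit; }

    /** Returns true if the caller may proceed; false if at the limit. */
    public boolean tryAcquire() {
        for (;;) {
            int current = inFlight.get();
            if (current >= limit) {
                return false;            // reject without mutating state
            }
            if (inFlight.compareAndSet(current, current + 1)) {
                return true;             // check and increment were atomic
            }
            // CAS lost a race; reload and retry
        }
    }

    public void release() { inFlight.decrementAndGet(); }
}
```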
https://github.com/Netflix/concurrency-limits/blob/master/README.md#vegas doesn't mention exactly what L, minRTT and sampleRTT are.
I've been thinking about how a perfect installation of concurrency-limits might look for my needs. I walked along the path Vegas -> Gradient2 -> AIMD -> AIMD wrapped with Windowed. And it gave me an idea: calculating average response times is usually less representative for real-time services. Even if the average were a perfect p50, I'd like to have my limits reduced earlier than the point when half of my users are outside acceptable timings. What people usually monitor is percentiles like p90, p95, p99, p999, and that's the metric I would prefer for my AIMDLimit, because it's a better representation of user experience. So I think concurrency-limits would be a better tool if it had percentile-window support.
I've done some research on moving-percentile algorithms, and it seems the only O(1) algorithm is described here; it requires some knowledge of the time distribution and might be less accurate in some cases (check out the plot at the bottom of the article). It's also less straightforward, therefore more error-prone and harder to trust.
I'd like to help with that, if the team doesn't mind, and I think an implementation with O(N) space, O(log N) insert time, and O(1) percentile calculation might be an appropriate solution here. My idea for now looks like this:
1. Allow WindowedLimit to work with abstractions other than ImmutableSampleWindow.
2. Add a PercentileSampleWindow class which is mutable (I think copying N observations on each sample is too much) and backed by a PriorityBlockingQueue.
3. On window end, read the size()*percentile value, call onSample(), and update the WindowedLimit's field with a brand new instance of PercentileSampleWindow.
What do you think so far?
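A simple realization of the window part of this proposal could look like the following sketch (class and method names are hypothetical, not library API; it sorts once per window instead of maintaining a heap, which is the easiest correct baseline to compare against):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch: accumulate RTT samples for one window, then report the chosen
// percentile instead of the average when the window closes.
public class PercentileWindow {
    private final List<Long> samples = new ArrayList<>();
    private final double percentile;   // e.g. 0.95 for p95

    public PercentileWindow(double percentile) { this.percentile = percentile; }

    public synchronized void addSample(long rttNanos) { samples.add(rttNanos); }

    /** Returns the percentile RTT for the window and resets it. */
    public synchronized long closeWindow() {
        Collections.sort(samples);     // O(N log N) once per window
        int index = (int) Math.ceil(percentile * samples.size()) - 1;
        long result = samples.get(Math.max(index, 0));
        samples.clear();
        return result;
    }
}
```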
I'm wondering if it makes sense/is possible to have an implementation for Spring WebFlux.
With reactive service listeners not being tied to a thread pool, there would seem to be even greater need for some mechanism limiting incoming requests to not overwhelm downstream systems.
The ConcurrencyLimitServerInterceptor simply passes through, bypassing its Limiter, any calls that don't use bi-directional streaming:
```java
@Override
public <ReqT, RespT> Listener interceptCall(final ServerCall<ReqT, RespT> call,
                                            final Metadata headers,
                                            final ServerCallHandler<ReqT, RespT> next) {
    if (!call.getMethodDescriptor().getType().serverSendsOneMessage()
            || !call.getMethodDescriptor().getType().clientSendsOneMessage()) {
        return next.startCall(call, headers);
    }
```
So, in systems that make use of only unary or one-direction streaming, which one would think are quite common, the Limiter, and essentially the entire subsystem, do nothing.
Am I correct in presuming that the assumption here is that any call that doesn't do bidirectional streaming will be "fast" and has no need for limiting? Any plans to change this behavior or make it configurable?
Thanks.
We'd love to hear about how people are using the limiter. We'd especially like to hear about where it works and under which scenarios it does not perform as expected. Feel free to comment on this issue with your observations.
Hi, I would like to discuss a feature I need. I am not entirely sure if it fits well with concurrency limiting logic.
Problem: The cost (latency or resource usage) of the jobs should have high impact on the concurrency limit of the partition that issued them. The partition that is issuing costly jobs should receive lower concurrency limit when compared to other partitions with no change in their load pattern. The concurrency limit could get affected overall initially but as soon as the partition with costly jobs is penalized, the other partitions should be restored to normal limits.
To achieve this, do we just maintain a separate SimpleLimiter for each of our partition types, or is there a better way?
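The separate-limiter-per-partition approach can be sketched generically like this (the type parameter stands in for the library's limiter; the class name is made up for illustration). Each partition then adapts independently, so a costly partition only degrades its own limit:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

// Sketch: keep one independent limiter instance per partition key so a
// partition issuing costly jobs cannot drag down the others' limits.
public class PerPartitionLimiters<L> {
    private final ConcurrentMap<String, L> limiters = new ConcurrentHashMap<>();
    private final Supplier<L> factory;

    public PerPartitionLimiters(Supplier<L> factory) { this.factory = factory; }

    /** Lazily creates an isolated limiter for each partition key. */
    public L forPartition(String partition) {
        return limiters.computeIfAbsent(partition, k -> factory.get());
    }

    public int size() { return limiters.size(); }
}
```

The trade-off versus one shared partitioned limiter is that the per-partition limiters never see each other's load, so there is no global cap unless you add one on top.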
To make it easier to use concurrency-limits with libraries using ExecutorService (Apache Camel in my case), it would be great if concurrency-limits provided an ExecutorService implementation.
Sorry if this is out of scope.
By the way: thank you very much for providing concurrency-limits!
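The requested wrapper might look roughly like this sketch. It decorates a plain Executor; a real integration would acquire/release through a Limiter and record latency, but here a simple counter stands in (the class name and the counter are assumptions, not library API):

```java
import java.util.concurrent.Executor;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: an Executor decorator that rejects submissions beyond a
// concurrency limit and releases the permit when the task completes.
public class LimitedExecutor {
    private final Executor delegate;
    private final AtomicInteger inFlight = new AtomicInteger();
    private final int limit;

    public LimitedExecutor(Executor delegate, int limit) {
        this.delegate = delegate;
        this.limit = limit;
    }

    public void execute(Runnable task) {
        if (inFlight.incrementAndGet() > limit) {
            inFlight.decrementAndGet();
            throw new RejectedExecutionException("concurrency limit reached");
        }
        delegate.execute(() -> {
            try {
                task.run();                 // a real impl would sample latency here
            } finally {
                inFlight.decrementAndGet(); // release the permit
            }
        });
    }
}
```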
GradientLimit is very similar to Gradient2Limit except for a few minor things, and it looks like GradientLimit needs some fixes, such as:

```diff
- estimatedLimit = Math.max(minLimit, queueSize);
+ estimatedLimit = Math.max(minLimit, Math.min(maxLimit, queueSize));
...
- newLimit = Math.max(queueSize, Math.min(maxLimit, newLimit));
+ newLimit = Math.max(minLimit, Math.min(maxLimit, newLimit));
```
And those from this PR: #134.
With all these fixes, GradientLimit will be exactly equal to Gradient2Limit except for the probeInterval.
After updating from 0.1.1 to 0.1.2 I get:

```
java.lang.ArithmeticException: long overflow
	at java.lang.Math.multiplyExact(Math.java:892)
	at java.time.Duration.toMillis(Duration.java:1171)
	at com.netflix.concurrency.limits.limiter.BlockingLimiter.tryAcquire(BlockingLimiter.java:70)
	at com.netflix.concurrency.limits.limiter.BlockingLimiter.acquire(BlockingLimiter.java:87)
```

The same exception occurs when I use version 0.1.4.
If we drop the request, the delay will be almost 0. In that case onSample will not be able to increase the limit very well; it will keep dropping, and then let even more requests be dropped.
Consider an asynchronous system with external queue from where requests are sent. The response to this request merely means an ack to the request and not the handling. The server will send a reply related to the request after an arbitrary time which depends on the load on the system. For long response times, say in the order of minutes, keeping the corresponding requests in memory to measure the latency defeats the purpose of async system.
Is it possible to find the concurrency limits in such a system? I assume that we have the request rate at which we are sending requests and the response rate at which the system is finishing them (these can be measured cheaply, with no request/response association needed). In a steady state, the request rate and response rate match. We can increase the request rate until the response rate stagnates.
Is there anything we can apply from TCP congestion control (or some other technique) to fine tune the request rate to slowly increase to max capacity and keep tweaking the request rate based on the response rates?
The basic idea is, in a given window:

```
if request_rate <= response_rate:
    // (less is possible since responses are for requests from a prev window)
    request_rate = request_rate * (1 + x)
else:
    request_rate = request_rate * (1 - y)
```
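A direct Java rendering of the pseudocode above, for experimentation (x, y, and the class name are assumptions, not library code):

```java
// Multiplicative adjustment of the request rate once per window,
// based on the observed response rate.
public class RateAimd {
    private final double increase; // x, e.g. 0.05
    private final double decrease; // y, e.g. 0.2
    private double requestRate;

    public RateAimd(double initialRate, double increase, double decrease) {
        this.requestRate = initialRate;
        this.increase = increase;
        this.decrease = decrease;
    }

    /** Called once per window with the measured response rate. */
    public double onWindow(double responseRate) {
        if (requestRate <= responseRate) {
            // less is possible since responses are for requests from a prev window
            requestRate *= (1 + increase);
        } else {
            requestRate *= (1 - decrease);
        }
        return requestRate;
    }
}
```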
Questions:
When I use a ThreadPoolExecutor, every Runnable may wait in the queue for some time.
I think we need a way to supply the real no-load time to the concurrency limit.
I've been toying with this implementation to see if it would fit in our system. I've set up a harness to send in work to the algorithm to see how it responds to different scenarios.
I've noticed it is incredibly sensitive to the rtt_noload value. This is currently selected as the fastest seen rtt. In our system we can have a fairly wide deviation on rtts so we sometimes get requests that are significantly faster than at other times. For example, we might have an endpoint that has a mean response time of 300ms but, very occasionally, might return in 150ms.
That 150ms will get stored as the rtt_noload. This causes our estimatedlimit to drop hard and we will frequently go over the limit.
I've tried modifying the algorithm to store the lowest weighted moving average as the rtt_noload and this can help somewhat but is still susceptible to "lucky" periods that happen to be faster than the mean.
Are there any thoughts on a better way to select the rtt_noload? This selection seems fundamental to the algorithm.
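One hedged idea in this direction (not library code, names invented here) is to age the stored minimum so a single lucky fast sample cannot anchor rtt_noload forever: each window the recorded minimum is inflated slightly, so a genuinely fast baseline keeps re-establishing itself while an outlier decays away.

```java
// Sketch: an exponentially aging minimum-RTT tracker.
public class AgingMinRtt {
    private long minRtt = Long.MAX_VALUE;
    private final double agingFactor; // e.g. 1.05 per window

    public AgingMinRtt(double agingFactor) { this.agingFactor = agingFactor; }

    public void onSample(long rtt) { minRtt = Math.min(minRtt, rtt); }

    /** Call once per window; returns the current no-load estimate. */
    public long onWindowEnd() {
        long current = minRtt;
        if (minRtt != Long.MAX_VALUE) {
            minRtt = (long) (minRtt * agingFactor); // let outliers age out
        }
        return current;
    }
}
```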
Travis is now recommending removing the sudo tag:
"If you currently specify sudo: false in your .travis.yml, we recommend removing that configuration"
Hi @elandau, what's your input on integrating this library with Thrift? I would like to experiment on that. As far as I know, Thrift doesn't have flexible interceptors to tap into but I think I could figure out a workaround using Object Proxy Design pattern.
I'm starting to investigate creating a .NET library that implements some of the concepts in this library, but the part that I'm struggling with a bit is figuring out how I can simulate some of the issues that the library is meant to protect against (latency spikes, resource exhaustion, etc). Basically I want to make sure that what I'm implementing is actually working as expected and providing a benefit.
I was just wondering if you had any pointers on how to go about simulating or benchmarking to save me reinventing the wheel, or had any links to information you used when designing and implementing this library.
Thanks in advance!
Hi - I'm invoking a REST-based service from my application using WebClient, and I would like to use ConcurrencyLimitServletFilter in my client to apply backpressure and limit the number of requests sent to the server, so the server doesn't get overloaded and the client doesn't wait a long time only to get a connection failure.
Can ConcurrencyLimitServletFilter be used for both client-side and server-side concurrency limiting?
A sample of how to register ConcurrencyLimitServletFilter with WebClient would be helpful.
I'm trying to figure out why this condition check for the windowed limit is important (and whether it is): https://github.com/Netflix/concurrency-limits/blob/master/concurrency-limits-core/src/main/java/com/netflix/concurrency/limits/limit/WindowedLimit.java#L155
It looks like under low load it wouldn't let the delegate limit run (while it probably should raise the limit, especially if it was pessimized earlier during an overload).
Under high load it will work fine with averages, but with percentiles you're either skipping too many samples (because they exceed the window size) or preventing the actual logic from executing, because you will be setting the window size to a big number. So picking a window size becomes really tricky and hard to get right.
The code in the Executor section of README.md refers to the class DefaultLimiter, which does not seem to exist anymore.
Hi @elandau, thanks for providing this library. I have a question around partition based limiters.
From the code, the partition limits are calculated as newTotalLimit * Partition#percent. But the README talks about starving one partition to favor another. How would I achieve starving something like the batch traffic from the example?
For example, a system that takes both live and batch traffic may want to give live traffic 100% of the limit during heavy load and is OK with starving batch traffic.
Any help from you or other users is highly appreciated!
We currently have servers which have rate limits. We'd prefer to keep these, since at a certain level they're useful for enforcing good development practices. Right now, our backoff policy is randomized exponential backoff with a maximum number of backoffs per individual request.
This does not work for providing backpressure, since if the 'target rate' is greater than the rate limit, eventually a request will (randomly or not) hit its maximum number of retries by being unlucky.
I've been investigating using this library in order to cause backpressure on the client side. The gist is that the server has a rate limit, and if the client is exceeding this rate limit, they will immediately receive an HTTP 429. The client will then back off for a certain amount of time, after which they mark their request as 'dropped'.
With AIMD, this works moderately well - eventually the client goes above the rate limit and a request is dropped. The next window, the concurrency is dropped by half, which generally will correspondingly drop the concurrency too, and cause the rate limit to be missed.
Now, the main issue here is that if the window size is larger than the minimum amount of time spent backing off, there is still some small probability that the request will fail when it hits this point, e.g. if I have 21 threads spinning on making requests, a rate limit of 20 requests per second, with a successful request taking 1 second, and a rate limited request finishing immediately, I will almost surely fail with AIMD - since if e.g. the window is 100, it'll take 5 seconds to close the window, and we don't back off for 5 seconds.
The Vegas limit is even worse, since it's even more aggressive at ramping up and takes potentially many windows to ramp down, which means that failure is roughly guaranteed when the rate limit is low compared to the window size.
It also looks like I should prefer to use the Vegas limit in general, since it is the one that actually responds to being queued.
I'm wondering if there's any changes that could be made to the library here to help with this. At the moment, it seems that only successful requests fill the window, so if I'm being rate limited, I could potentially see many many failures before the backoff occurs. Is this intentional?
I'm wondering whether it would be viable/desirable to have a 'dropped' request
Another question would be: Would I end up with something drastically broken if the 'decrease function' for the Vegas limit were to be different for didDrop vs queueSize > beta?
When AIMD works in conjunction with the windowed limit, it might be way too cautious growing back after a bursty spike (and, therefore, a backoff), because it has to process window-size requests before increasing the limit by 1.
My proposal is to add a new method to the AIMDLimit builder, e.g.
public AIMDLimit.Builder growBy(int amount)
defaulting to 1 (the current behavior).
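The proposed knob boils down to making the additive step configurable. A minimal sketch of the resulting update rule (names and the backoff ratio are illustrative, not the library's implementation):

```java
// Sketch: AIMD where the increase step per successful window is
// configurable instead of the fixed +1.
public class GrowableAimd {
    private int limit;
    private final int growBy;           // proposed parameter, default 1
    private final double backoffRatio;  // e.g. 0.5 on drop

    public GrowableAimd(int initial, int growBy, double backoffRatio) {
        this.limit = initial;
        this.growBy = growBy;
        this.backoffRatio = backoffRatio;
    }

    public int onSuccessWindow() { return limit += growBy; }

    public int onDrop() {
        limit = Math.max(1, (int) (limit * backoffRatio));
        return limit;
    }
}
```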
The ctor automatically wraps its Limiter in BlockingLimiter whereas the Builder does not, leading to unexpected RejectedExecutionExceptions under common upgrade/update conditions.
I'm going to put together a pull request to at least simplify the application of BlockingLimiter from within the Builder pattern, although it's probably too late to make it the default.
I can absolutely see as this thing is tuning connection limits up and down watching average latency being a great signal. What happens when you're at steady state and one of your dependencies (say a database) becomes unresponsive and your latencies spike? It's hard to prescribe the perfect one-size meets-all response to this situation but in some scenarios I'd want it to add more concurrency to counteract the blocked threads waiting on a DB connection timeout.
Ideally it would incorporate a cpu or blocked threads metric into the equation to help determine if it's a dependency or the service itself that is impacting latency.
Hi, I just noticed a drop-off in releases and activity. Zipkin currently depends on this, but we've hidden it due to the <1.0 status. Can you let us know if this project will continue?
The provided link https://github.com/spring-cloud/spring-cloud-netflix/tree/master/spring-cloud-netflix-concurrency-limits is not resolvable.
The concurrency-limits-servlet test partition doesn't have a java folder; it looks like this:

```
test
└─com
  └─netflix
    └─concurrency
      └─limits
```

The standard layout should be:

```
test
└─java
  └─com
    └─netflix
      └─concurrency
        └─limits
```
In com.netflix.concurrency.limits.GroupServletLimiterTest and ConcurrencyLimitServletFilterTest, the Limiter may be built wrongly:

```java
Limiter<HttpServletRequest> limiter = new ServletLimiterBuilder()
    .limit(VegasLimit.newDefault())
    .partitionByUserPrincipal(p -> principalToGroup.get(p.getName()), builder -> builder
        .assign("live", 0.8)
        .assign("batch", 0.2))
    .build();
```

The method partitionByUserPrincipal expects its parameter to be a Function<Principal, String> principalToGroup, but here two parameters are passed. I think using the form the documentation introduces would resolve this:

```java
Limiter<HttpServletRequest> limiter = new ServletLimiterBuilder()
    .limit(VegasLimit.newDefault())
    .partitionByUserPrincipal(p -> principalToGroup.get(p.getName()), builder -> builder
        .partition("live", 0.8)
        .partition("batch", 0.2))
    .build();
```

Of course, maybe this is a new API in development.
Currently Log10RootFunction has the following code:

```java
@Override
public Integer apply(Integer t) {
    return t < 1000 ? lookup[t] : (int) Math.sqrt(t);
}
```

I assume this is a mistake, as we are precomputing log10 for the first 1000 numbers like this:

```java
static {
    IntStream.range(0, 1000).forEach(i -> lookup[i] = Math.max(1, (int) Math.log10(i)));
}
```

Can you fix this?
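The consistent version the report appears to be asking for would use log10 in both the table and the fallback (clamped to at least 1, matching the static initializer). A sketch, not the library's actual fix:

```java
import java.util.stream.IntStream;

// Sketch: lookup table and fallback both compute max(1, floor(log10(t))).
public class Log10RootFunctionFixed {
    private static final int[] lookup = new int[1000];
    static {
        IntStream.range(0, 1000).forEach(i -> lookup[i] = Math.max(1, (int) Math.log10(i)));
    }

    public static int apply(int t) {
        return t < 1000 ? lookup[t] : Math.max(1, (int) Math.log10(t));
    }
}
```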
Hi.
I've just asked this question on Stack Overflow.
I forked this project and used it in an application, and I realized that GC pauses last longer than my average response time. Tracking the value of maxInflightRequest, I can see that when a GC (minor or major) is performed, maxInflightRequest goes up and reaches the threshold I've configured, stressing the application. So I'm getting rejections for requests that should have been processed.
All the details are in the Stack Overflow question.
Could someone describe why that is needed? It looks like when congestion doesn't occur and the current limit has reached its maximum, it's because maxLimit was pessimistically low. So instead of keeping on working successfully at full steam, the limit will be halved, and so will the incoming traffic (and redistributing traffic from the current instance will push it to others, potentially hitting the maximum there and being halved again), traffic which otherwise could have been processed correctly.
The limiter currently keeps track of an absolute minimum latency since startup. For traffic with a very wide range of latency distributions this can prevent the limit from growing. I can think of two possible solutions,
Thank you for bringing this thing to the public.
VegasLimit is almost unusable in its current implementation for any highly concurrent workload (CPU-bound with a high core count, or IO-bound) due to very slow ramp-up.
I see it was removed in #16 for being too aggressive and overshooting; my experiments with smoothing and exponential growth show good results. Another thing to note that was not implemented: Vegas seems to have another parameter, gamma (default 1). When queuing is higher than gamma, it stops slow-start (sets a new ssthreshold) and enters congestion avoidance. Some papers say to decrease the limit by 1/8; Linux does something interesting too: https://github.com/torvalds/linux/blob/master/net/ipv4/tcp_vegas.c#L219
YMMV.
No PRs for now, since I'm experimenting with this idea in Go at the moment.
Hi, do you think it is a good idea to have a PartitionRegistry that is capable of adding or removing Partitions during runtime? I can submit a PR based on our conclusion.
Based on the current code, the limit for a partition is just totalLimit * Partition#percentage. Now, as per controlled external config changes, I want to add or remove partitions from the limiter. When I add new partitions with adjusted percentages across the board, I want the current limit to be updated for existing partitions and get appropriate limits for new partitions. Similar case applies for when partitions are removed. This way I can evenly divide the quota and then pool it up when there are fewer actors.
I only saw that it takes latency/processing time into account. What about exceptions?
I have a backend application which handles requests from a frontend application directly (from the browser), so the browser sends CORS preflight requests to make sure it's a permitted client.
Those requests are sent against the URL which runs the actual logic. OPTIONS requests cannot be cached, so in the worst case we will hit a 1:1 ratio between extremely fast non-business requests and actual business-logic requests. According to the Java filter specification, filters can only be applied to URLs, without any extra conditions, so the only place we can handle this is the filter itself.
WindowedLimit has a way to address this via minRttThreshold, but this isn't the best solution, because it doesn't target the CORS problem exclusively and can therefore mess with the statistics. I'm not very good at understanding what's going on in the Gradient2/Vegas limits, but I assume they might be affected as well.
So I would like to suggest 3 different approaches:
1. A CorsAwareConcurrencyLimitServletFilter / CorsIgnoringConcurrencyLimitServletFilter filter
2. An ignoreOptionsRequests / doNotCountOptionsRequests / ignoreCorsRequests flag, or somewhat similar, on the ConcurrencyLimitServletFilter
3. A Predicate<HttpServletRequest> instead, letting everyone implement whatever they might want
I like the first solution, because it highlights the hypothetical problem the most. The third approach is obviously much more flexible, but it might be overkill, and it doesn't notify users that they should think about the CORS-request problem.
Do you think this is something worth implementing in the library?
Also, as soon as we decide on the approach, I'd like to provide the implementation.
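The predicate-based approach can be sketched independently of the servlet API (a generic request type stands in for HttpServletRequest, and the inner predicate stands in for the limiter, so this stays dependency-free; all names here are invented for illustration):

```java
import java.util.function.Predicate;

// Sketch: requests matching the bypass predicate (e.g. CORS preflight
// OPTIONS calls) skip the limiter entirely and are never sampled.
public class BypassingLimiter<R> {
    private final Predicate<R> bypass;
    private final Predicate<R> limiter; // stand-in: true = admitted

    public BypassingLimiter(Predicate<R> bypass, Predicate<R> limiter) {
        this.bypass = bypass;
        this.limiter = limiter;
    }

    public boolean admit(R request) {
        if (bypass.test(request)) {
            return true;              // preflight: neither limited nor sampled
        }
        return limiter.test(request); // normal path through the limiter
    }
}
```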
Hi, I want to discuss Gradient2Limit in a stable timeout situation.
I use Gradient2Limit on the consumer side (to control the consumer's rate and avoid bursting the provider side).
First, what is a stable timeout situation?
In my case, every consumer has set a client timeout, like 5000ms. The stable timeout situation is:
1. The provider is bursted by another consumer which doesn't have any rate limit.
2. The consumer on which I use Gradient2Limit always receives the same timeout RTT, `5000ms`; I call this a `stable timeout` situation.
Once my consumer using Gradient2Limit steps into the stable timeout situation, Gradient2Limit no longer takes effect.
That is because Gradient2Limit only cares about the volatility of the RTT. If the RTT has zero volatility, as in the stable timeout situation, Gradient2Limit will grow its limit output until Max_Limit.
So, how can I deal with this situation? Can we make Gradient2Limit support it?
For now I'm considering using the BBR algorithm to implement a new Limit to resolve it.
Has anyone met the same situation? How do you deal with it?
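One possible workaround, sketched here with invented names and a crude halving backoff, is to treat any sample at or above the client timeout as a drop rather than as a latency observation, so a flat wall of 5000ms timeouts triggers backoff instead of reading as zero-volatility "stable" latency:

```java
// Sketch: an RTT sampler that interprets timeout-valued samples as
// congestion signals and halves the limit, instead of feeding them
// to a gradient that sees no volatility.
public class TimeoutAwareSampler {
    private final long timeoutMillis;
    private int limit;

    public TimeoutAwareSampler(long timeoutMillis, int initialLimit) {
        this.timeoutMillis = timeoutMillis;
        this.limit = initialLimit;
    }

    /** Returns the limit after accounting for one sample. */
    public int onSample(long rttMillis) {
        if (rttMillis >= timeoutMillis) {
            limit = Math.max(1, limit / 2);  // timeout counts as congestion
        }
        return limit;
    }
}
```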
One limitation of the current VegasLimit is that it doesn't work well with batch processing systems.
Unlike a real-time processing system, a batch processing system has a buffer: requests are first put into the buffer and then processed in batches. Here the definition of RTT is the time elapsed between a request being put into the buffer and the request being processed.
The problem with VegasLimit is that rtt > rtt_noload indicates a queue is building up; in a batch processing system, however, that's not true. For example, consider a virtual cargo shipping system: at a terminal, shipping containers depart at a fixed interval (for convenience, each container has unlimited capacity). Packages arriving at the terminal are shipped by the next pending container.
In this model, a package arriving at the terminal just before the next shipping event has the smallest RTT, i.e. rtt_noload; right after a container ships, the next arriving package has the largest RTT (rtt_noload + interval). The current VegasLimit could incorrectly treat this system as saturated: because of the interval, the RTT can be significantly larger than rtt_noload, yet the virtual shipping system is not saturated at all.
Kafka message producing is a concrete example of batch processing; our goal is to use VegasLimit to detect saturation of message producing.
Dear experts,
Adaptive concurrency limits are really awesome! It seems the last release of this repo was in 2019; is this git repo still active?
BTW, does it support Netflix Spring Cloud?
Really appreciate your help in advance!
Thanks,
Roy
Adding a License section and related info to the project's README.md.