aw / cacherules
Ruby HTTP caching library aimed at being RFC 7234 compliant
Home Page: https://a1w.ca
License: Other
When redirecting to another URI, the client is redirected using the 307 HTTP status code. This might not work for user agents using HTTP/1.0 or older, since 307 was only introduced in HTTP/1.1.
302 or 307
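One way to pick the status is to base it on the client's HTTP version. The sketch below assumes the version string from the request line is available. redirect_status is a hypothetical helper and is not part of CacheRules:

```ruby
# Hypothetical helper (not part of CacheRules): choose a temporary-redirect
# status based on the HTTP version the client sent in its request line.
def redirect_status(http_version)
  major, minor = http_version.to_s.split(".").map(&:to_i)
  major = major.to_i # nil (empty version string) becomes 0
  minor = minor.to_i

  # 307 was introduced in HTTP/1.1; HTTP/1.0 clients only know 302.
  if major > 1 || (major == 1 && minor >= 1)
    307
  else
    302
  end
end

redirect_status("1.1") # => 307
redirect_status("1.0") # => 302
```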
For some strange reason, all recently cached files are automatically labeled as "expired" if requested without any Cache-Control or expiry headers.
I have a feeling this is wrong; I should investigate.
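RFC 7234 (Section 4.2.2) lets a cache assign a heuristic freshness lifetime when a response carries no explicit expiration, so such a response is not automatically stale. Below is a minimal sketch of the common "fraction of the time since Last-Modified" heuristic. It assumes the 10% figure the RFC cites as typical. The helper name is hypothetical:

```ruby
require "time"

# Heuristic freshness (RFC 7234, Section 4.2.2): with no explicit expiration
# time, a cache may assign a lifetime equal to a fraction of the time since
# Last-Modified. The RFC mentions 10% as a typical setting.
HEURISTIC_PERCENT = 10

def heuristic_freshness_lifetime(headers)
  last_modified = headers["Last-Modified"]
  date = headers["Date"]
  return 0 unless last_modified && date # nothing to base a heuristic on

  interval = (Time.httpdate(date) - Time.httpdate(last_modified)).to_i
  interval.positive? ? interval * HEURISTIC_PERCENT / 100 : 0
end

headers = {
  "Date" => "Wed, 29 Apr 2015 15:22:57 GMT",
  "Last-Modified" => "Sun, 19 Apr 2015 15:22:57 GMT"
}
heuristic_freshness_lifetime(headers) # => 86400 (10% of ten days)
```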
It appears there is no validation of the max-age client request header.
There is also only an if max-age == 0 check on the cached response header. This means that if a client or server sends Cache-Control: max-age=10 and the response's Age field is greater than 10, the limit is ignored. Oops.
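The response side and the request side differ at the boundary. A response is fresh only while its lifetime is greater than its current age (RFC 7234, Section 4.2). A request max-age rejects responses whose age is greater than the given value (Section 5.2.1.1). Below is a minimal sketch of both comparisons. All method names are hypothetical, and the parser is deliberately simplistic:

```ruby
# Minimal Cache-Control parser: "max-age=10, no-cache" => {"max-age"=>"10", "no-cache"=>nil}
def parse_cache_control(value)
  value.to_s.split(",").each_with_object({}) do |directive, acc|
    name, arg = directive.strip.split("=", 2)
    next if name.nil? || name.empty?
    acc[name.downcase] = arg&.delete('"')
  end
end

# Response side (RFC 7234, Section 4.2): fresh while lifetime > current age.
# Returns nil when there is no max-age, so other sources (Expires, heuristics) apply.
def response_fresh?(cache_control, current_age)
  max_age = parse_cache_control(cache_control)["max-age"]
  return nil if max_age.nil?
  Integer(max_age, 10) > current_age
end

# Request side (Section 5.2.1.1): the client refuses responses older than max-age.
def request_accepts_age?(cache_control, current_age)
  max_age = parse_cache_control(cache_control)["max-age"]
  return true if max_age.nil?
  current_age <= Integer(max_age, 10)
end

response_fresh?("max-age=10", 12)      # => false
request_accepts_age?("max-age=10", 12) # => false
```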
This continues from #9, which didn't fully fix the problem.
After some local tests, we've observed this:
"last-modified"=>"Thu, 30 Apr 2015 13:17:42 GMT", # HIT
"last-modified"=>"Fri, 01 May 2015 02:45:29 GMT", # STALE
That's obviously a massive error and completely backwards. I believe it's related to the freshness_lifetime calculation; looking into it.
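For reference, RFC 7234 (Section 4.2.1) fixes the order in which the freshness lifetime is taken:

1. s-maxage, for shared caches.
2. max-age.
3. Expires minus Date.
4. A heuristic, if any.

The sketch below follows that order. The shared: and heuristic: parameters are assumptions. It also uses Time.now when Date is missing, as a stand-in for the time the response was received:

```ruby
require "time"

# Sketch of the freshness lifetime precedence in RFC 7234, Section 4.2.1.
# `shared:` and `heuristic:` are hypothetical parameters: the caller says
# whether this is a shared cache and supplies any heuristic lifetime.
def freshness_lifetime(headers, shared: true, heuristic: 0)
  directives = {}
  headers["Cache-Control"].to_s.split(",").each do |d|
    name, arg = d.strip.split("=", 2)
    directives[name.downcase] = arg&.delete('"') if name && !name.empty?
  end

  # 1. s-maxage (shared caches only), 2. max-age
  return directives["s-maxage"].to_i if shared && directives["s-maxage"]
  return directives["max-age"].to_i if directives["max-age"]

  # 3. Expires minus Date (Time.now stands in for the receipt time if Date is missing)
  if headers["Expires"]
    begin
      date = headers["Date"] ? Time.httpdate(headers["Date"]) : Time.now
      return [(Time.httpdate(headers["Expires"]) - date).to_i, 0].max
    rescue ArgumentError
      return 0 # invalid dates such as "0" mean "already expired" (Section 5.3)
    end
  end

  # 4. otherwise fall back to a heuristic, if any
  heuristic
end

freshness_lifetime({ "Cache-Control" => "max-age=60, s-maxage=120" }, shared: false) # => 60
```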
This was discovered thanks to Amazon S3 sending an invalid HTTP Content-Type header, which in fact is not even allowed to be empty according to the RFCs.
Location: https://s3.amazonaws.com/production.s3.rubygems.org/specs.4.8.gz
HTTP/1.1 200 OK
x-amz-id-2: alURqgFXa/nVfPpLEJQVOs3fJ8t9C9Z+MlAdMAaC62d1ZsccK06NVpqRLxlD1KVB
x-amz-request-id: 1581E6BF3225E185
Date: Wed, 29 Apr 2015 15:22:57 GMT
x-amz-version-id: qc5dRNqFJO68y1JxMODSS1fQf.8WZ8IF
Last-Modified: Wed, 29 Apr 2015 15:21:33 GMT
ETag: "c64eff68233a552b9737972ad8c2fb86"
Accept-Ranges: bytes
Content-Type:
Content-Length: 2364063
Server: AmazonS3
I vote to just drop empty headers.
Detected this in our production setup.
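Dropping empty headers could be a small pre-processing step applied before the cache rules see the response. Below is a sketch assuming headers arrive as a plain Hash of name to value. The method name is hypothetical:

```ruby
# Hypothetical pre-processing step: drop header fields whose value is empty
# or whitespace-only (like S3's bare "Content-Type:") before cache logic runs.
def drop_empty_headers(headers)
  headers.reject { |_name, value| value.nil? || value.to_s.strip.empty? }
end

cleaned = drop_empty_headers({ "Content-Type" => "", "ETag" => '"c64eff68"' })
```

With the S3 response above, only the empty Content-Type field would be removed. Every other field is passed through untouched.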
Validation always returns a 504 Gateway Timeout / EXPIRED result when no Cache-Control: max-stale header is provided. This is an error, as seen in the RFC (RFC 7234, Section 5.2.1.2):
If no value is assigned to max-stale, then the client is willing to accept a stale response of any age.
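Below is a minimal sketch of the three cases: no max-stale at all, max-stale with no value, and max-stale=N. It assumes staleness is measured in seconds past the freshness lifetime. Both method names are hypothetical:

```ruby
# Sketch of max-stale handling (RFC 7234, Section 5.2.1.2). Returns how many
# seconds of staleness the client accepts:
#   nil             -> no max-stale directive at all
#   Float::INFINITY -> max-stale without a value: a stale response of any age
def max_stale_limit(cache_control)
  pairs = cache_control.to_s.split(",").map { |d| d.strip.split("=", 2) }
  pair = pairs.find { |name, _| name.to_s.downcase == "max-stale" }
  return nil if pair.nil?

  value = pair[1]
  value.nil? ? Float::INFINITY : Integer(value.delete('"'), 10)
end

# True when the request allows serving a response this many seconds stale.
def stale_acceptable?(cache_control, staleness)
  limit = max_stale_limit(cache_control)
  !limit.nil? && staleness <= limit
end

stale_acceptable?("max-stale", 86_400) # => true
```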
Hi there,
While sifting through the source, I stumbled upon this:
https://github.com/aw/CacheRules/blob/master/lib/validations.rb#L64-L74
It is used in validate_allow_stale, i.e. "may I serve this response as stale?" However, min-fresh means that the response must stay fresh for at least this long after you return it. A response to a request with a min-fresh value can therefore never be stale, since by definition it must still be fresh.
In fact, you COULD use min-fresh to reduce the maximum age before a response becomes stale. If I understand this correctly, that is what you have implemented here (inversely, by adding min-fresh to the current age, as if the response were older):
Lines 317 to 324 in 537667a
That part is correct. Unfortunately, if min-fresh is given and the response is fresh, the validate_allow_stale function returns... true? min-fresh should probably not have anything to do with this decision.
P.S. I love what you've done!
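To illustrate the point, here is a minimal sketch in which min-fresh only tightens the freshness test (RFC 7234, Section 5.2.1.3). A stale response fails the test regardless of min-fresh. The method name is hypothetical and ages are in seconds:

```ruby
# Sketch: min-fresh (RFC 7234, Section 5.2.1.3) only tightens the freshness
# test. A response is acceptable if it is fresh and will remain fresh for at
# least `min_fresh` more seconds; a stale response can never satisfy it.
def satisfies_min_fresh?(freshness_lifetime, current_age, min_fresh)
  fresh = freshness_lifetime > current_age
  fresh && (freshness_lifetime - current_age) >= min_fresh.to_i
end

satisfies_min_fresh?(60, 30, 20) # => true (30 seconds of freshness left)
```

Whether stale content may be served is then decided separately, for example by max-stale.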
Hi again,
When you hit the revalidation code, the following rules are applied:
https://github.com/aw/CacheRules/blob/master/lib/cache_rules.rb#L127-L133
Which internally calls:
https://github.com/aw/CacheRules/blob/master/lib/helpers.rb#L365-L369
However, I could not find anything in the RFC that makes these validators mandatory. This might be a design choice in your library, but I don't think a Gateway Timeout is appropriate in all cases here.
Let's consider the case of a simple must-revalidate response:
< Cache-Control: must-revalidate, max-age=60
< Date: Fri, 13 Jul 2018 16:40:00 +0000
< HTTP 200 Ok
Meaning: fresh for 60 seconds, and the response MUST NOT be served stale once it has expired. If it is requested after 16:41, the cache SHOULD just retry the request. In this case, no ETag or Last-Modified is present in the cached response and there is no If-None-Match in the request, so it gives us a 504, even though we have not even tried to reach the origin server.
I think you implemented it this way because of https://tools.ietf.org/html/rfc7234#section-4.3.1, which says:
When sending a conditional request for cache validation, a cache
sends one or more precondition header fields containing validator
metadata from its stored response(s), which is then compared by
recipients to determine whether a stored response is equivalent to a
current representation of the resource.
However, when you don't have these headers, you would not send a conditional request but a regular one. This is how both Chrome and Firefox have implemented it, and it is mentioned in the Mozilla (MDN) docs: it is either validated or fetched again.
The RFC words this paragraph carefully and does not use a capitalized MUST/SHOULD in it. Because of that, I believe you must always try to revalidate in the flow, regardless of whether the preconditions are present. Semantically, it becomes a conditional request if one of the headers is present; otherwise it's a regular fetch (which will never return a 304).
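Below is a minimal sketch of that behaviour, assuming stored headers are a plain Hash. Validators from the stored response become precondition headers (RFC 7234, Section 4.3.1). With no validators the result is empty, and the revalidation is simply an unconditional fetch. The method name is hypothetical:

```ruby
# Sketch (RFC 7234, Section 4.3.1): copy validators from the stored response
# into precondition headers. With no validators the result is empty and the
# revalidation is simply a regular (unconditional) fetch.
def revalidation_headers(stored_headers)
  headers = {}
  headers["If-None-Match"] = stored_headers["ETag"] if stored_headers["ETag"]
  if stored_headers["Last-Modified"]
    headers["If-Modified-Since"] = stored_headers["Last-Modified"]
  end
  headers
end

revalidation_headers({ "Cache-Control" => "must-revalidate, max-age=60" })
# => {} (no validators, so send a normal GET)
```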
I'm posting the relevant RFC entry below for convenience. The other mentions only "trigger" extra invalidation rules; nothing says that an ETag or Last-Modified is mandatory.
https://tools.ietf.org/html/rfc7234#section-5.2.2.1
The "must-revalidate" response directive indicates that once it has
become stale, a cache MUST NOT use the response to satisfy subsequent
requests without successful validation on the origin server.
The must-revalidate directive is necessary to support reliable
operation for certain protocol features. In all circumstances a
cache MUST obey the must-revalidate directive; in particular, if a
cache cannot reach the origin server for any reason, it MUST generate
a 504 (Gateway Timeout) response.
The must-revalidate directive ought to be used by servers if and only
if failure to validate a request on the representation could result
in incorrect operation, such as a silently unexecuted financial
transaction.
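Read this way, a 504 is only required when the origin cannot be reached. Below is a minimal sketch of that rule. The fetch callable is a hypothetical stand-in for whatever performs the conditional or plain request, and the set of rescued network errors is an assumption:

```ruby
require "socket"
require "timeout"

# Sketch of must-revalidate for a stale stored response (RFC 7234,
# Section 5.2.2.1). `fetch` is a hypothetical callable that contacts the
# origin (a conditional or plain request) and returns [status, headers, body].
def revalidate_or_gateway_timeout(fetch)
  fetch.call
rescue SocketError, SystemCallError, Timeout::Error
  # Only when the origin cannot be reached is a 504 the required answer.
  [504, {}, ""]
end
```

For example, a fetch that raises Errno::ECONNREFUSED yields [504, {}, ""]. A fetch that reaches the origin passes its response through unchanged.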
Related to #15
In the decision table:
rule | result |
---|---|
500 error | no |
matches precondition | no |
> Host: "https://example.org"
> Location: /my-resource
> Method: HEAD # or GET
> If-None-Match: "W/myetag"
< Host: "https://example.org"
< Location: /my-resource
< HTTP 200 Ok
< ETag: "W/newetag"
Currently it returns a 200 with the "cached" result, but actually a GET should be done to fetch the resource again. The server would return a 304 (which the table also handles correctly) if the preconditions match, and in that case the cached result may be used. A 200, however, means that the resource has changed and should be served from the origin server.
I propose changing it to MISS. You might as well do a GET for revalidation: in the case of a 304, you'll get a head :not_modified anyway, so you don't really gain anything from doing a HEAD. Additionally, I don't think the 200 status here is correct. It should report whichever status was returned from the server. Example:
> Host: "https://example.org"
> Location: /my-resource
> Method: HEAD # or GET
> If-None-Match: "W/myetag"
< Host: "https://example.org"
< Location: /my-resource
< HTTP 410 Gone
In this case, on revalidation the resource was reported as gone. It is correctly not treated as a "stale-allowing error" (your implementation of validate_is_error) and is definitely not REVALIDATED. It should be a MISS with the code 410. This holds for any status that is not a server error (500..599).
All of the above applies when the preconditions do not match, or are not even present in the first place (revalidation is then a simple fetch, as per #15).
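The proposal amounts to a small mapping from the origin's status to a lookup result. Below is a sketch of it. REVALIDATED and MISS are the labels used in this discussion. SERVER_ERROR is a hypothetical label for the 5xx branch, where the existing error rules (validate_is_error) would take over:

```ruby
# Sketch of the proposed mapping from the origin's status during
# revalidation to a lookup result. :REVALIDATED and :MISS follow the labels
# discussed here; :SERVER_ERROR is a hypothetical label for the 5xx branch,
# where the existing stale-on-error rules (validate_is_error) would apply.
def revalidation_result(origin_status)
  case origin_status
  when 304      then :REVALIDATED  # stored response may be reused
  when 500..599 then :SERVER_ERROR # hand over to error/staleness handling
  else               :MISS         # 200, 410, ...: serve origin's status and body
  end
end
```

With a MISS, the origin's own status (200, 410, and so on) would be reported to the client instead of a fixed 200.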
A precondition such as an If-None-Match, ETag or Last-Modified header is necessary to perform revalidation of an expired response.
If none of these is present, then we have nothing to revalidate against.
The tests passed, so everything must be fine, right? Wrong!
In fact, I'm certain the tests are wrong and the cache rules are not being applied correctly. Investigating..
When sending a 200 or 304 response, we need to return the original cached headers, with existing headers overwritten by those from the revalidated response.
At the moment, only the following are returned: Cache-Control, Content-Location, Date, ETag, Expires, Vary.
This was by design, but it's wrong.
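A minimal sketch of that merge, in the spirit of RFC 7234 Section 4.3.4: start from every stored header, then let each field from the revalidation response replace its counterpart. The method name is hypothetical. Header names are lowercased so that "ETag" and "etag" collide as they should:

```ruby
# Sketch (in the spirit of RFC 7234, Section 4.3.4): keep every stored header
# and let each field from the revalidation response replace its counterpart.
# Names are lowercased so that "ETag" and "etag" are treated as one field.
def merge_revalidated_headers(cached, revalidated)
  normalize = ->(h) { h.each_with_object({}) { |(k, v), acc| acc[k.downcase] = v } }
  normalize.call(cached).merge(normalize.call(revalidated))
end

merged = merge_revalidated_headers(
  { "Content-Type" => "text/html", "ETag" => '"a"' },
  { "etag" => '"b"', "Date" => "Fri, 13 Jul 2018 16:42:00 GMT" }
)
```

Here Content-Type survives from the cached copy, while ETag and Date come from the revalidated response.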
If a revalidation response returns a 302 or another HTTP redirect, the make_request() call should follow it.
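Below is a sketch of redirect following with a hop limit. It resolves relative Location values against the current URI. The fetch callable is a hypothetical stand-in for make_request() and returns [status, headers]. The method name and the limit of 5 are assumptions:

```ruby
require "uri"

# Statuses treated as redirects in this sketch.
REDIRECT_STATUSES = [301, 302, 303, 307, 308].freeze

# Sketch: follow redirects returned while revalidating, up to `limit` hops.
# `fetch` is a hypothetical callable taking a URI and returning
# [status, headers]; a real version might wrap Net::HTTP.
def fetch_following_redirects(uri, fetch, limit: 5)
  current = URI(uri)
  (limit + 1).times do
    status, headers = fetch.call(current)
    location = headers["Location"]
    return [status, headers, current] unless REDIRECT_STATUSES.include?(status) && location

    # Location may be relative, so resolve it against the current URI.
    current = URI.join(current.to_s, location)
  end
  raise "too many redirects"
end
```

A redirect loop ends with a RuntimeError instead of recursing forever.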
Hi there,
Currently, on revalidation, a 304 will only be generated if the precondition matches:
https://github.com/aw/CacheRules/blob/master/lib/validations.rb#L124-L127
However, this does not handle all the preconditions:
https://tools.ietf.org/html/rfc7234#section-4.3.2
If an If-None-Match header field is not present, a request containing
an If-Modified-Since header field (Section 3.3 of [RFC7232])
indicates that the client wants to validate one or more of its own
stored responses by modification date. A cache recipient SHOULD
generate a 304 (Not Modified) response (using the metadata of the
selected stored response) if one of the following cases is true: 1)
the selected stored response has a Last-Modified field-value that is
earlier than or equal to the conditional timestamp; 2) no
Last-Modified field is present in the selected stored response, but
it has a Date field-value that is earlier than or equal to the
conditional timestamp; or, 3) neither Last-Modified nor Date is
present in the selected stored response, but the cache recorded it as
having been received at a time earlier than or equal to the
conditional timestamp.
On such a request, CacheRules will return a 200 with REVALIDATED as the lookup value.
The funny thing is, you have actually already written this:
https://github.com/aw/CacheRules/blob/master/lib/validations.rb#L29
A simple replacement of validator_match with precond_match here should fix this:
https://github.com/aw/CacheRules/blob/master/lib/cache_rules.rb#L89
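For comparison, here is a sketch of the three If-Modified-Since cases from the RFC 7234 Section 4.3.2 quote above. It also applies the RFC 7232 rule that an invalid HTTP-date in If-Modified-Since is ignored. The received_at parameter (when the cache stored the response) and the method name are assumptions:

```ruby
require "time"

# Sketch of RFC 7234, Section 4.3.2 for If-Modified-Since (only consulted
# when If-None-Match is absent). `received_at` is the Time the cache stored
# the response; the parameter name is an assumption.
def not_modified_since?(request_headers, stored_headers, received_at)
  return false if request_headers["If-None-Match"]
  ims = request_headers["If-Modified-Since"]
  return false unless ims

  conditional = Time.httpdate(ims)
  reference =
    if stored_headers["Last-Modified"]      # case 1
      Time.httpdate(stored_headers["Last-Modified"])
    elsif stored_headers["Date"]            # case 2
      Time.httpdate(stored_headers["Date"])
    else                                    # case 3
      received_at
    end
  reference <= conditional
rescue ArgumentError
  false # an invalid HTTP-date: ignore the condition (no 304)
end

not_modified_since?(
  { "If-Modified-Since" => "Wed, 29 Apr 2015 15:22:57 GMT" },
  { "Last-Modified" => "Wed, 29 Apr 2015 15:21:33 GMT" },
  Time.now
) # => true
```

A true result is where a 304 could be generated instead of the current 200/REVALIDATED.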
Header should be returned as a String, but processed as an Integer.
Fixed here: 92c3b05