
cacherules's Introduction

What is CacheRules

CacheRules is a well-behaved HTTP caching library aimed at being RFC 7234 compliant.

This library does not actually cache anything, and it is not a proxy. It validates HTTP headers and returns the appropriate response to determine if a request can be served from the cache.

It is up to the HTTP Cache implementation to store the cached results and serve responses from the cache if necessary.


Getting started

Add this line to your Gemfile: gem 'cache_rules'

or

Install with: gem install cache_rules

Usage

This library exposes a single public API call: validate().

require 'cache_rules'

# test without cached response
url     = 'https://status.rubygems.org'
request = {'Version' => 'HTTP/1.1'}
cached  = {}

CacheRules.validate url, request, cached

=> {:body=>nil, :code=>307, :headers=>{"Cache-Lookup"=>"MISS", "Location"=>"https://status.rubygems.org"}}

# test with cached response (status code 200 because no ETag or If-None-Match supplied)
cached  = { "Date" => {"timestamp"=>1420095825}, "X-Cache-Req-Date"  => {"timestamp"=>1420268625}, "X-Cache-Res-Date"  => {"timestamp"=>1420268625} }

CacheRules.validate url, request, cached

=> {:body=>"stale", :code=>200, :headers=>{"Date"=>"Wed, 21 Feb 2018 05:09:27 GMT", "Age"=>"99094242", "Warning"=>"110 - \"Response is Stale\"", "Cache-Lookup"=>"STALE"}}

# test with cached response (status code 304 because If-None-Match supplied)
request = {"Version"=>"HTTP/1.1", "If-None-Match"=>"*"}
cached  = { "Date" => {"timestamp"=>1519190160}, "X-Cache-Req-Date"  => {"timestamp"=>1519190160}, "X-Cache-Res-Date"  => {"timestamp"=>1519190160} }

CacheRules.validate url, request, cached

=> {:body=>nil, :code=>304, :headers=>{"Date"=>"Wed, 21 Feb 2018 05:15:01 GMT", "Age"=>"241", "Warning"=>"110 - \"Response is Stale\"", "Cache-Lookup"=>"STALE"}}

The request headers must be a Ruby Hash or Array of 2-element Arrays.

The cached headers must already have been normalized by this caching library, i.e., they must include the following keys, each holding a 'timestamp' entry:

  • Date['timestamp']
  • X-Cache-Req-Date['timestamp']
  • X-Cache-Res-Date['timestamp']

See test/ directory for more examples.
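
As a rough illustration of the shape described above, the sketch below builds a cached-headers Hash with the three required 'timestamp' entries. The helper name `timestamp_entry` and the `received_at` value are hypothetical and not part of CacheRules; in practice the library's own normalization is expected to produce these values.

```ruby
require 'time'

# Hypothetical helper (not part of CacheRules): wrap an HTTP-date string
# in the {"timestamp" => Integer} shape used by the examples above.
def timestamp_entry(http_date)
  { 'timestamp' => Time.httpdate(http_date).to_i }
end

# Assumed moment at which the request was sent and the response received.
received_at = 'Sat, 03 Jan 2015 07:03:45 GMT'

cached = {
  'Date'             => timestamp_entry('Thu, 01 Jan 2015 07:03:45 GMT'),
  'X-Cache-Req-Date' => timestamp_entry(received_at),
  'X-Cache-Res-Date' => timestamp_entry(received_at)
}
```

These dates correspond to the timestamps 1420095825 and 1420268625 used in the second Usage example.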

Decision tables

There are two decision tables to help figure out how to process each type of HTTP Caching request.

Request/Cache Table

cached

Revalidation Table

revalidation

RFC compliance

This HTTP Caching library aims to be RFC 7230-7235 compliant. It is a best effort attempt to correctly interpret these documents. Some errors may exist, so please notify me if something isn't processed correctly according to the RFCs.

Feature list

  • Normalizing header names and field values (ex: Last-Modified)
  • Ensuring date fields are correctly formatted (ex: Fri, 31 Dec 1999 23:59:59 GMT)
  • Merging duplicate header fields
  • Interop with HTTP/1.0 clients (ex: Pragma: no-cache)
  • Weak entity-tag matching (ex: If-None-Match: W/"abc123")
  • Last modified date matching (ex: If-Modified-Since: Thu, 01 Jan 2015 07:03:45 GMT)
  • Various header validation including Cache-Control headers
  • Cache-Control directives with quoted strings (ex: no-cache="Cookie")
  • Removing non-cacheable headers (ex: Authorization)
  • Correctly calculating freshness and current age of cached responses
  • Explicit and Heuristic freshness calculation
  • Returning 110 and 111 Warning headers when serving stale responses
  • Revalidating expired responses with the origin server (using HEAD)
  • Returning the correct status code based on validation/revalidation results
  • Lots more little things sprinkled throughout the RFCs...
  • Written in purely functional Ruby (mostly) with 100% unit/integration test coverage
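
To make the "explicit and heuristic freshness" item more concrete, here is a minimal sketch of the calculation described in RFC 7234 section 4.2. It is not the CacheRules implementation; the method names and keyword arguments are assumptions made for illustration, and the 10% heuristic is the typical value the RFC mentions rather than a requirement.

```ruby
# Illustrative sketch of RFC 7234 section 4.2; not the CacheRules code.
# All time arguments are Integer epoch seconds, or nil when the header is absent.
def freshness_lifetime(max_age: nil, expires: nil, date: nil, last_modified: nil)
  # Explicit: Cache-Control max-age takes precedence.
  return max_age if max_age
  # Explicit: Expires minus Date, never negative.
  return [expires - date, 0].max if expires && date
  # Heuristic: 10% of the time elapsed since the last modification.
  return [(date - last_modified) / 10, 0].max if date && last_modified
  0
end

# A response is fresh while its lifetime exceeds its current age.
def fresh?(lifetime, current_age)
  lifetime > current_age
end
```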

Custom headers

Custom headers are generated to help with testing and compliance validation.

These are somewhat based on CloudFlare's cache headers:

  • Cache-Lookup: HIT: resource is in cache and still valid. Serving from the cache.
  • Cache-Lookup: MISS: resource is not in cache. Redirecting to the origin server.
  • Cache-Lookup: EXPIRED: resource is in cache, but expired. Redirecting to the origin server or serving an error message.
  • Cache-Lookup: STALE: resource is in cache and expired, but the origin server wasn't contacted successfully to revalidate the request. Serving stale response from the cache.
  • Cache-Lookup: REVALIDATED: resource is in cache, was expired, but was revalidated successfully at the origin server. Serving from the cache.
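
A cache implementation could branch on these values roughly as follows. This is only a sketch: `next_step` and the returned symbols are hypothetical names, and the result Hash is assumed to have the same shape as the ones shown in the Usage section.

```ruby
# Hypothetical helper: pick the next step from the "Cache-Lookup" header
# of a result Hash shaped like {:body=>..., :code=>..., :headers=>{...}}.
def next_step(result)
  case result[:headers]['Cache-Lookup']
  when 'HIT', 'STALE', 'REVALIDATED' then :serve_from_cache
  when 'MISS'                        then :redirect_to_origin
  when 'EXPIRED'                     then :redirect_or_error
  else                                    :unknown
  end
end
```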

Tests

To run the tests, type:

bundle exec rake test

TODO

  • Validation of s-maxage response header
  • Handling Vary header and different representations for the same resource
  • Handling 206 (Partial) and Range headers for resuming downloads
  • Handling Cache-Control: private headers
  • Caching other cacheable responses such as 404 and 501

What is C.R.E.A.M. ?

C.R.E.A.M. is an influential lyrical masterpiece from the 90s performed by the Wu-Tang Clan.

It's also the premise of this troll video

Further reading

Some useful articles explaining HTTP Caching:

LICENSE

This Source Code Form is subject to the terms of the Mozilla Public License, v. 2.0. If a copy of the MPL was not distributed with this file, You can obtain one at http://mozilla.org/MPL/2.0/.

Copyright (c) 2014-2018 Alexander Williams, Unscramble


cacherules's Issues

Improper validation of max-stale

Detected this 🐛 in our production setup.

Validation always returns a 504 Gateway Timeout / EXPIRED result when no Cache-Control: max-stale header is provided. This is an error as seen in the RFC:

If no value is assigned to max-stale, then the client is willing to accept a stale response of any age.
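
The max-stale semantics quoted above (RFC 7234 section 5.2.1.2) can be sketched as below. This covers only the max-stale directive itself, not the library's full validation; `max_stale_allows?` and the shape of `directives` (a parsed Cache-Control request Hash where a valueless directive maps to nil) are assumptions for illustration.

```ruby
# Sketch of max-stale semantics only; not CacheRules code.
# `staleness` is how many seconds the response has exceeded its freshness lifetime.
def max_stale_allows?(directives, staleness)
  # Directive absent: max-stale grants nothing; other rules decide.
  return false unless directives.key?('max-stale')

  limit = directives['max-stale']
  # No value: a stale response of any age is acceptable.
  # With a value: staleness must not exceed it.
  limit.nil? || staleness <= limit.to_i
end
```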

Caching rules are inconsistent

The tests passed, so everything must be fine, right? Wrong!

In fact, I'm certain the tests are wrong and the cache rules are not being applied correctly. Investigating..

Revalidation with matching If-Modified-Since is 200 instead of 304

Hi there,

Currently, on revalidation, a 304 will only be generated if the precondition matches:
https://github.com/aw/CacheRules/blob/master/lib/validations.rb#L124-L127

However, this does not handle all the preconditions:

https://tools.ietf.org/html/rfc7234#section-4.3.2

   If an If-None-Match header field is not present, a request containing
   an If-Modified-Since header field (Section 3.3 of [RFC7232])
   indicates that the client wants to validate one or more of its own
   stored responses by modification date.  A cache recipient SHOULD
   generate a 304 (Not Modified) response (using the metadata of the
   selected stored response) if one of the following cases is true: 1)
   the selected stored response has a Last-Modified field-value that is
   earlier than or equal to the conditional timestamp; 2) no
   Last-Modified field is present in the selected stored response, but
   it has a Date field-value that is earlier than or equal to the
   conditional timestamp; or, 3) neither Last-Modified nor Date is
   present in the selected stored response, but the cache recorded it as
   having been received at a time earlier than or equal to the
   conditional timestamp.

On such a request, CacheRules will return a 200 with REVALIDATED as lookup value.

The funny thing is, you have actually already written this:
https://github.com/aw/CacheRules/blob/master/lib/validations.rb#L29

A simple replacement of validator_match with precond_match here should fix this:
https://github.com/aw/CacheRules/blob/master/lib/cache_rules.rb#L89
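
For reference, the three If-Modified-Since cases from the RFC excerpt above can be sketched as a single comparison. This is not the library's precond_match; `not_modified_since?` and its keyword arguments are hypothetical, and all values are assumed to be Integer epoch seconds (or nil when absent).

```ruby
# Sketch of RFC 7234 section 4.3.2, If-Modified-Since handling.
# Falls back from Last-Modified to Date to the time the cache received the response.
def not_modified_since?(if_modified_since, last_modified: nil, date: nil, received_at: nil)
  reference = last_modified || date || received_at
  !reference.nil? && reference <= if_modified_since
end
```

When this returns true, the RFC says a cache SHOULD generate a 304 (Not Modified) response.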

Revalidation is "EXPIRED" when there is no precondition

Hi again,

When you hit the revalidation code, the following rules are applied:
https://github.com/aw/CacheRules/blob/master/lib/cache_rules.rb#L127-L133

Which internally calls:
https://github.com/aw/CacheRules/blob/master/lib/helpers.rb#L365-L369

Now, I could not find anything in the RFC that makes such preconditions mandatory. This might be a design choice in your library, but I don't think a Gateway Timeout is appropriate for all cases here.

Let's consider the case of a simple must-revalidate request.

< Cache-Control: must-revalidate, max-age=60
< Date: Fri, 13 Jul 2018 16:40:00 +0000
< HTTP 200 Ok

Meaning, fresh for 60 seconds, MUST NOT use stale when it has expired. If requested past 16:41, it SHOULD just retry the request. In this case, because no ETag or Last-Modified is present in the cached response, nor is there an If-None-Match in the request, it gives us a 504, but we have not even tried to reach the origin server.

I think you implemented it as such because of https://tools.ietf.org/html/rfc7234#section-4.3.1 where it says

   When sending a conditional request for cache validation, a cache
   sends one or more precondition header fields containing validator
   metadata from its stored response(s), which is then compared by
   recipients to determine whether a stored response is equivalent to a
   current representation of the resource.

However, when you don't have these headers, you would not send a conditional request, but a regular one. This is how both Chrome and Firefox have implemented it. It is mentioned in the Mozilla docs: "It is either validated or fetched again."

Because of the careful wording in the RFC, and not using a capitalized MUST/SHOULD in this paragraph, I believe you must always try to revalidate in the flow, regardless of the presence of the preconditions. It becomes, semantically, a conditional request if one of the headers is present, but otherwise it's a regular fetch request (and will always return a non-304 result).


Posted the RFC entry just for ease. The other mentions are only "triggering" extra invalidation/rules, but nothing says anything about an ETag / Last-Modified being mandatory.

https://tools.ietf.org/html/rfc7234#section-5.2.2.1   

   The "must-revalidate" response directive indicates that once it has
   become stale, a cache MUST NOT use the response to satisfy subsequent
   requests without successful validation on the origin server.

   The must-revalidate directive is necessary to support reliable
   operation for certain protocol features.  In all circumstances a
   cache MUST obey the must-revalidate directive; in particular, if a
   cache cannot reach the origin server for any reason, it MUST generate
   a 504 (Gateway Timeout) response.

   The must-revalidate directive ought to be used by servers if and only
   if failure to validate a request on the representation could result
   in incorrect operation, such as a silently unexecuted financial
   transaction.
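
The behaviour argued for in this issue could look roughly like the sketch below: the revalidation request becomes conditional only when the stored response carries a validator, and is otherwise a plain fetch. `revalidation_headers` is a hypothetical helper, and `cached` is assumed here to hold plain String header values rather than the library's normalized structure.

```ruby
# Hypothetical sketch: build the extra headers for a revalidation request.
# An empty result means an unconditional (regular) request to the origin.
def revalidation_headers(cached)
  headers = {}
  headers['If-None-Match']     = cached['ETag']          if cached['ETag']
  headers['If-Modified-Since'] = cached['Last-Modified'] if cached['Last-Modified']
  headers
end
```

With the must-revalidate example above (no ETag, no Last-Modified), this yields an empty Hash, so the origin would still be contacted before any 504 is considered.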

Errors caused by empty HTTP headers

This was discovered thanks to Amazon S3 sending an invalid HTTP Content-Type header, which in fact is not even allowed to be empty according to the RFCs.

Location: https://s3.amazonaws.com/production.s3.rubygems.org/specs.4.8.gz

HTTP/1.1 200 OK
x-amz-id-2: alURqgFXa/nVfPpLEJQVOs3fJ8t9C9Z+MlAdMAaC62d1ZsccK06NVpqRLxlD1KVB
x-amz-request-id: 1581E6BF3225E185
Date: Wed, 29 Apr 2015 15:22:57 GMT
x-amz-version-id: qc5dRNqFJO68y1JxMODSS1fQf.8WZ8IF
Last-Modified: Wed, 29 Apr 2015 15:21:33 GMT
ETag: "c64eff68233a552b9737972ad8c2fb86"
Accept-Ranges: bytes
Content-Type: 
Content-Length: 2364063
Server: AmazonS3

I vote 👍 to just drop empty headers.
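
Dropping empty headers before validation might be as small as the following sketch; `drop_empty_headers` is a hypothetical name, not an existing CacheRules function.

```ruby
# Sketch: remove header fields whose value is nil or only whitespace.
def drop_empty_headers(headers)
  headers.reject { |_name, value| value.nil? || value.to_s.strip.empty? }
end

s3_headers = {
  'ETag'           => '"c64eff68233a552b9737972ad8c2fb86"',
  'Content-Type'   => '',
  'Content-Length' => '2364063'
}

cleaned = drop_empty_headers(s3_headers)
```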

Cached headers are not returned

When sending a 200 or 304 response, we need to return the original cached headers with existing headers overwritten by the revalidated response.

At the moment only the following are returned: Cache-Control Content-Location Date ETag Expires Vary.

This was by design, but it's wrong.
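
A minimal sketch of the intended behaviour, assuming both header sets are Hashes with already-normalized names (`merged_headers` is a hypothetical helper):

```ruby
# Sketch: start from the cached headers and let fields from the
# revalidated response overwrite fields of the same name.
def merged_headers(cached, revalidated)
  cached.merge(revalidated)
end

cached      = { 'Content-Type' => 'text/html', 'ETag' => '"old"', 'Server' => 'AmazonS3' }
revalidated = { 'ETag' => '"new"', 'Date' => 'Wed, 29 Apr 2015 15:22:57 GMT' }

result = merged_headers(cached, revalidated)
```

Here `result` keeps Content-Type and Server from the cache, while ETag and Date come from the revalidated response.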

Valid cached files are labeled as expired

For some strange reason, all recently cached files are automatically labeled as "expired" if requested without any cache-control or expiry headers.

I have a feeling this is wrong, should investigate..

Max-age header isn't fully validated

It appears there is no validation on the max-age client request header.

There is also only an if max-age == 0 check on the cached response header. This means if a client or server sends Cache-Control: max-age=10 and the response's Age field is greater than 10, the directive will be ignored. Oops.
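
A fuller check could compare the current age against every max-age that is present, as in this sketch. `within_max_age?` and its keyword arguments are assumptions for illustration, not existing library code; values are Integer seconds or nil.

```ruby
# Sketch: the response is acceptable only if its age does not exceed
# any max-age given by the client request or the cached response.
def within_max_age?(current_age, request_max_age: nil, response_max_age: nil)
  limits = [request_max_age, response_max_age].compact
  limits.all? { |limit| current_age <= limit }
end
```

With no max-age on either side the method returns true, leaving the decision to the other freshness rules.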

Revalidation on no match incorrectly gives back cached value, even on deleted resource

Related to #15

In the decision table:

rule                   result
500 error              no
matches precondition   no

> Host: "https://example.org"
> Location: /my-resource
> Method: HEAD # or GET
> If-None-Match: "W/myetag"
< Host: "https://example.org"
< Location: /my-resource
< HTTP 200 Ok
< ETag: "W/newetag"

Currently it returns a 200 with the "cached" result, but actually a GET should be done to fetch the resource again. The server would return a 304 (correct in the table as well) if the preconditions match, and in that case it may use the result from the cache, but on a 200 it means that the resource has changed and should be served from the origin server.

I propose changing it to MISS. You might as well do a GET for revalidation. In case of 304, you'll get a head :not_modified, so you don't really gain anything here from doing a HEAD. Additionally, I don't think the 200 status here is correct. It should report whichever status was returned from the server. Example:

> Host: "https://example.org"
> Location: /my-resource
> Method: HEAD # or GET
> If-None-Match: "W/myetag"
< Host: "https://example.org"
< Location: /my-resource
< HTTP 410 Gone

In this case on revalidation the resource has reported gone. It is correctly not a "stale-allowing-error" (your implementation of validate_is_error) and definitely not REVALIDATED. Should be MISS with the code 410. This is true for anything that's not a server error (500..599).

All of the above counts when the preconditions do not match, or are not even present in the first place (revalidation is then a simple fetch, as per #15).
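
The proposal in this issue can be summarized with the sketch below. It describes the suggested behaviour, not what the library currently does; `revalidation_outcome` and the returned symbols are hypothetical.

```ruby
# Sketch of the proposed mapping from the origin's status code
# to the outcome of a revalidation attempt.
def revalidation_outcome(origin_status)
  case origin_status
  when 304      then :revalidated   # serve the stored response
  when 500..599 then :origin_error  # stale-serving rules decide between STALE and EXPIRED
  else               :miss          # e.g. 200 or 410: report the origin's own status
  end
end
```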

Recently cached responses are considered STALE

This continues from #9 which didn't fully fix the problem.

After some local tests, we've observed this:

"last-modified"=>"Thu, 30 Apr 2015 13:17:42 GMT", # HIT
"last-modified"=>"Fri, 01 May 2015 02:45:29 GMT", # STALE

That's obviously a massive error and completely backwards. I believe it's related to the freshness_lifetime calculation.. looking into it.

Incorrect min-fresh?

Hi there,

Whilst source sifting I stumbled upon this:
https://github.com/aw/CacheRules/blob/master/lib/validations.rb#L64-L74

It is used in "validate_allow_stale", which asks: may I serve this request as stale? However, min-fresh means that the response needs to stay fresh for at least this long when you return it. A response that satisfies min-fresh can never be stale, as by definition it's then no longer fresh.

In fact, you COULD use min-fresh to reduce the maximum age before it becomes stale, and, if I understand this correctly, that is what you have implemented here (in inverse, by adding to the current age, acting as if it's older):

CacheRules/lib/helpers.rb

Lines 317 to 324 in 537667a

def helper_min_fresh
  Proc.new {|request, freshness_lifetime, current_age|
    if request && request['min-fresh']
      token = request['min-fresh']['token']
      freshness_lifetime.to_i >= (current_age + token.to_i)
    end
  }
end

Now, since that is correct, unfortunately, if min-fresh is given and the response is fresh, the validate_allow_stale function gives... true? Min-fresh should probably not have anything to do with this.
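
To illustrate the point with numbers, here is the same comparison as in helper_min_fresh written as a plain method. `satisfies_min_fresh?` is a hypothetical name used only for this example; all values are seconds.

```ruby
# Sketch: a response satisfies min-fresh only if it will still be fresh
# `min_fresh` seconds from now.
def satisfies_min_fresh?(freshness_lifetime, current_age, min_fresh)
  freshness_lifetime >= current_age + min_fresh
end

satisfies_min_fresh?(60, 30, 20)  # => true  (30 seconds of freshness left, 20 requested)
satisfies_min_fresh?(60, 50, 20)  # => false (only 10 seconds of freshness left)
satisfies_min_fresh?(60, 70, 0)   # => false (already stale)
```

A response whose age already exceeds its lifetime cannot pass for any non-negative min-fresh, which is why the directive can only narrow what counts as fresh and never permit a stale response.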

p.s. I love what you've done!

Older user agents might not understand 307 responses

When redirecting to another URI, the client is redirected using the 307 HTTP status code.

This might not work for user agents using HTTP/1.0 or older.

  • One solution is to change the code to 302
  • Another solution would be to detect the version and send 302 or 307
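
The second option could be sketched as follows, assuming the request Hash carries a 'Version' key as in the Usage examples. `redirect_code` is a hypothetical helper, not part of the library.

```ruby
# Sketch: use 302 for clients that predate HTTP/1.1, and 307 otherwise.
def redirect_code(request)
  %w[HTTP/0.9 HTTP/1.0].include?(request['Version']) ? 302 : 307
end
```

A request without a 'Version' key falls through to 307, which matches the current behaviour described above.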
