yelp / casper

A fast web application platform built in Rust and Luau
License: Other
Right now we copy each key into a new table, which forces you to add a new line every time you add a config option. We should simply return the entire cache_entry, i.e. refactor:
cacheability_info = {
    is_cacheable = true,
    ttl = cache_entry['ttl'],
    pattern = cache_entry['pattern'],
    cache_name = cache_name,
    reason = nil,
    vary_headers_list = vary_headers_list,
    bulk_support = cache_entry['bulk_support'],
    id_identifier = cache_entry['id_identifier'],
    dont_cache_missing_ids = cache_entry['dont_cache_missing_ids'],
    enable_invalidation = cache_entry['enable_invalidation'],
    refresh_cache = false,
    num_buckets = cache_entry['buckets'],
}
into:
cacheability_info = {
    is_cacheable = true,
    cache_entry = cache_entry,
    cache_name = cache_name,
    reason = nil,
    vary_headers_list = vary_headers_list,
    refresh_cache = false,
}
Context
During debugging, it's helpful to know how much TTL is left for a cache key. This helps when investigating whether the cache is stale and/or the TTL is working correctly.
Example use case:
We don't track timestamps in Elasticsearch for optimization reasons. The cache was returning old data, and we wanted to confirm that the cache was last updated before the write happened.
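A minimal sketch of how remaining TTL could be derived, assuming we store the write time next to each cached value (the helper names here are hypothetical, not Casper's actual storage layer):

```python
import time

def wrap_for_cache(value, ttl_seconds):
    # Hypothetical helper: record the write timestamp alongside the value.
    return {"value": value, "stored_at": time.time(), "ttl": ttl_seconds}

def remaining_ttl(entry):
    # Seconds of TTL left for a cached entry; 0 if it has already expired.
    elapsed = time.time() - entry["stored_at"]
    return max(0.0, entry["ttl"] - elapsed)

entry = wrap_for_cache({"name": "Bob"}, ttl_seconds=300)
print(round(remaining_ttl(entry)))  # 300
```

Note that if the backing store is Cassandra, CQL can also report this directly via `SELECT TTL(column) FROM table`, which may avoid tracking timestamps ourselves.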
/status?check_cassandra=true
should also return the C* nodes that the driver is using. Ideally they should be split between local and remote nodes, so that we can check that the driver is using the right ones. That'd make it easier to debug cases where the driver is misconfigured or behaving strangely.
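A sketch of what the local/remote split could look like in the status response; the field names and datacenter labels here are illustrative assumptions, not the actual endpoint's schema:

```python
def build_cassandra_status(driver_hosts, local_dc):
    # Split the driver's known C* hosts into local vs. remote by datacenter.
    local = [h["address"] for h in driver_hosts if h["dc"] == local_dc]
    remote = [h["address"] for h in driver_hosts if h["dc"] != local_dc]
    return {"cassandra": {"local_nodes": local, "remote_nodes": remote}}

hosts = [
    {"address": "10.0.0.1", "dc": "uswest1"},
    {"address": "10.1.0.1", "dc": "useast1"},
]
print(build_cassandra_status(hosts, local_dc="uswest1"))
```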
Currently "make dev" fails outside of Yelp devboxes because there's no "/nail":
=> make dev
....
docker run -d -t \
-p 32927:8888 \
-e "PAASTA_SERVICE=spectre" \
-e "PAASTA_INSTANCE=test" \
-v /nail/etc:/nail/etc:ro \
-v /nail/srv/configs/spectre:/nail/srv/configs/spectre:ro \
-v /var/run/synapse/services/:/var/run/synapse/services/:ro \
-v /Users/abrousse/git/casper:/code:ro \
--name=spectre-dev-abrousse spectre-dev-abrousse
e61b51810461c509e15a176397bbc9fd78af769f0b814f32c7e5017a6511e0e8
docker: Error response from daemon: Mounts denied:
The paths /var/run/synapse/services/ and /nail/srv/configs/spectre and /nail/etc
are not shared from OS X and are not known to Docker.
We use https://github.com/Yelp/casper/blob/master/lua/caching_handlers.lua#L9 to extract the ids from the URL. However, it's only called when we store the result in the cache, not when we read it. Since our get_bucket logic returns different results based on whether we have a list of ids, we end up writing the result to a different bucket than the one the read expects.
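A toy model of why the write and read sides can disagree; the bucketing scheme below is a simplified assumption, not Casper's real get_bucket implementation:

```python
import hashlib

def _stable_hash(s):
    # Deterministic hash (Python's built-in hash() is randomized per process).
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def get_bucket(cache_key, ids, num_buckets=16):
    # Requests with extracted ids bucket by the ids; others bucket by the raw key.
    if ids:
        return _stable_hash(",".join(sorted(ids))) % num_buckets
    return _stable_hash(cache_key) % num_buckets

# The write path extracts ids from the URL; the read path currently passes none,
# so the two sides can land in different buckets and the read misses:
write_bucket = get_bucket("/v1/biz?ids=1,2", ids=["1", "2"])
read_bucket = get_bucket("/v1/biz?ids=1,2", ids=None)
print(write_bucket, read_bucket)
```

Running id extraction on both paths (so both calls receive the same ids) would make the two buckets agree.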
Right now Casper relies on the caller's Smartstack to add the X-Smartstack-Source header. This is a bit surprising, given that the request to Casper will have a source of casper.main or similar. I think it makes more sense for Casper to set this header itself when it proxies the request.
The specific reason I'd like this is the way I plan to implement this logic in Envoy: a special priority-routing configuration with Casper as the most preferred priority, failing over gradually to the real service.
Example:
If X-Smartstack-Source is present:
P0 az-local endpoints (habitat)
P1 region-local endpoints (region)
P2 the rest (ecosystem)
If X-Smartstack-Source is not present
P0 casper endpoints
P1 az-local endpoints (habitat)
P2 region-local endpoints (region)
P3 the rest (ecosystem)
There might be other ways to implement this, but this seems the most straightforward for now. It also gives us the advantage of failing over from Spectre gradually, based on health status, rather than all at once.
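The two ladders above can be sketched as a simple header check; the labels are illustrative shorthand for the priority tiers, not real Envoy configuration:

```python
def routing_priorities(headers):
    # If the request already carries X-Smartstack-Source, it has been through
    # Casper once, so Casper is skipped to avoid loops.
    if "X-Smartstack-Source" in headers:
        return ["az-local (habitat)", "region-local (region)", "rest (ecosystem)"]
    return ["casper", "az-local (habitat)", "region-local (region)", "rest (ecosystem)"]

print(routing_priorities({})[0])                                      # casper
print(routing_priorities({"X-Smartstack-Source": "casper.main"})[0])  # az-local (habitat)
```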
Similar to what https://github.com/openzipkin/zipkin/ does, we can have Casper create the Cassandra schema when it starts. This would simplify the logic used in itests and acceptance tests.
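The usual trick is to make startup idempotent with `IF NOT EXISTS` statements. A sketch, with hypothetical keyspace and table names (Casper's real schema differs):

```python
SCHEMA_STATEMENTS = [
    # IF NOT EXISTS makes re-running these on every startup safe.
    """CREATE KEYSPACE IF NOT EXISTS spectre
       WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}""",
    """CREATE TABLE IF NOT EXISTS spectre.cache_store (
           cache_key text PRIMARY KEY,
           body blob,
           headers text
       )""",
]

def create_schema(execute):
    # Run each statement through the driver's execute function at startup.
    for stmt in SCHEMA_STATEMENTS:
        execute(stmt)

ran = []
create_schema(ran.append)  # stand-in for a real session.execute
print(len(ran))  # 2
```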
Here's the output of make itest on my MacBook:
=> make itest
....
bin/docker-compose-1.19.0 -f itest/docker-compose.yml up -d spectre backend cassandra Creating itest_syslog_1 ... done
Creating itest_cassandra_1 ... done
Creating itest_backend_1 ... done
Creating itest_cassandra_1 ...
Creating itest_spectre_1 ... done
bin/docker-compose-1.19.0 -f itest/docker-compose.yml exec -T cassandra /opt/setup.sh
ERROR: No container found for cassandra_1
make: *** [run-itest] Error 1
Looking into the Cassandra image, it seems the problem is Cassandra refusing to start:
=> bin/docker-compose-1.19.0 -f itest/docker-compose.yml up cassandra
Starting itest_cassandra_1 ... done
Attaching to itest_cassandra_1
cassandra_1 | Cassandra 2.0 and later require Java 7u25 or later.
itest_cassandra_1 exited with code 1
...and that's probably because Java is broken inside the Docker image:
=> bin/docker-compose-1.19.0 -f itest/docker-compose.yml images
Container Repository Tag Image Id Size
-------------------------------------------------------------------------
itest_backend_1 itest_backend latest 35ad69bdd1ab 173 MB
itest_cassandra_1 itest_cassandra latest 909b2b4d803d 577 MB
itest_spectre_1 spectre-dev-abrousse latest f11d204004e9 484 MB
itest_syslog_1 itest_syslog latest 566da6e9c7a1 154 MB
=> docker run -ti 909b2b4d803d bash
dckruser@2c86d8724510:/$ java -version
/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar: invalid LOC header (bad signature)
Error occurred during initialization of VM
java.lang.NoClassDefFoundError: java/lang/ref/Reference$1
at java.lang.ref.Reference.<clinit>(Reference.java:235)
Currently we rely on configuration to know which parts of an HTTP request are part of the cache key (code link). Instead of a static configuration option per service, it'd be neat to make this more granular and dynamic, driven by the server through a Vary HTTP header in the response: Vary: Accept-Encoding, Cookie. See these docs for more info.
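A minimal sketch of a Vary-driven cache key: the key covers the URL plus only those request headers the server listed in its Vary response header (the function name and key format are assumptions for illustration):

```python
import hashlib

def vary_cache_key(url, request_headers, vary_header):
    # Include only the headers named in Vary; unlisted headers don't fragment the cache.
    parts = [url]
    for name in [h.strip().lower() for h in vary_header.split(",") if h.strip()]:
        parts.append("%s=%s" % (name, request_headers.get(name, "")))
    return hashlib.sha256("|".join(parts).encode()).hexdigest()

k1 = vary_cache_key("/v1/biz", {"cookie": "a=1"}, "Accept-Encoding, Cookie")
k2 = vary_cache_key("/v1/biz", {"cookie": "a=2"}, "Accept-Encoding, Cookie")
print(k1 != k2)  # True: different Cookie values get separate cache entries
```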
By default nginx drops any header that contains underscores: http://nginx.org/en/docs/http/ngx_http_core_module.html#underscores_in_headers
When an instance fails its healthcheck, paasta sends a SIGTERM, waits a bit, and then kills the process if it hasn't stopped yet.
Right now we don't catch SIGTERM, so the process dies as soon as it receives the signal. All in-flight requests are lost, and clients see this as a 503.
This idea stemmed from a discussion with @mattiskan. If a high QPS service is proxied through Casper, the service naturally gets provisioned less and less over time. However, if Casper is down, we're in a tight spot: the traffic gets forwarded to the underlying service (because of the "fail-safe" philosophy baked in proxied_through), the underlying service is under-provisioned and may error/time out, causing user-facing problems until either (a) Casper is brought back up or (b) sufficient capacity is added to the underlying service.
To avoid these situations, let's add a new per-namespace configuration option to let a fixed portion of hit traffic through, something like hit_passthrough: 0.65 (name TBD).
In case of a cache hit, we'd still forward the request in Casper's post-request callback (with 65% likelihood). This would not only ensure we keep some capacity in proxied services, but could also be a useful tool to gauge whether a particular service would collapse if Casper went down (currently the only way to find out is to shut down Casper for real, which is not ideal!).
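The sampling itself is a one-liner; a sketch of the decision, assuming the option name hit_passthrough from above (the function name is hypothetical):

```python
import random

def should_passthrough(hit_passthrough, rng=random.random):
    # On a cache hit, decide whether to still forward the request upstream.
    return rng() < hit_passthrough

# With hit_passthrough: 0.65, roughly 65% of hits are forwarded.
rng = random.Random(0)
sample = sum(should_passthrough(0.65, rng.random) for _ in range(10000))
print(0.6 < sample / 10000 < 0.7)  # True
```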
I'm trying to get the build to pass for PR #49, but every time I relaunch I get a different failure. PR #47 seems to be running into the same issues. So far, the affected tests are:
TestPostMethod.test_post_cache_hit_even_if_body_doesnt_match_without_vary
TestPostMethod.test_post_cached_with_id_can_be_purged
TestPostMethod.test_post_always_cached_for_extended_json_content_type
I've never gotten a failure while running make itest locally.
As far as I can tell, lua-cassandra doesn't automatically detect changes in the ring topology; you have to manually call cluster:refresh() for it to pick up any change.
This is a problem when we're replacing nodes in the C* cluster, since the driver will keep trying to connect to the old nodes and ignore any new hosts.
We should just call refresh() every N seconds, ideally after the response has been returned.
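A sketch of the rate-limited refresh, written as a wrapper so it can run in a post-response hook; the class and method names are illustrative, with only cluster:refresh() taken from lua-cassandra:

```python
import time

class RefreshingCluster:
    # Call the underlying cluster's refresh() at most once per interval.
    def __init__(self, cluster, interval_seconds=60):
        self.cluster = cluster
        self.interval = interval_seconds
        self.last_refresh = 0.0

    def maybe_refresh(self, now=None):
        now = time.time() if now is None else now
        if now - self.last_refresh >= self.interval:
            self.cluster.refresh()
            self.last_refresh = now
            return True
        return False

class FakeCluster:
    def __init__(self):
        self.refreshes = 0
    def refresh(self):
        self.refreshes += 1

c = FakeCluster()
r = RefreshingCluster(c, interval_seconds=60)
print(r.maybe_refresh(now=100), r.maybe_refresh(now=120), r.maybe_refresh(now=161))
# True False True: only calls past the 60s interval trigger a real refresh
```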
At the moment we have ID extraction support in URLs (through bulk endpoint support and enable_id_extraction), but surrogate keys would help invalidate groups of resources across caches. See these docs on how Fastly uses them.
Another big difference from our current support is that surrogate keys are driven by a header returned by the server. Keys can be arbitrary, representing experiment cohorts or deploy versions (things that aren't in the request or response object). For example:
200 OK
Surrogate-Key: elite musician myexperiment-enabled
Content-Type: text/json
{"name": "Bob", "last_name": "Dylan", "num_reviews": 42}
Surrogate key support enables invalidating, for instance, all "musician" or all "elite" resources at once.
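Under the hood this needs a reverse index from surrogate key to cache keys. A minimal in-memory sketch (the class and method names are assumptions; a real implementation would persist this next to the cache):

```python
from collections import defaultdict

class SurrogateIndex:
    # Map each surrogate key from a Surrogate-Key response header to the set of
    # cache keys it covers, so one purge call can invalidate the whole group.
    def __init__(self):
        self.by_surrogate = defaultdict(set)

    def record(self, cache_key, surrogate_header):
        for key in surrogate_header.split():
            self.by_surrogate[key].add(cache_key)

    def purge(self, surrogate_key):
        # Return (and forget) the cache keys to delete for one surrogate key.
        return self.by_surrogate.pop(surrogate_key, set())

idx = SurrogateIndex()
idx.record("/biz/bob", "elite musician myexperiment-enabled")
idx.record("/biz/ann", "musician")
print(sorted(idx.purge("musician")))  # ['/biz/ann', '/biz/bob']
```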
It would be useful to add support for caching POST requests.
Since some of the response ids might be part of the request body, we should be able to attach the request body (or just the keys needed) to the response when building the cache key.
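One simple way to fold the body into the key is to hash it; a sketch under the assumption that the full body (rather than selected fields) goes into the key, with a hypothetical key format:

```python
import hashlib
import json

def post_cache_key(url, body_bytes):
    # Include a digest of the POST body in the cache key, since the ids that
    # select the response may live in the body rather than the URL.
    digest = hashlib.sha256(body_bytes).hexdigest()
    return "%s|%s" % (url, digest)

# Serialize with sort_keys so semantically equal bodies hash identically.
body = json.dumps({"ids": [1, 2, 3]}, sort_keys=True).encode()
print(post_cache_key("/v1/biz", body))
```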