
cacherpc's Introduction

Solana JSON-RPC caching server

Disclaimer: This project is an early stage Work-In-Progress and is not ready for production use.

This cache server aims to offload the RPC service of a Solana validator and to improve the overall speed and stability of the RPC. It does so by caching some of the heaviest and most frequent requests and keeping the cached data up to date through the PubSub API.

The server is a single binary designed to be deployed in front of the validator as its public RPC entry point.

Running the server

To build and run the server you need the Cargo package manager, which ships with the Rust compiler. Both can be obtained by following the official "Installing Rust" guide.

# build
cargo build --release
# run
./target/release/rpccache

Configuration

The server supports a number of configuration options, which are the following:

  • -r, --rpc-api-url — validator or cluster JSON-RPC HTTP endpoint.
  • -w, --websocket-url — validator or cluster PubSub endpoint.
  • -l, --listen — cache server bind address.
  • -a, --account-request-limit — sets a maximum number of concurrent getAccountInfo requests the cache is allowed to send to the cluster/validator.
  • -p, --program-request-limit — sets a maximum number of concurrent getProgramAccounts requests the cacher is allowed to send to the cluster/validator.
  • -A, --account-request-queue-size — sets a maximum number of getAccountInfo requests that are allowed to wait for the permit to send the request to validator.
  • -P, --program-request-queue-size — sets a maximum number of getProgramAccounts requests that are allowed to wait for the permit to send the request to validator.
  • -b, --body-cache-size — sets the maximum amount of cached responses.
  • -c, --websocket-connections — sets the number of websocket connections to validator
  • -t, --time-to-live — duration of time for which values will be kept in cache
  • -d, --slot-distance — sets the maximum slot distance for health check purposes
  • --log-file - file, which should be used for the output of generated logs
  • --config - limits related configuration file in TOML format
  • --ignore-base58-limit — flag whether to ignore base58 overflowing size limit
  • --log-format — the format, in which to output the logs: plain | json
  • --request-timeout - time after which passthrough requests are aborted and the client is notified of the timeout; the default is 60 seconds. Timeouts for getAccountInfo and getProgramAccounts requests are configured separately via the configuration file.
  • --rules — path to firewall rules written in lua
  • --identity — optional identity key for cacherpc service, should be base58 encoded public key
  • --control-socket-path — path to socket file, e.g. /run/cacherpc.sock
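
The relationship between the request limits (-a, -p) and the queue sizes (-A, -P) listed above can be sketched as follows. This is a minimal synchronous model and not the server's actual code: the `RequestLimiter` type, its methods, and the `Rejected` outcome are hypothetical names, and how the real server treats a request that finds the queue full is an assumption.

```rust
/// Outcome of asking the limiter for permission to contact the validator.
#[derive(Debug, PartialEq, Eq)]
pub enum Admission {
    /// A permit was free; the request may be sent immediately.
    Granted,
    /// All permits are taken; the request waits in the queue.
    Queued,
    /// Permits and queue are both full (assumed behaviour: turn the request away).
    Rejected,
}

/// Hypothetical bookkeeping for one request type, e.g. getAccountInfo.
#[derive(Debug)]
pub struct RequestLimiter {
    max_in_flight: usize, // corresponds to e.g. --account-request-limit
    max_queued: usize,    // corresponds to e.g. --account-request-queue-size
    in_flight: usize,
    queued: usize,
}

impl RequestLimiter {
    pub fn new(max_in_flight: usize, max_queued: usize) -> Self {
        Self { max_in_flight, max_queued, in_flight: 0, queued: 0 }
    }

    /// Decide what happens to a newly arrived request.
    pub fn admit(&mut self) -> Admission {
        if self.in_flight < self.max_in_flight {
            self.in_flight += 1;
            Admission::Granted
        } else if self.queued < self.max_queued {
            self.queued += 1;
            Admission::Queued
        } else {
            Admission::Rejected
        }
    }

    /// Called when an in-flight request finishes. Returns `true` if the
    /// freed permit was handed directly to a waiting request.
    pub fn release(&mut self) -> bool {
        if self.queued > 0 {
            self.queued -= 1; // the permit passes straight to a waiter
            true
        } else {
            self.in_flight = self.in_flight.saturating_sub(1);
            false
        }
    }
}
```

With `RequestLimiter::new(1, 1)`, the first request is granted, the second is queued, and a third is rejected until one of the earlier requests completes.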

Configuration file

Some configuration parameters can be loaded from a TOML file, which can be re-read at runtime to apply new values without restarting the server. Example configuration:

[rpc.request_limits]
account_info = 10 # concurrent getAccountInfo requests to validator
program_accounts = 50 # concurrent getProgramAccounts requests to validator

[rpc.request_queue_size]
account_info = 10 # number of getAccountInfo requests that can wait in queue before making request to validator
program_accounts = 10 # number of getProgramAccounts requests that can wait in queue before making request to validator

[rpc]
ignore_base58_limit = true

[rpc.timeouts]
account_info_request = 30 # timeout in seconds, before getAccountInfo is aborted
program_accounts_request = 60 # timeout in seconds, before getProgramAccounts is aborted
account_info_backoff = 30 # time duration during which getAccountInfo will be repeatedly retried, in case of failure
program_accounts_backoff = 60 # time duration during which getProgramAccounts will be repeatedly retried, in case of failure

Commands

A running instance of the caching server supports several commands, which can be sent to it via a Unix domain socket:

  • cache-rpc config-reload - reload the limits-related configuration from file (the server must have been started with the --config <path> option)
  • cache-rpc waf-reload - reload WAF rules from the lua file (the server must have been started with the --rules <path> option)
  • cache-rpc subscriptions off - prevent the caching server from initiating new subscriptions after fetching data via RPC requests
  • cache-rpc subscriptions on - allow the caching server to initiate new subscriptions after fetching data via RPC requests (default)
  • cache-rpc subscriptions status - print whether new subscriptions are currently allowed (on or off)
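
The command set above is small enough to model as an enum. The sketch below parses the words that follow `cache-rpc` on the command line. The `Command` and `SubscriptionsAction` types and the `parse_command` function are hypothetical, and the format actually used on the socket is not described here, so this only illustrates the shape of the command set.

```rust
/// Hypothetical model of the commands listed above.
#[derive(Debug, PartialEq, Eq)]
pub enum Command {
    ConfigReload,
    WafReload,
    Subscriptions(SubscriptionsAction),
}

#[derive(Debug, PartialEq, Eq)]
pub enum SubscriptionsAction {
    On,
    Off,
    Status,
}

/// Parse the words that follow `cache-rpc`, e.g. "subscriptions off".
/// Returns `None` for unknown, incomplete, or over-long input.
pub fn parse_command(input: &str) -> Option<Command> {
    let mut words = input.split_whitespace();
    let command = match (words.next()?, words.next()) {
        ("config-reload", None) => Command::ConfigReload,
        ("waf-reload", None) => Command::WafReload,
        ("subscriptions", Some("on")) => Command::Subscriptions(SubscriptionsAction::On),
        ("subscriptions", Some("off")) => Command::Subscriptions(SubscriptionsAction::Off),
        ("subscriptions", Some("status")) => Command::Subscriptions(SubscriptionsAction::Status),
        _ => return None,
    };
    // Anything after a complete command is treated as an error.
    if words.next().is_some() {
        return None;
    }
    Some(command)
}
```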

Metrics

The caching server exposes various metrics in a Prometheus-compatible format. They can be retrieved via the /metrics HTTP endpoint.

Features

Implemented methods

In the current version caching is implemented for these methods:

Requests to other methods are passed through to the validator.

Unlikely to be implemented

Features that are unlikely to be implemented:

  • Root and Single commitments.


cacherpc's People

Contributors

00nktk, bmuddha, polachok

cacherpc's Issues

Log updated config

  1. When config is reloaded, log the new config.
  2. When config is applied, log the new values (limits are already being logged).

Track subscriptions by key

The problem:

We use the subscription_active method to check whether the value for a particular key can be retrieved from the cache. It is currently implemented by checking an AtomicBool flag on the websocket connection to which the key is routed. The flag is updated in update_status by comparing the number of active subscriptions to the number of desired subscriptions. As a result, if a single subscription confirmation is delayed, all keys routed to that worker are considered inactive and cannot be served from the cache, which is not desirable.

Proposed solution:

Track subscription status by key.
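
One way the proposal could look is sketched below: a per-key state map replaces the single per-connection flag, so a delayed confirmation only affects its own key. The `SubscriptionTracker` type and its methods are hypothetical, keys are plain strings for brevity, and the real code would also need to be safe to share between threads.

```rust
use std::collections::HashMap;

/// State of a single subscription, tracked per key instead of per connection.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SubscriptionState {
    /// Subscribe request sent, confirmation not yet received.
    Pending,
    /// Confirmation received for this key.
    Active,
}

/// Hypothetical per-connection tracker keyed by account or program key.
#[derive(Debug, Default)]
pub struct SubscriptionTracker {
    states: HashMap<String, SubscriptionState>,
}

impl SubscriptionTracker {
    /// Record that a subscribe request was sent for `key`.
    pub fn requested(&mut self, key: &str) {
        self.states.insert(key.to_owned(), SubscriptionState::Pending);
    }

    /// Record that the subscription for `key` was confirmed.
    pub fn confirmed(&mut self, key: &str) {
        if let Some(state) = self.states.get_mut(key) {
            *state = SubscriptionState::Active;
        }
    }

    /// Forget `key`, e.g. after unsubscribing.
    pub fn removed(&mut self, key: &str) {
        self.states.remove(key);
    }

    /// Forget everything, e.g. after the websocket connection drops.
    pub fn reset(&mut self) {
        self.states.clear();
    }

    /// Only this key's own state decides whether it may be served from cache.
    pub fn subscription_active(&self, key: &str) -> bool {
        self.states.get(key) == Some(&SubscriptionState::Active)
    }
}
```

Here a pending confirmation for key "b" no longer makes a confirmed key "a" unavailable, which is the failure mode described above.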

Detect situations when ws subscriptions are active but no updates are received

Cacher relies on cache entries having corresponding active ws subscriptions to decide whether a cache entry might be stale. Sometimes, however, a subscription exists but no updates are received from it, which may indicate that the cache entry has become stale.

We need a way to detect such situations and to stop serving the affected requests from the cache when they occur.

Content type with charset returns 415

We're seeing 415 errors when the content-type header specifies a charset. This happens on requests where the content-type reads:

application/json;charset=….
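
A minimal sketch of a check that tolerates parameters such as a charset: compare only the media type before the first `;`, case-insensitively. The function name `is_json_content_type` is hypothetical, and this is not necessarily how the server validates the header.

```rust
/// Returns true if the Content-Type header value names the JSON media type,
/// ignoring any parameters after the first ';' (for example a charset).
pub fn is_json_content_type(header_value: &str) -> bool {
    // `split` always yields at least one item, so the fallback is never used.
    let media_type = header_value.split(';').next().unwrap_or("").trim();
    // Media types are case-insensitive.
    media_type.eq_ignore_ascii_case("application/json")
}
```

With this check, a value with a charset parameter is accepted in the same way as a bare `application/json`, while other media types are still rejected.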

`filter::tests::tree_matches_overall` can randomly fail

Currently filter::tests::tree_matches_overall can randomly panic with 'Uniform::new called with `low >= high`' message, which probably means that proptest generates x..x strategy at some point. My guess is that this is related to arb_non_matching_filters strategy.

This is not a critical issue since the panic is not caused by the test logic, but is annoying. For the time being, broken CI can be fixed with a restart.

Cannot request Stake11111 (Gateway timeout)

When requesting getProgramAccounts(Stake11111...) without filters via the cache I get a gateway timeout error, however it works without the cache.

In logs I can see the following error:

Sep 15 12:39:24.950  WARN cache_rpc::rpc: request: Request { jsonrpc: "2.0", id: Num(1), method: "getProgramAccounts", params: Some((Pubkey([6, 161, 216, 23, 145, 55, 84, 42, 152, 52, 55, 189, 254, 42, 122, 178, 85, 127, 83, 92, 138, 120, 114, 43, 104, 164, 157, 192, 0, 0, 0, 0]), ProgramAccountsConfig { encoding: Base64, commitment: None, data_slice: None, filters: None, with_context: Some(true) })) } error: Overflow
Sep 15 12:39:24.950  INFO cache_rpc::rpc: reporting gateway timeout req.id=Num(1)

This is the RPC request:

curl api.mainnet-beta.solana.com -X POST -H "Content-Type: application/json" -d '
  {"jsonrpc":"2.0","id":1, "method":"getProgramAccounts", "params":["Stake11111111111111111111111111111111111111", { "encoding": "base64" }]}
'

Unix socket based, command listener implementation

To make dynamic settings of the application easier to manage, we propose implementing a Unix socket listener that accepts external commands and changes the settings accordingly. It would be a more flexible alternative to signal handling.

Batch requests support implementation in WAF

Currently cacher doesn't preprocess batch requests and simply forwards them to the validator cluster.
WAF rule evaluation needs to support batch requests, so that the rules apply to all requests regardless of their type and composition.

Slots & commitments

  1. We currently store and report the latest slot seen in either a validator response or a pubsub notification (per commitment level). As a result, the slot in our responses lags behind the validator when the cached accounts do not change.

     We should find out how to get the latest slot for a commitment level, preferably via pubsub.

  2. If we have account A cached with slot = N, commitment = Processed, and later receive an update for account B with slot = N+1, commitment = Finalized, it seems that slot N is also Finalized. That in turn means we can return account A's data for a request that requires Finalized commitment.
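
The reasoning in the second point can be sketched as follows. The `SlotTracker` and `CachedEntry` types are hypothetical, and the sketch simply takes the issue's assumption at face value: a cached slot at or below the highest finalized slot is treated as finalized. It does not check whether the cached slot was on a fork that was later abandoned, which a real implementation would have to consider.

```rust
/// Commitment levels, ordered from weakest to strongest.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Commitment {
    Processed,
    Confirmed,
    Finalized,
}

/// Hypothetical cache entry metadata: the slot and commitment it was stored with.
#[derive(Debug, Clone, Copy)]
pub struct CachedEntry {
    pub slot: u64,
    pub commitment: Commitment,
}

/// Highest slot observed so far at the Confirmed and Finalized levels.
#[derive(Debug, Default)]
pub struct SlotTracker {
    confirmed: Option<u64>,
    finalized: Option<u64>,
}

/// Raise the stored slot to `slot` if it is higher than what was seen before.
fn raise(current: &mut Option<u64>, slot: u64) {
    *current = Some(current.map_or(slot, |seen| seen.max(slot)));
}

impl SlotTracker {
    /// Record a slot seen in a validator response or pubsub notification.
    pub fn observe(&mut self, slot: u64, commitment: Commitment) {
        match commitment {
            // A finalized slot has necessarily passed the confirmed level too.
            Commitment::Finalized => {
                raise(&mut self.finalized, slot);
                raise(&mut self.confirmed, slot);
            }
            Commitment::Confirmed => raise(&mut self.confirmed, slot),
            Commitment::Processed => {}
        }
    }

    /// Strongest commitment the entry can be assumed to have reached.
    pub fn effective_commitment(&self, entry: &CachedEntry) -> Commitment {
        let reached = |level: Option<u64>| level.map_or(false, |seen| entry.slot <= seen);
        let inferred = if reached(self.finalized) {
            Commitment::Finalized
        } else if reached(self.confirmed) {
            Commitment::Confirmed
        } else {
            Commitment::Processed
        };
        inferred.max(entry.commitment)
    }

    /// Can the entry answer a request with the given commitment requirement?
    pub fn can_serve(&self, entry: &CachedEntry, requested: Commitment) -> bool {
        self.effective_commitment(entry) >= requested
    }
}
```

In the scenario from the issue, account A cached at slot N with Processed commitment becomes servable for Finalized requests once any update with Finalized commitment at slot N+1 has been observed.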

Detect infinite reconnections on websocket

Sometimes worker threads that manage websocket connections enter an infinite loop while trying to establish a working websocket connection to the validator. We need to figure out why this happens and implement detection and correction for such cases.

Make SLOT_DISTANCE configurable

SLOT_DISTANCE threshold is currently hardcoded at 150 (same as default for validator). Make it configurable via command-line option.
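
A minimal sketch of a health check with a configurable threshold is shown below. Which two slots the server actually compares is not stated here, so `reference_slot` and `local_slot` are hypothetical names for "the most recent slot known" and "the slot the cache has caught up to".

```rust
/// Default threshold mentioned above (same as the validator's default).
pub const DEFAULT_SLOT_DISTANCE: u64 = 150;

/// Healthy if the local slot trails the reference slot by at most `max_distance`.
/// Being ahead of the reference slot is treated as healthy.
pub fn is_healthy(reference_slot: u64, local_slot: u64, max_distance: u64) -> bool {
    reference_slot.saturating_sub(local_slot) <= max_distance
}
```

Passing the threshold as a parameter, rather than reading a constant, is what lets a command-line option override the default.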

MailboxError panics

thread 'actix-rt:worker:3' panicked at 'actor is dead: MailboxError(Mailbox has closed)

Caching getProgramAccounts with filters

It seems like filters are kinda popular, so we should find a way to cache requests with filters.
One possible solution would be to make an additional request without filter and cache the result (using a queue and a background worker, probably).
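
If the unfiltered result is cached, filtered requests could be answered by applying the filters locally. The sketch below assumes the two filter kinds defined by the Solana JSON-RPC API, dataSize and memcmp, operating on raw account data. The `Filter` type and `apply_filters` function are hypothetical, and decoding of the encoded `bytes` value is omitted.

```rust
/// Filter kinds accepted by getProgramAccounts, with `bytes` already decoded.
#[derive(Debug, Clone)]
pub enum Filter {
    /// Account data must be exactly this many bytes long.
    DataSize(usize),
    /// Account data must contain `bytes` starting at `offset`.
    Memcmp { offset: usize, bytes: Vec<u8> },
}

impl Filter {
    /// Check a single account's data against this filter.
    pub fn matches(&self, data: &[u8]) -> bool {
        match self {
            Filter::DataSize(size) => data.len() == *size,
            Filter::Memcmp { offset, bytes } => data
                .get(*offset..) // `None` when the offset is past the end of the data
                .map_or(false, |tail| tail.starts_with(bytes)),
        }
    }
}

/// Keep only the cached accounts whose data satisfies every filter.
pub fn apply_filters<'a>(accounts: &'a [Vec<u8>], filters: &[Filter]) -> Vec<&'a Vec<u8>> {
    accounts
        .iter()
        .filter(|data| filters.iter().all(|f| f.matches(data.as_slice())))
        .collect()
}
```

An empty filter list returns every cached account, so the same code path can serve both filtered and unfiltered requests.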

Any path component should be accepted

To follow the behaviour of the Solana validator and be as transparent as possible, the cache should accept any path component of the URL, not just "/". I have seen some applications use this in the wild, and it is supported by the validator.

There are also special URL paths in the validator that get translated into non-RPC requests, like "/health".

Currently cacherpc returns 404 on any path apart from "/". We work around it in our proxy by overwriting the path component of every request we pass to zubr cache.
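
The requested behaviour can be sketched as a small routing function. Only "/health" is named above as a special path, so the sketch handles just that one; the `Route` type and `route` function are hypothetical, and any other special paths the validator supports are not covered.

```rust
/// Where an incoming HTTP request should be dispatched.
#[derive(Debug, PartialEq, Eq)]
pub enum Route {
    /// Special non-RPC path handled separately.
    Health,
    /// Everything else is treated as a JSON-RPC request, whatever the path.
    Rpc,
}

/// Classify a request by its URL path instead of rejecting unknown paths with 404.
pub fn route(path: &str) -> Route {
    match path {
        "/health" => Route::Health,
        _ => Route::Rpc,
    }
}
```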
