
emodb's Introduction


EmoDB

Store your feelings here.

Written by Bazaarvoice: see the credits page for more details.

Introduction

EmoDB is a RESTful HTTP server for storing JSON objects and for watching for changes to those objects.

It is designed to span multiple data centers, using eventual consistency (AP) and multi-master conflict resolution. It relies on Apache Cassandra for persistence and cross-data center replication.

Documentation

System of Record API

System of Record Deltas

Databus API

BlobStore API

Queue API

Stash API

Operations

EmoDB SDK

API Keys

User Access Control

Security

Java Client Support

Quick Start

Quick Start Guide

emodb's People

Contributors

abhijeet-bazaarvoice, agburov, anandujayan, ashwini-sheshadri, bdevore17, billkalter, dependabot[bot], fahdsiddiqui, iuliia-titchenko, jsurls, mariiachekmasova, mike-unitskyi, mukeshsbbv, r48patel, reddyanand-bv, snyk-bot, srini-shanmugam, ssuprun, sujithvaddi, temujin9, vermaravi-bv, vvcephei


emodb's Issues

Inappropriate use of Ticker.systemTicker()

What is the Issue?

There are numerous places throughout EmoDB that use a Ticker as a time substitute to enable better unit testing. In all of these cases the production code substitutes in Ticker.systemTicker() for the actual instance. Most of these use the ticker for relative time-spans, such as for expiring caches. However, there are two instances where tickers are being used to return the current time. This is problematic because the system ticker does not guarantee that the epoch is used as the fixed time reference.

  1. https://github.com/bazaarvoice/emodb/blob/master/databus/src/main/java/com/bazaarvoice/emodb/databus/core/DefaultDatabus.java#L551

    The ticker is being incorrectly used to determine whether a databus event that originated in a remote data center but whose delta is not yet present in the local data center (that is, not yet replicated by Cassandra) is stale. Fortunately this one is relatively harmless: the system ticker's value is exceptionally low compared to epoch time, so all missing deltas calculate as recent. The effect is that if a delta never gets replicated the databus will re-poll the event indefinitely until it naturally expires.

  2. https://github.com/bazaarvoice/emodb/blob/master/web/src/main/java/com/bazaarvoice/emodb/web/scanner/scheduling/ScanUploadSchedulingService.java#L295

    The ticker is being used to get the current time for the purpose of scheduling Stash execution and naming the destination directory. This is causing the Stash to run at the incorrect time and use a date far in the past, such as 1/1/1970 on the initial execution.

Using a Ticker is good for unit testing, but in the instances where the returned value must correspond to the current wall-clock time it should be replaced with an abstraction that actually guarantees this.
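For the wall-clock cases, here is a minimal sketch of the kind of substitution being suggested, using java.time.Clock. The class and method names below are illustrative, not the actual EmoDB types.

import java.time.Clock;
import java.time.Instant;

// Hypothetical example: a component that needs the current wall-clock time.
// Injecting a Clock (Clock.systemUTC() in production, Clock.fixed(...) in tests)
// preserves testability while guaranteeing values are anchored to the epoch,
// unlike Ticker.systemTicker(), which only promises monotonic relative readings.
public class StashScheduleCalculator {
    private final Clock clock;

    public StashScheduleCalculator(Clock clock) {
        this.clock = clock;
    }

    public Instant nextScanTime() {
        // Epoch-based current time, so schedules and destination directory
        // names derived from it will not fall back to 1970.
        return clock.instant().plusSeconds(3600);
    }
}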

How to Test and Verify

  1. Check out the project
  2. Run EmoDB
  3. Configure a Stash server with a scheduled scan and start Stash. The startup logs will show the first scan scheduled for 1970.

Risk

High for Stash since destination directory naming will not reflect actual Stash times.

Level

Low. Although the impact is high the actual solution is straightforward.

Issue Checklist

  • Make sure to label the issue.
  • Well documented description of use-cases and bugs.

Apparent contradiction between Emo's never-null doc get api and conditional application

What is the Issue?

There is an apparent contradiction between Emo's never-null doc get api and conditional application.

How to Test and Verify

  1. Start Emo and run through this scenario:

  2. I start out by deleting the doc so we know it doesn't exist...

$ http DELETE :8080/sor/1/review:testcustomer/testdoc000 audit=="comment:'blah'" X-BV-API-Key:local_admin Content-Type:application/x.json-delta
HTTP/1.1 200 OK
Content-Type: application/json
Date: Wed, 08 Feb 2017 15:07:46 GMT
Transfer-Encoding: chunked

{
    "success": true
}
  3. I get the doc so you can see the state of it. Notice that there is no "nofield" attribute.
$ http GET :8080/sor/1/review:testcustomer/testdoc000 X-BV-API-Key:local_admin
HTTP/1.1 200 OK
Content-Type: application/json
Date: Wed, 08 Feb 2017 15:07:50 GMT
Transfer-Encoding: chunked

{
    "client": "TestCustomer", 
    "type": "review", 
    "~deleted": true, 
    "~firstUpdateAt": "2017-02-08T15:06:10.121Z", 
    "~id": "testdoc000", 
    "~lastUpdateAt": "2017-02-08T15:07:46.685Z", 
    "~signature": "233b2c5fcc29921f9728a317d5398b07", 
    "~table": "review:testcustomer", 
    "~version": 2
}
  4. Since the "nofield" attribute is missing, I'd expect '{.., "nofield": ~}' to evaluate to true...
$ http POST :8080/sor/1/review:testcustomer/testdoc000 audit=="comment:'blah'" X-BV-API-Key:local_admin Content-Type:application/x.json-delta <<< 'if {..,"nofield":~} then {"author":"Fred","title":"Best Ever!","rating":5} end'
HTTP/1.1 200 OK
Content-Type: application/json
Date: Wed, 08 Feb 2017 15:07:59 GMT
Transfer-Encoding: chunked

{
    "success": true
}
  5. But it didn't!!!
$ http GET :8080/sor/1/review:testcustomer/testdoc000 X-BV-API-Key:local_admin
HTTP/1.1 200 OK
Content-Type: application/json
Date: Wed, 08 Feb 2017 15:08:04 GMT
Transfer-Encoding: chunked

{
    "client": "TestCustomer", 
    "type": "review", 
    "~deleted": true, 
    "~firstUpdateAt": "2017-02-08T15:06:10.121Z", 
    "~id": "testdoc000", 
    "~lastUpdateAt": "2017-02-08T15:07:59.769Z", 
    "~signature": "618e69f4f5592b4b9a39df057d3dbf51", 
    "~table": "review:testcustomer", 
    "~version": 3
}
  6. But if I add a condition that the document itself may be missing...
$ http POST :8080/sor/1/review:testcustomer/testdoc000 audit=="comment:'blah'" X-BV-API-Key:local_admin Content-Type:application/x.json-delta <<< 'if or(~, {..,"nofield":~}) then {"author":"Fred","title":"Best Ever!","rating":5} end'
HTTP/1.1 200 OK
Content-Type: application/json
Date: Wed, 08 Feb 2017 15:08:31 GMT
Transfer-Encoding: chunked

{
    "success": true
}
  7. Then it works
$ http GET :8080/sor/1/review:testcustomer/testdoc000 X-BV-API-Key:local_admin
HTTP/1.1 200 OK
Content-Type: application/json
Date: Wed, 08 Feb 2017 15:08:33 GMT
Transfer-Encoding: chunked

{
    "author": "Fred", 
    "client": "TestCustomer", 
    "rating": 5, 
    "title": "Best Ever!", 
    "type": "review", 
    "~deleted": false, 
    "~firstUpdateAt": "2017-02-08T15:06:10.121Z", 
    "~id": "testdoc000", 
    "~lastUpdateAt": "2017-02-08T15:08:31.106Z", 
    "~signature": "582736be3895778a4cc2e1286ba6ffb6", 
    "~table": "review:testcustomer", 
    "~version": 4
}
  8. Now that the document exists, my original conditional works as expected
$ http POST :8080/sor/1/review:testcustomer/testdoc000 audit=="comment:'blah'" X-BV-API-Key:local_admin Content-Type:application/x.json-delta <<< 'if {..,"nofield":~} then {"author":"Kenny","title":"Best Ever!","rating":5} end'
HTTP/1.1 200 OK
Content-Type: application/json
Date: Wed, 08 Feb 2017 15:14:14 GMT
Transfer-Encoding: chunked

{
    "success": true
}
  9. See?
$ http GET :8080/sor/1/review:testcustomer/testdoc000 X-BV-API-Key:local_admin
HTTP/1.1 200 OK
Content-Type: application/json
Date: Wed, 08 Feb 2017 15:14:16 GMT
Transfer-Encoding: chunked

{
    "author": "Kenny", 
    "client": "TestCustomer", 
    "rating": 5, 
    "title": "Best Ever!", 
    "type": "review", 
    "~deleted": false, 
    "~firstUpdateAt": "2017-02-08T15:06:10.121Z", 
    "~id": "testdoc000", 
    "~lastUpdateAt": "2017-02-08T15:14:14.226Z", 
    "~signature": "01c5de13fdf7fe5dc27ca6852226ee68", 
    "~table": "review:testcustomer", 
    "~version": 5
}

Risk

Level

Low: I can just always use the redundant conditional or(~, {..,"nofield":~}). New users will find this behavior confusing, though... 'cause it is ;)

Issue Checklist

  • Make sure to label the issue.

  • Well documented description of use-cases and bugs.

Consolidate system table placement configurations

What is the Issue?

Currently EmoDB explicitly has the system table placement configured in three places:

systemOfRecord:
  systemTablePlacement: app_global:sys

blobStore:
  systemTablePlacement: app_global:sys

auth:
  tablePlacement: app_global:sys

Additionally, Stash implicitly piggy-backs on the systemOfRecord.systemTablePlacement configuration when creating the following table:

scanner:
  scanStatusTable: "__system_scan_upload"

Is there really an advantage to allowing each of these to be configured independently? As a counter-argument, allowing these to be configured independently increases the opportunities for the administrator to set one or more of these incorrectly. Additionally, storing system data in multiple placements increases the possibility that some system table's placement is not configured with sufficient security to safeguard EmoDB internals.

We should at least consider unifying all of these into a single top-level attribute that would then become inject-able into all of the dependent modules (e.g., DataStoreModule):

systemTablePlacement: app_global:sys
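As a rough illustration of how a single top-level attribute could be made inject-able into the dependent modules (assuming a Guice module along the lines of EmoDB's existing ones; the module and binding names here are hypothetical):

import com.google.inject.AbstractModule;
import com.google.inject.name.Names;

// Hypothetical sketch: bind the one top-level placement value once and let the
// SoR, BlobStore, auth, and scanner modules all inject the same constant,
// instead of each module reading its own *TablePlacement setting.
public class SystemPlacementModule extends AbstractModule {
    private final String systemTablePlacement;  // e.g. "app_global:sys"

    public SystemPlacementModule(String systemTablePlacement) {
        this.systemTablePlacement = systemTablePlacement;
    }

    @Override
    protected void configure() {
        bindConstant()
                .annotatedWith(Names.named("systemTablePlacement"))
                .to(systemTablePlacement);
    }
}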

How to Test and Verify

Not really testable, just the way EmoDB is configured.

Risk

This is a low-risk change since all of the current configurations are typically set to the same value anyway. This change only affects any EmoDB installs which pathologically chose to create multiple distinct system placements for each module where configurable.

Level

Low

Issue Checklist

  • Make sure to label the issue.
  • Well documented description of use-cases and bugs.

Inactivate API keys instead of deleting them

What is the Issue?

Currently when an API key is deleted it is fully removed from the system via a record deletion. There are several reasons why this is not desirable:

  • Any historical record concerning the API key is lost, such as who owned it and what roles it had.
  • Although extremely unlikely, it is possible that a new API key is created which hashes to the same value as a deleted key. Were this to happen the deleted API key could then be used and would authenticate as the new API key. Keeping deleted API keys on record prevents inadvertent reactivation through hash collisions.
  • There are proposed features for EmoDB which involve object ownership by an API key. There is already an element of this with owned databus subscriptions. Although not currently an issue, deleting API keys could leave harmful dangling references in the future.

The proposed solution is to instead track a state on each API key, with the following possible values (a minimal sketch follows the list):

  • active
    • Normal state where a key is in use
  • inactive
    • State which indicates the key has been "deleted" and should no longer authenticate
  • migrated
    • State for a key which was compromised and migrated to a new key using the API key administration tool. This is functionally similar to inactive but provides more context for the reason. It also allows for multiple records with the same internal ID while allowing the system to distinguish which one can authenticate. In other words, if there are n API key entries with the same internal ID then n-1 of them will be in the migrated state.
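A minimal sketch of the proposed state model; the enum name and helper method are assumptions, not a final design:

// Hypothetical sketch of the proposed API key lifecycle states.
public enum ApiKeyState {
    ACTIVE,     // normal state; the key authenticates
    INACTIVE,   // "deleted" key; kept on record but never authenticates again
    MIGRATED;   // compromised key replaced by a new one sharing its internal ID

    /** Only active keys should pass authentication in the realm. */
    public boolean canAuthenticate() {
        return this == ACTIVE;
    }
}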

How to Test and Verify

There are no errors caused by this issue. The only way to verify it is to query the authorization tables directly and confirm that a key's record is fully removed after the API key administration task to delete the key has been called.

Risk

Risk is fairly low. So long as the existing API key realm only authenticates or authorizes keys in the active state then the rest of the system should be unaffected.

Level

Medium. While the changes are localized there are some nuances in ensuring authentication behaves as expected, especially concerning key migration.

Issue Checklist

  • Make sure to label the issue.
  • Well documented description of use-cases and bugs.

Add ability to group roles by namespace

What is the Issue?

Emo currently stores all roles in a flat namespace. This has worked well under the assumption that a single user, the administrator, is responsible for managing and assigning roles. However, the current move is toward a delegated administration system, where the administrator can create trusted API keys which themselves can create roles with limited permissions and assign those to API keys (see #63). To support this each API key must have a safe sandbox for creating and managing roles; it would be dangerous to have a flat all-or-nothing system where any user with permission to update roles could update any role in the system.

The issue being documented here is a prerequisite to the permissions aspect. There should be the ability to group related roles by a common namespace. With this in place it would be possible to grant an API key a permission such as "manage roles in namespace X" or "assign roles in namespace X". This way the API key would have a safe sandbox for role administration without permission to manage or assign roles outside of that sandbox.
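To make the namespace idea concrete, here is a hedged sketch of what a namespaced role identifier could look like; the class and field names are illustrative only:

import java.util.Objects;

// Hypothetical sketch: a role is identified by (namespace, name), so "writer"
// in namespace "project_x" is distinct from "writer" in namespace "project_y",
// and permissions like "manage roles in namespace X" can be scoped cleanly.
public final class RoleIdentifier {
    private final String namespace;
    private final String name;

    public RoleIdentifier(String namespace, String name) {
        this.namespace = Objects.requireNonNull(namespace);
        this.name = Objects.requireNonNull(name);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof RoleIdentifier)) {
            return false;
        }
        RoleIdentifier other = (RoleIdentifier) o;
        return namespace.equals(other.namespace) && name.equals(other.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(namespace, name);
    }

    @Override
    public String toString() {
        return namespace + "/" + name;
    }
}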

Risk

By itself, the risk of adding the ability to group roles by namespace is low. The riskiest aspects of this change are:

  1. Ensuring backwards compatibility with existing role permissions and/or an upgrade/migration procedure which can be performed with no downtime.
  2. Ensuring that roles with the same name in different namespaces do not collide.

Level

Medium

Issue Checklist

  • Make sure to label the issue.

  • Well documented description of use-cases and bugs.

Add convenience script to start Emo locally

What is the Issue?

  • Add a convenience script to start EmoDB locally
  • Clean up documentation

How to Test and Verify

  1. Unzip the binary from yum/target
  2. Run ./start-local.sh
  3. Verify everything works OK

Risk

Level

Low

Issue Checklist

  • Make sure to label the issue.
  • Well documented description of use-cases and bugs.

Simultaneous Multi-DC compactions can result in data loss

What is the Issue?

Currently, there is a rare bug that can present itself when a heavily updated row is being compacted simultaneously in two data centers. Consider the following scenario:

Sequence 1:
  DC-1: D1, D2, FCT, D3, D4, C1(D1,D2)
  DC-2: D1, D2, FCT, D3, D4
  Remarks: DC-1 compacts, but C1 isn't replicated to DC-2.

Sequence 2:
  DC-1: D1, D2, D3, D4, C1(D1,D2), FCT
  DC-2: D1, D2, D3, D4, C1(D1,D2), FCT
  Remarks: Compaction C1 and D3 are behind FCT in both data centers.

Sequence 3:
  DC-1: D3, D4, FCT, C2(C1,D3)
  DC-2: D3, D4, FCT, C3(D3,D4)
  Remarks: D1, D2, and C1 get deleted while creating a new compaction, C2. This is where the problem is: the deletes get replicated to DC-2, and before C2 can make it to DC-2, DC-2 creates another compaction C3 based on only D3 and D4. Thus C3 becomes the last effective compaction, and it does not include the deltas owned by C2.

Sequence 4:
  DC-1: D3, D4, FCT, C2(C1,D3), C3(D3,D4)
  DC-2: D3, FCT, C3(D3,D4)
  Remarks: DC-1 replicates the corrupted compaction C3 and uses it as the best compaction.

To resolve the above, the delete of the base compaction should be deferred until the owner compaction is behind FCT. In the above scenario, C1 should not be deleted in Step 3.

How to Test and Verify

  1. Check out the project
  2. Verify that the unit test MultiDCCompactionTest provided in the PR for this issue fails without the fix
  3. Apply the fix from the PR; the test should now pass

Risk

Medium. Although there is nothing destructive about this PR, it does deal with compaction, which is core functionality. Running it in QA and stress testing it is recommended.

Level

Medium

Issue Checklist

  • Make sure to label the issue.
  • Well documented description of use-cases and bugs.

CQL driver cannot be configured on Stash

What is the Issue?

The cql-toggle task can be used to configure whether to use the Astyanax or Cql driver on a particular EmoDB server. There are several problems with this approach with regards to Stash:

  • The cql-toggle task is not available when Emo is running in Stash mode.
  • Even if it were, the setting is not sticky. Since Stash works by spinning servers up and down based on demand, any new servers that spin up for the current or future Stash runs would revert to default behavior.

How to Test and Verify

  1. Check out the project
  2. Run Emo using start.sh, then Stash using start-stash-role.sh
  3. There is no way to configure which driver to use in Stash.

Risk

Medium. Once the Cql driver has been fully proven for Stash this is a moot point. However, until that time if a critical issue is found in the Cql implementation of scan queries used by Stash there is no way to change back without creating and deploying a new release.

Level

Medium. There is already a settings module in place which is persistent. It should be possible to utilize that to control use of the Cql driver.

Issue Checklist

  • Make sure to label the issue.
  • Well documented description of use-cases and bugs.

Provide an additional timestamp intrinsic for tracking record content mutation

What is the Issue?

There exist intrinsics for ~firstUpdateAt and ~lastUpdateAt. While the latter is useful, it does not provide sufficient context for some use cases, because ~lastUpdateAt is always set to the timestamp of the most recent delta, even if that delta did not mutate the content in any way. Consider the use case of an index which wants to return all records that have changed since a given date. The last-update time can return false positives in the cases where the last update was non-mutative.

Here are three possible solutions:

  1. Add a third timestamp intrinsic, ~lastMutateAt, which contains the timestamp of the most recent delta which actually mutated the content.
  2. Same as option 1 and additionally remove ~lastUpdateAt
  3. Change the meaning of ~lastUpdateAt to be the timestamp of the last mutative delta

Just as there are use cases for the last mutation time, there are also use cases for the last update time, so I don't advocate the second or third option. The new ~lastMutateAt intrinsic would follow the same rules as ~lastUpdateAt in terms of availability, such as being absent for deleted records.

For clarity, this is distinct from the existing ~signature intrinsic. Checking the signature only informs the client if two versions of a record contain the same delta history. Updating a record with a non-mutative delta results in a modified signature, making it insufficient for this use case.
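To illustrate the indexing use case, here is a hedged sketch of how a client might use the proposed intrinsic as a change watermark. It assumes option 1 is implemented and that the value is an ISO-8601 timestamp like the other timestamp intrinsics; the class name is hypothetical.

import java.time.Instant;
import java.util.Map;

// Hypothetical sketch: re-index a document only if its content actually changed
// since the last indexing pass, using the proposed ~lastMutateAt intrinsic
// rather than ~lastUpdateAt (which also advances on non-mutative deltas).
public class IncrementalIndexer {
    public boolean needsReindex(Map<String, Object> doc, Instant lastIndexedAt) {
        Object lastMutateAt = doc.get("~lastMutateAt");
        if (lastMutateAt == null) {
            // For example, a deleted record, following the same availability
            // rules as ~lastUpdateAt.
            return false;
        }
        return Instant.parse(lastMutateAt.toString()).isAfter(lastIndexedAt);
    }
}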

How to Test and Verify

  1. Check out the project

  2. Create a table, such as "mutation:demo"

  3. Repeat this sequence

    $ curl -s -XPUT -H "Content-Type: application/json" "http://localhost:8080/sor/1/mutation:demo/demo1?audit=comment:'initial+submission'" --data-binary '{"value": true}'
    {"success":true}
    $ curl localhost:8080/sor/1/mutation:demo/demo1 | jq .
    {
    "~lastUpdateAt": "2016-09-12T18:15:43.899Z",
    "~firstUpdateAt": "2016-09-12T18:15:43.899Z",
    "value": true,
    "~id": "demo1",
    "~table": "mutation:demo",
    "~version": 1,
    "~signature": "aff0bc9ee63bdb23b19bb92801636158",
    "~deleted": false
    }
    $ curl -s -H "Content-Type: application/x.json-delta" "http://localhost:8080/sor/1/mutation:demo/demo1?audit=comment:'unchanged'" --data-binary '{..,"value":true}'
    {"success":true}
    $ curl localhost:8080/sor/1/mutation:demo/demo1 | jq .
    {
    "~lastUpdateAt": "2016-09-12T18:16:38.225Z",
    "~firstUpdateAt": "2016-09-12T18:15:43.899Z",
    "value": true,
    "~id": "demo1",
    "~table": "mutation:demo",
    "~version": 2,
    "~signature": "9d3d1af5e17275b714025343c8c648f5",
    "~deleted": false
    }
    

    Note the ~lastUpdateAt, ~signature, and ~version changed even though the resolved record is unchanged. This is as expected, but it demonstrates that the metadata for the last time the record content changed, "2016-09-12T18:15:43.899Z", is lost.

Risk

Level

Medium. The resolution code already tracks the last mutation time, so the change is mostly just exposing the value as an intrinsic. However, any change in the delta resolution path carries at least a medium amount of risk.

Issue Checklist

  • Make sure to label the issue.
  • Well documented description of use-cases and bugs.

Change condition for ignoreSuppressedEvents to be configurable

What is the Issue?

Currently the databus supports an "ignoreSuppressedEvents" attribute which, when true (the default), automatically adds an "and not tags contains 're-etl'" condition to each databus subscription. This allows a set of databus events to be marked as ignorable by most subscriptions, preventing those subscriptions from being flooded with bulk updates which may not be of interest to them.

The issue is that the condition is hard-coded and therefore cannot be adapted as necessary. An actual deployment may want to use a different value or to set the condition dynamically based on current activity, such as a one-time bulk update by a client which uses its own event tagging scheme.

The proposed solution is to keep the functionality but make the "suppressed event" condition configurable. Options include:

  • Adding a new config.yaml property which contains the suppressed event condition.
  • Creating a new suppressed event table and tasks for dynamically updating them.

The former solution is less work but requires an ops update and re-deploy to change.
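For the config.yaml option, here is a minimal sketch of the sort of property that could carry the condition; the class and property names are assumptions, not the actual EmoDB configuration:

import com.fasterxml.jackson.annotation.JsonProperty;

// Hypothetical sketch of a configuration field holding the "suppressed event"
// condition instead of the hard-coded re-etl tag check.
public class DatabusConfig {
    // Condition applied to subscriptions that leave ignoreSuppressedEvents=true.
    // If absent, the server could fall back to the current hard-coded condition.
    @JsonProperty("suppressedEventCondition")
    private String suppressedEventCondition;

    public String getSuppressedEventCondition() {
        return suppressedEventCondition;
    }
}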

How to Test and Verify

Create a databus subscription with "ignoreSuppressedEvents" set to true. The only events that will be suppressed are those tagged as "re-etl"; there is no way to change that condition.

Risk

Low. The functionality to suppress events is already in place; this only changes the source for the suppressed event condition.

Level

Medium

Issue Checklist

  • Make sure to label the issue.
  • Well documented description of use-cases and bugs.

CachingSubscriptionDAO is too inefficient

What is the Issue?

For the databus, CachingSubscriptionDAO works by caching all subscriptions in a single map of subscription name to subscription object. Every time a subscription is inserted, updated/renewed, or deleted this invalidates the entire cache and causes it to be reloaded.

Before owned subscriptions were introduced this behavior was acceptable. However, with the introduction of owned subscriptions every API call for a subscription – peek, poll, ack, etc. – first requires reading the subscription from the lower layers to verify the owner. In our production system this leads to problems because 1) according to stats 60+ subscriptions per minute is the norm, 2) each subscription request invalidates the entire cache, and 3) there are over 800 subscriptions. This means many databus API calls must synchronously reload all subscriptions before returning a response. This can be seen in our dashboards with regular subscribe, poll, and ack request latencies of over 500ms, sometimes greater than 1 second.

CachingSubscriptionDAO should be reworked to be more efficient for retrieving a single subscription. At the very least, an invalidation of one subscription should not cause a full reload of all subscriptions from source.
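A hedged sketch of one possible direction, using a Guava cache keyed by subscription name so that invalidating one entry does not force a full reload. The class and DAO names are simplified stand-ins, and listing all subscriptions for fanout would still need its own, separately tuned path.

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

import java.util.concurrent.TimeUnit;

// Hypothetical sketch: cache individual subscriptions instead of one map of
// all subscriptions, so a single subscribe/renew/ack only touches one entry.
public class PerSubscriptionCache {
    private final LoadingCache<String, Subscription> cache;

    public PerSubscriptionCache(final SubscriptionDAO delegate) {
        this.cache = CacheBuilder.newBuilder()
                .expireAfterWrite(10, TimeUnit.MINUTES)
                .build(new CacheLoader<String, Subscription>() {
                    @Override
                    public Subscription load(String name) {
                        return delegate.getSubscription(name);
                    }
                });
    }

    public Subscription get(String name) {
        return cache.getUnchecked(name);
    }

    public void invalidate(String name) {
        // Invalidate only the affected subscription, not the whole cache.
        cache.invalidate(name);
    }

    // Stand-ins so the sketch is self-contained.
    interface Subscription {}

    interface SubscriptionDAO {
        Subscription getSubscription(String name);
    }
}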

How to Test and Verify

  1. Check out the project
  2. Create hundreds of complex databus subscriptions.
  3. Verify that as more subscriptions are added the time to serve each request increases.

Risk

Medium. The long response times are already a risk since many clients will time out as a result. The greatest risks are:

  • The new solution negatively impacts the time to list all subscriptions, since this is used frequently in the databus fanout.
  • A bug in the new solution causes cache invalidation events to be missed, which would affect proper databus fanout.

Level

Medium

Issue Checklist

  • Make sure to label the issue.

  • Well documented description of use-cases and bugs.

Refactor SoR to use job framework

What is the Issue?

Purging a table is a long process and should be asynchronous, using the same job framework used for databus replay.

How to Test and Verify

Run a timed purge command.

Risk

Low; purge is generally only run by administrators and is infrequent at that.

Level

Low.

Issue Checklist

  • Make sure to label the issue.
  • Well documented description of use-cases and bugs.

CQL driver returning errors, Emo becomes unusable

What is the Issue?

In one of the Bazaarvoice environments, "anon", we started seeing the following errors in the logs:

ERROR [2016-11-03 15:25:17,004] com.bazaarvoice.emodb.web.scanner.rangescan.LocalRangeScanUploader: Scanning placement failed for task id=2, app_global:default: ScanRange[8000000000000000000000006bccac6e-955555555555555555555555c12201c3]
! com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /10.100.47.52:9042 (com.datastax.driver.core.exceptions.ConnectionException: [/10.100.47.52] Error while setting keyspace), /10.100.45.100:9042 (com.datastax.driver.core.exceptions.ConnectionException: [/10.100.45.100] Error while setting keyspace), /10.100.36.5:9042 (com.datastax.driver.core.exceptions.DriverException: Timeout while trying to acquire available connection (you may want to increase the driver number of per-host connections)), /10.100.36.111:9042, /10.100.46.26:9042, /10.100.43.43:9042, /10.100.38.91:9042, /10.100.36.88:9042, /10.100.42.136:9042, /10.100.42.249:9042 [only showing errors of first 3 hosts, use getErrors() for more details])
! at com.datastax.driver.core.RequestHandler.reportNoMoreHosts(RequestHandler.java:207) ~[emodb-web-5.4.15.jar:5.4.16]
! at com.datastax.driver.core.RequestHandler.access$1000(RequestHandler.java:43) ~[emodb-web-5.4.15.jar:5.4.16]
! at com.datastax.driver.core.RequestHandler$SpeculativeExecution.sendRequest(RequestHandler.java:273) ~[emodb-web-5.4.15.jar:5.4.16]
! at com.datastax.driver.core.RequestHandler$SpeculativeExecution$1.run(RequestHandler.java:396) ~[emodb-web-5.4.15.jar:5.4.16]
! ... 3 common frames omitted
! Causing: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /10.100.47.52:9042 (com.datastax.driver.core.exceptions.ConnectionException: [/10.100.47.52] Error while setting keyspace), /10.100.45.100:9042 (com.datastax.driver.core.exceptions.ConnectionException: [/10.100.45.100] Error while setting keyspace), /10.100.36.5:9042 (com.datastax.driver.core.exceptions.DriverException: Timeout while trying to acquire available connection (you may want to increase the driver number of per-host connections)), /10.100.36.111:9042, /10.100.46.26:9042, /10.100.43.43:9042, /10.100.38.91:9042, /10.100.36.88:9042, /10.100.42.136:9042, /10.100.42.249:9042 [only showing errors of first 3 hosts, use getErrors() for more details])
! at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:84) ~[emodb-web-5.4.15.jar:5.4.16]
! at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:37) ~[emodb-web-5.4.15.jar:5.4.16]
! at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37) ~[emodb-web-5.4.15.jar:5.4.16]
! at com.datastax.driver.core.ArrayBackedResultSet$MultiPage.prepareNextRow(ArrayBackedResultSet.java:312) ~[emodb-web-5.4.15.jar:5.4.16]
! at com.datastax.driver.core.ArrayBackedResultSet$MultiPage.one(ArrayBackedResultSet.java:275) ~[emodb-web-5.4.15.jar:5.4.16]
! at com.bazaarvoice.emodb.sor.db.cql.RowGroupResultSetIterator.nextRow(RowGroupResultSetIterator.java:123) ~[emodb-web-5.4.15.jar:5.4.16]
! at com.bazaarvoice.emodb.sor.db.cql.RowGroupResultSetIterator.access$300(RowGroupResultSetIterator.java:15) ~[emodb-web-5.4.15.jar:5.4.16]
! at com.bazaarvoice.emodb.sor.db.cql.RowGroupResultSetIterator$RowGroupImpl.computeNext(RowGroupResultSetIterator.java:97) ~[emodb-web-5.4.15.jar:5.4.16]
! at com.bazaarvoice.emodb.sor.db.cql.RowGroupResultSetIterator$RowGroupImpl.computeNext(RowGroupResultSetIterator.java:86) ~[emodb-web-5.4.15.jar:5.4.16]

Once this happens ALL Datastax driver connections return errors, including those on clusters unrelated to the one that caused this issue. Increasing the maximum number of connections did not have any effect. Root cause is under investigation.

How to Test and Verify

Under investigation.

Risk

High

Level

Unknown until root cause is determined.

Issue Checklist

  • Make sure to label the issue.

  • Well documented description of use-cases and bugs.

Update credits for webpage

What is the Issue?

  • Project credits were not included on the webpage.

How to Test and Verify

  • Check that link 'Project Credit' redirects to the github pages with credit information.

Risk

Level

Low

Issue Checklist

  • Make sure to label the issue.
  • Well documented description of use-cases and bugs.

Update Cassandra health check to include CQL driver verification

What is the Issue?

The EmoDB Cassandra health check only verifies the Astyanax connection. It is possible that the CQL driver could lose connectivity independently of Astyanax. The health check should be updated to verify both connections.

How to Test and Verify

This is difficult to verify since the circumstances where it happens are unusual, but a review of the Cassandra health check verifies this behavior.

Risk

Low due to its unlikeliness. However, we have seen issues where loss of Cassandra seed servers affects the Astyanax and CQL drivers differently, so a comprehensive health check would be beneficial.

Level

Low

Issue Checklist

  • Make sure to label the issue.
  • Well documented description of use-cases and bugs.

Hidden (aka Private) attributes

What is the Issue?

I'd like to consider the notion of "private" attributes. These are defined as attributes not visible by default via the SoR API, databus, or Stash. API keys with write permission would be able to assert, via a query parameter, that they wish to see private fields.

Private attributes have three main use cases that I'm aware of right now:

  1. Writers may need to create attributes for use in conditional deltas in order to implement MVCC or other writer coordination protocols. These fields probably don't have any value to readers and would only serve to pollute reader attentional space and open the possibility of harmful coupling.
  2. Table owners may wish to write certain processing state information into the document for permanent audit-like use cases. Like the coordination attributes, these attributes will pollute the interface and probably result in harmful coupling.
  3. Owners are currently free to add new fields to their documents in the faith that new attributes should not break existing readers. However, deprecating and removing fields has no clear path. Being able to toggle attributes off and back on again provides a valuable mechanism in the context of a larger deprecation protocol.

How to Test and Verify

  1. Check out the project; run Emo
  2. Write some documents with hidden attributes.
  3. Verify that you cannot see the hidden fields unless you specify showHiddenFields=true
  4. Verify you cannot specify showHiddenFields=true unless you have update permission to the table
  5. Verify you cannot see hidden fields in databus events or in stash
  6. Write some conditional deltas, conditioned on the values of hidden attributes.
  7. Verify Emo correctly computes the conditions (hiddenness has no impact on conditional logic).

Risk

Level

Medium

We are introducing a new concept to Emo. This is something we should do very, very carefully. Try to erase my use case from your mind and think about what a new user will make of this feature. Does the documentation make the intended usage obvious? Is the feature sufficiently intuitive?

Issue Checklist

  • Make sure to label the issue.

  • Well documented description of use-cases and bugs.

Requests with special characters cause 400 errors in EmoDB when checking adhoc-throttles

What is the Issue?

To reproduce:

Perform a request such as:

curl -s -v "localhost:8080/sor/1/table/%99"
*   Trying ::1...
* Connected to localhost (::1) port 8080 (#0)
> GET /sor/1/table/%99 HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.43.0
> Accept: */*
> 
< HTTP/1.1 400 Bad Request
< Date: Mon, 11 Jul 2016 18:17:56 GMT
< X-BV-Exception: java.lang.IllegalArgumentException
< Content-Type: text/plain
< Transfer-Encoding: chunked
< 
* Connection #0 to host localhost left intact
Invalid path string "/adhoc-throttles/GET_sor~1~table~�" caused by invalid charater @33

The issue is that the adhoc throttle is checking ZooKeeper to determine if that URL has any throttle applied. The character 0x99 is not a valid character for a ZK path, so the following exception is thrown:

! java.lang.IllegalArgumentException: Invalid path string "/adhoc-throttles/GET_sor~1~table~�" caused by invalid charater @33
! at org.apache.zookeeper.common.PathUtils.validatePath(PathUtils.java:99) ~[emodb-web-4.32-SNAPSHOT.jar:4.32-SNAPSHOT]
! at com.bazaarvoice.emodb.common.zookeeper.store.ZkMapStore.toPath(ZkMapStore.java:92) ~[emodb-web-4.32-SNAPSHOT.jar:4.32-SNAPSHOT]
! at com.bazaarvoice.emodb.common.zookeeper.store.ZkMapStore.get(ZkMapStore.java:123) ~[emodb-web-4.32-SNAPSHOT.jar:4.32-SNAPSHOT]
! at com.bazaarvoice.emodb.web.throttling.AdHocThrottleManager.getThrottle(AdHocThrottleManager.java:67) ~[emodb-web-4.32-SNAPSHOT.jar:4.32-SNAPSHOT]
! at com.bazaarvoice.emodb.web.throttling.AdHocConcurrentRequestRegulatorSupplier.forRequest(AdHocConcurrentRequestRegulatorSupplier.java:43) [emodb-web-4.32-SNAPSHOT.jar:4.32-SNAPSHOT]
! at com.bazaarvoice.emodb.web.throttling.AdHocConcurrentRequestRegulatorSupplier.forRequest(AdHocConcurrentRequestRegulatorSupplier.java:34) [emodb-web-4.32-SNAPSHOT.jar:4.32-SNAPSHOT]
! at com.bazaarvoice.emodb.web.throttling.ConcurrentRequestsThrottlingFilter.filter(ConcurrentRequestsThrottlingFilter.java:28) [emodb-web-4.32-SNAPSHOT.jar:4.32-SNAPSHOT]

Recommended solution:

The ZK path for adhoc throttles already has a substitution filter to replace "/" with "~". That filter should be expanded to replace invalid ZK path characters with hex representations.
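A minimal sketch of the kind of escaping the expanded filter could apply. It conservatively escapes everything outside printable ASCII rather than tracking ZooKeeper's exact rules, and is illustrative rather than the actual EmoDB filter:

// Hypothetical sketch: keep the existing "/" to "~" substitution and hex-escape
// any other character that could be rejected by ZooKeeper's path validation,
// so a key containing 0x99 becomes "...%99" instead of throwing.
public final class ZkPathEscaper {
    public static String escape(String raw) {
        StringBuilder sb = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == '/') {
                sb.append('~');                              // existing substitution
            } else if (c == '%' || c < 0x20 || c > 0x7e) {
                sb.append(String.format("%%%02x", (int) c)); // hex-escape everything else
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}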

How to Test and Verify

  1. Follow the example above to verify the issue still exists.
  2. Once the fix is in place the above call should result in a normal response, such as a success response with a record where "~deleted" is true if the above record does not exist.

Risk

Medium. In our use case keys are overwhelmingly in the standard ASCII range, but there may be other use cases where the full unicode character set is more widely used for record keys, especially if the keys are computed from natural sources.

Level

Low. The portion of code required for this update is centralized and does not directly impact any of the more complex core Emo functionality.

Issue Checklist

  • Make sure to label the issue.
  • Well documented description of use-cases and bugs.

describe_splits_ex frequently returns bad splits

What is the Issue?

The following issue impacts both Stash and the getSplits() API call.

Emo uses the Astyanax call describeSplitsEx() to get splits for token ranges. Frequently this returns reasonable values, but sometimes it returns splits which are demonstrably incorrect. Typically the returned split is equivalent to the requested token range and the returned CfSplit.getRowCount() always returns 128. This negatively impacts Stash because Stash is serialized on these excessively large splits until it detects the issue and re-splits. This also negatively affects table splits because a returned split may contain far more records than the caller requested.

Note that there is no equivalent native CQL call and according to Cassandra tickets there is no similar support until C* 2.1.5. (see CASSANDRA-7688).

How to Test and Verify

Note: To reproduce you must be using a Cassandra ring with at least 4 nodes. It'll become clear why later.

  1. Create a table with many many rows, like 100,000+
  2. Make a splits request on that table
  3. Repeat step 2 numerous times. Sometimes splits such as "bd99f278f287fc25f:d-e" are returned, which is effectively no split.

Risk

Although the system works correctly in the event of over-sized splits it noticeably slows Stash down. We've also seen some evidence that extremely large result sets can lead to corruption on the web-client, although this is still under investigation.

Level

Medium. Regardless of the effort to resolve this the changes would be localized to a small part of the system. That said, that small part directly impacts splits and Stash, so a regression would have significant impact.

Issue Checklist

  • Make sure to label the issue.
  • Well documented description of use-cases and bugs.

Change role creation so caller cannot create roles with more permission than himself

What is the Issue?

This issue is in the larger context of creating a system where Emo can have delegated API key management. Today Emo is best administered by a single cabal of administrators who create roles and API keys at the client's request. As Emo expands there will be a need to delegate role and API key management. That is, if the admin has a trusted client who should be able to create API keys and give a discrete set of permissions to those keys there should be a safely delegated way to do so. For example:

  • Admin grants Project X's administrator, xadmin, full permissions on all tables in the project_x:app placement: sor|*|if(intrinsic("~placement":"project_x:app"))
  • xadmin wants to create roles and API keys further restricting tables by project. For example, he may want to create a role and API key for project "runway" with permission sor|*|if(and(intrinsic("~placement":"project_x:app"),{..,"project":"runway"}))

For this to work admin needs to be sure that xadmin cannot create roles outside of the scope of his authority. For example, xadmin should not be able to create a role with permissions such as sor|*|if(intrinsic("~placement":"restricted:app")) because xadmin himself does not have that permission.

Note that this is not a complete solution in itself, as it still leaves open many holes such as restricting which roles a delegated API key can update and which API keys the delegate can modify, but it is a crucial piece of the overall puzzle for distributed API key management.
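A hedged sketch of the core containment check, using Apache Shiro's Permission interface for illustration. As the Level section below notes, proving containment for arbitrary condition-based permissions is considerably harder than this simple implies() loop.

import java.util.Collection;

import org.apache.shiro.authz.Permission;

// Hypothetical sketch: before creating or updating a role, verify that every
// permission being granted is implied by at least one permission the caller
// already holds, so xadmin cannot mint roles broader than his own grant.
public final class GrantChecker {
    public static boolean callerCanGrant(Collection<Permission> callerPermissions,
                                         Collection<Permission> requestedPermissions) {
        for (Permission requested : requestedPermissions) {
            boolean covered = false;
            for (Permission held : callerPermissions) {
                if (held.implies(requested)) {
                    covered = true;
                    break;
                }
            }
            if (!covered) {
                return false;  // requesting something outside the caller's scope
            }
        }
        return true;
    }
}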

How to Test and Verify

  1. Check out the project
  2. Create a role with limited permissions, such as sor|read|* and including system|manage_api_keys and system|manage_roles and assign it to a new API key.
  3. Using the API key create a role with permission sor|*

With the current Emo this role creation is allowed. With the update in place the role creation should fail with a message that the user does not have permission to grant sor|*.

Risk

The risk is mitigated so long as all API keys and roles are maintained by admins only. However, the point of this issue is undoing that restriction. For that reason, the risk is low if operations remain at status quo, but high as soon as delegation is put into practice.

Level

High. The work required to ensure that one unbounded permission, such as sor|if(in("read","update"))|if(intrinsic("~table":not(like("*:sys")))), is wholly contained within a user's other permissions is a non-trivial exercise.

Issue Checklist

  • Make sure to label the issue.

  • Well documented description of use-cases and bugs.

Databus resource forwarding not working

What is the Issue?

Requests to the databus are partitioned by subscription. In the application cluster a single server is responsible for each subscription, as determined by a hash function and Ostrich partitioning. Any requests to Emo using Ostrich use the same hash function, and so databus requests are routed to the correct server. If a request comes in to the wrong server, such as through a load balancer, Emo is supposed to forward that request to the correct server for the subscription. However, this functionality has regressed and is currently not working.

It looks like the problem was introduced here:

45b1ae0#diff-2e81d793f3b18dae728f5140e0e538ebR385
45b1ae0#diff-9f186017d63374d1f96db11c087edd46R397
45b1ae0#diff-9960abdfcc869098ddf67001c1443f11R32

The partitioning function was removed from the service pool configuration, and the partition key is one call removed from the service factory, so the partition key (the subscription name) is not provided at the point necessary for proper routing.

How to Test and Verify

  1. Check out the project
  2. Run multiple EmoDB applications connected to the same ZooKeeper and Cassandra ring.
  3. Subscribe to a databus subscription.
  4. Poll the subscription on each server. If it were working correctly, each request would be forwarded to a single server, except those made directly on that server. However, the actual behavior is that calls are forwarded seemingly randomly around the cluster for the same subscription.

Risk

As a result of this defect, pollers using a load balancer will regularly fail to poll their subscription correctly, so the subscription will appear empty even though it still has pending events.

Level

Medium. The fix isn't too complicated; DatabusClientSubjectProxy needs to be replaced with an interface which directly contains subscription names and therefore can be correctly forwarded. However, it changes much of the logic in DatabusResource1 and the corresponding injections.

Issue Checklist

  • Make sure to label the issue.

  • Well documented description of use-cases and bugs.

Ability to run start-local.sh from any directory

What is the Issue?

There is a simple bash recipe for recursively traversing symlinks to get to the real installed directory of a script. Then, you can pushd to it in order to run the script in context. If you do this, I won't have to actually be in my emodb directory to start it.

I'll send a PR.

How to Test and Verify

  1. check out my PR
  2. run start-local.sh from any other directory (it should start Emo)
  3. create a symlink to start-local.sh and run it via the symlink (this should also work)
  4. might as well also run it from the actual repo directory just to make sure that there's no regression

Risk

Level

Low

Issue Checklist

  • Make sure to label the issue.
  • Well documented description of use-cases and bugs.

Create a global snapshot from Stash

What is the Issue?

Currently, Stash is not a "snapshot" of EmoDB. It is a long scanning process that lasts for several hours and produces an inconsistent snapshot. The issue is that there is no way in this dataset to deterministically arrive at some subset of data such that we are assured all records are present as of a given time. Consider the following situation. Let's say we start the Stash scanning process at 7 p.m. It is not enough to say that any record with a "lastUpdateAt" later than 7 p.m. should be discarded from the dataset. If we do that, we will remove records that should be present in the dataset: a lastUpdateAt after 7 p.m. doesn't mean the record didn't exist before 7 p.m., and if it did exist, we no longer have the state in which it existed at 7 p.m.
We need to take a global snapshot such that we record exactly the state of records as they existed at the start time of the scan.

To do this, the easiest way in EmoDB seems to be:

  1. At scan start time, halt any deletion of deltas. Compactions can still take place, but no deltas will be deleted.
  2. When resolving deltas, ignore any delta written after the scan start time. This can be done either at the Resolver layer or at the DAO layer, by only iterating over deltas earlier than the scan start time.

Using the two steps above, we can guarantee that a global snapshot of EmoDB is provided as of the scan start time.
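A hedged sketch of the second step, filtering out deltas written after the scan start time. It assumes delta change IDs are version-1 (time-based) UUIDs; the class and method names are placeholders rather than the actual Resolver/DAO types.

import java.util.Map;
import java.util.UUID;

// Hypothetical sketch: when resolving a record for Stash, ignore any delta
// whose time-based change ID is newer than the scan start time, so the
// resolved content reflects the state of the record at snapshot time.
public final class SnapshotDeltaFilter {

    // Offset between the UUID epoch (1582-10-15) and the Unix epoch,
    // in 100-nanosecond units.
    private static final long UUID_EPOCH_OFFSET_100NS = 0x01b21dd213814000L;

    /** True if the (version 1) change ID was written at or before the scan start. */
    public static boolean writtenAtOrBefore(UUID changeId, long scanStartMillis) {
        long millis = (changeId.timestamp() - UUID_EPOCH_OFFSET_100NS) / 10000L;
        return millis <= scanStartMillis;
    }

    /** Returns true for every delta the resolver should consider for this snapshot. */
    public static boolean include(Map.Entry<UUID, ?> delta, long scanStartMillis) {
        return writtenAtOrBefore(delta.getKey(), scanStartMillis);
    }
}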

How to Test and Verify

Risk

Level

Issue Checklist

  • Make sure to label the issue.

  • Well documented description of use-cases and bugs.

Remove uniqueness constraint from databus subscription name

What is the Issue?

One of the changes introduced in #19 is that each databus subscription is now associated with a single API key owner. With this change the unique identifier for a subscription effectively changed from (name) to (name, API key). However, the system still requires that each subscription name is unique system-wide.

Emo should be changed to permit different API keys to have subscriptions with the same name. The largest barrier to this is that the underlying event channel names are the subscription names. This would have to be changed to include the owner's internal ID as part of the event channel name. There would also have to be a way to grandfather in events from the existing databus event channels.

Recommended changes:

  • Change databus subscription names to be unique within the scope of the owning API key
  • Change databus permissions to be boolean on the ability to create subscriptions, since restricting subscription names would no longer serve a purpose without the possibility for subscription collisions.
  • Migrate databus event channel names to be unique by subscription name and owner internal ID.

How to Test and Verify

  1. Check out the project
  2. Start EmoDB
  3. Create two API keys
  4. Using each key, create a subscription with the same name but different conditions. With the current Emo version the second create will return a 403 exception with the reason "Not subscriber".
  5. Create several tables and documents, some of which match the subscriptions for both keys and some for only one.
  6. Poll both subscriptions and verify each received the correct events.

Risk

This is a fairly high risk change because it fundamentally changes one of the central pieces of the Emo architecture. As previously noted there needs to be a smooth migration from old to new event channel names with no downtime and without losing any preexisting events in the old channels.

Level

High based on the areas touched by this change, even though the effort and scale of implementing the change may be lower.

Issue Checklist

  • Make sure to label the issue.
  • Well documented description of use-cases and bugs.

Update databus polls to honor polling API key's SoR read permissions

What is the Issue?

Databus subscriptions currently do not honor table read permissions. For example, assume subscription "sub1" has condition alwaysTrue() and is polled by API key "apikey1". Assume apikey1 has the necessary permissions for databus access but the only SoR-related permission it has is sor|read|if(intrinsic("~table":"only_accessible_table")). If anyone updates a record in table "inaccessible_table" the update notification and record contents will be available to apikey1 when it polls sub1.

A proper remedy to this requires that each subscription is associated with an API key. This way the fanout process can determine whether the record is accessible not only by the subscription but also by the subscriber before putting the record onto the subscription. Of course, the downside to this is that each subscription must now be polled by only a single user, but realistically this is the only observed use case to-date anyway. Additionally, after much discussion with the team we could neither come up with a use-case nor a satisfactory solution for sharing a databus subscription with multiple API keys with potentially different read permissions.

How to Test and Verify

Follow the example from the issue description: create an API key with limited read permissions, create a databus subscription, and perform updates on records in tables which the API key cannot read. The API key will not be able to read those records directly from the sor module but it will be able to see the records by polling the subscription.

Risk

High, since this impacts multiple systems. Namely,

  1. It requires updating the databus implementation to associate an API key owner with each subscription.
  2. It requires updating the fanout process to check read permission on each record before placing it on a subscription.
  3. The read permission check must be efficient enough not to cause a bottleneck in the fanout process.

Level

High

Issue Checklist

  • Make sure to label the issue.
  • Well documented description of use-cases and bugs.

Memory leak in databus subscription renewal and invalidation

What is the Issue?

Some time ago there was some evidence of a potential memory leak in Emo. Because of the size of Emo and the number of requests and background processes running at any given time, we started looking for correlations between periods of high GC activity and heap usage and the work going on at those times. Through this process we were able to narrow the problem down to the databus, but without deeper profiling we couldn't get any more specific than that.

By good fortune, before we had the chance to start profiling, a client started using the databus in such a way that the memory leak happened much faster than in the past. Most clients create a databus subscription and renew it on a long loop, such as hourly or daily. This client renewed their databus subscription in a loop with only a 100ms delay between subscription calls. Using this new information we focused the investigation on databus subscription renewal. From here we were able to confirm a memory leak.

The issue lies in LocalDataCenterEndPointProvider. Every time a subscription is renewed an invalidation message is sent to all other Emo servers in the local data center so they will invalidate their copy of the subscription and reload it from the backend on the next use. To ensure that no servers miss this important notification, a fresh ZooKeeperHostDiscovery is created on each invalidation and closed after use. Aside from this being somewhat expensive, we also found that the ZooKeeperHostDiscovery instance leaks memory on each use, even after being closed and dereferenced.

To verify we created a test environment with two Emo instances, each with 2.5G heap memory available. On each instance three separate bash scripts were started which subscribe to a unique databus subscription with a simple {..,"type":"review"} condition on an infinite loop with 100ms between calls, for a total of 6 subscriptions being constantly renewed. With this configuration it took less than 7 hours for the instances' heap to be fully utilized. One instance became non-responsive with an out-of-memory error in an admin thread. The other became slow and spent at least 34% of its CPU cycles in garbage collection, never recovering to less than 95% heap utilization. This continued even after all 6 subscriber scripts were stopped.

We then modified Emo such that only a single ZooKeeperHostDiscovery was created and maintained for the life of the LocalDataCenterEndPointProvider. With this configuration there was no evidence of a memory leak. Even after 2 days, GC was able to recover heap to 5% utilization in the same circumstances.
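The shape of that fix, in a hedged sketch; the HostDiscovery interface and constructor wiring below are stand-ins for the real Ostrich/Curator types rather than the actual EmoDB code.

import java.io.Closeable;
import java.io.IOException;

// Hypothetical sketch: create the host-discovery object once for the lifetime
// of the endpoint provider and reuse it for every invalidation broadcast,
// instead of constructing and closing a fresh instance per subscription renewal.
public class LocalEndPointProvider implements Closeable {
    private final HostDiscovery hostDiscovery;  // created once, reused

    public LocalEndPointProvider(HostDiscovery hostDiscovery) {
        this.hostDiscovery = hostDiscovery;
    }

    public void broadcastInvalidation(String subscriptionName) {
        for (String endPoint : hostDiscovery.getHosts()) {
            // send the cache-invalidation message for this subscription to endPoint
        }
    }

    @Override
    public void close() throws IOException {
        hostDiscovery.close();                  // closed exactly once, on shutdown
    }

    // Stand-in so the sketch is self-contained.
    interface HostDiscovery extends Closeable {
        Iterable<String> getHosts();
    }
}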

How to Test and Verify

  1. Check out the project
  2. Run EmoDB
  3. Run this script against Emo. If possible attach a profiler:
#!/bin/bash

NAME=$1
echo $NAME

while true;
do
  curl -s -XPUT "localhost:8080/bus/1/${NAME}?APIKey=XXX&ttl=604800&eventTtl=86400" -d '{..,"type":"review"}' -H "Content-Type: application/x.json-condition" > /dev/null
  curl -s "localhost:8080/bus/1/${NAME}/poll?ttl=60&limit=100&APIKey=XXX" > /dev/null
  sleep 0.1
done
  4. If a profiler was attached, monitor available heap. If not, use the following call at intervals to monitor heap utilization:
curl -s localhost:8081/metrics | jq .gauges | jq '. | {"jvm.memory.heap.used", "jvm.memory.heap.usage", "jvm.memory.heap.max"}'

Over time usable heap will shrink, even after forcing GCs using the /tasks/gc task.

Risk

Level

Medium

Issue Checklist

  • Make sure to label the issue.

  • Well documented description of use-cases and bugs.

Convert splits and scans to use CQL driver

What is the Issue?

Splits and Stash scans are still using Astyanax. They should be converted to use the CQL driver as part of the migration process to CQL.

How to Test and Verify

Not applicable. The end result should be indistinguishable from current behavior, only the underlying query mechanism will have changed.

Risk

High, since the C* driver is at the heart of proper performance of SoR data access.

Level

Medium. Although the impact of this change is high the actual code impact is localized to the data reader DAOs.

Issue Checklist

  • Make sure to label the issue.
  • Well documented description of use-cases and bugs.

Refactor API key and role administration as a public endpoint

What is the Issue?

Currently API keys and roles are administered using DropWizard tasks, ApiKeyAdminTask and RoleAdminTask respectively. It would be more beneficial if API keys and roles were administered using standard REST endpoints, for several reasons:

  • Simplifies the API
  • Simplifies the ability to build a UI on top of API key and role administration
  • Allows delegation of API key and role management without requiring non-administrators to use the less-accessible Task interface

As part of this refactoring the administration API should also be changed to expose each API key's internal ID. Administration API methods to modify the API key (add roles, delete, etc.) should be changed to use the internal ID and not the key itself. There are several reasons for this change:

  • API keys are supposed to be secret. In our experience people requesting new permissions have been overly cavalier in giving administrators their API keys as required by the current tasks, such as in an email or chat room, thereby risking their exposure to third parties.
  • API keys are stored nowhere in the system except as hashes. If an API key is lost there is no good way to delete the key since the API requires the key but no one knows it. The current work-around is to scan the entire API key table to find the match, such as by owner or description. However, even then the API key can't be recovered since it is only a hash, so we are forced to back-door delete the record using the sor interfaces instead of using the task. This then requires restarting all EmoDB instances in all environments to force them to flush their API key caches.

How to Test and Verify

Not really anything to test or verify, other than that the current administration is indeed performed through tasks and requires use of an API key to modify the key in any way.

Risk

This is a low risk change since it is largely just refactoring existing capabilities. Emo already has the ability to restrict REST calls based on permissions, so using that system to protect API key and role administration endpoints from users without the necessary permissions should be straightforward. The largest risk is the switch to using internal IDs for administration. However, this should be safe because:

  • It is not possible for a standard user to query the __auth:keys or __auth:internal_ids tables, so there is no way for them to back-door lookup information about an API key from their internal ID.
  • It is not possible to authenticate using an internal ID for any public endpoint.
  • The internal ID and API key are constructed independently, so knowing one gives no information about the value of the other.

Level

Medium

Issue Checklist

  • Make sure to label the issue.
  • Well documented description of use-cases and bugs.

Databus events for updates immediately after new table creation may be lost

What is the Issue?

While validating data we found several updates which existed in EmoDB but had not triggered an associated databus event. While the missing events were from different times, the common thread was that all of them were from updates within 200ms of when the table was created. We replayed all events into a new subscription and verified the missing events did not appear in the replay. This indicates that the events were never fanned out at all, as opposed to an issue with the subscription itself, such as an unlogged event ack.

This should not be possible, since the create table command invalidates the table caches on all instances in the cluster and won't return until all caches have been successfully invalidated. However, we did identify several use cases where this could happen. If the table is cached as unknown on one server and the table is created on a separate server, the invalidation event may not be delivered if:

  1. The first server had just come up and the set of cluster instances on the second server hadn't yet been updated to include it. In this case there's a brief window where invalidation events wouldn't be sent to the first server.
  2. The client starts writing new documents before the create table call returns success. The order of events is that table metadata gets written first, followed by the cluster-wide table cache invalidation. In this case, if an update is written to the first server the table is cached as unknown; if it is written to another server, the already-persisted metadata is found and the update takes place. Although arguably the client should not do this, it creates an inconsistent state if they do.

If fanout is running on the first server then the table would be unknown at fanout time and the update would therefore be discarded from fanout.

It's also possible that some undiscovered bug in table caching and invalidation exists. However, given that a) there are already two identified use cases which explain the observed behavior and b) the scope of any such bug must be within table caching, the fix is the same: fanout should not discard updates to unknown tables immediately, but should delay their deletion until there is confidence that the table is unknown because it was dropped and not because of a dirty cache.
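A hedged sketch of the proposed fanout behavior; the class name, action enum, and grace-period value are illustrative, not the actual implementation.

import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch: instead of discarding an event whose table is unknown,
// keep it on the fanout queue until it is old enough that the table cannot
// plausibly be missing only because of a stale table cache.
public final class UnknownTableFanoutPolicy {
    private static final Duration UNKNOWN_TABLE_GRACE = Duration.ofMinutes(10);

    public enum Action { DELIVER, RETRY_LATER, DISCARD }

    public Action decide(boolean tableIsKnown, Instant eventTime, Instant now) {
        if (tableIsKnown) {
            return Action.DELIVER;
        }
        // Table is unknown: only discard once the event is comfortably older
        // than any window in which a create-table invalidation could be in flight.
        return eventTime.plus(UNKNOWN_TABLE_GRACE).isBefore(now)
                ? Action.DISCARD
                : Action.RETRY_LATER;
    }
}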

How to Test and Verify

This is a difficult circumstance to intentionally reproduce. In this case the solution is based on the evidence and a review of the code rather than building a reliable reproduction.

Risk

This is a fairly low risk change. The worst side effect would be if the fanout queue were legitimately backed up with millions of events from a dropped table. This is unlikely, however, since those events would all have to have been generated before the table was actually dropped AND before the events were fanned out on the master queue. Additionally, the backlog caused by this would be quickly cleared once the minimum time to deletion has elapsed.

Level

Medium

Issue Checklist

  • Make sure to label the issue.

  • Well documented description of use-cases and bugs.
